Move Get Snapshots Serialization to Management Pool #83215
It's in the title. For large index counts, generating the response gets quite expensive for both the transport layer and the REST layer. Better to generate it off the transport threads.
Pinging @elastic/es-distributed (Team:Distributed)
Hi @original-brownbear, I've created a changelog YAML for you.
@@ -86,7 +86,11 @@ public TransportGetSnapshotsAction(
             GetSnapshotsRequest::new,
             indexNameExpressionResolver,
             GetSnapshotsResponse::new,
-            ThreadPool.Names.SAME
+            ThreadPool.Names.MANAGEMENT // Execute this on the management pool because creating the response can become fairly expensive
I can see this being a little controversial given that we generate the response on the meta pool if verbose=true. But in those cases I'd argue that we will probably be bound by the IO before the compute such that we never go 100% CPU on all pool threads simultaneously.
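To illustrate the idea behind the change, here is a minimal sketch in plain Java, not Elasticsearch code. It assumes a hypothetical `buildResponse` that stands in for the expensive response construction, and a small fixed pool whose threads are named `management` as a stand-in for the MANAGEMENT thread pool. Running the work inline corresponds to `ThreadPool.Names.SAME`; handing it to the pool keeps the calling thread free.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OffloadSketch {

    // Hypothetical stand-in for building a large response: cost grows with the index count.
    // Returns "<name of the thread that did the work>:<size of the built payload>".
    static String buildResponse(int indexCount) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < indexCount; i++) {
            sb.append("index-").append(i).append(',');
        }
        return Thread.currentThread().getName() + ":" + sb.length();
    }

    // Builds the response on a dedicated pool instead of the calling thread.
    // The pool and its thread name are assumptions made for this sketch.
    static String buildOffloaded(int indexCount) {
        ExecutorService management = Executors.newFixedThreadPool(2, r -> new Thread(r, "management"));
        try {
            return CompletableFuture.supplyAsync(() -> buildResponse(indexCount), management).join();
        } finally {
            management.shutdown();
        }
    }

    public static void main(String[] args) {
        // Inline execution: the caller (think: transport thread) pays the full cost.
        System.out.println(buildResponse(50_000));
        // Offloaded execution: the work runs on a "management" thread.
        System.out.println(buildOffloaded(50_000));
    }
}
```

In a real node the caller would not block on `join()`; the sketch does so only to print which thread built the response.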
Seems reasonable with the usual concerns about piling more stuff into the MANAGEMENT threadpool, especially on the master, and losing the natural backpressure mechanism of blocking a transport thread. Still, these requests are cancellable so the backlog should eventually reach equilibrium without breaking anything.
We could of course move the serialization of the verbose=true case over to a MANAGEMENT thread too. I think I'd want there to be a limit on the number of in-flight get-snapshots requests before doing that tho.
Especially on master we don't want that backpressure mechanism IMO :) much worse for its transport threads to slow down.
Not so important in this case I think. The creation of This one isn't as bad as cluster stats or
LGTM
Thanks David!
It's in the title. For large index counts, generating the response
gets quite expensive for both the transport and the REST layer, and
I've seen a couple of clusters slow-log for verbose=false requests or on the REST layer
(it's also easy to reproduce in benchmarks when the repo contains 50k indices).
Better to generate it off the transport threads.
relates #77466