TransportBroadcastByNodeAction does O(#shards) work on transport worker thread #97914

Closed
3 tasks
Tracked by #77466
DaveCTurner opened this issue Jul 25, 2023 · 2 comments · Fixed by #97920
Labels
>bug · :Distributed/Distributed (A catch all label for anything in the Distributed Area. If you aren't sure, use this one.) · Team:Distributed (Meta label for distributed team)

Comments

@DaveCTurner
Contributor

DaveCTurner commented Jul 25, 2023

Three interrelated issues here:

  • TransportBroadcastByNodeAction derivatives are typically executed via NodeClient, which bypasses the TransportService and therefore does not respect (or even know about) the executor parameter that would otherwise cause a fork. This means that the coordination work, including grouping all the shards by node, happens on the calling thread. When executed from the REST layer, that's a transport worker. This is tracked as "NodeClient executes transport actions without forking" (#97916), but until that's fixed we need a workaround.

  • The node-level responses are deserialised and processed on the receiving transport worker too.

  • Several of these actions use SAME for their executor, bypassing the forking that does exist today.
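As a rough illustration of the first point, here is a minimal sketch in plain Java (not Elasticsearch code) of moving the O(#shards) grouping step off the calling thread by handing it to a separate executor. The names `ForkingSketch`, `ShardCopy`, `groupByNode`, `groupByNodeForked` and the `forked-pool` thread are all hypothetical and exist only for this example; the real action uses Elasticsearch's own thread pools and listener types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ForkingSketch {

    // Hypothetical stand-in for a shard routing entry: which node holds which shard.
    static final class ShardCopy {
        final String nodeId;
        final int shardId;

        ShardCopy(String nodeId, int shardId) {
            this.nodeId = nodeId;
            this.shardId = shardId;
        }
    }

    // The O(#shards) step: visit every shard copy once and bucket it by node.
    static Map<String, List<Integer>> groupByNode(List<ShardCopy> shards) {
        Map<String, List<Integer>> byNode = new TreeMap<>();
        for (ShardCopy shard : shards) {
            byNode.computeIfAbsent(shard.nodeId, k -> new ArrayList<>()).add(shard.shardId);
        }
        return byNode;
    }

    // Run the grouping on the supplied executor rather than on the caller's thread,
    // so the caller (a transport worker in the issue's scenario) returns immediately.
    static CompletableFuture<Map<String, List<Integer>>> groupByNodeForked(
            List<ShardCopy> shards, ExecutorService executor) {
        return CompletableFuture.supplyAsync(() -> groupByNode(shards), executor);
    }

    public static void main(String[] args) throws Exception {
        // Roughly the cluster size mentioned in the issue, spread over three made-up nodes.
        List<ShardCopy> shards = new ArrayList<>();
        for (int i = 0; i < 90_000; i++) {
            shards.add(new ShardCopy("node-" + (i % 3), i));
        }

        // Hypothetical pool standing in for any non-transport thread pool.
        ExecutorService forkPool = Executors.newSingleThreadExecutor(r -> new Thread(r, "forked-pool"));
        try {
            Map<String, List<Integer>> byNode = groupByNodeForked(shards, forkPool).get();
            System.out.println(byNode.keySet() + " " + byNode.get("node-0").size());
        } finally {
            forkPool.shutdown();
        }
    }
}
```

The same idea would apply to the second point: response handling that scales with the number of shards would be submitted to an executor in the same way instead of running inline on the thread that received the response.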

Relates #77466

@DaveCTurner DaveCTurner added >bug :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. labels Jul 25, 2023
@DaveCTurner
Contributor Author

I caught this when looking at a large (90k+ shards) cluster which would occasionally log warnings about slow execution of _cat/shards and _stats when under load.

@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team label Jul 25, 2023
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Jul 25, 2023
`TransportBroadcastByNodeAction` derivatives do work which scales as the
number of shards in the cluster, both during coordination and when
processing responses. We must therefore not do this work on
`transport_worker` threads.

Closes elastic#97914
elasticsearchmachine pushed a commit that referenced this issue Jul 27, 2023
felixbarny pushed a commit to felixbarny/elasticsearch that referenced this issue Aug 3, 2023
…tic#97920)
