Reduce size of MANAGEMENT threadpool on small node #71171

Conversation

DaveCTurner (Contributor)

Today, by default, the `MANAGEMENT` threadpool always permits 5 threads
even if the node has a single CPU, which unfairly prioritises management
activities on small nodes. With this commit we limit the size of this
threadpool to the number of processors, if that is less than 5.

Relates #70435
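
A minimal sketch of the sizing rule described above (illustrative plain Java, not the actual `ThreadPool` builder code):

```java
// Hypothetical helper, not the actual ThreadPool builder code.
public class ManagementPoolSize {
    // At least 1 thread; at most the old default of 5; otherwise one per processor.
    static int managementPoolSize(int allocatedProcessors) {
        return Math.max(1, Math.min(5, allocatedProcessors));
    }

    public static void main(String[] args) {
        System.out.println(managementPoolSize(1));  // 1 on a single-CPU node
        System.out.println(managementPoolSize(16)); // capped at 5
    }
}
```

On a single-CPU node this yields 1 thread instead of 5, so management work can no longer crowd out other activity.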

@DaveCTurner added the >enhancement, :Distributed/Distributed, v8.0.0, and v7.13.0 labels on Apr 1, 2021.
@elasticmachine added the Team:Distributed label on Apr 1, 2021.
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed (Team:Distributed)

@original-brownbear (Member) left a comment:

LGTM

@DaveCTurner (Contributor, Author)

Darn it. This change results in deadlocks in the `org.elasticsearch.http.*RestCancellationIT` tests for stats actions, because those tests rely on blocking the responding thread while needing another management thread to check whether the tasks have started.
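
The failure mode is the classic pool-starvation deadlock. A self-contained sketch (plain `java.util.concurrent`, not the test code itself):

```java
import java.util.concurrent.*;

// Once the pool has a single thread, a task that blocks on another task
// submitted to the same pool can never make progress.
public class SingleThreadStarvation {
    public static void main(String[] args) throws Exception {
        ExecutorService management = Executors.newFixedThreadPool(1);
        management.submit(() -> {
            // This "check" task needs the pool's only thread, which we
            // are currently occupying, so it never starts.
            Future<?> check = management.submit(() -> {});
            try {
                check.get(); // blocks forever
            } catch (InterruptedException | ExecutionException e) {
                throw new RuntimeException(e);
            }
        });
        management.awaitTermination(2, TimeUnit.SECONDS); // times out
        System.out.println("deadlocked; shutting down");
        management.shutdownNow();
    }
}
```

With 5 threads the second task simply ran on a spare thread, which is why these tests passed before this change.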

@DaveCTurner (Contributor, Author)

Darn it some more. `TransportCancelTasksAction` runs on the `MANAGEMENT` threadpool too. That seems like a bug: we might not be able to cancel remote management tasks if the remote node is too busy running them all. However, it's not obvious that we can move it to `SAME`, since cancelling some tasks looks to involve nontrivial work.

@DaveCTurner (Contributor, Author) commented on Apr 1, 2021

@original-brownbear this has moved enough since your LGTM that it's worth another look.

@elasticmachine test this please (just for another CI run in case it shakes out anything else before merging)

@DaveCTurner (Contributor, Author)

once more with feeling ... @elasticmachine test this please

@henningandersen (Contributor) left a comment:
LGTM.

Review thread on this hunk:

```java
// The delete by query request will be executed successfully because the block will be released
assertThat(deleteByQuery().source("test").filter(QueryBuilders.matchAllQuery()).refresh(true).get(),
    matcher().deleted(docs));
// Fire off the delete-by-query first
```

Contributor:
I like this simplification, so just out of curiosity: did not changing this cause a failure? I need a hint to see what I'm missing.

@DaveCTurner (Contributor, Author)

Yes, the `threadPool.schedule(..., ThreadPool.Names.MANAGEMENT)` made a blocking call on the (sole) management thread that could not complete, because the refresh needs to make stats calls which also need management threads. I did a targeted fix in b8a20dd but then decided that the whole thing could be simplified.
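
A hedged sketch of the simplified shape (plain `java.util.concurrent`; names are illustrative, not the actual test code): rather than scheduling a task that blocks the sole management thread, fire the request off and block only on the caller's thread.

```java
import java.util.concurrent.*;

// Illustrative rework: block on a Future from the caller's thread so the
// (possibly single-threaded) pool stays free for follow-up work such as
// the stats calls mentioned above.
public class FireOffFirst {
    public static void main(String[] args) throws Exception {
        ExecutorService management = Executors.newFixedThreadPool(1);

        // Fire off the long-running request...
        Future<String> deleteByQuery = management.submit(() -> "deleted 10 docs");
        // ...and the pool can still serve the work it depends on.
        Future<String> stats = management.submit(() -> "stats ok");

        // Only the caller's thread blocks; no pool thread waits on the pool.
        System.out.println(deleteByQuery.get(5, TimeUnit.SECONDS));
        System.out.println(stats.get(5, TimeUnit.SECONDS));
        management.shutdown();
    }
}
```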

Contributor:

Thanks, got it now.

@original-brownbear (Member) left a comment:

LGTM2

@DaveCTurner merged commit b690798 into elastic:master on Apr 6, 2021.
@DaveCTurner deleted the 2021-04-01-smaller-management-threadpool branch on Apr 6, 2021 at 11:58.
DaveCTurner added a commit referencing this pull request on Apr 6, 2021, with the same message as the description above (relates #70435).