New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't wait joinThread when stopping #8359

Closed
wants to merge 2 commits into
base: master
from

Conversation

Projects
None yet
4 participants
@bleskes
Member

bleskes commented Nov 6, 2014

When a node stops, we cancel any ongoing join process. With #8327, we improved this logic and wait for it to complete before shutting down the node. However, the joining thread is part of a thread pool and will not stop until the thread pool is shutdown.

Another issue raised by the unneeded wait is that when we shutdown, we may ping ourselves - which results in an ugly warn level log. We now log all remote exception during pings at a debug level.

Discovery: a more lenient wait joinThread when stopping
When a node stops, we cancel any ongoing join process. With #8327, we improved this logic and wait for it to complete before shutting down the node. In our tests we typically shutdown an entire cluster at once, which makes it very likely for nodes to be joining while shutting down. This introduces a race condition where the joinThread.interrupt can happen before the thread starts waiting on pings which causes shutdown logic to be slow. This commits improves by repeatedly trying to stop the thread in smaller waits.

Another side effect of the change is that we are now more likely to ping ourselves while shutting down, we results in an ugly warn level log. We now log all remote exception during pings at a debug level.
@martijnvg

This comment has been minimized.

Show comment
Hide comment
@martijnvg

martijnvg Nov 6, 2014

Member

LGTM

Member

martijnvg commented Nov 6, 2014

LGTM

@bleskes bleskes closed this in 83d9dab Nov 6, 2014

@bleskes

This comment has been minimized.

Show comment
Hide comment
@bleskes

bleskes Nov 6, 2014

Member

pull request took the wrong approach. I reverted it from master and re-opening.

Member

bleskes commented Nov 6, 2014

pull request took the wrong approach. I reverted it from master and re-opening.

@bleskes bleskes reopened this Nov 6, 2014

@bleskes bleskes changed the title from Discovery: a more lenient wait joinThread when stopping to Discovery: don't wait joinThread when stopping Nov 6, 2014

@martijnvg

This comment has been minimized.

Show comment
Hide comment
@martijnvg

martijnvg Nov 6, 2014

Member

LGTM

Member

martijnvg commented Nov 6, 2014

LGTM

@bleskes bleskes closed this in 9192219 Nov 7, 2014

bleskes added a commit that referenced this pull request Nov 7, 2014

Discovery: don't wait joinThread when stopping
When a node stops, we cancel any ongoing join process. With #8327, we improved this logic and wait for it to complete before shutting down the node. However, the joining thread is part of a thread pool and will not stop until the thread pool is shutdown.

Another issue raised by the unneeded wait is that when we shutdown, we may ping ourselves - which results in an ugly warn level log. We now log all remote exception during pings at a debug level.

Closes #8359
@s1monw

This comment has been minimized.

Show comment
Hide comment
@s1monw

s1monw Dec 9, 2014

Contributor

@bleskes should we push this to 1.4 too? We just timed out on a node here:

http://build-us-00.elasticsearch.org/job/es_core_14_suse/144/

Contributor

s1monw commented Dec 9, 2014

@bleskes should we push this to 1.4 too? We just timed out on a node here:

http://build-us-00.elasticsearch.org/job/es_core_14_suse/144/

bleskes added a commit that referenced this pull request Dec 11, 2014

Discovery: a more lenient wait joinThread when stopping
When a node stops, we cancel any ongoing join process. With #8327, we improved this logic and wait for it to complete before shutting down the node. In our tests we typically shutdown an entire cluster at once, which makes it very likely for nodes to be joining while shutting down. This introduces a race condition where the joinThread.interrupt can happen before the thread starts waiting on pings which causes shutdown logic to be slow. This commits improves by repeatedly trying to stop the thread in smaller waits.

Another side effect of the change is that we are now more likely to ping ourselves while shutting down, we results in an ugly warn level log. We now log all remote exception during pings at a debug level.

Closes #8359

@bleskes bleskes added the v1.4.2 label Dec 11, 2014

@bleskes

This comment has been minimized.

Show comment
Hide comment
@bleskes
Member

bleskes commented Dec 11, 2014

@s1monw done.

@s1monw

This comment has been minimized.

Show comment
Hide comment
@s1monw

s1monw Dec 11, 2014

Contributor

thx

Contributor

s1monw commented Dec 11, 2014

thx

@clintongormley clintongormley changed the title from Discovery: don't wait joinThread when stopping to Discovery: Don't wait joinThread when stopping Dec 16, 2014

@clintongormley clintongormley removed the review label Mar 19, 2015

@clintongormley clintongormley changed the title from Discovery: Don't wait joinThread when stopping to Don't wait joinThread when stopping Jun 6, 2015

mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015

Discovery: a more lenient wait joinThread when stopping
When a node stops, we cancel any ongoing join process. With #8327, we improved this logic and wait for it to complete before shutting down the node. In our tests we typically shutdown an entire cluster at once, which makes it very likely for nodes to be joining while shutting down. This introduces a race condition where the joinThread.interrupt can happen before the thread starts waiting on pings which causes shutdown logic to be slow. This commits improves by repeatedly trying to stop the thread in smaller waits.

Another side effect of the change is that we are now more likely to ping ourselves while shutting down, we results in an ugly warn level log. We now log all remote exception during pings at a debug level.

Closes #8359
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment