Discovery: a more lenient wait for joinThread when stopping

When a node stops, we cancel any ongoing join process. With elastic#8327, we improved this logic to wait for the join to complete before shutting down the node. In our tests we typically shut down an entire cluster at once, which makes it very likely for nodes to be joining while shutting down. This introduces a race condition where joinThread.interrupt can happen before the thread starts waiting on pings, which causes the shutdown logic to be slow. This commit improves this by repeatedly trying to stop the thread in smaller waits.
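
The retry idea described above can be sketched roughly as follows. This is a minimal illustration, not the ZenDiscovery code: the class name, both method names, the attempt count and the wait time are all assumptions made for the example. The worker deliberately swallows its first interrupt to imitate an interrupt that arrives before the thread reaches its blocking wait.

```java
final class JoinThreadStopSketch {

    // Hypothetical helper (not the ZenDiscovery implementation): interrupt the
    // thread and wait briefly, repeating until it exits or attempts run out.
    static boolean stopWithRetries(Thread thread, int maxAttempts, long waitMillis) {
        for (int attempt = 0; attempt < maxAttempts && thread.isAlive(); attempt++) {
            thread.interrupt();
            try {
                thread.join(waitMillis);
            } catch (InterruptedException e) {
                // The stopping thread itself was interrupted: restore the flag and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return !thread.isAlive();
    }

    // Simulates the race: the first interrupt is consumed before the thread
    // reaches its blocking wait, so a single interrupt() would not stop it.
    static Thread newWorkerThatMissesFirstInterrupt() {
        return new Thread(() -> {
            while (!Thread.interrupted()) {
                Thread.yield(); // busy work until the first interrupt is swallowed
            }
            try {
                Thread.sleep(60_000); // stands in for "waiting on pings"
            } catch (InterruptedException e) {
                // a later interrupt lands here and lets the thread exit
            }
        });
    }

    public static void main(String[] args) {
        Thread worker = newWorkerThatMissesFirstInterrupt();
        worker.start();
        boolean stopped = stopWithRetries(worker, 20, 50);
        System.out.println("stopped = " + stopped);
    }
}
```

With a single interrupt followed by one long join, a worker like this would hold up shutdown for the full timeout. Short repeated waits bound the delay to roughly one wait interval after the thread actually starts blocking.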

Another side effect of the change is that we are now more likely to ping ourselves while shutting down, which results in an ugly warn level log. We now log all remote exceptions during pings at a debug level.

Closes elastic#8359
bleskes committed Nov 6, 2014
1 parent 9f86d25 commit abfbdaf0373596287cd9c4a9f7787683eb42ba27
Showing with 2 additions and 11 deletions.
  1. +2 −11 src/main/java/org/elasticsearch/discovery/zen/ZenDiscovery.java
@@ -229,6 +229,7 @@ public void onFailure(String source, @org.elasticsearch.common.Nullable Throwabl
     @Override
     protected void doStop() throws ElasticsearchException {
+        joinThreadControl.stop();
         pingService.stop();
         masterFD.stop("zen disco stop");
         nodesFD.stop();
@@ -258,7 +259,6 @@ protected void doStop() throws ElasticsearchException {
                 }
             }
         }
-        joinThreadControl.stop();
     }
     @Override
@@ -1348,16 +1348,7 @@ public void stop() {
             running.set(false);
             Thread joinThread = currentJoinThread.getAndSet(null);
             if (joinThread != null) {
-                try {
-                    joinThread.interrupt();
-                } catch (Exception e) {
-                    // ignore
-                }
-                try {
-                    joinThread.join(10000);
-                } catch (InterruptedException e) {
-                    Thread.currentThread().interrupt();
-                }
+                joinThread.interrupt();
             }
         }
