Fully initialize cluster state on ephemeral nodes (#71466)
Today ephemeral nodes (i.e. those that aren't master-eligible and don't
contain data) start with an initial "persisted" state that is almost
entirely empty. In particular, it doesn't contain any cluster blocks or
even the local node.
This violates some assumptions elsewhere that the local node is always
included in the cluster state, and breaks things like the
`ClusterFormationFailureHelper`:

    [DEBUG][o.e.c.c.ClusterFormationFailureHelper] unexpected exception scheduling cluster formation warning
      java.lang.NullPointerException: Cannot invoke "org.elasticsearch.cluster.node.DiscoveryNode.isMasterNode()" because the return value of "org.elasticsearch.cluster.node.DiscoveryNodes.getLocalNode()" is null
        at org.elasticsearch.cluster.coordination.ClusterFormationFailureHelper$ClusterFormationState.getDescription(ClusterFormationFailureHelper.java:147) ~[elasticsearch-7.11.0.jar:7.11.0]
        at org.elasticsearch.cluster.coordination.ClusterFormationFailureHelper$WarningScheduler$1.doRun(ClusterFormationFailureHelper.java:92) [elasticsearch-7.11.0.jar:7.11.0]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:732) [elasticsearch-7.11.0.jar:7.11.0]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) [elasticsearch-7.11.0.jar:7.11.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
        at java.lang.Thread.run(Thread.java:832) [?:?]

This commit addresses this by initializing the persisted state properly.
DaveCTurner committed Apr 8, 2021
1 parent 875277f commit 15b110d
Showing 2 changed files with 3 additions and 1 deletion.
@@ -1039,6 +1039,7 @@ ClusterState getStateForMasterService() {
         // expose last accepted cluster state as base state upon which the master service
         // speculatively calculates the next cluster state update
         final ClusterState clusterState = coordinationState.get().getLastAcceptedState();
+        assert clusterState.nodes().getLocalNode() != null;
         if (mode != Mode.LEADER || clusterState.term() != getCurrentTerm()) {
             // the master service checks if the local node is the master node in order to fail execution of the state update early
             return clusterStateWithNoMasterBlock(clusterState);

@@ -177,7 +177,8 @@ public void start(Settings settings, TransportService transportService, ClusterS
                 }
             } else {
                 final long currentTerm = 0L;
-                final ClusterState clusterState = ClusterState.builder(ClusterName.CLUSTER_NAME_SETTING.get(settings)).build();
+                final ClusterState clusterState = prepareInitialClusterState(transportService, clusterService,
+                    ClusterState.builder(ClusterName.CLUSTER_NAME_SETTING.get(settings)).build());
                 if (persistedClusterStateService.getDataPaths().length > 0) {
                     // write empty cluster state just so that we have a persistent node id. There is no need to write out global metadata with
                     // cluster uuid as coordinating-only nodes do not snap into a cluster as they carry no state
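
The failure mode described in the commit message, and the shape of the fix, can be illustrated with a small self-contained sketch. It is not Elasticsearch code: `Node`, `State`, `describe`, `prepareInitialState` and the `hasInitialBlocks` flag are hypothetical stand-ins, and the sketch only assumes that a properly initialized state carries the local node and some initial blocks, as the message suggests.

```java
import java.util.Objects;

public class EphemeralStateSketch {

    /** Hypothetical stand-in for a node descriptor (not the real DiscoveryNode). */
    static final class Node {
        final String name;
        final boolean masterEligible;

        Node(String name, boolean masterEligible) {
            this.name = name;
            this.masterEligible = masterEligible;
        }
    }

    /** Hypothetical stand-in for a cluster state (not the real ClusterState). */
    static final class State {
        final String clusterName;
        final Node localNode;           // null in the bare initial state
        final boolean hasInitialBlocks; // assumption: models "cluster blocks are present"

        State(String clusterName, Node localNode, boolean hasInitialBlocks) {
            this.clusterName = clusterName;
            this.localNode = localNode;
            this.hasInitialBlocks = hasInitialBlocks;
        }

        /** A state built from the cluster name alone, like the one the old code used. */
        static State empty(String clusterName) {
            return new State(clusterName, null, false);
        }
    }

    /** A consumer that assumes the local node is always present in the state. */
    static String describe(State state) {
        // Throws NullPointerException when the state has no local node.
        return state.localNode.masterEligible ? "master-eligible" : "not master-eligible";
    }

    /** Hypothetical analogue of fully initializing the state before it is used. */
    static State prepareInitialState(Node localNode, State bare) {
        Objects.requireNonNull(localNode, "local node must be known before preparing the state");
        if (bare.localNode != null) {
            throw new IllegalStateException("initial state must only be prepared once");
        }
        return new State(bare.clusterName, localNode, true);
    }

    public static void main(String[] args) {
        Node local = new Node("coordinating-only-1", false);
        State bare = State.empty("my-cluster");

        try {
            describe(bare);
        } catch (NullPointerException e) {
            System.out.println("bare state: consumer failed with NullPointerException");
        }

        State prepared = prepareInitialState(local, bare);
        System.out.println("prepared state: local node is " + describe(prepared));
    }
}
```

The point of the sketch is that the consumer stays unchanged. The invariant "the local node is always in the state" is restored where the state is created, which is also what the new assertion in `getStateForMasterService` checks.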
