Check index uuid when merging incoming cluster state into the local one #9541

Closed
wants to merge 2 commits into
from

Projects

None yet

4 participants

@bleskes
Member
bleskes commented Feb 3, 2015

In big deployment ClusterState can be large. To make sure we keep reusing objects that were promoted to the Old Gen, ZenDiscovery has an optimization where it tries to reuse existing IndexMetaData object (containing among other things the mappings) from the current cluster state if they didn't change. The comparison currently uses the index name and the metadata version. This is however not enough and we should also check the index uuid. In extreme cases, where cluster state processing is slow and the index in question is deleted and recreated and these operations are batch processed together, we can use the wrong meta data if the version is also identical. This can happen if people create the index with all meta data predefined and no settings were changed.

Closes #9489

Note that this is done against the 1.4 branch as i want it to go there too.

@bleskes bleskes Discovery: check in index uuid when merging incoming cluster state in…
…to local

In big deployment ClusterState can be large. To make sure we keep reusing objects that were promoted to the Old Gen, ZenDiscovery has an optimization where it tries to reuse existing IndexMetaData object (containing among other things the mappings) from the current cluster state if they didn't change. The comparison currently uses the index name and the metadata version. This is however not enough and we should also check the index uuid. In extreme cases, where cluster state processing is slow and the index in question is deleted and recreated and these operations are batch processed together, we can use the wrong meta data if the version is also identical. This can happen if people create the index with all meta data predefined and no settings were changed.

Closes #9489
397e4a2
@bleskes bleskes added the review label Feb 3, 2015
@bleskes bleskes changed the title from Discovery: check in index uuid when merging incoming cluster state into the local one to Discovery: check index uuid when merging incoming cluster state into the local one Feb 3, 2015
@martijnvg martijnvg self-assigned this Feb 3, 2015
@martijnvg martijnvg and 1 other commented on an outdated diff Feb 3, 2015
...lasticsearch/indices/state/RareClusterStateTests.java
allocator.allocateUnassigned(routingAllocation);
}
+
+ @Test
+ @TestLogging(value = "cluster.service:TRACE")
+ public void testDeleteCreateInOneBulk() throws Exception {
+ internalCluster().startNodesAsync(2, ImmutableSettings.builder()
+ .put(DiscoveryModule.DISCOVERY_TYPE_KEY, "zen")
+ .build()).get();
+ assertFalse(client().admin().cluster().prepareHealth().setWaitForNodes("2").get().isTimedOut());
+ prepareCreate("test").setSettings(IndexMetaData.SETTING_NUMBER_OF_REPLICAS, cluster().numDataNodes() - 1).addMapping("type").get();
@martijnvg
martijnvg Feb 3, 2015 Member

maybe use index.auto_expand_replicas instead?

@bleskes
bleskes Feb 3, 2015 Member

yeah can do. Slightly fancier :)

@martijnvg martijnvg commented on the diff Feb 3, 2015
...lasticsearch/indices/state/RareClusterStateTests.java
+ .build()).get();
+ assertFalse(client().admin().cluster().prepareHealth().setWaitForNodes("2").get().isTimedOut());
+ prepareCreate("test").setSettings(IndexMetaData.SETTING_NUMBER_OF_REPLICAS, cluster().numDataNodes() - 1).addMapping("type").get();
+ ensureGreen("test");
+
+ // now that the cluster is stable, remove publishing timeout
+ assertAcked(client().admin().cluster().prepareUpdateSettings().setTransientSettings(ImmutableSettings.builder().put(DiscoverySettings.PUBLISH_TIMEOUT, "0")));
+
+ Set<String> nodes = new HashSet<>(Arrays.asList(internalCluster().getNodeNames()));
+ nodes.remove(internalCluster().getMasterName());
+
+ // block none master node.
+ BlockClusterStateProcessing disruption = new BlockClusterStateProcessing(nodes.iterator().next(), getRandom());
+ internalCluster().setDisruptionScheme(disruption);
+ logger.info("--> indexing a doc");
+ index("test", "type", "1");
@martijnvg
martijnvg Feb 3, 2015 Member

Do we need the index operation here?

@bleskes
bleskes Feb 3, 2015 Member

I wanted to make sure we loose the doc. See the search count ==0 at the end.

@martijnvg
martijnvg Feb 3, 2015 Member

Ok, makes sense.

@martijnvg martijnvg and 1 other commented on an outdated diff Feb 3, 2015
...ava/org/elasticsearch/discovery/zen/ZenDiscovery.java
@@ -821,10 +821,14 @@ public ClusterState execute(ClusterState currentState) {
MetaData.Builder metaDataBuilder = MetaData.builder(updatedState.metaData()).removeAllIndices();
for (IndexMetaData indexMetaData : updatedState.metaData()) {
IndexMetaData currentIndexMetaData = currentState.metaData().index(indexMetaData.index());
- if (currentIndexMetaData == null || currentIndexMetaData.version() != indexMetaData.version()) {
+ if (currentIndexMetaData == null) {
@martijnvg
martijnvg Feb 3, 2015 Member

I would prefer this instead:

if (currentIndexMetaData != null && currentIndexMetaData.isSameUUID(indexMetaData.uuid()) && currentIndexMetaData.version() == indexMetaData.version())  {
   metaDataBuilder.put(currentIndexMetaData, false);
else {
   metaDataBuilder.put(indexMetaData, false);
}

Less code blocks at the price if a longer condition, but this is just my personal style.

@bleskes
bleskes Feb 3, 2015 Member

++. Will change.

@martijnvg
Member

@bleskes Great catch! How did you figured this out? (test failure?) I left a couple of comments/questions, but this looks good to me.

@bleskes
Member
bleskes commented Feb 3, 2015

@martijnvg thx. pushed another commit.

I was diagnosing a customer cluster and saw these kind of messages:

[2015-02-03 11:50:30,078][DEBUG][cluster.action.shard     ] [node_t1] [test][3] ignoring shard started, different index uuid, current IKfsCb3sT7WnfUcfpAcbxg, got [test][3], node[RBwFZ5K6TpuocqmI3dgHdw], [P], s[INITIALIZING], indexUUID [hABpEGPPQOuE_9UhwIa8Rg], reason [master [node_t1][8nttXu8AQb2r0wvtGO95VA][Boazs-Air.local][local[2]]{mode=local} marked shard as initializing, but shard state is [STARTED], mark shard as started]
@martijnvg
Member

@bleskes LGTM!

@kimchy
Member
kimchy commented Feb 3, 2015

LGTM

@bleskes bleskes added a commit that referenced this pull request Feb 3, 2015
@bleskes bleskes Discovery: check index uuid when merging incoming cluster state into …
…local

In big deployment ClusterState can be large. To make sure we keep reusing objects that were promoted to the Old Gen, ZenDiscovery has an optimization where it tries to reuse existing IndexMetaData object (containing among other things the mappings) from the current cluster state if they didn't change. The comparison currently uses the index name and the metadata version. This is however not enough and we should also check the index uuid. In extreme cases, where cluster state processing is slow and the index in question is deleted and recreated and these operations are batch processed together, we can use the wrong meta data if the version is also identical. This can happen if people create the index with all meta data predefined and no settings were changed.

Closes #9489
Closes #9541
c4231ce
@bleskes bleskes added a commit that referenced this pull request Feb 3, 2015
@bleskes bleskes Discovery: check index uuid when merging incoming cluster state into …
…local

In big deployment ClusterState can be large. To make sure we keep reusing objects that were promoted to the Old Gen, ZenDiscovery has an optimization where it tries to reuse existing IndexMetaData object (containing among other things the mappings) from the current cluster state if they didn't change. The comparison currently uses the index name and the metadata version. This is however not enough and we should also check the index uuid. In extreme cases, where cluster state processing is slow and the index in question is deleted and recreated and these operations are batch processed together, we can use the wrong meta data if the version is also identical. This can happen if people create the index with all meta data predefined and no settings were changed.

Closes #9489
Closes #9541
2294400
@bleskes bleskes added a commit that closed this pull request Feb 3, 2015
@bleskes bleskes Discovery: check index uuid when merging incoming cluster state into …
…local

In big deployment ClusterState can be large. To make sure we keep reusing objects that were promoted to the Old Gen, ZenDiscovery has an optimization where it tries to reuse existing IndexMetaData object (containing among other things the mappings) from the current cluster state if they didn't change. The comparison currently uses the index name and the metadata version. This is however not enough and we should also check the index uuid. In extreme cases, where cluster state processing is slow and the index in question is deleted and recreated and these operations are batch processed together, we can use the wrong meta data if the version is also identical. This can happen if people create the index with all meta data predefined and no settings were changed.

Closes #9489
Closes #9541
896e865
@bleskes bleskes closed this in 896e865 Feb 3, 2015
@bleskes bleskes deleted the bleskes:index_create_uuid branch Feb 3, 2015
@clintongormley clintongormley changed the title from Discovery: check index uuid when merging incoming cluster state into the local one to Check index uuid when merging incoming cluster state into the local one Jun 7, 2015
@mute mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
@bleskes bleskes Discovery: check index uuid when merging incoming cluster state into …
…local

In big deployment ClusterState can be large. To make sure we keep reusing objects that were promoted to the Old Gen, ZenDiscovery has an optimization where it tries to reuse existing IndexMetaData object (containing among other things the mappings) from the current cluster state if they didn't change. The comparison currently uses the index name and the metadata version. This is however not enough and we should also check the index uuid. In extreme cases, where cluster state processing is slow and the index in question is deleted and recreated and these operations are batch processed together, we can use the wrong meta data if the version is also identical. This can happen if people create the index with all meta data predefined and no settings were changed.

Closes #9489
Closes #9541
f98a6e3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment