Recover broken IndexMetaData as closed #17187

Merged
merged 1 commit into elastic:master on Mar 21, 2016

Conversation

@s1monw (Contributor) commented Mar 18, 2016

Today if something is wrong with the IndexMetaData we detect it very
late, and most of the time when that happens we have already allocated the index
and get endless loops and full log files on data-nodes. This change tries
to verify IndexService creation during initial state recovery on the master;
if the recovery fails, the index is imported as closed and won't be allocated
at all.

@bleskes @ywelsch @jasontedor may I have your feedback on the approach?
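In rough form, the check being proposed looks like the sketch below. This is illustrative only: `verifyIndexMetadata` is the method name discussed later in this thread, while the surrounding builder and logger plumbing is assumed rather than taken from the PR diff.

```java
// Sketch (not the actual PR diff): during state recovery on the master, try to build
// the index services from the recovered IndexMetaData; if anything fails (settings,
// analyzers, plugins), import the index in state CLOSE so it is never allocated.
IndexMetaData.Builder indexBuilder = IndexMetaData.builder(indexMetaData);
try {
    // assumed signature; the discussion below refers to this as verifyIndexMetadata
    indicesService.verifyIndexMetadata(nodeServicesProvider, indexMetaData);
} catch (Exception e) {
    logger.warn("recovering index {} failed - importing as closed", e, indexMetaData.getIndex());
    indexBuilder.state(IndexMetaData.State.CLOSE);
}
metaDataBuilder.put(indexBuilder.build(), false);
```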

```java
ObjectHashSet<String> nodesIds = new ObjectHashSet<>(clusterService.state().nodes().masterNodes().keys());
logger.trace("performing state recovery from {}", nodesIds);
TransportNodesListGatewayMetaState.NodesGatewayMetaState nodesState = listGatewayMetaState.list(nodesIds.toArray(String.class), null).actionGet();
String[] nodesIds = new ObjectHashSet<>(clusterService.state().nodes().masterNodes().keys()).toArray(String.class);
```
Contributor commented:

We are not going to have duplicates in map keys, are we? Why not use clusterService.state().nodes().masterNodes().keys().toArray() directly?

@clintongormley added the :Distributed/Recovery label (Anything around constructing a new shard, either from a local or a remote source.) on Mar 18, 2016
```java
IndexService service = null;
try {
    // this will also fail if some plugin fails etc. which is nice since we can verify that early
    service = createIndexService(nodeServicesProvider, metaData, Collections.emptyList());
```
Contributor commented:

should we merge mappings as well here to check if they are consistent?

Contributor (PR author) commented:

not sure what you mean by that?

Contributor commented:

I could be wrong (not that familiar with the code in that area) but I think that in-memory data structures for mappings are not created by the createIndex method. These are merged later (see e.g. MetaDataCreateIndexService:325). We could check here as well that all is good on the mapping level.

Contributor (PR author) commented:

we actually get a full-fledged mapping in the constructor - MetaDataCreateIndexService is different since it's done before we actually create the index, so it has to process the default mapping. I think we are ok here.

Contributor commented:

Sorry, MetaDataCreateIndexService was a bad example. Still, the method MapperService.merge which does mapping validation is (AFAICS) not called by the createIndex method. This means that verifyIndexMetadata does not run the mapping checks in MapperService.merge. We check these however when we run MetaDataIndexUpgradeService.checkMappingsCompatibility which is called by MetaDataIndexUpgradeService.upgradeIndexMetaData when we start a node.

@ywelsch (Contributor) commented Mar 18, 2016

Left some comments but I really like the idea here 😄 . My main concern is how to make sure that verifyIndexMetadata does not make any disk writes or messes with existing caches etc. I'll have to have another closer look to feel confident that nothing bad happens there.
I also wonder whether we can apply the same approach when importing dangling indices (LocalAllocateDangledIndices).

@s1monw (Contributor, PR author) commented Mar 18, 2016

I also wonder whether we can apply the same approach when importing dangling indices (LocalAllocateDangledIndices).

agreed, I think we can - let's do a follow-up

Left some comments but I really like the idea here 😄 . My main concern is how to make sure that verifyIndexMetadata does not make any disk writes or messes with existing caches etc. I'll have to have another closer look to feel confident that nothing bad happens there.

it's hard to assert, to be honest, but from an architecture perspective I made all the globally impacting things listeners that we do not pass in on the verify method, so I think we are ok?

@s1monw (Contributor, PR author) commented Mar 21, 2016

@ywelsch @bleskes I pushed another change that also prevents the index from being opened if we can't create an index service.

@s1monw (Contributor, PR author) commented Mar 21, 2016

@ywelsch I looked into your concern and refactored the solution such that we are creating private cache instances for the verification IndexService. This should prevent any modifications. All other data structures are immutable.
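Concretely, the isolation described here follows the shape of the diff quoted further down in this thread: build throw-away caches, create the verification IndexService from them, and close everything afterwards. The sketch below is illustrative; how the private cache instances are constructed is elided.

```java
// Sketch of the verification path: every stateful piece is private to this call and is
// registered as a Closeable, so the check cannot modify anything shared on the node.
List<Closeable> closeables = new ArrayList<>();
try {
    closeables.add(indicesQueryCache);      // query cache created only for this verification
    closeables.add(indicesFieldDataCache);  // field-data cache created only for this verification
    // this will also fail if some plugin fails etc. which is nice since we can verify that early
    IndexService service = createIndexService(nodeServicesProvider, metaData,
            indicesQueryCache, indicesFieldDataCache, Collections.emptyList());
    closeables.add(() -> service.close("metadata verification", false));
} finally {
    IOUtils.close(closeables);
}
```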

s1monw added a commit to s1monw/elasticsearch that referenced this pull request Mar 21, 2016
In 5.0 we don't allow index settings to be specified on the node level ie.
in yaml files or via commandline argument. This can cause problems during
upgrade if this was used extensively. For instance, if analyzers were
specified on a node level this might cause the index to be closed when
imported (see elastic#17187). In such a case all indices relying on this
must be updated via `PUT /${index}/_settings`. Yet, this API has slightly
different semantics since it overrides existing settings. To make this less
painful this change adds a `preserve_existing` parameter on that API to ensure
we have the same semantics as if the setting was applied on the node level.

This change also adds a better error message and a change to the migration guide
to ensure upgrades are smooth if index settings are specified on the node level.

If an index setting is detected, this change fails the node startup and prints a message
like this:
```
*************************************************************************************
Found index level settings on node level configuration.

Since elasticsearch 5.x index level settings can NOT be set on the nodes
configuration like the elasticsearch.yaml, in system properties or command line
arguments. In order to upgrade all indices the settings must be updated via the
/${index}/_settings API. Unless all settings are dynamic all indices must be closed
in order to apply the upgrade. Indices created in the future should use index templates
to set default values.

Please ensure all required values are updated on all indices by executing:

curl -XPUT 'http://localhost:9200/_all/_settings?preserve_existing=true' -d '{
  "index.number_of_shards" : "1",
  "index.query.default_field" : "main_field",
  "index.translog.durability" : "async",
  "index.ttl.disable_purge" : "true"
}'
*************************************************************************************
```
```java
final Index index = indexMetaData.getIndex();
final Predicate<String> indexNameMatcher = (indexExpression) -> indexNameExpressionResolver.matchesIndex(index.getName(), indexExpression, clusterService.state());
final IndexSettings idxSettings = new IndexSettings(indexMetaData, this.settings, indexNameMatcher, indexScopeSetting);
logger.debug("creating Index [{}], shards [{}]/[{}{}]",
```
Contributor commented:

can we pass a reason to this method and mention it here? I always have to scroll to find out whether this is a "true" index or just one that was created when importing/creating one.

@bleskes (Contributor) commented Mar 21, 2016

I really like the change, but I'm afraid it's not enough, for example, when referring to an analyzer that used to be in the node settings (I tested it). The reason is that the mapper service doesn't instantiate anything until the merge method on it is called. This is imo something we should change (prepare everything in the constructor) but we don't have to do it in this PR. This patch works for me:

```diff
diff --git a/core/src/main/java/org/elasticsearch/indices/IndicesService.java b/core/src/main/java/org/elasticsearch/indices/IndicesService.java
index ca75d30..bbdd693 100644
--- a/core/src/main/java/org/elasticsearch/indices/IndicesService.java
+++ b/core/src/main/java/org/elasticsearch/indices/IndicesService.java
@@ -19,6 +19,7 @@

 package org.elasticsearch.indices;

+import com.carrotsearch.hppc.cursors.ObjectCursor;
 import org.apache.lucene.index.DirectoryReader;
 import org.apache.lucene.store.LockObtainFailedException;
 import org.apache.lucene.util.CollectionUtil;
@@ -34,6 +35,7 @@ import org.elasticsearch.cluster.ClusterService;
 import org.elasticsearch.cluster.ClusterState;
 import org.elasticsearch.cluster.metadata.IndexMetaData;
 import org.elasticsearch.cluster.metadata.IndexNameExpressionResolver;
+import org.elasticsearch.cluster.metadata.MappingMetaData;
 import org.elasticsearch.common.Nullable;
 import org.elasticsearch.common.breaker.CircuitBreaker;
 import org.elasticsearch.common.bytes.BytesReference;
@@ -66,6 +68,7 @@ import org.elasticsearch.index.fielddata.FieldDataType;
 import org.elasticsearch.index.fielddata.IndexFieldDataCache;
 import org.elasticsearch.index.flush.FlushStats;
 import org.elasticsearch.index.get.GetStats;
+import org.elasticsearch.index.mapper.MapperService;
 import org.elasticsearch.index.merge.MergeStats;
 import org.elasticsearch.index.recovery.RecoveryStats;
 import org.elasticsearch.index.refresh.RefreshStats;
@@ -398,6 +401,12 @@ public class IndicesService extends AbstractLifecycleComponent<IndicesService> i
             closeables.add(indicesQueryCache);
             // this will also fail if some plugin fails etc. which is nice since we can verify that early
             IndexService service = createIndexService(nodeServicesProvider, metaData, indicesQueryCache, indicesFieldDataCache, Collections.emptyList());
+            for (ObjectCursor<MappingMetaData> typeMapping : metaData.getMappings().values()) {
+                // don't apply the default mapping, it has been applied when the mapping was created
+                service.mapperService().merge(typeMapping.value.type(), typeMapping.value.source(),
+                        MapperService.MergeReason.MAPPING_RECOVERY, true);
+            }
+
             closeables.add(() -> service.close("metadata verification", false));
         } finally {
             IOUtils.close(closeables);
```

@s1monw (Contributor, PR author) commented Mar 21, 2016

I was going to do that exact same thing, since it would allow us to remove some places where we create an index only for that purpose. I can put this into the patch and add a test. Follow-ups can clean up other places, and we may move stuff into ctors.

@s1monw (Contributor, PR author) commented Mar 21, 2016

@bleskes pushed an update with a new test

```java
final Index index = indexMetaData.getIndex();
final Predicate<String> indexNameMatcher = (indexExpression) -> indexNameExpressionResolver.matchesIndex(index.getName(), indexExpression, clusterService.state());
final IndexSettings idxSettings = new IndexSettings(indexMetaData, this.settings, indexNameMatcher, indexScopeSetting);
logger.debug("creating Index [{}], shards [{}]/[{}{}]",
logger.debug("creating Index [{}], shards [{}]/[{}{}] - reason [{}]",
```
Contributor commented:

thx :)

@bleskes (Contributor) commented Mar 21, 2016

LGTM. Thanks

Today if something is wrong with the IndexMetaData we detect it very
late, and most of the time when that happens we have already allocated the index
and get endless loops and full log files on data-nodes. This change tries
to verify IndexService creation during initial state recovery on the master;
if the recovery fails, the index is imported as `closed` and won't be allocated
at all.

Closes elastic#17187
@s1monw merged commit 8127a06 into elastic:master on Mar 21, 2016
areek added a commit to areek/elasticsearch that referenced this pull request Mar 21, 2016
In elastic#17187, we upgrade index state after upgrading
index folder structure. As we don't have to write
the upgraded state in the old index folder structure,
we can clean up how we write the upgraded index state.
@hadeslion commented:
this verifyIndexMetadata spends too much time if a node has many indices and each index has many types.
It causes a long wait before index recovery when a node restarts.
Can you add a config option to skip this verification?

@bleskes (Contributor) commented Nov 8, 2016

@hadeslion there is no such setting, as this check is important. If it takes that long you will also run into problems elsewhere, as these things should be parsable. Can you give some numbers about how many indices and types you have, and how long it takes? Is this during node startup or later on?

@hadeslion commented:
@bleskes I have 100 indices. Some of them have 400-800 types; those take just 10-20s each. The others have 1200-1600 types each, and those take 1-2 minutes.
I run these indices on a single-node cluster. This happens when I restart the node; according to the logs, it is during the gateway state recovery. During this metadata verification, all API requests return SERVICE_UNAVAILABLE/1/state not recovered.

ywelsch added a commit that referenced this pull request Mar 1, 2019
With #17187, we verified IndexService creation during initial state recovery on the master and, if the
recovery failed, the index was imported as closed, not allocating any shards. This was mainly done to
prevent endless allocation loops and full log files on data-nodes when the IndexMetaData contained
broken settings / analyzers. Zen2 loads the cluster state eagerly, and this check currently runs on all
nodes (not only the elected master), which can significantly slow down startup on data nodes.
Furthermore, with replicated closed indices (#33888) on the horizon, importing the index as closed
will no longer prevent shards from being allocated. Fortunately, the original issue of endless allocation
loops is no longer a problem due to #18467, where we limit the retries of failed allocations. The solution
here is therefore to just undo #17187, as it's no longer necessary and is covered by #18467, which will
solve the issue for Zen2 and replicated closed indices as well.
Labels: :Distributed/Recovery, >enhancement, resiliency, v5.0.0-alpha1