Recover broken IndexMetaData as closed #17187

Merged
merged 1 commit into elastic:master on Mar 21, 2016

Conversation

@s1monw (Contributor) commented Mar 18, 2016

Today if something is wrong with the IndexMetaData we detect it very
late, and most of the time when that happens we have already allocated the index
and get endless loops and full log files on data-nodes. This change tries
to verify IndexService creation during initial state recovery on the master;
if the recovery fails, the index is imported as closed and won't be allocated
at all.

@bleskes @ywelsch @jasontedor may I have your feedback on the approach?
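In rough form, the check being proposed looks like the sketch below. This is illustrative only: `verifyIndexMetadata` is the method name discussed later in this thread, while the surrounding builder and logger plumbing is assumed rather than taken from the PR diff.

```java
// Sketch (not the actual PR diff): during state recovery on the master, try to build
// the index services from the recovered IndexMetaData; if anything fails (settings,
// analyzers, plugins), import the index in state CLOSE so it is never allocated.
IndexMetaData.Builder indexBuilder = IndexMetaData.builder(indexMetaData);
try {
    // assumed signature; the discussion below refers to this as verifyIndexMetadata
    indicesService.verifyIndexMetadata(nodeServicesProvider, indexMetaData);
} catch (Exception e) {
    logger.warn("recovering index {} failed - importing as closed", e, indexMetaData.getIndex());
    indexBuilder.state(IndexMetaData.State.CLOSE);
}
metaDataBuilder.put(indexBuilder.build(), false);
```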

```java
ObjectHashSet<String> nodesIds = new ObjectHashSet<>(clusterService.state().nodes().masterNodes().keys());
logger.trace("performing state recovery from {}", nodesIds);
TransportNodesListGatewayMetaState.NodesGatewayMetaState nodesState = listGatewayMetaState.list(nodesIds.toArray(String.class), null).actionGet();
String[] nodesIds = new ObjectHashSet<>(clusterService.state().nodes().masterNodes().keys()).toArray(String.class);
```
Contributor commented:

We are not going to have duplicates in map keys, are we? Why not use clusterService.state().nodes().masterNodes().keys().toArray() directly?

@clintongormley added the :Distributed/Recovery label (Anything around constructing a new shard, either from a local or a remote source.) on Mar 18, 2016
```java
IndexService service = null;
try {
    // this will also fail if some plugin fails etc. which is nice since we can verify that early
    service = createIndexService(nodeServicesProvider, metaData, Collections.emptyList());
```
Contributor commented:

should we merge mappings as well here to check if they are consistent?

Contributor (PR author) commented:

not sure what you mean by that?

Contributor commented:

I could be wrong (not that familiar with the code in that area) but I think that in-memory data structures for mappings are not created by the createIndex method. These are merged later (see e.g. MetaDataCreateIndexService:325). We could check here as well that all is good on the mapping level.

Contributor (PR author) commented:

we actually get a full-fledged mapping in the constructor - MetaDataCreateIndexService is different since it's done before we actually create the index, so it has to process the default mapping. I think we are ok here.

Contributor commented:

Sorry, MetaDataCreateIndexService was a bad example. Still, the method MapperService.merge which does mapping validation is (AFAICS) not called by the createIndex method. This means that verifyIndexMetadata does not run the mapping checks in MapperService.merge. We check these however when we run MetaDataIndexUpgradeService.checkMappingsCompatibility which is called by MetaDataIndexUpgradeService.upgradeIndexMetaData when we start a node.

@ywelsch (Contributor) commented Mar 18, 2016

Left some comments but I really like the idea here 😄 . My main concern is how to make sure that verifyIndexMetadata does not make any disk writes or messes with existing caches etc. I'll have to have another closer look to feel confident that nothing bad happens there.
I also wonder whether we can apply the same approach when importing dangling indices (LocalAllocateDangledIndices).

@s1monw (Contributor, PR author) commented Mar 18, 2016

I also wonder whether we can apply the same approach when importing dangling indices (LocalAllocateDangledIndices).

agreed, I think we can - let's do a follow-up

Left some comments but I really like the idea here 😄 . My main concern is how to make sure that verifyIndexMetadata does not make any disk writes or messes with existing caches etc. I'll have to have another closer look to feel confident that nothing bad happens there.

it's hard to assert, to be honest, but from an architecture perspective I made all the globally impacting things listeners that we do not pass in on the verify method, so I think we are ok?

@s1monw (Contributor, PR author) commented Mar 21, 2016

@ywelsch @bleskes I pushed another change that also prevents the index from being opened if we can't create an index service.

@s1monw (Contributor, PR author) commented Mar 21, 2016

@ywelsch I looked into your concern and refactored the solution such that we are creating private cache instances for the verification IndexService. This should prevent any modifications. All other data structures are immutable.
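Concretely, the isolation described here follows the shape of the diff quoted further down in this thread: build throw-away caches, create the verification IndexService from them, and close everything afterwards. The sketch below is illustrative; how the private cache instances are constructed is elided.

```java
// Sketch of the verification path: every stateful piece is private to this call and is
// registered as a Closeable, so the check cannot modify anything shared on the node.
List<Closeable> closeables = new ArrayList<>();
try {
    closeables.add(indicesQueryCache);      // query cache created only for this verification
    closeables.add(indicesFieldDataCache);  // field-data cache created only for this verification
    // this will also fail if some plugin fails etc. which is nice since we can verify that early
    IndexService service = createIndexService(nodeServicesProvider, metaData,
            indicesQueryCache, indicesFieldDataCache, Collections.emptyList());
    closeables.add(() -> service.close("metadata verification", false));
} finally {
    IOUtils.close(closeables);
}
```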

s1monw added a commit to s1monw/elasticsearch that referenced this pull request Mar 21, 2016
In 5.0 we don't allow index settings to be specified on the node level ie.
in yaml files or via commandline argument. This can cause problems during
upgrade if this was used extensively. For instance, if analyzers were
specified on a node level this might cause the index to be closed when
imported (see elastic#17187). In such a case all indices relying on this
must be updated via `PUT /${index}/_settings`. Yet, this API has slightly
different semantics since it overrides existing settings. To make this less
painful this change adds a `preserve_existing` parameter on that API to ensure
we have the same semantics as if the setting was applied on the node level.

This change also adds a better error message and a change to the migration guide
to ensure upgrades are smooth if index settings are specified on the node level.

If an index setting is detected, this change fails the node startup and prints a message
like this:
```
*************************************************************************************
Found index level settings on node level configuration.

Since elasticsearch 5.x index level settings can NOT be set on the nodes
configuration like the elasticsearch.yaml, in system properties or command line
arguments. In order to upgrade all indices the settings must be updated via the
/${index}/_settings API. Unless all settings are dynamic all indices must be closed
in order to apply the upgrade. Indices created in the future should use index templates
to set default values.

Please ensure all required values are updated on all indices by executing:

curl -XPUT 'http://localhost:9200/_all/_settings?preserve_existing=true' -d '{
  "index.number_of_shards" : "1",
  "index.query.default_field" : "main_field",
  "index.translog.durability" : "async",
  "index.ttl.disable_purge" : "true"
}'
*************************************************************************************
```
```java
final Index index = indexMetaData.getIndex();
final Predicate<String> indexNameMatcher = (indexExpression) -> indexNameExpressionResolver.matchesIndex(index.getName(), indexExpression, clusterService.state());
final IndexSettings idxSettings = new IndexSettings(indexMetaData, this.settings, indexNameMatcher, indexScopeSetting);
logger.debug("creating Index [{}], shards [{}]/[{}{}]",
```
Contributor commented:

can we pass a reason to this method and mention it here? I always have to scroll to find out whether this is a "true" index or just one that was created when importing/creating one.

@bleskes (Contributor) commented Mar 21, 2016

I really like the change, but I'm afraid it's not enough, for example, when referring to an analyzer that used to be in the node settings (I tested it). The reason is that the mapper service doesn't instantiate anything until the merge method on it is called. This is imo something we should change (prepare everything in the constructor) but we don't have to do it in this PR. This patch works for me:

```diff
diff --git a/core/src/main/java/org/elasticsearch/indices/IndicesService.java b/core/src/main/java/org/elasticsearch/indices/IndicesService.java
index ca75d30..bbdd693 100644
--- a/core/src/main/java/org/elasticsearch/indices/IndicesService.java
+++ b/core/src/main/java/org/elasticsearch/indices/IndicesService.java
@@ -19,6 +19,7 @@

 package org.elasticsearch.indices;

+import com.carrotsearch.hppc.cursors.ObjectCursor;
 import org.apache.lucene.index.DirectoryReader;
 import org.apache.lucene.store.LockObtainFailedException;
 import org.apache.lucene.util.CollectionUtil;
@@ -34,6 +35,7 @@ import org.elasticsearch.cluster.ClusterService;
 import org.elasticsearch.cluster.ClusterState;
 import org.elasticsearch.cluster.metadata.IndexMetaData;
 import org.elasticsearch.cluster.metadata.IndexNameExpressionResolver;
+import org.elasticsearch.cluster.metadata.MappingMetaData;
 import org.elasticsearch.common.Nullable;
 import org.elasticsearch.common.breaker.CircuitBreaker;
 import org.elasticsearch.common.bytes.BytesReference;
@@ -66,6 +68,7 @@ import org.elasticsearch.index.fielddata.FieldDataType;
 import org.elasticsearch.index.fielddata.IndexFieldDataCache;
 import org.elasticsearch.index.flush.FlushStats;
 import org.elasticsearch.index.get.GetStats;
+import org.elasticsearch.index.mapper.MapperService;
 import org.elasticsearch.index.merge.MergeStats;
 import org.elasticsearch.index.recovery.RecoveryStats;
 import org.elasticsearch.index.refresh.RefreshStats;
@@ -398,6 +401,12 @@ public class IndicesService extends AbstractLifecycleComponent<IndicesService> i
             closeables.add(indicesQueryCache);
             // this will also fail if some plugin fails etc. which is nice since we can verify that early
             IndexService service = createIndexService(nodeServicesProvider, metaData, indicesQueryCache, indicesFieldDataCache, Collections.emptyList());
+            for (ObjectCursor<MappingMetaData> typeMapping : metaData.getMappings().values()) {
+                // don't apply the default mapping, it has been applied when the mapping was created
+                service.mapperService().merge(typeMapping.value.type(), typeMapping.value.source(),
+                        MapperService.MergeReason.MAPPING_RECOVERY, true);
+            }
+
             closeables.add(() -> service.close("metadata verification", false));
         } finally {
             IOUtils.close(closeables);
```

@s1monw (Contributor, PR author) commented Mar 21, 2016

I was going to do that exact same thing, since it would allow us to remove some places where we create an index only for that purpose. I can put this into the patch and add a test. Follow-ups can clean up other places, and we may move stuff into ctors.

@s1monw (Contributor, PR author) commented Mar 21, 2016

@bleskes pushed an update with a new test

```java
final Index index = indexMetaData.getIndex();
final Predicate<String> indexNameMatcher = (indexExpression) -> indexNameExpressionResolver.matchesIndex(index.getName(), indexExpression, clusterService.state());
final IndexSettings idxSettings = new IndexSettings(indexMetaData, this.settings, indexNameMatcher, indexScopeSetting);
logger.debug("creating Index [{}], shards [{}]/[{}{}]",
logger.debug("creating Index [{}], shards [{}]/[{}{}] - reason [{}]",
```
Contributor commented:

thx :)

@bleskes (Contributor) commented Mar 21, 2016

LGTM. Thanks

Today if something is wrong with the IndexMetaData we detect it very
late, and most of the time when that happens we have already allocated the index
and get endless loops and full log files on data-nodes. This change tries
to verify IndexService creation during initial state recovery on the master;
if the recovery fails, the index is imported as `closed` and won't be allocated
at all.

Closes elastic#17187
@s1monw merged commit 8127a06 into elastic:master on Mar 21, 2016
areek added a commit to areek/elasticsearch that referenced this pull request Mar 21, 2016
In elastic#17187, we upgrade index state after upgrading
index folder structure. As we don't have to write
the upgraded state in the old index folder structure,
we can clean up how we write the upgraded index state.
@hadeslion commented:
this verifyIndexMetadata spends too much time if a node has many indices and each index has many types.
It causes a long wait before index recovery when a node restarts.
Can you add a config option to skip this verification?

@bleskes (Contributor) commented Nov 8, 2016

@hadeslion there is no such setting, as this check is important. If it takes that long you will also run into problems elsewhere, as these things should be parsable. Can you give some numbers about how many indices and types you have, and how long it takes? Is this during node startup or later on?

@hadeslion commented:
@bleskes I have 100 indices. Some of them have 400-800 types; those take just 10-20s each. The others have 1200-1600 types each, and those take 1-2 minutes.
I run these indices on a single-node cluster. This happens when I restart the node; according to the logs, it is during the gateway state recovery. During this metadata verification, all API requests return SERVICE_UNAVAILABLE/1/state not recovered.

ywelsch added a commit that referenced this pull request Mar 1, 2019
With #17187, we verified IndexService creation during initial state recovery on the master and, if the
recovery failed, the index was imported as closed, not allocating any shards. This was mainly done to
prevent endless allocation loops and full log files on data-nodes when the IndexMetaData contained
broken settings / analyzers. Zen2 loads the cluster state eagerly, and this check currently runs on all
nodes (not only the elected master), which can significantly slow down startup on data nodes.
Furthermore, with replicated closed indices (#33888) on the horizon, importing the index as closed
will no longer prevent shards from being allocated. Fortunately, the original issue of endless allocation
loops is no longer a problem due to #18467, where we limit the retries of failed allocations. The solution
here is therefore to just undo #17187, as it's no longer necessary and is covered by #18467, which will
solve the issue for Zen2 and replicated closed indices as well.
Labels: :Distributed/Recovery, >enhancement, resiliency, v5.0.0-alpha1