Limit retries of failed allocations per index #18467

Merged
merged 12 commits into elastic:master from s1monw:limit_failed_allocation_retries on May 20, 2016

@s1monw (Contributor) commented May 19, 2016

Today, if a shard fails during the initialization phase due to misconfiguration, broken disks,
missing analyzers, plugins that are not installed, etc., Elasticsearch keeps trying to initialize,
or rather allocate, that shard. In the worst case this ends in an endless allocation loop.
To prevent this loop and all of its side effects, like spamming the log files over and over again,
this commit adds an allocation decider that stops allocating a shard that has failed to allocate
more than N times in a row. The number of retries can be configured via
`index.allocation.max_retry`, and its default is set to `5`. Once the setting is updated,
shards with fewer failures than the number set per index will be allowed to allocate again.

Internally we maintain a counter on the UnassignedInfo that is reset to `0` once the shard
has been started.

Relates to #18417
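For illustration, a minimal sketch of how an operator might raise the per-index limit through the Java admin client so that previously failed shards become eligible for allocation again. The index name `idx`, the `Client` handle, and the new value are illustrative, and the setting key is renamed to `index.allocation.max_retries` later in this PR:

```java
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;

void allowMoreAllocationRetries(Client client) {
    // Raise the per-index retry limit; shards whose failure count is below the
    // new value become eligible for allocation again on the next reroute.
    client.admin().indices().prepareUpdateSettings("idx")
          .setSettings(Settings.builder().put("index.allocation.max_retry", 10).build())
          .get();
}
```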

@s1monw (Contributor, Author) commented May 19, 2016

@ywelsch can you take a look?
@clintongormley what do you think: how should we document this, and where?

this.reason = reason;
this.unassignedTimeMillis = unassignedTimeMillis;
this.unassignedTimeNanos = unassignedTimeNanos;
this.lastComputedLeftDelayNanos = 0L;
this.message = message;
this.failure = failure;
this.failedAllocations = failedAllocations;
assert failedAllocations > 0 && reason == Reason.ALLOCATION_FAILED || failedAllocations == 0 && reason != Reason.ALLOCATION_FAILED:
"failedAllocations: " + 0 + " for reason " + reason;

@dakrone (Member) May 19, 2016
This `0` is hardcoded; I think this was supposed to be `failedAllocations` instead.
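Applied, the assertion would presumably read as follows (a sketch of the suggested fix, printing the actual counter in the message):

```java
assert failedAllocations > 0 && reason == Reason.ALLOCATION_FAILED
        || failedAllocations == 0 && reason != Reason.ALLOCATION_FAILED :
    "failedAllocations: " + failedAllocations + " for reason " + reason;
```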

@@ -325,7 +340,11 @@ public String shortSummary() {
StringBuilder sb = new StringBuilder();
sb.append("[reason=").append(reason).append("]");
sb.append(", at[").append(DATE_TIME_FORMATTER.printer().print(unassignedTimeMillis)).append("]");
if (failedAllocations > 0) {
sb.append(", failed_attemps[").append(failedAllocations).append("]");

@dakrone (Member) May 19, 2016
attemps -> attempts

@@ -342,6 +361,9 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
builder.startObject("unassigned_info");
builder.field("reason", reason);
builder.field("at", DATE_TIME_FORMATTER.printer().print(unassignedTimeMillis));
if (failedAllocations > 0) {
builder.field("failed_attemps", failedAllocations);

@dakrone (Member) May 19, 2016
same here, attemps -> attempts

*/
public class MaxRetryAllocationDecider extends AllocationDecider {

public static final Setting<Integer> SETTING_ALLOCATION_MAX_RETRY = Setting.intSetting("index.allocation.max_retry", 5, 0,

@dakrone (Member) May 19, 2016
Personal preference, but I think `index.allocation.max_retries` would be a better name.

@ywelsch (Contributor) May 19, 2016
I like @dakrone's suggestion here.
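With the rename applied, the setting declaration would presumably read as below; this is a sketch, and the property flags are an assumption since the original line is truncated above:

```java
public static final Setting<Integer> SETTING_ALLOCATION_MAX_RETRY =
    Setting.intSetting("index.allocation.max_retries", 5, 0,
        Setting.Property.Dynamic, Setting.Property.IndexScope);
```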

@dakrone (Member) commented May 19, 2016
I know you didn't ask for my review, but I left some comments regardless :)

int maxRetry = SETTING_ALLOCATION_MAX_RETRY.get(indexSafe.getSettings());
if (unassignedInfo.getNumFailedAllocations() >= maxRetry) {
return allocation.decision(Decision.NO, NAME, "shard has already failed allocating ["
+ unassignedInfo.getNumFailedAllocations() + "] times");

@bleskes (Member) May 19, 2016
Would it be nice to show the last failure here as well? This would help explain how we got here.
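A sketch of what that could look like; a later commit in this PR does append the unassigned info to the allocation decision, though the exact wording here is illustrative:

```java
return allocation.decision(Decision.NO, NAME, "shard has already failed allocating ["
        + unassignedInfo.getNumFailedAllocations() + "] times, last failure: "
        + unassignedInfo.toString());
```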

@s1monw s1monw added the review label May 19, 2016

this.reason = reason;
this.unassignedTimeMillis = unassignedTimeMillis;
this.unassignedTimeNanos = unassignedTimeNanos;
this.lastComputedLeftDelayNanos = 0L;
this.message = message;
this.failure = failure;
this.failedAllocations = failedAllocations;
assert failedAllocations > 0 && reason == Reason.ALLOCATION_FAILED || failedAllocations == 0 && reason != Reason.ALLOCATION_FAILED:

@ywelsch (Contributor) May 19, 2016
just `assert (failedAllocations > 0) == (reason == Reason.ALLOCATION_FAILED)`?
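In other words, the two-sided condition collapses to a single equality check (keeping the message from the original code):

```java
assert (failedAllocations > 0) == (reason == Reason.ALLOCATION_FAILED) :
    "failedAllocations: " + failedAllocations + " for reason " + reason;
```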

* Rename max_retry to max_retries
* Append unassigned info to allocation decision.
}

public UnassignedInfo readFrom(StreamInput in) throws IOException {
return new UnassignedInfo(in);
}

/**
* Returns the number of previously failed allocations of this shard.
*/
public int getNumFailedAllocations() {return failedAllocations;}

@ywelsch (Contributor) May 19, 2016
some newlines are ok here :-)

public Decision canAllocate(ShardRouting shardRouting, RoutingAllocation allocation) {
UnassignedInfo unassignedInfo = shardRouting.unassignedInfo();
if (unassignedInfo != null && unassignedInfo.getNumFailedAllocations() > 0) {
IndexMetaData indexSafe = allocation.metaData().getIndexSafe(shardRouting.index());

@ywelsch (Contributor) May 19, 2016
just call this variable `indexMetaData`?
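Putting the two decider fragments in this review together, with the suggested rename applied, the check reads roughly as follows. This is a sketch assembled from the snippets above, not the exact merged code, and the YES-branch message is an assumption:

```java
@Override
public Decision canAllocate(ShardRouting shardRouting, RoutingAllocation allocation) {
    UnassignedInfo unassignedInfo = shardRouting.unassignedInfo();
    if (unassignedInfo != null && unassignedInfo.getNumFailedAllocations() > 0) {
        IndexMetaData indexMetaData = allocation.metaData().getIndexSafe(shardRouting.index());
        int maxRetry = SETTING_ALLOCATION_MAX_RETRY.get(indexMetaData.getSettings());
        if (unassignedInfo.getNumFailedAllocations() >= maxRetry) {
            // too many consecutive failures: stop retrying this shard
            return allocation.decision(Decision.NO, NAME, "shard has already failed allocating ["
                    + unassignedInfo.getNumFailedAllocations() + "] times");
        }
    }
    return allocation.decision(Decision.YES, NAME, "shard has no recent allocation failures");
}
```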

@bleskes (Member) commented May 19, 2016

LGTM. Thx @s1monw

assertEquals(routingTable.index("idx").shards().size(), 1);
assertEquals(routingTable.index("idx").shard(0).shards().get(0).state(), INITIALIZING);
// now fail it 4 times - 5 retries is default
for (int i = 0; i < 4; i++) {

@ywelsch (Contributor) May 19, 2016
You could parameterize the test on the number of retries. Alternatively, I would suggest using the setting `SETTING_ALLOCATION_MAX_RETRY.get(settings)` explicitly here instead of the hardcoded 4.
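A sketch of that suggestion, deriving the retry count from the setting instead of hardcoding 4; reading the default via `Settings.EMPTY` is an assumption about how the test obtains its settings:

```java
int maxRetries = MaxRetryAllocationDecider.SETTING_ALLOCATION_MAX_RETRY.get(Settings.EMPTY);
// fail the shard maxRetries - 1 times; each failure count stays below the limit,
// so the shard must still be allowed to initialize again after every failure
for (int i = 0; i < maxRetries - 1; i++) {
    // ... fail the initializing shard and reroute ...
}
```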

@ywelsch (Contributor) commented May 19, 2016
Left minor comments, but LGTM otherwise.
For the docs, I wonder if we should add some words on how a sysadmin is supposed to get this shard assigned again after fixing the issue? Closing the index and reopening it will work. Is that what we would recommend?

s1monw added some commits May 20, 2016

@clintongormley (Member) commented May 20, 2016

@ywelsch Retrying allocation could be triggered by raising the value of `index.allocation.max_retry`, but my preference would be to have it obey the same override flag that is being added in #18321. That makes it more consistent.

@s1monw (Contributor, Author) commented May 20, 2016

@clintongormley @ywelsch @bleskes I pushed a new commit that adds a `retry_failed` flag to the reroute API. This is also explained in the allocation explain output and in the documentation. I think we are ready here, but please take another look.
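For example, a retry round could then be triggered from the Java admin client roughly like this. The builder method name `setRetryFailed` is inferred from the Javadoc shown in the diffs below and is an assumption, not a quote of the merged code:

```java
import org.elasticsearch.client.Client;

void retryFailedAllocations(Client client) {
    // Ask the master for a single retry round for shards that were blocked
    // because they hit the per-index allocation retry limit.
    client.admin().cluster().prepareReroute()
          .setRetryFailed(true)
          .get();
}
```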

@@ -82,13 +83,30 @@ public ClusterRerouteRequest explain(boolean explain) {
}

/**
* Sets the retry failed flag (defaults to <tt>false</tt>). If true, the
* request will retry allocating shards that are currently can't be allocated due to too many allocation failures.

@ywelsch (Contributor) May 20, 2016
s/that are currently can't be allocated/that can't currently be allocated/

@@ -61,6 +61,15 @@ public ClusterRerouteRequestBuilder setExplain(boolean explain) {
}

/**
* Sets the retry failed flag (defaults to <tt>false</tt>). If true, the
* request will retry allocating shards that are currently can't be allocated due to too many allocation failures.

@ywelsch (Contributor) May 20, 2016
same as above

@@ -91,7 +91,7 @@ public void onFailure(String source, Throwable t) {

@Override
public ClusterState execute(ClusterState currentState) {
RoutingAllocation.Result routingResult = allocationService.reroute(currentState, request.commands, request.explain());
RoutingAllocation.Result routingResult = allocationService.reroute(currentState, request.commands, request.explain(), true);

@ywelsch (Contributor) May 20, 2016
`request.isRetryFailed()` as the last parameter?

@s1monw (Author, Contributor) May 20, 2016
oh good call

@ywelsch (Contributor) commented May 20, 2016

Found a small issue. LGTM after fixing this. Thanks @s1monw!

@@ -103,3 +103,15 @@ are available:
To ensure that these implications are well-understood,
this command requires the special field `accept_data_loss` to be
explicitly set to `true` for it to work.

@clintongormley (Member) May 20, 2016
add `[float]` before the header


=== Retry failed shards

In the case when the allocation of a shard failed multiple times without a

@clintongormley (Member) May 20, 2016, suggesting:
The cluster will attempt to allocate a shard a maximum of
`index.allocation.max_retries` times in a row (defaults to `5`), before giving
up and leaving the shard unallocated. This scenario can be caused by
structural problems such as having an analyzer which refers to a stopwords
file which doesn't exist on all nodes.

Once the problem has been corrected, allocation can be manually retried by
calling the <<cluster-reroute,`_reroute`>> API with `?retry_failed`, which
will attempt a single retry round for these shards.
@bleskes (Member) commented May 20, 2016

LGTM2. I wonder how we can REST-test this. It's tricky, and I'm not sure it's worth it to have a simple call with `true` to `retry_failed`. I'm good with pushing as is, unless someone has a good idea.

s1monw added some commits May 20, 2016

@s1monw (Contributor, Author) commented May 20, 2016

@bleskes I thought about it and I think the major problem is waiting for the cluster state. I think we should rather try to unit-test this, so I added a unit test for serialization (which found a bug) and for the master side of things on the reroute command. I think we are ready; will push soon.
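A sketch of the kind of serialization round-trip test mentioned here; the stream helpers and the `setRetryFailed`/`isRetryFailed` accessors are assumptions based on the discussion above, not the merged test:

```java
public void testRetryFailedFlagRoundTrip() throws IOException {
    ClusterRerouteRequest original = new ClusterRerouteRequest();
    original.setRetryFailed(true);

    // serialize the request and read it back, then check the flag survived
    BytesStreamOutput out = new BytesStreamOutput();
    original.writeTo(out);

    ClusterRerouteRequest read = new ClusterRerouteRequest();
    read.readFrom(StreamInput.wrap(out.bytes()));
    assertEquals(original.isRetryFailed(), read.isRetryFailed());
}
```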

@s1monw s1monw merged commit 35e7058 into elastic:master May 20, 2016

1 check passed: CLA (commit author is a member of Elasticsearch)

@s1monw s1monw deleted the s1monw:limit_failed_allocation_retries branch May 20, 2016

jasontedor added a commit that referenced this pull request May 22, 2016

Merge branch 'master' into feature/seq_no
* master: (158 commits)
  Document the hack
  Refactor property placeholder use of env. vars
  Force java9 log4j hack in testing
  Fix log4j buggy java version detection
  Make java9 work again
  Don't mkdir directly in deb init script
  Fix env. var placeholder test so it's reproducible
  Remove ScriptMode class in favor of boolean true/false
  [rest api spec] fix doc urls
  Netty request/response tracer should wait for send
  Filter client/server VM options from jvm.options
  [rest api spec] fix url for reindex api docs
  Remove use of a Fields class in snapshot responses that contains x-content keys, in favor of declaring/using the keys directly.
  Limit retries of failed allocations per index (#18467)
  Proxy box method to use valueOf.
  Use the build-in valueOf method instead of the custom one.
  Fixed tests and added a comment to the box method.
  Fix boxing.
  Do not decode path when sending error
  Fix race condition in snapshot initialization
  ...

ywelsch added a commit that referenced this pull request Mar 1, 2019

Do not close bad indices on startup (#39500)
With #17187, we verified IndexService creation during initial state recovery on the master, and if the
recovery failed the index was imported as closed, not allocating any shards. This was mainly done to
prevent endless allocation loops and full log files on data nodes when the index metadata contained
broken settings / analyzers. Zen2 loads the cluster state eagerly, and this check currently runs on all
nodes (not only the elected master), which can significantly slow down startup on data nodes.
Furthermore, with replicated closed indices (#33888) on the horizon, importing the index as closed
will no longer prevent its shards from being allocated. Fortunately, the original issue of endless
allocation loops is no longer a problem thanks to #18467, where we limit the retries of failed
allocations. The solution here is therefore to just undo #17187, as it's no longer necessary and is
covered by #18467, which will solve the issue for Zen2 and replicated closed indices as well.
