
Limit retries of failed allocations per index #18467

Merged: 12 commits merged into elastic:master from limit_failed_allocation_retries on May 20, 2016

Conversation

@s1monw (Contributor) commented May 19, 2016

Today if a shard fails during the initialization phase due to misconfiguration, broken disks,
missing analyzers, plugins that are not installed, etc., Elasticsearch keeps trying to initialize,
or rather allocate, that shard. In the worst-case scenario this ends in an endless
allocation loop. To prevent this loop and all its side effects, like spamming the log files over
and over again, this commit adds an allocation decider that stops allocating a shard that
has failed to allocate more than N times in a row. The number of retries can be configured via
`index.allocation.max_retry`, and its default is set to `5`. Once the setting is updated,
shards with fewer failures than the configured number will be allowed to allocate again.

Internally we maintain a counter on the UnassignedInfo that is reset to `0` once the shard
has been started.

Relates to #18417
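The decider check described above can be pieced together from the hunks quoted later in this conversation. A minimal sketch, not the exact committed code (the NAME constant and method signatures come from those hunks; the wording of the YES branch is an assumption):

    public Decision canAllocate(ShardRouting shardRouting, RoutingAllocation allocation) {
        UnassignedInfo unassignedInfo = shardRouting.unassignedInfo();
        if (unassignedInfo != null && unassignedInfo.getNumFailedAllocations() > 0) {
            // look up the per-index retry limit (index.allocation.max_retry, default 5)
            IndexMetaData indexMetaData = allocation.metaData().getIndexSafe(shardRouting.index());
            int maxRetry = SETTING_ALLOCATION_MAX_RETRY.get(indexMetaData.getSettings());
            if (unassignedInfo.getNumFailedAllocations() >= maxRetry) {
                // too many consecutive failures: refuse to allocate this shard again
                return allocation.decision(Decision.NO, NAME, "shard has already failed allocating ["
                    + unassignedInfo.getNumFailedAllocations() + "] times");
            }
        }
        // no failures yet, or still below the retry limit: allocation may proceed
        return allocation.decision(Decision.YES, NAME, "shard is below the failed allocation retry limit");
    }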

@s1monw (Contributor, Author) commented May 19, 2016

@ywelsch can you take a look?
@clintongormley what do you think: how should we document this, and where?

this.reason = reason;
this.unassignedTimeMillis = unassignedTimeMillis;
this.unassignedTimeNanos = unassignedTimeNanos;
this.lastComputedLeftDelayNanos = 0L;
this.message = message;
this.failure = failure;
this.failedAllocations = failedAllocations;
assert failedAllocations > 0 && reason == Reason.ALLOCATION_FAILED || failedAllocations == 0 && reason != Reason.ALLOCATION_FAILED:
"failedAllocations: " + 0 + " for reason " + reason;

Review comment (Member):
This `0` is hardcoded; I think it was supposed to be `failedAllocations` instead.
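Applying that suggestion, the assertion would read (a sketch of the fix, not the committed code):

    assert failedAllocations > 0 && reason == Reason.ALLOCATION_FAILED || failedAllocations == 0 && reason != Reason.ALLOCATION_FAILED :
        "failedAllocations: " + failedAllocations + " for reason " + reason;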

@dakrone (Member) commented May 19, 2016

I know you didn't ask for my review, but I left some comments regardless :)

int maxRetry = SETTING_ALLOCATION_MAX_RETRY.get(indexSafe.getSettings());
if (unassignedInfo.getNumFailedAllocations() >= maxRetry) {
return allocation.decision(Decision.NO, NAME, "shard has already failed allocating ["
+ unassignedInfo.getNumFailedAllocations() + "] times");

Review comment (Contributor):
Would it be nice to show the last failure here as well? That would help explain how we got here.
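One way to do that, assuming UnassignedInfo exposes the failure message via a getMessage() accessor, would be along these lines; the follow-up commit "append unassigned info to allocation decision" later in this PR addresses it:

    return allocation.decision(Decision.NO, NAME, "shard has already failed allocating ["
        + unassignedInfo.getNumFailedAllocations() + "] times, last failure: [" + unassignedInfo.getMessage() + "]");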

s1monw added the review label May 19, 2016
this.reason = reason;
this.unassignedTimeMillis = unassignedTimeMillis;
this.unassignedTimeNanos = unassignedTimeNanos;
this.lastComputedLeftDelayNanos = 0L;
this.message = message;
this.failure = failure;
this.failedAllocations = failedAllocations;
assert failedAllocations > 0 && reason == Reason.ALLOCATION_FAILED || failedAllocations == 0 && reason != Reason.ALLOCATION_FAILED:

Review comment (Contributor):
just `assert (failedAllocations > 0) == (reason == Reason.ALLOCATION_FAILED)`?

* append unassigned info to allocation decision.
}

public UnassignedInfo readFrom(StreamInput in) throws IOException {
return new UnassignedInfo(in);
}

/**
* Returns the number of previously failed allocations of this shard.
*/
public int getNumFailedAllocations() {return failedAllocations;}

Review comment (Contributor):
some newlines are ok here :-)
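That is, the quoted one-liner would become:

    /**
     * Returns the number of previously failed allocations of this shard.
     */
    public int getNumFailedAllocations() {
        return failedAllocations;
    }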

public Decision canAllocate(ShardRouting shardRouting, RoutingAllocation allocation) {
UnassignedInfo unassignedInfo = shardRouting.unassignedInfo();
if (unassignedInfo != null && unassignedInfo.getNumFailedAllocations() > 0) {
IndexMetaData indexSafe = allocation.metaData().getIndexSafe(shardRouting.index());

Review comment (Contributor):
just call this variable indexMetaData?

@bleskes (Contributor) commented May 19, 2016

LGTM. Thx @s1monw

assertEquals(routingTable.index("idx").shards().size(), 1);
assertEquals(routingTable.index("idx").shard(0).shards().get(0).state(), INITIALIZING);
// now fail it 4 times - 5 retries is default
for (int i = 0; i < 4; i++) {

Review comment (Contributor):
You could parameterize the test on the number of retries. Alternatively, I would suggest using SETTING_ALLOCATION_MAX_RETRY.get(settings) explicitly here instead of the hardcoded 4, as sketched below.
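For example (a sketch, assuming the test keeps its index settings in a settings variable):

    int maxRetries = SETTING_ALLOCATION_MAX_RETRY.get(settings);
    // fail the shard one time fewer than the limit, so the decider still allows allocation
    for (int i = 0; i < maxRetries - 1; i++) {
        // ... fail the shard and reroute ...
    }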

@ywelsch (Contributor) commented May 19, 2016

Left minor comments, but LGTM otherwise.
For the docs, I wonder if we should add some words on how a sysadmin is supposed to get this shard assigned again after fixing the underlying issue. Closing the index and reopening it will work; is that what we would recommend?

@clintongormley

@ywelsch Retrying allocation could be triggered by raising the value of index.allocation.max_retry, but my preference would be to have it obey the same override flag that is being added in #18321. That makes it more consistent.
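For illustration, raising the per-index limit could look roughly like this from the Java client (a sketch; client and the index name "idx" are placeholders, and the new value 10 is arbitrary):

    client.admin().indices().prepareUpdateSettings("idx")
        .setSettings(Settings.builder().put("index.allocation.max_retry", 10).build())
        .get();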

@s1monw (Contributor, Author) commented May 20, 2016

@clintongormley @ywelsch @bleskes I pushed a new commit that adds a retry_failed flag to the reroute API. This is also explained in the allocation explain output and in the documentation. I think we are ready here, but please take another look.
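For example, assuming the request builder exposes the new flag through a setRetryFailed setter, retrying the failed allocations could look like this from the Java client:

    client.admin().cluster().prepareReroute()
        .setRetryFailed(true)
        .get();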

@@ -82,13 +83,30 @@ public ClusterRerouteRequest explain(boolean explain) {
}

/**
* Sets the retry failed flag (defaults to <tt>false</tt>). If true, the
* request will retry allocating shards that are currently can't be allocated due to too many allocation failures.

Review comment (Contributor):
s/that are currently can't be allocated/that can't currently be allocated/

@ywelsch (Contributor) commented May 20, 2016

Found a small issue. LGTM after fixing this. Thanks @s1monw!

@@ -103,3 +103,15 @@ are available:
To ensure that these implications are well-understood,
this command requires the special field `accept_data_loss` to be
explicitly set to `true` for it to work.

Review comment:

add [float] before the header

@bleskes (Contributor) commented May 20, 2016

LGTM2. I wonder how we can REST-test this. It's tricky, and I'm not sure it's worth it to have a simple call that sets retry_failed to true. I'm good with pushing as is, unless someone has a good idea.

@s1monw (Contributor, Author) commented May 20, 2016

@bleskes I thought about it, and I think the major problem is having to wait for cluster state. I think we should rather unit-test this, so I added a unit test for serialization (which found a bug) and for the master side of things on the reroute command. I think we are ready; will push soon.

s1monw merged commit 35e7058 into elastic:master May 20, 2016
s1monw deleted the limit_failed_allocation_retries branch May 20, 2016 18:37
jasontedor added a commit that referenced this pull request May 22, 2016
* master: (158 commits)
  Document the hack
  Refactor property placeholder use of env. vars
  Force java9 log4j hack in testing
  Fix log4j buggy java version detection
  Make java9 work again
  Don't mkdir directly in deb init script
  Fix env. var placeholder test so it's reproducible
  Remove ScriptMode class in favor of boolean true/false
  [rest api spec] fix doc urls
  Netty request/response tracer should wait for send
  Filter client/server VM options from jvm.options
  [rest api spec] fix url for reindex api docs
  Remove use of a Fields class in snapshot responses that contains x-content keys, in favor of declaring/using the keys directly.
  Limit retries of failed allocations per index (#18467)
  Proxy box method to use valueOf.
  Use the build-in valueOf method instead of the custom one.
  Fixed tests and added a comment to the box method.
  Fix boxing.
  Do not decode path when sending error
  Fix race condition in snapshot initialization
  ...
lcawl added the :Distributed/Distributed label and removed the :Allocation label Feb 13, 2018
ywelsch added a commit that referenced this pull request Mar 1, 2019
With #17187, we verified IndexService creation during initial state recovery on the master, and if the
recovery failed the index was imported as closed, not allocating any shards. This was mainly done to
prevent endless allocation loops and full log files on data nodes when the index metadata contained
broken settings / analyzers. Zen2 loads the cluster state eagerly, and this check currently runs on all
nodes (not only the elected master), which can significantly slow down startup on data nodes.
Furthermore, with replicated closed indices (#33888) on the horizon, importing the index as closed
will no longer prevent shards from being allocated. Fortunately, the original issue of endless allocation
loops is no longer a problem thanks to #18467, where we limit the retries of failed allocations. The
solution here is therefore to just undo #17187, as it is no longer necessary and is covered by #18467,
which also solves the issue for Zen2 and replicated closed indices.
ywelsch added a commit that referenced this pull request Mar 1, 2019
ywelsch added a commit that referenced this pull request Mar 1, 2019
Labels
:Distributed/Distributed, >enhancement, resiliency, v5.0.0-alpha3