Limit retries of failed allocations per index #18467

Merged
merged 12 commits into elastic:master from s1monw:limit_failed_allocation_retries on May 20, 2016

@s1monw (Contributor) commented May 19, 2016

Today, if a shard fails during the initialization phase due to misconfiguration, broken disks,
missing analyzers, plugins that are not installed, etc., Elasticsearch keeps trying to initialize,
or rather allocate, that shard. In the worst case this ends in an endless allocation loop.
To prevent this loop and all of its side effects, like spamming the log files over and over again,
this commit adds an allocation decider that stops allocating a shard that has failed to allocate
more than N times in a row. The number of retries can be configured via
`index.allocation.max_retry`, and its default is set to `5`. Once the setting is updated,
shards with fewer failures than the number set per index will be allowed to allocate again.

Internally we maintain a counter on the UnassignedInfo that is reset to `0` once the shard
has been started.

Relates to #18417
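For illustration, a minimal sketch of how an operator might raise the per-index limit through the Java admin client so that previously failed shards become eligible for allocation again. The index name `idx`, the `Client` handle, and the new value are illustrative, and the setting key is renamed to `index.allocation.max_retries` later in this PR:

```java
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;

void allowMoreAllocationRetries(Client client) {
    // Raise the per-index retry limit; shards whose failure count is below the
    // new value become eligible for allocation again on the next reroute.
    client.admin().indices().prepareUpdateSettings("idx")
          .setSettings(Settings.builder().put("index.allocation.max_retry", 10).build())
          .get();
}
```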

@s1monw (Contributor, Author) commented May 19, 2016

@ywelsch can you take a look?
@clintongormley what do you think: how should we document this, and where?

this.reason = reason;
this.unassignedTimeMillis = unassignedTimeMillis;
this.unassignedTimeNanos = unassignedTimeNanos;
this.lastComputedLeftDelayNanos = 0L;
this.message = message;
this.failure = failure;
this.failedAllocations = failedAllocations;
assert failedAllocations > 0 && reason == Reason.ALLOCATION_FAILED || failedAllocations == 0 && reason != Reason.ALLOCATION_FAILED:
"failedAllocations: " + 0 + " for reason " + reason;

@dakrone (Member) May 19, 2016
This `0` is hardcoded; I think this was supposed to be `failedAllocations` instead.
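Applied, the assertion would presumably read as follows (a sketch of the suggested fix, printing the actual counter in the message):

```java
assert failedAllocations > 0 && reason == Reason.ALLOCATION_FAILED
        || failedAllocations == 0 && reason != Reason.ALLOCATION_FAILED :
    "failedAllocations: " + failedAllocations + " for reason " + reason;
```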

@@ -325,7 +340,11 @@ public String shortSummary() {
StringBuilder sb = new StringBuilder();
sb.append("[reason=").append(reason).append("]");
sb.append(", at[").append(DATE_TIME_FORMATTER.printer().print(unassignedTimeMillis)).append("]");
if (failedAllocations > 0) {
sb.append(", failed_attemps[").append(failedAllocations).append("]");

@dakrone (Member) May 19, 2016
attemps -> attempts

@@ -342,6 +361,9 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
builder.startObject("unassigned_info");
builder.field("reason", reason);
builder.field("at", DATE_TIME_FORMATTER.printer().print(unassignedTimeMillis));
if (failedAllocations > 0) {
builder.field("failed_attemps", failedAllocations);

@dakrone (Member) May 19, 2016
same here, attemps -> attempts

*/
public class MaxRetryAllocationDecider extends AllocationDecider {

public static final Setting<Integer> SETTING_ALLOCATION_MAX_RETRY = Setting.intSetting("index.allocation.max_retry", 5, 0,

@dakrone (Member) May 19, 2016
Personal preference, but I think `index.allocation.max_retries` would be a better name.

@ywelsch (Contributor) May 19, 2016
I like @dakrone's suggestion here.
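With the rename applied, the setting declaration would presumably read as below; this is a sketch, and the property flags are an assumption since the original line is truncated above:

```java
public static final Setting<Integer> SETTING_ALLOCATION_MAX_RETRY =
    Setting.intSetting("index.allocation.max_retries", 5, 0,
        Setting.Property.Dynamic, Setting.Property.IndexScope);
```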

@dakrone (Member) commented May 19, 2016
I know you didn't ask for my review, but I left some comments regardless :)

int maxRetry = SETTING_ALLOCATION_MAX_RETRY.get(indexSafe.getSettings());
if (unassignedInfo.getNumFailedAllocations() >= maxRetry) {
return allocation.decision(Decision.NO, NAME, "shard has already failed allocating ["
+ unassignedInfo.getNumFailedAllocations() + "] times");

@bleskes (Member) May 19, 2016
Would it be nice to show the last failure here as well? This would help explain how we got here.
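A sketch of what that could look like; a later commit in this PR does append the unassigned info to the allocation decision, though the exact wording here is illustrative:

```java
return allocation.decision(Decision.NO, NAME, "shard has already failed allocating ["
        + unassignedInfo.getNumFailedAllocations() + "] times, last failure: "
        + unassignedInfo.toString());
```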

@s1monw s1monw added the review label May 19, 2016

this.reason = reason;
this.unassignedTimeMillis = unassignedTimeMillis;
this.unassignedTimeNanos = unassignedTimeNanos;
this.lastComputedLeftDelayNanos = 0L;
this.message = message;
this.failure = failure;
this.failedAllocations = failedAllocations;
assert failedAllocations > 0 && reason == Reason.ALLOCATION_FAILED || failedAllocations == 0 && reason != Reason.ALLOCATION_FAILED:

@ywelsch (Contributor) May 19, 2016
just `assert (failedAllocations > 0) == (reason == Reason.ALLOCATION_FAILED)`?
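In other words, the two-sided condition collapses to a single equality check (keeping the message from the original code):

```java
assert (failedAllocations > 0) == (reason == Reason.ALLOCATION_FAILED) :
    "failedAllocations: " + failedAllocations + " for reason " + reason;
```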

* Rename max_retry to max_retries
* Append unassigned info to allocation decision.
}

public UnassignedInfo readFrom(StreamInput in) throws IOException {
return new UnassignedInfo(in);
}

/**
* Returns the number of previously failed allocations of this shard.
*/
public int getNumFailedAllocations() {return failedAllocations;}

@ywelsch (Contributor) May 19, 2016
some newlines are ok here :-)

public Decision canAllocate(ShardRouting shardRouting, RoutingAllocation allocation) {
UnassignedInfo unassignedInfo = shardRouting.unassignedInfo();
if (unassignedInfo != null && unassignedInfo.getNumFailedAllocations() > 0) {
IndexMetaData indexSafe = allocation.metaData().getIndexSafe(shardRouting.index());

@ywelsch (Contributor) May 19, 2016
just call this variable `indexMetaData`?
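Putting the two decider fragments in this review together, with the suggested rename applied, the check reads roughly as follows. This is a sketch assembled from the snippets above, not the exact merged code, and the YES-branch message is an assumption:

```java
@Override
public Decision canAllocate(ShardRouting shardRouting, RoutingAllocation allocation) {
    UnassignedInfo unassignedInfo = shardRouting.unassignedInfo();
    if (unassignedInfo != null && unassignedInfo.getNumFailedAllocations() > 0) {
        IndexMetaData indexMetaData = allocation.metaData().getIndexSafe(shardRouting.index());
        int maxRetry = SETTING_ALLOCATION_MAX_RETRY.get(indexMetaData.getSettings());
        if (unassignedInfo.getNumFailedAllocations() >= maxRetry) {
            // too many consecutive failures: stop retrying this shard
            return allocation.decision(Decision.NO, NAME, "shard has already failed allocating ["
                    + unassignedInfo.getNumFailedAllocations() + "] times");
        }
    }
    return allocation.decision(Decision.YES, NAME, "shard has no recent allocation failures");
}
```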

@bleskes (Member) commented May 19, 2016

LGTM. Thx @s1monw

assertEquals(routingTable.index("idx").shards().size(), 1);
assertEquals(routingTable.index("idx").shard(0).shards().get(0).state(), INITIALIZING);
// now fail it 4 times - 5 retries is default
for (int i = 0; i < 4; i++) {

@ywelsch (Contributor) May 19, 2016
You could parameterize the test on the number of retries. Alternatively, I would suggest using the setting `SETTING_ALLOCATION_MAX_RETRY.get(settings)` explicitly here instead of the hardcoded 4.
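A sketch of that suggestion, deriving the retry count from the setting instead of hardcoding 4; reading the default via `Settings.EMPTY` is an assumption about how the test obtains its settings:

```java
int maxRetries = MaxRetryAllocationDecider.SETTING_ALLOCATION_MAX_RETRY.get(Settings.EMPTY);
// fail the shard maxRetries - 1 times; each failure count stays below the limit,
// so the shard must still be allowed to initialize again after every failure
for (int i = 0; i < maxRetries - 1; i++) {
    // ... fail the initializing shard and reroute ...
}
```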

@ywelsch (Contributor) commented May 19, 2016
Left minor comments, but LGTM otherwise.
For the docs, I wonder if we should add some words on how a sysadmin is supposed to get this shard assigned again after fixing the issue? Closing the index and reopening it will work. Is that what we would recommend?

s1monw added some commits May 20, 2016

@clintongormley (Member) commented May 20, 2016

@ywelsch Retrying allocation could be triggered by raising the value of `index.allocation.max_retry`, but my preference would be to have it obey the same override flag that is being added in #18321. That makes it more consistent.

@s1monw (Contributor, Author) commented May 20, 2016

@clintongormley @ywelsch @bleskes I pushed a new commit that adds a `retry_failed` flag to the reroute API. This is also explained in the allocation explain output and in the documentation. I think we are ready here, but please take another look.
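For example, a retry round could then be triggered from the Java admin client roughly like this. The builder method name `setRetryFailed` is inferred from the Javadoc shown in the diffs below and is an assumption, not a quote of the merged code:

```java
import org.elasticsearch.client.Client;

void retryFailedAllocations(Client client) {
    // Ask the master for a single retry round for shards that were blocked
    // because they hit the per-index allocation retry limit.
    client.admin().cluster().prepareReroute()
          .setRetryFailed(true)
          .get();
}
```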

@@ -82,13 +83,30 @@ public ClusterRerouteRequest explain(boolean explain) {
}

/**
* Sets the retry failed flag (defaults to <tt>false</tt>). If true, the
* request will retry allocating shards that are currently can't be allocated due to too many allocation failures.

@ywelsch (Contributor) May 20, 2016
s/that are currently can't be allocated/that can't currently be allocated/

@@ -61,6 +61,15 @@ public ClusterRerouteRequestBuilder setExplain(boolean explain) {
}

/**
* Sets the retry failed flag (defaults to <tt>false</tt>). If true, the
* request will retry allocating shards that are currently can't be allocated due to too many allocation failures.

@ywelsch (Contributor) May 20, 2016
same as above

@@ -91,7 +91,7 @@ public void onFailure(String source, Throwable t) {

@Override
public ClusterState execute(ClusterState currentState) {
RoutingAllocation.Result routingResult = allocationService.reroute(currentState, request.commands, request.explain());
RoutingAllocation.Result routingResult = allocationService.reroute(currentState, request.commands, request.explain(), true);

@ywelsch (Contributor) May 20, 2016
`request.isRetryFailed()` as the last parameter?

@s1monw (Author, Contributor) May 20, 2016
oh good call

@ywelsch (Contributor) commented May 20, 2016

Found a small issue. LGTM after fixing this. Thanks @s1monw!

@@ -103,3 +103,15 @@ are available:
To ensure that these implications are well-understood,
this command requires the special field `accept_data_loss` to be
explicitly set to `true` for it to work.

@clintongormley (Member) May 20, 2016
add `[float]` before the header


=== Retry failed shards

In the case when the allocation of a shard failed multiple times without a

@clintongormley (Member) May 20, 2016, suggesting:
The cluster will attempt to allocate a shard a maximum of
`index.allocation.max_retries` times in a row (defaults to `5`), before giving
up and leaving the shard unallocated. This scenario can be caused by
structural problems such as having an analyzer which refers to a stopwords
file which doesn't exist on all nodes.

Once the problem has been corrected, allocation can be manually retried by
calling the <<cluster-reroute,`_reroute`>> API with `?retry_failed`, which
will attempt a single retry round for these shards.
@bleskes (Member) commented May 20, 2016

LGTM2. I wonder how we can REST-test this. It's tricky, and I'm not sure it's worth it to have a simple call with `true` to `retry_failed`. I'm good with pushing as is, unless someone has a good idea.

s1monw added some commits May 20, 2016

@s1monw (Contributor, Author) commented May 20, 2016

@bleskes I thought about it and I think the major problem is waiting for the cluster state. I think we should rather try to unit-test this, so I added a unit test for serialization (which found a bug) and for the master side of things on the reroute command. I think we are ready; will push soon.
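A sketch of the kind of serialization round-trip test mentioned here; the stream helpers and the `setRetryFailed`/`isRetryFailed` accessors are assumptions based on the discussion above, not the merged test:

```java
public void testRetryFailedFlagRoundTrip() throws IOException {
    ClusterRerouteRequest original = new ClusterRerouteRequest();
    original.setRetryFailed(true);

    // serialize the request and read it back, then check the flag survived
    BytesStreamOutput out = new BytesStreamOutput();
    original.writeTo(out);

    ClusterRerouteRequest read = new ClusterRerouteRequest();
    read.readFrom(StreamInput.wrap(out.bytes()));
    assertEquals(original.isRetryFailed(), read.isRetryFailed());
}
```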

@s1monw s1monw merged commit 35e7058 into elastic:master May 20, 2016

1 check passed: CLA (commit author is a member of Elasticsearch)

@s1monw s1monw deleted the s1monw:limit_failed_allocation_retries branch May 20, 2016

jasontedor added a commit that referenced this pull request May 22, 2016

Merge branch 'master' into feature/seq_no
* master: (158 commits)
  Document the hack
  Refactor property placeholder use of env. vars
  Force java9 log4j hack in testing
  Fix log4j buggy java version detection
  Make java9 work again
  Don't mkdir directly in deb init script
  Fix env. var placeholder test so it's reproducible
  Remove ScriptMode class in favor of boolean true/false
  [rest api spec] fix doc urls
  Netty request/response tracer should wait for send
  Filter client/server VM options from jvm.options
  [rest api spec] fix url for reindex api docs
  Remove use of a Fields class in snapshot responses that contains x-content keys, in favor of declaring/using the keys directly.
  Limit retries of failed allocations per index (#18467)
  Proxy box method to use valueOf.
  Use the build-in valueOf method instead of the custom one.
  Fixed tests and added a comment to the box method.
  Fix boxing.
  Do not decode path when sending error
  Fix race condition in snapshot initialization
  ...

ywelsch added a commit that referenced this pull request Mar 1, 2019

Do not close bad indices on startup (#39500)
With #17187, we verified IndexService creation during initial state recovery on the master, and if the
recovery failed the index was imported as closed, not allocating any shards. This was mainly done to
prevent endless allocation loops and full log files on data nodes when the index metadata contained
broken settings / analyzers. Zen2 loads the cluster state eagerly, and this check currently runs on all
nodes (not only the elected master), which can significantly slow down startup on data nodes.
Furthermore, with replicated closed indices (#33888) on the horizon, importing the index as closed
will no longer prevent its shards from being allocated. Fortunately, the original issue of endless
allocation loops is no longer a problem thanks to #18467, where we limit the retries of failed
allocations. The solution here is therefore to just undo #17187, as it's no longer necessary and is
covered by #18467, which will solve the issue for Zen2 and replicated closed indices as well.
