Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix messaging about delayed allocation #12515

Closed
wants to merge 3 commits into from

Conversation

Projects
None yet
5 participants
@dakrone
Copy link
Member

dakrone commented Jul 28, 2015

Previously when RoutingService checks for delayed shards, it can see
shards that are delayed, but are past their delay time so the logged
output looks like:

delaying allocation for [0] unassigned shards, next check in [0s]

This change allows shards that have passed their delay to be counted
correctly for the logging. Additionally, it places a 5 second minimum
delay between scheduled reroutes to try to minimize the number of
reroutes run.

This also adds a test that creates a large number of unassigned delayed
shards and ensures that they are rerouted even if a single reroute does
not allocated all shards (due to a low concurrent_recoveries setting).

Resolves #12456

(This PR is against 1.7 and will be forward-ported)

Fix messaging about delayed allocation
Previously when RoutingService checks for delayed shards, it can see
shards that are delayed, but are past their delay time so the logged
output looks like:

```
delaying allocation for [0] unassigned shards, next check in [0s]
```

This change allows shards that have passed their delay to be counted
correctly for the logging. Additionally, it places a 5 second minimum
delay between scheduled reroutes to try to minimize the number of
reroutes run.

This also adds a test that creates a large number of unassigned delayed
shards and ensures that they are rerouted even if a single reroute does
not allocated all shards (due to a low concurrent_recoveries setting).

Resolves #12456
TimeValue nextDelay = TimeValue.timeValueMillis(UnassignedInfo.findNextDelayedAllocationIn(settings, event.state()));
long nextDelayMillis = UnassignedInfo.findNextDelayedAllocationIn(settings, event.state());
// Schedule the delay at least 5 seconds in the future
nextDelayMillis = Math.max(5000, nextDelayMillis);

This comment has been minimized.

Copy link
@martijnvg

martijnvg Jul 28, 2015

Member

perhaps the minimum delay should be a setting?

This comment has been minimized.

Copy link
@dakrone

dakrone Jul 28, 2015

Author Member

Sure, I will make it a non-dynamic configurable setting.

@martijnvg

This comment has been minimized.

Copy link
Member

martijnvg commented Jul 28, 2015

LGTM. Left a non blocking question.

@@ -190,7 +190,7 @@ public static int getNumberOfDelayedUnassigned(Settings settings, ClusterState s
if (shard.primary() == false) {
IndexMetaData indexMetaData = state.metaData().index(shard.getIndex());
long delay = shard.unassignedInfo().getDelayAllocationExpirationIn(settings, indexMetaData.getSettings());
if (delay > 0) {
if (delay != 0) {

This comment has been minimized.

Copy link
@nik9000

nik9000 Jul 28, 2015

Contributor

So negative delays count delays? I figured they should count as not-delayed.

This comment has been minimized.

Copy link
@nik9000

nik9000 Jul 28, 2015

Contributor

Count as delays I mean.

This comment has been minimized.

Copy link
@nik9000

nik9000 Jul 28, 2015

Contributor

Ah! I get it! 0 means they are not delayed at all but a negative number means they've already expired. They aren't delayed any more. OK - I retract my question. Maybe a comment?

This comment has been minimized.

Copy link
@dakrone

dakrone Jul 28, 2015

Author Member

Sure I'll add a comment

@Test
@LuceneTestCase.Slow
public void testDelayWithALargeAmountOfShards() throws Exception {
/**

This comment has been minimized.

Copy link
@nik9000

nik9000 Jul 28, 2015

Contributor

One too many *s here I think.

@nik9000

This comment has been minimized.

Copy link
Contributor

nik9000 commented Jul 28, 2015

LGTM

@dakrone

This comment has been minimized.

Copy link
Member Author

dakrone commented Jul 29, 2015

Closing this, handled a different way in #12532

@dakrone dakrone closed this Jul 29, 2015

dakrone added a commit to dakrone/elasticsearch that referenced this pull request Aug 5, 2015

Avoid extra reroutes of delayed shards in RoutingService
In order to avoid extra reroutes, `RoutingService` should avoid
scheduling a reroute of any shards where the delay is negative. To make
sure that we don't encounter a race condition between the
GatewayAllocator thinking a shard is delayed and RoutingService thinking
it is not, the GatewayAllocator will update the RoutingService with the
last time it checked in order to use a consistent "view" of the delay.

Resolves elastic#12456
Relates to elastic#12515 and elastic#12456

dakrone added a commit to dakrone/elasticsearch that referenced this pull request Aug 5, 2015

Avoid extra reroutes of delayed shards in RoutingService
In order to avoid extra reroutes, `RoutingService` should avoid
scheduling a reroute of any shards where the delay is negative. To make
sure that we don't encounter a race condition between the
GatewayAllocator thinking a shard is delayed and RoutingService thinking
it is not, the GatewayAllocator will update the RoutingService with the
last time it checked in order to use a consistent "view" of the delay.

Resolves elastic#12456
Relates to elastic#12515 and elastic#12456

@clintongormley clintongormley removed the review label Aug 7, 2015

@dakrone dakrone deleted the dakrone:delay-reroute-enhancement branch May 13, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.