Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify delayed shard allocation #18351

Merged
merged 1 commit into from May 26, 2016

Conversation

Projects
None yet
4 participants
@ywelsch
Copy link
Contributor

commented May 14, 2016

This PR simplifies the delayed shard allocation implementation by assigning clear responsibilities to the various components that are affected by delayed shard allocation:

  • UnassignedInfo gets a boolean flag delayed which determines whether assignment of the shard should be delayed. The flag gets persisted in the cluster state and is thus available across nodes, i.e. each node knows whether a shard was delayed-unassigned in a specific cluster state. Before, nodes other than the current master were unaware of that information.
  • This flag is initially set as true if the shard becomes unassigned due to a node leaving and the index setting index.unassigned.node_left.delayed_timeout being strictly positive. From then on, unassigned shards can only transition from delayed to non-delayed, never in the other direction.
  • The reroute step is in charge of removing the delay marker (comparing timestamp when node left to current timestamp).
  • A dedicated service DelayedAllocationService, reacting to cluster change events, has the responsibility to schedule reroutes to remove the delay marker.

Relates to #18293

@ywelsch

This comment has been minimized.

Copy link
Contributor Author

commented May 14, 2016

@bleskes can you have a look?

@bleskes

View changes

core/src/main/java/org/elasticsearch/cluster/routing/RoutingService.java Outdated
@@ -59,18 +54,13 @@
private final AllocationService allocationService;

private AtomicBoolean rerouting = new AtomicBoolean();
private volatile long minDelaySettingAtLastSchedulingNanos = Long.MAX_VALUE;
private volatile ScheduledFuture registeredNextDelayFuture;

@Inject
public RoutingService(Settings settings, ThreadPool threadPool, ClusterService clusterService, AllocationService allocationService) {
super(settings);
this.threadPool = threadPool;

This comment has been minimized.

Copy link
@bleskes

bleskes May 20, 2016

Member

we don't need the thread pool anymore

@ywelsch

This comment has been minimized.

Copy link
Contributor Author

commented May 23, 2016

@bleskes I've updated the PR. Please have another look.

@bleskes

View changes

core/src/main/java/org/elasticsearch/cluster/routing/DelayedAllocationService.java Outdated
if (earlierRerouteNeeded) {
logger.info("scheduling reroute for delayed shards in [{}] ({} delayed shards)", nextDelay,
UnassignedInfo.getNumberOfDelayedUnassigned(state));
newTask.schedule();

This comment has been minimized.

Copy link
@bleskes

bleskes May 25, 2016

Member

I'm thinking some more about scheduling first and making it visible second and having second doubts on that one. Looking at the code again , why do we need to schedule first/what's the down side of doing it after setting delayedRerouteTask, so we know removeTaskAndCancel/removeIfSameTask work?

This comment has been minimized.

Copy link
@ywelsch

ywelsch May 25, 2016

Author Contributor

Yes, we can do it the other way around. We know that close will be run after scheduleIfNeeded, and never the other way around.

@ywelsch

View changes

core/src/main/java/org/elasticsearch/cluster/routing/DelayedAllocationService.java Outdated
UnassignedInfo.getNumberOfDelayedUnassigned(state));
newTask.schedule();
DelayedRerouteTask currentTask = delayedRerouteTask.getAndSet(newTask);
assert existingTask == currentTask || currentTask == null;

This comment has been minimized.

Copy link
@ywelsch

ywelsch May 25, 2016

Author Contributor

This should just be assert currentTask == null; now as we removeTaskAndCancel(); a few lines above.

This comment has been minimized.

Copy link
@bleskes

bleskes May 25, 2016

Member

that's OK because of the fact that this run by a single thread, but it will be easier on the eye to use:

existingTask.cancel()

instead of removeTaskAndCancel()

@bleskes

View changes

core/src/main/java/org/elasticsearch/cluster/routing/allocation/AllocationService.java Outdated
return applyDelayedShards(clusterState, true);
}

public RoutingAllocation.Result applyDelayedShards(ClusterState clusterState, boolean withReroute) {

This comment has been minimized.

Copy link
@bleskes

bleskes May 25, 2016

Member

we don't really need this variant do we?

@bleskes

View changes

core/src/main/java/org/elasticsearch/cluster/routing/allocation/AllocationService.java Outdated
reroute(allocation);
}
RoutingAllocation.Result result = buildChangedResult(allocation);
logClusterHealthStateChange(

This comment has been minimized.

Copy link
@bleskes

bleskes May 25, 2016

Member

we always log this after building the change result, maybe fold it into the buildChangedResult method and call it buildResultAndLogHealthChange?

This comment has been minimized.

Copy link
@ywelsch

ywelsch May 25, 2016

Author Contributor

good idea!

@bleskes

View changes

core/src/main/java/org/elasticsearch/cluster/routing/allocation/AllocationService.java Outdated
public RoutingAllocation.Result applyDelayedShards(ClusterState clusterState, boolean withReroute) {
RoutingNodes routingNodes = getMutableRoutingNodes(clusterState);
RoutingAllocation allocation = new RoutingAllocation(allocationDeciders, routingNodes, clusterState.nodes(), clusterInfoService.getClusterInfo(), currentNanoTime());
boolean changed = removeDelayMarkers(allocation);

This comment has been minimized.

Copy link
@bleskes

bleskes May 25, 2016

Member

since withReroute is always true, and since reroute() also removes delay markes, do we really need this method? can't we just call reroute?

This comment has been minimized.

Copy link
@ywelsch

ywelsch May 25, 2016

Author Contributor

This makes delayed scheduling efficient (same approach as for Shard started / shard failed). If we haven't removed a delay marker, we don't need to go through reroute and do all the shard balancing...

This comment has been minimized.

Copy link
@bleskes

bleskes May 25, 2016

Member

it's different because applyStartedShards and applyFailedShard are doing something that isn't done in the reroute. Also this method is called when we are 99% sure that something is going to happen. I don't think it's worth the extra code path.

@bleskes

View changes

core/src/main/java/org/elasticsearch/cluster/routing/allocation/AllocationService.java Outdated
/**
* Removes delay markers from unassigned shards based on current time stamp. Returns true if markers were removed.
*/
public static boolean removeDelayMarkers(RoutingAllocation allocation) {

This comment has been minimized.

Copy link
@bleskes

bleskes May 25, 2016

Member

I expect to be tested as it's static, but I don't see test? I think its good to have one. Also, it seems it can be package private.

@@ -179,7 +179,7 @@ public boolean allocateUnassigned(RoutingAllocation allocation) {
}
} else if (matchingNodes.hasAnyData() == false) {
// if we didn't manage to find *any* data (regardless of matching sizes), check if the allocation of the replica shard needs to be delayed
changed |= ignoreUnassignedIfDelayed(unassignedIterator, shard);
ignoreUnassignedIfDelayed(unassignedIterator, shard);

This comment has been minimized.

Copy link
@bleskes

bleskes May 25, 2016

Member

this one is sneaky :)

This comment has been minimized.

Copy link
@ywelsch

ywelsch May 25, 2016

Author Contributor

I like that change in particular because it brings us in line with how we handle unassignedIterator.removeAndIgnore() in all other cases (we never set changed to true just for ignoring a shard).

@bleskes

View changes

core/src/test/java/org/elasticsearch/cluster/routing/DelayedAllocationServiceTests.java Outdated
.routingResult(allocationService.applyStartedShards(clusterState, clusterState.getRoutingNodes().shardsWithState(INITIALIZING)))
.build();
assertThat(clusterState.getRoutingNodes().unassigned().size() > 0, equalTo(false));
// remove node2 and reroute

This comment has been minimized.

Copy link
@bleskes

bleskes May 25, 2016

Member

I'm wondering if we want to randomly add a third node and see that we still allocate (not only say "delayed" is false) - maybe this is covered by another test. If so , no need for this here

@bleskes

View changes

core/src/test/java/org/elasticsearch/cluster/routing/DelayedAllocationServiceTests.java Outdated
// make sure the replica is marked as delayed (i.e. not reallocated)
assertEquals(1, UnassignedInfo.getNumberOfDelayedUnassigned(stateWithDelayedShard));
assertEquals(baseTimestampNanos,
stateWithDelayedShard.getRoutingNodes().unassigned().iterator().next().unassignedInfo().getUnassignedTimeInNanos());

This comment has been minimized.

Copy link
@bleskes

bleskes May 25, 2016

Member

can we assign stateWithDelayedShard.getRoutingNodes().unassigned().iterator().next().unassignedInfo() into some temp variable to make this readable?

clusterStateUpdateTask.set((ClusterStateUpdateTask)invocationOnMock.getArguments()[1]);
latch.countDown();
return null;
}).when(clusterService).submitStateUpdateTask(eq(CLUSTER_UPDATE_TASK_SOURCE), any(ClusterStateUpdateTask.class));

This comment has been minimized.

Copy link
@bleskes

bleskes May 25, 2016

Member

mocking FTW!

@bleskes

View changes

core/src/test/java/org/elasticsearch/cluster/routing/DelayedAllocationServiceTests.java Outdated
assertThat(delayedRerouteTask.nextDelay.nanos(),
equalTo(delaySetting.nanos() - (clusterChangeEventTimestampNanos - nodeLeftTimestampNanos)));

// update settings with shorter delay

This comment has been minimized.

Copy link
@bleskes

bleskes May 25, 2016

Member

can we randomly test another similar path where a shard from another index becomes unassinged and have a shorter delay? this should reschedule as well..

@bleskes

This comment has been minimized.

Copy link
Member

commented May 25, 2016

Looking good! left comments here and there. Almost all very minor.

@ywelsch

This comment has been minimized.

Copy link
Contributor Author

commented May 25, 2016

@bleskes thanks for reviewing. I've updated the PR with your suggestions. Please have another look.

@bleskes

This comment has been minimized.

Copy link
Member

commented May 26, 2016

LGTM. @ywelsch thanks for the extra iterations

@ywelsch ywelsch force-pushed the ywelsch:fix/delayed-shard-allocation branch May 26, 2016

@ywelsch ywelsch force-pushed the ywelsch:fix/delayed-shard-allocation branch to 45e8798 May 26, 2016

@ywelsch ywelsch merged commit 31b0777 into elastic:master May 26, 2016

1 check passed

CLA Commit author is a member of Elasticsearch
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.