Simplify delayed shard allocation #18351

Merged on May 26, 2016 (1 commit)

@ywelsch ywelsch commented May 14, 2016

This PR simplifies the delayed shard allocation implementation by assigning clear responsibilities to the various components that are affected by delayed shard allocation:

  • UnassignedInfo gets a boolean flag, delayed, that determines whether assignment of the shard should be delayed. The flag is persisted in the cluster state and is thus available on all nodes, i.e. each node knows whether a shard was delayed-unassigned in a specific cluster state. Previously, nodes other than the current master were unaware of this information.
  • This flag is initially set to true if the shard becomes unassigned because a node left and the index setting index.unassigned.node_left.delayed_timeout is strictly positive. From then on, an unassigned shard can only transition from delayed to non-delayed, never in the other direction.
  • The reroute step is in charge of removing the delay marker (by comparing the timestamp of when the node left with the current timestamp).
  • A dedicated service, DelayedAllocationService, reacts to cluster change events and is responsible for scheduling reroutes that remove the delay marker.
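
The rules in the list above can be sketched as a small state model. This is only an illustration under stated assumptions, not the code from this PR: `DelayedFlagSketch`, `Info`, `onNodeLeft`, and `reroute` are hypothetical names, the real UnassignedInfo carries more fields, and treating "elapsed time equals the timeout" as expired is an assumption of the sketch.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical, simplified model of the delay marker; not the Elasticsearch classes. */
final class DelayedFlagSketch {

    /** Immutable stand-in for UnassignedInfo: when the shard became unassigned and whether it is delayed. */
    static final class Info {
        final long unassignedTimeNanos;
        final boolean delayed;

        Info(long unassignedTimeNanos, boolean delayed) {
            this.unassignedTimeNanos = unassignedTimeNanos;
            this.delayed = delayed;
        }
    }

    /** A node left: the shard starts out delayed only if the timeout setting is strictly positive. */
    static Info onNodeLeft(long nowNanos, long delayedTimeoutNanos) {
        return new Info(nowNanos, delayedTimeoutNanos > 0);
    }

    /** Reroute step: clears the marker once the timeout has elapsed; it never sets the marker. */
    static Info reroute(Info info, long nowNanos, long delayedTimeoutNanos) {
        if (!info.delayed) {
            return info; // non-delayed never goes back to delayed
        }
        // Assumption of this sketch: reaching the timeout exactly counts as expired.
        boolean expired = nowNanos - info.unassignedTimeNanos >= delayedTimeoutNanos;
        return expired ? new Info(info.unassignedTimeNanos, false) : info;
    }

    public static void main(String[] args) {
        long timeout = TimeUnit.MINUTES.toNanos(1);
        Info info = onNodeLeft(0L, timeout);
        System.out.println(info.delayed);                                                 // true
        System.out.println(reroute(info, TimeUnit.SECONDS.toNanos(30), timeout).delayed); // true
        System.out.println(reroute(info, TimeUnit.SECONDS.toNanos(90), timeout).delayed); // false
    }
}
```

Because the flag lives in an immutable value that is part of the (modelled) cluster state, any node holding that state can answer "is this shard delayed?" without consulting the master.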

Relates to #18293

ywelsch commented May 14, 2016

@bleskes can you have a look?

@@ -59,18 +54,13 @@
    private final AllocationService allocationService;

    private AtomicBoolean rerouting = new AtomicBoolean();
    private volatile long minDelaySettingAtLastSchedulingNanos = Long.MAX_VALUE;
    private volatile ScheduledFuture registeredNextDelayFuture;

    @Inject
    public RoutingService(Settings settings, ThreadPool threadPool, ClusterService clusterService, AllocationService allocationService) {
        super(settings);
        this.threadPool = threadPool;
Contributor

we don't need the thread pool anymore

@ywelsch
Copy link
Contributor Author

ywelsch commented May 23, 2016

@bleskes I've updated the PR. Please have another look.

            if (earlierRerouteNeeded) {
                logger.info("scheduling reroute for delayed shards in [{}] ({} delayed shards)", nextDelay,
                    UnassignedInfo.getNumberOfDelayedUnassigned(state));
                newTask.schedule();
Contributor

I've been thinking some more about scheduling first and making the task visible second, and I'm having second thoughts about it. Looking at the code again, why do we need to schedule first? What's the downside of scheduling after setting delayedRerouteTask, so that we know removeTaskAndCancel/removeIfSameTask work?

Contributor Author

Yes, we can do it the other way around. We know that close will be run after scheduleIfNeeded, and never the other way around.
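
The ordering agreed on in this thread (make the task visible first, schedule it second) can be sketched roughly as follows. This is a hedged illustration, not the code from this PR: the names delayedRerouteTask, removeTaskAndCancel, removeIfSameTask, scheduleIfNeeded, and close come from the discussion, but their bodies here are assumptions, as are `DelayedRerouteSketch`, `Task`, the `reroutes` counter standing in for the actual reroute, and the use of a plain `ScheduledExecutorService`. The sketch also assumes scheduleIfNeeded is called from a single thread and that close runs after it.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of "publish the task first, schedule it second"; not the code from this PR. */
final class DelayedRerouteSketch {

    // Daemon thread so a forgotten close() cannot keep the JVM alive.
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "delayed-reroute-sketch");
        t.setDaemon(true);
        return t;
    });
    final AtomicReference<Task> delayedRerouteTask = new AtomicReference<>();
    final AtomicInteger reroutes = new AtomicInteger(); // stands in for the actual reroute call

    final class Task implements Runnable {
        final long deadlineNanos;
        private volatile ScheduledFuture<?> future;

        Task(long deadlineNanos) {
            this.deadlineNanos = deadlineNanos;
        }

        void schedule() {
            long delay = Math.max(0L, deadlineNanos - System.nanoTime());
            future = scheduler.schedule(this, delay, TimeUnit.NANOSECONDS);
        }

        void cancel() {
            ScheduledFuture<?> f = future;
            if (f != null) {
                f.cancel(false);
            }
        }

        @Override
        public void run() {
            // A task that was replaced or removed in the meantime does nothing.
            if (removeIfSameTask(this)) {
                reroutes.incrementAndGet();
            }
        }
    }

    void scheduleIfNeeded(long nextDelayMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(nextDelayMillis);
        Task existing = delayedRerouteTask.get();
        boolean earlierRerouteNeeded = existing == null || deadline - existing.deadlineNanos < 0;
        if (earlierRerouteNeeded) {
            Task newTask = new Task(deadline);
            Task previous = delayedRerouteTask.getAndSet(newTask); // 1. make the task visible
            if (previous != null) {
                previous.cancel();
            }
            newTask.schedule();                                     // 2. only then schedule it
        }
    }

    boolean removeIfSameTask(Task expected) {
        return delayedRerouteTask.compareAndSet(expected, null);
    }

    void removeTaskAndCancel() {
        Task task = delayedRerouteTask.getAndSet(null);
        if (task != null) {
            task.cancel();
        }
    }

    void close() {
        removeTaskAndCancel();
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        DelayedRerouteSketch sketch = new DelayedRerouteSketch();
        sketch.scheduleIfNeeded(10_000); // far-away reroute
        sketch.scheduleIfNeeded(20);     // an earlier deadline replaces it
        Thread.sleep(500);
        System.out.println("reroutes: " + sketch.reroutes.get());
        sketch.close();
    }
}
```

Since the task is published before it is scheduled, a task that fires always finds either itself or a newer task in delayedRerouteTask. In this sketch a replaced task that still runs fails the compare-and-set in removeIfSameTask and becomes a no-op.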

bleskes commented May 25, 2016

Looking good! I left comments here and there, almost all very minor.

ywelsch commented May 25, 2016

@bleskes thanks for reviewing. I've updated the PR with your suggestions. Please have another look.

bleskes commented May 26, 2016

LGTM. @ywelsch thanks for the extra iterations

@ywelsch ywelsch force-pushed the fix/delayed-shard-allocation branch from c7eeb27 to 2369748 Compare May 26, 2016 10:42
@ywelsch ywelsch force-pushed the fix/delayed-shard-allocation branch from 2369748 to 45e8798 Compare May 26, 2016 10:49
@ywelsch ywelsch merged commit 31b0777 into elastic:master May 26, 2016
@lcawl lcawl added the :Distributed/Distributed label and removed the :Allocation label Feb 13, 2018
Labels: :Distributed/Distributed, >enhancement, v5.0.0-alpha4