Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove scheduled routing #11776

Merged
merged 1 commit into from Jun 23, 2015

Conversation

@kimchy
Copy link
Member

commented Jun 19, 2015

Today, we have scheduled reroute that kicks every 10 seconds and checks if a
reroute is needed. We use it when adding nodes, since we don't reroute right
away once its added, and give it a time window to add additional nodes.

We do have recover after nodes setting and such in order to wait for enough
nodes to be added, and also, it really depends at what part of the 10s window
you end up, sometimes, it might not be effective at all. In general, its historic
from the times before we had recover after nodes and such.

This change removes the 10s scheduling, simplifies RoutingService, and adds
explicit reroute when a node is added to the system. It also adds unit tests
to RoutingService.

@bleskes
bleskes reviewed Jun 19, 2015
View changes
core/src/test/java/org/elasticsearch/test/InternalTestCluster.java Outdated
@@ -365,7 +365,6 @@ private static Settings getRandomNodeSettings(long seed) {
Random random = new Random(seed);
Builder builder = Settings.settingsBuilder()
// decrease the routing schedule so new nodes will be added quickly - some random value between 30 and 80 ms

This comment has been minimized.

Copy link
@bleskes

bleskes Jun 19, 2015

Member

comment can go away..

This comment has been minimized.

Copy link
@kimchy

kimchy Jun 19, 2015

Author Member

right, missed it..

}

@Test
public void testReroute() {

This comment has been minimized.

Copy link
@bleskes

bleskes Jun 19, 2015

Member

I'm confused by this test, what does it do?

This comment has been minimized.

Copy link
@kimchy

kimchy Jun 19, 2015

Author Member

check that reroute actually triggers the reroute flag...

@bleskes
bleskes reviewed Jun 19, 2015
View changes
core/src/main/java/org/elasticsearch/cluster/routing/RoutingService.java Outdated
public void scheduleReroute() {
routingTableDirty = true;
/**
* make sure that a reroute will be done by the next scheduled check

This comment has been minimized.

Copy link
@bleskes

bleskes Jun 19, 2015

Member

comment probably needs to be adapted - it's done immediately now,

This comment has been minimized.

Copy link
@kimchy

kimchy Jun 19, 2015

Author Member

right, will amend.

@bleskes
bleskes reviewed Jun 19, 2015
View changes
core/src/main/java/org/elasticsearch/cluster/routing/RoutingService.java Outdated
* make sure that a reroute will be done by the next scheduled check
*/
public void reroute(String reason) {
performReroute(reason);

This comment has been minimized.

Copy link
@bleskes

bleskes Jun 19, 2015

Member

do we need the performReroute? can't we fold this here?

This comment has been minimized.

Copy link
@kimchy

kimchy Jun 19, 2015

Author Member

I wanted to separate the method we override in our tests from the public method, will make this one final

if (lifecycle.stopped()) {
return;
}
if (rerouting.compareAndSet(false, true) == false) {
logger.trace("already has pending reroute, ignoring");
logger.trace("already has pending reroute, ignoring {}", reason);

This comment has been minimized.

Copy link
@bleskes

bleskes Jun 19, 2015

Member

Just an idea how about making rerouting an AtomicReference where a non null value signals the currently next reroute task name? this way we can log here logger.trace("already has pending reroute due to [{}]. ignoring [{}]")/

@@ -932,7 +932,10 @@ public ClusterState execute(ClusterState currentState) {
if (modified) {
stateBuilder.nodes(nodesBuilder);
}
return stateBuilder.build();
currentState = stateBuilder.build();

This comment has been minimized.

Copy link
@bleskes

bleskes Jun 19, 2015

Member

I wonder if we should shortcut if modified == false . I'm thinking about a join storm upon master failure which will lead to a series of "received a join request for an existing node " where we don't really need to do anything.

This comment has been minimized.

Copy link
@kimchy

kimchy Jun 19, 2015

Author Member

I was looking to reduce the amount of logical change here, but I can make this change...

This comment has been minimized.

Copy link
@kimchy

kimchy Jun 19, 2015

Author Member

actually, I don't want to change this behavior to be honest, I am going to leave it as is. We might to actually do want to send another cluster state as a result

This comment has been minimized.

Copy link
@bleskes

bleskes Jun 19, 2015

Member

We do need to publish a cluster state to seal the mastership on the joining nodes. I will add a big fat comment about this here after this is committed (to not be in your way when you merge). What I meant is avoiding the reroute, which was accumulated before for all the joins. But ok to leave as is for now if you have concerns.

@@ -90,52 +79,26 @@ protected void doStop() {

@Override
protected void doClose() {
FutureUtils.cancel(scheduledRoutingTableFuture);

This comment has been minimized.

Copy link
@bleskes

bleskes Jun 19, 2015

Member

not related to this change, but I think it's good to do FutureUtils.cancel(registeredNextDelayFuture) here. Might save on some cryptic error messages during shutdown..

This comment has been minimized.

Copy link
@kimchy

kimchy Jun 19, 2015

Author Member

I am not sure its a real problem, cause on shutdown we cancel ongoing ones anyhow, but will add

@bleskes

This comment has been minimized.

Copy link
Member

commented Jun 19, 2015

I think it's awesome how easy this turned up to be. I looked at all the places we submit a cluster state update task and didn't see any place we miss a reroute (and would rely on the 10s) apart from the node join we already knew about.

Made some small suggestions.

this.schedule = settings.getAsTime("cluster.routing.schedule", timeValueSeconds(10));
clusterService.addFirst(this);
if (clusterService != null) {
clusterService.addFirst(this);

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 19, 2015

Contributor

at some point we should really add these ourside of the classes itself IMO. but that is totally unrelated

public void setReallocation(final ClusterService clusterService, final AllocationService allocationService) {
this.clusterService = clusterService;
this.allocationService = allocationService;
public void setReallocation(final ClusterService clusterService, final RoutingService routingService) {

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 19, 2015

Contributor

I wonder if there is still a circular dep. now that AllocatinService is not needed here?

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 19, 2015

Contributor

I also wonder if we can trash this method then altogether a make GatewayAllocator implement ClusterStateListener and add it to the ClusterService once constructed or maybe inside the ClusterService...

This comment has been minimized.

Copy link
@kimchy

kimchy Jun 20, 2015

Author Member

the cycling problem will still remain, do you mean fold RoutingService into ClusterService? I kinda like that ClusterService has no notion or knowledge of allocation / routing, and just deals with cluster state in a generic form

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 23, 2015

Contributor

oh man that class is a nighmare. really I just realized how fucked up this is grrr.

@kimchy

This comment has been minimized.

Copy link
Member Author

commented Jun 19, 2015

@bleskes I pushed a first review round, I don't like the suggestion on the atomic reference one, I don't think it adds a lot of value, on the other hand, I have an idea on what would, but its a bigger change, will see if I can work on it and if it stays small

@bleskes

This comment has been minimized.

Copy link
Member

commented Jun 22, 2015

LGTM. I'm +1 on 1.7 - it makes things easier to trace, reason about and debug.

@kimchy kimchy force-pushed the kimchy:remove_reroute_schedule branch Jun 23, 2015
@s1monw

This comment has been minimized.

Copy link
Contributor

commented Jun 23, 2015

I am ok with this too... this entire circular dep stuff is just really bad but lets move on

Today, we have scheduled reroute that kicks every 10 seconds and checks if a
reroute is needed. We use it when adding nodes, since we don't reroute right
away once its added, and give it a time window to add additional nodes.

We do have recover after nodes setting and such in order to wait for enough
nodes to be added, and also, it really depends at what part of the 10s window
you end up, sometimes, it might not be effective at all. In general, its historic
from the times before we had recover after nodes and such.

This change removes the 10s scheduling, simplifies RoutingService, and adds
explicit reroute when a node is added to the system. It also adds unit tests
to RoutingService.

closes #11776
@kimchy kimchy force-pushed the kimchy:remove_reroute_schedule branch to 435ce7f Jun 23, 2015
@kimchy kimchy added the v1.7.0 label Jun 23, 2015
@kimchy kimchy merged commit 435ce7f into elastic:master Jun 23, 2015
1 check passed
1 check passed
CLA Commit author is a member of Elasticsearch
Details
@kevinkluge kevinkluge removed the review label Jun 23, 2015
kimchy added a commit that referenced this pull request Jun 23, 2015
Today, we have scheduled reroute that kicks every 10 seconds and checks if a
reroute is needed. We use it when adding nodes, since we don't reroute right
away once its added, and give it a time window to add additional nodes.

We do have recover after nodes setting and such in order to wait for enough
nodes to be added, and also, it really depends at what part of the 10s window
you end up, sometimes, it might not be effective at all. In general, its historic
from the times before we had recover after nodes and such.

This change removes the 10s scheduling, simplifies RoutingService, and adds
explicit reroute when a node is added to the system. It also adds unit tests
to RoutingService.

closes #11776
bleskes added a commit to bleskes/elasticsearch that referenced this pull request Jun 30, 2015
elastic#11776 has simplified our rerouting logic by removing a scheduled background reroute in favor of an explicit reroute during the cluster state processing of a node join (the only place where we didn't do it explicitly). While that change is conceptually good, it change semantics a bit in two ways:

 - shard listing actions underpinning shard allocation do not have access to that new node yet (causing errors during shard allocation see elastic#11923
 - the very first cluster state published to a node already has shard assignments to it. This surfaced other issues we are working to fix separately

 This commit changes the reroute to be done post processing the initial join cluster state to side step these issues while we work on a longer term solution.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.