Reduce cluster update reroutes with async fetch #11421

Closed
wants to merge 1 commit into from

4 participants
@kimchy
Member

kimchy commented May 29, 2015

When using async fetch, we can end up with a number of cluster updates and reroutes that scales with the number of shards. While not disastrous, we can optimize this, since a single reroute is enough to apply all the async fetch results that arrived during that time.
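
A minimal sketch of the coalescing idea described above, not the code from this PR. The class name `CoalescingRerouter`, the method `onFetchResult`, and the plain `Queue<Runnable>` standing in for the cluster service's pending task queue are all assumptions made for illustration. The first fetch result flips a flag and submits one reroute task; results that arrive while that task is still pending are ignored, because the pending reroute will pick them up.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the allocator's reroute trigger; not an Elasticsearch class.
class CoalescingRerouter {
    // True while a reroute task has been submitted but has not started yet.
    private final AtomicBoolean rerouting = new AtomicBoolean();
    // Assumed stand-in for the cluster service's pending task queue.
    private final Queue<Runnable> pendingTasks;
    // Counts how many reroutes actually ran, for demonstration only.
    final AtomicInteger reroutesRun = new AtomicInteger();

    CoalescingRerouter(Queue<Runnable> pendingTasks) {
        this.pendingTasks = pendingTasks;
    }

    /** Called once per shard when its async fetch result arrives. Returns true if a task was submitted. */
    boolean onFetchResult(String shardId) {
        // Only the caller that flips the flag from false to true submits a task.
        if (!rerouting.compareAndSet(false, true)) {
            return false; // a reroute is already pending and will cover this shard too
        }
        pendingTasks.add(() -> {
            // Clear the flag first, so results arriving from now on schedule a fresh reroute.
            rerouting.set(false);
            reroutesRun.incrementAndGet(); // placeholder for the real reroute
        });
        return true;
    }
}
```

With five results arriving back to back, only one task ends up in the queue. Once it runs, the next result schedules a new one.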

Reduce cluster update reroutes with async fetch
When using async fetch, we can end up with a number of cluster updates and reroutes that scales with the number of shards. While not disastrous, we can optimize this, since a single reroute is enough to apply all the async fetch results that arrived during that time.
logger.trace("{} already has pending reroute, ignoring {}", shardId, reason);
return;
}
clusterService.submitStateUpdateTask("async_shard_fetch", Priority.HIGH, new ClusterStateUpdateTask() {

@dakrone

dakrone May 29, 2015

Member

I think it would be valuable to have the original type, shardId, and reason in the message. Did you remove them on purpose?

@kimchy

kimchy May 29, 2015

Author Member

My thought is that it becomes less relevant, since a single reroute now represents several events, and it can be misleading to see one shard id in the pending task information when the reroute may cover multiple shards.

@dakrone

dakrone May 29, 2015

Member

makes sense, thanks!

@dakrone

Member

dakrone commented May 29, 2015

left one comment about removing info from the message, other than that LGTM

@kimchy

Member Author

kimchy commented May 29, 2015

@dakrone I added a comment back, tell me if it makes sense

@kimchy kimchy closed this May 29, 2015

@kevinkluge kevinkluge removed the review label May 29, 2015

@kimchy kimchy deleted the kimchy:minimize_reroute branch May 29, 2015

bleskes added a commit to bleskes/elasticsearch that referenced this pull request Jun 5, 2015

GatewayAllocator: reset rerouting flag after error
After asynchronously fetching shard information, the gateway allocator issues a reroute via a cluster state update task. elastic#11421 introduced an optimization that tries to avoid submitting unneeded reroutes when results for many shards come in together. This is done with a rerouting flag, which indicates that a reroute is pending and thus any newly arriving shard info doesn't need to issue another one. This flag wasn't reset upon an error in the reroute update task. Most notably, if a master node had to step down due to a min_master_node violation, it could reject an ongoing reroute. Failing to reset the flag caused the node to skip all future reroutes once it became master again.

bleskes added a commit that referenced this pull request Jun 5, 2015

GatewayAllocator: reset rerouting flag after error
After asynchronously fetching shard information, the gateway allocator issues a reroute via a cluster state update task. #11421 introduced an optimization that tries to avoid submitting unneeded reroutes when results for many shards come in together. This is done with a rerouting flag, which indicates that a reroute is pending and thus any newly arriving shard info doesn't need to issue another one. This flag wasn't reset upon an error in the reroute update task. Most notably, if a master node had to step down due to a min_master_node violation, it could reject an ongoing reroute. Failing to reset the flag caused the node to skip all future reroutes once it became master again.

Closes #11519
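
The failure mode in the commit message above can be illustrated with a small sketch. It is a simplified model rather than the actual GatewayAllocator code: the `UpdateTask` interface, the `RerouteGuard` class, and `tryCreateRerouteTask` are hypothetical names, and Elasticsearch's real `ClusterStateUpdateTask` has a richer API. The point is that the flag is cleared on both paths, when the task starts executing and when it is rejected or fails.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, simplified task interface for illustration only.
interface UpdateTask {
    void execute() throws Exception;
    void onFailure(Throwable t);
}

// Hypothetical guard that models the rerouting flag described in the commit message.
class RerouteGuard {
    final AtomicBoolean rerouting = new AtomicBoolean();

    /** Returns the task to submit, or null if a reroute is already pending. */
    UpdateTask tryCreateRerouteTask(Runnable reroute) {
        if (!rerouting.compareAndSet(false, true)) {
            return null; // a pending reroute will cover this shard info
        }
        return new UpdateTask() {
            @Override
            public void execute() {
                rerouting.set(false); // later results may schedule a new reroute
                reroute.run();
            }

            @Override
            public void onFailure(Throwable t) {
                // Without this reset, a rejected task (for example after the node
                // stops being master) would leave the flag set forever.
                rerouting.set(false);
            }
        };
    }
}
```

If `onFailure` did not clear the flag, every later call to `tryCreateRerouteTask` would return null and no reroute would be submitted again.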
