Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce cluster update reroutes with async fetch #11421

Closed
wants to merge 1 commit into from

Conversation

kimchy
Copy link
Member

@kimchy kimchy commented May 29, 2015

When using async fetch, we can end up with cluster updates and reroutes based on teh number of shards. While not disastrous we can optimize it, since a single reroute is enough to apply to all the async fetch results that arrived during that time.

When using async fetch, we can end up with cluster updates and reroutes based on teh number of shards. While not disastrous we can optimize it, since a single reroute is enough to apply to all the async fetch results that arrived during that time.
logger.trace("{} already has pending reroute, ignoring {}", shardId, reason);
return;
}
clusterService.submitStateUpdateTask("async_shard_fetch", Priority.HIGH, new ClusterStateUpdateTask() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be valuable to have the original type, shardId, and reason in the message, did you remove it on purpose?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my thought that it becomes less relevant, since a single reroute actually represents a few events now, and it can be misleading seeing in the pending task information about a shard id, where it might be ones for multiple ones

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, thanks!

@dakrone
Copy link
Member

dakrone commented May 29, 2015

left one comment about removing info from the message, other than that LGTM

@clintongormley clintongormley added >enhancement :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. labels May 29, 2015
@kimchy
Copy link
Member Author

kimchy commented May 29, 2015

@dakrone I added a comment back, tell me if it makes sense

@kimchy kimchy closed this May 29, 2015
@kevinkluge kevinkluge removed the review label May 29, 2015
@kimchy kimchy deleted the minimize_reroute branch May 29, 2015 21:40
bleskes added a commit to bleskes/elasticsearch that referenced this pull request Jun 5, 2015
After asynchronously fetching shard information the gateway allocator issues a reroute via  a cluster state update task. elastic#11421 introduced an optimization trying to avoid submitting unneeded reroutes when results for many shards come in together. This is done by having a rerouting flag, indicating a pending reroute is coming and thus any new incoming shard info doesn't need to issue a reroute. This flag wasn't reset upon an error in the reroute update task. Most notably - if a master node had to step during to a min_master_node violation, it could reject an ongoing reroute. Lacking to reset the flag causing it to skip any future reroute, when the node became master again.
bleskes added a commit that referenced this pull request Jun 5, 2015
After asynchronously fetching shard information the gateway allocator issues a reroute via  a cluster state update task. #11421 introduced an optimization trying to avoid submitting unneeded reroutes when results for many shards come in together. This is done by having a rerouting flag, indicating a pending reroute is coming and thus any new incoming shard info doesn't need to issue a reroute. This flag wasn't reset upon an error in the reroute update task. Most notably - if a master node had to step during to a min_master_node violation, it could reject an ongoing reroute. Lacking to reset the flag causing it to skip any future reroute, when the node became master again.

Closes #11519
bleskes added a commit that referenced this pull request Jun 5, 2015
After asynchronously fetching shard information the gateway allocator issues a reroute via a cluster state update task. #11421 introduced an optimization trying to avoid submitting unneeded reroutes when results for many shards come in together. This is done by having a rerouting flag, indicating a pending reroute is coming and thus any new incoming shard info doesn't need to issue a reroute. This flag wasn't reset upon an error in the reroute update task. Most notably - if a master node had to step during to a min_master_node violation, it could reject an ongoing reroute. Lacking to reset the flag causing it to skip any future reroute, when the node became master again.

Closes #11519
bleskes added a commit that referenced this pull request Jun 5, 2015
After asynchronously fetching shard information the gateway allocator issues a reroute via  a cluster state update task. #11421 introduced an optimization trying to avoid submitting unneeded reroutes when results for many shards come in together. This is done by having a rerouting flag, indicating a pending reroute is coming and thus any new incoming shard info doesn't need to issue a reroute. This flag wasn't reset upon an error in the reroute update task. Most notably - if a master node had to step during to a min_master_node violation, it could reject an ongoing reroute. Lacking to reset the flag causing it to skip any future reroute, when the node became master again.

Closes #11519
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. >enhancement v1.6.0 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants