-
Notifications
You must be signed in to change notification settings - Fork 25.6k
[Downsampling] Ensure downsample tasks in stateless work with replicas #130160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Downsampling] Ensure downsample tasks in stateless work with replicas #130160
Conversation
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, looks great Mary! I left one minor comment.
| * @param indexShardRouting the routing of the shard to be downsampled | ||
| * @return the set of candidate nodes downsampling can run on. | ||
| */ | ||
| Set<String> getEligibleNodes(IndexShardRoutingTable indexShardRouting) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rewrite this method like this:
Predicate<DiscoveryNode> getEligibleNodes(IndexShardRoutingTable indexShardRouting) {
if (isStateless) {
return candidateNode -> {
for (var shardRouting : indexShardRouting.replicaShards()) {
if (shardRouting.started()) {
return shardRouting.currentNodeId().equals(candidateNode.getId());
}
}
return false;
};
} else if (indexShardRouting.primaryShard().started()) {
return candidateNode -> indexShardRouting.primaryShard().currentNodeId().equals(candidateNode.getId());
} else {
return null;
}
}
That way the logic is better contained and no intermediate set is created?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been thinking about this, and I think both my current implementation and this can be a bit inefficient in big clusters because we iterate over all the candidate nodes that could be a lot.
Thinking about this some more, I am considering iterating over the eligible nodes which are probably way less than the total candidates and check if they are a candidate node.
What do you think?
About the code you shared, it does read nicely, but I am concerned that for every candidate node we are going to be building a new iterator over the replica shards ending up with more object churn than with the previous solution. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martijnvg I refactored it, I think it looks a bit cleaner now. Let me know what you think.
.../src/main/java/org/elasticsearch/xpack/downsample/DownsampleShardPersistentTaskExecutor.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, I left a minor comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one! Was a bit late reviewing, just one comment.
| * For simplicity, in non-stateless deployments we use the primary shard. | ||
| */ | ||
| private boolean isEligible(ShardRouting shardRouting) { | ||
| return shardRouting.started() && (isStateless ? shardRouting.isPromotableToPrimary() == false : shardRouting.primary()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit I think the last parenthesis could be simplified with shardRouting.isSearchable()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even better, I will change it.
elastic#130160) Downsample tasks run on nodes that hold a searchable version of the shard to be downsampled. So far, we chose the primary shard. This is sufficient in general but not in a stateless distribution when only non-primary shards are searchable. In this PR we add functionality to distinguish a stateless deployment and choose only search shards.
Downsample tasks run on nodes that hold a searchable version of the shard to be downsampled. So far, we chose the primary shard. This is sufficient in general but not in a stateless distribution when only non-primary shards are searchable.
In this PR we add functionality to distinguish a stateless deployment and choose only search shards.