Fast refresh indices should use search shards #113478
Force-pushed from 3f87579 to f1ff18a
Pinging @elastic/es-distributed (Team:Distributed)
LGTM! Looking forward to simplifying things with ES-9563 once this PR is successfully rolled out.
I have a few comments/questions.
.../org/elasticsearch/action/admin/indices/refresh/TransportUnpromotableShardRefreshAction.java (resolved)
server/src/main/java/org/elasticsearch/action/get/TransportGetAction.java (outdated, resolved)
- // Fast refresh indices do not depend on the unpromotables being refreshed
- boolean fastRefresh = IndexSettings.INDEX_FAST_REFRESH_SETTING.get(indexShard.indexSettings().getSettings());
- if (location != null && (indexShard.routingEntry().isSearchable() == false && fastRefresh == false)) {
+ if (location != null && indexShard.routingEntry().isSearchable() == false) {
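To make the effect of this change concrete, here is a small, hypothetical Java sketch that enumerates every input combination and prints the ones where the removed condition and the new condition disagree. The class and method names are made up for illustration, and `locationPresent` stands in for `location != null`; only the boolean logic is copied from the two `if` conditions.

```java
// Hypothetical sketch: compares the removed condition with its replacement.
// `locationPresent` stands in for `location != null`; names are illustrative only.
class RefreshConditionSketch {

    // Removed condition: fast refresh indices never took this branch.
    static boolean oldCondition(boolean locationPresent, boolean searchable, boolean fastRefresh) {
        return locationPresent && (searchable == false && fastRefresh == false);
    }

    // New condition: the fast refresh setting is no longer consulted.
    static boolean newCondition(boolean locationPresent, boolean searchable) {
        return locationPresent && searchable == false;
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        for (boolean location : values) {
            for (boolean searchable : values) {
                for (boolean fastRefresh : values) {
                    boolean before = oldCondition(location, searchable, fastRefresh);
                    boolean after = newCondition(location, searchable);
                    // Print only the combinations whose outcome changed.
                    if (before != after) {
                        System.out.printf("differs: location=%b searchable=%b fastRefresh=%b (old=%b, new=%b)%n",
                            location, searchable, fastRefresh, before, after);
                    }
                }
            }
        }
    }
}
```

Only one combination changes: a non-searchable shard of a fast refresh index with a non-null location, which previously skipped this branch and now takes it like any other index.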
This fixes it for future refreshes after the indexing node upgraded. But it does not guarantee immediate availability of the latest state on the search node. So we risk some seconds of non-realtime GET requests going backwards during such an upgrade? I think real-time GET requests will be saved by the wait-for generation, is that also your understanding?
The reasoning here is that this code runs on the primary/indexing node, and that the indexing node will be upgraded after the search nodes.
> But it does not guarantee immediate availability of the latest state on the search node.
Doesn't our upgrade process guarantee that, since search nodes are upgraded first?
> So we risk some seconds of non-realtime GET requests going backwards during such an upgrade?
A non-realtime GET coordinated by an old search node will go to the primary for execution.
A non-realtime GET coordinated by a new search node, with an old primary node, will go to the primary for execution.
A non-realtime GET coordinated by a new search node on a fully upgraded cluster will be executed on the search node, as is done for non-fast-refresh indices, which should be fine as well. I'm not sure I see when or why it might go backwards.
> I think real-time GET requests will be saved by the wait-for generation, is that also your understanding?
A real-time GET coordinated by an old search node will go to the primary for execution.
A real-time GET coordinated by a new search node, with an old primary node, will go to the primary for execution.
A real-time GET coordinated by a new search node on a fully upgraded cluster will be executed on the search node, as is done for non-fast-refresh indices, and should wait for the generation if necessary.
Please tell me if you see any corner cases I might have missed. It might be useful to think through the same combinations for searches and mgets, but I believe the story is similar for them.
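The cases listed above can be summarized as a small routing decision. The following Java sketch is a simplification under stated assumptions: `FastRefreshGetRouting`, `ExecutionTarget`, `route`, and its two flags are hypothetical names, "upgraded" means the node includes this change, and it does not model the actual TransportGetAction code. It covers real-time and non-real-time GETs alike, since the cases above route them the same way.

```java
// Hypothetical sketch of where a GET on a fast refresh index executes during a rolling
// upgrade, following the cases listed in the comment. Names are illustrative only.
class FastRefreshGetRouting {

    enum ExecutionTarget { PRIMARY, SEARCH_SHARD }

    static ExecutionTarget route(boolean coordinatorUpgraded, boolean clusterFullyUpgraded) {
        if (coordinatorUpgraded == false) {
            // An old search node coordinating the GET always sends it to the primary.
            return ExecutionTarget.PRIMARY;
        }
        if (clusterFullyUpgraded == false) {
            // A new search node keeps using the primary until every node, including the
            // primary's node, has been upgraded.
            return ExecutionTarget.PRIMARY;
        }
        // Fully upgraded: behave like non-fast-refresh indices. A real-time GET on the
        // search shard may additionally wait for the required generation.
        return ExecutionTarget.SEARCH_SHARD;
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        for (boolean coordinatorUpgraded : flags) {
            for (boolean clusterFullyUpgraded : flags) {
                System.out.printf("coordinator upgraded=%b, cluster fully upgraded=%b -> %s%n",
                    coordinatorUpgraded, clusterFullyUpgraded,
                    route(coordinatorUpgraded, clusterFullyUpgraded));
            }
        }
    }
}
```

The combination of an old coordinator on a fully upgraded cluster cannot occur in practice; the method handles it only so that it is defined for every input.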
I think you are right that it works out. The upgrade will force a relocation, which forces a flush, bringing things back into order. Thanks.
server/src/main/java/org/elasticsearch/index/cache/bitset/BitsetFilterCache.java (resolved)
server/src/main/java/org/elasticsearch/action/get/TransportShardMultiGetAction.java (outdated, resolved)
Force-pushed from f1ff18a to 40df9da
Thanks @henningandersen for the feedback! Feel free to review again.
Fast refresh indices should now behave like non-fast-refresh indices in how they execute (m)gets and searches, i.e., they should use the search shards.
For BWC, we define a new transport version. We expect search shards to be upgraded before promotable shards. Until the cluster is fully upgraded, the promotable shards (whether upgraded or not) will still receive and execute gets/searches locally.
Relates ES-9573
Relates ES-9579
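A minimal sketch of the BWC gate described above, assuming the decision depends on the minimum transport version across all nodes, so that search shards are used only once the cluster is fully upgraded. Plain integers stand in for Elasticsearch's `TransportVersion` class, and `TransportVersionGateSketch`, `FAST_REFRESH_USES_SEARCH_SHARDS`, its value, and the helper methods are hypothetical.

```java
import java.util.List;

// Hypothetical sketch: fast refresh indices switch to search shards only once every node
// is on (or after) the new transport version. Plain ints replace Elasticsearch's
// TransportVersion; the constant and its value are made up for illustration.
class TransportVersionGateSketch {

    // Hypothetical stand-in for the transport version introduced by this change.
    static final int FAST_REFRESH_USES_SEARCH_SHARDS = 2;

    // The cluster can only rely on behavior that its oldest node understands.
    static int minTransportVersion(List<Integer> nodeVersions) {
        return nodeVersions.stream().mapToInt(Integer::intValue).min().orElseThrow();
    }

    // True only when the cluster is fully upgraded; until then gets/searches stay on the primary.
    static boolean useSearchShards(List<Integer> nodeVersions) {
        return minTransportVersion(nodeVersions) >= FAST_REFRESH_USES_SEARCH_SHARDS;
    }

    public static void main(String[] args) {
        // Search nodes upgraded first, indexing node still on the old version.
        List<Integer> midUpgrade = List.of(2, 2, 1);
        // Every node upgraded.
        List<Integer> fullyUpgraded = List.of(2, 2, 2);
        System.out.println("mid-upgrade uses search shards: " + useSearchShards(midUpgrade));
        System.out.println("fully upgraded uses search shards: " + useSearchShards(fullyUpgraded));
    }
}
```

Gating on the minimum version, rather than on the coordinating node's own version, is what keeps an upgraded search node from routing to search shards while the indexing node is still on the old version.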
Force-pushed from 8f85a6c to da342b0
Looks good; the main remaining issue is the BitsetFilterCache.
server/src/main/java/org/elasticsearch/action/get/TransportShardMultiGetAction.java (outdated, resolved)
server/src/main/java/org/elasticsearch/index/cache/bitset/BitsetFilterCache.java (resolved)
As noted while reviewing PR elastic#113478, the bitset filter cache was wrongly loaded eagerly only for fast refresh indices on index nodes. It should instead be loaded eagerly for any index that can be searched. This PR fixes that.
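The corrected rule, "load eagerly for any index that can be searched", can be sketched as a predicate over shard copies. This is a hypothetical illustration rather than the actual BitsetFilterCache code: `BitsetEagerLoadSketch`, `ShardCopy`, `shouldLoadEagerly`, and the `eagerLoadingEnabled` flag (standing in for the index-level eager-loading setting) are assumed names.

```java
import java.util.List;

// Hypothetical sketch of the corrected eager-loading rule. None of these names mirror
// BitsetFilterCache's real fields or methods.
class BitsetEagerLoadSketch {

    // Illustrative description of one shard copy.
    record ShardCopy(String index, boolean fastRefresh, boolean searchable) {}

    static boolean shouldLoadEagerly(boolean eagerLoadingEnabled, ShardCopy copy) {
        // The fast refresh flag is deliberately ignored: only searchability matters.
        return eagerLoadingEnabled && copy.searchable();
    }

    public static void main(String[] args) {
        List<ShardCopy> copies = List.of(
            new ShardCopy("logs", false, true),   // searchable copy of a regular index
            new ShardCopy("logs", false, false),  // non-searchable copy of a regular index
            new ShardCopy("fast", true, true),    // searchable copy of a fast refresh index
            new ShardCopy("fast", true, false)    // non-searchable copy of a fast refresh index
        );
        for (ShardCopy copy : copies) {
            System.out.println(copy + " -> load eagerly: " + shouldLoadEagerly(true, copy));
        }
    }
}
```

Keying the decision on searchability alone means fast refresh and regular indices follow the same rule, which matches the intent of the fix.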
LGTM.
server/src/main/java/org/elasticsearch/index/cache/bitset/BitsetFilterCache.java (resolved)
💚 Backport successful
IIUC, it is not absolutely necessary to backport this PR to 8.16 since the change affects serverless only and serverless works on the
Hi @ywangd, I think I backported it because I saw that the transport versions are still on the 8 major version, so it made sense to me that this should be backported. But no, it was indeed not necessary to backport it. It has nothing to do with any future work, nor does it affect stateful.
Thanks for the explanation. 🙏