-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for search failures if shard is in POST_RECOVERY #12600
Conversation
I like this a lot! I wonder if we can streamline the implementation here a little more and forward the original request with the shard ID together to have some code we can share between flush and refresh and whatever comes after that? Anyway I think we can just start with what we have here and see how flush turns out afterwards. |
I was actually wondering if we should also have a dedicated Action, similar to TransportBroadcastAction something like TransportReplicatedBroadcastAction or something that flush and refresh can derive from. |
|
||
private final IndicesService indicesService; | ||
private final TransportReplicatedRefreshAction replicatedRefreshAction; | ||
ClusterService clusterService; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be private final?
a294f81
to
0ab0903
Compare
…y not be searchable if still in POST_RECOVERY) see elastic#9421
…d when shard is in POST_RECOVERY
When a client indexes a documents and then calls refresh on the index then the document must be visible after that with search requests. This might not be the case if refresh is a BroadcastOperationAction, see DiscoveryWithServiceDisruptionsTests.testReadOnPostRecoveryShards related to elastic#9421
0ab0903
to
1100f56
Compare
99992c9
to
253e60f
Compare
I continued on this. I tried to generalize what I did for refresh so that it can be used for flush too. Now I wonder: should this work for synced flush too? |
@brwe what's the status here, do you wait for reviews? |
@brwe and I talked this through and we decided to try and simplify things and remove some intermediate abstractions. Concretely try to:
Also, we should break this into two PRs:
|
I made a pr for the first part here: #13068 |
prerequisite to elastic#9421 see also elastic#12600
Opened the second pr for the actual fix now here: #13246 |
Currently, we do not allow reads on shards which are in POST_RECOVERY which unfortunately can cause search failures on shards which just recovered if there no replicas (#9421).
The reason why we did not allow reads on shards that are in POST_RECOVERY is that after relocating a shard might miss a refresh if the node that executed the refresh is behind with cluster state processing. If that happens, a user might execute index/refresh/search but still not find the document that was indexed.
More details here.
@bleskes and I discussed this briefly and he mentioned we could make refresh a replicated operation that goes the same route that index operations go and thereby make sure that the refresh reaches every shard. In this case we could also allow reads on POST_RECOVERY.
I make this PR as a proof of concept so that we can discuss if this is actually a good idea.
This PR contains:
Let me know what you think. I would make the same changes for flush also.