[RCI] Keep index routing table for closed indices #34108
This commit adds a new NoOpEngine implementation based on the current ReadOnlyEngine. This new implementation uses an empty DirectoryReader with no segment readers and always returns 0 docs. The NoOpEngine is the default Engine created for IndexShards of closed indices. It expects an empty translog when it is instantiated. Relates to elastic#33888
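The core idea can be sketched in isolation. This is a toy model, not the actual NoOpEngine code, and the class and parameter names here are assumptions:

```java
// Toy sketch of the NoOpEngine idea (hypothetical names, not the real
// Elasticsearch implementation): an engine backed by zero segment readers
// always reports 0 documents, and it refuses to start when the translog
// still contains operations, mirroring the description above.
public class NoOpEngineSketch {

    public NoOpEngineSketch(int translogOperations) {
        // The real NoOpEngine expects an empty translog when instantiated.
        if (translogOperations != 0) {
            throw new IllegalStateException(
                "expected empty translog but found " + translogOperations + " operations");
        }
    }

    // With no segment readers there is nothing to count.
    public int numDocs() {
        return 0;
    }
}
```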
Pinging @elastic/es-distributed
I've left a few questions and suggestions. Core logic looks very good already.
* Removes the {@link IndexService} of indices whose state has changed.
* Closing the index services here will force them to be recreated later along with their shards.
*/
private void removeIndices(final ClusterChangedEvent event) {
Can we fold this into updateIndices? There we use the same iteration order and have the same lookup patterns, so I think we can save a lot of boilerplate and avoid iterating yet another time over all indices.
final AllocatedIndices.IndexRemovalReason reason =
    indexMetaData != null && indexMetaData.getState() == IndexMetaData.State.CLOSE ? CLOSED : NO_LONGER_ASSIGNED;
AllocatedIndices.IndexRemovalReason reason = NO_LONGER_ASSIGNED;
if (indexMetaData != null) {
I'm not sure it's worth the complexity of this code here just to provide a better message as to why an index service got removed. If you think it's useful, maybe factor the logic of determining the AllocatedIndices.IndexRemovalReason based on currentState and newState into a helper method so it can be reused by removeIndices.
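The suggested refactoring could look roughly like this. A minimal sketch with hypothetical names; the real helper would take the actual IndexMetaData type rather than Object:

```java
// Sketch of the reviewer's suggestion (hypothetical names): factor the
// removal-reason decision into one helper that both removeIndices and
// updateIndices can call, instead of duplicating the ternary in each.
public class RemovalReasonHelper {

    public enum State { OPEN, CLOSE }

    public enum IndexRemovalReason { CLOSED, NO_LONGER_ASSIGNED }

    // CLOSED only when the index metadata is still present and marks the index
    // as closed; missing metadata means the index is no longer assigned here.
    public static IndexRemovalReason reasonFor(Object indexMetaData, State state) {
        return (indexMetaData != null && state == State.CLOSE)
                ? IndexRemovalReason.CLOSED
                : IndexRemovalReason.NO_LONGER_ASSIGNED;
    }
}
```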
@Before
public void setUpService() {
    AllocationService allocationService = createAllocationService(Settings.EMPTY);
You can also use this method without needing to subclass ESAllocationTestCase.
CloseIndexRequest closeIndexRequest = new CloseIndexRequest(state.metaData().index(index).getIndex().getName());
state = cluster.closeIndices(state, closeIndexRequest);
IndexMetaData indexMetaData = state.metaData().index(index);
if (state.routingTable().allShards(index).stream().allMatch(ShardRouting::started)) {
why restrict the test like this?
OpenIndexRequest openIndexRequest = new OpenIndexRequest(state.metaData().index(index).getIndex().getName());
state = cluster.openIndices(state, openIndexRequest);
IndexMetaData indexMetaData = state.metaData().index(index);
// Do not reopen an index that was just closed
why not?
final Index index = indexService.index();
// do not start or fail shards of indices that were just closed or reopened, because
// they are still initializing and we must wait for the cluster state to be applied
// on node before starting or failing them
I'm confused. The cluster state was already applied on the node, no? I don't understand the extra restriction here.
@@ -141,6 +141,7 @@ public void testHiddenFiles() throws IOException {
    assertThat(e, hasToString(containsString(expected)));
}

@AwaitsFix(bugUrl = "")
what's wrong with this test?
Closed in favor of #38024
Note: this pull request is against the replicated-closed-indices branch.

This pull request changes the MetaDataIndexStateService class so that it no longer removes the routing table of closed indices but instead reinitializes the shard routings with an INITIALIZING state (and an INDEX_CLOSED unassigned reason). This way the primary shard allocator will take care of reallocating the shards to the nodes that already hold valid copies of the unassigned primaries, forcing the recreation of the shards on the data nodes. Thanks to #33903 the shards will be recreated using a NoOpEngine.

This pull request also adds a removeIndices() method to the IndicesClusterStateService that detects when the state of an index changes (CLOSE <-> OPEN) and closes the associated IndexService, forcing its recreation.

I created this pull request to get feedback on these two points. The PR also adds some necessary tests and adapts an important test, IndicesClusterStateServiceRandomUpdatesTests.

It also mutes a lot of tests (more than I expected...) that sometimes fail because:

For the first two cases, the tests need to be adapted, and I'd like to do this on a per-test basis; I haven't done it in this PR to reduce the noise.

For the last two cases, the Close Index API needs to be reworked so that it only closes an index when shards are active and fully in sync. This will be addressed in another PR.

Relates #33888
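The routing change described above can be modeled as a self-contained sketch. All names below are hypothetical stand-ins for the real ShardRouting and routing-table machinery, not the actual Elasticsearch classes:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the routing change (hypothetical names): closing an index keeps
// its routing entries but resets each shard to INITIALIZING with an
// INDEX_CLOSED unassigned reason, so the primary shard allocator can later
// reallocate shards to nodes that already hold valid copies of the data.
public class ClosedIndexRouting {

    public enum ShardState { STARTED, INITIALIZING }

    public static final class ShardRouting {
        public final int shardId;
        public final ShardState state;
        public final String unassignedReason; // null while the shard is started

        public ShardRouting(int shardId, ShardState state, String unassignedReason) {
            this.shardId = shardId;
            this.state = state;
            this.unassignedReason = unassignedReason;
        }
    }

    // Instead of dropping the routing table on close, keep one entry per shard
    // but reinitialize it, forcing the shards to be recreated on the data nodes.
    public static List<ShardRouting> closeIndex(List<ShardRouting> routing) {
        List<ShardRouting> reinitialized = new ArrayList<>();
        for (ShardRouting shard : routing) {
            reinitialized.add(new ShardRouting(shard.shardId, ShardState.INITIALIZING, "INDEX_CLOSED"));
        }
        return reinitialized;
    }
}
```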