[ILM] Delete step deletes data stream with only one index #105772
We seem to have a couple of checks to make sure we delete the data stream when the last index reaches the delete step; however, these checks contradict each other. Namely, the first check uses `Index` equality (UUID included) and the second just checks the index name.

So if a data stream with just one index (the write index) is restored from a snapshot (different UUID), we fail the first index equality check, fall through to the second check `dataStream.getWriteIndex().getName().equals(indexName)`, and fail the delete step (in a non-retryable way :( ) because we don't want to delete the write index of a data stream (but we really do if the data stream has only one index).

This PR makes 2 changes:
1. Use index name equality everywhere in the step (we already looked up the index abstraction and the parent data stream, so we know for sure the managed index is part of the data stream).
2. Do not throw an exception when we got here via a write index that is NOT the last index in the data stream, but report the exception so we keep retrying this step (i.e. this enables our users to simply execute a manual rollover and the index is deleted by ILM eventually on retry).
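The decision described above can be sketched in isolation. This is a simplified model under stated assumptions, not the actual step implementation: the class `DeleteStepDecisionSketch`, the `Outcome` enum, and the `decide` signature are all hypothetical names introduced for illustration, and the data stream is reduced to a backing index count plus the write index name.

```java
// Simplified model of the delete-step decision; NOT the Elasticsearch code.
// All names in this class are hypothetical and exist only for illustration.
final class DeleteStepDecisionSketch {

    enum Outcome {
        DELETE_DATA_STREAM, // the managed index is the only index in the data stream
        RETRY_LATER,        // the managed index is the write index, but other indices exist
        DELETE_INDEX        // the managed index is a regular (non-write) backing index
    }

    static Outcome decide(int backingIndexCount, String writeIndexName, String indexName) {
        boolean isWriteIndex = writeIndexName.equals(indexName);
        // Name equality on purpose: per the description, an index restored from a
        // snapshot has a different UUID, so a UUID-based comparison would not match here.
        if (backingIndexCount == 1 && isWriteIndex) {
            return Outcome.DELETE_DATA_STREAM;
        }
        if (isWriteIndex) {
            // Report a failure instead of throwing so the step is retried,
            // for example after a manual rollover moves the write index.
            return Outcome.RETRY_LATER;
        }
        return Outcome.DELETE_INDEX;
    }
}
```

With this shape, a single-index data stream always resolves to deleting the data stream, and the write index of a multi-index data stream resolves to a retryable outcome rather than a hard failure.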
Pinging @elastic/es-data-management (Team:Data Management)
Hi @andreidan, I've created a changelog YAML for you.
@elasticmachine update branch
LGTM. I left one comment asking for a code comment and another about enhancing the test. Thanks for finding this, Andrei!
@@ -41,7 +41,8 @@ public void performDuringNoSnapshot(IndexMetadata indexMetadata, ClusterState cu

 if (dataStream != null) {
 assert dataStream.getWriteIndex() != null : dataStream.getName() + " has no write index";
-if (dataStream.getIndices().size() == 1 && dataStream.getIndices().get(0).equals(indexMetadata.getIndex())) {
+
+if (dataStream.getIndices().size() == 1 && dataStream.getWriteIndex().getName().equals(indexName)) {
Can you add a comment about why we use name equality here so it doesn't get accidentally changed back? (I know we have tests, but it's still easy to stop someone from wasting work)
}

@Override
public void onFailure(Exception e) {
I think that now that this isn't throwing directly, we need to have a latch or other mechanism to ensure that the `onFailure` handler was actually invoked. Otherwise, if we were to introduce a bug where neither `onResponse` nor `onFailure` were called, then we wouldn't hit any asserts and the test would pass (when it shouldn't).
The execution in this particular case is not async, as we're not getting to the callback as part of a client interaction. This is all sync, as it's part of the step validation.
I've stubbed the client to fail if it's called in this test, so that changes to the ILM step structure will cause this test to fail.
Ah I (later) understood what you meant - opened #105914 to make sure we fail the test if the listener is not called at all
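The concern in this thread, a test that passes silently when no callback fires, can be sketched as follows. This is a minimal illustration and not the test added in #105914: the `Listener` interface, `performStep`, and `runAndRecord` are hypothetical stand-ins, and the step is assumed to call the listener synchronously.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: record which listener callback ran so a test can
// fail when neither callback is invoked. Not the actual ILM test code.
final class ListenerInvokedSketch {

    // Stand-in for a listener with success and failure callbacks.
    interface Listener {
        void onResponse();
        void onFailure(Exception e);
    }

    // Stand-in for the step under test: reports failure through the
    // listener instead of throwing when asked to delete the write index.
    static void performStep(boolean isWriteIndex, Listener listener) {
        if (isWriteIndex) {
            listener.onFailure(new IllegalStateException("index is the write index"));
        } else {
            listener.onResponse();
        }
    }

    // Runs the step and returns "response", "failure", or "none".
    static String runAndRecord(boolean isWriteIndex) {
        AtomicReference<String> invoked = new AtomicReference<>("none");
        performStep(isWriteIndex, new Listener() {
            @Override
            public void onResponse() {
                invoked.set("response");
            }

            @Override
            public void onFailure(Exception e) {
                invoked.set("failure");
            }
        });
        return invoked.get();
    }
}
```

A test would then assert on the recorded value, so a result of "none" fails the test instead of passing unnoticed. If the callback could fire on another thread, a `java.util.concurrent.CountDownLatch` awaited with a timeout would play the same role.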
@elasticmachine update branch
💚 Backport successful