-
Notifications
You must be signed in to change notification settings - Fork 25.7k
[ML] Add daily task to manage .ml-state indices #137653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add a daily maintenance task to roll over .ml-state indices if the index size exceeds a configurable default size (default 50GB). This replaces the previous method of using ILM to manage the state indices, as that was not a workable solution for serverless. This builds on the work done in PR elastic#136065 which provides similar functionality for results indices. WIP
|
Hi @edsavage, I've created a changelog YAML for you. |
|
Pinging @elastic/ml-core (Team:ML) |
davidkyle
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a question about what happens if you try to remove an alias from an index if the alias does not exist, does that throw an error? I'm wondering the the various alias request need to check if there is an alias first.
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/utils/MlIndexAndAlias.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/utils/MlIndexAndAlias.java
Show resolved
Hide resolved
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/utils/MlIndexAndAlias.java
Outdated
Show resolved
Hide resolved
| triggerDeleteJobsInStateDeletingWithoutDeletionTask(continueOnFailureListener("reset-jobs", resetJobs)); | ||
| } | ||
|
|
||
| private ActionListener<AcknowledgedResponse> continueOnFailureListener(String nextTaskName, Runnable next) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice this is a lot easier to read
| Map<String, String> variables = new HashMap<>(); | ||
| variables.put(VERSION_ID_PATTERN, String.valueOf(ML_INDEX_TEMPLATE_VERSION)); | ||
| // In serverless a different version of "state_index_template.json" is shipped that won't substitute the ILM policy variable | ||
| variables.put(INDEX_LIFECYCLE_NAME, ML_SIZE_BASED_ILM_POLICY_NAME); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assume there is an .ml-state-000001 index created in an earlier version (say 9.0) and that index has the ILM policy. There's now a race between the ILM policy and the ML maintenance service to roll over .ml-state-000001. Can the ML maintenance service ignore indices with a ILM policy? (in stateful but not serverless)
New ml-state indices created from this template won't have the ILM policy so there's no race once a new index is rolled over.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a filter for those indices that have an associated ILM policy to avoid the potential race when rolling over. Thanks!
.../java/org/elasticsearch/xpack/ml/integration/MlDailyMaintenanceServiceRolloverIndicesIT.java
Outdated
Show resolved
Hide resolved
.../java/org/elasticsearch/xpack/ml/integration/MlDailyMaintenanceServiceRolloverIndicesIT.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/MlDailyMaintenanceService.java
Outdated
Show resolved
Hide resolved
I think we're good here, no error is thrown when attempting to remove a non-existent alias. There's at least one integration test that exercises this scenario, e.g. https://github.com/edsavage/elasticsearch/blob/df2859c7589fbacdab908aeb8d56c7dd106e294d/x-pack/plugin/ml/src/internalClusterTest/java/org/elasticsearch/xpack/ml/integration/MlDailyMaintenanceServiceRolloverIndicesIT.java#L509 |
…nage_ad_state_indices
…search into manage_ad_state_indices
…nage_ad_state_indices
valeriy42
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work! 🚀 I have just some minor comments.
| rollAndUpdateAliases(clusterState, index, allIndices, updated); | ||
| try { | ||
| updated.actionGet(); | ||
| } catch (Exception ex) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error message in the next line says "Failed to rollover ML anomalies index" but this method is used for both results and state indices. The message should be more generic or parameterized.
| public void triggerRollResultsIndicesIfNecessaryTask(ActionListener<AcknowledgedResponse> finalListener) { | ||
| logger.info("[ML] maintenance task: triggerRollResultsIndicesIfNecessaryTask"); | ||
| // Helper function to check for the "index.lifecycle.name" setting on an index | ||
| private boolean hasIlm(String indexName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a unit test or a BWC test to verify that this works as expected.
davidkyle
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks for the ILM change
Add a daily maintenance task to roll over .ml-state indices if the index size exceeds a configurable default size (default 50GB).
This replaces the previous method of using ILM to manage the state indices, as that was not a workable solution for serverless.
This builds on the work done in PR #136065 which provides similar functionality for results indices.