Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ILM] Version the order of the actions in a phase. #97745

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

andreidan
Copy link
Contributor

@andreidan andreidan commented Jul 18, 2023

This adds versioning for the order of actions within a phase in ILM.

Once an index enters a phase it will execute all the actions in the phase
according to one order (the version it sets when entering the phase).
Note that, as this is a new field we store in the ILM execution state and
expose via the explain API, some clusters out there don't have a value
for the actions_order_version. We treat a lack of value as version 1.

On transition to the next phase, we execute the actions in the current
latest order.

The versioning of the actions order must obey the following invariants:

  1. All versions must exist, from current all the way down to VERSION_ONE
  2. All phases must contain all versions
  3. Higher versions must contain all the actions in the previous version, irrespective of order (i.e. we don't support removing actions)

Aside from the versioning this also changes the order of the downsample
action in the warm and cold phase (fixes #96734)
in order to exemplify the versioning and use a real life example as part of
an integration test.

Fixes #97444

@andreidan andreidan added >non-issue :Data Management/ILM+SLM Index and Snapshot lifecycle management WIP labels Jul 18, 2023
Comment on lines +856 to +876
public void testTimeseriesRegistryContainsAllActionsVersions() {
for (int version = INSTANCE.getLatestActionsOrderVersion(); version >= VERSION_ONE; version--) {
assertThat(ORDERED_VALID_HOT_ACTIONS.get(version), notNullValue());
assertThat(ORDERED_VALID_WARM_ACTIONS.get(version), notNullValue());
assertThat(ORDERED_VALID_COLD_ACTIONS.get(version), notNullValue());
assertThat(ORDERED_VALID_FROZEN_ACTIONS.get(version), notNullValue());
assertThat(ORDERED_VALID_DELETE_ACTIONS.get(version), notNullValue());
}
}

public void testTimeseriesRegistryDoesntRemoveActionsInFutureVersions() {
assertHigerVersionsContainActionsInLowerVersions(ORDERED_VALID_HOT_ACTIONS);

assertHigerVersionsContainActionsInLowerVersions(ORDERED_VALID_WARM_ACTIONS);

assertHigerVersionsContainActionsInLowerVersions(ORDERED_VALID_COLD_ACTIONS);

assertHigerVersionsContainActionsInLowerVersions(ORDERED_VALID_FROZEN_ACTIONS);

assertHigerVersionsContainActionsInLowerVersions(ORDERED_VALID_DELETE_ACTIONS);
}
Copy link
Contributor Author

@andreidan andreidan Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two tests encapsulate the versioning invariants:

  1. All versions must exist, from current all the way down to VERSION_ONE
  2. All phases must contain all versions
  3. Higher versions must contain all the actions in the previous version, irrespective of order (i.e. we don't support removing actions)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Higher versions must contain all the actions in the previous version, irrespective of order (i.e. we don't support removing actions)

Okay, I left a comment about this above, not sure if we should also have an assert for it in the code (though having a test does make me feel better)

@andreidan andreidan force-pushed the ilm-actions-order branch 2 times, most recently from 823dafb to 8775cfb Compare July 19, 2023 16:33
@dakrone dakrone self-requested a review July 19, 2023 16:52
@@ -60,9 +63,11 @@ public class PolicyStepsRegistry {
// keeps track of existing policies in the cluster state
private final SortedMap<String, LifecyclePolicyMetadata> lifecyclePolicyMap;
// keeps track of what the first step in a policy is, the key is policy name
private final Map<String, Step> firstStepMap;
private final Map<VersionedPolicyKey, Step> firstStepMap;
Copy link
Contributor Author

@andreidan andreidan Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the firstStep map versioned as if the first action in a phase is reordered in a subsequent version we'd get a different first step in the phase depending on which version we're executing

// keeps track of a mapping from policy/step-name to respective Step, the key is policy name
private final Map<String, Map<Step.StepKey, Step>> stepMap;
private final Map<VersionedPolicyKey, Map<Step.StepKey, Step>> stepMap;
Copy link
Contributor Author

@andreidan andreidan Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So stepMap is a bit tricky.

Given we don't allow removing actions across versions, so a later version will always contain all the actions in a previous version, and the fact that the stepMap is solely used in boolean stepExists(final String policy, final Step.StepKey stepKey) (note we do NOT return the Step that's in the stepMap), one could argue we don't technically need it versioned - it could hold only the latest actions version of the policy.

And that's true. The latest actions order will have all the step KEYs that exist in the previous versions. The value, the Step associated with a step KEY, could be different (if actions are reordered, the last step KEY in an action will point to different STEPs across actions order) however we never expose the Step.

So we could get away with not versioning the stepMap however it felt error prone to me. If we decide to expose the Step associated with a step key to the outside world we'd only ever expose the latest version Step (it'd be a very subtle bug). Also it feels wrong to me to have a steps registry that's not aware of all the versions of the steps.

@andreidan
Copy link
Contributor Author

@elasticmachine update branch

@andreidan andreidan removed the WIP label Jul 20, 2023
@andreidan andreidan marked this pull request as ready for review July 20, 2023 19:14
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Jul 20, 2023
@andreidan
Copy link
Contributor Author

@elasticmachine update branch

assertHigerVersionsContainActionsInLowerVersions(ORDERED_VALID_DELETE_ACTIONS);
}

private static void assertHigerVersionsContainActionsInLowerVersions(Map<Integer, List<String>> versionedActions) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in assertHigerVersionsContainActionsInLowerVersions -> assertHigherVersionsContainActionsInLowerVersions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ Thanks Mary

Is the rest ... LGTM ? :)

Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this Andrei, and I'm sorry it's taken me so long to get to reviewing it. I left some pretty minor comments but this looks solid otherwise.

) {

public LifecycleExecutionState {
if (actionsOrderVersion == null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also enforce that it's > 0, to avoid weird -1 issues

@@ -58,6 +65,7 @@ public record LifecycleExecutionState(
private static final String SNAPSHOT_INDEX_NAME = "snapshot_index_name";
private static final String SHRINK_INDEX_NAME = "shrink_index_name";
private static final String DOWNSAMPLE_INDEX_NAME = "rollup_index_name";
private static final String ACTIONS_ORDER_VERSION = "actions_order_version";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should go with action_order_version and avoid the awkward plural-action-but-singular-version noun

@@ -221,7 +230,8 @@ private IndexLifecycleExplainResponse(
String snapshotName,
String shrinkIndexName,
BytesReference stepInfo,
PhaseExecutionInfo phaseExecutionInfo
PhaseExecutionInfo phaseExecutionInfo,
Integer actionsOrderVersion
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We treat this as non-nullable elsewhere with the check in the state against null and setting it to 1. Should we remove the nullification here and make it a regular int, enforcing it to be 1 if it's unset?

@@ -31,4 +31,6 @@ public interface LifecycleType extends NamedWriteable {
* if a specific phase or lack of a specific phase is invalid.
*/
void validate(Collection<Phase> phases);

int getLatestActionsOrderVersion();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add javadocs for this?

Comment on lines +38 to +59
VERSION_ONE,
Stream.of(
SetPriorityAction.NAME,
UnfollowAction.NAME,
RolloverAction.NAME,
ReadOnlyAction.NAME,
DownsampleAction.NAME,
ShrinkAction.NAME,
ForceMergeAction.NAME,
SearchableSnapshotAction.NAME
).filter(Objects::nonNull).toList(),
VERSION_TWO,
Stream.of(
SetPriorityAction.NAME,
UnfollowAction.NAME,
RolloverAction.NAME,
ReadOnlyAction.NAME,
DownsampleAction.NAME,
ShrinkAction.NAME,
ForceMergeAction.NAME,
SearchableSnapshotAction.NAME
).filter(Objects::nonNull).toList()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these are the same, can we put them in a variable so it's clear here that the order isn't changing between versions?

public static final int VERSION_ONE = 1;
// moves the downsample action after the migrate action in the warm and cold phases
public static final int VERSION_TWO = 2;
public static final int CURRENT_VERSION = VERSION_TWO;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion, maybe we should call this LATEST_VERSION, what do you think?

Comment on lines +856 to +876
public void testTimeseriesRegistryContainsAllActionsVersions() {
for (int version = INSTANCE.getLatestActionsOrderVersion(); version >= VERSION_ONE; version--) {
assertThat(ORDERED_VALID_HOT_ACTIONS.get(version), notNullValue());
assertThat(ORDERED_VALID_WARM_ACTIONS.get(version), notNullValue());
assertThat(ORDERED_VALID_COLD_ACTIONS.get(version), notNullValue());
assertThat(ORDERED_VALID_FROZEN_ACTIONS.get(version), notNullValue());
assertThat(ORDERED_VALID_DELETE_ACTIONS.get(version), notNullValue());
}
}

public void testTimeseriesRegistryDoesntRemoveActionsInFutureVersions() {
assertHigerVersionsContainActionsInLowerVersions(ORDERED_VALID_HOT_ACTIONS);

assertHigerVersionsContainActionsInLowerVersions(ORDERED_VALID_WARM_ACTIONS);

assertHigerVersionsContainActionsInLowerVersions(ORDERED_VALID_COLD_ACTIONS);

assertHigerVersionsContainActionsInLowerVersions(ORDERED_VALID_FROZEN_ACTIONS);

assertHigerVersionsContainActionsInLowerVersions(ORDERED_VALID_DELETE_ACTIONS);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Higher versions must contain all the actions in the previous version, irrespective of order (i.e. we don't support removing actions)

Okay, I left a comment about this above, not sure if we should also have an assert for it in the code (though having a test does make me feel better)

Comment on lines +140 to +149
if (policyMetadataToRemove != null) {
for (int actionsOrderVersion = policyMetadataToRemove.getPolicy()
.getType()
.getLatestActionsOrderVersion(); actionsOrderVersion >= VERSION_ONE; actionsOrderVersion--) {
// remove all versions for the deleted policy
VersionedPolicyKey key = new VersionedPolicyKey(actionsOrderVersion, policyMetadataToRemove.getName());
firstStepMap.remove(key);
stepMap.remove(key);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a comment here might help when we want to understand why we need to loop through all versions.

Comment on lines +162 to +163
for (int actionsOrderVersion = policy.getType()
.getLatestActionsOrderVersion(); actionsOrderVersion >= VERSION_ONE; actionsOrderVersion--) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here about a comment explaining why we loop through the versions

@@ -214,7 +232,8 @@ private List<Step> getAllStepsForIndex(ClusterState state, Index index) {
ClientHelper.INDEX_LIFECYCLE_ORIGIN,
policyMetadata.getHeaders()
);
return policyMetadata.getPolicy().toSteps(policyClient, licenseState);
return policyMetadata.getPolicy()
.toSteps(policyClient, indexMetadata.getLifecycleExecutionState().actionsOrderVersion(), licenseState);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's extremely unlikely, but could we possibly reach this and have indexMetadata.getLifecycleExecutionState() return null and hit an NPE?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/ILM+SLM Index and Snapshot lifecycle management >non-issue Team:Data Management Meta label for data/management team v8.15.0
Projects
None yet
8 participants