Skip to content

Conversation

@gmarouli
Copy link
Contributor

@gmarouli gmarouli commented Mar 12, 2024

In this PR we introduce the effective retention calculation and we use it to determine the indices that are past retention.

This PR does not include exposing the effective retention via the different GET APIs, this will come in a subsequent PR.

Part of #106169

@gmarouli gmarouli added >non-issue :Data Management/Data streams Data streams and their lifecycles labels Mar 12, 2024
@gmarouli gmarouli requested a review from masseyke March 12, 2024 20:26
@elasticsearchmachine elasticsearchmachine added Team:Data Management Meta label for data/management team v8.14.0 labels Mar 12, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@gmarouli gmarouli requested a review from parkertimmins March 12, 2024 20:26
@gmarouli
Copy link
Contributor Author

@elasticmachine update branch

* @return the time period or null, null represents that data should never be deleted.
* @deprecated use {@link #getEffectiveDataRetention(DataStreamGlobalRetention)}
*/
@Deprecated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to not just delete this method? It's not part of any public API, and it looks like it's only used in 2 tests.

Copy link
Contributor Author

@gmarouli gmarouli Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be easier to track down the places that it is being used to properly update them in the follow up PR, because I will be able to use the Find usages.

It's a convenience thing and they will be removed in the next PR.

Of course, if you feel strongly about it I can just inline it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future PR sounds good.

* The least amount of time data should be kept by elasticsearch.
* @return the time period or null, null represents that data should never be deleted.
*/
@Nullable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like this can actually ever return null, can it? Also, maybe best to make this package private? The only reason it needs to be exposed at all is for a unit test (and do we even need the source there since it doesn't seem to be used in any production code, or could it just use the method above (I haven't read the test yet so maybe it will become clear then)?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking at the test, I think we could test the same things without using the RetentionSource at all. Then we could get rid of this method and the RetentionSource enum, and just test getEffectiveDataRetention directly, right? Or is this something that's going to be used in a future PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's something coming in a future PR. When we return the lifecycle in a few GET APIs, the lifecycle section will the following structure:

{
  "enabled": true,
  "data_retention": "365d",
  "effective_retention": "90d",
  "retention_determined_by": "max_global_retention"
}

I thought it makes sense to calculate them together always to avoid duplicating this logic.

* Converts the data stream lifecycle to XContent and injects the RolloverConditions if they exist.
* @deprecated use {@link #toXContent(XContentBuilder, Params, RolloverConfiguration, DataStreamGlobalRetention)}
*/
@Deprecated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only used in 4 places -- might as well just remove this method rather than adding deprecation, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be easier to track down the places that it is being used to properly update them in the follow up PR, because I will be able to use the Find usages.

It's a convenience thing and they will be removed in the next PR.

Of course, if you feel strongly about it I can just inline it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with removing them in a follow-up PR!

effectiveDataRetentionWithSource = lifecycleRetention.getEffectiveDataRetentionWithSource(
new DataStreamGlobalRetention(randomBoolean() ? null : TimeValue.timeValueDays(10), maxRetentionLessThanDataStream)
);
assertThat(effectiveDataRetentionWithSource.v1(), equalTo(maxRetentionLessThanDataStream));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A minor thing, but I think one case that isn't covered is when max retention is greater than data stream retention (you cover it being equal to above, and less than here). Seems worth having in a test just because it's likely the most common situation.


List<Index> backingIndices = dataStream.getIndicesPastRetention(metadata::index, () -> now, globalRetention);
assertThat(backingIndices.size(), is(2));
assertThat(backingIndices.get(0).getName(), is(DataStream.getDefaultBackingIndexName(dataStreamName, 1)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to do dataStream.getIndices().get(0).getName() and dataStream.getIndices().get(1).getName(). I've had a lot of problems with this method in tests that were run across midnight (extremely unlikely in a unit test like this, but easier to deal with it now than later).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes! Great point!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it on both retention related tests

assertThat(dataStream.getIndicesPastRetention(metadata::index, () -> now).isEmpty(), is(true));
List<Index> backingIndices = dataStream.getIndicesPastRetention(metadata::index, () -> now, globalRetention);
assertThat(backingIndices.size(), is(2));
assertThat(backingIndices.get(0).getName(), is(DataStream.getDefaultBackingIndexName(dataStreamName, 1)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above.

@gmarouli
Copy link
Contributor Author

@elasticmachine update branch

@gmarouli
Copy link
Contributor Author

@elasticmachine update branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Data Management/Data streams Data streams and their lifecycles >non-issue Team:Data Management Meta label for data/management team v8.14.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants