-
Notifications
You must be signed in to change notification settings - Fork 25.6k
[DSL Global Retention] Calculate and use global retention in DSL #106268
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Pinging @elastic/es-data-management (Team:Data Management) |
|
@elasticmachine update branch |
| * @return the time period or null, null represents that data should never be deleted. | ||
| * @deprecated use {@link #getEffectiveDataRetention(DataStreamGlobalRetention)} | ||
| */ | ||
| @Deprecated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to not just delete this method? It's not part of any public API, and it looks like it's only used in 2 tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be easier to track down the places that it is being used to properly update them in the follow up PR, because I will be able to use the Find usages.
It's a convenience thing and they will be removed in the next PR.
Of course, if you feel strongly about it I can just inline it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Future PR sounds good.
| * The least amount of time data should be kept by elasticsearch. | ||
| * @return the time period or null, null represents that data should never be deleted. | ||
| */ | ||
| @Nullable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like this can actually ever return null, can it? Also, maybe best to make this package private? The only reason it needs to be exposed at all is for a unit test (and do we even need the source there since it doesn't seem to be used in any production code, or could it just use the method above (I haven't read the test yet so maybe it will become clear then)?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After looking at the test, I think we could test the same things without using the RetentionSource at all. Then we could get rid of this method and the RetentionSource enum, and just test getEffectiveDataRetention directly, right? Or is this something that's going to be used in a future PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's something coming in a future PR. When we return the lifecycle in a few GET APIs, the lifecycle section will the following structure:
{
"enabled": true,
"data_retention": "365d",
"effective_retention": "90d",
"retention_determined_by": "max_global_retention"
}
I thought it makes sense to calculate them together always to avoid duplicating this logic.
| * Converts the data stream lifecycle to XContent and injects the RolloverConditions if they exist. | ||
| * @deprecated use {@link #toXContent(XContentBuilder, Params, RolloverConfiguration, DataStreamGlobalRetention)} | ||
| */ | ||
| @Deprecated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only used in 4 places -- might as well just remove this method rather than adding deprecation, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be easier to track down the places that it is being used to properly update them in the follow up PR, because I will be able to use the Find usages.
It's a convenience thing and they will be removed in the next PR.
Of course, if you feel strongly about it I can just inline it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm good with removing them in a follow-up PR!
| effectiveDataRetentionWithSource = lifecycleRetention.getEffectiveDataRetentionWithSource( | ||
| new DataStreamGlobalRetention(randomBoolean() ? null : TimeValue.timeValueDays(10), maxRetentionLessThanDataStream) | ||
| ); | ||
| assertThat(effectiveDataRetentionWithSource.v1(), equalTo(maxRetentionLessThanDataStream)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A minor thing, but I think one case that isn't covered is when max retention is greater than data stream retention (you cover it being equal to above, and less than here). Seems worth having in a test just because it's likely the most common situation.
|
|
||
| List<Index> backingIndices = dataStream.getIndicesPastRetention(metadata::index, () -> now, globalRetention); | ||
| assertThat(backingIndices.size(), is(2)); | ||
| assertThat(backingIndices.get(0).getName(), is(DataStream.getDefaultBackingIndexName(dataStreamName, 1))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be better to do dataStream.getIndices().get(0).getName() and dataStream.getIndices().get(1).getName(). I've had a lot of problems with this method in tests that were run across midnight (extremely unlikely in a unit test like this, but easier to deal with it now than later).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes! Great point!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it on both retention related tests
| assertThat(dataStream.getIndicesPastRetention(metadata::index, () -> now).isEmpty(), is(true)); | ||
| List<Index> backingIndices = dataStream.getIndicesPastRetention(metadata::index, () -> now, globalRetention); | ||
| assertThat(backingIndices.size(), is(2)); | ||
| assertThat(backingIndices.get(0).getName(), is(DataStream.getDefaultBackingIndexName(dataStreamName, 1))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above.
|
@elasticmachine update branch |
|
@elasticmachine update branch |
In this PR we introduce the effective retention calculation and we use it to determine the indices that are past retention.
This PR does not include exposing the effective retention via the different GET APIs, this will come in a subsequent PR.
Part of #106169