
Conversation

@gmarouli (Contributor) commented Oct 23, 2025

Following #136813, we expose the new downsampling sampling method config to data stream lifecycle. This allows users to configure the sampling method directly in the lifecycle configuration. For example:

PUT _data_stream/my-ds/_lifecycle
{
  "data_retention": "10d",
  "downsampling_method": "last_value",
  "downsampling": [
    {
      "after": "1d",
      "fixed_interval": "5m"
    }
  ]
}
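For illustration, multiple downsampling rounds can be combined with the method in the same request. A sketch, assuming `aggregate` is the other supported method name (the retention and interval values are illustrative; as with existing downsampling rounds, each later `fixed_interval` must be a multiple of the previous one):

```json
PUT _data_stream/my-ds/_lifecycle
{
  "data_retention": "30d",
  "downsampling_method": "aggregate",
  "downsampling": [
    { "after": "1d", "fixed_interval": "5m" },
    { "after": "7d", "fixed_interval": "1h" }
  ]
}
```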

Specification PR: elastic/elasticsearch-specification#5594

@gmarouli gmarouli added >enhancement :Data Management/Data streams Data streams and their lifecycles :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data labels Oct 23, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine (Collaborator)

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine (Collaborator)

Hi @gmarouli, I've created a changelog YAML for you.

@gmarouli gmarouli requested a review from nielsbauman October 23, 2025 12:49
@nielsbauman (Contributor) left a comment


Thanks again for working on this, Mary! I posted some minor comments, so hopefully it doesn't take you too long to get through them.

round.config()
new DownsampleConfig(
    round.fixedInterval(),
    sourceIndexSamplingMethod == null ? requestedDownsamplingMethod : sourceIndexSamplingMethod
Contributor

What exactly is the reasoning behind this sampling method logic? Why are we specifying the sampling method from the source index metadata? Is there a reason we can't just always pass the requestedDownsamplingMethod? I'm not saying it's wrong, it's just unexpected to me. Could you explain your thinking at least in a comment in the code and optionally here in the PR?

Contributor Author

The goal of this is to make data stream lifecycle handle changes of the sampling method gracefully. Consider the following scenario.

  1. A user has a data stream with a couple of downsampling rounds, and some backing indices are halfway through their downsampling journey; for example, they have been downsampled once, but more rounds still apply to them.
  2. The user decides to change the sampling method of the lifecycle for this data stream.
  3. If we use the requested method to downsample an already downsampled index, the operation will fail, and DLM will keep reporting errors for these indices until it's time to delete them.

The same applies to ILM, but with ILM a user can create a new policy with the new sampling method, so it will only be applied to the newest backing indices.

Considering that we want data stream lifecycle to work seamlessly, we thought that using the downsampling method of the source index, when available, is closer to how users expect it to work (considering the limitations).

Does this make sense?
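The fallback described in this thread can be sketched as follows. This is a simplified, hypothetical illustration (the enum and class names are mine, not the actual Elasticsearch code): an already downsampled source index keeps the method recorded in its metadata, and only raw backing indices pick up the newly requested method.

```java
// Hypothetical sketch of the sampling-method fallback discussed above.
// An already downsampled index carries its sampling method in its metadata;
// re-downsampling it with a different method would fail, so we keep it.
enum SamplingMethod { AGGREGATE, LAST_VALUE }

class SamplingMethodResolver {
    static SamplingMethod resolve(SamplingMethod sourceIndexSamplingMethod, SamplingMethod requestedDownsamplingMethod) {
        // Raw backing indices (no method recorded yet) use the requested method;
        // already downsampled indices stick with their original method.
        return sourceIndexSamplingMethod == null ? requestedDownsamplingMethod : sourceIndexSamplingMethod;
    }
}
```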

Contributor

Yep, that definitely makes sense. Thanks for the explanation :). Could you put that explanation somewhere in this method?

Comment on lines +276 to +277
* This test ensures that when we change the sampling method, the already downsampled indices will use the original sampling method,
* while the raw data ones will be downsampled with the most recent configuration.
Contributor

Perhaps you were already planning this, but I'd recommend adding some docs somewhere that describe this update behavior. I don't know downsampling that well, so TBH I have trouble understanding this test (i.e. why the 5m index gets deleted), but that's probably on me.

@gmarouli (Contributor Author) commented Oct 29, 2025

That is a good point. I added it within the test, just like the similar comments in the previous tests. Let me know if it's clear now.

Contributor

I was actually referring to adding some documentation for users on how DLM handles changing the downsampling method. But I appreciate this a lot too! I still think there's value in documenting this for users, what do you think?

Contributor Author

I agree. My concern is that we need to look at documentation more holistically, not only at the API level. I will add a note there, but the sampling method needs to be documented and explained properly.

Contributor Author

There is a documentation task in the meta issue: #128357

reason: "Downsampling method in data stream lifecycle is available from 9.3"

- do:
indices.put_index_template:
Contributor

Should we have a test somewhere that verifies that the downsampling method is resolved correctly with certain permutations of component templates, index templates, and lifecycles? Or do you think that's already covered sufficiently by the ResettableValue infrastructure? This could be either in a YAML or Java REST tests, although the latter feels like a better choice to me.

Contributor Author

Good point, we should extend the template composition tests to include this too. However, I do not think this should be a YAML or a REST test; it feels too low-level for that. We already have

. I would add the sampling method here. What do you think?

@nielsbauman (Contributor) left a comment

Mostly LGTM. We just have one or two outstanding discussions to resolve; after that, I'll approve :)


Labels

:Data Management/Data streams Data streams and their lifecycles >enhancement :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data Team:Data Management Meta label for data/management team Team:StorageEngine v9.3.0


3 participants