@gmarouli gmarouli commented Oct 6, 2025

In this PR, we propose adding practical tips for downsampling. For now, this includes a guideline on how to choose the downsampling interval and, specifically for ILM, an explanation of how downsampling relates to tiers. After elastic/elasticsearch#135834, we should also add the option to disable force merge here.

@gmarouli gmarouli requested review from a team as code owners October 6, 2025 13:57
@gmarouli gmarouli requested review from kkrik-es and marciw October 6, 2025 13:58

kkrik-es commented Oct 6, 2025

Can we wait for Marci to submit her update, then port these tips to the new structure?

## Practical tips

Downsampling requires reading and indexing the contents of a backing index. The following guidelines can help you get the most out of it.

Contributor

Do we need a note about rollover, to avoid creating backing indices that are too big?

Contributor Author

I have been going back and forth on this. For ILM it's easy because it's part of the policy. For data stream lifecycle, I would suggest that if we really think it should be lower, maybe we should set it to something lower. Right?

Contributor

You mean, update the default? We can do that at a later point, but what about older versions, or ILM configurations with existing rollover overrides? It could still help to suggest a best practice here.

Contributor Author

Yes, we could update the default; that would apply to all versions unless the user chose to override it. I restructured it a bit so we can have ILM-focused recommendations. But if we think it should be reduced, we should consider updating the default for DLM as well.

Contributor

Let's file a tracking issue for this, so that we don't forget.

gmarouli commented Oct 6, 2025

> Can we wait for Marci to submit her update, then port these tips to the new structure?

It is created on the updated downsampling page. Right?

gmarouli and others added 2 commits October 7, 2025 11:02
Co-authored-by: Kostas Krikellas <131142368+kkrik-es@users.noreply.github.com>
gmarouli and others added 2 commits October 7, 2025 14:44
Co-authored-by: Kostas Krikellas <131142368+kkrik-es@users.noreply.github.com>

### Choosing the downsampling interval

When choosing the downsampling interval, you need to consider the original sampling rate of your measurements. Ideally, you would like an interval that would reduce your number of documents by a significant amount. For example, if a sensor sends data every 10 seconds downsampling to 1 minute would reduce the number of documents by 83%, compared to downsampling to 5 minutes by 96%.

Contributor

Suggested change:
- When choosing the downsampling interval, you need to consider the original sampling rate of your measurements. Ideally, you would like an interval that would reduce your number of documents by a significant amount. For example, if a sensor sends data every 10 seconds downsampling to 1 minute would reduce the number of documents by 83%, compared to downsampling to 5 minutes by 96%.
+ When choosing the downsampling interval, you need to consider the original sampling rate of your measurements. Ideally, you would like an interval that would reduce your number of documents by a significant amount. For example, if a sensor sends data every 10 seconds, downsampling to 1 minute would reduce the number of documents by 83%, compared to downsampling to 5 minutes by 96%.
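
The percentages in this paragraph can be checked with a short calculation. The sketch below assumes evenly spaced samples and exactly one downsampled document per time series per interval; `doc_reduction` is a hypothetical helper written for this illustration, not part of any Elasticsearch API.

```python
def doc_reduction(sampling_seconds: float, interval_seconds: float) -> float:
    """Return the fraction of documents removed by downsampling.

    Assumes one raw document every `sampling_seconds` and one downsampled
    document per `interval_seconds` for each time series.
    """
    if sampling_seconds <= 0:
        raise ValueError("sampling_seconds must be positive")
    if interval_seconds < sampling_seconds:
        raise ValueError("interval must be at least the sampling period")
    samples_per_interval = interval_seconds / sampling_seconds
    # N raw documents collapse into 1, so the reduction is 1 - 1/N.
    return 1 - 1 / samples_per_interval


# A sensor reporting every 10 seconds, as in the example text.
for label, interval in (("1 minute", 60), ("5 minutes", 300)):
    print(f"{label}: {doc_reduction(10, interval):.1%}")
```

Under these assumptions the 1-minute interval removes 5 of every 6 documents (about 83%), and the 5-minute interval removes 29 of every 30 (about 96.7%).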

@kkrik-es kkrik-es left a comment

Let's wait for Marci to have a pass too.

@marciw marciw left a comment

I gave this a quick edit -- let me know if anything's unclear :)

gmarouli and others added 2 commits October 8, 2025 11:05
Co-authored-by: Marci W <333176+marciw@users.noreply.github.com>

### Reduce the index size (ILM only)

When configuring an ILM policy with downsampling, use the [rollover action](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-rollover.md) in the `hot` phase to control index size. Using smaller indices helps to minimize the impact of downsampling on a cluster's performance.
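
A minimal sketch of an ILM policy body along these lines, built in Python so it can be printed as JSON. The thresholds (`10gb`, `1d`) and the `1h` interval are illustrative assumptions, not recommendations from this PR, and `build_policy` is a hypothetical helper. Placing the `downsample` action in the `warm` phase is also just one possible layout.

```python
import json


def build_policy(max_primary_shard_size: str = "10gb",
                 max_age: str = "1d",
                 fixed_interval: str = "1h") -> dict:
    """Build an ILM policy body that rolls over in the hot phase and
    downsamples in the warm phase. All default values are placeholders."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Rollover thresholds bound the size of each backing
                        # index, and so the amount of data one downsampling
                        # run has to read and re-index.
                        "rollover": {
                            "max_primary_shard_size": max_primary_shard_size,
                            "max_age": max_age,
                        }
                    }
                },
                "warm": {
                    "actions": {
                        "downsample": {"fixed_interval": fixed_interval}
                    }
                },
            }
        }
    }


print(json.dumps(build_policy(), indent=2))
```

The printed JSON is the kind of body that would be sent with `PUT _ilm/policy/<policy-name>`; lowering the rollover thresholds is what produces the smaller indices described above.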

Contributor Author

I think it's important here to say use and not define. When writing an ILM policy a user needs to define a rollover action no matter what. However, if they are using downsampling, they can consider using this to reduce the size of their index. I want us to be careful to not imply that a user needs to define it only if they are trying to reduce the size. @marciw does this make sense?

Contributor

Yes, I see what you mean. We could try "set the rollover action to run in the hot phase".

("Use" doesn't seem idiomatic to me here, so I'm hoping we can find an alternative.)

@gmarouli gmarouli requested a review from marciw October 8, 2025 08:16

gmarouli commented Oct 8, 2025

I am adding @leontyevdv as a reviewer because he recently watched a tutorial on how to configure TSDS and he can tell us how well it reads for a user.

@gmarouli gmarouli requested a review from leontyevdv October 8, 2025 08:18
@gmarouli gmarouli self-assigned this Oct 8, 2025

@marciw marciw left a comment

Made a few more comments but approving to unblock 🚀

@gmarouli gmarouli merged commit b916a47 into main Oct 9, 2025
7 checks passed
@gmarouli gmarouli deleted the downsampling-practical-tips branch October 9, 2025 07:24

4 participants