Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Downsampling: copy the _tier_preference setting #96982

Conversation

salvatore-campagna
Copy link
Contributor

We need to override the actual value because the target index already has a default value set,
"data_content". Because of the default value set on the target index we were not overriding it
with the settings coming from the source index. This results in potential usage of different tiers
for the downsampling source and target index.

Here we explicitly override the default with the setting coming from the source index.
As a result the downsampling target index will use the same _tier_preference of the
downsampling source index.

Resolves #96733

We need to override the actual value because the target
index already has a default value set, "data_content".
Because of the default value set on the target index
we were not overriding it with the settings coming from
the source index. This results in potential usage
of different tiers for the downsampling source and target
index.
@salvatore-campagna salvatore-campagna added >bug :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data labels Jun 21, 2023
@salvatore-campagna salvatore-campagna self-assigned this Jun 21, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added v8.10.0 Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Jun 21, 2023
@elasticsearchmachine
Copy link
Collaborator

Hi @salvatore-campagna, I've created a changelog YAML for you.

Copy link
Contributor

@andreidan andreidan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for fixing this Salvatore 🚀

@@ -595,6 +598,10 @@ private IndexMetadata.Builder copyIndexMetadata(IndexMetadata sourceIndexMetadat
if (FORBIDDEN_SETTINGS.contains(key)) {
continue;
}

if (OVERRIDE_SETTINGS.contains(key)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method could be package private and even static in order to be unit tested. Perhaps in a subsequent PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to unit test the copyIndexMetadata() method and changing the visibility to package-protected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also need to make this static and then make indexScopedSettings a parameter to this method.

@andreidan
Copy link
Contributor

This PR assumes that the downsample index lives in the same tier as the source index. If that's the desired requirement here this is ready to be merged.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a unit test, otherwise LGTM.

@@ -91,6 +91,275 @@
- match: { test-rollup.settings.index.default_pipeline: null }
- match: { test-rollup.settings.index.final_pipeline: null }

---
"Downsample index with tier preference":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe only keep one yaml test? (downsampling data stream with tier preference?)
I think rest should be tested via unit tests. We have been adding a lot of yaml tests,
which is good, but also are more costly compared to unit tests. If things can be tested
with unit tests we should always prefer that and keep using yaml tests to real test integration at the rest api level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote different tests because in the two scenarios the default value we are overwriting is different. In one case it is "data_content" while in the other is "data_hot". My understanding is that this setting is applied differently and I wanted to make sure we are ALWAYS overwriting that.

@salvatore-campagna
Copy link
Contributor Author

This PR assumes that the downsample index lives in the same tier as the source index. If that's the desired requirement here this is ready to be merged.

I assume this for the following reasons:

  1. if downsampling is triggered by means of a simple API call (opposed to using ILM action) we have no other (easy) way to know about the existence of other tiers. A user might not care about tiers or use only some tiers.
  2. in my opinion, if using the API, the POLA principle suggests that if the source index has some settings we try to keep those settings as close as possible to the original.
  3. if we use downsampling through ILM then we will apply the setting using an appropriate ILM step that take the ILM phase as input. In that case it makes sense to me that, if a downsampling action is configured in the warm phase, downsampling happens using resources from the warm tier (and so on).


private static void assertSourceSettings(final IndexMetadata indexMetadata, final Settings settings) {
assertEquals(indexMetadata.getIndex().getName(), settings.get(IndexMetadata.INDEX_DOWNSAMPLE_SOURCE_NAME_KEY));
assertEquals(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While writing this test I realised that copying the tier preference from the source index might not always be the right thing to do.

I wonder if the following scenario is possible considering that, at least at the moment, the name of the downsample target index is generated automatically and not user provided.

A user might define an index template that matches the downsample target index name and that template applies setting to the rollup index, including, for instance, the _tier_preference. In that case we would override a user defined setting which probably we don't want. I think, anyway, that we have no way to know if the setting is coming from a template. Also consider that the value we see at this stage changes:

  1. if the index is a data stream backing index the value we see is "data_hot"
  2. if the index is a "standalone" index the value we see if "data_content"

Does it make sense that we override the value only if the target index setting is the default (data_hot or data_content) and the source index setting is set with a different value? Or should we introduce an additional downsampling request parameter to define the tier to use for downsampling?

I am not sure what is the right strategy here to apply...because if for instance the template matches and applies a certain user-selected _tier_preference using the template, then we would override the user choice.

@martijnvg @andreidan any idea?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it is possible to setup an index template for downsample- index prefix, I don't think this will happen often?
And I see this is an anti-pattern. My thinking here is that all the templates are defined on the data stream level. Whenever a new backing index of a data stream is created, the mappings/settings from the template that matches with the data stream is applied. I don't think that during the lifecycle of a backing index whenever we downsample, shrink or do create a searchable snapshot, we should apply a whole different template. Even though the concrete backing index gets replaced by another concrete backing index.

I don't think we should let custom _tier_preference get applied here. ILM is actually in charge of that. I think we should maybe even prevent templates from being applied the backing indices getting downsampled.

Copy link
Contributor

@andreidan andreidan Jun 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ I think we shouldn't worry about _tier_preference being custom configured in index templates and we should copy it from the source index. As Martijn said, ILM is in charge of transitioning the indices through tiers and will change this setting as part of the migrate action in different phases.

Custom, user-defined allocations will still be able to exist via the custom node attributes.

IMO we shouldn't add special handling in index templates for this particular namespace (which might already exist for certain users out there - outside of the downsampling feature)

@salvatore-campagna salvatore-campagna merged commit 99d70fd into elastic:main Jun 26, 2023
12 checks passed
jpountz pushed a commit to jpountz/elasticsearch that referenced this pull request Jun 26, 2023
We need to override the actual value because the target
index already has a default value set, "data_content"
or "data_hot" (when the index is a datastream backing index).
Because of the default value set on the target index
we were not overriding it with the settings coming from
the source index. This results in potential usage
of different tiers for the downsampling source and target
index.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.10.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The downsample operation doesn't set any tier preference for the downsampled index
4 participants