Downsampling: copy the _tier_preference setting #96982

salvatore-campagna · 2023-06-21T14:09:17Z

The target index is created with a default _tier_preference value, "data_content".
Because that default was already set, we did not overwrite it with the value coming from
the source index. As a result, the downsampling source and target indices could end up
on different tiers.

Here we explicitly override the default with the setting coming from the source index.
As a result the downsampling target index will use the same _tier_preference of the
downsampling source index.

Resolves #96733
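
To make the behavior described above concrete, here is a minimal sketch of the copy loop. It is not the actual TransportDownsampleAction code: plain maps stand in for Elasticsearch's Settings objects, the full setting key and the contents of both sets are assumptions made for illustration, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Simplified model of the settings copy; plain maps replace Elasticsearch's Settings.
final class TierPreferenceCopySketch {

    // Assumed full key of the _tier_preference setting.
    static final String TIER_PREFERENCE = "index.routing.allocation.include._tier_preference";

    // Assumed contents: settings that are never copied to the downsample target.
    static final Set<String> FORBIDDEN_SETTINGS = Set.of("index.default_pipeline", "index.final_pipeline");

    // Settings taken from the source even when the target already has a value.
    static final Set<String> OVERRIDE_SETTINGS = Set.of(TIER_PREFERENCE);

    static Map<String, String> copySettings(Map<String, String> source, Map<String, String> targetDefaults) {
        Map<String, String> target = new LinkedHashMap<>(targetDefaults);
        for (Map.Entry<String, String> entry : source.entrySet()) {
            String key = entry.getKey();
            if (FORBIDDEN_SETTINGS.contains(key)) {
                continue;
            }
            if (OVERRIDE_SETTINGS.contains(key)) {
                // The source value wins, replacing any default already on the target.
                target.put(key, entry.getValue());
                continue;
            }
            // Every other setting is copied only when the target has no value yet.
            target.putIfAbsent(key, entry.getValue());
        }
        return target;
    }

    public static void main(String[] args) {
        Map<String, String> source = Map.of(TIER_PREFERENCE, "data_warm", "index.number_of_replicas", "1");
        Map<String, String> targetDefaults = Map.of(TIER_PREFERENCE, "data_content");
        // prints data_warm
        System.out.println(copySettings(source, targetDefaults).get(TIER_PREFERENCE));
    }
}
```

Without the OVERRIDE_SETTINGS branch, the putIfAbsent call would leave "data_content" in place, which is the bug this PR addresses.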

elasticsearchmachine · 2023-06-21T14:09:42Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

elasticsearchmachine · 2023-06-21T14:10:05Z

Hi @salvatore-campagna, I've created a changelog YAML for you.

andreidan

LGTM, thanks for fixing this Salvatore 🚀

andreidan · 2023-06-22T08:15:37Z

...lugin/rollup/src/main/java/org/elasticsearch/xpack/downsample/TransportDownsampleAction.java

@@ -595,6 +598,10 @@ private IndexMetadata.Builder copyIndexMetadata(IndexMetadata sourceIndexMetadat
            if (FORBIDDEN_SETTINGS.contains(key)) {
                continue;
            }
+
+            if (OVERRIDE_SETTINGS.contains(key)) {


This method could be package private and even static in order to be unit tested. Perhaps in a subsequent PR?

+1 to unit testing the copyIndexMetadata() method and changing its visibility to package-private.

We may also need to make it static and then pass indexScopedSettings as a parameter to the method.
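
A rough sketch of the refactoring suggested in this thread, under simplifying assumptions: the class, the copySettings method (standing in for copyIndexMetadata), and the Predicate (standing in for indexScopedSettings) are hypothetical, and the filtering rule is only illustrative. The point is the shape: a static, package-private method that receives its collaborator as an argument.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in for the transport action; not the real class.
final class DownsampleActionSketch {

    // Stands in for the indexScopedSettings field held by the action.
    private final Predicate<String> registeredSettings;

    DownsampleActionSketch(Predicate<String> registeredSettings) {
        this.registeredSettings = registeredSettings;
    }

    // Instance entry point: passes its collaborator explicitly to the static helper.
    Map<String, String> downsampleSettings(Map<String, String> source) {
        return copySettings(source, registeredSettings);
    }

    // Package-private and static: a test in the same package can call this
    // directly, without constructing the action or its dependencies.
    static Map<String, String> copySettings(Map<String, String> source, Predicate<String> isRegistered) {
        Map<String, String> target = new HashMap<>();
        for (Map.Entry<String, String> entry : source.entrySet()) {
            // Illustrative rule: settings unknown to the registry are skipped.
            if (isRegistered.test(entry.getKey())) {
                target.put(entry.getKey(), entry.getValue());
            }
        }
        return target;
    }

    public static void main(String[] args) {
        Map<String, String> copied = copySettings(
            Map.of("index.number_of_shards", "1", "index.unknown", "x"),
            key -> key.equals("index.number_of_shards"));
        System.out.println(copied);
    }
}
```

A unit test can then supply a lambda or a small fake in place of the real settings registry.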

andreidan · 2023-06-22T10:49:19Z

This PR assumes that the downsample index lives in the same tier as the source index. If that's the desired requirement here this is ready to be merged.

martijnvg

Let's add a unit test, otherwise LGTM.

martijnvg · 2023-06-22T12:46:02Z

...k/plugin/rollup/qa/rest/src/yamlRestTest/resources/rest-api-spec/test/rollup/60_settings.yml

@@ -91,6 +91,275 @@
  - match: { test-rollup.settings.index.default_pipeline: null }
  - match: { test-rollup.settings.index.final_pipeline: null }

+---
+"Downsample index with tier preference":


Maybe keep only one yaml test (downsampling a data stream with tier preference)?
I think the rest should be covered by unit tests. We have been adding a lot of yaml tests,
which is good, but they are more costly than unit tests. If something can be tested
with unit tests we should always prefer that, and keep yaml tests for real integration testing at the REST API level.

I wrote different tests because the default value we are overwriting differs between the two scenarios. In one case it is "data_content", while in the other it is "data_hot". My understanding is that this setting is applied differently, and I wanted to make sure we are ALWAYS overwriting it.

salvatore-campagna · 2023-06-22T13:36:23Z

This PR assumes that the downsample index lives in the same tier as the source index. If that's the desired requirement here this is ready to be merged.

I assume this for the following reasons:

If downsampling is triggered by a simple API call (as opposed to an ILM action), we have no other (easy) way to know about the existence of other tiers. A user might not care about tiers, or might use only some of them.
In my opinion, if using the API, the principle of least astonishment (POLA) suggests that we keep the target index settings as close as possible to those of the source index.
If we use downsampling through ILM, we will apply the setting using an appropriate ILM step that takes the ILM phase as input. In that case it makes sense to me that, if a downsampling action is configured in the warm phase, downsampling happens using resources from the warm tier (and so on).

salvatore-campagna · 2023-06-22T16:18:23Z

...n/rollup/src/test/java/org/elasticsearch/xpack/downsample/TransportDownsampleActionTest.java

+
+    private static void assertSourceSettings(final IndexMetadata indexMetadata, final Settings settings) {
+        assertEquals(indexMetadata.getIndex().getName(), settings.get(IndexMetadata.INDEX_DOWNSAMPLE_SOURCE_NAME_KEY));
+        assertEquals(


While writing this test I realised that copying the tier preference from the source index might not always be the right thing to do.

I wonder if the following scenario is possible considering that, at least at the moment, the name of the downsample target index is generated automatically and not user provided.

A user might define an index template that matches the downsample target index name, and that template might apply settings to the rollup index, including, for instance, the _tier_preference. In that case we would override a user-defined setting, which we probably don't want. In any case, I think we have no way to know whether the setting comes from a template. Also consider that the value we see at this stage varies:

if the index is a data stream backing index, the value we see is "data_hot"

if the index is a "standalone" index, the value we see is "data_content"

Does it make sense that we override the value only if the target index setting is the default (data_hot or data_content) and the source index setting is set with a different value? Or should we introduce an additional downsampling request parameter to define the tier to use for downsampling?
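
For illustration only, the first alternative raised in this comment could be sketched as follows. This is the conditional rule being asked about, not what the PR implements; the class and method names are hypothetical, and treating a missing target value as a default is an assumption.

```java
import java.util.Set;

// Sketch of the alternative rule: override only when the target still holds a default.
final class ConditionalTierOverrideSketch {

    // The two defaults mentioned in this discussion.
    static final Set<String> DEFAULT_TIER_PREFERENCES = Set.of("data_hot", "data_content");

    // Returns the tier preference the target index would end up with.
    static String resolveTierPreference(String sourceValue, String targetValue) {
        // Assumption: a missing target value is treated like a default.
        boolean targetIsDefault = targetValue == null || DEFAULT_TIER_PREFERENCES.contains(targetValue);
        boolean sourceDiffers = sourceValue != null && !sourceValue.equals(targetValue);
        return (targetIsDefault && sourceDiffers) ? sourceValue : targetValue;
    }

    public static void main(String[] args) {
        // Target holds a default, source differs: the source value is used.
        System.out.println(resolveTierPreference("data_warm", "data_content"));
        // Target holds a non-default value: it is left untouched.
        System.out.println(resolveTierPreference("data_warm", "data_cold"));
    }
}
```

A rule like this cannot tell a template that explicitly sets "data_hot" or "data_content" apart from the built-in default, which is the limitation noted above.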

I am not sure what the right strategy is here, because if a matching template applies a user-selected _tier_preference, we would override the user's choice.

@martijnvg @andreidan any idea?

Although it is possible to set up an index template for the downsample- index prefix, I don't think this will happen often.
I also see this as an anti-pattern. My thinking here is that all the templates are defined at the data stream level. Whenever a new backing index of a data stream is created, the mappings/settings from the template that matches the data stream are applied. I don't think that, during the lifecycle of a backing index, whenever we downsample, shrink or create a searchable snapshot, we should apply a whole different template, even though the concrete backing index gets replaced by another concrete backing index.

I don't think we should let a custom _tier_preference get applied here. ILM is actually in charge of that. I think we should maybe even prevent templates from being applied to the backing indices that are being downsampled.

++ I think we shouldn't worry about _tier_preference being custom configured in index templates and we should copy it from the source index. As Martijn said, ILM is in charge of transitioning the indices through tiers and will change this setting as part of the migrate action in different phases.

Custom, user-defined allocations will still be able to exist via the custom node attributes.

IMO we shouldn't add special handling in index templates for this particular namespace (which might already exist for certain users out there - outside of the downsampling feature)

The target index is created with a default _tier_preference value: "data_content", or "data_hot" when the index is a data stream backing index. Because that default was already set, we did not overwrite it with the value coming from the source index. As a result, the downsampling source and target indices could end up on different tiers.

salvatore-campagna added >bug :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data labels Jun 21, 2023

salvatore-campagna requested review from martijnvg and andreidan June 21, 2023 14:09

salvatore-campagna self-assigned this Jun 21, 2023

elasticsearchmachine added v8.10.0 Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Jun 21, 2023

Update docs/changelog/96982.yaml

1a42d08

andreidan approved these changes Jun 22, 2023

salvatore-campagna added 2 commits June 22, 2023 11:32

test: datastream with and without tier preference

d5f9eff

fix: use the setting instead of the name

288e40d

martijnvg approved these changes Jun 22, 2023

salvatore-campagna added 4 commits June 22, 2023 15:38

test: remove some tests

73e27e9

refactor: make copyIndexMetadata package private and static

618e315

test: indexCopyMetadata

91e5ab8

test: add a few more settings

8f2b8e6

salvatore-campagna commented Jun 22, 2023

salvatore-campagna added 3 commits June 22, 2023 19:11

test: include the final_pipeline setting

7a41df8

Merge branch 'main' into feature/96733-downsampling-tier-preference

db36a18

fix: filename naming convention

2d0c77b

salvatore-campagna merged commit 99d70fd into elastic:main Jun 26, 2023
12 checks passed

benwtrent mentioned this pull request Jun 27, 2023

[CI] RollupRestIT test {p0=rollup/60_settings/Downsample datastream with tier preference} failing #97150

Closed
