Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable TSDB downsampling ILM configuration #138748

Merged
merged 14 commits into from Sep 13, 2022

Conversation

Dosant
Copy link
Contributor

@Dosant Dosant commented Aug 12, 2022

Summary

Close #130437

Enables TSDB downsampling action in ILM configuration.
Actions can be set in the hot, warm, cold phases.

  • It is hidden in warm and cold phases when searchable snapshots are enabled in hot phase (just like shrink, readonly, force merge actions).

Screenshot 2022-08-31 at 17 00 43

Users need to specify an interval:

Screenshot 2022-08-31 at 17 00 48

Intervals have the following validation:

  • they must be greater than any previous interval and a multiple of that interval.
    • For example, on the warm phase, the user includes a 3-hour interval; if they choose to add a rollup action on the cold phase, the interval must be a multiple of 3h. It cannot be 5h, for example.
    • Another example: a user can have a rollup on the hot phase of 2d and then no rollup action on the warm phase, and then a rollup action on the cold phase, but it must be a multiple of 2d, for example, it cannot be 1d nor can it be 3d.

Screenshot 2022-08-31 at 17 07 36

How to test

(What I tested)

  1. Create an ILM policy with Hot phase only. Enabled rollover, disable "use recommended settings" and set "maximum documents to 1
  2. Enable new "Downsample" action, set interval to 1 hour.
  3. Save policy
  4. Then create a data stream template using Dev tools: https://github.com/csoulios/downsampling-demo/blob/main/downsample-ilm.es#L26-L169. Currently, it isn't possible to create this template using UI because of [Mappings Editor] allow time series related properties #139922. Make sure you are attaching your ilm policy to the template. You can also do it through UI
  5. Create data stream by injesting data using Dev tools: https://github.com/csoulios/downsampling-demo/blob/1e66a13bccc7d70aae7763bcc9c4499ec8b3fd02/downsample-ilm.es#L174-L194 . You will have to change timestamp in the data to the current hour
  6. Inspect indices and data streams. You should see them changed during the next hour. Inspect data of the downsampled stream in Discover

Note

  • Rollup of rollups is not yet implemented on es side. We should test everything end to end when it is ready. We wouldn't need to change ILM screen when it ready so we can merge without waiting for that.
  • Ideally we should have a es docs link about ILM and downsampling to point to

Release Notes

Enables time series downsampling action in ILM configuration

@Dosant Dosant changed the title ilm rollup wip [wip] Enable TSDB downsampling ILM configuration Aug 24, 2022
@Dosant
Copy link
Contributor Author

Dosant commented Sep 6, 2022

@elasticmachine merge upstream

@Dosant Dosant added the ci:cloud-deploy Create or update a Cloud deployment label Sep 7, 2022
@Dosant
Copy link
Contributor Author

Dosant commented Sep 7, 2022

@elasticmachine merge upstream

@Dosant Dosant added Team:Kibana Management Dev Tools, Index Management, Upgrade Assistant, ILM, Ingest Node Pipelines, and more release_note:skip Skip the PR/issue when compiling release notes Team:AppServicesUx v8.5.0 labels Sep 9, 2022
@Dosant Dosant marked this pull request as ready for review September 9, 2022 13:24
@Dosant Dosant requested a review from a team as a code owner September 9, 2022 13:24
@elasticmachine
Copy link
Contributor

Pinging @elastic/platform-deployment-management (Team:Deployment Management)

@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-app-services (Team:AppServicesUx)

@Dosant Dosant changed the title [wip] Enable TSDB downsampling ILM configuration Enable TSDB downsampling ILM configuration Sep 9, 2022
@Dosant Dosant requested review from jloleysens, ghudgins, a team and debadair September 9, 2022 13:25
Copy link
Contributor

@jloleysens jloleysens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @Dosant !

I had two observations I wanted to get y'alls thoughts on (both non-blockers):

First: @yuliacech (and @Dosant) as I was interacting with this beast of a form I thought one thing that could be really useful is to add the ability to filter out actions - maybe like a global fuzzy matcher so that if I type "down" then all the advanced options will auto expand and show only matching actions. This increases the chances that you'll be able to see the continuity between all relevant config more easily. (I'm thinking of this as a future enhancement of course). Similar to the UX you get when searching for settings in VS Code:

Screenshot 2022-09-12 at 10 48 13

Second: When I enable the downsampling action for the first time the field starts in an errored state.

Screenshot 2022-09-12 at 10 25 16

I'm guessing this is because the form is not in a pristine state anymore (you need to enable the action). Would it be possible to have a default value here?

Otherwise, this PR LGTM tested locally.

Comment on lines +316 to +327
return {
message: i18n.translate(
'xpack.indexLifecycleMgmt.editPolicy.downsamplePreviousIntervalWarmPhaseError',
{
defaultMessage:
'Must be greater than and a multiple of the hot phase value ({value})',
values: {
value: intervalValues.hot.esFormat,
},
}
),
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to provide a pre-filled number for downsampling here? One thing that is tricky about this form is how big it is and I feel like scrolling between phases you can easily get distracted by all the shiny things.

One alternative I thought of was providing a list of multiples in the error message like: (ex. 10 days, 20 days...)

Might not be worth the effort.

Copy link
Contributor Author

@Dosant Dosant Sep 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to provide a pre-filled number for downsampling here?

As I understand, this isn't a low effort because this wouldn't be just a static value and have to depend on the previous step:
e.g., if someone configured a 1-day interval in a hot phase and then enabled a warm phase, then we should prefill with e.g., 2 days. As I understand, such support for "dynamic" initial values needs to be implemented first

Also really not sure what could be good defaults here.

One alternative I thought of was providing a list of multiples in the error message like: (ex. 10 days, 20 days...)

Could be!

cc @ghudgins

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets worry about optimizations and better defaults in subsequent PRs. hard to say what the rule should be without spending some time with it / hearing from users

@Dosant
Copy link
Contributor Author

Dosant commented Sep 12, 2022

@elasticmachine merge upstream

@Dosant
Copy link
Contributor Author

Dosant commented Sep 12, 2022

@jloleysens,

Second: When I enable the downsampling action for the first time the field starts in an errored state.
I'm guessing this is because the form is not in a pristine state anymore (you need to enable the action). Would it be possible to have a default value here?

This was because I validated the interval for the current and next phases when downsample action was toggled. I fixed this by validating only the next phases when it is toggled.
just FYI: We need this for the following scenario:

  1. Have hot downsample action enabled and set 1d
  2. Have warm downsample action enabled and set 1d -> see the error
  3. Disable downsample on hot action (need to trigger validation of warm, so that error resets)

note: there is the following edge case:

  1. Have the warm downsample action enabled and set 1d
  2. Have the cold downsample action enabled and set 1d -> see the error
  3. Disable the warm step completely -> error on cold downsample action is still there until triggering change or saving.

To fix this, we need to validate downsample actions when steps are toggled. But I thought this is an overkill for this edge case.

@yuliacech
Copy link
Contributor

Hey @jloleysens, thanks a lot for this suggestion, I think that is a great idea for the ILM form, it's becoming scarier with each PR 😆 Do you mind opening an issue for that so we don't forget?

Thanks a lot for working on this feature @Dosant! I had a quick look and the validation worked correctly for me. I agree with JL that the interval input should not be marked as error as soon as the user toggles the Downsampling action. I think there is some similar logic in the MinAge field. The field is required if the phase is enabled but it's nor marked as error straight away. Also there is similar validation logic in this field, because the MinAge can't be smaller than in the previous phase. Also is the Downsampling action also available in the Frozen phase?

A suggestion I have for a follow up PR: since downsampling turns the index read-only automatically, I think we could hide the ReadOnly action in the phase if there is a downsampling action enabled in it or in any of the previous phases.

Could you please add the Downsampling action to this text in the Hot phase?

Screenshot 2022-09-12 at 14 12 13

@Dosant
Copy link
Contributor Author

Dosant commented Sep 12, 2022

@yuliacech,

I agree with JL that the interval input should not be marked as error as soon as the user toggles the Downsampling action. I think there is some similar logic in the MinAge field. The field is required if the phase is enabled but it's nor marked as error straight away.

this should be fixed 👍 #138748 (comment)

Also is the Downsampling action also available in the Frozen phase?

No! only hot,warm,cold

A suggestion I have for a follow-up PR: since downsampling turns the index read-only automatically, I think we could hide the ReadOnly action in the phase if there is a downsampling action enabled in it or in any of the previous phases.

Good suggestion. I will create an issue and will try in a separate pr #140515

Could you please add the Downsampling action to this text in the Hot phase?

👍

Copy link
Contributor

@alisonelizabeth alisonelizabeth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @Dosant!

I agree with Yulia and JL's comments on looking into ways we can improve the usability of the form in a future iteration.

I left some minor code comments, nothing blocking.

Also is the Downsampling action also available in the Frozen phase?

No! only hot,warm,cold

I noticed in your last comment on the issue, you mentioned it should be available in the frozen phase, but is not yet available in ES. Is this future work?


// disable warm phase;
await actions.togglePhase('warm');
// TODO: there is a bug that disabling a phase doesn't trigger downsample validation in other phases,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a separate issue open to address this? Will the validation work as expected on save?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the validation work as expected on saving?

Yes! the form state stays in the error state until the validation is triggered again.
More details on the case here: #138748 (comment)

note: there is the following edge case:
Have the warm downsample action enabled and set 1d
Have the cold downsample action enabled and set 1d -> see the error
Disable the warm step completely -> error on cold downsample action is still there > until triggering change or saving.
To fix this, we need to validate downsample actions when steps are toggled. But I thought this is an overkill for this edge case.

Is there a separate issue open to address this?

I wanted to leave a TODO comment to point this out as a known bug/edge case but didn't think this was worth an issue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll see if I can fix this with low effort: #140628

['Must be greater than and a multiple of the hot phase value (60m)'],
'cold'
);
}, /* increase a timeout of this test as it could fail on ci*/ 12000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity - is there any more context you can provide here? Was the test failing without the increased timeout?

Copy link
Contributor Author

@Dosant Dosant Sep 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was failing with a 5-second timeout because it was a very long test. initially, I also implemented a frozen phase and tested it here. Since we removed the frozen phase, I also removed a timeout, as I expect it should fit in 5 seconds now.

defaultMessage="Roll up documents within a fixed interval to a single summary document. Reduces the index footprint by storing time series data at reduced granularity."
/>{' '}
{/* TODO: add when available*/}
{/* <LearnMoreLink docPath={docLinks.links.elasticsearch.ilmDownsample} /> */}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this documentation planned, or already in progress? I'm wondering if it makes sense to remove this code until it is ready.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is planned: #130437 (comment) cc @debadair

/*
* Labels for fixed intervals
*/
export const fixedIntervalUnits = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see we already have a timeUnits const. Just want to confirm these values should be separate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

micros and nanos are technically not supported for the fixed interval.
btw, I saw this: #137081

I think we can unite these consts if we remove everything smaller then a minute

@@ -0,0 +1,130 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding these tests 👍

@Dosant
Copy link
Contributor Author

Dosant commented Sep 13, 2022

I noticed in your #130437 (comment), you mentioned it should be available in the frozen phase, but is not yet available in ES. Is this future work?

@alisonelizabeth,

There was some back-and-forth :(
Initially, we planned hot,warm,cold.
Then we agreed to add frozen. I added frozen.
Then we agreed not to add frozen (it turned out to be more difficult on es side than originally thought). I removed the frozen.
The last state: keep hot,warm,cold

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
indexLifecycleManagement 189 190 +1

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
indexLifecycleManagement 152.8KB 158.7KB +5.8KB

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@Dosant Dosant merged commit 678edce into elastic:main Sep 13, 2022
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Sep 13, 2022
@alisonelizabeth
Copy link
Contributor

@Dosant I forgot to ask during my review - I noticed you have the release_note:skip label on this. However, this seems like something we might want to add a release note for?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting release_note:feature Makes this part of the condensed release notes Team:Kibana Management Dev Tools, Index Management, Upgrade Assistant, ILM, Ingest Node Pipelines, and more v8.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable TSDB downsampling ILM configuration
8 participants