Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add validation of the SLM schedule frequency #64452

Merged
merged 11 commits into from Nov 6, 2020

Conversation

joegallo
Copy link
Contributor

@joegallo joegallo commented Oct 30, 2020

Closes #55450

Adds the new slm.minimum_interval setting (default: 15 minutes) and validates that SLM policies are not more frequent than that minimum interval.

/cc @original-brownbear

@joegallo joegallo added >enhancement WIP :Data Management/ILM+SLM Index and Snapshot lifecycle management labels Oct 30, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features (:Core/Features/ILM+SLM)

@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Oct 30, 2020
@joegallo

This comment has been minimized.

@joegallo
Copy link
Contributor Author

joegallo commented Nov 3, 2020

Additional question for review: one could argue that this is a breaking change w.r.t. 7.x, what kind of additional changelog notes or the like are appropriate for on-the-edge changes like these?

@joegallo
Copy link
Contributor Author

joegallo commented Nov 3, 2020

The 'next interval' calculation is naive, there's no guarantee that somebody hasn't entered something 'fun' like

0 0,30,35,40,45,[...] * * * [...]

which would be okay or not okay by this current logic depending on exactly when you tried to validate it. 😬

edit: if we wanted to get more serious about checking for this possibility, which maybe we do, then I suppose we could run through the next hour's worth of intervals returning the minimum one found. I don't think it would be too difficult to implement that.

@dakrone dakrone self-requested a review November 3, 2020 22:47
Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Joe, I left some comments about this but it looks good in general.

one could argue that this is a breaking change w.r.t. 7.x, what kind of additional changelog notes or the like are appropriate for on-the-edge changes like these?

I think this fits squarely in the "fixing cluster stability" bucket, not really a breaking change (insert disclaimer about any change being a breaking change). I don't think we need to explicitly add anything about it (there is a workaround regardless)

if we wanted to get more serious about checking for this possibility, which maybe we do, then I suppose we could run through the next hour's worth of intervals returning the minimum one found. I don't think it would be too difficult to implement that.

How about we do this as a follow-up?

long minimumIntervalMillis = minimumInterval.getMillis();
long nextInterval = lifecycle.calculateNextInterval();
if (nextInterval >= 0 && minimumIntervalMillis > 0 && nextInterval < minimumIntervalMillis) {
throw new IllegalArgumentException("invalid schedule [" + lifecycle.getSchedule() + "]: too frequent");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should include what the limit for frequency is in the error message, and perhaps something like:

invalid schedule [*/1 * * * ?]: schedule would be too frequent, executing more frequently than [15m] interval

(the wording doesn't have to be exactly that, just an example)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍, how about 2798b78?

@joegallo

This comment has been minimized.

@joegallo
Copy link
Contributor Author

joegallo commented Nov 4, 2020

How about we do this as a follow-up?

I'm going to poke at it for a bit -- if it seems sane enough I might keep it on this PR, but we'll see. (edit: Ehhhh, yeah, I'm punting on this one for now.)

@joegallo
Copy link
Contributor Author

joegallo commented Nov 4, 2020

I think this fits squarely in the "fixing cluster stability" bucket, not really a breaking change [...]

+1, sounds fair to me.

@joegallo joegallo requested a review from dakrone November 5, 2020 19:16
Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks Joe!

@joegallo joegallo merged commit 43529e8 into elastic:master Nov 6, 2020
@joegallo joegallo deleted the validate-slm-schedule-frequency branch November 6, 2020 16:57
@joegallo joegallo added the Team:Deployment Management Meta label for Management Experience - Deployment Management team label Nov 19, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-ui (Team:UI)

@joegallo
Copy link
Contributor Author

/cc @elastic/es-ui long story short, at the very least, this is going to mean that minute should not be an option in this select box anymore. 👋 (❤️)

Screen Shot 2020-11-19 at 11 59 16 AM

@alisonelizabeth
Copy link
Contributor

Thanks for the heads up @joegallo! I created elastic/kibana#83918 to track this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/ILM+SLM Index and Snapshot lifecycle management >enhancement Team:Data Management Meta label for data/management team Team:Deployment Management Meta label for Management Experience - Deployment Management team v7.11.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add (Hard-) Limit to SLM Snapshot Frequency?
5 participants