-
Notifications
You must be signed in to change notification settings - Fork 58
[reconfigurator] Enable execution and planning by default #8980
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The change to enable blueprint execution will only affect newly-deployed systems that go through RSS after this change lands. Existing systems will need to have execution enabled via `omdb`. The changes to the default chicken switches will enable the planner and disable zone additions if a mupdate override is present on any systems that have not had reconfigurator chicken switches explicitly set (this is _probably_ all deployed systems we care about; even the dogfood rack has not had any of these set explicitly).
Initial spot checks on a racklette look reasonable. We have exactly one blueprint and it's enabled:
and execution is happening:
I don't think (?) we have an easy way to see "the current chicken switches" if they've never been set, but we can see that we haven't set any:
The planner is running (and doing nothing as expected):
and we can get the current value of the inner switches by asking for the entire reconfigurator state:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have a few more questions (e.g., maybe checking that a planner run would be a no-op) if this changed existing systems - but for new deployments, seems good to me.
Seems like we should move existing systems to "enabled" pretty soon too, to avoid a dichotomy in our deployments.
Ehh, it does change planning enabled for existing systems (under certain circumstances I noted above, which I think in practice are true for every existing system). Do you want to ask those extra questions? 😅
I'm not entirely sure what this means - the next release ships update, so we have to move all of them to "enabled" as a part of that release anyway, right? |
Oh, actually, I guess even though this does enable the planner on existing systems, the fact that it doesn't enable execution means if the planner actually does produce something unexpected, we won't try to execute it. We can add details about this to the update runbook for the release, I think. |
Yeah, I was keying on:
Basically, I was wondering "do we want to verify, before we let execution start running by itself, that it won't create/enact a plan that is immediately unreasonable"? And related: What should an operator be looking at, before flipping the execution bit to "enabled"? My overwhelming assumption is "this should be fine", but figured we could at least check the temperature before fully jumping in. |
This broke... a lot of tests. I think (?) the right move is to just disable this automation for tests, so I made several nontrivial changes in e00edaf:
I'm converting this to draft for now and will re-request review after CI passes and I get another round on a racklette to make sure this didn't break anything surprising. |
The chicken switch loading task exposes a watch channel that downstream consumers can use to read the current value. It initializes that channel to the `::default()` value of chicken switches and then reads the most recent value from the db when it's activated, but that introduces a race where consumers can see an incorrect value: if there is a chicken switch value in the db, they can see the `::default()` _before_ the first activation, allowing them to ignore the actual value in the db in favor of whatever the default set is. This PR closes the race window by setting the initial watch channel value to `NotYetLoaded`, and then replacing that during the first activation. This fell out of trying to fix up tests on #8980, but is a legit bug and easily separable from that work. So here it is!
Alright, tests are passing, so I'm marking this as ready for review. The major changes from the original version of this PR are related to disabling both blueprint execution and planning when running under
but they're both functional enough. Feedback very welcome. I'll run this on a racklette again and make sure none of my "disable automation in tests" things broke the prod path, but I think this is ready for review again. |
} | ||
}).collect(), | ||
).await.map_err(|e| { | ||
err.set(RackInitError::BlueprintTargetSet(e)).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks for patching this. Clearly I missed it.
/// We use this hook to disable reconfigurator automation in the test suite | ||
#[serde(default)] | ||
pub initial_reconfigurator_chicken_switches: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed this a little in chat, but wanted to make it public:
The usage of this planner_enabled
option for tests is 100% reasonable, but not at all a "chicken switch":
- We aren't leaving it as a toggle-able option to be conservative, and for enabling later
- ... it's "just a config option", which we want to have set in some circumstances, and unset in others.
It probably behooves us to rename ReconfiguratorChickenSwitch
to ReconfiguratorConfig
, if the list of settings is going to be long-lasting (which is fine! It's just confusing to call it a "chicken switch").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow-up: Obviously, doesn't need to block this PR.
I repeated all of these spot checks on the latest commit with all the "disable for tests" work, and they're all identical - execution and planning are still on by default for newly-deployed prod systems. |
…9005) Followup from #8980 (comment).
The change to enable blueprint execution will only affect newly-deployed systems that go through RSS after this change lands. Existing systems will need to have execution enabled via
omdb
.The changes to the default chicken switches will:
on any systems that have not had reconfigurator chicken switches explicitly set (this is probably all deployed systems we care about; even the dogfood rack has not had any of these set explicitly).
Fixes #8960. Fixes #8961.
I'll run this through a racklette before merging.