Update Propolis and Crucible #9061
Merged
Crucible changes are:

- update to latest `vergen` (#1770)
- Update rand dependencies, and fallout from that. (#1764)
- [crucible-downstairs] migrate to API traits (#1768)
- [crucible-agent] migrate to API trait (#1766)
- [crucible-pantry] migrate to API trait (#1767)
- Add back job delays in the downstairs with the --lossy flag (#1761)

Propolis changes are:

- Crucible update plus a few other dependency changes. (#948)
- [2/n] [propolis-server] switch to API trait (#946)
- [1/n] add a temporary indent to propolis server APIs (#945)
- Handle Intel CPUID leaves 4 and 18h, specialize CPUID for VM shape (#941)
- Increase viona receive queue length to 2048 (#935)
- Expand viona header pad to account for options (#937)
- fix linux p9fs multi message reads (#932)
- add a D script to report VMs' CPUID queries (#934)
- Update GH actions
- Re-enable viona packet data loaning
iximeow
approved these changes
Sep 23, 2025
- Actually update nexus generation within the top-level blueprint and Nexus zones
- Deploy new and old nexus zones concurrently

# Blueprint Planner

- Automatically determine nexus generation when provisioning new Nexus zones, based on existing deployed zones
- Update the logic for provisioning nexus zones, to deploy old and new nexus images side-by-side
- Update the logic for expunging nexus zones, to only do so when running from a "newer" nexus
- Add a planning stage to bump the top-level "nexus generation", if appropriate, which would trigger the old Nexuses to quiesce.

Fixes #8843, #8854
…es being known (#8921)

Expand the set of gates for adds/updates to include the fact that zone image sources should be known.

Add tests for this:

* `cmds-mupdate-update-flow` contains the bulk of testing for this scenario.
* I had to make tweaks to some tests, particularly to `cmds-target-release.txt`, in order to start running the test in earnest from the Artifact state rather than the InstallDataset state.
Planning reports are contained in `Blueprint`s, which have an ID. Prior to this PR we duplicated the containing blueprint's ID. This bit @davepacheco and me in a couple different (admittedly unusual) testing contexts where we were duplicating blueprints and making changes, not realizing we produced a new blueprint with a different ID but carrying a report that still pointed to the original blueprint's ID. The only thing we lose here is that the display output of the planning report can no longer say what blueprint it's for, but I think that's fine - all the places where we want to display a report, we already know the blueprint ID.
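To make the stale-ID problem concrete, here is a minimal sketch of the shape after this change. The types, field names, and integer IDs are hypothetical simplifications for illustration, not the actual omicron definitions: the report stores no blueprint ID, so a duplicated blueprint cannot carry a report that points at the original.

```rust
// Hypothetical, simplified types for illustration only. The real blueprint
// and planning report types carry far more state and use UUIDs.
#[derive(Clone, Debug)]
struct PlanningReport {
    // Deliberately no blueprint ID here: the containing blueprint owns it.
    notes: Vec<String>,
}

#[derive(Clone, Debug)]
struct Blueprint {
    id: u64,
    report: PlanningReport,
}

impl Blueprint {
    /// Copy this blueprint under a new ID, as a test might do before
    /// making further changes to it.
    fn duplicate_with_id(&self, new_id: u64) -> Blueprint {
        Blueprint { id: new_id, ..self.clone() }
    }

    /// Render the report. The ID is taken from the containing blueprint,
    /// so it can never disagree with the blueprint being displayed.
    fn display_report(&self) -> String {
        format!(
            "planning report for blueprint {}: {}",
            self.id,
            self.report.notes.join("; ")
        )
    }
}

fn main() {
    let original = Blueprint {
        id: 1,
        report: PlanningReport { notes: vec!["no changes".to_string()] },
    };
    let copy = original.duplicate_with_id(2);
    println!("{}", original.display_report());
    println!("{}", copy.display_report());
}
```

Had the report kept its own copy of the ID, `duplicate_with_id` would have needed to remember to rewrite it, which is the step that was easy to miss.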
First step of #8902. It's enough work to get Nexus to stand up another HTTP service that this is worth its own PR ahead of moving APIs out of nexus-internal and into nexus-lockstep.
Finishes the `target-release` `reconfigurator-test`, showing the simulated update walking through the process of starting new Nexus zones, waiting for handoff, then expunging the old Nexus zones.

Has two tweaks:

* Fixes a planning report off-by-one bug where we'd claim a zone was both out of date and expunged (or updated) within the same plan.
* Adds a `set active-nexus-gen N` command to `reconfigurator-cli` to control Nexus handoff instead of assuming it completes instantly.

Closes #8478

---------

Co-authored-by: Sean Klein <sean@oxidecomputer.com>
Fixes #9047

---------

Co-authored-by: Alex Plotnick <alex@oxidecomputer.com>
This PR completes the first version of the sans-io trust quorum protocol implementation.

LRTQ upgrade can now be started via `Node::coordinate_upgrade_from_lrtq`. This triggers the coordinating node to start collecting the LRTQ key shares so that they can be used to construct the LRTQ rack secret via the bootstore code. After this occurs, a Prepare message is sent out with this old rack secret encrypted in a manner identical to a normal reconfiguration. The prepare and commit paths remain the same.

The cluster proptest was updated to sometimes start out with an existing LRTQ configuration and then to upgrade from there. Like normal reconfigurations it allows aborting and pre-empting of the LRTQ upgrade with a new attempt at a higher epoch. In production this is how we "retry" if the coordinating node crashes prior to commit, or more accurately, if nexus can't talk to the coordinating node for some period of time and just moves on. After the LRTQ upgrade commits, normal reconfigurations are run.

We also remove unnecessary config related messages in this commit. Since a `Configuration` does not contain sensitive information it can be retrieved when Nexus polls the coordinator before it commits. Then Nexus can save this info and send it in `PrepareAndCommit` messages rather than having the receiving node try to find a live peer with the config prior to collecting shares. This is a nice optimization that reduces protocol complexity a bit. This removal allowed removing the TODO in the message `match` statement in `Node::handle` and completing the protocol.
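The "retry at a higher epoch" idea can be sketched as follows. This is only an illustration under stated assumptions: `Epoch`, `NodeState`, `PrepareOutcome`, and `handle_prepare` are hypothetical names, not the trust quorum crate's API, and the rule of rejecting any Prepare that is not strictly newer is a simplification chosen for the sketch.

```rust
// Hypothetical sketch: none of these names come from the real protocol code.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Epoch(u64);

#[derive(Debug, PartialEq, Eq)]
enum PrepareOutcome {
    /// The node adopted this attempt, abandoning any older one.
    Accepted,
    /// The attempt is not newer than one the node already knows about.
    Rejected { current: Epoch },
}

#[derive(Debug, Default)]
struct NodeState {
    /// Highest epoch this node has accepted a Prepare for, if any.
    latest: Option<Epoch>,
}

impl NodeState {
    /// Accept a Prepare only when its epoch is strictly higher than the
    /// latest one seen. A retry at a higher epoch therefore pre-empts an
    /// attempt whose coordinator went away before commit.
    fn handle_prepare(&mut self, epoch: Epoch) -> PrepareOutcome {
        match self.latest {
            Some(current) if epoch <= current => {
                PrepareOutcome::Rejected { current }
            }
            _ => {
                self.latest = Some(epoch);
                PrepareOutcome::Accepted
            }
        }
    }
}

fn main() {
    let mut node = NodeState::default();
    // First upgrade attempt.
    println!("{:?}", node.handle_prepare(Epoch(2)));
    // The coordinator became unreachable; a new attempt uses a higher epoch.
    println!("{:?}", node.handle_prepare(Epoch(3)));
    // A late message from the abandoned attempt is ignored.
    println!("{:?}", node.handle_prepare(Epoch(2)));
}
```

Ordering attempts by epoch means a node never has to know whether the earlier coordinator actually crashed; the newer attempt wins either way.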
…tion is 1 (#9066)

For customers that are going to continue relying on MUPdate, the planner should act the same way as it did before self-service update existed. We ascertain this by looking at whether a target release has ever been set.

Most of the tests no longer require the `add_zones_with_mupdate_override` config, so add a new reconfigurator-cli script which specifically tests that config.
Fixes #8912 Should be merged after the rest of Nexus quiesce/handoff is complete.
(This also includes #9077 to avoid failing CI.)

---------

Co-authored-by: iliana etaoin <iliana@oxide.computer>
…t even if no target release is set (#9082)

Currently:

* if a target release is set, we go ahead and clear the remove-mupdate-override instruction from blueprints, regardless of whether artifacts match
* if no target release is set, we don't do that

This behavior is inconsistent. We shouldn't gate the mupdate override part of the state machine on a target release not being set.
See #9071 for context; this is the short/medium-term fix proposed in that issue.
Oh what the heck... this was supposed to be a merge with main, and now it's pandemonium...
Okay, there we go, now the diffs are just what I wanted.