initial draft of blueprints #4804

Conversation
…ollection (and have a separate code path for the initial blueprint)
I incorporated feedback from @ahl and I think I've finished adding all the tests I'm planning to add here.
// It's kind of unfortunate to pull in such a complex and unstable type
// as "blueprint" this way, but we have really useful functionality
// (e.g., diff'ing) that's implemented on our local type.
Blueprint = nexus_types::deployment::Blueprint,
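For context, an entry like this lives inside a Progenitor client-generation macro. A hedged sketch of the surrounding invocation (the spec path and other settings here are illustrative assumptions, not the actual ones):

```rust
progenitor::generate_api!(
    // The spec path below is an assumption for illustration.
    spec = "../openapi/nexus-internal.json",
    replace = {
        // Substitute our local Blueprint type for the generated one so we
        // keep functionality (e.g., diff'ing) implemented on the local type.
        Blueprint = nexus_types::deployment::Blueprint,
    }
);
```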
@ahl installed some healthy fear of `replace` in me a while back - how painful would it be to impl `From` for blueprints instead of `replace`ing?
I share your and Adam's reluctance about `replace`. In this case, the functionality being shared (diff'ing blueprints) is pretty non-trivial and I think it would be fairly painful to impl the requisite `From`s: I think it would require implementing `From` for `Blueprint` and `OmicronZonesConfig`, which in turn requires `OmicronZoneConfig`, then `OmicronZoneType` (a big enum with a bunch of properties in many variants), `OmicronZoneDataset`, `SledRole`, `ZpoolName`, `NetworkInterface`, `NetworkInterfaceKind`, and `SourceNatConfig`. (I'm not positive, but I think that's right.)
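To give a flavor of the boilerplate involved, here's a minimal sketch with stand-in types (not the real omicron definitions): each conversion maps every field by hand, and every nested type needs its own `From` impl in turn.

```rust
// Stand-ins for a generated type and our local type.
mod generated {
    pub struct ZpoolName(pub String);
    pub struct OmicronZoneDataset {
        pub pool_name: ZpoolName,
    }
}
mod local {
    pub struct ZpoolName(pub String);
    pub struct OmicronZoneDataset {
        pub pool_name: ZpoolName,
    }
}

impl From<generated::ZpoolName> for local::ZpoolName {
    fn from(v: generated::ZpoolName) -> Self {
        Self(v.0)
    }
}

impl From<generated::OmicronZoneDataset> for local::OmicronZoneDataset {
    fn from(v: generated::OmicronZoneDataset) -> Self {
        // Every field is mapped by hand; nested types convert via their
        // own `From` impls, invoked through `into()`.
        Self { pool_name: v.pool_name.into() }
    }
}
```

Multiply that by the list of types above (including a big enum with many variants) and the tedium adds up.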
I think better approaches here might be to have Progenitor validate that your `replace` type really is compatible with the spec, or alternatively if there were a way to automatically generate `From` impls between two types that are identical (which would presumably break -- correctly -- if they weren't identical). It doesn't feel sustainable to me to either duplicate complex functionality on multiple types or hand-roll so many complex `From` impls. What do you think?
I actually had the opposite reaction, and didn't think it was unfortunate to use `replace` at all. However, clearly I am missing what is problematic here.
I think what sucks about using `replace` is that you lose the ability to notice when your local type doesn't match the spec. In turn that means we lose all of the benefit of Rust's strong static typing across the OpenAPI boundary.
Concretely: in the normal case where Progenitor generates the types for you, say we pull in a new version of the spec with a new required field. Progenitor's type will have that new field in it. So if we had a code path that creates an instance of that type (i.e., to make the request) it will now fail to compile. That's good: if we didn't know it before, we now know we've made an incompatible change to the spec. If we did know it, now's when we update our code to fill in the new value.
If on the other hand we use `replace` to supply the type, when we pull in the updated spec, everything still compiles -- there's no reason it wouldn't. Instead, that code path will fail at runtime when the server rejects the request because it's missing the required field. That really sucks.
Similarly on the output side, if we use `replace` to supply our own type, we don't find out at compile time when our type is incompatible. A particularly bad case would be an enum, where the server might add a new variant (which one might even consider a non-breaking change). But if the client sees it, it will fail to deserialize the response. By contrast, if we use the Progenitor-generated type, we'll fail to build if, say, we try to `match` on the enum and don't include the new variant.
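To illustrate with stand-in types (not an actual generated client): an exhaustive `match` on a generated enum turns a spec change into a compile error.

```rust
// Stand-in for a Progenitor-generated enum.
enum GeneratedState {
    V1,
    V2,
    // V3, // <- appears after regenerating from an updated spec
}

fn describe(s: &GeneratedState) -> &'static str {
    // This match is exhaustive today. The moment regeneration adds `V3`,
    // this function stops compiling, flagging the spec change immediately.
    match s {
        GeneratedState::V1 => "version 1",
        GeneratedState::V2 => "version 2",
    }
}
```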
In summary, using Progenitor types gets a lot of the benefit of Rust across the OpenAPI boundary. Using `replace` bypasses all of it (for that type). The downside of Progenitor types is that when there's non-trivial functionality on the original type, you either have to impl that functionality twice (sucks) or convert the type to your own type (okay for simple types in a few places, but kind of sucks in the large). For simple, stable types like `Name` I think the benefits of `replace` probably outweigh the risks. For complex types that are likely to change, I'm less sure (but still feel what I said earlier about this case).
> I think better approaches here might be to have Progenitor validate that your `replace` type really is compatible with the spec

This feels like it would be huge - it would allow us to use `replace` fearlessly, even on big / complex / volatile types like this one, I think?
> > I think better approaches here might be to have Progenitor validate that your `replace` type really is compatible with the spec
>
> This feels like it would be huge - it would allow us to use `replace` fearlessly, even on big / complex / volatile types like this one, I think?

Potentially, yeah. I spoke with Adam about this a bit. It's not clear how we could do this at compile time, but I imagine we might be able to do it at test-time.
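To make the test-time idea concrete, here's a minimal sketch using stand-in types (assuming serde, serde_json, and the uuid crate with its serde and v4 features): serialize the local type and deserialize the result as the generated one, so structural drift fails in CI rather than at runtime.

```rust
#[cfg(test)]
mod tests {
    use serde::{Deserialize, Serialize};

    // Stand-ins for the local type and the Progenitor-generated type.
    #[derive(Serialize, Deserialize)]
    struct LocalBlueprint {
        id: uuid::Uuid,
        comment: String,
    }
    #[derive(Serialize, Deserialize)]
    struct GeneratedBlueprint {
        id: uuid::Uuid,
        comment: String,
    }

    #[test]
    fn local_type_round_trips_through_generated_type() {
        let local = LocalBlueprint {
            id: uuid::Uuid::new_v4(),
            comment: "test".to_string(),
        };
        let json = serde_json::to_value(&local).unwrap();
        // If the two types have diverged (e.g., the spec grew a required
        // field), this deserialization fails and so does the test.
        let _generated: GeneratedBlueprint =
            serde_json::from_value(json).unwrap();
    }
}
```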
}

/// Describes a complete set of software and configuration for the system
// Blueprints are a fundamental part of how the system modifies itself. Each
Tiny nit, but this feels like it should all be part of the doc comment, I think? It's all extremely relevant to anyone working with or around `Blueprint`s.
I had that previously. But this is a publicly-exposed type, so the block comment is used for API consumers, not just Rust consumers. I remember discussing this months ago and deciding that unless there was a really compelling reason to try to support separate API docs and rustdoc for the same type, we'd just let the rustdoc reflect API docs.
Hmm, but this is an API type for the internal API, right? Do those exist? (To be fair, another question is whether we generate rustdocs for crates. I've done that locally occasionally but it's not a habit.)
I've been heavily using API docs for internal APIs (generated by hand with redoc). Then again I also use rustdocs via rust-analyzer. But I'm inclined to treat both APIs the same here -- part of the idea behind using OpenAPI everywhere was so that we could have a good experience using our own APIs through the patterns/processes/tooling we use for the external API. I find it annoying when I'm using the internal API docs and I get a lot of irrelevant Rust-isms.
Fair enough. I don't feel super strongly about this either way, happy to defer to you.
Dave, this looks great. It's a huge step forward. I particularly like all the blueprint diff stuff.
.map(|sled_row| {
    let sled_id = sled_row.id();
    let subnet = Ipv6Subnet::<SLED_PREFIX>::new(sled_row.ip());
    let zpools = zpools_by_sled_id
I feel like it's safe to catch this on the next planning round. The inventory collection will change and the new sled will show up. I think the only thing you could do if you checked here is go back and redo this context, which seems somewhat redundant/unnecessary.
// lying around). Recompute it based on what boundary servers
// currently exist.
let ntp_servers = self
    .parent_blueprint
This, like other parts of the builder that rely on the `parent_blueprint`, seems to implicitly assume that the `parent_blueprint` was successfully executed. That is why they are capable of performing "next steps". Am I correct here?

If I am correct then this may present a problem when a plan is not successfully executed, or partially executed. While conceptually I like the idea of one plan building off the previous, it seems like we may need to pass in more parameters where this may be problematic, or we need to use the inventory rather than the prior plan.
I wondered this too, but wasn't sure how to phrase the question, because I assume it's pretty reasonable we end up in this case:
- We have a current target blueprint that we have not yet achieved
- Something in the system changes such that we can never achieve the current target
Now what? We either need to be able to generate a child blueprint despite the parent not being successfully executed, or maybe we need some escape hatch to say "regenerate a new initial blueprint based on the latest collection"?
Yeah, I think eventually we're going to want to use both the parent blueprint and the most recent (valid) inventory.
This model does assume that the parent blueprint is currently the source of truth for the intended state of the system. That's why I added the constraints that (1) aside from the initial blueprint, blueprints get generated from the current target, and (2) a blueprint can only become the new target if its parent is [still] the target.
I think this particular code does not assume the parent blueprint has been executed. Suppose this sequence:
- we start with some initial blueprint B1
- we create a new blueprint B2 (parent is B1) where we provision a third boundary NTP zone
- before we start executing B2, we create a new blueprint B3 (parent is B2) that adds the "internal NTP" zone to a new sled
- this code will include in the config for the new internal NTP zone the boundary NTP zone added in B2
I think that's right, even though the new boundary NTP zone isn't necessarily deployed yet. It is one of the current boundary NTP zones right now, even if it's not up yet, and clients always need to handle dependencies not being up yet. Put differently: the new blueprint describes a new state of the system, and in that state, there is an internal NTP zone and it uses the boundary NTP zone from B2. It's fine if it takes a minute to get there because it can all be done in one execution step. (sort of a footnote: We may well want to only add things to DNS after they're up, but that doesn't really change this -- it just means that the internal DNS zone will fail to resolve the DNS name at first instead of resolving it to something that's not running NTP yet.)
All of the particulars here should be obviated by #4791 but it's certainly a general question that applies to everything here. Your comment has helped me think through this more. I think some principles on using the parent blueprint vs. inventory might be:
- When constructing the new blueprint, start with the parent blueprint and evolve it as needed. (Sounds obvious but it wasn't to me at first. Concretely: in the new blueprint, the set of Omicron zones for each sled starts with whatever was in the parent blueprint and not what's in the latest inventory precisely for the reason you mentioned: there might be other changes in the parent blueprint that haven't been executed yet, and that's fine -- we just want to make sure our new blueprint reflects those also.)
- When the new blueprint requires knowing what zones will exist in the new world (i.e., as in the case here, to construct a list of dependencies to put into a config block), you use the contents of the new blueprint itself. Here we use the parent blueprint instead because there are no changes: we don't touch boundary NTP zones yet. But if, say, we had deployed a new boundary NTP zone within the same blueprint, that'd be fine, too, and we'd want to include that in the list of boundary NTP zones that the internal NTP zones know about. (Again, this is a dicey example because there's not really a way to reconfigure the existing internal NTP zones -- we should do #4791 (NTP zone config could be more dynamic) and that would obviate this particular case.) There's a rough sketch of these first two principles after this list.
- When there are runtime dependencies in a sequence of operations, the planner might need to use the latest inventory collection to decide what to do, if anything. Examples include: when rolling out a bunch of new Nexus zones one at a time (phased rollout), we'll want to wait for execution of each blueprint that deploys a new zone. Or, right now, we really should wait for an internal NTP zone to show up in inventory before creating a plan that deploys Crucible zones. It's not really a problem in this PR because the plans are not generated automatically, but we'll need to update this before this is fully automated. (Another footnote: this is pretty quirky and I'm increasingly feeling like we should probably go do the real async `PUT /omicron-zones` instead, but in the meantime, this is the case and the planner needs to deal with it.) These will probably involve higher-level considerations, too: we don't want to remove a CockroachDB node, even if we have more than the minimum number, until we know that CockroachDB has come to rest in terms of replication. You could imagine, say, replacing a 3-node cluster by deploying 3 new nodes and removing the original 3. It's not safe to do that until we know the cluster's actually replicated to the 3 new nodes! So I expect we'll eventually add this sort of thing to inventory, too.
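Here's that rough sketch of the first two principles, using hypothetical simplified types rather than the real ones:

```rust
use std::collections::BTreeMap;
use uuid::Uuid;

// Hypothetical, simplified stand-ins for the real blueprint types.
#[derive(Clone, PartialEq, Eq)]
enum ZoneKind {
    BoundaryNtp,
    InternalNtp,
    Crucible,
}

#[derive(Clone)]
struct Zone {
    kind: ZoneKind,
}

#[derive(Clone)]
struct Blueprint {
    // Omicron zones, keyed by sled id.
    omicron_zones: BTreeMap<Uuid, Vec<Zone>>,
}

// Principle (1): the child blueprint starts as a copy of the parent, so it
// carries forward changes that may not have been executed yet.
fn begin_child(parent: &Blueprint) -> Blueprint {
    parent.clone()
}

// Principle (2): dependency lists are computed from the blueprint being
// built, so zones added earlier in the same planning pass are included.
fn boundary_ntp_zones(new_blueprint: &Blueprint) -> Vec<&Zone> {
    new_blueprint
        .omicron_zones
        .values()
        .flatten()
        .filter(|z| z.kind == ZoneKind::BoundaryNtp)
        .collect()
}
```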
This is definitely a really tricky area that I went back and forth on while working on this and will probably evolve, too.
> I wondered this too, but wasn't sure how to phrase the question, because I assume it's pretty reasonable we end up in this case:
>
> * We have a current target blueprint that we have not yet achieved
> * Something in the system changes such that we can never achieve the current target
>
> Now what? We either need to be able to generate a child blueprint despite the parent not being successfully executed, or maybe we need some escape hatch to say "regenerate a new initial blueprint based on the latest collection"?

Agreed. I think the conclusion is that we can't assume the parent blueprint has/will be executed. So far, I don't think that's been a difficult constraint.
Your third principle covers what I was worried about. Specifically if we started to deploy crucible zones assuming that the internal NTP zone was deployed already. If the planner takes into consideration the current inventory and that the parent blueprint may not have executed before generating the next blueprint, then we should be all set. This also means that the planner may very well be in a "wait for the target blueprint to complete" state for a good amount of time before it does anything. That seems normal though.
Ship it!
Changes look great - just one nit about a comment. Thanks!
fn sled_alloc_ip(&mut self, sled_id: Uuid) -> Result<Ipv6Addr, Error> {
    let sled_subnet = self.sled_resources(sled_id)?.subnet;

    // Work around rust-lang/rust-clippy#11935 and related issues.
This comment is no longer relevant since we reworked this
Thanks for the close reviews, @jgallagher and @andrewjstone!
This PR is the first step in creating a background task that is capable of taking a `Blueprint` and then reifying that blueprint into deployed or updated software. This PR uses the initial version of a Blueprint introduced in #4804. A basic executor that sends the related `OmicronZonesConfig` to the appropriate sled-agents for newly added sleds was created. A test is included that shows how a hypothetical planner for an `add-sled` workflow will deploy Omicron zones in a manner similar to RSS, where first the internal DNS zone is deployed and then the internal DNS and NTP zones are deployed. Deployment always contains all zones expected to be running on the sled-agent. Any zones running that are not included are expected to be shut down.
This replaces the in-memory blueprint storage added as a placeholder in #4804 with cockroachdb-backed tables. Both the tables and related queries are _heavily_ derived from the similar tables in the inventory system (particularly serializing omicron zones and their related properties). The tables are effectively identical as of this PR, but we opted to keep them separate because we expect them to diverge some over time (e.g., inventory might start collecting additional per-zone properties that don't exist for blueprints, such as uptime). The big exception to "basically the same as inventory" is the `bp_target` table which tracks the current (and past) target blueprint. Inserting into this table has some subtleties, and we use a CTE to check and enforce the invariants. This is the first diesel/CTE I've written; it's based on other similar CTEs in Nexus, but I'd still appreciate a particularly careful look there. Fixes #4793.
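For intuition, here's a minimal sketch of the target invariant with stand-in types; in the real implementation the check happens atomically inside the database via the CTE, not in application code like this.

```rust
use uuid::Uuid;

// Stand-in for the real blueprint type.
struct Blueprint {
    id: Uuid,
    parent_blueprint_id: Option<Uuid>,
}

// A blueprint may become the new target only if its parent is the current
// target; the initial blueprint (no parent) may only be the first target.
fn may_become_target(current_target: Option<Uuid>, candidate: &Blueprint) -> bool {
    match (current_target, candidate.parent_blueprint_id) {
        (None, None) => true,
        (Some(current), Some(parent)) => current == parent,
        _ => false,
    }
}
```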
This PR is the first step in creating a background task that is capable of taking a `Blueprint` and then reifying that blueprint into deployed or updated software. This PR uses the initial version of a Blueprint introduced in #4804. A basic executor that sends the related `OmicronZonesConfig` to the appropriate sled-agents for newly added sleds was created. A background task that loads the target `Blueprint` from the database and feeds it to the executor is also included, along with a test for each.
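A minimal sketch of the executor's shape, with hypothetical stand-in types and client (the real sled-agent client method name is an assumption here, based on the `PUT /omicron-zones` endpoint mentioned above):

```rust
use std::collections::BTreeMap;
use uuid::Uuid;

// Hypothetical stand-ins for the real types and sled-agent client.
struct OmicronZonesConfig {
    // ... full set of zones expected to run on the sled
}
struct SledAgentClient;
impl SledAgentClient {
    async fn omicron_zones_put(&self, _config: &OmicronZonesConfig) -> Result<(), String> {
        Ok(())
    }
}
struct Blueprint {
    omicron_zones: BTreeMap<Uuid, OmicronZonesConfig>,
}

// Push each sled's complete zone config; anything running on the sled but
// absent from the config is expected to be shut down.
async fn execute(
    blueprint: &Blueprint,
    clients: &BTreeMap<Uuid, SledAgentClient>,
) -> Result<(), String> {
    for (sled_id, config) in &blueprint.omicron_zones {
        let client = clients
            .get(sled_id)
            .ok_or_else(|| format!("no client for sled {sled_id}"))?;
        client.omicron_zones_put(config).await?;
    }
    Ok(())
}
```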
This PR provides an initial implementation for Blueprints, which we've variously called "update plans" or "deployment plans". RFD 418 (and the comments here) describe broadly how we expect this to work.
What's here is basically:

- `Blueprint`-related types

What's notably missing:
Importantly, none of the new stuff in this PR actually does anything unless a person goes out of their way to use the internal APIs to configure it. Even then, it only generates some structures in memory. The main reason to land it now is to unblock a bunch of follow-on work like the stuff mentioned above.
A few changes came along for the ride here:

- Some types are now used from `omicron_common` instead of using the client-generated types. This is more stuff in the spirit of #4754 (sled agent client could use some primitives from omicron_common), but these became surprisingly load-bearing in this case. A bunch of this PR is changes to a handful of lines in a bunch of files related to this (mostly removing unneeded calls to `into()`, `from()`, or `clone()`, or changing an import).

Some explicit questions for reviewers:
- omdb commands generally don't modify the system without something like a `-w`/`--destructive` flag (which I don't think exists). But here I'm adding some commands that do (sort of) modify the system. I filed #4805 (omdb could more safely support "write" commands) for better omdb support for this sort of thing. In the meantime, I feel like what's here is a reasonable step (because it doesn't actually do anything yet; when it does start doing something, we may well rip out these interfaces anyway; and I think there's a lot of benefit to having this tooling and I don't know where else to put it). But if people would rather I left that out I could remove the omdb stuff from this PR.

Status: I think the bulk of this is ready for review. I still want to add a bit to this PR: more automated tests plus some omdb commands for diff'ing a blueprint against another blueprint or an inventory collection. I hope those things won't change much of what's here but feel free to hold off on review if you're worried about that churn.