Skip to content

Conversation

@jgallagher
Copy link
Contributor

This finishes the set of PRs that tackle the first bullet of #9238:

  • Some limited resources (e.g., external IP addresses) are managed via the BlueprintResourceAllocator type. As currently implemented, this type is constructed after expungement (so it can reuse resources that have been freed up by that process) but before any new zones are added, but it's very non-obvious: BlueprintBuilder internally holds a OnceCell<BlueprintResourceAllocator> that it constructs on demand, so in practice this is constructed during the first zone add that needs an allocated resource. We should clean this up: it's fragile and not at all obvious that we're (ab)using a OnceCell specifically to delay decision making until the right type.

This requires callers to pass in external networking decisions when adding zones that need them (Nexus, External DNS, and Boundary NTP). IMO this is a big win to the planner / builder split - the planner creates an allocator at an appropriate point during planning, and uses that to make decisions about external networking - but it makes all the other BlueprintBuilder consumers (tests) a fair bit noisier, because they all have to do the same thing. I think this is still a win overall, but I'm very interested if others feel differently (and how we could clean up the test consumers too, if possible!).

A couple ideas on that last bit:

  • Add an BlueprintBuilder::available_external_networking(policy: &ExternalIpPolicy) -> Result<ExternalNetworkingAllocator>) method that just passes itself to ExternalNetworkingAllocator::from_current_zones(). (No real semantic change, but slightly shorter call sites?)
  • Add BlueprintBuilder::next_{nexus,external_dns,boundary_ntp}_external_ip(policy: &ExternalIpPolicy) -> Result<ExternalIp> methods that internally create an ExternalNetworkingAllocator, use it to generate a single IP, then throw it away. This would be correct but wasteful if used by the few callers that allocate multiple IPs, so they could still go with the "construct an allocator, use it several times" they have as the PR is now, but it would shorten many of the tests that only want to create one or two and that don't care about a bit of waste.

// similar case to the previous paragraph: a zone with networking resources was
// expunged, the database doesn't realize it yet, but can still move forward and
// make planning decisions that reuse those resources for new zones.
pub(super) fn ensure_input_networking_records_appear_in_parent_blueprint(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deleted this for a couple reasons:

  • I removed its only caller (resource_allocator()), and it's not obvious where a new caller would be. This is more of a blippy-level check, except that all the existing blippy checks only look at the Blueprint in isolation, and this is checking for consistency between a Blueprint and a PlanningInput.
  • This is here to bail out of planning early if we have some kind of mismatch between the external_ip CRDB table and the current target Blueprint. I don't believe it's ever actually caught a problem, and we don't have similar checks for other tables that blueprint execution updates.

But if this makes folks uneasy and have a suggestion for the first bullet, I don't feel strongly about it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I'd love to have blippy (optionally?) accept a PlanningInput at some point. There's at least one check with the mupdate-update work (host phase 2 image sources should be set to current if a mupdate override is set for that sled) that would be unblocked by that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Easy enough! #9356

Comment on lines 123 to 126
.context("constructing ExternalNetworkingAllocator")?
.for_new_nexus()
.context("choosing external IP for new Nexus")?;
builder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the context messages here don't quite match up with the ones above

// similar case to the previous paragraph: a zone with networking resources was
// expunged, the database doesn't realize it yet, but can still move forward and
// make planning decisions that reuse those resources for new zones.
pub(super) fn ensure_input_networking_records_appear_in_parent_blueprint(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I'd love to have blippy (optionally?) accept a PlanningInput at some point. There's at least one check with the mupdate-update work (host phase 2 image sources should be set to current if a mupdate override is set for that sled) that would be unblocked by that.

jgallagher added a commit that referenced this pull request Nov 10, 2025
This PR adds a new method to (optionally) check a blueprint against a
given planning input. The only check included in this PR is a
line-for-line port of the builder's
`ensure_input_networking_records_appear_in_parent_blueprint()` function
(which I'm removing from the builder in #9347). I updated most of the
`Blippy` call sites to use this method; I skipped the one in an RSS test
because it has no planning input available - RSS just generates a raw
blueprint to hand off to Nexus.
@jgallagher jgallagher merged commit b7eb043 into main Nov 10, 2025
16 checks passed
@jgallagher jgallagher deleted the john/prune-blueprint-resource-allocator branch November 10, 2025 19:49
jgallagher added a commit that referenced this pull request Nov 10, 2025
…o planner (#9370)

This is staged on top of #9347 and is a part of #9238. In particular, by
moving this from the builder to the planner, we can remove the builder's
need for an inventory collection entirely.

There should be no functional changes here; it's just moving code around
(with some reformatting to break it up a bit) and then removing the
`collection` argument from all the callers of
`BlueprintBuilder::new_based_on()`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants