
Conversation

@jgallagher (Contributor) commented on Nov 19, 2024

Builds on and is staged on top of #7104. This introduces a BlueprintStorageEditor that handles disks and datasets together, and attempts to address the API issues described in #7080:

  • Adding a zone also adds its datasets
  • Adding a disk also adds its Debug and Zone Root datasets
  • Expunging a disk expunges any datasets that were on it
  • Expunging a dataset expunges any zones that were on it

This should allow BlueprintBuilder clients who don't or can't call sled_ensure_{disks,datasets} (like reconfigurator-cli and some tests) to construct valid blueprints.

Changes the API style to be more imperative. One behavioral change is
that "generation 1" is only ever the set of empty disks; if any disks
are added, that becomes "generation 2". This is consistent with how we
define the OmicronZonesConfig generation, but is a change from what's on
main.
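
For readers skimming the description, here is a minimal sketch of the coupling described in the bullets above; the types and method names are simplified stand-ins invented for illustration, not the actual BlueprintStorageEditor API:

```rust
// Simplified sketch of the disk/dataset/zone coupling described above.
// All types here are invented stand-ins; the real editor operates on the
// omicron blueprint types.
use std::collections::BTreeMap;

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct ZpoolId(u32);

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum DatasetKind {
    Debug,
    ZoneRoot,
}

#[derive(Default)]
struct StorageEditor {
    // pool -> datasets on that pool
    datasets: BTreeMap<ZpoolId, Vec<DatasetKind>>,
    // zone id -> pool its filesystem lives on
    zones: BTreeMap<u32, ZpoolId>,
}

impl StorageEditor {
    /// Adding a disk also adds its Debug and Zone Root datasets.
    fn add_disk(&mut self, pool: ZpoolId) {
        self.datasets
            .entry(pool)
            .or_default()
            .extend([DatasetKind::Debug, DatasetKind::ZoneRoot]);
    }

    /// Expunging a disk expunges its datasets and any zones on that pool.
    fn expunge_disk(&mut self, pool: ZpoolId) {
        self.datasets.remove(&pool);
        self.zones.retain(|_, zone_pool| *zone_pool != pool);
    }
}

fn main() {
    let mut editor = StorageEditor::default();
    editor.add_disk(ZpoolId(1));
    editor.zones.insert(100, ZpoolId(1));
    editor.expunge_disk(ZpoolId(1));
    assert!(editor.datasets.is_empty());
    assert!(editor.zones.is_empty());
}
```
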
Comment on lines 859 to 874
for (zone, _) in self.zones.current_sled_zones(
    sled_id,
    BlueprintZoneFilter::ShouldBeRunning,
) {
    if let Some(fs_zpool) = &zone.filesystem_pool {
        if *fs_zpool == expunged_zpool {
            zones_to_expunge.insert(zone.id);
        }
    }
    if let Some(dataset) = zone.zone_type.durable_dataset() {
        if dataset.dataset.pool_name == expunged_zpool {
            zones_to_expunge.insert(zone.id);
        }
    }
}
Collaborator:

Should this be factored out, to an expunge_zones_using_pool function? Seems like it could be a method on self.zones.

Contributor Author:

Hmm, not super easily, I don't think? This gets to storage and zones being separate things, which I'd like to fix but is more work.

I did factor out a chunk of this into a zones_using_zpool() method in 73b209f
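
A rough sketch of what a zones_using_zpool()-style helper looks like, using invented stand-in types rather than the real BlueprintZoneConfig; the actual method added in 73b209f lives in the builder:

```rust
// Illustrative only: stand-in types approximating the zones_using_zpool()
// helper mentioned above.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct ZpoolName(u32);

struct Zone {
    id: u32,
    filesystem_pool: Option<ZpoolName>,
    durable_pool: Option<ZpoolName>,
}

/// Return the IDs of zones that depend on `pool`, either for their
/// filesystem dataset or for their durable dataset.
fn zones_using_zpool<'a>(
    zones: &'a [Zone],
    pool: ZpoolName,
) -> impl Iterator<Item = u32> + 'a {
    zones.iter().filter_map(move |z| {
        let uses_pool =
            z.filesystem_pool == Some(pool) || z.durable_pool == Some(pool);
        uses_pool.then_some(z.id)
    })
}

fn main() {
    let zones = vec![
        Zone { id: 1, filesystem_pool: Some(ZpoolName(7)), durable_pool: None },
        Zone { id: 2, filesystem_pool: None, durable_pool: Some(ZpoolName(9)) },
    ];
    let hit: Vec<u32> = zones_using_zpool(&zones, ZpoolName(7)).collect();
    assert_eq!(hit, vec![1]);
}
```
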

// Expunging a zpool necessarily requires also expunging any zones that
// depended on it.
for zone_id in zones_to_expunge {
    self.sled_expunge_zone(sled_id, zone_id)?;
Collaborator:

It's a little weird that we could be expunging zones, but that doesn't propagate through the EnsureMultiple result because that's assumed to be referring to "disks", right?

Contributor Author:

Yeah; this got major rework, mostly in 8573d08

    self.sled_expunge_zone(sled_id, zone_id)?;
}

if added == 0 && removed == 0 {
Collaborator:

Suggested change:
-    if added == 0 && removed == 0 {
+    if added == 0 && removed == 0 && updated == 0 {

(Maybe EnsureMultiple should have a constructor that automatically makes the NotNeeded variant if all zeroes are passed)
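
A minimal sketch of the suggested constructor; the variant and field names are assumptions for illustration, not the real EnsureMultiple type:

```rust
// Sketch of the constructor idea above; names are invented stand-ins.
#[derive(Debug, PartialEq, Eq)]
enum EnsureMultiple {
    Changed { added: usize, updated: usize, removed: usize },
    NotNeeded,
}

impl EnsureMultiple {
    /// Collapse an all-zero change set into `NotNeeded` automatically.
    fn new(added: usize, updated: usize, removed: usize) -> Self {
        if added == 0 && updated == 0 && removed == 0 {
            EnsureMultiple::NotNeeded
        } else {
            EnsureMultiple::Changed { added, updated, removed }
        }
    }
}

fn main() {
    assert_eq!(EnsureMultiple::new(0, 0, 0), EnsureMultiple::NotNeeded);
    assert!(matches!(
        EnsureMultiple::new(1, 0, 0),
        EnsureMultiple::Changed { added: 1, .. }
    ));
}
```
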

Contributor Author:

Reworked in 8573d08

for zone in removed_zones {
    sled_storage.expunge_zone_datasets(zone);
}
mem::drop(sled_storage);
Collaborator:

(Opinion, feel free to argue with me)

WDYT about adding a method that consumes self on sled_storage, and documents a little more clearly what we're doing here? Something like fn commit_to_blueprint(self), or fn bump_blueprint_generation_numbers(self)?

The explicit drop calls are dropping the Dataset / Disk editors, and their drop impls bump generation numbers if anything has changed, but that's kinda only "easy to see" because I just reviewed that code. Outside of that context, it requires jumping through a few files to infer what's happening here, and why this call is necessary.

Same for the other mem::drop callsite
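
A small sketch of the consuming-method idea, with invented stand-in types; the real editors wrap blueprint disk/dataset configs and track more state than this:

```rust
// Sketch of an explicit "consume the editor" method, instead of relying
// on Drop to bump the generation. Types are invented for illustration.
struct DisksConfig {
    generation: u64,
}

struct SledDisksEditor<'a> {
    config: &'a mut DisksConfig,
    changed: bool,
}

impl<'a> SledDisksEditor<'a> {
    /// Consume the editor, bumping the config generation if anything
    /// changed. Making this explicit documents why the borrow ends here.
    fn finalize(self) {
        if self.changed {
            self.config.generation += 1;
        }
    }
}

fn main() {
    let mut config = DisksConfig { generation: 1 };
    let editor = SledDisksEditor { config: &mut config, changed: true };
    editor.finalize(); // ends the mutable borrow and bumps the generation
    assert_eq!(config.generation, 2);
}
```
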

Contributor Author:

FWIW, I am not thinking of these drop calls as behavioral (i.e., bumping generation numbers); I think it would be fine if that didn't happen right here and happened later when this would naturally be dropped. The drop is here to appease the borrow checker: we can't move on and call self.record_operation while we still have a mutable borrow back into self.storage (which sled_storage is).

That said - I added a finalize() method to the datasets editor based on your feedback in #7104, so I'm certainly not opposed to it! Let me take a stab at this in addition to the other comments around Ensure.

Comment on lines 113 to 125
pub fn ensure_disk(&mut self, disk: BlueprintPhysicalDiskConfig) -> Ensure {
    let zpool = ZpoolName::new_external(disk.pool_id);

    let result = self.disks.ensure_disk(disk);

    // We ignore the result of possibly adding or updating the disk's
    // dataset. Do we care to log if they've changed even if the disk
    // doesn't?
    self.datasets.ensure_debug_dataset(zpool.clone());
    self.datasets.ensure_zone_root_dataset(zpool);

    result
}
Collaborator:

I think we do care, especially for the purposes of keeping blueprint diffs accurate.

Would it be unreasonable to return:

struct DiskEnsure {
  disk: Ensure,
  dataset: Ensure,
}

? To basically propagate information about "what disks changed" separately from "what datasets changed"?

(... this also makes me think that Ensure should be typed, like our UUIDs, but that's a whole 'nother can of worms)
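
A sketch of the "typed Ensure" aside, modeled loosely on how typed IDs use marker types; everything here is invented for illustration:

```rust
// Sketch only: a marker type parameter makes it a type error to hand a
// dataset result to code expecting a disk result.
use std::marker::PhantomData;

struct DiskKind;
struct DatasetKind;

/// An `Ensure` result tagged with the kind of thing that was ensured.
enum Ensure<T> {
    Added(PhantomData<T>),
    NotNeeded(PhantomData<T>),
}

impl<T> Ensure<T> {
    fn added() -> Self {
        Ensure::Added(PhantomData)
    }
    fn not_needed() -> Self {
        Ensure::NotNeeded(PhantomData)
    }
}

fn record_disk_change(_: Ensure<DiskKind>) {}
fn record_dataset_change(_: Ensure<DatasetKind>) {}

fn main() {
    record_disk_change(Ensure::<DiskKind>::added());
    record_dataset_change(Ensure::<DatasetKind>::not_needed());
    // record_disk_change(Ensure::<DatasetKind>::added()); // would not compile
}
```
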

Contributor:

I'd prefer to get rid of Ensure altogether and instead return the EditCounts recently added in PR 2/5 when build or finalize is called.

Collaborator:

Totally agree that's a better name for it, but I still think there's the question of scope (e.g., did you edit a disk, dataset, zone, etc?)

Contributor:

> Totally agree that's a better name for it, but I still think there's the question of scope (e.g., did you edit a disk, dataset, zone, etc?)

Agreed. I think returning multiple EditCounts would make sense.

Contributor Author:

> I think we do care, especially for the purposes of keeping blueprint diffs accurate.

Can you say more about this? I think the return values of these functions are only really used in logging and some tests (which assert on the specific counts); diffs already do their own independent analysis of changes.

Which is not to say I'm opposed! Returning a combined edit counts thingamajig seems good, but it will need to be from a finalize(self) -> AllEditCounts kind of method, I think. Will give that a shot.
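
A sketch of what a finalize(self) -> AllEditCounts shape could look like; the type and field names below are assumptions, not the eventual implementation:

```rust
// Sketch of a consuming finalize() that reports everything the editor
// changed, scoped per kind of object; names are invented stand-ins.
#[derive(Clone, Copy, Default, Debug, PartialEq, Eq)]
struct EditCounts {
    added: usize,
    updated: usize,
    expunged: usize,
}

#[derive(Default, Debug)]
struct StorageEditCounts {
    disks: EditCounts,
    datasets: EditCounts,
}

#[derive(Default)]
struct StorageEditor {
    disk_counts: EditCounts,
    dataset_counts: EditCounts,
}

impl StorageEditor {
    fn ensure_disk(&mut self /* , disk: ... */) {
        // Adding a disk implies adding its Debug and Zone Root datasets,
        // so both sets of counts can move in a single call.
        self.disk_counts.added += 1;
        self.dataset_counts.added += 2;
    }

    /// Consume the editor and report all edits at once, instead of
    /// returning an `Ensure` from each call.
    fn finalize(self) -> StorageEditCounts {
        StorageEditCounts {
            disks: self.disk_counts,
            datasets: self.dataset_counts,
        }
    }
}

fn main() {
    let mut editor = StorageEditor::default();
    editor.ensure_disk();
    let counts = editor.finalize();
    assert_eq!(counts.disks.added, 1);
    assert_eq!(counts.datasets.added, 2);
}
```
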

Collaborator:

Yeah, I suppose not for the diffs, but the point of passing back the ensure structs in the first place was to keep track of "what changed", as you say, for tests, comments in the blueprint, etc.

(I'll take a look at the AllEditCounts stuff, but it sounds like we're on the same page here)

self.config
    .disks
    .insert(PhysicalDiskUuid::from_untyped_uuid(disk.id), disk)
pub fn ensure_disk(&mut self, disk: BlueprintPhysicalDiskConfig) -> Ensure {
Contributor:

Same as in Sean and my comments in #7104. Can we get rid of the Ensure type and track the counts in SledDisksEditor and SledDatasetsEditor themselves?
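
A minimal sketch of tracking counts inside the editor itself, with invented stand-in types; ensure_disk distinguishes an insert from an update by inspecting what was already in the map, so no per-call Ensure value is needed:

```rust
// Sketch only: the editor accumulates its own EditCounts as it goes.
use std::collections::BTreeMap;

#[derive(Clone, Copy, Default, Debug, PartialEq, Eq)]
struct EditCounts {
    added: usize,
    updated: usize,
}

#[derive(Clone, PartialEq, Eq, Debug)]
struct DiskConfig {
    id: u32,
    pool: u32,
}

#[derive(Default)]
struct SledDisksEditor {
    disks: BTreeMap<u32, DiskConfig>,
    counts: EditCounts,
}

impl SledDisksEditor {
    fn ensure_disk(&mut self, disk: DiskConfig) {
        match self.disks.insert(disk.id, disk.clone()) {
            None => self.counts.added += 1,
            // Only count an update if the config actually changed.
            Some(prev) if prev != disk => self.counts.updated += 1,
            Some(_) => (),
        }
    }
}

fn main() {
    let mut editor = SledDisksEditor::default();
    editor.ensure_disk(DiskConfig { id: 1, pool: 10 });
    editor.ensure_disk(DiskConfig { id: 1, pool: 10 }); // no-op
    editor.ensure_disk(DiskConfig { id: 1, pool: 11 }); // update
    assert_eq!(editor.counts, EditCounts { added: 1, updated: 1 });
}
```
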

    self.disks.current_sled_disks(sled_id)
}

pub fn into_builders(
Contributor:

Rather than returning builder types and then having the BlueprintBuilder call build on both of them, what if we avoid creating the wrapper types and instead make this a build method that takes the in-service sled IDs and returns the results as a pair?

That should make the caller code and this code simpler, and I think it meets the goal of not allowing separate builders to be exposed for datasets and disks. A sketch of that shape follows below.
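
A sketch of the single-build-method shape, using simplified stand-in types and signatures; collecting the sled IDs into a set up front is one way to sidestep the "iterator must be cloneable" awkwardness mentioned in the reply below:

```rust
// Sketch only: consume the editor and emit both maps at once, so callers
// never see separate disk and dataset builders.
use std::collections::{BTreeMap, BTreeSet};

type SledUuid = u64;

#[derive(Debug)]
struct DisksConfig;
#[derive(Debug)]
struct DatasetsConfig;

struct BlueprintStorageEditor {
    disks: BTreeMap<SledUuid, DisksConfig>,
    datasets: BTreeMap<SledUuid, DatasetsConfig>,
}

impl BlueprintStorageEditor {
    fn build(
        self,
        in_service_sleds: impl Iterator<Item = SledUuid>,
    ) -> (BTreeMap<SledUuid, DisksConfig>, BTreeMap<SledUuid, DatasetsConfig>) {
        // Collecting once avoids requiring the iterator to be Clone.
        let keep: BTreeSet<SledUuid> = in_service_sleds.collect();
        let disks = self
            .disks
            .into_iter()
            .filter(|(id, _)| keep.contains(id))
            .collect();
        let datasets = self
            .datasets
            .into_iter()
            .filter(|(id, _)| keep.contains(id))
            .collect();
        (disks, datasets)
    }
}

fn main() {
    let editor = BlueprintStorageEditor {
        disks: BTreeMap::from([(1, DisksConfig)]),
        datasets: BTreeMap::from([(1, DatasetsConfig), (2, DatasetsConfig)]),
    };
    let (disks, datasets) = editor.build([1u64].into_iter());
    assert_eq!(disks.len(), 1);
    assert_eq!(datasets.len(), 1);
}
```
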

Contributor Author:

Yeah, I actually had it that way originally and it was sort of awkward to require the sled IDs iterator to be cloneable. But I can change it back.

Contributor Author:

Changed in 83a49b7

@jgallagher force-pushed the john/reconfigurator-storage-2 branch from ab94020 to 552d8bc on November 20, 2024 at 18:52
Base automatically changed from john/reconfigurator-storage-2 to main on November 20, 2024 at 21:11
@jgallagher (Contributor Author):

Thanks for the quick reviews! Merging with the EditCounts etc. changes in #7104 and then reworking this PR to be in that style was nontrivial - maybe worth a second look? Particularly 2eef785 and 8573d08

impl Drop for SledDisksEditor<'_> {
    fn drop(&mut self) {
-       if self.changed {
+       if self.counts != EditCounts::default() {
Collaborator:

Extremely minor nitpick, but maybe this is worth an EditCounts::has_changes() -> bool method?

Contributor Author:

Yeah, seems reasonable. I might go with has_nonzero_counts() since this thing itself is just counts, and only the bigger context makes "changes" meaningful.
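
A minimal sketch of that helper; the real EditCounts has more fields, but the shape of the check is the same (assuming it derives PartialEq and Default):

```rust
// Sketch only: fields are invented stand-ins for the real EditCounts.
#[derive(Clone, Copy, Default, PartialEq, Eq, Debug)]
struct EditCounts {
    added: usize,
    updated: usize,
    expunged: usize,
}

impl EditCounts {
    /// True if any count is nonzero, i.e. the editor made some change.
    fn has_nonzero_counts(&self) -> bool {
        *self != Self::default()
    }
}

fn main() {
    assert!(!EditCounts::default().has_nonzero_counts());
    assert!(EditCounts { added: 1, ..Default::default() }.has_nonzero_counts());
}
```
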

@jgallagher merged commit ee0db08 into main on Nov 21, 2024
17 checks passed
@jgallagher deleted the john/reconfigurator-storage-3 branch on November 21, 2024 at 18:14
andrewjstone pushed a commit that referenced this pull request Nov 22, 2024
…ol, it must be from a disk present in the blueprint (#7106)

Builds and is staged on top of #7105.

The intended change here is in the first commit
(8274174): In
`BlueprintBuilder::sled_select_zpool()`, instead of only looking at the
`PlanningInput`, we also look at the disks present in the blueprint, and
only select a zpool that the planning input says is in service and that
we have in the blueprint.

This had a surprisingly-large blast radius in terms of tests - we had
_many_ tests which were adding zones (which implicitly selects a zpool)
from a `BlueprintBuilder` where there were no disks configured at all,
causing them to emit invalid blueprints. These should all be fixed as of
this PR, but I'm a little worried about test fragility in general,
particularly with an eye toward larger changes like #7078. Nothing to do
about that at the moment, but something to keep an eye on.

Fixes #7079.
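
An illustrative-only sketch of the selection rule described above, over invented stand-in types: a pool is a candidate only if the planning input says it is in service and the blueprint has a disk backing it:

```rust
// Sketch of the zpool-selection rule in #7106; types are stand-ins, not
// the real BlueprintBuilder::sled_select_zpool() signature.
use std::collections::BTreeSet;

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct ZpoolName(u32);

fn select_zpool(
    in_service_from_planning_input: &BTreeSet<ZpoolName>,
    pools_backed_by_blueprint_disks: &BTreeSet<ZpoolName>,
) -> Option<ZpoolName> {
    in_service_from_planning_input
        .iter()
        .copied()
        // Skip pools the blueprint has no disk for; otherwise a zone could
        // be placed on storage the blueprint never provisions.
        .find(|pool| pools_backed_by_blueprint_disks.contains(pool))
}

fn main() {
    let planning: BTreeSet<_> = [ZpoolName(1), ZpoolName(2)].into();
    let blueprint: BTreeSet<_> = [ZpoolName(2)].into();
    assert_eq!(select_zpool(&planning, &blueprint), Some(ZpoolName(2)));
}
```
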