[reconfigurator] BlueprintBuilder cleanup 3/5 - introduce BlueprintStorageEditor (#7105)
Conversation
Changes the API style to be more imperative. One behavioral change is that "generation 1" is only ever the empty set of disks; if any disks are added, that becomes "generation 2". This is consistent with how we define the `OmicronZonesConfig` generation, but is a change from what's on main.
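A minimal sketch of the generation semantics described above, under the assumption that an editor bumps the generation exactly once, at finalize time, if anything changed (all names here are illustrative, not the actual omicron types):

```rust
// Illustrative only: generation 1 is reserved for the empty disk set,
// and any edits bump the generation once, when the editor is finalized.
struct DisksConfig {
    generation: u64,
    disks: Vec<String>,
}

impl DisksConfig {
    fn empty() -> Self {
        // Generation 1 is only ever the empty set of disks.
        Self { generation: 1, disks: Vec::new() }
    }
}

struct DisksEditor {
    config: DisksConfig,
    changed: bool,
}

impl DisksEditor {
    fn add_disk(&mut self, disk: String) {
        self.config.disks.push(disk);
        self.changed = true;
    }

    // Bump the generation exactly once if any edit was made, so the
    // first batch of added disks yields generation 2.
    fn finalize(mut self) -> DisksConfig {
        if self.changed {
            self.config.generation += 1;
        }
        self.config
    }
}
```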
```rust
for (zone, _) in self.zones.current_sled_zones(
    sled_id,
    BlueprintZoneFilter::ShouldBeRunning,
) {
    if let Some(fs_zpool) = &zone.filesystem_pool {
        if *fs_zpool == expunged_zpool {
            zones_to_expunge.insert(zone.id);
        }
    }
    if let Some(dataset) = zone.zone_type.durable_dataset() {
        if dataset.dataset.pool_name == expunged_zpool {
            zones_to_expunge.insert(zone.id);
        }
    }
}
```
Should this be factored out, to an `expunge_zones_using_pool` function? Seems like it could be a method on `self.zones`.
Hmm, not super easily, I don't think? This gets to storage and zones being separate things, which I'd like to fix but is more work. I did factor out a chunk of this into a `zones_using_zpool()` method in 73b209f.
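A hypothetical sketch of what that factored-out helper could look like, inferred from the loop quoted above (the signature and types are assumptions; the real `zones_using_zpool()` from 73b209f may differ):

```rust
// Assumed shape, for illustration: yield every running zone on the sled
// whose filesystem or durable dataset lives on the given zpool.
pub fn zones_using_zpool<'a>(
    &'a self,
    sled_id: SledUuid,
    zpool: &'a ZpoolName,
) -> impl Iterator<Item = &'a BlueprintZoneConfig> + 'a {
    self.current_sled_zones(sled_id, BlueprintZoneFilter::ShouldBeRunning)
        .filter_map(move |(zone, _)| {
            let uses_fs = zone.filesystem_pool.as_ref() == Some(zpool);
            let uses_durable = zone
                .zone_type
                .durable_dataset()
                .map_or(false, |d| d.dataset.pool_name == *zpool);
            (uses_fs || uses_durable).then_some(zone)
        })
}
```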
```rust
// Expunging a zpool necessarily requires also expunging any zones that
// depended on it.
for zone_id in zones_to_expunge {
    self.sled_expunge_zone(sled_id, zone_id)?;
```
It's a little weird that we could be expunging zones, but that doesn't propagate through the `EnsureMultiple` result because that's assumed to be referring to "disks", right?
Yeah; this got major rework, mostly in 8573d08
```rust
    self.sled_expunge_zone(sled_id, zone_id)?;
}

if added == 0 && removed == 0 {
```
Suggested change:

```diff
-if added == 0 && removed == 0 {
+if added == 0 && removed == 0 && updated == 0 {
```
(Maybe `EnsureMultiple` should have a constructor that automatically makes the `NotNeeded` variant if all zeroes are passed.)
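A minimal sketch of such a constructor; the variant and field names below are assumptions based on this thread, not the actual `EnsureMultiple` definition:

```rust
// Illustrative only: collapse all-zero counts into NotNeeded at
// construction time so callers can't build a meaningless Changed value.
enum EnsureMultiple {
    Changed { added: usize, removed: usize, updated: usize },
    NotNeeded,
}

impl EnsureMultiple {
    fn new(added: usize, removed: usize, updated: usize) -> Self {
        if added == 0 && removed == 0 && updated == 0 {
            Self::NotNeeded
        } else {
            Self::Changed { added, removed, updated }
        }
    }
}
```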
Reworked in 8573d08
```rust
for zone in removed_zones {
    sled_storage.expunge_zone_datasets(zone);
}
mem::drop(sled_storage);
```
(Opinion, feel free to argue with me.)

WDYT about adding a method that consumes `self` on `sled_storage`, and documents a little more clearly what we're doing here? Something like `fn commit_to_blueprint(self)` or `fn bump_blueprint_generation_numbers(self)`?

The explicit drop calls are dropping the Dataset / Disk editors, and their drop impls bump generation numbers if anything has changed, but that's kinda only "easy to see" because I just reviewed that code. Outside of that context, it requires jumping through a few files to infer what's happening here, and why this call is necessary.

Same for the other `mem::drop` callsite.
FWIW, I am not thinking of these drop calls as behavioral (i.e., bumping generation numbers); I think it would be fine if that didn't happen right here and happened later when this would naturally be dropped. The drop is here to appease the borrow checker: we can't move on and call `self.record_operation` while we still have a mutable borrow back into `self.storage` (which `sled_storage` is).

That said - I added a `finalize()` method to the datasets editor based on your feedback in #7104, so I'm certainly not opposed to it! Let me take a stab at this in addition to the other comments around `Ensure`.
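A simplified, self-contained illustration of the borrow-checker point being made here; none of these are the actual omicron types:

```rust
// Illustrative only: an editor that mutably borrows from its parent and
// has a Drop impl. Because it implements Drop, its borrow lasts until it
// is dropped, so an explicit drop is needed before touching `self` again.
struct Editor<'a> {
    disks: &'a mut Vec<u32>,
    generation: &'a mut u64,
    changed: bool,
}

impl Drop for Editor<'_> {
    fn drop(&mut self) {
        // The Drop impl bumps the generation if anything changed.
        if self.changed {
            *self.generation += 1;
        }
    }
}

struct Builder {
    storage: Vec<u32>,
    generation: u64,
    log: Vec<String>,
}

impl Builder {
    fn sled_storage(&mut self) -> Editor<'_> {
        Editor {
            disks: &mut self.storage,
            generation: &mut self.generation,
            changed: false,
        }
    }

    fn record_operation(&mut self, op: &str) {
        self.log.push(op.to_string());
    }

    fn edit(&mut self) {
        let mut sled_storage = self.sled_storage();
        sled_storage.disks.push(1);
        sled_storage.changed = true;
        // Without this explicit drop, `sled_storage` would still hold a
        // mutable borrow into `self`, and the call below would not compile.
        std::mem::drop(sled_storage);
        self.record_operation("edited");
    }
}
```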
```rust
pub fn ensure_disk(&mut self, disk: BlueprintPhysicalDiskConfig) -> Ensure {
    let zpool = ZpoolName::new_external(disk.pool_id);

    let result = self.disks.ensure_disk(disk);

    // We ignore the result of possibly adding or updating the disk's
    // dataset. Do we care to log if they've changed even if the disk
    // doesn't?
    self.datasets.ensure_debug_dataset(zpool.clone());
    self.datasets.ensure_zone_root_dataset(zpool);

    result
}
```
I think we do care, especially for the purposes of keeping blueprint diffs accurate. Would it be unreasonable to return:

```rust
struct DiskEnsure {
    disk: Ensure,
    dataset: Ensure,
}
```

? To basically propagate information about "what disks changed" separately from "what datasets changed"?

(... this also makes me think that `Ensure` should be typed, like our UUIDs, but that's a whole 'nother can of worms)
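A sketch of what a "typed `Ensure`" might look like, borrowing the zero-sized-marker pattern used for typed UUIDs; every name here is hypothetical:

```rust
// Illustrative only: distinguish "ensured a disk" from "ensured a
// dataset" at the type level with phantom markers.
use std::marker::PhantomData;

enum EnsureKind {
    Added,
    Updated,
    NotNeeded,
}

struct DiskMarker;
struct DatasetMarker;

struct Ensure<T> {
    kind: EnsureKind,
    _marker: PhantomData<T>,
}

type DiskEnsure = Ensure<DiskMarker>;
type DatasetEnsure = Ensure<DatasetMarker>;
```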
I'd prefer to get rid of `Ensure` altogether and instead return the `EditCounts` recently added in PR 2/5 when `build` or `finalize` is called.
Totally agree that's a better name for it, but I still think there's the question of scope (e.g., did you edit a disk, dataset, zone, etc.?)
> Totally agree that's a better name for it, but I still think there's the question of scope (e.g., did you edit a disk, dataset, zone, etc.?)

Agreed. I think returning multiple `EditCounts` would make sense.
> I think we do care, especially for the purposes of keeping blueprint diffs accurate.

Can you say more about this? I think the return values of these functions are only really used in logging and some tests (which assert on the specific counts); diffs already do their own independent analysis of changes.
Which is not to say I'm opposed! Returning a combined edit counts thingamajig seems good, but it will need to be from a `finalize(self) -> AllEditCounts` kind of method, I think. Will give that a shot.
Yeah, I suppose not for the diffs, but the point of passing back the `ensure` structs in the first place was to keep track of "what changed", as you say, for tests, comments in the blueprint, etc.

(I'll take a look at the `AllEditCounts` stuff, but it sounds like we're on the same page here.)
```diff
-        self.config
-            .disks
-            .insert(PhysicalDiskUuid::from_untyped_uuid(disk.id), disk)
+    pub fn ensure_disk(&mut self, disk: BlueprintPhysicalDiskConfig) -> Ensure {
```
Same as in Sean's and my comments in #7104. Can we get rid of the `Ensure` type and track the counts in `SledDisksEditor` and `SledDatasetsEditor` themselves?
```rust
    self.disks.current_sled_disks(sled_id)
}

pub fn into_builders(
```
Rather than returning builder types and then having the `BlueprintBuilder` call build on both of them, what if we avoid creating the wrapper types and instead make this a `build` method that takes the in-service sled ids and returns the results as a pair? That should make the caller code and this code simpler, and I think it meets the goal of not allowing separate builders to be exposed for datasets and disks.
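A signature-level sketch of that proposal; the editor, config, and helper names here are assumptions for illustration, not the actual omicron definitions:

```rust
// Illustrative only: one build method that consumes the storage editor
// and returns the per-sled disks and datasets configs as a pair.
use std::collections::BTreeMap;

impl BlueprintStorageEditor {
    pub fn build(
        self,
        in_service_sled_ids: impl Iterator<Item = SledUuid>,
    ) -> (
        BTreeMap<SledUuid, BlueprintPhysicalDisksConfig>,
        BTreeMap<SledUuid, BlueprintDatasetsConfig>,
    ) {
        let mut disks = BTreeMap::new();
        let mut datasets = BTreeMap::new();
        for sled_id in in_service_sled_ids {
            // `finalize_sled` is a hypothetical per-sled finalizer.
            let (sled_disks, sled_datasets) = self.finalize_sled(sled_id);
            disks.insert(sled_id, sled_disks);
            datasets.insert(sled_id, sled_datasets);
        }
        (disks, datasets)
    }
}
```

Note the reply below: the original pair-returning version apparently required the sled IDs iterator to be cloneable; iterating once and filling both maps in the same pass, as sketched, is one way to avoid that.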
Yeah, I actually had it that way originally and it was sort of awkward to require the sled IDs iterator to be cloneable. But I can change it back.
Changed in 83a49b7
Force-pushed from ab94020 to 552d8bc.
```diff
 impl Drop for SledDisksEditor<'_> {
     fn drop(&mut self) {
-        if self.changed {
+        if self.counts != EditCounts::default() {
```
Extremely minor nitpick, but maybe this is worth an `EditCounts::has_changes() -> bool` method?
Yeah, seems reasonable. I might go with `has_nonzero_counts()` since this thing itself is just counts, and only the bigger context makes "changes" meaningful.
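A sketch of that method, assuming `EditCounts` derives `Default` and `PartialEq` (as the comparison in the `Drop` impl above suggests):

```rust
// Illustrative only: name the all-zero check instead of comparing against
// Default::default() at every call site.
impl EditCounts {
    pub fn has_nonzero_counts(&self) -> bool {
        *self != Self::default()
    }
}
```

The `Drop` impl above would then read `if self.counts.has_nonzero_counts() {`.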
…ol, it must be from a disk present in the blueprint (#7106)

Builds and is staged on top of #7105. The intended change here is in the first commit (8274174): in `BlueprintBuilder::sled_select_zpool()`, instead of only looking at the `PlanningInput`, we also look at the disks present in the blueprint, and only select a zpool that the planning input says is in service and that we have in the blueprint.

This had a surprisingly large blast radius in terms of tests - we had _many_ tests which were adding zones (which implicitly selects a zpool) from a `BlueprintBuilder` where there were no disks configured at all, causing them to emit invalid blueprints. These should all be fixed as of this PR, but I'm a little worried about test fragility in general, particularly with an eye toward larger changes like #7078. Nothing to do about that at the moment, but something to keep an eye on.

Fixes #7079.
Builds and is staged on top of #7104. This introduces a `BlueprintStorageEditor` that handles disks and datasets together, and attempts to address the API issues described by #7080. This should allow `BlueprintBuilder` clients who don't or can't call `sled_ensure_{disks,datasets}` (like reconfigurator-cli and some tests) to construct valid blueprints.