Skip to content

[blueprint][datastore] Load blueprints before parsing them#9703

Merged
smklein merged 2 commits intomainfrom
bp-read-before-parse
Jan 22, 2026
Merged

[blueprint][datastore] Load blueprints before parsing them#9703
smklein merged 2 commits intomainfrom
bp-read-before-parse

Conversation

@smklein
Copy link
Collaborator

@smklein smklein commented Jan 21, 2026

This avoids parsing a "torn blueprint" by leaving all parsing checks until after the Blueprint rows have been confirmed
to arrive from a non-deleted blueprint.

Fixes #9697


paginator = p.found_batch(&batch, &|(s, _, _)| s.sled_id);

for (s, slot_a_version, slot_b_version) in batch {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to lines 1199-1237


paginator = p.found_batch(&batch, &|n| n.id);

for n in batch {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to lines 1229-1237


paginator = p.found_batch(&batch, &|(z, _)| z.id);

for (z, artifact) in batch {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to lines 1240-1292


paginator = p.found_batch(&batch, &|d| d.id);

for d in batch {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to lines 1295-1322


paginator = p.found_batch(&batch, &|d| d.id);

for d in batch {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to lines 1325-1353

paginator =
p.found_batch(&batch, &|k| k.omicron_zone_id);

for k in batch {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to line 1356ish

)
.map_err(|_| {
Error::internal_error(&format!(
"max server id is negative: {}",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it was a copy-pasted error message from above, but it's misleading. This is actually about highest_seen_keeper_leader_committed_log_index. Updated when moved on line 1427

@smklein smklein marked this pull request as ready for review January 21, 2026 23:37
@smklein smklein requested a review from jgallagher January 21, 2026 23:48
@smklein
Copy link
Collaborator Author

smklein commented Jan 21, 2026

Somewhat tricky to validate the original flake, since it was a timing issue, without some pretty invasive changes. Here's what I did:

  • Made the blueprint loading task yield + sleep between loading NICs + zones. This made the test flake reliably, which pretty much confirms "it was a concurrent delete" between those phases.
  • Re-ordered the row-reading to happen earlier, and parsing to happen later
  • Test passes, regardless of if/when/where we yield and sleep

Copy link
Contributor

@jgallagher jgallagher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I like the organization here quite a lot; the block comments about the stages are very useful.

@smklein smklein merged commit 9ccd203 into main Jan 22, 2026
16 checks passed
@smklein smklein deleted the bp-read-before-parse branch January 22, 2026 18:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

test failed in CI: test_concurrent_blueprint_read_delete

2 participants