Skip to content

Return sentinels from sled_insert_resource_query#10399

Merged
jmpesp merged 8 commits intooxidecomputer:mainfrom
jmpesp:another_instance_start_permutation_party
May 8, 2026
Merged

Return sentinels from sled_insert_resource_query#10399
jmpesp merged 8 commits intooxidecomputer:mainfrom
jmpesp:another_instance_start_permutation_party

Conversation

@jmpesp
Copy link
Copy Markdown
Contributor

@jmpesp jmpesp commented May 7, 2026

The failure mode seen in #10353 involves multiple contending sled reservations and lead to a sled reservation iterating over all possible potential allocation mappings, causing the instance-start to appear to hang when in reality it was exploring all potential permutations.

The fix for that was to update the iterator's stored information so that it could recognize when the sled insert query it was about to execute could never work. This fix was insufficient as further testing lead to more instance-start sagas that appeared stuck.

During contention, there are a few broad categories to how the sled reservation insert query could fail:

  • the target sled could no longer have the available resources

  • putting a VMM on the target sled would now violate affinity or anti-affinity rules

  • the pools affected by requested local storage mapping no longer have the required available space

Updating the iterator's stored information would only let Nexus detect the last category where the instance-start saga hang seen today fell into the first. Nexus needs to know which part of the insert query failed so that it can decide what to do based on if local storage allocations are required or not, and this lead to this commit: using the "true or cast error" pattern, return sentinels from the sled insert resource query, and bail out of searching for a valid local storage allocation permutation if the sled target is no longer valid.

Fixes #10393

The failure mode seen in oxidecomputer#10353 involves multiple contending sled
reservations and lead to a sled reservation iterating over all possible
potential allocation mappings, causing the instance-start to appear to
hang when in reality it was exploring all potential permutations.

The fix for that was to update the iterator's stored information so that
it could recognize when the sled insert query it was about to execute
could never work. This fix was insufficient as further testing lead to
more instance-start sagas that appeared stuck.

During contention, there are a few broad categories to how the sled
reservation insert query could fail:

- the target sled could no longer have the available resources

- putting a VMM on the target sled would now violate affinity or
  anti-affinity rules

- the pools affected by requested local storage mapping no longer have
  the required available space

Updating the iterator's stored information would only let Nexus detect
the last category where the instance-start saga hang seen today fell
into the first. Nexus needs to know which part of the insert query
failed so that it can decide what to do based on if local storage
allocations are required or not, and this lead to this commit: using the
"true or cast error" pattern, return sentinels from the sled insert
resource query, and bail out of searching for a valid local storage
allocation permutation if the sled target is no longer valid.

Fixes oxidecomputer#10393
@jmpesp jmpesp requested review from hawkw and smklein May 7, 2026 02:51
|query| {
query.sql("EXISTS(SELECT 1 FROM sled_has_space)");
},
"SLED_HAS_SPACE",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels like a nitpick, but seems like actually important future-proofing:

Can we do:

const SPACE_SENTINEL: &static str = "SLED HAS SPACE";

const SLED_INSERT_QUERY_SENTINELS: [&'static str; 3] =
    [SPACE_SENTINEL, THE OTHER ONES ];

The contents of these strings is load-bearing; I'd like to define them in a single spot

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, same brain: #10399 (comment)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// as errors as this branch does not have any
// requested local storage allocations. Ignore these
// and proceed to the next sled_target.
info!(&log, "reservation failed due to {sentinel}");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other variants of this, we use the "sentinel" as a "dense but clear" indicator of the class of error, and then in rust code, we "Match on that variant to return a more readable error".

E.g., seeing BANNED_SLEDS in the logs with no context requires that someone go dig into the code. Might be nice to have a match statement going from "sentinel" -> "what does it mean in english though"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also same brain lol #10399 (comment)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


info!(
&log,
"reservation failed due to {sentinel}",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above about having a decoder ring for the sentinels

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +1217 to +1223
{
// The only part of the `insert_valid` section of
// the insertion query that can fail are the same
// places where these sentinels are cast and thrown
// as errors as this branch does not have any
// requested local storage allocations. Ignore these
// and proceed to the next sled_target.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
{
// The only part of the `insert_valid` section of
// the insertion query that can fail are the same
// places where these sentinels are cast and thrown
// as errors as this branch does not have any
// requested local storage allocations. Ignore these
// and proceed to the next sled_target.
{
// The only part of the `insert_valid` section of
// the insertion query that can fail are the same
// places where these sentinels are cast and thrown
// as errors, as this branch does not have any
// requested local storage allocations. Ignore these
// and proceed to the next sled_target.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

info!(&log, "reservation failed");
Err(e) => {
if let Some(sentinel) =
matches_sentinel(&e, &SLED_INSERT_QUERY_SENTINELS)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I had not seen matches_sentinel before and was going to leave some criticism of it, but I guess it's actually code that was here before this PR. Basically, I was just going to suggest that rather than doing this, we could have an error enum and convert the query error into an enum of SledHasSpace, BannedSleds, RequiredSleds, DbError(e) and match on that here...but, since this is a pre-existing pattern, I'm not going to make you go change it up as part of this bugfix. Carry on!

// as errors as this branch does not have any
// requested local storage allocations. Ignore these
// and proceed to the next sled_target.
info!(&log, "reservation failed due to {sentinel}");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

turbo nit: i feel like "reservation failed due to SLED_HAS_SPACE" is kind of a weird sentence, and i wonder if we ought to have a way to turn those sentinels into more human-able descriptions. on the other hand, i think including the sentinel that we hit in the log line is a good idea as it's greppable in the source when debugging.

maybe just turning the sentinel into a structured field would make it feel less weird?

Suggested change
info!(&log, "reservation failed due to {sentinel}");
info!(&log, "reservation failed"; "sentinel" => sentinel);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread nexus/db-queries/src/db/datastore/sled.rs
Comment on lines +1353 to +1355
// Alternatively, checking affinity related
// conditions could now also invalidate this
// sled_target.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe:

Suggested change
// Alternatively, checking affinity related
// conditions could now also invalidate this
// sled_target.
// Alternatively, concurrent reservations
// could have allocated an instance which
// makes this sled_target invalid for the
// instance we are trying to allocate due
// to affinity/anti-affinity constraints.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// sled_target.
//
// This isn't a problem when _not_ performing
// local storage allocations because the insert
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// local storage allocations because the insert
// local storage allocations, because the insert

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👎

Comment on lines +1360 to +1366
// be chosen right away. With local storage
// (just like the failure mode mentioned in a
// previous block comment), the iterator will
// erroneously keep returning different
// permutations, meanwhile the insert query
// isn't failing due to the available local
// storage space.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm not totally sure if I get what this comment is trying to say: you're saying that if we are allocating an instance with local storage, and we hit a sentinel indicating that a concurrent allocation rendered this sled target invalid, the insert query won't fail? That seems counter to what's actually happening here. It seems like the point this comment is actually trying to make is that when an insert query fails, we must break out of the iteration so that we don't keep trying other a whole bunch of other possible local storage allocations on the same sled which will also fail. Am I understanding correctly?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're saying that if we are allocating an instance with local storage, and we hit a sentinel indicating that a concurrent allocation rendered this sled target invalid, the insert query won't fail?

I wasn't precise with the language here: the insert will fail in the sense that it returns one of the sentinel errors. We have to break out of the iteration only when we see one of the sentinels. I rewrote the comment in 7c00158

Comment on lines +415 to +443
"SLED_HAS_SPACE",
);
query.sql(" AND ");

query.true_or_cast_error(
|query| {
query
.sql("NOT(EXISTS(SELECT 1 FROM banned_sleds WHERE sled_id = ")
.param()
.bind::<sql_types::Uuid, _>(
resource.sled_id.into_untyped_uuid(),
)
.sql("))");
},
"BANNED_SLEDS",
);
query.sql(" AND ");

query.true_or_cast_error(
|query| {
query
.sql("(EXISTS(SELECT 1 FROM required_sleds WHERE sled_id = ")
.param()
.bind::<sql_types::Uuid, _>(
resource.sled_id.into_untyped_uuid(),
)
.sql(") OR NOT EXISTS (SELECT 1 FROM required_sleds))");
},
"REQUIRED_SLEDS",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, perhaps not that important, but: since the string value of the sentinels is both (a) load-bearing and (b) would not change the validity of the SQL query itself if typo'd, I'd kind of rather that these were string constants defined in this module, and the array of sentinels used in sled.rs was defined in terms of those constants. so here, we'd write:

Suggested change
"SLED_HAS_SPACE",
);
query.sql(" AND ");
query.true_or_cast_error(
|query| {
query
.sql("NOT(EXISTS(SELECT 1 FROM banned_sleds WHERE sled_id = ")
.param()
.bind::<sql_types::Uuid, _>(
resource.sled_id.into_untyped_uuid(),
)
.sql("))");
},
"BANNED_SLEDS",
);
query.sql(" AND ");
query.true_or_cast_error(
|query| {
query
.sql("(EXISTS(SELECT 1 FROM required_sleds WHERE sled_id = ")
.param()
.bind::<sql_types::Uuid, _>(
resource.sled_id.into_untyped_uuid(),
)
.sql(") OR NOT EXISTS (SELECT 1 FROM required_sleds))");
},
"REQUIRED_SLEDS",
SLED_HAS_SPACE,
);
query.sql(" AND ");
query.true_or_cast_error(
|query| {
query
.sql("NOT(EXISTS(SELECT 1 FROM banned_sleds WHERE sled_id = ")
.param()
.bind::<sql_types::Uuid, _>(
resource.sled_id.into_untyped_uuid(),
)
.sql("))");
},
BANNED_SLEDS,
);
query.sql(" AND ");
query.true_or_cast_error(
|query| {
query
.sql("(EXISTS(SELECT 1 FROM required_sleds WHERE sled_id = ")
.param()
.bind::<sql_types::Uuid, _>(
resource.sled_id.into_untyped_uuid(),
)
.sql(") OR NOT EXISTS (SELECT 1 FROM required_sleds))");
},
REQUIRED_SLEDS,

and then somewhere at the top of this file:

/// The sled insert query will return a sentinel (using the pattern of an
/// erroneous cast of a string into a bool) if the insert is no longer valid due
/// to three of the conditions related to a sled's available hardware resources,
/// or affinity rules.
pub const SLED_INSERT_QUERY_SENTINELS: [&'static str; 3] =
    [SLED_HAS_SPACE, BANNED_SLEDS, REQUIRED_SLEDS];

const SLED_HAS_SPACE: &'static str = "SLED_HAS_SPACE";
const BANNED_SLEDS: &'static str = "BANNED_SLEDS";
const REQUIRED_SLEDS: &'static str = "REQUIRED_SLEDS";

This way, we are guaranteeing that the sentinel value in the query and the one that we are checking for in the sled-allocation logic are always the same string. I realize that you didn't typo them here, but I think it's maybe worth establishing the pattern now so that future code which might add a new sentinel value also does it this way?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some annoying nits. overall, though, i think the change makes sense!

Comment on lines +142 to +156
// [`TrueOrCastError`] exists but requires an expression that evaluates to
// sql_type::Bool. This function applies the same pattern but for the
// QueryBuilder.
pub fn true_or_cast_error<F>(&mut self, condition: F, error: &'static str)
where
F: FnOnce(&mut QueryBuilder),
{
self.sql("CAST(IF(");
self.sql("(");
condition(self);
self.sql("),");
self.sql("'TRUE', '");
self.sql(error);
self.sql("') AS BOOL)");
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is cute! i wonder if it might make sense for TrueOrCastError to be reimplemented using this but LET'S DO THAT IN A SEPARATE PR FROM THE BUGFIX. :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is not actually something that changed in this PR, but i noticed that there is no longer a loop labeled 'outer, as neither loop is labeled. mayhaps we should update that?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see 75dedae

"reservation failed due to {sentinel}",
);

break;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, this is doing the correct thing and breaking out of the inner loop, so this isn't a bug or anything, but...part of me would kind of like to see a label on the inner loop and use a labeled break here (like break 'local_storage_allocations or something?). mainly, there is just a lot of nesting going on here and (especially because the loop keywords are several lines above here due to the very long comments (which , for the record, are good and we should keep!) but it's a bit hard to tell which loop is being breaked here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see 75dedae

Copy link
Copy Markdown
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the feedback, this looks good to me! I had a couple more little nitpicks, but nothing worth blocking the fix over.

Comment on lines +26 to +33
use crate::db::queries::sled_reservation::BANNED_SLEDS_SENTINEL;
use crate::db::queries::sled_reservation::BANNED_SLEDS_SENTINEL_REASON;
use crate::db::queries::sled_reservation::LocalStorageAllocation;
use crate::db::queries::sled_reservation::LocalStorageAllocationRequired;
use crate::db::queries::sled_reservation::REQUIRED_SLEDS_SENTINEL;
use crate::db::queries::sled_reservation::REQUIRED_SLEDS_SENTINEL_REASON;
use crate::db::queries::sled_reservation::SLED_HAS_SPACE_SENTINEL;
use crate::db::queries::sled_reservation::SLED_HAS_SPACE_SENTINEL_REASON;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh, i find it a bit odd that all the sentinels and their reasons are defined in sled_reservation but the SLED_INSERT_QUERY_SENTINELS array and sentinel_to_reason are defined here...I kinda feel like we could import less stuff if we put all the sentinel-related code in that module? then you would just need SLED_INSERT_QUERY_SENTINELS and SENTINEL_TO_REASON...

not a big deal though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, I agree - f60dfe9

Comment on lines +675 to +680
SLED_HAS_SPACE_SENTINEL => SLED_HAS_SPACE_SENTINEL_REASON,

BANNED_SLEDS_SENTINEL => BANNED_SLEDS_SENTINEL_REASON,

REQUIRED_SLEDS_SENTINEL => REQUIRED_SLEDS_SENTINEL_REASON,

Copy link
Copy Markdown
Member

@hawkw hawkw May 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird whitespace here

Suggested change
SLED_HAS_SPACE_SENTINEL => SLED_HAS_SPACE_SENTINEL_REASON,
BANNED_SLEDS_SENTINEL => BANNED_SLEDS_SENTINEL_REASON,
REQUIRED_SLEDS_SENTINEL => REQUIRED_SLEDS_SENTINEL_REASON,
SLED_HAS_SPACE_SENTINEL => SLED_HAS_SPACE_SENTINEL_REASON,
BANNED_SLEDS_SENTINEL => BANNED_SLEDS_SENTINEL_REASON,
REQUIRED_SLEDS_SENTINEL => REQUIRED_SLEDS_SENTINEL_REASON,

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not that weird! f60dfe9


REQUIRED_SLEDS_SENTINEL => REQUIRED_SLEDS_SENTINEL_REASON,

_ => unreachable!("unrecognized sentinel"),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd kinda rather we not panic here, even though I agree that it should be unreachable in real life. We could just return the string instead?

Suggested change
_ => unreachable!("unrecognized sentinel"),
_ => "unrecognized sentinel",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

frankly we could also "just return the sentinel" in this case. the point is just "improve readability" in this case

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like Sean's thing, you should do that

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

) {
// Concurrent sled reservations could have
// allocated enough hardware threads, RSS RAM,
// and/ or reservoir RAM that means this
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// and/ or reservoir RAM that means this
// and/or reservoir RAM that means this

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +1395 to +1396
// permutations, meanwhile the insert query
// isn't succeeding for reasons other than the
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// permutations, meanwhile the insert query
// isn't succeeding for reasons other than the
// permutations. Meanwhile, the insert query
// will not succeed for reasons other than the

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +1397 to +1401
// available space for local storage. In this
// case, one of the sentinels will be returned,
// and if we see those then bail out of the
// search right away and pick another
// sled_target.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, this is definitely clearer!

@hawkw hawkw requested a review from smklein May 7, 2026 22:20

REQUIRED_SLEDS_SENTINEL => REQUIRED_SLEDS_SENTINEL_REASON,

_ => unreachable!("unrecognized sentinel"),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

frankly we could also "just return the sentinel" in this case. the point is just "improve readability" in this case

@jmpesp jmpesp enabled auto-merge (squash) May 8, 2026 02:15
@jmpesp jmpesp merged commit 7c6ad41 into oxidecomputer:main May 8, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Too many local storage requests ruin the instance placement solver

3 participants