Skip to content

Conversation

@labbott
Copy link
Contributor

@labbott labbott commented Sep 17, 2025

This is the minimum set of changes to be able to update the next generation sled Cosmo. The most intrusive change is choosing a separate host phase 1 artifact.

This change does not attempt to differentiate between a Gimlet and a Cosmo at the inventory level which will need to happen separately. This change also does not support updating cosmos via reconfigurator driven update.

Comment on lines 724 to 739
// We expect to record a `KnownArtifactKind::Host`
// but we also need to ensure we end up with a unique
// entry of (name, version, ArtifactKind::HOST_PHASE_1)
// for use in the database
let gimlet_unique_artifact_id = ArtifactId {
name: format!("{}-gimlet", artifact_id.name),
version: artifact_id.version.clone(),
kind: KnownArtifactKind::Host.into(),
};

let cosmo_unique_artifact_id = ArtifactId {
name: format!("{}-cosmo", artifact_id.name),
version: artifact_id.version.clone(),
kind: KnownArtifactKind::Host.into(),
};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the weirder edge cases I needed to handle. I went back and forth about adding ArtifactKind::COSMO_HOST_PHASE_1 and ArtifactKind::GIMLET_HOST_PHASE_1 but decided against it and used the existing board entry to differentiate between the two artifacts like we do for SP images.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For compatibility with reconfigurator, I think it would be nicer to have separate ArtifactKind for each of the sled types. If we differentiate via board, the code in the reconfigurator planner would be a bit awkward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay this is enough encouragement to get me to do it. It also means we shouldn't need the extra board information floating around.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was easier than I expected and it seems much cleaner! Thanks for the nudge to do this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Contributor

@karencfv karencfv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So nice to see the cosmo integration work reaching this part of the system!

if self.host_phase_1.is_some() || self.host_phase_2_hash.is_some() {
if self.cosmo_host_phase_1.is_some()
|| self.gimlet_host_phase_1.is_some()
|| self.host_phase_2_hash.is_some()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my own understanding. Host phase 2 artifacts will always be the same regardless of the type of sled, yes? Do you have a link to an RFD, or somewhere where I can read more on how this works?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karencfv that's correct: the Phase 2 image is common across sled types. Phase 1 differs only because we have different firmware blobs and things (from AMD) that we vary across microarchitectures (Milan vs Turin) that need to be loaded by the PSP; the x86 code (phbl and the OS bits) are identical, however.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I don't think I know an RFD off the top of my head

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The split is discussed in RFD 241. RFD 281 discusses the split, but doesn't go into much detail. RFD 284 discusses the phase 1 loading process in detail.

To my knowledge, no RFD has been updated to refer to the board-specific ROM images that are now created during image generation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation!

Comment on lines 724 to 739
// We expect to record a `KnownArtifactKind::Host`
// but we also need to ensure we end up with a unique
// entry of (name, version, ArtifactKind::HOST_PHASE_1)
// for use in the database
let gimlet_unique_artifact_id = ArtifactId {
name: format!("{}-gimlet", artifact_id.name),
version: artifact_id.version.clone(),
kind: KnownArtifactKind::Host.into(),
};

let cosmo_unique_artifact_id = ArtifactId {
name: format!("{}-cosmo", artifact_id.name),
version: artifact_id.version.clone(),
kind: KnownArtifactKind::Host.into(),
};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For compatibility with reconfigurator, I think it would be nicer to have separate ArtifactKind for each of the sled types. If we differentiate via board, the code in the reconfigurator planner would be a bit awkward.

@labbott labbott force-pushed the labbott/cosmo_awareness branch from c8a2cd1 to bbcb178 Compare September 19, 2025 15:16
update_cx,
&mut host_registrar,
plan,
host_type.clone(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we reuse OxideSled that Dan added to sled-hardware, we can drop these .clone() too (since it's Copy).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out we can't drop all the clone because the are wrapped in a StepResult

(
ExtractedArtifactDataHandle,
ExtractedArtifactDataHandle,
ExtractedArtifactDataHandle,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we replace this tuple with a struct to give these names, and the same for a struct containing three &mut HashingNamedUtf8TempFiles for the args to F? (Maybe this also bleeds into the tufaceous API?)

Tuples where all the values are the same type make me nervous in terms of ordering - all the callers just have to get it right, and there isn't any compiler help since swapping any of them still typechecks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tweaked the tufaceous API to also handle this and I do think it's cleaner

let gimlet_phase_1_id = ArtifactId {
name: format!("{}-gimlet", artifact_id.name),
version: artifact_id.version.clone(),
kind: ArtifactKind::TRAMPOLINE_PHASE_1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have separate GIMLET and COSMO flavors of TRAMPOLINE_PHASE_1? I don't think we want the name to be the only way to distinguish them. (Maybe this doesn't matter for wicket, since it has access to the named fields here, and doesn't yet matter for Nexus, since it doesn't kick off mupdates. But that last bit will change eventually.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I can add that as well. I think the lack of Nexus work is why I didn't catch this sooner.

(
ExtractedArtifactDataHandle,
ExtractedArtifactDataHandle,
ExtractedArtifactDataHandle,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tweaked the tufaceous API to also handle this and I do think it's cleaner

update_cx,
&mut host_registrar,
plan,
host_type.clone(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out we can't drop all the clone because the are wrapped in a StepResult

Comment on lines +114 to +118
struct HostArtifactSource<'a> {
gimlet_host_phase_1: &'a mut HashingNamedUtf8TempFile,
cosmo_host_phase_1: &'a mut HashingNamedUtf8TempFile,
phase_2: &'a mut HashingNamedUtf8TempFile,
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turned out to be easier to have a separate struct vs just using the HostPhaseImageSource from tufaceous directly

Copy link
Contributor

@jgallagher jgallagher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@labbott labbott force-pushed the labbott/cosmo_awareness branch 2 times, most recently from 86d2762 to ce95d1d Compare September 20, 2025 00:04
if artifact.id.kind == ArtifactKind::HOST_PHASE_1 {
// TODO-correctness we only support gimlet at the moment, need
// to tell if this target is a gimlet or a comso
if artifact.id.kind == ArtifactKind::GIMLET_HOST_PHASE_1 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could also test for ArtifactKind::COSMO_HOST_PHASE_1 and push into phase_1_artifacts here? Perhaps they're not being generated yet, and perhaps there's a further reason I'm not aware of for not doing so?

I also vaguely wonder whether a match on artifact.id.kind might be more expressive than an if ... else if ... chain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be the next step of work. I purposely didn't want to plumb through the cosmo portions because I wasn't 100% sure how we wanted to differentiate between cosmo/gimlet at the reconfigurator level.

@labbott labbott force-pushed the labbott/cosmo_awareness branch 2 times, most recently from c0209ca to 066fb48 Compare September 20, 2025 23:23
This is the minimum set of changes to be able to update the next
generation sled Cosmo. The most intrusive change is choosing a
separate host phase 1 artifact.

This change does not attempt to differentiate between a Gimlet
and a Cosmo at the inventory level or ignition level which will
need to happen later via API changes.
@labbott labbott merged commit 9a946c0 into main Sep 21, 2025
18 checks passed
@labbott labbott deleted the labbott/cosmo_awareness branch September 21, 2025 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants