Support measurement_corpus #7906

labbott · 2025-04-02T17:58:41Z

No description provided.

andrewjstone

@labbott This looks like a totally reasonable way forward to me. I don't see a reason to change direction.

andrewjstone · 2025-05-02T00:25:14Z

installinator-common/src/progress.rs

+    CreateMeasurementDir,
+
+    /// Writing a measurement corpus
+    MeasurementCorpus { name: String },


I don't think this structure precludes anything I'm about to say. I'm just writing a note as I think about this.

We've talked about stacking multiple versioned corpuses so each TUF repo doesn't need to include the measurements for all versions. This would allow us to check for version N and N-1 measurement values without having to put both of them in one TUF repo. However, for installinator, for a mupdate of a single sled on an existing rack, we'd probably want to be able to include the last few versions of measurements in a TUF repo or have some other way to stack when doing a fresh install. The problem with the former is that we won't always require strict release upgrading from version N-1 to N in the future. The problem with the latter is we now need to be able to source the different manifests from wicket.

Including previous versions of measurements in a TUF repo would get a little tricky and require some manual hand holding unless our automation gets a lot smarter. What I was planning to do is rotate the measurements from INSTALL to CLUSTER after an initial boot/attestation and have a set of all available measurements. I started trying to do that here but I struggled with finding a good spot to do the rotation.

Makes sense. As long as we can handle sleds running either old (for some specific set of old) or new software, this should work fine.

installinator/src/write.rs

dev-tools/releng/src/tuf.rs

labbott · 2025-07-09T13:14:41Z

sled-agent/src/bootstrap/measurements.rs

+        current
+            .boot_disk_install_dataset()
+            .ok_or(MeasurementError::MissingBootDisk)?
+            .join("measurements"),


@jgallagher is this the recommended way to get the install dataset? I think by the time we're calling these functions the boot disks should have come up. I also wasn't sure if we should just be doing something like

let paths: Vec<_> = internal_disks_rx .current() .all_install_datasets() .map(|p| p.join("measurements")) .collect();

with a new API

Hm, sorry if this is pedantic but want to make sure we're on the same page. There are two internal (M.2) disks, but only one of them is the boot disk (whichever slot the SP told us to boot from). What you have here looks fine to me if you only want the install dataset from the single boot disk. If you want to check both internal disks, then yeah your proposed all_install_datasets() would be the way to go.

I think by the time we're calling these functions the boot disks should have come up.

IIUC, before any of the callers of this function get to run, we will have waited for the boot disk to show up:

omicron/sled-agent/src/long_running_tasks.rs

Lines 105 to 109 in 24d6575

// Wait for the boot disk so that we can work with any ledgers,

// such as those needed by the bootstore and sled-agent

info!(log, "Waiting for boot disk");

let internal_disks = config_reconciler.wait_for_boot_disk().await;

info!(log, "Found boot disk {:?}", internal_disks.boot_disk_id());

However, the API still represents this as optional because it's possible the OS could tell us the boot disk has gone away some time later. (Presumably this never happens? Or if it does it's followed swiftly by the sled dying a horrible death? But it's still representable in the API.)

andrewjstone

@labbott This looks great. All my comments are about errors.

I opened this up locally and jumped around via RA to remind myself how the init worked and it all seems right to me.

tools/permslip_production

andrewjstone · 2025-07-10T21:36:15Z

sled-agent/src/bootstrap/measurements.rs

+    #[error("Missing INSTALL dataset")]
+    MissingInstallSet,
+    #[error("io: {0}")]
+    Io(std::io::Error),


@jgallagher just wrote this wonderful doc about error handling (after this code was written), and I think we should follow it now. I'm as guilty as anyone of doing the wrong thing here, but I'm going to do better and also start pointing this stuff out in code reviews.

As described in section 2 of that doc we should tag std::io::Error with #[source] and include context such as the file or directory being operated on in another field. We also shouldn't print the source directly in the display implementation.

andrewjstone · 2025-07-10T21:37:52Z

dev-tools/releng/src/hubris.rs

    fs::create_dir_all(&output_dir).await?;

+    if !std::fs::exists(&output_dir.join("measurement_corpus"))? {
+        fs::create_dir_all(&output_dir.join("measurement_corpus")).await?;


can we add a .context() here for the directory that doesn't exist? Maybe do the join in a temporary and then add that into the context.

andrewjstone · 2025-07-10T21:43:56Z

dev-tools/releng/src/hubris.rs

+            if let Some(corpus) = hash_manifest.corpus {
+                let hash = match corpus {
+                    Source::File(file) => file.hash,
+                    _ => anyhow::bail!("Unexpected file type"),


Worth specifying what the mismatched type is?

andrewjstone · 2025-07-10T21:44:40Z

dev-tools/releng/src/hubris.rs

+                    output_dir.join("measurement_corpus").join(hash),
+                    data,
+                )
+                .await?;


Another spot where a .context may be useful.

andrewjstone · 2025-07-10T21:46:09Z

dev-tools/releng/src/tuf.rs

+
+    for entry in std::fs::read_dir(
+        output_dir.join("hubris-staging").join("measurement_corpus"),
+    )? {


.context?

andrewjstone · 2025-07-10T21:49:47Z

installinator-common/src/progress.rs

+    CreateMeasurementDir,
+
+    /// Writing a measurement corpus
+    MeasurementCorpus { name: String },


Makes sense. As long as we can handle sleds running either old (for some specific set of old) or new software, this should work fine.

andrewjstone · 2025-07-10T22:16:52Z

sled-agent/src/bootstrap/server.rs

    #[error("Failed to initialize lrtq node as learner: {0}")]
    FailedLearnerInit(bootstore::NodeRequestError),
+
+    #[error("Measurment error: {0}")]


Following the error guide, we shouldn't display the inner value here with {0} and should instead mark it as #[source]. We have the same problem above, which is my fault!

sled-agent/src/bootstrap/measurements.rs

Uses a basic measurement set with sprockets. Future work will include rotation of measurements.

labbott marked this pull request as draft April 2, 2025 17:58

labbott force-pushed the measurement_corpus branch from a2d6300 to 996d4fc Compare April 30, 2025 19:21

andrewjstone reviewed May 2, 2025

View reviewed changes

labbott force-pushed the measurement_corpus branch from e0062e4 to cb8de68 Compare May 14, 2025 17:24

labbott force-pushed the measurement_corpus branch from cb8de68 to 016311f Compare July 2, 2025 18:51

labbott commented Jul 2, 2025

View reviewed changes

dev-tools/releng/src/tuf.rs Show resolved Hide resolved

labbott commented Jul 9, 2025

View reviewed changes

labbott force-pushed the measurement_corpus branch from c95acd1 to 6480a41 Compare July 9, 2025 14:21

labbott marked this pull request as ready for review July 9, 2025 14:22

andrewjstone approved these changes Jul 10, 2025

View reviewed changes

labbott force-pushed the measurement_corpus branch 3 times, most recently from 9748b25 to 6e0c442 Compare July 22, 2025 20:20

Support measurement checking via sprockets

6e0c442

Uses a basic measurement set with sprockets. Future work will include rotation of measurements.

labbott merged commit f897585 into main Jul 23, 2025
18 checks passed

labbott deleted the measurement_corpus branch July 23, 2025 11:05

jgallagher mentioned this pull request Jul 23, 2025

Uploading a TUF repo fails while unpacking the composite control plane tarball #8675

Closed

andrewjstone mentioned this pull request Oct 16, 2025

Update sprockets to include latest changes #9232

Merged

	// Wait for the boot disk so that we can work with any ledgers,
	// such as those needed by the bootstore and sled-agent
	info!(log, "Waiting for boot disk");
	let internal_disks = config_reconciler.wait_for_boot_disk().await;
	info!(log, "Found boot disk {:?}", internal_disks.boot_disk_id());

Support measurement_corpus #7906

Support measurement_corpus #7906

Uh oh!

Conversation

labbott commented Apr 2, 2025

Uh oh!

andrewjstone left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

labbott Jul 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

andrewjstone left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

labbott Jul 9, 2025 •

edited

Loading