Skip to content

Conversation

labbott
Copy link
Contributor

@labbott labbott commented Apr 2, 2025

No description provided.

@labbott labbott marked this pull request as draft April 2, 2025 17:58
@labbott labbott force-pushed the measurement_corpus branch from a2d6300 to 996d4fc Compare April 30, 2025 19:21
Copy link
Contributor

@andrewjstone andrewjstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@labbott This looks like a totally reasonable way forward to me. I don't see a reason to change direction.

CreateMeasurementDir,

/// Writing a measurement corpus
MeasurementCorpus { name: String },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this structure precludes anything I'm about to say. I'm just writing a note as I think about this.

We've talked about stacking multiple versioned corpuses so each TUF repo doesn't need to include the measurements for all versions. This would allow us to check for version N and N-1 measurement values without having to put both of them in one TUF repo. However, for installinator, for a mupdate of a single sled on an existing rack, we'd probably want to be able to include the last few versions of measurements in a TUF repo or have some other way to stack when doing a fresh install. The problem with the former is that we won't always require strict release upgrading from version N-1 to N in the future. The problem with the latter is we now need to be able to source the different manifests from wicket.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Including previous versions of measurements in a TUF repo would get a little tricky and require some manual hand holding unless our automation gets a lot smarter. What I was planning to do is rotate the measurements from INSTALL to CLUSTER after an initial boot/attestation and have a set of all available measurements. I started trying to do that here but I struggled with finding a good spot to do the rotation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. As long as we can handle sleds running either old (for some specific set of old) or new software, this should work fine.

@labbott labbott force-pushed the measurement_corpus branch from e0062e4 to cb8de68 Compare May 14, 2025 17:24
@labbott labbott force-pushed the measurement_corpus branch from cb8de68 to 016311f Compare July 2, 2025 18:51
Comment on lines 32 to 33
current
.boot_disk_install_dataset()
.ok_or(MeasurementError::MissingBootDisk)?
.join("measurements"),
Copy link
Contributor Author

@labbott labbott Jul 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jgallagher is this the recommended way to get the install dataset? I think by the time we're calling these functions the boot disks should have come up. I also wasn't sure if we should just be doing something like

    let paths: Vec<_> = internal_disks_rx
        .current()
        .all_install_datasets()
        .map(|p| p.join("measurements"))
        .collect();

with a new API

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, sorry if this is pedantic but want to make sure we're on the same page. There are two internal (M.2) disks, but only one of them is the boot disk (whichever slot the SP told us to boot from). What you have here looks fine to me if you only want the install dataset from the single boot disk. If you want to check both internal disks, then yeah your proposed all_install_datasets() would be the way to go.

I think by the time we're calling these functions the boot disks should have come up.

IIUC, before any of the callers of this function get to run, we will have waited for the boot disk to show up:

// Wait for the boot disk so that we can work with any ledgers,
// such as those needed by the bootstore and sled-agent
info!(log, "Waiting for boot disk");
let internal_disks = config_reconciler.wait_for_boot_disk().await;
info!(log, "Found boot disk {:?}", internal_disks.boot_disk_id());

However, the API still represents this as optional because it's possible the OS could tell us the boot disk has gone away some time later. (Presumably this never happens? Or if it does it's followed swiftly by the sled dying a horrible death? But it's still representable in the API.)

@labbott labbott force-pushed the measurement_corpus branch from c95acd1 to 6480a41 Compare July 9, 2025 14:21
@labbott labbott marked this pull request as ready for review July 9, 2025 14:22
Copy link
Contributor

@andrewjstone andrewjstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@labbott This looks great. All my comments are about errors.

I opened this up locally and jumped around via RA to remind myself how the init worked and it all seems right to me.

#[error("Missing INSTALL dataset")]
MissingInstallSet,
#[error("io: {0}")]
Io(std::io::Error),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jgallagher just wrote this wonderful doc about error handling (after this code was written), and I think we should follow it now. I'm as guilty as anyone of doing the wrong thing here, but I'm going to do better and also start pointing this stuff out in code reviews.

As described in section 2 of that doc we should tag std::io::Error with #[source] and include context such as the file or directory being operated on in another field. We also shouldn't print the source directly in the display implementation.

fs::create_dir_all(&output_dir).await?;

if !std::fs::exists(&output_dir.join("measurement_corpus"))? {
fs::create_dir_all(&output_dir.join("measurement_corpus")).await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a .context() here for the directory that doesn't exist? Maybe do the join in a temporary and then add that into the context.

if let Some(corpus) = hash_manifest.corpus {
let hash = match corpus {
Source::File(file) => file.hash,
_ => anyhow::bail!("Unexpected file type"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth specifying what the mismatched type is?

output_dir.join("measurement_corpus").join(hash),
data,
)
.await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another spot where a .context may be useful.


for entry in std::fs::read_dir(
output_dir.join("hubris-staging").join("measurement_corpus"),
)? {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.context?

CreateMeasurementDir,

/// Writing a measurement corpus
MeasurementCorpus { name: String },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. As long as we can handle sleds running either old (for some specific set of old) or new software, this should work fine.

#[error("Failed to initialize lrtq node as learner: {0}")]
FailedLearnerInit(bootstore::NodeRequestError),

#[error("Measurment error: {0}")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the error guide, we shouldn't display the inner value here with {0} and should instead mark it as #[source]. We have the same problem above, which is my fault!

@labbott labbott force-pushed the measurement_corpus branch 3 times, most recently from 9748b25 to 6e0c442 Compare July 22, 2025 20:20
Uses a basic measurement set with sprockets. Future work will
include rotation of measurements.
@labbott labbott merged commit f897585 into main Jul 23, 2025
18 checks passed
@labbott labbott deleted the measurement_corpus branch July 23, 2025 11:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants