Support measurement_corpus #7906
Force-pushed from a2d6300 to 996d4fc.
@labbott This looks like a totally reasonable way forward to me. I don't see a reason to change direction.
CreateMeasurementDir,

/// Writing a measurement corpus
MeasurementCorpus { name: String },
I don't think this structure precludes anything I'm about to say. I'm just writing a note as I think about this.
We've talked about stacking multiple versioned corpuses so each TUF repo doesn't need to include the measurements for all versions. This would allow us to check for version N and N-1 measurement values without having to put both of them in one TUF repo. However, for installinator, for a mupdate of a single sled on an existing rack, we'd probably want to be able to include the last few versions of measurements in a TUF repo or have some other way to stack when doing a fresh install. The problem with the former is that we won't always require strict release upgrading from version N-1 to N in the future. The problem with the latter is we now need to be able to source the different manifests from wicket.
Including previous versions of measurements in a TUF repo would get a little tricky and require some manual hand holding unless our automation gets a lot smarter. What I was planning to do is rotate the measurements from INSTALL to CLUSTER after an initial boot/attestation and have a set of all available measurements. I started trying to do that here but I struggled with finding a good spot to do the rotation.
Makes sense. As long as we can handle sleds running either old (for some specific set of old) or new software, this should work fine.
Force-pushed from e0062e4 to cb8de68.
Force-pushed from cb8de68 to 016311f.
current
    .boot_disk_install_dataset()
    .ok_or(MeasurementError::MissingBootDisk)?
    .join("measurements"),
@jgallagher is this the recommended way to get the install dataset? I think by the time we're calling these functions the boot disks should have come up. I also wasn't sure if we should just be doing something like
let paths: Vec<_> = internal_disks_rx
.current()
.all_install_datasets()
.map(|p| p.join("measurements"))
.collect();
with a new API
Hm, sorry if this is pedantic but want to make sure we're on the same page. There are two internal (M.2) disks, but only one of them is the boot disk (whichever slot the SP told us to boot from). What you have here looks fine to me if you only want the install dataset from the single boot disk. If you want to check both internal disks, then yeah your proposed all_install_datasets() would be the way to go.
> I think by the time we're calling these functions the boot disks should have come up.
IIUC, before any of the callers of this function get to run, we will have waited for the boot disk to show up:
omicron/sled-agent/src/long_running_tasks.rs
Lines 105 to 109 in 24d6575
// Wait for the boot disk so that we can work with any ledgers,
// such as those needed by the bootstore and sled-agent
info!(log, "Waiting for boot disk");
let internal_disks = config_reconciler.wait_for_boot_disk().await;
info!(log, "Found boot disk {:?}", internal_disks.boot_disk_id());
However, the API still represents this as optional because it's possible the OS could tell us the boot disk has gone away some time later. (Presumably this never happens? Or if it does it's followed swiftly by the sled dying a horrible death? But it's still representable in the API.)
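The difference between the single boot disk and both internal disks can be sketched with simplified stand-in types. `InternalDisks` below is a hypothetical struct, not the real omicron type. It uses std `PathBuf` for brevity, and `all_install_datasets()` is only the API proposed in this thread, so its real shape may differ.

```rust
use std::path::PathBuf;

#[derive(Debug, PartialEq)]
enum MeasurementError {
    MissingBootDisk,
}

// Hypothetical stand-in for the internal-disks snapshot discussed above.
struct InternalDisks {
    // `None` models the OS reporting that the boot disk has gone away.
    boot_disk_install: Option<PathBuf>,
    // Install datasets on every internal (M.2) disk.
    all_install: Vec<PathBuf>,
}

impl InternalDisks {
    fn boot_disk_install_dataset(&self) -> Option<PathBuf> {
        self.boot_disk_install.clone()
    }

    // Assumed shape of the proposed API; not confirmed by the source.
    fn all_install_datasets<'a>(&'a self) -> impl Iterator<Item = PathBuf> + 'a {
        self.all_install.iter().cloned()
    }
}

// Measurements from the single boot disk only; fails if it is absent.
fn boot_measurement_dir(disks: &InternalDisks) -> Result<PathBuf, MeasurementError> {
    Ok(disks
        .boot_disk_install_dataset()
        .ok_or(MeasurementError::MissingBootDisk)?
        .join("measurements"))
}

// Measurements from every internal disk; an empty list is not an error here.
fn all_measurement_dirs(disks: &InternalDisks) -> Vec<PathBuf> {
    disks.all_install_datasets().map(|p| p.join("measurements")).collect()
}

fn main() {
    let disks = InternalDisks {
        boot_disk_install: Some(PathBuf::from("/int/a/install")),
        all_install: vec![
            PathBuf::from("/int/a/install"),
            PathBuf::from("/int/b/install"),
        ],
    };
    println!("{:?}", boot_measurement_dir(&disks));
    println!("{:?}", all_measurement_dirs(&disks));
}
```

The `Option` on the boot disk is what forces the `ok_or(...)` in the reviewed code, even though callers only run after the boot disk has been found.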
Force-pushed from c95acd1 to 6480a41.
@labbott This looks great. All my comments are about errors.
I opened this up locally and jumped around via RA to remind myself how the init worked and it all seems right to me.
#[error("Missing INSTALL dataset")]
MissingInstallSet,
#[error("io: {0}")]
Io(std::io::Error),
@jgallagher just wrote this wonderful doc about error handling (after this code was written), and I think we should follow it now. I'm as guilty as anyone of doing the wrong thing here, but I'm going to do better and also start pointing this stuff out in code reviews.
As described in section 2 of that doc, we should tag std::io::Error with #[source] and include context such as the file or directory being operated on in another field. We also shouldn't print the source directly in the display implementation.
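A minimal sketch of what that suggestion could look like, written against the standard library only so it stands alone. The hand-written `source()` plays the role that `#[source]` plays with thiserror. The `ReadDir` variant name and its `path` field are illustrative assumptions, not the names the PR ended up using.

```rust
use std::error::Error;
use std::fmt;
use std::io;
use std::path::PathBuf;

#[derive(Debug)]
enum MeasurementError {
    MissingInstallSet,
    // Hypothetical variant: the path gives context, the io::Error is the source.
    ReadDir { path: PathBuf, source: io::Error },
}

impl fmt::Display for MeasurementError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            MeasurementError::MissingInstallSet => {
                write!(f, "missing INSTALL dataset")
            }
            // The source is deliberately not printed here; only the context is.
            MeasurementError::ReadDir { path, .. } => {
                write!(f, "failed to read measurement directory {}", path.display())
            }
        }
    }
}

impl Error for MeasurementError {
    // Equivalent of marking the field with #[source] in thiserror.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            MeasurementError::MissingInstallSet => None,
            MeasurementError::ReadDir { source, .. } => Some(source),
        }
    }
}

fn main() {
    let err = MeasurementError::ReadDir {
        path: PathBuf::from("/install/measurements"),
        source: io::Error::new(io::ErrorKind::NotFound, "no such directory"),
    };
    println!("{}", err);
    println!("caused by: {}", err.source().unwrap());
}
```

The underlying I/O error stays reachable through `source()`, so whoever reports the error decides how to render the chain.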
dev-tools/releng/src/hubris.rs
Outdated
fs::create_dir_all(&output_dir).await?;

if !std::fs::exists(&output_dir.join("measurement_corpus"))? {
    fs::create_dir_all(&output_dir.join("measurement_corpus")).await?;
Can we add a .context() here for the directory that doesn't exist? Maybe do the join in a temporary and then add that into the context.
dev-tools/releng/src/hubris.rs
Outdated
if let Some(corpus) = hash_manifest.corpus {
    let hash = match corpus {
        Source::File(file) => file.hash,
        _ => anyhow::bail!("Unexpected file type"),
Worth specifying what the mismatched type is?
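One way to do that, sketched with stand-in types: bind the unexpected value instead of discarding it with `_`, and include it in the message. The `Source` enum below is a hypothetical simplification (the `Other` variant is invented for illustration), and a `String` error replaces `anyhow::bail!` so the sketch needs no external crates.

```rust
// Hypothetical simplification of the manifest types in the reviewed code.
#[derive(Debug)]
struct FileSource {
    hash: String,
}

#[derive(Debug)]
enum Source {
    File(FileSource),
    // Placeholder for any non-file variant; the real enum may differ.
    Other(String),
}

// Returns the corpus hash, or an error that names what was found instead.
fn corpus_hash(corpus: Source) -> Result<String, String> {
    match corpus {
        Source::File(file) => Ok(file.hash),
        other => Err(format!(
            "expected a file source for the measurement corpus, found {:?}",
            other
        )),
    }
}

fn main() {
    println!("{:?}", corpus_hash(Source::File(FileSource { hash: "abc123".to_string() })));
    println!("{:?}", corpus_hash(Source::Other("prebuilt".to_string())));
}
```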
dev-tools/releng/src/hubris.rs
Outdated
    output_dir.join("measurement_corpus").join(hash),
    data,
)
.await?;
Another spot where a .context may be useful.
dev-tools/releng/src/tuf.rs
Outdated

for entry in std::fs::read_dir(
    output_dir.join("hubris-staging").join("measurement_corpus"),
)? {
.context?
sled-agent/src/bootstrap/server.rs
Outdated
#[error("Failed to initialize lrtq node as learner: {0}")]
FailedLearnerInit(bootstore::NodeRequestError),

#[error("Measurment error: {0}")]
Following the error guide, we shouldn't display the inner value here with {0} and should instead mark it as #[source]. We have the same problem above, which is my fault!
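One motivation commonly given for this rule (an assumption here, since the guide itself is not quoted in this thread) is that a reporter which walks the `source()` chain prints the inner message twice when the outer `Display` already embeds it. The sketch below uses std only. `Inner`, `Inline`, `Chained`, and `report` are all hypothetical names for illustration.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
struct Inner;

impl fmt::Display for Inner {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "missing INSTALL dataset")
    }
}

impl Error for Inner {}

// Embeds the inner error in Display *and* exposes it as the source.
#[derive(Debug)]
struct Inline(Inner);

impl fmt::Display for Inline {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "measurement error: {}", self.0)
    }
}

impl Error for Inline {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.0)
    }
}

// Exposes the inner error only as the source, as the review suggests.
#[derive(Debug)]
struct Chained(Inner);

impl fmt::Display for Chained {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "measurement error")
    }
}

impl Error for Chained {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.0)
    }
}

// Joins an error and all of its sources with ": ".
fn report(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut cur = err.source();
    while let Some(e) = cur {
        out.push_str(": ");
        out.push_str(&e.to_string());
        cur = e.source();
    }
    out
}

fn main() {
    // prints "measurement error: missing INSTALL dataset: missing INSTALL dataset"
    println!("{}", report(&Inline(Inner)));
    // prints "measurement error: missing INSTALL dataset"
    println!("{}", report(&Chained(Inner)));
}
```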
Force-pushed from 9748b25 to 6e0c442.
Uses a basic measurement set with sprockets. Future work will include rotation of measurements.