[sled-agent] Encrypt a specific set of U.2 datasets #4853


Merged (16 commits) on Jan 22, 2024

Conversation

@smklein (Collaborator) commented on Jan 19, 2024:

This PR does the following:

  • As a part of processing U.2s during initialization, ensure_zpool_datasets_are_encrypted is invoked. This identifies all datasets which should be encrypted (cockroachdb, clickhouse, internal_dns, external_dns, clickhouse_keeper) and performs a one-way migration from unencrypted to encrypted datasets (a rough sketch of this flow appears after this list).
  • Additionally, during zone launching, the sled agent verifies properties of datasets that it expects to be encrypted. This helps prevent these encrypted datasets from being used before their transfer has finished, and also prevents these zones from ever using unencrypted datasets in the future.
  • Furthermore, for all new deployments, this PR uses encryption on these datasets by default.
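
For orientation, here is a rough, self-contained sketch of the migration flow described in the first bullet. This is not the PR's implementation: the dataset names come from the description above, but the `crypt` parent acting as encryption root, the snapshot name, the send/recv plumbing, and the binary paths are all assumptions.

```rust
use std::process::Command;

/// Datasets that should be encrypted (names from the PR description).
const DATASETS_TO_ENCRYPT: &[&str] = &[
    "cockroachdb",
    "clickhouse",
    "internal_dns",
    "external_dns",
    "clickhouse_keeper",
];

/// Run a zfs(8) subcommand, returning stdout on success.
fn zfs(args: &[&str]) -> Result<String, String> {
    let out = Command::new("/usr/sbin/zfs")
        .args(args)
        .output()
        .map_err(|e| e.to_string())?;
    if !out.status.success() {
        return Err(String::from_utf8_lossy(&out.stderr).into_owned());
    }
    Ok(String::from_utf8_lossy(&out.stdout).into_owned())
}

/// One-way migration: replicate each unencrypted dataset under the
/// pool's encrypted `crypt` subtree, then destroy the original.
fn ensure_zpool_datasets_are_encrypted(pool: &str) -> Result<(), String> {
    for name in DATASETS_TO_ENCRYPT {
        let old = format!("{pool}/{name}");
        let new = format!("{pool}/crypt/{name}");

        // Skip datasets that don't exist at the old, unencrypted path.
        if zfs(&["list", "-H", old.as_str()]).is_err() {
            continue;
        }

        // Snapshot the unencrypted dataset so we send a consistent copy.
        let snap = format!("{old}@migration");
        zfs(&["snapshot", snap.as_str()])?;

        // Receive into the encrypted subtree; the new dataset inherits
        // encryption from the `crypt` encryption root.
        let pipeline =
            format!("/usr/sbin/zfs send {snap} | /usr/sbin/zfs recv {new}");
        let status = Command::new("/bin/sh")
            .args(["-c", pipeline.as_str()])
            .status()
            .map_err(|e| e.to_string())?;
        if !status.success() {
            return Err(format!("send/recv failed for {old}"));
        }

        // Only once the copy has landed do we destroy the unencrypted
        // original -- this is what makes the migration one-way.
        zfs(&["destroy", "-r", old.as_str()])?;
    }
    Ok(())
}

fn main() {
    if let Err(e) = ensure_zpool_datasets_are_encrypted("oxp_example") {
        eprintln!("migration failed: {e}");
    }
}
```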

@@ -2930,6 +2948,35 @@ impl ServiceManager {
// Currently, the zone filesystem should be destroyed between
// reboots, so it's fine to make this decision locally.
let root = if let Some(dataset) = zone.dataset_name() {
// Check that the dataset is actually ready to be used.
@smklein (Collaborator, Author) commented:

This is where, on the "zone launching pathway", we validate these properties before starting a zone with a dataset.

I believe this should prevent us from using a dataset before the sled-storage code has "finalized" it with the expected properties, and it should act as a general guard against using unencrypted datasets in the future.
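
For reference, the property check invoked in the snippet below could look roughly like this (a sketch; the real helper's signature and error type in the PR may differ):

```rust
/// Compare a dataset property's observed value against what the zone
/// requires; refuse to proceed on mismatch. The message mirrors the
/// error variant quoted later in this review.
fn check_property(
    prop_name: &str,
    prop_value: &str,
    prop_value_expected: &str,
) -> Result<(), String> {
    if prop_value == prop_value_expected {
        Ok(())
    } else {
        Err(format!(
            "saw {prop_name} = {prop_value}, expected {prop_value_expected}"
        ))
    }
}
```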

};
check_property("zoned", zoned, "on")?;
check_property("canmount", canmount, "on")?;
if dataset.dataset().dataset_should_be_encrypted() {
@smklein (Collaborator, Author) commented:

Need this check because Crucible's dataset is still unencrypted.

//
// Crucible already performs encryption internally, and we
// avoid double-encryption.
DatasetKind::Crucible => false,
@smklein (Collaborator, Author) commented:

If we remove this line, Crucible will also become encrypted, and Crucible datasets will be automatically migrated.

I think that's the only change that would be necessary to make that move.
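
Given the single line shown in the diff, the predicate presumably has roughly this shape (a sketch; the variant list is assumed from the dataset names in the PR description):

```rust
/// Sketch of the dataset kinds involved.
pub enum DatasetKind {
    Cockroach,
    Clickhouse,
    ClickhouseKeeper,
    InternalDns,
    ExternalDns,
    Crucible,
}

impl DatasetKind {
    /// Everything is encrypted at the ZFS layer except Crucible.
    pub fn dataset_should_be_encrypted(&self) -> bool {
        match self {
            // Crucible already performs encryption internally, and we
            // avoid double-encryption.
            DatasetKind::Crucible => false,
            _ => true,
        }
    }
}
```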

///
/// If this dataset should be encrypted, this automatically adds the
/// "crypt" dataset component.
pub fn full_name(&self) -> String {
@smklein (Collaborator, Author) commented:

If you're looking for "how are we changing the provisioning of new datasets to become encrypted?", it's here.

There's some code in sled-storage/src/manager.rs that invokes this function to find the full path of the dataset -- since we return encrypted names for "should-be-encrypted" datasets, that's where they get placed.
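
Building on the DatasetKind sketch earlier in this thread, full_name plausibly looks something like this (the field names and the name() helper are assumptions; only the "crypt" component is confirmed by the doc comment above):

```rust
pub struct DatasetName {
    pool_name: String, // e.g. "oxp_<uuid>"
    kind: DatasetKind,
}

impl DatasetKind {
    /// Hypothetical helper mapping kinds to on-disk dataset names.
    fn name(&self) -> &'static str {
        match self {
            DatasetKind::Cockroach => "cockroachdb",
            DatasetKind::Clickhouse => "clickhouse",
            DatasetKind::ClickhouseKeeper => "clickhouse_keeper",
            DatasetKind::InternalDns => "internal_dns",
            DatasetKind::ExternalDns => "external_dns",
            DatasetKind::Crucible => "crucible",
        }
    }
}

impl DatasetName {
    /// Should-be-encrypted datasets gain the "crypt" component, so any
    /// caller building paths through here lands in the encrypted
    /// subtree automatically.
    pub fn full_name(&self) -> String {
        if self.kind.dataset_should_be_encrypted() {
            format!("{}/crypt/{}", self.pool_name, self.kind.name())
        } else {
            format!("{}/{}", self.pool_name, self.kind.name())
        }
    }
}
```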

Comment on lines +479 to +485
// TODO: Could do this in parallel?
for dataset in unencrypted_datasets {
let log = &log.new(slog::o!("dataset" => dataset.clone()));
info!(log, "Found unencrypted dataset");

ensure_zpool_dataset_is_encrypted(&log, &zpool_name, &dataset).await?;
}
@smklein (Collaborator, Author) commented:

I'm not sure if we actually want to parallelize this, but it would be easy to do.

Seems like we should make disk addition parallel at least, so we can fire up all U.2s at once -- but we might get less benefit from putting extra workload on a U.2 that's already performing a migration. That's all speculative, though.
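
If parallelism were ever wanted here, the serial loop could be flattened with bounded concurrency; a sketch using the futures crate (the concurrency limit and the stand-in function body are illustrative, not from the PR):

```rust
use futures::stream::{self, StreamExt, TryStreamExt};

/// Stand-in matching the call in the loop above; the real function
/// performs the actual migration.
async fn ensure_zpool_dataset_is_encrypted(
    _zpool_name: &str,
    _dataset: &str,
) -> Result<(), String> {
    Ok(())
}

/// Migrate up to four datasets concurrently instead of one at a time.
async fn encrypt_all_concurrently(
    zpool_name: &str,
    unencrypted_datasets: Vec<String>,
) -> Result<(), String> {
    stream::iter(unencrypted_datasets)
        .map(Ok::<_, String>)
        .try_for_each_concurrent(4, |dataset| async move {
            ensure_zpool_dataset_is_encrypted(zpool_name, &dataset).await
        })
        .await
}
```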

@andrewjstone (Contributor) replied:

I'm going to start adding some concurrency support later today, or Monday at the latest. Got a touch delayed.

@smklein (Collaborator, Author) replied:

All good - I think it would be nice to have at any time, but I don't think it needs to be a blocker here.

@andrewjstone (Contributor) replied:

To be clear: I meant concurrency support for Disk::new, not ensure_zpool_dataset_is_encrypted. I agree with your speculation here, and if actual migrations are too slow we can revisit. I'm hesitant to complicate this unnecessarily.

@andrewjstone (Contributor) left a review:

This looks fantastic @smklein!

source: illumos_utils::zfs::GetValueError,
},

#[error("Cannot launch {zone} with {dataset} (saw {prop_name} = {prop_value}, expected {prop_value_expected})")]
@andrewjstone (Contributor) commented:

Love the detail in this error
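
The attribute shown is thiserror's #[error] derive syntax; reconstructed in context, the variant might look like this (the variant and field names are assumptions; only the message format is from the diff):

```rust
use thiserror::Error;

#[derive(Debug, Error)]
pub enum Error {
    // ... other variants elided ...

    /// Every input to the decision is captured as a field, so the
    /// rendered message names the zone, the dataset, and exactly which
    /// property mismatched.
    #[error("Cannot launch {zone} with {dataset} (saw {prop_name} = {prop_value}, expected {prop_value_expected})")]
    DatasetNotReady {
        zone: String,
        dataset: String,
        prop_name: String,
        prop_value: String,
        prop_value_expected: String,
    },
}
```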
