Add initial trust quorum support #487

andrewjstone · 2021-12-06T22:17:30Z

In order to allow for encrypted storage on individual sleds without requiring a
user to type a password at boot, we use secret sharing across sleds: a threshold
number of sleds must communicate in order to reconstruct a rack secret. This
rack secret can then be used to derive local encryption keys on individual
sleds. As a result, an attacker who steals a subset of sleds (fewer than the
threshold) or individual storage devices cannot obtain any data. In fact, the
control plane software does not even boot until the rack secret is
reconstructed and the protected storage is unlocked.
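
To make the threshold property concrete, here is a toy Shamir secret sharing
sketch over the prime field GF(2^61 - 1). It is not the secret sharing code from
#429, which may use a different scheme, field, or library. The functions split
and reconstruct are hypothetical, and the caller-supplied coefficients stand in
for values that real code must draw from a CSPRNG. Any coeffs.len() + 1 shares
recover the secret by Lagrange interpolation at x = 0.

```rust
// Toy Shamir secret sharing over GF(p) with p = 2^61 - 1 (a Mersenne prime).
// Illustrative only: NOT the scheme from #429 and NOT suitable for real keys.
const P: u64 = (1u64 << 61) - 1;

fn add(a: u64, b: u64) -> u64 {
    ((a as u128 + b as u128) % P as u128) as u64
}

fn sub(a: u64, b: u64) -> u64 {
    // Assumes b < P; adding P first keeps the intermediate non-negative.
    ((a as u128 + P as u128 - b as u128) % P as u128) as u64
}

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Square-and-multiply exponentiation mod P.
fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Multiplicative inverse via Fermat's little theorem (requires a != 0).
fn inv(a: u64) -> u64 {
    pow(a, P - 2)
}

/// Hypothetical helper: evaluates f(x) = secret + c1*x + c2*x^2 + ... at
/// x = 1..=n. Any `coeffs.len() + 1` of the returned shares recover `secret`.
/// Real code must draw `coeffs` from a CSPRNG.
fn split(secret: u64, coeffs: &[u64], n: u64) -> Vec<(u64, u64)> {
    (1..=n)
        .map(|x| {
            // Horner's method for c1 + c2*x + ..., then multiply by x and add secret.
            let y = coeffs.iter().rev().fold(0, |acc, &c| add(mul(acc, x), c));
            (x, add(mul(y, x), secret))
        })
        .collect()
}

/// Hypothetical helper: Lagrange interpolation at x = 0, i.e.
/// secret = sum_j y_j * prod_{m != j} x_m / (x_m - x_j).
fn reconstruct(shares: &[(u64, u64)]) -> u64 {
    let mut secret = 0;
    for (j, &(xj, yj)) in shares.iter().enumerate() {
        let mut num = 1;
        let mut den = 1;
        for (m, &(xm, _)) in shares.iter().enumerate() {
            if m != j {
                num = mul(num, xm);
                den = mul(den, sub(xm, xj));
            }
        }
        secret = add(secret, mul(yj, mul(num, inv(den))));
    }
    secret
}
```

With two random coefficients (a degree-2 polynomial), any 3 of 5 shares recover
the secret, while interpolating only 2 shares yields a line whose intercept is
unrelated to it. This is why stealing fewer sleds than the threshold reveals
nothing.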

There are quite a few moving parts required to implement a trust quorum, some of
which involve the service processor and hardware root of trust. This commit
only implements the part of the trust quorum responsible for retrieving
existing key shares over an unfinished SPDM channel. It runs entirely on the
host machine as part of the sled-agent. The code builds upon the multicast
discovery code in #404, the SPDM negotiation code in #407, and the secret
sharing code in #429.

In the "normal" lifetime of an Oxide rack, a rack secret will be generated when
the customer initializes the new rack. The shares will then be distributed over
SPDM channels to individual sleds, so that they can be retrieved and combined
later when an individual sled or the entire rack reboots. The initial
generation and distribution of shares is not part of this commit. Instead, we
fake rack initialization through the completely insecure use of a configuration
file, provided as part of the omicron-package install, that contains all key
shares. The configuration file disables the trust quorum by default, so that
the sled-agent continues to run on a single node. When the trust quorum is
enabled, the sled-agent begins attempting to retrieve shares; once a quorum of
shares has been received, the rack secret is reconstructed and the rest of the
control plane begins to boot.

For this to work, the user must also edit the config file on each sled so that
every sled has a different sled_index (which points to a unique share), and
then restart the sled-agent with svcadm restart sled-agent. The included config
file only contains shares for 2 sleds, but a new one can be generated with the
provided gen_trust_quorum_config program. Lastly, the location of the config
file is specified in the sled-agent SMF file and passed through as
rack_secret_dir in the BootstrapConfig struct.
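
The configuration and quorum logic described above can be sketched as follows.
Only sled_index comes from the PR text; the struct names, the other fields, and
the choice to count a sled's own share toward the quorum are assumptions made
for illustration, not the actual sled-agent types.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-sled trust quorum config (field names other than
/// `sled_index` are assumptions).
struct TrustQuorumConfig {
    /// Disabled by default so a lone sled-agent still boots.
    enabled: bool,
    /// Which entry of `shares` belongs to this sled; must differ per sled.
    sled_index: usize,
    /// Number of distinct shares needed to reconstruct the rack secret.
    threshold: usize,
    /// Every key share: the insecure part, since each sled sees them all.
    shares: Vec<Vec<u8>>,
}

#[derive(Debug, PartialEq)]
enum ConfigError {
    SledIndexOutOfRange,
    ThresholdUnreachable,
}

impl TrustQuorumConfig {
    /// Rejects configs that could never unlock the rack.
    fn validate(&self) -> Result<(), ConfigError> {
        if self.sled_index >= self.shares.len() {
            return Err(ConfigError::SledIndexOutOfRange);
        }
        if self.threshold == 0 || self.threshold > self.shares.len() {
            return Err(ConfigError::ThresholdUnreachable);
        }
        Ok(())
    }

    /// This sled's own share (assumed here to count toward the quorum).
    fn own_share(&self) -> &[u8] {
        &self.shares[self.sled_index]
    }
}

/// Accumulates shares retrieved from peers, keyed by sled index so that a
/// repeated response from the same sled is not counted twice.
struct ShareCollector {
    threshold: usize,
    shares: BTreeMap<usize, Vec<u8>>,
}

impl ShareCollector {
    fn new(threshold: usize) -> Self {
        ShareCollector { threshold, shares: BTreeMap::new() }
    }

    /// Records a share and reports whether a quorum has now been reached.
    fn add(&mut self, sled_index: usize, share: Vec<u8>) -> bool {
        self.shares.entry(sled_index).or_insert(share);
        self.has_quorum()
    }

    fn has_quorum(&self) -> bool {
        self.shares.len() >= self.threshold
    }
}
```

Keying the collector by sled index is what makes a unique sled_index per sled
matter: two sleds configured with the same index would be counted as one.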

The SPDM protocol runs over a transport that frames each message with a 2-byte
size header on top of a TCP stream. We generate a client and a server to
initialize this transport, perform SPDM negotiation, and then begin share
retrieval. As noted in #407, only the negotiation phase of the SPDM protocol is
currently implemented, so we simply return the TCP-based transport when
negotiation completes and, for now, pretend that we are operating over a secure
channel. This allows us to test the end-to-end behavior before a
production-ready SPDM implementation is integrated.
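
The 2-byte size header framing can be sketched as below. The PR text gives only
the header size, so the big-endian byte order and the names write_frame and
read_frame are assumptions. A u16 header caps each frame at 65,535 bytes, so
oversize payloads are rejected on send rather than silently truncated.

```rust
use std::io::{self, Read, Write};

/// Largest payload a 2-byte length header can describe.
const MAX_FRAME: usize = u16::MAX as usize;

/// Writes a big-endian u16 length header followed by the payload.
fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_FRAME {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    w.write_all(&(payload.len() as u16).to_be_bytes())?;
    w.write_all(payload)?;
    w.flush()
}

/// Reads one frame: the 2-byte header, then exactly that many payload bytes.
fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut header = [0u8; 2];
    r.read_exact(&mut header)?;
    let len = u16::from_be_bytes(header) as usize;
    let mut buf = vec![0u8; len];
    r.read_exact(&mut buf)?;
    Ok(buf)
}
```

Because the functions are generic over Read and Write, the same code works on a
TcpStream in production and on an in-memory buffer in tests.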

This commit also makes a small change to the SPDM transport: send and recv
operations now have timeouts, and a logger no longer needs to be passed to each
call of recv.
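
A minimal sketch of that API shape, assuming blocking std::net::TcpStream
sockets: the timeout is configured once when the transport is built, and recv
takes no logger argument. The sled-agent transport is likely async and may
implement timeouts differently (for example with tokio::time::timeout); the
Transport type and its methods here are hypothetical.

```rust
use std::io::{self, Read, Write};
use std::net::TcpStream;
use std::time::Duration;

/// Hypothetical transport: timeouts are set once at construction, and any
/// logger would live in the struct rather than being passed to every `recv`.
struct Transport {
    stream: TcpStream,
}

impl Transport {
    fn new(stream: TcpStream, timeout: Duration) -> io::Result<Self> {
        // A zero Duration is rejected by std, so `timeout` must be non-zero.
        stream.set_read_timeout(Some(timeout))?;
        stream.set_write_timeout(Some(timeout))?;
        Ok(Transport { stream })
    }

    /// Sends one frame with a big-endian u16 length header (an assumption).
    fn send(&mut self, payload: &[u8]) -> io::Result<()> {
        if payload.len() > u16::MAX as usize {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
        }
        self.stream.write_all(&(payload.len() as u16).to_be_bytes())?;
        self.stream.write_all(payload)
    }

    /// Receives one frame, failing with WouldBlock (Unix) or TimedOut
    /// (Windows) if the peer is silent for longer than the timeout. A timeout
    /// mid-frame desynchronizes the stream, so callers should drop the
    /// connection on any error.
    fn recv(&mut self) -> io::Result<Vec<u8>> {
        let mut header = [0u8; 2];
        self.stream.read_exact(&mut header)?;
        let mut buf = vec![0u8; u16::from_be_bytes(header) as usize];
        self.stream.read_exact(&mut buf)?;
        Ok(buf)
    }
}
```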

smklein

Good start - awesome to see this all coming together.

Some questions below about how we're doing things in the short term vs. long term.

Replace this with a `ShareDistribution` type that will be generated by a deployment system coming in a follow-up commit. In the newer code, only the individual share for a given sled is handed out, so the sled-agent can't cheat and unlock itself by reading a file. Also clean up all the other stuff related to Sean's review.

Co-authored-by: Sean Klein <sean@oxide.computer>

smklein

Looks good, thanks for being so thorough with this PR!

(Most of my comments are nits - the PR LGTM!)

andrewjstone requested a review from smklein December 6, 2021 22:17

Add missing MPL headers

974a9ec

smklein reviewed Dec 7, 2021

This was referenced Dec 14, 2021

Remove key-share file reading when trust-quorum keys are dynamically generated #513

Closed

Rack Unlock: Parallelize share retrieval and only retrieve shares not already retrieved #514

Closed

andrewjstone and others added 2 commits December 14, 2021 18:17

Update sled-agent/src/bootstrap/trust_quorum/client.rs

eeac0ac

Co-authored-by: Sean Klein <sean@oxide.computer>

andrewjstone force-pushed the trust-quorum branch from 5207c23 to eeac0ac December 14, 2021 23:22

andrewjstone added 3 commits December 14, 2021 18:28

remove generated trust quorum secrets

37bba0f

fix clippy

fc0118b

Merge branch 'main' into trust-quorum

1ea7e71

smklein approved these changes Dec 17, 2021

andrewjstone added 2 commits December 17, 2021 19:59

fixes for review feedback

ae9d666

more fixes for review feedback

e1922a8

andrewjstone merged commit 953b627 into main Dec 18, 2021

andrewjstone deleted the trust-quorum branch December 18, 2021 20:38