keymanager/src/runtime: Support master secret rotations #5196

Merged
merged 7 commits into from Jul 7, 2023

peternose
Contributor

@peternose peternose commented Feb 23, 2023

Master secret proposal requirements:

  • A new master secret can be proposed after the rotation period expires and for the next generation only.
    • The first master secret (generation 0) can be proposed immediately and even if the rotation interval is set to 0.
    • If the rotation interval is set to 0, rotations are disabled and secrets cannot be proposed anymore. To enable them again, update the rotation interval in the policy.
  • Only nodes from the committee can propose secrets.
  • Only one master secret can be proposed per epoch.
    • If accepted, the next secret can be proposed after the rotation interval expires.
    • If not accepted, the next secret can be proposed in the next epoch.
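
The proposal requirements above can be sketched as a single eligibility check. This is only an illustration: the `Status` type, its field names, and the `canPropose` helper are hypothetical and are not taken from the PR, and treating the rotation period as expired once `RotationEpoch + RotationInterval` is reached is an assumption.

```go
package main

import "fmt"

// Status is a hypothetical, simplified view of the key manager status.
type Status struct {
	HasSecret        bool   // false until generation 0 has been accepted
	Generation       uint64 // generation of the active master secret
	RotationEpoch    uint64 // epoch of the last accepted rotation
	RotationInterval uint64 // taken from the policy; 0 disables rotations
}

// canPropose reports whether a node may publish a proposal for the given
// generation in the given epoch. lastProposalEpoch is nil when no proposal
// has been published yet.
func canPropose(s Status, inCommittee bool, generation, epoch uint64, lastProposalEpoch *uint64) bool {
	if !inCommittee {
		return false // only committee nodes can propose
	}
	if lastProposalEpoch != nil && *lastProposalEpoch == epoch {
		return false // only one proposal per epoch
	}
	if !s.HasSecret {
		// The first secret can be proposed immediately, even with interval 0.
		return generation == 0
	}
	if s.RotationInterval == 0 {
		return false // rotations are disabled
	}
	if generation != s.Generation+1 {
		return false // next generation only
	}
	// Assumption: the period has expired once a full interval has passed.
	return epoch >= s.RotationEpoch+s.RotationInterval
}

func main() {
	s := Status{HasSecret: true, Generation: 3, RotationEpoch: 10, RotationInterval: 5}
	fmt.Println(canPropose(s, true, 4, 14, nil)) // one epoch too early
	fmt.Println(canPropose(s, true, 4, 15, nil)) // interval has passed
}
```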

Master secret rotation requirements:

  • Can happen on every epoch if the following conditions are met:
    • A master secret was proposed for the next generation and the current epoch.
      • The rotation period is not verified here as it is already checked when the secret is proposed.
      • Optionally, we can add this check to cover the case when the policy changes after the secret is proposed.
    • The majority of the enclaves have replicated the proposed secret and announced that in their init status (next_checksum field is set).

Setup:

  • The key manager is initialized with an empty checksum and no nodes.
  • Every node needs to register with an empty checksum to be included in the key manager committee.

Protocol (key manager):

  • Try to rotate the master secret every epoch as part of the key manager status generation.
    • Fetch the latest master secret proposal or abort.
    • Verify the generation number and epoch of the proposal or abort.
    • Count how many nodes have stored the proposal locally.
      • Compare the checksum of the proposal to the next_checksum field in the init response.
    • Accept the proposal if the majority of the nodes have replicated the proposal.
      • Increment the generation number by 1.
      • Update the last rotation epoch.
      • Update the checksum.
    • Broadcast the new status.
      • If the generation number has advanced, the enclaves will try to apply the proposal they stored locally.

Protocol (node):

  • Listen to key manager status updates and master secret publication events.
    • Rotation period and next generation can be computed from the key manager status.
  • Wait until the rotation period expires.
  • Select a random block in the current epoch.
  • When the selected block arrives, generate and propose a master secret for the next generation:
    • Generate a random master secret (in enclave).
    • Encrypt the secret to all enclaves from the key manager committee (in enclave).
    • Compute the next checksum (in enclave).
      • The checksum is computed using hash chains, i.e., KDF(KDF(...KDF(runtime_id, secret_0), secret_N), proposal).
    • Publish a proposal for the next master secret.
      • The transaction contains all ciphertexts, the next checksum, and the next epoch.
  • When a new master secret is proposed (either by us or some other node), store it locally:
    • Send the encrypted proposal to the enclave.
    • Decrypt the ciphertext or abort if the secret was not encrypted with the enclave's REK key (in enclave).
    • Encrypt and store the proposal locally (in enclave).
    • Compute the next checksum (in enclave).
    • Initialize the enclave again and register with the updated next checksum field.
    • Postpone the master secret generation to the next epoch.
      • If the current proposal is not replicated by the majority of the enclaves, propose another secret in the next epoch.
  • When the key manager status is updated, try to apply the proposal if the generation number has advanced:
    • Send the key manager status to the enclave.
    • If the generation number is up-to-date, abort (in enclave).
    • Load the proposal for the next master secret (in enclave):
      • If the proposal is not found, replicate it from another enclave (in enclave).
    • Compute and verify the checksum (in enclave).
      • If the checksum matches, accept the proposal and store it locally (in enclave).
      • Otherwise, abort.
    • Initialize the enclave again and register with the updated checksum field and an empty next_checksum field.
    • Postpone the master secret generation until after the rotation period expires.
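
The hash-chain checksum used in the node protocol, KDF(KDF(...KDF(runtime_id, secret_0), secret_N), proposal), can be sketched as follows. HMAC-SHA256 is only a stand-in here, since the description does not say which KDF the enclave uses, and the helper names (`kdf`, `chainChecksum`, `nextChecksum`, `verifyProposal`) are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// kdf is a stand-in (HMAC-SHA256); the actual KDF is not specified above.
func kdf(key, data []byte) []byte {
	m := hmac.New(sha256.New, key)
	m.Write(data)
	return m.Sum(nil)
}

// chainChecksum folds the secrets into the runtime ID in generation order.
func chainChecksum(runtimeID []byte, secrets ...[]byte) []byte {
	sum := runtimeID
	for _, s := range secrets {
		sum = kdf(sum, s)
	}
	return sum
}

// nextChecksum extends the current checksum with a proposed secret.
func nextChecksum(current, proposal []byte) []byte {
	return kdf(current, proposal)
}

// verifyProposal checks a proposal against the expected next checksum
// using a constant-time comparison.
func verifyProposal(current, proposal, expected []byte) bool {
	return hmac.Equal(nextChecksum(current, proposal), expected)
}

func main() {
	runtimeID := []byte("hypothetical-runtime-id")
	current := chainChecksum(runtimeID, []byte("secret-0"), []byte("secret-1"))
	next := nextChecksum(current, []byte("proposal"))
	fmt.Println(hex.EncodeToString(next))
	fmt.Println(verifyProposal(current, []byte("proposal"), next))
}
```

Because each checksum depends on every earlier secret, a node only needs the current checksum and the proposal to compute the value it registers in next_checksum.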

@codecov

codecov bot commented Feb 23, 2023

Codecov Report

Merging #5196 (f87ffb9) into master (0961dcc) will increase coverage by 0.14%.
The diff coverage is 79.35%.

❗ Current head f87ffb9 differs from pull request most recent head e7b4180. Consider uploading reports for the commit e7b4180 to get more accurate results

@@            Coverage Diff             @@
##           master    #5196      +/-   ##
==========================================
+ Coverage   66.97%   67.12%   +0.14%     
==========================================
  Files         524      516       -8     
  Lines       55370    55095     -275     
==========================================
- Hits        37082    36980     -102     
+ Misses      13771    13622     -149     
+ Partials     4517     4493      -24     
Impacted Files Coverage Δ
go/keymanager/api/policy_sgx.go 46.66% <ø> (+13.33%) ⬆️
go/registry/api/api.go 56.54% <ø> (+0.75%) ⬆️
go/worker/keymanager/api/api.go 21.62% <ø> (-10.82%) ⬇️
go/worker/keymanager/metrics.go 100.00% <ø> (ø)
...nsensus/tendermint/apps/keymanager/transactions.go 59.67% <61.16%> (ø)
go/keymanager/api/grpc.go 52.22% <62.50%> (+14.42%) ⬆️
go/consensus/tendermint/keymanager/keymanager.go 72.22% <75.00%> (ø)
...onsensus/tendermint/apps/keymanager/state/state.go 70.11% <76.47%> (ø)
...consensus/tendermint/apps/keymanager/keymanager.go 78.26% <78.26%> (ø)
go/keymanager/api/api.go 78.57% <81.81%> (+0.44%) ⬆️
... and 5 more

... and 275 files with indirect coverage changes

@peternose peternose changed the title go/consensus/tendermint/apps/keymanager: Remove secure transition check Master keys forward secrecy Feb 23, 2023
@peternose peternose changed the title Master keys forward secrecy keymanager/src/runtime: Support master secret rotations Mar 2, 2023
@peternose peternose force-pushed the peternose/feature/master-keys-forward-secrecy branch 8 times, most recently from eb0cf5c to 343a604 Compare March 3, 2023 13:27
@peternose peternose force-pushed the peternose/feature/master-keys-forward-secrecy branch 7 times, most recently from 551217a to 900f093 Compare March 28, 2023 23:19
@peternose peternose marked this pull request as ready for review March 28, 2023 23:19
Member

@kostko kostko left a comment

Would be good to explicitly test the scenario where master secret replication fails (e.g. not enough nodes confirm to have replicated the master secret) and then succeeds in the next rotation, making sure that nodes properly handle potential reverts.

Are there any backwards compatibility concerns with regards to running old (22.2.x) enclaves with new consensus state due to added fields? Specifically for the case where a transition to the new version is in progress.

go/consensus/tendermint/apps/keymanager/keymanager.go (outdated, resolved)
go/consensus/tendermint/apps/keymanager/keymanager.go (outdated, resolved)
go/consensus/tendermint/apps/keymanager/transactions.go (outdated, resolved)
go/consensus/tendermint/keymanager/keymanager.go (outdated, resolved)
go/worker/keymanager/worker.go (outdated, resolved)
go/worker/keymanager/api/api.go (resolved)
keymanager/src/crypto/kdf.rs (resolved)
keymanager/src/crypto/kdf.rs (outdated, resolved)
keymanager/src/crypto/kdf.rs (outdated, resolved)
keymanager/src/crypto/kdf.rs (resolved)
Contributor

@pro-wh pro-wh left a comment

just looking so far. how does the overall protocol work? what was this about key manager nodes replicating the proposed new key? why is that done on the Go side, i.e. outside the enclave?

go/consensus/tendermint/apps/keymanager/transactions.go (outdated, resolved)
go/consensus/tendermint/apps/keymanager/transactions.go (outdated, resolved)
go/consensus/tendermint/apps/keymanager/transactions.go (outdated, resolved)
keymanager/src/crypto/kdf.rs (resolved)
let plaintext = d2
.open(&nonce, ciphertext.to_vec(), runtime_id)
.expect("persisted state is corrupted");
let plaintext = match d2.open(&nonce, ciphertext.to_vec(), vec![]) {
Contributor

here for example the additional data is empty.

Contributor Author

Changed. Proposals now have a separate context, and the additional data is no longer empty: it is set to the generation number of the proposal.
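
To illustrate why binding the generation number as additional data matters, here is a minimal sketch. It uses AES-256-GCM from the Go standard library as a stand-in for Deoxys-II (which the enclave actually uses and which is not in the standard library). The helper names and the 8-byte little-endian encoding of the generation are assumptions.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// generationAD encodes the generation as additional data.
// Assumption: 8-byte little-endian; the PR does not state the encoding.
func generationAD(generation uint64) []byte {
	ad := make([]byte, 8)
	binary.LittleEndian.PutUint64(ad, generation)
	return ad
}

// newAEAD builds the stand-in AEAD (AES-GCM instead of Deoxys-II).
func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealProposal encrypts a proposed secret, binding it to its generation.
func sealProposal(key, nonce, secret []byte, generation uint64) ([]byte, error) {
	aead, err := newAEAD(key)
	if err != nil {
		return nil, err
	}
	return aead.Seal(nil, nonce, secret, generationAD(generation)), nil
}

// openProposal fails if the ciphertext was sealed for another generation.
func openProposal(key, nonce, ciphertext []byte, generation uint64) ([]byte, error) {
	aead, err := newAEAD(key)
	if err != nil {
		return nil, err
	}
	return aead.Open(nil, nonce, ciphertext, generationAD(generation))
}

func main() {
	key := make([]byte, 32)
	nonce := make([]byte, 12)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	if _, err := rand.Read(nonce); err != nil {
		panic(err)
	}
	ct, err := sealProposal(key, nonce, []byte("proposed master secret"), 5)
	if err != nil {
		panic(err)
	}
	_, errSame := openProposal(key, nonce, ct, 5)
	_, errOther := openProposal(key, nonce, ct, 6)
	fmt.Println(errSame == nil, errOther != nil)
}
```

With empty additional data, a stored proposal for one generation would also decrypt when loaded as another; with the generation bound in, that mix-up is rejected at decryption time.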

@peternose peternose force-pushed the peternose/feature/master-keys-forward-secrecy branch from 900f093 to ad27710 Compare May 8, 2023 08:52
@peternose
Contributor Author

> Would be good to explicitly test the scenario where master secret replication fails.

Added new test.

> Are there any backwards compatibility concerns with regards to running old (22.2.x) enclaves with new consensus state due to added fields? Specifically for the case where a transition to the new version is in progress.

I tested this once (before comments were taken into account) and secret replication worked. Will test that again with cross-version tests.

There should be no compatibility concerns, as new requests (load, generate master secret, ...) should fail when sent to an old enclave and be discarded after a few retries.

keymanager/src/crypto/kdf.rs (outdated, resolved)
keymanager/src/crypto/kdf.rs (outdated, resolved)
@peternose
Contributor Author

> Could log an error in case of verification failures instead of silently ignoring?

I thought that we are not logging anything in the oasis-core-keymanager project for security reasons, e.g. to not accidentally log secrets. Should I add the logging anyway? I guess I will have to update the secret fetcher and expose the ID of a node that returned the last secret.

@kostko
Member

kostko commented May 9, 2023

> > Could log an error in case of verification failures instead of silently ignoring?
>
> I thought that we are not logging anything in the oasis-core-keymanager project for security reasons, e.g. to not accidentally log secrets. Should I add the logging anyway? I guess I will have to update the secret fetcher and expose the ID of a node that returned the last secret.

That is true. Ok, maybe leave it for now as it should never happen (famous last words).

Contributor

@pro-wh pro-wh left a comment

I've looked through the ADR and matched it up with this implementation. some thoughts on the two together:

  • naming clarifications for anyone else reading through:
    • in the tendermint app, the "master secret," a new storage slot in the state, is only the proposed master secret. that's why it's immediately accepted in the publish master secret transaction. for the actual master secret in use, look in the "status."
    • so don't confuse "state" and "status" when you're reading this. "status" is the good thing. "state" is the mechanism that stores the status and other things.
    • verbs used on master secrets:
      • "generate" refers to an enclave sampling an unused master secret at random. it gives it to its node. this one's pretty intuitive
      • "publish" refers to a node giving a proposal to the consensus system
      • "load" refers to a node giving a master secret to its enclave. I need to look more into this
      • "add" is something within the kdf.rs file. I need to look more into this.
      • "rotate" is the action that actually cuts over to another master secret.

there's a check in the consensus layer that a proposal is encrypted for at least 66% of the committee. it can't verify that the payloads are correctly encrypted though, so it's more of a thing to catch unlucky timing for honest nodes. make sure we don't rely on this for security, as a malicious node could enter junk data to pretend that it has encrypted it for many nodes.

I believe there's a confirmation step where the committee confirms it's able to decrypt the proposal (I haven't walked through the code for this though), so I think we're fine on integrity.

as for availability, it looks like a malicious node could DoS by proposing junk, at the beginning of every epoch. then honest nodes would try to wait for a random time in the epoch so they typically wouldn't compete. is there a mechanism to prevent a node from proposing on every epoch?

the committee proceeds with rotation as long as a majority successfully processes the proposal. is there anything to fill in the secret for the minority that somehow failed? e.g. if they got restarted but hadn't registered their new REK before another node generated a new proposal

// the proposal for the next master secret.
if numNodes := len(status.Nodes); numNodes > 0 && nextChecksum != nil {
percent := len(updatedNodes) * 100 / numNodes
if percent > minProposalReplicationPercent {
Contributor

the consensus app has subtly off-by-one logic, it rejects if percent < 66 when deciding to allow a proposal to be published. >= would be consistent here
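
The inconsistency can be seen with integer percentages. The sketch below is one reading of the comment: the constant value 66 is taken from the "percent < 66" remark, and the two helper functions are hypothetical, not the PR's code.

```go
package main

import "fmt"

// Assumption: the threshold is 66, as implied by the review comment.
const minProposalReplicationPercent = 66

// publishAllowed mirrors the publish-time check, which rejects only when
// the percentage is below the threshold (so it accepts percent >= 66).
func publishAllowed(encryptedFor, total int) bool {
	if total == 0 {
		return false
	}
	return encryptedFor*100/total >= minProposalReplicationPercent
}

// rotationAllowedStrict mirrors the quoted rotation check, which uses a
// strict comparison (percent > 66).
func rotationAllowedStrict(updated, total int) bool {
	if total == 0 {
		return false
	}
	return updated*100/total > minProposalReplicationPercent
}

func main() {
	// With 2 of 3 nodes, integer division gives 200/3 = 66.
	fmt.Println(publishAllowed(2, 3), rotationAllowedStrict(2, 3))
}
```

With 2 of 3 nodes, the proposal can be published but the strict check would never rotate to it, which is why using `>=` in both places is the consistent choice.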

@kostko
Member

kostko commented May 10, 2023

> there's a check in the consensus layer that a proposal is encrypted for at least 66% of the committee. it can't verify that the payloads are correctly encrypted though, so it's more of a thing to catch unlucky timing for honest nodes. make sure we don't rely on this for security, as a malicious node could enter junk data to pretend that it has encrypted it for many nodes.

That is right, it is only used as a sanity check; other nodes then need to actually confirm that replication/decryption was successful (by updating their status). Note that "malicious nodes" would actually need to be malicious/compromised enclaves. As I mentioned in one of the earlier comments, if one wanted a better guard against compromised enclaves, one would need to include a ZKP that the encryption was performed correctly so that the consensus layer could verify it.

> I believe there's a confirmation step where the committee confirms it's able to decrypt the proposal (I haven't walked through the code for this though), so I think we're fine on integrity.

Yes, that is right: enough enclaves need to actually confirm that replication/decryption was successful (by updating their status) before the master secret is actually activated.

> as for availability, it looks like a malicious node could DoS by proposing junk, at the beginning of every epoch. then honest nodes would try to wait for a random time in the epoch so they typically wouldn't compete. is there a mechanism to prevent a node from proposing on every epoch?

This is true and there is currently no such mechanism. Note that it would require a compromised enclave to propose bad encryptions (as the enclaves are actually retrieving REKs from the consensus layer state, encrypting to them and signing the entire proposal with their RAK).

An alternative would be to collect multiple proposals and then randomly select one based on VRF (at epoch boundary?). This would however delay the rotation, but we could maybe start collecting proposals one epoch earlier?

> the committee proceeds with rotation as long as a majority successfully processes the proposal. is there anything to fill in the secret for the minority that somehow failed? e.g. if they got restarted but hadn't registered their new REK before another node generated a new proposal

Yes, the usual master secret replication mechanism that is performed on key manager runtime bootstrapping. There is now also an E2E test for this mechanism as per my earlier comments.

@pro-wh
Contributor

pro-wh commented May 10, 2023

ah yeah it would have to compromise the enclave. maybe that's fine then

@peternose peternose force-pushed the peternose/feature/master-keys-forward-secrecy branch from ad27710 to b8ec88f Compare May 12, 2023 08:13
keymanager/src/secrets/interface.rs (resolved)
}
}

fn new_d2(key_policy: Keypolicy, context: &[u8]) -> DeoxysII {
Member

Should we just export the new_d2 from common::sgx::seal as it seems identical? Or maybe with a slight rename to make it more descriptive of what it is.

@peternose peternose force-pushed the peternose/feature/master-keys-forward-secrecy branch from b8ec88f to f87ffb9 Compare May 12, 2023 15:51
@peternose peternose force-pushed the peternose/feature/master-keys-forward-secrecy branch 2 times, most recently from f025174 to e7b4180 Compare July 7, 2023 09:39
Key managers now have the ability to rotate the master secret
at predetermined intervals. Each rotation introduces a new generation,
or version, of the master secret that is sequentially numbered,
starting from zero.
Store only the last ephemeral secret in the key manager state.
- Fixed the grammar issue in the publish ephemeral secret method.
- Moved the initialization of secret notifiers to the constructor for better
  code organization.
- Simplified method for computing master secret generation epoch.
- Created scenarios for replicating multiple secrets and for rotation failures.
- Added a comment to clarify that a node must register with an empty checksum
  until the first secret is generated.
- Renamed the context for master secret sealing and added a separate context
  for master secret proposals to enhance the security.
- Limited the number of replicated ephemeral secrets.
- Replaced fetch functions with secret provider which does not break enclave
  initialization if the checksum of the replicated secret is invalid.
- Fix off-by-one logic when rejecting a master secret proposal.
- Rename and export Deoxys-II SGX constructor.
@peternose peternose force-pushed the peternose/feature/master-keys-forward-secrecy branch from e7b4180 to c069bb3 Compare July 7, 2023 09:40
@peternose
Contributor Author

Rebased and fixed conflicts. Upgrade test will be added in another PR after merge.

@peternose peternose merged commit 1e8c200 into master Jul 7, 2023
3 checks passed
@peternose peternose deleted the peternose/feature/master-keys-forward-secrecy branch July 7, 2023 11:26
3 participants