New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
keymanager/src/runtime: Support master secret rotations #5196
keymanager/src/runtime: Support master secret rotations #5196
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5196 +/- ##
==========================================
+ Coverage 66.97% 67.12% +0.14%
==========================================
Files 524 516 -8
Lines 55370 55095 -275
==========================================
- Hits 37082 36980 -102
+ Misses 13771 13622 -149
+ Partials 4517 4493 -24
|
eb0cf5c
to
343a604
Compare
551217a
to
900f093
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to explicitly test the scenario where master secret replication fails (e.g. not enough nodes confirm to have replicated the master secret) and then succeeds in the next rotation, making sure that nodes properly handle potential reverts.
Are there any backwards compatibility concerns with regards to running old (22.2.x) enclaves with new consensus state due to added fields? Specifically for the case where a transition to the new version is in progress.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just looking so far. how does the overall protocol work? what was this about key manager nodes replicating the proposed new key? why is that done on the Go side, i.e. outside the enclave?
keymanager/src/crypto/kdf.rs
Outdated
let plaintext = d2 | ||
.open(&nonce, ciphertext.to_vec(), runtime_id) | ||
.expect("persisted state is corrupted"); | ||
let plaintext = match d2.open(&nonce, ciphertext.to_vec(), vec![]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here for example the additional data is empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed, now proposals have a separate context and now the additional data is not empty but equal to generation number of the proposal.
900f093
to
ad27710
Compare
Added new test.
I tested this once (before comments were taken into account) and secret replication worked. Will test that again with cross-version tests. There should be no compatibility concerns, as new requests (load, generate master secret, ...) should fail when sent to an old enclave and discarded after few retries. |
I thought that we are not logging anything in the |
That is true. Ok, maybe leave it for now as it should never happen (famous last words). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've looked through the ADR and matched it up with this implementation. some thoughts on the two together:
- naming clarifications for anyone else reading through:
- in the tendermint app, the "master secret," a new storage slot in the state, is only the proposed master secret. that's why it's immediately accepted in the publish master secret transaction. for the actual master secret in use, look in the "status."
- so don't confuse "state" and "status" when you're reading this. "status" is the good thing. "state" is the mechanism that stores the status and other things.
- verbs used on master secrets:
- "generate" refers to an enclave sampling an unused master secret at random. it gives it to its node. this one's pretty intuitive
- "publish" refers to a node giving a proposal to the consensus system
- "load" refers to a node giving a master secret to its enclave. I need to look more into this
- "add" is something within the kdf.rs file. I need to look more into this.
- "rotate" is the action that actually cuts over to another master secret.
there's a check in the consensus layer that a proposal is encrypted for at least 66% of the committee. it can't verify that the payloads are correctly encrypted though, so it's more of a thing to catch unlucky timing for honest nodes. make sure we don't rely on this for security, as a malicious node could enter junk data to pretend that it has encrypted it for many nodes.
I believe there's a confirmation steps where the committee confirms it's able to decrypt the proposal (I haven't walked through the code for this though), so I think we're fine on integrity.
as for availability, it looks like a malicious node could DoS by proposing junk, at the beginning of every epoch. then honest nodes would try to wait for a random time in the epoch so they typically wouldn't compete. is there a mechanism to prevent a node from proposing on every epoch?
the committee proceeds with rotation as long as a majority successfully processes the proposal. is there anything to fill in the secret for the minority that somehow failed? e.g. if they got restarted but hadn't registered their new REK before another node generated a new proposal
// the proposal for the next master secret. | ||
if numNodes := len(status.Nodes); numNodes > 0 && nextChecksum != nil { | ||
percent := len(updatedNodes) * 100 / numNodes | ||
if percent > minProposalReplicationPercent { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the consensus app has subtly off-by-one logic, it rejects if percent < 66 when deciding to allow a proposal to be published. >=
would be consistent here
That is right, it is only used as a sanity check, then other nodes need to actually confirm that replication/decryption was successful (by updating its status). Note that "malicious nodes" would actually need to be malicious/compromised enclaves. I mentioned this in one of the earlier comments, if one wanted a better guard against compromised enclaves one would need to include a ZKP that the encryption was performed correctly so that the consensus layer could verify.
Yes that is right, enough enclaves need to actually confirm that replication/decryption was successful (by updating its status) before the master secret will actually be activated.
This is true and there is currently no such mechanism. Note that it would require a compromised enclave to propose bad encryptions (as the enclaves are actually retrieving REKs from the consensus layer state, encrypting to them and signing the entire proposal with their RAK). An alternative would be to collect multiple proposals and then randomly select one based on VRF (at epoch boundary?). This would however delay the rotation, but we could maybe start collecting proposals one epoch earlier?
Yes, the usual master secret replication mechanism that is performed on key manager runtime bootstrapping. There is now also an E2E test for this mechanism as per my earlier comments. |
ah yeah it would have to compromise the enclave. maybe that's fine then |
ad27710
to
b8ec88f
Compare
keymanager/src/crypto/kdf.rs
Outdated
} | ||
} | ||
|
||
fn new_d2(key_policy: Keypolicy, context: &[u8]) -> DeoxysII { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just export the new_d2
from common::sgx::seal
as it seems identical? Or maybe with a slight rename to be more verbose what it is.
b8ec88f
to
f87ffb9
Compare
f025174
to
e7b4180
Compare
Key managers now have the ability to rotate the master secret at predetermined intervals. Each rotation introduces a new generation, or version, of the master secret that is sequentially numbered, starting from zero.
Store only the last ephemeral secret in the key manager state.
- Fixed the grammar issue in the publish ephemeral secret method. - Moved the initialization of secret notifiers to the constructor for better code organization.
- Simplified method for computing master secret generation epoch. - Created scenarios for replicating multiple secrets and for rotation failures. - Added a comment to clarify that a node must register with an empty checksum until the first secret is generated. - Renamed the context for master secret sealing and added a separate context for master secret proposals to enhance the security. - Limited the number of replicated ephemeral secrets. - Replaced fetch functions with secret provider which does not break enclave initialization if the checksum of the replicated secret is invalid. - Fix off-by-one logic when rejecting a master secret proposal. - Rename and export Deoxys-II SGX constructor.
e7b4180
to
c069bb3
Compare
Rebased and fixed conflicts. Upgrade test will be added in another PR after merge. |
Master secret proposal requirements:
Master secret rotation requirements:
next_checksum
field is set).Setup:
Protocol (key manager):
next_checksum
field in the init response.Protocol (node):
KDF(KDF(...KDF(runtime_id, secret_0), secret_N), proposal)
.next checksum
field.checksum
field and emptynext_checksum
field.