feat: stacks signer able to save multiple dkg shares and load it where appropriate #4704

netrome · 2024-04-22T11:15:46Z

Description

This PR extends the signer persistence to allow for storing shares for multiple concurrent DKG rounds, identified by the aggregate public key for that round. On startup, the signer looks up the approved aggregate key in the contract and loads the corresponding shares.

To ensure the Nakamoto sign coordinator is able to pick up loaded DKG shares, each signer broadcasts a DkgResults message containing its polynomial commitments once its shares are loaded. The sign coordinator, which previously assumed that any DkgResults message contained the complete set of polynomial commitments, has been extended to also accept individual polynomial commitments in these messages, which it then aggregates across all signers to form the complete set of commitments.
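To illustrate the aggregation step described above, here is a minimal sketch of a collector that accepts DkgResults-style messages carrying either one signer's commitment or the full set. The names `Commitment` and `CommitmentCollector`, the use of `u32` signer ids in the range `0..expected_signers`, and the first-report-wins rule are all assumptions made for the example, not the coordinator's actual types or policy.

```rust
use std::collections::BTreeMap;

/// Placeholder for one signer's polynomial commitment (hypothetical type).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Commitment(pub Vec<u8>);

/// Collects per-signer commitments until every expected signer has reported.
pub struct CommitmentCollector {
    expected_signers: u32,
    received: BTreeMap<u32, Commitment>,
}

impl CommitmentCollector {
    pub fn new(expected_signers: u32) -> Self {
        Self {
            expected_signers,
            received: BTreeMap::new(),
        }
    }

    /// Record the commitments carried by one message. The message may hold a
    /// single signer's commitment or many. Returns the complete set, ordered
    /// by signer id, once all expected signers are covered.
    pub fn add(
        &mut self,
        commitments: Vec<(u32, Commitment)>,
    ) -> Option<Vec<(u32, Commitment)>> {
        for (signer_id, commitment) in commitments {
            // Ignore ids outside the expected range (assumed to be 0..n).
            if signer_id < self.expected_signers {
                // Assumption: the first commitment seen for a signer is kept.
                self.received.entry(signer_id).or_insert(commitment);
            }
        }
        if self.received.len() as u32 == self.expected_signers {
            Some(
                self.received
                    .iter()
                    .map(|(id, c)| (*id, c.clone()))
                    .collect(),
            )
        } else {
            None
        }
    }
}
```

With two expected signers, the first single-commitment message yields `None`, and the message that covers the remaining signer yields the full set.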

Applicable issues

closes Nakamoto Signer[3.0] - A signer must be able to save multiple DKG aggregate key round states and load it where appropriate #4654

Additional info

To ensure a bounded size stored in StackerDB, we limit the implementation to only hold two sets of shares concurrently to handle the scenario described in #4654.
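As a rough sketch of the bounded storage described above, the following keeps at most two share sets, keyed by aggregate public key. `AggregateKey`, `SavedShares`, `SavedStates`, and the oldest-first eviction rule are hypothetical choices for illustration; the actual persistence format and eviction policy in the signer may differ.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a compressed aggregate public key.
pub type AggregateKey = [u8; 33];
/// Hypothetical stand-in for an (encrypted) blob of DKG shares.
pub type SavedShares = Vec<u8>;

/// Upper bound on concurrently stored share sets.
pub const MAX_SAVED_STATES: usize = 2;

/// Share sets ordered from oldest to newest.
#[derive(Default)]
pub struct SavedStates {
    entries: VecDeque<(AggregateKey, SavedShares)>,
}

impl SavedStates {
    /// Insert or overwrite the shares for `key`. If the bound is exceeded,
    /// the oldest entry is evicted (an assumption for this sketch).
    pub fn save(&mut self, key: AggregateKey, shares: SavedShares) {
        // Drop any previous entry for the same key before re-inserting.
        self.entries.retain(|(k, _)| *k != key);
        self.entries.push_back((key, shares));
        while self.entries.len() > MAX_SAVED_STATES {
            self.entries.pop_front();
        }
    }

    /// Look up the shares stored for `key`, if any.
    pub fn load(&self, key: &AggregateKey) -> Option<&SavedShares> {
        self.entries.iter().find(|(k, _)| k == key).map(|(_, s)| s)
    }
}
```

Saving a third key evicts the first one, so the stored size stays bounded while the two most recent rounds remain loadable.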

stacks-signer/src/runloop.rs

hstove

Everything LGTM! There are a few instances where it would be helpful to pluralize function names that now return states instead of one state.

stacks-signer/src/signer.rs

netrome · 2024-04-29T21:57:33Z

> Everything LGTM! There are a few instances where it would be helpful to pluralize function names that now return states instead of one state.

Good point. I've updated get_signer as you noted, and also load_encrypted_signer_state which should be pluralized.

kantai

Just have a few questions

stacks-signer/src/runloop.rs

stacks-signer/src/signer.rs

testnet/stacks-node/src/nakamoto_node/sign_coordinator.rs

stacks-signer/src/runloop.rs

stacks-signer/src/signer.rs

jcnelson

The structure of this code is giving me a lot of pause. It seems that there are several places where the signer state can just be willy-nilly reset from saved state, but without any consideration to how it will affect the execution of the signer at that point in the reload. I fear this will lead to hard-to-reproduce bugs and state corruption, because it is hard to see in advance what the downstream consequences can be when changing this state mid-way through a main loop pass.

Instead, can this code be factored to make a state reload happen only at the top-level of the main loop? A reload should cancel any in-progress activities and force the signer to restart whatever it was doing anew.

netrome · 2024-05-03T20:48:51Z

> The structure of this code is giving me a lot of pause. It seems that there are several places where the signer state can just be willy-nilly reset from saved state, but without any consideration to how it will affect the execution of the signer at that point in the reload. I fear this will lead to hard-to-reproduce bugs and state corruption, because it is hard to see in advance what the downstream consequences can be when changing this state mid-way through a main loop pass.
>
> Instead, can this code be factored to make a state reload happen only at the top-level of the main loop? A reload should cancel any in-progress activities and force the signer to restart whatever it was doing anew.

I'm not too happy about the current structure either. While I have attempted to be careful with how and when we load state from the signer, I do share your concern. We've already seen plenty of hard-to-reproduce bugs in the signer so far, and I definitely don't want to add to it, especially since the intention of this change is to make the signer more robust towards bugs caused by network delays and timing issues.

I don't fully understand right now how exactly we'd be able to make the state reloads only happen at the top-level of the main loop (assuming you mean within the SignerRunLoop::run_one_pass function and not the top-level provided SignerRunLoop::main_loop function), but I will look into it with fresh eyes on Monday and see what I can do.

netrome · 2024-05-06T08:48:22Z

I've now updated the code so that state reloads only happen in signer::update_approved_aggregate_key, which is only called in signer::refresh_dkg, which in turn is only called from the main loop if no approved aggregate key is set. This eliminates some of the unnecessary boilerplate and allows us to instantiate the signer without doing yet another state reload after the signer has been constructed.

This feels much cleaner than the previous solution. Thank you for bringing this up @jcnelson. Let me know if you have further opinions on how to structure this. For example, if you want state reloads to be explicit in the main loop I can do some further refactors to achieve this - but I think the solution we have now fits better with the current structure of the signer.

kantai · 2024-05-07T15:54:27Z

stacks-signer/src/signer.rs

+                match self.load_saved_state_for_aggregate_key(approved_aggregate_key) {
+                    Ok(()) => self.send_dkg_results(&approved_aggregate_key),
+                    Err(e) => warn!(
+                        "{self}: Failed to load saved state for key {approved_aggregate_key}: {e}"
+                    ),
+                }


I'm still a little confused by the logic here. update_approved_aggregate_key is invoked to refresh the signer's view of the approved aggregate key. Why should that trigger a load from saved state? Shouldn't the signer's state already be loaded into memory at this point? Why doesn't the signer just load state into memory during process startup (as I think was the general idea behind Jude's suggestion)?

The general logic seems like it should be:

On boot:
  self.refresh_approved_aggregate_key();
  if self.approved_aggregate_public_key.is_some():
    self.signer_state = self.load_saved_states()[self.aggregate_key];
    if self.signer_state.pending_dkg_results.is_some():
      send_dkg_results();
      self.signer_state.pending_dkg_results.clear();
      self.overwrite_saved_states()

On refresh aggregate public key:
  if self.approved_aggregate_public_key is changed:
    if self.signer_state.pending_dkg_results.is_some():
      send_dkg_results();
      self.signer_state.pending_dkg_results.clear();
      self.overwrite_saved_states()

That way, the signer sends a DkgResults message after the aggregate key is approved if they're running at the time, or if they wakeup after the aggregate key has been approved, they send the message then.

This avoids excessively reloading the signer states and it avoids resending the DkgResults message potentially every time a state reload occurs.

Hmm yeah I think we could do it that way. Two thoughts on the logic:

On boot the signer won't have any pending DKG results so we don't have to send DKG results at that point.

State writes happen now as soon as DKG is finished (i.e. a DkgEnd(_) message is received), which I think makes sense because you want to persist the shares as soon as you have confirmation that you have been able to create an aggregate key.

With these I think we can simplify the logic to

On boot:
  self.refresh_approved_aggregate_key();
  if self.approved_aggregate_public_key.is_some():
    self.signer_state = self.load_saved_states()[self.aggregate_key];

On refresh aggregate public key:
  if self.approved_aggregate_public_key is changed:
    if self.signer_state.pending_dkg_results.is_some():
      send_dkg_results();
      self.signer_state.pending_dkg_results.clear();

On DkgEnd received:
  self.save_signer_state()

What do you think about this?

> Why should that trigger a load from saved state? Shouldn't the signer's state already be loaded into memory at this point?

It probably should be, but the rationale behind the current logic is to ensure that the saved state is always consistent with the approved aggregate key. If we for any reason observe a change in the approved aggregate key, a natural reaction would be to reload the state observed with the changed aggregate key.

I think it's possible to have pending DKG results on boot, because the signer may have gone offline between DkgEnd and the key being approved by the network. I think we should still be persisting these states after the pending_dkg_results is cleared as well, so I think my suggested logic is about as simplified as it can get. I do agree that we should be persisting on DkgEnd as well.

Actually, a problem with the proposed logic is if you boot the signer and an approved aggregate key is not yet set you would never load any persisted state. If you are unlucky with the timing, your signer might boot just before the last vote for an aggregate key arrives and sets that key in the contract.

From that perspective, I think it is safer to load the saved state when you observe a change in the aggregate key.

To prevent excessive state reloads I could add a condition to only load the saved state in case the approved aggregate key is different from the one already present in the state.

So I'd then propose considering something that looks like this:

On boot:
  Start without any approved_aggregate_key or loaded signer state

On refresh aggregate public key:
  if self.approved_aggregate_public_key is changed:
    if the new approved_aggregate_public_key is not the same as the one in our signer state:
      self.load_saved_state(approved_aggregate_key)
    if self.signer_state.pending_dkg_results.is_some():
      send_dkg_results();
      self.signer_state.pending_dkg_results.clear();

On DkgEnd received:
  self.save_signer_state()

This would avoid excessive state reloads, but ensure that the signer doesn't miss loading the state for a particular aggregate key.
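The proposal above could be sketched in Rust roughly as follows. This is a simplified model under stated assumptions: `AggregateKey` is a plain `u64`, DKG results are opaque bytes, the `saved` map stands in for persistent storage, and the `sent` vector stands in for broadcasting DkgResults. None of these names are the signer's real API.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an aggregate public key.
pub type AggregateKey = u64;

#[derive(Clone, Debug, Default, PartialEq)]
pub struct SignerState {
    pub aggregate_key: Option<AggregateKey>,
    /// DKG results not yet announced to the coordinator (opaque bytes here).
    pub pending_dkg_results: Option<Vec<u8>>,
}

#[derive(Default)]
pub struct Signer {
    pub approved_aggregate_key: Option<AggregateKey>,
    pub state: SignerState,
    /// Stand-in for persisted states, keyed by aggregate key.
    pub saved: HashMap<AggregateKey, SignerState>,
    /// Stand-in for DkgResults messages that were broadcast.
    pub sent: Vec<Vec<u8>>,
}

impl Signer {
    /// Called with the key currently approved in the contract, if any.
    pub fn refresh_aggregate_key(&mut self, approved: Option<AggregateKey>) {
        if approved == self.approved_aggregate_key {
            return; // unchanged: no reload, no resend
        }
        self.approved_aggregate_key = approved;
        let key = match approved {
            Some(key) => key,
            None => return,
        };
        // Reload only if the in-memory state belongs to a different key.
        if self.state.aggregate_key != Some(key) {
            match self.saved.get(&key) {
                Some(saved) => self.state = saved.clone(),
                None => return, // no shares stored for this key
            }
        }
        // Send pending results once, then clear them in memory.
        if let Some(results) = self.state.pending_dkg_results.take() {
            self.sent.push(results);
        }
    }

    /// Persist the state as soon as DKG finishes.
    pub fn on_dkg_end(&mut self, key: AggregateKey, results: Vec<u8>) {
        self.state = SignerState {
            aggregate_key: Some(key),
            pending_dkg_results: Some(results),
        };
        self.saved.insert(key, self.state.clone());
    }
}
```

Note that in this sketch the saved copy is not rewritten after the pending results are cleared, so a signer restarted from the saved state would send them again when it next observes the approved key.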

The way we've addressed this in the past (albeit in very different contexts) is to have two structs: a "preload" struct and a "loaded" struct which is instantiated from a "preload" struct and some additional state. You'd have an UninitializedSigner struct, which represents a Signer that does not yet have its state loaded. Its impl would have a conversion method like into_signer(state), which fully instantiates the Signer from the loaded state once it becomes available.

Then, you'd make it impossible to accidentally have a partially-instantiated or corrupted Signer used somewhere. Given the importance of the signer to chain liveness, I think it's worth the effort to make invalid Signer states unrepresentable here.
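A minimal sketch of the two-struct pattern described above, assuming a hypothetical `SignerConfig` (what is known before loading) and `LoadedState` (what becomes available later). Only `UninitializedSigner`, `Signer`, and `into_signer` come from the comment; the fields and accessors are illustrative.

```rust
/// Hypothetical configuration known before any state is loaded.
pub struct SignerConfig {
    pub signer_id: u32,
}

/// Hypothetical state that only exists after loading (e.g. DKG shares).
pub struct LoadedState {
    pub shares: Vec<u8>,
}

/// A signer whose state has not been loaded yet. It exposes no operations
/// that require shares.
pub struct UninitializedSigner {
    config: SignerConfig,
}

/// A fully instantiated signer. It can only be built via `into_signer`,
/// so a `Signer` without loaded state cannot be constructed.
pub struct Signer {
    config: SignerConfig,
    state: LoadedState,
}

impl UninitializedSigner {
    pub fn new(config: SignerConfig) -> Self {
        Self { config }
    }

    /// Consume the uninitialized signer and produce a usable one.
    pub fn into_signer(self, state: LoadedState) -> Signer {
        Signer {
            config: self.config,
            state,
        }
    }
}

impl Signer {
    pub fn signer_id(&self) -> u32 {
        self.config.signer_id
    }

    /// Shares are always present here, so no `Option` handling is needed.
    pub fn shares(&self) -> &[u8] {
        &self.state.shares
    }
}
```

Because `into_signer` takes `self` by value, the uninitialized value cannot be used after conversion, and the compiler rejects any code path that tries to use shares before they are loaded.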

I agree that it makes a lot of sense to differentiate between an UninitializedSigner and a Signer instead of having the current State field within the signer. It's not clear to me what the relation is between an initialized/uninitialized signer and which DKG shares the signer should have loaded. Afaik, a Signer may still be initialized but without having a known approved aggregate key or any DKG shares.

stacks-signer/src/storage.rs

stacks-signer/src/signer.rs

testnet/stacks-node/src/tests/signer.rs

netrome self-assigned this Apr 22, 2024

netrome linked an issue Apr 22, 2024 that may be closed by this pull request

Nakamoto Signer[3.0] - A signer must be able to save multiple DKG aggregate key round states and load it where appropriate #4654

Open

netrome force-pushed the 4654-stacks-signer-a-signer-must-be-able-to-save-multiple-dkg-aggregate-key-round-states-and-load-it-where-appropriate branch 3 times, most recently from 0ff200c to 70f7d69 Compare April 25, 2024 19:28

netrome added 5 commits April 29, 2024 16:13

refactor: Load signer state after signer is constructed

df3a3b2

feat: Saving and loading the two last signer states per DKG round

c5f3d0a

feat: Store signer state per group key

5331a3c

feat: Don't process dkg results for a reward cycle if aggregate key is already set

8d30a25

test: Integration test to ensure signers are able to recover DKG keys

f433116

netrome force-pushed the 4654-stacks-signer-a-signer-must-be-able-to-save-multiple-dkg-aggregate-key-round-states-and-load-it-where-appropriate branch from 70f7d69 to f433116 Compare April 29, 2024 14:23

netrome marked this pull request as ready for review April 29, 2024 14:23

netrome requested review from hstove, jferrant, kantai, jcnelson and 8marz8 April 29, 2024 14:24

jferrant reviewed Apr 29, 2024

stacks-signer/src/runloop.rs Outdated

feat: Do not panic on network errors

4f7b1da

hstove previously approved these changes Apr 29, 2024

stacks-signer/src/signer.rs

stacks-signer/src/signer.rs Outdated

stacks-signer/src/signer.rs Outdated

netrome dismissed hstove’s stale review via 4f7b1da April 29, 2024 21:50

refactor: Pluralize some names

035815e

netrome requested review from jferrant and hstove April 30, 2024 13:58

kantai reviewed May 2, 2024

netrome added 4 commits May 3, 2024 13:16

refactor: Helper function to access an Rng instead of hard-coding OsRng

f6b7a51

refactor: Remove get_signer_state utility function which is only used in one place

38649df

feat: Try load signer state from SignerDB before StackerDB

69b3c67

refactor: Move storage utility functions new module

5db9210

netrome added 2 commits May 3, 2024 15:04

feat: Only send DkgResult messages once an aggregate key has been approved

46e67e9

fix: format

864842b

jcnelson reviewed May 3, 2024

stacks-signer/src/runloop.rs Outdated

jcnelson reviewed May 3, 2024

stacks-signer/src/signer.rs

jcnelson reviewed May 3, 2024

stacks-signer/src/signer.rs Outdated

jcnelson reviewed May 3, 2024

stacks-signer/src/signer.rs Outdated

jcnelson reviewed May 3, 2024

stacks-signer/src/signer.rs Outdated

jcnelson requested changes May 3, 2024

feat: Only reload saved signer state in update_approved_aggregate_key

34cf8ac

netrome added 3 commits May 6, 2024 10:51

fix: Only send dkg results if loading the aggregate key was successful

2284657

Merge branch 'develop' into 4654-stacks-signer-a-signer-must-be-able-to-save-multiple-dkg-aggregate-key-round-states-and-load-it-where-appropriate

333c677

fix: Skip missed mutant in get_signer_commitments

8d3add0

netrome requested review from kantai and jcnelson May 7, 2024 09:09

kantai reviewed May 7, 2024

jcnelson reviewed May 7, 2024

stacks-signer/src/storage.rs

jcnelson reviewed May 7, 2024

stacks-signer/src/storage.rs

jcnelson reviewed May 7, 2024

stacks-signer/src/signer.rs Outdated

jcnelson reviewed May 7, 2024

stacks-signer/src/signer.rs Outdated

jcnelson reviewed May 7, 2024

stacks-signer/src/signer.rs Outdated

jcnelson reviewed May 7, 2024

testnet/stacks-node/src/tests/signer.rs

netrome added 4 commits May 8, 2024 14:19

fix: Add copyright header to storage.rs module

65680c3

feat: Top-level Error type for storage.rs module

248550f

feat: Return errors if encountering failures in DKG processing

4b58dc1

feat: Check if signer has pending dkg results before attempting to send them

ff15ebf
