
Feat: Miner acts as signer set coordinator during block signing #4481

Merged
merged 17 commits into next on Mar 21, 2024

Conversation

@kantai (Member) commented Mar 4, 2024

This PR alters the stacks-node miner thread to act as the signing set coordinator for getting its own blocks signed. There are a couple of hacks here that I know are hacks. I added a note about one of them below (it has to do with figuring out the current miner's key).

Otherwise, this PR does a handful of things in support of the miner-as-coordinator:

  • Replaces msg_id u32 with an enum for message identification. Straightforward refactor.
  • Adds an additional slot for miner messages. Miners need to send signer messages, not just block proposals. This adds a new slot for that.
  • Adds a sync channel for listening to StackerDB events. The miner needs to receive StackerDB events. It could either poll the database (seems like a bad idea) or listen for events. Registering for RPC callbacks seems bad. Instead, this uses a singleton sync channel.
  • Adds a StackerDBs method for pushing a chunk locally and emitting an event. The node shouldn't make RPC calls to itself. This could trigger a deadlock if we aren't careful, it could lead to some pretty weird network patterns (depending on the deployment), and it's less overhead this way anyway.
  • Adds a new signer message type to store DKG results. This is read by miners to instantiate the FIRE coordinator state. It is checked against the chainstate's aggregate key, and if even just one signer produces this message, the miner should be able to retrieve it. I understand this data could be cached on the miner for the duration of a reward cycle; I do not think that should happen in this PR, though, because I want to first focus on correctness and deal with optimizations separately.
  • Adds a test signing channel for nakamoto integration tests. Because the miner no longer listens for block approvals, but instead actively acts as a coordinator, the blind signer could no longer work as it did. It would need to mimic the entire FIRE protocol, which I think would be brittle, a waste of time, and not really the point of the non-signer nakamoto integration tests. So instead, there's a [cfg(test)] code block which checks if the blind signing channel is initialized and, if so, uses it for the test.
  • Currently builds with a branch of wsts (this is reflected in the Cargo.toml, and should allow builds to work!) Once https://github.com/stacks-network/wsts/pull/3 merges and a release is built, the version in this repo will have to be bumped rather than building off the git branch.
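To illustrate the first item above, the msg_id refactor amounts to replacing a raw u32 with a fieldless enum plus a fallible decoder. This is only a sketch: the enum name and variants here are illustrative stand-ins, not the PR's actual identifiers.

```rust
// Hypothetical sketch of replacing `msg_id: u32` with a typed enum.
// `MessageSlotID` and its variants are illustrative, not the real names.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
enum MessageSlotID {
    BlockProposal = 0,
    DkgResults = 1,
    MinerMessages = 2,
}

impl MessageSlotID {
    /// Decode the wire representation, rejecting unknown ids instead of
    /// silently carrying an unvalidated u32 around.
    fn from_u32(v: u32) -> Option<Self> {
        match v {
            0 => Some(Self::BlockProposal),
            1 => Some(Self::DkgResults),
            2 => Some(Self::MinerMessages),
            _ => None,
        }
    }

    /// Encode back to the wire representation.
    fn to_u32(self) -> u32 {
        self as u32
    }
}

fn main() {
    // Round-trip: enum -> u32 -> enum.
    let id = MessageSlotID::MinerMessages;
    assert_eq!(MessageSlotID::from_u32(id.to_u32()), Some(id));
    // Unknown ids are rejected rather than propagated.
    assert_eq!(MessageSlotID::from_u32(99), None);
    println!("msg_id round-trip ok");
}
```

The payoff is that every match on a message id is now exhaustively checked by the compiler, which a bare u32 cannot give you.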

Hack: Determining the coordinator public key as a signer

If the active miner is acting as the signer, its public key needs to be used as the coordinator public key. Right now, the way this works is that the signer instance checks if its reward cycle is active, and if so, it uses the most recent miner public key it has received.

It fetches this from the StackerDB chunk messages, using the most recently received event to get the public key. This is mostly safe: the .miners StackerDB really will only allow the current miner and the prior miner to send events. However, this doesn't deal with issues related to the current miner being either non-responsive or malicious. To deal with that, the signer binary needs to actually track some additional state, like the current burn chain view of the node. That way, the signer can have a designated "current trusted miner" that it only moves forward when a new miner is selected. This is true regardless of the changes in this PR, though. The right way to handle this is to subscribe to burn block events and expand that event's information. I don't necessarily think that needs to be done at the same time as this miner-as-coordinator work, though.
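The "most recent miner key" heuristic can be sketched as below. The types are simplified stand-ins for the signer's real event structures, not the actual code.

```rust
// Toy stand-in for a received .miners StackerDB chunk event. The real
// signer tracks much richer state; only the fields needed for the
// heuristic are modeled here.
#[derive(Debug, Clone, PartialEq)]
struct MinerChunkEvent {
    received_at: u64,       // monotonically increasing receipt order
    miner_pubkey: [u8; 33], // compressed secp256k1 key bytes in the real code
}

/// Treat the public key from the most recently received .miners event as
/// the coordinator public key. Returns None before any event has arrived.
fn coordinator_pubkey(events: &[MinerChunkEvent]) -> Option<[u8; 33]> {
    events
        .iter()
        .max_by_key(|e| e.received_at)
        .map(|e| e.miner_pubkey)
}

fn main() {
    let events = vec![
        MinerChunkEvent { received_at: 1, miner_pubkey: [1u8; 33] },
        MinerChunkEvent { received_at: 2, miner_pubkey: [2u8; 33] },
    ];
    // The later event wins, even though the prior miner may also have
    // written to .miners -- the caveat discussed above.
    assert_eq!(coordinator_pubkey(&events), Some([2u8; 33]));
    println!("coordinator key taken from most recent event");
}
```

This makes the weakness concrete: "most recent" is decided purely by receipt order, so a non-responsive or malicious current miner isn't handled without the additional burn-chain-view state described above.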

Former Hacks!

These all used to be necessary workarounds or hacks in this branch, but they are no longer necessary or have otherwise been fixed.

Reusing the FIRE coordinator library in the miner

This is the correct thing to do, and the FIRE coordinator is instantiated with sufficient state to be able to detect which signers returned bad data so that it can restart signing with malicious signers removed.

As you can tell by inspecting the new sign_coordinator module, the wsts fire coordinator is reused by the miner. I don't think this is necessarily so hacky (in fact, code reuse is A Good Thing), but it's something to think about. The miner isn't actually involved in, say, DKG, and so the fire coordinator may be over-parameterized from its perspective, as well as theoretically capable of reaching a bunch of bad internal states (if the miner thread's coordinator was ever in a DKG state, e.g.). You can also see that I needed to make some internal state of the coordinator public, as well as implement an Aggregator that takes the aggregate public key as a given.

It's worth thinking about whether or not the miner should actually have its own state machine here.
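The idea of an aggregator that takes the aggregate key as a given (rather than deriving it from DKG commitments, as the default wsts aggregator does) can be illustrated with a toy sketch. Integers mod a prime stand in for curve points here; none of this is the real wsts API, and the arithmetic is deliberately simplified.

```rust
// Toy illustration only: a "given-key" aggregator. In wsts, the default
// aggregator initializes from the signers' DKG polynomial commitments;
// the miner's aggregator instead accepts the aggregate public key as a
// given and checks results against it. Integers mod a prime stand in for
// secp256k1 points; real Schnorr verification is over curve points.
const P: u64 = 2_147_483_647; // toy "group order"

struct GivenKeyAggregator {
    aggregate_pubkey: u64, // a curve Point in the real code
}

impl GivenKeyAggregator {
    /// No DKG commitments needed: the key comes from the chainstate.
    fn new(aggregate_pubkey: u64) -> Self {
        Self { aggregate_pubkey }
    }

    /// Combine signature shares by summation and check the result
    /// against the known aggregate key. (In this toy setting, the
    /// "public key" and the combined signature coincide.)
    fn sign(&self, shares: &[u64]) -> Result<u64, String> {
        let sig = shares.iter().fold(0u64, |acc, s| (acc + s) % P);
        if sig == self.aggregate_pubkey {
            Ok(sig)
        } else {
            Err("aggregation failed: shares don't match aggregate key".into())
        }
    }
}

fn main() {
    // Three signers hold additive shares of a toy secret.
    let agg = GivenKeyAggregator::new(42);
    assert_eq!(agg.sign(&[10, 20, 12]), Ok(42));
    // A corrupted share makes the check fail, but without per-signer DKG
    // commitments the aggregator can't tell WHICH share was bad -- the
    // limitation raised later in this thread.
    assert!(agg.sign(&[10, 20, 13]).is_err());
}
```

The failure case is exactly why the discussion below turns to whether the miner also needs the commitments: detecting a bad aggregate is easy, attributing blame to a specific signer is not.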

Signers continue processing messages on their local coordinator

This was necessary to get the signers integration tests passing (as well as setting the DKG ID in the miner) without changes to them. This had more to do with the tests than anything else, so I updated the tests. I'm still working through fixing all of the signer tests, though.

Removing the signature assembled checks from the signers tests

This isn't really a hack; rather, the change from the signers being the coordinator to the miner being the coordinator breaks some of the assumptions made in the signers tests setup. Namely, they no longer have a channel to a coordinator that assembles the final signature. There are definitely ways to fix these tests so that they can still assert the signatures are in place (for example, we could add a variable to the tracking globals used by the miner in other tests), so I can just do that.

My fix was to just use the block production events from the event observer, and assert that the signature on the confirmed block matches the expected aggregate key.

@jferrant (Collaborator) commented Mar 5, 2024

Is it not possible to just have a fire coordinator within the miner that triggers a sign round exactly as the signers do now? The only thing that would be different is the miner would not include its own key as part of the CoordinatorConfig/if it did, it would be assigned zero key ids. Do we actually need an aggregator?

@kantai (Member, Author) commented Mar 5, 2024

> Is it not possible to just have a fire coordinator within the miner that triggers a sign round exactly as the signers do now? The only thing that would be different is the miner would not include its own key as part of the CoordinatorConfig/if it did, it would be assigned zero key ids. Do we actually need an aggregator?

That's basically what sign_coordinator does. It instantiates a fire coordinator and triggers a sign round basically the same way the signers do now. The reason that a new Aggregator implementation is necessary is because the miner doesn't participate in DKG, which means it doesn't read the DKG results and therefore can't be initialized the same way that the default aggregator is (see https://github.com/Trust-Machines/wsts/blob/main/src/v2.rs#L383-L406). This is okay for the miner, though, because it doesn't need those commitments, it just takes the aggregate public key as given.

@xoloki (Collaborator) commented Mar 5, 2024

> > Is it not possible to just have a fire coordinator within the miner that triggers a sign round exactly as the signers do now? The only thing that would be different is the miner would not include its own key as part of the CoordinatorConfig/if it did, it would be assigned zero key ids. Do we actually need an aggregator?
>
> That's basically what sign_coordinator does. It instantiates a fire coordinator and triggers a sign round basically the same way the signers do now. The reason that a new Aggregator implementation is necessary is because the miner doesn't participate in DKG, which means it doesn't read the DKG results and therefore can't be initialized the same way that the default aggregator is (see https://github.com/Trust-Machines/wsts/blob/main/src/v2.rs#L383-L406). This is okay for the miner, though, because it doesn't need those commitments, it just takes the aggregate public key as given.

The miner needs those commitments if it wants to be able to check signature shares when aggregation fails. If you can't check signature shares then you can't identify malicious actors. Which means you can't keep signing blocks.

@jcnelson (Member) commented

@kantai Is this ready for review now?

@kantai (Member, Author) commented Mar 13, 2024

> @kantai Is this ready for review now?

No, not yet. I'm working on getting the existing nakamoto integration tests to work with my changes (in branch feat/miner-coordinator+dream-fixes). The issue is that the previous auto-signer used in the integration tests depended on the stacks miner not really being involved in the signing communications, so there needs to be a workaround for that.

* Replaces msg_id u32 with an enum for message identification
* Adds an additional slot for miner messages
* Adds a sync channel for listening to StackerDB events
* Adds a StackerDBs method for pushing a chunk locally and emitting event
* Uses a new message type to store DKG results, to be read by miners to instantiate coordinator
* Uses a test signing channel for nakamoto integration tests
* Currently builds with a branch of wsts
@kantai kantai changed the base branch from next to dream-team-fixes March 14, 2024 14:46
@kantai kantai changed the title DRAFT: initial work on miner acting as signer set coordinator during block signing Feat: Miner acts as signer set coordinator during block signing Mar 14, 2024
@kantai kantai requested a review from obycode March 14, 2024 16:05
codecov bot commented Mar 14, 2024

Codecov Report

Attention: Patch coverage is 82.08955%, with 228 lines in your changes missing coverage. Please review.

Project coverage is 83.32%. Comparing base (498b6c0) to head (cb219c9).

Additional details and impacted files
@@            Coverage Diff             @@
##             next    #4481      +/-   ##
==========================================
- Coverage   83.42%   83.32%   -0.10%     
==========================================
  Files         453      455       +2     
  Lines      329589   330179     +590     
  Branches      323      323              
==========================================
+ Hits       274944   275135     +191     
- Misses      54637    55036     +399     
  Partials        8        8              
Files Coverage Δ
clarity/src/vm/types/mod.rs 91.71% <100.00%> (+0.02%) ⬆️
libstackerdb/src/libstackerdb.rs 92.56% <100.00%> (+0.25%) ⬆️
stacks-common/src/util/macros.rs 83.85% <ø> (ø)
stacks-common/src/util/mod.rs 53.44% <ø> (ø)
stacks-common/src/util/secp256k1.rs 79.27% <100.00%> (+0.12%) ⬆️
stacks-signer/src/client/mod.rs 99.14% <100.00%> (ø)
stacks-signer/src/client/stackerdb.rs 89.36% <100.00%> (-0.44%) ⬇️
stacks-signer/src/client/stacks_client.rs 87.42% <100.00%> (ø)
stacks-signer/src/config.rs 87.50% <100.00%> (+0.95%) ⬆️
stacks-signer/src/runloop.rs 90.56% <100.00%> (-0.99%) ⬇️
... and 21 more

... and 25 files with indirect coverage changes


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@obycode (Contributor) left a comment


This looks great!

@kantai kantai changed the base branch from dream-team-fixes to next March 16, 2024 17:46
obycode previously approved these changes Mar 20, 2024
@obycode (Contributor) left a comment


👍

@jferrant (Collaborator) left a comment


Just delete the defunct deserialize_scalar and deserialize_point functions (unless I am missing something) and I will approve :D

@kantai kantai dismissed stale reviews from obycode and jcnelson via 0f91591 March 20, 2024 15:29
obycode previously approved these changes Mar 20, 2024
@kantai kantai requested a review from jferrant March 20, 2024 18:28
obycode previously approved these changes Mar 20, 2024
@kantai kantai requested a review from jcnelson March 20, 2024 19:32
@kantai kantai enabled auto-merge March 20, 2024 19:32
jferrant previously approved these changes Mar 20, 2024
@kantai kantai dismissed stale reviews from jferrant and obycode via fbf1660 March 20, 2024 21:03
@kantai kantai requested review from jferrant and obycode March 20, 2024 21:03
obycode previously approved these changes Mar 20, 2024
jcnelson previously approved these changes Mar 20, 2024
@kantai kantai dismissed stale reviews from obycode and jcnelson via cb219c9 March 21, 2024 03:15
@jferrant (Collaborator) left a comment


LGTM! :D

@kantai kantai added this pull request to the merge queue Mar 21, 2024
Merged via the queue into next with commit d5414a2 Mar 21, 2024
2 checks passed