Doomslug #1991
Codecov Report
@@             Coverage Diff             @@
##           staging    #1991      +/-   ##
===========================================
+ Coverage    87.13%   87.38%   +0.25%
===========================================
  Files          170      170
  Lines        33544    33845     +301
===========================================
+ Hits         29227    29575     +348
+ Misses        4317     4270      -47
fe367d8 to f88ba21
LGTM for runtime
@@ -165,7 +168,14 @@ fn create_block(
}

fn apr(account_id: AccountId, reference_hash: CryptoHash, parent_hash: CryptoHash) -> Approval {
rename to `approve`?
self.tip.reference_hash,
target_height,
is_endorsement,
&**self.signer.as_ref().unwrap(),
this looks like something evil.
if let (Some(me), Some(signer)) = (self.me.as_ref(), self.signer.as_ref()) {
    ...
    &*signer
}
also might be easier to pack `me` + `signer` into one object - `BlockSigner`.
We also need `BlockSigner` to get HSM signing going (because we will need to pass non-hashed data to the HSM for signatures anyway, so we need an API that has specific `sign_block` and `sign_approval` methods).
/\ this all sounds like a separate PR if you are not me :D
I don't want to change anything related to HSM yet, it will be a separate effort.
Your approach still requires `&**signer`, because it's a reference to an `Arc`.
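For readers following this thread, here is a minimal sketch of the `BlockSigner` idea and of where the double dereference comes from. The `Signer` trait, `ReverseSigner`, `sign_with`, and the fields of `BlockSigner` are hypothetical stand-ins invented for illustration, not the nearcore types.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the validator signer trait (not the real nearcore trait).
trait Signer {
    fn sign(&self, data: &[u8]) -> Vec<u8>;
}

/// Toy signer for illustration: the "signature" is just the reversed payload.
struct ReverseSigner;

impl Signer for ReverseSigner {
    fn sign(&self, data: &[u8]) -> Vec<u8> {
        data.iter().rev().cloned().collect()
    }
}

/// Hypothetical `BlockSigner`: the account id and its signer travel together,
/// so there is one `Option` to check instead of two.
struct BlockSigner {
    account_id: String,
    signer: Arc<dyn Signer>,
}

/// A callee that wants a plain trait-object reference.
fn sign_with(signer: &dyn Signer, data: &[u8]) -> Vec<u8> {
    signer.sign(data)
}

/// Returns `None` instead of panicking when the node has no signer configured.
fn sign_approval(block_signer: Option<&BlockSigner>, data: &[u8]) -> Option<(String, Vec<u8>)> {
    let bs = block_signer?;
    // `signer` is a reference to an `Arc`: the first `*` yields the `Arc`,
    // the second `*` yields the `dyn Signer` inside it.
    let signer: &Arc<dyn Signer> = &bs.signer;
    Some((bs.account_id.clone(), sign_with(&**signer, data)))
}
```

With a single optional `BlockSigner`, the `unwrap()` disappears; the explicit `&**` only spells out the two dereference steps from `&Arc<dyn Signer>` down to `&dyn Signer`.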
@@ -97,15 +103,27 @@ impl Client {
let num_block_producer_seats = config.num_block_producer_seats as usize;
let data_parts = runtime_adapter.num_data_parts();
let parity_parts = runtime_adapter.num_total_parts() - data_parts;

let doomslug = Doomslug::new(
    block_producer.as_ref().map(|x| x.account_id.clone()),
maybe let's move `BlockProducer` to primitives (and rename it to `BlockSigner`) and use it in Doomslug instead of splitting it here?
f88ba21 to 2e3e7c4
chain/chain/src/doomslug.rs
Outdated
/// `None`
/// `PassedThreshold(when)` - after processing this approval the block has passed the threshold set by
/// `threshold_mode` (either one half of the total stake, or a single approval).
/// Once the threshold is hit, we want for `T(h - h_final) / 2` before producing
want -> wait. Also why do we wait for `T(h - h_final) / 2`?
The paper has been updated to explain why we sleep.
pub fn new(
    me: Option<AccountId>,
    largest_previously_skipped_height: BlockHeight,
    largest_previously_endorsed_height: BlockHeight,
why are these two arguments passed in instead of being initialized as 0?
approved_stake > threshold
}

pub fn remove_witness(
why do we need to remove witness?
We remove it when we consume it to produce a block
self.timer.started = now;

self.approval_tracking
    .retain(|h, _| *h > height && *h <= height + MAX_HEIGHTS_AHEAD_TO_STORE_APPROVALS);
This seems inefficient
It's 10k iterations per update of the head, which is acceptable.
Opened #2022 to track it.
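As a rough illustration of what the `retain` call above does, here is a self-contained sketch. The constant's value of 10,000 is an assumption inferred from the "10k iterations" remark, and `prune_approval_tracking` is a hypothetical helper name.

```rust
use std::collections::HashMap;

/// Assumed value, inferred from the "10k iterations" remark in the thread.
const MAX_HEIGHTS_AHEAD_TO_STORE_APPROVALS: u64 = 10_000;

/// Keeps only trackers strictly above the new tip height and at most
/// `MAX_HEIGHTS_AHEAD_TO_STORE_APPROVALS` heights ahead of it.
/// `retain` visits every entry, so the cost is linear in the map size.
fn prune_approval_tracking<V>(tracking: &mut HashMap<u64, V>, height: u64) {
    tracking.retain(|h, _| *h > height && *h <= height + MAX_HEIGHTS_AHEAD_TO_STORE_APPROVALS);
}
```

Because only heights inside the window survive each call, the map size stays bounded by the constant, which is where the per-update iteration bound comes from. If this ever shows up in profiles, an ordered map might allow dropping whole key ranges without visiting each entry.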
2e3e7c4 to 61ca21e
RPC changes are fine and I approve them, but I have a few suggestions in other places.
if self.time_passed_threshold == None {
    self.time_passed_threshold = Some(now);
}
DoomslugBlockProductionReadiness::ReadyToProduce(self.time_passed_threshold.unwrap())
I suggest avoiding `.unwrap()`, e.g.:
let time_passed_threshold = if let Some(threshold) = self.time_passed_threshold {
    threshold
} else {
    self.time_passed_threshold = Some(now);
    now
};
DoomslugBlockProductionReadiness::ReadyToProduce(time_passed_threshold)
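A shorter way to express the same suggestion is the standard `Option::get_or_insert`, which stores the value only if the option is empty and returns a reference to whatever is stored. The `Tracker` and `Readiness` types below are simplified, hypothetical stand-ins for the types in the diff, and the threshold check covers only the stake case.

```rust
use std::time::Instant;

/// Simplified stand-in for `DoomslugBlockProductionReadiness`.
#[derive(Debug, PartialEq)]
enum Readiness {
    NotReady,
    ReadyToProduce(Instant),
}

/// Simplified stand-in for the per-block approval tracker.
struct Tracker {
    approved_stake: u128,
    total_stake: u128,
    time_passed_threshold: Option<Instant>,
}

impl Tracker {
    fn readiness(&mut self, now: Instant) -> Readiness {
        if self.approved_stake > self.total_stake / 2 {
            // Records `now` only the first time the threshold is crossed;
            // later calls return the originally recorded instant.
            Readiness::ReadyToProduce(*self.time_passed_threshold.get_or_insert(now))
        } else {
            Readiness::NotReady
        }
    }
}
```

This removes both the `== None` check and the `unwrap()`, while keeping the "first crossing wins" behaviour of the original code.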
pub fn create_approval(
    &self,
    target_height: BlockHeight,
    is_endorsement: bool,
I just want to bring this to our attention, that `bool`-based APIs are usually hard to follow when you read/maintain the code, e.g. the use looks like `self.create_approval(self.timer.height + 1, false)` (it is hard to tell what that `false` implies): http://blakesmith.me/2019/05/07/rust-patterns-enums-instead-of-booleans.html / https://www.reddit.com/r/programming/comments/ebxzp8/dont_use_booleans/
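To make the concern concrete, here is a minimal sketch of the enum-based alternative. `ApprovalKind`, its variant names, and the trimmed-down `Approval` struct are hypothetical and chosen only for illustration; the real struct also carries hashes and a signature.

```rust
/// Hypothetical replacement for the `is_endorsement: bool` parameter.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ApprovalKind {
    Endorsement,
    Skip,
}

/// Trimmed-down approval for illustration only.
#[derive(Debug, PartialEq)]
struct Approval {
    target_height: u64,
    kind: ApprovalKind,
}

/// The call site now names the intent instead of passing a bare `true`/`false`.
fn create_approval(target_height: u64, kind: ApprovalKind) -> Approval {
    Approval { target_height, kind }
}
```

A call such as `create_approval(height + 1, ApprovalKind::Skip)` reads without having to look up the parameter list, and swapping two arguments by accident becomes a type error.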
pub reference_hash: Option<CryptoHash>,
pub target_height: BlockHeight,
pub is_endorsement: bool,
If I understand the code correctly, `Approval` can be an enum, which would express the type in a safer way.
pub next_bp_hash: CryptoHash,
- pub approvals: Vec<(AccountId, CryptoHash, CryptoHash, Signature)>,
+ pub approvals: Vec<(AccountId, CryptoHash, Option<CryptoHash>, BlockHeight, bool, Signature)>,
How do you feel about making this tuple a dedicated type (is it the `Approval` structure?) where the fields are named?
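A sketch of what a dedicated type could look like follows. The type aliases are simplified placeholders, `NamedApproval` is a hypothetical name, and the mapping of the two hash positions to `parent_hash` and `reference_hash` is an assumption based on the field names visible elsewhere in this PR.

```rust
// Simplified placeholder aliases; the real types live in the primitives crate.
type AccountId = String;
type CryptoHash = [u8; 32];
type BlockHeight = u64;
type Signature = Vec<u8>;

/// The tuple shape from the diff above.
type ApprovalTuple = (AccountId, CryptoHash, Option<CryptoHash>, BlockHeight, bool, Signature);

/// Hypothetical named equivalent of the tuple. Which hash is the parent and
/// which is the reference is an assumption made for this sketch.
#[derive(Clone, Debug, PartialEq)]
struct NamedApproval {
    account_id: AccountId,
    parent_hash: CryptoHash,
    reference_hash: Option<CryptoHash>,
    target_height: BlockHeight,
    is_endorsement: bool,
    signature: Signature,
}

impl From<ApprovalTuple> for NamedApproval {
    fn from(t: ApprovalTuple) -> Self {
        let (account_id, parent_hash, reference_hash, target_height, is_endorsement, signature) = t;
        NamedApproval { account_id, parent_hash, reference_hash, target_height, is_endorsement, signature }
    }
}
```

With named fields, a reader no longer has to remember which of the two hashes comes first, and adding a field later does not silently shift tuple positions.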
} else if self.approved_stake > self.total_stake / 2
    || self.threshold_mode == DoomslugThresholdMode::NoApprovals
{
    if self.time_passed_threshold == None {
Suggested change:
- if self.time_passed_threshold == None {
+ if self.time_passed_threshold.is_none() {
Well, what I suggested above is to actually avoid this check and the unwrapping a few lines below in the first place.
chain/chain/tests/doomslug.rs
Outdated
let mut is_done = false;
while !is_done {
    now = now + Duration::from_millis(25);
    /*println!("{:?}, {}, {}", now, approval_queue.len(), block_queue.len());
should this be removed?
@@ -291,10 +265,22 @@ impl Client {
)?
.clone();

let account_id_to_stake = self
duplicate code here?
&approval,
)?;

self.collect_block_approval(&approval, true);
looks like this is not necessary
This reverts commit 2c7bf9c.
In preparation for implementing Doomslug, removing the concept of `weight` and replacing the fork choice rule to choose the chain with the highest score, and then height. Replacing score with the height of the last pre-voted block rather than the weight.
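The "highest score, and then height" rule described in this commit message amounts to a lexicographic comparison. A minimal sketch, assuming a hypothetical `Tip` with plain integer `score` and `height` fields and assuming that an equal candidate does not replace the current head:

```rust
/// Hypothetical, simplified chain tip: `score` is the height of the last
/// pre-voted block, as the commit message describes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Tip {
    score: u64,
    height: u64,
}

/// Rust tuples compare lexicographically, so `(score, height)` orders by
/// score first and uses height only to break ties.
fn fork_choice_key(tip: &Tip) -> (u64, u64) {
    (tip.score, tip.height)
}

/// Returns true if `candidate` should replace `current` as the head.
/// Treating exact ties as "not better" is an assumption of this sketch.
fn is_better(candidate: &Tip, current: &Tip) -> bool {
    fork_choice_key(candidate) > fork_choice_key(current)
}
```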
8d6bb3e to 3cd4e80
Implementation and integration of Doomslug (see https://near.ai/doomslug).

Doomslug itself is in `chain/doomslug.rs`, and doesn't depend on chain or any other code. It also always takes the current time as an argument, which significantly simplifies testing.

Doomslug is stored on the client, not chain, because both block production and approvals processing happen on the client. Instantiating it on chain would require a slightly more complex interface (e.g. it would be impossible to pass `me`).

Doomslug needs to know about the latest Tip. Instead of intercepting the lowest-level tip-updating routine (which is in storage), I update the Tip in the client when we accept the new head. It could miss head changes related to syncing and challenges. To be safe I always update the head whenever I interact with Doomslug, so the head is guaranteed to be accurate whenever we do anything related to Doomslug, but it might get sent to Doomslug with a slight delay.

This change also solves a problem: approvals were previously kept in a cache hash map, so an adversary could have spammed a node with lots of invalid approvals and evicted all valid approvals from the cache, leaving the node without enough approvals when producing the block. The new logic is more complex, see the `DoomslugApprovalsTrackersAtHeight` class.

Test plan
---------
1. Sanity tests (basic invariants for `set_tip` and approval processing)
2. A fuzzy test that tests safety and liveness under different times to GST and deltas.
3. `test_catchup_sanity_blocks_produced_doomslug` verifies that heights are properly skipped.
4. Also added checking of the doomslug invariant into `cross_shard_tx`, which is known to evoke all kinds of weird structures (but `cross_shard_tx` operates without the requirement to have 50% approvals on the block, thus still causing lots of forkfulness).
5. A new version of `cross_shard_tx` that enables doomslug, but disables tampering with the finality gadget. Thus the vanilla version tests heavy forkfulness and tampers with FG, while the new doomslug version has practically no forkfulness due to doomslug, and doesn't stress the FG as much, but does test block production with doomslug.
3cd4e80 to edcc305
This PR consists of three commits: one reverts proof-of-stake-time, one completely removes the concept of weight from the codebase, and the final one contains the implementation of Doomslug. It might be easier to review the last commit without the first two, since the first two are very mechanical but create an enormous number of changes.
Since GitHub doesn't by default pick up the description of a multi-commit PR, I am pasting the description of the Doomslug commit here:
Implementation and integration of Doomslug (see https://near.ai/doomslug).

Doomslug itself is in `chain/doomslug.rs`, and doesn't depend on chain or any other code. It also always takes the current time as an argument. It significantly simplifies testing.

Doomslug is stored on the client, not chain, because both block production and approvals processing happen on the client. Instantiating on chain would require a slightly more complex interface (e.g. it would be impossible to pass `me`).

Doomslug needs to know about the latest Tip. Instead of intercepting the lowest level tip-updating routine (which is in storage), I update the Tip in the client when we accept the new head. It could miss head changes related to syncing and challenges. To be safe I always update the head whenever I interact with Doomslug, so the head is guaranteed to be accurate whenever we do anything related to Doomslug, but it might get sent to Doomslug with a slight delay.

This change also solves a problem that approvals before were in a cache hash map, therefore an adversary could have spammed a node with lots of invalid approvals and removed all valid approvals from the cache, resulting in a node not having enough approvals when producing the block. New logic is more complex, see the `DoomslugApprovalsTrackersAtHeight` class.

Test plan
---------
1. Sanity tests (basic invariants for `set_tip` and approval processing)
2. A fuzzy test that tests safety and liveness under different times to GST and deltas.
3. `test_catchup_sanity_blocks_produced_doomslug` verifies that heights are properly skipped.
4. Also added checking doomslug invariant into `cross_shard_tx`, which is known to evoke all kinds of weird structures (but `cross_shard_tx` operates without the requirement to have 50% approvals on the block, thus still causing lots of forkfulness).
5. A new version of `cross_shard_tx` that enables doomslug, but disables tampering with the finality gadget. Thus the vanilla version tests heavy forkfulness and tampers with FG, while the new doomslug version has practically no forkfulness due to doomslug, and doesn't stress the FG as much, but does test block production with doomslug.
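The point about always taking the current time as an argument can be sketched as follows. The `Timer` type, its `delay` field, and its methods are hypothetical; only the idea of passing `now` in, rather than reading the clock inside, comes from the description above.

```rust
use std::time::{Duration, Instant};

/// Hypothetical timer that never reads the clock itself: every method
/// receives `now` from the caller.
struct Timer {
    started: Instant,
    delay: Duration,
}

impl Timer {
    fn new(now: Instant, delay: Duration) -> Self {
        Timer { started: now, delay }
    }

    /// True once at least `delay` has elapsed since the timer was (re)started.
    fn is_expired(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.started) >= self.delay
    }

    /// Restarts the timer at the caller-supplied instant.
    fn reset(&mut self, now: Instant) {
        self.started = now;
    }
}
```

A test can then advance a local `now` in fixed steps (the test in this PR adds `Duration::from_millis(25)` per iteration) and get deterministic results without sleeping or mocking the system clock.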