
bank_send_loop: Get feature flag from root bank #31954

Merged
merged 2 commits into master on Jun 13, 2023

Conversation


@AshwinSekar commented Jun 2, 2023

Problem

Previously we were using the working bank to access the feature set. This only works while we are leader, but we start sending gossip votes to banking stage slightly before we are leader.

Summary of Changes

Instead, use the root bank from bank forks to access the feature set.

Fixes #
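A minimal sketch of the shape of the change, using simplified stand-in types rather than the actual solana-runtime API (the Bank/BankForks definitions and the feature key string below are placeholders):

```rust
use std::sync::{Arc, RwLock};

// Simplified stand-ins for the real Bank / BankForks types.
struct Bank {
    active_features: Vec<&'static str>,
}

impl Bank {
    fn is_feature_active(&self, feature: &str) -> bool {
        self.active_features.iter().any(|f| *f == feature)
    }
}

struct BankForks {
    root: Arc<Bank>,
}

impl BankForks {
    fn root_bank(&self) -> Arc<Bank> {
        self.root.clone()
    }
}

// Placeholder key standing in for the full tower vote feature gate.
const FULL_TOWER_VOTE_FEATURE: &str = "full_tower_vote_feature";

// Before: the flag was read from the working bank, which only exists while
// this node is leader. After: read it from the root bank, which is always
// present in bank forks.
fn is_full_tower_vote_enabled(bank_forks: &RwLock<BankForks>) -> bool {
    bank_forks
        .read()
        .unwrap()
        .root_bank()
        .is_feature_active(FULL_TOWER_VOTE_FEATURE)
}

fn main() {
    let bank_forks = RwLock::new(BankForks {
        root: Arc::new(Bank {
            active_features: vec![FULL_TOWER_VOTE_FEATURE],
        }),
    });
    assert!(is_full_tower_vote_enabled(&bank_forks));
}
```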

codecov bot commented Jun 3, 2023

Codecov Report

Merging #31954 (cacb5f3) into master (446b82a) will decrease coverage by 0.1%.
The diff coverage is 77.7%.

@@            Coverage Diff            @@
##           master   #31954     +/-   ##
=========================================
- Coverage    81.9%    81.9%   -0.1%     
=========================================
  Files         766      766             
  Lines      208683   208642     -41     
=========================================
- Hits       171012   170964     -48     
- Misses      37671    37678      +7     

@@ -426,6 +431,15 @@ impl ClusterInfoVoteListener {
&verified_vote_packets,
)?;
}
// Check if we've crossed the feature boundary
@steviez Jun 3, 2023
Reviewer note: Better to have this here a second time (in addition to before the loop) to avoid grabbing a read lock on bank_forks every iteration
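A rough sketch of the pattern being discussed, with placeholder types rather than the actual ClusterInfoVoteListener code: the flag is read once before the loop so early iterations start with a valid value, and re-read inside the loop only while it is still off, so the bank_forks read lock isn't taken every iteration; the recv_timeout inside the receive step (200ms in the real code, per the discussion below) already bounds how often an iteration runs.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::sync::{Arc, RwLock};
use std::time::Duration;

// Simplified stand-ins for the real types.
struct Bank {
    feature_active: bool,
}
impl Bank {
    fn is_feature_active(&self, _feature_key: &str) -> bool {
        self.feature_active
    }
}
struct BankForks {
    root: Arc<Bank>,
}
impl BankForks {
    fn root_bank(&self) -> Arc<Bank> {
        self.root.clone()
    }
}

const FEATURE_KEY: &str = "placeholder_feature_key";

fn bank_send_loop(bank_forks: Arc<RwLock<BankForks>>, receiver: Receiver<Vec<u8>>) {
    // Read once before the loop so the first iterations start with a valid value.
    let mut feature_enabled = bank_forks
        .read()
        .unwrap()
        .root_bank()
        .is_feature_active(FEATURE_KEY);

    loop {
        // Stand-in for receive_and_process_vote_packets(); its recv_timeout
        // (200ms in the real code) already bounds how often this loop spins.
        match receiver.recv_timeout(Duration::from_millis(200)) {
            Ok(_packets) => { /* forward to banking stage, dedup if enabled */ }
            Err(RecvTimeoutError::Timeout) => {}
            Err(RecvTimeoutError::Disconnected) => break,
        }

        // Re-check only while the feature is still off, so the bank_forks
        // read lock isn't taken every iteration once the flag has flipped.
        if !feature_enabled {
            feature_enabled = bank_forks
                .read()
                .unwrap()
                .root_bank()
                .is_feature_active(FEATURE_KEY);
        }
    }
}

fn main() {
    let (sender, receiver) = std::sync::mpsc::channel::<Vec<u8>>();
    // Dropping the sender makes the loop exit on Disconnected.
    drop(sender);
    let bank_forks = Arc::new(RwLock::new(BankForks {
        root: Arc::new(Bank { feature_active: false }),
    }));
    bank_send_loop(bank_forks, receiver);
}
```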

Contributor

can we factor this optimization out to a separate PR to reduce the backport surface?

Contributor

> can we factor this optimization out to a separate PR to reduce the backport surface?

if you gimme write on your branch, i have this change made locally

Contributor

> can we factor this optimization out to a separate PR

To confirm, you're only talking about removing the if statement that skips reading the feature set if we know the feature is already activated? My initial comment was pointing out that we were reading it in two places, but that was better to both 1) start with a valid value on the first couple iterations of the loop and 2) avoid spamming the lock.

That being said, receive_and_process_vote_packets() does recv_timeout(200ms), so my concern about the feature-set read happening every iteration of the loop is mitigated by receive_and_process_vote_packets() taking a nontrivial amount of time to return

Contributor Author

just gave you write access. if you're talking about the refactor to pass the flag rather than the feature set, that was to avoid cloning or holding the bank_forks lock for too long.
the if statement could probably be removed, taking into consideration the 200ms sleep that steve pointed out.

Contributor

nope. the whole swap from Option<FeatureSet> to bool. master...t-nelson:solana:feature-flag-fix

we're already doing the inefficient thing and that inefficiency did not contribute to the testnet outage, so the optimization doesn't need to be backported
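For illustration only (the signatures here are made up, not the ones in the linked branch), the swap being described is roughly this: instead of threading an Option<FeatureSet> through and cloning it or holding the bank_forks lock across the call, the flag is resolved once and passed as a bool.

```rust
use std::collections::HashSet;
use std::sync::Arc;

// Illustrative stand-in for the runtime FeatureSet; not the real type.
struct FeatureSet {
    active: HashSet<String>,
}

// One shape: thread the whole feature set through, which means cloning it
// or holding the bank_forks lock longer.
fn send_votes_with_feature_set(feature_set: Option<Arc<FeatureSet>>) {
    let full_tower_vote_enabled = feature_set
        .map(|fs| fs.active.contains("placeholder_feature_key"))
        .unwrap_or(false);
    let _ = full_tower_vote_enabled;
}

// The other shape: resolve the flag once at the call site and pass a plain bool.
fn send_votes_with_flag(full_tower_vote_enabled: bool) {
    let _ = full_tower_vote_enabled;
}

fn main() {
    let fs = FeatureSet { active: HashSet::new() };
    send_votes_with_feature_set(Some(Arc::new(fs)));
    send_votes_with_flag(true);
}
```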

Contributor Author

sounds good, i'm fine with the clone.

Contributor

> sounds good, i'm fine with the clone.

hrmf, seems like github has some new default branch protection rules or something. it won't let me push my changes 'cause i rewrote your tip. mind pulling my branch and pushing up the change?

ping me when we're ready and i'll re-enable rebase+merge for this so we don't need a separate PR

Contributor Author

nice thanks, should now be 2 commits. hopefully github does the right thing and only backports the first one.

Contributor

> nice thanks, should now be 2 commits. hopefully github does the right thing and only backports the first one.

it'll try to do them both. trivial to drop the last one tho


t-nelson commented Jun 3, 2023

any concerns about the 32 block delay in activation? seems like ideally we'd use the most recently frozen bank.

@AshwinSekar

> any concerns about the 32 block delay in activation? seems like ideally we'd use the most recently frozen bank.

Don't expect it would be a huge difference. Also most normal ways of startup would start with a root bank right? starting from snapshot or catching up old ledger?


steviez commented Jun 3, 2023

> > any concerns about the 32 block delay in activation? seems like ideally we'd use the most recently frozen bank.
>
> Don't expect it would be a huge difference.

Agreed with Ashwin here - I don't think so. It is still a binary on/off regardless of any delay. If anything, the 32 block delay imposed by it being a root in the new epoch will create some distance from the epoch boundary, so the cluster may be more stable there as opposed to the Bank(s) that initially cross the epoch boundary.

> Also most normal ways of startup would start with a root bank right? starting from snapshot or catching up old ledger?

Correct, we will always have a root in BankForks:

```rust
pub fn new(bank: Bank) -> Self {
    let root = bank.slot();
    Self::new_from_banks(&[Arc::new(bank)], root)
}
```

also noting that we return an Arc<Bank>, not an Option<Arc<Bank>> here:

```rust
pub fn root_bank(&self) -> Arc<Bank> {
    self[self.root()].clone()
}
```


steviez commented Jun 3, 2023

> It is still a binary on/off regardless of any delay

This is the case for mainnet beta; it will be a gradual thing for testnet as nodes upgrade if we reuse the existing feature key.

@AshwinSekar

nice! i would prefer not to create a feature gate if we can avoid it; the original optimization this code handled wasn't consensus breaking either.
The reason this was feature gated in the first place was that the optimization to store 1 vote per validator is only possible with the new vote format.


carllin commented Jun 3, 2023

> any concerns about the 32 block delay in activation? seems like ideally we'd use the most recently frozen bank.

What's the worst case here? That after block A, where the feature gets activated, validators on the network will start pushing FullTowerVote's that will then get dropped in gossip until block A gets rooted?

Seems like it should be fine as long as turbine votes are landing 😃

Alternatively we could remove the is_full_tower_vote_enabled check entirely and just queue full tower votes only after we see them, dropping incremental votes
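A hedged sketch of that alternative (the types and names here are illustrative, not the existing cluster_info_vote_listener code): once a full tower vote has been seen from a validator, later incremental votes from that validator are dropped, with no feature-flag check at all.

```rust
use std::collections::HashSet;

enum GossipVote {
    Incremental { from: [u8; 32] },
    FullTower { from: [u8; 32] },
}

#[derive(Default)]
struct VoteQueue {
    // Validators we've already seen switch to full tower votes.
    full_tower_senders: HashSet<[u8; 32]>,
    queued: Vec<GossipVote>,
}

impl VoteQueue {
    fn receive(&mut self, vote: GossipVote) {
        match vote {
            GossipVote::FullTower { from } => {
                self.full_tower_senders.insert(from);
                self.queued.push(GossipVote::FullTower { from });
            }
            GossipVote::Incremental { from } => {
                // Drop incremental votes only from validators that have
                // already switched; no is_full_tower_vote_enabled check.
                if !self.full_tower_senders.contains(&from) {
                    self.queued.push(GossipVote::Incremental { from });
                }
            }
        }
    }
}

fn main() {
    let v = [7u8; 32];
    let mut q = VoteQueue::default();
    q.receive(GossipVote::Incremental { from: v }); // kept: no full tower vote seen yet
    q.receive(GossipVote::FullTower { from: v });
    q.receive(GossipVote::Incremental { from: v }); // dropped: v already switched
    assert_eq!(q.queued.len(), 2);
}
```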


t-nelson commented Jun 3, 2023

> What's the worst case here?

i think i'm more concerned about the class of bug than this particular instance of it. consider a feature gate that affects multiple components which sample different banks...

actually, even in this case, i believe the runtime and vote program instances would be observed active at different slots


AshwinSekar commented Jun 3, 2023

> actually, even in this case, i believe the runtime and vote program instances would be observed active at different slots

The only bank hash breaking check is in vote_processor - we fail VoteStateUpdate txs if the feature flag isn't enabled. The rest of the pipeline, including the vote program, was designed to handle a mix of Vote and VoteStateUpdate.


carllin commented Jun 3, 2023

> i think i'm more concerned about the class of bug than this particular instance of it. consider a feature gate that affects multiple components which sample different banks...

Pretty sure in most cases we're doing bank-specific activities like processing transactions/program modifications for feature flags, so it's not an issue because you just check if the feature is active on that specific bank

@steviez left a comment

I agree with Ashwin & Carl in thinking that the delay caused by using root_bank() should not be an issue for this PR. Think it would be good to ensure Trent agrees too tho


t-nelson commented Jun 5, 2023

> > actually, even in this case, i believe the runtime and vote program instances would be observed active at different slots
>
> The only bank hash breaking check is in vote_processor - we fail VoteStateUpdate txs if the feature flag isn't enabled. The rest of the pipeline including vote program was designed to handle a mix of Vote and VoteStateUpdate

i'm not strictly concerned with overt consensus breakage. the timing stuff is less straightforward to reason about (i think in the case of this instance, it's mostly fine).

the only part that's not clear to me now is the timing considerations of this block

```rust
// Check if our tower is behind, if so (and the feature migration flag is in use)
// overwrite with the newer bank.
if let (true, Some(vote_account)) = (
    Tower::is_direct_vote_state_update_enabled(bank),
    bank.get_vote_account(my_vote_pubkey),
) {
```

@AshwinSekar

I see, yeah for this particular change I think it should be fine - we don't drop any VoteStateUpdate if the feature flag is not active on the root bank, we just don't dedup and end up sending all of them to banking stage.

As for the replay code you linked, the intention was not to change any behavior at the feature boundary but to improve our restart behavior regarding missing or out-of-date local towers.
The condition bank_vote_state.last_voted_slot() > tower.vote_state.last_voted_slot() should never hold under normal operation - but it could happen on restart due to mistiming between voting and syncing the tower to file.
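A simplified sketch of that condition with stand-in types (not the actual replay/Tower code): the on-chain vote state is adopted only when it is strictly ahead of the local tower, which under normal operation shouldn't happen but can after a restart with a stale or missing tower file.

```rust
#[derive(Clone)]
struct VoteState {
    voted_slots: Vec<u64>,
}

impl VoteState {
    fn last_voted_slot(&self) -> Option<u64> {
        self.voted_slots.last().copied()
    }
}

struct Tower {
    vote_state: VoteState,
}

impl Tower {
    // Adopt the on-chain state only if it is ahead of what we have locally,
    // which should never happen in normal operation; after a restart with a
    // stale or missing tower file it can.
    fn maybe_adopt_on_chain_state(&mut self, bank_vote_state: &VoteState) {
        if bank_vote_state.last_voted_slot() > self.vote_state.last_voted_slot() {
            self.vote_state = bank_vote_state.clone();
        }
    }
}

fn main() {
    let mut tower = Tower { vote_state: VoteState { voted_slots: vec![10, 11] } };
    let bank_vote_state = VoteState { voted_slots: vec![10, 11, 12] };
    tower.maybe_adopt_on_chain_state(&bank_vote_state);
    assert_eq!(tower.vote_state.last_voted_slot(), Some(12));
}
```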

As to why we feature flagged it, I believe it was because under Vote, on-chain vote states were likely to be too inaccurate to trust, so we ignored the problem. @carllin might remember better.


t-nelson commented Jun 6, 2023

> I see, yeah for this particular change I think it should be fine - we don't drop any VoteStateUpdate if the feature flag is not active on the root bank, we just don't dedup and end up sending all of them to banking stage.
>
> As for the replay code you linked, the intention was not to change any behavior at the feature boundary but intended for improving our restart behavior regarding missing/out of date local towers. if bank_vote_state.last_voted_slot() > tower.vote_state.last_voted_slot() under normal operation this should never be possible - but could happen on restart due to mistiming between vote and syncing tower to file.
>
> As to why we feature flagged it, I believe it was because under Vote on chain vote states were likely to be too inaccurate to trust so we ignored the problem. @carllin might remember better.

so it seems like the potential timing issues are

  • vote program accepts VoteStateUpdate ix before validators start sending them. this is fine
  • on startup, replay may attempt to sync banks and local tower before validators start sending VoteStateUpdate ix. this seems racy given the cited potential inaccuracy of on chain vote state. if a validator were to restart from a snapshot near the activation epoch boundary, every frozen bank from the snapshot slot 'til the activation slot gets rooted would be subject to this race. is that an accurate assessment?

@AshwinSekar

Seems the concerns left are about the general timing issues of activating this feature? Should we merge this and pick it up in a separate issue?

> vote program accepts VoteStateUpdate ix before validators start sending them

do you have more context on this? vote_processor will reject VoteStateUpdate until the flag has been activated for the associated bank. Similarly, handle_votable_bank will only send VoteStateUpdate if that bank has the feature activated.
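Roughly, the gate being described has this shape (simplified stand-ins; the error variant and names are placeholders, not the real vote program API): a VoteStateUpdate-style instruction is rejected unless the feature is active on the bank processing the transaction.

```rust
#[derive(Debug, PartialEq)]
enum VoteError {
    VoteStateUpdateNotAllowed,
}

enum VoteInstruction {
    Vote,
    UpdateVoteState,
}

struct Bank {
    full_tower_vote_feature_active: bool,
}

// Simplified gate: reject UpdateVoteState unless the feature is active on
// the bank processing the transaction.
fn process_vote_instruction(bank: &Bank, ix: &VoteInstruction) -> Result<(), VoteError> {
    match ix {
        VoteInstruction::Vote => Ok(()),
        VoteInstruction::UpdateVoteState => {
            if bank.full_tower_vote_feature_active {
                Ok(())
            } else {
                Err(VoteError::VoteStateUpdateNotAllowed)
            }
        }
    }
}

fn main() {
    let old_bank = Bank { full_tower_vote_feature_active: false };
    let new_bank = Bank { full_tower_vote_feature_active: true };
    assert_eq!(
        process_vote_instruction(&old_bank, &VoteInstruction::UpdateVoteState),
        Err(VoteError::VoteStateUpdateNotAllowed)
    );
    assert!(process_vote_instruction(&new_bank, &VoteInstruction::UpdateVoteState).is_ok());
}
```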

> if a validator were to restart from a snapshot near the activation epoch boundary, every frozen bank from the snapshot slot 'til the activation slot gets rooted would be subject to this race. is that an accurate assessment?

Not sure i follow this exact scenario - do we root the in-between slots locally, or were they rooted by us pre-restart?

But basically any restart around the epoch boundary could trigger this race if the saved tower was out of date or missing.

The intention here is to try our best not to send a slashable vote by adopting the on-chain state.
In practice this condition is triggered multiple times as we catch up, so that when we are ready to vote we hopefully have the latest lockouts.

In this restart-around-epoch-boundary case the worst that could happen is the current behavior of starting with an empty or inaccurate tower; we can only possibly start from a tower the cluster observed us having previously.

@t-nelson

> Seems the concerns left are about the general timing issues of activating this feature? Should we merge this and pick it up in a separate issue?
>
> > vote program accepts VoteStateUpdate ix before validators start sending them
>
> do you have more context on this? vote_processor will reject VoteStateUpdate until the flag has been activated for the associated bank. Similarly handle_votable_bank will only send VoteStateUpdate if that bank has the feature activated.

i must've read the code wrong last week. these two cases use the same bank (+/- refreshes)

> > if a validator were to restart from a snapshot near the activation epoch boundary, every frozen bank from the snapshot slot 'til the activation slot gets rooted would be subject to this race. is that an accurate assessment?
>
> Not sure i follow this exact scenario, we root the in between slots locally, or they were rooted by us pre restart?
>
> But basically any restart around the epoch boundary could be trigger this race if the saved tower was later or missing.
>
> The intention here is to try our best to not send a slashable vote by adopting on chain state. In practice this condition is triggered multiple times as we catchup, so that when we are ready to vote we hopefully have the latest lockouts.
>
> In this restart around epoch boundary case the worst that could happen is the current behavior of starting with an empty or inaccurate tower, we can only possibly start from a tower the cluster observed us having previously.

ok thanks for the details! that last paragraph assuages my concerns


one last doom scenario: if all or part of the cluster were to not be making roots at all across the epoch boundary, the feature would not activate for the affected nodes. i think this is actually fine though, as the whole cluster would activate at the same time wrt ancestry?

@AshwinSekar

> one last doom scenario: if all or part of the cluster were to not be making roots at all across the epoch boundary, the feature would not activate for the affected nodes. i think this is actually fine though, as the whole cluster would activate at the same time wrt ancestry?

In this case we would start receiving VoteStateUpdate, but store them as incremental votes instead. This would be fine as they still get forwarded to banking stage (including the redundant ones).

Once we make a root past the boundary, the worst case is that it happens right at a leader boundary, and that leader would drop some votes.
As you pointed out, for this to be a doom scenario the cluster would have to root at different times, staggering according to the leader schedule so that each leader drops votes.

This is incredibly unlikely as the cluster will usually root and activate around the same time. Even if this were somehow to happen, it could only happen once per leader, compared to every leader slot as it is now.


steviez commented Jun 13, 2023

I can see Ashwin force-pushed 34 minutes ago, but I'm guessing that because the diff is identical (with items now split across commits), it didn't remove the ship-its that Carl and I already gave. If that's what happened, and not some other glitch that didn't wipe our ship-its, that's pretty neat

@t-nelson

> that's pretty neat

or exploitable 🤔

@t-nelson merged commit 077e29a into solana-labs:master on Jun 13, 2023
19 checks passed
mergify bot added a commit that referenced this pull request Jun 13, 2023
…31954) (#32113)

bank_send_loop: Get feature flag from root bank

(cherry picked from commit dd379bf)

# Conflicts:
#	core/src/cluster_info_vote_listener.rs

Co-authored-by: Ashwin Sekar <ashwin@solana.com>
mergify bot added a commit that referenced this pull request Jun 14, 2023
…31954) (#32112)

bank_send_loop: Get feature flag from root bank

(cherry picked from commit dd379bf)

# Conflicts:
#	core/src/cluster_info_vote_listener.rs

Co-authored-by: Ashwin Sekar <ashwin@solana.com>
Labels
v1.16 PRs that should be backported to v1.16

4 participants