Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed missing Root notifications via geyser plugin framework #31180

Merged
merged 11 commits into from May 3, 2023

Conversation

lijunwangs
Copy link
Contributor

@lijunwangs lijunwangs commented Apr 13, 2023

Problem

It is reported that Geyser is missing some Root notifications for slots. #31124
The Root notification is sent from replay_stage's code in handle_votable_bank. https://github.com/solana-labs/solana/blob/master/core/src/replay_stage.rs#L1981. However, the validator does not necessarily vote on every slot on the rooted chain. From @carllin

For instance if the rooted chain is 1->2->3->4

You might only vote on 1 and 4
But when 4 is rooted, 2 is also rooted
But handle_votavle_bank is not called on 2

As result of this, we may miss notifications for slot 2 and 3.

Summary of Changes

  1. Enhanced BankNotification to add NewRootedChain enum to send the chains of parent roots.
  2. Renamed BankNotification::Root -> BankNotification::NewRootBank
  3. Introduced SlotNotification for SlotStatusObserver interfaces to send slot status without Bank.
  4. In the OptimisticallyConfirmedBankTracker notify parents of a new root if these parents were not notified.
  5. Modified and added unit test cases to verify the logic.

Fixes #

@lijunwangs lijunwangs force-pushed the missing_root_notification branch 2 times, most recently from 6691a1b to 4704831 Compare April 14, 2023 01:35
@codecov
Copy link

codecov bot commented Apr 14, 2023

Codecov Report

Merging #31180 (bf76b14) into master (cc6c454) will decrease coverage by 0.1%.
The diff coverage is 84.6%.

@@            Coverage Diff            @@
##           master   #31180     +/-   ##
=========================================
- Coverage    81.5%    81.5%   -0.1%     
=========================================
  Files         734      734             
  Lines      207175   207248     +73     
=========================================
+ Hits       168965   169000     +35     
- Misses      38210    38248     +38     

if bank_notification_subscribers.is_none() {
return;
}
for root_bank in bank.clone().parents_inclusive().iter().rev() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure Bank::parents_inclusive() is what we want to use here. it's going to be a slow linked-list walk to build the Vec that it returns and iirc, what's available is limited to what's in the local node's ledger. Bank::ancestors is probably what we want instead.

i think we should also take_while from current root back to the last reported root, reverse that, then notify forward, to reduce considering very old roots that have almost certainly been notified already

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The code in line 180 uses the same logic. Should we do the same optimization?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately -- we cannot use Bank::ancestors -- the notification interfaces requires a Bank for the ancestor

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any assurances that the ancestor roots haven't already been dropped from BankForks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question. Looking into it!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To Tyera's point ancestors() is not safe to use because there could have been new roots made by the time this is called since this is async from replay_stage where the roots are set, and then earlier rooted banks would have been removed from BankForks

What we probably want is to send a Vec<Arc<Bank>> from replay_stage where the root is being set. In set_root We are already building a vector of parents of the current root, which should include all the parents up until the last root https://github.com/AshwinSekar/solana/blob/2efd72b1a1a06a89613232d92f75a8e1c1995a6a/core/src/replay_stage.rs#L2023-L2024. We could just grab those and send them over this channel.

A more detailed questions though is why do we need the actual banks in these notifications? I'm assuming to access AccountsDB? But having the bank N does not guarantee AccountsDB is in the same view as it was during bank N because future roots > N could have been made and cleaned up state

Thanks @carllin -- as pointed by @t-nelson, we can change the notifications by sending the (slot, parent_slot) pair without requiring the bank. Regarding the race condition on the roots on the BankForks -- I wonder why https://github.com/AshwinSekar/solana/blob/2efd72b1a1a06a89613232d92f75a8e1c1995a6a/rpc/src/optimistically_confirmed_bank_tracker.rs#L176 does not have the same concern?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the same issue is present there as well :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess we could reduce the scope of this pr to "be consistently broken", rather than uniquely broken? 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or fix the other place first...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or fix the other place first...?

I will fix this issue first and the investigate the other next to reduce the scope.

rpc/src/optimistically_confirmed_bank_tracker.rs Outdated Show resolved Hide resolved
rpc/src/optimistically_confirmed_bank_tracker.rs Outdated Show resolved Hide resolved
rpc/src/optimistically_confirmed_bank_tracker.rs Outdated Show resolved Hide resolved
rpc/src/optimistically_confirmed_bank_tracker.rs Outdated Show resolved Hide resolved
rpc/src/optimistically_confirmed_bank_tracker.rs Outdated Show resolved Hide resolved
rpc/src/optimistically_confirmed_bank_tracker.rs Outdated Show resolved Hide resolved
rpc/src/optimistically_confirmed_bank_tracker.rs Outdated Show resolved Hide resolved
@carllin
Copy link
Contributor

carllin commented Apr 25, 2023

For clarity in the future, dense changes like the switches from BankNotification -> SlotNotification could be pulled into a separate PR to make reviewing easier

Copy link
Contributor

@CriesofCarrots CriesofCarrots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm happy with the changes. However, optimistically_confirmed_bank_tracker is getting really difficult to grok. Can you update the PR summary of changes to include more details about the intermediate changes? eg. why are there two different notification types and sender/receivers defined?
Can you also update the module docs as well?

For clarity in the future, dense changes like the switches from BankNotification -> SlotNotification could be pulled into a separate PR to make reviewing easier

Yes, also this.

@lijunwangs
Copy link
Contributor Author

For clarity in the future, dense changes like the switches from BankNotification -> SlotNotification could be pulled into a separate PR to make reviewing easier

The reason for that is it would be more messy to use BankNotification in the SlotStatusObserver interfaces with the changed BankNotification interface.

Comment on lines 2028 to 2037
let rooted_slots_with_parents = if bank_notification_sender
.as_ref()
.map_or(false, |sender| sender.should_send_parents)
{
let mut new_chain = rooted_slots.clone();
new_chain.push(oldest_parent.unwrap_or(bank.parent_slot()));
Some(new_chain)
} else {
None
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let rooted_slots_with_parents = if bank_notification_sender
.as_ref()
.map_or(false, |sender| sender.should_send_parents)
{
let mut new_chain = rooted_slots.clone();
new_chain.push(oldest_parent.unwrap_or(bank.parent_slot()));
Some(new_chain)
} else {
None
};
let rooted_slots_with_parents = bank_notification_sender
.as_ref()
.map_or(false, |sender| sender.should_send_parents)
.then(|| {
let mut new_chain = rooted_slots.clone();
new_chain.push(oldest_parent.unwrap_or(bank.parent_slot()));
new_chain
};

bool::then()'s a little cleaner

carllin
carllin previously approved these changes Apr 27, 2023
CriesofCarrots
CriesofCarrots previously approved these changes Apr 27, 2023
Copy link
Contributor

@CriesofCarrots CriesofCarrots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm with Trent's nit

@lijunwangs lijunwangs dismissed stale reviews from CriesofCarrots and carllin via bf76b14 April 28, 2023 12:43
Copy link
Contributor

@t-nelson t-nelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@lijunwangs
Copy link
Contributor Author

Enhanced BankNotification to add NewRootedChain enum to send the chains of parent roots.
Renamed BankNotification::Root -> BankNotification::NewRootBank
Introduced SlotNotification for SlotStatusObserver interfaces to send slot status without Bank.
In the OptimisticallyConfirmedBankTracker notify parents of a new root if these parents were not notified.
Modified and added unit test cases to verify the logic.

@lijunwangs lijunwangs merged commit 7cf50e6 into solana-labs:master May 3, 2023
23 checks passed
@fanatid
Copy link
Contributor

fanatid commented May 3, 2023

Do you have a plan to make a backport to 1.13/1.14?

@t-nelson
Copy link
Contributor

t-nelson commented May 8, 2023

Definitely not 1.13. 1.14 bp probably unavoidable

mergify bot pushed a commit that referenced this pull request May 15, 2023
* Fixed missing Root notifications via geyser plugin framework

* Renamed a variable

* fmt issue

* Do not try the loop if no subscribers.

* Addressing some feedback -- passing parent roots from replay_stage to avoid race conditions

* clippy issue

* Address some reviewing findings

* Addressed some feedback from Carl

* fix a clippy issue

* Added comments on optimistically_confirmed_bank_tracker module to explain the workflow

* Addressed Trent's review

(cherry picked from commit 7cf50e6)

# Conflicts:
#	core/src/validator.rs
#	geyser-plugin-manager/src/geyser_plugin_service.rs
lijunwangs pushed a commit that referenced this pull request May 22, 2023
…backport of #31180) (#31650)

Problem

It is reported that Geyser is missing some Root notifications for slots. #31124
The Root notification is sent from replay_stage's code in handle_votable_bank. https://github.com/solana-labs/solana/blob/master/core/src/replay_stage.rs#L1981. However, the validator does not necessarily vote on every slot on the rooted chain. From @carllin

For instance if the rooted chain is 1->2->3->4

You might only vote on 1 and 4
But when 4 is rooted, 2 is also rooted
But handle_votavle_bank is not called on 2

As result of this, we may miss notifications for slot 2 and 3.

Summary of Changes

Enhanced BankNotification to add NewRootedChain enum to send the chains of parent roots.
Renamed BankNotification::Root -> BankNotification::NewRootBank
Introduced SlotNotification for SlotStatusObserver interfaces to send slot status without Bank.
In the OptimisticallyConfirmedBankTracker notify parents of a new root if these parents were not notified.
Modified and added unit test cases to verify the logic.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants