New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed missing Root notifications via geyser plugin framework #31180
Fixed missing Root notifications via geyser plugin framework #31180
Conversation
6691a1b
to
4704831
Compare
4704831
to
2edf2f6
Compare
Codecov Report
@@ Coverage Diff @@
## master #31180 +/- ##
=========================================
- Coverage 81.5% 81.5% -0.1%
=========================================
Files 734 734
Lines 207175 207248 +73
=========================================
+ Hits 168965 169000 +35
- Misses 38210 38248 +38 |
if bank_notification_subscribers.is_none() { | ||
return; | ||
} | ||
for root_bank in bank.clone().parents_inclusive().iter().rev() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not sure Bank::parents_inclusive()
is what we want to use here. it's going to be a slow linked-list walk to build the Vec
that it returns and iirc, what's available is limited to what's in the local node's ledger. Bank::ancestors
is probably what we want instead.
i think we should also take_while
from current root back to the last reported root, reverse that, then notify forward, to reduce considering very old roots that have almost certainly been notified already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! The code in line 180 uses the same logic. Should we do the same optimization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately -- we cannot use Bank::ancestors -- the notification interfaces requires a Bank for the ancestor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any assurances that the ancestor roots haven't already been dropped from BankForks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question. Looking into it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To Tyera's point
ancestors()
is not safe to use because there could have been new roots made by the time this is called since this is async fromreplay_stage
where the roots are set, and then earlier rooted banks would have been removed fromBankForks
What we probably want is to send a
Vec<Arc<Bank>>
fromreplay_stage
where the root is being set. Inset_root
We are already building a vector of parents of the current root, which should include all the parents up until the last root https://github.com/AshwinSekar/solana/blob/2efd72b1a1a06a89613232d92f75a8e1c1995a6a/core/src/replay_stage.rs#L2023-L2024. We could just grab those and send them over this channel.A more detailed questions though is why do we need the actual banks in these notifications? I'm assuming to access AccountsDB? But having the bank
N
does not guarantee AccountsDB is in the same view as it was during bankN
because future roots> N
could have been made and cleaned up state
Thanks @carllin -- as pointed by @t-nelson, we can change the notifications by sending the (slot, parent_slot) pair without requiring the bank. Regarding the race condition on the roots on the BankForks -- I wonder why https://github.com/AshwinSekar/solana/blob/2efd72b1a1a06a89613232d92f75a8e1c1995a6a/rpc/src/optimistically_confirmed_bank_tracker.rs#L176 does not have the same concern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the same issue is present there as well :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess we could reduce the scope of this pr to "be consistently broken", rather than uniquely broken? 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or fix the other place first...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or fix the other place first...?
I will fix this issue first and the investigate the other next to reduce the scope.
155b930
to
0d9b3ef
Compare
For clarity in the future, dense changes like the switches from |
… avoid race conditions
2eb627b
to
e343bf7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm happy with the changes. However, optimistically_confirmed_bank_tracker is getting really difficult to grok. Can you update the PR summary of changes to include more details about the intermediate changes? eg. why are there two different notification types and sender/receivers defined?
Can you also update the module docs as well?
For clarity in the future, dense changes like the switches from BankNotification -> SlotNotification could be pulled into a separate PR to make reviewing easier
Yes, also this.
The reason for that is it would be more messy to use BankNotification in the SlotStatusObserver interfaces with the changed BankNotification interface. |
…lain the workflow
core/src/replay_stage.rs
Outdated
let rooted_slots_with_parents = if bank_notification_sender | ||
.as_ref() | ||
.map_or(false, |sender| sender.should_send_parents) | ||
{ | ||
let mut new_chain = rooted_slots.clone(); | ||
new_chain.push(oldest_parent.unwrap_or(bank.parent_slot())); | ||
Some(new_chain) | ||
} else { | ||
None | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let rooted_slots_with_parents = if bank_notification_sender | |
.as_ref() | |
.map_or(false, |sender| sender.should_send_parents) | |
{ | |
let mut new_chain = rooted_slots.clone(); | |
new_chain.push(oldest_parent.unwrap_or(bank.parent_slot())); | |
Some(new_chain) | |
} else { | |
None | |
}; | |
let rooted_slots_with_parents = bank_notification_sender | |
.as_ref() | |
.map_or(false, |sender| sender.should_send_parents) | |
.then(|| { | |
let mut new_chain = rooted_slots.clone(); | |
new_chain.push(oldest_parent.unwrap_or(bank.parent_slot())); | |
new_chain | |
}; |
bool::then()
's a little cleaner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm with Trent's nit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enhanced BankNotification to add NewRootedChain enum to send the chains of parent roots. |
Do you have a plan to make a backport to 1.13/1.14? |
Definitely not 1.13. 1.14 bp probably unavoidable |
* Fixed missing Root notifications via geyser plugin framework * Renamed a variable * fmt issue * Do not try the loop if no subscribers. * Addressing some feedback -- passing parent roots from replay_stage to avoid race conditions * clippy issue * Address some reviewing findings * Addressed some feedback from Carl * fix a clippy issue * Added comments on optimistically_confirmed_bank_tracker module to explain the workflow * Addressed Trent's review (cherry picked from commit 7cf50e6) # Conflicts: # core/src/validator.rs # geyser-plugin-manager/src/geyser_plugin_service.rs
…backport of #31180) (#31650) Problem It is reported that Geyser is missing some Root notifications for slots. #31124 The Root notification is sent from replay_stage's code in handle_votable_bank. https://github.com/solana-labs/solana/blob/master/core/src/replay_stage.rs#L1981. However, the validator does not necessarily vote on every slot on the rooted chain. From @carllin For instance if the rooted chain is 1->2->3->4 You might only vote on 1 and 4 But when 4 is rooted, 2 is also rooted But handle_votavle_bank is not called on 2 As result of this, we may miss notifications for slot 2 and 3. Summary of Changes Enhanced BankNotification to add NewRootedChain enum to send the chains of parent roots. Renamed BankNotification::Root -> BankNotification::NewRootBank Introduced SlotNotification for SlotStatusObserver interfaces to send slot status without Bank. In the OptimisticallyConfirmedBankTracker notify parents of a new root if these parents were not notified. Modified and added unit test cases to verify the logic.
Problem
It is reported that Geyser is missing some Root notifications for slots. #31124
The Root notification is sent from replay_stage's code in handle_votable_bank. https://github.com/solana-labs/solana/blob/master/core/src/replay_stage.rs#L1981. However, the validator does not necessarily vote on every slot on the rooted chain. From @carllin
For instance if the rooted chain is 1->2->3->4
You might only vote on 1 and 4
But when 4 is rooted, 2 is also rooted
But handle_votavle_bank is not called on 2
As result of this, we may miss notifications for slot 2 and 3.
Summary of Changes
Fixes #