Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make ShredFetchStage refresh root bank even if no shreds arrive #33078

Closed
wants to merge 2 commits into from

Conversation

steviez
Copy link
Contributor

@steviez steviez commented Aug 30, 2023

Problem

#31576 introduced a QUIC receiver for turbine. From the networking perspective, this code is currently dormant and doesn't really do anything given that we aren't broadcasting shreds over QUIC / to QUIC TVU port.

ShredFetchStage implements some basic verification in modify_packets. Some of this verification currently relies on having a (root) bank in order to check feature status as well as to check number of slots in an epoch (number of slots in an epoch could be calculated and then Arc<Bank> could be dropped). The root bank is fetched here:

let (mut root_bank, mut last_slot) = {
let bank_forks_r = bank_forks.read().unwrap();
(bank_forks_r.root_bank(), bank_forks_r.highest_slot())

At the time that the bank is fetched, the node just unpacked its' snapshot. So, this initial root bank corresponds to the load-snapshot-slot. Later in the function, we currently use a blocking method to pull shred out of the crossbeam receiver:

for mut packet_batch in recvr {

Since we're not getting any QUIC turbine packets, execution is blocked waiting for a shred to show up, and the following code that updates the root bank is never executed:

let bank_forks_r = bank_forks.read().unwrap();
root_bank = bank_forks_r.root_bank();

As a result, the initial root bank at startup (ie the snapshot slot) is held by the solTvuFetchQuic thread that is spawned below for the duration of the process.

Builder::new()
.name("solTvuFetchQuic".to_string())
.spawn(move || {
Self::modify_packets(
packet_receiver,
sender,
&bank_forks,
shred_version,
"shred_fetch_quic",
PacketFlags::empty(),
None, // repair_context
turbine_disabled,
)
})

Summary of Changes

Rework the code to timeout on reading from the crossbeam channel so that the root bank is updated even if no shreds are actually showing up.

I have some scaffolding code that tracks creation / drop of new banks.

  • Prior to this change, I could see the bank that the load-snapshot-slot bank was never dropped
  • With this change, I do see that bank getting dropped

@behzadnouri
Copy link
Contributor

Wondering where this is causing problem, because this is a master-only code.

That aside, the feature activation check can be removed, and we don't need a root-bank anymore: #33088
So I believe we can just remove the root-bank here. Once #33088 is merged I can do that or you do it, whichever you prefer.

@steviez
Copy link
Contributor Author

steviez commented Aug 31, 2023

Wondering where this is causing problem, because this is a master-only code.

Yep, master only. Was looking to do some profiling and saw this, so tracked this down prior for cleaner results.

I can do that or you do it, whichever you prefer.

Either or. If you have time and want to knock it out real quick, feel free. If you're more preoccupied with other stuff, happy to scoop it up tomorrow too

@steviez
Copy link
Contributor Author

steviez commented Aug 31, 2023

Closing in lieu of #33088 - no need to rework this function since the root bank even being pulled out will be going away

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants