
retransmits shreds recovered from erasure codes #19233

Merged
behzadnouri merged 6 commits into solana-labs:master from retrans-shreds on Aug 17, 2021

Conversation

behzadnouri
Contributor

@behzadnouri behzadnouri commented Aug 13, 2021

Problem

Shreds recovered from erasure codes have not been received from turbine
and are not retransmitted to other nodes downstream. This results in
more repairs across the cluster, which is slower than propagation
through turbine.

Summary of Changes

  • The 4th commit reworks the retransmit stage to receive shreds instead of packets, since converting recovered shreds to packets just to send them to the retransmit stage is superfluous. This also avoids partially deserializing a packet in the retransmit stage when the same packet was already fully deserialized into a shred earlier in the window stage.
  • In preparation for that change, the first 2 commits move a number of packet-count metrics from the retransmit stage to the window service, earlier in the pipeline. Metrics for repair and discard packets are emitted in the window stage and, with the 4th commit, those packets are no longer sent down to the retransmit stage (which was pointless since they are never retransmitted).
  • The last commit channels recovered shreds through to the retransmit stage so they are broadcast further to downstream nodes in the tree (a rough sketch of this flow follows below).
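
Purely as an illustration of that flow, here is a minimal, hypothetical sketch using std::sync::mpsc and a stand-in Shred struct; the real window service and retransmit stage in solana-core are far more involved:

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

// Stand-in for solana_ledger::shred::Shred (hypothetical fields).
#[derive(Clone, Debug)]
struct Shred {
    slot: u64,
    index: u32,
    payload: Vec<u8>,
}

// Window stage: both freshly received shreds and shreds recovered from
// erasure codes go down the same channel as `Shred`s; no conversion back
// into packets is needed.
fn window_stage(
    retransmit_sender: &Sender<Vec<Shred>>,
    received: Vec<Shred>,
    recovered: Vec<Shred>,
) {
    let mut shreds = received;
    shreds.extend(recovered);
    let _ = retransmit_sender.send(shreds);
}

// Retransmit stage: consumes shreds directly instead of re-deserializing packets.
fn retransmit_stage(retransmit_receiver: Receiver<Vec<Shred>>) {
    for shreds in retransmit_receiver {
        for shred in shreds {
            // The real code would look up turbine peers here and resend.
            println!("retransmit slot {} index {}", shred.slot, shred.index);
        }
    }
}

fn main() {
    let (retransmit_sender, retransmit_receiver) = channel();
    let handle = thread::spawn(move || retransmit_stage(retransmit_receiver));
    let received = vec![Shred { slot: 1, index: 0, payload: vec![0; 8] }];
    let recovered = vec![Shred { slot: 1, index: 3, payload: vec![0; 8] }];
    window_stage(&retransmit_sender, received, recovered);
    drop(retransmit_sender); // close the channel so the receiver loop ends
    handle.join().unwrap();
}
```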

@behzadnouri behzadnouri changed the title from "retransmits shreds recovered from erasure codings" to "retransmits shreds recovered from erasure codes" Aug 13, 2021
@behzadnouri behzadnouri force-pushed the retrans-shreds branch 3 times, most recently from 12b60d2 to 0de4fc8 Compare August 14, 2021 16:26
@codecov
codecov bot commented Aug 14, 2021

Codecov Report

Merging #19233 (8f5553c) into master (692aa99) will decrease coverage by 0.0%.
The diff coverage is 79.9%.

@@            Coverage Diff            @@
##           master   #19233     +/-   ##
=========================================
- Coverage    82.8%    82.8%   -0.1%     
=========================================
  Files         455      455             
  Lines      130044   129960     -84     
=========================================
- Hits       107780   107692     -88     
- Misses      22264    22268      +4     

Commits

removes packet-count metrics from retransmit stage

Working towards sending shreds (instead of packets) to retransmit stage
so that shreds recovered from erasure codes are as well retransmitted.

Following commit will add these metrics back to window-service, earlier
in the pipeline.

adds packet/shred count stats to window-service

Adding back these metrics from the earlier commit which removed them
from retransmit stage.

removes erroneous uses of Arc<...> from retransmit stage

sends shreds (instead of packets) to retransmit stage

Working towards channelling through shreds recovered from erasure codes
to retransmit stage.

returns completed-data-set-info from insert_data_shred

instead of opaque (u32, u32) which are then converted to
CompletedDataSetInfo at the call-site.

retransmits shreds recovered from erasure codes

Shreds recovered from erasure codes have not been received from turbine
and have not been retransmitted to other nodes downstream. This results
in more repairs across the cluster which is slower.

This commit channels through recovered shreds to retransmit stage in
order to further broadcast the shreds to downstream nodes in the tree.
// Exclude shreds obtained via repair; only turbine-received and recovered
// shreds are cloned and forwarded to the retransmit stage.
.zip(&repair_infos)
.filter(|(_, repair_info)| repair_info.is_none())
.map(|(shred, _)| shred)
.cloned()
Contributor
@carllin carllin Aug 17, 2021

Wonder if we could avoid costly clones if we switched over to some Arc<Shred>, but that's a story for another time
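
As a rough, hypothetical illustration of that idea (not this PR's code): sending Arc<Shred> through the channel clones only a reference count, so the payload bytes are shared rather than copied.

```rust
use std::sync::{mpsc::channel, Arc};

// Hypothetical stand-in for the real shred type.
struct Shred {
    payload: Vec<u8>,
}

fn main() {
    let (sender, receiver) = channel::<Vec<Arc<Shred>>>();
    let shreds = vec![Arc::new(Shred { payload: vec![0u8; 1024] })];
    // Cloning Arc<Shred> bumps a refcount; the payload buffer is shared,
    // not copied, so forwarding to retransmit would not duplicate it.
    sender.send(shreds.clone()).unwrap();
    let forwarded = receiver.recv().unwrap();
    assert_eq!(Arc::strong_count(&forwarded[0]), 2);
    assert_eq!(forwarded[0].payload.len(), 1024);
}
```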

rpc_subscriptions: Option<Arc<RpcSubscriptions>>,
duplicate_slots_sender: Sender<Slot>,
ancestor_hashes_replay_update_receiver: AncestorHashesReplayUpdateReceiver,
) -> Self {
let (retransmit_sender, retransmit_receiver) = channel();
// Keep an extra sender clone alive as a workaround for a std::sync::mpsc
// issue where recv_timeout can misbehave once all senders are dropped:
// https://github.com/rust-lang/rust/issues/39364#issuecomment-634545136
let _retransmit_sender = retransmit_sender.clone();
Contributor
@carllin carllin Aug 17, 2021

ick, I've run into this too, was annoying to debug
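
For context, a minimal sketch of the keep-a-sender-alive pattern in the snippet above, under the assumption (from the linked rust-lang issue) that recv_timeout can misbehave around sender disconnection:

```rust
use std::sync::mpsc::{channel, RecvTimeoutError};
use std::time::Duration;

fn main() {
    let (sender, receiver) = channel::<u64>();
    // Keep an extra sender alive (mirrors `_retransmit_sender` above).
    let _keepalive = sender.clone();
    drop(sender);
    // Because `_keepalive` is still alive, the channel is not disconnected:
    // recv_timeout times out cleanly instead of reporting Disconnected
    // (or tripping the behavior described in the linked issue).
    assert_eq!(
        receiver.recv_timeout(Duration::from_millis(10)),
        Err(RecvTimeoutError::Timeout)
    );
}
```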

@carllin
Contributor

carllin commented Aug 17, 2021

Shreds recovered from erasure codes have not been received from turbine
and have not been retransmitted to other nodes downstream. This results
in more repairs across the cluster which is slower.

Hmm, what if we just happen to receive some combination of erasure shreds + data shreds first, such that we recover some data shreds that we eventually would have gotten from turbine anyway? Will this cause a massive spike in bandwidth if we then retransmit these recovered shreds, even though turbine had no problem circulating them?

To avoid this, should we wait for a bit to see that these recovered shreds don't ultimately arrive through turbine?

Contributor
@carllin carllin left a comment

Nice, this is so much cleaner, awesome! 😃

Contributor
@jbiseda jbiseda left a comment

looks good!

bank_forks: Arc<RwLock<BankForks>>,
retransmit: PacketSender,              // before: raw packets sent to retransmit stage
retransmit_sender: Sender<Vec<Shred>>, // after: deserialized shreds sent instead
Contributor

thanks for renaming these :)

@solana-labs solana-labs deleted a comment from KeKo6988 Aug 17, 2021
@solana-labs solana-labs deleted a comment from KeKo6988 Aug 17, 2021
@solana-labs solana-labs deleted a comment from KeKo6988 Aug 17, 2021
@behzadnouri
Contributor Author

Hmm, what if we just happen to receive some combination of erasure shreds + data shreds first, such that we recover some data shreds that we eventually would have gotten from turbine anyway? Will this cause a massive spike in bandwidth if we then retransmit these recovered shreds, even though turbine had no problem circulating them?

To avoid this, should we wait for a bit to see that these recovered shreds don't ultimately arrive through turbine?

That should be picked up by the duplicates check in the retransmit stage (i.e. check_if_already_received), so the 2nd copy of the shred will be skipped:
https://github.com/solana-labs/solana/blob/0e5ea36cc/core/src/retransmit_stage.rs#L227-L253
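
A simplified, hypothetical sketch of that dedup idea (not the actual check_if_already_received, which is more involved): the retransmit stage remembers which (slot, index) pairs it has already forwarded, so whichever copy arrives second, recovered or via turbine, is skipped.

```rust
use std::collections::{HashMap, HashSet};

type Slot = u64;

// Hypothetical stand-in for the per-slot filter kept by the retransmit stage.
#[derive(Default)]
struct ShredsReceived {
    seen: HashMap<Slot, HashSet<u32>>,
}

impl ShredsReceived {
    // Returns true the first time a (slot, index) is seen, false for any
    // later duplicate, which is then not retransmitted.
    fn should_retransmit(&mut self, slot: Slot, index: u32) -> bool {
        self.seen.entry(slot).or_default().insert(index)
    }
}

fn main() {
    let mut filter = ShredsReceived::default();
    // The recovered copy arrives first and is retransmitted...
    assert!(filter.should_retransmit(42, 7));
    // ...the turbine copy of the same shred arrives later and is skipped.
    assert!(!filter.should_retransmit(42, 7));
}
```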

@behzadnouri behzadnouri merged commit 7a8807b into solana-labs:master Aug 17, 2021
@behzadnouri behzadnouri deleted the retrans-shreds branch August 17, 2021 13:44
behzadnouri added a commit that referenced this pull request Sep 27, 2021
…0249)

* removes packet-count metrics from retransmit stage

Working towards sending shreds (instead of packets) to retransmit stage
so that shreds recovered from erasure codes are as well retransmitted.

Following commit will add these metrics back to window-service, earlier
in the pipeline.

(cherry picked from commit bf437b0)

# Conflicts:
#	core/src/retransmit_stage.rs

* adds packet/shred count stats to window-service

Adding back these metrics from the earlier commit which removed them
from retransmit stage.

(cherry picked from commit 8198a7e)

* removes erroneous uses of Arc<...> from retransmit stage

(cherry picked from commit 6e41333)

# Conflicts:
#	core/src/retransmit_stage.rs
#	core/src/tvu.rs

* sends shreds (instead of packets) to retransmit stage

Working towards channelling through shreds recovered from erasure codes
to retransmit stage.

(cherry picked from commit 3efccbf)

# Conflicts:
#	core/src/retransmit_stage.rs

* returns completed-data-set-info from insert_data_shred

instead of opaque (u32, u32) which are then converted to
CompletedDataSetInfo at the call-site.

(cherry picked from commit 3c71670)

# Conflicts:
#	ledger/src/blockstore.rs

* retransmits shreds recovered from erasure codes

Shreds recovered from erasure codes have not been received from turbine
and have not been retransmitted to other nodes downstream. This results
in more repairs across the cluster which is slower.

This commit channels through recovered shreds to retransmit stage in
order to further broadcast the shreds to downstream nodes in the tree.

(cherry picked from commit 7a8807b)

# Conflicts:
#	core/src/retransmit_stage.rs
#	core/src/window_service.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Apr 5, 2022
test_skip_repair in retransmit-stage is no longer relevant because
following: solana-labs#19233
repair packets are filtered out earlier in window-service and so
retransmit stage does not know if a shred is repaired or not.
Also, following turbine peer shuffle changes:
solana-labs#24080
the test has become flaky since it does not take into account how peers
are shuffled for each shred.
behzadnouri added a commit that referenced this pull request Apr 5, 2022
…4121)

test_skip_repair in retransmit-stage is no longer relevant because
following: #19233
repair packets are filtered out earlier in window-service and so
retransmit stage does not know if a shred is repaired or not.
Also, following turbine peer shuffle changes:
#24080
the test has become flaky since it does not take into account how peers
are shuffled for each shred.
mergify bot pushed a commit that referenced this pull request Apr 5, 2022
…4121)

test_skip_repair in retransmit-stage is no longer relevant because
following: #19233
repair packets are filtered out earlier in window-service and so
retransmit stage does not know if a shred is repaired or not.
Also, following turbine peer shuffle changes:
#24080
the test has become flaky since it does not take into account how peers
are shuffled for each shred.

(cherry picked from commit 2282571)
behzadnouri added a commit that referenced this pull request Apr 5, 2022
…4121) (#24126)

test_skip_repair in retransmit-stage is no longer relevant because
following: #19233
repair packets are filtered out earlier in window-service and so
retransmit stage does not know if a shred is repaired or not.
Also, following turbine peer shuffle changes:
#24080
the test has become flaky since it does not take into account how peers
are shuffled for each shred.

(cherry picked from commit 2282571)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
mergify bot pushed a commit that referenced this pull request Apr 25, 2022
…4121)

test_skip_repair in retransmit-stage is no longer relevant because
following: #19233
repair packets are filtered out earlier in window-service and so
retransmit stage does not know if a shred is repaired or not.
Also, following turbine peer shuffle changes:
#24080
the test has become flaky since it does not take into account how peers
are shuffled for each shred.

(cherry picked from commit 2282571)

# Conflicts:
#	core/src/retransmit_stage.rs
mergify bot added a commit that referenced this pull request Apr 25, 2022
…ckport #24121) (#24663)

* removes outdated and flaky test_skip_repair from retransmit-stage (#24121)

test_skip_repair in retransmit-stage is no longer relevant because
following: #19233
repair packets are filtered out earlier in window-service and so
retransmit stage does not know if a shred is repaired or not.
Also, following turbine peer shuffle changes:
#24080
the test has become flaky since it does not take into account how peers
are shuffled for each shred.

(cherry picked from commit 2282571)

# Conflicts:
#	core/src/retransmit_stage.rs

* removes mergify merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>