-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adds metric for turbine retransmit tree mismatch #17351
adds metric for turbine retransmit tree mismatch #17351
Conversation
125796e
to
ec4e94f
Compare
Codecov Report
@@ Coverage Diff @@
## master #17351 +/- ##
=========================================
- Coverage 82.7% 82.7% -0.1%
=========================================
Files 421 424 +3
Lines 118048 118422 +374
=========================================
+ Hits 97664 97969 +305
- Misses 20384 20453 +69 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems straightforward. Couple nits.
let r_lock = r.lock().unwrap(); | ||
let packets = r_lock.recv_timeout(timer)?; | ||
let packets = r_lock.recv_timeout(RECV_TIMEOUT)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let packets = r_lock.recv_timeout(RECV_TIMEOUT)?; | |
let packets = r_lock.recv_timeout(Duration::from_secs(1))?; |
Could this just be a one-liner? Or do you expect to use the const elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I picked up this habit from previous jobs, to use consts to avoid magic numbers in the code as a means of (mini) documentation and maybe more readability. I guess here it does not matter much either case.
core/src/retransmit_stage.rs
Outdated
// If the node is on the critical path (i.e. the first node in each | ||
// neighborhood), then we expect that the packet arrives at tvu socket | ||
// as opposed to tvu-forwards. If this is not the case, then the | ||
// turbine broadcast/retransmit tree mismatch across nodes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// turbine broadcast/retransmit tree mismatch across nodes. | |
// turbine broadcast/retransmit tree is mismatched across nodes. |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, thanks
// neighborhood), then we expect that the packet arrives at tvu socket | ||
// as opposed to tvu-forwards. If this is not the case, then the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a clever way to detect this, nice!
In order to remove port-based forwarding logic in turbine, we need to first track how often the turbine retransmit/broadcast trees mismatch across nodes. One consistency condition is that if the node is on the critical path (i.e. the first node in each neighborhood), then we expect that the packet arrives at tvu socket as opposed to tvu-forwards. This commit adds a metric to track how often above condition is not met.
The code can be simplified by just flattening the vector of packets.
ec4e94f
to
2134646
Compare
Pull request has been modified.
…17392) * adds metric for turbine retransmit tree mismatch In order to remove port-based forwarding logic in turbine, we need to first track how often the turbine retransmit/broadcast trees mismatch across nodes. One consistency condition is that if the node is on the critical path (i.e. the first node in each neighborhood), then we expect that the packet arrives at tvu socket as opposed to tvu-forwards. This commit adds a metric to track how often above condition is not met. (cherry picked from commit 71de021) * removes the nested for loop from retransmit-stage The code can be simplified by just flattening the vector of packets. (cherry picked from commit ff0e623) Co-authored-by: behzad nouri <behzadnouri@gmail.com>
Turbine retransmit logic is based on which socket it received the packet from (i.e `packet.meta.forward`): https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470 This can leave the cluster vulnerable to spoofing and selective propagation of packets; see solana-labs#6672 solana-labs#7774 This commit identifies if the node is on the "critical path" based on its index in the shuffled cluster. If so, it forwards the packet to both neighbors and children; otherwise, the packet is only forwarded to the children. The metrics added in solana-labs#17351 shows that the number of times the index does not match the port is very rare, and therefore this change should be safe.
Turbine retransmit logic is based on which socket it received the packet from (i.e `packet.meta.forward`): https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470 This can leave the cluster vulnerable to spoofing and selective propagation of packets; see solana-labs#6672 solana-labs#7774 This commit identifies if the node is on the "critical path" based on its index in the shuffled cluster. If so, it forwards the packet to both neighbors and children; otherwise, the packet is only forwarded to the children. The metrics added in solana-labs#17351 shows that the number of times the index does not match the port is very rare, and therefore this change should be safe.
Turbine retransmit logic is based on which socket it received the packet from (i.e `packet.meta.forward`): https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470 This can leave the cluster vulnerable to spoofing and selective propagation of packets; see solana-labs#6672 solana-labs#7774 This commit identifies if the node is on the "critical path" based on its index in the shuffled cluster. If so, it forwards the packet to both neighbors and children; otherwise, the packet is only forwarded to the children. The metrics added in solana-labs#17351 shows that the number of times the index does not match the port is very rare, and therefore this change should be safe.
Turbine retransmit logic is based on which socket it received the packet from (i.e `packet.meta.forward`): https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470 This can leave the cluster vulnerable to spoofing and selective propagation of packets; see solana-labs#6672 solana-labs#7774 This commit identifies if the node is on the "critical path" based on its index in the shuffled cluster. If so, it forwards the packet to both neighbors and children; otherwise, the packet is only forwarded to the children. The metrics added in solana-labs#17351 shows that the number of times the index does not match the port is very rare, and therefore this change should be safe.
Turbine retransmit logic is based on which socket it received the packet from (i.e `packet.meta.forward`): https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470 This can leave the cluster vulnerable to spoofing and selective propagation of packets; see solana-labs#6672 solana-labs#7774 This commit identifies if the node is on the "critical path" based on its index in the shuffled cluster. If so, it forwards the packet to both neighbors and children; otherwise, the packet is only forwarded to the children. The metrics added in solana-labs#17351 shows that the number of times the index does not match the port is very rare, and therefore this change should be safe.
Turbine retransmit logic is based on which socket it received the packet from (i.e `packet.meta.forward`): https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470 This can leave the cluster vulnerable to spoofing and selective propagation of packets; see #6672 #7774 This commit identifies if the node is on the "critical path" based on its index in the shuffled cluster. If so, it forwards the packet to both neighbors and children; otherwise, the packet is only forwarded to the children. The metrics added in #17351 shows that the number of times the index does not match the port is very rare, and therefore this change should be safe.
Turbine retransmit logic is based on which socket it received the packet from (i.e `packet.meta.forward`): https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470 This can leave the cluster vulnerable to spoofing and selective propagation of packets; see #6672 #7774 This commit identifies if the node is on the "critical path" based on its index in the shuffled cluster. If so, it forwards the packet to both neighbors and children; otherwise, the packet is only forwarded to the children. The metrics added in #17351 shows that the number of times the index does not match the port is very rare, and therefore this change should be safe. (cherry picked from commit 1618386)
…17973) Turbine retransmit logic is based on which socket it received the packet from (i.e `packet.meta.forward`): https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470 This can leave the cluster vulnerable to spoofing and selective propagation of packets; see #6672 #7774 This commit identifies if the node is on the "critical path" based on its index in the shuffled cluster. If so, it forwards the packet to both neighbors and children; otherwise, the packet is only forwarded to the children. The metrics added in #17351 shows that the number of times the index does not match the port is very rare, and therefore this change should be safe. (cherry picked from commit 1618386) Co-authored-by: behzad nouri <behzadnouri@gmail.com>
Problem
In order to remove port-based forwarding logic in turbine, we need to first track how often the turbine retransmit/broadcast trees mismatch across nodes. One consistency condition is that if the node is on the critical path (i.e. the first node in each neighborhood), then we expect that the packet arrives at tvu socket as opposed to tvu-forwards.
Summary of Changes