Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GossipSub: Limit flood publishing #911

Merged
merged 12 commits into from
Jul 31, 2023
Merged

GossipSub: Limit flood publishing #911

merged 12 commits into from
Jul 31, 2023

Conversation

Menduist
Copy link
Contributor

@Menduist Menduist commented Jun 9, 2023

Part of #854

This PR adds a limit to the flood publishing, so that we spend at most 1 heartbeat flood publishing
Of course, this requires to know our bandwidth, which is currently hardcoded to 100 mbps, but this will be replaced by an actual estimate later

With 700ms heartbeat and 1mb message size, we will send it to a maximum of 17 peers (including the mesh)

The "1 heartbeat" period is used because after that, we will probably be responding to IWANTs

Next step for flood publishing will be to integrate #850 (staggered sending), which will not only give us the bandwidth estimate, but also send first to the mesh and then the flood.

@Menduist Menduist changed the title Limit flood publishing GossipSub: Limit flood publishing Jun 9, 2023
@codecov
Copy link

codecov bot commented Jun 12, 2023

Codecov Report

Merging #911 (f68b32a) into unstable (b784167) will decrease coverage by 0.05%.
The diff coverage is 100.00%.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##           unstable     #911      +/-   ##
============================================
- Coverage     83.51%   83.46%   -0.05%     
============================================
  Files            91       91              
  Lines         15144    15145       +1     
============================================
- Hits          12647    12641       -6     
- Misses         2497     2504       +7     
Files Changed Coverage Δ
libp2p/utility.nim 42.22% <ø> (ø)
libp2p/protocols/pubsub/gossipsub.nim 87.01% <100.00%> (+0.25%) ⬆️
libp2p/protocols/pubsub/gossipsub/behavior.nim 88.39% <100.00%> (+0.24%) ⬆️

... and 9 files with indirect coverage changes

msgSize = data.len
bandwidth = 25_000_000 #TODO replace with bandwidth estimate
msToTransmit = max(msgSize div (bandwidth div 1000), 1)
maxFloodPublish =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for a min?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the messages are that big (2mb), it will effectively disable flood publish
The idea of this PR is that flood publish shouldn't last longer than one heartbeat, since after that we will be busy responding to IWANT requests

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider the case where we're not subscribed to the topic - sending to 0 peers seems a bit harsh then...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that won't happen before >50mb, but still good to cover it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought a bit more about this, and I think we want to put the minimum at dmin at least ..

it feels very risky to send only to one peer - if that peer is slow, it'll delay the message for all peers by a heartbeat and it relies a bit too heavily on the IHAVE/IWANT mechanism to recover, ie there's no redundancy...

I know there's a cost here for sending big messages from slow peers, but I think the risk for normal/highbandwidth peers is more real and costly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The minimum is currently dLow (it will be caught by the if peers.len < g.parameters.dLow: below)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't fanout peers populated only when we're not subscribed? also as such, there might not be enough of them in the fanout table either?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also use them when we can't publish to enough peers (ie bad mesh), and they are replenished when < dLow

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also use them when we can't publish to enough peers (ie bad mesh), and they are replenished when < dLow

so sending to g.gossipsub[topic] as a last resort wouldn't be appropriate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be the same as using the fanout, the fanout is only a small cache of gossipsub to have more stable diffusion routes, basically

@arnetheduck
Copy link
Contributor

The "1 heartbeat" period is used because after that, we will probably be responding to IWANTs

after 1 heartbeat, should we sent an IHAVE to all peers we didn't flood it to?

@Menduist
Copy link
Contributor Author

after 1 heartbeat, should we sent an IHAVE to all peers we didn't flood it to?

Not sure thats wise, most likely they will all reply with IWANTs, and we will be back to square one

@arnetheduck
Copy link
Contributor

Not sure thats wise, most likely they will all reply with IWANTs, and we will be back to square one

hm, what might make sense here is an exponential increase per heartbeat .. if we send message to 4 peers, we iwant 4 peers at the heartbeat, then 8 and so on for each heartbeat that passes - that should help avoid eclipsing while at the same time allowing the message to spread before

if g.parameters.floodPublish:
let
msgSize = data.len
bandwidth = 25_000_000 #TODO replace with bandwidth estimate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect we need to make this a (debug) parameter until we have something better - ie in nimbus-eth2, this would be a hidden command-line parameter that we pull out in case of weird behavior - hidden, because obviously it's something that should go away at some point

@Menduist
Copy link
Contributor Author

@diegomrsantos not sure I like the refacto you did, the splits seems a bit random, and the flow became harder to follow IMO

Not sure publishMessage is relevant, but we could do something like

if g.parameters.floodPublish:
  addFloodPublishPeers(g, topic, data)

if peers.len < g.parameters.dLow:
  # not subscribed, or bad mesh, send to fanout peers
  addFanoutPeers(g, topic, peers)

@diegomrsantos
Copy link
Collaborator

diegomrsantos commented Jul 25, 2023

@Menduist the original code was confusing to me, but feel free to remove or modify the changes.

@Menduist Menduist merged commit 7a369dd into unstable Jul 31, 2023
10 checks passed
@Menduist Menduist deleted the limitfloodpublish branch July 31, 2023 09:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants