
Multiple questions and datapoints on PubSub (FloodSub, Gossipsub, gerbil-simsub) #19

JustinDrake opened this issue May 8, 2018 · 17 comments


@JustinDrake

Hi there :) This is Justin from Ethereum. I am doing some due diligence on libp2p's pubsub to see if we can potentially use it for the sharding (Ethereum 2.0) gossip feeds.

  1. Are gossipsub and meshsub the same thing? Is gossipsub the official name?
  2. Who is working on the Go implementation of gossipsub? Is it @vyzo only right now?
  3. Where can I find documentation on gossipsub?
  4. Did I understand correctly that gossipsub is a pubsub MVP? What is the pubsub roadmap after gossipsub has been implemented?
  5. How mature is gossipsub? Would it be fair to say it is still alpha? What is the target release date for gossipsub?
  6. How does gossipsub scale with number of users? How well would it handle, say, 10000 users on the same topic?
  7. Is there a public gossipsub testnet we can play with?
  8. Where does the gossipsub design come from? Is it "home-rolled", based on a specific academic paper, etc.?
  9. Would it be fair to say that pubsub research has settled on the current gossipsub design?
  10. What are the high-level properties of gossipsub? Can I confirm that the design is "best effort" in that it provides no guarantees around delivery, ordering or persistence of messages?
  11. What is the persistence layer of messages? In this comment there is a mention of a "time t" after which messages are expired. Where can I find this parameter t in the code?
  12. Does gossipsub build upon floodsub? Specifically are the IHAVE messages flooded?
  13. Is the gossipsub simulation for a single topic?
  14. Do you know of any projects that have experimented/tested the existing gossipsub codebase?
  15. What is the status of gossipsub in js-libp2p?
  16. Is it possible to wrap the Go implementation with Python bindings? Would it be feasible to use XTP for that?
  17. Is the recurring research call still active?

Various links (to help me for future reference)

Gossipsub notes
https://gist.github.com/vyzo/3e42cd92becd9972e007c8bffb1ea298

Gossipsub simulator
https://github.com/vyzo/gerbil-simsub

Gossipsub pull request
libp2p/go-libp2p-pubsub#67

Reliable and/or Persistent Pubsub
libp2p/go-libp2p-pubsub#42

PulsarCast M.Sc Thesis - Scaling libp2p PubSub
ipfs/notes#266

Take a look at pubsub on IPFS
https://ipfs.io/blog/25-pubsub

Pubsub research
https://github.com/libp2p/research-pubsub

XL peer-to-peer PubSub systems (pubsub survey)
http://dl.acm.org/citation.cfm?id=2543583

@vyzo

vyzo commented May 8, 2018

cc @whyrusleeping

@vyzo

vyzo commented May 8, 2018

Are gossipsub and meshsub the same thing? Is gossipsub the official name?

gossipsub is intended to be the baseline of the "meshsub" family of protocols.

Who is working on the Go implementation of gossipsub? Is is @vyzo only right now?

We are working on this with why (@whyrusleeping); it's not a solo effort.

Where can I find documentation on gossipsub?

Uhm, we are currently lacking in that department. I will put something together for the specs repo.

Did I understand correctly that gossipsub is a pubsub MVP? What is the pubsub roadmap after gossipsub has been implemented?

Yes, it's the MVP. It is designed to be general purpose, robust, scalable, and simple to implement.
Particular thought in the design went into catering for multiple, random (possibly one-time) sources.

We are planning to integrate additional protocols on top for specific application profiles.
For example, single-source multicast (e.g. a video stream) would require a more specialized protocol, which we plan to address with epidemic broadcast tree formation.

How mature is gossipsub? Would it be fair to say it is still alpha? What is the target release date for gossipsub?

It's very alpha. It's been simulated, reviewed and unit tested.
We are looking for more eyes, and testing in the wild.

How does gossipsub scale with number of users? How well would it handle, say, 10000 users on the same topic?

It should scale quite well, modulo bugs. The amplification factor with the current design parameters is about 6.4.

Is there a public gossipsub testnet we can play with?

Not yet, but having one is probably a good idea.

Where does the gossipsub design come from? Is it "home-rolled", based on a specific academic paper, etc.?

It's not based on a specific academic paper, because this is not an academic project. We have extensively reviewed the literature, and strived to design a practical protocol that is robust, scalable, and simple to implement.

Would it be fair to say that pubsub research has settled on the current gossipsub design?

As a general-purpose protocol basis, I would say yes.

What are the high-level properties of gossipsub? Can I confirm that the design is "best effort" in that it provides no guarantees around delivery, ordering or persistence of messages.

That is correct.

What is the persistence layer of messages? In this comment there is a mention of a "time t" after which messages are expired. Where can I find this parameter t in the code?

There is no persistence for messages; they are buffered for 5s, but gossiped for 3s.
The parameters are GossipSubHistoryLength and GossipSubHistoryGossip.
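
For reference, a minimal sketch of tuning those knobs when constructing the Go router; the variable names follow the parameters mentioned above, but exact names and import paths may differ between go-libp2p-pubsub versions:

```go
package gossipexample

import (
	"context"

	pubsub "github.com/libp2p/go-libp2p-pubsub"
	"github.com/libp2p/go-libp2p/core/host"
)

// newRouter is a hypothetical helper: given an existing libp2p host, it
// adjusts the gossip history window before building a gossipsub instance.
func newRouter(ctx context.Context, h host.Host) (*pubsub.PubSub, error) {
	// Messages are retained for GossipSubHistoryLength heartbeats; only
	// the most recent GossipSubHistoryGossip heartbeats are advertised
	// via IHAVE gossip.
	pubsub.GossipSubHistoryLength = 5
	pubsub.GossipSubHistoryGossip = 3
	return pubsub.NewGossipSub(ctx, h)
}
```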

Does gossipsub build upon floodsub? Specifically are the IHAVE messages flooded?

It's backwards compatible, meaning that it advertises the floodsub protocol and will behave like one towards floodsub peers.
IHAVE messages are not flooded; they are probabilistically emitted towards peers that are part of the topic but are not direct mesh peers.
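
To make that concrete, here is a rough sketch of the gossip emission step, not the actual go-libp2p-pubsub routine; all identifiers are illustrative:

```go
package gossipexample

import "math/rand"

// emitGossip sends IHAVE summaries of recently seen message IDs to peers
// that are subscribed to the topic but are not in our mesh; mesh peers
// already receive the full messages, so gossiping to them is redundant.
func emitGossip(topicPeers []string, mesh map[string]bool,
	recentIDs []string, gossipProb float64,
	sendIHAVE func(peer string, ids []string)) {

	for _, p := range topicPeers {
		if mesh[p] {
			continue // mesh peer: full messages already flow here
		}
		if rand.Float64() < gossipProb {
			sendIHAVE(p, recentIDs)
		}
	}
}
```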

Is the gossipsub simulation for a single topic?

Yes.

Do you know of any projects that have experimented/tested the existing gossipsub codebase?

There is interest from other projects as well; perhaps it's time to start testing the protocol in the wild.

What is the status of gossipsub in js-libp2p?

None; but this won't affect deployment, as we will act like floodsub towards js peers.

Is it possible to wrap the Go implementation with Python bindings? Would it be feasible to use XTP for that?

Hrm, I don't know how that would work; but then again, I am only an intermediate gopher and not a pythonista.

Is the recurring research call still active?

Not that I know of.

@whyrusleeping

Thanks for all the answers @vyzo :)

I'll supplement a few things:

Would it be fair to say that pubsub research has settled on the current gossipsub design?

Libp2p pubsub research has many different tendrils, thinking about things at many different layers. Floodsub and gossipsub are part of our 'foundational' pubsub work. These protocols are meant to be the lowest layer that other protocols can build on top of, the 'UDP of pubsub'. No guarantees around ordering or reliability are made, instead we expect them to be implemented on top of these. Other research efforts (like pulsarcast) look into how to build these other guarantees.

It's fair to say that gossipsub is our current research focus for this low layer pubsub work.

Is it possible to wrap the Go implementation with Python bindings? Would it be feasible to use XTP for that?

Are you looking for Python API bindings to Go? Or Python bindings that call into Go code directly? You should be able to do both; the second one is a bit trickier, but should be possible.

@whyrusleeping

I think our next steps for gossipsub are:

  • Write a spec so we can discuss and review outside of the code
  • Merge the implementation with the explicit caveat that it may change
  • Deploy Deploy Deploy. Test Test Test

@JustinDrake

Thanks for all the answers :)

which we plan to do with epidemic broadcast tree forming

For future reference there's a relevant writeup here.

The amplification factor with the current design parameters is about 6.4.

What is the definition of amplification factor? Does an amplification factor of 6.4 mean that every message that is published causes an average of 6.4 networking events per node?

IHAVE messages are not flooded; they are probabilistically emitted towards peers that are part of the topic mesh but not direct peers.

OK, I see the getPeers function is probabilistic. Why would IHAVE messages not be sent to direct peers (I imagine that by "direct peer" you mean one with a direct connection)?

Are you looking for python api bindings to go? Or python bindings that call into go code directly?

One of the main sharding implementations is in Python, so Python support of pubsub is pretty high up our list of requirements. I guess API bindings to the Go code would suffice. Do you know of projects that have done this?

Another possibility would be to implement libp2p and gossipsub from scratch in Python. What do you think of that? Also, would it be fair to say that the XTP project to "containerise" the transport layer (e.g. for reuse from Python) is currently in hibernation?

strived to design a practical protocol that is robust

Under what adversarial model is the protocol robust? At first glance, because there is no spam protection, it looks like an attacker may be able to DoS the network with junk.

Another worry is a Sybil attack targeting a victim node. How easy would it be for an attacker to completely isolate the victim from the rest of the network, allowing for arbitrary inbound or outbound censorship by the attacker?

Besides the DoS and censorship attacks, what other attacks should we be wary of?

Further questions looking at the code:

  1. What is piggybacking?
  2. How does initial peer discovery work? That is, how does a node find its first peers for a given topic?
  3. Did I understand correctly that the "fanout" is a local and transient list of direct peers used whenever a message is published to a topic that is not joined? On the other hand "mesh" is a global and persistent list of peers (both direct and indirect) that have joined a particular topic?

@vyzo

vyzo commented May 8, 2018

What is the definition of amplification factor? Does an amplification factor of 6.4 mean that every message that is published causes an average of 6.4 networking events per node?

Yes, it means that each message is published on average 6.4 times.
There are a few more network events associated with gossip and mesh formation, but piggybacking in the mesh with multiple subscriptions should make it a marginal fixed cost.

Why would IHAVE messages not be sent to direct peers

They already have the relevant messages with high probability; gossiping to them would just be overhead.

(I imagine that by "direct peer" you mean one with a direct connection)

These are the direct peers in the topic mesh (overlay) -- we only push messages to those peers in steady state flow.

Under what adversarial model is the protocol robust?

Natural network events (churn, failures, etc.); for the threat model I like to consider what I call Rational Adversarial Actors.
Miscreants can be dealt with at a different layer, perhaps with operator intervention.

At first glance, because there is no spam protection, it looks like an attacker may be able to DoS the network with junk.

We support topic-specific validators that control whether a message is accepted -- see libp2p/go-libp2p-pubsub#55.
Combined with message signing, you can implement suitable policies in the application layer.
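
As an illustration of that validator hook, here is a minimal sketch; the registration signature has changed across go-libp2p-pubsub versions, and the topic name and size limit are made up, so treat this as indicative rather than canonical:

```go
package gossipexample

import (
	"context"

	pubsub "github.com/libp2p/go-libp2p-pubsub"
	"github.com/libp2p/go-libp2p/core/peer"
)

// registerValidator rejects oversized messages on a single topic; a real
// application would also check signatures, message format, fees, etc.
func registerValidator(ps *pubsub.PubSub) error {
	return ps.RegisterTopicValidator("shard-0",
		func(ctx context.Context, from peer.ID, msg *pubsub.Message) bool {
			return len(msg.GetData()) <= 1024
		})
}
```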

What is piggybacking?

A peer participating in multiple topic meshes can combine gossip with regular message flow; this minimizes the traffic impact of gossip messages in dense overlays.
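
A toy sketch of the piggybacking idea (hypothetical types, not the actual wire format): gossip rides on an RPC frame that is already being sent with regular messages, so it adds no extra frames in dense overlays.

```go
package gossipexample

// message and ihaveEntry are stand-ins for the real wire types.
type message struct{ Data []byte }
type ihaveEntry struct {
	Topic      string
	MessageIDs []string
}

// rpc bundles regular message flow with piggybacked control gossip.
type rpc struct {
	Messages []message
	IHave    []ihaveEntry
}

// piggyback attaches pending gossip to an RPC that is about to be sent
// anyway, instead of emitting a separate gossip-only frame.
func piggyback(out *rpc, gossip []ihaveEntry) {
	out.IHave = append(out.IHave, gossip...)
}
```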

How does initial peer discovery work? That is, how does a node find its first peers for a given topic?

We use auto-discovery with peer connection events.

Initial connectivity and bootstrap are outside the scope of the protocol.
Having said that, these are the techniques we are working on:

Did I understand correctly that the "fanout" is a local and transient list of direct peers used whenever a message is published to a topic that is not joined? On the other hand "mesh" is a global and persistent list of peers (both direct and indirect) that have joined a particular topic?

More or less.

fanout is indeed for sources that publish without joining the overlay.
We use a transient list of peers (with a 1 min TTL) so that we have stable publishing routes, which would allow more sophisticated gossipsub implementations to optimize the overlay accordingly.

The mesh peers are the gossipsub peers that have joined the topic and for which we have grafted mesh links.
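
Put differently, a hypothetical per-topic state (illustrative names only, not the actual router fields) would look something like this:

```go
package gossipexample

import "time"

// topicState sketches the fanout/mesh distinction described above.
type topicState struct {
	// mesh: gossipsub peers that have joined the topic and with which we
	// have grafted mesh links; full messages flow over these links.
	mesh map[string]struct{}

	// fanout: a transient set of topic peers used when publishing to a
	// topic we have not joined; it expires after a short TTL (about one
	// minute) so repeat publishers keep stable routes.
	fanout       map[string]struct{}
	fanoutExpiry time.Time
}
```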

@whyrusleeping

@JustinDrake Implementing just enough of libp2p to implement gossipsub should be a reasonably sized endeavor (meaning, it should be doable pretty quickly). If you know of any talented Python devs with experience in writing networked applications, definitely connect us. That endeavor can be separate from getting Python bindings to libp2p, though.

@ghost

ghost commented May 10, 2018

This thread is pretty insightful (to me too) 👍 let's keep it open until pubsub is better documented and roadmapped.

Another possibility would be to implement libp2p and gossipsub from scratch in Python. What do you think of that?

While you can get something working reasonably quickly, it'll take a significant amount of continuous effort to make it really solid and bug-free. We want to get a standalone go-libp2p daemon started very soon, and that'll let you attach transports, listen/connect, attach protocols, subscribe to topics, publish, etc. (for starters maybe with an API similar to IPFS's).

@mhchia

mhchia commented May 10, 2018

@lgierth

While you can get something working reasonably quickly, it'll take a significant amount of continuous effort to make it really solid and bug-free. We want to get a standalone go-libp2p daemon started very soon, and that'll let you attach transports, listen/connect, attach protocols, subscribe to topics, publish, etc. (for starters maybe with an API similar to IPFS's).

It will be fantastic when the go-libp2p daemon is available! :)

Is there any existing discussion of, or reference for, the go-libp2p daemon?
It seems there are already RPCs for "subscribe to topics, publish", DHT operations, and peer management in the go-ipfs daemon. It is just a little unclear to me how "attach transports" and "attach protocols" will be implemented. Will that be something like the go-libp2p daemon just forwarding transports to the client, and the client sending back the results to the daemon?

@jamesray1

rust-libp2p, implemented by ParityTech, uses floodsub, not gossipsub, and doesn't seem to have plans to implement it and other features (libp2p/rust-libp2p#187). It would be easier for me to work with what rust-libp2p is capable of. I could port go-floodsub and other go-libp2p stuff if needed, but it'd be better to write it in Rust, due to its memory safety guarantees and performance. If implementing libp2p features that rust-libp2p doesn't have becomes more of an issue, I can address that, although at the moment it is difficult for me to tell; I'll need to read more.

@vyzo

vyzo commented May 18, 2018

That's a feature, not a bug. Don't put all your eggs in one basket.

@jamesray1

jamesray1 commented Jul 5, 2018

From https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/README.md:

In steady state, the protocol optimizes the multicast tree in two ways. Whenever a message is received via both an eager link and a lazy message summary, its hop count is compared. When the eager transmission hop count exceeds the lazy hop count by some threshold, then the lazy link can replace the eager link as a tree edge, reducing latency as measured in hops. In addition, active peers may be periodically replaced by passive peers with better network proximity, thus reducing propagation latency in time.

This is good for reducing latency, but by prioritizing nearer nodes, it reduces randomness and potentially security, even with certain countermeasures / changes to Kademlia (not sure if they are applicable to this protocol yet) such as:

Also from https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/README.md:

Missing messages and overlay repair are managed by a single background timer instead of creating timers left and right for every missing message; that's impractical from an implementation point of view, at least in Go.

I think this may actually be possible with the tokio crate in Rust. I'm not sure whether that means that, for compatibility, a single background timer should still be implemented.

@whyrusleeping

Great notes on the security issues of peer sampling. I think the quoted bit from the spec there is referring to a future implementation based on epidemic broadcast trees, but I'm not sure. I'll have to dig in a bit more.

As for the timer issue, I'm sure we can find an economical way to implement that in Go. It just might be a little more complicated than using standard system timers.
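
For illustration, a single background ticker of the kind the spec describes is straightforward in Go; this rough sketch (hypothetical names, not the go-libp2p-pubsub code) drives both missing-message follow-up and overlay repair from one timer:

```go
package gossipexample

import "time"

// heartbeatLoop runs one background ticker that handles both gossip
// follow-up (requesting messages we only saw via IHAVE) and overlay
// repair, instead of arming a separate timer per missing message.
func heartbeatLoop(interval time.Duration, stop <-chan struct{},
	checkMissing func(), repairOverlay func()) {

	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			checkMissing()
			repairOverlay()
		case <-stop:
			return
		}
	}
}
```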

@jamesray1

jamesray1 commented Jul 6, 2018

I think the quoted bit from the spec there is referring to a future implementation based on epidemic broadcast trees, but i'm not sure. Will have to dig in a bit more.

Right, if it's for a future optimization, it may be best to make that explicit.

@jamesray1

jamesray1 commented Jul 9, 2018

For the latest Ethereum sharding + Casper spec, the theoretical maximum number of validators (assuming a supply cap of 128m ETH) is 4m. This conflicts with the target size of 10,000 in https://github.com/libp2p/specs/tree/master/pubsub/gossipsub#membership-management-protocol. 10,000 won't even be OK initially, since there are 16,825 nodes as I write this. Dynamic sizing is needed: support for up to 4,000,000 (or more) participants will be needed eventually, and 20,000 initially. Moved comments to libp2p/go-libp2p-pubsub#86.

@vyzo

vyzo commented Sep 13, 2018

Since it was first brought up in this issue, I am sure you'd like to know that we have started work on the libp2p daemon: https://github.com/libp2p/go-libp2p-daemon

An initial barebones implementation is already in place: libp2p/go-libp2p-daemon#4

@daviddias daviddias changed the title Misc questions Multiple questions and datapoints on PubSub (FloodSub, Gossipsub, gerbil-simsub) Mar 25, 2020
@daviddias daviddias transferred this issue from libp2p/go-libp2p-pubsub Mar 25, 2020