[Draft] Bitswap Protocol Extensions #186
This proposal extends the Bitswap Protocol:
In #167 we discuss improvements to Bitswap that do not require changing the protocol.
Request patterns broadly fall into a couple of common use cases.
A new node wants all the information that many other nodes already have. The new node requests
Web Page / Package Manager
For example, a user clicks on a link to
The local node requests:
For use cases with high fan-out (eg Wikipedia), typically
For both of these typical request patterns it is important to fetch the initial blocks quickly as they contain links to blocks at deeper levels in the DAG.
Current Bitswap Implementation
In the current implementation of Bitswap, the local node sends WANT requests for blocks to its peers. Nodes maintain a "want list" for each peer they are connected to, ie the list of blocks that a peer has asked for. When a node receives a block that it wants, it sends a CANCEL message to all peers it had previously sent a WANT message to.
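The WANT/CANCEL bookkeeping above can be sketched as follows. This is an illustrative model, not the real go-bitswap API: the `Node`, `Want` and `ReceiveBlock` names are assumptions made for the example.

```go
package main

import "fmt"

type CID string
type PeerID string

// Node tracks, per peer, which blocks we have asked that peer for.
type Node struct {
	wantlists map[PeerID]map[CID]bool
}

func NewNode() *Node {
	return &Node{wantlists: make(map[PeerID]map[CID]bool)}
}

// Want records that a WANT for c was sent to peer p.
func (n *Node) Want(p PeerID, c CID) {
	if n.wantlists[p] == nil {
		n.wantlists[p] = make(map[CID]bool)
	}
	n.wantlists[p][c] = true
}

// ReceiveBlock returns the peers that should be sent a CANCEL for c,
// i.e. every peer we previously sent a WANT for c to.
func (n *Node) ReceiveBlock(c CID) []PeerID {
	var cancels []PeerID
	for p, wl := range n.wantlists {
		if wl[c] {
			cancels = append(cancels, p)
			delete(wl, c)
		}
	}
	return cancels
}

func main() {
	n := NewNode()
	n.Want("A", "cid0")
	n.Want("B", "cid0")
	fmt.Println(len(n.ReceiveBlock("cid0"))) // both A and B get a CANCEL
}
```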
Initially the Session discovers peers that have blocks it is interested in by
When the provider interface returns peers, the Session adds them to an "unoptimized" peers list.
Once some peers have been discovered, subsequent requests are split across the list of peers. The Session maintains a count of how many unique blocks and duplicate blocks it receives (duplicate blocks are those that are received more than once). It uses the unique / duplicate ratio to adjust the split factor, which determines how many peers each WANT request will be sent to (see #167 and #165 for more detail on splitting). The Session adjusts the split factor up and down to try to maintain a balance between receiving the data it wants quickly while trying to minimize duplicates.
For example, if there are 8 peers (A - H) and 10 CIDs (0 - 9), and the split factor is 3:
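One plausible way to picture this splitting: the WANT for each CID goes to 3 of the 8 peers, rotating through the peer list. The rotation scheme below is an assumption for illustration; the actual assignment used in #165/#167 may differ.

```go
package main

import "fmt"

// peersForWant returns the splitFactor peers that the WANT for the i-th CID
// is sent to, rotating through the peer list so load is spread evenly.
func peersForWant(peers []string, i, splitFactor int) []string {
	out := make([]string, 0, splitFactor)
	for j := 0; j < splitFactor; j++ {
		out = append(out, peers[(i*splitFactor+j)%len(peers)])
	}
	return out
}

func main() {
	peers := []string{"A", "B", "C", "D", "E", "F", "G", "H"}
	for cid := 0; cid < 10; cid++ {
		fmt.Printf("CID %d -> %v\n", cid, peersForWant(peers, cid, 3))
	}
}
```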
The Session limits the "live" want list to 32 WANTs (a "live" want is a request that has been sent but a response has not yet been received).
Bitswap Protocol Extensions
The goals are:
HAVE / DONT_HAVE
In the current implementation the local node sends a WANT message to request a block. However in some cases it's advantageous to know whether a peer has a block but not to request the block data itself. This allows a node to build up knowledge about the distribution of blocks amongst its peers, without requesting a lot of duplicate data.
This proposal extends the WANT message with two flags:
Note that both these flags can be set (they are not exclusive).
When a node is in the discovery phase, it broadcasts a request to all connected peers. At this stage it is only interested in HAVE (it is not interested in blocks or DONT_HAVE) eg:
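A discovery-phase request might look like the sketch below. The flag and field names are assumptions for illustration; the proposal does not fix a wire encoding.

```go
package main

import "fmt"

// WantFlags carries the two proposed (non-exclusive) flags on a WANT.
type WantFlags uint8

const (
	SendHave     WantFlags = 1 << iota // respond with HAVE if you have the block
	SendDontHave                       // respond with DONT_HAVE if you do not
)

// Want is a single WANT entry in a Bitswap message.
type Want struct {
	Cid       string
	WantBlock bool // if false, we want presence info only, not the block data
	Flags     WantFlags
}

// discoveryWant builds the broadcast request described above: the node asks
// only for HAVE, not for blocks and not for DONT_HAVE.
func discoveryWant(cid string) Want {
	return Want{Cid: cid, WantBlock: false, Flags: SendHave}
}

func main() {
	w := discoveryWant("QmExample")
	fmt.Println(w.Flags&SendHave != 0, w.Flags&SendDontHave != 0, w.WantBlock)
}
```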
When the node starts retrieving groups of blocks, it splits the requests for blocks in the group across peers. The node asks each peer for some of the blocks, and sets
A WANT with
Outstanding Queue Size
When the local node sends a request to a peer, the peer's response includes the number of outstanding blocks and their total size in bytes. For example:
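A sketch of such a response is below. The field names and the concrete numbers are assumptions; the proposal only says the response carries an outstanding block count and their total size.

```go
package main

import "fmt"

// Response models a Bitswap reply that, in addition to any block data,
// reports what the peer still has queued for the requester.
type Response struct {
	Blocks       [][]byte // block data included in this message
	PendingCount int      // blocks queued for us but not yet sent
	PendingBytes int      // total size in bytes of those queued blocks
}

func main() {
	// e.g. the peer sends one block now and reports that 3 more blocks
	// (75,000 bytes) are still queued for us.
	r := Response{Blocks: [][]byte{{0x01}}, PendingCount: 3, PendingBytes: 75000}
	fmt.Printf("pending: %d blocks, %d bytes\n", r.PendingCount, r.PendingBytes)
}
```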
This allows the local node to choose which peers to send requests to so as to
Follow the current model:
Keep peers busy
As the session discovers peers it moves them into a candidate peer list. The session sends each new want to the candidate peer with the least items in its live want queue. The live want queue size has a per-peer limit that varies according to how much data the peer is waiting to send to the local node (outstanding data):
Bitswap tries to keep the outstanding data just above zero using the queue size and a moving average of variance in the size of outstanding data.
The next WANT would be sent to Peer A as it has the smallest queue size (2 free spaces).
This allows Bitswap to send WANTs to the peers with the highest throughput, while responding to back pressure.
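The selection above might be sketched as follows. The limit thresholds are made-up values; the proposal only says the per-peer limit varies with outstanding data.

```go
package main

import "fmt"

type peerState struct {
	id          string
	liveWants   int // WANTs sent to this peer but not yet answered
	outstanding int // bytes the peer still has queued for us
}

// liveWantLimit widens the per-peer live want cap as the peer drains its
// outstanding data, so fast peers get more in-flight WANTs.
func liveWantLimit(outstandingBytes int) int {
	switch {
	case outstandingBytes == 0:
		return 32
	case outstandingBytes < 64*1024:
		return 16
	default:
		return 4
	}
}

// nextPeer picks the candidate peer with the most free slots in its live
// want queue (equivalently, the least items relative to its limit).
func nextPeer(peers []peerState) string {
	best, bestFree := "", -1
	for _, p := range peers {
		if free := liveWantLimit(p.outstanding) - p.liveWants; free > bestFree {
			best, bestFree = p.id, free
		}
	}
	return best
}

func main() {
	peers := []peerState{
		{id: "A", liveWants: 30, outstanding: 0},     // limit 32 -> 2 free
		{id: "B", liveWants: 16, outstanding: 1024},  // limit 16 -> 0 free
		{id: "C", liveWants: 4, outstanding: 200000}, // limit 4  -> 0 free
	}
	fmt.Println(nextPeer(peers)) // A: smallest queue relative to its limit
}
```

A backlogged peer's limit shrinks, so new WANTs naturally flow to whichever peers are keeping up — the back-pressure behaviour described above.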
The current implementation processes wants in groups (see #167). However, when bandwidth is limited we can only send individual wants or small groups of wants, so in this proposal we move towards processing wants individually (as a stream).
For each want CID, the session selects a peer by:
In order to determine how many peers to send a WANT to, each peer / want combination is assigned a probability called the "want potential". Wants are sent in order of want potential, then FIFO.
The session sends wants to multiple peers until the "want potential" is over a threshold. The threshold varies according to the ratio of unique / duplicate blocks received by the session (so as to trade off a high chance of receiving a block quickly against too much duplicate data).
The want potential changes as the local node receives messages from peers.
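The thresholding described above can be sketched like this. The potential values and threshold are made-up numbers, and descending-potential order is an assumption based on "wants are sent in order of want potential".

```go
package main

import (
	"fmt"
	"sort"
)

// peersToSend returns how many peers receive a given want: peers are taken
// in descending order of want potential until the summed potential crosses
// the threshold.
func peersToSend(potentials []float64, threshold float64) int {
	sorted := append([]float64(nil), potentials...)
	sort.Sort(sort.Reverse(sort.Float64Slice(sorted)))
	sum := 0.0
	for i, p := range sorted {
		sum += p
		if sum >= threshold {
			return i + 1
		}
	}
	return len(sorted) // threshold not reached: send to all known peers
}

func main() {
	// A lower threshold (e.g. when many duplicates are arriving) means
	// fewer peers receive each want.
	fmt.Println(peersToSend([]float64{0.5, 0.2, 0.3}, 0.7)) // 0.5 + 0.3 >= 0.7
}
```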
The session piggybacks informational requests onto WANTS it sends to peers.
Let's make it clear that these are flags. That is, when sending a want to a peer, one can specify one or more of the following flags:
We may also want to note that, when SEND_HAVE is specified, the remote peer may send the actual block if it determines that it's small enough (up to the remote peer).
8-9 in the diagram are a bit unclear. Let's specify exactly what we send to each peer.
We should probably talk about how we'd like to keep the queue size just above 0 by using the queue size + moving variance.
There's also an open question about whether we should just use queue length instead of the amount of data in the queue. That makes it easier to figure out how many wants we should send.
Something to consider - how do WantHave/SendDontHave work in the context of the PeerTaskQueue / load balancing on the responder side? Currently the responder checks whether it has the block before load balancing in the PeerTaskQueue occurs (https://github.com/ipfs/go-bitswap/blob/master/decision/engine.go#L284) -- but I wonder if the entire processing should occur before load balancing occurs? i.e. is there a viable DoS attack in just sending a ton of want requests with want_have=true / send_dont_have=true?
Another thing to consider -- ipfs/specs#201
I still would love it if this PR/spec idea got some love. If you're going to be relying on a more detailed conversation happening between peers, I think you need a way to verify everyone is hearing each other :)
Currently only wantlist deltas are sent, but there is no error correction, so if messages get lost or a peer hangs up, the only way to recover is to wait for a periodic wantlist rebroadcast (every 30 seconds I think -- see https://github.com/ipfs/go-bitswap/blob/master/messagequeue/messagequeue.go#L18).
It seems like the existing problems with this approach might be amplified by increasing the chattiness of Bitswap.
Those are good points, thanks Hannah.
I agree that we need to take precautions against DoS - maybe a per-client rate limit on the serving node.
With respect to recovering from disconnects, would it be sufficient to resend the whole wantlist on reconnect?
Agreed, I'm planning to improve the documentation to explain how Bitswap works in detail, as I mostly had to work it out from reading through the code :)
There are some issues about adding better documentation for Bitswap: