fix: wantlist overflow handling to select newer entries #629
base: main
Conversation
Wantlist overflow handling now cancels existing entries to make room for newer requests. This prevents the wantlist from filling up with CIDs that the server does not have. Fixes #527
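As a rough illustration of that strategy, here is a minimal, self-contained sketch; the entry type, truncateForNewer, and limit are invented names for this sketch, not the code in this PR. When a new request would overflow the list, the oldest existing entries are cancelled to make room for the newest ones.

```go
package main

import "fmt"

// entry stands in for a wantlist entry; in the real engine it would carry a
// CID and priority rather than a bare string key.
type entry struct {
	key string
}

// truncateForNewer drops the oldest existing entries so the newest requests
// fit under the limit. Both slices are assumed ordered oldest-first.
func truncateForNewer(existing, incoming []entry, limit int) (kept, cancelled []entry) {
	room := limit - len(incoming)
	if room < 0 {
		// More incoming wants than the whole list can hold: keep only the
		// newest of the incoming entries and cancel everything existing.
		incoming = incoming[-room:]
		room = 0
	}
	if room > len(existing) {
		room = len(existing)
	}
	// Existing entries beyond the remaining room are cancelled, oldest first.
	cancelled = existing[:len(existing)-room]
	kept = append(append([]entry{}, existing[len(existing)-room:]...), incoming...)
	return kept, cancelled
}

func main() {
	existing := []entry{{"a"}, {"b"}, {"c"}}
	incoming := []entry{{"d"}, {"e"}}
	kept, cancelled := truncateForNewer(existing, incoming, 4)
	fmt.Println(kept, cancelled) // [{b} {c} {d} {e}] [{a}]
}
```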
Codecov Report
Attention: Patch coverage is …
@@ Coverage Diff @@
## main #629 +/- ##
==========================================
+ Coverage 59.79% 59.87% +0.07%
==========================================
Files 238 238
Lines 29984 30014 +30
==========================================
+ Hits 17930 17971 +41
+ Misses 10434 10425 -9
+ Partials 1620 1618 -2
// … are not in the new request.
for _, entry := range wants {
	if e.peerLedger.CancelWant(p, entry.Cid) {
		e.peerRequestQueue.Remove(entry.Cid, p)
It may be better not to remove the entry from peerRequestQueue here, since this one is being replaced with an identical want from the new request. This option would only need a minor change (sketched below).
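A minimal sketch of that suggested change, assuming a set of CIDs computed from the new request; the ledger and queue below are invented stand-ins for the engine's peerLedger and peerRequestQueue, not the real boxo types:

```go
package main

import "fmt"

// Stub types modeling only the calls used in the snippet above.
type cidKey string

type ledger struct{ wants map[cidKey]bool }

func (l *ledger) CancelWant(c cidKey) bool {
	if !l.wants[c] {
		return false
	}
	delete(l.wants, c)
	return true
}

type queue struct{ tasks map[cidKey]bool }

func (q *queue) Remove(c cidKey) { delete(q.tasks, c) }

// cancelReplaced cancels ledger entries but, per the suggestion above, keeps
// the queued task for any CID the new request re-adds, since removing and
// immediately re-queueing an identical want is wasted work.
func cancelReplaced(l *ledger, q *queue, oldWants, newWants []cidKey) {
	replacing := make(map[cidKey]struct{}, len(newWants))
	for _, c := range newWants {
		replacing[c] = struct{}{}
	}
	for _, c := range oldWants {
		if l.CancelWant(c) {
			if _, ok := replacing[c]; !ok {
				q.Remove(c)
			}
		}
	}
}

func main() {
	l := &ledger{wants: map[cidKey]bool{"a": true, "b": true}}
	q := &queue{tasks: map[cidKey]bool{"a": true, "b": true}}
	cancelReplaced(l, q, []cidKey{"a", "b"}, []cidKey{"b"})
	fmt.Println(q.tasks) // map[b:true] — "b" keeps its task, only "a" is dequeued
}
```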
if e.maxCidSize != 0 && uint(entry.Cid.ByteLen()) > e.maxCidSize {
	// Ignore requests about CIDs that big.
	continue
}
// …
wants = filteredWants
Not sure which line to leave this comment on, but adding some thoughts based on conversations with @gammazero and @Stebalien
It might be helpful if we logically had two "wantlists" here:
- The set of blocks that the server is currently in the middle of processing responses for (i.e. request-response style semantics)
- The set of blocks that the server doesn't currently have, but if it receives them it will notify / send them out to requesters (i.e. subscription semantics)
We currently have the wantlist and taskqueue, which might be these two queues, but they also might not, given that we want to be able to do things like cancel tasks in the taskqueue without re-enumerating the entire queue in case it's large. It might also be fine if capacity for subscription-wants only exists once the request-response wants have been satisfied (e.g. if there are tons of new requests coming in, flushing out all the old subscriptions is probably fine). It seems OK to have more capacity for request-response wants, which are supposed to be "moving", than for subscriptions, which can stay and take up memory indefinitely.
Also, for others watching: I think it's fairly clear that some protocol changes to help with backpressure here would be great, but since they're protocol changes those will have to wait for another time 😅.
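To make the split concrete, here is a rough, self-contained sketch; splitWantlist and its fields are invented for illustration and do not correspond to actual boxo types. It only illustrates the eviction asymmetry between the two sets:

```go
package main

import "fmt"

// key is a placeholder for a CID.
type key string

// splitWantlist separates request-response wants ("active") from
// subscription wants ("subscribed"), with independent capacities so a burst
// of new requests can flush stale subscriptions without touching in-flight
// work.
type splitWantlist struct {
	active     map[key]struct{} // blocks the server is currently responding to
	subscribed []key            // blocks to forward if they ever arrive, oldest first
	maxActive  int
	maxSub     int
}

// addActive refuses rather than evicts when full: active wants represent
// "moving" work that should drain on its own.
func (w *splitWantlist) addActive(k key) bool {
	if len(w.active) >= w.maxActive {
		return false
	}
	w.active[k] = struct{}{}
	return true
}

// addSubscription evicts the oldest entry when full: subscriptions can sit
// in memory indefinitely, so newer interest wins.
func (w *splitWantlist) addSubscription(k key) {
	if len(w.subscribed) >= w.maxSub {
		w.subscribed = w.subscribed[1:]
	}
	w.subscribed = append(w.subscribed, k)
}

func main() {
	w := &splitWantlist{active: map[key]struct{}{}, maxActive: 2, maxSub: 2}
	fmt.Println(w.addActive("a"), w.addActive("b"), w.addActive("c")) // true true false
	w.addSubscription("x")
	w.addSubscription("y")
	w.addSubscription("z")
	fmt.Println(w.subscribed) // [y z] — oldest subscription evicted
}
```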
(slightly tangential)
We use Bitswap in a case where more than 1024 (the default max) CIDs are requested in a single GetBlocks call. When we do that, Bitswap hangs because of this limit and only recovers a minute later, after rebroadcasting. It would be great if Bitswap clients could handle that case and avoid sending more than the protocol-wide constant by backpressuring the caller.
It would be great if Bitswap clients could handle that case and avoid sending more than the protocol-wide constant by backpressuring the caller.

Could you clarify your suggestion? Reading it, I see 3 options (but these might not even be correct 😅):
- Having the server backpressure the client on the other side of the wire
  - Similar to my comment above, and AFAICT requires some protocol changes to accommodate
- Have the bitswap client internally batch GetBlocks calls that are larger than maxWantlistSize into smaller batches before returning them, in case each batch is fully returned before the rebroadcast interval? (see the sketch after this list)
  - Doable if it'll be helpful, although it's a bit gross since A) maxWantlistSize shouldn't really be a protocol/network-wide thing and should be per-client, and B) this batching can be done outside of the bitswap client package.
- The bitswap client backpressuring the caller (e.g. code that's walking a DAG)
  - My suspicion is this would best be served by having a streaming version of GetBlocks that you could block on, which is a separate problem that seems like [ipfs/go-bitswap] Proposal: Streaming GetBlocks #121. If you're interested in that, let's chat on the issue there.
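To make option 2 concrete, here is a minimal sketch of such a batching wrapper living outside the client package, assuming a GetBlocks-shaped function; batchGetBlocks and the placeholder cid/block types are hypothetical, not part of the bitswap client:

```go
package main

import (
	"context"
	"fmt"
)

// Placeholder types; real code would use cid.Cid and blocks.Block from boxo.
type cid string
type block string

// getBlocksFn stands in for the bitswap client's GetBlocks.
type getBlocksFn func(ctx context.Context, ks []cid) (<-chan block, error)

// batchGetBlocks splits a large request into chunks of at most limit CIDs and
// issues them sequentially, forwarding results on a single channel, so the
// remote wantlist never sees more than limit wants from this call. In this
// sketch an error from a later batch simply ends the stream.
func batchGetBlocks(ctx context.Context, get getBlocksFn, ks []cid, limit int) <-chan block {
	out := make(chan block)
	go func() {
		defer close(out)
		for start := 0; start < len(ks); start += limit {
			end := start + limit
			if end > len(ks) {
				end = len(ks)
			}
			ch, err := get(ctx, ks[start:end])
			if err != nil {
				return
			}
			// Drain the whole batch before requesting the next one, keeping
			// the number of outstanding wants bounded.
			for b := range ch {
				select {
				case out <- b:
				case <-ctx.Done():
					return
				}
			}
		}
	}()
	return out
}

func main() {
	// A fake fetcher that returns blocks immediately, standing in for the
	// real client.
	fake := func(ctx context.Context, ks []cid) (<-chan block, error) {
		ch := make(chan block, len(ks))
		for _, k := range ks {
			ch <- block("data:" + string(k))
		}
		close(ch)
		return ch, nil
	}
	for b := range batchGetBlocks(context.Background(), fake, []cid{"a", "b", "c"}, 2) {
		fmt.Println(b)
	}
}
```

Draining each batch before issuing the next trades some pipelining for a bounded outstanding-want count; as noted above, this could live entirely outside the bitswap client package.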
Thank you for the elaborate response!
My suggestion was some form of 2 and 3. The client should be able to deal with server-side request rate-limiting; otherwise, the client gets stuck expecting to be served while the server simply cuts off its wants. The rebroadcast after one minute helps, but that is still a minute of latency plus request-spamming overhead over the wire.
Ideally, we would make a protocol change as described in 1, but as it's breaking, we may consider other less clean options, like or similar to 2. Setting a protocol-wide maxWantlistSize is gross, I agree. Another option might be negotiating the limit between the client and the server, so the client knows it should never exceed it.
Option 3 is complementary and provides a new, powerful way to interface with a client. However, I just realized that it is not necessary for our case if the client is smart enough. In our case, we have a flat structure, i.e. we don't traverse a DAG where we unpack IPLD nodes to get more CID links, fetch them, and unpack again to get to the data. Essentially, we know all the CIDs in advance and could simply ask Bitswap to get all of them over GetBlocks, as long as the client is smart enough not to run into any limitations of its immediate peers, which is what we are currently facing with this issue.
Re the protocol-breaking option 1: we are actually fine with this being a protocol-breaking change, as we are building a new Bitswap-based protocol that hasn't been deployed yet, and I believe there is a more or less clear way to handle protocol version bumps in the bitswap network component.