
fix: wantlist overflow handling to select newer entries #629

Draft: gammazero wants to merge 1 commit into main
Conversation

gammazero (Contributor) opened this pull request:

Wantlist overflow handling now cancels existing entries to make room for newer requests. This fix prevents the wantlist from filling up with CIDs that the server does not have.

Fixes #527

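For illustration only, here is a minimal sketch of the policy described above, using string stand-ins for CIDs and an assumed maxEntries cap; the actual change works through the engine's peerLedger and peerRequestQueue rather than plain slices, and the function name is hypothetical.

```go
package main

import "fmt"

// applyOverflowPolicy keeps a wantlist within maxEntries by dropping
// (cancelling) the oldest existing entries to make room for newer incoming
// wants, so the list cannot stay filled with CIDs the server never has.
// Both slices are assumed to be ordered oldest-first.
func applyOverflowPolicy(existing, incoming []string, maxEntries int) []string {
	if len(incoming) > maxEntries {
		// Incoming alone exceeds the cap; keep only its newest entries.
		incoming = incoming[len(incoming)-maxEntries:]
	}
	room := maxEntries - len(incoming)
	if len(existing) > room {
		// Cancel the oldest existing entries to make room for the new wants.
		existing = existing[len(existing)-room:]
	}
	return append(existing, incoming...)
}

func main() {
	existing := []string{"cid-a", "cid-b", "cid-c"}
	incoming := []string{"cid-d", "cid-e"}
	fmt.Println(applyOverflowPolicy(existing, incoming, 4))
	// Output: [cid-b cid-c cid-d cid-e]
}
```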
@gammazero gammazero requested a review from a team as a code owner June 23, 2024 20:23

codecov bot commented Jun 23, 2024

Codecov Report

Attention: Patch coverage is 95.45455% with 3 lines in your changes missing coverage. Please review.

Project coverage is 59.87%. Comparing base (dfd4a53) to head (713faee).


@@            Coverage Diff             @@
##             main     #629      +/-   ##
==========================================
+ Coverage   59.79%   59.87%   +0.07%     
==========================================
  Files         238      238              
  Lines       29984    30014      +30     
==========================================
+ Hits        17930    17971      +41     
+ Misses      10434    10425       -9     
+ Partials     1620     1618       -2     
Files                                              Coverage Δ
bitswap/message/message.go                         81.97% <100.00%> (+0.97%) ⬆️
bitswap/server/internal/decision/peer_ledger.go    94.23% <100.00%> (+0.05%) ⬆️
bitswap/server/internal/decision/engine.go         91.78% <94.64%> (+0.55%) ⬆️

... and 12 files with indirect coverage changes

@lidel changed the title from "Fix wantlist overflow handling to select newer entries." to "fix: wantlist overflow handling to select newer entries" on Jun 24, 2024
	// are not in the new request.
	for _, entry := range wants {
		if e.peerLedger.CancelWant(p, entry.Cid) {
			e.peerRequestQueue.Remove(entry.Cid, p)
gammazero (Contributor, Author) commented:

It may be better to not remove the entry from peerRequestQueue here, since this one is being replaced with an identical want from the new request.
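A toy illustration of that point, with string stand-ins for CIDs and hypothetical names: queued tasks are removed only for cancelled CIDs that the new request does not re-ask for, since an identical task would just be re-added.

```go
package main

import "fmt"

// tasksToRemove returns the cancelled CIDs whose queued tasks should actually
// be removed, skipping any CID that the new request re-includes.
func tasksToRemove(cancelled []string, newRequest map[string]bool) []string {
	var remove []string
	for _, c := range cancelled {
		if !newRequest[c] {
			remove = append(remove, c)
		}
	}
	return remove
}

func main() {
	cancelled := []string{"cid-a", "cid-b", "cid-c"}
	newRequest := map[string]bool{"cid-b": true}
	fmt.Println(tasksToRemove(cancelled, newRequest)) // [cid-a cid-c]
}
```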

@Wondertan (Member) commented:

This option will need a minor change

	if e.maxCidSize != 0 && uint(entry.Cid.ByteLen()) > e.maxCidSize {
		// Ignore requests about CIDs that big.
		continue
wants = filteredWants
Contributor commented:

Not sure which line to leave this comment on, but adding some thoughts based on conversations with @gammazero and @Stebalien

It might be helpful if we logically had two "wantlists" here:

  1. The set of blocks that the server is currently in the middle of processing responses for (i.e. request-response style semantics)
  2. The set of blocks that the server doesn't currently have, but if it receives them it will notify / send them out to requesters (i.e. subscription semantics)

We currently have the wantlist and the taskqueue, which might serve as these two lists, but they also might not, given that we want to be able to do things like cancel tasks in the taskqueue without re-enumerating the entire queue in case it's large. It might also be fine if capacity for subscription-wants only really exists once the request-response wants have been satisfied: if tons of new requests are coming in, flushing out all the old subscriptions is probably fine, but it seems OK to give more capacity to request-response wants, which are supposed to be "moving", than to subscriptions that can stay and take up memory indefinitely. (A rough sketch of the two-list idea follows at the end of this comment.)

Also, for others watching: I think it's fairly clear that some protocol changes to help with backpressure here would be great, but since they're protocol changes those will have to wait for another time 😅.
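As a rough sketch of what those two logical wantlists could look like (all names and types below are illustrative, not existing engine code):

```go
package main

import "fmt"

// CID is a stand-in for a real content identifier.
type CID string

// peerWants separates a peer's wants into the two logical sets described
// above; both fields and their names are illustrative only.
type peerWants struct {
	// active: blocks the server has (or is fetching) and is currently
	// responding with, i.e. request-response semantics.
	active map[CID]struct{}
	// subscribed: blocks the server does not have yet; if one arrives later,
	// the requester is notified, i.e. subscription semantics.
	subscribed map[CID]struct{}
}

// admit records a new want. Subscription capacity is bounded, and an old
// subscription is evicted to make room, so subscriptions cannot accumulate
// and hold memory indefinitely; active (request-response) wants are not
// bounded here because they are expected to keep moving.
func (w *peerWants) admit(c CID, haveBlock bool, maxSubscribed int) {
	if haveBlock {
		w.active[c] = struct{}{}
		return
	}
	if len(w.subscribed) >= maxSubscribed {
		for old := range w.subscribed { // evict an arbitrary old subscription
			delete(w.subscribed, old)
			break
		}
	}
	w.subscribed[c] = struct{}{}
}

func main() {
	w := &peerWants{active: map[CID]struct{}{}, subscribed: map[CID]struct{}{}}
	w.admit("have-1", true, 2)
	w.admit("missing-1", false, 2)
	w.admit("missing-2", false, 2)
	w.admit("missing-3", false, 2) // evicts one earlier subscription
	fmt.Println(len(w.active), len(w.subscribed)) // 1 2
}
```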

@Wondertan (Member) commented on Jun 25, 2024:

(slightly tangential)
We use Bitswap in a case where more than 1024 (the default max) CIDs are requested in a single GetBlocks call. When we do that, Bitswap hangs because of this limit and only recovers after about a minute, once rebroadcasting kicks in. It would be great if Bitswap clients could handle that case and avoid sending more than the protocol-wide constant by backpressuring the caller.

@aschmahmann (Contributor) commented on Jun 25, 2024:

> It would be great if Bitswap clients could handle that case and avoid sending more than the protocol-wide constant by backpressuring the caller.

Could you clarify your suggestion? Reading it I see 3 options (but these might not even be correct 😅):

  1. Have the server backpressure the client on the other side of the wire.
    • Similar to my comment above, and AFAICT this requires some protocol changes to accommodate.
  2. Have the bitswap client internally split GetBlocks calls larger than maxWantlistSize into batches, returning blocks only as each batch is fully served, in the hope that a batch completes before the rebroadcast interval?
    • Doable if it'll be helpful, although it's a bit gross since A) maxWantlistSize shouldn't really be a protocol/network-wide thing and should instead be per-client, and B) this batching can be done outside of the bitswap client package (a rough sketch of that approach follows this list).
  3. The bitswap client backpressuring the caller (e.g. code that's walking a DAG).
    • My suspicion is this would best be served by a streaming version of GetBlocks that you could block on, which is a separate problem covered by [ipfs/go-bitswap] Proposal: Streaming GetBlocks #121. If you're interested in that, let's chat on the issue there.
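A rough sketch of option 2 done outside the bitswap client package: split a large key list into batches no larger than an assumed per-peer cap and drain each batch before requesting the next. The blockFetcher interface, maxWantlistSize constant, getAllBatched helper, and mockFetcher are all illustrative; only the GetBlocks(ctx, []cid.Cid) (<-chan blocks.Block, error) signature mirrors the fetcher discussed here, and it assumes the returned channel closes once the batch is served or the context is cancelled.

```go
package main

import (
	"context"
	"fmt"

	blocks "github.com/ipfs/go-block-format"
	"github.com/ipfs/go-cid"
)

// blockFetcher is the subset of the bitswap client API used here.
type blockFetcher interface {
	GetBlocks(ctx context.Context, keys []cid.Cid) (<-chan blocks.Block, error)
}

// maxWantlistSize mirrors the per-peer server-side limit discussed above.
const maxWantlistSize = 1024

// getAllBatched requests keys in batches that stay under the server-side
// wantlist cap, draining each batch fully before requesting the next, so the
// remote wantlist is never asked to hold more than one batch at a time.
func getAllBatched(ctx context.Context, f blockFetcher, keys []cid.Cid, out chan<- blocks.Block) error {
	defer close(out)
	for start := 0; start < len(keys); start += maxWantlistSize {
		end := start + maxWantlistSize
		if end > len(keys) {
			end = len(keys)
		}
		ch, err := f.GetBlocks(ctx, keys[start:end])
		if err != nil {
			return err
		}
		for blk := range ch { // drain this batch before requesting the next
			select {
			case out <- blk:
			case <-ctx.Done():
				return ctx.Err()
			}
		}
	}
	return nil
}

// mockFetcher serves every requested CID immediately, just for the demo below.
type mockFetcher struct{ store map[cid.Cid]blocks.Block }

func (m mockFetcher) GetBlocks(ctx context.Context, keys []cid.Cid) (<-chan blocks.Block, error) {
	ch := make(chan blocks.Block, len(keys))
	for _, k := range keys {
		if b, ok := m.store[k]; ok {
			ch <- b
		}
	}
	close(ch)
	return ch, nil
}

func main() {
	store := map[cid.Cid]blocks.Block{}
	var keys []cid.Cid
	for i := 0; i < 3000; i++ {
		b := blocks.NewBlock([]byte(fmt.Sprintf("block-%d", i)))
		store[b.Cid()] = b
		keys = append(keys, b.Cid())
	}
	out := make(chan blocks.Block)
	go getAllBatched(context.Background(), mockFetcher{store}, keys, out) // error ignored for brevity
	n := 0
	for range out {
		n++
	}
	fmt.Println("received", n, "blocks") // received 3000 blocks
}
```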

Member commented:

Thank you for the elaborate response!

My suggestion was some form of 2 and 3. The client should be able to deal with server-side request rate-limiting; otherwise, the client gets stuck expecting to be served while the server simply cuts off its wants. The rebroadcast after one minute helps, but that's still a minute of delay plus request-spamming overhead over the wire.

Ideally, we would do a protocol change as described in 1, but as it's breaking, we may consider other, less clean options like 2 or something similar. Setting a protocol-wide maxWantlistSize is gross, I agree. Another option might be negotiating the limit between the client and the server, so the client knows it should never exceed it.

Option 3 is complementary and provides a new, powerful way to interface with the client. However, I just realized that it is not necessary for our case if the client is smart enough. We have a flat structure, i.e. we don't traverse a DAG where we unpack IPLD nodes to get more CID links, fetch them, and unpack again to get to the data. Essentially, we know all the CIDs in advance and could simply ask Bitswap to get all of them via GetBlocks, as long as the client is smart enough not to run into the limitations of its immediate peers, which is exactly what we are facing with this issue.

Member commented:

Re the protocol-breaking option 1: we are actually fine with this being a protocol-breaking change, as we are building a new Bitswap-based protocol that hasn't been deployed yet, and I believe there is a more or less clear way to handle protocol version bumps in the bitswap network component.

Successfully merging this pull request may close these issues:

bitswap/server: wantlist overflow fails in a toxic manner, preventing any data transfer
3 participants