fix: keep enriching provider records past stale ones #140

Merged: lidel merged 3 commits into main from improve-provider-enrichment on May 13, 2026

@lidel lidel commented May 12, 2026

This PR brings ipfs-check closer to how Kubo's provider lookup behaves during real-world retrieval. cc @aschmahmann, who noticed https://check.ipfs.network/ being flaky with /ipns/ipfs.tech (which has high drive-by provider churn).

Problem

Provider records returned by the DHT often arrive without addresses. For CIDs with high provider churn like /ipns/ipfs.tech, the previous 10-record cap was exhausted by stale records before any usable provider arrived. Even when the routing layer did return one, each per-provider probe ran on a fresh libp2p host with an empty peerstore, so addresses the daemon had already learned for that peer were ignored.

Solution

Keep the same record-arrival behavior, but stop counting address-less records against the soft cap, drain longer, let FindPeer enrichment run on the full request context, and seed each bitswap dial with addresses the daemon already knows for the peer.

Details

Commit 90bd607 (soft/hard caps + UI hint)

  • daemon.go: split the per-provider worker into checkProvider; stream with count=0 from both DHT and IPNI; soft cap of 20 counts only providers that resolved at least one usable multiaddr; hard cap of 40 bounds total attempts. 20 is sized for how many providers Kubo's bitswap accumulates across a real retrieval (10 per provider-search round, several rounds per session).
  • web/script.js: when every returned record lacks a multiaddr, show a hint that records are likely stale and point at Backend Config.
  • CHANGELOG.md: documents the UI hint and the soft/hard cap behavior.
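
The soft/hard cap accounting described above can be sketched roughly as follows. This is a simplified, sequential illustration and not the code in daemon.go: the `Record` type and the `drain` and `feed` helpers are hypothetical names, the real worker runs probes concurrently, and a record can also become usable through FindPeer enrichment, which this sketch leaves out.

```go
package main

import "fmt"

// Record is a hypothetical, simplified provider record (not the real ipfs-check type).
type Record struct {
	PeerID string
	Addrs  []string // empty when the routing layer returned no addresses
}

const (
	softCap = 20 // providers that have at least one usable address
	hardCap = 40 // every record attempted, usable or not
)

// drain consumes records until the soft cap, the hard cap, or the end of the
// stream, and reports how many records were attempted and how many were usable.
func drain(records <-chan Record) (attempted, usable int) {
	for rec := range records {
		attempted++
		if len(rec.Addrs) > 0 {
			usable++ // only addressable providers count against the soft cap
		}
		if usable >= softCap || attempted >= hardCap {
			break
		}
	}
	return attempted, usable
}

// feed builds a closed, buffered channel of n records; hasAddr decides which
// records carry an address.
func feed(n int, hasAddr func(i int) bool) <-chan Record {
	ch := make(chan Record, n)
	for i := 0; i < n; i++ {
		rec := Record{PeerID: fmt.Sprintf("peer-%d", i)}
		if hasAddr(i) {
			rec.Addrs = []string{"/ip4/192.0.2.1/tcp/4001"}
		}
		ch <- rec
	}
	close(ch)
	return ch
}

func main() {
	// 100 stale records: stops at the hard cap without finding anything usable.
	fmt.Println(drain(feed(100, func(int) bool { return false })))
	// 100 healthy records: stops as soon as the soft cap is reached.
	fmt.Println(drain(feed(100, func(int) bool { return true })))
}
```

With the previous single 10-record cap, a stream that opens with ten stale records would stop before any usable provider arrived. Here stale records only consume the hard-cap budget.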

Commit 7f8dab3 (peerstore-seeded dials, longer timeout)

  • checkProvider seeds each bitswap dial with addresses the daemon's main host already holds for the target peer, on top of record-supplied addrs and any FindPeer fallback. Mirrors how Kubo's bitswap dials.
  • The probe host stays per-provider. Sharing one host across the request's concurrent probes is unsafe: vole.CheckBitswapCID installs a bitswap stream handler via host.SetStreamHandler, which replaces any prior handler. A shared host would therefore route every probe's responses to whichever receiver was registered last, and the other probes' messages would fail the sender-vs-target check inside bsReceiver.
  • Bitswap dial timeout raised from 15s to 30s so more NAT hole-punches and relay setups have time to complete, simulating real-world Kubo behavior.
  • web/script.js: the "records likely stale" hint now requires at least one returned record, so it does not fire when the routing layer returned nothing at all.
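
The address seeding in checkProvider amounts to taking the union of three address sources before dialing. A minimal sketch of that merge, using plain strings instead of multiaddrs to stay self-contained; `mergeAddrs` and the variable names are hypothetical, and the actual code may order or filter addresses differently:

```go
package main

import "fmt"

// mergeAddrs returns the union of the given address lists, dropping
// duplicates and keeping first-seen order.
func mergeAddrs(sources ...[]string) []string {
	seen := make(map[string]struct{})
	var merged []string
	for _, src := range sources {
		for _, addr := range src {
			if _, dup := seen[addr]; dup {
				continue
			}
			seen[addr] = struct{}{}
			merged = append(merged, addr)
		}
	}
	return merged
}

func main() {
	fromRecord := []string{} // address-less provider record
	fromPeerstore := []string{"/ip4/192.0.2.7/tcp/4001", "/ip4/192.0.2.7/udp/4001/quic-v1"}
	fromFindPeer := []string{"/ip4/192.0.2.7/tcp/4001"} // overlaps with the peerstore
	fmt.Println(mergeAddrs(fromRecord, fromPeerstore, fromFindPeer))
}
```

In this example the provider record carries no addresses, yet the dial still has two candidates because the daemon's peerstore already knew the peer.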

Difference

[Screenshots: before vs. after (this PR)]

Smoke test on a cold local ipfs-check backend daemon for /ipns/ipfs.tech (accelerated DHT off):

| | Before (main) | After (this PR) |
|---|---|---|
| Attempted records | 5 | 40 |
| With at least one address | 0 | 12 |
| Bitswap.Found=true | 0 | 2 |

Production with the accelerated DHT warmed up should do better.

Should we revisit other places?

Namely, does Kubo fail on /ipns/ipfs.tech? I think not.
I ran two fresh Kubo nodes and got identical results:

| Config | findprovs | ipfs ls /ipns/ipfs.tech |
|---|---|---|
| Default (Routing.Type=auto, autoconf on) | 20 providers | success in 2.3s |
| DHT-only (Routing.Type=dht, no delegated) | 20 providers | success in 4.7s |

Both Kubo runs effectively retrieve over the DHT for this CID. cid.contact returns 404. The DHT-only run had no delegated routers at all and still succeeded. The retrieval gap is therefore specific to ipfs-check, not the routing layer. It came from the cold per-probe host and the early cap on records, both of which this PR fixes.

Provider records returned by the DHT often arrive without addresses, so
the previous 10-record cap was exhausted by stale records before any
usable provider arrived. Keep the same record-arrival behavior but stop
counting address-less records against the soft cap, drain longer, and
let FindPeer enrichment run on the full request context.

- daemon.go: split the per-provider worker into checkProvider; stream
  with count=0 from both DHT and IPNI; soft cap counts only providers
  that resolved at least one usable multiaddr; hard cap bounds total
  attempts. Mirror Kubo's `ipfs routing findprovs --num-providers`
  default of 20.
- web/script.js: when every returned record lacks a multiaddr, show a
  hint that records are likely stale and point at Backend Config.
- CHANGELOG.md: document the UI hint and the soft/hard cap behavior.

github-actions Bot commented May 12, 2026

🚀 Build Preview on IPFS ready

@lidel lidel requested a review from a team May 12, 2026 21:59
A fresh libp2p host was spawned per provider, so each probe started
with an empty peerstore and ignored every address the daemon had
already learned about the peer. Kubo's bitswap dials from the
daemon-wide peerstore.

- checkProvider seeds each bitswap dial with addresses the daemon's
  main host already holds for the target peer, on top of
  record-supplied addrs and any FindPeer fallback. The probe host
  itself stays per-provider: vole.CheckBitswapCID installs a bitswap
  stream handler via host.SetStreamHandler, which replaces any prior
  handler, so a shared host would deliver concurrent probes'
  responses to the wrong receiver.
- Bitswap dial timeout raised from 15s to 30s so NAT hole-punches and
  relay setup can complete.
- web/script.js: the stale-records hint now requires at least one
  returned record, so it does not fire when the routing layer
  returned nothing at all.
- CHANGELOG entries rewritten to lead with user-visible effects; the
  constant comment now references Kubo bitswap's per-round size
  rather than the one-shot findprovs CLI default.

Smoke test on a cold local daemon for /ipns/ipfs.tech:
  before: 5 attempted, 0 with addrs, 0 Bitswap.Found
  after:  40 attempted, 12 with addrs, 2 Bitswap.Found
@lidel lidel force-pushed the improve-provider-enrichment branch from 175245c to 7f8dab3 Compare May 12, 2026 22:13
@lidel lidel marked this pull request as ready for review May 12, 2026 22:19
Comment thread daemon.go Outdated
Co-authored-by: Guillaume Michel <guillaumemichel@users.noreply.github.com>
@lidel lidel merged commit 7b9f3d3 into main May 13, 2026
11 checks passed
@lidel lidel deleted the improve-provider-enrichment branch May 13, 2026 23:07