Description
Problem
When an internal HAMT shard block is requested by CID, the gateway renders a directory listing with correct filenames but every link 404s.
Example: bafybeihn2f7lhumh4grizksi2fl233cyszqadkn424ptjajfenykpsaiw4 is a HAMT directory. bafybeig3pmmx75s7ngcgfrqiw7is67dxrzrrfltpjazaei4h2mhh4q3hqe is one of its internal shard nodes.
- /ipfs/bafybeig3pmmx75s7ngcgfrqiw7is67dxrzrrfltpjazaei4h2mhh4q3hqe/ -- renders directory listing, filenames look fine
- /ipfs/bafybeig3pmmx75s7ngcgfrqiw7is67dxrzrrfltpjazaei4h2mhh4q3hqe/Trailing_arm -- 404
HAMT lookup consumes log2(fanout) hash bits per trie level, always starting at bit 0. That is correct at the root but wrong at an internal shard, whose entries were placed using a deeper bit offset. The UnixFS spec stores no depth field, so a shard node cannot know its own level.
The listing works because MapIterator walks links sequentially. The lookup (LookupByString in go-unixfsnode) uses hash navigation and picks the wrong bucket.
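The bit-offset mismatch can be illustrated with a minimal sketch. It is not the go-unixfsnode code: `hashBits` and `bucketAt` are hypothetical names, SHA-256 stands in for the real HAMT name hash to keep the example stdlib-only, and consuming bits from the most significant end is an assumption made for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/bits"
)

// hashBits is a stand-in for the HAMT name hash (assumption: the real
// implementation uses a different hash function; SHA-256 is used here
// only so the sketch needs nothing outside the standard library).
func hashBits(name string) uint64 {
	sum := sha256.Sum256([]byte(name))
	return binary.BigEndian.Uint64(sum[:8])
}

// bucketAt returns the bucket index a name maps to at the given trie
// depth, consuming log2(fanout) bits per level from the top of the hash.
func bucketAt(name string, fanout, depth uint) (uint64, error) {
	if fanout < 2 || fanout&(fanout-1) != 0 {
		return 0, fmt.Errorf("fanout %d is not a power of two", fanout)
	}
	width := uint(bits.TrailingZeros(fanout)) // log2(fanout)
	if width*(depth+1) > 64 {
		return 0, fmt.Errorf("depth %d exhausts the 64-bit hash", depth)
	}
	shift := 64 - width*(depth+1)
	return (hashBits(name) >> shift) & uint64(fanout-1), nil
}

func main() {
	// A lookup that starts at an internal shard still uses depth 0,
	// while the entry was stored using the bits for depth 1 or deeper.
	for depth := uint(0); depth < 3; depth++ {
		idx, err := bucketAt("Trailing_arm", 256, depth)
		if err != nil {
			fmt.Println(err)
			return
		}
		fmt.Printf("depth %d -> bucket %02X\n", depth, idx)
	}
}
```

Unless the hash happens to repeat the same bits at both offsets, the bucket printed for depth 0 differs from the one for depth 1, which is why a depth-0 lookup inside an internal shard lands on the wrong link.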
Prior art
ipfs/service-worker-gateway#1014 fixes this in the JS stack by retrying with translateHAMTPath: false on 404. The Go stack has no equivalent toggle -- path resolution goes through the IPLD ADL layer in go-unixfsnode where HAMT reification is automatic.
The server-side Go stack also has a stricter risk profile (DoS vectors matter more than in a browser service worker).
Possible solutions
A. Detect internal shard, render raw HAMT links
Detect via hash prefix alignment check, then render raw block links (NNfilename with CID-based hrefs) instead of a normal directory listing.
- Pro: zero broken links, honest UX
- Con: CID-based links need care on subdomain gateways; template and plumbing changes in boxo/gateway
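A hash prefix alignment check could look roughly like the sketch below. It assumes link names carry an uppercase hex bucket prefix padded to the width of `fanout-1` (the NN in NNfilename), that sub-shard links consist of the prefix alone, and that the hash is passed in by the caller. `classify` and the verdict names are hypothetical and are not part of boxo.

```go
package main

import (
	"fmt"
	"math/bits"
	"strconv"
)

type verdict string

const (
	rootAligned   verdict = "root-aligned"
	internalShard verdict = "internal-shard"
	undetermined  verdict = "undetermined"
)

// classify compares each value link's hex prefix with the bucket its
// filename would get at depth 0. A mismatch means the node was built
// at a deeper level. fanout is assumed to be a power of two.
func classify(linkNames []string, fanout uint, hash func(string) uint64) verdict {
	width := uint(bits.TrailingZeros(fanout)) // log2(fanout)
	padLen := len(fmt.Sprintf("%X", fanout-1))
	sawValue := false
	for _, name := range linkNames {
		if len(name) <= padLen {
			continue // prefix only: a sub-shard link, carries no filename
		}
		prefix, err := strconv.ParseUint(name[:padLen], 16, 64)
		if err != nil {
			continue // not a HAMT-style link name
		}
		sawValue = true
		if prefix != hash(name[padLen:])>>(64-width) {
			return internalShard
		}
	}
	if !sawValue {
		// Only sub-shard links: this block alone cannot decide.
		return undetermined
	}
	return rootAligned
}

func main() {
	// Toy hash for demonstration only: depth 0 -> AB, depth 1 -> CD.
	toy := func(string) uint64 { return 0xABCD000000000000 }
	fmt.Println(classify([]string{"ABfile.txt", "17"}, 256, toy))
	fmt.Println(classify([]string{"CDfile.txt"}, 256, toy))
	fmt.Println(classify([]string{"17", "9F"}, 256, toy))
}
```

The `undetermined` case corresponds to a node whose links are all sub-shards, where one more block would have to be inspected before deciding.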
B. Non-recursive fallback scan in LookupByString
On lookup failure, scan value links in the current node by name (no sub-shard loading).
- Pro: fixes all callers (gateway, CLI, IPLD traversal), not just HTML
- Con: ~5-10% of entries in sub-shards are still not found -- partially broken is worse than clearly different; requires go-unixfsnode changes
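As a rough illustration of option B, the fallback could be a linear scan over the links already present in the current block. The `link` type and `scanValueLinks` are hypothetical and do not reflect go-unixfsnode's actual types; the CIDs are placeholders.

```go
package main

import "fmt"

// link is a simplified stand-in for a dag-pb link (name plus target CID).
type link struct {
	Name string
	Cid  string
}

// scanValueLinks looks for target among the value links of one shard
// node by stripping the hex bucket prefix. It never loads child blocks,
// so entries that live in sub-shards are not found.
func scanValueLinks(links []link, padLen int, target string) (link, bool) {
	for _, l := range links {
		if len(l.Name) > padLen && l.Name[padLen:] == target {
			return l, true
		}
	}
	return link{}, false
}

func main() {
	links := []link{
		{Name: "3FTrailing_arm", Cid: "placeholder-cid-1"},
		{Name: "A1", Cid: "placeholder-cid-2"}, // sub-shard link, skipped
	}
	if l, ok := scanValueLinks(links, 2, "Trailing_arm"); ok {
		fmt.Println("found:", l.Cid)
	}
	_, ok := scanValueLinks(links, 2, "Deeper_entry")
	fmt.Println("found in sub-shard:", ok)
}
```

The second lookup shows the limitation named in the con above: anything stored below a sub-shard link stays unreachable without loading more blocks.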
C. Recursive fallback scan in LookupByString
Same as B, but recurse into sub-shard nodes.
- Pro: finds 100% of entries, fixes all callers
- Con: loadChild triggers bitswap fetches -- a 404 on a root shard could fetch the entire HAMT tree; requires go-unixfsnode changes
D. Detect internal shard, return error
Detect via hash prefix alignment, return HTTP 501 with diagnostic link (e.g., explore.ipld.io).
- Pro: simplest, no broken links, no template changes, contained in boxo/gateway + boxo/ipld/unixfs/hamt
- Con: user sees an error instead of any listing
Recommendation
Unsure.
D is the safest starting point. Internal shard blocks have no standalone semantic meaning and detection is reliable (hash prefix alignment check with one-block fallback for all-sub-shard nodes). Implemented in #1127.
B/C are ruled out: B leaves links broken, C might introduce complexity / DoS risk via unbounded block fetching if not careful. A could be a future enhancement over D if there's demand for browsing raw HAMT structure.
Thoughts?