Reproducer (FxFiles, real device)
User flow with walkableV8WriterEnabled: true + Phase 2.4 + Phase 3.3 all configured:
- Offline (S3 endpoint mutated to non-resolvable s33.cloud.fx.land to simulate master-down): open bucket images → succeeds via warm cache.
- Online (real s3.cloud.fx.land): upload IMG-20260509-WA0056(1).jpg to images. Etag returned: bafkr4ibeguudvn5bpgdmgu5fciupmu2rydjpyjm4x47halvotdth6zjpti.
- Online: listObjects(images, prefix=""): 29 files (raw forest=29) — succeeds.
- Offline (back to s33): listObjects(images) → FAILS with:
  Forest load for images failed: AnyhowException(Download failed: failed to fetch manifest page 0 for bucket images: Master unreachable (health gate; down for ~7s))
Same failure on face-metadata, fula-metadata, images — every bucket that was written to during the online window.
Server side is fine
Server logs from cloud.fx.land for the same window show:
- Phase 2 root commits succeed.
- ipfs-cluster pins every CID (CID pinned successfully cid=bafkr4ie2vudjzb...).
- IPNS publisher tick succeeds (sequence=144, ipns_name=k51qzi5uqu5dkkd6tv8slgoouzzs505qdcr4cb5egc9rlx7qwq0e794yxj9cg4).
- Populated forest_manifest_cid (v0.4.4) bucket=face-metadata cid=... per write.
The breakers are all in fula-client.
Four independent breakers (Cluster A — offline-walkability)
Breaker 1 (PRIMARY) — BLOCKS cache not warmed by write path
crates/fula-client/src/encryption.rs flush path (save_sharded_hamt_forest, Phase 1.5 page commits, Phase 1.6 dir-index commits, Phase 2 root commits) writes bytes to master via S3BlobBackend::put but never calls cache.put(&cid, &bytes) on the local BLOCKS table. Verified by grep: zero cache.put calls in the write path; all cache.put calls live in READ wrappers (client.rs:675, 775, 874 and encryption.rs:3589 cold-start).
Consequence: the only way a write's bytes land in BLOCKS is if a subsequent master-up READ for the same CID happens. In the user's flow, that READ did happen (listObjects(images): 29 files). It should have populated BLOCKS via the cid-hint variant of get_object_with_offline_fallback_known_cid, after which the offline read should have been served from BLOCKS without touching the master or a gateway.
The fact that offline-after-online-list STILL fails suggests either (a) the cache was re-opened against a different file across reinitializeFulaClient and lost data, (b) the cache's cache.get(cid) returns None for the post-upload CIDs because the cache file lock was held by the prior EncryptedClient, or (c) the in-memory forest cache served stale page_ref.cid values that don't match what BLOCKS has. The fix below pre-warms BLOCKS on every write so no read-after-write step is required.
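For concreteness, a minimal sketch of the intended write-path shape. The S3BlobBackend::put and BlockCache::put signatures (async vs. sync, argument order) are assumed from the call sites quoted in this report, and put_and_prewarm is a hypothetical helper name, not existing code:

```rust
// Hypothetical helper sketching the Breaker 1 fix: after a successful master
// write, warm the local BLOCKS cache with the same bytes so later offline
// reads need neither a read-after-write step nor a gateway hit.
async fn put_and_prewarm(
    backend: &S3BlobBackend,      // assumed: async put(cid, bytes) to master
    cache: Option<&BlockCache>,   // assumed: sync, redb-backed put(cid, bytes)
    cid: &Cid,
    bytes: &[u8],
) -> anyhow::Result<()> {
    // Master remains the source of truth; a failure here still fails the write.
    backend.put(cid, bytes).await?;

    // Best-effort pre-warm: cache errors (e.g. BlockTooLarge) are logged and
    // swallowed so they can never fail the write itself.
    if let Some(cache) = cache {
        if let Err(e) = cache.put(cid, bytes) {
            tracing::warn!(error = %e, %cid, "block_cache: pre-warm on write failed (non-fatal)");
        }
    }
    Ok(())
}
```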
Breaker 2 — list_buckets has zero offline fallback
crates/fula-client/src/client.rs:282-286:
pub async fn list_buckets(&self) -> Result<ListBucketsResult> {
    let response = self.request("GET", "/", None, None, None).await?;
    let text = response.text().await?;
    parse_list_buckets_response(&text)
}
No health-gate check, no cache, no gateway race, no cold-start. DNS error / master-down propagates raw. FxFiles compensates with its own listBucketsCached shim — but every other consumer of the SDK has to invent the same workaround.
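Rough shape of the proposed fallback, as a sketch only. The health-gate check (master_marked_down), the cache_get_json / cache_put_json helpers, is_network_error, and user_id are hypothetical names; only self.request and parse_list_buckets_response come from the snippet above:

```rust
// Sketch of list_buckets with the same health-gate + cached-fallback shape the
// object reads already use. All helper names here are placeholders.
pub async fn list_buckets(&self) -> Result<ListBucketsResult> {
    // Deterministic per-user key so the cached body survives client restarts.
    let cache_key = format!("list_buckets:{}", self.user_id());

    // If the health gate already marks the master as down, skip the network.
    if self.master_marked_down() {
        if let Some(cached) = self.cache_get_json::<ListBucketsResult>(&cache_key) {
            return Ok(cached);
        }
    }

    match self.request("GET", "/", None, None, None).await {
        Ok(response) => {
            let text = response.text().await?;
            let result = parse_list_buckets_response(&text)?;
            // Best-effort: remember the last good answer for offline sessions.
            let _ = self.cache_put_json(&cache_key, &result);
            Ok(result)
        }
        // DNS / connect / master-down errors fall back to the cached copy.
        Err(e) if is_network_error(&e) => {
            self.cache_get_json::<ListBucketsResult>(&cache_key).ok_or(e)
        }
        Err(e) => Err(e),
    }
}
```

Only the structure matters (gate → request → cached fallback); key derivation and where the cached body lives should follow whatever the existing offline-fallback wrappers already do.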
Breaker 3 — Cloudflare dead in default gateway list (top priority slot)
crates/fula-client/src/gateway_fetch.rs:82-91 puts cloudflare-ipfs.com/ipfs/{cid} at slot 0. Cloudflare retired its public IPFS gateway in 2024/2025. The race only tries the top 3 entries, so Cloudflare burns one slot on fast failures, leaving an effective race of 2 live gateways (dweb.link, ipfs.io) instead of 3.
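A sketch of the re-ordered defaults (see also the proposed fix below). The function name is taken from this report; the exact signature and the {cid} template form are assumptions:

```rust
// Sketch only: dead cloudflare-ipfs.com slot removed so the top-3 race always
// hits live gateways. URL template style is assumed from the current defaults.
fn default_gateway_urls() -> Vec<String> {
    [
        "https://dweb.link/ipfs/{cid}",
        "https://ipfs.io/ipfs/{cid}",
        "https://trustless-gateway.link/ipfs/{cid}",
        "https://4everland.io/ipfs/{cid}",
        "https://gateway.pinata.cloud/ipfs/{cid}",
        // one additional IPFS-only fallback appended here, per the proposed fix
    ]
    .into_iter()
    .map(|s| s.to_string())
    .collect()
}
```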
Breaker 4 — Silent block_cache=None → gateway_pool=None cascade
crates/fula-client/src/client.rs:99-132:
let block_cache = if config.block_cache_enabled {
    match build_block_cache(&config) {
        Ok(cache) => Some(Arc::new(cache)),
        Err(e) => {
            warn!(error = %e, "block_cache: failed to open; offline fallback disabled for this session");
            None
        }
    }
} else { None };

let gateway_pool = if config.gateway_fallback_enabled && block_cache.is_some() {
    ...
} else { None };
If the redb cache file is briefly locked (e.g., by a prior EncryptedClient instance during reinitializeFulaClient before its Arc is dropped), the new client gets BlockCacheError::AlreadyOpen, block_cache becomes None, and gateway_pool is also disabled as a consequence. The entire offline fallback path is dead for the session — with only a warn! line that may not surface in Flutter logs.
The cascade is too brittle. The gateway race doesn't actually need the block_cache to work — only the (bucket,key)→cid mapping does. The cid-hint variant of the wrapper has a CID directly. Decouple.
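A sketch of the decoupled construction; build_gateway_pool is a stand-in for the elided pool-construction branch above, and the error! escalation is a suggestion, not existing behavior:

```rust
// Sketch of the decoupled gate (Breaker 4 fix): the gateway pool depends only
// on its own feature flag, never on the block cache opening successfully.
let block_cache = if config.block_cache_enabled {
    match build_block_cache(&config) {
        Ok(cache) => Some(Arc::new(cache)),
        Err(e) => {
            // Loud, so a dead (bucket,key)→cid lookup path shows up in host logs.
            error!(error = %e, "block_cache: failed to open; cid-hint gateway fetches remain enabled");
            None
        }
    }
} else {
    None
};

// cid-hint fetches already carry a CID, so they work even with block_cache == None.
let gateway_pool = if config.gateway_fallback_enabled {
    Some(Arc::new(build_gateway_pool(&config)?))
} else {
    None
};
```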
Out of scope here (Cluster B — separate issue)
Forest load for tag-metadata failed: AnyhowException(Encryption error: serialization error: expected value at line 1 column 1)
Forest load for website-metadata failed: ... (same)
Forest load for nft-metadata failed: ... (same)
These are serde_json errors — expected value at line 1 column 1 is the classic "unexpected leading byte" pattern. The tag/website/nft buckets are FxFiles JSON-keyed singletons (.fula/tags/<userId>.json). Either the SDK is returning a non-JSON body (empty or garbage bytes fail with exactly this family of errors) or FxFiles is mis-handling a typed SDK error as a body. Different code path; will file separately after triage.
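For reference, the error shape is easy to confirm locally; the bytes below are arbitrary placeholders:

```rust
// serde_json reports an "expected value" error when the first byte of the
// input is not a valid start of a JSON value (binary data, HTML, etc.).
let err = serde_json::from_slice::<serde_json::Value>(b"<html>error page</html>").unwrap_err();
assert!(err.to_string().contains("expected value"));
```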
Proposed fixes (single PR, four files, ~120 LOC total)
- encryption.rs: after every Phase 1.5 / 1.6 / 2 write in the cascade, call cache.put(&cid, &bytes) for the just-written bytes. Best-effort (a BlockTooLarge error doesn't fail the write).
- client.rs::list_buckets: cache the response body keyed by a deterministic per-user key, route through health-gate + offline-fallback. On master-down, serve cached ListBucketsResult if present.
- gateway_fetch.rs::default_gateway_urls: drop cloudflare-ipfs.com, re-order to dweb.link, ipfs.io, trustless-gateway.link, 4everland.io, gateway.pinata.cloud, plus one IPFS-only fallback added at the end. 5 working gateways > 6 with a dead one.
- client.rs::new: when block_cache fails to open, leave gateway_pool enabled for cid-hint fetches. Decouple the gate.
Test plan
Integration test in crates/fula-client/tests/offline_e2e.rs that exactly mirrors the FxFiles flow:
- Wiremock master, real BlockCache + stubbed GatewayPool (returning failures).
- Online write a file → online list (warms cache) → tear master down → offline list → MUST succeed solely from BLOCKS.
Test added pre-fix to demonstrate the failure; passes post-fix.
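A rough skeleton of that test. The wiremock calls (MockServer, Mock, ResponseTemplate, matchers) are the real crate API; everything else (test_client, put_object, list_objects, the stubbed responses) is a hypothetical stand-in for the SDK surface:

```rust
use wiremock::matchers::method;
use wiremock::{Mock, MockServer, ResponseTemplate};

#[tokio::test]
async fn offline_list_succeeds_from_blocks_after_online_list() {
    // Wiremock stands in for the master; stubs here are illustrative only.
    let master = MockServer::start().await;
    Mock::given(method("PUT"))
        .respond_with(ResponseTemplate::new(200))
        .mount(&master)
        .await;
    // ... plus GET stubs for listObjects and manifest pages ...

    // Hypothetical helper: real BlockCache on a temp dir, GatewayPool stubbed to fail.
    let client = test_client(&master.uri()).await;

    // Online: write a file, then list the bucket (today the list warms BLOCKS;
    // after the Breaker 1 fix, the write itself does).
    client.put_object("images", "a.jpg", b"bytes").await.unwrap();
    client.list_objects("images", "").await.unwrap();

    // Tear the master down: dropping the MockServer stops it listening.
    drop(master);

    // Offline: the list MUST be served solely from BLOCKS (no master, no gateway).
    let listing = client.list_objects("images", "").await.unwrap();
    assert!(!listing.is_empty());
}
```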
Verification
- All four fixes pass dual-reviewer audit + advisor signoff per repo's standard discipline for SDK changes.
- Cross-platform alignment verified: fula-flutter FRB bindings, fula-js wasm-bindgen surface, wasm gating.