Offline reads break after online write: BLOCKS cache not warmed by writer + 3 collateral SDK gaps

## Reproducer (FxFiles, real device)

User flow with `walkableV8WriterEnabled: true` + Phase 2.4 + Phase 3.3 all configured:

1. **Offline** (S3 endpoint mutated to non-resolvable `s33.cloud.fx.land` to simulate master-down): open bucket `images` → succeeds via warm cache.
2. **Online** (real `s3.cloud.fx.land`): upload `IMG-20260509-WA0056(1).jpg` to `images`. Etag returned: `bafkr4ibeguudvn5bpgdmgu5fciupmu2rydjpyjm4x47halvotdth6zjpti`.
3. **Online**: `listObjects(images, prefix=""): 29 files (raw forest=29)` — succeeds.
4. **Offline** (back to `s33`): `listObjects(images)` → **FAILS** with:
   ```
   Forest load for images failed: AnyhowException(Download failed:
     failed to fetch manifest page 0 for bucket images:
     Master unreachable (health gate; down for ~7s))
   ```

Same failure on `face-metadata`, `fula-metadata`, `images` — every bucket that was written to during the online window.

## Server side is fine

Server logs from `cloud.fx.land` for the same window show:
- Phase 2 root commits succeed.
- ipfs-cluster pins every CID (`CID pinned successfully cid=bafkr4ie2vudjzb...`).
- IPNS publisher tick succeeds (`sequence=144, ipns_name=k51qzi5uqu5dkkd6tv8slgoouzzs505qdcr4cb5egc9rlx7qwq0e794yxj9cg4`).
- `Populated forest_manifest_cid (v0.4.4) bucket=face-metadata cid=...` per write.

The breakers are all in `fula-client`.

## Four independent breakers (Cluster A — offline-walkability)

### Breaker 1 (PRIMARY) — BLOCKS cache not warmed by write path

`crates/fula-client/src/encryption.rs` flush path (`save_sharded_hamt_forest`, Phase 1.5 page commits, Phase 1.6 dir-index commits, Phase 2 root commits) writes bytes to master via `S3BlobBackend::put` but **never calls `cache.put(&cid, &bytes)` on the local BLOCKS table**. Verified by grep: zero `cache.put` calls in the write path; all `cache.put` calls live in READ wrappers (`client.rs:675, 775, 874` and `encryption.rs:3589` cold-start).

Consequence: the only way a write's bytes land in BLOCKS is if a subsequent master-up READ for the same CID happens. In the user's flow, that READ did happen (`listObjects(images): 29 files`). It should have populated BLOCKS via the cid-hint variant of `get_object_with_offline_fallback_known_cid`. Then offline read should serve from BLOCKS without master OR gateway.

The fact that offline-after-online-list STILL fails suggests either (a) the cache was re-opened against a different file across `reinitializeFulaClient` and lost data, (b) the cache's `cache.get(cid)` returns None for the post-upload CIDs because the cache file lock was held by the prior `EncryptedClient`, or (c) the in-memory forest cache served stale page_ref.cid values that don't match what BLOCKS has. The fix below pre-warms BLOCKS on every write so no read-after-write step is required.

### Breaker 2 — `list_buckets` has zero offline fallback

`crates/fula-client/src/client.rs:282-286`:

```rust
pub async fn list_buckets(&self) -> Result<ListBucketsResult> {
    let response = self.request("GET", "/", None, None, None).await?;
    let text = response.text().await?;
    parse_list_buckets_response(&text)
}
```

No health-gate check, no cache, no gateway race, no cold-start. DNS error / master-down propagates raw. FxFiles compensates with its own `listBucketsCached` shim — but every other consumer of the SDK has to invent the same workaround.

### Breaker 3 — Cloudflare dead in default gateway list (top priority slot)

`crates/fula-client/src/gateway_fetch.rs:82-91` puts `cloudflare-ipfs.com/ipfs/{cid}` at slot 0. Cloudflare retired the public IPFS gateway in 2024/2025. The race tries top-3, so Cloudflare consumes a slot returning fast errors. Effective gateway race is 2 alive (`dweb.link`, `ipfs.io`) instead of 3.

### Breaker 4 — Silent `block_cache=None → gateway_pool=None` cascade

`crates/fula-client/src/client.rs:99-132`:

```rust
let block_cache = if config.block_cache_enabled {
    match build_block_cache(&config) {
        Ok(cache) => Some(Arc::new(cache)),
        Err(e) => {
            warn!(error = %e, "block_cache: failed to open; offline fallback disabled for this session");
            None
        }
    }
} else { None };

let gateway_pool = if config.gateway_fallback_enabled && block_cache.is_some() {
    ...
} else { None };
```

If the redb cache file is briefly locked (e.g., by a prior `EncryptedClient` instance during `reinitializeFulaClient` before its Arc is dropped), the new client gets `BlockCacheError::AlreadyOpen`, `block_cache` becomes `None`, and **gateway_pool is also disabled** as a consequence. The entire offline fallback path is dead for the session — with only a `warn!` line that may not surface in flutter logs.

The cascade is too brittle. The gateway race doesn't actually need the block_cache to work — only the `(bucket,key)→cid` mapping does. The cid-hint variant of the wrapper has a CID directly. Decouple.

## Out of scope here (Cluster B — separate issue)

```
Forest load for tag-metadata failed: AnyhowException(Encryption error: serialization error: expected value at line 1 column 1)
Forest load for website-metadata failed: ... (same)
Forest load for nft-metadata failed: ... (same)
```

These are `serde_json` errors — `expected value at line 1 column 1` is the classic "unexpected leading byte" pattern. tag/website/nft buckets are FxFiles JSON-keyed singletons (`.fula/tags/<userId>.json`). Either the SDK is returning empty bytes (which JSON-parses to that error) or FxFiles is mis-handling a typed SDK error as a body. Different code path; will file separately after triage.

## Proposed fixes (single PR, four files, ~120 LOC total)

1. **`encryption.rs`**: after every Phase 1.5 / 1.6 / 2 write in the cascade, call `cache.put(&cid, &bytes)` for the just-written bytes. Best-effort (a `BlockTooLarge` error doesn't fail the write).
2. **`client.rs::list_buckets`**: cache the response body keyed by a deterministic per-user key, route through health-gate + offline-fallback. On master-down, serve cached `ListBucketsResult` if present.
3. **`gateway_fetch.rs::default_gateway_urls`**: drop `cloudflare-ipfs.com`, re-order to `dweb.link`, `ipfs.io`, `trustless-gateway.link`, `4everland.io`, `gateway.pinata.cloud`, plus one IPFS-only fallback added at the end. 5 working gateways > 6 with a dead one.
4. **`client.rs::new`**: when `block_cache` fails to open, leave `gateway_pool` enabled for cid-hint fetches. Decouple the gate.

## Test plan

Integration test in `crates/fula-client/tests/offline_e2e.rs` that exactly mirrors the FxFiles flow:
- Wiremock master, real `BlockCache` + stubbed `GatewayPool` (returning failures).
- Online write a file → online list (warms cache) → tear master down → offline list → MUST succeed solely from BLOCKS.

Test added pre-fix to demonstrate the failure; passes post-fix.

## Verification

- All four fixes pass dual-reviewer audit + advisor signoff per repo's standard discipline for SDK changes.
- Cross-platform alignment verified: fula-flutter FRB bindings, fula-js wasm-bindgen surface, wasm gating.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Offline reads break after online write: BLOCKS cache not warmed by writer + 3 collateral SDK gaps #8

Reproducer (FxFiles, real device)

Server side is fine

Four independent breakers (Cluster A — offline-walkability)

Breaker 1 (PRIMARY) — BLOCKS cache not warmed by write path

Breaker 2 — `list_buckets` has zero offline fallback

Breaker 3 — Cloudflare dead in default gateway list (top priority slot)

Breaker 4 — Silent `block_cache=None → gateway_pool=None` cascade

Out of scope here (Cluster B — separate issue)

Proposed fixes (single PR, four files, ~120 LOC total)

Test plan

Verification

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Offline reads break after online write: BLOCKS cache not warmed by writer + 3 collateral SDK gaps #8

Description

Reproducer (FxFiles, real device)

Server side is fine

Four independent breakers (Cluster A — offline-walkability)

Breaker 1 (PRIMARY) — BLOCKS cache not warmed by write path

Breaker 2 — list_buckets has zero offline fallback

Breaker 3 — Cloudflare dead in default gateway list (top priority slot)

Breaker 4 — Silent block_cache=None → gateway_pool=None cascade

Out of scope here (Cluster B — separate issue)

Proposed fixes (single PR, four files, ~120 LOC total)

Test plan

Verification

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Breaker 2 — `list_buckets` has zero offline fallback

Breaker 4 — Silent `block_cache=None → gateway_pool=None` cascade