
Walkable-v8: force-rewrite v7 buckets to v8 on first master-up load (closes lazy-migration gap) #10

@ehsan6sha

Description

Summary

Buckets written before walkable-v8 (pre-v0.6) are stuck with PointerWire::Link(StorageKey) pointers in their HAMT internal nodes. After upgrading to a walkable-v8-enabled SDK (current default: walkable_v8_writer_enabled = true since 0.6.1 / #89), those legacy buckets stay v7 until each shard happens to be touched by a real write. Per the v0.6.1 release notes: "Lazy migration is per-shard, not per-bucket". For users with many existing buckets and infrequent writes, this means offline-walkability never engages on their existing data.

This issue proposes a transparent, one-shot per-bucket force-rewrite that fires on the first master-up load after the SDK upgrade — closing the lazy-migration gap without requiring users to write to every shard manually.

Evidence

A real user device on fula_client 0.5.2 (the published version that includes the #8 fix (#3) plus walkable-v8 writer default-on) shows the failure pattern when the master endpoint is pointed at a non-resolvable URL (deliberate offline simulation):

Old bucket (images) — fails offline:

Forest loaded for bucket: images
listObjects(images, prefix="") error: AnyhowException(Encryption error:
  storage backend error: HTTP error: error sending request for url
  (https://<masked-bogus-master>/images/__fula_forest_v7_nodes/bef00324f310ac3c032a4b94b9779c6af865a7a1854d))

The path __fula_forest_v7_nodes/<storage_key> is the v7 layout. The HAMT walker is fetching an internal node by raw storage_key against master because the parent pointer is Link(StorageKey), not LinkV2 { storage_key, cid }. No CID hint → no gateway-race fallback path engages.
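The pointer distinction above can be sketched roughly as follows. These are hypothetical Rust types for illustration only; the real fula-crypto wire enum may differ in naming and layout:

```rust
// Hypothetical sketch of the two pointer wire variants -- the real
// fula-crypto types may differ in naming and layout.
type StorageKey = String;
type Cid = String;

enum PointerWire {
    // v7: raw storage key only; the walker must resolve it via master.
    Link(StorageKey),
    // v8: storage key plus a CID hint; enables the gateway-race fallback.
    LinkV2 { storage_key: StorageKey, cid: Cid },
}

/// A node is offline-walkable only when every child pointer carries a CID.
fn offline_walkable(children: &[PointerWire]) -> bool {
    children.iter().all(|p| matches!(p, PointerWire::LinkV2 { .. }))
}
```

A single `Link(StorageKey)` anywhere on the walk path is enough to force a master round-trip, which is exactly the failure shown in the log above.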

New bucket (walkable-v8-test-…) — works offline on the same device, same session:

Forest loaded for bucket: walkable-v8-test-1778428540
listObjects(walkable-v8-test-1778428540, prefix=""): 5 files (raw forest=5)

Same SDK, same bogus master URL. Only difference: the manifest has LinkV2 stamps everywhere because it was created from a v8 writer.

Acceptance criteria

A regression test in crates/fula-client/tests/ that:

  1. Creates a bucket entirely under v7 writer (no LinkV2 stamps).
  2. Verifies that list_files_from_forest(bucket) against a DNS-failing master returns Err.
  3. Applies the proposed migration (single SDK call, no operator intervention).
  4. Re-runs list_files_from_forest(bucket) against the same DNS-failing master, asserts it returns the expected file list.
  5. Verifies the post-migration manifest's page_index entries all have cid: Some(_) (i.e., LinkV2 cascade fully fired).

Proposed mechanism (minimal spec)

Trigger. Inside load_forest_internal after the manifest is decoded, scan manifest_snapshot.root.page_index. If ANY entry has cid: None, the bucket has un-migrated v7 pages. Set a "needs migration" flag on the loaded forest cache entry.
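The trigger scan is cheap enough to run on every load. A minimal sketch, where `PageEntry` is a stand-in for the real manifest `page_index` entry type in fula-client:

```rust
// Minimal sketch of the trigger scan; PageEntry is a stand-in for the
// real manifest page_index entry type in fula-client.
struct PageEntry {
    cid: Option<String>, // None => page still on the v7 wire format
}

/// True when ANY page lacks a CID stamp, i.e. un-migrated v7 pages exist.
fn needs_v8_migration(page_index: &[PageEntry]) -> bool {
    page_index.iter().any(|e| e.cid.is_none())
}
```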

Marker. Persist a small flag in the existing BlockCache::METADATA table, keyed by migrate_to_walkable_v8/v1/<bucket_lookup_h_hex> with a 1-byte 0x01 value. Set it after a successful migration flush; check it on the next load to short-circuit.
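The marker layout can be sketched as below. The key shape mirrors the text; the helper name itself is hypothetical:

```rust
// Sketch of the migration marker described above; the key shape mirrors
// the spec text, while the helper name is hypothetical.
const MARKER_DONE: u8 = 0x01;

fn migration_marker_key(bucket_lookup_h_hex: &str) -> String {
    format!("migrate_to_walkable_v8/v1/{bucket_lookup_h_hex}")
}
```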

Mechanism. When the flag is set and the marker is absent:

  1. Iterate every shard in the loaded forest. Mark its HAMT root + every reachable internal node as dirty. The existing dirty-tracker (fula-crypto::wnfs_hamt::Node::dirty) is the only seam needed; flushing a dirty node re-encodes it under the v8 writer (since walkable_v8_writer_enabled = true).
  2. Mark every manifest_snapshot.root.page_index[*].cid entry as needing a re-stamp (or just mark the page dirty — same effect via Phase 1.5).
  3. Call the existing save_sharded_hamt_forest path. The Phase 1.5 / 1.6 / 2 cascade re-encodes pages, writes them, etag-self-verifies the new CIDs, stamps them into the new ManifestRoot, and commits via Phase 2's If-Match CAS.
  4. On successful Phase 2 commit, write the migration marker.
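The four steps above can be sketched as a single sequence. Everything here is hypothetical (trait, method names, error variants); the real seams are fula-crypto's dirty tracker and save_sharded_hamt_forest's Phase 1.5/1.6/2 cascade:

```rust
// Compilable sketch of the four-step flow; all names are hypothetical.
#[derive(Debug, PartialEq)]
enum MigrateError {
    ConcurrentModification, // lost the Phase 2 If-Match CAS
    MasterUnreachable,      // flush deferred until next master-up load
}

trait ForestOps {
    fn mark_all_hamt_nodes_dirty(&mut self); // step 1
    fn mark_all_pages_dirty(&mut self);      // step 2
    fn save_sharded_hamt_forest(&mut self) -> Result<(), MigrateError>; // step 3
    fn write_migration_marker(&mut self);    // step 4
}

fn migrate_v7_to_v8<F: ForestOps>(forest: &mut F) -> Result<(), MigrateError> {
    forest.mark_all_hamt_nodes_dirty();
    forest.mark_all_pages_dirty();
    forest.save_sharded_hamt_forest()?; // Phase 2 If-Match CAS commit
    forest.write_migration_marker();    // only after a successful commit
    Ok(())
}
```

Note the ordering: the marker write sits strictly after the `?` on the flush, which is what makes the retry-on-failure behavior described below automatic.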

Atomicity. Phase 2 root commit uses If-Match on the prior etag. The migration either fully commits a new v8 root OR fails cleanly and the legacy v7 root stays live. Mid-flight crashes leave the old root + some orphan v8 blobs — same cleanup behavior as any other failed Phase 2 commit (no corruption).

One-shot per bucket per device. The marker prevents re-running. If two devices concurrently load + migrate the same bucket, one wins the Phase 2 CAS, the other gets ConcurrentModification and just observes the already-v8 state on retry.

No effect when bucket is already v8. The page_index scan short-circuits cheaply (single integer-equality check per page). Buckets where every page already has cid: Some(_) skip the migration path entirely. Zero overhead on healthy buckets.

No effect when master is unreachable. Migration requires Phase 2 PUT to commit. If master is down at load time, the scan still detects "needs migration" but the actual flush is deferred until next master-up load. The marker is only set after a successful commit, so retry is automatic.

Bounded write cost

Per the existing W.8.4 analysis, a fully-rewritten manifest costs ≤ 5% more bytes than its v7 predecessor (the LinkV2 variant adds 22 bytes per pointer × pointer count). For a bucket with 16 shards, ~80 pages, ~500 internal HAMT nodes (typical user), that's ~50 KB of additional traffic for the migration commit. One-time per bucket per device. Negligible.
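A back-of-envelope check of that figure, assuming 22 extra bytes per stamped pointer. The ~2,300 pointer count is an assumption chosen to match the ~50 KB estimate above; the real fan-out depends on HAMT shape:

```rust
// Back-of-envelope check of the W.8.4 figure: 22 extra bytes per
// stamped LinkV2 pointer. Pointer count is an assumption.
const LINKV2_OVERHEAD_BYTES: u64 = 22;

fn migration_overhead_bytes(pointer_count: u64) -> u64 {
    pointer_count * LINKV2_OVERHEAD_BYTES
}
```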

What this does NOT do

  • No chunk-level migration. Chunks already have content-addressed keys; their CIDs are independent of walkable-v8. Already covered by issue #8 ("Offline reads break after online write: BLOCKS cache not warmed by writer + 3 collateral SDK gaps") and fix #3 ("Walkability", BLOCKS warm-on-write).
  • No retroactive offline-walking of unmigrated buckets. Until a user does a master-up load that fires the migration, their bucket stays v7-only. The migration is opt-in by SDK upgrade + first reconnect.
  • No cross-device coordination. Each device migrates independently the first time it observes the v7 state. If device A migrates and device B reads after, device B sees the v8 root and short-circuits the scan immediately.

Why this is safe to ship default-on

  • The migration is additive — it only RE-WRITES existing data under a stricter format. It does not change any bucket's logical contents.
  • The migration is observable — a tracing::info! line on every fire lets operators measure adoption + diagnose stuck buckets.
  • The migration is rollback-safe — if the migration fix is reverted in a future SDK, the v8-stamped pointers are still readable by every v0.6+ SDK. The marker just becomes stale (gets skipped on next load), and the bucket continues working under whichever wire format the writer emits.

Implementation plan

  1. Add migrate_v7_to_v8_if_needed(bucket) helper on EncryptedClient (~80 LOC).
  2. Wire trigger inside load_forest_internal post-decode (~10 LOC).
  3. Add marker get/set methods on BlockCache mirroring the existing store_users_index_state pattern (~40 LOC).
  4. Add walkable_v8_migrate_v7_bucket integration test in crates/fula-client/tests/ (~200 LOC).
  5. Cross-platform alignment check: fula-flutter + fula-js no-op (this is a Rust-internal path), wasm32 compile-clean.

Total: ~330 LOC + test.

Acceptance test (gold standard)

Filed as part of this PR. End-to-end on the user's real images bucket on s3.cloud.fx.land:

  1. Setup: assume the user's images bucket exists and was originally written under pre-v0.6 SDK (verified by the v7-nodes-URL failure mode above).
  2. Pre-fix: spin up a fresh EncryptedClient, set the endpoint to a non-resolvable URL, attempt list_files_from_forest("images"). MUST return Err with the v7-nodes URL in the error chain.
  3. Apply fix: spin up an EncryptedClient against the REAL master, call the migration trigger (initially manual via a test-only entry point; in production it fires on first load).
  4. Verify post-migration: spin a third EncryptedClient, set endpoint back to bogus URL, repeat list_files_from_forest("images"). MUST succeed.
  5. Verify manifest is v8: inspect the post-migration manifest's page_index, assert every entry has cid: Some(_).

Out of scope / future work

  • Forced re-migration after a wire-format upgrade past v8. When v9 lands, a similar marker-and-scan pattern will be needed. The marker version (the v1 suffix) exists so future migrations don't conflict.
  • Migration progress reporting. Users with very large buckets may want a UI indicator. Out of scope for v1; can be added via a future Phase 19 transparency surface.
  • Per-shard parallelism. The current dirty-flag-and-flush approach re-flushes shards sequentially. Could be parallelized, but adds complexity. Defer until measured to be a bottleneck.
