feat: multi-hop contract WASM migration via legacy_contracts.toml #5
When Delta republishes with a new `site_contract.wasm`, every site's contract_key changes because `contract_key = BLAKE3(BLAKE3(wasm) || params)`. For sites whose delegate-stored `KnownSiteRecord.contract_key_b58` is persisted, the stored key is authoritative and the UI migrates state automatically. But records restored from delegates older than b82d3bc have `contract_key_b58 = None` and fell through to a hardcoded one-hop `OLD_WASM_HASH` constant that had silently rotted across releases: it pointed at `1188d108…` (commit 2e664c3) while the actual previous release shipped `b92da83d…`. Users of that previous release were stranded on a permanent "Loading..." screen after the V7 republish because the single fallback hash didn't match where their state actually lived, and the `OLD_WASM_HASH` constant had no automation to keep it accurate.

This is the same class of bug as the delegate WASM migration issue already solved by `legacy_delegates.toml`, but for the contract side.

Introduce `legacy_contracts.toml` as the single source of truth for previous contract WASM hashes, mirroring `legacy_delegates.toml`:

- `ui/build.rs` parses it and emits a generated `LEGACY_CONTRACT_HASHES: &[[u8; 32]]` const, consumed via `include!(concat!(env!("OUT_DIR"), "/legacy_contracts.rs"))`.
- `contract_id_for_prefix_with_hash(prefix, hash)` computes the ContractInstanceId for any (prefix, WASM hash) pair. It is the single piece of logic that governs contract-key derivation, now pure and unit-tested.
- `legacy_contract_ids_for_prefix(prefix, current)` builds the migration probe set, filters out the current key, and de-duplicates.
- `fire_legacy_contract_migrations(prefix, current_b58)` registers a `PENDING_MIGRATIONS` entry and issues a GET for every historical hash. The first response carrying state wins.
- `clear_pending_migrations_for_prefix` cancels still-in-flight probes for a prefix after one completes, so a slower response from an older hash cannot race ahead and overwrite freshly-migrated state.
- `restore_known_sites` now always issues a GET for the current key, plus a probe for the stored-but-stale `contract_key_b58` (if any), plus the generic legacy sweep. The NotFound handler no longer eagerly retries the current key on every legacy-probe miss, since the current-key GET is already in flight.

Release workflow automation mirrors the delegate side:

- `scripts/add-contract-migration.sh VERSION "DESCRIPTION"` captures the currently-committed `site_contract.wasm` BLAKE3 before rebuild. Run BEFORE touching `common/` or the contract.
- `scripts/check-migration.sh` is extended: if the contract WASM changed since git HEAD, the previous hash MUST be present in `legacy_contracts.toml` or the script fails the preflight. This turns "forgot to record the old hash" into a loud publish-time error instead of a silent post-release "Loading..." incident.
- `AGENTS.md` upgrade workflow documents the new step.

New tests:

- `contract_id_is_deterministic_and_depends_on_both_hash_and_prefix`: different WASM hashes produce different keys, same inputs are deterministic, different prefixes differ under the same hash.
- `legacy_ids_are_deduplicated_and_exclude_current`: when one legacy hash happens to compute to the "current" key, it's filtered out of the probe set; the returned set has no duplicates.
- `legacy_contract_hashes_table_is_populated`: guards against a silently-empty `legacy_contracts.rs`. Without at least one entry, users of the immediately-preceding release have no fallback.

All existing tests continue to pass (delta-core 19/19, delta-ui 6/6, site-delegate 4/4, site-contract 0/0).

`legacy_contracts.toml` seeds with two entries:

- C1 = `1188d108…`: pre-tombstone WASM (commit 2e664c3), inherited from the previous `OLD_WASM_HASH` constant.
- C2 = `b92da83d…`: pre-V7 WASM (f5ecff5), the hash of the release immediately before the per-prefix export signing-key fix. This is the hash where today's affected users' state actually lives.

Contract and delegate WASMs are byte-identical to main; this PR is pure UI logic.

[AI-assisted - Claude]
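The probe-set construction described for `legacy_contract_ids_for_prefix` (derive one id per legacy hash, drop the current id, de-duplicate) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the generic `derive` parameter and the `toy_id` helper are hypothetical stand-ins for `contract_id_for_prefix_with_hash`, and `toy_id` does not compute BLAKE3.

```rust
use std::collections::BTreeSet;

/// Build the migration probe set for one site prefix.
/// `derive` is a hypothetical stand-in for the real key derivation.
fn legacy_ids_for_prefix<Id, F>(
    prefix: &str,
    current: &Id,
    legacy_hashes: &[[u8; 32]],
    derive: F,
) -> Vec<Id>
where
    Id: Ord + Clone,
    F: Fn(&str, &[u8; 32]) -> Id,
{
    let mut seen = BTreeSet::new();
    legacy_hashes
        .iter()
        .map(|hash| derive(prefix, hash))
        // Never probe the key the UI is already fetching.
        .filter(|id| id != current)
        // Keep only the first occurrence of each id, preserving registry order.
        .filter(|id| seen.insert(id.clone()))
        .collect()
}

/// Toy derivation for the example only: NOT BLAKE3, just prefix plus first hash byte.
fn toy_id(prefix: &str, hash: &[u8; 32]) -> String {
    format!("{}:{:02x}", prefix, hash[0])
}

fn main() {
    let current = toy_id("site-a", &[3u8; 32]);
    // The registry here contains a duplicate entry and the current hash itself.
    let registry = [[1u8; 32], [2u8; 32], [1u8; 32], [3u8; 32]];
    let probes = legacy_ids_for_prefix("site-a", &current, &registry, toy_id);
    println!("{:?}", probes);
}
```

Preserving registry order while de-duplicating keeps the probe sequence predictable, which may help when reading logs; whether the real implementation preserves order is not stated in the source.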
Review findings on the multi-hop contract WASM migration PR:
**H1/H2 (high severity): late-response overwrite race.** The original
cancellation mechanism only removed entries from `PENDING_MIGRATIONS`,
but any GET response already in flight when cancellation fired would
take the non-migration branch of `handle_contract_response` and
last-write-wins over the freshly-captured state via `handle_site_state`.
A legacy-hash probe returning older state after a successful
current-key GET could silently clobber fresh data.
Fix: introduce a `MIGRATING_PREFIXES: BTreeSet<String>` populated by
`restore_known_sites` for each site entering its initial-capture
window, and an explicit `classify_get_response` state machine that
routes each incoming GET into one of four branches:
- `PendingMigration { prefix }` — legacy/stale-key probe response;
process if non-empty AND prefix still migrating, drop otherwise.
- `InitialCurrentKey { prefix }` — current-key response while still
capturing; process if non-empty, cancel siblings, exit the
migration window.
- `LiveUpdate` — prefix already captured or steady-state; process
normally as an `UpdateNotification`-equivalent.
- `Unknown` — unrecognized key; process as live update.
The `finalize_prefix_capture` helper removes the prefix from
`MIGRATING_PREFIXES` AND clears all `PENDING_MIGRATIONS` entries for
it atomically, so late responses land in the `LiveUpdate` branch but
are dropped from the migration PUT path.
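
A rough sketch of the four-way classification and the finalize step described above. The `MigrationState` struct, its field layout, and the `current_keys` map are assumptions made for illustration; only the branch names, `classify_get_response`, `finalize_prefix_capture`, `PENDING_MIGRATIONS`, and `MIGRATING_PREFIXES` come from the description.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, PartialEq, Eq)]
enum GetResponseClass {
    PendingMigration { prefix: String },
    InitialCurrentKey { prefix: String },
    LiveUpdate,
    Unknown,
}

/// Hypothetical container; the real code may hold these in separate globals.
#[derive(Default)]
struct MigrationState {
    /// Legacy or stale-key probes still in flight: probed key -> site prefix.
    pending_migrations: BTreeMap<String, String>,
    /// Prefixes still inside their initial-capture window.
    migrating_prefixes: BTreeSet<String>,
    /// Current contract key -> site prefix (assumed lookup table).
    current_keys: BTreeMap<String, String>,
}

impl MigrationState {
    /// Pure classification: the pending-migration check runs first so that
    /// it wins the tiebreak against the current-key branch.
    fn classify_get_response(&self, key: &str) -> GetResponseClass {
        if let Some(prefix) = self.pending_migrations.get(key) {
            return GetResponseClass::PendingMigration { prefix: prefix.clone() };
        }
        match self.current_keys.get(key) {
            Some(prefix) if self.migrating_prefixes.contains(prefix) => {
                GetResponseClass::InitialCurrentKey { prefix: prefix.clone() }
            }
            Some(_) => GetResponseClass::LiveUpdate,
            None => GetResponseClass::Unknown,
        }
    }

    /// Leave the capture window and cancel every sibling probe in one step.
    fn finalize_prefix_capture(&mut self, prefix: &str) {
        self.migrating_prefixes.remove(prefix);
        self.pending_migrations.retain(|_, p| p.as_str() != prefix);
    }
}

fn main() {
    let mut state = MigrationState::default();
    state.current_keys.insert("key-current".into(), "site-a".into());
    state.pending_migrations.insert("key-legacy".into(), "site-a".into());
    state.migrating_prefixes.insert("site-a".into());

    println!("{:?}", state.classify_get_response("key-legacy"));
    println!("{:?}", state.classify_get_response("key-current"));
    state.finalize_prefix_capture("site-a");
    println!("{:?}", state.classify_get_response("key-current"));
}
```

Keeping the classifier free of side effects is what makes each branch unit-testable in isolation; the caller decides what to do with each class.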
**M2: startup thundering herd.** `fire_legacy_contract_migrations`
now runs only when the stored `contract_key_b58` is missing or
stale. In the steady-state case where the delegate's stored key
matches the current WASM, the site was created under the current
contract and no earlier WASM can have state for it — the sweep was
pure waste. Drops the startup-load cost from N × M GETs to just the
stored-key GET for up-to-date sites.
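
The gating rule can be illustrated with a small sketch. The function names and the per-site GET accounting below are assumptions for illustration (de-duplication of probes is ignored); the rule itself, sweep only when the stored key is missing or stale, is taken from the paragraph above.

```rust
/// Sweep legacy hashes only when the stored key is missing or differs
/// from the key derived from the current WASM.
fn needs_legacy_sweep(stored_key_b58: Option<&str>, current_key_b58: &str) -> bool {
    match stored_key_b58 {
        Some(stored) => stored != current_key_b58,
        None => true,
    }
}

/// Hypothetical accounting of startup GETs for one site:
/// one current-key GET, one stale stored-key probe if applicable,
/// plus one GET per legacy hash when the sweep runs.
fn startup_gets(stored: Option<&str>, current: &str, legacy_hash_count: usize) -> usize {
    let stale_probe = usize::from(matches!(stored, Some(s) if s != current));
    let sweep = if needs_legacy_sweep(stored, current) {
        legacy_hash_count
    } else {
        0
    };
    1 + stale_probe + sweep
}

fn main() {
    let legacy = 3; // e.g. a registry with three entries
    let sites = [
        (Some("key-now"), "key-now"), // up to date: no sweep
        (Some("key-old"), "key-now"), // stale stored key
        (None, "key-now"),            // record from an older delegate
    ];
    let total: usize = sites
        .iter()
        .map(|(stored, current)| startup_gets(*stored, current, legacy))
        .sum();
    println!("total startup GETs: {}", total);
}
```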
**M4 / cross-consistency test.** Added
`contract_id_matches_state_key_derivation_for_current_wasm` which
computes the current WASM's hash and asserts that
`contract_id_for_prefix_with_hash` agrees with the production
`state::contract_key_from_prefix` path. Guards against a
backwards-incompatible change to freenet-stdlib's
`ContractKey::from_params_and_code` silently breaking legacy probes.
**L1: test gap for the race logic.** Added five unit tests of the
pure `classify_get_response` state machine covering every branch:
pending-migration routing, current-key-during-capture, post-capture
live update, unknown key, and the tiebreaker where a key is in both
PENDING_MIGRATIONS and its prefix is in MIGRATING_PREFIXES (the
pending branch must win so the migration PUT runs).
**Code-first concern 6: `add-contract-migration.sh` race.** The
script now always hashes HEAD's tracked contract WASM via
`git show HEAD:...`, not the working-tree WASM. A developer who
accidentally ran `sync-wasm.sh` before recording the migration would
previously have recorded the *new* hash, silently defeating the
mechanism. The script now warns if the working tree and HEAD differ
and always records the HEAD hash.
**Rebase onto main.** Picked up the reproducible-WASM fix (#4) and
resolved the check-migration.sh conflict to use the new reproducible
`scripts/build-wasm.sh` wrapper for both the delegate and the
contract build-vs-committed verification. `legacy_contracts.toml`
gains a C3 entry for `53e3395f…`, the V7 contract hash shipped
immediately before reproducible builds landed — this is the hash
where the user's stranded site state currently lives on the network.
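
Each registry entry, such as the C3 hash mentioned here, has to reach the UI as a `[u8; 32]` inside the generated `LEGACY_CONTRACT_HASHES` const. The sketch below shows one way a build script might turn hex digests into that source text. It is an assumption-laden illustration: the function names are hypothetical, TOML parsing is omitted because the registry's schema is not shown, and the sample digest is a placeholder rather than a real hash.

```rust
/// Decode a 64-character hex digest into 32 bytes.
fn parse_hash_hex(hex: &str) -> Result<[u8; 32], String> {
    if hex.len() != 64 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(format!("expected 64 hex characters, got {:?}", hex));
    }
    let mut out = [0u8; 32];
    for (i, byte) in out.iter_mut().enumerate() {
        // Safe to slice by byte index: the string is pure ASCII at this point.
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16)
            .map_err(|e| format!("invalid hex at byte {}: {}", i, e))?;
    }
    Ok(out)
}

/// Render the Rust source that a build script would write out for `include!`.
fn render_legacy_contracts_rs(hashes: &[[u8; 32]]) -> String {
    let mut src = String::from("pub const LEGACY_CONTRACT_HASHES: &[[u8; 32]] = &[\n");
    for hash in hashes {
        let bytes: Vec<String> = hash.iter().map(|b| format!("0x{:02x}", b)).collect();
        src.push_str(&format!("    [{}],\n", bytes.join(", ")));
    }
    src.push_str("];\n");
    src
}

fn main() {
    // Placeholder digest for illustration only.
    let placeholder = "ab".repeat(32);
    let hash = parse_hash_hex(&placeholder).expect("valid placeholder digest");
    print!("{}", render_legacy_contracts_rs(&[hash]));
}
```

In an actual `build.rs`, the rendered string would typically be written to a file under the directory named by the `OUT_DIR` environment variable, which is what the `include!(concat!(env!("OUT_DIR"), ...))` pattern reads back.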
Tests: delta-core 19/19, delta-ui 12/12 (5 new classifier tests +
1 cross-consistency test + 3 legacy_contract tests + 3 export_key
tests), site-delegate 4/4. Clippy clean.
[AI-assisted - Claude]
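Addressed review feedback in 02c5f5e (force-pushed after rebase onto main for the reproducible-WASM fix).

- Code-first / Skeptical H1+H2+H3 (critical race): late-response overwrites. Before, cancellation only removed entries from `PENDING_MIGRATIONS`. Fix: introduced `MIGRATING_PREFIXES` and the `classify_get_response` state machine.
- Skeptical M2 (thundering herd): `fire_legacy_contract_migrations` now runs only when the stored `contract_key_b58` is missing or stale.
- Skeptical M4 (derivation symmetry): added `contract_id_matches_state_key_derivation_for_current_wasm`.
- Test gap L1: extracted `classify_get_response` as a pure state machine with five unit tests.
- Code-first concern 6: `add-contract-migration.sh` now always hashes HEAD's tracked contract WASM.
- Skeptical M3 (C1 ghost hash): verified historically.
- Rebase: picked up #4's reproducible-WASM fix.

Not addressed (minor, deferred):

Tests: delta-core 19/19, delta-ui 12/12, site-delegate 4/4. Clippy clean. [AI-assisted - Claude]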
Problem

When Delta republishes with a new `site_contract.wasm`, every site's contract_key changes because `contract_key = BLAKE3(BLAKE3(wasm) || params)`. Sites with a persisted `KnownSiteRecord.contract_key_b58` migrate fine, but records restored from delegates older than b82d3bc have `contract_key_b58 = None` and were falling through to a hardcoded one-hop `OLD_WASM_HASH` constant. That constant had silently rotted across releases, pointing at `1188d108…` (commit 2e664c3) while the actually-previous release shipped `b92da83d…`, so users of the previous release were stranded on a permanent "Loading..." screen after the V7 republish.

See also: 356e6b6, which updated `OLD_WASM_HASH` as an emergency hotfix. This PR replaces the single-constant mechanism with a proper multi-hop registry, on the same pattern as `legacy_delegates.toml`.

Approach

- `legacy_contracts.toml`: single source of truth for previous contract WASM hashes.
- `ui/build.rs`: parses it and emits `LEGACY_CONTRACT_HASHES: &[[u8; 32]]`.
- `contract_id_for_prefix_with_hash`: pure, unit-tested function governing contract-key derivation for any (prefix, hash) pair.
- `fire_legacy_contract_migrations`: fires a migration GET for every historical hash at startup. Uses the existing `PENDING_MIGRATIONS` map to correlate responses.
- `clear_pending_migrations_for_prefix`: on successful migration, cancels other in-flight probes for that prefix so a slower response from an older hash can't race ahead and overwrite fresh state.
- `restore_known_sites`: now always issues current-key GET + stored-but-stale probe + generic legacy sweep. The NotFound handler no longer eagerly retries the current key on every legacy-probe miss.
- `scripts/add-contract-migration.sh`: mirrors `scripts/add-migration.sh`. Run before touching `common/` or the contract.
- `scripts/check-migration.sh`: extended to enforce the contract-side recording. If the contract WASM changed since HEAD and the previous hash is not in `legacy_contracts.toml`, preflight fails loudly instead of silently stranding users.
- `AGENTS.md`: updated upgrade workflow.

Tests

- `contract_id_is_deterministic_and_depends_on_both_hash_and_prefix`
- `legacy_ids_are_deduplicated_and_exclude_current`
- `legacy_contract_hashes_table_is_populated`: guards against an empty generated file.

All existing tests continue to pass: delta-core 19/19, delta-ui 6/6, site-delegate 4/4.

Initial entries

- C1 = `1188d108…`: pre-tombstone WASM (commit 2e664c3).
- C2 = `b92da83d…`: pre-V7 WASM (f5ecff5), where today's stranded users' state actually lives.

WASMs

Contract and delegate WASMs are byte-identical to main. This PR is pure UI logic + scripts + registry.
[AI-assisted - Claude]