Skip to content

fix(private-rooms): UI rotate back-fill + PUT join_event + #110 prune exemption (Bug #3 PR B)#272

Merged
sanity merged 5 commits into
mainfrom
fix/dm-bug-3-pr-b
May 17, 2026
Merged

fix(private-rooms): UI rotate back-fill + PUT join_event + #110 prune exemption (Bug #3 PR B)#272
sanity merged 5 commits into
mainfrom
fix/dm-bug-3-pr-b

Conversation

@sanity
Copy link
Copy Markdown
Contributor

@sanity sanity commented May 17, 2026

Problem

Ivvor (Matrix 2026-05-17): "First attempt to create and share private room failed at sync stage... Two invitees processed invite and room opened, but no messages shown... Only room owner can send messages, but they don't show up... Owner is on secret #3. Other members are on secret #0."

PR #270 (PR A) relaxed MessagesV1::apply_delta so the user-visible symptom (messages getting dropped) is resolved. PR B closes the three UI / delegate gaps that produced the v3-vs-v0 divergence in the first place.

Approach

Three independent fixes, all with regression tests:

1. UI rotate_secret lacks per-version back-fill (UI-only)

Before this PR the UI's synchronous fast-path (used on ban and manual-rotate) only emitted blobs at new_version. The delegate's build_rotation_encrypted_secrets already back-filled prior versions for members missing them — they're inconsistent.

  • Extract build_rotation_encrypted_secrets into river-core::room_state::secret (gated on ecies feature) so both UI and delegate share byte-identical logic.
  • UI's rotate_secret now produces blobs at every existing version for any current member who lacks a blob at that version.
  • Delegate's wrapper delegates to the shared helper, eliminating duplication.

2. get_response.rs PUT excludes locally-injected join_event (UI-only)

The invitee's get_response synthesises a join_event into their local room_state so post_apply_cleanup doesn't prune them locally. But the original PUT used the PRE-mutation copy — so the owner-side contract never saw the invitee's activity, and the owner's post_apply_cleanup pruned them until the next sync delta landed.

  • Extract a pure build_state_for_put helper that runs synchronously before the defer() block.
  • Inject the member + join_event into the PUT payload directly.

3. Issue #110 prune-fix (contract WASM)

post_apply_cleanup now exempts members for whom the owner has issued an encrypted_secrets entry. The owner's signed blob is proof of membership-intent that pre-dates any authored message; pruning them on an invitee's first state ingestion is the underlying cause of the "DM to inactive member fails" / "newly-invited member silently dropped" pattern.

This is the cleaner architectural fix for issue #110 (Bug #1's PR #269 handled the DM symptom via bundled rejoin-delta; the underlying prune issue is fixed here).

Testing

  • common/src/room_state.rs: test_member_with_encrypted_secret_survives_cleanup pins fix 3 (member with no messages survives cleanup when owner issued them an encrypted_secret).
  • delegates/chat-delegate/src/subscription/tests.rs: existing 4 back-fill tests (backfill_uses_real_v0_recovered_from_owners_encrypted_secret, backfill_handles_multiple_simultaneous_new_members, backfill_dedups_against_state_for_continuing_members, backfill_handles_readmitted_member) now cover the shared helper that both UI and delegate use.
  • ui/src/room_data.rs: ui_rotation_backfills_prior_versions_for_new_member pins fix 1 from the UI side — rotate to v1 with a member who lacks v0 must emit (member, 0), and that blob must decrypt to the real v0 secret.
  • ui/src/components/app/freenet_api/response_handler/get_response.rs: build_state_for_put_includes_synthesised_join_event and build_state_for_put_owner_path_is_noop pin fix 2.

Full test run: 290+ tests pass; UI tests: 132 pass; delegate tests: 24 pass; migration test: 4 pass.

Migration

Migration entry V21 added to legacy_delegates.toml since item 3 changes contract / delegate WASM hash.

[[entry]]
version = "V21"
description = "Before Bug #3 PR B: post_apply_cleanup encrypted_secrets exemption (issue #110)"
date = "2026-05-17"
delegate_key = "9615964a72a194c0d400161072d3d2747ef3a2d5adc04718ab0641b758b45d0d"
code_hash = "1a63414c706954884409f8bf2ec54fdd97740d08ec89ddd45cd9f4e4fa6ba7ef"

References PR #270 (PR A) and Bug #3 (Ivvor 2026-05-17 Matrix).

[AI-assisted - Claude]

sanity and others added 5 commits May 17, 2026 11:28
… exemption (Bug #3 PR B)

PR A (#270) fixed contract validation. PR B closes the three UI / delegate gaps
that produced Bug #3's v3-vs-v0 divergence:

1. UI rotate_secret now back-fills prior versions for members lacking blobs,
   mirroring delegate logic via a shared build_rotation_encrypted_secrets
   helper extracted into river-core.
2. PUT after invitation acceptance now includes the invitee's synthesised
   join_event so owner-side post_apply_cleanup doesn't prune them on first
   ingestion.
3. post_apply_cleanup exempts members with encrypted_secrets entries
   (issue #110). The owner's signed blob is proof of membership-intent
   that pre-dates any authored message.

Migration entry V21 added since item 3 changes contract WASM.

References PR #270 (PR A) and Bug #3 (Ivvor 2026-05-17 Matrix).

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Codex review of PR #272 caught that `build_state_for_put` and the
deferred ROOMS mutation each signed their OWN join_event with
fresh timestamps and non-deterministic ed25519 signatures, so
`AuthorizedMessageV1::id()` differed between the two — leaving the
room with duplicate "joined" entries after the network state sync
settled.

Refactor `build_state_for_put` to return the synthesised
`AuthorizedMessageV1` alongside the new state, and update the
deferred ROOMS mutation to reuse that same message (cloned).
Both the local state and the PUT payload now carry the
byte-identical join_event.

Extended `build_state_for_put_includes_synthesised_join_event` to
assert MessageId equality between the returned event and the one
embedded in the PUT state; extended the owner-path noop test to
assert `None` is returned.

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Codex review second pass caught: when accepting an invitation for a
room already at `max_members`, `build_state_for_put` was pushing the
invitee onto the full-state PUT, producing `max_members + 1` members.
The contract's `validate_state` rejects that (it goes through
`MembersV1::verify`, which enforces the cap), while the normal delta
path would let `MembersV1::apply_delta`'s `remove_excess_members`
trim deterministically.

Fix: skip the member + join_event injection when the room is at
capacity, falling back to the pre-PR-B PUT-as-retrieved behaviour
for that one case. The natural sync delta then handles trimming or
reports the failure — same as before PR B.

Added `build_state_for_put_respects_max_members` regression test
(room with cap=1 + one existing member, invitee push attempt).
Verifies the returned state has len=1, invitee absent, no
synthesised join_event, and the state still verifies cleanly.

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…P2 pass 3)

Codex review pass 3 caught: build_rotation_encrypted_secrets looped
`0..=new_version` for every member, even though
`RoomSecretsV1::apply_delta` does not require contiguous secret
versions. A valid owner-signed state can legitimately jump to a
large `current_version` (e.g. v=1_000_000), and the next rotation
would spend potentially millions of iterations checking versions
that have no recoverable secret — freezing the rotation pipeline
on both UI and delegate paths.

Fix: iterate `prior_secrets` directly (the versions we actually
have secrets for, plus the newly-derived one). Completion time
is now O(members * recovered_versions), independent of the numeric
value of `new_version`.

Regression test `backfill_handles_sparse_high_version_state`
exercises a v0 → v1_000_000 jump and asserts the call returns in
under 5s with exactly the expected (member, version) blob set.

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…mption TTL, AGENTS docs)

Round-2 review on PR #272 caught two HIGH-severity bugs in the
at-capacity invitation path, an under-documented architectural fix,
and several smaller TTL/test-hygiene items. This commit addresses
all of them.

HIGH-1: at-capacity fallback leaked phantom local state
  Before this commit, when an invitee accepted into a room at
  `max_members`, `build_state_for_put` returned `Ok((state_unchanged,
  None))`. The PUT correctly skipped the member injection (good — the
  contract would reject the PUT otherwise), but the deferred ROOMS
  mutation still pushed `authorized_member` and
  `authorized_member_info` into LOCAL `room_data.room_state`. Result:
  UI shows the user as a member, network never sees the join, next
  sync from the owner silently strips them — with NO user-facing
  signal. This is the exact class of bug PR B was supposed to
  eliminate.

  Fix: `build_state_for_put` now returns `Result<…,
  BuildStateForPutError>`; `RoomAtCapacity` surfaces the failure
  explicitly. The caller short-circuits BEFORE touching ROOMS,
  posting `PendingRoomStatus::Error` and `RoomSyncStatus::Error` so
  the modal can report "room full" cleanly.

  Pinned by `build_state_for_put_errs_at_capacity` and
  `build_state_for_put_input_state_unchanged_on_capacity_error`.

HIGH-2: `build_state_for_put` was not pure
  The doc-comment claimed "pure function — no I/O" but the body
  called `get_current_system_time()` for the join_event timestamp.
  Fix: take `time: SystemTime` as a parameter; the production caller
  passes `get_current_system_time()`, tests fix it deterministically.
  Doc-comment rewritten to say "synchronous, state-only mutation"
  rather than "pure".

  Pinned by `build_state_for_put_is_deterministic_given_time`.

Big-picture BLOCKING: AGENTS.md updated
  "Private Room Support" now documents:
    1. `build_rotation_encrypted_secrets` as the single source of
       truth for both rotation paths, including the byte-identical-
       blob-sets convergence invariant.
    2. `post_apply_cleanup`'s `encrypted_secrets` exemption from
       issue #110, including the new current-version scoping.
    3. Cross-reference to issue #110.

IMPORTANT items folded in:

  #4 — ban + encrypted_secrets must NOT resurrect a banned member.
  New test `test_banned_member_with_encrypted_secret_is_still_pruned`
  pins that the `members_by_id.contains_key(recipient_id)` guard at
  the cleanup site prevents the exemption from firing for
  not-currently-a-member entries.

  #5 — exemption TTL. The `secret_recipients` filter at the cleanup
  site is now scoped to `current_version`. Previously, every
  ever-recipient + their entire invite-chain ancestor set was exempt
  from cleanup forever, defeating the inactivity prune. Now: a
  member who has only old-version blobs and hasn't been re-issued
  at `current_version` is correctly pruned.

  Pinned by `test_stale_secret_recipient_is_pruned_after_rotation`.

  #6 — ban-race convergence with encrypted_secrets. New test
  `test_ban_race_with_encrypted_secret_converges_to_pruned` walks
  both delta orderings (add-X-then-ban-X and ban-X-then-add-X) with
  an in-flight encrypted_secret for X, asserts both peers converge
  to byte-identical state with X pruned.

  #7 — warn on duplicate owner blobs. `build_rotation_encrypted_
  secrets` now emits a defensive `eprintln!` warning when it sees
  a duplicate owner blob at the same version (first-wins still
  applies). This is rare — the contract dedups (member, version) —
  but logging helps surface real-world occurrences. (Using
  `eprintln!` rather than `tracing::warn!` because the common crate
  intentionally has no logging dep, to keep room-contract WASM
  small. No-op in WASM, visible in native tests / delegate native
  builds.)

  #8 — sparse-version test wall-clock threshold. Dropped the
  flaky-prone `elapsed.as_secs() < 5` assertion from
  `backfill_handles_sparse_high_version_state`; the behavioural
  assertion (emitted set is exactly `{(owner@v_new), (bob@v0),
  (bob@v_new)}`, which is O(members * recovered_versions) not
  O(new_version)) is the real signal. Per
  ~/.claude/rules/flaky-tests.md.

Migration: V22 added to `legacy_delegates.toml` since item #5
changes contract WASM hash. V21 retained — it captures the actual
deployed (pre-this-PR-merge) WASM that production users have. V22
captures the in-PR HEAD WASM as a defence-in-depth entry in case
anyone managed to install a build of PR-B-before-this-followup
(e.g. from CI artifacts).

Test counts:
  river-core: 158 passed
  chat-delegate: 25 passed
  river-ui: 135 passed
  migration: 4 passed
  room-contract: 0 (no tests, build clean)

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@sanity sanity merged commit 9baf850 into main May 17, 2026
6 checks passed
@sanity sanity deleted the fix/dm-bug-3-pr-b branch May 17, 2026 17:54
sanity added a commit that referenced this pull request May 17, 2026
…publish

Publishes the merged tip of main containing all 5 bug fixes:

- #263 (#255): legacy delegate cursor overwrite
- #268 (#268, Bug #2): share-invite-via-DM picker hang
- #269 (#269, Bug #1): DM to inactive-but-invited members
- #270 (Bug #3 PR A): room-contract accepts msgs at any known secret version
- #272 (Bug #3 PR B): UI rotate back-fill + PUT join_event + #110 prune exemption
- #273 (#273, Bug #5): WebSocket indicator on no-rooms screen

river-core 0.1.8 published to crates.io (workspace version bump).
riverctl 0.1.58 published to crates.io alongside the UI publish
(addresses Bug #4 — riverctl now matches the deployed room contract WASM).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant