fix(runtimed): resolve duplicate rooms after saving untitled notebook#987
fix(runtimed): resolve duplicate rooms after saving untitled notebook#987
Conversation
When `rekey_ephemeral_room` found a different room already at the canonical path (created by a racing `get_or_create_room`), it silently returned None — leaving the UUID-keyed room orphaned. Now it evicts the interloper and proceeds with the rekey. Also adds an `Arc::ptr_eq` guard to the eviction task so a stale timer from an evicted room cannot accidentally remove a rekeyed room at the same key. Closes #942
rgbkrk
left a comment
There was a problem hiding this comment.
This fixes a real race condition — the Arc::ptr_eq guard on eviction is the key safety addition. Without it, a stale eviction timer could remove a rekeyed room that happens to be at the same key.
The interloper eviction logic is sound: the ephemeral room has the user's content (cells, outputs, kernel), while the interloper is an empty/just-created room from a racing get_or_create_room. Removing the interloper and proceeding with rekey is the right call.
One concern on the interloper cleanup:
The tokio::spawn for interloper shutdown runs detached. If the interloper had a kernel (unlikely but possible if a second client connected and triggered auto-launch during the race window), the shutdown runs asynchronously. The interloper's watcher_shutdown_tx and kernel shutdown are fire-and-forget — if they fail, the resources leak silently. This is acceptable for a rare race condition, but worth a warn! if the kernel shutdown fails:
if let Err(e) = kernel.shutdown().await {
warn!("[notebook-sync] Failed to shut down interloper kernel: {}", e);
}Also worth noting: the interloper's persist_tx and changed_tx channels will be dropped when the Arc is dropped, which is correct — the debounced persistence task will exit when the watch channel closes.
The Arc::ptr_eq guard on eviction is the more important fix here and is clean. Good PR.
There was a problem hiding this comment.
Pull request overview
Fixes a race in runtimed’s notebook room re-keying (UUID → canonical file path) that could leave two rooms mapped to the same notebook path, causing clients (notably Python) to sync against a different room than the UI.
Changes:
- Adds an
Arc::ptr_eqguard in the idle-eviction task to avoid removing a room that has since been replaced under the same key. - Updates
rekey_ephemeral_roomto evict an “interloper” room already present at the canonical path and proceed with the re-key, cleaning up interloper resources asynchronously.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| warn!( | ||
| "[notebook-sync] Re-key skipped: canonical path {} already has a different room", | ||
| "[notebook-sync] Re-key collision: evicting interloper room at {}", | ||
| canonical | ||
| ); | ||
| return None; | ||
| // Remove the interloper — the ephemeral room has the real content. | ||
| let interloper = rooms_guard.remove(&canonical); | ||
| // Clean up interloper resources in background | ||
| if let Some(interloper) = interloper { | ||
| tokio::spawn(async move { | ||
| if let Some(mut kernel) = interloper.kernel.lock().await.take() { | ||
| let _ = kernel.shutdown().await; | ||
| } |
There was a problem hiding this comment.
In rekey_ephemeral_room, the collision path unconditionally removes the room currently keyed by canonical and then shuts down its kernel/watcher. This is unsafe if that “interloper” already has (or is about to have) an in-flight/connected peer that obtained an Arc from get_or_create_room (active_peers may still be 0 at this moment). In that case, removing it from the map and shutting down resources can orphan/disrupt that other connection. Consider only evicting when you can prove the interloper is truly unused (e.g., interloper.active_peers == 0 AND Arc::strong_count(&interloper) == 1, and ideally no running kernel), otherwise keep the prior behavior (skip rekey) or implement a safer merge/redirect strategy.
| if rooms_guard.contains_key(&canonical) { | ||
| let existing = rooms_guard.get(&canonical); | ||
| let is_same_room = existing | ||
| .is_some_and(|r| Arc::ptr_eq(r, rooms_guard.get(old_notebook_id).unwrap_or(r))); | ||
| if !is_same_room { |
There was a problem hiding this comment.
The is_same_room calculation is hard to reason about and can produce surprising results because it falls back to comparing Arc::ptr_eq(r, r) when old_notebook_id is missing from the map (unwrap_or(r)). That masks the “old room not found” condition and makes the collision logic fragile. A clearer approach is to explicitly look up the old room Arc first (or compare existing.as_deref() with the room: &NotebookRoom parameter via std::ptr::eq) and only treat the rooms as the same when that real comparison succeeds.
| warn!( | ||
| "[notebook-sync] Re-key skipped: canonical path {} already has a different room", | ||
| "[notebook-sync] Re-key collision: evicting interloper room at {}", | ||
| canonical | ||
| ); | ||
| return None; | ||
| // Remove the interloper — the ephemeral room has the real content. | ||
| let interloper = rooms_guard.remove(&canonical); |
There was a problem hiding this comment.
This new “collision → evict interloper” behavior changes an important concurrency edge case and the file already has a sizeable test module, but there are no tests covering rekey_ephemeral_room. Please add a test that sets up (1) a UUID-keyed room with content and (2) a concurrently-created canonical-path room, then asserts the post-rekey map contains exactly one room for the canonical path and that the correct room is retained (and that eviction is skipped when the canonical room is not safe to evict).
Summary
rekey_ephemeral_roomfound a different room already at the canonical path (created by a racingget_or_create_roomcall), it silently returnedNone— leaving the UUID-keyed room orphaned in the rooms map. This produced two rooms for the same file path, causing edits from Python clients to be invisible in the UI.rekey_ephemeral_roomevicts the interloper room (which is empty/just-created) and proceeds with the normal rekey, so only one room exists for a given path.Arc::ptr_eqguard to the eviction task to prevent a stale eviction timer from accidentally removing a rekeyed room that now lives at the same key.Closes #942
Verification
runt notebooksand confirm only one room exists for that path.PR submitted by @rgbkrk's agent, Quill