
Fix deprecation, dead code warnings, and other clippiness #1

Merged
rgbkrk merged 2 commits into main from fix-deprecation-warnings on Feb 11, 2026

Conversation

@rgbkrk
Member

@rgbkrk rgbkrk commented Feb 11, 2026

Replaces all deprecated create_client_shell_connection calls with create_client_shell_connection_with_identity + peer_identity_for_session from runtimelib 1.2.0. Suppresses the dead code warning on execute_with_stdin.

- sidecar: replace deprecated create_client_shell_connection with
  create_client_shell_connection_with_identity + peer_identity_for_session
  at all three call sites (run, request_kernel_info, request_python_cwd)
- runt: #[allow(dead_code)] on execute_with_stdin (useful but unused)
- remove build step for packages/ui (source-only library, no build script)
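The shape of the call-site change can be sketched with stand-in stubs. Everything below is hypothetical scaffolding — the real runtimelib 1.2.0 functions take connection info and return live sockets, and their actual signatures are not shown in this PR:

```rust
// Stub types standing in for runtimelib (hypothetical; real signatures differ).
struct ShellConnection {
    identity: Vec<u8>,
}

// Deprecated API: generated a peer identity internally.
#[allow(dead_code)]
fn create_client_shell_connection(_session_id: &str) -> ShellConnection {
    ShellConnection { identity: vec![0] }
}

// Replacement pair: derive a stable identity for the session, then pass it in
// explicitly at each call site (run, request_kernel_info, request_python_cwd).
fn peer_identity_for_session(session_id: &str) -> Vec<u8> {
    session_id.as_bytes().to_vec()
}

fn create_client_shell_connection_with_identity(identity: Vec<u8>) -> ShellConnection {
    ShellConnection { identity }
}

fn main() {
    let identity = peer_identity_for_session("session-abc");
    let conn = create_client_shell_connection_with_identity(identity.clone());
    assert_eq!(conn.identity, identity);
    println!("identity bytes: {}", conn.identity.len());
}
```

The point of the migration is that the identity becomes an explicit input rather than something each connection invents on its own.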
@rgbkrk rgbkrk force-pushed the fix-deprecation-warnings branch from d270cdd to 6a3cf14 on February 11, 2026 18:52
- Remove redundant `use env_logger` import
- Box JupyterMessage in SidecarEvent to reduce enum size (464 → ~8 bytes)
- Use shorthand field init (`content` instead of `content: content`)
- Remove useless .into() on string literals in Key::Character()
- Compare Method::GET directly instead of &Method::GET
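The Box change is the interesting one: a Rust enum is as large as its largest variant, so one oversized payload inflates every value of the type. A minimal sketch (the 456-byte struct below is a made-up stand-in for JupyterMessage):

```rust
use std::mem::size_of;

// Stand-in for a large payload such as JupyterMessage (size is illustrative).
#[allow(dead_code)]
struct BigMessage {
    _buf: [u8; 456],
}

// Every variant pays for the largest one: even Shutdown occupies ~457 bytes.
#[allow(dead_code)]
enum EventUnboxed {
    Message(BigMessage),
    Shutdown,
}

// Boxing the large variant moves the payload to the heap; the enum itself
// shrinks to pointer size (Box is non-null, so the discriminant fits in the niche).
#[allow(dead_code)]
enum EventBoxed {
    Message(Box<BigMessage>),
    Shutdown,
}

fn main() {
    assert!(size_of::<EventUnboxed>() > 456);
    assert_eq!(size_of::<EventBoxed>(), size_of::<usize>());
    println!(
        "unboxed: {} bytes, boxed: {} bytes",
        size_of::<EventUnboxed>(),
        size_of::<EventBoxed>()
    );
}
```

This is exactly the trade clippy's `large_enum_variant` lint suggests: one extra allocation per message in exchange for cheap moves and small channel slots.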
@rgbkrk rgbkrk changed the title from "Fix deprecation and dead code warnings" to "Fix deprecation, dead code warnings, and other clippiness" on Feb 11, 2026
@rgbkrk rgbkrk merged commit 814595c into main Feb 11, 2026
7 checks passed
@rgbkrk rgbkrk deleted the fix-deprecation-warnings branch February 11, 2026 19:10
rgbkrk added a commit that referenced this pull request Feb 24, 2026
…update_display_data

- Fix High #1: Normalize daemon output from JupyterMessageContent to nbformat shape
- Fix High #2: Add daemon_execution to save_setting_locally for local persistence
- Fix Medium #3: Add daemon_execution to from_json and apply_json_changes for migration
- Fix Medium #4: Add onUpdateDisplayData callback for update_display_data handling
- Fix Low #5: Remove verbose broadcast logging (keep only error logs)
rgbkrk added a commit that referenced this pull request Feb 25, 2026
…update_display_data

- Fix High #1: Normalize daemon output from JupyterMessageContent to nbformat shape
- Fix High #2: Add daemon_execution to save_setting_locally for local persistence
- Fix Medium #3: Add daemon_execution to from_json and apply_json_changes for migration
- Fix Medium #4: Add onUpdateDisplayData callback for update_display_data handling
- Fix Low #5: Remove verbose broadcast logging (keep only error logs)
rgbkrk added a commit that referenced this pull request Feb 25, 2026
* Add Tauri commands for daemon kernel execution

Extends the notebook sync client to support request/response patterns:
- Add NotebookBroadcastReceiver for kernel events from daemon
- Add send_request() method with typed frames
- Add recv_frame_any() for handling all frame types
- Forward broadcasts to frontend via daemon:broadcast event

New Tauri commands for daemon-owned execution:
- launch_kernel_via_daemon
- queue_cell_via_daemon
- clear_outputs_via_daemon
- interrupt_via_daemon
- shutdown_kernel_via_daemon
- get_daemon_kernel_info
- get_daemon_queue_state

This is Step 4 of the daemon-owned kernel execution plan.
Frontend hooks (Step 5) still need to be updated to use these.
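The request/response pattern described above can be sketched with a tagged frame enum. All names here are illustrative, not the crate's real types: requests carry an id so a later response can be matched to them, while broadcasts arrive unsolicited and are forwarded onward.

```rust
use std::collections::HashMap;

// Hypothetical typed frames for the daemon wire protocol.
#[derive(Debug, PartialEq)]
enum Frame {
    Request { id: u64, body: String },
    Response { id: u64, body: String },
    Broadcast { event: String },
}

struct Client {
    next_id: u64,
    pending: HashMap<u64, String>, // request id -> body awaiting a response
}

impl Client {
    fn send_request(&mut self, body: &str) -> Frame {
        self.next_id += 1;
        self.pending.insert(self.next_id, body.to_string());
        Frame::Request { id: self.next_id, body: body.to_string() }
    }

    // Handles all frame types: responses resolve a pending request,
    // broadcasts are handed to a callback (e.g. a daemon:broadcast event).
    fn recv_frame_any(&mut self, frame: Frame, on_broadcast: impl Fn(&str)) -> Option<String> {
        match frame {
            Frame::Response { id, body } => self.pending.remove(&id).map(|_| body),
            Frame::Broadcast { event } => {
                on_broadcast(&event);
                None
            }
            Frame::Request { .. } => None, // clients never receive requests
        }
    }
}

fn main() {
    let mut client = Client { next_id: 0, pending: HashMap::new() };
    let sent = client.send_request("queue_cell");
    let id = match sent { Frame::Request { id, .. } => id, _ => unreachable!() };

    // A broadcast may interleave before the response; both are handled.
    let none = client.recv_frame_any(
        Frame::Broadcast { event: "kernel_status".into() },
        |e| assert_eq!(e, "kernel_status"),
    );
    assert!(none.is_none());

    let reply = client.recv_frame_any(Frame::Response { id, body: "ok".into() }, |_| {});
    assert_eq!(reply.as_deref(), Some("ok"));
}
```

Keeping the pending map keyed by id is what lets one connection multiplex requests and broadcasts without head-of-line blocking.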

* Add frontend useDaemonKernel hook and broadcast types

New frontend components for daemon-owned kernel execution:

types.ts:
- DaemonBroadcast: Types for kernel status, outputs, queue changes, errors
- DaemonNotebookResponse: Types for daemon request responses

useDaemonKernel.ts:
- Hook for daemon kernel operations
- Listens for daemon:broadcast events
- Provides launchKernel, queueCell, clearOutputs, interruptKernel, shutdownKernel
- Tracks kernel status, queue state, and kernel info
- Parses output JSON from broadcasts and calls onOutput callback

This hook is separate from useKernel (local execution) to allow
gradual migration. Apps can choose which execution mode to use.

* Add daemon_execution setting to synced settings

New setting to enable/disable daemon-owned kernel execution:

Backend (settings_doc.rs, sync_client.rs):
- Add daemon_execution: bool to SyncedSettings struct
- Add get_bool/put_bool methods to SettingsDoc
- Handle boolean values in put_value for Tauri commands

Frontend (useSyncedSettings.ts):
- Add daemonExecution state and setDaemonExecution callback
- Sync across windows via settings:changed event

The setting defaults to false. When enabled (in a future PR),
the app will use useDaemonKernel instead of useKernel.
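A minimal sketch of the `get_bool`/`put_bool` accessors described above, assuming a plain map as the backing store (the real SettingsDoc is backed by an Automerge document):

```rust
use std::collections::HashMap;

// Simplified stand-in for SettingsDoc; a HashMap replaces the Automerge doc.
struct SettingsDoc {
    values: HashMap<String, bool>,
}

impl SettingsDoc {
    fn put_bool(&mut self, key: &str, value: bool) {
        self.values.insert(key.to_string(), value);
    }

    // Missing keys fall back to a caller-supplied default, which is how
    // daemon_execution can "default to false" without ever being written.
    fn get_bool(&self, key: &str, default: bool) -> bool {
        self.values.get(key).copied().unwrap_or(default)
    }
}

fn main() {
    let mut doc = SettingsDoc { values: HashMap::new() };
    assert!(!doc.get_bool("daemon_execution", false)); // unset -> default false
    doc.put_bool("daemon_execution", true);
    assert!(doc.get_bool("daemon_execution", false));
    println!("daemon_execution = {}", doc.get_bool("daemon_execution", false));
}
```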

* Fix daemon_execution field in SyncedSettings tests

* Address review feedback: output normalization, settings persistence, update_display_data

- Fix High #1: Normalize daemon output from JupyterMessageContent to nbformat shape
- Fix High #2: Add daemon_execution to save_setting_locally for local persistence
- Fix Medium #3: Add daemon_execution to from_json and apply_json_changes for migration
- Fix Medium #4: Add onUpdateDisplayData callback for update_display_data handling
- Fix Low #5: Remove verbose broadcast logging (keep only error logs)

* Wire daemon execution switch in App.tsx

When daemon_execution setting is enabled:
- Use useDaemonKernel hook for kernel operations
- Launch kernel via daemon instead of local ensureKernelStarted
- Queue cells via daemon with cell source
- Route outputs through daemon broadcasts

The switch allows testing daemon-owned kernel execution while
keeping local execution as the stable default.

* Fix daemon execution order: launch kernel before queuing cells

The daemon returns NoKernel when QueueCell is called without a running
kernel, dropping the request. Fix by ensuring the kernel is launched first,
then queuing cells once it's ready.

* Bump runtimed to 0.1.0-dev.3 and speed up dev builds

- Bump version from 0.1.0-dev.2 to 0.1.0-dev.3
- cargo xtask build now uses debug mode for runtimed/runt-cli (~50s faster)
- cargo xtask build-dmg/build-app still use release mode for distribution
- cargo xtask install-daemon unchanged (always release for perf)

* Update Cargo.lock for runtimed version bump
rgbkrk added a commit that referenced this pull request Feb 25, 2026
The daemon was regenerating Jupyter message headers when sending comm
messages to the kernel, breaking widget protocol expectations. This caused
slider interactions in one window to not sync to other windows.

Changes:
- Send full message envelope through daemon instead of just msg_type/content
- Parse and preserve frontend's original header (session, msg_id, etc.)
- Use JupyterMessageContent::from_type_and_content for proper type parsing
- Handle no_kernel response as an error condition in frontend

This fixes issue #1 from code review: daemon routing was dropping the
original frontend header/session and regenerating new headers per message.
rgbkrk added a commit that referenced this pull request Feb 25, 2026
* Enable widget support (ipywidgets) in daemon execution mode

Implement bidirectional comm message routing through the daemon to support
ipywidgets and other comm-based outputs. Comm messages from the kernel are
broadcast to all connected frontends, and widget interactions are routed back
through the daemon to the kernel. All windows sharing a notebook can now
interact with widgets with synchronized state.

- Add NotebookBroadcast::Comm variant for broadcasting comm_open/msg/close
- Handle comm messages in kernel iopub task and route through daemon
- Add NotebookRequest::SendComm for frontend→kernel comm messages
- Extend useDaemonKernel hook with onCommMessage callback and sendCommMessage
- Wire daemon comm sending in App.tsx with module-level sender pattern
- Bump daemon version to 0.1.0-dev.9

* Fix widget sync by preserving frontend message headers

The daemon was regenerating Jupyter message headers when sending comm
messages to the kernel, breaking widget protocol expectations. This caused
slider interactions in one window to not sync to other windows.

Changes:
- Send full message envelope through daemon instead of just msg_type/content
- Parse and preserve frontend's original header (session, msg_id, etc.)
- Use JupyterMessageContent::from_type_and_content for proper type parsing
- Handle no_kernel response as an error condition in frontend

This fixes issue #1 from code review: daemon routing was dropping the
original frontend header/session and regenerating new headers per message.

* Bump daemon version to 0.1.0-dev.10

* Document widget multi-window sync limitation (#276)

Update runtimed.md to reflect current widget support status:
- Widget comm routing is implemented (PR #275)
- Single-window widgets work correctly
- Multi-window sync is a known limitation tracked in #276
rgbkrk added a commit that referenced this pull request Feb 27, 2026
Track active comm channels in daemon and send CommSync broadcast to newly
connected clients, enabling widget reconstruction in secondary windows.

- Add CommState module for tracking comm_open/msg/close
- Wire CommState into NotebookRoom and kernel manager
- Send CommSync after initial Automerge sync on client connect
- Handle comm_sync in frontend to synthesize comm_open messages
- Clear comm state on kernel shutdown

Known limitations:
- Third window opened after execution may not receive comm_sync (needs investigation)
- Widget state not synced in real-time (by design - avoids feedback loops)
- Buffer paths not fully reconstructed (Codex review finding #1)
- Replay order nondeterministic (Codex review finding #2)

Closes #276

Co-Authored-By: QuillAid <261289082+quillaid@users.noreply.github.com>
rgbkrk added a commit that referenced this pull request Feb 27, 2026
* Add widget comm state sync for multi-window support

Track active comm channels in daemon and send CommSync broadcast to newly
connected clients, enabling widget reconstruction in secondary windows.

- Add CommState module for tracking comm_open/msg/close
- Wire CommState into NotebookRoom and kernel manager
- Send CommSync after initial Automerge sync on client connect
- Handle comm_sync in frontend to synthesize comm_open messages
- Clear comm state on kernel shutdown

Known limitations:
- Third window opened after execution may not receive comm_sync (needs investigation)
- Widget state not synced in real-time (by design - avoids feedback loops)
- Buffer paths not fully reconstructed (Codex review finding #1)
- Replay order nondeterministic (Codex review finding #2)

Closes #276

Co-Authored-By: QuillAid <261289082+quillaid@users.noreply.github.com>

* Address Codex review findings #2 and #3

- Use sequence numbers for deterministic comm replay order (finding #2)
  Widget models can reference other widgets via IPY_MODEL_ references. Returning comms
  in insertion order ensures dependencies are created before dependents.

- Clear comm state when launching new kernel (finding #3)
  If a kernel crashes without proper shutdown, stale comm snapshots could
  persist. Now cleared on new kernel launch, not just explicit shutdown.

Co-Authored-By: QuillAid <261289082+quillaid@users.noreply.github.com>
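The sequence-number fix for finding #2 can be sketched as follows (simplified: real comm payloads are Jupyter messages, not strings):

```rust
use std::collections::HashMap;

// Each tracked comm gets a monotonically increasing sequence number, so a
// snapshot replays in insertion order regardless of HashMap iteration order.
struct CommState {
    next_seq: u64,
    comms: HashMap<String, (u64, String)>, // comm_id -> (seq, opening payload)
}

impl CommState {
    fn comm_open(&mut self, comm_id: &str, payload: &str) {
        self.next_seq += 1;
        self.comms.insert(comm_id.to_string(), (self.next_seq, payload.to_string()));
    }

    fn comm_close(&mut self, comm_id: &str) {
        self.comms.remove(comm_id);
    }

    // Deterministic replay: sort by seq so dependencies precede dependents.
    fn snapshot(&self) -> Vec<String> {
        let mut entries: Vec<_> = self.comms.values().cloned().collect();
        entries.sort_by_key(|(seq, _)| *seq);
        entries.into_iter().map(|(_, payload)| payload).collect()
    }
}

fn main() {
    let mut state = CommState { next_seq: 0, comms: HashMap::new() };
    state.comm_open("a", "IntSlider");          // dependency, opened first
    state.comm_open("b", "HBox[IPY_MODEL_a]");  // dependent, opened second
    state.comm_open("c", "Label");
    state.comm_close("c"); // closed comms drop out of the snapshot
    assert_eq!(state.snapshot(), vec!["IntSlider", "HBox[IPY_MODEL_a]"]);
}
```

Iterating the HashMap directly would yield an arbitrary order per run, which is exactly the nondeterminism the finding flagged.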

* Add error logging for broadcast deserialization failures

Previously, failed deserializations during init were silently ignored.
Now logs a warning with the error and payload size to help diagnose
issues like comm_sync not being received.

Co-Authored-By: QuillAid <261289082+quillaid@users.noreply.github.com>

* Add diagnostic logging for comm_sync handling

Helps diagnose third window widget sync issue by logging:
- When comm_sync event is received
- Whether onCommMessage handler is available
- Unknown broadcast event types

Co-Authored-By: QuillAid <261289082+quillaid@users.noreply.github.com>

---------

Co-authored-by: QuillAid <261289082+quillaid@users.noreply.github.com>
rgbkrk added a commit that referenced this pull request Feb 28, 2026
- Fix issue #1: Remove pending completion entry if shell.send() fails,
  preventing stale entries from accumulating
- Fix issue #3: Downgrade completion logging from info! to debug! to
  reduce noise from high-frequency autocomplete requests
rgbkrk added a commit that referenced this pull request Feb 28, 2026
* Add daemon support for code completions (Tab/Ctrl+Space)

Implement code completion requests via the daemon using the Jupyter protocol's
complete_request/complete_reply messages. Uses LSP-ready CompletionItem
structure for future extensibility with additional completion sources (ruff,
basedpyright). Closes #314.

* Address review feedback: clean up pending completions on send failure

- Fix issue #1: Remove pending completion entry if shell.send() fails,
  preventing stale entries from accumulating
- Fix issue #3: Downgrade completion logging from info! to debug! to
  reduce noise from high-frequency autocomplete requests
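The cleanup for issue #1 follows a common insert-then-rollback pattern: the pending entry is registered before sending, and removed again if the send fails. A self-contained sketch, with `shell_send` as a hypothetical stand-in for the real shell socket send:

```rust
use std::collections::HashMap;

// Hypothetical stand-in for shell.send(); fails for ids starting with "bad".
fn shell_send(msg_id: &str) -> Result<(), String> {
    if msg_id.starts_with("bad") {
        Err("socket closed".into())
    } else {
        Ok(())
    }
}

fn request_completion(
    pending: &mut HashMap<String, String>,
    msg_id: &str,
    code: &str,
) -> Result<(), String> {
    pending.insert(msg_id.to_string(), code.to_string());
    if let Err(e) = shell_send(msg_id) {
        // Roll back on failure so stale entries can't accumulate.
        pending.remove(msg_id);
        return Err(e);
    }
    Ok(())
}

fn main() {
    let mut pending = HashMap::new();
    assert!(request_completion(&mut pending, "ok-1", "im").is_ok());
    assert_eq!(pending.len(), 1);
    assert!(request_completion(&mut pending, "bad-2", "pri").is_err());
    // The failed request left no stale entry behind.
    assert_eq!(pending.len(), 1);
}
```

Without the rollback, every failed autocomplete request would leak one map entry — invisible at low volume, significant for high-frequency completion traffic.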
rgbkrk added a commit that referenced this pull request Feb 28, 2026
Bug #1: New notebooks with explicit runtime selection now set the
kernelspec metadata correctly. Previously, new_empty_with_runtime()
left kernelspec as None, causing the daemon to fall back to
default_runtime instead of respecting the user's choice.

- Set kernelspec in new_empty(), new_empty_with_runtime(),
  new_empty_with_conda_from_environment_yml(), and
  new_empty_with_uv_from_pyproject()

Bug #2: Notebooks in directories with pyproject.toml now properly
use the project's dependencies via `uv run --with ipykernel`.

- kernel_manager.rs: Branch on env_source for Python kernels;
  use `uv run` for uv:pyproject instead of pooled env
- notebook_sync_server.rs: Skip pool acquisition for uv:pyproject
  since kernel manager uses uv run directly
rgbkrk added a commit that referenced this pull request Feb 28, 2026
…#337)

* feat: extract kernel-launch crate for shared kernel launching

Create a new shared crate that both notebook and runtimed can use for
kernel launching and tool bootstrapping:

- Extract tools.rs from notebook crate into kernel-launch
- Add kernel-launch dependency to notebook and runtimed
- Update daemon to properly dispatch Python vs Deno kernels
- Deno kernels now use bootstrapped deno via kernel_launch::tools::get_deno_path()
- Python kernels require pooled environment (no kernelspec fallback)
- Add default_runtime check to auto_launch_kernel

This fixes the issue where Deno notebooks were incorrectly launching
Python kernels because the daemon wasn't checking default_runtime.

* fix: update LaunchKernel to handle Deno and require pool env

- Deno kernels skip pooled env acquisition
- Python kernels return error if pool empty (no kernelspec fallback)
- Non-prewarmed Python sources also require pool env

* feat: add inline Deno detection (runt.deno metadata)

- check_inline_deps now detects runt.deno in notebook metadata
- Auto-launch checks inline deps first, which can override default_runtime
- Notebooks with runt.deno config launch Deno kernels regardless of settings

* fix: prioritize project files over default_runtime and route conda to correct pool

Issues fixed:
1. Project files (pyproject.toml, environment.yml, pixi.toml) now take
   precedence over default_runtime setting. This prevents Deno from
   launching when opening a Python project notebook.

2. Conda-prefixed env_sources (conda:env_yml, conda:pixi, conda:inline)
   now correctly route to the conda pool instead of UV pool.

Detection priority is now:
1. Inline deps (runt.deno, runt.uv, runt.conda)
2. Project files (implies Python context)
3. Default runtime (only if no project context)

* fix: prioritize notebook kernelspec over project files for runtime detection

Detection priority is now:
1. Notebook's kernelspec (metadata.kernelspec.name) - determines python vs deno
2. For Python notebooks: resolve env (inline deps → project files → prewarmed)
3. For Deno notebooks: just launch Deno
4. For new notebooks (no kernelspec): use default_runtime setting

This allows Deno and Python notebooks to coexist in the same project directory.
A Deno notebook in a directory with pyproject.toml will still launch Deno.
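Stage 1 of that priority order can be sketched as a small pure function. This is a simplification under stated assumptions — real inputs come from notebook metadata and filesystem scans, and kernelspec names other than "deno" are treated here as Python:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Runtime {
    Python,
    Deno,
}

// Stage 1: the notebook's own kernelspec wins; default_runtime applies only
// to new notebooks with no kernelspec at all.
fn detect_runtime(kernelspec_name: Option<&str>, default_runtime: Runtime) -> Runtime {
    match kernelspec_name {
        Some("deno") => Runtime::Deno,
        Some(_) => Runtime::Python, // e.g. "python3" -> Stage 2 env resolution
        None => default_runtime,
    }
}

fn main() {
    // A Deno notebook in a pyproject.toml directory still launches Deno,
    // because the kernelspec is consulted before any project files.
    assert_eq!(detect_runtime(Some("deno"), Runtime::Python), Runtime::Deno);
    assert_eq!(detect_runtime(Some("python3"), Runtime::Deno), Runtime::Python);
    // New notebook, no kernelspec: fall back to the default_runtime setting.
    assert_eq!(detect_runtime(None, Runtime::Deno), Runtime::Deno);
}
```

Project-file detection (pyproject.toml, environment.yml, pixi.toml) then only runs inside the Python branch, which is what lets the two notebook kinds coexist in one directory.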

* docs: clarify kernel launching priority with two-stage detection

Document the new kernel launching architecture where runtime type
(Python vs Deno) is determined first, then environment resolution
for Python notebooks. This allows Python and Deno notebooks to
coexist in the same project directory.

- Add kernel-launch crate to documentation
- Explain Stage 1: Runtime Detection (kernelspec-first)
- Explain Stage 2: Python Environment Resolution (inline → project → prewarmed)
- Document Deno kernel bootstrapping via rattler
- Update developer guide (AGENTS.md) with new two-stage detection
- Update user guide (docs/environments.md) with clearer explanation
- Add detailed architecture notes to contributing/environments.md

* Fix kernel detection for new notebooks and pyproject support

Bug #1: New notebooks with explicit runtime selection now set the
kernelspec metadata correctly. Previously, new_empty_with_runtime()
left kernelspec as None, causing the daemon to fall back to
default_runtime instead of respecting the user's choice.

- Set kernelspec in new_empty(), new_empty_with_runtime(),
  new_empty_with_conda_from_environment_yml(), and
  new_empty_with_uv_from_pyproject()

Bug #2: Notebooks in directories with pyproject.toml now properly
use the project's dependencies via `uv run --with ipykernel`.

- kernel_manager.rs: Branch on env_source for Python kernels;
  use `uv run` for uv:pyproject instead of pooled env
- notebook_sync_server.rs: Skip pool acquisition for uv:pyproject
  since kernel manager uses uv run directly

* Fix inline UV dependencies to use uv run --with

For uv:inline sources, the kernel manager now uses `uv run --with <dep>`
for each inline dependency instead of a prewarmed pool environment.
This ensures that notebooks with inline dependencies (like `requests`)
actually have those packages available at runtime.

Changes:
- kernel_manager.rs: Add inline_deps parameter to launch(); handle
  uv:inline by building `uv run --with ipykernel --with <deps>...`
- notebook_sync_server.rs: Add get_inline_uv_deps() helper to extract
  dependency list from notebook metadata; pass None for pooled_env
  and extract deps when env_source is uv:inline

* feat(runtimed): add cached inline UV env support

Replace ephemeral `uv run --with` approach with cached environments
for inline UV dependencies. Environments are cached by dependency hash
in ~/.cache/runt/inline-envs/ for fast reuse on subsequent launches.

New module inline_env.rs provides:
- prepare_uv_inline_env(): Creates/returns cached venv with deps installed
- Dependency hash for cache key (sorted, stable)
- Uses `uv venv` + `uv pip install` for reliable installation

Changes:
- kernel_manager.rs: uv:inline now uses prepared cached env
- notebook_sync_server.rs: Calls prepare_uv_inline_env() before launch
  for both auto_launch_kernel and LaunchKernel request handler
- lib.rs: Export new inline_env module

TODO: Add conda:inline support using rattler (helper function added
but not yet wired up)
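The "dependency hash for cache key (sorted, stable)" idea amounts to: sort the dependency list before hashing so the same set always maps to the same cache directory. A sketch — the real implementation's hash function isn't specified in this PR, so `DefaultHasher` is a stand-in:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Sorted-then-hashed cache key for an inline-deps environment, so the same
// dependency set reuses one cached venv regardless of declaration order.
fn inline_env_key(deps: &[&str]) -> u64 {
    let mut sorted: Vec<&str> = deps.to_vec();
    sorted.sort_unstable();
    let mut hasher = DefaultHasher::new();
    sorted.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // Order-insensitive: both spellings hit the same cached environment.
    assert_eq!(
        inline_env_key(&["requests", "numpy"]),
        inline_env_key(&["numpy", "requests"])
    );
    // A different dependency set gets a different environment.
    assert_ne!(inline_env_key(&["requests"]), inline_env_key(&["requests", "numpy"]));
}
```

In practice the key would be rendered as a directory name under the cache root; note that `DefaultHasher` is not stable across Rust releases, so a real on-disk cache would want a fixed hash (e.g. SHA-256 of the sorted list).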

* docs: document inline dependency environment caching

Add documentation for how uv:inline deps work:
- Cached environments in ~/.cache/runt/inline-envs/
- Hash-based caching for fast reuse
- Flow from detection to kernel launch

Add inline_env.rs to key files table.
rgbkrk added a commit that referenced this pull request Mar 1, 2026
1. Socket path mismatch (Issue #1): Session.connect() now respects
   RUNTIMED_SOCKET_PATH env var, and test fixtures set it when spawning
   daemon in CI mode.

2. Nested timeouts (Issue #2): Remove outer 10ms timeout wrapping
   recv_frame_any() since it already has internal 100ms timeout.
   This prevents repeatedly canceling mid-read.

3. sync_rx not drained (Issue #3): Use try_send() instead of send().await
   for changes channel. If receiver isn't keeping up, skip the update
   rather than blocking. Python bindings keep sync_rx alive but don't
   consume it.

4. Parse failure semantics (Issue #4): When output_type is "error" but
   parsing fails, create an error Output to preserve success=false
   semantics.

5. CONDUCTOR_WORKSPACE_PATH (Safia's comment): Use env var as preferred
   repo root fallback before walking up parent directories.
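Fix #3 is the standard "bounded channel with a slow (or absent) consumer" move: a blocking `send().await` would stall the sync loop forever, while `try_send()` drops the update when the channel is full. The same behavior can be shown with the std sync channel:

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

fn main() {
    // Capacity 2; the receiver is kept alive but never drained, mirroring
    // Python bindings that hold sync_rx without consuming it.
    let (tx, _rx) = sync_channel::<u32>(2);

    let mut delivered = 0;
    let mut skipped = 0;
    for change in 0..5 {
        match tx.try_send(change) {
            Ok(()) => delivered += 1,
            // Channel full: skip this update instead of blocking the loop.
            Err(TrySendError::Full(_)) => skipped += 1,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    assert_eq!(delivered, 2);
    assert_eq!(skipped, 3);
    println!("delivered {delivered}, skipped {skipped}");
}
```

Dropping intermediate change notifications is safe here because the consumer only needs the latest state, not every delta.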
rgbkrk added a commit that referenced this pull request Mar 1, 2026
* feat(runtimed-py): add PyO3 bindings for daemon client

Add Python bindings for runtimed daemon operations:

- DaemonClient: pool status, ping, list rooms, flush, shutdown
- Session: connect to notebook room, start kernel, execute code
- ExecutionResult/Output: structured output types

Uses PyO3 0.28 with pyo3-async-runtimes for tokio integration.

Note: Current Session.execute() uses QueueCell shortcut which
bypasses automerge doc. See design gap notes for proper
doc-based execution flow.

* chore: add *.so to gitignore, remove stray architecture.md

- Add *.so to root gitignore for Python extension modules
- Remove contributing/architecture.md (lives on architectural-principles branch)

* feat(runtimed-py): implement document-first execution

Switch runtimed-py bindings to use ExecuteCell instead of deprecated
QueueCell. The daemon now reads cell source from the automerge document,
ensuring all connected clients see the same code being executed.

Changes:
- Add Cell class to expose cell info to Python
- Add document operations: create_cell, set_source, get_cell, get_cells, delete_cell
- Add execute_cell() using ExecuteCell (reads from doc)
- Add run() convenience method (create + execute)
- Fix sync task exit bug by storing sync_rx in SessionState
- Add timeout to recv_frame_any() to prevent blocking
- Add biased select! to prioritize commands over polling

* test(runtimed-py): add daemon integration tests

Add comprehensive integration tests for the document-first execution
pattern. Tests cover:
- Basic connectivity to daemon
- Document operations (create/update/get/delete cells)
- Cell execution via ExecuteCell (reading from automerge doc)
- Multi-client synchronization (two sessions sharing a notebook)
- Kernel lifecycle (start/interrupt/shutdown)
- Output types (stdout/stderr/display_data)
- Error handling

Supports two modes:
- Dev mode: uses existing daemon via `cargo xtask dev-daemon`
- CI mode: spawns isolated daemon with log capture

24 tests covering the core document-first architecture.

* ci(runtimed-py): add daemon integration tests to CI

Add a new job to the build workflow that runs the runtimed-py
integration tests. The job:
- Builds runtimed binary and runtimed-py Python bindings
- Runs 24 integration tests in CI mode (spawns isolated daemon)
- Uploads test logs as artifacts for debugging

Tests verify document-first execution, multi-client sync, kernel
lifecycle, output capture, and error handling.

* docs: add Python bindings documentation

Comprehensive guide for the runtimed Python package covering:
- Session API for code execution
- DaemonClient for low-level operations
- Document-first execution pattern
- Multi-client scenarios
- Result types (ExecutionResult, Output, Cell)
- Sidecar launcher for rich output

* docs(runtimed-py): add package README for PyPI

* fix(runtimed-py): address code review findings

1. Socket path mismatch (Issue #1): Session.connect() now respects
   RUNTIMED_SOCKET_PATH env var, and test fixtures set it when spawning
   daemon in CI mode.

2. Nested timeouts (Issue #2): Remove outer 10ms timeout wrapping
   recv_frame_any() since it already has internal 100ms timeout.
   This prevents repeatedly canceling mid-read.

3. sync_rx not drained (Issue #3): Use try_send() instead of send().await
   for changes channel. If receiver isn't keeping up, skip the update
   rather than blocking. Python bindings keep sync_rx alive but don't
   consume it.

4. Parse failure semantics (Issue #4): When output_type is "error" but
   parsing fails, create an error Output to preserve success=false
   semantics.

5. CONDUCTOR_WORKSPACE_PATH (Safia's comment): Use env var as preferred
   repo root fallback before walking up parent directories.
rgbkrk added a commit that referenced this pull request Mar 9, 2026
…ides

Add comprehensive documentation for developers:
- contributing/frontend-architecture.md: src/ vs apps/notebook/src/ split, path aliases, data flow
- contributing/testing.md: unified testing guide (Vitest, Rust, Hone, Python, E2E)
- contributing/typescript-bindings.md: ts-rs workflow for Rust → TypeScript types
- contributing/widget-development.md: widget system internals, WidgetStore API, adding widgets
- contributing/architecture.md: expanded Principle 5 with blob store implementation details

Addresses the high-priority documentation gaps from the gap analysis (#1-4); multi-window sync docs are intentionally omitted, as requested.
rgbkrk added a commit that referenced this pull request Mar 9, 2026
…ides (#652)

Add comprehensive documentation for developers:
- contributing/frontend-architecture.md: src/ vs apps/notebook/src/ split, path aliases, data flow
- contributing/testing.md: unified testing guide (Vitest, Rust, Hone, Python, E2E)
- contributing/typescript-bindings.md: ts-rs workflow for Rust → TypeScript types
- contributing/widget-development.md: widget system internals, WidgetStore API, adding widgets
- contributing/architecture.md: expanded Principle 5 with blob store implementation details

Addresses the high-priority documentation gaps from the gap analysis (#1-4); multi-window sync docs are intentionally omitted, as requested.
rgbkrk added a commit that referenced this pull request Mar 18, 2026
Agents (and humans) waste time when they don't know the right
incantation for rebuilding Python bindings, running integration
tests, or interacting with the dev daemon. Add a copy-paste-ready
Quick Recipes section at the top of AGENTS.md covering:

- Dev daemon env vars (the #1 footgun)
- Rebuilding runtimed-py into the correct venv
- Running Python integration tests against the dev daemon
- Launching the notebook app in dev mode
- WASM rebuild
- Pointers to the right contributing docs per task
rgbkrk added a commit that referenced this pull request Mar 18, 2026
* feat(runtimed-py): per-cell accessors for O(1) MCP reads

Add get_cell_source, get_cell_type, get_cell_outputs,
get_cell_execution_count, get_cell_ids, get_cell_metadata,
and get_cell_position to DocHandle, Session, and AsyncSession.

Each reads from the snapshot watch channel (no mutex) and only
clones the requested field — not the entire Vec<CellSnapshot>.
This makes MCP reads O(1) per field instead of O(n_cells).

Available on both Session (sync) and AsyncSession (async).
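The "clone only the requested field" point can be sketched as below. Field names on `CellSnapshot` are simplified stand-ins; the real accessors read the current value of a `watch::Receiver` rather than a plain `Vec`:

```rust
#[derive(Clone)]
struct CellSnapshot {
    id: String,
    source: String,
    cell_type: String,
}

struct DocHandle {
    snapshot: Vec<CellSnapshot>, // stand-in for the watch channel's value
}

impl DocHandle {
    // Finds the cell by id and clones out just the one field, instead of
    // cloning the entire Vec<CellSnapshot> and handing it all back.
    fn get_cell_source(&self, cell_id: &str) -> Option<String> {
        self.snapshot.iter().find(|c| c.id == cell_id).map(|c| c.source.clone())
    }

    fn get_cell_type(&self, cell_id: &str) -> Option<String> {
        self.snapshot.iter().find(|c| c.id == cell_id).map(|c| c.cell_type.clone())
    }

    fn get_cell_ids(&self) -> Vec<String> {
        self.snapshot.iter().map(|c| c.id.clone()).collect()
    }
}

fn main() {
    let handle = DocHandle {
        snapshot: vec![
            CellSnapshot { id: "c1".into(), source: "x = 1".into(), cell_type: "code".into() },
            CellSnapshot { id: "c2".into(), source: "# notes".into(), cell_type: "markdown".into() },
        ],
    };
    assert_eq!(handle.get_cell_source("c1").as_deref(), Some("x = 1"));
    assert_eq!(handle.get_cell_type("c2").as_deref(), Some("markdown"));
    assert_eq!(handle.get_cell_source("missing"), None);
    assert_eq!(handle.get_cell_ids(), vec!["c1", "c2"]);
}
```

The win is per-field: copying one `String` instead of every cell's source, outputs, and metadata on each MCP read.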

* feat(nteract-mcp): use per-cell accessors to skip full materialization

set_cell and_run: get_cell_type() instead of get_cell() — only needs
to check if the cell is code before executing, no reason to materialize
all cells and resolve blobs.

resource_cell_by_index: get_cell_ids() to resolve index → ID, then
get_cell() for just the one cell. Previously materialized every cell.

set_cell unchanged path: remove unused get_cell() call that was
materializing all cells for no reason.

Also add .pyi stubs for all new per-cell accessors on both Session
and AsyncSession.

* perf(runtimed-py): get_cell uses single-cell lookup instead of full materialization

handle.get_cell(cell_id) clones just the one matching CellSnapshot
from the snapshot watch channel. The previous code called
handle.get_cells() which cloned ALL cells, then linear-scanned to
find the one it wanted.

* fix(runtimed-py): add missing get_cell_position, reduce lock contention

Add get_cell_position to session_core, Session, and AsyncSession —
the .pyi stub declared it but the implementation was missing.

All per-cell accessors now clone the DocHandle (cheap Arc clone)
and drop the SessionState mutex before calling the accessor method.
The accessors read from a watch::Receiver that doesn't need the
mutex held, so this reduces contention with concurrent operations.

* test(runtimed-py): integration tests for per-cell accessors

20 tests covering sync + async per-cell accessor methods:
- get_cell_ids ordering
- get_cell_source, get_cell_type, get_cell_outputs,
  get_cell_execution_count, get_cell_position for existing cells
- None returns for nonexistent cells
- Position ordering matches creation order
- Consistency with get_cell() full materialization
- Source updates reflected after sync

* docs(agents): add quick recipes for common dev tasks

Agents (and humans) waste time when they don't know the right
incantation for rebuilding Python bindings, running integration
tests, or interacting with the dev daemon. Add a copy-paste-ready
Quick Recipes section at the top of AGENTS.md covering:

- Dev daemon env vars (the #1 footgun)
- Rebuilding runtimed-py into the correct venv
- Running Python integration tests against the dev daemon
- Launching the notebook app in dev mode
- WASM rebuild
- Pointers to the right contributing docs per task
quillaid added a commit that referenced this pull request Apr 11, 2026
…tarts

**Problem:**
When the MCP child process (`runt-nightly mcp`) exits after idle timeout,
the proxy holds a stale handle and never triggers restart. Upstream clients
see "Transport closed" and stay dead until manual `/mcp` restart.

This is Bug #3 from ~/reports/2026-04-11-daemon-and-proxy-bugs.md and is
the root cause of daemon orphaning issues (cascades to Bugs #1 and #2).

**Solution:**
Add background monitoring task that polls child status every 500ms and
automatically restarts when the child exits:

- Spawn monitor task after init_child() and restart_child()
- Poll child_client.is_closed() to detect exit
- Trigger restart_child() transparently
- Prevent concurrent restarts with mutex
- Send tool_list_changed notification to keep upstream alive
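The poll-and-restart loop can be sketched synchronously like this (the real code spawns a tokio task and talks to rmcp's `RunningService`; `monitor_child` and its flag-based "restart" are illustrative stand-ins, bounded by `max_polls` so the sketch terminates):

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::time::Duration;

// Hypothetical monitor loop: poll an is-closed flag on an interval and
// trigger a restart when the child exits.
fn monitor_child(
    child_closed: Arc<AtomicBool>,
    restart_count: Arc<AtomicU32>,
    poll_interval: Duration,
    max_polls: u32,
) {
    for _ in 0..max_polls {
        if child_closed.load(Ordering::SeqCst) {
            restart_count.fetch_add(1, Ordering::SeqCst);
            child_closed.store(false, Ordering::SeqCst); // stand-in for restart_child()
            break; // the restarted child gets its own fresh monitor task
        }
        std::thread::sleep(poll_interval);
    }
}

fn main() {
    let closed = Arc::new(AtomicBool::new(true)); // child already exited
    let restarts = Arc::new(AtomicU32::new(0));
    monitor_child(closed.clone(), restarts.clone(), Duration::from_millis(1), 10);
    assert_eq!(restarts.load(Ordering::SeqCst), 1);
    assert!(!closed.load(Ordering::SeqCst)); // child is "running" again
}
```

The `break` after a successful restart is what keeps the task count flat: each child generation owns exactly one monitor.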

**Restart Metrics:**
Add tracking for supervisor_status:
- restart_count() - number of child restarts
- child_uptime_secs() - current child uptime
- last_restart_time() - when last restart occurred

**Structured Logging:**
Enhanced events for lifecycle tracking:
- child_spawned (with binary path, args, upstream client)
- child_restart_requested (with restart number, uptime)
- child_exited (with uptime before restart)
- child_restart_failed (with error message)

**Testing:**
- 3 new unit tests (restart metrics, concurrent restart prevention)
- All 97 tests pass
- Manual testing still needed (idle timeout scenario)

**Design Choice:**
Used polling (500ms) instead of waiting() because RunningService doesn't
implement Clone and waiting() consumes self. Polling is simpler and
low-overhead.

Resolves: Bug #3 (MCP Proxy Child Monitoring)
Research: ~/reports/bug3-proxy-lifecycle-research.md
Plan: ~/reports/consolidated-fix-plan.md
rgbkrk pushed a commit that referenced this pull request Apr 11, 2026
…tarts (#1701)

* fix(runt-mcp-proxy): add child process monitoring for transparent restarts

**Problem:**
When the MCP child process (`runt-nightly mcp`) exits after idle timeout,
the proxy holds a stale handle and never triggers restart. Upstream clients
see "Transport closed" and stay dead until manual `/mcp` restart.

This is Bug #3 from ~/reports/2026-04-11-daemon-and-proxy-bugs.md and is
the root cause of daemon orphaning issues (cascades to Bugs #1 and #2).

**Solution:**
Add background monitoring task that polls child status every 500ms and
automatically restarts when the child exits:

- Spawn monitor task after init_child() and restart_child()
- Poll child_client.is_closed() to detect exit
- Trigger restart_child() transparently
- Prevent concurrent restarts with mutex
- Send tool_list_changed notification to keep upstream alive

**Restart Metrics:**
Add tracking for supervisor_status:
- restart_count() - number of child restarts
- child_uptime_secs() - current child uptime
- last_restart_time() - when last restart occurred

**Structured Logging:**
Enhanced events for lifecycle tracking:
- child_spawned (with binary path, args, upstream client)
- child_restart_requested (with restart number, uptime)
- child_exited (with uptime before restart)
- child_restart_failed (with error message)

**Testing:**
- 3 new unit tests (restart metrics, concurrent restart prevention)
- All 97 tests pass
- Manual testing still needed (idle timeout scenario)

**Design Choice:**
Used polling (500ms) instead of waiting() because RunningService doesn't
implement Clone and waiting() consumes self. Polling is simpler and
low-overhead.

Resolves: Bug #3 (MCP Proxy Child Monitoring)
Research: ~/reports/bug3-proxy-lifecycle-research.md
Plan: ~/reports/consolidated-fix-plan.md

* fix(runt-mcp-proxy): address code review feedback

**Code Review Fixes:**
1. **Critical:** Fix task leak on restart - old monitor now exits after
   spawning new one (prevents N+1 tasks after N restarts)
2. **Important:** Improve docstring to document polling vs waiting()
   constraint (RunningService ownership issue)
3. **Important:** Make polling interval configurable via ProxyConfig
   (monitor_poll_interval_ms, default 500ms)

**Changes:**
- Monitor task now breaks after successful restart instead of continuing
- Added monitor_poll_interval_ms field to ProxyConfig (default: 500ms)
- Updated all ProxyConfig creation sites (supervisor, mcpb, tests)
- Enhanced docstring to explain design constraint

**Testing:**
- All 97 tests still pass
- cargo check passes for runt-mcp-proxy, mcp-supervisor, mcpb-runt
- cargo xtask lint --fix passes

**Review:** superpowers:code-reviewer identified issues, all addressed
quillaid added a commit that referenced this pull request Apr 11, 2026
**Problem:**
Pool size settings written by `runt config set` were silently ignored by
the daemon. Users could set `uv_pool_size: 15` in settings.json, but the
daemon always used compiled defaults (3/3/2) because:

1. CLI writes pool sizes to settings.json correctly
2. Daemon reads from Automerge SettingsDoc
3. SettingsDoc import functions never read pool_size keys from JSON

This is Bug #1 from ~/reports/2026-04-11-daemon-and-proxy-bugs.md.

**Solution:**

1. **Add pool size import in from_json()** (initial migration)
   - Read uv_pool_size, conda_pool_size, pixi_pool_size from JSON
   - Write to SettingsDoc during first-time migration
   - File: crates/runtimed-client/src/settings_doc.rs:409-419

2. **Add pool size import in apply_json_changes()** (live updates)
   - Detect pool_size changes in settings.json
   - Update SettingsDoc when values change
   - Log changes at info level
   - File: crates/runtimed-client/src/settings_doc.rs:886-918

3. **Remove CLI seed override** (daemon.rs:516-519)
   - Removed code that overwrote pool sizes from CLI args
   - Settings.json is now the single source of truth
   - CLI defaults only apply on first launch before settings.json exists

4. **Add Pixi pool size to startup logs** (main.rs:413)
   - Was missing from daemon config dump
   - Now matches UV and Conda logging

5. **Add --pixi-pool-size CLI flag** (main.rs:72-74)
   - Completes parity with UV and Conda flags
   - Default: 2 (matches existing behavior)
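The import step reduces to "read each key, fall back to the schema default." A minimal sketch, assuming the settings JSON has already been parsed into a key-to-integer map (the real code walks serde_json values and writes into the Automerge SettingsDoc, which is elided here); key names and defaults (3/3/2) match the commit:

```rust
use std::collections::HashMap;

// Sketch of the from_json() pool-size import: missing keys keep the
// compiled schema defaults.
fn import_pool_sizes(settings: &HashMap<String, u64>) -> (u64, u64, u64) {
    let read = |key: &str, default: u64| settings.get(key).copied().unwrap_or(default);
    (
        read("uv_pool_size", 3),
        read("conda_pool_size", 3),
        read("pixi_pool_size", 2),
    )
}

fn main() {
    let mut settings = HashMap::new();
    settings.insert("uv_pool_size".to_string(), 15u64);
    // Only uv is set in settings.json; conda and pixi keep their defaults.
    assert_eq!(import_pool_sizes(&settings), (15, 3, 2));
}
```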

**Testing:**

Added integration test `test_pool_size_config_honored` that verifies:
- from_json() correctly imports pool sizes
- apply_json_changes() correctly updates on JSON changes
- No-op optimization works (no change when values match)

All unit tests pass (125 tests in runtimed-client).
New integration test passes.

**Impact:**

Before: Pool sizes effectively read-only at compile time (3/3/2)
After: Pool sizes fully configurable via settings.json or CLI commands

**Verification:**
```bash
# Set pool sizes
runt-nightly config set uv_pool_size 15
runt-nightly config set conda_pool_size 15
runt-nightly config set pixi_pool_size 15

# Restart daemon
runt-nightly daemon restart

# Verify
runt-nightly daemon status
# Expected: UV: 15/15, Conda: 15/15, Pixi: 15/15
```

Resolves: Bug #1 (Config Pool Sizes)
Research: ~/reports/bug1-config-research.md
Plan: ~/reports/consolidated-fix-plan.md
quillaid added a commit that referenced this pull request Apr 11, 2026
Fixes test_daemon_take_empty_pool timeout failure in CI.

**Root cause:**
After the Bug #1 fix, daemon warming loops read pool sizes exclusively
from SettingsDoc (which imports from settings.json). When tests set
config.uv_pool_size = 0, the daemon ignored that and used SettingsDoc
defaults (3/3/2), causing it to try warming environments during tests.

This led to timeouts because:
1. Tests use config.{uv,conda,pixi}_pool_size = 0 to disable warming
2. Daemon's SettingsDoc has no settings.json in test runs
3. SettingsDoc defaults to (3, 3, 2) from schema
4. Daemon tries to warm 3 UV + 3 Conda + 2 Pixi = 8 environments
5. Environment creation hangs or times out in CI

**Solution:**
Check BOTH config and SettingsDoc for pool sizes:
- If config.{uv,conda,pixi}_pool_size == 0 → use 0 (test mode)
- Otherwise → use SettingsDoc value (production mode)

This preserves test isolation while honoring user-configured pool sizes.
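The precedence rule is one branch per pool; a sketch (function name is illustrative):

```rust
// Test mode wins: pool size 0 in the in-process config disables warming
// outright; otherwise the live SettingsDoc value is authoritative.
fn effective_pool_size(config_size: u64, settings_doc_size: u64) -> u64 {
    if config_size == 0 {
        0 // test isolation: never warm environments
    } else {
        settings_doc_size // production: settings.json is the source of truth
    }
}

fn main() {
    assert_eq!(effective_pool_size(0, 3), 0);   // test config disables warming
    assert_eq!(effective_pool_size(3, 15), 15); // user-configured size honored
}
```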

**Changes:**
- daemon.rs:2705-2720 (UV loop): Check config.uv_pool_size before SettingsDoc
- daemon.rs:2793-2808 (Conda loop): Check config.conda_pool_size before SettingsDoc
- daemon.rs:2891-2906 (Pixi loop): Check config.pixi_pool_size before SettingsDoc

**Testing:**
- test_daemon_take_empty_pool now passes (was timing out)
- All 22 integration tests pass
- test_multiple_notebooks_concurrent_isolation is flaky but unrelated

Fixes: test_daemon_take_empty_pool timeout in PR #1702
Related: Bug #1 (Config Pool Sizes)
rgbkrk pushed a commit that referenced this pull request Apr 11, 2026
* fix(runtimed): honor pool_size settings from settings.json

**Problem:**
Pool size settings written by `runt config set` were silently ignored by
the daemon. Users could set `uv_pool_size: 15` in settings.json, but the
daemon always used compiled defaults (3/3/2) because:

1. CLI writes pool sizes to settings.json correctly
2. Daemon reads from Automerge SettingsDoc
3. SettingsDoc import functions never read pool_size keys from JSON

This is Bug #1 from ~/reports/2026-04-11-daemon-and-proxy-bugs.md.

**Solution:**

1. **Add pool size import in from_json()** (initial migration)
   - Read uv_pool_size, conda_pool_size, pixi_pool_size from JSON
   - Write to SettingsDoc during first-time migration
   - File: crates/runtimed-client/src/settings_doc.rs:409-419

2. **Add pool size import in apply_json_changes()** (live updates)
   - Detect pool_size changes in settings.json
   - Update SettingsDoc when values change
   - Log changes at info level
   - File: crates/runtimed-client/src/settings_doc.rs:886-918

3. **Remove CLI seed override** (daemon.rs:516-519)
   - Removed code that overwrote pool sizes from CLI args
   - Settings.json is now the single source of truth
   - CLI defaults only apply on first launch before settings.json exists

4. **Add Pixi pool size to startup logs** (main.rs:413)
   - Was missing from daemon config dump
   - Now matches UV and Conda logging

5. **Add --pixi-pool-size CLI flag** (main.rs:72-74)
   - Completes parity with UV and Conda flags
   - Default: 2 (matches existing behavior)

**Testing:**

Added integration test `test_pool_size_config_honored` that verifies:
- from_json() correctly imports pool sizes
- apply_json_changes() correctly updates on JSON changes
- No-op optimization works (no change when values match)

All unit tests pass (125 tests in runtimed-client).
New integration test passes.

**Impact:**

Before: Pool sizes effectively read-only at compile time (3/3/2)
After: Pool sizes fully configurable via settings.json or CLI commands

**Verification:**
```bash
# Set pool sizes
runt-nightly config set uv_pool_size 15
runt-nightly config set conda_pool_size 15
runt-nightly config set pixi_pool_size 15

# Restart daemon
runt-nightly daemon restart

# Verify
runt-nightly daemon status
# Expected: UV: 15/15, Conda: 15/15, Pixi: 15/15
```

Resolves: Bug #1 (Config Pool Sizes)
Research: ~/reports/bug1-config-research.md
Plan: ~/reports/consolidated-fix-plan.md

* docs: address code review feedback with clarifying comments

Added comments per code review:
- Warming loops now document they read from SettingsDoc not config
- Links to integration test for verification
- CLI flag default linked to schema constant

No functional changes, only documentation improvements.

* fix(runtimed): honor config pool_size=0 for test isolation

Fixes test_daemon_take_empty_pool timeout failure in CI.

**Root cause:**
After the Bug #1 fix, daemon warming loops read pool sizes exclusively
from SettingsDoc (which imports from settings.json). When tests set
config.uv_pool_size = 0, the daemon ignored that and used SettingsDoc
defaults (3/3/2), causing it to try warming environments during tests.

This led to timeouts because:
1. Tests use config.{uv,conda,pixi}_pool_size = 0 to disable warming
2. Daemon's SettingsDoc has no settings.json in test runs
3. SettingsDoc defaults to (3, 3, 2) from schema
4. Daemon tries to warm 3 UV + 3 Conda + 2 Pixi = 8 environments
5. Environment creation hangs or times out in CI

**Solution:**
Check BOTH config and SettingsDoc for pool sizes:
- If config.{uv,conda,pixi}_pool_size == 0 → use 0 (test mode)
- Otherwise → use SettingsDoc value (production mode)

This preserves test isolation while honoring user-configured pool sizes.

**Changes:**
- daemon.rs:2705-2720 (UV loop): Check config.uv_pool_size before SettingsDoc
- daemon.rs:2793-2808 (Conda loop): Check config.conda_pool_size before SettingsDoc
- daemon.rs:2891-2906 (Pixi loop): Check config.pixi_pool_size before SettingsDoc

**Testing:**
- test_daemon_take_empty_pool now passes (was timing out)
- All 22 integration tests pass
- test_multiple_notebooks_concurrent_isolation is flaky but unrelated

Fixes: test_daemon_take_empty_pool timeout in PR #1702
Related: Bug #1 (Config Pool Sizes)
rgbkrk added a commit that referenced this pull request Apr 18, 2026
Two codex-review fixes on commit 9878b6c:

**P2 #1 — widget patches before CRDT writer**. The bootstrap fallback
only mirrored patches into the local store and returned; nothing
replayed them into the CRDT once the writer became available. Under
the pre-A2 manager, the debounced flush kept retrying until the
writer appeared. Restored equivalent behavior with a small pending
queue:

- On `updateAndPersist` without a writer: queue the patch (last-wins
  merge per comm) AND mirror to the local store so the UI still
  feels responsive.
- On the next `updateAndPersist` that sees a writer: drain the
  queue in insertion order before processing the new write.
- Fallback 50ms polling timer for the "writer just showed up but no
  new interaction happened" case. Cancels when the queue drains.
- `reset()` (called on kernel restart) clears the queue — stale
  pre-restart patches shouldn't reach the new kernel.
- `clearComm(commId)` (comm_close) drops the per-comm entry.
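The queue semantics (one slot per comm, last write wins, drain in first-insertion order) can be sketched as follows. The real implementation is in the TypeScript widget manager; this Rust sketch uses illustrative names and string payloads:

```rust
use std::collections::HashMap;

// Hypothetical pending-patch queue: coalesces per-comm writes while no
// CRDT writer exists, then drains in insertion order.
struct PendingPatches {
    order: Vec<String>,               // first-insertion order of comm ids
    patches: HashMap<String, String>, // comm id -> latest patch payload
}

impl PendingPatches {
    fn new() -> Self {
        Self { order: Vec::new(), patches: HashMap::new() }
    }

    fn queue(&mut self, comm_id: &str, patch: &str) {
        if !self.patches.contains_key(comm_id) {
            self.order.push(comm_id.to_string());
        }
        self.patches.insert(comm_id.to_string(), patch.to_string()); // last wins
    }

    fn drain(&mut self) -> Vec<(String, String)> {
        let order = std::mem::take(&mut self.order);
        order
            .into_iter()
            .filter_map(|id| self.patches.remove(&id).map(|p| (id, p)))
            .collect()
    }

    fn clear_comm(&mut self, comm_id: &str) {
        self.order.retain(|id| id != comm_id);
        self.patches.remove(comm_id);
    }
}

fn main() {
    let mut q = PendingPatches::new();
    q.queue("slider", "{\"value\": 1}");
    q.queue("text", "{\"value\": \"a\"}");
    q.queue("slider", "{\"value\": 2}"); // coalesced: last write wins
    let drained = q.drain();
    assert_eq!(drained.len(), 2);
    assert_eq!(drained[0].1, "{\"value\": 2}"); // slider kept only the latest
    assert!(q.drain().is_empty()); // queue is empty after draining
}
```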

**P2 #2 — stall watchdog false positives under sustained traffic**.
The watchdog was idempotent: once armed, new outbound flushes
didn't extend the deadline. A long slider drag or play-widget loop
produces 100+ outbound flushes/sec; the daemon never quite catches
up so `generate_runtime_state_sync_reply()` never returns null;
the watchdog fires after 3s mid-interaction. Switched the arm step to
reset-on-each-flush. Each new outbound write gets a fresh window,
so healthy-but-busy sessions don't trip the recovery. The stall we
do want to catch — pipe drops the frame, daemon panics — shows up
as no more outbound to reset the timer AND no inbound within 3s,
which this still detects.
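The arm-once vs reset-on-flush distinction is a one-line change to the deadline handling; a sketch with illustrative names:

```rust
use std::time::{Duration, Instant};

// Hypothetical watchdog: every outbound flush pushes the deadline forward,
// so a busy-but-healthy session never trips it. A real stall (no more
// flushes and no inbound reply) lets the deadline lapse.
struct StallWatchdog {
    deadline: Instant,
    window: Duration,
}

impl StallWatchdog {
    fn new(window: Duration) -> Self {
        Self { deadline: Instant::now() + window, window }
    }

    fn on_outbound_flush(&mut self) {
        // Reset-on-each-flush: the old code armed once and never extended.
        self.deadline = Instant::now() + self.window;
    }

    fn stalled(&self, now: Instant) -> bool {
        now > self.deadline
    }
}

fn main() {
    let mut w = StallWatchdog::new(Duration::from_secs(3));
    let start = Instant::now();
    w.on_outbound_flush(); // sustained traffic keeps extending the window
    assert!(!w.stalled(start + Duration::from_secs(1))); // healthy and busy
    assert!(w.stalled(start + Duration::from_secs(60))); // true stall detected
}
```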

Tests updated:
- 2 new bootstrap-queue tests: drain on next update, coalescing
  multiple pre-writer writes on the same comm.
- Existing watchdog tests still pass (reset-on-flush changes timing
  slightly but not test semantics).

765 tests pass (+2); lint clean.