feat(runtimed-py): per-cell accessors for O(1) MCP reads#924
Conversation
Add get_cell_source, get_cell_type, get_cell_outputs, get_cell_execution_count, get_cell_ids, get_cell_metadata, and get_cell_position to DocHandle, Session, and AsyncSession. Each reads from the snapshot watch channel (no mutex) and only clones the requested field — not the entire Vec<CellSnapshot>. This makes MCP reads O(1) per field instead of O(n_cells). Available on both Session (sync) and AsyncSession (async).
set_cell and_run: get_cell_type() instead of get_cell() — only needs to check if the cell is code before executing, no reason to materialize all cells and resolve blobs. resource_cell_by_index: get_cell_ids() to resolve index → ID, then get_cell() for just the one cell. Previously materialized every cell. set_cell unchanged path: remove unused get_cell() call that was materializing all cells for no reason. Also add .pyi stubs for all new per-cell accessors on both Session and AsyncSession.
…aterialization handle.get_cell(cell_id) clones just the one matching CellSnapshot from the snapshot watch channel. The previous code called handle.get_cells() which cloned ALL cells, then linear-scanned to find the one it wanted.
There was a problem hiding this comment.
Pull request overview
Adds per-cell read accessors across notebook-sync and the runtimed-py bindings to avoid full-cell materialization for common MCP read paths, and wires the new accessors into the nteract MCP server for cheaper cell lookups.
Changes:
- Introduces per-cell snapshot accessors on
notebook-sync::DocHandle(IDs, source, type, outputs, execution_count, metadata, position). - Exposes a subset of those accessors through
runtimed-pySession/AsyncSession, plus.pyistub updates. - Updates nteract MCP server to use
get_cell_type()/get_cell_ids()to reduce unnecessary full-cell materialization.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| python/runtimed/src/runtimed/runtimed.pyi | Adds type stubs for new per-cell accessors on Session and AsyncSession. |
| python/nteract/src/nteract/_mcp_server.py | Switches MCP server code paths to use new per-cell accessors (get_cell_type, get_cell_ids). |
| crates/runtimed-py/src/session_core.rs | Implements new per-cell accessor functions in the shared Rust core for Python bindings. |
| crates/runtimed-py/src/session.rs | Exposes new per-cell accessors on the sync Session Python class. |
| crates/runtimed-py/src/async_session.rs | Exposes new per-cell accessors on the async AsyncSession Python class. |
| crates/notebook-sync/src/handle.rs | Adds per-cell snapshot accessor methods to DocHandle to reduce cloning and improve read performance. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| def get_cell_outputs(self, cell_id: str) -> list[str] | None: ... | ||
| def get_cell_execution_count(self, cell_id: str) -> str | None: ... | ||
| def get_cell_ids(self) -> list[str]: ... | ||
| def get_cell_position(self, cell_id: str) -> str | None: ... |
There was a problem hiding this comment.
Fixed in d2dceec — get_cell_position wired through session_core, Session, and AsyncSession.
| def get_cell_outputs(self, cell_id: str) -> Coroutine[Any, Any, list[str] | None]: ... | ||
| def get_cell_execution_count(self, cell_id: str) -> Coroutine[Any, Any, str | None]: ... | ||
| def get_cell_ids(self) -> Coroutine[Any, Any, list[str]]: ... | ||
| def get_cell_position(self, cell_id: str) -> Coroutine[Any, Any, str | None]: ... |
| let st = state.lock().await; | ||
| let handle = st | ||
| .handle | ||
| .as_ref() | ||
| .ok_or_else(|| to_py_err("Not connected"))?; | ||
| Ok(handle.get_cell_source(cell_id)) |
There was a problem hiding this comment.
Fixed in d2dceec — all per-cell accessors now clone the DocHandle (cheap Arc clone) inside a scoped block and drop the mutex before calling the accessor. The accessors read from a watch::Receiver that doesn't need the mutex held.
| /// Get a cell's source without materializing all cells. | ||
| fn get_cell_source(&self, cell_id: &str) -> PyResult<Option<String>> { | ||
| self.runtime | ||
| .block_on(session_core::get_cell_source(&self.state, cell_id)) | ||
| } | ||
|
|
||
| /// Get a cell's type without materializing all cells. | ||
| fn get_cell_type(&self, cell_id: &str) -> PyResult<Option<String>> { | ||
| self.runtime | ||
| .block_on(session_core::get_cell_type(&self.state, cell_id)) | ||
| } | ||
|
|
||
| /// Get a cell's raw outputs without blob resolution. | ||
| fn get_cell_outputs(&self, cell_id: &str) -> PyResult<Option<Vec<String>>> { | ||
| self.runtime | ||
| .block_on(session_core::get_cell_outputs(&self.state, cell_id)) | ||
| } | ||
|
|
||
| /// Get a cell's execution count. | ||
| fn get_cell_execution_count(&self, cell_id: &str) -> PyResult<Option<String>> { | ||
| self.runtime | ||
| .block_on(session_core::get_cell_execution_count(&self.state, cell_id)) | ||
| } | ||
|
|
||
| /// Get all cell IDs in document order. | ||
| fn get_cell_ids(&self) -> PyResult<Vec<String>> { | ||
| self.runtime | ||
| .block_on(session_core::get_cell_ids(&self.state)) | ||
| } |
There was a problem hiding this comment.
Agree these need tests. Deferring to a follow-up — integration tests require a running daemon and the test harness in python/runtimed/tests/ connects to a live socket. Will add coverage for existing/nonexistent cell cases in both sync and async flavors.
Add get_cell_position to session_core, Session, and AsyncSession — the .pyi stub declared it but the implementation was missing. All per-cell accessors now clone the DocHandle (cheap Arc clone) and drop the SessionState mutex before calling the accessor method. The accessors read from a watch::Receiver that doesn't need the mutex held, so this reduces contention with concurrent operations.
20 tests covering sync + async per-cell accessor methods: - get_cell_ids ordering - get_cell_source, get_cell_type, get_cell_outputs, get_cell_execution_count, get_cell_position for existing cells - None returns for nonexistent cells - Position ordering matches creation order - Consistency with get_cell() full materialization - Source updates reflected after sync
Agents (and humans) waste time when they don't know the right incantation for rebuilding Python bindings, running integration tests, or interacting with the dev daemon. Add a copy-paste-ready Quick Recipes section at the top of AGENTS.md covering: - Dev daemon env vars (the #1 footgun) - Rebuilding runtimed-py into the correct venv - Running Python integration tests against the dev daemon - Launching the notebook app in dev mode - WASM rebuild - Pointers to the right contributing docs per task
0c59c6b to
7c9ed42
Compare
…tub types Audit of merged #923 + #924 found several issues: Rust (session_core.rs): - collect_outputs: get_cells() + find → get_cell() (single-cell lookup) - get_cell_metadata: get_cells() + find → handle.get_cell_metadata() with clone-handle-drop-lock pattern MCP server: - replace_match: get_cell() → get_cell_source() (only needs source) - replace_regex: same fix Type stubs (.pyi): - move_cell: None → str (returns new position) - get_cell: Cell | None → Cell (raises on missing, never None) - set_cell_source_hidden: None → bool - set_cell_outputs_hidden: None → bool - set_cell_tags: None → bool All fixed for both Session and AsyncSession.
…tub types (#927) Audit of merged #923 + #924 found several issues: Rust (session_core.rs): - collect_outputs: get_cells() + find → get_cell() (single-cell lookup) - get_cell_metadata: get_cells() + find → handle.get_cell_metadata() with clone-handle-drop-lock pattern MCP server: - replace_match: get_cell() → get_cell_source() (only needs source) - replace_regex: same fix Type stubs (.pyi): - move_cell: None → str (returns new position) - get_cell: Cell | None → Cell (raises on missing, never None) - set_cell_source_hidden: None → bool - set_cell_outputs_hidden: None → bool - set_cell_tags: None → bool All fixed for both Session and AsyncSession.
Per-cell accessors on the Python bindings so MCP reads are O(1) per field instead of O(n_cells) full materialization.
The problem
session.get_cell(cell_id)materializes ALL cells viahandle.get_cells(), then linear-scans for the matching ID. For a 100-cell notebook, reading one cell's source means cloning 100CellSnapshots (each with source text, outputs, metadata) just to throw 99 away.The fix
New methods on
DocHandle,Session, andAsyncSession:get_cell_ids()Vec<String>get_cell_source(cell_id)Option<String>get_cell_type(cell_id)Option<String>get_cell_outputs(cell_id)Option<Vec<String>>get_cell_execution_count(cell_id)Option<String>get_cell_metadata(cell_id)Option<serde_json::Value>get_cell_position(cell_id)Option<String>All read from the snapshot watch channel (lock-free borrow, no mutex). Linear scan over the cell list but only clone the one field requested.
MCP server integration
Wired the new accessors into the nteract MCP server:
set_celland_run —get_cell_type()instead ofget_cell(). Only needs to check if it's code before executing, no reason to materialize all cells and resolve blobs.resource_cell_by_index—get_cell_ids()to resolve index → ID, thenget_cell()for just the one cell. Previously materialized every cell.set_cellunchanged path — removed unusedget_cell()call that was materializing all cells for no reason.Also added
.pyitype stubs for all new accessors on bothSessionandAsyncSession.PR submitted by @rgbkrk's agent Quill, via Zed