Fix `_code_mode` cell ID collisions on large notebooks by manzt · Pull Request #8951 · marimo-team/marimo

manzt · 2026-03-31T18:39:08Z

CellIdGenerator uses a deterministic seed (42) for reproducible cell IDs, but the code_mode context was using the same seed when generating new cells. This meant it replayed the exact same ID sequence that created the notebook's existing cells, causing 100% collision rates on notebooks with ~50+ cells and eventually hitting the 100-attempt limit.

The fix gives CellIdGenerator a configurable seed parameter (defaulting to 42 for backwards compatibility) and uses a different seed (7) in the code_mode context. The seen_ids set is now populated from document.cell_ids rather than kernel.graph.cells.keys(), since the graph may not contain all cells (e.g. cells that exist on disk but were never executed).
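A minimal sketch of the failure mode and the fix. It assumes a generator that draws 4-letter IDs from a `random.Random(seed)`, retries up to 100 times against a `seen_ids` set, and has `seen_ids` populated from the document's IDs. `SketchCellIdGenerator`, `existing_notebook_ids`, and `code_mode_generator` are hypothetical stand-ins, not marimo's actual `CellIdGenerator` or `AsyncCodeModeContext`.

```python
import random
import string
from typing import Iterable, Optional, Set


class SketchCellIdGenerator:
    """Hypothetical stand-in for CellIdGenerator; not marimo's real code."""

    def __init__(self, seed: Optional[int] = 42, max_attempts: int = 100) -> None:
        # Assumption: IDs are 4 random ASCII letters drawn from a seeded RNG.
        self.random = random.Random(seed)
        self.max_attempts = max_attempts
        self.seen_ids: Set[str] = set()

    def create_cell_id(self) -> str:
        for _ in range(self.max_attempts):
            candidate = "".join(self.random.choices(string.ascii_letters, k=4))
            if candidate not in self.seen_ids:
                self.seen_ids.add(candidate)
                return candidate
        raise ValueError("exhausted attempts while generating a unique cell ID")


def existing_notebook_ids(n: int) -> Set[str]:
    """IDs a notebook would have after creating n cells with the default seed."""
    gen = SketchCellIdGenerator(seed=42)
    return {gen.create_cell_id() for _ in range(n)}


def code_mode_generator(
    document_ids: Iterable[str], seed: Optional[int]
) -> SketchCellIdGenerator:
    # Populate seen_ids from the document (not the kernel graph) so cells
    # that were never executed are still treated as taken.
    gen = SketchCellIdGenerator(seed=seed)
    gen.seen_ids.update(document_ids)
    return gen


doc_ids = existing_notebook_ids(150)

# Same seed as the notebook: the first 100 draws replay existing IDs.
try:
    code_mode_generator(doc_ids, seed=42).create_cell_id()
except ValueError as exc:
    print("seed=42:", exc)

# Different seed: the sequence diverges from the notebook's immediately.
new_id = code_mode_generator(doc_ids, seed=7).create_cell_id()
print("seed=7 avoids existing IDs:", new_id not in doc_ids)
```

With the shared seed, every early attempt is a guaranteed collision, so the retry budget is spent skipping IDs the notebook already owns; a different seed only has to dodge the rare random collision.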

vercel · 2026-03-31T18:39:12Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
marimo-docs	Ready	Preview, Comment	Mar 31, 2026 8:19pm

Copilot

Pull request overview

This PR addresses deterministic cell ID collisions in code_mode by making CellIdGenerator seed configurable and ensuring the code_mode context avoids reusing the same ID sequence as existing notebooks.

Changes:

Add a configurable seed parameter to CellIdGenerator (defaulting to 42 for compatibility).
Initialize AsyncCodeModeContext’s ID generator with a different seed and populate seen_ids from document.cell_ids to avoid collisions with document-only cells.
Update snapshots and add a regression test covering document-only cell IDs.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
`marimo/_ast/cell_id.py`	Adds `seed` parameter to make deterministic ID generation configurable.
`marimo/_code_mode/_context.py`	Uses a non-default seed in code_mode and seeds `seen_ids` from the document to avoid collisions.
`tests/_code_mode/test_context.py`	Updates expected deterministic IDs and adds a regression test for document-only ID collisions.

Copilot · 2026-03-31T18:45:40Z

marimo/_code_mode/_context.py

+        # ID generator for new cells — use a different seed than the
+        # default (42) so we don't replay the same ID sequence that
+        # created the notebook's existing cells.  Also seed seen_ids
+        # with graph + document IDs to avoid collisions.
+        self._id_generator = CellIdGenerator(seed=7)


CellIdGenerator(seed=7) is still deterministic and replays the same ID sequence every time cm.get_context() constructs a new AsyncCodeModeContext. Once a notebook accumulates >100 IDs from this seed (e.g. after many separate code_mode requests), create_cell_id() will hit the 100-attempt cap and raise again. Consider using a per-notebook/per-document varying seed (e.g. derived from document.version / document.cell_ids), persisting the generator across contexts, or switching code_mode IDs to a UUID-based prefix so the generator doesn’t have to “skip past” previously created IDs each time.

Suggested change

-        # ID generator for new cells — use a different seed than the
-        # default (42) so we don't replay the same ID sequence that
-        # created the notebook's existing cells.  Also seed seen_ids
-        # with graph + document IDs to avoid collisions.
-        self._id_generator = CellIdGenerator(seed=7)
+        # ID generator for new cells — derive a seed from the current
+        # document so we don't replay the same ID sequence across
+        # notebooks or document states. Also seed seen_ids with the
+        # document's existing IDs to avoid collisions.
+        seed = hash(("code_mode", tuple(sorted(document.cell_ids)))) & 0xFFFFFFFF
+        if not seed:
+            # Extremely unlikely fallback to a non-zero constant.
+            seed = 7
+        self._id_generator = CellIdGenerator(seed=seed)
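The suggestion derives the seed with Python's built-in `hash()`. For `str` values, `hash()` is salted per interpreter process (controlled by `PYTHONHASHSEED`), so the same document yields the same seed within one process but a different one in the next. If a process-stable, document-derived seed were wanted, a minimal sketch could use `hashlib` instead; `derive_code_mode_seed` is a hypothetical helper, not part of marimo.

```python
import hashlib
from typing import Iterable


def derive_code_mode_seed(cell_ids: Iterable[str]) -> int:
    """Hypothetical helper: a 32-bit seed that depends only on the set of IDs.

    Unlike hash() on str, hashlib digests do not change between interpreter
    processes, so the same document always yields the same seed.
    """
    digest = hashlib.sha256()
    digest.update(b"code_mode")
    # Sort so the seed does not depend on cell order (mirroring sorted() in
    # the suggestion); a separator byte keeps adjacent IDs from running together.
    for cell_id in sorted(cell_ids):
        digest.update(b"\x00" + cell_id.encode("utf-8"))
    # Keep 4 bytes, analogous to the suggestion's `& 0xFFFFFFFF`.
    return int.from_bytes(digest.digest()[:4], "big")
```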

Copilot · 2026-03-31T18:45:41Z

tests/_code_mode/test_context.py

+        """create_cell must not generate IDs that collide with cells
+        that exist in the document but not in the kernel graph (B4)."""
+        # Build a large set of document-only cells whose IDs come from
+        # the same deterministic generator used by AsyncCodeModeContext.
+        from marimo._ast.cell_id import CellIdGenerator


The new test’s docstring/comments say the doc-only IDs come from “the same deterministic generator used by AsyncCodeModeContext”, but AsyncCodeModeContext now uses CellIdGenerator(seed=7) while this test uses the default seed (42). Please update the wording (and/or pass seed=42 explicitly) to reflect that these IDs represent the notebook’s existing (default-seeded) IDs that code_mode must avoid colliding with. Also, the inline comment about expecting a ValueError is a bit misleading since the assertion is about not reusing an existing document ID.
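The regression test's idea, as a hedged sketch: the notebook's existing IDs come from the default seed (passed explicitly as `seed=42`, as the comment recommends), only some of those cells were ever executed into the kernel graph, and code_mode must avoid all of them. `draw_ids`, `document_ids`, and `graph_ids` are hypothetical; the real test exercises `AsyncCodeModeContext` rather than this toy generator.

```python
import random
import string
from typing import Iterable, List, Optional, Set


def draw_ids(
    seed: Optional[int], taken: Iterable[str], count: int, max_attempts: int = 100
) -> List[str]:
    """Hypothetical generator: draw `count` 4-letter IDs not in `taken`."""
    rng = random.Random(seed)
    seen: Set[str] = set(taken)
    out: List[str] = []
    for _ in range(count):
        for _ in range(max_attempts):
            candidate = "".join(rng.choices(string.ascii_letters, k=4))
            if candidate not in seen:
                seen.add(candidate)
                out.append(candidate)
                break
        else:
            raise ValueError("exhausted attempts")
    return out


# The notebook's existing IDs, produced with the default seed made explicit.
document_ids = draw_ids(seed=42, taken=(), count=60)
# Only some cells were ever executed, so the kernel graph sees a subset.
graph_ids = document_ids[:10]

# Seeding `taken` from the graph alone lets a same-seed generator hand out
# IDs that already exist in the document but were never executed.
from_graph = draw_ids(seed=42, taken=graph_ids, count=5)
# Seeding from the document (and using another seed) rules out every existing ID.
from_document = draw_ids(seed=7, taken=document_ids, count=5)
```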

marimo/_code_mode/_context.py


The previous fix (seed=7) would still replay the same ID sequence across sessions, eventually exhausting the 100-attempt collision limit on notebooks where many cells were created via code_mode. Switching to `seed=None` (OS entropy) eliminates sequence replay entirely while `seen_ids` from the document still prevents collisions with existing cells.
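A hedged simulation of why a fixed seed still fails across requests: each code_mode request builds a fresh generator (as happens when `cm.get_context()` constructs a new `AsyncCodeModeContext`), checks candidates against the document's IDs, creates one cell, and that cell joins the document. `new_context_id` and `successful_requests` are hypothetical helpers, not marimo code.

```python
import random
import string
from typing import Optional, Set


def new_context_id(
    document_ids: Set[str], seed: Optional[int], max_attempts: int = 100
) -> str:
    """Hypothetical: one code_mode request with a freshly seeded generator."""
    # seed=None makes random.Random seed itself from OS entropy (or the time).
    rng = random.Random(seed)
    for _ in range(max_attempts):
        candidate = "".join(rng.choices(string.ascii_letters, k=4))
        if candidate not in document_ids:
            return candidate
    raise ValueError("exhausted attempts")


def successful_requests(seed: Optional[int], requests: int) -> int:
    """Count requests that succeed before the first failure."""
    document_ids: Set[str] = set()
    for done in range(requests):
        try:
            document_ids.add(new_context_id(document_ids, seed))
        except ValueError:
            return done
    return requests


# A fixed seed restarts the same sequence on every request, so each request
# must first skip every ID earlier requests created: at most 100 can succeed.
print(successful_requests(seed=7, requests=150))
# OS entropy gives each request a fresh sequence, so nothing is replayed.
print(successful_requests(seed=None, requests=150))
```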

kirangadhave

🚀 thanks for the quick fix!

I'm curious why we are setting seeds outside the test setup in notebook.py and cell_manager.py

kirangadhave · 2026-03-31T20:22:31Z

marimo/_ast/cell_manager.py

        self.prefix = prefix
        self.unparsable = False
-        self._cell_id_generator = CellIdGenerator(prefix)
+        self._cell_id_generator = CellIdGenerator(prefix, seed=42)


do we need the seed outside the test setup?

kirangadhave · 2026-03-31T20:23:08Z

marimo/_convert/notebook.py

        NotebookV1: The notebook v1.
    """
-    cell_id_generator = CellIdGenerator()
+    cell_id_generator = CellIdGenerator(seed=42)


do we need this to be deterministic?

Good question. I was just trying to preserve the previous behavior... but you are probably right that we only need a seed for fixtures.

We could make a follow-up PR with a ContextVar that we only set in tests, and probably remove the seed param entirely.
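A minimal sketch of the ContextVar idea, assuming ID generators read an optional test-only seed when they build their RNG and otherwise fall back to OS entropy. `CELL_ID_TEST_SEED`, `make_rng`, `make_cell_id`, and `fixed_cell_id_seed` are hypothetical names, not existing marimo APIs.

```python
import contextlib
import random
import string
from contextvars import ContextVar
from typing import Iterator, Optional

# Hypothetical: unset (None) in production, set only by test fixtures.
CELL_ID_TEST_SEED: ContextVar[Optional[int]] = ContextVar(
    "cell_id_test_seed", default=None
)


def make_rng() -> random.Random:
    # None means random.Random seeds itself from OS entropy.
    return random.Random(CELL_ID_TEST_SEED.get())


def make_cell_id(rng: random.Random) -> str:
    return "".join(rng.choices(string.ascii_letters, k=4))


@contextlib.contextmanager
def fixed_cell_id_seed(seed: int) -> Iterator[None]:
    """Pin the seed for the duration of a test, then restore the old value."""
    token = CELL_ID_TEST_SEED.set(seed)
    try:
        yield
    finally:
        CELL_ID_TEST_SEED.reset(token)


# Inside the fixture, IDs are reproducible; outside it, they are not.
with fixed_cell_id_seed(42):
    first = make_cell_id(make_rng())
with fixed_cell_id_seed(42):
    second = make_cell_id(make_rng())
```

A ContextVar, rather than a module-level global, keeps the override scoped to the current context, and asyncio tasks created inside the `with` block inherit it.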

github-actions · 2026-03-31T20:36:23Z

🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.21.2-dev104

Copilot AI review requested due to automatic review settings March 31, 2026 18:39

manzt requested a review from dmadisetti as a code owner March 31, 2026 18:39

Copilot started reviewing on behalf of manzt March 31, 2026 18:39

vercel bot deployed to Preview March 31, 2026 18:40

vercel bot deployed to Preview March 31, 2026 18:41

Copilot AI reviewed Mar 31, 2026

mscolnick added the bug Something isn't working label Mar 31, 2026

manzt force-pushed the push-lyzmszsrltzw branch from d1108a6 to 38a3900 March 31, 2026 20:17

manzt and others added 2 commits March 31, 2026 16:17

manzt force-pushed the push-lyzmszsrltzw branch from 38a3900 to a11b279 March 31, 2026 20:18

vercel bot deployed to Preview March 31, 2026 20:19

manzt requested a review from kirangadhave March 31, 2026 20:19

kirangadhave approved these changes Mar 31, 2026

mscolnick merged commit 010cac0 into main Mar 31, 2026
35 of 43 checks passed

mscolnick deleted the push-lyzmszsrltzw branch March 31, 2026 20:31

mscolnick merged 2 commits into main from push-lyzmszsrltzw
4 participants