feat: fallback static parsing for completely invalid notebooks by dmadisetti · Pull Request #8723 · marimo-team/marimo

dmadisetti · 2026-03-16T22:47:02Z

📝 Summary

Allows notebooks with broken syntax to be parsed by extracting the "cell boundaries" (e.g. @app.cell) and leveraging indentation. Possibly a bit heavy-handed, but it uses a simple state machine to work through the tokens and find the boundaries.

NB: this is ONLY a fallback mechanism, but it should prevent breakage in VS Code and --watch when the source file breaks.
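
As a rough illustration of the idea in the summary (not the code in this PR), a token-driven state machine for spotting boundaries could look like the sketch below. The function name `find_boundaries`, the `MARKERS` set, and the four-state layout are assumptions made for this example. It only looks for column-0 `@app.<marker>` decorators and ignores `with app.setup`.

```python
import io
import tokenize

# Hypothetical marker set for this sketch; the real scanner may track more.
MARKERS = {"cell", "function", "class_definition"}


def find_boundaries(source: str) -> list[int]:
    """Return 1-indexed lines of column-0 `@app.<marker>` decorators."""
    boundaries: list[int] = []
    state = 0  # 0: idle, 1: saw '@', 2: saw 'app', 3: saw '.'
    start_line = 0
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    try:
        for tok in tokens:
            if state == 0:
                # Only an '@' operator at column 0 can open a boundary.
                if (
                    tok.type == tokenize.OP
                    and tok.string == "@"
                    and tok.start[1] == 0
                ):
                    state, start_line = 1, tok.start[0]
            elif state == 1:
                is_app = tok.type == tokenize.NAME and tok.string == "app"
                state = 2 if is_app else 0
            elif state == 2:
                is_dot = tok.type == tokenize.OP and tok.string == "."
                state = 3 if is_dot else 0
            else:  # state == 3
                if tok.type == tokenize.NAME and tok.string in MARKERS:
                    boundaries.append(start_line)
                state = 0
    except (tokenize.TokenError, SyntaxError):
        # The tokenizer itself can give up on badly broken input;
        # keep whatever boundaries were found before that point.
        pass
    return boundaries


BROKEN = (
    "import marimo\n"
    "app = marimo.App()\n"
    "\n"
    "@app.cell\n"
    "def _():\n"
    "    x = 1 +\n"  # invalid Python, but it still tokenizes
    "    return\n"
    "\n"
    "@app.cell\n"
    "def _():\n"
    "    y = 2\n"
    "    return\n"
)
print(find_boundaries(BROKEN))
```

Because the check works on tokens rather than raw text, an `@app.cell` that appears inside a string literal arrives as a single STRING token and is not counted.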

mscolnick · 2026-03-25T18:52:42Z

marimo/_ast/parse.py

+    """When ast.parse() fails, use scanner to recover individual cells."""
+    from marimo._ast.scanner import scan_parse_fallback
+
+    return scan_parse_fallback(source, filepath)


does this do anything? should we just inline this?

Copilot

Pull request overview

Adds a tokenizer-driven fallback parser so marimo notebooks with broken Python syntax can still be statically parsed by recovering @app.cell-style boundaries, improving resilience for --watch/VS Code workflows.

Changes:

Introduces marimo._ast.scanner with token-based boundary detection plus recovery logic, and a scan_parse_fallback() that parses cells individually.
Updates Parser.node_stack() to fall back to the scanner on SyntaxError, and emits a dedicated violation for scanner-generated unparsable cells.
Adjusts linting/tests to treat syntax-broken files without cell boundaries as skipped/unrecognisable, adds an encoding-bytes regression test, and avoids duplicate diagnostics for unparsable cells.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

File	Description
`marimo/_ast/scanner.py`	New tokenizer-based scanner and per-cell fallback parsing to recover boundaries in syntactically invalid notebooks.
`marimo/_ast/parse.py`	Hooks scanner fallback into parsing flow and tags scanner-generated unparsable cells with a specific violation.
`marimo/_ast/load.py`	Reads notebook text with `errors="replace"` to avoid crashing on invalid UTF-8 bytes.
`marimo/_lint/rules/formatting/general.py`	Skips scanner-specific unparsable violations to prevent duplicate reporting alongside MB001.
`tests/_lint/test_run_check.py`	Updates expectations for syntax-broken non-notebooks, adds encoding and deduplication regression tests.
`tests/_lint/test_json_formatter.py`	Updates JSON output expectations when broken files are skipped rather than errored.
`tests/_ast/test_load.py`	Updates expected status behavior for syntax-broken inputs with/without notebook boundaries.
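
To make the `errors="replace"` row concrete: the sketch below shows how reading with that option turns an invalid UTF-8 byte into U+FFFD instead of raising `UnicodeDecodeError`. The helper name `read_notebook_text` is hypothetical, and how `marimo/_ast/load.py` actually opens the file is not shown in this thread.

```python
import os
import tempfile
from pathlib import Path


def read_notebook_text(path: str) -> str:
    """Read a file as UTF-8, replacing undecodable bytes with U+FFFD."""
    return Path(path).read_text(encoding="utf-8", errors="replace")


# Write a file that contains a byte (0xFF) that is never valid in UTF-8.
with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as handle:
    handle.write(b"x = 1\n\xff\n")
    bad_path = handle.name

text = read_notebook_text(bad_path)
os.unlink(bad_path)
print(repr(text))
```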

Copilot · 2026-03-25T18:56:16Z

marimo/_ast/scanner.py

+        # Error recovery: scan forward from error line for boundaries
+        error_line_in_chunk, _exc = error_info
+        error_line_abs = error_line_in_chunk + offset
+        found_restart = False
+
+        for candidate_line_0 in range(error_line_abs, total_lines):
+            line_text = lines[candidate_line_0]


In scan_notebook() error recovery, error_line_in_chunk is 1-indexed (tokenize reports line numbers starting at 1), but error_line_abs is used as a 0-indexed list index into lines. This causes an off-by-one (skipping the actual error line and potentially indexing past the end), which can prevent finding the next boundary and break recovery. Convert the absolute error line to 0-index before iterating over lines (and consider clamping to [0, total_lines)).
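
A minimal sketch of the conversion suggested above, assuming `offset` is the number of lines that precede the chunk (so chunk line 1 maps to `lines[offset]`). The helper name `recovery_start_index` is made up for illustration and is not part of the PR.

```python
def recovery_start_index(
    error_line_in_chunk: int, offset: int, total_lines: int
) -> int:
    """Map a 1-indexed line within a chunk to a clamped 0-indexed position."""
    zero_based = error_line_in_chunk + offset - 1
    # Clamp so the result is a valid index whenever total_lines > 0.
    return max(0, min(zero_based, total_lines - 1))


lines = ["a", "b", "c", "d"]
# The chunk starts after two lines; the error is on the chunk's first line.
start = recovery_start_index(1, 2, len(lines))
print(lines[start])
```

With an empty `lines` list the helper returns 0, and a following `range(0, 0)` loop simply does nothing.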

Copilot · 2026-03-25T18:56:17Z

marimo/_ast/scanner.py

+def _has_cell_boundaries(source: str) -> bool:
+    """Quick check whether source has any cell boundary markers."""
+    return (
+        "@app.cell" in source
+        or "@app.function" in source
+        or "@app.class_definition" in source
+        or "with app.setup" in source
+        or "app._unparsable_cell" in source
+    )
+
+
+def scan_parse_fallback(
+    source: str, filepath: str
+) -> tuple[list[ast.stmt], frozenset[int]]:
+    """Fallback parser: scan for cell boundaries, parse each cell individually.
+
+    Called when ast.parse() on the full file fails due to syntax errors.
+    Returns a tuple of (nodes, scanner_generated_lines) where
+    scanner_generated_lines contains the 1-indexed start line numbers of
+    unparsable cells created by the scanner (vs. pre-existing
+    app._unparsable_cell() calls in the source).
+    Returns ([], frozenset()) if no cell boundaries are found.
+    """
+    from marimo._ast.parse import ast_parse
+
+    if not _has_cell_boundaries(source):
+        return [], frozenset()
+
+    scan = scan_notebook(source)


_has_cell_boundaries() is a raw substring check, so it can return true for @app.cell appearing in a string/comment. In that case scan_parse_fallback() proceeds, scan_notebook() may find zero real boundaries, and then the preamble parse re-raises SyntaxError (fatal) even though the file is effectively “no boundaries” and should be skipped gracefully. Consider removing this precheck or making it token/line-anchored (column-0) and, importantly, early-returning ([], frozenset()) when scan_notebook() finds no boundaries before attempting to parse scan.preamble.
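
One way the line-anchored variant could look, as a sketch only: the regex and the name `has_line_anchored_boundary` are assumptions for this example. A marker at column 0 inside a triple-quoted string would still match, so this narrows the false positives without removing them; a token-based check would be stricter.

```python
import re

# Match boundary markers only when they start a line (column 0).
_BOUNDARY_RE = re.compile(
    r"^(?:@app\.(?:cell|function|class_definition)\b"
    r"|with app\.setup\b"
    r"|app\._unparsable_cell\b)",
    re.MULTILINE,
)


def has_line_anchored_boundary(source: str) -> bool:
    """Return True if any line begins with a cell boundary marker."""
    return _BOUNDARY_RE.search(source) is not None


print(has_line_anchored_boundary('x = "@app.cell"\n'))
print(has_line_anchored_boundary("@app.cell\ndef _():\n    return\n"))
```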

Copilot · 2026-03-25T18:56:17Z

marimo/_ast/scanner.py

+            line_idx = start_line - 2  # 0-indexed, one line before
+            while line_idx >= 0:
+                line = lines[line_idx].strip()
+                if line.startswith("@"):
+                    adjusted_start = line_idx + 1  # 1-indexed
+                    line_idx -= 1
+                elif not line:
+                    # Skip blank lines between decorators
+                    line_idx -= 1
+                else:
+                    break


When expanding a boundary upward to include preceding decorators, using lines[line_idx].strip().startswith("@") ignores indentation. That means an indented line like @decorator (e.g., inside a previous cell/function, or in malformed code) can be incorrectly pulled into the next cell’s start, corrupting cell slicing. To keep boundaries stable, only treat decorators at column 0 (e.g., lines[line_idx].startswith("@")) or match the boundary line’s indentation explicitly.
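
The column-0 variant suggested above might be sketched as follows. The wrapper function `expand_to_decorators` and its signature are hypothetical; only the loop body mirrors the snippet under review.

```python
def expand_to_decorators(lines: list[str], start_line: int) -> int:
    """Return the 1-indexed line where the decorator stack above begins."""
    adjusted_start = start_line
    line_idx = start_line - 2  # 0-indexed index of the line above start_line
    while line_idx >= 0:
        raw = lines[line_idx]
        if raw.startswith("@"):
            # Column-0 decorator: pull it into the cell.
            adjusted_start = line_idx + 1
        elif raw.strip():
            # Any other non-blank line, including an indented "@...", stops
            # the upward walk.
            break
        line_idx -= 1
    return adjusted_start


example = [
    "def f():",
    "    @indented",
    "",
    "@first",
    "@app.cell",
    "def _():",
]
print(expand_to_decorators(example, 5))
```

Here the walk picks up `@first`, skips the blank line, and stops at the indented `@indented` line instead of absorbing it into the next cell.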

Copilot · 2026-03-25T18:56:17Z

marimo/_ast/parse.py

+            nodes, scanner_lines = _scan_parse_fallback(
+                self.extractor.contents or "", self.filepath
+            )
+            self._scanner_generated_lines = scanner_lines
+            return PeekStack(iter(nodes))


Parser.node_stack() claims the scanner fallback will “Never re-raise”, but _scan_parse_fallback() / scan_parse_fallback() can still raise SyntaxError (e.g., if there’s a syntax error in the preamble before the first cell boundary, it is explicitly re-raised as “fatal”). This can still break --watch/IPC, which the comment (and PR description) suggests should not happen. Either catch SyntaxError around _scan_parse_fallback() here and return a best-effort empty node list (or an Unparsable marker), or adjust the comment/behavior so the contract is accurate.

Suggested change

-            nodes, scanner_lines = _scan_parse_fallback(
-                self.extractor.contents or "", self.filepath
-            )
-            self._scanner_generated_lines = scanner_lines
-            return PeekStack(iter(nodes))
+            try:
+                nodes, scanner_lines = _scan_parse_fallback(
+                    self.extractor.contents or "", self.filepath
+                )
+                self._scanner_generated_lines = scanner_lines
+                return PeekStack(iter(nodes))
+            except SyntaxError:
+                # If the scanner itself encounters a fatal syntax error (e.g.,
+                # in the preamble before any cell boundary), fall back to an
+                # empty node list rather than propagating the exception.
+                self._scanner_generated_lines = frozenset()
+                return PeekStack(iter(()))

mscolnick · 2026-03-25T21:39:38Z

marimo/_ast/scanner.py

+            else:
+                self._reset()
+
+        # if __name__ == "__main__":


oh lol nvm. i see what is it

tests/_ast/test_scanner.py

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

marimo/_ast/scanner.py

Use a token scanner to recover individual cells when the notebook has syntax errors, so --watch and IPC are never broken by a syntax error. Unparsable cells are flagged with a lint violation.

github-actions · 2026-03-30T19:54:22Z

🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.21.2-dev91

dmadisetti changed the title ~~stash~~ feat: fallback static parsing for completely invalid notebooks Mar 16, 2026

dmadisetti added the bug Something isn't working label Mar 16, 2026

dmadisetti force-pushed the dm/parse branch from a9f18cd to f6ba95f Compare March 17, 2026 00:58

vercel bot deployed to Preview March 17, 2026 00:59 View deployment

vercel bot deployed to Preview March 25, 2026 18:01 View deployment

dmadisetti marked this pull request as ready for review March 25, 2026 18:11

dmadisetti requested review from Copilot and manzt and removed request for Copilot March 25, 2026 18:11

Copilot started reviewing on behalf of dmadisetti March 25, 2026 18:11 View session

vercel bot deployed to Preview March 25, 2026 18:11 View deployment

mscolnick requested a review from Copilot March 25, 2026 18:51

Copilot started reviewing on behalf of mscolnick March 25, 2026 18:51 View session

mscolnick reviewed Mar 25, 2026

Copilot AI reviewed Mar 25, 2026

vercel bot deployed to Preview March 25, 2026 21:23 View deployment

vercel bot deployed to Preview March 25, 2026 21:28 View deployment

dmadisetti requested a review from mscolnick March 25, 2026 21:28

mscolnick requested a review from Copilot March 25, 2026 21:39

mscolnick reviewed Mar 25, 2026

Copilot started reviewing on behalf of mscolnick March 25, 2026 21:39 View session

mscolnick previously approved these changes Mar 25, 2026

mscolnick reviewed Mar 25, 2026

Copilot AI reviewed Mar 25, 2026

dmadisetti dismissed mscolnick’s stale review via 364d8d2 March 25, 2026 22:12

vercel bot deployed to Preview March 25, 2026 22:15 View deployment

dmadisetti requested a review from mscolnick March 26, 2026 17:53

mscolnick previously approved these changes Mar 26, 2026

dmadisetti dismissed mscolnick’s stale review via 32d0d6d March 26, 2026 18:43

vercel bot deployed to Preview March 26, 2026 18:44 View deployment

dmadisetti requested a review from mscolnick March 26, 2026 22:41

dmadisetti added the merge when ready label Mar 26, 2026

mscolnick previously approved these changes Mar 30, 2026

dmadisetti dismissed mscolnick’s stale review via 75702ca March 30, 2026 19:20

vercel bot deployed to Preview March 30, 2026 19:21 View deployment

dmadisetti force-pushed the dm/parse branch from 75702ca to 6d1984c Compare March 30, 2026 19:23

dmadisetti requested a review from Light2Dark as a code owner March 30, 2026 19:23

dmadisetti marked this pull request as draft March 30, 2026 19:24

vercel bot deployed to Preview March 30, 2026 19:25 View deployment

feat: fallback static parsing for completely invalid notebooks

120277b

Use a token scanner to recover individual cells when the notebook has syntax errors, so --watch and IPC are never broken by a syntax error. Unparsable cells are flagged with a lint violation.

dmadisetti force-pushed the dm/parse branch from 6d1984c to 120277b Compare March 30, 2026 19:27

dmadisetti marked this pull request as ready for review March 30, 2026 19:28

mscolnick approved these changes Mar 30, 2026

vercel bot deployed to Preview March 30, 2026 19:28 View deployment

dmadisetti merged commit 9565d52 into main Mar 30, 2026
41 of 43 checks passed

dmadisetti deleted the dm/parse branch March 30, 2026 19:49
