Fix display math round-trip inside blockquotes / lists (issue #181) #188
Merged
…-q6ed)

Parser-side defect: `pandoc_display_math` grammar matches its body as a single regex between `$$` delimiters and so never consumes `block_continuation` markers, leaving the literal `> ` prefix bytes inside `Math.text`. The qmd writer is correct; the grammar needs to be line-structured the way `pandoc_code_block` already is.

See claude-notes/issue-reports/181/triage.md for full evidence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… (bd-q6ed)

The `pandoc_display_math` grammar rule matches its body as a single regex between `$$` delimiters and so never goes through the block-continuation machinery. When the math sits inside a blockquote (or any combination of blockquotes, list items, etc.), the `> ` and indentation bytes those enclosing blocks would normally consume end up captured verbatim in `Math.text`. The qmd writer then re-prefixes every line on output, so each round trip adds another `> ` / indent level (issue #181).

Fix it in the AST extractor: the opening `$$` sits at some column C, so on every interior line of the math, bytes at columns 0..C are the accumulated continuation prefix added by all enclosing blocks. Strip those bytes column-wise, but only if every one is in `{>, space, tab}`, so lazy-continuation lines (no explicit `> `) aren't chewed.

This handles arbitrary mixed nesting (`> - $$`, `- > $$`, `> - > $$`, `> > $$`, divs around either, etc.) without enumerating block types, because the column already encodes the cumulative prefix width.

Regression fixtures in `crates/pampa/tests/roundtrip_tests/qmd-json-qmd/`:

- display_math_in_blockquote.qmd (reporter's exact input)
- display_math_in_nested_blockquote.qmd
- display_math_in_list_in_blockquote.qmd
- display_math_in_blockquote_in_list.qmd
- display_math_in_bq_list_bq.qmd

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
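The column-wise strip described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in the PR. The name `strip_continuation_prefix` and its signature are hypothetical. It assumes `body` is the raw text captured between the `$$` delimiters, that `column` is a byte offset, and that lines shorter than `column` are simply left untouched.

```rust
/// Hypothetical helper (not the actual pampa implementation): remove the
/// continuation prefix from the interior lines of a display-math body.
///
/// `body` is the raw text between the `$$` delimiters; `column` is the
/// byte column at which the opening `$$` was found.
fn strip_continuation_prefix(body: &str, column: usize) -> String {
    let mut out = String::with_capacity(body.len());
    for (i, line) in body.split('\n').enumerate() {
        if i > 0 {
            out.push('\n');
        }
        // The first segment follows the opening `$$` on the same line,
        // so it carries no continuation prefix.
        if i == 0 {
            out.push_str(line);
            continue;
        }
        let bytes = line.as_bytes();
        // Strip only when the first `column` bytes all look like prefix
        // bytes. Lazy-continuation lines (no explicit `>`) fail this
        // check and are kept whole. Assumption: lines shorter than
        // `column` are also kept whole.
        let looks_like_prefix = bytes.len() >= column
            && bytes[..column]
                .iter()
                .all(|b| matches!(b, b'>' | b' ' | b'\t'));
        if looks_like_prefix {
            // Safe to slice: the stripped bytes are all ASCII, so
            // `column` falls on a char boundary.
            out.push_str(&line[column..]);
        } else {
            out.push_str(line);
        }
    }
    out
}
```

With a blockquoted input whose opening `$$` is at column 2, a raw body of `"\n> p = q\n> "` comes back as `"\np = q\n"`, while a lazy-continuation body such as `"\np = q\n"` passes through unchanged. Because only the column is consulted, the same function covers `> - $$` (column 4) without knowing which block kinds enclose the math.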
Summary
Fixes #181. Display math `$$ ... $$` inside a blockquote retained the `>` continuation prefix verbatim in `Math.text` because `pandoc_display_math` in the tree-sitter grammar matches its body as a single regex token between `$$` delimiters and so never goes through the block-continuation machinery. The qmd writer then correctly re-prefixed every line on output, doubling the `>` and adding another level on each round trip.

The fix is in the AST extractor (`crates/pampa/src/pandoc/treesitter.rs`):

- The opening `$$` sits at column `C = node.start_position().column`. On every interior line of the math, bytes at columns `0..C` are the accumulated continuation prefix added by all enclosing blocks (any combination of blockquotes, list items, fenced divs, etc.).
- Those bytes are stripped column-wise, but only if every one is in `{>, space, tab}`. Lines that don't fit that pattern (lazy continuation, where the user wrote no explicit `>`) are left alone rather than having real content chewed off.

Because the column already encodes the cumulative prefix width, this handles arbitrary mixed nesting (`> - $$`, `- > $$`, `> - > $$`, `> > $$`, divs around either, etc.) without enumerating block types or walking ancestors per block kind.

I first attempted a structural grammar fix: making `pandoc_display_math` line-structured like `pandoc_code_block` so each interior line goes through `_soft_line_break` (which consumes `block_continuation`). That made existing tests (`Display math with list markers should remain a single paragraph`, `Display math inside fenced div should parse correctly`) parse into `ERROR` nodes, because `pandoc_display_math` is an inline element living inside `_inlines`, and crossing soft line breaks at the inline level conflicts with `_inlines`'s own line structure. The AST-extraction approach was the smaller, safer fix.

Triage doc and minimal repros live at `claude-notes/issue-reports/181/` for context.

Test plan
Regression fixtures added under `crates/pampa/tests/roundtrip_tests/qmd-json-qmd/`:

- `display_math_in_blockquote.qmd`: the reporter's exact input
- `display_math_in_nested_blockquote.qmd`: `> > $$ ... $$`
- `display_math_in_list_in_blockquote.qmd`: `> - $$ ... $$`
- `display_math_in_blockquote_in_list.qmd`: `- > $$ ... $$`
- `display_math_in_bq_list_bq.qmd`: `> - > $$ ... $$`

All five previously diverged on `qmd → JSON → qmd → JSON` and now round-trip cleanly via `test_qmd_roundtrip_consistency`.

- `cargo nextest run -p pampa`: 3685 passed, 2 skipped
- `cargo xtask verify --skip-hub-tests`: full Rust workspace + WASM hub-client build + trace-viewer tests pass
- `cargo run --bin pampa`: AST is clean (`Math DisplayMath "\np = q\n"`), round-tripped qmd is correctly `>`-prefixed, re-parsing yields the same AST as the original (idempotent)
- `vitest run` not exercised: there is a pre-existing `ERR_MODULE_NOT_FOUND` failure on `main` HEAD unrelated to this change

Tracks beads `bd-q6ed`.

🤖 Generated with Claude Code