Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions claude-notes/issue-reports/201/repro.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
reveal.js\' jump-to-slide.
130 changes: 130 additions & 0 deletions claude-notes/issue-reports/201/triage.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,130 @@
# Issue #201 — Writer drops `\'` escape on apostrophes that the reader would re-reject as quote opens

- **GitHub**: https://github.com/quarto-dev/q2/issues/201
- **Reporter**: @rundel (Colin Rundel), 2026-05-15
- **Triage date**: 2026-05-15
- **Worktree**: `.worktrees/issue-201` (branch `issue-201`, based on `main` @ `26b8943c`)
- **Beads issue**: bd-8lcm (filed during triage)
- **Scope**: The reader/writer asymmetry around the ASCII apostrophe `'` in the qmd writer (`crates/pampa/src/writers/qmd.rs`). No reader changes are in scope — the reader's behavior is the spec the writer must satisfy.

## Summary

Real bug, reproduces at HEAD. The qmd writer emits a bare `'` for an apostrophe whose source form required a `\'` escape, so `qmd → AST → qmd` is not a fixed point: the second round-trip through the reader fails with `Q-2-10` (Closed Quote Without Matching Open Quote). Fix is small in scope (one writer function plus context-aware lookahead across inline boundaries) but not literally one-line — see § Localization.

## Reproduction

Fixture: `claude-notes/issue-reports/201/repro.qmd`

```
reveal.js\' jump-to-slide.
```

Three commands, exactly as reported in the issue:

```bash
# 1. Reader accepts \' and produces an apostrophe in the AST.
$ printf -- "reveal.js\\\\' jump-to-slide.\n" | cargo run --bin pampa --
[ Para [Str "reveal.js’", Space, Str "jump-to-slide."] ]

# 2. Writer round-trips the AST as qmd, but DROPS the escape:
$ printf -- "reveal.js\\\\' jump-to-slide.\n" | cargo run --bin pampa -- -t qmd
reveal.js' jump-to-slide.

# 3. Feeding that output back into the reader fails with Q-2-10:
$ printf -- "reveal.js\\\\' jump-to-slide.\n" \
| cargo run --bin pampa -- -t qmd 2>/dev/null \
| cargo run --bin pampa --
Error: [Q-2-10] Closed Quote Without Matching Open Quote
╭─[ <stdin>:1:11 ]
1 │ reveal.js' jump-to-slide.
│ ┬┬
│ ╰─── This is the opening quote. ...
│ │
│ ╰── A space is causing a quote mark to be interpreted as a quotation close.
───╯
```

Inspected output: confirmed by hand; the bare `'` between `js` and the trailing space is exactly the position Q-2-10 fires on.

### Trigger boundary (probed during triage)

The Q-2-10 trigger is precisely **letter-on-left AND whitespace+content-on-right**, matching the issue body's description and the `Q-2-10.json` corpus case (`a' b.`). Two adjacent probes:

```bash
# Apostrophe at end of paragraph (no whitespace after): reader is happy.
$ printf "reveal.js\\\\'\n" | cargo run --bin pampa --
[ Para [Str "reveal.js’"] ]

# Apostrophe followed by " end." — Q-2-10 fires.
$ printf "reveal.js' end.\n" | cargo run --bin pampa --
Error: [Q-2-10] ...
```

So the writer must consider both sides of the apostrophe, including across inline boundaries (the `Space` after `Str "reveal.js'"` is a separate inline).

### Real-world incidence

The reporter cites two quarto-web sources that contain this pattern (`reveal.js' jump-to-slide`, and another in `tables.qmd`). This means the writer regression is observable on real documents the project already publishes — not a synthetic edge case.

## Localization

- **Writer escape table**: `crates/pampa/src/writers/qmd.rs:1379` (`fn escape_markdown`). Currently has no `'` arm — the comment at line 1405 explicitly notes "characters that don't need escaping in most contexts: . , - + ! ? = : ; / ( ) % & ' \""`. That comment is wrong about `'` in the letter-then-whitespace context.
- **Str writer entry point**: `crates/pampa/src/writers/qmd.rs:1414` (`fn write_str`). Currently has signature `(s: &Str, buf, ctx)` and calls `escape_markdown(reverse_smart_quotes(&s.text))`. A correct fix needs at least lookahead to the next inline (Space vs other) and lookbehind to the previous inline's trailing character. The `ctx: &mut QmdWriterContext` is the natural place to thread that state — or `write_inline` (line 2159) could pre-compute boundary flags and pass them down.
- **Reverse smart-quote helper**: `crates/pampa/src/writers/qmd.rs:1371` (`fn reverse_smart_quotes`). Already turns U+2019 (`’`) into ASCII `'`. The fix interacts with this: post-conversion, any ASCII `'` produced by this helper is a candidate for escaping based on context.
- **Reader-side trigger (reference only, not modified)**: `Q-2-10` error corpus at `crates/pampa/resources/error-corpus/Q-2-10.json` defines the canonical trigger `a' b.`. Existing roundtrip test `crates/pampa/tests/roundtrip_tests/qmd-json-qmd/smart_quotes_apostrophes.qmd` only exercises the *safe* contexts (`project's`, `can't`, `it's`, `We're`) and so doesn't catch this regression. A new roundtrip fixture in the same directory (e.g. `apostrophe_before_space.qmd` containing `reveal.js\' jump-to-slide.`) is the obvious TDD entry point.

## Open questions — resolved during triage

**Q1. Is the trigger letter-on-left-only, or letter-on-left AND whitespace-on-right?**
Experiment: ran `printf "reveal.js\\'\n"` (no trailing whitespace) and confirmed the reader accepts it.
Conclusion: the trigger requires *both* sides. A writer fix that escaped every letter-then-apostrophe occurrence would over-escape `project's`, `can't`, etc. The fix must look at the next inline (or end-of-block) too.

**Q2. Does the Str body itself ever contain the boundary?**
Experiment: inspected the AST from the failing input. The boundary lives across inlines: `[Str "reveal.js'", Space, Str "jump-to-slide."]`. The `'` is at the end of one Str; the Space is the next inline.
Conclusion: a fix that only looks within a single Str body is insufficient. The writer must know what inline follows the current Str (Space, SoftBreak, end-of-block, etc.).

**Q3. Is the writer's current "don't escape `'`" comment defensible at all?**
Experiment: read the comment at `qmd.rs:1405–1407`.
Conclusion: the comment is correct that escaping `'` everywhere would be verbose (every contraction). It is wrong that "very specific contexts" can be ignored — the letter+whitespace case is exactly such a context and it is what this bug is about. The right fix is context-aware escaping, not blanket escaping.

## Outcome / recommended next step

Filed beads issue with the fix scope below. No GH comment needed yet — issue is already specific and well-reproduced; the bd-XXXX will be referenced when the fix PR lands.

**Fix scope (for the beads issue):**

1. **Test first (TDD per `crates/pampa/CLAUDE.md`)**:
- Add `crates/pampa/tests/roundtrip_tests/qmd-json-qmd/apostrophe_before_space.qmd` containing `reveal.js\' jump-to-slide.` (or similar). Confirm the round-trip test fails at HEAD.
2. **Implement**: Extend `write_str` (or the surrounding `write_inline` driver) to detect a trailing `letter + '` at the end of a `Str` whose **next** inline is `Space`/`SoftBreak`/etc., and emit `\'` instead of `'`. The minimal correct rule is: escape ASCII `'` in a `Str` body iff the char immediately to its left is a Unicode letter AND the next byte the writer would emit is ASCII whitespace.
3. **Verify**: Round-trip test passes; `cargo nextest run --workspace` clean; the two quarto-web fixtures from the issue round-trip cleanly through `pampa -t qmd | pampa`.

Out of scope: any reader changes; over-escaping `'` in the safe contexts (`project's`, `can't`).

## Verification commands used

```bash
# Pre-flight (Rust-only; hub-client & trace-viewer skipped due to missing node_modules — unrelated bootstrap)
cargo xtask verify --skip-hub-build --skip-hub-tests

# Reproduction
printf -- "reveal.js\\\\' jump-to-slide.\n" | cargo run --bin pampa --
printf -- "reveal.js\\\\' jump-to-slide.\n" | cargo run --bin pampa -- -t qmd
printf -- "reveal.js\\\\' jump-to-slide.\n" | cargo run --bin pampa -- -t qmd 2>/dev/null | cargo run --bin pampa --

# Boundary probes
printf "reveal.js\\\\'\n" | cargo run --bin pampa --
printf "reveal.js' end.\n" | cargo run --bin pampa --

# Issue intake
gh issue view 201 --repo quarto-dev/q2 --json title,body,author,createdAt,labels,comments
```

## Cross-references

- Reader-side spec: `crates/pampa/resources/error-corpus/Q-2-10.json`
- Existing apostrophe roundtrip coverage (does NOT cover this case): `crates/pampa/tests/roundtrip_tests/qmd-json-qmd/smart_quotes_apostrophes.qmd`
- Real-world incidence (quarto-web):
- https://github.com/quarto-dev/quarto-web/blob/41a5a98d20449d970821b59be1711a0710a9cee3/docs/presentations/revealjs/presenting.qmd#L85
- https://github.com/quarto-dev/quarto-web/blob/41a5a98d20449d970821b59be1711a0710a9cee3/docs/authoring/tables.qmd#L772
- Writer file: `crates/pampa/src/writers/qmd.rs` (escape table + `write_str` + `write_inline` driver)
159 changes: 159 additions & 0 deletions claude-notes/plans/2026-05-15-issue-201-apostrophe-escape.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,159 @@
# Plan: writer emits `\'` for apostrophes the reader would re-reject (issue #201, bd-8lcm)

## Context

- **GitHub**: https://github.com/quarto-dev/q2/issues/201
- **Beads**: bd-8lcm
- **Triage**: `claude-notes/issue-reports/201/triage.md`
- **Worktree**: `.worktrees/issue-201` (branch `issue-201`, based on `main` @ `26b8943c`)

The qmd writer at `crates/pampa/src/writers/qmd.rs` emits a bare `'` for an apostrophe whose source form required a `\'` escape. Round-trip `qmd → AST → qmd → AST` fails on `reveal.js\' jump-to-slide.` and similar real-world inputs.

## The reader's rule (probed during triage, sharpened under this plan)

The reader's smart-quote-apostrophe classification accepts a bare `'` *only* when **both** neighbors are Unicode alphanumeric. Anything else — punctuation, whitespace, end-of-Str, or end-of-block — is rejected.

| left context | right context | reader behavior |
|---|---|---|
| alphanumeric | alphanumeric (`don't`, `ab'9`) | accepted as apostrophe `’` |
| alphanumeric | non-alphanumeric in Str (`ab'.cd`) | **Q-2-7** |
| alphanumeric | whitespace + content (`ab' c`) | **Q-2-10** |
| alphanumeric | newline + content (paragraph continues) | **Q-2-7** |
| alphanumeric | LineBreak (`ab'\\\n…`) | **Q-2-7** |
| alphanumeric | start of any other inline (Code, Quoted, …) | **Q-2-7** |
| alphanumeric | end-of-Str / end-of-block / end-of-input (`ab'\n` alone) | **Q-2-7** |
| non-alphanumeric | anything | not classified as quote-open in the first place |

Original triage observation about `reveal.js'\n` being accepted was an artifact of probing the *already-escaped* form `reveal.js\'`. The *unescaped* form errors uniformly.

Probes that confirmed the sharper rule:

```bash
printf "ab'c\n" | cargo run --bin pampa -- # OK
printf "ab'9\n" | cargo run --bin pampa -- # OK
printf "ab'\n" | cargo run --bin pampa -- # ERR (no following content needed)
printf "ab' \n" | cargo run --bin pampa -- # ERR
printf "ab'.\n" | cargo run --bin pampa -- # ERR
printf "9' a\n" | cargo run --bin pampa -- # ERR
```

So the writer's escape rule, derived from the reader's rule, simplifies to:

> Escape `'` as `\'` whenever **the previous char in the Str body is Unicode alphanumeric** AND **the next char in the Str body is either absent or non-alphanumeric**.

No cross-inline lookahead is needed: the rule is purely local to a single `Str` body. (`reverse_smart_quotes` runs first, so the apostrophe is ASCII `'` by the time we make this decision.) This is materially simpler than the cross-inline design originally sketched in the triage.

## Design decisions

1. **Escape logic is local to `escape_markdown`.** No `QmdWriterContext` field, no cross-inline lookahead, no `write_inlines` helper. The rule only needs the previous and next characters within a single `Str` body. This is enough because the reader rejects `letter'` at any non-alnum boundary — including end-of-Str — and the rule is independent of what follows in subsequent inlines.

2. **`escape_markdown` iterates with `peekable() + prev_char` and emits `\'` per the rule above.** Existing escapes (for `\*`, `\[`, `\#`, etc.) are unchanged.

3. **No reader changes.** This is purely a writer fix.

4. **No over-escaping.** The rule produces `\'` exactly where the reader needs it (`don't` and `ab'9` stay un-escaped; `reveal.js'` and `ab'.` get the backslash). The pre-existing apostrophe-roundtrip fixture `smart_quotes_apostrophes.qmd` is the natural regression check.

## Phase 1: TDD — failing test

- [x] Add roundtrip fixture: `crates/pampa/tests/roundtrip_tests/qmd-json-qmd/apostrophe_before_space.qmd` containing exactly the issue's input (`reveal.js\' jump-to-slide.`).
- [x] Run the qmd-json-qmd roundtrip test, verify it fails at HEAD with the expected error (re-parse of writer output triggers Q-2-10).
- [x] Add a second small fixture covering an *additional* trigger shape — `apostrophe_before_punct.qmd` (`ab\'.cd`) — to catch the within-Str case.

## Phase 2: Implementation

- [x] Refactor `escape_markdown` to iterate with `peekable()` and track `prev_char`. Add a `'\''` arm: escape iff `prev_char.is_alphanumeric() && !next.is_alphanumeric_or_absent()`. (No signature change.)
- [x] `write_str` stays unchanged — it already calls `escape_markdown`.
- [x] No `QmdWriterContext` field needed. No `write_inlines` helper needed. (Earlier draft of this plan over-engineered; sharper reader probes simplified it.)

## Phase 3: Verify

- [x] The two new roundtrip fixtures pass.
- [x] `cargo nextest run -p pampa` — verify no regressions in pampa.
- [x] `cargo nextest run --workspace` — verify no regressions across the monorepo.
- [x] `cargo xtask verify --skip-hub-build --skip-hub-tests` — verify the Rust-only verification (lint, fmt, build with `-D warnings`, full test run) is clean.
- [x] End-to-end: re-run the three commands from the triage doc (`pampa`, `pampa -t qmd`, the round-trip pipeline) and confirm the round-trip now succeeds. Record the observed output below.
- [x] End-to-end: round-trip the two quarto-web fixtures cited in the issue (cited via raw URLs in the triage doc) through `pampa -t qmd | pampa` and confirm no errors.

## Phase 4: Commit + handoff

- [x] Commit on `issue-201` branch with a message linking bd-8lcm and issue #201.
- [x] Update bd-8lcm with progress notes and (eventually) close it.
- [x] Sync beads JSONL on `main`.
- [x] Ask for permission before pushing.

## End-to-end record

### After-fix round-trip (issue's reported case)

```
$ printf -- "reveal.js\\\\' jump-to-slide.\n" | cargo run --bin pampa -- -t qmd
reveal.js\' jump-to-slide.

$ printf -- "reveal.js\\\\' jump-to-slide.\n" | cargo run --bin pampa -- -t qmd 2>/dev/null | cargo run --bin pampa --
[ Para [Str "reveal.js’", Space, Str "jump-to-slide."] ]
```

Round-trip is now a fixed point. Output inspected by hand: the trailing apostrophe in the first Str is correctly emitted as `\'`.

### Broader pattern coverage

Test input (`/tmp/q2-test-cases.qmd`):

```
reveal.js\' jump-to-slide.

ab\'.cd

*hi\'*

"hi\'"

He said don't worry.

it's working

We have ab'9 here.
```

Both `pampa -t qmd` (writer output) and the re-parse of that output produce identical ASTs to the original parse. Each shape covered:

- `reveal.js\'` followed by `Space` + `Str` → `\'` emitted.
- `ab\'.cd` (apostrophe before punctuation in same Str) → `\'` emitted.
- `*hi\'*` (apostrophe at end of Emph content, before closing `*`) → `\'` emitted.
- `"hi\'"` (apostrophe at end of Quoted content, before closing `"`) → `\'` emitted.
- `don't`, `it's` (letter-apostrophe-letter contractions) → bare `'` preserved.
- `ab'9` (letter-apostrophe-digit) → bare `'` preserved.

Inspection of the writer output:

```
$ cargo run --bin pampa -- /tmp/q2-test-cases.qmd -t qmd 2>/dev/null
reveal.js\' jump-to-slide.

ab\'.cd

*hi\'*

"hi\'"

He said don't worry.

it's working

We have ab'9 here.
```

The escape lands exactly where the reader needs it; nowhere else.

### Verification commands run

```bash
cargo nextest run -p pampa test_qmd_roundtrip_consistency # 1 passed (the suite includes the two new fixtures)
cargo nextest run -p pampa --no-fail-fast # 3686 passed
cargo nextest run --workspace --no-fail-fast # 8863 passed
cargo xtask verify --skip-hub-build --skip-hub-tests # all 9 steps green
```

### quarto-web fixtures (skipped)

Attempted to fetch the two quarto-web sources cited in the issue (lines from `presenting.qmd` and `tables.qmd` at SHA `41a5a98d…`) but the SHA was not reachable from this machine. Skipped this external check; the shape of those occurrences (`reveal.js' jump-to-slide`) is the literal reproducer above, which is verified.
26 changes: 24 additions & 2 deletions crates/pampa/src/writers/qmd.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1376,9 +1376,20 @@ fn reverse_smart_quotes(text: &str) -> String {
// This follows Pandoc's escaping strategy: escape characters that have special
// markdown meaning to ensure proper roundtripping (qmd -> AST -> qmd).
// We escape characters defensively when they could trigger markdown syntax.
//
// The ASCII apostrophe `'` is escaped when the reader would otherwise
// misclassify it as a smart-quote open/close: whenever the previous char in
// the Str body is Unicode alphanumeric AND the next char in the Str body is
// either absent or non-alphanumeric. The reader's smart-quote-apostrophe
// classification accepts `'` only when it sits between two alphanumeric
// characters (e.g. `don't`, `it's`, `ab'9`); in any other position the
// apostrophe is treated as a quotation mark and produces a Q-2-7 or Q-2-10
// parse error on the regenerated qmd. See bd-8lcm / issue #201.
fn escape_markdown(text: &str) -> String {
let mut result = String::new();
for ch in text.chars() {
let mut chars = text.chars().peekable();
let mut prev_char: Option<char> = None;
while let Some(ch) = chars.next() {
match ch {
// Characters that must be escaped to avoid triggering markdown syntax:
'\\' => result.push_str("\\\\"), // Escape character itself
Expand All @@ -1400,13 +1411,24 @@ fn escape_markdown(text: &str) -> String {
'{' => result.push_str("\\{"), // Attribute span open: bare { in
'}' => result.push_str("\\}"), // a Str body is always a parse
// error in qmd. Always escape.
'\'' => {
let next_in_str = chars.peek().copied();
let prev_is_alnum = prev_char.is_some_and(|c| c.is_alphanumeric());
let next_is_alnum = next_in_str.is_some_and(|c| c.is_alphanumeric());
if prev_is_alnum && !next_is_alnum {
result.push_str("\\'");
} else {
result.push('\'');
}
}

// Characters that don't need escaping in most contexts:
// . , - + ! ? = : ; / ( ) % & ' "
// . , - + ! ? = : ; / ( ) % & "
// These are only special in very specific contexts and escaping them
// everywhere would make output unnecessarily verbose.
_ => result.push(ch),
}
prev_char = Some(ch);
}
result
}
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
ab\'.cd
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
reveal.js\' jump-to-slide.