Scrub all S-suffix source-info keys in round-trip JSON comparison (bd-j9wp) #166

Merged
cscheid merged 1 commit into main from bugfix/j9wp-roundtrip-source-info-scrub
May 8, 2026

@cscheid commented May 8, 2026

Summary

remove_location_fields in crates/pampa/tests/test.rs (used by test_qmd_roundtrip_consistency to compare JSON1 with JSON3 modulo source info) scrubbed only attrS, targetS, and citationIdS, but the JSON writer emits seven more S-suffix keys for table internals: bodiesS, bodyS, captionS, cellsS, footS, headS, and rowsS. captionS in particular is a scalar foreign key into astContext.sourceInfoPool that sits directly on the Table object, so it survived the existing scrub as a dangling integer reference and could make an otherwise content-stable round trip fail.

This was discovered while implementing #165 (issue #162); the table fixture there had to be dropped because of this gap and is reinstated in this PR.

Not a determinism issue

Verified across 5 repeated runs that the same input always produces the same captionS:N in JSON1 and the same captionS:M in JSON3. The failure was deterministic, caused by a missing scrub rather than by flakiness:

  • JSON1's astContext.sourceInfoPool has 23 entries derived from the original qmd's source positions.
  • JSON3's pool has 21 entries derived from the regenerated qmd's source positions.
  • Both pools are deterministic; the writer regenerates qmd in a stable way, but the layout differs from the original (different cell padding etc.), so the pool contents and traversal-order IDs differ.
  • After scrubbing astContext (which removes the pools themselves), the bare S-IDs are references to nothing, but the test compared them numerically anyway.
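To make the traversal-order point concrete, here is a toy model (an assumption, not pampa's real pool type) in which every node with source info appends its byte range to a pool and receives that entry's index as its S-ID. The layouts are invented so that 17 entries precede the caption in the original qmd and 15 in the regenerated one, matching the captionS values seen in the failure. Any layout change that shifts the number of entries before the caption shifts its ID, yet repeated runs on the same input always agree.

```rust
/// Toy source-info pool (hypothetical model, not pampa's real data
/// structure): each node with source info appends its byte range, and
/// that node's S-ID is the index of the appended entry.
#[derive(Debug, Default)]
struct SourceInfoPool {
    ranges: Vec<(usize, usize)>,
}

impl SourceInfoPool {
    /// Append a range and return its traversal-order ID.
    fn push(&mut self, range: (usize, usize)) -> usize {
        self.ranges.push(range);
        self.ranges.len() - 1
    }
}

/// Simulate one parse: push every range visited before the caption,
/// then the caption itself, and return the caption's S-ID.
fn caption_s(before_caption: &[(usize, usize)], caption: (usize, usize)) -> usize {
    let mut pool = SourceInfoPool::default();
    for &r in before_caption {
        pool.push(r);
    }
    pool.push(caption)
}

/// Build `n` adjacent ranges of width `w`, standing in for one layout
/// (wider ranges play the role of extra cell padding).
fn layout(n: usize, w: usize) -> Vec<(usize, usize)> {
    (0..n).map(|i| (i * w, (i + 1) * w)).collect()
}

fn main() {
    // Invented layouts: 17 entries precede the caption in the original
    // qmd, 15 in the regenerated qmd.
    let original = layout(17, 8);
    let regenerated = layout(15, 6);

    // Deterministic: five runs on the same input yield the same ID.
    let runs: Vec<usize> = (0..5).map(|_| caption_s(&original, (200, 210))).collect();
    assert!(runs.iter().all(|&id| id == runs[0]));

    println!("JSON1 captionS = {}", caption_s(&original, (200, 210))); // 17
    println!("JSON3 captionS = {}", caption_s(&regenerated, (120, 130))); // 15
}
```

Once the pools themselves are scrubbed, 17 and 15 no longer point at anything, so comparing them numerically says nothing about content.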

Change

  • Extend the scrub list in remove_location_fields to cover all ten S-suffix keys the JSON writer emits today (grep -oE '"[a-zA-Z]+S"' crates/pampa/src/writers/json.rs is the source of truth).
  • Add a doc-comment on remove_location_fields calling out the S-suffix convention so future writer-side additions extend this list too.
  • Restore crates/pampa/tests/roundtrip_tests/qmd-json-qmd/table_with_inline_nbsp_in_cell.qmd as a regression fixture. It failed at HEAD on the captionS:17 ≠ captionS:15 mismatch and passes after the scrub-list extension.
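A minimal sketch of the extended scrub for readers who have not opened test.rs. The Json enum, the scrub and table helpers, and the toy Table shape are hypothetical stand-ins, not pampa's actual types or the real body of remove_location_fields; only the ten key names and the 17/15 captionS values come from this PR. The sketch shows the old three-key list removing a nested attrS but leaving captionS behind, while the ten-key list makes the two passes compare equal.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a JSON value (hypothetical; the real test uses
/// whatever JSON type crates/pampa/tests/test.rs works with).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Num(i64),
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// The three keys scrubbed before this PR.
const OLD_KEYS: &[&str] = &["attrS", "targetS", "citationIdS"];

/// All ten S-suffix source-info keys listed in this PR. By convention a
/// key ending in S is a source-info reference; extend this list whenever
/// the JSON writer adds one.
const SOURCE_INFO_KEYS: &[&str] = &[
    "attrS", "targetS", "citationIdS", "bodiesS", "bodyS",
    "captionS", "cellsS", "footS", "headS", "rowsS",
];

/// Recursively remove every object entry whose key is in `keys`.
fn scrub(value: &mut Json, keys: &[&str]) {
    match value {
        Json::Obj(map) => {
            map.retain(|k, _| !keys.contains(&k.as_str()));
            for v in map.values_mut() {
                scrub(v, keys);
            }
        }
        Json::Arr(items) => {
            for v in items.iter_mut() {
                scrub(v, keys);
            }
        }
        Json::Num(_) | Json::Str(_) => {}
    }
}

/// Toy Table node: captionS sits directly on the table, and a nested
/// object carries an attrS. Both IDs differ between the two passes.
fn table(id: i64) -> Json {
    let mut attr = BTreeMap::new();
    attr.insert("attrS".to_string(), Json::Num(id + 1));
    let mut t = BTreeMap::new();
    t.insert("t".to_string(), Json::Str("Table".to_string()));
    t.insert("captionS".to_string(), Json::Num(id));
    t.insert("c".to_string(), Json::Arr(vec![Json::Obj(attr)]));
    Json::Obj(t)
}

fn main() {
    let (mut json1, mut json3) = (table(17), table(15));
    let (mut old1, mut old3) = (json1.clone(), json3.clone());

    scrub(&mut old1, OLD_KEYS);
    scrub(&mut old3, OLD_KEYS);
    println!("old scrub equal: {}", old1 == old3); // false: captionS survives

    scrub(&mut json1, SOURCE_INFO_KEYS);
    scrub(&mut json3, SOURCE_INFO_KEYS);
    println!("new scrub equal: {}", json1 == json3); // true
}
```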

Verification

  • TDD: fixture failed at HEAD with the expected captionS mismatch, passes after the fix.
  • cargo nextest run -p pampa: 3685 tests passing.
  • cargo xtask verify --skip-hub-build --skip-hub-tests --skip-trace-viewer-build --skip-trace-viewer-tests: clean.

Risk

Adding more scrubs is monotone-safe: a previously passing fixture cannot start failing, since the comparison only becomes less strict. The theoretical risk is that a future writer-side change gives a content-bearing key an S suffix and this list silently swallows it. The doc-comment on remove_location_fields flags the convention so a future reviewer extending json.rs sees the dependency.
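The monotonicity argument can be checked on a toy model. Assuming the scrub only deletes keys, scrubbing with a superset of keys gives the same result as scrubbing with the subset first and then with the superset, so two values that already compare equal under the smaller list still compare equal under the larger one. The flat Obj map and the colCount and labelS keys below are invented for illustration; the final assertion shows the risk described above, where a content-bearing S-suffix key would be silently hidden.

```rust
use std::collections::BTreeMap;

/// Flat stand-in for a JSON object (hypothetical; a real scrub recurses
/// through nested values, but the argument is the same at every level).
type Obj = BTreeMap<&'static str, i64>;

/// Return a copy of `obj` without the listed keys.
fn scrub(obj: &Obj, keys: &[&str]) -> Obj {
    obj.iter()
        .filter(|(k, _)| !keys.contains(*k))
        .map(|(k, v)| (*k, *v))
        .collect()
}

/// True unless the comparison passes under `small` but fails under `big`.
/// This holds whenever `small` is a subset of `big`, because then
/// scrub(x, big) == scrub(scrub(x, small), big) for every x.
fn monotone(a: &Obj, b: &Obj, small: &[&str], big: &[&str]) -> bool {
    let passes_small = scrub(a, small) == scrub(b, small);
    let passes_big = scrub(a, big) == scrub(b, big);
    !passes_small || passes_big
}

fn main() {
    let old: &[&str] = &["attrS", "targetS", "citationIdS"];
    let extended: &[&str] = &[
        "attrS", "targetS", "citationIdS", "bodiesS", "bodyS",
        "captionS", "cellsS", "footS", "headS", "rowsS",
    ];

    let json1 = Obj::from([("captionS", 17), ("colCount", 3)]);
    let json3 = Obj::from([("captionS", 15), ("colCount", 3)]);

    // Fails under the old list, passes under the extended one.
    assert_ne!(scrub(&json1, old), scrub(&json3, old));
    assert_eq!(scrub(&json1, extended), scrub(&json3, extended));
    assert!(monotone(&json1, &json3, old, extended));

    // Flip side: if a key ending in S carried content, scrubbing it
    // would hide a real difference between the two values.
    let a = Obj::from([("labelS", 1)]);
    let b = Obj::from([("labelS", 2)]);
    assert_eq!(scrub(&a, &["labelS"]), scrub(&b, &["labelS"]));

    println!("monotonicity checks passed");
}
```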

Test plan

  • Confirm there is no expectation elsewhere in the repo that any S-suffix key carries content rather than source info.
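One way to approach this test-plan item is to enumerate every S-suffix key the writer emits and review each one by hand. The helper below is hypothetical: it emulates the grep -oE '"[a-zA-Z]+S"' command from the Change list in plain Rust (deduplicated, as if piped through sort -u) and reports any key missing from a given scrub list. The writer snippet in main is invented; in practice the input would be the text of crates/pampa/src/writers/json.rs.

```rust
use std::collections::BTreeSet;

/// Emulates grep -oE '"[a-zA-Z]+S"': collects every quoted run of two or
/// more ASCII letters whose last letter is S, without duplicates.
fn s_suffix_keys(src: &str) -> BTreeSet<String> {
    let b = src.as_bytes();
    let mut found = BTreeSet::new();
    let mut i = 0;
    while i < b.len() {
        if b[i] == b'"' {
            let start = i + 1;
            let mut end = start;
            while end < b.len() && b[end].is_ascii_alphabetic() {
                end += 1;
            }
            let run = &b[start..end];
            let closed = end < b.len() && b[end] == b'"';
            if closed && run.len() >= 2 && run[run.len() - 1] == b'S' {
                found.insert(src[start..end].to_string());
                i = end + 1; // resume after the closing quote, as grep -o does
                continue;
            }
        }
        i += 1;
    }
    found
}

/// S-suffix keys present in the writer source but absent from the scrub list.
fn missing_from_scrub(src: &str, scrub_list: &[&str]) -> Vec<String> {
    s_suffix_keys(src)
        .into_iter()
        .filter(|k| !scrub_list.contains(&k.as_str()))
        .collect()
}

fn main() {
    // Invented writer source; "newThingS" plays a hypothetical future key.
    let writer_src = r#"
        obj.insert("captionS", caption_id);
        obj.insert("t", "Table");
        obj.insert("rowsS", rows_id);
        obj.insert("newThingS", id);
    "#;
    let scrub_list = [
        "attrS", "targetS", "citationIdS", "bodiesS", "bodyS",
        "captionS", "cellsS", "footS", "headS", "rowsS",
    ];
    let missing = missing_from_scrub(writer_src, &scrub_list);
    println!("{:?}", missing); // ["newThingS"]
}
```

Any key this reports still needs a human decision: it might be source info that belongs in the scrub list, or content that must not be scrubbed.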

🤖 Generated with Claude Code

…-j9wp)

remove_location_fields in crates/pampa/tests/test.rs (used by
test_qmd_roundtrip_consistency to compare JSON1 vs JSON3 modulo source
info) was scrubbing only attrS, targetS, citationIdS — but the JSON
writer emits seven more S-suffix keys for table internals: bodiesS,
bodyS, captionS, cellsS, footS, headS, rowsS. captionS in particular
is a scalar foreign-key into astContext.sourceInfoPool that sits
directly on the Table object, so it survived the existing scrub as a
dangling integer reference and could fail an otherwise content-stable
round trip.

This was deterministic, not flaky: same input always produced the
same captionS:N on JSON1 and the same captionS:M on JSON3, because the
two parses build differently-sized sourceInfoPools (the regenerated
qmd has different source positions from the original) and the
traversal-order IDs into each pool are stable. After scrubbing
astContext (which removes the pools themselves), the bare IDs are
references to nothing, but the test compared them numerically anyway.

Add the missing seven keys to the scrub list, plus a regression
fixture (table_with_inline_nbsp_in_cell.qmd) that exercises the gap
— the fixture was originally written for #162 / bd-1aip but had to
be dropped from that PR because of this scrub gap; it's reinstated
here.

Closes bd-j9wp.
@cscheid merged commit 8e6f786 into main May 8, 2026
4 checks passed
@cscheid deleted the bugfix/j9wp-roundtrip-source-info-scrub branch May 8, 2026 14:45