Skip to content

🐛 Treat an indented ---/... as a plain scalar, not a document marker#112

Merged
frenck merged 2 commits into
mainfrom
frenck/fix-indented-document-marker
Jun 24, 2026
Merged

🐛 Treat an indented ---/... as a plain scalar, not a document marker#112
frenck merged 2 commits into
mainfrom
frenck/fix-indented-document-marker

Conversation

@frenck

@frenck frenck commented Jun 24, 2026

Copy link
Copy Markdown
Owner

Breaking change

This changes the parse result for a narrow, previously-mishandled input: an indented ---/... used as a block mapping value or sequence item. Before, the fast path silently decoded it to null (and could drop the keys that followed) or raised a spurious "trailing content after document" error. After, it decodes to the literal string "---"/"...", matching the YAML spec and PyYAML. Anyone who happened to rely on the old null would see the correct string instead.

Proposed change

The fast-path scanner recognized ---/... as document markers wherever they appeared, not only at the start of a line. A document marker is only a marker at column 0; indented, ---/... is ordinary plain-scalar content. The quoted-scalar scanner already enforced this (at_document_marker checks column() == 0) and so does the % directive arm, but the block-token dispatch for ---/... did not.

The result was silent data corruption on the default loads path:

  • top: ... decoded to {"top": None} instead of {"top": "..."}
  • m:\n n: ...\n o: 2 decoded to {"m": {"n": None}}, silently dropping o
  • the same for ---

This gates the document-start/document-end dispatch (and the directive-window reopen) on column() == 0, so an indented marker falls through to plain-scalar scanning. Real column-0 markers and multi-document streams are unaffected.

Surfaced by auditing the real-world config corpus: the fast path rejected three valid Stripe OpenAPI fixtures (fixtures3*.yaml, which use - ... list items) that PyYAML accepts.

Type of change

  • Dependency or tooling upgrade
  • Bugfix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Deprecation (replaces or removes a feature, with a migration path)
  • Breaking change (a fix or feature that changes existing behavior)
  • Code quality, refactor, or test-only change
  • Documentation only

Additional information

  • This PR fixes or closes issue: None
  • This PR is related to: None
  • Link to a separate documentation pull request: None

Checklist

  • I have read the AI Policy, and this pull request was not created by an autonomous agent.
  • I fully understand the code in this pull request and can explain every line, including any AI-assisted changes.
  • The change is covered by tests, and uv run pytest passes locally. A pull request cannot be merged unless CI is green.
  • uv run ruff check . and uv run ruff format --check . pass.
  • cargo fmt --check and cargo clippy --all-targets -- -D warnings pass.
  • Round-trip fidelity is preserved: an unmodified document still re-emits byte-for-byte.
  • No commented-out or dead code is left in the pull request.

If the change is user-facing:

  • Documentation under docs/ is added or updated, and docs/verify_examples.py still passes.

The fast-path scanner recognized `---`/`...` as document markers wherever they
appeared, including indented as a block mapping value or sequence item. Per the
spec (and PyYAML/libyaml, and our own quoted-scalar scanner), those markers are
only markers at the start of a line (column 0); indented they are ordinary plain
scalar content. The bug silently turned such a value into null and could swallow
the keys that followed it (`m:\n  n: ...\n  o: 2` lost `o`), or surfaced as a
spurious "trailing content after document" error.

Gate the document-start/document-end dispatch (and the directive-window reopen)
on column 0, matching the existing `%` directive arm and `at_document_marker`.
Surfaced by auditing the real-world corpus: it rejected three valid Stripe
OpenAPI fixtures that PyYAML accepts.
Copilot AI review requested due to automatic review settings June 24, 2026 20:33
@frenck frenck added bugfix Inconsistencies or issues which will cause a problem for users or implementers. rust labels Jun 24, 2026
@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@frenck, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 31 minutes and 29 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c906ad60-83e1-4c46-9880-e3041e0eea0b

📥 Commits

Reviewing files that changed from the base of the PR and between 7964db9 and 383e1a8.

📒 Files selected for processing (1)
  • tests/core/test_loads.py
📝 Walkthrough

Walkthrough

Scanner handling now treats --- and ... as document markers only at column 0. Indented forms are parsed as literal scalar content instead of starting or ending documents. Tests were added for indented marker values in mappings and sequences, and for column-0 document separation in loads_all.

Possibly related PRs

  • frenck/YAMLRocks#84 — Also changes fetch_block_token handling for YAML document markers ---/... and related directive-window behavior.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main fix: indented YAML markers are treated as plain scalars instead of document markers.
Description check ✅ Passed The description is directly related and accurately explains the bug, the fix, and its impact on parsing.
Docstring Coverage ✅ Passed Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@codspeed-hq

codspeed-hq Bot commented Jun 24, 2026

Copy link
Copy Markdown

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing frenck/fix-indented-document-marker (383e1a8) with main (db7ce34)

Open in CodSpeed

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a fast-path scanner bug in the YAMLRocks library where ---/... were recognized as document markers wherever they appeared, rather than only at column 0. Per the YAML spec, a document marker is only a marker at the start of a line; when indented, ---/... is ordinary plain-scalar content. The previous behavior caused silent data corruption (e.g., top: ... decoded to {"top": None}, and following keys could be dropped) or spurious "trailing content" errors. The fix gates the block-token dispatch and the directive-window reopen on column() == 0, bringing the block path in line with the quoted-scalar, plain-scalar, flow-token, and directive paths that already enforced this invariant.

Changes:

  • Gate the ---/... block-token dispatch on column() == 0 so indented markers fall through to plain-scalar scanning.
  • Gate the directive-window is_doc_end reopen on column() == 0 so an indented ... no longer keeps the directive window open.
  • Add regression tests for indented markers as mapping values, sequence items, the no-key-swallowing case, and a guard that real column-0 markers still delimit documents.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/scanner/mod.rs Adds column() == 0 to the ---/... dispatch arms and to is_doc_end, so indented markers are treated as plain scalars and don't reopen the directive window.
tests/core/test_loads.py Adds parametrized regression tests covering indented markers as values/sequence items, no key swallowing, and column-0 markers still delimiting documents.

I verified the change is consistent with the four other scanner sites that already gate marker recognition on column() == 0 (src/scanner/scalar.rs:10-14, src/scanner/scalar.rs:624-628, src/scanner/mod.rs:296-298, src/scanner/mod.rs:438), confirmed the match-arm ordering and edge cases (-, --, real column-0 markers) behave correctly, and confirmed no existing tests or static corpus exclusion lists rely on the old behavior. My only observation is a nit about adding round-trip assertions to align with the sibling marker regression tests.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/core/test_loads.py
The scanner is shared between the fast and round-trip paths, so assert the
indented `---`/`...` cases also re-emit byte-for-byte through OPT_ROUND_TRIP,
not just that the fast path decodes the string. Addresses review feedback.
@frenck frenck merged commit 64ba8dd into main Jun 24, 2026
73 checks passed
@frenck frenck deleted the frenck/fix-indented-document-marker branch June 24, 2026 21:25
@github-actions github-actions Bot locked as resolved and limited conversation to collaborators Jun 26, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

bugfix Inconsistencies or issues which will cause a problem for users or implementers. rust

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants