🐛 Treat an indented ---/... as a plain scalar, not a document marker #112
The fast-path scanner recognized `---`/`...` as document markers wherever they appeared, including indented as a block mapping value or sequence item. Per the spec (and PyYAML/libyaml, and our own quoted-scalar scanner), those markers are only markers at the start of a line (column 0); indented they are ordinary plain scalar content. The bug silently turned such a value into null and could swallow the keys that followed it (`m:\n n: ...\n o: 2` lost `o`), or surfaced as a spurious "trailing content after document" error. Gate the document-start/document-end dispatch (and the directive-window reopen) on column 0, matching the existing `%` directive arm and `at_document_marker`. Surfaced by auditing the real-world corpus: it rejected three valid Stripe OpenAPI fixtures that PyYAML accepts.
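The column-0 rule described above can be sketched in a few lines of Python. This is a toy illustration, not the scanner's Rust code: `marker_at`, `MARKERS`, and `BLANKS` are hypothetical names introduced here, and the sketch assumes the caller already knows the column of the candidate token. It also applies the YAML rule that a marker must be followed by whitespace or the end of the line.

```python
from typing import Optional

MARKERS = ("---", "...")
BLANKS = " \t\r\n"


def marker_at(line: str, column: int) -> Optional[str]:
    """Return the document marker found at `column`, or None.

    Hypothetical helper: a marker only counts at column 0 and only when it is
    followed by whitespace or the end of the line. Anything else is left for
    plain-scalar scanning.
    """
    if column != 0:
        return None  # indented `---`/`...` is ordinary scalar content
    for marker in MARKERS:
        if line.startswith(marker, column):
            rest = line[column + len(marker):]
            if not rest or rest[0] in BLANKS:
                return marker
    return None


print(marker_at("--- # new document", 0))  # ---
print(marker_at("  n: ...", 5))            # None (indented, so plain scalar)
print(marker_at("---x", 0))                # None (no separator after marker)
```

Returning `None` rather than raising keeps the fall-through explicit: a caller that gets `None` simply continues into whatever scalar scanning it would do for any other token.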
Pull request overview
This PR fixes a fast-path scanner bug in the YAMLRocks library where `---`/`...` were recognized as document markers wherever they appeared, rather than only at column 0. Per the YAML spec, a document marker is only a marker at the start of a line; when indented, `---`/`...` is ordinary plain-scalar content. The previous behavior caused silent data corruption (e.g., `top: ...` decoded to `{"top": None}`, and following keys could be dropped) or spurious "trailing content" errors. The fix gates the block-token dispatch and the directive-window reopen on `column() == 0`, bringing the block path in line with the quoted-scalar, plain-scalar, flow-token, and directive paths that already enforced this invariant.
Changes:
- Gate the `---`/`...` block-token dispatch on `column() == 0` so indented markers fall through to plain-scalar scanning.
- Gate the directive-window `is_doc_end` reopen on `column() == 0` so an indented `...` no longer keeps the directive window open.
- Add regression tests for indented markers as mapping values, sequence items, the no-key-swallowing case, and a guard that real column-0 markers still delimit documents.
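To make the last guard concrete, here is a rough, line-based sketch of splitting a stream into documents only at column-0 markers. It is an assumption-laden toy, not the library's parser: `split_documents` is a hypothetical name, it works on whole lines rather than tokens, and it simply discards anything that follows a marker on the same line.

```python
from typing import List


def split_documents(text: str) -> List[List[str]]:
    """Group the lines of `text` into documents (hypothetical helper).

    A line delimits documents only when `---` or `...` starts at column 0 and
    is followed by a blank or the end of the line. Indented markers stay in
    the current document as ordinary content.
    """
    docs: List[List[str]] = [[]]
    for line in text.splitlines():
        head, tail = line[:3], line[3:4]
        is_marker = head in ("---", "...") and tail in ("", " ", "\t")
        if is_marker:
            if docs[-1]:
                docs.append([])  # start collecting the next document
            continue
        docs[-1].append(line)
    return [doc for doc in docs if doc]


# The indented `...` stays inside the single document, so `o` is not lost.
print(split_documents("m:\n  n: ...\n  o: 2\n"))
# Real column-0 markers still separate two documents.
print(split_documents("a: 1\n---\nb: 2\n...\n"))
```

The point of the sketch is only the gating condition: because the check looks at the first three characters of the line, a value such as `n: ...` or a list item `- ...` can never be mistaken for a delimiter.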
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/scanner/mod.rs` | Adds `column() == 0` to the `---`/`...` dispatch arms and to `is_doc_end`, so indented markers are treated as plain scalars and don't reopen the directive window. |
| `tests/core/test_loads.py` | Adds parametrized regression tests covering indented markers as values/sequence items, no key swallowing, and column-0 markers still delimiting documents. |
I verified the change is consistent with the four other scanner sites that already gate marker recognition on column() == 0 (src/scanner/scalar.rs:10-14, src/scanner/scalar.rs:624-628, src/scanner/mod.rs:296-298, src/scanner/mod.rs:438), confirmed the match-arm ordering and edge cases (-, --, real column-0 markers) behave correctly, and confirmed no existing tests or static corpus exclusion lists rely on the old behavior. My only observation is a nit about adding round-trip assertions to align with the sibling marker regression tests.
The scanner is shared between the fast and round-trip paths, so assert the indented `---`/`...` cases also re-emit byte-for-byte through OPT_ROUND_TRIP, not just that the fast path decodes the string. Addresses review feedback.
Breaking change
This changes the parse result for a narrow, previously-mishandled input: an indented `---`/`...` used as a block mapping value or sequence item. Before, the fast path silently decoded it to `null` (and could drop the keys that followed) or raised a spurious "trailing content after document" error. After, it decodes to the literal string `"---"`/`"..."`, matching the YAML spec and PyYAML. Anyone who happened to rely on the old `null` would see the correct string instead.

Proposed change

The fast-path scanner recognized `---`/`...` as document markers wherever they appeared, not only at the start of a line. A document marker is only a marker at column 0; indented, `---`/`...` is ordinary plain-scalar content. The quoted-scalar scanner already enforced this (`at_document_marker` checks `column() == 0`) and so does the `%` directive arm, but the block-token dispatch for `---`/`...` did not.

The result was silent data corruption on the default `loads` path:

- `top: ...` decoded to `{"top": None}` instead of `{"top": "..."}`
- `m:\n n: ...\n o: 2` decoded to `{"m": {"n": None}}`, silently dropping `o`
- `---`

This gates the document-start/document-end dispatch (and the directive-window reopen) on `column() == 0`, so an indented marker falls through to plain-scalar scanning. Real column-0 markers and multi-document streams are unaffected.

Surfaced by auditing the real-world config corpus: the fast path rejected three valid Stripe OpenAPI fixtures (`fixtures3*.yaml`, which use `- ...` list items) that PyYAML accepts.

Type of change
Additional information
Checklist
- `uv run pytest` passes locally. A pull request cannot be merged unless CI is green.
- `uv run ruff check .` and `uv run ruff format --check .` pass.
- `cargo fmt --check` and `cargo clippy --all-targets -- -D warnings` pass.

If the change is user-facing:

- `docs/` is added or updated, and `docs/verify_examples.py` still passes.