⚡ Skip doomed numeric parses and the merge pass on common loads#90
Conversation
classify_plain_12 attempted a full integer parse and a float parse (dec2flt) on every plain scalar, including the common case of a string that is plainly not a number (a name, path, or word). Every YAML integer or float starts with a digit, a sign, or a dot, so after the null/bool/merge checks, gate the parse attempts on the first byte and return Str directly otherwise. Behavior is unchanged; it just avoids two doomed parses per string scalar.
… 1.1) Apply the same first-byte gate as the 1.2 resolver: after the null/merge/bool checks, a plain scalar that does not start with a digit, sign, or dot cannot be any 1.1 number (decimal, 0x/0o/0b, sexagesimal, underscored, .inf/.nan), so return Str without trying the integer, big-integer, and float parses. Behavior unchanged.
apply_merge_keys walks the entire decoded value tree to resolve and strip << markers, and ran unconditionally for every document. Track whether any merge key was produced during decode and skip the pass entirely otherwise, which is the common case (most documents have no merge keys). Behavior is unchanged: a stream with no << has no markers to resolve or strip.
|
Warning Review limit reached
More reviews will be available in 29 minutes and 8 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses rolling per-developer review limits. Reviews become available again as older review attempts age out of the rolling limit window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughTwo independent performance optimizations are added. In 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Merging this PR will improve performance by 5.42%
Performance Changes
Tip Curious why this is faster? Comment Comparing |
There was a problem hiding this comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 40b31360-e19c-445b-91d1-bd137dd419f8
📒 Files selected for processing (3)
src/decode/mod.rssrc/resolver/yaml11.rssrc/resolver/yaml12.rs
There was a problem hiding this comment.
Pull request overview
This PR applies two profiling-driven, intended-to-be-behavior-preserving micro-optimizations to the fast loads path of YAMLRocks (a Rust/PyO3 YAML library). It adds an early-exit in the scalar resolvers so plainly-non-numeric scalars skip the integer/float parse attempts, and it skips the whole-tree merge-key post-pass when no merge key was produced during decoding.
Changes:
- Added a first-byte gate in
classify_plain_12/classify_plain_11that returnsStrimmediately when a plain scalar doesn't start with a digit, sign, or dot (no number can), avoiding wasted parse attempts. - Added a
saw_mergeflag on the decoder that records whether any<<merge marker was produced, and used it to skipapply_merge_keysentirely when no merge was seen.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
src/resolver/yaml12.rs |
First-byte gate to skip int/float parsing for non-numeric plain scalars (1.2 schema). Verified correct and behavior-preserving. |
src/resolver/yaml11.rs |
Same first-byte gate for the richer 1.1 number forms. Verified correct and behavior-preserving. |
src/decode/mod.rs |
Records saw_merge when a plain << marker is produced and skips the merge post-pass otherwise; the sibling explicit-!!merge-tag marker site (line 649) is not flagged, which is a correctness gap. |
Notes for the author:
- The numeric-parse gate is sound:
is_nullconsumes the empty string before the gate, so thevalue.as_bytes()[0]index cannot panic, and every YAML 1.1/1.2 integer, float, and special-float form begins with a digit,+,-, or., so no number is misclassified as a string. - The merge-skip optimization misses one of the two merge-marker producers (the YAML 1.1 explicit
!!mergetag on an empty node), which can silently break explicit-tag merges and leak an internal marker to the public API. See the inline comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The merge-skip optimization gated the merge post-pass on a saw_merge flag set where a plain << produced a merge marker. The YAML 1.1 resolver also produces a merge marker from the !!merge tag (classify_tagged_11), via the tagged-empty node path, which did not set the flag, so a marker from that path would skip the post-pass and leak unresolved. Set the flag there too. Caught in review by CodeRabbit on #90; add a 1.1 regression test.
|
Good catch, this was real. I had checked only the 1.2 resolver's Fixed in f74e07e: the tagged-merge path sets the flag too, so the optimization is now equivalent to always running the post-pass (it runs whenever any marker exists). Added a 1.1 regression test for ../Frenck Blogging my personal ramblings at frenck.dev |
The 1.1 int and float parsers strip every underscore before parsing, so a leading-underscore scalar (_5 -> 5, _1:30 -> 90) resolves as a number on main. The first-byte gate added for the perf pass dropped those to strings. Add _ to the gate's allowed first bytes so they still reach the parsers, keeping the optimization behavior-preserving. Caught in review by Copilot on #90; add a regression test. (1.2 has no underscore separators, so its gate is unaffected.)
|
Also a valid catch. The 1.1 int and float parsers strip every underscore before parsing, so a leading-underscore scalar resolves as a number on Fixed in 60a4b69 by adding ../Frenck Blogging my personal ramblings at frenck.dev |
Breaking change
None. Internal performance work; types and values are unchanged.
Proposed change
Two behavior-preserving optimizations found by profiling
loadsover a representative real-world corpus (311 documents, ~566 KB, across Kubernetes, GitHub Actions, Docker Compose, Home Assistant, Helm, OpenAPI, Prometheus, and ESPHome) under callgrind.1. Skip doomed numeric parses in the scalar resolver (1.2 and 1.1).
classifyattempted a full integer parse and a float parse (dec2flt) on every plain scalar, including the common case of a string that is plainly not a number: a name, a path, a word. Every YAML integer and float starts with a digit, a sign, or a dot, so after the null/bool/merge checks, a plain scalar that starts with anything else returnsStrimmediately. The same gate applies to the 1.1 resolver, where it pays off more because 1.1 has a larger boolean set and octal/binary/sexagesimal/underscore integer forms to skip.2. Skip the merge-key post-pass when no merge key was seen.
apply_merge_keyswalks the whole decoded value tree to resolve and strip<<markers, and ran for every document. The decoder now records whether any merge key was produced and skips the pass entirely otherwise, which is the common case.Measured (real-world corpus, median over 50 runs)
loads(YAML 1.2)loads(YAML 1.1)Re-profiling confirms
classifyroughly halved and the numeric-parse and merge-pass functions dropped out of the hot list. No hot-path branches were added; both changes only remove work.Type of change
Additional information
Checklist
uv run pytestpasses locally. A pull request cannot be merged unless CI is green.uv run ruff check .anduv run ruff format --check .pass.cargo fmt --checkandcargo clippy --all-targets -- -D warningspass.If the change is user-facing:
docs/is added or updated, anddocs/verify_examples.pystill passes.