Avoid computing line offsets after the last token #1023
Previously, when constructing the root `Pairs` instance, line offsets were computed for the full input string. If the parse stopped before EOI (end of input), this could require a substantial amount of unnecessary work.
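To make the idea concrete, here is a minimal sketch of a line index that records the byte offset at which each line starts. The `LineIndex` type, its `line_offsets` field and its `line_col` method are hypothetical stand-ins, not pest's internal API, and columns are counted in bytes for simplicity. The sketch shows that an index built only over the input up to the last token's end answers lookups inside the parsed region exactly as a full-input index would, while scanning far less text.

```rust
/// Hypothetical stand-in for a line index: the byte offset where each line starts.
struct LineIndex {
    line_offsets: Vec<usize>,
}

impl LineIndex {
    /// Scans `text` once and records the start offset of every line.
    fn new(text: &str) -> Self {
        let mut line_offsets = vec![0];
        for (i, b) in text.bytes().enumerate() {
            if b == b'\n' {
                line_offsets.push(i + 1);
            }
        }
        LineIndex { line_offsets }
    }

    /// Returns the 1-based (line, column) of a byte offset.
    /// Simplification: the column counts bytes, not characters.
    fn line_col(&self, pos: usize) -> (usize, usize) {
        // The number of line starts <= pos, minus one, is the 0-based line.
        let line = self.line_offsets.partition_point(|&start| start <= pos) - 1;
        (line + 1, pos - self.line_offsets[line] + 1)
    }
}

fn main() {
    let input = "a = 1\nb = 2\n<rest of a very large document>\n...";
    // Assume the parse stopped here: the last token ends at byte 11.
    let last_token_end = 11;
    let full = LineIndex::new(input);
    let prefix = LineIndex::new(&input[..last_token_end]);
    // Positions inside the parsed region map to the same line and column.
    assert_eq!(full.line_col(6), prefix.line_col(6));
    println!("{:?}", prefix.line_col(6)); // prints "(2, 1)"
}
```

Positions past the last token are not covered by the truncated index, which is presumably acceptable because no pair produced by the parse refers to them.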
Actionable comments posted: 0
Files selected for processing (2)
- pest/src/iterators/flat_pairs.rs (1 hunk)
- pest/src/iterators/pairs.rs (2 hunks)

Additional comments not posted (3)

pest/src/iterators/flat_pairs.rs (1)
- 35-42: LGTM! The changes to the `new` function simplify its responsibilities by accepting `line_index` as a parameter. This is a good design choice.

pest/src/iterators/pairs.rs (2)
- 54-64: LGTM! The changes to the `new` function ensure that `line_index` is always available, either passed in or created based on the last token position. This is a good design choice.
- 218-224: LGTM! The `flatten` method now correctly passes the `line_index` to the `flat_pairs::new` function, ensuring `FlatPairs` receives the necessary parameter directly.
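The review comments above describe two cooperating pieces: `Pairs::new` either receives a line index or builds one from the last token's position, and `flatten` hands that index on to `FlatPairs`. A simplified sketch of that pattern follows; the struct layouts, the `Option<Rc<LineIndex>>` parameter and the `last_token_end` argument are assumptions made for illustration, not pest's actual signatures.

```rust
use std::rc::Rc;

// Hypothetical, simplified stand-ins; the real pest types carry more
// fields, a rule type parameter and a token queue.
struct LineIndex {
    line_offsets: Vec<usize>,
}

impl LineIndex {
    fn new(text: &str) -> Self {
        let mut line_offsets = vec![0];
        line_offsets.extend(text.match_indices('\n').map(|(i, _)| i + 1));
        LineIndex { line_offsets }
    }
}

struct Pairs<'i> {
    input: &'i str,
    line_index: Rc<LineIndex>,
}

struct FlatPairs<'i> {
    input: &'i str,
    line_index: Rc<LineIndex>,
}

impl<'i> Pairs<'i> {
    /// Reuses a caller-supplied index, or builds one that stops at
    /// `last_token_end` instead of scanning the whole input.
    fn new(input: &'i str, last_token_end: usize, line_index: Option<Rc<LineIndex>>) -> Self {
        let line_index = line_index
            .unwrap_or_else(|| Rc::new(LineIndex::new(&input[..last_token_end])));
        Pairs { input, line_index }
    }

    /// Shares the already-built index with `FlatPairs` (a reference-count
    /// bump) instead of recomputing it.
    fn flatten(&self) -> FlatPairs<'i> {
        FlatPairs {
            input: self.input,
            line_index: Rc::clone(&self.line_index),
        }
    }
}

fn main() {
    let input = "x\ny\nz\n(unparsed tail)";
    // Assume the last token ends at byte 3, so only "x\ny" is indexed.
    let pairs = Pairs::new(input, 3, None);
    let flat = pairs.flatten();
    assert!(Rc::ptr_eq(&pairs.line_index, &flat.line_index));
    println!("{} line starts", pairs.line_index.line_offsets.len()); // prints "2 line starts"
}
```

Because `Rc::clone` only increments a reference count, flattening no longer pays for a second scan of the input.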
LGTM, or is there any concern, @huacnlee?
@@ -32,13 +32,14 @@ pub struct FlatPairs<'i, R> {
pub fn new<'i, R: RuleType>(
    queue: Rc<Vec<QueueableToken<'i, R>>>,
    input: &'i str,
    line_index: Rc<LineIndex>,
Even though this method is `pub`, it's not exposed, so it should be OK semver-wise.
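The semver remark relies on Rust's visibility rules: a `pub fn` inside a private module can be called from the parent module, but no path outside it can name the function, so changing its parameters does not alter the crate's public API. The hypothetical layout below (module, struct and function names are illustrative, loosely mirroring the reviewer's point rather than pest's exact source) shows the pattern.

```rust
// Hypothetical layout: `flat_pairs` is a private module, so its `pub fn new`
// is reachable from the parent module `iterators` but not from outside it.
pub mod iterators {
    mod flat_pairs {
        pub struct FlatPairs {
            pub queue_len: usize,
            pub line_count: usize,
        }

        // Adding or changing parameters here only affects callers inside
        // `iterators`, because no outside path can name this function.
        pub fn new(queue: &[u32], line_count: usize) -> FlatPairs {
            FlatPairs { queue_len: queue.len(), line_count }
        }
    }

    // Only the type is re-exported; the module and its constructor stay hidden.
    pub use self::flat_pairs::FlatPairs;

    // A public entry point in the parent module may call the hidden constructor.
    pub fn flatten(queue: &[u32], line_count: usize) -> FlatPairs {
        flat_pairs::new(queue, line_count)
    }
}

fn main() {
    let flat = iterators::flatten(&[1, 2, 3], 1);
    println!("{} tokens, {} lines", flat.queue_len, flat.line_count);
    // `iterators::flat_pairs::new(&[], 0)` would not compile here,
    // because the module `flat_pairs` is private.
}
```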
[![Mend Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [peg](https://togithub.com/kevinmehall/rust-peg) | dependencies | patch | `0.8.3` -> `0.8.4` |
| [pest](https://pest.rs/) ([source](https://togithub.com/pest-parser/pest)) | dependencies | patch | `2.7.10` -> `2.7.11` |
| [serde_json](https://togithub.com/serde-rs/json) | dependencies | patch | `1.0.119` -> `1.0.121` |
| [winnow](https://togithub.com/winnow-rs/winnow) | dependencies | patch | `0.6.13` -> `0.6.18` |

---

### Release Notes

<details>
<summary>kevinmehall/rust-peg (peg)</summary>

### [`v0.8.4`](https://togithub.com/kevinmehall/rust-peg/releases/tag/0.8.4)

[Compare Source](https://togithub.com/kevinmehall/rust-peg/compare/0.8.3...0.8.4)

#### Fixes

- Fix macro panic for parse error at end of `peg::parser!{ }` specification ([https://github.com/kevinmehall/rust-peg/issues/376](https://togithub.com/kevinmehall/rust-peg/issues/376)) by [@kevinmehall](https://togithub.com/kevinmehall)
- Fix handling of `r#` raw idents ([#378](https://togithub.com/kevinmehall/rust-peg/issues/378)) by [@A4-Tacks](https://togithub.com/A4-Tacks) in [https://github.com/kevinmehall/rust-peg/pull/379](https://togithub.com/kevinmehall/rust-peg/pull/379)

**Full Changelog**: kevinmehall/rust-peg@0.8.3...0.8.4

</details>

<details>
<summary>pest-parser/pest (pest)</summary>

### [`v2.7.11`](https://togithub.com/pest-parser/pest/releases/tag/v2.7.11)

[Compare Source](https://togithub.com/pest-parser/pest/compare/v2.7.10...v2.7.11)

##### What's Changed

- Avoid computing line offsets after the last token by [@wabain](https://togithub.com/wabain) in [https://github.com/pest-parser/pest/pull/1023](https://togithub.com/pest-parser/pest/pull/1023)
- fix: Remove unnecessary qualification by [@austriancoder](https://togithub.com/austriancoder) in [https://github.com/pest-parser/pest/pull/1024](https://togithub.com/pest-parser/pest/pull/1024)

##### New Contributors

- [@wabain](https://togithub.com/wabain) made their first contribution in [https://github.com/pest-parser/pest/pull/1023](https://togithub.com/pest-parser/pest/pull/1023)
- [@austriancoder](https://togithub.com/austriancoder) made their first contribution in [https://github.com/pest-parser/pest/pull/1024](https://togithub.com/pest-parser/pest/pull/1024)

**Full Changelog**: pest-parser/pest@v2.7.10...v2.7.11

##### Warning: Semantic Versioning

Note that the node tag feature in 2.6.0 was a technically semver-breaking change even though it is a backwards-compatible / non-breaking change in the meta-grammar. There may be similar non-breaking changes to the meta-grammar between minor versions in the future. These non-breaking changes, however, may translate into semver-breaking changes due to the additional variants propagated from the generated `Rule` enum.

This new feature caused issues in some Cargo version resolution situations where Cargo mixed different versions of pest dependencies. For this reason, these "grammar non-breaking but semver-breaking" changes are now available only under the "grammar-extras" feature flag. If you would like to use node tags (or other future grammar features), you can do so by enabling this flag on the pest_derive crate in your Cargo.toml:

    ...
    pest_derive = { version = "2.7", features = ["grammar-extras"] }

</details>

<details>
<summary>serde-rs/json (serde_json)</summary>

### [`v1.0.121`](https://togithub.com/serde-rs/json/releases/tag/v1.0.121)

[Compare Source](https://togithub.com/serde-rs/json/compare/v1.0.120...v1.0.121)

- Optimize position search in error path ([#1160](https://togithub.com/serde-rs/json/issues/1160), thanks [@purplesyringa](https://togithub.com/purplesyringa))

### [`v1.0.120`](https://togithub.com/serde-rs/json/releases/tag/v1.0.120)

[Compare Source](https://togithub.com/serde-rs/json/compare/v1.0.119...v1.0.120)

- Correctly specify required version of `indexmap` dependency ([#1152](https://togithub.com/serde-rs/json/issues/1152), thanks [@cforycki](https://togithub.com/cforycki))

</details>

<details>
<summary>winnow-rs/winnow (winnow)</summary>

### [`v0.6.18`](https://togithub.com/winnow-rs/winnow/blob/HEAD/CHANGELOG.md#0618---2024-07-31)

[Compare Source](https://togithub.com/winnow-rs/winnow/compare/v0.6.17...v0.6.18)

### [`v0.6.17`](https://togithub.com/winnow-rs/winnow/blob/HEAD/CHANGELOG.md#0617---2024-07-31)

[Compare Source](https://togithub.com/winnow-rs/winnow/compare/v0.6.16...v0.6.17)

##### Features

- Make `Checkpoint`s comparable

### [`v0.6.16`](https://togithub.com/winnow-rs/winnow/blob/HEAD/CHANGELOG.md#0616---2024-07-25)

[Compare Source](https://togithub.com/winnow-rs/winnow/compare/v0.6.15...v0.6.16)

### [`v0.6.15`](https://togithub.com/winnow-rs/winnow/blob/HEAD/CHANGELOG.md#0615---2024-07-22)

[Compare Source](https://togithub.com/winnow-rs/winnow/compare/v0.6.14...v0.6.15)

##### Compatibility

- Deprecated `Parser::recognize` in favor of `Parser::take`
- Deprecated `Parser::with_recognized` in favor of `Parser::taken`

##### Fixes

- Renamed `Parser::recognize` to `Parser::take` to be consistent with other `take` parsers
- Renamed `Parser::with_recognized` to `Parser::with_taken` to be consistent with other `take` parsers

### [`v0.6.14`](https://togithub.com/winnow-rs/winnow/blob/HEAD/CHANGELOG.md#0614---2024-07-19)

[Compare Source](https://togithub.com/winnow-rs/winnow/compare/v0.6.13...v0.6.14)

##### Fixes

- Removed unused `I: Clone` bound on `Parser::parse`

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "before 5am on the first day of the month" (UTC), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

👻 **Immortal**: This PR will be recreated if closed unmerged. Get [config help](https://togithub.com/renovatebot/renovate/discussions) if that's undesired.

---

This PR was generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View the [repository job log](https://developer.mend.io/github/rosetta-rs/parse-rosetta-rs).

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
This pull request changes the logic which precomputes line endings so that it stops after the end index of the last token in the parse output.
Background: I've integrated pest into a tool which parses partially structured data. When I identify certain kinds of data, I call pest to parse a single item at the start of the input. (It isn't easily possible to determine how long an item might be without parsing it, so I can't pass pest only the source it needs for the current item.) This mostly works great, but I noticed that on some inputs it was unexpectedly slow, and that the vast majority of the time was being spent in `pest::iterators::line_index::LineIndex::new()`. This was because each time pest completed a top-level parse, it would then precompute line endings for the rest of the input string, even though at most a few kilobytes at the start of the string were parsed.

This change reduces the time to process a 23 MB, 300,000-line document with 32,000 separate pest parse calls from two minutes to about 4 seconds.
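A rough cost model helps explain the slowdown. Assuming the caller passes pest the remaining input each time and every parse consumes one item from its front, indexing the whole remainder makes the total scanning work grow roughly quadratically with the number of items, whereas indexing only up to the last token scans each consumed byte once. The function below is a hypothetical model, not pest code, and its item sizes are made-up illustration values rather than the author's measurements.

```rust
/// Hypothetical cost model (not pest code). The caller repeatedly parses one
/// item from the front of the remaining input; `item_lens` are the byte
/// lengths each parse consumes. Assumes the items fit inside the input.
///
/// Returns (bytes scanned when indexing the whole remaining input,
///          bytes scanned when indexing only up to the last token).
fn bytes_scanned(input_len: usize, item_lens: &[usize]) -> (usize, usize) {
    let mut start = 0;
    let mut whole_remainder = 0;
    let mut up_to_last_token = 0;
    for &len in item_lens {
        // Old behaviour: line offsets computed for everything from `start` on.
        whole_remainder += input_len - start;
        // New behaviour: line offsets stop at the end of the parsed item.
        up_to_last_token += len;
        start += len;
    }
    (whole_remainder, up_to_last_token)
}

fn main() {
    // Illustrative workload: 1,000 items of 100 bytes in a 100,000-byte input.
    let items = vec![100; 1_000];
    let (old, new) = bytes_scanned(100_000, &items);
    println!("old: {old} bytes, new: {new} bytes"); // prints "old: 50050000 bytes, new: 100000 bytes"
}
```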
I also changed the `FlatPairs` initialization to reuse the `LineIndex` from `Pairs`; previously, it was recomputed there.
Summary by CodeRabbit: updated the `FlatPairs` and `Pairs` components to enhance performance and maintainability.