fix: Improve the line/column number caching logic #321
First off, thanks! This looks great. The O(n^2) behavior has been bothering me, but I was waiting to fix it in Parsimmon 2.0 (vaporware? is the maintainer wavebeem ever going to release it???) by keeping track of index+line+column instead of just index, a la my implementation in bread-n-butter (https://github.com/wavebeem/bread-n-butter/blob/main/src/bread-n-butter.ts#L455-L472).

My only concern with this approach is that, given this is optimizing for extremely large inputs, it's also implementing a cache which is never cleared, causing that input string to remain in memory for the life of the program. Maybe the cache could be cleared at the end of each parse, to avoid leaking memory? https://github.com/jneen/parsimmon/blob/master/src/parsimmon.js#L871

Also, this should definitely be added to 90 seconds to under 1 second: you LOVE to see it!!
Good call, I'll make those adjustments. Keeping track of line/column definitely seems like the way to go. I'm happy to help out on releasing 2.0 if I can, Parsimmon has been great so far. Thanks for the speedy feedback!
Looks like you need to run prettier btw
I'll sort out the test coverage drop
Added a simple test using a couple index parsers to hit the untested branch. I think this is the easiest way to do it because we need Let me know if there is an easier way though
Looks great, I'll do a release this weekend probably. Thanks for the excellent work!
@brendo-m Thanks! 1.17.0 is out https://www.npmjs.com/package/parsimmon
Thanks!
- `npm run lint:fix` to ensure Prettier and ESLint have passed
- `npm test`
- `CHANGELOG.md` (and `API.md` if this is an API change)

I was running into performance issues when parsing very large files because of the O(n^2) nature of the `makeLineColumnIndex` method. #297 added some memoization but it didn't go far enough.

Instead of just returning the cached result if it exists, we now walk backwards from the index until we find a previously cached index and compute the new line and column using that info. Hopefully the comments make it clear what's happening.
This version runs in O(n) and is a drastic improvement in efficiency. Parsing time for a 25k line file drops from ~90s to under 1s.
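
For readers who want the idea without opening the diff, here is a minimal sketch of the walk-backwards caching described above. It is not the code from this PR: `createLineColumnTracker`, `lineColumnAt`, and `clear` are hypothetical names made up for illustration, and the sketch assumes `\n` line endings and 1-based line/column numbers. The `clear` function reflects the suggestion in the thread to drop the cache at the end of each parse.

```javascript
// Hypothetical sketch, not Parsimmon's actual implementation.
// Assumes "\n" line endings and 1-based line/column output.
function createLineColumnTracker(input) {
  // index -> { line: zero-based line, lineStart: index where that line begins }
  const cache = new Map();
  cache.set(0, { line: 0, lineStart: 0 });

  function lineColumnAt(index) {
    if (index < 0 || index > input.length) {
      throw new RangeError("index out of bounds: " + index);
    }
    let i = index;
    let newlines = 0; // newlines seen between the cached index and `index`
    let lastNewline = -1; // position of the newline closest to `index`, if any
    // Walk backwards until we reach an index that is already cached.
    // Index 0 is always cached, so the loop terminates.
    while (!cache.has(i)) {
      i--;
      if (input.charAt(i) === "\n") {
        newlines++;
        if (lastNewline === -1) {
          lastNewline = i;
        }
      }
    }
    const base = cache.get(i);
    const entry = {
      line: base.line + newlines,
      lineStart: lastNewline === -1 ? base.lineStart : lastNewline + 1,
    };
    cache.set(index, entry);
    return {
      offset: index,
      line: entry.line + 1,
      column: index - entry.lineStart + 1,
    };
  }

  // Drop everything except the seed entry, e.g. at the end of a parse.
  function clear() {
    cache.clear();
    cache.set(0, { line: 0, lineStart: 0 });
  }

  return { lineColumnAt, clear };
}

const tracker = createLineColumnTracker("ab\ncd\nef");
console.log(tracker.lineColumnAt(4)); // { offset: 4, line: 2, column: 2 }
console.log(tracker.lineColumnAt(7)); // { offset: 7, line: 3, column: 2 }
tracker.clear();
```

When indices are requested in roughly increasing order, as they tend to be during a parse, each lookup only scans the characters since the most recently cached index, so the total work stays close to linear in the input length.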