Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Improve the line/column number caching logic #321

Merged
merged 1 commit into from
Apr 18, 2021
Merged

fix: Improve the line/column number caching logic #321

merged 1 commit into from
Apr 18, 2021

Conversation

brendo-m
Copy link
Contributor

  • I have run npm run lint:fix to ensure Prettier and ESLint have passed
  • Coveralls bot has replied that the tests pass with 100% code coverage (npm test)
  • I have updated CHANGELOG.md (and API.md if this is an API change)

I was running into performance issues when parsing very large files because of the O(n^2) nature of the makeLineColumnIndex method. #297 added some memoization but it didn't go far enough.

Instead of just returning the cached result if it exists we now walk backwards from the index until we find a previously cached index and compute the new line and column using that info. Hopefully the comments make it clear what's happening.

This version runs in O(n) and is a drastic improvement in efficiency. Parsing time for a 25k line file drops from ~90s to under 1s.

@coveralls
Copy link

coveralls commented Apr 16, 2021

Coverage Status

Coverage remained the same at 100.0% when pulling 6a2df04 on Stedi:caching into 16a6496 on jneen:master.

@wavebeem
Copy link
Collaborator

First off, thanks! This looks great. The O(n^2) behavior has been bothering me, but I was waiting to fix it in Parsimmon 2.0 (vaporware? is the maintainer wavebeem every going to release it???) via keeping track of index+line+column instead of just index, a la my implementation in bread-n-butter (https://github.com/wavebeem/bread-n-butter/blob/main/src/bread-n-butter.ts#L455-L472)

My only concern with this approach is that given this is optimizing for extremely large inputs, it's also implementing a cache which is never cleared, causing that input string to remain in memory for the life of the program.

Maybe the cache could be cleared at the end of each parse, to avoid leaking memory? https://github.com/jneen/parsimmon/blob/master/src/parsimmon.js#L871

Also, this should definitely be added to CHANGELOG.md, and you're more than welcome to add your handle as attribution for your work on this :)

90 seconds to under 1 second: you LOVE to see it!!

@brendo-m
Copy link
Contributor Author

Good call, I'll make those adjustments.

Keeping track of line/column definitely seems like the way to go. I'm happy to help out on releasing 2.0 if I can, Parsimmon has been great so far.

Thanks for the speedy feedback!

package.json Outdated Show resolved Hide resolved
@wavebeem
Copy link
Collaborator

> prettier --list-different '{examples,src,test}/**/*.js' webpack.config.js && eslint examples src test webpack.config.js

src/parsimmon.js

The command "npm test" exited with 1.

Looks like you need to run prettier btw

@brendo-m
Copy link
Contributor Author

I'll sort out the test coverage drop

@brendo-m
Copy link
Contributor Author

Added a simple test using a couple index parsers to hit the untested branch. I think this is the easiest way to do it because we need makeLineColumnIndex to run twice and for the second call to walk back over a newline.

Let me know if there is an easier way though

@wavebeem
Copy link
Collaborator

Looks great, I'll do a release this weekend probably. Thanks for the excellent work!

@wavebeem wavebeem merged commit a5585aa into jneen:master Apr 18, 2021
@wavebeem
Copy link
Collaborator

@brendo-m Thanks! 1.17.0 is out https://www.npmjs.com/package/parsimmon

@brendo-m
Copy link
Contributor Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants