fix: Improve the line/column number caching logic #321

brendo-m · 2021-04-16T21:15:18Z

I have run npm run lint:fix to ensure Prettier and ESLint have passed
Coveralls bot has replied that the tests pass with 100% code coverage (npm test)
I have updated CHANGELOG.md (and API.md if this is an API change)

I was running into performance issues when parsing very large files because of the O(n^2) nature of the makeLineColumnIndex method. #297 added some memoization but it didn't go far enough.

Instead of just returning the cached result if it exists we now walk backwards from the index until we find a previously cached index and compute the new line and column using that info. Hopefully the comments make it clear what's happening.

This version runs in O(n) and is a drastic improvement in efficiency. Parsing time for a 25k line file drops from ~90s to under 1s.

coveralls · 2021-04-16T21:16:49Z

Coverage remained the same at 100.0% when pulling 6a2df04 on Stedi:caching into 16a6496 on jneen:master.

wavebeem · 2021-04-16T21:42:47Z

First off, thanks! This looks great. The O(n^2) behavior has been bothering me, but I was waiting to fix it in Parsimmon 2.0 (vaporware? is the maintainer wavebeem every going to release it???) via keeping track of index+line+column instead of just index, a la my implementation in bread-n-butter (https://github.com/wavebeem/bread-n-butter/blob/main/src/bread-n-butter.ts#L455-L472)

My only concern with this approach is that given this is optimizing for extremely large inputs, it's also implementing a cache which is never cleared, causing that input string to remain in memory for the life of the program.

Maybe the cache could be cleared at the end of each parse, to avoid leaking memory? https://github.com/jneen/parsimmon/blob/master/src/parsimmon.js#L871

Also, this should definitely be added to CHANGELOG.md, and you're more than welcome to add your handle as attribution for your work on this :)

90 seconds to under 1 second: you LOVE to see it!!

brendo-m · 2021-04-16T21:54:15Z

Good call, I'll make those adjustments.

Keeping track of line/column definitely seems like the way to go. I'm happy to help out on releasing 2.0 if I can, Parsimmon has been great so far.

Thanks for the speedy feedback!

package.json

wavebeem · 2021-04-16T22:18:35Z

> prettier --list-different '{examples,src,test}/**/*.js' webpack.config.js && eslint examples src test webpack.config.js

src/parsimmon.js

The command "npm test" exited with 1.

Looks like you need to run prettier btw

brendo-m · 2021-04-16T22:23:08Z

I'll sort out the test coverage drop

brendo-m · 2021-04-16T23:36:41Z

Added a simple test using a couple index parsers to hit the untested branch. I think this is the easiest way to do it because we need makeLineColumnIndex to run twice and for the second call to walk back over a newline.

Let me know if there is an easier way though

wavebeem · 2021-04-17T03:20:43Z

Looks great, I'll do a release this weekend probably. Thanks for the excellent work!

wavebeem · 2021-04-18T20:18:06Z

@brendo-m Thanks! 1.17.0 is out https://www.npmjs.com/package/parsimmon

brendo-m · 2021-04-18T20:19:31Z

Thanks!

wavebeem reviewed Apr 16, 2021

View reviewed changes

package.json Outdated Show resolved Hide resolved

fix: Improve the line/column number caching logic

6a2df04

wavebeem merged commit a5585aa into jneen:master Apr 18, 2021

brendo-m mentioned this pull request Jun 18, 2021

fix: Fix line number double counting bug when parsing optional newlines #324

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Improve the line/column number caching logic #321

fix: Improve the line/column number caching logic #321

brendo-m commented Apr 16, 2021

coveralls commented Apr 16, 2021 •

edited

wavebeem commented Apr 16, 2021

brendo-m commented Apr 16, 2021

wavebeem commented Apr 16, 2021

brendo-m commented Apr 16, 2021

brendo-m commented Apr 16, 2021

wavebeem commented Apr 17, 2021

wavebeem commented Apr 18, 2021

brendo-m commented Apr 18, 2021

fix: Improve the line/column number caching logic #321

fix: Improve the line/column number caching logic #321

Conversation

brendo-m commented Apr 16, 2021

coveralls commented Apr 16, 2021 • edited

wavebeem commented Apr 16, 2021

brendo-m commented Apr 16, 2021

wavebeem commented Apr 16, 2021

brendo-m commented Apr 16, 2021

brendo-m commented Apr 16, 2021

wavebeem commented Apr 17, 2021

wavebeem commented Apr 18, 2021

brendo-m commented Apr 18, 2021

coveralls commented Apr 16, 2021 •

edited