Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Space leak in reachOffset' #486

Closed
tomjaguarpaw opened this issue Aug 29, 2022 · 6 comments · Fixed by #487
Closed

Space leak in reachOffset' #486

tomjaguarpaw opened this issue Aug 29, 2022 · 6 comments · Fixed by #487
Labels

Comments

@tomjaguarpaw
Copy link
Contributor

The SourcePos field of St is lazy:

data St = St SourcePos ShowS

This causes a space leak in reachOffset':

St spos f = foldl'' go (St pstateSourcePos id) pre
go (St apos g) ch =
let SourcePos n l c = apos
c' = unPos c
w = unPos pstateTabWidth
in if
| ch == newlineTok ->
St
(SourcePos n (l <> pos1) pos1)
id
| ch == tabTok ->
St
(SourcePos n l (mkPos $ c' + w - ((c' - 1) `rem` w)))
(g . (fromTok ch :))
| otherwise ->
St
(SourcePos n l (c <> pos1))
(g . (fromTok ch :))

Although the St constructor remains evaluated each time around the foldl' iteration, the SourcePos inside does not. Instead it accumulates a large thunk.

Now, this doesn't affect megaparsec itself, because it only uses reachOffset' to derive reachOffsett methods and the reachOffset methods are never used within the library itself. However, a problem can arise if a client defines a TraversableStream whose reachOffsetNoLine method is derived from its reachOffset. If that reachOffset uses reachOffset' directly, or indirectly by using one of the built in TraversableStream instances, then the space leak will manifest itself.

Given that reachOffset seems no longer used by megaparsec perhaps the best long term fix is to deprecate then remove that method. In the short term perhaps you could make that field strict again.

The SourcePos field was made lazy in e307266 but I don't understand why. I can hardly imagine that allowing this space leak is good for performance.

@mrkkrp
Copy link
Owner

mrkkrp commented Aug 29, 2022

IIRC this was a conscious decision. reachOffset is supposed to be used only when parsing is finished as part of error reporting. Its performance therefore doesn't matter as much because it will be evaluated only in "parse failure" branches of execution flow. I remember that I profiled some parsers and making that field lazy gave some noticeable performance improvement, so I made it lazy.

I'm open to either updating the documentation to mention this possible performance problem or making the field strict provided the benchmarks are not getting any worse.

@mrkkrp
Copy link
Owner

mrkkrp commented Aug 29, 2022

It may be that my memory deceives me though. Happy to make this strict provided there is no performance regression.

@tomjaguarpaw
Copy link
Contributor Author

the reachOffset methods are never used within the library itself

I just realised this isn't true. reachOffset is used in errorBundlePretty, so it's even more imperative to fix this issue.

Is there any circumstance in which one would call reachOffset but not evaluate the PosState it returns? If not then it makes no sense to make that field of St lazy. If it's lazy you still do exactly as much computation updating the PosState, plus additional computation creating and entering thunks, plus consume additional amounts of memory proportional to o - pstateOffset. The constant factor on the memory usage is large, tens of bytes at least, so if you are parsing a 10 MB file you can easily consume 1 GB just to report errors.

@tomjaguarpaw
Copy link
Contributor Author

master

Uses 2.4 MB to report errors in a 10 kB input stream, an overhead of 24x! (Naturally, the size of the space leak scales linearly with the size of the input file.)

screenshot0

Strict SourcePos in St

The space leak is gone.

screenshot1

Code

https://github.com/tomjaguarpaw/megaparsec-bug

@mrkkrp
Copy link
Owner

mrkkrp commented Aug 31, 2022

Would you like to open a PR?

tomjaguarpaw added a commit to tomjaguarpaw/megaparsec that referenced this issue Sep 1, 2022
@tomjaguarpaw
Copy link
Contributor Author

OK: #487

mrkkrp pushed a commit to tomjaguarpaw/megaparsec that referenced this issue Sep 1, 2022
@mrkkrp mrkkrp closed this as completed in a0efaab Sep 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants