Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

yaml-0.10.0 requires larger stack usage #147

Closed
ndmitchell opened this issue Aug 20, 2018 · 18 comments
Closed

yaml-0.10.0 requires larger stack usage #147

ndmitchell opened this issue Aug 20, 2018 · 18 comments

Comments

@ndmitchell
Copy link
Contributor

For large stack usage, it's nearly 100% of the time a space leak, leaking performance. I detected this using the HLint space leak tests, which started failing with the 0.10.0 upgrade. See https://github.com/ndmitchell/spaceleak for my methodology.

Using the test case:

import Data.Yaml

main = do
    res <- decodeFileEither "C:\\Neil\\hlint\\data\\hlint.yaml"
    print $ (res :: Either ParseException Value)

This crashes when I do:

ghc --make yaml-overflow.hs -isrc dist\build\c\helper.o dist\build\libyaml\api.o dist\build\libyaml\dumper.o dist\build\libyaml\emitter.o dist\build\libyaml\loader.o dist\build\libyaml\parser.o dist\build\libyaml\reader.o dist\build\libyaml\scanner.o dist\build\libyaml\writer.o -rtsopts -prof -auto-all -XBangPatterns && yaml-overflow +RTS -K30K -xc

This code starts failing with be36373, but works fine with -K1K before that, so ping @sol. The test data can be found at https://github.com/ndmitchell/hlint/blob/master/data/hlint.yaml.

I did some "obvious" changes based on the diff - moving to RWS.Strict, moving to Map.Strict, moving to foldl'. I thought I'd spotted it with parseS taking an Int it never forces, which is almost certainly a space leak, but even that didn't make the space leak go away (but is probably desirable). I suspect the best way forward is to apply the changes from the patch in question one at a time.

@ndmitchell
Copy link
Contributor Author

The diff that introduced the space leak: be36373

The diff between the two versions: http://hdiff.luite.com/cgit/yaml/diff/

@sol
Copy link
Collaborator

sol commented Aug 20, 2018

Bummer! I can try to take a look tomorrow.

@sol
Copy link
Collaborator

sol commented Aug 20, 2018

I took a quick look, memory usage went up from 3 MB to 10 MB and is linear (if I double the file size it's 7 MB vs 18 MB).

Doesn't look very dramatic to me.

@ndmitchell
Copy link
Contributor Author

Measuring memory usage may well not detect the space leak - the space leak is about accumulating relatively large amounts of gunk which are later forced. E.g. we accumulate succ (succ (succ 0)) instead of 3. It's somewhat a hygiene thing - I recommend adding +RTS -K1K to all projects to catch these space leaks early. If you decide to leave one in your project then everyone downstream isn't going to be able to spot them.

@sol
Copy link
Collaborator

sol commented Aug 20, 2018

Using Control.Monad.RWS.Strict brings total memory usage down to 4 MB, however -K70K is required.

@sol
Copy link
Collaborator

sol commented Aug 20, 2018

At the point I switch

type Parse = StateT (Map String Value) (ResourceT IO)

to

type Parse = RWST () () (Map String Value) (ResourceT IO)

I need -K34K.

(I'm using Control.Monad.RWS.Strict here)

@ndmitchell
Copy link
Contributor Author

Before it was using Control.Monad.Trans.State, which is actually the lazy one. Would be interesting to see if the strict and lazy state were both small stack usage.

@ndmitchell
Copy link
Contributor Author

Would also be good to see the RWS.Lazy as well.

@sol
Copy link
Collaborator

sol commented Aug 20, 2018

-K33K required for Control.Monad.RWS.

@sol
Copy link
Collaborator

sol commented Aug 20, 2018

I don't observe any difference between strict / lazy StateT.

@ndmitchell
Copy link
Contributor Author

I see that the Strict RWST is not strict in the writer, which is very likely where the space leak is coming from. It will be building up a [] ++ [] ++ [] ++ [] ++ [] ++... that just keeps growing and growing. Looking at Control.Monad.Trans.Writer that seems to suffer exactly the same space leak as far as I can tell.

Alternative would be to use the https://www.fpcomplete.com/blog/2017/06/readert-design-pattern pattern, which is my goto alternative to RWST.

@ndmitchell
Copy link
Contributor Author

https://blog.infinitenegativeutility.com/2016/7/writer-monads-and-space-leaks explains the problem - Writer monads are inherently a space leak, and there's nothing you can do about it. ReaderT/IORef is the solution.

@snoyberg
Copy link
Owner

I thought I dropped a comment here, but looks like it got lost.

I'm in favor of fixing this problem, and generally avoiding any flavor of WriterT. @sol are you interested in making the code changes, or should I take it?

@sol
Copy link
Collaborator

sol commented Aug 22, 2018

Ok, I assume we will revert to using StateT again. Technically we could put everything into StateT, but I may still want to keep the ReaderT as the use of local fits the problem domain very well.

I'll give it a stab.

@ndmitchell
Copy link
Contributor Author

Why switch to a StateT? I'd recommend a ReaderT that contains an IORef (or perhaps more than one). That pattern tends to be most performant and simple.

sol added a commit that referenced this issue Aug 22, 2018
@sol
Copy link
Collaborator

sol commented Aug 22, 2018

I opened #148 to address this.

I did some "obvious" changes based on the diff - moving to RWS.Strict, moving to Map.Strict, moving to foldl'.

The foldl' was there before. It's not exercised by your test data (we would need some YAML file with anchors for that). Anyway, I changed it to foldl' as I guess that won't hurt.

I thought I'd spotted it with parseS taking an Int it never forces, which is almost certainly a space leak

I assume this matters for long lists (as your test data essentially is). I didn't observe any difference by changing it. I don't have a strong opinion here. As a side note, this thunks won't be retained unless there is a warning.

@sol
Copy link
Collaborator

sol commented Aug 22, 2018

I'd recommend a ReaderT that contains an IORef (or perhaps more than one). That pattern tends to be most performant and simple.

I at least agree that reasoning about performance is easier with an IORef especially in combination with modifyIORef' / atomicModifyIORef'. Is this a good enough reason to go imperative?

@ndmitchell
Copy link
Contributor Author

Yes, definitely fix the Int thing. It's performance leak 101. Slow, memory hungry, everything bad about laziness.

Regarding ReaderT, no need to go for atomic, and StateT IO is already imperative. However, personal choice.

sol added a commit that referenced this issue Aug 22, 2018
sol added a commit that referenced this issue Aug 22, 2018
sol added a commit that referenced this issue Aug 26, 2018
sol added a commit that referenced this issue Aug 26, 2018
sol added a commit that referenced this issue Aug 27, 2018
sol added a commit that referenced this issue Aug 27, 2018
sol added a commit that referenced this issue Aug 28, 2018
sol added a commit that referenced this issue Aug 28, 2018
sol added a commit that referenced this issue Aug 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants