yaml-0.10.0 requires larger stack usage #147

ndmitchell · 2018-08-20T15:37:01Z

Large stack usage is nearly always a sign of a space leak, which also costs performance. I detected this using the HLint space leak tests, which started failing with the 0.10.0 upgrade. See https://github.com/ndmitchell/spaceleak for my methodology.

Using the test case:

import Data.Yaml

main = do
    res <- decodeFileEither "C:\\Neil\\hlint\\data\\hlint.yaml"
    print $ (res :: Either ParseException Value)

This crashes when I do:

ghc --make yaml-overflow.hs -isrc dist\build\c\helper.o dist\build\libyaml\api.o dist\build\libyaml\dumper.o dist\build\libyaml\emitter.o dist\build\libyaml\loader.o dist\build\libyaml\parser.o dist\build\libyaml\reader.o dist\build\libyaml\scanner.o dist\build\libyaml\writer.o -rtsopts -prof -auto-all -XBangPatterns && yaml-overflow +RTS -K30K -xc

This code starts failing with be36373, but works fine with -K1K before that, so ping @sol. The test data can be found at https://github.com/ndmitchell/hlint/blob/master/data/hlint.yaml.

I did some "obvious" changes based on the diff - moving to RWS.Strict, moving to Map.Strict, moving to foldl'. I thought I'd spotted it with parseS taking an Int it never forces, which is almost certainly a space leak, but even that didn't make the space leak go away (but is probably desirable). I suspect the best way forward is to apply the changes from the patch in question one at a time.


ndmitchell · 2018-08-20T15:38:22Z

The diff that introduced the space leak: be36373

The diff between the two versions: http://hdiff.luite.com/cgit/yaml/diff/

sol · 2018-08-20T16:25:19Z

Bummer! I can try to take a look tomorrow.

sol · 2018-08-20T16:45:16Z

I took a quick look: memory usage went up from 3 MB to 10 MB and is linear (if I double the file size it's 7 MB vs 18 MB).

Doesn't look very dramatic to me.

ndmitchell · 2018-08-20T17:01:42Z

Measuring memory usage may well not detect the space leak - the space leak is about accumulating relatively large amounts of gunk which are later forced. E.g. we accumulate succ (succ (succ 0)) instead of 3. It's somewhat a hygiene thing - I recommend adding +RTS -K1K to all projects to catch these space leaks early. If you decide to leave one in your project then everyone downstream isn't going to be able to spot them.
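A minimal sketch of the kind of accumulation described above, not code from yaml itself. The names `countLazy` and `countStrict` are hypothetical. The lazy variant stores a growing chain of unevaluated `(+1)` applications in the `IORef`, and the chain is only collapsed when the result is finally demanded; the strict variant evaluates at every step.

```haskell
import Data.IORef (modifyIORef, modifyIORef', newIORef, readIORef)

-- Lazy update: the IORef ends up holding ((0 + 1) + 1) + ... as a thunk,
-- which is only evaluated when the caller forces the result.
countLazy :: Int -> IO Int
countLazy n = do
    ref <- newIORef 0
    mapM_ (\_ -> modifyIORef ref (+ 1)) [1 .. n]
    readIORef ref

-- Strict update: modifyIORef' forces the new value on every step,
-- so the IORef always holds an evaluated Int.
countStrict :: Int -> IO Int
countStrict n = do
    ref <- newIORef 0
    mapM_ (\_ -> modifyIORef' ref (+ 1)) [1 .. n]
    readIORef ref

main :: IO ()
main = do
    countStrict 100000 >>= print
    countLazy 100000 >>= print
```

Both variants return the same number with the default stack limit. The difference should only show up when the program is run with a small stack such as `+RTS -K1K`, which is why total memory figures alone can miss it.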

sol · 2018-08-20T18:07:05Z

Using Control.Monad.RWS.Strict brings total memory usage down to 4 MB, however -K70K is required.

sol · 2018-08-20T18:41:58Z

At the point I switch

type Parse = StateT (Map String Value) (ResourceT IO)

to

type Parse = RWST () () (Map String Value) (ResourceT IO)

I need -K34K.

(I'm using Control.Monad.RWS.Strict here)

ndmitchell · 2018-08-20T20:52:39Z

Before it was using Control.Monad.Trans.State, which is actually the lazy one. It would be interesting to see whether the strict and lazy StateT both have small stack usage.

ndmitchell · 2018-08-20T21:00:27Z

Would also be good to see the RWS.Lazy as well.

sol · 2018-08-20T22:59:24Z

-K33K required for Control.Monad.RWS.

sol · 2018-08-20T23:01:28Z

I don't observe any difference between strict / lazy StateT.

ndmitchell · 2018-08-21T15:40:57Z

I see that the Strict RWST is not strict in the writer, which is very likely where the space leak is coming from. It will be building up a [] ++ [] ++ [] ++ [] ++ [] ++... that just keeps growing and growing. Looking at Control.Monad.Trans.Writer that seems to suffer exactly the same space leak as far as I can tell.
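To illustrate the shape of the problem, here is a small sketch using the strict `Writer` from transformers. It is not the yaml code, and `logSteps` is a hypothetical name. Every `tell` contributes a list that is only combined with `mappend`, and nothing forces the combined log while the computation runs.

```haskell
import Control.Monad.Trans.Writer.Strict (Writer, execWriter, tell)

-- Each step appends a one-element list to the log. The log is assembled
-- with (++) and is not forced until the final result is consumed.
logSteps :: Int -> Writer [String] ()
logSteps n = mapM_ (\i -> tell ["step " ++ show i]) [1 .. n]

main :: IO ()
main = print (length (execWriter (logSteps 100000)))
```

With the default stack limit this runs fine; the accumulated appends are the kind of thing a small `-K` limit is meant to expose.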

An alternative would be to use the ReaderT design pattern (https://www.fpcomplete.com/blog/2017/06/readert-design-pattern), which is my go-to replacement for RWST.

ndmitchell · 2018-08-22T10:12:02Z

https://blog.infinitenegativeutility.com/2016/7/writer-monads-and-space-leaks explains the problem - Writer monads are inherently a space leak, and there's nothing you can do about it. ReaderT/IORef is the solution.
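A rough sketch of what the ReaderT/IORef approach can look like, under the assumption that the writer part is a list of warnings. The names `Env`, `App`, `warn` and `runApp` are hypothetical and are not taken from yaml.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Reader (ReaderT, asks, runReaderT)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical environment: a mutable list of warnings, newest first.
newtype Env = Env { envWarnings :: IORef [String] }

type App = ReaderT Env IO

-- Record a warning by consing onto the IORef; modifyIORef' evaluates the
-- new list to weak head normal form on every call.
warn :: String -> App ()
warn msg = do
    ref <- asks envWarnings
    liftIO $ modifyIORef' ref (msg :)

-- Run an action and return its result with the warnings in emission order.
runApp :: App a -> IO (a, [String])
runApp act = do
    ref <- newIORef []
    a <- runReaderT act (Env ref)
    ws <- readIORef ref
    pure (a, reverse ws)

main :: IO ()
main = do
    (_, ws) <- runApp (mapM_ (\i -> warn ("warning " ++ show i)) [1 .. 3 :: Int])
    mapM_ putStrLn ws
```

Consing onto the front and reversing once at the end avoids building nested `(++)` applications, and the reader environment stays available for `local`-style scoping.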

snoyberg · 2018-08-22T12:50:56Z

I thought I dropped a comment here, but looks like it got lost.

I'm in favor of fixing this problem, and generally avoiding any flavor of WriterT. @sol are you interested in making the code changes, or should I take it?

sol · 2018-08-22T14:14:43Z

Ok, I assume we will revert to using StateT again. Technically we could put everything into StateT, but I may still want to keep the ReaderT as the use of local fits the problem domain very well.

I'll give it a stab.

ndmitchell · 2018-08-22T14:20:16Z

Why switch to a StateT? I'd recommend a ReaderT that contains an IORef (or perhaps more than one). That pattern tends to be most performant and simple.

sol · 2018-08-22T15:56:11Z

I opened #148 to address this.

> I did some "obvious" changes based on the diff - moving to RWS.Strict, moving to Map.Strict, moving to foldl'.

The foldl was there before. It's not exercised by your test data (we would need some YAML file with anchors for that). Anyway, I changed it to foldl' as I guess that won't hurt.

> I thought I'd spotted it with parseS taking an Int it never forces, which is almost certainly a space leak

I assume this matters for long lists (as your test data essentially is). I didn't observe any difference by changing it. I don't have a strong opinion here. As a side note, these thunks won't be retained unless there is a warning.

sol · 2018-08-22T16:05:34Z

> I'd recommend a ReaderT that contains an IORef (or perhaps more than one). That pattern tends to be most performant and simple.

I at least agree that reasoning about performance is easier with an IORef especially in combination with modifyIORef' / atomicModifyIORef'. Is this a good enough reason to go imperative?

ndmitchell · 2018-08-22T16:13:38Z

Yes, definitely fix the Int thing. It's performance leak 101. Slow, memory hungry, everything bad about laziness.

Regarding ReaderT, no need to go for atomic, and StateT IO is already imperative. However, personal choice.
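For the Int issue, a minimal sketch of the usual fix, assuming a loop that threads an index it only uses when reporting a problem. `checkAll` is a hypothetical stand-in and not the actual `parseS` from yaml.

```haskell
{-# LANGUAGE BangPatterns #-}

-- The index i is only inspected when a negative value is found. Without
-- the bang pattern it would be passed along as 0 + 1 + 1 + ... unevaluated;
-- with it, i is forced on every iteration.
checkAll :: [Int] -> [String]
checkAll = go 0
  where
    go :: Int -> [Int] -> [String]
    go !_ [] = []
    go !i (x : xs)
      | x < 0     = ("negative value at index " ++ show i) : go (i + 1) xs
      | otherwise = go (i + 1) xs

main :: IO ()
main = mapM_ putStrLn (checkAll [1, -2, 3, -4])
```

The bang pattern costs nothing when the index is already evaluated and keeps the counter from growing into a chain of thunks on long inputs.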

ndmitchell mentioned this issue Aug 21, 2018

Stack overflow with yaml-0.10.0 ndmitchell/hlint#519

Closed

sol added a commit that referenced this issue Aug 22, 2018

WriterT related space leak (fixes #147)

4ea5e5b

sol added a commit that referenced this issue Aug 22, 2018

Fix WriterT related space leak (fixes #147)

4239b09

sol added a commit that referenced this issue Aug 22, 2018

Prevent thunking (see #147)

6914555

sol added a commit that referenced this issue Aug 26, 2018

Fix WriterT related space leak (fixes #147)

9c63966

sol added a commit that referenced this issue Aug 26, 2018

Prevent thunking (see #147)

1d5010a

sol added a commit that referenced this issue Aug 27, 2018

Fix WriterT related space leak (fixes #147)

47026c7

sol added a commit that referenced this issue Aug 27, 2018

Prevent thunking (see #147)

178e9f1

sol added a commit that referenced this issue Aug 28, 2018

Fix WriterT related space leak (fixes #147)

6b6651a

sol added a commit that referenced this issue Aug 28, 2018

Prevent thunking (see #147)

36cb6ca

sol added a commit that referenced this issue Aug 28, 2018

Prevent thunking (see #147)

dac4bc8

snoyberg closed this as completed in e03c9ac Aug 28, 2018
