How to fight undesirable sharing? #6

Closed
feuerbach opened this Issue Jan 15, 2016 · 22 comments

Projects

None yet

5 participants

@feuerbach

This simple code leaks memory because the stream gets shared:

import qualified Streaming as S
import qualified Streaming.Prelude as S

n = 5000000

main = a >> a

a = S.mapM_ print $ S.take n $ S.enumFrom (0::Int)

The only workaround I have found is to introduce dummy monadic layers:

import qualified Streaming as S
import qualified Streaming.Prelude as S

n = 5000000

main = a >> a

a = S.mapM_ print $ S.take n $ myenumFrom (0::Int)

myenumFrom n = S.effect . return $ do
  S.yield n
  myenumFrom (n+1)

Are there any better approaches?

@michaelt
Owner

It should be possible to reproduce any sharing conundrum that you could make with Data.List. Thus all of these exhaust memory except the one that uses streaming copy:

import qualified Streaming as S
import qualified Streaming.Prelude as S
import Pipes
import qualified Pipes.Prelude as P
import qualified Conduit as C
import qualified Data.Conduit.List as C
import Control.Monad
import Control.Concurrent (threadDelay)

x = 50000000 :: Int

main = stream'

rep :: Monad f => (Int -> f a) -> f ()
rep act = loop 1 where
  loop n = do
    when (n < x) $ do
      act n
      loop (n+1)

ns :: [Int]
ns = [1..x]
ss :: Monad m => S.Stream (S.Of Int) m ()
ss = rep S.yield
ps :: Monad m => Producer Int m ()
ps = rep yield
cs :: Monad m => C.ConduitM i Int m ()
cs = rep C.yield


list = do
  print $ sum ns
  threadDelay 1000000
  print $ product ns

conduit = do
  n <- cs C.$$ C.sumC
  print n 
  threadDelay 1000000
  n' <- cs C.$$ C.productC
  print n'

pipes = do
  n <- P.sum ps
  print n 
  threadDelay 1000000
  n' <- P.product ps 
  print n'    

stream = do
  n <- S.sum ss
  print n 
  threadDelay 1000000
  n' <- S.product ss
  print n'

stream' = do
  n <-  S.product $ S.sum $ S.copy ss
  threadDelay 1000000
  print n 
@feuerbach

I don't follow.

You're showing a different example -- one where you use a stream twice, for two different computations.

In my example, I use the stream once to produce an IO action. That IO action is later executed twice.

I am not sure how your copy solution applies to my case.

@michaelt
Owner

Sorry, the bit about copy was distracting, then; the comparison with Data.List and the streaming libraries was the point. In

 main = do
    let io_action = mapM_ print $ take n $ [0..]
    io_action 
    io_action 

ghc thinks it might need the list [0..] again. There are various ways of keeping it from getting this impression. My thought was that nothing will protect you from sharing curiosities as long as you are using ghc/haskell which must hunt for sharing opportunities. Do you think that something is wrong with conduit, because it produces the difficulty above? It is using a codensity representation, but ghc still thinks it has to preserve its calculation.

Because the representation of S.enumFrom 0, even when specialized to IO, is very close to that of [0..] (in that it doesn't itself involve any monadic steps) -- it differs only by throwing in two constructors for each number emitted, where the list has only : -- ghc will be especially prone to thinking it can re-use the results it is calculating. In the case in question you were able to break this by inserting monadic steps, but I don't think it is possible generally, as we see from the conduit/pipes/list cases which as you say "use the stream once to define an io action"

@feuerbach

Yes, I understand why this is happening, and I didn't want to imply that the problem is somehow specific to this library.

I was just curious if you knew any better workarounds than the one I came up with.

I also have another way to fix this minimal example:

{-# OPTIONS_GHC -fno-full-laziness #-}
import qualified Streaming as S
import qualified Streaming.Prelude as S

a :: () -> IO ()
a () = S.mapM_ print $ S.take 500000 $ S.enumFrom 0

main :: IO ()
main = a () >> a ()

However, in my real code base, the point where S.mapM_ is called is deep down the call stack from where the action is duplicated, and transforming all actions in between to take a dummy argument is not very practical.

@feuerbach

@nomeata I know you've done a ton of work in this area. Do you have any thoughts? Is there a chance to get your dup function into ghc?

@michaelt
Owner

For a while, I implemented the three principal offenders, each yield and enumFrom more or less as you did myenumfrom, interpolating a monadic layer, but this of course slows things down a bit; then I thought of adding a yieldM eachM etc., defined like myenumfrom, but the whole device only gets rid of a range of cases. So in the end I decided that the most raw representation would make it easier to reason and anticipate them: something like as = S.each [0..] really is very close to as = [0..], just more corpulent. A device like delaying with a () argument works in some cases, not in others. Insertion of monadic layers is like this. Pipes is putting all but the first element of the stream behind successive () -> ... but still generates the pathology above. The conduit case is even more remarkable.

The slowdown (which is just a matter of piling on more constructors) was only visible in trivial benchmarky cases, so maybe it's worth it.

@nomeata
nomeata commented Jan 17, 2016

Is there a chance to get your dup function into ghc?

Unlikely; it was very fiddly and low-level, and probably is not reliable either.

Unfortunately, I don’t know of any better solution. Even tricks like an () argument can possibly fail to work as expected if GHC is too smart. And all the other tricks (monadic actions) just work because they obscure the code to the compiler. If the compiler gets smarter, it will fail again.

We might need pragmas like inline this or fuse this really hard to guide the compiler, but again, it is not clear how that would look like precisely.

@michaelt
Owner

Working through some of the recalcitrant cases, I did find that -fno-full-laziness seems to work wonders except where there is a clear binding to a 'pure' stream/pipe/conduit. For example, it much more reliably falls for the () argument trick, if I am understanding my evidence properly - but also if the sameness of stream is otherwise buried. So one prophylactic might be to recommend use of it when compiling main. Strangely, feuerbach's original program

import qualified Streaming as S
import qualified Streaming.Prelude as S

n = 5000000
main = a >> a
a = S.mapM_ print $ S.take n $ S.enumFrom (0::Int)

is not finding any unintended sharing. But I'm sure I observed it yesterday. This suggests to me that the compiler has a learning ability, which wouldn't too much surprise me.

@tomjaguarpaw

Even tricks like an () argument can possibly fail to work as expected if GHC is too smart.

I would be very interested in any cases you find where no-full-laziness plus () argument fails. Please forward any such cases you find to me!

@feuerbach

@tomjaguarpaw see my original snippet in the very first post

@tomjaguarpaw

@feuerbach The original snippet doesn't have () argument ... ?

@tomjaguarpaw

I think data OfNoSharing a b = OfNoSharing a (() -> b) could be useful for preventing sharing.

@feuerbach

@tomjaguarpaw it doesn't, but simply adding a () argument to a didn't help last I checked.

Your OfNoSharing could indeed work (I haven't checked), but it's rather different from the standard mechanical transformation of adding a () argument to the thing you don't want to share, which I thought you were referring to.

@tomjaguarpaw

simply adding a () argument to a didn't help last I checked

Right, which is why I am interested in

any cases you find where no-full-laziness plus () argument fails.

i.e. do you know of any space leaks where both

  1. no-full-laziness is enabled, and
  2. anything you don't want shared is hidden behind a lambda abstraction

If so, as I said, I would be very interested. It is my current hypothesis that if both conditions are satisfied then the thunk will not be shared.

@tomjaguarpaw

it's rather different from the standard mechanical transformation of adding a () argument to the thing you don't want to share, which I thought you were referring to.

They are indeed different, and I'm suggesting that each would work.

@feuerbach

Weird, I can't reproduce it anymore. I'm reasonably sure no-full-laziness and dummy arguments didn't help in my real problem at work. I'll try to make a minimal example when I have time.

@michaelt
Owner

@tomjaguarpaw that's an interesting discussion on haskell-cafe. I wondered about something like data OfNoSharing a b = OfNoSharing a (() -> b) That is effectively what Pipes is using when specialized to Producer. I labored quite a bit convincing myself that a left-strict pair was better than a lazy one, but never tested a more-than-just-right-lazy pair. I think the existing scheme explains the cases where this library is faster than pipes, which is a lot of the easily comparable cases, so the results might be unpleasant. Another possibility would be to change the (hidden) constructors to

  data Stream f m r = Return r | Effect (m (Stream f m r)) | Step (() -> f (Stream f m r)) 

That might be somewhat more practical. Then it could just be another hidden feature the user wouldn't bump into much, since one could still have essential functions like wrap :: f (Stream f m r) -> Stream f m r and so on. I made the existing Step (!(f (Stream f m r))) constructor strict, because it was seeming to win in lots of cases, but I think there was never any considerable difference.

The no-sharing pair would also be kind of a drag since, at least the way I was thinking, it is important for comprehension that the folds in e.g. Streaming.Prelude should be of the form

 sum :: Stream (Of Int) m r -> m (Of Int r) 

and the like, so that they can be used for the characteristic rank-2 maps, but they also need to be practical taken by themselves. Construction and deconstruction would be less pleasant. But I don't know. All of this presupposes that generally counseling fno-full-laziness makes sense, though I suppose it would eliminate some mishaps anyway.

@tomjaguarpaw

I'm on a bit of a crusade to eliminate full-laziness, or at least replace it with something more subtle: https://stackoverflow.com/questions/35115172/why-is-no-full-laziness-a-default-optimization/35115664#35115664

I would be interested to see how performance changes under OfNoSharing a (() -> b) if and when you get results.

I don't understand why the form of sum would change. Can you explain?

@michaelt
Owner

The type signature would be the same, it's just that that that's a place where the user might actually pattern match, say

loop str = do 
  (n:> rest) <- lift $ S.sum $ S.splitAt 3 str
  when (n < (10::Int)) $ do
    S.yield n
    loop rest

which then becomes

  ...
  loop (rest ())

which isn't bad, just more to keep track of.

@winterland1989

Will the CONLIKE pragma help here?

@michaelt
Owner
michaelt commented Jun 6, 2016 edited

You mean marking something like enumFrom with {-# INLINE CONLIKE enumFrom #-} or {-# INLINE CONLIKE[1] enumFrom #-} or whatever, in its source, I take it? It doesn't affect the test we were contemplating, though maybe I don't know enough about how to use it. I wonder how it would affect optimization in the intuitively correct case where I don't unreasonably bind a name to the result; optimization is the main thing I am worried about.

Here's the source I'm using - toggling between stream, list, etc. - minus the distraction that caused trouble above.

module Main (main) where

import qualified Streaming.Prelude as S
import Streaming (Of(..))
import Pipes
import qualified Pipes.Prelude as P
import qualified Conduit as C
import qualified Data.Conduit.List as C
import Control.Monad
import Control.Concurrent (threadDelay)

x = 50000000 :: Int

main = stream

rep :: Monad f => (Int -> f a) -> f ()
rep act = loop 1 where
  loop n = do
    when (n < x) $ do
      act n
      loop (n+1)

ns :: [Int]
ns = [1..x]

ss :: S.Stream (Of Int) ()
ss = rep S.yield


ps :: Monad m => Producer Int m ()
ps = rep yield
cs :: Monad m => C.ConduitM i Int m ()
cs = rep C.yield


list = do
  print $ sum ns
  threadDelay 1000000
  print $ product ns

conduit = do
  n <- cs C.$$ C.sumC
  print n 
  threadDelay 1000000
  n' <- cs C.$$ C.productC
  print n'

pipes = do
  n <- P.sum ps
  print n 
  threadDelay 1000000
  n' <- P.product ps 
  print n'    

stream = do
  n <- S.sum ss
  print n 
  threadDelay 1000000
  n' <- S.fold (+) 0 id ss
  print n'
@michaelt michaelt closed this Jun 8, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment