Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to fight undesirable sharing: redux #110

Open
tomjaguarpaw opened this issue Oct 25, 2021 · 2 comments
Open

How to fight undesirable sharing: redux #110

tomjaguarpaw opened this issue Oct 25, 2021 · 2 comments

Comments

@tomjaguarpaw
Copy link
Contributor

This is a follow up from michaelt/streaming#6, wherein it was observed that it is hard to fight undesirable sharing without turning off full-laziness. Well, I think I've found a way!

It seems like it's possible to avoid this problem without invoking no-full-laziness (which is a very blunt instrument) by taking advantage of oneShot. Futhermore, (# #) -> is somewhat faster than () ->. The former has about an 8% slowdown compared to current Of (i.e. Of with no adjustment to second component). The latter has about an 11% slowdown.

  1. I think the slowdown is tolerable to eliminate the risk of catastrophic memory usage, especially since non-stream-bound computations (i.e. those that do relatively less computation to get the next element) will suffer much less
  2. Can anyone come up with an example that demonstrates catastrophic sharing even when using NoMemo?

I still believe it's better to make the full-laziness transformation faster, but in the absence of that, the approach elaborated here might work. I welcome others' thoughts.

{-# LANGUAGE ViewPatterns #-}
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE UnboxedTuples #-}

module Main where

import Prelude hiding (sum, drop, enumFrom, take)
import GHC.Exts
import Streaming
import Streaming.Internal
import Streaming.Prelude hiding (print)
import Data.Functor.Identity
import Data.Functor.Compose

newtype NoMemo a = NoMemo ((# #) -> a)

noMemo :: a -> NoMemo a
noMemo x = NoMemo (oneShot (\(# #) -> x))

unNoMemo :: NoMemo a -> a
unNoMemo (NoMemo f) = f (# #)

instance Functor NoMemo where
  fmap f = noMemo . f . unNoMemo

instance Applicative NoMemo where
  pure = noMemo
  f <*> x = noMemo (unNoMemo f (unNoMemo x))


newtype Unit a = Unit (() -> a)

unit :: a -> Unit a
unit a = Unit (oneShot (\() -> a))

unUnit :: Unit a -> a
unUnit (Unit a) = a ()

instance Functor Unit where
  fmap f = unit . f . unUnit

instance Applicative Unit where
  pure = unit
  f <*> x = unit (unUnit f (unUnit x))


big :: Int
big = 4 * 1000 * 1000 * 1000

enumFrom2 :: (Monad m, Enum n)
          => n -> Stream (Compose (Of n) NoMemo) m r
enumFrom2 = loop where
  loop !n = Effect (return (Step (Compose (n :> pure (loop (succ n))))))

enumFrom3 :: (Monad m, Enum n)
          => n -> Stream (Compose (Of n) Unit) m r
enumFrom3 = loop where
  loop !n = Effect (return (Step (Compose (n :> pure (loop (succ n))))))

stream1 :: Stream (Of Integer) Identity ()
stream1 = take big (enumFrom 0)

stream2 :: Stream (Compose (Of Integer) NoMemo) Identity ()
stream2 = take big (enumFrom2 0)

stream3 :: Stream (Compose (Of Integer) Unit) Identity ()
stream3 = take big (enumFrom3 0)

fold2 :: Monad m => (x -> a -> x) -> x -> (x -> b)
      -> Stream (Compose (Of a) NoMemo) m r
      -> m (Of b r)
fold2 step begin done str = fold_loop str begin
  where
    fold_loop stream !x = case stream of
      Return r         -> return (done x :> r)
      Effect m         -> m >>= \str' -> fold_loop str' x
      Step (Compose (a :> (unNoMemo -> rest))) -> fold_loop rest $! step x a

fold3 :: Monad m => (x -> a -> x) -> x -> (x -> b)
      -> Stream (Compose (Of a) Unit) m r
      -> m (Of b r)
fold3 step begin done str = fold_loop str begin
  where
    fold_loop stream !x = case stream of
      Return r         -> return (done x :> r)
      Effect m         -> m >>= \str' -> fold_loop str' x
      Step (Compose (a :> (unUnit -> rest))) -> fold_loop rest $! step x a

sum1 :: Monad m
     => Stream (Of Integer) m r
     -> m (Of Integer r)
sum1 = fold (+) 0 id

sum2 :: Monad m
     => Stream (Compose (Of Integer) NoMemo) m r
     -> m (Of Integer r)
sum2 = fold2 (+) 0 id

sum3 :: Monad m
     => Stream (Compose (Of Integer) Unit) m r
     -> m (Of Integer r)
sum3 = fold3 (+) 0 id

drop2 :: (Monad m) => Int
      -> Stream (Compose (Of a) NoMemo) m r
      -> Stream (Compose (Of a) NoMemo) m r
drop2 n str | n <= 0 = str
drop2 n str = loop n str where
  loop 0 stream = stream
  loop m stream = case stream of
      Return r       -> Return r
      Effect ma      -> Effect (fmap (loop m) ma)
      Step (Compose (_ :> (unNoMemo -> as))) -> loop (m-1) as

drop3 :: (Monad m) => Int
      -> Stream (Compose (Of a) Unit) m r
      -> Stream (Compose (Of a) Unit) m r
drop3 n str | n <= 0 = str
drop3 n str = loop n str where
  loop 0 stream = stream
  loop m stream = case stream of
      Return r       -> Return r
      Effect ma      -> Effect (fmap (loop m) ma)
      Step (Compose (_ :> (unUnit -> as))) -> loop (m-1) as

main :: IO ()
main = do
-- Time to sum 4bn elements

-- Current version of Of is the fastest.
-- NoMemo, i.e. `(# #) ->`, is about 8% slower
-- Unit, i.e. `() ->`, is about 11% slower.

--  print $ sum stream1 -- 2:00.3
--  print $ sum (drop 1 stream1)
--  print $ sum2 stream2 -- 2:14.5
--  print $ sum2 (drop2 1 stream2)
  print $ sum3 stream3 -- 2:10.5
--  print $ sum3 (drop3 1 stream3)
@chessai
Copy link
Member

chessai commented Apr 23, 2023

Cleaned up the code a little and turned it into a benchmark using tasty-bench
Note that I changed big from 4billion to 4million

 cat AvoidSharing.hs
{-# language
    ImportQualifiedPost
  , UnboxedTuples
#-}

{-# options_ghc -fno-warn-orphans #-}

module Main where

import Data.Functor.Compose (Compose(..))
import Data.Functor.Identity (Identity(..))
import GHC.Exts (oneShot)
import Streaming.Internal (Stream(..))
import Streaming.Prelude (Of(..))
import Streaming.Prelude qualified as S
import Control.Exception (evaluate)
import Test.Tasty.Bench (defaultMain, env, bgroup, bench, nf)
import Control.DeepSeq (NFData(..))

main :: IO ()
main = do
  defaultMain
    [ bgroup "Of"
        [ bench "sum stream" $ nf (S.sum . streamOf) big
        , bench "sum (drop 1 stream)" $ nf (S.sum . S.drop 1 . streamOf) big
        ]
    , bgroup "Compose (Of Integer) Unit"
        [ bench "sum stream" $ nf (sumUnit . streamUnit) big
        , bench "sum (drop 1 stream)" $ nf (sumUnit . dropUnit 1 . streamUnit) big
        ]
    , bgroup "Compose (Of Integer) NoMemo"
        [ bench "sum stream" $ nf (sumNoMemo . streamNoMemo) big
        , bench "sum (drop 1 stream)" $ nf (sumNoMemo . dropNoMemo 1 . streamNoMemo) big
        ]
    ]

newtype NoMemo a = NoMemo ((# #) -> a)

noMemo :: a -> NoMemo a
noMemo x = NoMemo (oneShot (\(# #) -> x))

unNoMemo :: NoMemo a -> a
unNoMemo (NoMemo f) = f (# #)

instance Functor NoMemo where
  fmap f = noMemo . f . unNoMemo

instance Applicative NoMemo where
  pure = noMemo
  f <*> x = noMemo (unNoMemo f (unNoMemo x))

newtype Unit a = Unit (() -> a)

unit :: a -> Unit a
unit a = Unit (oneShot (\() -> a))

unUnit :: Unit a -> a
unUnit (Unit a) = a ()

instance Functor Unit where
  fmap f = unit . f . unUnit

instance Applicative Unit where
  pure = unit
  f <*> x = unit (unUnit f (unUnit x))

big :: Int
big = 4_000_000

enumFromNoMemo :: (Monad m, Enum n)
          => n -> Stream (Compose (Of n) NoMemo) m r
enumFromNoMemo = loop where
  loop !n = Effect (return (Step (Compose (n :> pure (loop (succ n))))))

enumFromUnit :: (Monad m, Enum n)
          => n -> Stream (Compose (Of n) Unit) m r
enumFromUnit = loop where
  loop !n = Effect (return (Step (Compose (n :> pure (loop (succ n))))))

streamOf :: Int -> Stream (Of Integer) Identity ()
streamOf n = S.take n (S.enumFrom 0)

streamNoMemo :: Int -> Stream (Compose (Of Integer) NoMemo) Identity ()
streamNoMemo n = S.take n (enumFromNoMemo 0)

streamUnit :: Int -> Stream (Compose (Of Integer) Unit) Identity ()
streamUnit n = S.take n (enumFromUnit 0)

foldNoMemo :: Monad m => (x -> a -> x) -> x -> (x -> b)
      -> Stream (Compose (Of a) NoMemo) m r
      -> m (Of b r)
foldNoMemo step begin done str = fold_loop str begin
  where
    fold_loop stream !x = case stream of
      Return r         -> return (done x :> r)
      Effect m         -> m >>= \str' -> fold_loop str' x
      Step (Compose (a :> rest)) -> fold_loop (unNoMemo rest) $! step x a

foldUnit :: Monad m => (x -> a -> x) -> x -> (x -> b)
      -> Stream (Compose (Of a) Unit) m r
      -> m (Of b r)
foldUnit step begin done str = fold_loop str begin
  where
    fold_loop stream !x = case stream of
      Return r         -> return (done x :> r)
      Effect m         -> m >>= \str' -> fold_loop str' x
      Step (Compose (a :> rest)) -> fold_loop (unUnit rest) $! step x a

sumOf :: Monad m
     => Stream (Of Integer) m r
     -> m (Of Integer r)
sumOf = S.fold (+) 0 id

sumNoMemo :: Monad m
     => Stream (Compose (Of Integer) NoMemo) m r
     -> m (Of Integer r)
sumNoMemo = foldNoMemo (+) 0 id

sumUnit :: Monad m
     => Stream (Compose (Of Integer) Unit) m r
     -> m (Of Integer r)
sumUnit = foldUnit (+) 0 id

dropNoMemo :: (Monad m) => Int
      -> Stream (Compose (Of a) NoMemo) m r
      -> Stream (Compose (Of a) NoMemo) m r
dropNoMemo n str | n <= 0 = str
dropNoMemo n str = loop n str where
  loop 0 stream = stream
  loop m stream = case stream of
      Return r       -> Return r
      Effect ma      -> Effect (fmap (loop m) ma)
      Step (Compose (_ :> as)) -> loop (m-1) (unNoMemo as)

dropUnit :: (Monad m) => Int
      -> Stream (Compose (Of a) Unit) m r
      -> Stream (Compose (Of a) Unit) m r
dropUnit n str | n <= 0 = str
dropUnit n str = loop n str where
  loop 0 stream = stream
  loop m stream = case stream of
      Return r       -> Return r
      Effect ma      -> Effect (fmap (loop m) ma)
      Step (Compose (_ :> as)) -> loop (m-1) (unUnit as)

instance (NFData a, NFData b) => NFData (Of a b) where
  rnf (a :> b) = rnf a `seq` rnf b `seq` ()

Results:

❯ ghc AvoidSharing.hs -o main -O2 -fproc-alignment=64 -rtsopts -fforce-recomp && ./main +RTS -T -A32m -s -RTS
[1 of 2] Compiling Main             ( AvoidSharing.hs, AvoidSharing.o )
[2 of 2] Linking main [Objects changed]
/nix/store/8qm6sjqa09a03glzmafprpp69k74l4lm-binutils-2.40/bin/ld.gold: warning: AvoidSharing.o: section .rodata.str contains incorrectly aligned strings; the alignment of those strings won't be preserved
All
  Of
    sum stream:          OK (1.37s)
      72.5 ms ± 2.8 ms, 516 MB allocated, 4.1 KB copied, 602 MB peak memory
    sum (drop 1 stream): OK (0.77s)
      69.4 ms ± 1.7 ms, 516 MB allocated, 4.0 KB copied, 602 MB peak memory
  Compose (Of Integer) Unit
    sum stream:          OK (0.35s)
      107  ms ± 8.4 ms, 904 MB allocated,  11 KB copied, 602 MB peak memory
    sum (drop 1 stream): OK (0.83s)
      120  ms ± 2.9 ms, 908 MB allocated, 6.4 KB copied, 602 MB peak memory
  Compose (Of Integer) NoMemo
    sum stream:          OK (0.74s)
      105  ms ± 6.3 ms, 908 MB allocated, 7.3 KB copied, 602 MB peak memory
    sum (drop 1 stream): OK (0.82s)
      118  ms ± 3.5 ms, 908 MB allocated, 6.0 KB copied, 602 MB peak memory

All 6 tests passed (5.03s)
  31,017,450,904 bytes allocated in the heap
   2,104,295,992 bytes copied during GC
     288,641,544 bytes maximum residency (22 sample(s))
       4,596,760 bytes maximum slop
             602 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       915 colls,     0 par    0.143s   0.143s     0.0002s    0.0101s
  Gen  1        22 colls,     0 par    1.130s   1.130s     0.0514s    0.1898s

  INIT    time    0.001s  (  0.000s elapsed)
  MUT     time    3.762s  (  3.756s elapsed)
  GC      time    1.273s  (  1.273s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    5.036s  (  5.030s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    8,245,191,181 bytes per MUT second

  Productivity  74.7% of total user, 74.7% of total elapsed

@tomjaguarpaw
Copy link
Contributor Author

Thanks! So this is showing more like a 70% slowdown? That's unfortunate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants