Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing the leak. #1

Open
wants to merge 4 commits into
base: master
from

Conversation

Projects
None yet
3 participants
@effectfully
Copy link

effectfully commented Nov 2, 2017

Hi. I'm from Dirt cheap Haskell consultancy.

My preferred way to profile Haskell programs looks something like

stack build --profile
stack exec -- json-unify-exe +RTS -sstderr -N -K1K -A4M -n1M -qb0 -xt -hy -p
hp2ps -e8in json-unify-exe.hp

so I'll be using this.

Running the original version of code results in

screenshot from 2017-11-02 14 00 21

ARR_WORDS is stuff related to ByteString. So a lot of ByteStrings are loaded into memory. The culprit is foldChunks (I cannot say right now what exactly the problem with foldChunks is, it's just my intuition said to me it looks suspicious; I'll think about it), so if we remove (note also that runResourceT is completely redundant here)

  allJson <- runResourceT $ runCConduit $ sourceTBMChan trees
    =$=& foldChunks 100
    =$=& foldChunks 10
    =$=& foldChunks 10
    =$=& foldChunks 10
    =$=& foldlC (\x y -> force $ mappend x y) mempty
  Prelude.print $ someStuff $ unfix $ force allJson

and write this instead:

  allJson <- runCConduit $ sourceTBMChan trees
    =$=& foldlC (\x y -> force $ mappend x y) mempty
  Prelude.print $ someStuff $ unfix $ force allJson

we'll get:

screenshot from 2017-11-02 14 07 48

Which is the actual leak. But it's a simple one: you just allocate a lot of [] things, i.e. lists. Namely, x and y in \x y -> force $ mappend x y are of the same type: Fix TestType. The monoid instance for this type is

instance Monoid (Fix TestType) where
  mappend a b = Fix (unfix a `mappend` unfix b)
  mempty = Fix mempty

which in the mappend case uses the monoid instance of TestType:

data TestType a = TestType
  { inner :: [a]
  , someStuff :: Int
  } | Done

instance Monoid (TestType a) where
  mappend (TestType i1 d1) (TestType i2 d2) = TestType (mappend i1 i2) (d1 + d2)
  mappend e Done = e
  mappend Done e = e
  mempty = TestType [] 0

which itself in the mappend case uses the monoid instance of []. I.e. valueToTypes generates some lists inside Fix TestType and then you append (and fully force each time!) these lists over and over again just to discard the resulting value via someStuff $ unfix $ ... later. The now obvious solution is to discard lists immediately while folding:

  stuff <- runCConduit $ sourceTBMChan trees
    =$=& foldlC (\x y -> x + someStuff (unfix y)) 0
  Prelude.print stuff

which results in

screenshot from 2017-11-02 14 28 11

By the way, I cannot review your conduit code as I'm not very familiar with conduit, but one thing that looks overcomplicated is

allInDir :: MonadResource m => Conduit FilePath m (Fix TestType)
allInDir = ...

readFile :: MonadResource m => FilePath -> m B.ByteString
readFile f = do
  (key, h) <- allocate (openBinaryFile f ReadMode) hClose
  res <- liftIO $ B.hGet h 20000000
  release key
  pure res

What readFile does is allocates a handle, reads the file and releases the handle. I.e. strictly reads a file. But everything related to ResourceT it does just inside this single function, so you could as well write

readFile :: FilePath -> IO B.ByteString
readFile f = runResourceT $ do
  (key, h) <- allocate (openBinaryFile f ReadMode) hClose
  res <- liftIO $ B.hGet h 20000000
  release key
  pure res

But B.ByteString is already strict, no reason to use ResourceT here. Hence

allInDir :: Conduit FilePath IO (Fix TestType)
allInDir = ...

readFile :: FilePath -> IO B.ByteString
readFile f = do
  h <- openBinaryFile f ReadMode
  res <- B.hGet h 20000000
  hClose h
  return res

is sufficient.

@neongreen

This comment has been minimized.

Copy link

neongreen commented Nov 5, 2017

@reactormonk ping :)

@reactormonk

This comment has been minimized.

Copy link
Owner

reactormonk commented Nov 6, 2017

Let me investigate the numbers, so far the write up is top notch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.