Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Space leak in Data.Map.partitionWithKey? #14

Closed
edsko opened this issue Aug 20, 2012 · 9 comments · Fixed by #15
Closed

Space leak in Data.Map.partitionWithKey? #14

edsko opened this issue Aug 20, 2012 · 9 comments · Fixed by #15

Comments

@edsko
Copy link

edsko commented Aug 20, 2012

I have tried, but I have been unable to condense this into a small example.. so please forgive me for describing what's going on in Cloud Haskell instead. I have the following function (https://github.com/haskell-distributed/distributed-process/blob/ceb7e5ae9f617da20f8e0920f0630a35849bf4fa/distributed-process/src/Control/Distributed/Process/Node.hs#L505):

ncEffectDied :: Identifier -> DiedReason -> NC ()
ncEffectDied ident reason = do
  (affectedLinks, unaffectedLinks) <- gets (splitNotif ident . (^. links))
  (affectedMons,  unaffectedMons)  <- gets (splitNotif ident . (^. monitors))

  -- Do stuff with affectedLinks, affectedMons

  modify' $ (links ^= unaffectedLinks) . (monitors ^= unaffectedMons)

NC is a state monad around IO. So we get the state from the state monad, extract two maps, partition those maps using splitNotif (see below), then do something with the first half and finally store the other half back in the state (links and monitors are Accessors). Some important remarks:

  1. The state is defined as

     data NCState = NCState 
      {
        _links    :: !(Map Identifier (Set ProcessId))
      , _monitors :: !(Map Identifier (Set (ProcessId, MonitorRef)))
      , ...
      }
    

    where, importantly, we have strict references to the maps from the state.

  2. The maps do not contain Maps themselves.

  3. We use modify' to update the state, which is a custom, strict, version of the standard modify:

    modify' :: MonadState s m => (s -> s) -> m ()
    modify' f = StateT.get >>= \s -> StateT.put $! f s
    

Now here's the rub. splitNotif is defined as

splitNotif ident = Map.partitionWithKey (\k !v -> impliesDeathOf ident k) 

With this definition, my code runs in constant space. If however I remove the bang on v so that the predicate is no longer strict in the map values, I end up with a Map-typed space leak. I don't understand why this happens, nor have I been able to reproduce this in smaller test cases (though Ian speculated that that might be because the GC could optimize the smaller test cases but not the bigger one).

I don't know if this is a bug in partitionWithKey or not -- either way, if you understand why this happens, I'd really like to know.

@tibbe
Copy link
Member

tibbe commented Aug 20, 2012

Which version of containers are you using? If it's 0.5, are you using the Strict or the Lazy API?

@edsko
Copy link
Author

edsko commented Aug 20, 2012

No, I'm using 0.4.2.1 -- I couldn't use 0.5 (dependency problems -- upper limits, heh :). But I did check, and I think I concluded that the Strict and Lazy API in 0.5 use the same implementation of partitionWithKey?

@tibbe
Copy link
Member

tibbe commented Aug 20, 2012

You're right, they're using the same implementation. Would it be possible for you to move to 0.5 or should I try to fix both the 0.4 and 0.5 releases?

@edsko
Copy link
Author

edsko commented Aug 20, 2012

I use data-accessor a lot, which needs a new release to allow for containers-0.5. But that's minor; the main problem is that template-haskell is compiled against 0.4.2.1, and cabal tells me that rebuilding template-haskell can lead to all kinds of problems, so I haven't been brave enough to try it.

@tibbe
Copy link
Member

tibbe commented Aug 20, 2012

In that case I think you're out of luck. Even if containers is to blame here and I make a new release in the 0.4.2 series, you'd still have to recompile template-haskell. That being said, I'm still interested in fixing this issue.

@edsko
Copy link
Author

edsko commented Aug 20, 2012

Fair point. Mind you, mostly I'd just like to understand what's going on -- I don't like "fixing" problems without understanding the fix.

@tibbe
Copy link
Member

tibbe commented Aug 25, 2012

GitHub mistakenly closed this when I made a yet-to-be merged commit in another branch. Reopening until the patch is merged into the master branch.

@edsko
Copy link
Author

edsko commented Aug 25, 2012

Aaah. The fix is obvious in hindsight, I should have noticed that. Thanks for looking into this!

@tibbe tibbe closed this as completed in cdc0d56 Aug 25, 2012
@tibbe
Copy link
Member

tibbe commented Aug 25, 2012

So GitHub will continue to close this issue as I update my pull request. I will leave it closed. You'll have to trust that I will eventually merge the patch into master.

tibbe added a commit that referenced this issue Aug 25, 2012
Some functions, like partition, return a pair of values. Before this
change these functions would do almost no work and return immediately,
due to suspending most of the work in closures. This could cause space
leaks.

Closes #14.
tibbe added a commit that referenced this issue Aug 26, 2012
Some functions, like partition, return a pair of values. Before this
change these functions would do almost no work and return immediately,
due to suspending most of the work in closures. This could cause space
leaks.

Closes #14.
tibbe added a commit that referenced this issue Aug 26, 2012
Some functions, like partition, return a pair of values. Before this
change these functions would do almost no work and return immediately,
due to suspending most of the work in closures. This could cause space
leaks.

Closes #14.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants