strict containers #22
Comments
Filed haskell/containers#752 for |
For
cc @edsko, you might have thoughts about this topic. |
Keeping all the elements in a finger tree in WHNF is harder than it sounds. Like @phadej points out, the finger tree depends essentially on laziness to establish its asymptotic bounds. Inserting new elements into the finger tree is easy enough, but operations such as |
Thanks for the link & explanations. Then is the general position that the thunks inside As for |
Re I think there are enough anecdotes that we should try to make I don't think a |
Having a separate |
I have a strong opinion nowadays that achieving a local optimum (e.g. by making That said. If EDIT: I'm not looking for easy options, but for the right ones. |
Making everything strict would degrade the insertion bound from logarithmic in the distance to the nearest end to logarithmic in the total sequence size. So |
Is it not feasible to force the values being inserted, but leave |
Forcing values being inserted is fine. But the internal structure needs to stay lazy. |
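A minimal sketch of what "force the inserted value, keep the structure lazy" could look like for `Data.Sequence`. The helper name `snocStrict` is hypothetical and not part of containers; it only evaluates the element to WHNF and leaves the finger tree's internal laziness untouched.

```haskell
import Data.Sequence (Seq, (|>))
import qualified Data.Sequence as Seq

-- Hypothetical helper (not a containers API): evaluate the element to
-- WHNF before appending it. The sequence itself is built by the ordinary
-- (|>), so its internal structure stays as lazy as Data.Sequence makes it.
snocStrict :: Seq a -> a -> Seq a
snocStrict s x = x `seq` (s |> x)

main :: IO ()
main = do
  let s = foldl snocStrict Seq.empty [1 .. 5 :: Int]
  print (Seq.length s)
```

Note that this only covers insertion. Operations that transform existing elements would each need the same treatment, which is the harder part mentioned above.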
I noticed yet another problem with having the same type for Strict/Lazy variants: serialisation instances default to the lazy version - this makes using |
I don't understand. Deserialization should normally be a very eager process. Could you point out where that gives you a lot of harmful thunks? |
@treeowl Deserialising a map gives you a map with thunk values, taking up more memory than you wanted. |
@infinity0, I'm sorry, but I'm missing the context. Deserializing it how? |
I assume that in EDIT: I don't think that this is bad. In that case there is no leak, just some work deferred. Some people might care though. |
@phadej, yeah, that's valid. |
Typically you deserialise just once, so the leak does not expand over time, but it is a leak in the same sense as other space leaks - the work is deferred but the caller prefers to keep the state for a long time in the smaller representation. It's also confusing when you look at a heap-view dump and see a lot of thunks, forget about deserialisation and misattribute them to some other part of the code where there actually is no leak. Or if there actually is a 2nd leak in the other part of the code, it's hard to investigate it with nothunks, because the thunks due to deserialisation will cause nothunks to always report "has thunks" even after you fix the 2nd leak, if you had not realised that deserialisation was also a problem. All in all, I think this adds to a strong case for having separately-typed strict vs lazy variants of all containers. |
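The effect can be observed without any serialisation library. The sketch below uses `decodeLazy` and `decodeStrict` as hypothetical stand-ins for a decoder that builds its result with the lazy or the strict `fromList`; a bottom value is used only as a probe to show whether the value was evaluated.

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import qualified Data.Map.Lazy as ML
import qualified Data.Map.Strict as MS

-- Hypothetical stand-ins for a decoder: the association list plays the
-- role of the already-parsed input. Both functions return the same type.
decodeLazy, decodeStrict :: [(Int, Int)] -> ML.Map Int Int
decodeLazy = ML.fromList
decodeStrict = MS.fromList

-- Evaluate the map to WHNF and report whether that evaluated the value.
probe :: ML.Map Int Int -> IO String
probe m = do
  r <- try (evaluate m) :: IO (Either ErrorCall (ML.Map Int Int))
  pure (either (const "value was forced") (const "value left as a thunk") r)

main :: IO ()
main = do
  let input = [(1, error "boom")] :: [(Int, Int)]
  probe (decodeLazy input) >>= putStrLn
  probe (decodeStrict input) >>= putStrLn
```

Because both decoders return the same `Map` type, nothing in the type tells the caller which of the two behaviours they got.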
Why not simply change the serialization libraries to use the strict |
@sjakobi The decision on whether a data structure should be strict or lazy is something best decided by the caller (the programmer using the data type). If the strict/lazy types are the same type, the serialisation library author does not have the necessary context in order to correctly decide whether the lazy or strict behaviour is best, for an instance definition. |
The serialization libraries could also offer functions for deserializing both the strict and lazy variants. They don't have to restrain themselves to offering instances. |
That just sounds like duplicating functionality unnecessarily - for a given usage of a variable, nobody uses it in a mixed strict-or-lazy way; it is fixed for a given usage. So choosing between strict vs lazy makes sense at the point of definition, with strict field annotations, then operations on that definition automatically are strict or lazy based on that annotation. Forcing everyone to duplicate their operations to have strict or lazy versions, in a very boilerplate-like way, and forcing callers to correctly choose between them, doesn't seem like a good use of anyone's time given the possibility of automating away this tedious task with strict field annotations. |
One big advantage of having separate data structures e.g. for lazy and strict Currently, when you use the strict |
Note: strict containers won't be fully lawful |
Oops! How would a strict |
An example using strict pair:
*Data.Strict> fmap (const True . undefined) x
True :!: True
*Data.Strict> fmap (const True) (fmap undefined x)
*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:80:14 in base:GHC.Err
undefined, called at <interactive>:4:25 in interactive:Ghci4 |
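For readers who want to reproduce this outside GHCi, here is a self-contained sketch. It re-declares a strict pair with the same shape as `:!:` instead of importing it, and the value `x` is an assumption, since its definition is not shown in the session above.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- A strict pair declared locally so the example needs only base.
data Pair a b = !a :!: !b
infix 2 :!:

instance Functor (Pair a) where
  fmap f (a :!: b) = a :!: f b

-- Assumed value; the original session does not show how x was defined.
x :: Pair Bool Bool
x = True :!: False

-- Evaluate a pair to WHNF and report whether that raised an exception.
outcome :: Pair Bool Bool -> IO String
outcome p = do
  r <- try (evaluate p) :: IO (Either ErrorCall (Pair Bool Bool))
  pure (either (const "exception") (const "ok") r)

main :: IO ()
main = do
  -- fmap (g . f): the intermediate result of f is never forced.
  outcome (fmap (const True . undefined) x) >>= putStrLn
  -- fmap g . fmap f: the strict field forces the intermediate result.
  outcome (fmap (const True) (fmap undefined x)) >>= putStrLn
```

The two expressions differ only in where the composition happens, which is exactly the `fmap (g . f) = fmap g . fmap f` law.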
Thanks for the demonstration @phadej! It would be good to document this law-breaking on the relevant instances IMHO. |
The laws hold only except if |
or |
I've had a longer think about this, and I think the |
On second thoughts, only |
However, an initial version based on |
Closing, an initial implementation is on its way in https://github.com/haskellari/strict-containers |
@sjakobi If you could take a look at that, any comments or feedback would be appreciated - and I'm happy to have a co-maintainer too |
@infinity0 Thanks for implementing this! I'm sure that a lot of people will want to use the new types for their strict instances. So I'd suggest not waiting too long with a release. Once we have some feedback, I think we can start to discuss moving these types into the original I expect my bandwidth to be fairly limited, but I can offer to serve as a backup maintainer if you'd be fine with this. |
@sjakobi I've uploaded the package - https://hackage.haskell.org/package/strict-containers-0.1 and there are also some basic strictness tests in the repo. Feel free to tag me in any discussions regarding merging it back into the upstream packages! |
The "strict containers" functor law is |
My program still has space leaks - using `ghc-heap-view` it is clear that the leak is coming from `Data.Map` and `Data.Sequence` which contain lots of thunks, and the leak goes away when I print them.

Even though I am using `Data.Map.Strict` in my program, the instances of `Map` (e.g. `fmap` and `at` from lens) all use lazy operations. Also `Data.Sequence` does not provide a version that is strict in its values, only its length - analogous to how `Data.Map.Lazy` is already strict in its keys.

One idea is to provide a `newtype` wrapper around these containers and define instances that only use strict operations. For `Data.Sequence` it is cleanest probably to first implement and upstream `Data.Sequence.Strict`.
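A rough sketch of the newtype idea for `Map`, covering only `Functor`. The names `StrictMap` and `getStrictMap` are hypothetical and chosen for illustration; a real package would need the remaining instances and operations as well.

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import qualified Data.Map.Strict as MS

-- Hypothetical wrapper: same representation as Map, different instances.
newtype StrictMap k v = StrictMap { getStrictMap :: MS.Map k v }

-- fmap goes through the value-strict map from Data.Map.Strict.
instance Functor (StrictMap k) where
  fmap f (StrictMap m) = StrictMap (MS.map f m)

-- Evaluate a map to WHNF and report whether its values were evaluated,
-- using a bottom value as the probe.
probe :: MS.Map Int Int -> IO String
probe m = do
  r <- try (evaluate m) :: IO (Either ErrorCall (MS.Map Int Int))
  pure (either (const "forced") (const "thunk") r)

main :: IO ()
main = do
  let m = MS.fromList [(1, 10), (2, 20)] :: MS.Map Int Int
      bad = const (error "boom") :: Int -> Int
  -- The Functor instance of Map itself is lazy in the values.
  probe (fmap bad m) >>= putStrLn
  -- The wrapper's instance evaluates each new value.
  probe (getStrictMap (fmap bad (StrictMap m))) >>= putStrLn
```

With the plain `Map` instance the mapped values stay unevaluated even though the map was built with `Data.Map.Strict`, while the wrapper evaluates them as the result is forced.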