Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

strict containers #22

Closed
infinity0 opened this issue Oct 21, 2020 · 36 comments
Closed

strict containers #22

infinity0 opened this issue Oct 21, 2020 · 36 comments

Comments

@infinity0
Copy link
Member

My program still has space leaks - using ghc-heap-view it is clear that the leak is coming from Data.Map and Data.Sequence which contain lots of thunks, and the leak goes away when I print them.

Even though I am using Data.Map.Strict in my program, the instances of Map (e.g. fmap and at from lens) all use lazy operations. Also Data.Sequence does not provide a version that is strict in its values, only its length - analogous to how Data.Map.Lazy is already strict in its keys.

One idea is to provide a newtype wrapper around these containers and define instances that only use strict operations. For Data.Sequence it is cleanest probably to first implement and upstream Data.Sequence.Strict.

@infinity0
Copy link
Member Author

Filed haskell/containers#752 for Data.Sequence.Strict.

@phadej
Copy link
Contributor

phadej commented Oct 21, 2020

For at, one should advocate for adding at' to lens. That's reasonable request.

Data.Sequence has thunks inside it, it's by design (the amortized costs of operations depend on thunks not being forced too early), see permissible thunks section in https://www.well-typed.com/blog/2020/09/nothunks/ (and the whole post might be of interest to you).

cc @edsko, you might have thoughts about this topic.

@edsko
Copy link

edsko commented Oct 21, 2020

Keeping all the elements in a finger tree in WHNF is harder than it sounds. Like @phadej points out, the finger tree depends essentially on laziness to establish its asymptotic bounds. Inserting new elements into the finger tree is easy enough, but operations such as map are hard: if we need to guarantee that all elements are in WHNF after the map, then we have no choice but to traverse the entire finger tree, forcing any thunks it might have inside. Semantically this is fine, of course, but I think the effects this has on the overall asymptotic complexity of the finger tree need careful consideration.

@infinity0
Copy link
Member Author

Thanks for the link & explanations. Then is the general position that the thunks inside Data.Sequence should not be contributing to any space leak? I didn't yet check this empirically but am prepared to believe it, in which case we can forget my Data.Sequence.Strict suggestion - although the documentation perhaps could be amended to explicitly mention that values are indeed forced.

As for lens, there was already a closed PR (ekmett/lens#492) where the maintainer pointed out that the issue also affects fmap, which is why I suggested a newtype here. A strict at would only solve the problem for the instance of the At typeclass, but it exists for all typeclasses.

@phadej
Copy link
Contributor

phadej commented Oct 21, 2020

Re Map. See ekmett/lens#944

I think there is enough anecdotes that we should try to make Data.Map.Strict.Map and Data.Map.Lazy.Map be separate types, where Strict.Map has ! in the leaf, and not try to pretend that we can have element-strict data structure by having only separate operations.

I don't think a newtype is sustainable option. You could still accidentally insert a thunk there.

@infinity0
Copy link
Member Author

Having a separate Map type with strict fields does sound much more sustainable and is ofc the approach these packages take, but I was not feeling confident it would get done any time soon, especially because of the backwards-compat issues.

@phadej
Copy link
Contributor

phadej commented Oct 21, 2020

I have a strong opinion nowadays that achieving a local optimum (e.g. by making newtype) would prevent going for global one at all.

That said. If strict-containers might be a good start by actually copying the Map implementation, and adding the !. It would be a concrete thing containers maintainers could adopt. And it's not newtype hack you may accidentally still break.

EDIT: I'm not looking for easy options, but for the right ones.

@treeowl
Copy link

treeowl commented Oct 21, 2020

Inserting new elements into the finger tree is easy enough

Making everything strict would degrade the insertion bound from logarithmic in the distance to the nearest end to logarithmic in the total sequence size. So <| and |> would no longer be O(1). There are real-time data structures that do things like Data.Sequence, but they're much more complicated and probably slower in practice.

@infinity0
Copy link
Member Author

Is it not feasible to force the values being inserted, but leave <| and |> (or some equivalent wrapper) as being lazy?

@treeowl
Copy link

treeowl commented Oct 21, 2020

Forcing values being inserted is fine. But the internal structure needs to stay lazy.

@infinity0
Copy link
Member Author

I noticed that yet another problem with having the same type for Strict/Lazy variants, is that serialisation instances default to the lazy version - this makes using nothunks awkward when using data deserialised from a string, because that will contain thunks from the start, polluting your attempt at using nothunks.

@treeowl
Copy link

treeowl commented Oct 22, 2020

I noticed that yet another problem with having the same type for Strict/Lazy variants, is that serialisation instances default to the lazy version - this makes using nothunks awkward when using data deserialised from a string, because that will contain thunks from the start, polluting your attempt at using nothunks.

I don't understand. Deserialization should normally be a very eager process. Could you point out where that gives you a lot of harmful thunks?

@infinity0
Copy link
Member Author

@treeowl Deserialising a map gives you a map with thunk values, taking up more memory than you wanted.

@treeowl
Copy link

treeowl commented Oct 22, 2020

@infinity0, I'm sorry, but I'm missing the context. Deserializing it how?

@phadej
Copy link
Contributor

phadej commented Oct 22, 2020

I assume that in Map Int (Seq Int) case deserialised Map will contain Seq.fromList somelist looking thunks, if/when deserialisation is type-class driven, like in aeson, binary, serialise etc... because instances often look like deserialize = Seq.fromList <$> deserialize

EDIT: I don't think that this is bad. In that case there is no leak, just some work deferred. Some people might care though.

@treeowl
Copy link

treeowl commented Oct 22, 2020

@phadej, yeah, that's valid.

@infinity0
Copy link
Member Author

Typically you deserialise just once, so the leak does not expand over time, but it is a leak in the same sense as other space leaks - the work is deferred but the caller prefers to keep the state for a long time in the smaller representation.

It's also confusing when you look at a heap-view dump and see a lot of thunks, forget about deserialisation and misattribute it to some other part of the code where there actually is no leak. Or if there actually is a 2nd leak in the other part of the code, it's hard to investigate it with nothunks because the thunks-due-to-deserialisation will cause nothunks to always report "has thunks" even when you fix the 2nd leak, and didn't realise deserialisation was also a problem.

All-in-all I think in general this adds to a strong case for having separately-typed strict vs lazy variants of all containers.

@sjakobi
Copy link

sjakobi commented Oct 22, 2020

Why not simply change the serialization libraries to use the strict containers APIs?

@infinity0
Copy link
Member Author

@sjakobi The decision on whether a data structure should be strict or lazy is something best decided by the caller (the programmer using the data type). If the strict/lazy types are the same type, the serialisation library author does not have the necessary context in order to correctly decide whether the lazy or strict behaviour is best, for an instance definition.

@sjakobi
Copy link

sjakobi commented Oct 22, 2020

The serialization libraries could also offer functions for deserializing both the strict and lazy variants. They don't have to restrain themselves to offering instances.

@infinity0
Copy link
Member Author

That just sounds like duplicating functionality unnecessarily - for a given usage of a variable, nobody uses it in a mixed strict-or-lazy way; it is fixed for a given usage. So choosing between strict vs lazy makes sense at the point of definition, with strict field annotations, then operations on that definition automatically are strict or lazy based on that annotation. Forcing everyone to duplicate their operations to have strict or lazy versions, in a very boilerplate-like way, and forcing callers to correctly choose between them, doesn't seem like a good use of anyone's time given the possibility of automating away this tedious task with strict field annotations.

@sjakobi
Copy link

sjakobi commented Nov 1, 2020

One big advantage of having separate data structures e.g. for lazy and strict Maps would be that we could have separate instances too.

Currently, when you use the strict Map API, and fmap over it, you're implicitly using the lazy map function, which may break your assumption that the Map contains no thunks.

@phadej
Copy link
Contributor

phadej commented Nov 1, 2020

Note: strict containers won't be fully lawful Functors. You won't be able to use them, fmap degenerates to ad-hoc name-overload.

@sjakobi
Copy link

sjakobi commented Nov 1, 2020

Note: strict containers won't be fully lawful Functors. You won't be able to use them, fmap degenerates to ad-hoc name-overload.

Oops! How would a strict Map break the laws though? And should a strict Map then maybe not have a Functor instance at all then?

@phadej
Copy link
Contributor

phadej commented Nov 1, 2020

An example using strict pair:

*Data.Strict> fmap (const True . undefined) x
True :!: True

*Data.Strict> fmap (const True) (fmap undefined x)
*** Exception: Prelude.undefined
CallStack (from HasCallStack):
  error, called at libraries/base/GHC/Err.hs:80:14 in base:GHC.Err
  undefined, called at <interactive>:4:25 in interactive:Ghci4

@sjakobi
Copy link

sjakobi commented Nov 1, 2020

Thanks for the demonstration @phadej!

It would be good to document this law-breaking on the relevant instances IMHO.

@infinity0
Copy link
Member Author

The laws hold only except if undefined is used at any point, right?

@phadej
Copy link
Contributor

phadej commented Nov 3, 2020

or let x = x in x or ...

@infinity0
Copy link
Member Author

I've had a longer think about this, and I think the newtype approach is actually better - then the toStrict and toLazy operations become cheap, rather than having to duplicate the whole data structure. Sometimes even with a strict structure, you might want to convert it cheaply to a lazy form in order to perform certian lazy operations on it like map + filter.

@infinity0
Copy link
Member Author

On second thoughts, only toLazy would be significantly cheaper - toStrict would have to fold over the whole data structure anyways, forcing all of the values "just to be sure".

@infinity0
Copy link
Member Author

However, an initial version based on newtype would be easier to create and get off the ground - and it could be converted to the more full approach later (without breaking user code) if it does turn out that it's too brittle.

@infinity0
Copy link
Member Author

Closing, an initial implementation is on its way in https://github.com/haskellari/strict-containers

@infinity0
Copy link
Member Author

@sjakobi If you could take a look at that, any comments or feedback would be appreciated - and I'm happy to have a co-maintainer too

@sjakobi
Copy link

sjakobi commented Apr 15, 2021

@infinity0 Thanks for implementing this! I'm sure that a lot of people will want to use the new types for their strict instances. So I'd suggest not waiting too long with a release. Once we have some feedback, I think we can start to discuss moving these types into the original containers and u-c packages.

I expect my bandwidth to be fairly limited, but I can offer to serve as a backup maintainer if you'd be fine with this.

@infinity0
Copy link
Member Author

@sjakobi I've uploaded the package - https://hackage.haskell.org/package/strict-containers-0.1 and there's also some basic strictness tests in the repo. Feel free to tag me in any discussions regarding merging it back into the upstream packages!

@tomjaguarpaw
Copy link

Note: strict containers won't be fully lawful Functors. You won't be able to use them, fmap degenerates to ad-hoc name-overload.

The "strict containers" functor law is fmap f . fmap g == fmap (\x -> f $! g x), right? If so is that really so terrible?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants