Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Map data type too strict? #446

Closed
eskimor opened this issue Dec 17, 2017 · 9 comments
Closed

Map data type too strict? #446

eskimor opened this issue Dec 17, 2017 · 9 comments

Comments

@eskimor
Copy link

eskimor commented Dec 17, 2017

Hi there!

I have the following situation, I have a Map with some internal representation of data:

data MyStuff 
  = MyStuff { internalMap :: Map SomeKey InternalRep }

now I am exposing this map to the user, but I like to offer a different representation of data:

getStuff :: MyStuff -> Map SomeKey ExternalRep 

an obivious implementation would be:

getStuff (MyStuff m) = fmap toExternalRep m

the trouble is, Map is too strict for this: The left and the right branch have strictness annotations, that means if the user does a lookup on the result of getStuff the whole map will be build and the user effectively gets a linear lookup instead of a logarithmic one - which is pretty bad, obviously.

I suppose the strictness annotations are there for efficiency and to avoid space leaks with operations like re-balancing trees. The question is, would it be feasible to simply add bang patterns to those functions instead? As far as I understand it, this should result in the same semantics (Maps can't be unpacked anyway), but would offer the possibility of a lazy fmap, which I actually expected - but gladly I checked the implementation first.

Many thanks! If this sound reasonable, I can definitely provide a PR!

@treeowl
Copy link
Contributor

treeowl commented Dec 18, 2017

I think this is definitely a reasonable thing to think about, but there are two issues that make me very wary of actually implementing it:

  1. There is some efficiency cost to removing those strictness annotations. In particular, when GHC opens up a Bin constructor, it knows that the left and right branches are in WHNF, so it doesn't need to bother dealing with the possibility that they aren't. If you change the constructor to make those fields lazy, then GHC can no longer take advantage of this. I would expect the degradation to be small but measurable. You're welcome to try it out; if the performance difference is small enough, that could change the balance.

  2. Another concern is that making this change substantially increases the potential for functions to be too lazy by mistake. It has been hard enough to ensure that the operations are as strict as they should be in the map elements; I fixed several long-standing strictness bugs in the last release or two. You would have to create a smart constructor standing in for Bin and replace Bin by the smart constructor everywhere appropriate. Then any future changes to the module will need to be audited to ensure that all uses of the raw constructor are appropriate.

The number of functions that will actually benefit from the change you suggest is relatively small: map, mapWithKey, traverse (sometimes), traverseWithKey (sometimes), and perhaps mapAccum{L,R,...}. I'm not yet convinced that improving their performance is really worth making everything else worse.

@eskimor
Copy link
Author

eskimor commented Dec 18, 2017

Thank you for that quick response! Ok I will try it out - let's see how it turns out. The numbers of functions concerned might be few, but the difference is huge - I am looking forward to measuring! ;-)

Another (unrelated) question out of interest, because I believe you are very experienced in this area and I am pretty new to Haskell performance optimizations: There is often the issue too lazy, but also too strict: Do you think a Map variant could be implemented that is super lazy and therefore also quite efficient? Space leaks can often be resolved by more strictness or by less - therefore the question. I thought about it a bit and I believe the answer is no, mainly because of the re-balancing stuff, which easily piles up thunks and it seems to be hard to avoid that. But on the other hand - I know nothing about that stuff ;-)

Many thanks!

@eskimor
Copy link
Author

eskimor commented Dec 18, 2017

There is some efficiency cost to removing those strictness annotations. In particular, when GHC opens up a Bin constructor, it knows that the left and right branches are in WHNF, so it doesn't need to bother dealing with the possibility that they aren't. If you change the constructor to make those fields lazy, then GHC can no longer take advantage of this. I would expect the degradation to be small but measurable. You're welcome to try it out; if the performance difference is small enough, that could change the balance.

That's a good catch, I haven't thought about that. I am curious already. :-)

@mikeplus64
Copy link
Contributor

mikeplus64 commented Mar 15, 2018

Another concern is that making this change substantially increases the potential for functions to be too lazy by mistake. It has been hard enough to ensure that the operations are as strict as they should be in the map elements; I fixed several long-standing strictness bugs in the last release or two. You would have to create a smart constructor standing in for Bin and replace Bin by the smart constructor everywhere appropriate. Then any future changes to the module will need to be audited to ensure that all uses of the raw constructor are appropriate.

GHC now has pattern synonyms that could be used to do this without a massive refactor and cognitive overhead. Something like

data Map k a  
  = LazyBin {-# UNPACK #-} !Size !k a (Map k a) (Map k a)
  | Tip

pattern Bin :: Size -> k -> a -> Map k a -> Map k a -> Map k a
pattern Bin sz k a l r <- LazyBin !sz !k a !l !r
  where Bin !sz !k a !l !r = LazyBin sz k a l r

It'd mean dropping GHC <7.8 though.

@treeowl
Copy link
Contributor

treeowl commented Mar 15, 2018

It would mean losing much hope of getting it working with non-GHC implementations as well, which goes against the tradition of containers.

@BurningWitness
Copy link

BurningWitness commented Aug 22, 2023

I've been writing R- and radix trees for at least a year now and, since tree libraries naturally want to mimic containers, the seemingly arbitrary strictness decisions have been a really interesting problem.

Why was the choice made to share representation instead of having a fully lazy Map and a fully strict one? In its current state isn't Lazy.Map k a essentially just Strict.Map k (Solo a), albeit specialized for extra speed?

@treeowl
Copy link
Contributor

treeowl commented Aug 22, 2023

Yes, you could definitely see it that way. Optimized for speed, but even more so for compactness.

@BurningWitness
Copy link

BurningWitness commented Aug 22, 2023

Oh no, this brings a lot of questions to the table and I'm greatly sorry for this deluge:

  1. "A Map is strict in its keys but lazy in its values", while indeed true, does not really communicate the gravity of mentioned strictness well enough and I imagine the overwhelming majority of people naively think it's just a lazy tree that evaluates just the keys given to it.

    In all fairness a "lazy hash map" is oxymoronic in regards to its operation and that should've been an indication something is awry, yet here I stand.

  2. Doesn't this mean that Data.Map.Lazy is really only there for a small percentage of people who are conscious about evaluating thunks during insertions/unions, but don't want to spend 2 (?) extra bytes on each Solo allocation?

    Not that these people should be disenfranchised, just that this could be represented either as extra function flavors or a separate .Strict.* module, instead of a .Lazy one.

  3. This also means that the ecosystem legitimately has no lazy dictionaries in it and that's... weird. I don't view lazy dictionaries as interchangeable with strict ones, a lazy dictionary is much closer to a lazy list than a persistent data structure in my mind.

    It limits the ecosystem quite a bit, as downstream libraries can't make certain complex lazy datatypes without lazy dictionaries (radix trees are that kinda case). Also the aforementioned "mimicking containers" thing.

Obviously these are containers-1.0-tier quandaries, but it should be noted that even the strictness lunatics are dissatisfied as shared datatypes means shared instances and so they had to branch out into their own universe with libraries like strict-containers.

@BurningWitness
Copy link

BurningWitness commented Nov 14, 2023

After thinking about this for a while, any trees that can rebalance the root node have no reason to be lazy because any operation has to first evaluate the root node, forcing every previous operation. Map thus only makes sense in spine-strict form.

Curiously enough binary radix trees do not have this requirement, so IntMap could have a lazy variant. This would be a partial solution for the requested problem in the case that SomeKey is an enumeration.

The case where SomeKey is a string requires a proper radix tree implementation and I will not burden this repository with it: both due to the fact that different data structures make most sense in different packages, and because using UnliftedDatatypes requires a completely different codestyle and pushes the lower base boundary to 4.16.

This issue as such can be closed, as the current implementation of Map cannot be used to solve it.


My opinion on the module naming still stands: .Lazy for fully lazy, .Strict for fully strict, probably .Strict.Spine for spine-strict and thunk-allocating. Lazy.RadixTree would not behave similar to Lazy.Map Text and I can't name the module Data.RadixTree.UTF8.TrulyLazy to make this clear for the end user.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants