Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Settle on a consistent strictness for fromList and fromAscList #473

Open
gereeter opened this issue Jan 3, 2018 · 8 comments
Open

Settle on a consistent strictness for fromList and fromAscList #473

gereeter opened this issue Jan 3, 2018 · 8 comments

Comments

@gereeter
Copy link

gereeter commented Jan 3, 2018

(Originally described in #340 (comment))

In both Map and IntMap, the strict versions of fromList and fromAscList show inconsistent and somewhat nonsensical behaviour:

Function Result on [(0, ⊥), (0, 0)] Result on [(0, 0), (0, ⊥), (0, 0)]
fromList
fromAscList [(0, 0)]
Prelude Data.IntMap.Strict> fromList [(0, undefined), (0, 0)]
fromList *** Exception: Prelude.undefined
CallStack (from HasCallStack):
  error, called at libraries/base/GHC/Err.hs:79:14 in base:GHC.Err
  undefined, called at <interactive>:2:15 in interactive:Ghci1
Prelude Data.IntMap.Strict> fromAscList [(0, undefined), (0, 0)]
fromList [(0,0)]
Prelude Data.IntMap.Strict> fromList [(0, 0), (0, undefined), (0, 0)]
fromList *** Exception: Prelude.undefined
CallStack (from HasCallStack):
  error, called at libraries/base/GHC/Err.hs:79:14 in base:GHC.Err
  undefined, called at <interactive>:4:23 in interactive:Ghci2
Prelude Data.IntMap.Strict> fromAscList [(0, 0), (0, undefined), (0, 0)]
fromList *** Exception: Prelude.undefined
CallStack (from HasCallStack):
  error, called at libraries/base/GHC/Err.hs:79:14 in base:GHC.Err
  undefined, called at <interactive>:5:26 in interactive:Ghci2

Map and IntMap show the same behaviour as each other.

@treeowl
Copy link
Contributor

treeowl commented Jan 3, 2018

The result of fromList is fine, and should stay as it is. We probably can't do anything other than that without a potentially large performance penalty.

I think we obviously want the two example arguments to fromAscList to give the same result as each other, but which one? The easiest thing is to force each element immediately, which would give it the same strictness as fromList. The alternative is to find the last element with each key before forcing. I don't have a very strong opinion about which way is most reasonable.

@gereeter
Copy link
Author

gereeter commented Jan 3, 2018

I think it makes the most sense for fromAscList to force everything and match fromList. It is simple to describe and reason about and it allows replacing fromAscList with fromList for safety without having to worry about bottom values.

@sjakobi
Copy link
Member

sjakobi commented Aug 4, 2020

I think it makes the most sense for fromAscList to force everything and match fromList. It is simple to describe and reason about and it allows replacing fromAscList with fromList for safety without having to worry about bottom values.

@treeowl, @int-e, what do you think about this idea?

@treeowl
Copy link
Contributor

treeowl commented Aug 4, 2020

Yeah, sure.

@int-e
Copy link
Contributor

int-e commented Aug 4, 2020

Note that fromAscList is implemented in terms of fromAscListWithKey,

-- abridged from Data.Map.Strict.Internal
fromAscList :: Eq k => [(k,a)] -> Map k a
fromAscList xs  = fromAscListWithKey (\_ x _ -> x) xs

fromAscListWithKey :: Eq k => (k -> a -> a -> a) -> [(k,a)] -> Map k a

and that exhibits the same discrepancy:

fromListWithKey    (\_ x _ -> x) [(0,undefined),(0,())] = _|_
fromAscListWithKey (\_ x _ -> x) [(0,undefined),(0,())] = fromList [(0,())]

We should not make it strict in all given values though because

fromListWithKey    (\_ _ x -> x) [(0,()),(0,undefined),(0,undefined)] = fromList [(0,())]
fromAscListWithKey (\_ _ x -> x) [(0,()),(0,undefined),(0,undefined)] = fromList [(0,())]

looks useful enough for people to rely on (I would rely on the former (at least for performance; imagine expensive computations instead of the bottoms), and if, by accident, the keys turn out to be sorted, I would switch to fromAscList and expect it to work).

On the other hand it's consistent with fromListWithKey to make fromAscListWithKey strict in the first associated value for each key. What's the cost of doing that? The easy approach (forcing the value in the combineEq helper) will force the value twice in the (common) case of a unique key, once here, and once when constructing the map in fromDistinctAscList.

(The last time I thought about this I gave up around here; this fromAscList[With[Key]] laziness corner case is odd, but pretty harmless, and fixing it (with benchmarking to assess the impact, and potentially trying several approaches) didn't seem worh the effort.)

treeowl pushed a commit that referenced this issue Aug 14, 2020
- the function is rather peculiar in that in the presence of
  duplicate keys, the first value is not evaluated, but all
  the others are evaluated. See also #473.
@infinity0
Copy link
Contributor

The test added in 9765edd in fact fails for containers 0.6.2.1 (distributed with GHC 8.10.4). It wasn't immediately obvious to me which commit "fixed" it.

@int-e
Copy link
Contributor

int-e commented Apr 18, 2021

@infinity0: I see only one failing test and that was added in b64a103, testing for extra thunks left behind by Data.IntMap.Lazy.fromAscList (the test group says IntMap.Strict, that's a typo).
#658 is the relevant pull request that fixed it.
If you see more than one failing strictness test case, that may happen due to lack of optimization when compiling containers.

@infinity0
Copy link
Contributor

@int-e Thanks for the explanation, yes I only see that one failing test also. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants