Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix strictness bugs in fromDistinctAscList and fromDistinctDescList #996

Merged
merged 3 commits into from
Mar 28, 2024

Conversation

neilmayhew
Copy link
Contributor

The fromDistinctAscList and fromDistinctDescList functions from Data.Map.Strict fail to enforce strictness on values.

Consider the following ghci session:

λ> import qualified Data.Map.Strict as Map
λ> l = Map.elems . Map.fromDistinctAscList . zip ['a'..'f'] $ Just <$> [0::Int ..]
λ> length l
6
λ> :sprint l
l = [Just 0,_,Just 2,_,Just 4,_]

The odd-numbered elements are still thunks. This strange behaviour is caused by the two-state algorithm used in the functions. In particular, in State1 the value isn't forced because the Push datatype from the lazy interface is used, and this doesn't have strict fields:

next (State1 l stk) (kx, x) = State0 (Push kx x l stk)

The solution is to add bangs to the tuple arguments in the State1 pattern like there are in the State0 pattern:

next (State0 stk) (!kx, !x) = ...
next (State1 l stk) (!kx, !x) = ...

I've added two new NoThunks tests that demonstrate this problem.

Unfortunately, the existing strictness tests aren't failing, for several reasons:

  1. The fromDistinctAscList tests don't actually call fromDistinctAscList
  2. The tests use a one-element list as input
  3. The units used as elements aren't lazy enough

I haven't tried to fix these as I'm not sure how they would ever fail in their present form.

The benchmarks show no difference in performance as a result of this PR.

The bug isn't present in IntMap but I added a test for it anyway.

@treeowl
Copy link
Contributor

treeowl commented Mar 28, 2024

Oh dear. That fromDistinctAscList test is utterly bogus, as you say. I would like to see something more meaningful put in its place, preferably as a QuickCheck property that puts a single bottom in the input list and looks for the exception (presumably using ChasingBottoms as elsewhere). There are several other similarly problematic tests there. Do you think you can deal with these testing issues, or should I give it a go?

@neilmayhew
Copy link
Contributor Author

I'd like it if we could merge this PR as-is, relying entirely on the new NoThunks tests for now, and then open a new PR for fixing the existing tests.

I've spent a fair bit of time on this already and need to get back to other things, so I'd be happy for you to take on the follow-up PR.

@treeowl
Copy link
Contributor

treeowl commented Mar 28, 2024

Wow ... somehow my eyes skipped over those tests altogether. Thanks. Yes, that'll do. Merging.

@treeowl treeowl merged commit 855d6f8 into haskell:master Mar 28, 2024
10 checks passed
@treeowl
Copy link
Contributor

treeowl commented Mar 28, 2024

How urgently do you need this fix released to Hackage? I can definitely get it done this weekend, but if you need it for tomorrow I can probably make that happen.

pStrictFromDistinctAscList = whnfHasNoThunks . evalSpine . M.elems . M.fromDistinctAscList . zip [0::Int ..] . map (Just $!)
where
evalSpine xs = length xs `seq` xs
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: Should we make it absolutely clear that those are really thunks going in by using (Just $!) . hiddenId, where hiddenId = id, but it's defined in an unoptimized module that gets no strictness analysis and no unfoldings? As it is, we seem to be relying on map (Just $!) not getting optimized into the Arbitrary [Int] generation. I'm almost certain it won't be (with current GHC), but that feels a tad fragile.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had wondered about that, too. However, I don't think the hiddenId would solve the problem, because it's not the Int that is the thunk, it's the Maybe. I found I had to use Just $! rather than Just otherwise even with the fix there were still thunks in the Int that QuickCheck provides. Doing it this way, the thunk is then a Just of an already-evaluated Int.

Like you say, though, there's nothing to stop a future optimizer deciding that the Just doesn't need to be a thunk and inserting it directly. To stop that I think we'd need a hiddenJust. Even better, though, would be to use something other than Maybe, since it obscures the purpose of what we're trying to do. A possible alternative would be Box from GHC.Exts.Heap. That's designed to hold bottoms safely, so I don't think it would ever be optimized away.

Copy link
Contributor

@treeowl treeowl Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not totally understanding what you said there. The point is that when GHC sees "Store Just $! x in the heap" (in this case, in a list cons), what code it generates depends on whether it knows that x is already in WHNF. If it does, then it can just drop the $!, so the code will allocate a Just object. If it doesn't know that, then it'll instead allocate a thunk that, when forced, will evaluate x and store a pointer to the result in a Just object. We want the latter behavior. You're right that Just is the wrong shape, but Box is overkill—we don't need its pseudo-existential type. The right tool is Data.Tuple.Solo. The place to get that for testing (for now) is the OneTuple compatibility package.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Just $!) . hiddenId does the trick because it makes sure GHC doesn't know that the argument to Just $! is already in WHNF. The same goes for MkSolo rather than Just.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I see what you mean. That's a more precise analysis than I had.

Yes, I think Solo would be good. I had considered it but forgot about it when I made my previous comment.

@neilmayhew
Copy link
Contributor Author

How urgently do you need this fix released to Hackage? I can definitely get it done this weekend, but if you need it for tomorrow I can probably make that happen.

Thanks! I'm able to use a workaround in my project so I'm not in a hurry for a Hackage release.

Thanks for merging, though.

@neilmayhew neilmayhew deleted the fromDistinct-strictness branch March 28, 2024 23:45
@meooow25
Copy link
Contributor

Thanks for the fix!
I'll take a shot at improving the strictness tests so that a bug like this doesn't slip through.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants