-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrong results from pure program using Data.Parallel #147
Comments
(I should have meant it doesn't seem to happen with |
Ouch, that's a nasty bug. I don't know what to do about this unless we have a smaller test case. :/ |
Yeah, I am sorry for this bug report. I don't think it would be easy to get a much smaller test case, though. I suspect it has to do with a combination of multithreading, data-memocombinators and unordered-containers. I might take a stab at it with a simpler game later. |
Problematic code in gamebooksolver looks like this:
One of the fixes looks like this:
I suspect ticket can be closed. |
I'm not sure this is quite the problem. So long as the order in which toList produces the elements is actually a pure function of the contents of the HashMap, it should be okay. If it's producing a different order on the same HashMap in different runs of the program, that's a serious issue. This bug appeared with multiple runs of the very same program which didn't have any input effects, and only appeared when run with pure parallelism turned on. The result of a program should be the same when using par vs. when not using it, so I'm fairly sure it has something to do with thread-unsafety of HashMap.insert or one of the other functions which is using reallyUnsafePtrEquality# internally. |
Documentation states
So even if order is different between runs - that's totally OK. But that's not the problem because contents of HashMap is different every time. Try using this code:
and store results in a file. Sort contents of those files so order is deterministic and compare them for example with |
Map sizes are also non-deterministic with Maps:
so with Map it works only because it produces a sorted list. |
That's not true btw. Consider simple implementation of
and when inserting
just appending them to the end of the list.
Actually that's the other way around, in applications where you handle user supplied data you might want to have different shape (and different order) in order to avoid collision DoS attack. |
The problematic code isn't about changing order of the |
Right, it's |
OTOH, |
That seems to be working. Interesting. I'll try a few more things later then. |
I managed to reduce sample to this -
In some cases I'll try to minimize the test case. |
In minimized test case bug is reproduceable ghc 7.10 and 8.0, but not in 8.2rc. Hmmm... Actually no, -O0 makes it reproduceable in ghc8.2rc as well |
This particular bug is fixed after changing
Still trying to figure out why. |
There is now a GHC ticket, and the test case there has been moreover minimised massively. |
If that's really a GHC issue, then I suppose this ticket should be closed. |
If. |
|
So the issue appears to be lazy blackholing. This is an optimization used by GHC to reduce the cost of blackholing closures. However, in the case of Clearly this needs to be documented a bit better in the users guide, but the fix is clear: ensure that |
While usually this is merely an optimization in parallel settings, in our case it's absolutely essential for correctness due to the use of internal mutability. See haskell-unordered-containers#147.
While usually this is merely an optimization in parallel settings, in our case it's absolutely essential for correctness due to the use of internal mutability. See haskell-unordered-containers#147.
It sounds like eager blackholing is a broken optimization. :( |
@tibbe lazy blackholing is a perfectly good optimization, but requires great care around unsafe |
It would be worth establishing if the unordered-containers code is actually wrong. @treeowl note that this is all in |
I just looked. Yes, your code is wrong. The unsafe modification functions
that modify arrays in place are utterly bogus, and I'm surprised GHC hasn't
blown this up before. I think pacak and I have settled on a vague idea
about how to fix it, and they're going to give it a go. Now the fact that
your code is wrong doesn't guarantee that GHC isn't *also* wrong, but I
think the code should be fixed first.
…On Apr 27, 2017 12:17 AM, "Johan Tibell" ***@***.***> wrote:
It would be worth establishing if the unordered-containers code is
actually wrong.
@treeowl <https://github.com/treeowl> note that this is all in ST, no IO.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#147 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABzi_VRizJ8HxsGaN3e_qEwbcQdPHbBGks5r0BbfgaJpZM4Mjq_e>
.
|
For the record, I don't believe that |
For the record, I'm not confident that we can consider this closed quite yet. |
I would agree that this should be reopened. Just to throw out another option for fixing this, |
https://ghc.haskell.org/trac/ghc/ticket/13615#comment:37
|
Ah. The branch you are referring to is exactly what I had in mind. That's surprising that it exhibits the same problem. |
That's what I wanted to write myself as well just never got around doing it. The bug is still there and the only safe solution I can suggest right now is to implement |
I would like to see the test that @bgamari is running so that I can confirm this on my machine. |
It's in ticket description.
|
Pacak, can you explain how to run the rest with a modified
unordered-containers?
…On Jun 14, 2017 9:34 AM, "pacak" ***@***.***> wrote:
I would like to see the test that @bgamari <https://github.com/bgamari>
is running so that I can confirm this on my machine.
It's in ticket description.
git clone https://github.com/pacak/cuddly-bassoon && cd cuddly-bassoon
cabal new-build
dist-newstyle/build/gamebooksolver-0.1.0.0/build/gamebooksolver-solvebook02/gamebooksolver-solvebook02
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#147 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABzi_W0Lz02PM4GAcfMMdiKvdFO7pmwAks5sD-FTgaJpZM4Mjq_e>
.
|
Everything is in |
I can confirm that I also get problems with the super-safe version. It's always the "Those are expected to be equal". I'm using a GHC 8.2 release candidate and cabal new-build. |
Expected to be equal, but not "WAT"?
|
Yeah, I never get the "WAT???" message:
|
Found part of problem. I've put up a fork of cuddly bassoon. I've made three changes from the original code @pacak put up:
I see this happening every time I run the test (again, this is without any concurrency involved):
|
Nevermind, I think the check in copyM is actually wrong. |
I've been looking at this for a while now, and it's really weird that @treeowl's branch has the same problem, I've slowly tried stripping out various in-place updates and replacing them by first making a copy of the structure and mutating that instead, and it doesn't help at all. The fact the the code runs fine in a single-threaded setting but not in a multi-threaded setting hints that in the window before a blackhole is committed, two threads end up sharing a mutable variable. However, the code doesn't use any functions that are known to cause this. No lazy ST, no unsafeDupablePerformIO, just runST. The parts of |
Andrew, since it's now almost certain that this is a GHC bug, it would
probably be better to add further thoughts (and especially simpler
examples!) to the GHC ticket rather than this one. But could you open a
fresh ticket for the bounds check issue? That needs to be fixed, and the CI
script needs to be adjusted to ensure it runs tests both with and without
debug mode.
…On Jun 14, 2017 12:07 PM, "Andrew Martin" ***@***.***> wrote:
I've been looking at this for a while now, and it's really weird that
@treeowl <https://github.com/treeowl>'s branch has the same problem, I've
slowly tried stripping out various in-place updates and replacing them by
first making a copy of the structure and mutating that instead, and it
doesn't help at all. The fact the the code runs fine in a single-threaded
setting but not in a multi-threaded setting hints that in the window before
a blackhole is committed, two threads end up sharing a mutable variable.
However, the code doesn't use any functions that are known to cause this.
No lazy ST, no unsafeDupablePerformIO, just runST. The parts of hashable
that get touched don't use them either.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#147 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABzi_WdxCgpa789lQo4B484MZyTc0Uodks5sEAVXgaJpZM4Mjq_e>
.
|
@andrewthad have a look at #13615 comment 36; I have a theory for what happens here but have been waiting for further feedback from @simonmar to confirm that it's plausible. |
@bgamari submitted a patch which fixes this issue. https://phabricator.haskell.org/D3695 |
This fix will be present in 8.2 (and in fact, is one of the reasons why 8.2 is so terribly late). The I suggest that drop |
This is unfortunately a bit of a bad bug report, because I have a whole program ...
With the following commit: https://github.com/bartavelle/gamebooksolver/tree/4c7028424ccf8b7309846d5c7d28ab7067d752af
Reproduction:
This will sometimes give a different result (been reproduced on 2 computers by me, and by two #haskell users).
Data.HashMap.Strict
is only ever used here, and when switching toData.Map.Strict
, I can't seem to reproduce the problem.The program uses quite a bit of memory (~ 2GB in the current settings), data-memocombinators and parallel. It only happens when using
-N8
.The text was updated successfully, but these errors were encountered: