-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
disable CSE for atomic loads #12715
disable CSE for atomic loads #12715
Conversation
dcbc553
to
227bd66
Compare
There is a debate at #12713 about whether CSE for atomic loads is correct. I think that this debate should be resolved before we consider the present PR. I am closing for now, and we can reopen any time we want. |
Further examples from @polytypic over at #12713 suggest the following example where CSE is clearly (?) invalid: let x0 = Atomic.get x in
let y0 = Atomic.get y in
let x1 = Atomic.get x in Merging the two reads of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
I've read the arguments in #12713 and agree with the conclusion (CSE of atomic loads is only allowed if no other memory operation occurs in-between, and it's not worth adding a special case for that).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks good to me.
Earlier, atomic loads were translated to an external call to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. For those who wonder, the Op_other
classification mean that the result of an atomic load is treated as unpredictable, and so cannot be replaced by a move from the result of an earlier load at the same address. It's better than treating atomic loads as external calls, since regular loads occurring before and after the atomic load can still be factored out, while an external call is assumed to write all over memory.
Lines 1118 to 1120 in d77bc97
Hence, the compiler should not optimise atomic stores. It would be useful to do an audit of all the optimisations that we perform on loads of mutable locations. The easiest would be to disable any optimisations of atomic loads. For example, we have disabled the reordering of atomic loads in the scheduling pass in #12248. |
Changes
Outdated
@@ -540,6 +540,9 @@ Working version | |||
- #12684: fix locations filename in AST produced by the `-pp` option | |||
(Gabriel Scherer, review by Florian Angeletti) | |||
|
|||
- ???: disable common subexpression elimination for atomic loads | |||
(Gabriel Scherer, review by ???, report by Vesa Karvonen) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was Carine Morel (@lyrm) who wrote the test that triggered the bug. Initially we assumed it was my mistake, but later I could not understand why the bug would be triggered (Atomic.get
is supposed to have an acquire fence and that should prevent the reordering from happening), so I examined the compiler output and realized that the compiler had eliminated the Atomic.get
instruction completely.
227bd66
to
dca5e6e
Compare
@lthls Is there a list of optimisations that we perform on loads of mutable locations? If such a list doesn't exist (very likely), how would we go about gathering it? I assume this would also be needed for flambda. |
No, there isn't such a list. For Flambda we decided to give some sensible semantics to the internal IR, and as a result we can check our optimisations against that (although we never published the semantics properly, hopefully we will correct that with Flambda 2). My intuition though is that for multicore-related issues we should only look at what happens from |
Since this change is disabling a problematic optimisation, and we are moving to 5.1.1 as the reference version for 5.1, I think it is better to cherry-pick it to 5.1. |
disable CSE for atomic loads (cherry picked from commit 0c963ce)
(fixes #12713)
@lthls made the exact same recommendation in the discussion of #12713. I would be interested in @xavierleroy's confirmation that this is a reasonable thing to do.
(I wonder why this was not done back when Multicore was merged. Was it a blind spot, we didn't think of which compiler optimisations should be disabled? Did we say we would disable CSE and forget to do it? Are there other bugs of this category looking around?)