disable CSE for atomic loads #12715

gasche · 2023-11-03T13:10:30Z

@lthls made the exact same recommendation in the discussion of #12713. I would be interested in @xavierleroy's confirmation that this is a reasonable thing to do.

(I wonder why this was not done back when Multicore was merged. Was it a blind spot, we didn't think of which compiler optimisations should be disabled? Did we say we would disable CSE and forget to do it? Are there other bugs of this category looking around?)

gasche · 2023-11-03T14:03:49Z

There is a debate at #12713 about whether CSE for atomic loads is correct. I think that this debate should be resolved before we consider the present PR. I am closing for now, and we can reopen any time we want.

gasche · 2023-11-03T16:07:53Z

Further examples from @polytypic over at #12713 suggest the following example where CSE is clearly (?) invalid:

  let x0 = Atomic.get x in
  let y0 = Atomic.get y in
  let x1 = Atomic.get x in

Merging the two reads of x is wrong, because the read of y0 could have observed a write on x that happens after the read of x0. In operational terms, the read of y0 updated our frontier, so the read of x1 happens in a different state. Atomic loads mutate the frontier state.
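
To make the argument concrete, here is a minimal single-threaded sketch that replays one possible interleaving by hand: a hypothetical writer stores to x and then to y between the reader's first read of x and its read of y. The names `replay`, `merge_x_reads` and `sc_consistent` are illustrative assumptions, not part of the compiler or of #12713.

```ocaml
(* Replay one interleaving deterministically, without domains.
   Assumed (hypothetical) writer: Atomic.set x 1; Atomic.set y 1. *)
let replay () =
  let x = Atomic.make 0 and y = Atomic.make 0 in
  let x0 = Atomic.get x in
  (* The writer's two stores are placed here by hand. *)
  Atomic.set x 1;
  Atomic.set y 1;
  let y0 = Atomic.get y in
  let x1 = Atomic.get x in
  (x0, y0, x1)

(* What the reader would see if CSE replaced the second read of x
   by the result of the first one. *)
let merge_x_reads (x0, y0, _x1) = (x0, y0, x0)

(* The writer stores to x before y, so seeing y = 1 and then
   reading x = 0 is impossible under sequential consistency. *)
let sc_consistent (_x0, y0, x1) = not (y0 = 1 && x1 = 0)
```

In this replay, `replay ()` yields `(0, 1, 1)`, while the merged version would report `(0, 1, 0)`, an outcome that no sequentially consistent execution of this writer allows.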

lthls

Looks good to me.

I've read the arguments in #12713 and agree with the conclusion (CSE of atomic loads is only allowed if no other memory operation occurs in-between, and it's not worth adding a special case for that).

kayceesrk

The change looks good to me.

kayceesrk · 2023-11-04T10:50:22Z

> I wonder why this was not done back when Multicore was merged. Was it a blind spot, we didn't think of which compiler optimisations should be disabled? Did we say we would disable CSE and forget to do it? Are there other bugs of this category looking around?

Earlier, atomic loads were translated to an external call to caml_atomic_load. External calls are not optimised by the compiler and hence the SC semantics for atomics was preserved. Later, ocaml-multicore/ocaml-multicore#251 lowered atomic loads through the compiler. In that PR, we didn't take care of the interaction with the compiler optimisations. This was an oversight.

xavierleroy

Looks good to me. For those who wonder, the Op_other classification means that the result of an atomic load is treated as unpredictable, and so cannot be replaced by a move from the result of an earlier load at the same address. This is better than treating atomic loads as external calls, since regular loads occurring before and after the atomic load can still be factored out, while an external call is assumed to write all over memory.
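
The difference between the three possible classifications can be sketched with a toy value-numbering pass. This is only a model of the behaviour described in the comment above, not the code in the compiler's CSE pass: the `instr` type, the `Op_call` constructor, and the functions `classify` and `cse` are assumptions made for illustration; only the name `Op_other` comes from the discussion.

```ocaml
(* Toy instruction set: loads are identified by a location name. *)
type instr =
  | Load of string         (* regular load *)
  | Atomic_load of string  (* atomic load *)
  | Extcall                (* external call *)

(* Toy classification (hypothetical, loosely modelled on the comment). *)
type op_class =
  | Op_load   (* result can be reused by a later identical load *)
  | Op_other  (* result is unpredictable, known loads are kept *)
  | Op_call   (* assumed to write all over memory *)

(* [atomic_as] is the knob under discussion: how atomic loads are classified. *)
let classify ~atomic_as = function
  | Load _ -> Op_load
  | Atomic_load _ -> atomic_as
  | Extcall -> Op_call

(* For each instruction, say whether it would be replaced by a move
   from the result of an earlier identical load. *)
let cse ~atomic_as instrs =
  let step (known, acc) i =
    match classify ~atomic_as i with
    | Op_load ->
      if List.mem i known then (known, true :: acc)
      else (i :: known, false :: acc)
    | Op_other -> (known, false :: acc)
    | Op_call -> ([], false :: acc)
  in
  let _, flags = List.fold_left step ([], []) instrs in
  List.rev flags

let program = [Load "a"; Atomic_load "x"; Load "a"; Atomic_load "x"]
```

On `program`, classifying atomic loads as `Op_load` merges both repeated loads, `Op_other` merges only the regular load of `a`, and `Op_call` merges nothing.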

kayceesrk · 2023-11-04T11:13:16Z

Atomic.set is compiled to an atomic exchange, which is translated to an external call

ocaml/asmcomp/cmmgen.ml, lines 1118 to 1120 in d77bc97:

  | Patomic_exchange ->
     Cop (Cextcall ("caml_atomic_exchange", typ_val, [], false),
          [transl env arg1; transl env arg2], dbg)

Hence, the compiler should not optimise atomic stores.
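
At the source level, the same idea can be sketched as a store written as an exchange whose result is discarded. `set_via_exchange` is a hypothetical name used only for this illustration; `Atomic.exchange` is the standard library function.

```ocaml
(* A store expressed as an exchange whose previous value is ignored.
   [set_via_exchange] is a hypothetical helper for this sketch. *)
let set_via_exchange (r : 'a Atomic.t) (v : 'a) : unit =
  ignore (Atomic.exchange r v)
```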

It would be useful to do an audit of all the optimisations that we perform on loads of mutable locations. The easiest would be to disable any optimisations of atomic loads. For example, we have disabled the reordering of atomic loads in the scheduling pass in #12248.

polytypic · 2023-11-04T17:51:42Z

Changes

@@ -540,6 +540,9 @@ Working version
 - #12684: fix locations filename in AST produced by the `-pp` option
  (Gabriel Scherer, review by Florian Angeletti)

+- ???: disable common subexpression elimination for atomic loads
+  (Gabriel Scherer, review by ???, report by Vesa Karvonen)


It was Carine Morel (@lyrm) who wrote the test that triggered the bug. Initially we assumed it was my mistake, but later I could not understand why the bug would be triggered (Atomic.get is supposed to have an acquire fence and that should prevent the reordering from happening), so I examined the compiler output and realized that the compiler had eliminated the Atomic.get instruction completely.

fixes ocaml#12713

kayceesrk · 2023-11-06T03:59:38Z

> It would be useful to do an audit of all the optimisations that we perform on loads of mutable locations.

@lthls Is there a list of optimisations that we perform on loads of mutable locations? If such a list doesn't exist (very likely), how would we go about gathering it? I assume this would also be needed for flambda.

lthls · 2023-11-06T08:50:45Z

No, there isn't such a list. For Flambda we decided to give some sensible semantics to the internal IR, and as a result we can check our optimisations against that (although we never published the semantics properly, hopefully we will correct that with Flambda 2).
One thing that makes these discussions particularly tricky is that what is considered a mutable location can change during compilation. For example, CSE of mutable loads across a mutable write is forbidden in the backend, but if the right conditions are met the middle-end might be able to do the CSE because at that point the loads are not mutable yet. (Example: let x0 = fst x in y := 1; let x1 = fst x).
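
The example in parentheses can be written out as a small runnable sketch. The function names `two_reads` and `one_read` are assumptions for illustration; the point is only that sharing the two `fst x` reads is harmless at the source level, because pairs are immutable.

```ocaml
(* [fst x] reads an immutable pair field, so the store to [y] in
   between cannot change its value and both reads can be shared. *)
let two_reads (x : int * int) (y : int ref) =
  let x0 = fst x in
  y := 1;
  let x1 = fst x in
  x0 + x1

(* The same function after merging the two reads by hand. *)
let one_read (x : int * int) (y : int ref) =
  let x0 = fst x in
  y := 1;
  x0 + x0
```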

My intuition though is that for multicore-related issues we should only look at what happens from Mach and forward. Everything before that is too high-level (for instance, evaluation order is not finalised until the translation to Mach).
That still leaves a decent number of files to look at, but I believe they're relatively easy to sort into optional optimisation passes and required, non-optimising passes. From memory I would say that the optimisation passes are Deadcode, Split, and CSE, although it's likely that a few optimisations are hidden in the translation passes (Selection, Linearize, Emit).

Octachron · 2023-12-06T12:35:41Z

Since this change is disabling a problematic optimisation, and we are moving to 5.1.1 as the reference version for 5.1, I think it is better to cherry-pick it to 5.1.

disable CSE for atomic loads (cherry picked from commit 0c963ce)

gasche force-pushed the atomic-load-no-cse branch from dcbc553 to 227bd66 · November 3, 2023 13:11

gasche mentioned this pull request Nov 3, 2023

Repeated Atomic.get optimized away incorrectly #12713

Closed

gasche closed this Nov 3, 2023

gasche reopened this Nov 3, 2023

lthls approved these changes Nov 3, 2023

kayceesrk approved these changes Nov 4, 2023

xavierleroy approved these changes Nov 4, 2023

polytypic reviewed Nov 4, 2023

disable CSE for atomic loads

dca5e6e

fixes ocaml#12713

gasche force-pushed the atomic-load-no-cse branch from 227bd66 to dca5e6e · November 4, 2023 19:47

gasche added the merge-me label Nov 4, 2023

gasche merged commit 0c963ce into ocaml:trunk Nov 4, 2023
9 of 10 checks passed

dra27 pushed a commit to dra27/ocaml that referenced this pull request Dec 6, 2023

Merge pull request ocaml#12715 from gasche/atomic-load-no-cse

57b6d3e

disable CSE for atomic loads (cherry picked from commit 0c963ce)

kayceesrk mentioned this pull request Dec 14, 2023

CSE of non-atomic load around atomic load breaks SC #12825

Closed

kayceesrk added the memory-model label Dec 14, 2023
