New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix evaluation order problem #966

Merged
merged 39 commits into from Feb 15, 2017

Conversation

Projects
None yet
6 participants
@mshinwell
Contributor

mshinwell commented Dec 13, 2016

(Was PR#7346)

Selectgen can currently reorder load instructions against aliasing stores when emitting the arguments to operations (such as function calls, allocation, etc). This causes inconsistency with the bytecode compiler's evaluation order. This patch should prevent it happening. I have also stopped the moving of Ccheckbound since that too has effects.

For the record, even though the evaluation order is officially unspecified, it is a matter of concern for Jane Street that it may vary between bytecode and native.

(By the way this is not a fix for PR#6136, which I am going to look at next.)

@mshinwell mshinwell added the bug label Dec 13, 2016

@mshinwell mshinwell added this to the 4.05.0 milestone Dec 13, 2016

@mshinwell mshinwell requested a review from xavierleroy Dec 13, 2016

@xavierleroy

This comment has been minimized.

Show comment
Hide comment
@xavierleroy

xavierleroy Dec 13, 2016

Contributor

I need to be reassured that this doesn't degrade performance, esp. for Cload.

Contributor

xavierleroy commented Dec 13, 2016

I need to be reassured that this doesn't degrade performance, esp. for Cload.

@mshinwell

This comment has been minimized.

Show comment
Hide comment
@mshinwell

mshinwell Dec 13, 2016

Contributor

We can try to run Louis's benchmarking suite on this, although I'm not sure how much is running on trunk yet.

@chambart I thought Flambda would be immune to this but I think I might be wrong---it looks like [Un_anf] might be able to generate e.g. applications that contain side effects. (See the comment "When sequences of let-bindings...")

Contributor

mshinwell commented Dec 13, 2016

We can try to run Louis's benchmarking suite on this, although I'm not sure how much is running on trunk yet.

@chambart I thought Flambda would be immune to this but I think I might be wrong---it looks like [Un_anf] might be able to generate e.g. applications that contain side effects. (See the comment "When sequences of let-bindings...")

@xavierleroy

This comment has been minimized.

Show comment
Hide comment
@xavierleroy

xavierleroy Dec 13, 2016

Contributor

Make sure you test x86-32 bits with floating-point codes. Your change has potential to kill the way ocamlopt utilises the x87 FP stack.

Contributor

xavierleroy commented Dec 13, 2016

Make sure you test x86-32 bits with floating-point codes. Your change has potential to kill the way ocamlopt utilises the x87 FP stack.

@mshinwell

This comment has been minimized.

Show comment
Hide comment
@mshinwell

mshinwell Dec 13, 2016

Contributor

@xavierleroy What about if we annotated Cload as to whether it is reading from a mutable field? Apparently in multicore the code up to Cmmgen already knows this property, so we could use that to obtain this annotation. Then we could prevent only the loads which read from mutable fields from being re-ordered.

Contributor

mshinwell commented Dec 13, 2016

@xavierleroy What about if we annotated Cload as to whether it is reading from a mutable field? Apparently in multicore the code up to Cmmgen already knows this property, so we could use that to obtain this annotation. Then we could prevent only the loads which read from mutable fields from being re-ordered.

@xavierleroy

This comment has been minimized.

Show comment
Hide comment
@xavierleroy

xavierleroy Dec 13, 2016

Contributor

Mutability information is always good to have. But many FP loads that are performance critical are from mutable storage (arrays, mutable unboxed record fields), so there is still much potential for performance breakage, esp on x86-x87.

Contributor

xavierleroy commented Dec 13, 2016

Mutability information is always good to have. But many FP loads that are performance critical are from mutable storage (arrays, mutable unboxed record fields), so there is still much potential for performance breakage, esp on x86-x87.

@mshinwell

This comment has been minimized.

Show comment
Hide comment
@mshinwell

mshinwell Dec 27, 2016

Contributor

The first version of this had dire performance even on normal x86-64. I've made a revised version which will be benchmarked shortly.

Contributor

mshinwell commented Dec 27, 2016

The first version of this had dire performance even on normal x86-64. I've made a revised version which will be benchmarked shortly.

@mshinwell

This comment has been minimized.

Show comment
Hide comment
@mshinwell

mshinwell Dec 27, 2016

Contributor

@chambart This issue appears to be more serious than I first thought: it looks like code that does not depend on unspecified evaluation order can be incorrectly re-ordered as a result of this bug when compiled with Flambda. (Thanks to Yaron Minsky for hard questioning that lead me to construct this test case.)

let i = ref 0

let f x y =
  Printf.printf "%d %d\n" x y;
  0
[@@inline never]

let foo _ = ()

let foobar baz =
  let incr_i _ =
    incr i;
    !i
  in
  let b = !i in
  let z = foo 42 in
  let a = (incr_i [@inlined never]) z in
  let x = f a b in
  x + 1

let () =
  ignore ((foobar 0) : int)

I think this is because we assume during Un_anf that the backend is always going to evaluate right-to-left. For example, the above code built with Flambda (at the default optimization level) produces the following Clambda for foobar:

(+
  (apply* camlBreak_eval_order__f_21 
    (apply* camlBreak_eval_order__incr_i_77  0a)
    (field 0 (field 0 "camlBreak_eval_order__Pmakeblock_124")))
  1)

(camlBreak_eval_order__Pmakeblock_124 is the reference called i.)

This bug does at least seem rather difficult to trigger because the "let-moveable" transformation in Un_anf is very conservative. Once we manage to fix this problem in a satisfactory manner we should consider applying this to the 4.04 branch also.

Contributor

mshinwell commented Dec 27, 2016

@chambart This issue appears to be more serious than I first thought: it looks like code that does not depend on unspecified evaluation order can be incorrectly re-ordered as a result of this bug when compiled with Flambda. (Thanks to Yaron Minsky for hard questioning that lead me to construct this test case.)

let i = ref 0

let f x y =
  Printf.printf "%d %d\n" x y;
  0
[@@inline never]

let foo _ = ()

let foobar baz =
  let incr_i _ =
    incr i;
    !i
  in
  let b = !i in
  let z = foo 42 in
  let a = (incr_i [@inlined never]) z in
  let x = f a b in
  x + 1

let () =
  ignore ((foobar 0) : int)

I think this is because we assume during Un_anf that the backend is always going to evaluate right-to-left. For example, the above code built with Flambda (at the default optimization level) produces the following Clambda for foobar:

(+
  (apply* camlBreak_eval_order__f_21 
    (apply* camlBreak_eval_order__incr_i_77  0a)
    (field 0 (field 0 "camlBreak_eval_order__Pmakeblock_124")))
  1)

(camlBreak_eval_order__Pmakeblock_124 is the reference called i.)

This bug does at least seem rather difficult to trigger because the "let-moveable" transformation in Un_anf is very conservative. Once we manage to fix this problem in a satisfactory manner we should consider applying this to the 4.04 branch also.

@mshinwell

This comment has been minimized.

Show comment
Hide comment
@mshinwell

mshinwell Dec 28, 2016

Contributor

I'm still working on this problem, but as an aside, it looks like the list of side-effecting expressions in is_simple_expr may be missing some cases even ignoring Cload (Ccheckbound in particular is absent, and can raise an exception). (Actually never mind, I see this was noted above.)

Contributor

mshinwell commented Dec 28, 2016

I'm still working on this problem, but as an aside, it looks like the list of side-effecting expressions in is_simple_expr may be missing some cases even ignoring Cload (Ccheckbound in particular is absent, and can raise an exception). (Actually never mind, I see this was noted above.)

@mshinwell mshinwell changed the title from Fix one example of different evaluation order in bytecode and native to Fix evaluation order problem Dec 28, 2016

mshinwell added some commits Dec 28, 2016

Show outdated Hide outdated asmcomp/selectgen.ml
@@ -194,18 +195,139 @@ let join_array rs =
(* Name of function being compiled *)
let current_function_name = ref ""
(* Environment parameter for the function being compiled, if any. *)
let current_function_env_param = ref None

This comment has been minimized.

@alainfrisch

alainfrisch Jan 1, 2017

Contributor

The role of this reference is to detect a common case of reading from an immutable block (the current closure). This seems a bit hackish. Couldn't one mark the load instruction instead? This seems cleaner and might (perhap later) cover more cases.

@alainfrisch

alainfrisch Jan 1, 2017

Contributor

The role of this reference is to detect a common case of reading from an immutable block (the current closure). This seems a bit hackish. Couldn't one mark the load instruction instead? This seems cleaner and might (perhap later) cover more cases.

This comment has been minimized.

@mshinwell

mshinwell Jan 5, 2017

Contributor

I have a plan to combine a patch from the multicore branch and an old patch of mine to propagate mutability information into Cmm code. However it's a fair amount of work to do that, so for now, I'd just like to catch the very common case of loads from the current closure.

@mshinwell

mshinwell Jan 5, 2017

Contributor

I have a plan to combine a patch from the multicore branch and an old patch of mine to propagate mutability information into Cmm code. However it's a fair amount of work to do that, so for now, I'd just like to catch the very common case of loads from the current closure.

This comment has been minimized.

@mshinwell

mshinwell Jan 5, 2017

Contributor

(Actually I've incorporated the old patch of mine, but not yet the one which propagates mutability information from the middle-end. I'm going to let @lpw25 submit that separately.)

@mshinwell

mshinwell Jan 5, 2017

Contributor

(Actually I've incorporated the old patch of mine, but not yet the one which propagates mutability information from the middle-end. I'm going to let @lpw25 submit that separately.)

This comment has been minimized.

@chambart

chambart Feb 13, 2017

Contributor

This could still be made less hackish by using the environment in cmmgen to do that: in Cmmgen.get_field, if ptr is the current environment variable, then generate a immutable cload.

@chambart

chambart Feb 13, 2017

Contributor

This could still be made less hackish by using the environment in cmmgen to do that: in Cmmgen.get_field, if ptr is the current environment variable, then generate a immutable cload.

@mshinwell

This comment has been minimized.

Show comment
Hide comment
@mshinwell

mshinwell Feb 13, 2017

Contributor

This revised version is ready for review for 4.05. There are two parts:

  1. Addition of mutability information to Cload. This is deliberately conservative at the moment.
  2. A simple effect/coeffect analysis in Selectgen to restrict the movement of expressions to avoid the original evaluation order problem. (Note that deferral of expressions is still guarded by the original is_simple_expr.)

A substantial amount of effort has been invested in this patch to try to ensure that any performance degradation is kept to a minimum. The evolution of the code has largely been guided by benchmarking.

The multicore devs have a patch to propagate mutability information from earlier in the compiler which can link up with the new annotation on Cload and remove the need for the special case tracking the environment parameter.

We also have a 4.03 version of this patch (being used internally at Jane Street); performance numbers on x86-64 Linux can be seen here:
http://bench.flambda.ocamlpro.com/compare?test=2017-02-12-2300%2F4.03.1%2B%2Bpr966%2Bbench&reference=2017-02-12-2300%2F4.03.1%2Bdev%2Bbench

This patch is being re-tested on i386 32-bit at present and numbers should be available shortly. I expect it to show a degradation of a couple of percent on floating-point intensive work.

Contributor

mshinwell commented Feb 13, 2017

This revised version is ready for review for 4.05. There are two parts:

  1. Addition of mutability information to Cload. This is deliberately conservative at the moment.
  2. A simple effect/coeffect analysis in Selectgen to restrict the movement of expressions to avoid the original evaluation order problem. (Note that deferral of expressions is still guarded by the original is_simple_expr.)

A substantial amount of effort has been invested in this patch to try to ensure that any performance degradation is kept to a minimum. The evolution of the code has largely been guided by benchmarking.

The multicore devs have a patch to propagate mutability information from earlier in the compiler which can link up with the new annotation on Cload and remove the need for the special case tracking the environment parameter.

We also have a 4.03 version of this patch (being used internally at Jane Street); performance numbers on x86-64 Linux can be seen here:
http://bench.flambda.ocamlpro.com/compare?test=2017-02-12-2300%2F4.03.1%2B%2Bpr966%2Bbench&reference=2017-02-12-2300%2F4.03.1%2Bdev%2Bbench

This patch is being re-tested on i386 32-bit at present and numbers should be available shortly. I expect it to show a degradation of a couple of percent on floating-point intensive work.

@mshinwell

This comment has been minimized.

Show comment
Hide comment
@mshinwell

mshinwell Feb 13, 2017

Contributor

One of the benchmarks exhibits a small difference in allocation numbers but a large difference in compactions. It is believed that this is because of potential variable lifetime changes as a result of this patch. For example for some function alloc which allocates, the code:

foo(alloc(), load block[0])

was effectively translated as:

let alloc_result = alloc () in
foo(alloc_result, load block[0])

With this patch, we instead get:

let block_result = load block[0] in
let alloc_result = alloc () in
foo(alloc_result, block_result)

As such, if block is dead after the load, it might be collected during the call to alloc. Without the patch it would live for longer.

Contributor

mshinwell commented Feb 13, 2017

One of the benchmarks exhibits a small difference in allocation numbers but a large difference in compactions. It is believed that this is because of potential variable lifetime changes as a result of this patch. For example for some function alloc which allocates, the code:

foo(alloc(), load block[0])

was effectively translated as:

let alloc_result = alloc () in
foo(alloc_result, load block[0])

With this patch, we instead get:

let block_result = load block[0] in
let alloc_result = alloc () in
foo(alloc_result, block_result)

As such, if block is dead after the load, it might be collected during the call to alloc. Without the patch it would live for longer.

@chambart

It seems correct and fixes the bug.

I'd prefer the environment variable hack to go away and be done in cmmgen.
The tests could be written in cmm to be more robust.

@@ -227,6 +305,65 @@ method is_simple_expr = function
end

This comment has been minimized.

@chambart

chambart Feb 13, 2017

Contributor

It might be worth expanding that _ pattern.

@chambart

chambart Feb 13, 2017

Contributor

It might be worth expanding that _ pattern.

Show outdated Hide outdated asmcomp/selectgen.ml
@@ -711,13 +848,13 @@ method emit_expr (env:environment) exp =
try env_find_static_exception nfail env
with Not_found ->
fatal_error ("Selection.emit_expr: unboun label "^
string_of_int nfail)

This comment has been minimized.

@chambart

chambart Feb 13, 2017

Contributor

spurious unrelated indentation change. (even if it fixes the indentation)

@chambart

chambart Feb 13, 2017

Contributor

spurious unrelated indentation change. (even if it fixes the indentation)

Show outdated Hide outdated asmcomp/selectgen.ml
in
(* Intermediate registers to handle cases where some
registers from src are present in dest *)
registers from src are present in dest *)

This comment has been minimized.

@chambart

chambart Feb 13, 2017

Contributor

unrelated indentation too

@chambart

chambart Feb 13, 2017

Contributor

unrelated indentation too

Show outdated Hide outdated asmcomp/selectgen.ml
let tmp_regs = Reg.createv_like src in
(* Ccatch registers are created with type Val. They must not
contain out of heap pointers *)
contain out of heap pointers *)

This comment has been minimized.

@chambart

chambart Feb 13, 2017

Contributor

unrelated indentation too

@chambart

chambart Feb 13, 2017

Contributor

unrelated indentation too

([], EC.none)
exp_list
in
List.fold_left (fun results_and_env (exp, effects_after) ->

This comment has been minimized.

@chambart

chambart Feb 13, 2017

Contributor

This does not behave as the original one in case of overloading. Does it ever happen ?

@chambart

chambart Feb 13, 2017

Contributor

This does not behave as the original one in case of overloading. Does it ever happen ?

Show outdated Hide outdated asmcomp/selectgen.ml
required semantics of [loc_external_arguments] (see proc.mli). *)
across multiple registers to line up correctly, by virtue of the
semantics of [split_int64_for_32bit_target] in cmmgen.ml, and the
required semantics of [loc_external_arguments] (see proc.mli). *)

This comment has been minimized.

@chambart

chambart Feb 13, 2017

Contributor

Unrelated indentation

@chambart

chambart Feb 13, 2017

Contributor

Unrelated indentation

Show outdated Hide outdated asmcomp/amd64/selection.ml
@@ -133,13 +133,21 @@ method is_immediate_natint n = n <= 0x7FFFFFFFn && n >= -0x80000000n
method! is_simple_expr e =
match e with
| Cop(Cextcall (fn, _, _, _), args, _)

This comment has been minimized.

@chambart

chambart Feb 13, 2017

Contributor

unrelated whitespace

@chambart

chambart Feb 13, 2017

Contributor

unrelated whitespace

method! effects_of e =
match e with
| Cop(Cextcall(fn, _, _, _), args, _)
when List.mem fn inline_ops ->

This comment has been minimized.

@chambart

chambart Feb 13, 2017

Contributor

You should add a comment on inline_ops definition to tell that if a new non-effect free is included this should be updated.
Same remark for arm64 and i386

@chambart

chambart Feb 13, 2017

Contributor

You should add a comment on inline_ops definition to tell that if a new non-effect free is included this should be updated.
Same remark for arm64 and i386

Show outdated Hide outdated asmcomp/selectgen.ml
@@ -194,18 +195,139 @@ let join_array rs =
(* Name of function being compiled *)
let current_function_name = ref ""
(* Environment parameter for the function being compiled, if any. *)
let current_function_env_param = ref None

This comment has been minimized.

@chambart

chambart Feb 13, 2017

Contributor

This could still be made less hackish by using the environment in cmmgen to do that: in Cmmgen.get_field, if ptr is the current environment variable, then generate a immutable cload.

@chambart

chambart Feb 13, 2017

Contributor

This could still be made less hackish by using the environment in cmmgen to do that: in Cmmgen.get_field, if ptr is the current environment variable, then generate a immutable cload.

Show outdated Hide outdated asmcomp/cmmgen.ml
@@ -542,7 +545,7 @@ let get_tag ptr dbg =
if Proc.word_addressed then (* If byte loads are slow *)
Cop(Cand, [get_header ptr dbg; Cconst_int 255], dbg)
else (* If byte loads are efficient *)
Cop(Cload Byte_unsigned,
Cop(Cload (Byte_unsigned, Mutable), (* Same comment as above *)

This comment has been minimized.

@lpw25

lpw25 Feb 14, 2017

Contributor

Not obvious what this comment refers to. Maybe just explicitly repeat that tags can be mutated by Obj.set_tag.

@lpw25

lpw25 Feb 14, 2017

Contributor

Not obvious what this comment refers to. Maybe just explicitly repeat that tags can be mutated by Obj.set_tag.

@lpw25

This comment has been minimized.

Show comment
Hide comment
@lpw25

lpw25 Feb 14, 2017

Contributor

I've reviewed this, it looks fine.

Contributor

lpw25 commented Feb 14, 2017

I've reviewed this, it looks fine.

@damiendoligez

This comment has been minimized.

Show comment
Hide comment
@damiendoligez

damiendoligez Feb 14, 2017

Member

We have two positive reviews so this is good to go. @mshinwell please remove the unrelated whitespace changes, then go ahead and merge.

Member

damiendoligez commented Feb 14, 2017

We have two positive reviews so this is good to go. @mshinwell please remove the unrelated whitespace changes, then go ahead and merge.

@mshinwell

This comment has been minimized.

Show comment
Hide comment
@mshinwell

mshinwell Feb 15, 2017

Contributor

I've addressed all of the code review comments save for the one about rewriting the testcases in Cmm, which I think is going to have to wait, and one I don't understand about overloading. I have asked @chambart about the latter. This should be ready for merging shortly.

Contributor

mshinwell commented Feb 15, 2017

I've addressed all of the code review comments save for the one about rewriting the testcases in Cmm, which I think is going to have to wait, and one I don't understand about overloading. I have asked @chambart about the latter. This should be ready for merging shortly.

@mshinwell

This comment has been minimized.

Show comment
Hide comment
@mshinwell

mshinwell Feb 15, 2017

Contributor

The overloading comment was I believe supposed to say overriding; there should be no issue, since none of the "emit_parts*" functions are exposed in the .mli.

Contributor

mshinwell commented Feb 15, 2017

The overloading comment was I believe supposed to say overriding; there should be no issue, since none of the "emit_parts*" functions are exposed in the .mli.

@mshinwell

This comment has been minimized.

Show comment
Hide comment
@mshinwell

mshinwell Feb 15, 2017

Contributor

@lpw25 has read the post-code-review changes so I'm going to merge this now. I've asked @chambart to eyeball them in any case, but I fully expect that to be fine.

Contributor

mshinwell commented Feb 15, 2017

@lpw25 has read the post-code-review changes so I'm going to merge this now. I've asked @chambart to eyeball them in any case, but I fully expect that to be fine.

@mshinwell mshinwell merged commit ea5fa10 into ocaml:trunk Feb 15, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

camlspotter pushed a commit to camlspotter/ocaml that referenced this pull request Oct 17, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment