x86: Force result of Icomp to be in a register #11808
Looks good to me. A suggestion for shorter code below. To be copy-pasted to i386 if you adopt it.
An alternative that I considered is to force the result of Icomp
to be in the RAX register, and emit code like this:
xor rax, rax
cmp ...
setxx al
This avoids a partial register stall on writing to AL, which could be good for performance. However, if the result cannot be held in RAX, an extra move will be generated by the register allocator.
asmcomp/amd64/reload.ml
Outdated
(* The result must be a register *)
let res =
  if stackp res.(0)
  then [|self#makereg res.(0)|]
  else res
in
makereg is a no-op if the argument is already a register, and there's makeregs to handle arrays of regs. Proposed simplification:
- (* The result must be a register *)
- let res =
-   if stackp res.(0)
-   then [|self#makereg res.(0)|]
-   else res
- in
+ (* The result must be a register (PR#11083) *)
+ let res = self#makeregs res in
I didn't think of calling makereg even in the register case, but I didn't use makeregs because it wasn't exported. I assume that there's no harm in exporting it though, so I've pushed a patch with your suggestion.
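For readers following the thread, here is a minimal, self-contained sketch of why the simplification preserves behaviour. Everything in it is a toy stand-in: the `location` and `reg` types, the `Generic` module and the `original` / `simplified` method names are hypothetical, and the real `makereg` in the compiler does more than this (the assumption here is that it hands back a fresh temporary for the allocator to place, whereas the sketch just picks register 0). The point is only that when `makereg` returns register arguments unchanged, mapping it over the whole array gives the same result as the explicit `stackp` test, and that a private method listed in the class signature can be called from an inheriting class.

```ocaml
(* Toy stand-ins for the compiler's register type; not the real definitions. *)
type location = Reg of int | Stack of int
type reg = { name : string; loc : location }

let stackp r = match r.loc with Stack _ -> true | Reg _ -> false

module Generic : sig
  class reload_generic : object
    method makereg : reg -> reg
    (* Listed in the signature, so inheriting classes may call it. *)
    method private makeregs : reg array -> reg array
  end
end = struct
  class reload_generic = object (self)
    (* No-op on registers; stack slots get a register instead.
       Assumption: choosing [Reg 0] is a simplification for this sketch. *)
    method makereg r =
      match r.loc with
      | Reg _ -> r
      | Stack _ -> { r with loc = Reg 0 }
    method private makeregs rv = Array.map self#makereg rv
  end
end

class reload = object (self)
  inherit Generic.reload_generic
  (* Shape of the code before the review suggestion. *)
  method original res =
    if stackp res.(0) then [| self#makereg res.(0) |] else res
  (* Shape of the code after the review suggestion. *)
  method simplified res = self#makeregs res
end
```

With a one-element result array, as in the Icomp case, `original` and `simplified` agree whether the result starts on the stack or in a register.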
asmcomp/amd64/reload.ml
Outdated
| Iintop_imm(Icomp _, _) ->
    (* The result must be in a register *)
    if stackp res.(0)
    then (arg, [|self#makereg res.(0)|])
    else (arg, res)
Similar suggestion:
- | Iintop_imm(Icomp _, _) ->
-     (* The result must be in a register *)
-     if stackp res.(0)
-     then (arg, [|self#makereg res.(0)|])
-     else (arg, res)
+ | Iintop_imm(Icomp _, _) ->
+     (* The result must be in a register (PR#11083) *)
+     (arg, self#makeregs res)
That code sequence looks like it's probably better (as you say, avoiding the partial reg stall is good). But the xor should be after the
Good point. But the xor changes the condition codes :-) A move should do, however:
I believe that
Hah, true! Perhaps we should have Incidentally, on amd64 there's no need to have a
There is some useful discussion here: https://stackoverflow.com/questions/33666617/what-is-the-best-way-to-set-a-register-to-zero-in-x86-assembly-xor-mov-or-and (see the section "Things are more complicated when you don't want to xor before a flag-setting instruction. ")
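The flag interaction raised in this exchange can be made concrete with a toy model. The sketch below is an illustration only, not anything from the OCaml backend: the `state` and `instr` types and the sequence names are hypothetical, it tracks just RAX and the zero flag, and it uses `sete` as one instance of `setxx`. It encodes three facts about x86: `xor rax, rax` zeroes the register but also sets ZF, `mov` leaves the flags alone, and `sete al` writes only the low byte.

```ocaml
(* Toy model of the x86 state that matters here: RAX and the zero flag. *)
type state = { rax : int; zf : bool }

type instr =
  | Xor_rax_rax        (* xor rax, rax: zeroes RAX and sets ZF *)
  | Mov_rax_0          (* mov rax, 0: zeroes RAX, flags untouched *)
  | Cmp of int * int   (* cmp a, b: ZF <- (a = b) *)
  | Sete_al            (* sete al: writes only the low byte of RAX *)

let step st = function
  | Xor_rax_rax -> { rax = 0; zf = true }
  | Mov_rax_0 -> { st with rax = 0 }
  | Cmp (a, b) -> { st with zf = (a = b) }
  | Sete_al ->
      let bit = if st.zf then 1 else 0 in
      { st with rax = (st.rax land (lnot 0xff)) lor bit }

(* Start with stale bits in RAX so that partial writes are visible. *)
let run prog =
  List.fold_left step { rax = 0x1234; zf = false } prog

(* Each sequence compares 1 with 2, so the expected boolean is 0. *)
let xor_first     = [Xor_rax_rax; Cmp (1, 2); Sete_al]
let xor_after_cmp = [Cmp (1, 2); Xor_rax_rax; Sete_al]
let mov_after_cmp = [Cmp (1, 2); Mov_rax_0; Sete_al]
let no_zeroing    = [Cmp (1, 2); Sete_al]
```

In this model `xor_first` and `mov_after_cmp` both leave 0 in RAX, `xor_after_cmp` leaves 1 because the xor overwrote the comparison's flags, and `no_zeroing` keeps the stale upper bits.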
Looks good to me. Thanks!
OK, this is merged on 4.14. Once CI is happy we'll backport to trunk.
Cherry-picked to trunk: c314da5
Fixes #11803.
I've made the PR against 4.14, as the bug report only mentioned 32-bit native code which is not supported on trunk, but I think this should be considered for trunk too.
I've checked that the issue reported in #11803 disappears with this patch.