x86: Force result of Icomp to be in a register #11808

Merged: xavierleroy merged 3 commits into ocaml:4.14 on Dec 14, 2022

@lthls commented Dec 12, 2022

Fixes #11803.
I've made the PR against 4.14, as the bug report only mentioned 32-bit native code which is not supported on trunk, but I think this should be considered for trunk too.
I've checked that the issue reported in #11803 disappears with this patch.

@xavierleroy left a comment

Looks good to me. A suggestion for shorter code below. To be copy-pasted to i386 if you adopt it.

An alternative that I considered is to force the result of Icomp to be in the RAX register, and emit code like this:

  xor rax, rax
  cmp ...
  setxx al

This avoids a partial register stall on writing to AL, which could be good for performance. However, if the result cannot be held in RAX, an extra move will be generated by the register allocator.

Comment on lines 76 to 81
(* The result must be a register *)
let res =
  if stackp res.(0)
  then [|self#makereg res.(0)|]
  else res
in

makereg is a no-op if the argument is already a register, and there's makeregs to handle arrays of regs. Proposed simplification:

Suggested change
- (* The result must be a register *)
- let res =
-   if stackp res.(0)
-   then [|self#makereg res.(0)|]
-   else res
- in
+ (* The result must be a register (PR#11083) *)
+ let res = self#makeregs res in

@lthls

I didn't think of calling makereg even in the register case, but I didn't use makeregs because it wasn't exported. I assume that there's no harm in exporting it though, so I've pushed a patch with your suggestion.
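
The behaviour relied on in this thread (makereg leaves a register alone, and makeregs applies it across an array) can be illustrated with a small standalone model. This is only a sketch: the `loc` type, the `next_reg` counter, and the `is_reg` helper are hypothetical stand-ins, not the compiler's actual Reg or Reloadgen definitions.

```ocaml
(* Toy model only: hypothetical stand-ins, not the compiler's own types. *)
type loc = Reg of int | Stack of int

(* Counter used to hand out fresh register numbers (an assumption of
   this sketch; the real allocator works differently). *)
let next_reg = ref 100

(* Mirrors the makereg described above: a register is returned unchanged,
   a stack slot is replaced by a fresh register. *)
let makereg l =
  match l with
  | Reg _ -> l
  | Stack _ ->
      let r = !next_reg in
      incr next_reg;
      Reg r

(* Array version: apply makereg to every element. *)
let makeregs ls = Array.map makereg ls

(* Helper for checking results. *)
let is_reg = function Reg _ -> true | Stack _ -> false
```

Under this model, `if stackp res.(0) then [|makereg res.(0)|] else res` and `makeregs res` give the same answer for a one-element array, which is why the shorter form is a safe replacement.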

Comment on lines 95 to 99
| Iintop_imm(Icomp _, _) ->
    (* The result must be in a register *)
    if stackp res.(0)
    then (arg, [|self#makereg res.(0)|])
    else (arg, res)

Similar suggestion:

Suggested change
- | Iintop_imm(Icomp _, _) ->
-     (* The result must be in a register *)
-     if stackp res.(0)
-     then (arg, [|self#makereg res.(0)|])
-     else (arg, res)
+ | Iintop_imm(Icomp _, _) ->
+     (* The result must be in a register (PR#11083) *)
+     (arg, self#makeregs res)

@stedolan

> An alternative that I considered is to force the result of Icomp to be in the RAX register, and emit code like this:

That code sequence looks like it's probably better (as you say, avoiding the partial reg stall is good). But the xor should be after the cmp, because one of the arguments can arrive in RAX.

@xavierleroy

> But the xor should be after the cmp, because one of the arguments can arrive in RAX.

Good point. But the xor changes the condition codes :-) A move should do, however:

  cmp ...
  movl $0, eax
  setxx al
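
The ordering constraints discussed so far can be checked with a tiny simulation. This is a sketch under simplifying assumptions: the instruction type, the single `zf` flag, and the register names are hypothetical, and only the equality case of setcc is modelled.

```ocaml
(* Toy simulation; the instruction set and semantics are simplified
   assumptions made for this sketch. *)
type instr =
  | Cmp of string * string  (* zf := (a = b) *)
  | Xor_self of string      (* reg := 0, and the flags are overwritten *)
  | Mov_zero of string      (* reg := 0, flags untouched *)
  | Sete of string          (* writes only the low 8 bits of reg *)

type state = { regs : (string * int) list; zf : bool }

let get st r = List.assoc r st.regs
let set st r v = { st with regs = (r, v) :: List.remove_assoc r st.regs }

let step st = function
  | Cmp (a, b) -> { st with zf = (get st a = get st b) }
  | Xor_self r -> { (set st r 0) with zf = true }
  | Mov_zero r -> set st r 0
  | Sete r ->
      let low = if st.zf then 1 else 0 in
      set st r (((get st r) land (lnot 0xff)) lor low)

let run st prog = List.fold_left step st prog

(* One argument arrives in rax; the values differ, so the right answer is 0. *)
let init = { regs = [ ("rax", 0x1205); ("rbx", 0) ]; zf = false }
let result prog = get (run init prog) "rax"

let xor_first     = [ Xor_self "rax"; Cmp ("rax", "rbx"); Sete "rax" ]
let xor_after_cmp = [ Cmp ("rax", "rbx"); Xor_self "rax"; Sete "rax" ]
let mov_after_cmp = [ Cmp ("rax", "rbx"); Mov_zero "rax"; Sete "rax" ]
let no_zeroing    = [ Cmp ("rax", "rbx"); Sete "rax" ]
```

In this model `xor_first` destroys the argument before the comparison, `xor_after_cmp` destroys the flags before setcc reads them, and `no_zeroing` leaves stale upper bits in the result; only `mov_after_cmp` yields 0.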

@smuenzel

>   cmp ...
>   movl $0, eax
>   setxx al

I believe that movl $0 is not one of the zeroing idioms recognized by most processors, meaning that the instruction has to be executed by the processor. xor, by contrast, is eliminated at register rename, so it does not consume execution resources.

@stedolan

> But the xor changes the condition codes :-)

Hah, true! Perhaps we should have Emit choose xor; cmp; setcc when the result does not alias the input, and cmp; mov; setcc when it does?

Incidentally, on amd64 there's no need to have a mov at the end, since setcc can reach the low 8 bits of every integer register. (This is not true on i386 where the 8-bit registers are weird)
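
The selection rule proposed here could look roughly like the following. This is a hypothetical sketch, not the compiler's Emit code: registers are plain strings, instructions are rendered as text, and the function name `emit_setcc` and its labelled parameters are invented for illustration.

```ocaml
(* Hypothetical sketch: not the compiler's Emit module. [res] is the full
   result register, [res8] its low 8-bit name, [arg1]/[arg2] the operands. *)
let emit_setcc ~cond ~res ~res8 ~arg1 ~arg2 =
  let cmp = Printf.sprintf "cmp %s, %s" arg1 arg2 in
  let set = Printf.sprintf "set%s %s" cond res8 in
  if res <> arg1 && res <> arg2 then
    (* Result does not alias an input: zero it first with the cheap idiom. *)
    [ Printf.sprintf "xor %s, %s" res res; cmp; set ]
  else
    (* Result aliases an input: compare first, then zero without
       touching the flags. *)
    [ cmp; Printf.sprintf "mov %s, 0" res; set ]
```

For example, `emit_setcc ~cond:"e" ~res:"rax" ~res8:"al" ~arg1:"rax" ~arg2:"rcx"` takes the second branch because the result register is also the first operand.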

@smuenzel

There is some useful discussion here: https://stackoverflow.com/questions/33666617/what-is-the-best-way-to-set-a-register-to-zero-in-x86-assembly-xor-mov-or-and (see the section "Things are more complicated when you don't want to xor before a flag-setting instruction. ")

@xavierleroy left a comment

Looks good to me. Thanks!

@xavierleroy merged commit aca252f into ocaml:4.14 on Dec 14, 2022
@xavierleroy

OK, this is merged on 4.14. Once CI is happy we'll backport to trunk.

xavierleroy pushed a commit that referenced this pull request Dec 14, 2022
Also: export `makeregs` from the Reloadgen interface.

Fixes: #11803

(cherry picked from commit aca252f)
@xavierleroy

Cherry-picked to trunk: c314da5

@xavierleroy

@stedolan @smuenzel : I pushed the quick fix for issue #11803, but you're welcome to try and improve the x86-64 code generated for Icomp instructions. In my opinion, it's not performance-critical, but it's a good reminder of how awful the x86 ISA is :-)

stedolan pushed a commit to stedolan/ocaml that referenced this pull request Mar 21, 2023
aca252f x86: Force result of Icomp to be in a register (ocaml#11808)