Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent integer width conversion in caml_int_compare #2250

Merged
merged 7 commits into from Mar 23, 2019

Conversation

@smuenzel-js
Copy link
Contributor

commented Feb 15, 2019

Previously, in 64bit mode, res was computed as 32bits, and then extended to 64bits.
This patch eliminates the sign-extension instruction.

@gasche

This comment has been minimized.

Copy link
Member

commented Feb 15, 2019

Could you include a comment in the code to record the intent in this change?

@xclerc

This comment has been minimized.

Copy link
Contributor

commented Feb 15, 2019

This code pattern has a number of occurrences,
e.g. caml_int32_compare_unboxed.
Shouldn't we update all of them? (Possibly by
defining a macro.)

@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Feb 15, 2019

Does this change improves performance in a measurable way? int32 -> int64 conversions are extremely cheap, and the proposed code is less readable than the current code... I'm not asking for big speedups, just a hint that it makes a non-negligible difference.

@smuenzel-js

This comment has been minimized.

Copy link
Contributor Author

commented Feb 18, 2019

I've updated the code to use a macro, and changed all the other comparisons that had this issue.

Benchmark results on a loop that adds up the results of comparisons:

New: 
./a.out  15.00s user 0.00s system 99% cpu 15.011 total
./a.out  14.86s user 0.01s system 99% cpu 14.874 total
./a.out  14.94s user 0.00s system 99% cpu 14.947 total

Old:
./a.out  17.76s user 0.01s system 99% cpu 17.767 total
./a.out  17.30s user 0.00s system 99% cpu 17.316 total
./a.out  18.09s user 0.01s system 99% cpu 18.100 total

This is the program:

let do_compare (x : int) (y : int) =
  let x = Sys.opaque_identity x in
  let y = Sys.opaque_identity y in
  compare x y

let () =
  let result = ref 0 in
  for i = 0 to 4_000_000_000 do
    result := !result + (do_compare 432 5);
    result := !result + (do_compare 5 432);
    result := !result + (do_compare 432 432);
  done;
  print_int !result

The difference in the two functions is clear when you look at the disassembly.
Previously:

XDIS 0000000000418630: BASE 31C0                     xor eax, eax
XDIS 0000000000418632: BASE 4839F7                   cmp rdi, rsi
XDIS 0000000000418635: BASE 0F9FC0                   setnle al
XDIS 0000000000418638: BASE 31D2                     xor edx, edx
XDIS 000000000041863a: BASE 4839F7                   cmp rdi, rsi
XDIS 000000000041863d: BASE 0F9CC2                   setl dl
XDIS 0000000000418640: BASE 29D0                     sub eax, edx
XDIS 0000000000418642: LONGMODE 4898                     cdqe 
XDIS 0000000000418644: BASE 488D440001               lea rax, ptr [rax+rax*1+0x1]
XDIS 0000000000418649: BASE C3                       ret 

With this patch:

XDIS 0000000000418430: BASE 31C0                     xor eax, eax
XDIS 0000000000418432: BASE 4839F7                   cmp rdi, rsi
XDIS 0000000000418435: BASE 0F9CC2                   setl dl
XDIS 0000000000418438: BASE 0F9FC0                   setnle al
XDIS 000000000041843b: BASE 0FB6D2                   movzx edx, dl
XDIS 000000000041843e: BASE 4829D0                   sub rax, rdx
XDIS 0000000000418441: BASE 488D440001               lea rax, ptr [rax+rax*1+0x1]
XDIS 0000000000418446: BASE C3                       ret 
@xclerc

This comment has been minimized.

Copy link
Contributor

commented Feb 18, 2019

Thanks for the changes; Is there a reason to not apply it
to {int32,int64,nativeint}_cmp?

@xclerc

This comment has been minimized.

Copy link
Contributor

commented Feb 18, 2019

Answering my own question: the aforementioned functions
do not need to be modified because they return an int.

Copy link
Contributor

left a comment

Thanks for the benchmark numbers, which are convincing at least for x86.

Some suggestions below from someone who doesn't like macros.

intnat equal_f = f == f;
intnat equal_g = g == g;
intnat res = greater - less + equal_f - equal_g;
return res;
}

This comment has been minimized.

Copy link
@xavierleroy

xavierleroy Feb 18, 2019

Contributor

Do you really need to write it one asm instruction per line? What's wrong with

(intnat)(f > g) - (intnat)(f < g) + (intnat)(f == f) - (intnat)(g == g)
intnat __compare_greater = __compare_v1 > __compare_v2; \
intnat __compare_less = __compare_v1 < __compare_v2; \
intnat result = __compare_greater - __compare_less

This comment has been minimized.

Copy link
@xavierleroy

xavierleroy Feb 18, 2019

Contributor

I'm not a fan of macros that declare variables in the current scope. (As opposed to within a block.) I'm not sure macros are needed anyway, see below.

This comment has been minimized.

Copy link
@smuenzel-js

smuenzel-js Feb 19, 2019

Author Contributor

A macro is nice because there's one canonical reference for the code, not a couple of places where it's inlined. That helps with maintenance -- only one place to change, and it's also discoverable by those new to the codebase.
And we also don't have to document the reasoning for the code in more than one place.

I'd be happy to change the style of the macro, but I'm not exactly sure how.
One way could be make the entire function the macro, and have something like
DECLARE_COMPARE_UNBOXED(caml_int32_compare_unboxed,int32_t)
Another would be to take the one-liner you propose below, and make that into a macro.

This comment has been minimized.

Copy link
@xclerc

xclerc Feb 22, 2019

Contributor

The suggestion, if I understand it correctly, was to do something akin to:

#define COMPARE_INT(type, v1, v2, result) \
  { type __compare_v1 = v1; \
    type __compare_v2 = v2; \
    intnat __compare_greater = __compare_v1 > __compare_v2; \
    intnat __compare_less    = __compare_v1 < __compare_v2; \
    intnat result = __compare_greater - __compare_less }

following the style of e.g. https://github.com/ocaml/ocaml/blob/trunk/runtime/bigarray.c#L195.

I think it is a bit safer/better than your change in 1df410b, because
we are sure the expressions passed to the macro are not evaluated
mutilple times.

@@ -126,7 +134,7 @@ CAMLprim value caml_bswap16(value v)

CAMLprim value caml_int_compare(value v1, value v2)
{
int res = (v1 > v2) - (v1 < v2);
COMPARE_INT(value, v1, v2, res);
return Val_int(res);

This comment has been minimized.

Copy link
@xavierleroy

xavierleroy Feb 18, 2019

Contributor

res is of type intnat, so it should be Val_long(res).

@@ -126,7 +134,7 @@ CAMLprim value caml_bswap16(value v)

CAMLprim value caml_int_compare(value v1, value v2)
{
int res = (v1 > v2) - (v1 < v2);
COMPARE_INT(value, v1, v2, res);

This comment has been minimized.

Copy link
@xavierleroy

xavierleroy Feb 18, 2019

Contributor

Macro-less alternative:

   intnat res = (intnat)(v1 > v2) - (intnat)(v1 < v2);
@@ -314,7 +322,8 @@ CAMLprim value caml_int32_to_float(value v)

intnat caml_int32_compare_unboxed(int32_t i1, int32_t i2)
{
return (i1 > i2) - (i1 < i2);
COMPARE_INT(int32_t, i1, i2, res);

This comment has been minimized.

Copy link
@xavierleroy

xavierleroy Feb 18, 2019

Contributor

Likewise.

@@ -562,7 +571,8 @@ CAMLprim value caml_int64_to_nativeint(value v)

intnat caml_int64_compare_unboxed(int64_t i1, int64_t i2)
{
return (i1 > i2) - (i1 < i2);
COMPARE_INT(int64_t,i1,i2,res);

This comment has been minimized.

Copy link
@xavierleroy

xavierleroy Feb 18, 2019

Contributor

Likewise.

@@ -824,7 +834,8 @@ CAMLprim value caml_nativeint_to_int32(value v)

intnat caml_nativeint_compare_unboxed(intnat i1, intnat i2)
{
return (i1 > i2) - (i1 < i2);
COMPARE_INT(intnat, i1, i2, res);

This comment has been minimized.

Copy link
@xavierleroy

xavierleroy Feb 18, 2019

Contributor

Likewise.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Feb 22, 2019

I liked it better before the last commit. Could we please stop suggesting things and pushing code, and instead proceed with a proper review? Which means that the code should not change every day.

@smuenzel-js smuenzel-js force-pushed the smuenzel-js:patch-1 branch from b00717b to b5b1e28 Mar 4, 2019
smuenzel-js added 4 commits Feb 15, 2019
Previously, in 64bit mode, res was computed as 32bits, and then extended to 64bits.
This patch eliminates the sign-extension instruction.
@smuenzel-js smuenzel-js force-pushed the smuenzel-js:patch-1 branch from b5b1e28 to 37abcf1 Mar 4, 2019
@smuenzel-js

This comment has been minimized.

Copy link
Contributor Author

commented Mar 4, 2019

@xavierleroy I've reverted the latest change.

xavierleroy added 2 commits Mar 23, 2019
The house style is one space after comma.
Copy link
Contributor

left a comment

I pushed directly some small changes (documentation comment, spacing). Looks OK to me.

@xavierleroy xavierleroy merged commit 6efe8fe into ocaml:trunk Mar 23, 2019
0 of 2 checks passed
0 of 2 checks passed
continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.