@cl4es
Member

@cl4es cl4es commented Nov 3, 2023

https://github.com/cassioneri/eaf suggests this code for leap year calculation:

    public static boolean isLeap(long year) {
        int d = year % 100 != 0 ? 4 : 16;
        return (year & (d - 1)) == 0;
    }

.. with a claim this would compile down to branchless, easily pipelined code.

This doesn't currently happen with C2. In the meantime I think we can improve the current code in Year.isLeap and IsoChronology.isLeapYear by leveraging the fact that the % 100 check is only needed if (year & 15) != 0:

    public static boolean isLeap(long year) {
        return (year & 15) == 0 ? (year & 3) == 0 : (year & 3) == 0 && year % 100 != 0;
    }
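
To back the claim that the % 100 check is only needed when (year & 15) != 0, here is a minimal sketch that compares the expression above with the textbook Gregorian rule over a range of years. The class name LeapEquivalenceCheck, the helper textbookIsLeap and the chosen range are made up for illustration and are not part of the patch.

```java
final class LeapEquivalenceCheck {
    // Textbook Gregorian rule, used only as the reference (hypothetical helper).
    static boolean textbookIsLeap(long year) {
        return year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
    }

    // Expression proposed in this PR.
    static boolean prIsLeap(long year) {
        return (year & 15) == 0 ? (year & 3) == 0 : (year & 3) == 0 && year % 100 != 0;
    }

    // Counts the years in [from, to] for which the two rules disagree.
    static long countMismatches(long from, long to) {
        long mismatches = 0;
        for (long year = from; year <= to; year++) {
            if (textbookIsLeap(year) != prIsLeap(year)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(-1_000_000, 1_000_000));
    }
}
```

The two rules agree because a multiple of both 16 and 100 is a multiple of 400, so the run should report zero mismatches, including for negative years.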

Mac M1:

Name                           Cnt  Base   Error   Test   Error   Unit  Change
LeapYearBench.isLeapYear        15 0,743 ± 0,009  0,994 ± 0,005 ops/us   1,34x (p = 0,000*)
LeapYearBench.isLeapYearChrono  15 0,748 ± 0,006  0,991 ± 0,003 ops/us   1,32x (p = 0,000*)
LeapYearBench.isLeapYearNS      15 0,558 ± 0,026  0,552 ± 0,033 ops/us   0,99x (p = 0,602 )
  * = significant

Linux x64:

Name                           Cnt  Base   Error   Test   Error   Unit  Change
LeapYearBench.isLeapYear        15 0.534 ± 0.001  0.765 ± 0.004 ops/us   1.43x (p = 0.000*)
LeapYearBench.isLeapYearChrono  15 0.535 ± 0.000  0.753 ± 0.040 ops/us   1.41x (p = 0.000*)
LeapYearBench.isLeapYearNS      15 0.352 ± 0.000  0.351 ± 0.001 ops/us   1.00x (p = 0.000*)
  * = significant

30% higher throughput on M1, 40% on x64. isLeapYearNS runs a variant of the code from https://github.com/cassioneri/eaf ported to Java; perhaps the JIT can be improved to do whatever clang/gcc does here and achieve an even better speed-up.

Testing: so far only java/time/tck/java/time locally, will run a few tiers before filing an enhancement and opening the PR for review.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8319423: Improve Year.isLeap by checking divisibility by 16 (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16491/head:pull/16491
$ git checkout pull/16491

Update a local copy of the PR:
$ git checkout pull/16491
$ git pull https://git.openjdk.org/jdk.git pull/16491/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16491

View PR using the GUI difftool:
$ git pr show -t 16491

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16491.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Nov 3, 2023

👋 Welcome back redestad! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Nov 3, 2023

@cl4es The following labels will be automatically applied to this pull request:

  • core-libs
  • i18n

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added core-libs core-libs-dev@openjdk.org i18n i18n-dev@openjdk.org labels Nov 3, 2023
…ehave identically), add microbenchmark testing the variant that optimize well with gcc
@RogerRiggs
Contributor

Looks good. It probably needs a comment explaining why or a reference; otherwise it looks mysterious.

@cl4es cl4es changed the title Leap year optimization inspired by Neri & Schneider 8319423: Improve Year.isLeap by checking divisibility by 16 Nov 3, 2023
@cl4es cl4es marked this pull request as ready for review November 3, 2023 20:56
@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 3, 2023
@mlbridge

mlbridge bot commented Nov 3, 2023

Webrevs

@cl4es
Member Author

cl4es commented Nov 3, 2023

/label remove i18n

@openjdk openjdk bot removed the i18n i18n-dev@openjdk.org label Nov 3, 2023
@openjdk

openjdk bot commented Nov 3, 2023

@cl4es
The i18n label was successfully removed.

Member

@naotoj naotoj left a comment

LGTM. The logic can also apply to GregorianCalendar.isLeapYear().

@openjdk

openjdk bot commented Nov 3, 2023

@cl4es This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8319423: Improve Year.isLeap by checking divisibility by 16

Reviewed-by: naoto, rriggs

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 20 new commits pushed to the master branch:

  • cdf3373: 8319316: Clarify text around which layouts a linker supports
  • 1696603: 8308453: Convert JKS test keystores in test/jdk/javax/net/ssl/etc to PKCS12
  • b3126b6: 8319455: Test compiler/print/CompileCommandMemLimit.java times out
  • 1c2ea1d: 8319153: Fix: Class is a raw type in ProcessTools
  • 96e6e67: 4365952: Cannot disable JFileChooser
  • 2d4bbf4: 8319465: Typos in javadoc of com.sun.management.OperatingSystemMXBean methods
  • 8fb94fd: 8319379: G1: gc/logging/TestUnifiedLoggingSwitchStress.java crashes after JDK-8318894
  • b5c863b: 8316533: C2 compilation fails with assert(verify(phase)) failed: missing Value() optimization
  • 377138c: 8318959: C2: define MachNode::fill_new_machnode() statically
  • c146685: 8319165: hsdis binutils: warns on empty string as option string
  • ... and 10 more: https://git.openjdk.org/jdk/compare/ec79ab4b3cd89c2c0a9c8550cd62433bd6d45266...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 3, 2023
@plokhotnyuk
Contributor

@cl4es Could you please test this function too? It seems that when testing divisibility by 100 we can use just one multiplication operation.

@cl4es
Member Author

cl4es commented Nov 3, 2023

@cl4es Could you please test this function too? It seems that when testing divisibility by 100 we can use just one multiplication operation.

For int values it seems that makes it a few percent faster (1,061 ± 0,017 ops/us), though we need a variant that works for longs (GregorianCalendar could use this as-is, but not Year). Do you have a reference to how they arrived at these numbers? It might be straightforward to extend it to long values, and it'd be good to have the theory to reference either way.

@plokhotnyuk
Contributor

plokhotnyuk commented Nov 4, 2023

@cl4es Could you please test this function too? It seems that when testing divisibility by 100 we can use just one multiplication operation.

For int values it seems that makes it a few percent faster (1,061 ± 0,017 ops/us), though we need a variant that works for longs (GregorianCalendar could use this as-is, but not Year). Do you have a reference to how they arrived at these numbers? It might be straightforward to extend it to long values, and it'd be good to have the theory to reference either way.

Thanks for trying and giving the honest feedback!

I don't recall exactly, but it seems it came from playing with bool f(int n) { return n % 100 == 0; } on https://godbolt.org to see the assembly generated by the latest C++ compilers with the -O3 option, and then brute-forcing away a redundant shift operation.

I'm not sure that something similar is possible for long values, so it is probably worth having 2 different methods for the long and int types. I'm not even sure whether that function works for negative int values.

@merykitty
Member

I believe this is the theory; the formula there is not actually (x % 100) == 0 but (x % 25) == 0. For long values, the formula for (x % 25) == 0 would be x * -8116567392432202711 + 368934881474191032 u< 737869762948382064. As presented here.

// A year that is a multiple of 100, 200 and 300 is not divisible by 16, but 400 is.
// So for a year that's divisible by 4, checking that it's also divisible by 16
// is sufficient to determine it must be a leap year.
return (year & 15) == 0 ? (year & 3) == 0 : (year & 3) == 0 && year % 100 != 0;
Member

@merykitty merykitty Nov 4, 2023

I think (year & 3) == 0 && ((year & 15) == 0 || (year % 25) != 0) would be better simply because the common path will be a little bit shorter.

Member Author

Benchmark                           Mode  Cnt  Score   Error   Units
LeapYearBench.isLeapYear           thrpt   15  0,735 ± 0,004  ops/us
LeapYearBench.isLeapYearChrono     thrpt   15  0,734 ± 0,006  ops/us

So equal to or even slightly worse than baseline. I tested a few variants before submitting the PR - some that looked simpler or better - but the ternary variant in this PR always came out on top.

On top of my general comments made earlier, let me give my two cents on this line:

  return (year & 15) == 0 ? (year & 3) == 0 : (year & 3) == 0 && year % 100 != 0;

If (year & 15) == 0, then the last four bits of year are zeros. In particular, the last two bits of year are zeros, that is, (year & 3) == 0. The same conclusion can be obtained in terms of divisibility: if year % 16 == 0, i.e., year is a multiple of 16, then year % 4 == 0, i.e., year is a multiple of 4. What I'm trying to say is that the line above can be simplified to:

  return (year & 15) == 0 ? true : (year & 3) == 0 && year % 100 != 0;

But now it becomes clear that the above is also equivalent to:

  return (year & 15) == 0 || ((year & 3) == 0 && year % 100 != 0);

Which is the simplest form of all the above. It's possible, but I'm not sure, that the Java compiler makes this simplification for you. (FWIW: the GCC C compiler does. Indeed, as seen here the three expressions above generate exactly the same assembly instructions.)

As I explained in my earlier post, for this particular expression a further simplification, which lets the compiler (at least the C compiler above) save one instruction (ror edi 2), is to replace 100 with 25:

  return (year & 15) == 0 || ((year & 3) == 0 && year % 25 != 0);
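
The chain of simplifications above can be checked mechanically. The following is a minimal sketch, with class and method names invented for illustration, that confirms the ternary form, the short-circuit form and the % 25 form all return the same result over a range of years. It says nothing about which form the JIT compiles best.

```java
final class LeapSimplificationCheck {
    // Ternary form currently in the PR.
    static boolean ternary(long year) {
        return (year & 15) == 0 ? (year & 3) == 0 : (year & 3) == 0 && year % 100 != 0;
    }

    // Short-circuit form with the redundant (year & 3) test removed.
    static boolean shortCircuit100(long year) {
        return (year & 15) == 0 || ((year & 3) == 0 && year % 100 != 0);
    }

    // Same, with 100 replaced by 25; valid because year is already a multiple of 4 here.
    static boolean shortCircuit25(long year) {
        return (year & 15) == 0 || ((year & 3) == 0 && year % 25 != 0);
    }

    // True if all three forms agree for every year in [from, to].
    static boolean allAgree(long from, long to) {
        for (long year = from; year <= to; year++) {
            boolean expected = ternary(year);
            if (shortCircuit100(year) != expected || shortCircuit25(year) != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("all forms agree: " + allAgree(-1_000_000, 1_000_000));
    }
}
```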

@cassioneri

cassioneri commented Nov 4, 2023

Thanks for your interest in my work. I'd love to assist porting our algorithms to Java. Notice, however, that I'm not a Java programmer and I don't have the complete picture of what goes on in the JVM. What follows is based on my experience with C/C++ compilers but, I reckon, most of it might apply to Java as well.

When determining whether a year is a leap year, it's very reasonable to believe that checking divisibility by 4 first is the best strategy. As I said in my talk, virtually every implementation that I found made that assumption. However, this is not the case thanks to modern branch predictors, at least for x86_64 CPUs. My experiments showed that checking divisibility by 100 first is the best way:

    if (year % 100 != 0)
      return year % 4 == 0;
    return year % 400 == 0;

Mathematical results show that, in leap year calculations, it's correct to replace year % 400 == 0 with the cheaper expression year % 16 == 0 (on the path where year % 100 == 0 already holds). Except for compiler bugs, this should always be done. Hence, the implementation that @cl4es mentioned:

    public static boolean isLeap(long year) {
        int d = year % 100 != 0 ? 4 : 16;
        return (year & (d - 1)) == 0;
    }
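
The 400-to-16 substitution can be justified in a few lines. This short derivation was written for this thread rather than taken from the referenced talk, and it only covers the path where year % 100 == 0 already holds (y stands for year):

```latex
\begin{align*}
100 \mid y &\implies y = 100k = 4 \cdot 25 \cdot k \text{ for some integer } k, \\
16 \mid y &\iff 16 \mid 4 \cdot 25 \cdot k \iff 4 \mid 25k \iff 4 \mid k
  \quad \text{since } \gcd(25, 4) = 1, \\
400 \mid y &\iff 400 \mid 100k \iff 4 \mid k.
\end{align*}
```

Both conditions reduce to 4 dividing k, so on that path year % 16 == 0 and year % 400 == 0 are the same test.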

In my talk I also said that I did other optimisations around year % 100 == 0 but that was another story. Let me tell this story.

Similar mathematical arguments as above show that it's also correct to replace year % 100 == 0 with year % 25 == 0 and the latter requires one fewer assembly instruction. (I've also discussed this topic in a PR for the Rust implementation.) However, contrary to the case of 400-and-16, it's not always profitable to replace year % 100 == 0 with year % 25 == 0. It depends on whether the code executed by the CPU contains branches or not. (Despite the usage of the ternary operator ?, some CPUs support conditional moves which allow the elimination of some branches.)

If there's no branch, then this is probably the best option:

    public static boolean is_leap_v0(long year) {
      int d = year % 25 != 0 ? 4 : 16;
      return (year & (d - 1)) == 0;
    }

If there's a branch, then I'd expect this to perform better:

    public static boolean is_leap_v0(long year) {
      int d = year % 100 != 0 ? 4 : 16;
      return (year & (d - 1)) == 0;
    }

The reason is hinted in my talk: If there's a branch, the branch predictor can do a better job when execution is split into paths with (1/100, 99/100) = (1%, 99%) probability distribution than when the distribution is (1/25, 24/25) = (4%, 96%).
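
The two probability splits quoted above can be reproduced by counting over one 400-year Gregorian cycle. This is a minimal sketch with an invented class name, and it assumes years are uniformly distributed, which real workloads may well not be.

```java
final class BranchSplitCount {
    // Counts the years in one 400-year cycle, 1..400, that are divisible by the given divisor.
    static int countDivisible(int divisor) {
        int count = 0;
        for (int year = 1; year <= 400; year++) {
            if (year % divisor == 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 4 of 400 years take the "% 100 == 0" path, 16 of 400 take the "% 25 == 0" path.
        System.out.println("divisible by 100: " + countDivisible(100) + " of 400");
        System.out.println("divisible by 25:  " + countDivisible(25) + " of 400");
    }
}
```

Four in 400 is the 1% figure and sixteen in 400 is the 4% figure, so the % 100 test gives the branch predictor the more lopsided, and therefore more predictable, split.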

Looking at the bytecode for the 4 different implementations floated in this discussion, I see some gotos but I can't tell if, in assembly, they translate to branches or conditional moves. To my eyes, which are more familiar with C, the C versions of the same implementations suggest that the branchless is_leap_v0 is the best. But I ask you not to trust my eyes or my intuition and, instead, to measure the performance of the several alternatives. This old SO post might also be helpful.

I can also shed some light on the magical numbers that appear in the code that check divisibility by 100 and 25 by pointing to my series of articles on this topic:

I hope this helps.
Cassio.

Comment on lines +46 to +48
return (gregorianYear & 15) == 0
? (gregorianYear & 3) == 0
: (gregorianYear & 3) == 0 && gregorianYear % 100 != 0;

Similarly,

Suggested change
return (gregorianYear & 15) == 0
? (gregorianYear & 3) == 0
: (gregorianYear & 3) == 0 && gregorianYear % 100 != 0;
return ((gregorianYear & 15) == 0) || ((gregorianYear & 3) == 0 && gregorianYear % 25 != 0);

@cl4es
Member Author

cl4es commented Nov 5, 2023

Thank you for your comments, @cassioneri

For reference I included the following variant in the isLeapYearNS microbenchmark:

        int d = year % 100 != 0 ? 4 : 16;
        return (year & (d - 1)) == 0;

and as shown above this underperforms even the baseline implementation when executed on the HotSpot JVM with C2: 0,558 ± 0,026 ops/us vs 0,743 ± 0,009 for the baseline implementation. My experiments then explored how we could trick the C2 JIT to improve over the baseline, and the patch suggested in this PR is what I ended up with.

This might suggest a deficiency in the C2 JIT rather than an issue with your code. But let's dig a bit deeper and analyze the ASM generated on x86.

With isLeapYearNS we seem to generate something like this for the actual calculation:

   ...
   0.88%   │  │││ ││││││  0x00007f603062d39c:   movabs $0xa3d70a3d70a3d70b,%rax
   1.47%   │  │││ ││││││  0x00007f603062d3a6:   imul   %rbp
   1.29%   │  │││ ││││││  0x00007f603062d3a9:   add    %rbp,%rdx
   1.23%   │  │││ ││││││  0x00007f603062d3ac:   mov    %rbp,%rcx
   0.41%   │  │││ ││││││  0x00007f603062d3af:   sar    $0x3f,%rcx
   0.43%   │  │││ ││││││  0x00007f603062d3b3:   sar    $0x6,%rdx
   1.63%   │  │││ ││││││  0x00007f603062d3b7:   sub    %rcx,%rdx
   2.70%   │  │││ ││││││  0x00007f603062d3ba:   imul   $0x64,%rdx,%rcx
   9.91%   │  │││ ││││││  0x00007f603062d3be:   sub    %rcx,%rbp
   ...

.. so, some sequence of multiplications, shifts and subtractions, not dissimilar to the code you'd expect from Lemire's calculus.

Testing the code in this PR, the inner calculation becomes... exactly the same?

   0.70%   │  │││ ││││││  0x00007f8fe062ad1c:   movabs $0xa3d70a3d70a3d70b,%rax
   1.37%   │  │││ ││││││  0x00007f8fe062ad26:   imul   %rbp
   2.03%   │  │││ ││││││  0x00007f8fe062ad29:   add    %rbp,%rdx
   1.80%   │  │││ ││││││  0x00007f8fe062ad2c:   mov    %rbp,%rcx
   0.59%   │  │││ ││││││  0x00007f8fe062ad2f:   sar    $0x3f,%rcx
   0.66%   │  │││ ││││││  0x00007f8fe062ad33:   sar    $0x6,%rdx
   1.66%   │  │││ ││││││  0x00007f8fe062ad37:   sub    %rcx,%rdx
   2.27%   │  │││ ││││││  0x00007f8fe062ad3a:   imul   $0x64,%rdx,%rcx
  10.05%   │  │││ ││││││  0x00007f8fe062ad3e:   sub    %rcx,%rbp

Yet faster... The difference seems to boil down to how the JIT is better able to unroll and vectorize the benchmark loop with my PR'd code. While not an irrelevant property, this means the improvement I'm seeing might be a bit overstated for more typical cases, and I'll need to see what happens if I set the microbenchmark up not to inline and unroll heavily.

@cassioneri

cassioneri commented Nov 5, 2023

Thanks @cl4es for the ASM listing. Now I can understand better what is going on at the very low level.

The compiler is replacing y % 100 == 0 with y == 100 * (y / 100) and is using the traditional Granlund-Montgomery (GM) optimisation for the quotient q = y / 100, that is, q is obtained as (y * M) >> 70 where M = ceil(2^70/100). This is confirmed by the appearance of the hex constant 0xa3d70a3d70a3d70b (this is M) and the two imul instructions (y * M and 100 * q). (This is not Daniel Lemire's technique but related.)
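
To make the mapping between the listing and the description concrete, here is a minimal Java sketch of the same sequence. It assumes Java 9 or later for Math.multiplyHigh. The class and method names are illustrative, and this is a hand-written mirror of the assembly, not code taken from C2.

```java
final class GranlundMontgomerySketch {
    // M = ceil(2^70 / 100); read as a signed long, this constant is negative.
    static final long M = 0xa3d70a3d70a3d70bL;

    // Quotient y / 100 computed with a multiply-high instead of a division.
    static long div100(long y) {
        // imul + add: the signed high half, plus y to compensate for M being read as negative.
        long hi = Math.multiplyHigh(y, M) + y;
        // sar 6 completes the shift by 70; subtracting (y >> 63) adds 1 for negative y,
        // turning the floor into Java's truncation towards zero.
        return (hi >> 6) - (y >> 63);
    }

    // y % 100 == 0 expressed as y == 100 * (y / 100), costing a second multiplication.
    static boolean divisibleBy100(long y) {
        return y - 100 * div100(y) == 0;
    }

    public static void main(String[] args) {
        System.out.println(div100(2017) + " " + divisibleBy100(2100));
    }
}
```

The second multiplication in divisibleBy100 is the imul $0x64 in the listing, and it is the one that the single-multiplication technique discussed next avoids.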

There's a more recent technique that can give y % 100 == 0 with just one multiplication (again, it's not Daniel Lemire's), which gcc started using in v9. See the difference here. This is what I called "minverse" in my first article on quick modular calculations (referred to previously).

I think it would be very useful if the Java compiler also implemented this technique. Not only for this piece of code but for any time that n % d == 0 with constant d appears in source code.

For this particular piece of code though, one could manually perform the minverse optimisation. Recall from my previous posts that in leap year calculation, one can replace 100 by 25. This is helpful since, as seen in the link above, the latter doesn't need the ror instruction (which doesn't have an equivalent in C/C++ and, I suppose, Java.) Hence, y % 100 == 0 can be replaced by y % 25 == 0 which, in turn, can be replaced by

(y * 0x8F5C28F5C28F5C29 + 0x51EB851EB851EB8) < 0xA3D70A3D70A3D70.

(I'm deducing this code from the godbolt link and I hope I got it right but, if you're interested I can double check another time.) It's very important for the calculations above to be done in unsigned arithmetic. I don't know how this is done in Java but, FWIW, in C it would be:

(((uint64_t) y) * 0x8F5C28F5C28F5C29ull + 0x51EB851EB851EB8ull) < 0xA3D70A3D70A3D70ull;
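
In Java, long multiplication and addition wrap modulo 2^64 just as unsigned arithmetic does, so only the comparison needs unsigned treatment, which Long.compareUnsigned provides. The following is a minimal sketch using the constants quoted above. The class and method names are illustrative, and it has been checked here only for year-sized inputs, not for the full long range.

```java
final class MinverseDivisibility {
    // y % 25 == 0 with a single multiplication, using the constants quoted in this thread.
    static boolean divisibleBy25(long y) {
        // Wrapping multiply and add behave identically for signed and unsigned operands.
        long t = y * 0x8F5C28F5C28F5C29L + 0x51EB851EB851EB8L;
        // Only the comparison has to be done as unsigned.
        return Long.compareUnsigned(t, 0xA3D70A3D70A3D70L) < 0;
    }

    // Leap year test built on top of it (hypothetical combination, not the code in this PR).
    static boolean isLeap(long year) {
        return (year & 15) == 0 || ((year & 3) == 0 && !divisibleBy25(year));
    }

    public static void main(String[] args) {
        System.out.println(divisibleBy25(2100) + " " + isLeap(2100) + " " + isLeap(2000));
    }
}
```

Whether C2 turns this into faster code than the existing % 100 sequence would still have to be measured, for instance by adding it to the microbenchmark.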

@cl4es
Member Author

cl4es commented Nov 6, 2023

Yes, seems Granlund & Montgomery is used, see

// Attempt the jint constant divide -> multiply transform found in

I explored some more with micros that don't loop over different values but instead test all the year variants of interest in isolation:

Benchmark                          (year)  Mode  Cnt  Score   Error  Units
LeapYearBench.Single.isLeapYear      2000  avgt   15  0,590 ± 0,053  ns/op
LeapYearBench.Single.isLeapYear      2017  avgt   15  0,586 ± 0,030  ns/op
LeapYearBench.Single.isLeapYear      2004  avgt   15  0,936 ± 0,002  ns/op
LeapYearBench.Single.isLeapYear      2100  avgt   15  0,937 ± 0,002  ns/op
LeapYearBench.Single.isLeapYearNS    2000  avgt   15  2,117 ± 0,028  ns/op
LeapYearBench.Single.isLeapYearNS    2017  avgt   15  2,114 ± 0,025  ns/op
LeapYearBench.Single.isLeapYearNS    2004  avgt   15  2,115 ± 0,019  ns/op
LeapYearBench.Single.isLeapYearNS    2100  avgt   15  2,111 ± 0,019  ns/op

When isolated like this, the suggested (v & 15) == 0-first approach still wins, and the generated code across the tests appears to be about as branchy.

I suggest we go ahead and integrate this, file an RFE to re-examine the division-by-constant in C2, then re-evaluate these isLeapYear micros in that new environment.

@merykitty
Member

.. with a claim this would compile down to branchless, easily pipelined code.

This doesn't currently happen with C2.

I have filed JDK-8319451. I would suggest waiting for this bug to be resolved before proceeding with this PR.

@cl4es
Member Author

cl4es commented Nov 6, 2023

I have filed JDK-8319451. I would suggest waiting for this bug to be resolved before proceeding with this PR.

Nice analysis!

While I'm sure we need to re-evaluate this enhancement after JDK-8319451 is resolved, I'm not a fan of blocking library enhancements on improvements to the runtime/compiler (as it's impossible to know up front if this is something we can fix in the next couple of days/weeks, or need to staff, plan and evaluate over a longer cycle). I included the Neri-Schneider variant in the microbenchmark here to make it easy to assess if an optimization such as JDK-8319451 would turn things around.

We should file an enhancement to revisit the Granlund & Montgomery/Hacker's Delight division. Improving this should benefit either variant, and might be needed together with JDK-8319451 for the isLeapYearNS test to win.

@cl4es
Member Author

cl4es commented Nov 6, 2023

Filed https://bugs.openjdk.org/browse/JDK-8319526 to re-examine the integer remainder optimization in C2.

@RogerRiggs
Contributor

I suggest we go ahead and integrate this, file an RFE to re-examine the division-by-constant in C2, then re-evaluate these isLeapYear micros in that new environment.

These are good improvements and are beneficial with or without other fixes, so should go ahead independently.

@cl4es
Member Author

cl4es commented Nov 8, 2023

/integrate

@openjdk

openjdk bot commented Nov 8, 2023

Going to push as commit 7d25f1c.
Since your change was applied there have been 57 commits pushed to the master branch:

  • 59e9981: 8319376: ParallelGC: Forwarded objects found during heap inspection
  • 7c7f8ea: 8319456: jdk/jfr/event/gc/collection/TestGCCauseWith[Serial|Parallel].java : GC cause 'GCLocker Initiated GC' not in the valid causes
  • 7bc8e4c: 8294980: test/jdk/java/lang/invoke 15 test classes use experimental bytecode library
  • e841897: 8319374: JFR: Remove instrumentation for exception events
  • cd9719b: 8319306: Serial: Remove TenuredSpace::verify
  • 1e687b4: 8316719: C2 compilation still fails with "bad AD file"
  • 8555e0f: 8319318: bufferedStream fixed case can be removed
  • 73c5f60: 8319556: Harmonize interface formatting in the FFM API
  • cc4b0d9: 8319378: Spec for j.util.Timer::purge and j.util.Timer::cancel could be improved
  • a290256: 8315680: java/lang/ref/ReachabilityFenceTest.java should run with -Xbatch
  • ... and 47 more: https://git.openjdk.org/jdk/compare/ec79ab4b3cd89c2c0a9c8550cd62433bd6d45266...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Nov 8, 2023
@openjdk openjdk bot closed this Nov 8, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Nov 8, 2023
@openjdk

openjdk bot commented Nov 8, 2023

@cl4es Pushed as commit 7d25f1c.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@mlbridge

mlbridge bot commented Nov 8, 2023

Mailing list message from Lothar Kimmeringer on i18n-dev:

On 03.11.2023 at 22:01, Claes Redestad wrote:

 public static boolean isLeap(long year) {
     return (year & 15) == 0 ? (year & 3) == 0 : (year & 3) == 0 && year % 100 != 0;
 }

Not sure if this has any effect in terms of performance but not being a
fan of duplicated code, a variant would be

return (year & 3) == 0 && ((year & 15) == 0 || year % 100 != 0);

Cheers, Lothar

@cl4es
Member Author

cl4es commented Nov 9, 2023

return (year & 3) == 0 && ((year & 15) == 0 || year % 100 != 0);

I tried this and many other variants but the one in this PR came out on top, and it even seemed the additional redundancy helped the JIT. This might be due to a deficiency in how C2 handles conditional moves, so I stuck to the one that generated the seemingly best code layout with a plan to re-evaluate (and hopefully simplify) this once https://bugs.openjdk.org/browse/JDK-8319451 is resolved. There's already a PR out #16524 so hopefully we can revisit this soon.

Labels

core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated

6 participants