-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8319423: Improve Year.isLeap by checking divisibility by 16 #16491
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Welcome back redestad! A progress list of the required criteria for merging this PR into |
…ehave identically), add microbenchmark testing the variant that optimize well with gcc
|
Looks good. It probably needs a comment explaining why or a reference; otherwise it looks mysterious. |
Webrevs
|
|
/label remove i18n |
|
@cl4es |
naotoj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The logic can also apply to GregorianCalendar.isLeapYear().
|
@cl4es This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 20 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
For int values it seems that makes it a few percent faster (1,061 ± 0,017 ops/us), though we need a variant that works for longs ( |
Thanks for trying and giving the honest feedback! I don't recall exactly but it seems was a kind of playing with I'm not sure that similar is possible for |
| // A year that is a multiple of 100, 200 and 300 is not divisible by 16, but 400 is. | ||
| // So for a year that's divisible by 4, checking that it's also divisible by 16 | ||
| // is sufficient to determine it must be a leap year. | ||
| return (year & 15) == 0 ? (year & 3) == 0 : (year & 3) == 0 && year % 100 != 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think (year & 3) == 0 && ((year & 15) == 0) || (year % 25) != 0 would be better simply because the common path will be a little bit shorter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Benchmark Mode Cnt Score Error Units
LeapYearBench.isLeapYear thrpt 15 0,735 ± 0,004 ops/us
LeapYearBench.isLeapYearChrono thrpt 15 0,734 ± 0,006 ops/us
So equal to or even slightly worse than baseline. I tested a few variants before submitting the PR - some that looked simpler or better - but the ternary variant in this PR always came out on top.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On top of my general comments made earlier, let me give my two cents on this line:
return (year & 15) == 0 ? (year & 3) == 0 : (year & 3) == 0 && year % 100 != 0;
If (year & 15) == 0, then the last four bits of year are zeros. In particular, the last two bits of year are zeros, that is, (year & 3) == 0. The same conclusion can be obtained in terms of divisibility: if year % 16 == 0, i.e., year is a multiple of 16, then year % 4 == 0, i.e., year is multiple of 4. What I'm trying to say is that the line above can be simplified to:
return (year & 15) == 0 ? true : (year & 3) == 0 && year % 100 != 0;
But now it becomes clear that the above is also equivalent to:
return (year & 15) == 0 || ((year & 3) == 0 && year % 100 != 0);
Which is the simplest form of all the above. It's possible, but I'm not sure, that the Java compiler makes this simplification for you. (FWIW: the GCC C compiler does. Indeed, as seen here the three expressions above generate exactly the same assembly instructions.)
As I explained in my earlier post, for this particular expression a further simplification, that makes the compiler (at least the C compiler above) to save one instruction (ror edi 2) is replacing 100 by 25:
return (year & 15) == 0 || ((year & 3) == 0 && year % 25 != 0);
|
Thanks for your interest in my work. I'd love to assist porting our algorithms to Java. Notice, however, that I'm not a Java programmer and I don't have the complete picture of what goes on in the JVM. What follows is based on my experience with C/C++ compilers but, I reckon, most of it might apply to Java as well. When determining if Maths results show that, in calculations of leap year, it's correct to replace In my talk I also said that I did other optimisations around Similar mathematical arguments as above show that it's also correct to replace If there's no branch, then this is probably be the best option: If there's a branch, then I'd expect this to perform better: The reason is hinted in my talk: If there's a branch, the branch predictor can do a better job when execution is split into paths with (1/100, 99/100) = (1%, 99%) probability distribution than when the distribution is (1/25, 24/25) = (4%, 96%). Looking at byte-code for the 4 different implementations floated in this discussion, I see some I can also shed some light on the magical numbers that appear in the code that check divisibility by 100 and 25 by pointing to my series of articles on this topic:
I hope this helps. |
| return (gregorianYear & 15) == 0 | ||
| ? (gregorianYear & 3) == 0 | ||
| : (gregorianYear & 3) == 0 && gregorianYear % 100 != 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly,
| return (gregorianYear & 15) == 0 | |
| ? (gregorianYear & 3) == 0 | |
| : (gregorianYear & 3) == 0 && gregorianYear % 100 != 0; | |
| return ((gregorianYear & 15) == 0) || ((gregorianYear & 3) == 0 && gregorianYear % 25 != 0); |
|
Thank you for your comments, @cassioneri For reference I included the following variant in the int d = year % 100 != 0 ? 4 : 16;
return (year & (d - 1)) == 0;and as shown above this underperforms even the baseline implementation when executed on the HotSpot JVM with C2: This might suggest a deficiency in the C2 JIT rather than an issue with your code. But let's dig a bit deeper and analyze the ASM generated on x86. With .. so, some sequence of multiplications, shifts and subtractions, not dissimilar to the code you'd expect from Lemire's calculus. Testing the PR under test then the inner calculation becomes... exactly the same? Yet faster... The difference seem to boil down to how the JIT is better able to unroll and vectorize the benchmark loop with my PR'd code. While not an irrelevant property, this means the improvement I'm seeing might be a bit overstated for more typical cases, and I'll need to see if what happens if I set the microbenchmark up not to inline and unroll heavily. |
|
Thanks @cl4es for the ASM listing. Now I can understand better what is going on at the very low level. The compiler is replacing There's a more recent technique that can give I think it would be very useful if the Java compiler also implemented this technique. Not only for this piece of code but for any time that For this particular piece of code though, one could manually perform the minverse optimisation. Recall from my previous posts that in leap year calculation, one can replace (I'm deducing this code from the godbolt link and I hope I got it right but, if you're interested I can double check another time.) It's very important for the calculations above to be done in unsigned arithmetic. I don't know how this is done in Java but, FWIW, in C it would be: |
|
Yes, seems Granlund & Montgomery is used, see jdk/src/hotspot/share/opto/divnode.cpp Line 161 in 01c0d5d
I explored some more with micros that don't loop over different values but instead test all the year variants of interest in isolation: When isolating like this the suggested I suggest we go ahead and integrate this, file an RFE to re-examine the division-by-constant in C2, then re-evaluate these |
I have filed JDK-8319451. I would suggest waiting for this bug to be resolved before proceeding with this PR. |
Nice analysis! While I'm sure we need to re-evaluate this enhancement after JDK-8319451 is resolved, I'm not a fan of blocking library enhancements on improvements to the runtime/compiler (as it's impossible to know up front if this is something we can fix in the next couple of days/weeks, or need to staff, plan and evaluate over a longer cycle). I included the Neri-Schneider variant in the microbenchmark here to make it easy to assess if an optimization such as JDK-8319451 would turn things around. We should file an enhancement to re-visit the Granlund & Montgomery/Hacker's delight division. Improving this should benefit either variant, and might be needed together with JDK-8319451 for the |
|
Filed https://bugs.openjdk.org/browse/JDK-8319526 to re-examine the integer remainder optimization in C2. |
These are good improvements and are beneficial with or without other fixes, so should go ahead independently. |
|
/integrate |
|
Going to push as commit 7d25f1c.
Your commit was automatically rebased without conflicts. |
|
Mailing list message from Lothar Kimmeringer on i18n-dev: Am 03.11.2023 um 22:01 schrieb Claes Redestad:
Not sure if this has any effect in terms of performance but not being a return (year & 3) == 0 && (year & 15 == 0 || year % 100 != 0); Cheers, Lothar |
I tried this and many other variants but the one in this PR came out on top - and it even seemed the additional redundancy helped the JIT. This might be due a deficiency in how C2 handles conditional moves, so I stuck to the one that generated the seemingly best code layout with a plan to re-evaluate (and hopefully simplify) this once https://bugs.openjdk.org/browse/JDK-8319451 is resolved. There's already a PR out #16524 so hopefully we can re-visit this soon. |
https://github.com/cassioneri/eaf suggest this code for leap year calculation:
.. with a claim this would compile down to branchless, easily pipelined code.
This doesn't currently happen with C2. In the meantime I think we can improve the current code in
Year.isLeapandIsoChronology.isLeapYearby leveraging the fact that the% 100check is only needed if(year & 15) != 0:Mac M1:
Linux x64:
30% higher throughput on M1, 40% on x64.
isLeapYearNSruns a variant of the code from https://github.com/cassioneri/eaf ported to java - perhaps the JIT can be improved to do whatever clang/gcc does here and achieve an even better speed-up.Testing: so far only java/time/tck/java/time locally, will run a few tiers before filing an enhancement and opening the PR for review.
Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16491/head:pull/16491$ git checkout pull/16491Update a local copy of the PR:
$ git checkout pull/16491$ git pull https://git.openjdk.org/jdk.git pull/16491/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 16491View PR using the GUI difftool:
$ git pr show -t 16491Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16491.diff
Webrev
Link to Webrev Comment