8341003: [lworld+fp16] Benchmarks for various Float16 operations #1254
Conversation
👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into the target branch will be added to the body of your pull request.
@jatin-bhateja This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 1 new commit pushed to the target branch.
Please see this link for an up-to-date comparison between the source branch of this pull request and the target branch. ➡️ To integrate this PR with the above commit message to the target branch, type /integrate in a new comment.
Webrevs
Hi @Bhavana-Kilambi, I see vector IR in almost all the micros apart from three, i.e. isNaN, isFinite and isInfinity, with the following command.
This indicates the Java implementation in those cases is not getting auto-vectorized; we didn't have benchmarks earlier, and after tuning we can verify with this new benchmark. Kindly let me know if the micro looks good, and I can integrate it.
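The three predicates mentioned above reduce to simple exponent/mantissa bit tests on the IEEE 754 binary16 encoding. As a hypothetical scalar sketch (using the standard `Float.floatToFloat16` conversion available since JDK 20, not the Valhalla `Float16` class itself), the checks look roughly like:

```java
// Hypothetical sketch of FP16 classification via raw binary16 bits.
// These branch-light bit tests are the kind of pattern the
// auto-vectorizer would need to recognize; class and constant names
// here are illustrative, not taken from the PR.
public class Fp16Checks {
    static final int EXP_MASK  = 0x7C00; // 5 exponent bits
    static final int MANT_MASK = 0x03FF; // 10 mantissa bits

    // NaN: exponent all ones, mantissa non-zero
    static boolean isNaN(short h) {
        return (h & EXP_MASK) == EXP_MASK && (h & MANT_MASK) != 0;
    }

    // Infinity: exponent all ones, mantissa zero
    static boolean isInfinite(short h) {
        return (h & EXP_MASK) == EXP_MASK && (h & MANT_MASK) == 0;
    }

    // Finite: exponent not all ones
    static boolean isFinite(short h) {
        return (h & EXP_MASK) != EXP_MASK;
    }

    public static void main(String[] args) {
        short nan = Float.floatToFloat16(Float.NaN);               // JDK 20+ API
        short inf = Float.floatToFloat16(Float.POSITIVE_INFINITY);
        short one = Float.floatToFloat16(1.0f);
        System.out.println(isNaN(nan) && !isNaN(inf));             // prints true
        System.out.println(isInfinite(inf) && isFinite(one));      // prints true
    }
}
```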
Hi @jatin-bhateja, thanks for doing the micros. Also, regarding the performance results you posted for the Intel machine, have you compared them with anything else (like the default FP32 implementation for FP16, the case without the intrinsics, or the scalar FP16 version) so that we can better interpret the scores?
Hi @Bhavana-Kilambi, this patch adds micro benchmarks for all Float16 APIs optimized up till now.
@jatin-bhateja, thanks! While we are on the topic, can I ask if there are any real-world use cases or workloads that you are targeting the FP16 work for, and maybe plan to do performance testing on in the future?
Hey, for the baseline we should not pass --enable-preview, since it will prohibit the following
Here are the first baseline numbers without --enable-preview.
Following are the numbers where we do allow the flat array layout, but only disable the intrinsics (-XX:DisableIntrinsic=<INTRIN_ID>).
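Concretely, the two baseline configurations discussed above can be sketched as below; the benchmark selector and intrinsic IDs are placeholders, not necessarily the exact ones used for this PR's runs:

```shell
# Baseline 1: no --enable-preview, so preview/flat-layout features stay off.
java -jar benchmarks.jar Float16Ops   # "Float16Ops" is a hypothetical selector

# Baseline 2: preview on (flat array layout allowed), but FP16 intrinsics
# disabled. DisableIntrinsic is a diagnostic option, so it needs
# -XX:+UnlockDiagnosticVMOptions; the intrinsic IDs below are illustrative.
java --enable-preview \
     -XX:+UnlockDiagnosticVMOptions \
     -XX:DisableIntrinsic=_floatToFloat16,_float16ToFloat \
     -jar benchmarks.jar Float16Ops
```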
Hey, we have some ideas, but for now my intent is to add micros, plus a few demonstrating macros, for each API we have accelerated.
Thanks for sharing the numbers. So the first case is without --enable-preview.
Yes. Let me know if you have other comments on the micros, or kindly approve if it's good to integrate.
I am just running the tests on one of our machines. Can I confirm in a while, please? The tests otherwise look fine to me.
Looks good to me. Thanks.
Btw, are you generating a min instruction for max and a max instruction for min?
My bad, good catch, thanks!
/integrate
@jatin-bhateja Pushed as commit 0ce9f0f. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
Please find below the results of performance testing on Intel Xeon 6 Granite Rapids:
Best Regards,
Jatin
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/valhalla.git pull/1254/head:pull/1254
$ git checkout pull/1254
Update a local copy of the PR:
$ git checkout pull/1254
$ git pull https://git.openjdk.org/valhalla.git pull/1254/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 1254
View PR using the GUI difftool:
$ git pr show -t 1254
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/valhalla/pull/1254.diff
Webrev
Link to Webrev Comment