-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8265783: Create a separate library for x86 Intel SVML assembly intrinsics #3638
Conversation
👋 Welcome back sviswanathan! A progress list of the required criteria for merging this PR into |
/contributor add sviswanathan |
@sviswa7 |
/contributor add rkandu |
@sviswa7 |
/contributor add rlupusoru |
/label add hotspot-compiler |
@sviswa7 |
@sviswa7 |
@sviswa7 |
@sviswa7 |
Webrevs
|
Tier 1 to 3 tests pass for the default set of build profiles. |
@PaulSandoz Thanks a lot for running through the tests. |
@vnkozlov @AlanBateman @rose00 Looking forward to your review and feedback. |
/contributor add Ahmet Akkas ahmet.akkas@intel.com |
Mailing list message from Paul Sandoz on hotspot-compiler-dev: I share your concerns about code maintainability. I have every confidence that Intel?s contributors to OpenJDK are prepared to maintain the code and provide fixes. In this case I would argue that what is being proposed here is very unique: under incubation for highly specialized implementations of numerical operations from a well regarded library, while exploring alternative bindings during further incubation with enhancements to Panama. I don?t think this should be considered a generally acceptable approach for Vector API operations (most code for operations does not and should not follow this approach), nor is it generally acceptable for other kinds of intrinsic in HotSpot (I believe there are a few special cases under os_cpu). Thus we should dissuade the use of .S source for other intrinsic cases. Does this help alleviate some of your concerns? Paul.
|
Mailing list message from Viswanathan, Sandhya on hotspot-compiler-dev: Hi Andrew, Sorry for delay in response. Best Regards, -----Original Message----- On 5/17/21 6:51 PM, Paul Sandoz wrote:
I understand the argument from utility here, but it's a very substantial precedent to take. In the licence we use for OpenJDK, the "source code" for a work means the preferred form of the work for making modifications to it. (I'm aware that we've had a few of these from Intel before, but ISTM that this is a much bigger deal.) I understand that permission to include these files was probably granted as a result of negotiations with Intel. And it's great to have this code in OpenJDK. However, I am sure that no-one on this project looks forward to a future in which part of our "source code" consists of what are in effect unfixable binary blobs. We should at least have the conversation about whether this is the way OpenJDK should be going. -- |
1 similar comment
Mailing list message from Viswanathan, Sandhya on hotspot-compiler-dev: Hi Andrew, Sorry for delay in response. Best Regards, -----Original Message----- On 5/17/21 6:51 PM, Paul Sandoz wrote:
I understand the argument from utility here, but it's a very substantial precedent to take. In the licence we use for OpenJDK, the "source code" for a work means the preferred form of the work for making modifications to it. (I'm aware that we've had a few of these from Intel before, but ISTM that this is a much bigger deal.) I understand that permission to include these files was probably granted as a result of negotiations with Intel. And it's great to have this code in OpenJDK. However, I am sure that no-one on this project looks forward to a future in which part of our "source code" consists of what are in effect unfixable binary blobs. We should at least have the conversation about whether this is the way OpenJDK should be going. -- |
Mailing list message from Andrew Dinn on hotspot-compiler-dev: Hi Sandhya, On 20/05/2021 00:51, Viswanathan, Sandhya wrote:
I appreciate the above explanation and am largely swayed by the argument Could you clarify one important detail. Is the original C code (with I ask because I think it would significantly lower the risk I feel is Without that sort of understanding I can see this becoming a bug trap Of course, the situation is not necessarily a lot better with the source regards, Andrew Dinn |
1 similar comment
Mailing list message from Andrew Dinn on hotspot-compiler-dev: Hi Sandhya, On 20/05/2021 00:51, Viswanathan, Sandhya wrote:
I appreciate the above explanation and am largely swayed by the argument Could you clarify one important detail. Is the original C code (with I ask because I think it would significantly lower the risk I feel is Without that sort of understanding I can see this becoming a bug trap Of course, the situation is not necessarily a lot better with the source regards, Andrew Dinn |
Mailing list message from Viswanathan, Sandhya on hotspot-compiler-dev: Hi Andrew, Intel has a long history of contributing to Java and being part of OpenJDK. Best Regards, -----Original Message----- Hi Sandhya, On 20/05/2021 00:51, Viswanathan, Sandhya wrote:
I appreciate the above explanation and am largely swayed by the argument that this is a useful, albeit expedient, way forward. Thank you for pinpointing the goal here. Could you clarify one important detail. Is the original C code (with compiler-specific vector extensions) able to be made available either as open source or using some other form of licensing. I ask because I think it would significantly lower the risk I feel is present in importing this code if OpenJDK devs were able to see the source functions from which the various machine code routines have been derived and understand the algorithms that they embody. Without that sort of understanding I can see this becoming a bug trap where a report of a vector floating point computation that goes awry may well leave whoever is left to debug the problem with no clear way of ruling out a problem in the vector code, most especially wasting a lot of time trying to do so before looking elsewhere for a more likely/obvious error. Of course, the situation is not necessarily a lot better with the source available; generated machine code is clearly not going to be easily mapped back to C code. Still, I'd prefer it if OpenJDK devs had a fighting chance than little or none. regards, Andrew Dinn |
Mailing list message from Andrew Haley on hotspot-compiler-dev: Hi, thanks. One or two points inline. On 5/20/21 12:51 AM, Viswanathan, Sandhya wrote:
Do the routines meet the (semi-) monotonicity requirements for "Therefore, most methods with more than 0.5 ulp errors are required to
Sure, I understand. However, you could bring in the source code as
I hope not. Thank you for the reply. -- |
Mailing list message from Andrew Haley on hotspot-compiler-dev: On 5/20/21 12:34 AM, Paul Sandoz wrote:
I've got nothing at all against .S files, as long as they are the real
Somewhat, but I wonder if this, as a matter of policy, is an area in I guess I wouldn't mind as long as we had a "This far, and no further" -- |
1 similar comment
Mailing list message from Andrew Haley on hotspot-compiler-dev: On 5/20/21 12:34 AM, Paul Sandoz wrote:
I've got nothing at all against .S files, as long as they are the real
Somewhat, but I wonder if this, as a matter of policy, is an area in I guess I wouldn't mind as long as we had a "This far, and no further" -- |
Mailing list message from John Rose on hotspot-compiler-dev: On May 20, 2021, at 8:31 AM, Andrew Haley <aph at redhat.com> wrote:
Yes, these .S files are a somewhat painful compromise which Intel is contributing them as a one-time artifact which we are, So I?ll characterize the current sources as just-barely-workable Exiting incubation, any of the following options seem allowable - using a very slow element-wise loop over JDK math methods That last option seems very desirable, and getting the Vector API
I think this could rise to the GB level if we needed to make a strong
Well in this case, we have two things: 1. Temporary expedient only for incubation, to gain public feedback. That?s probably enough ?case law? to help clarify the relevant policy. What do you think? ? John |
1 similar comment
Mailing list message from John Rose on hotspot-compiler-dev: On May 20, 2021, at 8:31 AM, Andrew Haley <aph at redhat.com> wrote:
Yes, these .S files are a somewhat painful compromise which Intel is contributing them as a one-time artifact which we are, So I?ll characterize the current sources as just-barely-workable Exiting incubation, any of the following options seem allowable - using a very slow element-wise loop over JDK math methods That last option seems very desirable, and getting the Vector API
I think this could rise to the GB level if we needed to make a strong
Well in this case, we have two things: 1. Temporary expedient only for incubation, to gain public feedback. That?s probably enough ?case law? to help clarify the relevant policy. What do you think? ? John |
Mailing list message from Andrew Haley on hotspot-compiler-dev: On 5/20/21 7:54 PM, John Rose wrote:
It is, but that's not not entirely what I'm worried about. The four The freedom to run the program as you wish, for any purpose The freedom to study how the program works, and change it so The freedom to redistribute copies so you can help your neighbor The freedom to distribute copies of your modified versions to In this case we have 0, 2, and 3, but not 1. So, this issue is about My question is, then, (please forgive the paraphrase), are we giving up
NB: "preferred form" is a term used (but not fully defined) in
OK, but I don't hold out much hope of 2 actually succeeding before
I think that's OK, as long as it's well-enough understood. By the way, slightly off topic: being rather conflict averse I did -- |
1 similar comment
Mailing list message from Andrew Haley on hotspot-compiler-dev: On 5/20/21 7:54 PM, John Rose wrote:
It is, but that's not not entirely what I'm worried about. The four The freedom to run the program as you wish, for any purpose The freedom to study how the program works, and change it so The freedom to redistribute copies so you can help your neighbor The freedom to distribute copies of your modified versions to In this case we have 0, 2, and 3, but not 1. So, this issue is about My question is, then, (please forgive the paraphrase), are we giving up
NB: "preferred form" is a term used (but not fully defined) in
OK, but I don't hold out much hope of 2 actually succeeding before
I think that's OK, as long as it's well-enough understood. By the way, slightly off topic: being rather conflict averse I did -- |
Mailing list message from Viswanathan, Sandhya on hotspot-compiler-dev: Hi Andrew/John, We made this contribution with the goal to help Vector API and its evaluation during incubation. This is the best we could do currently towards JDK 17. Best Regards, -----Original Message----- On 5/20/21 7:54 PM, John Rose wrote:
It is, but that's not not entirely what I'm worried about. The four The freedom to run the program as you wish, for any purpose The freedom to study how the program works, and change it so The freedom to redistribute copies so you can help your neighbor The freedom to distribute copies of your modified versions to In this case we have 0, 2, and 3, but not 1. So, this issue is about more than mere utility, but something more fundamental. It's about the right of our users to understand how OpenJDK works. My question is, then, (please forgive the paraphrase), are we giving up essential freedom to purchase a little temporary utility?
NB: "preferred form" is a term used (but not fully defined) in GPLv2. It's not easy to define, but we know it when we see it: it's the form a programmer prefers to edit, the original source code.
OK, but I don't hold out much hope of 2 actually succeeding before incubation exit.
I think that's OK, as long as it's well-enough understood. By the way, slightly off topic: being rather conflict averse I did wonder whether I should object to this commit, but I reasoned that this kind of issue is exactly the reason that we have a governing board with community representatives. It's literally my duty. -- |
1 similar comment
Mailing list message from Viswanathan, Sandhya on hotspot-compiler-dev: Hi Andrew/John, We made this contribution with the goal to help Vector API and its evaluation during incubation. This is the best we could do currently towards JDK 17. Best Regards, -----Original Message----- On 5/20/21 7:54 PM, John Rose wrote:
It is, but that's not not entirely what I'm worried about. The four The freedom to run the program as you wish, for any purpose The freedom to study how the program works, and change it so The freedom to redistribute copies so you can help your neighbor The freedom to distribute copies of your modified versions to In this case we have 0, 2, and 3, but not 1. So, this issue is about more than mere utility, but something more fundamental. It's about the right of our users to understand how OpenJDK works. My question is, then, (please forgive the paraphrase), are we giving up essential freedom to purchase a little temporary utility?
NB: "preferred form" is a term used (but not fully defined) in GPLv2. It's not easy to define, but we know it when we see it: it's the form a programmer prefers to edit, the original source code.
OK, but I don't hold out much hope of 2 actually succeeding before incubation exit.
I think that's OK, as long as it's well-enough understood. By the way, slightly off topic: being rather conflict averse I did wonder whether I should object to this commit, but I reasoned that this kind of issue is exactly the reason that we have a governing board with community representatives. It's literally my duty. -- |
Mailing list message from Andrew Haley on hotspot-compiler-dev: On 5/21/21 7:34 PM, Viswanathan, Sandhya wrote:
I think the general consensus is to go ahead with this patch. We should take Thank you for your patience. -- |
1 similar comment
Mailing list message from Andrew Haley on hotspot-compiler-dev: On 5/21/21 7:34 PM, Viswanathan, Sandhya wrote:
I think the general consensus is to go ahead with this patch. We should take Thank you for your patience. -- |
/integrate |
@sviswa7 Since your change was applied there have been 3 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit 9f05c41. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
This PR contains Short Vector Math Library support related changes for JEP-414 Vector API (Second Incubator), in preparation for when targeted.
Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementation for Vector API transcendental and trigonometric methods.
These methods are built into a separate library instead of being part of libjvm.so or jvm.dll.
The following changes are made:
The source for these methods is placed in the jdk.incubator.vector module under src/jdk.incubator.vector/linux/native/libsvml and src/jdk.incubator.vector/windows/native/libsvml.
The assembly source files are named as “.S” and include files are named as “.S.inc”.
The corresponding build script is placed at make/modules/jdk.incubator.vector/Lib.gmk.
Changes are made to build system to support dependency tracking for assembly files with includes.
The built native libraries (libsvml.so/svml.dll) are placed in bin directory of JDK on Windows and lib directory of JDK on Linux.
The C2 JIT uses the dll_load and dll_lookup to get the addresses of optimized methods from this library.
Build system changes and module library build scripts are contributed by Magnus (magnus.ihse.bursie@oracle.com).
Looking forward to your review and feedback.
Performance:
Micro benchmark Base Optimized Unit Gain(Optimized/Base)
Double128Vector.ACOS 45.91 87.34 ops/ms 1.90
Double128Vector.ASIN 45.06 92.36 ops/ms 2.05
Double128Vector.ATAN 19.92 118.36 ops/ms 5.94
Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79
Double128Vector.CBRT 45.77 208.36 ops/ms 4.55
Double128Vector.COS 49.94 245.89 ops/ms 4.92
Double128Vector.COSH 26.91 126.00 ops/ms 4.68
Double128Vector.EXP 71.64 379.65 ops/ms 5.30
Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18
Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44
Double128Vector.LOG 61.95 279.84 ops/ms 4.52
Double128Vector.LOG10 59.34 239.05 ops/ms 4.03
Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79
Double128Vector.SIN 49.36 240.79 ops/ms 4.88
Double128Vector.SINH 26.59 103.75 ops/ms 3.90
Double128Vector.TAN 41.05 152.39 ops/ms 3.71
Double128Vector.TANH 45.29 169.53 ops/ms 3.74
Double256Vector.ACOS 54.21 106.39 ops/ms 1.96
Double256Vector.ASIN 53.60 107.99 ops/ms 2.01
Double256Vector.ATAN 21.53 189.11 ops/ms 8.78
Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44
Double256Vector.CBRT 56.45 397.13 ops/ms 7.04
Double256Vector.COS 58.26 389.77 ops/ms 6.69
Double256Vector.COSH 29.44 151.11 ops/ms 5.13
Double256Vector.EXP 86.67 564.68 ops/ms 6.52
Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80
Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62
Double256Vector.LOG 71.52 394.90 ops/ms 5.52
Double256Vector.LOG10 65.43 362.32 ops/ms 5.54
Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05
Double256Vector.SIN 57.06 380.98 ops/ms 6.68
Double256Vector.SINH 29.40 117.37 ops/ms 3.99
Double256Vector.TAN 44.90 279.90 ops/ms 6.23
Double256Vector.TANH 54.08 274.71 ops/ms 5.08
Double512Vector.ACOS 55.65 687.54 ops/ms 12.35
Double512Vector.ASIN 57.31 777.72 ops/ms 13.57
Double512Vector.ATAN 21.42 729.21 ops/ms 34.04
Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32
Double512Vector.CBRT 56.78 834.38 ops/ms 14.69
Double512Vector.COS 59.88 837.04 ops/ms 13.98
Double512Vector.COSH 30.34 172.76 ops/ms 5.70
Double512Vector.EXP 99.66 1608.12 ops/ms 16.14
Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34
Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34
Double512Vector.LOG 74.84 996.00 ops/ms 13.31
Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72
Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34
Double512Vector.POW 37.42 384.13 ops/ms 10.26
Double512Vector.SIN 59.74 728.45 ops/ms 12.19
Double512Vector.SINH 29.47 143.38 ops/ms 4.87
Double512Vector.TAN 46.20 587.21 ops/ms 12.71
Double512Vector.TANH 57.36 495.42 ops/ms 8.64
Double64Vector.ACOS 24.04 73.67 ops/ms 3.06
Double64Vector.ASIN 23.78 75.11 ops/ms 3.16
Double64Vector.ATAN 14.14 62.81 ops/ms 4.44
Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28
Double64Vector.CBRT 16.47 107.50 ops/ms 6.53
Double64Vector.COS 23.42 152.01 ops/ms 6.49
Double64Vector.COSH 17.34 113.34 ops/ms 6.54
Double64Vector.EXP 27.08 203.53 ops/ms 7.52
Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15
Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59
Double64Vector.LOG 26.75 142.63 ops/ms 5.33
Double64Vector.LOG10 25.85 139.71 ops/ms 5.40
Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38
Double64Vector.SIN 23.28 146.91 ops/ms 6.31
Double64Vector.SINH 17.62 88.59 ops/ms 5.03
Double64Vector.TAN 21.00 86.43 ops/ms 4.12
Double64Vector.TANH 23.75 111.35 ops/ms 4.69
Float128Vector.ACOS 57.52 110.65 ops/ms 1.92
Float128Vector.ASIN 57.15 117.95 ops/ms 2.06
Float128Vector.ATAN 22.52 318.74 ops/ms 14.15
Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42
Float128Vector.CBRT 29.72 443.74 ops/ms 14.93
Float128Vector.COS 42.82 803.02 ops/ms 18.75
Float128Vector.COSH 31.44 118.34 ops/ms 3.76
Float128Vector.EXP 72.43 855.33 ops/ms 11.81
Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38
Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12
Float128Vector.LOG 52.95 877.94 ops/ms 16.58
Float128Vector.LOG10 49.26 603.72 ops/ms 12.26
Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61
Float128Vector.SIN 43.38 745.31 ops/ms 17.18
Float128Vector.SINH 31.11 112.91 ops/ms 3.63
Float128Vector.TAN 37.25 332.13 ops/ms 8.92
Float128Vector.TANH 57.63 453.77 ops/ms 7.87
Float256Vector.ACOS 65.23 123.73 ops/ms 1.90
Float256Vector.ASIN 63.41 132.86 ops/ms 2.10
Float256Vector.ATAN 23.51 649.02 ops/ms 27.61
Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07
Float256Vector.CBRT 45.99 594.81 ops/ms 12.93
Float256Vector.COS 43.75 926.69 ops/ms 21.18
Float256Vector.COSH 33.52 130.46 ops/ms 3.89
Float256Vector.EXP 75.70 1366.72 ops/ms 18.05
Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84
Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34
Float256Vector.LOG 53.31 1545.77 ops/ms 29.00
Float256Vector.LOG10 50.31 863.80 ops/ms 17.17
Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66
Float256Vector.SIN 44.07 911.04 ops/ms 20.67
Float256Vector.SINH 33.16 122.50 ops/ms 3.69
Float256Vector.TAN 37.85 497.75 ops/ms 13.15
Float256Vector.TANH 64.27 537.20 ops/ms 8.36
Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52
Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93
Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69
Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57
Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11
Float512Vector.COS 40.92 1567.93 ops/ms 38.32
Float512Vector.COSH 33.42 138.36 ops/ms 4.14
Float512Vector.EXP 70.51 3835.97 ops/ms 54.41
Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35
Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47
Float512Vector.LOG 49.61 3156.99 ops/ms 63.64
Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02
Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81
Float512Vector.POW 32.73 1015.85 ops/ms 31.04
Float512Vector.SIN 41.17 1587.71 ops/ms 38.56
Float512Vector.SINH 33.05 129.39 ops/ms 3.91
Float512Vector.TAN 35.60 1336.11 ops/ms 37.53
Float512Vector.TANH 65.77 2295.28 ops/ms 34.90
Float64Vector.ACOS 48.41 89.34 ops/ms 1.85
Float64Vector.ASIN 47.30 95.72 ops/ms 2.02
Float64Vector.ATAN 20.62 49.45 ops/ms 2.40
Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04
Float64Vector.CBRT 24.03 134.57 ops/ms 5.60
Float64Vector.COS 44.28 394.33 ops/ms 8.91
Float64Vector.COSH 28.35 95.27 ops/ms 3.36
Float64Vector.EXP 65.80 486.37 ops/ms 7.39
Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48
Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93
Float64Vector.LOG 51.93 163.25 ops/ms 3.14
Float64Vector.LOG10 49.53 147.98 ops/ms 2.99
Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77
Float64Vector.SIN 44.41 382.09 ops/ms 8.60
Float64Vector.SINH 28.20 90.68 ops/ms 3.22
Float64Vector.TAN 36.29 160.89 ops/ms 4.43
Float64Vector.TANH 47.65 214.04 ops/ms 4.49
Progress
Issue
Reviewers
Contributors
<sviswanathan@openjdk.org>
<rkandu@openjdk.org>
<rlupusoru@openjdk.org>
<ihse@openjdk.org>
<jiefu@openjdk.org>
<ahmet.akkas@intel.com>
<marius.cornea@intel.com>
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638
$ git checkout pull/3638
Update a local copy of the PR:
$ git checkout pull/3638
$ git pull https://git.openjdk.java.net/jdk pull/3638/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 3638
View PR using the GUI difftool:
$ git pr show -t 3638
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3638.diff