Skip to content

Conversation

@Hamlin-Li
Copy link

@Hamlin-Li Hamlin-Li commented Apr 28, 2025

Hi,
Can you help to review this patch?

Before JDK-8353786, when a released jdk not supportting sleef (for any reason, e.g. low gcc version, intrinsic not supported, rvv not supported, and so on) runs on machine support vector operation (e.g. on riscv, it supports rvv), it can not call into sleef, but will not fail either, it falls back to java scalar version implementation.
But after JDK-8353786, it will cause an exception thrown at runtime.

This change the behaviour of existing jdk, and it should not throw exception anyway.

@iwanowww @RealFYang

Thanks!


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issues

  • JDK-8355698: JDK not supporting sleef could cause exception at runtime after JDK-8353786 (Bug - P4)
  • JDK-8355656: Several vector tests fail when built with clang (Bug - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24914/head:pull/24914
$ git checkout pull/24914

Update a local copy of the PR:
$ git checkout pull/24914
$ git pull https://git.openjdk.org/jdk.git pull/24914/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 24914

View PR using the GUI difftool:
$ git pr show -t 24914

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24914.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Apr 28, 2025

👋 Welcome back mli! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Apr 28, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 28, 2025
@openjdk
Copy link

openjdk bot commented Apr 28, 2025

@Hamlin-Li To determine the appropriate audience for reviewing this pull request, one or more labels corresponding to different subsystems will normally be applied automatically. However, no automatic labelling rule matches the changes in this pull request. In order to have an "RFR" email sent to the correct mailing list, you will need to add one or more applicable labels manually using the /label pull request command.

Applicable Labels
  • build
  • client
  • compiler
  • core-libs
  • graal
  • hotspot
  • hotspot-compiler
  • hotspot-gc
  • hotspot-jfr
  • hotspot-runtime
  • i18n
  • ide-support
  • javadoc
  • jdk
  • jmx
  • kulla
  • net
  • nio
  • security
  • serviceability
  • shenandoah

Copy link
Contributor

@iwanowww iwanowww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assumption is if vector math library is present, it provides full set of stubs.

Do we really want to complicate things even more by supporting arbitrary subsets of vector math stubs?

IMO a better way to fix the problem is to avoid building/bundling SLEEF when any stubs are missing.

@sendaoYan
Copy link
Member

This PR can fix the failure of JDK-8355656

@Hamlin-Li
Copy link
Author

Hamlin-Li commented Apr 29, 2025

I think, first JDK-8353786 changes the behaviour of jdk (previously it does not throw exception or fail in other ways). second it should fall back to a java implementation rather than fail, this is more friendly to jdk vendors and users. As you can see, @sendaoYan report another issue related to it on x64 platform.

For the jdk vendors, I suppose they should try their best to build a jdk with full vector math library support, but they can fail to do so for many reasons, in that situation we should fall back to a default version which could be slow in performance, but should not fail, because we already has a java scalar version implemented, and the motivation of this version is for falling back when there is no optimal native version.

The assumption is if vector math library is present, it provides full set of stubs.

Yes. The issues here is the released jdk could be lack of support of such library for various reasons.

Do we really want to complicate things even more by supporting arbitrary subsets of vector math stubs?

I'm not sure what you mean here, but seems to me the fix is simple, we just not throw exception and allow it fall back to java scalar version, and this behaviour is consitent with previous jdk behaviour. Or can you clarify further?

IMO a better way to fix the problem is to avoid building/bundling SLEEF when any stubs are missing.

This behavour was decided in previous PRs #20781, #21083, #21502, and some other uncommited PRs prior to these ones.
And the x64 math libraray support (SVML) in jdk was introduced even earlier.

How do you think about it?

@Hamlin-Li
Copy link
Author

/solves JDK-8355656

@openjdk
Copy link

openjdk bot commented Apr 29, 2025

@Hamlin-Li
Adding additional issue to solves list: 8355656: Several vector tests fail when built with clang.

@iwanowww
Copy link
Contributor

iwanowww commented Apr 29, 2025

This behavour was decided in previous PRs #20781, #21083, #21502, and some other uncommited PRs prior to these ones.

Can you point me, please, to the relevant parts of the discussions? I wasn't part of them.

Do we really want to complicate things even more by supporting arbitrary subsets of vector math stubs?
I'm not sure what you mean here, but seems to me the fix is simple, we just not throw exception and allow it fall back to java scalar version, and this behaviour is consitent with previous jdk behaviour. Or can you clarify further?

Both SVML and SLEEF-based vector math native libraries are built and bundled with the JDK. JDK is in full control of their sources and should be able to rely on the exact API they expose.

Such behavior has a number of advantages, in particular:

  • exposes bugs (missing entries, naming mismatches between JDK and vector math library);
  • enables future evolution towards jextract-based automatic Java API extraction (based on library header files)

API variability complicates things, so should be avoided unless there are compelling reasons to justify it. Can you summarize, please, what's the motivation behind such behavior?

If build platform doesn't fully satisfy library requirements, the library can be excluded. If the library is missing, default (Java) implementations are used.

@Hamlin-Li
Copy link
Author

Hamlin-Li commented Apr 30, 2025

Can you point me, please, to the relevant parts of the discussions? I wasn't part of them.

Sorry, I can not find out this specific answer for you, as there were too long discussion there, the whole process took several months.

Both SVML and SLEEF-based vector math native libraries are built and bundled with the JDK. JDK is in full control of their sources and should be able to rely on the exact API they expose.

Seems both jdk vendors and jdk end users don't have a full control of the environment to opposite sides.

exposes bugs (missing entries, naming mismatches between JDK and vector math library);

It's arguable if missing entries is a bug, please check JDK-8355656 for example

enables future evolution towards jextract-based automatic Java API extraction (based on library header files)

I'm not familar with this feature, but could it be simple to catch a missing API, then do some necessary things?

API variability complicates things, so should be avoided unless there are compelling reasons to justify it. Can you summarize, please, what's the motivation behind such behavior?

One advantage of it is fault tolerance, please check JDK-8355656 for example. And I don't think this compliates things.

If build platform doesn't fully satisfy library requirements, the library can be excluded.

Does JDK-8355656 fall in the above scenario? Why should we exclude the whole library when only part of the library is not supported?
After all we does not support TANH on riscv and arm for now.

In summary, I'm concerned that there could always be some potential issue, and we should consider to make it fall back when we could, as after a jdk release to end users it's out of your control. If we want to consider these things ( like missing entries as bugs, we still could do it in other ways (like checking the log in a test), rather than totally stop the jdk to run in end users' environment.

@iwanowww
Copy link
Contributor

It's arguable if missing entries is a bug, please check JDK-8355656 for example

Personally, I'm surprised SVML is affected in a similar way because stubs are provided in assembly form, so they are simply linked into a shared library during assembly phase. I can only guess what causes such inconsistencies during JDK build phase. And it just reassures me that current behavior is a source of bugs in the future.

Overall, it still looks like a JDK build issue to me. Hiding problems occurred during the build is not good. If some toolchains can't successfully build the library, the library shouldn't be included in JDK.

After all we does not support TANH on riscv and arm for now.

JDK code is aware of that and doesn't try to look up corresponding entries. Ideally, it shouldn't be included into the native library, since it increases its size for no good reason. But having unused code is a redundancy and not a bug per se.

(For SVML it's a bit more complicated, because JDK is the only place where sources of SVML stubs live (there's no upstream project/distribution)).

@iwanowww
Copy link
Contributor

/label core-libs, hotspot

@openjdk openjdk bot added core-libs core-libs-dev@openjdk.org hotspot hotspot-dev@openjdk.org labels Apr 30, 2025
@openjdk
Copy link

openjdk bot commented Apr 30, 2025

@iwanowww
The core-libs label was successfully added.

The hotspot label was successfully added.

@mlbridge
Copy link

mlbridge bot commented Apr 30, 2025

Webrevs

@Hamlin-Li
Copy link
Author

Overall, it still looks like a JDK build issue to me. Hiding problems occurred during the build is not good. If some toolchains can't successfully build the library, the library shouldn't be included in JDK.

No, in riscv case (possiblely also on arm?) I don't think it's a build issue, the jdk vendor can choose to support it or not, it's just the way passing the information ( whether it's supported or not) are different.

How about we consider this in another way, we could fix this issue first, as it fails regularly on riscv and x64 in some situations. Then if you still consider the existing behaviour should be changed and could be improved further, it can be done in another pr. After all, it already broke the existing jdk in some scenarios.
How do you think about it?

@iwanowww
Copy link
Contributor

iwanowww commented May 1, 2025

I want to understand the issue with missing entries in vector math native libraries first before making a decision how to proceed.

I don't think it's a build issue, the jdk vendor can choose to support it or not, it's just the way passing the information ( whether it's supported or not) are different.

Sorry, I don't get it. How does it affect the contents of the native library? JDK vendors do have an option to bundle the library or drop it from their distribution. But when SLEEF-based and SVML math libraries are built by JDK there's no distinction between entries being included in the library. If a vendor modifies make files or native library code, it's up to them to adjust JDK accordingly. Upstream JDK doesn't have to take such scenarios into account.

I looked through SVML and SLEEF-based vector math code and noticed there are some capability check [1] [2] [3] guarding library code. It means that if some library entry is missing, then the whole library is empty. Can you confirm it's the case you see?

[1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/linux/native/libjsvml/globals_vectorApiSupport_linux.S.inc#L35
[2] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/windows/native/libjsvml/globals_vectorApiSupport_windows.S.inc#L28
[3] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/unix/native/libsleef/lib/vector_math_rvv.c#L36

@Hamlin-Li
Copy link
Author

I want to understand the issue with missing entries in vector math native libraries first before making a decision how to proceed.

I don't think it's a build issue, the jdk vendor can choose to support it or not, it's just the way passing the information ( whether it's supported or not) are different.

Sorry, I don't get it. How does it affect the contents of the native library? JDK vendors do have an option to bundle the library or drop it from their distribution. But when SLEEF-based and SVML math libraries are built by JDK there's no distinction between entries being included in the library. If a vendor modifies make files or native library code, it's up to them to adjust JDK accordingly. Upstream JDK doesn't have to take such scenarios into account.

Sorry for confusing you. No, what I mean is vendors can choose their build environment (e.g. with/without support of compiler, I discuss this a bit below), and different environment will lead to whether sleef is supported or not (i.e. whether there are entries in the libsleef.so).

I looked through SVML and SLEEF-based vector math code and noticed there are some capability check [1] [2] [3] guarding library code. It means that if some library entry is missing, then the whole library is empty. Can you confirm it's the case you see?

[1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/linux/native/libjsvml/globals_vectorApiSupport_linux.S.inc#L35 [2] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/windows/native/libjsvml/globals_vectorApiSupport_windows.S.inc#L28 [3] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/unix/native/libsleef/lib/vector_math_rvv.c#L36

In riscv sleef case, yes, it checks native compiler version and a flat, this is required because only these versions (or higher) support vector intrinsics which are used in sleef header files. ( I think arm is the same case, but in a bit simpler way only with a flag).

@theRealAph
Copy link
Contributor

Overall, it still looks like a JDK build issue to me. Hiding problems occurred during the build is not good. If some toolchains can't successfully build the library, the library shouldn't be included in JDK.

No, in riscv case (possiblely also on arm?) I don't think it's a build issue, the jdk vendor can choose to support it or not, it's just the way passing the information ( whether it's supported or not) are different.

I am not convinced that supporting such a divergence between builds of the JDK is something we should support. Sure, we can choose at runtime whether to link with SLEEF or not, but having some builds of (say) risc OpenJDK with SLEEF and some without is a Bad Thing. It is a build issue.

@mlbridge
Copy link

mlbridge bot commented May 2, 2025

Mailing list message from Andrew Haley on core-libs-dev:

On 5/2/25 10:11, Hamlin Li wrote:

what I mean is vendors can choose their build environment (e.g. with/without support of compiler, I discuss this a bit below), and different environment will lead to whether sleef is supported or not (i.e. whether there are entries in the libsleef.so).

Again, this is an unhelpful thing to do. If the environment doesn't
support building the full JDK, kick it out.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

@iwanowww
Copy link
Contributor

iwanowww commented May 6, 2025

Thanks, Li. I think I have a better understanding now how it all works across platforms.

I agree with Andrew that it's highly undesireable to have divergence across JDK builds, but I don't have a good understanding how mature toolchain support is when it comes to vector intrinsics SLEEF library relies on.

As of now, producing empty native library is the only way to communicate that a vector math library isn't properly built.

So, 2 problems to address:
(1) JDK build may produce an empty vector math library (and no build failures);
(2) vector math library is built unconditionally (there's no way to configure JDK build to skip it).

Speaking of SVML, it requires GCC 4.9+ and MSVC 2017. According to JDK build documentation [1], newer toolchain versions are required for JDK build. So, it should be fine to remove GCC version checks. (JDK-8355656 is reported for a JDK build using Clang. I have no idea whether Clang can build SVML stubs or not.)

So, the stop-the-gap fix for JDK-8355698 is to disable SLEEF build when toolchain doesn't have proper support. Alternatively, VectorMathLibrary can probe the library to ensure it is not empty [2].

[1] https://github.com/openjdk/jdk/blob/master/doc/building.md

[2]

diff --git a/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java b/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java
index 4729235e2d9..9f0abb9afa1 100644
--- a/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java
+++ b/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java
@@ -189,6 +189,16 @@ public boolean isSupported(Operator op, VectorSpecies<?> vspecies) {
     private static class SLEEF implements Library {
         static {
             VectorSupport.loadNativeLibrary("sleef");
+            ensureValidLibrary();
+        }
+
+        private static void ensureValidLibrary() {
+            // Probe an arbitrary entry point to ensure the library is not empty.
+            // JDK build of SLEEF-based vector math library may produce an empty library when
+            // proper toolchain support is absent. Until it is fixed, ensure the corresponding
+            // native library is not empty.
+            // Throws an exception on lookup failure which triggers a switch to Java-based implementation.
+            LOOKUP.findOrThrow(LIBRARY.symbolName(VectorOperators.SIN, FloatVector.SPECIES_64));
         }
 
         private static String suffix(VectorShape vshape, boolean isShapeAgnostic) {

@Hamlin-Li
Copy link
Author

Thanks, I'll check it.

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 4, 2025

@Hamlin-Li This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@Hamlin-Li
Copy link
Author

I'll close this pr, as seems to us it's also helpful to push the jdk vendors to support sleef when they release.

@Hamlin-Li Hamlin-Li closed this Jun 16, 2025
@Hamlin-Li Hamlin-Li deleted the fix-sleef-exception branch June 16, 2025 10:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core-libs core-libs-dev@openjdk.org hotspot hotspot-dev@openjdk.org rfr Pull request is ready for review

Development

Successfully merging this pull request may close these issues.

4 participants