Skip to content

8268014: Build failure on SUSE Linux Enterprise Server 11.4 (s390x) due to 'SYS_get_mempolicy' was not declared #4277

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

DamonFool
Copy link
Member

@DamonFool DamonFool commented Jun 1, 2021

Hi all,

Please review the patch which fixes the build failure on SUSE Linux Enterprise Server 11.4 (s390x).

@dholmes-ora had reminded me of this bug in [1].
But I missed the fact that ZGC is currently only built on x64 and aarch64.
And I'm sorry for this breakage.

The sys_call numbers of get_mempolicy for different platforms are just copied from the libnuma source code [2].

Thanks.
Best regards,
Jie

[1] #4205 (comment)
[2] https://github.com/numactl/numactl/blob/master/syscall.c


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8268014: Build failure on SUSE Linux Enterprise Server 11.4 (s390x) due to 'SYS_get_mempolicy' was not declared

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/4277/head:pull/4277
$ git checkout pull/4277

Update a local copy of the PR:
$ git checkout pull/4277
$ git pull https://git.openjdk.java.net/jdk pull/4277/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 4277

View PR using the GUI difftool:
$ git pr show -t 4277

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/4277.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 1, 2021

👋 Welcome back jiefu! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@DamonFool
Copy link
Member Author

/test

@openjdk
Copy link

openjdk bot commented Jun 1, 2021

@DamonFool you need to get approval to run the tests in tier1 for commits up until 9fc63d7

@openjdk openjdk bot added test-request rfr Pull request is ready for review labels Jun 1, 2021
@openjdk
Copy link

openjdk bot commented Jun 1, 2021

@DamonFool The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Jun 1, 2021
@mlbridge
Copy link

mlbridge bot commented Jun 1, 2021

Webrevs

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine.

Thanks,
David

@openjdk
Copy link

openjdk bot commented Jun 1, 2021

@DamonFool This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8268014: Build failure on SUSE Linux Enterprise Server 11.4 (s390x) due to 'SYS_get_mempolicy' was not declared

Reviewed-by: dholmes, mdoerr, mbaesken

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 54 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 1, 2021
@DamonFool
Copy link
Member Author

Looks fine.

Thanks,
David

Thanks @dholmes-ora .

@martin Doerr, please let me know if the build failure can be fixed by this change.
Thanks.

Copy link
Contributor

@TheRealMDoerr TheRealMDoerr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Builds on our old s390 machine. Thank you very much for fixing it!
Note: Some preprocessors may not support whitespaces before '#', but I believe all our supported compilers do so. So, I'm fine with it.

Copy link
Member

@MBaesken MBaesken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, on newer s390x machines (SLES15.2) the value of SYS_get_mempolicy is 269 .
So a fix value per platform is not always correct.
However we have the define for SYS_get_mempolicy on this SLES15.2 machine.

@DamonFool
Copy link
Member Author

Hi, on newer s390x machines (SLES15.2) the value of SYS_get_mempolicy is 269 .
So a fix value per platform is not always correct.
However we have the define for SYS_get_mempolicy on this SLES15.2 machine.

Okay.
Patch has been updated to check & try to use __NR_get_mempolicy first if SYS_get_mempolicy is not defined, which is suggested by @TheRealMDoerr in the JBS.
Thanks.

@DamonFool
Copy link
Member Author

Hi, on newer s390x machines (SLES15.2) the value of SYS_get_mempolicy is 269 .
So a fix value per platform is not always correct.
However we have the define for SYS_get_mempolicy on this SLES15.2 machine.

Okay.
Patch has been updated to check & try to use __NR_get_mempolicy first if SYS_get_mempolicy is not defined, which is suggested by @TheRealMDoerr in the JBS.
Thanks.

Ah, using __NR_get_mempolicy is suggested by @MBaesken in the JBS.
Thanks @dholmes-ora for correcting me.

But I'm not sure if the following change is safe enough since libnuma source code also handles the case when __NR_get_mempolicy isn't defined [1] .

#if !defined(SYS_get_mempolicy) && defined(S390)
#define SYS_get_mempolicy __NR_get_mempolicy
#endif

I would suggest keeping the same logic for that case.
What do you think?
Thanks.

[1] https://github.com/numactl/numactl/blob/master/syscall.c#L29

@MBaesken
Copy link
Member

MBaesken commented Jun 1, 2021

Hi, on newer s390x machines (SLES15.2) the value of SYS_get_mempolicy is 269 .
So a fix value per platform is not always correct.
However we have the define for SYS_get_mempolicy on this SLES15.2 machine.

Okay.
Patch has been updated to check & try to use __NR_get_mempolicy first if SYS_get_mempolicy is not defined, which is suggested by @TheRealMDoerr in the JBS.
Thanks.

Ah, using __NR_get_mempolicy is suggested by @MBaesken in the JBS.
Thanks @dholmes-ora for correcting me.

But I'm not sure if the following change is safe enough since libnuma source code also handles the case when __NR_get_mempolicy isn't defined [1] .

#if !defined(SYS_get_mempolicy) && defined(S390)
#define SYS_get_mempolicy __NR_get_mempolicy
#endif

I would suggest keeping the same logic for that case.
What do you think?
Thanks.

[1] https://github.com/numactl/numactl/blob/master/syscall.c#L29

Hi, checking for the existance of SYS_get_mempolicy AND __NR_get_mempolicy is better.

Best regards, Matthias

@DamonFool
Copy link
Member Author

Hi, checking for the existance of SYS_get_mempolicy AND __NR_get_mempolicy is better.

So are you fine with the latest change?
Thanks.

@dholmes-ora
Copy link
Member

I would prefer to only use what we need to fix the problem on the "old" system that it was observed rather than trying to generalize this for arbitrary/unknown old versions of Linux. Otherwise we can never know when it is okay to remove this code.

David

@DamonFool
Copy link
Member Author

Otherwise we can never know when it is okay to remove this code.

This sounds reasonable.
Will update it later.
Thanks.

@mlbridge
Copy link

mlbridge bot commented Jun 2, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

On 1/06/2021 9:25 pm, Jie Fu wrote:

On Tue, 1 Jun 2021 10:48:24 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

Hi, on newer s390x machines (SLES15.2) the value of SYS_get_mempolicy is 269 .
So a fix value per platform is not always correct.
However we have the define for SYS_get_mempolicy on this SLES15.2 machine.

Okay.
Patch has been updated to check & try to use __NR_get_mempolicy first if SYS_get_mempolicy is not defined, which is suggested by @TheRealMDoerr in the JBS.

I don't see that suggestion in the JBS. I do see a greatly simplified
suggestion from mbaesken:

#if !defined(SYS_get_mempolicy) && defined(S390)
#define SYS_get_mempolicy __NR_get_mempolicy
#endif

If we know that __NR_get_mempolicy is defined on the system that had the
problem then we don't need to try and generalize things - specially if
the actual value is not constant for a given architecture.

Thanks,
David

@DamonFool
Copy link
Member Author

I would prefer to only use what we need to fix the problem on the "old" system that it was observed rather than trying to generalize this for arbitrary/unknown old versions of Linux. Otherwise we can never know when it is okay to remove this code.

David

Updated.
Seems more clear now.
Thanks.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This simple fix, as suggested by @MBaesken, is fine with me, as long as it actually fixes the problem.

Thanks,
David

@MBaesken
Copy link
Member

MBaesken commented Jun 2, 2021

Hi, checking for the existance of SYS_get_mempolicy AND __NR_get_mempolicy is better.

So are you fine with the latest change?
Thanks.

hi, I checked with the larger fix you proposed yesterday. This fixed the night build.

@DamonFool
Copy link
Member Author

hi, I checked with the larger fix you proposed yesterday. This fixed the night build.

Thanks @MBaesken .
If you're fine with the latest version, could please also test it on your s390 machines?
Thanks.

@MBaesken
Copy link
Member

MBaesken commented Jun 2, 2021

hi, I checked with the larger fix you proposed yesterday. This fixed the night build.

Thanks @MBaesken .
If you're fine with the latest version, could please also test it on your s390 machines?
Thanks.

Unfortunately, the latest (short) version fails with
src/hotspot/os/linux/os_linux.cpp:2966:27: error: '__NR_get_mempolicy' was not declared in this scope
#define SYS_get_mempolicy __NR_get_mempolicy
^~~~~~~~~~~~~~~~~~

While __NR_get_mempolicy is on the machine in the system headers; it is not picked up, too bad.
Not sure what to include in this case ...

@DamonFool
Copy link
Member Author

hi, I checked with the larger fix you proposed yesterday. This fixed the night build.

Thanks @MBaesken .
If you're fine with the latest version, could please also test it on your s390 machines?
Thanks.

Unfortunately, the latest (short) version fails with
src/hotspot/os/linux/os_linux.cpp:2966:27: error: '__NR_get_mempolicy' was not declared in this scope
#define SYS_get_mempolicy __NR_get_mempolicy
^~~~~~~~~~~~~~~~~~

While __NR_get_mempolicy is on the machine in the system headers; it is not picked up, too bad.
Not sure what to include in this case ...

Thanks @MBaesken .

Then, I would prefer the larger one, which had been tested last night.
And patch has been updated.

Or shall we only keep the fix for s390?
What do you think?
Thanks.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to see something smaller and S390 specific as there is only a problem on that platform.

Also is the right check for S390 or s390x ? Different versions of this patch have used each of them. I have no idea what the classification of S390 machines is, so ni idea if these are in fact equivalent. ??

Thanks,
David

@MBaesken
Copy link
Member

MBaesken commented Jun 2, 2021

I would prefer to see something smaller and S390 specific as there is only a problem on that platform.

Also is the right check for S390 or s390x ? Different versions of this patch have used each of them. I have no idea what the classification of S390 machines is, so ni idea if these are in fact equivalent. ??

Thanks,
David

The S390 works nicely for both "flavors" it comes from our own autoconf/platform.m4

elif test "x$OPENJDK_$1_CPU" = xs390; then
HOTSPOT_$1_CPU_DEFINE=S390
elif test "x$OPENJDK_$1_CPU" = xs390x; then
HOTSPOT_$1_CPU_DEFINE=S390

So this looks reliable. The other one __s390x__ seems to come from GNUC, see https://sourceforge.net/p/predef/wiki/Architectures/ . It worked too nicely in our night make. However I think the HS macro S390 would be prefered in HS coding (it is used at quite alot of places ).
Regarding a shorter version, then maybe this one (its ugly but should work my colleagues told me the old value 236 for the mempolicy would be recognized too ).

#if !defined(SYS_get_mempolicy) && defined(S390)
// No SYS_get_mempolicy definition on SUSE Linux Enterprise Server 11.4 (s390x)
#define SYS_get_mempolicy 236
#endif

@TheRealMDoerr
Copy link
Contributor

TheRealMDoerr commented Jun 2, 2021

Right, S390 is used in OS independent hotspot code. __s390x__ comes from the linux headers. So, in os_linux, both macros can get used.
I'm fine with either version, but I'm not sure if only s390 is affected. I haven't heard from people who build zero on other platforms.

@DamonFool
Copy link
Member Author

I'm fine with either version, but I'm not sure if only s390 is affected. I haven't heard from people who build zero on other platforms.

I'm also a bit worrying about this.

@DamonFool
Copy link
Member Author

Regarding a shorter version, then maybe this one (its ugly but should work my colleagues told me the old value 236 for the mempolicy would be recognized too ).

So it sounds really ugly.

What do you think of this fix?

diff --git a/src/hotspot/os/linux/os_linux.cpp b/src/hotspot/os/linux/os_linux.cpp
index e3a7e28..4044b7b 100644
--- a/src/hotspot/os/linux/os_linux.cpp
+++ b/src/hotspot/os/linux/os_linux.cpp
@@ -2963,6 +2963,7 @@ void* os::Linux::libnuma_v2_dlsym(void* handle, const char* name) {
 
 // Check numa dependent syscalls
 static bool numa_syscall_check() {
+#ifdef SYS_get_mempolicy
   // NUMA APIs depend on several syscalls. E.g., get_mempolicy is required for numa_get_membind and
   // numa_get_interleave_mask. But these dependent syscalls can be unsupported for various reasons.
   // Especially in dockers, get_mempolicy is not allowed with the default configuration. So it's necessary
@@ -2972,6 +2973,7 @@ static bool numa_syscall_check() {
   if (syscall(SYS_get_mempolicy, &dummy, NULL, 0, (void*)&dummy, 3) == -1) {
     return false;
   }
+#endif
 
   return true;
 }

@TheRealMDoerr
Copy link
Contributor

Also fine with me. Should we return true or false when SYS_get_mempolicy is not defined?
True would mean not fixing JDK-8241423 on such platforms.
False would basically disable NUMA, but prevent crashes. Right?

@fweimer-rh
Copy link

#define SYS_get_mempolicy 236

This is not the correct number. It is 269 on both s390 and s390x. (236 is gettid on s390x.)

The kernel definitions can be a bit difficult to navigate. The glibc sources contain generated lists of system call numbers in the arch-syscall.h files (under the sysdeps/unix/sysv/linux directory tree), and these numbers are regularly verified on all supported architectures against published kernel sources (using a cross-compiler, so not just some script-based heuristics).

@DamonFool
Copy link
Member Author

False would basically disable NUMA, but prevent crashes. Right?

No.
It would crash only if ZGC is enabled.
So returning true is safe on s390 since there is no zgc on s390 and this is also what the old JDKs (before jdk17) actually do.
Thanks.

@TheRealMDoerr
Copy link
Contributor

Ok, your proposal is fine. Thanks!

@DamonFool
Copy link
Member Author

This is not the correct number. It is 269 on both s390 and s390x. (236 is gettid on s390x.)

If this is true, there should be a bug in the libnuma source code [1].
All my syscall numbers are copied from libnuma code.
Thanks.

[1] https://github.com/numactl/numactl/blob/master/syscall.c#L121

@DamonFool
Copy link
Member Author

Ok, your proposal is fine. Thanks!

Thanks @TheRealMDoerr .
I'll update it tomorrow if there is no objection.
Thanks.

@DamonFool
Copy link
Member Author

Ok, your proposal is fine. Thanks!

Thanks @TheRealMDoerr .
I'll update it tomorrow if there is no objection.
Thanks.

Updated.
Thanks.

Comment on lines +2971 to +2976
#ifdef SYS_get_mempolicy
int dummy = 0;
if (syscall(SYS_get_mempolicy, &dummy, NULL, 0, (void*)&dummy, 3) == -1) {
return false;
}
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If SYS_get_mempolicy is not defined then surely this must return false?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If SYS_get_mempolicy is not defined then surely this must return false?

I don't think so.

No SYS_get_mempolicy symbol doesn't mean get_mempolicy syscall is unsupported.
It may still be invoked directly by its syscall number just like what the libnuma does [1].

The logic is:

  1. if defined(SYS_get_mempolicy) and SYS_get_mempolicy returns -1
    then get_mempolicy is surely no supported
    so numa_syscall_check() must return fasle
  2. if not defined(SYS_get_mempolicy)
    then get_mempolicy may be supported or not
    so it can be wrong no matter what numa_syscall_check() returns.

However, for case 2), we can just let numa_syscall_check() return true.
This is fine because all the old JDKs (before jdk17) actually assume that get_mempolicy is always supported by default.

Another reason is that numa_syscall_check() is designed to do its best effort checks of numa related syscalls.
Even numa_syscall_check() return true, it still not guarantee that all numa required syscalls are supported since only get_mempolicy is checked currently.
But this is already enough for dockers with the default configuration.

Also, I think there should be SYS_get_mempolicy definition for almost all modern systems.
So the harmless false positive should be very very rare.

Thanks.

[1] https://github.com/numactl/numactl/blob/master/syscall.c#L121

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay.

@mlbridge
Copy link

mlbridge bot commented Jun 3, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

On 3/06/2021 12:51 pm, Jie Fu wrote:

On Thu, 3 Jun 2021 02:17:18 GMT, David Holmes <dholmes at openjdk.org> wrote:

If SYS_get_mempolicy is not defined then surely this must return false?

I don't think so.

No SYS_get_mempolicy symbol doesn't mean get_mempolicy syscall is unsupported.
It may still be invoked directly by its syscall number just like what the libnuma does [1].

The logic is:
1) if defined(SYS_get_mempolicy) and SYS_get_mempolicy returns -1
then get_mempolicy is surely no supported
so numa_syscall_check() must return fasle
2) if not defined(SYS_get_mempolicy)
then get_mempolicy may be supported or not
so it can be wrong no matter what numa_syscall_check() returns.

However, for case 2), we can just let numa_syscall_check() return true.
This is fine because all the old JDKs (before jdk17) actually assume that get_mempolicy is always supported by default.

Another reason is that numa_syscall_check() is designed to do its best effort checks of numa related syscalls.
Even numa_syscall_check() return true, it still not guarantee that all numa required syscalls are supported since only get_mempolicy is checked currently.
But this is already enough for dockers with the default configuration.

Also, I think there should be SYS_get_mempolicy definition for almost all modern systems.
So the harmless false positive should be very very rare.

Okay. So much trouble for so little code.

Thanks,
David
-----

1 similar comment
@mlbridge
Copy link

mlbridge bot commented Jun 3, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

On 3/06/2021 12:51 pm, Jie Fu wrote:

On Thu, 3 Jun 2021 02:17:18 GMT, David Holmes <dholmes at openjdk.org> wrote:

If SYS_get_mempolicy is not defined then surely this must return false?

I don't think so.

No SYS_get_mempolicy symbol doesn't mean get_mempolicy syscall is unsupported.
It may still be invoked directly by its syscall number just like what the libnuma does [1].

The logic is:
1) if defined(SYS_get_mempolicy) and SYS_get_mempolicy returns -1
then get_mempolicy is surely no supported
so numa_syscall_check() must return fasle
2) if not defined(SYS_get_mempolicy)
then get_mempolicy may be supported or not
so it can be wrong no matter what numa_syscall_check() returns.

However, for case 2), we can just let numa_syscall_check() return true.
This is fine because all the old JDKs (before jdk17) actually assume that get_mempolicy is always supported by default.

Another reason is that numa_syscall_check() is designed to do its best effort checks of numa related syscalls.
Even numa_syscall_check() return true, it still not guarantee that all numa required syscalls are supported since only get_mempolicy is checked currently.
But this is already enough for dockers with the default configuration.

Also, I think there should be SYS_get_mempolicy definition for almost all modern systems.
So the harmless false positive should be very very rare.

Okay. So much trouble for so little code.

Thanks,
David
-----

@DamonFool
Copy link
Member Author

Okay. So much trouble for so little code.

Thanks @dholmes-ora .

Just enjoy the discussion with you all.
Thanks.

@DamonFool
Copy link
Member Author

/integrate

@openjdk openjdk bot closed this Jun 3, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jun 3, 2021
@openjdk
Copy link

openjdk bot commented Jun 3, 2021

@DamonFool Since your change was applied there have been 63 commits pushed to the master branch:

  • 338dae4: 8268133: Update java/net/Authenticator tests to eliminate dependency on sun.net.www.MessageHeader and some other internal APIs
  • 29ab162: 8266257: Fix foreign linker build issues for ppc and s390
  • c8f4c02: 8268118: Rename bytes_os_cpu.inline.hpp files to bytes_os_cpu.hpp
  • 1296a6c: 8268119: Rename copy_os_cpu.inline.hpp files to copy_os_cpu.hpp
  • 1783437: 8264975: java/net/DatagramSocket/DatagramSocketMulticasting.java fails infrequently
  • a52a08d: 8267916: Adopt cast notation for CompilerThread conversions
  • 9bf347b: 8226362: langtools ProblemList file contains more than 1 entry for a test
  • bdeaeb4: 8240256: Better resource cleaning for SunPKCS11 Provider
  • 06f87cf: 8266337: ThreadTimesClosure doesn't handle exceptions properly
  • ef01e47: 8268150: tier2: test/jdk/tools/jpackage/junit/junit.java needs updating for jtreg 6
  • ... and 53 more: https://git.openjdk.java.net/jdk/compare/ae2f37f868bfdcb3d46098e91ed537fb199d7dbe...master

Your commit was automatically rebased without conflicts.

Pushed as commit fbaebd4.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@DamonFool DamonFool deleted the JDK-8268014 branch June 3, 2021 12:56
@fweimer-rh
Copy link

This is not the correct number. It is 269 on both s390 and s390x. (236 is gettid on s390x.)

If this is true, there should be a bug in the libnuma source code [1].
All my syscall numbers are copied from libnuma code.
Thanks.

[1] https://github.com/numactl/numactl/blob/master/syscall.c#L121

Thanks, I have filed an issue: numactl/numactl#110

@DamonFool
Copy link
Member Author

This is not the correct number. It is 269 on both s390 and s390x. (236 is gettid on s390x.)

If this is true, there should be a bug in the libnuma source code [1].
All my syscall numbers are copied from libnuma code.
Thanks.
[1] https://github.com/numactl/numactl/blob/master/syscall.c#L121

Thanks, I have filed an issue: numactl/numactl#110

Thumbs up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated test-request
Development

Successfully merging this pull request may close these issues.

5 participants