8241423: NUMA APIs fail to work in dockers due to dependent syscalls are disabled by default #4205

Closed
wants to merge 2 commits into from

Conversation

DamonFool
Member

@DamonFool DamonFool commented May 26, 2021

Hi all,

NUMA APIs fail to work in Docker containers because the syscalls they depend on are disabled by default.

The NUMA APIs depend on several syscalls.
For example, get_mempolicy is required by numa_get_membind and numa_get_interleave_mask.
These syscalls can be unavailable for various reasons.
In particular, get_mempolicy is not allowed in Docker with the default configuration [1].
So it is necessary to check whether the syscalls are available.

In theory, all NUMA-related syscalls should be checked.
In practice that is hard, because probing some of them, such as mbind, would cause unexpected side effects.
So the fix checks only get_mempolicy, which is already enough for Docker's default configuration.
This can be refined in the future if it turns out to be a problem.

Thanks.
Best regards,
Jie

[1] https://docs.docker.com/engine/security/seccomp/


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8241423: NUMA APIs fail to work in dockers due to dependent syscalls are disabled by default

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/4205/head:pull/4205
$ git checkout pull/4205

Update a local copy of the PR:
$ git checkout pull/4205
$ git pull https://git.openjdk.java.net/jdk pull/4205/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 4205

View PR using the GUI difftool:
$ git pr show -t 4205

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/4205.diff

@bridgekeeper

@bridgekeeper bridgekeeper bot commented May 26, 2021

👋 Welcome back jiefu! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label May 26, 2021
@openjdk

@openjdk openjdk bot commented May 26, 2021

@DamonFool The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime label May 26, 2021
@mlbridge

@mlbridge mlbridge bot commented May 26, 2021

Webrevs

Member

@dholmes-ora dholmes-ora left a comment

Hi Jie,

This wasn't as bad as I thought it might be. :) But I have one suggested change and a query.

Thanks,
David

@@ -4477,6 +4478,21 @@ void os::Linux::numa_init() {
}
}

// Check numa dependent syscalls
bool os::Linux::numa_syscall_check() {
Member

@dholmes-ora dholmes-ora May 26, 2021

This can just be a static function in the file rather than being in the os::Linux "namespace".

Member Author

@DamonFool DamonFool May 26, 2021

This can just be a static function in the file rather than being in the os::Linux "namespace".

Updated.
Thanks.

// to check whether the syscalls are available. Currently, only get_mempolicy is checked since checking
// others like mbind would cause unexpected side effects.
int dummy = 0;
if (syscall(SYS_get_mempolicy, &dummy, NULL, 0, (void*)&dummy, 3) == -1) {
Member

@dholmes-ora dholmes-ora May 26, 2021

Is this SYS_get_mempolicy symbol guaranteed to be available on our supported Linux versions? See earlier in the file for how we handle SYS_gettid and SYS_getcpu.

Member Author

@DamonFool DamonFool May 26, 2021

Is this SYS_get_mempolicy symbol guaranteed to be available on our supported Linux versions? See earlier in the file for how we handle SYS_gettid and SYS_getcpu.

I think it's OK to use the symbol directly since we have used it in ZGC [1].
Thanks.

[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/os/linux/gc/z/zSyscall_linux.cpp#L39
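
If the symbol could not be relied on, one way to guard against it is a compile-time fallback. The sketch below is hypothetical: it is not how os_linux.cpp handles SYS_gettid or SYS_getcpu, and the names `kHasGetMempolicySymbol` and `probe_get_mempolicy` are made up for illustration. It reports the syscall as unimplemented when the header does not define the number.

```cpp
#include <cerrno>
#include <sys/syscall.h>
#include <unistd.h>

#ifdef SYS_get_mempolicy
// The libc headers know the syscall number, so issue the real probe.
static const bool kHasGetMempolicySymbol = true;

static long probe_get_mempolicy() {
  int dummy = 0;
  // Same arguments as the probe in this PR (flags 3 = MPOL_F_NODE | MPOL_F_ADDR).
  return syscall(SYS_get_mempolicy, &dummy, nullptr, 0UL,
                 (void*)&dummy, 3UL);
}
#else
// No syscall number at build time: behave as if the kernel lacked the call.
static const bool kHasGetMempolicySymbol = false;

static long probe_get_mempolicy() {
  errno = ENOSYS;
  return -1;
}
#endif
```

With this shape the caller does not need its own `#ifdef`: it checks for a -1 result in both configurations, and a build against old headers simply runs with NUMA support off.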

@pliden
Contributor

@pliden pliden commented May 26, 2021

Wasn't this fixed in #3704? It seems we now have a similar check in two different places?

@DamonFool
Member Author

@DamonFool DamonFool commented May 26, 2021

Wasn't this fixed in #3704?

No.
#3704 only fixes the ZGC crash.
The other GCs still suffer from it.

It seems we now have a similar check in two different places?

To fix the zgc crash, we only need to check get_mempolicy.
But numa_syscall_check may be extended to other syscalls in the future.

@pliden
Contributor

@pliden pliden commented May 26, 2021

But then we should remove the check in ZGC, since it serves no purpose anymore when the same check is done in libnuma_init.

@DamonFool
Member Author

@DamonFool DamonFool commented May 27, 2021

But then we should remove the check in ZGC, since it serves no purpose anymore when the same check is done in libnuma_init.

Yes, this is a more general solution.

I sent an RFR for this in 2020 [1].
But the community didn't pay much attention to it at that time.
So we had to get #3704 fixed first, because ZGC always crashed in Docker with NUMA while the other GCs did not.

This week, @tschatzl encouraged me to try JDK-8241423 again.
So I sent this PR yesterday, hoping that it can be fixed for the other GCs too.

Changes in ZGC (JDK-8241354 and JDK-8266217) would become unnecessary if this PR is accepted.
To keep this PR easy to review, it would be better to file a separate RFE to remove them from ZGC.
Thanks.

[1] https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-March/041149.html

Member

@dholmes-ora dholmes-ora left a comment

LGTM.

Thanks,
David

@openjdk

@openjdk openjdk bot commented May 27, 2021

@DamonFool This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8241423: NUMA APIs fail to work in dockers due to dependent syscalls are disabled by default

Reviewed-by: dholmes, pliden

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 33 new commits pushed to the master branch:

  • 1d2c7ac: 8267555: Fix class file version during redefinition after 8238048
  • 97ec5ad: 8265753: Remove manual JavaThread transitions to blocked
  • 6eb9114: 8266877: Missing local debug information when debugging JEP-330
  • 0c9daa7: 8265029: Preserve SIZED characteristics on slice operations (skip, limit)
  • 95b1fa7: 8267529: StringJoiner can create a String that breaks String::equals
  • 7f52c50: 8182043: Access to Windows Large Icons
  • 8a31c07: 8267886: ProblemList javax/management/remote/mandatory/connection/RMIConnector_NPETest.java
  • ae258f1: 8265418: Clean-up redundant null-checks of Class.getPackageName()
  • 41185d3: 8229517: Support for optional asynchronous/buffered logging
  • 7c85f35: 8267123: Remove RMI Activation
  • ... and 23 more: https://git.openjdk.java.net/jdk/compare/f632254943e335d0b4a76d03530309cd194b0813...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label May 27, 2021
@DamonFool
Member Author

@DamonFool DamonFool commented May 27, 2021

LGTM.

Thanks,
David

Thanks @dholmes-ora .

@pliden
Contributor

@pliden pliden commented May 27, 2021

Changes in ZGC (JDK-8241354 and JDK-8266217) would become unnecessary if this PR is accepted.
To keep this PR easy to review, it would be better to file a separate RFE to remove them from ZGC.

I just think the backout of JDK-8241354 and JDK-8266217 should have been part of this PR. Otherwise this gets harder to review, since the question of why we now do redundant checks pops up. But sure, if you prefer backing out JDK-8241354 and JDK-8266217 as a separate change, I'm fine with that too.

@DamonFool
Member Author

@DamonFool DamonFool commented May 27, 2021

But sure, if you prefer backing out JDK-8241354 and JDK-8266217 as a separate change, I'm fine with that too.

Thanks @pliden .
I'll file an RFE after this PR.

@DamonFool
Member Author

@DamonFool DamonFool commented May 28, 2021

May I get a second review for this change?
Or is it already OK to be pushed?
Thanks.

@dholmes-ora
Member

@dholmes-ora dholmes-ora commented May 28, 2021

Two reviews needed.

@DamonFool
Member Author

@DamonFool DamonFool commented May 28, 2021

Hi @pliden and @tschatzl ,

Could you please review this change?
Thanks.

pliden
pliden approved these changes May 28, 2021
@DamonFool
Member Author

@DamonFool DamonFool commented May 28, 2021

Thanks @pliden .
/integrate

@openjdk openjdk bot closed this May 28, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels May 28, 2021
@openjdk

@openjdk openjdk bot commented May 28, 2021

@DamonFool Since your change was applied there have been 33 commits pushed to the master branch:

  • 1d2c7ac: 8267555: Fix class file version during redefinition after 8238048
  • 97ec5ad: 8265753: Remove manual JavaThread transitions to blocked
  • 6eb9114: 8266877: Missing local debug information when debugging JEP-330
  • 0c9daa7: 8265029: Preserve SIZED characteristics on slice operations (skip, limit)
  • 95b1fa7: 8267529: StringJoiner can create a String that breaks String::equals
  • 7f52c50: 8182043: Access to Windows Large Icons
  • 8a31c07: 8267886: ProblemList javax/management/remote/mandatory/connection/RMIConnector_NPETest.java
  • ae258f1: 8265418: Clean-up redundant null-checks of Class.getPackageName()
  • 41185d3: 8229517: Support for optional asynchronous/buffered logging
  • 7c85f35: 8267123: Remove RMI Activation
  • ... and 23 more: https://git.openjdk.java.net/jdk/compare/f632254943e335d0b4a76d03530309cd194b0813...master

Your commit was automatically rebased without conflicts.

Pushed as commit 1413f9e.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
hotspot-runtime integrated
3 participants