
8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages #2488

Closed
wants to merge 9 commits into master from kstefanj:8261401-shm-sanity

Conversation

kstefanj
Contributor

@kstefanj kstefanj commented Feb 9, 2021

When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see whether this kind of large page is available; if so, we disable UseSHM.

The problematic part is when HugeTLBFS pages are not available: we then disable that flag and, without doing any sanity check for UseSHM, mark large pages as enabled using SHM. One big problem with this is that SHM requires the same type of explicitly allocated huge pages as HugeTLBFS, as well as privileges to lock memory. So when HugeTLBFS cannot be used, it is likely that SHM cannot be used either.

A fix for this is to do a sanity check similar to the one currently done for HugeTLBFS and, if it fails, disable UseLargePages, since such allocation attempts would always fail anyway.

The proposed sanity check consists of two parts. The first simply tries to create a shared memory segment using shmget() with SHM_HUGETLB to use large pages. If this fails, there is no point in trying to use SHM to get large pages.
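
A minimal standalone sketch of this first part (the helper name is illustrative, not the patch's exact code; it assumes Linux's <sys/shm.h> exposes SHM_HUGETLB, SHM_R and SHM_W):

#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

// Probe for SysV shm large page support by creating, and immediately
// removing, a one-page shared memory segment.
static bool shm_hugetlb_probe(size_t large_page_size) {
  int shmid = shmget(IPC_PRIVATE, large_page_size,
                     SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W);
  if (shmid == -1) {
    // EPERM, ENOMEM, EINVAL, ...: large pages via SysV shm will not work.
    return false;
  }
  // The segment was only needed for the probe; mark it for removal.
  shmctl(shmid, IPC_RMID, NULL);
  return true;
}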

The second part checks whether the process has privileges to lock memory, or whether there will be a limit on the SHM usage. I think this is a nice addition since it notifies the user about the limit and explains why large page mappings fail. The implementation parses /proc/self/status to make sure the needed capability is available.
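
The patch's can_lock_memory() itself is not shown in this thread; the following is a sketch of how such a check could look (reconstructed from the description above, so details may differ), assuming the usual /proc/self/status format where the CapEff line holds the effective capability set as a hexadecimal bitmask:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define CAP_IPC_LOCK 14  // Capability number, as in linux/capability.h.

static bool can_lock_memory() {
  FILE* f = fopen("/proc/self/status", "r");
  if (f == NULL) {
    return false;
  }
  bool result = false;
  char line[256];
  while (fgets(line, sizeof(line), f) != NULL) {
    uint64_t caps;
    // The space in the format string also matches the tab after "CapEff:".
    if (sscanf(line, "CapEff: %" SCNx64, &caps) == 1) {
      result = (caps & (UINT64_C(1) << CAP_IPC_LOCK)) != 0;
      break;
    }
  }
  fclose(f);
  return result;
}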

This change needs two tests to be updated to handle that large pages now can be disabled even when running with +UseLargePages. One of these tests is also updated in PR#2486, and I plan to get that integrated before this one.
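
For context, the resulting setup logic is roughly as follows (a simplified sketch of the intended control flow, not the exact code in os_linux.cpp):

if (UseHugeTLBFS && !hugetlbfs_sanity_check(warn, page_size)) {
  UseHugeTLBFS = false;  // HugeTLBFS large pages are not available.
}
if (!UseHugeTLBFS && UseSHM && !shm_hugetlbfs_sanity_check(warn, page_size)) {
  UseSHM = false;        // The new check: SysV shm large pages will not work either.
}
if (!UseHugeTLBFS && !UseSHM) {
  UseLargePages = false; // No mechanism left to satisfy -XX:+UseLargePages.
}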


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages

Reviewers

  • tstuefe (Reviewer)
  • tschatzl (Reviewer)

Download

$ git fetch https://git.openjdk.java.net/jdk pull/2488/head:pull/2488
$ git checkout pull/2488

@bridgekeeper

bridgekeeper bot commented Feb 9, 2021

👋 Welcome back sjohanss! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Feb 9, 2021

@kstefanj The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Feb 9, 2021
@kstefanj kstefanj marked this pull request as ready for review February 10, 2021 10:28
@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 10, 2021
@mlbridge

mlbridge bot commented Feb 10, 2021

Webrevs

Member

@tstuefe tstuefe left a comment

Hi Stefan,

Testing UseSHM definitely makes sense. Since SysV shm has more restrictions than the mmap API, I am not sure that the former works where the latter does not.

About your patch: the shmget test is good, but I am not sure about the rest.

You first attempt to shmget(IPC_PRIVATE, page_size, SHM_HUGETLB). Then you separately scan for CAP_IPC_LOCK. But would shmget(SHM_HUGETLB) not fail with EPERM if CAP_IPC_LOCK is missing? man shmget says:

       EPERM  The SHM_HUGETLB flag was specified, but the caller was not
              privileged (did not have the CAP_IPC_LOCK capability).

So I think querying for EPERM after shmget should be sufficient? If CAP_IPC_LOCK was missing, the shmget should have failed and we should never get to the point of can_lock_memory().

About the rlimit test: Assuming the man page is wrong and shmget succeeds even without the capability, you only print this limit if can_lock_memory() returned false. The way I understand this is that the limit can still be the limiting factor even with those capabilities in place.

I think it would make more sense to extend the error reporting if later real "shmget" calls fail. Note that later reserve calls can fail for a number of reasons (huge page pool exhausted, RLIMIT_MEMLOCK reached, SHMMAX or SHMALL reached, commit charge reached its limit...) which means that just reporting on RLIMIT_MEMLOCK would be arbitrary. Whether or not more intelligent error reporting makes sense depends also on whether we think we still need the SysV path at all. I personally doubt that this still makes sense.

Cheers, Thomas

@@ -3564,6 +3564,66 @@ bool os::Linux::hugetlbfs_sanity_check(bool warn, size_t page_size) {
  return result;
}

// The capability needed to lock memory: CAP_IPC_LOCK
#define CAP_IPC_LOCK 14
#define CAP_IPC_LOCK_BIT (1 << CAP_IPC_LOCK)
Member

Can we not just include linux/capability.h?

Contributor Author

We can, I'll change it.

@kstefanj
Contributor Author

> Hi Stefan,
>
> Testing UseSHM definitely makes sense. Since SysV shm has more restrictions than the mmap API, I am not sure that the former works where the latter does not.

Nice to hear that you agree Thomas :)

> About your patch: the shmget test is good, but I am not sure about the rest.
>
> You first attempt to shmget(IPC_PRIVATE, page_size, SHM_HUGETLB). Then you separately scan for CAP_IPC_LOCK. But would shmget(SHM_HUGETLB) not fail with EPERM if CAP_IPC_LOCK is missing? man shmget says:
>
>        EPERM  The SHM_HUGETLB flag was specified, but the caller was not
>               privileged (did not have the CAP_IPC_LOCK capability).
>
> So I think querying for EPERM after shmget should be sufficient? If CAP_IPC_LOCK was missing, the shmget should have failed and we should never get to the point of can_lock_memory().

This was my initial thought as well, but it turns out that the capability is only needed to "lock" more memory than the MEMLOCK limit allows. So the first shmget() will succeed as long as there is at least one large page available (and the MEMLOCK limit is not smaller than the large page size).

I could of course design the other check to once again call shmget(), but with a size larger than the limit. If that check fails with EPERM, we don't have the needed capability.
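
A hypothetical sketch of that variant, for illustration only (the name, the use of getrlimit(), and the structure are assumptions, not part of the patch): request a segment just above RLIMIT_MEMLOCK and interpret EPERM as the capability being absent.

#include <errno.h>
#include <sys/ipc.h>
#include <sys/resource.h>
#include <sys/shm.h>

static bool lacks_ipc_lock_capability(size_t large_page_size) {
  struct rlimit lim;
  if (getrlimit(RLIMIT_MEMLOCK, &lim) != 0 || lim.rlim_cur == RLIM_INFINITY) {
    return false;  // No finite limit to probe against.
  }
  // Round the over-limit request up to a multiple of the large page size.
  size_t size = ((size_t)(lim.rlim_cur / large_page_size) + 1) * large_page_size;
  int shmid = shmget(IPC_PRIVATE, size, SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W);
  if (shmid == -1) {
    return errno == EPERM;  // EPERM above the limit implies no CAP_IPC_LOCK.
  }
  shmctl(shmid, IPC_RMID, NULL);  // Succeeded beyond the limit: capability present.
  return false;
}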

> About the rlimit test: Assuming the man page is wrong and shmget succeeds even without the capability, you only print this limit if can_lock_memory() returned false. The way I understand this is that the limit can still be the limiting factor even with those capabilities in place.

In old kernels this was the case, but according to man setrlimit it no longer is:

RLIMIT_MEMLOCK
  The maximum number of bytes of memory that may be locked into RAM ...
  In Linux kernels before 2.6.9, this limit controlled the amount of memory that could be
  locked by a privileged process. Since Linux 2.6.9, no limits are placed on the amount of
  memory that a privileged process may lock, and this limit instead governs the amount
  of memory that an unprivileged process may lock.

So if we have the capability, the limit will never be a limiting factor.

> I think it would make more sense to extend the error reporting if later real "shmget" calls fail. Note that later reserve calls can fail for a number of reasons (huge page pool exhausted, RLIMIT_MEMLOCK reached, SHMMAX or SHMALL reached, commit charge reached its limit...) which means that just reporting on RLIMIT_MEMLOCK would be arbitrary. Whether or not more intelligent error reporting makes sense depends also on whether we think we still need the SysV path at all. I personally doubt that this still makes sense.

I agree that this warning does not cover all bases, but I thought it was a good addition compared to just being silent about it. But I like your idea of doing only the simplest sanity check here and then, when a mapping goes bad, trying to figure out why.

In that case I would only include the first shmget() in the sanity check and file a separate issue for improving the warning.

Removed check for IPC_LOCK capability. If a large page mapping fails, the errno is already present in the warning that is printed. We could look at improving this to better explain when EPERM vs ENOMEM occurs.
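
A hypothetical sketch of such a refinement, placed in the failure branch of the sanity check (warning() is HotSpot's printf-style warning function; shmid, warn and errno come from the surrounding function; the messages are illustrative):

if (shmid == -1 && warn) {
  if (errno == EPERM) {
    warning("SHM large page allocation failed: missing CAP_IPC_LOCK or RLIMIT_MEMLOCK too low");
  } else if (errno == ENOMEM) {
    warning("SHM large page allocation failed: huge page pool exhausted or SHM limits reached");
  } else {
    warning("Failed to reserve SHM large pages (errno = %d)", errno);
  }
}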
Member

@tstuefe tstuefe left a comment

Looks almost good. Minor nit below.

src/hotspot/os/linux/os_linux.cpp (outdated; resolved)
@tstuefe
Member

tstuefe commented Feb 10, 2021

> > Hi Stefan,
> > Testing UseSHM definitely makes sense. Since SysV shm has more restrictions than the mmap API, I am not sure that the former works where the latter does not.
>
> Nice to hear that you agree Thomas :)

> > About your patch: the shmget test is good, but I am not sure about the rest.
> > You first attempt to shmget(IPC_PRIVATE, page_size, SHM_HUGETLB). Then you separately scan for CAP_IPC_LOCK. But would shmget(SHM_HUGETLB) not fail with EPERM if CAP_IPC_LOCK is missing? man shmget says:
> >
> >        EPERM  The SHM_HUGETLB flag was specified, but the caller was not
> >               privileged (did not have the CAP_IPC_LOCK capability).
> >
> > So I think querying for EPERM after shmget should be sufficient? If CAP_IPC_LOCK was missing, the shmget should have failed and we should never get to the point of can_lock_memory().
>
> This was my initial thought as well, but it turns out that the capability is only needed to "lock" more memory than the MEMLOCK limit allows. So the first shmget() will succeed as long as there is at least one large page available (and the MEMLOCK limit is not smaller than the large page size).
>
> I could of course design the other check to once again call shmget(), but with a size larger than the limit. If that check fails with EPERM, we don't have the needed capability.

Okay, I see what you did now.

> > About the rlimit test: Assuming the man page is wrong and shmget succeeds even without the capability, you only print this limit if can_lock_memory() returned false. The way I understand this is that the limit can still be the limiting factor even with those capabilities in place.
>
> In old kernels this was the case, but according to man setrlimit it no longer is:
>
> RLIMIT_MEMLOCK
>   The maximum number of bytes of memory that may be locked into RAM ...
>   In Linux kernels before 2.6.9, this limit controlled the amount of memory that could be
>   locked by a privileged process. Since Linux 2.6.9, no limits are placed on the amount of
>   memory that a privileged process may lock, and this limit instead governs the amount
>   of memory that an unprivileged process may lock.
>
> So if we have the capability, the limit will never be a limiting factor.

> > I think it would make more sense to extend the error reporting if later real "shmget" calls fail. Note that later reserve calls can fail for a number of reasons (huge page pool exhausted, RLIMIT_MEMLOCK reached, SHMMAX or SHMALL reached, commit charge reached its limit...) which means that just reporting on RLIMIT_MEMLOCK would be arbitrary. Whether or not more intelligent error reporting makes sense depends also on whether we think we still need the SysV path at all. I personally doubt that this still makes sense.
>
> I agree that this warning does not cover all bases, but I thought it was a good addition compared to just being silent about it. But I like your idea of doing only the simplest sanity check here and then, when a mapping goes bad, trying to figure out why.
>
> In that case I would only include the first shmget() in the sanity check and file a separate issue for improving the warning.

This makes sense, especially since you could add that warning to both SysV and mmap paths, since they share at least some of the restrictions, like huge page pool size. It would also mean you don't pay for some of these tests unless needed.

Member

@tstuefe tstuefe left a comment

Looks good now. Thank you!

@openjdk

openjdk bot commented Feb 11, 2021

@kstefanj This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages

Reviewed-by: stuefe, tschatzl

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 27 new commits pushed to the master branch:

  • 2e18b52: 8261752: Multiple GC test are missing memory requirements
  • c7885eb: 8261758: [TESTBUG] gc/g1/TestGCLogMessages.java fails if ergonomics detect too small InitialHeapSize
  • 05d5955: 8261522: [PPC64] AES intrinsics write beyond the destination array
  • 03b586b: 8261750: Remove internal class sun.net.www.MimeLauncher
  • 8418285: 8261235: C1 compilation fails with assert(res->vreg_number() == index) failed: conversion check
  • a930870: 8261309: Remove remaining StoreLoad barrier with UseCondCardMark for Serial/Parallel GC
  • b955f85: 8261075: Create stubRoutines.inline.hpp with SafeFetch implementation
  • d195033: 8261842: Shenandoah: cleanup ShenandoahHeapRegionSet
  • fc1d032: 8261125: Move VM_Operation to vmOperation.hpp
  • d547e1a: 8261123: Augment discussion of equivalence classes in Object.equals and comparison methods
  • ... and 17 more: https://git.openjdk.java.net/jdk/compare/3882fda83bebf4a8c8100fd59c37f04610491ce7...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 11, 2021
// Possible reasons for shmget failure:
// 1. shmmax is too small for the request.
// > check shmmax value: cat /proc/sys/kernel/shmmax
// > increase shmmax value: echo "0xffffffff" > /proc/sys/kernel/shmmax
Contributor

I'd avoid using a 32 bit constant on potentially 64 bit systems. According to that same man page, the limit on 64 bits is 127 TB which is more than the suggested 4 GB :)

The man page talks about ULONG_MAX - 2^24 as default (for 64 bit systems?).

Contributor Author

The comment is a copy of the old one where we actually use shmget(), but I agree that it could use an update. I am not even sure we have to suggest a specific value; something like:

Suggested change:
- // > increase shmmax value: echo "0xffffffff" > /proc/sys/kernel/shmmax
+ // > increase shmmax value: echo "new value" > /proc/sys/kernel/shmmax

might be good enough.

I will also update the original comment.

bool os::Linux::shm_hugetlbfs_sanity_check(bool warn, size_t page_size) {
  // Try to create a large shared memory segment.
  int shmid = shmget(IPC_PRIVATE, page_size, SHM_HUGETLB|IPC_CREAT|SHM_R|SHM_W);
  if (shmid == -1) {
Contributor

I think the flags should contain the SHM_HUGE_* flags (considering large pages), similar to hugetlbfs, for proper checking, to avoid a failure like the one in JDK-8261636 reported just recently for the same reason.

According to the man pages, such flags exist.

Contributor Author

I would agree if the later usage of shmget() made use of those flags, but it doesn't. We might want to add a CR for enabling the use of page sizes other than the default large page size for SHM as well. But there are also discussions about trying to remove this SHM support.

Contributor

I am good with creating a CR for that. We can always close it later.

// 2. not enough large page memory.
// > check available large pages: cat /proc/meminfo
// > increase amount of large pages:
// echo new_value > /proc/sys/vm/nr_hugepages
Contributor

This is just my opinion, but I would prefer to refer to some generic documentation about these options (and use the "new" locations in /sys/kernel/mm/hugepages/hugepages-*/ instead of the "legacy" /proc/sys), e.g. https://lwn.net/Articles/376606/ - yes, this is not official documentation, but it is the best I could find on the spot.

Contributor Author

I agree. The documentation I usually refer to is:
https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt

Since our SHM support currently only handles the default large page size, I think we should suggest using:
sysctl -w vm.nr_hugepages=new_value

Suggested change:
- // echo new_value > /proc/sys/vm/nr_hugepages
+ // sysctl -w vm.nr_hugepages=new_value
+ // > For more information regarding large pages please refer to:
+ //   https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt

Contributor Author

@kstefanj kstefanj left a comment

Thanks for taking a look, Thomas.


Contributor

@tschatzl tschatzl left a comment

I can't approve the changes you made in the "Suggested change" boxes, or even comment on them directly as "looks good", but they all look good.


@kstefanj
Contributor Author

/integrate

@openjdk openjdk bot closed this Feb 17, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Feb 17, 2021
@openjdk

openjdk bot commented Feb 17, 2021

@kstefanj Since your change was applied there have been 27 commits pushed to the master branch:

  • 2e18b52: 8261752: Multiple GC test are missing memory requirements
  • c7885eb: 8261758: [TESTBUG] gc/g1/TestGCLogMessages.java fails if ergonomics detect too small InitialHeapSize
  • 05d5955: 8261522: [PPC64] AES intrinsics write beyond the destination array
  • 03b586b: 8261750: Remove internal class sun.net.www.MimeLauncher
  • 8418285: 8261235: C1 compilation fails with assert(res->vreg_number() == index) failed: conversion check
  • a930870: 8261309: Remove remaining StoreLoad barrier with UseCondCardMark for Serial/Parallel GC
  • b955f85: 8261075: Create stubRoutines.inline.hpp with SafeFetch implementation
  • d195033: 8261842: Shenandoah: cleanup ShenandoahHeapRegionSet
  • fc1d032: 8261125: Move VM_Operation to vmOperation.hpp
  • d547e1a: 8261123: Augment discussion of equivalence classes in Object.equals and comparison methods
  • ... and 17 more: https://git.openjdk.java.net/jdk/compare/3882fda83bebf4a8c8100fd59c37f04610491ce7...master

Your commit was automatically rebased without conflicts.

Pushed as commit f639df4.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@kstefanj kstefanj deleted the 8261401-shm-sanity branch May 18, 2021 12:07
Labels

  • hotspot (hotspot-dev@openjdk.org)
  • integrated (Pull request has been integrated)

3 participants