This repository has been archived by the owner. It is now read-only.

8259765: ZGC: Handle incorrect processor id reported by the operating system #124

Closed
wants to merge 2 commits

Conversation

@pliden
Contributor

@pliden pliden commented Jan 15, 2021

Some environments (e.g. OpenVZ containers) incorrectly report a logical processor id that is higher than the number of processors available. This is problematic, for example, when implementing CPU-local data structures, where the processor id is used to index into an array of length processor_count().

We've received crash reports from Jelastic (a Virtuozzo/OpenVZ user) where they run into this problem. We can work around the problem in the JVM until the underlying problem is fixed. Without this workaround, ZGC can't be used in this environment.

This is currently a ZGC-specific issue, since ZGC is the only part of HotSpot that uses CPU-local data structures, but that could change in the future.

Just to clarify: in a Virtuozzo/OpenVZ environment, it seems the underlying problem is not necessarily that sched_getcpu() returns an incorrect processor id, but rather that sysconf(_SC_NPROCESSORS_CONF) returns a number that is too low. Either way, sched_getcpu() and sysconf(_SC_NPROCESSORS_CONF) seem to have different views of the world. This is not an issue in container environments such as Docker.

This patch works around this problem by letting os::processor_id() on Linux detect incorrect processor ids and convert them to processor id 0. As mentioned in the comment in the code, this is safe, but not optimal for performance if the system actually has more than one processor. There's also a warning printed the first time this happens.

Testing: Manual testing with various fake/incorrect values returned from sched_getcpu().


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8259765: ZGC: Handle incorrect processor id reported by the operating system

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk16 pull/124/head:pull/124
$ git checkout pull/124

@bridgekeeper

@bridgekeeper bridgekeeper bot commented Jan 15, 2021

👋 Welcome back pliden! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@pliden
Contributor Author

@pliden pliden commented Jan 15, 2021

/cc hotspot-gc

@openjdk openjdk bot added the hotspot-gc label Jan 15, 2021
@openjdk

@openjdk openjdk bot commented Jan 15, 2021

@pliden
The hotspot-gc label was successfully added.

@pliden
Contributor Author

@pliden pliden commented Jan 15, 2021

/cc hotspot-runtime

@openjdk

@openjdk openjdk bot commented Jan 15, 2021

@pliden
The hotspot-runtime label was successfully added.

@pliden pliden marked this pull request as ready for review Jan 15, 2021
@openjdk openjdk bot added the rfr label Jan 15, 2021
@mlbridge

@mlbridge mlbridge bot commented Jan 15, 2021

Webrevs

@@ -4746,19 +4746,46 @@ int os::active_processor_count() {
return active_cpus;
}

static volatile int warn_invalid_processor_id = 1;
Member

@albertnetymk albertnetymk Jan 15, 2021

Maybe move this var into the function, since it's only used inside it.

Contributor Author

@pliden pliden Jan 15, 2021

Doing so would come with the cost of always having to run a pthread_once() on function entry.

Contributor Author

@pliden pliden Jan 19, 2021

I was wrong here. Since the initialization is effectively const/constexpr, there will not be any "pthread_once" overhead here. Moved the static variable inside the function in the latest commit.

log_warning(os)("Invalid processor id reported by the operating system "
"(got processor id %d, valid processor id range is 0-%d)",
id, processor_count() - 1);
log_warning(os)("Falling back so assuming processor id is 0. "
Contributor

@fisk fisk Jan 15, 2021

s/so/to/

Contributor Author

@pliden pliden Jan 15, 2021

Will fix!

// Some debuggers limit the processor count without limiting
// the returned processor ids. Fake the processor id.
return 0;
if (id >= 0 && id < processor_count()) {
Contributor

@fisk fisk Jan 15, 2021

Do we really need to check if the returned processor ID is negative? That seems a whole new level of environment screwup to me.

Contributor Author

@pliden pliden Jan 15, 2021

I'm thinking we should make this safe to call in all cases. God knows what a broken environment might return.

Contributor Author

@pliden pliden Jan 19, 2021

After some discussions, we agreed not to check for negative processor ids.

Member

@dholmes-ora dholmes-ora left a comment

So we have to penalize all correctly functioning users because of one broken environment? Can we not detect this broken environment at startup and inject a workaround then?

Why is this an environment that is important enough that OpenJDK has to make changes to deal with a broken environment?

Cheers,
David

fisk
fisk approved these changes Jan 18, 2021
@openjdk

@openjdk openjdk bot commented Jan 18, 2021

@pliden This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8259765: ZGC: Handle incorrect processor id reported by the operating system

Reviewed-by: ayang, eosterlund

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 4 new commits pushed to the master branch:

  • bb0821e: 8258643: [TESTBUG] javax/swing/JComponent/7154030/bug7154030.java failed with "Exception: Failed to hide opaque button"
  • cd25bf2: 8259574: SIGSEGV in BFSClosure::closure_impl
  • d5ca3b3: 8259641: C2: assert(early->dominates(LCA)) failed: early is high enough
  • e85892b: 8258396: SIGILL in jdk.jfr.internal.PlatformRecorder.rotateDisk()

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Jan 18, 2021
@fweimer-rh

@fweimer-rh fweimer-rh commented Jan 18, 2021

What does the affinity mask look like at process startup? It should be possible to look at that and take the maximum of the highest CPU ID in the mask (plus 1) and sysconf(_SC_NPROCESSORS_CONF). This would be a one-time overhead.

It will not work with container deployments that dynamically alter affinity masks. Are there any?

@pliden
Contributor Author

@pliden pliden commented Jan 19, 2021

@dholmes-ora

So we have to penalize all correctly functioning users because of one broken environment? Can we not detect this broken environment at startup and inject a workaround then?

Not sure what you have in mind here? Having an indirect function call would not result in a lower overhead than the test/branch I've introduced. It's also not necessarily trivial to detect this error at startup, as you would need a reliable way to enumerate all processors (something that seems semi-broken in this environment, which is the root of the problem), bind the current thread to each of them and then check the processor id.

Why is this an environment that is important enough that OpenJDK has to make changes to deal with a broken environment?

That's of course always a judgement call/trade-off. I can't say I have a super good understanding of how common this environment is, but there's at least one "Java cloud provider" that uses this environment.

@pliden
Contributor Author

@pliden pliden commented Jan 20, 2021

It seems there have been e-mails sent that didn't show up here, so I'm answering on GitHub to hopefully re-attach the discussion to this PR.

From the mailing list:

Glibc's tst-getcpu.c (which I assume is the test you are referring
to?) fails in their environment, so it seems like the affinity mask
isn't reliable either.

What's the nature of the failure? If it's due to a non-changing
affinity mask, then using sched_getaffinity data would still be okay.

Glibc's tst-getcpu fails with some version of "getcpu results X should be Y".

There seems to be a disconnect between CPU masks/affinity and what sched_getcpu() returns.

Example (container with 1 CPU):

  1. sysconf(_SC_NPROCESSORS_CONF) returns 1
  2. sysconf(_SC_NPROCESSORS_ONLN) returns 1
  3. sched_getaffinity() returns the mask 00000001
  4. sched_setaffinity(00000001) returns success, but then sched_getcpu() returns 7(!). Should have returned 0.

Another example (container with 2 CPUs):

  1. sysconf(_SC_NPROCESSORS_CONF) returns 2
  2. sysconf(_SC_NPROCESSORS_ONLN) returns 2
  3. sched_getaffinity() returns the mask 00000011
  4. sched_setaffinity(00000001) returns success, but then sched_getcpu() returns 2(!). Should have returned 0.
  5. sched_setaffinity(00000010) returns success, but then sched_getcpu() also returns 2(!). Should have returned 1.

It looks like CPUs are virtualized on some level, but not in sched_getcpu(). I'm guessing sched_getcpu() is returning the CPU id of the physical CPU, and not the virtual CPU, or something. So in the last example, maybe both virtual CPUs were scheduled on the same physical CPU.

@mlbridge

@mlbridge mlbridge bot commented Jan 21, 2021

Mailing list message from David Holmes on hotspot-gc-dev:

Hi Per,

On 20/01/2021 11:16 pm, Per Liden wrote:

[... full quote of Per Liden's previous comment elided ...]

So it isn't that sysconf(_SC_NPROCESSORS_CONF) returns a too-low number as stated in the PR, but rather that after calling sched_setaffinity(), sched_getcpu() is broken? Either way, won't that breakage also potentially affect the NUMA code as well?

Thanks,
David

@pliden
Contributor Author

@pliden pliden commented Jan 21, 2021

Does sched_getaffinity actually change the affinity mask?

(assuming you meant sched_setaffinity here...)

You seem to be right. sched_setaffinity() returns success, but a following call to sched_getaffinity() shows it had no effect.

I wonder if it just reports a 2**N - 1 unconditionally, with N being the
number of configured vCPUs for the container. It probably does that so
that the population count of the affinity mask matches the vCPU count.
Likewise for the CPU entries under /sys (currently ignored by glibc
because of a parser bug) and /proc/stat (the fallback actually used by
glibc). There is no virtualization of CPU IDs whatsoever, it looks like
it's all done to communicate the vCPU count, without taking into account
how badly this interacts with sched_getcpu.

Yep, that's what it looks like.

@pliden
Contributor Author

@pliden pliden commented Jan 21, 2021

So it isn't that sysconf(_SC_NPROCESSORS_CONF) returns a too low number as stated in the PR but rather that after calling sched_setaffinity, sched_getcpu is broken?

It wasn't my intention to claim that sysconf() is the problem here. I just wanted to mention that it might be sysconf() that is the problem. The reason I mentioned that is because of how Docker behaves. If you give a Docker container 2 CPUs, sysconf() will still return the number of CPUs available on the host system, e.g. 8, and sched_getcpu() will in that case return numbers in the 0-7 range. Of course, this was just an observation; Docker and OpenVZ could do things differently here.

Either way won't that breakage also potentially affect the NUMA code as well?

We should be good, because libnuma will report that NUMA is not available, so we automatically disable UseNUMA if it's set.

@pliden
Contributor Author

@pliden pliden commented Jan 22, 2021

@dholmes-ora Do you still have questions or concerns here, or can I go ahead and integrate this?

I've gone through all uses of sysconf(_SC_NPROCESSORS_*) and sched_getaffinity() we have, and they look fine. I've also looked at how the OSContainer stuff behaves in this environment, and it also looks fine. In summary, the only problem I can spot is related to sched_getcpu().

@mlbridge

@mlbridge mlbridge bot commented Jan 24, 2021

Mailing list message from David Holmes on hotspot-gc-dev:

On 22/01/2021 9:21 pm, Per Liden wrote:

[... quoted earlier review exchange elided ...]

@dholmes-ora Do you still have questions or concerns here, or can I go ahead and integrate this?

I remain concerned about the justification for putting in this workaround for a broken virtualization system. I would be happier if the bug was acknowledged and a fix was in the pipeline, so we would know how long we have to carry this for.

I've gone through all uses of sysconf(_SC_NPROCESSORS_*) and sched_getaffinity() we have, and they look fine. I've also looked at how the OSContainer stuff behaves in this environment, and it also looks fine. In summary, the only problem I can spot is related to sched_getcpu().

So IIUC what we suspect is that sched_getcpu is reporting physical ids rather than virtualized ones. I find it hard to imagine how only one API in this area can be affected by such a bug, but if that appears to be the case then that is reassuring.

I won't "block" this, but I'm not happy about it.

Thanks,
David

@pliden
Contributor Author

@pliden pliden commented Jan 28, 2021

Ok, thanks all for reviewing.

/integrate

@openjdk openjdk bot closed this Jan 28, 2021
@openjdk openjdk bot added integrated and removed ready labels Jan 28, 2021
@openjdk openjdk bot removed the rfr label Jan 28, 2021
@openjdk

@openjdk openjdk bot commented Jan 28, 2021

@pliden Since your change was applied there have been 16 commits pushed to the master branch:

  • e28e111: 8260370: C2: LoopLimit node is not eliminated
  • 408772c: 8259025: Record compact constructor using Objects.requireNonNull
  • 81e730e: 8259276: C2: Empty expression stack when reexecuting tableswitch/lookupswitch instructions after deoptimization
  • c5ab7c3: 8260284: C2: assert(_base == Int) failed: Not an Int
  • 685c03d: 8259271: gc/parallel/TestDynShrinkHeap.java still fails "assert(covered_region.contains(new_memregion)) failed: new region is not in covered_region"
  • d90e06a: 8259775: [Vector API] Incorrect code-gen for VectorReinterpret operation
  • ede1bea: 8227695: assert(pss->trim_ticks().seconds() == 0.0) failed: Unexpected partial trimming during evacuation
  • 62eab50: 8255199: Catching a few NumberFormatExceptions in xmldsig
  • a5367cb: 8247619: Improve Direct Buffering of Characters
  • 0408b23: 8259757: add a regression test for 8259353 and 8259601
  • ... and 6 more: https://git.openjdk.java.net/jdk16/compare/4307fa68b78b3516a2fe4085534681229aa69a13...master

Your commit was automatically rebased without conflicts.

Pushed as commit e68eac9.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@pliden pliden deleted the 8259765_processor_id branch Jan 28, 2021