
8257746: Regression introduced with JDK-8250984 - memory might be null in some machines #2269

Closed
wants to merge 1 commit into from

@poonamparhar (Member) commented Jan 27, 2021

Please review this simple change, which adds null checks for the memory field in CgroupV1Subsystem.java.

Problem: After the backport of JDK-8250984, there are places where memory.isSwapEnabled() is called without first checking whether memory is null. For example:

public long getMemoryAndSwapFailCount() {
    if (!memory.isSwapEnabled()) {
        return getMemoryFailCount();
    }
    return SubSystem.getLongValue(memory, "memory.memsw.failcnt");
}

But memory could be null on some machines that have cgroup entries for CPU but not for memory. On such machines, calling any method on memory throws a NullPointerException.

Fix: Add null checks for memory.
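A minimal sketch of the null-guard pattern, using hypothetical, simplified stand-ins for CgroupV1Subsystem and CgroupV1MemorySubSystemController. The UNAVAILABLE sentinel (-1) is an assumption made for illustration; the actual patch may return a different value when memory is absent.

```java
class NullSafeMemoryDemo {

    // Hypothetical sentinel for "value not available"; the real JDK code may use another convention.
    static final long UNAVAILABLE = -1L;

    // Hypothetical, simplified stand-in for CgroupV1MemorySubSystemController. It holds values
    // the real controller would read from memory.failcnt and memory.memsw.failcnt.
    static final class MemoryController {
        private final boolean swapEnabled;
        private final long failCount;
        private final long memswFailCount;

        MemoryController(boolean swapEnabled, long failCount, long memswFailCount) {
            this.swapEnabled = swapEnabled;
            this.failCount = failCount;
            this.memswFailCount = memswFailCount;
        }

        boolean isSwapEnabled() { return swapEnabled; }
        long failCount() { return failCount; }
        long memswFailCount() { return memswFailCount; }
    }

    // Stays null when no "memory" controller was found during setup.
    private final MemoryController memory;

    NullSafeMemoryDemo(MemoryController memory) {
        this.memory = memory;
    }

    long getMemoryFailCount() {
        return memory == null ? UNAVAILABLE : memory.failCount();
    }

    long getMemoryAndSwapFailCount() {
        if (memory == null) {
            return UNAVAILABLE; // guard before any method call on memory
        }
        if (!memory.isSwapEnabled()) {
            return getMemoryFailCount(); // no swap accounting: fall back to the memory-only count
        }
        return memory.memswFailCount();
    }

    public static void main(String[] args) {
        System.out.println(new NullSafeMemoryDemo(null).getMemoryAndSwapFailCount());
        System.out.println(new NullSafeMemoryDemo(new MemoryController(false, 3, 7)).getMemoryAndSwapFailCount());
        System.out.println(new NullSafeMemoryDemo(new MemoryController(true, 3, 7)).getMemoryAndSwapFailCount());
    }
}
```

Running main prints -1, 3 and 7: the guard turns a missing memory controller into a sentinel value instead of a NullPointerException.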


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8257746: Regression introduced with JDK-8250984 - memory might be null in some machines

Download

$ git fetch https://git.openjdk.java.net/jdk pull/2269/head:pull/2269
$ git checkout pull/2269

@bridgekeeper bridgekeeper bot commented Jan 27, 2021

👋 Welcome back poonam! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label Jan 27, 2021
@openjdk openjdk bot commented Jan 27, 2021

@poonamparhar The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs label Jan 27, 2021
@mlbridge mlbridge bot commented Jan 27, 2021

Webrevs

@jerboaa (Contributor) left a comment

I'm curious: What config actually triggers the NPE? What do /proc/self/mountinfo, /proc/self/cgroup and /proc/cgroups look like?
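A minimal Java sketch of how two of these files could be checked for a memory controller, assuming the formats documented in the Linux cgroups(7) man page: /proc/cgroups has a "#subsys_name hierarchy num_cgroups enabled" header followed by one row per controller, and /proc/self/cgroup has "hierarchy-ID:controller-list:cgroup-path" lines. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class CgroupFilesProbe {

    // /proc/cgroups: header line, then "name hierarchy num_cgroups enabled" rows.
    // Returns true only if a "memory" row exists and its enabled column is 1.
    static boolean memoryEnabledInProcCgroups(List<String> lines) {
        for (String line : lines) {
            if (line.isBlank() || line.startsWith("#")) {
                continue; // skip the header and empty lines
            }
            String[] f = line.trim().split("\\s+");
            if (f.length >= 4 && f[0].equals("memory")) {
                return f[3].equals("1");
            }
        }
        return false; // no "memory" row at all
    }

    // /proc/self/cgroup: on cgroup v1 the controller list is comma-separated;
    // on cgroup v2 it is empty ("0::/...").
    static boolean memoryInSelfCgroup(List<String> lines) {
        for (String line : lines) {
            String[] parts = line.split(":", 3);
            if (parts.length == 3 && Arrays.asList(parts[1].split(",")).contains("memory")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path procCgroups = Path.of("/proc/cgroups");
        Path selfCgroup = Path.of("/proc/self/cgroup");
        if (Files.isReadable(procCgroups)) {
            System.out.println("memory enabled in /proc/cgroups: "
                + memoryEnabledInProcCgroups(Files.readAllLines(procCgroups)));
        }
        if (Files.isReadable(selfCgroup)) {
            System.out.println("memory listed in /proc/self/cgroup: "
                + memoryInSelfCgroup(Files.readAllLines(selfCgroup)));
        }
    }
}
```

A machine matching the description (cgroup entries for CPU but not for memory) would be expected to print false for at least one of the two checks.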

@hseigel (Member) left a comment

Changes look good! Thanks for doing this.

Harold

@openjdk openjdk bot commented Jan 28, 2021

@poonamparhar This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8257746: Regression introduced with JDK-8250984 - memory might be null in some machines

Reviewed-by: hseigel

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 22 new commits pushed to the master branch:

  • baf46ba: 8259801: Enable XML Signature secure validation mode by default
  • 20e7df5: 8260466: Test TestHeapDumpOnOutOfMemoryError.java needs multiple @test sections
  • 11d6467: 8260407: cmp != __null && cmp->Opcode() == Op_CmpL failure with -XX:StressLongCountedLoop=200000000 in lucene
  • d07af2b: 8255531: MethodHandles::permuteArguments throws NPE when duplicating dropped arguments
  • a68c6c2: 8260579: PPC64 and S390 builds failures after JDK-8260467
  • 8752257: 8260502: [s390] NativeMovRegMem::verify() fails because it's too strict
  • 8fe1323: 8260520: Avoid getting permissions in JarFileFactory when no SecurityManager installed
  • ecde52e: 8260506: VersionHelper cleanup
  • a97aedf: 8256215: Shenandoah: re-organize saving/restoring machine state in assembler code
  • 316d52c: 8260497: Shenandoah: Improve SATB flushing
  • ... and 12 more: https://git.openjdk.java.net/jdk/compare/684c8558f66e6f8875b68d7b0319f8391389fad3...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Jan 28, 2021
@poonamparhar (Member, author) commented Jan 28, 2021

> I'm curious: What config actually triggers the NPE? What do /proc/self/mountinfo, /proc/self/cgroup and /proc/cgroups look like?

I don't have access to the config. The issue was reported by a customer.

@poonamparhar (Member, author) commented Jan 28, 2021

/integrate

@openjdk openjdk bot closed this Jan 28, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Jan 28, 2021
@openjdk openjdk bot commented Jan 28, 2021

@poonamparhar Since your change was applied there have been 23 commits pushed to the master branch:

  • 13ca433: 8259628: jdk/net/ExtendedSocketOption/AsynchronousSocketChannelNAPITest.java fails intermittently
  • baf46ba: 8259801: Enable XML Signature secure validation mode by default
  • 20e7df5: 8260466: Test TestHeapDumpOnOutOfMemoryError.java needs multiple @test sections
  • 11d6467: 8260407: cmp != __null && cmp->Opcode() == Op_CmpL failure with -XX:StressLongCountedLoop=200000000 in lucene
  • d07af2b: 8255531: MethodHandles::permuteArguments throws NPE when duplicating dropped arguments
  • a68c6c2: 8260579: PPC64 and S390 builds failures after JDK-8260467
  • 8752257: 8260502: [s390] NativeMovRegMem::verify() fails because it's too strict
  • 8fe1323: 8260520: Avoid getting permissions in JarFileFactory when no SecurityManager installed
  • ecde52e: 8260506: VersionHelper cleanup
  • a97aedf: 8256215: Shenandoah: re-organize saving/restoring machine state in assembler code
  • ... and 13 more: https://git.openjdk.java.net/jdk/compare/684c8558f66e6f8875b68d7b0319f8391389fad3...master

Your commit was automatically rebased without conflicts.

Pushed as commit abc4300.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@jerboaa (Contributor) commented Jan 28, 2021

> > I'm curious: What config actually triggers the NPE? What do /proc/self/mountinfo, /proc/self/cgroup and /proc/cgroups look like?
>
> I don't have access to the config. The issue was reported by a customer.

This isn't very satisfying, though. How can we be sure this issue isn't also present in the cgroup v2 code? Has this been tested? Surely the customer reported some stack trace, or some sort of reproducer was provided. What was the reasoning that established that this issue is present in JDK head, and only in the cgroups v1 code? My guess is that the issue was triggered via the OperatingSystemMXBean, but nothing to that effect has been noted here or in the bug.

If I were to propose such a point fix, I'd clearly have to provide some details about what the actual problem is, and explain why the fix is sufficient and covers all branches. All that was provided is: "But memory could be null on some machines that have cgroup entries for CPU but not for memory."
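The guess above points at the OperatingSystemMXBean. A minimal sketch of calling its memory and swap getters through the JDK-specific com.sun.management extension, which on Linux consult container metrics in recent JDKs. Whether any of these calls actually reaches getMemoryAndSwapFailCount() is not established in this discussion; getTotalMemorySize() and getFreeMemorySize() require JDK 14 or later, and OsBeanProbe is a hypothetical name.

```java
import java.lang.management.ManagementFactory;

class OsBeanProbe {

    // Reads memory and swap figures through the JDK-specific OperatingSystemMXBean extension.
    // getTotalMemorySize() and getFreeMemorySize() were added in JDK 14.
    static long[] snapshot() {
        com.sun.management.OperatingSystemMXBean os =
            ManagementFactory.getPlatformMXBean(com.sun.management.OperatingSystemMXBean.class);
        return new long[] {
            os.getTotalMemorySize(),
            os.getFreeMemorySize(),
            os.getTotalSwapSpaceSize(),
            os.getFreeSwapSpaceSize()
        };
    }

    public static void main(String[] args) {
        long[] s = snapshot();
        System.out.println("total memory: " + s[0]);
        System.out.println("free memory:  " + s[1]);
        System.out.println("total swap:   " + s[2]);
        System.out.println("free swap:    " + s[3]);
    }
}
```

Running such a probe on the affected machine would be one way to check whether this entry point reproduces the NPE.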

@poonamparhar (Member, author) commented Jan 28, 2021

> This isn't very satisfying, though. How can we be sure this issue isn't also present in the cgroup v2 code? Has this been tested? Surely the customer reported some stack trace, or some sort of reproducer was provided. What was the reasoning that established that this issue is present in JDK head, and only in the cgroups v1 code? My guess is that the issue was triggered via the OperatingSystemMXBean, but nothing to that effect has been noted here or in the bug.
>
> If I were to propose such a point fix, I'd clearly have to provide some details about what the actual problem is, and explain why the fix is sufficient and covers all branches. All that was provided is: "But memory could be null on some machines that have cgroup entries for CPU but not for memory."

I can check with the customer if they could share their config.

The cgroups v2 code was thoroughly examined, and this problem does not exist there: cgroups v2 does not have a separate MemorySubSystemController as we have for v1 cgroups.

An instance of CgroupV1MemorySubSystemController is stored as a member of CgroupV1Subsystem:

private CgroupV1MemorySubSystemController memory;

private void setMemorySubSystem(CgroupV1MemorySubSystemController memory) {
    this.memory = memory;
}

This memory field stays null if no "memory" entry is found while the subsystem objects are created in createSubSystemController():

case "memory":
    subsystem.setMemorySubSystem(new CgroupV1MemorySubSystemController(mountentry[3], mountentry[4]));
    break;

This fix ensures that memory is not null before any method is invoked on it.

Such a problem does not exist in the cgroups v2 code.
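To make the failure mode concrete, here is a minimal sketch of how a setup loop like createSubSystemController() can leave memory null. Everything here is hypothetical and simplified: the Controller class, fromMountinfo() and the sample lines only mimic the mountentry[3]/mountentry[4] usage above (zero-based fields 3 and 4 of a /proc/self/mountinfo line are the mount root and mount point) and are not the JDK implementation.

```java
import java.util.Arrays;
import java.util.List;

class SubsystemSetupSketch {

    // Illustrative sample lines in /proc/self/mountinfo format.
    static final String CPU_LINE =
        "30 25 0:26 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:12 - cgroup cgroup rw,cpu,cpuacct";
    static final String MEMORY_LINE =
        "33 25 0:29 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,memory";

    // Hypothetical stand-in for a v1 subsystem controller: just root and mount point.
    static final class Controller {
        final String root;
        final String mountPoint;

        Controller(String root, String mountPoint) {
            this.root = root;
            this.mountPoint = mountPoint;
        }
    }

    // Both fields start out null and are assigned only when a matching mount entry is seen.
    Controller cpu;
    Controller memory;

    static SubsystemSetupSketch fromMountinfo(List<String> lines) {
        SubsystemSetupSketch subsystem = new SubsystemSetupSketch();
        for (String line : lines) {
            String[] mountentry = line.split(" ");
            // "-" separates the optional fields from fstype, source and super options.
            int sep = Arrays.asList(mountentry).indexOf("-");
            if (sep < 0 || sep + 3 >= mountentry.length || !mountentry[sep + 1].equals("cgroup")) {
                continue; // not a cgroup v1 mount
            }
            // For cgroup v1 mounts, the super options name the attached controllers.
            for (String option : mountentry[sep + 3].split(",")) {
                switch (option) {
                    case "cpu":
                        subsystem.cpu = new Controller(mountentry[3], mountentry[4]);
                        break;
                    case "memory":
                        subsystem.memory = new Controller(mountentry[3], mountentry[4]);
                        break;
                    default:
                        break; // other controllers are ignored in this sketch
                }
            }
        }
        return subsystem;
    }

    public static void main(String[] args) {
        SubsystemSetupSketch cpuOnly = fromMountinfo(List.of(CPU_LINE));
        System.out.println("memory is null: " + (cpuOnly.memory == null)); // true: no memory mount seen
        SubsystemSetupSketch both = fromMountinfo(List.of(CPU_LINE, MEMORY_LINE));
        System.out.println("memory mount point: " + both.memory.mountPoint); // /sys/fs/cgroup/memory
    }
}
```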

@jerboaa (Contributor) commented Jan 28, 2021

> I can check with the customer if they could share their config.

That would be helpful. A stack trace of the NPE would be good too.

> The cgroups v2 code was thoroughly examined, and this problem does not exist there: cgroups v2 does not have a separate MemorySubSystemController as we have for v1 cgroups.
>
> This fix ensures that memory is not null before any method is invoked on it.
>
> Such a problem does not exist in the cgroups v2 code.

Right, but is this reasoning sound? Without knowing the code path that triggers the problem, and without knowing something about the config, can we really say? Depending on the config, it might fail in a very different way on cgroups v2!

On the other hand, knowing the config, might allow us to conclude it's an impossible config on cgroups v2 and we are done.
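Knowing the config would at least tell which hierarchy is in use. A minimal sketch, assuming the /proc/self/mountinfo format from proc(5) (the filesystem type follows the " - " separator: "cgroup" for v1 hierarchies, "cgroup2" for the unified one) and that, on cgroup v2, a cgroup.controllers file lists the available controllers space-separated. The Mode labels, including HYBRID for systems that mount both, are illustrative and not JDK terminology.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class HierarchyProbe {

    // Illustrative labels, not JDK terminology.
    enum Mode { V1, V2, HYBRID, NONE }

    // Classifies by the filesystem type that follows " - " in each mountinfo line.
    static Mode classify(List<String> mountinfoLines) {
        boolean v1 = false;
        boolean v2 = false;
        for (String line : mountinfoLines) {
            int sep = line.indexOf(" - ");
            if (sep < 0) {
                continue;
            }
            String fsType = line.substring(sep + 3).split(" ")[0];
            if (fsType.equals("cgroup")) {
                v1 = true;
            } else if (fsType.equals("cgroup2")) {
                v2 = true;
            }
        }
        if (v1 && v2) return Mode.HYBRID;
        if (v1) return Mode.V1;
        if (v2) return Mode.V2;
        return Mode.NONE;
    }

    // On cgroup v2, the missing-memory analogue is "memory" absent from cgroup.controllers.
    static boolean v2HasMemory(String cgroupControllers) {
        return Arrays.asList(cgroupControllers.trim().split("\\s+")).contains("memory");
    }

    public static void main(String[] args) throws IOException {
        Path mountinfo = Path.of("/proc/self/mountinfo");
        if (!Files.isReadable(mountinfo)) {
            return;
        }
        Mode mode = classify(Files.readAllLines(mountinfo));
        System.out.println("cgroup mode: " + mode);
        // Simplification: checks only the root of the visible unified hierarchy.
        Path controllers = Path.of("/sys/fs/cgroup/cgroup.controllers");
        if (mode == Mode.V2 && Files.isReadable(controllers)) {
            System.out.println("memory controller available: " + v2HasMemory(Files.readString(controllers)));
        }
    }
}
```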

3 participants