Skip to content

8267579: Thread::cooked_allocated_bytes() hits assert(left >= right) failed: avoid underflow #4363

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

mgronlun
Copy link

@mgronlun mgronlun commented Jun 4, 2021

This is a workaround to avoid hitting the assert that was recently added to pointer_delta().

The implementation of cooked_allocated_bytes() is perhaps questionable, in that it reads tlab pointers optimistically. However, the functionality has been in place for a long time, and the impact of changing its behavior more substantially is unknown at this time, hence this defensive workaround to reduce noise and problems seen in general testing.

Testing: tier1, tier2, tier6, tier8

Thanks
Markus


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8267579: Thread::cooked_allocated_bytes() hits assert(left >= right) failed: avoid underflow

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/4363/head:pull/4363
$ git checkout pull/4363

Update a local copy of the PR:
$ git checkout pull/4363
$ git pull https://git.openjdk.java.net/jdk pull/4363/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 4363

View PR using the GUI difftool:
$ git pr show -t 4363

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/4363.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 4, 2021

👋 Welcome back mgronlun! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 4, 2021
@openjdk
Copy link

openjdk bot commented Jun 4, 2021

@mgronlun The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Jun 4, 2021
@mgronlun
Copy link
Author

mgronlun commented Jun 4, 2021

/label add hotspot-jfr

@openjdk openjdk bot added the hotspot-jfr hotspot-jfr-dev@openjdk.org label Jun 4, 2021
@openjdk
Copy link

openjdk bot commented Jun 4, 2021

@mgronlun
The hotspot-jfr label was successfully added.

@mlbridge
Copy link

mlbridge bot commented Jun 4, 2021

Webrevs

Copy link
Member

@dcubed-ojdk dcubed-ojdk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thumbs up.

@openjdk
Copy link

openjdk bot commented Jun 5, 2021

@mgronlun This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8267579: Thread::cooked_allocated_bytes() hits assert(left >= right) failed: avoid underflow

Reviewed-by: dcubed, stefank, kbarrett

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 140 new commits pushed to the master branch:

  • 94d0b0f: 8268565: runtime/records/RedefineRecord.java should be run in driver mode
  • df65237: 8267930: Refine code for loading hsdis library
  • 2e900da: 8268574: ProblemList tests failing due to UseBiasedLocking going away
  • 4fd2a14: 8267556: Enhance class paths check during runtime
  • 8c8422e: 8267893: Improve jtreg test failure handler do get native/mixed stack traces for cores and live processes
  • 1e1039a: 8268223: Problemlist vmTestbase/nsk/jdi/HiddenClass/events/events001.java
  • 78cb677: 8268539: several serviceability/sa tests should be run in driver mode
  • 7267227: 8268361: Fix the infinite loop in next_line
  • b018c45: 8267630: Start of release updates for JDK 18
  • 7400789: 8268542: serviceability/logging/TestFullNames.java tests only 1st test case
  • ... and 130 more: https://git.openjdk.java.net/jdk/compare/59a539fef12dec6ba8af8a41000829402e7e9b72...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 5, 2021
Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Markus,

So this fix simply unpacks used_bytes() to avoid calling pointer_delta with values that trigger the assertion. But given we see other problems with that assertion ie JDK-8268265, then could it not be that the assertion is in fact the problem? Is the assertion not making a usage assumption that is simply not valid?

Thanks,
David

Copy link
Member

@stefank stefank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a race-condition and will potentially fail just the same way the bug in #4224 did. There needs to be at least Atomic::loads of the the variables, or the compiler could convert the loads into multiple loads.

@dholmes-ora
Copy link
Member

@stefank has explained in the bug report why the assertion is not the issue - thanks. So I have to question the validity of just side-stepping the assertion without trying to fix the broken code. What are the implications of finding these broken invariants in the product code? Do we just report/print an incorrect value?

@mgronlun
Copy link
Author

mgronlun commented Jun 7, 2021

@dholmes-ora The existing mechanism is problematic in that it is racy, it reads the pointers being updated by another thread. It attempts to parry for reading problematic values by comparing the used_bytes against the max_tlab_size, to avoid reporting really offside values. A safer means would be to pick up and report only the bytes the owner of the tlab reported already (I..e _allocated_bytes), but this means the value reported will then trail by one tlab, and the impact of doing this is right now is unknown. It could also involve introducing some protocol in how the threads publish their values from their TLABs. All of this can be done of course, but I am not sure this can be addressed straight away. Since testing suffers because the existing behaviour that has been in place for years is now starting to hit the newly introduced assert, this is a workaround, not a fix. It's not clear what the real fix is, so maybe we can create another issue to attempt to figure that out.

@mlbridge
Copy link

mlbridge bot commented Jun 7, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

On 7/06/2021 8:16 pm, Markus Gr?nlund wrote:

On Mon, 7 Jun 2021 07:44:25 GMT, David Holmes <dholmes at openjdk.org> wrote:

This is a race-condition and will potentially fail just the same way the bug in #4224 did. There needs to be at least Atomic::loads of the the variables, or the compiler could convert the loads into multiple loads.

@stefank has explained in the bug report why the assertion is not the issue - thanks. So I have to question the validity of just side-stepping the assertion without trying to fix the broken code. What are the implications of finding these broken invariants in the product code? Do we just report/print an incorrect value?

@dholmes-ora The existing mechanism is broken in that it is racy, it reads the pointers being updated by another thread. It attempts to parry for reading problematic values by comparing the used_bytes against the max_tlab_size, to avoid reporting really offside values. A safer means would be to pick up and report only the bytes the owner of the tlab reported already (I..e _allocated_bytes), but this means the value reported will then trail by one tlab, and the impact of doing this is right now is unknown. It could also involve introducing some protocol in how the threads publish their values from their TLABs. All of this can be done of course, but I am not sure this can be addressed straight away. Since testing suffers because the existing behaviour that has been in place for years is now starting to hit the newly introduced assert, this is a workaround, not a fix. It's not clear what the real fix is, so maybe we can create another issue to attempt to figure that out.

Okay, please file an issue for the real issue.

Thanks,
David

Copy link
Member

@dcubed-ojdk dcubed-ojdk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thumbs up.

Should the _start and _top fields be marked volatile to
prevent the S390 compiler from "optimizing" like it did with
another recent bug (JDK-8267842)?

@mgronlun
Copy link
Author

mgronlun commented Jun 8, 2021

@dcubed-ojdk Thanks Dan for looking at this. Hmm...having to introduce an explicit "volatile" keyword implies the "read-once" promise of Atomic::load() is not sufficient? I was told the side-effects of via Atomic::load(), which makes expressions volatile-qualified, are sufficient for read-once. But maybe not for all platforms?

If "volatile" decorations are required, I would prefer, granted it's sufficient, to only decorate the local variables. Having to decorate the tlab pointers could result in introducing unnecessary ordering constraints, i.e. for the TLAB owner, which could perhaps hurt performance?

@dcubed-ojdk
Copy link
Member

No problem. I believe that @dholmes-ora thinks the bug that we
fixed with JDK-8267842 is a compiler bug for S390.

@mgronlun
Copy link
Author

mgronlun commented Jun 9, 2021

@stefank @dholmes-ora Are you ok with the suggested fix? The decision to have the impl in the .cpp is to avoid having to include runtime/atomic.hpp in the .inline.hpp file unnecessarily (not deemed performance sensitive). Let me know if you prefer the impl in the .inline.hpp instead.

@@ -473,3 +474,11 @@ size_t ThreadLocalAllocBuffer::end_reserve() {
size_t reserve_size = Universe::heap()->tlab_alloc_reserve();
return MAX2(reserve_size, (size_t)_reserve_for_allocation_prefetch);
}

const HeapWord* ThreadLocalAllocBuffer::start_relaxed() const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the "relaxed" terminology? By default accessors have no memory ordering properties.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you are asking, but Atomic::load implements the "relaxed" semantics equivalent to using MO_RELAXED in the Access API.

@mlbridge
Copy link

mlbridge bot commented Jun 10, 2021

Mailing list message from David Holmes on hotspot-jfr-dev:

On 9/06/2021 11:54 pm, Stefan Karlsson wrote:

On Wed, 9 Jun 2021 13:36:31 GMT, David Holmes <dholmes at openjdk.org> wrote:

Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision:

read once

src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp line 478:

476: }
477:
478: const HeapWord* ThreadLocalAllocBuffer::start_relaxed() const {

Why do we need the "relaxed" terminology? By default accessors have no memory ordering properties.

I'm not sure what you are asking, but Atomic::load implements the "relaxed" semantics equivalent to using MO_RELAXED in the Access API.

I see this terminology is used in some of the GC code but I was not
aware of its existence. I also see nothing in atomic.hpp that states
that Atomic::load/store implement memory_order_relaxed ?? Is the
difference between unordered and relaxed what you get from applying
volatile to the variable being accessed? So unordered from a h/w
perspective but with some constraints on the compiler to maintain
program order?

Thanks,
David
-----

1 similar comment
@mlbridge
Copy link

mlbridge bot commented Jun 10, 2021

Mailing list message from David Holmes on hotspot-jfr-dev:

On 9/06/2021 11:54 pm, Stefan Karlsson wrote:

On Wed, 9 Jun 2021 13:36:31 GMT, David Holmes <dholmes at openjdk.org> wrote:

Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision:

read once

src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp line 478:

476: }
477:
478: const HeapWord* ThreadLocalAllocBuffer::start_relaxed() const {

Why do we need the "relaxed" terminology? By default accessors have no memory ordering properties.

I'm not sure what you are asking, but Atomic::load implements the "relaxed" semantics equivalent to using MO_RELAXED in the Access API.

I see this terminology is used in some of the GC code but I was not
aware of its existence. I also see nothing in atomic.hpp that states
that Atomic::load/store implement memory_order_relaxed ?? Is the
difference between unordered and relaxed what you get from applying
volatile to the variable being accessed? So unordered from a h/w
perspective but with some constraints on the compiler to maintain
program order?

Thanks,
David
-----

@stefank
Copy link
Member

stefank commented Jun 10, 2021

I'm not sure what you are asking, but Atomic::load implements the "relaxed" semantics equivalent to using MO_RELAXED in the Access API.

I see this terminology is used in some of the GC code but I was not
aware of its existence.

This is not a GC-specific terminology, but a C++11 terminology:
https://en.cppreference.com/w/cpp/atomic/memory_order

I also see nothing in atomic.hpp that states
that Atomic::load/store implement memory_order_relaxed ??

In atomic.hpp there are references to relaxed at the top of the file:

enum atomic_memory_order {
  // The modes that align with C++11 are intended to
  // follow the same semantics.
  memory_order_relaxed = 0,
  memory_order_acquire = 2,
  memory_order_release = 3,
  memory_order_acq_rel = 4,
  // Strong two-way memory barrier.
  memory_order_conservative = 8
};

But you are right, nothing here helps the reader understand that Atomic::load implies relaxed. This needs to be better documented.

Is the
difference between unordered and relaxed what you get from applying
volatile to the variable being accessed? So unordered from a h/w
perspective but with some constraints on the compiler to maintain
program order?

I talked to @fisk about this yesterday, and IIUC relaxed access are not supposed to be reordered by the HW, but volatile access could be.

Thanks,
David

@mlbridge
Copy link

mlbridge bot commented Jun 10, 2021

Mailing list message from David Holmes on hotspot-jfr-dev:

On 10/06/2021 4:55 pm, Stefan Karlsson wrote:

On Thu, 10 Jun 2021 04:35:50 GMT, David Holmes <david.holmes at oracle.com> wrote:

I'm not sure what you are asking, but Atomic::load implements the "relaxed" semantics equivalent to using MO_RELAXED in the Access API.

I see this terminology is used in some of the GC code but I was not
aware of its existence.

This is not a GC-specific terminology, but a C++11 terminology:
https://en.cppreference.com/w/cpp/atomic/memory_order

I meant in the context of the hotspot sources and its use in method names.

I also see nothing in atomic.hpp that states
that Atomic::load/store implement memory_order_relaxed ??

In atomic.hpp there are references to relaxed at the top of the file:

enum atomic_memory_order {
// The modes that align with C++11 are intended to
// follow the same semantics.
memory_order_relaxed = 0,
memory_order_acquire = 2,
memory_order_release = 3,
memory_order_acq_rel = 4,
// Strong two-way memory barrier.
memory_order_conservative = 8
};

But you are right, nothing here helps the reader understand that Atomic::load implies relaxed. This needs to be better documented.

What am I missing - I can't see any memory barrier related instructions
associated with basic Atomic::load and store. Granted store needs
nothing on TSO and on non-TSO the barrier may be implicit in whatever
platform intrinsic is used; but loads would also need explicit barriers
to prevent reordering ??

David
-----

1 similar comment
@mlbridge
Copy link

mlbridge bot commented Jun 10, 2021

Mailing list message from David Holmes on hotspot-jfr-dev:

On 10/06/2021 4:55 pm, Stefan Karlsson wrote:

On Thu, 10 Jun 2021 04:35:50 GMT, David Holmes <david.holmes at oracle.com> wrote:

I'm not sure what you are asking, but Atomic::load implements the "relaxed" semantics equivalent to using MO_RELAXED in the Access API.

I see this terminology is used in some of the GC code but I was not
aware of its existence.

This is not a GC-specific terminology, but a C++11 terminology:
https://en.cppreference.com/w/cpp/atomic/memory_order

I meant in the context of the hotspot sources and its use in method names.

I also see nothing in atomic.hpp that states
that Atomic::load/store implement memory_order_relaxed ??

In atomic.hpp there are references to relaxed at the top of the file:

enum atomic_memory_order {
// The modes that align with C++11 are intended to
// follow the same semantics.
memory_order_relaxed = 0,
memory_order_acquire = 2,
memory_order_release = 3,
memory_order_acq_rel = 4,
// Strong two-way memory barrier.
memory_order_conservative = 8
};

But you are right, nothing here helps the reader understand that Atomic::load implies relaxed. This needs to be better documented.

What am I missing - I can't see any memory barrier related instructions
associated with basic Atomic::load and store. Granted store needs
nothing on TSO and on non-TSO the barrier may be implicit in whatever
platform intrinsic is used; but loads would also need explicit barriers
to prevent reordering ??

David
-----

@stefank
Copy link
Member

stefank commented Jun 10, 2021

Mailing list message from David Holmes on hotspot-jfr-dev:

On 10/06/2021 4:55 pm, Stefan Karlsson wrote:

On Thu, 10 Jun 2021 04:35:50 GMT, David Holmes <david.holmes at oracle.com> wrote:

I'm not sure what you are asking, but Atomic::load implements the "relaxed" semantics equivalent to using MO_RELAXED in the Access API.

I see this terminology is used in some of the GC code but I was not
aware of its existence.

This is not a GC-specific terminology, but a C++11 terminology:
https://en.cppreference.com/w/cpp/atomic/memory_order

I meant in the context of the hotspot sources and its use in method names.

I guess the other parts of HotSpot needs to get on board with this terminology now that we have gone to C++11.

I also see nothing in atomic.hpp that states
that Atomic::load/store implement memory_order_relaxed ??

In atomic.hpp there are references to relaxed at the top of the file:
enum atomic_memory_order {
// The modes that align with C++11 are intended to
// follow the same semantics.
memory_order_relaxed = 0,
memory_order_acquire = 2,
memory_order_release = 3,
memory_order_acq_rel = 4,
// Strong two-way memory barrier.
memory_order_conservative = 8
};
But you are right, nothing here helps the reader understand that Atomic::load implies relaxed. This needs to be better documented.

What am I missing - I can't see any memory barrier related instructions
associated with basic Atomic::load and store. Granted store needs
nothing on TSO and on non-TSO the barrier may be implicit in whatever
platform intrinsic is used; but loads would also need explicit barriers
to prevent reordering ??

AFAIK, the code was written prior to C++11 and nothing else is needed on our currently supported platforms. Though I've heard stories about some platforms that probably would have to fix this. I've also heard @fisk argue that we should replace the Atomic::load implementation with the compiler's version of relaxed atomic loads.

David

@stefank
Copy link
Member

stefank commented Jun 10, 2021

Kim has been looking through the standards w.r.t. atomic and read-read coherence and has some insights. Could be worth reading:
https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-June/052745.html

Copy link

@kimbarrett kimbarrett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@kimbarrett
Copy link

Kim has been looking through the standards w.r.t. atomic and read-read coherence and has some insights. Could be worth reading:
https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-June/052745.html

read-read coherence only applies to reads of the same atomic object. That doesn't apply to the case at hand.

@stefank
Copy link
Member

stefank commented Jun 11, 2021

Kim has been looking through the standards w.r.t. atomic and read-read coherence and has some insights. Could be worth reading:
https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-June/052745.html

read-read coherence only applies to reads of the same atomic object. That doesn't apply to the case at hand.

Thanks for pointing this out. FWIW, the discussion above strayed away from the actual contents PR, and started to talk about the name and semantics of Atomic::load / relaxed atomic loads, in a more generic sense than the suggested usage of it in the patch above.

@mgronlun
Copy link
Author

/integrate

@openjdk
Copy link

openjdk bot commented Jun 14, 2021

Going to push as commit c420735.
Since your change was applied there have been 147 commits pushed to the master branch:

  • 51f3427: 8266791: Annotation property which is compiled as an array property but changed to a single element throws NullPointerException
  • ba601b8: 8268520: VirtualSpace::print_on() should be const
  • 1ba4e0c: 8257038: Remove expired flags in JDK 18
  • 90c1034: 8268644: ProblemList serviceability/sa/ClhsdbJstackXcompStress.java in -Xcomp mode
  • 5cee23a: 8265518: C1: Intrinsic support for Preconditions.checkIndex
  • a466b49: 8267634: Update --release 17 symbol information for JDK 17 build 26
  • 49112fa: 8265909: build.tools.dtdbuilder.DTDBuilder.java failed detecting missing path of dtd_home
  • 94d0b0f: 8268565: runtime/records/RedefineRecord.java should be run in driver mode
  • df65237: 8267930: Refine code for loading hsdis library
  • 2e900da: 8268574: ProblemList tests failing due to UseBiasedLocking going away
  • ... and 137 more: https://git.openjdk.java.net/jdk/compare/59a539fef12dec6ba8af8a41000829402e7e9b72...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Jun 14, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jun 14, 2021
@openjdk
Copy link

openjdk bot commented Jun 14, 2021

@mgronlun Pushed as commit c420735.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@openjdk
Copy link

openjdk bot commented Jun 14, 2021

@mgronlun Unknown command backport - for a list of valid commands use /help.

@openjdk
Copy link

openjdk bot commented Jun 14, 2021

@mgronlun Available commands:

  • covered - used when employer has signed the OCA
  • help - shows this text
  • signed - used after signing the OCA
  • test - used to run tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-jfr hotspot-jfr-dev@openjdk.org hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

5 participants