8267579: Thread::cooked_allocated_bytes() hits assert(left >= right) failed: avoid underflow #4363
👋 Welcome back mgronlun! A progress list of the required criteria for merging this PR into |
/label add hotspot-jfr |
Thumbs up.
@mgronlun This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 140 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
Hi Markus,
So this fix simply unpacks used_bytes() to avoid calling pointer_delta with values that trigger the assertion. But given that we see other problems with that assertion, i.e. JDK-8268265, could it not be that the assertion is in fact the problem? Is the assertion not making a usage assumption that is simply not valid?
Thanks,
David
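To make the "unpacking" described above concrete, here is a minimal standalone sketch, not the actual HotSpot code. `byte_delta` is a hypothetical stand-in for `pointer_delta()` with its new assertion, and `used_bytes_or_zero` is a hypothetical caller that compares the two addresses first, so the asserting helper is only reached with a consistent pair.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for pointer_delta(): distance in bytes between two
// addresses, asserting that the subtraction cannot underflow.
inline std::size_t byte_delta(const void* left, const void* right) {
  const auto l = reinterpret_cast<std::uintptr_t>(left);
  const auto r = reinterpret_cast<std::uintptr_t>(right);
  assert(l >= r && "avoid underflow");
  return static_cast<std::size_t>(l - r);
}

// Hypothetical caller: validate the pair before computing the delta, and
// report 0 for a missing or inconsistent pair instead of tripping the assert.
inline std::size_t used_bytes_or_zero(const void* start, const void* top) {
  if (start == nullptr || top == nullptr) {
    return 0;
  }
  if (reinterpret_cast<std::uintptr_t>(top) <
      reinterpret_cast<std::uintptr_t>(start)) {
    return 0;
  }
  return byte_delta(top, start);
}
```

This only sidesteps the assertion. Whether the two addresses were read consistently in the first place is a separate question.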
This is a race condition and will potentially fail in the same way the bug in #4224 did. There need to be at least Atomic::loads of the variables, or the compiler could convert the loads into multiple loads.
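As an illustration of the read-once concern, the following sketch uses standard C++11 `std::atomic` with `std::memory_order_relaxed` as an analogue of HotSpot's `Atomic::load`. The `Buffer` type and `used_bytes_snapshot` function are hypothetical names, not part of the patch. Each field is loaded exactly once into a local, and all later checks and arithmetic use only the locals, so the compiler cannot re-read the shared fields between the comparison and the subtraction.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical buffer descriptor. The owning thread updates both fields;
// other threads may read them at any time.
struct Buffer {
  std::atomic<std::uintptr_t> start{0};
  std::atomic<std::uintptr_t> top{0};
};

// Reader side: one relaxed load per field, then work on the locals only.
// The two loads are not atomic as a pair, so the snapshot is still validated.
inline std::size_t used_bytes_snapshot(const Buffer& b) {
  const std::uintptr_t top = b.top.load(std::memory_order_relaxed);
  const std::uintptr_t start = b.start.load(std::memory_order_relaxed);
  if (start == 0 || top < start) {
    return 0;  // unset or inconsistent snapshot
  }
  return static_cast<std::size_t>(top - start);
}
```

Relaxed loads give no ordering between the two fields. They only guarantee that each value used in the check is the same value used in the subtraction.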
@stefank has explained in the bug report why the assertion is not the issue - thanks. So I have to question the validity of just side-stepping the assertion without trying to fix the broken code. What are the implications of finding these broken invariants in the product code? Do we just report/print an incorrect value? |
@dholmes-ora The existing mechanism is problematic in that it is racy: it reads the pointers while they are being updated by another thread. It attempts to guard against reading problematic values by comparing the used_bytes against the max_tlab_size, to avoid reporting wildly wrong values. A safer approach would be to pick up and report only the bytes the owner of the tlab has already reported (i.e. _allocated_bytes), but this means the value reported will then trail by one tlab, and the impact of doing this is unknown right now. It could also involve introducing some protocol for how the threads publish their values from their TLABs. All of this can be done of course, but I am not sure this can be addressed straight away. Since testing suffers because the existing behaviour, which has been in place for years, is now starting to hit the newly introduced assert, this is a workaround, not a fix. It's not clear what the real fix is, so maybe we can create another issue to attempt to figure that out.
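The bounding step described above can be sketched in isolation. All names here are hypothetical: `published` plays the role of `_allocated_bytes`, `in_flight` is the racy estimate taken from the current buffer, and `max_buffer` is an upper bound corresponding to the maximum TLAB size in bytes. This illustrates the idea only and is not the HotSpot implementation.

```cpp
#include <cstddef>

// Hypothetical helper: add the racy in-flight estimate only when it is
// plausible. An estimate larger than any buffer can be must come from an
// inconsistent read, so it is dropped and only the published total is used.
inline std::size_t cooked_bytes(std::size_t published,
                                std::size_t in_flight,
                                std::size_t max_buffer) {
  if (in_flight > max_buffer) {
    return published;
  }
  return published + in_flight;
}
```

Always returning `published` corresponds to the safer alternative mentioned in the comment, at the cost of the reported value trailing by up to one buffer.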
Mailing list message from David Holmes on hotspot-runtime-dev: On 7/06/2021 8:16 pm, Markus Grönlund wrote:
Okay, please file an issue for the real issue. Thanks, |
Thumbs up.
Should the _start and _top fields be marked volatile to prevent the S390 compiler from "optimizing" like it did with another recent bug (JDK-8267842)?
@dcubed-ojdk Thanks Dan for looking at this. Hmm... having to introduce an explicit "volatile" keyword implies the "read-once" promise of Atomic::load() is not sufficient? I was told the side effects of going via Atomic::load(), which makes expressions volatile-qualified, are sufficient for read-once. But maybe not for all platforms? If "volatile" decorations are required, I would prefer, provided it is sufficient, to decorate only the local variables. Having to decorate the tlab pointers could introduce unnecessary ordering constraints, e.g. for the TLAB owner, which could perhaps hurt performance?
No problem. I believe that @dholmes-ora thinks the bug that we |
@stefank @dholmes-ora Are you ok with the suggested fix? The decision to have the impl in the .cpp is to avoid having to include runtime/atomic.hpp in the .inline.hpp file unnecessarily (not deemed performance sensitive). Let me know if you prefer the impl in the .inline.hpp instead. |
@@ -473,3 +474,11 @@ size_t ThreadLocalAllocBuffer::end_reserve() {
  size_t reserve_size = Universe::heap()->tlab_alloc_reserve();
  return MAX2(reserve_size, (size_t)_reserve_for_allocation_prefetch);
}

const HeapWord* ThreadLocalAllocBuffer::start_relaxed() const {
Why do we need the "relaxed" terminology? By default accessors have no memory ordering properties.
I'm not sure what you are asking, but Atomic::load implements the "relaxed" semantics equivalent to using MO_RELAXED in the Access API.
Mailing list message from David Holmes on hotspot-jfr-dev: On 9/06/2021 11:54 pm, Stefan Karlsson wrote:
I see this terminology is used in some of the GC code but I was not Thanks, |
This is not a GC-specific terminology, but a C++11 terminology:
In atomic.hpp there are references to relaxed at the top of the file:
But you are right, nothing here helps the reader understand that Atomic::load implies relaxed. This needs to be better documented.
I talked to @fisk about this yesterday, and IIUC relaxed accesses are not supposed to be reordered by the HW, but volatile accesses could be.
Mailing list message from David Holmes on hotspot-jfr-dev: On 10/06/2021 4:55 pm, Stefan Karlsson wrote:
I meant in the context of the hotspot sources and its use in method names.
What am I missing - I can't see any memory barrier related instructions David |
I guess the other parts of HotSpot need to get on board with this terminology now that we have gone to C++11.
AFAIK, the code was written prior to C++11 and nothing else is needed on our currently supported platforms. Though I've heard stories about some platforms that probably would have to fix this. I've also heard @fisk argue that we should replace the Atomic::load implementation with the compiler's version of relaxed atomic loads.
Kim has been looking through the standards w.r.t. atomic and read-read coherence and has some insights. Could be worth reading: |
Looks good.
read-read coherence only applies to reads of the same atomic object. That doesn't apply to the case at hand. |
Thanks for pointing this out. FWIW, the discussion above strayed away from the actual contents of the PR, and started to talk about the name and semantics of Atomic::load / relaxed atomic loads in a more generic sense than the suggested usage of it in the patch above.
/integrate |
Going to push as commit c420735.
Your commit was automatically rebased without conflicts. |
This is a workaround to avoid hitting the assert that was recently added to pointer_delta().
The implementation of cooked_allocated_bytes() is perhaps questionable, in that it reads tlab pointers optimistically. However, the functionality has been in place for a long time, and the impact of changing its behavior more substantially is unknown at this time, hence this defensive workaround to reduce noise and problems seen in general testing.
Testing: tier1, tier2, tier6, tier8
Thanks
Markus
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/4363/head:pull/4363
$ git checkout pull/4363
Update a local copy of the PR:
$ git checkout pull/4363
$ git pull https://git.openjdk.java.net/jdk pull/4363/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 4363
View PR using the GUI difftool:
$ git pr show -t 4363
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/4363.diff