8308766: TLAB initialization may cause div by zero #14121
Looking at where the allocation fraction is sampled and where it is used:

```cpp
// ** sampling place ** //
size_t capacity = Universe::heap()->tlab_capacity(thread()) / HeapWordSize;
float alloc_frac = desired_size() * target_refills() / (float)capacity;
_allocation_fraction.sample(alloc_frac);

// ** where it's used ** //
// Compute the next tlab size using expected allocation amount
size_t alloc = (size_t)(_allocation_fraction.average() *
                        (Universe::heap()->tlab_capacity(thread()) / HeapWordSize));
```
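A minimal sketch of what the sampling line does when capacity is zero, assuming default IEEE 754 float behavior (no FP traps enabled). DecayingAverage is a hypothetical stand-in for HotSpot's weighted average, not the real class. Dividing by a zero capacity yields +inf (or NaN for 0/0), and once +inf has been sampled, every later average stays +inf. Converting such a value to size_t in the "where it's used" line would be undefined behavior, so the sketch stops before that step.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical exponentially decaying average; NOT HotSpot's implementation.
struct DecayingAverage {
  float weight;        // percentage weight of each new sample
  float avg = 0.0f;
  explicit DecayingAverage(float w) : weight(w) {}
  void sample(float v) { avg = ((100.0f - weight) * avg + weight * v) / 100.0f; }
  float average() const { return avg; }
};

// Mirrors the "sampling place" arithmetic: no guard against capacity == 0.
float allocation_fraction(std::size_t desired_words, unsigned target_refills,
                          std::size_t capacity_words) {
  return desired_words * target_refills / (float)capacity_words;
}
```

Usage: sampling allocation_fraction(512, 50, 0) makes the average +inf, and a later finite sample such as 0.25f does not bring it back.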
In G1, the capacity is used during the GC pause. The problem is what to do during TLAB initialization when a random thread attaches: this can happen at any time while the mutator is running, so eden may already be partially exhausted. Do you calculate it as if eden were empty (Serial and Parallel), use the total heap capacity (Shenandoah and Z, which are single-generational), or use the remaining eden capacity (G1)? Each choice has different effects. (FWIW, if there is an issue with that logic, it is pre-existing.)
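A toy model of the three approaches described in the reply, assuming a thread attaches after eden_used words of eden have been consumed (eden_used is at most eden_words). The enum and function are hypothetical illustrations, not HotSpot's tlab_capacity() implementations. Only the "remaining eden" policy can report zero, which is what feeds the division in the sampling code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical labels for the policies discussed; not HotSpot types.
enum class Policy { EdenAsIfEmpty, TotalHeap, RemainingEden };

// Capacity (in words) each policy would report to an attaching thread.
std::size_t tlab_capacity_model(Policy p, std::size_t heap_words,
                                std::size_t eden_words, std::size_t eden_used) {
  switch (p) {
    case Policy::EdenAsIfEmpty: return eden_words;              // Serial, Parallel
    case Policy::TotalHeap:     return heap_words;              // Shenandoah, Z
    case Policy::RemainingEden: return eden_words - eden_used;  // G1, per the discussion
  }
  return 0;
}
```

With eden fully consumed, the first two policies still report a positive capacity, while the third reports zero.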
OK, so this does happen when a new thread arrives at an unfortunate time in the VM lifecycle, like on shutdown? Anyway, the fix looks okay. I think many other versions are also affected; can you please add the relevant Affected-Versions to the bug?
@tschatzl This change now passes all automated pre-integration checks.
In this case, yes, a thread is attached on shutdown, and you can get weird failures in other FP code (if you also enable FP exceptions; but apparently even without them it can leave something in a weird state). I think (well, I hope) it is also the cause of another similar bug (*) that caused extremely intermittent crashes in G1 and was closed as CNR (cannot reproduce) after it stopped appearing. The assert that tripped is one I added while trying to reproduce JDK-8264798; I initially failed to reproduce it, and then accidentally left the assert in when testing another change. I think it is worth cleaning up just in case.
(*) That may just be wishful thinking...
Added affected versions back to JDK 8, since that code and the …
Thanks to Thomas' explanation, I now understand why it tracks the ratio instead of the actual allocation amount. (Eden) capacity affects the distance between two GC pauses (in STW GCs), and the allocation amount is roughly proportional to that distance. Therefore, the ratio more or less reflects the allocation rate, which can be used to predict the allocation amount until the next GC pause.
However, maintaining a constant number of refills between GC pauses seems an odd objective; that is a pre-existing issue, though.
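A worked sketch of the reasoning above, assuming a thread's allocation between two pauses scales with eden capacity. Both functions are hypothetical simplifications, not HotSpot code: the thread's share of capacity is sampled as a fraction, and the next TLAB size is that fraction times the current capacity, spread over target_refills refills. If eden doubles, the same fraction yields a TLAB twice as large, so the thread still refills about target_refills times before the next pause.

```cpp
#include <cassert>
#include <cstddef>

// Fraction of TLAB capacity a thread allocated between two GC pauses.
float sample_fraction(std::size_t words_allocated, std::size_t capacity_words) {
  return (float)words_allocated / (float)capacity_words;
}

// Expected allocation until the next pause, divided over target_refills refills.
std::size_t next_desired_size(float fraction, std::size_t capacity_words,
                              unsigned target_refills) {
  std::size_t alloc = (std::size_t)(fraction * capacity_words);
  return alloc / target_refills;
}
```

For example, a thread that allocated 2500 of 10000 words has a fraction of 0.25; with 50 target refills, its next TLAB is 50 words at a capacity of 10000 and 100 words at a capacity of 20000.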
Thanks @albertnetymk @shipilev for your reviews.
/integrate
Going to push as commit 96ed139.
Your commit was automatically rebased without conflicts.
Hi all,
can I have reviews for this change that fixes an FP div by zero?
In ThreadLocalAllocBuffer::initialize() we initialize the TLAB using the currently available TLAB capacity for the thread. In G1, this can be zero in some situations, leading to that div by zero (see the CR for the crash observed when adding an assert). The suggested fix is to simply not sample at this point; TLAB resizing will fix up the TLAB size later.
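A minimal sketch of the fix as described ("just not sample at this point"), not the committed HotSpot change. FractionAverage and sample_at_initialize are hypothetical names, and the 0.65/0.35 weights are arbitrary. When the capacity is zero, no sample is recorded and the average is left for TLAB resizing to populate.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical weighted average; the first sample is taken as-is.
struct FractionAverage {
  float avg = 0.0f;
  unsigned samples = 0;
  void sample(float v) {
    avg = (samples == 0) ? v : 0.65f * avg + 0.35f * v;
    ++samples;
  }
};

// Returns true if an initialization-time sample was recorded.
bool sample_at_initialize(FractionAverage& frac, std::size_t desired_words,
                          unsigned target_refills, std::size_t capacity_words) {
  if (capacity_words == 0) {
    return false;  // No meaningful fraction; leave it to TLAB resizing.
  }
  frac.sample(desired_words * target_refills / (float)capacity_words);
  return true;
}
```

Skipping the sample, rather than substituting a guessed capacity, avoids picking an arbitrary value; the description argues the later resize, when eden is at its maximum, corrects the size.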
Only G1 seems to be affected, as it appears to be the only GC that uses a dynamic value for the capacity available for TLAB allocation. Other GCs seem to use the total heap capacity (Z, Shenandoah) or the eden capacity (Serial, Parallel).
I am not sure whether that approach is actually better, and I think it would not result in the expected behavior either (every thread should refill its TLAB target_refills() times per mutator phase, i.e. between GCs). Since, even with G1, eden is at its maximum at TLAB resizing time, this hiccup at initialization does not seem too bad. This may also be the cause of the behavior observed in https://bugs.openjdk.org/browse/JDK-8264798.
Testing: gha
Thanks,
Thomas
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14121/head:pull/14121
$ git checkout pull/14121
Update a local copy of the PR:
$ git checkout pull/14121
$ git pull https://git.openjdk.org/jdk.git pull/14121/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 14121
View PR using the GUI difftool:
$ git pr show -t 14121
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14121.diff