JDK-8270308: Arena::Amalloc may return misaligned address on 32-bit #4835
Conversation
I think this looks good and minimal. I'll comment in the other PR. Maybe I pushed back too hard, but these changes are somewhat difficult to review. It was an impressive amount of work, and maybe we are wasting a lot of space in ResourceAreas, since many are used to print Symbol->as_C_string(). But even so, they're mostly short-lived.
Thank you for doing this change.
src/hotspot/share/memory/arena.hpp
Outdated
#ifndef LP64 // Since this is a hot path, do this only if really needed.
  _hwm = ARENA_ALIGN(_hwm);
  _hwm = MIN2(_hwm, _max); // _max is not guaranteed to be 64 bit aligned.
#endif // !LP64
This is the change I was thinking of with the conditional for LP64 as well so that 64 bit doesn't pay for the alignment test.
Oh yes, it makes sense that this would not have worked, since UseMallocOnly assumes that the arena is filled with pointers to the malloc memory.
I think it's better this way. This fix is easy to backport, and maybe we do a larger cleanup in later patches. Sorry for running forward with this one - Arenas have been a cause of a lot of frustration in our group, especially before OpenJDK (so we never could give any changes back). Realistically, are you even interested in cleanups in this area, given that review time is limited and there are many areas needing attention?
We crash now on 32-bit with +UseMallocOnly. I'm investigating. This code is fragile :(
Fixed a bug on 32-bit with +UseMallocOnly. The alignment must never happen in +UseMallocOnly mode since there, the arena is used to keep a packed pointer array and gaps are not helpful. Also added a cautionary comment about mixing Amalloc.. calls for arenas with tight packing. Should we ever clean this up, there are some possible improvements:
I don't know if I can speak for everybody, but I think cleanups in this area would be welcome if they're incremental and we can plan for time to review them. This hasn't been an area of pain for us (that I know of), but if it has been for SAP, maybe cleanups like allowing packing in arenas have a higher priority.
I'll take a look (with medium priority) whether there are any memory savings possible or whether this is "just" cleanup. If it's the former, more effort may be worth it.
Thanks for the review, Thomas
src/hotspot/share/memory/arena.hpp
Outdated
// like HandleAreas. Those areas should never mix Amalloc.. calls with differing alignment.
#ifndef LP64 // Since this is a hot path, and on 64-bit Amalloc and AmallocWords are identical, restrict this alignment to 32-bit.
  _hwm = ARENA_ALIGN(_hwm);
  _hwm = MIN2(_hwm, _max); // _max is not guaranteed to be 64 bit aligned.
This may or may not work, depending on what the behavior for x == 0 is supposed to be. The current behavior would just return _hwm without advancing it. But if _max is not 64bit aligned and _hwm == _max, then these two steps will end up with _hwm == _max again, and a not-64bit-aligned result will ensue. I think better is, for 32bit platforms, to allocate an extra word so _max can be guaranteed to be 64bit aligned.
Or guarantee the chunk size is 64bit aligned and std::max_align_t is also (at least) 64bit aligned (which it almost certainly is), so that the result of malloc will always be (at least) 64bit aligned.
I strongly agree that _max and the chunk boundaries should be aligned to 64bit alignment. But as I wrote, that would cause more widespread changes - it would just duplicate the work I did in my first proposal #4784, which already exists and does all you requested. Maybe take a look at that? If you like it, I propose we put this minimal patch through and later use my first PR as a base for a better cleanup.
The current behavior for Amalloc(0) is non-malloc-standard: it returns a non-unique-non-NULL address (since the next allocation returns the same address). When I worked on the first proposal I tried to fix this too. First tried to return NULL, which does not work, callers assume that's an OOM. Then by returning a unique pointer, but I did not like the memory waste. The true fix would be to disallow size==0 and fix callers not to allocate 0 bytes. But I believe that memory allocated with size==0 is not used, since it would be overwritten by the next allocation.
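To make the Amalloc(0) behavior concrete, here is a toy model (hypothetical names, not HotSpot code) of a pointer-bump arena showing why a zero-sized request yields a non-unique, non-NULL address:

```cpp
#include <cassert>
#include <cstdint>

// Toy pointer-bump arena; real HotSpot code lives in
// src/hotspot/share/memory/arena.{hpp,cpp}.
struct BumpArena {
  uintptr_t hwm; // high-water mark
  uintptr_t amalloc(uintptr_t bytes) {
    uintptr_t p = hwm;
    hwm += bytes; // a zero-sized request does not advance the mark
    return p;
  }
};

// A size-0 allocation returns the current mark; the next allocation
// returns the very same address.
inline bool amalloc0_is_not_unique() {
  BumpArena ar{0x1000};
  uintptr_t p0 = ar.amalloc(0); // returns 0x1000, hwm unchanged
  uintptr_t p1 = ar.amalloc(8); // also returns 0x1000
  return p0 == p1;
}
```

This is why returning the raw mark for size 0 is non-malloc-standard: callers holding p0 and p1 would alias the same memory.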
I'll modify the patch and do a size==0->size=1 for 32bit only. Let's hope that does not blow up arena sizes too much.
About std::max_align_t, can we use that? I hard-coded malloc align in a number of places.
Can you just assert that size > 0 for Amalloc?
Unfortunately not, since that is actually done. I would have to hunt down callers which do this. That is maybe worthwhile, but would be a larger change.
The latest version doesn't seem any better than the previous. It could still return a misaligned value in the same circumstance, just more slowly because of the additional branch.
Under what circumstances can _hwm not be 64bit aligned? What needs to be done to prevent it? It looks to me that if Arena::grow were to align its size argument, then it all falls out, assuming std::max_align_t is also appropriately aligned. Add asserts in Chunk allocation. Add a static assert that std::max_align_t is appropriately aligned and leave it to the person porting to the very weird platform to figure out what to do if it fails. And I think that's sufficient.
(While use of alignof and std::max_align_t are not currently discussed in the HotSpot Style Guide, I would have no objection to their use when the alternative is something more complicated and less robust. And feel free to propose corresponding post facto changes to style guide.)
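A minimal sketch of the compile-time check suggested here (the constant name is hypothetical, not HotSpot's): assert once that malloc's alignment guarantee, alignof(std::max_align_t), covers the 64-bit arena alignment, and leave a failing build to whoever ports to a platform where that does not hold.

```cpp
#include <cstddef>

// Hypothetical constant standing in for the arena's required alignment.
constexpr size_t kArenaAlignment = 8; // 64 bits

// If malloc's guaranteed alignment covers the arena alignment, then
// 64-bit-aligned chunk lengths keep the high-water mark aligned without
// any runtime fix-up in the hot allocation path.
static_assert(alignof(std::max_align_t) >= kArenaAlignment,
              "arena alignment exceeds malloc alignment on this platform");
```

On mainstream 32-bit and 64-bit glibc targets alignof(std::max_align_t) is 8 or 16, so the assertion holds; the point is to make the porting assumption explicit rather than implicit.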
@coleenp, @kimbarrett is this current version okay for you?
I think this still isn't right.
How? I must be blind.
What am I overlooking here?
_hwm is misaligned by the caller mixing Amalloc and AmallocWords on 32-bit. AmallocWords advances _hwm by word size. Then, it is not 64-bit aligned anymore. A subsequent Amalloc() call needs to return a 64-bit aligned pointer, so it needs to align _hwm up. A subsequent AmallocWords() call, however, would be fine with _hwm as it is, so the alignment should really be done at Amalloc(). We cannot just align the allocation size to always be 64-bit, since on an Arena that uses repeated calls to AmallocWords() (e.g., HandleArea) this would create allocation padding and waste memory or even crash if the arena cannot handle padding.
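A toy model of that rule (hypothetical and heavily simplified; the real code is Arena::Amalloc/AmallocWords in arena.hpp): Amalloc re-aligns the high-water mark up to 64 bits before allocating, clamped to _max, while AmallocWords leaves it word-aligned.

```cpp
#include <algorithm>
#include <cstdint>

// Simplified model of a 32-bit arena with the discussed fix applied.
struct MiniArena {
  uintptr_t hwm, max; // high-water mark and chunk end
  MiniArena(uintptr_t lo, uintptr_t hi) : hwm(lo), max(hi) {}

  static uintptr_t align_up(uintptr_t p, uintptr_t a) {
    return (p + a - 1) & ~(a - 1);
  }
  // Word-aligned allocation: 4-byte steps on a 32-bit platform.
  uintptr_t amalloc_words(uintptr_t bytes) {
    uintptr_t p = hwm; hwm += bytes; return p;
  }
  // 64-bit-aligned allocation: re-align first, clamped to max,
  // mirroring `_hwm = MIN2(ARENA_ALIGN(_hwm), _max)`.
  uintptr_t amalloc(uintptr_t bytes) {
    hwm = std::min(align_up(hwm, 8), max);
    uintptr_t p = hwm; hwm += bytes; return p;
  }
};
```

After an odd number of 4-byte word allocations, amalloc() still hands out 8-byte-aligned addresses, at the cost of an occasional 4-byte gap.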
No, since this only ensures the chunk boundaries are properly aligned. The first call to AmallocWords - on a 32-bit platform - will misalign _hwm for potential follow-up calls to Amalloc(). In short, mixing allocation calls with different alignment requirements means the calls with higher alignment guarantees must align. Alternatively, you must ensure that for a given Arena, only one alignment is used - I proposed that alternative in my first patch: make alignment a property of the Arena and only allow allocations with that alignment.
Sorry about the typo. I meant to ask, how can
Oh, okay. I see what you mean. Arena::_max is not 64-bit aligned currently: Arena::_max depends on the chunk length (the payload length). The chunk length is either handed in as the initsize parameter, or it is one of the default chunk lengths - I think both occurrences may be aligned to 64-bit, and then we could remove the overflow check. I have to check. It would certainly be cleaner.
In this new version, the chunk payload area - and therefore, Arena::_max - is aligned to 64-bit. In order to do this we need to make sure the chunk length given when creating a chunk is 64-bit aligned. There are three places to fix:
Tested this new version on x64 and x86 Ubuntu, works. I tested it also with my new gtests (see #4898), no problems found.
This looks much better. But I think you missed a place.
If AmallocWords is called with a non-64bit-aligned value (it could be only 32bit aligned on a 32bit platform), and it calls grow(), grow will call the chunk allocator with that length, which fails the precondition because it's not BytesPerLong aligned. I think grow() needs to call ARENA_ALIGN on the length on 32bit platforms.
That also suggests an additional test.
I like the new comment for Chunk::operator new.
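The grow() fix described here can be sketched as follows (a simplified model with hypothetical names, not the actual Arena::grow): align the requested length up to the 64-bit arena alignment before sizing the new chunk, so chunk payload lengths stay 64-bit aligned even when AmallocWords passes in an odd number of 32-bit words.

```cpp
#include <cstdint>

// Hypothetical stand-in for ARENA_ALIGNMENT on a 32-bit platform.
constexpr uintptr_t kArenaAlignment = 8;

// Round x up to the next multiple of a (a must be a power of two).
inline uintptr_t align_up(uintptr_t x, uintptr_t a) {
  return (x + a - 1) & ~(a - 1);
}

// Model of the length adjustment grow() should perform before calling
// the chunk allocator, which asserts BytesPerLong alignment.
inline uintptr_t grow_request(uintptr_t len) {
  return align_up(len, kArenaAlignment); // e.g. 4 -> 8, 12 -> 16
}
```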
You are right. I even spelled it out in my last comment, but then did not fix it. I added a test, saw that it fires on 32-bit, then fixed Arena::grow(). I also limited all new tests to 32-bit to save some cycles on 64-bit. One remaining worry: when mixing Amalloc and AmallocWords we now align correctly, which is fine. But if the Arena cannot handle gaps - e.g. HandleArea - difficult-to-analyze crashes can happen on 32-bit. The original cause would be that the author mixes Amalloc and AmallocWords for an Arena where he should stick to one alignment only. My first patch filled alignment gaps in debug builds with a pattern, to tip off the analyzer. What do you think, should I do this here too?
Looks good.
Thanks, @kimbarrett! @coleenp, are you ok with the latest version too?
These are small comments but maybe important.
@@ -52,11 +52,12 @@ class Chunk: CHeapObj<mtChunk> {
  enum {
    // default sizes; make them slightly smaller than 2**k to guard against
    // buddy-system style malloc implementations
    // Note: please keep these constants 64-bit aligned.
Can you add a static_assert(is_aligned(slack, ARENA_ALIGNMENT))?
Sure!
// | Chunk     |a |                Payload                     |
// |           |p |                                            |
// +-----------+--+--------------------------------------------+
// A           B  C                                            D
Hooray for ascii art!
@@ -0,0 +1,66 @@
/*
 * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved.
Did you want an SAP copyright?
I could add one. Probably should, thanks for noticing.
#ifndef LP64
// The tests below are about alignment issues when mixing Amalloc and AmallocWords.
// Since on 64-bit these APIs offer the same alignment, they only matter for 32-bit.
Thank you so much for testing this on 32 bit platforms.
Sure. I run Ubuntu 20.04; with Debian multi-arch it's beautifully easy to have 32-bit and 64-bit compilers side by side.
void* p1 = ar.AmallocWords(BytesPerWord);
void* p2 = ar.Amalloc(BytesPerLong);
ASSERT_TRUE(is_aligned(p1, BytesPerWord));
ASSERT_TRUE(is_aligned(p2, BytesPerLong));
Should BytesPerLong in this test be ARENA_AMALLOC_ALIGNMENT?
Yes, I can do that.
src/hotspot/share/memory/arena.cpp
Outdated
@@ -188,6 +205,7 @@ void* Chunk::operator new (size_t requested_size, AllocFailType alloc_failmode,
  if (p == NULL && alloc_failmode == AllocFailStrategy::EXIT_OOM) {
    vm_exit_out_of_memory(bytes, OOM_MALLOC_ERROR, "Chunk::new");
  }
  assert(is_aligned(p, BytesPerLong), "Chunk start address not malloc aligned?");
Should this be ARENA_AMALLOC_ALIGNMENT, in case we have to change it back to 128 bits for some reason?
Yes, I can do that too. I should try to set it to 128bits and test it.
With 128 bit we may run into alignment problems with 32-bit platforms, since we rely on ARENA_AMALLOC_ALIGNMENT being <= malloc alignment, and on 32-bit I believe malloc alignment is just 64 bit. Not a showstopper, but may require a bit more care.
I was wondering if it would make an interesting test mode.
Yes, you are right, maybe that's sufficient. I thought that the zap pattern would be overwritten with the first allocation wave (allocate a bunch, reset-to-mark, allocate again) but I see reset-to-mark - ultimately Chunk::chop() - also zaps, so this may be fine.
Yes. But the default pattern may be fine already.
Sure!
New version:
Looks great. Thank you!
Looks even better after dealing with Coleen's recent comments.
Thanks Coleen & Kim!
/integrate
@tstuefe This PR has not yet been marked as ready for integration.
/integrate |
@tstuefe This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 114 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the
Going to push as commit 45d277f.
Your commit was automatically rebased without conflicts.
Hi,
may I please have reviews for this fix. This fixes an issue with arena allocation alignment which can only happen on 32-bit.
The underlying problem is that even though Arenas offer ways to allocate with different alignment (Amalloc and AmallocWords), allocation alignment is not guaranteed. This sequence will not work on 32-bit:
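(The original snippet was not captured here; the following is an illustrative toy model of the failure mode, with hypothetical names and addresses, not HotSpot code.)

```cpp
#include <cstdint>

// Toy model of the 32-bit arena pointer-bump allocator. On 32-bit,
// AmallocWords advances the high-water mark in 4-byte steps, after which
// a naive Amalloc hands out a block that is not 64-bit aligned.
struct ToyArena {
  uintptr_t hwm; // high-water mark
  explicit ToyArena(uintptr_t start) : hwm(start) {}
  uintptr_t bump(uintptr_t bytes) { // bump allocation, no re-alignment
    uintptr_t p = hwm; hwm += bytes; return p;
  }
};

inline bool demo_misalignment() {
  ToyArena ar(0x1000);             // chunk payload starts 8-byte aligned
  ar.bump(4);                      // AmallocWords(BytesPerWord) on 32-bit
  uintptr_t p = ar.bump(8);        // naive Amalloc: expects 8-byte alignment
  return (p % 8) != 0;             // 0x1004 is not 64-bit aligned
}
```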
This patch is the bare minimum needed to fix the specific problem; I proposed a larger patch before which redid arena alignment handling but it was found too complex: #4784 .
This fix is limited to Amalloc() and aligns _hwm to be 64bit aligned before allocation. But since chunk boundaries are not guaranteed to be 64-bit aligned either, additional care must be taken to not overflow _max. Since this adds instructions into a hot allocation path, I restricted this code to 32 bit - it is only needed there.

Remaining issues: the Amalloc... variants align the allocation size in an attempt to ensure allocation alignment. This is not needed nor sufficient; we could just remove that code. I left it untouched to keep the patch minimal. I also left the "// ... for 32 bits this should align _hwm as well." comment in Amalloc(), though I think it is wrong.

Because of (2) and (3), I had to add the overflow check into Amalloc() - any other way to solve this would result in more widespread changes.
- any other way to solve this would result in more widespread changes.Tests:
Progress
Issue
Reviewers
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/4835/head:pull/4835
$ git checkout pull/4835
Update a local copy of the PR:
$ git checkout pull/4835
$ git pull https://git.openjdk.java.net/jdk pull/4835/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 4835
View PR using the GUI difftool:
$ git pr show -t 4835
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/4835.diff