Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage #18979

Closed
wants to merge 65 commits into from

Conversation

jdksjolen
Copy link
Contributor

@jdksjolen jdksjolen commented Apr 26, 2024

Hi,

This PR introduces a new allocator, IndexedFreelistAllocator. It uses a GrowableArray in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names.

We then use this allocator in order to store the NativeCallStackStorage hash table. This saves 4 * 4099 = ~16KiB of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage.

The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help.

It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR.

The results are as follows on linux-x64-slowdebug:

Generate stacks... Done
Time taken with GrowableArray: 8341.240945
Time taken with CHeap: 12189.031318
Time taken with Arena: 8800.703092
Time taken with GrowableArray again: 8295.508829

And on linux-x64:

Time taken with GrowableArray: 8378.018135
Time taken with CHeap: 12437.347868
Time taken with Arena: 8758.064717
Time taken with GrowableArray again: 8391.076291

Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979
$ git checkout pull/18979

Update a local copy of the PR:
$ git checkout pull/18979
$ git pull https://git.openjdk.org/jdk.git pull/18979/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 18979

View PR using the GUI difftool:
$ git pr show -t 18979

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18979.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Apr 26, 2024

👋 Welcome back jsjolen! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Apr 26, 2024

@jdksjolen This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage

Reviewed-by: stuefe, azafari

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 240 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk
Copy link

openjdk bot commented Apr 26, 2024

@jdksjolen The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Apr 26, 2024
@jdksjolen jdksjolen force-pushed the indexed-freelist-allocator branch from 558f3d6 to f275962 Compare June 5, 2024 14:54
@jdksjolen jdksjolen force-pushed the indexed-freelist-allocator branch from f275962 to 0a69068 Compare June 5, 2024 14:57
@jdksjolen jdksjolen changed the title WIP: Introduce a GrowableArray-backed free list allocator to Hotspot 8333658 Jun 5, 2024
@openjdk openjdk bot changed the title 8333658 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage Jun 5, 2024
@jdksjolen jdksjolen marked this pull request as ready for review June 5, 2024 16:16
@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 5, 2024
@jdksjolen
Copy link
Contributor Author

An obvious improvement: Have the returned pointers remember which allocator object it was allocated from when in debug mode to avoid free:ing an element using the wrong allocator. You can also support detecting double frees by assigning each allocation a unique id.

Copy link
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Started looking at this. I like this.

The smaller beauty spot of your implementation is that by relying on GrowableHeapArray, we are bound to the size of its index type, which is int. So we will never be able to hold more than 2G-1 entries. I think thats okay for now though, and if not, its easy to change in GrowableArray.

A bigger beauty spot is that it needs explicit initialization for every member (IIUC). That feels just wasteful, especially in combination with the generous hard-coded power-of-2 growth of GrowableArray.

} else {
// Follow the link to the next free element
free_start = be.link;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit uncommon and contrary to the normal linked list implementation. If possible, I would opt for a more traditional approach that is easier to understand and does not rely on at_grow (e.g. in case we want adapt this to some other form of backing storage).

e.g.

if (_freelist == -1) {
  allocate new slot   
} else {
  reuse slot at freelist
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that at_grow also could have the problem that an error, resulting in a high index, would cause a large allocation to happen.

I think this class could use an optional max value.

Copy link
Contributor Author

@jdksjolen jdksjolen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Thomas! Sorry for the messy code, I felt that it was in good enough shape to show people and see what they think. I'll clean it up and ping you when it's ready. I answered some of your comments also.

/ Johan

@tstuefe
Copy link
Member

tstuefe commented Jun 6, 2024

@jdksjolen Github is being weird again and shows all your comments twice. Bad Github.

@stefank
Copy link
Member

stefank commented Jun 24, 2024

Could we try another round of coming up with a better name for this utility? HomogenousObjectArray is eerily similar to G1's humongous object arrays. It's also not clear to me what makes this array an homogenous array. Is our other arrays non-homogenous?

Would it make sense to put this in utilities/ instead of nmt/.

The include guards should be placed before the includes.

@jdksjolen
Copy link
Contributor Author

The include guards should be placed before the includes.

Thanks!

Would it make sense to put this in utilities/ instead of nmt/.

Do we have anyone else that wants to use this? The threshold in terms of API quality is typically higher for something in utilities/ than a local utility. I'm working on some improvements to this allocator already, maybe we can take a move to utilities/ together with those improvements?

Could we try another round of coming up with a better name for this utility? HomogenousObjectArray is eerily similar to G1's humongous object arrays. It's also not clear to me what makes this array an homogenous array. Is our other arrays non-homogenous?

I agree on the name part. We could call it a HomogenousAllocator, that does differentiate it meaningfully from something like Arena which can allocate anything. @tstuefe , arenas are also an example of the storage/lifetime of the objects from an allocator being bound to the lifetime of that allocator, so I think this makes sense.

@tstuefe
Copy link
Member

tstuefe commented Jun 24, 2024

The include guards should be placed before the includes.

Thanks!

Would it make sense to put this in utilities/ instead of nmt/.

Do we have anyone else that wants to use this? The threshold in terms of API quality is typically higher for something in utilities/ than a local utility. I'm working on some improvements to this allocator already, maybe we can take a move to utilities/ together with those improvements?

Yes, I will use it to replace some code in Metaspace at least, but for that I need some more features (placing this thing atop of an existing memory range, and templatized index type).

Can all be done in a follow-up RFE.

Could we try another round of coming up with a better name for this utility? HomogenousObjectArray is eerily similar to G1's humongous object arrays. It's also not clear to me what makes this array an homogenous array. Is our other arrays non-homogenous?

I agree on the name part. We could call it a HomogenousAllocator, that does differentiate it meaningfully from something like Arena which can allocate anything. @tstuefe , arenas are also an example of the storage/lifetime of the objects from an allocator being bound to the lifetime of that allocator, so I think this makes sense.

Names are hard.

We already have two types of arenas in hotspots, plus glibc has arenas, so "arena" is not ideal either. Arena does usually not imply a free list either, nor homogenous sizes.

Thinking about it, Array already sort of implies homogeneous sizes, so maybe "homogeneous" is redundant.

ArrayWithFreeList? Unsexy but precise.

@tstuefe
Copy link
Member

tstuefe commented Jun 24, 2024

Hi,

I've changed the data which is stored in HOA to be trivial and have added static_asserts to the methods. ... It's very important that the data we put into HOA is trivial as we do not call any destructors or copy constructors/operators on the code.

We don't, but why forbid the user from doing that? I may want to place a non-trivial object in there, e.g. one where I forbid copying altogether, or one with a single non-default ctor? What would be the problem with that?

Forbidding copying would be a good way to make sure the only instance exists at the place the array provides. So its address stable.

Copy link
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost good.

alloc.deallocate(i1);
A::I i3 = alloc.allocate(0);
EXPECT_EQ(p1, &alloc.at(i3));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for what little the tests do, testing with different list types is overengineered. You could just scrap both LL and LL2, and just use a HOA directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to push back on that. I believe that while the list tests aren't testing that much they do serve as a form of example of how to use the HOA. It's also useful to have this in the form of a test, as opposed to a comment, as breaking changes to the HOA will break the examples, forcing them to be updated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I am not sold on the "example" benefit. The array is quite easy in itself.

You could scratch the lists, since their implementations don't do anything to really test the AWFL (new acronym, yay). For the saved LOCs could expand the tests to test with a collection of various types, e.g.

  • u1, u2, u8
  • unaligned structures that need alignment (e.g. struct (void*; int; ))
  • trivial objects

Interesting to see would be that it works, that alignment is correct, that nothing breaks across resizes (indexes stay stable, etc.).

I wont insist of it, just thinking that the current test complexity is somewhat wasted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really test the AWFL (new acronym, yay).

Can't wait for templatized indices so I can write

using AWFUL = ArrayWithFreeListAllocator<int, uint8_t>;

Interesting to see would be that it works, that alignment is correct, that nothing breaks across resizes (indexes stay stable, etc.).

I wont insist of it, just thinking that the current test complexity is somewhat wasted.

I think those tests make sense if we replace the GrowableArrayas the backing memory area with something else, otherwise that's just testing the GA.

OK, how about we scrap the tests for when moving this to utilities and having the index being a part of the template? I've got that in the pipeline already, so it's not going to take too long I hope.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine for me

@jdksjolen
Copy link
Contributor Author

The include guards should be placed before the includes.

Thanks!

Would it make sense to put this in utilities/ instead of nmt/.

Do we have anyone else that wants to use this? The threshold in terms of API quality is typically higher for something in utilities/ than a local utility. I'm working on some improvements to this allocator already, maybe we can take a move to utilities/ together with those improvements?

Yes, I will use it to replace some code in Metaspace at least, but for that I need some more features (placing this thing atop of an existing memory range, and templatized index type).

Can all be done in a follow-up RFE.

Could we try another round of coming up with a better name for this utility? HomogenousObjectArray is eerily similar to G1's humongous object arrays. It's also not clear to me what makes this array an homogenous array. Is our other arrays non-homogenous?

I agree on the name part. We could call it a HomogenousAllocator, that does differentiate it meaningfully from something like Arena which can allocate anything. @tstuefe , arenas are also an example of the storage/lifetime of the objects from an allocator being bound to the lifetime of that allocator, so I think this makes sense.

Names are hard.

We already have two types of arenas in hotspots, plus glibc has arenas, so "arena" is not ideal either. Arena does usually not imply a free list either, nor homogenous sizes.

Thinking about it, Array already sort of implies homogeneous sizes, so maybe "homogeneous" is redundant.

ArrayWithFreeList? Unsexy but precise.

Agreed, let's go with ArrayWithFreeList.

Hi,
I've changed the data which is stored in HOA to be trivial and have added static_asserts to the methods. ... It's very important that the data we put into HOA is trivial as we do not call any destructors or copy constructors/operators on the code.

We don't, but why forbid the user from doing that? I may want to place a non-trivial object in there, e.g. one where I forbid copying altogether, or one with a single non-default ctor? What would be the problem with that?

Forbidding copying would be a good way to make sure the only instance exists at the place the array provides. So its address stable.

We can't have what you suggest at the moment, as objects aren't address stable without a fixed and non-growing memory area. Any form of constructor is actually fine, as those are called as they are.

Copy link
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good now.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 25, 2024
@jdksjolen
Copy link
Contributor Author

One last change: I've removed the e char[sizeof(E)] part of the union and made it E e. As the type is trivially copyable and trivially destructible this is fine. I realised that the alignment of I may be larger than the alignment of E if we're unlucky (storing a char for example) then that can lead to faulty behavior. We're now a bit safer.

Copy link
Contributor

@afshin-zafari afshin-zafari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, my comments are applied.

@jdksjolen
Copy link
Contributor Author

Thank you for the in-depth reviewing of this change.

/integrate

@openjdk
Copy link

openjdk bot commented Jun 25, 2024

Going to push as commit 57f8b91.
Since your change was applied there have been 242 commits pushed to the master branch:

  • 6c67933: 8334899: Test runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java failed after JDK-8306580
  • cae94b2: 8334397: RISC-V: verify perf of ReverseBytesS/US
  • 75a2afa: 8248981: Specify list of standard message digest and mgf algorithms for RSASSA-PSS signature
  • baafa66: 8334287: Man page update for jstatd deprecation
  • c30e040: 8331911: Reconsider locking for recently disarmed nmethods
  • 974dca8: 8334223: Make Arena MEMFLAGs immutable
  • e527e1c: 8334580: Deprecate no-arg constructor BasicSliderUI() for removal
  • 3a26bbc: 8185429: [macos] After a modal dialog is closed, no window becomes active
  • 4b153e5: 8306580: Propagate CDS dumping errors instead of directly exiting the VM
  • 71a692a: 8321033: Avoid casting Array to GrowableArray
  • ... and 232 more: https://git.openjdk.org/jdk/compare/3944e67366601b6f748df1c5f93f184a7cb23ec3...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jun 25, 2024
@openjdk openjdk bot closed this Jun 25, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jun 25, 2024
@openjdk
Copy link

openjdk bot commented Jun 25, 2024

@jdksjolen Pushed as commit 57f8b91.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@SoniaZaldana
Copy link
Member

Hi @jdksjolen,
I'm seeing some build failures after this PR got integrated. I filed JDK-8335108 for reference.

@jdksjolen
Copy link
Contributor Author

Hi @jdksjolen, I'm seeing some build failures after this PR got integrated. I filed JDK-8335108 for reference.

Hi Sonia, thanks for contacting me. It seems like this is a C++20 issue, but Hotspot only supports up to C++14. Here's a Godbolt for reproducing the issue, change the -std=c++14 to -std=c++20 to see the issue: https://godbolt.org/z/eK5GaTceY

That ought to mean that I don't have to back out this PR and re-introduce it with the bug fixed, but I still think that we should fix the bug because we do want to get to C++20 some day :-).

There's something weird with the fact that you get builds with C++20 support enabled, not sure what your configure options are.

@SoniaZaldana
Copy link
Member

Hi @jdksjolen, thanks for looking into it!

Hi Sonia, thanks for contacting me. It seems like this is a C++20 issue, but Hotspot only supports up to C++14. Here's a Godbolt for reproducing the issue, change the -std=c++14 to -std=c++20 to see the issue: https://godbolt.org/z/eK5GaTceY

That ought to mean that I don't have to back out this PR and re-introduce it with the bug fixed, but I still think that we should fix the bug because we do want to get to C++20 some day :-).

Makes sense. I'll submit a follow-up PR to fix this.

There's something weird with the fact that you get builds with C++20 support enabled, not sure what your configure options are.

I'm not sure. I don't have any special configure options. I came across this error after a plain bash configure but I guess my installed g++ version is 14.1.1.

@mlbridge
Copy link

mlbridge bot commented Jun 26, 2024

Mailing list message from Thomas Stüfe on hotspot-dev:

Johan, I would just remove the noncopyable. Its questionable anyway, there
is no reason one shouldnt be able to copy the AWFL, in particular because
client code just uses indices, not pointers.

On Tue 25. Jun 2024 at 21:01, Sonia Zaldana Calles <szaldana at openjdk.org>
wrote:

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-dev/attachments/20240625/f693178c/attachment.htm>

@jdksjolen
Copy link
Contributor Author

Mailing list message from Thomas Stüfe on hotspot-dev:

Johan, I would just remove the noncopyable. Its questionable anyway, there is no reason one shouldnt be able to copy the AWFL, in particular because client code just uses indices, not pointers.

On Tue 25. Jun 2024 at 21:01, Sonia Zaldana Calles wrote:

-------------- next part -------------- An HTML attachment was scrubbed... URL: https://mail.openjdk.org/pipermail/hotspot-dev/attachments/20240625/f693178c/attachment.htm

Whichever is fine by me. The class will still be non-copyable however, as GrowableArrayCHeap already is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

5 participants