Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JDK-8261302: NMT: Improve malloc site table hashing #2473

Closed

Conversation

tstuefe
Copy link
Member

@tstuefe tstuefe commented Feb 9, 2021

While looking at NMT tuning statistics, I saw longish bucket chains in the malloc site table and looked whether this can be improved.

The current hash algorithm uses the 32bit masked sum of all stack entries as hash.

I first experimented with different hash algorithms on different platforms (x86 and ppc, the latter because it has uniform op sizes) and did actually not find a noticeable improvement over what NMT does now. It seems that using the raw code pointers as base for the hash gives us already enough entropy. Avg load factor of the table always hovered around what was expected. Regardless of the hash I tried I was not able to get rid of the few longer chains.

The biggest improvement brought an experimental table size increase: currently the table size is 511 pointer slots (~4K). Quadrupling the size would bring the load factor down to 1-2. However, there is this comment in mallocSiteTable.hpp:

// Currently, (number of buckets / number of entires) ratio is

wich claims that a load factor of 6 is what is aimed for and deemed acceptable. So I am not going to touch that here (even though 4 or 12K more may be an okay price to pay for more efficient lookups.


With that out of the way, there are still small things we can improve about the hash function:

Since the vast majority of NativeCallStack objects will always need a hash code, it makes no sense to delay its calculation. By doing the hash code calculation in the constructor we can make NativeCallStack::hash() a simple inline getter.

When calculating the hash code, I also omitted the "if stack address is 0 stop" logic. The vast majority of call stacks have the full size and nothing much is gained from omitting those 0 values from the hash code calculation.

See difference (linux x86):

Before:

Dump of assembler code for function NativeCallStack::hash() const:
=> 0x00007ffff68092f0 <+0>:     mov    0x20(%rdi),%eax        <load hash
   0x00007ffff68092f3 <+3>:     push   %rbp
   0x00007ffff68092f4 <+4>:     mov    %rsp,%rbp              
   0x00007ffff68092f7 <+7>:     test   %eax,%eax              <0?->X
   0x00007ffff68092f9 <+9>:     jne    0x7ffff6809324 <NativeCallStack::hash() const+52>
   0x00007ffff68092fb <+11>:    mov    (%rdi),%rdx
   0x00007ffff68092fe <+14>:    test   %rdx,%rdx
   0x00007ffff6809301 <+17>:    je     0x7ffff6809330 <NativeCallStack::hash() const+64>
   0x00007ffff6809303 <+19>:    mov    0x8(%rdi),%rax
   0x00007ffff6809307 <+23>:    test   %rax,%rax
   0x00007ffff680930a <+26>:    je     0x7ffff680931f <NativeCallStack::hash() const+47>
   0x00007ffff680930c <+28>:    add    %rax,%rdx
   0x00007ffff680930f <+31>:    mov    0x10(%rdi),%rax
   0x00007ffff6809313 <+35>:    test   %rax,%rax
   0x00007ffff6809316 <+38>:    je     0x7ffff680931f <NativeCallStack::hash() const+47>
   0x00007ffff6809318 <+40>:    add    %rax,%rdx
   0x00007ffff680931b <+43>:    add    0x18(%rdi),%rdx
   0x00007ffff680931f <+47>:    mov    %edx,%eax
   0x00007ffff6809321 <+49>:    mov    %edx,0x20(%rdi)
   0x00007ffff6809324 <+52>:    pop    %rbp                   <X
   0x00007ffff6809325 <+53>:    retq   
   0x00007ffff6809326 <+54>:    nopw   %cs:0x0(%rax,%rax,1)
   0x00007ffff6809330 <+64>:    xor    %edx,%edx
   0x00007ffff6809332 <+66>:    jmp    0x7ffff680931f <NativeCallStack::hash() const+47>

hash() getter is not inlined; it queries the hash code each time and, when calculating it, uses simple adds interspersed with conditional jumps because of the "if stack address is 0 stop" logic.

With this patch, the NativeCallStack::hash() gets inlined at the call sites to a simple load.
The hash calculation gets now inlined into the constructor and uses a series of simple adds now:

Dump of assembler code for function NativeCallStack::NativeCallStack(int, bool):
=> 0x00007ffff67cc9a0 <+0>:     push   %rbp
   0x00007ffff67cc9a1 <+1>:     mov    %rsp,%rbp
   0x00007ffff67cc9a4 <+4>:     push   %rbx
   0x00007ffff67cc9a5 <+5>:     mov    %rdi,%rbx
   0x00007ffff67cc9a8 <+8>:     sub    $0x8,%rsp
   0x00007ffff67cc9ac <+12>:    test   %dl,%dl
   0x00007ffff67cc9ae <+14>:    movl   $0x0,0x20(%rdi)
   0x00007ffff67cc9b5 <+21>:    jne    0x7ffff67cc9f0 <NativeCallStack::NativeCallStack(int, bool)+80>
   0x00007ffff67cc9b7 <+23>:    movq   $0x0,(%rdi)
   0x00007ffff67cc9be <+30>:    movq   $0x0,0x8(%rdi)
   0x00007ffff67cc9c6 <+38>:    movq   $0x0,0x10(%rdi)
   0x00007ffff67cc9ce <+46>:    movq   $0x0,0x18(%rdi)
   0x00007ffff67cc9d6 <+54>:    mov    0x10(%rbx),%rax   <<<
   0x00007ffff67cc9da <+58>:    add    0x8(%rbx),%rax    <<<    
   0x00007ffff67cc9de <+62>:    add    0x18(%rbx),%rax   <<<
   0x00007ffff67cc9e2 <+66>:    add    (%rbx),%rax       <<<
   0x00007ffff67cc9e5 <+69>:    mov    %eax,0x20(%rbx)
   0x00007ffff67cc9e8 <+72>:    add    $0x8,%rsp
   0x00007ffff67cc9ec <+76>:    pop    %rbx
   0x00007ffff67cc9ed <+77>:    pop    %rbp
   0x00007ffff67cc9ee <+78>:    retq   
   0x00007ffff67cc9ef <+79>:    nop
   0x00007ffff67cc9f0 <+80>:    mov    %esi,%edx
   0x00007ffff67cc9f2 <+82>:    mov    $0x4,%esi
   0x00007ffff67cc9f7 <+87>:    callq  0x7ffff68250d0 <os::get_native_stack(unsigned char**, int, int)>
   0x00007ffff67cc9fc <+92>:    jmp    0x7ffff67cc9d6 <NativeCallStack::NativeCallStack(int, bool)+54>
End of assembler dump.

Thanks, Thomas


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/2473/head:pull/2473
$ git checkout pull/2473

@bridgekeeper
Copy link

bridgekeeper bot commented Feb 9, 2021

👋 Welcome back stuefe! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 9, 2021
@openjdk
Copy link

openjdk bot commented Feb 9, 2021

@tstuefe The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Feb 9, 2021
@mlbridge
Copy link

mlbridge bot commented Feb 9, 2021

Webrevs

@tstuefe tstuefe force-pushed the JDK-8261302-nmt-malloc-site-table-tuning branch from b8e11af to 309d6c0 Compare February 9, 2021 07:26
Copy link
Contributor

@zhengyu123 zhengyu123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@openjdk
Copy link

openjdk bot commented Feb 9, 2021

@tstuefe This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8261302: NMT: Improve malloc site table hashing

Reviewed-by: zgu, lucy

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 20 new commits pushed to the master branch:

  • ad54d8d: 8260934: java/lang/StringBuilder/HugeCapacity.java fails without Compact Strings
  • 752f92b: 6211242: AreaAveragingScaleFilter(int, int): IAE is not specified
  • 3af334a: 6211257: BasicStroke.createStrokedShape(Shape): NPE is not specified
  • 4619f37: 8261472: BasicConstraintsExtension::toString shows "PathLen:2147483647" if there is no pathLenConstraint
  • 699a3cd: 8223188: Removed unnecessary #ifdef __cplusplus from .cpp sources
  • 05c6009: 8259656: fixpath.sh changes broke _NT_SYMBOL_PATH in RunTests.gmk
  • ef7ee3f: 8225081: Remove Telia Company CA certificate expiring in April 2021
  • 7c565f8: 8261209: isStandalone property: remove dependency on pretty-print
  • 01d9280: 8261299: Use-after-free on failure path in LinuxPackage.c, getJvmLauncherLibPath
  • a00b130: 8261356: Clean up enum G1Mark
  • ... and 10 more: https://git.openjdk.java.net/jdk/compare/5183d8ae1eec86202eace2c4770f81edbc73cb68...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 9, 2021
@tstuefe
Copy link
Member Author

tstuefe commented Feb 9, 2021

Looks good to me.

Thanks Zhengyu.

Copy link
Contributor

@RealLucy RealLucy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, except for the one comment I added (it's not a "must", it's a "please").

@@ -28,6 +28,14 @@
#include "utilities/globalDefinitions.hpp"
#include "utilities/nativeCallStack.hpp"

static unsigned calculate_hash(address stack[NMT_TrackingStackDepth]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it doesn't change anything semantically, but I'd like to see the int type specifier.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll change it before pushing.

@tstuefe
Copy link
Member Author

tstuefe commented Feb 10, 2021

LGTM, except for the one comment I added (it's not a "must", it's a "please").

Thanks Lucy!

@tstuefe
Copy link
Member Author

tstuefe commented Feb 10, 2021

/integrate

@openjdk openjdk bot closed this Feb 10, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Feb 10, 2021
@openjdk
Copy link

openjdk bot commented Feb 10, 2021

@tstuefe Since your change was applied there have been 20 commits pushed to the master branch:

  • ad54d8d: 8260934: java/lang/StringBuilder/HugeCapacity.java fails without Compact Strings
  • 752f92b: 6211242: AreaAveragingScaleFilter(int, int): IAE is not specified
  • 3af334a: 6211257: BasicStroke.createStrokedShape(Shape): NPE is not specified
  • 4619f37: 8261472: BasicConstraintsExtension::toString shows "PathLen:2147483647" if there is no pathLenConstraint
  • 699a3cd: 8223188: Removed unnecessary #ifdef __cplusplus from .cpp sources
  • 05c6009: 8259656: fixpath.sh changes broke _NT_SYMBOL_PATH in RunTests.gmk
  • ef7ee3f: 8225081: Remove Telia Company CA certificate expiring in April 2021
  • 7c565f8: 8261209: isStandalone property: remove dependency on pretty-print
  • 01d9280: 8261299: Use-after-free on failure path in LinuxPackage.c, getJvmLauncherLibPath
  • a00b130: 8261356: Clean up enum G1Mark
  • ... and 10 more: https://git.openjdk.java.net/jdk/compare/5183d8ae1eec86202eace2c4770f81edbc73cb68...master

Your commit was automatically rebased without conflicts.

Pushed as commit a3d6e37.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@tstuefe tstuefe deleted the JDK-8261302-nmt-malloc-site-table-tuning branch February 11, 2021 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

3 participants