Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8221404: C2: Convert RegMask and IndexSet to use uintptr_t #1102

Closed
wants to merge 13 commits into from

Conversation

cl4es
Copy link
Member

@cl4es cl4es commented Nov 6, 2020

This patch refactors RegMask and IndexSet to use uintptr_t rather than int for storage, which may shorten some code paths and loops on 64-bit VMs. Making storage unsigned further allows for a few simplification, e.g. is_bound_set where there was logic to deal with sign extension that can no longer happen.

To evaluate performance impact I created the included JMH microbenchmark which uses the RepeatCompilation command to repeat the compilation of a few methods: One trivial (trivialMath), one "regular" (mixHashCode), and one largish ( largeMethod..) with a lot of locals. These are designed to put no stress, some stress and quite a bit of stress on register allocation:

Baseline:

Benchmark                                      Mode  Cnt     Score    Error  Units
SimpleRepeatCompilation.largeMethod_baseline     ss   10   168.919 ±  2.839  ms/op
SimpleRepeatCompilation.largeMethod_repeat       ss   10  8920.305 ± 40.531  ms/op
SimpleRepeatCompilation.largeMethod_repeat_c1    ss   10   153.961 ±  2.762  ms/op
SimpleRepeatCompilation.largeMethod_repeat_c2    ss   10  8242.061 ± 71.989  ms/op
SimpleRepeatCompilation.mixHashCode_baseline     ss   10    69.526 ±  7.098  ms/op
SimpleRepeatCompilation.mixHashCode_repeat       ss   10  6733.627 ± 63.689  ms/op
SimpleRepeatCompilation.mixHashCode_repeat_c1    ss   10   316.862 ± 29.682  ms/op
SimpleRepeatCompilation.mixHashCode_repeat_c2    ss   10  4544.604 ± 57.439  ms/op
SimpleRepeatCompilation.trivialMath_baseline     ss   10    21.757 ±  1.553  ms/op
SimpleRepeatCompilation.trivialMath_repeat       ss   10   499.214 ± 35.984  ms/op
SimpleRepeatCompilation.trivialMath_repeat_c1    ss   10   100.345 ±  2.168  ms/op
SimpleRepeatCompilation.trivialMath_repeat_c2    ss   10   398.528 ±  4.718  ms/op

Patched:

Benchmark                                      Mode  Cnt     Score    Error  Units
SimpleRepeatCompilation.largeMethod_baseline     ss   10   164.355 ±  3.531  ms/op
SimpleRepeatCompilation.largeMethod_repeat       ss   10  8516.033 ± 22.408  ms/op
SimpleRepeatCompilation.largeMethod_repeat_c1    ss   10   151.181 ± 12.869  ms/op
SimpleRepeatCompilation.largeMethod_repeat_c2    ss   10  7857.373 ± 52.826  ms/op
SimpleRepeatCompilation.mixHashCode_baseline     ss   10    65.085 ±  5.643  ms/op
SimpleRepeatCompilation.mixHashCode_repeat       ss   10  6601.693 ± 57.898  ms/op
SimpleRepeatCompilation.mixHashCode_repeat_c1    ss   10   315.845 ± 27.474  ms/op
SimpleRepeatCompilation.mixHashCode_repeat_c2    ss   10  4456.847 ± 30.459  ms/op
SimpleRepeatCompilation.trivialMath_baseline     ss   10    21.273 ±  2.115  ms/op
SimpleRepeatCompilation.trivialMath_repeat       ss   10   506.873 ± 18.994  ms/op
SimpleRepeatCompilation.trivialMath_repeat_c1    ss   10   100.184 ±  3.008  ms/op
SimpleRepeatCompilation.trivialMath_repeat_c2    ss   10   397.010 ±  4.531  ms/op

This shows that there's no significant change on trivialMath, mixHashCode see a small improvement (~2%) and largeMethod see a larger improvement (~4-5%) on C2 and Tiered tests with compiler repetition.

Testing: tier 1-7 on all Oracle platforms, local testing and verification of linux-x86.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8221404: C2: Convert RegMask and IndexSet to use uintptr_t

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/1102/head:pull/1102
$ git checkout pull/1102

@bridgekeeper
Copy link

bridgekeeper bot commented Nov 6, 2020

👋 Welcome back redestad! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Nov 6, 2020

@cl4es The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Nov 6, 2020
Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general.
You may want to compare RA times from -XX:+LogCompilation to see clear difference.

@@ -83,9 +81,10 @@ int RegMask::num_registers(uint ireg) {
case Op_VecA:
assert(Matcher::supports_scalable_vector(), "does not support scalable vector");
return SlotsPerVecA;
default:
// Op_VecS and the rest ideal registers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add assert to make sure we see only expected values here.

@@ -102,17 +102,17 @@ class IndexSet : public ResourceObj {
// All of BitBlocks fields and methods are declared private. We limit
// access to IndexSet and IndexSetIterator.

// A BitBlock is composed of some number of 32 bit words. When a BitBlock
// A BitBlock is composed of some number of 64 bit words. When a BitBlock
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

63- or 32- bit words

@cl4es
Copy link
Member Author

cl4es commented Nov 7, 2020

Using +CITime to get a breakdown of a sample run of Regalloc times for largeMethod_repeat_c2, baseline:

    C2 Compile Time:        8.731 s
...
       Regalloc:              4.759 s
         Ctor Chaitin:          0.000 s
         Build IFG (virt):      0.190 s
         Build IFG (phys):      1.523 s
         Compute Liveness:      0.235 s
         Regalloc Split:        0.284 s
         Postalloc Copy Rem:    0.283 s
         Merge multidefs:       0.011 s
         Fixup Spills:          0.012 s
         Compact:               0.005 s
         Coalesce 1:            0.127 s
         Coalesce 2:            0.002 s
         Coalesce 3:            0.747 s
         Cache LRG:             0.005 s
         Simplify:              0.375 s
         Select:                0.423 s
         Other:                 0.536 s

Patch:

    C2 Compile Time:        8.317 s
...
       Regalloc:              4.340 s
         Ctor Chaitin:          0.000 s
         Build IFG (virt):      0.162 s
         Build IFG (phys):      1.344 s
         Compute Liveness:      0.237 s
         Regalloc Split:        0.284 s
         Postalloc Copy Rem:    0.279 s
         Merge multidefs:       0.011 s
         Fixup Spills:          0.012 s
         Compact:               0.004 s
         Coalesce 1:            0.121 s
         Coalesce 2:            0.002 s
         Coalesce 3:            0.680 s
         Cache LRG:             0.005 s
         Simplify:              0.345 s
         Select:                0.362 s
         Other:                 0.490 s

Timings appear pretty stable from run-to-run. No significant change in other phases.

@cl4es cl4es marked this pull request as ready for review November 8, 2020 20:43
@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 8, 2020
@mlbridge
Copy link

mlbridge bot commented Nov 8, 2020

Webrevs

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

// Each block consists of 256 bits
block_index_length = 8,
// Split over 4 or 8 words depending on bitness
word_index_length = block_index_length - LogBitsPerWord,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. I also thought about using ‘word’ definitions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! @vidmik pushed me to come up with derived definitions here rather than adding another magic constant for 64-bit.

@openjdk
Copy link

openjdk bot commented Nov 9, 2020

@cl4es This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8221404: C2: Convert RegMask and IndexSet to use uintptr_t

Reviewed-by: kvn, thartmann

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 9, 2020
Copy link
Member

@TobiHartmann TobiHartmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@cl4es
Copy link
Member Author

cl4es commented Nov 10, 2020

/integrate

@openjdk openjdk bot closed this Nov 10, 2020
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Nov 10, 2020
@openjdk
Copy link

openjdk bot commented Nov 10, 2020

@cl4es Pushed as commit 6ae5e5b.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk-notifier bot referenced this pull request Nov 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
3 participants