8221404: C2: Convert RegMask and IndexSet to use uintptr_t #1102

cl4es · 2020-11-06T21:21:56Z

This patch refactors RegMask and IndexSet to use uintptr_t rather than int for storage, which may shorten some code paths and loops on 64-bit VMs. Making storage unsigned further allows for a few simplification, e.g. is_bound_set where there was logic to deal with sign extension that can no longer happen.

To evaluate performance impact I created the included JMH microbenchmark which uses the RepeatCompilation command to repeat the compilation of a few methods: One trivial (trivialMath), one "regular" (mixHashCode), and one largish ( largeMethod..) with a lot of locals. These are designed to put no stress, some stress and quite a bit of stress on register allocation:

Baseline:

Benchmark                                      Mode  Cnt     Score    Error  Units
SimpleRepeatCompilation.largeMethod_baseline     ss   10   168.919 ±  2.839  ms/op
SimpleRepeatCompilation.largeMethod_repeat       ss   10  8920.305 ± 40.531  ms/op
SimpleRepeatCompilation.largeMethod_repeat_c1    ss   10   153.961 ±  2.762  ms/op
SimpleRepeatCompilation.largeMethod_repeat_c2    ss   10  8242.061 ± 71.989  ms/op
SimpleRepeatCompilation.mixHashCode_baseline     ss   10    69.526 ±  7.098  ms/op
SimpleRepeatCompilation.mixHashCode_repeat       ss   10  6733.627 ± 63.689  ms/op
SimpleRepeatCompilation.mixHashCode_repeat_c1    ss   10   316.862 ± 29.682  ms/op
SimpleRepeatCompilation.mixHashCode_repeat_c2    ss   10  4544.604 ± 57.439  ms/op
SimpleRepeatCompilation.trivialMath_baseline     ss   10    21.757 ±  1.553  ms/op
SimpleRepeatCompilation.trivialMath_repeat       ss   10   499.214 ± 35.984  ms/op
SimpleRepeatCompilation.trivialMath_repeat_c1    ss   10   100.345 ±  2.168  ms/op
SimpleRepeatCompilation.trivialMath_repeat_c2    ss   10   398.528 ±  4.718  ms/op

Patched:

Benchmark                                      Mode  Cnt     Score    Error  Units
SimpleRepeatCompilation.largeMethod_baseline     ss   10   164.355 ±  3.531  ms/op
SimpleRepeatCompilation.largeMethod_repeat       ss   10  8516.033 ± 22.408  ms/op
SimpleRepeatCompilation.largeMethod_repeat_c1    ss   10   151.181 ± 12.869  ms/op
SimpleRepeatCompilation.largeMethod_repeat_c2    ss   10  7857.373 ± 52.826  ms/op
SimpleRepeatCompilation.mixHashCode_baseline     ss   10    65.085 ±  5.643  ms/op
SimpleRepeatCompilation.mixHashCode_repeat       ss   10  6601.693 ± 57.898  ms/op
SimpleRepeatCompilation.mixHashCode_repeat_c1    ss   10   315.845 ± 27.474  ms/op
SimpleRepeatCompilation.mixHashCode_repeat_c2    ss   10  4456.847 ± 30.459  ms/op
SimpleRepeatCompilation.trivialMath_baseline     ss   10    21.273 ±  2.115  ms/op
SimpleRepeatCompilation.trivialMath_repeat       ss   10   506.873 ± 18.994  ms/op
SimpleRepeatCompilation.trivialMath_repeat_c1    ss   10   100.184 ±  3.008  ms/op
SimpleRepeatCompilation.trivialMath_repeat_c2    ss   10   397.010 ±  4.531  ms/op

This shows that there's no significant change on trivialMath, mixHashCode see a small improvement (~2%) and largeMethod see a larger improvement (~4-5%) on C2 and Tiered tests with compiler repetition.

Testing: tier 1-7 on all Oracle platforms, local testing and verification of linux-x86.

Progress

Change must not contain extraneous whitespace
Commit message must refer to an issue
Change must be properly reviewed

Issue

JDK-8221404: C2: Convert RegMask and IndexSet to use uintptr_t

Reviewers

Vladimir Kozlov (@vnkozlov - Reviewer) ⚠️ Review applies to 38c6056
Tobias Hartmann (@TobiHartmann - Reviewer) ⚠️ Review applies to 38c6056

Download

$ git fetch https://git.openjdk.java.net/jdk pull/1102/head:pull/1102
$ git checkout pull/1102

bridgekeeper · 2020-11-06T21:22:32Z

👋 Welcome back redestad! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2020-11-06T21:23:24Z

@cl4es The following label will be automatically applied to this pull request:

hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

vnkozlov

Looks good in general.
You may want to compare RA times from -XX:+LogCompilation to see clear difference.

vnkozlov · 2020-11-06T21:45:42Z

src/hotspot/share/opto/regmask.cpp

@@ -83,9 +81,10 @@ int RegMask::num_registers(uint ireg) {
      case Op_VecA:
        assert(Matcher::supports_scalable_vector(), "does not support scalable vector");
        return SlotsPerVecA;
+      default:
+        // Op_VecS and the rest ideal registers.


Add assert to make sure we see only expected values here.

vnkozlov · 2020-11-06T21:52:49Z

src/hotspot/share/opto/indexSet.hpp

@@ -102,17 +102,17 @@ class IndexSet : public ResourceObj {
    // All of BitBlocks fields and methods are declared private.  We limit
    // access to IndexSet and IndexSetIterator.

-    // A BitBlock is composed of some number of 32 bit words.  When a BitBlock
+    // A BitBlock is composed of some number of 64 bit words.  When a BitBlock


63- or 32- bit words

cl4es · 2020-11-07T16:48:00Z

Using +CITime to get a breakdown of a sample run of Regalloc times for largeMethod_repeat_c2, baseline:

    C2 Compile Time:        8.731 s
...
       Regalloc:              4.759 s
         Ctor Chaitin:          0.000 s
         Build IFG (virt):      0.190 s
         Build IFG (phys):      1.523 s
         Compute Liveness:      0.235 s
         Regalloc Split:        0.284 s
         Postalloc Copy Rem:    0.283 s
         Merge multidefs:       0.011 s
         Fixup Spills:          0.012 s
         Compact:               0.005 s
         Coalesce 1:            0.127 s
         Coalesce 2:            0.002 s
         Coalesce 3:            0.747 s
         Cache LRG:             0.005 s
         Simplify:              0.375 s
         Select:                0.423 s
         Other:                 0.536 s

Patch:

    C2 Compile Time:        8.317 s
...
       Regalloc:              4.340 s
         Ctor Chaitin:          0.000 s
         Build IFG (virt):      0.162 s
         Build IFG (phys):      1.344 s
         Compute Liveness:      0.237 s
         Regalloc Split:        0.284 s
         Postalloc Copy Rem:    0.279 s
         Merge multidefs:       0.011 s
         Fixup Spills:          0.012 s
         Compact:               0.004 s
         Coalesce 1:            0.121 s
         Coalesce 2:            0.002 s
         Coalesce 3:            0.680 s
         Cache LRG:             0.005 s
         Simplify:              0.345 s
         Select:                0.362 s
         Other:                 0.490 s

Timings appear pretty stable from run-to-run. No significant change in other phases.

mlbridge · 2020-11-08T20:47:02Z

Webrevs

vnkozlov

Looks good.

vnkozlov · 2020-11-09T16:59:52Z

src/hotspot/share/opto/indexSet.hpp

+         // Each block consists of 256 bits
+         block_index_length = 8,
+         // Split over 4 or 8 words depending on bitness
+         word_index_length  = block_index_length - LogBitsPerWord,


Nice. I also thought about using ‘word’ definitions.

Thanks! @vidmik pushed me to come up with derived definitions here rather than adding another magic constant for 64-bit.

openjdk · 2020-11-09T17:02:30Z

@cl4es This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8221404: C2: Convert RegMask and IndexSet to use uintptr_t

Reviewed-by: kvn, thartmann

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

TobiHartmann

Looks good to me.

cl4es · 2020-11-10T16:47:55Z

/integrate

openjdk · 2020-11-10T16:48:43Z

@cl4es Pushed as commit 6ae5e5b.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Reviewed-by: kvn, thartmann

cl4es added 3 commits November 6, 2020 21:20

C2: Convert RegMask and IndexSet to use uintptr_t

90a5d73

32-bit: Long -> Word

73c5842

32-bit compat: 63U -> BitsPerWord-1U

3e770f8

openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Nov 6, 2020

vnkozlov reviewed Nov 6, 2020

View reviewed changes

cl4es added 4 commits November 7, 2020 13:49

Fix comments from Vladimir and Mikael. A few additional cleanups.

225d3d2

ALL_BITS clash, rename constants.

9f86360

Improve bitfield comments

0de8fba

Merge branch 'master' into c2_uintptr_t

f91bc65

cl4es added 2 commits November 7, 2020 19:10

Fix and clarify low_bits

e46634f

unsigned overflow in find_last_elem (found by some tier6 tests)

fde1fc5

cl4es marked this pull request as ready for review November 8, 2020 20:43

openjdk bot added the rfr Pull request is ready for review label Nov 8, 2020

Avoid using ULL

38c6056

vnkozlov approved these changes Nov 9, 2020

View reviewed changes

openjdk bot added the ready Pull request is ready to be integrated label Nov 9, 2020

Merge branch 'master' into c2_uintptr_t

eb3851b

TobiHartmann approved these changes Nov 10, 2020

View reviewed changes

cl4es added 2 commits November 10, 2020 17:44

Copyrights

25392bf

Merge branch 'master' into c2_uintptr_t

3e29c4e

openjdk bot closed this Nov 10, 2020

openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Nov 10, 2020

openjdk-notifier bot referenced this pull request Nov 10, 2020

8221404: C2: Convert RegMask and IndexSet to use uintptr_t

6ae5e5b

Reviewed-by: kvn, thartmann

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

8221404: C2: Convert RegMask and IndexSet to use uintptr_t #1102

8221404: C2: Convert RegMask and IndexSet to use uintptr_t #1102

cl4es commented Nov 6, 2020 •

edited by openjdk bot

bridgekeeper bot commented Nov 6, 2020

openjdk bot commented Nov 6, 2020

vnkozlov left a comment

vnkozlov Nov 6, 2020

vnkozlov Nov 6, 2020

cl4es commented Nov 7, 2020

mlbridge bot commented Nov 8, 2020 •

edited

vnkozlov left a comment

vnkozlov Nov 9, 2020

cl4es Nov 9, 2020

openjdk bot commented Nov 9, 2020 •

edited

TobiHartmann left a comment

cl4es commented Nov 10, 2020

openjdk bot commented Nov 10, 2020

8221404: C2: Convert RegMask and IndexSet to use uintptr_t #1102

8221404: C2: Convert RegMask and IndexSet to use uintptr_t #1102

Conversation

cl4es commented Nov 6, 2020 • edited by openjdk bot

Progress

Issue

Reviewers

Download

bridgekeeper bot commented Nov 6, 2020

openjdk bot commented Nov 6, 2020

vnkozlov left a comment

Choose a reason for hiding this comment

vnkozlov Nov 6, 2020

Choose a reason for hiding this comment

vnkozlov Nov 6, 2020

Choose a reason for hiding this comment

cl4es commented Nov 7, 2020

mlbridge bot commented Nov 8, 2020 • edited

Webrevs

vnkozlov left a comment

Choose a reason for hiding this comment

vnkozlov Nov 9, 2020

Choose a reason for hiding this comment

cl4es Nov 9, 2020

Choose a reason for hiding this comment

openjdk bot commented Nov 9, 2020 • edited

TobiHartmann left a comment

Choose a reason for hiding this comment

cl4es commented Nov 10, 2020

openjdk bot commented Nov 10, 2020

cl4es commented Nov 6, 2020 •

edited by openjdk bot

mlbridge bot commented Nov 8, 2020 •

edited

openjdk bot commented Nov 9, 2020 •

edited