Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8276618: Pad cacheline for Thread::_rcu_counter #6246

Closed
wants to merge 1 commit into from

Conversation

Hamlin-Li
Copy link

@Hamlin-Li Hamlin-Li commented Nov 4, 2021

Currently, Thread::_rcu_counter is not padded by cacheline, it should be beneficail to do so.

The initial spebjbb test shows about 10.5% improvement of critical, and 0.7% improvement of max in specjbb2015.

========= test result (1st round) ==========
rcu base
45096 38980
41741 41468
42349 41053
44485 42030
47103 39915
43864 36004

==== average ====
44106.33333 39908.33333

==== improvement ====
10.5%

========= test result (2nd round) ==========
Second round of run includes 3 types:

  1. pad gc data & pad rcu
  2. pad rcu only
  3. base

Although the improvement is not that much as the previous round (10%), but still got about 3~4% improvement.

gc data & rcu rcu base
41284 41860 37099
42296 42166 44692
42810 43423 41801
43492 45603 40274
43808 40641 39627
43029 40242 39793
42543 41662 41544
43420 42702 37991
44212 43354 40319
42692 43442 45264
44773 44577 44213
40835 41870 42008
44282 44167 42527

==== average ====
43036.61538 42746.84615 41319.38462

==== improvement ====
gc data + rcu / base: 4.156%
rcu / base: 3.45%

========= configuration and environment ==========
specjbb arguments:
GROUP_COUNT=4
TI_JVM_COUNT=1

SPEC_OPTS_C="-Dspecjbb.group.count=$GROUP_COUNT -Dspecjbb.txi.pergroup.count=$TI_JVM_COUNT"
SPEC_OPTS_TI=""
SPEC_OPTS_BE=""

JAVA_OPTS_C="-server -Xms2g -Xmx2g -XX:+UseParallelGC"
JAVA_OPTS_TI="-server -Xms2g -Xmx2g -XX:+UseParallelGC"
JAVA_OPTS_BE="-server -XX:+UseG1GC -Xms32g -Xmx32g"

MODE_ARGS_C="-ikv"
MODE_ARGS_TI="-ikv"
MODE_ARGS_BE="-ikv"

NUM_OF_RUNS=1

HW:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 224
On-line CPU(s) list: 0-223
Thread(s) per core: 2
Core(s) per socket: 28
Socket(s): 4
NUMA node(s): 4
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8176M CPU @ 2.10GHz
Stepping: 4
CPU MHz: 1001.925
CPU max MHz: 2101.0000
CPU min MHz: 1000.0000
BogoMIPS: 4200.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 39424K
NUMA node0 CPU(s): 0-27,112-139
NUMA node1 CPU(s): 28-55,140-167
NUMA node2 CPU(s): 56-83,168-195
NUMA node3 CPU(s): 84-111,196-223

          total        used        free      shared  buff/cache   available

Mem: 3.0T 3.8G 2.9T 18M 25G 2.9T
Swap: 99G 0B 99G


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6246/head:pull/6246
$ git checkout pull/6246

Update a local copy of the PR:
$ git checkout pull/6246
$ git pull https://git.openjdk.java.net/jdk pull/6246/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 6246

View PR using the GUI difftool:
$ git pr show -t 6246

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6246.diff

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Nov 4, 2021

👋 Welcome back mli! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label Nov 4, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Nov 4, 2021

@Hamlin-Li The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot label Nov 4, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Nov 4, 2021

Webrevs

@Hamlin-Li
Copy link
Author

@Hamlin-Li Hamlin-Li commented Nov 4, 2021

/cc hotspot-gc

@openjdk openjdk bot added the hotspot-gc label Nov 4, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Nov 4, 2021

@Hamlin-Li
The hotspot-gc label was successfully added.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Hi Hamlin,

This seems reasonable to me, however whenever we add padding to optimise the placement of one field, I always wonder if that same padding has de-optimised the placement of other fields? I think we need to see a broader run of benchmarks here and across more than just x86_64.

I will see if I can assist on the benchmark front.

Thanks,
David

@@ -250,10 +251,10 @@ class Thread: public ThreadShadow {

// Support for GlobalCounter
private:
Copy link
Member

@dholmes-ora dholmes-ora Nov 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pre-existing nit: this private is not needed; nor is the public at line 260.

@Hamlin-Li
Copy link
Author

@Hamlin-Li Hamlin-Li commented Nov 4, 2021

Thanks a lot David, it will be very helpful.
BTW, I will modify as you suggested later together with other's comments.

Copy link
Contributor

@tschatzl tschatzl left a comment

I'll push it through our perf testing.

@Hamlin-Li
Copy link
Author

@Hamlin-Li Hamlin-Li commented Nov 4, 2021

Thanks a lot Thomas. :)

@tschatzl
Copy link
Contributor

@tschatzl tschatzl commented Nov 8, 2021

Hi,

we tried to reproduce your numbers internally, but failed to do so. Differences seem to be within noise.

We tried with specjbb2015 multi-jvm on a fairly large machine (152 threads; it was what has been "on hand") and multiple runs of our internal benchmarks and specjbb2015 composite runs on various (not-so-large but still fairly big sized machines).

Could you post or send more details about your configuration?

The other concern that has been brought up internally has been that this increases the size of Thread by ~40% from 624 to 872 bytes; do you think there a way to save some memory by reorganizing the fields so that the counter is on a separate cache line "naturally"?

Thanks,
Thomas

@Hamlin-Li
Copy link
Author

@Hamlin-Li Hamlin-Li commented Nov 8, 2021

Thanks Thomas for the feedback.

I have updated the summary of this PR with more configuration and environment info, and I also updated the 2nd round of run. Although the improvement is not that much as the previous round (10%), but still got about 3~4% improvement, and seems the data is more stable than the 1st round of run.
(JBS is not available currently, will update JBS too later)

@Hamlin-Li
Copy link
Author

@Hamlin-Li Hamlin-Li commented Nov 8, 2021

The other concern that has been brought up internally has been that this increases the size of Thread by ~40% from 624 to 872 bytes; do you think there a way to save some memory by reorganizing the fields so that the counter is on a separate cache line "naturally"?

Sure, if current change is proven to bring some performance benefit, let me do some more research in this direction.

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Dec 6, 2021

@Hamlin-Li This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@Hamlin-Li
Copy link
Author

@Hamlin-Li Hamlin-Li commented Dec 17, 2021

need further investigation.

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Jan 14, 2022

@Hamlin-Li This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Feb 11, 2022

@Hamlin-Li This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this Feb 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-gc rfr
3 participants