ISPN-6805 CacheMgmtInterceptor: use a simplified LongAdder #4442

danberindei · 2016-07-04T10:59:09Z

https://issues.jboss.org/browse/ISPN-6805

rvansa · 2016-07-07T11:02:15Z

@danberindei Motivation? Benchmark results?

Note that as the Stripes are allocated in a loop, these probably lie on the same cache line. You can separate them using a hierarchy of classes (A with padding, B with variables, C with padding), or through @sun.misc.Contended but that requires users to set -XX:-RestrictContended (which makes @Contended unusable in the end).

wburns · 2016-07-07T13:17:39Z

I am also a bit interested into what was behind this. I know the allocation for the stripes would be less than all of those long adders, but is this solving some additional contention you saw as well?

danberindei · 2016-07-07T13:23:24Z

Motivation? Benchmark results?

The motivation is simple, JFR was showing a lot of CPU spent in LongAdder :) I didn't actually think I could do better, but apparently LongAdder is too slow to scale up the number of cells.

Some performance numbers for replicated mode reads, running on 8-way Opterons:

	10 threads	110 threads	210 threads	310 threads
LongAdder	12162261 reqs/s	12441921 reqs/s	12399660 reqs/s	12286400 reqs/s
Stripes	14168233 reqs/s	14781110 reqs/s	13964451 reqs/s	13664435 reqs/s

Note that as the Stripes are allocated in a loop, these probably lie on the same cache line.
You can separate them using a hierarchy of classes (A with padding, B with variables, C with padding), or through @sun.misc.Contended but that requires users to set -XX:-RestrictContended (which makes @contended unusable in the end).

I'm pretty sure you only need to use multiple classes when you want to separate 2 fields in the same instance. Regardless of how the JVM reorders fields, it's going to use the same layout for all the instances of a class, so it doesn't matter where exactly the padding is.

Anyway, I don't think padding is needed because cache lines are 64 bytes almost everywhere, and a Stripe is 80 bytes + the object header. I can add a few extra longs, but I only really care about hits and hitTimes, so the others already serve as padding :)

I'm a little more worried about the distribution of the thread id... if it turns out that on Windows or some other OS thread ids are all multiples of 8, that's going to cause some trouble ;)

rvansa · 2016-07-07T13:38:33Z

@danberindei Ok, so LongAdder does not generate enough stripes. You could rather isolate your code to another class than put it directly into CacheMgmtInterceptor.

Wrt padding - that means that one Stripe will be on line 0 and 1, and second on line 1 and 2. And you have a conflict on line 1. Could you try the benchmark with @Contended (activated with -XX:-RestrictContended)?

If you're worried about thread ids, you should use java.lang.Thread#threadLocalRandomProbe in the same way as LongAdder does (using Unsafe). That should be truly random.

slaskawi · 2016-07-26T10:45:38Z

LGTM. @wburns @rvansa are you ok with merging it?

slaskawi · 2016-07-26T10:46:27Z

Retriggering CI

rvansa · 2016-07-26T10:57:52Z

@slaskawi No, I am still not satisfied with the padding and I would rather use threadLocalRandomProbe than thread Id. Dan updated one of the commits but did not respond to my last comment (and did not address it in the updated commit).

tristantarrant · 2016-08-09T17:01:12Z

@danberindei are @rvansa 's comments addressed ?

danberindei · 2016-08-10T14:16:35Z

@danberindei Ok, so LongAdder does not generate enough stripes. You could rather isolate your code to another class than put it directly into CacheMgmtInterceptor.

I've updated the PR to add a generic StripedCounters class, but still, most of the code is CacheMgmtInterceptor-specific.

Wrt padding - that means that one Stripe will be on line 0 and 1, and second on line 1 and 2. And you have a conflict on line 1. Could you try the benchmark with @contended (activated with -XX:-RestrictContended)?

Yes, in theory you can have a conflict, but you usually don't, because a thread will only update 2 of the counters. Still, I read somewhere that Intel prefetches 2 cache lines at a time, so I've added some more fields for padding. I'm having some trouble with JMH at the moment, but I'll get back to you with some more benchmarks. (On my laptop the 2 versions seem much closer than I remember them, and a definitely a lot closer than on the Opterons...)

If you're worried about thread ids, you should use java.lang.Thread#threadLocalRandomProbe in the same way as LongAdder does (using Unsafe). That should be truly random.

That's not really an option, as we want to wean ourselves off of Unsafe for JDK9. Otherwise I could have used Unsafe for accessing the counter fields as well...

Apparently thread id allocation is platform-independent ATM, but I've added some bit twiddling for future-proofing.

rvansa · 2016-08-10T15:39:19Z

Yes, in theory you can have a conflict, but you usually don't, because a thread will only update 2 of the counters. Still, I read somewhere that Intel prefetches 2 cache lines at a time, so I've added some more fields for padding. I'm having some trouble with JMH at the moment, but I'll get back to you with some more benchmarks. (On my laptop the 2 versions seem much closer than I remember them, and a definitely a lot closer than on the Opterons...)

You still have conflicts in theory there, as the field ordering is not guaranteed in the class. If the padding gets in the middle and some hot fields end up on the beginning and end, there will be conflicts (note that while you've made the class occupy at least 16x8 bytes, it is won't be aligned on the cache line). I don't see why you're so much against the padding > fields > padding class hierarchy. But I won't block this PR; I wash my hands on this.

That's not really an option, as we want to wean ourselves off of Unsafe for JDK9. Otherwise I could have used Unsafe for accessing the counter fields as well...

OK, haven't considered JDK 9, that's a beast to come. Fine then.

rvansa · 2016-09-26T07:53:51Z

bin/failedTestLogs.sh

@@ -18,7 +18,7 @@ fi
 echo Processing $FILE

 if [ -z "$FAILED_TESTS" ] ; then
-  if $CAT $FILE | head -100 | grep -q TestSuiteProgress ; then
+  if $CAT $FILE | head -10000 | grep -q TestSuiteProgress ; then


Is this intentional?

Nope, it was not intentional :)

rvansa · 2016-09-26T07:59:29Z

I would use more meaningful name of the third commit, but otherwise I am satisfied with the padding.

The JVM is free to reorder fields inside a class, but not to mix fields from subclasses and superclasses.

danberindei · 2016-09-26T09:02:15Z

It took a while @rvansa, but I finally ran the local read benchmark with the extra padding you suggested and it seems to make a small, but visible difference:

	80 threads, my laptop	80 threads, cluster10
LongAdder	20243520 reqs/s	17190979 reqs/s
Stripes	22011241 reqs/s	17338521 reqs/s
Extra-padding	22212625 reqs/s	17444039 reqs/s

TBH I'm a bit surprised that I'm seeing a 10% difference on my machine now and almost no difference on cluster10, I'm pretty sure it was the other way around when I first made the change :)

I also got the benchmark results from @rmacor, and they still show ~10% improvement in replicated reads throughput:

	iteration 1	iteration 5
master	17536470 reqs/s	17853700 reqs/s
ISPN-6805 (without 3rd commit)	19913319 reqs/s	19860132 reqs/s

tristantarrant · 2016-09-27T07:25:19Z

@rvansa unless you have more comments, I am going to merge this

rvansa · 2016-09-27T07:48:56Z

@tristantarrant 👍

tristantarrant · 2016-09-27T08:27:59Z

Pushed to master, thank.

danberindei added the Ready for Review label Jul 4, 2016

danberindei force-pushed the ISPN-6805_CacheMgmtInterceptor branch 4 times, most recently from 2a8aa7f to 9d7e3ef Compare July 6, 2016 11:59

wburns added the Question label Jul 7, 2016

danberindei removed the Question label Jul 7, 2016

danberindei force-pushed the ISPN-6805_CacheMgmtInterceptor branch 3 times, most recently from b8d168f to 0affd6b Compare July 8, 2016 14:44

danberindei force-pushed the ISPN-6805_CacheMgmtInterceptor branch from 0affd6b to 3f2561b Compare July 19, 2016 23:01

slaskawi added Changes Required and removed Ready for Review labels Jul 26, 2016

danberindei force-pushed the ISPN-6805_CacheMgmtInterceptor branch 3 times, most recently from 54cdaa3 to d7aaf6f Compare August 5, 2016 11:57

danberindei force-pushed the ISPN-6805_CacheMgmtInterceptor branch 2 times, most recently from 8226291 to 8d7abc0 Compare August 10, 2016 14:08

rvansa added Changes Suggested and removed Changes Required labels Aug 10, 2016

danberindei force-pushed the ISPN-6805_CacheMgmtInterceptor branch 5 times, most recently from 8bc84cc to 53d1165 Compare August 29, 2016 10:14

danberindei force-pushed the ISPN-6805_CacheMgmtInterceptor branch 6 times, most recently from da52f05 to b207bad Compare September 8, 2016 17:04

danberindei force-pushed the ISPN-6805_CacheMgmtInterceptor branch 2 times, most recently from 657c7ee to 7d38c71 Compare September 19, 2016 06:52

ISPN-6805 CacheMgmtInterceptor: use a simplified LongAdder

c976855

danberindei force-pushed the ISPN-6805_CacheMgmtInterceptor branch from 7d38c71 to c6536a5 Compare September 21, 2016 08:19

ISPN-6805 Extract StripedCounters, add padding, spread thread id

3484d57

rvansa reviewed Sep 26, 2016

View reviewed changes

danberindei force-pushed the ISPN-6805_CacheMgmtInterceptor branch 2 times, most recently from a29bccc to 2bc733d Compare September 26, 2016 07:55

danberindei force-pushed the ISPN-6805_CacheMgmtInterceptor branch from 2bc733d to 0a608a0 Compare September 26, 2016 08:38

ISPN-6805 Enforce padding with class hierarchy

0b9ac9e

The JVM is free to reorder fields inside a class, but not to mix fields from subclasses and superclasses.

danberindei force-pushed the ISPN-6805_CacheMgmtInterceptor branch from 0a608a0 to 0b9ac9e Compare September 26, 2016 08:40

danberindei added Ready for Review and removed Changes Suggested labels Sep 26, 2016

tristantarrant closed this Sep 27, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ISPN-6805 CacheMgmtInterceptor: use a simplified LongAdder #4442

ISPN-6805 CacheMgmtInterceptor: use a simplified LongAdder #4442

danberindei commented Jul 4, 2016

rvansa commented Jul 7, 2016 •

edited

wburns commented Jul 7, 2016

danberindei commented Jul 7, 2016

rvansa commented Jul 7, 2016

slaskawi commented Jul 26, 2016

slaskawi commented Jul 26, 2016

rvansa commented Jul 26, 2016

tristantarrant commented Aug 9, 2016

danberindei commented Aug 10, 2016

rvansa commented Aug 10, 2016

rvansa Sep 26, 2016

danberindei Sep 26, 2016

rvansa commented Sep 26, 2016

danberindei commented Sep 26, 2016

tristantarrant commented Sep 27, 2016

rvansa commented Sep 27, 2016

tristantarrant commented Sep 27, 2016

ISPN-6805 CacheMgmtInterceptor: use a simplified LongAdder #4442

ISPN-6805 CacheMgmtInterceptor: use a simplified LongAdder #4442

Conversation

danberindei commented Jul 4, 2016

rvansa commented Jul 7, 2016 • edited

wburns commented Jul 7, 2016

danberindei commented Jul 7, 2016

rvansa commented Jul 7, 2016

slaskawi commented Jul 26, 2016

slaskawi commented Jul 26, 2016

rvansa commented Jul 26, 2016

tristantarrant commented Aug 9, 2016

danberindei commented Aug 10, 2016

rvansa commented Aug 10, 2016

rvansa Sep 26, 2016

Choose a reason for hiding this comment

danberindei Sep 26, 2016

Choose a reason for hiding this comment

rvansa commented Sep 26, 2016

danberindei commented Sep 26, 2016

tristantarrant commented Sep 27, 2016

rvansa commented Sep 27, 2016

tristantarrant commented Sep 27, 2016

rvansa commented Jul 7, 2016 •

edited