ISPN-6805 CacheMgmtInterceptor: use a simplified LongAdder #4442
@danberindei Motivation? Benchmark results? Note that because the Stripes are allocated in a loop, they probably lie on the same cache line. You can separate them using a hierarchy of classes (
I am also a bit interested in what was behind this. I know the allocation for the stripes would be less than for all of those long adders, but is this also solving some additional contention you saw?
The motivation is simple: JFR was showing a lot of CPU spent in
Some performance numbers for replicated mode reads, running on 8-way Opterons:
I'm pretty sure you only need to use multiple classes when you want to separate 2 fields in the same instance. Regardless of how the JVM reorders fields, it's going to use the same layout for all the instances of a class, so it doesn't matter where exactly the padding is.
Anyway, I don't think padding is needed because cache lines are 64 bytes almost everywhere, and a
I'm a little more worried about the distribution of the thread id... if it turns out that on Windows or some other OS thread ids are all multiples of 8, that's going to cause some trouble ;)
@danberindei Ok, so LongAdder does not generate enough stripes. Rather than putting your code directly into CacheMgmtInterceptor, you could isolate it in another class.
Wrt padding: that means that one Stripe will be on lines 0 and 1, and the second on lines 1 and 2, so you have a conflict on line 1. Could you try the benchmark with
If you're worried about thread ids, you should use java.lang.Thread#threadLocalRandomProbe in the same way as LongAdder does (using Unsafe). That should be truly random.
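For readers following the stripe discussion, here is a minimal sketch of a counter striped by thread id. It is not the class added by this PR: `StripedCounterSketch`, its `SPACING` constant, and the use of `AtomicLongArray` are all assumptions made for illustration. Slots are kept 8 longs (64 bytes) apart, so on hardware with 64-byte cache lines no two stripes can share a line, whatever the array's alignment.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical striped counter, for illustration only (not the PR's code). */
final class StripedCounterSketch {
    // Assumption: 64-byte cache lines, so 8 longs between used slots.
    private static final int SPACING = 8;

    private final int mask;
    private final AtomicLongArray cells;

    StripedCounterSketch(int stripes) {
        if (stripes <= 0 || Integer.bitCount(stripes) != 1) {
            throw new IllegalArgumentException("stripes must be a power of two");
        }
        this.mask = stripes - 1;
        this.cells = new AtomicLongArray(stripes * SPACING);
    }

    /** Picks a slot from the low bits of the current thread's id. */
    private int slot() {
        long id = Thread.currentThread().getId();
        return (((int) id) & mask) * SPACING;
    }

    void add(long delta) {
        cells.addAndGet(slot(), delta);
    }

    void increment() {
        add(1);
    }

    /** Sums all stripes; not an atomic snapshot while writers are active. */
    long sum() {
        long total = 0;
        for (int i = 0; i < cells.length(); i += SPACING) {
            total += cells.get(i);
        }
        return total;
    }
}
```

With this layout, threads whose ids differ in the low bits update different cache lines, and `sum()` is only read when statistics are requested. The sketch does nothing about adjacent-line prefetching or about thread ids that share their low bits.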
Retriggering CI
@slaskawi No, I am still not satisfied with the padding and I would rather use
@danberindei are @rvansa's comments addressed?
I've updated the PR to add a generic
Yes, in theory you can have a conflict, but you usually don't, because a thread will only update 2 of the counters. Still, I read somewhere that Intel prefetches 2 cache lines at a time, so I've added some more fields for padding. I'm having some trouble with JMH at the moment, but I'll get back to you with some more benchmarks. (On my laptop the 2 versions seem much closer than I remember them, and definitely a lot closer than on the Opterons...)
That's not really an option, as we want to wean ourselves off of
Apparently thread id allocation is platform-independent ATM, but I've added some bit twiddling for future-proofing.
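The earlier worry about thread ids that are all multiples of 8 can be illustrated with a small sketch. The thread does not show which bit twiddling the PR actually uses; the multiplicative (Fibonacci) mix below is a swapped-in technique, and `StripeIndexSketch`, `plainIndex`, and `mixedIndex` are hypothetical names. The sketch assumes `stripeBits` is between 1 and 30.

```java
/** Hypothetical helpers comparing two ways to map a thread id to a stripe. */
final class StripeIndexSketch {
    // 2^64 divided by the golden ratio, a common multiplicative-hash constant.
    private static final long GOLDEN = 0x9E3779B97F4A7C15L;

    private StripeIndexSketch() {
    }

    /** Uses only the low bits: ids that are multiples of 8 all collide when stripeBits is 3. */
    static int plainIndex(long threadId, int stripeBits) {
        return ((int) threadId) & ((1 << stripeBits) - 1);
    }

    /** Multiplies first, then keeps the top stripeBits bits, so every input bit contributes. */
    static int mixedIndex(long threadId, int stripeBits) {
        return (int) ((threadId * GOLDEN) >>> (64 - stripeBits));
    }
}
```

With 8 stripes, `plainIndex` sends every id that is a multiple of 8 to stripe 0, while `mixedIndex` spreads the same ids over several stripes. Whether the extra multiplication is worth it on the hot path would need to be benchmarked.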
You still have conflicts in theory there, as the field ordering is not guaranteed in the class. If the padding gets in the middle and some hot fields end up at the beginning and end, there will be conflicts (note that while you've made the class occupy at least 16x8 bytes, it won't be aligned on the cache line). I don't see why you're so much against the padding > fields > padding class hierarchy. But I won't block this PR; I wash my hands of this.
OK, I hadn't considered JDK 9; that's a beast to come. Fine then.
@@ -18,7 +18,7 @@ fi
 echo Processing $FILE

 if [ -z "$FAILED_TESTS" ] ; then
-if $CAT $FILE | head -100 | grep -q TestSuiteProgress ; then
+if $CAT $FILE | head -10000 | grep -q TestSuiteProgress ; then
Is this intentional?
Nope, it was not intentional :)
I would use a more meaningful name for the third commit, but otherwise I am satisfied with the padding.
The JVM is free to reorder fields inside a class, but not to mix fields from subclasses and superclasses.
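The padding > fields > padding hierarchy discussed above could look roughly like the sketch below. The class and field names (`LeadingPad`, `HotFields`, `PaddedCell`, `p0`..`p6`, `q0`..`q6`) are hypothetical, not the PR's actual code, and the sketch assumes the layout behaviour described in this thread, where superclass and subclass fields are not interleaved. That is JVM-specific, and so are object header size and alignment, so this is not a guarantee.

```java
import java.util.concurrent.atomic.AtomicLongFieldUpdater;

/** Leading padding: 7 longs plus the object header cover at least 64 bytes before the hot field. */
abstract class LeadingPad {
    long p0, p1, p2, p3, p4, p5, p6;
}

/** The hot field lives in its own class so padding can sit on both sides of it. */
abstract class HotFields extends LeadingPad {
    private static final AtomicLongFieldUpdater<HotFields> VALUE =
            AtomicLongFieldUpdater.newUpdater(HotFields.class, "value");

    volatile long value;

    void add(long delta) {
        VALUE.addAndGet(this, delta);
    }

    long get() {
        return value;
    }
}

/** Trailing padding: keeps the next object's fields off the hot field's cache line. */
final class PaddedCell extends HotFields {
    long q0, q1, q2, q3, q4, q5, q6;
}
```

Each `PaddedCell` costs well over 100 bytes, which is the memory trade-off for reducing false sharing between neighbouring cells.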
It took a while @rvansa, but I finally ran the local read benchmark with the extra padding you suggested and it seems to make a small, but visible difference:
TBH I'm a bit surprised that I'm seeing a 10% difference on my machine now and almost no difference on
I also got the benchmark results from @rmacor, and they still show ~10% improvement in replicated reads throughput:
@rvansa unless you have more comments, I am going to merge this
Pushed to master, thanks.
https://issues.jboss.org/browse/ISPN-6805