AbstractReferenceCountedByteBuf can save using a volatile set on constructor #12288
I still have to run benchmarks (that's why it's a draft), and I need to add a few checks to make sure the fence method is present on Unsafe.
…tructor Motivation: AbstractReferenceCountedByteBuf can be made faster by ensuring proper safe initialization instead of using a volatile set on construction. Modification: Use Unsafe and a plain store instead of a volatile set. Result: Faster AbstractReferenceCountedByteBuf allocation.
ded5850 to a876add
It's similar to #12286.
Results with
this PR:
There's an improvement, but not a huge one.
Relevant assembly parts are...
and, this PR:
Hence, with this PR, storeFence translates into a no-op on x86 (if we're lucky and it gets inlined all the way down to the Unsafe::storeFence intrinsic) and has no associated cost, while on 4.1 there is a full barrier after setting the refCnt field to its initial value, i.e.
IMO it's still worth considering, wdyt @normanmaurer? I'd also be curious to see what numbers @chrisvest gets with leak detection disabled on a non-x86 CPU: I don't have one.
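To make the two initialization strategies discussed above concrete, here is a minimal, self-contained sketch. It is not the code from this PR: the class names, the `INITIAL` constant, and the way Unsafe is looked up are illustrative assumptions. It contrasts a volatile set in the constructor with a plain `Unsafe.putInt` followed by `Unsafe.storeFence()`, and falls back to the volatile set when Unsafe is unavailable.

```java
import java.lang.reflect.Field;
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;
import sun.misc.Unsafe;

// Hypothetical sketch; names and the initial value are assumptions, not Netty code.
final class SafeConstructSketch {
    // Illustrative initial value only.
    static final int INITIAL = 1;

    // null when Unsafe cannot be obtained; callers then use the volatile set.
    static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException | RuntimeException e) {
            return null;
        }
    }

    /** "Before": the field is initialized with a volatile set in the constructor. */
    static final class VolatileInit {
        static final AtomicIntegerFieldUpdater<VolatileInit> UPDATER =
                AtomicIntegerFieldUpdater.newUpdater(VolatileInit.class, "refCnt");
        volatile int refCnt;

        VolatileInit() {
            UPDATER.set(this, INITIAL); // volatile store
        }
    }

    /** "After": plain store plus a store fence, with a volatile-set fallback. */
    static final class FencedInit {
        static final AtomicIntegerFieldUpdater<FencedInit> UPDATER =
                AtomicIntegerFieldUpdater.newUpdater(FencedInit.class, "refCnt");
        static final long OFFSET = refCntOffset();
        volatile int refCnt;

        FencedInit() {
            if (UNSAFE == null) {
                UPDATER.set(this, INITIAL);           // fallback: volatile store
            } else {
                UNSAFE.putInt(this, OFFSET, INITIAL); // plain (non-volatile) store
                UNSAFE.storeFence();                  // order the store before publication
            }
        }

        private static long refCntOffset() {
            if (UNSAFE == null) {
                return -1L;
            }
            try {
                return UNSAFE.objectFieldOffset(FencedInit.class.getDeclaredField("refCnt"));
            } catch (NoSuchFieldException e) {
                throw new ExceptionInInitializerError(e);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(new VolatileInit().refCnt);
        System.out.println(new FencedInit().refCnt);
    }
}
```

Both constructors leave the field with the same value; the only intended difference is the kind of store and barrier emitted, which is what the assembly comparison in this thread is about. Later updates still go through the field updater, so the field stays volatile for every access after construction.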
protected AbstractReferenceCountedByteBuf(int maxCapacity) {
    super(maxCapacity);
    if (!PlatformDependent.hasUnsafe()) {
consider adding some comments here on what we are doing and why
I'm thinking of moving everything into a better-named method too, but I'm curious whether anyone can run some experiments against the existing allocation benchmarks; I have a gut feeling (hopefully not wrong) that it has a larger benefit than expected, given that it happens on the hot path.
@normanmaurer I've implemented a specific PlatformDependent method to "safe construct put int", but unluckily it hit the JDK 8 inlining depth limit and was not inlined, see
This is one of the drawbacks of using wrapped And indeed the performance is now not as good as it should be, i.e.
Still better than the existing one, but it will now pay the
That seems a good improvement (as expected). The assembly has changed too:
The expected In short: although a quick (and unconfigured) JMH benchmark shows only a small improvement (because the new PlatformDependent method unluckily hits the inlining depth limit), the improvement in the best case is larger.
PTAL @normanmaurer
@chrisvest @njhill PTAL
Thanks!
Motivation:
AbstractReferenceCountedByteBuf can be made faster by ensuring proper safe initialization instead of using a volatile set on construction
Modification:
Use Unsafe and a plain store instead of a volatile set
Result:
Faster AbstractReferenceCountedByteBuf allocation