Make AdaptiveByteBuf.setBytes faster#15736
Conversation
Motivation: The setBytes method was getting a sliced nioBuffer of its source, which typically causes allocation. On Java 16 onwards, we can instead copy using an absolutely offsetted `put` method, and forego allocating a duplicate ByteBuffer instance that is otherwise needed for isolating the position field. Modification: - Make use of the absolutely offsetted put method in setBytes, when its available. - Add a benchmark targeting the setBytes method that takes a ByteBuf source. - Change a few benchmarks to use the default allocator when pooling is enabled. Result: Faster setBytes in certain cases.
d94c605 to
3c5a614
Compare
| tmp.put(src); | ||
| int length = src.remaining(); | ||
| checkIndex(index, length); | ||
| ByteBuffer tmp = internalNioBuffer(); |
There was a problem hiding this comment.
Why not using the rootParent NIO buffer(s, from each adaptive, if available)?
It is shared among size classed chunks and will save them to allocates NIO buffers, which will be dead as they will be released (AdaptiveByteBufs are just empty shells afaik)
There was a problem hiding this comment.
I tried this but it was a lot slower, by like 50%. Not sure what was going on, but in theory you're right.
There was a problem hiding this comment.
Maybe it was doubling the bound check? Debugging it or checking ASM could help.
There was a problem hiding this comment.
If you have the version w root parent link it here an tomorrow I will take a look
There was a problem hiding this comment.
Here's the version I had that used the rootParent:
@Override
public ByteBuf setBytes(int index, ByteBuf src, int srcIndex, int length) {
checkIndex(index, length);
if (src instanceof AdaptiveByteBuf && PlatformDependent.javaVersion() >= 16) {
AdaptiveByteBuf srcBuf = (AdaptiveByteBuf) src;
AbstractByteBuf dstRoot = rootParent();
ByteBuffer dstBuffer = dstRoot.internalNioBuffer(0, dstRoot.maxFastWritableBytes());
AbstractByteBuf srcRoot = srcBuf.rootParent();
ByteBuffer srcBuffer = srcRoot.internalNioBuffer(0, srcRoot.maxFastWritableBytes());
PlatformDependent.absolutePut(dstBuffer, idx(index), srcBuffer, srcBuf.idx(srcIndex), length);
} else {
ByteBuffer tmp = (ByteBuffer) internalNioBuffer();
tmp.clear().position(index);
tmp.put(src.nioBuffer(srcIndex, length));
}
return this;
}There was a problem hiding this comment.
I'm getting
Bug: unsafe access to shared internal chunk buffer
so I'm adding this too
@Override
public ByteBuffer internalNioBuffer(int index, int length) {
if (!allowSectionedInternalNioBufferAccess) {
// we can only return the internalNioBuffer if the whole buffer is requested
if (length != capacity && index != 0) {
throw new UnsupportedOperationException("Bug: unsafe access to shared internal chunk buffer");
}
return internalNioBuffer();
}
checkIndex(index, length);
return internalNioBuffer().clear().position(index).limit(index + length);
}it's not very secure, still .-.
But I've moved the checkIndex later since I didn't wanted to perform again both the accessibility and bound check here.
In short: I would love we have a simple internalNioBuffer() method which don't perform bound nor accessibility checks and is not meant (ever) to provide a modifiable (in term of position/limit) internal buffer
There was a problem hiding this comment.
With the changes at #15736 (comment) to enable using the internalNioBuffer and
@Override
public ByteBuf setBytes(int index, ByteBuf src, int srcIndex, int length) {
checkIndex(index, length);
if (src instanceof AdaptiveByteBuf && PlatformDependent.javaVersion() >= 16) {
AdaptiveByteBuf srcBuf = (AdaptiveByteBuf) src;
// bound check src as well
srcBuf.checkIndex(srcIndex, length);
AbstractByteBuf dstRoot = rootParent;
AbstractByteBuf srcRoot = srcBuf.rootParent;
// TODO why we have to pay again for accessibility and bound checks?
ByteBuffer dstBuffer = dstRoot.internalNioBuffer(0, dstRoot.maxFastWritableBytes());
ByteBuffer srcBuffer = srcRoot.internalNioBuffer(0, srcRoot.maxFastWritableBytes());
PlatformDependent.absolutePut(dstBuffer, idx(index), srcBuffer, srcBuf.idx(srcIndex), length);
} else {
ByteBuffer tmp = (ByteBuffer) internalNioBuffer();
tmp.clear().position(index);
tmp.put(src.nioBuffer(srcIndex, length));
}
return this;
}I'm getting the best ever performance and no allocations so i would go for it
|
Still running the http 2 benchmarks and numbers now looks better The slice creation is still dominant: but the actual copy has become cheaper - especially with While benchmarking/profiling this I' ve found something weird
Reading https://wiki.openjdk.org/display/HotSpot/Server+Compiler+Inlining+Messages it means
and
Adding https://github.com/openjdk/jdk/blob/ba7bf43c76c94bea85dbbd865794184b7ee0cc86/src/hotspot/share/opto/bytecodeInfo.cpp#L276 for confirmation. JDK 24 By adding
FYI disabling with whilst, without (i.e. using The reason why it popups here is because It looks like that the reason why where @yawkat suggested franz1981#1 long time ago to fix this problem In this way, since there are no overloaded
Another "dirty" solution to this, is to make // on Chunk
updater = new VarHandleReferenceCountUpdater<Chunk>() {
@Override
protected VarHandle varHandle() {
return (VarHandle) REFCNT_FIELD_VH;
}
@Override
public boolean release(final Chunk instance) {
int rawCnt = getRawRefCnt(instance);
return rawCnt == 2 ? tryFinalRelease0(instance, 2) || retryRelease0(instance, 1)
: nonFinalRelease0(instance, 1, rawCnt, toLiveRealRefCnt(rawCnt, 1));
}
}; |
|
I'm up for improving the way reference counting works; one implementation used across (if possible), and simpler algorithm with no count/raw count distinction. But let's do those in separate PRs. |
oki, I leave this in good hands than ;) |
franz1981
left a comment
There was a problem hiding this comment.
I would improve the way we expose internalNioBuffer to avoid paying twice for bound and accessibility checks, but ATM that's the most performant version in my tests.
|
@chrisvest @franz1981 what is the status of this one ? |
|
Let's say that the ref cnt PR from Chris will change what we would observe here. |
|
@chrisvest I think after #15764 this can make progress and use something like what I made at #15736 (comment) |
|
@franz1981 I tried the attached patch, but I'm finding that it's actually slower in the This PR currently: The patch: A patch attempting the approach in #15736 (comment)Subject: [PATCH] Use root parent ByteBuffers directly
---
Index: buffer/src/main/java/io/netty/buffer/UnsafeByteBufUtil.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/buffer/src/main/java/io/netty/buffer/UnsafeByteBufUtil.java b/buffer/src/main/java/io/netty/buffer/UnsafeByteBufUtil.java
--- a/buffer/src/main/java/io/netty/buffer/UnsafeByteBufUtil.java (revision 293385859e01ba7501e6f120a0984dd084dfab67)
+++ b/buffer/src/main/java/io/netty/buffer/UnsafeByteBufUtil.java (date 1762899017654)
@@ -689,14 +689,13 @@
}
static UnpooledUnsafeDirectByteBuf newUnsafeDirectByteBuf(
- ByteBufAllocator alloc, int initialCapacity, int maxCapacity,
- boolean allowSectionedInternalNioBufferAccess) {
+ ByteBufAllocator alloc, int initialCapacity, int maxCapacity) {
if (PlatformDependent.useDirectBufferNoCleaner()) {
return new UnpooledUnsafeNoCleanerDirectByteBuf(
- alloc, initialCapacity, maxCapacity, allowSectionedInternalNioBufferAccess);
+ alloc, initialCapacity, maxCapacity);
}
return new UnpooledUnsafeDirectByteBuf(
- alloc, initialCapacity, maxCapacity, allowSectionedInternalNioBufferAccess);
+ alloc, initialCapacity, maxCapacity);
}
private UnsafeByteBufUtil() { }
Index: buffer/src/main/java/io/netty/buffer/AdaptivePoolingAllocator.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/buffer/src/main/java/io/netty/buffer/AdaptivePoolingAllocator.java b/buffer/src/main/java/io/netty/buffer/AdaptivePoolingAllocator.java
--- a/buffer/src/main/java/io/netty/buffer/AdaptivePoolingAllocator.java (revision 293385859e01ba7501e6f120a0984dd084dfab67)
+++ b/buffer/src/main/java/io/netty/buffer/AdaptivePoolingAllocator.java (date 1762899017652)
@@ -1687,11 +1687,15 @@
@Override
public ByteBuf setBytes(int index, ByteBuf src, int srcIndex, int length) {
checkIndex(index, length);
- ByteBuffer tmp = internalNioBuffer();
if (src instanceof AdaptiveByteBuf && PlatformDependent.javaVersion() >= 16) {
AdaptiveByteBuf srcBuf = (AdaptiveByteBuf) src;
- PlatformDependent.absolutePut(tmp, index, srcBuf.internalNioBuffer(), srcIndex, length);
+// ByteBuffer tmp = internalNioBuffer();
+// PlatformDependent.absolutePut(tmp, index, srcBuf.internalNioBuffer(), srcIndex, length);
+ ByteBuffer dstBuffer = rootParent()._internalNioBuffer();
+ ByteBuffer srcBuffer = srcBuf.rootParent()._internalNioBuffer();
+ PlatformDependent.absolutePut(dstBuffer, idx(index), srcBuffer, srcBuf.idx(srcIndex), length);
} else {
+ ByteBuffer tmp = internalNioBuffer();
tmp.position(index);
tmp.put(src.nioBuffer(srcIndex, length));
}
Index: buffer/src/main/java/io/netty/buffer/UnpooledHeapByteBuf.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/buffer/src/main/java/io/netty/buffer/UnpooledHeapByteBuf.java b/buffer/src/main/java/io/netty/buffer/UnpooledHeapByteBuf.java
--- a/buffer/src/main/java/io/netty/buffer/UnpooledHeapByteBuf.java (revision 293385859e01ba7501e6f120a0984dd084dfab67)
+++ b/buffer/src/main/java/io/netty/buffer/UnpooledHeapByteBuf.java (date 1762899017653)
@@ -213,7 +213,7 @@
ensureAccessible();
ByteBuffer tmpBuf;
if (internal) {
- tmpBuf = internalNioBuffer();
+ tmpBuf = _internalNioBuffer();
} else {
tmpBuf = ByteBuffer.wrap(array);
}
@@ -222,7 +222,7 @@
private int getBytes(int index, FileChannel out, long position, int length, boolean internal) throws IOException {
ensureAccessible();
- ByteBuffer tmpBuf = internal ? internalNioBuffer() : ByteBuffer.wrap(array);
+ ByteBuffer tmpBuf = internal ? _internalNioBuffer() : ByteBuffer.wrap(array);
return out.write((ByteBuffer) tmpBuf.clear().position(index).limit(index + length), position);
}
@@ -279,7 +279,7 @@
public int setBytes(int index, ScatteringByteChannel in, int length) throws IOException {
ensureAccessible();
try {
- return in.read((ByteBuffer) internalNioBuffer().clear().position(index).limit(index + length));
+ return in.read((ByteBuffer) _internalNioBuffer().clear().position(index).limit(index + length));
} catch (ClosedChannelException ignored) {
return -1;
}
@@ -289,7 +289,7 @@
public int setBytes(int index, FileChannel in, long position, int length) throws IOException {
ensureAccessible();
try {
- return in.read((ByteBuffer) internalNioBuffer().clear().position(index).limit(index + length), position);
+ return in.read((ByteBuffer) _internalNioBuffer().clear().position(index).limit(index + length), position);
} catch (ClosedChannelException ignored) {
return -1;
}
@@ -314,7 +314,7 @@
@Override
public ByteBuffer internalNioBuffer(int index, int length) {
checkIndex(index, length);
- return (ByteBuffer) internalNioBuffer().clear().position(index).limit(index + length);
+ return (ByteBuffer) _internalNioBuffer().clear().position(index).limit(index + length);
}
@Override
@@ -535,7 +535,8 @@
return alloc().heapBuffer(length, maxCapacity()).writeBytes(array, index, length);
}
- private ByteBuffer internalNioBuffer() {
+ @Override
+ ByteBuffer _internalNioBuffer() {
ByteBuffer tmpNioBuf = this.tmpNioBuf;
if (tmpNioBuf == null) {
this.tmpNioBuf = tmpNioBuf = ByteBuffer.wrap(array);
Index: buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java b/buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java
--- a/buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java (revision 293385859e01ba7501e6f120a0984dd084dfab67)
+++ b/buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java (date 1762899017652)
@@ -403,8 +403,8 @@
buf = directArena.allocate(cache, initialCapacity, maxCapacity);
} else {
buf = PlatformDependent.hasUnsafe() ?
- UnsafeByteBufUtil.newUnsafeDirectByteBuf(this, initialCapacity, maxCapacity, true) :
- new UnpooledDirectByteBuf(this, initialCapacity, maxCapacity, true);
+ UnsafeByteBufUtil.newUnsafeDirectByteBuf(this, initialCapacity, maxCapacity) :
+ new UnpooledDirectByteBuf(this, initialCapacity, maxCapacity);
onAllocateBuffer(buf, false, false);
}
return toLeakAwareBuffer(buf);
Index: buffer/src/test/java/io/netty/buffer/LittleEndianUnsafeNoCleanerDirectByteBufTest.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/buffer/src/test/java/io/netty/buffer/LittleEndianUnsafeNoCleanerDirectByteBufTest.java b/buffer/src/test/java/io/netty/buffer/LittleEndianUnsafeNoCleanerDirectByteBufTest.java
--- a/buffer/src/test/java/io/netty/buffer/LittleEndianUnsafeNoCleanerDirectByteBufTest.java (revision 293385859e01ba7501e6f120a0984dd084dfab67)
+++ b/buffer/src/test/java/io/netty/buffer/LittleEndianUnsafeNoCleanerDirectByteBufTest.java (date 1762899017655)
@@ -31,6 +31,6 @@
@Override
protected ByteBuf newBuffer(int length, int maxCapacity) {
- return new UnpooledUnsafeNoCleanerDirectByteBuf(UnpooledByteBufAllocator.DEFAULT, length, maxCapacity, true);
+ return new UnpooledUnsafeNoCleanerDirectByteBuf(UnpooledByteBufAllocator.DEFAULT, length, maxCapacity);
}
}
Index: buffer/src/main/java/io/netty/buffer/AdaptiveByteBufAllocator.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/buffer/src/main/java/io/netty/buffer/AdaptiveByteBufAllocator.java b/buffer/src/main/java/io/netty/buffer/AdaptiveByteBufAllocator.java
--- a/buffer/src/main/java/io/netty/buffer/AdaptiveByteBufAllocator.java (revision 293385859e01ba7501e6f120a0984dd084dfab67)
+++ b/buffer/src/main/java/io/netty/buffer/AdaptiveByteBufAllocator.java (date 1762899017651)
@@ -112,8 +112,8 @@
@Override
public AbstractByteBuf allocate(int initialCapacity, int maxCapacity) {
return PlatformDependent.hasUnsafe() ?
- UnsafeByteBufUtil.newUnsafeDirectByteBuf(allocator, initialCapacity, maxCapacity, false) :
- new UnpooledDirectByteBuf(allocator, initialCapacity, maxCapacity, false);
+ UnsafeByteBufUtil.newUnsafeDirectByteBuf(allocator, initialCapacity, maxCapacity) :
+ new UnpooledDirectByteBuf(allocator, initialCapacity, maxCapacity);
}
}
}
Index: buffer/src/test/java/io/netty/buffer/BigEndianUnsafeNoCleanerDirectByteBufTest.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/buffer/src/test/java/io/netty/buffer/BigEndianUnsafeNoCleanerDirectByteBufTest.java b/buffer/src/test/java/io/netty/buffer/BigEndianUnsafeNoCleanerDirectByteBufTest.java
--- a/buffer/src/test/java/io/netty/buffer/BigEndianUnsafeNoCleanerDirectByteBufTest.java (revision 293385859e01ba7501e6f120a0984dd084dfab67)
+++ b/buffer/src/test/java/io/netty/buffer/BigEndianUnsafeNoCleanerDirectByteBufTest.java (date 1762899017655)
@@ -31,6 +31,6 @@
@Override
protected ByteBuf newBuffer(int length, int maxCapacity) {
- return new UnpooledUnsafeNoCleanerDirectByteBuf(UnpooledByteBufAllocator.DEFAULT, length, maxCapacity, true);
+ return new UnpooledUnsafeNoCleanerDirectByteBuf(UnpooledByteBufAllocator.DEFAULT, length, maxCapacity);
}
}
Index: buffer/src/main/java/io/netty/buffer/AbstractByteBuf.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/buffer/src/main/java/io/netty/buffer/AbstractByteBuf.java b/buffer/src/main/java/io/netty/buffer/AbstractByteBuf.java
--- a/buffer/src/main/java/io/netty/buffer/AbstractByteBuf.java (revision 293385859e01ba7501e6f120a0984dd084dfab67)
+++ b/buffer/src/main/java/io/netty/buffer/AbstractByteBuf.java (date 1762899017651)
@@ -1496,4 +1496,8 @@
long _memoryAddress() {
return isAccessible() && hasMemoryAddress() ? memoryAddress() : 0L;
}
+
+ ByteBuffer _internalNioBuffer() {
+ return internalNioBuffer(0, maxFastWritableBytes());
+ }
}
Index: buffer/src/main/java/io/netty/buffer/UnpooledDirectByteBuf.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/buffer/src/main/java/io/netty/buffer/UnpooledDirectByteBuf.java b/buffer/src/main/java/io/netty/buffer/UnpooledDirectByteBuf.java
--- a/buffer/src/main/java/io/netty/buffer/UnpooledDirectByteBuf.java (revision 293385859e01ba7501e6f120a0984dd084dfab67)
+++ b/buffer/src/main/java/io/netty/buffer/UnpooledDirectByteBuf.java (date 1762899017653)
@@ -45,7 +45,6 @@
private ByteBuffer tmpNioBuf;
private int capacity;
private boolean doNotFree;
- private final boolean allowSectionedInternalNioBufferAccess;
/**
* Creates a new direct buffer.
@@ -54,20 +53,6 @@
* @param maxCapacity the maximum capacity of the underlying direct buffer
*/
public UnpooledDirectByteBuf(ByteBufAllocator alloc, int initialCapacity, int maxCapacity) {
- this(alloc, initialCapacity, maxCapacity, true);
- }
-
- /**
- * Creates a new direct buffer.
- *
- * @param initialCapacity the initial capacity of the underlying direct buffer
- * @param maxCapacity the maximum capacity of the underlying direct buffer
- * @param allowSectionedInternalNioBufferAccess
- * {@code true} if {@link #internalNioBuffer(int, int)} is allowed to be called,
- * or {@code false} if it should throw an exception.
- */
- UnpooledDirectByteBuf(ByteBufAllocator alloc, int initialCapacity, int maxCapacity,
- boolean allowSectionedInternalNioBufferAccess) {
super(maxCapacity);
ObjectUtil.checkNotNull(alloc, "alloc");
checkPositiveOrZero(initialCapacity, "initialCapacity");
@@ -79,7 +64,6 @@
this.alloc = alloc;
setByteBuffer(allocateDirectBuffer(initialCapacity), false);
- this.allowSectionedInternalNioBufferAccess = allowSectionedInternalNioBufferAccess;
}
/**
@@ -113,7 +97,6 @@
doNotFree = !doFree;
setByteBuffer((slice ? initialBuffer.slice() : initialBuffer).order(ByteOrder.BIG_ENDIAN), false);
writerIndex(initialCapacity);
- allowSectionedInternalNioBufferAccess = true;
}
/**
@@ -617,7 +600,7 @@
if (length == 0) {
return;
}
- ByteBufUtil.readBytes(alloc(), internal ? internalNioBuffer() : buffer.duplicate(), index, length, out);
+ ByteBufUtil.readBytes(alloc(), internal ? _internalNioBuffer() : buffer.duplicate(), index, length, out);
}
@Override
@@ -750,13 +733,11 @@
@Override
public ByteBuffer internalNioBuffer(int index, int length) {
checkIndex(index, length);
- if (!allowSectionedInternalNioBufferAccess) {
- throw new UnsupportedOperationException("Bug: unsafe access to shared internal chunk buffer");
- }
- return (ByteBuffer) internalNioBuffer().clear().position(index).limit(index + length);
+ return (ByteBuffer) _internalNioBuffer().clear().position(index).limit(index + length);
}
- private ByteBuffer internalNioBuffer() {
+ @Override
+ ByteBuffer _internalNioBuffer() {
ByteBuffer tmpNioBuf = this.tmpNioBuf;
if (tmpNioBuf == null) {
this.tmpNioBuf = tmpNioBuf = buffer.duplicate();
Index: buffer/src/main/java/io/netty/buffer/UnpooledByteBufAllocator.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/buffer/src/main/java/io/netty/buffer/UnpooledByteBufAllocator.java b/buffer/src/main/java/io/netty/buffer/UnpooledByteBufAllocator.java
--- a/buffer/src/main/java/io/netty/buffer/UnpooledByteBufAllocator.java (revision 293385859e01ba7501e6f120a0984dd084dfab67)
+++ b/buffer/src/main/java/io/netty/buffer/UnpooledByteBufAllocator.java (date 1762899017652)
@@ -179,7 +179,7 @@
extends UnpooledUnsafeNoCleanerDirectByteBuf {
InstrumentedUnpooledUnsafeNoCleanerDirectByteBuf(
UnpooledByteBufAllocator alloc, int initialCapacity, int maxCapacity) {
- super(alloc, initialCapacity, maxCapacity, true);
+ super(alloc, initialCapacity, maxCapacity);
}
@Override
Index: buffer/src/main/java/io/netty/buffer/UnpooledUnsafeDirectByteBuf.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/buffer/src/main/java/io/netty/buffer/UnpooledUnsafeDirectByteBuf.java b/buffer/src/main/java/io/netty/buffer/UnpooledUnsafeDirectByteBuf.java
--- a/buffer/src/main/java/io/netty/buffer/UnpooledUnsafeDirectByteBuf.java (revision 293385859e01ba7501e6f120a0984dd084dfab67)
+++ b/buffer/src/main/java/io/netty/buffer/UnpooledUnsafeDirectByteBuf.java (date 1762899017654)
@@ -43,11 +43,6 @@
super(alloc, initialCapacity, maxCapacity);
}
- UnpooledUnsafeDirectByteBuf(ByteBufAllocator alloc, int initialCapacity, int maxCapacity,
- boolean allowSectionedInternalNioBufferAccess) {
- super(alloc, initialCapacity, maxCapacity, allowSectionedInternalNioBufferAccess);
- }
-
/**
* Creates a new direct buffer by wrapping the specified initial buffer.
*
Index: buffer/src/main/java/io/netty/buffer/UnpooledUnsafeNoCleanerDirectByteBuf.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/buffer/src/main/java/io/netty/buffer/UnpooledUnsafeNoCleanerDirectByteBuf.java b/buffer/src/main/java/io/netty/buffer/UnpooledUnsafeNoCleanerDirectByteBuf.java
--- a/buffer/src/main/java/io/netty/buffer/UnpooledUnsafeNoCleanerDirectByteBuf.java (revision 293385859e01ba7501e6f120a0984dd084dfab67)
+++ b/buffer/src/main/java/io/netty/buffer/UnpooledUnsafeNoCleanerDirectByteBuf.java (date 1762899017654)
@@ -21,9 +21,8 @@
import java.nio.ByteBuffer;
class UnpooledUnsafeNoCleanerDirectByteBuf extends UnpooledUnsafeDirectByteBuf {
- UnpooledUnsafeNoCleanerDirectByteBuf(ByteBufAllocator alloc, int initialCapacity, int maxCapacity,
- boolean allowSectionedInternalNioBufferAccess) {
- super(alloc, initialCapacity, maxCapacity, allowSectionedInternalNioBufferAccess);
+ UnpooledUnsafeNoCleanerDirectByteBuf(ByteBufAllocator alloc, int initialCapacity, int maxCapacity) {
+ super(alloc, initialCapacity, maxCapacity);
}
@Override |
|
Thanks, i will try with JDK 25 as well, to make sure 🙏 But If you want, run the benchmark with -prof gc to compare the 2 approaches. I would suggest to use the patched one because should allocate less |
|
I think @chrisvest that this is not the "right" benchmark because it would reuse the internal buffer - which under certain circumstances can save to be materialized... Said that, I see a regression myself, likely due to bound checks
this is not happening in the current version - and "it could" (to be verified) a quirk of the benchmark I have again inspect the asm to know it... (see #13783 (comment) as a remainder: this PR is key to see in that HTTP 2 PR the right improvement...a linked list of issues!) |
|
@chrisvest re the performance difference: at a first look the problem is visible by reading the code as well
So, in which cases the second is a better option? |
|
@normanmaurer @franz1981 Please take a look. |
Motivation: The setBytes method was getting a sliced nioBuffer of its source, which typically causes allocation. On Java 16 onwards, we can instead copy using an absolutely offsetted `put` method, and forego allocating a duplicate ByteBuffer instance that is otherwise needed for isolating the position field. Modification: - Make use of the absolutely offsetted put method in setBytes, when its available. - Use the underlying ByteBuffer of the shared chunk where possible to avoid multiple bounds checks. - Add a benchmark targeting the setBytes method that takes a ByteBuf source. - Change a few benchmarks to use the default allocator when pooling is enabled. Result: Faster setBytes in certain cases.
|
@chrisvest @franz1981 do we want to also port this to 4.1 ? |
Motivation: The setBytes method was getting a sliced nioBuffer of its source, which typically causes allocation. On Java 16 onwards, we can instead copy using an absolutely offsetted `put` method, and forego allocating a duplicate ByteBuffer instance that is otherwise needed for isolating the position field. Modification: - Make use of the absolutely offsetted put method in setBytes, when its available. - Use the underlying ByteBuffer of the shared chunk where possible to avoid multiple bounds checks. - Add a benchmark targeting the setBytes method that takes a ByteBuf source. - Change a few benchmarks to use the default allocator when pooling is enabled. Result: Faster setBytes in certain cases. Co-authored-by: Chris Vest <christianvest_hansen@apple.com>
|
@normanmaurer I don't think we need to back port this to 4.1. |
Motivation: Adaptive allocator perform costly atomic operations in the thread local path, which reduce its performance Modification: Reduce the amount of atomic operations in the thread local allocation's fast path Result: Fixes #15571 These are the different variations I want to test: - [x] Uses unguarded `Recycler`s - [x] Implements "compressed" local free list (LIFO) - [x] Use a mpsc q for the reuse chunk q in the thread-local case **NO VISIBLE IMPROVEMENTS** - [x] Guards `nextInLine`'s `getAndSet` with a null check via volatile `get` first, since size classed chunks rarely end up into `nextInLine` (i.e. which is mostly `null`) **NO VISIBLE IMPROVEMENTS** - [x] Implements a var handle based `MpscIntQueue` (done at 1c4e1e4) **NO VISIBLE IMPROVEMENTS** - [x] Remove the live/raw ref cnt as mentioned at #15736 (comment) - [ ] Remove the ref count for size classed chunks (see 8953bbe and 8cb1bf0) - [ ] Use the "pinned" Recycler instead of the `FastThreadLocal`-based one
Motivation: Adaptive allocator perform costly atomic operations in the thread local path, which reduce its performance Modification: Reduce the amount of atomic operations in the thread local allocation's fast path Result: Fixes netty#15571 These are the different variations I want to test: - [x] Uses unguarded `Recycler`s - [x] Implements "compressed" local free list (LIFO) - [x] Use a mpsc q for the reuse chunk q in the thread-local case **NO VISIBLE IMPROVEMENTS** - [x] Guards `nextInLine`'s `getAndSet` with a null check via volatile `get` first, since size classed chunks rarely end up into `nextInLine` (i.e. which is mostly `null`) **NO VISIBLE IMPROVEMENTS** - [x] Implements a var handle based `MpscIntQueue` (done at 1c4e1e4) **NO VISIBLE IMPROVEMENTS** - [x] Remove the live/raw ref cnt as mentioned at netty#15736 (comment) - [ ] Remove the ref count for size classed chunks (see 8953bbe and 8cb1bf0) - [ ] Use the "pinned" Recycler instead of the `FastThreadLocal`-based one (cherry picked from commit accd981)
Motivation: Adaptive allocator perform costly atomic operations in the thread local path, which reduce its performance Modification: Reduce the amount of atomic operations in the thread local allocation's fast path Result: Fixes #15571 These are the different variations I want to test: - [x] Uses unguarded `Recycler`s - [x] Implements "compressed" local free list (LIFO) - [x] Use a mpsc q for the reuse chunk q in the thread-local case **NO VISIBLE IMPROVEMENTS** - [x] Guards `nextInLine`'s `getAndSet` with a null check via volatile `get` first, since size classed chunks rarely end up into `nextInLine` (i.e. which is mostly `null`) **NO VISIBLE IMPROVEMENTS** - [x] Implements a var handle based `MpscIntQueue` (done at 1c4e1e4) **NO VISIBLE IMPROVEMENTS** - [x] Remove the live/raw ref cnt as mentioned at #15736 (comment) - [ ] Remove the ref count for size classed chunks (see 8953bbe and 8cb1bf0) - [ ] Use the "pinned" Recycler instead of the `FastThreadLocal`-based one (cherry picked from commit accd981) Co-authored-by: Francesco Nigro <nigro.fra@gmail.com>





Motivation:
The setBytes method was getting a sliced nioBuffer of its source, which typically causes allocation. On Java 16 onwards, we can instead copy using an absolutely offsetted
putmethod, and forego allocating a duplicate ByteBuffer instance that is otherwise needed for isolating the position field.Modification:
Result:
Faster setBytes in certain cases.