(POC) Refactor FileRegion to implement new ReadableObject API #3969

Closed
wants to merge 6 commits into
from

Projects

None yet

8 participants

@nmittler
Member
nmittler commented Jul 9, 2015

Motivation:

Based on #3965. As a first step of integrating the unified reading API, the low-hanging fruit is to refactor FIleRegion to use it.

Modifications:

Refactored FileRegion to extend ReadableObject and implemented the new interface in DefaultFileRegion.

Result:

FileRegion implements the new ReadableObject interface.

@nmittler
Member
nmittler commented Jul 9, 2015

@normanmaurer see related question regarding FileRegion and DefaultMessageSizeEstimator: #3964 (comment). Should the size for FileRegion should just be its readableBytes()?

@nmittler nmittler (POC) Refactor FileRegion to implement new ReadableObject API
Motivation:

Based on #3965. As a first step of integrating the unified reading API, the low-hanging fruit is to refactor `FIleRegion` to use it.

Modifications:

Refactored `FileRegion` to extend `ReadableObject` and implemented the new interface in `DefaultFileRegion`.

Result:

`FileRegion` implements the new `ReadableObject` interface.
7a15cf0
@nmittler nmittler and 2 others commented on an outdated diff Jul 9, 2015
...n/java/io/netty/channel/oio/OioByteStreamChannel.java
@@ -126,25 +126,19 @@ protected void doWriteFileRegion(FileRegion region) throws Exception {
outChannel = Channels.newChannel(os);
}
- long written = 0;
- for (;;) {
- long localWritten = region.transferTo(outChannel, written);
- if (localWritten == -1) {
- checkEOF(region);
+ long bytesToTransfer = obj.readableBytes();
+ long totalTransferred = 0;
+ while (obj.isReadable()) {
+ long transferred = obj.readTo(outChannel, obj.readableBytes());
+ if (transferred == -1) {
@nmittler
nmittler Jul 9, 2015 Member

@normanmaurer AFAICT, this method should never return -1. Was this a mistake? Should we be checking for 0 instead?

@Apache9
Apache9 Jul 10, 2015 Contributor

For FileChannel.transferTo, return 0 does not mean EOF I think, it only means it does not transfer any bytes. I think we need to check our position and FileChannel.size() somewhere to expose EOF?

@nmittler
nmittler Jul 10, 2015 Member

Right, zero does not imply EOF. The javadoc for FileChannel.transferTo does not indicate that it will ever return -1. The FileChannel.read, however does. I'm just wondering why we're checking for -1 ... is it actually possible as a return value of transferTo, but just isn't documented?

@Scottmitch
Scottmitch Jul 16, 2015 Member

@nmittler - Agreed this looks suspicious to me...

@Apache9

file could be null here because DefaultFileRegion supports lazy initialization. And the reference counting may cause the file being closed even if it is still used by some FileRegion. I think we need to follow the implementation of ByteBuf here.

I created a SlicedFileRegion in the patch of #3928
https://github.com/netty/netty/pull/3928/files#diff-cfa07c2dae220c0d536ad0dac07572c0R23
Could you see if it helps? Thanks.

@nmittler
Member

@Apache9 Yup, good idea. I've refactored the code based on your work in #3928.

@Scottmitch Scottmitch and 1 other commented on an outdated diff Jul 16, 2015
...rc/main/java/io/netty/channel/AbstractFileRegion.java
+ endPosition = readPosition + length;
+ }
+
+ @Override
+ public long readPosition() {
+ return readPosition;
+ }
+
+ @Override
+ public long readableBytes() {
+ return endPosition - readPosition;
+ }
+
+ @Override
+ public boolean isReadable() {
+ return readableBytes() > 0;
@Scottmitch
Scottmitch Jul 16, 2015 Member

nit: return endPosition != readPosition; to reduce number ops needed.

@nmittler
nmittler Jul 16, 2015 Member

This comment doesn't apply anymore.

@Scottmitch Scottmitch and 1 other commented on an outdated diff Jul 16, 2015
...rc/main/java/io/netty/channel/AbstractFileRegion.java
+ }
+
+ @Override
+ public FileRegion skipBytes(long length) {
+ verifyReadable(length);
+ readPosition += length;
+ return this;
+ }
+
+ @Override
+ public long readTo(WritableByteChannel target, long length) throws IOException {
+ verifyReadable(length);
+ if (length == 0) {
+ return 0L;
+ }
+ if (refCnt() == 0) {
@Scottmitch
Scottmitch Jul 16, 2015 Member

Should this check come before the length == 0 check?

@Scottmitch Scottmitch and 1 other commented on an outdated diff Jul 16, 2015
...rc/main/java/io/netty/channel/AbstractFileRegion.java
+ verifyReadable(length);
+ readPosition += length;
+ return this;
+ }
+
+ @Override
+ public long readTo(WritableByteChannel target, long length) throws IOException {
+ verifyReadable(length);
+ if (length == 0) {
+ return 0L;
+ }
+ if (refCnt() == 0) {
+ throw new IllegalReferenceCountException(0);
+ }
+
+ long written = channel().transferTo(this.readPosition, length, target);
@Scottmitch
Scottmitch Jul 16, 2015 Member

remove this.?

@nmittler
nmittler Jul 16, 2015 Member

This comment doesn't apply anymore.

@Scottmitch Scottmitch commented on an outdated diff Jul 16, 2015
...src/main/java/io/netty/channel/DefaultFileRegion.java
private static final InternalLogger logger = InternalLoggerFactory.getInstance(DefaultFileRegion.class);
- private final File f;
- private final long position;
- private final long count;
- private long transfered;
+
+ private File f;
private FileChannel file;
@Scottmitch
Scottmitch Jul 16, 2015 Member

file -> fileChannel

@Scottmitch Scottmitch and 1 other commented on an outdated diff Jul 16, 2015
...src/main/java/io/netty/channel/DefaultFileRegion.java
private static final InternalLogger logger = InternalLoggerFactory.getInstance(DefaultFileRegion.class);
- private final File f;
- private final long position;
- private final long count;
- private long transfered;
+
+ private File f;
@Scottmitch Scottmitch and 1 other commented on an outdated diff Jul 16, 2015
buffer/src/main/java/io/netty/buffer/ReadableObject.java
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations
+ * under the License.
+ */
+
+package io.netty.buffer;
+
+import io.netty.util.ReferenceCounted;
+
+import java.io.IOException;
+import java.nio.channels.WritableByteChannel;
+
+/**
+ * An object that contains a readable region. For simplicity, extends {@link ReferenceCounted}.
+ */
+public interface ReadableObject<T extends ReadableObject> extends ReferenceCounted {
@Scottmitch
Scottmitch Jul 16, 2015 Member

@nmittler - Have you thought about just using ReadableObject instead of T in this interface? Or use <T extends ReadableObject> T foo(...) so users don't have to cast? The usage of this generic interface seems awkward in classes that need to use ReadableObject in a generic way and not care about the generic argument (i.e. AbstractNioByteChannel).

@nmittler
nmittler Jul 16, 2015 Member

Removed the generic parameter.

@Scottmitch
Member

@nmittler - Looks good to me. I guess it will be a bit awkward to deal with the long (FileRegion) vs int (ByteBuf) indexing and sizes (assuming the intention is to have ByteBuf implement ReadableObject).

@normanmaurer
Member

@nmittler I'm not sure we should add this interface because of the miss match of indexes (long vs. int).

@trustin WDYT ?

@nmittler
Member

@normanmaurer We can change method names so that there is no collision with ByteBuf. The long versions will just delegate to the int versions. WDYT?

@nmittler
Member

Moving discussion from #3965 to here. From @Apache9:

@nmittler I think we need a POC of ReadableObjectQueue? It is important for coalescing small writes into one data frame. And the slice method is not that straight forward. ByteBuf has positioned read write method so we can implement a SlicedByteBuf for all ByteBuf implementations. But our ReadableObject does not support positioned read, so we need to implement our own version of slice?

By ReadableObjectQueue do you mean something similar to CompositeByteBuf or CoalescingBufferQueue? We would certainly need the former to support the latter.

@louiscryan louiscryan commented on an outdated diff Jul 16, 2015
buffer/src/main/java/io/netty/buffer/ReadableObject.java
+ *
+ * @throws IndexOutOfBoundsException
+ * if {@code length} is greater than {@link #readableBytes()}
+ */
+ T readSlice(long length);
+
+ /**
+ * Return the underlying {@link ReadableObject} instance if this is a wrapper around another
+ * object (e.g. a slice of another object).
+ *
+ * @return {@code null} if this is not a wrapper
+ */
+ T unwrap();
+
+ /**
+ * Transfers this object's data to the specified stream starting at the
@louiscryan
louiscryan Jul 16, 2015 Contributor

s/stream/channel

@Scottmitch
Member

By ReadableObjectQueue do you mean something similar to CompositeByteBuf or CoalescingBufferQueue? We would certainly need the former to support the latter.

I agree this is direction seems natural to mimic what ByteBuf has to offer #3928 (comment).

@nmittler
Member

@Scottmitch @Apache9 @normanmaurer @louiscryan PTAL. I've added a CompositeReadableObject that is similar to CompositeByteBuf, but (hopefully) does a better job at indexing/reference counting WRT slices.

@Scottmitch Scottmitch and 1 other commented on an outdated diff Jul 16, 2015
buffer/src/main/java/io/netty/buffer/ReadableObject.java
+
+ /**
+ * Gets the read position limit (exclusive). The value of {@code readerPosition} must always
+ * be less than or equal to this limit.
+ */
+ long readerLimit();
+
+ /**
+ * Returns {@code true} if and only if {@link #readableBytes()} > {@code 0}.
+ */
+ boolean isReadable();
+
+ /**
+ * Returns {@code true} if and only if this buffer contains equal to or more than the specified number of elements.
+ */
+ boolean isReadable(int size);
@Scottmitch
Scottmitch Jul 16, 2015 Member

@nmittler - int vs long. Should we use long here to be consistent with the rest of the methods in this interface?

@Scottmitch Scottmitch and 1 other commented on an outdated diff Jul 16, 2015
buffer/src/main/java/io/netty/buffer/ReadableObject.java
+ * Returns current read position in the object.
+ */
+ long readerPosition();
+
+ /**
+ * Sets the {@code readerPosition} of this object.
+ *
+ * @throws IndexOutOfBoundsException
+ * if the specified {@code readerPosition} is
+ * less than {@code 0} or
+ * greater than {@link #readerLimit()}.
+ */
+ ReadableObject readerPosition(long readerPosition);
+
+ /**
+ * Gets the read position limit (exclusive). The value of {@code readerPosition} must always
@Scottmitch
Scottmitch Jul 16, 2015 Member

{@code readerPosition} -> {@link #readerPosition()}

@Scottmitch Scottmitch and 1 other commented on an outdated diff Jul 16, 2015
buffer/src/main/java/io/netty/buffer/ReadableObject.java
+import java.nio.channels.WritableByteChannel;
+
+/**
+ * An object that contains a readable region. For simplicity, extends {@link ReferenceCounted}.
+ */
+public interface ReadableObject extends ReferenceCounted {
+ /**
+ * Returns current read position in the object.
+ */
+ long readerPosition();
+
+ /**
+ * Sets the {@code readerPosition} of this object.
+ *
+ * @throws IndexOutOfBoundsException
+ * if the specified {@code readerPosition} is
@Scottmitch
Scottmitch Jul 16, 2015 Member

nit: consider compressing on 1 line. or use html tags to format if you want some specific format.

@Scottmitch Scottmitch and 1 other commented on an outdated diff Jul 16, 2015
buffer/src/main/java/io/netty/buffer/ReadableObject.java
+ /**
+ * Returns {@code true} if and only if this buffer contains equal to or more than the specified number of elements.
+ */
+ boolean isReadable(int size);
+
+ /**
+ * Returns the number of readable bytes in this object.
+ */
+ long readableBytes();
+
+ /**
+ * Increases the current {@code readerPosition} by the specified
+ * {@code length} in this buffer.
+ *
+ * @throws IndexOutOfBoundsException
+ * if {@code length} is greater than {@link #readableBytes()}
@Scottmitch
Scottmitch Jul 16, 2015 Member

nit: will this fit on 1 line?

@Scottmitch Scottmitch commented on an outdated diff Jul 16, 2015
buffer/src/main/java/io/netty/buffer/ReadableObject.java
+ */
+ long readableBytes();
+
+ /**
+ * Increases the current {@code readerPosition} by the specified
+ * {@code length} in this buffer.
+ *
+ * @throws IndexOutOfBoundsException
+ * if {@code length} is greater than {@link #readableBytes()}
+ */
+ ReadableObject skipBytes(long length);
+
+ /**
+ * Returns a slice of this object's readable region. This method is
+ * identical to {@code r.slice(r.readerPosition(), r.readableBytes())}.
+ * This method does not modify {@code readerPosition}.
@Scottmitch
Scottmitch Jul 16, 2015 Member

{@code readerPosition} -> {@link #readerPosition()}

@Scottmitch Scottmitch and 1 other commented on an outdated diff Jul 16, 2015
buffer/src/main/java/io/netty/buffer/ReadableObject.java
+ * <p>
+ * Also be aware that this method will NOT call {@link #retain()} and so the
+ * reference count will NOT be increased.
+ */
+ ReadableObject slice();
+
+ /**
+ * Returns a slice of this object's sub-region. This method does not modify
+ * {@code readerPosition}.
+ * <p>
+ * Also be aware that this method will NOT call {@link #retain()} and so the
+ * reference count will NOT be increased.
+ *
+ * @param position the starting position for the slice.
+ *
+ * @param length the size of the new slice relative to {@code position}.
@Scottmitch
Scottmitch Jul 16, 2015 Member

{@code position} -> {@link #readerPosition()}?

@Scottmitch Scottmitch and 1 other commented on an outdated diff Jul 16, 2015
buffer/src/main/java/io/netty/buffer/ReadableObject.java
+ */
+ ReadableObject skipBytes(long length);
+
+ /**
+ * Returns a slice of this object's readable region. This method is
+ * identical to {@code r.slice(r.readerPosition(), r.readableBytes())}.
+ * This method does not modify {@code readerPosition}.
+ * <p>
+ * Also be aware that this method will NOT call {@link #retain()} and so the
+ * reference count will NOT be increased.
+ */
+ ReadableObject slice();
+
+ /**
+ * Returns a slice of this object's sub-region. This method does not modify
+ * {@code readerPosition}.
@Scottmitch
Scottmitch Jul 16, 2015 Member

{@code readerPosition} -> {@link #readerPosition()}

@Scottmitch Scottmitch and 1 other commented on an outdated diff Jul 16, 2015
buffer/src/main/java/io/netty/buffer/ReadableObject.java
+ * <p>
+ * Also be aware that this method will NOT call {@link #retain()} and so the
+ * reference count will NOT be increased.
+ *
+ * @param position the starting position for the slice.
+ *
+ * @param length the size of the new slice relative to {@code position}.
+ *
+ * @throws IndexOutOfBoundsException
+ * if any part of the requested region falls outside of the currently readable region.
+ */
+ ReadableObject slice(long position, long length);
+
+ /**
+ * Returns a new slice of this object's sub-region starting at the current
+ * {@link #readerPosition()} and increases the {@code readerPosition} by the size
@Scottmitch
Scottmitch Jul 16, 2015 Member

{@code readerPosition} -> {@link #readerPosition()}

@Scottmitch Scottmitch and 1 other commented on an outdated diff Jul 16, 2015
buffer/src/main/java/io/netty/buffer/ReadableObject.java
+ ReadableObject slice(long position, long length);
+
+ /**
+ * Returns a new slice of this object's sub-region starting at the current
+ * {@link #readerPosition()} and increases the {@code readerPosition} by the size
+ * of the new slice (= {@code length}).
+ * <p>
+ * Also be aware that this method will NOT call {@link #retain()} and so the
+ * reference count will NOT be increased.
+ *
+ * @param length the size of the new slice
+ *
+ * @return the newly created slice
+ *
+ * @throws IndexOutOfBoundsException
+ * if {@code length} is greater than {@link #readableBytes()}
@Scottmitch
Scottmitch Jul 16, 2015 Member

nit: can this fit on 1 line?

@Scottmitch Scottmitch and 1 other commented on an outdated diff Jul 16, 2015
buffer/src/main/java/io/netty/buffer/ReadableObject.java
+ * reference count will NOT be increased.
+ *
+ * @param length the size of the new slice
+ *
+ * @return the newly created slice
+ *
+ * @throws IndexOutOfBoundsException
+ * if {@code length} is greater than {@link #readableBytes()}
+ */
+ ReadableObject readSlice(long length);
+
+ /**
+ * Return the underlying {@link ReadableObject} instance if this is a wrapper around another
+ * object (e.g. a slice of another object).
+ *
+ * @return {@code null} if this is not a wrapper
@Scottmitch
Scottmitch Jul 16, 2015 Member

consider just combining this with the previous statement. Looks strange by itself.

@Scottmitch Scottmitch and 1 other commented on an outdated diff Jul 16, 2015
buffer/src/main/java/io/netty/buffer/ReadableObject.java
+ *
+ * @throws IndexOutOfBoundsException
+ * if {@code length} is greater than {@link #readableBytes()}
+ */
+ ReadableObject readSlice(long length);
+
+ /**
+ * Return the underlying {@link ReadableObject} instance if this is a wrapper around another
+ * object (e.g. a slice of another object).
+ *
+ * @return {@code null} if this is not a wrapper
+ */
+ ReadableObject unwrap();
+
+ /**
+ * Discards the bytes between position 0 and {@code readerPosition}.
@Scottmitch
Scottmitch Jul 16, 2015 Member

{@code readerPosition} -> {@link #readerPosition()}

@Scottmitch Scottmitch and 1 other commented on an outdated diff Jul 16, 2015
buffer/src/main/java/io/netty/buffer/ReadableObject.java
+ * <p>
+ * Please refer to the class documentation for more detailed explanation.
+ */
+ ReadableObject discardReadBytes();
+
+ /**
+ * Similar to {@link #discardReadBytes()} except that this method might discard
+ * some, all, or none of read bytes depending on its internal implementation to reduce
+ * overall memory bandwidth consumption at the cost of potentially additional memory
+ * consumption.
+ */
+ ReadableObject discardSomeReadBytes();
+
+ /**
+ * Transfers this object's data to the specified stream starting at the
+ * given {@code readerPosition}. Does not change the {@code readerPosition}.
@Scottmitch
Scottmitch Jul 16, 2015 Member

{@code readerPosition} -> {@link #readerPosition()}

@Scottmitch Scottmitch and 1 other commented on an outdated diff Jul 16, 2015
buffer/src/main/java/io/netty/buffer/ReadableObject.java
+ */
+ ReadableObject discardReadBytes();
+
+ /**
+ * Similar to {@link #discardReadBytes()} except that this method might discard
+ * some, all, or none of read bytes depending on its internal implementation to reduce
+ * overall memory bandwidth consumption at the cost of potentially additional memory
+ * consumption.
+ */
+ ReadableObject discardSomeReadBytes();
+
+ /**
+ * Transfers this object's data to the specified stream starting at the
+ * given {@code readerPosition}. Does not change the {@code readerPosition}.
+ *
+ * @param pos the starting position for the transfer. This must be < {@link #readerLimit()}.
@Scottmitch
Scottmitch Jul 16, 2015 Member

consider putting < in a {@code } block or escaping it somehow....may not format well in HTML land.

@Scottmitch Scottmitch and 1 other commented on an outdated diff Jul 16, 2015
buffer/src/main/java/io/netty/buffer/ReadableObject.java
+ * overall memory bandwidth consumption at the cost of potentially additional memory
+ * consumption.
+ */
+ ReadableObject discardSomeReadBytes();
+
+ /**
+ * Transfers this object's data to the specified stream starting at the
+ * given {@code readerPosition}. Does not change the {@code readerPosition}.
+ *
+ * @param pos the starting position for the transfer. This must be < {@link #readerLimit()}.
+ * @param length the maximum number of bytes to transfer
+ *
+ * @return the actual number of bytes written out to the specified channel
+ *
+ * @throws IndexOutOfBoundsException
+ * if {@code length} is greater than {@code this.readableBytes}
@Scottmitch
Scottmitch Jul 16, 2015 Member

nit: can this fit on 1 line?

@Scottmitch
Scottmitch Jul 16, 2015 Member

{@code this.readableBytes} -> {@link #readerPosition()}

@Scottmitch
Scottmitch Jul 16, 2015 Member

Actually you may want to use {@link #isReadable(long)} here.

@nmittler
nmittler Jul 17, 2015 Member

I think this is easier to express using {@link #readerLimit}. Done.

@Scottmitch
Scottmitch Jul 17, 2015 Member

@nmittler - On second thought...the statement should also be sensitive to pos too right?

@Scottmitch Scottmitch and 1 other commented on an outdated diff Jul 16, 2015
buffer/src/main/java/io/netty/buffer/ReadableObject.java
+ */
+ ReadableObject discardSomeReadBytes();
+
+ /**
+ * Transfers this object's data to the specified stream starting at the
+ * given {@code readerPosition}. Does not change the {@code readerPosition}.
+ *
+ * @param pos the starting position for the transfer. This must be < {@link #readerLimit()}.
+ * @param length the maximum number of bytes to transfer
+ *
+ * @return the actual number of bytes written out to the specified channel
+ *
+ * @throws IndexOutOfBoundsException
+ * if {@code length} is greater than {@code this.readableBytes}
+ * @throws IOException
+ * if the specified channel threw an exception during I/O
@Scottmitch
Scottmitch Jul 16, 2015 Member

nit: can this fit on 1 line?

@Scottmitch Scottmitch and 1 other commented on an outdated diff Jul 16, 2015
...main/java/io/netty/buffer/AbstractReadableObject.java
+import java.io.IOException;
+import java.nio.channels.WritableByteChannel;
+
+/**
+ * A skeletal implementation of a {@link ReadableObject}.
+ */
+public abstract class AbstractReadableObject implements ReadableObject {
+ private static final ResourceLeakDetector<ReadableObject> leakDetector =
+ new ResourceLeakDetector<ReadableObject>(ReadableObject.class);
+
+ private final ResourceLeak leak;
+ private long readerPosition;
+
+ public AbstractReadableObject(long readerPosition) {
+ this.readerPosition = readerPosition;
+ leak = leakDetector.open(this);
@Scottmitch
Scottmitch Jul 16, 2015 Member

I guess if we decide to merge this is we would eventually use a similar strategy that is used by ByteBuf, AbstractByteBufAllocator, and CompositeByteBuf such that every object is not required to have this ResourceLeak object (unless leak detection is turned on).

@nmittler
nmittler Jul 17, 2015 Member

I haven't dug into how this detector works exactly ... I was under the assumption that leakDetector.open() would return null if leak detection was not enabled. Is that not the case?

@Scottmitch
Scottmitch Jul 17, 2015 Member

@nmittler - That is how leakDetector works. However the ByteBuf hierarchy is able to keep the leak reference outside of the classes which implement ByteBuf functionality, and thus don't require an extra member for every instance. This also helps partition the ResourceLeak logic away from the actual ByteBuf logic.

@nmittler
nmittler Jul 21, 2015 Member

@Scottmitch I believe CompositeByteBuf has it, which is how it showed up here :)

Do we just say that's a "bug" in CompositeByteBuf and just fix it here?

@Scottmitch
Scottmitch Jul 21, 2015 Member

@nmittler - I don't see any reason why CompositeByteBuf should behave differently then the other ByteBufs in this regard. However I think this is a separate issue....shouldn't we model AbstractReadableObject after the ByteBuf model instead of the CompositeByteBuf?

@nmittler
nmittler Jul 21, 2015 Member

@Scottmitch absolutely ... I just wanted to point out where this came from in case there was a legitimate reason that CompositeByteBuf is different ... sounds like it's not. I'll remove the leak detection code.

@Scottmitch
Scottmitch Jul 22, 2015 Member

@nmittler - Ok sounds like we are on the same page. I opened #4017 to get input from others and address as a followup if necessary.

@nmittler
nmittler Jul 22, 2015 Member

@Scottmitch sgtm ... thanks!

@Scottmitch Scottmitch and 1 other commented on an outdated diff Jul 16, 2015
...main/java/io/netty/buffer/AbstractReadableObject.java
+ }
+ if (length == 0) {
+ return 0L;
+ }
+
+ return readTo0(target, pos, length);
+ }
+
+ @Override
+ public final ReadableObject slice() {
+ return slice(readerPosition, readableBytes());
+ }
+
+ @Override
+ public ReadableObject slice(long position, long length) {
+ ensureReadable(length);
@Scottmitch
Scottmitch Jul 16, 2015 Member

Is this necessary? Shouldn't we just cover this in the next if statement? We should be considering the position parameter when checking validity too.

Just check if length < 0 in the next if statement?

@Scottmitch Scottmitch and 1 other commented on an outdated diff Jul 16, 2015
...src/main/java/io/netty/channel/DefaultFileRegion.java
}
@Override
- public long transfered() {
- return transfered;
+ public ReadableObject slice(long position, long length) {
+ ensureReadable(length);
@Scottmitch
Scottmitch Jul 16, 2015 Member

Is this necessary? Shouldn't we just cover this in the next if statement? We should be considering the position parameter when checking validity too.

Just check if length < 0 in the next if statement?

@Scottmitch Scottmitch and 1 other commented on an outdated diff Jul 16, 2015
.../io/netty/buffer/AbstractCompositeReadableObject.java
@@ -0,0 +1,192 @@
+/*
+ * Copyright 2012 The Netty Project
@Scottmitch
Scottmitch Jul 16, 2015 Member

2012 -> 2015.

@Scottmitch Scottmitch and 1 other commented on an outdated diff Jul 16, 2015
.../io/netty/buffer/AbstractCompositeReadableObject.java
+
+ @Override
+ public final long readerLimit() {
+ return readerLimit;
+ }
+
+ /**
+ * Discards all read bytes. Calls the {@link #removeComponent0()} to perform the removal of
+ * any fully read component.
+ */
+ @Override
+ @SuppressWarnings("unchecked")
+ public ReadableObject discardReadBytes() {
+ // Discard all fully read components and save the offset of the reader position.
+ long offset = readerPosition();
+ while (!components.isEmpty()) {
@Scottmitch
Scottmitch Jul 16, 2015 Member

consider restructuring to just use return value of peek() to detect empty.

for (;;) {
  Component c = components.peek();
  if (c == null) {
    break;
  }
  ...
@Apache9
Contributor
Apache9 commented Jul 17, 2015

Do we really need to support all features here? For example, if we aim to unify ByteBuf and FileRegion or the some sort of combination of them when writing them out, then only the readSlice method is needed? It does not make sense to slice a ReadableObject at the middle or tail?

So my concern is we may implement lots of features that we do not use it right now. What usually happen is the code will never be used, or when someone wants to use it he finds that it does not fit and write the code again...

Thanks.

@nmittler
Member

@Scottmitch @normanmaurer any thoughts on @Apache9's concern in #3969 (comment)?

@Scottmitch
Member

@nmittler - @Apache9 brings up some good points. I have no problem keeping interfaces minimal. We could always push more of the "common" stuff down to this "lower" layer as it is needed. I am still a bit unsure about ByteBuf exposing long and int methods for indexing though. If it wasn't for this I would say we could push more of the common stuff into the lower layers without as much concern.

@nmittler
Member

@Scottmitch @Apache9 fair enough, thanks! Do you guys want to list the methods that we should remove? ... then I can see if the changes are feasible for the current PR.

@nmittler
Member

@Scottmitch I think I've addressed your previous comments.

I'll take a look at reducing the interface shortly.

@Scottmitch
Member

Do you guys want to list the methods that we should remove?

@nmittler - My assumption was this PR would be focused on allowing HTTP/2 to use ReadableObject instead of ByteBuf. If we just support the methods we need for http/2 then I guess I would consider that "minimal". Of-course if there are supporting methods that are needed for this process that may be fine too. I didn't have any specific methods to call out (and not saying what we have now isn't minimal already)...more just agreeing with the concept that we should focus on accomplishing this task and worry less about pulling all the commonalities between ByteBuf and FileRegion into ReadableObject.

@nmittler nmittler was assigned by Scottmitch Jul 17, 2015
@Scottmitch Scottmitch added this to the 4.1.0.Beta6 milestone Jul 17, 2015
@Apache9
Contributor
Apache9 commented Jul 18, 2015

@Scottmitch @nmittler Agree, we could try to integrate this into HTTP/2 codec first and then check if there are some features that we do not need to support. Thanks.

@nmittler
Member

@Scottmitch @Apache9 sounds good guys!

@normanmaurer Before I go any further on this, I'd like to get some buy in from you regarding whether you'll accept such an interface change. Do you want to take another look at this and discuss?

@normanmaurer
Member

@nmittler honestly I'm still not sure I like it... @trustin WDYT ?

@nmittler
Member

@normanmaurer what is it specifically that you don't like? Should I rename methods to avoid collision with ByteBuf?

@nmittler
Member
@nmittler
Member

@normanmaurer gentle ping ... could you clarify what your concern is here? Do you not like the idea of having a common interface between ByteBuf and FileRegion? Or do you just not like the interface proposed?

@Scottmitch
Member

@nmittler - Will take a look next week :) We should discuss with @normanmaurer to clarify concerns too.

@normanmaurer
Member

@nmittler sorry I totally missed this, I will pay the ๐Ÿบ next time ;)

Maybe its just me but what I dislike is that it kind of give the "impression" that a ByteBuf can contain more then Integer.MAX_VALUE bytes.

@nmittler
Member

@normanmaurer ah ok, that makes sense. I guess the way I was thinking of it is that the ReadableObject interface affords very large values, subject to the restrictions of the implementation. Unfortunately, the interface has to use long if we want to be able to generically handle ByteBuf and FileRegion.

With that said, I was specifically avoiding the term index in the ReadableObject API, as that would be the term used in the ByteBuf interface. So ByteBuf would have long getReaderPosition() and int getReaderIndex() which would return the same value.

@nmittler
Member

@normanmaurer any thoughts on #3969 (comment)?

@trustin could you take a look as well?

@Scottmitch
Member

@nmittler - Sorry for slacking...will look again soon. However @normanmaurer @trustin is this something you are against for now?

@trustin
Member
trustin commented Jul 30, 2015

Wouldn't it make JVM insert lots of safepoints in the loop if we use long as an index variable type? Practically we could use long as an index variable type everywhere if JVM was doing a good job and we are free to break our API compatibility, but ... I'm not sure it's worth it.

@Apache9
Contributor
Apache9 commented Jul 30, 2015

After googling(sorry I'm not an expert of JVM...), in this article

http://blog.ragozin.info/2012/10/safepoints-in-hotspot-jvm.html

For compiled code, JIT inserts safepoint checks in code at certain points (usually, after return from calls or at back jump of loop).

So I think the overhead of safepoint is related to how many times we run the loop, no matter the index variable is long or int?(correct me if I'm wrong)

And I think the start point of this issue is to simplify the implementation of FileRegion support and DATA frame coalescing in HTTP/2, especially if we want to combine a FileRegion and a ByteBuf.

Thanks.

@Scottmitch
Member

Something that may also be relevant in this context is there is also no "increment long" byte code instruction (int has iinc). May be worth benchmarking to see if there is any difference between int/long iteration with a few jvms/versions.

@normanmaurer
Member

Yeah I think we need further testing here

Am 30.07.2015 um 22:38 schrieb Scott Mitchell notifications@github.com:

Something that may also be relevant in this context is there is also no "increment long" byte code instruction (int has iinc). May be worth benchmarking to see if there is any difference between int/long iteration with a few jvms/versions.

โ€”
Reply to this email directly or view it on GitHub.

@nmittler
Member

@normanmaurer do you have benchmarks that I should run? If not, what sort of benchmark do you think would be sufficient? Just iterating over int vs long?

@normanmaurer
Member

@nmittler yep exactly this (iterating)

@nmittler
Member

@normanmaurer @Scottmitch @trustin @Apache9

I threw together a simple benchmark for iterating across int vs long. Here are my results:

Environment
OS Ubuntu 14.04 LTS (Trusty Tahr)
Java Oracle jdk1.8.0_25
JMH 1.7.1
VM options -server -dsa -da -ea:io.netty... -XX:+AggressiveOpts -XX:+UseBiasedLocking -XX:+UseFastAccessorMethods -XX:+OptimizeStringConcat -XX:+HeapDumpOnOutOfMemoryError -Dio.netty.noResourceLeakDetection -Xms768m -Xmx768m -XX:MaxDirectMemorySize=768m -Dharness.executor=CUSTOM -Dharness.executor.class=AbstractMicrobenchmark$HarnessExecutor
Benchmark                     (size)   Mode  Cnt        Score       Error  Units
LongVsIntIteration.intIter       100  thrpt   50  3065970.866 ยฑ 81440.988  ops/s
LongVsIntIteration.intIter      1000  thrpt   50   328747.993 ยฑ 11644.994  ops/s
LongVsIntIteration.intIter    100000  thrpt   50     3300.834 ยฑ    37.026  ops/s
LongVsIntIteration.intIter   1000000  thrpt   50      333.858 ยฑ     1.722  ops/s
LongVsIntIteration.longIter      100  thrpt   50  2837587.995 ยฑ 43150.368  ops/s
LongVsIntIteration.longIter     1000  thrpt   50   321867.851 ยฑ 10719.184  ops/s
LongVsIntIteration.longIter   100000  thrpt   50     3452.820 ยฑ   117.789  ops/s
LongVsIntIteration.longIter  1000000  thrpt   50      332.586 ยฑ     1.898  ops/s

Generally there doesn't seem to be much of a performance difference. For the small case (size=100), both int and long have fairly noisy results and it's unclear if there is actually a performance difference. For the other array sizes, the results are nearly identical.

Here is the code, you may want to take a look to see if there are places where JVM optimizations may be playing a role in the results:

@State(Scope.Benchmark)
@Warmup(iterations = 10)
@Measurement(iterations = 25)
public class LongVsIntIteration extends AbstractMicrobenchmark {

  @Param({ "100", "1000", "100000", "1000000" })
  public int size;

  private class DataSet {
    private final int[] data = new Random().ints(1000000).toArray();

    int intSize() {
      return size;
    }

    int intGet(int index) {
      return data[index];
    }

    long longSize() {
      return size;
    }

    int longGet(long index) {
      return data[(int) index];
    }
  }

  private DataSet dataset = new DataSet();

  @Benchmark
  public void intIter(Blackhole bh) {
    for(int index = 0; index < dataset.intSize(); ++index) {
      bh.consume(dataset.intGet(index));
    }
  }

  @Benchmark
  public void longIter(Blackhole bh) {
    for(long index = 0; index < dataset.longSize(); ++index) {
      bh.consume(dataset.longGet(index));
    }
  }
}
@Scottmitch
Member

@nmittler -

I re-ran results on my end (also modified longIter to use all long types for the loop boundaries) and got similar results. See the code and results below.

/*
 * Copyright 2015 The Netty Project
 *
 * The Netty Project licenses this file to you under the Apache License,
 * version 2.0 (the "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at:
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
 * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
 * License for the specific language governing permissions and limitations
 * under the License.
 */
package io.netty.microbenchmark.common;

import io.netty.microbench.util.AbstractMicrobenchmark;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

import java.util.Random;

@State(Scope.Benchmark)
@Warmup(iterations = 5)
@Measurement(iterations = 10)
public class LongVsIntIteration extends AbstractMicrobenchmark {

  @Param({ "100", "1000", "100000", "1000000" })
  public int size;

  private DataSet dataset;

  private static final class DataSet {
    private final int[] data;

    public DataSet(int size) {
        Random r = new Random();
        data = new int[size];
        for (int i = 0; i < size; ++i) {
            data[i] = r.nextInt();
        }
    }

    int intSize() {
      return data.length;
    }

    int intGet(int index) {
      return data[index];
    }

    long longSize() {
      return data.length;
    }

    int longGet(long index) {
      return data[(int) index];
    }
  }

  @Setup(Level.Trial)
  public void setup() {
      dataset = new DataSet(size);
  }

  @Benchmark
  public void intIter(Blackhole bh) {
    for (int index = 0; index < dataset.intSize(); ++index) {
      bh.consume(dataset.intGet(index));
    }
  }

  @Benchmark
  public void longIter(Blackhole bh) {
    for (long index = 0; index < dataset.longSize(); ++index) {
      bh.consume(dataset.longGet(index));
    }
  }
}
Benchmark                     (size)   Mode  Cnt        Score        Error  Units
LongVsIntIteration.intIter       100  thrpt   20  3511883.837 ยฑ 193853.473  ops/s
LongVsIntIteration.intIter      1000  thrpt   20   373725.867 ยฑ   4357.782  ops/s
LongVsIntIteration.intIter    100000  thrpt   20     4123.765 ยฑ     63.784  ops/s
LongVsIntIteration.intIter   1000000  thrpt   20      413.567 ยฑ      6.029  ops/s
LongVsIntIteration.longIter      100  thrpt   20  3554900.161 ยฑ  22009.430  ops/s
LongVsIntIteration.longIter     1000  thrpt   20   349472.347 ยฑ  14390.589  ops/s
LongVsIntIteration.longIter   100000  thrpt   20     4046.519 ยฑ     21.611  ops/s
LongVsIntIteration.longIter  1000000  thrpt   20      401.623 ยฑ      1.720  ops/s
@nmittler
Member

@Scottmitch thanks!

So it would seem that iteration is not a concern with this PR.

@normanmaurer @trustin do you agree? If so, what are the next steps?

@buchgr
Contributor
buchgr commented Aug 17, 2015

I have tested it in the past and it's true that the JIT will insert a safe point check when you switch your index from int to long. A safe point check is basically a load from a fixed memory location. I don't know how exactly this works (maybe the page is usually cached in TLB (fast path) and when then set to no read, it's removed and so that on next load kernel logic has to be invoked?). Anyways, this is for sure just so very very cheap (think: single digit cycles) as it's done all the time. So I don't think you will be able to measure this with this kind of benchmark, as any other noise on the system will far outweigh the overhead of a safepoint check.

Anyways, unless your loop is super basic, then in practice you will end up doing some logic in your loop that comes with a safepoint anyway (like calling a function) and so I would argue that this is something so micro and VM + OS + hardware dependent, that a Java dev should not worry about it.

Just my two cents with lots of half knowledge ;-)

@trustin
Member
trustin commented Aug 18, 2015

Summoning @nitsanw on his thoughts on safepoint polling and long loop variable :-)

@nitsanw
Contributor
nitsanw commented Aug 19, 2015

@trustin Ok :)
Safepoints are inserted on the backedges of non-counted loops, where counted means the loop is a for like control structure with an index that is smaller or equal to int.
The safepoint cost is very low and only very simple loops will suffer a noticeable overhead from the safepoint cost alone. Sadly, safepoints have requirements which may block other optimizations from happening. In particular loop unrolling, superword optimization and arrays fill optimization may all stop happening for your code should you switch the loop index to long.
The issue with the benchmarks above is that the use of the Blackhole enforces a pretty big overhead on the loop cost estimation, leaving the results very similar as your loop cost is dominated by the BH call. Blackhole.consume is forced not to inline and contains a volatile load which will also inhibit the sort of optimizations one can hope to see here.
So here's a different benchmark, demonstrating some of the phenomena you might see:

    import java.util.Arrays;
    import java.util.Random;
    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Level;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Param;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.Setup;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Thread)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public class ArrayWrapperInterfaceBenchmark {
      @Param({ "100", "1000", "10000" })
      public int size;
      private DataSet datasetA;
      private DataSet datasetB;

      private static final class DataSet {
        private final int[] data;

        public DataSet(DataSet ds) {
          this.data = Arrays.copyOf(ds.data, ds.data.length);
        }

        public DataSet(int size) {
          Random r = new Random();
          data = new int[size];
          for (int i = 0; i < size; ++i) {
            data[i] = r.nextInt();
          }
        }

        int intSize() {
          return data.length;
        }

        int intGet(int index) {
          return data[index];
        }
        void intSet(int index, int v) {
          data[index] = v;
        }
        long longSize() {
          return data.length;
        }

        int longGet(long index) {
          return data[(int) index];
        }
        void longSet(long index, int v) {
          data[(int) index] = v;
        }
      }

      @Setup(Level.Trial)
      public void setup() {
        datasetA = new DataSet(size);
        datasetB = new DataSet(datasetA);
      }

      @Benchmark
      public int sumInt() {
        int sum = 0;
        for (int index = 0; index < datasetA.intSize(); ++index) {
          sum += datasetA.intGet(index);
        }
        return sum;
      }

      @Benchmark
      public int sumLong() {
        int sum = 0;
        for (long index = 0; index < datasetA.longSize(); ++index) {
          sum += datasetA.longGet(index);
        }
        return sum;
      }

      @Benchmark
      public boolean equalsInt() {
        for (int index = 0; index < datasetA.intSize(); ++index) {
          if(datasetA.intGet(index) != datasetB.intGet(index))
            return false;
        }
        return true;
      }
      @Benchmark
      public boolean equalsLong() {
        for (long index = 0; index < datasetA.longSize(); ++index) {
          if(datasetA.longGet(index) != datasetB.longGet(index))
            return false;
        }
        return true;
      }

      @Benchmark
      public void fillInt() {
        for (int index = 0; index < datasetA.intSize(); ++index) {
          datasetA.intSet(index, size);
        }
      }

      @Benchmark
      public void fillLong() {
        for (long index = 0; index < datasetA.longSize(); ++index) {
          datasetA.longSet(index, size);
        }
      }
      @Benchmark
      public void copyInt() {
        for (int index = 0; index < datasetA.intSize(); ++index) {
          datasetA.intSet(index, datasetB.intGet(index));
        }
      }

      @Benchmark
      public void copyLong() {
        for (long index = 0; index < datasetA.longSize(); ++index) {
          datasetA.longSet(index, datasetB.longGet(index));
        }
      }
    }

Here's the results I see:

    Benchmark                                  (size)  Mode  Cnt    Score     Error  Units
    ArrayWrapperInterfaceBenchmark.copyInt       1000  avgt    5   76.943 ยฑ  14.256  ns/op
    ArrayWrapperInterfaceBenchmark.copyLong      1000  avgt    5  796.583 ยฑ  55.583  ns/op
    ArrayWrapperInterfaceBenchmark.equalsInt     1000  avgt    5  367.192 ยฑ  16.398  ns/op
    ArrayWrapperInterfaceBenchmark.equalsLong    1000  avgt    5  806.421 ยฑ 196.106  ns/op
    ArrayWrapperInterfaceBenchmark.fillInt       1000  avgt    5   84.075 ยฑ  19.033  ns/op
    ArrayWrapperInterfaceBenchmark.fillLong      1000  avgt    5  567.866 ยฑ  10.154  ns/op
    ArrayWrapperInterfaceBenchmark.sumInt        1000  avgt    5  338.204 ยฑ  44.529  ns/op
    ArrayWrapperInterfaceBenchmark.sumLong       1000  avgt    5  585.657 ยฑ 105.808  ns/op

The large arrays have similar results attached. If I was going to get all serious about this I would run on a better machine and more iterations etc, but as a sketch this is representative enough.
The difference is quite noticeable, especially when some cool optimization gets to kick ass on the int version but not the long one. This phenomena might go away in future JVMs, but at the present the above is as per my expectations.
Note that when doing nano scale benchmarks it's best to have a look at the assembly (use perfasm) and see what's happening. Setting the benchmark mode to average time (-bm avgt, or use annotations) will help you notice the scale of the benchmark, so I personally prefer to work with it.
e.g. on the copy benchmark we get the following profile hotspot for int:

  0.90%    0.80%    โ”‚โ”‚โ”‚โ”‚ โ”‚โ†—โ”‚โ”‚  0x00007fbcf3f66551: vmovdqu 0x10(%r9,%r10,4),%ymm0
 10.25%   10.74%    โ”‚โ”‚โ”‚โ”‚ โ”‚โ”‚โ”‚โ”‚  0x00007fbcf3f66558: vmovdqu %ymm0,0x10(%r11,%r10,4)  ;*iastore
 73.89%   75.55%    โ”‚โ”‚โ”‚โ”‚ โ”‚โ”‚โ”‚โ”‚  0x00007fbcf3f6655f: add    $0x8,%r10d         ;*iinc
  0.01%    0.01%    โ”‚โ”‚โ”‚โ”‚ โ”‚โ”‚โ”‚โ”‚  0x00007fbcf3f66563: cmp    %r8d,%r10d
                    โ”‚โ”‚โ”‚โ”‚ โ”‚โ•ฐโ”‚โ”‚  0x00007fbcf3f66566: jl     0x00007fbcf3f66551  ;*if_icmpge

For the long copy we get this:

 10.78%   11.48%   โ†—    0x00007f2fc91ff770: mov    %ebp,%r9d          ;*aload_0
 10.73%   11.72%   โ”‚ โ†—  0x00007f2fc91ff773: cmp    %esi,%r9d
                  โ•ญโ”‚ โ”‚  0x00007f2fc91ff776: jae    0x00007f2fc91ff7d6
  8.38%    7.98%  โ”‚โ”‚ โ”‚  0x00007f2fc91ff778: mov    0x10(%rdx,%r9,4),%edi  ;*iaload
 11.56%   10.06%  โ”‚โ”‚ โ”‚  0x00007f2fc91ff77d: cmp    %ecx,%r9d
 10.18%    9.39%  โ”‚โ”‚ โ”‚  0x00007f2fc91ff780: jae    0x00007f2fc91ff809
 10.57%   10.17%  โ”‚โ”‚ โ”‚  0x00007f2fc91ff786: mov    %edi,0x10(%r11,%r9,4)  ;*goto
 13.68%   14.68%  โ”‚โ”‚ โ”‚  0x00007f2fc91ff78b: add    $0x1,%rbp          ; OopMap{r10=NarrowOop r11=NarrowOop r8=Oop rbx=Oop rdx=NarrowOop rax=Oop off=271}
  9.16%    9.09%  โ”‚โ”‚ โ”‚  0x00007f2fc91ff78f: test   %eax,0x17f0886b(%rip)        # 0x00007f2fe1108000
 10.67%   11.70%  โ”‚โ”‚ โ”‚  0x00007f2fc91ff795: cmp    %r13,%rbp

So for ints we got vmovdqu copying 256bit at a time via the YMM registers, that's 32 elements copied at a time, all out SIMD awesomeness. For long we got one element copied at a time + a safepoint in the loop. Very sad :(.

A further consideration is the cost on 32bit systems, but I've run out of steam...

@buchgr
Contributor
buchgr commented Aug 19, 2015

@nitsanw thank you very much !

Am I right concluding that based on this we should benchmark real world use cases where we loop over ByteBuf's then? given that there seem to be a lot of things that kill any such optimization one gains from int over long?

@Scottmitch
Member

@nitsanw - Great explanation / breakdown!

@nitsanw
Contributor
nitsanw commented Aug 19, 2015

@buchgr benchmarking relevant usages is the way forward. Not sure about many things killing optimizations though. You can have prety complex data processing going in a loop without inhibiting optimizations.

@buchgr
Contributor
buchgr commented Aug 19, 2015

@nitsanw so your recommendation / feeling is to stick with int?

'cause of your comment

Sadly, safepoints have requirements which may block other optimizations from happening.

So I (prematurely?) concluded that if i.e. I call a function that doesn't get inlined within my loop it will come with a safepoint and thus not allow those optimizations to happen?

@nitsanw
Contributor
nitsanw commented Aug 19, 2015

If the work in the loop is non-trivial than the loop construct overhead will become less significant. Assuming the work is ALWAYS insignificant seems a hard decision to make, given I'm not familiar with the API we're trying to replace and it's usage. I read from the ticket (correct me if I'm wrong) that this is a ByteBuffer like container, in which case the element we take out of the container is trivial (a byte?). How much work will a single element trigger? is it reasonable to assume most code will have relatively small loop bodies when iterating over this kind of container?
My gut feeling is that this is not a good idea from a performance perspective, but performance isn't everything and I may be completely misjudging the usage.

@buchgr
Contributor
buchgr commented Aug 19, 2015

@nitsanw makes sense. thank you!

@nmittler @Scottmitch @trustin @normanmaurer @Apache9
Interestingly in e.g. ByteBuf.forEachByte(...) we are using ... a while loop :-). Also I guess stuff like forEachByte could continue to use int indices.

@nitsanw
Contributor
nitsanw commented Aug 19, 2015

A while loop can still be interpreted by the compiler as a for loop control structure, as long as it's simple enough. Worth checking.
I assume you are referring to this:

        do {
            if (processor.process(_getByte(i))) {
                i ++;
            } else {
                return i;
            }
        } while (i < endIndex);

Which is the same as:

int i=index;
boolean allIsAwesome = true;
for(;i<endIndex && allIsAwesome;i++)
   allIsAwesome = processor.process(_getByte(i));
return i;

https://github.com/netty/netty/blob/ed4a89082bb29b9e7d869c5d25d6b9ea8fc9d25b/buffer/src/main/java/io/netty/buffer/AbstractByteBuf.java#L982

@buchgr
Contributor
buchgr commented Aug 19, 2015

@nitsanw yes I see, but I understand one has to check these things on a case-by-case basis, correct?

@Apache9
Contributor
Apache9 commented Aug 20, 2015

@nitsanw Great analysis!
So for some reason, JIT does not unroll a loop which uses a long index or unroll the loop with safepoint as part of loop body that prevents further optimization such as vectorization?

@nmittler I think we could review the patch to verify if our long-index loops suffer from this? I skimmed the patch again and didn't find any simple long-index loops... Maybe something in CompositeReadableObject? Could we write a benchmark for it?

Thanks.

@nmittler
Member

@nitsanw Awesome explanation ... very helpful!

@normanmaurer @Scottmitch @Apache9 @trustin Perhaps the best path forward is to use a more realistic benchmark of aggregate Netty ByteBuf processing where we're hitting some of the actual loops in the pipeline. Does this sound reasonable? Any suggestions?

@nmittler
Member

I was just chatting with @louiscryan offline and we have an idea which may help to get past this. The core part of the idea is that the only part of the system that needs long indices is when we actually go to write the FileRegion. The idea is this:

  1. Change ReadableObject to use int positions.
  2. FileRegion will no longer implement ReadableObject.
  3. Allow FileRegion to be sliced into a sequence of FileRegionSlices, which only hold int-based positions and implement ReadableObject.
  4. PendingWriteQueue will only take ReadableObjects as input, but when data is read out of it, any contiguous FileRegionSlices (for the same FileRegion) will be merged into a newFileRegion. Note that this will only need to happen if a merge is needed (i.e. > 1 mergeableFileRegionSlice`s in a row),

Unfortunately this means that when we go to do the actual write, we'll still need separate logic for handling FileRegion and ReadableObject, but maybe that's just a necessary evil.

What do you guys think?

@buchgr
Contributor
buchgr commented Aug 20, 2015

@nmittler doesn't the ByteBuf.forEachByte visitor pattern allow one to avoid the long iteration anyway, as ByteBuf can just use int there?

@nmittler
Member

@buchgr I'm actually not aware of which places iterate byte-by-byte. I suspect what we're really talking about are the loops in the channels:

AbstractNioByteChannel
AbstractEpollStreamChannel
AbstractOioByteChannel
OioByteStreamChannel.java
NioSocketChannel.java

@normanmaurer are there other places that are of concern here?

@buchgr
Contributor
buchgr commented Aug 20, 2015

@nmittler @normanmaurer hmm I see. Any specific loop in mind? I couldn't find one where we iterate over a Bytebuf?.In the channels afaik we just memcopy the bytes to the socket.

@nmittler
Member

@buchgr yeah that's sort of my point ... I don't see one either.

@normanmaurer any thoughts?

@trustin
Member
trustin commented Aug 21, 2015

SGTM. Will review once you update this PR or open a new one based on the new proposal.

@nmittler
Member

@trustin actually before I re-work this PR, @buchgr and I are asking to see code references where the iteration might be an issue ... we haven't been able to find any (see #3969 (comment) and the comments that follow). Do you or @normanmaurer have any hotspots in mind that might be impacted by the switch to long?

@trustin
Member
trustin commented Aug 22, 2015

@nmittler Ah, I thought it's an orthogonal question.

I'm not sure why forEachByte() is a problem when we continue using int as an index variable. (That's what @nmittler is proposing, right?) Could you elaborate?

@buchgr
Contributor
buchgr commented Aug 22, 2015

@trustin I think the question is: can you think of / point to any loops that iterate over ByteBuf? If not, why worry about safepoints and optimizations in the first place? Why not use long everywhere? Thanks :-).

and yes forEachByte should continue to use int.

@trustin
Member
trustin commented Aug 22, 2015

@buchgr Ah, that's a great question. Byte-by-byte iteration will most likely happen in decoders. A good example is DelimiterBasedFrameDecoder.indexOf(), and header decoding part in the HTTP codec. They currently do not use forEachByte() but they do iterate in byte-by-byte manner.

Also, please note netty-buffer can be used as a generic buffer library, so it's probably not safe to make such a assumption.

@normanmaurer
Member

I agree with @trustin. We can not expect users to not do it.

Am 22.08.2015 um 04:00 schrieb Trustin Lee notifications@github.com:

@buchgr Ah, that's a great question. Byte-by-byte iteration will most likely happen in decoders. A good example is DelimiterBasedFrameDecoder.indexOf(), and header decoding part in the HTTP codec. They currently do not use forEachByte() but they do iterate in byte-by-byte manner.

Also, please note netty-buffer can be used as a generic buffer library, so it's probably not safe to make such a assumption.

โ€”
Reply to this email directly or view it on GitHub.

@Apache9
Contributor
Apache9 commented Aug 22, 2015

AFAIK the new ReadableObject just add some new methods to ByteBuf without changing the existing methods? Maybe the word 'Readable' is a bit confusing. I assume our purpose here is to unify ByteBuf and FileRegion when writing data out, not a generic 'Readable' interface. Am I right? @nmittler @buchgr

@nmittler
Member

@Apache9 that's right ... the assumption is that ByteBufwill retain it's current methods (it will just be changed to support the new interface as well), so I think the iteration issue is not a concern for ByteBuf. It may be the case that ReadableObject isn't the best since its aim is effectively to write to a WritableByteChannel ... I am definitely open to suggestions :)

@nmittler
Member

@trustin @normanmaurer given that ByteBuf will still retain the int based index API that we all know and love, is there still any concern with it also supporting the long API? HTTP/2 support for FileRegion is blocked on this.

@trustin
Member
trustin commented Aug 26, 2015

@nmittler I thought the plan was to use int as an index value type and to give up making FileRegion implement ReadableObject but only making its 'slices' do? Did I miss something?

@Apache9
Contributor
Apache9 commented Aug 26, 2015

@trustin We plan to use ReadableObject when writing to a WritableByteChannel only. Actually ReadableObject should have only two methods: readTo and readSlice, and there maybe other methods such as readerPosition for supporting readSlice operation. Here we do not aim to replace the methods in ByteBuf, we only add some new methods to ByteBuf, and we do not provide byte-by-byte iteration methods in ReadableObject. So I think long index is not a problem here?

Thanks.

@trustin
Member
trustin commented Aug 27, 2015

(Sorry for the long round-trip time. Pretty busy since moved to a new company last March) Ah, I see. Sound fine to me. Please go ahead, then.

@normanmaurer
Member

+1

@nmittler
Member

@trustin @normanmaurer thanks!

Now that we've agreed on the general approach, let me rebase this PR and let's go through a formal review. I'll keep the scope the same as it is ... the API and FileRegion implements it. I'll refactor ByteBuf in a follow-on PR. Sound good?

@trustin
Member
trustin commented Aug 29, 2015

@nmittler SGTM. Please ping us when this PR is ready for a review.

@nmittler
Member
nmittler commented Sep 1, 2015

Will do ... probably won't get to this until after Netty releases beta6 and grpc releases its beta. Should be soon.

@normanmaurer normanmaurer modified the milestone: 4.1.0.Beta6, 4.1.0.RC1 Sep 2, 2015
@normanmaurer normanmaurer modified the milestone: 4.1.0.CR1, 4.1.0.Final Nov 27, 2015
@nitsanw nitsanw referenced this pull request in real-logic/Agrona Dec 10, 2015
@RichardWarburton RichardWarburton make buffers long indexed ac53814
@Scottmitch
Member

@nmittler - FYI i think this got closed as part of the master branch rename

@Apache9
Contributor
Apache9 commented Dec 18, 2015

I found the difficulty here is how to make EpollSocketChannel work with ReadbleObject.
@nmittler Any progress? If not, I could have a try.

Thanks.

@nmittler
Member

@Apache9 sorry, I haven't had any space cycles for this. Feel free to take over!

I think the approach we should go with is to stay with the int-based indices. We can break up FileRegions into chunks of int-based ranges. When we go to write, if we have several chunks for the same region, we can merge them back together before writing.

I think this will play a lot nicer with Netty in general, WDYT?

@normanmaurer normanmaurer modified the milestone: 4.1.0.Final, 4.1.0.CR7 Apr 9, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment