Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

4926314: Optimize Reader.read(CharBuffer) #1915

Open
wants to merge 15 commits into
base: master
from
@@ -190,18 +190,17 @@ public int read(CharBuffer target) throws IOException {
int off = target.arrayOffset() + target.position();
int len = target.remaining();
nread = this.read(cbuf, off, len);
if (nread > 0) {
if (nread > 0)
target.position(target.position() + nread);

This comment has been minimized.

@bplb

bplb Feb 8, 2021
Member

As target is mutable, I think you would do better to change lines 189-194 to something like:

            char cbuf[] = target.array();
            int pos = target.position();
            int rem = target.limit() - pos;
            if (rem <= 0)
                return -1;
            int off = target.arrayOffset() + pos;
            nread = this.read(cbuf, off, rem);
            if (nread > 0)
                target.position(pos + nread);

This comment has been minimized.

@marschall

marschall Feb 9, 2021
Author Contributor

Done

}
} else {
int remaining = target.remaining();
char cbuf[] = new char[Math.min(remaining, TRANSFER_BUFFER_SIZE)];

This comment has been minimized.

@bplb

bplb Feb 8, 2021
Member

As cbuf for the off-heap case is used in a synchronized block, is there the opportunity for some sort of cached array here and would it help?

This comment has been minimized.

@marschall

marschall Feb 9, 2021
Author Contributor

That would be possible. It would help in cases where a large Reader is read into one or several relatively small off-heap CharBuffers, requiring multiple #read calls. This can only be done when the caller is able to work with only a partial input. I don't know how common this case is.

We could re-purpose #skipBuffer, it has the same maximum size (8192) but determined by a different constant (#maxSkipBufferSize instead of #TRANSFER_BUFFER_SIZE). That would likely require it to be renamed and maybe we should even remove #maxSkipBufferSize. We could also do the reallocation and growing similar as is currently done in #skip.

This comment has been minimized.

@bplb

bplb Feb 10, 2021
Member

Perhaps a static final WORK_BUFFER_SIZE could be added with value 8192 and maxSkipBufferSize and TRANSFER_BUFFER_SIZE replaced with that? Then skipBuffer could be renamed to workBuffer and used in both read(CharBuffer) and skip(long). That shouldn't be a problem as both uses are in synchronized blocks. Also I suggest putting the declaration of workBuffer just below that of lock instead of lower down the file where skipBuffer is.

Lastly you mentioned C-style array declarations like char buf[]. As there are only four of these in the file it might be good to just go ahead and change them, I don't think that adds much noise or risk.

This comment has been minimized.

@marschall

marschall Feb 12, 2021
Author Contributor

Done. I left #transferTo(Writer) untouched for now. Firstly it is not already behind a synchronized. Secondly it writes so there is no need for repeated calls.

This comment has been minimized.

@bokken

bokken Jan 10, 2021

Would there be value in making this a (lazily created) member variable? That would allow a single instance to be reused. It seems likely that, if one call is made with a direct CharBuffer, subsequent calls will also be made with direct instances (probably same instance?).

This comment has been minimized.

@marschall

marschall Jan 18, 2021
Author Contributor

I am not sure. It would help to reduce the allocation rate when reading a large amount of data into a small direct CharBuffer. I don't know how common that is. We would introduce an instance variable for one path in one method.

nread = 0;
synchronized (lock) {
int n = 0;
int n;
do {
// read to EOF which may read more or less than buffer size
if ((n = read(cbuf)) > 0) {
if ((n = read(cbuf, 0, Math.min(remaining, cbuf.length))) > 0) {
target.put(cbuf, 0, n);
nread += n;
remaining -= n;

This comment has been minimized.

@plevart

plevart Jan 18, 2021
Contributor

Wouldn't there be a possibility for target.put(cbuf, 0, n) to throw BufferOverflowException ?
For example:

  • there's room (remaining) for TRANSFER_BUFFER_SIZE + 1 characters in target
  • cbuff is sized to TRANSFER_BUFFER_SIZE
  • 1st iteration of do loop transfers TRANSFER_BUFFER_SIZE charasters (remaining == 1)
  • 2nd iteration reads > 1 (up to TRANSFER_BUFFER_SIZE) characters
  • target.put throws BufferOverflowException

You have to limit the amount read in each iteration to be Math.min(remaining, cbuf.length)

This comment has been minimized.

@marschall

marschall Jan 19, 2021
Author Contributor

You're correct. I need to expand the unit tests.

This comment has been minimized.

@marschall

marschall Jan 26, 2021
Author Contributor

Fixed

@@ -35,15 +35,14 @@
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.util.Arrays;

import static java.nio.charset.StandardCharsets.US_ASCII;
import java.util.Objects;

import static org.testng.Assert.assertEquals;

@Test(groups = "unit")
public class ReadCharBuffer {

private static final int BUFFER_SIZE = 24;
private static final int BUFFER_SIZE = 8 + 8192 + 2;

@DataProvider(name = "buffers")
public Object[][] createBuffers() {
@@ -58,29 +57,91 @@
public void read(CharBuffer buffer) throws IOException {
fillBuffer(buffer);

try (Reader reader = new StringReader("ABCDEFGHIJKLMNOPQRTUVWXYZ")) {
buffer.limit(7);
StringBuilder input = new StringBuilder(BUFFER_SIZE - 2 + 1);
input.append("ABCDEF");
for (int i = 0; i < 8192; i++) {
input.append('y');
}
input.append("GH");

try (Reader reader = new UnoptimizedStringReader(input.toString())) {
// put only between position and limit in the target buffer
int limit = 1 + 6;
buffer.limit(limit);
buffer.position(1);
assertEquals(reader.read(buffer), 6);
assertEquals(buffer.position(), 7);
assertEquals(buffer.limit(), 7);
assertEquals(buffer.position(), limit);
assertEquals(buffer.limit(), limit);

buffer.limit(16);
// read the full temporary buffer
// and then accurately reduce the next #read call
limit = 8 + 8192 + 1;
buffer.limit(8 + 8192 + 1);
buffer.position(8);
assertEquals(reader.read(buffer), 8);
assertEquals(buffer.position(), 16);
assertEquals(buffer.limit(), 16);
assertEquals(reader.read(buffer), 8192 + 1);

This comment has been minimized.

@bplb

bplb Apr 16, 2021
Member

Lines 80 + 83 might be easier to read if replaced with

            int position = 8;
            limit = position + 8192 + 1;
            buffer.limit(limit);
            buffer.position(position);
            assertEquals(reader.read(buffer), limit - position);
assertEquals(buffer.position(), limit);
assertEquals(buffer.limit(), limit);

assertEquals(reader.read(), 'H');
assertEquals(reader.read(), -1);
}

buffer.clear();
assertEquals(buffer.toString(), "xABCDEFxGHIJKLMNxxxxxxxx");
StringBuilder expected = new StringBuilder(BUFFER_SIZE);
expected.append("xABCDEFx");
for (int i = 0; i < 8192; i++) {
expected.append('y');
}
expected.append("Gx");
assertEquals(buffer.toString(), expected.toString());
}

private void fillBuffer(CharBuffer buffer) {
char[] filler = new char[BUFFER_SIZE];
char[] filler = new char[buffer.remaining()];
Arrays.fill(filler, 'x');
buffer.put(filler);
buffer.clear();
}

/**
* Unoptimized version of StringReader in case StringReader overrides
* #read(CharBuffer)
*/
static final class UnoptimizedStringReader extends Reader {

private String str;
private int next = 0;

UnoptimizedStringReader(String s) {
this.str = s;
}

@Override
public int read() throws IOException {
synchronized (lock) {
if (next >= str.length())
return -1;
return str.charAt(next++);
}
}

@Override
public int read(char cbuf[], int off, int len) throws IOException {
synchronized (lock) {
Objects.checkFromIndexSize(off, len, cbuf.length);
if (next >= str.length())
return -1;
int n = Math.min(str.length() - next, len);
str.getChars(next, next + n, cbuf, off);
next += n;
return n;
}
}

@Override
public void close() throws IOException {

}
}

}
ProTip! Use n and p to navigate between commits in a pull request.