Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

4926314: Optimize Reader.read(CharBuffer) #1915

Open
wants to merge 15 commits into
base: master
from
@@ -1,5 +1,5 @@
/*
* Copyright (c) 1996, 2019, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 1996, 2021, Oracle and/or its affiliates. All rights reserved.
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This code is free software; you can redistribute it and/or modify it
@@ -25,6 +25,8 @@

package java.io;

import java.nio.CharBuffer;

/**
* This class implements a character buffer that can be used as a
* character-input stream.
@@ -145,6 +147,23 @@ public int read(char b[], int off, int len) throws IOException {
}
}

@Override
public int read(CharBuffer target) throws IOException {
synchronized (lock) {
ensureOpen();

if (pos >= count) {
return -1;
}

int avail = count - pos;
int len = Math.min(avail, target.remaining());
target.put(buf, pos, len);
pos += len;
return avail;
}
}

/**
* Skips characters. Returns the number of characters that were skipped.
*
@@ -187,11 +187,14 @@ public int read(CharBuffer target) throws IOException {
int nread;
if (target.hasArray()) {
char cbuf[] = target.array();
int off = target.arrayOffset() + target.position();
int len = target.remaining();
nread = this.read(cbuf, off, len);
int pos = target.position();
int rem = target.limit() - pos;
if (rem <= 0)
return -1;
int off = target.arrayOffset() + pos;
nread = this.read(cbuf, off, rem);
if (nread > 0)
target.position(target.position() + nread);
target.position(pos + nread);
} else {
int remaining = target.remaining();
char cbuf[] = new char[Math.min(remaining, TRANSFER_BUFFER_SIZE)];

This comment has been minimized.

@bplb

bplb Feb 8, 2021
Member

As cbuf for the off-heap case is used in a synchronized block, is there the opportunity for some sort of cached array here and would it help?

This comment has been minimized.

@marschall

marschall Feb 9, 2021
Author Contributor

That would be possible. It would help in cases where a large Reader is read into one or several relatively small off-heap CharBuffers, requiring multiple #read calls. This can only be done when the caller is able to work with only a partial input. I don't know how common this case is.

We could re-purpose #skipBuffer, it has the same maximum size (8192) but determined by a different constant (#maxSkipBufferSize instead of #TRANSFER_BUFFER_SIZE). That would likely require it to be renamed and maybe we should even remove #maxSkipBufferSize. We could also do the reallocation and growing similar as is currently done in #skip.

This comment has been minimized.

@bplb

bplb Feb 10, 2021
Member

Perhaps a static final WORK_BUFFER_SIZE could be added with value 8192 and maxSkipBufferSize and TRANSFER_BUFFER_SIZE replaced with that? Then skipBuffer could be renamed to workBuffer and used in both read(CharBuffer) and skip(long). That shouldn't be a problem as both uses are in synchronized blocks. Also I suggest putting the declaration of workBuffer just below that of lock instead of lower down the file where skipBuffer is.

Lastly you mentioned C-style array declarations like char buf[]. As there are only four of these in the file it might be good to just go ahead and change them, I don't think that adds much noise or risk.

This comment has been minimized.

@marschall

marschall Feb 12, 2021
Author Contributor

Done. I left #transferTo(Writer) untouched for now. Firstly it is not already behind a synchronized. Secondly it writes so there is no need for repeated calls.

This comment has been minimized.

@bokken

bokken Jan 10, 2021

Would there be value in making this a (lazily created) member variable? That would allow a single instance to be reused. It seems likely that, if one call is made with a direct CharBuffer, subsequent calls will also be made with direct instances (probably same instance?).

This comment has been minimized.

@marschall

marschall Jan 18, 2021
Author Contributor

I am not sure. It would help to reduce the allocation rate when reading a large amount of data into a small direct CharBuffer. I don't know how common that is. We would introduce an instance variable for one path in one method.

@@ -1,5 +1,5 @@
/*
* Copyright (c) 2001, 2021, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2001, 2019, Oracle and/or its affiliates. All rights reserved.
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This code is free software; you can redistribute it and/or modify it
@@ -189,30 +189,6 @@ public int read(char cbuf[], int offset, int length) throws IOException {
}
}

public int read(CharBuffer target) throws IOException {
int len = target.remaining();
synchronized (lock) {
ensureOpen();
if (len == 0)
return 0;

int n = 0;

if (haveLeftoverChar) {
// Copy the leftover char into the buffer
target.put(leftoverChar);
len--;
haveLeftoverChar = false;
n = 1;
if ((len == 0) || !implReady())
// Return now if this is all we can produce w/o blocking
return n;
}

return n + implRead(target);
}
}

public boolean ready() throws IOException {
synchronized (lock) {
ensureOpen();
@@ -346,57 +322,47 @@ int implRead(char[] cbuf, int off, int end) throws IOException {
assert (end - off > 1);

CharBuffer cb = CharBuffer.wrap(cbuf, off, end - off);

return implRead(cb);
}

int implRead(CharBuffer cb) throws IOException {

// In order to handle surrogate pairs, this method requires that
// the invoker attempt to read at least two characters. Saving the
// extra character, if any, at a higher level is easier than trying
// to deal with it here.
if (cb.position() != 0)
// Ensure that cb[0] == cbuf[off]
cb = cb.slice();

boolean eof = false;
int initialPosition = cb.position();
int nread;
for (;;) {
CoderResult cr = decoder.decode(bb, cb, eof);
nread = cb.position() - initialPosition;
if (cr.isUnderflow()) {
if (eof)
break;
if (!cb.hasRemaining())
break;
if ((nread > 0) && !inReady())
break; // Block at most once
int n = readBytes();
if (n < 0) {
eof = true;
if ((nread == 0) && (!bb.hasRemaining()))
break;
decoder.reset();
}
continue;
}
if (cr.isOverflow()) {
assert nread > 0;
CoderResult cr = decoder.decode(bb, cb, eof);
if (cr.isUnderflow()) {
if (eof)
break;
if (!cb.hasRemaining())
break;
if ((cb.position() > 0) && !inReady())
break; // Block at most once
int n = readBytes();
if (n < 0) {
eof = true;
if ((cb.position() == 0) && (!bb.hasRemaining()))
break;
decoder.reset();
}
cr.throwException();
continue;
}
if (cr.isOverflow()) {
assert cb.position() > 0;
break;
}
cr.throwException();
}

if (eof) {
// ## Need to flush decoder
decoder.reset();
// ## Need to flush decoder
decoder.reset();
}

if (nread == 0) {
if (cb.position() == 0) {
if (eof)
return -1;
assert false;
}
return nread;
return cb.position();
}

String encodingName() {
ProTip! Use n and p to navigate between commits in a pull request.