Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

4926314: Optimize Reader.read(CharBuffer) #1915

Closed
wants to merge 16 commits into from
Closed
7 changes: 6 additions & 1 deletion src/java.base/share/classes/java/io/InputStreamReader.java
@@ -1,5 +1,5 @@
/*
* Copyright (c) 1996, 2019, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 1996, 2021, Oracle and/or its affiliates. All rights reserved.
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This code is free software; you can redistribute it and/or modify it
Expand All @@ -25,6 +25,7 @@

package java.io;

import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import sun.nio.cs.StreamDecoder;
Expand Down Expand Up @@ -149,6 +150,10 @@ public String getEncoding() {
return sd.getEncoding();
}

public int read(CharBuffer target) throws IOException {
return sd.read(target);
}

/**
* Reads a single character.
*
Expand Down
35 changes: 28 additions & 7 deletions src/java.base/share/classes/java/io/Reader.java
@@ -1,5 +1,5 @@
/*
* Copyright (c) 1996, 2020, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 1996, 2021, Oracle and/or its affiliates. All rights reserved.
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This code is free software; you can redistribute it and/or modify it
Expand Down Expand Up @@ -184,12 +184,33 @@ protected Reader(Object lock) {
* @since 1.5
*/
public int read(CharBuffer target) throws IOException {
int len = target.remaining();
char[] cbuf = new char[len];
int n = read(cbuf, 0, len);
if (n > 0)
target.put(cbuf, 0, n);
return n;
int nread;
if (target.hasArray()) {
char cbuf[] = target.array();
int off = target.arrayOffset() + target.position();
int len = target.remaining();
nread = this.read(cbuf, off, len);
if (nread > 0)
target.position(target.position() + nread);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As target is mutable, I think you would do better to change lines 189-194 to something like:

            char cbuf[] = target.array();
            int pos = target.position();
            int rem = target.limit() - pos;
            if (rem <= 0)
                return -1;
            int off = target.arrayOffset() + pos;
            nread = this.read(cbuf, off, rem);
            if (nread > 0)
                target.position(pos + nread);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

} else {
int remaining = target.remaining();
char cbuf[] = new char[Math.min(remaining, TRANSFER_BUFFER_SIZE)];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As cbuf for the off-heap case is used in a synchronized block, is there the opportunity for some sort of cached array here and would it help?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be possible. It would help in cases where a large Reader is read into one or several relatively small off-heap CharBuffers, requiring multiple #read calls. This can only be done when the caller is able to work with only a partial input. I don't know how common this case is.

We could re-purpose #skipBuffer, it has the same maximum size (8192) but determined by a different constant (#maxSkipBufferSize instead of #TRANSFER_BUFFER_SIZE). That would likely require it to be renamed and maybe we should even remove #maxSkipBufferSize. We could also do the reallocation and growing similar as is currently done in #skip.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a static final WORK_BUFFER_SIZE could be added with value 8192 and maxSkipBufferSize and TRANSFER_BUFFER_SIZE replaced with that? Then skipBuffer could be renamed to workBuffer and used in both read(CharBuffer) and skip(long). That shouldn't be a problem as both uses are in synchronized blocks. Also I suggest putting the declaration of workBuffer just below that of lock instead of lower down the file where skipBuffer is.

Lastly you mentioned C-style array declarations like char buf[]. As there are only four of these in the file it might be good to just go ahead and change them, I don't think that adds much noise or risk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I left #transferTo(Writer) untouched for now. Firstly it is not already behind a synchronized. Secondly it writes so there is no need for repeated calls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be value in making this a (lazily created) member variable? That would allow a single instance to be reused. It seems likely that, if one call is made with a direct CharBuffer, subsequent calls will also be made with direct instances (probably same instance?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure. It would help to reduce the allocation rate when reading a large amount of data into a small direct CharBuffer. I don't know how common that is. We would introduce an instance variable for one path in one method.

nread = 0;
synchronized (lock) {
int n;
do {
// read to EOF which may read more or less than buffer size
if ((n = read(cbuf, 0, Math.min(remaining, cbuf.length))) > 0) {
target.put(cbuf, 0, n);
nread += n;
remaining -= n;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't there be a possibility for target.put(cbuf, 0, n) to throw BufferOverflowException ?
For example:

  • there's room (remaining) for TRANSFER_BUFFER_SIZE + 1 characters in target
  • cbuff is sized to TRANSFER_BUFFER_SIZE
  • 1st iteration of do loop transfers TRANSFER_BUFFER_SIZE charasters (remaining == 1)
  • 2nd iteration reads > 1 (up to TRANSFER_BUFFER_SIZE) characters
  • target.put throws BufferOverflowException

You have to limit the amount read in each iteration to be Math.min(remaining, cbuf.length)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct. I need to expand the unit tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

}
// if the last call to read returned -1 or the number of bytes
// requested have been read then break
} while (n >= 0 && remaining > 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code for case that the char buffer has a backing array looks okay but I'm not sure about the direct buffer/other cases. One concern is that this is a read method, not a transferXXX method so we shouldn't be calling the underlying read several times. You'll see what I mean if you consider the scenario where you read < rem, then read again and the second read blocks or throws. I'l also concerned about "workBuffer" adding more per-stream footprint for cases where skip or read(CB) is used. Objects such as InputStreamReader are already a problem due to the underlying stream decoder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. So you propose to revert the off-heap path to the current master? That would be fine with me. The original bug and my motivation was only about the backing array case, the rest crept in. That would certainly keep the risk and impact lower.

Copy link
Member

@bplb bplb Feb 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's what @AlanBateman intended. The skip() changes would revert also (I think) but the C-style array changes can stay. Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's bring it back to just eliminating the intermediate array when the buffer has a backing array. The other case that been examined separated if needed but we can't use the approach proposed in the current PR because it changes the semantics of read when there is a short-read followed by a blocking or exception throwing read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
}
return nread;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bringing this back to just the heap buffer case. This part looks good now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

}

/**
Expand Down
90 changes: 62 additions & 28 deletions src/java.base/share/classes/sun/nio/cs/StreamDecoder.java
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2001, 2019, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2001, 2021, Oracle and/or its affiliates. All rights reserved.
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This code is free software; you can redistribute it and/or modify it
Expand Down Expand Up @@ -189,6 +189,30 @@ public int read(char cbuf[], int offset, int length) throws IOException {
}
}

public int read(CharBuffer target) throws IOException {
int len = target.remaining();
synchronized (lock) {
ensureOpen();
if (len == 0)
return 0;

int n = 0;

if (haveLeftoverChar) {
// Copy the leftover char into the buffer
target.put(leftoverChar);
len--;
haveLeftoverChar = false;
n = 1;
if ((len == 0) || !implReady())
// Return now if this is all we can produce w/o blocking
return n;
}

return n + implRead(target);
}
}

public boolean ready() throws IOException {
synchronized (lock) {
ensureOpen();
Expand Down Expand Up @@ -322,47 +346,57 @@ int implRead(char[] cbuf, int off, int end) throws IOException {
assert (end - off > 1);

CharBuffer cb = CharBuffer.wrap(cbuf, off, end - off);
if (cb.position() != 0)
// Ensure that cb[0] == cbuf[off]
cb = cb.slice();

return implRead(cb);
}

int implRead(CharBuffer cb) throws IOException {

// In order to handle surrogate pairs, this method requires that
// the invoker attempt to read at least two characters. Saving the
// extra character, if any, at a higher level is easier than trying
// to deal with it here.

boolean eof = false;
int initialPosition = cb.position();
int nread;
for (;;) {
CoderResult cr = decoder.decode(bb, cb, eof);
if (cr.isUnderflow()) {
if (eof)
break;
if (!cb.hasRemaining())
break;
if ((cb.position() > 0) && !inReady())
break; // Block at most once
int n = readBytes();
if (n < 0) {
eof = true;
if ((cb.position() == 0) && (!bb.hasRemaining()))
CoderResult cr = decoder.decode(bb, cb, eof);
nread = cb.position() - initialPosition;
if (cr.isUnderflow()) {
if (eof)
break;
if (!cb.hasRemaining())
break;
decoder.reset();
if ((nread > 0) && !inReady())
break; // Block at most once
int n = readBytes();
if (n < 0) {
eof = true;
if ((nread == 0) && (!bb.hasRemaining()))
break;
decoder.reset();
}
continue;
}
continue;
}
if (cr.isOverflow()) {
assert cb.position() > 0;
break;
}
cr.throwException();
if (cr.isOverflow()) {
assert nread > 0;
break;
}
cr.throwException();
}

if (eof) {
// ## Need to flush decoder
decoder.reset();
// ## Need to flush decoder
decoder.reset();
}

if (cb.position() == 0) {
if (nread == 0) {
if (eof)
return -1;
assert false;
}
return cb.position();
return nread;
}

String encodingName() {
Expand Down
84 changes: 84 additions & 0 deletions test/jdk/java/io/InputStreamReader/ReadCharBuffer.java
@@ -0,0 +1,84 @@
/*
* Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved.
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This code is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 only, as
* published by the Free Software Foundation.
*
* This code is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
* version 2 for more details (a copy is included in the LICENSE file that
* accompanied this code).
*
* You should have received a copy of the GNU General Public License version
* 2 along with this work; if not, write to the Free Software Foundation,
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
*
* Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
* or visit www.oracle.com if you need additional information or have any
* questions.
*/

/*
* @test
* @bug 4926314
* @summary Test for InputStreamReader#read(CharBuffer).
* @run testng ReadCharBuffer
*/

import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

import java.io.*;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's generally better not to use a wild card.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.util.Arrays;

import static java.nio.charset.StandardCharsets.US_ASCII;
import static org.testng.Assert.assertEquals;

public class ReadCharBuffer {

private static final int BUFFER_SIZE = 24;

@DataProvider(name = "buffers")
public Object[][] createBuffers() {
// test both on-heap and off-heap buffers has they make use different code paths
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"as they may"

return new Object[][]{
new Object[]{CharBuffer.allocate(BUFFER_SIZE)},
new Object[]{ByteBuffer.allocateDirect(BUFFER_SIZE * 2).asCharBuffer()}
};
}

@Test(dataProvider = "buffers")
public void read(CharBuffer buffer) throws IOException {
fillBuffer(buffer);

try (Reader reader = new InputStreamReader(new ByteArrayInputStream("ABCDEFGHIJKLMNOPQRTUVWXYZ".getBytes(US_ASCII)), US_ASCII)) {
buffer.limit(7);
buffer.position(1);
assertEquals(reader.read(buffer), 6);
assertEquals(buffer.position(), 7);
assertEquals(buffer.limit(), 7);

buffer.limit(16);
buffer.position(8);
assertEquals(reader.read(buffer), 8);
assertEquals(buffer.position(), 16);
assertEquals(buffer.limit(), 16);
}

buffer.clear();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think buffer.rewind() would be more in keeping with the specification verbiage although there will be no practical effect here. Same comment applies below and in the other test ReadCharBuffer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buffer.rewind() would not work in this specific case as it does not reset the limit. I want to assert the entire buffers content to make sure the method didn't accidentally write where it shouldn't have. Therefore limit needs to be set to capacity.

assertEquals(buffer.toString(), "xABCDEFxGHIJKLMNxxxxxxxx");
}

private void fillBuffer(CharBuffer buffer) {
char[] filler = new char[BUFFER_SIZE];
Arrays.fill(filler, 'x');
buffer.put(filler);
buffer.clear();
}

}