
4926314: Optimize Reader.read(CharBuffer) #1915

Open
wants to merge 15 commits into base: master
Conversation

@marschall (Contributor) commented Dec 31, 2020

Implement the following optimizations for Reader.read(CharBuffer):

  • Add a code path for heap buffers in Reader#read to use the backing array instead of allocating a new one.
  • Change the code path for direct buffers in Reader#read to limit the intermediate allocation to TRANSFER_BUFFER_SIZE.
  • Implement InputStreamReader#read(CharBuffer) and delegate to StreamDecoder.
  • Implement StreamDecoder#read(CharBuffer) and avoid buffer allocation.

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/1915/head:pull/1915
$ git checkout pull/1915

Update a local copy of the PR:
$ git checkout pull/1915
$ git pull https://git.openjdk.java.net/jdk pull/1915/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 1915

View PR using the GUI difftool:
$ git pr show -t 1915

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/1915.diff

Implement three optimizations for Reader.read(CharBuffer)

Add a code path for heap buffers in Reader#read to use the backing array instead of allocating a new one.

Change the code path for direct buffers in Reader#read to limit the intermediate allocation to TRANSFER_BUFFER_SIZE.

Implement InputStreamReader#read(CharBuffer) and delegate to StreamDecoder.

Implement StreamDecoder#read(CharBuffer) and avoid buffer allocation.
@bridgekeeper (bot) commented Dec 31, 2020

👋 Welcome back marschall! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label Dec 31, 2020
@marschall (Contributor, Author) commented Dec 31, 2020

A couple of implementation notes:

Reader#read(CharBuffer)

on-heap case

I introduced a dedicated path for the on-heap case that reads directly into the backing array. This completely avoids the intermediate allocation and copy. Compared to the initial proposal, the buffer position is updated; this has to be done because we bypass the buffer and read directly into the array. This also handles the case where #read returns -1.
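The on-heap fast path described above might be sketched roughly as follows. This is a standalone illustration, not the actual patch; `readIntoHeapBuffer` is a hypothetical helper name, and the body follows the shape later suggested in the review:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.CharBuffer;

public class HeapPathSketch {

    // Read directly into the heap buffer's backing array, then advance the
    // buffer's position manually (writing through the array bypasses the
    // buffer's own position accounting).
    static int readIntoHeapBuffer(Reader reader, CharBuffer target) throws IOException {
        char[] cbuf = target.array();
        int pos = target.position();
        int rem = target.limit() - pos;
        if (rem <= 0)
            return -1;
        int off = target.arrayOffset() + pos;
        int nread = reader.read(cbuf, off, rem);
        if (nread > 0)
            target.position(pos + nread);
        return nread;
    }

    public static void main(String[] args) throws IOException {
        CharBuffer buffer = CharBuffer.allocate(8);
        int n = readIntoHeapBuffer(new StringReader("hello"), buffer);
        System.out.println(n);      // 5
        buffer.flip();
        System.out.println(buffer); // hello
    }
}
```

Note that the position must be advanced explicitly; forgetting this was exactly the bug mentioned below that only Scanner tests caught.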

I am using a C-style array declaration because the rest of the class uses it.

off-heap case

I limit the intermediate allocation to TRANSFER_BUFFER_SIZE. To reduce the total intermediate allocation we now call #read multiple times until the buffer is full or EOF is reached. This changes the behavior slightly, as possibly more data is now read. However, this should help reduce the overall intermediate allocations.

A lock is acquired to keep the read atomic. This is consistent with #skip, which also holds a lock over multiple #read calls, but inconsistent with #transferTo, which does not acquire a lock over multiple #read calls.

The implementation took inspiration from java.io.InputStream#readNBytes(int).

InputStreamReader#read(CharBuffer)

Since StreamDecoder is a Reader as well we can simply delegate.

StreamDecoder#read(CharBuffer)

Interestingly this was not implemented even though StreamDecoder internally works on NIO buffers.

on-heap case

We see a performance improvement and the elimination of all intermediate allocation.

StreamDecoder#read(char[], int, int) and InputStreamReader#read(char[], int, int) may get a small improvement as we no longer #slice the buffer.

off-heap case

We see the elimination of all intermediate allocation but a performance penalty because we hit the slow path in #decodeLoop. There is a trade-off here, we could improve the performance to previous levels by introducing intermediate allocation again. I don't know how much the users of off-heap buffers care about introducing intermediate allocation vs maximum throughput.

I was struggling to come up with microbenchmarks because the built-in InputStream and Reader implementations are finite, but JMH calls the benchmark methods repeatedly. I ended up implementing custom infinite InputStream and Reader implementations. The code can be found here:

https://github.com/marschall/reader-benchmarks/tree/master/src/main/java/com/github/marschall/readerbenchmarks

An Excel with charts can be found here:

https://github.com/marschall/reader-benchmarks/raw/master/src/main/resources/4926314.xlsx

I looked for java.io.Reader#read(CharBuffer) users in the JDK and only found java.util.Scanner#readInput(). I wrote a microbenchmark for this but only found minuscule improvements due to regex dominating the runtime.

There seem to be no direct Reader tests in the tier1 suite; the initial code was wrong because I forgot to update the buffer position, but I only found out because some Scanner tests failed.

@openjdk (bot) commented Dec 31, 2020

@marschall The following labels will be automatically applied to this pull request:

  • core-libs
  • nio

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@mlbridge (bot) commented Dec 31, 2020

The off-heap code path was missing a buffer.put
@bplb (Member) commented Jan 5, 2021

I changed the JBS issue summary to match the title of this PR so that the integration blocker is removed.

How does the off-heap performance of InputStreamReader.read(CharBuffer) compare for the case where only the change to Reader is made versus if all your proposed changes are applied?

Some kind of specific test should likely be included.

Note that the more recent copyright year is now 2021.

@mlbridge (bot) commented Jan 5, 2021

Mailing list message from Philippe Marschall on nio-dev:

On 05.01.21 01:53, Brian Burkhalter wrote:

...

I changed the JBS issue summary to match the title of this PR so that integration blocker is removed.

Thank you.

How does the performance of `InputStreamReader.read(CharBuffer)` compare for the case where only the change to `Reader` is made versus if all your proposed changes are applied?

I can look into this, this will take a moment. I guess it would also
make sense to have a comparison with a version that does intermediate
heap allocation for off-heap buffers in InputStreamReader#read(CharBuffer).

Some kind of specific test should likely be included.

Sure. The existing tests in this area seem to be #main-based. Would you
prefer #main based tests or TestNG tests?

Note that the more recent copyright year is now 2021.

Indeed it is, fixed.

Cheers
Philippe

@mlbridge (bot) commented Jan 5, 2021

Mailing list message from Brian Burkhalter on nio-dev:

On Jan 5, 2021, at 9:53 AM, Philippe Marschall <kustos at gmx.net> wrote:

How does the performance of `InputStreamReader.read(CharBuffer)` compare for the case where only the change to `Reader` is made versus if all your proposed changes are applied?

I can look into this, this will take a moment. I guess it would also
make sense to have a comparison with a version that does intermediate
heap allocation for off-heap buffers in InputStreamReader#read(CharBuffer).

If you like. I was just wondering whether the change to Reader.read(CharBuffer) would be enough.

Some kind of specific test should likely be included.

Sure. The existing tests in this area seem to be #main-based. Would you
prefer #main based tests or TestNG tests?

For new tests we seem to be preferring TestNG.

Note that the more recent copyright year is now 2021.

Indeed it is, fixed.

Thanks.

Brian

- add unit test for Reader#read(CharBuffer)
- add unit test for InputStreamReader#read(CharBuffer)
- test with both on-heap and off-heap buffers
}
} else {
int remaining = target.remaining();
char cbuf[] = new char[Math.min(remaining, TRANSFER_BUFFER_SIZE)];

@bokken commented Jan 10, 2021

Would there be value in making this a lazily created member variable? That would allow a single instance to be reused. It seems likely that, if one call is made with a direct CharBuffer, subsequent calls will also be made with direct instances (probably the same instance?).

@marschall (Author, Contributor) commented Jan 18, 2021

I am not sure. It would help to reduce the allocation rate when reading a large amount of data into a small direct CharBuffer. I don't know how common that is. We would introduce an instance variable for one path in one method.

@mlbridge (bot) commented Jan 17, 2021

Mailing list message from Philippe Marschall on nio-dev:

On 05.01.21 01:53, Brian Burkhalter wrote:

On Thu, 31 Dec 2020 10:11:58 GMT, Philippe Marschall <github.com+471021+marschall at openjdk.org> wrote:
...

How does the performance of `InputStreamReader.read(CharBuffer)` compare for the case where only the change to `Reader` is made versus if all your proposed changes are applied?

I left the delegating one in InputStreamReader in but removed all
changes in StreamDecoder. Performance looks pretty good:

on-heap CharBuffer
- Throughput is slightly lower than with the changes, but close and
still better than before.
- Allocation rate is drastically reduced and comparable to the results
with the changes except for small buffer sizes (128 bytes).

off-heap CharBuffer
- Relative throughput depends on the buffer size. For small buffers (128
bytes) throughput is a bit lower than master but increases with buffer
size. For 1 KB buffers it is about even, for 1 MB buffers it is better.
Throughput is a lot better than with the StreamDecoder changes without
intermediate allocation, there we lose about one half to two thirds of
throughput.
- Allocation rate stays high (1.5 - 3 GB/s) and only drops with large
buffer sizes (1 MB) to 20 - 30 MB/s. The StreamDecoder changes cause the
allocation rate to drop to almost zero.

To be honest, backing out of the StreamDecoder changes looks like a good
compromise to me to reduce the risk while still improving throughput and
reducing allocation rate, especially in the on-heap path.

Looking a bit further I wonder if CharArrayReader and StringReader
should implement #read(CharBuffer) as well and call CharBuffer#put
directly. And maybe even #transferTo(Writer).

Cheers
Philippe
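The CharArrayReader idea floated above, implementing #read(CharBuffer) by calling CharBuffer#put directly on the internal array, might look roughly like this. This is a hypothetical sketch, not JDK code; the class and field names are invented, and a real implementation would also need the usual open/closed checks:

```java
import java.nio.CharBuffer;

public class CharArrayReadSketch {

    private final char[] buf; // the reader's internal character store
    private int pos;          // next char to hand out

    CharArrayReadSketch(char[] buf) {
        this.buf = buf;
    }

    // Copy straight from the internal array into the target via a single
    // bulk put; works for both heap and direct buffers, no temp array.
    int read(CharBuffer target) {
        int count = buf.length - pos;
        if (count == 0)
            return -1; // EOF
        int n = Math.min(count, target.remaining());
        target.put(buf, pos, n);
        pos += n;
        return n;
    }

    public static void main(String[] args) {
        CharArrayReadSketch r = new CharArrayReadSketch("hello".toCharArray());
        System.out.println(r.read(CharBuffer.allocate(3))); // 3
        System.out.println(r.read(CharBuffer.allocate(8))); // 2
        System.out.println(r.read(CharBuffer.allocate(1))); // -1
    }
}
```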

if ((n = read(cbuf)) > 0) {
target.put(cbuf, 0, n);
nread += n;
remaining -= n;

@plevart (Contributor) commented Jan 18, 2021

Wouldn't there be a possibility for target.put(cbuf, 0, n) to throw BufferOverflowException?
For example:

  • there's room (remaining) for TRANSFER_BUFFER_SIZE + 1 characters in target
  • cbuf is sized to TRANSFER_BUFFER_SIZE
  • 1st iteration of the do loop transfers TRANSFER_BUFFER_SIZE characters (remaining == 1)
  • 2nd iteration reads > 1 (up to TRANSFER_BUFFER_SIZE) characters
  • target.put throws BufferOverflowException

You have to limit the amount read in each iteration to be Math.min(remaining, cbuf.length)
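Assuming the loop has the shape of the snippet quoted above, the fix plevart describes, capping each read at Math.min(remaining, cbuf.length), might be sketched as follows. The class and method names and the standalone structure are invented for illustration; only the loop body mirrors the discussion:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;

public class TransferLoopSketch {

    private static final int TRANSFER_BUFFER_SIZE = 8192; // mirrors the PR's constant

    // Each read is capped at Math.min(remaining, cbuf.length), so target.put
    // can never be handed more chars than the target has room for.
    static int readViaTransferBuffer(Reader reader, CharBuffer target) throws IOException {
        int remaining = target.remaining();
        char[] cbuf = new char[Math.min(remaining, TRANSFER_BUFFER_SIZE)];
        int nread = 0;
        int n;
        do {
            n = reader.read(cbuf, 0, Math.min(remaining, cbuf.length));
            if (n > 0) {
                target.put(cbuf, 0, n);
                nread += n;
                remaining -= n;
            }
        } while (n >= 0 && remaining > 0);
        return (nread == 0 && n < 0) ? -1 : nread;
    }

    public static void main(String[] args) throws IOException {
        // plevart's scenario: room for TRANSFER_BUFFER_SIZE + 1 chars in a direct buffer.
        // Without the cap, the 2nd iteration could read up to 8192 chars and overflow.
        CharBuffer target = ByteBuffer.allocateDirect(2 * (TRANSFER_BUFFER_SIZE + 1)).asCharBuffer();
        Reader reader = new StringReader("x".repeat(2 * TRANSFER_BUFFER_SIZE));
        System.out.println(readViaTransferBuffer(reader, target)); // 8193, no BufferOverflowException
    }
}
```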

@marschall (Author, Contributor) commented Jan 19, 2021

You're correct. I need to expand the unit tests.

@marschall (Author, Contributor) commented Jan 26, 2021

Fixed

- limit the amount read
- add tests
@mlbridge (bot) commented Feb 3, 2021

Mailing list message from Philippe Marschall on nio-dev:

On 17.01.21 18:48, Philippe Marschall wrote:

...

To be honest, backing out of the StreamDecoder changes looks like a good
compromise to me to reduce the risk while still improving throughput and
reducing allocation rate, especially in the on-heap path.

I gave it some more thought and propose to back out of the StreamDecoder
changes. While that leaves some optimization potential unused, it keeps
the patch smaller and the risk lower, and avoids any throughput regressions.

Looking a bit further I wonder if CharArrayReader and StringReader
should implement #read(CharBuffer) as well and call CharBuffer#put
directly. And maybe even #transferTo(Writer).

I did have a look at this [1] for coders LATIN1(0) and UTF16(1) as well
as 128 and 1024 char sized readers for both on- and off-heap buffers.

CharArrayReader#read(CharBuffer)

on-heap
Higher throughput than master but slightly lower throughput than the
current PR. I don't know why that is; maybe the additional checks in
CharBuffer show up here.

off-heap
Higher throughput than master or the PR, all intermediate allocation is
gone.

StringReader#read(CharBuffer)

on-heap
Higher throughput than master but slightly lower throughput than the
current PR. I don't know why that is; maybe the additional checks in
CharBuffer show up here. The situation is similar to CharArrayReader.

off-heap
Lower throughput but all intermediate allocation is gone. What shows up
here is the lack of an optimized #put(String) method for off-heap
CharBuffers like was done in JDK-8011135 for on-heap buffers.

Based on this I propose to add CharArrayReader#read(CharBuffer),
assuming it is still in the scope of the bug. I wouldn't add
StringReader#read(CharBuffer) due to the throughput regression. I think
#transferTo(Writer) would be out of the scope of the bug. Is that ok?

[1]
https://github.com/marschall/reader-benchmarks/blob/master/src/main/java/com/github/marschall/readerbenchmarks/CharsequenceReaderBenchmarks.java

Cheers
Philippe

}
} else {
int remaining = target.remaining();
char cbuf[] = new char[Math.min(remaining, TRANSFER_BUFFER_SIZE)];

@bplb (Member) commented Feb 8, 2021

As cbuf for the off-heap case is used in a synchronized block, is there the opportunity for some sort of cached array here and would it help?

@marschall (Author, Contributor) commented Feb 9, 2021

That would be possible. It would help in cases where a large Reader is read into one or several relatively small off-heap CharBuffers, requiring multiple #read calls. This can only be done when the caller is able to work with only a partial input. I don't know how common this case is.

We could re-purpose #skipBuffer: it has the same maximum size (8192), but determined by a different constant (#maxSkipBufferSize instead of #TRANSFER_BUFFER_SIZE). That would likely require it to be renamed, and maybe we should even remove #maxSkipBufferSize. We could also do the reallocation and growing similarly to what is currently done in #skip.

@bplb (Member) commented Feb 10, 2021

Perhaps a static final WORK_BUFFER_SIZE could be added with value 8192 and maxSkipBufferSize and TRANSFER_BUFFER_SIZE replaced with that? Then skipBuffer could be renamed to workBuffer and used in both read(CharBuffer) and skip(long). That shouldn't be a problem as both uses are in synchronized blocks. Also I suggest putting the declaration of workBuffer just below that of lock instead of lower down the file where skipBuffer is.

Lastly you mentioned C-style array declarations like char buf[]. As there are only four of these in the file it might be good to just go ahead and change them, I don't think that adds much noise or risk.
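A minimal sketch of the workBuffer suggestion above, one lazily created buffer shared under the lock, could look like the following. The class is invented for illustration (only skip is shown), and note that the PR ultimately reverted this direction because of the per-stream footprint concern raised later in the review:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class WorkBufferSketch {

    // Single size constant replacing maxSkipBufferSize and TRANSFER_BUFFER_SIZE,
    // as suggested in the review.
    private static final int WORK_BUFFER_SIZE = 8192;

    private final Object lock = new Object();
    private final Reader in;
    private char[] workBuffer; // lazily created, guarded by lock, reused across calls

    WorkBufferSketch(Reader in) {
        this.in = in;
    }

    long skip(long n) throws IOException {
        synchronized (lock) {
            if (workBuffer == null)
                workBuffer = new char[WORK_BUFFER_SIZE];
            long remaining = n;
            while (remaining > 0) {
                int r = in.read(workBuffer, 0, (int) Math.min(remaining, workBuffer.length));
                if (r < 0)
                    break; // EOF before n chars were skipped
                remaining -= r;
            }
            return n - remaining;
        }
    }

    public static void main(String[] args) throws IOException {
        WorkBufferSketch s = new WorkBufferSketch(new StringReader("abcdefgh"));
        System.out.println(s.skip(5)); // 5
    }
}
```

A read(CharBuffer) path would reuse the same workBuffer under the same lock, which is safe precisely because both uses are synchronized.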

@marschall (Author, Contributor) commented Feb 12, 2021

Done. I left #transferTo(Writer) untouched for now. First, it is not already inside a synchronized block. Second, it writes, so there is no need for repeated calls.

int len = target.remaining();
nread = this.read(cbuf, off, len);
if (nread > 0)
target.position(target.position() + nread);

@bplb (Member) commented Feb 8, 2021

As target is mutable, I think you would do better to change lines 189-194 to something like:

            char cbuf[] = target.array();
            int pos = target.position();
            int rem = target.limit() - pos;
            if (rem <= 0)
                return -1;
            int off = target.arrayOffset() + pos;
            nread = this.read(cbuf, off, rem);
            if (nread > 0)
                target.position(pos + nread);

@marschall (Author, Contributor) commented Feb 9, 2021

Done

@bplb (Member) commented Feb 16, 2021

I think the implementation changes here look good. I don't know however whether there is enough coverage in the tests. These should verify that the Reader, CharArrayReader, and InputStreamReader implementations of read(CharBuffer) are accurate. If there is already sufficient coverage in the tests in test/jdk/java/io then that is good enough and nothing need be added.

import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

import java.io.*;

@bplb (Member) commented Feb 16, 2021

It's generally better not to use a wildcard.

@marschall (Author, Contributor) commented Mar 13, 2021

Done

assertEquals(buffer.limit(), 16);
}

buffer.clear();

@bplb (Member) commented Feb 16, 2021

I think buffer.rewind() would be more in keeping with the specification verbiage although there will be no practical effect here. Same comment applies below and in the other test ReadCharBuffer.

@marschall (Author, Contributor) commented Mar 13, 2021

buffer.rewind() would not work in this specific case as it does not reset the limit. I want to assert the entire buffer's content to make sure the method didn't accidentally write where it shouldn't have. Therefore the limit needs to be set to the capacity.
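The distinction relied on here can be demonstrated directly: Buffer.rewind() resets only the position, while Buffer.clear() also resets the limit to the capacity. A standalone demo, unrelated to the patch itself:

```java
import java.nio.CharBuffer;

public class ClearVsRewind {
    public static void main(String[] args) {
        CharBuffer buffer = CharBuffer.allocate(16);
        buffer.limit(8);
        buffer.position(4);

        buffer.rewind(); // position -> 0, limit unchanged
        System.out.println(buffer.limit()); // 8

        buffer.clear();  // position -> 0, limit -> capacity
        System.out.println(buffer.limit()); // 16
    }
}
```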

}
// if the last call to read returned -1 or the number of bytes
// requested have been read then break
} while (n >= 0 && remaining > 0);

@AlanBateman (Contributor) commented Feb 17, 2021

The code for the case where the char buffer has a backing array looks okay, but I'm not sure about the direct buffer/other cases. One concern is that this is a read method, not a transferXXX method, so we shouldn't be calling the underlying read several times. You'll see what I mean if you consider the scenario where you read < rem, then read again and the second read blocks or throws. I'm also concerned about "workBuffer" adding more per-stream footprint for cases where skip or read(CB) is used. Objects such as InputStreamReader are already a problem due to the underlying stream decoder.

@marschall (Author, Contributor) commented Feb 18, 2021

Right. So you propose to revert the off-heap path to the current master? That would be fine with me. The original bug and my motivation was only about the backing array case, the rest crept in. That would certainly keep the risk and impact lower.

@bplb (Member) commented Feb 18, 2021

I think that's what @AlanBateman intended. The skip() changes would revert also (I think) but the C-style array changes can stay. Thanks.

@AlanBateman (Contributor) commented Feb 19, 2021

Yes, let's bring it back to just eliminating the intermediate array when the buffer has a backing array. The other case can be examined separately if needed, but we can't use the approach proposed in the current PR because it changes the semantics of read when there is a short read followed by a blocking or exception-throwing read.

@marschall (Author, Contributor) commented Mar 13, 2021

Done

@openjdk (bot) commented Mar 13, 2021

@marschall this pull request cannot be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request, you can run the following commands in the local repository for your personal fork:

git checkout JDK-4926314
git fetch https://git.openjdk.java.net/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push
@openjdk openjdk bot removed the merge-conflict label Mar 13, 2021
@marschall (Contributor, Author) commented Mar 13, 2021

I think the implementation changes here look good. I don't know however whether there is enough coverage in the tests. These should verify that the Reader, CharArrayReader, and InputStreamReader implementations of read(CharBuffer) are accurate. If there is already sufficient coverage in the tests in test/jdk/java/io then that is good enough and nothing need be added.

CharArrayReader was lacking a test. I added a test which found a bug and fixed the bug. The PR also contains new tests for Reader and InputStreamReader. They cover on-heap and off-heap cases.

Is there a way to get test coverage with JTReg tests? I only found [1] which seems out of date and points to an Oracle internal wiki.

[1] https://wiki.openjdk.java.net/display/CodeTools/JCov+FAQ#JCovFAQ-HowdoIuseJCovwithjtreg?

if (nread > 0)
target.put(cbuf, 0, nread);
}
return nread;

@AlanBateman (Contributor) commented Mar 14, 2021

Thanks for bringing this back to just the heap buffer case. This part looks good now.

@bplb (Member) commented Mar 19, 2021

Agreed.

@AlanBateman (Contributor) commented Mar 26, 2021

/label remove nio

@openjdk openjdk bot removed the nio label Mar 26, 2021
@openjdk (bot) commented Mar 26, 2021

@AlanBateman
The nio label was successfully removed.

@marschall (Contributor, Author) commented Apr 8, 2021

How do we proceed here? Are there additional changes that you would like me to perform or undo?


@DataProvider(name = "buffers")
public Object[][] createBuffers() {
// test both on-heap and off-heap buffers has they may use different code paths

@bplb (Member) commented Apr 16, 2021

"as they may"


@DataProvider(name = "buffers")
public Object[][] createBuffers() {
// test both on-heap and off-heap buffers has they make use different code paths

@bplb (Member) commented Apr 16, 2021

"as they may"


@DataProvider(name = "buffers")
public Object[][] createBuffers() {
// test both on-heap and off-heap buffers has they make use different code paths

@bplb (Member) commented Apr 16, 2021

"as they may"

limit = 8 + 8192 + 1;
buffer.limit(8 + 8192 + 1);
buffer.position(8);
assertEquals(reader.read(buffer), 8192 + 1);

@bplb (Member) commented Apr 16, 2021

Lines 80 + 83 might be easier to read if replaced with

            int position = 8;
            limit = position + 8192 + 1;
            buffer.limit(limit);
            buffer.position(position);
            assertEquals(reader.read(buffer), limit - position);
@AlanBateman (Contributor) left a comment

Thanks for rolling back to the direct buffer case, the updated implementation changes look okay.

@openjdk (bot) commented Apr 16, 2021

@marschall This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

4926314: Optimize Reader.read(CharBuffer)

Reviewed-by: alanb, bpb

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 524 new commits pushed to the master branch:

  • 7c37c02: 8244190: JFR: When starting a JVM with -XX:StartFlightRecording, output is written to stdout
  • 79adc16: 8264152: javax/net/ssl/DTLS/RespondToRetransmit.java timed out
  • 1c3fd46: 8265175: (fs) Files.copy(Path,Path,CopyOption...) should use sendfile on Linux
  • cee4f1d: 8203925: tools/javac/importscope/T8193717.java ran out of java heap
  • 694e1cd: 8262060: compiler/whitebox/BlockingCompilation.java timed out
  • 6946d91: 8075915: The eight controls without black backgrounds with WinLAF & GTK LAF & Nimbus LAF
  • 714298a: 8265259: G1: Fix HeapRegion::block_is_obj for unloading class in full gc
  • ff5bb8c: 8265239: Shenandoah: Shenandoah heap region count could be off by 1
  • 17b6592: 8265335: Epsilon: Minor typo in EpsilonElasticTLABDecay description
  • 10ec38f: 8262462: IGV: cannot remove specific groups imported via network
  • ... and 514 more: https://git.openjdk.java.net/jdk/compare/d339320e0b648e28bcc0c07801ae9376a33fc975...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@AlanBateman, @bplb) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready label Apr 16, 2021
@bplb (Member) approved these changes Apr 16, 2021 and left a comment

Approved modulo my "as they may" remarks on three comment lines in the tests.

5 participants