Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8254078: DataOutputStream is very slow post-disabling of Biased Locking #542

Closed
wants to merge 8 commits into from

Conversation

theRealAph
Copy link
Contributor

@theRealAph theRealAph commented Oct 7, 2020

DataOutputStream is very slow post-disabling of Biased Locking. This
was discovered when benchmarking a transaction library, which showed
significant performance loss when moving to JDK 15. WIth some small
changes to DataOutputStream we can get the performance back. There's a
JMH benchmark at
http://cr.openjdk.java.net/~aph/JDK-8254078/jmh-tests.tar

Some Stream classes use very fine-grained locking.

In particular, writeInt is defined like this:

    out.write((v >>> 24) & 0xFF);
    out.write((v >>> 16) & 0xFF);
    out.write((v >>> 8) & 0xFF);
    out.write((v >>> 0) & 0xFF);
    incCount(4);

Unfortunately, ByteArrayOutputStream.write(byte) is defined like this:

public synchronized void write(int b) {
    ensureCapacity(count + 1);
    buf[count] = (byte) b;
    count += 1;
}

so we acquire and release a lock for every byte that is output.

For example, writing 4kb of ints goes from 17.3 us/op to 53.9 us/op when biased locking is disabled:

+UseBiasedLocking DataOutputStreamTest.dataOutputStreamOverByteArray avgt 6 53.895 ± 5.126 us/op
-UseBiasedLocking DataOutputStreamTest.dataOutputStreamOverByteArray avgt 6 17.291 ± 4.430 us/op

There are refactorings of DataOutputStream we can do to mitigate this.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8254078: DataOutputStream is very slow post-disabling of Biased Locking

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/542/head:pull/542
$ git checkout pull/542

@bridgekeeper
Copy link

bridgekeeper bot commented Oct 7, 2020

👋 Welcome back aph! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 7, 2020
@openjdk
Copy link

openjdk bot commented Oct 7, 2020

@theRealAph The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Oct 7, 2020
@theRealAph
Copy link
Contributor Author

theRealAph commented Oct 7, 2020

jdk, with option +UseBiasedLocking:

Benchmark                                                    (BASIC_TYPE)  (KIND)  (SIZE)  Mode  Cnt     Score    Error  Units
DataOutputStreamTest.dataOutputStreamOverBufferedFileStream          char     old    4096  avgt    6    45.145 ± 38.769  us/op
DataOutputStreamTest.dataOutputStreamOverBufferedFileStream         short     old    4096  avgt    6    54.226 ±  0.872  us/op
DataOutputStreamTest.dataOutputStreamOverBufferedFileStream           int     old    4096  avgt    6    56.863 ±  1.264  us/op
DataOutputStreamTest.dataOutputStreamOverByteArray                   char     old    4096  avgt    6    20.369 ±  0.462  us/op
DataOutputStreamTest.dataOutputStreamOverByteArray                  short     old    4096  avgt    6    20.272 ±  0.214  us/op
DataOutputStreamTest.dataOutputStreamOverByteArray                    int     old    4096  avgt    6    17.625 ±  0.142  us/op
DataOutputStreamTest.dataOutputStreamOverRawFileStream               char     old    4096  avgt    6  3045.074 ± 79.539  us/op
DataOutputStreamTest.dataOutputStreamOverRawFileStream              short     old    4096  avgt    6  3133.192 ± 74.831  us/op
DataOutputStreamTest.dataOutputStreamOverRawFileStream                int     old    4096  avgt    6  2988.553 ± 29.789  us/op

jdk, default -UseBiasedLocking:

DataOutputStreamTest.dataOutputStreamOverBufferedFileStream          char     old    4096  avgt    6    58.267 ±  1.070  us/op
DataOutputStreamTest.dataOutputStreamOverBufferedFileStream         short     old    4096  avgt    6    57.875 ±  2.386  us/op
DataOutputStreamTest.dataOutputStreamOverBufferedFileStream           int     old    4096  avgt    6    58.393 ±  0.405  us/op
DataOutputStreamTest.dataOutputStreamOverByteArray                   char     old    4096  avgt    6    50.269 ± 14.233  us/op
DataOutputStreamTest.dataOutputStreamOverByteArray                  short     old    4096  avgt    6    49.006 ± 11.184  us/op
DataOutputStreamTest.dataOutputStreamOverByteArray                    int     old    4096  avgt    6    52.905 ±  5.153  us/op
DataOutputStreamTest.dataOutputStreamOverRawFileStream               char     old    4096  avgt    6  3053.183 ± 89.999  us/op
DataOutputStreamTest.dataOutputStreamOverRawFileStream              short     old    4096  avgt    6  2986.694 ± 57.205  us/op
DataOutputStreamTest.dataOutputStreamOverRawFileStream                int     old    4096  avgt    6  3061.812 ± 57.636  us/op

jdk, default -UseBiasedLocking, patched:

Benchmark                                                    (BASIC_TYPE)  (KIND)  (SIZE)  Mode  Cnt     Score    Error  Units
DataOutputStreamTest.dataOutputStreamOverBufferedFileStream          char     old    4096  avgt    6    28.911 ±  0.529  us/op
DataOutputStreamTest.dataOutputStreamOverBufferedFileStream         short     old    4096  avgt    6    28.895 ±  0.286  us/op
DataOutputStreamTest.dataOutputStreamOverBufferedFileStream           int     old    4096  avgt    6    15.157 ±  2.870  us/op
DataOutputStreamTest.dataOutputStreamOverByteArray                   char     old    4096  avgt    6    29.251 ±  0.291  us/op
DataOutputStreamTest.dataOutputStreamOverByteArray                  short     old    4096  avgt    6    29.712 ±  0.230  us/op
DataOutputStreamTest.dataOutputStreamOverByteArray                    int     old    4096  avgt    6    15.703 ±  0.265  us/op
DataOutputStreamTest.dataOutputStreamOverRawFileStream               char     old    4096  avgt    6  1704.795 ± 36.028  us/op
DataOutputStreamTest.dataOutputStreamOverRawFileStream              short     old    4096  avgt    6  1731.482 ± 21.304  us/op
DataOutputStreamTest.dataOutputStreamOverRawFileStream                int     old    4096  avgt    6   861.758 ± 18.296  us/op

@mlbridge
Copy link

mlbridge bot commented Oct 7, 2020

Webrevs

Copy link
Contributor

@RogerRiggs RogerRiggs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a pre-existing race condition on use of the writeBuffer.
It is allocated per DataOutputStream but is unsynchronized.
Only the write(byte) and write(byte[], off, len) methods are synchronized.
Will that be/become a problem?

@theRealAph
Copy link
Contributor Author

theRealAph commented Oct 7, 2020

There is a pre-existing race condition on use of the writeBuffer.
It is allocated per DataOutputStream but is unsynchronized.
Only the write(byte) and write(byte[], off, len) methods are synchronized.
Will that be/become a problem?

Hard to say. There isn't a guarantee in the specification. After my patch, writeInt is

    public final void writeInt(int v) throws IOException {
        writeBuffer[0] = (byte)(v >>> 24);
        writeBuffer[1] = (byte)(v >>> 16);
        writeBuffer[2] = (byte)(v >>>  8);
        writeBuffer[3] = (byte)(v >>>  0);
        out.write(writeBuffer, 0, 4);
        incCount(4);
    }

whereas writeLong() is

    public final void writeLong(long v) throws IOException {
        writeBuffer[0] = (byte)(v >>> 56);
        writeBuffer[1] = (byte)(v >>> 48);
        writeBuffer[2] = (byte)(v >>> 40);
        writeBuffer[3] = (byte)(v >>> 32);
        writeBuffer[4] = (byte)(v >>> 24);
        writeBuffer[5] = (byte)(v >>> 16);
        writeBuffer[6] = (byte)(v >>>  8);
        writeBuffer[7] = (byte)(v >>>  0);
        out.write(writeBuffer, 0, 8);
        incCount(8);
    }

I know there's a history of Java programmers writing to the implementation, not the specification, but ISTM that if writeLong() can be written this way so can writeInt().

@AlanBateman
Copy link
Contributor

DataOutputStream is a JDK 1.0 era class that doesn't specify whether it is safe for use by multiple threads or not. Some methods are synchronized, some are not. The writeBuffer used by writeLong means it is not thread safe. The writeShort and writeInt methods aren't thread safe so changing them to use the writeBuffer doesn't make things any worse. Other examples are the "bytearr" used by writeUTF, the writeChars and writeBytes methods, and the protected "written" field that sub-classes can use to monitor the number of bytes written.

It might be helpful to add a statement to the DataOutputStream class description to say that it not safe for by concurrent threads and it needs appropriate synchronization to coordinate multiple writers. This issue or another issue doesn't matter.

Can the benchmark be turned into a microbenchmark for test/micro?

@theRealAph
Copy link
Contributor Author

It might be helpful to add a statement to the DataOutputStream class description to say that it not safe for by concurrent threads and it needs appropriate synchronization to coordinate multiple writers. This issue or another issue doesn't matter.

I see.

Can the benchmark be turned into a microbenchmark for test/micro?

Now there's an interesting question. I can tweak the benchmark (which at present contains a modified version of DataOutputSteam, so can compare and contrast) so that it measures only the performance of the core library's implementation. Does that make sense? If so, I can add it to this PR.

@RogerRiggs
Copy link
Contributor

The benchmark can be run with different JDKs, so the modified DOS can be omitted.

@theRealAph
Copy link
Contributor Author

It turns out that DataInputStream already has a comment about thread safety being the user's problem, so I copied that comment to DataOutputStream.

@theRealAph
Copy link
Contributor Author

/test

@openjdk
Copy link

openjdk bot commented Oct 7, 2020

@theRealAph you need to get approval to run the tests in tier1 for commits up until 8379b4f

@openjdk openjdk bot added the test-request label Oct 7, 2020
@AlanBateman
Copy link
Contributor

/csr

@openjdk openjdk bot added the csr Pull request needs approved CSR before integration label Oct 8, 2020
@openjdk
Copy link

openjdk bot commented Oct 8, 2020

@AlanBateman has indicated that a compatibility and specification (CSR) request is needed for this pull request.
@theRealAph please create a CSR request and add link to it in JDK-8254078. This pull request cannot be integrated until the CSR request is approved.

Copy link
Member

@shipilev shipilev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have comments about the benchmark code style :)

Copy link
Contributor Author

@theRealAph theRealAph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have comments about the benchmark code style :)

Wow, thanks! I didn't know that JMH could do this, I thought you had to jump through the BenchmarkState hoops. This way is much better.

Copy link
Member

@shipilev shipilev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmark code looks good, except a few remaining nits.


@Param({"4096"}) int size;
final ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(size);
final File f = new File("DataOutputStreamTest.tmp");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be File.createTempFile("DataOutputStreamTest", "out")? This way you don't leave waste in current dir.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can probably move all these initializations to setup(). It does not look like the fields final-ity matter for this benchmark.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can probably move all these initializations to setup(). It does not look like the fields final-ity matter for this benchmark.

This conflicts with other desiderata, i.e. that fields should be final and if a field can be initialized it should be. And it makes the benchmark longer. I prefer to pass on this one.

* Thread safety is optional and is the responsibility of users of
* methods in this class.
* A DataInputStream is not safe for use by multiple concurrent
* threads. If a DataOutputStream is to be used by more than one
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Andrew, just noticed a typo here: DataOutputStream
best regards,
-- daniel

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was a comment on the previous iteration too. Also s/output/input/.

* methods in this class.
* A DataInputStream is not safe for use by multiple concurrent
* threads. If a DataInputStream is to be used by more than one
* thread then access to the data output stream should be controlled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"data output stream" should be "data input stream".

@openjdk openjdk bot removed the csr Pull request needs approved CSR before integration label Oct 25, 2020
@openjdk
Copy link

openjdk bot commented Oct 25, 2020

@theRealAph This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8254078: DataOutputStream is very slow post-disabling of Biased Locking

Reviewed-by: rriggs, shade, alanb

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 310 new commits pushed to the master branch:

  • 6545e19: 8255078: sun/net/ftp/imp/FtpClient$MLSxParser uses wrong datetime format
  • 3f6abd2: 8255330: gtest/MetaspaceGtests.java fail on 32-bit platforms
  • cf821b0: 8232024: Don't pollute log output with multiple errors of the same type
  • 185c8bc: 8255338: CodeSections are never frozen
  • cc86113: 8255271: Avoid generating duplicate interpreter entries for subword types
  • 5ec1b80: 8255287: aarch64: fix SVE patterns for vector shift count
  • 12daf2b: 8255340: [s390] build failure after JDK-8255208
  • 9007bc2: 8255129: [PPC64, s390] Check vector_size_supported and add VectorReinterpret node
  • df79257: 8255274: [PPC64, s390] wrong StringLatin1.indexOf version matched
  • 55a0cad: 8254982: (tz) Upgrade time-zone data to tzdata2020c
  • ... and 300 more: https://git.openjdk.java.net/jdk/compare/51fdb4c0392aee5d817765d59cb6b9ec27f0050b...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 25, 2020
@theRealAph
Copy link
Contributor Author

/integrate

@openjdk openjdk bot closed this Nov 9, 2020
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Nov 9, 2020
@openjdk
Copy link

openjdk bot commented Nov 9, 2020

@theRealAph Since your change was applied there have been 554 commits pushed to the master branch:

  • 79b7909: 8255980: G1 Service thread register_task can be used after shutdown
  • dd8e4ff: 8255711: Fix and unify hotspot signal handlers
  • d99e1f6: 8255991: Shenandoah: Apply 'weak' LRB on cmpxchg and xchg
  • c7551c3: 8256014: Eliminate the warning about serialization in non-public API of Swing
  • 2d6c28d: 6847157: java.lang.NullPointerException: HDC for component at sun.java2d.loops.Blit.Blit
  • 3ce09c0: 8255920: J2DBench should support CS_PYCC color profile
  • 2c8f4e2: 8255799: AArch64: CPU_A53MAC feature may be set incorrectly
  • 2cad836: 8255575: java.awt.color.ICC_ColorSpace is not thread-safe
  • a53b12d: 8255722: Create a new test for rotated blit
  • f39a2c8: 8256015: Shenandoah: Add missing Shenandoah implementation in WB_isObjectInOldGen
  • ... and 544 more: https://git.openjdk.java.net/jdk/compare/51fdb4c0392aee5d817765d59cb6b9ec27f0050b...master

Your commit was automatically rebased without conflicts.

Pushed as commit 17f04fc.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk-notifier bot referenced this pull request Nov 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated test-request
Development

Successfully merging this pull request may close these issues.

5 participants