8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder #12077

Closed
wants to merge 2 commits into from

Conversation

cl4es
Member

@cl4es cl4es commented Jan 18, 2023

ZipCoder::checkedHashCode emulates StringLatin1::hashCode but operates on a byte[] subrange. It can profitably use the recently introduced ArraysSupport::vectorizedHashCode method to see a speed-up, which translates to a small but significant speed-up on ZipFile creation.

Before:

Benchmark                     (size)  Mode  Cnt       Score      Error  Units
ZipFileOpen.openCloseZipFile     512  avgt   15   83007.325 ± 1446.716  ns/op
ZipFileOpen.openCloseZipFile    1024  avgt   15  154550.631 ± 2166.673  ns/op

After:

Benchmark                     (size)  Mode  Cnt       Score      Error  Units
ZipFileOpen.openCloseZipFile     512  avgt   15   79512.902 ±  814.449  ns/op
ZipFileOpen.openCloseZipFile    1024  avgt   15  147892.522 ± 2744.017  ns/op
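
For readers unfamiliar with the hash in question, here is a minimal scalar sketch of a 31-based polynomial hash over a byte[] subrange with bytes treated as unsigned, which is the shape of computation that StringLatin1::hashCode performs. The class and method names (Latin1HashSketch, hash) are hypothetical and this is not the ZipCoder source; the internal ArraysSupport::vectorizedHashCode is deliberately not called, since it is not accessible from ordinary application code.

```java
import java.nio.charset.StandardCharsets;

public class Latin1HashSketch {
    // Scalar 31-based polynomial hash over ba[off, off + len).
    // Each byte is masked with 0xff so it is treated as an unsigned value,
    // matching how a Latin-1 encoded String maps bytes to chars.
    public static int hash(byte[] ba, int off, int len) {
        int h = 0;
        for (int i = off; i < off + len; i++) {
            h = 31 * h + (ba[i] & 0xff);
        }
        return h;
    }

    public static void main(String[] args) {
        // Hypothetical entry name containing a non-ASCII Latin-1 character.
        String name = "META-INF/caf\u00e9.txt";
        // Embed the name in a larger buffer to exercise the subrange handling.
        byte[] buf = ("xx" + name + "yy").getBytes(StandardCharsets.ISO_8859_1);
        int h = hash(buf, 2, name.length());
        System.out.println("matches String.hashCode: " + (h == name.hashCode()));
    }
}
```

Because String.hashCode is specified as the same polynomial over char values, the subrange hash should agree with the hash of the equivalent Latin-1 String, which is what the main method checks.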

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/12077/head:pull/12077
$ git checkout pull/12077

Update a local copy of the PR:
$ git checkout pull/12077
$ git pull https://git.openjdk.org/jdk pull/12077/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 12077

View PR using the GUI difftool:
$ git pr show -t 12077

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12077.diff

@bridgekeeper

bridgekeeper bot commented Jan 18, 2023

👋 Welcome back redestad! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jan 18, 2023
@openjdk

openjdk bot commented Jan 18, 2023

@cl4es The following labels will be automatically applied to this pull request:

  • core-libs
  • security

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added security security-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Jan 18, 2023
@mlbridge

mlbridge bot commented Jan 18, 2023

Webrevs

@AlanBateman
Contributor

Using countPositives, and vectorizedHashCode(T_BOOLEAN) for unsigned bytes, makes sense here. I don't have time to study the micro right now.
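
To illustrate why the unsigned treatment matters, the following sketch uses a scalar stand-in for a countPositives-style check (counting the leading non-negative bytes of a range) and compares signed and unsigned hashing. All names here (CountPositivesSketch, countPositives, signedHash, unsignedHash) are hypothetical helpers written for this example; they only approximate the behaviour of the internal JDK methods and are not the intrinsics themselves.

```java
import java.nio.charset.StandardCharsets;

public class CountPositivesSketch {
    // Scalar stand-in: number of leading bytes in ba[off, off + len) that are >= 0.
    // Returns len when every byte in the range is non-negative (pure ASCII).
    public static int countPositives(byte[] ba, int off, int len) {
        for (int i = off; i < off + len; i++) {
            if (ba[i] < 0) {
                return i - off;
            }
        }
        return len;
    }

    // Hash that sign-extends each byte (wrong for Latin-1 bytes >= 0x80).
    public static int signedHash(byte[] ba) {
        int h = 0;
        for (byte b : ba) {
            h = 31 * h + b;
        }
        return h;
    }

    // Hash that treats each byte as unsigned.
    public static int unsignedHash(byte[] ba) {
        int h = 0;
        for (byte b : ba) {
            h = 31 * h + (b & 0xff);
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] ascii = "README.md".getBytes(StandardCharsets.ISO_8859_1);
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("ascii all positive: "
                + (countPositives(ascii, 0, ascii.length) == ascii.length));
        System.out.println("ascii hashes agree: "
                + (signedHash(ascii) == unsignedHash(ascii)));
        System.out.println("latin1 leading positives: "
                + countPositives(latin1, 0, latin1.length));
        System.out.println("latin1 hashes agree: "
                + (signedHash(latin1) == unsignedHash(latin1)));
    }
}
```

For pure ASCII input the two hashes coincide, so the distinction only shows up once a byte with the high bit set appears in the name.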

@cl4es
Member Author

cl4es commented Jan 19, 2023

FWIW the micro is derived from the sibling ZipFileGetEntry micro in the same directory. It's not exactly necessary for this use case to add such a benchmark, but I think there's value in verifying that optimizing checkedHash improves ZipFile setup and adding the micro might allow us to find further opportunities down the line.

@AlanBateman
Contributor

> FWIW the micro is derived from the sibling ZipFileGetEntry micro in the same directory. It's not exactly necessary for this use case to add such a benchmark, but I think there's value in verifying that optimizing checkedHash improves ZipFile setup and adding the micro might allow us to find further opportunities down the line.

I've no doubt it improves checkedHash, it's just that opening the zip file and reading in the CEN probably dominates when not in the file system cache.

@openjdk

openjdk bot commented Jan 20, 2023

@cl4es This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder

Reviewed-by: alanb, lancea

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 80 new commits pushed to the master branch:

  • 92d8326: 8299827: Add resolved IP address in connection exception for sockets
  • c6d5600: 8038146: Clarify Map.Entry's connection to the underlying map
  • b2d3622: 8299896: Reduce enum values of HtmlLinkInfo.Kind
  • 623ba5b: 8300653: G1EvacInfo should use common naming scheme for collection set
  • 97c611d: 8289748: C2 compiled code crashes with SIGFPE with -XX:+StressLCM and -XX:+StressGCM
  • 4562b40: 8300682: InstanceKlassMiscStatus is a bad name
  • 26410c1: 8281213: Unsafe uses of long and size_t in MemReporterBase::diff_in_current_scale
  • eca6479: 8300087: Replace NULL with nullptr in share/cds/
  • 49d60fe: 8300172: java/net/httpclient/MappingResponseSubscriber.java failed with java.net.ConnectException
  • e189397: 8296403: [TESTBUG] IR test runner methods in TestLongRangeChecks.java invoke wrong test methods
  • ... and 70 more: https://git.openjdk.org/jdk/compare/89a032dc057d04c996632ad317a0303cf3560852...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jan 20, 2023
zos.putNextEntry(new ZipEntry(ename));

ename += "long-entry-name-" + (random.nextInt(90000) + 10000) + "-" + i;
zos.putNextEntry(new ZipEntry(ename));
Contributor

Would it be worth adding some randomly sized data when the entries are created to get a bit more insight, or perhaps doing that in a separate benchmark?

Member Author

I was experimenting with varying the length of the entry names to see what impact, if any, it had, and saw a small effect on the micro. It does make more sense to vary lengths now that very long names will take different paths in the vectorized intrinsic. I'll see what I can do without overengineering this.

@cl4es
Member Author

cl4es commented Jan 20, 2023

> > FWIW the micro is derived from the sibling ZipFileGetEntry micro in the same directory. It's not exactly necessary for this use case to add such a benchmark, but I think there's value in verifying that optimizing checkedHash improves ZipFile setup and adding the micro might allow us to find further opportunities down the line.

> I've no doubt it improves checkedHash, it's just that opening the zip file and reading in the CEN probably dominates when not in the file system cache.

Right, the micro is a poor proxy for real-world implications, since the time to open a zip file very much depends on the filesystem speed, but this is sort of by design. We have separate startup tests that try to emulate more "cold start" scenarios, which micros like this are complementary to and not a substitute for.

…ntries short but making the longest paths longer
@cl4es
Member Author

cl4es commented Jan 20, 2023

Updated micro to vary entry sizes more to ensure we exercise the different code paths through the hashCode intrinsic. The new setup generates both longer and shorter entries than before, weighting up the average length a bit by increasing the spread. The longer entries see a proportionately larger speed-up, as expected since they benefit from vectorization. Removed some pointless randomness.

Baseline:

Benchmark                     (size)  Mode  Cnt       Score      Error  Units
ZipFileOpen.openCloseZipFile     512  avgt   15   98832.801 ± 2155.928  ns/op
ZipFileOpen.openCloseZipFile    1024  avgt   15  187373.545 ± 2296.779  ns/op

Patched:

Benchmark                     (size)  Mode  Cnt       Score      Error  Units
ZipFileOpen.openCloseZipFile     512  avgt   15   85574.648 ±  920.477  ns/op
ZipFileOpen.openCloseZipFile    1024  avgt   15  160493.277 ± 3450.928  ns/op
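
As a rough illustration of the kind of setup described above, the sketch below builds a zip file whose entry names vary from short to long and then times a single open/close of it. It is not the JMH micro from this PR: the class name ZipOpenSketch, the naming scheme, and the single-shot System.nanoTime timing are all assumptions made for this example, and a one-off timing like this is far noisier than a proper benchmark harness.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ZipOpenSketch {
    // Creates a temporary zip with `size` empty entries. The naming scheme is
    // hypothetical: a path segment is repeated 0..7 times so that names range
    // from short to long.
    public static Path createZip(int size) throws IOException {
        Path zip = Files.createTempFile("zip-open-sketch", ".zip");
        try (OutputStream out = Files.newOutputStream(zip);
             ZipOutputStream zos = new ZipOutputStream(out)) {
            for (int i = 0; i < size; i++) {
                String ename = "long-path-segment/".repeat(i % 8) + "entry-" + i;
                zos.putNextEntry(new ZipEntry(ename));
                zos.closeEntry();
            }
        }
        return zip;
    }

    // Opens and closes the zip file, returning the number of entries seen.
    public static int openClose(Path zip) throws IOException {
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            return zf.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path zip = createZip(1024);
        try {
            long start = System.nanoTime();
            int entries = openClose(zip);
            long elapsed = System.nanoTime() - start;
            System.out.println(entries + " entries, open/close took " + elapsed + " ns");
        } finally {
            Files.deleteIfExists(zip);
        }
    }
}
```

Opening a ZipFile reads the central directory and hashes every entry name, which is why the spread of name lengths influences how much of the time is spent in the hash computation.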

Contributor

@LanceAndersen LanceAndersen left a comment

Thank you Claes

@cl4es
Member Author

cl4es commented Jan 21, 2023

Thanks @LanceAndersen and @AlanBateman for reviewing!

/integrate

@openjdk

openjdk bot commented Jan 21, 2023

Going to push as commit bb42e61.
Since your change was applied there have been 88 commits pushed to the master branch:

  • 06394ee: 8300590: [JVMCI] BytecodeFrame.equals is broken
  • 5331a3e: 8298908: Instrument Metaspace for ASan
  • e1ee672: 8300725: Improve performance of ColorConvertOp for default destinations with alpha
  • 7c2f77a: 8300584: Accelerate AVX-512 CRC32C for small buffers
  • 5784eb7: 8300721: Cleanup ProblemList-svc-vthread.txt
  • 9d44dd0: 8297972: Poly1305 Endianness on ByteBuffer not enforced
  • facd415: 8297757: VarHandles.getStaticFieldFromBaseAndOffset should get the receiver type from VarHandle
  • e803855: 8299863: URLFromURITest.java should import org.junit.jupiter.api.Test
  • 92d8326: 8299827: Add resolved IP address in connection exception for sockets
  • c6d5600: 8038146: Clarify Map.Entry's connection to the underlying map
  • ... and 78 more: https://git.openjdk.org/jdk/compare/89a032dc057d04c996632ad317a0303cf3560852...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jan 21, 2023
@openjdk openjdk bot closed this Jan 21, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jan 21, 2023
@openjdk

openjdk bot commented Jan 21, 2023

@cl4es Pushed as commit bb42e61.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated security security-dev@openjdk.org
3 participants