Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8328938: C2 SuperWord: disable vectorization for large stride and scale #495

Closed

Conversation

shipilev
Copy link
Member

@shipilev shipilev commented Apr 12, 2024

Unclean backport to prevent accidents in C2 loop optimizations. The patch is unclean, because JDK 21u misses major SuperWord refactorings. I applied the hunk by hand in the similar place, and also used slp->iv_stride() in one place to get this thing to work.

@eme64, if you want to take a look at this?

Additional testing:

  • New regression test fails without the patch, passes with it
  • Linux x86_64 server fastdebug, all
  • Linux x86_64 server fastdebug, 100K Fuzzer tests
  • Linux x86_64 server fastdebug, Maven CTW

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • JDK-8328938 needs maintainer approval

Issue

  • JDK-8328938: C2 SuperWord: disable vectorization for large stride and scale (Bug - P3 - Approved)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk21u-dev.git pull/495/head:pull/495
$ git checkout pull/495

Update a local copy of the PR:
$ git checkout pull/495
$ git pull https://git.openjdk.org/jdk21u-dev.git pull/495/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 495

View PR using the GUI difftool:
$ git pr show -t 495

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk21u-dev/pull/495.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Apr 12, 2024

👋 Welcome back shade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Apr 12, 2024

@shipilev This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8328938: C2 SuperWord: disable vectorization for large stride and scale

Reviewed-by: epeter, simonis

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 77 new commits pushed to the master branch:

  • 3bb8eee: 8317809: Insertion of free code blobs into code cache can be very slow during class unloading
  • a4d8d06: 8328744: Parallel: Parallel GC throws OOM before heap is fully expanded
  • 87d5da4: 8328822: C2: "negative trip count?" assert failure in profile predicate code
  • 3892078: 8324121: SIGFPE in PhaseIdealLoop::extract_long_range_checks
  • 3ab3f52: 8326974: ODR violation in macroAssembler_aarch64.cpp
  • 5d5c678: 8319713: Parallel: Remove PSAdaptiveSizePolicy::should_full_GC
  • 56a9090: 8312014: [s390x] TestSigInfoInHsErrFile.java Failure
  • c922c50: 8322962: Upcall stub might go undetected when freezing frames
  • cbd30c9: 8325095: C2: bailout message broken: ResourceArea allocated string used after free
  • 75ac9e8: 8328168: Epsilon: Premature OOM when allocating object larger than uncommitted heap size
  • ... and 67 more: https://git.openjdk.org/jdk21u-dev/compare/249d89632c8e78b301c419678081796bf273e285...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot changed the title Backport 2931458711244e20eb7845a1aefcf6ed4206bce1 8328938: C2 SuperWord: disable vectorization for large stride and scale Apr 12, 2024
@openjdk
Copy link

openjdk bot commented Apr 12, 2024

This backport pull request has now been updated with issue from the original commit.

@openjdk openjdk bot added the backport label Apr 12, 2024
@shipilev shipilev marked this pull request as ready for review April 15, 2024 08:07
@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 15, 2024
@mlbridge
Copy link

mlbridge bot commented Apr 15, 2024

Webrevs

Copy link

@eme64 eme64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shipilev that looks like the right thing to do, yes!
Thanks for the work :)

@shipilev
Copy link
Member Author

There is one new failure:
compiler/vectorization/runner/LoopCombinedOpTest.java

Some backport might be missing.

@eme64
Copy link

eme64 commented Apr 15, 2024

What kind of failure are we talking about? How does it manifest?

@shipilev
Copy link
Member Author

Like this:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f5b5c807f09, pid=46847, tid=46898
#
# JRE version: OpenJDK Runtime Environment (21.0.4) (fastdebug build 21.0.4-internal-adhoc.shipilev.shipilev-jdk21u-dev)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 21.0.4-internal-adhoc.shipilev.shipilev-jdk21u-dev, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x115bf09]  CountedLoopNode::stride_con() const+0x29
#

Stack: [0x00007f5b29b5b000,0x00007f5b29c5c000],  sp=0x00007f5b29c56a50,  free space=1006k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x115bf09]  CountedLoopNode::stride_con() const+0x29  (loopnode.cpp:2461)
V  [libjvm.so+0x169a609]  SWPointer::SWPointer(MemNode*, SuperWord*, Node_Stack*, bool) [clone .constprop.345]+0x439  (superword.hpp:364)
V  [libjvm.so+0x16a8cdd]  SuperWord::output()+0x5ed  (superword.cpp:2709)
V  [libjvm.so+0x16aad43]  SuperWord::SLP_extract()+0x353  (superword.cpp:667)
V  [libjvm.so+0x16aaff4]  SuperWord::transform_loop(IdealLoopTree*, bool)+0x254  (superword.cpp:178)
V  [libjvm.so+0x117728b]  PhaseIdealLoop::build_and_optimize()+0xf8b  (loopnode.cpp:4858)
V  [libjvm.so+0x991e5f]  PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x2ef  (loopnode.hpp:1125)
V  [libjvm.so+0x98ebbc]  Compile::Optimize()+0x115c  (compile.cpp:2165)
V  [libjvm.so+0x990992]  Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1a32  (compile.cpp:851)
V  [libjvm.so+0x8047b0]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x320  (c2compiler.cpp:119)
V  [libjvm.so+0x99b22d]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x47d  (compileBroker.cpp:2266)
V  [libjvm.so+0x99c120]  CompileBroker::compiler_thread_loop()+0x320  (compileBroker.cpp:1945)
V  [libjvm.so+0xddc2ab]  JavaThread::thread_main_inner()+0x1bb  (javaThread.cpp:719)
V  [libjvm.so+0x171de6f]  Thread::call_run()+0x12f  (thread.cpp:218)
V  [libjvm.so+0x1355b74]  thread_native_entry(Thread*)+0x114  (os_linux.cpp:808)

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000000

Registers:
RAX=0x0000000000000000, RBX=0x00007f5ab412e990, RCX=0x00007f5abc1fdaa0, RDX=0x00007f5ab412ea80
RSP=0x00007f5b29c56a50, RBP=0x00007f5b29c56a70, RSI=0x0000000000000001, RDI=0x00007f5ab412e990
R8 =0x00007f5b29c56e90, R9 =0x0000000000000000, R10=0x0000000000000001, R11=0x00007f5b5d8faa84
R12=0x00007f5b29c56e90, R13=0x00007f5b5bfdb000, R14=0x00007f5abc049980, R15=0x00007f5b29c56b90
RIP=0x00007f5b5c807f09, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000004
  TRAPNO=0x000000000000000e

I think it is from this PR addition of slp->iv_stride(), with SuperWord::_lp is not initialized yet. Hm. The refactorings gave us VLoop as the carrier object in mainline, I wonder how it should look in 21u then.

@eme64
Copy link

eme64 commented Apr 15, 2024

Ah, fair enough. Initialization has changed with all my refactorings. But this must be solvable I think.

@eme64
Copy link

eme64 commented Apr 15, 2024

But wait. How can _lp not be initialized yet, if we are already in SuperWord::output? SuperWord::set_lp should have been called in SuperWord::transform_loop.

@shipilev
Copy link
Member Author

shipilev commented Apr 15, 2024

Actually, more debugging shows that stride() == nullptr here:

return stride()->bottom_type()->is_integer(bt())->get_con_as_long(bt());

It might sound like another instance of https://bugs.openjdk.org/browse/JDK-8299975 or some such, hm. There is only one test failing, so it likely depends on some related optimizations happening before we hit this code.

@eme64
Copy link

eme64 commented Apr 16, 2024

Hmm, I see. So slp->lp() != nullptr in this failing case? If so, you could just check if the stride() == null, and use that as an additional "fail condition", and a guard for getting slp->iv_stride(). That is not the prettiest, but it might work.

Copy link

@eme64 eme64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for the explanation.

@shipilev
Copy link
Member Author

shipilev commented Apr 17, 2024

New revision passes the previously failing tests, and is still sensitive to the original regression test. Re-running all testing now... Update: passes all tests.

Copy link
Member

@simonis simonis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@openjdk
Copy link

openjdk bot commented Apr 18, 2024

⚠️ @shipilev This change is now ready for you to apply for maintainer approval. This can be done directly in each associated issue or by using the /approval command.

@shipilev
Copy link
Member Author

Thanks!

@shipilev
Copy link
Member Author

/approval request Unclean backport to prevent accidents in C2 loop optimizations. The patch is unclean, because JDK 21u misses major SuperWord refactorings; 21u PR acked by Emanuel and Volker. Passes all aggressive compiler testing. Risk is usual for C2 changes, but on the lower side, as it bails out of optimizations cleanly; minor risk of performance regressions.

@openjdk
Copy link

openjdk bot commented Apr 18, 2024

@shipilev
8328938: The approval request has been created successfully.

@openjdk openjdk bot added the approval label Apr 18, 2024
@GoeLin
Copy link
Member

GoeLin commented May 2, 2024

Hi @shipilev, does the backport to 22 require a third version? Or would it be clean, or would the 21 version work for 22?

@shipilev
Copy link
Member Author

shipilev commented May 2, 2024

Hi @shipilev, does the backport to 22 require a third version? Or would it be clean, or would the 21 version work for 22?

Yes, JDK 22 version would be a third variant. Neither JDK 23 nor JDK 21 version works there cleanly. So I would prefer to just leave JDK 22 be.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 3, 2024
@openjdk openjdk bot removed the approval label May 3, 2024
@shipilev
Copy link
Member Author

shipilev commented May 6, 2024

Thanks all!

/integrate

@openjdk
Copy link

openjdk bot commented May 6, 2024

Going to push as commit 2b858f5.
Since your change was applied there have been 84 commits pushed to the master branch:

  • 9159882: 8310513: [s390x] Intrinsify recursive ObjectMonitor locking
  • 3ff5359: 8330011: [s390x] update block-comments to make code consistent
  • abbad92: 8326201: [S390] Need to bailout cleanly if creation of stubs fails when code cache is out of space
  • 3770c28: 8331331: :tier1 target explanation in doc/testing.md is incorrect
  • 021372c: 8328703: Illegal accesses in Java_jdk_internal_org_jline_terminal_impl_jna_linux_CLibraryImpl_ioctl0
  • d459ae9: 8329850: [AIX] Allow loading of different members of same shared library archive
  • 835d016: 8330094: RISC-V: Save and restore FRM in the call stub
  • 3bb8eee: 8317809: Insertion of free code blobs into code cache can be very slow during class unloading
  • a4d8d06: 8328744: Parallel: Parallel GC throws OOM before heap is fully expanded
  • 87d5da4: 8328822: C2: "negative trip count?" assert failure in profile predicate code
  • ... and 74 more: https://git.openjdk.org/jdk21u-dev/compare/249d89632c8e78b301c419678081796bf273e285...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label May 6, 2024
@openjdk openjdk bot closed this May 6, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels May 6, 2024
@openjdk
Copy link

openjdk bot commented May 6, 2024

@shipilev Pushed as commit 2b858f5.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

4 participants