-
Notifications
You must be signed in to change notification settings - Fork 5.8k
8300256: C2: vectorization is sometimes skipped on loops where it would succeed #12116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Welcome back roland! A progress list of the required criteria for merging this PR into |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one comment.
@@ -3990,7 +3990,6 @@ void IdealLoopTree::dump_head() const { | |||
if (cl->is_post_loop()) tty->print(" post"); | |||
if (cl->is_reduction_loop()) tty->print(" reduction"); | |||
if (cl->is_vectorized_loop()) tty->print(" vector"); | |||
if (cl->range_checks_present()) tty->print(" rc "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use new range_checks_present()
here? The information is still useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reviewing this.
Indeed, you're right. I updated the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good.
@rwestrel This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 32 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Maybe add a @bug 8300256
to the test.
Thanks for the review. Yes, I will do that. |
FTR, the test fails on x86 (32 bits) for some reason. I changed the test so it doesn't run on x86 (32 bits). |
/integrate |
Going to push as commit 2a8ae2f.
Your commit was automatically rebased without conflicts. |
Vectorization for a counted loop cl only proceeds if
cl->range_checks_present() returns true. The result of that method is
computed lazily and its result cached in the CountedLoopNode and never
re-computed. If PhaseIdealLoop::do_range_check() returns 0 then the
result of that computation is overwritten (no range checks
present). PhaseIdealLoop::do_range_check() counts the number of tests
present in the loop body (which is really what range_checks_present()
is about) and decrements that count for every check it eliminates
except if it's not a comparison with a LoadRange (for a reason that I
don't understand). In the case of the test (a pattern from a
ByteBuffer benchmark), not all tests are with a LoadRange. As a
result, PhaseIdealLoop::do_range_check() returns non zero even though
it eliminates all tests. As a result, vectorization is never
attempted.
There doesn't seem to be a value in caching the result of
range_checks_present() in CountedLoopNode. It's not that expensive to
compute, it's only used during loop opts and it's really hard to keep
in sync with whether the loop has still tests: several different
transformations could remove a test. What I propose instead is to keep
roughly the same approach (compute the result lazily and cache it so
it doesn't have to be re-computed) but to store it on the
IdealLoopTree instead (so it's recomputed on every loop opts pass and
there's no risk that it becomes out of sync).
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/12116/head:pull/12116
$ git checkout pull/12116
Update a local copy of the PR:
$ git checkout pull/12116
$ git pull https://git.openjdk.org/jdk pull/12116/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 12116
View PR using the GUI difftool:
$ git pr show -t 12116
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12116.diff