8255763: C2: OSR miscompilation caused by invalid memory instruction placement #22
Conversation
…placement Disable GCM hoisting of memory-writing nodes for irreducible CFGs. This prevents GCM from wrongly "hoisting" stores into descendants of their original loop. Such an "inverted hoisting" can happen due to CFGLoop::compute_freq()'s inaccurate estimation of frequencies for irreducible CFGs. Extend CFG verification code by checking that memory-writing nodes are placed in either their original loop or an ancestor. Add tests for the reducible and irreducible cases. The former was already handled correctly before the change (the frequency estimation model prevents "inverted hoisting" for reducible CFGs), and is just added for coverage. This change addresses the specific miscompilation issue in a conservative way, for simplicity and risk reduction. Future work includes discarding only illegal blocks as candidates for GCM hoisting, and refining frequency estimation for irreducible CFGs.
|
@robcasloz The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
/summary Extend CFG verification code by checking that memory-writing nodes are placed in Add tests for the reducible and irreducible cases. The former was already This change addresses the specific miscompilation issue in a conservative way, |
@robcasloz Setting summary to:
|
Tested on |
Tested for performance regressions on a set of standard benchmark suites (DaCapo, SPECjbb2005, SPECjvm2008, ...) and on windows-x64, linux-x64, and macosx-x64. No regression was observed, which can be expected since this change 1) only affects a minority of the compiled methods (those with irreducible CFGs), 2) only affects the placement of memory-writing nodes, which tends to be quite constrained already, and 3) forces the placement of these nodes, as much as possible, out of loops. |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes seems fine. And should be done regardless my following questions.
Is it possible to avoid "hoist inversion" if we take into account BCI and other information instead of only frequency? Is it possible compute_freq() take into account irreducible loops?
We could avoid it if
"We choose the block that is in the shallowest loop nest possible, and then is as control dependent as possible." and
Unfortunately, information about irreducible loops is currently discarded when building the CFG-loop tree: jdk16/src/hotspot/share/opto/gcm.cpp Lines 1657 to 1659 in ce0ab2d
I think so. compute_freq() computes the frequencies of each block within a loop L in a single forward pass (in reverse DFS postorder) over the members (blocks and loops) of L: jdk16/src/hotspot/share/opto/gcm.cpp Lines 1790 to 1808 in ce0ab2d
In this computation, it assumes that the members form an acyclic graph (except for L's back-edges). This assumption does not hold for irreducible graphs, where there might be additional cycles corresponding to non-natural loops. This could be refined by transforming the single forward pass into a proper system of equations to be solved iteratively. See some more details in the description field of the bug report. |
Thank you, @robcasloz I don't see link to testing results in bug report. |
Thank you Vladimir for looking at this, I agree with your suggested plan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing results are good.
@robcasloz This change now passes all automated pre-integration checks. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 34 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@vnkozlov, @chhagedorn) but any other Committer may sponsor as well.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice summary in the JBS issue! That looks good to me and I agree with Vladimir to do this fix in 16 and proceed with an RFE to further investigate the mentioned improvement possibilities.
Thanks for reviewing, Christian! |
/integrate |
@robcasloz |
Thanks for reviewing @vnkozlov and @chhagedorn! Requesting integration, since the only update after the reviews is a trivial style change (4f2763c). |
/sponsor |
@chhagedorn @robcasloz Since your change was applied there have been 34 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit 4e8338e. |
Disable GCM hoisting of memory-writing nodes for irreducible CFGs. This prevents GCM from wrongly "hoisting" stores into descendants of their original loop. Such an "inverted hoisting" can happen due to
CFGLoop::compute_freq()
's inaccurate estimation of frequencies for irreducible CFGs.Extend CFG verification code by checking that memory-writing nodes are placed in either their original loop or an ancestor.
Add tests for the reducible and irreducible cases. The former was already handled correctly before the change (the frequency estimation model prevents "inverted hoisting" for reducible CFGs), and is just added for coverage.
This change addresses the specific miscompilation issue in a conservative way, for simplicity and safety. Future work includes investigating if only the illegal blocks can be discarded as candidates for GCM hoisting, and refining frequency estimation for irreducible CFGs.
Progress
Issue
Reviewers
Download
$ git fetch https://git.openjdk.java.net/jdk16 pull/22/head:pull/22
$ git checkout pull/22