This repository has been archived by the owner on Sep 2, 2022. It is now read-only.

8255763: C2: OSR miscompilation caused by invalid memory instruction placement #22

Closed
wants to merge 3 commits into from

Conversation

robcasloz
Contributor

@robcasloz robcasloz commented Dec 15, 2020

Disable GCM hoisting of memory-writing nodes for irreducible CFGs. This prevents GCM from wrongly "hoisting" stores into descendants of their original loop. Such an "inverted hoisting" can happen due to CFGLoop::compute_freq()'s inaccurate estimation of frequencies for irreducible CFGs.

Extend CFG verification code by checking that memory-writing nodes are placed in either their original loop or an ancestor.

Add tests for the reducible and irreducible cases. The former was already handled correctly before the change (the frequency estimation model prevents "inverted hoisting" for reducible CFGs), and is just added for coverage.

This change addresses the specific miscompilation issue in a conservative way, for simplicity and safety. Future work includes investigating if only the illegal blocks can be discarded as candidates for GCM hoisting, and refining frequency estimation for irreducible CFGs.
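The placement rule that the extended verification checks (a memory-writing node may end up only in its original loop or in an ancestor of that loop) can be illustrated with a minimal sketch. `LoopNode` and `is_same_or_ancestor` are hypothetical names introduced for illustration; they stand in for C2's loop tree and are not the code added to `PhaseCFG::verify()`.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a node of the CFG-loop tree.
struct LoopNode {
  const LoopNode* parent;  // enclosing loop, or nullptr for the root
};

// Returns true if 'candidate' is 'original' itself or one of its ancestors
// in the loop tree, i.e. a legal home loop for a memory-writing node that
// started out in 'original'.
bool is_same_or_ancestor(const LoopNode* candidate, const LoopNode* original) {
  assert(candidate != nullptr && original != nullptr);
  // Walk from the original loop towards the root of the loop tree.
  for (const LoopNode* l = original; l != nullptr; l = l->parent) {
    if (l == candidate) {
      return true;
    }
  }
  return false;
}
```

Under this sketch, an "inverted hoisting" corresponds to a candidate loop that is a descendant of the original loop, for which the check returns false.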


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8255763: C2: OSR miscompilation caused by invalid memory instruction placement

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk16 pull/22/head:pull/22
$ git checkout pull/22

robcasloz added 2 commits Dec 14, 2020
…placement

Disable GCM hoisting of memory-writing nodes for irreducible CFGs. This prevents
GCM from wrongly "hoisting" stores into descendants of their original loop. Such
an "inverted hoisting" can happen due to CFGLoop::compute_freq()'s inaccurate
estimation of frequencies for irreducible CFGs.

Extend CFG verification code by checking that memory-writing nodes are placed in
either their original loop or an ancestor.

Add tests for the reducible and irreducible cases. The former was already
handled correctly before the change (the frequency estimation model prevents
"inverted hoisting" for reducible CFGs), and is just added for coverage.

This change addresses the specific miscompilation issue in a conservative way,
for simplicity and risk reduction. Future work includes discarding only illegal
blocks as candidates for GCM hoisting, and refining frequency estimation for
irreducible CFGs.
@bridgekeeper

bridgekeeper bot commented Dec 15, 2020

👋 Welcome back rcastanedalo! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Dec 15, 2020

@robcasloz The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.java.net label Dec 15, 2020
@robcasloz
Contributor Author

/summary
Disable GCM hoisting of memory-writing nodes for irreducible CFGs. This prevents
GCM from wrongly "hoisting" stores into descendants of their original loop. Such
an "inverted hoisting" can happen due to CFGLoop::compute_freq()'s inaccurate
estimation of frequencies for irreducible CFGs.

Extend CFG verification code by checking that memory-writing nodes are placed in
either their original loop or an ancestor.

Add tests for the reducible and irreducible cases. The former was already
handled correctly before the change (the frequency estimation model prevents
"inverted hoisting" for reducible CFGs), and is just added for coverage.

This change addresses the specific miscompilation issue in a conservative way,
for simplicity and safety. Future work includes investigating if only the
illegal blocks can be discarded as candidates for GCM hoisting, and refining
frequency estimation for irreducible CFGs.

@openjdk

openjdk bot commented Dec 15, 2020

@robcasloz Setting summary to:

Disable GCM hoisting of memory-writing nodes for irreducible CFGs. This prevents
GCM from wrongly "hoisting" stores into descendants of their original loop. Such
an "inverted hoisting" can happen due to CFGLoop::compute_freq()'s inaccurate
estimation of frequencies for irreducible CFGs.

Extend CFG verification code by checking that memory-writing nodes are placed in
either their original loop or an ancestor.

Add tests for the reducible and irreducible cases. The former was already
handled correctly before the change (the frequency estimation model prevents
"inverted hoisting" for reducible CFGs), and is just added for coverage.

This change addresses the specific miscompilation issue in a conservative way,
for simplicity and safety. Future work includes investigating if only the
illegal blocks can be discarded as candidates for GCM hoisting, and refining
frequency estimation for irreducible CFGs.

@robcasloz
Contributor Author

Tested on hs-tier1-9 on windows-x64, linux-x64, linux-aarch64, and macosx-x64 with VerifyRegisterAllocator enabled (to exercise all calls to the updated PhaseCFG::verify()).

@robcasloz
Contributor Author

Tested for performance regressions on a set of standard benchmark suites (DaCapo, SPECjbb2005, SPECjvm2008, ...) on windows-x64, linux-x64, and macosx-x64. No regression was observed, which is expected since this change 1) affects only a minority of the compiled methods (those with irreducible CFGs), 2) affects only the placement of memory-writing nodes, which tends to be quite constrained already, and 3) forces the placement of these nodes, as much as possible, out of loops.

@robcasloz robcasloz marked this pull request as ready for review Dec 16, 2020
@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 16, 2020
@mlbridge

mlbridge bot commented Dec 16, 2020

Webrevs

@vnkozlov vnkozlov left a comment

The changes seem fine, and should go in regardless of my following questions.

Is it possible to avoid "hoist inversion" if we take into account BCI and other information instead of only frequency? Is it possible for compute_freq() to take irreducible loops into account?

@robcasloz
Contributor Author

Is it possible to avoid "hoist inversion" if we take into account BCI and other information instead of only frequency?

We could avoid it if

  1. we followed the original heuristic proposed in Click's Global code motion/global value numbering paper:

"We choose the block that is in the shallowest loop nest possible, and then is as control dependent as possible."

and

  2. we had perfect loop information (including information about irreducible loops).

Unfortunately, information about irreducible loops is currently discarded when building the CFG-loop tree:

// Defensively filter out Loop nodes for non-single-entry loops.
// For all reasonable loops, the head occurs before the tail in RPO.
if (i <= tail->_rpo) {
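
As an illustration of the quoted heuristic, the following sketch walks the dominator tree from the latest legal block up to the earliest one and keeps the block with the shallowest loop depth, preferring the later (more control-dependent) block on ties. `Blk` and `pick_block` are hypothetical names, and the sketch assumes that `early` dominates `late` and that loop depths are accurate, which is exactly what does not hold today for irreducible loops. It is not C2's actual selection logic, which is frequency-based.

```cpp
#include <cassert>

// Hypothetical, simplified basic block for illustration only.
struct Blk {
  const Blk* idom;  // immediate dominator, nullptr for the entry block
  int loop_depth;   // loop nesting depth, 0 outside of any loop
};

// Picks a block on the dominator-tree path from 'late' up to 'early':
// the shallowest loop depth wins, and among equally shallow blocks the
// one closest to 'late' is kept.
const Blk* pick_block(const Blk* early, const Blk* late) {
  assert(early != nullptr && late != nullptr);
  const Blk* best = late;
  for (const Blk* b = late; b != nullptr; b = b->idom) {
    // Strict comparison: an equally shallow but earlier block does not
    // replace a later one.
    if (b->loop_depth < best->loop_depth) {
      best = b;
    }
    if (b == early) {
      break;
    }
  }
  return best;
}
```

With perfect depth information, a block nested deeper than the original one could never be selected by this walk, because the original (latest) block is always a candidate.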

Is it possible for compute_freq() to take irreducible loops into account?

I think so. compute_freq() computes the frequency of each block within a loop L in a single forward pass (in reverse DFS postorder) over the members (blocks and loops) of L:

for (int i = 0; i < _members.length(); i++) {
  CFGElement* s = _members.at(i);
  double freq = s->_freq;
  if (s->is_block()) {
    Block* b = s->as_Block();
    for (uint j = 0; j < b->_num_succs; j++) {
      Block* sb = b->_succs[j];
      update_succ_freq(sb, freq * b->succ_prob(j));
    }
  } else {
    CFGLoop* lp = s->as_CFGLoop();
    assert(lp->_parent == this, "immediate child");
    for (int k = 0; k < lp->_exits.length(); k++) {
      Block* eb = lp->_exits.at(k).get_target();
      double prob = lp->_exits.at(k).get_prob();
      update_succ_freq(eb, freq * prob);
    }
  }
}

In this computation, it assumes that the members form an acyclic graph (except for L's back-edges). This assumption does not hold for irreducible graphs, where there might be additional cycles corresponding to non-natural loops. This could be refined by transforming the single forward pass into a proper system of equations to be solved iteratively. See some more details in the description field of the bug report.
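
A rough sketch of what such an iterative solution could look like is given below. It treats block frequencies as the unknowns of the system freq[b] = sum over predecessors p of freq[p] * prob(p, b), with the entry block fixed at 1, and repeats the propagation until the values stop changing. `Edge` and `solve_frequencies` are hypothetical names, and the sketch works on a flat edge list rather than on C2's `CFGLoop` members. It assumes that block 0 is the entry and has no predecessors, and that every cycle has an exit probability greater than zero so that the iteration converges.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CFG edge: control flows from block 'from' to block 'to'
// with probability 'prob'.
struct Edge {
  std::size_t from;
  std::size_t to;
  double prob;
};

// Iteratively solves freq[b] = sum_p freq[p] * prob(p, b), with the entry
// block (index 0) fixed at frequency 1. Unlike a single forward pass, this
// also handles cycles that are not natural loops.
std::vector<double> solve_frequencies(std::size_t num_blocks,
                                      const std::vector<Edge>& edges,
                                      double eps = 1e-9,
                                      int max_iters = 10000) {
  assert(num_blocks > 0);
  std::vector<double> freq(num_blocks, 0.0);
  freq[0] = 1.0;
  for (int it = 0; it < max_iters; it++) {
    std::vector<double> next(num_blocks, 0.0);
    next[0] = 1.0;
    // Propagate the current estimates along every edge.
    for (const Edge& e : edges) {
      next[e.to] += freq[e.from] * e.prob;
    }
    // Track the largest change to detect convergence.
    double delta = 0.0;
    for (std::size_t b = 0; b < num_blocks; b++) {
      delta = std::max(delta, std::fabs(next[b] - freq[b]));
    }
    freq = next;
    if (delta < eps) {
      break;
    }
  }
  return freq;
}
```

For example, on an irreducible cycle between blocks 1 and 2 that is entered at either block with probability 0.5 (edges 0→1 and 0→2 at 0.5, 1→2 at 1.0, 2→1 and 2→3 at 0.5), the equations are f1 = 0.5 + 0.5·f2 and f2 = 0.5 + f1, whose solution is f1 = 1.5, f2 = 2.0, and an exit frequency of 1.0.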

@vnkozlov

Thank you, @robcasloz
I propose filing an RFE to investigate the discussed improvements,
and proceeding with the current fix for JDK 16.

I don't see a link to the testing results in the bug report.

@robcasloz
Contributor Author

I propose filing an RFE to investigate the discussed improvements,
and proceeding with the current fix for JDK 16.

Thank you, Vladimir, for looking at this. I agree with your suggested plan.

@vnkozlov vnkozlov left a comment

Testing results are good.

@openjdk

openjdk bot commented Dec 17, 2020

@robcasloz This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8255763: C2: OSR miscompilation caused by invalid memory instruction placement

Disable GCM hoisting of memory-writing nodes for irreducible CFGs. This prevents
GCM from wrongly "hoisting" stores into descendants of their original loop. Such
an "inverted hoisting" can happen due to CFGLoop::compute_freq()'s inaccurate
estimation of frequencies for irreducible CFGs.

Extend CFG verification code by checking that memory-writing nodes are placed in
either their original loop or an ancestor.

Add tests for the reducible and irreducible cases. The former was already
handled correctly before the change (the frequency estimation model prevents
"inverted hoisting" for reducible CFGs), and is just added for coverage.

This change addresses the specific miscompilation issue in a conservative way,
for simplicity and safety. Future work includes investigating if only the
illegal blocks can be discarded as candidates for GCM hoisting, and refining
frequency estimation for irreducible CFGs.

Reviewed-by: kvn, chagedorn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 34 new commits pushed to the master branch:

  • 2525f39: 8258714: Shenandoah: Process references before evacuation during degen
  • e680ebe: 8258007: Add instrumentation to NativeLibraryTest
  • c04c7e1: 8258002: Update "type" terminology in generated docs
  • 45bd3b9: 8223607: --override-methods=summary ignores some signature changes
  • 59ae054: 8258687: Build broken on Windows after fix for JDK-8258134
  • 1cc98bd: 8256693: getAnnotatedReceiverType parameterizes types too eagerly
  • 1ce2e94: 8256843: [PPC64] runtime/logging/RedefineClasses.java fails with assert: registers not saved on stack
  • 45a150b: 8258134: assert(size == calc_size) failed: incorrect size calculation on x86_32 with AVX512 machines
  • 38593a4: 8257974: Regression 21% in DaCapo-lusearch-large after JDK-8236926
  • 7afb01d: 8258373: Update the text handling in the JPasswordField
  • ... and 24 more: https://git.openjdk.java.net/jdk16/compare/09e8675f568571d959d55b096c2cd3b033204e62...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@vnkozlov, @chhagedorn) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 17, 2020
Member

@chhagedorn chhagedorn left a comment

Nice summary in the JBS issue! That looks good to me and I agree with Vladimir to do this fix in 16 and proceed with an RFE to further investigate the mentioned improvement possibilities.

src/hotspot/share/opto/block.cpp (outdated, resolved)
@robcasloz
Contributor Author

Nice summary in the JBS issue! That looks good to me and I agree with Vladimir to do this fix in 16 and proceed with an RFE to further investigate the mentioned improvement possibilities.

Thanks for reviewing, Christian!

@robcasloz
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Dec 21, 2020
@openjdk

openjdk bot commented Dec 21, 2020

@robcasloz
Your change (at version 4f2763c) is now ready to be sponsored by a Committer.

@robcasloz
Contributor Author

Thanks for reviewing @vnkozlov and @chhagedorn! Requesting integration, since the only update after the reviews is a trivial style change (4f2763c).

@chhagedorn
Member

/sponsor

@openjdk openjdk bot closed this Dec 21, 2020
@openjdk openjdk bot added the integrated Pull request has been integrated label Dec 21, 2020
@openjdk openjdk bot removed sponsor Pull request is ready to be sponsored ready Pull request is ready to be integrated rfr Pull request is ready for review labels Dec 21, 2020
@openjdk

openjdk bot commented Dec 21, 2020

@chhagedorn @robcasloz Since your change was applied there have been 34 commits pushed to the master branch:

  • 2525f39: 8258714: Shenandoah: Process references before evacuation during degen
  • e680ebe: 8258007: Add instrumentation to NativeLibraryTest
  • c04c7e1: 8258002: Update "type" terminology in generated docs
  • 45bd3b9: 8223607: --override-methods=summary ignores some signature changes
  • 59ae054: 8258687: Build broken on Windows after fix for JDK-8258134
  • 1cc98bd: 8256693: getAnnotatedReceiverType parameterizes types too eagerly
  • 1ce2e94: 8256843: [PPC64] runtime/logging/RedefineClasses.java fails with assert: registers not saved on stack
  • 45a150b: 8258134: assert(size == calc_size) failed: incorrect size calculation on x86_32 with AVX512 machines
  • 38593a4: 8257974: Regression 21% in DaCapo-lusearch-large after JDK-8236926
  • 7afb01d: 8258373: Update the text handling in the JPasswordField
  • ... and 24 more: https://git.openjdk.java.net/jdk16/compare/09e8675f568571d959d55b096c2cd3b033204e62...master

Your commit was automatically rebased without conflicts.

Pushed as commit 4e8338e.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
hotspot-compiler hotspot-compiler-dev@openjdk.java.net integrated Pull request has been integrated
3 participants