…rands
In a recent series of refactorings (described here: https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295), I greatly increased the number of IMPLICIT_DEF operands to our vector instructions. This has turned out to have an unexpected negative impact because MachineCSE does not CSE IMPLICIT_DEFs, and thus does not CSE any instruction with an IMPLICIT_DEF operand. SelectionDAG *does* CSE the same case, but that only covers the same-block case, not the cross-block case. This led to the performance regression reported in #64282.
This change is a slightly ugly hack to sidestep the issue. Instead of fixing the root cause (lack of CSE for IMPLICIT_DEF) or undoing the operand changes, we leave the extra operand in place, and use NoReg in place of IMPLICIT_DEF. I then convert back to IMPLICIT_DEF just before register allocation so that ProcessImplicitDefs and TwoAddressInstructions can do the normal transforms to Undef tied registers.
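A minimal sketch of the idea, using toy stand-ins rather than LLVM's real MachineInstr classes. The `Instr` struct, `sameExpression`, and `materializeImplicitDefs` are hypothetical names invented for illustration, and the sketch assumes the passthru is always operand 0. It only shows why a shared NoReg placeholder keeps two instructions structurally identical (and so CSE-able), while a late fixup can still give each one its own IMPLICIT_DEF before register allocation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-ins for machine instructions; these are NOT LLVM's types.
enum class Opcode { ImplicitDef, VAdd };

constexpr uint32_t NoReg = 0; // placeholder meaning "no passthru value"

struct Instr {
  Opcode Op;
  uint32_t Def;               // virtual register defined by this instruction
  std::vector<uint32_t> Uses; // assumption: Uses[0] is the passthru for VAdd
};

// Structural equality as a CSE pass would see it: same opcode, same operands.
bool sameExpression(const Instr &A, const Instr &B) {
  return A.Op == B.Op && A.Uses == B.Uses;
}

// Late fixup: give every NoReg passthru its own IMPLICIT_DEF-defined register,
// inserted immediately before the user.
std::vector<Instr> materializeImplicitDefs(const std::vector<Instr> &In,
                                           uint32_t &NextVReg) {
  std::vector<Instr> Out;
  for (const Instr &I : In) {
    Instr Copy = I;
    if (Copy.Op == Opcode::VAdd) {
      assert(!Copy.Uses.empty() && "VAdd needs a passthru operand slot");
      if (Copy.Uses[0] == NoReg) {
        uint32_t Undef = NextVReg++;
        Out.push_back(Instr{Opcode::ImplicitDef, Undef, {}});
        Copy.Uses[0] = Undef;
      }
    }
    Out.push_back(Copy);
  }
  return Out;
}
```

In this model, two `VAdd` instructions with a NoReg passthru and the same sources compare equal before the fixup. After it, each reads a distinct undef register, which is the form the toy CSE comparison can no longer match.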
We may end up backporting this into the 17.x release branch. Given how late in the release cycle this is landing, that's much less likely now, but still a possibility.
Differential Revision: https://reviews.llvm.org/D156909
The following test case demonstrates a performance regression on ToT (and unfortunately, the release branch).
The basic problem here is triggered by my recent refactoring series (see https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295). After that series, we now use IMPLICIT_DEF operands on many vector operations that did not have them before. (We previously had multiple forms, both with and without passthru.)
The root issue is that we don't perform cross block CSE (in MachineCSE) for IMPLICIT_DEF nodes. This is not a new issue, but is newly exposed on RISCV.
Note that I'm unclear on the practical impact of this regression. Middle level optimization will tend to catch such cases, so the opportunities we're missing are either a) generated during SelectionDAG or b) for some reason not caught in the middle end.
At the moment, I've got a few approaches to address this.
First, we could revert the series above on the release branch. Normally, this would be my go-to option, but given how invasive these changes were, and how much has been built on top, I'm leery of this option.
Second, we can simply perform CSE on IMPLICIT_DEF. I've locally implemented this, and it appears to functionally work. My original worry was a correctness risk, but I think I've mostly convinced myself this is a non-issue. However, both RegisterCoalescer and ProcessImplicitDefs appear to be sensitive to cross-block live ranges for IMPLICIT_DEFs, and it is not obvious how to fix that.
Third, we could perform CSE of the IMPLICIT_DEF users without CSE of the IMPLICIT_DEF itself. This is the most direct fix, but requires some delicate code in the hash map keys in MachineCSE. (In particular, we need to keep hash and identity in sync.)
Fourth, we could take inspiration from the predication support on ARM (and other targets), and conditionally add the passthru operand only if needed. I need to investigate this option in more depth.
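To illustrate the "hash and identity in sync" concern from the third approach, here is a rough standalone sketch. It is not MachineCSE's actual key trait: `Operand`, `Expr`, `ExprHash`, and `ExprEqual` are hypothetical, and the `DefinedByImplicitDef` flag stands in for whatever query real code would make on the defining instruction. The sketch assumes all undef operands are interchangeable. Under that assumption, any two keys that compare equal must also hash equally, so both functions have to ignore the register number of an undef operand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>
#include <vector>

// Toy model of a CSE key; these are NOT LLVM's types.
struct Operand {
  uint32_t Reg;
  bool DefinedByImplicitDef; // hypothetical flag; real code would inspect the def
};

struct Expr {
  uint32_t Opcode;
  std::vector<Operand> Ops;
};

struct ExprHash {
  std::size_t operator()(const Expr &E) const {
    std::size_t H = std::hash<uint32_t>{}(E.Opcode);
    for (const Operand &O : E.Ops) {
      // Every undef operand hashes to one sentinel, never to its register.
      std::size_t Part = O.DefinedByImplicitDef
                             ? std::size_t{0x9e3779b9u}
                             : std::hash<uint32_t>{}(O.Reg);
      H = H * 31 + Part;
    }
    return H;
  }
};

struct ExprEqual {
  bool operator()(const Expr &A, const Expr &B) const {
    if (A.Opcode != B.Opcode || A.Ops.size() != B.Ops.size())
      return false;
    for (std::size_t I = 0; I < A.Ops.size(); ++I) {
      const Operand &X = A.Ops[I];
      const Operand &Y = B.Ops[I];
      // Must mirror ExprHash: undef matches undef regardless of register.
      if (X.DefinedByImplicitDef != Y.DefinedByImplicitDef)
        return false;
      if (!X.DefinedByImplicitDef && X.Reg != Y.Reg)
        return false;
    }
    return true;
  }
};

using ExprSet = std::unordered_set<Expr, ExprHash, ExprEqual>;
```

If only one of the two functors treated undef operands specially, equal keys could land in different buckets and the lookup would silently miss, which is the kind of delicacy the third approach has to manage.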