Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression due to lost cross-block CSE after recent refactoring #64282

Closed
preames opened this issue Jul 31, 2023 · 3 comments
Closed

Comments

@preames
Copy link
Collaborator

preames commented Jul 31, 2023

The following test case demonstrates a performance regression on ToT (and unfortunately, the release branch).

$ cat implicit_def.ll 
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
; RUN: llc < %s -O3 -mtriple=riscv64 -mattr=+v | FileCheck %s

define void @foo(<vscale x 2 x i32> %x, <vscale x 2 x i32> %y, ptr %p1, ptr %p2, i1 zeroext %cond) {
; CHECK-LABEL: foo:
; CHECK:       # %bb.0:
; CHECK-NEXT:    vsetvli a3, zero, e32, m1, ta, ma
; CHECK-NEXT:    vadd.vv v10, v8, v9
; CHECK-NEXT:    vs1r.v v10, (a0)
; CHECK-NEXT:    bnez a2, .LBB0_2
; CHECK-NEXT:  # %bb.1: # %falsebb
; CHECK-NEXT:    vadd.vv v8, v8, v9
; CHECK-NEXT:    vs1r.v v8, (a1)
; CHECK-NEXT:  .LBB0_2: # %mergebb
; CHECK-NEXT:    ret
  %a = add <vscale x 2 x i32> %x, %y
  store <vscale x 2 x i32> %a, ptr %p1
  br i1 %cond, label %mergebb, label %falsebb

falsebb:
  %b = add <vscale x 2 x i32> %x, %y
  store <vscale x 2 x i32> %b, ptr %p2
  br label %mergebb

mergebb:
  ret void
}

The basic problem here is triggered by my recent refactoring series (see https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295). After that series, we're now using IMPLICIT_DEF operands on many vector operations which we didn't used to. (We previously had multiple forms, both with and without passthru.)

The root issue is that we don't perform cross block CSE (in MachineCSE) for IMPLICIT_DEF nodes. This is not a new issue, but is newly exposed on RISCV.

Note that I'm unclear on the practical impact of this regression. Middle level optimization will tend to catch such cases, so the opportunities we're missing are either a) generated during SelectionDAG or b) for some reason not caught in the middle end.

At the moment, I've got a couple approaches to address this.

First, we could revert the series above on the release branch. Normally, this would be my goto option, but given how invasive these changes were, and how much has built on top, I'm leery of this option.

Second, we can simply perform CSE on IMPLICIT_DEF I've locally implemented this, and it appears to functionally work. My original worry was a correctness risk, but I think I've mostly convinced myself this is a non-issue. However, both RegisterCoalescer and ProcessImplicitDefs appear to have sensitivities to cross block live ranges for IMPLCIT_DEFs which look less than obvious on how to fix.

Third, we could perform CSE of the IMPLICIT_DEF users without CSE of the IMPLICIT_DEF itself. This is the most direct fix, but requires some delicate code in the hash map keys in MachineCSE. (In particularly, we need to keep hash and identity in sync.)

Fourth, we could take inspiration from the predication support on ARM (and other targets), and conditionally add the pass thru operand only if needed. I need to investigate this option in more depth.

@llvmbot
Copy link

llvmbot commented Jul 31, 2023

@llvm/issue-subscribers-backend-risc-v

@preames
Copy link
Collaborator Author

preames commented Aug 2, 2023

Candidate patch: https://reviews.llvm.org/D156909

preames added a commit that referenced this issue Aug 14, 2023
…rands

In a recent series of refactorings (described here: https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295), I greatly increased the number of IMPLICIT_DEF operands to our vector instructions. This has turned out to have an unexpected negative impact because MachineCSE does not CSE IMPLICIT_DEFs, and thus does not CSE any instruction with an IMPLICIT_DEF operand. SelectionDAG *does* CSE the same case, but that only covers the same block case, not the cross block case. This lead to the performance regression reported in #64282.

This change is a slightly ugly hack to side step the issue. Instead of fixing the root cause (lack of CSE for IMPLICIT_DEF) or undoing the operand changes, we leave the extra operand in place, and use NoReg in place of IMPLICIT_DEF. I then convert back to IMPLICIT_DEF just before register allocation so that ProcessImplicitDefs and TwoAddressInstructions can do the normal transforms to Undef tied registers.

We may end up backporting this into the 17.x release branch.  Given how late in the release cycle this is landing, that's much less likely now, but still a possibility.

Differential Revision: https://reviews.llvm.org/D156909
@preames
Copy link
Collaborator Author

preames commented Sep 5, 2023

@preames preames closed this as completed Sep 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants