Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[X86] Run X86FastPreTileConfigPass only with FastISel. #70754

Closed
wants to merge 1 commit into from

Conversation

e-kud
Copy link
Contributor

@e-kud e-kud commented Oct 31, 2023

Closes #64452

@llvmbot
Copy link
Collaborator

llvmbot commented Oct 31, 2023

@llvm/pr-subscribers-backend-x86

Author: Evgenii Kudriashov (e-kud)

Changes

Closes #64452


Full diff: https://github.com/llvm/llvm-project/pull/70754.diff

3 Files Affected:

  • (modified) llvm/lib/Target/X86/X86FastPreTileConfig.cpp (+3-1)
  • (modified) llvm/lib/Target/X86/X86TargetMachine.cpp (+3-3)
  • (modified) llvm/test/CodeGen/X86/O0-pipeline.ll (+102-86)
diff --git a/llvm/lib/Target/X86/X86FastPreTileConfig.cpp b/llvm/lib/Target/X86/X86FastPreTileConfig.cpp
index ea942445a18193c..98621eae5861d87 100644
--- a/llvm/lib/Target/X86/X86FastPreTileConfig.cpp
+++ b/llvm/lib/Target/X86/X86FastPreTileConfig.cpp
@@ -667,7 +667,9 @@ bool X86FastPreTileConfig::runOnMachineFunction(MachineFunction &MFunc) {
   bool HasVirtTileReg = false;
   for (unsigned I = 0, E = NumVirtRegs; I != E; ++I) {
     Register VirtReg = Register::index2VirtReg(I);
-    if (MRI->getRegClass(VirtReg)->getID() == X86::TILERegClassID) {
+    const TargetRegisterClass *RC = MRI->getRegClassOrNull(VirtReg);
+    assert(RC && "A register must have a class after FastISel");
+    if (RC->getID() == X86::TILERegClassID) {
       HasVirtTileReg = true;
       break;
     }
diff --git a/llvm/lib/Target/X86/X86TargetMachine.cpp b/llvm/lib/Target/X86/X86TargetMachine.cpp
index 5668b514d6dec07..4698123ff00fa77 100644
--- a/llvm/lib/Target/X86/X86TargetMachine.cpp
+++ b/llvm/lib/Target/X86/X86TargetMachine.cpp
@@ -531,10 +531,10 @@ void X86PassConfig::addPreRegAlloc() {
   addPass(createX86FlagsCopyLoweringPass());
   addPass(createX86DynAllocaExpander());
 
-  if (getOptLevel() != CodeGenOptLevel::None)
-    addPass(createX86PreTileConfigPass());
-  else
+  if (TM->Options.EnableFastISel)
     addPass(createX86FastPreTileConfigPass());
+  else
+    addPass(createX86PreTileConfigPass());
 }
 
 void X86PassConfig::addMachineSSAOptimization() {
diff --git a/llvm/test/CodeGen/X86/O0-pipeline.ll b/llvm/test/CodeGen/X86/O0-pipeline.ll
index 402645ed1e2e5d6..5943e3bca0d34d9 100644
--- a/llvm/test/CodeGen/X86/O0-pipeline.ll
+++ b/llvm/test/CodeGen/X86/O0-pipeline.ll
@@ -1,95 +1,111 @@
 ; When EXPENSIVE_CHECKS are enabled, the machine verifier appears between each
 ; pass. Ignore it with 'grep -v'.
 ; RUN: llc -mtriple=x86_64-- -O0 -debug-pass=Structure < %s -o /dev/null 2>&1 \
-; RUN:   | grep -v 'Verify generated machine code' | FileCheck %s
+; RUN:   | grep -v 'Verify generated machine code' | FileCheck %s  --check-prefixes=COMMON,SDISEL,DEFAULT
+; RUN: llc -mtriple=x86_64-- -O0 -fast-isel=0 -debug-pass=Structure < %s -o /dev/null 2>&1 \
+; RUN:   | grep -v 'Verify generated machine code' | FileCheck %s  --check-prefixes=COMMON,SDISEL,NOFASTISEL
+; RUN: llc -mtriple=x86_64-- -O0 -global-isel -fast-isel=0 -debug-pass=Structure < %s -o /dev/null 2>&1 \
+; RUN:   | grep -v 'Verify generated machine code' | FileCheck %s  --check-prefixes=COMMON,GISEL,NOFASTISEL
 
 ; REQUIRES: asserts
 
-; CHECK-LABEL: Pass Arguments:
-; CHECK-NEXT: Target Library Information
-; CHECK-NEXT: Target Pass Configuration
-; CHECK-NEXT: Machine Module Information
-; CHECK-NEXT: Target Transform Information
-; CHECK-NEXT: Create Garbage Collector Module Metadata
-; CHECK-NEXT: Assumption Cache Tracker
-; CHECK-NEXT: Profile summary info
-; CHECK-NEXT: Machine Branch Probability Analysis
-; CHECK-NEXT:   ModulePass Manager
-; CHECK-NEXT:     Pre-ISel Intrinsic Lowering
-; CHECK-NEXT:     FunctionPass Manager
-; CHECK-NEXT:       Expand large div/rem
-; CHECK-NEXT:       Expand large fp convert
-; CHECK-NEXT:       Expand Atomic instructions
-; CHECK-NEXT:       Lower AMX intrinsics
-; CHECK-NEXT:       Lower AMX type for load/store
-; CHECK-NEXT:       Module Verifier
-; CHECK-NEXT:       Lower Garbage Collection Instructions
-; CHECK-NEXT:       Shadow Stack GC Lowering
-; CHECK-NEXT:       Lower constant intrinsics
-; CHECK-NEXT:       Remove unreachable blocks from the CFG
-; CHECK-NEXT:       Expand vector predication intrinsics
-; CHECK-NEXT:       Scalarize Masked Memory Intrinsics
-; CHECK-NEXT:       Expand reduction intrinsics
-; CHECK-NEXT:       Expand indirectbr instructions
-; CHECK-NEXT:       Exception handling preparation
-; CHECK-NEXT:       Prepare callbr
-; CHECK-NEXT:       Safe Stack instrumentation pass
-; CHECK-NEXT:       Insert stack protectors
-; CHECK-NEXT:       Module Verifier
-; CHECK-NEXT:       Assignment Tracking Analysis
-; CHECK-NEXT:       X86 DAG->DAG Instruction Selection
-; CHECK-NEXT:       X86 PIC Global Base Reg Initialization
-; CHECK-NEXT:       Argument Stack Rebase
-; CHECK-NEXT:       Finalize ISel and expand pseudo-instructions
-; CHECK-NEXT:       Local Stack Slot Allocation
-; CHECK-NEXT:       X86 speculative load hardening
-; CHECK-NEXT:       MachineDominator Tree Construction
-; CHECK-NEXT:       X86 EFLAGS copy lowering
-; CHECK-NEXT:       X86 DynAlloca Expander
-; CHECK-NEXT:       Fast Tile Register Preconfigure
-; CHECK-NEXT:       Eliminate PHI nodes for register allocation
-; CHECK-NEXT:       Two-Address instruction pass
-; CHECK-NEXT:       Fast Register Allocator
-; CHECK-NEXT:       Fast Tile Register Configure
-; CHECK-NEXT:       X86 Lower Tile Copy
-; CHECK-NEXT:       Bundle Machine CFG Edges
-; CHECK-NEXT:       X86 FP Stackifier
-; CHECK-NEXT:       Remove Redundant DEBUG_VALUE analysis
-; CHECK-NEXT:       Fixup Statepoint Caller Saved
-; CHECK-NEXT:       Lazy Machine Block Frequency Analysis
-; CHECK-NEXT:       Machine Optimization Remark Emitter
-; CHECK-NEXT:       Prologue/Epilogue Insertion & Frame Finalization
-; CHECK-NEXT:       Post-RA pseudo instruction expansion pass
-; CHECK-NEXT:       X86 pseudo instruction expansion pass
-; CHECK-NEXT:       Insert KCFI indirect call checks
-; CHECK-NEXT:       Analyze Machine Code For Garbage Collection
-; CHECK-NEXT:       Insert fentry calls
-; CHECK-NEXT:       Insert XRay ops
-; CHECK-NEXT:       Implement the 'patchable-function' attribute
-; CHECK-NEXT:       X86 Indirect Branch Tracking
-; CHECK-NEXT:       X86 vzeroupper inserter
-; CHECK-NEXT:       Compressing EVEX instrs to VEX encoding when possibl
-; CHECK-NEXT:       X86 Discriminate Memory Operands
-; CHECK-NEXT:       X86 Insert Cache Prefetches
-; CHECK-NEXT:       X86 insert wait instruction
-; CHECK-NEXT:       Contiguously Lay Out Funclets
-; CHECK-NEXT:       StackMap Liveness Analysis
-; CHECK-NEXT:       Live DEBUG_VALUE analysis
-; CHECK-NEXT:       Machine Sanitizer Binary Metadata
-; CHECK-NEXT:       Lazy Machine Block Frequency Analysis
-; CHECK-NEXT:       Machine Optimization Remark Emitter
-; CHECK-NEXT:       Stack Frame Layout Analysis
-; CHECK-NEXT:       X86 Speculative Execution Side Effect Suppression
-; CHECK-NEXT:       X86 Indirect Thunks
-; CHECK-NEXT:       X86 Return Thunks
-; CHECK-NEXT:       Check CFA info and insert CFI instructions if needed
-; CHECK-NEXT:       X86 Load Value Injection (LVI) Ret-Hardening
-; CHECK-NEXT:       Pseudo Probe Inserter
-; CHECK-NEXT:       Unpack machine instruction bundles
-; CHECK-NEXT:       Lazy Machine Block Frequency Analysis
-; CHECK-NEXT:       Machine Optimization Remark Emitter
-; CHECK-NEXT:       X86 Assembly Printer
-; CHECK-NEXT:       Free MachineFunction
+; COMMON-LABEL: Pass Arguments:
+; COMMON-NEXT: Target Library Information
+; COMMON-NEXT: Target Pass Configuration
+; COMMON-NEXT: Machine Module Information
+; COMMON-NEXT: Target Transform Information
+; COMMON-NEXT: Create Garbage Collector Module Metadata
+; COMMON-NEXT: Assumption Cache Tracker
+; SDISEL-NEXT: Profile summary info
+; COMMON-NEXT: Machine Branch Probability Analysis
+; COMMON-NEXT:   ModulePass Manager
+; COMMON-NEXT:     Pre-ISel Intrinsic Lowering
+; COMMON-NEXT:     FunctionPass Manager
+; COMMON-NEXT:       Expand large div/rem
+; COMMON-NEXT:       Expand large fp convert
+; COMMON-NEXT:       Expand Atomic instructions
+; COMMON-NEXT:       Lower AMX intrinsics
+; COMMON-NEXT:       Lower AMX type for load/store
+; COMMON-NEXT:       Module Verifier
+; COMMON-NEXT:       Lower Garbage Collection Instructions
+; COMMON-NEXT:       Shadow Stack GC Lowering
+; COMMON-NEXT:       Lower constant intrinsics
+; COMMON-NEXT:       Remove unreachable blocks from the CFG
+; COMMON-NEXT:       Expand vector predication intrinsics
+; COMMON-NEXT:       Scalarize Masked Memory Intrinsics
+; COMMON-NEXT:       Expand reduction intrinsics
+; COMMON-NEXT:       Expand indirectbr instructions
+; COMMON-NEXT:       Exception handling preparation
+; COMMON-NEXT:       Prepare callbr
+; COMMON-NEXT:       Safe Stack instrumentation pass
+; COMMON-NEXT:       Insert stack protectors
+; COMMON-NEXT:       Module Verifier
+; SDISEL-NEXT:       Assignment Tracking Analysis
+; SDISEL-NEXT:       X86 DAG->DAG Instruction Selection
+; SDISEL-NEXT:       X86 PIC Global Base Reg Initialization
+; SDISEL-NEXT:       Argument Stack Rebase
+; GISEL-NEXT:        Analysis containing CSE Info
+; GISEL-NEXT:        IRTranslator
+; GISEL-NEXT:        Analysis containing CSE Info
+; GISEL-NEXT:        Analysis for ComputingKnownBits
+; GISEL-NEXT:        Legalizer
+; GISEL-NEXT:        RegBankSelect
+; GISEL-NEXT:        Analysis for ComputingKnownBits
+; GISEL-NEXT:        InstructionSelect
+; GISEL-NEXT:        ResetMachineFunction
+; COMMON-NEXT:       Finalize ISel and expand pseudo-instructions
+; COMMON-NEXT:       Local Stack Slot Allocation
+; COMMON-NEXT:       X86 speculative load hardening
+; COMMON-NEXT:       MachineDominator Tree Construction
+; COMMON-NEXT:       X86 EFLAGS copy lowering
+; COMMON-NEXT:       X86 DynAlloca Expander
+; DEFAULT-NEXT:      Fast Tile Register Preconfigure
+; NOFASTISEL-NEXT:   MachineDominator Tree Construction
+; NOFASTISEL-NEXT:   Machine Natural Loop Construction
+; NOFASTISEL-NEXT:   Tile Register Pre-configure
+; COMMON-NEXT:       Eliminate PHI nodes for register allocation
+; COMMON-NEXT:       Two-Address instruction pass
+; COMMON-NEXT:       Fast Register Allocator
+; COMMON-NEXT:       Fast Tile Register Configure
+; COMMON-NEXT:       X86 Lower Tile Copy
+; COMMON-NEXT:       Bundle Machine CFG Edges
+; COMMON-NEXT:       X86 FP Stackifier
+; COMMON-NEXT:       Remove Redundant DEBUG_VALUE analysis
+; COMMON-NEXT:       Fixup Statepoint Caller Saved
+; COMMON-NEXT:       Lazy Machine Block Frequency Analysis
+; COMMON-NEXT:       Machine Optimization Remark Emitter
+; COMMON-NEXT:       Prologue/Epilogue Insertion & Frame Finalization
+; COMMON-NEXT:       Post-RA pseudo instruction expansion pass
+; COMMON-NEXT:       X86 pseudo instruction expansion pass
+; COMMON-NEXT:       Insert KCFI indirect call checks
+; COMMON-NEXT:       Analyze Machine Code For Garbage Collection
+; COMMON-NEXT:       Insert fentry calls
+; COMMON-NEXT:       Insert XRay ops
+; COMMON-NEXT:       Implement the 'patchable-function' attribute
+; COMMON-NEXT:       X86 Indirect Branch Tracking
+; COMMON-NEXT:       X86 vzeroupper inserter
+; COMMON-NEXT:       Compressing EVEX instrs to VEX encoding when possibl
+; COMMON-NEXT:       X86 Discriminate Memory Operands
+; COMMON-NEXT:       X86 Insert Cache Prefetches
+; COMMON-NEXT:       X86 insert wait instruction
+; COMMON-NEXT:       Contiguously Lay Out Funclets
+; COMMON-NEXT:       StackMap Liveness Analysis
+; COMMON-NEXT:       Live DEBUG_VALUE analysis
+; COMMON-NEXT:       Machine Sanitizer Binary Metadata
+; COMMON-NEXT:       Lazy Machine Block Frequency Analysis
+; COMMON-NEXT:       Machine Optimization Remark Emitter
+; COMMON-NEXT:       Stack Frame Layout Analysis
+; COMMON-NEXT:       X86 Speculative Execution Side Effect Suppression
+; COMMON-NEXT:       X86 Indirect Thunks
+; COMMON-NEXT:       X86 Return Thunks
+; COMMON-NEXT:       Check CFA info and insert CFI instructions if needed
+; COMMON-NEXT:       X86 Load Value Injection (LVI) Ret-Hardening
+; COMMON-NEXT:       Pseudo Probe Inserter
+; COMMON-NEXT:       Unpack machine instruction bundles
+; COMMON-NEXT:       Lazy Machine Block Frequency Analysis
+; COMMON-NEXT:       Machine Optimization Remark Emitter
+; COMMON-NEXT:       X86 Assembly Printer
+; COMMON-NEXT:       Free MachineFunction
 
 define void @f() {
   ret void

@e-kud
Copy link
Contributor Author

e-kud commented Oct 31, 2023

This is a revision from the phabricator https://reviews.llvm.org/D157458.
The revision hanged up on a dicussion of whether it is a bug or intended behavior in GlobalISel when a register doesn't have a class. I've created an issue for the problem: #66531. Want to bring attention to this issue on the next GlobalISel sync meeting.

It's not X86 problem. The same can be observed on AArch64 as well. The only difference is that X86 is able to reveal this problem. At the same time, X86FastPreTileConfigPass was enabled in assumption of SelectionDAG and FastISel and it crashes with GlobalISel if there is a single dead instruction. I want to disable it until it is clear whether GlobalISel needs it or not.
Another moment, previous enabling looks suspicious because non-zero opt level doesn't guarantee that FastISel is not used.

I intentionally do not fix comments in getRegClass as it depends on #66531 resolution. However I can add a comment about the issue somewhere if needed.

/cc @qcolombet @arsenm @aemerson

Comment on lines +670 to +671
const TargetRegisterClass *RC = MRI->getRegClassOrNull(VirtReg);
assert(RC && "A register must have a class after FastISel");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use getRegClass instead of doing your own assert

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should have a registers have a class function property to go with the others that the PM would enforce

@arsenm
Copy link
Contributor

arsenm commented Nov 13, 2023

This is not the kind of problem that can be correctly solved by changing the pass set. The MIR must follow a consistent set of rules at all points. All selectors must be bound to emit MIR that respects the rules. Shuffling around which passes run with which selector is expected will not work reliably, especially with fallbacks. Every selector should be guaranteeing no virtual registers without classes are passed through

@e-kud
Copy link
Contributor Author

e-kud commented Nov 14, 2023

This is not the kind of problem that can be correctly solved by changing the pass set. The MIR must follow a consistent set of rules at all points. All selectors must be bound to emit MIR that respects the rules. Shuffling around which passes run with which selector is expected will not work reliably, especially with fallbacks. Every selector should be guaranteeing no virtual registers without classes are passed through

Agreed. Also I've realized that generally we want to keep this pass for GlobalISel as it allows a fast skip at O0. Closing in favor of the original issue that leaves registers without classes.

@e-kud e-kud closed this Nov 14, 2023
@e-kud e-kud deleted the global-amx branch November 14, 2023 02:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants