
[InstCombine] Fold switch(rol(x, C1)) case C2: to switch(x) case rol(C2, -C1): #86307

Open

wants to merge 3 commits into main

Conversation

YanWQ-monad
Contributor

YanWQ-monad commented Mar 22, 2024

This closes #86161.

It is worth mentioning that, as @dtcxzyw pointed out, there is an inverse fold in SimplifyCFG (ReduceSwitchRange):

/// Try to transform a switch that has "holes" in it to a contiguous sequence
/// of cases.
///
/// A switch such as: switch(i) {case 5: case 9: case 13: case 17:} can be
/// range-reduced to: switch ((i-5) / 4) {case 0: case 1: case 2: case 3:}.
///
/// This converts a sparse switch into a dense switch which allows better
/// lowering and could also allow transforming into a lookup table.
static bool ReduceSwitchRange(SwitchInst *SI, IRBuilder<> &Builder,
                              const DataLayout &DL,
                              const TargetTransformInfo &TTI) {
  auto *CondTy = cast<IntegerType>(SI->getCondition()->getType());
  if (CondTy->getIntegerBitWidth() > 64 ||
      !DL.fitsInLegalInteger(CondTy->getIntegerBitWidth()))
    return false;

  // Only bother with this optimization if there are more than 3 switch cases;
  // SDAG will only bother creating jump tables for 4 or more cases.
  if (SI->getNumCases() < 4)
    return false;

  // This transform is agnostic to the signedness of the input or case values. We
  // can treat the case values as signed or unsigned. We can optimize more common
  // cases such as a sequence crossing zero {-4,0,4,8} if we interpret case values
  // as signed.
  SmallVector<int64_t,4> Values;
  for (const auto &C : SI->cases())
    Values.push_back(C.getCaseValue()->getValue().getSExtValue());
  llvm::sort(Values);

  // If the switch is already dense, there's nothing useful to do here.
  if (isSwitchDense(Values))
    return false;

  // First, transform the values such that they start at zero and ascend.
  int64_t Base = Values[0];
  for (auto &V : Values)
    V -= (uint64_t)(Base);

  // Now we have signed numbers that have been shifted so that, given enough
  // precision, there are no negative values. Since the rest of the transform
  // is bitwise only, we switch now to an unsigned representation.

  // This transform can be done speculatively because it is so cheap - it
  // results in a single rotate operation being inserted.

  // countTrailingZeros(0) returns 64. As Values is guaranteed to have more than
  // one element and LLVM disallows duplicate cases, Shift is guaranteed to be
  // less than 64.
  unsigned Shift = 64;
  for (auto &V : Values)
    Shift = std::min(Shift, (unsigned)llvm::countr_zero((uint64_t)V));
  assert(Shift < 64);
  if (Shift > 0)
    for (auto &V : Values)
      V = (int64_t)((uint64_t)V >> Shift);

  if (!isSwitchDense(Values))
    // Transform didn't create a dense switch.
    return false;

  // The obvious transform is to shift the switch condition right and emit a
  // check that the condition actually cleanly divided by GCD, i.e.
  //   C & (1 << Shift - 1) == 0
  // inserting a new CFG edge to handle the case where it didn't divide cleanly.
  //
  // A cheaper way of doing this is a simple ROTR(C, Shift). This performs the
  // shift and puts the shifted-off bits in the uppermost bits. If any of these
  // are nonzero then the switch condition will be very large and will hit the
  // default case.
  auto *Ty = cast<IntegerType>(SI->getCondition()->getType());
  Builder.SetInsertPoint(SI);
  Value *Sub =
      Builder.CreateSub(SI->getCondition(), ConstantInt::get(Ty, Base));
  Value *Rot = Builder.CreateIntrinsic(
      Ty, Intrinsic::fshl,
      {Sub, Sub, ConstantInt::get(Ty, Ty->getBitWidth() - Shift)});
  SI->replaceUsesOfWith(SI->getCondition(), Rot);

  for (auto Case : SI->cases()) {
    auto *Orig = Case.getCaseValue();
    auto Sub = Orig->getValue() - APInt(Ty->getBitWidth(), Base);
    Case.setValue(cast<ConstantInt>(ConstantInt::get(Ty, Sub.lshr(Shift))));
  }
  return true;
}

Because of that inverse fold, the IR may flicker back and forth in the optimization pipeline.
SimplifyCFG will have the upper hand, so there won't be any major problems, but in some rare cases it might cause a regression (observed in @dtcxzyw's opt-benchmark).
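
For concreteness, here is a hypothetical IR sketch of that interaction (the function name, case values, and rotate amount are made up for illustration; whether ReduceSwitchRange actually fires also depends on its density heuristic):

; Before InstCombine: the switch condition is a rotate-left by 30.
define void @flicker(i32 %x) {
entry:
  %rol = call i32 @llvm.fshl.i32(i32 %x, i32 %x, i32 30)
  switch i32 %rol, label %default [
    i32 0, label %hit
    i32 1, label %hit
    i32 2, label %hit
    i32 3, label %hit
  ]

hit:
  ret void

default:
  ret void
}

After this fold, the switch is on %x directly and each case value is rotated right by 30, giving cases 0, 4, 8, and 12, which is a sparser switch. If ReduceSwitchRange then fires (at least four cases, all values sharing the common factor 4), it shifts the case values back down to 0..3 and reinserts fshl(%x, %x, 30), i.e. exactly the rotate that InstCombine just removed.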

YanWQ-monad requested a review from nikic as a code owner on March 22, 2024 at 16:37

@llvmbot
Collaborator

llvmbot commented Mar 22, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Monad (YanWQ-monad)


Full diff: https://github.com/llvm/llvm-project/pull/86307.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstructionCombining.cpp (+10)
  • (added) llvm/test/Transforms/InstCombine/switch-rol.ll (+33)
diff --git a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
index 7c40fb4fc86082..b6611cfbbfc1f4 100644
--- a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
@@ -3645,6 +3645,16 @@ Instruction *InstCombinerImpl::visitSwitchInst(SwitchInst &SI) {
     }
   }
 
+  // Fold 'switch(rol(x, C1)) case C2:' to 'switch(x) case rol(C2, -C1):'
+  if (match(Cond,
+            m_FShl(m_Value(Op0), m_Deferred(Op0), m_ConstantInt(ShiftAmt)))) {
+    for (auto &Case : SI.cases()) {
+      const APInt NewCase = Case.getCaseValue()->getValue().rotr(ShiftAmt);
+      Case.setValue(ConstantInt::get(SI.getContext(), NewCase));
+    }
+    return replaceOperand(SI, 0, Op0);
+  }
+
   KnownBits Known = computeKnownBits(Cond, 0, &SI);
   unsigned LeadingKnownZeros = Known.countMinLeadingZeros();
   unsigned LeadingKnownOnes = Known.countMinLeadingOnes();
diff --git a/llvm/test/Transforms/InstCombine/switch-rol.ll b/llvm/test/Transforms/InstCombine/switch-rol.ll
new file mode 100644
index 00000000000000..1cd55ff91c9492
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/switch-rol.ll
@@ -0,0 +1,33 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt < %s -passes=instcombine -S | FileCheck %s
+
+declare void @dummy()
+
+define i32 @switch_rol(i32 %a) #0 {
+; CHECK-LABEL: define i32 @switch_rol(
+; CHECK-SAME: i32 [[A:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    switch i32 [[A]], label [[DEFAULT:%.*]] [
+; CHECK-NEXT:      i32 0, label [[TRAP_EXIT:%.*]]
+; CHECK-NEXT:      i32 20, label [[TRAP_EXIT]]
+; CHECK-NEXT:    ]
+; CHECK:       default:
+; CHECK-NEXT:    call void @dummy()
+; CHECK-NEXT:    br label [[TRAP_EXIT]]
+; CHECK:       trap.exit:
+; CHECK-NEXT:    ret i32 0
+;
+entry:
+  %rol = call i32 @llvm.fshl.i32(i32 %a, i32 %a, i32 30)
+  switch i32 %rol, label %default [
+  i32 0, label %trap.exit
+  i32 5, label %trap.exit
+  ]
+
+default:
+  call void @dummy()
+  br label %trap.exit
+
+trap.exit:
+  ret i32 0
+}

@goldsteinn
Contributor

This change looks fine, although it would be nice if we had some helper generically detecting if an op is reversible so we could generalize all these cases.

@goldsteinn
Contributor

This change looks fine, although it would be nice if we had some helper generically detecting if an op is reversible so we could generalize all these cases.

Would also be useful for things like icmp eq/ne X, C

@dtcxzyw
Member

dtcxzyw commented Mar 22, 2024

This change looks fine, although it would be nice if we had some helper generically detecting if an op is reversible so we could generalize all these cases.

Would also be useful for things like icmp eq/ne X, C

It has been supported.

  case Intrinsic::fshl:
  case Intrinsic::fshr:
    if (II->getArgOperand(0) == II->getArgOperand(1)) {
      const APInt *RotAmtC;
      // ror(X, RotAmtC) == C --> X == rol(C, RotAmtC)
      // rol(X, RotAmtC) == C --> X == ror(C, RotAmtC)
      if (match(II->getArgOperand(2), m_APInt(RotAmtC)))
        return new ICmpInst(Pred, II->getArgOperand(0),
                            II->getIntrinsicID() == Intrinsic::fshl
                                ? ConstantInt::get(Ty, C.rotr(*RotAmtC))
                                : ConstantInt::get(Ty, C.rotl(*RotAmtC)));
    }
    break;
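
A small hypothetical IR example of that existing fold (the function name and constants are made up for illustration):

define i1 @icmp_of_rotate(i32 %x) {
  %rot = call i32 @llvm.fshl.i32(i32 %x, i32 %x, i32 5)
  %cmp = icmp eq i32 %rot, 8
  ret i1 %cmp
}

InstCombine rewrites the compare to use %x directly, replacing the constant with ror(8, 5): icmp eq i32 %x, 1073741824 (that is, 1 << 30).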

@goldsteinn
Contributor

This change looks fine, although it would be nice if we had some helper generically detecting if an op is reversible so we could generalize all these cases.

Would also be useful for things like icmp eq/ne X, C

It has been supported.

Err, I don't mean that we necessarily need new support in icmp eq/ne X, C (or here for that matter).
I mean we could use a common helper between icmp eq/ne X, C and here, which would
definitely add coverage to this function and at the very least reduce code duplication.
I'm working on a patch.

@goldsteinn
Contributor

LGTM.

@goldsteinn
Contributor

Err, I don't mean that we necessarily need new support in icmp eq/ne X, C (or here for that matter). I mean we could use a common helper between icmp eq/ne X, C and here, which would definitely add coverage to this function and at the very least reduce code duplication. I'm working on a patch.

See #86346 for what I mean.

@goldsteinn
Contributor

ping @dtcxzyw @nikic

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Apr 4, 2024
@dtcxzyw
Member

dtcxzyw commented Apr 4, 2024

@YanWQ-monad I'd like to move the logic into SimplifyCFG. Then we will run into an infinite loop if the existing code in SimplifyCFG reverts your fold :)

@goldsteinn
Contributor

@YanWQ-monad I'd like to move the logic into SimplifyCFG. Then we will run into an infinite loop if the existing code in SimplifyCFG reverts your fold :)

Shouldn't that also apply for all the other cases we handle here?

@dtcxzyw
Member

dtcxzyw commented Apr 4, 2024

Shouldn't that also apply for all the other cases we handle here?

No. We should leave other optimizations in InstCombine because they don't modify DomTree and some of them need DT information. I suggest we implement this fold in SimplifyCFG so as to make it easier to see the impact.

@goldsteinn
Contributor

Shouldn't that also apply for all the other cases we handle here?

No. We should leave other optimizations in InstCombine because they don't modify DomTree and some of them need DT information. I suggest we implement this fold in SimplifyCFG so as to make it easier to see the impact.

I don't really see your point. How is this fold different from the shl one?

@nikic
Contributor

nikic commented Apr 10, 2024

I don't really like the idea of SimplifyCFG and InstCombine changing the code back and forth. Maybe we should limit the fold to the case where it will not make the switch less dense?

@YanWQ-monad
Contributor Author

YanWQ-monad commented Apr 10, 2024

I don't really like the idea of SimplifyCFG and InstCombine changing the code back and forth. Maybe we should limit the fold to the case where it will not make the switch less dense?

It's rare that the fold will make the switch dense, at least judging from the issue and from dtcxzyw's benchmark. So a more reasonable solution might be to limit the fold so that it won't affect ReduceSwitchRange?

If so, isSwitchDense may be needed to check the precondition, but it is currently only visible inside SimplifyCFG. Should I move isSwitchDense somewhere else so it can be shared between SimplifyCFG and InstCombine?

@XChy
Member

XChy commented May 10, 2024

I think folding it when the number of reachable successors is <= 3 is OK, because the backend doesn't create a lookup table in that case, and the real-world case in the original issue also has only 3 successors.
