[clang][CodeGen][UBSan] Fixing shift-exponent generation for _BitInt #80515

AdamMagierFOSS · 2024-02-03T00:10:52Z

Testing the shift-exponent check with small width _BitInt values exposed a bug in ScalarExprEmitter::GetWidthMinusOneValue when using the result to determine valid exponent sizes. False positives were reported for some left shifts when width(LHS)-1 > range(RHS) and false negatives were reported for right shifts when value(RHS) > range(LHS). This patch caps the maximum value of GetWidthMinusOneValue to fit within range(RHS) to fix the issue with left shifts and fixes a code generation in EmitShr to fix the issue with right shifts and renames the function to GetMaximumShiftAmount to better reflect the new behaviour.

Fixes #80135.

llvmbot · 2024-02-03T00:11:08Z

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: Adam Magier (AdamMagierFOSS)

Changes

Testing the shift-exponent check with small width _BitInt values exposed a bug in ScalarExprEmitter::GetWidthMinusOneValue when using the result to determine valid exponent sizes. False positives were reported for some left shifts when width(LHS)-1 > range(RHS) and false negatives were reported for right shifts when value(RHS) > range(LHS). This patch caps the maximum value of GetWidthMinusOneValue to fit within range(RHS) to fix the issue with left shifts and fixes a code generation in EmitShr to fix the issue with right shifts.

Full diff: https://github.com/llvm/llvm-project/pull/80515.diff

2 Files Affected:

(modified) clang/lib/CodeGen/CGExprScalar.cpp (+8-1)
(added) clang/test/CodeGen/ubsan-shift-bitint.c (+36)

diff --git a/clang/lib/CodeGen/CGExprScalar.cpp b/clang/lib/CodeGen/CGExprScalar.cpp
index 5502f685f6474..e2e3ed839714a 100644
--- a/clang/lib/CodeGen/CGExprScalar.cpp
+++ b/clang/lib/CodeGen/CGExprScalar.cpp
@@ -4121,6 +4121,13 @@ Value *ScalarExprEmitter::GetWidthMinusOneValue(Value* LHS,Value* RHS) {
     Ty = cast<llvm::IntegerType>(VT->getElementType());
   else
     Ty = cast<llvm::IntegerType>(LHS->getType());
+  // Testing with small _BitInt types has shown that Ty->getBitwidth() - 1
+  // can sometimes overflow the capacity of RHS->getType(), cap the value
+  // to be the largest RHS->getType() can hold
+  llvm::APInt RHSMax =
+      llvm::APInt::getMaxValue(RHS->getType()->getScalarSizeInBits());
+  if (RHSMax.ult(Ty->getBitWidth()))
+    return llvm::ConstantInt::get(RHS->getType(), RHSMax);
   return llvm::ConstantInt::get(RHS->getType(), Ty->getBitWidth() - 1);
 }
 
@@ -4235,7 +4242,7 @@ Value *ScalarExprEmitter::EmitShr(const BinOpInfo &Ops) {
            isa<llvm::IntegerType>(Ops.LHS->getType())) {
     CodeGenFunction::SanitizerScope SanScope(&CGF);
     llvm::Value *Valid =
-        Builder.CreateICmpULE(RHS, GetWidthMinusOneValue(Ops.LHS, RHS));
+        Builder.CreateICmpULE(Ops.RHS, GetWidthMinusOneValue(Ops.LHS, Ops.RHS));
     EmitBinOpCheck(std::make_pair(Valid, SanitizerKind::ShiftExponent), Ops);
   }
 
diff --git a/clang/test/CodeGen/ubsan-shift-bitint.c b/clang/test/CodeGen/ubsan-shift-bitint.c
new file mode 100644
index 0000000000000..8ca94b7de5a42
--- /dev/null
+++ b/clang/test/CodeGen/ubsan-shift-bitint.c
@@ -0,0 +1,36 @@
+// RUN: %clang_cc1 %s -O0 -fsanitize=shift-exponent -emit-llvm -o - | FileCheck %s
+
+// Checking that the code generation is using the unextended/untruncated
+// exponent values and capping the values accordingly
+
+// CHECK-LABEL: define{{.*}} i32 @test_left_variable
+int test_left_variable(unsigned _BitInt(5) b, unsigned _BitInt(2) e) {
+  // CHECK: [[E_REG:%.+]] = load [[E_SIZE:i2]]
+  // CHECK: icmp ule [[E_SIZE]] [[E_REG]], -1
+  return b << e;
+}
+
+// CHECK-LABEL: define{{.*}} i32 @test_right_variable
+int test_right_variable(unsigned _BitInt(2) b, unsigned _BitInt(3) e) {
+  // CHECK: [[E_REG:%.+]] = load [[E_SIZE:i3]]
+  // CHECK: icmp ule [[E_SIZE]] [[E_REG]], 1
+  return b >> e;
+}
+
+// Old code generation would give false positives on left shifts when:
+//   value(e) > (width(b) - 1 % 2 ** width(e))
+// CHECK-LABEL: define{{.*}} i32 @test_left_literal
+int test_left_literal(unsigned _BitInt(5) b) {
+  // CHECK-NOT: br i1 false, label %cont, label %handler.shift_out_of_bounds
+  // CHECK: br i1 true, label %cont, label %handler.shift_out_of_bounds
+  return b << 3uwb;
+}
+
+// Old code generation would give false positives on right shifts when:
+//   (value(e) % 2 ** width(b)) < width(b)
+// CHECK-LABEL: define{{.*}} i32 @test_right_literal
+int test_right_literal(unsigned _BitInt(2) b) {
+  // CHECK-NOT: br i1 true, label %cont, label %handler.shift_out_of_bounds
+  // CHECK: br i1 false, label %cont, label %handler.shift_out_of_bounds
+  return b >> 4uwb;
+}

AdamMagierFOSS · 2024-02-03T00:12:50Z

One thing I'll preemptively address is I didn't know where to put the new unit testing - creating a separate file seems a little heavy handed but I see that there's a test for UBSan shift generation (clang/test/CodeGen/ubsan-shift.c) and one for UBSan + _BitInt (clang/test/CodeGen/ext-int-sanitizer.cpp). Both seem equally "valid" but neither seem to test in the same way that I'm trying to test these changes. Advice on this would be appreciated.

bjope · 2024-02-03T10:04:10Z

Make sure the commit message refers to fixing #80135.

(Github is a bit weird, so if you want to use the "squash an merge" support in the web UI later, then I think you need to update the first comment in the "Conversation", rather than updating the commits on this pull-request branch.)

bjope · 2024-02-03T10:51:36Z

clang/test/CodeGen/ubsan-shift-bitint.c

@@ -0,0 +1,36 @@
+// RUN: %clang_cc1 %s -O0 -fsanitize=shift-exponent -emit-llvm -o - | FileCheck %s


I think you for example want to use -std=c2x -triple=x86_64-unknown-linux here.
(Not sure exactly if you for example need the triple, but I fear that you would end up with failing build bots otherwise as not all targets support C23 _BitInt literals by default.)

Sounds good, thank you for the recommendation!

bjope · 2024-02-03T10:51:54Z

clang/lib/CodeGen/CGExprScalar.cpp

@@ -4121,6 +4121,13 @@ Value *ScalarExprEmitter::GetWidthMinusOneValue(Value* LHS,Value* RHS) {
    Ty = cast<llvm::IntegerType>(VT->getElementType());
  else
    Ty = cast<llvm::IntegerType>(LHS->getType());
+  // Testing with small _BitInt types has shown that Ty->getBitwidth() - 1


Prefer if the code comment just help describe what it is doing, and possibly why here.
IMO, the "Testing with small _BitInt types has shown..." part is not really that interesting as a code comment.

One idea is to rename the GetWidthMinusOneValue helper function as we slightly change what it does.
It's more like a GetMaximumShiftAmount kind of helper. And there should be an update function description that describe a bit more about what is going on. I'm not sure why this static function is declared inside ScalarExprEmitter but the current description is far away at line 776.

It could say something like this:
Get a maximum legal shift exponent given a shift operation shifting the value LHS by RHS steps. The result type is given by RHS. The scalar bit width of the LHS value gives an upper bound on the shift exponent as it is considered illegal to shift more steps than "width minus one". If that value does not fit in the result type, then return an unsinged max for the result type.

I've renamed the function to be more accurate, but to keep a consistent style with the surrounding code I've not put a formal function description and instead kept the explanatory comment (though rewriting it to not include mentions of _BitInt).

bjope · 2024-02-03T10:59:08Z

clang/test/CodeGen/ubsan-shift-bitint.c

+// CHECK-LABEL: define{{.*}} i32 @test_left_variable
+int test_left_variable(unsigned _BitInt(5) b, unsigned _BitInt(2) e) {
+  // CHECK: [[E_REG:%.+]] = load [[E_SIZE:i2]]
+  // CHECK: icmp ule [[E_SIZE]] [[E_REG]], -1


So this test case show something that we perhaps should follow up an optimize later.
Not sure if the idea is to make the frontend simple at let the middle end optimize it away, but it is a bit stupid to emit the check here when comparing with -1.

I tried to include this type of optimization in this patch but there's an assert that expects a check to be generated when the shift-exponent check is enabled. I suppose it wouldn't be too difficult to refactor this but from what I can tell none of the UBSan checks are really optimized in this way and instead rely on the middle end to optimize this out (e.g. the array-bounds check still generates a check on an array of size 256 when indexing with uint8_t). Don't know how much of an impact this type of pre-emptive optimization would have either.

rjmccall · 2024-02-03T18:58:59Z

One thing I'll preemptively address is I didn't know where to put the new unit testing - creating a separate file seems a little heavy handed but I see that there's a test for UBSan shift generation (clang/test/CodeGen/ubsan-shift.c) and one for UBSan + _BitInt (clang/test/CodeGen/ext-int-sanitizer.cpp). Both seem equally "valid" but neither seem to test in the same way that I'm trying to test these changes. Advice on this would be appreciated.

Either one is fine as long as you aren't having to narrow the portability of a test case in order to use _BitInt in it.

Patch generally LGTM.

AdamMagierFOSS · 2024-02-05T23:13:26Z

One thing I'll preemptively address is I didn't know where to put the new unit testing - creating a separate file seems a little heavy handed but I see that there's a test for UBSan shift generation (clang/test/CodeGen/ubsan-shift.c) and one for UBSan + _BitInt (clang/test/CodeGen/ext-int-sanitizer.cpp). Both seem equally "valid" but neither seem to test in the same way that I'm trying to test these changes. Advice on this would be appreciated.

Either one is fine as long as you aren't having to narrow the portability of a test case in order to use _BitInt in it.

Patch generally LGTM.

After thinking about it some more maybe it would be best to keep this test its own file. The other two tests check the optimized code but in this test I explicitly want to test the unoptimized code to make sure the base code generation is going as anticipated. The way I understand it, either I include these test cases in the existing tests and there's a disconnect in how the "first part" of the test gets performed vs the "second part" of the test or I keep the test separate. Unless there's a strong opinion otherwise I think this would be the best way forward.

github-actions · 2024-02-05T23:15:37Z

✅ With the latest revision this PR passed the C/C++ code formatter.

bjope

LG

Testing the shift-exponent check with small width _BitInt values exposed a bug in ScalarExprEmitter::GetWidthMinusOneValue when using the result to determine valid exponent sizes. False positives were reported for some left shifts when width(LHS)-1 > range(RHS) and false negatives were reported for right shifts when value(RHS) > range(LHS). This patch caps the maximum value of GetWidthMinusOneValue to fit within range(RHS) to fix the issue with left shifts and fixes a code generation in EmitShr to fix the issue with right shifts and renames the function to GetMaximumShiftAmount to better reflect the new behaviour. Fixes llvm#80135.

Commit 5f87957 (pull-requst llvm#80515) corrected some codegen problems related to _BitInt types being used as shift exponents. But it did not fix it properly for the special case when the shift count operand is a signed _BitInt. The basic problem is the same as the one solved for unsigned _BitInt. As we use an unsigned comparison to see if the shift exponent is out-of-bounds, then we need to find an unsigned maximum allowed shift amount to use in the check. Normally the shift amount is limited by bitwidth of the LHS of the shift. However, when the RHS type is small in relation to the LHS then we need to use a value that fits inside the bitwidth of the RHS instead. The earlier fix simply used the unsigned maximum when deterining the max shift amount based on the RHS type. It did however not take into consideration that the RHS type could have a signed representation. In such situations we need to use the signed maximum instead. Otherwise we do not recognize a negative shift exponent as UB.

…88004) Commit 5f87957 (pull-requst #80515) corrected some codegen problems related to _BitInt types being used as shift exponents. But it did not fix it properly for the special case when the shift count operand is a signed _BitInt. The basic problem is the same as the one solved for unsigned _BitInt. As we use an unsigned comparison to see if the shift exponent is out-of-bounds, then we need to find an unsigned maximum allowed shift amount to use in the check. Normally the shift amount is limited by bitwidth of the LHS of the shift. However, when the RHS type is small in relation to the LHS then we need to use a value that fits inside the bitwidth of the RHS instead. The earlier fix simply used the unsigned maximum when deterining the max shift amount based on the RHS type. It did however not take into consideration that the RHS type could have a signed representation. In such situations we need to use the signed maximum instead. Otherwise we do not recognize a negative shift exponent as UB.

AdamMagierFOSS added bug Indicates an unexpected problem or unintended behavior c23 clang:codegen compiler-rt:ubsan Undefined behavior sanitizer labels Feb 3, 2024

AdamMagierFOSS requested review from vedantk, jfbastien and rjmccall February 3, 2024 00:10

llvmbot added the clang Clang issues not falling into any other category label Feb 3, 2024

AdamMagierFOSS linked an issue Feb 3, 2024 that may be closed by this pull request

Faulty -fsanitize=shift-exponent checks (seen with _BitInt) #80135

Closed

bjope self-requested a review February 3, 2024 07:08

bjope reviewed Feb 3, 2024

View reviewed changes

bjope approved these changes Feb 6, 2024

View reviewed changes

AdamMagierFOSS force-pushed the ubsan-shift-exponent-fix branch from 3721afa to f990fb2 Compare February 6, 2024 16:24

AdamMagierFOSS force-pushed the ubsan-shift-exponent-fix branch from f990fb2 to d4b92b2 Compare February 6, 2024 16:26

AdamMagierFOSS merged commit 5f87957 into llvm:main Feb 6, 2024
4 checks passed

bjope mentioned this pull request Apr 8, 2024

[clang][CodeGen] Fix shift-exponent ubsan check for signed _BitInt #88004

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[clang][CodeGen][UBSan] Fixing shift-exponent generation for _BitInt #80515

[clang][CodeGen][UBSan] Fixing shift-exponent generation for _BitInt #80515

AdamMagierFOSS commented Feb 3, 2024 •

edited

llvmbot commented Feb 3, 2024 •

edited

AdamMagierFOSS commented Feb 3, 2024

bjope commented Feb 3, 2024

bjope Feb 3, 2024

AdamMagierFOSS Feb 5, 2024

bjope Feb 3, 2024

AdamMagierFOSS Feb 5, 2024

bjope Feb 3, 2024

AdamMagierFOSS Feb 5, 2024

rjmccall commented Feb 3, 2024

AdamMagierFOSS commented Feb 5, 2024

github-actions bot commented Feb 5, 2024 •

edited

bjope left a comment

		@@ -0,0 +1,36 @@
		// RUN: %clang_cc1 %s -O0 -fsanitize=shift-exponent -emit-llvm -o - \| FileCheck %s

[clang][CodeGen][UBSan] Fixing shift-exponent generation for _BitInt #80515

[clang][CodeGen][UBSan] Fixing shift-exponent generation for _BitInt #80515

Conversation

AdamMagierFOSS commented Feb 3, 2024 • edited

llvmbot commented Feb 3, 2024 • edited

AdamMagierFOSS commented Feb 3, 2024

bjope commented Feb 3, 2024

bjope Feb 3, 2024

Choose a reason for hiding this comment

AdamMagierFOSS Feb 5, 2024

Choose a reason for hiding this comment

bjope Feb 3, 2024

Choose a reason for hiding this comment

AdamMagierFOSS Feb 5, 2024

Choose a reason for hiding this comment

bjope Feb 3, 2024

Choose a reason for hiding this comment

AdamMagierFOSS Feb 5, 2024

Choose a reason for hiding this comment

rjmccall commented Feb 3, 2024

AdamMagierFOSS commented Feb 5, 2024

github-actions bot commented Feb 5, 2024 • edited

bjope left a comment

Choose a reason for hiding this comment

AdamMagierFOSS commented Feb 3, 2024 •

edited

llvmbot commented Feb 3, 2024 •

edited

github-actions bot commented Feb 5, 2024 •

edited