Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clang][CodeGen][UBSan] Fixing shift-exponent generation for _BitInt #80515

Merged
merged 1 commit into from
Feb 6, 2024

Conversation

AdamMagierFOSS
Copy link
Contributor

@AdamMagierFOSS AdamMagierFOSS commented Feb 3, 2024

Testing the shift-exponent check with small width _BitInt values exposed a bug in ScalarExprEmitter::GetWidthMinusOneValue when using the result to determine valid exponent sizes. False positives were reported for some left shifts when width(LHS)-1 > range(RHS) and false negatives were reported for right shifts when value(RHS) > range(LHS). This patch caps the maximum value of GetWidthMinusOneValue to fit within range(RHS) to fix the issue with left shifts and fixes a code generation in EmitShr to fix the issue with right shifts and renames the function to GetMaximumShiftAmount to better reflect the new behaviour.

Fixes #80135.

@AdamMagierFOSS AdamMagierFOSS added bug Indicates an unexpected problem or unintended behavior c23 clang:codegen compiler-rt:ubsan Undefined behavior sanitizer labels Feb 3, 2024
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Feb 3, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Feb 3, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: Adam Magier (AdamMagierFOSS)

Changes

Testing the shift-exponent check with small width _BitInt values exposed a bug in ScalarExprEmitter::GetWidthMinusOneValue when using the result to determine valid exponent sizes. False positives were reported for some left shifts when width(LHS)-1 > range(RHS) and false negatives were reported for right shifts when value(RHS) > range(LHS). This patch caps the maximum value of GetWidthMinusOneValue to fit within range(RHS) to fix the issue with left shifts and fixes a code generation in EmitShr to fix the issue with right shifts.


Full diff: https://github.com/llvm/llvm-project/pull/80515.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/CGExprScalar.cpp (+8-1)
  • (added) clang/test/CodeGen/ubsan-shift-bitint.c (+36)
diff --git a/clang/lib/CodeGen/CGExprScalar.cpp b/clang/lib/CodeGen/CGExprScalar.cpp
index 5502f685f6474..e2e3ed839714a 100644
--- a/clang/lib/CodeGen/CGExprScalar.cpp
+++ b/clang/lib/CodeGen/CGExprScalar.cpp
@@ -4121,6 +4121,13 @@ Value *ScalarExprEmitter::GetWidthMinusOneValue(Value* LHS,Value* RHS) {
     Ty = cast<llvm::IntegerType>(VT->getElementType());
   else
     Ty = cast<llvm::IntegerType>(LHS->getType());
+  // Testing with small _BitInt types has shown that Ty->getBitwidth() - 1
+  // can sometimes overflow the capacity of RHS->getType(), cap the value
+  // to be the largest RHS->getType() can hold
+  llvm::APInt RHSMax =
+      llvm::APInt::getMaxValue(RHS->getType()->getScalarSizeInBits());
+  if (RHSMax.ult(Ty->getBitWidth()))
+    return llvm::ConstantInt::get(RHS->getType(), RHSMax);
   return llvm::ConstantInt::get(RHS->getType(), Ty->getBitWidth() - 1);
 }
 
@@ -4235,7 +4242,7 @@ Value *ScalarExprEmitter::EmitShr(const BinOpInfo &Ops) {
            isa<llvm::IntegerType>(Ops.LHS->getType())) {
     CodeGenFunction::SanitizerScope SanScope(&CGF);
     llvm::Value *Valid =
-        Builder.CreateICmpULE(RHS, GetWidthMinusOneValue(Ops.LHS, RHS));
+        Builder.CreateICmpULE(Ops.RHS, GetWidthMinusOneValue(Ops.LHS, Ops.RHS));
     EmitBinOpCheck(std::make_pair(Valid, SanitizerKind::ShiftExponent), Ops);
   }
 
diff --git a/clang/test/CodeGen/ubsan-shift-bitint.c b/clang/test/CodeGen/ubsan-shift-bitint.c
new file mode 100644
index 0000000000000..8ca94b7de5a42
--- /dev/null
+++ b/clang/test/CodeGen/ubsan-shift-bitint.c
@@ -0,0 +1,36 @@
+// RUN: %clang_cc1 %s -O0 -fsanitize=shift-exponent -emit-llvm -o - | FileCheck %s
+
+// Checking that the code generation is using the unextended/untruncated
+// exponent values and capping the values accordingly
+
+// CHECK-LABEL: define{{.*}} i32 @test_left_variable
+int test_left_variable(unsigned _BitInt(5) b, unsigned _BitInt(2) e) {
+  // CHECK: [[E_REG:%.+]] = load [[E_SIZE:i2]]
+  // CHECK: icmp ule [[E_SIZE]] [[E_REG]], -1
+  return b << e;
+}
+
+// CHECK-LABEL: define{{.*}} i32 @test_right_variable
+int test_right_variable(unsigned _BitInt(2) b, unsigned _BitInt(3) e) {
+  // CHECK: [[E_REG:%.+]] = load [[E_SIZE:i3]]
+  // CHECK: icmp ule [[E_SIZE]] [[E_REG]], 1
+  return b >> e;
+}
+
+// Old code generation would give false positives on left shifts when:
+//   value(e) > (width(b) - 1 % 2 ** width(e))
+// CHECK-LABEL: define{{.*}} i32 @test_left_literal
+int test_left_literal(unsigned _BitInt(5) b) {
+  // CHECK-NOT: br i1 false, label %cont, label %handler.shift_out_of_bounds
+  // CHECK: br i1 true, label %cont, label %handler.shift_out_of_bounds
+  return b << 3uwb;
+}
+
+// Old code generation would give false positives on right shifts when:
+//   (value(e) % 2 ** width(b)) < width(b)
+// CHECK-LABEL: define{{.*}} i32 @test_right_literal
+int test_right_literal(unsigned _BitInt(2) b) {
+  // CHECK-NOT: br i1 true, label %cont, label %handler.shift_out_of_bounds
+  // CHECK: br i1 false, label %cont, label %handler.shift_out_of_bounds
+  return b >> 4uwb;
+}

@AdamMagierFOSS
Copy link
Contributor Author

One thing I'll preemptively address is I didn't know where to put the new unit testing - creating a separate file seems a little heavy handed but I see that there's a test for UBSan shift generation (clang/test/CodeGen/ubsan-shift.c) and one for UBSan + _BitInt (clang/test/CodeGen/ext-int-sanitizer.cpp). Both seem equally "valid" but neither seem to test in the same way that I'm trying to test these changes. Advice on this would be appreciated.

@AdamMagierFOSS AdamMagierFOSS linked an issue Feb 3, 2024 that may be closed by this pull request
@bjope bjope self-requested a review February 3, 2024 07:08
@bjope
Copy link
Collaborator

bjope commented Feb 3, 2024

Make sure the commit message refers to fixing #80135.

(Github is a bit weird, so if you want to use the "squash an merge" support in the web UI later, then I think you need to update the first comment in the "Conversation", rather than updating the commits on this pull-request branch.)

@@ -0,0 +1,36 @@
// RUN: %clang_cc1 %s -O0 -fsanitize=shift-exponent -emit-llvm -o - | FileCheck %s
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you for example want to use -std=c2x -triple=x86_64-unknown-linux here.
(Not sure exactly if you for example need the triple, but I fear that you would end up with failing build bots otherwise as not all targets support C23 _BitInt literals by default.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, thank you for the recommendation!

@@ -4121,6 +4121,13 @@ Value *ScalarExprEmitter::GetWidthMinusOneValue(Value* LHS,Value* RHS) {
Ty = cast<llvm::IntegerType>(VT->getElementType());
else
Ty = cast<llvm::IntegerType>(LHS->getType());
// Testing with small _BitInt types has shown that Ty->getBitwidth() - 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer if the code comment just help describe what it is doing, and possibly why here.
IMO, the "Testing with small _BitInt types has shown..." part is not really that interesting as a code comment.

One idea is to rename the GetWidthMinusOneValue helper function as we slightly change what it does.
It's more like a GetMaximumShiftAmount kind of helper. And there should be an update function description that describe a bit more about what is going on. I'm not sure why this static function is declared inside ScalarExprEmitter but the current description is far away at line 776.

It could say something like this:
Get a maximum legal shift exponent given a shift operation shifting the value LHS by RHS steps. The result type is given by RHS. The scalar bit width of the LHS value gives an upper bound on the shift exponent as it is considered illegal to shift more steps than "width minus one". If that value does not fit in the result type, then return an unsinged max for the result type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed the function to be more accurate, but to keep a consistent style with the surrounding code I've not put a formal function description and instead kept the explanatory comment (though rewriting it to not include mentions of _BitInt).

// CHECK-LABEL: define{{.*}} i32 @test_left_variable
int test_left_variable(unsigned _BitInt(5) b, unsigned _BitInt(2) e) {
// CHECK: [[E_REG:%.+]] = load [[E_SIZE:i2]]
// CHECK: icmp ule [[E_SIZE]] [[E_REG]], -1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this test case show something that we perhaps should follow up an optimize later.
Not sure if the idea is to make the frontend simple at let the middle end optimize it away, but it is a bit stupid to emit the check here when comparing with -1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to include this type of optimization in this patch but there's an assert that expects a check to be generated when the shift-exponent check is enabled. I suppose it wouldn't be too difficult to refactor this but from what I can tell none of the UBSan checks are really optimized in this way and instead rely on the middle end to optimize this out (e.g. the array-bounds check still generates a check on an array of size 256 when indexing with uint8_t). Don't know how much of an impact this type of pre-emptive optimization would have either.

@rjmccall
Copy link
Contributor

rjmccall commented Feb 3, 2024

One thing I'll preemptively address is I didn't know where to put the new unit testing - creating a separate file seems a little heavy handed but I see that there's a test for UBSan shift generation (clang/test/CodeGen/ubsan-shift.c) and one for UBSan + _BitInt (clang/test/CodeGen/ext-int-sanitizer.cpp). Both seem equally "valid" but neither seem to test in the same way that I'm trying to test these changes. Advice on this would be appreciated.

Either one is fine as long as you aren't having to narrow the portability of a test case in order to use _BitInt in it.

Patch generally LGTM.

@AdamMagierFOSS
Copy link
Contributor Author

One thing I'll preemptively address is I didn't know where to put the new unit testing - creating a separate file seems a little heavy handed but I see that there's a test for UBSan shift generation (clang/test/CodeGen/ubsan-shift.c) and one for UBSan + _BitInt (clang/test/CodeGen/ext-int-sanitizer.cpp). Both seem equally "valid" but neither seem to test in the same way that I'm trying to test these changes. Advice on this would be appreciated.

Either one is fine as long as you aren't having to narrow the portability of a test case in order to use _BitInt in it.

Patch generally LGTM.

After thinking about it some more maybe it would be best to keep this test its own file. The other two tests check the optimized code but in this test I explicitly want to test the unoptimized code to make sure the base code generation is going as anticipated. The way I understand it, either I include these test cases in the existing tests and there's a disconnect in how the "first part" of the test gets performed vs the "second part" of the test or I keep the test separate. Unless there's a strong opinion otherwise I think this would be the best way forward.

Copy link

github-actions bot commented Feb 5, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Collaborator

@bjope bjope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG

Testing the shift-exponent check with small width _BitInt values exposed
a bug in ScalarExprEmitter::GetWidthMinusOneValue when using the result
to determine valid exponent sizes. False positives were reported for
some left shifts when width(LHS)-1 > range(RHS) and false negatives were
reported for right shifts when value(RHS) > range(LHS). This patch caps
the maximum value of GetWidthMinusOneValue to fit within range(RHS) to
fix the issue with left shifts and fixes a code generation in EmitShr to
fix the issue with right shifts and renames the function to
GetMaximumShiftAmount to better reflect the new behaviour.

Fixes llvm#80135.
@AdamMagierFOSS AdamMagierFOSS merged commit 5f87957 into llvm:main Feb 6, 2024
4 checks passed
bjope added a commit to bjope/llvm-project that referenced this pull request Apr 8, 2024
Commit 5f87957 (pull-requst llvm#80515) corrected some
codegen problems related to _BitInt types being used as shift
exponents. But it did not fix it properly for the special
case when the shift count operand is a signed _BitInt.

The basic problem is the same as the one solved for unsigned
_BitInt. As we use an unsigned comparison to see if the shift
exponent is out-of-bounds, then we need to find an unsigned
maximum allowed shift amount to use in the check. Normally the
shift amount is limited by bitwidth of the LHS of the shift.
However, when the RHS type is small in relation to the LHS then
we need to use a value that fits inside the bitwidth of the RHS
instead.

The earlier fix simply used the unsigned maximum when deterining
the max shift amount based on the RHS type. It did however not
take into consideration that the RHS type could have a signed
representation. In such situations we need to use the signed
maximum instead. Otherwise we do not recognize a negative shift
exponent as UB.
bjope added a commit that referenced this pull request Apr 19, 2024
…88004)

Commit 5f87957 (pull-requst #80515) corrected some codegen
problems related to _BitInt types being used as shift exponents. But it
did not fix it properly for the special case when the shift count
operand is a signed _BitInt.

The basic problem is the same as the one solved for unsigned _BitInt. As
we use an unsigned comparison to see if the shift exponent is
out-of-bounds, then we need to find an unsigned maximum allowed shift
amount to use in the check. Normally the shift amount is limited by
bitwidth of the LHS of the shift. However, when the RHS type is small in
relation to the LHS then we need to use a value that fits inside the
bitwidth of the RHS instead.

The earlier fix simply used the unsigned maximum when deterining the max
shift amount based on the RHS type. It did however not take into
consideration that the RHS type could have a signed representation. In
such situations we need to use the signed maximum instead. Otherwise we
do not recognize a negative shift exponent as UB.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior c23 clang:codegen clang Clang issues not falling into any other category compiler-rt:ubsan Undefined behavior sanitizer
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Faulty -fsanitize=shift-exponent checks (seen with _BitInt)
4 participants