Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clang][UBSan] Add implicit conversion check for bitfields #75481

Merged
merged 3 commits into from
Apr 3, 2024

Conversation

Zonotora
Copy link
Contributor

@Zonotora Zonotora commented Dec 14, 2023

This patch implements the implicit truncation and implicit sign change checks for bitfields using UBSan. E.g.,
-fsanitize=implicit-bitfield-truncation and -fsanitize=implicit-bitfield-sign-change.
However, right now some unnecessary emits are generated that ideally will be removed in the future. Let me explain.

This commit implements the truncation and sign change checks whenever we are emitting a EmitStoreThroughBitfieldLValue. I use the llvm::Value updated through EmitStoreThroughBitfieldLValue to check whether the value changed or not. However, calling EmitStoreThroughBitfieldLValue means we have already evaluated RHS and potentially emitted a truncation or sign check already, which is the case if we have a bitfield of type char and the RHS is of type int. Thus, in this case we may report the truncation or sign change error two times at runtime under the right circumstances which is not ideal. E.g.,

#include <stdio.h>
#include <math.h>

typedef struct {
    unsigned char a:4;
} X;

int main(void) {
    X x;
    unsigned int a = 272;
    x.a = a;
    return 0;
}

produces

main.c:11:11: runtime error: implicit conversion from type 'unsigned int' of value 272 (32-bit, unsigned) to type 'unsigned char' changed the value to 16 (8-bit, unsigned)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior main.c:12:11 in
main.c:11:9: runtime error: implicit conversion from type 'unsigned int' of value 16 (32-bit, unsigned) to type 'unsigned char' changed the value to 0 (4-bit bitfield, unsigned)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior main.c:12:9 in

because we truncate the value 272 into a char when evaluating the RHS (resulting in the value 16) and then truncate it again when assigning it to the bitfield.

I thought of disabling the relevant sanitizer checks when evaluating the RHS, but to make it work for all cases I need to extract the previous (before casting it to char in the above case) llvm::Value from Visit(E->getRHS()) in order to compare it, which seemed more complicated for a first approach. The current approach is easier to iterate upon and is more isolated, but again, not ideal.

I would appreciate feedback so that I can advance this to a mergeable state. This is my first commit here, so I have probably missed some essentials!

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added clang Clang issues not falling into any other category compiler-rt clang:codegen compiler-rt:sanitizer labels Dec 14, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Dec 14, 2023

@llvm/pr-subscribers-clang-driver
@llvm/pr-subscribers-compiler-rt-sanitizer
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: Axel Lundberg (Zonotora)

Changes

This patch implements the implicit truncation and implicit sign change checks for bitfields using UBSan. E.g.,
-fsanitize=implicit-integer-truncation and -fsanitize=implicit-integer-sign-change.
However, right now some unnecessary emits are generated that ideally will be removed in the future. Let me explain.

This commit implements the truncation and sign change checks whenever we are emitting a EmitStoreThroughBitfieldLValue. I use the llvm::Value updated through EmitStoreThroughBitfieldLValue to check whether the value changed or not. However, calling EmitStoreThroughBitfieldLValue means we have already evaluated RHS and potentially emitted a truncation or sign check already, which is the case if we have a bitfield of type char and the RHS is of type int. Thus, in this case we may report the truncation or sign change error two times at runtime under the right circumstances which is not ideal. E.g.,

#include &lt;stdio.h&gt;
#include &lt;math.h&gt;

typedef struct {
    unsigned char a:4;
} X;

int main(void) {
    X x;
    unsigned int a = 272;
    x.a = a;
    return 0;
}

produces

main.c:11:11: runtime error: implicit conversion from type 'unsigned int' of value 272 (32-bit, unsigned) to type 'unsigned char' changed the value to 16 (8-bit, unsigned)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior main.c:12:11 in
main.c:11:9: runtime error: implicit conversion from type 'unsigned int' of value 16 (32-bit, unsigned) to type 'unsigned char' changed the value to 0 (4-bit bitfield, unsigned)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior main.c:12:9 in

because we truncate the value 272 into a char when evaluating the RHS (resulting in the value 16) and then truncate it again when assigning it to the bitfield.

I thought of disabling the relevant sanitizer checks when evaluating the RHS, but to make it work for all cases I need to extract the previous (before casting it to char in the above case) llvm::Value from Visit(E-&gt;getRHS()) in order to compare it, which seemed more complicated for a first approach. The current approach is easier to iterate upon and is more isolated, but again, not ideal.

I would appreciate feedback so that I can advance this to a mergeable state. This is my first commit here, so I have probably missed some essentials!


Patch is 20.46 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/75481.diff

5 Files Affected:

  • (modified) clang/lib/CodeGen/CGExprScalar.cpp (+283-26)
  • (added) clang/test/CodeGen/ubsan-bitfield-conversion.c (+27)
  • (modified) compiler-rt/lib/ubsan/ubsan_diag.h (+1-1)
  • (modified) compiler-rt/lib/ubsan/ubsan_handlers.cpp (+6-4)
  • (modified) compiler-rt/lib/ubsan/ubsan_handlers.h (+1)
diff --git a/clang/lib/CodeGen/CGExprScalar.cpp b/clang/lib/CodeGen/CGExprScalar.cpp
index 41ad2ddac30d2d..c93c5d89583d68 100644
--- a/clang/lib/CodeGen/CGExprScalar.cpp
+++ b/clang/lib/CodeGen/CGExprScalar.cpp
@@ -15,6 +15,7 @@
 #include "CGDebugInfo.h"
 #include "CGObjCRuntime.h"
 #include "CGOpenMPRuntime.h"
+#include "CGRecordLayout.h"
 #include "CodeGenFunction.h"
 #include "CodeGenModule.h"
 #include "ConstantEmitter.h"
@@ -324,6 +325,25 @@ class ScalarExprEmitter
   void EmitIntegerSignChangeCheck(Value *Src, QualType SrcType, Value *Dst,
                                   QualType DstType, SourceLocation Loc);
 
+  /// Emit a check that an [implicit] truncation of a bitfield does not
+  /// discard any bits. It is not UB, so we use the value after truncation.
+  void EmitBitfieldTruncationCheck(Value *Src, QualType SrcType, Value *Dst,
+                                   QualType DstType, const CGBitFieldInfo &Info,
+                                   SourceLocation Loc);
+
+  /// Emit a check that an [implicit] conversion of a bitfield does not change
+  /// the sign of the value. It is not UB, so we use the value after conversion.
+  /// NOTE: Src and Dst may be the exact same value! (point to the same thing)
+  void EmitBitfieldSignChangeCheck(Value *Src, QualType SrcType, Value *Dst,
+                                   QualType DstType, const CGBitFieldInfo &Info,
+                                   SourceLocation Loc);
+
+  /// Emit a check that an [implicit] conversion of a bitfield. It is not UB,
+  /// so we use the value after conversion.
+  void EmitBitfieldConversionCheck(Value *Src, QualType SrcType, Value *Dst,
+                                   QualType DstType, const CGBitFieldInfo &Info,
+                                   SourceLocation Loc);
+
   /// Emit a conversion from the specified type to the specified destination
   /// type, both of which are LLVM scalar types.
   struct ScalarConversionOpts {
@@ -1094,6 +1114,27 @@ void ScalarExprEmitter::EmitIntegerTruncationCheck(Value *Src, QualType SrcType,
                 {Src, Dst});
 }
 
+static llvm::Value *EmitIsNegativeTestHelper(Value *V, QualType VType,
+                                             const char *Name,
+                                             CGBuilderTy &Builder) {
+  // NOTE: zero value is considered to be non-negative.
+  // Is this value a signed type?
+  bool VSigned = VType->isSignedIntegerOrEnumerationType();
+  llvm::Type *VTy = V->getType();
+  if (!VSigned) {
+    // If the value is unsigned, then it is never negative.
+    // FIXME: can we encounter non-scalar VTy here?
+    return llvm::ConstantInt::getFalse(VTy->getContext());
+  }
+  // Get the zero of the same type with which we will be comparing.
+  llvm::Constant *Zero = llvm::ConstantInt::get(VTy, 0);
+  // %V.isnegative = icmp slt %V, 0
+  // I.e is %V *strictly* less than zero, does it have negative value?
+  return Builder.CreateICmp(llvm::ICmpInst::ICMP_SLT, V, Zero,
+                            llvm::Twine(Name) + "." + V->getName() +
+                                ".negativitycheck");
+};
+
 // Should be called within CodeGenFunction::SanitizerScope RAII scope.
 // Returns 'i1 false' when the conversion Src -> Dst changed the sign.
 static std::pair<ScalarExprEmitter::ImplicitConversionCheckKind,
@@ -1118,30 +1159,12 @@ EmitIntegerSignChangeCheckHelper(Value *Src, QualType SrcType, Value *Dst,
   assert(((SrcBits != DstBits) || (SrcSigned != DstSigned)) &&
          "either the widths should be different, or the signednesses.");
 
-  // NOTE: zero value is considered to be non-negative.
-  auto EmitIsNegativeTest = [&Builder](Value *V, QualType VType,
-                                       const char *Name) -> Value * {
-    // Is this value a signed type?
-    bool VSigned = VType->isSignedIntegerOrEnumerationType();
-    llvm::Type *VTy = V->getType();
-    if (!VSigned) {
-      // If the value is unsigned, then it is never negative.
-      // FIXME: can we encounter non-scalar VTy here?
-      return llvm::ConstantInt::getFalse(VTy->getContext());
-    }
-    // Get the zero of the same type with which we will be comparing.
-    llvm::Constant *Zero = llvm::ConstantInt::get(VTy, 0);
-    // %V.isnegative = icmp slt %V, 0
-    // I.e is %V *strictly* less than zero, does it have negative value?
-    return Builder.CreateICmp(llvm::ICmpInst::ICMP_SLT, V, Zero,
-                              llvm::Twine(Name) + "." + V->getName() +
-                                  ".negativitycheck");
-  };
-
   // 1. Was the old Value negative?
-  llvm::Value *SrcIsNegative = EmitIsNegativeTest(Src, SrcType, "src");
+  llvm::Value *SrcIsNegative =
+      EmitIsNegativeTestHelper(Src, SrcType, "src", Builder);
   // 2. Is the new Value negative?
-  llvm::Value *DstIsNegative = EmitIsNegativeTest(Dst, DstType, "dst");
+  llvm::Value *DstIsNegative =
+      EmitIsNegativeTestHelper(Dst, DstType, "dst", Builder);
   // 3. Now, was the 'negativity status' preserved during the conversion?
   //    NOTE: conversion from negative to zero is considered to change the sign.
   //    (We want to get 'false' when the conversion changed the sign)
@@ -1236,6 +1259,221 @@ void ScalarExprEmitter::EmitIntegerSignChangeCheck(Value *Src, QualType SrcType,
                 {Src, Dst});
 }
 
+// Should be called within CodeGenFunction::SanitizerScope RAII scope.
+// Returns 'i1 false' when the truncation Src -> Dst was lossy.
+static std::pair<ScalarExprEmitter::ImplicitConversionCheckKind,
+                 std::pair<llvm::Value *, SanitizerMask>>
+EmitBitfieldTruncationCheckHelper(Value *Src, QualType SrcType, Value *Dst,
+                                  QualType DstType, CGBuilderTy &Builder) {
+
+  llvm::Type *SrcTy = Src->getType();
+  llvm::Type *DstTy = Dst->getType();
+  (void)SrcTy; // Only used in assert()
+  (void)DstTy; // Only used in assert()
+
+  // This should be truncation of integral types.
+  assert(isa<llvm::IntegerType>(SrcTy) && isa<llvm::IntegerType>(DstTy) &&
+         "non-integer llvm type");
+
+  bool SrcSigned = SrcType->isSignedIntegerOrEnumerationType();
+  bool DstSigned = DstType->isSignedIntegerOrEnumerationType();
+
+  ScalarExprEmitter::ImplicitConversionCheckKind Kind;
+  SanitizerMask Mask;
+  if (!SrcSigned && !DstSigned) {
+    Kind = ScalarExprEmitter::ICCK_UnsignedIntegerTruncation;
+    Mask = SanitizerKind::ImplicitUnsignedIntegerTruncation;
+  } else {
+    Kind = ScalarExprEmitter::ICCK_SignedIntegerTruncation;
+    Mask = SanitizerKind::ImplicitSignedIntegerTruncation;
+  }
+
+  // Since Src already has the same type as Dst, we don't have
+  // to sign extend or truncate the Src value.
+  llvm::Value *CheckV = Builder.CreateICmpEQ(Dst, Src, "bf.truncheck");
+  // If the comparison result is 'i1 false', then the truncation was lossy.
+
+  return std::make_pair(Kind, std::make_pair(CheckV, Mask));
+}
+
+void ScalarExprEmitter::EmitBitfieldTruncationCheck(
+    Value *Src, QualType SrcType, Value *Dst, QualType DstType,
+    const CGBitFieldInfo &Info, SourceLocation Loc) {
+
+  if (!CGF.SanOpts.hasOneOf(SanitizerKind::ImplicitIntegerTruncation))
+    return;
+
+  // TODO: Calculate src width to avoid emitting code
+  // for unecessary cases.
+  unsigned SrcBits = ConvertType(SrcType)->getScalarSizeInBits();
+  unsigned BitfieldWidth = Info.Size;
+  // This must be truncation. Else we do not care.
+  if (SrcBits <= BitfieldWidth)
+    return;
+
+  // If the integer sign change sanitizer is enabled,
+  // and we are truncating from larger unsigned type to smaller signed type,
+  // let that next sanitizer deal with it.
+  bool SrcSigned = SrcType->isSignedIntegerOrEnumerationType();
+  bool DstSigned = DstType->isSignedIntegerOrEnumerationType();
+  if (CGF.SanOpts.has(SanitizerKind::ImplicitIntegerSignChange) &&
+      (!SrcSigned && DstSigned))
+    return;
+
+  CodeGenFunction::SanitizerScope SanScope(&CGF);
+
+  std::pair<ScalarExprEmitter::ImplicitConversionCheckKind,
+            std::pair<llvm::Value *, SanitizerMask>>
+      Check = EmitBitfieldTruncationCheckHelper(Src, SrcType, Dst, DstType,
+                                                Builder);
+  // If the comparison result is 'i1 false', then the truncation was lossy.
+
+  llvm::Constant *StaticArgs[] = {
+      CGF.EmitCheckSourceLocation(Loc), CGF.EmitCheckTypeDescriptor(SrcType),
+      CGF.EmitCheckTypeDescriptor(DstType),
+      llvm::ConstantInt::get(Builder.getInt8Ty(), Check.first),
+      llvm::ConstantInt::get(Builder.getInt32Ty(), Info.Size)};
+  CGF.EmitCheck(Check.second, SanitizerHandler::ImplicitConversion, StaticArgs,
+                {Src, Dst});
+}
+
+// Should be called within CodeGenFunction::SanitizerScope RAII scope.
+// Returns 'i1 false' when the conversion Src -> Dst changed the sign.
+static std::pair<ScalarExprEmitter::ImplicitConversionCheckKind,
+                 std::pair<llvm::Value *, SanitizerMask>>
+EmitBitfieldSignChangeCheckHelper(Value *Src, QualType SrcType,
+                                  unsigned SrcBits, Value *Dst,
+                                  QualType DstType, unsigned DstBits,
+                                  CGBuilderTy &Builder) {
+  llvm::Type *SrcTy = Src->getType();
+  llvm::Type *DstTy = Dst->getType();
+  (void)SrcTy; // Only used in assert()
+  (void)DstTy; // Only used in assert()
+
+  // This should be truncation of integral types.
+  assert(isa<llvm::IntegerType>(SrcTy) && isa<llvm::IntegerType>(DstTy) &&
+         "non-integer llvm type");
+
+  bool SrcSigned = SrcType->isSignedIntegerOrEnumerationType();
+  bool DstSigned = DstType->isSignedIntegerOrEnumerationType();
+
+  assert(((SrcBits != DstBits) || (SrcSigned != DstSigned)) &&
+         "either the widths should be different, or the signednesses.");
+
+  // 1. Was the old Value negative?
+  llvm::Value *SrcIsNegative =
+      EmitIsNegativeTestHelper(Src, SrcType, "bf.src", Builder);
+  // 2. Is the new Value negative?
+  llvm::Value *DstIsNegative =
+      EmitIsNegativeTestHelper(Dst, DstType, "bf.dst", Builder);
+  // 3. Now, was the 'negativity status' preserved during the conversion?
+  //    NOTE: conversion from negative to zero is considered to change the sign.
+  //    (We want to get 'false' when the conversion changed the sign)
+  //    So we should just equality-compare the negativity statuses.
+  llvm::Value *Check = nullptr;
+  Check =
+      Builder.CreateICmpEQ(SrcIsNegative, DstIsNegative, "bf.signchangecheck");
+  // If the comparison result is 'false', then the conversion changed the sign.
+  return std::make_pair(
+      ScalarExprEmitter::ICCK_IntegerSignChange,
+      std::make_pair(Check, SanitizerKind::ImplicitIntegerSignChange));
+}
+
+void ScalarExprEmitter::EmitBitfieldSignChangeCheck(
+    Value *Src, QualType SrcType, Value *Dst, QualType DstType,
+    const CGBitFieldInfo &Info, SourceLocation Loc) {
+
+  if (!CGF.SanOpts.has(SanitizerKind::ImplicitIntegerSignChange))
+    return;
+
+  bool SrcSigned = SrcType->isSignedIntegerOrEnumerationType();
+  bool DstSigned = DstType->isSignedIntegerOrEnumerationType();
+  unsigned SrcBits = ConvertType(SrcType)->getScalarSizeInBits();
+  unsigned DstBits = Info.Size;
+
+  // Now, we do not need to emit the check in *all* of the cases.
+  // We can avoid emitting it in some obvious cases where it would have been
+  // dropped by the opt passes (instcombine) always anyways.
+  // If it's a cast between effectively the same type, no check.
+  // NOTE: this is *not* equivalent to checking the canonical types.
+  if (SrcSigned == DstSigned && SrcBits == DstBits)
+    return;
+  // At least one of the values needs to have signed type.
+  // If both are unsigned, then obviously, neither of them can be negative.
+  if (!SrcSigned && !DstSigned)
+    return;
+  // If the conversion is to *larger* *signed* type, then no check is needed.
+  // Because either sign-extension happens (so the sign will remain),
+  // or zero-extension will happen (the sign bit will be zero.)
+  if ((DstBits > SrcBits) && DstSigned)
+    return;
+  if (CGF.SanOpts.has(SanitizerKind::ImplicitSignedIntegerTruncation) &&
+      (SrcBits > DstBits) && SrcSigned) {
+    // If the signed integer truncation sanitizer is enabled,
+    // and this is a truncation from signed type, then no check is needed.
+    // Because here sign change check is interchangeable with truncation check.
+    return;
+  }
+  // That's it. We can't rule out any more cases with the data we have.
+
+  CodeGenFunction::SanitizerScope SanScope(&CGF);
+
+  std::pair<ScalarExprEmitter::ImplicitConversionCheckKind,
+            std::pair<llvm::Value *, SanitizerMask>>
+      Check;
+
+  // Each of these checks needs to return 'false' when an issue was detected.
+  ImplicitConversionCheckKind CheckKind;
+  llvm::SmallVector<std::pair<llvm::Value *, SanitizerMask>, 2> Checks;
+  // So we can 'and' all the checks together, and still get 'false',
+  // if at least one of the checks detected an issue.
+
+  Check = EmitBitfieldSignChangeCheckHelper(Src, SrcType, SrcBits, Dst, DstType,
+                                            DstBits, Builder);
+  CheckKind = Check.first;
+  Checks.emplace_back(Check.second);
+
+  if (CGF.SanOpts.has(SanitizerKind::ImplicitSignedIntegerTruncation) &&
+      (SrcBits > DstBits) && !SrcSigned && DstSigned) {
+    // If the signed integer truncation sanitizer was enabled,
+    // and we are truncating from larger unsigned type to smaller signed type,
+    // let's handle the case we skipped in that check.
+    Check =
+        EmitBitfieldTruncationCheckHelper(Src, SrcType, Dst, DstType, Builder);
+    CheckKind = ICCK_SignedIntegerTruncationOrSignChange;
+    Checks.emplace_back(Check.second);
+    // If the comparison result is 'i1 false', then the truncation was lossy.
+  }
+
+  llvm::Constant *StaticArgs[] = {
+      CGF.EmitCheckSourceLocation(Loc), CGF.EmitCheckTypeDescriptor(SrcType),
+      CGF.EmitCheckTypeDescriptor(DstType),
+      llvm::ConstantInt::get(Builder.getInt8Ty(), CheckKind),
+      llvm::ConstantInt::get(Builder.getInt32Ty(), Info.Size)};
+  // EmitCheck() will 'and' all the checks together.
+  CGF.EmitCheck(Checks, SanitizerHandler::ImplicitConversion, StaticArgs,
+                {Src, Dst});
+}
+
+void ScalarExprEmitter::EmitBitfieldConversionCheck(
+    Value *Src, QualType SrcType, Value *Dst, QualType DstType,
+    const CGBitFieldInfo &Info, SourceLocation Loc) {
+
+  if (!CGF.SanOpts.hasOneOf(SanitizerKind::ImplicitConversion))
+    return;
+
+  // We only care about int->int conversions here.
+  // We ignore conversions to/from pointer and/or bool.
+  if (!PromotionIsPotentiallyEligibleForImplicitIntegerConversionCheck(SrcType,
+                                                                       DstType))
+    return;
+
+  assert(!DstType->isBooleanType() && "we should not get here with booleans.");
+
+  EmitBitfieldTruncationCheck(Src, SrcType, Dst, DstType, Info, Loc);
+  EmitBitfieldSignChangeCheck(Src, SrcType, Dst, DstType, Info, Loc);
+}
+
 Value *ScalarExprEmitter::EmitScalarCast(Value *Src, QualType SrcType,
                                          QualType DstType, llvm::Type *SrcTy,
                                          llvm::Type *DstTy,
@@ -2845,9 +3083,12 @@ ScalarExprEmitter::EmitScalarPrePostIncDec(const UnaryOperator *E, LValue LV,
   }
 
   // Store the updated result through the lvalue.
-  if (LV.isBitField())
+  if (LV.isBitField()) {
+    Value *Src = value;
     CGF.EmitStoreThroughBitfieldLValue(RValue::get(value), LV, &value);
-  else
+    EmitBitfieldConversionCheck(Src, E->getType(), value, E->getType(),
+                                LV.getBitFieldInfo(), E->getExprLoc());
+  } else
     CGF.EmitStoreThroughLValue(RValue::get(value), LV);
 
   // If this is a postinc, return the value read from memory, otherwise use the
@@ -3372,9 +3613,17 @@ LValue ScalarExprEmitter::EmitCompoundAssignLValue(
   // specially because the result is altered by the store, i.e., [C99 6.5.16p1]
   // 'An assignment expression has the value of the left operand after the
   // assignment...'.
-  if (LHSLV.isBitField())
+  if (LHSLV.isBitField()) {
+    Value *Src = Result;
+    QualType SrcType = E->getRHS()->getType();
+    QualType DstType = E->getLHS()->getType();
+    if (ImplicitCastExpr *ICE = dyn_cast<ImplicitCastExpr>(E->getRHS())) {
+      SrcType = ICE->getSubExpr()->getType();
+    }
     CGF.EmitStoreThroughBitfieldLValue(RValue::get(Result), LHSLV, &Result);
-  else
+    EmitBitfieldConversionCheck(Src, SrcType, Result, DstType,
+                                LHSLV.getBitFieldInfo(), E->getExprLoc());
+  } else
     CGF.EmitStoreThroughLValue(RValue::get(Result), LHSLV);
 
   if (CGF.getLangOpts().OpenMP)
@@ -4510,7 +4759,15 @@ Value *ScalarExprEmitter::VisitBinAssign(const BinaryOperator *E) {
     // 'An assignment expression has the value of the left operand after
     // the assignment...'.
     if (LHS.isBitField()) {
+      Value *Src = RHS;
+      QualType SrcType = E->getRHS()->getType();
+      QualType DstType = E->getLHS()->getType();
+      if (ImplicitCastExpr *ICE = dyn_cast<ImplicitCastExpr>(E->getRHS())) {
+        SrcType = ICE->getSubExpr()->getType();
+      }
       CGF.EmitStoreThroughBitfieldLValue(RValue::get(RHS), LHS, &RHS);
+      EmitBitfieldConversionCheck(Src, SrcType, RHS, DstType,
+                                  LHS.getBitFieldInfo(), E->getExprLoc());
     } else {
       CGF.EmitNullabilityCheck(LHS, RHS, E->getExprLoc());
       CGF.EmitStoreThroughLValue(RValue::get(RHS), LHS);
diff --git a/clang/test/CodeGen/ubsan-bitfield-conversion.c b/clang/test/CodeGen/ubsan-bitfield-conversion.c
new file mode 100644
index 00000000000000..4b2940f106c576
--- /dev/null
+++ b/clang/test/CodeGen/ubsan-bitfield-conversion.c
@@ -0,0 +1,27 @@
+// RUN: %clang -fsanitize=implicit-integer-truncation -target x86_64-linux -S -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang -fsanitize=implicit-integer-sign-change -target x86_64-linux -S -emit-llvm -o - %s | FileCheck %s
+typedef struct _xx {
+  int x1:3;
+  char x2:2;
+} xx, *pxx;
+
+xx vxx;
+
+// CHECK-LABEL: define{{.*}} void @foo1
+void foo1(int x) {
+  vxx.x1 = x;
+  // CHECK: call void @__ubsan_handle_implicit_conversion
+}
+
+// CHECK: declare void @__ubsan_handle_implicit_conversion
+
+// CHECK-LABEL: define{{.*}} void @foo2
+void foo2(int x) {
+  vxx.x2 = x;
+  // CHECK: call void @__ubsan_handle_implicit_conversion
+  // TODO: Ideally we should only emit once (emit is generated
+  //       when evaluating RHS integer->char and when storing
+  //       value in bitfield)
+  // CHECK: call void @__ubsan_handle_implicit_conversion
+}
+
diff --git a/compiler-rt/lib/ubsan/ubsan_diag.h b/compiler-rt/lib/ubsan/ubsan_diag.h
index b444e971b22838..1e60651e272a70 100644
--- a/compiler-rt/lib/ubsan/ubsan_diag.h
+++ b/compiler-rt/lib/ubsan/ubsan_diag.h
@@ -177,7 +177,7 @@ class Diag {
   };
 
 private:
-  static const unsigned MaxArgs = 8;
+  static const unsigned MaxArgs = 9;
   static const unsigned MaxRanges = 1;
 
   /// The arguments which have been added to this diagnostic so far.
diff --git a/compiler-rt/lib/ubsan/ubsan_handlers.cpp b/compiler-rt/lib/ubsan/ubsan_handlers.cpp
index 0f16507d5d88f1..1232f60836b090 100644
--- a/compiler-rt/lib/ubsan/ubsan_handlers.cpp
+++ b/compiler-rt/lib/ubsan/ubsan_handlers.cpp
@@ -594,14 +594,16 @@ static void handleImplicitConversion(ImplicitConversionData *Data,
 
   ScopedReport R(Opts, Loc, ET);
 
-  // FIXME: is it possible to dump the values as hex with fixed width?
+  unsigned DstBits =
+      Data->BitfieldBits ? Data->BitfieldBits : DstTy.getIntegerBitWidth();
 
+  // FIXME: is it possible to dump the values as hex with fixed width?
   Diag(Loc, DL_Error, ET,
        "implicit conversion from type %0 of value %1 (%2-bit, %3signed) to "
-       "type %4 changed the value to %5 (%6-bit, %7signed)")
+       "type %4 changed the value to %5 (%6-bit%7, %8signed)")
       << SrcTy << Value(SrcTy, Src) << SrcTy.getIntegerBitWidth()
-      << (SrcSigned ? "" : "un") << DstTy << Value(DstTy, Dst)
-      << DstTy.getIntegerBitWidth() << (DstSigned ? "" : "un");
+      << (SrcSigned ? "" : "un") << DstTy << Value(DstTy, Dst) << DstBits
+      << (Data->BitfieldBits ? " bitfield" : "") << (DstSigned ? "" : "un");
 }
 
 void __ubsan::__ubsan_handle_implicit_conversion(ImplicitConversionData *Data,
diff --git a/compiler-rt/lib...
[truncated]

@Zonotora
Copy link
Contributor Author

Hi, @vitalybuka @zygoloid @AaronBallman you might have something to say about this commit

@vitalybuka
Copy link
Collaborator

Is is UB?

@@ -1094,6 +1114,27 @@ void ScalarExprEmitter::EmitIntegerTruncationCheck(Value *Src, QualType SrcType,
{Src, Dst});
}

static llvm::Value *EmitIsNegativeTestHelper(Value *V, QualType VType,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please extract lambda to function into a separate patch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still see it here, does it needs rebase, or that patch needs to be landed?

@vitalybuka
Copy link
Collaborator

@LebedevRI who added implicit-integer-truncation

@vitalybuka
Copy link
Collaborator

Is is UB?

I guess it follows existing convention implicit-integer-truncation is convenience check and implicit-signed-integer-truncation is UB

@Zonotora
Copy link
Contributor Author

Is is UB?

Don't think so? To quote [1]:

Issues caught by these sanitizers are not undefined behavior, but are often unintentional.

where "these sanitizers" refer to implicit-unsigned-integer-truncation, implicit-signed-integer-truncation and implicit-integer-sign-change.

[1] https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html

@Zonotora
Copy link
Contributor Author

Zonotora commented Jan 2, 2024

Feedback @vitalybuka @AaronBallman @LebedevRI? 😃

@efriedma-quic
Copy link
Collaborator

Please add test coverage for compound assignment and increment/decrement.

This seems like a reasonable extension of the existing integer truncation checks, but we might want to consider giving it a unique name. Otherwise, people using integer truncation checks will have trouble upgrading their compiler.

@Zonotora
Copy link
Contributor Author

Zonotora commented Jan 2, 2024

Please add test coverage for compound assignment and increment/decrement.

This seems like a reasonable extension of the existing integer truncation checks, but we might want to consider giving it a unique name. Otherwise, people using integer truncation checks will have trouble upgrading their compiler.

Sure 👍 Fair enough, what name do you suggest? Replace "integer" with "bitfield" as in implicit-unsigned-bitfield-truncation, implicit-signed-bitfield-truncation and implicit-bitfield-sign-change? Or just have a single flag implicit-bitfield-conversion for now?

@llvmbot llvmbot added clang:frontend Language frontend issues, e.g. anything involving "Sema" compiler-rt:ubsan Undefined behavior sanitizer labels Feb 7, 2024
@Zonotora
Copy link
Contributor Author

Zonotora commented Feb 7, 2024

Hi again, I have now finally gotten time and updated the patch so that the unnecessary emits I mentioned in the initial commit are avoided. The current patch introduces a number of new fsanitizer flags to separate integer conversions from bitfield conversions. E.g.,

  • -fsanitize=implicit-unsigned-bitfield-truncation
  • -fsanitize=implicit-signed-bitfield-truncation
  • -fsanitize=implicit-bitfield-sign-change
  • -fsanitize=implicit-bitfield-truncation
  • -fsanitize=implicit-bitfield-arithmetic-value-change
  • -fsanitize=implicit-bitfield-conversion
  • -fsanitize=implicit-integer-conversion <---- This used to be -fsanitize=implicit-conversion

-fsanitize=implicit-conversion will now represent -fsanitize=implicit-integer-conversion and -fsanitize=implicit-bitfield-conversion.

Previously the following:

typedef struct {
    unsigned char a:4;
} X;

int main(void) {
    X x;
    unsigned int a = 272;
    x.a = a;
    return 0;
}

emitted an implicit integer conversion error in the assignment of x.a = a with the -fsanitize=implicit-integer-conversion. This is no longer the case as the assignment involves bitfields. To get the emission error, one would have to include the -fsanitize=implicit-bitfield-conversion flag instead.

I have compiled clang with the -fsanitizer flag -fsanitize=implicit-bitfield-conversion without any problems. What are your thoughts on this new change? @vitalybuka @AaronBallman @LebedevRI @efriedma-quic

@vabridgers
Copy link
Contributor

vabridgers commented Feb 9, 2024

LGTM, but someone else must approve. Only other thing I might add is to try enabling these checks for some open source test cases. Maybe look through some past reviews to see what others have done, or wait for one of the maintainers/owners to comment on which ones might be best.

Thanks for doing this work, Axel!

@AaronBallman
Copy link
Collaborator

Precommit CI seems to have found some related failures that should be investigated. I'd love to hear opinions from the codegen code owners on this, as well as @zygoloid as UBSan code owner. I'm not opposed to the changes, though I do find it somewhat strange to add them to UBSan given that they're not undefined behavior (even though there's precedent).

@Zonotora Zonotora force-pushed the main branch 2 times, most recently from 537ee15 to c70b72b Compare March 12, 2024 21:39
@@ -147,6 +147,7 @@ struct ImplicitConversionData {
const TypeDescriptor &FromType;
const TypeDescriptor &ToType;
/* ImplicitConversionCheckKind */ unsigned char Kind;
unsigned int BitfieldBits;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have an ABI stability policy for UBsan? Is the change allowed under it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping @zygoloid -- I'm not certain if we have an ABI stability policy for sanitizers or not.

@Zonotora
Copy link
Contributor Author

@LebedevRI LGTM?

This patch implements the implicit truncation and implicit
sign change checks for bitfields.
Update testcases due to new -fsanitize options regarding bitfields
@Zonotora
Copy link
Contributor Author

Also, I don't have commit access (first PR here), someone has to commit on my behalf!

@Zonotora
Copy link
Contributor Author

Also, I don't have commit access (first PR here), someone has to commit on my behalf!

@AaronBallman

@AaronBallman
Copy link
Collaborator

Also, I don't have commit access (first PR here), someone has to commit on my behalf!

@AaronBallman

Thank you for mentioning this! I'll commit on your behalf, but I'd like to hear from @zygoloid on the ABI question. It would also be nice if @efriedma-quic signed off on the testing changes (I think they're sufficient, but he may have other cases in mind).

@zygoloid
Copy link
Collaborator

I don't think we've established an explicit policy, and we've made ABI-breaking changes previously. I think we should avoid breaks within a major release version (don't backport this) to avoid giving packagers headaches: the ubsan runtime is installed in a versioned directory, but I don't think that includes the patch version.

Having an ABI break here seems OK and in line with historic precedent. (But we ought to document our ABI policy for our runtime libraries.)

Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Zonotora
Copy link
Contributor Author

Zonotora commented Apr 3, 2024

@AaronBallman looks good?

@AaronBallman AaronBallman merged commit 450f195 into llvm:main Apr 3, 2024
4 checks passed
@vitalybuka
Copy link
Collaborator

This bot started inconsistently crash https://lab.llvm.org/buildbot/#/builders/75/builds/45086
Few builds before are green and tests only, so I suspect this patch.

@vitalybuka
Copy link
Collaborator

@AaronBallman
Copy link
Collaborator

Thank you for catching that and doing the revert!

@Zonotora
Copy link
Contributor Author

Zonotora commented Apr 3, 2024

Will look into this...

vitalybuka added a commit that referenced this pull request Apr 3, 2024
vitalybuka added a commit that referenced this pull request Apr 3, 2024
@vitalybuka
Copy link
Collaborator

vitalybuka commented Apr 3, 2024

Probably 1189e87 fixed the issue.
I will revert my revert if https://lab.llvm.org/buildbot/#/builders/269/builds/7066 is green.

@vitalybuka
Copy link
Collaborator

Still had to revert with #87562

vitalybuka added a commit that referenced this pull request Apr 8, 2024
)

Fix since #75481 got reverted.

- Explicitly set BitfieldBits to 0 to avoid uninitialized field member
for the integer checks:
```diff
-       llvm::ConstantInt::get(Builder.getInt8Ty(), Check.first)};
+      llvm::ConstantInt::get(Builder.getInt8Ty(), Check.first),
+      llvm::ConstantInt::get(Builder.getInt32Ty(), 0)};
```
- `Value **Previous` was erroneously `Value *Previous` in
`CodeGenFunction::EmitWithOriginalRHSBitfieldAssignment`, fixed now.
- Update following:
```diff
-     if (Kind == CK_IntegralCast) {
+     if (Kind == CK_IntegralCast || Kind == CK_LValueToRValue) {
```
CK_LValueToRValue when going from, e.g., char to char, and
CK_IntegralCast otherwise.
- Make sure that `Value *Previous = nullptr;` is initialized (see
1189e87)
- Add another extensive testcase
`ubsan/TestCases/ImplicitConversion/bitfield-conversion.c`

---------

Co-authored-by: Vitaly Buka <vitalybuka@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category compiler-rt:sanitizer compiler-rt:ubsan Undefined behavior sanitizer compiler-rt
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants