Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[analyzer] Mention possibility of underflow in array overflow errors #84201

Merged
merged 5 commits into from
Mar 19, 2024

Conversation

NagyDonat
Copy link
Contributor

The checker alpha.security.ArrayBoundV2 performs bounds checking in two steps: first it checks for underflow, and if it isn't guaranteed then it assumes that there is no underflow. After this, it checks for overflow, and if that's guaranteed or the index is tainted then it reports it.

This meant that in situations where overflow and underflow are both possible (but the index is either tainted or guaranteed to be invalid), the checker was reporting just an overflow error.

This commit modifies the messages printed in these cases to mention the possibility of an underflow.

The checker alpha.security.ArrayBoundV2 performs bounds checking in two
steps: first it checks for underflow, and if it isn't guaranteed then it
assumes that there is no underflow. After this, it checks for overflow,
and if it's guaranteed or the index is tainted then it reports it.

This meant that in situations where overflow and underflow are both
possible (but the index is either tainted or guaranteed to be invalid),
the checker was reporting an overflow error.

This commit modifies the messages printed in these cases to mention the
possibility of an underflow.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer labels Mar 6, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Mar 6, 2024

@llvm/pr-subscribers-clang-static-analyzer-1

@llvm/pr-subscribers-clang

Author: None (NagyDonat)

Changes

The checker alpha.security.ArrayBoundV2 performs bounds checking in two steps: first it checks for underflow, and if it isn't guaranteed then it assumes that there is no underflow. After this, it checks for overflow, and if that's guaranteed or the index is tainted then it reports it.

This meant that in situations where overflow and underflow are both possible (but the index is either tainted or guaranteed to be invalid), the checker was reporting just an overflow error.

This commit modifies the messages printed in these cases to mention the possibility of an underflow.


Full diff: https://github.com/llvm/llvm-project/pull/84201.diff

3 Files Affected:

  • (modified) clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp (+24-13)
  • (modified) clang/test/Analysis/out-of-bounds-diagnostics.c (+55-1)
  • (modified) clang/test/Analysis/taint-diagnostic-visitor.c (+1-1)
diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index 29eb932584027d..664d26b00a3cab 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -83,6 +83,8 @@ class StateUpdateReporter {
     AssumedUpperBound = UpperBoundVal;
   }
 
+  bool assumedNonNegative() { return AssumedNonNegative; }
+
   const NoteTag *createNoteTag(CheckerContext &C) const;
 
 private:
@@ -402,7 +404,8 @@ static bool tryDividePair(std::optional<int64_t> &Val1,
 }
 
 static Messages getExceedsMsgs(ASTContext &ACtx, const SubRegion *Region,
-                               NonLoc Offset, NonLoc Extent, SVal Location) {
+                               NonLoc Offset, NonLoc Extent, SVal Location,
+                               bool AssumedNonNegative) {
   std::string RegName = getRegionName(Region);
   const auto *EReg = Location.getAsRegion()->getAs<ElementRegion>();
   assert(EReg && "this checker only handles element access");
@@ -414,6 +417,7 @@ static Messages getExceedsMsgs(ASTContext &ACtx, const SubRegion *Region,
   int64_t ElemSize = ACtx.getTypeSizeInChars(ElemType).getQuantity();
 
   bool UseByteOffsets = !tryDividePair(OffsetN, ExtentN, ElemSize);
+  const char *OffsetOrIndex = UseByteOffsets ? "byte offset" : "index";
 
   SmallString<256> Buf;
   llvm::raw_svector_ostream Out(Buf);
@@ -421,10 +425,12 @@ static Messages getExceedsMsgs(ASTContext &ACtx, const SubRegion *Region,
   if (!ExtentN && !UseByteOffsets)
     Out << "'" << ElemType.getAsString() << "' element in ";
   Out << RegName << " at ";
-  if (OffsetN) {
-    Out << (UseByteOffsets ? "byte offset " : "index ") << *OffsetN;
+  if (AssumedNonNegative) {
+    Out << "a negative or overflowing " << OffsetOrIndex;
+  } else if (OffsetN) {
+    Out << OffsetOrIndex << " " << *OffsetN;
   } else {
-    Out << "an overflowing " << (UseByteOffsets ? "byte offset" : "index");
+    Out << "an overflowing " << OffsetOrIndex;
   }
   if (ExtentN) {
     Out << ", while it holds only ";
@@ -441,17 +447,19 @@ static Messages getExceedsMsgs(ASTContext &ACtx, const SubRegion *Region,
       Out << "s";
   }
 
-  return {
-      formatv("Out of bound access to memory after the end of {0}", RegName),
-      std::string(Buf)};
+  return {formatv("Out of bound access to memory {0} {1}",
+                  AssumedNonNegative ? "around" : "after the end of", RegName),
+          std::string(Buf)};
 }
 
-static Messages getTaintMsgs(const SubRegion *Region, const char *OffsetName) {
+static Messages getTaintMsgs(const SubRegion *Region, const char *OffsetName,
+                             bool AssumedNonNegative) {
   std::string RegName = getRegionName(Region);
   return {formatv("Potential out of bound access to {0} with tainted {1}",
                   RegName, OffsetName),
-          formatv("Access of {0} with a tainted {1} that may be too large",
-                  RegName, OffsetName)};
+          formatv("Access of {0} with a tainted {1} that may be {2}too large",
+                  RegName, OffsetName,
+                  AssumedNonNegative ? "negative or " : "")};
 }
 
 const NoteTag *StateUpdateReporter::createNoteTag(CheckerContext &C) const {
@@ -603,6 +611,8 @@ void ArrayBoundCheckerV2::performCheck(const Expr *E, CheckerContext &C) const {
     auto [WithinUpperBound, ExceedsUpperBound] =
         compareValueToThreshold(State, ByteOffset, *KnownSize, SVB);
 
+    bool AssumedNonNegative = SUR.assumedNonNegative();
+
     if (ExceedsUpperBound) {
       // The offset may be invalid (>= Size)...
       if (!WithinUpperBound) {
@@ -615,8 +625,9 @@ void ArrayBoundCheckerV2::performCheck(const Expr *E, CheckerContext &C) const {
           return;
         }
 
-        Messages Msgs = getExceedsMsgs(C.getASTContext(), Reg, ByteOffset,
-                                       *KnownSize, Location);
+        Messages Msgs =
+            getExceedsMsgs(C.getASTContext(), Reg, ByteOffset, *KnownSize,
+                           Location, AssumedNonNegative);
         reportOOB(C, ExceedsUpperBound, Msgs, ByteOffset, KnownSize);
         return;
       }
@@ -632,7 +643,7 @@ void ArrayBoundCheckerV2::performCheck(const Expr *E, CheckerContext &C) const {
           if (isTainted(State, ASE->getIdx(), C.getLocationContext()))
             OffsetName = "index";
 
-        Messages Msgs = getTaintMsgs(Reg, OffsetName);
+        Messages Msgs = getTaintMsgs(Reg, OffsetName, AssumedNonNegative);
         reportOOB(C, ExceedsUpperBound, Msgs, ByteOffset, KnownSize,
                   /*IsTaintBug=*/true);
         return;
diff --git a/clang/test/Analysis/out-of-bounds-diagnostics.c b/clang/test/Analysis/out-of-bounds-diagnostics.c
index 0c3c67c6a546ad..db1bebe10a2f58 100644
--- a/clang/test/Analysis/out-of-bounds-diagnostics.c
+++ b/clang/test/Analysis/out-of-bounds-diagnostics.c
@@ -24,6 +24,33 @@ void taintedIndex(void) {
   scanf("%d", &index);
   // expected-note@-1 {{Taint originated here}}
   // expected-note@-2 {{Taint propagated to the 2nd argument}}
+  TenElements[index] = 5;
+  // expected-warning@-1 {{Potential out of bound access to 'TenElements' with tainted index}}
+  // expected-note@-2 {{Access of 'TenElements' with a tainted index that may be negative or too large}}
+}
+
+void taintedIndexNonneg(void) {
+  int index;
+  scanf("%d", &index);
+  // expected-note@-1 {{Taint originated here}}
+  // expected-note@-2 {{Taint propagated to the 2nd argument}}
+
+  // expected-note@+2 {{Assuming 'index' is >= 0}}
+  // expected-note@+1 {{Taking false branch}}
+  if (index < 0)
+    return;
+
+  TenElements[index] = 5;
+  // expected-warning@-1 {{Potential out of bound access to 'TenElements' with tainted index}}
+  // expected-note@-2 {{Access of 'TenElements' with a tainted index that may be too large}}
+}
+
+void taintedIndexUnsigned(void) {
+  unsigned index;
+  scanf("%u", &index);
+  // expected-note@-1 {{Taint originated here}}
+  // expected-note@-2 {{Taint propagated to the 2nd argument}}
+
   TenElements[index] = 5;
   // expected-warning@-1 {{Potential out of bound access to 'TenElements' with tainted index}}
   // expected-note@-2 {{Access of 'TenElements' with a tainted index that may be too large}}
@@ -59,7 +86,7 @@ void taintedOffset(void) {
   int *p = TenElements + index;
   p[0] = 5;
   // expected-warning@-1 {{Potential out of bound access to 'TenElements' with tainted offset}}
-  // expected-note@-2 {{Access of 'TenElements' with a tainted offset that may be too large}}
+  // expected-note@-2 {{Access of 'TenElements' with a tainted offset that may be negative or too large}}
 }
 
 void arrayOverflow(void) {
@@ -109,6 +136,33 @@ int *potentialAfterTheEndPtr(int idx) {
   // &TenElements[idx].
 }
 
+int overflowOrUnderflow(int arg) {
+  // expected-note@+2 {{Assuming 'arg' is < 0}}
+  // expected-note@+1 {{Taking false branch}}
+  if (arg >= 0)
+    return 0;
+
+  return TenElements[arg - 1];
+  // expected-warning@-1 {{Out of bound access to memory around 'TenElements'}}
+  // expected-note@-2 {{Access of 'TenElements' at a negative or overflowing index, while it holds only 10 'int' elements}}
+}
+
+char TwoElements[2] = {11, 22};
+char overflowOrUnderflowConcrete(int arg) {
+  // expected-note@+6 {{Assuming 'arg' is < 3}}
+  // expected-note@+5 {{Left side of '||' is false}}
+  // expected-note@+4 {{Assuming 'arg' is not equal to 0}}
+  // expected-note@+3 {{Left side of '||' is false}}
+  // expected-note@+2 {{Assuming 'arg' is not equal to 1}}
+  // expected-note@+1 {{Taking false branch}}
+  if (arg >= 3 || arg == 0 || arg == 1)
+    return 0;
+
+  return TwoElements[arg];
+  // expected-warning@-1 {{Out of bound access to memory around 'TwoElements'}}
+  // expected-note@-2 {{Access of 'TwoElements' at a negative or overflowing index, while it holds only 2 'char' elements}}
+}
+
 int scalar;
 int scalarOverflow(void) {
   return (&scalar)[1];
diff --git a/clang/test/Analysis/taint-diagnostic-visitor.c b/clang/test/Analysis/taint-diagnostic-visitor.c
index 020e9579ac535c..2ba7d9938fc3d8 100644
--- a/clang/test/Analysis/taint-diagnostic-visitor.c
+++ b/clang/test/Analysis/taint-diagnostic-visitor.c
@@ -30,7 +30,7 @@ int taintDiagnosticOutOfBound(void) {
   scanf("%d", &index); // expected-note {{Taint originated here}}
                        // expected-note@-1 {{Taint propagated to the 2nd argument}}
   return Array[index]; // expected-warning {{Potential out of bound access to 'Array' with tainted index}}
-                       // expected-note@-1 {{Access of 'Array' with a tainted index that may be too large}}
+                       // expected-note@-1 {{Access of 'Array' with a tainted index}}
 }
 
 int taintDiagnosticDivZero(int operand) {

@@ -603,6 +611,8 @@ void ArrayBoundCheckerV2::performCheck(const Expr *E, CheckerContext &C) const {
auto [WithinUpperBound, ExceedsUpperBound] =
compareValueToThreshold(State, ByteOffset, *KnownSize, SVB);

bool AssumedNonNegative = SUR.assumedNonNegative();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name AssumedNonNegative looks misleading (in variable and function arguments too). This has meaning like CanBeNegative which is a better name for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point, but CanBeNegative is also somewhat incorrect here because at this point (in the most recent State) the value of the symbol cannot be negative. I think I'll use something like AlsoMentionUnderflow with a comment that explains its meaning.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first observation was at the text output generation when at the if statement AssumedNonNegative is true and a message is printed "can be negative or ...", this looks like a bug. Because the same name is used at other places this may clarify the meaning, but a comment would be useful anyway (even if the current name remains).
I would not use a variable, instead call SUR.assumedNonNegative() directly, then it is better visible that the value comes from StateUpdateReporter.

@balazske
Copy link
Collaborator

balazske commented Mar 7, 2024

The change looks correct, but it would be more accurate if 3 different index error cases would be possible, index is too small (negative), too large, or can be both too small and too large.

@NagyDonat
Copy link
Contributor Author

NagyDonat commented Mar 7, 2024

The change looks correct, but it would be more accurate if 3 different index error cases would be possible, index is too small (negative), too large, or can be both too small and too large.

From the point of view of the user this is already true, with this commit there will be three separate kinds of warning messages (+one taint-based):

  • Out of bound access to memory preceding <region>
  • Out of bound access to memory after the end of <region>
  • Out of bound access to memory around <region>
  • Potential out of bound access to <region> with a tainted offset

In the source code the "after the end of" and the "around" messages are generated by the same function (because the corresponding note has lots of "moving parts" and there is a large overlap between the necessary logic), but that's just an implementation detail.

@NagyDonat NagyDonat requested a review from balazske March 8, 2024 09:52
Copy link
Collaborator

@balazske balazske left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not totally sure but this looks correct with the new variable names and there are some tests for the new case.

@@ -83,6 +83,8 @@ class StateUpdateReporter {
AssumedUpperBound = UpperBoundVal;
}

bool assumedNonNegative() { return AssumedNonNegative; }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be called getAssumedNonNegative or hasAssumedNonNegative (but the naming rule looks not very strict for this case)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to keep the current name because it's more natural if we're looking at this from the outside (the get/has variants are awkward because AssumedNonNegative is not a noun phrase); and it's just an implementation detail that this is a getter for a private data member.

Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
Copy link
Contributor

@steakhal steakhal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I scrolled over, and it looks harmless, so LGTM.

steakhal and others added 2 commits March 11, 2024 14:10
This reverts commit df6a67f, which was
accidentally pushed here instead of the branch corresponding to
llvm#84469
@NagyDonat NagyDonat merged commit 175ad66 into llvm:main Mar 19, 2024
4 checks passed
chencha3 pushed a commit to chencha3/llvm-project that referenced this pull request Mar 23, 2024
…lvm#84201)

The checker alpha.security.ArrayBoundV2 performs bounds checking in two
steps: first it checks for underflow, and if it isn't guaranteed then it
assumes that there is no underflow. After this, it checks for overflow,
and if that's guaranteed or the index is tainted then it reports it.

This meant that in situations where overflow and underflow are both
possible (but the index is either tainted or guaranteed to be invalid),
the checker was reporting just an overflow error.

This commit modifies the messages printed in these cases to mention the
possibility of an underflow.

---------

Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants