Skip to content

[-Wunsafe-buffer-usage] Warning for unsafe invocation of span::data #75650

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 2, 2024

Conversation

malavikasamak
Copy link
Contributor

…-Wunsafe-buffer-usage,

there maybe accidental re-introduction of new OutOfBound accesses into the code bases. One such case is invoking span::data() method on a span variable to retrieve a pointer, which is then cast to a larger type and dereferenced. Such dereferences can introduce OutOfBound accesses.

To address this, a new WarningGadget is being introduced to warn against such invocations.

…-Wunsafe-buffer-usage,

there maybe accidental re-introduction of new OutOfBound accesses into the code bases. One
such case is invoking span::data() method on a span variable to retrieve a pointer, which is
then cast to a larger type and dereferenced. Such dereferences can introduce OutOfBound accesses.

To address this, a new WarningGadget is being introduced to warn against such invocations.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:analysis labels Dec 15, 2023
@llvmbot
Copy link
Member

llvmbot commented Dec 15, 2023

@llvm/pr-subscribers-clang-analysis

@llvm/pr-subscribers-clang

Author: Malavika Samak (malavikasamak)

Changes

…-Wunsafe-buffer-usage,

there maybe accidental re-introduction of new OutOfBound accesses into the code bases. One such case is invoking span::data() method on a span variable to retrieve a pointer, which is then cast to a larger type and dereferenced. Such dereferences can introduce OutOfBound accesses.

To address this, a new WarningGadget is being introduced to warn against such invocations.


Full diff: https://github.com/llvm/llvm-project/pull/75650.diff

6 Files Affected:

  • (modified) clang/include/clang/Analysis/Analyses/UnsafeBufferUsage.h (+1-1)
  • (modified) clang/include/clang/Analysis/Analyses/UnsafeBufferUsageGadgets.def (+1)
  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+1-1)
  • (modified) clang/lib/Analysis/UnsafeBufferUsage.cpp (+33-4)
  • (modified) clang/lib/Sema/AnalysisBasedWarnings.cpp (+17-2)
  • (added) clang/test/SemaCXX/warn-unsafe-buffer-usage-warning-data-invocation.cpp (+133)
diff --git a/clang/include/clang/Analysis/Analyses/UnsafeBufferUsage.h b/clang/include/clang/Analysis/Analyses/UnsafeBufferUsage.h
index 8a2d56668e32f9..b28f2c6b99c50e 100644
--- a/clang/include/clang/Analysis/Analyses/UnsafeBufferUsage.h
+++ b/clang/include/clang/Analysis/Analyses/UnsafeBufferUsage.h
@@ -66,7 +66,7 @@ class UnsafeBufferUsageHandler {
 
   /// Invoked when an unsafe operation over raw pointers is found.
   virtual void handleUnsafeOperation(const Stmt *Operation,
-                                     bool IsRelatedToDecl) = 0;
+                                     bool IsRelatedToDecl, ASTContext &Ctx) = 0;
 
   /// Invoked when a fix is suggested against a variable. This function groups
   /// all variables that must be fixed together (i.e their types must be changed
diff --git a/clang/include/clang/Analysis/Analyses/UnsafeBufferUsageGadgets.def b/clang/include/clang/Analysis/Analyses/UnsafeBufferUsageGadgets.def
index 757ee452ced748..c9766168836510 100644
--- a/clang/include/clang/Analysis/Analyses/UnsafeBufferUsageGadgets.def
+++ b/clang/include/clang/Analysis/Analyses/UnsafeBufferUsageGadgets.def
@@ -30,6 +30,7 @@ WARNING_GADGET(Decrement)
 WARNING_GADGET(ArraySubscript)
 WARNING_GADGET(PointerArithmetic)
 WARNING_GADGET(UnsafeBufferUsageAttr)
+WARNING_GADGET(DataInvocation)
 FIXABLE_GADGET(ULCArraySubscript)          // `DRE[any]` in an Unspecified Lvalue Context
 FIXABLE_GADGET(DerefSimplePtrArithFixable)
 FIXABLE_GADGET(PointerDereference)
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 94e97a891baedc..9038dd879f54ae 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -12058,7 +12058,7 @@ def warn_unsafe_buffer_variable : Warning<
   InGroup<UnsafeBufferUsage>, DefaultIgnore;
 def warn_unsafe_buffer_operation : Warning<
   "%select{unsafe pointer operation|unsafe pointer arithmetic|"
-  "unsafe buffer access|function introduces unsafe buffer manipulation}0">,
+  "unsafe buffer access|function introduces unsafe buffer manipulation|unsafe invocation of span::data}0">,
   InGroup<UnsafeBufferUsage>, DefaultIgnore;
 def note_unsafe_buffer_operation : Note<
   "used%select{| in pointer arithmetic| in buffer access}0 here">;
diff --git a/clang/lib/Analysis/UnsafeBufferUsage.cpp b/clang/lib/Analysis/UnsafeBufferUsage.cpp
index 70eec1cee57f8e..e4eca9939b10d5 100644
--- a/clang/lib/Analysis/UnsafeBufferUsage.cpp
+++ b/clang/lib/Analysis/UnsafeBufferUsage.cpp
@@ -721,6 +721,33 @@ class UnsafeBufferUsageAttrGadget : public WarningGadget {
   DeclUseList getClaimedVarUseSites() const override { return {}; }
 };
 
+// Warning gadget for unsafe invocation of span::data method.
+// Triggers when the pointer returned by the invocation is immediately
+// cast to a larger type.
+
+class DataInvocationGadget : public WarningGadget {
+  constexpr static const char *const OpTag = "data_invocation_expr";
+  const ExplicitCastExpr *Op;
+
+  public:
+  DataInvocationGadget(const MatchFinder::MatchResult &Result)
+      : WarningGadget(Kind::DataInvocation),
+        Op(Result.Nodes.getNodeAs<ExplicitCastExpr>(OpTag)) {}
+
+  static bool classof(const Gadget *G) {
+    return G->getKind() == Kind::DataInvocation;
+  }
+ 
+  static Matcher matcher() {
+    return stmt(
+        explicitCastExpr(has(cxxMemberCallExpr(callee(
+               cxxMethodDecl(hasName("data")))))).bind(OpTag));
+  }   
+  const Stmt *getBaseStmt() const override { return Op; }
+
+  DeclUseList getClaimedVarUseSites() const override { return {}; }
+};
+
 // Represents expressions of the form `DRE[*]` in the Unspecified Lvalue
 // Context (see `isInUnspecifiedLvalueContext`).
 // Note here `[]` is the built-in subscript operator.
@@ -2657,8 +2684,8 @@ void clang::checkUnsafeBufferUsage(const Decl *D,
     // every problematic operation and consider it done. No need to deal
     // with fixable gadgets, no need to group operations by variable.
     for (const auto &G : WarningGadgets) {
-      Handler.handleUnsafeOperation(G->getBaseStmt(),
-                                    /*IsRelatedToDecl=*/false);
+      Handler.handleUnsafeOperation(G->getBaseStmt(), /*IsRelatedToDecl=*/false, 
+                                    D->getASTContext());
     }
 
     // This return guarantees that most of the machine doesn't run when
@@ -2893,7 +2920,8 @@ void clang::checkUnsafeBufferUsage(const Decl *D,
                   Tracker, Handler, VarGrpMgr);
 
   for (const auto &G : UnsafeOps.noVar) {
-    Handler.handleUnsafeOperation(G->getBaseStmt(), /*IsRelatedToDecl=*/false);
+    Handler.handleUnsafeOperation(G->getBaseStmt(), /*IsRelatedToDecl=*/false
+                                  , D->getASTContext());
   }
 
   for (const auto &[VD, WarningGadgets] : UnsafeOps.byVar) {
@@ -2904,7 +2932,8 @@ void clang::checkUnsafeBufferUsage(const Decl *D,
                                       : FixItList{},
                                       D);
     for (const auto &G : WarningGadgets) {
-      Handler.handleUnsafeOperation(G->getBaseStmt(), /*IsRelatedToDecl=*/true);
+      Handler.handleUnsafeOperation(G->getBaseStmt(), /*IsRelatedToDecl=*/true
+                                    , D->getASTContext());
     }
   }
 }
diff --git a/clang/lib/Sema/AnalysisBasedWarnings.cpp b/clang/lib/Sema/AnalysisBasedWarnings.cpp
index 0947e8b0f526a3..4f8e181806957e 100644
--- a/clang/lib/Sema/AnalysisBasedWarnings.cpp
+++ b/clang/lib/Sema/AnalysisBasedWarnings.cpp
@@ -2226,8 +2226,8 @@ class UnsafeBufferUsageReporter : public UnsafeBufferUsageHandler {
   UnsafeBufferUsageReporter(Sema &S, bool SuggestSuggestions)
     : S(S), SuggestSuggestions(SuggestSuggestions) {}
 
-  void handleUnsafeOperation(const Stmt *Operation,
-                             bool IsRelatedToDecl) override {
+  void handleUnsafeOperation(const Stmt *Operation, bool IsRelatedToDecl, 
+                             ASTContext &Ctx) override {
     SourceLocation Loc;
     SourceRange Range;
     unsigned MsgParam = 0;
@@ -2261,6 +2261,21 @@ class UnsafeBufferUsageReporter : public UnsafeBufferUsageHandler {
         // note_unsafe_buffer_operation doesn't have this mode yet.
         assert(!IsRelatedToDecl && "Not implemented yet!");
         MsgParam = 3;
+      } else if (const auto *ECE = dyn_cast<ExplicitCastExpr>(Operation)) {
+        QualType destType = ECE->getType();
+        const uint64_t dSize = Ctx.getTypeSize(destType.getTypePtr()->getPointeeType());
+        if(const auto *CE =dyn_cast<CXXMemberCallExpr>(ECE->getSubExpr())) {
+
+          if(CE->getRecordDecl()->getQualifiedNameAsString().compare("std::span"))
+           return;
+         
+          QualType srcType = CE->getType();
+          const uint64_t sSize = Ctx.getTypeSize(srcType.getTypePtr()->getPointeeType());
+          if(sSize >= dSize)
+            return;
+        }
+        MsgParam = 4;
+ 
       }
       Loc = Operation->getBeginLoc();
       Range = Operation->getSourceRange();
diff --git a/clang/test/SemaCXX/warn-unsafe-buffer-usage-warning-data-invocation.cpp b/clang/test/SemaCXX/warn-unsafe-buffer-usage-warning-data-invocation.cpp
new file mode 100644
index 00000000000000..5cce91ae4753de
--- /dev/null
+++ b/clang/test/SemaCXX/warn-unsafe-buffer-usage-warning-data-invocation.cpp
@@ -0,0 +1,133 @@
+// RUN: %clang_cc1 -std=c++20  -Wno-all -Wunsafe-buffer-usage \
+// RUN:            -fsafe-buffer-usage-suggestions \
+// RUN:            -fblocks -include %s -verify %s
+
+// RUN: %clang -x c++ -frtti -fsyntax-only -fblocks -include %s %s 2>&1 | FileCheck --allow-empty %s
+// RUN: %clang_cc1 -std=c++11 -fblocks -include %s %s 2>&1 | FileCheck --allow-empty %s
+// RUN: %clang_cc1 -std=c++20 -fblocks -include %s %s 2>&1 | FileCheck --allow-empty %s
+// CHECK-NOT: [-Wunsafe-buffer-usage]
+
+#ifndef INCLUDED
+#define INCLUDED
+#pragma clang system_header
+
+// no spanification warnings for system headers
+void foo(...);  // let arguments of `foo` to hold testing expressions
+#else
+
+namespace std {
+  class type_info;
+  class bad_cast;
+  class bad_typeid;
+}
+using size_t = __typeof(sizeof(int));
+void *malloc(size_t);
+
+void foo(int v) {
+}
+
+void foo(int *p){}
+
+namespace std{
+  template <typename T> class span {
+
+  T *elements;
+ 
+  span(T *, unsigned){}
+
+  public:
+
+  constexpr span<T> subspan(size_t offset, size_t count) const {
+    return span<T> (elements+offset, count); // expected-warning{{unsafe pointer arithmetic}}
+  }
+
+  constexpr T* data() const noexcept {
+    return elements;
+  }
+
+ 
+  constexpr T* hello() const noexcept {
+   return elements;
+  }
+};
+ 
+ template <typename T> class span_duplicate {
+  span_duplicate(T *, unsigned){}
+
+  T array[10];
+
+  public:
+
+  T* data() {
+    return array;
+  }
+
+};
+}
+
+class span {
+
+ int array[10];
+ 
+ public:
+
+ int *data() {
+   return array;
+ }
+};
+
+class A {
+  int a, b, c;
+};
+
+struct Base {
+   virtual ~Base() = default;
+};
+
+struct Derived: Base {
+  int d;
+};
+
+void cast_without_data(int *ptr) {
+ A *a = (A*) ptr;
+ float *p = (float*) ptr;
+}
+
+void warned_patterns(std::span<int> span_ptr, std::span<Base> base_span) {
+    int *p;
+    A *a = (A*)span_ptr.data(); // expected-warning{{unsafe invocation of span::data}}
+    a = (A*)span_ptr.data(); // expected-warning{{unsafe invocation of span::data}}
+   
+    // TODO:: Should we warn when we cast from base to derived type?
+    Derived *b = dynamic_cast<Derived*> (base_span.data());// expected-warning{{unsafe invocation of span::data}}
+
+   // TODO:: This pattern is safe. We can add special handling for it, if we decide this
+   // is the recommended fixit for the unsafe invocations.
+   a = (A*)span_ptr.subspan(0, sizeof(A)).data(); // expected-warning{{unsafe invocation of span::data}}
+}
+
+void not_warned_patterns(std::span<A> span_ptr, std::span<Base> base_span) {
+    int *p = (int*)span_ptr.data();
+    p = (int*)span_ptr.data();
+    A *a = (A*) span_ptr.hello();
+}
+
+// We do not want to warn about other types
+void other_classes(std::span_duplicate<int> span_ptr, span sp) {
+    int *p;
+    A *a = (A*)span_ptr.data();
+    a = (A*)span_ptr.data(); 
+
+    a = (A*)sp.data();
+}
+
+// Potential source for false negatives
+
+void false_negatives(std::span<int> span_pt) {
+  int *ptr = span_pt.data();
+  A *a = (A*)ptr; //TODO: We want to warn here eventually.
+
+  int k = *ptr; // TODO: Can cause OOB if span_pt is empty
+
+}
+#endif

Copy link

github-actions bot commented Dec 15, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@malavikasamak malavikasamak changed the title While refactoring projects to eradicate unsafe buffer accesses using … Warning for unsafe invocation of span::data Dec 15, 2023
static Matcher matcher() {
return stmt(
explicitCastExpr(has(cxxMemberCallExpr(callee(
cxxMethodDecl(hasName("data")))))).bind(OpTag));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this also match on user defined functions called "data"? I think something like cxxMethodDecl(ofClass(hasName("span"))) might be needed to match span.data() alone (but I might be wrong). In any case, maybe have a test case with a user defined function called "data" which'll show the matcher matches only on span.data().

Copy link
Contributor Author

@malavikasamak malavikasamak Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It matches all method invocations with name "data". However, we wouldn't be warning on all of them as the "handleUnsafeOperation" checks if the method belongs to std::span class. It maybe better to move it to the matcher as you point out, so there is less overlap between gadgets. The test case "other_classes" is meant to check this.

// cast to a larger type.

class DataInvocationGadget : public WarningGadget {
constexpr static const char *const OpTag = "data_invocation_expr";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a nitpick: I noticed most tag strings use camel cases. We better be consistent on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Will fix this.

const uint64_t dSize = Ctx.getTypeSize(destType.getTypePtr()->getPointeeType());
if(const auto *CE =dyn_cast<CXXMemberCallExpr>(ECE->getSubExpr())) {

if(CE->getRecordDecl()->getQualifiedNameAsString().compare("std::span"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a tiny bit skeptical that if the qualified name is always std::span.
Maybe we can have a test like this:

using namespace std;
void f(span<int> buf) {
   BigTy *p = (BigTy *) buf.data();
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. I think it shouldn't be an issue as we are getting the qualified name. However, I will add a test case to check this.

Copy link
Contributor

@t-rasmud t-rasmud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@ziqingluo-90 ziqingluo-90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@malavikasamak malavikasamak changed the title Warning for unsafe invocation of span::data [-Wunsafe-buffer-usage] Warning for unsafe invocation of span::data Jan 2, 2024
@malavikasamak malavikasamak merged commit 7122f55 into llvm:main Jan 2, 2024
malavikasamak added a commit to swiftlang/llvm-project that referenced this pull request Jan 12, 2024
…lvm#75650)

…-Wunsafe-buffer-usage,

there maybe accidental re-introduction of new OutOfBound accesses into
the code bases. One such case is invoking span::data() method on a span
variable to retrieve a pointer, which is then cast to a larger type and
dereferenced. Such dereferences can introduce OutOfBound accesses.

To address this, a new WarningGadget is being introduced to warn against
such invocations.

---------

Co-authored-by: MalavikaSamak <malavika2@apple.com>
(cherry picked from commit 7122f55)
@malavikasamak malavikasamak deleted the data-invocation-warning branch July 22, 2024 20:19
@malavikasamak malavikasamak restored the data-invocation-warning branch July 22, 2024 20:19
@malavikasamak malavikasamak deleted the data-invocation-warning branch July 22, 2024 20:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:analysis clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants