Skip to content

Conversation

ddkilzer
Copy link

Initial version of a bounds information checker to warn when the two-argument std::span constructor has a suspicious-looking size.

Initial version of a bounds information checker to warn when the
two-argument std::span constructor has a suspicious-looking size.
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer labels Oct 17, 2024
@llvmbot
Copy link
Member

llvmbot commented Oct 17, 2024

@llvm/pr-subscribers-clang

Author: David Kilzer (ddkilzer)

Changes

Initial version of a bounds information checker to warn when the two-argument std::span constructor has a suspicious-looking size.


Full diff: https://github.com/llvm/llvm-project/pull/112784.diff

5 Files Affected:

  • (modified) clang/include/clang/StaticAnalyzer/Checkers/Checkers.td (+5)
  • (added) clang/lib/StaticAnalyzer/Checkers/BoundsInformationChecker.cpp (+199)
  • (modified) clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt (+1)
  • (added) clang/test/Analysis/bounds-information-checker.cpp (+76)
  • (added) clang/test/Analysis/std-span-system-header.h (+78)
diff --git a/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td b/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
index 6e224a4e098ad2..49900f36f8fd10 100644
--- a/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
+++ b/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
@@ -775,6 +775,11 @@ def VirtualCallChecker : Checker<"VirtualCall">,
 
 let ParentPackage = CplusplusAlpha in {
 
+def BoundsInformationChecker : Checker<"BoundsInformation">,
+  HelpText<"Confirms that view objects such as std::span are constructed "
+           "within the bounds of the source buffer">,
+  Documentation<NotDocumented>;
+
 def ContainerModeling : Checker<"ContainerModeling">,
   HelpText<"Models C++ containers">,
   Documentation<NotDocumented>,
diff --git a/clang/lib/StaticAnalyzer/Checkers/BoundsInformationChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/BoundsInformationChecker.cpp
new file mode 100644
index 00000000000000..8588c068e50f37
--- /dev/null
+++ b/clang/lib/StaticAnalyzer/Checkers/BoundsInformationChecker.cpp
@@ -0,0 +1,199 @@
+//== BoundsInformationChecker.cpp - bounds information checker --*- C++ -*--==//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This defines BoundsInformationChecker, a path-sensitive checker that
+// checks that the buffer and count arguments are within the bounds of
+// the source buffer.
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/StaticAnalyzer/Checkers/BuiltinCheckerRegistration.h"
+#include "clang/StaticAnalyzer/Core/BugReporter/BugType.h"
+#include "clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h"
+#include "clang/StaticAnalyzer/Core/Checker.h"
+#include "clang/StaticAnalyzer/Core/CheckerManager.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h"
+
+using namespace clang;
+using namespace ento;
+
+namespace {
+class BoundsInformationChecker : public Checker<check::PreCall> {
+  const BugType BT_DifferentMemRegion{
+      this, "std::span constructor arguments from different sources",
+      categories::SecurityError};
+  const BugType BT_NonConstantSizeArg{
+      this,
+      "std::span constructor for std::array has non-constant size argument",
+      categories::SecurityError};
+  const BugType BT_OutOfBounds{
+      this,
+      "std::span constructor for std::array uses out-of-bounds size argument",
+      categories::SecurityError};
+  void reportBug(ExplodedNode *N, const Expr *E, CheckerContext &C,
+                 const BugType &BT, StringRef Msg) const;
+
+public:
+  void checkPreCall(const CallEvent &Call, CheckerContext &C) const;
+};
+} // end anonymous namespace
+
+void BoundsInformationChecker::reportBug(ExplodedNode *N, const Expr *E,
+                                         CheckerContext &C, const BugType &BT,
+                                         StringRef Msg) const {
+  // Generate a report for this bug.
+  auto R = std::make_unique<PathSensitiveBugReport>(BT, Msg, N);
+  if (auto *CE = dyn_cast<CXXConstructExpr>(E)) {
+    bugreporter::trackExpressionValue(N, CE->getArg(0), *R);
+    bugreporter::trackExpressionValue(N, CE->getArg(1), *R);
+  }
+  C.emitReport(std::move(R));
+}
+
+static const MemRegion *GetRegionOrigin(SVal SV) {
+  const SymExpr *Sym = SV.getAsSymbol(/*IncludeBaseRegions =*/true);
+  return Sym ? Sym->getOriginRegion() : nullptr;
+}
+
+static const ValueDecl *GetExpressionOrigin(const Stmt *STMT) {
+  if (const DeclRefExpr *DRE = dyn_cast<DeclRefExpr>(STMT)) {
+    if (const VarDecl *VD = dyn_cast<VarDecl>(DRE->getDecl()))
+      return VD;
+  } else if (const CXXMemberCallExpr *MCE = dyn_cast<CXXMemberCallExpr>(STMT)) {
+    if (const DeclRefExpr *DRE = dyn_cast<DeclRefExpr>(
+            MCE->getImplicitObjectArgument()->IgnoreParenCasts())) {
+      if (const VarDecl *VD = dyn_cast<VarDecl>(DRE->getDecl()))
+        return VD;
+    } else if (const MemberExpr *ME = dyn_cast<MemberExpr>(
+                   MCE->getImplicitObjectArgument()->IgnoreParenCasts())) {
+      if (const FieldDecl *FD = dyn_cast<FieldDecl>(ME->getMemberDecl()))
+        return FD;
+    }
+  } else if (const CXXOperatorCallExpr *OCE =
+                 dyn_cast<CXXOperatorCallExpr>(STMT)) {
+    if (OCE->getNumArgs() >= 1) {
+      if (const DeclRefExpr *DRE = dyn_cast<DeclRefExpr>(OCE->getArg(0))) {
+        if (const VarDecl *VD = dyn_cast<VarDecl>(DRE->getDecl()))
+          return VD;
+      }
+    }
+  } else if (const UnaryOperator *UnaryOp = dyn_cast<UnaryOperator>(STMT)) {
+    if (const ArraySubscriptExpr *ASExpr =
+            dyn_cast<ArraySubscriptExpr>(UnaryOp->getSubExpr())) {
+      if (const DeclRefExpr *DRE =
+              dyn_cast<DeclRefExpr>(ASExpr->getBase()->IgnoreParenCasts())) {
+        if (const VarDecl *VD = dyn_cast<VarDecl>(DRE->getDecl()))
+          return VD;
+      }
+    }
+  } else if (const UnaryExprOrTypeTraitExpr *UTExpr =
+                 dyn_cast<UnaryExprOrTypeTraitExpr>(STMT)) {
+    if (const DeclRefExpr *DRE = dyn_cast<DeclRefExpr>(
+            UTExpr->getArgumentExpr()->IgnoreParenCasts())) {
+      if (const VarDecl *VD = dyn_cast<VarDecl>(DRE->getDecl()))
+        return VD;
+    }
+  }
+  return nullptr;
+}
+
+static const ValueDecl *GetConjuredSymbolOrigin(SVal SV) {
+  const SymExpr *Sym = SV.getAsSymbol(/*IncludeBaseRegions =*/true);
+  if (const SymbolConjured *SCArg = dyn_cast_or_null<SymbolConjured>(Sym)) {
+    if (const Stmt *STMTArg = SCArg->getStmt())
+      return GetExpressionOrigin(STMTArg);
+  }
+  return nullptr;
+}
+
+void BoundsInformationChecker::checkPreCall(const CallEvent &Call,
+                                            CheckerContext &C) const {
+  // Return early if not std::span<IT>(IT, size_t) constructor.
+  // a. Check if this is a ctor for std::span.
+  CallDescription CD({CDM::CXXMethod, {"std", "span", "span"}});
+  if (!CD.matches(Call))
+    return;
+  // b. Check if std::span ctor has two arguments.
+  if (Call.getNumArgs() != 2)
+    return;
+  // c. Check if second std::span ctor argument is of type size_t.
+  if (Call.getArgExpr(1)->getType().getCanonicalType() !=
+      C.getASTContext().getSizeType())
+    return;
+
+  SVal PointerArg = Call.getArgSVal(0);
+  SVal SizeArg = Call.getArgSVal(1);
+
+  // If buffer and length params are not from the same "source", then report a
+  // bug.
+  const MemRegion *MRArg0 = GetRegionOrigin(PointerArg);
+  const MemRegion *MRArg1 = GetRegionOrigin(SizeArg);
+  if (MRArg0 && MRArg1 && MRArg0->getBaseRegion() != MRArg1->getBaseRegion()) {
+    // FIXME: Add more logic to filter out valid cases.
+    if (ExplodedNode *N = C.generateNonFatalErrorNode(C.getState())) {
+      reportBug(
+          N, Call.getOriginExpr(), C, BT_DifferentMemRegion,
+          "Constructor args for std::span are from different memory regions");
+      return;
+    }
+  }
+
+  // Check if value comes from an unknown function call.
+  const ValueDecl *VDArg0 = GetConjuredSymbolOrigin(PointerArg);
+  const ValueDecl *VDArg1 = GetConjuredSymbolOrigin(SizeArg);
+
+  if (VDArg0) {
+    // If first argument is std::array.
+    // FIXME: Support C arrays.
+    if (const auto *CRDecl0 = VDArg0->getType()->getAsCXXRecordDecl()) {
+      if (CRDecl0->isInStdNamespace() && CRDecl0->getIdentifier() &&
+          CRDecl0->getName() == "array") {
+        if (VDArg0 != VDArg1) {
+          // Check second argument against known size of std::array.
+          if (SizeArg.isConstant()) {
+            if (const auto *CTSDecl =
+                    dyn_cast<ClassTemplateSpecializationDecl>(CRDecl0)) {
+              const TemplateArgumentList &templateArgList =
+                  CTSDecl->getTemplateArgs();
+              if (templateArgList.size() == 2) {
+                const TemplateArgument &templateArg1 = templateArgList[1];
+                if (templateArg1.getKind() ==
+                        TemplateArgument::ArgKind::Integral &&
+                    *SizeArg.getAsInteger() > templateArg1.getAsIntegral()) {
+                  if (ExplodedNode *N =
+                          C.generateNonFatalErrorNode(C.getState())) {
+                    reportBug(N, Call.getOriginExpr(), C, BT_OutOfBounds,
+                              "std::span constructed with overflow length");
+                    return;
+                  }
+                }
+              }
+            }
+          } else if (ExplodedNode *N =
+                         C.generateNonFatalErrorNode(C.getState())) {
+            reportBug(N, Call.getOriginExpr(), C, BT_NonConstantSizeArg,
+                      "std::span constructed from std::array with non-constant "
+                      "length");
+            return;
+          }
+        }
+      }
+    }
+  }
+}
+
+void ento::registerBoundsInformationChecker(CheckerManager &mgr) {
+  mgr.registerChecker<BoundsInformationChecker>();
+}
+
+bool ento::shouldRegisterBoundsInformationChecker(const CheckerManager &mgr) {
+  return true;
+}
diff --git a/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt b/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
index 682cfa01bec963..76f8d68818bfb5 100644
--- a/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
+++ b/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
@@ -13,6 +13,7 @@ add_clang_library(clangStaticAnalyzerCheckers
   BitwiseShiftChecker.cpp
   BlockInCriticalSectionChecker.cpp
   BoolAssignmentChecker.cpp
+  BoundsInformationChecker.cpp
   BuiltinFunctionChecker.cpp
   CStringChecker.cpp
   CStringSyntaxChecker.cpp
diff --git a/clang/test/Analysis/bounds-information-checker.cpp b/clang/test/Analysis/bounds-information-checker.cpp
new file mode 100644
index 00000000000000..332730b0bf3386
--- /dev/null
+++ b/clang/test/Analysis/bounds-information-checker.cpp
@@ -0,0 +1,76 @@
+// RUN: %clang %s -std=c++20 -Xclang -verify --analyze \
+// RUN:   -Xclang -analyzer-checker=core,alpha.cplusplus.BoundsInformation \
+// RUN:   -Xclang -analyzer-checker=debug.ExprInspection
+
+#include "std-span-system-header.h"
+
+//----------------------------------------------------------------------------//
+// std::span - no issue
+//----------------------------------------------------------------------------//
+
+void stdSpanFromCArrayNoIssue() {
+  char buffer[4] = { '3', '.', '1', '4' };
+
+  (void)std::span { buffer };                           // no-warning
+  (void)std::span<char> { buffer, sizeof(buffer) };     // no-warning
+  (void)std::span<char> { &buffer[0], sizeof(buffer) }; // no-warning
+
+  char *ptr = buffer;
+  size_t size = sizeof(buffer);
+  (void)std::span<char> { ptr, size };                  // no-warning
+}
+
+void stdSpanFromStdArrayNoIssue() {
+  std::array<char, 124> buffer { '3', '.', '1', '4' };
+
+  (void)std::span { buffer }.first(4);                             // no-warning
+  (void)std::span<char> { buffer.data(), buffer.size() }.first(4); // no-warning
+  (void)std::span<char> { &buffer[0], buffer.size() }.first(4);    // no-warning
+
+  char *ptr = buffer.data();
+  size_t size = buffer.size();
+  (void)std::span<char> { ptr, size }.first(4);                    // no-warning
+}
+
+void stdSpanFromStaticCArray() {
+  static const uint8_t prefixDeltaFrame[6] = { 0x00, 0x00, 0x00, 0x01, 0x21, 0xe0 };
+  (void)std::span<const uint8_t> { prefixDeltaFrame, sizeof(prefixDeltaFrame) };     // no-warning
+  (void)std::span<const uint8_t> { &prefixDeltaFrame[0], sizeof(prefixDeltaFrame) }; // no-warning
+}
+
+// No issue because arguments from the same memory region.
+// FIXME: This works locally, but not in actual WebKit code.
+class MappedFileData {
+public:
+  unsigned size() const { return m_fileSize; }
+  std::span<const uint8_t> span() const { return { static_cast<const uint8_t*>(m_fileData), size() }; } // no-warning
+  std::span<uint8_t> mutableSpan() { return { static_cast<uint8_t*>(m_fileData), size() }; }            // no-warning
+
+private:
+  void* m_fileData { nullptr };
+  unsigned m_fileSize { 0 };
+};
+
+//----------------------------------------------------------------------------//
+// std::span - warnings
+//----------------------------------------------------------------------------//
+
+void stdSpanFromStdArrayWarnings() {
+  std::array<char, 124> buffer { '3', '.', '1', '4' };
+  (void)std::span<char> { &buffer[0], 4 };
+  (void)std::span<char> { buffer.data(), 4 };
+}
+
+void stdSpanFromStdArrayOutOfBounds() {
+  std::array<char, 4> buffer { '3', '.', '1', '4' };
+  (void)std::span<char> { &buffer[0], 5 };    // expected-warning {{std::span constructed with overflow length}}
+  (void)std::span<char> { buffer.data(), 5 }; // expected-warning {{std::span constructed with overflow length}}
+}
+
+struct HexNumberBuffer {
+   std::array<char, 16> buffer;
+   unsigned length;
+
+   const char* characters() const { return &*(buffer.end() - length); }
+   std::span<const char> span() const { return { characters(), length }; } // expected-warning {{std::span constructed from std::array with non-constant length}}
+};
diff --git a/clang/test/Analysis/std-span-system-header.h b/clang/test/Analysis/std-span-system-header.h
new file mode 100644
index 00000000000000..5484bc3e8c0c4d
--- /dev/null
+++ b/clang/test/Analysis/std-span-system-header.h
@@ -0,0 +1,78 @@
+#pragma clang system_header
+
+#include "Inputs/system-header-simulator-cxx.h"
+
+namespace std {
+
+template <class _Tp, size_t _Size>
+struct array {
+  typedef array                                 __self;
+  typedef _Tp                                   value_type;
+  typedef value_type&                           reference;
+  typedef const value_type&                     const_reference;
+  typedef value_type*                           iterator;
+  typedef const value_type*                     const_iterator;
+  typedef value_type*                           pointer;
+  typedef const value_type*                     const_pointer;
+  typedef size_t                                size_type;
+
+  _Tp __elems_[_Size];
+
+  reference operator[](size_type __n);
+  const_reference operator[](size_type __n) const;
+
+  value_type* data();
+  const value_type* data() const;
+
+  size_type size() const;
+
+  iterator begin();
+  const_iterator begin() const;
+  iterator end();
+  const_iterator end() const;
+};
+
+inline constexpr size_t dynamic_extent = (size_t)-1;
+
+template <typename _Tp, size_t _Extent = dynamic_extent>
+class span {
+public:
+  using element_type           = _Tp;
+  using value_type             = remove_cv_t<_Tp>;
+  using size_type              = size_t;
+  using pointer                = _Tp *;
+
+  template <size_t _Sz = _Extent> requires(_Sz == 0)
+  constexpr span();
+
+  constexpr span(const span&);
+
+  template <typename _It>
+  constexpr span(_It* __first, size_type __count);
+
+  template <size_t _Sz>
+  constexpr span(element_type (&__arr)[_Sz]);
+
+  template <class _OtherElementType>
+  constexpr span(array<_OtherElementType, _Extent>& __arr);
+
+  template <class _OtherElementType>
+  constexpr span(const array<_OtherElementType, _Extent>& __arr);
+
+  constexpr pointer data() const;
+
+  constexpr size_type size() const;
+  constexpr size_type size_bytes() const;
+
+  constexpr span<element_type, dynamic_extent> first(size_type __count) const;
+};
+
+template<class _Tp, size_t _Sz>
+  span(_Tp (&)[_Sz]) -> span<_Tp, _Sz>;
+
+template<class _Tp, size_t _Sz>
+  span(array<_Tp, _Sz>&) -> span<_Tp, _Sz>;
+
+template<class _Tp, size_t _Sz>
+  span(const array<_Tp, _Sz>&) -> span<const _Tp, _Sz>;
+}

@llvmbot
Copy link
Member

llvmbot commented Oct 17, 2024

@llvm/pr-subscribers-clang-static-analyzer-1

Author: David Kilzer (ddkilzer)

Changes

Initial version of a bounds information checker to warn when the two-argument std::span constructor has a suspicious-looking size.


Full diff: https://github.com/llvm/llvm-project/pull/112784.diff

5 Files Affected:

  • (modified) clang/include/clang/StaticAnalyzer/Checkers/Checkers.td (+5)
  • (added) clang/lib/StaticAnalyzer/Checkers/BoundsInformationChecker.cpp (+199)
  • (modified) clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt (+1)
  • (added) clang/test/Analysis/bounds-information-checker.cpp (+76)
  • (added) clang/test/Analysis/std-span-system-header.h (+78)
diff --git a/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td b/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
index 6e224a4e098ad2..49900f36f8fd10 100644
--- a/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
+++ b/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
@@ -775,6 +775,11 @@ def VirtualCallChecker : Checker<"VirtualCall">,
 
 let ParentPackage = CplusplusAlpha in {
 
+def BoundsInformationChecker : Checker<"BoundsInformation">,
+  HelpText<"Confirms that view objects such as std::span are constructed "
+           "within the bounds of the source buffer">,
+  Documentation<NotDocumented>;
+
 def ContainerModeling : Checker<"ContainerModeling">,
   HelpText<"Models C++ containers">,
   Documentation<NotDocumented>,
diff --git a/clang/lib/StaticAnalyzer/Checkers/BoundsInformationChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/BoundsInformationChecker.cpp
new file mode 100644
index 00000000000000..8588c068e50f37
--- /dev/null
+++ b/clang/lib/StaticAnalyzer/Checkers/BoundsInformationChecker.cpp
@@ -0,0 +1,199 @@
+//== BoundsInformationChecker.cpp - bounds information checker --*- C++ -*--==//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This defines BoundsInformationChecker, a path-sensitive checker that
+// checks that the buffer and count arguments are within the bounds of
+// the source buffer.
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/StaticAnalyzer/Checkers/BuiltinCheckerRegistration.h"
+#include "clang/StaticAnalyzer/Core/BugReporter/BugType.h"
+#include "clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h"
+#include "clang/StaticAnalyzer/Core/Checker.h"
+#include "clang/StaticAnalyzer/Core/CheckerManager.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h"
+
+using namespace clang;
+using namespace ento;
+
+namespace {
+class BoundsInformationChecker : public Checker<check::PreCall> {
+  const BugType BT_DifferentMemRegion{
+      this, "std::span constructor arguments from different sources",
+      categories::SecurityError};
+  const BugType BT_NonConstantSizeArg{
+      this,
+      "std::span constructor for std::array has non-constant size argument",
+      categories::SecurityError};
+  const BugType BT_OutOfBounds{
+      this,
+      "std::span constructor for std::array uses out-of-bounds size argument",
+      categories::SecurityError};
+  void reportBug(ExplodedNode *N, const Expr *E, CheckerContext &C,
+                 const BugType &BT, StringRef Msg) const;
+
+public:
+  void checkPreCall(const CallEvent &Call, CheckerContext &C) const;
+};
+} // end anonymous namespace
+
+void BoundsInformationChecker::reportBug(ExplodedNode *N, const Expr *E,
+                                         CheckerContext &C, const BugType &BT,
+                                         StringRef Msg) const {
+  // Generate a report for this bug.
+  auto R = std::make_unique<PathSensitiveBugReport>(BT, Msg, N);
+  if (auto *CE = dyn_cast<CXXConstructExpr>(E)) {
+    bugreporter::trackExpressionValue(N, CE->getArg(0), *R);
+    bugreporter::trackExpressionValue(N, CE->getArg(1), *R);
+  }
+  C.emitReport(std::move(R));
+}
+
+static const MemRegion *GetRegionOrigin(SVal SV) {
+  const SymExpr *Sym = SV.getAsSymbol(/*IncludeBaseRegions =*/true);
+  return Sym ? Sym->getOriginRegion() : nullptr;
+}
+
+static const ValueDecl *GetExpressionOrigin(const Stmt *STMT) {
+  if (const DeclRefExpr *DRE = dyn_cast<DeclRefExpr>(STMT)) {
+    if (const VarDecl *VD = dyn_cast<VarDecl>(DRE->getDecl()))
+      return VD;
+  } else if (const CXXMemberCallExpr *MCE = dyn_cast<CXXMemberCallExpr>(STMT)) {
+    if (const DeclRefExpr *DRE = dyn_cast<DeclRefExpr>(
+            MCE->getImplicitObjectArgument()->IgnoreParenCasts())) {
+      if (const VarDecl *VD = dyn_cast<VarDecl>(DRE->getDecl()))
+        return VD;
+    } else if (const MemberExpr *ME = dyn_cast<MemberExpr>(
+                   MCE->getImplicitObjectArgument()->IgnoreParenCasts())) {
+      if (const FieldDecl *FD = dyn_cast<FieldDecl>(ME->getMemberDecl()))
+        return FD;
+    }
+  } else if (const CXXOperatorCallExpr *OCE =
+                 dyn_cast<CXXOperatorCallExpr>(STMT)) {
+    if (OCE->getNumArgs() >= 1) {
+      if (const DeclRefExpr *DRE = dyn_cast<DeclRefExpr>(OCE->getArg(0))) {
+        if (const VarDecl *VD = dyn_cast<VarDecl>(DRE->getDecl()))
+          return VD;
+      }
+    }
+  } else if (const UnaryOperator *UnaryOp = dyn_cast<UnaryOperator>(STMT)) {
+    if (const ArraySubscriptExpr *ASExpr =
+            dyn_cast<ArraySubscriptExpr>(UnaryOp->getSubExpr())) {
+      if (const DeclRefExpr *DRE =
+              dyn_cast<DeclRefExpr>(ASExpr->getBase()->IgnoreParenCasts())) {
+        if (const VarDecl *VD = dyn_cast<VarDecl>(DRE->getDecl()))
+          return VD;
+      }
+    }
+  } else if (const UnaryExprOrTypeTraitExpr *UTExpr =
+                 dyn_cast<UnaryExprOrTypeTraitExpr>(STMT)) {
+    if (const DeclRefExpr *DRE = dyn_cast<DeclRefExpr>(
+            UTExpr->getArgumentExpr()->IgnoreParenCasts())) {
+      if (const VarDecl *VD = dyn_cast<VarDecl>(DRE->getDecl()))
+        return VD;
+    }
+  }
+  return nullptr;
+}
+
+static const ValueDecl *GetConjuredSymbolOrigin(SVal SV) {
+  const SymExpr *Sym = SV.getAsSymbol(/*IncludeBaseRegions =*/true);
+  if (const SymbolConjured *SCArg = dyn_cast_or_null<SymbolConjured>(Sym)) {
+    if (const Stmt *STMTArg = SCArg->getStmt())
+      return GetExpressionOrigin(STMTArg);
+  }
+  return nullptr;
+}
+
+void BoundsInformationChecker::checkPreCall(const CallEvent &Call,
+                                            CheckerContext &C) const {
+  // Return early if not std::span<IT>(IT, size_t) constructor.
+  // a. Check if this is a ctor for std::span.
+  CallDescription CD({CDM::CXXMethod, {"std", "span", "span"}});
+  if (!CD.matches(Call))
+    return;
+  // b. Check if std::span ctor has two arguments.
+  if (Call.getNumArgs() != 2)
+    return;
+  // c. Check if second std::span ctor argument is of type size_t.
+  if (Call.getArgExpr(1)->getType().getCanonicalType() !=
+      C.getASTContext().getSizeType())
+    return;
+
+  SVal PointerArg = Call.getArgSVal(0);
+  SVal SizeArg = Call.getArgSVal(1);
+
+  // If buffer and length params are not from the same "source", then report a
+  // bug.
+  const MemRegion *MRArg0 = GetRegionOrigin(PointerArg);
+  const MemRegion *MRArg1 = GetRegionOrigin(SizeArg);
+  if (MRArg0 && MRArg1 && MRArg0->getBaseRegion() != MRArg1->getBaseRegion()) {
+    // FIXME: Add more logic to filter out valid cases.
+    if (ExplodedNode *N = C.generateNonFatalErrorNode(C.getState())) {
+      reportBug(
+          N, Call.getOriginExpr(), C, BT_DifferentMemRegion,
+          "Constructor args for std::span are from different memory regions");
+      return;
+    }
+  }
+
+  // Check if value comes from an unknown function call.
+  const ValueDecl *VDArg0 = GetConjuredSymbolOrigin(PointerArg);
+  const ValueDecl *VDArg1 = GetConjuredSymbolOrigin(SizeArg);
+
+  if (VDArg0) {
+    // If first argument is std::array.
+    // FIXME: Support C arrays.
+    if (const auto *CRDecl0 = VDArg0->getType()->getAsCXXRecordDecl()) {
+      if (CRDecl0->isInStdNamespace() && CRDecl0->getIdentifier() &&
+          CRDecl0->getName() == "array") {
+        if (VDArg0 != VDArg1) {
+          // Check second argument against known size of std::array.
+          if (SizeArg.isConstant()) {
+            if (const auto *CTSDecl =
+                    dyn_cast<ClassTemplateSpecializationDecl>(CRDecl0)) {
+              const TemplateArgumentList &templateArgList =
+                  CTSDecl->getTemplateArgs();
+              if (templateArgList.size() == 2) {
+                const TemplateArgument &templateArg1 = templateArgList[1];
+                if (templateArg1.getKind() ==
+                        TemplateArgument::ArgKind::Integral &&
+                    *SizeArg.getAsInteger() > templateArg1.getAsIntegral()) {
+                  if (ExplodedNode *N =
+                          C.generateNonFatalErrorNode(C.getState())) {
+                    reportBug(N, Call.getOriginExpr(), C, BT_OutOfBounds,
+                              "std::span constructed with overflow length");
+                    return;
+                  }
+                }
+              }
+            }
+          } else if (ExplodedNode *N =
+                         C.generateNonFatalErrorNode(C.getState())) {
+            reportBug(N, Call.getOriginExpr(), C, BT_NonConstantSizeArg,
+                      "std::span constructed from std::array with non-constant "
+                      "length");
+            return;
+          }
+        }
+      }
+    }
+  }
+}
+
+void ento::registerBoundsInformationChecker(CheckerManager &mgr) {
+  mgr.registerChecker<BoundsInformationChecker>();
+}
+
+bool ento::shouldRegisterBoundsInformationChecker(const CheckerManager &mgr) {
+  return true;
+}
diff --git a/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt b/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
index 682cfa01bec963..76f8d68818bfb5 100644
--- a/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
+++ b/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
@@ -13,6 +13,7 @@ add_clang_library(clangStaticAnalyzerCheckers
   BitwiseShiftChecker.cpp
   BlockInCriticalSectionChecker.cpp
   BoolAssignmentChecker.cpp
+  BoundsInformationChecker.cpp
   BuiltinFunctionChecker.cpp
   CStringChecker.cpp
   CStringSyntaxChecker.cpp
diff --git a/clang/test/Analysis/bounds-information-checker.cpp b/clang/test/Analysis/bounds-information-checker.cpp
new file mode 100644
index 00000000000000..332730b0bf3386
--- /dev/null
+++ b/clang/test/Analysis/bounds-information-checker.cpp
@@ -0,0 +1,76 @@
+// RUN: %clang %s -std=c++20 -Xclang -verify --analyze \
+// RUN:   -Xclang -analyzer-checker=core,alpha.cplusplus.BoundsInformation \
+// RUN:   -Xclang -analyzer-checker=debug.ExprInspection
+
+#include "std-span-system-header.h"
+
+//----------------------------------------------------------------------------//
+// std::span - no issue
+//----------------------------------------------------------------------------//
+
+void stdSpanFromCArrayNoIssue() {
+  char buffer[4] = { '3', '.', '1', '4' };
+
+  (void)std::span { buffer };                           // no-warning
+  (void)std::span<char> { buffer, sizeof(buffer) };     // no-warning
+  (void)std::span<char> { &buffer[0], sizeof(buffer) }; // no-warning
+
+  char *ptr = buffer;
+  size_t size = sizeof(buffer);
+  (void)std::span<char> { ptr, size };                  // no-warning
+}
+
+void stdSpanFromStdArrayNoIssue() {
+  std::array<char, 124> buffer { '3', '.', '1', '4' };
+
+  (void)std::span { buffer }.first(4);                             // no-warning
+  (void)std::span<char> { buffer.data(), buffer.size() }.first(4); // no-warning
+  (void)std::span<char> { &buffer[0], buffer.size() }.first(4);    // no-warning
+
+  char *ptr = buffer.data();
+  size_t size = buffer.size();
+  (void)std::span<char> { ptr, size }.first(4);                    // no-warning
+}
+
+void stdSpanFromStaticCArray() {
+  static const uint8_t prefixDeltaFrame[6] = { 0x00, 0x00, 0x00, 0x01, 0x21, 0xe0 };
+  (void)std::span<const uint8_t> { prefixDeltaFrame, sizeof(prefixDeltaFrame) };     // no-warning
+  (void)std::span<const uint8_t> { &prefixDeltaFrame[0], sizeof(prefixDeltaFrame) }; // no-warning
+}
+
+// No issue because arguments from the same memory region.
+// FIXME: This works locally, but not in actual WebKit code.
+class MappedFileData {
+public:
+  unsigned size() const { return m_fileSize; }
+  std::span<const uint8_t> span() const { return { static_cast<const uint8_t*>(m_fileData), size() }; } // no-warning
+  std::span<uint8_t> mutableSpan() { return { static_cast<uint8_t*>(m_fileData), size() }; }            // no-warning
+
+private:
+  void* m_fileData { nullptr };
+  unsigned m_fileSize { 0 };
+};
+
+//----------------------------------------------------------------------------//
+// std::span - warnings
+//----------------------------------------------------------------------------//
+
+void stdSpanFromStdArrayWarnings() {
+  std::array<char, 124> buffer { '3', '.', '1', '4' };
+  (void)std::span<char> { &buffer[0], 4 };
+  (void)std::span<char> { buffer.data(), 4 };
+}
+
+void stdSpanFromStdArrayOutOfBounds() {
+  std::array<char, 4> buffer { '3', '.', '1', '4' };
+  (void)std::span<char> { &buffer[0], 5 };    // expected-warning {{std::span constructed with overflow length}}
+  (void)std::span<char> { buffer.data(), 5 }; // expected-warning {{std::span constructed with overflow length}}
+}
+
+struct HexNumberBuffer {
+   std::array<char, 16> buffer;
+   unsigned length;
+
+   const char* characters() const { return &*(buffer.end() - length); }
+   std::span<const char> span() const { return { characters(), length }; } // expected-warning {{std::span constructed from std::array with non-constant length}}
+};
diff --git a/clang/test/Analysis/std-span-system-header.h b/clang/test/Analysis/std-span-system-header.h
new file mode 100644
index 00000000000000..5484bc3e8c0c4d
--- /dev/null
+++ b/clang/test/Analysis/std-span-system-header.h
@@ -0,0 +1,78 @@
+#pragma clang system_header
+
+#include "Inputs/system-header-simulator-cxx.h"
+
+namespace std {
+
+template <class _Tp, size_t _Size>
+struct array {
+  typedef array                                 __self;
+  typedef _Tp                                   value_type;
+  typedef value_type&                           reference;
+  typedef const value_type&                     const_reference;
+  typedef value_type*                           iterator;
+  typedef const value_type*                     const_iterator;
+  typedef value_type*                           pointer;
+  typedef const value_type*                     const_pointer;
+  typedef size_t                                size_type;
+
+  _Tp __elems_[_Size];
+
+  reference operator[](size_type __n);
+  const_reference operator[](size_type __n) const;
+
+  value_type* data();
+  const value_type* data() const;
+
+  size_type size() const;
+
+  iterator begin();
+  const_iterator begin() const;
+  iterator end();
+  const_iterator end() const;
+};
+
+inline constexpr size_t dynamic_extent = (size_t)-1;
+
+template <typename _Tp, size_t _Extent = dynamic_extent>
+class span {
+public:
+  using element_type           = _Tp;
+  using value_type             = remove_cv_t<_Tp>;
+  using size_type              = size_t;
+  using pointer                = _Tp *;
+
+  template <size_t _Sz = _Extent> requires(_Sz == 0)
+  constexpr span();
+
+  constexpr span(const span&);
+
+  template <typename _It>
+  constexpr span(_It* __first, size_type __count);
+
+  template <size_t _Sz>
+  constexpr span(element_type (&__arr)[_Sz]);
+
+  template <class _OtherElementType>
+  constexpr span(array<_OtherElementType, _Extent>& __arr);
+
+  template <class _OtherElementType>
+  constexpr span(const array<_OtherElementType, _Extent>& __arr);
+
+  constexpr pointer data() const;
+
+  constexpr size_type size() const;
+  constexpr size_type size_bytes() const;
+
+  constexpr span<element_type, dynamic_extent> first(size_type __count) const;
+};
+
+template<class _Tp, size_t _Sz>
+  span(_Tp (&)[_Sz]) -> span<_Tp, _Sz>;
+
+template<class _Tp, size_t _Sz>
+  span(array<_Tp, _Sz>&) -> span<_Tp, _Sz>;
+
+template<class _Tp, size_t _Sz>
+  span(const array<_Tp, _Sz>&) -> span<const _Tp, _Sz>;
+}

@ddkilzer
Copy link
Author

ddkilzer commented Oct 17, 2024

This checker is intended to be a companion to C++ Safe Buffers and -Wunsafe-buffer-usage, but not to replace it.

@haoNoQ
Copy link
Collaborator

haoNoQ commented Oct 18, 2024

but not to replace it

And it doesn't necessarily need it in the first place! I think it's most likely going to be useful as a standalone checker even when you never needed -Wunsafe-buffer-usage.

It's a bit coding-convention-y: roughly on the same level of "speculative" as the use-after-move checker for local variables. We don't plan to make sure that every warning is a real OOB bug. We will sometimes warn when the code is "scary-looking but ultimately benign". But depending on what we see, I think there's a solid chance it could be an on-by-default checker.

The biggest thing we're going after is the situations where the attacker controls the buffer size, or the span size, or both-independently, but we're trying to see if this can be detected even without taint analysis. This may work because the buffer and the size typically go "together". For example, as a struct with pointer and size fields, or as a pair of parameters. So if you're using a parameter for the size while using your member variable for the pointer, this means that something really weird is going on. Even if it's ultimately benign, we gotta say something about this. We could still use taint analysis as an extra source of information, but the idea is, maybe we don't even have to!

// bug.
const MemRegion *MRArg0 = GetRegionOrigin(PointerArg);
const MemRegion *MRArg1 = GetRegionOrigin(SizeArg);
if (MRArg0 && MRArg1 && MRArg0->getBaseRegion() != MRArg1->getBaseRegion()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This heuristic will probably require the most tweaking. At a glance, the next step is probably to accept code such as

void foo(int *ptr, size_t sz) {
    std::span{ptr, sz};
}

where ptr and sz have different base regions (they're two separate parameters) but it's easy to see that they both belong to the same StackArgumentsSpaceRegion (which confirms not only that they're parameters of the same function, but also that they're arguments of the same call).

} else if (ExplodedNode *N =
C.generateNonFatalErrorNode(C.getState())) {
reportBug(N, Call.getOriginExpr(), C, BT_NonConstantSizeArg,
"std::span constructed from std::array with non-constant "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure that it's really a std::array on this branch? I think GetExpressionOrigin() works with raw arrays too.

@@ -0,0 +1,76 @@
// RUN: %clang %s -std=c++20 -Xclang -verify --analyze \
// RUN: -Xclang -analyzer-checker=core,alpha.cplusplus.BoundsInformation \
// RUN: -Xclang -analyzer-checker=debug.ExprInspection
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExprInspection is probably unnecessary. (It's a checker built for the purposes of debugging/testing the engine, mostly. It reacts on magic function calls such as clang_analyzer_eval(). You don't have any of those magic calls in the test. See also https://clang.llvm.org/docs/analyzer/developer-docs/DebugChecks.html)

if (ExplodedNode *N = C.generateNonFatalErrorNode(C.getState())) {
reportBug(
N, Call.getOriginExpr(), C, BT_DifferentMemRegion,
"Constructor args for std::span are from different memory regions");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you have any "positive" tests for this warning yet. It's a good idea to have some!

Wording: "memory regions" is our local terminology. Users won't necessarily understand what it means. I think we should say "from different sources" or something like that. Though, of course, this warning will probably change quite a bit as we figure out the specific cases in which we want to emit it. (It may be a good idea to have different warnings for different cases.)

Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, the direction looks good to me, some nitpicky comments inline.

C.emitReport(std::move(R));
}

static const MemRegion *GetRegionOrigin(SVal SV) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we usually start function and method names with a lower-case letter.

}
}
} else if (const UnaryOperator *UnaryOp = dyn_cast<UnaryOperator>(STMT)) {
if (const ArraySubscriptExpr *ASExpr =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this work properly for nesting? E.g., foo[i][j]. I wonder if this needs to be recursive. Or is this a scenario that we explicitly don't want to support yet?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main problem with nesting is that std::span itself doesn't work properly with nesting. A std::span<std::span<T>> doesn't really represent a view into std::array<std::array<T>>. This is why std::mdspan is a thing, but it's an entire completely different thing.

return Sym ? Sym->getOriginRegion() : nullptr;
}

static const ValueDecl *GetExpressionOrigin(const Stmt *STMT) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if there is a way to get around pattern matching. Is it possible, that the same information could be derived from the memory regions?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes absolutely!!

This code effectively handles SymbolConjured, i.e. the one place where we don't have such information.

But even then, if we simply had access to eg. the argument SVals, the 'this' SVal of the call that produced the SymbolConjured, we'd be in a much better shape.

const ValueDecl *VDArg1 = GetConjuredSymbolOrigin(SizeArg);

if (VDArg0) {
// If first argument is std::array.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this check might want to support multiple types (std::array, C array, and maybe even more in the future, like we could have std::vectors with known sizes (if the std::vector is const and we know the size from the ctor), we might want to move this logic of deriving the size to a separate function.

@ddkilzer
Copy link
Author

Thanks for the feedback! I will work on updates this week (around the WebKit Contributors meeting, which happens to be the same week as the LLVM Developer's Meeting).

In the meantime, the checker (using the original PR) has already found some std::span anti-patterns in WebKit that were easy to fix:

Also, in addition to creating a std::span from function parameters (mentioned in this feedback), this checker found another pattern that we may want the checker to recognize, which is a function that returns a pointer and takes a reference (or a pointer) to a length variable (example take from this code in WebKit):

        UErrorCode status = U_ZERO_ERROR;
        [...]
            int32_t length = 0;
            const char* pointer = uenum_next(enumeration.get(), &length, &status);
            String calendar(std::span { pointer, static_cast<size_t>(length) });

Where the uenum_next() function is defined as:

const char* uenum_next(UEnumeration* en, int32_t* resultLength, UErrorCode* status);

One could also imagine a function that returns a length and passes in a pointer or a reference to a buffer.

However, we want to be careful about accepting this pattern if the function is in the current project, which assumes it could be improved, like this example for PAL::TextCodec::getUnencodableReplacement():

@haoNoQ
Copy link
Collaborator

haoNoQ commented Oct 23, 2024

a function that returns a pointer and takes a reference (or a pointer) to a length variable

Yes, this one should be easy to catch. Both values will be SymbolConjured pointing to the same function call expression / program point. (The out-parameter value may also be SymbolDerived which is derived from the same SymbolConjured; you may need to carefully unwrap it, as well as make sure that it's not falling into the "known origin region" category because SymbolDerived does indeed respond nicely to ->getOriginRegion()).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants