[clang-tidy] Add bugprone-suspicious-stringview-data-usage check #83716

Conversation

PiotrZSL
Member

@PiotrZSL PiotrZSL commented Mar 3, 2024

This check identifies suspicious usages of std::string_view::data() that could lead to reading out-of-bounds data due to inadequate or incorrect string null termination.

Closes #80854
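
For reviewers less familiar with the hazard, here is a minimal sketch of the pattern the check is meant to catch, next to a size-aware alternative. The helper names (`c_api_length`, `risky_length`, `safe_length`) are hypothetical and exist only for illustration; they are not part of this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical stand-in for a C API that expects a null-terminated string.
inline std::size_t c_api_length(const char *text) { return std::strlen(text); }

// Risky pattern: data() points into the underlying buffer, so a callee that
// searches for '\0' ignores the view's length and reads past its end.
inline std::size_t risky_length(std::string_view sv) {
  return c_api_length(sv.data());
}

// Size-aware alternative: copy exactly sv.size() characters into a
// std::string, which guarantees null termination through c_str().
inline std::size_t safe_length(std::string_view sv) {
  const std::string owned(sv.data(), sv.size());
  assert(owned.size() == sv.size());
  return c_api_length(owned.c_str());
}
```

With a view over the first five characters of a `"hello world"` array, `risky_length` would keep reading until the terminator of the whole array, while `safe_length` sees only the five characters in the view. If the underlying buffer had no terminator at all, the risky variant would read out of bounds.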

@llvmbot
Collaborator

llvmbot commented Mar 3, 2024

@llvm/pr-subscribers-clang-tools-extra

Author: Piotr Zegar (PiotrZSL)

Changes

This check identifies suspicious usages of std::string_view::data() that could lead to reading out-of-bounds data due to inadequate or incorrect string null termination.

Closes #80854


Full diff: https://github.com/llvm/llvm-project/pull/83716.diff

9 Files Affected:

  • (modified) clang-tools-extra/clang-tidy/bugprone/BugproneTidyModule.cpp (+3)
  • (modified) clang-tools-extra/clang-tidy/bugprone/CMakeLists.txt (+1)
  • (added) clang-tools-extra/clang-tidy/bugprone/SuspiciousStringviewDataUsageCheck.cpp (+100)
  • (added) clang-tools-extra/clang-tidy/bugprone/SuspiciousStringviewDataUsageCheck.h (+38)
  • (modified) clang-tools-extra/docs/ReleaseNotes.rst (+7)
  • (added) clang-tools-extra/docs/clang-tidy/checks/bugprone/suspicious-stringview-data-usage.rst (+58)
  • (modified) clang-tools-extra/docs/clang-tidy/checks/list.rst (+1)
  • (modified) clang-tools-extra/test/clang-tidy/checkers/Inputs/Headers/string (+7)
  • (added) clang-tools-extra/test/clang-tidy/checkers/bugprone/suspicious-stringview-data-usage.cpp (+48)
diff --git a/clang-tools-extra/clang-tidy/bugprone/BugproneTidyModule.cpp b/clang-tools-extra/clang-tidy/bugprone/BugproneTidyModule.cpp
index a8a23b045f80bb..4040399edbcb81 100644
--- a/clang-tools-extra/clang-tidy/bugprone/BugproneTidyModule.cpp
+++ b/clang-tools-extra/clang-tidy/bugprone/BugproneTidyModule.cpp
@@ -72,6 +72,7 @@
 #include "SuspiciousReallocUsageCheck.h"
 #include "SuspiciousSemicolonCheck.h"
 #include "SuspiciousStringCompareCheck.h"
+#include "SuspiciousStringviewDataUsageCheck.h"
 #include "SwappedArgumentsCheck.h"
 #include "SwitchMissingDefaultCaseCheck.h"
 #include "TerminatingContinueCheck.h"
@@ -217,6 +218,8 @@ class BugproneModule : public ClangTidyModule {
         "bugprone-suspicious-semicolon");
     CheckFactories.registerCheck<SuspiciousStringCompareCheck>(
         "bugprone-suspicious-string-compare");
+    CheckFactories.registerCheck<SuspiciousStringviewDataUsageCheck>(
+        "bugprone-suspicious-stringview-data-usage");
     CheckFactories.registerCheck<SwappedArgumentsCheck>(
         "bugprone-swapped-arguments");
     CheckFactories.registerCheck<TerminatingContinueCheck>(
diff --git a/clang-tools-extra/clang-tidy/bugprone/CMakeLists.txt b/clang-tools-extra/clang-tidy/bugprone/CMakeLists.txt
index 1cd6fb207d7625..db65ce8cb1567b 100644
--- a/clang-tools-extra/clang-tidy/bugprone/CMakeLists.txt
+++ b/clang-tools-extra/clang-tidy/bugprone/CMakeLists.txt
@@ -26,6 +26,7 @@ add_clang_library(clangTidyBugproneModule
   ImplicitWideningOfMultiplicationResultCheck.cpp
   InaccurateEraseCheck.cpp
   IncorrectEnableIfCheck.cpp
+  SuspiciousStringviewDataUsageCheck.cpp
   SwitchMissingDefaultCaseCheck.cpp
   IncDecInConditionsCheck.cpp
   IncorrectRoundingsCheck.cpp
diff --git a/clang-tools-extra/clang-tidy/bugprone/SuspiciousStringviewDataUsageCheck.cpp b/clang-tools-extra/clang-tidy/bugprone/SuspiciousStringviewDataUsageCheck.cpp
new file mode 100644
index 00000000000000..ffb31840c4c886
--- /dev/null
+++ b/clang-tools-extra/clang-tidy/bugprone/SuspiciousStringviewDataUsageCheck.cpp
@@ -0,0 +1,100 @@
+//===--- SuspiciousStringviewDataUsageCheck.cpp - clang-tidy --------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "SuspiciousStringviewDataUsageCheck.h"
+#include "../utils/Matchers.h"
+#include "../utils/OptionsUtils.h"
+#include "clang/AST/ASTContext.h"
+#include "clang/ASTMatchers/ASTMatchFinder.h"
+
+using namespace clang::ast_matchers;
+
+namespace clang::tidy::bugprone {
+
+SuspiciousStringviewDataUsageCheck::SuspiciousStringviewDataUsageCheck(
+    StringRef Name, ClangTidyContext *Context)
+    : ClangTidyCheck(Name, Context),
+      StringViewTypes(utils::options::parseStringList(Options.get(
+          "StringViewTypes", "::std::basic_string_view;::llvm::StringRef"))),
+      AllowedCallees(
+          utils::options::parseStringList(Options.get("AllowedCallees", ""))) {}
+
+void SuspiciousStringviewDataUsageCheck::storeOptions(
+    ClangTidyOptions::OptionMap &Opts) {
+  Options.store(Opts, "StringViewTypes",
+                utils::options::serializeStringList(StringViewTypes));
+  Options.store(Opts, "AllowedCallees",
+                utils::options::serializeStringList(AllowedCallees));
+}
+
+bool SuspiciousStringviewDataUsageCheck::isLanguageVersionSupported(
+    const LangOptions &LangOpts) const {
+  return LangOpts.CPlusPlus;
+}
+
+std::optional<TraversalKind>
+SuspiciousStringviewDataUsageCheck::getCheckTraversalKind() const {
+  return TK_AsIs;
+}
+
+void SuspiciousStringviewDataUsageCheck::registerMatchers(MatchFinder *Finder) {
+
+  auto AncestorCall = anyOf(
+      cxxConstructExpr(), callExpr(unless(cxxOperatorCallExpr())), lambdaExpr(),
+      initListExpr(
+          hasType(qualType(hasCanonicalType(hasDeclaration(recordDecl()))))));
+
+  auto DataMethod =
+      cxxMethodDecl(hasName("data"),
+                    ofClass(matchers::matchesAnyListedName(StringViewTypes)));
+
+  auto DataWithSelfCall =
+      cxxMemberCallExpr(on(ignoringParenImpCasts(expr().bind("self"))),
+                        callee(DataMethod))
+          .bind("data-call");
+  auto SizeCall = cxxMemberCallExpr(
+      callee(cxxMethodDecl(hasAnyName("size", "length"))),
+      on(ignoringParenImpCasts(
+          matchers::isStatementIdenticalToBoundNode("self"))));
+
+  Finder->addMatcher(
+      cxxMemberCallExpr(
+          on(ignoringParenImpCasts(expr().bind("self"))), callee(DataMethod),
+          expr().bind("data-call"),
+          hasParent(expr(anyOf(
+              invocation(
+                  expr().bind("call"), unless(cxxOperatorCallExpr()),
+                  hasAnyArgument(
+                      ignoringParenImpCasts(equalsBoundNode("data-call"))),
+                  unless(hasAnyArgument(ignoringParenImpCasts(SizeCall))),
+                  unless(hasAnyArgument(hasDescendant(expr(
+                      SizeCall,
+                      hasAncestor(expr(AncestorCall).bind("ancestor-size")),
+                      hasAncestor(expr(equalsBoundNode("call"),
+                                       equalsBoundNode("ancestor-size"))))))),
+                  hasDeclaration(namedDecl(
+                      unless(matchers::matchesAnyListedName(AllowedCallees))))),
+              initListExpr(expr().bind("init"),
+                           hasType(qualType(hasCanonicalType(hasDeclaration(
+                               recordDecl(unless(matchers::matchesAnyListedName(
+                                   AllowedCallees))))))),
+                           unless(has(ignoringParenImpCasts(SizeCall)))))))),
+      this);
+}
+
+void SuspiciousStringviewDataUsageCheck::check(
+    const MatchFinder::MatchResult &Result) {
+  const auto *DataCallExpr =
+      Result.Nodes.getNodeAs<CXXMemberCallExpr>("data-call");
+  diag(DataCallExpr->getExprLoc(),
+       "result of a `data()` call may not be null terminated, provide size "
+       "information to the callee to prevent potential issues")
+      << DataCallExpr->getCallee()->getSourceRange();
+}
+
+} // namespace clang::tidy::bugprone
diff --git a/clang-tools-extra/clang-tidy/bugprone/SuspiciousStringviewDataUsageCheck.h b/clang-tools-extra/clang-tidy/bugprone/SuspiciousStringviewDataUsageCheck.h
new file mode 100644
index 00000000000000..31eca0a48722fe
--- /dev/null
+++ b/clang-tools-extra/clang-tidy/bugprone/SuspiciousStringviewDataUsageCheck.h
@@ -0,0 +1,38 @@
+//===--- SuspiciousStringviewDataUsageCheck.h - clang-tidy -------//C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_BUGPRONE_SUSPICIOUSSTRINGVIEWDATAUSAGECHECK_H
+#define LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_BUGPRONE_SUSPICIOUSSTRINGVIEWDATAUSAGECHECK_H
+
+#include "../ClangTidyCheck.h"
+
+namespace clang::tidy::bugprone {
+
+/// Identifies suspicious usages of std::string_view::data() that could lead to
+/// reading out-of-bounds data due to inadequate or incorrect string null
+/// termination.
+///
+/// For the user-facing documentation see:
+/// http://clang.llvm.org/extra/clang-tidy/checks/bugprone/suspicious-stringview-data-usage.html
+class SuspiciousStringviewDataUsageCheck : public ClangTidyCheck {
+public:
+  SuspiciousStringviewDataUsageCheck(StringRef Name, ClangTidyContext *Context);
+  void registerMatchers(ast_matchers::MatchFinder *Finder) override;
+  void check(const ast_matchers::MatchFinder::MatchResult &Result) override;
+  void storeOptions(ClangTidyOptions::OptionMap &Opts) override;
+  bool isLanguageVersionSupported(const LangOptions &LangOpts) const override;
+  std::optional<TraversalKind> getCheckTraversalKind() const override;
+
+private:
+  std::vector<llvm::StringRef> StringViewTypes;
+  std::vector<llvm::StringRef> AllowedCallees;
+};
+
+} // namespace clang::tidy::bugprone
+
+#endif // LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_BUGPRONE_SUSPICIOUSSTRINGVIEWDATAUSAGECHECK_H
diff --git a/clang-tools-extra/docs/ReleaseNotes.rst b/clang-tools-extra/docs/ReleaseNotes.rst
index 5bae530e942384..a5dfda0a9e42db 100644
--- a/clang-tools-extra/docs/ReleaseNotes.rst
+++ b/clang-tools-extra/docs/ReleaseNotes.rst
@@ -104,6 +104,13 @@ Improvements to clang-tidy
 New checks
 ^^^^^^^^^^
 
+- New :doc:`bugprone-suspicious-stringview-data-usage
+  <clang-tidy/checks/bugprone/suspicious-stringview-data-usage>` check.
+
+  Identifies suspicious usages of ``std::string_view::data()`` that could lead
+  to reading out-of-bounds data due to inadequate or incorrect string null
+  termination.
+
 - New :doc:`modernize-use-designated-initializers
   <clang-tidy/checks/modernize/use-designated-initializers>` check.
 
diff --git a/clang-tools-extra/docs/clang-tidy/checks/bugprone/suspicious-stringview-data-usage.rst b/clang-tools-extra/docs/clang-tidy/checks/bugprone/suspicious-stringview-data-usage.rst
new file mode 100644
index 00000000000000..9b38d836018103
--- /dev/null
+++ b/clang-tools-extra/docs/clang-tidy/checks/bugprone/suspicious-stringview-data-usage.rst
@@ -0,0 +1,58 @@
+.. title:: clang-tidy - bugprone-suspicious-stringview-data-usage
+
+bugprone-suspicious-stringview-data-usage
+=========================================
+
+Identifies suspicious usages of ``std::string_view::data()`` that could lead to
+reading out-of-bounds data due to inadequate or incorrect string null
+termination.
+
+It warns when the result of ``data()`` is passed to a constructor or function
+without also passing the result of the corresponding ``size()`` or ``length()``
+member function. Such usage can lead to unintended behavior, particularly when
+the callee assumes that the data pointed to by ``data()`` is null-terminated.
+
+The absence of a ``c_str()`` method in ``std::string_view`` often leads
+developers to use ``data()`` as a substitute, especially when interfacing with
+C APIs that expect null-terminated strings. However, since ``data()`` does not
+guarantee null termination, this can result in unintended behavior if the API
+relies on proper null termination for correct string interpretation.
+
+The same problem can arise when a ``std::string_view`` has to be converted
+to a ``std::string``. Since the ``std::string`` constructor that accepts
+string-view-like objects is ``explicit``, passing a ``std::string_view`` to a
+function expecting a ``std::string`` results in a compilation error. As a
+workaround, developers may be tempted to call ``.data()`` to make the code
+compile, which constructs the string through the ``const char*`` overload and
+relies on null termination.
+
+For instance:
+
+.. code-block:: c++
+
+  void printString(const std::string& str) {
+    std::cout << "String: " << str << std::endl;
+  }
+
+  void something(std::string_view sv) {
+    printString(sv.data());
+  }
+
+In this example, directly passing ``sv`` to the ``printString`` function would
+lead to a compilation error due to the explicit nature of the ``std::string``
+constructor. Consequently, developers might opt for ``sv.data()`` to resolve the
+compilation error, albeit introducing potential hazards as discussed.
+
+.. option:: StringViewTypes
+
+  Allows users to specify custom string-view-like types for analysis. It
+  accepts a semicolon-separated list of type names or regular expressions
+  matching these types. Default value is:
+  `::std::basic_string_view;::llvm::StringRef`.
+
+.. option:: AllowedCallees
+
+  Specifies methods, functions, or classes to which the result of ``.data()``
+  is passed and that should be excluded from the analysis. Accepts a
+  semicolon-separated list of names or regular expressions matching these
+  entities. Default value is: empty string.
diff --git a/clang-tools-extra/docs/clang-tidy/checks/list.rst b/clang-tools-extra/docs/clang-tidy/checks/list.rst
index 5e57bc0ee483fe..5d649c3afa8e4a 100644
--- a/clang-tools-extra/docs/clang-tidy/checks/list.rst
+++ b/clang-tools-extra/docs/clang-tidy/checks/list.rst
@@ -138,6 +138,7 @@ Clang-Tidy Checks
    :doc:`bugprone-suspicious-realloc-usage <bugprone/suspicious-realloc-usage>`,
    :doc:`bugprone-suspicious-semicolon <bugprone/suspicious-semicolon>`, "Yes"
    :doc:`bugprone-suspicious-string-compare <bugprone/suspicious-string-compare>`, "Yes"
+   :doc:`bugprone-suspicious-stringview-data-usage <bugprone/suspicious-stringview-data-usage>`,
    :doc:`bugprone-swapped-arguments <bugprone/swapped-arguments>`, "Yes"
    :doc:`bugprone-switch-missing-default-case <bugprone/switch-missing-default-case>`,
    :doc:`bugprone-terminating-continue <bugprone/terminating-continue>`, "Yes"
diff --git a/clang-tools-extra/test/clang-tidy/checkers/Inputs/Headers/string b/clang-tools-extra/test/clang-tidy/checkers/Inputs/Headers/string
index f2e4159a224513..28e2b4a231e52e 100644
--- a/clang-tools-extra/test/clang-tidy/checkers/Inputs/Headers/string
+++ b/clang-tools-extra/test/clang-tidy/checkers/Inputs/Headers/string
@@ -24,6 +24,7 @@ struct basic_string {
   basic_string();
   basic_string(const C *p, const A &a = A());
   basic_string(const C *p, size_type count);
+  basic_string(const C *b, const C *e);
 
   ~basic_string();
 
@@ -85,6 +86,12 @@ struct basic_string_view {
   const C *str;
   constexpr basic_string_view(const C* s) : str(s) {}
 
+  const C *data() const;
+
+  bool empty() const;
+  size_type size() const;
+  size_type length() const;
+
   size_type find(_Type v, size_type pos = 0) const;
   size_type find(C ch, size_type pos = 0) const;
   size_type find(const C* s, size_type pos, size_type count) const;
diff --git a/clang-tools-extra/test/clang-tidy/checkers/bugprone/suspicious-stringview-data-usage.cpp b/clang-tools-extra/test/clang-tidy/checkers/bugprone/suspicious-stringview-data-usage.cpp
new file mode 100644
index 00000000000000..638ce8591a24ac
--- /dev/null
+++ b/clang-tools-extra/test/clang-tidy/checkers/bugprone/suspicious-stringview-data-usage.cpp
@@ -0,0 +1,48 @@
+// RUN: %check_clang_tidy -std=c++17-or-later %s bugprone-suspicious-stringview-data-usage %t -- -- -isystem %clang_tidy_headers
+#include <string>
+
+struct View {
+   const char* str;
+};
+
+struct ViewWithSize {
+   const char* str;
+   std::string_view::size_type size;
+};
+
+void something(const char*);
+void something(const char*, unsigned);
+void something(const char*, unsigned, const char*);
+void something_str(std::string, unsigned);
+
+void invalid(std::string_view sv, std::string_view sv2) {
+  std::string s(sv.data());
+// CHECK-MESSAGES: :[[@LINE-1]]:20: warning: result of a `data()` call may not be null terminated, provide size information to the callee to prevent potential issues
+  std::string si{sv.data()};
+// CHECK-MESSAGES: :[[@LINE-1]]:21: warning: result of a `data()` call may not be null terminated, provide size information to the callee to prevent potential issues
+  std::string_view s2(sv.data());
+// CHECK-MESSAGES: :[[@LINE-1]]:26: warning: result of a `data()` call may not be null terminated, provide size information to the callee to prevent potential issues
+  something(sv.data());
+// CHECK-MESSAGES: :[[@LINE-1]]:16: warning: result of a `data()` call may not be null terminated, provide size information to the callee to prevent potential issues
+  something(sv.data(), sv.size(), sv2.data());
+// CHECK-MESSAGES: :[[@LINE-1]]:39: warning: result of a `data()` call may not be null terminated, provide size information to the callee to prevent potential issues
+  something_str(sv.data(), sv.size());
+// CHECK-MESSAGES: :[[@LINE-1]]:20: warning: result of a `data()` call may not be null terminated, provide size information to the callee to prevent potential issues
+  View view{sv.data()};
+// CHECK-MESSAGES: :[[@LINE-1]]:16: warning: result of a `data()` call may not be null terminated, provide size information to the callee to prevent potential issues
+}
+
+void valid(std::string_view sv) {
+  std::string s1(sv.data(), sv.data() + sv.size());
+  std::string s2(sv.data(), sv.data() + sv.length());
+  std::string s3(sv.data(), sv.size() + sv.data());
+  std::string s4(sv.data(), sv.length() + sv.data());
+  std::string s5(sv.data(), sv.size());
+  std::string s6(sv.data(), sv.length());
+  something(sv.data(), sv.size());
+  something(sv.data(), sv.length());
+  ViewWithSize view1{sv.data(), sv.size()};
+  ViewWithSize view2{sv.data(), sv.length()};
+
+  const char* str{sv.data()};
+}

@llvmbot
Collaborator

llvmbot commented Mar 3, 2024

@llvm/pr-subscribers-clang-tidy

+++ b/clang-tools-extra/test/clang-tidy/checkers/bugprone/suspicious-stringview-data-usage.cpp
@@ -0,0 +1,48 @@
+// RUN: %check_clang_tidy -std=c++17-or-later %s bugprone-suspicious-stringview-data-usage %t -- -- -isystem %clang_tidy_headers
+#include <string>
+
+struct View {
+   const char* str;
+};
+
+struct ViewWithSize {
+   const char* str;
+   std::string_view::size_type size;
+};
+
+void something(const char*);
+void something(const char*, unsigned);
+void something(const char*, unsigned, const char*);
+void something_str(std::string, unsigned);
+
+void invalid(std::string_view sv, std::string_view sv2) {
+  std::string s(sv.data());
+// CHECK-MESSAGES: :[[@LINE-1]]:20: warning: result of a `data()` call may not be null terminated, provide size information to the callee to prevent potential issues
+  std::string si{sv.data()};
+// CHECK-MESSAGES: :[[@LINE-1]]:21: warning: result of a `data()` call may not be null terminated, provide size information to the callee to prevent potential issues
+  std::string_view s2(sv.data());
+// CHECK-MESSAGES: :[[@LINE-1]]:26: warning: result of a `data()` call may not be null terminated, provide size information to the callee to prevent potential issues
+  something(sv.data());
+// CHECK-MESSAGES: :[[@LINE-1]]:16: warning: result of a `data()` call may not be null terminated, provide size information to the callee to prevent potential issues
+  something(sv.data(), sv.size(), sv2.data());
+// CHECK-MESSAGES: :[[@LINE-1]]:39: warning: result of a `data()` call may not be null terminated, provide size information to the callee to prevent potential issues
+  something_str(sv.data(), sv.size());
+// CHECK-MESSAGES: :[[@LINE-1]]:20: warning: result of a `data()` call may not be null terminated, provide size information to the callee to prevent potential issues
+  View view{sv.data()};
+// CHECK-MESSAGES: :[[@LINE-1]]:16: warning: result of a `data()` call may not be null terminated, provide size information to the callee to prevent potential issues
+}
+
+void valid(std::string_view sv) {
+  std::string s1(sv.data(), sv.data() + sv.size());
+  std::string s2(sv.data(), sv.data() + sv.length());
+  std::string s3(sv.data(), sv.size() + sv.data());
+  std::string s4(sv.data(), sv.length() + sv.data());
+  std::string s5(sv.data(), sv.size());
+  std::string s6(sv.data(), sv.length());
+  something(sv.data(), sv.size());
+  something(sv.data(), sv.length());
+  ViewWithSize view1{sv.data(), sv.size()};
+  ViewWithSize view2{sv.data(), sv.length()};
+
+  const char* str{sv.data()};
+}

Contributor

@HerrCai0907 HerrCai0907 left a comment

LGTM

Contributor

@5chmidti 5chmidti left a comment

Overall looks good

@PiotrZSL PiotrZSL force-pushed the 80854-clang-tidy-create-bugprone-string-view-data-usage-check branch from ce8017e to 4310713 Compare March 7, 2024 19:58
github-actions bot commented Mar 7, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Contributor

@5chmidti 5chmidti left a comment

Found something unused, otherwise LGTM

@PiotrZSL PiotrZSL merged commit 28c1279 into llvm:main Mar 19, 2024
5 checks passed
@PiotrZSL PiotrZSL deleted the 80854-clang-tidy-create-bugprone-string-view-data-usage-check branch March 19, 2024 19:15
chencha3 pushed a commit to chencha3/llvm-project that referenced this pull request Mar 23, 2024
…m#83716)

This check identifies suspicious usages of std::string_view::data() that
could lead to reading out-of-bounds data due to inadequate or incorrect
string null termination.

Closes llvm#80854
Successfully merging this pull request may close these issues.

[Clang-Tidy] Create bugprone-string-view-data-usage check