Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Safe Buffers] Serialize unsafe_buffer_usage pragmas #92031

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ziqingluo-90
Copy link
Contributor

A pair of #pragma clang unsafe_buffer_usage begin/end pragmas marks a warning-opt-out region. The begin and end locations (opt-out regions) are stored in preprocessor instances (PP) and will be queried by the -Wunsafe-buffer-usage analyzer.

The commit adds serialization and de-serialization implementations for the stored regions. Basically, the serialized representation of the regions of a PP is a (ordered) sequence of source location encodings. During serialization, it only serializes regions of the current translation unit. Regions from loaded files are not serialized. During de-serialization, regions from loaded files are stored by their source files. When later one queries if a loaded location l is in an opt-out region, PP looks up regions by the source file of l.

The reported issue at upstream: #90501
rdar://124035402

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules labels May 13, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented May 13, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-modules

Author: Ziqing Luo (ziqingluo-90)

Changes

A pair of #pragma clang unsafe_buffer_usage begin/end pragmas marks a warning-opt-out region. The begin and end locations (opt-out regions) are stored in preprocessor instances (PP) and will be queried by the -Wunsafe-buffer-usage analyzer.

The commit adds serialization and de-serialization implementations for the stored regions. Basically, the serialized representation of the regions of a PP is a (ordered) sequence of source location encodings. During serialization, it only serializes regions of the current translation unit. Regions from loaded files are not serialized. During de-serialization, regions from loaded files are stored by their source files. When later one queries if a loaded location l is in an opt-out region, PP looks up regions by the source file of l.

The reported issue at upstream: #90501
rdar://124035402


Full diff: https://github.com/llvm/llvm-project/pull/92031.diff

12 Files Affected:

  • (modified) clang/include/clang/Lex/Preprocessor.h (+18-4)
  • (modified) clang/include/clang/Serialization/ASTBitCodes.h (+3)
  • (modified) clang/lib/Lex/Preprocessor.cpp (+87-19)
  • (modified) clang/lib/Serialization/ASTReader.cpp (+11)
  • (modified) clang/lib/Serialization/ASTWriter.cpp (+7)
  • (added) clang/test/Modules/Inputs/SafeBuffers/base.h (+9)
  • (added) clang/test/Modules/Inputs/SafeBuffers/safe_buffers_test.modulemap (+10)
  • (added) clang/test/Modules/Inputs/SafeBuffers/test_sub1.h (+20)
  • (added) clang/test/Modules/Inputs/SafeBuffers/test_sub2.h (+11)
  • (added) clang/test/Modules/safe_buffers_optout.cpp (+39)
  • (added) clang/test/SemaCXX/warn-unsafe-buffer-usage-pragma-pch-complex.cpp (+72)
  • (added) clang/test/SemaCXX/warn-unsafe-buffer-usage-pragma-pch.cpp (+27)
diff --git a/clang/include/clang/Lex/Preprocessor.h b/clang/include/clang/Lex/Preprocessor.h
index e89b4a2c5230e..8d6884ebe7597 100644
--- a/clang/include/clang/Lex/Preprocessor.h
+++ b/clang/include/clang/Lex/Preprocessor.h
@@ -2883,11 +2883,15 @@ class Preprocessor {
   /// otherwise.
   SourceLocation CurrentSafeBufferOptOutStart; // It is used to report the start location of an never-closed region.
 
-  // An ordered sequence of "-Wunsafe-buffer-usage" opt-out regions in one
-  // translation unit. Each region is represented by a pair of start and end
-  // locations.  A region is "open" if its' start and end locations are
+  using SafeBufferOptOutMapTy =
+      SmallVector<std::pair<SourceLocation, SourceLocation>, 16>;
+  // An ordered sequence of "-Wunsafe-buffer-usage" opt-out regions in this
+  // translation unit. Each region is represented by a pair of start and
+  // end locations.  A region is "open" if its' start and end locations are
   // identical.
-  SmallVector<std::pair<SourceLocation, SourceLocation>, 8> SafeBufferOptOutMap;
+  SafeBufferOptOutMapTy SafeBufferOptOutMap;
+  // `SafeBufferOptOutMap`s of loaded files:
+  llvm::DenseMap<FileID, SafeBufferOptOutMapTy> LoadedSafeBufferOptOutMap;
 
 public:
   /// \return true iff the given `Loc` is in a "-Wunsafe-buffer-usage" opt-out
@@ -2918,6 +2922,16 @@ class Preprocessor {
   ///          opt-out region
   bool isPPInSafeBufferOptOutRegion(SourceLocation &StartLoc);
 
+  /// \return a sequence of SourceLocations representing ordered opt-out regions
+  /// specified by
+  /// `\#pragma clang unsafe_buffer_usage begin/end`s of this translation unit.
+  SmallVector<SourceLocation, 64> serializeSafeBufferOptOutMap() const;
+
+  /// \param SrcLocSeqs a sequence of SourceLocations deserialized from a
+  /// record of code `PP_UNSAFE_BUFFER_USAGE`.
+  void setDeserializedSafeBufferOptOutMap(
+      const SmallVectorImpl<SourceLocation> &SrcLocSeqs);
+
 private:
   /// Helper functions to forward lexing to the actual lexer. They all share the
   /// same signature.
diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h
index dcfa4ac0c1967..d1a0eba943039 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -775,6 +775,9 @@ enum ASTRecordTypes {
   /// Record code for lexical and visible block for delayed namespace in
   /// reduced BMI.
   DELAYED_NAMESPACE_LEXICAL_VISIBLE_RECORD = 68,
+
+  /// Record code for \#pragma clang unsafe_buffer_usage begin/end
+  PP_UNSAFE_BUFFER_USAGE = 69,
 };
 
 /// Record types used within a source manager block.
diff --git a/clang/lib/Lex/Preprocessor.cpp b/clang/lib/Lex/Preprocessor.cpp
index 0b70192743a39..6a41e3d4138aa 100644
--- a/clang/lib/Lex/Preprocessor.cpp
+++ b/clang/lib/Lex/Preprocessor.cpp
@@ -58,6 +58,7 @@
 #include "llvm/ADT/SmallString.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/iterator_range.h"
 #include "llvm/Support/Capacity.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/MemoryBuffer.h"
@@ -1483,26 +1484,41 @@ void Preprocessor::emitFinalMacroWarning(const Token &Identifier,
 }
 
 bool Preprocessor::isSafeBufferOptOut(const SourceManager &SourceMgr,
-                                           const SourceLocation &Loc) const {
-  // Try to find a region in `SafeBufferOptOutMap` where `Loc` is in:
-  auto FirstRegionEndingAfterLoc = llvm::partition_point(
-      SafeBufferOptOutMap,
-      [&SourceMgr,
-       &Loc](const std::pair<SourceLocation, SourceLocation> &Region) {
-        return SourceMgr.isBeforeInTranslationUnit(Region.second, Loc);
-      });
+                                      const SourceLocation &Loc) const {
+  // The lambda that tests if a `Loc` is in an opt-out region given one opt-out
+  // region map:
+  auto TestInMap = [&SourceMgr](const SafeBufferOptOutMapTy &Map,
+                                const SourceLocation &Loc) -> bool {
+    // Try to find a region in `SafeBufferOptOutMap` where `Loc` is in:
+    auto FirstRegionEndingAfterLoc = llvm::partition_point(
+        Map, [&SourceMgr,
+              &Loc](const std::pair<SourceLocation, SourceLocation> &Region) {
+          return SourceMgr.isBeforeInTranslationUnit(Region.second, Loc);
+        });
+
+    if (FirstRegionEndingAfterLoc != Map.end()) {
+      // To test if the start location of the found region precedes `Loc`:
+      return SourceMgr.isBeforeInTranslationUnit(
+          FirstRegionEndingAfterLoc->first, Loc);
+    }
+    // If we do not find a region whose end location passes `Loc`, we want to
+    // check if the current region is still open:
+    if (!Map.empty() && Map.back().first == Map.back().second)
+      return SourceMgr.isBeforeInTranslationUnit(Map.back().first, Loc);
+    return false;
+  };
 
-  if (FirstRegionEndingAfterLoc != SafeBufferOptOutMap.end()) {
-    // To test if the start location of the found region precedes `Loc`:
-    return SourceMgr.isBeforeInTranslationUnit(FirstRegionEndingAfterLoc->first,
-                                               Loc);
-  }
-  // If we do not find a region whose end location passes `Loc`, we want to
-  // check if the current region is still open:
-  if (!SafeBufferOptOutMap.empty() &&
-      SafeBufferOptOutMap.back().first == SafeBufferOptOutMap.back().second)
-    return SourceMgr.isBeforeInTranslationUnit(SafeBufferOptOutMap.back().first,
-                                               Loc);
+  if (SourceMgr.isLocalSourceLocation(Loc))
+    return TestInMap(SafeBufferOptOutMap, Loc);
+
+  // Expand macros (if any) until it gets to a file location:
+  SourceLocation FLoc = SourceMgr.getFileLoc(Loc);
+  FileID FlocFID = SourceMgr.getDecomposedLoc(FLoc).first;
+  auto LoadedMap = LoadedSafeBufferOptOutMap.find(FlocFID);
+
+  if (LoadedMap != LoadedSafeBufferOptOutMap.end())
+    return TestInMap(LoadedMap->getSecond(), Loc);
+  // FIXME: think out if this is unreachable?
   return false;
 }
 
@@ -1551,6 +1567,58 @@ bool Preprocessor::isPPInSafeBufferOptOutRegion(SourceLocation &StartLoc) {
   return InSafeBufferOptOutRegion;
 }
 
+SmallVector<SourceLocation, 64>
+Preprocessor::serializeSafeBufferOptOutMap() const {
+  assert(!InSafeBufferOptOutRegion &&
+         "Attempt to serialize safe buffer opt-out regions before file being "
+         "completely preprocessed");
+
+  SmallVector<SourceLocation, 64> SrcSeq;
+
+  for (const auto &[begin, end] : SafeBufferOptOutMap) {
+    SrcSeq.push_back(begin);
+    SrcSeq.push_back(end);
+  }
+  // Only `SafeBufferOptOutMap` gets serialized. No need to serialize
+  // `LoadedSafeBufferOptOutMap` because if this TU loads a pch/module, every
+  // pch/module in the pch-chain/module-DAG will be loaded one by one in order.
+  // It means that for each loading pch/module m, it just needs to load m's own
+  // `SafeBufferOptOutMap`.
+  return SrcSeq;
+}
+
+void Preprocessor::setDeserializedSafeBufferOptOutMap(
+    const SmallVectorImpl<SourceLocation> &SourceLocations) {
+  auto It = SourceLocations.begin();
+
+  assert(SourceLocations.size() % 2 == 0 &&
+         "ill-formed SourceLocation sequence");
+  while (It != SourceLocations.end()) {
+    SourceLocation begin = *It++;
+    SourceLocation end = *It++;
+    SourceLocation FileLoc = SourceMgr.getFileLoc(begin);
+    FileID FID = SourceMgr.getDecomposedLoc(FileLoc).first;
+
+    if (FID.isInvalid()) {
+      // I suppose this should not happen:
+      assert(false && "Attempted to read a safe buffer opt-out region whose "
+                      "begin location is associated to an invalid File ID.");
+      break;
+    }
+    assert(!SourceMgr.isLocalFileID(FID) && "Expected a pch/module file");
+    // Here we assume that
+    // `SourceMgr.getFileLoc(begin) == SourceMgr.getFileLoc(end)`.
+    // Though it may not hold in very rare and strange cases, i.e., a pair of
+    // opt-out pragams written in different files but are always used in a way
+    // that they match in a TU (otherwise PP will complaint).
+    // FIXME: PP should complaint about cross file safe buffer opt-out pragmas.
+    assert(FID == SourceMgr.getDecomposedLoc(SourceMgr.getFileLoc(end)).first &&
+           "Maybe a safe buffer opt-out region starts and ends at different "
+           "files.");
+    LoadedSafeBufferOptOutMap[FID].emplace_back(begin, end);
+  }
+}
+
 ModuleLoader::~ModuleLoader() = default;
 
 CommentHandler::~CommentHandler() = default;
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 43b69045bb054..4e22b3e1187c0 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -3586,6 +3586,17 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
       break;
     }
 
+    case PP_UNSAFE_BUFFER_USAGE: {
+      if (!Record.empty()) {
+        SmallVector<SourceLocation, 64> SrcLocs;
+        unsigned Idx = 0;
+        while (Idx < Record.size())
+          SrcLocs.push_back(ReadSourceLocation(F, Record, Idx));
+        PP.setDeserializedSafeBufferOptOutMap(SrcLocs);
+      }
+      break;
+    }
+
     case PP_CONDITIONAL_STACK:
       if (!Record.empty()) {
         unsigned Idx = 0, End = Record.size() - 1;
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index 30195868ca996..3203173640d18 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -911,6 +911,7 @@ void ASTWriter::WriteBlockInfoBlock() {
   RECORD(PP_CONDITIONAL_STACK);
   RECORD(DECLS_TO_CHECK_FOR_DEFERRED_DIAGS);
   RECORD(PP_ASSUME_NONNULL_LOC);
+  RECORD(PP_UNSAFE_BUFFER_USAGE);
 
   // SourceManager Block.
   BLOCK(SOURCE_MANAGER_BLOCK);
@@ -2439,6 +2440,12 @@ void ASTWriter::WritePreprocessor(const Preprocessor &PP, bool IsModule) {
     Record.clear();
   }
 
+  // Write the safe buffer opt-out region map in PP
+  for (SourceLocation &S : PP.serializeSafeBufferOptOutMap())
+    AddSourceLocation(std::move(S), Record);
+  Stream.EmitRecord(PP_UNSAFE_BUFFER_USAGE, Record);
+  Record.clear();
+
   // Enter the preprocessor block.
   Stream.EnterSubblock(PREPROCESSOR_BLOCK_ID, 3);
 
diff --git a/clang/test/Modules/Inputs/SafeBuffers/base.h b/clang/test/Modules/Inputs/SafeBuffers/base.h
new file mode 100644
index 0000000000000..660df17998a48
--- /dev/null
+++ b/clang/test/Modules/Inputs/SafeBuffers/base.h
@@ -0,0 +1,9 @@
+#ifdef __cplusplus
+int base(int *p) {
+  int x = p[5];
+#pragma clang unsafe_buffer_usage begin
+  int y = p[5];
+#pragma clang unsafe_buffer_usage end
+  return x + y;
+}
+#endif
diff --git a/clang/test/Modules/Inputs/SafeBuffers/safe_buffers_test.modulemap b/clang/test/Modules/Inputs/SafeBuffers/safe_buffers_test.modulemap
new file mode 100644
index 0000000000000..a4826c6514b98
--- /dev/null
+++ b/clang/test/Modules/Inputs/SafeBuffers/safe_buffers_test.modulemap
@@ -0,0 +1,10 @@
+module safe_buffers_test_base {
+  header "base.h"
+}
+
+module safe_buffers_test_optout {
+  explicit module test_sub1 {  header "test_sub1.h" }
+  explicit module test_sub2 {  textual header "test_sub2.h" }
+  use safe_buffers_test_base
+}
+
diff --git a/clang/test/Modules/Inputs/SafeBuffers/test_sub1.h b/clang/test/Modules/Inputs/SafeBuffers/test_sub1.h
new file mode 100644
index 0000000000000..c2b105c6e748a
--- /dev/null
+++ b/clang/test/Modules/Inputs/SafeBuffers/test_sub1.h
@@ -0,0 +1,20 @@
+#include "base.h"
+
+#ifdef __cplusplus
+int sub1(int *p) {
+  int x = p[5];
+#pragma clang unsafe_buffer_usage begin  
+  int y = p[5];
+#pragma clang unsafe_buffer_usage end
+  return x + y;
+}
+
+template <typename T>
+T sub1_T(T *p) {
+  T x = p[5];
+#pragma clang unsafe_buffer_usage begin  
+  T y = p[5];
+#pragma clang unsafe_buffer_usage end
+  return x + y;
+}
+#endif
diff --git a/clang/test/Modules/Inputs/SafeBuffers/test_sub2.h b/clang/test/Modules/Inputs/SafeBuffers/test_sub2.h
new file mode 100644
index 0000000000000..11eb9591ab600
--- /dev/null
+++ b/clang/test/Modules/Inputs/SafeBuffers/test_sub2.h
@@ -0,0 +1,11 @@
+#include "base.h"
+
+#ifdef __cplusplus
+int sub2(int *p) {  
+  int x = p[5];
+#pragma clang unsafe_buffer_usage begin
+  int y = p[5];
+#pragma clang unsafe_buffer_usage end
+  return x + y;
+}
+#endif
diff --git a/clang/test/Modules/safe_buffers_optout.cpp b/clang/test/Modules/safe_buffers_optout.cpp
new file mode 100644
index 0000000000000..dd8b1cca5ec89
--- /dev/null
+++ b/clang/test/Modules/safe_buffers_optout.cpp
@@ -0,0 +1,39 @@
+// RUN: rm -rf %t
+// RUN: %clang_cc1 -fmodules -fimplicit-module-maps -emit-module -fmodule-name=safe_buffers_test_base -x c++ %S/Inputs/SafeBuffers/safe_buffers_test.modulemap -std=c++20\
+// RUN:     -o %t/safe_buffers_test_base.pcm -Wunsafe-buffer-usage
+// RUN: %clang_cc1 -fmodules -fimplicit-module-maps -emit-module -fmodule-name=safe_buffers_test_optout -x c++ %S/Inputs/SafeBuffers/safe_buffers_test.modulemap -std=c++20\
+// RUN:     -o %t/safe_buffers_test_optout.pcm -fmodule-file=%t/safe_buffers_test_base.pcm -Wunsafe-buffer-usage
+// RUN: %clang_cc1 -fmodules -fimplicit-module-maps -verify -fmodule-file=%t/safe_buffers_test_optout.pcm -I %S/Inputs/SafeBuffers %s\
+// RUN:     -std=c++20 -Wunsafe-buffer-usage
+
+// RUN: %clang_cc1 -fmodules -fimplicit-module-maps -emit-module -fmodules-cache-path=%t -fmodule-name=safe_buffers_test_base -x c++\
+// RUN:     %S/Inputs/SafeBuffers/safe_buffers_test.modulemap -std=c++20 -Wunsafe-buffer-usage
+// RUN: %clang_cc1 -fmodules -fimplicit-module-maps -emit-module -fmodules-cache-path=%t -fmodule-name=safe_buffers_test_optout -x c++\
+// RUN:     %S/Inputs/SafeBuffers/safe_buffers_test.modulemap -std=c++20 -Wunsafe-buffer-usage
+// RUN: %clang_cc1 -fmodules -fimplicit-module-maps -verify -fmodules-cache-path=%t -I %S/Inputs/SafeBuffers %s -x c++\
+// RUN:     -std=c++20 -Wunsafe-buffer-usage
+
+#include "test_sub1.h"
+#include "test_sub2.h"
+
+// Testing safe buffers opt-out region serialization with modules: this
+// file loads 2 submodules from top-level module
+// `safe_buffers_test_optout`, which uses another top-level module
+// `safe_buffers_test_base`. (So the module dependencies form a DAG.)
+
+// expected-warning@base.h:3{{unsafe buffer access}}
+// expected-note@base.h:3{{pass -fsafe-buffer-usage-suggestions to receive code hardening suggestions}}
+// expected-warning@test_sub1.h:5{{unsafe buffer access}}
+// expected-note@test_sub1.h:5{{pass -fsafe-buffer-usage-suggestions to receive code hardening suggestions}}
+// expected-warning@test_sub1.h:14{{unsafe buffer access}}
+// expected-note@test_sub1.h:14{{pass -fsafe-buffer-usage-suggestions to receive code hardening suggestions}}
+// expected-warning@test_sub2.h:5{{unsafe buffer access}}
+// expected-note@test_sub2.h:5{{pass -fsafe-buffer-usage-suggestions to receive code hardening suggestions}}
+int foo(int * p) {
+  int x = p[5]; // expected-warning{{unsafe buffer access}} expected-note{{pass -fsafe-buffer-usage-suggestions to receive code hardening suggestions}}
+#pragma clang unsafe_buffer_usage begin
+  int y = p[5];
+#pragma clang unsafe_buffer_usage end
+  sub1_T(p); // instantiate template
+  return base(p) + sub1(p) + sub2(p);
+}
diff --git a/clang/test/SemaCXX/warn-unsafe-buffer-usage-pragma-pch-complex.cpp b/clang/test/SemaCXX/warn-unsafe-buffer-usage-pragma-pch-complex.cpp
new file mode 100644
index 0000000000000..04ac77b3f342e
--- /dev/null
+++ b/clang/test/SemaCXX/warn-unsafe-buffer-usage-pragma-pch-complex.cpp
@@ -0,0 +1,72 @@
+// A more complex example than warn-unsafe-buffer-usage-pragma-pch.cpp:
+//   MAIN - includes INC_H_1 
+//        \ loads    PCH_H_1 - includes INC_H_2
+//                           \ loads    PCH_H_2   
+
+// Test with PCH
+// RUN: %clang_cc1 -Wno-unused-value -std=c++20 -emit-pch -o %t-1 -DPCH_H_2 %s
+// RUN: %clang_cc1 -Wno-unused-value -std=c++20 -include-pch %t-1 -emit-pch -o %t-2 -DPCH_H_1 %s
+// RUN: %clang_cc1 -Wno-unused-value -Wunsafe-buffer-usage -std=c++20 -include-pch %t-2 -DMAIN -verify %s
+
+#define UNSAFE_BEGIN _Pragma("clang unsafe_buffer_usage begin")
+#define UNSAFE_END   _Pragma("clang unsafe_buffer_usage end")
+
+
+#ifdef INC_H_1
+#undef INC_H_1
+int a(int *s) {
+  s[2];  // <- expected warning here
+#pragma clang unsafe_buffer_usage begin
+  return s[1];
+#pragma clang unsafe_buffer_usage end
+}
+#endif
+
+#ifdef INC_H_2
+#undef INC_H_2
+int b(int *s) {
+  s[2];  // <- expected warning here  
+#pragma clang unsafe_buffer_usage begin
+  return s[1];
+#pragma clang unsafe_buffer_usage end
+}
+#endif
+
+#ifdef PCH_H_1
+#undef PCH_H_1
+#define INC_H_2
+#include "warn-unsafe-buffer-usage-pragma-pch-complex.cpp"
+
+int c(int *s) {
+  s[2];  // <- expected warning here  
+UNSAFE_BEGIN
+  return s[1];
+UNSAFE_END
+}
+#endif
+
+#ifdef PCH_H_2
+#undef PCH_H_2
+int d(int *s) {
+  s[2];  // <- expected warning here  
+#pragma clang unsafe_buffer_usage begin
+  return s[1];
+#pragma clang unsafe_buffer_usage end
+}
+#endif
+
+#ifdef MAIN
+#undef MAIN
+#define INC_H_1
+#include "warn-unsafe-buffer-usage-pragma-pch-complex.cpp"
+
+// expected-warning@-45{{unsafe buffer access}} expected-note@-45{{pass -fsafe-buffer-usage-suggestions to receive code hardening suggestions}}
+// expected-warning@-36{{unsafe buffer access}} expected-note@-36{{pass -fsafe-buffer-usage-suggestions to receive code hardening suggestions}}
+// expected-warning@-24{{unsafe buffer access}} expected-note@-24{{pass -fsafe-buffer-usage-suggestions to receive code hardening suggestions}}
+// expected-warning@-15{{unsafe buffer access}} expected-note@-15{{pass -fsafe-buffer-usage-suggestions to receive code hardening suggestions}}
+int main() {
+  int s[] = {1, 2, 3};
+  return a(s) + b(s) + c(s) + d(s);
+}
+#undef MAIN
+#endif
diff --git a/clang/test/SemaCXX/warn-unsafe-buffer-usage-pragma-pch.cpp b/clang/test/SemaCXX/warn-unsafe-buffer-usage-pragma-pch.cpp
new file mode 100644
index 0000000000000..abd3f0ffe9565
--- /dev/null
+++ b/clang/test/SemaCXX/warn-unsafe-buffer-usage-pragma-pch.cpp
@@ -0,0 +1,27 @@
+// The original example from https://github.com/llvm/llvm-project/issues/90501
+
+// Test without PCH
+// RUN: %clang_cc1 -Wno-unused-value -Wunsafe-buffer-usage -std=c++20 -include %s -verify %s
+// Test with PCH
+// RUN: %clang_cc1 -Wno-unused-value -std=c++20 -emit-pch -o %t %s
+// RUN: %clang_cc1 -Wno-unused-value -Wunsafe-buffer-usage -std=c++20 -include-pch %t -verify %s
+
+#ifndef A_H
+#define A_H
+
+int a(int *s) {
+  s[2];  // <- expected warning here
+#pragma clang unsafe_buffer_usage begin
+  return s[1];
+#pragma clang unsafe_buffer_usage end
+}
+
+#else
+// expected-warning@-7{{unsafe buffer access}}
+// expected-note@-8{{pass -fsafe-buffer-usage-suggestions to receive code hardening suggestions}}
+int main() {
+  int s[] = {1, 2, 3};
+  return a(s);
+}
+
+#endif

Copy link
Collaborator

@haoNoQ haoNoQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very happy that this is going somewhere! Everything makes sense to me but I also don't know a lot about this stuff.

During serialization, it only serializes regions of the current translation unit. Regions from loaded files are not serialized.

Hmm how does this work? When a precompiled header includes another precompiled header, do they both get loaded independently and then you get to combine the maps?

@@ -1551,6 +1567,58 @@ bool Preprocessor::isPPInSafeBufferOptOutRegion(SourceLocation &StartLoc) {
return InSafeBufferOptOutRegion;
}

SmallVector<SourceLocation, 64>
Preprocessor::serializeSafeBufferOptOutMap() const {
assert(!InSafeBufferOptOutRegion &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We require the pragma to be closed before the end of TU right? And that's a hard error, not a warning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a warning.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah in this case we need to support the situation when the warning was ignored.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I'm sorry, they are not warnings, they are errors:

PP.Diag(Loc, diag::err_pp_double_begin_pragma_unsafe_buffer_usage);

clang/lib/Lex/Preprocessor.cpp Show resolved Hide resolved
@ziqingluo-90
Copy link
Contributor Author

I'm very happy that this is going somewhere! Everything makes sense to me but I also don't know a lot about this stuff.

During serialization, it only serializes regions of the current translation unit. Regions from loaded files are not serialized.

Hmm how does this work? When a precompiled header includes another precompiled header, do they both get loaded independently and then you get to combine the maps?

For loading multiple PCHs, it is a chain. Suppose TU includes pch B and B includes pch C. So one would compile and emit PCHs starting from C, then do it for B with C being loaded. When compile A, it includes pch B from the command line and clang actually loads C and B separately.

It is the same story for modules, except that modules form a DAG instead of a chain. Clang will load each module file in an order with respect to the DAG. So for each module/PCH emitted by clang, only their own opt-out regions need to be serialized.

@danakj
Copy link
Contributor

danakj commented May 15, 2024

I am curious so I will ask: why did this take an approach of serializing the output of the pragmas (the maps) instead of serializing the pragmas themselves as AST nodes, so that the maps would be constructed in the same way as from parsing the text directly? that's how other pragmas were serialized that I have seen.

@haoNoQ
Copy link
Collaborator

haoNoQ commented May 15, 2024

serializing the pragmas themselves as AST nodes

These pragmas don't really translate very well into the AST. Similarly to #pragma clang diagnostic, they can be placed weirdly "across" other AST nodes, like:

#pragma clang unsafe_buffer_usage begin
void foo() {
#pragma clang unsafe_buffer_usage end
}

We could support it in the same way that clang supports Duff's Device but I don't think this is really a good idea. We ultimately want them to be textual, a preprocessor-level construct, because that's what security audits are looking for: an extremely clear sign that this is where unchecked pointer operations live. So that they could tell it without looking at anything else in the code.

@ziqingluo-90 ziqingluo-90 changed the title [WIP][Safe Buffers] Serialize unsafe_buffer_usage pragmas [Safe Buffers] Serialize unsafe_buffer_usage pragmas May 15, 2024
@ziqingluo-90
Copy link
Contributor Author

serializing the pragmas themselves as AST nodes

These pragmas don't really translate very well into the AST. Similarly to #pragma clang diagnostic, ....

yeah, #pragma clang unsafe_buffer_usage begin/end is an extremely simpler implementation of #pragma clang diagnostic for -Wunsafe-buffer-usage that can be queried at the early stage of the analysis to avoid unnecessary analysis. That's why we didn't implement it into the existing diagnostic map mechanism.

@ziqingluo-90 ziqingluo-90 self-assigned this May 15, 2024
@danakj
Copy link
Contributor

danakj commented May 15, 2024

Thanks for the explanation :)

Throw an error if clang deserializes a safe buffer opt-out region that
begins and ends at different files.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants