Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release/18.x: Reland "[clang-repl] Keep the first llvm::Module empty to avoid invalid memory access. (#89031)" #90544

Merged
merged 1 commit into from
May 14, 2024

Conversation

llvmbot
Copy link
Collaborator

@llvmbot llvmbot commented Apr 30, 2024

Backport a3f07d3

Requested by: @nikic

@llvmbot llvmbot added this to the LLVM 18.X Release milestone Apr 30, 2024
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Apr 30, 2024
@llvmbot
Copy link
Collaborator Author

llvmbot commented Apr 30, 2024

@llvm/pr-subscribers-clang

Author: None (llvmbot)

Changes

Backport a3f07d3

Requested by: @nikic


Full diff: https://github.com/llvm/llvm-project/pull/90544.diff

2 Files Affected:

  • (modified) clang/lib/Interpreter/IncrementalParser.cpp (+19-5)
  • (modified) clang/lib/Interpreter/IncrementalParser.h (+5)
diff --git a/clang/lib/Interpreter/IncrementalParser.cpp b/clang/lib/Interpreter/IncrementalParser.cpp
index 370bcbfee8b014..f5f32b9f3924ae 100644
--- a/clang/lib/Interpreter/IncrementalParser.cpp
+++ b/clang/lib/Interpreter/IncrementalParser.cpp
@@ -209,6 +209,10 @@ IncrementalParser::IncrementalParser(Interpreter &Interp,
   if (Err)
     return;
   CI->ExecuteAction(*Act);
+
+  if (getCodeGen())
+    CachedInCodeGenModule = GenModule();
+
   std::unique_ptr<ASTConsumer> IncrConsumer =
       std::make_unique<IncrementalASTConsumer>(Interp, CI->takeASTConsumer());
   CI->setASTConsumer(std::move(IncrConsumer));
@@ -224,11 +228,8 @@ IncrementalParser::IncrementalParser(Interpreter &Interp,
     return;                     // PTU.takeError();
   }
 
-  if (CodeGenerator *CG = getCodeGen()) {
-    std::unique_ptr<llvm::Module> M(CG->ReleaseModule());
-    CG->StartModule("incr_module_" + std::to_string(PTUs.size()),
-                    M->getContext());
-    PTU->TheModule = std::move(M);
+  if (getCodeGen()) {
+    PTU->TheModule = GenModule();
     assert(PTU->TheModule && "Failed to create initial PTU");
   }
 }
@@ -364,6 +365,19 @@ IncrementalParser::Parse(llvm::StringRef input) {
 std::unique_ptr<llvm::Module> IncrementalParser::GenModule() {
   static unsigned ID = 0;
   if (CodeGenerator *CG = getCodeGen()) {
+    // Clang's CodeGen is designed to work with a single llvm::Module. In many
+    // cases for convenience various CodeGen parts have a reference to the
+    // llvm::Module (TheModule or Module) which does not change when a new
+    // module is pushed. However, the execution engine wants to take ownership
+    // of the module which does not map well to CodeGen's design. To work this
+    // around we created an empty module to make CodeGen happy. We should make
+    // sure it always stays empty.
+    assert((!CachedInCodeGenModule ||
+            (CachedInCodeGenModule->empty() &&
+             CachedInCodeGenModule->global_empty() &&
+             CachedInCodeGenModule->alias_empty() &&
+             CachedInCodeGenModule->ifunc_empty())) &&
+           "CodeGen wrote to a readonly module");
     std::unique_ptr<llvm::Module> M(CG->ReleaseModule());
     CG->StartModule("incr_module_" + std::to_string(ID++), M->getContext());
     return M;
diff --git a/clang/lib/Interpreter/IncrementalParser.h b/clang/lib/Interpreter/IncrementalParser.h
index e13b74c7f65948..f63bce50acd3b9 100644
--- a/clang/lib/Interpreter/IncrementalParser.h
+++ b/clang/lib/Interpreter/IncrementalParser.h
@@ -24,6 +24,7 @@
 #include <memory>
 namespace llvm {
 class LLVMContext;
+class Module;
 } // namespace llvm
 
 namespace clang {
@@ -57,6 +58,10 @@ class IncrementalParser {
   /// of code.
   std::list<PartialTranslationUnit> PTUs;
 
+  /// When CodeGen is created the first llvm::Module gets cached in many places
+  /// and we must keep it alive.
+  std::unique_ptr<llvm::Module> CachedInCodeGenModule;
+
   IncrementalParser();
 
 public:

@nikic
Copy link
Contributor

nikic commented May 1, 2024

This looks like an ABI breaking change to me.

I think the libclang-cpp.so ABI test is currently broken due to the new minor version scheme. (cc @tstellar we need something like #85166 for the clang ABI check)

@vgvassilev
Copy link
Contributor

@nikic, I am confused. How is that an abi breaking change?

@nikic
Copy link
Contributor

nikic commented May 4, 2024

@vgvassilev My bad, I completely missed that the changed header is in the lib/ directory, not in include/.

@vgvassilev
Copy link
Contributor

Ok, maybe we could move forward?

@nikic nikic requested a review from lhames May 4, 2024 07:02
Copy link
Contributor

@lhames lhames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

…id memory access. (llvm#89031)"

Original commit message: "

Clang's CodeGen is designed to work with a single llvm::Module. In many cases
for convenience various CodeGen parts have a reference to the llvm::Module
(TheModule or Module) which does not change when a new module is pushed.
However, the execution engine wants to take ownership of the module which does
not map well to CodeGen's design. To work this around we clone the module and
pass it down.

With some effort it is possible to teach CodeGen to ask the CodeGenModule for
its current module and that would have an overall positive impact on CodeGen
improving the encapsulation of various parts but that's not resilient to future
regression.

This patch takes a more conservative approach and keeps the first llvm::Module
empty intentionally and does not pass it to the Jit. That's also not bullet
proof because we have to guarantee that CodeGen does not write on the
blueprint. However, we have inserted some assertions to catch accidental
additions to that canary module.

This change will fixes a long-standing invalid memory access reported by
valgrind when we enable the TBAA optimization passes. It also unblock progress
on llvm#84758.
"

This patch reverts adc4f62 and removes
the check of `named_metadata_empty` of the first llvm::Module because on darwin
clang inserts some harmless metadata which we can ignore.

(cherry picked from commit a3f07d3)
@tstellar tstellar merged commit c5b3fa4 into llvm:release/18.x May 14, 2024
9 of 10 checks passed
@tstellar
Copy link
Collaborator

@nikic (or anyone else). If you would like to add a note about this fix in the release notes (completely optional). Please reply to this comment with a one or two sentence description of the fix. When you are done, please add the release:note label to this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
Development

Successfully merging this pull request may close these issues.

None yet

5 participants