Skip to content

Conversation

@anutosh491
Copy link
Member

We can see the following while running clang-repl in C mode

anutosh491@vv-nuc:/build/anutosh491/llvm-project/build/bin$ ./clang-repl --Xcc=-x --Xcc=c --Xcc=-std=c23
clang-repl> printf("hi\n");
In file included from <<< inputs >>>:1:
input_line_1:1:1: error: call to undeclared library function 'printf' with type 'int (const char *, ...)'; ISO C99 and
      later do not support implicit function declarations [-Wimplicit-function-declaration]
    1 | printf("hi\n");
      | ^
input_line_1:1:1: note: include the header <stdio.h> or explicitly provide a declaration for 'printf'
error: Parsing failed.
clang-repl> #include <stdio.h>
hi

In debug mode while dumping the generated Module, i see this

clang-repl> printf("hi\n");
In file included from <<< inputs >>>:1:
input_line_1:1:1: error: call to undeclared library function 'printf' with type 'int (const char *, ...)'; ISO C99 and
      later do not support implicit function declarations [-Wimplicit-function-declaration]
    1 | printf("hi\n");
      | ^
input_line_1:1:1: note: include the header <stdio.h> or explicitly provide a declaration for 'printf'
error: Parsing failed.
clang-repl> #include <stdio.h>

=== compile-ptu 1 ===
[TU=0x55556cfbf830, M=0x55556cfc13a0 (incr_module_1)]
[LLVM IR]
; ModuleID = 'incr_module_1'
source_filename = "incr_module_1"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@.str = private unnamed_addr constant [4 x i8] c"hi\0A\00", align 1
@llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 65535, ptr @_GLOBAL__sub_I_incr_module_1, ptr null }]

define internal void @__stmts__0() #0 {
entry:
  %call = call i32 (ptr, ...) @printf(ptr noundef @.str)
  ret void
}

declare i32 @printf(ptr noundef, ...) #1

; Function Attrs: noinline nounwind uwtable
define internal void @_GLOBAL__sub_I_incr_module_1() #2 section ".text.startup" {
entry:
  call void @__stmts__0()
  ret void
}

attributes #0 = { "min-legal-vector-width"="0" }
attributes #1 = { "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #2 = { noinline nounwind uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

!llvm.module.flags = !{!0, !1, !2, !3, !4}
!llvm.ident = !{!5}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 8, !"PIC Level", i32 2}
!2 = !{i32 7, !"PIE Level", i32 2}
!3 = !{i32 7, !"uwtable", i32 2}
!4 = !{i32 7, !"frame-pointer", i32 2}
!5 = !{!"clang version 22.0.0git (https://github.com/anutosh491/llvm-project.git 81ad8fbc2bb09bae61ed59316468011e4a42cf47)"}
=== end compile-ptu ===

execute-ptu 1: [TU=0x55556cfbf830, M=0x55556cfc13a0 (incr_module_1)]
hi

Basically I see that CodeGen emits IR for a cell before we know whether DiagnosticsEngine has an error. For C code like printf("hi\n"); without <stdio.h>, Sema emits a diagnostic but still produces a "codegen-able" TopLevelStmt, so the printf call is IR-generated into the current module.

Previously, when Diags.hasErrorOccurred() was true, we only cleaned up the PTU AST and left the CodeGen module untouched. The next successful cell then called GenModule(), which returned that same module (now also containing the next cell’s IR), causing side effects from the failed cell (e.g. printf)

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Nov 29, 2025
@llvmbot
Copy link
Member

llvmbot commented Nov 29, 2025

@llvm/pr-subscribers-clang

Author: Anutosh Bhat (anutosh491)

Changes

We can see the following while running clang-repl in C mode

anutosh491@<!-- -->vv-nuc:/build/anutosh491/llvm-project/build/bin$ ./clang-repl --Xcc=-x --Xcc=c --Xcc=-std=c23
clang-repl&gt; printf("hi\n");
In file included from &lt;&lt;&lt; inputs &gt;&gt;&gt;:1:
input_line_1:1:1: error: call to undeclared library function 'printf' with type 'int (const char *, ...)'; ISO C99 and
      later do not support implicit function declarations [-Wimplicit-function-declaration]
    1 | printf("hi\n");
      | ^
input_line_1:1:1: note: include the header &lt;stdio.h&gt; or explicitly provide a declaration for 'printf'
error: Parsing failed.
clang-repl&gt; #include &lt;stdio.h&gt;
hi

In debug mode while dumping the generated Module, i see this

clang-repl&gt; printf("hi\n");
In file included from &lt;&lt;&lt; inputs &gt;&gt;&gt;:1:
input_line_1:1:1: error: call to undeclared library function 'printf' with type 'int (const char *, ...)'; ISO C99 and
      later do not support implicit function declarations [-Wimplicit-function-declaration]
    1 | printf("hi\n");
      | ^
input_line_1:1:1: note: include the header &lt;stdio.h&gt; or explicitly provide a declaration for 'printf'
error: Parsing failed.
clang-repl&gt; #include &lt;stdio.h&gt;

=== compile-ptu 1 ===
[TU=0x55556cfbf830, M=0x55556cfc13a0 (incr_module_1)]
[LLVM IR]
; ModuleID = 'incr_module_1'
source_filename = "incr_module_1"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@.str = private unnamed_addr constant [4 x i8] c"hi\0A\00", align 1
@<!-- -->llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 65535, ptr @<!-- -->_GLOBAL__sub_I_incr_module_1, ptr null }]

define internal void @<!-- -->__stmts__0() #<!-- -->0 {
entry:
  %call = call i32 (ptr, ...) @<!-- -->printf(ptr noundef @.str)
  ret void
}

declare i32 @<!-- -->printf(ptr noundef, ...) #<!-- -->1

; Function Attrs: noinline nounwind uwtable
define internal void @<!-- -->_GLOBAL__sub_I_incr_module_1() #<!-- -->2 section ".text.startup" {
entry:
  call void @<!-- -->__stmts__0()
  ret void
}

attributes #<!-- -->0 = { "min-legal-vector-width"="0" }
attributes #<!-- -->1 = { "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #<!-- -->2 = { noinline nounwind uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

!llvm.module.flags = !{!0, !1, !2, !3, !4}
!llvm.ident = !{!5}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 8, !"PIC Level", i32 2}
!2 = !{i32 7, !"PIE Level", i32 2}
!3 = !{i32 7, !"uwtable", i32 2}
!4 = !{i32 7, !"frame-pointer", i32 2}
!5 = !{!"clang version 22.0.0git (https://github.com/anutosh491/llvm-project.git 81ad8fbc2bb09bae61ed59316468011e4a42cf47)"}
=== end compile-ptu ===

execute-ptu 1: [TU=0x55556cfbf830, M=0x55556cfc13a0 (incr_module_1)]
hi

Basically I see that CodeGen emits IR for a cell before we know whether DiagnosticsEngine has an error. For C code like printf("hi\n"); without <stdio.h>, Sema emits a diagnostic but still produces a "codegen-able" TopLevelStmt, so the printf call is IR-generated into the current module.

Previously, when Diags.hasErrorOccurred() was true, we only cleaned up the PTU AST and left the CodeGen module untouched. The next successful cell then called GenModule(), which returned that same module (now also containing the next cell’s IR), causing side effects from the failed cell (e.g. printf)


Full diff: https://github.com/llvm/llvm-project/pull/169989.diff

3 Files Affected:

  • (modified) clang/lib/Interpreter/IncrementalAction.cpp (+11)
  • (modified) clang/lib/Interpreter/IncrementalAction.h (+2)
  • (modified) clang/lib/Interpreter/IncrementalParser.cpp (+1)
diff --git a/clang/lib/Interpreter/IncrementalAction.cpp b/clang/lib/Interpreter/IncrementalAction.cpp
index 3d489fce54bc6..e2b9d13017ada 100644
--- a/clang/lib/Interpreter/IncrementalAction.cpp
+++ b/clang/lib/Interpreter/IncrementalAction.cpp
@@ -120,6 +120,17 @@ std::unique_ptr<llvm::Module> IncrementalAction::GenModule() {
   return nullptr;
 }
 
+void IncrementalAction::discardCurrentCodeGenModule() {
+  if (CodeGenerator *CG = getCodeGen()) {
+    if (auto *CurM = CG->GetModule()) {
+      llvm::LLVMContext &Ctx = CurM->getContext();
+      std::string Name = CurM->getName().str();
+      std::unique_ptr<llvm::Module> Dead(CG->ReleaseModule());
+      CG->StartModule(Name, Ctx);
+    }
+  }
+}
+
 CodeGenerator *IncrementalAction::getCodeGen() const {
   FrontendAction *WrappedAct = getWrapped();
   if (!WrappedAct || !WrappedAct->hasIRSupport())
diff --git a/clang/lib/Interpreter/IncrementalAction.h b/clang/lib/Interpreter/IncrementalAction.h
index 725cdd0c27cf4..485cfaa45f793 100644
--- a/clang/lib/Interpreter/IncrementalAction.h
+++ b/clang/lib/Interpreter/IncrementalAction.h
@@ -74,6 +74,8 @@ class IncrementalAction : public WrapperFrontendAction {
 
   /// Generate an LLVM module for the most recent parsed input.
   std::unique_ptr<llvm::Module> GenModule();
+
+  void discardCurrentCodeGenModule();
 };
 
 class InProcessPrintingASTConsumer final : public MultiplexConsumer {
diff --git a/clang/lib/Interpreter/IncrementalParser.cpp b/clang/lib/Interpreter/IncrementalParser.cpp
index bf08911e23533..53379603c26da 100644
--- a/clang/lib/Interpreter/IncrementalParser.cpp
+++ b/clang/lib/Interpreter/IncrementalParser.cpp
@@ -82,6 +82,7 @@ IncrementalParser::ParseOrWrapTopLevelDecl() {
 
   DiagnosticsEngine &Diags = S.getDiagnostics();
   if (Diags.hasErrorOccurred()) {
+    Act->discardCurrentCodeGenModule();
     CleanUpPTU(C.getTranslationUnitDecl());
 
     Diags.Reset(/*soft=*/true);

@anutosh491
Copy link
Member Author

So this fix discards the current CodeGen module whenever Diags.hasErrorOccurred(), so IR from a failing input is dropped and cannot leak into the next cell. After the patch in debug mode we can see the following !

anutosh491@vv-nuc:/build/anutosh491/llvm-project/build/bin$ ./clang-repl --debug-only=clang-repl
compile-ptu 0: [TU=0x55556cf78168, M=0x55556cf72c60 (incr_module_0)]
execute-ptu 0: [TU=0x55556cf78168, M=0x55556cf72c60 (incr_module_0)]
clang-repl> printf("hi\n");
In file included from <<< inputs >>>:1:
input_line_1:1:1: error: use of undeclared identifier 'printf'
    1 | printf("hi\n");
      | ^~~~~~
error: Parsing failed.
clang-repl> #include <stdio.h>
compile-ptu 1: [TU=0x55556cfc7c08, M=0x55556cfbeee0 (incr_module_1)]
execute-ptu 1: [TU=0x55556cfc7c08, M=0x55556cfbeee0 (incr_module_1)]
clang-repl> printf("hi\n");
compile-ptu 2: [TU=0x55556d02cd68, M=0x55556cfca4e0 (incr_module_2)]
execute-ptu 2: [TU=0x55556d02cd68, M=0x55556cfca4e0 (incr_module_2)]
hi
clang-repl> 

anutosh491@vv-nuc:/build/anutosh491/llvm-project/build/bin$ ./clang-repl --Xcc=-x --Xcc=c --Xcc=-std=c23 --debug-only=clang-repl
compile-ptu 0: [TU=0x55556cfb2b18, M=0x55556cf95a90 (incr_module_0)]
execute-ptu 0: [TU=0x55556cfb2b18, M=0x55556cf95a90 (incr_module_0)]
clang-repl> printf("hi\n");
In file included from <<< inputs >>>:1:
input_line_1:1:1: error: call to undeclared library function 'printf' with type
      'int (const char *, ...)'; ISO C99 and later do not support implicit function
      declarations [-Wimplicit-function-declaration]
    1 | printf("hi\n");
      | ^
input_line_1:1:1: note: include the header <stdio.h> or explicitly provide a
      declaration for 'printf'
error: Parsing failed.
clang-repl> #include <stdio.h> 
compile-ptu 1: [TU=0x55556cfbfb20, M=0x55556cf979a0 (incr_module_1)]
execute-ptu 1: [TU=0x55556cfbfb20, M=0x55556cf979a0 (incr_module_1)]
clang-repl> printf("hi\n");
compile-ptu 2: [TU=0x55556d00b5f8, M=0x55556cfc1790 (incr_module_2)]
execute-ptu 2: [TU=0x55556d00b5f8, M=0x55556cfc1790 (incr_module_2)]
hi

/// Generate an LLVM module for the most recent parsed input.
std::unique_ptr<llvm::Module> GenModule();

void discardCurrentCodeGenModule();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can add a comment explaining the purpose behind the function (just like we do for above functions) if needed !

std::string Name = CurM->getName().str();
std::unique_ptr<llvm::Module> Dead(CG->ReleaseModule());
CG->StartModule(Name, Ctx);
}
Copy link
Member Author

@anutosh491 anutosh491 Nov 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used the variable name Dead for now ! Can you use Discarded or something along those lines I think

@anutosh491
Copy link
Member Author

Looks like the goto approach for fixing this to me. Can add tests if we agree with the same !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category clang-repl

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants