Skip to content

[clang-repl] Remove invalid TopLevelStmtDecl from TU on parse failure #153945

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

devajithvs
Copy link
Contributor

ActOnStartTopLevelStmtDecl immediately adds the new TopLevelStmtDecl to the current DeclContext.

If parsing fails, the decl remains attached to the TU even though it has no valid Stmt, leaving the AST in an invalid state.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Aug 16, 2025
@llvmbot
Copy link
Member

llvmbot commented Aug 16, 2025

@llvm/pr-subscribers-clang

Author: Devajith (devajithvs)

Changes

ActOnStartTopLevelStmtDecl immediately adds the new TopLevelStmtDecl to the current DeclContext.

If parsing fails, the decl remains attached to the TU even though it has no valid Stmt, leaving the AST in an invalid state.


Full diff: https://github.com/llvm/llvm-project/pull/153945.diff

2 Files Affected:

  • (modified) clang/lib/Parse/ParseDecl.cpp (+4-1)
  • (modified) clang/unittests/Interpreter/InterpreterTest.cpp (+48)
diff --git a/clang/lib/Parse/ParseDecl.cpp b/clang/lib/Parse/ParseDecl.cpp
index fd53cca5a13ff..59c5f4cacc36e 100644
--- a/clang/lib/Parse/ParseDecl.cpp
+++ b/clang/lib/Parse/ParseDecl.cpp
@@ -5696,8 +5696,11 @@ Parser::DeclGroupPtrTy Parser::ParseTopLevelStmtDecl() {
   TopLevelStmtDecl *TLSD = Actions.ActOnStartTopLevelStmtDecl(getCurScope());
   StmtResult R = ParseStatementOrDeclaration(Stmts, SubStmtCtx);
   Actions.ActOnFinishTopLevelStmtDecl(TLSD, R.get());
-  if (!R.isUsable())
+  if (!R.isUsable()) {
+    if (DeclContext *DC = TLSD->getDeclContext())
+      DC->removeDecl(TLSD); // unlink from TU
     R = Actions.ActOnNullStmt(Tok.getLocation());
+  }
 
   if (Tok.is(tok::annot_repl_input_end) &&
       Tok.getAnnotationValue() != nullptr) {
diff --git a/clang/unittests/Interpreter/InterpreterTest.cpp b/clang/unittests/Interpreter/InterpreterTest.cpp
index 8639fb668f1fe..c9eb65f346041 100644
--- a/clang/unittests/Interpreter/InterpreterTest.cpp
+++ b/clang/unittests/Interpreter/InterpreterTest.cpp
@@ -21,6 +21,8 @@
 #include "clang/Interpreter/Value.h"
 #include "clang/Sema/Lookup.h"
 #include "clang/Sema/Sema.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/Path.h"
 
 #include "llvm/TargetParser/Host.h"
 
@@ -91,6 +93,52 @@ TEST_F(InterpreterTest, IncrementalInputTopLevelDecls) {
   EXPECT_EQ("var2", DeclToString(*R2DeclRange.begin()));
 }
 
+TEST_F(InterpreterTest, BadIncludeDoesNotCorruptTU) {
+  // Create a temporary header that triggers an error at top level.
+  llvm::SmallString<256> HeaderPath;
+  ASSERT_FALSE(llvm::sys::fs::createTemporaryFile("interp-bad-include", "h",
+                                                  HeaderPath));
+  {
+    std::error_code EC;
+    llvm::raw_fd_ostream OS(HeaderPath, EC, llvm::sys::fs::OF_Text);
+    ASSERT_FALSE(EC);
+    OS << R"(error_here; // expected-error {{use of undeclared identifier}})";
+  }
+
+  llvm::SmallString<256> HeaderDir = llvm::sys::path::parent_path(HeaderPath);
+  llvm::SmallString<256> HeaderFile = llvm::sys::path::filename(HeaderPath);
+
+  // Set up the interpreter with the include path.
+  using Args = std::vector<const char *>;
+  Args ExtraArgs = {"-I",      HeaderDir.c_str(),
+                    "-Xclang", "-diagnostic-log-file",
+                    "-Xclang", "-"};
+
+  // Create the diagnostic engine with unowned consumer.
+  std::string DiagnosticOutput;
+  llvm::raw_string_ostream DiagnosticsOS(DiagnosticOutput);
+  auto DiagPrinter = std::make_unique<TextDiagnosticPrinter>(
+      DiagnosticsOS, new DiagnosticOptions());
+
+  auto Interp = createInterpreter(ExtraArgs, DiagPrinter.get());
+  // This parse should fail because of an undeclared identifier, but should
+  // leave TU intact.
+  auto Err =
+      Interp->Parse("#include \"" + HeaderFile.str().str() + "\"").takeError();
+  using ::testing::HasSubstr;
+  EXPECT_THAT(DiagnosticOutput,
+              HasSubstr("use of undeclared identifier 'error_here'"));
+  EXPECT_EQ("Parsing failed.", llvm::toString(std::move(Err)));
+
+  // Inspect the TU, there must not be any TopLevelStmtDecl left.
+  TranslationUnitDecl *TU =
+      Interp->getCompilerInstance()->getASTContext().getTranslationUnitDecl();
+  EXPECT_EQ(0U, DeclsSize(TU));
+
+  // Cleanup temp file.
+  llvm::sys::fs::remove(HeaderPath);
+}
+
 TEST_F(InterpreterTest, Errors) {
   Args ExtraArgs = {"-Xclang", "-diagnostic-log-file", "-Xclang", "-"};
 

@devajithvs
Copy link
Contributor Author

This was caught in one of the cling-tests in root-project/root#19242, when comparing the state before and after a failed include.

CC: @vgvassilev

`ActOnStartTopLevelStmtDecl` immediately adds the new `TopLevelStmtDecl` to
the current `DeclContext`.

If parsing fails, the decl remains attached to the TU even though it has no
valid `Stmt`, leaving the AST in an invalid state.
if (!R.isUsable())
if (!R.isUsable()) {
if (DeclContext *DC = TLSD->getDeclContext())
DC->removeDecl(TLSD); // unlink from TU
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this here

void IncrementalParser::CleanUpPTU(TranslationUnitDecl *MostRecentTU) {
?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we could do something like:

diff --git a/clang/lib/Interpreter/IncrementalParser.cpp b/clang/lib/Interpreter/IncrementalParser.cpp
index 6343f17ed822..893aea55e9ab 100644
--- a/clang/lib/Interpreter/IncrementalParser.cpp
+++ b/clang/lib/Interpreter/IncrementalParser.cpp
@@ -175,6 +175,9 @@ void IncrementalParser::CleanUpPTU(TranslationUnitDecl *MostRecentTU) {
 
   // FIXME: We should de-allocate MostRecentTU
   for (Decl *D : MostRecentTU->decls()) {
+    // Remove any TopLevelStmtDecl created for statements that failed to parse.
+    if (auto *TLSD = dyn_cast<TopLevelStmtDecl>(D))
+      if (!TLSD->getStmt())
+        MostRecentTU->removeDecl(TLSD);
     auto *ND = dyn_cast<NamedDecl>(D);
     if (!ND || ND->getDeclName().isEmpty())
       continue;

Does that work?

I’m wondering, though, if we should fix this at the source: when ParseTopLevelStmtDecl sees !R.isUsable(), we immediately unlink the TopLevelStmtDecl we just attached. That ensures we never see a TU with an invalid TopLevelStmtDecl between parse and cleanup.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clang does not have the notion of removing the declarations on error. If we do it in the Parser that would be inconsistent to the rest of the logic in clang. I'd prefer to have all of this happen in a single place eg. CleanUpPTU.

@@ -91,6 +93,53 @@ TEST_F(InterpreterTest, IncrementalInputTopLevelDecls) {
EXPECT_EQ("var2", DeclToString(*R2DeclRange.begin()));
}

TEST_F(InterpreterTest, BadIncludeDoesNotCorruptTU) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we extend either clang/test/Interpreter/execute.c or clang/test/Interpreter/fail.cpp tests instead? I think it would be simpler to write and maintain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The state of the AST before and after the failed statement will not be identical (and asserts on AST dump). I could not find a way to emit/dump AST on a failed statement.

int i = 10;
#include "interp-bad-include.h"
int j = 20;

cat "test.C" | bin/clang-repl -Xcc -Xclang -Xcc -ast-dump

I get:

...
...
TranslationUnitDecl 0x5af994c521d0 prev 0x5af994c26c18 <<invalid sloc>> <invalid sloc>
`-VarDecl 0x5af994c52250 <input_line_1:1:1, col:9> col:5 i 'int' cinit
  `-IntegerLiteral 0x5af994c522b8 <col:9> 'int' 10
In file included from <<< inputs >>>:1:
In file included from input_line_2:1:
./interp-bad-include.h:1:1: error: use of undeclared identifier 'error_here'
    1 | error_here; // expected-error {{use of undeclared identifier}}
      | ^~~~~~~~~~
error: Parsing failed.
TranslationUnitDecl 0x5af994c52438 prev 0x5af994c52328 <<invalid sloc>> <invalid sloc>
`-VarDecl 0x5af994c524b8 <input_line_3:1:1, col:9> col:5 j 'int' cinit
  `-IntegerLiteral 0x5af994c52520 <col:9> 'int' 20

There are no dumps after the failed statement. Do you know of another way to dump the AST (to trigger the assertion)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you try to .undo a TopLevelStmtDecl instead and check if it triggers the same issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants