@tetzank tetzank commented Oct 17, 2025

When eliminating a block, codegenprepare updates all blockaddress expressions that reference the block. If a blockaddress is located in a different function, this leads to updates across function boundaries, which is problematic for a function pass like codegenprepare.

If the blockaddress is in a function defined before the current one, the update to the blockaddress is lost.

This change adds a check that avoids eliminating any block that has its address taken.
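
The lost update can be illustrated with a toy model. This is only a sketch: `ToyFunction` and `undefinedAfterPipeline` are hypothetical names, not LLVM classes, and the model assumes a pipeline that finishes emitting each function before it runs passes on the next one (an assumption of this sketch, not something stated in the PR).

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical toy IR; none of these types are LLVM classes.
struct ToyFunction {
  std::string name;
  std::vector<std::string> blocks;    // labels defined in this function
  std::vector<std::string> addrRefs;  // labels whose address this function takes
};

// Runs "eliminate block `dead` in favor of `repl`, then emit" one function at
// a time, in module order. Returns the labels that were referenced in the
// emitted output but never defined.
std::vector<std::string> undefinedAfterPipeline(std::vector<ToyFunction> module,
                                                const std::string &dead,
                                                const std::string &repl) {
  std::vector<std::string> defined, referenced;
  for (ToyFunction &f : module) {
    // "Pass" step: if this function owns the dead block, drop it and rewrite
    // every reference in the whole module, including other functions.
    auto it = std::find(f.blocks.begin(), f.blocks.end(), dead);
    if (it != f.blocks.end()) {
      f.blocks.erase(it);
      for (ToyFunction &g : module)
        std::replace(g.addrRefs.begin(), g.addrRefs.end(), dead, repl);
    }
    // "Emit" step: the output for this function is final from here on.
    defined.insert(defined.end(), f.blocks.begin(), f.blocks.end());
    referenced.insert(referenced.end(), f.addrRefs.begin(), f.addrRefs.end());
  }
  // Collect references that have no matching definition in the output.
  std::vector<std::string> undefined;
  for (const std::string &r : referenced)
    if (std::find(defined.begin(), defined.end(), r) == defined.end())
      undefined.push_back(r);
  return undefined;
}
```

In this model, a function that takes the address of `dead` and comes before the owner of `dead` is emitted with the stale label, while the reverse order sees the rewritten reference.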

Fixes: #161164

Change-Id: Ieebc50352234c332365d24ad0083cd51ae903e35
@github-actions
⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff origin/main HEAD --extensions cpp -- llvm/lib/CodeGen/CodeGenPrepare.cpp --diff_from_common_commit

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from clang-format here.
diff --git a/llvm/lib/CodeGen/CodeGenPrepare.cpp b/llvm/lib/CodeGen/CodeGenPrepare.cpp
index 551dde26e..776dbda97 100644
--- a/llvm/lib/CodeGen/CodeGenPrepare.cpp
+++ b/llvm/lib/CodeGen/CodeGenPrepare.cpp
@@ -958,7 +958,8 @@ bool CodeGenPrepare::isMergingEmptyBlockProfitable(BasicBlock *BB,
   // This could lead to updates across functions which is problematic in a
   // function pass like codegenprepare. The update to a blockaddress in a
   // function defined before the function with the eliminated block is lost.
-  if(BB->hasAddressTaken()) return false;
+  if (BB->hasAddressTaken())
+    return false;
 
   // Do not delete loop preheaders if doing so would create a critical edge.
   // Loop preheaders can be good locations to spill registers. If the

Collaborator

@efriedma-quic efriedma-quic left a comment

If a basic block is eliminated before codegen (or before the last IR module pass in the codegen pipeline), there isn't a problem because nothing holds onto the reference.

We have infrastructure for dealing with basic blocks that get eliminated after isel (see AsmPrinter::emitFunctionBody).

And if a block gets eliminated between the start of codegen and isel, we have some code that's supposed to deal with it. See AddrLabelMap::takeDeletedSymbolsForFunction in AsmPrinter.cpp. (see also db035a0). This is supposed to handle testcases like #161164. If that code isn't functioning correctly, we should fix it.

Trying to prevent any codegen IR pass from making code unreachable is way too much work, and impossible to maintain long-term.

@tetzank
Author

tetzank commented Oct 20, 2025

> And if a block gets eliminated between the start of codegen and isel, we have some code that's supposed to deal with it. See AddrLabelMap::takeDeletedSymbolsForFunction in AsmPrinter.cpp. (see also db035a0). This is supposed to handle testcases like #161164. If that code isn't functioning correctly, we should fix it.

Looks like the code in AddrLabelMap in the end just adds a label definition at the beginning of a function, if it was collected previously as a deleted symbol.

// If the function had address-taken blocks that got deleted, then we have
// references to the dangling symbols. Emit them at the start of the function
// so that we don't get references to undefined symbols.
std::vector<MCSymbol*> DeadBlockSyms;
takeDeletedSymbolsForFunction(&F, DeadBlockSyms);
for (MCSymbol *DeadBlockSym : DeadBlockSyms) {
  OutStreamer->AddComment("Address taken block that was later removed");
  OutStreamer->emitLabel(DeadBlockSym);
}

Deleted symbols are collected in AddrLabelMap::UpdateForDeletedBlock.
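
The collect-then-emit behaviour described above can be sketched in isolation. `DeletedLabelTracker` and `emitFunctionHeader` are hypothetical stand-ins written for illustration; they are not LLVM's `AddrLabelMap` or `AsmPrinter`, and the text output format is an assumption.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the bookkeeping; not LLVM's AddrLabelMap.
class DeletedLabelTracker {
  // function name -> symbols of address-taken blocks that were deleted
  std::map<std::string, std::vector<std::string>> deleted_;

public:
  // Record that a block with symbol `sym` in function `fn` was deleted.
  void noteDeletedBlock(const std::string &fn, const std::string &sym) {
    deleted_[fn].push_back(sym);
  }

  // Hand over (and forget) the dead symbols recorded for `fn`.
  std::vector<std::string> takeDeletedSymbols(const std::string &fn) {
    auto it = deleted_.find(fn);
    if (it == deleted_.end())
      return {};
    std::vector<std::string> out = std::move(it->second);
    deleted_.erase(it);
    return out;
  }
};

// Emit the function label, then one label per dead symbol, so that later
// references to those symbols resolve to the start of the function.
std::string emitFunctionHeader(DeletedLabelTracker &tracker,
                               const std::string &fn) {
  std::string out = fn + ":\n";
  for (const std::string &sym : tracker.takeDeletedSymbols(fn))
    out += sym + ":  # Address taken block that was later removed\n";
  return out;
}
```

The sketch only keeps the symbol defined; it makes no attempt to point the symbol at a meaningful location.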

This sounds like a band aid to not get the undefined symbol error, if we spot the missing location to call UpdateForDeletedBlock. But I have the feeling the code is broken afterwards. We are just placing the symbol somewhere.

It does not fix the lost update (and neither does my fix of avoiding block elimination). codegenprepare updates the blockaddress referring to the eliminated block, but the update does not take effect if the blockaddress is in a function defined before the current one.

@efriedma-quic
Collaborator

> This sounds like a band aid to not get the undefined symbol error, if we spot the missing location to call UpdateForDeletedBlock. But I have the feeling the code is broken afterwards. We are just placing the symbol somewhere.

blockaddress constants are specifically designed to be used as inputs to indirectbr. Any other use of a blockaddress constant is outside of the IR model. In particular, if a block has no indirectbr predecessors, we provide no guarantees at all about the value of a blockaddress. So... in cases like #161164, yes, it just needs to point somewhere.

@tetzank
Author

tetzank commented Oct 21, 2025

> blockaddress constants are specifically designed to be used as inputs to indirectbr. Any other use of a blockaddress constant is outside of the IR model. In particular, if a block has no indirectbr predecessors, we provide no guarantees at all about the value of a blockaddress. So... in cases like #161164, yes, it just needs to point somewhere.

Ok, makes sense. All indirectbr instructions got eliminated in #161164, so the value of the blockaddress is unimportant. One can see whether a block is targeted by an indirectbr by looking at the terminator of each predecessor.
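
As a rough sketch of that predecessor check, using hypothetical stand-in types (`Block`, `TermKind`) rather than LLVM headers so that it compiles on its own. In LLVM itself the equivalent would presumably iterate `predecessors(BB)` and test `isa<IndirectBrInst>(Pred->getTerminator())`, but that variant is not shown or tested here.

```cpp
#include <vector>

// Hypothetical stand-in types; these are NOT the LLVM classes.
enum class TermKind { Br, IndirectBr, Ret };

struct Block {
  TermKind terminator = TermKind::Ret;  // kind of the block's terminator
  std::vector<const Block *> preds;     // predecessor blocks
};

// Returns true if any predecessor of `bb` ends in an indirectbr-like
// terminator, i.e. the block can still be reached through its address.
bool hasIndirectBrPredecessor(const Block &bb) {
  for (const Block *pred : bb.preds)
    if (pred->terminator == TermKind::IndirectBr)
      return true;
  return false;
}
```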

Thank you for explaining.

Successfully merging this pull request may close these issues.

[x86] Undefined temporary symbol .Ltmp0 created by blockaddress
