Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[lld] "error: Never resolved function from blockaddress" when linking bitcode files containing cross-function blockaddress references #52787

Closed
rickyz opened this issue Dec 18, 2021 · 20 comments

Comments

@rickyz
Copy link
Member

rickyz commented Dec 18, 2021

This bug was originally discovered at ClangBuiltLinux/linux#1215 by attempting to build Linux with Clang+LTO. The following is a copy of the analysis from ClangBuiltLinux/linux#1215 (comment):

Here's a hand-minimized repro:

void f(long);

__attribute__((noinline)) static void fun(long x) {
  f(x + 1);
}

void repro(void) {
  fun(({
    label:
      (long)&&label;
  }));
}

$ clang -O2 -flto -c repro.c -o repro.i
$ ld.lld repro.i
ld.lld: error: Never resolved function from blockaddress (...)

Relevant part of the LLVM IR, generated with:
$ clang -O2 -flto -c repro.c -fno-discard-value-names -S

...
define dso_local void @repro() #0 {
entry:
  br label %label

label:                                            ; preds = %entry
  tail call fastcc void @fun()
  ret void
}

define internal fastcc void @fun() unnamed_addr #1 {
entry:
  tail call void @f(i64 add (i64 ptrtoint (i8* blockaddress(@repro, %label) to i64), i64 1)) #3
  ret void
}
...

We can see clang figured out that the first argument to fun() is always the same, so it inlined the value of the argument (which is the address of label in repro).

Searching LLVM code for the source of the error message gives

return error("Never resolved function from blockaddress");

(there is another error with the same string, but building lld with a change to the error message shows that this is the one that triggers). Reading through this code, we see that BitcodeReader lazily reads an LLVM bitcode file. The reader appears to initially return placeholder objects that need to be "materialized" on demand.

The code has some special handling of blockaddress IR constants, as a blockaddress that appears within a function F may reference a basic block from a different function G (as is the case in the IR for the bug repro).

When materializing a function with a blockaddress(Fn, Fn_BB), the code needs to separately handle the cases where Fn either has or has not yet been materialized. That logic appears here:

// If the function is already parsed we can insert the block address right
// away.
BasicBlock *BB;
unsigned BBID = Record[2];
if (!BBID)
// Invalid reference to entry block.
return error("Invalid ID");
if (!Fn->empty()) {
Function::iterator BBI = Fn->begin(), BBE = Fn->end();
for (size_t I = 0, E = BBID; I != E; ++I) {
if (BBI == BBE)
return error("Invalid ID");
++BBI;
}
BB = &*BBI;
} else {
// Otherwise insert a placeholder and remember it so it can be inserted
// when the function is parsed.
auto &FwdBBs = BasicBlockFwdRefs[Fn];
if (FwdBBs.empty())
BasicBlockFwdRefQueue.push_back(Fn);
if (FwdBBs.size() < BBID + 1)
FwdBBs.resize(BBID + 1);
if (!FwdBBs[BBID])
FwdBBs[BBID] = BasicBlock::Create(Context);
BB = FwdBBs[BBID];
}

If Fn has been materialized (the Fn->empty() check), then the reader creates a BlockAddress referring to the appropriate Function/BasicBlock object. If Fn has not been materialized, then the reader creates a placeholder BasicBlock for the BlockAddress and adds Fn to a queue of functions to be materialized. When Fn is materialized, it will make sure to use the precreated BasicBlock object.

After adding some debug prints, it appears that lld's use of BitcodeReader breaks this lazy blockaddress handling. In the repro case, the linker does materialize repro before it attempts to materialize fun. However, before it attempts to materialize fun, it steals the basic blocks out of repro's Function object here:

Dst.stealArgumentListFrom(Src);
Dst.getBasicBlockList().splice(Dst.end(), Src.getBasicBlockList());

This causes the Fn->empty() check mentioned above to pass, which makes the code think that Fn (repro) has not yet been materialized. That code path enqueues repro to be materialized a second time. When that happens, this code:

if (F.isMaterializable())
continue;

detects (using IsMaterializable() instead of empty()) that repro has already been materialized, and errors out.

This looks like a fairly old bug in lld. Fixing it looks tricky - it seems that BitcodeReader requires that materialized Functions aren't modified while there are still functions that will be materialized later, and lld violates that assumption. Maintaining this invariant seems like it might conflict with the goal of lazy reading. Maybe one idea is to maintain some reverse dependency information in bitcode files. Then materializing a function F could trigger immediate materialization of all functions with a blockaddress pointing into F.

@nickdesaulniers
Copy link
Member

smaller reproducer:

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"                                                                                    
target triple = "x86_64-unknown-linux-gnu"                                                                                                                                      
                                                                                                                                                                                
define void @repro() {                                                                                                                                                          
  br label %label                                                                                                                                                               
                                                                                                                                                                                
label:                                                                                                                                                                          
  call void @fun()                                                                                                                                                              
  ret void                                                                                                                                                                      
}

define void @fun() noinline {
  call void @f(i8* blockaddress(@repro, %label))
  ret void
}

declare void @f(i8*)
$ llvm-as d.ll
$ ld.lld d.bc
ld.lld: error: Never resolved function from blockaddress (Producer: 'LLVM14.0.0git' Reader: 'LLVM 14.0.0git')

@nickdesaulniers
Copy link
Member

cc @dexonsmith (author of 908d809)

@dexonsmith
Copy link
Collaborator

cc @dexonsmith (author of 908d809)

Gosh, that was a while ago. In case it helps, I can tell you I was fixing bugs I found while implementing use-list-order serialization (see e.g. the successor 5a511b5).

Maybe one idea is to maintain some reverse dependency information in bitcode files. Then materializing a function F could trigger immediate materialization of all functions with a blockaddress pointing into F.

This could work, but I think it should be possible to fix BitcodeReader/IRMover to do the right thing without adding backedges.

I think you need something like this:

if (!Fn.empty()) {
  // fully materialized
} else if (Fn.isMaterializable()) {
  // existing logic to add placeholders
} else {
  // new logic for when basic blocks have been moved away with IRMover
}

I haven't looked carefully at the stack — why is IRMover being called? will the blocks be moved back? — but I think it's possible we wouldn't need to serialize anything new.

If we did need something new serialized, maybe it could be limited to a bit per basic block saying "is this basic block referenced by a blockaddress outside this function?" (instead of a backedge, just a bit saying there is an edge), and then add ValueHandles for those basic blocks when materializing them to track what they get RAUW'd to.

But I'd much prefer to avoid having the function content in bitcode depend on what references it. Seems like the IRMover and BitcodeReader should be able to track this somehow. E.g., maybe the caller of IRMover should tell BitcodeReader that the basic blocks were moved to another function.

@nickdesaulniers
Copy link
Member

nickdesaulniers commented Feb 16, 2022

Funnily enough, given my reduced test case above, reordering the definition of @fun to occur before @repro works.

(for my reduced case above) It seems like @repro is materialized first, marked non-materializable. Then when we visit @fun, we seem to be no longer capable of materializing the reference in the blockaddress to @repro.

There's a comment in BitcodeReader::materializeForwardReferencedFunctions about:

 852     if (!BasicBlockFwdRefs.count(F))                                                                                                                                       
 853       // Already materialized.                                                                                                                                             
 854       continue;

so it seems like F technically was already materialized, but it's no longer referred to by BasicBlockFwdRefs.

@nickdesaulniers
Copy link
Member

I think what's going on is that LTO uses IRMover is moving Functions from one Module to another (to merge them).

If the Function of the BlockAddress has already been moved, it appears "empty" so a BasicBlockFwdRef gets inserted. Later, when iterating the BasicBlocks of BasicBlockFwdRef we find the referred to function is no longer materializable and we produce the observed error.

So right now, in BitcodeReader::parseConstants we can't disambiguate whether a Function is empty because we haven't parsed it yet and therefor should insert a reference to it in BasicBlockFwdRef vs it being empty because LTO moved it to a new Module.

Mapper::mapBlockAddress looks like it probably should be called somewhere. (IRMover has a ValueMapper)

This is tricky; you could have 2 different Functions with BlockAddresses that refer to each other. How is importing them one by one supposed to work?

@rickyz
Copy link
Member Author

rickyz commented Feb 18, 2022

I haven't had a chance to investigate deeply after @dexonsmith's comments, but I think my biggest question is: is it guaranteed that no other modifications (e.g. LTO passes) will happen to a moved function until we have materialized all functions that reference it?

If that is guaranteed, then we can probably keep track of some information about moved functions to resolve blockrefs pointing into moved functions. If that is not guaranteed though, then I think we would need to figure out a way to eagerly materialize all of the dependents of a function, which sounds a little trickier to do.

@nickdesaulniers
Copy link
Member

If that is guaranteed, then we can probably keep track of some information about moved functions to resolve blockrefs pointing into moved functions.

IRMover has a ValueMapper that keeps track. The problem is that BitcodeReader doesn't have access to this mapping; it's just being told to materialize Functions one by one from bitcode.

If that is not guaranteed though, then I think we would need to figure out a way to eagerly materialize all of the dependents of a function, which sounds a little trickier to do.

Right, for a Function, we can generally call Function::hasAddressTaken() to find such dependencies. The issue is that that method will assert if we haven't yet fully materialized the whole Module, which we haven't done yet during LTO.

@nickdesaulniers
Copy link
Member

IRLinker::linkFunctionBody uses splice to rip the Function body from the source to the dest Module; I suspect if it copied without stealing, then the ValueMapper stuff would just work. Unfortunately, it seems that BasicBlockListType (return type of Function::getBasicBlockList) is intentionally non-copy-able.

So what's happening is that we have something like:

for function f in function_list:
  materialize f
  splice f from source module to dest module

That splicing is making f fail to materialize for the second function in the function_list since materializing (i.e. lazy parsing LLVM bitcode into memory) is recursive for blockaddress references. Instead, for this to be safe, we'd need modify the above logic to something like:

for function f in function_list:
  materialize f
for function f in function_list:
  splice f from source module to dest module

@nickdesaulniers
Copy link
Member

for a Function, we can generally call Function::hasAddressTaken() to find such dependencies

I tried that; doesn't work; we can't determine whether a function has its address taken until a module has been fully materialized. To support lazy parsing, we don't want to fully parse the body of every function to try to find blockaddress constants.

@nickdesaulniers
Copy link
Member

I think I have a fix for this; just running the test suite and writing more tests.

@nickdesaulniers nickdesaulniers self-assigned this Mar 1, 2022
@nickdesaulniers
Copy link
Member

@nickdesaulniers
Copy link
Member

/cherry-pick 23ec578

@tstellar this is an addition to LLVM bitcode. It doesn't remove anything or modify existing bitcode. I don't know if that's considered a risk or not for an LLVM point release. This will help us fix compilation failures for various Linux kernel configurations under LTO.

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 12, 2022

/branch llvmbot/llvm-project/issue52787

llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Apr 12, 2022
IRLinker builds a work list of functions to materialize, then moves them
from a source module to a destination module one at a time.

This is a problem for blockaddress Constants, since they need not refer
to the function they are used in; IPSCCP is quite good at sinking these
constants deep into other functions when passed as arguments.

This would lead to curious errors during LTO:
  ld.lld: error: Never resolved function from blockaddress ...
based on the ordering of function definitions in IR.

The problem was that IRLinker would basically do:

  for function f in worklist:
    materialize f
    splice f from source module to destination module

in one pass, with Functions being lazily added to the running worklist.
This confuses BitcodeReader, which cannot disambiguate whether a
blockaddress is referring to a function which has not yet been parsed
("materialized") or is simply empty because its body was spliced out.
This causes BitcodeReader to insert Functions into its BasicBlockFwdRefs
list incorrectly, as it will never re-materialize an already
materialized (but spliced out) function.

Because of the possibility that blockaddress Constants may appear in
Functions other than the ones they reference, this patch adds a new
bitcode function code FUNC_CODE_BLOCKADDR_USERS that is a simple list of
Functions that contain BlockAddress Constants that refer back to this
Function, rather then the Function they are scoped in. We then
materialize those functions when materializing `f` from the example loop
above. This might over-materialize Functions should the user of
BitcodeReader ultimately decide not to link those Functions, but we can
at least now we can avoid this ordering related issue with blockaddresses.

Fixes: llvm#52787
Fixes: ClangBuiltLinux/linux#1215

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D120781

(cherry picked from commit 23ec578)
@llvmbot
Copy link
Collaborator

llvmbot commented Apr 12, 2022

/pull-request llvmbot#167

@tstellar
Copy link
Collaborator

@dexonsmith Is this safe to backport: 23ec578

@tstellar
Copy link
Collaborator

@nickdesaulniers Does this change mean that LLVM 14.0.0 won't be able to read bitcode from 14.0.1+x ?

@nickdesaulniers
Copy link
Member

Does this change mean that LLVM 14.0.0 won't be able to read bitcode from 14.0.1+x ?

I suspect so; 14.0.1 could produce new bitcode, I assume 14.0.0 would not recognize it.

@dexonsmith
Copy link
Collaborator

@dexonsmith Is this safe to backport: 23ec578

Sorry I missed the question — no, I don't think it's safe to backport to a dot release, since it adds new bitcode records.

@nickdesaulniers
Copy link
Member

No worries; there's only 2 optional configs that expose this easily (at the moment). Those folks can use ToT if they really need full LTO to work for those modules.

@twd2
Copy link
Contributor

twd2 commented May 3, 2022

Hi, it seems that the fix only adds Functions that have Instructions that directly use BlockAddresses into the bitcode.
So I'd like to propose a patch to further add the indirect users: https://reviews.llvm.org/D124878

nickdesaulniers pushed a commit that referenced this issue May 10, 2022
The original fix (commit 23ec578) of
#52787 only adds `Function`s
that have `Instruction`s that directly use `BlockAddress`es into the
bitcode (`FUNC_CODE_BLOCKADDR_USERS`).

However, in either @rickyz's original reproducing code:

```
void f(long);

__attribute__((noinline)) static void fun(long x) {
  f(x + 1);
}

void repro(void) {
  fun(({
    label:
      (long)&&label;
  }));
}
```

```
...
define dso_local void @repro() #0 {
entry:
  br label %label

label:                                            ; preds = %entry
  tail call fastcc void @fun()
  ret void
}

define internal fastcc void @fun() unnamed_addr #1 {
entry:
  tail call void @f(i64 add (i64 ptrtoint (i8* blockaddress(@repro, %label) to i64), i64 1)) #3
  ret void
}
...
```

or the xfs and overlayfs in the Linux kernel, `BlockAddress`es (e.g.,
`i8* blockaddress(@repro, %label)`) may first compose `ConstantExpr`s
(e.g., `i64 ptrtoint (i8* blockaddress(@repro, %label) to i64)`) and
then used by `Instruction`s. This case is not handled by the original
fix.

This patch adds *indirect* users of `BlockAddress`es, i.e., the
`Instruction`s using some `Constant`s which further use the
`BlockAddress`es, into the bitcode as well, by doing depth-first
searches.

Fixes: #52787
Fixes: 23ec578 ("[Bitcode] materialize Functions early when BlockAddress taken")

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D124878
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
IRLinker builds a work list of functions to materialize, then moves them
from a source module to a destination module one at a time.

This is a problem for blockaddress Constants, since they need not refer
to the function they are used in; IPSCCP is quite good at sinking these
constants deep into other functions when passed as arguments.

This would lead to curious errors during LTO:
  ld.lld: error: Never resolved function from blockaddress ...
based on the ordering of function definitions in IR.

The problem was that IRLinker would basically do:

  for function f in worklist:
    materialize f
    splice f from source module to destination module

in one pass, with Functions being lazily added to the running worklist.
This confuses BitcodeReader, which cannot disambiguate whether a
blockaddress is referring to a function which has not yet been parsed
("materialized") or is simply empty because its body was spliced out.
This causes BitcodeReader to insert Functions into its BasicBlockFwdRefs
list incorrectly, as it will never re-materialize an already
materialized (but spliced out) function.

Because of the possibility that blockaddress Constants may appear in
Functions other than the ones they reference, this patch adds a new
bitcode function code FUNC_CODE_BLOCKADDR_USERS that is a simple list of
Functions that contain BlockAddress Constants that refer back to this
Function, rather then the Function they are scoped in. We then
materialize those functions when materializing `f` from the example loop
above. This might over-materialize Functions should the user of
BitcodeReader ultimately decide not to link those Functions, but we can
at least now we can avoid this ordering related issue with blockaddresses.

Fixes: llvm/llvm-project#52787
Fixes: ClangBuiltLinux/linux#1215

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D120781
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
The original fix (commit 23ec5782c3cc) of
llvm/llvm-project#52787 only adds `Function`s
that have `Instruction`s that directly use `BlockAddress`es into the
bitcode (`FUNC_CODE_BLOCKADDR_USERS`).

However, in either @rickyz's original reproducing code:

```
void f(long);

__attribute__((noinline)) static void fun(long x) {
  f(x + 1);
}

void repro(void) {
  fun(({
    label:
      (long)&&label;
  }));
}
```

```
...
define dso_local void @repro() #0 {
entry:
  br label %label

label:                                            ; preds = %entry
  tail call fastcc void @fun()
  ret void
}

define internal fastcc void @fun() unnamed_addr #1 {
entry:
  tail call void @f(i64 add (i64 ptrtoint (i8* blockaddress(@repro, %label) to i64), i64 1)) #3
  ret void
}
...
```

or the xfs and overlayfs in the Linux kernel, `BlockAddress`es (e.g.,
`i8* blockaddress(@repro, %label)`) may first compose `ConstantExpr`s
(e.g., `i64 ptrtoint (i8* blockaddress(@repro, %label) to i64)`) and
then used by `Instruction`s. This case is not handled by the original
fix.

This patch adds *indirect* users of `BlockAddress`es, i.e., the
`Instruction`s using some `Constant`s which further use the
`BlockAddress`es, into the bitcode as well, by doing depth-first
searches.

Fixes: llvm/llvm-project#52787
Fixes: 23ec5782c3cc ("[Bitcode] materialize Functions early when BlockAddress taken")

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D124878
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants