[lld] "error: Never resolved function from blockaddress" when linking bitcode files containing cross-function blockaddress references #52787

rickyz · 2021-12-18T03:45:45Z

This bug was originally discovered at ClangBuiltLinux/linux#1215 by attempting to build Linux with Clang+LTO. The following is a copy of the analysis from ClangBuiltLinux/linux#1215 (comment):

Here's a hand-minimized repro:

void f(long);

__attribute__((noinline)) static void fun(long x) {
  f(x + 1);
}

void repro(void) {
  fun(({
    label:
      (long)&&label;
  }));
}

$ clang -O2 -flto -c repro.c -o repro.i
$ ld.lld repro.i
ld.lld: error: Never resolved function from blockaddress (...)

Relevant part of the LLVM IR, generated with:
$ clang -O2 -flto -c repro.c -fno-discard-value-names -S

...
define dso_local void @repro() #0 {
entry:
  br label %label

label:                                            ; preds = %entry
  tail call fastcc void @fun()
  ret void
}

define internal fastcc void @fun() unnamed_addr #1 {
entry:
  tail call void @f(i64 add (i64 ptrtoint (i8* blockaddress(@repro, %label) to i64), i64 1)) #3
  ret void
}
...

We can see clang figured out that the first argument to fun() is always the same, so it inlined the value of the argument (which is the address of label in repro).

Searching LLVM code for the source of the error message gives

llvm-project/llvm/lib/Bitcode/Reader/BitcodeReader.cpp

Line 853 in 4b55329

return error("Never resolved function from blockaddress");

(there is another error with the same string, but building lld with a change to the error message shows that this is the one that triggers). Reading through this code, we see that BitcodeReader lazily reads an LLVM bitcode file. The reader appears to initially return placeholder objects that need to be "materialized" on demand.

The code has some special handling of blockaddress IR constants, as a blockaddress that appears within a function F may reference a basic block from a different function G (as is the case in the IR for the bug repro).

When materializing a function with a blockaddress(Fn, Fn_BB), the code needs to separately handle the cases where Fn either has or has not yet been materialized. That logic appears here:

llvm-project/llvm/lib/Bitcode/Reader/BitcodeReader.cpp

Lines 2901 to 2927 in 4b55329

    
           // If the function is already parsed we can insert the block address right 
        
           // away. 
        
           BasicBlock *BB; 
        
           unsigned BBID = Record[2]; 
        
           if (!BBID) 
        
             // Invalid reference to entry block. 
        
             return error("Invalid ID"); 
        
           if (!Fn->empty()) { 
        
             Function::iterator BBI = Fn->begin(), BBE = Fn->end(); 
        
             for (size_t I = 0, E = BBID; I != E; ++I) { 
        
               if (BBI == BBE) 
        
                 return error("Invalid ID"); 
        
               ++BBI; 
        
             } 
        
             BB = &*BBI; 
        
           } else { 
        
             // Otherwise insert a placeholder and remember it so it can be inserted 
        
             // when the function is parsed. 
        
             auto &FwdBBs = BasicBlockFwdRefs[Fn]; 
        
             if (FwdBBs.empty()) 
        
               BasicBlockFwdRefQueue.push_back(Fn); 
        
             if (FwdBBs.size() < BBID + 1) 
        
               FwdBBs.resize(BBID + 1); 
        
             if (!FwdBBs[BBID]) 
        
               FwdBBs[BBID] = BasicBlock::Create(Context); 
        
             BB = FwdBBs[BBID]; 
        
           }

If Fn has been materialized (the Fn->empty() check), then the reader creates a BlockAddress referring to the appropriate Function/BasicBlock object. If Fn has not been materialized, then the reader creates a placeholder BasicBlock for the BlockAddress and adds Fn to a queue of functions to be materialized. When Fn is materialized, it will make sure to use the precreated BasicBlock object.

After adding some debug prints, it appears that lld's use of BitcodeReader breaks this lazy blockaddress handling. In the repro case, the linker does materialize repro before it attempts to materialize fun. However, before it attempts to materialize fun, it steals the basic blocks out of repro's Function object here:

llvm-project/llvm/lib/Linker/IRMover.cpp

Lines 1116 to 1117 in 4b55329

    
           Dst.stealArgumentListFrom(Src); 
        
           Dst.getBasicBlockList().splice(Dst.end(), Src.getBasicBlockList());

This causes the Fn->empty() check mentioned above to pass, which makes the code think that Fn (repro) has not yet been materialized. That code path enqueues repro to be materialized a second time. When that happens, this code:

llvm-project/llvm/lib/Bitcode/Reader/BitcodeReader.cpp

Lines 162 to 163 in 4b55329

    
           if (F.isMaterializable()) 
        
             continue;

detects (using IsMaterializable() instead of empty()) that repro has already been materialized, and errors out.

This looks like a fairly old bug in lld. Fixing it looks tricky - it seems that BitcodeReader requires that materialized Functions aren't modified while there are still functions that will be materialized later, and lld violates that assumption. Maintaining this invariant seems like it might conflict with the goal of lazy reading. Maybe one idea is to maintain some reverse dependency information in bitcode files. Then materializing a function F could trigger immediate materialization of all functions with a blockaddress pointing into F.

The text was updated successfully, but these errors were encountered:

nickdesaulniers · 2021-12-23T22:07:33Z

smaller reproducer:

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"                                                                                    
target triple = "x86_64-unknown-linux-gnu"                                                                                                                                      
                                                                                                                                                                                
define void @repro() {                                                                                                                                                          
  br label %label                                                                                                                                                               
                                                                                                                                                                                
label:                                                                                                                                                                          
  call void @fun()                                                                                                                                                              
  ret void                                                                                                                                                                      
}

define void @fun() noinline {
  call void @f(i8* blockaddress(@repro, %label))
  ret void
}

declare void @f(i8*)

$ llvm-as d.ll
$ ld.lld d.bc
ld.lld: error: Never resolved function from blockaddress (Producer: 'LLVM14.0.0git' Reader: 'LLVM 14.0.0git')

nickdesaulniers · 2021-12-23T22:31:49Z

cc @dexonsmith (author of 908d809)

dexonsmith · 2022-01-11T20:00:51Z

cc @dexonsmith (author of 908d809)

Gosh, that was a while ago. In case it helps, I can tell you I was fixing bugs I found while implementing use-list-order serialization (see e.g. the successor 5a511b5).

Maybe one idea is to maintain some reverse dependency information in bitcode files. Then materializing a function F could trigger immediate materialization of all functions with a blockaddress pointing into F.

This could work, but I think it should be possible to fix BitcodeReader/IRMover to do the right thing without adding backedges.

I think you need something like this:

if (!Fn.empty()) {
  // fully materialized
} else if (Fn.isMaterializable()) {
  // existing logic to add placeholders
} else {
  // new logic for when basic blocks have been moved away with IRMover
}

I haven't looked carefully at the stack — why is IRMover being called? will the blocks be moved back? — but I think it's possible we wouldn't need to serialize anything new.

If we did need something new serialized, maybe it could be limited to a bit per basic block saying "is this basic block referenced by a blockaddress outside this function?" (instead of a backedge, just a bit saying there is an edge), and then add ValueHandles for those basic blocks when materializing them to track what they get RAUW'd to.

But I'd much prefer to avoid having the function content in bitcode depend on what references it. Seems like the IRMover and BitcodeReader should be able to track this somehow. E.g., maybe the caller of IRMover should tell BitcodeReader that the basic blocks were moved to another function.

nickdesaulniers · 2022-02-16T23:32:30Z

Funnily enough, given my reduced test case above, reordering the definition of @fun to occur before @repro works.

(for my reduced case above) It seems like @repro is materialized first, marked non-materializable. Then when we visit @fun, we seem to be no longer capable of materializing the reference in the blockaddress to @repro.

There's a comment in BitcodeReader::materializeForwardReferencedFunctions about:

 852     if (!BasicBlockFwdRefs.count(F))                                                                                                                                       
 853       // Already materialized.                                                                                                                                             
 854       continue;

so it seems like F technically was already materialized, but it's no longer referred to by BasicBlockFwdRefs.

nickdesaulniers · 2022-02-18T19:51:44Z

I think what's going on is that LTO uses IRMover is moving Functions from one Module to another (to merge them).

If the Function of the BlockAddress has already been moved, it appears "empty" so a BasicBlockFwdRef gets inserted. Later, when iterating the BasicBlocks of BasicBlockFwdRef we find the referred to function is no longer materializable and we produce the observed error.

So right now, in BitcodeReader::parseConstants we can't disambiguate whether a Function is empty because we haven't parsed it yet and therefor should insert a reference to it in BasicBlockFwdRef vs it being empty because LTO moved it to a new Module.

Mapper::mapBlockAddress looks like it probably should be called somewhere. (IRMover has a ValueMapper)

This is tricky; you could have 2 different Functions with BlockAddresses that refer to each other. How is importing them one by one supposed to work?

rickyz · 2022-02-18T20:08:08Z

I haven't had a chance to investigate deeply after @dexonsmith's comments, but I think my biggest question is: is it guaranteed that no other modifications (e.g. LTO passes) will happen to a moved function until we have materialized all functions that reference it?

If that is guaranteed, then we can probably keep track of some information about moved functions to resolve blockrefs pointing into moved functions. If that is not guaranteed though, then I think we would need to figure out a way to eagerly materialize all of the dependents of a function, which sounds a little trickier to do.

nickdesaulniers · 2022-02-18T20:41:37Z

If that is guaranteed, then we can probably keep track of some information about moved functions to resolve blockrefs pointing into moved functions.

IRMover has a ValueMapper that keeps track. The problem is that BitcodeReader doesn't have access to this mapping; it's just being told to materialize Functions one by one from bitcode.

If that is not guaranteed though, then I think we would need to figure out a way to eagerly materialize all of the dependents of a function, which sounds a little trickier to do.

Right, for a Function, we can generally call Function::hasAddressTaken() to find such dependencies. The issue is that that method will assert if we haven't yet fully materialized the whole Module, which we haven't done yet during LTO.

nickdesaulniers · 2022-03-01T19:29:55Z

IRLinker::linkFunctionBody uses splice to rip the Function body from the source to the dest Module; I suspect if it copied without stealing, then the ValueMapper stuff would just work. Unfortunately, it seems that BasicBlockListType (return type of Function::getBasicBlockList) is intentionally non-copy-able.

So what's happening is that we have something like:

for function f in function_list:
  materialize f
  splice f from source module to dest module

That splicing is making f fail to materialize for the second function in the function_list since materializing (i.e. lazy parsing LLVM bitcode into memory) is recursive for blockaddress references. Instead, for this to be safe, we'd need modify the above logic to something like:

for function f in function_list:
  materialize f
for function f in function_list:
  splice f from source module to dest module

nickdesaulniers · 2022-03-01T19:31:12Z

for a Function, we can generally call Function::hasAddressTaken() to find such dependencies

I tried that; doesn't work; we can't determine whether a function has its address taken until a module has been fully materialized. To support lazy parsing, we don't want to fully parse the body of every function to try to find blockaddress constants.

nickdesaulniers · 2022-03-01T22:39:21Z

I think I have a fix for this; just running the test suite and writing more tests.

nickdesaulniers · 2022-03-01T23:16:46Z

https://reviews.llvm.org/D120781

nickdesaulniers · 2022-04-12T18:42:06Z

/cherry-pick 23ec578

@tstellar this is an addition to LLVM bitcode. It doesn't remove anything or modify existing bitcode. I don't know if that's considered a risk or not for an LLVM point release. This will help us fix compilation failures for various Linux kernel configurations under LTO.

llvmbot · 2022-04-12T18:52:46Z

/branch llvmbot/llvm-project/issue52787

IRLinker builds a work list of functions to materialize, then moves them from a source module to a destination module one at a time. This is a problem for blockaddress Constants, since they need not refer to the function they are used in; IPSCCP is quite good at sinking these constants deep into other functions when passed as arguments. This would lead to curious errors during LTO: ld.lld: error: Never resolved function from blockaddress ... based on the ordering of function definitions in IR. The problem was that IRLinker would basically do: for function f in worklist: materialize f splice f from source module to destination module in one pass, with Functions being lazily added to the running worklist. This confuses BitcodeReader, which cannot disambiguate whether a blockaddress is referring to a function which has not yet been parsed ("materialized") or is simply empty because its body was spliced out. This causes BitcodeReader to insert Functions into its BasicBlockFwdRefs list incorrectly, as it will never re-materialize an already materialized (but spliced out) function. Because of the possibility that blockaddress Constants may appear in Functions other than the ones they reference, this patch adds a new bitcode function code FUNC_CODE_BLOCKADDR_USERS that is a simple list of Functions that contain BlockAddress Constants that refer back to this Function, rather then the Function they are scoped in. We then materialize those functions when materializing `f` from the example loop above. This might over-materialize Functions should the user of BitcodeReader ultimately decide not to link those Functions, but we can at least now we can avoid this ordering related issue with blockaddresses. Fixes: llvm#52787 Fixes: ClangBuiltLinux/linux#1215 Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D120781 (cherry picked from commit 23ec578)

llvmbot · 2022-04-12T18:53:57Z

/pull-request llvmbot#167

tstellar · 2022-04-14T21:16:37Z

@dexonsmith Is this safe to backport: 23ec578

tstellar · 2022-04-14T21:18:51Z

@nickdesaulniers Does this change mean that LLVM 14.0.0 won't be able to read bitcode from 14.0.1+x ?

nickdesaulniers · 2022-04-15T07:43:52Z

Does this change mean that LLVM 14.0.0 won't be able to read bitcode from 14.0.1+x ?

I suspect so; 14.0.1 could produce new bitcode, I assume 14.0.0 would not recognize it.

dexonsmith · 2022-04-22T23:47:55Z

@dexonsmith Is this safe to backport: 23ec578

Sorry I missed the question — no, I don't think it's safe to backport to a dot release, since it adds new bitcode records.

nickdesaulniers · 2022-04-23T05:42:53Z

No worries; there's only 2 optional configs that expose this easily (at the moment). Those folks can use ToT if they really need full LTO to work for those modules.

twd2 · 2022-05-03T21:30:58Z

Hi, it seems that the fix only adds Functions that have Instructions that directly use BlockAddresses into the bitcode.
So I'd like to propose a patch to further add the indirect users: https://reviews.llvm.org/D124878

@rickyz

The original fix (commit 23ec578) of #52787 only adds `Function`s that have `Instruction`s that directly use `BlockAddress`es into the bitcode (`FUNC_CODE_BLOCKADDR_USERS`). However, in either @rickyz's original reproducing code: ``` void f(long); __attribute__((noinline)) static void fun(long x) { f(x + 1); } void repro(void) { fun(({ label: (long)&&label; })); } ``` ``` ... define dso_local void @repro() #0 { entry: br label %label label: ; preds = %entry tail call fastcc void @fun() ret void } define internal fastcc void @fun() unnamed_addr #1 { entry: tail call void @f(i64 add (i64 ptrtoint (i8* blockaddress(@repro, %label) to i64), i64 1)) #3 ret void } ... ``` or the xfs and overlayfs in the Linux kernel, `BlockAddress`es (e.g., `i8* blockaddress(@repro, %label)`) may first compose `ConstantExpr`s (e.g., `i64 ptrtoint (i8* blockaddress(@repro, %label) to i64)`) and then used by `Instruction`s. This case is not handled by the original fix. This patch adds *indirect* users of `BlockAddress`es, i.e., the `Instruction`s using some `Constant`s which further use the `BlockAddress`es, into the bitcode as well, by doing depth-first searches. Fixes: #52787 Fixes: 23ec578 ("[Bitcode] materialize Functions early when BlockAddress taken") Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D124878

IRLinker builds a work list of functions to materialize, then moves them from a source module to a destination module one at a time. This is a problem for blockaddress Constants, since they need not refer to the function they are used in; IPSCCP is quite good at sinking these constants deep into other functions when passed as arguments. This would lead to curious errors during LTO: ld.lld: error: Never resolved function from blockaddress ... based on the ordering of function definitions in IR. The problem was that IRLinker would basically do: for function f in worklist: materialize f splice f from source module to destination module in one pass, with Functions being lazily added to the running worklist. This confuses BitcodeReader, which cannot disambiguate whether a blockaddress is referring to a function which has not yet been parsed ("materialized") or is simply empty because its body was spliced out. This causes BitcodeReader to insert Functions into its BasicBlockFwdRefs list incorrectly, as it will never re-materialize an already materialized (but spliced out) function. Because of the possibility that blockaddress Constants may appear in Functions other than the ones they reference, this patch adds a new bitcode function code FUNC_CODE_BLOCKADDR_USERS that is a simple list of Functions that contain BlockAddress Constants that refer back to this Function, rather then the Function they are scoped in. We then materialize those functions when materializing `f` from the example loop above. This might over-materialize Functions should the user of BitcodeReader ultimately decide not to link those Functions, but we can at least now we can avoid this ordering related issue with blockaddresses. Fixes: llvm/llvm-project#52787 Fixes: ClangBuiltLinux/linux#1215 Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D120781

@rickyz

The original fix (commit 23ec5782c3cc) of llvm/llvm-project#52787 only adds `Function`s that have `Instruction`s that directly use `BlockAddress`es into the bitcode (`FUNC_CODE_BLOCKADDR_USERS`). However, in either @rickyz's original reproducing code: ``` void f(long); __attribute__((noinline)) static void fun(long x) { f(x + 1); } void repro(void) { fun(({ label: (long)&&label; })); } ``` ``` ... define dso_local void @repro() #0 { entry: br label %label label: ; preds = %entry tail call fastcc void @fun() ret void } define internal fastcc void @fun() unnamed_addr #1 { entry: tail call void @f(i64 add (i64 ptrtoint (i8* blockaddress(@repro, %label) to i64), i64 1)) #3 ret void } ... ``` or the xfs and overlayfs in the Linux kernel, `BlockAddress`es (e.g., `i8* blockaddress(@repro, %label)`) may first compose `ConstantExpr`s (e.g., `i64 ptrtoint (i8* blockaddress(@repro, %label) to i64)`) and then used by `Instruction`s. This case is not handled by the original fix. This patch adds *indirect* users of `BlockAddress`es, i.e., the `Instruction`s using some `Constant`s which further use the `BlockAddress`es, into the bitcode as well, by doing depth-first searches. Fixes: llvm/llvm-project#52787 Fixes: 23ec5782c3cc ("[Bitcode] materialize Functions early when BlockAddress taken") Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D124878

github-actions bot added the new issue label Dec 18, 2021

nickdesaulniers added the lld label Dec 23, 2021

EugeneZelenko removed the new issue label Jan 15, 2022

nickdesaulniers self-assigned this Mar 1, 2022

nickdesaulniers closed this as completed in 23ec578 Apr 12, 2022

nickdesaulniers reopened this Apr 12, 2022

nickdesaulniers added this to the LLVM 14.0.1 Release milestone Apr 12, 2022

nickdesaulniers added release:backport llvm:bitcode tools:llvm-link and removed lld labels Apr 12, 2022

nickdesaulniers mentioned this issue Apr 12, 2022

"Never resolved function from blockaddress" linking xfs.ko ClangBuiltLinux/linux#1215

Open

llvmbot mentioned this issue Apr 12, 2022

PR for llvm/llvm-project#52787 llvmbot/llvm-project#167

Open

tstellar modified the milestones: LLVM 14.0.1 Release, LLVM 14.0.2 Release Apr 13, 2022

tstellar assigned dexonsmith Apr 14, 2022

nickdesaulniers closed this as completed Apr 23, 2022

EugeneZelenko removed the release:backport label Apr 23, 2022

EugeneZelenko removed this from the LLVM 14.0.2 Release milestone Apr 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[lld] "error: Never resolved function from blockaddress" when linking bitcode files containing cross-function blockaddress references #52787

[lld] "error: Never resolved function from blockaddress" when linking bitcode files containing cross-function blockaddress references #52787

rickyz commented Dec 18, 2021 •

edited

Loading

nickdesaulniers commented Dec 23, 2021

nickdesaulniers commented Dec 23, 2021

dexonsmith commented Jan 11, 2022

nickdesaulniers commented Feb 16, 2022 •

edited

Loading

nickdesaulniers commented Feb 18, 2022

rickyz commented Feb 18, 2022

nickdesaulniers commented Feb 18, 2022

nickdesaulniers commented Mar 1, 2022

nickdesaulniers commented Mar 1, 2022

nickdesaulniers commented Mar 1, 2022

nickdesaulniers commented Mar 1, 2022

nickdesaulniers commented Apr 12, 2022

llvmbot commented Apr 12, 2022

llvmbot commented Apr 12, 2022

tstellar commented Apr 14, 2022

tstellar commented Apr 14, 2022

nickdesaulniers commented Apr 15, 2022

dexonsmith commented Apr 22, 2022

nickdesaulniers commented Apr 23, 2022

twd2 commented May 3, 2022

[lld] "error: Never resolved function from blockaddress" when linking bitcode files containing cross-function blockaddress references #52787

[lld] "error: Never resolved function from blockaddress" when linking bitcode files containing cross-function blockaddress references #52787

Comments

rickyz commented Dec 18, 2021 • edited Loading

nickdesaulniers commented Dec 23, 2021

nickdesaulniers commented Dec 23, 2021

dexonsmith commented Jan 11, 2022

nickdesaulniers commented Feb 16, 2022 • edited Loading

nickdesaulniers commented Feb 18, 2022

rickyz commented Feb 18, 2022

nickdesaulniers commented Feb 18, 2022

nickdesaulniers commented Mar 1, 2022

nickdesaulniers commented Mar 1, 2022

nickdesaulniers commented Mar 1, 2022

nickdesaulniers commented Mar 1, 2022

nickdesaulniers commented Apr 12, 2022

llvmbot commented Apr 12, 2022

llvmbot commented Apr 12, 2022

tstellar commented Apr 14, 2022

tstellar commented Apr 14, 2022

nickdesaulniers commented Apr 15, 2022

dexonsmith commented Apr 22, 2022

nickdesaulniers commented Apr 23, 2022

twd2 commented May 3, 2022

rickyz commented Dec 18, 2021 •

edited

Loading

nickdesaulniers commented Feb 16, 2022 •

edited

Loading