[LTO] Fat LTO pipeline mis-optimizes indirect goto. #70703
The SimplifyCFG pass introduces the crash because the destinations don't have their addresses taken. I'm guessing this is a duplicate of #55991, and maybe #47769, just with the problem manifesting in the FatLTO pipeline because we clone the module to embed in the IR. |
Sorry that I didn't see this, between vacation and being terrible at setting up GitHub notifications. This is unfortunate, and I agree w/ your suspicion that this looks to be due to cloning the module. We can potentially change the FatLTO pipeline to avoid cloning the module, which we wanted to do anyway. I don't think that https://reviews.llvm.org/D148010 is close to landing, but it was what I was hoping would allow us to do ModuleSimplification followed by ModuleOptimization uniformly w/ existing pipelines. Maybe a better way to go about this for now is to use UnifiedLTO, so we can defer the Full/Thin decision until link-time. Then we can use the PreLinkThinLTO pipeline to emit the IR section, which would work w/ both Full and Thin LTO. I'm not sure if we can get away w/ only running ModuleOptimization after that, or whether we should conservatively run the PostLinkThinLTO pipeline to generate the object code. I think that probably still does too much work, but it is probably preferable to having a codegen bug like this in a supported pipeline. @nikic @teresajohnson Do either of you have thoughts here? We've luckily not hit this in Fuchsia yet, but given that this leads to miscompiles, I'd like to address it ASAP. |
#72180 is a potential fix, though I'm unsure if the pipeline changes are exactly the way we want them. |
Thanks for looking at this. I did a little bit of poking at the cloning code. It's confusing, because we have code like this (https://github.com/llvm/llvm-project/blob/e3f16de9a33d48f6a9d8035a9aebfdb0e3a16ea5/llvm/lib/Transforms/Utils/CloneFunction.cpp#L207C32-L207C32) to patch up the BlockAddresses when cloning to a new function, which explicitly states it's illegal to clone code where a basic block address leaks out of the function, but it seems the initializers for the arrays of block addresses get mapped to BlockAddress values from the cloned module anyway 🤷. It seems like it's the IndirectBrInst that has invalid operands, but I haven't had time to look into it any deeper to determine whether we could fix the problem by updating the operands to the correct ones. Given the comments, it might be best to consider dropping the cloning, as your PR does. I've had a look at the proposed fix. I think you are right about https://reviews.llvm.org/D148010 not helping, as it seems there are a number of ordering issues that need to be worked out first. Don't we have a somewhat similar situation with #72180 if we are using the fat objects to feed a monolithic LTO link, though? Since we are always running the ThinLTO prelink pipeline, I assume we miss the same opportunities. Is the cloning necessary for us to run the FatLTO pipeline? |
On Mon, Nov 27, 2023 at 11:31 AM Sean Fertile wrote:

> Don't we have a somewhat similar situation with #72180 if we are using the fat objects to feed a monolithic LTO link though?

It sounds like the issue relates to cloning, which should go away with that fix. Using the unified LTO approach works with either ThinLTO or monolithic LTO links.

> Since we are always running the thin-lto prelink pipeline I assume we miss the same opportunities. Is the cloning necessary for us to run the fat lto pipeline?

I can't recall exactly why FatLTO started cloning the module. IIRC it was related to some concerns about the optimization of the non-LTO native objects, but I don't recall why the decision was to clone instead of using the ThinLTO pre+post link pipelines as the fix does. Perhaps this was before Unified LTO was ready?

— Teresa Johnson | Software Engineer |
The big issue that I remember was that we couldn't pin down a good/efficient way to be sure the pre-link pipeline would run and then get optimized correctly for the non-LTO object code, e.g. because once you start adding instrumentation or certain other passes to the pre-link pipeline, you can't "just" run

IIRC this landed about the same time as UnifiedLTO. I think we landed the pipeline changes before UnifiedLTO was ready, but the linker support and maybe one of the Clang changes landed after UnifiedLTO. |
llvm#70703 pointed out that cloning LLVM modules could lead to miscompiles when using FatLTO. This is due to an existing issue when cloning modules with labels (see llvm#55991 and llvm#47769). Since this can lead to miscompilation, we can avoid cloning the LLVM modules, which was desirable anyway. This patch modifies the EmbedBitcodePass to no longer clone the module or run an input pipeline over it. Further, it makes FatLTO always perform UnifiedLTO, so we can still defer the Thin/Full LTO decision to link-time. Lastly, it removes dead/obsolete code related to now-defunct options that no longer work with the EmbedBitcodePass implementation.
@mandlebug I tested #72180 against the test code above, but I'd like to confirm with you that this is actually fixed before marking this as closed. |
Thanks for fixing this; the reproducer does work now. I tried running test-suite, though, since there was one other related failure for which I didn't bother extracting the source/compile/link commands. I am seeing a number of crashes during the build step now. Some example output:
Setup is simply a release build of test-suite using ninja and |
There also seems to be an issue with using UnifiedLTO. I have a feeling the issues are related, but I'm not completely sure. I'll need to dig some more. I've included the reproducer in case you want to look. |
It seems like if I add the options in a new CMake cache file, things work as expected. I'm not too familiar with the test-suite build. Any thoughts on why adding the options to

Here's the diff I added; I just pointed my build at that cache file instead of using the existing
Here's my cmake invocation
I used the same |
Yes, both the
I was testing with an asserts-enabled build of the complete toolchain (compiler, linker, and runtimes). I've found a similar issue in a much larger project, though. I was able to reduce it down to almost nothing, but I'm seeing a difference in how the bitcode section is generated with "-debug-info-kind=constructor". I haven't run it down yet, though. I also think there may be some conflict with how we're setting the "ThinLTO" module flag in clang. I definitely get further without setting that flag, and more cases seem to work. However, I seem to be running afoul of an assert in ThinLTOBitcodeWriter now.
@ormris do you remember why this assert should hold? In FatLTO, we're using the ThinLTOPreLinkPipeline, and setting the UnifiedLTO module flag, so I'm unclear on why we'd run into trouble with FatLTO but not UnifiedLTO. For whatever reason my toy example ends up with an empty ModuleID. I don't fully understand the reason this can't/shouldn't ever happen for unified LTO. I think I'm missing some context. |
If we're setting the |
That was one of my thoughts, but when I tried that, I still hit the assert I mentioned above. I plan to post a patch to stop setting the ThinLTO flag once I can work around the remaining issue. Do you happen to know why they're incompatible? I guess because the point is to defer the decision until later. |
Since FatLTO now uses the UnifiedLTO pipeline, we should not set the ThinLTO module flag to true, since it may cause an assertion failure. See llvm#70703 for context.
It would be good to understand why that assert is there. I have a vague recollection that with UnifiedLTO there was more info used to compute the module hash checked there, but I'd have to dig up the old patches to confirm. |
@ormris can you give some context here? The patch I was thinking of was never accepted: https://reviews.llvm.org/D123969. So I'm still not sure why this assert was added. Hmm, the Unified LTO LLVM patch (https://reviews.llvm.org/D123803) says in the summary that it includes "Make sure that the ModuleID is generated by incorporating more types of symbols." I skimmed that patch just now, though, and don't see this. This is the patch that added the assert. |
Thanks for the context. I was also going through https://reviews.llvm.org/D123803 and didn't understand why the assert would be there at all, but it makes a lot more sense with https://reviews.llvm.org/D123969. Since that hasn't landed yet, do you think it makes sense to change the behavior back to account for UnifiedLTO? So something like https://reviews.llvm.org/D123803?id=527693#inline-1475183, where we just pass the |
If you run Unified LTO with split LTO units enabled, empty module IDs mean that some modules will be written as regular LTO modules, and some will be written as ThinLTO modules. Unified LTO wants all modules to be either ThinLTO or regular LTO modules so that the whole program can be optimized as one piece. At SIE, we patched getUniqueModuleId to ensure it could produce a ModuleID in almost all circumstances. All of our changes are in https://reviews.llvm.org/D123969. If there's interest, I'd be happy to re-open that review as a PR. Otherwise, disabling split LTO units could also work, assuming that ThinLTO CFI support isn't needed. |
Thanks. I've filed #77524 to track that. |
test-suite test SingleSource/Regression/C/2004-03-15-IndirectGoto.c exhibits the problem.

Input:
Compile commands to reproduce:
test-suite test SingleSource/Benchmarks/Misc/evalloop.c is also failing, likely due to the same underlying issue.