[LLVM] Segfault / "llvm::thinLTOInternalizeModule ... Assertion `GS != DefinedGlobals.end()' failed", again #67855

Closed
tmandry opened this issue Jan 4, 2020 · 18 comments · Fixed by #68435
Assignees: @tmandry
Labels:
• A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
• C-bug: Category: This is a bug.
• E-needs-mcve: Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example.
• I-crash: Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics.
• P-high: High priority.
• T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

tmandry (Member) commented Jan 4, 2020

I got a segfault / LLVM assertion failure while compiling a target with ThinLTO enabled. This is the same failure as in #53912, albeit with a different root cause.

In my case, I was dealing with a global named switch.table._ZN77_$LT$omaha_client..protocol..request..Event$u20$as$u20$core..clone..Clone$GT$5clone17h829b64c9ab982ff5E.llvm.10390335839252477638.llvm.11308644296266801080.

Note the presence of two .llvm.NNNN extensions. This means this was a local symbol that was promoted to a global twice. The implementation of getOriginalNameBeforePromote doesn't handle this case correctly; it strips off both extensions:

  /// Convenience method for creating a promoted global name
  /// for the given value name of a local, and its original module's ID.
  static std::string getGlobalNameForLocal(StringRef Name, ModuleHash ModHash) {
    SmallString<256> NewName(Name);
    NewName += ".llvm.";
    NewName += utostr((uint64_t(ModHash[0]) << 32) |
                      ModHash[1]); // Take the first 64 bits
    return NewName.str();
  }

  /// Helper to obtain the unpromoted name for a global value (or the original
  /// name if not promoted).
  static StringRef getOriginalNameBeforePromote(StringRef Name) {
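    // split() breaks on the *first* ".llvm.", so a name that was
    // promoted twice loses both promotion suffixes here.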
    std::pair<StringRef, StringRef> Pair = Name.split(".llvm.");
    return Pair.first;
  }

The fix is actually trivial: change split to rsplit so that only the last .llvm.NNNN extension is removed. Leaving this issue open to track the fix being cherry-picked.
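For reference, a minimal sketch of the corrected helper (mirroring the one-word change described above, against the same StringRef API as the snippet quoted earlier):

  /// Helper to obtain the unpromoted name for a global value (or the original
  /// name if not promoted).
  static StringRef getOriginalNameBeforePromote(StringRef Name) {
    // rsplit() breaks on the *last* ".llvm.", so only the most recent
    // promotion suffix is stripped; when the separator is absent, the
    // whole name comes back in .first, preserving the unpromoted case.
    std::pair<StringRef, StringRef> Pair = Name.rsplit(".llvm.");
    return Pair.first;
  }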

@jonas-schievink added the A-LLVM, I-crash, T-compiler, C-bug, and I-nominated labels on Jan 4, 2020
tmandry (Member, Author) commented Jan 6, 2020

Filed a bug upstream.

tmandry (Member, Author) commented Jan 6, 2020

This might still be a rustc bug, since apparently a double-promotion is unexpected. Working on a minimal repro now.

pnkfelix (Member) commented Jan 9, 2020

@tmandry is this something that we could/should be pulling in the ICE-breakers on?

pnkfelix (Member) commented Jan 9, 2020

triage: P-high. Assigning to @tmandry and removing the I-nominated label.

@pnkfelix added the P-high label and removed I-nominated on Jan 9, 2020
tmandry (Member, Author) commented Jan 9, 2020

Possibly, but I’ll need to put a usable reproducer somewhere. That tends to be more difficult when the target is Fuchsia.

I’ve had limited success doing all the minimizing myself. I’ll see if I can reproduce on other targets, and upload a reproducer if so.

@jonas-schievink added the E-needs-mcve label on Jan 9, 2020
tmandry (Member, Author) commented Jan 10, 2020

So, from the output of -Csave-temps, what appears to be happening is that we run the same bitcode through the ThinLTO optimization pipeline twice: once when outputting an rlib, and once when linking to that rlib.

More specifically:

  1. An rlib crate (omaha_client in this case) gets compiled with ThinLTO and optimizations enabled. We run the bitcode through the ThinLTO optimization pipeline.
  • -Csave-temps produces a lot of files that look like libomaha_client.omaha_client.3a1fbbbh-cgu.15.rcgu.thin-lto-after-rename.bc
  2. A binary crate foo is compiled which depends on the rlib. It is also compiled with ThinLTO and optimizations.
  • -Csave-temps again produces a lot of files that look like foo.omaha_client.3a1fbbbh-cgu.15.rcgu.thin-lto-after-rename.bc (except that after-internalize and all later stages are missing, because the compiler crashes)
  • the foo.omaha_client.3a1fbbbh-cgu.15.rcgu.thin-lto-input.bc file is the output of the ThinLTO pipeline from the previous step

The offending symbol is promoted to a global once in the first optimization run and again in the second, which accounts for the two .llvm.xxxxx suffixes on it. The crash occurs because LLVM does not expect a symbol to be promoted twice.
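To make the failure mode concrete, here is a small standalone sketch (plain C++, with std::string standing in for StringRef; promote, stripFirst, and stripLast are hypothetical helpers mimicking the LLVM functions quoted earlier, and the name is shortened while reusing the hash suffixes from the symbol above):

  #include <cassert>
  #include <string>

  // Mimics getGlobalNameForLocal: each promotion round appends one
  // ".llvm.<hash>" suffix to the local's name.
  static std::string promote(const std::string &Name, const std::string &Hash) {
    return Name + ".llvm." + Hash;
  }

  // Mimics the buggy getOriginalNameBeforePromote: splitting on the
  // *first* ".llvm." discards every promotion suffix at once.
  static std::string stripFirst(const std::string &Name) {
    return Name.substr(0, Name.find(".llvm."));
  }

  // Mimics the fixed version: splitting on the *last* ".llvm." removes
  // only the most recent promotion suffix.
  static std::string stripLast(const std::string &Name) {
    return Name.substr(0, Name.rfind(".llvm."));
  }

  int main() {
    // The first ThinLTO run promotes the local; the second run, fed the
    // output of the first, promotes it again.
    std::string Once = promote("switch.table.clone", "10390335839252477638");
    std::string Twice = promote(Once, "11308644296266801080");

    // Buggy lookup: both suffixes vanish, so a lookup keyed on the
    // once-promoted name recorded by the first run fails, and the
    // internalize assertion fires.
    assert(stripFirst(Twice) == "switch.table.clone");

    // Fixed lookup: only the last suffix is stripped, recovering the
    // name the first run produced.
    assert(stripLast(Twice) == Once);
    return 0;
  }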

I can't think of a reason why we would run the same pipeline twice on the same bitcode. Is there one?

tmandry (Member, Author) commented Jan 10, 2020

cc @alexcrichton

alexcrichton (Member) commented:

@tmandry are you compiling with -Clto=thin, or just normal -Copt-level=3? The former will run some bitcode through ThinLTO twice, but the latter shouldn't. If the bug happens by running bitcode through twice, that's likely where it's coming from.

tmandry (Member, Author) commented Jan 10, 2020

@alexcrichton Ahh you were right, it has to do with the flags!

For the rlib I wasn't passing -Clto=thin, but for the executable I was. As soon as I started passing -Clto=thin for both, the problem went away and we only sent the code through the pipeline once.

(EDIT: I was using -Copt-level=z in both)

Apparently we've been doing this for a very long time in Fuchsia without problems. So is it expected that we should be passing -Clto=thin to both?

alexcrichton (Member) commented:

Oh weird, I didn't expect that to fix anything, nor did I realize the compiler behaved this way!

It looks like it boils down to this match statement, which explains the behavior:

  • Passing -Clto=thin for the final executable means that all bitcode will run through ThinLTO. If the rlib bitcode was previously run through ThinLTO, it gets run through twice.
  • That match statement, however, means that if you pass -Clto=thin (Lto::Thin) when compiling an rlib, we skip the ThinLTO step there, assuming it will happen at the end.

So it looks like, yes, you'll want to pass -Clto=thin to all compilations, not just the final one. And that should do the trick! I would still consider this an LLVM bug though because it should in theory work both ways...
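To illustrate, here is a hedged model of that decision (a C++ sketch standing in for rustc's Rust internals; runsPerCrateThinLTO and the exact enum layout are illustrative assumptions, not the actual match statement):

  #include <cassert>

  // Models rustc's LTO setting as described above (Lto::Thin is named in
  // the comment; the other variants are assumptions for illustration).
  enum class Lto { No, ThinLocal, Thin, Fat };

  // Hypothetical helper modeling the behavior described above: with
  // -Clto=thin (Lto::Thin), per-crate ThinLTO is skipped on the
  // assumption the final link will run it; a plain optimized build
  // (modeled as ThinLocal) runs ThinLTO within each crate.
  static bool runsPerCrateThinLTO(Lto SessionLto) {
    switch (SessionLto) {
    case Lto::ThinLocal: return true;  // ThinLTO across the crate's own CGUs
    case Lto::Thin:      return false; // deferred to the final link
    case Lto::Fat:       return false; // full LTO happens at the end instead
    case Lto::No:        return false;
    }
    return false;
  }

  int main() {
    // The mismatch from this thread: an rlib built without -Clto=thin runs
    // per-crate ThinLTO, and an executable built with -Clto=thin then runs
    // ThinLTO again over that already-processed bitcode.
    assert(runsPerCrateThinLTO(Lto::ThinLocal));
    assert(!runsPerCrateThinLTO(Lto::Thin));
    return 0;
  }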

tmandry (Member, Author) commented Jan 11, 2020

Agreed on this still being a bug.

Also, as it turns out, the old way of doing it is faster for our build! Presumably because we're doing more work up front.

For posterity, I'm posting the data here.

Here are a couple runs before adding -Clto=thin to the rlib builds:

37625.49user 1599.52system 10:36.78elapsed 6159%CPU (0avgtext+0avgdata 2987252maxresident)k
5760inputs+117653320outputs (4390major+195450569minor)pagefaults 0swaps
37598.53user 1582.50system 10:38.87elapsed 6132%CPU (0avgtext+0avgdata 2969104maxresident)k
32inputs+116594048outputs (3511major+197468224minor)pagefaults 0swaps

Here are a couple runs after:

47659.28user 1896.30system 13:05.14elapsed 6311%CPU (0avgtext+0avgdata 2971780maxresident)k
240inputs+135449992outputs (2867major+230119654minor)pagefaults 0swaps
47593.52user 2015.31system 12:58.32elapsed 6373%CPU (0avgtext+0avgdata 2970252maxresident)k
0inputs+135327112outputs (2545major+238473909minor)pagefaults 0swaps

This is on a 72-core machine building a pretty large graph of Rust crates (many executables).

alexcrichton (Member) commented:

Interesting, not the results I would have expected!

If you have a lot of executables and you're running ThinLTO on all of them, then you're likely optimizing roughly the same code over and over. Before you added -Clto=thin everywhere, each crate was individually optimized fairly heavily, and those optimizations ran only once per crate instead of once per executable. Afterwards, per-crate ThinLTO no longer happens; it only happens at the end. This probably produces roughly equivalent code in terms of speed, but each executable has to run more optimizations.

tmandry (Member, Author) commented Jan 14, 2020

> This probably produces roughly equivalent code in terms of speed, but each executable has to run more optimizations.

Right, so by doing more of that work upfront in the rlibs, we save time optimizing the code for each executable.

I think there's still a more efficient way of doing all this. For one, we could disable the actual object file output for rlibs.

Also, when compiling the executable, the whole ThinLTO pipeline is run again on the individual crate bitcode before building summaries and distributing the codegen to workers. If there were a way to disable that step and tell the compiler to assume that rlib bitcode has already been optimized, we could save some time here. Perhaps we could make that automatic with a flag that emits the summary alongside the rlib, then detect that it's already present when compiling the executable.

Of course I'm talking about how to optimize a completely-unintended way of using Rust's ThinLTO pipeline. But since it happens to be faster, it might be worth experimenting with. :)

alexcrichton (Member) commented:

Heh, perhaps! In theory optimization pipelines are quite flexible and we can design basically whatever we want; it's all just a matter of implementation and measurement at some point.

tmandry (Member, Author) commented Jan 15, 2020

Teresa Johnson left some really helpful comments on the downsides of doing it this way. To summarize: it can affect what gets inlined (for better or for worse), and it disables whole-program devirtualization. We'd have to do some more tinkering with the pipeline, and more measurement, to avoid those issues.

tmandry (Member, Author) commented Jan 15, 2020

Also, a fix was landed upstream: https://reviews.llvm.org/D72711

alexcrichton (Member) commented:

Oh, something I forgot to reply to earlier:

> For one, we could disable the actual object file output for rlibs.

This is one of the subjects of #66961, actually, and yes, it would help quite a bit in your case, since the codegen happening in each crate isn't actually used.

> it can affect what gets inlined (for better or for worse)

A good point!

> it disables whole program devirtualization

It sounds like this only happens if we compile with the equivalent of -fwhole-program-vtables, which I don't think we're currently doing? If we ever do, though, it's good to keep in mind!

> Also, a fix was landed upstream: https://reviews.llvm.org/D72711

Yay!

tmandry (Member, Author) commented Jan 15, 2020

> This is one of the subjects of #66961, actually, and yes, it would help quite a bit in your case, since the codegen happening in each crate isn't actually used.

Ah that's all very relevant, thanks. Subscribed :)

> It sounds like this only happens if we compile with the equivalent of -fwhole-program-vtables, which I don't think we're currently doing? If we ever do, though, it's good to keep in mind!

I opened #68262 to track this. Not expecting it to happen anytime soon, but it seems good to have a tracking issue for it.

@tmandry mentioned this issue on Jan 21, 2020
bors added a commit referencing this issue on Jan 23, 2020
@bors closed this as completed in 13a91a8 on Jan 23, 2020