[LLVM] Segfault / "llvm::thinLTOInternalizeModule ... Assertion `GS != DefinedGlobals.end()' failed", again #67855

Closed
tmandry opened this issue Jan 4, 2020 · 18 comments · Fixed by #68435
Assignees: @tmandry
Labels:
• A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
• C-bug: Category: This is a bug.
• E-needs-mcve: Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example.
• I-crash: Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics.
• P-high: High priority.
• T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

tmandry (Member) commented Jan 4, 2020

I got a segfault / LLVM assertion failure while compiling a target with ThinLTO enabled. This is the same failure as in #53912, albeit with a different root cause.

In my case, I was dealing with a global named switch.table._ZN77_$LT$omaha_client..protocol..request..Event$u20$as$u20$core..clone..Clone$GT$5clone17h829b64c9ab982ff5E.llvm.10390335839252477638.llvm.11308644296266801080.

Note the presence of two .llvm.NNNN extensions. This means this was a local symbol that was promoted to a global twice. The implementation of getOriginalNameBeforePromote doesn't handle this case correctly; it strips off both extensions:

  /// Convenience method for creating a promoted global name
  /// for the given value name of a local, and its original module's ID.
  static std::string getGlobalNameForLocal(StringRef Name, ModuleHash ModHash) {
    SmallString<256> NewName(Name);
    NewName += ".llvm.";
    NewName += utostr((uint64_t(ModHash[0]) << 32) |
                      ModHash[1]); // Take the first 64 bits
    return NewName.str();
  }

  /// Helper to obtain the unpromoted name for a global value (or the original
  /// name if not promoted).
  static StringRef getOriginalNameBeforePromote(StringRef Name) {
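    // split() breaks on the *first* ".llvm.", so a name that was
    // promoted twice loses both promotion suffixes here.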
    std::pair<StringRef, StringRef> Pair = Name.split(".llvm.");
    return Pair.first;
  }

The fix is actually trivial: change split to rsplit so that only the last .llvm.NNNN extension is removed. Leaving this issue open to track the fix being cherry-picked.
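For reference, a minimal sketch of the corrected helper (mirroring the one-word change described above, against the same StringRef API as the snippet quoted earlier):

  /// Helper to obtain the unpromoted name for a global value (or the original
  /// name if not promoted).
  static StringRef getOriginalNameBeforePromote(StringRef Name) {
    // rsplit() breaks on the *last* ".llvm.", so only the most recent
    // promotion suffix is stripped; when the separator is absent, the
    // whole name comes back in .first, preserving the unpromoted case.
    std::pair<StringRef, StringRef> Pair = Name.rsplit(".llvm.");
    return Pair.first;
  }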

@jonas-schievink added the A-LLVM, I-crash, T-compiler, C-bug, and I-nominated labels on Jan 4, 2020
tmandry (Member, Author) commented Jan 6, 2020

Filed a bug upstream.

tmandry (Member, Author) commented Jan 6, 2020

This might still be a rustc bug, since apparently a double-promotion is unexpected. Working on a minimal repro now.

pnkfelix (Member) commented Jan 9, 2020

@tmandry is this something that we could/should be pulling in the ICE-breakers on?

pnkfelix (Member) commented Jan 9, 2020

triage: P-high. Assigning to @tmandry and removing the I-nominated label.

@pnkfelix added the P-high label and removed I-nominated on Jan 9, 2020
tmandry (Member, Author) commented Jan 9, 2020

Possibly, but I’ll need to put a usable reproducer somewhere. That tends to be more difficult when the target is Fuchsia.

I’ve had limited success doing all the minimizing myself. I’ll see if I can reproduce on other targets, and upload a reproducer if so.

@jonas-schievink added the E-needs-mcve label on Jan 9, 2020
tmandry (Member, Author) commented Jan 10, 2020

So, from the output of -Csave-temps, what appears to be happening is that we run the same bitcode through the ThinLTO optimization pipeline twice: once when outputting an rlib, and once when linking to that rlib.

More specifically:

  1. An rlib crate (omaha_client in this case) gets compiled with ThinLTO and optimizations enabled. We run the bitcode through the ThinLTO optimization pipeline.
  • -Csave-temps produces a lot of files that look like libomaha_client.omaha_client.3a1fbbbh-cgu.15.rcgu.thin-lto-after-rename.bc
  2. A binary crate foo is compiled which depends on the rlib. It is also compiled with ThinLTO and optimizations.
  • -Csave-temps again produces a lot of files that look like foo.omaha_client.3a1fbbbh-cgu.15.rcgu.thin-lto-after-rename.bc (except that after-internalize and all later stages are missing, because the compiler crashes)
  • the foo.omaha_client.3a1fbbbh-cgu.15.rcgu.thin-lto-input.bc file is the output of the ThinLTO pipeline from the previous step

The offending symbol is promoted to a global once in the first optimization run and again in the second, which accounts for the two .llvm.xxxxx suffixes on it. The crash occurs because LLVM does not expect a symbol to be promoted twice.
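To make the failure mode concrete, here is a small standalone sketch (plain C++, with std::string standing in for StringRef; promote, stripFirst, and stripLast are hypothetical helpers mimicking the LLVM functions quoted earlier, and the name is shortened while reusing the hash suffixes from the symbol above):

  #include <cassert>
  #include <string>

  // Mimics getGlobalNameForLocal: each promotion round appends one
  // ".llvm.<hash>" suffix to the local's name.
  static std::string promote(const std::string &Name, const std::string &Hash) {
    return Name + ".llvm." + Hash;
  }

  // Mimics the buggy getOriginalNameBeforePromote: splitting on the
  // *first* ".llvm." discards every promotion suffix at once.
  static std::string stripFirst(const std::string &Name) {
    return Name.substr(0, Name.find(".llvm."));
  }

  // Mimics the fixed version: splitting on the *last* ".llvm." removes
  // only the most recent promotion suffix.
  static std::string stripLast(const std::string &Name) {
    return Name.substr(0, Name.rfind(".llvm."));
  }

  int main() {
    // The first ThinLTO run promotes the local; the second run, fed the
    // output of the first, promotes it again.
    std::string Once = promote("switch.table.clone", "10390335839252477638");
    std::string Twice = promote(Once, "11308644296266801080");

    // Buggy lookup: both suffixes vanish, so a lookup keyed on the
    // once-promoted name recorded by the first run fails, and the
    // internalize assertion fires.
    assert(stripFirst(Twice) == "switch.table.clone");

    // Fixed lookup: only the last suffix is stripped, recovering the
    // name the first run produced.
    assert(stripLast(Twice) == Once);
    return 0;
  }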

I can't think of a reason why we would run the same pipeline twice on the same bitcode. Is there one?

tmandry (Member, Author) commented Jan 10, 2020

cc @alexcrichton

alexcrichton (Member) commented:

@tmandry are you compiling with -Clto=thin, or just normal -Copt-level=3? The former will run some bitcode through ThinLTO twice, but the latter shouldn't. If the bug happens by running bitcode through twice, that's likely where it's coming from.

tmandry (Member, Author) commented Jan 10, 2020

@alexcrichton Ahh you were right, it has to do with the flags!

For the rlib I wasn't passing -Clto=thin, but for the executable I was. As soon as I started passing -Clto=thin for both, the problem went away and we only sent the code through the pipeline once.

(EDIT: I was using -Copt-level=z in both)

Apparently we've been doing this for a very long time in Fuchsia without problems. So is it expected that we should be passing -Clto=thin to both?

alexcrichton (Member) commented:

Oh weird, I didn't expect that to fix anything, nor did I realize the compiler behaved this way!

It looks like it boils down to this match statement, which explains the behavior:

  • Passing -Clto=thin for the final executable means that all bitcode will run through ThinLTO. If the rlib bitcode was previously run through ThinLTO, it gets run through twice.
  • That match statement, however, means that if you pass -Clto=thin (Lto::Thin) when compiling an rlib, we skip the ThinLTO step there, assuming it will happen at the end.

So it looks like, yes, you'll want to pass -Clto=thin to all compilations, not just the final one. And that should do the trick! I would still consider this an LLVM bug though because it should in theory work both ways...
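To illustrate, here is a hedged model of that decision (a C++ sketch standing in for rustc's Rust internals; runsPerCrateThinLTO and the exact enum layout are illustrative assumptions, not the actual match statement):

  #include <cassert>

  // Models rustc's LTO setting as described above (Lto::Thin is named in
  // the comment; the other variants are assumptions for illustration).
  enum class Lto { No, ThinLocal, Thin, Fat };

  // Hypothetical helper modeling the behavior described above: with
  // -Clto=thin (Lto::Thin), per-crate ThinLTO is skipped on the
  // assumption the final link will run it; a plain optimized build
  // (modeled as ThinLocal) runs ThinLTO within each crate.
  static bool runsPerCrateThinLTO(Lto SessionLto) {
    switch (SessionLto) {
    case Lto::ThinLocal: return true;  // ThinLTO across the crate's own CGUs
    case Lto::Thin:      return false; // deferred to the final link
    case Lto::Fat:       return false; // full LTO happens at the end instead
    case Lto::No:        return false;
    }
    return false;
  }

  int main() {
    // The mismatch from this thread: an rlib built without -Clto=thin runs
    // per-crate ThinLTO, and an executable built with -Clto=thin then runs
    // ThinLTO again over that already-processed bitcode.
    assert(runsPerCrateThinLTO(Lto::ThinLocal));
    assert(!runsPerCrateThinLTO(Lto::Thin));
    return 0;
  }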

tmandry (Member, Author) commented Jan 11, 2020

Agreed on this still being a bug.

Also, as it turns out, the old way of doing it is faster for our build! Presumably because we're doing more work up front.

For posterity, I'm posting the data here.

Here are a couple runs before adding -Clto=thin to the rlib builds:

37625.49user 1599.52system 10:36.78elapsed 6159%CPU (0avgtext+0avgdata 2987252maxresident)k
5760inputs+117653320outputs (4390major+195450569minor)pagefaults 0swaps
37598.53user 1582.50system 10:38.87elapsed 6132%CPU (0avgtext+0avgdata 2969104maxresident)k
32inputs+116594048outputs (3511major+197468224minor)pagefaults 0swaps

Here are a couple runs after:

47659.28user 1896.30system 13:05.14elapsed 6311%CPU (0avgtext+0avgdata 2971780maxresident)k
240inputs+135449992outputs (2867major+230119654minor)pagefaults 0swaps
47593.52user 2015.31system 12:58.32elapsed 6373%CPU (0avgtext+0avgdata 2970252maxresident)k
0inputs+135327112outputs (2545major+238473909minor)pagefaults 0swaps

This is on a 72-core machine building a pretty large graph of Rust crates (many executables).

alexcrichton (Member) commented:

Interesting, not the results I would have expected!

If you have a lot of executables and you're running ThinLTO on all of them, then you're likely optimizing roughly the same code over and over. Before you added -Clto=thin everywhere, each crate was individually optimized fairly heavily, and those optimizations ran only once per crate instead of once per executable. Afterwards, per-crate ThinLTO no longer happens; it only happens at the end. This probably produces roughly equivalent code in terms of speed, but each executable has to run more optimizations.

tmandry (Member, Author) commented Jan 14, 2020

> This probably produces roughly equivalent code in terms of speed, but each executable has to run more optimizations.

Right, so by doing more of that work upfront in the rlibs, we save time optimizing the code for each executable.

I think there's still a more efficient way of doing all this. For one, we could disable the actual object file output for rlibs.

Also, when compiling the executable, the whole ThinLTO pipeline is run again on the individual crate bitcode before building summaries and distributing the codegen to workers. If there were a way to disable that step and tell the compiler to assume that rlib bitcode has already been optimized, we could save some time here. Perhaps we could make that automatic with a flag that emits the summary alongside the rlib, then detect that it's already present when compiling the executable.

Of course I'm talking about how to optimize a completely-unintended way of using Rust's ThinLTO pipeline. But since it happens to be faster, it might be worth experimenting with. :)

alexcrichton (Member) commented:

Heh, perhaps! In theory optimization pipelines are quite flexible and we can design basically whatever we want; it's all just a matter of implementation and measurement at some point.

tmandry (Member, Author) commented Jan 15, 2020

Teresa Johnson left some really helpful comments on the downsides of doing it this way. To summarize: it can affect what gets inlined (for better or for worse), and it disables whole-program devirtualization. We'd have to do some more tinkering with the pipeline, and more measurement, to avoid those issues.

tmandry (Member, Author) commented Jan 15, 2020

Also, a fix was landed upstream: https://reviews.llvm.org/D72711

alexcrichton (Member) commented:

Oh, something I forgot to reply to earlier:

> For one, we could disable the actual object file output for rlibs.

This is one of the subjects of #66961, actually, and yes, it would help quite a bit in your case, since the codegen happening in each crate isn't actually used.

> it can affect what gets inlined (for better or for worse)

A good point!

> it disables whole program devirtualization

It sounds like this only happens if we compile with the equivalent of -fwhole-program-vtables, which I don't think we're currently doing? If we ever do, though, it's good to keep in mind!

> Also, a fix was landed upstream: https://reviews.llvm.org/D72711

Yay!

tmandry (Member, Author) commented Jan 15, 2020

> This is one of the subjects of #66961, actually, and yes, it would help quite a bit in your case, since the codegen happening in each crate isn't actually used.

Ah that's all very relevant, thanks. Subscribed :)

> It sounds like this only happens if we compile with the equivalent of -fwhole-program-vtables, which I don't think we're currently doing? If we ever do, though, it's good to keep in mind!

I opened #68262 to track this. Not expecting it to happen anytime soon, but it seems good to have a tracking issue for it.

@tmandry mentioned this issue on Jan 21, 2020
bors added a commit referencing this issue on Jan 23, 2020
@bors closed this as completed in 13a91a8 on Jan 23, 2020