LTO / aarch64 f64 error in LLVM, stable rust #71506

Closed
kali opened this issue Apr 24, 2020 · 7 comments
Labels
  • A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
  • C-bug (Category: This is a bug.)
  • E-needs-mcve (Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example)
  • I-ICE (Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️)
  • O-Arm (Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state)
  • T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)

Comments

@kali
Contributor

kali commented Apr 24, 2020

I'm currently trying to reduce a problem that appeared suddenly, with no significant code change in that area and no compiler bump. It's a long and painful process, as it happens only on GitHub Actions runners and only in the context of this relatively big project, so I am opening this in case somebody has an intuition about what is happening.

Code

As far as I can tell, the bug is triggered by https://github.com/snipsco/tract/blob/master/core/src/ops/nn/global_pools.rs#L85, but I have not yet managed to reproduce the ICE outside of tract. Since the output deals with f64, I assume it is happening for the D=f64 instantiation (the code is also instantiated for f32 and f16).

a + b.abs().powi(self.p as i32)
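For context, the flagged line is the per-element step of an Lp reduction. Below is a minimal standalone sketch, not tract's actual code: it assumes a plain `&[f64]` slice instead of tract's ndarray views, assumes `p` is a `usize` (so `p as i32` is a truncating integer cast, which would presumably correspond to the `trunc i64 ... to i32` in the error output), and uses hypothetical helper names `lp_sum` and `lp_norm`. The final p-th root follows the ONNX GlobalLpPool definition and may not match how tract structures the computation.

```rust
// Hypothetical standalone sketch of the reduction around
// `a + b.abs().powi(self.p as i32)`; names are illustrative only.

/// Sum of |x|^p over a slice. `p` is assumed to be a `usize`, so
/// `p as i32` truncates it to a 32-bit integer exponent.
fn lp_sum(values: &[f64], p: usize) -> f64 {
    values
        .iter()
        .fold(0.0_f64, |a, &b| a + b.abs().powi(p as i32))
}

/// Lp norm (sum |x|^p)^(1/p), as in the ONNX GlobalLpPool definition.
fn lp_norm(values: &[f64], p: usize) -> f64 {
    // `as` binds tighter than `/`, so this is 1.0 / (p as f64).
    lp_sum(values, p).powf(1.0 / p as f64)
}

fn main() {
    let values = [3.0, -4.0];
    println!("sum = {}", lp_sum(&values, 2)); // 9 + 16 = 25
    println!("norm = {}", lp_norm(&values, 2));
}
```

The `@llvm.powi.v2f64` call in the log indicates that LLVM had vectorized the `powi` over two f64 lanes; this sketch is only meant to show the shape of the computation and is not claimed to reproduce the ICE.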

Meta

rustc --version --verbose:

rustc 1.42.0 (b8cedc004 2020-03-09)

Error output

From https://github.com/snipsco/tract/runs/614684662?check_suite_focus=true (I am still trying to reduce this further):

2020-04-24T08:50:09.2448807Z Instruction does not dominate all uses!
2020-04-24T08:50:09.7821582Z   %1313 = trunc i64 %393 to i32
2020-04-24T08:50:09.7844941Z   %1297 = call <2 x double> @llvm.powi.v2f64(<2 x double> %1296, i32 %1313)
2020-04-24T08:50:09.7850640Z in function _ZN96_$LT$tract_core..ops..nn..global_pools..GlobalLpPool$u20$as$u20$tract_core..ops..StatelessOp$GT$4eval17h81fcf30ad93a69d3E
2020-04-24T08:50:09.7850869Z LLVM ERROR: Broken function found, compilation aborted!
@kali kali added C-bug Category: This is a bug. I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 24, 2020
@jonas-schievink jonas-schievink added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Apr 24, 2020
@kali
Contributor Author

kali commented Apr 24, 2020

A few things that did not make a difference:

  • GitHub Actions environment: both ubuntu-16.04 and ubuntu-18.04 have the issue
  • with or without dinghy
  • building all tract features or only the onnx feature
  • running rustup update (the environment comes with 1.42, while the update sets up 1.43)

BUT

  • disabling LTO makes it disappear

@kali kali changed the title aarch64 f64 error in LLVM (only repr. on github actions at this stage) LTO/aarch64 f64 error in LLVM (only repr. on github actions at this stage) Apr 24, 2020
@kali
Contributor Author

kali commented Apr 24, 2020

I have a test case here: https://github.com/kali/bug-rust-71506. Note that I could not reproduce the bug outside of GitHub Actions.

@fredszaq

fredszaq commented Apr 24, 2020

I can reproduce it on Arch Linux (using the linker from the aarch64-linux-gnu-gcc package).

Just run:

$ CARGO_TARGET_AARCH64_UNKNOWN_LINUX_GNU_LINKER=/usr/bin/aarch64-linux-gnu-gcc cargo build -vvv --target aarch64-unknown-linux-gnu --release

in a checkout of https://github.com/kali/bug-rust-71506.

@kali
Contributor Author

kali commented Apr 24, 2020

After some debugging, I can actually reproduce it on my workstation too. There is nothing magic about GitHub Actions.

It only appears on stable, not on beta, not on nightly.

@kali kali changed the title LTO/aarch64 f64 error in LLVM (only repr. on github actions at this stage) LTO / aarch64 f64 error in LLVM, stable rust Apr 24, 2020
@kali
Contributor Author

kali commented Apr 25, 2020

I made some progress isolating the issue further. In summary:

1/ it has been present for a while in terms of rustc versions (since at least 1.39.0)
2/ it does not appear on beta or nightly
3/ the latest ndarray version bump (0.13.0 to 0.13.1) triggered it; downgrading to 0.13.0 makes it go away
4/ LTO needs to be enabled
5/ I could not reproduce it without an intermediate library between ndarray and the final executable
6/ some weird patterns are needed in the library code (like the double call to into_shape())

Shamelessly trying to drag @bluss into this.

Updated procedure to reproduce

One does not even need an actual aarch64 linker to expose the bug, as it happens at the compilation stage. To reproduce:

1/ checkout https://github.com/kali/bug-rust-71506
2/ be on stable
3/ rustup target add aarch64-unknown-linux-gnu
4/ cargo build --release --target aarch64-unknown-linux-gnu

The bug we are after looks like:

Instruction does not dominate all uses!
  %379 = trunc i64 %225 to i32
  %364 = call <2 x double> @llvm.powi.v2f64(<2 x double> %361, i32 %379)
in function _ZN14bug_rust_715064main17heffb7675ad1d
LLVM ERROR: Broken function found, compilation aborted!

If you get linking errors, the compilation probably passed without triggering the bug; you just don't have the correct cross-linker set up.
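When comparing toolchains (stable vs. beta vs. nightly) or ndarray versions, it can help to sort captured build logs mechanically into "bug reproduced" versus "got as far as linking". A minimal sketch, assuming the output of `cargo build --release --target aarch64-unknown-linux-gnu` has been captured as a string; `classify` and `Outcome` are hypothetical names, the LLVM strings are copied from the error output quoted in this issue, and the linker strings are assumptions about rustc's usual link-failure wording, which may differ between versions and setups.

```rust
/// Possible outcomes of one reproduction attempt (hypothetical helper).
#[derive(Debug, PartialEq)]
enum Outcome {
    /// The LLVM verifier rejected the function: the bug reproduced.
    BugReproduced,
    /// Compilation reached the link step, so the bug was not triggered.
    ReachedLinker,
    /// Anything else: a clean build or an unrelated failure.
    Other,
}

/// Classify a captured build log by looking for known message fragments.
fn classify(build_log: &str) -> Outcome {
    // Both strings come from the LLVM error output in this issue.
    if build_log.contains("Instruction does not dominate all uses!")
        && build_log.contains("LLVM ERROR: Broken function found")
    {
        Outcome::BugReproduced
    // Assumed rustc wording for a failed or missing cross linker.
    } else if build_log.contains("error: linking with")
        || build_log.contains("error: linker `")
    {
        Outcome::ReachedLinker
    } else {
        Outcome::Other
    }
}

fn main() {
    let log = "Instruction does not dominate all uses!\n\
               LLVM ERROR: Broken function found, compilation aborted!";
    println!("{:?}", classify(log));
}
```

Requiring both LLVM fragments keeps an unrelated verifier failure from being counted as this bug, although a different broken function would still match.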

@fanninpm

I've confirmed that this also happens on the latest nightly on my platform (Mac).

@JohnTitor JohnTitor added the E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example label May 23, 2020
@jonas-schievink jonas-schievink added the O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state label May 23, 2020
@jonas-schievink
Contributor

@kali wrote: "It only appears on stable, not on beta, not on nightly."

In that case, closing as fixed. If it reappears, please let us know.


5 participants