Compile fails with target_cpu on GitHub Actions for OSX - illegal instruction #162

Closed
leet4tari opened this issue Dec 9, 2020 · 18 comments · Fixed by #177

@leet4tari

When compiling a project on GitHub Actions for OSX with a target CPU other than native, the build fails with:

error: failed to run custom build command for `typenum v1.12.0`

Caused by:
  process didn't exit successfully: `/Users/runner/work/tari/tari/target/release/build/typenum-4114bf614ff41b71/build-script-main` (signal: 4, SIGILL: illegal instruction)
--- stdout
cargo:rustc-env=TYPENUM_BUILD_CONSTS=/Users/runner/work/tari/tari/target/release/build/typenum-e6ec2a2a9a37d166/out/consts.rs

warning: build failed, waiting for other jobs to finish...
error: build failed
Error: Process completed with exit code 101.

The same project builds fine locally on OSX, and the GitHub Actions builds for Ubuntu and Windows also succeed.

Investigation shows that the GHA OSX runner is using ivybridge hardware, whereas my machine is skylake and the GHA Ubuntu and Windows runners report skylake-avx512.

paholg added the bug label Mar 12, 2021
@paholg
Owner

paholg commented Mar 14, 2021

I have just switched typenum to use GitHub Actions. Could you share your setup so I can try to reproduce?

@manuelstein

Same here with both 1.12.0 and 1.13.0.

Details from /proc/cpuinfo:

vendor_id       : GenuineIntel
cpu family      : 6
model           : 45
model name      : Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz
stepping        : 7
microcode       : 0x71a
cpu MHz         : 1197.068
cache size      : 10240 KB
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4
apicid          : 6
initial apicid  : 6
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm arat pln pts md_clear flush_l1d
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips        : 4788.28
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual

@manuelstein

Sorry, my case has solved itself. It probably comes up a lot with Rust n00bs like me.
Compiling typenum 1.12.0 or 1.13.0 from source on its own went just fine.
Using it as a dependency in a project that had target-feature=+avx,+avx2,+sse4.2 in its .cargo/config was the problem, because the CPU doesn't have avx2 (the flags line above lists avx but not avx2).
Thx
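
For anyone hitting the same mismatch, here is a minimal sketch of a host check. It uses the standard library's is_x86_feature_detected! macro to list which of the three features from that target-feature setting the current machine lacks. The function name missing_host_features is made up for illustration. Build it without any target-cpu or target-feature flags: as far as I know, the macro treats features that were enabled at compile time as present, so the result would not tell you much otherwise.

```rust
/// Returns the features from `target-feature=+avx,+avx2,+sse4.2` that the CPU
/// running this program does not report. (Hypothetical helper, not part of typenum.)
fn missing_host_features() -> Vec<&'static str> {
    #[allow(unused_mut)]
    let mut missing = Vec::new();

    // The detection macro only exists on x86 / x86_64, so guard it.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if !is_x86_feature_detected!("avx") {
            missing.push("avx");
        }
        if !is_x86_feature_detected!("avx2") {
            missing.push("avx2");
        }
        if !is_x86_feature_detected!("sse4.2") {
            missing.push("sse4.2");
        }
    }

    missing
}

fn main() {
    let missing = missing_host_features();
    if missing.is_empty() {
        println!("host reports avx, avx2 and sse4.2 (or is not x86)");
    } else {
        println!("host is missing: {:?}", missing);
    }
}
```

On a CPU like the Xeon E5-2609 above, this would be expected to report avx2 as missing, which matches the failure.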

@NotSqrt

NotSqrt commented May 21, 2021

Follow-up question!

Is it expected to receive SIGILL: illegal instruction when cross-building typenum with -C target-cpu=broadwell from another type of CPU (ivybridge in this case)?

Thanks!

@paholg
Owner

paholg commented May 21, 2021

@NotSqrt I certainly wouldn't expect it. Typenum doesn't do anything cpu-specific, it just does a lot of type shenanigans. I would guess that your issue lies elsewhere, but if you have a clear repro I'd be happy to take a look.

@NotSqrt

NotSqrt commented May 21, 2021

Here is an excerpt of the build log, with RUSTFLAGS="-C target-cpu=broadwell":


cargo rustc --lib --manifest-path Cargo.toml --features pyo3/extension-module --release --verbose -- --crate-type cdylib --cfg=Py_3
   Compiling typenum v1.13.0
     Running `rustc --crate-name build_script_main --edition=2018 /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/typenum-1.13.0/build/main.rs --error-format=json --json=diagnostic-rendered-ansi --crate-type bin --emit=dep-info,link -C embed-bitcode=no -C debuginfo=2 -C debug-assertions=off -C metadata=de78301006a8644b -C extra-filename=-de78301006a8644b --out-dir /home/user/project/target/release/build/typenum-de78301006a8644b -L dependency=/home/user/project/target/release/deps --cap-lints allow -C target-cpu=broadwell`
     Running `/home/user/project/target/release/build/typenum-de78301006a8644b/build-script-main`
error: failed to run custom build command for `typenum v1.13.0`

Caused by:
  process didn't exit successfully: `/home/user/project/target/release/build/typenum-de78301006a8644b/build-script-main` (signal: 4, SIGILL: illegal instruction)
warning: build failed, waiting for other jobs to finish...
error: build failed
error: cargo failed with code: 101

Running gdb /home/user/project/target/release/build/typenum-de78301006a8644b/build-script-main gives:

Program received signal SIGILL, Illegal instruction.
0x000055555556997a in std::f64::<impl f64>::round (self=10) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/f64.rs:87
87              unsafe { intrinsics::roundf64(self) }

(gdb) bt
#0  0x000055555556997a in std::f64::<impl f64>::round (self=10) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/f64.rs:87
#1  0x000055555556a616 in build_script_main::main () at /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/typenum-1.13.0/build/main.rs:83

This is with rustc 1.52.1.

@NotSqrt

NotSqrt commented May 21, 2021

intrinsics::roundf64 seems to map to llvm.round.f64 in codegen, so I assume it is LLVM that makes build-script-main use a round specialized for AVX2 or similar, which is not available on ivybridge.

@paholg
Owner

paholg commented May 21, 2021

Interesting. The line that points to is just doing some very simple math:

let first2: u32 = (highest as f64).log(2.0).round() as u32 + 1;

@NotSqrt

NotSqrt commented May 21, 2021

Found the issue!

With this simple hello_world:

fn main() {
    let highest: u64 = 1024;
    let first2: u32 = (highest as f64).log(2.0).round() as u32 + 1; 
    println!("{}", first2);
}

RUSTFLAGS="-C target-cpu=broadwell" cargo run --release works.
But RUSTFLAGS="-C target-cpu=broadwell" cargo run fails.

The issue is the opt-level.
At opt-level > 0, the round seems to be eliminated.
But at opt-level = 0 (which is certainly the level that is used for the build/main.rs script), there is a small difference in the asm for the hello_world:

(files generated with RUSTFLAGS="--emit asm -C target-cpu=broadwell" cargo run and RUSTFLAGS="--emit asm -C target-cpu=ivybridge" cargo run respectively)

diff -C5 hello_round_broadwell.s hello_round_ivybridge.s 
*** hello_round_broadwell.s     2021-05-21 14:26:24.617976214 +0200
--- hello_round_ivybridge.s     2021-05-21 14:31:24.847106617 +0200
***************
*** 323,333 ****
  .Ltmp22:
        .loc    3 87 18 prologue_end
        vmovaps %xmm0, %xmm1
        vmovdqa .LCPI7_0(%rip), %xmm2
        vpand   %xmm2, %xmm1, %xmm2
!       vpbroadcastq    .LCPI7_1(%rip), %xmm1
        vpor    %xmm2, %xmm1, %xmm1
        vaddsd  %xmm1, %xmm0, %xmm1
        vroundsd        $11, %xmm1, %xmm0, %xmm0
        vmovsd  %xmm0, 16(%rsp)
        vmovsd  16(%rsp), %xmm0
--- 323,333 ----
  .Ltmp22:
        .loc    3 87 18 prologue_end
        vmovaps %xmm0, %xmm1
        vmovdqa .LCPI7_0(%rip), %xmm2
        vpand   %xmm2, %xmm1, %xmm2
!       vmovddup        .LCPI7_1(%rip), %xmm1
        vpor    %xmm2, %xmm1, %xmm1
        vaddsd  %xmm1, %xmm0, %xmm1
        vroundsd        $11, %xmm1, %xmm0, %xmm0
        vmovsd  %xmm0, 16(%rsp)
        vmovsd  16(%rsp), %xmm0

So potentially, using an opt-level other than 0 for your build/main.rs might solve this issue!
No idea if that's possible!

@NotSqrt

NotSqrt commented May 21, 2021

Progress!! But I get a new panic when I add this to my Cargo.toml:

[profile.release.build-override]
opt-level = 3

(Sorry, I had another panic, but it simply was OUT_DIR that was missing)

@NotSqrt

NotSqrt commented May 21, 2021

Actual error:

OUT_DIR=/tmp RUST_BACKTRACE=full gdb target/release/build/typenum-0333481abf8719b3/build-script-main:

Program received signal SIGILL, Illegal instruction.
build_script_main::tests::build_tests () at /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/typenum-1.13.0/build/tests.rs:267
267             write!(writer, "{}", uint_binary_test(a, "Shl", b, a << b))?;
(gdb) bt
#0  build_script_main::tests::build_tests () at /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/typenum-1.13.0/build/tests.rs:267
#1  0x0000555555561d20 in build_script_main::main () at /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/typenum-1.13.0/build/main.rs:181

@NotSqrt

NotSqrt commented May 21, 2021

And at opt-level = 1, it's yet another illegal instruction:

Program received signal SIGILL, Illegal instruction.

0x0000555555564118 in core::alloc::layout::Layout::padding_needed_for (self=<optimized out>, align=1) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/alloc/layout.rs:253
253             let len_rounded_up = len.wrapping_add(align).wrapping_sub(1) & !align.wrapping_sub(1);
(gdb) bt
#0  0x0000555555564118 in core::alloc::layout::Layout::padding_needed_for (self=<optimized out>, align=1) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/alloc/layout.rs:253
#1  0x00005555555642b3 in core::alloc::layout::Layout::repeat (self=0x7fffffffd048, n=8192) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/alloc/layout.rs:290
#2  0x00005555555641ce in core::alloc::layout::Layout::array (n=8192) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/alloc/layout.rs:394
#3  0x00005555555648e4 in alloc::raw_vec::RawVec<T,A>::allocate_in (capacity=1, init=alloc::raw_vec::AllocInit::Uninitialized, alloc=...)
    at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/raw_vec.rs:192
#4  0x0000555555566c7d in alloc::raw_vec::RawVec<T,A>::with_capacity_in (capacity=8192, alloc=...) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/raw_vec.rs:142
#5  0x0000555555566b29 in alloc::vec::Vec<T,A>::with_capacity_in (capacity=8192, alloc=...) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:572
#6  0x0000555555566b19 in alloc::vec::Vec<T>::with_capacity (capacity=8192) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:438
#7  0x00005555555666e5 in std::io::buffered::bufwriter::BufWriter<W>::with_capacity (capacity=8192, inner=0x7fffffffd184)
    at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/io/buffered/bufwriter.rs:110
#8  0x0000555555566719 in std::io::buffered::bufwriter::BufWriter<W>::new (inner=0x1) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/io/buffered/bufwriter.rs:92
#9  0x000055555555eb6c in build_script_main::tests::build_tests () at /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/typenum-1.13.0/build/tests.rs:250
#10 0x00005555555664e5 in build_script_main::main () at /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/typenum-1.13.0/build/main.rs:181

@BratSinot

I also have this problem. I'm trying to build a Docker container for AWS with AVX512, but my host CPU doesn't have AVX512. I tried using the Intel® Software Development Emulator, but Docker just ignores it.

@NotSqrt

NotSqrt commented Jul 21, 2021

I'm still a little puzzled by the fact that it's possible to cross-compile entire projects between platforms (e.g. from x86 to arm), but it seems much harder to compile for a target-cpu that has more features (e.g. AVX512) on the same platform, because the build.rs scripts are built with the target flags, not the host flags.

@Volker-Weissmann

Volker-Weissmann commented Nov 9, 2021

Update: I'm an idiot (and the documentation is shit)

If you run export RUSTFLAGS="-C target-cpu=znver1", then this flag is used when both build.rs and main.rs get compiled. So when the compiled build.rs is executed on the host, you (might) hit an illegal instruction.

(See https://users.rust-lang.org/t/different-rustflags-for-build-rs-and-main-rs/67080)
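
To see this for yourself, a small sketch: the program below reports which x86 features were enabled when it was itself compiled, using cfg!(target_feature = ...). The feature list and the function name compile_time_features are my own picks for illustration. Dropped into a build.rs (or run as a plain binary) with RUSTFLAGS set and no --target, I would expect it to list the features of the requested target-cpu rather than those of the host.

```rust
/// Lists which of a few x86 features were switched on at compile time for
/// this very binary. (Illustrative helper; the feature list is arbitrary.)
fn compile_time_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    if cfg!(target_feature = "avx") {
        enabled.push("avx");
    }
    if cfg!(target_feature = "avx2") {
        enabled.push("avx2");
    }
    if cfg!(target_feature = "bmi2") {
        enabled.push("bmi2");
    }
    if cfg!(target_feature = "avx512f") {
        enabled.push("avx512f");
    }
    enabled
}

fn main() {
    // With no extra flags on a plain x86_64 target this is typically empty.
    println!("compiled with: {:?}", compile_time_features());
}
```

Note that on a host lacking those features the binary may crash with SIGILL before it prints anything, which is exactly the symptom in this issue.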

Part 1

I have a similar problem on my sandybridge:

[weissmann@larissa typenum]$ export RUSTFLAGS="-C target-cpu=znver1"
[weissmann@larissa typenum]$ cargo build -v
   Compiling typenum v1.14.0 (/home/weissmann/typenum)
     Running `rustc --crate-name build_script_main --edition=2018 build/main.rs --error-format=json --json=diagnostic-rendered-ansi --crate-type bin --emit=dep-info,link -C embed-bitcode=no -C debuginfo=2 -C metadata=9b1e7437de2f8d9a -C extra-filename=-9b1e7437de2f8d9a --out-dir /home/weissmann/typenum/target/debug/build/typenum-9b1e7437de2f8d9a -C incremental=/home/weissmann/typenum/target/debug/incremental -L dependency=/home/weissmann/typenum/target/debug/deps -C target-cpu=znver1`
     Running `/home/weissmann/typenum/target/debug/build/typenum-9b1e7437de2f8d9a/build-script-main`
error: failed to run custom build command for `typenum v1.14.0 (/home/weissmann/typenum)`

Caused by:
  process didn't exit successfully: `/home/weissmann/typenum/target/debug/build/typenum-9b1e7437de2f8d9a/build-script-main` (signal: 4, SIGILL: illegal instruction)

Everything works fine without the target-cpu=znver1 flag.
It also works fine if I run it on my non-sandybridge cpu, even with the target-cpu=znver1 flag.

Is there any workaround other than removing the target-cpu=znver1 flag or using a different CPU for building?

Part 2

fn main() {
    let highest: u64 = 1024;
    let first2: u32 = (highest as f64).log(2.0).round() as u32 + 1; 
    println!("{}", first2);
}

If I put the code above in main.rs:

The code above builds fine, even on my sandybridge with target-cpu=znver1.
Running the code on my sandybridge only works without the target-cpu=znver1 flag. (This is not a bug, you are not supposed to execute znver1 code on a sandybridge.)

If I put the code above in my build.rs:

Building fails if I add the target-cpu=znver1 flag.

@Volker-Weissmann

the build.rs scripts are built with the target flags, not the host flags

Why?

@dunnock

dunnock commented Dec 23, 2021

The same issue happens to me when cross-compiling for Intel with RUSTFLAGS="-C target-cpu=cascadelake" from an amd64 machine. There is a related bug, rust-lang/cargo#6375. Is there any way around using a build script for typenum?

As mentioned in that issue, setting --target x86_64-unknown-linux-gnu helps; I assume it will also help with other architectures.
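
A rough sketch of that workaround, written as a small Rust wrapper around cargo. My understanding of the Cargo docs is that when --target is given explicitly, RUSTFLAGS are only passed to code built for the target, and build scripts (built for the host) do not receive them. The function name cargo_build_for and the triple/CPU values are just examples. The command is only constructed and printed here, not executed.

```rust
use std::process::Command;

/// Prepares (but does not run) a `cargo build` that sets RUSTFLAGS and passes
/// an explicit `--target`. (Hypothetical helper for illustration.)
fn cargo_build_for(target_triple: &str, target_cpu: &str) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(["build", "--release", "--target", target_triple]);
    cmd.env("RUSTFLAGS", format!("-C target-cpu={}", target_cpu));
    cmd
}

fn main() {
    let cmd = cargo_build_for("x86_64-unknown-linux-gnu", "cascadelake");
    // Show what would be run; call `cmd.status()` on a mutable binding
    // to actually start the build.
    println!("{:?}", cmd);
}
```

With an explicit --target the artifacts end up under target/<triple>/release instead of target/release, so scripts that pick up the binary need adjusting.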

@paholg
Owner

paholg commented Dec 26, 2021

My hope is that this is fixed in typenum 1.15. Please re-open the issue if it persists.

ruuda added a commit to ChorusOne/solana that referenced this issue Jul 7, 2022
Version 1.15.0 fixes an issue where code that runs at build-time would
be compiled with the target_cpu setting, and the target CPU might
support instructions that the host system does not have, causing a
SIGILL during the build.

See also [1] and [2].

[1]: paholg/typenum#162
[2]: paholg/typenum#177
CriesofCarrots pushed a commit to solana-labs/solana that referenced this issue Jul 7, 2022
Bump typenum from 1.14.0 to 1.15.0

Version 1.15.0 fixes an issue where code that runs at build-time would
be compiled with the target_cpu setting, and the target CPU might
support instructions that the host system does not have, causing a
SIGILL during the build.

See also [1] and [2].

[1]: paholg/typenum#162
[2]: paholg/typenum#177