Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

building for mipsel, link fails on missing symbol __sync_fetch_and_add_8, which doesn't exist on mips and shouldn't be getting called #112313

Open
SeanMollet opened this issue Jun 5, 2023 · 14 comments
Labels
A-code-coverage Area: Source-based code coverage (-Cinstrument-coverage) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ O-MIPS Target: MIPS processors T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@SeanMollet
Copy link

SeanMollet commented Jun 5, 2023

I'm trying to build for mipsel-unknown-linux-gnu
x86_64 host, ubuntu 22.04
Freshly built from github source toolchain, with profiler = true in config.toml and mipsel-unknown-linux-gnu target as well as the host target. Built with x.py dist, using the stage 2 results as my build toolchain.

Building anything with RUSTFLAGS="-C instrument-coverage" set, leads to the following link error on several object files :

Single example:

src/index.crates.io-6f17d22bba15001f/hyper-0.14.26/src/common/buf.rs:33: undefined reference to `__sync_fetch_and_add_8'

Building the same code without the flag works fine. Resulting binary works fine.

Per the following GCC bug, MIPS 32 doesn't support 8 byte atomic operations:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56300

In the project/target/mipsel-unknown-linux-gnu folder, running

grep -r __sync_fetch_and_add_8

Shows that nearly every file includes a call to this non-existent function. I assume it's coming from the profiler somehow.

Executing the same command in my toolchain build/x86_64-unknown-linux-gnu/stage2

grep: lib/rustlib/x86_64-unknown-linux-gnu/lib/libLLVM-16-rust-1.72.0-nightly.so: binary file matches
grep: lib/libLLVM-16-rust-1.72.0-nightly.so: binary file matches

This is where I get stuck. The mipsel rust standard lib is under that directory, so my grep should catch any calls to __sync_fetch_and_add_8 contained in the standard lib. There don't appear to be any. The only cases I find are in libLLVM, which is built for x86_64. It's reasonable for them to be there, since x86_64 does support __sync_fetch_and_add_8.

You'll have to forgive my ignorance here, I'm still quite new to rust. I don't understand the intricacies of the build process well enough to figure out how a call to __sync_fetch_and_add_8 is getting pulled into object files targeting MIPS, when the only library instance of a call to there is in the x86_64 library.

What I am fairly certain of though is that it shouldn't be getting pulled in.

@SeanMollet SeanMollet added C-bug Category: This is a bug. I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 5, 2023
@workingjubilee
Copy link
Member

If your hypothesis about this being the profiler's fault is true, the question is raised: Does LLVM even support profiling this target?

@SeanMollet
Copy link
Author

@workingjubilee Interesting idea that led to quite a deep rabbit hole. In short, LLVM implements the profiling bits using intrinsics that are implemented in C++. So, they should work on all platforms.

This prompted me to take a look at the llvm-ir to possible narrow down whether the problem is coming from Rust or from LLVM.

Hello world compiled with -C instrument-coverage:

; test::main
; Function Attrs: nonlazybind uwtable
define internal void @_RNvCs7qINHSN7ijU_4test4main() unnamed_addr #0 !dbg !323 {
start:
  %_2 = alloca %"core::fmt::Arguments<'_>", align 4
  %0 = atomicrmw add ptr @__profc__RNvCs7qINHSN7ijU_4test4main, i64 1 monotonic, align 8, !dbg !326
; call <core::fmt::Arguments>::new_const
  call void @_RNvMs0_NtCsivG1fSqMrJm_4core3fmtNtB5_9Arguments9new_constCs7qINHSN7ijU_4test(ptr sret(%"core::fmt::Arguments<'_>") %_2, ptr align 4 @alloc_a144b0fa8785ab0f3d8930436717ef6b, i32 1), !dbg !326
; call std::io::stdio::_print
  call void @_ZN3std2io5stdio6_print17h09840d83be27776fE(ptr %_2), !dbg !326
  ret void, !dbg !327
}

Without coverage:

; test::main
; Function Attrs: nonlazybind uwtable
define internal void @_ZN4test4main17hebc6b50887e73c33E() unnamed_addr #1 !dbg !323 {
start:
  %_2 = alloca %"core::fmt::Arguments<'_>", align 4
; call core::fmt::Arguments::new_const
  call void @_ZN4core3fmt9Arguments9new_const17h79e2b46816f36f5bE(ptr sret(%"core::fmt::Arguments<'_>") %_2, ptr align 4 @alloc_a144b0fa8785ab0f3d8930436717ef6b, i32 1), !dbg !326
; call std::io::stdio::_print
  call void @_ZN3std2io5stdio6_print17h09840d83be27776fE(ptr %_2), !dbg !326
  ret void, !dbg !327
}

Here's the offending line:

%0 = atomicrmw add ptr @__profc__RNvCs7qINHSN7ijU_4test4main, i64 1 monotonic, align 8, !dbg !326

That gets turned into the call to __sync_fetch_and_add_8 by LLVM.

So, now, the question is where does that come from?

@workingjubilee
Copy link
Member

So, now, the question is where does that come from?

A __profc_ symbol like that one is indeed profiler-related, so your earlier intuition was correct, or at least so it seems.

@SeanMollet
Copy link
Author

SeanMollet commented Jun 6, 2023

Well, it turns out it IS coming from LLVM after all. LLVM is hard coded to "lower" calls to the profile counter with atomic increments in many cases. Best I can figure out, the "lowering" was added behind a command line option, then later it was forced on in one of the build passes.

The annoying part is that it's just blindly injecting a specific instruction that has varying target support and, from the place where it's doing that, I can't find a obvious way to get an instance of the target class that knows whether this is or isn't supported. Turning that injection off fixes the problem.

I've written what I think is a decent patch that's easily extended if there's edge cases I've missed. I'm going to start the submission process with them.

SeanMollet/llvm-project@6ab037f

@workingjubilee workingjubilee added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-code-coverage Area: Source-based code coverage (-Cinstrument-coverage) labels Jun 19, 2023
@workingjubilee workingjubilee added the O-MIPS Target: MIPS processors label Jun 30, 2023
@Zalathar
Copy link
Contributor

Zalathar commented Jul 7, 2023

Possibly triggered by #111469, which instructs LLVM's coverage-instrumentation pass to generate atomic increments instead of non-atomic ones.

@SeanMollet
Copy link
Author

@Zalathar Great find! Yes, that's exactly what's causing it. I'll look into that section and come up with a patch.

@briansmith
Copy link
Contributor

I suspect that this issue affects any target that has max_atomic_width < 64.

In particular, this same issue seems to also affect powerpc-unknown-linux-gnu, based on my experience in briansmith/ring#1877. I am getting undefined symbol errors for __atomic_fetch_add_8. The target definition says the max_atomic_width is 32 bits so it isn't surprising that we don't have __atomic_fetch_add_8 without linking libatomic or something similar.

PR #113448 proposed a fix for the linking errors, to only use atomics on 64-bit targets. However, it was pointed out that is wrong on non-64-bit targets where max_atomic_width is 64 or more, like i686. There was some discussion in that PR about instead choosing to use atomics for the profiler based on whether max_atomic_width is 64 or more, amongst @cuviper, @workingjubilee, and @SeanMollet. However, if we were to do that, then we'd regress fix for #91092 (implemented in PR #111469) for these targets. Plus, some refactoring of the compiler is required.

It seems to me that we need to use libatomic or equivalent to emulate 64-bit atomics when 64-bit atomics are not implemented. However, some people metioned in PR #113448 that linking libatomic in these cases is not the right thing to do. But, that does seem to be what other (non-Rust) projects have done to resolve these issues, so I don't understand why not.

@briansmith
Copy link
Contributor

I suspect that this issue affects any target that has max_atomic_width < 64.

This should be labeled O-PowerPC in addition to O-mips. Also O-thumb? if there is a label for Thumb targets.

Based on llvm/llvm-project@3681be8 it seems like Clang added "prefer-atomic" which uses atomics iff the target supports them. So presumably there is mo precedent for atomic profiling data updates for 32-bit MIPS and 32-bit PowerPC, so the suggestion in the review of PR #113448 to turn off atomics for the profiler if max_atomic_width < 64 does seem reasonable. Since most targets seem to have max_atomic_width >= 64 maybe a hack to specifically disable for 32-bit targets when the arch is mips or powerpc would be acceptable, since it would avoid the need to refactor anything.

If that is done, then people would likely need to run tests in single-threaded mode when collecting code coverage for 32-bit MIPS and 32-bit PowerPC.

BTW, the reason it matters is that 32-bit MIPS and 32-bit PowerPC are also the only (AFAICT) Rust-supported 32-bit big-endian targets, so we have no code coverage instrumentation for any 32-bit-big-endian-specific code without supporting these targets.

@ecnelises
Copy link
Contributor

(CC'ing @bzEq for PowerPC atomic stuff)

@briansmith
Copy link
Contributor

Besides PowerPC and MIPS, this seems to also affect riscv32.

@urosbericsyrmia
Copy link

Since most targets seem to have max_atomic_width >= 64 maybe a hack to specifically disable for 32-bit targets when the arch is mips or powerpc would be acceptable, since it would avoid the need to refactor anything.

If that is done, then people would likely need to run tests in single-threaded mode when collecting code coverage for 32-bit MIPS and 32-bit PowerPC.

Would this solution be a reasonable trade-off? If not so, do you recommend trying something else?

jdmitrovic-syrmia added a commit to jdmitrovic-syrmia/llvm-project that referenced this issue Feb 28, 2024
Fixed 64-bit step can lead to creating atomic instructions
unsupported by the target architecture (see rust-lang/rust#112313).
jdmitrovic-syrmia added a commit to jdmitrovic-syrmia/llvm-project that referenced this issue Mar 1, 2024
Fixed 64-bit step can lead to creating atomic instructions
unsupported by the target architecture (see rust-lang/rust#112313).

When using atomics, type of the step is a pointer-sized integer.
Otherwise, type of the step is of a largest legal integer type.
jdmitrovic-syrmia added a commit to jdmitrovic-syrmia/llvm-project that referenced this issue Mar 4, 2024
Fixed 64-bit step can lead to creating atomic instructions
unsupported by the target architecture (see rust-lang/rust#112313).

When using atomics, type of the step is a pointer-sized integer.
Otherwise, type of the step is of a largest legal integer type.
jdmitrovic-syrmia added a commit to jdmitrovic-syrmia/llvm-project that referenced this issue Mar 11, 2024
Fixed 64-bit step can lead to creating atomic instructions
unsupported by the target architecture (see rust-lang/rust#112313).

When using atomics, type of the step is a pointer-sized integer.
Otherwise, type of the step is of a largest legal integer type.
@MaskRay
Copy link
Contributor

MaskRay commented Mar 19, 2024

llvm/llvm-project#83239 changing the parameter in LLVM IR from i64 to i32 is incorrect, and breaks the design intention to avoid counter overflows.

For Clang, __sync_fetch_and_add_8 is defined by libgcc_s.so.1 or compiler-rt/lib/builtins.
compiler-rt doesn't seem to provide a definition for mips.

@jdmitrovic-syrmia
Copy link

While testing this, the only way I could reproduce this behavior is when building with my custom LLVM build. When LLVM is built alongside everything else, libatomic is used as default when building for MIPS targets, and we pass -latomic as a linker flag:

if target.starts_with("mips") && target.contains("netbsd") {
// LLVM wants 64-bit atomics, while mipsel is 32-bit only, so needs -latomic
ldflags.exe.push(" -latomic");
ldflags.shared.push(" -latomic");
}

These are passed to CMake:

cfg.define("CMAKE_SHARED_LINKER_FLAGS", &ldflags.shared);
cfg.define("CMAKE_MODULE_LINKER_FLAGS", &ldflags.module);
cfg.define("CMAKE_EXE_LINKER_FLAGS", &ldflags.exe);

This will actually make the compiler to generate __atomic instead of the legacy __sync instructions.

So, when the compiler is built (without custom LLVM), here is the output of a random sample project build:

RUSTFLAGS='--emit=llvm-ir -C instrument-coverage' cargo +stage1 build --target mips-unknown-linux-gnu

/home/syrmia/rust-bin/src/main.rs:2: undefined reference to `__atomic_fetch_add_8'
/usr/lib/gcc-cross/mips-linux-gnu/10/../../../../mips-linux-gnu/bin/ld: /home/syrmia/rust-bin/src/main.rs:2: undefined reference to `__atomic_fetch_add_8'
collect2: error: ld returned 1 exit status

So, when we pass -latomic as follows:

RUSTFLAGS='-latomic --emit=llvm-ir -C instrument-coverage' cargo +stage1 build --target mips-unknown-linux-gnu

the build is passing.

I don't think this behavior is incorrect. GCC uses __atomic instructions by default when using 64-bit atomics on 32-bit MIPS.

@briansmith
Copy link
Contributor

briansmith commented May 7, 2024

Based on @jdmitrovic-syrmia's comment above, I modified my CI configuration for to force RUSTFLAGS=-latomic when measuring code coverage for target powerpc-unknown-linux-gnu; this fixed the linking failures and the code coverage reported by codecov.io increased accordingly. So, perhaps we should force -latomic when any feature that uses profiler builtins, in particular source-based code coverage, is enabled for these targets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-code-coverage Area: Source-based code coverage (-Cinstrument-coverage) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ O-MIPS Target: MIPS processors T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants