Linking musl with mold causes issues with global variables from libc #1071

Closed
aabacchus opened this issue Jul 28, 2023 · 20 comments

@aabacchus

mold version: 2.0.0
musl version: 1.2.4

I recently rebuilt musl and used mold to link it, and subsequently experienced segfaults and bugs in a lot of random programs. After some digging, I found that the problems were all from globals from musl (program_invocation_short_name and optind in particular). Using a different linker to link musl fixed the problems.

Interestingly, programs built with clang didn't have these problems. Consider this C program:

#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>

int
main(void) {
	puts(program_invocation_short_name);
	return 0;
}

This program, built with GCC against musl linked with mold, segfaults when puts tries to dereference a NULL pointer.

The difference between clang and GCC is how the global is accessed. GCC does this:

        movq    program_invocation_short_name(%rip), %rdi
        call    puts@PLT

but clang does this:

        movq    program_invocation_short_name@GOTPCREL(%rip), %rax
        movq    (%rax), %rdi
        call    puts@PLT

I have confirmed that the use of @GOTPCREL fixes the GCC program.

The first version always gets NULL, the second gets the correct value initialised by musl.
Similarly with optind, in GCC programs, optind is always 1 even after calling getopt, but clang programs can read the updated value.
Now, this is quickly approaching the limits of my understanding. Please let me know if I can help with more testing.
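
A minimal sketch for checking the optind symptom without waiting for a crash, assuming a POSIX `getopt`. The helper name `optind_after_getopt` and the fixed argument list are hypothetical and chosen only for illustration. On a correctly linked libc the function should return 2 (the index of the first non-option argument), whereas the behaviour described above would leave it at 1.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX declarations of getopt/optind */
#include <stdio.h>
#include <unistd.h>              /* getopt, optind */

/* Hypothetical helper: parses the fixed command line "prog -a file" and
 * returns the value of optind that this executable observes afterwards.
 * libc updates optind inside getopt; the question is whether the
 * executable reads the same variable that libc wrote. */
int optind_after_getopt(void) {
    char *argv[] = { "prog", "-a", "file", NULL };
    int argc = 3;

    optind = 1;  /* restart scanning from the first argument */
    while (getopt(argc, argv, "a") != -1)
        ;        /* consume every option; only optind matters here */
    return optind;
}

/* Prints the observation so that two builds of this file can be compared. */
void report_optind(void) {
    printf("optind after getopt: %d\n", optind_after_getopt());
}
```

Calling `report_optind()` from `main` and building the same file once with GCC and once with clang should make the difference in global access visible as a difference in the printed value.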

This happened to me once before a few months ago, but since then I had forgotten how I fixed it.

@rui314
Owner

rui314 commented Jul 29, 2023

I couldn't reproduce the issue on my machine, so I need your input files. Can you run the last link command with --repro (or -Wl,--repro)? With that option, mold collects all input object files and puts them into a tar file. Please upload the generated tar file here so that I can download it. Thanks.

@aabacchus
Author

Attached is the tarball for the program which segfaults for me. Would you rather have the tarball from linking musl itself?
gcc_bad.repro.tar.gz

Statically linked executables don't have this problem.

@rui314
Owner

rui314 commented Jul 31, 2023

I built your program with the given tarball, and the resulting executable worked without crashing in my Alpine/musl Docker container. It is likely that the executable itself isn't actually broken.

You wrote that you built musl yourself. Are you sure your musl is fine?

@aabacchus
Author

> in my Alpine/musl

If you provided a different libc.so, then yes it would have worked. Here is the tarball of the link step for musl:
libc.so.repro.tar.gz

Yes, my musl is fine when linked with other linkers.

@rui314
Owner

rui314 commented Jul 31, 2023

It seems your reproducer really only fails when it is loaded by your musl libc.so. I built musl 1.2.4 myself and tried to run your program under my musl (i.e. run the program as /path/to/musl/builddir/libc.so gcc_bad) and it didn't crash.

The fact that your program didn't crash with other linkers doesn't immediately mean that your musl is fine; it might happen to work for some program (think C's undefined behavior).

How did you build your musl? What is your distro? How can I reproduce your binaries from scratch?

I also want to make sure you didn't apply your local patch to your musl.

@aabacchus
Author

aabacchus commented Jul 31, 2023

To clarify, were you able to use my libc.so.repro.tar.gz to link a libc.so, which did not crash? That's bizarre. Maybe the compiler used for musl is also important.

It's not just this one-off; it's a large number of programs which crash or have bugs.

I have not patched musl, it is built normally (./configure; make in a fresh tarball reproduces the bug). My distribution is KISS, and we do patch mold to build only for amd64, but removing the patch I can still reproduce this. If you'd like some brief instructions to set up a KISS chroot let me know.

@rui314
Owner

rui314 commented Aug 1, 2023

I could reproduce the issue with the musl built from your object files, but that's not really debuggable because it's just .o files. It's not that different from libc.so in libc.so.repro.tar.gz from the debugging point of view.

If KISS Linux provides an official docker image, I can fire it up and try it myself.

@aabacchus
Author

We don't have an official docker image but I've created one. I think it should work if you run

docker run -it aabacchus/kiss sh

(the image is here). I'm not particularly familiar with Docker but I have tested it and can still reproduce the issue.

When you are in the image, you will have to do the following:

  • First, build mold and its dependencies. When it is finished it will prompt you to press Enter to install the packages.
$ kiss b mold
  • Switch mold to provide /usr/bin/ld
$ kiss a mold /usr/bin/ld
  • Rebuild musl, now with mold as the linker
$ kiss b musl
  • Trigger the bug
$ cat >test.c <<EOF
#define _GNU_SOURCE
#include <stdio.h>
#include <errno.h>
int main(void) {
        puts(program_invocation_short_name);
        return 0;
}
EOF
$ cc test.c
$ ./a.out
Segmentation fault (core dumped)

@rui314
Owner

rui314 commented Aug 9, 2023

Thanks for the info. How can I build musl with debug info?

@aabacchus
Author

Sure. You need to go into the repository for musl and edit its build script:

cd ~/repos/repo/core/musl/
vi build

Uncomment the :>nostrip line (which tells kiss not to strip the libraries) and uncomment the --enable-debug flag to configure. You should also delete the comment line above --enable-debug so that the flag is correctly passed to configure.

If you want to be able to step through the source while debugging, you'll need to add something like this to the top of the build file:

export CFLAGS="$CFLAGS -fdebug-prefix-map=$PWD=/usr/src/musl-1.2.4"

and then put the musl source in /usr/src/musl-1.2.4:

mkdir -p /usr/src
cd /usr/src
kiss d musl
tar xzf ~/.cache/kiss/sources/musl/musl-1.2.4.tar.gz

Finally you can kiss b musl.

@LinuxUserGD
Contributor

> I recently rebuilt musl and used mold to link it, and subsequently experienced segfaults and bugs in a lot of random programs.

Mimalloc pointers (see microsoft/mimalloc#360 (comment) and https://bugs.gentoo.org/917089) are somehow pointing to the wrong heap space after linking musl with mold, causing segfaults when compiling with Clang.

I can reproduce it with a Gentoo stage3 tarball:
https://distfiles.gentoo.org/releases/amd64/autobuilds/current-stage3-amd64-musl-llvm/

CMake Error at /usr/share/cmake/Modules/CMakeTestCCompiler.cmake:67 (message):
  The C compiler

    "/usr/lib/llvm/16/bin/x86_64-gentoo-linux-musl-clang"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /var/tmp/portage/sys-libs/libcxx-16.0.6/work/runtimes_build-abi_x86_64.amd64/CMakeFiles/CMakeScratch/TryCompile-ankcJC
    
    Run Build Command(s):/usr/bin/ninja -v cmTC_391b2 && [1/2] /usr/lib/llvm/16/bin/x86_64-gentoo-linux-musl-clang    -O2 -pipe -march=native -mtune=native -D_FORTIFY_SOURCE=3 -g0 -flto -MD -MT CMakeFiles/cmTC_391b2.dir/testCCompiler.c.o -MF CMakeFiles/cmTC_391b2.dir/testCCompiler.c.o.d -o CMakeFiles/cmTC_391b2.dir/testCCompiler.c.o -c /var/tmp/portage/sys-libs/libcxx-16.0.6/work/runtimes_build-abi_x86_64.amd64/CMakeFiles/CMakeScratch/TryCompile-ankcJC/testCCompiler.c
    [2/2] : && /usr/lib/llvm/16/bin/x86_64-gentoo-linux-musl-clang -O2 -pipe -march=native -mtune=native -D_FORTIFY_SOURCE=3 -g0 -flto -O2 -pipe -march=native -mtune=native -D_FORTIFY_SOURCE=3 -g0 -Wl,-O3 -Wl,--as-needed -Wl,--strip-debug -Wl,--undefined-version -Wl,--icf=safe -Wl,--threads=4 -Wl,--compress-debug-sections=none -fuse-ld=mold -rtlib=compiler-rt -unwindlib=libunwind CMakeFiles/cmTC_391b2.dir/testCCompiler.c.o -o cmTC_391b2   && :
    FAILED: cmTC_391b2 
    : && /usr/lib/llvm/16/bin/x86_64-gentoo-linux-musl-clang -O2 -pipe -march=native -mtune=native -D_FORTIFY_SOURCE=3 -g0 -flto -O2 -pipe -march=native -mtune=native -D_FORTIFY_SOURCE=3 -g0 -Wl,-O3 -Wl,--as-needed -Wl,--strip-debug -Wl,--undefined-version -Wl,--icf=safe -Wl,--threads=4 -Wl,--compress-debug-sections=none -fuse-ld=mold -rtlib=compiler-rt -unwindlib=libunwind CMakeFiles/cmTC_391b2.dir/testCCompiler.c.o -o cmTC_391b2   && :
    mimalloc: error: mi_free: pointer does not point to a valid heap space: 0x7f1fbad089b0
    clang-16: error: unable to execute command: Segmentation fault (core dumped)
    clang-16: error: linker command failed due to signal (use -v to see invocation)
    ninja: build stopped: subcommand failed.

@rui314
Owner

rui314 commented Nov 10, 2023

I built mold in the gentoo:stage3-musl docker container, replaced /usr/bin/ld with mold, built musl with emerge musl and built clang with emerge clang. All of it worked fine. I didn't observe any failures. How exactly can I reproduce the issue?

@LinuxUserGD
Contributor

@rui314 Should be reproducible in a stage3-musl-llvm chroot after recompiling llvm with binutils-plugin and recompiling musl with clang and ld.mold

emerge --sync
echo "sys-devel/llvm binutils-plugin" > /etc/portage/package.use/custom
emerge -1 =sys-devel/llvm-16.0.6 --exclude=llvm:17 && emerge sys-libs/mold
  • replace /etc/portage/make.conf with
COMMON_FLAGS="-O2 -pipe -march=native -mtune=native -D_FORTIFY_SOURCE=3 -g0 -flto"
CC="clang"
CXX="clang++"
CFLAGS="${COMMON_FLAGS}"
CXXFLAGS="${COMMON_FLAGS} -stdlib=libc++"
FCFLAGS="${COMMON_FLAGS}"
FFLAGS="${COMMON_FLAGS}"
LDFLAGS="${COMMON_FLAGS} ${LDLIBS} -Wl,-O3 -Wl,--as-needed -Wl,--strip-debug -Wl,--undefined-version -Wl,--icf=safe -Wl,--threads=4 -Wl,--compress-debug-sections=none -fuse-ld=mold -rtlib=compiler-rt -unwindlib=libunwind"
CHOST="x86_64-gentoo-linux-musl"
ACCEPT_KEYWORDS="amd64 ~amd64"
LD="ld.mold"
LC_MESSAGES=C
EMERGE_DEFAULT_OPTS="${EMERGE_DEFAULT_OPTS}"
MAKEOPTS="-j4"
emerge -1 =sys-libs/musl-1.2.3* sys-libs/libcxx --exclude=sys-devel/llvm

@aabacchus
Author

@LinuxUserGD isn't it mold segfaulting in your case, not a program linked to musl built with mold?

@LinuxUserGD
Contributor

> @LinuxUserGD isn't it mold segfaulting in your case, not a program linked to musl built with mold?

Yes, mold segfaults with -flto when musl is compiled with mold.
After rebuilding musl with lld, linking with mold completes without the mimalloc error.

Starting program: /usr/bin/ld.mold -pie --hash-style=gnu --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib/ld-musl-x86_64.so.1 -o a.out /lib/Scrt1.o /lib/crti.o /usr/lib/llvm/16/bin/../../../../lib/clang/16/lib/linux/clang_rt.crtbegin-x86_64.o -L/lib -L/usr/lib -plugin /usr/lib/llvm/16/bin/../lib/LLVMgold.so -plugin-opt=mcpu=skylake -plugin-opt=O2 -z relro -z now -O3 --as-needed --strip-debug --undefined-version --icf=safe --threads=4 --compress-debug-sections=none /tmp/check_cxx11-b34c02.o -lc++ -lm /usr/lib/llvm/16/bin/../../../../lib/clang/16/lib/linux/libclang_rt.builtins-x86_64.a --as-needed -lunwind --no-as-needed -lc /usr/lib/llvm/16/bin/../../../../lib/clang/16/lib/linux/libclang_rt.builtins-x86_64.a --as-needed -lunwind --no-as-needed /usr/lib/llvm/16/bin/../../../../lib/clang/16/lib/linux/clang_rt.crtend-x86_64.o /lib/crtn.o
[Detaching after fork from child process 232385]
mimalloc: error: mi_free: pointer does not point to a valid heap space: 0x7ffff7e36c50

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7fd7de7 in setjmp () from /lib/ld-musl-x86_64.so.1

@aabacchus
Author

@rui314 I made a Docker image in which the above commands have already been run, so it contains the buggy musl. Just run

docker run -it aabacchus/test sh
cc test.c
./a.out

to reproduce.

@rui314
Owner

rui314 commented Nov 11, 2023

Thank you, everyone. I successfully reproduced the issue following your instructions. It's a challenging issue to debug, but it appears to be related to a subtle bug in weak symbol handling. I will prepare a fix.

@rui314 rui314 closed this as completed in da3f5dd Nov 12, 2023
@rui314
Owner

rui314 commented Nov 12, 2023

This was a bad bug, thank you again for reporting. I believe the above commit fixed the issue. Can you try again with the git head?

@aabacchus
Author

da3f5dd

It seems to be fixed, thank you!

@LinuxUserGD
Contributor

LinuxUserGD commented Nov 12, 2023

The mimalloc segfault is fixed by da3f5dd as well, thanks!

VitalyAnkh pushed a commit to VitalyAnkh/mold that referenced this issue Dec 23, 2023
--dynamic-list, --export-dynamic-symbol and --export-dynamic-symbol-list
have different semantics for executables and DSOs. If the output is an
executable, they specify a list of symbols that are to be exported.
If the output is a shared object, they specify the list of symbols that
are to be interposable.

mold hasn't implemented the latter semantics. This commit fixes that
issue.

Fixes rui314#1071
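
One way to picture why interposability matters for the symptoms in this issue is the toy model below. It assumes that the GCC-built executable reaches the libc global through its own copy of the variable (a copy relocation), which the thread does not state explicitly. Plain C variables stand in for the ELF objects and the GOT slot, and every name (`lib_optind`, `exe_optind`, `lib_got_optind`, `model_load`, and so on) is hypothetical. Nothing here is mold's or musl's actual code.

```c
#include <stdio.h>

/* Toy model only: ordinary variables play the roles of ELF objects. */
static int lib_optind = 1;   /* the definition inside the "shared library"   */
static int exe_optind;       /* the executable's own copy of the variable    */
static int *lib_got_optind;  /* the library's GOT slot for the symbol        */

/* Startup in this model: copy the initial value into the executable and
 * point the library's GOT slot at the executable's copy. */
void model_load(void) {
    lib_optind = 1;
    exe_optind = lib_optind;
    lib_got_optind = &exe_optind;
}

/* Library code that treats the symbol as interposable: it writes through
 * the GOT slot, so the update lands in the executable's copy. */
void lib_update_interposable(int value) {
    *lib_got_optind = value;
}

/* Library code bound directly to its own definition: the update never
 * reaches the executable's copy. */
void lib_update_bound_locally(int value) {
    lib_optind = value;
}

/* What the executable reads with a direct (non-GOT) access. */
int exe_read_optind(void) {
    return exe_optind;
}

/* Prints what the executable sees in each case. */
void model_demo(void) {
    model_load();
    lib_update_interposable(3);
    printf("via GOT: executable reads %d\n", exe_read_optind());

    model_load();
    lib_update_bound_locally(3);
    printf("bound locally: executable reads %d\n", exe_read_optind());
}
```

In this model the second case leaves the executable with the stale initial value, which resembles the reported behaviour: optind staying at 1 and program_invocation_short_name staying NULL in GCC-built programs.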