mold-linked binaries occasionally emit SEGV during dynamic linker initialization #1157
Comments
I recently fixed a crash bug (000ce0e) that occurs when you use mold to link object files compiled with a recent version of LLVM. Could you try the current git HEAD to see whether the issue is already resolved? If it still crashes with the latest commit, I'll need to reproduce the issue locally to investigate, so in that case please tell me how to do that. If your program is open-source, let me know where its repository is. |
Confirmed it still occurs on current HEAD. Unfortunately, our code is closed source and I haven't managed to figure out exactly what causes it, other than a specific binary of ours always triggers it. I can work on trying to figure out a minimal reproducer, but is there any additional information I can get out of the binary itself in the meantime that might help? |
Can you run |
Better yet, I have a CMakeLists.txt that reproduces it. Turns out it happens pretty consistently when linking the open source clickhouse-cpp library dynamically:
cmake_minimum_required(VERSION 3.11)
project(test LANGUAGES CXX)
set(BUILD_SHARED_LIBS ON CACHE BOOL "" FORCE)
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
add_compile_options(
-Wall
-Wextra
-Werror
-fsanitize=address # optional
-fno-omit-frame-pointer # optional
)
add_link_options(
LINKER:--enable-new-dtags
-fuse-ld=mold
-fsanitize=address # optional
)
include(FetchContent)
FetchContent_Declare(clickhouse_cpp
GIT_REPOSITORY https://github.com/ClickHouse/clickhouse-cpp.git
GIT_TAG v2.5.1
SYSTEM
)
FetchContent_MakeAvailable(clickhouse_cpp)
file(WRITE ${CMAKE_BINARY_DIR}/main.cpp "
#include <clickhouse/client.h>
int main() {
clickhouse::Client client(clickhouse::ClientOptions{});
return 0;
}
")
add_executable(main ${CMAKE_BINARY_DIR}/main.cpp)
target_link_libraries(main clickhouse-cpp-lib)
I built this with clang-17.0.1, which was configured to use gcc-13.2.0's libstdc++. If this doesn't repro for you, I'll take a look at the readelf output of this minimal repro and paste it here. |
Thanks for the update. Let me try that in a Docker container. What distro are you using? |
RHEL 8.3. FYI, I just tested this with the commit just before the bisected bad commit and it failed too. This one produces a slightly different stack trace with ASAN so maybe I hit a different bug? Still looks really similar:
|
It's probably the same issue. Could you bisect it further to find the first commit that makes your program fail? |
User testing error. Same commit causes it in the repro. The clickhouse .so itself has to be relinked with the bad commit for it to happen, not just the executable. |
Just to confirm, you meant that you could reproduce the issue with 4cdfc7e? |
I could not. The next commit is where it starts (the one I linked in the issue). Both the clickhouse .so and the main binary need to be relinked on each commit. I had forgotten to relink the clickhouse .so when I tested that last good commit (the one before the bad one). |
I haven't had luck in building your reproducer so far. Are you using Docker? If so, could you provide me with a Dockerfile and instructions on how to build your program exactly? |
I wasn't using Docker, just a bare-metal git checkout of mold and locally built clang and gcc installations. It's nighttime here, so I'll attempt to repro in a Docker container tomorrow morning. Thanks for looking into this and for a great linker! |
Just as a last note before I sign off, I tested my repro on a totally unrelated Fedora 39 VM with upstream clang (it's on 17.0.4). The only custom thing (not a base Fedora RPM) was a fresh git checkout of mold built in release mode with gcc 13.2.
|
Ensure you have the above CMakeLists.txt file in the same dir as this Dockerfile, and remove the
FROM quay.io/fedora/fedora:39
COPY CMakeLists.txt /root/
WORKDIR /root
RUN dnf install -y gcc clang compiler-rt libasan ninja-build git cmake
RUN git clone https://github.com/rui314/mold
WORKDIR /root/mold
RUN cmake -B build -G Ninja
RUN ninja -C build install
WORKDIR /root
RUN cmake -B repro -G Ninja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++
RUN ninja -C repro
You should be able to drop into the container and see the bad binary:
$ docker run -it <image>
# ./repro/main |
@jdrouhard Thank you! Just to confirm, even if I remove
|
Yeah that's the expected behavior if it linked correctly. |
I think I found the cause of the issue and fixed it in the above commit. It turned out that the particular commit you found by bisecting was not the root cause of the issue but happened to make the existing issue more visible. I'll release mold 2.4.0 soon. |
A `.eh_frame` section contains data for exception handling. Usually, an object file contains only one `.eh_frame` section, which explains how to handle exceptions for all text sections in the same object file. However, it appears that, in rare cases, we need to handle object files containing multiple `.eh_frame` sections.
An example of this is the `/usr/lib/clang/17/lib/x86_64-redhat-linux-gnu/clang_rt.crtbegin.o` file, which is provided by the `compiler-rt` package of Fedora 39. Specifically, I'm using the `quay.io/fedora/fedora:39` Docker image. The file contains two `.eh_frame` sections: one is of type `SHT_X86_64_UNWIND` and the other is of type `SHT_PROGBITS`. It's possible that the file was created with `ld -r`, and the linker failed to merge the two incoming `.eh_frame` sections into one output section due to the difference in section types.
We did not expect such inputs and consequently produced corrupted output files. This commit improves our linker so that mold can handle multiple `.eh_frame` sections in a single object file.
Fixes rui314#1157
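For anyone who wants to check whether an object file has the same shape as the `clang_rt.crtbegin.o` described above, here is a minimal sketch in Python. It is not part of mold and does not reflect how mold parses its inputs; the function name `eh_frame_section_types` is hypothetical. It assumes a little-endian ELF64 file without extended section numbering, and it simply lists the `sh_type` of every section named `.eh_frame`.

```python
import struct

# Section type constants from the ELF specification and the x86-64 psABI.
SHT_PROGBITS = 1
SHT_X86_64_UNWIND = 0x70000001

EHDR_FMT = "<16sHHIQQQIHHHHHH"  # ELF64 file header, 64 bytes
SHDR_FMT = "<IIQQQQIIQQ"        # ELF64 section header, 64 bytes


def eh_frame_section_types(data: bytes) -> list[int]:
    """Return the sh_type of every section named .eh_frame.

    Hypothetical helper: handles only little-endian ELF64 images and
    ignores extended section numbering (SHN_XINDEX).
    """
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")

    ehdr = struct.unpack_from(EHDR_FMT, data, 0)
    shoff = ehdr[6]                          # e_shoff
    shentsize, shnum, shstrndx = ehdr[11:14]

    def section_header(index: int) -> tuple:
        return struct.unpack_from(SHDR_FMT, data, shoff + index * shentsize)

    # The section name string table holds NUL-terminated section names.
    strtab = section_header(shstrndx)
    names = data[strtab[4]:strtab[4] + strtab[5]]  # sh_offset, sh_size

    types = []
    for i in range(shnum):
        name_off, sh_type = section_header(i)[:2]
        end = names.index(b"\0", name_off)
        if names[name_off:end] == b".eh_frame":
            types.append(sh_type)
    return types
```

Calling `eh_frame_section_types(open(path, "rb").read())` on an ordinary object file would be expected to return at most one entry; a result with two entries of different types matches the situation the commit message describes.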
This issue is present in 2.3.2 through current HEAD. I bisected the issue to this exact commit: 4dd5d2f.
Not sure what info you'd like to see, but I enabled ASAN and got this stack trace:
This binary was built with clang-17.0.1 using gcc-13.2.0 libstdc++.