Test error in 533/960 when compiling with LLVM/Clang-14 and using LLD-14 as linker #21847

Closed
ms178 opened this issue Dec 21, 2021 · 12 comments · Fixed by #21855

@ms178

ms178 commented Dec 21, 2021

systemd version the issue has been seen with

250.rc3.r1.g2cb726adf5

Used distribution

Manjaro 21.2 (modified)

Linux kernel version used (uname -a)

Linux manjaro 5.15.10-xanmod1

CPU architecture issue was seen on

x86-64, Haswell-EP

Expected behaviour you didn't see

build succeeding

Unexpected behaviour you saw

build did not succeed when using LLVM/Clang-14 and LLD-14 (via -fuse-ld=lld) due to a test error (building with -fuse-ld=gold succeeds)

Steps to reproduce the problem

Compiling systemd with LLVM/Clang-14 and LLD-14 (c7f96d5ab188bf371f8096ed0a98f91f18a5435a) should reproduce the issue.

Additional program output to the terminal or log subsystem illustrating the issue

 533/960 test-hashmap                                                                       FAIL             0.09s   killed by signal 6 SIGABRT
>>> MALLOC_PERTURB_=153 SYSTEMD_LANGUAGE_FALLBACK_MAP=/home/marcus/Downloads/systemd-git/src/systemd/src/locale/language-fallback-map PATH=/home/marcus/Downloads/systemd-git/src/build:/home/marcus/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/var/lib/snapd/snap/bin /home/marcus/Downloads/systemd-git/src/build/test-hashmap

meson-log.txt

testlog.txt

@keszybz
Member

keszybz commented Dec 21, 2021

Hmm, and what does it say if you just execute /home/marcus/Downloads/systemd-git/src/build/test-hashmap directly?

@ms178
Author

ms178 commented Dec 21, 2021

@keszybz

Assertion 'n_extern_tests_run == 2' failed at src/test/test-hashmap.c:164, function int main(int, char **)(). Aborting.
zsh: IOT instruction (core dumped)  /home/marcus/Downloads/systemd-git/src/build/test-hashmap

@yuwata
Member

yuwata commented Dec 21, 2021

Hm, the TEST() macro does not work if we have multiple .c files?

@ms178
Author

ms178 commented Dec 21, 2021

Please note that LLD triggers this bug: if I use gold as the linker instead, the build finishes correctly with LLVM/Clang.

@medhefgo
Contributor

This assert is designed to catch cases where the ordered and the unordered tests are not both run. Considering that it works with clang-13/lld-13 (with and without LTO), or if you switch to gold, I would say it's a compiler/linker regression.

@ms178
Author

ms178 commented Dec 21, 2021

@medhefgo I never stated that it worked with LLVM/Clang/LLD-13. I have just tested it with the distribution-supplied LLVM/Clang-13 and LLD-13, and it shows the same results: the same error appears for that test case. I've also downgraded my GCC to the distribution-provided one (11.1.0).

@medhefgo
Contributor

I was saying it works on our CI with clang-13. I also cannot reproduce this with clang-14 in an Ubuntu VM, so there must be something specific about your setup.

@medhefgo
Contributor

Found the cause: something about -Dbuildtype=release causes the SYSTEMD_TEST_TABLE section to be stripped from the test binaries along with the debug info when lld is used. Adding -Dstrip=false does not help.

This would previously have been caught by test-static-destruct, since its SYSTEMD_STATIC_DESTRUCT section gets removed too.

@bluca
Copy link
Member

bluca commented Dec 21, 2021

sounds like something we need to fix before release?

bluca added this to the v250 milestone Dec 21, 2021
@medhefgo
Contributor

This is caused by -Wl,--gc-sections: lld does not realize that we access these functions indirectly through the section (and the section itself, also indirectly, through the __start_/__stop_ symbols). To me this is a linker bug, as bfd and gold don't have this problem.
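To make the indirection concrete, here is a minimal sketch of section-based test registration, assuming an ELF target and GCC or Clang. The names DemoTest, DEMO_TEST, DEMO_TEST_TABLE and run_demo_tests are hypothetical and are not systemd's actual TEST() implementation. Each test descriptor is placed in a custom section, and the runner walks the section between the linker-synthesized __start_/__stop_ symbols; no code ever names the test functions directly, so nothing but the section keeps them alive.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test descriptor for this sketch (not systemd's real type). */
typedef struct DemoTest {
        const char *name;
        void (*func)(void);
} DemoTest;

static int demo_calls; /* incremented by each demo test body */

/* Declare fn, place a descriptor for it in the DEMO_TEST_TABLE section,
 * then open fn's definition. 'used' stops the compiler from dropping the
 * descriptor even though no code names it; it does NOT stop the linker
 * from garbage-collecting the section. The aligned() pin is intended to
 * keep entries packed like an array. */
#define DEMO_TEST(fn)                                                   \
        static void fn(void);                                           \
        static const DemoTest demo_test_##fn                            \
                __attribute__((used, section("DEMO_TEST_TABLE"),        \
                               aligned(__alignof__(DemoTest)))) = {     \
                .name = #fn,                                            \
                .func = fn,                                             \
        };                                                              \
        static void fn(void)

DEMO_TEST(test_one) { demo_calls++; }
DEMO_TEST(test_two) { demo_calls++; }

/* The ELF linker synthesizes these because the section name is a valid
 * C identifier; they bound every descriptor placed in the section. */
extern const DemoTest __start_DEMO_TEST_TABLE[];
extern const DemoTest __stop_DEMO_TEST_TABLE[];

/* Run every registered test and return how many were found. */
static size_t run_demo_tests(void) {
        size_t n = 0;
        for (const DemoTest *t = __start_DEMO_TEST_TABLE; t < __stop_DEMO_TEST_TABLE; t++) {
                t->func();
                n++;
        }
        return n;
}
```

With bfd or gold, referencing __start_DEMO_TEST_TABLE keeps the section under --gc-sections. Under lld's -z start-stop-gc semantics (see lld's start-stop-gc documentation) that reference alone no longer keeps the section alive, which matches the failure described in this issue.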

@ms178
Author

ms178 commented Dec 21, 2021

Pinging @MaskRay who can provide insights from the LLD side.

@MaskRay
Contributor

MaskRay commented Dec 21, 2021

You can compare the output of -fuse-ld=lld -Wl,--gc-sections,--print-gc-sections,-z,start-stop-gc vs -fuse-ld=lld -Wl,--gc-sections,--print-gc-sections,-z,nostart-stop-gc.

The code should use relocations to express the section dependency relationship, instead of relying on the linker automatically retaining all sections whose names are C identifiers. The latter has some downsides for metadata users (see "What if all metadata sections are discarded?" in https://maskray.me/blog/2021-01-31-metadata-sections-comdat-and-shf-link-order).

ford-prefect pushed a commit to PipeWire/pipewire that referenced this issue Apr 18, 2022
The linker may remove sections that are actually used when
"--gc-sections" and "-z start-stop-gc" is set. Add the `retain`
attribute to prevent that.

Furthermore, fix the alignment for `pwtest_suite_decl` objects.

See: #2292
See: https://lld.llvm.org/ELF/start-stop-gc.html
See: systemd/systemd#21847
See: systemd/systemd#21855
pobrn added a commit to pobrn/nvidia-vaapi-driver that referenced this issue May 10, 2022
The linker may remove sections that are actually used when
"--gc-sections" and "-z start-stop-gc" is set. Add the `retain`
attribute to prevent that.

See: https://lld.llvm.org/ELF/start-stop-gc.html
See: systemd/systemd#21847
See: systemd/systemd#21855
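Both commits work around this by adding the retain attribute. A minimal sketch of that approach, assuming GCC 11+ or Clang 13+ on an ELF target (older compilers fall back to plain used). DEMO_RETAIN, DEMO_REGISTER, DemoEntry and DEMO_RETAINED_TABLE are hypothetical names for illustration, not the PipeWire, nvidia-vaapi-driver or systemd code.

```c
#include <assert.h>
#include <stddef.h>

/* 'retain' marks the output section SHF_GNU_RETAIN so the linker keeps it
 * under --gc-sections, independently of the __start_/__stop_ rules.
 * Nested #if because older compilers cannot parse __has_attribute(x). */
#if defined(__has_attribute)
#  if __has_attribute(retain)
#    define DEMO_RETAIN __attribute__((retain))
#  endif
#endif
#ifndef DEMO_RETAIN
#  define DEMO_RETAIN /* unsupported: only 'used' below remains */
#endif

/* Hypothetical entry type for this sketch only. */
typedef struct DemoEntry {
        const char *name;
        int value;
} DemoEntry;

/* 'used' makes the compiler emit the object; DEMO_RETAIN asks the linker
 * to keep the section it lands in. */
#define DEMO_REGISTER(id, v)                                            \
        static const DemoEntry demo_entry_##id                          \
                __attribute__((used, section("DEMO_RETAINED_TABLE"),    \
                               aligned(__alignof__(DemoEntry))))        \
                DEMO_RETAIN = { .name = #id, .value = (v) }

DEMO_REGISTER(alpha, 1);
DEMO_REGISTER(beta, 2);

/* Linker-synthesized bounds of the section (ELF, C-identifier name). */
extern const DemoEntry __start_DEMO_RETAINED_TABLE[];
extern const DemoEntry __stop_DEMO_RETAINED_TABLE[];

/* Sum the values of every registered entry by walking the section. */
static int demo_sum_entries(void) {
        int sum = 0;
        for (const DemoEntry *e = __start_DEMO_RETAINED_TABLE; e < __stop_DEMO_RETAINED_TABLE; e++)
                sum += e->value;
        return sum;
}
```

The Clang documentation describes retain as affecting only linker garbage collection and as typically combined with used, which is why both attributes appear. Anything the retained entries point to (here, the name strings) is reached through relocations from a live section and therefore stays live as well.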