References to extern thread_local variables clobber r11 on darwin #51045
Comments
rjmccall, kledzik: This is arguably a dyld bug. Let me know if you (or anyone in your area) are interested in the long-term fix mentioned after "--" in comment 0.
(Upstream bug where we ran into this: https://bugs.chromium.org/p/chromium/issues/detail?id=1243375#c20)
Can we not just say that the CC doesn't preserve r11 on Apple platforms? LLVM langref doesn't dictate Apple platform ABI.
A'ight: https://reviews.llvm.org/D109112
In https://chromium.slack.com/archives/CGJF95D8C/p1693400241085249?thread_ts=1693346378.274619&cid=CGJF95D8C one of the Chromium folks retested Nico's testcase with Clang 17 on x86_64-apple-darwin22.6.0, and it seems to work (outputs 8). Has this bug been fixed? Or is the testcase simply not demonstrating the bug anymore, but it still occurs?
See "The repro is with clang built at 9b6c813 and it's dependent on the optimizer." So the fact that this exact repro no longer reproduces doesn't mean much. dyld code is now at https://github.com/apple-oss-distributions/dyld. How binding works has been rewritten a lot in dyld3. I don't remember when dyld3 became the system linker on macOS. Maybe in macOS 11? It's possible that it no longer clobbers r11 (…but older versions of macOS that we still support in Chrome likely still run the old code). So if the repro attempt was on a newer macOS version, the negative result doesn't tell you much for that reason as well.
Extended Description
TLS wrapper functions use calling convention cxx_fast_tlscc, which per langref:
"""On X86-64 the callee preserves all general purpose registers, except for RDI and RAX."""
When calling a non-dso_local TLS wrapper function on darwin, we'll end up calling into dyld_stub_binder to resolve the wrapper function.
dyld_stub_binder clobbers r11: https://github.com/opensource-apple/dyld/blob/master/src/dyld_stub_binder.s#L203
(The thunks inserted by lld and ld64 clobber r11 too, probably since they figure r11 is already overwritten by dyld_stub_binder.)
So we can't use cxx_fast_tlscc for non-dso_local TLS wrapper functions on darwin.
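To make the access pattern concrete, here is a minimal sketch of the kind of code that goes through a TLS wrapper. It is not the repro attached to this issue; the names `make_zero` and `compute` are made up for illustration, and everything is kept in one file so it compiles on its own. In the failing scenario the definition of `j` would live in a different dylib, so the caller would only see the `extern` declaration and the wrapper call would go through a lazily bound stub.

```cpp
// Hypothetical illustration only; not the original tlv.cc from this issue.

// Helper that forces a dynamic (non-constant) initializer for j.
int make_zero() { return 0; }

// The caller only needs this declaration. Because j may have a dynamic
// initializer, the compiler accesses it through a generated TLS wrapper
// function (mangled _ZTW1j in the Itanium C++ ABI) rather than directly.
extern thread_local int j;

// Assumption: in the real scenario this definition sits in another dylib,
// which makes the wrapper non-dso_local from the caller's point of view.
thread_local int j = make_zero();

// a and b can be live in registers across the wrapper call. Under
// cxx_fast_tlscc the caller assumes the wrapper preserves them, so a
// clobbered r11 could corrupt one of the operands.
int compute(int a, int b) {
    return a + j + b;
}
```

With nothing clobbered, `compute(3, 5)` evaluates to 8, since `j` is 0. Whether a value actually ends up in r11 across the wrapper call depends on the optimizer and register allocation, so a single-file build like this one is not expected to show the bug.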
That's of course unfortunate since cxx_fast_tlscc removes lots of stack traffic. So maybe in time we could change dyld_stub_binder to not clobber r11 (somehow), make the linkers use rax in the stub code, and then use cxx_fast_tlscc if linkers and targeted macOS versions are new enough. But that needs changes to dyld, so someone at Apple would have to drive this.
Here's a standalone repro that shows the bug, but the summary above is really all you need.
The output should be 8, and it is 8 if I remove the "+ j +" bit in tlv.cc. (j is a tlv that's 0.)
(The repro is with clang built at 9b6c813 and it's dependent on the optimizer. Pasting the asm clang generated for me below, see how the same r11 issue happens there:)