Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

References to extern thread_local variables clobber r11 on darwin #51045

Open
nico opened this issue Sep 1, 2021 · 6 comments
Open

References to extern thread_local variables clobber r11 on darwin #51045

nico opened this issue Sep 1, 2021 · 6 comments
Labels
bugzilla Issues migrated from bugzilla clang:codegen

Comments

@nico
Copy link
Contributor

nico commented Sep 1, 2021

Bugzilla Link 51703
Version trunk
OS All
CC @topperc,@RKSimon,@phoebewang,@zygoloid,@rjmccall,@rotateright

Extended Description

TLS wrapper functions use calling convention cxx_fast_tlscc, which per langref:

"""On X86-64 the callee preserves all general purpose registers, except for RDI and RAX."""

When calling a non-dso_local TLS wrapper function on darwin, we'll end up calling into dyld_stub_binder to to resolve the wrapper function.

dyld_stub_binder clobbers r11: https://github.com/opensource-apple/dyld/blob/master/src/dyld_stub_binder.s#L203

(Also, the thunks inserted by lld and ld64 do so too, probably since they figure r11 is already overwritten by dyld_stub_binder)

So we can't use cxx_fast_tlscc for non-dso_local TLS wrapper functions on darwin.


That's of course unfortunate since cxx_fast_tlscc removes lots of stack traffic. So maybe in time we could change dyld_stub_binder to not clobber r11...somehow and make the linkers use rax in the stub code, and then use cxx_fast_tlscc if linkers and targeted macOS versions are new enough. But that needs changes to dyld, so someone at apple would have to drive this.


Here's a standalone repro that shows the bug, but the summary above is really all you need.

% cat tlvhost.cc
extern thread_local int j;
thread_local int j = 0;

% clang -O2 tlvhost.cc  -std=c++11 -shared -o tlvhost.dylib

% cat tlv.cc
extern thread_local int j;

int f(int a, int b) {
  int c = a * b;
  int d = a + b;
  int e = a / b;
  int f = a - b;
  int g = a - 2 * b;
  int h = a - 3 * b;
  int i = a - 4 * b;
  return c / d * e / f + j + g * h + i;
}

% out/gn/bin/clang -O2 tlv.cc  -std=c++11 -shared tlvhost.dylib -o tlv.dylib

% cat main.cc
#include <iostream>

extern int f(int a, int b);

int main(int argc, char* argv[]) {
  printf("%d\n", f(atoi(argv[1]), atoi(argv[2])));
}

% clang main.cc tlv.dylib

% ./a.out 1 2
-846192167

The output should be 8, and it is 8 if I remove the + j + bit in tlv.cc. (j is a tlv that's 0.)

(The repro is with clang built at 9b6c813 and it's dependent on the optimizer. Pasting the asm clang generated for me below, see how the same r11 issue happens there:

 % out/gn/bin/clang -O2 tlv.cc  -std=c++11 -shared tlvhost.dylib -S -o -
clang: warning: tlvhost.dylib: 'linker' input unused [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-shared' [-Wunused-command-line-argument]
	.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 10, 15
	.globl	__Z1fii                         ## -- Begin function _Z1fii
	.p2align	4, 0x90
__Z1fii:                                ## @&#8203;_Z1fii
	.cfi_startproc
## %bb.0:                               ## %entry
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	pushq	%rbx
	pushq	%rax
	.cfi_offset %rbx, -24
                                        ## kill: def $esi killed $esi def $rsi
	movl	%edi, %ecx
	movl	%esi, %r10d
	imull	%edi, %r10d
	leal	(%rsi,%rcx), %r9d
	movl	%edi, %eax
	cltd
	idivl	%esi
	movl	%eax, %r8d
	subl	%esi, %edi
	movl	%ecx, %r11d
	subl	%esi, %r11d
	subl	%esi, %r11d
	leal	(%rsi,%rsi,2), %eax
	movl	%ecx, %ebx
	subl	%eax, %ebx
	shll	$2, %esi
	movl	%r10d, %eax
	cltd
	idivl	%r9d
	imull	%r8d, %eax
	cltd
	idivl	%edi
	movl	%eax, %edx

        # Calling into dyld_stub_binder here:
	callq	__ZTW1j

        # Using the now-clobbered r11 register right after:
	imull	%r11d, %ebx
	subl	%esi, %ecx
	addl	%ebx, %ecx
	addl	%ecx, %edx
	addl	(%rax), %edx
	movl	%edx, %eax
	addq	$8, %rsp
	popq	%rbx
	popq	%rbp
	retq
	.cfi_endproc
                                        ## -- End function
.subsections_via_symbols

)

@nico
Copy link
Contributor Author

nico commented Sep 1, 2021

rjmccall, kledzik: This is arguably a dyld bug. Let me know if you (or anyone in your area) is interested in the long-term fix mentioned after "--" in comment 0.

@nico
Copy link
Contributor Author

nico commented Sep 1, 2021

(Upstream bug where we ran into this: https://bugs.chromium.org/p/chromium/issues/detail?id=1243375#c20)

@rjmccall
Copy link
Contributor

rjmccall commented Sep 1, 2021

Can we not just say that the CC doesn't preserve r11 on Apple platforms? LLVM langref doesn't dictate Apple platform ABI.

@nico
Copy link
Contributor Author

nico commented Sep 1, 2021

A'ight: https://reviews.llvm.org/D109112

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021
@pkasting
Copy link
Member

In https://chromium.slack.com/archives/CGJF95D8C/p1693400241085249?thread_ts=1693346378.274619&cid=CGJF95D8C one of the Chromium folks retested Nico's testcase with Clang 17 on x86_64-apple-darwin22.6.0, and it seems to work (outputs 8).

Has this bug been fixed? Or is the testcase simply not demonstrating the bug anymore, but it still occurs?

@nico
Copy link
Contributor Author

nico commented Nov 13, 2023

See "The repro is with clang built at 9b6c813 and it's dependent on the optimizer."

So it not repro'ing with that exact repro doesn't mean much.

dyld code is now at https://github.com/apple-oss-distributions/dyld. How binding works has been rewritten a lot in dyld3. I don't remember when dyld3 became the system linker on macOS. Maybe in macOS 11? It's possible that it no longer clobbers r11 (…but older versions of macOS that we still support on chrome likely still run the old code).

So if the repro attempt was on a newer macOS version, the negative result doesn't tell you much for that reason as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla clang:codegen
Projects
None yet
Development

No branches or pull requests

3 participants