Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: remove assumptions on Android Bionic's TLS layout #29674

Closed
rprichard opened this issue Jan 11, 2019 · 56 comments
Closed

runtime: remove assumptions on Android Bionic's TLS layout #29674

rprichard opened this issue Jan 11, 2019 · 56 comments

Comments

@rprichard
Copy link

@rprichard rprichard commented Jan 11, 2019

What version of Go are you using (go version)?

$ go version
go version devel +67164c3014 Fri Jan 4 20:25:45 2019 -0800 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

Android

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/usr/local/google/home/rprichard/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/usr/local/google/home/rprichard/go"
GOPROXY=""
GORACE=""
GOROOT="/x/golang/go"
GOTMPDIR=""
GOTOOLDIR="/x/golang/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build009028863=/tmp/go-build -gno-record-gcc-switches"

Details

I posted about this issue earlier on golang-dev:
https://groups.google.com/forum/#!msg/golang-dev/yVrkFnYrYPE/2G3aFzYqBgAJ

I'm working on adding ELF TLS support to Android's libc (Bionic), and Go is making assumptions about Android's TLS memory layout that a future version of Android might break. Specifically:

  • On Android/{arm,arm64}, Go saves and restores its g register to a pthread key, which it accesses directly using the thread pointer. Go allocates a pthread key and assumes that it can find it at a positive offset, within 384 words after the thread pointer. ELF TLS on arm/arm64 allocates the space after the thread pointer to PT_TLS segments instead, which can be arbitrarily large. For maximum efficiency, Bionic might prefer to allocate the pthread keys at a negative offset from the TP, known at compile-time, but then Go can't find them.

  • On Android/{386,amd64}, Go uses %gs:0xf8/%fs:1d0 for its g register and assumes it can allocate the location by repeatedly calling pthread_key_create until it finds a key such that pthread_setspecific modifies the desired location. This assumption breaks if Bionic has an even number of TLS slots, because then Go's fixed location is allocated to a pthread_key_data_t::seq field rather than a pthread_key_data_t::data field.

I suspect Go is assuming that its pthread-allocated word is initialized to zero, but that's not generally the case when pthread keys are recycled. Bionic's pthread_getspecific lazily initializes a key to zero. (It'd be great if I could get confirmation on this point.)

Android may be able to keep Go working, but it would be better if Go had a reliable interface for accessing its TLS memory.

There's a simple way to fix arm/arm64 by adding an API to Bionic that allocates a single word of static TLS memory at a fixed offset. For example, I could add something like this to Bionic:

int pthread_alloc_static_tls_word_np(intptr_t* tpoff, void** tlsbase)
(https://android-review.googlesource.com/c/platform/bionic/+/862083)

gcc_android_{arm,arm64}.c could then look for the new API with dlopen and fall back to the current allocation behavior if the API is missing.

386/amd64 is harder to fix. Two possibilities:

  • It could use the API above, then use something resembling a TLS IE access in each function prologue. (Something like this: https://godbolt.org/z/5Cfw7S.)
  • Switch amd64 to an ARM-like design: pick a GPR to hold g, then save/restore the g register to memory allocated with the above API. Drop support for Android/386.

For the ARM design, I'm wondering to what extent Go could use pthread_getspecific or dynamic TLS (whether __tls_get_addr or TLSDESC) to load the g register. Possible downsides:

  • Might be noticeably slower.
  • Neither system is guaranteed to be async-signal safe. Perhaps Bionic could guarantee AS-safety.
  • Dynamic TLS could use a lot of stack if it needs to allocate heap memory. Perhaps Go can guarantee a large stack in situations where the thread's TLS var isn't allocated yet.

I could upload a Go patch demonstrating arm/arm64 use of the API I proposed.

@rprichard

This comment has been minimized.

Copy link
Author

@rprichard rprichard commented Jan 11, 2019

@eliasnaur

This comment has been minimized.

Copy link
Contributor

@eliasnaur eliasnaur commented Jan 11, 2019

Thank you for working on this. I'll page someone more knowledgable on the runtime than myself: @randall77 @ianlancetaylor.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 11, 2019

What are the chances that x86 Bionic could implement TLS relocations as they are implemented on x86 GNU/Linux systems? Because then I think everything would just work.

Reserving a register on amd64 would break all existing assembly code, which means breaking a lot of crypto packages. I think that would be difficult. Though the ABI support in 1.12 might provide a stepping stone toward supporting that.

I don't fully understand your TLS IE example for amd64. It seems to assume that the offset from %fs can be determined at run time, but as far as I know it must be known at compile time. I think that handling a variable offset from %fs would require a longer instruction sequence.

In glibc, certain offsets from %fs are reserved for certain uses. See tcbhead_t in https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/nptl/tls.h;h=e25430a92826855f62617c5b51c65cb8d194f898;hb=HEAD . For example, that is where gccgo stores its split-stack information. Is there any chance that Bionic could give Go a similar pointer-sized word?

@eliasnaur eliasnaur changed the title Android Go's g TLS allocation makes assumptions about Bionic's TLS layout runtime: remove assumptions on Android Bionic's TLS layout Jan 11, 2019
@rprichard

This comment has been minimized.

Copy link
Author

@rprichard rprichard commented Jan 12, 2019

What are the chances that x86 Bionic could implement TLS relocations as they are implemented on x86 GNU/Linux systems? Because then I think everything would just work.

Implementing the relocations is straightforward. The difficulty is that Go always uses static TLS ("IE" accesses), even though an Android app's solib is loaded with dlopen. Using IE in an solib doesn't generally work, because libc needs to allocate a thread's static TLS memory before it starts running, and it can't move/extend that memory when dlopen is called later. It can work if the libc reserves enough surplus static TLS memory. glibc reserves space (TLS_STATIC_SURPLUS and DTV_SURPLUS), and at least some BSDs also do (RTLD_STATIC_TLS_EXTRA). The reserved amounts I've seen vary between about 64 and ~1600 bytes.

The surplus amount needs to be enough for all dlopen'ed solibs. Once it's exhausted, dlopen fails.

There are other problems with surplus static TLS involving (a) initialization and (b) the DF_STATIC_TLS flag. Maybe we'd limit support to zero-initialized TLS segments and solibs marked with DF_1_NODELETE.

I don't fully understand your TLS IE example for amd64. It seems to assume that the offset from %fs can be determined at run time, but as far as I know it must be known at compile time. I think that handling a variable offset from %fs would require a longer instruction sequence.

On GNU/Linux, Go uses a TLS LE access in an executable, and each access is a single instruction, e.g.:

  402e10:       64 48 8b 0c 25 f8 ff    mov    %fs:0xfffffffffffffff8,%rcx

The offset is known at build-time and encoded into the instruction. Typically the static linker would encode the offset, but I think Go's compiler knows that the runtime.tlsg symbol is at offset -8 and can encode it earlier.

In a GNU/Linux shared object (-buildmode=c-shared), Go instead uses TLS IE, where the offset is known at load-time, e.g.:

   80250:       48 8b 0d 79 9d 39 00    mov    0x399d79(%rip),%rcx        # 419fd0 <.got>
   80257:       64 48 8b 09             mov    %fs:(%rcx),%rcx
   ...
   0000000000419fd0  0000000000000012 R_X86_64_TPOFF64                          0

The dynamic loader computes the TPOFF value for the solib's TLS segment and writes it into the GOT.

My proposal is to use something resembling TLS IE, but instead of storing the TPOFF in the GOT at load-time, it's computed at run-time and stored in an ordinary STB_LOCAL symbol.

In glibc, certain offsets from %fs are reserved for certain uses. ... Is there any chance that Bionic could give Go a similar pointer-sized word?

We could allocate a TLS slot that Go could use, but I think we're hesitant to give one to a particular language runtime if we can avoid it.

If we did allocate a slot today, Go would only be able to use it on new versions of Android. That's problematic for Android/x86 because it uses TLS LE in the app solib, e.g.:

   9ca20:       64 48 8b 0c 25 d0 01    mov    %fs:0x1d0,%rcx

The situation for Android/ARM is better. For ARM, Go is already computing a TP-relative offset at run-time.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 16, 2019

My proposal is to use something resembling TLS IE, but instead of storing the TPOFF in the GOT at load-time, it's computed at run-time and stored in an ordinary STB_LOCAL symbol.

I think that I have now paged enough memory to say that this seems reasonable to me. Thanks.

@rprichard

This comment has been minimized.

Copy link
Author

@rprichard rprichard commented Jan 18, 2019

Ok, I think Bionic can add an API like the one I've proposed[1], and then I can upload a patch for Go to use the API on arm/arm64. Fixing x86 is more involved, though -- maybe someone on the Go team could look at that?

I'll work on adding the Bionic API.

[1] https://android-review.googlesource.com/c/platform/bionic/+/862083

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 19, 2019

@rprichard

This comment has been minimized.

Copy link
Author

@rprichard rprichard commented Jan 24, 2019

Bionic could reserve a TLS slot for use by apps: https://android-review.googlesource.com/c/platform/bionic/+/883872.

@eliasnaur

This comment has been minimized.

Copy link
Contributor

@eliasnaur eliasnaur commented Mar 24, 2019

What's the status of this? The feature freeze is about a month away and it would be a shame if Android Q didn't work with Go 1.13.

@eliasnaur

This comment has been minimized.

Copy link
Contributor

@eliasnaur eliasnaur commented Mar 25, 2019

@rprichard I see that your pthread_alloc_static_tls_word_np CL is not merged yet (https://android-review.googlesource.com/c/platform/bionic/+/862083). The TLS_SLOT_APP CL is (https://android-review.googlesource.com/c/platform/bionic/+/883872), but since that slot is now "Available for use by apps" and not just for Go, I'm not sure it's a good idea to depend on that.

@rprichard

This comment has been minimized.

Copy link
Author

@rprichard rprichard commented Mar 26, 2019

I suspect it's too late to get the pthread_alloc_static_tls_word_np API into Q.

FWIW, so far, I don't know of any other third-party software that's trying to allocate memory at fixed offsets from the thread pointer. Using TLS_SLOT_APP on Android is not that different from the fixed offsets Go is using for g on Windows and Darwin for x86:

The "underalignment" errors on Q are related to this issue, but I think they can be fixed independently of the "how does Go access g" issue. AFAIK, the errors could be fixed by omitting the runtime.tlsg ELF symbol and the PT_TLS segment on targets that don't use them. AFAIK, those errors only happen when a Go binary is exec'ed (which I thought would be rare?).

@eliasnaur

This comment has been minimized.

Copy link
Contributor

@eliasnaur eliasnaur commented Mar 26, 2019

Ok, thank you Ryan.

@ianlancetaylor @cherrymui @aclements how do we go about changing 386 and amd64 to not use static offsets on Android? amd64 is using %fs:0x1d0, 386 is using %gs:0xf8, but as far as I understand this issue, we need to use a different offset (0x02?) for Android Q.

@rprichard

This comment has been minimized.

Copy link
Author

@rprichard rprichard commented Mar 26, 2019

The offset is 2 words (0x8 on x86 and 0x10 on x86-64).

@eliasnaur

This comment has been minimized.

Copy link
Contributor

@eliasnaur eliasnaur commented Mar 26, 2019

Of course, silly me. Does the same apply for arm and arm64?

@rprichard

This comment has been minimized.

Copy link
Author

@rprichard rprichard commented Mar 26, 2019

Yes, TLS_SLOT_APP is at 0x8 for arm32 and 0x10 for arm64.

@Catfriend1

This comment has been minimized.

Copy link

@Catfriend1 Catfriend1 commented Mar 27, 2019

So which byte has to be patched to get syncthing saved from dying out on the android q emulator with x86?

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Mar 27, 2019

Change https://golang.org/cl/169618 mentions this issue: cmd/link/internal/ld: skip TLS section on Android

@eliasnaur

This comment has been minimized.

Copy link
Contributor

@eliasnaur eliasnaur commented Mar 27, 2019

@Catfriend1 please try CL 169618.

gopherbot pushed a commit that referenced this issue Mar 27, 2019
We don't use the TLS section on android, and dropping it avoids
complaints about underalignment from the Android Q linker.

Updates #29674

Change-Id: I91dabf2a58e6eb1783872639a6a144858db09cef
Reviewed-on: https://go-review.googlesource.com/c/go/+/169618
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@hyangah

This comment has been minimized.

Copy link
Contributor

@hyangah hyangah commented Mar 27, 2019

The reserved TLS_SLOT_APP slot will work only for Q, not for P or earlier. The idea suggested by @ianlancetaylor was

change the code sequence to always load the offset from a memory location, as opposed to the constant that we use today. Then on earlier versions of Android we could store the constant in that memory location. That is, always generate the same code at compile time, and check at run time to decide how to handle it.

But I guess it will be a bigger change and no one has time to deal with it.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Mar 29, 2019

@gopherbot please open a backport for 1.12

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Mar 29, 2019

Backport issue(s) opened: #31155 (for 1.12).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@Catfriend1

This comment has been minimized.

Copy link

@Catfriend1 Catfriend1 commented Apr 2, 2019

@eliasnaur Good work! I can confirm syncthing is running fine on Android Q, x86 emulator now.
image

@andybons

This comment has been minimized.

Copy link
Member

@andybons andybons commented Apr 2, 2019

@eliasnaur can you create the backport CL per #29674 (comment) ?

Thanks

@eliasnaur

This comment has been minimized.

Copy link
Contributor

@eliasnaur eliasnaur commented Apr 2, 2019

Are you sure? The linker changes are quite invasive and Android Q is in beta. Every Android version since 4.4 was released in August or later, so I hope that Go 1.13 will be out just in time for Q final.

@andybons

This comment has been minimized.

Copy link
Member

@andybons andybons commented Apr 2, 2019

@eliasnaur thanks for the context. We'll wait until 1.13.

@rprichard

This comment has been minimized.

Copy link
Author

@rprichard rprichard commented Apr 2, 2019

Thanks for fixing this!

I see that Go is using TLS_SLOT_APP on Android Q, which is detected by the presence of the android_get_device_api_level symbol -- something like dlsym(dlopen(NULL, RTLD_LAZY), "android_get_device_api_level") (https://go-review.googlesource.com/c/go/+/170117). I think this will work. Maybe it's better to load "libc.so" instead of NULL in case something else in the program provides the symbol somehow?

For Android versions prior to Q, the Android NDK provides a static inline android_get_device_api_level function in android/api-level.h, starting with NDK r20, but the function is static so dlsym() shouldn't ever find it.

https://android.googlesource.com/platform/bionic/+/73d1fb95793709f06566c93347d1714a3b97bb4f/libc/include/android/api-level.h#119

https://android.googlesource.com/platform/bionic/+/73d1fb95793709f06566c93347d1714a3b97bb4f/libc/include/bits/get_device_api_level_inlines.h#41

Another option is to inline this function into the Go source. That'd be automatic with NDK r20, but with earlier NDKs, it could do something like:

#include <stdlib.h>
#include <sys/system_properties.h>

int is_Q() {
	char value[PROP_VALUE_MAX];
	if (__system_property_get("ro.build.version.sdk", value) < 1) {
		return 0;
	}
	return atoi(value) >= 29;
}

Unfortunately, it looks like Q still reports SDK 28 and won't be updated for a while, which would make testing the fix difficult/impossible.

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Apr 2, 2019

Change https://golang.org/cl/170127 mentions this issue: runtime/cgo: look for android_get_device_api_level in libc.so

@eliasnaur

This comment has been minimized.

Copy link
Contributor

@eliasnaur eliasnaur commented Apr 2, 2019

Thanks, I created CL 170127 to look in libc.so directly. I'll leave the arguably more convenient NDK version to sometime after it returns the correct value.

@rprichard

This comment has been minimized.

Copy link
Author

@rprichard rprichard commented Apr 2, 2019

Yeah, that change looks good to me.

gopherbot pushed a commit that referenced this issue Apr 2, 2019
The presence of the android_get_device_api_level symbol is used to
detect Android Q or later. Use the suggestion by Ryan Prichard and
look for it in libc.so and not in the entire program where someone
else might have defined it.

Manually tested on an Android Q amd64 emulator and arm64 Pixel.

Updates #29674

Change-Id: Iaef35d8f8910037b3690aa21f319e216a05a9a73
Reviewed-on: https://go-review.googlesource.com/c/go/+/170127
Run-TryBot: Elias Naur <mail@eliasnaur.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Apr 8, 2019

Change https://golang.org/cl/170955 mentions this issue: runtime,runtime/cgo: set up TLS storage for Android Q without cgo

gopherbot pushed a commit that referenced this issue Apr 8, 2019
Android Q frees a static TLS slot for us to use. Use the offset of
that slot as the default for our TLS offset.

As a result, runtime/cgo is no more a requirement for Android Q and
newer.

Updates #31343
Updates #29674

Change-Id: I759049b2e2865bd3d4fdc05a8cfc6db8b0da1f5d
Reviewed-on: https://go-review.googlesource.com/c/go/+/170955
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
@Catfriend1

This comment has been minimized.

Copy link

@Catfriend1 Catfriend1 commented Jul 21, 2019

Which release version has this commit already on-board? go1.12.7?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jul 21, 2019

This should be fixed in the upcoming 1.13 release. It is not fixed in any current release, and there are no plans to backport the various changes.

@Catfriend1

This comment has been minimized.

Copy link

@Catfriend1 Catfriend1 commented Jul 21, 2019

Thanks for the information. Will wait for 1.13 :)

@jacentsao

This comment has been minimized.

Copy link

@jacentsao jacentsao commented Sep 4, 2019

Still have this issue after upgrade to go 1.13, could any one provide some kindly suggestion. Many thanks.

@eliasnaur

This comment has been minimized.

Copy link
Contributor

@eliasnaur eliasnaur commented Sep 4, 2019

This issue is closed, so please file a new one with the error message you see, the device or emulator version, and Android version (10 final or a beta?).

FWIW, I upgraded my Pixel 1 to Android 10 yesterday and ran

https://play.google.com/apps/testing/im.scatter.app

with no problems.

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Oct 10, 2019

Change https://golang.org/cl/200337 mentions this issue: cmd/compile: fix spurious R_TLE_LE reloc on android/386

gopherbot pushed a commit that referenced this issue Oct 29, 2019
When compiling for GOARCH=386 GOOS=android, the compiler was attaching
R_TLS_LE relocations inappropriately -- as of Go 1.13 the TLS access
recipe for Android refers to a runtime symbol and no longer needs this
type of relocation (which was causing a crash when the linker tried to
process it).

Updates #29674.
Fixes #34788.

Change-Id: Ida01875011b524586597b1f7e273aa14e11815d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/200337
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Elias Naur <mail@eliasnaur.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
You can’t perform that action at this time.