cmd/link: c-archive Output Mode Forces Initial-Exec TLS model #48596

Open
auzhva opened this issue Sep 24, 2021 · 6 comments

@auzhva auzhva commented Sep 24, 2021

What version of Go are you using (go version)?

$ go version
go version go1.17.1 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.17.1"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1642499936=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Full Gist is available here - https://gist.github.com/auzhva/6e3660c2433f7ead765d4d09e645013b

The provided example contains three files:

  • app.c - sample loader app
  • lib.c - sample C-library
  • lib.go - sample Go-library

Built as follows, it works:

go build -o libGoShared.so -buildmode=c-shared ./lib.go
gcc -shared -lpthread -fPIC lib.c -o libClibShared.so -lGoShared -L.
gcc app.c -o app -ldl

Running ./app then works fine.

Built as follows instead, it fails with a "cannot allocate memory in static TLS block" error:

go build -ldflags="-s -w" -o libGoStatic.a -buildmode=c-archive ./lib.go
gcc -shared -lpthread -fPIC lib.c -o libClibStatic.so -lGoStatic -L.
gcc app.c -o app -ldl
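
For anyone who wants to poke at the two builds without compiling app.c, here is a minimal sketch of a stand-in loader. It is not the gist's app.c: it loads each library through Python's ctypes (which calls dlopen() on Linux) and reports the loader's error text. The helper names try_dlopen and is_static_tls_failure are made up for illustration. The Python process has its own TLS usage, so the result may differ from the C loader, and libClibShared.so needs libGoShared.so to be findable (for example via LD_LIBRARY_PATH).

```python
import ctypes
from typing import Optional

# Error text reported in this issue when the static TLS block is exhausted.
STATIC_TLS_ERROR = "cannot allocate memory in static TLS block"


def try_dlopen(path: str) -> Optional[str]:
    """Return None if the library loads, otherwise the loader's error message."""
    try:
        ctypes.CDLL(path)  # dlopen() under the hood on Linux
    except OSError as exc:
        return str(exc)
    return None


def is_static_tls_failure(message: Optional[str]) -> bool:
    """True if the message is the static TLS exhaustion error from this issue."""
    return message is not None and STATIC_TLS_ERROR in message


if __name__ == "__main__":
    # File names taken from the build commands above.
    for lib in ("./libClibShared.so", "./libClibStatic.so"):
        err = try_dlopen(lib)
        print(lib, "->", "loaded" if err is None else err)
```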

More investigation on STATIC_TLS flag

It appears that Go always builds with the Initial-Exec TLS model, which leads to the STATIC_TLS flag being set on the produced ELF binaries.

This is easy to see:

$ readelf --dynamic -W libGoShared.so 
... skipped ...
 0x000000000000001e (FLAGS)              SYMBOLIC STATIC_TLS
... skipped ...

That is the case for the c-shared library itself. But when a c-archive is produced, whatever code links it in inherits the STATIC_TLS flag.

So in the second (failing) example above, the entire libClibStatic.so inherits the STATIC_TLS flag as well, at least on x86_64 (in theory this can be arch-dependent).
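
The readelf check above can be scripted to compare several libraries at once. The following is a rough sketch that only parses the text layout shown in the readelf output above; the function names are hypothetical and the parsing assumes GNU readelf's `(FLAGS)` line format.

```python
import re
import subprocess
from typing import List


def dynamic_flags(readelf_output: str) -> List[str]:
    """Extract the DT_FLAGS names from `readelf --dynamic -W` text output."""
    for line in readelf_output.splitlines():
        # Match "(FLAGS)" exactly so the separate "(FLAGS_1)" entry is ignored.
        match = re.search(r"\(FLAGS\)\s+(.*)$", line)
        if match:
            return match.group(1).split()
    return []


def has_static_tls(readelf_output: str) -> bool:
    """True if the dynamic section carries the STATIC_TLS flag."""
    return "STATIC_TLS" in dynamic_flags(readelf_output)


def readelf_dynamic(path: str) -> str:
    """Run readelf on a library and return its output (requires binutils)."""
    result = subprocess.run(
        ["readelf", "--dynamic", "-W", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Line copied from the readelf output quoted above.
sample = " 0x000000000000001e (FLAGS)              SYMBOLIC STATIC_TLS\n"
```

Calling has_static_tls(readelf_dynamic("libClibStatic.so")) would then show whether the flag propagated into the C library.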

OK, there is a STATIC_TLS flag. So what?

Loading STATIC_TLS images is complicated.

On Linux, glibc preallocates a constant surplus of 512 bytes in the TLS block for further unspecified use. It is defined here: https://sourceware.org/git/?p=glibc.git;a=blob;f=elf/dl-tls.c;h=d554ae44976bc494da4e41aaa4f6ecb5ca2ca4be;hb=HEAD

Checking the sizes of the .tbss/.tdata sections via readelf -S, I see that Go uses only 0x10 bytes of TLS, which is less than the glibc surplus. Theoretically glibc can dlopen() up to 32 Go libraries before it runs out of that buffer.

But that doesn't hold for c-archive. A Go library in c-archive mode requires the same 16 bytes of TLS, but the code it is linked into may have additional TLS requirements. If that code needs more TLS, then the c-archive plus that code exceeds the glibc surplus and the whole load fails.

This would not happen if c-shared and c-archive did not force the Initial-Exec TLS model, since the 512-byte glibc limit applies only to that model. None of the other three models (General Dynamic, Local Dynamic, Local Exec) has this limit.
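
The budget argument above is simple arithmetic, sketched here with the figures quoted in this report (512 bytes of surplus, 0x10 bytes of Go TLS). This is a simplification: it ignores alignment padding and anything else already consuming the surplus, and the 1024-byte figure for the C code is a made-up illustration, not a measurement of lib.c.

```python
from typing import Iterable

GLIBC_SURPLUS = 512   # surplus size quoted in this report
GO_TLS_BYTES = 0x10   # .tbss/.tdata size observed for the Go library


def modules_that_fit(surplus: int, tls_per_module: int) -> int:
    """How many equally sized initial-exec TLS modules fit into the surplus."""
    return surplus // tls_per_module


def fits_in_surplus(surplus: int, tls_sizes: Iterable[int]) -> bool:
    """True if the combined TLS of one linked image fits into the surplus."""
    return sum(tls_sizes) <= surplus


if __name__ == "__main__":
    # c-shared: each Go library is its own module with 16 bytes of TLS.
    print(modules_that_fit(GLIBC_SURPLUS, GO_TLS_BYTES))
    # c-archive: Go's 16 bytes are added to the TLS of the code it is linked
    # into, and the whole image becomes STATIC_TLS (1024 is hypothetical).
    print(fits_in_surplus(GLIBC_SURPLUS, [GO_TLS_BYTES, 1024]))
```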

What did you expect to see?

Both c-shared and c-archive cases working.

What did you see instead?

C-archive case failing.

Contributor

@dr2chase dr2chase commented Sep 24, 2021

@ianlancetaylor
Hard to tell if this is easy or not possible.

@ianlancetaylor ianlancetaylor changed the title from "c-archive Output Mode Forces Initial-Exec TLS model" to "cmd/link: c-archive Output Mode Forces Initial-Exec TLS model" Sep 24, 2021
Contributor

@ianlancetaylor ianlancetaylor commented Sep 24, 2021

Offhand I don't know why this would be different for c-archive and c-shared. In both cases we pass -shared to cmd/compile.

Perhaps the difference is in cmd/link. I'm not sure. In general for c-archive the linker should produce an object that can be linked into a shared library, including using local-dynamic rather than initial-exec for the single TLS variable g. If that is indeed the problem, this should be doable but is probably not trivial. Perhaps some runtime changes are needed as well, I'm not sure.

CC @cherrymui

Author

@auzhva auzhva commented Sep 24, 2021

Both c-archive and c-shared carry the same STATIC_TLS flag and IE TLS model, so in that sense they work the same.

But that leads to different effects when the c-archive is combined with other user code: the c-archive propagates the STATIC_TLS flag to the entire codebase it is linked into (including user-defined code), and that combination fails.

c-shared avoids the failure only because it is isolated from the other user code.

If it's still confusing, I'll try to think of a better way to illustrate it... Or maybe the attached gist could help.

In the attached gist, lib.c and lib.go fail in combination but work separately, because lib.go forces the STATIC_TLS flag, which propagates to lib.c as well. That propagation doesn't happen in c-shared mode.

@fweimer-rh fweimer-rh commented Sep 26, 2021

Current glibc no longer uses all static TLS for optimizations:

commit ffb17e7ba3a5ba9632cee97330b325072fbe41dd
Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date:   Wed Jun 10 13:40:40 2020 +0100

    rtld: Avoid using up static TLS surplus for optimizations [BZ #25051]
    
    On some targets static TLS surplus area can be used opportunistically
    for dynamically loaded modules such that the TLS access then becomes
    faster (TLSDESC and powerpc TLS optimization). However we don't want
    all surplus TLS to be used for this optimization because dynamically
    loaded modules with initial-exec model TLS can only use surplus TLS.
    
    The new contract for surplus static TLS use is:
    
    - libc.so can have up to 192 bytes of IE TLS,
    - other system libraries together can have up to 144 bytes of IE TLS.
    - Some "optional" static TLS is available for opportunistic use.
    
    The optional TLS is now tunable: rtld.optional_static_tls, so users
    can directly affect the allocated static TLS size. (Note that module
    unloading with dlclose does not reclaim static TLS. After the optional
    TLS runs out, TLS access is no longer optimized to use static TLS.)
    
    The default setting of rtld.optional_static_tls is 512 so the surplus
    TLS is 3*192 + 4*144 + 512 = 1664 by default, the same as before.
    
    Fixes BZ #25051.
    
    Tested on aarch64-linux-gnu and x86_64-linux-gnu.
    
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>

The TLS surplus can also be tuned at run time, somewhat indirectly, using the glibc.rtld.optional_static_tls tunable. I'm not sure if it is necessary or even desirable to avoid static TLS.
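
As a rough sketch of what that could look like in practice: the default surplus formula below is taken from the commit message above, and the tunable is passed through the GLIBC_TUNABLES environment variable. The helper names and the 2048-byte value are illustrative assumptions, and the tunable only exists in glibc builds that include this commit and have tunables enabled.

```python
import os
from typing import Dict, Mapping, Optional


def default_surplus(optional_static_tls: int = 512) -> int:
    """Surplus static TLS per the commit message: 3*192 + 4*144 + optional."""
    return 3 * 192 + 4 * 144 + optional_static_tls


def env_with_optional_static_tls(
    size: int, base: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of the environment with the tunable appended."""
    env = dict(os.environ if base is None else base)
    setting = f"glibc.rtld.optional_static_tls={size}"
    existing = env.get("GLIBC_TUNABLES")
    # GLIBC_TUNABLES is a colon-separated list of name=value pairs.
    env["GLIBC_TUNABLES"] = f"{existing}:{setting}" if existing else setting
    return env


if __name__ == "__main__":
    print(default_surplus())
    # 2048 is an arbitrary illustrative value, not a recommendation.
    print(env_with_optional_static_tls(2048, {})["GLIBC_TUNABLES"])
```

The resulting dictionary could be passed as env= to subprocess.run(["./app"], ...) to launch the loader with a larger optional static TLS area.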

Author

@auzhva auzhva commented Sep 26, 2021

Oh, that sounds even worse.

If I understood correctly, the default glibc setting wouldn't be able to dlopen() cgo libs at all in some newer versions, would it? I hope I misunderstood, because otherwise that really sounds like an issue.

glibc tunables are outside any stability contract and are not even always enabled.

Tunables are not part of the GNU C Library stable ABI, and they are subject to change or removal 
across releases. Additionally, the method to modify tunable values may change between releases 
and across distributions. 

@fweimer-rh fweimer-rh commented Sep 26, 2021

If I understood correctly, the default glibc setting wouldn't be able to dlopen() cgo libs at all in some newer versions, would it? I hope I misunderstood, because otherwise that really sounds like an issue.

No, newer glibc versions have a larger TLS surplus by default, and dedicate some of this space to static TLS in future dlopen calls only (i.e., it won't be used up by dynamic TLS, as was the case before).
