Migrating fibers across threads vs. TLS #666

Open

dnadlinger opened this issue Jul 5, 2014 · 4 comments

@dnadlinger
Member

core.thread fibers are supposed to be safe to migrate across threads, i.e. it should be possible to resume them from a different thread than the one they yielded from. There is also a unit test to verify that this works.

However, there is a problem: LLVM currently (version 3.4/3.5 SVN) always assumes that the address of a thread-local variable doesn't change while on a given stack. This assumption hasn't really caused any problems with non-PIC code so far: TLS accesses are made by addressing off the fs or gs segment registers in most Windows/Linux x86 ABIs, so there is nothing to gain by caching. However, when using the ELF general dynamic TLS model (the default for PIC on Linux), the address is determined via a call to __tls_get_addr. If there are multiple accesses to a TLS variable in the same function, LLVM indeed caches the result of that call on the stack. This is a problem if some code in between those accesses ends up calling Fiber.yield, as the thread the code is running on, and thus the address of the TLS symbol, might have changed in the meantime.
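
A minimal D sketch of the hazard (the variable and function names here are made up for illustration): if the backend caches the __tls_get_addr result across the yield, the second access ends up in the wrong thread's TLS block.

```d
import core.thread : Fiber;

int counter;  // module-level variables are thread-local by default in D

void fiberFunc()
{
    counter = 1;   // first TLS access; the backend may cache &counter here
    Fiber.yield(); // the fiber may be resumed by a different thread...
    counter += 1;  // ...so a cached &counter would point into the old
                   // thread's TLS block instead of the current one's
}
```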

As a consequence, migrating fibers between threads is currently unsafe if the code they execute relies on TLS at all. See core.thread.Fiber.switchOut for an example of how this plays out – in user code, the context switch can be hidden behind many layers of calls, of course.

This is not a bug in druntime, but poses a problem for pretty much all userspace fiber/coroutine/green-thread implementations out there. The correct solution is to have an option in the compiler backend to disable caching of TLS addresses across (opaque) function calls. MSVC has a switch for this, while the GCC devs refuse to acknowledge the problem altogether. The LLVM/Clang devs are aware of the issue, although I don't think I agree with the proposed solution. This should likely be handled by just providing an option to force the target-specific lowering code not to keep the address across function calls.

Note that this is not an issue on the IR level. If the variable is read two times, the final IR contains two loads from the TLS variable, just as it is supposed to. I'm not sure yet which parts of the target lowering layer generate the address lookup code, but the problem needs to be addressed there.

In the meantime, a workaround would be to emit all TLS address lookups explicitly via a noinline function (like in the above druntime snippet, but emitted by the compiler), as sketched below. This would obviously be problematic in terms of performance, but we could probably use that code for implementing TLS emulation on Android as well.
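
As a rough sketch of that workaround, assuming a hypothetical thread-local variable tlsCounter (the real accessor would be generated by the compiler for every thread-local symbol, not written by hand):

```d
import core.thread : Fiber;

int tlsCounter;  // thread-local module variable

// The opaque, never-inlined call boundary forces the backend to re-derive
// the TLS address on every call instead of caching it on the stack.
pragma(inline, false)
int* addressOfTlsCounter()
{
    return &tlsCounter;
}

void example()
{
    *addressOfTlsCounter() = 1;
    Fiber.yield();                // may resume on another thread
    *addressOfTlsCounter() += 1;  // address is looked up again here
}
```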

@dnadlinger
Member Author

@ibuclaw: This is going to be an issue for you guys too. @MartinNowak: FYI as well, although DMD seems to always call __tls_get_addr for each load/store, so it's not an issue there.

@ibuclaw
Contributor

ibuclaw commented Jul 6, 2014

That gcc bug is old; I can bump it and cross-reference it back to llvm if you like.

I'm not so fussed about a compiler flag controlling behavior because you can just default it to 'on'.

@smolt
Member

smolt commented Mar 29, 2015

I noticed while working on the OS X merge-2.067 that the core.thread Fiber runShared unittest still crashes with release builds, because the code path that uses pthread_getspecific(sm_this) is not enabled for version OSX. Can we fix that locally as part of the merge-2.067 activity and later push it upstream? That would allow us to get a clean test run on OS X.
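
For reference, a hedged sketch of what such a pthread-key-based code path looks like (the names sm_this, getThis and setThis mirror core.thread, but this is not the actual druntime source): the current fiber is stored in a pthread-specific slot, so the lookup never goes through compiler-managed TLS and nothing can be cached across a context switch.

```d
import core.thread : Fiber;
import core.sys.posix.pthread;

// One process-wide key; each thread gets its own value slot.
__gshared pthread_key_t sm_this;

shared static this()
{
    pthread_key_create(&sm_this, null);
}

void setThis(Fiber f)
{
    pthread_setspecific(sm_this, cast(void*) f);
}

Fiber getThis()
{
    return cast(Fiber) pthread_getspecific(sm_this);
}
```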

@ibuclaw
Contributor

ibuclaw commented Apr 30, 2020

Confirmed that this affects PPC64 too.

We now have core.volatile, so I wonder if the sm_this accesses could be done through volatileLoad (a rough sketch below). So far the only workaround that works about three quarters of the time is to disable inlining of switchIn, switchOut, and getThis. Setting the optimization level of these (and maybe some other) functions to -O0 has a 100% success rate, but that is a non-standard extension.
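
A rough sketch of what such an access might look like, assuming a pointer-sized thread-local (sm_this here is a stand-in for the variable in core.thread; whether this actually stops the address caching, as opposed to only the value caching, is exactly the open question):

```d
import core.thread : Fiber;
import core.volatile : volatileLoad, volatileStore;

Fiber sm_this;  // stand-in for the thread-local in core.thread

Fiber getThisViaVolatile()
{
    // Load the pointer-sized value through volatileLoad so the backend
    // cannot reuse a previously loaded value of the variable.
    return cast(Fiber) cast(void*) volatileLoad(cast(size_t*) &sm_this);
}

void setThisViaVolatile(Fiber f)
{
    volatileStore(cast(size_t*) &sm_this, cast(size_t) cast(void*) f);
}
```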
