Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inlining assembly into slp_switch is fragile and compiler implementation-dependent #66

basak opened this issue Oct 14, 2014 · 1 comment


Copy link

@basak basak commented Oct 14, 2014

The underlying issues leading to pull requests #64 and #65 suggest to me that slp_switch has potential to be compiled buggy when undefined compiler behaviour changes. Specifically:

  • The gcc manual says: "Note that the compiler can move even volatile asm instructions relative to other code, including across jump instructions." We rely that the assembly in slp_switch is not re-ordered around calls to slp_save_state and slp_restore_state.
  • The declaration of REGS_TO_SAVE with registers that are clobbered only applies to that __asm__ statement itself. A compiler might, for example, fulfill the need to preserve these registers in each call just by pushing and popping them before and after. This breaks the assumption that these registers will be saved across the slp_save_state and slp_restore_state calls.
  • The code after slp_restore_state runs in a "different" stack (right?). The compiler does not know this, and can generate code that assumes that registers are preserved within the function but over the call to slp_restore_state.
  • Fundamentally, changing the stack and frame pointers under the compiler's feet always risks tripping it up, unless you assume that it generates the code in a particular way (which it didn't in this case).

It's also particularly awkward to debug a problem in this code, since it heavily depends on what gcc generated and this is more difficult to read.

It seems to me that all of these issues would go away if slp_switch were implemented in a function defined entirely in assembly.

I understand that the SLP_SAVE_STATE and SLP_RESTORE_STATE macros would no longer be possible so changes to these macros would instead have to be done individually for every single platform. But I think this is a better option than having them end up generating bad code, which is what has happened in this case because the current mechanism depends on undefined behaviour. Looking at the git history, it looks like one macro only had to be changed once.

I'll leave it up to you. Feel free to close as "Won't Fix" - in this case I guess we'll just have to act when we see test failures.

(speaking of which, thank you for the tests, which picked up on these issues, and give me confidence that my workarounds work)

Copy link

@snaury snaury commented Oct 14, 2014

Yes, I'm well aware of the issue, and full rewrite in assembly in the only proper solution. Unfortunately it's not that easy, one would need to call helper functions to do actual save/restore. I could do it for x86, then maybe for amd64 (harder, it needs to maintain stack alignment guarantees, which are impossible to know within __asm__ blocks), but for other platforms this would be simply out of the question, not by me at least.

I don't have enough interest to do this, and in my opinion Go, Rust and proper coroutines are the future, not flimsy hacks like greenlet. Even if greenlet did switching perfectly, saving and restoring (or rather overwriting) stacks the way greenlet currently does is unsafe, which breaks gc, breaks threaded C++ code (global pointers to stack allocated objects become unsafe). And I'm sure no one would be thrilled if I moved to proper switching code (e.g. from Boost.Context), used separate stacks (would severely limit 32-bit architectures, e.g. no more than ~2000 greenlets), and dropped lots of architectures in the process. So until someone has enough itch to do the job we are left to working around smarter and smarter compilers.

Btw, pypy has a much better solution to this problem (but for a much smaller set of platforms), single asm block with pointers to save/restore functions. But it's not ideal either due to possibly incomplete clobber lists (callee saved registers are tricky) or potential stack alignment issues (which you can't maintain in __asm__ blocks, since you don't have any guarantees in there). Works in practice though, even has a cross-stacklet stack introspection, which is awesome!

Not closing since the issue is true, and is awaiting its heroes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants