runtime: preserve extra M across calls from C to Go #51676
I finished a draft PR for this proposal: #51679

With `hello.go`:

```go
package main

import "C"

//export AddFromGo
func AddFromGo(a int64, b int64) int64 {
	return a + b
}

func main() {}
```

and `hello.c`:

```c
#include <stdio.h>
#include "libgo-hello.h"
#include <stdlib.h>

int main(int argc, char **argv) {
    long a = 2;
    long b = 3;
    long max = 1;
    if (argc > 1) {
        max = atoi(argv[1]);
    }
    printf("max loop: %ld\n", max);

    PreBindExtraM();

    long r;
    for (int i = 0; i < max; i++) {
        r = AddFromGo(a, b);
    }
    printf("%ld + %ld = %ld\n", a, b, r);
}
```

Benchmark with the `PreBindExtraM()` call:

```
$ time ./hello 1000000
max loop: 1000000
2 + 3 = 5

real    0m0.150s
user    0m0.156s
sys     0m0.010s
```

Benchmark without it:

```
$ time ./hello 1000000
max loop: 1000000
2 + 3 = 5

real    0m5.088s
user    0m1.536s
sys     0m4.116s
```
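Dividing the wall-clock times of the two runs by the 1,000,000 loop iterations gives a rough per-call figure. This is only approximate: it also counts process startup and the two `printf` calls, and each number comes from a single run.

$$
\frac{5.088\ \text{s}}{10^{6}\ \text{calls}} \approx 5.09\ \mu\text{s per call (without)}, \qquad
\frac{0.150\ \text{s}}{10^{6}\ \text{calls}} = 0.15\ \mu\text{s per call (with)}, \qquad
\frac{5.088}{0.150} \approx 34
$$

In the run without the call, 4.116 s of the 5.088 s (about 81%) is system time, which fits the explanation that the extra cost comes from syscalls made on every call.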
Even after looking at the pull request, I'm not sure precisely what you are proposing. Is user code expected to call `PreBindExtraM`? What are the exact semantics of that function? How would you write user documentation for it? Thanks.
@ianlancetaylor Thanks.

Yes, user code has to call it. Let me try to write some documentation for it. In short, when a Go exported function is called from a C process, the flow is as follows:

In step 1 (…). To avoid these five signal syscalls, cgo also generates a built-in C function …
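Per the issue description, the five signal syscalls are `sigprocmask` calls (alongside three `sigaltstack` calls) made on every call from C into Go. As a rough, hypothetical illustration of the save/block/restore pattern behind those calls, here is a C helper that blocks every signal around a piece of work and then restores the caller's mask. It is not the Go runtime's code; `with_all_signals_blocked` and `record_sigusr1_blocked` are names made up for this sketch. It uses `sigprocmask`, which changes the calling thread's mask on Linux; portable multithreaded code should prefer `pthread_sigmask`.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical helper: run fn(arg) with every blockable signal blocked,
 * then restore the caller's original mask. Each sigprocmask call is a
 * system call, so paying for this on every C-to-Go call adds up.
 * Returns 0 on success, -1 on failure. */
static int with_all_signals_blocked(void (*fn)(void *), void *arg) {
    assert(fn != NULL);
    sigset_t all, old;
    if (sigfillset(&all) != 0)
        return -1;
    /* Block everything and save the previous mask in `old`. */
    if (sigprocmask(SIG_SETMASK, &all, &old) != 0)
        return -1;
    fn(arg);
    /* Restore exactly the mask the caller had on entry. */
    if (sigprocmask(SIG_SETMASK, &old, NULL) != 0)
        return -1;
    return 0;
}

/* Hypothetical callback: store 1 in *out if SIGUSR1 is currently blocked
 * in the calling thread, 0 if it is not, -1 on error. */
static void record_sigusr1_blocked(void *out) {
    sigset_t cur;
    /* With a NULL new set, sigprocmask only reads the current mask. */
    if (sigprocmask(SIG_BLOCK, NULL, &cur) != 0) {
        *(int *)out = -1;
        return;
    }
    *(int *)out = sigismember(&cur, SIGUSR1);
}
```

Pre-binding the extra M, as proposed here, is meant to avoid repeating this kind of work on every call rather than to change what the work does.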
I haven't thought through this deeply, but is the TODO(rsc) comment on …
OK, I think that in effect what the suggested change does is, for a thread created by C, set the …

So, I agree: the TODO by @rsc is a better approach. With that approach, the first time a C thread calls into Go we allocate a G and M and set the …

Note that we will get into trouble if the C thread calls Go code, then disables the signal stack, then calls Go code again. Perhaps that case is not worth worrying about.

I'm going to take this out of the proposal process because I think we can get the same effect without an API change.
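The edge case about a C thread disabling its signal stack between calls can be made concrete with a small, hypothetical C sketch: a thread can install an alternate signal stack with `sigaltstack`, later disable it, and query its state at any time. The helper names and the 64 KiB size are assumptions made up for this example; the point is only that per-thread state recorded on the first call into Go could go stale if C code changes the signal stack afterwards.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdlib.h>

/* Size chosen for this sketch; comfortably above MINSIGSTKSZ on common platforms. */
enum { ALT_STACK_SIZE = 64 * 1024 };

/* Returns 1 if the calling thread has an alternate signal stack
 * installed, 0 if it does not, and -1 if the query fails. */
static int has_alt_signal_stack(void) {
    stack_t cur;
    /* A NULL new stack means "only report the current one". */
    if (sigaltstack(NULL, &cur) != 0)
        return -1;
    return (cur.ss_flags & SS_DISABLE) ? 0 : 1;
}

/* Hypothetical helper: allocate and install an alternate signal stack
 * for the calling thread. Returns the stack memory, or NULL on failure. */
static void *install_alt_signal_stack(void) {
    assert(ALT_STACK_SIZE >= MINSIGSTKSZ);
    stack_t ss;
    ss.ss_sp = malloc(ALT_STACK_SIZE);
    ss.ss_size = ALT_STACK_SIZE;
    ss.ss_flags = 0;
    if (ss.ss_sp == NULL)
        return NULL;
    if (sigaltstack(&ss, NULL) != 0) {
        free(ss.ss_sp);
        return NULL;
    }
    return ss.ss_sp;
}

/* Hypothetical helper: what C code might do between two calls into Go.
 * The old stack memory must not be freed until this succeeds. */
static int disable_alt_signal_stack(void) {
    stack_t ss = {0};
    ss.ss_flags = SS_DISABLE;
    return sigaltstack(&ss, NULL);
}
```

Detecting such a change reliably would presumably mean querying the signal stack again on each call, which reintroduces a syscall per call; that is one reason ignoring this edge case may be acceptable.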
Sorry for the basic question, but does the runtime already track today when a thread created by C exits?
To partly answer my own question, it looks like registering a destructor that would be called on thread exit would be part of the work here...
Yes, we would use …
Oh, agreed, the TODO by @rsc is a better approach. Using …

Does it need to create a new …? Is the following change on the right track? I would love to give it a try. Thanks.

In short, we always try to pre-bind an M in every Go exported function, and drop the M in a destructor to avoid leaking Ms.
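The approach (bind an M on a thread's first call, reuse it, and release it in a destructor at thread exit) can be modeled in plain C with POSIX thread-specific data: `pthread_key_create` registers a destructor that runs when a thread exits with a non-NULL value for the key. In this hypothetical sketch, `struct fake_m` stands in for the runtime's extra M, `enter_go` for the entry path of a Go exported function, and `drop_m` for `dropm`; none of these are real runtime names, and the actual runtime implementation may differ in detail.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the runtime's per-thread extra M. */
struct fake_m {
    long calls; /* number of calls made through this M */
};

static pthread_key_t m_key;
static pthread_once_t m_key_once = PTHREAD_ONCE_INIT;

/* Counters for the demo, protected by counters_mu. */
static pthread_mutex_t counters_mu = PTHREAD_MUTEX_INITIALIZER;
static int binds; /* Ms bound on a thread's first call (like needm) */
static int drops; /* Ms released at thread exit (like dropm) */

static void bump(int *counter) {
    pthread_mutex_lock(&counters_mu);
    (*counter)++;
    pthread_mutex_unlock(&counters_mu);
}

/* Destructor: called at thread exit for each thread whose value for
 * m_key is non-NULL, so the M is released instead of leaked. */
static void drop_m(void *m) {
    free(m);
    bump(&drops);
}

static void make_m_key(void) {
    int rc = pthread_key_create(&m_key, drop_m);
    assert(rc == 0);
    (void)rc;
}

/* Models the entry path of a Go exported function: bind an M on the
 * first call from this thread and reuse it on every later call. */
static struct fake_m *enter_go(void) {
    pthread_once(&m_key_once, make_m_key);
    struct fake_m *m = pthread_getspecific(m_key);
    if (m == NULL) {
        m = calloc(1, sizeof *m);
        if (m == NULL || pthread_setspecific(m_key, m) != 0) {
            free(m);
            return NULL;
        }
        bump(&binds);
    }
    m->calls++;
    return m;
}

/* A thread created by C that calls "into Go" *n times, then exits. */
static void *c_thread(void *n) {
    for (int i = 0; i < *(int *)n; i++)
        enter_go();
    return NULL;
}
```

Under this model, a C thread that makes 1,000 calls binds one M and releases it exactly once when it exits, instead of binding and dropping an M on every call.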
Yes, that is the right thing to do. Your set of steps sounds basically right.
Okay, jumping out of the …
I'm not sure I completely understand what you mean, but I think that's the right direction.
Creating the m in …
Yeah, I mean keep …

Yeah, this sounds better than …
I have implemented the new way in CL 387415. In CL 387415, we introduced two variables:
Change https://go.dev/cl/392854 mentions this issue.
Some time ago, I briefly looked into whether an equivalent solution might be possible on Windows. FWIW, some people seem to suggest that the Fiber Local Storage functions on Windows could provide a destructor callback on thread exit, even when fibers are not used. It looks like FlsAlloc takes an FlsCallback that is called at thread exit (and at fiber deletion):
Some more from another piece of the Fiber documentation, including how FLS is treated if no fiber switching has happened:
There is also a related discussion in the MSDN forums about trying to emulate a `pthread_key_create` destructor on Windows. In that discussion, user …

For Fiber Local Storage, one wrinkle would be if the user code itself uses fibers and, for example, deletes the fiber of interest. At that point, as I understand it, the destructor would be called before the thread exited, but maybe things could be set up so that this scenario is just a performance hit (that is, roughly the same performance as today), where the M and any other resources are released "early" compared to if the fiber hadn't been deleted? And if the fiber is not deleted, thread exit still properly releases things, which avoids a leak. Maybe?

In any event, this might not work, and please take it with a large grain of salt, but I wanted to at least leave a note here in case it is helpful for any future work after the (exciting!) non-Windows version lands.
A comparison instruction was missing in CL 392854. Should fix ARM builders.

For #51676.

Change-Id: Ica27a99be10e595bab4fad35e2e6c00a1c68a662
Reviewed-on: https://go-review.googlesource.com/c/go/+/479255
TryBot-Bypass: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Change https://go.dev/cl/479255 mentions this issue.
There are 5 `sigprocmask` calls and 3 `sigaltstack` calls every time a Go exported function is called from C.

Syscalls during `needm`:

Syscalls during `dropm`:

We can call `PreBindExtraM` to bind an extra M after loading the Go .so file and before calling any Go exported functions, for better performance. Nothing changes if `PreBindExtraM` is not called.

Background:

We are building a Go extension for Envoy; it relies heavily on cgo.