gollvm: find a better way to deal with g #37295
Currently, gollvm stores the current g in TLS. The runtime.getg() function returns the current g. This function is inlined for better performance, and in some situations the GoSafeGetg pass disables the inlining. Cherry described this situation in that pass:
A practical example of this situation: gofrontend/chan.go#154
As far as I know, this kind of optimization is common in LLVM and GCC, and it is correct and beneficial for C/C++. Unless C/C++ introduces a concept like goroutines, I don't expect this optimization to change. For similar issues, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26461 and https://bugs.llvm.org/show_bug.cgi?id=19177.
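To make the hazard concrete, here is a minimal C sketch of the pattern, not code from gollvm. The names `struct G`, `current_g`, `may_switch_thread`, and `getg_call` are hypothetical stand-ins. It assumes the GCC/Clang `__thread` and `noinline` extensions, and no real thread switch happens in it; the point is only to show where a cached TLS address would become stale.

```c
#include <assert.h>

/* Hypothetical stand-in for the runtime's per-goroutine structure. */
struct G { int id; };

/* Hypothetical TLS slot holding the current g, one copy per OS thread. */
static __thread struct G *current_g;

/* Stand-in for any call that may park the goroutine and resume it on a
 * different OS thread. Here it is only an opaque call; nothing switches. */
__attribute__((noinline)) static void may_switch_thread(void) {
    __asm__ volatile("" ::: "memory");
}

/* Out-of-line getg: the TLS address is recomputed inside every call. */
__attribute__((noinline)) static struct G *getg_call(void) {
    return current_g;
}

/* Returns 1 when both reads observe the same g. */
int same_g_across_call(void) {
    struct G *before = getg_call();
    may_switch_thread();
    /* If getg were inlined here, the compiler could legally reuse the
     * address of current_g computed for `before`. After a real thread
     * switch that address would belong to the old thread's TLS block. */
    struct G *after = getg_call();
    return before == after;
}

/* Small driver: installs a g for this thread and runs the check. */
int demo_same_g(void) {
    static struct G g0 = { 1 };
    current_g = &g0;
    return same_g_across_call();
}
```

Keeping the second read as a real call, as in this sketch, is safe but costs a call for every read after the first.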
At present, the methods I can think of are as follows:
Personally I'd really like a way to let the backend not cache TLS addresses in GCC and LLVM. I don't think this is specific to Go. In C, if you use
I thought about that. The tricky thing is that the libgo runtime uses C for many things, including external C code, libbacktrace, libffi, libgcc, and syscall wrappers from libc. If we reserve a register globally, all of this C code needs to be compiled in a special way. Otherwise we need a wrapper that saves and restores the reserved register at C library boundaries.
I'm not sure how this is going to work. Also, the C code seems to need special compilation.
Maybe one possibility is to keep the current approach and add a machine-IR pass, run after CSE, that inlines the getg calls back in. This would be machine-dependent, and I'm not sure whether it can be done for gccgo.
Hi Cherry, I also thought about many possible solutions, including:
Here is an example on Linux/arm64:
Compile the above getg.c file with "clang -O2 -o getg getg.c".
As you can see, getg_new is inlined, and it also avoids the CSE problem. Then we don't need the GoSafeGetg pass any more. I'm not familiar with x86 assembly, so I didn't write an x86 example, but I think this approach applies to other architectures.
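The getg.c listing is not preserved in this thread, so the following is only a hedged sketch of what an inline-assembly getg could look like on Linux/arm64, not the author's actual code. It assumes the local-exec TLS model (the variable lives in the main executable), and `struct G`, `current_g`, and `demo_getg_new` are hypothetical names. On other targets it falls back to a plain TLS load, which does not give the no-caching guarantee.

```c
#include <assert.h>

/* Hypothetical stand-in for the runtime's per-goroutine structure. */
struct G { int id; };

/* Non-static so the assembler can refer to the symbol by name. */
__thread struct G *current_g;

static inline __attribute__((always_inline)) struct G *getg_new(void) {
#if defined(__aarch64__) && defined(__linux__)
    struct G *res;
    /* Read the thread pointer and add the local-exec offset of current_g
     * on every call. The volatile asm is opaque to the compiler, so it
     * cannot merge this address computation with an earlier one. */
    __asm__ volatile(
        "mrs %0, tpidr_el0\n\t"
        "add %0, %0, #:tprel_hi12:current_g, lsl #12\n\t"
        "add %0, %0, #:tprel_lo12_nc:current_g\n\t"
        "ldr %0, [%0]\n\t"
        : "=r"(res)
        :
        : "memory");
    return res;
#else
    /* Fallback for other targets: ordinary TLS load, may be cached. */
    return current_g;
#endif
}

/* Small driver: installs a g for this thread and reads it back. */
int demo_getg_new(void) {
    static struct G g0 = { 7 };
    current_g = &g0;
    return getg_new()->id;
}
```

The trade-off is that every call executes the full sequence, even when no thread switch could have happened between two calls.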
The getg function is called frequently in the runtime and its performance is critical, so we turn the call to getg into a load operation in order to inline it. To avoid the thread pointer caching problem when a thread switch happens between two calls of this function, we only inline the first one through the GoSafeGetg pass.

This implementation is not feasible on Linux/arm64, because it assumes that the LLVM backend's CSE optimization of the thread pointer only happens within a block, whereas on Linux/arm64 this optimization occurs across the entire function. Expanding the GoSafeGetg pass to the entire function is not a good solution, because a large number of getg calls would still not be inlined.

This CL simulates the implementation of getg in the C file through inline assembly. This not only ensures correctness, but also ensures that every getg call in Go files is inlined. The disadvantage is that if no thread switch occurs between two getg calls, the second call could in theory be optimized away, but not in this implementation: the full instruction sequence of getg is executed every time it is called.

This CL also adds a unit test case for this implementation.

Updates golang/go#37295

Change-Id: If9e47b2afeb420a1d0316f2a82602b18bed82477
Reviewed-on: https://go-review.googlesource.com/c/gollvm/+/228737
Reviewed-by: Than McIntosh <firstname.lastname@example.org>