cmd/compile: performance of go wasm is very poor #65440
Comments
cc @golang/wasm @golang/compiler |
I manually modified the wat code to express the inner
|
The reason for the use of br_table is to support goroutines in WebAssembly's single threaded environment. The generated code is able to unwind and rewind the WebAssembly stack when goroutines are switched, and so each block of code is accessible via br_table. |
Thank you for your reply. Does that mean the current use of br_table is a temporary workaround for WebAssembly's single-threaded environment? The performance of Go wasm is currently poor. Is there any plan to optimize the use of br_table? |
See huge discussion here on why WebAssembly is designed in such a way: WebAssembly/design#796 (comment) So far this design flaw (my opinion) has not been resolved. |
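It might be possible to compile the innermost loop with the loop instruction if it is not preemptible (contains no calls). On the other hand, a nonpreemptible loop is not a good thing in general. Maybe it is fine on Wasm as there is no preemption anyway? |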
Cherry, I had the same thought looking at the code last night. I’m in that code atm working on the wasm32 port, I’ll do a little experimenting to see if we can optimize a call-less loop.
On Fri, Feb 2, 2024 at 7:53 AM cherrymui wrote:
It might be possible to compile the innermost loop with the loop
instruction if it is not preemptible (contains no calls). On the other
hand, nonpreemptible loop is not a good thing in general. Maybe it is fine
on Wasm as there is no preemption anyway?
|
Even if we had threads, I don't think this would allow us to rely on the host for scheduling: we need GC and resizable stacks, which the host is unlikely to implement the way we want. (Wasm GC still has no support for inner pointers AFAIK.) |
After looking at WebAssembly/design#796 (really good link btw) I think we could emulate Would need to be benchmarked before trying out an implementation. Edit turns out this was already proposed in WebAssembly/design#796 (comment):
|
I'm not sure we can use tail calls to emulate |
@cherrymui I thought we didn't / rarely used local variables. |
I suspect something like https://github.com/WebAssembly/stack-switching would make a lot of these problems go away. In this model, the Go runtime's scheduler would be the parent fiber of all other fibers, and all other fibers would be goroutines. Calling into the scheduler or onto the systemstack then just involves switching to the parent fiber. We'll still probably need to use linear memory as the data stack, but at least we'll have a much better way to change what is executing. I do not know what the status of Wasm stack switching is at this point in time. |
@Jorropo we use local variables for "registers". The "shadow" stack in linear memory is used for non-registerized local variables like all other architectures. If we don't use "local variables", that essentially behaves like a non-registerized build, which would probably not be very performant. I agree with @mknyszek that with the stack switching support, we probably don't need to use the top level loop and branch tables. |
Go version
go version go1.21.6 linux/arm64
Output of go env in your module/workspace:
What did you do?
As shown in the following example, test.go is compiled into wasm:
test.go:
go to wasm compile command:
GOOS=wasip1 GOARCH=wasm go build -o test.wasm test.go
What did you see happen?
As you can see in the wat code, the for loop is expressed using the br_table operation.
wat code:
What did you expect to see?
When the AOT compiler of a wasm runtime performs backend optimization, it is difficult for it to identify the br_table as a for loop. So, during backend optimization, this for loop was not optimized.
I tested several of the most popular wasm runtimes, such as wasmtime, wamr, and wasmer, and found that the performance of Go wasm after AOT compilation was very poor: runtime performance was only 20% of native Go in the best case.
Why use so many br_table operations instead of the loop operation? Will the performance of Go wasm be optimized in the future?
Also, I found that the wat code of the Go runtime functions uses br_table a lot; the craziest function has 417 hops in its br_table.