Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal/glimpl: [wasm] remove syscall/js #4

Closed
wants to merge 1 commit into from

Conversation

inkeliz
Copy link
Contributor

@inkeliz inkeliz commented Dec 17, 2020

That change avoid JS calls through syscall/js. Instead we use the
WebAssembly imports, which is defined on the gl_js.s and gl_js.js.

That change is similar to the previous patch, but it's easier to read
the code and fix. It also have only five JavaScript functions, instead
of one function for each WebGL call.

Opera 72 (w/ AMD Ryzen 3900X): ~8.0ms per frame to ~3.4ms;
Chrome 87 (w/ Snapdragon 435): ~83.2ms per frame to ~39.8ms;

The gl_js.js file is embed by the gogio compiler.

Signed-off-by: Inkeliz inkeliz@inkeliz.com

@inkeliz inkeliz force-pushed the inkeliz-syscall-mini branch 2 times, most recently from cd87e2b to 60829a4 Compare December 17, 2020 10:08
@inkeliz
Copy link
Contributor Author

inkeliz commented Dec 17, 2020

I forgot to include the gl_js.s file in the commit. 🤣
I hope that it will pass the tests now. 🤞

@inkeliz
Copy link
Contributor Author

inkeliz commented Dec 17, 2020

The syscall/js is known to be slow, I don't think it will improve soon, so that change is to "bypass" the syscall/js at all.


No one asked, but I will answer...

1. Why not[]interface?
It is expensive. The GC triggers very often, and the runtime.convTslice (and such) is painfully slow.
Using []interface, the time per frame drops to ~50ms (from ~80ms). Using []uintptr the time per frame drops to ~40ms.
In general, switching to []uintptr saves around ~10ms on smartphone and ~0.7ms on a desktop.

2. Why create newProc?
That avoids the function name to be converted everytime. Yes, it is expensive. The original code (using syscall/js) needs to convert the golang-string to js-string EVERY CALL. Supposing that you are calling window.console.log you will use: console.Call(“log”, …). That “log” must be converted to JS-string, each time that you call that. It is not cheap as you are supposing, the LoadString function is one of the slowest on JS side.

The newProc will re-use the same string. Also, since we are not using interface, we need to know the type of input. Usually, the interface provide us the type, but interface is slow.

Furthermore, the ref value varies their meaning accordingly with the output provided, as commented in the code. The ref could be either: the length of string, the reference of the js-object or the integer-value.

Using Inkove instead of Call (on syscal/js) may improve that, but will be slower than this change anyway).

3. Why drop js.Value?
Mixing js.Value and custom JS is dangerous. The older patch does that quite well, but at first it works on Go 1.15 and not on Go 1.14. So, trying to make it compatible with js.Value is not worth.

4. Where is the GC? Why not use runtime.SetFinalizer?
Seems that we did not need. We have the delete* function. Such as DeleteBuffer. So, it also calls the free, which flag that index as available for re-use. So, instead of increasing the list of values, it will re-use the same slot of the previous “free” value.

Yes, each Delete* will call JS twice, but relying on runtime.SetFinalizer will also call twice, and also lead to more inconsistent frame-time.

5. Why have a shared string buffer?
String is expensive. The original syscall/js makes almost 3 calls to create the string. I did it with a single call. (:
The idea is to create an already-known buffer. If you use _glTypeString it will set that buffer and the ref will be the length of the string. So, when you call Call it already sets the buffer. The buffer (which is already on the Golang side) will the read by String(). Exactly one single call to JS.

The string cannot be larger than 4096 bytes, that seems enough for us. It is not threading safe, but WebGL does not seems to be thread-safe either.

6. That works?
Yes. It is working fine on multiple versions of Golang, without any specific change for each version. For browser, it requires the ES6 (2015).

I test it against multiple devices and multiple conditions (WebGL 1-only, WebGL 2-desabled, WebGL 2) and on multiple devices that I have available, also using the BrowserStack. Far I tested, it works even on Firefox 53 (2017-04-18), and Chrome 63 (2017-12-05) and Safari 12. So far, the browser-support seems on par with the current Gio version.

I tested with the "Kitchen" and my own app, only.

Copy link
Contributor

@eliasnaur eliasnaur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3. Why drop js.Value?
Mixing js.Value and custom JS is dangerous. The older patch does that quite well, but at first it works on Go 1.15 and not on Go 1.14. So, trying to make it compatible with js.Value is not worth.

Can you elaborate what makes mixing js.Value and low-level javascript assembly dangerous, or at least more dangerous than your custom syscall approach? As far as I can tell, this argument blocks leaving performance insensitive function using the slower but safer syscall/js.

}
func (f *Functions) Uniform2f(dst Uniform, v0, v1 float32) {
f.Ctx.Call("uniform2f", js.Value(dst), v0, v1)
vv0, vv1 := float64(v0), float64(v1) // Float32 doesn't work
f.Ctx.Call(_glUniform2f, uintptr(unsafe.Pointer(&dst)), uintptr(unsafe.Pointer(&vv0)), uintptr(unsafe.Pointer(&vv1)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uniform etc. are uints. Won't all these &value pointer taking expressions lead the values to be heap-allocated and thus subject to garbage collection?

Copy link
Contributor Author

@inkeliz inkeliz Dec 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right, did you have any idea to avoid it? Using []interface seems to have the same issue, actually even worse. I need to get the pointer anyway.

About the float64() it's also done internally by the syscall/js, https://github.com/golang/go/blob/master/src/syscall/js/js.go#L162-L221. I'm trying find a way to read the float32 directly on the JS too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment about generating the assembly stubs.

@inkeliz
Copy link
Contributor Author

inkeliz commented Dec 18, 2020

3. Why drop js.Value?
Mixing js.Value and custom JS is dangerous. The older patch does that quite well, but at first it works on Go 1.15 and not on Go 1.14. So, trying to make it compatible with js.Value is not worth.

Can you elaborate what makes mixing js.Value and low-level javascript assembly dangerous, or at least more dangerous than your custom syscall approach? As far as I can tell, this argument blocks leaving performance insensitive function using the slower but safer syscall/js.

Well, the Call() is by slow by default, since it needs to "translate" the golang-string to javascript-string (using Invoke instead of Call might mitigate that).

Anyway....

Let suppose that NewFramebuffer is fast using syscall/js, but BindFramebuffer is slow:

The BindFramebuffer (custom js) will take the js.Value. Then, we need to read that value directly from go._values. If the js.Value changes (either new fields or change the order of the fields), your custom code will break...

The oposite is even worse, if NewFramebuffer is slow and BindFramebuffer is fast:

The NewFramebuffer (custom js) needs to generate a valid js.Value. So.... The js.Value created by the custom-function must have the same "signature" (the same flags) of the syscall/js.Value. That "signature"/"flags" changes from Go 1.14 to Go 1.14.1 to Go 1.15, and may change in Go 1.16 too.

The current patch doesn't have to match any flag with the syscall/js. The only need is to have the go.mem (on the JS side) exposed, that is only used to read/write the input/output from the function. The go.mem (js) is available on Go 1.14.


In general currently we only need go._mem, and doesn't need anything from syscall/js.

@inkeliz
Copy link
Contributor Author

inkeliz commented Dec 18, 2020

I forcepush just to replace the Float64 to Float32, avoiding the float64().

@eliasnaur
Copy link
Contributor

3. Why drop js.Value?
Mixing js.Value and custom JS is dangerous. The older patch does that quite well, but at first it works on Go 1.15 and not on Go 1.14. So, trying to make it compatible with js.Value is not worth.

Can you elaborate what makes mixing js.Value and low-level javascript assembly dangerous, or at least more dangerous than your custom syscall approach? As far as I can tell, this argument blocks leaving performance insensitive function using the slower but safer syscall/js.

Well, the Call() is by slow by default, since it needs to "translate" the golang-string to javascript-string (using Invoke instead of Call might mitigate that).

Indeed, any syscall/js is slow. I hope to avoid it where the slowness is in the per-frame hot path.

Anyway....

Let suppose that NewFramebuffer is fast using syscall/js, but BindFramebuffer is slow:

The BindFramebuffer (custom js) will take the js.Value. Then, we need to read that value directly from go._values. If the js.Value changes (either new fields or change the order of the fields), your custom code will break...

The oposite is even worse, if NewFramebuffer is slow and BindFramebuffer is fast:

The NewFramebuffer (custom js) needs to generate a valid js.Value. So.... The js.Value created by the custom-function must have the same "signature" (the same flags) of the syscall/js.Value. That "signature"/"flags" changes from Go 1.14 to Go 1.14.1 to Go 1.15, and may change in Go 1.16 too.

The current patch doesn't have to match any flag with the syscall/js. The only need is to have the go.mem (on the JS side) exposed, that is only used to read/write the input/output from the function. The go.mem (js) is available on Go 1.14.

In general currently we only need go._mem, and doesn't need anything from syscall/js.

To make the discussion clearer, I added a sketch of an approach that should be compatible with both syscall/js and whatever faster assembly approach we end up with, by storing both a js.Value and an integer reference for each object:

https://git.sr.ht/~eliasnaur/gio/commit/cf926a7491a63a7cfddb530f4f93bcabf6930700

WDYT?

@inkeliz
Copy link
Contributor Author

inkeliz commented Dec 19, 2020

I'm not sure if that is worth it. I don't think it solves any issue:

  1. It doesn't remove the need of go.mem (which is a DataView).

But, additionaly:

  1. It will call the slow function of js.Global().Call("createRef", v).Int(), and use the slow syscall/js.
  2. It will need to delete the reference twice.

I use something similar here: 0289d07#diff-bf9a8e77af361576a7335f1f62097cd39db03ab401616e81df00a3a9be5aa46dR168-R170

The global.GlimpContext get's the js.Value and create another reference. However, it's only called on the NewContext, which is infrequently called.


I'm not sure if I agree with the idea of "just replace slow functions". There's a lot of calls to WebGL, every single frame. For me, all functions must be as fast as possible.

Also, let's consider that we have six slow functions, such as BufferData, CheckBufferframe, ReadPixel, EnableVertexAttribArray, VertexAttribPointer and Viewport. In the end, we will have 6 JS functions, 6 assemblies (and some additional overhead due to the another js.Global().Call) and worse performance than the current PR... Just replacing the BufferData saves ~1.5ms (of ~8ms), but changing everything saves ~4.5ms, so why not?

I'm not sure why you are so worried about dropping the syscall/js, the syscall/js also say:

This package is EXPERIMENTAL. Its current scope is only to allow tests to run, but not yet to provide a comprehensive API for users. It is exempt from the Go compatibility promise.

The current PR has only five functions. If it has some bug will be easier to spot than the older one, since all WebGL functions are using the same call function. It also passes all tests, including tests on real-browsers (on WebGL 1 and WebGL 2). I think it even possible to revert this commit if it breaks due to Golang changes (changes on the wasm_exec or the new //go:wasmexport). Also, using syscall/js is also susceptible to the same problem (Go1.13 to Go1.14 have breaking changes on syscall/js).

This PR is working on the new Go1.16Beta1, without any change.

@eliasnaur
Copy link
Contributor

eliasnaur commented Dec 19, 2020

I'm not sure if that is worth it. I don't think it solves any issue:

1. It doesn't remove the need of `go.mem` (which is a [DataView](https://developer.mozilla.org/pt-BR/docs/Web/JavaScript/Reference/Global_Objects/DataView)).

Sure, avoiding syscall/js means tinkering with internals. The goal is to minimize the tinkering.

But, additionaly:

1. It will call the slow function of `js.Global().Call("createRef", v).Int()`, and use the slow `syscall/js`.

Yes, but only on creation (and deletion) of objects, which is already expensive. The GPU backend takes care to re-use GPU objects.

2. It will need to delete the reference twice.

Again, only during deletion which is a rare occurrence.

I use something similar here: 0289d07#diff-bf9a8e77af361576a7335f1f62097cd39db03ab401616e81df00a3a9be5aa46dR168-R170

The global.GlimpContext get's the js.Value and create another reference. However, it's only called on the NewContext, which is infrequently called.

Indeed. I'm proposing to expand that facility to all GPU objects, and to keep a struct of both the js.Value and fast integer reference.

I'm not sure if I agree with the idea of "just replace slow functions". There's a lot of calls to WebGL, every single frame. For me, all functions must be as fast as possible.

It's true that a lot of calls are per-frame and need assembly implementation. However, with the dual js.Value/ref approach we're not bound to all-or-nothing. As a side bonus, I don't think any performance critical functions use string parameters, so we can omit string handling entirely.

Also, let's consider that we have six slow functions, such as BufferData, CheckBufferframe, ReadPixel, EnableVertexAttribArray, VertexAttribPointer and Viewport. In the end, we will have 6 JS functions, 6 assemblies (and some additional overhead due to the another js.Global().Call) and worse performance than the current PR... Just replacing the BufferData saves ~1.5ms (of ~8ms), but changing everything saves ~4.5ms, so why not?

I argue that the 4.5ms gain can be achieved with fewer functions replaced.

I'm not sure why you are so worried about dropping the syscall/js, the syscall/js also say:

This package is EXPERIMENTAL. Its current scope is only to allow tests to run, but not yet to provide a comprehensive API for users. It is exempt from the Go compatibility promise.

The entire WASM port is experimental :) However, can we agree that syscall/js is safer than any other approach?

The current PR has only five functions. If it has some bug will be easier to spot than the older one, since all WebGL functions are using the same call function. It also passes all tests, including tests on real-browsers (on WebGL 1 and WebGL 2). I think it even possible to revert this commit if it breaks due to Golang changes (changes on the wasm_exec or the new //go:wasmexport). Also, using syscall/js is also susceptible to the same problem (Go1.13 to Go1.14 have breaking changes on syscall/js).

This PR is working on the new Go1.16Beta1, without any change.

Ok. I admit that your latest revision does look a fair bit less scary than previous revisions, and I think I can be persuaded to replace all of syscall/js. However, there are two issues I'd like to see fixed:

  • The uintptr(unsafe.Pointer()) conversions are both unsafe and creates garbage.
  • The JNI-like approach is slower than straight up assembly/JS implementations, yet more complex.

I believe both issues can be solved by go generate'ing the assembly implementations. By generating you gain the maximal performance, don't need uintptr parameter types, and only need to ensure the generator is correct, not all the hand-written assembly stubs. I assume the generator will be about as complex as your JNI-like approach, except that the overhead is moved to generate time, not runtime.

@inkeliz
Copy link
Contributor Author

inkeliz commented Dec 19, 2020

The uintptr(unsafe.Pointer()) conversions are both unsafe and creates garbage.

I thought about that, but the current syscall/js seems to create the same garbage or more. Also, using the func somefunction(arg int) will not copy the arg int too? Just asking, but removing the uintptr(unsafe.Pointer()) is better anyway.

The JNI-like approach is slower than straight up assembly/JS implementations, yet more complex.

I'll try to build some proper generator.

@eliasnaur
Copy link
Contributor

The uintptr(unsafe.Pointer()) conversions are both unsafe and creates garbage.

I thought about that, but the current syscall/js seems to create the same garbage or more. Also, using the func somefunction(arg int) will not copy the arg int too? Just asking, but removing the uintptr(unsafe.Pointer()) is better anyway.

No, non-pointer parameters don't create garbage. uintptr is a value-type and doesn't create garbage either; taking the address (&value) is what forces heap allocation and garbage.

The JNI-like approach is slower than straight up assembly/JS implementations, yet more complex.

I'll try to build some proper generator.

@eliasnaur
Copy link
Contributor

While restructuring the GL backend for the new renderer, I discovered that the WebGL backend has been passing much more data than intended (the entire cached byte buffer, not just a slice of it). See https://gioui.org/commit/e012c3f393792 for the fix. It should speed up buffer(Sub)Data and friends.

@inkeliz
Copy link
Contributor Author

inkeliz commented Sep 27, 2021

I'm parking the code here. The generator is very poorly written, I tried to use the go/package and go/ast but it seem every more difficult, so I basically copy my old "generator" (which is bunch of regex).


There's some changes, that I think not belong to here. I add some cache to GetInteger, GetFloat (...), since most of the values are constant and it's called every frame. It saves ~1.5ms, even using syscall/js. Maybe the gpu package could cache that values, instead of requesting them every frame.

To use the asm version, must use -tags unsafe, so both methods exists.


I don't think it will be accepted in Gio core, but I'm using that patch. :\

@inkeliz inkeliz force-pushed the inkeliz-syscall-mini branch 2 times, most recently from 99bd349 to 126c88b Compare September 27, 2021 02:47
inkeliz added a commit to inkeliz/gio that referenced this pull request Sep 27, 2021
In the beginning of each frame, multiple calls to
`GetInteger`/`GetFloat` is performed, taking around ~2ms out of ~8ms.

Now, those data are cache at first usage, avoiding calls to JS. It saves
around ~2ms out of ~8ms.

Due to `syscall/js` usage of `TextEncoder`, which is slow on Chrome, a
new cache was created, to bind function and use `Invoke` instead. It
saves around ~1ms out of ~8ms.

In total, this patch improves the performance from ~8ms to ~5ms. It
still slower than gioui#4.

Signed-off-by: Inkeliz <inkeliz@inkeliz.com>
@inkeliz inkeliz force-pushed the inkeliz-syscall-mini branch 2 times, most recently from 8bf1cd4 to eb637b9 Compare October 2, 2021 23:01
Signed-off-by: Inkeliz <inkeliz@inkeliz.com>
@inkeliz inkeliz mentioned this pull request Feb 26, 2022
theclapp added a commit to theclapp/gio that referenced this pull request Jan 16, 2023
- This is the 1st commit message:

ScrollTo, PagePrev/Next, etc

This is a combination of 5 commits.

- layout: add List.ScrollTo, ScrollPages, and 2 more

Also add PagePrev & PageNext.

Include tests for all.

- app,app/internal/window,io/system: macOS menus

A strawman interface to the macOS menuing system.

- Make menus compile on other architectures

- app/internal/window: Add modifiers to macOS menu event

- Update to latest Gio main

- This is the commit message gioui#2:

Delete app/internal/wm/window.go

No longer used in main branch.

- This is the commit message gioui#3:

Move menu changes into their own files.

- This is the commit message gioui#4:

io/key,widget: Make editor hotkey filter more specific

Only look for up/down/pageup/pagedown if the editor is multiline.

Only look for enter/return if the editor is Submit == true.

Fix modifier for delete-backward & forward.

Implement some key.Set builder functions and use them in the above code.

Fixes: https://todo.sr.ht/~eliasnaur/gio/399

- This is the commit message gioui#5:

widget: Tweak editor hotkey event listener

For delete-forward and backward, only listen for ShortAlt.

Only listen for Short-[C,X] (copy selection and cut selection) when
there's a selection.

- This is the commit message gioui#6:

Add system.MenuEvent to Window.processEvent

- This is the commit message gioui#7:

key: rename BuildKeySet and BuildKeyGroup

to BuildKeyset and BuildKeygroup.

- This is the commit message gioui#8:

Sleep 2ms after onClose.

Signed-off-by: Larry Clapp <larry@theclapp.org>
@inkeliz inkeliz closed this Feb 24, 2023
@inkeliz inkeliz deleted the inkeliz-syscall-mini branch February 24, 2023 14:37
@inkeliz inkeliz restored the inkeliz-syscall-mini branch February 24, 2023 20:50
theclapp added a commit to theclapp/gio that referenced this pull request Jan 12, 2024
- This is the 1st commit message:

ScrollTo, PagePrev/Next, etc

This is a combination of 5 commits.

- layout: add List.ScrollTo, ScrollPages, and 2 more

Also add PagePrev & PageNext.

Include tests for all.

- app,app/internal/window,io/system: macOS menus

A strawman interface to the macOS menuing system.

- Make menus compile on other architectures

- app/internal/window: Add modifiers to macOS menu event

- Update to latest Gio main

- This is the commit message gioui#2:

Delete app/internal/wm/window.go

No longer used in main branch.

- This is the commit message gioui#3:

Move menu changes into their own files.

- This is the commit message gioui#4:

io/key,widget: Make editor hotkey filter more specific

Only look for up/down/pageup/pagedown if the editor is multiline.

Only look for enter/return if the editor is Submit == true.

Fix modifier for delete-backward & forward.

Implement some key.Set builder functions and use them in the above code.

Fixes: https://todo.sr.ht/~eliasnaur/gio/399

- This is the commit message gioui#5:

widget: Tweak editor hotkey event listener

For delete-forward and backward, only listen for ShortAlt.

Only listen for Short-[C,X] (copy selection and cut selection) when
there's a selection.

- This is the commit message gioui#6:

Add system.MenuEvent to Window.processEvent

- This is the commit message gioui#7:

key: rename BuildKeySet and BuildKeyGroup

to BuildKeyset and BuildKeygroup.

- This is the commit message gioui#8:

Sleep 2ms after onClose.

Signed-off-by: Larry Clapp <larry@theclapp.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants