RFC: Make sure that a minimal-garbage hot path is supported #214
Comments
I also mentioned callbacks a while ago in #52. I agree that it's really important to get this right for the use cases WebGPU is targeting.
With setSubData removed, it seems like the path for buffer uploads is one that creates Promises.
One historical data point on the WebGL API from Emscripten/Wasm, from when I was optimizing support for Unreal Engine 3, 4 and Unity3D: in WebGL 1 the API entry points were trashy, and while we did a lot of temp object pooling for That led us to propose that WebGL 2 should adopt new function entry points that do not need to employ pooling (which adds CPU overhead and increases distributed code size). When that finally became available, the result in Unreal Engine 4 was benchmarked to be roughly a 6-7% reduction in JS CPU overhead, not to mention a good guarantee against GC stuttering (which is generally pretty hard to quantify solidly). One gets this benefit just by switching from WebGL 1 to WebGL 2, so even if an app only needs WebGL 1 features, it should definitely build against WebGL 2 to get a noticeable amount of free JS perf. WebGL 2 applications can generally run garbage-free (i.e. not even "some garbage in cold paths"); what garbage there is relates to debugging, validation and error checking. More generally, Emscripten-compiled games and game engines like UE4 and Unity3D have been carefully optimized to be as close to 100% garbage-free in their operation as possible. E.g. UE4 does not generate any JS garbage when running, except for the occasional items that are required by web APIs (DOM input events, web audio clip playback). One of the remaining really bad Wasm interop items that could not be fixed in WebGL is its use of opaque It would be beneficial for WebGPU to also behave garbage-free, and moreover, to be generally friendly to Wasm in that it avoids the need to create unnecessary mapping constructs. Some concerning aspects are the use of strings as enums, and unbounded arrays-of-objects as function input parameters. I have not yet had a chance to build test cases in practice to see how hot or cold these different functions will end up being, but I am looking to explore that in more detail.
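To illustrate the kind of mapping construct that string enums force on a Wasm binding layer, here is a minimal sketch, assuming the compiled side passes small integer codes. The string values are WebGPU primitive topology names; the integer codes and the `topologyFromCode` helper are hypothetical and not part of any API.

```typescript
// Table built once at startup, so the per-call lookup allocates nothing.
// The index of each entry is a hypothetical integer code chosen by the binding layer.
const PRIMITIVE_TOPOLOGY = [
  "point-list",
  "line-list",
  "line-strip",
  "triangle-list",
  "triangle-strip",
] as const;

type PrimitiveTopology = (typeof PRIMITIVE_TOPOLOGY)[number];

// Translates an integer code coming from Wasm into the string the JS API expects.
function topologyFromCode(code: number): PrimitiveTopology {
  const value = PRIMITIVE_TOPOLOGY[code];
  if (value === undefined) {
    throw new RangeError("unknown topology code: " + code);
  }
  return value;
}
```

The lookup itself is cheap, but every string-typed parameter needs a table like this, which is the extra mapping layer an integer-based interface would avoid.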
Hopefully WebGPU could adopt a fully garbage-free API? I'd even go as far as defining an ABI/serialized interface for the property objects, where one could write the function parameters as a struct (or array-of-structs) to an ArrayBuffer in the wasm heap, and then call a function by passing an offset into that ArrayBuffer.
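A rough sketch of that idea, under stated assumptions: the 16-byte command layout, the field names, and both functions below are hypothetical, and a plain `ArrayBuffer` stands in for the wasm heap.

```typescript
// Hypothetical struct layout for a "copy" command (4-byte aligned, 16 bytes):
//   +0  u32 srcBufferId
//   +4  u32 dstBufferId
//   +8  u32 byteOffset
//   +12 u32 byteLength
const heap = new ArrayBuffer(1024); // stand-in for the wasm memory buffer
const heapU32 = new Uint32Array(heap);

// Preallocated record, reused on every call so decoding creates no garbage.
const decoded = { src: 0, dst: 0, byteOffset: 0, byteLength: 0 };

// Producer side: what compiled code would do with plain memory stores.
function writeCopyCommand(
  ptr: number, src: number, dst: number, byteOffset: number, byteLength: number,
): void {
  const i = ptr >>> 2; // byte offset -> u32 index (ptr assumed 4-byte aligned)
  heapU32[i] = src;
  heapU32[i + 1] = dst;
  heapU32[i + 2] = byteOffset;
  heapU32[i + 3] = byteLength;
}

// Consumer side: a hypothetical entry point that receives only a heap offset.
function copyBufferFromHeap(ptr: number): typeof decoded {
  const i = ptr >>> 2;
  decoded.src = heapU32[i];
  decoded.dst = heapU32[i + 1];
  decoded.byteOffset = heapU32[i + 2];
  decoded.byteLength = heapU32[i + 3];
  return decoded;
}
```

The call crosses the boundary with a single number, so no descriptor object has to be built on the JS side for each call.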
Does the case of direct native-to-WebAssembly usage (without jumping through the JS hoop) fit somewhere in this discussion? With WebGPU seeing a growing number of users coming from both C/C++ and Rust, I think this will be more relevant than ever. We can count on a non-trivial number of users benefiting from such a path in the future.
Yeah. For my WebGL 2-based engine, the only garbage that's 100% required in the frame loop comes from its use of WebGLSync objects, which can't be recycled, but must be destroyed and recreated. The reason OpenGL does not allow reusing sync handles is that they're not actually handles; in many places they are instead an opaque serial counter that can be easily checked and compared directly, i.e. allocation is "cheaper than free". WebGLSync being an opaque object makes it impossible to be garbage-free while doing efficient GPU readback. This just shows that even with simple principles and careful attention, it's easy to accidentally force garbage into the hot path.
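The serial-counter idea can be sketched as follows. This is only an illustration: `SerialTimeline` and its methods are hypothetical names, and GPU progress is simulated by calling `advanceTo` by hand.

```typescript
// Sync points are plain numbers, so creating one allocates no object.
class SerialTimeline {
  private submitted = 0; // last serial handed out
  private completed = 0; // last serial the (simulated) GPU has finished

  // Marks a new sync point and returns its serial.
  signal(): number {
    this.submitted += 1;
    return this.submitted;
  }

  // Simulates the implementation reporting progress up to `serial`.
  advanceTo(serial: number): void {
    if (serial > this.submitted) {
      throw new RangeError("serial " + serial + " was never signaled");
    }
    if (serial > this.completed) {
      this.completed = serial;
    }
  }

  // Checking a sync point is a single integer comparison.
  isComplete(serial: number): boolean {
    return serial <= this.completed;
  }
}
```

Because a serial is just a number, there is nothing to destroy or collect once the sync point has passed.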
Tentatively closing.
I don't feel like writeBuffer is sufficient here, since it lacks the control that some hot paths will want to retain.
I also find Promises unsatisfying for other reasons related to event loop timing, but that's sort of a tangent from here. Any functionality that relies on Promises is a footgun, IMO.
Also, as a general update: in the two years since I filed this issue, I've noticed GC getting a lot better. GC pauses are still an issue, but they seem to be much, much less frequent, and I'm less afraid of allocating a few small objects (for...of is still a disaster, but that's for another day). So I'm not afraid of Promises for their GC costs anymore, but I am afraid of them for the problems with timing. Also, things have strayed a bit far from "zero-GC" in the design of the system: every frame already mandates that we create a few small objects, getCurrentTexture() / createView() / createCommandEncoder() at the bare minimum. So I'm not as strict on zero-GC as I once was.
Thought I tested this, but error.stack does not contain the error message on Safari and Firefox. So include it explicitly when printing.
High-performance WebGPU applications like the ones I am creating should be able to generate minimal or zero garbage per frame. Unfortunately, the existing buffer mapping proposals return Promises, which are objects that have to be generated and collected, causing GC pauses. (Promises also have the issue that they are tied into the microtask loop, which means that you are waiting at least until the next frame for them to resolve, even if the buffer is immediately ready, introducing latency.)
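The deferral can be observed without WebGPU at all. In this small sketch, a continuation attached to an already-resolved Promise still does not run until the current synchronous code has finished. How much later the continuation runs for a real buffer mapping depends on the implementation and is not shown here.

```typescript
const order: string[] = [];

// The Promise is already resolved when the continuation is attached.
const alreadyResolved: Promise<void> = Promise.resolve();

alreadyResolved.then(() => {
  // Runs as a microtask, after the synchronous code below has completed.
  order.push("continuation");
});

// At this point the continuation has not run yet.
order.push("sync");
```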
This means that using Promises for robust buffer management is going to be an issue, unless the Promises can be reused. In my experience, having to choose between a buffer copy and GC pauses, I would choose the extra copy (memcpy is one of the fastest things a computer can do, GC pauses are a lot more expensive), but the best strategy for preventing garbage is object reuse and up-front allocation. An API where you create an object and keep it around, similar to a Promise, might be an option. To bikeshed a bit, something like this:
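As a purely illustrative sketch of a create-once, reuse-every-frame completion object (not the snippet from the original post, and not part of any WebGPU proposal), the names `ReusableCompletion`, `arm`, and `fire` below are hypothetical, and completion is simulated by calling `fire` directly:

```typescript
// The user-data value is handed back verbatim, so the callback can be a shared
// top-level function instead of a closure allocated for every operation.
type CompletionCallback<T> = (userData: T) => void;

class ReusableCompletion<T> {
  private callback: CompletionCallback<T> | null = null;
  private userData: T | null = null;
  private pending = false;

  // Re-arms the same object for another operation; nothing is allocated here.
  arm(callback: CompletionCallback<T>, userData: T): void {
    if (this.pending) throw new Error("completion is still pending");
    this.callback = callback;
    this.userData = userData;
    this.pending = true;
  }

  // Called by the (simulated) implementation when the operation finishes.
  fire(): void {
    if (!this.pending) return;
    this.pending = false; // cleared first so the callback may re-arm
    const cb = this.callback as CompletionCallback<T>;
    cb(this.userData as T);
  }

  get isPending(): boolean {
    return this.pending;
  }
}

// Usage: one completion object and one state record serve every frame.
interface UploadSlot { id: number; uploads: number; }

function onUploadDone(slot: UploadSlot): void {
  slot.uploads += 1;
}

const slot: UploadSlot = { id: 0, uploads: 0 };
const completion = new ReusableCompletion<UploadSlot>();
for (let frame = 0; frame < 3; frame++) {
  completion.arm(onUploadDone, slot);
  completion.fire(); // stands in for the upload completing
}
```

All allocation happens before the loop; inside it, only existing objects are mutated.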
The bizarre "user data" pointer is to prevent closure allocations. Also, for those who might reply that setBufferData is only for low-performance code, the same concerns should apply to the mapWriteAsync/unmap cycle as well, if I am reading the usage correctly.
Thoughts?