RFC: Make sure that a minimal-garbage hot path is supported #214

Closed
magcius opened this issue Feb 19, 2019 · 10 comments

@magcius

magcius commented Feb 19, 2019

For high-performance WebGPU applications like the ones I am creating, the API should make it possible to generate minimal or zero garbage per frame. Unfortunately, the existing buffer mapping proposals return Promises, which are objects that have to be allocated and collected, causing GC pauses. (Promises also have the issue that they are tied into the microtask loop, which means that you are waiting at least until the next frame for them to resolve, even if the buffer is immediately ready, introducing latency.)

This means that using Promises for robust buffer management is going to be an issue, unless the Promises can be reused. In my experience, having to choose between a buffer copy and GC pauses, I would choose the extra copy (memcpy is one of the fastest things a computer can do, GC pauses are a lot more expensive), but the best strategy for preventing garbage is object reuse and up-front allocation. An API where you create an object and keep it around, similar to a Promise, might be an option. To bikeshed a bit, something like this:

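// Engine-side object that knows how to write its uniforms into a buffer.
const myData = new MyEnginesUniformBufferDataWrapper();
// Proposed (not existing) reusable token: create once, keep around.
const bufferFillToken = new GPU.BufferFillToken();
bufferFillToken.onready = function(buffer, userData) {
    userData.fillInto(buffer);
};
// myData is handed back to onready as userData.
cmd.setBufferData(bufferFillToken, myData);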

The bizarre "user data" pointer is to prevent closure allocations. Also, for those who might reply that setBufferData is only for low-performance code, the same concerns should apply to the mapWriteAsync/unmap cycle as well, if I am reading the usage correctly.
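A minimal sketch of the reuse pattern described above, runnable without a GPU. `BufferFillToken`, `FakeCommandEncoder` and `UniformData` are hypothetical stand-ins invented for illustration, not real WebGPU interfaces, and the handler fires synchronously here only to keep the example self-contained. The point it tries to show is that the token, the handler function and the staging array are all created once, so the per-frame loop allocates nothing.

```javascript
// Hypothetical stand-in for the proposed GPU.BufferFillToken (not a real API).
class BufferFillToken {
  constructor() {
    this.onready = null;  // handler assigned once, up front
    this.userData = null; // payload handed back to the handler
  }
}

// Hypothetical stand-in for the command encoder. The staging array is
// allocated once in the constructor and reused for every call.
class FakeCommandEncoder {
  constructor(floatCount) {
    this.staging = new Float32Array(floatCount);
  }
  setBufferData(token, userData) {
    token.userData = userData;
    // A real implementation would call this later, when the buffer is ready.
    token.onready(this.staging, token.userData);
  }
}

// Hypothetical engine-side wrapper; fillInto copies without allocating.
class UniformData {
  constructor() {
    this.values = new Float32Array(4);
  }
  fillInto(dst) {
    dst.set(this.values);
  }
}

// One shared handler: no closure is created per frame because the
// per-call state arrives through the userData argument.
function onBufferReady(buffer, userData) {
  userData.fillInto(buffer);
}

const cmd = new FakeCommandEncoder(4);
const token = new BufferFillToken();
token.onready = onBufferReady;
const myData = new UniformData();

for (let frame = 0; frame < 3; frame++) {
  myData.values[0] = frame;         // mutate in place
  cmd.setBufferData(token, myData); // reuses token, handler and staging
}
```

Passing `myData` as an argument instead of capturing it in an arrow function is what the "user data" pointer buys: the same function object can serve every buffer.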

Thoughts?

@grovesNL
Contributor

I also mentioned callbacks a while ago in #52. I agree that it's really important to get this right for the use cases WebGPU is targeting.

@magcius
Author

magcius commented May 21, 2019

With setSubData removed, it seems that the remaining path for buffer uploads is one that creates Promises.

@juj

juj commented May 4, 2020

One historical data point from the WebGL API, from the Emscripten/Wasm work I did when optimizing support for Unreal Engine 3 and 4 and Unity3D: in WebGL 1 the API entry points were trashy (they generated garbage), and while we did a lot of temp object pooling for glUniform*fv, bufferData and related APIs (you can see the techniques in https://github.com/emscripten-core/emscripten/blob/master/src/library_webgl.js ), profiles still showed the pooling approach to be awkward and slow.
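A rough sketch of the kind of temp object pooling mentioned above, assuming the usual constraint that a WebGL 1 `uniform*fv` call needs a Float32Array of exactly the right length. The names (`tempFloatViews`, `pooledFloats`, `MAX_POOLED_LENGTH`) are made up for illustration and are not taken from Emscripten's source; `HEAPF32` stands in for the Wasm heap.

```javascript
// Stand-in for the Wasm heap viewed as 32-bit floats.
const HEAPF32 = new Float32Array(1024);

// One preallocated array per small length, created once at startup.
const MAX_POOLED_LENGTH = 16;
const tempFloatViews = [];
for (let n = 0; n <= MAX_POOLED_LENGTH; n++) {
  tempFloatViews.push(new Float32Array(n));
}

// Copies `count` floats out of the heap into the pooled array of exactly
// that length and returns it. Small counts allocate nothing.
function pooledFloats(floatIndex, count) {
  if (count > MAX_POOLED_LENGTH) {
    // Fallback: subarray() creates a new view object, i.e. garbage.
    return HEAPF32.subarray(floatIndex, floatIndex + count);
  }
  const view = tempFloatViews[count];
  for (let i = 0; i < count; i++) {
    view[i] = HEAPF32[floatIndex + i];
  }
  return view;
}

// Setup: pretend the Wasm side wrote a vec4 at float index 8.
HEAPF32[8] = 1; HEAPF32[9] = 2; HEAPF32[10] = 3; HEAPF32[11] = 4;

const first = pooledFloats(8, 4);  // would be passed to gl.uniform4fv
const second = pooledFloats(8, 4); // same object as `first`, reused
```

The copy loop and the length check are the per-call CPU overhead that pooling adds. The WebGL 2 overloads of these functions accept a source offset and length, so a single view of the whole heap can be passed directly instead.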

That led us to propose that WebGL 2 adopt new function entry points that do not need pooling (which adds CPU overhead and increases distributed code size). When those finally became available, the result in Unreal Engine 4 was benchmarked at roughly a 6-7% reduction in JS CPU overhead, not to mention a good guarantee against GC stuttering (which is generally pretty hard to quantify solidly). One gets this benefit just by switching from WebGL 1 to WebGL 2, so even if an app only needs WebGL 1 features, it should definitely build against WebGL 2 to get a noticeable amount of free JS perf.

WebGL 2 applications can generally run garbage-free (i.e. not even "some garbage in cold paths"); what garbage there is relates to debugging, validation and error checking. More generally speaking, Emscripten-compiled games and game engines like UE4 and Unity3D have been carefully optimized to be as close to 100% garbage-free in their operation as possible. E.g. UE4 does not generate any JS garbage when running, except for the occasional items that are required by web APIs (DOM input events, web audio clip playback).

One of the remaining really bad Wasm interop items that could not be fixed in WebGL is its use of opaque WebGLUniformLocation interface objects, which led to the need to maintain large mapping tables that are also painfully slow to build up.
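To illustrate why opaque objects force a mapping table: Wasm code can only hold numbers, so the JS glue has to hand out an integer for each object and translate back on every call. A minimal sketch, with hypothetical names and with frozen empty objects standing in for what `gl.getUniformLocation()` would return:

```javascript
// Integer handle -> opaque location object. Index 0 is reserved here so
// that 0 can mean "no location" (a choice made for this sketch).
const uniformLocationTable = [null];

// Called once per uniform at program link time; returns the handle
// that the Wasm side stores instead of the object.
function registerUniformLocation(location) {
  uniformLocationTable.push(location);
  return uniformLocationTable.length - 1;
}

// Called on every glUniform* call coming from Wasm.
function lookupUniformLocation(handle) {
  return uniformLocationTable[handle];
}

// Stand-ins for WebGLUniformLocation objects.
const fakeMvpLocation = Object.freeze({});
const fakeColorLocation = Object.freeze({});

const mvpHandle = registerUniformLocation(fakeMvpLocation);
const colorHandle = registerUniformLocation(fakeColorLocation);
```

Every program needs its own set of entries, and each entry costs a `getUniformLocation` call to create, which is where the build-up cost comes from.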

It would be beneficial for WebGPU to also behave garbage free, and moreover, be generally friendly to Wasm in that it avoids the need to create unnecessary mapping constructs.

Some example aspects that are concerning are the use of strings as enums, and unbounded-size arrays-of-objects as function input parameters. I have not yet had a chance to build test cases in practice to see how hot/cold these different functions will end up being, but I am looking to explore that in more detail.
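As an illustration of the string-enum concern: code compiled to Wasm would typically pass an integer, and the JS glue then has to turn it into the string the API expects. One way to do that without building strings per call is a fixed lookup table. In this sketch the integer codes and function name are hypothetical; the strings are texture format names from the WebGPU spec.

```javascript
// Hypothetical integer codes chosen by the Wasm side, in table order.
const TEXTURE_FORMAT_BY_CODE = ['rgba8unorm', 'bgra8unorm', 'depth24plus'];

// Translates a numeric code into the string enum value. The strings
// already exist in the table, so a successful lookup creates no new
// string; only the error path does.
function textureFormatFromCode(code) {
  const format = TEXTURE_FORMAT_BY_CODE[code];
  if (format === undefined) {
    throw new RangeError('unknown texture format code: ' + code);
  }
  return format;
}

const swapChainFormat = textureFormatFromCode(1);
```

The table avoids garbage but not the indirection itself, which is the "unnecessary mapping construct" cost raised above.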

Hopefully WebGPU could adopt a fully garbage-free API style? I'd even go as far as to define an ABI/serialized interface for the property objects where one could write the function parameters as a struct (or arrays-of-structs) to an ArrayBuffer in the wasm heap, and then call a function by passing an index to that ArrayBuffer.
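A sketch of what such a serialized interface could look like from the JS side. The struct layout, `drawFromHeap` and `fakePass` are all assumptions made for illustration; no such ABI exists in WebGPU. The four fields mirror the parameters of the real `draw(vertexCount, instanceCount, firstVertex, firstInstance)` call.

```javascript
// Stand-in for the Wasm linear memory.
const heap = new ArrayBuffer(4096);
const HEAPU32 = new Uint32Array(heap);

// Hypothetical 16-byte struct layout, all fields u32:
//   +0 vertexCount, +4 instanceCount, +8 firstVertex, +12 firstInstance
function drawFromHeap(pass, ptr) {
  const i = ptr >>> 2; // byte offset -> u32 index
  pass.draw(HEAPU32[i], HEAPU32[i + 1], HEAPU32[i + 2], HEAPU32[i + 3]);
}

// Fake render pass that records the last call into a preallocated array.
const lastDraw = new Uint32Array(4);
const fakePass = {
  draw(vertexCount, instanceCount, firstVertex, firstInstance) {
    lastDraw[0] = vertexCount;
    lastDraw[1] = instanceCount;
    lastDraw[2] = firstVertex;
    lastDraw[3] = firstInstance;
  },
};

// The Wasm side would write the struct; here it is done by hand.
const ptr = 256;
HEAPU32[(ptr >>> 2) + 0] = 36; // vertexCount
HEAPU32[(ptr >>> 2) + 1] = 2;  // instanceCount
HEAPU32[(ptr >>> 2) + 2] = 0;  // firstVertex
HEAPU32[(ptr >>> 2) + 3] = 0;  // firstInstance

drawFromHeap(fakePass, ptr);
```

Only a single integer crosses the Wasm/JS boundary, and no descriptor object is created on either side.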

@hugoam

hugoam commented May 10, 2020

Does the case of direct native-to-WebAssembly usage (without jumping through the JS hoop) fit somewhere in this discussion? With WebGPU seeing a growing number of users coming from both C/C++ and Rust, I think this will be more relevant than ever; we can count on a non-trivial number of users benefiting from such a path in the future.
I believe WebAssembly interface types will provide ways to make this possible one day?
This is of course orthogonal to the problem of JS garbage, if it ever happens, but it is integral to being able to use WebGPU in the most efficient way possible.

@magcius
Author

magcius commented May 10, 2020

> One of the remaining really bad Wasm interop items that could not be fixed in WebGL is its use of opaque WebGLUniformLocation interface objects

Yeah. For my WebGL 2-based engine, the only garbage that's 100% required in the frame loop is its use of WebGLSync objects, which can't be recycled, but must be destroyed and recreated. The rationale for not allowing sync handles to be reused in OpenGL was that they're not actually handles; in many implementations they are instead an opaque serial counter that can be easily checked and compared directly, i.e. allocation is "cheaper than free".

WebGLSync being an opaque object makes it impossible to be garbage-free while doing efficient GPU readback.
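For readers unfamiliar with the pattern, here is a sketch of fence-based readback polling in WebGL 2. The helper names `insertFence` and `pollFence` are made up; the `gl.*` calls and constants are from the WebGL 2 API. Each call to `fenceSync` returns a fresh WebGLSync object, and there is no way to re-arm an old one, which is the unavoidable per-frame garbage described above.

```javascript
// `gl` is expected to be a WebGL2RenderingContext.

// Insert a fence after the commands whose completion we care about
// (for example a readPixels into a pixel pack buffer).
function insertFence(gl) {
  // Allocates a new WebGLSync every time; it cannot be reused.
  return gl.fenceSync(gl.SYNC_GPU_COMMANDS_COMPLETE, 0);
}

// Non-blocking poll, intended to be called once per frame.
// Returns true once the fence has signaled, and deletes it.
function pollFence(gl, sync) {
  const status = gl.clientWaitSync(sync, 0, 0); // flags = 0, timeout = 0
  if (status === gl.WAIT_FAILED) {
    gl.deleteSync(sync);
    throw new Error('clientWaitSync failed');
  }
  if (status === gl.ALREADY_SIGNALED || status === gl.CONDITION_SATISFIED) {
    gl.deleteSync(sync); // the object is now dead and becomes garbage
    return true;
  }
  return false; // TIMEOUT_EXPIRED: not ready yet, try again next frame
}
```

A typical frame loop would call `insertFence` after issuing the readback and then call `pollFence` on later frames until it returns true, at which point the data can be read without stalling.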

Just showing that even with simple principles and careful attention, it's easy to accidentally force garbage into the hot path.

@kvark
Contributor

kvark commented May 13, 2020

@magcius thank you for filing! I wish this was turned into a meta-issue with all the aspects of the API listed where we currently require GC objects.
Note that writeBuffer landed in #749, which makes it possible to upload/update data without promises or callbacks.
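A minimal sketch of a promise-free upload using `GPUQueue.writeBuffer`. The function and variable names are hypothetical; `device` is assumed to be a GPUDevice and `uniformBuffer` a GPUBuffer created with `COPY_DST` usage. The scratch array is allocated once, so the call itself does not require any new JS objects from the application.

```javascript
// Allocated once at startup and reused every frame.
const uniformScratch = new Float32Array(16);

// Writes the current time into the scratch array and uploads the whole
// array to the start of `uniformBuffer`.
function uploadUniforms(device, uniformBuffer, timeSeconds) {
  uniformScratch[0] = timeSeconds;
  // writeBuffer(buffer, bufferOffset, data, dataOffset, size):
  // bufferOffset is in bytes; for a typed-array source, dataOffset and
  // size are counted in elements.
  device.queue.writeBuffer(uniformBuffer, 0, uniformScratch, 0, uniformScratch.length);
}
```

The trade-off is that the data is copied from the typed array into the destination buffer and the application has no say in how the staging is done, which is the extra copy discussed at the top of the thread.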

@Kangz
Contributor

Kangz commented Sep 2, 2021

Tentatively closing. writeBuffer landed a while ago, and we can have new issues for specific pieces of GC pressure we want to handle.

@Kangz Kangz closed this as completed Sep 2, 2021
@kdashg
Contributor

kdashg commented Sep 7, 2021

I don't feel like writeBuffer is sufficient here, since it lacks the control that some hot paths will want to retain.

@kdashg kdashg reopened this Sep 7, 2021
@magcius
Author

magcius commented Sep 7, 2021

I also find Promises unsatisfying for other reasons, related to event loop timing, but that's sort of a tangent here. Any functionality that relies on Promises is a footgun, IMO.
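A small, generic illustration of the timing behavior in question (plain JavaScript, nothing WebGPU-specific): a `then` callback never runs synchronously, even when the Promise is already resolved. It is queued as a microtask and runs only after the currently executing code has finished.

```javascript
const order = [];

// The promise is already resolved when .then() is called...
const alreadyResolved = Promise.resolve('ready');
alreadyResolved.then(() => {
  order.push('then callback');
});

// ...but this line still runs first.
order.push('synchronous code');

// At this point `order` contains only 'synchronous code'. The callback
// is appended later, once the current synchronous run has completed.
```

So an API that reports readiness through a Promise cannot hand the result back within the same synchronous stretch of code, however quickly the underlying work finishes.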

@magcius
Author

magcius commented Sep 7, 2021

Also, as a general update: in the two years since I filed this issue, I've noticed GC getting a lot better. GC pauses are still an issue, but they seem to be much, much less frequent, and I'm less afraid of allocating small numbers of small objects (for...of is still a disaster, but that's for another day), so I'm not afraid of Promises for the GC costs anymore, but I am afraid of them for the problems with timing.

Also, things have strayed a bit far from "zero-GC" in the design of the system: every frame already mandates we create a few small objects: getCurrentTexture() / createView() / createCommandEncoder() at the bare minimum. So I'm not as strict on zero-GC as I once was.
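To make the per-frame objects concrete, here is a sketch of a minimal frame function that reuses everything it can. `renderFrame` and the module-level variables are hypothetical names; `device` is assumed to be a GPUDevice and `context` a configured GPUCanvasContext. Method names follow the current spec (`pass.end()`); older drafts used different names. The comments mark the objects that the API hands back fresh each frame regardless.

```javascript
// Allocated once and mutated each frame.
const colorAttachment = {
  view: null, // replaced every frame
  clearValue: { r: 0, g: 0, b: 0, a: 1 },
  loadOp: 'clear',
  storeOp: 'store',
};
const renderPassDescriptor = { colorAttachments: [colorAttachment] };
const commandBuffers = [null]; // reused array for queue.submit()

function renderFrame(device, context) {
  // New GPUTextureView wrapper each frame.
  colorAttachment.view = context.getCurrentTexture().createView();
  // New GPUCommandEncoder each frame.
  const encoder = device.createCommandEncoder();
  // New GPURenderPassEncoder each frame.
  const pass = encoder.beginRenderPass(renderPassDescriptor);
  // ... draw calls would go here ...
  pass.end();
  // New GPUCommandBuffer each frame.
  commandBuffers[0] = encoder.finish();
  device.queue.submit(commandBuffers);
  commandBuffers[0] = null; // do not keep the command buffer alive
}
```

Reusing the descriptor and the submit array removes the application's own allocations, but the view, encoder, pass and command buffer remain, which is the floor described above.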

@kdashg kdashg added this to the post-V1 milestone Sep 28, 2021
@kdashg kdashg added this to Needs Investigation/Proposal or Revision in Main Sep 28, 2021
ben-clayton pushed a commit to ben-clayton/gpuweb that referenced this issue Sep 6, 2022
Thought I tested this, but error.stack does not contain the error
message on Safari and Firefox. So include it explicitly when printing.
@magcius magcius closed this as completed Dec 18, 2023

7 participants