loadPixels and PGraphics dependency batching in WebGPU #1320

@tychedelia

The Processing API allows arbitrary GPU->CPU readback on demand via the loadPixels method. This is generally an extension of Processing's immediate mode philosophy, but it creates significant friction with modern graphics architectures. In contrast with OpenGL, which goes to significant lengths to preserve the illusion that GPU data is easily accessible, WebGPU is designed to foreground that GPU operations are asynchronous relative to the CPU timeline. While the API does support blocking on these operations, the expectation of most frameworks, including Bevy, is that you won't.

Concretely, this matters because modern graphics libraries really want to batch rendering together for efficiency. For example, in Bevy, all cameras (which could be considered an analog to a PGraphics instance) want to render at the same time every frame. loadPixels introduces an architectural complication: the current draw state may need to be flushed and made visible to any other graphics context at an arbitrary time.

In other words, beyond any performance concerns, this highlights a potential dependency problem between PGraphics instances:

  • When multiple PGraphics instances don't depend on each other, batching works fine and we can delay flushing all their draw state until the end of the frame.
  • When a PGraphics is used in another PGraphics, e.g. an off-screen texture sampled by something rendering to the screen, we can simply track a relative order to ensure the Bevy Camera for one runs before the other.
  • If the user calls loadPixels, we need to flush the draw state right now and make the texture visible to potentially any CPU code.
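
The second case (relative ordering) could be sketched roughly as below. This is only an illustration in plain Rust, not Bevy code: `GraphicsId`, `render_orders`, and the `deps` map are hypothetical names. The idea is that every PGraphics gets an order strictly greater than anything it samples from, and independent instances share order 0 so they can still be batched. The resulting value could presumably be written to the `order` field of the corresponding Bevy `Camera`.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical handle for a PGraphics instance (not a real Processing or Bevy type).
pub type GraphicsId = u32;

/// Assigns each graphics a render order so that it is ordered after every
/// graphics it samples from. `deps[g]` lists the graphics that `g` reads.
/// Returns `None` if the dependencies contain a cycle.
pub fn render_orders(
    ids: &[GraphicsId],
    deps: &HashMap<GraphicsId, Vec<GraphicsId>>,
) -> Option<HashMap<GraphicsId, isize>> {
    fn visit(
        id: GraphicsId,
        deps: &HashMap<GraphicsId, Vec<GraphicsId>>,
        in_progress: &mut HashSet<GraphicsId>,
        orders: &mut HashMap<GraphicsId, isize>,
    ) -> Option<isize> {
        if let Some(&order) = orders.get(&id) {
            return Some(order);
        }
        // Reaching a node that is still being visited means there is a cycle.
        if !in_progress.insert(id) {
            return None;
        }
        let mut order: isize = 0;
        for &dep in deps.get(&id).map(Vec::as_slice).unwrap_or(&[]) {
            // Render strictly after every dependency.
            order = order.max(visit(dep, deps, in_progress, orders)? + 1);
        }
        in_progress.remove(&id);
        orders.insert(id, order);
        Some(order)
    }

    let mut orders = HashMap::new();
    let mut in_progress = HashSet::new();
    for &id in ids {
        visit(id, deps, &mut in_progress, &mut orders)?;
    }
    Some(orders)
}
```

A cycle (two PGraphics sampling each other within one frame) has no valid ordering, which is why the sketch returns `None` there; in practice that case would probably have to fall back to an immediate flush.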

Approach for WebGPU with Bevy

We should start just by mirroring the immediate mode API, i.e. the drawStart, flush, drawEnd lifecycle. Set CameraOutputMode::Skip at the beginning of each frame for each surface so that it renders only to the intermediate texture, and set CameraOutputMode::Write only when calling drawEnd.
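
To make the lifecycle concrete, here is a rough model of it in plain Rust. Nothing here is Bevy API: `OutputMode` is a simplified stand-in for `CameraOutputMode` (the real `Write` variant carries extra fields), and `Surface` with its fields and methods is a hypothetical placeholder for whatever per-surface state the implementation ends up with.

```rust
/// Simplified stand-in for Bevy's `CameraOutputMode`; the real `Write`
/// variant carries extra fields, which are omitted here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputMode {
    Skip,
    Write,
}

/// Hypothetical per-surface state; none of these names come from Bevy.
#[derive(Debug)]
pub struct Surface {
    pub output_mode: OutputMode,
    /// Draw commands recorded since the last flush.
    pub pending: Vec<String>,
    /// Commands already rendered into the intermediate texture.
    pub flushed: Vec<String>,
    /// Whether the intermediate texture was written to the final target this frame.
    pub presented: bool,
}

impl Surface {
    pub fn new() -> Self {
        Surface {
            output_mode: OutputMode::Skip,
            pending: Vec::new(),
            flushed: Vec::new(),
            presented: false,
        }
    }

    /// Start of frame: render only to the intermediate texture.
    pub fn draw_start(&mut self) {
        self.output_mode = OutputMode::Skip;
        self.pending.clear();
        self.flushed.clear();
        self.presented = false;
    }

    /// Records a draw command without rendering it yet.
    pub fn draw(&mut self, command: &str) {
        self.pending.push(command.to_string());
    }

    /// Renders everything recorded so far. In `Skip` mode only the
    /// intermediate texture is updated; nothing reaches the final target.
    pub fn flush(&mut self) {
        self.flushed.append(&mut self.pending);
        if self.output_mode == OutputMode::Write {
            self.presented = true;
        }
    }

    /// End of frame: switch to `Write` so the final flush reaches the target.
    pub fn draw_end(&mut self) {
        self.output_mode = OutputMode::Write;
        self.flush();
    }
}
```

In this model any number of mid-frame flushes can happen, but only the flush triggered by `draw_end` reaches the final target.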

We can handle loadPixels in this model just by doing a synchronous readback after a flush. It's fine, and will superficially look like OpenGL's behavior from the user's perspective.
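
One practical detail of the readback itself: WebGPU requires the row pitch of a texture-to-buffer copy to be a multiple of 256 bytes, so the mapped buffer generally has padding at the end of each row that has to be stripped before the data can be handed back as a tightly packed pixels array. A minimal sketch of that step, assuming a 4-byte-per-pixel format such as RGBA8 (the function names are made up for illustration and the actual copy/map calls are left out):

```rust
/// Row alignment WebGPU requires for texture-to-buffer copies, in bytes.
pub const ROW_ALIGNMENT: usize = 256;
/// Assumption: a 4-byte-per-pixel format such as RGBA8.
pub const BYTES_PER_PIXEL: usize = 4;

/// Bytes per row of the readback buffer, rounded up to the required alignment.
pub fn padded_bytes_per_row(width: usize) -> usize {
    let unpadded = width * BYTES_PER_PIXEL;
    (unpadded + ROW_ALIGNMENT - 1) / ROW_ALIGNMENT * ROW_ALIGNMENT
}

/// Copies the pixel rows out of a mapped readback buffer, dropping the
/// per-row padding. Panics if `mapped` holds fewer bytes than
/// `height` rows of pixel data.
pub fn strip_row_padding(mapped: &[u8], width: usize, height: usize) -> Vec<u8> {
    if width == 0 || height == 0 {
        return Vec::new();
    }
    let unpadded = width * BYTES_PER_PIXEL;
    let padded = padded_bytes_per_row(width);
    let mut pixels = Vec::with_capacity(unpadded * height);
    for row in mapped.chunks(padded).take(height) {
        // Keep only the real pixel bytes at the start of each padded row.
        pixels.extend_from_slice(&row[..unpadded]);
    }
    pixels
}
```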

Because this immediate mode approach likely has some undesirable overhead (although tbd, it may be pretty minimal), we can do our own dependency/dirty tracking if necessary down the line. What this would look like is keeping an in-flight dependency graph of PGraphics and how they're being used, and only triggering flushes when they actually need to be made visible to other graphics contexts. This isn't hard to do but will make the implementation more confusing.
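
As a rough idea of what that tracking might look like, here is a sketch in plain Rust. `FrameGraph` and its methods are hypothetical names, not an existing API. It records which PGraphics have unflushed draws and which ones sample from which, and when one of them has to become visible (e.g. on loadPixels) it returns only the dirty instances that the target transitively depends on, dependencies first. It also makes a simplifying assumption: a source is flushed in whatever state it's in at flush time, so draws made to a source after it was sampled would leak into the reader. Strict immediate mode semantics would need extra handling for that.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical handle for a PGraphics instance.
pub type GraphicsId = u32;

/// Hypothetical in-flight dependency graph for the current frame.
#[derive(Default)]
pub struct FrameGraph {
    /// `reads[g]` holds the graphics that `g` sampled from since its last flush.
    reads: HashMap<GraphicsId, HashSet<GraphicsId>>,
    /// Graphics with draw calls recorded but not yet flushed.
    dirty: HashSet<GraphicsId>,
}

impl FrameGraph {
    /// Records a plain draw call into `target`.
    pub fn record_draw(&mut self, target: GraphicsId) {
        self.dirty.insert(target);
    }

    /// Records that `target` drew something that samples from `source`.
    pub fn record_use(&mut self, target: GraphicsId, source: GraphicsId) {
        self.reads.entry(target).or_default().insert(source);
        self.dirty.insert(target);
    }

    /// Called when `target` must become visible. Returns the graphics that
    /// need flushing, dependencies first, and marks them clean.
    pub fn flush_for(&mut self, target: GraphicsId) -> Vec<GraphicsId> {
        let mut order = Vec::new();
        let mut seen = HashSet::new();
        self.collect(target, &mut seen, &mut order);
        // Skip anything that has no unflushed work.
        order.retain(|id| self.dirty.remove(id));
        // Flushed graphics no longer depend on their sources.
        for id in &order {
            self.reads.remove(id);
        }
        order
    }

    /// Depth-first walk that pushes dependencies before their dependents.
    fn collect(&self, id: GraphicsId, seen: &mut HashSet<GraphicsId>, order: &mut Vec<GraphicsId>) {
        if !seen.insert(id) {
            return;
        }
        if let Some(sources) = self.reads.get(&id) {
            let mut sources: Vec<GraphicsId> = sources.iter().copied().collect();
            sources.sort_unstable(); // keep the flush order deterministic
            for source in sources {
                self.collect(source, seen, order);
            }
        }
        order.push(id);
    }
}
```

Unrelated PGraphics stay dirty and untouched until something actually needs them, which is where the saving over flushing everything would come from.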
