GPU / CPU Transfers #45
In the meeting the discussion went a bit into organisational matters, so I'll put my question here. It may be a bit naive, but here it goes: with this API, how will we protect a buffer we're trying to download from write-after-read (IIUC) hazards caused by commands further along the queue that may write into it? Is there such a problem at all? For example:
My understanding is that under the hood of the GPUWeb something like this will be happening:
There's a way out of it: between compute passes, insert a copy command that copies the contents of the buffer into some staging area, which can then be safely read from. But that's +1 copy, which may be undesirable on UMA GPUs (on dGPUs, AFAIK, "staging" is needed anyway). UPD: I think I've got a possible answer :) (moral: don't ask questions at 2 a.m.). It seems that in at least 2 of the target APIs there's a way to make the device wait for an event or fence signalled from the CPU. In Vulkan it's
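To make the hazard and the staging-copy workaround concrete, here is a minimal simulation (plain arrays and a fake command queue, no real GPU API; all names are hypothetical). The device drains the whole queue before any readback completes, so reading the buffer directly observes the later write, while the snapshot copied into staging between the passes stays intact:

```typescript
// Hypothetical simulation of a GPU command queue over plain arrays.
type Command = () => void;

class FakeQueue {
  private commands: Command[] = [];
  submit(cmd: Command) { this.commands.push(cmd); }
  // The device drains the whole queue before any mapped readback completes.
  flush() { for (const c of this.commands) c(); this.commands = []; }
}

const queue = new FakeQueue();
const buffer = [0];   // device-local buffer
const staging = [0];  // CPU-visible staging buffer

queue.submit(() => { buffer[0] = 42; });          // compute pass 1 writes the result we want
queue.submit(() => { staging[0] = buffer[0]; });  // copy to staging *before* the next pass
queue.submit(() => { buffer[0] = 7; });           // compute pass 2 overwrites the buffer

queue.flush();
console.log(buffer[0]);   // 7  — a direct read would observe the later write
console.log(staging[0]);  // 42 — the staging copy preserved the snapshot we wanted
```

Without the middle copy command, the only data left to read after the flush is the second pass's output; the staging copy is what turns "read whatever is in the buffer now" into "read what the buffer held at this point in the queue".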
The way we do it in our engine is that we have an address (virtual-memory) allocator sitting over default upstream/downstream ("staging") buffers that are persistently mapped (yes, you can have that in all the APIs). Data is first written into these buffers, then copied into the actual device-native immutable buffer. This has many performance benefits: you do not want your actual GPU-side buffer to be mappable, to sit in some special DMA memory, or to be updateable; all of these cause serious performance drawbacks. I don't think you should worry about the +1 copy on UMA devices, as the security considerations will require you to examine the contents being written/read anyway, so you might as well do that during the copy.
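The "address allocator over a persistently mapped staging buffer" idea can be sketched as a tiny linear (bump) allocator handing out aligned sub-ranges of one mapped byte array. This is a hypothetical illustration only; a real engine would also handle freeing, fencing against in-flight GPU work, and wrap-around:

```typescript
// Hypothetical sketch: a linear allocator over one persistently mapped
// staging buffer, as described in the comment above.
class StagingAllocator {
  private offset = 0;
  constructor(readonly mapped: Uint8Array) {}

  // Hands out an aligned sub-range of the mapped memory, or null if full.
  allocate(size: number, align = 256): Uint8Array | null {
    const start = (this.offset + align - 1) & ~(align - 1);
    if (start + size > this.mapped.length) return null;
    this.offset = start + size;
    return this.mapped.subarray(start, start + size);
  }
}

// Usage: the CPU writes upload data into a staging range; a GPU copy
// from that range into the device-native buffer would then be recorded.
const staging = new StagingAllocator(new Uint8Array(64 * 1024));
const upload = staging.allocate(16);
upload?.set([1, 2, 3, 4]);
```

Because the staging memory stays mapped for the lifetime of the buffer, uploads are just a `memcpy` into an allocated range plus one recorded copy command, with no per-frame map/unmap traffic.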
I'm going to retract this proposal because the required extra internal copy is distasteful to the WebGPU CG. Instead, we are debating the merits of the proposals here.
This is Apple's proposal for GPU to CPU transfers, and vice versa.
We believe that for a first version (MVP), we can stick to an extremely simple model. If we later discover we need something more complicated for efficiency, we can add to the API.
Benefits
Drawbacks
All transfers require at least one copy.
Example
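A minimal sketch of how such a one-copy transfer model could look (hypothetical helper functions operating on plain arrays, not the proposal's actual API): every upload passes through one explicit copy from a CPU-visible staging buffer into the device buffer, and every download takes the reverse path.

```typescript
// Hypothetical sketch of the simple model: each direction costs exactly
// one copy through a CPU-visible staging buffer.
function upload(device: number[], staging: number[], data: number[]): void {
  staging.splice(0, data.length, ...data);                        // CPU writes the staging memory
  for (let i = 0; i < data.length; i++) device[i] = staging[i];   // the one required copy
}

function download(device: number[], staging: number[], length: number): number[] {
  for (let i = 0; i < length; i++) staging[i] = device[i];        // the one required copy
  return staging.slice(0, length);                                // CPU reads the staging memory
}

const deviceBuffer = [0, 0, 0];
const stagingBuffer = [0, 0, 0];
upload(deviceBuffer, stagingBuffer, [10, 20, 30]);
```

The device-side buffer is never directly visible to the CPU, which is exactly the drawback listed above: even on UMA hardware where a direct mapping would be possible, the model still spends one copy per transfer.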