-
Notifications
You must be signed in to change notification settings - Fork 303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GPUQueue.writeBuffer/writeTexture #509
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thanks for the quick review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: let's make sure to merge this correctly with #498.
re: naming: These functionally are related to copyBufferToX. How about copyTo{Buffer,Texture}
or copyDataTo{Buffer,Texture}
?
@kainino0x thanks for the detailed review! An update is to follow.
It looks like #498 could easily land first, and I'll rebase
Given that we already have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! Would like to hear opinions from others on this though.
OK, my preference is not that strong. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to see details about when and why this might be useful as opposed to mapping.
The main arguments mentioned previously is that this more ergonomic than mapping and with decent performance. |
spec/index.bs
Outdated
GPUTextureCopyView destination, | ||
GPUExtent3D copySize); | ||
|
||
void copyTextureToBuffer( | ||
GPUTextureCopyView source, | ||
GPUBufferCopyView destination, | ||
GPUBuffer destination, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I kinda like the symmetry between GPUTextureCopyView
and GPUBufferCopyView
, maybe GPUBufferCopyView
could extend GPUTextureDataLayout
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting idea I haven't considered, let me try it
|
||
void writeTexture( | ||
ArrayBuffer sourceData, | ||
GPUTextureDataLayout sourceLayout, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this have the same 256 limitation as for copies between buffers and textures?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question!
I don't think it should have the limitation, since the data is coming from the CPU side, via a shortcut method, so we have both a room and a permit to manage the alignment automatically.
This could also be one more thing towards the list of why an API like this is useful. cc @jdashg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this too, I think it should not have the limitation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it doesn't have the limitation, it must copy bloat when this happens, which chips away at the "good enough performance" claims.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume when @Kangz says "256 limitation" he's referring to buffer copies having to have strides that are a multiple of 256? (I ⌘f-ed for "256" in the spec and didn't find anything...)
If the source data just happens to have stride that is a multiple of 256, just by coincidence, then no copy bloat, right? Just like how, if writeBuffer()
is called when the buffer just happens to be ready, just by coincidence, then no copy bloat either, right? It doesn't make a lot of sense to mandate 256 here when we aren't mandating an internal copy.
Perhaps this can be alleviated with additional spec text describing a set of steps which, if applications take, they are guaranteed to be on the fast path. These steps would say "use fences to know when your buffer is ready" and "make the stride a multiple of 256" (possibly among other things).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment wasn't super clear: the 256 limitation is around the rowPitchInBytes in GPUTextureDataLayout
. copyBufferToTexture
and copyTextureToBuffer
require it to be a multiple of 256 and I was wondering if the same applied for writeTexture
.
Just like how, if writeBuffer() is called when the buffer just happens to be ready, just by coincidence, then no copy bloat either, right?
That's an implementation detail, and at least in Chromium/Dawn we'd like to avoid doing it if we can (it requires knowing for each resource when's the last time they were used by the GPU, which we'd like to avoid tracking).
@Kangz deriving seems to be working well, thanks for the idea! @jdashg see Myles concerns in #418 plus #509 (comment) . An alternative is:
|
I'm not compelled by this reasoning. |
We are currently sailing this ship, and I haven't given up on trying to keep in pointed in the direction I see as ideal. I'm not sure why you see these things as foregone conclusions, but the path of least resistance is not what gets us to the optimal spec. |
Could you (re-)point me to the full code please? It doesn't appear to be linked from your hack.md, and stuff I can find by a cursory look isn't the code we are talking about (e.g. #418 (comment)).
There isn't an extra copy. The texture data provided by the user in this PR has to go to a driver-managed staging buffer first, anyway, which is the CPU copy, and we can just resolve the alignment there.
This means doing something custom that the polyfill you are describing doesn't do. So the polyfill (alone) can't replace
Do you mean the variants where the user provides the range, and we can detect if the range is not used by the GPU? This behavior is yet to even be attempted to be specified, since the current usage rules can allow parts of the buffer to be used differently.
We've been sailing this ship for 3 years by now, and we are on the verge of settling on the MVP. We can, indeed, still change the course, but it's getting more and more costy to do so. |
|
||
- `destination.texture` texture wasn't created with {{GPUTextureUsage/COPY_DST}} flag in {{GPUTextureDescriptor/usage}}. | ||
- `destination.texture` must not be destroyed | ||
- `destination.origin + size` exceeds the size of texture dimensions of the mipmap `destination.mipLevel`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I presume that size
is specified to be in bytes and origin
in pixels? If so, then the operation destination.origin + size
is using mismatched units and will not work, especially if there are padding pixels at the end of each scanline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Size here is a GPUExtent3D in pixels. Unfortunately the types are not locally visible (you have to check the IDL)...
spec/index.bs
Outdated
- `destination.origin + size` exceeds the size of texture dimensions of the mipmap `destination.mipLevel`. | ||
- any of the `size` extents are zero | ||
- `sourceOffset + requiredSize` exceeds `sourceData.byteLength`, where `requiredSize` can be computed as | ||
`(sourceLayout.imageHeight * sourceLayout.rowPitch * (size.depth - 1) + sourceLayout.rowPitch * (size.height - 1) + size.width) * bpp`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I presume that rowPitch
is specified as being bytes per scanline, regardless of the pixel format. If so, we shouldn't also be multiplying everything by bpp
spec/index.bs
Outdated
dictionary GPUTextureDataLayout { | ||
GPUBufferSize offset = 0; | ||
required unsigned long rowPitch; | ||
required unsigned long imageHeight; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Please spec the units of these fields. I presume that offset
, rowPitch
and imageHeight
are in bytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
offset
and rowPitch
are in bytes but imageHeight
is in texels. We discussed this a while ago in . #148
spec/index.bs
Outdated
Takes the `sourceData` contents starting from the offset `sourceOffset` and | ||
schedules a write operation of these contents to the `destination` buffer in the queue. | ||
The operation does nothing and produces an error if any of the following is true: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Would be good to spec that the contents of sourceData
are read during the execution of the writeXXX
function call. Any subsequent modifications to sourceData
do not affect what is written at the time that the scheduled operation runs.
spec/index.bs
Outdated
The operation does nothing and produces an error if any of the following is true: | ||
|
||
- `destination` buffer wasn't created with {{GPUBufferUsage/COPY_DST}} flag in {{GPUBufferDescriptor/usage}}. | ||
- `destination` buffer must not be destroyed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- `destination` buffer must not be destroyed | |
- `destination` buffer is destroyed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this is implied by ["destination" buffer isn't in the "unmapped" state.], so you could also remove this.
spec/index.bs
Outdated
The operation does nothing and produces an error if any of the following is true: | ||
|
||
- `destination.texture` texture wasn't created with {{GPUTextureUsage/COPY_DST}} flag in {{GPUTextureDescriptor/usage}}. | ||
- `destination.texture` must not be destroyed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- `destination.texture` must not be destroyed | |
- `destination.texture` is destroyed |
Thanks for the reviews! Addressed the notes and rebased now. |
On client-side tracking and direct writes to the buffer if it's not in use: I think this ability should be left for a follow-up extension, since in a general case the original buffer is not even mappable, so it's impossible to write any data in directly without a copy. So for now we can say this API is only meant to stage and queue uploads. |
Note that at least until we are able to map memory in WASM space, |
…gpuweb#509) * Support passing skip/fail test expectations in wpt or by command line On the command line, test expectations are accepted by a command line flag --expectations path/to/expectations.ts, where expectations.ts is a module that exports RawTestQueryStringWithExpectation[] as "expectations". In WPT, the page should define a global variable |loadWebGPUExpectations| which is a Promise<RawTestQueryStringWithExpectation[]>. The test expectations are filtered down to only what is relevant for the running test query, and applied to the testcases before they run. A 'skip' expectation will entirely skip a testcase, and a 'fail' expectation will surface failures as passes. If a 'fail' expectation is found, but a test actually passes, the harness will log an warning for the unexpected pass. * Update tools/deno Co-authored-by: Luca Casonato <lucacasonato@yahoo.com> * Address comments from code review * testcaseQuery -> filterQuery Co-authored-by: Luca Casonato <lucacasonato@yahoo.com>
This PR brings setSubData calls on the queue for buffers and textures.
This PR can be seen as either:
Preview | Diff