GPU Web 2020 02 24

François Daoust edited this page Dec 1, 2020 · 1 revision

Chair: Dean

Scribe: Ken, Austin

Location: Google Meet

TL;DR

  • slides Queries
    • YH: Separated “precise-occlusion” because binary occlusion can be core.
    • KR: Only time duration queries, not timestamps because timestamps aren’t well supported everywhere.
    • DP: Also timestamps can rollover (~days)
    • MM: For iOS, time queries can only be at the whole command-buffer level (read back on the CPU too)
    • DM: Since time, occlusion, and pipeline statistics are all done separately, consider having different shaped APIs for all of them.
    • DM: Consider new usage for query results. Native APIs have this too.
    • YH to make further follow-ups in PR
  • #551 Predication
    • MM: Supported on D3D12 for rendering, compute, copies, and clears; Vulkan (low support) for rendering, compute, and clears; not on Metal.
    • MM: We should not have a predication extension because it can be implemented using indirect draws / dispatch
      • ai: Make a benchmark to compare perf
  • Shading language name: WGSL
  • Shading language editors: 2-3. MM and DS volunteer, will wait for others
  • Shading language repo: same repo as gpuweb
    • Need to consider messaging / communication to the public about shading language decision.
    • JG: Should move the cts into the same repo. Valuable for WebGL

Tentative agenda

  • Query API
  • Shading language naming / editors / repo / ...
  • Shading language meeting frequency / attendees
  • (stretch) data uploads (mapping, or immediate)
  • Agenda for next meeting

Attendance

  • Apple
    • Dean Jackson
    • Justin Fan
    • Myles C. Maxfield
  • Google
    • Austin Eng
    • Dan Sinclair
    • David Neto
    • James Darpinian
    • Ken Russell
    • Shrek Shao
    • Ryan Harrison
  • Intel
    • Yunchao He
  • Microsoft
    • Chas Boyd
    • Damyan Pepper
    • Rafael Cintron
  • Mozilla
    • Dzmitry Malyshau
    • Jeff Gilbert
  • Timo de Kort

Query API (Intel’s proposal, #551 by Myles)

  • Presentation by Yunchao
  • Updated portion:
    • Copied “precise occlusion” into name because it’s an optional feature. Binary occlusion is a core feature in the API.
    • Need to add a parameter to PassDescriptor. For Firefox, need this to set a buffer for occlusion query on Metal. Chromium uses wire layer to record/replay, but for ordering implementation, need to set this before begin/endQuery.
    • DM: not just for FF. Even in Chrome, trying to use 2 different visibility query sets, that’s incorrect - can only have one. It’s needed for all impls.
    • MM: also helpful for WebKit. It’s good.
    • Change sampleCount in QueryResultOcclusion. Previously bool. Need unsigned long to get the sample count.
    • KR: In WebGL, we found that timestamp queries are not widely supported or are buggy. Time duration were the only accurate ones. Do the low level APIs work this way?
    • YH: Yes
    • JG: Ken is saying that a number of devices on the old API were not able to use timestamps, only disjoint timer queries. So the question is does timestamp have universal support for WebGPU?
    • MM: timestamps as presented here only match the Metal model on macOS, not on iOS.
    • YH: these are extensions, not core features.
    • DP: also wonder about maximum timestamp and whether it can roll over. Depends on the GPU vendor.
    • MM: so you can get timestamps smaller than those returned before?
    • DP: yes.
    • JG: what’s the interval for rollover?
    • DP: we see it on machines running for several days.
    • DN: Khronos just added a shader_clock extension. Spec has to be extremely weak. Could be 0, could have interruption. Not something you can rely on.
    • JG: those kinds of specs break down in weird ways.
    • MM: think all 3 APIs have facilities for measuring durations. Think would be possible to have something that measures duration and is in core.
    • JG: agree, think you can polyfill disjoint_timer_query on top of timestamp and present it everywhere. Not sure about the new Vulkan thing but sounds like weaker version of disjoint_timer_query.
    • DN: inside the shader is where you get into trouble. At API level you have OS granularity.
    • MM: think at API level all 3 APIs have facilities for durations.
    • KR: I suggest we should focus on durations.
    • MM: Agreed.
    • YH: what’s the difference?
    • JG: timestamps have some arbitrary 0 point. Durations are, start measuring here, end here, and you get the time delta between them.
    • MM: app doesn’t have to subtract 2 numbers to get the duration.
    • RC: In D3D11 there’s a provision for it erroring out. E.g., app / computer put to sleep. Think that needs to be part of the API too.
    • JG: yes, needs to be a Maybe<uint64>.
    • MM: could we handle that by returning a negative duration?
    • JG: could do. Some APIs return 0. Web IDL doesn’t have a Maybe generally?
    • KR: only for Object / nullable objects.
    • MM: these should be available in Buffers, so WebIDL’s the wrong place to look for this.
    • YH: for timestamps: during execution GPU frequency is unstable. Querying timestamp multiple times in the app might not be reliable.
    • KR: Handled by the Disjoint Timer Query in WebGL. If something happens in the middle such that it can’t handle it, it errors
    • MM: We should read the specs for what the platforms return and figure out if we can reduce to the same duration units.
    • YH: our current proposal is to get time data / difference and we’ll calculate the time to nanoseconds. Metal / Vulkan use ns, D3D doesn’t (“ticks”). Can do more experiments to see if data is stable.
    • MM: if data’s incorrect, that would be bug in the platform API.
    • MM: for iOS, in Metal, timestamp information only available at command buffer boundaries. Current design of begin/endQuery wouldn’t work in that context.
    • YH: yes.
    • JG: because you’d have to split encoders?
    • MM: no, the only way this information is exposed is attributes on the command buffer itself.
    • JG: or resolved into a buffer?
    • MM: no, they’re only available on the CPU. Want it on the GPU, have to upload it manually yourself. May constrain the design here.
    • DP: awkward questions about doing it in the command buffer. Draw calls often executed in parallel. What does it even mean to have a timestamp between draw calls?
    • YH: currently we put it in render pass.
    • MM: think unimplementable on iOS Metal.
    • MM: I like the change you made to visibility buffers. Think that’s an improvement.
    • DM: you have query set in RenderPassDescriptor, and query set in BeginQuery. Do they have to be the same?
    • YH: I think for occlusion queries it should be the same. For RenderPassDescriptor, only need to set this for occlusion queries, not other query types.
    • DM: if for render pass occlusion queries you know the query set, then all 3 queries have different semantics and I think we should use different types of objects for them. E.g., GpuOcclusionQuerySet as opposed to GpuQuerySet.
    • YH: makes sense - maybe can change to different types.
    • MM: or - if you want to do occlusion queries, you set the query set on the RenderPassDescriptor, and then you don’t pass the query set in beginQuery.
    • DM: then you can’t measure things inside the renderpass.
    • YH: for occlusion queries, maybe can only set the index, but for other types we can change the query set.
    • JG: question about resolving to buffer. Why isn’t that the default for many of these? Is resolving into the buffer only for binary occlusion queries? What about timestamps?
    • YH: should we resolve data for occlusion queries only, or for all of them? Think GPU buffer’s used only for occlusion queries.
    • MM: it’s used for statistics in Metal.
    • JG: also resolved into a buffer?
    • MM: yes.
    • JG: if all / most of these resolve into a buffer, maybe it’s worth considering that the default way of using them is to resolve into a buffer and then read that buffer back, instead of having 2-layer API where you can query it with a rich object you get back, or by writing into a buffer.
    • YH: that’s also an option.
    • YH: maybe we don’t need getQueryResults, just resolve to a GPU buffer, and then you can read the GPU buffer back.
    • MM: makes sense to me.
    • YH: then we can delete the last one.
    • DM: I don’t think you need the storage usage requirement for the destination buffer.
    • YH: for occlusion query, maybe need to use compute shader to change things to 0/1. Also conversions to nanoseconds on some APIs.
    • DM: 2 ways to address without storage qualifier. 1) define usage flag “this is for collecting query results”. 2) compute it to a buffer managed by impl, then copy to dest buffer. Extra copy, but unlikely that this will matter.
    • YH: you’re right. Actually 3 solutions. 1) Gpu buffer, add copy_dest and storage buffer usage. 2) use another buffer, do a copy. 3) totally new buffer usage for this, differing for different impls.
    • DM: having any usage is something the APIs do already. Not sure about Metal. Think this is the easiest thing to specify.
    • YH: you mean new buffer usage?
    • DM: yes.
    • YH: OK. Should we introduce a new usage?
    • JG: I think adding a new usage isn’t a bad idea. You know ahead of time if you’re going to use a buffer for query results. Gives browsers a chance to pick what we think is best.
    • YH: another reason to add new usage: if we have SetPredication, need new buffer usage
    • DM / JG: good point. Really good, thanks.
  • RC: if you want more feedback on this do we do it on the slide deck? Pull request? etc.
  • YH: think we’ll submit a pull request.
  • MM: in future, I’d prefer pull requests to slides.
  • YH: ok.
  • DM: are you going to do separate PR for binary occlusion queries in the core API?
  • YH: maybe.
  • DM: thank you.
  • #551 by Myles
  • MM: YH did great investigation here. D3D team said during F2F that major use case of queries is predication. Want to do object culling. Render object bounding box in buffer, see if any pixel made it to framebuffer (binary occlusion query). Next draw predicated on those rendering results. To accomplish, need way to say: this draw call should be predicated on this value in this buffer.
  • MM: did little investigation on predicated rendering. D3D has a SetPredication method. Everything after this call is predicated on this buffer. Can end the region, too. Start predication mode, draw, end predication mode.
  • MM: Vulkan is similar, in extension, 1% availability on Android.
  • MM: Metal doesn’t have this at all. Use indirect command buffers / indirect dispatch / indirect draws, set vertex count to 0 if you don’t want it to draw at all. In Metal, just get GPU to drive rendering, and then get the GPU to not emit that draw call. Or draw 0 vertices.
  • MM: kind of elegant. In WebGPU right now we already have indirect draws / dispatches, so this mechanism can already be used by WebGPU authors today, and run on all devices. That 1% support on Android doesn’t matter if the suggestion is to implement predicated rendering yourself.
  • MM: if we wanted to add an extension that matches the way D3D / Vulkan do things, that’s fine, but implementing it on Metal or Vulkan w/o extension with apps already doing indirect calls would be tricky - lots of fix-ups.
  • MM: I think we shouldn’t make any changes to WebGPU for this, just tell authors how to do it themselves.
  • DM: I don’t understand - impl wanting to implement extension but not supported natively, wouldn’t it just not implement the extension?
  • YH: we have this problem already - occlusion queries don’t always return 0/1, but 0/nonzero. Predication stuff could be confused.
  • JG: Is the compute shader thing just about coalescing all nonzero values to 1?
  • YH: web should have consistent results.
  • MM: both Vulkan extension and D3D allow you to swap what 0 and nonzero mean. Predicate this on zero, predicate on nonzero.
  • JG: was part of DM’s question about why we’re worrying about supporting this as extension if the functionality isn’t available?
  • MM: if we want predicated rendering as extensions in WebGPU it would be cool if Metal supported that extension. If we wanted to claim to the world that we truly supported predicated rendering we’d need to emulate the extension.
  • JG: in WebGL something like this has come up with emulating compressed texture formats on platforms not supporting them. Decision we got to: extension should be an indicator of whether it’s truly supported by the hardware, and developers would make decision of whether they want to truly emulate it. MM is your concern that there’d be pressure for impls to support the extensions because they look like functionality even though they’re not?
  • MM: yes.
  • JG: could we combat that with strong best practices?
  • MM: are the best practices you propose, “please use indirect draws / dispatches”?
  • JG: best practices for predicated would be, if you want universal support, do it this way.
  • MM: could strengthen that by not having an extension that doesn’t work everywhere.
  • JG: I agree that most people won’t need the extension.
  • DM: MM you care very much about benchmarks - think the next best step would be to see whether the native predication APIs are faster than the way proposed with indirect draws.
  • MM: fair point.
  • JG: what do you want to benchmark?
  • DM: using predication vs. indirect draws and zeroing them out.
  • MM: If I understand correctly, if there is an extension, they can use it if it’s supported, otherwise use indirect draws and dispatches everywhere. If there’s a difference then there’s a reason for them to coexist. If there’s no difference, then we should just have the one supported everywhere.
  • DJ: that’s a good argument but we’ll wait on the result. Let’s wrap this up now.
  • DN: caution against 1.4% for example. In GPUInfo, for example, that’s the # of drivers reporting the extension, denominator is distinct number of driver kinds, but not the number of devices that support that extension. 1.4% is not the actual answer, but an indication of the directionality.
  • MM: I’m quoting the numbers we have - don’t think we have better numbers. Don’t think that if you count differently that 1% turns into 90%
  • DN: agree, just saying that in closer cases we should be more careful with the numbers.
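
Two ideas from the discussion above can be sketched in a few lines. This is a minimal illustration, not anything the group agreed on: all function names are hypothetical, the logic runs on the CPU over plain typed arrays (a real application would do the predication step in a compute shader writing into the indirect buffer), and timestamps are modelled as JavaScript numbers, which lose precision above 2^53, so real 64-bit counters would need a wider type. The first function turns a begin/end timestamp pair into a duration in nanoseconds and returns null when the pair cannot be trusted, for example after a rollover. The other two coalesce 0/nonzero occlusion results to 0/1 and emulate predication by zeroing the vertex count of an indirect draw record, assuming the four-word layout (vertexCount, instanceCount, firstVertex, firstInstance).

```typescript
// Hypothetical helper: duration in nanoseconds from two raw tick counts.
// `ticksPerSecond` is an assumed parameter standing in for whatever
// frequency the platform reports. Returns null for an unusable pair.
function durationNanoseconds(
  beginTicks: number,
  endTicks: number,
  ticksPerSecond: number
): number | null {
  // A smaller end value means the counter rolled over or was reset.
  if (ticksPerSecond <= 0 || endTicks < beginTicks) {
    return null;
  }
  return Math.round(((endTicks - beginTicks) * 1e9) / ticksPerSecond);
}

// One indirect draw record is four 32-bit unsigned integers:
// vertexCount, instanceCount, firstVertex, firstInstance.
const DRAW_ARGS_STRIDE = 4;

// Coalesce raw occlusion results (0 / nonzero) into strict 0 / 1 flags.
function toBinaryOcclusion(raw: Uint32Array): Uint32Array {
  const out = new Uint32Array(raw.length);
  for (let i = 0; i < raw.length; i++) {
    out[i] = raw[i] !== 0 ? 1 : 0;
  }
  return out;
}

// Emulate predication: return a copy of `drawArgs` in which every draw
// whose predicate fails has its vertexCount set to 0. `drawWhenNonZero`
// selects which predicate value lets the draw through.
function applyPredication(
  drawArgs: Uint32Array,
  predicate: Uint32Array,
  drawWhenNonZero: boolean = true
): Uint32Array {
  if (drawArgs.length !== predicate.length * DRAW_ARGS_STRIDE) {
    throw new Error("expected one predicate value per draw record");
  }
  const out = new Uint32Array(drawArgs); // copy, leave the input untouched
  for (let i = 0; i < predicate.length; i++) {
    const passes = (predicate[i] !== 0) === drawWhenNonZero;
    if (!passes) {
      out[i * DRAW_ARGS_STRIDE] = 0; // vertexCount = 0, nothing is drawn
    }
  }
  return out;
}

// Usage: three draws, the first one was fully occluded.
const exampleArgs = new Uint32Array([3, 1, 0, 0, 6, 1, 0, 0, 9, 2, 0, 0]);
const exampleFlags = toBinaryOcclusion(new Uint32Array([0, 17, 1]));
const examplePredicated = applyPredication(exampleArgs, exampleFlags);
```

Returning a copy keeps the original draw arguments available for the next frame, at the cost of an extra buffer; whether that trade-off or the native predication path is faster is exactly what the proposed benchmark would have to show.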

Shading Language: Naming

  • DS: going to call it WGSL - WebGPU Shading Language. Chrome impl will go out as Tint - an impl of the language.
  • DJ: any negative reaction to the naming?
  • JF: I think Wiggles is great. :D

Shading Language: Editors

  • DJ: ok. Next question is, who will write the document that describes the language?
  • MM: I’d like to volunteer, don’t know what the process will be. WebGPU had calls, nominations, etc. Could also just decide right now.
  • DS: I’d also be interested in helping out in this respect.
  • JG: let’s give it one period instead of deciding right now.
  • DJ: I think Dan is almost definitely - as the author of the current spec.
  • MM: how about chairs post a call to the email list?
  • DJ: I think like the spec itself we should limit it to 2-3 editors. Becomes easier.

Shading Language: Meetings

  • DJ: do we want separate meetings for this topic?
  • DS: I think so.
  • MM: conclusion from F2F was yes. how often?
  • DS: why don’t we do weekly to start, and go back to biweekly if we don’t have enough to discuss.
  • DJ: Dan, can you schedule such a meeting? Assume we’ll use Google Hangouts.
  • MM: is the editors call right after this meeting?
  • DM: no, it’s after an hour.
  • DJ: obviously not clashing with either this meeting or the editors’ meeting.
  • DJ: if Dan sends out an email and finds who’s interested you can negotiate a time.

Shading Language: Repository

  • DJ: where should logistics live? In WebGPU repo or another one?
  • JG: I’d prefer to keep it in the same and have different PRs for different parts. Will be some overlap. Also don’t like idea of it not being super clear of which API spec lines up with shading language spec.
  • DM: we already have the multi-repo structure. CTS in one repo, spec in another.
  • JG: I was against having the CTS in a different repo.
  • DS: doesn’t matter to me.
  • MM: me neither.
  • DJ: ok, I suggest we start in one repo and split it if necessary.
  • DS: so wgsl directory in gpuweb repo?
  • DJ: yes, and wgsl GitHub tag for issues.
  • DJ: also prefixing issue title like [wgsl].
  • DM: will we follow up with CTS moving into the same repository?
  • DJ: let’s start with this and consider the CTS.
  • JG: WebGL has the mono-repo setup and I think it’s valuable.
  • DJ: I agree, let’s start here and consider folding CTS in.
  • DS: going to touch base with Wendy from W3C about whether it’s OK to reference specs like this.
  • DJ: I have a meeting with Apple’s legal about the referencing. Need email from Neil with what he described in words, or in a presentation. Gets lost in translation.
  • DS: if you can CC us on that we’d appreciate it.
  • DJ: ok, you handle discussion with Wendy, I’ll email Neil. Anyone else want to get copied? Dan + Yunchao
  • JG: really cool that we have shading language resolved.
  • DJ: I want to say it’s great that it’s resolved. Jeff had a good point about us not reviewing pull requests promptly.
  • DN: the news about the shading language has not been broken yet. Would like to have a nice way of saying it that’s constructive and all that. I pinged some of you on the weekend about this. Would like us to have a message out there relatively soon.
  • DJ: assume you’ve talked with everyone who’s given input.
  • JG: would it be useful to once-over a press release style thing?
  • DJ: yes, could be something we publish on our wiki on GitHub. Our group’s opinion, refer to it.
  • JG: think we need for all legal stuff to be crossed first, but can get started writing the press release.
  • DN: didn’t want it to be too heavyweight but wanted us to agree on the communication.
  • MM: think it’s a good idea.

Agenda for next meeting

  • Buffer uploads
  • MM: design for keeping data on-chip, will have written down by 2 days before next call. Different from either of the native platform ones.