DrawIndirectCount #1354

Open
frustum opened this issue Jan 19, 2021 · 42 comments · May be fixed by #2315
Labels
api WebGPU API feature request A request for a new GPU feature exposed in the API

@frustum

frustum commented Jan 19, 2021

Hello,

It would be awesome if the drawIndirectCount functionality were available in WebGPU.
DrawIndirect functionality is already available, and it's excellent, but DrawIndirectCount allows rendering more advanced things.

DrawIndirectCount is available on almost all platforms now: Vulkan, OpenGL, Direct3D12, Direct3D11 with AMD AGS.
The Metal API has an Argument buffer extension, which should be able to emulate such functionality.

Thank you!

void vkCmdDrawIndirectCount(VkCommandBuffer commandBuffer, VkBuffer buffer, VkDeviceSize offset, VkBuffer countBuffer, VkDeviceSize countBufferOffset, uint32_t maxDrawCount, uint32_t stride);

void vkCmdDrawIndexedIndirectCount(VkCommandBuffer commandBuffer, VkBuffer buffer, VkDeviceSize offset, VkBuffer countBuffer, VkDeviceSize countBufferOffset, uint32_t maxDrawCount, uint32_t stride);

@Kangz Kangz added the feature request A request for a new GPU feature exposed in the API label Jan 19, 2021
@kainino0x kainino0x added this to the post-MVP milestone Jan 19, 2021
@kainino0x
Contributor

kainino0x commented Jan 19, 2021

Of note, our current drawIndirect/drawIndexedIndirect can only do one draw call - we don't even have a drawCount parameter like Vulkan's vkCmdDrawIndirect. This is because of the multiDrawIndirect feature which is available on many but not all Android devices.

VUID-vkCmdDrawIndirect-drawCount-02718
If the multi-draw indirect feature is not enabled, drawCount must be 0 or 1

Also note the drawIndirectCount feature is optional in Vulkan, and is available on no Android devices.

drawIndirectCount indicates whether the implementation supports the vkCmdDrawIndirectCount and vkCmdDrawIndexedIndirectCount functions. If this feature is not enabled, these functions must not be used.

I think we would be interested in adding optional support for both multiDrawIndirect and drawIndirectCount, but since they're optional this will be post-1.0.

@frustum
Author

frustum commented Jan 19, 2021

Yep, it would be great to have that extension in the future.
multiDrawIndirectCount() is available on Android Qualcomm devices, and we are using it now.
Device Name: Adreno (TM) 650 V@444.0 (GIT@380077c, Ifdda647016, 1598982727) (Date:09/01/20)
So it is a simple extension in the case of the desktop browser running over Vulkan/OpenGL/Direct3D12.
Thank you

@kvark
Contributor

kvark commented Jan 19, 2021

True! I think we can even poly-fill this everywhere. We could schedule a compute operation to happen before the render pass that would fill out a temporary GPU indirect buffer. I.e. we'd copy over the indirect data for maxDrawCount entries into a temporary buffer, and then zero out the counts in entries that have an index equal or above the actual count value in the buffer. Then we'd just turn a draw_indirect_count into a for loop of draw_indirect calls on platforms that need that.
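
As a rough illustration of the polyfill described above, here is a CPU-side reference of what such a compute pass might do. The helper name `clampIndirectDraws` is hypothetical, and the sketch assumes non-indexed commands packed as four u32 values (vertexCount, instanceCount, firstVertex, firstInstance); a real implementation would do the same work in a compute shader on GPU buffers.

```typescript
// Number of u32 words in one non-indexed indirect draw command:
// vertexCount, instanceCount, firstVertex, firstInstance.
const WORDS_PER_DRAW = 4;

// Hypothetical helper: copies `maxDrawCount` commands into a temporary
// array and zeroes instanceCount for every entry at or past `drawCount`,
// so that those draws become no-ops.
function clampIndirectDraws(
  commands: Uint32Array,
  drawCount: number,
  maxDrawCount: number,
): Uint32Array {
  // slice() copies, so the source commands are left untouched.
  const out = commands.slice(0, maxDrawCount * WORDS_PER_DRAW);
  for (let i = drawCount; i < maxDrawCount; i++) {
    out[i * WORDS_PER_DRAW + 1] = 0; // instanceCount
  }
  return out;
}

const sourceCommands = new Uint32Array([
  3, 1, 0, 0, // draw 0
  6, 2, 3, 0, // draw 1
  9, 4, 9, 0, // draw 2
]);
// Only the first 2 of 3 possible draws are live.
const clampedCommands = clampIndirectDraws(sourceCommands, 2, 3);
// clampedCommands[9] (instanceCount of draw 2) is now 0.
```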

Anyway, I don't think it's worth pursuing at this point, we better focus on MVP essentials instead.

@frustum
Author

frustum commented Jan 19, 2021

I agree that it doesn't look like a very important extension. But we can make super awesome WebGPU showcases with it. CPU emulation will not work once the number of draw calls grows beyond several thousand. The driver will do the job better. Or we will wait for better times :)

@cdiggins

cdiggins commented Feb 8, 2021

Also note the drawIndirectCount feature is required in Vulkan, and is available on no Android devices (http://vulkan.gpuinfo.org/listdevicescoverage.php?core=1.2&feature=drawIndirectCount&platform=android).

Just a small note, that I actually found it on 11% of Android devices according to this search:
http://vulkan.gpuinfo.org/listdevicescoverage.php?extension=VK_KHR_draw_indirect_count&platform=android

@kainino0x
Contributor

kainino0x commented Feb 9, 2021

Ah, didn't realize it was previously an extension (still getting used to the new refactored vulkan.gpuinfo.org that splits by vulkan version).

Also note the drawIndirectCount feature is required in Vulkan

I must have meant "not required"?

@alecazam

alecazam commented Jun 17, 2021

So iOS on 6S and above has a full implementation of indirect draw, but the 63% Android score means it's limited to 1 draw call in WebGPU 1.0? This is a call that one would only use when it's supported, and fallback to explicit draws if not. Only having 1 draw is a reason not to use this at all.

In the iOS 9 API, it also seems like iOS is missing a drawCount in the draw call where Vulkan has one. This means that drawIndexedPrimitives, with an offset into the buffer, must be called for each indirect draw. This kind of defeats the purpose of reducing draws, but at least allows Vulkan to benefit while Metal can catch up later.

@frustum
Author

frustum commented Jun 17, 2021

Hello,

I'm asking about that particular extension because compute shaders are useless without the ability to draw the result. Otherwise everything will be limited to a single draw call, which is only suitable for simple things like terrain or grass. This is okay for CPU-driven technologies. But compute shaders allow moving everything to the compute level, boosting performance and power efficiency as a result.

This is a link to our benchmark: https://gravitymark.com/

And I think that it must work on WebGPU. Otherwise, I don't know why the WebGPU project was initiated. We don't need much in terms of functionality. The engine works perfectly at the OpenGL ES 3.1 feature level, and WebGPU should be at least equal to OpenGL ES. Moreover, OpenGL ES is an outdated API for everybody.

We have full WebGPU and WGSL support. And there is a minimal list of features that are stopping us from running our engine on WebGPU:

  • storage buffer access from vertex and fragment shaders
  • multiple viewports support (supported by all desktop APIs)
  • multi draw indirect count (supported by all desktop APIs except Direct3D11 running on Nvidia/Intel, and Metal)

It is possible to work around multi_draw_indirect_count on platforms where it's not supported with additional synchronization and a loop of draw indirect commands. And sometimes, that even works faster than the driver.

Guys, you already did tremendous work with WebGPU. Please don't stop with it :)

@kainino0x
Contributor

AFAIK, in order to achieve compute-driven multidraw right now, you need to issue one indirect draw call for each possible draw call you want to do, then use a compute shader to generate up-to that many draw calls, zeroing out the ones you don't need. This isn't amazing, but it still might be able to achieve good performance, at least if you have a reasonable bound on the maximum number of draw calls. Unfortunately I don't know the characteristics of this technique on different architectures.
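
A minimal sketch of the CPU side of this technique, under a few assumptions: `IndirectDrawPass` is a stand-in interface declaring only the `drawIndirect(indirectBuffer, indirectOffset)` method of a render pass encoder so the example is self-contained, `encodeDrawLoop` is a hypothetical helper, and a compute pass is assumed to have already zeroed instanceCount in the unused slots.

```typescript
// Minimal structural stand-in for the one render pass method used here,
// so the sketch type-checks without WebGPU type definitions.
interface IndirectDrawPass<B> {
  drawIndirect(indirectBuffer: B, indirectOffset: number): void;
}

// Size in bytes of one non-indexed indirect command (4 x u32).
const DRAW_INDIRECT_BYTES = 16;

// Hypothetical helper: records one drawIndirect per possible draw.
// Slots the GPU did not need are expected to have instanceCount == 0.
function encodeDrawLoop<B>(
  pass: IndirectDrawPass<B>,
  indirectBuffer: B,
  baseOffset: number,
  maxDrawCount: number,
): void {
  for (let i = 0; i < maxDrawCount; i++) {
    pass.drawIndirect(indirectBuffer, baseOffset + i * DRAW_INDIRECT_BYTES);
  }
}

// Mock pass that only records the offsets it was asked to draw from.
const recordedOffsets: number[] = [];
const mockPass: IndirectDrawPass<string> = {
  drawIndirect: (_buffer, offset) => {
    recordedOffsets.push(offset);
  },
};
encodeDrawLoop(mockPass, "indirect", 32, 3);
// recordedOffsets is [32, 48, 64]
```

The cost of this approach is that the CPU always encodes maxDrawCount calls, which is why a reasonable upper bound matters.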

Most likely we'll later have optional features for all of those things you listed, but unlikely before MVP so we can focus on the core.

@mrshannon
Contributor

I have to second the need for multiDrawIndirect. In fact it should be an MVP feature. Without it the number of draw calls, and thus API/driver overhead, is too great for complex scenes. And complex scenes are the only reason for many to use WebGPU over WebGL.

@frustum
Author

frustum commented Sep 11, 2021

It's doable to emulate MDI with Metal. It works much faster on Apple GPUs, but all AMD GPUs lose a lot of performance with it, because the driver does the redundant job of converting draw calls back to MDI format: https://tellusim.com/metal-mdi/

@kainino0x
Contributor

If we do emulation like that we could enable it on Apple GPUs but not on AMD GPUs if it doesn't perform well! 👀

@jeremyong

Also note the drawIndirectCount feature is optional in Vulkan, and is available on no Android devices.

@kainino0x The GPU Info DB shows percentages based on reported device support, but this percentage isn't usable as a proxy for real world device market share. Was there any analysis done on an estimate of actual devices in the wild that don't support multiDrawIndirect?

@kainino0x
Contributor

@jeremyong We don't use the percentages on gpuinfo.org to understand market share; instead, we use it as a rough estimate (i.e. is it very close to 0% or 100%?). To actually figure out what actual devices we would lose by requiring a feature we have to dig deeper - see #1069 (comment).

In this case the claim I made was incorrect because I looked only at the core 1.2 feature drawIndirectCount, and not at VK_KHR_draw_indirect_count. But looking at that extension on http://vulkan.gpuinfo.org/listextensions.php we can easily see that it's close to neither 0% nor 100%, on both desktop and android, which is enough for us to understand that (1) it can't be core, and (2) it's a viable optional feature for both desktop and mobile devices.

@alecazam

The reports on gpuinfo.org are mostly taken from Linux driver installs that are often homegrown. They don't seem to reflect many of the better drivers on Windows, which are far more prevalent installs of Vulkan. Android reports are more reflective of that platform. So I wouldn't rely on the percentages there as an in-use percentage like the Steam charts.

Apple A9 has ICB with the ability to draw a range from CPU, and A11 can generate the range and count from the GPU, and repack missing elements in the array that is output via that blit encoder.

Mali doesn't have multiDrawIndirect, but Adreno does. But even then, one needs the VK_EXT_descriptor_indexing extension to switch the descriptorSet within the drawIndirect calls for varied materials.

Also Mali and Adreno are missing the VK_EXT_conditional_rendering extension to skip instances, but that stinks for multiDrawIndirect. One can use VTF to vertkill all the verts if a visibility buffer reports that a shape is occluded.

ben-clayton pushed a commit to ben-clayton/gpuweb that referenced this issue Sep 6, 2022
This CL adds unimplemented stub tests for the `transpose` builtin.

Issue: gpuweb#1246
@logankaser

I think there is a potential for a performance regression compared to WebGL for renderers using https://developer.mozilla.org/en-US/docs/Web/API/WEBGL_multi_draw if this is not included.
It's only available in Chrome, but in my experience it makes a large difference, perhaps because Chrome's draw call overhead for WebGL is high. It's also fairly easy even for a user of the API to implement a fallback when it's not available, which is what I did for Firefox / Safari.
For my use case, I was using it only to reduce draw call overhead.

@mmgeorge

mmgeorge commented Sep 5, 2023

Just tried to use drawIndirect, if I understand correctly from @kainino0x's comment, we can currently only use an indirect buffer of u32x4 for a single draw? Would also very much like to see support for an optional drawCount parameter as that would make this much more useful than it is currently.

EDIT: I think I'm hitting a chrome bug with this when trying to call drawIndirect multiple times?
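
For reference, a small sketch of how such a u32x4 indirect buffer could be filled on the CPU. The `DrawArgs` type and `packDrawArgs` helper are illustrative names, not part of WebGPU; the field order (vertexCount, instanceCount, firstVertex, firstInstance) follows the layout that drawIndirect reads.

```typescript
// One non-indexed indirect draw, in the order drawIndirect reads it.
interface DrawArgs {
  vertexCount: number;
  instanceCount: number;
  firstVertex: number;
  firstInstance: number;
}

// Hypothetical helper: tightly packs draws as 4 x u32 (16 bytes) each.
function packDrawArgs(draws: readonly DrawArgs[]): Uint32Array {
  const packed = new Uint32Array(draws.length * 4);
  draws.forEach((d, i) => {
    packed.set(
      [d.vertexCount, d.instanceCount, d.firstVertex, d.firstInstance],
      i * 4,
    );
  });
  return packed;
}

const packedDraws = packDrawArgs([
  { vertexCount: 3, instanceCount: 1, firstVertex: 0, firstInstance: 0 },
  { vertexCount: 6, instanceCount: 10, firstVertex: 3, firstInstance: 1 },
]);
// packedDraws.byteLength is 32: two commands of 16 bytes each.
```

The resulting array could then be uploaded with queue.writeBuffer into a buffer created with GPUBufferUsage.INDIRECT, with draw i starting at byte offset i * 16. Without a drawCount parameter, each of those offsets still needs its own drawIndirect call.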

@JesseRMeyer

JesseRMeyer commented Nov 8, 2023

Recording integers to an array is much faster than making a draw call. Allowing web workers (or even later -- compute) to fill in these parameters to a shared array opens up many optimization opportunities and possibly power efficiency gains. I strongly favor seeing a count parameter added to drawIndirect, even if for now the count value must be supplied at call time and not found in some other buffer (say, GPU local). Otherwise, as is, this functionality is no better than supplying the parameters directly, and possibly worse. I see this is being explored in even richer territory (multidrawindirect) here: #2315 which I find promising!

@lcodes

lcodes commented Mar 17, 2024

Is there any news on this feature?

Even behind a feature flag at device creation time, it would be valuable to support this.

For many domains, e.g. large game worlds or CAD applications, it makes no sense to run them on mobile or older hardware anyway; the target audience already has support for it, but right now it's hidden away from wgpu while available everywhere else.

@chingham

Still not implemented ? Even as an extension, it really is a must have and is the main reason we would use WebGPU instead of WebGL

@ringoz

ringoz commented Mar 22, 2024

we need at least MultiDrawIndirect for GPU-driven rendering.
it is possible to emulate bindless resources, but impossible to efficiently emulate MDI.

@saydric

saydric commented Mar 25, 2024

I totally agree, MultiDrawIndirect + Bindless textures are a must-have if we want WebGPU to perform as well as a desktop app on scenes with a large number of meshes (e.g. CAD data)

@cdiggins

We need this feature in the AEC (Architecture Engineering and Construction) industry to enable rendering of medium to large-size architectural models (aka BIM for Building Information Models) on browsers. Right now, only small models can be rendered with reasonable FPS.

@logankaser

Bindless textures can be emulated with array textures, but MD / MDI can't

@kainino0x kainino0x removed this from the Milestone 3+ milestone Mar 25, 2024
@kainino0x kainino0x added this to the Milestone 1 milestone Mar 25, 2024
@kainino0x
Contributor

We clearly need to reevaluate the priority here so moving to M1 for us to at least re-triage.

@kainino0x
Contributor

kainino0x commented Mar 26, 2024

Oops, just realized I had a duplicated bug in #4349. Let me move this here:


Originally posted by @mrshannon in #1949

Indirect drawing with multiple indirect drawing commands is a common technique for drawing complex scenes that would otherwise be infeasible due to either an excessive number of CPU issued draw calls or scene complexity that cannot be built by the CPU alone. This is done by:

  • Executing multiple draws with a single API call.
  • Allowing the GPU to generate both geometry and the draws necessary to render it.
  • Culling out unnecessary draw calls on the GPU in more complex scenes than CPU culling could achieve.

This PR addresses adding a multi-draw-indirect feature. In particular it addresses adding:

  • multiDrawIndirect and multiDrawIndexedIndirect methods on GPURenderEncoderBase.
    • Allows submitting multiple draws with a single API call (multi-draw).
    • Allows the GPU to determine the number of draw calls (draw count).
    • Use cases:
      • GPU derived scene data
      • GPU based culling
      • GPU based LOD
      • Efficient execution of complex scenes with a large number of draws
  • Non-zero firstInstance for drawIndirect, drawIndexedIndirect, multiDrawIndirect, and multiDrawIndexedIndirect.
    • This is the only available per draw input, without rebinds, that is readable in the shader.
    • Use cases:
      • Select instance stride vertex data
      • Index into per object or per draw data in storage buffers
      • Multi material, single API call, rendering
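
To make the firstInstance use cases above more concrete, here is one possible way to lay out per-instance data so that each draw's firstInstance points at its own slice of a shared storage buffer. The helper `assignFirstInstances` is hypothetical; it simply computes a running sum of instance counts, relying on the shader's instance index starting at firstInstance for each draw.

```typescript
// Hypothetical helper: given the instance count of each draw, returns the
// firstInstance value for each draw so that draw i owns the index range
// [firstInstance[i], firstInstance[i] + instanceCounts[i]).
function assignFirstInstances(instanceCounts: readonly number[]): number[] {
  const firstInstances: number[] = [];
  let next = 0;
  for (const count of instanceCounts) {
    firstInstances.push(next);
    next += count;
  }
  return firstInstances;
}

const firstInstanceBases = assignFirstInstances([2, 5, 1]);
// firstInstanceBases is [0, 2, 7]
```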

Compatibility

The required backend features to implement multi-draw-indirect are available on:

  • Newer Apple devices (~2016+)
  • All DX12 devices
  • All Vulkan capable desktops (with up to date drivers)
  • 30% of Android devices

See the sections below for details.

Vulkan

Multi-Draw

Requires the 0 or 1 restriction on the drawCount argument of vkCmdDrawIndirect and vkCmdDrawIndexedIndirect to be relaxed to any non-negative integer. This requires the multiDrawIndirect feature which is supported on:

  • 99% of desktop GPUs
  • 63% of Android devices

NOTE: The stride argument will always be set for tight packing, in order to maintain compatibility with DX12.

Draw Count

Requires the vkCmdDrawIndirectCount and vkCmdDrawIndexedIndirectCount functions which are provided by either the drawIndirectCount feature of Vulkan 1.2 or one of the following extensions:

  • VK_AMD_draw_indirect_count
  • VK_KHR_draw_indirect_count

Because drawIndirectCount was introduced in driver updates, the statistics at https://vulkan.gpuinfo.org cannot be relied upon. The following is based on the oldest card that supports drawIndirectCount from each manufacturer; if newer cards dropped support for drawIndirectCount, that is not captured here.

  • Intel integrated cards (that support Vulkan) support drawIndirectCount.
  • NVIDIA cards going back to Kepler support drawIndirectCount.
  • AMD cards going back to the HD 8000 series support drawIndirectCount.

For Android:

  • drawIndirectCount is supported on 100% of devices that support Vulkan 1.2.
  • drawIndirectCount is supported, as an extension, on 28% of devices that do not support Vulkan 1.2.

Non-zero firstInstance

Requires the firstInstance property of the VkDrawIndirectCommand and VkDrawIndexedIndirectCommand to be non-zero. This requires the drawIndirectFirstInstance feature which is supported on:

  • 99% of desktop GPUs
  • 64% of Android devices

DX12

All required features are core to DX12.

Multi-Draw

Uses ExecuteIndirect where the MaxCommandCount argument is greater than 1 and the pArgumentBuffer argument points to a GPU buffer containing an array of D3D12_DRAW_ARGUMENTS or D3D12_DRAW_INDEXED_ARGUMENTS.

NOTE: The binary layout of these structs is compatible with Vulkan.

Draw Count

Uses ExecuteIndirect where the pCountBuffer argument is not NULL.

Non-zero firstInstance

This is the StartInstanceLocation of the D3D12_DRAW_ARGUMENTS or D3D12_DRAW_INDEXED_ARGUMENTS structures. Has native support for values greater than 0.

Metal

Multi-Draw

Can be emulated with Indirect Command Buffers (ICBs) and an extra compute shader invocation to translate from the Vulkan-like indirect draw buffer to an ICB.

Requires

  • iOS 12.0+
  • macOS 10.14+
  • MTLGPUFamilyMac2

Non-zero firstInstance

Natively supported with the baseInstance argument.

Draw Count

Don't record commands past this count in the ICB, and use optimizeIndirectCommandBuffer.

Requires

  • iOS 12.0+
  • macOS 10.14+
  • MTLGPUFamilyMac2

@kainino0x
Contributor

See #4349 for gpuinfo-vulkan-query results and Nov 2023 meeting minutes also.

@alecazam

alecazam commented Mar 26, 2024

Just to be clear MDI isn't universal. It's available on Vulkan desktop and that's it. On Android, Mali Vulkan is limited to 1 draw from the buffer per draw call. And much of the older Android vulkan hw. And on Metal, it's the same. For some reason, Apple decided to hobble the MDI API on A9 and encourage use of a different more complex API that has range support, but is far more challenging to encode on the gpu. That encoding support didn't happen until A11. There's almost no universal support for the indirect count that is also needed.

I still think there's value in MDI + count 1 though. Since the content is in a buffer, the GPU can generate it, and can zero or mod the instance count in the MDI buffer structs. But it means that the cpu has to specify a higher count, and then draw zero count MDI structs. Then A13 is needed for descriptor indexing for material indirection. This is way simpler to use than Apple's current indirect draw arguments.

@JesseRMeyer

JesseRMeyer commented Mar 26, 2024

There are two concerns at play: the user API and the underlying implementation. For a user, specifying an array of draw parameters is already a win from an application design perspective, as future-proofing, even if the underlying implementation has to run under emulation, and it's probably no worse in that case than the user doing essentially the same work of issuing a draw per array item in a loop anyway. The question seems to be related to performance expectations (emulated vs accelerated). But again, hardware that doesn't natively accelerate MDI would be fed draw calls via a loop of some kind, whether from a user application or the graphics library. So offering the API generally seems like a net win to me all around.

@lcodes

lcodes commented Mar 26, 2024

Behind a feature flag passed at device creation time, this would be perfect.

If it's there, use it; if not, fall back to single draws in a loop.
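
A sketch of what that fallback could look like from the application side. Everything specific here is an assumption: the feature name `"multi-draw-indirect"` and the `multiDrawIndirect` method are hypothetical (the thread does not settle on either), and `FeatureSet` and `MultiDrawPass` are stand-in interfaces so the example is self-contained. Only `drawIndirect(indirectBuffer, indirectOffset)` and a set-like `features.has(name)` correspond to WebGPU as it exists today.

```typescript
// Hypothetical feature name; not taken from any specification.
const MULTI_DRAW_FEATURE = "multi-draw-indirect";

// Stand-in for a set-like feature list such as device.features.
interface FeatureSet {
  has(name: string): boolean;
}

interface MultiDrawPass<B> {
  drawIndirect(indirectBuffer: B, indirectOffset: number): void;
  // Hypothetical method, present only when the feature is enabled.
  multiDrawIndirect?(
    indirectBuffer: B,
    indirectOffset: number,
    maxDrawCount: number,
  ): void;
}

// Uses the multi-draw path when available, otherwise loops over
// single indirect draws of 16 bytes (4 x u32) each.
function drawMany<B>(
  features: FeatureSet,
  pass: MultiDrawPass<B>,
  buffer: B,
  offset: number,
  count: number,
): "multi" | "loop" {
  if (features.has(MULTI_DRAW_FEATURE) && pass.multiDrawIndirect) {
    pass.multiDrawIndirect(buffer, offset, count);
    return "multi";
  }
  for (let i = 0; i < count; i++) {
    pass.drawIndirect(buffer, offset + i * 16);
  }
  return "loop";
}
```

Keeping both paths behind one helper means the rest of the renderer does not need to know which one was taken.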

I think it's more valuable to offer advanced APIs as optional features, than not offering them at all.

@alecazam

There is no emulation of a GPU driven count with MDI. There is the fixed count from the cpu, and the potential for a dynamic count from the GPU.

@JesseRMeyer

The implementation could issue a hard gpu -> cpu sync in that case but I think most (including myself) would prefer an error instead. Presumably there could be MultiDrawCount and MultiDrawIndirectCount, where the latter is restricted behind a feature flag.

@mrshannon
Contributor

There is no emulation of a GPU driven count with MDI. There is the fixed count from the cpu, and the potential for a dynamic count from the GPU.

Actually, you can use a compute shader to emulate this fairly easily by zeroing out the instanceCount of the draws after the draw count in the GPU-side buffer. If you have hardware-backed MDI but not draw count, this is not too bad a performance hit. The problem is that if you also have to emulate multi-draw itself by running a loop, you have to issue many draw calls that may not do anything. So it's always possible to emulate this for correctness; the problem is that the performance difference between full emulation and no emulation (or draw count emulation) is large. Therefore, it's probably better that this emulation be left to the developer so they can try to do better than naive emulation.

@logankaser

Therefore, it's probably better that this emulation be left to the developer so they can try to do better than naive emulation.

Worth pointing out that for many years (I'm not familiar with the current state of the codebase), multidraw in Chrome did just loop over the input. This did actually still help a lot, as it reduced JS VM trampolining and allowed some validation to be hoisted out of the loop. Just my two cents.

@alecazam

alecazam commented Mar 27, 2024

That doesn't work. This isn't cpu driven. The MDI data must reside in a buffer, then compute or vertex shader can modify the values, and then the draw needs to increment 1 buffer slot on iOS/Mali/older Android to the cpu count, or by multidraw indirect set by the gpu into a buffer. The idea of offering two apis with multidraw count indirect optional is probably best. iOS and older Android just wouldn't offer that, but there are Apple devs on this thread. Also the iOS MDI API can't have additional data (no stride), but Vulkan can.

@lcodes

lcodes commented Mar 27, 2024

dispatchCount and indirectCount aren't always easy to emulate, or possible.

There's a hack I'm using by doing a mod/div over the instanceIndex or globalInvocationId, but when the counts differ from batch to batch it's no longer possible.

For example, doing GPU culling means a series of indirect draws each using a different instanceCount and startInstance for each draw.
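
One plausible reading of the mod/div hack mentioned above, shown as plain arithmetic: when every batch has the same number of instances, several draws can be merged into one instanced draw and the shader can recover the batch and the local instance from the flat index. The function `decodeInstance` is a hypothetical illustration of that arithmetic, not code from this thread.

```typescript
// Hypothetical helper mirroring what a shader would compute from a flat
// instance index when all batches share the same instance count.
function decodeInstance(
  instanceIndex: number,
  instancesPerBatch: number,
): { batch: number; local: number } {
  return {
    batch: Math.floor(instanceIndex / instancesPerBatch),
    local: instanceIndex % instancesPerBatch,
  };
}

const decoded = decodeInstance(7, 3);
// decoded.batch is 2 and decoded.local is 1
```

This only works because instancesPerBatch is a single constant; once each batch has its own instanceCount, there is no single divisor to use.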

Doing a naive loop over drawIndirect on the CPU right now, but also issuing 1500 draw calls instead of 1, most of which are zero'd out. It's 1ms of javascript time being wasted.

Having optional support behind a feature flag means users whose hardware supports it can handle exponentially larger scenes, while those without support see reduced simulation limits but can still run it.

I think the need for all target hardware to support a given feature is a downgrade, because it forces high-end machines to the level of the lowest common denominator. More feature flags also make WebGPU more attractive to developers with higher-end hardware in mind.

@Dampfwalze

I want to note that wgpu does already implement this draw method. It is behind the MULTI_DRAW_INDIRECT feature, which introduces multi_draw_indirect and multi_draw_indexed_indirect. It is implemented for Vulkan and DX12 and emulated on Metal using draw_indirect. But I suspect this support could be extended.

On top of that, it also has the MULTI_DRAW_INDIRECT_COUNT feature, which allows you to specify the count via a GPU buffer.

@lcodes

lcodes commented Mar 28, 2024

Oh it has push constants too, nice.

Let's get these on the web too :)

@alecazam

With web you get LCD. Push constants are problematic in other ways. Some support fp16 and some don't (AMD). Maybe WebGPU doesn't even offer fp16 support.

@mrshannon
Contributor

mrshannon commented Apr 3, 2024

Push constants are problematic in other ways. Some support fp16 and some don't (AMD). Maybe WebGPU doesn't even offer fp16 support.

I don't need f16 push constants. If I can write u32 push constants, then I can encode/decode anything I need.

@Kangz
Contributor

Kangz commented Apr 9, 2024

GPU Web WG 2024-03-27
  • KN: accidentally filed another issue about this. Last time we discussed was last Nov, 2023. Milestone prioritization? An optional feature. If we put it in the spec, don't need to implement immediately. But it's complicated.
  • KG: Moz would prefer for M2. But we don't have regular indirect support yet - don't want to build on it before we have impl feedback.
  • KN: suspect wgpu already has this.
  • KG: not the validation.
  • JB: can't enable in Firefox because we need to add that validation pass.
  • KN: pretty big thing - our team isn't that excited about implementing this either, at least as much as users want it. No problem with M2.

@kainino0x kainino0x added the api WebGPU API label Apr 30, 2024
@ivanpopelyshev

I'm on the way to supporting WebGPU on tesera.io. I already use the draft multidraw_bvbi WebGL functionality with a fallback on vertexAttribPointer, and since then I've been waiting for WebGPU multidraw too.

God bless whoever implements this!

@kainino0x kainino0x linked a pull request Jun 12, 2024 that will close this issue