Implement cuda/nvdec hwdec for vulkan backend #6170
Conversation
The way you have the code written now, your intermediate buffer will be allocated in host memory, not GPU memory. So you're actually doing a texture download followed by a texture upload. You could probably introduce a new RA_BUF_TYPE for this.
Thanks for catching that. I've added a new RA_BUF_TYPE and removed exportable from the ra API.
Force-pushed from a4294e2 to d2a416f.
This is now good to merge.
Added a further change to implement device matching between Vulkan and CUDA. This requires an additional method in the nv-codec-headers.
I've implemented a VkBuffer pool to avoid problems with interpolation. Everything I can test seems to work correctly at this stage.
Further update with dynamic pool allocation.
More or less okay. Some commits don't really make sense in isolation, and the commit names don't really conform to mpv style.
The vk_buf_priv thing is sort of annoying, but we've been bikeshedding the buffer pool enough considering it's going to be replaced once nvidia fixes their driver.
I've pushed updates that address all comments except the one about extension loading. I will work on that next.
I've now addressed all the outstanding comments, including one raised out of band by @BtbN that the tree wasn't building cleanly if one or the other backend was disabled. I can squash changes in various ways if you want; just let me know what you'd like me to do there.
Updated to address comments (except the one open question).
Simplified the GL vs Vulkan #ifdefs.
I see no more major issues with this, just bikeshedding / cosmetic / peace-of-mind items. So I guess it LGTM. I would suggest squashing the two commits related to external memory extensions together (9ae36c5 and 6528f07), as well as all of the commits with …
I have squashed all the cuda changes and left the vulkan changes in three logical parts (exportable memory, buffer user data, and the device UUID getter). Thanks!
Two more minor changes I found while re-viewing the squashed commits.
The CUDA/Vulkan interop works on the basis of memory being exported from Vulkan and then imported by CUDA. To enable this, we add a way to declare a buffer as being intended for export, and then add a function to do the export.

For now, we support the fd-based and Handle-based exports on Linux and Windows respectively. There are others, which we can support when a need arises.

Also note that this is just for exporting buffers, rather than textures (VkImages). Image import on the CUDA side is supposed to work, but it is currently buggy and waiting for a new driver release.

Finally, at least with my nvidia hardware and drivers, everything seems to work even if we don't initialise the buffer with the right exportability options. Nevertheless, I'm enforcing it so that we're following the spec.
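For illustration, here is a minimal C sketch of the export/import mechanism this commit builds on, using the raw VK_KHR_external_memory_fd and CUDA driver APIs rather than mpv's ra abstraction. The helper name and its parameters are hypothetical, the matching VkBuffer creation (with VkExternalMemoryBufferCreateInfo) is omitted, and there is no error handling.

```c
#include <vulkan/vulkan.h>
#include <cuda.h>

/* Sketch: allocate Vulkan device memory marked as exportable, export it as a
 * POSIX fd, and import that fd into CUDA as external memory. "dev", "size"
 * and "mem_type_idx" are assumed to come from the caller. */
static CUdeviceptr export_to_cuda(VkDevice dev, VkDeviceSize size,
                                  uint32_t mem_type_idx)
{
    /* The export handle type must be declared at allocation time. */
    VkExportMemoryAllocateInfo export_info = {
        .sType = VK_STRUCTURE_TYPE_EXPORT_MEMORY_ALLOCATE_INFO,
        .handleTypes = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT,
    };
    VkMemoryAllocateInfo alloc_info = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
        .pNext = &export_info,
        .allocationSize = size,
        .memoryTypeIndex = mem_type_idx,
    };
    VkDeviceMemory mem;
    vkAllocateMemory(dev, &alloc_info, NULL, &mem);

    /* Export the memory as a file descriptor (VK_KHR_external_memory_fd). */
    VkMemoryGetFdInfoKHR fd_info = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_GET_FD_INFO_KHR,
        .memory = mem,
        .handleType = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT,
    };
    PFN_vkGetMemoryFdKHR get_fd =
        (PFN_vkGetMemoryFdKHR)vkGetDeviceProcAddr(dev, "vkGetMemoryFdKHR");
    int fd = -1;
    get_fd(dev, &fd_info, &fd);

    /* Import the fd on the CUDA side and map it to a device pointer. CUDA
     * takes ownership of the fd on a successful import. */
    CUDA_EXTERNAL_MEMORY_HANDLE_DESC hdesc = {
        .type = CU_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD,
        .handle.fd = fd,
        .size = size,
    };
    CUexternalMemory ext_mem;
    cuImportExternalMemory(&ext_mem, &hdesc);

    CUDA_EXTERNAL_MEMORY_BUFFER_DESC bdesc = {
        .offset = 0,
        .size = size,
    };
    CUdeviceptr dptr = 0;
    cuExternalMemoryGetMappedBuffer(&dptr, ext_mem, &bdesc);
    return dptr;
}
```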
This is arguably a little contrived, but in the case of CUDA interop, we have to track additional state on the cuda side for each exported buffer. If we want to be able to manage buffers with an ra_buf_pool, we need some way to keep that CUDA state associated with each created buffer. The easiest way to do that is to attach it directly to the buffers.
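As a rough sketch of the idea (not mpv's actual ra/ra_buf definitions, all names illustrative), the buffer carries an opaque private pointer plus a destructor, so the interop code's per-buffer state is freed together with the buffer:

```c
#include <stdlib.h>

/* Hypothetical buffer type with owner-defined per-buffer state attached. */
struct example_buf {
    void *data;                       /* the buffer payload */
    void *user_priv;                  /* e.g. the CUDA interop state */
    void (*user_priv_free)(void *p);  /* called when the buffer is destroyed */
};

static void example_buf_destroy(struct example_buf *buf)
{
    /* Tear down the attached state before the buffer itself goes away. */
    if (buf->user_priv && buf->user_priv_free)
        buf->user_priv_free(buf->user_priv);
    free(buf);
}
```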
We need this to do device matching for the cuda interop.
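The matching itself can be done by comparing device UUIDs on both sides. A hedged sketch follows, assuming Vulkan 1.1 and direct CUDA driver API calls (the PR instead goes through the function table loaded from nv-codec-headers); cuInit() is assumed to have been called and error handling is omitted.

```c
#include <string.h>
#include <vulkan/vulkan.h>
#include <cuda.h>

/* Find the CUDA device backing the same GPU as the given Vulkan device by
 * comparing UUIDs. Returns 0 on success, -1 if nothing matches. */
static int find_matching_cuda_device(VkPhysicalDevice phys_dev, CUdevice *out)
{
    VkPhysicalDeviceIDProperties id_props = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_ID_PROPERTIES,
    };
    VkPhysicalDeviceProperties2 props2 = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2,
        .pNext = &id_props,
    };
    vkGetPhysicalDeviceProperties2(phys_dev, &props2);

    int count = 0;
    cuDeviceGetCount(&count);
    for (int i = 0; i < count; i++) {
        CUdevice dev;
        CUuuid uuid;
        cuDeviceGet(&dev, i);
        cuDeviceGetUuid(&uuid, dev);
        if (memcmp(uuid.bytes, id_props.deviceUUID, VK_UUID_SIZE) == 0) {
            *out = dev;
            return 0;
        }
    }
    return -1;
}
```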
Despite their place in the tree, hwdecs can be loaded and used just fine by the vulkan GPU backend. In this change we add Vulkan interop support to the cuda/nvdec hwdec.

The overall process is mostly straightforward; the main observation is that I had to implement it using an intermediate Vulkan buffer because direct VkImage usage is blocked by a bug in the nvidia driver. When that gets fixed, I will revisit this. Nevertheless, the intermediate buffer copy is very cheap as it's all device memory from start to finish. Overall CPU utilisation is pretty much the same as with the OpenGL GPU backend.

Note that we cannot use a single intermediate buffer - rather there is a pool of them. This is done because the cuda memcpys are not explicitly synchronised with the texture uploads. In the basic case, this doesn't matter because the hwdec is not asked to map and copy the next frame until after the previous one is rendered. In the interpolation case, we need extra future frames available immediately, so we'll be asked to map/copy those frames and vulkan will be asked to render them. So far, harmless right? No. All the vulkan rendering, including the upload steps, is batched together and ends up running very asynchronously from the CUDA copies. The end result is that all the copies happen one after another, and only then do the uploads happen, which means all the textures are uploaded with the same, final, frame data. Whoops. Unsurprisingly, this results in jerky motion because every 3/4 frames are identical.

The buffer pool ensures that we do not overwrite a buffer that is still waiting to be uploaded. The ra_buf_pool implementation automatically checks if existing buffers are available for use and only creates a new one if it really has to. It's hard to say for sure what the maximum number of buffers might be, but we believe it won't be so large as to make this strategy unusable. The highest I've seen is 12 when using interpolation with tscale=bicubic.

A future optimisation here is to synchronise the CUDA copies with respect to the vulkan uploads. This can be done with shared semaphores that would ensure the copy of the second frame only happens after the upload of the first frame, and so on. This isn't trivial to implement, as I'd have to first adjust the hwdec code to use asynchronous cuda; without that, there's no way to use the semaphore for synchronisation. This should result in fewer intermediate buffers being required.
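To make the pool behaviour concrete, here is an illustrative sketch (not mpv's actual ra_buf_pool): hand out the first buffer whose previous upload has finished, and only allocate a new buffer when every existing one is still in flight. The callbacks "in_use" and "create_buf" are placeholders for whatever mechanism the caller uses to track upload completion and create buffers.

```c
#include <stdbool.h>
#include <stdlib.h>

struct buf_pool {
    void **bufs;    /* stand-ins for the Vulkan buffers */
    int num_bufs;
    int index;      /* round-robin starting point */
};

static void *pool_get(struct buf_pool *pool,
                      bool (*in_use)(void *buf),
                      void *(*create_buf)(void))
{
    /* Try every existing buffer once, starting after the last one handed out. */
    for (int i = 0; i < pool->num_bufs; i++) {
        int idx = (pool->index + 1 + i) % pool->num_bufs;
        if (!in_use(pool->bufs[idx])) {
            pool->index = idx;
            return pool->bufs[idx];
        }
    }
    /* Every buffer is still pending upload: grow the pool by one. */
    pool->bufs = realloc(pool->bufs, (pool->num_bufs + 1) * sizeof(*pool->bufs));
    pool->index = pool->num_bufs;
    pool->bufs[pool->num_bufs++] = create_buf();
    return pool->bufs[pool->index];
}
```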
Fixed. Thanks.
With these patches applied, nvdec stops working for both OpenGL and Vulkan.
That log says that nvdec isn't compiled in. Did you update your nv-codec-headers?
Arch had an update for ffnvcodec-headers in the repos, works fine after installing that. |
This is a working implementation of nvdec acceleration with the vulkan backend, using the new CUDA 10 vulkan interop API. Due to sparse documentation, I haven't yet worked out how to use a VkImage directly in CUDA, so I've had to use an intermediate VkBuffer as the target for the copy from CUDA, which is then copied to the VkImage texture.
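As a rough sketch of that two-step copy (hypothetical helper and parameter names, no barriers or error handling): the CUDA side does a device-to-device 2D copy into the CUDA mapping of the intermediate buffer, and the Vulkan side then records a buffer-to-image copy into the destination texture.

```c
#include <vulkan/vulkan.h>
#include <cuda.h>

/* Copy one decoded plane via the intermediate buffer. The image is assumed
 * to already be in TRANSFER_DST_OPTIMAL layout. */
static void copy_plane(CUdeviceptr decoded_plane, size_t decoded_pitch,
                       CUdeviceptr buf_dptr, size_t buf_pitch,
                       CUstream stream,
                       VkCommandBuffer cmd, VkBuffer buf, VkImage image,
                       uint32_t width, uint32_t height, size_t bytes_per_pixel)
{
    /* Step 1: device-to-device 2D copy from the decoder's frame into the
     * CUDA mapping of the intermediate Vulkan buffer. */
    CUDA_MEMCPY2D cpy = {
        .srcMemoryType = CU_MEMORYTYPE_DEVICE,
        .srcDevice = decoded_plane,
        .srcPitch = decoded_pitch,
        .dstMemoryType = CU_MEMORYTYPE_DEVICE,
        .dstDevice = buf_dptr,
        .dstPitch = buf_pitch,
        .WidthInBytes = width * bytes_per_pixel,
        .Height = height,
    };
    cuMemcpy2DAsync(&cpy, stream);

    /* Step 2: record the buffer -> image copy on the Vulkan side. */
    VkBufferImageCopy region = {
        .bufferOffset = 0,
        .bufferRowLength = (uint32_t)(buf_pitch / bytes_per_pixel),
        .imageSubresource = {
            .aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
            .layerCount = 1,
        },
        .imageExtent = { width, height, 1 },
    };
    vkCmdCopyBufferToImage(cmd, buf, image,
                           VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &region);
}
```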
The implementation only works on Linux, as the Windows memory export mechanism is slightly different and I can't test it.
Finally, I did not attempt to do any semaphore-based synchronisation between cuda and vulkan. The interop API supports exporting a vulkan semaphore and using it from CUDA. One can imagine this would be necessary to ensure the CUDA copy is correctly synchronised. I can believe that the existing barrier logic on the buffer -> image copy on the vulkan side is sufficient; certainly there is no visual indication of a synchronisation problem.
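For reference, a hedged sketch of what that semaphore path could look like, using VK_KHR_external_semaphore_fd and the CUDA external semaphore API. The helper names are hypothetical, the semaphore is assumed to have been created with VkExportSemaphoreCreateInfo set to the opaque-fd handle type, the Vulkan-side wait (in the queue submission) is not shown, and error handling is omitted.

```c
#include <vulkan/vulkan.h>
#include <cuda.h>

/* Export a Vulkan semaphore as an fd and import it into CUDA. */
static CUexternalSemaphore import_vk_semaphore(VkDevice dev, VkSemaphore sem)
{
    VkSemaphoreGetFdInfoKHR fd_info = {
        .sType = VK_STRUCTURE_TYPE_SEMAPHORE_GET_FD_INFO_KHR,
        .semaphore = sem,
        .handleType = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT,
    };
    PFN_vkGetSemaphoreFdKHR get_fd = (PFN_vkGetSemaphoreFdKHR)
        vkGetDeviceProcAddr(dev, "vkGetSemaphoreFdKHR");
    int fd = -1;
    get_fd(dev, &fd_info, &fd);

    CUDA_EXTERNAL_SEMAPHORE_HANDLE_DESC desc = {
        .type = CU_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD,
        .handle.fd = fd,
    };
    CUexternalSemaphore ext_sem;
    cuImportExternalSemaphore(&ext_sem, &desc);
    return ext_sem;
}

/* After queuing the cuMemcpy on "stream", signal the semaphore so the
 * subsequent Vulkan buffer -> image copy only starts once the data is there. */
static void signal_after_copy(CUexternalSemaphore ext_sem, CUstream stream)
{
    CUDA_EXTERNAL_SEMAPHORE_SIGNAL_PARAMS params = {0};
    cuSignalExternalSemaphoresAsync(&ext_sem, &params, 1, stream);
}
```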