deps/media-playback and plugins/win-dshow: Prioritize CUDA for media source decode #10607
Conversation
Prior discussion occurred here:
Wow, that's a seriously huge performance gap between CUDA and DX11, thanks for collecting those test results. If you feel like testing these changes against some other codecs and making sure that no adverse behavior is observed, that would probably help a lot. I've added the seeking testers label to this PR so that the CI will build binary artifacts if anyone else would like to experiment.
No problem! Yeah, I'll test with AV1 and H.264. H.264 won't support 8K, but I'll test it at 4K.
H.264 at 4K gives me about 40% 3D usage in Task Manager with D3D11VA and 18% with CUDA. AV1 does not use the hardware decoder at all with either CUDA or D3D11VA. It theoretically should use the CUDA hardware decoder for AV1 (D3D11VA doesn't support AV1 as far as I know), so that selection is probably made in a different area of the code.
If there's no visual distortion then I'd say go for it. That was the reason it was originally reverted.
I've tested this on my system (Intel 12900K, RTX 4060, Windows 10) and can verify that CUDA does result in a performance improvement, although I did not reproduce the gains reported by @moocowsheep. NVIDIA GPUs use a very dynamic clock rate, so the only way to properly test the difference is to simulate a baseline load that pushes the GPU to its maximum clocks. In this case, I used CS2 in windowed mode on the home screen to load the GPU, which resulted in clocks going to maximum and a 35% 3D load with OBS also open and no videos playing. When enabling a media source with an 8K video file, I observed 20-22% decode usage in both cases (D3D11 and CUDA). For 3D, I observed ~60% total usage while decoding with D3D11 and ~50% total usage while decoding with CUDA.
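The baseline method in the comment above amounts to subtracting the idle 3D load from each total. A minimal Python sketch of that arithmetic, using the approximate figures quoted in the comment; the names `BASELINE_3D` and `decode_overhead` are hypothetical and only illustrate the calculation:

```python
# Approximate figures from the comment above (percent 3D usage).
BASELINE_3D = 35  # CS2 home screen + OBS open, no video playing


def decode_overhead(total_3d, baseline=BASELINE_3D):
    """3D usage attributable to media playback, in percentage points."""
    return total_3d - baseline


d3d11_overhead = decode_overhead(60)  # 25 points on top of the baseline
cuda_overhead = decode_overhead(50)   # 15 points on top of the baseline

# Relative reduction in the added 3D load when switching to CUDA (about 0.4).
reduction = 1 - cuda_overhead / d3d11_overhead
```

Because the inputs are rounded readings ("~60%", "~50%"), the resulting reduction of roughly 40% in added 3D load is an estimate, not a precise measurement.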
Discussed off-thread with @tt2468. This is fine to merge as long as we keep an eye out for reports of issues. If we receive user reports of issues that are root-caused to this change, we may have to revert it with context.
Description
8K60 files would not play back at full rate in the media source on Windows with Nvidia cards, because D3D11VA and DXVA2 could not decode fast enough. This pull request puts the CUDA (really, NVDEC) decoder at the highest priority to enable full-speed decode of high-resolution, high-frame-rate files. This does NOT use any additional CUDA cores; it only leverages the NVDEC block on the Nvidia GPU.
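The effect of the reordering can be sketched as a priority-ordered lookup. This is a minimal Python illustration, not the OBS source: the device names mirror FFmpeg hardware device type names, but the `HW_PRIORITY` list, the order of the entries after CUDA, and the `pick_hw_decoder` helper are assumptions made for the example.

```python
# Hypothetical priority list: CUDA (NVDEC) first, then the Windows
# decoders named in the description. The order after "cuda" is assumed.
HW_PRIORITY = ["cuda", "d3d11va", "dxva2"]


def pick_hw_decoder(supported, priority=HW_PRIORITY):
    """Return the first device type in `priority` that appears in
    `supported`, or None to signal a fallback to software decode."""
    for device in priority:
        if device in supported:
            return device
    return None
```

With this ordering, a machine that reports both `"cuda"` and `"d3d11va"` selects `"cuda"`, while a machine without an Nvidia GPU falls through to `"d3d11va"` as before.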
Motivation and Context
I have only been able to play back 8K HEVC video files at full speed on macOS; this change brings the Windows version to feature parity. I am looking to make this change because VR video playback and re-streaming require 8K decode in HEVC and possibly AV1.
How Has This Been Tested?
I tested the media source with HW Decode on 2 Windows machines, one with an Nvidia L4 and one with an Nvidia ADA 6000. This does not appear to affect any other code. Using the D3D11VA decoder, the media source plays the 8K60 file at approximately 20fps (1/3 speed). Using the CUDA decoder, the media source plays the file at 60fps (full speed).
I gathered detailed stats on the L4 machine through Task Manager. The CUDA decoder actually uses considerably fewer GPU resources than the D3D11VA decoder. 3D usage averaged between 86% and 90% with the D3D11VA decoder, and between 42% and 46% with the CUDA decoder. GPU memory usage was the same, and Decoder block usage was actually slightly lower with CUDA (11% vs 15%). There was no change in CPU or RAM usage.
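To put the figures above side by side, here is a small Python sketch of the arithmetic. It uses only the numbers reported in this section; taking the midpoint of each reported range is an assumption made for illustration, and the variable names are hypothetical.

```python
def midpoint(low, high):
    """Midpoint of a reported usage range, in percent."""
    return (low + high) / 2


d3d11va_3d = midpoint(86, 90)  # reported 86-90% 3D usage
cuda_3d = midpoint(42, 46)     # reported 42-46% 3D usage

# CUDA 3D usage relative to D3D11VA, based on the range midpoints.
usage_ratio = cuda_3d / d3d11va_3d

# Fraction of real-time speed reached with D3D11VA on the 8K60 file.
d3d11va_speed = 20 / 60
```

On these midpoints the CUDA path uses about half the 3D resources of the D3D11VA path while playing the file at three times the frame rate.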
This appears to fix this video playback issue with no impact on anything else within OBS.
Types of changes
Bug fix (non-breaking change which fixes an issue)
Performance enhancement (non-breaking change which improves efficiency)
Checklist: