deps/media-playback and plugins/win-dshow: Prioritize CUDA for media source decode #10607

Merged
merged 2 commits into from May 12, 2024

moocowsheep
Contributor

Description

8K60 files would not play back at full rate in the media source on Windows with Nvidia cards, because D3D11VA and DXVA2 cannot decode fast enough. This pull request gives the CUDA decoder (really NVDEC) the highest priority, so that high-resolution, high-frame-rate files decode at full speed. This does NOT use any additional CUDA cores; it only uses the NVDEC block on the Nvidia GPU.
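As a rough illustration of what "highest priority" means here, the sketch below picks the first available decoder from an ordered list. It is not the OBS implementation: the names `PRIORITY_BEFORE`, `PRIORITY_AFTER`, and `pick_decoder` are hypothetical, the string labels only mirror the decoder names mentioned in this PR, and the "before" ordering is an assumption for the sake of the example.

```python
from typing import Iterable, Optional, Sequence

# Hypothetical labels and orderings, for illustration only. They are not the
# identifiers or the full list used in the OBS source.
PRIORITY_BEFORE = ("d3d11va", "dxva2")
PRIORITY_AFTER = ("cuda", "d3d11va", "dxva2")


def pick_decoder(priority: Sequence[str],
                 available: Iterable[str]) -> Optional[str]:
    """Return the first decoder in priority order that is available."""
    usable = set(available)
    for name in priority:
        if name in usable:
            return name
    # Nothing matched; a caller would fall back to software decoding.
    return None


if __name__ == "__main__":
    nvidia_machine = {"cuda", "d3d11va", "dxva2"}
    other_machine = {"d3d11va", "dxva2"}
    print(pick_decoder(PRIORITY_BEFORE, nvidia_machine))  # d3d11va
    print(pick_decoder(PRIORITY_AFTER, nvidia_machine))   # cuda
    print(pick_decoder(PRIORITY_AFTER, other_machine))    # d3d11va
```

With this kind of ordering, machines without an Nvidia GPU keep selecting the same decoder as before, which matches the claim that the change should not affect other configurations.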

Motivation and Context

I have only been able to play back 8K HEVC video files at full speed on macOS; this change brings the Windows version to feature parity. I am making this change because VR video playback and re-streaming require 8K decode in HEVC and possibly AV1.

How Has This Been Tested?

I tested the media source with HW Decode on 2 Windows machines, one with an Nvidia L4 and one with an Nvidia ADA 6000. This does not appear to affect any other code. Using the D3D11VA decoder, the media source playback plays the 8K60 file at approximately 20fps (1/3 speed). Using the CUDA decoder, the media source playback plays the file at 60fps (full speed).
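To put the 20 fps figure in context, the short calculation below converts the reported frame rates into a playback-speed ratio and a per-frame time budget. The helper names `playback_speed` and `frame_budget_ms` are hypothetical, and the only inputs are the frame rates quoted above.

```python
def playback_speed(observed_fps: float, source_fps: float) -> float:
    """Fraction of real-time speed achieved during playback."""
    return observed_fps / source_fps


def frame_budget_ms(fps: float) -> float:
    """Time available (or taken) per frame, in milliseconds, at a given rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    # D3D11VA result reported above: about 20 fps on a 60 fps file.
    print(f"{playback_speed(20, 60):.2f}")   # 0.33
    # A 60 fps file leaves about 16.67 ms to decode each frame.
    print(f"{frame_budget_ms(60):.2f}")      # 16.67
    # At 20 fps, each frame is effectively taking about 50 ms.
    print(f"{frame_budget_ms(20):.2f}")      # 50.00
```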

I gathered detailed stats on the L4 machine through Task Manager. The CUDA decoder actually uses considerably fewer GPU resources than the D3D11VA decoder. 3D usage averaged between 86% and 90% with the D3D11VA decoder, and between 42% and 46% with the CUDA decoder. GPU memory usage was the same, and decoder block usage was slightly lower with CUDA (11% vs 15%). There was no change in CPU or RAM usage.

This appears to fix this video playback issue with no impact on anything else within OBS.

Types of changes

Bug fix (non-breaking change which fixes an issue)

Performance enhancement (non-breaking change which improves efficiency)

Checklist:

  • My code has been run through clang-format.
  • I have read the contributing document.
  • My code is not on the master branch.
  • The code has been tested.
  • All commit messages are properly formatted and commits squashed where appropriate.
  • I have included updates to all appropriate documentation.

@RytoEX
Member

RytoEX commented Apr 26, 2024

@tt2468 added the Enhancement, Seeking Testers, and Windows labels Apr 26, 2024
@tt2468
Member

tt2468 commented Apr 26, 2024

Wow, that's a seriously huge performance gap between CUDA and DX11, thanks for collecting those test results. If you feel like testing these changes against some other codecs and making sure that no adverse behavior is observed, that would probably help a lot. I've added the seeking testers label to this PR so that the CI will build binary artifacts if anyone else would like to experiment.

@moocowsheep
Contributor Author

moocowsheep commented Apr 26, 2024

No problem! Yeah, I'll test with AV1 and H.264. H.264 won't support 8K, but I'll test it with 4K.

@moocowsheep
Contributor Author

H.264 at 4K gives me about 40% 3D usage in Task Manager with D3D11VA and 18% with CUDA. AV1 does not use the hardware decoder at all with either CUDA or D3D11VA. In theory it should use the CUDA hardware decoder for AV1 (D3D11VA doesn't support AV1 as far as I know), so that selection is probably made in a different area of the code.

@RytoEX
Member

RytoEX commented May 4, 2024

Just FYI, CUDA was originally added here:
a3fface

It was removed here:
8657293

@Lain-B
Collaborator

Lain-B commented May 4, 2024

If there's no visual distortion then I'd say go for it. That was the reason it was originally reverted.

@tt2468
Member

tt2468 commented May 12, 2024

I've tested this on my system (Intel 12900K, RTX 4060, Windows 10) and can verify that CUDA does result in a performance improvement, although I did not reproduce the gains reported by @moocowsheep.

NVIDIA GPUs use a very dynamic clock rate, so the only way to properly test the difference is to simulate a baseline load that pushes the GPU to max clocks. In this case, I used CS2 in windowed mode on the home screen to apply a load on the GPU, which resulted in clocks going to maximum and a 35% 3D load with OBS also open and no videos playing.

When enabling a media source with an 8K video file, I observed 20-22% decode usage in both cases (D3D11/CUDA). For 3D, I observed ~60% total usage while decoding with D3D11, and ~50% total usage while decoding with CUDA.
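One way to read these numbers is to subtract the 35% baseline from each total to estimate the 3D load attributable to media playback. The sketch below does that arithmetic; the helper name `decode_attributable` is hypothetical, and it assumes the baseline and playback loads add linearly, which is a simplification.

```python
def decode_attributable(total_pct: float, baseline_pct: float) -> float:
    """Estimate the 3D usage added by playback, assuming loads add linearly."""
    return total_pct - baseline_pct


if __name__ == "__main__":
    baseline = 35.0  # CS2 plus OBS idle, as reported above

    d3d11 = decode_attributable(60.0, baseline)
    cuda = decode_attributable(50.0, baseline)

    print(d3d11)  # 25.0
    print(cuda)   # 15.0

    # Relative reduction in playback-attributable 3D usage under this model.
    print(f"{(d3d11 - cuda) / d3d11:.0%}")  # 40%
```

Under that assumption, CUDA still lowers the playback-related 3D load noticeably on this system, though the absolute gap is smaller than in the L4 measurements.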

@tt2468 tt2468 merged commit ce4c99b into obsproject:master May 12, 2024
14 checks passed
@RytoEX RytoEX added this to the OBS Studio (Next Version) milestone May 12, 2024
@RytoEX
Member

RytoEX commented May 12, 2024

Discussed off-thread with @tt2468. This is fine to merge as long as we keep an eye out for reports of issues. If we receive user reports of issues that are root-caused to this change, we may have to revert it with context.

@moocowsheep moocowsheep deleted the prioritize-cuda-playback branch May 12, 2024 18:32
Labels
Enhancement, Seeking Testers, Windows
4 participants