
Add support for hardware video decoding #4839

Merged (40 commits) on Oct 29, 2021

Conversation

@Opelkuh (Contributor) commented Oct 24, 2021

Closes #4079

Performance gains are extremely variable between different machines but in general they're smaller than I expected.

On Windows / Linux

On high-bitrate / high-resolution videos it's definitely noticeable: HW decoders have no problem with these, while the SW one sometimes can't maintain 1x speed. Here's an example of 4K 60fps video playback:

vid.mp4

The SW one is slightly laggy while NVDEC runs at 2x speed without a problem (slowed down for the comparison). This is on a GTX 1060 6GB, so nothing cutting edge.

However, with lower bitrate videos I was getting pretty much the same CPU usage, while some HW decoders used more RAM (especially NVDEC).

On Android

I was testing playback of the default test video that's in visual tests on an old Samsung Galaxy S6 Edge. Decoding with MediaCodec used ~3% less total CPU and RAM usage was pretty much the same. Not great, not terrible.


Based on this testing, I decided not to use HW decoders by default on desktop platforms, but to enable them by default on Android and iOS.

I don't have the hardware to test every HW decoder, but the list below contains the ones that I could test so far. If anyone has access to machines that support any of these, please share whether it works for you (and ideally how well or badly)!

Windows

  • NVDEC
  • Intel Quick Sync Video
  • DirectX Video Acceleration 2

Linux

  • VDPAU
  • VA-API

Android

  • MediaCodec

macOS

  • VideoToolbox

iOS

  • VideoToolbox

HW decoder selection

Automatic HW decoder selection is implemented in the simplest way possible: get the available decoders, then try them one by one until something sticks. This has the added benefit that it will automatically fall back to SW decoding if everything else fails.
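
As a rough illustration of that loop (a minimal sketch using FFmpeg.AutoGen, not the PR's actual code; the candidateDeviceTypes input and the probing-only helper are assumptions):

using System.Collections.Generic;
using FFmpeg.AutoGen;

// Try each reported hardware device type in order; if none can be
// initialised, fall back to AV_HWDEVICE_TYPE_NONE, i.e. software decoding.
private static unsafe AVHWDeviceType selectDeviceType(IEnumerable<AVHWDeviceType> candidateDeviceTypes)
{
    foreach (var type in candidateDeviceTypes)
    {
        AVBufferRef* deviceContext = null;

        // av_hwdevice_ctx_create returns a negative error code on failure.
        if (ffmpeg.av_hwdevice_ctx_create(&deviceContext, type, null, null, 0) < 0)
            continue;

        // The real decoder would keep this reference; here it was only probed.
        ffmpeg.av_buffer_unref(&deviceContext);
        return type;
    }

    return AVHWDeviceType.AV_HWDEVICE_TYPE_NONE; // software fallback
}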

Possible performance improvements

AFAIK all hardware decoders return frames in NV12 pixel format but the current rendering is set up to consume YUV420P. This means that all HW decoded frames have to go through a format conversion on the CPU. From my testing, this adds roughly 1% total CPU usage on desktop and ~5% on Android. It also adds a bit of RAM usage.

I plan to address this in a separate PR, without affecting the SW decoding path, but here's a quick and dirty patch if you'd like to try this change now:

Patch
Index: osu.Framework/Graphics/Video/VideoSpriteDrawNode.cs
===================================================================
diff --git a/osu.Framework/Graphics/Video/VideoSpriteDrawNode.cs b/osu.Framework/Graphics/Video/VideoSpriteDrawNode.cs
--- a/osu.Framework/Graphics/Video/VideoSpriteDrawNode.cs	(revision bf0a870fc35278929759d8586942bcd9ebfa67ef)
+++ b/osu.Framework/Graphics/Video/VideoSpriteDrawNode.cs	(date 1635037508888)
@@ -18,13 +18,12 @@
             video = source;
         }
 
-        private int yLoc, uLoc = 1, vLoc = 2;
+        private int yLoc, uvLoc = 1;
 
         public override void Draw(Action<TexturedVertex2D> vertexAction)
         {
             Shader.GetUniform<int>("m_SamplerY").UpdateValue(ref yLoc);
-            Shader.GetUniform<int>("m_SamplerU").UpdateValue(ref uLoc);
-            Shader.GetUniform<int>("m_SamplerV").UpdateValue(ref vLoc);
+            Shader.GetUniform<int>("m_SamplerUV").UpdateValue(ref uvLoc);
 
             var yuvCoeff = video.ConversionMatrix;
             Shader.GetUniform<Matrix3>("yuvCoeff").UpdateValue(ref yuvCoeff);
Index: osu.Framework/Resources/Shaders/sh_yuv2rgb.h
===================================================================
diff --git a/osu.Framework/Resources/Shaders/sh_yuv2rgb.h b/osu.Framework/Resources/Shaders/sh_yuv2rgb.h
--- a/osu.Framework/Resources/Shaders/sh_yuv2rgb.h	(revision bf0a870fc35278929759d8586942bcd9ebfa67ef)
+++ b/osu.Framework/Resources/Shaders/sh_yuv2rgb.h	(date 1635037495377)
@@ -1,8 +1,7 @@
 #include "sh_TextureWrapping.h"
 
 uniform sampler2D m_SamplerY;
-uniform sampler2D m_SamplerU;
-uniform sampler2D m_SamplerV;
+uniform sampler2D m_SamplerUV;
 
 uniform mediump mat3 yuvCoeff;
 
@@ -16,7 +15,7 @@
         return vec4(0.0);
 
     lowp float y = texture2D(m_SamplerY, wrappedCoord, lodBias).r;
-    lowp float u = texture2D(m_SamplerU, wrappedCoord, lodBias).r;
-    lowp float v = texture2D(m_SamplerV, wrappedCoord, lodBias).r;
+    lowp float u = texture2D(m_SamplerUV, wrappedCoord, lodBias).r;
+    lowp float v = texture2D(m_SamplerUV, wrappedCoord, lodBias).g;
     return vec4(yuvCoeff * (vec3(y, u, v) + offsets), 1.0);
 }
Index: osu.Framework/Graphics/Video/VideoDecoder.cs
===================================================================
diff --git a/osu.Framework/Graphics/Video/VideoDecoder.cs b/osu.Framework/Graphics/Video/VideoDecoder.cs
--- a/osu.Framework/Graphics/Video/VideoDecoder.cs	(revision bf0a870fc35278929759d8586942bcd9ebfa67ef)
+++ b/osu.Framework/Graphics/Video/VideoDecoder.cs	(date 1635037362441)
@@ -618,7 +618,7 @@
                 lastDecodedFrameTime = (float)frameTime;
 
                 // Note: this is the pixel format that `VideoTexture` expects internally
-                frame = ensureFramePixelFormat(frame, AVPixelFormat.AV_PIX_FMT_YUV420P);
+                frame = ensureFramePixelFormat(frame, AVPixelFormat.AV_PIX_FMT_NV12);
                 if (frame == null)
                     continue;
 
Index: osu.Framework/Graphics/Video/VideoTexture.cs
===================================================================
diff --git a/osu.Framework/Graphics/Video/VideoTexture.cs b/osu.Framework/Graphics/Video/VideoTexture.cs
--- a/osu.Framework/Graphics/Video/VideoTexture.cs	(revision bf0a870fc35278929759d8586942bcd9ebfa67ef)
+++ b/osu.Framework/Graphics/Video/VideoTexture.cs	(date 1635037660628)
@@ -78,7 +78,7 @@
                 Debug.Assert(memoryLease == null);
                 memoryLease = NativeMemoryTracker.AddMemory(this, Width * Height * 3 / 2);
 
-                textureIds = new int[3];
+                textureIds = new int[2];
                 GL.GenTextures(textureIds.Length, textureIds);
 
                 for (int i = 0; i < textureIds.Length; i++)
@@ -99,7 +99,7 @@
                         int width = (videoUpload.Frame->width + 1) / 2;
                         int height = (videoUpload.Frame->height + 1) / 2;
 
-                        GL.TexImage2D(TextureTarget2d.Texture2D, 0, TextureComponentCount.R8, width, height, 0, PixelFormat.Red, PixelType.UnsignedByte, IntPtr.Zero);
+                        GL.TexImage2D(TextureTarget2d.Texture2D, 0, TextureComponentCount.Rg8, width, height, 0, PixelFormat.Rg, PixelType.UnsignedByte, IntPtr.Zero);
 
                         textureSize += width * height;
                     }
@@ -116,12 +116,9 @@
             {
                 GLWrapper.BindTexture(textureIds[i]);
 
-                GL.PixelStore(PixelStoreParameter.UnpackRowLength, videoUpload.Frame->linesize[(uint)i]);
-                GL.TexSubImage2D(TextureTarget2d.Texture2D, 0, 0, 0, videoUpload.Frame->width / (i > 0 ? 2 : 1), videoUpload.Frame->height / (i > 0 ? 2 : 1),
-                    PixelFormat.Red, PixelType.UnsignedByte, (IntPtr)videoUpload.Frame->data[(uint)i]);
+                GL.TexSubImage2D(TextureTarget2d.Texture2D, 0, 0, 0, videoUpload.Frame->width / (i + 1), videoUpload.Frame->height / (i + 1),
+                    i == 0 ? PixelFormat.Red : PixelFormat.Rg, PixelType.UnsignedByte, (IntPtr)videoUpload.Frame->data[(uint)i]);
             }
-
-            GL.PixelStore(PixelStoreParameter.UnpackRowLength, 0);
 
             UploadComplete = true;
         }

Doesn't affect SW decoder path
Some codecs (like MediaCodec on Android) don't set `best_effort_timestamp` for some reason.
I only ran into these while using the MediaCodec decoder on Android. `avcodec_send_packet` would get stuck and always return EAGAIN because the buffered frames were never consumed.

Fixing this loop fixed video playback, but there were visual glitches: if `avcodec_send_packet` failed, the packet's data would get overwritten on the next iteration of `decodeNextFrame`, which created a gap in the video data that FFmpeg received.
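
A minimal sketch of the corrected pattern (assuming FFmpeg.AutoGen, with codecContext, packet and frame in scope; handleDecodedFrame is a hypothetical consumer, and this is not the PR's exact loop): when `avcodec_send_packet` reports EAGAIN, drain the buffered output with `avcodec_receive_frame` and resend the same packet rather than discarding it.

// Resend the same packet while the decoder reports EAGAIN; draining
// frames frees the decoder's internal buffers, so the packet's data
// is never silently dropped.
int sendResult;

do
{
    sendResult = ffmpeg.avcodec_send_packet(codecContext, packet);

    if (sendResult == ffmpeg.AVERROR(ffmpeg.EAGAIN))
    {
        // Decoder input is full; consume pending output first.
        while (ffmpeg.avcodec_receive_frame(codecContext, frame) == 0)
            handleDecodedFrame(frame);
    }
} while (sendResult == ffmpeg.AVERROR(ffmpeg.EAGAIN));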
The result is exactly the same, but the parameters used should make more sense now.
@Wieku commented Oct 26, 2021

I haven't looked in depth into this, but in my project I noticed a significant increase in color conversion speed by using Google's libyuv library over FFmpeg's swscale.

The screenshot compares 1080p60 encoding using libyuv (left) instead of libswscale (right) for RGB → I420 conversion: [screenshot]

@bdach (Collaborator) commented Oct 26, 2021

Ran this on my Linux box; it selected VDPAU and ran OK, so that one can probably be crossed off the list. It works fine, but I don't see a huge difference in the performance numbers I've looked at (mostly CPU and RAM; didn't check much else, truth be told).

@bdach (Collaborator) left a review comment

Have spent an hour and a bit reading through it, and to my untrained eye it looks solid enough. A few things raised question marks while I was reading, but by the end, after reading the whole thing, they kind of clicked into place. Interesting design.

osu.Framework/Graphics/Video/FFmpegFuncs.cs (outdated, resolved)
@@ -8,5 +8,31 @@ namespace osu.Framework.Graphics.Video
internal static class FfmpegExtensions
{
internal static double GetValue(this AVRational rational) => rational.num / (double)rational.den;

internal static bool IsHardwarePixelFormat(this AVPixelFormat pixFmt)
@bdach (Collaborator):

For anyone else reading this, I think this can be used to cross-reference: https://ffmpeg.org/doxygen/4.1/pixfmt_8h_source.html

Seems like all formats were caught except maybe AVPixelFormat.AV_PIX_FMT_XVMC, but that one is apparently pretty old/obsoleted by VDPAU/VA-API so...

@Opelkuh (Contributor, author):

Nice catch; I added it to the list. In reality most of these aren't really required, but I added them to live up to the function's name.

osu.Framework/Graphics/Video/HardwareVideoDecoder.cs (outdated, resolved)
if (targetHwDecoders.Count == 0)
break;

// Note: Intersect order here is important, order of the returned elements is determined by the first enumerable.
@bdach (Collaborator):

For what it's worth, this is documented behaviour, so it's probably safe to rely on it...?

@Opelkuh (Contributor, author):

I made sure that it was at least documented, but it's still not ideal. I'll change it to a proper sort. It should also sort the entire codecs list instead of just the devices in each codec separately.

Are you ok with sorting it by the enum value (where lowest = "best" device)?

@bdach (Collaborator):

Sorting by enum value sounds good.

@Opelkuh (Contributor, author):

I ended up with a separate Comparer, as otherwise I would have to shuffle between AVHWDeviceType and HardwareVideoDecoder.
The way it's written now should also move Quick Sync higher than DXVA, which the previous sorting didn't do correctly.
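
For illustration, a comparer along these lines would do it (a sketch only; the preference list and class name below are assumptions, not the PR's actual ranking):

using System;
using System.Collections.Generic;
using FFmpeg.AutoGen;

// Orders AVHWDeviceType values by preference; types not in the list sort last.
internal class HwDeviceTypeComparer : IComparer<AVHWDeviceType>
{
    // Illustrative preference order, placing Quick Sync above DXVA2 as discussed above.
    private static readonly AVHWDeviceType[] preference =
    {
        AVHWDeviceType.AV_HWDEVICE_TYPE_CUDA,  // NVDEC
        AVHWDeviceType.AV_HWDEVICE_TYPE_QSV,   // Intel Quick Sync Video
        AVHWDeviceType.AV_HWDEVICE_TYPE_DXVA2, // DirectX Video Acceleration 2
    };

    public int Compare(AVHWDeviceType x, AVHWDeviceType y) => rank(x).CompareTo(rank(y));

    private static int rank(AVHWDeviceType type)
    {
        int index = Array.IndexOf(preference, type);
        return index >= 0 ? index : int.MaxValue;
    }
}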

osu.Framework/Graphics/Video/VideoDecoder.cs (resolved)
@peppy (Member) commented Oct 27, 2021

Can confirm this is working on macOS. I think it's probably fine to have it on by default for desktops?

That said, I'm wondering if we want to add a framework-level configuration setting for this (a simple bool). It's quite feasible that this would be a user-facing checkbox on osu!'s side? At which point we may also want to consider the ability to toggle the decoder status without recreating the decoder.

Maybe that's over-complicating things and it should be always-on, as there are few drawbacks to having it enabled when available?

@bdach (Collaborator) commented Oct 27, 2021

wondering if we want to add a framework-level configuration setting for this (a simple bool)

There is one already in this pull. Mind you, it's not really a simple bool but a HardwareVideoDecoder flags enum value; it can work either way, though (as is, you could toggle that setting between None and Any).
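
For context, a sketch of the general shape of such a flags enum (illustrative only; the member names and values here are assumptions, not the PR's actual definition):

using System;

[Flags]
public enum HardwareVideoDecoder
{
    None = 0,

    // illustrative per-API flags
    NVDEC = 1,
    QuickSyncVideo = 1 << 1,
    VideoToolbox = 1 << 2,

    // all bits set, so any newly added decoder is included too
    Any = ~0,
}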

@peppy (Member) commented Oct 27, 2021

Right, specifically the bindable part then, which isn't currently implemented to update on the fly.

@Opelkuh (Contributor, author) commented Oct 28, 2021

I wasn't sure if it was possible to change the FFmpeg decoder mid-playback, but it turned out to be pretty simple to implement, so now it's properly bound to the config. I'll note, though, that the video lags for a short period when changing it, and it sometimes spits out some warnings until the new decoder has received enough of the video stream.
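
The binding boils down to scheduling a codec context rebuild on the decoder thread when the setting changes (a rough sketch; decoderCommands and recreateCodecContext appear in the diff bdach posts below, while the bindable name is assumed):

// When the config bindable changes, rebuild only the codec context on the
// decoder thread instead of recreating the whole VideoDecoder.
hwVideoDecoder.BindValueChanged(e =>
{
    if (formatContext == null)
        return;

    decoderCommands.Enqueue(recreateCodecContext);
});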

I added a toggle for this to the tests so you can try it from there.

@peppy (Member) commented Oct 28, 2021

That's pretty amazing, I was expecting us to have to recreate the decoder object completely.

@Opelkuh (Contributor, author) commented Oct 28, 2021

Recreating the whole VideoDecoder was the first thing I tried, but that didn't work at all, presumably because FFmpeg needs to read some headers at the start of the file.

I think it's probably fine to have it on by default for desktops?

Sorry, I missed this yesterday. I was mainly concerned with the increased RAM usage for little to no benefit on lower bitrate videos when using NVDEC (it adds ~100MB). But it seems that other APIs don't have this "problem", and as some people like to say, unused RAM is wasted RAM. So I enabled it by default on all platforms; after all, people can always turn it off.

@bdach (Collaborator) commented Oct 28, 2021

Have checked the mid-playback decoder switch and it does look to be working pretty great, aside from the mentioned hitch immediately after switching. I've also semi-experimentally tested how it would behave if the HW codec failed to initialise, by applying the following:

diff --git a/osu.Framework/Graphics/Video/VideoDecoder.cs b/osu.Framework/Graphics/Video/VideoDecoder.cs
index 65af89ec9c..1fa5ec9f69 100644
--- a/osu.Framework/Graphics/Video/VideoDecoder.cs
+++ b/osu.Framework/Graphics/Video/VideoDecoder.cs
@@ -137,6 +137,9 @@ public VideoDecoder(Stream videoStream)
                 if (formatContext == null)
                     return;
 
+                if (e.NewValue == HardwareVideoDecoder.Any)
+                    decoderCommands.Enqueue(() => throw new InvalidOperationException());
+
                 decoderCommands.Enqueue(recreateCodecContext);
             });
         }

The good part is that it doesn't crash; the bad part is that it leaves the decoder in a faulted state. But I think that's fine: being able to hot-swap in the first place is already quite something. I'd leave that be until it's deemed a problem.

bdach previously approved these changes Oct 28, 2021
@Opelkuh (Contributor, author) commented Oct 28, 2021

It should (hopefully) happen very rarely, as it should always at least fall back to SW decoding when everything else fails:

codecs.Add((firstCodec, AVHWDeviceType.AV_HWDEVICE_TYPE_NONE));
