Add support for hardware video decoding #4839
Conversation
Doesn't affect SW decoder path
Some codecs (like MediaCodec on Android) don't set `best_effort_timestamp` for some reason.
I only ran into these while using the MediaCodec decoder on Android. `avcodec_send_packet` would get stuck and always return EAGAIN because the buffered frames were never consumed. Fixing this loop fixed video playback, but there were still visual glitches: if `avcodec_send_packet` failed, the packet's data would get overwritten on the next iteration of `decodeNextFrame`, which created a gap in the video data that FFmpeg received.
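For readers following along, here's a minimal sketch of the send/drain/retry pattern being described, using the FFmpeg.AutoGen bindings; the helper and callback names are hypothetical, not the PR's actual code. The key point is that on EAGAIN the same packet is retried after draining buffered frames, rather than being discarded.

```csharp
// A minimal sketch (not the PR's actual code) of the send/drain/retry pattern:
// if avcodec_send_packet returns EAGAIN, drain buffered frames and retry the
// same packet instead of dropping it. `handleDecodedFrame` is a hypothetical callback.
using System;
using FFmpeg.AutoGen;

internal static unsafe class DecodeLoopSketch
{
    public static void SendPacket(AVCodecContext* codecContext, AVPacket* packet, Action<IntPtr> handleDecodedFrame)
    {
        AVFrame* frame = ffmpeg.av_frame_alloc();

        try
        {
            while (true)
            {
                int sendResult = ffmpeg.avcodec_send_packet(codecContext, packet);

                if (sendResult == 0)
                    break; // packet accepted; its buffer can now be safely reused.

                if (sendResult != ffmpeg.AVERROR(ffmpeg.EAGAIN))
                    throw new InvalidOperationException($"avcodec_send_packet failed with {sendResult}.");

                // EAGAIN: the decoder's output queue is full (common with MediaCodec).
                // Consume the buffered frames, then retry the *same* packet.
                while (ffmpeg.avcodec_receive_frame(codecContext, frame) == 0)
                    handleDecodedFrame((IntPtr)frame);
            }
        }
        finally
        {
            ffmpeg.av_frame_free(&frame);
        }
    }
}
```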
Result is exactly the same but the used parameters should make more sense now.
Ran this on my Linux box; it selected VDPAU and ran OK, so can probably cross that off the list. Works fine, but I don't see a huge difference in the performance numbers I've looked at (mostly CPU and RAM, didn't check much else truth be told).
Have spent an hour and a bit reading through it, and to my untrained eye it looks solid enough. A few things raised question marks while I was reading, but at the end, after reading the whole thing, it all kinda clicked into place. Interesting design.
```diff
@@ -8,5 +8,31 @@ namespace osu.Framework.Graphics.Video
     internal static class FfmpegExtensions
     {
         internal static double GetValue(this AVRational rational) => rational.num / (double)rational.den;
 
+        internal static bool IsHardwarePixelFormat(this AVPixelFormat pixFmt)
```
For anyone else reading this, I think this can be used to cross-reference: https://ffmpeg.org/doxygen/4.1/pixfmt_8h_source.html
Seems like all formats were caught except maybe `AVPixelFormat.AV_PIX_FMT_XVMC`, but that one is apparently pretty old/obsoleted by VDPAU/VA-API so...
Nice catch, I added it to the list. In reality most of these aren't really required but I added them to live up to the function's name.
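For reference, here's an illustrative sketch of what such a helper could look like; the exact set of formats in the PR may differ (names are taken from FFmpeg's pixfmt.h via the FFmpeg.AutoGen bindings):

```csharp
// Illustrative sketch only; the PR's actual list may differ.
using FFmpeg.AutoGen;

internal static class FfmpegExtensionsSketch
{
    internal static bool IsHardwarePixelFormat(this AVPixelFormat pixFmt)
    {
        switch (pixFmt)
        {
            case AVPixelFormat.AV_PIX_FMT_VDPAU:
            case AVPixelFormat.AV_PIX_FMT_CUDA:          // NVDEC
            case AVPixelFormat.AV_PIX_FMT_VAAPI:         // AV_PIX_FMT_VAAPI_VLD in older headers
            case AVPixelFormat.AV_PIX_FMT_DXVA2_VLD:
            case AVPixelFormat.AV_PIX_FMT_QSV:
            case AVPixelFormat.AV_PIX_FMT_VIDEOTOOLBOX:
            case AVPixelFormat.AV_PIX_FMT_D3D11:
            case AVPixelFormat.AV_PIX_FMT_D3D11VA_VLD:
            case AVPixelFormat.AV_PIX_FMT_DRM_PRIME:
            case AVPixelFormat.AV_PIX_FMT_OPENCL:
            case AVPixelFormat.AV_PIX_FMT_MEDIACODEC:
            case AVPixelFormat.AV_PIX_FMT_XVMC:
                return true;

            default:
                return false;
        }
    }
}
```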
```csharp
if (targetHwDecoders.Count == 0)
    break;

// Note: Intersect order here is important, order of the returned elements is determined by the first enumerable.
```
For what it's worth this is documented behaviour so probably safe to rely on it...?
I made sure that it was at least documented, but it's still not ideal. I'll change it to a proper sort. It should also sort the entire `codecs` list instead of just the devices in each codec separately.
Are you ok with sorting it by the enum value (where lowest = "best" device)?
Sorting by enum value sounds good.
I ended up with a separate Comparer as otherwise I would have to shuffle between `AVHWDeviceType` and `HardwareVideoDecoder`.
The way it's written now should also move Quick Sync higher than DXVA, which the previous sorting didn't do correctly.
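To illustrate the idea (not the PR's actual code), a comparer over `AVHWDeviceType` could look roughly like the sketch below; the preference table is a hypothetical example, chosen only to show Quick Sync ranking above DXVA.

```csharp
// Hypothetical sketch of a preference-based comparer over AVHWDeviceType.
// The ordering below is an example only; the PR's actual ordering may differ.
using System;
using System.Collections.Generic;
using FFmpeg.AutoGen;

internal class HwDeviceTypeComparer : IComparer<AVHWDeviceType>
{
    // Lower index = preferred device type.
    private static readonly AVHWDeviceType[] preferred_order =
    {
        AVHWDeviceType.AV_HWDEVICE_TYPE_CUDA,  // NVDEC
        AVHWDeviceType.AV_HWDEVICE_TYPE_QSV,   // Quick Sync, ranked above DXVA
        AVHWDeviceType.AV_HWDEVICE_TYPE_DXVA2,
        AVHWDeviceType.AV_HWDEVICE_TYPE_VAAPI,
        AVHWDeviceType.AV_HWDEVICE_TYPE_VDPAU,
    };

    public int Compare(AVHWDeviceType x, AVHWDeviceType y) => rank(x).CompareTo(rank(y));

    private static int rank(AVHWDeviceType type)
    {
        int index = Array.IndexOf(preferred_order, type);

        // Anything not in the table sorts after all listed types, by enum value.
        return index >= 0 ? index : preferred_order.Length + (int)type;
    }
}
```

Sorting the full candidate list with a comparer like this keeps the selection order independent of `Intersect`'s enumeration behaviour.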
Can confirm this is working on macOS. I think it's probably fine to have it on by default for desktops? That said, wondering if we want to add a framework-level configuration setting for this (a simple bool). Quite feasible that this would be a user checkbox at osu!'s side? At which point we may also want to consider the ability to toggle the decoder status without recreating the decoder. Maybe that's over-complicating things and it should be always-on, as there are few drawbacks of having it enabled when available?
There is one already in this pull - mind you, it's not really a simple bool, but a bindable `HardwareVideoDecoder` enum.
Right, specifically the bindable part then, which isn't currently implemented to update on the fly.
I wasn't sure if it was possible to change the FFmpeg decoder mid-playback, but it turned out to be pretty simple to implement, so now it's properly bound to the config. But I'll note that the video lags for a short period when changing it, and it sometimes spits out some warnings until the new decoder gets enough of the video stream. I added a toggle for this to the tests so you can try it from there.
That's pretty amazing, I was expecting us to have to recreate the decoder object completely.
Recreating the whole VideoDecoder was the first thing I tried but that didn't work at all, presumably because FFmpeg needs to read some headers at the start of the file.
Sorry, I missed this yesterday. I was mainly concerned with the increased RAM usage for little to no benefit with lower bitrate videos using NVDEC (it adds ~100MB). But it seems that other APIs don't have this "problem", and as some people like to say, unused RAM is wasted RAM. So I enabled it by default for all platforms; after all, people can always turn it off.
Have checked the mid-playback decoder switch and it does look to be working pretty great, aside from the mentioned hitch immediately after switching. I've also tested semi-experimentally how it would behave if the HW codec failed to initialise by applying the following:

```diff
diff --git a/osu.Framework/Graphics/Video/VideoDecoder.cs b/osu.Framework/Graphics/Video/VideoDecoder.cs
index 65af89ec9c..1fa5ec9f69 100644
--- a/osu.Framework/Graphics/Video/VideoDecoder.cs
+++ b/osu.Framework/Graphics/Video/VideoDecoder.cs
@@ -137,6 +137,9 @@ public VideoDecoder(Stream videoStream)
                 if (formatContext == null)
                     return;
 
+                if (e.NewValue == HardwareVideoDecoder.Any)
+                    decoderCommands.Enqueue(() => throw new InvalidOperationException());
+
                 decoderCommands.Enqueue(recreateCodecContext);
             });
         }
```

The good part is that it doesn't crash; the bad part is that it leaves the decoder in a faulted state. But I think that's fine, being able to hotswap in the first place is already quite something. I'd leave that be until it is deemed a problem.
It should (hopefully) happen very rarely, as it should always at least fall back to SW decoding when everything else fails.
Closes #4079
Performance gains are extremely variable between different machines but in general they're smaller than I expected.
On Windows / Linux
On high bitrate / high resolution videos it's definitely noticeable: HW decoders have no problem with these, while the SW one sometimes can't maintain 1x speed. Here's an example of 4K 60fps video playback:
vid.mp4
The SW one is slightly laggy while NVDEC runs at 2x speed without a problem (slowed down for the comparison). This is on a GTX 1060 6GB, so nothing cutting edge.
However, with lower bitrate videos I was getting pretty much the same CPU usage, but some HW decoders usually used more RAM (especially NVDEC).
On Android
I was testing playback of the default test video that's in visual tests on an old Samsung Galaxy S6 Edge. Decoding with MediaCodec used ~3% less total CPU, and RAM usage was pretty much the same. Not great, not terrible.
Based on this testing, I decided not to use HW decoders by default on desktop platforms, but enabled them by default on Android and iOS.
I don't have the hardware to test every HW decoder, but the list below contains the ones that I could test so far. If anyone has access to machines that support any of these, please share if it works for you (and ideally how well/badly)!
Windows
Linux
Android
macOS
iOS
HW decoder selection
Automatic HW decoder selection is implemented in the simplest way possible: get the available decoders and then try them one by one until something sticks. This has the added benefit that it will automatically fall back to SW decoding if everything else fails. A rough sketch of this approach is shown below.
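The sketch uses the FFmpeg.AutoGen bindings with hypothetical structure and names; the PR's actual implementation differs in detail.

```csharp
// Rough sketch of "try each hardware device type, fall back to software".
// Hypothetical structure; helper names here are illustrative, not the PR's.
using System;
using System.Collections.Generic;
using FFmpeg.AutoGen;

internal static unsafe class CodecContextSetupSketch
{
    public static void OpenCodecContext(AVCodecContext* codecContext, AVCodec* codec, IEnumerable<AVHWDeviceType> candidateDevices)
    {
        foreach (var deviceType in candidateDevices)
        {
            AVBufferRef* hwDeviceContext = null;

            // Try to create a device context for this decoder; skip it on failure.
            if (ffmpeg.av_hwdevice_ctx_create(&hwDeviceContext, deviceType, null, null, 0) < 0)
                continue;

            codecContext->hw_device_ctx = ffmpeg.av_buffer_ref(hwDeviceContext);
            ffmpeg.av_buffer_unref(&hwDeviceContext);
            break;
        }

        // If no hardware device could be created, hw_device_ctx stays null and
        // avcodec_open2 simply opens the codec for software decoding.
        if (ffmpeg.avcodec_open2(codecContext, codec, null) < 0)
            throw new InvalidOperationException("Failed to open codec.");
    }
}
```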
Possible performance improvements
AFAIK all hardware decoders return frames in NV12 pixel format but the current rendering is set up to consume YUV420P. This means that all HW decoded frames have to go through a format conversion on the CPU. From my testing, this adds roughly 1% total CPU usage on desktop and ~5% on Android. It also adds a bit of RAM usage.
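For reference, the per-frame work being described looks roughly like the sketch below (FFmpeg.AutoGen bindings; the class and parameter names are illustrative, and `yuv420pFrame` is assumed to be pre-allocated):

```csharp
// Sketch of the per-frame CPU cost described above: download the HW surface
// (typically NV12), then convert it to YUV420P with swscale. Illustrative only;
// `yuv420pFrame` is assumed to be pre-allocated via av_frame_get_buffer.
using System;
using FFmpeg.AutoGen;

internal static unsafe class HwFrameConversionSketch
{
    public static void DownloadAndConvert(AVFrame* hwFrame, AVFrame* swFrame, AVFrame* yuv420pFrame, ref SwsContext* swsContext)
    {
        // Copy the decoded frame out of GPU memory into a software frame.
        if (ffmpeg.av_hwframe_transfer_data(swFrame, hwFrame, 0) < 0)
            throw new InvalidOperationException("Failed to download hardware frame.");

        // Reuse (or lazily create) a swscale context for NV12 -> YUV420P.
        swsContext = ffmpeg.sws_getCachedContext(
            swsContext,
            swFrame->width, swFrame->height, (AVPixelFormat)swFrame->format,
            swFrame->width, swFrame->height, AVPixelFormat.AV_PIX_FMT_YUV420P,
            ffmpeg.SWS_BILINEAR, null, null, null);

        // This conversion is where the extra CPU time and RAM mostly go.
        ffmpeg.sws_scale(swsContext, swFrame->data.ToArray(), swFrame->linesize.ToArray(), 0, swFrame->height,
            yuv420pFrame->data.ToArray(), yuv420pFrame->linesize.ToArray());
    }
}
```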
I plan to address this in a separate PR, without affecting the SW decoding path, but here's a quick and dirty patch if you'd like to try this change now:
Patch