feat(server): fully accelerated nvenc #9452

mertalev · 2024-05-14T00:36:00Z

Description

Edit: An earlier version of this PR made a breaking change here, but after some consideration I decided against it. The PR now makes hardware decoding opt-in, so existing setups will continue to work. There is a new toggle for hardware decoding, applicable to NVENC and RKMPP. Since it defaults to false, NVENC behaves the same as it does currently. RKMPP will be downgraded to software decoding until the admin enables hardware decoding.
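
As a rough illustration of the toggle described above (not Immich's actual code), the sketch below picks ffmpeg input flags for NVENC from a boolean argument. The `-hwaccel` flags are the ones used in the test commands later in this thread; the function name `nvenc_decode_flags` and its argument are hypothetical.

```shell
# Hypothetical helper, not Immich's implementation: print the ffmpeg input
# flags for NVENC depending on whether the hardware decoding toggle is on.
nvenc_decode_flags() {
  hw_decode="${1:-false}"   # the toggle defaults to false, per the description
  if [ "$hw_decode" = "true" ]; then
    # Decode on the GPU and keep the decoded frames in CUDA memory.
    echo "-hwaccel cuda -hwaccel_output_format cuda"
  else
    # Software decoding: no hwaccel flags, only the encoder runs on the GPU.
    echo ""
  fi
}

nvenc_decode_flags         # prints an empty line (software decoding)
nvenc_decode_flags true    # prints the two hwaccel flags
```

With the toggle off, the rest of the command is assembled as before, which is why existing NVENC setups are unaffected.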

This is a smaller version of #9402 that only changes the behavior for NVENC. That PR aimed to streamline the decoding and filtering process into one pipeline by leveraging libplacebo and Vulkan's cross-device capabilities. However, this is premature for the following reasons:

- Vulkan is not supported on RKMPP, so a separate pipeline is still necessary for end-to-end acceleration.
- There is a roughly 30% speed penalty on CPU compared to the more traditional pipeline, likely overhead from transferring frames to and from Vulkan.
- Vulkan on FFmpeg is a very active area of development, and we are not able to upgrade to the newest and most feature-complete versions of FFmpeg while Jellyfin is still on 6.0. This limits the APIs and devices that can use Vulkan.
- The Jellyfin devs I spoke with strongly recommended against relying on it too heavily due to an above-average rate of breaking changes and poor backwards compatibility (hitting Windows primarily, but also affecting some Intel devices on Linux).

Vulkan works very well with Nvidia from my testing and has reasonable backwards compatibility (9xx series onward), so it's fine to use it here. Maybe someday it can be used more extensively, but in the meantime it's similar to this XKCD.

Testing

Tested transcoding a video on NVENC and CPU with tone-mapping enabled and disabled, confirming success logs and confirming the video plays (with browser caching disabled to ensure the video is up-to-date).

jrasm91

Was this just for testing?

server/Dockerfile

cloudflare-workers-and-pages · 2024-05-14T01:05:07Z

Deploying immich with Cloudflare Pages

Latest commit:	`75906c1`
Status:	✅ Deploy successful!
Preview URL:	https://6541231f.immich.pages.dev
Branch Preview URL:	https://feat-server-hw-decoding-no-t.immich.pages.dev

docker/hwaccel.transcoding.yml

mertalev · 2024-05-15T01:01:44Z

I decided to add the hardware decoding toggle after all. The new command may not work with older kernels or drivers, and it's useful for debugging purposes. Overall, this takes it from being a change I'm slightly nervous about to one I can confidently say is fine.

nyanmisaka · 2024-05-15T07:31:44Z

Just an FYI: jellyfin-ffmpeg includes our homemade native filter tonemap_cuda. It has lower overhead than the hwupload+libplacebo implementation (which requires an extra semaphore for CUDA<->Vulkan interop) and performs better on entry-level Nvidia GPUs, although it only has the most basic functionality. It also avoids the dependency on the Vulkan runtime; the CUDA runtime alone is enough. This way you can avoid breaking changes introduced by Vulkan. Its usage is similar to the existing tonemap_opencl.

BTW, for issue #9252, we also have transpose_{cuda,opencl} as well as vpp_{qsv,rkrga} and flip_vulkan filters. FFmpeg is not smart enough to insert them automatically in a full hardware pipeline, so video captured by a GoPro can end up upside down after transcoding.

zackpollard · 2024-05-15T11:47:51Z

I was going to give this a test but given the latest comment I will wait for @mertalev to give it another pass if he wants to make changes related to that comment.

mertalev · 2024-05-15T13:29:43Z

Thanks for the tips, @nyanmisaka! I didn't know about tonemap_cuda or the other filters you mentioned. I'll try using it here.

fyfrey · 2024-05-15T14:54:25Z

@mertalev let me know if (and when) I should test this on my RK3588 device

mertalev · 2024-05-15T18:40:32Z

@nyanmisaka I did some testing and tonemap_cuda is indeed faster (97s vs 106s for libplacebo), but the resulting colors seem off compared to libplacebo and the source. Do you have any thoughts on why that might be?

Test video (downloaded with youtube-dl)

tonemap_cuda command:

ffmpeg -hwaccel cuda -hwaccel_output_format cuda -noautorotate -threads 1 -i HDR.mkv \
-tune hq -qmin 0 -rc-lookahead 20 -i_qfactor 0.75 -c:v av1_nvenc -c:a aac -movflags faststart \
-fps_mode passthrough -map 0:0 -map 0:1 -g 256 -temporal-aq 1 -v verbose -preset p1 -cq:v 40 \
-vf scale_cuda=-2:720,tonemap_cuda=matrix=bt709:primaries=bt709:range=pc:tonemap=hable:transfer=bt709:format=nv12 \
SDR.mp4

libplacebo command:

ffmpeg -hwaccel cuda -hwaccel_output_format cuda -noautorotate -threads 1 -i HDR.mkv \
-tune hq -qmin 0 -rc-lookahead 20 -i_qfactor 0.75 -c:v av1_nvenc -c:a aac -movflags faststart \
-fps_mode passthrough -map 0:0 -map 0:1 -g 256 -temporal-aq 1 -v verbose -preset p1 -cq:v 40 \
-vf scale_cuda=-2:720,hwupload=derive_device=vulkan,libplacebo=color_primaries=bt709:color_trc=bt709:colorspace=bt709:downscaler=none:format=yuv420p:tonemapping=hable:upscaler=none,hwupload=derive_device=cuda \
SDR.mp4

Performance results:

Disabled means software decoding + tone-mapping with zscale before hardware encoding

libplacebo (1:06 mark):

tonemap_cuda:

Also wow, I was not expecting that big of a gap between software and hardware decoding / tone-mapping.

Edit: I noticed there are some artifacts in the libplacebo image (lower right, middle left, top left). Looks like there's a bug when used in tandem with -temporal-aq. This is what it looks like without that flag (and debanding enabled):

Interestingly, both filters also have distorted colors before a scene change when -temporal-aq is used.

mertalev · 2024-05-15T22:13:06Z

@fyfrey There shouldn't be any other changes to the RKMPP code so feel free to test it whenever you're able! It basically makes the software decoding variant the default, but the commands are otherwise the same.

nyanmisaka · 2024-05-16T02:28:40Z

@mertalev The image produced by tonemap_cuda is less saturated because the filter option desat is not turned off. Just like tonemap_opencl, it imposes a fixed value for desaturation.

The complete filter options can be queried through the following command.

./ffmpeg -hide_banner -h filter=tonemap_cuda
Filter tonemap_cuda
  GPU accelerated HDR to SDR tonemapping
    Inputs:
       #0: default (video)
    Outputs:
       #0: default (video)
tonemap_cuda AVOptions:
   tonemap           <int>        ..FV....... Tonemap algorithm selection (from 0 to 7) (default none)
     none            0            ..FV.......
     linear          1            ..FV.......
     gamma           2            ..FV.......
     clip            3            ..FV.......
     reinhard        4            ..FV.......
     hable           5            ..FV.......
     mobius          6            ..FV.......
     bt2390          7            ..FV.......
   tonemap_mode      <int>        ..FV....... Tonemap mode selection (from 0 to 1) (default max)
     max             0            ..FV.......
     rgb             1            ..FV.......
   transfer          <int>        ..FV....... Set transfer characteristic (from -1 to INT_MAX) (default bt709)
     bt709           1            ..FV.......
     bt2020          14           ..FV.......
     smpte2084       16           ..FV.......
   t                 <int>        ..FV....... Set transfer characteristic (from -1 to INT_MAX) (default bt709)
     bt709           1            ..FV.......
     bt2020          14           ..FV.......
     smpte2084       16           ..FV.......
   matrix            <int>        ..FV....... Set colorspace matrix (from -1 to INT_MAX) (default bt709)
     bt709           1            ..FV.......
     bt2020          9            ..FV.......
   m                 <int>        ..FV....... Set colorspace matrix (from -1 to INT_MAX) (default bt709)
     bt709           1            ..FV.......
     bt2020          9            ..FV.......
   primaries         <int>        ..FV....... Set color primaries (from -1 to INT_MAX) (default bt709)
     bt709           1            ..FV.......
     bt2020          9            ..FV.......
   p                 <int>        ..FV....... Set color primaries (from -1 to INT_MAX) (default bt709)
     bt709           1            ..FV.......
     bt2020          9            ..FV.......
   range             <int>        ..FV....... Set color range (from -1 to INT_MAX) (default tv)
     tv              1            ..FV.......
     pc              2            ..FV.......
     limited         1            ..FV.......
     full            2            ..FV.......
   r                 <int>        ..FV....... Set color range (from -1 to INT_MAX) (default tv)
     tv              1            ..FV.......
     pc              2            ..FV.......
     limited         1            ..FV.......
     full            2            ..FV.......
   format            <string>     ..FV....... Output format (default "same")
   apply_dovi        <boolean>    ..FV....... Apply Dolby Vision metadata if possible (default true)
   tradeoff          <int>        ..FV....... Apply tradeoffs to offload computing (from -1 to 1) (default auto)
     auto            -1           ..FV.......
     disabled        0            ..FV.......
     enabled         1            ..FV.......
   peak              <double>     ..FV....... Signal peak override (from 0 to DBL_MAX) (default 0)
   param             <double>     ..FV....... Tonemap parameter (from DBL_MIN to DBL_MAX) (default nan)
   desat             <double>     ..FV....... Desaturation parameter (from 0 to DBL_MAX) (default 0.5)
   threshold         <double>     ..FV....... Scene detection threshold (from 0 to DBL_MAX) (default 0.2)

As for the artifacts caused by the -temporal-aq option of the av1_nvenc encoder, I think it is an internal issue of the NVENC hardware encoder and has nothing to do with the tonemap filter. You can use hwdownload to save that frame as raw YUV and check it with the YUView tool:

... -vf scale_cuda=...,tonemap_cuda=...,hwdownload,format=nv12 -f rawvideo /path/to/raw.yuv
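
Raw YUV files carry no header, so the viewer has to be told the resolution and pixel format. A quick sketch of the expected size of one NV12 frame, assuming a 16:9 source so that scale_cuda=-2:720 yields 1280x720 (the actual width depends on the input; the variable names are illustrative):

```shell
# Assumed output resolution: scale_cuda=-2:720 fixes the height, and the
# width follows the source aspect ratio (1280 for a 16:9 input).
width=1280
height=720

# NV12 is 8-bit 4:2:0: a full-resolution luma plane plus an interleaved
# chroma plane of half that size, i.e. 1.5 bytes per pixel.
frame_bytes=$(( width * height * 3 / 2 ))
echo "bytes per frame: $frame_bytes"

# With a raw dump on disk, the number of frames is file size / frame size:
#   frames=$(( $(wc -c < /path/to/raw.yuv) / frame_bytes ))
```

If the file size is not a whole multiple of this value, the assumed resolution or pixel format is probably wrong.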

mertalev · 2024-05-16T04:08:37Z

@nyanmisaka Thanks, disabling de-saturation did the trick! The colors in the OpenCL image actually look closer to the HDR video than with libplacebo now.

With that change, I think this PR is ready for final review.
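
For reference, a sketch of the tonemap_cuda filter chain from the earlier test command with desaturation turned off. It assumes `desat=0` is how the option is disabled (the help output above lists `desat` with a range starting at 0 and a default of 0.5); the variable names are illustrative, and the exact options used in the PR may differ.

```shell
# Options copied from the earlier tonemap_cuda test command in this thread.
tonemap_opts="matrix=bt709:primaries=bt709:range=pc:tonemap=hable:transfer=bt709:format=nv12"

# Assumption: desat=0 turns off the fixed desaturation (the default is 0.5).
vf="scale_cuda=-2:720,tonemap_cuda=${tonemap_opts}:desat=0"

# Print the chain; it would be passed to ffmpeg as: -vf "$vf"
echo "$vf"
```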

nyanmisaka · 2024-05-16T05:28:59Z

> @nyanmisaka Thanks, disabling de-saturation did the trick! The colors in the OpenCL image actually look closer to the HDR video than with libplacebo now.

In fact, tone-mapping is a lossy process. Therefore there is no completely accurate result. Personal preference also plays an important role here.

As for the performance difference, it is not much different on beefy GPUs such as RTX. It is more obvious on weak GPUs such as GTX1050.

zackpollard

LGTM. If you've tested Nvidia, then I'm happy for this to be merged. It would be good to get a test of RKMPP, although it seems that is effectively the same as before, with just some code moved around to account for the new hw decode toggle.

mertalev requested review from danieldietzler and bo0tzz as code owners May 14, 2024 00:36

mertalev mentioned this pull request May 14, 2024

feat(server): hardware decoding, libplacebo for tone-mapping #9402

Closed

mertalev added the 🗄️server label May 14, 2024

mertalev changed the title ~~feat(server): fully accelerated nvenc~~ feat(server)!: fully accelerated nvenc May 14, 2024

jrasm91 reviewed May 14, 2024

server/Dockerfile (outdated, resolved)

mertalev force-pushed the feat/server-hw-decoding-no-toggle branch from 98edc66 to d5ace68 Compare May 14, 2024 01:05

mertalev force-pushed the feat/server-hw-decoding-no-toggle branch from eb495c8 to 60aadf3 Compare May 14, 2024 14:19

danieldietzler approved these changes May 14, 2024

docker/hwaccel.transcoding.yml (outdated, resolved)

mertalev force-pushed the feat/server-hw-decoding-no-toggle branch from 60aadf3 to ea33cd9 Compare May 15, 2024 00:55

mertalev changed the title ~~feat(server)!: fully accelerated nvenc~~ feat(server): fully accelerated nvenc May 15, 2024

mertalev added 10 commits May 15, 2024 21:07

use arrayContaining

ce20790

libplacebo for nvenc

b35eef9

update dockerfile

tweaks

43e07e7

update nvenc options

7f6d38f

tweak settings

ccdaa58

refactor

1131c19

toggle for hardware decoding, software / hardware decoding for nvenc and rkmpp

46c257e

fix software tone-mapping not being applied

49da4df

separate configs for hw/sw

f4c4ef7

update api

1da7f48

mertalev added 6 commits May 15, 2024 21:15

add hw decode toggle

3dd3428

fix mutating config

70ea60d

remove version flag

e492a62

fix config type

0842012

remove submodule

0c2377f

handle temporal AQ

a6732cf

mertalev force-pushed the feat/server-hw-decoding-no-toggle branch from 54dbbf9 to a6732cf Compare May 16, 2024 01:24

remove duplicate tests

f56d4be

mertalev added 3 commits May 15, 2024 23:52

use tonemap_opencl

3f892b3

wording

1b6d9ef

update docs

75906c1

zackpollard approved these changes May 16, 2024

zackpollard added the feel-free-to-merge-👍 label May 16, 2024

mertalev merged commit d8eca16 into main May 16, 2024
24 checks passed

mertalev deleted the feat/server-hw-decoding-no-toggle branch May 16, 2024 17:30

mertalev mentioned this pull request May 16, 2024

Immich defaults to software encoding for remaining videos after a hardware accelerated encode fails. #9505

Closed

3 tasks

mertalev linked an issue May 16, 2024 that may be closed by this pull request

Immich defaults to software encoding for remaining videos after a hardware accelerated encode fails. #9505

Closed

3 tasks

mertalev mentioned this pull request May 26, 2024

HW acceleration seems to function poorly on RTX 3060 Ti #9780

Closed

3 tasks
