
[Feature] hardware encoding #305

Open
iamjen023 opened this issue Jan 3, 2020 · 63 comments
Labels
feature Pull requests that add a new feature


@iamjen023

I would love NVIDIA NVENC support for transcoding and generation.
This would also work with AMD and Intel encoders.
It could speed up the generation process.

@bnkai
Collaborator

bnkai commented Jan 3, 2020

For file transcoding we use x264 with the faster preset, and that's probably the only place nvenc might be quicker (not sure about quality). BUT file transcoding IMHO is not really needed anymore, since we now have live stream transcoding for unsupported files. (IMHO the transcodes in the generated content section should be left unticked in 99% of cases.)

For live transcoding we produce vp9 webm files, which I think are not supported by hardware encoding except through Intel VAAPI, and I'm not sure about the stability/quality of that either.

Finally, for the generated previews and markers we use x264 with the veryslow preset to get the highest quality, since they are only generated once but viewed many times. If you wanted to make generation faster, that's where we could perhaps change the veryslow preset to medium or even fast and still get better quality/performance than hardware encoders. That's of course only for anyone willing to trade quality for speed, and only as an extra option, not the default.
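
For illustration, a single preview segment with that preset swap might look like the following (a sketch only; the input path and seek point are placeholders, and the remaining flags mirror the command quoted later in this thread):

# -preset medium instead of veryslow: faster generation, still competitive quality per bit
ffmpeg -ss 85.32 -i input.mp4 -t 0.75 -y \
  -c:v libx264 -pix_fmt yuv420p -profile:v high -level 4.2 \
  -preset medium -crf 21 \
  -vf scale=640:-2 -c:a aac -b:a 128k preview_segment.mp4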

@HASJ

HASJ commented Aug 18, 2020

The only way live streaming would even remotely be viable here is with hardware acceleration. Software-bound encoding is a no-go.
VP9 is even worse. I am using an FX-6300, which was not optimized for these tasks, to put it kindly.
The people asking for this feature need it. They do not care about the fabled and scary quality loss.

@CenterThrowaway

I'd add that since Pascal, hardware encoding on Nvidia GPUs is leaps and bounds better: comparable to CPU-based x264 up to the medium preset, I believe, while being much faster.

@praul

praul commented Jan 4, 2021

I too would be very pleased about this feature. It does not have to be as user-friendly as, for example, Jellyfin's hardware encoding support. It could be an advanced setting for adding parameters to ffmpeg (playback/live transcoding or preview generation). If it causes problems, users could just set it back to the default, but advanced users would be able to fiddle with it a little more.

It's quite easy to pass VAAPI support to Docker containers, and hardware encoding would greatly reduce my high CPU load.

@r3538987
Contributor

r3538987 commented Jan 5, 2021

Can someone share the command line that is used when generating video previews?
The only thing I see at the moment is this, when generation sometimes fails on WMVs:

ffmpeg.exe -v error -xerror -ss 85.32 -i F:\Downloads\1.wmv -t 0.75 -max_muxing_queue_size 1024 -y -c:v libx264 -pix_fmt yuv420p -profile:v high -level 4.2 -preset fast -crf 21 -threads 4 -vf scale=640:-2 -c:a aac -b:a 128k -strict -2 C:\Users\username\.stash-data\tmp\preview013.mp4: F:\Downloads\1.wmv: corrupt decoded frame in stream 1
I would like to play around and at least see how GPU encoding would help.

It took me 6 hours to generate video previews for 700 videos of approximately 0.5-4 GB each.
20 segments, 3% skip on both ends, fast preset, i5 4460.

@ghost

ghost commented Jan 5, 2021

Can someone share the command line that is used when generating video previews? [...]

That is the command for generating a preview segment. In your case it's run 20 times per video, with the results spliced together into the final preview. You can cut down on generation time by choosing fewer segments and setting the encoding preset to ultrafast.

We're investigating hardware acceleration for transcoding, but I have no idea whether it will be useful for generation, seeing as hardware acceleration likely has more startup latency.
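
For illustration, that splice step could be reproduced outside stash with ffmpeg's concat demuxer (a hypothetical sketch; the segment file names are made up, and stash's actual invocation may differ):

# build a list of the 20 segment files, then losslessly concatenate them
printf "file 'preview%03d.mp4'\n" $(seq 0 19) > list.txt
ffmpeg -f concat -safe 0 -i list.txt -c copy preview_final.mp4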

@r3538987
Contributor

r3538987 commented Jan 5, 2021

I just tried NVENC in HandBrake to see the difference on a random file.
After one minute, the CPU had encoded only 1 min 30 s of a simple 220 MB 720p WMV file.
In comparison, the RTX 2070S managed to encode the entire 5-minute video within that same minute.

Can I currently create my own build to edit the hardcoded command line and make use of NVENC? Is that possible?

@HASJ

HASJ commented Jan 11, 2021

Can I currently create my own build to edit the hardcoded command line and make use of NVENC? Is that possible?

Seconded.

@praul

praul commented Feb 9, 2021

Any news on this?
This is how Jellyfin handles GPU transcoding GUI-wise:
[screenshot of Jellyfin's hardware acceleration settings]

And this is the ffmpeg command
ffmpeg -vaapi_device /dev/dri/renderD128 -i file:"INPUT.mkv" -map_metadata -1 -map_chapters -1 -threads 0 -map 0:0 -map 0:1 -map -0:s -codec:v:0 h264_vaapi -b:v 6621920 -maxrate 6621920 -bufsize 13243840 -force_key_frames:0 "expr:gte(t,0+n_forced*3)" -g 72 -keyint_min 72 -sc_threshold 0 -vf "format=nv12|vaapi,hwupload,scale_vaapi=w=1022:h=574:format=nv12" -start_at_zero -vsync -1 -codec:a:0 aac -ac 6 -ab 256000 -copyts -avoid_negative_ts disabled -f hls -max_delay 5000000 -hls_time 3 -individual_header_trailer 0 -hls_segment_type mpegts -start_number 0 -hls_segment_filename "OUTPUT.ts" -hls_playlist_type vod -hls_list_size 0 -y "SOMEPLAYLISTIDONTKNOW.m3u8"

It is very performant and easy on the CPU.

@bnkai
Collaborator

bnkai commented Feb 9, 2021

Jellyfin uses a different player, so HLS is supported; that's not the case for stash, as jwplayer's HLS support depends on the browser AFAIK. That makes this feature more complicated to adapt.

@reduych

reduych commented Jun 21, 2021

For generating previews, I found that this really doesn't help much. Since previews are encoded only 0.75 seconds at a time, the overhead of creating and concatenating the twelve 0.75-second clips is probably much greater than generating the individual bursts. My GPU graph showed only very sparse spikes of usage (as opposed to continuous usage when converting larger files), even with 12 parallel tasks, while the CPU was still at 100% the whole time (doing the preparation and other processing). Overall it did not help much.

If anyone wants to test, change "-c:v", "libx264" to hevc_nvenc here.
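
A quick way to check this for yourself is to time one 0.75-second segment with each encoder; if both invocations take roughly the same wall time, per-process overhead rather than encoding speed is the bottleneck (a sketch; input.mp4 and the seek point are placeholders, and it assumes an NVENC-enabled ffmpeg):

time ffmpeg -ss 60 -i input.mp4 -t 0.75 -c:v libx264 -preset veryslow -an -y sw.mp4
time ffmpeg -ss 60 -i input.mp4 -t 0.75 -c:v h264_nvenc -an -y hw.mp4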

@willfe

willfe commented Jun 24, 2021

There are a few subtle issues involved in hardware encoding beyond what's been mentioned here (I rambled about them a bit in #894 (comment)):

  • Hardware encoders are pickier about input formats, color spaces, etc.
    • ffmpeg can handle the conversion, but that happens in software, so you're back to CPU-intensive work even with hardware encoding.
    • Setting that up means keeping lists of the formats each hardware encoder supports, comparing them against the format of the source file, and invoking inline conversion only when needed.
  • Hardware encoding on consumer-grade GPUs is usually artificially limited to no more than N simultaneous encodes (nvidia limits it to 2); the user can fix this by patching the driver, but it's an annoyance regardless. There's no way to auto-detect the current limit either; the drivers won't report it.
  • Fallback to software needs to be implemented to handle cases where the hardware encoder fails (bogus input format, too many encodes in progress, solar flares, etc.); a shell sketch of this follows the list.
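
A minimal shell sketch of that fallback idea (not stash code; file names are placeholders): attempt the hardware encoder, and rerun the same job in software if it exits non-zero.

if ! ffmpeg -i input.mp4 -c:v h264_nvenc -y out.mp4; then
    # NVENC failed (session limit, unsupported input, ...): redo in software
    ffmpeg -i input.mp4 -c:v libx264 -preset fast -y out.mp4
fi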

Now on the plus side:

  • Hardware decoding could potentially speed things up during hardware encoding if:
    • the source format is supported by the decoder (hardware decoders usually support more formats than the encoders), and
    • the entire job can be done in a single invocation of ffmpeg (the biggest speedup comes from keeping all the work and data on the GPU, which avoids expensive copies to/from main/video memory). From my understanding, stash currently invokes ffmpeg multiple times (once per desired segment), and a single invocation doing the same thing is slower because it reads the entire video instead of seeking to each segment, so this speedup might not be worth it unless ffmpeg can be made more efficient about this.

I don't think hardware decoding will help at all at the moment, though, given how ffmpeg is currently used. Reading compressed data and decoding it on-CPU, versus initializing the GPU decoder, reading the compressed data, shipping it to GPU memory, waiting for the decode, and then shipping the output back to main memory: I think software-only is faster in that case.

@jimz011

jimz011 commented Nov 27, 2022

I think the problem is not that the software transcoder is bad, but, for instance, I have files that ramp all my CPU cores up to 100%, interfering with other services that also need those cores (the very same thing happens when transcoding in software with Plex).

I have a pretty old CPU (4790K) and it has a lot of trouble playing some files because it simply can't keep up. The GPU, however, is a pretty decent one (GTX 1070) and has no problem doing multiple 4K hardware transcodes simultaneously without the CPU ramping up to 100%.

I understand that this is probably too hard to implement (or that people don't see the benefit) and thus will probably never come to Stash, but I wish it would. Yes, of course I can transcode by generating the files, but that takes up disk space.

@notme43

notme43 commented Feb 15, 2023

About 1/3 of my library is HEVC, in either 720p or 1080p. The software transcoder starts to struggle if I try outputting anything higher than 720p. I use Firefox everywhere, which doesn't support HEVC for licensing reasons, so it's always transcoding and tying up the host CPU.

I experimented with building Stash on top of the nvidia/cuda Docker stack and was able to get hardware-accelerated decoding and encoding. I'm pretty impressed with the results. I let a 1080p HEVC video stream as H264 for about 5 minutes: CPU load stayed around 1.00 while ffmpeg quickly filled the buffer and throttled the GPU. I noticed the biggest difference when using both NVDEC and NVENC; enabling just one didn't seem to affect CPU usage much. I'm using a GTX 1650 with a Ryzen 5 3600.

I don't know Golang; my changes are pretty hacky, and this isn't robust enough for a PR. But it works as a proof of concept, and I'm sure someone wiser can implement it properly. I did notice unintended behavior when accessing Stash over a reverse proxy with SSL: ffmpeg would peg the GPU at 100% and then fail after about 3 minutes of playing a video. This is probably due to my own nginx misconfiguration; it did not occur when accessing Stash directly.

Here is my modified Dockerfile from docker/build/x86_64/Dockerfile.

I changed the video codec in pkg/ffmpeg/codec.go on line 14:
VideoCodecLibX264 VideoCodec = "h264_nvenc"

And the ffmpeg arguments for StreamFormatH264 in pkg/ffmpeg/stream.go, starting on line 68. I found the "+" in front of frag_keyframe was strictly necessary; the rest I tuned to preference, because the default quality was quite poor.

StreamFormatH264 = StreamFormat{
        codec:    VideoCodecLibX264,
        format:   FormatMP4,
        MimeType: MimeMp4,
        extraArgs: []string{
                "-acodec", "aac",
                "-pix_fmt", "yuv420p",
                "-movflags", "+frag_keyframe+empty_moov",
                "-preset", "llhp",
                "-rc", "vbr",
                "-zerolatency", "1",
                "-temporal-aq", "1",
                "-cq", "24",
        },
}

Running make docker-build after this should produce a Stash container capable of GPU encoding. For decoding, I set -hwaccel auto under "FFmpeg LiveTranscode Input Args" in the interface. Setting it globally like this broke the other transcode formats where hardware-accelerated decoding is not possible (like WebM, the default transcode target), so I commented out the WebM scene routes and endpoints in internal/api/routes_scene.go as a workaround; it then always falls back to MP4.
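
For anyone who wants to sanity-check those encoder arguments outside stash first, something like this should approximate the streaming invocation (a sketch; the input file is a placeholder, and llhp is a legacy NVENC preset name on newer ffmpeg builds):

# fragmented MP4 to stdout, as for live streaming; discard the output for a pure speed test
ffmpeg -hwaccel auto -i input.mp4 \
  -c:v h264_nvenc -preset llhp -rc vbr -zerolatency 1 -temporal-aq 1 -cq 24 \
  -acodec aac -pix_fmt yuv420p -movflags +frag_keyframe+empty_moov \
  -f mp4 pipe:1 > /dev/null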

One of the obstacles mentioned by @willfe was the concurrent-transcode limit imposed by the Nvidia drivers. I didn't try this because my host is already patched, but the patch can be integrated into Docker containers so the user doesn't have to bother with it.

I think the missing piece for a possible all-in-one Stash container for hardware transcoding is the logic to determine when to use it, which is tricky depending on the particular GPU architecture the user has, even with the Nvidia CUDA tools.

Edit: Wow, preview generation is almost instantaneous.

@i-am-at0m

Would a similar technique allow for QuickSync transcoding?

@notme43

notme43 commented Feb 15, 2023

Would a similar technique allow for QuickSync transcoding?

AFAIK QuickSync leverages libva, so as long as the host has the supporting libraries, it would just be a matter of exposing the video card to the container, like this: --device /dev/dri/renderD128.
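
A docker run equivalent might look like this (a hedged sketch; the config path, port, and image tag are illustrative):

docker run -d \
  --device /dev/dri/renderD128:/dev/dri/renderD128 \
  -v /path/to/config:/root/.stash \
  -p 9999:9999 \
  stashapp/stash:latest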

@bnkai
Collaborator

bnkai commented Feb 15, 2023

There is an open PR, #3419, BTW, if anyone is interested in testing or providing some feedback.

@electblake

electblake commented Feb 15, 2023

Would a similar technique allow for QuickSync transcoding?

AFAIK QuickSync leverages libva, so as long as the host has the supporting libraries, it would just be a matter of exposing the video card to the container, like this: --device /dev/dri/renderD128.

Exactly what I was hoping you'd say.

Edit: maybe I'm getting ahead of myself, but this is the guide I used for exposing the card to Plex (it shows commands to list available devices, etc.; it's Synology-specific but may work for others):

https://medium.com/@MrNick4B/plex-on-docker-on-synology-enabling-hardware-transcoding-fa017190cad7

@Tweeticoats
Contributor

I have an unusual NAS with a Rockchip RK3399 ARM CPU.
It does support hardware decoding with the h264_rkmpp and hevc_rkmpp decoders.
I believe I need to compile ffmpeg myself to use these decoders, which I have not bothered with yet.

Would it be possible to have a setting to specify extra command-line arguments for edge cases like this?

@NodudeWasTaken
Contributor

NodudeWasTaken commented Mar 10, 2023

Great news: hardware encoding is now merged and ready for testing for anyone willing.
It should work for:

  • NVIDIA GPUs (h264_nvenc)
    The Docker image can be built with make docker-cuda-build; this creates the Docker tag stash/cuda-build:latest.
    You will additionally need to specify the args:
    --runtime=nvidia --gpus all --device /dev/nvidiactl --device /dev/nvidia0
  • Intel (h264_qsv, vp9_qsv)
    For Docker you must use the CUDA build and the arg --device=/dev/dri
  • Raspberry Pi (newer models) (h264_v4l2m2m)
  • AMD Linux and most VAAPI-supported platforms (h264_vaapi, vp9_vaapi) (hopefully)
    For Docker you must use the arg --device=/dev/dri

Note that RPi and VAAPI don't support direct file transcoding for h264 (mp4), so h264 hardware transcoding is only used for HLS.

Note that the normal Docker build only supports VAAPI and v4l2m2m.

You can check the logs for which codecs were found and enabled, and check the debug log for why any failed.
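
Putting the NVIDIA variant together, a full invocation might look like this (a sketch; the volume and port mappings are illustrative):

make docker-cuda-build
docker run -d \
  --runtime=nvidia --gpus all \
  --device /dev/nvidiactl --device /dev/nvidia0 \
  -v /path/to/config:/root/.stash \
  -p 9999:9999 \
  stash/cuda-build:latest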

@derN3rd

derN3rd commented Mar 10, 2023

Having this enabled on my Unraid 6.11.5 Server (Intel Celeron J3455) reports back no available HW codecs.

23-03-10 13:10:57 Info    [InitHWSupport] Supported HW codecs:

Plex manages to use HW acceleration just fine, so I'm not sure where to start looking here.

My docker-compose.yml already includes the device passthrough

    devices:
      - "/dev/dri/card0:/dev/dri/card0"
      - "/dev/dri/renderD128:/dev/dri/renderD128"

Any ideas/tips on how to get more information on this?

@i-am-at0m

i-am-at0m commented Mar 10, 2023 via email

@NodudeWasTaken
Contributor

NodudeWasTaken commented Mar 10, 2023

Having this enabled on my Unraid 6.11.5 Server (Intel Celeron J3455) reports back no available HW codecs. [...]

When stash starts, go to the web UI -> Settings -> Logs, set the log level to debug, find the entry with codec h264_qsv, and send the specific error.

@derN3rd

derN3rd commented Mar 10, 2023

Do you also have the intel-gpu-top plugin installed and have rebooted afterwards?

No, I didn't see this in the docs or in the commit. Is it used by stash or just for debugging? As the Linux on Unraid servers has no package manager, it's kinda hard to build packages for it yourself.

When stash starts, go to the web UI -> Settings -> Logs, set the log level to debug, find the entry with codec h264_qsv, and send the specific error.

Switching to debug or even trace shows nothing more at server startup.
When starting stash, the only hint of HW acceleration is [InitHWSupport] Supported HW codecs:. When I try to live transcode it works, but as slowly as with CPU only, and the logs show nothing related to HW acceleration (tried HLS, WebM, and DASH; all run slowly, apparently without HW acceleration).

2023-03-10 13:30:03 Debug [transcode] starting transcode for 24d73d4def4e2e9ab797d46e28b1292c_dash-v_1080 at segment #0
2023-03-10 13:30:03 Debug [transcode] starting transcode for 24d73d4def4e2e9ab797d46e28b1292c_dash-a_1080 at segment #0
2023-03-10 13:30:02 Debug [transcode] starting transcode for 24d73d4def4e2e9ab797d46e28b1292c_dash-v_1080 at segment #0
2023-03-10 13:30:02 Debug [transcode] starting transcode for 24d73d4def4e2e9ab797d46e28b1292c_dash-a_1080 at segment #0
2023-03-10 13:30:02 Debug [transcode] returning DASH manifest for scene 4711
2023-03-10 13:29:53 Debug [transcode] returning DASH manifest for scene 4711
2023-03-10 13:28:30 Debug [transcode] starting transcode for 24d73d4def4e2e9ab797d46e28b1292c_hls at segment #0
2023-03-10 13:28:29 Debug [transcode] returning HLS manifest for scene 4711
2023-03-10 13:28:10 Debug [transcode] streaming scene 4711 as video/webm
2023-03-10 13:28:08 Debug [transcode] streaming scene 4711 as video/webm

@NodudeWasTaken
Contributor

Are you passing /dev/dri to the container as a volume? Or as a device? (It should be the second one)

My docker-compose.yml has both entries configured as devices:

    devices:
      - "/dev/dri/card0:/dev/dri/card0"
      - "/dev/dri/renderD128:/dev/dri/renderD128"

I tried combinations with only one of them, and also tried limiting the container's memory as well as reserving more, which didn't change the error messages at all.

services:
  stash:
    image: stashapp/stash:development
    // [...]
    mem_limit: 2048m
    mem_reservation: 1024M

Still Out of memory in all cases

Could you try modifying the Docker build to add:
For the Alpine build: RUN apk add --no-cache mesa-dri-gallium libva-intel-driver
For the CUDA build: RUN apt install intel-media-va-driver-non-free -y
below the other apk add / apt install lines?
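
One way to try this without editing the repo is to extend the published image in place (a hedged sketch; the base tag and target name are illustrative):

docker build -t stash-vaapi - <<'EOF'
FROM stashapp/stash:latest
RUN apk add --no-cache mesa-dri-gallium libva-intel-driver
EOF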

@derN3rd

derN3rd commented Mar 14, 2023

I tried the image by CarlNs92891 (who deleted their message or had it deleted, I don't know), which does

apt install libvips-tools ffmpeg musl 
apt install intel-media-va-driver-non-free vainfo

and with that it works!

@derN3rd

derN3rd commented Jun 10, 2023

Can someone from the maintainers say what the current blocker is here?

I would really like to have this running in the official Docker images, so I can use Watchtower auto-updates for my containers; self-building with these tricks is not a good option for me.

How about also auto-releasing the CUDA image to Docker Hub as stashapp/stash:CUDA-latest or similar?

@i-am-at0m

QSV works I think?

@derN3rd

derN3rd commented Jun 10, 2023

QSV works I think?

I'm not sure anymore what kind of hardware encoding works on my NAS, but apparently it's not QSV.
I still get [InitHWSupport] Supported HW codecs: in my logs with the default latest Docker image.

With the CUDA image it works, but it's not published on Docker Hub, which is my main issue currently.

@nerethos

nerethos commented Jun 27, 2023

QSV transcoding works fine with the CUDA build. I agree it would be great if the maintainers could build and publish this on Docker Hub.

I've also had a go at modifying the CUDA build to include jellyfin-ffmpeg5, as it includes all the usermode drivers for QSV and NVENC, plus various optimisations for full hardware transcoding. This is working really well for me and the performance is great (I have a 12th-gen Intel iGPU). My only problem with the CUDA build and my jellyfin-ffmpeg build is that sprite/preview generation does not utilise hardware acceleration. Unfortunately, I don't have the knowledge to debug and fix it.

As the Jellyfin maintainers have already done a lot of hard work optimising hardware transcoding in ffmpeg, would it make sense for stash to work towards implementing their version?

@FoodFighters

Hardware encoding on consumer-grade GPUs is usually artificially limited to no more than N simultaneous encodes (nvidia limits it to 2); the user can fix this by patching the driver, but it's an annoyance regardless. [...]

Nvidia limited you to 3 encodes, not 2. And they recently changed it to 5.

@algers

algers commented Aug 24, 2023

QSV transcoding works fine with the CUDA build. [...]

Mind sharing the build file?

@anonstash

anonstash commented Aug 29, 2023

@algers nerethos shared this Docker Hub link in the Discord for the jellyfin-ffmpeg5 build: https://hub.docker.com/r/nerethos/stash-jellyfin-ffmpeg

Just wanted to add another data point: I wasn't able to get QSV working on an Alder Lake chip, but the jellyfin + CUDA build linked above worked out of the box. Hopefully we can get better HW encoding support added to the release build in the near future.

@wormvortex

@algers nerethos shared this Docker Hub link in the Discord for the jellyfin-ffmpeg5 build: https://hub.docker.com/r/nerethos/stash-jellyfin-ffmpeg [...]

This works perfectly. Any chance of it being updated to match the newest release :D

@guim31

guim31 commented Oct 16, 2023

When I pull this image https://hub.docker.com/r/nerethos/stash-jellyfin-ffmpeg in place of my installed nightly version, it crashes (the two are probably not swappable because of the date difference).
I hope I'll soon be able to use HW transcoding within my "classic" stash install.

@JeremyTsai26

I've also had a go at modifying the CUDA build to include jellyfin-ffmpeg5, as it includes all the usermode drivers for QSV and NVENC [...]

@nerethos
Can this version use VAAPI for transcoding with an old iGPU?

@Casper889

I got this working with an iGPU on the current Docker image release.

  1. Pass through the iGPU: /dev/dri/card0 and /dev/dri/renderD128 in my case.
  2. Install the driver in the Docker image: apk add libva-intel-driver
  3. In Stash's system settings, pass these arguments to ffmpeg: -hwaccel and auto
  4. In Stash's system settings, turn on FFmpeg hardware encoding.

Hope this helps someone. (A docker run sketch of steps 1 and 2 follows below.)
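
A docker run equivalent of steps 1 and 2 might look like this (a sketch; the config path and port are illustrative, and steps 3 and 4 are set in the web UI):

docker run -d \
  --device /dev/dri/card0 --device /dev/dri/renderD128 \
  -v /path/to/config:/root/.stash \
  -p 9999:9999 \
  stashapp/stash:latest \
  sh -c 'apk add --no-cache libva-intel-driver && stash'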

@razgriz88

I got this working with an iGPU on the current Docker image release.

I'm running Unraid with a 13th-gen Intel chip. I haven't had a chance to try this yet because I'm still building the server, but does this work with sprite and preview generation?
Those are the only two things I need from hardware acceleration.

@Casper889

Casper889 commented Dec 1, 2023

I got this working with an iGPU on the current Docker image release.

I'm running Unraid with a 13th-gen Intel chip. I haven't had a chance to try this yet because I'm still building the server, but does this work with sprite and preview generation? Those are the only two things I need from hardware acceleration.

I got this working on Unraid as well, but with a much older CPU (Ivy Bridge). Generation tasks still don't use hardware acceleration, just transcoding tasks. I'm not sure Stash supports this, as I didn't find any config options related to it.

@ChilledSlim
Contributor

ChilledSlim commented Dec 4, 2023

I made a script in my config folder called entrypoint.sh.
Be sure to chmod 755 entrypoint.sh so it's executable.

Its contents:

#!/bin/sh
# Add requirements for FansDB scraper
# cd /root/.stash/scrapers/FansDB-SHALookup && pip install -r requirements.txt
# Add VAAPI drivers
apk add libva-intel-driver

(Note: the script actually does other things as well, like setting my API keys, downloading FansDB, etc.)

The docker-compose file includes the following:

    devices:
      # VAAPI Devices
      - /dev/dri/renderD128:/dev/dri/renderD128
      - /dev/dri/card0:/dev/dri/card0
    command: sh -c "/root/.stash/entrypoint.sh && stash"

With that, it correctly shows the hardware codecs:
[InitHWSupport] Supported HW codecs: h264_vaapi vp9_vaapi

@deepradio

I made a script in my config folder called entrypoint.sh. Be sure to chmod 755 entrypoint.sh the script so it's executable.

Thanks for the script! But I am using a 13th-gen Intel CPU with Iris Xe graphics, and I had to install intel-media-driver instead of libva-intel-driver to make it work.

apk add --no-cache intel-media-driver

@parad0x3Dart

Confirming deepradio's comment works for me as well. My platform is an Intel(R) Celeron(R) N5105 @ 2.00GHz (from /proc/cpuinfo).

I didn't feel like bind-mounting a script, so I just used the following command directive:

command: /bin/sh -c 'apk --no-cache add intel-media-driver && stash'

Also, I found it easier to just mount the entire device:

devices:
- /dev/dri:/dev/dri

@Casper889

Confirming deepradio's comment works for me as well. [...]

Correct: the driver you need depends on the age of your CPU. This link will help: https://wiki.archlinux.org/title/Hardware_video_acceleration
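
To check which VAAPI driver and profiles the container actually loads, vainfo is useful (run inside the container; on Alpine it ships in the libva-utils package):

# prints the loaded VA driver and the supported encode/decode profiles
apk add --no-cache libva-utils
vainfo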

@WithoutPants
Collaborator

This looks like it's been completed by #3419. Is there any reason left to keep this open?

@codycjy

codycjy commented Feb 3, 2024

Currently I'm facing an issue enabling CUDA acceleration for encoding and decoding. I tried the -hwaccel cuda and -hwaccel_output_format cuda options, but they don't seem to work as expected. Can anyone provide guidance on how to properly configure CUDA-accelerated encoding and decoding?
I'm using the CUDA Docker build.
The main problems:

  • The output format (expected h264_nvenc)
  • The scale parameter must be replaced by scale_cuda when generating previews, or it raises an error.

@NodudeWasTaken
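
For reference, a full-GPU pipeline along those lines might look like this (a hedged sketch; paths are placeholders): once -hwaccel_output_format cuda keeps decoded frames in GPU memory, the CPU scale filter must indeed be replaced by scale_cuda.

ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
  -vf scale_cuda=640:360 \
  -c:v h264_nvenc -cq 24 -c:a aac -b:a 128k -y out.mp4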

@Jglrz

Jglrz commented Mar 19, 2024

Hardware transcoding for live streaming works fine. And I understand why some might want to use libx264 with higher presets for preview generation, but not everyone needs that, and it would be nice to be able to use hardware encoding for preview generation as well. Ideally this would be changeable in the settings, so it's not forced on everyone.

@davin900

davin900 commented Jun 5, 2024

I got this working with an iGPU on the current Docker image release. [...]

This worked perfectly for me up until version 26, which was just released. Now my logs don't indicate any hwaccel devices were detected. Any thoughts? Thanks in advance.

@WithoutPants
Collaborator

If you turn on debug logging and restart, you should get log output showing the testing for each codec, and the errors encountered to indicate they are not supported.

Another user fixed it by removing an old ffmpeg version (4.1) in their stash config directory, so that stash resolves the correct ffmpeg version (which in this case was 6.1).
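
A quick way to check for that situation (assuming the default config location; paths are illustrative):

# an ffmpeg binary in the config directory takes precedence over the system one
ls -l ~/.stash/ffmpeg* 2>/dev/null
ffmpeg -version | head -n1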

@davin900

davin900 commented Jun 9, 2024

If you turn on debug logging and restart, you should get log output showing the testing for each codec, and the errors encountered to indicate they are not supported.

Another user fixed it by removing an old ffmpeg version (4.1) in their stash config directory, so that stash resolves the correct ffmpeg version (which in this case was 6.1).

That was me! Sorry I made this comment before you responded on Discord. Thanks again for your help.

@wormvortex

Whereabouts in the config folder was this older ffmpeg? I can't see one in mine.

@i-am-at0m

i-am-at0m commented Jun 10, 2024 via email

@Hoempi

Hoempi commented Jun 26, 2024

I'm trying to get this to run as well. My platform is a Synology 918+ with an Intel Celeron J3455. I added /dev/dri:/dev/dri to the devices section of my compose file, and I installed both libva-intel-driver and intel-media-driver, but I still get the dreaded "[InitHWSupport] Supported HW codecs [0]:".

ffmpeg --version reports version 6.1.1. Does anyone have pointers on how to identify whether I need to install another package?
