
[Feature] hardware encoding #305

Open
iamjen023 opened this issue Jan 3, 2020 · 63 comments
Labels
feature Pull requests that add a new feature


@iamjen023

I would love NVIDIA NVENC support for transcoding and generation.
This would also work with AMD and Intel encoders.
It could speed up the generation process.

@bnkai
Collaborator

bnkai commented Jan 3, 2020

For file transcoding we use x264 with the faster preset, and that's probably the only place nvenc might be quicker (not sure about quality). BUT file transcoding IMHO is not really needed anymore, since we now have live stream transcoding for unsupported files. (IMHO the transcodes in the generated content section should be left unticked in 99% of cases.)

For live transcoding we produce vp9 webm files, which I think are not supported by hardware encoding except through Intel VAAPI, and I'm not sure about the stability/quality of that either.

Finally, for the generated previews and markers we use x264 with the veryslow preset to get the highest quality, since they are only generated once but viewed many times. If you wanted to make generation faster, that's where we could perhaps change the veryslow preset to medium or even fast and still get better quality/performance than hardware encoders. That's of course only for anyone willing to trade quality for speed, and only as an extra option, not the default.
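
For illustration, a single preview segment with that preset swap might look like the following (a sketch only; the input path and seek point are placeholders, and the remaining flags mirror the command quoted later in this thread):

# -preset medium instead of veryslow: faster generation, still competitive quality per bit
ffmpeg -ss 85.32 -i input.mp4 -t 0.75 -y \
  -c:v libx264 -pix_fmt yuv420p -profile:v high -level 4.2 \
  -preset medium -crf 21 \
  -vf scale=640:-2 -c:a aac -b:a 128k preview_segment.mp4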

@HASJ

HASJ commented Aug 18, 2020

The only way live streaming would even remotely be viable here is with hardware acceleration. Software-bound encoding is a no-go.
VP9 is even worse. I am using an FX-6300, which was not optimized for these tasks, to put it kindly.
The people asking for this feature need it. They do not care about the fabled and scary quality loss.

@CenterThrowaway

I'd add that since Pascal, hardware encoding on Nvidia GPUs is leaps and bounds better: comparable to CPU-based x264 up to the medium preset, I believe, while being much faster.

@praul

praul commented Jan 4, 2021

I too would be very pleased about this feature. It does not have to be as user-friendly as, for example, Jellyfin's hardware encoding support. It could be an advanced setting for adding parameters to ffmpeg (playback/live transcoding or preview generation). If it causes problems, users could just set it back to the default, but advanced users would be able to fiddle with it a little more.

It's quite easy to pass VAAPI support to Docker containers, and hardware encoding would greatly reduce my high CPU load.

@r3538987
Contributor

r3538987 commented Jan 5, 2021

Can someone share the command line that is used when generating video previews?
The only thing I see at the moment is this, when generation sometimes fails on WMVs:

ffmpeg.exe -v error -xerror -ss 85.32 -i F:\Downloads\1.wmv -t 0.75 -max_muxing_queue_size 1024 -y -c:v libx264 -pix_fmt yuv420p -profile:v high -level 4.2 -preset fast -crf 21 -threads 4 -vf scale=640:-2 -c:a aac -b:a 128k -strict -2 C:\Users\username\.stash-data\tmp\preview013.mp4: F:\Downloads\1.wmv: corrupt decoded frame in stream 1
I would like to play around and at least see how GPU encoding would help.

It took me 6 hours to generate video previews for 700 videos of approximately 0.5-4 GB each.
20 segments, 3% skip on both ends, fast preset, i5 4460.

@ghost

ghost commented Jan 5, 2021

Can someone share the command line that is used when generating video previews? [...]

That is the command for generating a preview segment. In your case it's run 20 times per video, with the results spliced together into the final preview. You can cut down on generation time by choosing fewer segments and setting the encoding preset to ultrafast.

We're investigating hardware acceleration for transcoding, but I have no idea whether it will be useful for generation, seeing as hardware acceleration likely has more startup latency.
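
For illustration, that splice step could be reproduced outside stash with ffmpeg's concat demuxer (a hypothetical sketch; the segment file names are made up, and stash's actual invocation may differ):

# build a list of the 20 segment files, then losslessly concatenate them
printf "file 'preview%03d.mp4'\n" $(seq 0 19) > list.txt
ffmpeg -f concat -safe 0 -i list.txt -c copy preview_final.mp4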

@r3538987
Contributor

r3538987 commented Jan 5, 2021

I just tried NVENC in HandBrake to see the difference on a random file.
After one minute, the CPU had encoded only 1 min 30 s of a simple 220 MB 720p WMV file.
In comparison, the RTX 2070S managed to encode the entire 5-minute video within that same minute.

Can I currently create my own build to edit the hardcoded command line and make use of NVENC? Is that possible?

@HASJ

HASJ commented Jan 11, 2021

Can I currently create my own build to edit the hardcoded command line and make use of NVENC? Is that possible?

Seconded.

@praul

praul commented Feb 9, 2021

Any news on this?
This is how Jellyfin handles GPU transcoding GUI-wise:
[screenshot of Jellyfin's hardware acceleration settings]

And this is the ffmpeg command
ffmpeg -vaapi_device /dev/dri/renderD128 -i file:"INPUT.mkv" -map_metadata -1 -map_chapters -1 -threads 0 -map 0:0 -map 0:1 -map -0:s -codec:v:0 h264_vaapi -b:v 6621920 -maxrate 6621920 -bufsize 13243840 -force_key_frames:0 "expr:gte(t,0+n_forced*3)" -g 72 -keyint_min 72 -sc_threshold 0 -vf "format=nv12|vaapi,hwupload,scale_vaapi=w=1022:h=574:format=nv12" -start_at_zero -vsync -1 -codec:a:0 aac -ac 6 -ab 256000 -copyts -avoid_negative_ts disabled -f hls -max_delay 5000000 -hls_time 3 -individual_header_trailer 0 -hls_segment_type mpegts -start_number 0 -hls_segment_filename "OUTPUT.ts" -hls_playlist_type vod -hls_list_size 0 -y "SOMEPLAYLISTIDONTKNOW.m3u8"

It is very performant and easy on the CPU.

@bnkai
Collaborator

bnkai commented Feb 9, 2021

Jellyfin uses a different player, so HLS is supported; that's not the case for stash, as jwplayer's HLS support depends on the browser AFAIK. That makes this feature more complicated to adapt.

@reduych

reduych commented Jun 21, 2021

For generating previews, I found that this really doesn't help much. Since previews are encoded only 0.75 seconds at a time, the overhead of creating and concatenating the twelve 0.75-second clips is probably much greater than generating the individual bursts. My GPU graph showed only very sparse spikes of usage (as opposed to continuous usage when converting larger files), even with 12 parallel tasks, while the CPU was still at 100% the whole time (doing the preparation and other processing). Overall it did not help much.

If anyone wants to test, change "-c:v", "libx264" to hevc_nvenc here.
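
A quick way to check this for yourself is to time one 0.75-second segment with each encoder; if both invocations take roughly the same wall time, per-process overhead rather than encoding speed is the bottleneck (a sketch; input.mp4 and the seek point are placeholders, and it assumes an NVENC-enabled ffmpeg):

time ffmpeg -ss 60 -i input.mp4 -t 0.75 -c:v libx264 -preset veryslow -an -y sw.mp4
time ffmpeg -ss 60 -i input.mp4 -t 0.75 -c:v h264_nvenc -an -y hw.mp4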

@willfe

willfe commented Jun 24, 2021

There are a few subtle issues involved in hardware encoding beyond what's been mentioned here (I rambled about them a bit in #894 (comment)):

  • Hardware encoders are pickier about input formats, color spaces, etc.
    • ffmpeg can handle the conversion, but that happens in software, so you're back to CPU-intensive work even with hardware encoding.
    • Setting that up means keeping lists of the formats each hardware encoder supports, comparing them against the format of the source file, and invoking inline conversion only when needed.
  • Hardware encoding on consumer-grade GPUs is usually artificially limited to no more than N simultaneous encodes (nvidia limits it to 2); the user can fix this by patching the driver, but it's an annoyance regardless. There's no way to auto-detect the current limit either; the drivers won't report it.
  • Fallback to software needs to be implemented to handle cases where the hardware encoder fails (bogus input format, too many encodes in progress, solar flares, etc.); a shell sketch of this follows the list.
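
A minimal shell sketch of that fallback idea (not stash code; file names are placeholders): attempt the hardware encoder, and rerun the same job in software if it exits non-zero.

if ! ffmpeg -i input.mp4 -c:v h264_nvenc -y out.mp4; then
    # NVENC failed (session limit, unsupported input, ...): redo in software
    ffmpeg -i input.mp4 -c:v libx264 -preset fast -y out.mp4
fi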

Now on the plus side:

  • Hardware decoding could potentially speed things up during hardware encoding if:
    • the source format is supported by the decoder (hardware decoders usually support more formats than the encoders), and
    • the entire job can be done in a single invocation of ffmpeg (the biggest speedup comes from keeping all the work and data on the GPU, which avoids expensive copies to/from main/video memory). From my understanding, stash currently invokes ffmpeg multiple times (once per desired segment), and a single invocation doing the same thing is slower because it reads the entire video instead of seeking to each segment, so this speedup might not be worth it unless ffmpeg can be made more efficient about this.

I don't think hardware decoding will help at all at the moment, though, given how ffmpeg is currently used. Reading compressed data and decoding it on-CPU, versus initializing the GPU decoder, reading the compressed data, shipping it to GPU memory, waiting for the decode, and then shipping the output back to main memory: I think software-only is faster in that case.

@jimz011

jimz011 commented Nov 27, 2022

I think the problem is not that the software transcoder is bad, but, for instance, I have files that ramp all my CPU cores up to 100%, interfering with other services that also need those cores (the very same thing happens when transcoding in software with Plex).

I have a pretty old CPU (4790K) and it has a lot of trouble playing some files because it simply can't keep up. The GPU, however, is a pretty decent one (GTX 1070) and has no problem doing multiple 4K hardware transcodes simultaneously without the CPU ramping up to 100%.

I understand that this is probably too hard to implement (or that people don't see the benefit) and thus will probably never come to Stash, but I wish it would. Yes, of course I can transcode by generating the files, but that takes up disk space.

@notme43

notme43 commented Feb 15, 2023

About 1/3 of my library is HEVC, in either 720p or 1080p. The software transcoder starts to struggle if I try outputting anything higher than 720p. I use Firefox everywhere, which doesn't support HEVC for licensing reasons, so it's always transcoding and tying up the host CPU.

I experimented with building Stash on top of the nvidia/cuda Docker stack and was able to get hardware-accelerated decoding and encoding. I'm pretty impressed with the results. I let a 1080p HEVC video stream as H264 for about 5 minutes: CPU load stayed around 1.00 while ffmpeg quickly filled the buffer and throttled the GPU. I noticed the biggest difference when using both NVDEC and NVENC; enabling just one didn't seem to affect CPU usage much. I'm using a GTX 1650 with a Ryzen 5 3600.

I don't know Golang; my changes are pretty hacky, and this isn't robust enough for a PR. But it works as a proof of concept, and I'm sure someone wiser can implement it properly. I did notice unintended behavior when accessing Stash over a reverse proxy with SSL: ffmpeg would peg the GPU at 100% and then fail after about 3 minutes of playing a video. This is probably due to my own nginx misconfiguration; it did not occur when accessing Stash directly.

Here is my modified Dockerfile from docker/build/x86_64/Dockerfile.

I changed the video codec in pkg/ffmpeg/codec.go on line 14:
VideoCodecLibX264 VideoCodec = "h264_nvenc"

And the ffmpeg arguments for StreamFormatH264 in pkg/ffmpeg/stream.go, starting on line 68. I found the "+" in front of frag_keyframe was strictly necessary; the rest I tuned to preference, because the default quality was quite poor.

StreamFormatH264 = StreamFormat{
        codec:    VideoCodecLibX264,
        format:   FormatMP4,
        MimeType: MimeMp4,
        extraArgs: []string{
                "-acodec", "aac",
                "-pix_fmt", "yuv420p",
                "-movflags", "+frag_keyframe+empty_moov",
                "-preset", "llhp",
                "-rc", "vbr",
                "-zerolatency", "1",
                "-temporal-aq", "1",
                "-cq", "24",
        },
}

Running make docker-build after this should produce a Stash container capable of GPU encoding. For decoding, I set -hwaccel auto under "FFmpeg LiveTranscode Input Args" in the interface. Setting it globally like this broke the other transcode formats where hardware-accelerated decoding is not possible (like WebM, the default transcode target), so I commented out the WebM scene routes and endpoints in internal/api/routes_scene.go as a workaround; it then always falls back to MP4.
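
For anyone who wants to sanity-check those encoder arguments outside stash first, something like this should approximate the streaming invocation (a sketch; the input file is a placeholder, and llhp is a legacy NVENC preset name on newer ffmpeg builds):

# fragmented MP4 to stdout, as for live streaming; discard the output for a pure speed test
ffmpeg -hwaccel auto -i input.mp4 \
  -c:v h264_nvenc -preset llhp -rc vbr -zerolatency 1 -temporal-aq 1 -cq 24 \
  -acodec aac -pix_fmt yuv420p -movflags +frag_keyframe+empty_moov \
  -f mp4 pipe:1 > /dev/null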

One of the obstacles mentioned by @willfe was the concurrent-transcode limit imposed by the Nvidia drivers. I didn't try this because my host is already patched, but the patch can be integrated into Docker containers so the user doesn't have to bother with it.

I think the missing piece for a possible all-in-one Stash container for hardware transcoding is the logic to determine when to use it, which is tricky depending on the particular GPU architecture the user has, even with the Nvidia CUDA tools.

Edit: Wow, preview generation is almost instantaneous.

@i-am-at0m

Would a similar technique allow for QuickSync transcoding?

@notme43

notme43 commented Feb 15, 2023

Would a similar technique allow for QuickSync transcoding?

AFAIK QuickSync leverages libva, so as long as the host has the supporting libraries, it would just be a matter of exposing the video card to the container, like this: --device /dev/dri/renderD128.
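
A docker run equivalent might look like this (a hedged sketch; the config path, port, and image tag are illustrative):

docker run -d \
  --device /dev/dri/renderD128:/dev/dri/renderD128 \
  -v /path/to/config:/root/.stash \
  -p 9999:9999 \
  stashapp/stash:latest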

@bnkai
Collaborator

bnkai commented Feb 15, 2023

There is an open PR, #3419, BTW, if anyone is interested in testing or providing some feedback.

@electblake

electblake commented Feb 15, 2023

Would a similar technique allow for QuickSync transcoding?

AFAIK QuickSync leverages libva, so as long as the host has the supporting libraries, it would just be a matter of exposing the video card to the container, like this: --device /dev/dri/renderD128.

Exactly what I was hoping you'd say.

Edit: maybe I'm getting ahead of myself, but this is the guide I used for exposing the card to Plex (it shows commands to list available devices, etc.; it's Synology-specific but may work for others):

https://medium.com/@MrNick4B/plex-on-docker-on-synology-enabling-hardware-transcoding-fa017190cad7

@Tweeticoats
Contributor

I have an unusual NAS with a Rockchip RK3399 ARM CPU.
It does support hardware decoding with the h264_rkmpp and hevc_rkmpp decoders.
I believe I need to compile ffmpeg myself to use these decoders, which I have not bothered with yet.

Would it be possible to have a setting to specify extra command-line arguments for edge cases like this?

@NodudeWasTaken
Contributor

NodudeWasTaken commented Mar 10, 2023

Great news: hardware encoding is now merged and ready for testing for anyone willing.
It should work for:

  • NVIDIA GPUs (h264_nvenc)
    The Docker image can be built with make docker-cuda-build; this creates the Docker tag stash/cuda-build:latest.
    You will additionally need to specify the args:
    --runtime=nvidia --gpus all --device /dev/nvidiactl --device /dev/nvidia0
  • Intel (h264_qsv, vp9_qsv)
    For Docker you must use the CUDA build and the arg --device=/dev/dri
  • Raspberry Pi (newer models) (h264_v4l2m2m)
  • AMD Linux and most VAAPI-supported platforms (h264_vaapi, vp9_vaapi) (hopefully)
    For Docker you must use the arg --device=/dev/dri

Note that RPi and VAAPI don't support direct file transcoding for h264 (mp4), so h264 hardware transcoding is only used for HLS.

Note that the normal Docker build only supports VAAPI and v4l2m2m.

You can check the logs for which codecs were found and enabled, and check the debug log for why any failed.
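
Putting the NVIDIA variant together, a full invocation might look like this (a sketch; the volume and port mappings are illustrative):

make docker-cuda-build
docker run -d \
  --runtime=nvidia --gpus all \
  --device /dev/nvidiactl --device /dev/nvidia0 \
  -v /path/to/config:/root/.stash \
  -p 9999:9999 \
  stash/cuda-build:latest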

@derN3rd

derN3rd commented Mar 10, 2023

Having this enabled on my Unraid 6.11.5 Server (Intel Celeron J3455) reports back no available HW codecs.

23-03-10 13:10:57 Info    [InitHWSupport] Supported HW codecs:

Plex manages to use HW acceleration just fine, so I'm not sure where to start looking here.

My docker-compose.yml already includes the device passthrough

    devices:
      - "/dev/dri/card0:/dev/dri/card0"
      - "/dev/dri/renderD128:/dev/dri/renderD128"

Any ideas/tips on how to get more information on this?

@i-am-at0m

i-am-at0m commented Mar 10, 2023 via email

@NodudeWasTaken
Contributor

NodudeWasTaken commented Mar 10, 2023

Having this enabled on my Unraid 6.11.5 Server (Intel Celeron J3455) reports back no available HW codecs. [...]

When stash starts, go to the web UI -> Settings -> Logs, set the log level to debug, find the entry with codec h264_qsv, and send the specific error.

@derN3rd

derN3rd commented Mar 10, 2023

Do you also have the intel-gpu-top plugin installed and have rebooted afterwards?

No, I didn't see this in the docs or in the commit. Is it used by stash or just for debugging? As the Linux on Unraid servers has no package manager, it's kinda hard to build packages for it yourself.

When stash starts, go to the web UI -> Settings -> Logs, set the log level to debug, find the entry with codec h264_qsv, and send the specific error.

Switching to debug or even trace shows nothing more at server startup.
When starting stash, the only hint of HW acceleration is [InitHWSupport] Supported HW codecs:. When I try to live transcode it works, but as slowly as with CPU only, and the logs show nothing related to HW acceleration (tried HLS, WebM, and DASH; all run slowly, apparently without HW acceleration).

2023-03-10 13:30:03 Debug [transcode] starting transcode for 24d73d4def4e2e9ab797d46e28b1292c_dash-v_1080 at segment #0
2023-03-10 13:30:03 Debug [transcode] starting transcode for 24d73d4def4e2e9ab797d46e28b1292c_dash-a_1080 at segment #0
2023-03-10 13:30:02 Debug [transcode] starting transcode for 24d73d4def4e2e9ab797d46e28b1292c_dash-v_1080 at segment #0
2023-03-10 13:30:02 Debug [transcode] starting transcode for 24d73d4def4e2e9ab797d46e28b1292c_dash-a_1080 at segment #0
2023-03-10 13:30:02 Debug [transcode] returning DASH manifest for scene 4711
2023-03-10 13:29:53 Debug [transcode] returning DASH manifest for scene 4711
2023-03-10 13:28:30 Debug [transcode] starting transcode for 24d73d4def4e2e9ab797d46e28b1292c_hls at segment #0
2023-03-10 13:28:29 Debug [transcode] returning HLS manifest for scene 4711
2023-03-10 13:28:10 Debug [transcode] streaming scene 4711 as video/webm
2023-03-10 13:28:08 Debug [transcode] streaming scene 4711 as video/webm

@NodudeWasTaken
Contributor

Are you passing /dev/dri to the container as a volume? Or as a device? (It should be the second one)

My docker-compose.yml has both entries configured as devices:

    devices:
      - "/dev/dri/card0:/dev/dri/card0"
      - "/dev/dri/renderD128:/dev/dri/renderD128"

I tried combinations with only one of them, and also tried limiting the container's memory as well as reserving more, which didn't change the error messages at all.

services:
  stash:
    image: stashapp/stash:development
    // [...]
    mem_limit: 2048m
    mem_reservation: 1024M

Still Out of memory in all cases

Could you try modifying the Docker build to add:
For the Alpine build: RUN apk add --no-cache mesa-dri-gallium libva-intel-driver
For the CUDA build: RUN apt install intel-media-va-driver-non-free -y
below the other apk add / apt install lines?
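
One way to try this without editing the repo is to extend the published image in place (a hedged sketch; the base tag and target name are illustrative):

docker build -t stash-vaapi - <<'EOF'
FROM stashapp/stash:latest
RUN apk add --no-cache mesa-dri-gallium libva-intel-driver
EOF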

@derN3rd

derN3rd commented Mar 14, 2023

I tried the image by CarlNs92891 (who deleted their message or had it deleted, I don't know), which does

apt install libvips-tools ffmpeg musl 
apt install intel-media-va-driver-non-free vainfo

and with that it works!

@derN3rd

derN3rd commented Jun 10, 2023

Can someone from the maintainers say what the current blocker is here?

I would really like to have this running in the official Docker images, so I can use Watchtower auto-updates for my containers; self-building with these tricks is not a good option for me.

How about also auto-releasing the CUDA image to Docker Hub as stashapp/stash:CUDA-latest or similar?

@i-am-at0m

QSV works I think?

@derN3rd

derN3rd commented Jun 10, 2023

QSV works I think?

I'm not sure anymore what kind of hardware encoding works on my NAS, but apparently it's not QSV.
I still get [InitHWSupport] Supported HW codecs: in my logs with the default latest Docker image.

With the CUDA image it works, but it's not published on Docker Hub, which is my main issue currently.

@nerethos

nerethos commented Jun 27, 2023

QSV transcoding works fine with the CUDA build. I agree it would be great if the maintainers could build and publish this on Docker Hub.

I've also had a go at modifying the CUDA build to include jellyfin-ffmpeg5, as it includes all the usermode drivers for QSV and NVENC, plus various optimisations for full hardware transcoding. This is working really well for me and the performance is great (I have a 12th-gen Intel iGPU). My only problem with the CUDA build and my jellyfin-ffmpeg build is that sprite/preview generation does not utilise hardware acceleration. Unfortunately, I don't have the knowledge to debug and fix it.

As the Jellyfin maintainers have already done a lot of hard work optimising hardware transcoding in ffmpeg, would it make sense for stash to work towards implementing their version?

@FoodFighters

Hardware encoding on consumer-grade GPUs is usually artificially limited to no more than N simultaneous encodes (nvidia limits it to 2); the user can fix this by patching the driver, but it's an annoyance regardless. [...]

Nvidia limited you to 3 encodes, not 2. And they recently changed it to 5.

@algers

algers commented Aug 24, 2023

QSV transcoding works fine with the CUDA build. [...]

Mind sharing the build file?

@anonstash

anonstash commented Aug 29, 2023

@algers nerethos shared this Docker Hub link in the Discord for the jellyfin-ffmpeg5 build: https://hub.docker.com/r/nerethos/stash-jellyfin-ffmpeg

Just wanted to add another data point: I wasn't able to get QSV working on an Alder Lake chip, but the jellyfin + CUDA build linked above worked out of the box. Hopefully we can get better HW encoding support added to the release build in the near future.

@wormvortex

@algers nerethos shared this Docker Hub link in the Discord for the jellyfin-ffmpeg5 build: https://hub.docker.com/r/nerethos/stash-jellyfin-ffmpeg [...]

This works perfectly. Any chance of it being updated to match the newest release :D

@guim31

guim31 commented Oct 16, 2023

When I pull this image https://hub.docker.com/r/nerethos/stash-jellyfin-ffmpeg in place of my installed nightly version, it crashes (the two are probably not swappable because of the date difference).
I hope I'll soon be able to use HW transcoding within my "classic" stash install.

@JeremyTsai26

I've also had a go at modifying the CUDA build to include jellyfin-ffmpeg5, as it includes all the usermode drivers for QSV and NVENC [...]

@nerethos
Can this version use VAAPI for transcoding with an old iGPU?

@Casper889

I got this working with an iGPU on the current Docker image release.

  1. Pass through the iGPU: /dev/dri/card0 and /dev/dri/renderD128 in my case.
  2. Install the driver in the Docker image: apk add libva-intel-driver
  3. In Stash's system settings, pass these arguments to ffmpeg: -hwaccel and auto
  4. In Stash's system settings, turn on FFmpeg hardware encoding.

Hope this helps someone. (A docker run sketch of steps 1 and 2 follows below.)
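
A docker run equivalent of steps 1 and 2 might look like this (a sketch; the config path and port are illustrative, and steps 3 and 4 are set in the web UI):

docker run -d \
  --device /dev/dri/card0 --device /dev/dri/renderD128 \
  -v /path/to/config:/root/.stash \
  -p 9999:9999 \
  stashapp/stash:latest \
  sh -c 'apk add --no-cache libva-intel-driver && stash'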

@razgriz88

I got this working with an iGPU on the current Docker image release.

I'm running Unraid with a 13th-gen Intel chip. I haven't had a chance to try this yet because I'm still building the server, but does this work with sprite and preview generation?
Those are the only two things I need from hardware acceleration.

@Casper889

Casper889 commented Dec 1, 2023

I got this working with an iGPU on the current Docker image release.

I'm running Unraid with a 13th-gen Intel chip. I haven't had a chance to try this yet because I'm still building the server, but does this work with sprite and preview generation? Those are the only two things I need from hardware acceleration.

I got this working on Unraid as well, but with a much older CPU (Ivy Bridge). Generation tasks still don't use hardware acceleration, just transcoding tasks. I'm not sure Stash supports this, as I didn't find any config options related to it.

@ChilledSlim
Contributor

ChilledSlim commented Dec 4, 2023

I made a script in my config folder called entrypoint.sh.
Be sure to chmod 755 entrypoint.sh so it's executable.

Its contents:

#!/bin/sh
# Add requirements for FansDB scraper
# cd /root/.stash/scrapers/FansDB-SHALookup && pip install -r requirements.txt
# Add VAAPI drivers
apk add libva-intel-driver

(Note: the script actually does other things as well, like setting my API keys, downloading FansDB, etc.)

The docker-compose file includes the following:

    devices:
      # VAAPI Devices
      - /dev/dri/renderD128:/dev/dri/renderD128
      - /dev/dri/card0:/dev/dri/card0
    command: sh -c "/root/.stash/entrypoint.sh && stash"

With that, it correctly shows the hardware codecs:
[InitHWSupport] Supported HW codecs: h264_vaapi vp9_vaapi

@deepradio

I made a script in my config folder called entrypoint.sh. Be sure to chmod 755 entrypoint.sh the script so it's executable.

Thanks for the script! But I am using a 13th-gen Intel CPU with Iris Xe graphics, and I had to install intel-media-driver instead of libva-intel-driver to make it work.

apk add --no-cache intel-media-driver

@parad0x3Dart

Confirming deepradio's comment works for me as well. My platform is an Intel(R) Celeron(R) N5105 @ 2.00GHz (from /proc/cpuinfo).

I didn't feel like bind-mounting a script, so I just used the following command directive:

command: /bin/sh -c 'apk --no-cache add intel-media-driver && stash'

Also, I found it easier to just mount the entire device:

devices:
- /dev/dri:/dev/dri

@Casper889

Confirming deepradio's comment works for me as well. [...]

Correct: the driver you need depends on the age of your CPU. This link will help: https://wiki.archlinux.org/title/Hardware_video_acceleration
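
To check which VAAPI driver and profiles the container actually loads, vainfo is useful (run inside the container; on Alpine it ships in the libva-utils package):

# prints the loaded VA driver and the supported encode/decode profiles
apk add --no-cache libva-utils
vainfo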

@WithoutPants
Collaborator

This looks like it's been completed by #3419. Is there any reason left to keep this open?

@codycjy

codycjy commented Feb 3, 2024

Currently I'm facing an issue enabling CUDA acceleration for encoding and decoding. I tried the -hwaccel cuda and -hwaccel_output_format cuda options, but they don't seem to work as expected. Can anyone provide guidance on how to properly configure CUDA-accelerated encoding and decoding?
I'm using the CUDA Docker build.
The main problems:

  • The output format (expected h264_nvenc)
  • The scale parameter must be replaced by scale_cuda when generating previews, or it raises an error.

@NodudeWasTaken
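
For reference, a full-GPU pipeline along those lines might look like this (a hedged sketch; paths are placeholders): once -hwaccel_output_format cuda keeps decoded frames in GPU memory, the CPU scale filter must indeed be replaced by scale_cuda.

ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
  -vf scale_cuda=640:360 \
  -c:v h264_nvenc -cq 24 -c:a aac -b:a 128k -y out.mp4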

@Jglrz

Jglrz commented Mar 19, 2024

Hardware transcoding for live streaming works fine. And I understand why some might want to use libx264 with higher presets for preview generation, but not everyone needs that, and it would be nice to be able to use hardware encoding for preview generation as well. Ideally this would be changeable in the settings, so it's not forced on everyone.

@davin900

davin900 commented Jun 5, 2024

I got this working with an iGPU on the current Docker image release. [...]

This worked perfectly for me up until version 26, which was just released. Now my logs don't indicate any hwaccel devices were detected. Any thoughts? Thanks in advance.

@WithoutPants
Collaborator

If you turn on debug logging and restart, you should get log output showing the testing for each codec, and the errors encountered to indicate they are not supported.

Another user fixed it by removing an old ffmpeg version (4.1) in their stash config directory, so that stash resolves the correct ffmpeg version (which in this case was 6.1).
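
A quick way to check for that situation (assuming the default config location; paths are illustrative):

# an ffmpeg binary in the config directory takes precedence over the system one
ls -l ~/.stash/ffmpeg* 2>/dev/null
ffmpeg -version | head -n1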

@davin900

davin900 commented Jun 9, 2024

If you turn on debug logging and restart, you should get log output showing the testing for each codec, and the errors encountered to indicate they are not supported.

Another user fixed it by removing an old ffmpeg version (4.1) in their stash config directory, so that stash resolves the correct ffmpeg version (which in this case was 6.1).

That was me! Sorry I made this comment before you responded on Discord. Thanks again for your help.

@wormvortex

Whereabouts in the config folder was this older ffmpeg? I can't see one in mine.

@i-am-at0m

i-am-at0m commented Jun 10, 2024 via email

@Hoempi

Hoempi commented Jun 26, 2024

I'm trying to get this to run as well. My platform is a Synology 918+ with an Intel Celeron J3455. I added /dev/dri:/dev/dri to the devices section of my compose file, and I installed both libva-intel-driver and intel-media-driver, but I still get the dreaded "[InitHWSupport] Supported HW codecs [0]:".

ffmpeg --version reports version 6.1.1. Does anyone have pointers on how to identify whether I need to install another package?
