[Feature] hardware encoding #305
For file transcoding we use x264 with the faster preset, and that's probably the only place NVENC might be quicker (not sure about quality). But file transcoding, IMHO, isn't needed much anymore, since we now have live stream transcoding for unsupported files (IMHO the transcodes in the generated content section should be left unticked in 99% of cases). For live transcoding we produce VP9 WebM files, which aren't supported by hardware encoders except through Intel VAAPI, I think, and I'm not sure about the stability/quality of that either. Finally, for the generated previews and markers we use x264 with the veryslow preset to get the highest quality, since they are generated once but viewed many times. If you wanted to make generation faster, that's where we could perhaps offer changing the veryslow preset to medium or even fast and still get better quality/performance than hardware encoders. That's of course only for anyone willing to trade quality for speed, and only as an extra option, not the default.
The only way live streaming would even remotely be viable here is with hardware acceleration. Software-bound encoding is a no-go.
I'd add that after Pascal, hardware encoding with Nvidia GPUs is leaps and bounds better, comparable to CPU-based x264 up to the medium preset, I believe, while being much faster.
I too would be very pleased about this feature. It doesn't have to be as user-friendly as, for example, Jellyfin's hardware encoding support. It could be an advanced setting for adding ffmpeg parameters (for playback/live transcoding or preview generation). If it causes problems, users could just set it back to default, but advanced users would be able to fiddle with it a little more. It's quite easy to pass VAAPI support to Docker containers, and hardware encoding would greatly reduce my high CPU loads.
Can someone share the command line that's used when generating video previews?
It took me 6 hours to generate video previews for 700 videos, each approximately 0.5-4 GB.
That is the command for generating a preview segment. In your case it's run 20 times for each video, with the results spliced together into the final preview. You can cut down on generation time by choosing fewer segments and setting the encoding preset to ultrafast. We're investigating hardware acceleration for transcoding, but I have no idea if it will be useful for generation, seeing as hardware acceleration likely has more startup latency.
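To make the startup-latency concern concrete, the per-segment approach described above can be sketched as a loop of short encodes that are concatenated afterwards. The segment count (20) and clip length (0.75 s) come from this thread; the file names, total duration, and exact x264 flags below are illustrative assumptions, not stash's actual command:

```shell
#!/bin/sh
# Sketch of per-segment preview generation (segment count and clip length
# from this thread; everything else is an assumption for illustration).
DURATION=1200   # total length of the source video, in seconds (example value)
SEGMENTS=20
CLIP_LEN=0.75

i=0
while [ "$i" -lt "$SEGMENTS" ]; do
    # spread the segment start points evenly across the file
    START=$(( DURATION * i / SEGMENTS ))
    echo "ffmpeg -ss $START -t $CLIP_LEN -i input.mp4 -c:v libx264 -preset veryslow -an chunk_$i.mp4"
    i=$(( i + 1 ))
done
# the chunks would then be spliced together, e.g. with ffmpeg's concat demuxer:
# ffmpeg -f concat -i chunks.txt -c copy preview.mp4
```

Each of those 20 invocations pays the full ffmpeg startup cost for only 0.75 s of output, which is why a hardware encoder's initialization overhead can dominate here.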
I just tried NVENC in the HandBrake app on some random file, just to see the difference. Can I currently create my own build to edit the hardcoded command line and make use of NVENC? Is that possible?
Seconded. |
Jellyfin uses a different player, so HLS is supported; that's not the case for stash, as JWPlayer's HLS support depends on the browser AFAIK. That makes this issue more complicated to adapt to.
For generating previews I found that this really doesn't help much. Since previews are converted only 0.75 seconds at a time, the overhead of creating and concatenating twelve 0.75-second clips is probably a lot more than generating those individual bursts. Here's what my GPU graph looked like - notice only very sparse spikes of usage (as opposed to continuous usage when converting larger files), even with 12 parallel tasks, while the CPU was still at 100% the whole time (doing the preparation and other processing). Overall it did not help much. If anyone wants to test, change
There are a few subtle issues involved in hardware encoding beyond what's been mentioned here (I rambled about them a bit in #894 (comment)).
Now, on the plus side:
I don't think hardware decoding will help at all at the moment, though, given how ffmpeg is currently used. Reading compressed data and decoding it on-CPU, versus initializing the GPU decoder, reading the compressed data, shipping it to GPU memory, waiting for the decode, and then shipping the output back to main memory - I think software-only is faster in that case.
I think the problem is not that the software decoder is bad; for instance, I have files that will ramp my CPU cores up to 100%, interfering with other services that also need those cores (the very same thing happens when transcoding in software with Plex). I have a pretty old CPU (4790K) and it has a lot of trouble playing some files because it simply can't keep up. The GPU, however, is a pretty decent one (GTX 1070) and has no problem doing multiple simultaneous 4K hardware transcodes without my CPU ramping up to 100%. I understand that this is probably too hard to implement (or that people don't see the benefit) and thus may never come to stash, but I wish it would. Yes, of course I can transcode by generating the files, but that takes up disk space.
About 1/3 of my library is HEVC, in either 720p or 1080p. The software transcoder starts to struggle if I try outputting anything higher than 720p. I use Firefox on everything, which doesn't support HEVC for licensing reasons, so it's always transcoding and tying up the host CPU.
I experimented with building stash on top of the nvidia/cuda Docker stack and was able to achieve hardware-accelerated decoding and encoding. I'm pretty impressed with the results. I let a 1080p HEVC video stream in H264 for about 5 minutes - CPU load stayed around 1.00 while ffmpeg quickly filled the buffer and throttled the GPU. I noticed the biggest difference when using both NVDEC and NVENC - enabling just one didn't seem to affect CPU usage much. I'm using a GTX 1650 with a Ryzen 5 3600.
I don't know Golang, my changes are pretty hacky, and this isn't robust enough for a PR, but it works as a proof of concept and I'm sure someone wiser can implement it properly. I did notice unintended behavior when accessing stash over a reverse proxy + SSL: ffmpeg would peg the GPU at 100% and then fail after about 3 minutes of playing a video. This is probably due to my own nginx misconfiguration; it did not occur when accessing stash directly.
Here is my modified Dockerfile from I changed the video codec in And the ffmpeg arguments for StreamFormatH264 in
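The poster's patched ffmpeg arguments aren't shown above, so as a rough illustration only: a typical stock-ffmpeg pipeline that uses both NVDEC and NVENC (the combination the poster found made the biggest difference) looks something like the following. The bitrate, preset, and file names are arbitrary examples, not stash's settings:

```shell
# Decode on the GPU (NVDEC) and keep frames in GPU memory, then encode with NVENC.
# Illustrative sketch only; requires an Nvidia GPU and an ffmpeg built with CUDA support.
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
       -i input.mkv \
       -c:v h264_nvenc -b:v 6M \
       -c:a aac \
       output.mp4
```

The `-hwaccel_output_format cuda` part matters: without it, decoded frames are copied back to system memory between decode and encode, which is one reason enabling only NVDEC or only NVENC on its own showed little CPU benefit.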
Running
One of the obstacles mentioned by @willfe was the transcode session limit imposed by the Nvidia drivers. I didn't try this because my host is already patched, but the transcode limit patch can be integrated into Docker containers so the user doesn't have to bother with it. I think the missing piece for a possible all-in-one stash container with hardware transcoding is the logic to determine when to use it, which is tricky depending on the particular GPU architecture the user has, even with the Nvidia CUDA tools.
Edit: Wow, preview generation is almost instantaneous.
Would a similar technique allow for QuickSync transcoding? |
AFAIK QuickSync leverages libva, so as long as the host has the supporting libraries, it's just a matter of exposing the video card to the container like this
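For anyone unfamiliar with that kind of passthrough: in docker-compose it usually just means mapping the host's DRI render devices into the container, the same way other comments in this thread do. The service name is an example; the device path can differ per host:

```yaml
services:
  stash:
    # expose the host's DRI devices so libva inside the container can use
    # QuickSync / VAAPI hardware (path may vary, check /dev/dri on the host)
    devices:
      - /dev/dri:/dev/dri
```

The container still needs the matching user-mode VA-API driver installed inside it; the device mapping alone only exposes the kernel interface.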
There is an open PR, #3419, btw, if anyone is interested in testing or providing feedback.
Exactly what I was hoping you'd say.
Edit: maybe getting ahead of myself, but this is the guide I used for exposing the card with Plex (it shows commands to list available devices etc.; it's Synology-specific but may work for others)
I have an unusual NAS with a Rockchip RK3399 ARM CPU. Would it be possible to have a setting to specify extra command-line arguments for edge cases like this?
Great news, hardware encoding is now merged and ready for testing for anyone willing.
Note that RPI and VAAPI don't support direct file transcode for h264 (mp4), so h264 hardware transcoding is only used for HLS (h264). You can check the logs for which codecs were found and enabled, and check the debug log for why any failed.
Having this enabled on my Unraid 6.11.5 server (Intel Celeron J3455) reports back no available HW codecs:
23-03-10 13:10:57 Info [InitHWSupport] Supported HW codecs:
Plex manages to use hardware acceleration just fine, so I'm not sure where to start looking here. My docker-compose.yml already includes the device passthrough:
devices:
  - "/dev/dri/card0:/dev/dri/card0"
  - "/dev/dri/renderD128:/dev/dri/renderD128"
Any ideas/tips on how to get more information on this?
Do you also have the intel-gpu-top plugin installed, and have you rebooted afterwards?
When stash starts, go into the
No, I didn't see this in the docs or in the commit. Is it used by stash or just for debugging? As the Linux on Unraid servers has no package manager, it's kind of hard to build packages for it yourself.
Switching to debug or even trace shows nothing more at server startup.
Could you try modifying the docker build to add:
I tried the image by CarlNs92891 (who deleted their message or had it deleted, I don't know), which does
and with that it works!
Can someone from the maintainers tell us what the current blocker is here? I'd really like to have this running in the official Docker images so I can use Watchtower auto-updates for my containers; self-building with these tricks is therefore not a good option for me. How about also auto-releasing the CUDA image to Docker Hub, as stashapp/stash:CUDA-latest or similar?
QSV works, I think?
I'm not sure anymore what kind of hardware encoding works on my NAS, but apparently it's not QSV. With the CUDA image it works, but it's not published on Docker Hub, which is my main issue currently.
QSV transcoding works fine with the CUDA build. I agree, it would be great if the maintainers could build and publish this on Docker Hub. I've also had a go at modifying the CUDA build to include jellyfin-ffmpeg5, as it includes all the user-mode drivers for QSV and NVENC, plus various optimisations for full hardware transcoding. This is working really well for me and the performance is great (I have a 12th-gen Intel iGPU). My only problem with the CUDA build and my jellyfin-ffmpeg build is that sprite/preview generation does not utilise hardware acceleration. Unfortunately I don't have the knowledge to debug and fix it. As the jellyfin maintainers have already done a lot of hard work optimising hardware transcoding in ffmpeg, would it make sense for stash to work towards implementing their version?
Nvidia limited you to 3 concurrent encode sessions, not 2, and they recently raised the limit to 5.
Mind sharing the build file?
@algers nerethos shared this Docker Hub link on Discord for the jellyfin-ffmpeg5 build: https://hub.docker.com/r/nerethos/stash-jellyfin-ffmpeg Just wanted to add another data point: I wasn't able to get QSV working on an Alder Lake chip, but the jellyfin + CUDA build linked above worked out of the box. Hopefully we can get better HW encoding support added to the release build in the near future.
This works perfectly. Any chance of it being updated to match the newest release? :D
When I pull this image https://hub.docker.com/r/nerethos/stash-jellyfin-ffmpeg instead of my installed nightly version, it crashes (probably the two aren't swappable because of the date difference).
@nerethos
I got this working with the iGPU on the current Docker image release.
Hope this helps someone.
I'm running Unraid with a 13th-gen Intel chip. I haven't had a chance to try this yet because I'm still building the server, but does this work with sprite and preview generation?
I got this working on Unraid as well, but with a much older CPU (Ivy Bridge). Generation tasks still don't use hardware acceleration, just transcoding tasks. I'm not sure stash supports this, as I didn't find any related config options.
I made a script in my config folder called Its contents:
(note: the script actually does other things as well, like setting my API keys, downloading FanDB, etc.) The docker-compose file includes the following:
With that, it correctly shows the hardware codecs:
Thanks for the script! But I am using a 13th-gen Intel CPU with an Iris Xe GPU. I had to install
Confirming deepradio's comment works for me as well. My platform is an Intel(R) Celeron(R) N5105 @ 2.00GHz (from I didn't feel like bind-mounting a script, so I just used the following command directive:
command: /bin/sh -c 'apk --no-cache add intel-media-driver && stash'
Also, I found it easier to just mount the entire device:
devices:
  - /dev/dri:/dev/dri
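One way to sanity-check that the driver actually works inside the container (assuming an Alpine-based image, as the `apk` usage above suggests) is libva's `vainfo` tool, which prints the loaded VA-API driver and the supported codec profiles:

```shell
# run inside the container; requires /dev/dri to be mapped in
apk --no-cache add libva-utils
vainfo
```

If `vainfo` errors out instead of listing profiles, the problem is the driver or device mapping rather than stash's codec detection.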
Correct - the driver you need depends on the age of your CPU. This link will help: https://wiki.archlinux.org/title/Hardware_video_acceleration
This looks like it's been completed by #3419. Is there any reason left to keep this open?
Currently I'm facing an issue with enabling CUDA acceleration for encoding and decoding. I tried using the
Hardware transcoding for live streaming works fine. And I understand why, in the case of preview generation, some might want to use
This worked perfectly for me up until version 26, which was just released. Now my logs don't indicate any hwaccel devices were detected. Any thoughts? Thanks in advance.
If you turn on debug logging and restart, you should get log output showing the testing for each codec, with the errors encountered indicating why they are not supported. Another user fixed this by removing an old ffmpeg version (4.1) from their stash config directory, so that stash resolved the correct ffmpeg version (which in their case was 6.1).
That was me! Sorry, I made this comment before you responded on Discord. Thanks again for your help.
Whereabouts in the config folder was this older ffmpeg? I can't see one in mine.
If you didn't put one there, it's probably not there. It's just one way to make the binary accessible to Docker, since using apt or something inside the container defeats the purpose of using a container in that way. You can check what version it's finding in its PATH if you run `ffmpeg -version` INSIDE the container, IIRC.
I'm trying to get this to run as well. My platform is a Synology 918+ with an Intel Celeron J3455. I added
I would love Nvidia NVENC for transcoding and generation. This would also work with AMD and Intel encoders, and it could speed up the generation process.