[Issue]: 4K HDR media transcoding never starts (10.9) #11380
Comments
The only alteration I made to the ffmpeg log is mild censoring of the movie name; can provide direct copy and paste of the ffmpeg log if desired. |
It's clear that the error is caused by the driver/kernel. Can you disable the AV1 encoder and VPP tonemap options in turn and try again? |
I figured as much, but I'm still surprised. I haven't updated the kernel since 10.8 and it was working fine before the 10.9 update so I figured I'd report it. Disabled AV1 encoder and no dice but I can't seem to find the VPP option anymore? It's no longer in the same place as 10.8 at least. |
I wasn't able to reproduce this issue on my Arc equipped Ubuntu server. I only did a quick test with an Android client and will do more testing when I get home. |
Did some more testing to more closely replicate your situation. I was not able to reproduce. Tried with HEVC DV7.6 or 8.1 -> AV1 SDR. Tried with AV1 HDR10 -> AV1 SDR. All browser (Firefox and Chromium) and codec combinations transcoded and tone mapped. |
Since last night I have exactly the same issue. At first I thought something had broken hardware-wise. After many hours of troubleshooting, I found out that Jellyfin no longer transcodes anything (regardless of the source or target container/format). BUT I'm running the latest stable Jellyfin. My server has an Intel Arc A380 GPU. The OS is Arch Linux.
Every time Jellyfin tries to transcode something, I get this error in dmesg: [ 81.026591] Fence expiration time out i915-0000:01:00.0:ffmpeg[521]:2!
And in the Jellyfin log itself: Sometimes with error code 135. But I only found that out at the last step, because running ffmpeg on its own works without problems with commands like $ ffmpeg -i source.ext -c:v h264_qsv output.mkv
I'm now trying to build ffmpeg nightly/git to test whether that works. If not, I'll install Debian stable and try again, because I don't know whether this is a bug in the Arch package of Jellyfin. But so far, from my point of view, it must be a bug in Jellyfin, because hardware acceleration with ffmpeg or mpv works just fine. Sorry for my bad English.
Edit: Built ffmpeg from git, then created a symlink to /usr/lib/jellyfin-ffmpeg/ffmpeg, because for some reason it is no longer possible to change the path in the web interface. Now ffmpeg doesn't crash and dmesg is fine. But the Jellyfin transcoding log now shows this: Option autorotate (automatically insert correct rotate filters) cannot be applied to output url 0 -- you are trying to apply an input option to an output file or vice versa. Move this option before the file it belongs to. (For example: I tried with hardware decoding first, then this error came up with hwaccel.)
I'm giving up now. Sadly, I'm out of ideas. I saw that the OP has OpenSuse Micro installed with a nearly up-to-date kernel.
My 2 cents: something broke in the last few days with the Intel Media Driver, ffmpeg, OpenCL, or something else on rolling releases like Arch or OpenSuse in combination with Jellyfin (this makes more sense because Ubuntu seems to run fine, as @solidsnake1298 mentioned, so I'm going to install Debian now). |
Hi @solisinvictum, what's the VAAPI driver version on Arch Linux? Can run |
lol, I just edited my first post and was one second away from formatting my Jellyfin server. Of course, here is the output: [user@jellyfin ~]$ vainfo Please see my last edit. |
I don't want to offend anybody, but if more logs would help, please tell me which ones and I'll upload/back them up. My whole family and friends use my server (~20 people), and they are already nagging me on WhatsApp about when my server will be running again :-) |
This is possibly related to the i915 driver change in the 6.6 kernel and everything after. We are having similar issues with the DG2 series on everything after the 6.6 kernel, erroring out with Fence expiration time out |
@gnattu good idea. Give me a moment; I'll install an LTS / 6.5 kernel, test it, and report back. |
Yep! @gnattu is absolutely right. I just installed: [user@jellyfin ~]$ uname -a Deleted the ffmpeg symlink, installed jellyfin-ffmpeg again, and voilà: transcoding runs perfectly. Only one thing: I just reviewed pacman.log. This week I updated from 6.8.4 to 6.8.7, BUT I only rebooted last night, because the server is busy all the time and I didn't want to interrupt my viewers. So for Arch, it seems something changed between 6.8.4 and 6.8.7 that breaks transcoding on Intel DG2-series GPUs in combination with Jellyfin (because, as I said, mpv and ffmpeg on their own run just fine). |
I'm bisecting this. The last known working kernel version on the 6.6 branch is 6.6.25, and you will have this problem on 6.6.26. I'm looking at the 6.8 branch now to cross-check which change may have caused this. |
Now I've got the suspect changes: Both https://lore.kernel.org/stable/20240327155622.538140-4-andi.shyti@linux.intel.com/T/ And apparently this breaks our transcoding. |
For my part, thanks for the good hint. So I'll stay on Arch and 6.5.9 and wait (no more updates) until Jellyfin 10.9 is released. Maybe by then the kernel/Jellyfin will be fixed in this regard. @gnattu I don't know if it helps, but on Arch it worked 99.9% with 6.8.4. But Arch maintains its own kernel in its repo, with many patches etc. Maybe they already had this or a similar problem and fixed it in their kernel? Because, as I said, I updated this week from 6.8.4 to 6.8.7, and after a reboot I got these problems. But as far as Arch is concerned, I can't say with 100% certainty whether this is due to the Arch kernel itself or to the packages that get updated along with the kernel. I can imagine that something is wrong in the Jellyfin package from the official Arch repo that hasn't been fixed there. What makes me curious is that ffmpeg works just as it should (though I only tested it this way, without any extra flags: ffmpeg -i source.ext -c:v h264_qsv output.mkv, to and from the different hardware-accelerated codecs the A380 is capable of). |
Arch applies very minimal patches, and it is very unlikely that they would apply patches to fix this for us; this probably has not even raised upstream awareness yet. The suspect patches were applied directly to the upstream kernel and affect more than just Arch users. Everyone using those kernels will have their transcoding broken.
We are using much more of that during HDR transcoding. We will require some OpenCL capability and the frame mapping functionalities to work, and their driver changes usually break such usages. (Fun fact: Intel has already broken the HDR tone mapping once on Windows this year: intel/vpl-gpu-rt#323) |
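For reference, the standalone check mentioned earlier in the thread (a plain `h264_qsv` encode) can be scripted roughly as below. This is only a sketch: the function names are made up for illustration, and the only ffmpeg options used are the ones quoted above. As the comment above points out, a plain encode like this exercises far less of the driver than HDR transcoding does, so a pass here does not rule out the kernel issue.

```python
import shutil
import subprocess


def qsv_encode_cmd(source, output, ffmpeg="ffmpeg"):
    """Build the plain QSV encode command quoted in the thread."""
    # Hypothetical helper; only the flags shown in the thread are used.
    return [ffmpeg, "-i", source, "-c:v", "h264_qsv", output]


def run_qsv_check(source, output, ffmpeg="ffmpeg"):
    """Run the encode; return the exit code, or None if ffmpeg is not found."""
    if shutil.which(ffmpeg) is None:
        return None
    result = subprocess.run(qsv_encode_cmd(source, output, ffmpeg))
    return result.returncode
```

Passing the path to jellyfin-ffmpeg as the `ffmpeg` argument would test that build instead of the system one.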
Have you got an Intel Arc for testing? |
I asked another user having this issue, who helped me discover that |
https://gitlab.com/linux-kernel/stable/-/commits/linux-6.6.y/drivers/gpu/drm/i915 Commits between edit: |
I have cross-checked. Only the one I linked is present in both the 6.8.5 and 6.6.26 changes and directly affects DG2. The other |
Makes sense. There are also some similar issues in upstream - |
The user one seems to be unrelated as he is using an unaffected kernel. The CI one might be related but I'm not sure |
Now we need someone who can revert that patch series and test it on Arc GPU. |
As for why transcoding is affected by CCS (Compute Command Streamer), I think it is because some stages in the transcoding pipeline such as video scaling and VPP tone-mapping filters are performed on CCS. |
time for Jellyfin OS/Kernel 😄 |
If you can explain to me how to apply the patch, I could do it. As mentioned, I have an Arc A380. I know how to compile a kernel, but not how to patch one.
Yeah, something like LibreELEC (just enough OS for Kodi) would be sick, with the jellyfin-media-player options and interface. Something you could flash onto a PC/Raspberry Pi/something else that starts only jellyfin-media-player :D But this is OT for this thread. |
Based on what you've said, the version of 6.8 that Ubuntu 24.04 will be using is NOT affected by the issues in 6.8.5, correct? I had planned on upgrading to Ubuntu 24.04 along with some planned hardware upgrades so I wanted some clarification before I do that. |
It is hard to say before I actually look at what is inside the so-called 6.8 kernel used in Ubuntu 24.04. Ubuntu's kernel usually carries a lot of their own patches, so I am not very sure about that. Edit: I just had a quick look at the current Ubuntu noble tree. That kernel is currently based on 6.8.4 and does not have this patch back-ported for now. But be aware that a kernel upgrade could happen at any time and Ubuntu may move to a newer base in the future; if this issue has not been addressed by then, your setup will break with an apt upgrade. |
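The version boundaries reported so far (6.6.25 working and 6.6.26 broken; 6.8.4 working and 6.8.5 carrying the change) can be turned into a rough check, sketched below. The helper names are hypothetical, and the thresholds come only from this thread. Distribution kernels can back-port or revert patches, so the version string is a hint, not proof; branches not discussed here return `None`.

```python
import re

# Thresholds taken from this thread only: 6.6.26 and 6.8.5 are the first
# stable releases reported to carry the suspect i915 change. Other branches
# and any fixed release are unknown here, so they are not listed.
FIRST_AFFECTED = {(6, 6): 26, (6, 8): 5}


def parse_release(release):
    """Extract (major, minor, patch) from a string like '6.8.7-arch1-2'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def maybe_affected(release):
    """Return True/False for branches discussed in the thread, else None."""
    major, minor, patch = parse_release(release)
    threshold = FIRST_AFFECTED.get((major, minor))
    if threshold is None:
        return None  # branch not covered by the reports above
    return patch >= threshold
```

On a running system, the input would typically be the output of `uname -r` (or `platform.release()` in Python).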
Hi @gnattu. I'm the other one! I'm posting here too to stay up to date, and to lend a hand at further testing if anyone needs my Intel Arc A310 🫡. I'm staying on |
I made a pacman package reverting the changes of the suspect patch series: https://github.com/gnattu/linux/releases/tag/6.8.7-jelly You can try installing this as your kernel to see if it makes things better. I have not verified it myself, though, so it could contain human errors; be prepared to fix the kernel if it does not boot. You have been warned. |
I'm working until tomorrow (in your timezone). As soon as I'm home, I'll try out your kernel and report back. From my side, many thanks back to the community as well! Really nice to pin down a problem :) |
Just tested your patched/reverted kernel on my system. I am able to transcode 4K videos. Looks like that's the offending patch series. Thanks for this @gnattu!!! |
To make the upstream maintainer's life easier, if someone wants to provide the following information, it would be very helpful:
I will attach this information to the upstream bug report as well as the offending patch series. |
I can grab that this evening/in a bit. |
uname:
Dmesg log: Sounds like y'all have already bisected the problem but on the off chance another datapoint is helpful, I'm having a user report that non 4K/tonemapped content is affected by this too. |
Upstream bug report: https://gitlab.freedesktop.org/drm/intel/-/issues/10895 |
Maybe remove the |
Just upgraded to 10.9, glad to see others have the same issue and it's being worked on. Cheering from the side-lines! |
@solisinvictum If you don't mind, can you add |
Sure. Let me quickly reinstall the kernel, add the parameters, reboot, and try again. I'll edit this post as soon as I'm done. Edit: OK, I just saw that two users are watching something. I'll quickly ask if they can stop for 5 minutes. |
Alright, we've received the latest patch from Intel developers: https://github.com/gnattu/linux/releases/tag/6.8.7-intel-set-ccs-mode-on-reset This time, Intel developers reproduced our issue on their side, so this kernel has a higher chance of fixing our issue. As always, the kernel is an Arch Linux package with the patches applied. If anyone has time to verify this kernel, it would be very helpful. |
Nice. Give me a moment to boot up my PC. Then I'll try it out and report back. |
First, just a minor suggestion that someone could maybe pick up for 10.9: it would be nice if the kernel were shown where the arrow is. Like: Architecture: X64 (Linux jellyfin 6.8.7-arch1-2-custom) Now to the topic: It works!
Here are the Jellyfin transcode log and full dmesg, if wanted, to verify that everything is good. |
Actually, we removed this architecture entry entirely in version 10.9 because we believe the server should not expose too much information about the underlying system, especially in a public API that does not require any authentication. Therefore, you will no longer see this architecture bar in 10.9. It's good to see that this is properly fixed. Now we only need to wait for this patch to be merged into upstream and for Torvalds to draft a new kernel release. |
Can confirm that the version of 6.8 in Noble is not affected (yet). |
Please describe your bug
Upgraded my 10.8.13 instance to 10.9.0 unstable and ran into the following issue:
When attempting to play 4K HDR media, I can see ffmpeg starting for the transcode, but it never actually writes anything to the transcode segment file and eventually times out with the following error in dmesg:
Fence expiration time out i915-0000:29:00.0:ffmpeg[18950]:2!
I suspect this has something to do with tone-mapping, as everything works fine when I play an SDR 4K file. It also doesn't appear to ever stop trying: I just get a loading circle forever, and I see Jellyfin spawn more ffmpeg instances once one dies.
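To check quickly whether a system is hitting the same symptom, the dmesg output can be scanned for this message. The sketch below is modelled only on the two lines quoted in this issue. The regular expression and field names are assumptions: the bracketed number looks like the PID of the ffmpeg process, but that is not confirmed here, and other kernel versions may format the message differently.

```python
import re

# Pattern modelled on the dmesg lines quoted in this issue (an assumption,
# not a documented format): "i915-<PCI address>:<process>[<number>]".
FENCE_RE = re.compile(
    r"Fence expiration time out "
    r"i915-(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):"
    r"(?P<process>[^\[\]]+)\[(?P<pid>\d+)\]"
)


def find_fence_timeouts(dmesg_text):
    """Return one dict per fence-timeout line found in dmesg output."""
    hits = []
    for line in dmesg_text.splitlines():
        match = FENCE_RE.search(line)
        if match:
            hits.append({
                "pci": match.group("pci"),
                "process": match.group("process"),
                # Presumed to be the process ID; labelled as such for convenience.
                "pid": int(match.group("pid")),
            })
    return hits
```

Feeding it the text from `dmesg` (run with sufficient privileges) gives a list of affected devices and processes; an empty list means no matching line was found.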
Reproduction Steps
Jellyfin Version
Unstable (master branch)
if other:
No response
Environment
Jellyfin logs
FFmpeg logs
Please attach any browser or client logs here
No response
Please attach any screenshots here
Transcode settings: