Slow direct play startup #873

Closed
mcarlton00 opened this issue Feb 12, 2019 · 22 comments · Fixed by jellyfin-archive/jellyfin-docs#656
Labels
bug (Something isn't working) · confirmed (This issue has been reviewed and confirmed) · upstream (The issue is due to an upstream library or project)

Comments

@mcarlton00
Member

Describe the bug
Direct play from the server has a large initial delay; it takes an average of 12 seconds to send a response to the client.

To Reproduce
Trigger a direct play using the API:
http://192.168.0.200:8096/emby/Videos/{{ item_id }}/stream?static=true
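For example, the delay can be measured directly against that endpoint. A rough sketch using .NET's HttpClient (host and item id are taken from the logs below; an API token may be needed depending on server settings):

```csharp
// Sketch: time the HEAD request that precedes the ranged GET seen in the logs below.
// Host and item id are from this report; add an api_key/token if your server requires one.
using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading.Tasks;

class DirectPlayTimer
{
    static async Task Main()
    {
        var itemId = "bf313a860a766111a7e296fcfc8ddccb";
        var url = $"http://192.168.0.200:8096/emby/Videos/{itemId}/stream?static=true";

        using var client = new HttpClient();
        var sw = Stopwatch.StartNew();
        using var request = new HttpRequestMessage(HttpMethod.Head, url);
        using var response = await client.SendAsync(request);
        Console.WriteLine($"HTTP {(int)response.StatusCode} after {sw.ElapsedMilliseconds} ms");
    }
}
```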

Expected behavior
The video starts playing immediately, or within a few seconds

Logs

[2019-02-12 03:05:12.700 +00:00] [INF] HTTP "HEAD" "http://192.168.0.200:8096/emby/Videos/bf313a860a766111a7e296fcfc8ddccb/stream?static=true&MediaSourceId=bf313a860a766111a7e296fcfc8ddccb". "Host=192.168.0.200:8096, User-Agent=Kodi/17.6 (X11; Linux x86_64) Arch Linux/0.0 App_Bitness/64 Version/17.6-Git:20190116-nogitfound, Accept=*/*, Accept-Charset=UTF-8,*;q=0.8"
[2019-02-12 13:29:22.466 +00:00] [INF] HTTP Response 200 to "192.168.0.148". Time: "12032""ms (slow)". "http://192.168.0.200:8096/emby/Videos/bf313a860a766111a7e296fcfc8ddccb/stream?static=true&MediaSourceId=bf313a860a766111a7e296fcfc8ddccb" ""
[2019-02-12 13:29:22.473 +00:00] [INF] HTTP "GET" "http://192.168.0.200:8096/emby/Videos/bf313a860a766111a7e296fcfc8ddccb/stream?static=true&MediaSourceId=bf313a860a766111a7e296fcfc8ddccb". "Host=192.168.0.200:8096, Range=bytes=0-, User-Agent=Kodi/17.6 (X11; Linux x86_64) Arch Linux/0.0 App_Bitness/64 Version/17.6-Git:20190116-nogitfound, Accept=*/*, Accept-Charset=UTF-8,*;q=0.8"
[2019-02-12 13:29:46.551 +00:00] [INF] HTTP Response 206 to "192.168.0.148". Time: "12042""ms (slow)". "http://192.168.0.200:8096/emby/Videos

System (please complete the following information):

  • OS: Docker
  • Browser: Kodi plugin/Firefox/Python requests library
  • Jellyfin Version: 10.1.0

Additional context
Direct streaming works with almost no initial delay; the problem seems to be limited to direct play.

@mcarlton00 mcarlton00 added the bug Something isn't working label Feb 12, 2019
@sachk
Contributor

sachk commented Mar 6, 2019

I'm having the same problem. It's far worse when the server has 200 ms of ping; playback often takes longer than a minute to start.

@mcarlton00
Member Author

I finally narrowed mine down to a poor NFS mount in my library. Something in my NAS/NFS exports was causing massive slowdowns where Jellyfin was concerned, but performed fine in other tests. I've temporarily fixed it by using sshfs mounts to my library instead of NFS mounts. I'm planning to go back and do more tests to see if I can pinpoint exactly what's causing it, though.

@JustAMan
Contributor

JustAMan commented Mar 6, 2019

My current suspect would be slow syscalls on such shares; it would be great if you could strace the problematic cases...

@mcarlton00
Member Author

I'm hoping to get a good strace result this weekend, as long as I get some free time.

@sachk
Contributor

sachk commented Mar 6, 2019

I'm running my Jellyfin on Google Drive mounted with rclone, so I assume I'm also having the same problem. I can also do some profiling soon.

@mcarlton00
Member Author

I ran several tests, mostly searching for specific open or read calls that were taking a lot of time, but I couldn't find any. I finally ran strace with -c to get a summary at the end and let the server run for ~15 minutes. In that time I hit the API directly several times and started/stopped multiple direct play streams from Kodi.

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.68   11.383282       64678       176        15 futex
  0.10    0.011822           5      2182           munmap
  0.09    0.009890           2      4231           mprotect
  0.07    0.008271           3      2486           mmap
  0.01    0.001387           5       266           read
  0.01    0.001215           4       326       108 openat
  0.01    0.001046           1      1684         1 lstat
  0.01    0.000594           1       914           write
  0.00    0.000562           2       311           fstat
  0.00    0.000354           2       168         2 close
  0.00    0.000332           2       200           fcntl
  0.00    0.000280           1       419        44 stat
  0.00    0.000200           6        34           gettid
  0.00    0.000197           2       106           lseek
  0.00    0.000184           6        31           rt_sigaction
  0.00    0.000119          13         9           clone
  0.00    0.000109           4        30        30 access
  0.00    0.000103           4        28           sched_yield
  0.00    0.000057          14         4         2 unlink
  0.00    0.000046           6         8           prlimit64
  0.00    0.000039          20         2           mknod
  0.00    0.000039          10         4           pipe2
  0.00    0.000033           3        12           flock
  0.00    0.000025           1        18           brk
  0.00    0.000017           3         5           ioctl
  0.00    0.000017           3         5           sysinfo
  0.00    0.000016           3         5           readlink
  0.00    0.000015           8         2           getsid
  0.00    0.000014           2         6           getpid
  0.00    0.000011          11         1           socket
  0.00    0.000011          11         1           statfs
  0.00    0.000011           4         3           sched_getaffinity
  0.00    0.000007           4         2           geteuid
  0.00    0.000007           2         3           sigaltstack
  0.00    0.000005           5         1           bind
  0.00    0.000005           3         2           sched_setscheduler
  0.00    0.000004           2         2           sched_getparam
  0.00    0.000004           2         2           sched_getscheduler
  0.00    0.000004           4         1           fadvise64
  0.00    0.000003           2         2           sched_get_priority_max
  0.00    0.000002           1         2           sched_get_priority_min
  0.00    0.000001           1         1           rt_sigreturn
  0.00    0.000000           0         2           poll
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2           uname
  0.00    0.000000           0        25           getdents
  0.00    0.000000           0         2           getuid
  0.00    0.000000           0         1           mlock
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00   11.420340                 13732       202 total

Each API call/direct play stream still has about a 12.5 second response time according to the JF logs, but none of the syscalls account for it. futex totals ~11 seconds, but it didn't appear to be called in real time when a direct play was started. In fact, I didn't see any new calls at all when a direct play started.

@anthonylavado anthonylavado added this to Needs triage in Issue Triage for Main Repo Jul 4, 2019
@stale

stale bot commented Jul 29, 2019

Issues go stale after 60d of inactivity. Mark the issue as fresh by adding a comment or commit. Stale issues close after an additional 7d of inactivity. If this issue is safe to close now please do so. If you have any questions you can reach us on Matrix or Social Media.

@stale stale bot added the stale Stale and will be closed if no activity occurs label Jul 29, 2019
@stale stale bot closed this as completed Aug 5, 2019
Issue Triage for Main Repo automation moved this from Needs triage to Closed/Done Aug 5, 2019
@mcarlton00
Member Author

Update: reports from users on Matrix indicate this is still happening (FreeBSD host, Linux VM). This probably needs to be reopened.

@sparky8251 sparky8251 reopened this Feb 21, 2020
Issue Triage for Main Repo automation moved this from Closed/Done to Needs triage Feb 21, 2020
@stale stale bot removed the stale Stale and will be closed if no activity occurs label Feb 21, 2020
@mcarlton00
Member Author

Received reports from dinki on Matrix of similar issues when using OpenMediaVault.

@JustAMan
Contributor

Syscalls (and their timings) gathered by strace from a running Jellyfin instance would help immensely here. Otherwise it's a search in the dark.

@stale

stale bot commented Jan 6, 2021

This issue has gone 120 days without comment. To avoid abandoned issues, it will be closed in 21 days if there are no new comments.
If you're the original submitter of this issue, please comment confirming if this issue still affects you in the latest release or nightlies, or close the issue if it has been fixed. If you're another user also affected by this bug, please comment confirming so. Either action will remove the stale label.
This bot exists to prevent issues from becoming stale and forgotten. Jellyfin is always moving forward, and bugs are often fixed as side effects of other changes. We therefore ask that bug report authors remain vigilant about their issues to ensure they are closed if fixed, or re-confirmed - perhaps with fresh logs or reproduction examples - regularly. If you have any questions you can reach us on Matrix or Social Media.

@stale stale bot added the stale Stale and will be closed if no activity occurs label Jan 6, 2021
@stale stale bot closed this as completed Feb 2, 2021
Issue Triage for Main Repo automation moved this from Needs triage to Closed/Done Feb 2, 2021
@mcarlton00
Member Author

Discussed in chat again. Still happening in rare circumstances. Josh ran into a likely related issue when setting up https://github.com/joshuaboniface/rffmpeg. Quote from chat:

and specifically - this is as far as I traced it - basically we're waiting on a "chunk" to be loaded for the streamer to send it back to the client. But at the start of the stream, on NFS, it waits a relatively very long time (about 10-15s) and then it finally picks up the piece and sends it.
when I was debugging this with rffmpeg, where it was writing out the individual files "over" NFS (with the Jellyfin side being the NFS client), I was pretty sure I traced it to the inotify call, which JF was using to know that the transcoding had finished writing the file, but that never came

@dkanada dkanada added confirmed This issue has been reviewed and confirmed and removed stale Stale and will be closed if no activity occurs labels Feb 6, 2021
@dkanada dkanada reopened this Feb 6, 2021
Issue Triage for Main Repo automation moved this from Closed/Done to Needs triage Feb 6, 2021
@DomiStyle
Contributor

I'm also running into this issue on an Ubuntu server with an NFS-mounted library.

Transcoding streams start almost immediately, but direct play streams take between 5 and 15 seconds to start.
Is there any info you need to debug this further? I'm running Plex from the same NFS mount and there are no issues there.

@pakar

pakar commented Feb 24, 2021

I have the same behavior, and it seems reproducible on at least my system.

I can't reinstall the host machine this was tried on, but the slow behavior is repeatable and I've attached an strace.

Setup is on a single machine:
Host:

  • Debian 9, 4-core CPU, 16 GB RAM, NFS server with audio files.
  • VM images located on SSD.
  • NFS share located on HDD. The share contains FLAC and MP3 files plus m3u playlists with relative paths.

VM:

  • Created with default settings in virt-manager (2.0.0-3) using a bridged interface with the host.
  • Fresh install of Ubuntu Server 20.04: 3 cores, 2 GB RAM, latest Jellyfin (both stable and unstable tested with the same results).

Latency between VM and host is <1ms

NFS share mount-options: rw,noatime,vers=3,rsize=32768,wsize=32768,hard,proto=udp,timeo=11,retrans=3,sec=sys,mountvers=3,mountproto=udp,local_lock=none

EnableAudioPlaybackTranscoding: True
RemoteClientBitrateLimit: 1000000000
IsInLocalNetwork: True

[2021-02-24 23:41:20.377 +01:00] [WRN] Slow HTTP Response from "http://10.254.128.32:8096/Audio/ac14450024ef74e3cfe99d1860377209/universal?UserId=05c57322e3e046f5b471ba7db6ba2398&DeviceId=TW96aWxsYS81LjAgKExpbnV4OyBBbmRyb2lkIDExOyBLQjIwMDMgQnVpbGQvUlAxQS4yMDEwMDUuMDAxOyB3dikgQXBwbGVXZWJLaXQvNTM3LjM2IChLSFRNTCwgbGlrZSBHZWNrbykgVmVyc2lvbi80LjAgQ2hyb21lLzg4LjAuNDMyNC4xODEgTW9iaWxlIFNhZmFyaS81MzcuMzZ8MTYxNDIwNjM5NjIxMA11&MaxStreamingBitrate=140000000&Container=opus%2Cwebm%7Copus%2Cmp3%2Caac%2Cm4a%7Caac%2Cm4b%7Caac%2Cflac%2Cwebma%2Cwebm%7Cwebma%2Cwav%2Cogg&TranscodingContainer=ts&TranscodingProtocol=hls&AudioCodec=aac&api_key=184f8beb166b4c01829aa975e2367806&PlaySessionId=1614206396231&StartTimeTicks=0&EnableRedirection=true&EnableRemoteMedia=false" to "10.254.127.107" in 0:00:06.2372201 with Status Code 206
[2021-02-24 23:41:20.950 +01:00] [INF] WS "10.254.127.107" closed

strace collected via:

  1. Fresh boot of the VM
  2. Start the app on an Android phone
  3. strace -o strace-slow.txt -f -p 2146
  4. Press play in the app
  5. Wait for playback to start (~6 seconds in this instance)
  6. Play for a few seconds
  7. Terminate strace

Attached: strace-slow.txt

CPU during these tests remains below 5%.
Network traffic:

  1. Press play on phone
  2. 4-6 big traffic spikes (almost like it's re-reading the full file?)
  3. Log prints the "Slow HTTP Response from"
  4. Playback starts on phone.

I also noticed that you can cause even higher response times by pressing play on a few files without waiting for the audio to start playing.

[2021-02-25 00:14:36.652 +01:00] [WRN] Slow HTTP Response from "http://10.254.128.32:8096/Audio/5e6ecc3e2d492f22f26e1b02420ffc4f/universal?UserId=05c57322e3e046f5b471ba7db6ba2398&DeviceId=TW96aWxsYS81LjAgKExpbnV4OyBBbmRyb2lkIDExOyBLQjIwMDMgQnVpbGQvUlAxQS4yMDEwMDUuMDAxOyB3dikgQXBwbGVXZWJLaXQvNTM3LjM2IChLSFRNTCwgbGlrZSBHZWNrbykgVmVyc2lvbi80LjAgQ2hyb21lLzg4LjAuNDMyNC4xODEgTW9iaWxlIFNhZmFyaS81MzcuMzZ8MTYxNDIwNjM5NjIxMA11&MaxStreamingBitrate=140000000&Container=opus%2Cwebm%7Copus%2Cmp3%2Caac%2Cm4a%7Caac%2Cm4b%7Caac%2Cflac%2Cwebma%2Cwebm%7Cwebma%2Cwav%2Cogg&TranscodingContainer=ts&TranscodingProtocol=hls&AudioCodec=aac&api_key=184f8beb166b4c01829aa975e2367806&PlaySessionId=1614208091765&StartTimeTicks=0&EnableRedirection=true&EnableRemoteMedia=false" to "10.254.127.107" in 0:00:25.5485581 with Status Code 206
[2021-02-25 00:14:48.945 +01:00] [WRN] Slow HTTP Response from "http://10.254.128.32:8096/Audio/ecd4c6a1c006f39b9744a6aea0f642e9/universal?UserId=05c57322e3e046f5b471ba7db6ba2398&DeviceId=TW96aWxsYS81LjAgKExpbnV4OyBBbmRyb2lkIDExOyBLQjIwMDMgQnVpbGQvUlAxQS4yMDEwMDUuMDAxOyB3dikgQXBwbGVXZWJLaXQvNTM3LjM2IChLSFRNTCwgbGlrZSBHZWNrbykgVmVyc2lvbi80LjAgQ2hyb21lLzg4LjAuNDMyNC4xODEgTW9iaWxlIFNhZmFyaS81MzcuMzZ8MTYxNDIwNjM5NjIxMA11&MaxStreamingBitrate=140000000&Container=opus%2Cwebm%7Copus%2Cmp3%2Caac%2Cm4a%7Caac%2Cm4b%7Caac%2Cflac%2Cwebma%2Cwebm%7Cwebma%2Cwav%2Cogg&TranscodingContainer=ts&TranscodingProtocol=hls&AudioCodec=aac&api_key=184f8beb166b4c01829aa975e2367806&PlaySessionId=1614208091767&StartTimeTicks=0&EnableRedirection=true&EnableRemoteMedia=false" to "10.254.127.107" in 0:00:34.9520038 with Status Code 206
[2021-02-25 00:14:48.946 +01:00] [WRN] Slow HTTP Response from "http://10.254.128.32:8096/Audio/7541f16b3f0c57b1e15ee422eef868b9/universal?UserId=05c57322e3e046f5b471ba7db6ba2398&DeviceId=TW96aWxsYS81LjAgKExpbnV4OyBBbmRyb2lkIDExOyBLQjIwMDMgQnVpbGQvUlAxQS4yMDEwMDUuMDAxOyB3dikgQXBwbGVXZWJLaXQvNTM3LjM2IChLSFRNTCwgbGlrZSBHZWNrbykgVmVyc2lvbi80LjAgQ2hyb21lLzg4LjAuNDMyNC4xODEgTW9iaWxlIFNhZmFyaS81MzcuMzZ8MTYxNDIwNjM5NjIxMA11&MaxStreamingBitrate=140000000&Container=opus%2Cwebm%7Copus%2Cmp3%2Caac%2Cm4a%7Caac%2Cm4b%7Caac%2Cflac%2Cwebma%2Cwebm%7Cwebma%2Cwav%2Cogg&TranscodingContainer=ts&TranscodingProtocol=hls&AudioCodec=aac&api_key=184f8beb166b4c01829aa975e2367806&PlaySessionId=1614208091766&StartTimeTicks=0&EnableRedirection=true&EnableRemoteMedia=false" to "10.254.127.107" in 0:00:36.5218395 with Status Code 206

Give me a shout if you want specific traces. This Jellyfin VM is set up for testing/debugging.

@DomiStyle
Contributor

Since I had this issue in a few of my other .NET applications, I went ahead and created an issue in the .NET repository:
dotnet/runtime#48757

It's reproducible even with a blank C# project, so I assume the fault doesn't lie with Jellyfin in this case. That also explains why transcoding doesn't have this issue, since ffmpeg accesses the file directly.
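For anyone who wants to check their own mounts, a minimal sketch of that kind of blank-project repro might look like this (the path is a placeholder, not from this thread; the locking comment reflects the root cause identified later in this thread):

```csharp
// Hypothetical repro sketch: time the first open/read of a file on the NFS mount.
using System;
using System.Diagnostics;
using System.IO;

class NfsOpenTimer
{
    static void Main()
    {
        const string path = "/mnt/media/sample.flac"; // placeholder: point at a file on the affected share

        var sw = Stopwatch.StartNew();
        // File.OpenRead requests FileShare.Read; on Unix the runtime enforces this with an
        // advisory file lock, which is what stalls over NFSv3 when no lock service is available.
        using var stream = File.OpenRead(path);
        var buffer = new byte[64 * 1024];
        int read = stream.Read(buffer, 0, buffer.Length);
        sw.Stop();

        Console.WriteLine($"Read {read} bytes in {sw.ElapsedMilliseconds} ms");
    }
}
```

On an unaffected mount this should complete in milliseconds; on the mounts described above, the open is where the multi-second delay shows up.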

@cvium cvium added the upstream The issue is due to a upstream library or project label Apr 10, 2021
@DomiStyle
Contributor

I think I have identified the problem: my NFS share points to a mergerfs mount. Recently I had an issue with stale file handles and applied the options listed here.

The problem is now gone and my C# applications (including JF) no longer have a 6-second delay.

Can anyone comment if they use any other FUSE file systems, or even mergerfs? This would still be a .NET problem, but at least there's a way to work around it.

@crobibero
Member

Can anyone comment if they use any other FUSE file systems, or even mergerfs? This would still be a .NET problem, but at least there's a way to work around it.

I use mergerfs with SnapRAID and have immediate direct play playback.

@paulmorabito

I also had the same problem as above and filed a separate issue (#6559) before finding this one. I ended up mounting the drives over SMB and passing through the directories as a workaround.

@EorlBruder

The issue in the .NET repo has now received an update which should resolve the performance issues: dotnet/runtime#48757 (comment)

The root cause of the delay is .NET locking over NFSv3 without the lock service being available.

The options are:

  • Disable .NET locking using DOTNET_SYSTEM_IO_DISABLEFILELOCKING (introduced in .NET 6).
  • Disable locking for the mount using the nolock option.
  • Enable the lock service.
  • Use NFSv4, which has built-in lock support.

For me, mounting with the nolock option solved the problem. Is the first option something that should/could be added in Jellyfin (I don't really know anything about .NET)? Otherwise it might be good to add these solutions to the storage documentation.
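For reference, the two client-side workarounds might look something like this (server name, export path, mount point, and the systemd drop-in path are placeholder assumptions, not taken from this thread):

```
# Disable NFS locking for the mount (nolock), e.g. in /etc/fstab:
nas:/export/media  /mnt/media  nfs  rw,vers=3,nolock  0 0

# Or disable .NET file locking instead (.NET 6+), e.g. via a systemd drop-in for the Jellyfin service
# (/etc/systemd/system/jellyfin.service.d/nfs.conf):
[Service]
Environment="DOTNET_SYSTEM_IO_DISABLEFILELOCKING=true"
```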

@mcarlton00
Member Author

Adding it to the docs is probably the best solution for the time being. Feel free to open a PR for it here if you're willing.

EorlBruder added a commit to EorlBruder/jellyfin-docs that referenced this issue Apr 10, 2022
Describe the solution causing slow directplay startup issues with
NFSv3, which is discussed in this issue on the jellyfin-repository
jellyfin/jellyfin#873

Fixes jellyfin/jellyfin#873
@Mengine

Mengine commented Apr 10, 2022

Had this exact problem today when moving my library from a FreeNAS NFS share to a TrueNAS SCALE NFS share. All the server-side export settings were alike, as were the client mount options (NFSv3). The nolock option did, however, solve the problem straight away.

Issue Triage for Main Repo automation moved this from Needs triage to Closed/Done Apr 10, 2022
@SIGSTKFLT

Had this exact problem today when moving my library from a FreeNAS NFS share to a TrueNAS SCALE NFS share. All the server-side export settings were alike, as were the client mount options (NFSv3). The nolock option did, however, solve the problem straight away.

I too was sharing from TrueNAS SCALE and this fixed it for me. Thank you!
