libnfs-related SegFault/Abort with recent master builds of Kodi #124
Comments
Hm. So, as far as you know, up to and including 6894180 all was well? I will try to see what could be happening. But all changes post 6894180 On Fri, Aug 7, 2015 at 9:24 AM, MilhouseVH notifications@github.com wrote:
|
Yes, no reports of problems with 6894180. According to the user experiencing the crashes this only started with the #0803 build[1], which to be honest doesn't really make a great deal of sense as the #120 and 1.9.8 commits were added to libnfs on 2 Aug and included in build #0802, so #0802 should also have been crashing but apparently did not. Although my suspicion would be that if this is in any way related to the most recent 2288339 commits, then it may just be that a crash has not yet been observed in #0802 as the crash obviously doesn't happen often, takes a while and I don't think we know what is triggering the crash, so it could just be that #0802 hasn't been tested long enough to confirm the problem. So far there is just the one user reporting this issue. |
The assert means that we have a data corruption or use-after-free issue. I think one of my merges today could be related: it addresses an issue with the reconnect logic where we might, at best, deadlock in some situations when we reconnect a TCP connection. Let's see if the user can reproduce using 5c7a0f0 or later. |
OK many thanks, I did ask the user if this issue had repeated but |
Hi, there's been another couple of libnfs segfault reports when using latest libnfs master 82d2a22 with Kodi master: http://forum.kodi.tv/showthread.php?tid=231092&pid=2133289#pid2133289 (with debug-enabled crashlog) |
Looking at the first one. The backtrace : #0 GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:56 struct rpc_pdu *rpc_allocate_pdu(struct rpc_context *rpc, int program,
asserted. The RPC context is bogus. === It is called from : int rpc_nfs3_read_async(struct rpc_context *rpc, rpc_cb cb, struct
private_data, (zdrproc_t)zdr_READ3res, sizeof(READ3res)); === which is called from : static int nfs_pread_async_internal(struct nfs_context *nfs, struct ...
=== which is called from : int nfs_read_async(struct nfs_context *nfs, struct nfsfh *nfsfh, === which is called from : int nfs_read(struct nfs_context *nfs, struct nfsfh *nfsfh, uint64_t
which is called from kodi. So the assert itself is triggered because we do not have a valid RPC This could happen if 1, we passed a garbage nfs context pointer from kodi->libnfs. Unlikely. 2, if the nfs context is good, and nfs->rpc context is good too, but This is because we clear rpc->magic when we destroy the context, to 3, You do call nfs_destroy_context() which in turn calls But, then, that does not explain why this only happens for this (Could you add logging and print when you create a nfs_context, and 4, there is memory corruption that has caused the rpc context to be is no longer valid. I think 4 is most likely here, followed by unlikely 3, and 1 and 2 Something that is REALLY interesting though is this part of the stack trace: #6 0x69cf6fcc in nfs_pread_async_internal (nfs=0x3f5bed0, ... #9 0x005b3060 in XFILE::CNFSFile::Read (this=0x69787ac8, In stackframe #6 count is suddenly a HUGE random value, while it Unfortunately, in the intermediate frames we cannot see what this But as both the functions in frames #7 and #8 simply just pass the Very strange. Looks like memory corruption. Could the stack have Is it possible to get libnfs built without optimization? so the stack I will look further. On Wed, Oct 14, 2015 at 11:12 AM, MilhouseVH notifications@github.com
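The point about clearing rpc->magic on destroy can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `struct demo_context`, `DEMO_MAGIC` and the `demo_*` functions are hypothetical stand-ins and not libnfs's real structures or API. To keep the example well-defined, `demo_destroy()` only clears the magic value and leaves the storage to the caller, whereas a real destroy would also free it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary illustrative value; not claimed to be the one libnfs uses. */
#define DEMO_MAGIC 0x5a5a1234u

/* Hypothetical stand-in for a context structure guarded by a magic value. */
struct demo_context {
    uint32_t magic;
    int fd;
};

/* Mark the context as live. */
static void demo_init(struct demo_context *ctx)
{
    ctx->magic = DEMO_MAGIC;
    ctx->fd = -1;
}

/* Clear the magic so later use of a stale pointer is detectable.
 * This sketch does not free the storage; the caller owns it. */
static void demo_destroy(struct demo_context *ctx)
{
    ctx->magic = 0;
}

/* Returns 1 if the context still looks live, 0 otherwise. */
static int demo_is_valid(const struct demo_context *ctx)
{
    return ctx != NULL && ctx->magic == DEMO_MAGIC;
}

/* An entry point that asserts on the magic, so a destroyed or
 * corrupted context aborts here instead of being used silently. */
static int demo_read(struct demo_context *ctx)
{
    assert(demo_is_valid(ctx));
    return 0;
}
```

Calling `demo_read()` after `demo_destroy()` on the same context would trip the assert and raise SIGABRT, which is the kind of abort seen at frame #0 of the backtrace. The same check would also fire if unrelated memory corruption overwrote the magic field.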
|
If you have a patch with extra logging (or anything else you think might be worth adding) I'll be more than happy to apply it in a test build and get you updated logs.
I've uploaded a new debug-enabled build (#1015x) which uses |
Hi. Nobody has been able to reproduce the crash problem with build #1015x (optimisations disabled, debug-enabled) in a week of testing. However after switching back to a regular build, #1022, an NFS-related crash has occurred. The new crash is a little different from before, but does seem to be NFS related, but could also be a Kodi core issue (is the Could the fact that the original crash problem didn't happen with #1015x suggest that a compiler optimisation is responsible (gcc4.9 on OpenELEC)? |
I don't think there is a bug in the compiler optimization. It is more The nfs_service error might be interesting. Since this merge contains both changes to how to deal with socket I will try to revert these patches over the next few days. On Sun, Oct 25, 2015 at 7:48 PM, MilhouseVH notifications@github.com
|
I'd be happy to include any patches in the builds that might help narrow down or confirm the issue, for example additional logging - the patches wouldn't have to be committed so if you'd like something tested before committing, let me know. |
Hi. I've just experienced a SIGSEGV (most likely an assert, due to corruption), this time with x86 Kodi 17 master and latest libnfs master (build details). Crash log: http://sprunge.us/PNZe (unfortunately not debug-enabled but maybe enough to give a clue?) The crash happened after I'd paused a video for quite a while (13h16m!) and when I unpaused the video to resume playback Kodi immediately crashed. Kodi crashing "after pausing a video for a while then unpausing" seems to crop up fairly often on the forums, but is hard to reproduce on demand. Could Kodi be corrupting something libnfs-related while libnfs keeps the connection alive? Or maybe after a certain amount of time (several hours?) the NFS context is invalidated by the server? My NFS server is a FreeNAS 8.3.x box. What seems a little odd is this:
That's 13h16m elapsed, but 97 x 180 is only 17460 seconds or 4h51m - shouldn't there have been more like 265 keep alive attempts during 13h16m? I'll see if I can get lucky with a debug-enabled build to reproduce and get you a better stacktrace (no promises though - this is a pretty random/hard to reproduce issue). |
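The keep-alive arithmetic above can be checked with a short sketch. The 180-second interval and the figures 97, 13h16m and 4h51m come from the comment itself; the function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Keep-alive interval in seconds, as implied by "97 x 180" above. */
enum { KEEPALIVE_INTERVAL_S = 180 };

/* Convert hours and minutes to seconds. */
static long to_seconds(long hours, long minutes)
{
    assert(hours >= 0 && minutes >= 0 && minutes < 60);
    return hours * 3600 + minutes * 60;
}

/* Number of whole keep-alive intervals that fit in elapsed_s seconds. */
static long expected_keepalives(long elapsed_s)
{
    assert(elapsed_s >= 0);
    return elapsed_s / KEEPALIVE_INTERVAL_S;
}

/* Time span, in seconds, covered by a given number of keep-alives. */
static long covered_seconds(long count)
{
    assert(count >= 0);
    return count * KEEPALIVE_INTERVAL_S;
}
```

With these helpers, `to_seconds(13, 16)` is 47760 seconds and `expected_keepalives(47760)` is 265, while `covered_seconds(97)` is 17460 seconds, the same as `to_seconds(4, 51)`. So the 97 logged attempts account for only the first 4h51m of the 13h16m pause.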
On Thu, Feb 4, 2016 at 8:50 PM, MilhouseVH notifications@github.com wrote:
libnfs itself does not try to keep the connection alive. Instead, IF I think that kodi even tries to prevent this from happening (server
I think it should. There should definitely be 265 of them. This is what kodi does for the keepalives: void CNfsConnection::keepAlive(std::string _exportPath, struct nfsfh Note that while this function does call out to nfs_read() too, it does This looks like kodi started writing these log entries as it should at 07:49 I.e. the line The crash then happened as since you unpaused the video. Again hit a #0 0x00007f290ed52ce2 in ?? () from /usr/lib/libnfs.so.8 We don't have any asserts to check for the magic value in the I.e. just like last time, the nfs structure has been corrupted. But we My hypothesis is that everything was running fine for 4h51m. Other parts of kodi still held a pointer to the now free()d context I think the missing keepalive log messages could be a smoking gun. Now, what to test. What you also could try could be to run kodi.bin under gdb. Then if If you can build and statically link a version of kodi.bin that
|
In kodi, there are 4 places where nfs_destroy_context() is called. Can you add log entries for these 4 places to see if it is triggered On Thu, Feb 4, 2016 at 11:04 PM, ronnie sahlberg
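One way to add such log entries is to route every destroy call through a small wrapper that records the call site and the pointer first. This is only a sketch under stated assumptions: `logged_destroy`, `destroy_fn`, `demo_destroy` and the message format are hypothetical, and Kodi itself is C++ with its own logging facility. A counting stand-in replaces the real destroy function so the example does not depend on libnfs.

```c
#include <assert.h>
#include <stdio.h>

/* Signature of the destroy routine being wrapped (hypothetical). */
typedef void (*destroy_fn)(void *ctx);

/* Stand-in for the real destroy call; it only counts invocations. */
static int demo_destroy_calls;

static void demo_destroy(void *ctx)
{
    (void)ctx;
    demo_destroy_calls++;
}

/* Log which call site destroys which context, then destroy it.
 * Returns 1 if the log line was written, 0 otherwise. */
static int logged_destroy(FILE *log, const char *site, void *ctx,
                          destroy_fn destroy)
{
    assert(log != NULL && site != NULL && destroy != NULL);
    int written = fprintf(log, "destroying nfs context %p at %s\n", ctx, site);
    fflush(log); /* make sure the line survives a crash shortly after */
    destroy(ctx);
    return written > 0;
}
```

Giving each of the four call sites a distinct `site` string would let a crash log show whether a context was destroyed shortly before the faulting read, and by which path.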
|
Many thanks for the insight. I'll add the logging for nfs_destroy_context(), that seems to be the most likely way to confirm what might be happening - removing/disabling the nfs_destroy_context() calls might stop the issue but would then make reproducing not just hard but impossible! Do you have a Raspberry Pi 1 or 2, or x86 box on which you can run OpenELEC - I could upload a debug-enabled OpenELEC build with gdb and all symbols. |
Cracked it, I think. I say "I think" as I've seen both a SIGSEGV and SIGABRT while reproducing this issue - I suspect it's the same cause but perhaps slightly different paths that ultimately result in the crash. Debug-enabled crash logs: Note that this build includes the following patch with additional Basically, the issue involves pausing an nfs:// video, then while the video is paused performing any JSON RPC query that fiddles with the NFS context (ie. refreshing and eventually destroying then creating a new context), then finally unpausing the paused video and... boom. Steps to reproduce (log details from
produces:
If you want me to open a Kodi trac ticket I can do so, as it's really a Kodi issue rather than libnfs, but thought I'd run this past you first in case you wanted to handle it differently. |
That is good stuff! Please open a Kodi trac ticket so this can be fixed. A very slow memory leak (of < 100 bytes at a time) is pretty harmless Since kodi will open and destroy multiple contexts at a time. On Sun, Feb 7, 2016 at 9:47 AM, MilhouseVH notifications@github.com wrote:
|
Opened trac #16576. In terms of the improved logging, is this what you had in mind?
|
That looks like much better logging! On Sun, Feb 7, 2016 at 4:51 PM, MilhouseVH notifications@github.com wrote:
|
Closing this since we have pretty much root-caused the issues to a use-after-free |
I've had a report of crashes in Kodi with RPi2 test builds after 2 August - this would coincide with the inclusion of 2288339 (1.9.8).
Prior to this, my test builds had been including libnfs 6894180 and no reported issues.
Since the 1.9.8 bump on 2 August appears fairly benign, I'm not sure what else might be responsible for these crashes.
Below is a debug-enabled crashlog from build #0803 - would appreciate if you could give it the once-over:
http://sprunge.us/HKOD
PS. please ignore the references to 1.9.7, this is just the OpenELEC internal version that I haven't changed as it simplifies the process of updating source tarballs - be assured I'm using the source from libnfs 2288339.