Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rar2fs coredump - Signal 6 (ABRT) #160

Closed
hrxcodes opened this issue Mar 17, 2021 · 12 comments
Closed

rar2fs coredump - Signal 6 (ABRT) #160

hrxcodes opened this issue Mar 17, 2021 · 12 comments

Comments

@hrxcodes
Copy link

hrxcodes commented Mar 17, 2021

Issue

rar2fs is randomly stopped, I've managed to get the core dump of one of these failures.
rar2fs has been working fine for the last 12+months before this issue started happening unrelated to any updates as far as I could tell.

Does the core dump tell you anything interesting? Memory at the time was as follows:

Total in system: 31.21GiB
Used: 9.37GiB
Cache + Buffer: 21.55GiB
Free: 293.58MiB

Since I'm seeing malloc in the core dump I started to suspect it's memory related but there is plenty of cache to take from.
I'm kind of stuck so any ideas are welcome 😃
Thank you for your work on rar2fs!

Versions

  • Linux 5.9.12-arch1-1
  • rar2fs v1.29.4-gitcad225 (DLL version 8)
    FUSE library version: 2.9.9
    fusermount version: 2.9.9
    using FUSE kernel interface version 7.19
    
  • Unrar Source: 5.9.4 (tried 6.0.3 first then downgraded when I started having issues)

ldd

ldd /usr/local/bin/rar2fs
	linux-vdso.so.1 (0x00007ffd94bf6000)
	libfuse.so.2 => /usr/lib/libfuse.so.2 (0x00007f829aa0f000)
	libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f829aa08000)
	librt.so.1 => /usr/lib/librt.so.1 (0x00007f829a9fd000)
	libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f829a820000)
	libm.so.6 => /usr/lib/libm.so.6 (0x00007f829a6db000)
	libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f829a6c1000)
	libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f829a69e000)
	libc.so.6 => /usr/lib/libc.so.6 (0x00007f829a4d1000)
	/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f829aac7000)

Logs

dmesg

Mar 17 11:10:44.909223 host kernel: audit: type=1701 audit(1615975844.897:76578): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=3753864 comm="rar2fs" exe="/usr/local/bin/rar2fs" sig=6 res=1
Mar 17 11:10:44.926067 host kernel: audit: type=1334 audit(1615975844.920:76579): prog-id=43 op=LOAD
Mar 17 11:10:44.926110 host kernel: audit: type=1334 audit(1615975844.920:76580): prog-id=44 op=LOAD
Mar 17 11:10:44.926128 host kernel: audit: type=1130 audit(1615975844.920:76581): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@3-2056350-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Mar 17 11:10:45.256811 host kernel: audit: type=1131 audit(1615975845.250:76582): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@3-2056350-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'

journal

Mar 16 08:04:35 host systemd[1]: media-storage-stuff-level1.automount: Got automount request for /media/storage/stuff/level1, triggered by 3753845 (ls)
Mar 17 11:10:44 host systemd[1]: media-storage-stuff-level1.automount: Got hangup/error on autofs pipe from kernel. Likely our automount point has been unmounted by someone or something else?
Mar 17 11:10:44 host systemd[1]: media-storage-stuff-level1.automount: Failed with result 'unmounted'.

coredump

           PID: 3753864 (rar2fs)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 6 (ABRT)
     Timestamp: Wed 2021-03-17 11:10:44 CET (46min ago)
  Command Line: rar2fs /media/storage/stuff/level1 /media/storage/stuff/mount -o rw,allow_other,dev,suid
    Executable: /usr/local/bin/rar2fs
 Control Group: /system.slice/media-storage-stuff-mount.mount
          Unit: media-storage-stuff-mount.mount
         Slice: system.slice
       Boot ID: 03494dfcd5f54869a9f63d8d387dcaae
       Storage: /var/lib/systemd/coredump/core.rar2fs.0.03494dfcd5f54869a9f63d8d387dcaae.3753864.1615975844000000.zst
       Message: Process 3753864 (rar2fs) of user 0 dumped core.

                Stack trace of thread 3950980:
                #0  0x00007f1be01d7ef5 raise (libc.so.6 + 0x3cef5)
                #1  0x00007f1be01c1862 abort (libc.so.6 + 0x26862)
                #2  0x00007f1be0219f38 __libc_message (libc.so.6 + 0x7ef38)
                #3  0x00007f1be0221bea malloc_printerr (libc.so.6 + 0x86bea)
                #4  0x00007f1be02236c8 _int_free (libc.so.6 + 0x886c8)
                #5  0x00007f1be0226ca8 __libc_free (libc.so.6 + 0x8bca8)
                #6  0x0000560ddf2d62d8 __resolve_dir (rar2fs + 0x1d2d8)
                #7  0x0000560ddf2d6a3e readdir_scan (rar2fs + 0x1da3e)
                #8  0x00007f1be06e7d9b n/a (libfuse.so.2 + 0xed9b)
                #9  0x00007f1be06ef6b2 n/a (libfuse.so.2 + 0x166b2)
                #10 0x00007f1be06f07c4 n/a (libfuse.so.2 + 0x177c4)
                #11 0x00007f1be06ed3f0 n/a (libfuse.so.2 + 0x143f0)
                #12 0x00007f1be0371299 start_thread (libpthread.so.0 + 0x9299)
                #13 0x00007f1be029a053 __clone (libc.so.6 + 0xff053)
                
                Stack trace of thread 3753865:
                #0  0x00007f1be037d9ba __futex_abstimed_wait_common64 (libpthread.so.0 + 0x159ba)
                #1  0x00007f1be0379b98 __new_sem_wait_slow64.constprop.0 (libpthread.so.0 + 0x11b98)
                #2  0x00007f1be06ed639 fuse_session_loop_mt (libfuse.so.2 + 0x14639)
                #3  0x00007f1be06f31ab fuse_loop_mt (libfuse.so.2 + 0x1a1ab)
                #4  0x0000560ddf2cf3fb work_task (rar2fs + 0x163fb)
                #5  0x00007f1be0371299 start_thread (libpthread.so.0 + 0x9299)
                #6  0x00007f1be029a053 __clone (libc.so.6 + 0xff053)
                
                Stack trace of thread 3753866:
                #0  0x00007f1be037a87c read (libpthread.so.0 + 0x1287c)
                #1  0x00007f1be06ecd12 n/a (libfuse.so.2 + 0x13d12)
                #2  0x00007f1be06ee8ae n/a (libfuse.so.2 + 0x158ae)
                #3  0x00007f1be06ed372 n/a (libfuse.so.2 + 0x14372)
                #4  0x00007f1be0371299 start_thread (libpthread.so.0 + 0x9299)
                #5  0x00007f1be029a053 __clone (libc.so.6 + 0xff053)
                
                Stack trace of thread 3780051:
                #0  0x00007f1be037a87c read (libpthread.so.0 + 0x1287c)
                #1  0x00007f1be06ecd12 n/a (libfuse.so.2 + 0x13d12)
                #2  0x00007f1be06ee8ae n/a (libfuse.so.2 + 0x158ae)
                #3  0x00007f1be06ed372 n/a (libfuse.so.2 + 0x14372)
                #4  0x00007f1be0371299 start_thread (libpthread.so.0 + 0x9299)
                #5  0x00007f1be029a053 __clone (libc.so.6 + 0xff053)
                
                Stack trace of thread 3754421:
                #0  0x00007f1be037a87c read (libpthread.so.0 + 0x1287c)
                #1  0x00007f1be06ecd12 n/a (libfuse.so.2 + 0x13d12)
                #2  0x00007f1be06ee8ae n/a (libfuse.so.2 + 0x158ae)
                #3  0x00007f1be06ed372 n/a (libfuse.so.2 + 0x14372)
                #4  0x00007f1be0371299 start_thread (libpthread.so.0 + 0x9299)
                #5  0x00007f1be029a053 __clone (libc.so.6 + 0xff053)
                
                Stack trace of thread 3753864:
                #0  0x00007f1be0262125 clock_nanosleep@@GLIBC_2.17 (libc.so.6 + 0xc7125)
                #1  0x00007f1be0267357 __nanosleep (libc.so.6 + 0xcc357)
                #2  0x00007f1be026728e sleep (libc.so.6 + 0xcc28e)
                #3  0x0000560ddf2d0224 work (rar2fs + 0x17224)
                #4  0x0000560ddf2c9b73 main (rar2fs + 0x10b73)
                #5  0x00007f1be01c2b25 __libc_start_main (libc.so.6 + 0x27b25)
                #6  0x0000560ddf2c9fde _start (rar2fs + 0x10fde)

                Stack trace of thread 3753867:
                #0  0x00007f1be037a87c read (libpthread.so.0 + 0x1287c)
                #1  0x00007f1be06ecd12 n/a (libfuse.so.2 + 0x13d12)
                #2  0x00007f1be06ee8ae n/a (libfuse.so.2 + 0x158ae)
                #3  0x00007f1be06ed372 n/a (libfuse.so.2 + 0x14372)
                #4  0x00007f1be0371299 start_thread (libpthread.so.0 + 0x9299)
                #5  0x00007f1be029a053 __clone (libc.so.6 + 0xff053)
                
                Stack trace of thread 2056332:
                #0  0x00007f1be037a87c read (libpthread.so.0 + 0x1287c)
                #1  0x00007f1be06ecd12 n/a (libfuse.so.2 + 0x13d12)
                #2  0x00007f1be06ee8ae n/a (libfuse.so.2 + 0x158ae)
                #3  0x00007f1be06ed372 n/a (libfuse.so.2 + 0x14372)
                #4  0x00007f1be0371299 start_thread (libpthread.so.0 + 0x9299)
                #5  0x00007f1be029a053 __clone (libc.so.6 + 0xff053)
@hasse69
Copy link
Owner

hasse69 commented Mar 17, 2021

Does not really look good, does it? ☹️
I thought I caught all these bugs already but apparently not.
Any idea how to recreate this crash or is it simply random?

@hasse69
Copy link
Owner

hasse69 commented Mar 17, 2021

Good news is that is seems to be isolated to the __resolve_dir() function which is then easier to track down than some asynchronous event and it seems to be a call to free() that is bad. There are a few of those in this function and the one I suspect the most is related to an error case/branch in the code. If this would be a bug in the "normal" flow I think we would have caught it already.

When you try to reproduce. can you please run rar2fs in the foreground using the -f flag? That way we can catch any output from rar2fs to stdout/stderr which might give us a clue to what path is not handled properly.

@hasse69
Copy link
Owner

hasse69 commented Mar 17, 2021

Or even better, run it through gdb if you know how to do that? If you do please run the binary you compiled and not the one you install since the latter would become stripped from any symbols. Note that the -f flag is needed also in this case.

@hrxcodes
Copy link
Author

Yeah it showed up from nowhere and happens quite regularly. So while I don't have instructions on how to reproduce, its hows up after a few hours or days (can happen multiple times a day, or work for a few days).

I will let you know as soon as I have something from gdb.

When it stops, do you want just a bt of the current thread or also things like info locals and info threds ?

Notes to self (and possibly others in the future):

sudo gdb ./rar2fs
handle SIG32 pass nostop noprint
r <rar2fs-arguments>

@hasse69
Copy link
Owner

hasse69 commented Mar 18, 2021

A stack trace and possibly a dump of some of the symbols/variables in the context of the crash would be enough for now. The value of the pointer being freed would be good. I suspect it to be a rogue pointer not initialized properly. Otherwise I would have expected some double free or corruption error. Calling free() with NULL is always ok.

@hasse69
Copy link
Owner

hasse69 commented Mar 21, 2021

In case you can reproduce, also try this attached patch on top of master/HEAD. I have my suspicions that patch might solve this issue.

issue160.patch.txt

@hrxcodes
Copy link
Author

So far it has been running smooth, but I suspect not for much longer. It's always like this when you want it to happen it doesn't.
I'll let you know once it crashes and then try and update to the patch you supplied and run it in gdb again in case it would not solve the issue.

@hasse69
Copy link
Owner

hasse69 commented Mar 27, 2021

No news here? Still no crash to work with?

@hrxcodes
Copy link
Author

Annoying as it is it's suddenly working as expected, no crashes so far since I started it.
I'm keeping an eye on it every day but it's solid so far even without the patch.
I suggest we keep this open another week and if it does not happen we can close it. The longest I had it running for before was 12 days (been 11 days since I started it in gdb).

It's just annoying it can't be reproduced from having issues multiple times a day/week -> creates issue -> can't reproduce 😞
I will keep you updated as soon as there is something happening on my end

@hasse69
Copy link
Owner

hasse69 commented Mar 29, 2021

Let's just keep it open then.

When I studied the actual crash signature there are not many places it would blow up like this. I think the patch provided would cover that. It would be a rather rare situation for this to happen so you must have been very unlucky or lucky to spot it. Never the less, the problem is/was there. Just need to wait for it.

@hrxcodes
Copy link
Author

hrxcodes commented Apr 9, 2021

Still working perfectly fine curiously enough.
I'll keep it running in gdb until it either crashes on me or I need to reboot the box.

I will get back when/if something bad happens, you may close the issue in the meanwhile if you want.
Thank you for your work on rar2fs, it's great! 🍻

hasse69 pushed a commit that referenced this issue Apr 9, 2021
Reset pointer after free to avoid a potential double-free error.
Even though it has not yet been confirmed this might be the reason
behind issue #160.

Signed-off-by: Hans Beckerus <hans.beckerus at gmail.com>
@hasse69
Copy link
Owner

hasse69 commented Apr 9, 2021

Despite the fact you have not been able to reproduce the issue I have chosen to merge the patch since it does no harm really.
Let's close this issue but feel free to re-open when/if needed.

@hasse69 hasse69 closed this as completed Apr 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants