Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rar2fs stops responding and taking up 100% of cpu #11

Closed
hasse69 opened this issue Mar 13, 2015 · 22 comments
Closed

rar2fs stops responding and taking up 100% of cpu #11

hasse69 opened this issue Mar 13, 2015 · 22 comments

Comments

@hasse69
Copy link
Owner

hasse69 commented Mar 13, 2015

I have an Ubuntu 12.04 LTS machine with plex media server that accesses
an ext 4 partition on a raid via rar2fs. About once every day i have to kill -9 <rar2fs pid> and umount my dir because rar2fs locks up using all the cpu cycles it can get.

cmndline for rar2fs as root: rar2fs --seek-length=1 -o allow_other /data/media/ /data/rar2media/

rar2fs -V gives:
rar2fs v1.15.5 build 310 (DLL version 5)    Copyright (C) 2009-2012 Hans Beckerus
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it under
certain conditions; see <http://www.gnu.org/licenses/> for details.
FUSE library version: 2.8.6
fusermount version: 2.8.6
using FUSE kernel interface version 7.12

When rar2fs isn't locking up it works just as it is supposed to. I have fsck:ed the data drive and the ubunto os is a fresh install.

On my old server rar2fs didnt have the -o allow_other option so i had to run rar2fs as the user plex so i'm going to try that now.

/Jerry

Original issue reported on code.google.com by rapp.jerry on 2013-01-17

@hasse69
Copy link
Owner Author

hasse69 commented May 23, 2015

Hi Jerry. Thanks for your issue report. 

Before we continue I need to understand some more details about your problem. When
you detect that rar2fs is consuming 100% of the CPU, can you tell if it happens at
some specific operation towards the mounted folder? Is it always after accessing the
same file or is it completely random? Does the problem happen also if you don't access
any files?
Is this a single- or multi-core server? If it is a multi-core server, what happens
if you try the option --no-smp when starting rar2fs?

This is the first time I have had a report on this specific issue so I do not have
any clues yet to what could actually be the root cause. What you could try (since you
run 1.15.5 I assume you built rar2fs yourself) is to configure with --enable-debug=2
and re-compile. Then run with the -f flag. This will log a lot to the terminal from
which rar2fs it is started so try to use a fairly large history buffer. Then at least
we can see what rar2fs is doing when CPU load sky rockets.
Also, connect to the process using gdb is an option and check current stack. This provides
that the server can respond to anything when CPU load is so high.
Since CPU load is stuck at 100% I suspect a loop of some kind. Could be that PMS is
triggering a case I have not yet been able to test. It could also be something in FUSE
or libunrar :(
Did you compile against unrar source 4.2.4?

Hans









Original issue reported on code.google.com by hans.beckerus on 2013-01-17 16:48:09

@hasse69
Copy link
Owner Author

hasse69 commented May 23, 2015

One other thing. How is PMS interacting with the mounted folder? Does it try to write
something to it? I have never tested PMS myself so any information you can give me
is valuable. How does the PLEX client interact with the server? I guess it is using
some network protocol for this, do you know which one?
And, when you see the problem, how many threads have actually been created by rar2fs?
You can install eg. htop which will show you that in a nice way.
Is there something in /var/log/messages or dmesg that can hint towards something?

Hans

Original issue reported on code.google.com by hans.beckerus on 2013-01-17 17:11:51

@hasse69
Copy link
Owner Author

hasse69 commented May 23, 2015

Hi, thanks for your quick response!

i will try to implement your suggestions and i will post back what i have when it crashes
the next time. 

I will post further info regarding my build when i get home from work.

Original issue reported on code.google.com by rapp.jerry on 2013-01-17 17:17:36

@hasse69
Copy link
Owner Author

hasse69 commented May 23, 2015

Great. One more question, do you have a huge amount of data in /data/media ?
What happens if you limit that to only a few files, does it still behave the same?
What we need to do here is to narrow it down a bit so we can rule out some components
involved. Currently I am leaning towards some problem with massive "attacks" towards
the file system by PMS caused by some periodic indexing maybe?

Hans

Original issue reported on code.google.com by hans.beckerus on 2013-01-17 17:26:36

@hasse69
Copy link
Owner Author

hasse69 commented May 23, 2015

(No text was entered with this change)

Original issue reported on code.google.com by hans.beckerus on 2013-01-17 17:29:28

@hasse69
Copy link
Owner Author

hasse69 commented May 23, 2015

To better mimic your environment I installed PMS on my QNAP NAS. It is a dual core 1Ghz
Intel Atom server, so not a very powerful machine in general. And, I think I might
see a behavior similar to what you describe. It is related to the scanning of the media
library by PMS, which I actually suspected already from the beginning. However, even
though rar2fs is consuming a whole lot of CPU, it is only during the scanning process
which in my case is *extremely* slow. An update of the library with around 300 media
files took close to 30 minutes! During that time the CPU load is very high but once
the scan is complete the load drops back to normal again. It could appear as if rar2fs
is locking up but it did not in my case. The server just can not respond to any heavy
tasks during the PMS library update. Also, I am running rar2fs on top of another FUSE
based encryption filesystem and that too peaks at almost 100% during the same time
frame.
Can you confirm if rar2fs actually is stuck at 100% or if it is only during the time
of the PMS media library update? Check your Plex/Web timeline, it will show the current
progress/activity. I do not know what settings you made in PMS, but if you have the
periodic update enabled it could explain why you see this only now and then.

Original issue reported on code.google.com by hans.beckerus on 2013-01-18 08:33:27

@hasse69
Copy link
Owner Author

hasse69 commented May 23, 2015

More tests conclude that running with a patched version of rar2fs that ignores compressed
archives completely rectifies the CPU load issue. However, indexing of the media library
only went down from 30 or so minutes to around 27 minutes so nothing much was really
gained on this system other than that the PMS server is more responsive during the
media library scan. I have not tested the deep scan mode yet.
Locking rar2fs to a single CPU core using --no-smp does not have any effect other than
making things worse since load balancing performed by Linux is disabled for this process.

Original issue reported on code.google.com by hans.beckerus on 2013-01-18 21:37:44

@hasse69
Copy link
Owner Author

hasse69 commented May 23, 2015

Can you try to replace your current rar2fs.c file with this patched one and rebuild.
Using this CPU load is still high but if something else needs the CPU it will be granted
so. PMS indexing time was reduced somewhat and at the same time server responsiveness
was radically increased.

Original issue reported on code.google.com by hans.beckerus on 2013-01-19 01:15:34


- _Attachment: [rar2fs.c](https://storage.googleapis.com/google-code-attachments/rar2fs/issue-11/comment-10/rar2fs.c)_

@hasse69
Copy link
Owner Author

hasse69 commented May 23, 2015

My system
Intel® Core™2 Quad CPU Q6600 @ 2.40GHz × 4 
memory: 6G

Rar2fs
rar source 4.2.4

Rar2fs has been taking up 100% cpu for several hours now and the only program that
is unresponsive is plex. The rest of the computer is fine/responsive.

Nothing in dmesg that i can tell.

I do have a large media drive ~8TB but on my old system with older versions of plex
and rar2fs this was never a problem.

Last lines of Plex Media scanner.log
Jan 18, 2013 09:26:36 [0x7fd4704f5740] DEBUG - HTTP requesting to: http://127.0.0.1:32400/:/metadata/notify/cullTimeline?librarySecti$
Jan 18, 2013 09:26:36 [0x7fd4704f5740] DEBUG - Scanning Anime with 1449 current items
in the database...
Jan 18, 2013 09:26:36 [0x7fd4704f5740] DEBUG - Performing a scan with 'Plex Series
Scanner' (language: en virtual: 0).
Jan 18, 2013 09:26:36 [0x7fd4704f5740] DEBUG -   * Scanning /data/rar2media/anime

so im guessing it has been stuck for 24hrs now. 

in htop i get:

16335 plex       20   0  166M 67664   376 S 100.  1.1 26h55:18 ├─ /bin/rar2fs --seek-length=1
-o allow_other /data/media/ /data/rar2med
16336 plex       20   0  166M 67664   376 R 100.  1.1 26h55:16 │  ├─ /bin/rar2fs --seek-length=1
-o allow_other /data/media/ /data/rar2
 8394 plex       20   0     0     0     0 Z  0.0  0.0  0:00.00 │  └─ rar2fs

I will try some of your recompiling advice now

I don't really know if there is a way to see what a process is doing right now in linux
as i am more of a windows user. 

Original issue reported on code.google.com by rapp.jerry on 2013-01-19 09:47:44

@hasse69
Copy link
Owner Author

hasse69 commented May 23, 2015

Well definitely does not look good :(
The zombie process indicates that the thread could not terminate properly.
What exact version of rar2fs did you use before, and was it the same version of Ubuntu?
Did your old system use the same version of FUSE?

Original issue reported on code.google.com by hans.beckerus on 2013-01-19 11:28:00

@hasse69
Copy link
Owner Author

hasse69 commented May 23, 2015

You can try to attach gdb to either PID 16335 or PID 16336.
Eg. gdb --pid 16336
Then you can dump the current stack using 'where'
It might give a hint to what is going on.
But PID 16335 is sleeping so it is probably just blocking on something.
PID 16336 is the most interesting one I would guess.


Original issue reported on code.google.com by hans.beckerus on 2013-01-19 11:34:55

@hasse69
Copy link
Owner Author

hasse69 commented May 23, 2015

Another thing you can check is under /proc filesystem.
Every process has its own entry there.

Eg.
  cd /proc/16336
  cat stack

This should provide you with a fairly recent snapshot of the call stack.

Original issue reported on code.google.com by hans.beckerus on 2013-01-19 13:13:45

@hasse69
Copy link
Owner Author

hasse69 commented May 23, 2015

If you get the chance, also try to reproduce the problem with the attached patch.

Original issue reported on code.google.com by hans.beckerus on 2013-01-20 21:57:16


- _Attachment: [rar2fs.c](https://storage.googleapis.com/google-code-attachments/rar2fs/issue-11/comment-15/rar2fs.c)_

@hasse69
Copy link
Owner Author

hasse69 commented May 23, 2015

Sure i'll try. But i haven't had a lockup for days now. I'll report my findings as soon
as/if it crashes again.

Original issue reported on code.google.com by rapp.jerry on 2013-01-21 06:27:53

@hasse69
Copy link
Owner Author

hasse69 commented May 23, 2015

Just got a freeze, unpatched version.

I redacted the filename.

From gdb:

0x00007fa95ce3ed2d in read () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) where
#0  0x00007fa95ce3ed2d in read () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x0000000000409a12 in read (__nbytes=315160, __buf=0x7fa9560091f0, __fd=<optimized
out>)
    at /usr/include/x86_64-linux-gnu/bits/unistd.h:45
#2  extract_to (file=<optimized out>, sz=315160, fp=0x7fa95461daf0, oper=1, entry_p=<optimized
out>)
    at rar2fs.c:235
#3  0x0000000000409c3a in listrar_rar (path=0x7fa9545d7c30 "/utdelat", buffer=0x7fa95bc84c18,
    arch=0x7fa95bc84a40 "/data/media/utdelat/<redacted>.rar", hdl=<optimized out>,
next=0x7fa9545ffc20,
    entry_p=0x7fa95461a630, need_password=0) at rar2fs.c:1518
#4  0x000000000040aa47 in listrar (path=0x7fa9545d7c30 "/utdelat", buffer=0x7fa95bc84c18,
    arch=0x7fa95bc84a40 "/data/media/utdelat/<redacted>.rar", Password=0x0) at rar2fs.c:1755
#5  0x000000000040c197 in readdir_scan (dir=0x7fa9545d7c30 "/utdelat", root=0x7fa95bc84bb0
"/data/media/utdelat",
    next=0x7fa95bc84c18) at rar2fs.c:1979
#6  0x000000000040c56b in rar2_readdir (path=0x7fa9545d7c30 "/utdelat", buffer=0x7fa95448cc30,
    filler=0x7fa95d7b1de0, offset=<optimized out>, fi=<optimized out>) at rar2fs.c:2037
#7  0x00007fa95d7b5149 in fuse_fs_readdir () from /lib/libfuse.so.2
#8  0x00007fa95d7b538c in ?? () from /lib/libfuse.so.2
#9  0x00007fa95d7bc596 in ?? () from /lib/libfuse.so.2
#10 0x00007fa95d7bad65 in fuse_session_loop () from /lib/libfuse.so.2
#11 0x0000000000408683 in work_task (data=0x7fff1ffa0d70) at rar2fs.c:3379
#12 0x00007fa95ce37e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#13 0x00007fa95cb64cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#14 0x0000000000000000 in ?? ()

from /proc/pid/stack

nexx@lemur:/proc/23484$ sudo cat stack
[sudo] password for nexx:
[<ffffffff81090708>] hrtimer_nanosleep+0xb8/0x180
[<ffffffff81090846>] sys_nanosleep+0x76/0x90
[<ffffffff81665a02>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff


Original issue reported on code.google.com by rapp.jerry on 2013-01-21 07:57:48

@hasse69
Copy link
Owner Author

hasse69 commented May 23, 2015

Ok, seems there is a problem with the RAR inside RAR function :(
If the stack is taken from the looping thread/process then it seems read is stuck trying
to get data that never arrives on the pipe. Probably the unrar process died without
notification. Nothing that I have ever encountered before I am afraid.

I would be good if you could provide me the details (in private mail if you so wish)
about the .rar file in question. If I could get access to it I might be able to reproduce
the problem. That would certainly make debugging easier ;) Otherwise I will have to
ask for your help to run different version of rar2fs in an attempt to find the problem.


Original issue reported on code.google.com by hans.beckerus on 2013-01-21 09:34:48

@hasse69
Copy link
Owner Author

hasse69 commented May 23, 2015

right. it seems that if i try to list (navigate to dir and typing ls) that particular
.rar file, rar2fs will lock up. It isn't always taking up 100% cpu but it wont respond
anymore. I will send the file to your gmail address if that's OK? I'm guessing i have
other rars that lock up similarly as the offending file isn't in a directory that plex
watches.

Original issue reported on code.google.com by rapp.jerry on 2013-01-21 09:52:40

@hasse69
Copy link
Owner Author

hasse69 commented May 23, 2015

The problem has been identified to show up when there are non-flat .rar files inside
the main archive. The solution is either to remove support for this specific type of
archives (not expand the .rar inside main archive) or do some coding to display it
properly. Have not decided on what the final solution will be yet.

In the mean-while, try using the --seek-depth=0 flag. It should resolve the lock-up
issue, but instead will completely disable the support for .rar files inside main archive.

Original issue reported on code.google.com by hans.beckerus on 2013-01-21 23:30:45

@hasse69
Copy link
Owner Author

hasse69 commented May 23, 2015

Progress, But still a few miles to go. Actually, while trying to resolve this problem
numerous other issues popped up. In short, this will be a major update. That also means
official commit will be delayed due to the amount of regression testing involved.

Original issue reported on code.google.com by hans.beckerus on 2013-01-23 23:14:27

@hasse69
Copy link
Owner Author

hasse69 commented May 23, 2015

Fix committed to trunk. Please test the latest version and verify that the issue is
resolved.

Original issue reported on code.google.com by hans.beckerus on 2013-01-25 20:39:36

@hasse69
Copy link
Owner Author

hasse69 commented May 23, 2015

Right. I have installed the latest ver from trunk and so far everything seems good.
I tried the file that crasched before and now it works perfectly so hopefully the issue
is sorted.

I'll post again if it crasches.

Thanks a lot for your work!


Original issue reported on code.google.com by rapp.jerry on 2013-01-25 21:12:09

@hasse69
Copy link
Owner Author

hasse69 commented May 23, 2015

Since I have not heard anything in a while I assume things are working as expected.
Case closed.

Original issue reported on code.google.com by hans.beckerus on 2013-02-03 20:34:07

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant