Freezing when receiving dcc #159

Open
tiagoapimenta opened this Issue Oct 22, 2014 · 45 comments

@tiagoapimenta
commented Oct 22, 2014

I'm running irssi 0.8.15 (20100403 1617):armhf on a Raspberry Pi Model B with Raspbian 7 (wheezy) kernel 3.12.28+ armv6l distribution.

Whenever I request a DCC transfer and accept it, the screen and keyboard freeze. I am unable to type any command, and even Meta + Arrow doesn't respond. I'm running irssi inside screen, though, and Ctrl + A # still works properly, so I can watch the transferred file growing from another window.

When the transfer finishes (it usually breaks because of some kind of timeout), irssi unfreezes and all the previously typed commands run at once. When the timeout isn't reached it's fine, but when it is, the connection is restarted as well.

I realised it only happens on lightweight hardware. I wonder whether it could be fixed with multithreading or some sort of asynchronous file I/O.

Questions: U&L RPi

Edit: Even though the RPi uses an SD card, I'm writing the file to a USB HDD. It should be able to keep up with the transfer speed, but it seems it doesn't.

@uwe-schwarz

commented Jul 29, 2015

I have the same problem with a VIA Nano processor and irssi 0.8.15 (Ubuntu 0.8.15-5ubuntu3). It happens mostly on DCC SENDs at more than 15 MB/s.

@dequis dequis added the bug label Jan 11, 2016

@giulange

commented Jan 17, 2016

I had the same problem on my raspberry pi:
Linux rpi-giuliano 4.1.7+ #817 PREEMPT Sat Sep 19 15:25:36 BST 2015 armv6l GNU/Linux

irssi freezes whether or not it runs inside a screen window (in both cases over ssh) as soon as the first xdcc request is submitted.
The DCC downloads are stored on an external NTFS HDD.
I remember that the first time I used irssi it was slow but working.
I also want to point out the high CPU usage (30-40%) of mount.ntfs.

I wonder if there could be something to solve this issue/bug.

Giuliano

@dequis

Member

commented Jan 17, 2016

@giulange

commented Jan 18, 2016

I installed 0.8.18-beta1 as you suggested, but nothing changed! The program freezes as soon as I submit an xdcc request. It continues downloading the requested file and stops when the transfer is interrupted (the file is rarely downloaded completely).

Could you provide more explanations (if any) on why this happens?

–––––––––––––––––––––––––––––––––––––––––––––––––––––
This was the config during installation (ensure that all is fine):
Building text frontend ........... : yes, using terminfo
Building irssi bot ............... : no
Building irssi proxy ............. : no
Building with module support ..... : yes
Building with Perl support ....... : static (in irssi binary)
Perl library directory ........... : (site default - /usr/local/lib/perl/5.14.2)
Install prefix ................... : /usr/local

Building with IPv6 support ....... : yes
Building with SSL support ........ : yes
Building with 64bit DCC support .. : yes
Building with DANE support ....... : no
Building with true color support.. : no
–––––––––––––––––––––––––––––––––––––––––––––––––––––

Thank you very much!!
Bye,
Giuliano

@dequis

Member

commented Jan 18, 2016

No idea, that was a shot in the dark because there were vaguely relevant fixes somewhat recently. Thanks for testing anyway.

@ailin-nemui

Contributor

commented Jan 18, 2016

I don't know of any change in 0.8.18 that could either positively or negatively impact this issue. I also cannot reproduce this issue myself. One guess would be that maybe the storing to disk is consuming too many resources so that Irssi is always busy saving data and cannot keep up with the other stuff? Most reports seem to be on "relatively" weak hardware here. I wonder if raising the process priority of irssi (or lowering that of NTFS) could positively influence the issue?

@dequis

Member

commented Jan 18, 2016

FWIW, b984f1f is what i was thinking of. I just vaguely remembered something that involved ntfs and dcc, and the previous reports only mentioned 0.8.15.

The whole thing sounds like IO buffers filling up and blocking. Maybe we can get the same effect with a shitty networked fuse filesystem like sshfs. Or sending sigstop to a fuse process.

@uwe-schwarz

commented Jan 18, 2016

I don't think it's disk-i/o, I experienced similar transfer rates without the issue. Maybe something with jumbo frames or special network driver issues. I don't use this hardware anymore and the issue never happened again (though my new hardware is way more powerful).

@ailin-nemui

Contributor

commented Jan 18, 2016

irssi is single-threaded, so if "something" keeps irssi hanging (maybe waiting for the disk, or I don't know what), everything else stalls too. can we get more info somehow, like attaching a debugger while it freezes or running it with strace?

@giulange

commented Jan 18, 2016

Ok, I can run the 0.8.18-beta1 version with strace.
I'm using screen, so tell me what I have to do. Here is a possible procedure:

  1. screen -S irc
  2. strace -oirssi.log -cT irssi
  3. (irssi automatically connects to a server and joins a channel)
  4. /msg bot|string|num xdcc send #id (if the file starts downloading, irssi freezes!)
  5. Ctrl-A D (to detach from the screen session called "irc" and get back to the Raspberry's shell)
  6. Then what do I do? Do I wait until irssi unfreezes, or can I kill the "irssi" process right away?
  7. I send you the log file

Waiting for your feedback.

@ailin-nemui

Contributor

commented Jan 18, 2016

sounds like a good plan. if it doesn't take too long, wait until irssi unfreezes, and make note of the time of freeze and unfreeze. also, if you think you can try more things and can get gdb installed, run this in another screen window at the time irssi freezes: gdb -p $(pidof irssi). then get the output of "bt full"; then, still in gdb, type "c" (continue), confirm that irssi is still frozen, wait a few seconds, type Ctrl+C in gdb, and get another bt full

@giulange

commented Jan 18, 2016

I ran "gdb -p 4884", then typed "bt full" as you suggested, but I'm not sure I did it right (I'm not familiar with gdb).
This is the result of gdb:
This GDB was configured as "arm-linux-gnueabihf".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Attaching to process 4884
ptrace: Operation not permitted.
(gdb) bt full
No stack.
(gdb) c
The program is not being run.
(gdb) bt full
No stack.
(gdb) bt full
No stack.
(gdb) c
The program is not being run.
(gdb) ^CQuit
(gdb) bt full
No stack.
(gdb)

irssi is still frozen...

@ailin-nemui

Contributor

commented Jan 18, 2016

the key here is probably:

Attaching to process 4884
ptrace: Operation not permitted.

try sudo gdb -p 4884 instead

@giulange

commented Jan 18, 2016

Here is the log file in the meantime. I'll now try to repeat everything.
irssi.log.zip

@giulange

commented Jan 18, 2016

I tried with sudo but got the same error:

sudo gdb -p 5190
GNU gdb (GDB) 7.4.1-debian
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Attaching to process 5190
ptrace: Operation not permitted.
(gdb) bt full
No stack.

@dequis

Member

commented Jan 18, 2016

Is that irssi still running under strace? If so, run it directly instead.

Otherwise, are you running any security-related variants of the kernel? Grsec, selinux, etc?

@giulange

commented Jan 19, 2016

Here it is (without strace):

(gdb) bt full 
#0  0xb6d21794 in write () from /lib/arm-linux-gnueabihf/libpthread.so.0
No symbol table info available.
#1  0x00079588 in sig_dccget_receive (dcc=0x35cec0) at dcc-get.c:159
        buffer = "\b\031\377`r\031\203TD\354\357zh\r\031\261]>\346[-\n0\336a}\365\031\023@\271y\341\360VWoJ>\332_\202a\262\265\361\245B\"\035r\340\331\\\272#A\212\070h\370\071\336D\037 \245\333\233.\022E\271\016f^:\017<arI\f\250\215\335\216SQhjts6h\361\347\354<U1F\016H\240x\364o\027\307\060\247\256\237\"\216\274\024\327\030\367\246-\353\aWX\003%\235\t\254\367Z\234\247\331\356\336\234\070\373\021\251\237t\262.\362}\201\237\260\212\260\224\277*\260hV\306_\362\262<K\215-VE\326\256\340c\272\365Mn\231\340\365\265\300\303\212?\354\367Hh \274Dv\234\240\247\205\375o\337mD]\336,\302\267\000B\257\352Q\201@\261\205\365\340\251C\r\267,\025%:\004$\251\035\300_\276V\356C\317?'y\354\033\344\345A^\266\261F@\350\372w,\t\354\260L!\244\060\233\232\311\035\377\fW\333\302\235Od\000\314\201y|\037T\351\272u\216\375\000\066\310l\212\022\215/\365\223;\025-\322t\332_\265a\"\214\027|\003\306u\005/p\242\a\004\231=s\321\066\276\325\000$nH\001\061b\nv\272\035y\351+\336\241z\260\354n*QL\245'\271\306)\n\200\307y\260\027"...
        ret = 512
#2  0x0008750c in irssi_io_invoke (source=0x35d090, condition=<optimized out>, data=<optimized out>) at misc.c:54
        rec = <optimized out>
        icond = <optimized out>
#3  0xb6c5ded4 in ?? () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0
No symbol table info available.
#4  0xb6c5ded4 in ?? () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0xb6937830 in poll () from /lib/arm-linux-gnueabihf/libc.so.6
(gdb) 

––––––––––––––––
here the download was interrupted, then I started a new one to make irssi freeze again!
––––––––––––––––

(gdb) bt full
#0  0xb6937830 in poll () from /lib/arm-linux-gnueabihf/libc.so.6
No symbol table info available.
#1  0xb6c185f0 in ?? () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0
No symbol table info available.
Cannot access memory at address 0x0
#2  0xb6c185f0 in ?? () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0
No symbol table info available.
Cannot access memory at address 0x0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0xb6d21794 in write () from /lib/arm-linux-gnueabihf/libpthread.so.0
(gdb) quit
A debugging session is active.

        Inferior 1 [process 11878] will be detached.

Quit anyway? (y or n) y
Detaching from program: /usr/local/bin/irssi, process 11878

––––––––––––––––
irssi still frozen
––––––––––––––––

@dequis

Member

commented Jan 19, 2016

So, again,

The whole thing sounds like IO buffers filling up and blocking. Maybe we can get the same effect with a shitty networked fuse filesystem like sshfs. Or sending sigstop to a fuse process.

The network is fast, the disk is crappy, unix is crappy. The recommended solution seems to be to use threads, which sucks.

Weechat forks (xfer_network_recv_file_fork / xfer_dcc_recv_file_child) and has a handmade select() loop in the child process.
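
For illustration, here is a minimal sketch of a fork-based approach in C. It is not WeeChat's or irssi's code: the helper names `spawn_disk_writer` and `wait_disk_writer` and the pipe-based hand-off are assumptions made for this example. The child does the potentially blocking write() calls, so only the child stalls when the disk is slow.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not an irssi or WeeChat function): fork a child that
 * copies everything it reads from a pipe into `path`. On success the child's
 * pid is returned and the write end of the pipe is stored in *feed_fd. */
pid_t spawn_disk_writer(const char *path, int *feed_fd)
{
    int p[2];
    pid_t pid;

    if (pipe(p) < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: only this process blocks when the disk is slow. */
        char buf[32768];
        ssize_t n;
        int out;

        close(p[1]);
        out = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (out < 0)
            _exit(1);
        while ((n = read(p[0], buf, sizeof buf)) > 0) {
            ssize_t off = 0;
            while (off < n) {
                ssize_t w = write(out, buf + off, (size_t)(n - off));
                if (w < 0)
                    _exit(1);
                off += w;
            }
        }
        close(out);
        _exit(n < 0 ? 1 : 0);
    }
    /* Parent: keep only the write end and feed received data into it. */
    close(p[0]);
    *feed_fd = p[1];
    return pid;
}

/* Wait for the writer child; returns 0 if it exited cleanly, -1 otherwise. */
int wait_disk_writer(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A real client would also have to make `feed_fd` non-blocking and buffer data in memory, otherwise a full pipe would stall the parent just like the disk does.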

@giulange

commented Jan 20, 2016

Important update:
I reformatted the HDD as ext4 (instead of NTFS) and irssi no longer freezes!

Maybe mount.ntfs consumes too many resources...

Bye!
Giuliano

@slashbeast

commented Jun 24, 2016

Hi,

The issue is still open so allow me to revive it.

I am facing this issue too; however, I am not using embedded hardware but a rather powerful box. Usually there's no issue with DCC downloads and irssi is fully responsive. If I put heavy load on the storage, however, I can reproduce the issue.

It seems to be connected to the blocking I/O that is triggered when the amount of dirty memory hits the limit set by vm.dirty_ratio or vm.dirty_bytes. When this limit is hit, the kernel blocks all writes and flushes the data to storage, and that seems to be the issue here. irssi is stuck in iowait and is unresponsive during that time.

@ailin-nemui

Contributor

commented Jun 24, 2016

thanks for the analysis, question that remains is how can this be solved? can it be solved at all without forking?

@LemonBoy

Member

commented Jun 24, 2016

The only way to break out of the loop is to drain the buffer faster than it is filled.
And, besides that, I think 512 bytes is too small a buffer size.

@ailin-nemui

Contributor

commented Jun 24, 2016

thanks LemonBoy, then I guess one way might be to introduce some forced interruption of the buffer receiving operation in order to keep the main loop from starving

is there an easy way to reproduce and interactively tackle this issue?
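
One way to picture such a forced interruption is a receive callback that copies at most a fixed number of chunks and then returns to the main loop. The sketch below is only an illustration, not irssi's `sig_dccget_receive`; the function name `drain_bounded` and its parameters are made up for this example, and it assumes `in_fd` is either non-blocking or known to have data pending.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: copy at most `max_chunks` chunks of up to `bufsize`
 * bytes from in_fd to out_fd, then return so the caller's main loop can
 * handle keyboard and screen events again.
 * Returns the number of bytes copied, or -1 on a hard error. */
long drain_bounded(int in_fd, int out_fd, char *buf, size_t bufsize,
                   int max_chunks)
{
    long total = 0;
    int i;

    for (i = 0; i < max_chunks; i++) {
        ssize_t off = 0;
        ssize_t n = read(in_fd, buf, bufsize);

        if (n == 0)
            break; /* end of stream */
        if (n < 0) {
            /* Nothing to read right now, or interrupted: stop early and
             * let the next callback invocation try again. */
            if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
                break;
            return -1;
        }
        /* Write the whole chunk, handling short writes. */
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
    return total;
}
```

With `max_chunks` set to 1 this does a single read per callback; the main loop would simply invoke the callback again while the socket stays readable.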

@slashbeast

commented Jun 24, 2016

The issue happens when you hit the dirty-memory limit, which is set by either vm.dirty_ratio (a percentage of system memory) or vm.dirty_bytes. By default the Linux kernel has vm.dirty_ratio set to 20 and vm.dirty_background_ratio set to 10, meaning it will start flushing dirty pages in the background when they reach 10% of system memory and do the blocking flush when they reach 20% (or when dirty_writeback_centisecs is hit).

My box has the background threshold set to 50M and the blocking one to 512M via sysctl:

vm.dirty_bytes=536870912
vm.dirty_background_bytes=52428800

So to answer your question: you need to reduce vm.dirty_bytes to something small and then do a huge write to the same block device that your DCC downloads are stored on. In my case it was repacking a 20 GB video file into an mkv container.

You can check the dirty pages in memory via the /proc/meminfo interface:

grep Dirty /proc/meminfo
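
To watch this counter from a program rather than with grep, a small C sketch could read the same line. The helper names `parse_dirty_line` and `read_dirty_kb` are hypothetical; the only thing taken from the system is the `Dirty:` line format of /proc/meminfo.

```c
#include <stdio.h>

/* Hypothetical helper: return the value in kB if `line` is the "Dirty:"
 * line of /proc/meminfo, or -1 for any other line. */
long parse_dirty_line(const char *line)
{
    long kb;

    if (sscanf(line, "Dirty: %ld kB", &kb) == 1)
        return kb;
    return -1;
}

/* Scan /proc/meminfo and return the amount of dirty memory in kB,
 * or -1 if the file or the line is not available. */
long read_dirty_kb(void)
{
    char line[256];
    long kb = -1;
    FILE *f = fopen("/proc/meminfo", "r");

    if (f == NULL)
        return -1;
    while (kb < 0 && fgets(line, sizeof line, f) != NULL)
        kb = parse_dirty_line(line);
    fclose(f);
    return kb;
}
```

Polling `read_dirty_kb()` while the big write is running should show the value climbing towards the configured limit shortly before the freeze.
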
@dthierbach

commented Aug 1, 2016

I had the same issue on a fairly powerful desktop. I changed the buffer size in sig_dccget_receive in dcc-get.c from 512 bytes to 32 kilobytes (no idea if that much is necessary, I just gratuitously increased it), and the problem is gone. The source used is version 0.8.19-2 from Debian.

My guess is that the larger number of syscalls and the context-switching delays caused by the small buffer make everything too slow for the loop to exit.
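
To get a feel for the numbers: each read() returns at most one buffer, so receiving N bytes takes at least ceil(N / buffer size) read calls (and, in a read-then-write loop, as many writes). A tiny C sketch of that arithmetic, with a made-up helper name `min_read_calls`:

```c
#include <stddef.h>

/* Lower bound on the number of read() calls needed to receive `total`
 * bytes when each call can return at most `bufsize` bytes. */
unsigned long long min_read_calls(unsigned long long total, size_t bufsize)
{
    if (bufsize == 0)
        return 0; /* invalid input: nothing can be read */
    return (total + bufsize - 1) / bufsize; /* integer ceiling division */
}
```

For a 20 MiB file that is at least 40960 calls with a 512-byte buffer, but the bound drops to 640 with a 32 KiB buffer.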

@ailin-nemui

Contributor

commented Aug 1, 2016

thanks for reporting back! the small buffer is certainly suboptimal, if it doesn't bother you too much could you test how 4kb behaves on your system? I think a change from 512 -> 4096 wouldn't be too "dangerous" here

@dthierbach

commented Aug 1, 2016

Ok, I did a bit of testing. This is on a slow DSL 2000 connection via WLAN, MTU is 1500. I think my WLAN can use Jumbo frames (I've seen them mentioned in the syslog), but I don't know the size (Wikipedia says it's up to 9K).

I added a bit of code to show average and maximum size when doing network reads. I tested with a buffer size of 4K, 32K and 512K. This is on a normally loaded system, i.e. with web browser open etc, doing a bit of other stuff while downloading.

Average read size is around 2K for all buffer sizes. Both for 4K and 32K buffer size, the read maxed the buffer at least once. For 512K, the max was 350K. For 4K, there was no freezing, but response to typing seemed to lag a little bit from time to time. For 32K and 512K, there were no such issues.
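
Instrumentation of this kind can be sketched in a few lines of C. This is not the code that was actually used for the test above; the struct and function names are assumptions for illustration. The idea is to call `read_stats_add` with the return value of every successful network read.

```c
#include <stddef.h>

/* Running statistics over the sizes returned by successful reads. */
struct read_stats {
    unsigned long calls;      /* number of successful reads */
    unsigned long long total; /* sum of all read sizes in bytes */
    size_t max;               /* largest single read in bytes */
};

/* Record one successful read of `n` bytes. */
void read_stats_add(struct read_stats *s, size_t n)
{
    s->calls++;
    s->total += n;
    if (n > s->max)
        s->max = n;
}

/* Average read size in bytes, or 0 if nothing was recorded yet. */
double read_stats_avg(const struct read_stats *s)
{
    return s->calls ? (double)s->total / (double)s->calls : 0.0;
}
```

If `max` regularly equals the buffer size, the buffer is the limiting factor and the kernel had more data queued than a single read could take.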

So in conclusion, 4K still seems to be a bit on the "low" side, especially considering that my DSL connection is slow. I don't know what the Linux scheduler is doing, but apparently sometimes it takes quite a while until irssi gets to read data again, and then the kernel may have a lot of data it wants to get rid of all at once.

Is there any reason you want to keep the buffer size small? How did you come up with the 512 bytes in the first place? That's quite below a normal MTU. 32K is not a lot of memory in today's terms. Or maybe make it configurable if you are anxious?

@ailin-nemui

Contributor

commented Aug 1, 2016

well I think what may be necessary instead/alternatively (apart from threading) is to bail out and resume the user interface processing instead of always draining the input buffer
(I didn't come up with the 512 personally, I can only guess that the buffer size fell down from heavens)

@ailin-nemui

Contributor

commented Nov 1, 2016

thanks for the thorough investigation @dthierbach , I think we could use a 32K buffer on the heap instead. @dequis @LemonBoy ? patch welcome from my side

@LemonBoy

Member

commented Nov 1, 2016

I think that's fine as a workaround and definitely better than 512.
A comment should be possibly added with a brief explanation, too.

ailin-nemui added a commit to ailin-nemui/irssi that referenced this issue Nov 23, 2016

add a static buffer for dcc received data
increased buffersize might make irssi freeze less / irssi#159

@ailin-nemui

Contributor

commented Jan 3, 2017

I will close this, please reopen if needed

@ailin-nemui ailin-nemui closed this Jan 3, 2017

@foice

commented Jun 1, 2017

I am afraid I have to reopen this. I just compiled irssi from git, hoping that I would no longer see the freeze I got on /dcc get. Unfortunately, even with irssi 1.1-g2d0a9b4 (20170530 1314), irssi freezes as soon as I do /dcc get. Now, however, it unfreezes after a short while; it is just very unresponsive. Am I alone with this?

I am on a raspberry pi with raspbian jessie, if that matters.

Is there anything we can do about this?

@dequis dequis reopened this Jun 1, 2017

@ailin-nemui

Contributor

commented Jun 2, 2017

yes, for example the first thing you could try would be to add a break; here:

irssi/src/irc/dcc/dcc-get.c

Lines 174 to 175 in 31b9d11

dcc->transfd += ret;
}
and see if that helps

@ailin-nemui

Contributor

commented Jun 2, 2017

btw. what file system are you using?

@foice

commented Jun 2, 2017

@ailin-nemui

Contributor

commented Jun 2, 2017

it may be waiting for the samba to sync, the current idea is to improve the situation using async io and not having to use even more processes. did you see any difference after adding the break?

@foice

commented Jun 3, 2017

@ailin-nemui

Contributor

commented Jun 3, 2017

yes, just make and make install should even be enough if you configured it before

@foice

commented Jun 4, 2017

@hrvstr

commented Dec 8, 2018

@foice Can you chime in if it fixed the issue for you? I am experiencing the same at the moment.

@foice

commented Dec 8, 2018

@hrvstr

commented Dec 9, 2018

yes, for example the first thing you could try would be to add a break; here:

irssi/src/irc/dcc/dcc-get.c
Lines 174 to 175 in 31b9d11
dcc->transfd += ret;
}
and see if that helps

@ailin-nemui would it be possible to add this fix to the binaries on apt?

@ailin-nemui

Contributor

commented Dec 10, 2018

I suspect I am not the one creating the binaries that you are using. You would have to ask that person

@hrvstr

commented Dec 10, 2018

@ailin-nemui

Contributor

commented Dec 12, 2018

@hrvstr does this fix work for you?
