Segfault on sending large message #796
Comments
Works for me using ftp://ftp.cs.princeton.edu/pub/cs226/textfiles/bible.txt (3.9 MB). Get a backtrace: https://wiki.debian.org/HowToGetABacktrace
Got to 1 Kings 11:43 before the crash.
What quit message do other people see in the channel? Does this also happen with a clean profile?
A few seconds after the messages stop, irssi segfaults and
With a completely clean profile, flood protection stops me from trying this. After turning off flood protection, I send the entire thing, but only part of it actually shows up for others in the channel (same as was happening before). A few seconds later, instead of segfaulting, the connection is just closed. Irssi says "Connection lost" and the channel shows the same quit message as when the segfault occurred (has quit [Connection closed]). I'm surprised the behavior changed, since the only real changes to my config besides turning off flood protection are color schemes and aliases.
Okay, this is interesting. For reference, I'm testing this with a "test" client that actually sends the bible text, and another client (the "viewing" client) joined to the same channel, where I observe externally. When I send the bible text from the test client, I see it all get sent to the channel within a second or two. At that point I usually wait a few seconds and then the client segfaults or gets disconnected (depending on config), but I just realized that this always coincides with someone else in the channel saying or doing something that gets sent to the test client. With the original config, where the segfault occurs, anyone saying anything in the channel or quitting it after the text has stopped triggers the segfault. With the fresh config, it triggers a disconnect.
Here's another backtrace
Are there any scripts loaded that could explain the difference between the configs, or any other config differences? Which client crashes, the one that sends the bible?
The only scripts I have loaded are a few simple things that register a command. Nothing that listens to all lines, incoming or outgoing, like I would expect from something that causes a segfault under high-volume text. The only config differences are also very minor: just a few aliases and a color scheme. When I have time later I'll try enabling things from my normal config bit by bit in a fresh config and see at what point it starts crashing.
Yes.
The odd thing in your backtrace is that it crashes on some incoming message, so it should be unrelated to your sending of the bible (but maybe the sending causes some memory corruption somewhere? Just random speculation on my part). Are you trying it with the bible dequis linked? Do you have the possibility to get a Wireshark capture of the network traffic before it crashes?
… already does not look like a real line that should be coming in from the IRC server. If you go to the first frame (f 0) and run p *server, is the server record intact? You can also try to show us a
Yeah, it's always the first incoming message after the bible text stops getting sent.
I don't think they're exactly the same bible, but they're nearly the same size and it's all just normal ASCII text.
Oh, here's a new development. I only crash when I send the text to a channel that has other people in it. What seems to be happening is that while the text is flooding into the channel and my client is busy sending it, some other client in the channel can't keep up and crashes; the server's message about their disconnect gets sent to me, and my bible flood immediately stops. The next message to me then segfaults. If I do this in an empty channel, or with other users I know won't say anything or crash for the duration of the flood, I don't crash and the entire text is sent.
I can't really track it down, even with all the information you provided. Are you comfortable with compiling irssi? You could apply this patch https://github.com/ailin-nemui/irssi/commit/4e56675d2fea8aed59d91875f317ec228c94da85.patch and record a signal emission trace like so:
@ailin-nemui Sure, give me a few minutes.
Okay, so with the version of irssi I compiled with the patch, it never crashes. I have no idea why. Also, the signal_trace.log file is 645 MB.
Well... thanks for trying. Meh. I wonder if you could run it under the valgrind memory checker (without the signal trace patch), but maybe that would also be too slow to make it crash. Or with some other method? @josephbisch, do you have any suggestions on how best to triage this issue?
I second @ailin-nemui's suggestion to capture the network traffic with Wireshark, if possible, to see what the server is sending to the crashing client. Then we might be able to create a "fake" server with netcat that sends the same commands, so we don't need a real server and client setup to reproduce the issue on our end (hopefully).
If Wireshark is too hard, can you try /connect -rawlog file? If we're lucky, it captures and writes enough before it crashes.
So irssi doesn't crash either with the signal tracing patch or when running under valgrind. I guess both make it too slow? I don't think Wireshark is feasible. The server I run irssi on has many users generating a lot of network traffic (including other IRC clients). It's not so much traffic that I think it could have anything to do with the crashes, but it's enough that extracting anything from the capture would probably be a pain, unless Wireshark has some really good filtering tools I'm not aware of.
Sure. What does that do exactly? I just tried using it and it didn't seem to create any file.
For the sake of reproduction, I'll mention again that this only happens if the ircd sends a message to your bible-sending client while it's in the middle of sending the bible. If you guys were trying to reproduce it in an empty channel, it probably won't work. A bot in the same channel that says something arbitrary every second or so should work, as should a second client connected where you manually send some messages while the first client is sending the bible. You guys can probably figure out what's wrong a lot more easily than I can, if only it could be reproduced.
Are you sure that /connect -rawlog file doesn't create a file named file in the same directory you run irssi from? It should record the raw IRC messages being sent. If we can get the rawlog from you, I think we can reproduce it without needing the same conditions with respect to bots or other clients that you have. Ultimately, the issue is that irssi crashes in response to something received from the server, even if that message is sent to the crashing irssi as a result of a message from some other client. So we really need the messages between the server and the crashing client, and then we should be able to reproduce the bug.
I can't reproduce the bug, even with the instructions in your latest comment. I think it is more complex than just having a bot or second client send messages while the bible is being sent.
By the way, @spaghetti2514, what exactly did you do to disable flood protection?
I set cmd_queue_speed to 0.
I got rawlog to work (I was giving it a path instead of just a filename because I wanted the file somewhere else). The results aren't illuminating. It's just a bunch of outgoing PRIVMSGs from me to the server containing bible passages followed by the single message that kills me, whether that's a message from a bot or whatever. There's nothing after it. Example:
Maybe you guys can't reproduce it because you can't fill up some kind of buffer fast enough? If disk I/O is bottlenecking you, there's a kernel module you can compile and load to get a character device that you can read the bible straight out of.
I don't think disk I/O should be a bottleneck here. You said that with one config it didn't crash but only disconnected; did you have a chance to investigate the config difference?
Okay, I finally figured out what the profile difference is. If nickcolor.pl is present in scripts/autorun, irssi segfaults; otherwise it doesn't.
Would it be possible to get a copy of your … As well, could you point me to a copy of the … Would it be possible for us to test on the IRC server you're using as well? If so, could you tell us the information needed to connect? Maybe we can reproduce it if we copy everything exactly! Thanks for your persistence in helping us figure it out!
Here's a zip containing the minimal .irssi directory and the bible.txt. I'm testing this on irc.oppaiti.me, which is running InspIRCd 2.2.0. If you're going to join, be warned that the MOTD is the entire Bee Movie script.
Ah, yes, a christian server. I managed to reproduce the crash. Thank you! The short version (breakpoints on
The The incoming message sent from the second irssi is handled by nickcolor.pl's
So when it gets to … I think there's no use-after-free involved here, which is pretty cool. By the time it crashes, the server's refcount is 1, so everything is fine there. It's just that a handful of things get called after the IRC part of the server record is full of NULLs.
@spaghetti2514, can you test #803 and see whether it fixes the issue?
I would be happy to.
@ailin-nemui
gdb info and backtrace with #803 applied
Can you try with this script? How much does it print before crashing?
@ailin-nemui What does it print to? I don't see anything extra printed with this script loaded.
@spaghetti2514 Odd. It's supposed to print to the active window. You can change the code to
to print to stderr instead, and run … First, test it without the bible:
It should print something like this if everything is working:
The three messages "Irssi: server..." are from this little debugging script.
fixes an odd crash when $server unexpectedly becomes invalid during command execution. cf irssi/irssi#796
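The crash class described above (a server record that is still referenced, with a healthy refcount, but whose IRC part has been emptied after a failed send) can be modelled in a few lines. This is a hypothetical Python sketch of the general "re-validate after running other code" pattern, not the code from #803; `Server`, `emit` and `handle_command` are invented names and are not irssi's C or Perl API.

```python
# Hypothetical model of the crash pattern; these names are illustrative
# and are not irssi's C or Perl API.


class Server:
    def __init__(self, nick):
        self.connected = True
        # Stands in for the IRC-specific part of the server record, which
        # was found to be full of NULLs by the time of the crash.
        self.irc = {"nick": nick}

    def send(self, line, fail=False):
        """Pretend to write `line`; `fail=True` simulates a failed write."""
        if fail:
            # The record itself survives, but its IRC state is torn down.
            self.connected = False
            self.irc = None
            return False
        return True


def emit(listeners, server):
    """Call every listener in turn; any of them may invalidate `server`."""
    for listener in listeners:
        listener(server)


def handle_command(server, listeners):
    emit(listeners, server)
    # Re-check after running other code: a listener (e.g. a script hook)
    # may have triggered a send failure that invalidated the server.
    if not server.connected or server.irc is None:
        return "server gone, skipping"
    return "reply as " + server.irc["nick"]
```

Without the re-check, the last line would index into `None` whenever a listener's send failed, which is the Python analogue of dereferencing the emptied IRC part of the record.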
I tested the changes in #803 as well. From master at d85f867, I merged the branch in. Before merging I reliably saw the segfault. After applying the branch there is no segfault. One interesting point: I see this in the status window (even before Irssi updates to show there is no connection):
While there's no crash, the behaviour afterwards is weird. Irssi sits there looking like it is connected until another client in the channel does something (sends a message). Then Irssi quits. Doing anything else in Irssi, including sending a message to the channel, does not make it quit, but it does appear disconnected: if I try to send a message to the channel, it looks like it worked, yet nothing shows up in the channel in the other client. This is the same thing @spaghetti2514 found: something must happen externally after the flood, but now we don't crash, we just disconnect. By the way, I can reproduce the crash reliably using a server I put up for testing (I didn't want to test repeatedly on the server @spaghetti2514 mentioned).
Thanks for the detailed analysis, @horgh. It sounds like we should investigate further what's going on there and why it doesn't disconnect (could it be a bug or side effect of my fix?).
Weird. Maybe I applied the patch wrong.
I don't think it's a side effect of your fix. It's the same behavior I was seeing before your patch whenever nickcolor.pl isn't loaded. Loading nickcolor.pl only replaces the disconnect with a segfault. Either one is still triggered by the server sending something after the client ghost-disconnects. Also, sorry if I'm slow to try things or help out; I'm currently traveling and very busy.
Yeah, I think the weird semi-connected state is not caused by the fix, since we saw the delay before. About the delayed disconnection: we end up setting … The disconnect happens in … The changes in this commit make us call … It works okay for this one scenario. If we want to do this, we should check any signal that uses a … I also tried having a timer that disconnected any servers flagged … Edit: We probably can't use that approach (killing the …
@spaghetti2514, maybe you can just retry the patch whenever you get the time. Verify that the patch is working first, with the simple case I showed you in #796 (comment).
@horgh I think calling server_disconnect seems to be the right (and only) solution. The only other idea I have would be to add the check to the mainloop. I once had this plan to change the signal (event) emitter model to default to "after event", i.e. you put in some events that are going to run after the current signal stack finishes... |
Your idea of a way to run things after the current signal processing finishes is a neat one! I was looking around to see if there was an API like that. It would definitely be useful here, and it sounds like there have been other situations where it would have been handy. Maybe something as simple as registering callback functions for the main loop to fire would work. I'll take a look at that loop!
This fixes the delayed segfault/disconnection aspect of irssi#796. When sending fails, the server is marked as disconnected and no longer useful. However it was not yet fully cleaned up. That cleanup happened in irc_parse_incoming() when we receive traffic from the server. This meant there could be a noticeable delay until we disconnect. With this change we clean it up in the next tick.
I have a solution to the delayed disconnect/segfault aspect of this in horgh/irssi@f8bdc01. It causes us to clean up the server the next time through the event loop. The interval is very short, as you can see; the idea was to make it effectively instant. I've tested it together with the changes in #803 and it seems to work fine: no segfault, and Irssi shows the server is disconnected right away. I haven't been able to come up with a better solution despite continuing to poke at this. #803 seems to solve the segfault part nicely.
I think we should probably implement our own event source to queue events to, instead of the timeout,1 hack :-)
Yeah, it's not too nice. Short, though! I'm not sure what such a queue would look like. Do you have a rough idea of how it should look? I'm not clear on how the events work in general 😳
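The idea discussed above, running cleanup after the current event finishes instead of waiting for the next line from the server, can be sketched as a toy main loop with a deferred-callback queue. This is a hypothetical, single-threaded Python model that only illustrates the ordering; `MainLoop`, `defer`, `on_send_failed` and the `send_failed` flag are invented names, and `disconnect` merely echoes the server_disconnect call mentioned in the thread. GLib, which irssi's main loop is built on, offers g_idle_add() and custom GSource implementations for this kind of scheduling.

```python
from collections import deque


class MainLoop:
    """Toy single-threaded loop: one event per iteration, then deferred work."""

    def __init__(self):
        self.events = deque()    # stand-in for socket reads, timers, etc.
        self.deferred = deque()  # callbacks to run once the current event is done

    def post(self, event):
        self.events.append(event)

    def defer(self, callback):
        self.deferred.append(callback)

    def run_once(self):
        if self.events:
            self.events.popleft()()
        # Drain only what was queued so far; anything deferred while draining
        # waits for the next iteration, so one iteration cannot spin forever.
        for _ in range(len(self.deferred)):
            self.deferred.popleft()()


class Server:
    def __init__(self, loop):
        self.loop = loop
        self.send_failed = False
        self.disconnected = False

    def on_send_failed(self):
        # Flag the server and schedule cleanup now, rather than relying on
        # the next incoming line from the server to notice the flag.
        self.send_failed = True
        self.loop.defer(self.disconnect)

    def disconnect(self):
        # Idempotent, so scheduling it more than once is harmless.
        self.disconnected = True
```

For example, posting srv.on_send_failed and calling loop.run_once() leaves srv.disconnected set after that single iteration, with no further traffic from the server needed.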
When I try to print the entire christian bible to a channel (with flood protection disabled on the server and client), irssi segfaults part of the way through.
If I use
/exec -out cat bible.txt
to print the bible, irssi segfaults. If I use a different command that breaks the file into ~10 chunks and sleeps for two seconds between printing chunks, the entire thing is successfully sent.
Steps to reproduce:
Expected behavior:
Actual behavior:
I am using irssi 1.0.5-1 (20171020 1715) from the debian testing repositories, but this issue has persisted across several versions.
Please let me know if there is any additional information that would be useful for me to provide.
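The workaround from the report, pacing the output instead of dumping the whole file at once, could look like the sketch below. It is a hypothetical helper (the script name paced_cat.py and its functions are made up) that reads a file, splits it into at most 10 line-aligned chunks and pauses two seconds between them, matching the numbers given above. It only sidesteps the crash by avoiding the flood; it does not fix it.

```python
import sys
import time


def split_into_chunks(lines, n_chunks):
    """Split `lines` into at most `n_chunks` contiguous, line-aligned groups."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    if not lines:
        return []
    size = -(-len(lines) // n_chunks)  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def paced_cat(path, n_chunks=10, pause=2.0, out=None):
    """Write the file chunk by chunk, sleeping `pause` seconds between chunks."""
    out = out or sys.stdout
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    chunks = split_into_chunks(lines, n_chunks)
    for i, chunk in enumerate(chunks):
        out.writelines(chunk)
        out.flush()
        if i < len(chunks) - 1:
            time.sleep(pause)


if __name__ == "__main__" and len(sys.argv) > 1:
    paced_cat(sys.argv[1])
```

It would be run from irssi in place of the cat command, e.g. /exec -out python3 paced_cat.py bible.txt.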