Skip to content
This repository has been archived by the owner. It is now read-only.

http: Do not buffer writes to a destroyed socket #4775

Closed
wants to merge 2 commits into from

Conversation

@isaacs
Copy link

commented Feb 14, 2013

This fix fills me with sadness. Here is why:

Prior to v0.10, Node ignored ECONNRESET errors in many situations.
There are valid cases in which ECONNRESET should be ignored as a
normal part of the TCP dance, but in many others, it's a very relevant
signal that must be heeded with care.

Exacerbating this problem, if the OutgoingMessage does not have a
req.connection._handle, it assumes that it is in the process of
connecting, and thus buffers writes up in an array.

The problem happens when you reuse a socket between two requests, and it
is destroyed abruptly in between them. The writes will be buffered,
because the socket has no handle, but it's not ever going to GET a
handle, because it's not connecting, it's destroyed.

The proper fix is to treat ECONNRESET correctly. However, this is a
behavior/semantics change, and cannot land in a stable branch. So, the
full-of-sad bandaid fix is to not put data into the output buffer if the
socket is destroyed, and also remove anything that is in the output
buffer when the HTTP request sees that it closes.

/cc @dpacheco @bcantrill @mranney @piscisaureus

@mranney

This comment has been minimized.

Copy link

commented Feb 14, 2013

As a user of this OutgoingMessage, what is the best way to detect that your underlying socket is gone?

@isaacs

This comment has been minimized.

Copy link
Author

commented Feb 14, 2013

Maybe it should raise EPIPE instead of being quiet like it is now. Otherwise, you'll just be leaking (much smaller) req objects instead of half a million long array of buffers.

@mranney

This comment has been minimized.

Copy link

commented Feb 14, 2013

Sending up some kind of error would be nice, but that would also be a change in behavior.

I'm happy to fix this in our application code with a simple check for socket.writable or something like that. Is that the best way to do this?

@isaacs

This comment has been minimized.

Copy link
Author

commented Feb 14, 2013

Well, as I look into this further, it seems like it's actually a pretty subtle thing. If the connection gets destroyed at just the right time, you'll get no error at all, and it'll treat it as a "connecting" request. But, if it's destroyed just a bit later, then you'll get an EPIPE when you try to write to it, because the client won't have gotten the message yet. If it's destroyed a bit sooner, during the first request, then it'll trigger the createHangUpError() function, which will raise ECONNREFUSED on the client (like you probably would expect). So, this only happens when a socket is reused, and the connection gets dropped in between them. (Otherwise, you'd see it every time any of your servers goes down ever.)

Maybe the correct fix is to just create a hangup error, and call it a day, since that's the behavior that happens most of the time in v0.8 when a socket dies anyway, and we could make the case that ignoring socket destruction is just a simple bug, and we should treat it like as if it happened while we were listening.

@isaacs

This comment has been minimized.

Copy link
Author

commented Feb 14, 2013

@mranney Thoughts on 727cad5? This way, it just raises a connection hangup error no matter when it happens, which is probably more correct anyway.

@mranney

This comment has been minimized.

Copy link

commented Feb 14, 2013

I agree that it seems more correct, so it's probably better.

Cc: @sentientwaffle, @dannycoates

@dannycoates

This comment has been minimized.

Copy link

commented Feb 14, 2013

@ceejbot and I noticed this when we wrote our keep-alive logic in poolee. See #4373 for our workaround.

isaacs added 2 commits Feb 14, 2013
http: Raise hangup error on destroyed socket write
Prior to v0.10, Node ignored ECONNRESET errors in many situations.
There *are* valid cases in which ECONNRESET should be ignored as a
normal part of the TCP dance, but in many others, it's a very relevant
signal that must be heeded with care.

Exacerbating this problem, if the OutgoingMessage does not have a
req.connection._handle, it assumes that it is in the process of
connecting, and thus buffers writes up in an array.

The problem happens when you reuse a socket between two requests, and it
is destroyed abruptly in between them.  The writes will be buffered,
because the socket has no handle, but it's not ever going to GET a
handle, because it's not connecting, it's destroyed.

The proper fix is to treat ECONNRESET correctly.  However, this is a
behavior/semantics change, and cannot land in a stable branch.

Fix #4775
@piscisaureus

This comment has been minimized.

Copy link
Member

commented Feb 15, 2013

Looks great. I love it. @isaacs, You're my hero.

isaacs added a commit that referenced this pull request Feb 15, 2013
http: Raise hangup error on destroyed socket write
Prior to v0.10, Node ignored ECONNRESET errors in many situations.
There *are* valid cases in which ECONNRESET should be ignored as a
normal part of the TCP dance, but in many others, it's a very relevant
signal that must be heeded with care.

Exacerbating this problem, if the OutgoingMessage does not have a
req.connection._handle, it assumes that it is in the process of
connecting, and thus buffers writes up in an array.

The problem happens when you reuse a socket between two requests, and it
is destroyed abruptly in between them.  The writes will be buffered,
because the socket has no handle, but it's not ever going to GET a
handle, because it's not connecting, it's destroyed.

The proper fix is to treat ECONNRESET correctly.  However, this is a
behavior/semantics change, and cannot land in a stable branch.

Fix #4775
@isaacs

This comment has been minimized.

Copy link
Author

commented Feb 15, 2013

Landed in v0.8.

@isaacs isaacs closed this Feb 15, 2013

@dougwilson

This comment has been minimized.

Copy link
Member

commented Feb 17, 2013

@isaacs It seems like this change is causing some people's "working" apps to die from socket hang up errors. I just saw a user report it at senchalabs/connect#750 while I was searching to see if any people were having the same issue as myself. For myself, I had to temporarily revert back to 0.8.19 while I figure out how to handle these. I am getting the socket hang up errors in a server application using express and and hang up will occur on the first write to a client.

The main issue is that a lot of the clients are sadly within networks which have monitoring equipment that is trying to block clients from accessing the application. Since the application is working over SSL, this monitoring equipment just injects RST packets into the network directed to the application and client, and so connections get reset all the time, typically they get reset faster than I can even add an error handler to the connection in express.

Do you have any suggestions, perhaps?

@mranney

This comment has been minimized.

Copy link

commented Feb 17, 2013

@dougwilson Listening for error events should solve the problem, unless express is getting in the way somehow.

On 0.8.19 you are probably leaking memory, but maybe that's better than the arms race that it sounds like you are already dealing with.

@dougwilson

This comment has been minimized.

Copy link
Member

commented Feb 17, 2013

@mranney Thanks, but AFAIK the memory leak was only for http clients, not servers. At least in my situation, it wasn't leaking, as since this.connection && this.connection.writable was passing it was not doing the buffer routine. The patch adds an additional check for this.connection.destroyed and emits the new error, which is what I am running into here.

I have been able to work around it now by listening for errors high up and just reverting to the pre-0.8.20 behavior of "swallowing" ECONNRESET errors and just letting the routines write to no where. I can see fixing the leak in 0.8.x, but emitting the errors kind of seems like a behavior change to me :/

@mranney

This comment has been minimized.

Copy link

commented Feb 17, 2013

Man, HTTP is hard.

@isaacs

This comment has been minimized.

Copy link
Author

commented Feb 17, 2013

@dougwilson Can you provide a standalone example script/test that shows the bad behavior?

If it is actually not a leak in the server case, and is causing new unwanted errors, then we will fix it, and a failing test will make that much easier. However, like @mranney, I suspect that you were actually leaking memory before, and ought to be listening to error events anyway.

@dougwilson

This comment has been minimized.

Copy link
Member

commented Feb 17, 2013

So I ended up using the following simple code which does not error pre-0.8.20 and does afterwards:

// node --trace_gc --expose_gc
var http = require('http');
var server = http.createServer(function (req, res) {
    res.once('error', function (error) {
                // Only encountered in 0.8.20
        console.error('sock: ' + error);
    });
    res.write('client, destroy connection\n\n');
    setTimeout(function () {
        res.write(Buffer(65536));
        res.write(Buffer(65536));
        res.write(Buffer(65536));
        res.end();
        process.nextTick(gc);
    }, 2000);
});

setInterval(function () {
    console.error('RSS: %d', process.memoryUsage().rss);
}, 2000);

server.listen(3000);

And I do indeed see a memory leak pre-0.8.20 and not in 0.8.20. I repeatedly ran curl http://127.0.0.1:3000/ and killed it so the socket would hang up while the server still intended to write. Of course thanks for the memory leak fix, but I still kinda feel like the new error being thrown is a change that probably shouldn't have been in a point release, but eh. There are likely many others not listening for errors on res like I wasn't, and now when they upgrade to 0.8.20 it is easy to DoS their node application if it is not behind some proxy that protects it from clients destroying their socket in the middle of sending a response.

Anyway, thanks, I'm listening to error events now on res. In hindsight I wish I had the foresight to have written "node": ">0.8.0 <0.8.20" in my package.json ;)

@isaacs

This comment has been minimized.

Copy link
Author

commented Feb 17, 2013

I see.

So, the memory leak definitely affects servers as well as clients. Potentially much worse, in fact. Check out this script: https://gist.github.com/isaacs/4973669

Run that script with node v0.8.19, and you'll see that it's much worse than just crisply emitting an error: the server keeps writing data and filling up the output buffer until it runs out of memory relatively quickly (or at least, would, if it was not just writing the same exact buffer repeatedly).

There's no universe in which that kind of behavior is not a bug. An ECONNRESET on a socket that you're writing to is clearly an unexpected, erroneous event. v0.8.19 will raise a hangup error if the ECONNRESET occurs at other times, but ignores it once the 'request' event is emitted. Thereafter, it treats reset sockets as if they were in the process of connecting (which is, for servers, even more wrong!)

It does pain me that the behavior changed. But choosing between leaking memory to slow death, vs emitting an error event (and crash your program if unhandled) with a clear and evident cause, it's a no-brainer in terms of overall stability and debuggability.

Add error event handlers to every object you touch, or use domains. You can use the cluster module, and something like the cluster-master module from npm, so that crashes in one worker don't make your site completely unavailable. There are ways to prevent the crash; but without emitting an error event, there is not any way to prevent the memory leak.

@dougwilson

This comment has been minimized.

Copy link
Member

commented Feb 17, 2013

Thanks. I am using clusters, so crashes were only taking out workers, but typically a worker would be handling lots of requests at once, so one client hanging up on the worker would cut off lots of other user's requests, which sucked. I mainly wanted to bring this up and hear what you though, so thanks a lot, I appreciate it!

@jonathanong

This comment has been minimized.

Copy link

commented Feb 17, 2013

can docs be added about error events being emitted? otherwise i (or other people new at this) would never think to listen for an error.

http://nodejs.org/api/http.html#http_class_http_serverresponse
http://nodejs.org/api/http.html#http_http_clientresponse

@isaacs

This comment has been minimized.

Copy link
Author

commented Feb 17, 2013

Doc patch welcome.

@dougwilson

This comment has been minimized.

Copy link
Member

commented Feb 23, 2013

Thank you, @isaacs, commit f9a0140 fixes my woes on v0.8; looking forward to next v0.8 patch release.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
6 participants
You can’t perform that action at this time.