Lots of connections in CLOSED state #99

yurynix opened this Issue Dec 5, 2012 · 52 comments



yurynix commented Dec 5, 2012

I'm on FreeBSD 9, node 0.8.15, sockjs 0.3.4.
After the application has been running for a few hours, I have lots of connections in CLOSED state in netstat.
After killing the nodejs process these connections disappear. When I had a lower fd limit, the process died with "accept EMFILE", same stack trace as #94. Not sure if that's a node issue or a sockjs issue of not disposing of sockets properly.

Here is output after the app ran for ~18 hours:

netstat -an -p tcp | awk '{print $6}' | sort | uniq -c | sort -n
sysctl kern.openfiles

output =>
14590 CLOSED
kern.openfiles: 18652

Not sure how to debug it further, any advice?


majek commented Dec 5, 2012

Good spot! It doesn't seem to occur on Linux. Are you sure it's SockJS fault? Or is it maybe a nodejs bug?

yurynix commented Dec 5, 2012

Not sure if node's, sockjs' or my fault =)

However, I think I've narrowed it down a bit.
In my app the connected client needs to send some auth data to the server, and only then is it authenticated.

sockjs.on('connection', function(socket) {
    this._notAuthedSockets.add( socket.remoteAddress + ':' + socket.remotePort, socket );
    // ...
});

Then every 30 seconds I run a function that checks whether that connection has sent auth or not:

    var notAuthedKeys = this._notAuthedSockets.getAllKeys(),
        currentKey = null,
        currentSocket = null;

    this.logger.log('info', "Checking which sockets timedout to auth, pending: " + notAuthedKeys.length);
    for (var i = notAuthedKeys.length - 1; i >= 0; i--) {
        currentKey = notAuthedKeys[i];
        currentSocket = this._notAuthedSockets.get(currentKey);

        if ( currentSocket.connectTime + this._authTimeoutSeconds < Date.now() ) {
            this.logger.log('info', 'Auth timeout for ' + currentKey );
            // currentSocket.close();
        }
    }

If the commented line is uncommented (currentSocket.close() or currentSocket.end()), the process leaks sockets in CLOSED state. Otherwise it seems to work just fine, except that I have connections just hanging there doing nothing.

Also, connection.end()/close() seems to leave only certain connections, from certain hosts, stuck in CLOSED state. My guess is that some kind of proxy is breaking the protocol?

I'm now digging through the captures of the offending hosts to find something in common.


majek commented Dec 6, 2012

connection.close() method in sockjs-node source causes underlying socket to finish long-polling http reply. It may close underlying TCP/IP socket, or not depending on http keep-alives.

Still, TCP/IP connections being reported by netstat as 'CLOSED' look wrong. It looks like node.js isn't close()-ing them, or doesn't detect that they were closed by remote host.

yurynix commented Dec 6, 2012

My current theory is that websockets are not being properly cleaned up, so I went into websocket.js in faye's lib and added the following code:

var timer2 = undefined;
var timer = setTimeout( function() {
        // check socket after 20 mins; in my scenario they shouldn't survive more than 5 min.
        if ( this.readyState === API.CLOSING ) {
                console.log(this._stream._peername, "Timeout catch socket in: ", this.readyState);
                // if socket in closing/closed state, wait 5 more mins
                timer2 = setTimeout( function() {
                        // if socket is still in CLOSING state, let's try to close it
                        if ( this.readyState === API.CLOSING ) {
                                console.log( this );
                                console.log("peer: ", this._stream._peername );
                                this._stream.end();
                        } else {
                                console.log(this._stream._peername, "Socket ended up in: ", this.readyState);
                        }
                }.bind(this), 5 * 60 * 1000 );
        }
}.bind(this), 20 * 60 * 1000);

var oldEnd = request.socket.end;
request.socket.end = function() {
        // the socket is being ended, so cancel the watchdog timers
        if ( timer ) {
                clearTimeout( timer );
        }
        if ( timer2 ) {
                clearTimeout( timer2 );
        }
        return oldEnd.apply(this, arguments);
};
If the socket had been cleaned up OK, it would end up in API.CLOSED, not API.CLOSING.
I'm getting many IPs here that correspond to the connections stuck in CLOSED state in netstat. After calling end() directly on the socket here, the connections in netstat no longer seem to accumulate in CLOSED state; I still have some, but they are not constantly growing as before.

That's not a real solution though, just an ugly workaround =)
Any suggestions for tracing the code flow that leads to sockets getting stuck in CLOSING?


majek commented Dec 6, 2012

@yurynix great investigation, thanks! I think some sockjs-node users have noticed similar behaviour. I've sent a message on sockjs mailing list calling for verification: https://groups.google.com/group/sockjs/browse_thread/thread/9e847dc03efe7ac8

shripadk commented Dec 6, 2012

Just went through the entire thread. @yurynix Are you using a proxy in front of the Node process? If so, it might be a problem with the proxy itself: the proxy might not be closing its connection to the Node process. The netstat command as given is also not precise enough, since it scans the state of all connections on the system. Instead, narrow it down to the process's port so we can tell whether it's indeed the node process or the proxy that's causing the problem.

netstat -anl | grep <PORT_OF_NODE_PROCESS> | awk '/^tcp/ {t[$NF]++}END{for(state in t){print state, t[state]} }'
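
For anyone who'd rather do the same tally from inside node (for logging it next to the app's own counters, say), here is a rough JavaScript equivalent of the awk part. tallyTcpStates is a made-up helper name, and the sample lines are invented for illustration, not real output from this server:

```javascript
// Hypothetical helper: counts TCP connection states in netstat-style text,
// mirroring awk '/^tcp/ {t[$NF]++}' (the last whitespace-separated field is the state).
function tallyTcpStates(netstatOutput) {
  const counts = {};
  netstatOutput.split('\n').forEach(function (line) {
    if (!/^tcp/.test(line)) return; // skip headers and non-TCP lines
    const fields = line.trim().split(/\s+/);
    const state = fields[fields.length - 1];
    counts[state] = (counts[state] || 0) + 1;
  });
  return counts;
}

// Invented sample input, only to show the shape of the data.
const sample = [
  'Proto Recv-Q Send-Q Local Address Foreign Address (state)',
  'tcp4 0 0 10.0.0.1.80 10.0.0.2.5000 ESTABLISHED',
  'tcp4 0 0 10.0.0.1.80 10.0.0.3.5001 CLOSED',
  'tcp4 0 0 10.0.0.1.80 10.0.0.4.5002 CLOSED',
  'udp4 0 0 *.514 *.*'
].join('\n');

const tally = tallyTcpStates(sample);
console.log(tally);
```

Feeding it the real netstat output (captured however you like) and logging the result periodically would show whether one state keeps growing over time.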

yurynix commented Dec 6, 2012

@shripadk No, sockjs is running on port 80 without anything in front of it.
Also, the system in question runs only that single node process, no other services, so netstat without additional filters is good enough IMO.

However, here's the output you requested:
root@fbsd (~) > netstat -anl | grep 80 | awk '/^tcp/ {t[$NF]++}END{for(state in t){print state, t[state]} }'

Just to clarify, the output above is from running with the hack I pasted above; if I remove it, the CLOSED count here rises over time.

shripadk commented Dec 6, 2012

Can you try changing this._stream.end() to this._stream.destroy() in https://github.com/faye/faye-websocket-node/blob/master/lib/faye/websocket/api.js#L64 and see if you can reproduce the same issue?


majek commented Dec 6, 2012

@shripadk Come on man, two lines above the readystate is set to CLOSED. This bug is about readyState being stuck at CLOSING!

yurynix commented Dec 6, 2012

@shripadk The close function there is never called for those sockets. If it were, the socket would end up in readyState === API.CLOSED, https://github.com/faye/faye-websocket-node/blob/master/lib/faye/websocket/api.js#L62
But instead it ends up in API.CLOSING.
I will try it anyway. At the moment I'm trying @majek's suggestion from the mailing list, passing false to ws.close(); I will post results later.

shripadk commented Dec 6, 2012

@majek Where? @yurynix has given only the output of the socket states. Also, it cannot end up in API.CLOSING without calling the close function. Can any of you tell me where and how it would set API to CLOSING without ever calling the close() function????? Unless I have gone completely blind :P


majek commented Dec 6, 2012

a) netstat outputing CLOSED != faye having socket readyState=CLOSED
b) take a look at patch from @yurynix, crucial line is this.readyState === API.CLOSING
c) yes, in this bug we are wondering why close() wasn't called.

shripadk commented Dec 6, 2012

I know netstat output isn't the same as readyState=CLOSED. I already mentioned that. Please read what i said :)


majek commented Dec 6, 2012

(to avoid doubts, I'm speaking about this internal close function)

shripadk commented Dec 6, 2012

Ah ok now i get it. I was talking about the outer close() function.

shripadk commented Dec 6, 2012

Also I don't understand why the close() function is so convoluted. There is no need for an ack AFAICT. Can just be:

    if (this._parser.close) this._parser.close(code, reason);

Any reason why we need to check for ack?


majek commented Dec 6, 2012

@shripadk is that the question to @jcoglan, faye author?

yurynix commented Dec 6, 2012


I've applied your suggestion directly to compiled JS:

--- a/sockjs/lib/trans-websocket.js
+++ b/home/yury/app-sockjs/node_modules/sockjs/lib/trans-websocket.js
@@ -130,7 +130,7 @@
     WebSocketReceiver.prototype.didClose = function() {
       WebSocketReceiver.__super__.didClose.apply(this, arguments);
       try {
-        this.ws.close();
+        this.ws.close(1001, "error", false);
       } catch (x) {

@@ -184,7 +184,7 @@
       if (reason == null) reason = "Normal closure";
       if (this.readyState !== Transport.OPEN) return false;
       this.readyState = Transport.CLOSING;
-      this.ws.close(status, reason);
+      this.ws.close(status, reason, false);
       return true;

@@ -193,7 +193,7 @@
       this.ws.removeEventListener('message', this._message_cb);
       this.ws.removeEventListener('close', this._end_cb);
       try {
-        this.ws.close();
+        this.ws.close(1001, "error", false);
       } catch (x) {


The code I inserted in websocket.js in faye's lib isn't called, and I don't see any sockets in CLOSED state
with netstat, so I guess it fixed the problem =) I'll let it run for a few more days to be sure.

shripadk commented Dec 6, 2012

@majek yes it was for the author of faye-websocket-node (@jcoglan). checking for ack isn't necessary. I guess he added this mostly to differentiate client initiated disconnection (via opcode 0x08) from raw socket close (so you can send the reason for terminating the connection instead of leaving the client wondering why his connection was terminated in the first place).

Anyways here is my reasoning behind why this is happening:

  1. When you call close() without arguments, it calls this._parser.close because ack is null.

  2. Now when I looked at https://github.com/faye/faye-websocket-node/blob/master/lib/faye/websocket/hybi_parser.js#L255 I saw few problems
    a. The callback (the inner close() which can end the stream) is stored in this._closingCallback but never called. (this._closingCallback is only called in _emitFrame which is I assume for parsing the received websocket stream and only after 2(e)).

    b. It then calls this._socket.send(reason || '', 'close', code || this.ERRORS.normal_closure); which in turn calls this._parser.frame (https://github.com/faye/faye-websocket-node/blob/master/lib/faye/websocket/api.js#L48)

    c. Seems like there is something wrong with the way the close frame is constructed. I haven't gone into reading the code in depth but most likely that is the issue https://github.com/faye/faye-websocket-node/blob/master/lib/faye/websocket/hybi_parser.js#L198

    d. The author does not force-close the connection on the server end. It's assumed that the client will acknowledge the close frame (if it ever gets it), close it on its end, and then send a close opcode of 0x08, after which the server closes it: https://github.com/faye/faye-websocket-node/blob/master/lib/faye/websocket/hybi_parser.js#L313. (I don't remember if this is a requirement of the websocket spec but this is how it's done in faye).

    e. So the stream is not closed until and unless this._closingCallback is invoked. Since it's never invoked, the socket continues to exist in CLOSE_WAIT state (because the client has closed the connection, but the server hasn't).

    f. If there is nothing wrong with the way the close frame is constructed, then the only logical conclusion is that the client has left even before sending back a close frame.

shripadk commented Dec 6, 2012

@majek Also, this issue does not exist for the Draft 75/76 parser: https://github.com/faye/faye-websocket-node/blob/master/lib/faye/websocket/draft76_parser.js#L95 as the callback is called immediately (unlike the hybi one which requires the client to send back a close frame for the stream to be closed -> which i don't think will ever happen because the client would have gone away even before the frame can be sent back. I guess we have a race condition here!).

yurynix commented Dec 7, 2012

@shripadk I haven't dug through all the code in faye-websocket yet. However, from how it behaves and the packet captures I've done on the offending clients, it looks like: sockjs/sockjs-client#94 (my server is running on port 80)

The user establishes a websocket connection but then fails to send any data (I'm sending an auth token from the client's onOpen, which never arrives). I think the same thing is happening here: the client attempts to send the ack but fails, and the server never receives the ack, leaving the socket in CLOSING state and never cleaning it up.

I'll try to track down those users and ask them about their network config. IMO it's some broken firewall/antivirus software.

shripadk commented Dec 7, 2012

@yurynix yeah, maybe it's firewall/antivirus software that's causing this issue. As far as the API.CLOSING state is concerned, it cannot go into that state without the server initiating a close. When you open a websocket connection, does the socket go into CLOSE_WAIT state immediately (NOTE: I'm talking about the socket state and not the inner API state now), or does it take a few hours before it goes into CLOSE_WAIT? If it's the latter, then it's a result of keepalive (2 hours or whatever idle time you have set), after which the socket times out (the server then initiates close()).

yurynix commented Dec 7, 2012

@shripadk I'm calling sockjs's end() on connections that don't send auth within 30 sec, so I think what happened is:

Client connects to websocket ->
sockjs thinks it's open ->
client sends auth token onOpen ->
token never arrives ->
server calls sockjs's end() on the socket after 30 secs ->
faye-websocket tells client to disconnect and send ack ->
client does exactly that but ACK never arrives ->
client closes socket ->
OS socket on server goes into CLOSE_WAIT and then CLOSED state ->
faye-websocket remains in API.CLOSING forever, never calling node's end() on the socket,
which means the close() syscall is never made on the socket, therefore leaking the resources associated with it.
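
The sequence above suggests that a server-initiated close needs a deadline for the client's acknowledgement. Here is a minimal sketch of that idea, not faye-websocket's actual code: ClosableSocket, onCloseAck, _finish and ackTimeoutMs are made-up names, and the write() call only stands in for sending a real close frame.

```javascript
// Toy model of a server-initiated WebSocket close; all names are hypothetical.
const STATE = { OPEN: 1, CLOSING: 2, CLOSED: 3 };

function ClosableSocket(stream, ackTimeoutMs) {
  this.stream = stream;            // anything with write() and destroy()
  this.ackTimeoutMs = ackTimeoutMs;
  this.readyState = STATE.OPEN;
  this._ackTimer = null;
}

// Ask the client to close, but don't wait forever for its answer.
ClosableSocket.prototype.close = function (code, reason) {
  if (this.readyState !== STATE.OPEN) return false;
  this.readyState = STATE.CLOSING;
  this.stream.write('CLOSE ' + code + ' ' + reason); // stand-in for a close frame
  this._ackTimer = setTimeout(this._finish.bind(this), this.ackTimeoutMs);
  return true;
};

// Called when the client's close frame arrives.
ClosableSocket.prototype.onCloseAck = function () {
  this._finish();
};

// Single exit path: reached either by the ack or by the timer.
ClosableSocket.prototype._finish = function () {
  if (this.readyState === STATE.CLOSED) return;
  clearTimeout(this._ackTimer);
  this._ackTimer = null;
  this.readyState = STATE.CLOSED;
  this.stream.destroy(); // release the fd instead of leaving it in CLOSING
};

// Stand-in for the raw TCP stream so the sketch runs without a network.
function FakeStream() { this.written = []; this.destroyed = false; }
FakeStream.prototype.write = function (data) { this.written.push(data); };
FakeStream.prototype.destroy = function () { this.destroyed = true; };

const acked = new ClosableSocket(new FakeStream(), 30000);
acked.close(1000, 'auth timeout');
acked.onCloseAck();   // client answered: closed right away

const silent = new ClosableSocket(new FakeStream(), 30000);
silent.close(1000, 'auth timeout');
silent._finish();     // what the timer would do after 30 s of silence
```

Either way the socket ends up in CLOSED and the stream is released; without the timer, the second case is exactly the one that would stay in CLOSING forever.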

I've also seen CPU usage grow constantly while the sockets were leaking;
I don't know if that's related to this issue or if it's something completely different.
With @majek's suggestion the app has now been running ~26 hours, with no sockets in CLOSED state and relatively low CPU.

darklrd commented Dec 9, 2012

@yurynix I also tried this fix and it seems to be working for me. There are no sockets in CLOSED state now. But the ram usage seems to increase with time when ws is used. Perhaps it is not being garbage collected, are you also experiencing this? Thanks.

yurynix commented Dec 9, 2012

@darklrd My stats atm:
Versions: { http_parser: '1.0',
node: '0.9.3',
v8: '',
ares: '1.9.0-DEV',
uv: '0.9',
zlib: '1.2.5',
openssl: '1.0.1c' }
Uptime(seconds): 239915
Connections count: 2235
Max connections count: 2566
Connections seen: 1383948
Protocol counters: { 'xhr-streaming': 768,
websocket: 1321,
'jsonp-polling': 7,
'xhr-polling': 135,
eventsource: 4,
htmlfile: 0,
undefined: 0 }
Memory: { rss: 337092608, heapTotal: 189922160, heapUsed: 167777376 }

ps aux:
yury 73698 53.3 8.0 963548 332084 2 S+ Thu04PM 775:21.80 node server.js

My memory growth, however, might be related to my own code; I first need to rule that out.
What mostly concerns me is the CPU growth over time.
@darklrd Do you have any %cpu issues? Can you provide some stats regarding your app?

darklrd commented Dec 9, 2012

@yurynix No, cpu usage seems to be normal (4.6%), the only issue I am observing is that ram usage keeps on increasing continuously until I restart app.

node: v0.8.15
memory: {264MB, 66.5MB, 34.7MB}
ws connections: 630
uptime: 1.5 days

According to my observation, the memory isn't released when connections are closed and it keeps on increasing as new connections are established.

yurynix commented Dec 9, 2012

@darklrd What OS you're on? I'll try later today remove all my code, leaving just the connection counters and see if I'm still having cpu/memory issues.

darklrd commented Dec 9, 2012

@yurynix Ubuntu 12.04, what about you? I would be interested in your findings. Thanks.

yurynix commented Dec 9, 2012

@darklrd I'm on FreeBSD 9.0-RELEASE

yurynix commented Dec 9, 2012

I think we might also be hitting a node bug: joyent/node#3613
My netstat now:
FIN_WAIT_2 1520
kern.openfiles: 4665

Notice FIN_WAIT_2; that would explain the memory leak if node is indeed not cleaning up the sockets properly. =\

darklrd commented Dec 9, 2012

I see. Thank you. How long has your server been running? Is it same as you have mentioned before? I didn't keep track of FIN_WAIT_2. I will try again.

yurynix commented Dec 9, 2012

@darklrd It's the same one, 251555 seconds. How is your netstat FIN_WAIT_2? Do you see the same thing?

darklrd commented Dec 9, 2012

My netstat (as of yesterday):


I will restart the app and track it now.

@yurynix is TIME_WAIT a problem here?


majek commented Dec 9, 2012

BTW, the lsof can't identify protocol issue: https://idea.popcount.org/2012-12-09-lsof-cant-identify-protocol/

yurynix commented Dec 10, 2012

@darklrd TIME_WAIT/FIN_WAIT_2/other states are not a problem by themselves, as long as you don't have sockets stuck in some state forever. In my previous netstat output the number of sockets in FIN_WAIT_2 was rising over time, and that's not really good; eventually I'd run out of resources.

After seeing joyent/node#3613, I dug a bit into libuv. From what I can see
(I'm really new to node, so it's better if someone could recheck this):
socket.end() -> eventually ends up calling the shutdown(fd, SHUT_WR) syscall and then close()
socket.destroy() -> eventually ends up calling the close() syscall on the fd, without shutdown()

While the first one waits for the client's acknowledgment (in my case sitting in FIN_WAIT_2), the latter discards the socket sooner.
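
To illustrate the difference, here's a rough sketch of a polite close with a fallback: call end() first, then destroy() if the socket hasn't emitted 'close' within a grace period. endWithGrace and graceMs are hypothetical names, not part of node, sockjs, or faye-websocket; end(), destroy() and the 'close' event are the standard net.Socket ones. The fake socket is only there so the sketch runs without a network.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: polite close first, hard close as a fallback.
function endWithGrace(socket, graceMs) {
  // If the peer never completes the shutdown, stop waiting and release the fd.
  let timer = setTimeout(function () {
    timer = null;
    socket.destroy();
  }, graceMs);

  // 'close' fires once the socket is fully closed; the fallback is then unnecessary.
  socket.once('close', function () {
    if (timer !== null) {
      clearTimeout(timer);
      timer = null;
    }
  });

  socket.end(); // half-close: sends FIN, then waits for the other side
  return { isWaiting: function () { return timer !== null; } };
}

// Stand-in for a net.Socket, recording which methods were called.
const fake = new EventEmitter();
fake.calls = [];
fake.end = function () { fake.calls.push('end'); };
fake.destroy = function () { fake.calls.push('destroy'); fake.emit('close'); };

const handle = endWithGrace(fake, 5000);
fake.emit('close'); // the peer finished the shutdown in time, so no destroy()
```

This keeps the polite shutdown for well-behaved clients and only falls back to destroy() for the ones that never answer, at the cost of one timer per closing socket.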

So I changed this._stream.end() to this._stream.destroy() at https://github.com/faye/faye-websocket-node/blob/master/lib/faye/websocket/api.js#L64


And restarted, after 16 hours, netstat:
kern.openfiles: 2902

So it seems to solve that one for me...
Memory: { rss: 231M, heapTotal: 123M, heapUsed: 105M }
Connections count: 2062
Max connections count: 2514

Regarding memory/cpu, too soon to say if it was related.

@darklrd Has the number of TIME_WAIT sockets on your server risen since yesterday, or has it remained constant?

darklrd commented Dec 11, 2012

@yurynix I will try this code change this weekend. The number of TIME_WAIT sockets was always about half the number of ESTABLISHED sockets.


majek commented Dec 14, 2012

Fixed in 0.3.5

majek closed this Dec 14, 2012

darklrd commented Dec 14, 2012

@yurynix any update on memory usage? Thanks.

@yurynix Precisely what i was saying earlier about the this._stream.destroy(). But then it got drowned in a lot of confusion :) Also @majek I feel the fix should have been done in faye rather than sockjs itself. My 2 cents.

darklrd commented Dec 14, 2012

ok, I am going to try this code change and will report back how it goes, thanks!

majek referenced this issue in faye/faye-websocket-node Dec 14, 2012


Connections lost in API.CLOSING state causing EMFILE error #19

yurynix commented Dec 14, 2012

Yeah, there are 2 issues here:

  1. faye-websocket gets stuck in API.CLOSING and never reaches the internal close() if the ACK from the client is not received
  2. when it does get there, because node's close() goes through the shutdown() syscall, the socket in some cases ends up stuck in FIN_WAIT2 (on FreeBSD; on Linux it gets stuck in some other state IMO, TIME_WAIT as @darklrd said)

@majek handled the first issue by passing false for the ack parameter. The second issue, however, can't be handled from sockjs code; the faye-ws code would have to be changed to call destroy() on the socket when the server initiates the close. That is, however, not a polite way to close a TCP connection.

So I'm not sure whether that should be handled in faye-ws or in node's code.

@darklrd The memory usage seems to be OK, rising up to ~350MB when I'm at ~2k connections and dropping to 250MB when I'm down to 1k connections late at night. I don't think there's a problem here, but it's too soon to say. I've restarted again to try @majek's fix for #103 (still too soon to say if it's stable, it has only been running for a day or so; I'll post stats after the weekend).
A note here: I'm currently running with --nouse_idle_notification, trying to see if it solves my high CPU usage after a few days. I think it's due to my state tracking with a large hash; still investigating. Do you have any CPU issues after a while?

darklrd commented Dec 14, 2012

@yurynix I was earlier using faye-websocket@0.4.0. I switched to its master branch now (but my node.js version still is 0.8.15). Memory RSS (image below) increases continuously although number of connections is constant. Sockets in TIME_WAIT and FIN_WAIT2 state also appear to be constant.


Yes, after upgrading to master branch of faye-websocket my cpu usage is higher now and is increasing continuously.

yurynix commented Dec 30, 2012

@darklrd Just wanted to let you know, running node with --nouse_idle_notification fixed my high cpu load.

darklrd commented Dec 30, 2012

@yurynix Thank you so much! Unfortunately I am still struggling with memory issue but thanks again!

yurynix referenced this issue in socketio/socket.io Feb 23, 2013


memory leak #1015

Kamil93 commented Mar 18, 2013

The problem still exists on FreeBSD 9.1 amd64 (sockjs 0.3.5). After a few days of server uptime, I get an error about the file descriptor limit.

throw arguments[1]; // Unhandled 'error' event
Error: accept EMFILE
at errnoException (net.js:770:11)
at TCP.onconnection (net.js:1018:24)

Here is an output of "netstat -an -p tcp | awk '{print $6}' | sort | uniq -c | sort -n"

netstat: kvm not available: /dev/mem: Permission denied
1 Foreign
1 been
370 FIN_WAIT_2

any ideas? Or maybe ugly workaround that should work :(

yurynix commented Mar 19, 2013

I think your netstat output is reasonable, and you're probably hitting the process / system FD limit.
Check the sysctls:


Kamil93 commented Mar 19, 2013

I'm not hitting it, because every day I have the same traffic (at night about 0 connections, during the day 500-1000) and it works. If I were hitting the FD limit, it would happen every day, but the crash happens after a few days, so this is about accumulation.

Also, I don't have access to check kern.maxfiles and maxfilesperproc. But "ulimit -n" shows 8000.


majek commented Mar 20, 2013

@yurynix Are you sure you're running code with this patch: #99 (comment)

yurynix commented Mar 21, 2013

After we discussed that previously, I applied your patch and additionally made the following change in faye_websocket:

    var close = function() {
      this.readyState = API.CLOSED;
      if (this._pingLoop) clearInterval(this._pingLoop);
      if (!ack) {
      } else {
      var event = new Event('close', {code: code || 1000, reason: reason || ''});
      event.initEvent('close', false, false);

at https://github.com/faye/faye-websocket-node/blob/master/lib/faye/websocket/api.js#L57
With the above changes my app ran successfully for a month, with 2K concurrent connections and 200K connections seen per day.

When I had the problem I was hitting many more FIN_WAIT_2 sockets than @hashi101, so I don't know if it's the same issue.

Kamil93 commented Mar 21, 2013

@yurynix how much more FIN_WAIT_2 you had?
Because for now (about 4 days of server running) I have 2571. And others:

netstat: kvm not available: /dev/mem: Permission denied
1 Foreign
1 been
2571 FIN_WAIT_2

yurynix commented Mar 21, 2013

2571 is suspicious; the 370 in your previous post is not that high a number of FIN_WAIT_2 sockets for a busy server.

If in your case the server sometimes initiates closing of the socket,
try patching faye_websocket to call destroy instead of end here:

discussion about it you can find here: faye/faye-websocket-node#19

another solution i would try is to deploy sockjs behind haproxy, that's my setup now, on FreeBSD 9.0-RELEASE-p6, i'm not seeing any fin_wait_2 issues, but then again, i'm running with the patch above.

Kamil93 commented Mar 21, 2013

I've already been running with this patch for faye-websocket. I mean just "this._stream.destroy();" instead of "this._stream.end();"

Now, instead of that, I'm going to try:

if (!ack) {
} else {

but I doubt it would change anything. Anyway, can I run HAProxy with only one IP? I mean HAProxy on one port, and the SockJS workers on different ports.


I've updated my faye-websocket, and this new version still had "this._stream.end();", so I've now changed it to .destroy(). Hope it helps :) I will report back after the weekend.


Everything seems to work well.
