Echo server runs out of RAM #15

Closed
purplefox opened this issue Apr 25, 2012 · 33 comments

Comments

@purplefox

The echo server example under any significant load exceeds available RAM very quickly.

This appears to be because the example writes the received websocket frame directly back out without doing any flow control.

The normal way to implement this scalably in node is to use util.pump or stream.pipe; however, the Faye WebSocket object does not appear to be usable in this way.

Shouldn't the WebSocket implement the read and write stream interfaces, as is normal for most node.js objects that read and write data?

http://nodejs.org/api/stream.html

@jcoglan
Collaborator

jcoglan commented Apr 25, 2012

I'm not sure what you mean, but I could also use educating on stream stuff I've probably missed. A call like websocket.send("Hello, world!") just converts the string into a buffer representing the required WebSocket framing format for the message, then writes the buffer to the WebSocket's TCP connection. I'm not sure where piping between streams comes in, what am I missing?

@purplefox
Author

These might be helpful:
http://nodemanual.org/latest/nodejs_dev_guide/understanding_streams.html
http://elegantcode.com/2011/04/06/taking-baby-steps-with-node-js-pumping-data-between-streams/

Basically, if you just read from the read stream and write immediately to the write stream, as you do in your echo example, then there is a danger that the write stream's buffer starts to grow without bound if the system reads faster than it can write. Remember: node.js writes are queued in a buffer and the actual write occurs some time later.

To fix this, you need some kind of flow control. It works as follows: you write until the write queue is full; then you tell the read stream to pause, and you set an event handler on the write stream that gets called when the write queue has drained, at which point you resume the read stream. By doing this you can guarantee that the write queue buffer will never exceed a certain size, and thus you can avoid running out of RAM.

This is such a common thing to do that node.js provides helpers for it - util.pump and stream.pipe - so you don't have to write all that code yourself.
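That manual pattern can be sketched in a few lines (a minimal illustration of what pump/pipe do internally, not the actual node.js source; `readable` and `writable` stand for any pair of old-style Node streams):

```javascript
// Manual flow control between two old-style Node streams:
// write until the destination's buffer fills, pause the source,
// and resume it when the destination drains.
function pump(readable, writable) {
  readable.on('data', function (chunk) {
    // write() returns false when the destination's buffer is full
    if (!writable.write(chunk)) {
      readable.pause(); // stop reading until the buffer drains
    }
  });

  writable.on('drain', function () {
    readable.resume(); // buffer flushed, safe to read again
  });

  readable.on('end', function () {
    writable.end();
  });
}
```

This is why the write queue stays bounded: the source is never read faster than the destination acknowledges writes.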

You'll notice that the node.js echo example on their website uses pipe; it does not manually write the received buffer to the output:

http://nodejs.org/

var server = net.createServer(function (socket) {
  socket.write('Echo server\r\n');
  socket.pipe(socket);
});

If your websocket implemented read and write stream you'd be able to do the same with websockets, and avoid running out of RAM, something like:

var server = http.createServer();

server.addListener('upgrade', function(request, socket, head) {
  var ws = new WebSocket(request, socket, head);
  ws.pipe(ws);
});

server.listen(8080);

HTH :)

@jcoglan
Collaborator

jcoglan commented Apr 25, 2012

With WebSockets, even in an echo server, the bytes coming in are different from those going out. e.g. the incoming message is masked while the outgoing one, though it contains the same text payload, is not. i.e. instead of just a series of stream -> stream pipelines, you have stream -> parser -> user code -> encoder -> stream.

It sounds like I need to do something with the underlying stream object, i.e. this thing:

https://github.com/faye/faye-websocket-node/blob/master/lib/faye/websocket.js#L36

On the data event the incoming bytes are parsed into a sequence of WebSocket frames and yielded to the user. On a send() call, and other internal situations, we call write() on the socket. Will I need to implement the flow control myself, e.g. wrap the stream object in a wrapper that ensures reading only happens while there is write capacity, or are there existing patterns I should be using?

@purplefox
Author

If you implement the functions:

pause()
resume()
write()

as per the Streams API, and node.js will take care of the actual pumping for you.

@purplefox
Author

pause() - easy: just call pause on the underlying socket that you are reading the data from.
resume() - also easy: just call resume on the underlying socket that you're reading from.
write() - this would be the same as your current write, but return false if the kernel buffer is full; you can just return the value of the write on the underlying socket.
You'd also need a drain event; again, I think you can just delegate to the socket's one.

Those are the key things. You may need a few other bits and pieces.

Basically you need to implement both ReadableStream and WritableStream

http://nodejs.org/api/stream.html

Btw - I am not a node expert, but I implemented something similar here https://github.com/purplefox/vert.x

@jcoglan
Collaborator

jcoglan commented May 1, 2012

I think I've been thinking about this all wrong. The library's abstraction model currently looks like this:

+---------+   +---------+   +---------------+
| TCP IN  |-->| Parser  |-->|               |
+---------+   +---------+   |               |    +-----------+
                            | WebSocket API |<-->| User code |
+---------+   +---------+   |               |    +-----------+
| TCP OUT |<--| Encoder |<--|               |
+---------+   +---------+   +---------------+

My initial reading of this is that I should implement this:

+---------+   +---------+   +------------+
| TCP IN  |-->| Parser  |-->|            |
+---------+   +---------+   |            |    +---------------+    +-----------+
                            | Stream API |<-->| WebSocket API |<-->| User code |
+---------+   +---------+   |            |    +---------------+    +-----------+
| TCP OUT |<--| Encoder |<--|            |
+---------+   +---------+   +------------+

But I was confused: what if the user is only writing to the socket, and the 'source' of those writes is some other channel entirely, for example incoming HTTP requests or AMQP messages? There's no point stopping writes to my TCP socket internally, since I'd just have to buffer the messages myself in RAM behind the WebSocket API.

What I now think you're suggesting is this:

+---------+   +---------+   +---------------+
| TCP IN  |-->| Parser  |-->|               |
+---------+   +---------+   | WebSocket API |    +-----------+
                            |       +       |<-->| User code |
+---------+   +---------+   |  Stream API   |    +-----------+
| TCP OUT |<--| Encoder |<--|               |
+---------+   +---------+   +---------------+

i.e. expose the Stream API to the user on the WebSocket object, so then it can be used with Node's stream utils such that Node will stop reading from whatever input channel is causing writes to pile up on the WebSocket output channel. Is this correct?

My only concern with this is adding to the standard WebSocket interface, but if this is the standard way of dealing with data flow in Node then so be it.

@jcoglan
Collaborator

jcoglan commented May 1, 2012

Partly my confusion stems from the fact that I assumed streams were all about byte sequences. TCP emits a stream of bytes, but then WebSocket chunks these into discrete text messages. On the way out, WebSocket takes discrete text messages and converts them into byte sequences to go out over TCP. Is this behaviour compatible with the Stream API?

What about higher-level constructs? e.g. 'writing' to a Faye messaging client is client.publish('/channel', {hello: 'world'}). I'm not sure how you'd model this as a simple write() call if you were to implement Stream on Faye.Client. (Feel free to ignore this if you're just interested in the WebSocket layer).

@squaremo

i.e. expose the Stream API to the user on the WebSocket object, so then it can be used with Node's stream utils such that Node will stop reading from whatever input channel is causing writes to pile up on the WebSocket output channel. Is this correct?

Yes, that's it. In other places I have simply implemented the Stream interface and left it at that, but if you want to keep your websockets API then perhaps consider having some equivalent for flow control in that API too; e.g., return false if you can't write to the underlying socket.

NB this is the reason I argued for using the Stream interface in SockJS-Node -- but now, of course, we have people asking for interoperability with libraries that invented their own API. C'est la vie.

@jcoglan
Collaborator

jcoglan commented May 17, 2012

Yes, interop is a big part of faye-websocket. The fact that it implements the standard API means I can reuse client-side socket code easily. Fortunately browsers seem to return true/false from send() to indicate success, so that's something I can support in this project.

@squaremo

With "invented their own API" I was taking a dig at socket.io ..

Following the client-side websockets API is a different story. Yep, returning true/false on send and maintaining the bufferedAmount value* is good enough to build a Stream abstraction on.

*That is, buffering sends when the underlying socket is "blocked". Maybe you already do this.
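The buffering squaremo describes might be sketched like this (hypothetical names for illustration; note the real browser bufferedAmount also counts bytes queued inside the socket, which this sketch ignores):

```javascript
// Queue sends while the underlying socket reports a full buffer,
// flush on drain, and expose a bufferedAmount-style counter.
function BufferedSender(socket) {
  var self = this;
  this._socket = socket;
  this._queue = [];
  this._blocked = false;
  this.bufferedAmount = 0;

  socket.on('drain', function () {
    self._blocked = false;
    // flush queued frames until empty or the socket fills up again
    while (self._queue.length && !self._blocked) {
      var frame = self._queue.shift();
      self.bufferedAmount -= frame.length;
      if (!self._socket.write(frame)) self._blocked = true;
    }
  });
}

BufferedSender.prototype.send = function (frame) {
  if (this._blocked) {
    // socket is full: queue the frame and report backpressure
    this._queue.push(frame);
    this.bufferedAmount += frame.length;
    return false;
  }
  if (!this._socket.write(frame)) {
    // the socket accepted this frame but its buffer is now full
    this._blocked = true;
    return false;
  }
  return true;
};
```

Returning false from send() here is what lets a Stream abstraction be layered on top, as discussed above.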

@dominictarr

In node core all streams are streams of bytes or strings, but this is only because node only implements low-level IO and leaves everything else to userland.

There is nothing about the Stream API that prevents you from sending any type of object down the pipe. You can make streams of objects, and you can assemble the chunks so that the edges have semantics (for example, splitting a stream into lines), and the whole Stream#pipe thing still works great.

@jcoglan
Collaborator

jcoglan commented Sep 8, 2012

Looking at this again after someone else reported memory leaks. How are you generating the load to make the server run out of memory? Do you know where the memory leak comes from?

@asp24

asp24 commented Sep 9, 2012

Sorry for my English, but I'll try to describe our setup, the errors, and the solutions.
After upgrading faye from v0.4 to 0.8 and rewriting part of our additional server code, we faced a problem of huge RAM growth that never decreases, only increases over time. As a result, the OOM killer just kills the node process because RAM is exhausted. I also added node Cluster module support and ran faye in a forked process (the faye-redis engine was also used), upgrading to v0.8.3. I added a watchdog timer to restart the child process, because RAM growth was still very high. This solution lets clients reconnect quickly and without data loss. So RAM growth is not a problem now, but the memory leak is still present.
I'll try to attach some memory usage graphs later.

@jcoglan
Collaborator

jcoglan commented Sep 10, 2012

@asp24 Ideally I'd like a script that creates load on the echo server that replicates the memory leak. All attempts I've tried so far have failed to produce excessive memory usage so I need someone else to give me some code to reproduce the problem.

@asp24

asp24 commented Sep 10, 2012

@jcoglan This is a problem, because the users are real. But I am ready to give any debug information you need.

@jcoglan
Collaborator

jcoglan commented Sep 10, 2012

@asp24 As I said, I need a script that creates load against the example echo server, that makes the server leak memory. I've run scripts that open thousands of sockets and fire messages at it as fast as possible, and memory usage remains low for me.

@asp24

asp24 commented Sep 10, 2012

@jcoglan What if your tests only check the good scenario? In the real world the network is very unstable. Maybe the leak is there?

@dylanvee

@squaremo Have you found any good WebSocket implementations that correctly adhere to the Stream API?

@dominictarr

@dylanvee there is shoe, which wraps sockjs in a Stream.

It should be straightforward to use the same idea to wrap engine.io, faye, etc. I have this on my todo list, but I haven't done it yet, because shoe works well.

@dylanvee

Shoe is sockjs-node with the addition of a client-side Stream API. SockJS's Connection class is a Stream, but it does not support flow control.

However, I suppose I shouldn't start worrying about that until a real-world workload demonstrates that it is a problem.

@dominictarr

Hmm, I thought that WebSocket#send() returns false if the buffer is full, which is what you need.
That issue doesn't go into much detail...

@jcoglan
Collaborator

jcoglan commented Oct 26, 2012

Adding stream support to this library has been on my mind for a while, just haven't had time for it. My current plan is to make WebSocket and WebSocket.Client into streams, so the same object will support both the WebSocket and Stream APIs. Do you think I should instead wrap the WebSocket object in a Stream?

I don't think this is possible right now because send() does not give you enough information. It only returns false if there is an error while writing to TCP or if it knows the connection is closed. It does not implement flow control.

I also have a question about symmetry: send() takes a string/buffer, but onmessage yields an event object with a data property that is a string/buffer. If you want to pipe() two WebSockets together, how should this work?

@squaremo

@dylanvee

Have you found any good WebSocket implementations that correctly adhere to the Stream API?

I've not, but then I've not really been looking ..

@jcoglan

Do you think I should instead wrap the WebSocket object in a Stream?

I don't understand what this entails -- what's "wrap" here, how would it look to someone using it?

I also have a question about symmetry: [...]

In the scenario in which a WebSocket is both Stream and er, WebSocket, I imagine .onmessage would be the websockety handler, called with an object if present; and on('data', cb) would be the streamy handler, with cb invoked with a buffer. WebSocket.send() and Stream.write() can be synonyms. The Stream and WebSocket interfaces don't need to interact, do they?

@dominictarr

you'd use it like this

stream1.pipe(faye).pipe(stream2)

faye.on('data', onData(string)) is also fine.

Correct, send and write would be synonyms,
and 'data' should be the message string.

onopen should be emit('connect') (like the 'connect' event in http://nodejs.org/api/net.html#net_event_connect)

onclose should be emit('end')

stream.end(), and stream.destroy() should trigger ws.close()

@pselden

pselden commented Dec 13, 2012

Perhaps this is unrelated, but I was debugging a memory leak on my faye servers and found a large buffer hanging around in memory. Tracing through its retainers, I found it inside a Faye.Engine.Connection instance. The underlying socket had its readyState set to 3 (CLOSED), but the connection was never deleted from the _connections object. The socket was hanging around in memory forever with a buffer that had been filling up with stuff that would never be delivered.

I could not determine what was causing the root issue (a closed socket but no flush/disconnect message being sent), but I was eventually able to reproduce it by having clients rapidly reload a page that opens a connection. Eventually (there was no reliable way to reproduce it instantly) one of the connections would stick around forever.

I am trying to mitigate the issue by checking the proxy's connection list for closed websockets in the engine's GC routine, and calling flush on the connection if it is marked as closed and eligible for garbage collection. I'll let you know if this relieves any memory pressure.

@jcoglan
Collaborator

jcoglan commented Dec 13, 2012

@pselden4 The proxy that sits between the Server and Engine should already be listening for disconnect from the Engine and flushing any related sockets. At least it's supposed to do that -- open a new issue if you find problems in it.

@pselden

pselden commented Dec 13, 2012

Yes, it isn't properly disconnecting in all cases, and the buffer is sticking around in memory forever. I just implemented an additional reaping process that flushes the closed ones, and I'll monitor to see whether it relieves the leak.

It didn't make sense to me that the out-of-RAM issue is the result of just using buffers and not streams, because you'd have to be pumping gigabytes through the system at one time to cause the buffers to grow that big. My hypothesis is that it leaks the connections (which contain the socket, which holds the buffer).

I'll file a bug with faye.

@pselden

pselden commented Dec 14, 2012

My reaping process has not significantly affected memory consumption (still gradual growth that never goes down). It's safe to say this is not the root cause of the issue.

@pselden

pselden commented Dec 21, 2012

In case you were working on this now -- changes for node 0.10 streams on the horizon! http://blog.nodejs.org/2012/12/21/streams2/

@jcoglan
Collaborator

jcoglan commented May 1, 2013

So, with apologies for how long this has taken, I just pushed some changes that make this library support streams. This is done in two parts:

  • The new module websocket-protocol is a WebSocket protocol handler with pluggable I/O that has a streaming interface and implements backpressure.
  • The extract_parser branch of this repo is now based on this module. The WebSocket class is a duplex stream and the EventSource class is a writable stream. The original browser-compatible API on these classes is still supported.

I'd like a few of you to begin trying this new version out and let me know how it goes. Thanks to everyone who gave me pointers on how streams work.

@jcoglan
Collaborator

jcoglan commented May 5, 2013

The 0.5.0 release of this library, combined with the new websocket-driver module, now supports the Node stream API.

@jcoglan jcoglan closed this as completed May 5, 2013
@squaremo

squaremo commented May 9, 2013

@jcoglan Did you use Node.js v0.10 "new style" streams or "old style" streams? It looks like the latter -- but actually I think that's OK, because I'm not sure new-style streams really work with framed data. For instance, there's nothing stopping the Readable implementation from trying to concatenate the chunks in its read buffer, so far as I can tell.

@jcoglan
Collaborator

jcoglan commented May 9, 2013

They're old streams, which 0.10 still supports.
