@maxpert maxpert released this Nov 23, 2017 · 6 commits to master since this release

Simplicity over complexity

DISCLAIMER: Some people think this post is negative press for Golang, or that I am trying to bash it. If you look at the codebase history you will see how experimental my code was, and that I have been playing with Go for quite a long time. I totally love Go; these are just my notes on why I chose Node.js over Go. TL;DR: I find it simpler and more convenient to do this in Node.js, and it works well on single-core machines (which is what I am targeting). I don't find the complexity of go-routines, channels going back and forth, writing my own event system, or writing my own disruptor library worth it. It's like using a hammer for every problem. Go outperforms Node.js when it comes to speed, but if it comes down to writing my own event loops I will choose a framework that's built for doing so.

Before I even begin to explain what and why, let me rewind and jot down the motivations for building this small system. Raspchat is "supposed to be" the plainest, simplest, and dumbest chat system implementation for modern browsers, one that takes most of its ideas from IRC. Having said that, I don't want to write it in something as boilerplate-heavy as C/C++, but I still want to maintain reasonable scale on something as primitive as a Raspberry Pi (hence the name Raspchat). The rule to remember is how well it behaves on a single-core machine with limited memory. It started off from the idea of putting my rotting RPi to work.

With that in mind there are three pieces to solve: the front-end, the back-end, and the protocol they talk over. I have already started documenting the protocol, and it's nothing close to an innovation; it's just a repetition of tried and tested JSON messages going across the wire, with properties carrying a meaning.

At the time of implementation, when I looked at the available options, Golang and its efficiency hype made it an obvious choice for me (Go was probably 1.3 at that time and Node.js was something like 5.x). Go was a great choice for the implementation because it cross-compiles, and while the concepts of go-routines and channels are nothing new, having a giant like Google backing it is a great assurance of its bright future. However, after implementing the system, benchmarking it, and carefully analyzing it, I realized it's not the right design choice for my scenario. Why would I say that? Let's have a look:

Go-routines are not cheap!

Let's start off with the biggest problem. For Raspchat I used the Gorilla WebSocket library for obvious performance reasons. For an active connection the library implements blocking read and write methods (every library out in the wild does that; not Gorilla's fault). For a fully functioning full-duplex server I need 2 go-routines per client (one constantly reading and the other dedicated to writing). Without going into the nitty-gritty details, you also need some additional channels to get messages in and out of these go-routines and to signal them when they should stop. It's just the typical housekeeping that every Go developer understands. This is where the problem starts: if you calculate the stack size required for these go-routines, it's around 4KB to begin with (2KB each), not taking anything else into consideration yet. So you are paying a cost of 4KB upfront per connection on an already resource-constrained system (in reality it's obviously more than that due to extra fields; you can look at the code). With embedded storage, multiple channels for every room membership, and synchronization constructs like mutexes, the overall cost was way more. 4KB does sound innocent on paper, but you can't get everything done with just these go-routines; you need some communication mechanism, and that's what leads to my next discovery.
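To make the two-go-routines-per-client shape concrete, here is a minimal sketch. It is not the Raspchat code: it uses an in-memory net.Pipe from the standard library as a stand-in for a WebSocket connection, and the names client, serve, readPump, writePump, roundTrip, and minStackBytes are all hypothetical. The 2KB figure is the initial stack size quoted above; real per-connection cost is higher.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// client is a hypothetical per-connection record: two channels plus a
// done signal, serviced by one reader and one writer goroutine.
type client struct {
	conn net.Conn
	in   chan string   // lines read from the peer
	out  chan string   // lines queued for the peer
	done chan struct{} // closed when the read side ends
}

// serve starts the two goroutines that a blocking read/write API forces on us.
func serve(conn net.Conn) *client {
	c := &client{
		conn: conn,
		in:   make(chan string),
		out:  make(chan string),
		done: make(chan struct{}),
	}
	go c.readPump()
	go c.writePump()
	return c
}

// readPump blocks on the connection and forwards each line to c.in.
func (c *client) readPump() {
	defer close(c.done) // tells writePump to stop once reading ends
	sc := bufio.NewScanner(c.conn)
	for sc.Scan() {
		c.in <- sc.Text()
	}
}

// writePump waits for queued messages or the stop signal.
func (c *client) writePump() {
	for {
		select {
		case msg := <-c.out:
			fmt.Fprintln(c.conn, msg)
		case <-c.done:
			return
		}
	}
}

// minStackBytes is the upfront stack cost only, assuming 2 goroutines per
// client and a 2KB initial stack each (channels and buffers not counted).
func minStackBytes(clients int) int {
	const goroutinesPerClient = 2
	const initialStack = 2 * 1024
	return clients * goroutinesPerClient * initialStack
}

// roundTrip pushes one line through each pump over an in-memory pipe and
// returns what the server read and what the peer received.
func roundTrip(inbound, outbound string) (string, string) {
	server, peer := net.Pipe()
	defer peer.Close() // closing the peer ends readPump, then writePump
	c := serve(server)
	go fmt.Fprintln(peer, inbound)
	got := <-c.in
	c.out <- outbound
	reply, _ := bufio.NewReader(peer).ReadString('\n')
	return got, strings.TrimSuffix(reply, "\n")
}

func main() {
	got, reply := roundTrip("hello", "welcome")
	fmt.Println("server read:", got, "| peer received:", reply)
	fmt.Println("stack bytes for 2000 clients:", minStackBytes(2000))
}
```

Even in this toy version each connection carries two goroutines and three channels before any room or storage logic exists, which is the housekeeping cost described above.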

Channels are not very efficient! And you need a lot of them for implementing PubSub.

Yes, channels are thread-safe, and that obviously comes with a cost. I am not the first one to discover that. In a typical chat room scenario you need to implement a publish-subscribe pattern, and channels are not the most efficient data structure for PubSub. I looked at various implementations to see how I could build a PubSub system internally without paying the cost of so many channels, and it turns out you have only two options: either you run a go-routine invoking callbacks in a loop, or you use channels to fan out messages to individual recipients. I used the latter.
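For illustration, the channel fan-out option can be sketched roughly as follows. This is an assumption about the general shape, not the actual Raspchat implementation; Room, NewRoom, Subscribe, Unsubscribe, and Publish are hypothetical names, and dropping messages for slow subscribers is just one possible policy.

```go
package main

import (
	"fmt"
	"sync"
)

// Room is a hypothetical pub/sub room: every subscriber owns one buffered
// channel, so memory grows with (rooms x members x buffer size).
type Room struct {
	mu   sync.RWMutex
	subs map[string]chan string
}

// NewRoom returns an empty room.
func NewRoom() *Room {
	return &Room{subs: make(map[string]chan string)}
}

// Subscribe registers id and returns the channel its messages arrive on.
func (r *Room) Subscribe(id string, buffer int) <-chan string {
	ch := make(chan string, buffer)
	r.mu.Lock()
	r.subs[id] = ch
	r.mu.Unlock()
	return ch
}

// Unsubscribe removes id and closes its channel. The write lock guarantees
// no Publish call is sending on the channel while it is closed.
func (r *Room) Unsubscribe(id string) {
	r.mu.Lock()
	if ch, ok := r.subs[id]; ok {
		delete(r.subs, id)
		close(ch)
	}
	r.mu.Unlock()
}

// Publish fans msg out to every subscriber and returns how many received it.
// A subscriber whose buffer is full is skipped instead of stalling the room.
func (r *Room) Publish(msg string) int {
	r.mu.RLock()
	defer r.mu.RUnlock()
	delivered := 0
	for _, ch := range r.subs {
		select {
		case ch <- msg:
			delivered++
		default: // buffer full: drop for this subscriber
		}
	}
	return delivered
}

func main() {
	room := NewRoom()
	alice := room.Subscribe("alice", 4)
	bob := room.Subscribe("bob", 4)

	n := room.Publish("hi everyone")
	fmt.Println("delivered to", n, "subscribers")
	fmt.Println("alice:", <-alice)
	fmt.Println("bob:", <-bob)

	room.Unsubscribe("bob")
	fmt.Println("delivered to", room.Publish("bob left"), "subscribers")
}
```

Note that a client in several rooms needs one such channel per membership, on top of the per-connection channels, which is where the channel count adds up.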

Disappointed with both solutions, I looked at other patterns like the Disruptor pattern, but there was no mature implementation (the one here recommends staying away and using channels instead). I would rather spend my energy on solving the problem at hand. Maybe someday I will write a Go disruptor library that is well tested and usable in production.

Serialization, unions, missing generics, workarounds, and complexity

Looking at the message protocol (I decided to go with JSON as a design choice and build on the plethora of existing JSON parsers): the property @ usually acts like an OPCODE and affects the shape of the payload (totally different fields might show up for different values of @). There are only 2 fixed properties in the payload coming down from the server, i.e. @ and !id (only God knows what I was thinking when I came up with these names). Based on the value of @, the rest of the payload requires different properties (one might have msg and to while another might have neither), just like HTTP request headers and body.

Since Go does not have generics or unions, my only option right now is to decode the message into a base message struct with just the @ JSON field, and then based on that try to decode the message into a full payload struct. It's a price you have to pay when you don't have generics or unions in the language and your data has a dynamic shape. The same rule applies when a message is serialized to storage. I also discovered later in benchmarks that the gob serialization I was using to persist these structures in the store for speed is a huge memory hog.
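The two-pass decode described above looks roughly like this with the standard encoding/json package. The field names @, !id, msg, and to come from the protocol as described; everything else is an assumption for illustration: the opcode values "send-msg" and "ping", the struct and function names, and treating !id as a string.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope holds only the two fixed properties. Treating "!id" as a
// string is an assumption made for this sketch.
type envelope struct {
	Op string `json:"@"`
	ID string `json:"!id"`
}

// sendMsg is a hypothetical payload shape for one opcode.
type sendMsg struct {
	To  string `json:"to"`
	Msg string `json:"msg"`
}

// decode parses raw twice: once to read the opcode, then again into the
// struct that matches it. The opcode names here are made up.
func decode(raw []byte) (string, interface{}, error) {
	var env envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return "", nil, err
	}
	switch env.Op {
	case "send-msg":
		var m sendMsg
		if err := json.Unmarshal(raw, &m); err != nil {
			return env.Op, nil, err
		}
		return env.Op, m, nil
	case "ping":
		return env.Op, nil, nil // no extra fields for this opcode
	default:
		return env.Op, nil, fmt.Errorf("unknown opcode %q", env.Op)
	}
}

func main() {
	raw := []byte(`{"@":"send-msg","!id":"42","to":"lounge","msg":"hello"}`)
	op, payload, err := decode(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%s %+v\n", op, payload)
}
```

Every message is parsed at least once for the opcode and usually a second time for the body, and the caller still has to type-switch on the returned interface{} value.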

Back to the whiteboard

I was not able to get more than 1.5K~2K parallel connections on my RPi with 512MB of RAM (NOTE: this includes everything, including BoltDB, which the HN comments pointed out was a poor choice), and the system would slow to a crawl as load went higher due to thrashing and memory swapping. Despite my efforts to optimize the code for performance rather than readability, I was not getting much improvement, which only told me that I was doing something wrong. At this point I more or less stopped development and asked myself: "What am I doing wrong?" To be fair to Go, if you run this system on your 4-core laptop with GBs of RAM it works extremely well, faster than anything else. The problem hits you when it runs on a resource-constrained system; the cumulative memory usage brings down the whole thing, and I wanted this system to handle more connections on a cheap 512MB AWS instance (forget the RPi).

Various explorations and going back to simplicity

I started exploring various options: Rust, Elixir, Crystal, and Node.js. Rust was my second choice, but it doesn't have a good, stable, production-ready WebSocket server library yet. Crystal was dropped due to the conservative nature of the Boehm GC, and Elixir also used more memory than I expected. Node.js surprisingly gave me a nice balance of memory usage and speed. With the new async/await syntax it was less annoying for me to write reasonable code (not saying it's the best code yet). There are more benchmarks on WebSockets out there.

So after writing a basic PubSub server in all these languages and load testing them (I used tcpkali to benchmark these systems with continuous message loads), I found Node.js was the right fit for me. The current system gracefully handles 5K concurrent clients sending messages back and forth in a single room (imagine 5K chatty people in the same room), and it does all of this in under 150MB of memory (with the DB and everything else included it stays under 250MB). It's not a scientific benchmark, but it already performs better than the Go version under the same load test. I intend to do a more scientific version later and post the results.

Conclusion

I am still not calling it v1.0 because I am refactoring the front-end code to clean that up as well (moving away from Vue 1.0 to Hyperapp with JSX), the server code still has some rough edges, and there is some development debt (packaging, code cleanup, unit tests, etc.). But as of now the Node.js server is running on a cheap 512MB VPS (the cheapest I could find). So far I am happy with the amount of fat I have been able to cut and with the results.

UPDATE: People have pointed me to many resources, the most common one being this article. It points out the same issues that I had, and goes down the same path of writing event loops. I am re-implementing and benchmarking with the new proposals. I won't be switching away from Node.js anytime soon, but it's valuable to measure.