Selecting a javascript flow control solution

scriby edited this page Jan 29, 2012 · 1 revision

I am writing this piece to help others decide which flow control solution is right for them when starting a new node.js project. A quick peek at the node modules page reveals a good number of options. I think the number of options points both to the need for some sort of flow control solution and to the fact that one size doesn't fit all.

Who am I?

I am currently working for a company that is building a large node.js based web platform with a team of developers. Having written non-trivial javascript for 5+ years in a professional setting using a variety of flow control solutions, I am in a good position to give an overview of the landscape of solutions.

Let's begin

So, where do we begin? I'll break the options into four categories:

  1. Plain javascript
  2. A flow control library
  3. Fibers
  4. CPS transforms

Plain javascript

Pros

  • It's a common ground all javascript developers need to be familiar with, so it shouldn't require any additional learning / training (I think this is what people mean when they call it "simple")
  • The solution "closest to the metal", offering the best possible performance, assuming you're not lazy and take the time to parallelize everything as much as possible. Also note that the performance gains are quite negligible, and in most applications it would be a mistake to prioritize them over other trade-offs.
  • Works in both node.js and the browser

Cons

  • Error handling is, well, prone to errors. Every callback must include the if(err) { return callback(err); } boilerplate. Forget it in one place and you will lose errors.
  • Series operations result in lots of nesting (breaking out each step into its own function can help, but that is not always convenient).
  • Writing parallel operations is easy, but tracking when they're all finished requires introduction of extra variables and logic.
  • Don't forget that you shouldn't let exceptions bubble up. In practice this means a lot of extra try / catch blocks. Otherwise you need to figure out how to restart your node processes without any loss of service.
  • It's easy enough to hack out some code that gets the job done, but what about readability and maintainability? It's going to take longer to understand the code and make changes. Small changes to requirements may lead to unnecessarily complex restructuring of code.
  • Doesn't scale well with complexity. The more complicated the business logic gets, the worse situation you'll be in. Even projects that start with simple needs may grow into complex beasts that need to be tamed.
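To make the error-handling and bookkeeping points above concrete, here is a minimal sketch in plain javascript. The `load` helper and the names `loadInSeries` and `loadInParallel` are made up for illustration; `load` stands in for any node-style async call.

```javascript
// Hypothetical async helper: "loads" a value after a short delay.
function load(name, callback) {
  setTimeout(function () {
    if (name === 'missing') { return callback(new Error('not found: ' + name)); }
    callback(null, name.toUpperCase());
  }, 10);
}

// Series: each step nests inside the previous one and repeats the error check.
function loadInSeries(callback) {
  load('user', function (err, user) {
    if (err) { return callback(err); }
    load('orders', function (err, orders) {
      if (err) { return callback(err); }
      callback(null, [user, orders]);
    });
  });
}

// Parallel: starting the calls is easy; knowing when they are all done needs
// a counter, a results array, and a guard against calling back twice.
function loadInParallel(names, callback) {
  var results = [];
  var pending = names.length;
  var failed = false;
  if (pending === 0) { return callback(null, results); }
  names.forEach(function (name, i) {
    load(name, function (err, value) {
      if (failed) { return; }
      if (err) { failed = true; return callback(err); }
      results[i] = value;
      if (--pending === 0) { callback(null, results); }
    });
  });
}
```

Dropping either `if (err)` line in `loadInSeries` would silently lose the error, which is the failure mode described in the first bullet.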

Flow control libraries

There is a veritable horde of flow control libraries available. Two popular ones that come up frequently are async and step. I also think seq looks interesting. I'm currently using async on a project and it's working out well.

There are also a number of libraries built around the idea of futures or promises. These use a different pattern than most node.js programs -- instead of accepting a callback, they return an object that can be used to get the result of the async operation. Because of this difference, I haven't considered them much for use on the server, but I think they have some popularity in the browser.
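The difference between the two patterns can be sketched as follows. This assumes the standard `Promise` object built into current javascript runtimes, which is newer than the libraries referred to above; `readConfig` and `readConfigPromise` are hypothetical names.

```javascript
// Callback style: the caller passes in a function that receives the result.
function readConfig(callback) {
  setTimeout(function () { callback(null, { port: 8080 }); }, 10);
}

// Promise style: the function returns an object that represents the
// eventual result, and the caller asks that object for the value.
function readConfigPromise() {
  return new Promise(function (resolve, reject) {
    readConfig(function (err, config) {
      if (err) { return reject(err); }
      resolve(config);
    });
  });
}

readConfigPromise().then(function (config) {
  console.log('port:', config.port);
});
```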

Pros

  • Code is easier to read than plain javascript, because the flow control is expressed directly rather than being something the reader has to reverse engineer.
  • It's easier to modify code when requirements change. Want to add an extra step into an async.series? It's very straightforward.
  • Error handling is greatly simplified. Each step no longer has to check for errors itself, and can delegate error handling to a single function. Some flow control libraries catch thrown exceptions as well, and pass them to the error handling function like errors passed to the callback.
  • Works in both node.js and the browser

Cons

  • Call stacks (used for debugging) get littered with many extra frames. For example, between each step in an async.series, about 5 lines are added to the call stack. (You can work around this by massaging the string representation of the stack trace.)
  • The code still has a good amount of boilerplate, especially for complex control flows.
  • Sharing state between steps can be a pain. You either need to declare variables in an outer scope or use something like async.waterfall.
  • Stack overflows can occur. Continuation passing style leads to very deep call stacks, as each series operation adds more frames. If you try doing an async.series with 100,000 steps, you'll overflow the call stack. I noticed that async.waterfall does a process.nextTick between each step, but this leads to a substantial performance penalty, and you lose the entire call stack, which hurts debugging.
  • Complex business logic can still be difficult to express in a straightforward manner.
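The trade-offs above can be seen in a hand-rolled helper. This is only a sketch: `series` and its `defer` option are hypothetical and are not the API or implementation of async, step, or seq. It shows error handling delegated to a single function, and the choice between growing the stack and deferring with process.nextTick.

```javascript
// Hypothetical helper, not the API of any published library.
// Runs steps one after another; the first error skips the remaining steps
// and goes straight to `done`.
function series(steps, done, options) {
  var defer = Boolean(options && options.defer);
  var results = [];

  function run(index) {
    if (index === steps.length) { return done(null, results); }
    steps[index](function (err, value) {
      if (err) { return done(err); }
      results.push(value);
      if (defer) {
        // Unwinds the stack between steps, at the cost of speed and of
        // the call stack that led to this point.
        process.nextTick(function () { run(index + 1); });
      } else {
        // If the step called back synchronously, the stack keeps growing.
        run(index + 1);
      }
    });
  }

  run(0);
}

// Usage: no step checks for errors itself; the final function handles them.
series([
  function (next) { setTimeout(function () { next(null, 'one'); }, 10); },
  function (next) { next(new Error('step two failed')); },
  function (next) { next(null, 'never runs'); }
], function (err, results) {
  if (err) { return console.error('failed:', err.message); }
  console.log(results);
});
```

In this sketch, a very long list of steps that call back synchronously overflows the stack unless `defer` is set, which mirrors the trade-off described in the stack overflow bullet.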

Fibers

Fibers are commonly misunderstood. A fiber is simply a thread in javascript which has the ability to stop in place without blocking the event loop, and be resumed from outside the thread. Node is still single-threaded, and when a fiber is running, no other code gets to run. This concept itself is a bit low level, but it provides enough for powerful libraries to be built.
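The "stop in place and be resumed from outside" idea can be approximated with a standard javascript generator. To be clear, this is an analogy and not a fiber: a generator suspends only its own function, whereas a fiber suspends its whole call stack. The `run` and `delay` helpers are hypothetical and are not the node-fibers or asyncblock API.

```javascript
// Drives a generator: each `yield` hands out a function that takes a
// node-style callback, and the generator is resumed when that callback fires.
function run(generatorFn, done) {
  var gen = generatorFn();
  function resume(err, value) {
    var step;
    try {
      // Resume the paused code from the outside, with a value or an error.
      step = err ? gen.throw(err) : gen.next(value);
    } catch (thrown) {
      return done(thrown);
    }
    if (step.done) { return done(null, step.value); }
    step.value(resume);
  }
  resume();
}

// Hypothetical async operation: produces `value` after `ms` milliseconds.
function delay(ms, value) {
  return function (callback) {
    setTimeout(function () { callback(null, value); }, ms);
  };
}

run(function* () {
  var a = yield delay(10, 1); // pauses here; the event loop keeps running
  var b = yield delay(10, 2);
  return a + b;
}, function (err, sum) {
  if (err) { return console.error(err); }
  console.log('sum:', sum);
});
```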

In this section, I'll be covering the pros and cons as related to this module (asyncblock). You may also want to check out sync if you don't mind a solution that modifies the Function prototype.

Pros

  • Write async code in synchronous style without blocking the event loop
  • Effortlessly combine serial and parallel operations with minimal boilerplate
  • Produce code which is easier to read, reason about, and modify
  • Simplify error handling practices
  • Improve debugging by not losing stack traces across async calls
  • The code you write is the code that runs. You can step through it in the debugger, and line numbers in stack traces will be accurate
  • Simple to set up and use (if using node v0.5.2+). Just npm install asyncblock or add it to your package.json and you're good to go
  • A great choice for a typical business application where you just want to get the business logic expressed directly in a maintainable and readable format

Cons

  • Does not work in the browser, so it's only suitable for server-side code
  • Currently doesn't work on Windows
  • Requires V8 extensions (which are maintained in the node-fibers module)
  • Performance-wise it's a little slower than other solutions, but the time spent in flow control libraries accounts for ~1% of a typical application's time. See benchmarks for more details.

CPS transforms

CPS (continuation passing style) solutions add extra syntax to javascript that is converted to plain javascript before the code is run. The most popular CPS solutions are streamline, tame, and, most recently, IcedCoffeeScript (tame for CoffeeScript).

It's difficult to combine all CPS solutions under a single umbrella, but in general they provide benefits similar to those of fibers. You can write code in a style similar to traditional C-style programming languages and still get the benefits of async programming. One benefit of CPS solutions over fibers is that the translated javascript also works in the browser. (I think it would also be possible to write a source transform against a subset of the asyncblock syntax if this became important.)

There are a few drawbacks to CPS solutions:

  • The code you write isn't the code that runs. This is the biggest argument against CoffeeScript in my opinion, and has implications when it comes to debugging.
  • Related to the above point, call stacks may be useless as the line numbers won't match the original source, and code will be split into additional methods that don't have meaningful names.
  • Development / build processes will have to be more complicated to account for the source transformation step
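To illustrate the first two drawbacks, here is a hand-written sketch of the kind of output a CPS transform might produce. The `wait` keyword in the comment is made-up syntax, not that of streamline, tame, or IcedCoffeeScript, and `getPrice`, `__loop`, and `__cont` are hypothetical names.

```javascript
// Source as written (made-up syntax):
//
//   function sumPrices(ids) {
//     var total = 0;
//     for (var i = 0; i < ids.length; i++) {
//       total += wait getPrice(ids[i]);
//     }
//     return total;
//   }

// Hypothetical async operation used by the example.
function getPrice(id, callback) {
  setTimeout(function () { callback(null, id * 10); }, 5);
}

// What might actually run after translation: the loop becomes a recursive
// helper, and the code after the async call moves into a continuation.
function sumPrices(ids, callback) {
  var total = 0;
  var i = 0;
  function __loop() { // generated name that does not appear in the source
    if (!(i < ids.length)) { return callback(null, total); }
    getPrice(ids[i], function __cont(err, price) {
      if (err) { return callback(err); }
      total += price;
      i++;
      __loop();
    });
  }
  __loop();
}

sumPrices([1, 2, 3], function (err, total) {
  if (err) { return console.error(err); }
  console.log('total:', total);
});
```

A stack trace from the translated function would mention `__loop` and `__cont` at line numbers that do not line up with the seven-line source.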