
cluster: use round-robin load balancing
Empirical evidence suggests that OS-level load balancing (that is,
having multiple processes listen on a socket and letting the operating
system wake up one when a connection comes in) produces skewed load
distributions on Linux, Solaris, and possibly other operating systems.

The observed behavior is that a fraction of the listening processes
receive the majority of the connections. From the perspective of the
operating system, that somewhat makes sense: a task switch is expensive
and best avoided whenever possible, so the operating system gives
preferential treatment to a few processes because that reduces the
number of switches.

However, that rather subverts the purpose of the cluster module, which
is to distribute the load as evenly as possible. That's why this commit
adds (and defaults to) round-robin support, meaning that the master
process accepts connections and distributes them to the workers in a
round-robin fashion, effectively bypassing the operating system.

Round-robin is currently disabled on Windows due to how IOCP is wired
up. It works, and you can select it manually, but doing so will
probably incur a heavy performance hit.

Fixes #4435.
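
For illustration, a minimal sketch of selecting the policy by hand with
the `cluster.schedulingPolicy` setting this commit introduces (see the
documentation changes below):

```js
var cluster = require('cluster');

// Pick the policy before the first fork; it is effectively frozen
// once a worker has been spawned or cluster.setupMaster() was called.
cluster.schedulingPolicy = cluster.SCHED_RR;    // master distributes
// cluster.schedulingPolicy = cluster.SCHED_NONE; // OS distributes
```

The same choice can be made without touching code through the
`NODE_CLUSTER_SCHED_POLICY` environment variable, with the values
`rr` and `none`.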
bnoordhuis committed May 13, 2013
1 parent bdc5881 commit e72cd41
Showing 2 changed files with 313 additions and 51 deletions.
53 changes: 40 additions & 13 deletions doc/api/cluster.markdown
@@ -53,14 +53,28 @@
The worker processes are spawned using the `child_process.fork` method,
so that they can communicate with the parent via IPC and pass server
handles back and forth.

-When you call `server.listen(...)` in a worker, it serializes the
-arguments and passes the request to the master process. If the master
-process already has a listening server matching the worker's
-requirements, then it passes the handle to the worker. If it does not
-already have a listening server matching that requirement, then it will
-create one, and pass the handle to the child.
+The cluster module supports two methods of distributing incoming
+connections.
+
+The first one (and the default one on all platforms except Windows)
+is the round-robin approach, where the master process listens on a
+port, accepts new connections and distributes them across the workers
+in a round-robin fashion, with some built-in smarts to avoid
+overloading a worker process.
+
+The second approach is where the master process creates the listen
+socket and sends it to interested workers. The workers then accept
+incoming connections directly.
+
+The second approach should, in theory, give the best performance.
+In practice, however, distribution tends to be very unbalanced due
+to operating system scheduler vagaries. Loads have been observed
+where over 70% of all connections ended up in just two processes,
+out of a total of eight.

-This causes potentially surprising behavior in three edge cases:
+Because `server.listen()` hands off most of the work to the master
+process, there are three cases where the behavior between a normal
+node.js process and a cluster worker differs:

1. `server.listen({fd: 7})` Because the message is passed to the master,
file descriptor 7 **in the parent** will be listened on, and the
@@ -77,12 +91,10 @@
want to listen on a unique port, generate a port number based on the
cluster worker ID.

-When multiple processes are all `accept()`ing on the same underlying
-resource, the operating system load-balances across them very
-efficiently. There is no routing logic in Node.js, or in your program,
-and no shared state between the workers. Therefore, it is important to
-design your program such that it does not rely too heavily on in-memory
-data objects for things like sessions and login.
+There is no routing logic in Node.js, or in your program, and no shared
+state between the workers. Therefore, it is important to design your
+program such that it does not rely too heavily on in-memory data objects
+for things like sessions and login.

Because workers are all separate processes, they can be killed or
re-spawned depending on your program's needs, without affecting other
@@ -91,6 +103,21 @@
continue to accept connections. Node does not automatically manage the
number of workers for you, however. It is your responsibility to manage
the worker pool for your application's needs.

+## cluster.schedulingPolicy
+
+The scheduling policy, either `cluster.SCHED_RR` for round-robin or
+`cluster.SCHED_NONE` to leave it to the operating system. This is a
+global setting and effectively frozen once you spawn the first worker
+or call `cluster.setupMaster()`, whichever comes first.
+
+`SCHED_RR` is the default on all operating systems except Windows.
+Windows will change to `SCHED_RR` once libuv is able to effectively
+distribute IOCP handles without incurring a large performance hit.
+
+`cluster.schedulingPolicy` can also be set through the
+`NODE_CLUSTER_SCHED_POLICY` environment variable. Valid
+values are `"rr"` and `"none"`.

## cluster.settings

* {Object}
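
To make the round-robin behavior described above concrete, here is a
minimal sketch of a clustered HTTP server; the port, worker count, and
response text are illustrative:

```js
var cluster = require('cluster');
var http = require('http');
var os = require('os');

if (cluster.isMaster) {
  // Under SCHED_RR the master owns the listening socket: it accepts
  // each connection and hands it to the next worker in line. Under
  // SCHED_NONE the workers share the listen socket and the operating
  // system decides which one accepts.
  for (var i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
} else {
  http.createServer(function(req, res) {
    res.end('handled by worker ' + cluster.worker.id + '\n');
  }).listen(8000);
}
```

Hitting the server repeatedly should now cycle through the workers
instead of favoring a few of them, which is the skew the commit
message describes.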
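And a sketch of the workaround mentioned in the third edge case above,
deriving a unique port from the worker ID (the base port 8000 is
illustrative):

```js
var cluster = require('cluster');
var net = require('net');

if (cluster.isMaster) {
  cluster.fork();
  cluster.fork();
} else {
  // Each worker computes its own port, so the listen request is not
  // shared through the master and every worker gets a distinct socket.
  net.createServer(function(conn) {
    conn.end('worker ' + cluster.worker.id + '\n');
  }).listen(8000 + cluster.worker.id);
}
```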
