Add Passenger rolling restarts post.

1 parent b4faf47 commit f2b6aeebbc350139a9d3af69a33f1c003b549660 leehuffman committed Feb 26, 2012
Showing with 96 additions and 0 deletions.
  1. +96 −0 _posts/2012-02-25-passenger-rolling-restarts.md
@@ -0,0 +1,96 @@
+---
+layout: post
+title: Passenger Rolling Restarts with HAProxy, Nginx and Capistrano
+---
+
+{{ page.title }}
+================
+
+<p class="meta">25 Feb 2012 - Seattle</p>
+
+I was recently tasked with making rolling restarts happen for a customer of [ours](http://www.bluebox.net/). They had a traditional Ruby on Rails application served by Nginx and Passenger, with Capistrano handling their deployments. My first thought was to take advantage of the [API](https://boxpanel.bluebox.net/public/the_vault/index.php/Load_Balancing_API) we make available for managing these services, but I eventually decided to stick with the tools we all know and love. Here's why:
+
+* We don't have to rely on HTTP calls during deployments. This keeps things as simple as possible for the customer, the rest of our staff (who we assume will end up supporting this at some point) and the deployment process itself.
+* Our load balancer offering is built on top of HAProxy. When you remove a server from the load balancer backend, we have to "reload" your HAProxy process. This results in the zeroing of all the counters that make up HAProxy's stats, which are extremely useful.
+* This is a configuration that can be used anywhere. The customer can take this to another provider, and I get to share it with you.
+
+While a quick Google search will turn up techniques similar to what I've used here, I wasn't completely sold on anything I stumbled upon.
+
+One suggested having HAProxy's health check request a static file which could be removed to disable a host during the application restart. This works fine, but it means HAProxy will never know the true health of the application/backend server. Unless HTTP becomes completely unresponsive, we'll always return a 200 for health checks, simply because an empty file exists on disk. We'd prefer a health check that exercises the entire stack.
+
+Another article suggested taking advantage of HAProxy's backup host option paired with some tweaking of iptables' forwarding chain (on the backend servers) during deployments. The solution was well thought out, and I thought it was an interesting approach, but it felt overly complex. Modifying firewall rules when you're only trying to push some code is something I'd like to avoid. In addition to this, it's tough for someone unfamiliar with this process to step in, look over the configuration, and wrap their head around what we're doing. As I mentioned above, I want this to be as simple as possible.
+
+I think I found a happy middle-ground. Check it out below.
+
+HAProxy
+--------------------------------------
+
+We first need to configure HAProxy's health check to set an HTTP header that we can identify at the backend host. This allows us to differentiate between requests from outside visitors and those of HAProxy trying to determine the server's health. In this example, we set the 'User-Agent' header to 'healthcheck'.
+
+<script src="https://gist.github.com/1912584.js?file=haproxy-httpchk.txt"></script>
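+
+If the gist doesn't load for you, the relevant piece is HAProxy's 'option httpchk' line. A sketch of what it might look like (the backend name, check URI, and Host header here are placeholders, not the actual config):
+
+    backend app_servers
+        option httpchk GET / HTTP/1.1\r\nHost:\ www.example.com\r\nUser-Agent:\ healthcheck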
+
+You'll also want to verify that the inter, rise, and fall values for each of your backend servers are set to something reasonable. Below is an example of what I'm using.
+
+<script src="https://gist.github.com/1912755.js?file=haproxy-server.txt"></script>
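+
+For reference, a backend 'server' line with those options might look something like this (the name, address, and timings are placeholders you'd tune for your own environment):
+
+    server app1 10.10.0.11:80 check inter 2000 rise 2 fall 3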
+
+Nginx
+--------------------------------------
+
+Next we'll add a block to the server definition in our Nginx configuration that will perform the following:
+
+1. If the remove_me_from_load_balancer file exists, set the first half of $return_bad_status.
+1. If the 'User-Agent' header matches 'healthcheck', set the second half of $return_bad_status.
+1. If $return_bad_status matches 'true', return the HTTP status code 500.
+
+Here's what I'm using to accomplish this.
+
+<script src="https://gist.github.com/1912837.js"></script>
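+
+If the gist doesn't load for you, a minimal sketch of that logic looks roughly like the following. The flag file path is an assumption; the important part is that the two 'halves' only spell out 'true' when both conditions are met:
+
+    # Sketch only: adjust the flag file path for your deployment layout.
+    set $return_bad_status "";
+    if (-f /var/www/app/shared/remove_me_from_load_balancer) {
+        set $return_bad_status "tr";
+    }
+    if ($http_user_agent = "healthcheck") {
+        set $return_bad_status "${return_bad_status}ue";
+    }
+    if ($return_bad_status = "true") {
+        return 500;
+    }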
+
+Capistrano
+--------------------------------------
+
+We'll start here by setting some variables that we'll call in our new rolling restart task.
+
+<script src="https://gist.github.com/1912866.js?file=rolling-restart-variables.rb"></script>
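+
+In case the gist doesn't load, these are ordinary Capistrano variables; the values below are placeholders to adjust for your environment:
+
+    set :haproxy_disable_wait, 15
+    set :haproxy_enable_wait, 15
+    set :warm_protocol, "http"
+    set :warm_host_header, "www.example.com"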
+
+- *haproxy_disable_wait*: This is the amount of time in seconds we'll wait for the server to be disabled after creating our disable file. This should be adjusted based on the inter and fall values mentioned above.
+- *haproxy_enable_wait*: This is the amount of time in seconds we'll wait for the server to be enabled after removing our disable file. This should be adjusted based on the inter and rise values mentioned above.
+- *warm_protocol*: This is the protocol we'll use to spawn Passenger processes after we've triggered a restart. If your application is only available over HTTPS, you'll want to adjust this.
+- *warm_host_header*: This is the 'Host' header we'll use in the HTTP request to spawn Passenger workers. If you have multiple virtual hosts configured in Nginx, you'll want to adjust this so we can be sure we're hitting the right application.
+
+Next up we have our [tasks](https://gist.github.com/1912935).
+
+<script src="https://gist.github.com/1912935.js"></script>
+
+**create_tmp_symlink**
+
+With the standard Capistrano deployment, the tmp directory is specific to each release. Passenger watches the 'tmp/restart.txt' file for changes (specifically, its modification time), and when it sees a change, it will restart your application. Unfortunately, this means that when Capistrano creates the current symlink to our new release, the 'tmp/restart.txt' file disappears and Passenger treats this as a sign that it should restart. This task ensures that doesn't happen.
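+
+The task itself lives in the gist above. If it doesn't load, one plausible shape for it (assuming it points the release's tmp directory at a shared one; the paths are assumptions) would be:
+
+    # Sketch: share tmp/ across releases so the current-symlink swap doesn't
+    # change restart.txt on its own.
+    task :create_tmp_symlink, :roles => :app do
+      run "rm -rf #{release_path}/tmp && ln -nfs #{shared_path}/tmp #{release_path}/tmp"
+    end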
+
+**restart**
+
+This is simply a transaction that calls our rolling_restart task. We could place all of the action here, but I think this does a better job of communicating to users that we're not performing a typical restart.
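+
+Roughly, that amounts to nothing more than this (a sketch, not the gist's exact contents):
+
+    task :restart, :roles => :web do
+      transaction do
+        rolling_restart
+      end
+    end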
+
+**rolling_restart**
+
+Here's where the magic happens. We're going to iterate over the hosts that belong to the ':web' role, and deal with them one at a time (there's a rough sketch of the task after this list).
+
+* Write out our 'remove_me_from_load_balancer' file that causes Nginx to return a 500 to our load balancer.
+* Wait for HAProxy to disable the server.
+* Restart Passenger and have curl warm us up. This ensures our server will be ready to handle requests when it's enabled again.
+* Remove the 'remove_me_from_load_balancer' file so Nginx can return 200s to our load balancer.
+* Wait for HAProxy to enable the server.
+* Move on to the next server or task.
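+
+Again, the real task is in the gist linked above; as a rough sketch of that flow (the flag file location and the warm-up curl command are my assumptions, not necessarily what the gist does):
+
+    task :rolling_restart, :roles => :web do
+      find_servers(:roles => :web).each do |server|
+        # Start failing health checks so HAProxy pulls this host out of rotation.
+        run "touch #{shared_path}/remove_me_from_load_balancer", :hosts => server.host
+        sleep fetch(:haproxy_disable_wait)
+
+        # Trigger a Passenger restart, then warm the app with a single request.
+        run "touch #{current_path}/tmp/restart.txt", :hosts => server.host
+        run "curl -sk -o /dev/null -H 'Host: #{fetch(:warm_host_header)}' #{fetch(:warm_protocol)}://localhost/", :hosts => server.host
+
+        # Pass health checks again and give HAProxy time to re-enable the host.
+        run "rm -f #{shared_path}/remove_me_from_load_balancer", :hosts => server.host
+        sleep fetch(:haproxy_enable_wait)
+      end
+    end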
+
+**web:enable**
+
+Everyone handles their maintenance pages a little differently, so this will likely need to be adjusted to work correctly in your environment. The reason I've included it here is to demonstrate that it's been modified to be idempotent, or safe to execute on every deploy.
+
+As you'll see below in our hooks, we want a maintenance page put in place for every deployment that includes a migration. In order for that to be removed, we need to execute the web:enable task after every restart. Of course, this file won't exist if we haven't executed a migration, so we've adjusted it to eliminate the possibility of triggering a rollback in these scenarios.
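+
+As a point of reference, an idempotent version can be as simple as the following (the maintenance page path assumes Capistrano's stock web:disable behavior):
+
+    namespace :web do
+      task :enable, :roles => :web, :except => { :no_release => true } do
+        # rm -f exits cleanly even when no maintenance page exists, so this
+        # can't trigger a rollback on deploys without a migration.
+        run "rm -f #{shared_path}/system/maintenance.html"
+      end
+    end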
+
+And finally, our hooks.
+
+<script src="https://gist.github.com/1919510.js?file=rolling-restart-hooks.rb"></script>
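+
+If the gist doesn't load, the wiring amounts to a few before/after hooks along these lines (the task names are taken from the prose above and may need adjusting for your Capistrano setup):
+
+    before "deploy:finalize_deploy", "deploy:create_tmp_symlink"
+    before "deploy:migrations",      "deploy:web:disable"
+    after  "deploy:restart",         "deploy:web:enable"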
+
+The one important piece of this (outside of the web:enable task explained above) is to recognize that we're executing our create_tmp_symlink task before finalize_deploy, since that task creates symlinks inside of tmp/ (like tmp/pids) by default. We want to create our symlink first to make sure we don't lose anything Capistrano might place inside tmp/ later on.
+
+If you have any questions or suggestions on what we covered here, give me a shout at [@leehuffman](https://twitter.com/leehuffman).
