Puma vs Phusion Passenger

Hongli Lai edited this page Feb 7, 2014 · 37 revisions

Puma is an open source multi-threaded application server for Ruby, written by Evan Phoenix and based on Mongrel. We at Phusion often get questions about how Puma differs from the Phusion Passenger Ruby web server. This article lists some differences.

Regarding objectivity and accuracy

As you may have noticed by now, this article is written by the authors of Phusion Passenger. As a result, some people have expressed their concerns about the objectivity and accuracy of this article. So first of all, let us state that, because we have a commercial interest in this, it is not possible for us to be completely unbiased. That being said, we do our best, and we do not like to make claims that are not backed by facts. If we say that Phusion Passenger is better than X, then that's because we can back that up, and because we believe in it. If you are still worried about objectivity or accuracy, then we encourage you to read this response by Evan Phoenix himself:

Evan Phoenix's response

Ease of use

Phusion Passenger takes a holistic approach and acts more like an integrated whole. It takes care of a lot of system administration and management for you. This is why Phusion Passenger is easier to use.

Puma acts more like a component that you have to integrate into the rest of the system. It requires more system administration knowledge.

License and price

Puma is completely open source and free. Phusion Passenger is also open source and free, but there is a commercial (paid) version - Phusion Passenger Enterprise - which provides more features as well as commercial support. Selling Phusion Passenger Enterprise is our means to sponsor the continued development of both the open source and the Enterprise version.

Concurrency

Puma is multithreaded-only. The open source variant of Phusion Passenger is multi-process single-threaded. The Enterprise variant can be configured to be either single-threaded or multithreaded.

Multithreading allows less memory usage and provides higher concurrency than multi-process single-threading. Multithreading is especially suitable for applications that require high I/O concurrency, e.g. applications that perform a lot of HTTP API calls or otherwise block on I/O, or applications which serve WebSockets.
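To make the I/O concurrency point concrete, here is a minimal sketch in plain Ruby. It is not code from Puma or Phusion Passenger. The helper names (`blocking_request`, `serve_sequentially`, `serve_with_threads`, `elapsed`) are hypothetical, and `sleep` stands in for a request that blocks on I/O such as an HTTP API call.

```ruby
# Simulates one request that blocks on I/O (assumption: sleep stands in
# for a slow HTTP API call or database query).
def blocking_request(delay)
  sleep(delay)
  :done
end

# One request at a time, as a single single-threaded process would do.
def serve_sequentially(count, delay)
  Array.new(count) { blocking_request(delay) }
end

# One thread per request; the threads wait on their "I/O" at the same time.
def serve_with_threads(count, delay)
  Array.new(count) { Thread.new { blocking_request(delay) } }.map(&:value)
end

# Returns the wall-clock seconds taken by the given block.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

sequential = elapsed { serve_sequentially(4, 0.1) }
threaded   = elapsed { serve_with_threads(4, 0.1) }
puts format("sequential: %.2fs, threaded: %.2fs", sequential, threaded)
```

The threaded run should take roughly as long as one request, while the sequential run takes roughly as long as all requests combined. A multi-process single-threaded server gets the same overlap only by adding processes, which costs more memory.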

Both Puma and Phusion Passenger Enterprise can run in hybrid multi-process multi-threaded mode, that is, as multiple multithreaded processes. Hybrid mode allows Ruby and Python, despite their Global Interpreter Lock, to fully utilize all CPU cores.[1] In Puma, the hybrid mode is called "clustered".

Where Puma and Phusion Passenger Enterprise differ is the threading model. Puma has a dynamic thread pool. Phusion Passenger Enterprise's thread pool is optimized for performance, and is therefore static. It was decided early in the implementation process that managing threads dynamically, especially in Ruby, involves too much overhead and makes the code more complicated. However in some situations you may find a dynamic thread pool more suitable.
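As an illustration of what a static thread pool means, the following is a minimal sketch in plain Ruby. It is not Phusion Passenger Enterprise's implementation. The function name `run_in_static_pool` and the `:stop` marker are hypothetical. The number of threads is fixed up front and never grows or shrinks with the load.

```ruby
# Runs every job on a fixed number of worker threads and returns the results
# (in completion order, which is not necessarily submission order).
def run_in_static_pool(jobs, pool_size)
  queue = Queue.new
  jobs.each { |job| queue << job }
  pool_size.times { queue << :stop } # one stop marker per worker

  results = Queue.new
  workers = Array.new(pool_size) do
    Thread.new do
      # Each worker keeps taking jobs until it sees a stop marker.
      while (job = queue.pop) != :stop
        results << job.call
      end
    end
  end
  workers.each(&:join)

  Array.new(results.size) { results.pop }
end

jobs = [-> { 1 + 1 }, -> { 2 * 3 }, -> { 10 - 3 }]
p run_in_static_pool(jobs, 2).sort
```

A dynamic pool would additionally start threads when the queue grows and stop idle ones, which is the extra bookkeeping the paragraph above refers to.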

On the other hand, Phusion Passenger Enterprise has a dynamic process pool that can be changed on-the-fly, while Puma in clustering mode has a static process pool.

[1] Only the case on MRI, not on JRuby and Rubinius. JRuby and Rubinius fully support multi-core threads in a single process.

Management tools

Puma provides a control server which allows you to stop or restart Puma and to query its status. However, it appears to be rather minimalistic (it only displays the size of the backlog and the number of running requests), and doesn't appear to be a first-class citizen. For example, the control server originally did not work in clustered mode. That issue is now marked as fixed, and clustered mode is supported since Puma 2.3.0.

Phusion Passenger, both the open source and Enterprise variants, has management tools that provide much more insight. Phusion Passenger allows you to stop, restart and to query its status through command line tools like passenger-status, passenger-config and passenger-memory-stats. These tools are regular command line tools, and their access can be controlled through sudo, which is a very Unix way of doing things. These tools display everything Puma's status server displays, plus the exact requests that are currently running, how long they've been running, the application's CPU and memory usage, etc.

Deployment model

Phusion Passenger supports two deployment models. The most common model is where it integrates directly into the web server. Apps are started together with the web server and stopped together with the web server, and configuration is done through the web server configuration file. Phusion Passenger infers most information from the web server configuration file, to keep configuration to a minimum. Normally you only have to set a virtual host, set the document root, set passenger_enabled on, and you're done.

Puma supports the reverse proxy model. Puma is its own application and listens on a TCP or Unix domain socket for HTTP requests. The administrator is then supposed to connect the front end web server to Puma through a reverse proxy setup. You have to do this for every Puma app.
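To show what "its own application that listens on a socket" means in practice, here is a toy sketch in plain Ruby. It is not Puma and it is far from a complete HTTP server. The function names are hypothetical, and the `Net::HTTP` client plays the role that the front end web server would play in a real reverse proxy setup.

```ruby
require "socket"
require "net/http"

# Starts a toy HTTP server on a free local TCP port. It answers every
# request with the given body. Returns the server socket and its thread.
def start_toy_app_server(body)
  server = TCPServer.new("127.0.0.1", 0) # port 0: let the OS pick a free port
  thread = Thread.new do
    begin
      loop do
        client = server.accept
        # Skip the request line and headers, up to the blank line.
        while (line = client.gets)
          break if line == "\r\n"
        end
        head = ["HTTP/1.1 200 OK",
                "Content-Type: text/plain",
                "Content-Length: #{body.bytesize}",
                "Connection: close"].join("\r\n")
        client.write(head + "\r\n\r\n" + body)
        client.close
      end
    rescue IOError
      # The listening socket was closed; stop accepting connections.
    end
  end
  [server, thread]
end

# Plays the part of the front end: forwards one request to the backend.
def fetch_from_toy_app_server(body)
  server, thread = start_toy_app_server(body)
  port = server.addr[1]
  Net::HTTP.get(URI("http://127.0.0.1:#{port}/"))
ensure
  thread.kill if thread
  server.close if server
end

puts fetch_from_toy_app_server("hello from the backend")
```

In a real deployment the front end web server needs the equivalent of that port number, or a Unix domain socket path, in its configuration for every app, which is the per-app setup work described above.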

Puma's deployment model also requires additional configuration for serving static assets through the web server, instead of through Puma. Phusion Passenger handles this sort of stuff automatically.

Puma's reverse proxy model is also supported by Phusion Passenger, in the form of Phusion Passenger Standalone. Phusion Passenger Standalone behaves just like Puma, in that it is its own application and listens on a TCP or Unix domain socket. So for those who prefer the reverse proxy model for architectural reasons, Phusion Passenger can accommodate them too.

Multi-app support

Phusion Passenger is designed for multi-app deployment by default. There is no clustered mode to turn on, it just works. This shows in both usage and the management tools. With a single Phusion Passenger install, you can easily deploy multiple apps. With a single set of management tools, you can manage all your apps.

With Puma, you have to manage each app individually, with a different control server per app. Puma does have a tool called Jungle which allows you to manage multiple apps through SysV init scripts or Upstart.

Jungle SysV init version vs Phusion Passenger

  • Jungle can add new apps or remove existing apps on the fly. The open source version of Phusion Passenger does not have such a feature. Phusion Passenger Enterprise offers a similar feature in the form of Flying Passenger.
  • Jungle is capable of starting each app as a different user, through the use of the SysV init script functions. This does not set environment variables from bashrc, so if the user expected that to work then it may cause some confusion. Phusion Passenger loads apps through bash by default, and preserves environment variables in bashrc.
  • With Phusion Passenger, you only have to configure per-app stuff in the web server config file. No per-app reverse proxy settings. With Jungle, you still need per-app reverse proxy settings, and you need to run a Jungle command in addition to modifying the web server config file.
  • Jungle's app removal is currently easier to use than Flying Passenger's (Phusion Passenger 4.0.5). With Flying Passenger you currently have to jump through some hoops to remove an app. We intend to address this in a future release.
  • Jungle does not appear to be able to modify the number of processes for an app on-the-fly. Phusion Passenger Enterprise with Flying Passenger can do that.

It should be noted that the SysV init script version of Jungle manages Puma, but does not supervise/monitor Puma, as in watching whether the process has quit, watching memory usage, watching CPU usage, etc. SysV init scripts don't monitor anything and are merely a wrapper interface around running certain commands, so Jungle is a tool for managing the startup and stopping of multiple Puma apps. As an example, use the SysV init version of Jungle to start a Puma, and then kill the master process. Notice that the OS does not restart Puma.

Phusion Passenger restarts all crashed processes. Phusion Passenger even restarts itself if it crashes, thanks to its watchdog architecture.
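The difference between merely starting a process and supervising it can be sketched in a few lines of plain Ruby. This is not how Upstart or the Phusion Passenger watchdog is implemented. The function `supervise` and its `max_restarts` parameter are hypothetical, and the restart cap exists only so that the example terminates.

```ruby
require "rbconfig"

# Starts the command and restarts it whenever it exits with a failure.
# Stops when the command exits cleanly or the restart cap is reached.
# Returns how many times the command was started.
def supervise(command, max_restarts:)
  starts = 0
  loop do
    pid = Process.spawn(*command)
    starts += 1
    _, status = Process.wait2(pid) # blocks until the child exits
    return starts if status.success? || starts > max_restarts
  end
end

# A child that always "crashes" is started once and restarted twice.
crashing_app = [RbConfig.ruby, "-e", "exit 1"]
puts supervise(crashing_app, max_restarts: 2)
```

A SysV init script corresponds to the single `Process.spawn` call without the surrounding loop: once the process is gone, nothing notices.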

Jungle Upstart version vs Phusion Passenger

Unlike SysV init, Upstart is a monitoring/supervision system that can restart a process if it crashes. The Upstart version of Jungle restarts Puma if it crashes. There are still differences between Jungle-Upstart and Phusion Passenger though:

  • Jungle-Upstart starts all apps as the same user. There does not appear to be a way to configure this on a per-app basis. Phusion Passenger can handle different-user-per-app just fine. We actually recommend different-user-per-app for security reasons, as we've described in our PivotalLabs talk Securing Ruby apps at the OS level.
  • Like Jungle-SysVinit, it does not set environment variables from bashrc.
  • Jungle-Upstart is not capable of on-the-fly adding new Puma apps, or removing existing Puma apps. The list of apps is static, and they must all be started and stopped at the same time.

Performance

Performance characteristics depend on the workload, so this should be explained in two parts.

CPU-bound, fast requests

For CPU-bound, fast requests that don't involve blocking I/O, Puma and Phusion Passenger (both the open source and Enterprise variant) perform similarly in production, but differently in microbenchmarks. In microbenchmarks Puma is faster because in Phusion Passenger, all data goes through an additional process, the PassengerHelperAgent, which sanitizes request headers, coordinates process spawning, collects statistics, etc. The overhead is not big: a little more than one extra read()/write() call to the kernel. But in microbenchmarks where you are benchmarking how quickly the app can do nothing, Phusion Passenger will appear to be twice as slow because of the extra proxy layer. On the other hand, that extra proxy layer is what allows us to provide accurate statistics and to implement robust process coordination, so it's not there for nothing. We also have some ideas on how to address even this in the future.

I/O-bound, slow requests

  • Against Phusion Passenger open source
    For slow requests that are bound by blocking I/O, Puma achieves higher concurrency than the open source version of Phusion Passenger. Note that concurrency is not the same as performance. Because of the limited amount of concurrency provided by the multi-process model, Phusion Passenger (open source) cannot achieve a high throughput for these kinds of workloads because it will spend a lot of time waiting for the kernel to give it more data. While waiting, it does not use CPU, so your system is being underutilized. You can offset this by spawning more processes, but that requires quite some memory.
  • Against Phusion Passenger Enterprise
    Puma and Phusion Passenger Enterprise can achieve the same concurrency and the same performance in production. Performance in microbenchmarks is still different, as explained above.

Memory usage

Memory usage should be viewed in three parts:

  • The base memory usage, which is memory that is always used regardless of the app server settings and the application.
  • The application-specific memory usage, which is independent from the app server. This includes framework memory usage, e.g. the memory required to load Rails.
  • The concurrency memory usage, which is the amount of memory the app server requires as one increases concurrency.

| Aspect | Puma 2.2.2 | Passenger (OSS) 4.0.6-pre[1] | Passenger Enterprise 4.0.6-pre[1] |
|---|---|---|---|
| Base[2] | 20 MB per process | 3.6 MB once[3], plus 15 MB per process | 3.6 MB once[3], plus 15 MB per process |
| App-specific[2] | Same between all 3. The app server has no effect on the app's memory usage. | Same as Puma | Same as Puma |
| Concurrency[2] | 20 MB per additional cluster process. Negligible amount of memory growth per additional thread. | 15 MB per additional concurrency increase. Because multiprocessing is the only I/O model supported, concurrency memory usage scales with the number of processes. | 15 MB per additional cluster process. Negligible amount of memory growth per additional thread. |

This chart displays the difference in memory usage as the number of cluster processes grows. Phusion Passenger open source and Enterprise are consolidated as a single entry because their memory usage with respect to a growing number of processes is identical. Note that [2] still applies.

Chart: Puma vs Phusion Passenger memory usage

The above chart does not display memory usage as a function of the number of threads. Such a chart would be boring: the memory usage would be almost a flat line when compared to memory usage as a function of the number of processes.
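The per-process figures from the table above can be turned into a rough estimate. The sketch below only does that arithmetic. The constant and function names are hypothetical, app-specific memory is left out because it is the same for all servers, and copy-on-write is assumed to be disabled, as in note [2].

```ruby
# Figures taken from the table above (hello-world Rack app, no copy-on-write).
PUMA_MB_PER_PROCESS      = 20.0
PASSENGER_ONCE_MB        = 3.6  # Watchdog, HelperAgent and LoggingAgent
PASSENGER_MB_PER_PROCESS = 15.0

# Estimated app server memory for Puma in clustered mode.
def puma_base_mb(processes)
  PUMA_MB_PER_PROCESS * processes
end

# Estimated app server memory for Phusion Passenger (open source or Enterprise).
def passenger_base_mb(processes)
  PASSENGER_ONCE_MB + PASSENGER_MB_PER_PROCESS * processes
end

[1, 4, 8].each do |n|
  puts format("%d processes: Puma %.1f MB, Phusion Passenger %.1f MB",
              n, puma_base_mb(n), passenger_base_mb(n))
end
```

With four processes this gives 80 MB for Puma and 63.6 MB for Phusion Passenger. The one-time 3.6 MB is already outweighed by the 5 MB per-process difference at a single process (18.6 MB versus 20 MB).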

Conclusion:

  • On CPU-bound fast-running workloads, in which Puma's multithreading provides no advantages, Phusion Passenger uses less memory than Puma.
  • On I/O-bound slow-running workloads, Puma with N threads uses less memory than the open source variant of Phusion Passenger with N processes. However, Phusion Passenger Enterprise with N threads uses even less memory than Puma with N threads.
  • Phusion Passenger Enterprise's memory usage is very close to the optimal minimum achievable with Ruby. An empty Ruby process that loads only RubyGems and Rack consumes 13 MB.

Notes:

  • All memory usages measured on OS X Mountain Lion, with an empty 'hello world' Rack app, Ruby 1.9.3. Numbers come from the "Private Mem" column in Activity Monitor.
  • [1] Phusion Passenger 4.0.6 pre-release (git ab802c525).
  • [2] Assumes copy-on-write is not enabled.
  • [3] Memory usage for PassengerWatchdog, PassengerHelperAgent and PassengerLoggingAgent.

Copy-on-write

An aspect that may further affect memory usage is whether the app server supports copy-on-write virtual memory. Both Puma and Phusion Passenger support copy-on-write virtual memory. Other than app server support, this also requires support from the Ruby interpreter. At present, Ruby Enterprise Edition and Ruby 2.0.0 support this.

With copy-on-write virtual memory enabled, memory usage is reduced by about 33% if there are at least two processes.

In Puma, utilizing copy-on-write requires setting the preload_app option, available since Puma 2.2.2. In Phusion Passenger, this requires passenger_spawn_method smart (the default), available since version 1.0.
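The mechanism behind both options is "load the application once, then fork". The sketch below shows that pattern in plain Ruby on a Unix system. It is not the code of either server. `PRELOADED_APP` and `spawn_workers` are hypothetical names, and the frozen array merely stands in for loaded application code. Whether the memory pages actually stay shared depends on the Ruby interpreter, as described above.

```ruby
# Stand-in for application code that is loaded once, before any worker exists.
PRELOADED_APP = Array.new(50_000) { |i| "route #{i}" }.freeze

# Forks the given number of workers and returns one report line per worker.
def spawn_workers(count)
  reader, writer = IO.pipe
  pids = Array.new(count) do
    fork do
      reader.close
      # The child uses the preloaded data without loading it again. The
      # kernel shares the underlying pages until one side writes to them.
      writer.puts("worker #{Process.pid}: #{PRELOADED_APP.size} routes")
      writer.close
      exit!(0) # leave immediately, skipping handlers inherited from the parent
    end
  end
  writer.close
  pids.each { |pid| Process.wait(pid) }
  lines = reader.read.lines.map(&:chomp)
  reader.close
  lines
end

puts spawn_workers(2)
```

Without preloading, each worker would build its own copy of the data after the fork, and nothing could be shared.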

Debugging and inspection

The open source version of Phusion Passenger provides tools for debugging stuck applications by displaying all threads' backtraces, while Puma does not appear to have such functionality. Phusion Passenger Enterprise provides a live IRB console that you can attach to any live, running process for inspection. It also provides ruby-debug integration that you can use even in multi-process mode. In Puma, the debugger is only available in non-clustered mode.

Fault-tolerance

Phusion Passenger, both the open source and Enterprise variant, imposes a time limit on the starting and stopping of the application. Puma does not impose any time limit. If your application is stuck during startup or shutdown (database problem, filesystem problem, network problem, or just a bug) then you will have to find that out manually by seeing that you get Bad Gateway on your requests. Phusion Passenger logs the problem.

If your application throws an error during startup, Puma just keeps trying infinitely, using 100% CPU, while Phusion Passenger only tries once per request. This can be problematic on a redeploy: if you have a startup bug that you did not catch during development or staging, then on a redeploy your site is down (and uses 100% CPU) until you fix it. On Phusion Passenger Enterprise, you can configure it not to retry until the next deploy. It can even hold on to all of its former application processes, to prevent visitors from noticing a problem (the Deployment Error Resistance feature).

Resource control

Phusion Passenger Enterprise provides features for limiting the request time and memory usage of applications. This is useful if the application - or one of the components it interacts with - has bugs which can cause it to become stuck or use a lot of memory. Unlike many memory management tools, Phusion Passenger Enterprise's memory limiting feature is graceful, so the application only shuts down after it has finished its request, preventing clients from noticing a problem.

Puma has no such features. Evan Phoenix (Puma author) did point out that the user can use the Rack-Timeout middleware which uses timeout.rb. However, in our experience, timeout.rb is not only dangerous (it can corrupt a program's state) but also ineffective (it is not able to interrupt all kinds of freezes). This is the reason why Phusion Passenger Enterprise uses operating system level mechanisms to enforce timeouts instead of using timeout.rb. Phusion Passenger Enterprise also uses its advanced process management features to automatically replace the aborted process.
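As a rough idea of what enforcing a timeout at the operating system level can look like, here is a sketch in plain Ruby. It is not Phusion Passenger Enterprise's actual mechanism. The function `run_with_deadline` is hypothetical. The point is that the deadline is enforced from outside the stuck process, with a signal that the process cannot ignore, rather than from a thread inside it as timeout.rb does.

```ruby
require "rbconfig"

# Runs the command as a child process. If it has not exited when the
# deadline passes, it is killed from the outside with SIGKILL.
# Returns :finished or :killed.
def run_with_deadline(command, seconds)
  pid = Process.spawn(*command)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds
  loop do
    # WNOHANG makes wait2 return nil while the child is still running.
    return :finished if Process.wait2(pid, Process::WNOHANG)

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      Process.kill("KILL", pid) # cannot be caught or blocked by the child
      Process.wait(pid)         # reap the killed child
      return :killed
    end
    sleep 0.05
  end
end

stuck_app = [RbConfig.ruby, "-e", "sleep 5"]
puts run_with_deadline(stuck_app, 0.3)
```

A supervisor that replaces the killed process afterwards would complete the picture described above.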

Out-of-band garbage collection

Phusion Passenger, both the open source and Enterprise variant, provides out-of-band garbage collection, even in multithreaded mode. Although the GC problem is reduced when using Rubinius or JRuby, most people are still on MRI, and out-of-band garbage collection is an excellent way to reduce latency caused by the GC.

Puma has no support for out-of-band garbage collection.
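The idea of out-of-band garbage collection can be sketched as follows. This is a simplified illustration in plain Ruby, not Phusion Passenger's implementation or API. The function `serve_with_oob_gc` and its `gc_every` parameter are hypothetical. The garbage collector is run explicitly between requests, after a response has been handed off, so that the collection pause does not fall inside a request.

```ruby
# Handles the requests one by one with the given block. After every
# gc_every-th response has been handed off, a full GC is run before the
# next request is taken. Returns the responses and the number of GC runs.
def serve_with_oob_gc(requests, gc_every:)
  sent = []
  oob_gc_runs = 0
  requests.each_with_index do |request, index|
    sent << yield(request) # the response is considered sent at this point
    next unless ((index + 1) % gc_every).zero?

    GC.start # no client is waiting on this process right now
    oob_gc_runs += 1
  end
  [sent, oob_gc_runs]
end

responses, gc_runs = serve_with_oob_gc(%w[a b c d], gc_every: 2) { |r| r.upcase }
p responses
p gc_runs
```

In a real server the process would also be taken out of the request rotation while it collects, so that other processes keep serving traffic in the meantime.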

Documentation

Puma's README provides clear quick start instructions, as well as basic usage, deployment and management instructions. Puma's documentation is short, clear and concise. There does not appear to be a configuration reference, an FAQ, a troubleshooting document, an internals document or OS-specific documentation. For the most part, the Puma documentation assumes that the user is already familiar with Apache or Nginx, and provides no further documentation on the usage of the web server.

Phusion Passenger provides an extensive Users Guide (Apache version, Nginx version). It documents installation, uninstallation, cryptographic verification, basic usage, deployment, management and troubleshooting in detail, with OS-specific information and instructions. There exist appendices and documents which describe the internals in detail, e.g. the Architectural Overview and the appendix which describes the workings of the Smart Spawning Method. There is also a detailed reference for all possible Phusion Passenger configuration options. The documentation tries to appeal to both novices (those who are neither familiar with Apache/Nginx, nor with Ruby, or even Unix system administration) as well as advanced users. This is, for example, shown in the About environment variables appendix.

Incidentally, the Phusion Passenger documentation for the Smart Spawning Method also documents caveats that one should be aware of when using preload_app in Puma, for which no documentation exists. This is because both mechanisms are similar.

Language and runtime support

Phusion Passenger is a polyglot, multi-application server. Besides Ruby, it supports Python, Node.js and Meteor. Puma is Ruby-only.

Both Puma and Phusion Passenger support JRuby and Rubinius.

Puma supports Windows, but Phusion Passenger does not. Phusion Passenger requires a Unix OS.

Conclusion

Puma is interesting technology. Compared to the open source variant of Phusion Passenger, there are both advantages and disadvantages.

  • For new users, Phusion Passenger is the easiest to use because of its holistic approach, and because it acts more like an integrated solution. Puma requires more system administration knowledge to set up.
  • For CPU-bound, fast-running requests, Phusion Passenger is better because of the better management tools, the fault-tolerance features, the better documentation, etc.
  • The fact that the open source version only supports multi-process I/O is problematic for people who have web apps with lots of I/O-bound, slow-running requests. For those people, running hundreds of Phusion Passenger processes to get the same sort of concurrency as Puma's multithreading is probably not doable, so Puma's multithreading support becomes the most important factor.
  • For sites in-between, e.g. sites with some slow-running blocking I/O requests, the administrator should assess the pros and cons: are the better management tools, fault-tolerance features, out-of-band GC and so forth more important? Or is higher concurrency for less memory more important?

Compared to the Enterprise variant of Phusion Passenger, we believe Phusion Passenger Enterprise has the upper hand, with the exception of the dynamic thread pool. If having a price tag is not a problem, then Phusion Passenger Enterprise is an excellent choice. It provides a ton of stability, robustness, ease of use and insight in a single package that under the hood still follows the Unix bunch-of-simple-components-working-together philosophy.