Adds a clustering requirements doc and a release note
closes #282
Brian Bouterse committed Apr 14, 2015
1 parent ed979b4 commit e2a60d7
Showing 5 changed files with 340 additions and 170 deletions.
1 change: 1 addition & 0 deletions docs/user-guide/index.rst
@@ -9,6 +9,7 @@ Contents:
introduction
release-notes/index
installation
scaling
tuning
broker-settings
server
14 changes: 2 additions & 12 deletions docs/user-guide/installation.rst
@@ -201,15 +201,6 @@ Server

$ sudo yum groupinstall pulp-server-qpid

.. warning::
The Pulp team believes that Pulp's webserver and Celery workers can be deployed across several
machines (with load balancing for the HTTP requests), but this has not been formally tested by
our Quality Engineering team. We encourage feedback if you have tried this, positive or
negative. If you wish to try this, each host that participates in the distributed Pulp
application will need to have access to a shared /var/lib/pulp filesystem, including the web
servers and the task workers. It is important that the httpd and celery processes are run by
users with identical UIDs and GIDs for permissions on the shared filesystem.

.. note::
For RabbitMQ installations, install Pulp server without any Qpid specific libraries using
``sudo yum groupinstall pulp-server``. You may need to install additional RabbitMQ
@@ -289,10 +280,9 @@ Server
once each (i.e., do not enable either of these on any more than one Pulp server).

.. warning::

``pulp_celerybeat`` and ``pulp_resource_manager`` must both be singletons, so be sure that you
only enable each of these on one host if you are experimenting with Pulp's untested HA
deployment. They do not have to run on the same host, however.
only enable each of these on one host if you are using Pulp's clustered deployment.

On one Pulp system, configure, start, and enable the Celerybeat process. This process performs a
job similar to a cron daemon for Pulp. Edit ``/etc/default/pulp_celerybeat`` to your liking, and
14 changes: 11 additions & 3 deletions docs/user-guide/release-notes/2.6.x.rst
@@ -10,9 +10,17 @@ Bug Fixes

This is a minor release which contains bug fixes for `these issues <https://pulp.plan.io/issues?utf8=%E2%9C%93&set_filter=1&f%5B%5D=cf_4&op%5Bcf_4%5D=%3D&v%5Bcf_4%5D%5B%5D=2.6.1&f%5B%5D=tracker_id&op%5Btracker_id%5D=%3D&v%5Btracker_id%5D%5B%5D=1&f%5B%5D=&c%5B%5D=project&c%5B%5D=tracker&c%5B%5D=status&c%5B%5D=priority&c%5B%5D=cf_5&c%5B%5D=subject&c%5B%5D=author&c%5B%5D=assigned_to&c%5B%5D=cf_3&group_by=>`_.

One area of improvement relates to upgrades. Starting with 2.6.1, Pulp processes `pulp_workers`,
`pulp_celerybeat`, and `pulp_resource_manager` are stopped on upgrade or removal of the
`pulp-server` package. After upgrading, you must restart all Pulp related services.
Improvements
------------

- Pulp has been fully tested in a clustered configuration. A new section of documentation titled
:ref:`Clustering Pulp <clustering>` is available with more detail on configuring this type of Pulp
deployment.

- One area of improvement relates to upgrades. Starting with 2.6.1, Pulp processes `pulp_workers`,
`pulp_celerybeat`, and `pulp_resource_manager` are stopped on upgrade or removal of the
`pulp-server` package. After upgrading, you must restart all Pulp related services.


Pulp 2.6.0
===========
318 changes: 318 additions & 0 deletions docs/user-guide/scaling.rst
@@ -0,0 +1,318 @@
.. _MongoDB: http://www.mongodb.org/
.. _Apache Qpid: https://qpid.apache.org/
.. _RabbitMQ: http://www.rabbitmq.com/
.. _MongoDB Deployment: http://www.mongodb.org/about/introduction/#deployment-architectures
.. _Apache Qpid HA docs: https://qpid.apache.org/releases/qpid-0.28/cpp-broker/book/chapter-ha.html
.. _RabbitMQ HA docs: http://www.rabbitmq.com/ha.html
.. _mod_status: https://httpd.apache.org/docs/2.2/mod/mod_status.html
.. _HAProxy: http://www.haproxy.org/

Scaling Pulp
============

Great effort has been put into Pulp to make it scalable. A default Pulp
install is an "all-in-one" style setup with everything running on one machine.
However, Pulp supports a clustered deployment across multiple machines and/or
containers to increase availability and performance.

Overview of Pulp Components
---------------------------

Pulp consists of several components:

* ``httpd`` - The webserver process serves published repositories and handles
Pulp REST API requests. Simple requests like repository creation are handled
immediately whereas longer tasks are asynchronously processed by a worker.

* ``pulp_workers`` - Worker processes handle longer running tasks
asynchronously, like repository publishes and syncs.

* ``pulp_celerybeat`` - The celerybeat process discovers and monitors workers.
Additionally, it performs task cancellations in the event of a worker
shutdown or failure. The celerybeat process also initiates scheduled tasks,
and automatically cancels tasks that have failed more than *X* times. This
process also initiates periodic jobs that Pulp runs internally. In a Pulp
cluster, exactly one of these should be running!

* ``pulp_resource_manager`` - The resource manager assigns tasks to workers,
and ensures multiple conflicting tasks on a repo are not executed at the same
time. In a Pulp cluster, exactly one of these should be running!

Additionally, Pulp relies on other components:

* `MongoDB`_ - the database for Pulp

* `Apache Qpid`_ or `RabbitMQ`_ - the queuing system that Pulp uses to assign
work to workers. Pulp can operate equally well with either Qpid or RabbitMQ.

.. warning:: It is critical to note that ``pulp_celerybeat`` and
``pulp_resource_manager`` should *never* have more than a single instance
running under any circumstance!

The diagram below shows an example default deployment.

.. image:: images/pulp-exp1.png

.. This section is still TODO.
.. Sizing Considerations
.. ^^^^^^^^^^^^^^^^^^^^^
..
.. * Storage Considerations
..
.. * How much disk should someone allocate to a Pulp install, and which dirs
.. should be mapped to backed-up storage? Which dirs should be on local disk?
..
.. * When should they grow their volume?
..
.. * How do you recover if a volume does indeed fill up?
..
Choosing What to Scale
----------------------

Not all Pulp installations are used in the same way. One installation may have
hundreds of thousands of RPMs, another may have a smaller number of RPMs but
with lots of consumers pulling content to their systems. Others may sync
frequently from a number of upstream sources.

A good first step is to figure out how many systems will be pulling content
from your Pulp installation at any given time. This includes RPMs, Puppet
modules, Docker layers, OSTree layers, Python packages, etc. RPMs are usually
pulled down on a regular basis as part of a system update schedule, but other
types of content may be fetched in a more ad-hoc fashion.

If the number of concurrent downloads seems large, you may want to consider
adding additional servers to service httpd requests. See the `Scaling httpd`_
section for more information.

If you expect to maintain a large set of repositories that get synced
frequently, you may want to add additional servers for worker processes.
Worker processes handle long-running tasks such as content downloads
from external sources and also perform actions like repository metadata
regeneration on publish. See the `Scaling workers`_ section for more
information.

Another consideration for installations with a large number of repositories
or repositories with a large number of RPMs is to have a dedicated server
or set of servers for MongoDB. Pulp does not store actual content in the
MongoDB database, but all metadata is stored there. More information on
scaling MongoDB is available in the `MongoDB Deployment`_ docs.

Pulp uses either RabbitMQ or Apache Qpid as its messaging backend. Pulp does
not generate many messages in comparison to other applications, so it is not
expected that the messaging backend would need to be scaled for performance
unless the number of concurrent consumer connections is large. However,
additional configuration may be done to make the messaging backend more fault
tolerant. Examples of this are available in the `Apache Qpid HA docs`_ and
the `RabbitMQ HA docs`_.

.. warning:: There is a bug in versions of Apache Qpid older than 0.30 that
involves running out of file descriptors. This is an issue on deployments
with large numbers of consumers. See
`RHBZ #1122987 <https://bugzilla.redhat.com/show_bug.cgi?id=1122987>`_
for more information about this and for suggested workarounds.


Scaling httpd
-------------
Additional httpd servers can be added to Pulp to increase both throughput
and redundancy.

In situations when there are more incoming HTTP or HTTPS requests than a single
server can respond to, it may be time to add additional httpd servers. httpd
serves both the Pulp API and content, so increasing capacity could improve
both API and content delivery performance.

Consider using the Apache `mod_status`_ scoreboard to monitor how busy your
httpd workers are.
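
As a minimal, illustrative sketch (not part of Pulp's shipped configuration),
the scoreboard can be exposed on an Apache 2.2 era ``httpd`` with something
like the following; on Apache 2.4 the access rules would instead use
``Require local``::

    # hypothetical /etc/httpd/conf.d/status.conf
    ExtendedStatus On

    <Location /server-status>
        SetHandler server-status
        Order deny,allow
        Deny from all
        Allow from 127.0.0.1
    </Location>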

.. note::
Pulp itself does not provide httpd load balancing capabilities. See the
`Load Balancing Requirements`_ for more information.

To add additional httpd server capacity, configure the desired number of
`Pulp clustered servers` and start ``httpd`` on them. Remember that only one
instance of ``pulp_celerybeat`` and ``pulp_resource_manager`` should be
running across all `Pulp clustered servers`.


Scaling workers
---------------

Additional Pulp workers can be added to increase asynchronous work throughput
and redundancy.

To add additional Pulp worker capacity, configure the desired number of `Pulp
clustered servers` according to the `clustering`_ docs and start
``pulp_workers`` on each of them. Remember that only one instance of
``pulp_celerybeat`` and ``pulp_resource_manager`` should be running across
all `Pulp clustered servers`.
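
As a hedged example, a host dedicated to task processing might raise the
per-host worker count by setting ``PULP_CONCURRENCY`` in
``/etc/default/pulp_workers`` (the value shown is purely illustrative)::

    PULP_CONCURRENCY=8

and then enable and start the service (shown here for a systemd based
distribution)::

    $ sudo systemctl enable pulp_workers
    $ sudo systemctl start pulp_workers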


.. _clustering:

Clustering Pulp
---------------

A clustered Pulp installation consists of two or more `Pulp clustered
servers`. The term `Pulp clustered server` is used to distinguish it as a
separate concept from :ref:`pulp_nodes`. `Pulp clustered servers` share the
following components:

+--------------------+---------------------------------------------------------+
| Pulp Configuration | Pulp reads its configuration from conf files inside |
| | ``/etc/pulp``. |
+--------------------+---------------------------------------------------------+
| Pulp Files | Pulp stores files on disk within ``/var/lib/pulp``. |
+--------------------+---------------------------------------------------------+
| Certificates | By default, Pulp keeps certificates in |
| | ``/etc/pki/pulp``. |
+--------------------+---------------------------------------------------------+
| MongoDB | All clustered Pulp servers must connect to the same |
| | MongoDB. |
+--------------------+---------------------------------------------------------+
| AMQP Bus | All consumers and servers must connect to the same AMQP |
| | bus. |
+--------------------+---------------------------------------------------------+


Filesystem Requirements
^^^^^^^^^^^^^^^^^^^^^^^

Pulp requires a shared filesystem for `Pulp clustered servers` to run
correctly. Sharing with NFS has been tested, but any shared filesystem will
do. Pulp expects all shared filesystem directories to be mounted in their
usual locations.

The following permissions are required for a `Pulp clustered server` to operate
correctly.

+--------+-------------------+------------------------------------------------+
| User | Directory | Permission |
+========+===================+================================================+
| apache | ``/etc/pulp`` | Read |
+--------+-------------------+------------------------------------------------+
| apache | ``/var/lib/pulp`` | Read, Write |
+--------+-------------------+------------------------------------------------+
| apache | ``/etc/pki/pulp`` | Read, Write |
+--------+-------------------+------------------------------------------------+
| root | ``/etc/pki/pulp`` | Read |
+--------+-------------------+------------------------------------------------+
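
For example, a `Pulp clustered server` that mounts the shared directories from
a hypothetical NFS server named ``nfs.example.com`` might carry ``/etc/fstab``
entries along these lines (hostname and export paths are illustrative)::

    nfs.example.com:/exports/pulp-etc      /etc/pulp      nfs  defaults  0 0
    nfs.example.com:/exports/pulp-var-lib  /var/lib/pulp  nfs  defaults  0 0
    nfs.example.com:/exports/pulp-pki      /etc/pki/pulp  nfs  defaults  0 0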

For more details on using NFS for sharing the filesystem with Pulp, see
`Sharing with NFS`_.

SELinux Requirements
^^^^^^^^^^^^^^^^^^^^

`Pulp clustered servers` with SELinux in Enforcing mode need the following
SELinux file contexts for correct operation:

+--------------------+---------------------------------------------+
| Directory | SELinux Context |
+====================+=============================================+
| ``/etc/pulp`` | system_u:object_r:httpd_sys_rw_content_t:s0 |
+--------------------+---------------------------------------------+
| ``/var/lib/pulp`` | system_u:object_r:httpd_sys_rw_content_t:s0 |
+--------------------+---------------------------------------------+
| ``/etc/pki/pulp`` | system_u:object_r:pulp_cert_t:s0 |
+--------------------+---------------------------------------------+

For more details on using NFS with SELinux and Pulp, see `Sharing with NFS`_.
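
On a locally mounted directory, one way to apply a missing context is with
``semanage`` and ``restorecon``; this is shown only as an illustration, since
Pulp's SELinux policy normally installs these contexts and NFS mounts receive
theirs through the ``context`` mount option instead::

    $ sudo semanage fcontext -a -t httpd_sys_rw_content_t "/var/lib/pulp(/.*)?"
    $ sudo restorecon -R -v /var/lib/pulp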


Server Settings
^^^^^^^^^^^^^^^

Several Pulp settings default to ``localhost``, which won't work in a
clustered environment. In ``/etc/pulp/server.conf`` the following settings
should be set, at a minimum, for correct Pulp clustering operation.

+-------------+--------------+-----------------------------------------------+
| Section | Setting Name | Recommended Value |
+=============+==============+===============================================+
| [server] | host | Update with the name used by your |
| | | load balancer. |
+-------------+--------------+-----------------------------------------------+
| [database] | seeds | Update with the hostname and port of your |
| | | network accessible MongoDB installation. |
+-------------+--------------+-----------------------------------------------+
| [messaging] | url | Update with the hostname and port of your |
| | | network accessible AMQP bus installation. |
+-------------+--------------+-----------------------------------------------+
| [tasks] | broker_url | Update with the hostname and port of your |
| | | network accessible AMQP bus installation. |
+-------------+--------------+-----------------------------------------------+
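
As a sketch only, with a hypothetical load balancer at ``pulp.example.com``, a
MongoDB host at ``mongodb.example.com``, and an AMQP broker at
``broker.example.com``, the relevant parts of ``/etc/pulp/server.conf`` might
read as follows (hostnames, ports, and URL schemes are illustrative and should
match your broker)::

    [server]
    host: pulp.example.com

    [database]
    seeds: mongodb.example.com:27017

    [messaging]
    url: tcp://broker.example.com:5672

    [tasks]
    broker_url: qpid://broker.example.com:5672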


Load Balancing Requirements
^^^^^^^^^^^^^^^^^^^^^^^^^^^

To effectively handle inbound HTTP/HTTPS requests to `Pulp clustered servers`
running ``httpd``, a load balancer should be used. Configuring a load
balancer is beyond the scope of the Pulp documentation, but there are a few
recommendations.

Pulp defaults to using SSL for webserver traffic, so the easiest approach is to
use a TCP-based load balancer. `HAProxy`_ has been tested with a clustered
Pulp installation, but any TCP load balancer should work.

With TCP load balancing, all `Pulp clustered servers` need to have ``httpd``
configured with the same certificate. That certificate needs to have the CN
set to the hostname or IP of the load balancer. This configuration ensures
that the load balancer passes traffic to the Pulp webservers, but that the
client and server will negotiate an SSL connection directly. By setting the CN
to the hostname or IP of the load balancer, the client will trust the
certificate presented by the `Pulp clustered server` as it is accessed
through the load balancer.
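
For example, a hypothetical `HAProxy`_ configuration that passes encrypted
traffic straight through to two `Pulp clustered servers` might look like this
(hostnames are illustrative; SSL is still negotiated between the client and
each Pulp webserver)::

    frontend pulp_https
        mode tcp
        bind *:443
        default_backend pulp_webservers

    backend pulp_webservers
        mode tcp
        balance roundrobin
        server pulp1 pulp1.example.com:443 check
        server pulp2 pulp2.example.com:443 check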


Consumer Settings
^^^^^^^^^^^^^^^^^

Consumers are configured much as they would be in a non-clustered
environment. At a minimum, two areas of
``/etc/pulp/consumer/consumer.conf`` need updating, as sketched after the list
below.

* The ``host`` value in the ``[server]`` section needs to be updated with the
load balancer's hostname or IP. This causes web requests from consumers
to flow through the load balancer.

* The ``[messaging]`` section needs to be updated to use the same AMQP bus as
the server.
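
A minimal sketch of those two areas of ``/etc/pulp/consumer/consumer.conf``,
assuming a hypothetical load balancer at ``pulp.example.com`` and a broker at
``broker.example.com``; the broker address is assumed to live in a ``host``
setting under ``[messaging]``, so check the comments in the shipped file for
the exact keys::

    [server]
    host: pulp.example.com

    [messaging]
    host: broker.example.com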

.. warning:: Machines acting as a `Pulp clustered server` cannot be registered
as a consumer until :redmine:`859` is resolved.


Sharing with NFS
^^^^^^^^^^^^^^^^

NFS has been tested with Pulp to share the ``/etc/pulp``, ``/var/lib/pulp``,
and ``/etc/pki/pulp`` sections of the filesystem, but any shared filesystem
should work. Typically `Pulp clustered servers` will act as NFS clients,
and a third party machine will act as the NFS server.

.. warning::
Exporting the same directory name (e.g. ``pulp``) multiple times can cause the
NFS client to incorrectly believe it has already mounted the export. Use
the NFS option ``fsid`` with integer numbers to uniquely identify NFS
exports.

NFS expects user IDs (UIDs) and group IDs (GIDs) on a client to map directly
to the UIDs and GIDs on the server. To keep your NFS export config simple,
it is recommended that all NFS servers and clients have the same UID and GID
for the user ``apache``. If they differ throughout the cluster, use NFS
options to map UIDs and GIDs accordingly.

Most NFS versions squash root by default, which prevents ``root`` on NFS
clients from automatically having root access on the NFS server. This
typically prevents ``root`` on a `Pulp clustered server` from having the
necessary Read access on ``/etc/pki/pulp``. One secure way to work around
this without opening up root access on the NFS server is to use the
``anonuid`` and ``anongid`` NFS options to specify the UID and GID of
``apache`` on the NFS server. This will effectively provide ``root`` on the
NFS client with read access to the necessary files in ``/etc/pki/pulp``.
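
Putting the options above together, an ``/etc/exports`` sketch on a
hypothetical NFS server might read as follows, where ``48`` stands in for the
UID and GID of ``apache`` on that server (verify the real values with
``id apache``)::

    /exports/pulp-etc      *(rw,sync,fsid=1)
    /exports/pulp-var-lib  *(rw,sync,fsid=2)
    /exports/pulp-pki      *(rw,sync,fsid=3,anonuid=48,anongid=48)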

If using SELinux in Enforcing mode, specify the necessary
`SELinux Requirements`_ with the NFS option ``context``.
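
For instance, the ``/var/lib/pulp`` export could be mounted with its required
context set at mount time (hostname and export path are illustrative)::

    $ sudo mount -o context="system_u:object_r:httpd_sys_rw_content_t:s0" \
        nfs.example.com:/exports/pulp-var-lib /var/lib/pulp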
