vsphere/vsphere_ref_arch.html.md.erb

---
title: Reference Architecture for Pivotal Cloud Foundry on vSphere
owner: Customer0
---

<strong><%= modified_date %></strong>

This guide presents reference architectures for Pivotal Cloud Foundry (PCF) on vSphere.


## <a id="overview"></a> Overview

Pivotal validates the reference architectures described in this topic against multiple production-grade usage scenarios. These test deployments host up to 1500 app instances and use PCF-managed services such as MySQL, RabbitMQ, and Spring Cloud Services.

This document does not replace the [basic installation documentation] (../../customizing/vsphere.html), but gives proven examples of how to apply those instructions to real-world production environments.

| PCF Products Validated        | Version                   |
| ----------------------------- |:-------------------------:|
| PCF Ops Manager               | 1.10.latest                |
| Elastic Runtime               | 1.10.latest                |


## <a id="base-arch"></a> Base Reference Architecture 

This recommended architecture relies on VMware NSX Edge, a software-defined services gateway that runs on VMware ESX/ESXi virtual hosts and combines a firewall, load balancer, and NAT/SNAT. However, see below for architectures that do not rely on NSX Edge.

To use all features listed here, NSX Edge requires at least Advanced licensing from VMware.

For more information about installing and configuring NSX Edge for use with PCF on vSphere, see the <a href="./vsphere_nsx_cookbook.html" class="subnav">NSX Edge Cookbook for Pivotal Cloud Foundry on vSphere</a>.

The diagram below shows an architecture for two PCF installations sharing the same vSphere server clusters, yet segmented from each other with VMware Resource Pools. 

This design supports long term use, capacity growth at the vSphere level, and maximum installation security through the NSX Edge firewall. It allocates 3+ servers to each cluster, as recommended for vSphere, and spreads PCF components across 3 (or another odd number) of clusters, as [recommended](../../concepts/high-availability.html) for PCF.

<%= image_tag('vsphere-overview-arch.png') %>

To view a larger version of this diagram, click [here](../images/vsphere-overview-arch.png).

### <a id="install"></a> Installation

To create a system following this architecture, do the following:

1. From vCenter, create three clusters.<br/><br/>

    Pivotal recommends NSX Logical Switches for all clusters used by PCF. This approach avoids VLAN consumption while benefiting from the overlay capability NSX enables. NSX can create a DPG (Distributed Port Group) on a DVS (Distributed Virtual Switch) for each interface provisioned on the NSX Edge as shown in the <a href="#dist-port-group">Port Groups diagram</a> below.<br/><br/>
    Alternatively, port groups on a DVS with VLANs tagged on each can be used for the networks above.

1. Populate each cluster with two VMware Resource Pools. Enable VMware distributed resource scheduler (DRS) for each Resource Pool, so vMotion can automatically migrate data to avoid downtime.

1. For hosting capacity, populate each cluster with three ESXi hosts, making nine hosts for each installation. All installations collectively draw from the same nine hosts.

1. In one PCF deployment, use Ops Manager to create three Availability Zones (AZs), each corresponding to one of the Resource Pools from each cluster.

1. In the other PCF deployment, create an AZ for each of the three remaining Resource Pools.

1. For storage, add dedicated datastores to each PCF deployment following one of the two approaches, vertical or horizontal, as described [below](#storage).

1. Supply core networking for each deployment by configuring an NSX Edge with the following subnets. See [below](#networking-design) for details:
  - Infrastructure
  - Elastic Runtime (ERT)
  - Service tiles (one or more)
  - Dynamic service tiles (a network managed entirely by BOSH Director)
  - IsoZone##

### <a id="scale"></a> Scaling

You can easily scale up this architecture to support additional PCF installations with the same capacity, keeping each one resource-protected and separated.

To support more PCF installations, scale this architecture vertically by adding Resource Pools. To add capacity to all PCF installations, scale it horizontally by adding hosts to the existing clusters in sets of three, one per cluster. 

### <a id="priority"></a> Priority

In this architecture, multiple PCF installations can share host resources. You can use vCenter resource allocation shares to assign `High`, `Normal`, or `Low` priority to pools used by multiple installations. When host resources keep up with demand, these share values make no difference, but when multiple installations compete for limited resources, you can prioritize a production installation over a development installation (for example) by assigning its resource pools a `High` share value setting.

### <a id="storage"></a> Storage Configuration

You can allocate networked storage to the host clusters following one of two common approaches, _horizontal_ or _vertical_. The approach you follow should reflect how your data center arranges its storage and host blocks in its physical layout:

* **Horizontal**: You grant all hosts access to all datastores, and assign a subset to each installation. For example, with 6 datastores `ds01` through `ds06`, you grant all nine hosts access to all six datastores, then provision PCF installation #1 to use stores `ds01` through `ds03`, and installation #2 to use `ds04` through `ds06`. Installation #1 will use `ds01` until it is full, then `ds02`, and so on.

* **Vertical**: You grant each host cluster its own dedicated datastores, giving each installation multiple datastores based on their host cluster. vSphere VSAN storage requires this architecture. With 6 datastores `ds01` through `ds06`, for example, you assign datastores `ds01` and `ds02` to cluster 1, `ds03` and `ds04` to cluster 2, and `ds05` and `ds06` to cluster 3. Then you provision PCF installation #1 to use `ds01`, `ds03`, and `ds05`, and installation #2 to use `ds02`, `ds04`, and `ds06`. With this arrangement, all VMs in the same installation and cluster share a dedicated datastore.

<p class="note"><strong>Note</strong>: If a datastore is part of a vSphere Storage Cluster using sDRS (storage DRS), you must disable the s-vMotion feature of the sDRS function on any datastores used by PCF. Otherwise, s-vMotion (Storage vMotion) activity can rename independent disks and cause BOSH to malfunction. For more information, see <a href="./vsphere_migrate_datastore.html">How to Migrate PCF to a New Datastore in vSphere</a>.</p>

### <a id="storage-type"></a> Storage Capacity and Type

* **Capacity**: Pivotal recommends allocating at least 16TB of data storage for a typical PCF installation, either as two 8TB stores or a greater number of smaller volumes. Small installations without many tiles can use less; two 4TB volumes is reasonable. The primary consumer of storage is the NFS/WebDAV blobstore.

<p class="note"><strong>Note</strong>: At time of publication, PCF does not support the use of vSphere Storage Clusters with the <a href="#overview">latest versions of PCF</a> validated for the reference architecture. Datastores should be listed individually in the vSphere tile.</p>

* **Type**: Pivotal recommends block-based (fiber channel or iSCSI) and file-based (NFS) over high-speed carriers such as 8Gb FC or 10GigE. Redundant storage is highly recommended for both ephemeral and persistent storage types used by PCF.

### <a id="networking"></a> Networking

Using VMware NSX SDN (software-defined networking) provides the following benefits:

  * Distributed, hyper-local firewall capability per-installation through the built-in Edge Firewall
  * High capacity, resilient, distributed load balancing per-installation through the NSX Load Balancer
  * Network fencing
  * Element obfuscation through the use of non-routed RFC-1918 networks behind the NSX Edge and the use of SNAT/DNAT connections to expose only the endpoints of Cloud Foundry that need exposure
  * High repeatability of installations through the repeat use of all network and addressing conventions on the right hand side of the diagram (the Tenant Side)
  * Automatic rule and ACL sharing via NSX Manager Global Ruleset
  * Automatic HA pairing of NSX Edges, managed by NSX Manager
  * Support for PCF Go Router IP membership in the NSX Edge virtual load balancer pool by the BOSH CPI (not an Ops Manager feature)

#### <a id="network-design"></a> Networking Design

Each PCF installation consumes four (or more) networks with the NSX Edge, aligned to specific job types:

- **Infrastructure**: This inward-facing network has a small CIDR range and hosts resources that interact with the IaaS layer and back-office systems, such as the cloud provider interface (CPI), BOSH, Ops Manager, and other utility VMs such as Jumpbox VM.
- **Deployment**: Also known as the _apps wire_, this network has a large CIDR range. It hosts the Diego cell VMs that Elastic Runtime deploys apps into, and it also hosts Elastic Runtime support components. 
- **Services**: This network (or multiple networks) has a large CIDR range. It hosts services that are installed with Ops Manager tiles and managed by BOSH. A simple approach is to use this network for all PCF tiles except ERT. A more involved approach would be to deploy multiple**CF-Tiles-#** networks, one for each tile or one for each type of tile. For example, a network for databases, a network for message busses and so on. 
- **Dynamic Services**: A single network granted to BOSH Director for use with service tiles that require an [on-demand](http://
docs.pivotal.io/on-demand-service-broker/about.html#on-demand) (dynamic) address space for deployment. This is the only network that is marked as "Services" with a check box in the vSphere Ops Manager Director tile.
- **IsoZones##**: A single network granted to the PCF Isolation Segment tile for use to isolate Gorouters and Diego Cells into a new network space independent of the ERT installation.

All of these networks are considered "inside" or "tenant-side" networks, and use non-routable RFC-1918 network space to make provisioning repeatable. The NSX Edge translates between the tenant and service provider side networks using SNAT and DNAT.

Provision each NSX Edge with at least four routable IP addresses from the service provider:

1. A static IP address by which NSX Manager manages the NSX Edge
2. A static IP address for use as egress SNAT. Traffic from the tenant side exits the Edge on this IP address 
3. A static IP address for DNATs to Ops Manager
4. A static IP address for the load balancer VIP that balances to a pool of PCF Gorouters (HTTP/HTTPS)

In addition to these four, many more uses for IP addresses exist on the routed side of the NSX Edge. Pivotal recommends reserving a total of ten contiguous, static IP addresses per NSX Edge for future needs and flexibility.

On the tenant side, each interface defined on the NSX Edge acts as the IP gateway for the network used on that port group. Pivotal recommends allocating the following address ranges for the networks, and defining the gateway at `.1` for each:

- **Infra** (infrastructure) network: 192.168.10.0/26 
- **Deployment** network: 192.168.20.0/22
- **Services** network: 192.168.24.0/22
- **Dynamic Services** network: 192.168.28.0/22
- **IsoZone##** network: 192.168.32.0/22, Gateway at .1

<%= image_tag('vsphere-exploded-edge.png') %>

To view a larger version of this diagram, click [here](../images/vsphere-exploded-edge.png).

<p class="note"><strong>Note</strong>: Review the container-to-container settings in your installation for any IP address range conflicts with the enterprise addressing scheme in use by your organization. The specified range cannot be used anywhere else in your deployment. Change the range to a non-conflicting range as needed.</p>
##### <a name="dist-port-group"> </a> Distributed Port Groups

vSphere DVS (Distributed Virtual Switching) is recommended for all Clusters used by PCF. NSX will create a DPG (distributed port group) for each interface provisioned on the NSX Edge. 

Alternatively, NSX Logical Switches can be used on the Tenant Side of this design, which leverages vWires, reducing the dependency on VLAN address space.

<%= image_tag('vsphere-port-groups.png') %>

To view a larger version of this diagram, click [here](../images/vsphere-port-groups.png).

### <a id="highperf"></a> High Performance Variants

#### <a id="one_arm_lb"></a> One-Armed Load Balancing

The NSX Edge can act as a stand-alone, one-armed load balancer. 

This variant can improve performance and separate the dependence on the Edge that acts as NAT/SNAT/Firewall/Router by separating the load balancing function to a separate Edge deployed exclusively for use per installation. 

In short, you divide the jobs between two Edges per install rather than one. To implement this architecture, you place a single interface (internal) of a new Edge on the Deployment network, enable the load balancing function, and DNAT to it through the boundary Edge.


## <a id="no-nsx"></a> Reference Architecture Without VMware NSX

The reference architecture for deploying production PCF on vSphere without VMware NSX SDN technology follows the [base architecture](#base-arch), but with the following differences.

### <a id="no-nsx-features"></a> Networking Features

- Load balancing is handled by an external service, such as a hardware appliance or a VM from a 3rd party.
- An external service also performs SSL termination.
- You must set up firewalls for each zone or network inside the installation, rather than having the NSX Edge appliance span multiple networks. 
- To obfuscate network addresses, you must configure a SNAT/DNAT and single or possible multiple VLANs from the routable network, rather than turn on the SNAT/DNAT functionality of the NSX Edge.

### <a id="no-nsx-design"></a> Networking Design

The more traditional approach without SDN would be to deploy a single VLAN for use with all of PCF, or possibly a pair of VLANs, one for infrastructure and one for PCF. As VLAN capacity is frequently limited and scarce, this design seeks to limit the need for VLANs to a functional minimum.

<%= image_tag('vsphere-no-nsx.png') %>

To view a larger version of this diagram, click [here](../images/vsphere-no-nsx.png).

In this example, the firewall and load balancer functions run outside of vSphere, on generic devices that most datacenters provide. The PCF installation is bound to two port groups provided by a DVS on ESXi, each of which aligns to different job types:

  1. **Infra**: CPI, BOSH, and Ops Manager VMs that communicate with the IaaS layer
  2. **PCF**: the deployment network for all tiles, including ERT

In a typical installation, you assign each of these port groups to a VLAN out of the datacenter pool, and a routable IP address segment. Routing functions are handled by switching layers outside of vSphere, such as a top-of-rack (TOR) or end-of-row (EOR) switch/router appliance.

It is still valid to deploy all the networks shown in the original design, so deploy them if the resources are readily available. The main thing to keep in mind is that this is a requirement per PCF installation, so keep a count of how many of those overall you will require.


## <a id="single-cluster"></a> Reference Architecture Without Multiple Clusters

If you are working with three or more ESXi hosts and want to use less resources than the [base architecture](#base-arch) requires, Pivotal recommends setting up PCF in three clusters with one host in each.

To reduce resource use even further, you can place all hosts into a single cluster with VMware DRS and HA (high availability) enabled.

<%= image_tag('vsphere-single-cluster.png') %>

To view a larger version of this diagram, click [here](../images/vsphere-single-cluster.png).

A two-cluster architecture may offer useful symmetry at the vSphere level, but PCF works best when it deploys resources in odd numbers. A two-cluster configuration would force the operator into aligning odd-numbered components into two AZs, which does not work well for PCF internal voting algorithms. If you do not want to consume three clusters for PCF, using one works better than using two.

### <a id="single-cluster-design"></a> Networking Design

For a single-cluster deployment, follow the networking setup described in either the [base](#networking) or the [without-NSX](#no-nsx-design). architectures above. The internal compute arrangement for a production PCF deployment does not affect its networking.

Pivotal recommends mapping all datastores used by PCF to all of the hosts in a single-cluster deployment.


## <a id="multi-datacenter"></a> Multi-Datacenter Reference Architecture

To avoid downtime, some PCF customer scenarios demand a multi-datacenter architecture that spreads deployment resources across more than one physical location. A multi-datacenter architecture can support the hardware, power source, and geographic redundancy needed to guarantee high availability.

One interesting strategy for high availability is to keep a record of how many hosts are in a cluster and deploy enough copies of a PCF component in that AZ to ensure survivability in a site loss. This means placing large, odd numbers of components (such as consul) in the cluster so that at least two components are left on either site in the event of a site outage. In a four host cluster, this would call for five consul VMs, so each site has at least two if not the third. DRS anti-affinity rules can be used here (set at the IaaS level) to force like VMs apart for best effect.

The two main ways of designing a multi-datacenter PCF architecture is with [stretched clusters](#multi-datacenter-stretched), in which single logical clusters combine components in multiple physical locations, and [East/West clusters](#multi-datacenter-east-west), in which locally self-contained clusters are mirrored across multiple locations.

Both of these approaches have their own caveats, and you can combine either with the [without-NSX](#no-nsx) and [single-cluster](#single-cluster) architectures described above.

### <a id="multi-datacenter-stretched"></a> Multi-Datacenter vSphere With Stretched Clusters

For this approach, you define logical clusters that contain components physically located in two or more sites. With four hosts, for example, build a a four-host cluster with two hosts in an East datacenter and two from the West. Apply networking such that all hosts see the same networks through a stretched layer 2 application. Or you can use NSX or another SDN solution to tunnel one location over the other.

<%= image_tag('vsphere-multi-datacenter.png') %>

To view a larger version of this diagram, click [here](../images/vsphere-multi-datacenter.png).

PCF and BOSH treat the stretched cluster as an AZ, and make the same demands on it that they do with any other AZ. So the hosting, networking, and storage components within the stretched cluster must perform with normal latency and connectivity.

For seamless operation, hosts must share all datastores, and you must replicate storage across sites. Otherwise, vMotion cannot move VMs freely across hosts for maintenance or DRS.

A stretched version of the base architecture splits three clusters across two sites, yielding a 4×3×3 geometry:

- Four hosts per cluster (two from each site)
- Three clusters for PCF as AZs
- Three AZs mapped to PCF clusters

You can also deploy a stretched version of the [single-cluster](#single-cluster) model. This may be the more practical approach to achieving HA, since any stretched deployment already requires so many resources from two sites.

As with any VMware installation, job scheduling works more efficiently when VMs have fewer cores, so you should configure many smaller Diego Cell VMs rather than a lower number of larger ones. If single or 2-core VMs can handle your apps, favor them over 4- and 8-core options. This is especially important with stretched deployments.

Network traffic is a challenge with stretched clusters, since app traffic may enter at any connection point in either location, but can only leave through a designated gateway. The architect should consider that app traffic landing in the East might have to flow out of the West, a "trombone effect" that forces additional traffic across datacenter links.

### <a id="multi-datacenter-east-west"></a> Multi-Datacenter vSphere With Combined East/West Clusters

For this approach, the architect assigns parallel capacity from two sites independently, and deploys clusters to PCF in matched pairs. This creates even numbers of clusters, which makes suboptimal use of resources in PCF.

East/West mirroring the [base architecture](#base-arch) yields a deployment with six total clusters, three from each side. This may seem like a lot of gear to apply to PCF, but in a Business Continuity and Disaster Recovery (BCDR) scenario, doubling everything is the point.

Combining the East/West multi-datacenter and [single-cluster](#single-cluster) approaches creates a geometry with two clusters and 
pools in one cluster per site, or six AZs. Such a deployment only uses one cluster of capacity from each site, and does not scale readily. But drawing capacity from only one cluster makes it easy to provision with only a few hosts.

A multi-datacenter architecture makes replicating storage less critical. There are enough AZs from either side to survive a point failure, and you can recover the installation without vSphere HA enabled for the clusters.


## <a id="add-docs"></a>Additional Documentation

* <a href="vsphere_upgrade.html" class="subnav">How to Upgrade vSphere without PCF Downtime</a>
* <a href="vsphere_migrate_datastore.html" class="subnav">How to Migrate PCF to a New Datastore in vSphere</a>