openshift · rvanderp3 · Jan 17, 2023 · JoelSpeed · Jan 19, 2023 · rvanderp3
diff --git a/_topic_maps/_topic_map.yml b/_topic_maps/_topic_map.yml
@@ -543,6 +543,8 @@ Topics:
 - Name: Enabling cluster capabilities
   File: enabling-cluster-capabilities
   Distros: openshift-origin,openshift-enterprise
+- Name: Configuring a cluster on vSphere with multiple failure domains
+  File: vsphere-failure-domain-configuration
 - Name: Configuring additional devices in an IBM Z or LinuxONE environment
   File: ibmz-post-install
 - Name: Red Hat Enterprise Linux CoreOS image layering

diff --git a/modules/infrastructure-vsphere-failure-domains-yaml.adoc b/modules/infrastructure-vsphere-failure-domains-yaml.adoc
@@ -0,0 +1,99 @@
+[id="infrastructure-vsphere-failure-domains-yaml_{context}"]
+== Defining the Failure Domain Topology
+Topology-aware features of the cloud controller manager and the vSphere CSI driver require information about the vSphere topology where your {product-title} cluster is hosted. This information is defined in the `cluster` instance of the `infrastructures.config.openshift.io` CRD. If a cluster is installed with defined failure domains[ref to installation], this instance is preconfigured with the `failureDomains` field.
+
+=== Sample infrastructure customizations for a VMware vSphere cluster in multiple failure domains
+
+[source,yaml]
+----
+spec:
+  cloudConfig:
+    key: config
+    name: cloud-provider-config
+  platformSpec:
+    type: VSphere
+    vsphere:
+      vcenters: <1>
+        - datacenters: <2>
+            - region-a-dc
+            - region-b-dc
+          port: 443 <3>
+          server: your.vcenter.server <4>
+      failureDomains: <5>
+        - name: failure-domain-1 <6>
+          region: region-a <7>
+          zone: zone-a <8>
+          server: your.vcenter.server <9>
+          topology: <10>
+            datacenter: region-a-dc <11>
+            computeCluster: "/region-a-dc/host/zone-a-cluster" <12>
+            resourcePool: "/region-a-dc/host/zone-a-cluster/Resources/resource-pool" <13>
+            datastore: "/region-a-dc/datastore/datastore-a" <14>
+            networks: <15>
+            - port-group
+        - name: failure-domain-2
+          region: region-a
+          zone: zone-b
+          server: your.vcenter.server
+          topology:
+            computeCluster: /region-a-dc/host/zone-b-cluster
+            datacenter: region-a-dc
+            datastore: /region-a-dc/datastore/datastore-a
+            networks:
+            - port-group
+        - name: failure-domain-3
+          region: region-b
+          zone: zone-a
+          server: your.vcenter.server
+          topology:
+            computeCluster: /region-b-dc/host/zone-a-cluster
+            datacenter: region-b-dc
+            datastore: /region-b-dc/datastore/datastore-b
+            networks:
+            - port-group
+      nodeNetworking:
+        external: {}
+        internal: {}
+----
+
+<1> The list of vCenter servers associated with the {product-title} cluster. Only one vCenter may be defined.
+
+<2> The list of vCenter datacenters where VMs associated with the {product-title} cluster are created or already exist.
+
+<3> The TCP port of the vCenter server.
+
+<4> The FQDN of the vCenter server.
+
+<5> The list of failure domains.
+
+<6> The name of the failure domain. 
+
+<7> The value of the `openshift-region` tag assigned to the topology for the failure domain.
+
+<8> The value of the `openshift-zone` tag assigned to the topology for the failure domain.
+
+<9> The name of the vCenter server as defined by <4>.
+
+<10> The vCenter resources associated with the failure domain.
+
+<11> The datacenter associated with the failure domain. 
+
+<12> The full path of the compute cluster associated with the failure domain. 
+
+<13> Optional: The full path of the resource pool associated with the failure domain. 
+
+<14> The full path of the datastore associated with the failure domain. 
+
+<15> A list of port groups associated with the failure domain. Only one port group may be defined.
+
+
+You can edit the `Infrastructure` CRD instance containing the topology configuration by running the following command:
+
+[source,terminal]
+----
+$ oc edit infrastructures.config.openshift.io cluster
+----
+
+[IMPORTANT]
+====
+After a failure domain is created, it must not be deleted or modified. However, you can append new failure domains to the list of failure domains.
+====
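+
+As a minimal sketch of appending a failure domain, the following fragment adds a fourth entry after the three sample entries without changing them. The name `failure-domain-4`, the zone `zone-b`, and the compute cluster path `/region-b-dc/host/zone-b-cluster` are hypothetical values chosen for illustration. They are assumed to correspond to tagged resources that exist in your vCenter.
+
+[source,yaml]
+----
+spec:
+  platformSpec:
+    vsphere:
+      failureDomains:
+        # failure-domain-1 through failure-domain-3 remain exactly as defined
+        - name: failure-domain-4 # hypothetical name
+          region: region-b
+          zone: zone-b # hypothetical openshift-zone tag value
+          server: your.vcenter.server
+          topology:
+            datacenter: region-b-dc
+            computeCluster: /region-b-dc/host/zone-b-cluster # hypothetical path
+            datastore: /region-b-dc/datastore/datastore-b
+            networks:
+            - port-group
+----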
diff --git a/modules/installation-vsphere-zones-prerequisites.adoc b/modules/installation-vsphere-zones-prerequisites.adoc
@@ -0,0 +1,8 @@
+[id="installation-vsphere-zones-prerequisites_{context}"]
+= Prerequisites for multiple failure domains
+
+* All failure domains share a common Layer 3 network.
+* You created tag categories `openshift-region` and `openshift-zone` in vCenter.
+* Datacenters and compute clusters have tags representing the name of their associated region and/or zone. 
+
+For example, if `datacenter-1` represents `region-a` and `compute-cluster-1` represents `zone-1`, then a tag of category `openshift-region` with a value of `region-a` is applied to `datacenter-1`. Additionally, a tag of category `openshift-zone` with a value of `zone-1` is applied to `compute-cluster-1`.
diff --git a/post_installation_configuration/vsphere-failure-domain-configuration.adoc b/post_installation_configuration/vsphere-failure-domain-configuration.adoc
@@ -0,0 +1,47 @@
+:_content-type: ASSEMBLY
+:context: post-install-vsphere-failure-domain-configuration
+[id="post-install-vsphere-failure-domain-configuration"]
+= Configuring a cluster on vSphere with multiple failure domains
+include::_attributes/common-attributes.adoc[]
+
+toc::[]
+
+After deploying {product-title}, you can configure a cluster to use multiple failure domains. A failure domain describes a unique topology, which may consist of the following resources:
+
+* datacenter
+
+* compute cluster
+
+* datastore
+
+* portgroup
+
+* resource pool
+
+By defining multiple failure domains, administrators are able to distribute key control plane and workload elements among varied hardware resources in their datacenter. 
+
+include::modules/installation-vsphere-zones-prerequisites.adoc[leveloffset=+1]
+
+[IMPORTANT]
+====
+If tags are not applied prior to node migration or creation, nodes may not be labeled with the `topology.kubernetes.io/zone` and `topology.kubernetes.io/region` labels by the cloud provider.
+====
+
+[IMPORTANT]
+====
+The API and ingress VIPs require that failure domains share a common Layer 3 network. 
+====
+
+include::modules/infrastructure-vsphere-failure-domains-yaml.adoc[leveloffset=+1]
+
+== Node Placement
+
+After you have defined failure domains, nodes may be migrated or created in the required failure domains.
+
+=== Control Plane Nodes
+
+Control plane nodes may be migrated with compute vMotion to the desired failure domain. Nodes will be labeled with `topology.kubernetes.io/zone` and `topology.kubernetes.io/region` labels associated with their failure domains by the cloud provider.
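+
+For illustration, a control plane node running in `failure-domain-1` from the sample topology would be expected to carry labels like the following in its `Node` metadata. This is a sketch derived from the sample region and zone values, not output captured from a cluster.
+
+[source,yaml]
+----
+metadata:
+  labels:
+    topology.kubernetes.io/region: region-a # expected to match the openshift-region tag value
+    topology.kubernetes.io/zone: zone-a # expected to match the openshift-zone tag value
+----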
+
+=== Compute Nodes
+
+Preexisting compute nodes may be migrated in the same way as control plane nodes. However, it is recommended that you create new xref:../machine_management/creating_machinesets/creating-machineset-vsphere.html[machinesets] to provision compute nodes in the topology associated with each failure domain. For example, the failure domains defined in <<infrastructure-vsphere-failure-domains-yaml_{context},Defining the Failure Domain Topology>> require new `machinesets` corresponding to `failure-domain-1`, `failure-domain-2`, and `failure-domain-3`. After enough compute nodes are scaled up in the required failure domains, the preexisting compute nodes may be scaled down. Nodes provisioned by the new `machinesets` are labeled by the cloud provider with the `topology.kubernetes.io/zone` and `topology.kubernetes.io/region` labels associated with their failure domains.
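+
+The following is a minimal sketch of what a machine set targeting `failure-domain-1` might look like. The workspace and network values come from the sample topology. The machine set name, the CPU, memory, and disk sizes, and the `<infrastructure_id>`, `<vm_template_name>`, and `<vm_folder_path>` placeholders are assumptions for illustration. Verify all fields against the linked machine set documentation for your {product-title} version.
+
+[source,yaml]
+----
+apiVersion: machine.openshift.io/v1beta1
+kind: MachineSet
+metadata:
+  name: <infrastructure_id>-worker-failure-domain-1 # hypothetical name
+  namespace: openshift-machine-api
+  labels:
+    machine.openshift.io/cluster-api-cluster: <infrastructure_id>
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      machine.openshift.io/cluster-api-cluster: <infrastructure_id>
+      machine.openshift.io/cluster-api-machineset: <infrastructure_id>-worker-failure-domain-1
+  template:
+    metadata:
+      labels:
+        machine.openshift.io/cluster-api-cluster: <infrastructure_id>
+        machine.openshift.io/cluster-api-machine-role: worker
+        machine.openshift.io/cluster-api-machine-type: worker
+        machine.openshift.io/cluster-api-machineset: <infrastructure_id>-worker-failure-domain-1
+    spec:
+      providerSpec:
+        value:
+          apiVersion: machine.openshift.io/v1beta1
+          kind: VSphereMachineProviderSpec
+          credentialsSecret:
+            name: vsphere-cloud-credentials
+          userDataSecret:
+            name: worker-user-data
+          template: <vm_template_name>
+          numCPUs: 4 # illustrative sizing
+          memoryMiB: 16384 # illustrative sizing
+          diskGiB: 120 # illustrative sizing
+          network:
+            devices:
+            - networkName: port-group # port group of failure-domain-1
+          workspace: # values taken from the failure-domain-1 topology
+            server: your.vcenter.server
+            datacenter: region-a-dc
+            datastore: /region-a-dc/datastore/datastore-a
+            folder: <vm_folder_path>
+            resourcePool: /region-a-dc/host/zone-a-cluster/Resources/resource-pool
+----
+
+A similar machine set would be defined for `failure-domain-2` and `failure-domain-3`, changing the name and the `workspace` values to match each topology.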