SPLAT-898: day 2 configuration of multiple failure domains #54788
Closed
@@ -0,0 +1,99 @@
[id="infrastructure-vsphere-failure-domains-yaml_{context}"]
= Defining the Failure Domain Topology

The topology-aware features of the cloud controller manager and the vSphere CSI driver require information about the vSphere topology that hosts your {product-title} cluster. This information is defined in the `Infrastructure` resource named `cluster`, an instance of the `infrastructures.config.openshift.io` custom resource definition (CRD). If a cluster is installed with defined failure domains[ref to installation], this resource is preconfigured with the `failureDomains` that were defined at installation time.

== Sample infrastructure customizations for a VMware vSphere cluster in multiple failure domains

[source,yaml]
----
spec:
  cloudConfig:
    key: config
    name: cloud-provider-config
  platformSpec:
    type: VSphere
    vsphere:
      vcenters: <1>
      - datacenters: <2>
        - region-a-dc
        - region-b-dc
        port: 443 <3>
        server: your.vcenter.server <4>
      failureDomains: <5>
      - name: failure-domain-1 <6>
        region: region-a <7>
        zone: zone-a <8>
        server: your.vcenter.server <9>
        topology: <10>
          datacenter: region-a-dc <11>
          computeCluster: "/region-a-dc/host/zone-a-cluster" <12>
          resourcePool: "/region-a-dc/host/zone-a-cluster/Resources/resource-pool" <13>
          datastore: "/region-a-dc/datastore/datastore-a" <14>
          networks: <15>
          - port-group
      - name: failure-domain-2
        region: region-a
        zone: zone-b
        server: your.vcenter.server
        topology:
          computeCluster: /region-a-dc/host/zone-b-cluster
          datacenter: region-a-dc
          datastore: /region-a-dc/datastore/datastore-a
          networks:
          - port-group
      - name: failure-domain-3
        region: region-b
        zone: zone-a
        server: your.vcenter.server
        topology:
          computeCluster: /region-b-dc/host/zone-a-cluster
          datacenter: region-b-dc
          datastore: /region-b-dc/datastore/datastore-b
          networks:
          - port-group
      nodeNetworking:
        external: {}
        internal: {}
----

<1> The list of vCenter servers associated with the {product-title} cluster. Only one vCenter server can be defined.
<2> The list of vCenter datacenters where VMs associated with the {product-title} cluster are created or already exist.
<3> The TCP port of the vCenter server.
<4> The fully qualified domain name (FQDN) of the vCenter server.
<5> The list of failure domains.
<6> The name of the failure domain.
<7> The name of the tag in the `openshift-region` category that is assigned to the topology of the failure domain.
<8> The name of the tag in the `openshift-zone` category that is assigned to the topology of the failure domain.
<9> The vCenter server for the failure domain. This value must match the `server` value defined in the `vcenters` list.
<10> The vCenter resources associated with the failure domain.
<11> The datacenter associated with the failure domain.
<12> The full path of the compute cluster associated with the failure domain.
<13> Optional: The full path of the resource pool associated with the failure domain.
<14> The full path of the datastore associated with the failure domain.
<15> The list of port groups associated with the failure domain. Only one port group can be defined.

You can edit the `Infrastructure` resource that contains the topology configuration by running the following command:

[source,terminal]
----
$ oc edit infrastructures.config.openshift.io cluster
----

[IMPORTANT]
====
After a failure domain is created, you must not delete or modify it. However, you can append new failure domains to the list of failure domains.
====
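As an illustration of appending rather than modifying, the following fragment adds a hypothetical `failure-domain-4` to the sample topology. It assumes that a compute cluster named `zone-b-cluster` exists in `region-b-dc` and carries a tag named `zone-b` in the `openshift-zone` category; these names are not part of the original sample. The three existing entries are left exactly as they are, and the new entry is added at the end of the list.

[source,yaml]
----
spec:
  platformSpec:
    vsphere:
      failureDomains:
      # ...failure-domain-1, failure-domain-2, and failure-domain-3 stay unchanged...
      - name: failure-domain-4 # hypothetical new failure domain, appended last
        region: region-b
        zone: zone-b
        server: your.vcenter.server # must match the server in the vcenters list
        topology:
          datacenter: region-b-dc
          computeCluster: /region-b-dc/host/zone-b-cluster # assumed to exist and be tagged
          datastore: /region-b-dc/datastore/datastore-b
          networks:
          - port-group
----

Because `region-b-dc` already appears in the `datacenters` list of the `vcenters` entry, this sketch leaves that list alone; a failure domain in a datacenter that is not yet listed would presumably also require adding that datacenter.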
@@ -0,0 +1,8 @@
[id="installation-vsphere-zones-prerequisites_{context}"]
= Prerequisites for multiple failure domains

* All failure domains share a common Layer 3 network.
* You created the `openshift-region` and `openshift-zone` tag categories in vCenter.
* Datacenters and compute clusters have tags that represent the names of their associated region, zone, or both.

For example, if `datacenter-1` represents `region-a` and `compute-cluster-1` represents `zone-1`, then a tag named `region-a` in the `openshift-region` category is applied to `datacenter-1`. Additionally, a tag named `zone-1` in the `openshift-zone` category is applied to `compute-cluster-1`.
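To show how the tags in this example relate to the `Infrastructure` configuration, a failure domain built on these objects might be described as follows. This is a sketch: the failure domain name, vCenter server, datastore, and port group values are hypothetical placeholders, and the `region` and `zone` fields are assumed to carry the same names as the tags applied in vCenter.

[source,yaml]
----
failureDomains:
- name: <failure_domain_name> # hypothetical name
  region: region-a # tag in the openshift-region category on datacenter-1
  zone: zone-1 # tag in the openshift-zone category on compute-cluster-1
  server: your.vcenter.server # hypothetical vCenter FQDN
  topology:
    datacenter: datacenter-1
    computeCluster: /datacenter-1/host/compute-cluster-1
    datastore: /datacenter-1/datastore/<datastore_name> # hypothetical
    networks:
    - <port_group_name> # hypothetical
----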
47 changes: 47 additions & 0 deletions
post_installation_configuration/vsphere-failure-domain-configuration.adoc
@@ -0,0 +1,47 @@
:_content-type: ASSEMBLY
:context: post-install-vsphere-failure-domain-configuration
[id="post-install-vsphere-failure-domain-configuration"]
= Configuring a cluster on vSphere with multiple failure domains
include::_attributes/common-attributes.adoc[]

toc::[]

After deploying {product-title}, you can configure the cluster to use multiple failure domains. A failure domain describes a unique topology that can consist of the following vCenter objects:

* datacenter
* compute cluster
* datastore
* port group
* resource pool

By defining multiple failure domains, administrators can distribute key control plane and workload elements across varied hardware resources in their datacenter.

include::modules/installation-vsphere-zones-prerequisites.adoc[leveloffset=+1]

[IMPORTANT]
====
If you do not apply tags before you migrate or create nodes, the cloud provider might not label the nodes with the `topology.kubernetes.io/zone` and `topology.kubernetes.io/region` labels.
====

[IMPORTANT]
====
The API and ingress VIPs require that failure domains share a common Layer 3 network.
====

include::modules/infrastructure-vsphere-failure-domains-yaml.adoc[leveloffset=+1]

== Node placement

After you define failure domains, you can migrate existing nodes to, or create new nodes in, the required failure domains.

=== Control plane nodes

You can migrate control plane nodes to the desired failure domain by using compute vMotion. The cloud provider labels the nodes with the `topology.kubernetes.io/zone` and `topology.kubernetes.io/region` labels associated with their failure domains.
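For example, if a control plane node is migrated into the `zone-a-cluster` compute cluster of `failure-domain-1` in the sample topology, its labels would be expected to include the values below. This is an illustrative excerpt of the node object (for instance, as returned by `oc get node <node_name> -o yaml`), assuming the region and zone tags from the prerequisites are applied; all other fields and labels are omitted.

[source,yaml]
----
# Illustrative excerpt only: other metadata and labels are omitted.
metadata:
  labels:
    topology.kubernetes.io/region: region-a # openshift-region tag of failure-domain-1
    topology.kubernetes.io/zone: zone-a # openshift-zone tag of failure-domain-1
----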

=== Compute nodes

You can migrate preexisting compute nodes in the same way as control plane nodes. However, it is recommended that you create new xref:../machine_management/creating_machinesets/creating-machineset-vsphere.adoc[machine sets] to provision compute nodes in the topology associated with each failure domain. For example, the failure domains defined in <<infrastructure-vsphere-failure-domains-yaml_{context},Defining the Failure Domain Topology>> require new machine sets corresponding to `failure-domain-1`, `failure-domain-2`, and `failure-domain-3`. After enough compute nodes are running in the required failure domains, you can scale down the preexisting compute nodes. The cloud provider labels nodes provisioned by the new machine sets with the `topology.kubernetes.io/zone` and `topology.kubernetes.io/region` labels associated with their failure domains.
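A trimmed sketch of a machine set for `failure-domain-1` follows. It assumes the `machine.openshift.io/v1beta1` `MachineSet` API and its vSphere provider spec; the machine set name, the `<infrastructure_id>` placeholder, and the VM folder path are hypothetical. Only the `workspace` and `network` fields that map to the failure domain topology are shown; the remaining fields are assumed to be copied from an existing compute machine set.

[source,yaml]
----
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: <infrastructure_id>-failure-domain-1 # hypothetical name
  namespace: openshift-machine-api
spec:
  # ...replicas, selector, and template metadata copied from an existing machine set...
  template:
    spec:
      providerSpec:
        value:
          # ...template, numCPUs, memoryMiB, diskGiB, and other fields unchanged...
          network:
            devices:
            - networkName: port-group # port group of failure-domain-1
          workspace:
            server: your.vcenter.server
            datacenter: region-a-dc
            datastore: /region-a-dc/datastore/datastore-a
            folder: /region-a-dc/vm/<infrastructure_id> # assumed VM folder path
            resourcePool: /region-a-dc/host/zone-a-cluster/Resources/resource-pool
----

Repeating this with the topology of `failure-domain-2` and `failure-domain-3` would give one machine set per failure domain, each of which can then be scaled independently.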
Do we plan to enumerate the things you must do before doing this? Would it help the end user to understand if we had a list of, here's what would break/here's what you'd need to do if you really wanted to remove one?
Statements like this always seem weird to me when they don't have justifications for why
Thanks Joel. @gnufied would you mind providing a quick summary of the 'why' and I can incorporate it into the PR?
Sure, but some follow up questions. Is this storage only item? Say for example - if adding a failure-domain results in creation of VMs in that failure-domain , will removal of failure-domain result in deletion of VMs in that failure-domain? (i.e will get implemented?).