Installation

This document provides instructions for users to configure a control plane machine set within an OpenShift cluster. This process can be followed on OpenShift clusters that are version 4.12 or higher.

A ControlPlaneMachineSet can be installed on supported platforms provided the cluster has existing, Running control plane machines. Typically, this is only the case when the cluster was created using installer-provisioned infrastructure.

Note: A Running control plane machine is a machine in the Running phase. Requiring at least one Running machine ensures that the machine's spec is valid and that the control plane machine set will be able to create new machines based on that template.

To determine which installation path to take for the cluster:

  1. Check supported platforms to understand the type of support for the cluster.
  2. Depending on the type of support, follow the corresponding steps below.

Pre-installed

For clusters born (installed) with a version/platform combination highlighted as Full in the supported platforms, the installer-provisioned infrastructure (IPI) workflow will create a control plane machine set and set it to Active.

No further action is required by the user in this case.

This can be checked by using the following command:

oc get controlplanemachineset.machine.openshift.io cluster --namespace openshift-machine-api
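
To confirm this, the .spec.state field can be read directly, assuming the resource exists:

oc get controlplanemachineset.machine.openshift.io cluster --namespace openshift-machine-api -o jsonpath='{.spec.state}{"\n"}'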

Installation into an existing cluster with a generated resource

In this configuration the control plane machine set may already exist in the cluster.

Its state can be checked by using the following command:

oc get controlplanemachineset.machine.openshift.io cluster --namespace openshift-machine-api

If Active, there is nothing to do, as the control plane machine set has already been activated by a cluster administrator and is operational.

If Inactive, the control plane machine set can be activated. Before doing so, the control plane machine set spec must be thoroughly reviewed to ensure that the generated spec aligns with the desired specification.

Consult the anatomy of a ControlPlaneMachineSet resource as a reference for understanding the fields and values within a ControlPlaneMachineSet resource.

The generated control plane machine set can be reviewed with the following command:

oc --namespace openshift-machine-api edit controlplanemachineset.machine.openshift.io cluster

If any of the fields do not match the expected values, they may be changed, provided that the edit is made in the same oc edit session in which the control plane machine set is activated.

Once the spec of the control plane machine set has been reviewed, activate the control plane machine set by setting the .spec.state field to Active.
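
Within that same oc edit session, the relevant part of the resource should end up looking something like the following (other fields omitted):

spec:
  state: Active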

Once activated, the ControlPlaneMachineSet operator should start the reconciliation of the resource.

Installation into an existing cluster with manual resource

In this configuration the control plane machine set does not exist in the cluster (unless a cluster administrator has already created one), but it can be created and activated manually.

This can be checked by using the following command:

oc get controlplanemachineset.machine.openshift.io cluster --namespace openshift-machine-api

To manually create a control plane machine set, define a ControlPlaneMachineSet resource as described in the anatomy of a ControlPlaneMachineSet resource below.
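
Once the resource has been written to a file, it can be created with a command along these lines (the file name here is only an example):

oc create -f controlplanemachineset.yaml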

Anatomy of a ControlPlaneMachineSet

The ControlPlaneMachineSet resource should look something like below:

apiVersion: machine.openshift.io/v1
kind: ControlPlaneMachineSet
metadata:
  name: cluster
  namespace: openshift-machine-api
spec:
  state: Active [1]
  replicas: 3 [2]
  strategy:
    type: RollingUpdate [3]
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-role: master
      machine.openshift.io/cluster-api-machine-type: master
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      failureDomains: [4]
        platform: <platform>
        <platform failure domains>
      metadata:
        labels:
          machine.openshift.io/cluster-api-machine-role: master
          machine.openshift.io/cluster-api-machine-type: master
          machine.openshift.io/cluster-api-cluster: <cluster-id> [5]
      spec:
        providerSpec:
          value:
            <platform provider spec> [6]
  1. The state defines whether the ControlPlaneMachineSet is Active or Inactive. When Inactive, the control plane machine set will not take any action on the state of the control plane machines within the cluster. The operator will monitor the state of the cluster and keep the ControlPlaneMachineSet resource up to date. When Active, the control plane machine set will reconcile the control plane machines and will update them as necessary. Once Active, a control plane machine set cannot be made Inactive again.
  2. Replicas is 3 in most cases. Support exceptions may allow this to be 5 replicas in certain circumstances. Horizontal scaling is not currently supported and so this field is currently immutable. This may change in a future release.
  3. The strategy defaults to RollingUpdate. OnDelete is also supported.
  4. ControlPlaneMachineSet spreads Machines across multiple failure domains where possible. Because the underlying primitive used to implement failure domains varies across platforms, you must specify the platform name and a platform-specific field. See configuring provider specific fields for how to configure a failure domain on each platform.
  5. The cluster ID is required here. You should be able to find this label on existing Machines in the cluster. Alternatively, it can be found on the infrastructure resource: oc get -o jsonpath='{.status.infrastructureName}{"\n"}' infrastructure cluster
  6. The provider spec must match that of the control plane machines created by the installer, except that any field which is set in the failure domains can be omitted. See the example after this list for one way to obtain it.
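
One way to obtain a suitable provider spec is to copy it from an existing control plane machine. For example (the machine name below is a placeholder), dump the full machine and copy across the .spec.providerSpec.value section:

oc --namespace openshift-machine-api get machine <control-plane-machine-name> -o yaml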

Configuring provider specific fields

The following instructions describe how the failure domains and providerSpec fields should be constructed depending on the platform of the cluster.

Configuring a control plane machine set on Amazon Web Services (AWS)

AWS supports both the availabilityZone and subnet in its failure domains. Gather the existing control plane machines and make a note of the values of both the availabilityZone and subnet. Aside from these fields, the remaining spec in the machines should be identical.
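
For example, the availability zone of each existing control plane machine can be listed with a command along these lines (this assumes the AWS provider spec layout; the subnet is visible under .spec.providerSpec.value.subnet in the full YAML):

oc --namespace openshift-machine-api get machines.machine.openshift.io \
  -l machine.openshift.io/cluster-api-machine-role=master \
  -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.providerSpec.value.placement.availabilityZone}{"\n"}{end}'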

Copy the value from one of the machines into the providerSpec.value ([6] in the example above). Remove the availabilityZone and subnet fields from the providerSpec.value once you have done that.

For each failure domain you have in the cluster (normally 3-6 on AWS), configure a failure domain like below:

- placement:
    availabilityZone: <zone>
  subnet:
    type: Filters
    filters:
    - name: tag:Name
      values:
      - <subnet>

The complete failureDomains ([4] in the example above) should look something like below:

failureDomains:
  platform: AWS
  aws:
  - placement:
      availabilityZone: <zone-1>
    subnet:
      type: Filters
      filters:
      - name: tag:Name
        values:
        - <zone-1-subnet>
  - placement:
      availabilityZone: <zone-2>
    subnet:
      type: Filters
      filters:
      - name: tag:Name
        values:
        - <zone-2-subnet>
  - placement:
      availabilityZone: <zone-3>
    subnet:
      type: Filters
      filters:
      - name: tag:Name
        values:
        - <zone-3-subnet>

Configuring a control plane machine set on Microsoft Azure

Azure supports both the zone and subnet in its failure domains. Gather the existing control plane machines and make a note of the values of both the zone and subnet. Aside from these fields, the remaining spec in the machines should be identical.
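
For example, each machine's zone and subnet can be listed with a command along these lines (this assumes the Azure provider spec layout):

oc --namespace openshift-machine-api get machines.machine.openshift.io \
  -l machine.openshift.io/cluster-api-machine-role=master \
  -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.providerSpec.value.zone}{" "}{.spec.providerSpec.value.subnet}{"\n"}{end}'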

Copy the value from one of the machines into the providerSpec.value ([6] in the example above). Remove the zone and subnet fields from the providerSpec.value once you have done that.

Note: On clusters created before OpenShift 4.15, the subnet field is the same for all control plane machines. In this case, it can be retained within the providerSpec.value and does not need to be configured within the failureDomains.

For each zone you have in the cluster (normally 3), configure a failure domain like below:

- zone: "<zone>"
  subnet: "<subnet>"

With these zones, the complete failureDomains ([4] in the example above) should look something like below:

failureDomains:
  platform: Azure
  azure:
  - zone: "1"
    subnet: "<cluster_id>-subnet-0"
  - zone: "2"
    subnet: "<cluster_id>-subnet-1"
  - zone: "3"
    subnet: "<cluster_id>-subnet-2"

Note: The internalLoadBalancer field might not be set on the Azure providerSpec of existing machines. This field is required for control plane machines, so ensure it is populated in the providerSpec of both the Machine and the ControlPlaneMachineSet resources.

Configuring a control plane machine set on Google Cloud Platform (GCP)

Currently the only field supported by the GCP failure domain is the zone. Gather the existing control plane machines and note the value of the zone of each. Aside from the zone field, the remaining spec in the machines should be identical.
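
For example, each machine's zone can be listed with a command along these lines (this assumes the GCP provider spec layout):

oc --namespace openshift-machine-api get machines.machine.openshift.io \
  -l machine.openshift.io/cluster-api-machine-role=master \
  -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.providerSpec.value.zone}{"\n"}{end}'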

Copy the value from one of the machines into the providerSpec.value ([6] in the example above). Remove the zone field from the providerSpec.value once you have done that.

For each zone you have in the cluster (normally 3), configure a failure domain like below:

- zone: "<zone>"

With these zones, the complete failureDomains ([4] in the example above) should look something like below:

failureDomains:
  platform: GCP
  gcp:
  - zone: us-central1-a
  - zone: us-central1-b
  - zone: us-central1-c

Note: The targetPools field might not be set on the GCP providerSpec of existing machines. This field is required for control plane machines, so ensure it is populated in the providerSpec of both the Machine and the ControlPlaneMachineSet resources.

Configuring a control plane machine set on OpenStack

The OpenStack failureDomain configuration supports three fields: availabilityZone (instance AZ), rootVolume.availabilityZone (root volume AZ) and rootVolume.volumeType. Gather the existing control plane machines and note the values of these properties for each machine if they differ from one another. Aside from these fields, the remaining spec in the machines should be identical.
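
For example, the instance and root volume availability zones of each machine can be listed with a command along these lines (this assumes the OpenStack provider spec layout):

oc --namespace openshift-machine-api get machines.machine.openshift.io \
  -l machine.openshift.io/cluster-api-machine-role=master \
  -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.providerSpec.value.availabilityZone}{" "}{.spec.providerSpec.value.rootVolume.availabilityZone}{"\n"}{end}'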

Copy the value from one of the machines into the providerSpec.value ([6] in the example above). Remove the AZ fields from the providerSpec.value once you have done that.

For each AZ you have in the cluster, configure a failure domain like below:

- availabilityZone: "<nova availability zone>"
  rootVolume:
    availabilityZone: "<cinder availability zone>"
    volumeType: "<cinder volume type>"

OpenStack failure domains may not be empty; however, each individual property is optional. With these zones, the complete failureDomains ([4] in the example above) should look something like below:

failureDomains:
  platform: OpenStack
  openstack:
  - availabilityZone: nova-az0
    rootVolume:
      availabilityZone: cinder-az0
  - availabilityZone: nova-az1
    rootVolume:
      availabilityZone: cinder-az1
  - availabilityZone: nova-az2
    rootVolume:
      availabilityZone: cinder-az2

Prior to 4.14, if the control plane machines were configured with availability zones (AZs), the installer (via Terraform) would create a single ServerGroup in OpenStack (the one initially created for master-0, ending with the name of the AZ), but would configure the Machine providerSpec with different ServerGroups, one per AZ. If you upgrade a cluster from an earlier release to 4.14, you will need to follow this solution.

Configuring a control plane machine set on vSphere

Currently the only field supported by the vSphere failure domain is the name. On vSphere, failure domains are defined in the infrastructure resource spec. A vSphere failure domain represents a combination of network, datastore, compute cluster, and datacenter, which allows an administrator to deploy machines into separate hardware configurations.

A vSphere failure domain will look something like the example below in the infrastructure resource:

  spec:
    cloudConfig:
      key: config
      name: cloud-provider-config
    platformSpec:
      type: VSphere
      vsphere:
        failureDomains:
        - name: us-east-1
          region: us-east
          server: vcs8e-vc.ocp2.dev.cluster.com
          topology:
            computeCluster: /IBMCloud/host/vcs-mdcnc-workload-1
            datacenter: IBMCloud
            datastore: /IBMCloud/datastore/mdcnc-ds-1
            networks:
            - ci-vlan-1289
            resourcePool: /IBMCloud/host/vcs-mdcnc-workload-1/Resources
          zone: us-east-1a
        - name: us-east-2
          region: us-east
          server: vcs8e-vc.ocp2.dev.cluster.com
          topology:
            computeCluster: /IBMCloud/host/vcs-mdcnc-workload-2
            datacenter: IBMCloud
            datastore: /IBMCloud/datastore/mdcnc-ds-2
            networks:
            - ci-vlan-1289
            resourcePool: /IBMCloud/host/vcs-mdcnc-workload-2/Resources
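
The failure domain names defined in the infrastructure resource can be listed with, for example:

oc get infrastructure cluster -o jsonpath='{.spec.platformSpec.vsphere.failureDomains[*].name}{"\n"}'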

The control plane machine set for vSphere refers to failure domains by their name as defined in the infrastructure spec. vSphere failure domains defined in the control plane machine set will look something like the example below:

  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      failureDomains:
        platform: VSphere
        vsphere:
        - name: us-east-1
        - name: us-east-2

Prior to 4.15, failure domains were not available for vSphere control plane machine sets. In 4.15, failure domain support for vSphere is available as Tech Preview.