feat: add GCP ccm
Update docs to reflect deploying the GCP Cloud Controller Manager (CCM)

Signed-off-by: Noel Georgi <git@frezbo.dev>
frezbo committed Nov 25, 2021
1 parent 7433150 commit d5cbc36
Showing 4 changed files with 452 additions and 16 deletions.
89 changes: 89 additions & 0 deletions website/content/docs/v0.14/Cloud Platforms/gcp.md
@@ -255,6 +255,8 @@ We need to download two deployment manifests for the deployment from the Talos g
```bash
curl -fsSLO "https://raw.githubusercontent.com/talos-systems/talos/master/website/content/docs/v0.14/Cloud%20Platforms/gcp/config.yaml"
curl -fsSLO "https://raw.githubusercontent.com/talos-systems/talos/master/website/content/docs/v0.14/Cloud%20Platforms/gcp/talos-ha.yaml"
# only needed when deploying the cloud controller manager (CCM)
curl -fsSLO "https://raw.githubusercontent.com/talos-systems/talos/master/website/content/docs/v0.14/Cloud%20Platforms/gcp/gcp-ccm.yaml"
```

### Updating the config
@@ -273,6 +275,7 @@ resources:
    properties:
      zone: us-west2-c
      talosVersion: v0.13.2
      externalCloudProvider: false
      controlPlaneNodeCount: 5
      controlPlaneNodeType: n1-standard-1
      workerNodeCount: 3
@@ -282,6 +285,16 @@ outputs:
    value: $(ref.talos-ha.bucketName)
```

#### Enabling external cloud provider

Note: The `externalCloudProvider` property is set to `false` by default; set it to `true` to enable the external cloud provider.
The [manifest](https://raw.githubusercontent.com/talos-systems/talos/master/website/content/docs/v0.14/Cloud%20Platforms/gcp/gcp-ccm.yaml#L256) used to deploy the cloud controller manager (CCM) currently uses the GCP CCM image provided by OpenShift, since there are no public images for the upstream [CCM](https://github.com/kubernetes/cloud-provider-gcp) yet.

> Since the routes controller is disabled while deploying the CCM, the CNI pods need to be restarted after the CCM deployment is complete to remove the `node.kubernetes.io/network-unavailable` taint.
See [Nodes network-unavailable taint not removed after installing ccm](https://github.com/kubernetes/cloud-provider-gcp/issues/291) for more information.

Use a custom-built image for the CCM deployment if required.
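
A minimal sketch of switching the property in the downloaded `config.yaml`, assuming the file still contains the default line `externalCloudProvider: false` and that setting it to `true` is what enables the external cloud provider. The helper name `enable_external_cloud_provider` is hypothetical and not part of the Talos tooling.

```bash
# Hypothetical helper: flips the default value in the given file.
# It writes through a temp file instead of `sed -i`, whose syntax differs
# between GNU and BSD sed.
enable_external_cloud_provider() {
  local file="$1"
  local tmp
  tmp="$(mktemp)" || return 1
  sed 's/externalCloudProvider: false/externalCloudProvider: true/' "${file}" > "${tmp}" \
    && cat "${tmp}" > "${file}"
  local status=$?
  rm -f "${tmp}"
  return "${status}"
}
```

It would be called as `enable_external_cloud_provider config.yaml` before creating the deployment.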

### Creating the deployment

Now we are ready to create the deployment.
@@ -303,15 +316,72 @@ First we need to get the deployment outputs.
OUTPUTS=$(gcloud deployment-manager deployments describe "${DEPLOYMENT_NAME}" --format json | jq '.outputs[]')

BUCKET_NAME=$(jq -r '. | select(.name == "bucketName").finalValue' <<< "${OUTPUTS}")
# only used when the cloud controller manager is enabled
SERVICE_ACCOUNT=$(jq -r '. | select(.name == "serviceAccount").finalValue' <<< "${OUTPUTS}")
PROJECT=$(jq -r '. | select(.name == "project").finalValue' <<< "${OUTPUTS}")
```
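
`jq` prints nothing when no output matches the `select` filter, so a missing deployment output leaves the corresponding variable empty instead of failing. The sketch below checks for that before continuing; the helper name `require_vars` is hypothetical, and it relies on bash indirect expansion.

```bash
# Hypothetical helper: returns non-zero if any named variable is unset or empty.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name:-} expands the variable whose name is stored in "name".
    if [ -z "${!name:-}" ]; then
      echo "missing deployment output: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

For example, `require_vars BUCKET_NAME SERVICE_ACCOUNT PROJECT || exit 1` stops a script before any later command runs with an empty value.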

Note: If the cloud controller manager is enabled, run the commands below to grant the controller's service account the roles it needs to access cloud resources.

```bash
gcloud projects add-iam-policy-binding \
  "${PROJECT}" \
  --member "serviceAccount:${SERVICE_ACCOUNT}" \
  --role roles/iam.serviceAccountUser

gcloud projects add-iam-policy-binding \
  "${PROJECT}" \
  --member "serviceAccount:${SERVICE_ACCOUNT}" \
  --role roles/compute.admin

gcloud projects add-iam-policy-binding \
  "${PROJECT}" \
  --member "serviceAccount:${SERVICE_ACCOUNT}" \
  --role roles/compute.loadBalancerAdmin
```
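
The three bindings differ only in the role, so they can also be written as a loop. This is a sketch rather than part of the commit: the helper `grant_ccm_roles`, the `CCM_ROLES` array, and the `GCLOUD` override are hypothetical names introduced for illustration.

```bash
# Roles taken from the three commands above.
CCM_ROLES=(
  roles/iam.serviceAccountUser
  roles/compute.admin
  roles/compute.loadBalancerAdmin
)

# Hypothetical helper. GCLOUD can be overridden (for example with `echo`)
# to preview the commands without changing any IAM policy.
grant_ccm_roles() {
  local project="$1" service_account="$2" role
  for role in "${CCM_ROLES[@]}"; do
    "${GCLOUD:-gcloud}" projects add-iam-policy-binding \
      "${project}" \
      --member "serviceAccount:${service_account}" \
      --role "${role}" || return 1
  done
}
```

Running `GCLOUD=echo grant_ccm_roles "${PROJECT}" "${SERVICE_ACCOUNT}"` prints the three commands as a dry run; dropping the `GCLOUD=echo` prefix applies them.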

### Downloading talos and kube config

In addition to the `talosconfig` and `kubeconfig` files, the storage bucket contains the `controlplane.yaml` and `worker.yaml` files used to join additional nodes to the cluster.

```bash
gsutil cp "gs://${BUCKET_NAME}/generated/talosconfig" .
gsutil cp "gs://${BUCKET_NAME}/generated/kubeconfig" .
```

### Deploying the cloud controller manager

```bash
kubectl \
  --kubeconfig kubeconfig \
  --namespace kube-system \
  apply \
  --filename gcp-ccm.yaml
# wait for the ccm to be up
kubectl \
  --kubeconfig kubeconfig \
  --namespace kube-system \
  rollout status \
  daemonset cloud-controller-manager
```

If the cloud controller manager is enabled, we need to restart the CNI pods to remove the `node.kubernetes.io/network-unavailable` taint.

```bash
# restart the CNI pods, in this case flannel
kubectl \
  --kubeconfig kubeconfig \
  --namespace kube-system \
  rollout restart \
  daemonset kube-flannel
# wait for the pods to be restarted
kubectl \
  --kubeconfig kubeconfig \
  --namespace kube-system \
  rollout status \
  daemonset kube-flannel
```
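
To confirm that the restart actually cleared the taint, the node taints can be listed and filtered. The sketch below assumes `kubectl`'s `custom-columns` output prints each node's taint keys as a comma-separated second column; the helper name `nodes_with_network_taint` is hypothetical.

```bash
TAINT_KEY="node.kubernetes.io/network-unavailable"

# Hypothetical helper: reads "NAME TAINTS" rows on stdin and prints the names
# of nodes whose comma-separated taint keys include TAINT_KEY.
nodes_with_network_taint() {
  awk -v key="${TAINT_KEY}" '{
    n = split($2, taints, ",")
    for (i = 1; i <= n; i++) {
      if (taints[i] == key) { print $1; break }
    }
  }'
}

# Usage (commented out so the block has no side effects when sourced):
# kubectl --kubeconfig kubeconfig get nodes --no-headers \
#   -o 'custom-columns=NAME:.metadata.name,TAINTS:.spec.taints[*].key' \
#   | nodes_with_network_taint
```

Empty output means no node still carries the taint.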

### Check cluster status

```bash
@@ -329,3 +399,22 @@ Warning: This will delete the deployment and all resources associated with it.
gsutil rm -r "gs://${BUCKET_NAME}"
gcloud deployment-manager deployments delete "${DEPLOYMENT_NAME}"
```

If the cloud controller manager is enabled, also run the commands below to remove the IAM policy bindings.

```bash
gcloud projects remove-iam-policy-binding \
  "${PROJECT}" \
  --member "serviceAccount:${SERVICE_ACCOUNT}" \
  --role roles/iam.serviceAccountUser

gcloud projects remove-iam-policy-binding \
  "${PROJECT}" \
  --member "serviceAccount:${SERVICE_ACCOUNT}" \
  --role roles/compute.admin

gcloud projects remove-iam-policy-binding \
  "${PROJECT}" \
  --member "serviceAccount:${SERVICE_ACCOUNT}" \
  --role roles/compute.loadBalancerAdmin
```
7 changes: 6 additions & 1 deletion website/content/docs/v0.14/Cloud Platforms/gcp/config.yaml
@@ -6,11 +6,16 @@ resources:
     type: talos-ha.jinja
     properties:
       zone: us-west2-c
-      talosVersion: v0.13.2
+      talosVersion: v0.13.3
+      externalCloudProvider: false
       controlPlaneNodeCount: 3
       controlPlaneNodeType: n1-standard-1
       workerNodeCount: 1
       workerNodeType: n1-standard-1
 outputs:
   - name: bucketName
     value: $(ref.talos-ha.bucketName)
+  - name: serviceAccount
+    value: $(ref.talos-ha.serviceAccount)
+  - name: project
+    value: $(ref.talos-ha.project)
