
docs: add operator manual for GCP #62

Merged: 1 commit merged into main from axel/gcp-operator-docs on Mar 1, 2021

Conversation

@Ajarmar (Contributor) commented Feb 17, 2021:

Added an operator manual for setting up Compliant Kubernetes on GCP with compliantkubernetes-kubespray and compliantkubernetes-apps, using the GCP Persistent Disk Driver for block storage.

Fixes elastisys/compliantkubernetes-kubespray#14 and #58

@cristiklein (Collaborator) left a comment:

Nice! Please:

  • fix the minor issues I suggested
  • double-check the formatting with `mkdocs serve`
  • fix the pipeline errors

4. Modify `kubespray/contrib/terraform/gcp/tfvars.json` in the following way:
- Set `gcp_project_id` to the ID of your GCP project.
- Set `keyfile_location` to the location of your JSON keyfile.
- Set `ssh_pub_key` to the path of your public ssh key.
Collaborator:

Please capitalize SSH throughout.

Collaborator:

Also, somewhat orthogonal to the present PR, please note that for Exoscale we opted to include the SSH key inline to facilitate operators sharing operation of a cluster. See full discussion here.

@Xartos (Contributor) commented Feb 17, 2021:

Since this is a variable name, I think we should stick to Terraform's naming convention.

EDIT: Never mind, I missed the word "ssh" at the end of the sentence.
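Purely as an illustration of the edits discussed above, one way to script them with `jq` (a sketch; the project ID and paths are placeholder values, not taken from the manual):

```bash
# Placeholder values; substitute your own project ID and key paths.
tfvars=kubespray/contrib/terraform/gcp/tfvars.json
jq '.gcp_project_id = "my-project-123"
  | .keyfile_location = "~/gcp/keyfile.json"
  | .ssh_pub_key = "~/.ssh/id_rsa.pub"' "$tfvars" > tfvars.tmp && mv tfvars.tmp "$tfvars"
```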

2. In `compliantkubernetes-apps`, run:
```bash
export CK8S_ENVIRONMENT_NAME=<environment-name>
export CK8S_CLOUD_PROVIDER=baremetal
```
Collaborator:

A few questions for myself:

  1. Do we use GCP load-balancers?
  2. Do we have an issue to iron this out?


Note that in release v0.9.0 of compliantkubernetes-apps, fluentd will not work in the service cluster.

1. Set up [ck8s-dns](https://github.com/elastisys/ck8s-dns) on a provider of your choice, using the `ingress_controller_lb_ip_address` from `terraform apply` as your load balancer IP.
Collaborator:

This repo is internal. Can you write what DNS entries to create, in the following format:

```bash
echo "
*.$BASE_DOMAIN     60s A 203.0.113.123
*.ops.$BASE_DOMAIN 60s A 203.0.113.123
"
```

@llarsson (Contributor) left a comment:

Very nice! Some comments/questions, and then I am really looking forward to learning what @ph4n666's run-through of this stuff shows. :)

docs/operator-manual/gcp.md (resolved)

The following instructions were made for release v0.9.0 of compliantkubernetes-apps. There may be discrepancies with newer versions.

Note that in release v0.9.0 of compliantkubernetes-apps, fluentd will not work in the service cluster.
Contributor:

Please add a line about why it will not work. What alternative should they go with instead?

Contributor:

Don't we need a for loop for every bash script mentioned in this commit?

docs/operator-manual/gcp.md (resolved)
@Xartos (Contributor) left a comment:

Great tutorial! Just some minor comments.

docs/operator-manual/gcp.md (outdated, resolved)
- Set `keyfile_location` to the location of your JSON keyfile.
- Set `ssh_pub_key` to the path of your public ssh key.
- In `ssh_whitelist`, `api_server_whitelist` and `nodeport_whitelist`, add the IP address(es) that should be able to access the cluster.
5. Set up the nodes by performing the following steps, replacing `<prefix>` with `sc`/`wc`:
Contributor:

I would go with something more generic, like:

Suggested change:
- 5. Set up the nodes by performing the following steps, replacing `<prefix>` with `sc`/`wc`:
+ 5. Set up the nodes by performing the following steps, replacing `<prefix>` with `my-sc-cluster`/`my-wc-cluster`:

to make it feel like something you want to change, not a requirement to use sc/wc.

@Ajarmar (Contributor, author):

The compliantkubernetes-kubespray readme still says:

"For now you need to set this to wc or sc if you want to install compliantkubernetes apps on top afterwards, this restriction will be removed later."

Is this no longer true? I'm also a bit unsure in general how much this documentation should accommodate multitenancy setups, since there are still some issues on the apps side, like the DNS problem.

Collaborator:

I think that is no longer true. Otherwise, how did I set up two WCs on AWS? 😄

Can you clarify "DNS problem"? I am aware of a few sharp corners, but not blockers.

@Ajarmar (Contributor, author):

I'm not sure about the specifics, but @lentzi90 and @pettersv ran into some DNS issues when setting up a multi-tenant (MT) cluster. Not really "blockers", because it was still possible to set it up, but there were some limitations as a consequence. You'd have to ask them for the details.

Contributor:

For compliantkubernetes-kubespray it will work fine with different names now. In apps it is a bit rough around the edges, since you will need to use each prefix as a separate `CK8S_CONFIG_PATH`, and the kubeconfigs for the workload clusters must all be named `kube_config_wc.yaml`. See elastisys/compliantkubernetes-apps#85.

The DNS issue is that the service cluster cannot measure uptime or alert correctly for WC API servers. It will think that they (actually "it", as it only knows about one) are down and send out alerts accordingly. This is because all clusters in one environment must have the same ops and base domains pointing to the SC for this to work. This means that the health check for the WC API server ends up targeting the SC instead of one of the WCs, and of course fails. So expect constant alerts. 😉 🚨

Collaborator:

@lentzi90 Do we have an issue for the uptime alert?

Regarding the former, this does the trick for me:

```bash
for CLUSTER in $WORKLOAD_CLUSTERS; do
    ln -sf $CK8S_CONFIG_PATH/.state/kube_config_${CLUSTER}.yaml $CK8S_CONFIG_PATH/.state/kube_config_wc.yaml
    ./bin/ck8s apply wc  # Respond "n" if you get a WARN
done
```
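(`ln -sf` replaces the symlink on each pass, so every iteration points `kube_config_wc.yaml` at the next workload cluster's kubeconfig and the loop can safely be re-run.)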

Contributor:

Created an issue now for the DNS/alerting: elastisys/compliantkubernetes-apps#253

I bet your snippet works great, Cristian. Could you add it somewhere so that it is impossible to miss when setting up a cluster?

Collaborator:

The program PC Pitstop included a clause in its end-user license agreement stating that anybody who read the clause and contacted the company would receive a monetary reward, but it took four months and over 3,000 software downloads before anybody collected it.

Should I include something similar around my snippet? 😂

Collaborator:

One more:

Please don't contact me before you've read all the relevant parts of this page. I will know if you haven't and I'll ignore your message.
[2 screen scrolls later]
Write me a letter that indicates that you've read this page by including the phrase “parens rock”.

1. Set up the nodes with terraform. If desired, first modify `"machines"` in `kubespray/contrib/terraform/gcp/tfvars.json` to add/remove nodes, change node sizes, etc. (For setting up compliantkubernetes-apps in the service cluster, one `n1-standard-8` worker and one `n1-standard-4` worker are enough.)
```bash
cd kubespray/contrib/terraform/gcp
export CLUSTER=<prefix>
```
Contributor:

Would we want to have the for loop here, as we have in the AWS and Exoscale tutorials? To make them more in line with each other?

Collaborator:

Yes, please; we support multiple workload clusters. 😄

5. Set up the nodes by performing the following steps, replacing `<prefix>` with `sc`/`wc`:
1. Set up the nodes with terraform. If desired, first modify `"machines"` in `kubespray/contrib/terraform/gcp/tfvars.json` to add/remove nodes, change node sizes, etc. (For setting up compliantkubernetes-apps in the service cluster, one `n1-standard-8` worker and one `n1-standard-4` worker are enough.)
```bash
cd kubespray/contrib/terraform/gcp
```
Contributor:

Since you use this for other snippets as well: should you use pushd/popd, so that you don't need to run `cd ../../../../` between each snippet?

@Ajarmar (Contributor, author):

Yeah, that's a bit nicer if you want the commands to just be copy-pastable. My idea with writing it the way I did was just to make it clear which folder the commands should be executed in, not necessarily that the user should `cd ../../../../` after each snippet.

Contributor:

Yeah, I totally understand. That's why I like pushd/popd: it's clear where the commands are running AND you can still copy-paste them and they will automagically work. Best of both worlds 😉
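To illustrate the pattern under discussion, a minimal sketch (the `terraform` commands are placeholders for whatever the manual actually runs at this step):

```bash
pushd kubespray/contrib/terraform/gcp   # remember the current directory and enter the module
export CLUSTER=<prefix>
terraform init                          # placeholder commands; see the manual for the real steps
terraform apply -var-file=tfvars.json
popd                                    # return to the directory you started from
```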

```
* `path to ssh key` should point to your private ssh key. It will be copied into your config path and encrypted with SOPS; the original file is left as it was.
* `SOPS fingerprint` is the gpg fingerprint that will be used for SOPS encryption. You need to set this or the environment variable `CK8S_PGP_FP` the first time SOPS is used in your specified config path.
4. Edit the IP addresses and nodes in your `inventory.ini` (found in your config path) to match the VMs that should be part of the cluster. The contents of the `$CLUSTER-inventory.ini` file that you generated in the previous section can be copy-pasted into the appropriate `inventory.ini` file.
Contributor:

You should be able to just `mv $CLUSTER-inventory.ini inventory.ini`, right? Or is something missing?

@Ajarmar (Contributor, author):

Yeah, that should work fine.

docs/operator-manual/gcp.md (resolved)
Comment on lines +90 to +142:

```bash
bin/ck8s ops kubectl sc "patch storageclass csi-gce-pd -p '{\"metadata\": {\"annotations\":{\"storageclass.kubernetes.io/is-default-class\":\"true\"}}}'"
bin/ck8s ops kubectl wc "patch storageclass csi-gce-pd -p '{\"metadata\": {\"annotations\":{\"storageclass.kubernetes.io/is-default-class\":\"true\"}}}'"
```
Contributor:

I'm guessing this isn't possible to do in kubespray?

@Ajarmar (Contributor, author):

I'm not sure about this, haven't really looked into it.
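For reference, the wrapped command in the hunk above corresponds to a plain kubectl patch that marks `csi-gce-pd` as the default StorageClass. A minimal sketch, assuming direct kubeconfig access to the cluster:

```bash
# Annotate the StorageClass so Kubernetes treats it as the default.
kubectl patch storageclass csi-gce-pd -p \
  '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
```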


@cristiklein (Collaborator) left a comment:

LGTM.

docs/operator-manual/gcp.md (outdated, resolved)
@Ajarmar merged commit c0b0ad3 into main on Mar 1, 2021.
@cristiklein deleted the axel/gcp-operator-docs branch on November 19, 2021.
Linked issue: Demonstrate: setting up a GCP cluster using google csi