Ready to deploy rick cluster (#13)
* Remove internal_ip not needed anymore

* Remove hetzner software raid after centos installation too

* Remove internal_hostname use inventory_hostname instead

* Setup dns & lb first before firewall

because configure-hrobot-firewall.yaml needs the IP of the LB VM

* Update toolbox, add missing packages

* Update hosts.yaml.example

* Update haproxy config, add only masters & bootstrap

* Fixed issue #11 - setup raid on masters

* Clean ip firewall configuration, because of #8

* Wipe RHCOS Raid too part of #11

* Improve reboot

* Provide playbook to disable hetzner firewall

* Fix ignition creation to add worker node afterwards

* Update README, add how to add worker

* Add post installation step

* Add missing newline

* fix: Make pre-commit happy

Signed-off-by: Tomas Coufal <tcoufal@redhat.com>

Co-authored-by: Tomas Coufal <tcoufal@redhat.com>
rbo and tumido committed Jun 24, 2021
1 parent acf8d3b commit fe7beed
Showing 13 changed files with 287 additions and 76 deletions.
14 changes: 12 additions & 2 deletions Containerfile
@@ -9,7 +9,17 @@


FROM registry.fedoraproject.org/fedora-toolbox

ARG OPENSHIFT_VERSION=4.7.12

RUN \
dnf install -y ansible pip python3-google-auth.noarch && \
dnf install -y ansible pip python3-google-auth.noarch vim && \
pip install hcloud && \
ansible-galaxy collection install -p /usr/share/ansible/collections community.hrobot
ansible-galaxy collection install -p /usr/share/ansible/collections community.hrobot && \
ansible-galaxy collection install -p /usr/share/ansible/collections hetzner.hcloud

RUN echo "===== Install $OPENSHIFT_VERSION openshift-install =====" \
&& curl -# -L -o /tmp/openshift-install-linux.tar.gz https://mirror.openshift.com/pub/openshift-v4/clients/ocp/$OPENSHIFT_VERSION/openshift-install-linux.tar.gz \
&& tar xzvf /tmp/openshift-install-linux.tar.gz -C /usr/local/bin/ openshift-install \
&& chmod +x /usr/local/bin/openshift-install \
&& rm /tmp/openshift-install-linux.tar.gz
170 changes: 148 additions & 22 deletions README.md
@@ -2,17 +2,54 @@

## Network Overview

![Network overview](docs/nimages/etwork-overview-v3.png)
![Network overview](docs/images/network-overview-v3.png)


## Cluster design

* 3 Master
* 3 Worker

One of the workers is used as the bootstrap node during installation.

## Installation

You can run all playbooks inside a toolbox [![Docker Repository on Quay](https://quay.io/repository/operate-first/hetzner-baremetal-toolbox/status "Docker Repository on Quay")](https://quay.io/repository/operate-first/hetzner-baremetal-toolbox) :

```bash
toolbox create --image quay.io/operate-first/hetzner-baremetal-toolbox hetzner-toolbox
toolbox enter hetzner-toolbox
```

Source of the toolbox [Containerfile](/Containerfile)


### Create initial hosts.yaml based on hosts.yaml.example

```bash
cp -v hosts.yaml.example hosts.yaml
$EDITOR hosts.yaml
```

### DNS & load balancer preparations

* Configure load balancer:
* Public for api & ingress
* Private for api-int

* Configure DNS entries for
* `api.<cluster_name>.emea.operate-first.cloud`
* `api-int.<cluster_name>.emea.operate-first.cloud`
* `*.apps.<cluster_name>.emea.operate-first.cloud`

All steps are done with one single playbook:

**Important: this step is not idempotent** (Issue #5)

```bash
./configure-lb-and-dns.yaml
```
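
The DNS entries listed above follow one naming pattern, so they can be generated from the cluster name before or after running the playbook. The following is only a sketch: `cluster_dns_names` is a hypothetical helper, not part of this repository, `rick` is the cluster name taken from the commit title, and `test` is an arbitrary label used to probe the wildcard record.

```bash
# Hypothetical helper: print the DNS names expected for a cluster.
# "$1" is the cluster name; "test" stands in for any app behind *.apps.
cluster_dns_names() {
  base="$1.emea.operate-first.cloud"
  printf '%s\n' "api.${base}" "api-int.${base}" "test.apps.${base}"
}

cluster_dns_names rick
```

Each printed name could then be checked by hand, for example with `dig +short <name>`. Since `api-int` is served by the private load balancer, it may only be reachable from inside the cluster network.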

### Hardware preparations

* Order a BareMetal Server - Issue for sizing: #6
@@ -27,13 +64,15 @@ $EDITOR hosts.yaml
* Run a RH CoreOS Test installation with ssh-only ignition
```bash
./reset-server.yaml [-l hostname]
# SSH into the rescue system and run the coreos-install command printed at the end of the playbook.
# SSH into the rescue system and run the coreos-install
# command printed at the end of the playbook.


```

Check the installation: does the server boot? Can you connect via SSH?

* Boot Rescue mode - should fail! :-)
* Boot Rescue mode

```bash
./force-rescue-mode.yaml [-l hostname]
@@ -50,25 +89,6 @@ $EDITOR hosts.yaml
./configure-hrobot-firewall.yaml [-l hostname]
```

### DNS & load balancer preparations

* Configure load balancer:
* Public for api & ingress
* Private for api-int

* Configure DNS entries for
* `api.<cluster_name>.emea.operate-first.cloud`
* `api-int.<cluster_name>.emea.operate-first.cloud`
* `*.apps.<cluster_name>.emea.operate-first.cloud`

All steps are done with one single playbook:

**Important: this step is not idempotent** (Issue #5)

```bash
./configure-lb-and-dns.yaml
```

### OpenShift installation

Prerequisites:
@@ -84,6 +104,15 @@ Boots into rescue mode and prepare rescue system to install Red Hat CoreOS
./reset-server.yaml
```

#### Wipe server

To ensure nothing is left on the disk, wipe it:

```bash
./wipe-server.yaml
```


#### Create ignition config and transfer to hosts

```bash
@@ -105,6 +134,103 @@ Optional split it into two steps:
./run-installer.yaml --tags reboot
```

#### During installation watch for CSR

Approve pending CSRs from your worker nodes:

```bash
oc get csr | awk '/Pending/ { print $1}' | xargs -n1 oc adm certificate approve
```
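
The `awk` filter in the one-liner above can be tried offline before pointing it at a live cluster. This is a minimal sketch: `pending_csrs` is a hypothetical wrapper around the same filter, and the sample rows are made up for illustration rather than captured from a real cluster.

```bash
# Hypothetical wrapper: read `oc get csr`-style output on stdin and
# print the name (first column) of every row that mentions "Pending".
pending_csrs() {
  awk '/Pending/ { print $1 }'
}

# Made-up sample output, for illustration only.
sample='NAME        AGE   SIGNERNAME                                    REQUESTOR                 CONDITION
csr-8b2pq   5m    kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:x   Pending
csr-d4xk7   9m    kubernetes.io/kubelet-serving                 system:node:host05        Approved,Issued'

printf '%s\n' "$sample" | pending_csrs
```

Against a cluster this would be used as `oc get csr | pending_csrs | xargs -n1 oc adm certificate approve`. A new node typically submits a second (serving) CSR once its first one is approved, so the command may need to be run more than once.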

### Add bootstrap node as worker

#### Adjust haproxy config

```bash
ssh -l root -i <private-key> <private lb vm>
vi /etc/haproxy/haproxy.cfg
systemctl reload haproxy
```

<details>
<summary>Check HAProxy stats</summary>

```bash
echo "show stat" | nc -U /var/lib/haproxy/stats | cut -d "," -f 1,2,18,57| column -s, -t;
# pxname svname status last_chk
machine-config-server FRONTEND OPEN
machine-config-server host01.example.com UP
machine-config-server host02.example.com UP
machine-config-server host04.example.com UP
machine-config-server BACKEND UP
api FRONTEND OPEN
api host01.example.com UP
api host02.example.com UP
api host04.example.com UP
api BACKEND UP
```
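
To spot only the entries that need attention, the raw CSV can be filtered on its status column (field 18, the same field selected by `cut` above). This is a sketch under assumptions: `unhealthy_backends` and `row` are hypothetical helpers, the sample rows are synthetic, and any status other than `UP` or `OPEN` is treated as unhealthy, which may be too strict for states such as `no check`.

```bash
# Hypothetical filter: print "proxy server status" for each CSV row whose
# status (field 18) is neither UP nor OPEN. Header lines start with "#".
unhealthy_backends() {
  awk -F, '!/^#/ && NF >= 18 && $18 != "UP" && $18 != "OPEN" { print $1, $2, $18 }'
}

# Build a synthetic CSV row: proxy, server, 15 empty fields, then status.
row() {
  awk -v p="$1" -v s="$2" -v st="$3" 'BEGIN {
    printf "%s,%s", p, s
    for (i = 3; i <= 17; i++) printf ","
    printf ",%s\n", st
  }'
}

{
  row api FRONTEND OPEN
  row api host01.example.com UP
  row api host04.example.com DOWN
  row api BACKEND UP
} | unhealthy_backends
```

On the load balancer VM the filter would be fed from the stats socket: `echo "show stat" | nc -U /var/lib/haproxy/stats | unhealthy_backends`.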

</details>

#### Adjust `hosts.yaml`

Move the bootstrap node from the `bootstrap` host group to the `workers` host group.

#### Boot rescue mode

```bash
./force-rescue-mode.yaml -l <bootstrap-node>
```

#### Prepare installation
```bash
./reset-server.yaml -l <bootstrap-node>
```

#### Prepare ignition config
```bash
./create-ignition.yaml -l <bootstrap-node>
```

#### Wipe server

```bash
./wipe-server.yaml -l <bootstrap-node>
```

#### Run the installer


```bash
./run-installer.yaml -l <bootstrap-node>
```

Optionally, split it into two steps:

```bash
./run-installer.yaml --skip-tags reboot -l <bootstrap-node>
# Check output
./run-installer.yaml --tags reboot -l <bootstrap-node>
```

#### Watch for pending CSRs

Approve pending CSRs from your worker node.

```bash
oc get csr | awk '/Pending/ { print $1}' | xargs -n1 oc adm certificate approve
```

### Post installation

#### Remove worker label from master

```
oc edit scheduler
```
Change `mastersSchedulable: true` to `mastersSchedulable: false`
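
For scripted setups, the same change could be applied without opening an editor by sending a merge patch. This is a sketch, not something this repository ships: `scheduler_patch` is a hypothetical helper that only builds the patch body, and it assumes the scheduler configuration is the `cluster` object of `schedulers.config.openshift.io`.

```bash
# Hypothetical helper: print a JSON merge patch that sets
# spec.mastersSchedulable to the given value (true or false).
scheduler_patch() {
  case "$1" in
    true|false) printf '{"spec":{"mastersSchedulable":%s}}\n' "$1" ;;
    *) echo "usage: scheduler_patch true|false" >&2; return 1 ;;
  esac
}

scheduler_patch false
```

The patch would then be applied with `oc patch schedulers.config.openshift.io cluster --type merge -p "$(scheduler_patch false)"`.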


### OpenShift reinstallation

It's recommended to wipe the disk with the `wipe-server.yaml` playbook before reinstallation.
14 changes: 14 additions & 0 deletions clear-hrobot-firewall.yaml
@@ -0,0 +1,14 @@
#!/usr/bin/env ansible-playbook
---
- hosts: nodes,bootstrap
  gather_facts: false
  connection: local
  tasks:
    - community.hrobot.firewall:
        hetzner_user: "{{ hetzner_webservice_username }}"
        hetzner_password: "{{ hetzner_webservice_password }}"
        server_ip: "{{ hetzner_ip }}"
        state: absent
        whitelist_hos: yes
        rules:
          input:
4 changes: 2 additions & 2 deletions configure-lb-and-dns.yaml
@@ -67,7 +67,7 @@
bind :22623
mode tcp
{% for host in groups['nodes'] -%}
{% for host in groups['masters'] -%}
server {{ host }} {{ hostvars[host].hetzner_ip }}:22623 check
{% endfor -%}
@@ -78,7 +78,7 @@
bind :6443
mode tcp
{% for host in groups['nodes'] -%}
{% for host in groups['masters'] -%}
server {{ host }} {{ hostvars[host].hetzner_ip }}:6443 check
{% endfor -%}
26 changes: 26 additions & 0 deletions docs/troubleshooting.md
@@ -34,3 +34,29 @@ Partition table scan:
```
dd if=/dev/zero of=/dev/sdc bs=512 count=1
```

## Delete software raid

If `./wipe-server.yaml` fails with `Device or resource busy`, check for an active software RAID:

```bash
root@rescue ~ # cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb3[0] sda3[1]
463998784 blocks super 1.2 [2/2] [UU]
[===>.................] resync = 15.7% (73277376/463998784) finish=32.1min speed=202359K/sec
bitmap: 4/4 pages [16KB], 65536KB chunk

md1 : active raid1 sdb2[0] sda2[1]
523264 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sdb1[0] sda1[1]
4189184 blocks super 1.2 [2/2] [UU]

unused devices: <none>

root@rescue ~ # mdadm --stop md0 md1 md2
mdadm: stopped md0
mdadm: stopped md1
mdadm: stopped md2
```
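
The array names passed to `mdadm --stop` above can be read from `/proc/mdstat` instead of being typed by hand. This is a minimal sketch: `active_md_arrays` is a hypothetical helper, and the sample input is a shortened copy of the output shown above.

```bash
# Hypothetical helper: read /proc/mdstat-style text on stdin and print
# the name of every md array (lines of the form "mdN : ...").
active_md_arrays() {
  awk '$1 ~ /^md[0-9]+$/ && $2 == ":" { print $1 }'
}

# Shortened sample based on the output above.
sample='Personalities : [raid1]
md2 : active raid1 sdb3[0] sda3[1]
      463998784 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sdb2[0] sda2[1]
      523264 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sdb1[0] sda1[1]
      4189184 blocks super 1.2 [2/2] [UU]

unused devices: <none>'

printf '%s\n' "$sample" | active_md_arrays
```

In the rescue system this could be combined as `active_md_arrays < /proc/mdstat | xargs mdadm --stop`, after checking that the listed arrays really are the ones to stop.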
27 changes: 19 additions & 8 deletions hosts.yaml.example
@@ -27,6 +27,25 @@ all:
nodes:
children:
masters:
workers:
bootstrap:

workers:
vars:
install_device: /dev/sda
network_dns: "213.133.98.98;213.133.99.99;213.133.100.100"

network_primary_interface: enp4s0
network_mask: 26
ignition_name: worker.ign
hosts:
host05.example.com:
hetzner_ip: 138.201.33.199
network_gateway: 138.201.33.193

host06.example.com:
hetzner_ip: 138.201.33.87
network_gateway: 138.201.33.65

masters:
vars:
@@ -39,19 +58,13 @@ masters:

hosts:
host01.example.com:
internal_hostname: host01.example.com
internal_ip: 172.22.2.3
hetzner_ip: 1.1.1.35
network_gateway: 1.1.1.1
host02.example.com:
internal_hostname: host02.example.com
internal_ip: 172.22.2.4
hetzner_ip: 2.2.2.2.217
network_gateway: 2.2.2.2.193
host03.example.com:
network_primary_interface: enp3s0
internal_hostname: host03.example.com
internal_ip: 172.22.2.5
hetzner_ip: 3.3.3.3.105
network_gateway: 3.3.3.3.65

@@ -65,7 +78,5 @@ bootstrap:
ignition_name: bootstrap.ign
hosts:
host04.example.com:
internal_hostname: host04.example.com
internal_ip: 172.22.2.6
hetzner_ip: 4.4.4.4.215
network_gateway: 4.4.4.4.193
6 changes: 5 additions & 1 deletion roles/hetzner-baremetal-openshift/defaults/main.yml
@@ -9,8 +9,12 @@ coreos_install_url: "https://mirror.openshift.com/pub/openshift-v4/clients/coreo
coreos_image_url: "https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.7/4.7.7/rhcos-4.7.7-x86_64-metal.x86_64.raw.gz"
coreos_image: /root/rhcos-4.7.7-x86_64-metal.x86_64.raw.gz

butane_url: "https://mirror.openshift.com/pub/openshift-v4/clients/butane/v0.12.1-1/butane-amd64"
butane: /root/butane

openshift_install_dir: "{{ playbook_dir }}/{{ cluster_name }}/"
openshift_install_command: "openshift-install"
# Part of toolbox quay.io/operate-first/hetzner-baremetal-toolbox
openshift_install_command: "/usr/local/bin/openshift-install"
# hetzner_webservice_username:
# hetzner_webservice_password:
# hetzner_hostname: "hostname.example.com"