Cleanup & enable IPSec (#7)
* Add toolbox (Containerfile) with ansible + dependency

* Update comments

* Add hrobot firewall settings

* Update README.md

* Restructure ansible roles

* Cleanup README.md

* Add .gitignore - ignoring __pycache__

* Remove obsolete vlan interface

* Cleanup documentation

* Remove useless ansible code

* Update openshift_install_dir

* Enable tags for run-installer

* Add playbook to wipe disks

* Enable ipsec

* Update README, last installations runs perfect

* Add newline at the end of file

* Remove useless char
rbo committed Jun 3, 2021
1 parent 8706c97 commit acf8d3b
Showing 39 changed files with 359 additions and 709 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
__pycache__
15 changes: 15 additions & 0 deletions Containerfile
@@ -0,0 +1,15 @@
# Build:
# podman build -t quay.io/rbo/hetzner-baremetal-openshift-toolbox:latest .
#
#
# podman stop hetzner-baremetal-openshift-toolbox-latest
# podman rm hetzner-baremetal-openshift-toolbox-latest
# toolbox create --image quay.io/rbo/hetzner-baremetal-openshift-toolbox:latest
# toolbox enter hetzner-baremetal-openshift-toolbox-latest


FROM registry.fedoraproject.org/fedora-toolbox
RUN \
dnf install -y ansible pip python3-google-auth.noarch && \
pip install hcloud && \
ansible-galaxy collection install -p /usr/share/ansible/collections community.hrobot
210 changes: 63 additions & 147 deletions README.md
@@ -2,193 +2,109 @@

## Network Overview

![Network overview](docs/network-overview-v3.png)
![Network overview](docs/images/network-overview-v3.png)

## Installation

## Issues to solve
### Create initial hosts.yaml based on hosts.yaml.example

### Protect machine config server (22623)

Currently :22623 is publicly available; it should only be available to nodes.


Tcpdump on host01
```
19:56:48.618128 IP 49.12.23.25.26634 > 94.130.55.35.22623: Flags [S], seq 1010699155, win 64240, options [mss 1460,sackOK,TS val 1548548017 ecr 0,nop,wscale 7], length 0
19:56:48.618187 IP 94.130.55.35.22623 > 49.12.23.25.26634: Flags [S.], seq 1554818295, ack 1010699156, win 28960, options [mss 1460,sackOK,TS val 2776963539 ecr 1548548017,nop,wscale 7], length 0
19:56:48.618432 IP 49.12.23.25.26634 > 94.130.55.35.22623: Flags [.], ack 1, win 502, options [nop,nop,TS val 1548548018 ecr 2776963539], length 0
19:56:48.640778 IP 49.12.23.25.26634 > 94.130.55.35.22623: Flags [P.], seq 1:518, ack 1, win 502, options [nop,nop,TS val 1548548040 ecr 2776963539], length 517
19:56:48.640789 IP 94.130.55.35.22623 > 49.12.23.25.26634: Flags [.], ack 518, win 235, options [nop,nop,TS val 2776963561 ecr 1548548040], length 0
19:56:48.642483 IP 94.130.55.35.22623 > 49.12.23.25.26634: Flags [P.], seq 1:1578, ack 518, win 235, options [nop,nop,TS val 2776963563 ecr 1548548040], length 1577
19:
```bash
cp -v hosts.yaml.example hosts.yaml
$EDITOR hosts.yaml
```
The source address is the address of the load balancer :-(

### Hardware preparations

* Order a BareMetal Server - Issue for sizing: #6

* Configure DNS ( A & PTR ) for BareMetal Server
`<hostname>.emea.operate-first.cloud`

## High level steps
* [Install Centos 8 to determine the network interface name](docs/install-centos-8.md)

### Order bare metal server
### Setup Network
#### Attach dedicated server to vSwitch
#### Configure dedicated server firewall (allow internal traffic)
* Add server to `hosts.yaml`

![](docs/firewall-example.png)
* Run a RH CoreOS Test installation with ssh-only ignition
```bash
./reset-server.yaml [-l hostname]
# SSH into the rescue system and run the coreos-install command printed at the end of the playbook.

### Setup Load Balancer & DNS at hcloud
#### Start fedora 33 instance
```

After dnf update you have to fix dns -> fedora bug?
```
systemctl stop systemd-resolved.service ; systemctl disable systemd-resolved.service
Check the installation: does the server boot? Can you connect via SSH?

rm -f /etc/resolv.conf
ln -s /run/systemd/resolve/resolv.conf /etc/resolv.conf
* Boot Rescue mode - should fail! :-)

cat /etc/resolv.conf
nameserver 213.133.99.99
nameserver 213.133.100.100
nameserver 213.133.98.98
```bash
./force-rescue-mode.yaml [-l hostname]
```

```
#### Connect network with vSwitch
Check if the rescue system is booted:

#### Setup Proxy server
**If NOT**: File a ticket to switch into EFI boot (Example Tickets: Ticket#2021050503020988, Ticket#2021050603003594, Ticket#2021051903013942)

**Why proxy setup?**
The OpenShift installation picks the interface that has a default gateway as the main interface. We changed the default gateway to 172.22.2.1 on the vlan4000 interface to force use of the VLAN interface IP.
* Check if you can switch between RH CoreOS and rescue mode.

Additionally, we decided to disable the public IP completely, because bootstrap picks the first IP, which is the public one. The bootstrap etcd member then uses the public API, but the other nodes no longer have access to public IPs.
* Configure firewall
```bash
./configure-hrobot-firewall.yaml [-l hostname]
```

On the nodes, the service nodeip-configuration.service takes care of the kubelet IP:
```
systemctl status nodeip-configuration.service
systemctl cat nodeip-configuration.service
```
### DNS & load balancer preparations

* Configure load balancer:
* Public for api & ingress
* Private for api-int

```
dnf -y install tinyproxy
```
* Configure DNS entries for
* `api.<cluster_name>.emea.operate-first.cloud`
* `api-int.<cluster_name>.emea.operate-first.cloud`
* `*.apps.<cluster_name>.emea.operate-first.cloud`
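
As a minimal sketch of the DNS entries listed above, the helper below prints the three record names for a given cluster. The function name `required_dns_names` and its optional second argument are hypothetical and not part of this repository; the base domain is the one used in this README.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of the repository): print the DNS names
# that must exist for a cluster, one per line.
#   $1 = cluster name
#   $2 = base domain (defaults to the domain used in this README)
required_dns_names() {
  local cluster_name="$1"
  local base_domain="${2:-emea.operate-first.cloud}"
  printf '%s\n' \
    "api.${cluster_name}.${base_domain}" \
    "api-int.${cluster_name}.${base_domain}" \
    "*.apps.${cluster_name}.${base_domain}"
}
```

Calling `required_dns_names demo` prints the names to check after the playbook has run. The wildcard entry cannot be queried literally, so a lookup would need a concrete label in place of `*`.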

Add to `/etc/tinyproxy/tinyproxy.conf` :
All steps are done with one single playbook:

```
Listen 172.22.1.10
Allow 172.22.0.0/16
```
**Important: this step is not idempotent** (Issue #5)

Start
```bash
./configure-lb-and-dns.yaml
```
systemctl enable --now tinyproxy
systemctl status tinyproxy

```
### OpenShift installation

#### Setup Load Balancer
Prerequisites:
* At least 4 prepared nodes (see Hardware preparations)
* DNS & load balancer preparations

```
dnf -y install podman
```

```
cat > /etc/systemd/system/openshift-4-loadbalancer.service <<EOF
[Unit]
Description=OpenShift 4 LoadBalancer CLUSTER
After=network.target
[Service]
Type=simple
TimeoutStartSec=5m
ExecStartPre=-/usr/bin/podman rm "openshift-4-loadbalancer"
ExecStartPre=/usr/bin/podman pull quay.io/redhat-emea-ssa-team/openshift-4-loadbalancer
ExecStart=/usr/bin/podman run --name openshift-4-loadbalancer --net host \
-e API=bootstrap=172.22.2.6:6443,master-0=172.22.2.3:6443,master-1=172.22.2.4:6443,master-3=172.22.2.5:6443 \
-e API_LISTEN=78.46.236.55:6443,172.22.1.10:6443 \
-e INGRESS_HTTP=master-0=172.22.2.3:80,master-1=172.22.2.4:80,master-3=172.22.2.5:80 \
-e INGRESS_HTTP_LISTEN=78.46.236.55:80,172.22.1.10:80 \
-e INGRESS_HTTPS=master-0=172.22.2.3:443,master-1=172.22.2.4:443,master-3=172.22.2.5:443 \
-e INGRESS_HTTPS_LISTEN=78.46.236.55:443,172.22.1.10:443 \
-e MACHINE_CONFIG_SERVER=bootstrap=172.22.2.6:22623,master-0=172.22.2.3:22623,master-1=172.22.2.4:22623,master-3=172.22.2.5:22623 \
-e MACHINE_CONFIG_SERVER_LISTEN=172.22.1.10:22623 \
-e STATS_LISTEN=127.0.0.1:1984 \
-e STATS_ADMIN_PASSWORD=aengeo4oodoidaiP \
-e HAPROXY_CLIENT_TIMEOUT=1m \
-e HAPROXY_SERVER_TIMEOUT=1m \
quay.io/redhat-emea-ssa-team/openshift-4-loadbalancer
ExecReload=-/usr/bin/podman stop "openshift-4-loadbalancer"
ExecReload=-/usr/bin/podman rm "openshift-4-loadbalancer"
ExecStop=-/usr/bin/podman stop "openshift-4-loadbalancer"
Restart=always
RestartSec=30
[Install]
WantedBy=multi-user.target
EOF
```
#### Reset server

```
systemctl daemon-reload
systemctl enable --now openshift-4-loadbalancer.service
```
Boots into rescue mode and prepares the rescue system to install Red Hat CoreOS


#### Setup DNS

```
dnf -y install dnsmasq
```bash
./reset-server.yaml
```

```
cat > /etc/dnsmasq.conf << EOF
no-resolv
server=213.133.100.100
server=213.133.99.99
server=213.133.98.98
address=/apps.openshift.pub/172.22.1.10
user=dnsmasq
group=dnsmasq
listen-address=172.22.1.10
conf-dir=/etc/dnsmasq.d,.rpmnew,.rpmsave,.rpmorig
bind-dynamic
except-interface=lo
EOF
```
#### Create ignition config and transfer to hosts

```
systemctl daemon-reload
systemctl enable --now openshift-4-loadbalancer.service
```bash
./create-ignition.yaml
```

Add to `/etc/hosts`
```
172.22.1.10 api-int.openshift.pub api.openshift.pub
172.22.2.3 master-0.compute.local
172.22.2.4 master-1.compute.local
172.22.2.5 master-3.compute.local
172.22.2.6 bootstrap.compute.local
```
#### Run the installer


### Install coreos on hetzner baremetal
```bash
./run-installer.yaml
```

**Important: RHCOS supports only ignition.version 3.1.0**
![](docs/ioctl-error.png)
Optionally, split it into two steps:

Boot into rescue system:
```bash
./run-installer.yaml --skip-tags reboot
# Check output
./run-installer.yaml --tags reboot
```
curl -L -O https://mirror.openshift.com/pub/openshift-v4/clients/coreos-installer/v0.6.0-3/coreos-installer
chmod +x coreos-installer

curl -L -O https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.6/4.6.1/rhcos-4.6.1-x86_64-metal.x86_64.raw.gz
### OpenShift reinstallation

./coreos-installer install /dev/nvme0n1 \
--ignition-file config.ignition \
--copy-network --firstboot-args="rd.neednet=1" \
--network-dir ./network-config/ \
--insecure \
--image-file rhcos-4.6.1-x86_64-metal.x86_64.raw.gz
```
It's recommended to wipe the disk with the `wipe-server.yaml` playbook before reinstallation.
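
A reinstallation could be sketched as the wrapper below. It is an illustration only: the `reinstall` function and the `DRY_RUN` variable are hypothetical, and running `wipe-server.yaml` ahead of the three installation playbooks is an assumption based on the recommendation above, not a documented procedure. By default it only prints the commands it would run.

```bash
#!/usr/bin/env bash

# Playbooks named in this README. Putting wipe-server.yaml first is an
# assumption; the remaining order follows the "OpenShift installation" section.
REINSTALL_PLAYBOOKS=(wipe-server.yaml reset-server.yaml create-ignition.yaml run-installer.yaml)

# Hypothetical wrapper: forwards all arguments (for example "-l hostname")
# to each playbook. With DRY_RUN=1 (the default) it only prints the commands.
reinstall() {
  local dry_run="${DRY_RUN:-1}"
  local playbook
  for playbook in "${REINSTALL_PLAYBOOKS[@]}"; do
    if [ "$dry_run" = "1" ]; then
      echo "./${playbook}${*:+ $*}"
    else
      "./${playbook}" "$@" || return 1   # stop at the first failing playbook
    fi
  done
}
```

Run `reinstall -l hostname` to review the sequence first, then `DRY_RUN=0 reinstall -l hostname` to execute it.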
8 changes: 8 additions & 0 deletions configure-hrobot-firewall.yaml
@@ -0,0 +1,8 @@
#!/usr/bin/env ansible-playbook
---
- hosts: nodes,bootstrap
gather_facts: false
tasks:
- include_role:
name: hetzner-baremetal-openshift
tasks_from: configure-firewall.yaml
9 changes: 6 additions & 3 deletions configure-lb-and-dns.yaml
@@ -4,8 +4,10 @@
hosts: localhost
connection: local
gather_facts: no
roles:
- hcloud
tasks:
- include_role:
name: hetzner-baremetal-openshift
tasks_from: create-dns-lb.yaml

- name: Basis installation
hosts: lb
@@ -15,7 +17,8 @@
package:
name:
- haproxy
# To get stats: echo "show stat" | nc -U /var/lib/haproxy/stats | cut -d "," -f 1,2,18,57| column -s, -t;
# To get stats:
# echo "show stat" | nc -U /var/lib/haproxy/stats | cut -d "," -f 1,2,18,57| column -s, -t;
- nc
state: present

2 changes: 1 addition & 1 deletion 02_create-ignition.yaml → create-ignition.yaml
@@ -3,5 +3,5 @@
- hosts: nodes,bootstrap
tasks:
- include_role:
name: provision-hetzner
name: hetzner-baremetal-openshift
tasks_from: create-ignition.yaml
Binary file removed docs/firewall-example.png
13 changes: 0 additions & 13 deletions docs/ignition/first-boot.fcc

This file was deleted.

26 changes: 0 additions & 26 deletions docs/ignition/first-boot.ign

This file was deleted.

File renamed without changes
File renamed without changes
7 changes: 7 additions & 0 deletions docs/install-centos-8.md
@@ -0,0 +1,7 @@
# Install Centos 8

* Open <https://robot.your-server.de/server>
* Select server
* Select tab "Linux"

![Centos 8 installation](images/centos-8-installation.png)