openstack: Remove the Service VM
The experimental OpenStack backend used to create an extra server
running the DNS and load balancer services that the cluster needed.
OpenStack does not always come with DNSaaS or LBaaS, so we had to
provide the functionality the OpenShift cluster depends on ourselves
(e.g. the etcd SRV records, the api-int records and load balancing).

This approach is undesirable for two reasons: first, it adds an extra
node that the other IPI platforms do not need; second, that node is a
single point of failure.

The Baremetal platform faced the same issues and solved them with a few
virtual IP addresses managed by keepalived, combined with a coredns
static pod running on every node (using the mDNS protocol to update
records as nodes are added or removed) and a similar haproxy static pod
to load balance the control plane internally.
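
For illustration (the addresses and cluster domain below are
hypothetical), a node in such a cluster can resolve and reach the
internal API purely through the VIPs:

    # Answered by the local coredns static pod via the DNS VIP
    dig +short api-int.mycluster.example.com @"${DNS_VIP}"
    # Forwarded to a healthy control plane member by the haproxy static pod
    curl -ks "https://${API_VIP}:6443/readyz"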

The VIPs are defined here in the installer and passed to the necessary
machine-config-operator fields via the PlatformStatus field:

openshift/api#374
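
Once those PlatformStatus changes land, the values should be readable
from a live cluster along these lines (the field names follow that pull
request; treat this as a sketch):

    oc get infrastructure cluster \
      -o jsonpath='{.status.platformStatus.openstack.apiServerInternalIP}{"\n"}'
    oc get infrastructure cluster \
      -o jsonpath='{.status.platformStatus.openstack.nodeDNSIP}{"\n"}'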

The Bare Metal IPI Networking Infrastructure document is broadly
applicable here as well:

https://github.com/openshift/installer/blob/master/docs/design/baremetal/networking-infrastructure.md

Notable differences in OpenStack:

* We only use the API and DNS VIPs right now
* Instead of Baremetal's Ingress VIP (which is attached to the OpenShift
  routers), our haproxy static pods balance ports 80 & 443 to the
  worker nodes
* We do not run coredns on the bootstrap node. Instead, bootstrap itself
  uses one of the masters for DNS.

These differences are not fundamental to OpenStack and we will be
looking at aligning more closely with the Baremetal provider in the
future.

There is also a great opportunity to share some of the configuration
files and scripts here.

This change needs several other pull requests:

Keepalived plus the coredns & haproxy static pods in the MCO:
openshift/machine-config-operator#740

Passing the API and DNS VIPs through the installer:
#1998

Vendoring the OpenStack PlatformStatus changes in the MCO:
openshift/machine-config-operator#978

Allowing the use of PlatformStatus in the MCO templates:
openshift/machine-config-operator#943

Co-authored-by: Emilio Garcia <egarcia@redhat.com>
Co-authored-by: John Trowbridge <trown@redhat.com>
Co-authored-by: Martin Andre <m.andre@redhat.com>
Co-authored-by: Tomas Sedovic <tsedovic@redhat.com>

Massive thanks to the Bare Metal and oVirt people!
trown authored and tomassedovic committed Jul 18, 2019
1 parent 4579e6c commit 309991a
Showing 22 changed files with 277 additions and 584 deletions.
2 changes: 2 additions & 0 deletions data/data/bootstrap/files/usr/local/bin/bootkube.sh.template
@@ -19,6 +19,8 @@ fi
MACHINE_CONFIG_OPERATOR_IMAGE=$(podman run --quiet --rm ${release} image machine-config-operator)
MACHINE_CONFIG_OSCONTENT=$(podman run --quiet --rm ${release} image machine-os-content)
MACHINE_CONFIG_ETCD_IMAGE=$(podman run --quiet --rm ${release} image etcd)
# FIXME(shadower): without this, the etcd containers later on keep failing with our custom MCO. Investigate what's going on.
podman pull --quiet $MACHINE_CONFIG_ETCD_IMAGE
MACHINE_CONFIG_KUBE_CLIENT_AGENT_IMAGE=$(podman run --quiet --rm ${release} image kube-client-agent)
MACHINE_CONFIG_INFRA_IMAGE=$(podman run --quiet --rm ${release} image pod)

@@ -0,0 +1,25 @@
vrrp_script chk_ocp {
    # NOTE(mandre) the fake kube-api server doesn't respond to the
    # https://0:6443/readyz URL; we need to find another check
    script "ss -tnl | grep 6443"
    interval 1
    weight 50
}

vrrp_instance ${CLUSTER_NAME}_API {
    state BACKUP
    interface ${INTERFACE}
    virtual_router_id ${API_VRID}
    priority 50
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass ${CLUSTER_NAME}_api_vip
    }
    virtual_ipaddress {
        ${API_VIP}/${NET_MASK}
    }
    track_script {
        chk_ocp
    }
}
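
The chk_ocp check above is a plain port probe: keepalived adds weight 50
to this node's VRRP priority whenever the script exits 0, so whichever
node has the API port listening wins the election. Run by hand, the same
check looks roughly like this (sketch only):

ss -tnl | grep -q 6443 && echo "API port up, priority boosted by 50"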
10 changes: 10 additions & 0 deletions data/data/bootstrap/openstack/files/usr/local/bin/fletcher8
@@ -0,0 +1,10 @@
#!/usr/libexec/platform-python
import sys

data = map(ord, sys.argv[1])
ckA = ckB = 0

for b in data:
    ckA = (ckA + b) & 0xf
    ckB = (ckB + ckA) & 0xf
print((ckB << 4) | ckA)
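
The script folds its argument into a number that fits the 8-bit
virtual_router_id field (two 4-bit running sums packed together). A
hypothetical invocation, with a made-up cluster name:

# Prints an integer in the 0-255 range, used as a VRRP virtual router ID
/usr/local/bin/fletcher8 "mycluster-api"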
@@ -0,0 +1,24 @@
#!/usr/libexec/platform-python
import sys
import socket
import struct

vip = sys.argv[1]
iface_cidrs = sys.argv[2].split()
vip_int = struct.unpack("!I", socket.inet_aton(vip))[0]

for iface_cidr in iface_cidrs:
    ip, prefix = iface_cidr.split('/')
    ip_int = struct.unpack("!I", socket.inet_aton(ip))[0]
    prefix_int = int(prefix)
    mask = int('1' * prefix_int + '0' * (32 - prefix_int), 2)
    subnet_ip_int_min = ip_int & mask
    subnet_ip = socket.inet_ntoa(struct.pack("!I", subnet_ip_int_min))
    subnet_ip_int_max = subnet_ip_int_min | int('1' * (32 - prefix_int), 2)
    subnet_ip_max = socket.inet_ntoa(struct.pack("!I", subnet_ip_int_max))
    sys.stderr.write('Is %s between %s and %s\n' % (vip, subnet_ip, subnet_ip_max))
    if subnet_ip_int_min < vip_int < subnet_ip_int_max:
        subnet_ip = socket.inet_ntoa(struct.pack("!I", subnet_ip_int_min))
        print('%s/%s' % (subnet_ip, prefix))
        sys.exit(0)
sys.exit(1)
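
Given a VIP and the CIDRs of the local interfaces, the script prints the
subnet the VIP belongs to and exits non-zero if none matches. With
made-up addresses:

# 10.0.0.5 falls inside 10.0.0.0/24, so that subnet is printed and the
# script exits 0; the ranges it checked go to stderr.
/usr/local/bin/get_vip_subnet_cidr "10.0.0.5" "10.0.0.2/24 192.168.1.4/16"
# -> 10.0.0.0/24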
@@ -0,0 +1,48 @@
#!/usr/bin/env bash
set -e

mkdir --parents /etc/keepalived

# TODO(shadower): switch to the keepalived image from the release:
# https://github.com/openshift/installer/pull/2025/files#diff-ce82c1d8a44f7dfc41dfc024085ccfeeR24
KEEPALIVED_IMAGE=quay.io/celebdor/keepalived:latest
if ! podman inspect "$KEEPALIVED_IMAGE" &>/dev/null; then
echo "Pulling release image..."
podman pull "$KEEPALIVED_IMAGE"
fi

# TODO(shadower): at least some of these can be passed into this
# template rather than discovered at runtime:
API_DNS="$(sudo awk -F[/:] '/apiServerURL/ {print $5}' /opt/openshift/manifests/cluster-infrastructure-02-config.yml)"
CLUSTER_NAME="$(awk -F. '{print $2}' <<< "$API_DNS")"
API_VIP="{{ .InstallConfig.Platform.OpenStack.APIVIP }}"
IFACE_CIDRS="$(ip addr show | grep -v "scope host" | grep -Po 'inet \K[\d.]+/[\d.]+' | xargs)"
SUBNET_CIDR="$(/usr/local/bin/get_vip_subnet_cidr "$API_VIP" "$IFACE_CIDRS")"
NET_MASK="$(echo "$SUBNET_CIDR" | cut -d "/" -f 2)"
INTERFACE="$(ip -o addr show to "$SUBNET_CIDR" | head -n 1 | awk '{print $2}')"
CLUSTER_DOMAIN="${API_DNS#*.}"

# Virtual Router IDs. They must be different and 8 bit in length
API_VRID=$(/usr/local/bin/fletcher8 "$CLUSTER_NAME-api")
DNS_VRID=$(/usr/local/bin/fletcher8 "$CLUSTER_NAME-dns")

export API_VIP
export CLUSTER_NAME
export INTERFACE
export API_VRID
export NET_MASK
envsubst < /etc/keepalived/keepalived.conf.tmpl | sudo tee /etc/keepalived/keepalived.conf

MATCHES="$(sudo podman ps -a --format "{{`{{.Names}}`}}" | awk '/keepalived$/ {print $0}')"
if [[ -z "$MATCHES" ]]; then
    # TODO(bnemec): Figure out how to run with less perms
    podman create \
        --name keepalived \
        --volume /etc/keepalived:/etc/keepalived:z \
        --network=host \
        --privileged \
        --cap-add=ALL \
        "${KEEPALIVED_IMAGE}" \
        /usr/sbin/keepalived -f /etc/keepalived/keepalived.conf \
            --dont-fork -D -l -P
fi
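
To make the runtime discovery above concrete, this is roughly how the
values would come out for a hypothetical cluster (all of them
illustrative):

# apiServerURL in cluster-infrastructure-02-config.yml: https://api.mycluster.example.com:6443
# API_DNS=api.mycluster.example.com  CLUSTER_NAME=mycluster  CLUSTER_DOMAIN=mycluster.example.com
# API_VIP=10.0.0.5 (from the install config), SUBNET_CIDR=10.0.0.0/24, NET_MASK=24
# API_VRID and DNS_VRID: two distinct 8-bit IDs derived by fletcher8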
18 changes: 18 additions & 0 deletions data/data/bootstrap/openstack/systemd/units/keepalived.service
@@ -0,0 +1,18 @@
[Unit]
Description=Manage node VIPs with keepalived
Wants=network-online.target
After=network-online.target

[Service]
WorkingDirectory=/etc/keepalived
ExecStartPre=/usr/local/bin/keepalived.sh
ExecStart=/usr/bin/podman start -a keepalived
ExecStop=/usr/bin/podman stop -t 10 keepalived
ConditionPathExists=!/etc/pivot/image-pullspec

Restart=on-failure
RestartSec=5
TimeoutStartSec=600

[Install]
WantedBy=multi-user.target
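
On the bootstrap node this unit renders the config and creates the
container via keepalived.sh, then attaches to it. A quick way to inspect
it once the node is up (sketch):

systemctl status keepalived.service
podman logs keepalived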
62 changes: 47 additions & 15 deletions data/data/openstack/bootstrap/main.tf
@@ -18,40 +18,73 @@ data "ignition_config" "redirect" {

files = [
data.ignition_file.hostname.id,
data.ignition_file.bootstrap_ifcfg.id,
data.ignition_file.dns_conf.id,
data.ignition_file.dhcp_conf.id,
data.ignition_file.hosts.id,
]
}

data "ignition_file" "bootstrap_ifcfg" {
data "ignition_file" "dhcp_conf" {
filesystem = "root"
mode = "420" // 0644
path = "/etc/sysconfig/network-scripts/ifcfg-eth0"
mode = "420"
path = "/etc/NetworkManager/conf.d/dhcp-client.conf"

content {
content = <<EOF
DEVICE="eth0"
BOOTPROTO="dhcp"
ONBOOT="yes"
TYPE="Ethernet"
PERSISTENT_DHCLIENT="yes"
DNS1="${var.service_vm_fixed_ip}"
PEERDNS="no"
NM_CONTROLLED="yes"
[main]
dhcp=dhclient
EOF
}
}

data "ignition_file" "dns_conf" {
filesystem = "root"
mode = "420"
path = "/etc/dhcp/dhclient.conf"

# FIXME(mandre) this will likely cause a delay with bootstrap node networking
# until the masters come up and are able to serve DNS queries. Not sure the
# bootstrap is trying to resolve anything it doesn't have in its hosts
# file...
# BareMetal solved this by running coredns on the bootstrap node
#
# NOTE(shadower) bootstrap's waiting for the etcd cluster seems to
# always fail the first time because of this. The second attempt
# succeeds, but we should probably run coredns there too so
# that:
# 1. We don't show spurious errors in the logs
# 2. Align better with what the baremetal platform is doing
content {
content = <<EOF
send dhcp-client-identifier = hardware;
prepend domain-name-servers ${var.node_dns_ip};
EOF
}
}
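
The prepend directive makes dhclient put the node DNS IP first in
/etc/resolv.conf, so the bootstrap node asks the masters' coredns (via
the DNS VIP) before the subnet's own resolver. With a hypothetical
node_dns_ip of 10.0.0.6 the result would look roughly like:

# nameserver 10.0.0.6   <- prepended node DNS VIP
# nameserver 10.0.0.2   <- resolver from the OpenStack subnet's DHCP lease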

data "ignition_file" "hostname" {
filesystem = "root"
mode = "420" // 0644
path = "/etc/hostname"
mode = "420" // 0644
path = "/etc/hostname"

content {
content = <<EOF
${var.cluster_id}-bootstrap
EOF
}
}

data "ignition_file" "hosts" {
filesystem = "root"
mode = "420" // 0644
path = "/etc/hosts"

content {
content = <<EOF
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
${var.api_int_ip} api-int.${var.cluster_domain} api.${var.cluster_domain}
EOF
}
}

@@ -81,4 +114,3 @@ resource "openstack_compute_instance_v2" "bootstrap" {
openshiftClusterID = var.cluster_id
}
}

5 changes: 4 additions & 1 deletion data/data/openstack/bootstrap/variables.tf
@@ -33,7 +33,10 @@ variable "bootstrap_port_id" {
description = "The subnet ID for the bootstrap node."
}

variable "service_vm_fixed_ip" {
variable "api_int_ip" {
type = string
}

variable "node_dns_ip" {
type = string
}
59 changes: 21 additions & 38 deletions data/data/openstack/main.tf
@@ -22,66 +22,47 @@ provider "openstack" {
user_name = var.openstack_credentials_user_name
}

module "service" {
source = "./service"
module "bootstrap" {
source = "./bootstrap"

swift_container = openstack_objectstorage_container_v1.container.name
cluster_id = var.cluster_id
cluster_domain = var.cluster_domain
image_name = var.openstack_base_image
flavor_name = var.openstack_master_flavor_name
ignition = var.ignition_bootstrap
lb_floating_ip = var.openstack_lb_floating_ip
service_port_id = module.topology.service_port_id
service_port_ip = module.topology.service_port_ip
master_ips = module.topology.master_ips
master_port_names = module.topology.master_port_names
bootstrap_ip = module.topology.bootstrap_port_ip
}

module "bootstrap" {
source = "./bootstrap"

swift_container = openstack_objectstorage_container_v1.container.name
cluster_id = var.cluster_id
cluster_domain = var.cluster_domain
image_name = var.openstack_base_image
flavor_name = var.openstack_master_flavor_name
ignition = var.ignition_bootstrap
bootstrap_port_id = module.topology.bootstrap_port_id
service_vm_fixed_ip = module.topology.service_vm_fixed_ip
bootstrap_port_id = module.topology.bootstrap_port_id
api_int_ip = var.openstack_api_int_ip
node_dns_ip = var.openstack_node_dns_ip
}

module "masters" {
source = "./masters"

base_image = var.openstack_base_image
bootstrap_ip = module.topology.bootstrap_port_ip
cluster_id = var.cluster_id
cluster_domain = var.cluster_domain
flavor_name = var.openstack_master_flavor_name
instance_count = var.master_count
lb_floating_ip = var.openstack_lb_floating_ip
master_ips = module.topology.master_ips
master_port_ids = module.topology.master_port_ids
master_port_names = module.topology.master_port_names
user_data_ign = var.ignition_master
service_vm_fixed_ip = module.topology.service_vm_fixed_ip
api_int_ip = var.openstack_api_int_ip
node_dns_ip = var.openstack_node_dns_ip
base_image = var.openstack_base_image
bootstrap_ip = module.topology.bootstrap_port_ip
cluster_id = var.cluster_id
cluster_domain = var.cluster_domain
flavor_name = var.openstack_master_flavor_name
instance_count = var.master_count
lb_floating_ip = var.openstack_lb_floating_ip
master_ips = module.topology.master_ips
master_port_ids = module.topology.master_port_ids
user_data_ign = var.ignition_master
api_int_ip = var.openstack_api_int_ip
node_dns_ip = var.openstack_node_dns_ip
master_sg_ids = concat(
var.openstack_master_extra_sg_ids,
[module.topology.master_sg_id],
)
}

# TODO(shadower) add a dns module here

module "topology" {
source = "./topology"

cidr_block = var.machine_cidr
cluster_id = var.cluster_id
cluster_domain = var.cluster_domain
external_network = var.openstack_external_network
external_network_id = var.openstack_external_network_id
masters_count = var.master_count
@@ -98,9 +79,11 @@ resource "openstack_objectstorage_container_v1" "container" {
# "kubernetes.io/cluster/${var.cluster_id}" = "owned"
metadata = merge(
{
"Name" = "${var.cluster_id}-ignition-master"
"Name" = "${var.cluster_id}-ignition"
"openshiftClusterID" = var.cluster_id
},
# FIXME(mandre) the openstack_extra_tags should be applied to all resources
# created
var.openstack_extra_tags,
)
}
4 changes: 1 addition & 3 deletions data/data/openstack/masters/main.tf
@@ -17,7 +17,6 @@ data "ignition_file" "hostname" {
content = <<EOF
${var.cluster_id}-master-${count.index}
EOF

}
}

@@ -30,9 +29,7 @@ data "ignition_file" "clustervars" {
content = <<EOF
export API_VIP=${var.api_int_ip}
export DNS_VIP=${var.node_dns_ip}
export FLOATING_IP=${var.lb_floating_ip}
export BOOTSTRAP_IP=${var.bootstrap_ip}
${replace(join("\n", formatlist("export MASTER_FIXED_IPS_%s=%s", var.master_port_names, var.master_ips)), "${var.cluster_id}-master-port-", "")}
EOF
}
}
@@ -67,6 +64,7 @@ resource "openstack_compute_instance_v2" "master_conf" {
}

metadata = {
# FIXME(mandre) shouldn't it be "${var.cluster_id}-master-${count.index}" ?
Name = "${var.cluster_id}-master"
# "kubernetes.io/cluster/${var.cluster_id}" = "owned"
openshiftClusterID = var.cluster_id
8 changes: 0 additions & 8 deletions data/data/openstack/masters/variables.tf
@@ -43,10 +43,6 @@ variable "master_port_ids" {
description = "List of port ids for the master nodes"
}

variable "master_port_names" {
type = list(string)
}

variable "user_data_ign" {
type = string
}
@@ -58,7 +54,3 @@ variable "api_int_ip" {
variable "node_dns_ip" {
type = string
}

variable "service_vm_fixed_ip" {
type = string
}
