Dalton Hubble edited this page Nov 19, 2018 · 4 revisions

Errata

Inconsistencies or known issues on platforms that Typhoon cannot yet directly address.

AWS

  • Switching from spot workers to on-demand workers in place is not possible.

Azure

  • Azure virtual networks must have non-overlapping IPv4 CIDRs (e.g. 10.0.0.0/20 for the 1st cluster, 10.0.16.0/20 for the 2nd cluster, etc.). You cannot reuse 10.0.0.0/16 for every cluster.
  • Calico and Network Policy are not available on Azure. Azure does not allow traffic with unknown source IPs.
  • Evicted worker nodes do not always delete themselves from the Kubernetes cluster as they should, due to a race in rkt. Sometimes you must manually run kubectl delete node NODE.
  • To SSH to workers, you must agent-forward SSH to a controller and then to the worker's private IP.
  • Azure scale set instances use a (rather bothersome) multiple disk setup. We choose not to add a managed /dev/sdc data disk to workers. disk_size only affects controllers.
  • Additional worker pools must use the same region as the cluster (advanced).
  • Azure clusters use availability sets and fault domains, but modern notions of zones within regions are still being introduced. Azure clusters are not multi-zonal like AWS and GCP clusters.
  • Azure blocks outbound ICMP.
  • Azure node to node networking bandwidth is not up to par with AWS and GCP. It's possible to choose certain machine types with adequate networking.
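
The non-overlapping CIDR requirement in the list above can be checked before provisioning. The following is a minimal Python sketch, assuming a 10.0.0.0/16 parent range split into /20 blocks as in the example; the helper names allocate_cluster_cidrs and find_overlaps are hypothetical and not part of Typhoon.

```python
import ipaddress
from itertools import combinations, islice
from typing import List, Tuple


def allocate_cluster_cidrs(parent: str, new_prefix: int, count: int) -> List[str]:
    """Return the first `count` subnets of size /new_prefix inside `parent`.

    Hypothetical helper: the parent range and prefix length are assumptions
    chosen to match the 10.0.0.0/20, 10.0.16.0/20 example.
    """
    network = ipaddress.ip_network(parent)
    subnets = list(islice(network.subnets(new_prefix=new_prefix), count))
    if len(subnets) < count:
        raise ValueError(f"{parent} cannot hold {count} /{new_prefix} subnets")
    return [str(subnet) for subnet in subnets]


def find_overlaps(cidrs: List[str]) -> List[Tuple[str, str]]:
    """Return every pair of CIDRs that overlap (empty list means safe)."""
    networks = [ipaddress.ip_network(cidr) for cidr in cidrs]
    return [
        (str(a), str(b))
        for a, b in combinations(networks, 2)
        if a.overlaps(b)
    ]


if __name__ == "__main__":
    cluster_cidrs = allocate_cluster_cidrs("10.0.0.0/16", 20, 3)
    print(cluster_cidrs)
    print(find_overlaps(cluster_cidrs))
    # Reusing the same /16 for two clusters is reported as a conflict.
    print(find_overlaps(["10.0.0.0/16", "10.0.0.0/16"]))
```

Each returned CIDR could then be given to a separate cluster's network configuration. Running find_overlaps over all cluster ranges before an apply catches accidental reuse early.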

UX

  • Azure load balancer UI doesn't show which instances in a backend pool are passing or failing health checks.
  • Azure scale set UI doesn't show adequate details about instances (e.g. private IP, health, disks).
  • Azure credentials expire often, requiring re-running az login or touching active directory.

Terraform

  • Removing a cluster requires two runs of terraform apply. Deleting the network and NICs takes too long.

Bare-Metal

None

DigitalOcean

  • Calico and Network Policy are not available on DigitalOcean. DigitalOcean does not allow IP tunneling protocol firewall rules.

Google Cloud

  • kube-apiserver uses 443 instead of 6443. Google Cloud TCP Proxy load balancers don't allow port 6443 and TCP/UDP load balancers (legacy, regional) would be a non-starter for multi-master.
  • kubectl commands logs, exec, and port-forward disconnect after 60 seconds. TCP proxy load balancers

Terraform

  • Removing a cluster requires two runs of terraform apply. Deleting the network takes too long.