Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Etcd fails to find hostname certs with capital letters #2485

Closed
davidcorbin opened this issue Mar 11, 2021 · 2 comments
Closed

Etcd fails to find hostname certs with capital letters #2485

davidcorbin opened this issue Mar 11, 2021 · 2 comments
Assignees
Labels

Comments

@davidcorbin
Copy link

RKE version:
1.2.6

Docker version: (docker version,docker info preferred)
Docker version 20.10.5, build 55c4c88

Operating system and kernel: (cat /etc/os-release, uname -r preferred)
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

Kernel: 4.15.0-135-generic

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)
Bare Metal

cluster.yml file:

# If you intened to deploy Kubernetes in an air-gapped environment,
# please consult the documentation on how to configure custom RKE images.
nodes:
- address: UBT-4198.davcor.co
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - worker
  - etcd
  hostname_override: ""
  user: username
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: UBT-4199.davcor.co
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - worker
  - etcd
  hostname_override: ""
  user: username
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: UBT-4200.davcor.co
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - worker
  - etcd
  hostname_override: ""
  user: username
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
services:
  etcd:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
    external_urls: []
    ca_cert: ""
    cert: ""
    key: ""
    path: ""
    uid: 0
    gid: 0
    snapshot: null
    retention: ""
    creation: ""
    backup_config: null
  kube-api:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
    service_cluster_ip_range: 10.43.0.0/16
    service_node_port_range: ""
    pod_security_policy: true
    always_pull_images: false
    secrets_encryption_config: null
    audit_log: null
    admission_configuration: null
    event_rate_limit: null
  kube-controller:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
    cluster_cidr: 10.42.0.0/16
    service_cluster_ip_range: 10.43.0.0/16
  scheduler:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
  kubelet:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
    cluster_domain: cluster.local
    infra_container_image: ""
    cluster_dns_server: 10.43.0.10
    fail_swap_on: false
    generate_serving_certificate: false
  kubeproxy:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
network:
  plugin: flannel
  options: {}
  mtu: 0
  node_selector: {}
  update_strategy: null
  tolerations: []
authentication:
  strategy: x509
  sans: []
  webhook: null
addons: ""
addons_include: []
system_images:
  etcd: rancher/coreos-etcd:v3.4.14-rancher1
  alpine: rancher/rke-tools:v0.1.72
  nginx_proxy: rancher/rke-tools:v0.1.72
  cert_downloader: rancher/rke-tools:v0.1.72
  kubernetes_services_sidecar: rancher/rke-tools:v0.1.72
  kubedns: rancher/k8s-dns-kube-dns:1.15.10
  dnsmasq: rancher/k8s-dns-dnsmasq-nanny:1.15.10
  kubedns_sidecar: rancher/k8s-dns-sidecar:1.15.10
  kubedns_autoscaler: rancher/cluster-proportional-autoscaler:1.8.1
  coredns: rancher/coredns-coredns:1.8.0
  coredns_autoscaler: rancher/cluster-proportional-autoscaler:1.8.1
  nodelocal: rancher/k8s-dns-node-cache:1.15.13
  kubernetes: rancher/hyperkube:v1.20.4-rancher1
  flannel: rancher/coreos-flannel:v0.13.0-rancher1
  flannel_cni: rancher/flannel-cni:v0.3.0-rancher6
  calico_node: rancher/calico-node:v3.17.2
  calico_cni: rancher/calico-cni:v3.17.2
  calico_controllers: rancher/calico-kube-controllers:v3.17.2
  calico_ctl: rancher/calico-ctl:v3.17.2
  calico_flexvol: rancher/calico-pod2daemon-flexvol:v3.17.2
  canal_node: rancher/calico-node:v3.17.2
  canal_cni: rancher/calico-cni:v3.17.2
  canal_controllers: rancher/calico-kube-controllers:v3.17.2
  canal_flannel: rancher/coreos-flannel:v0.13.0-rancher1
  canal_flexvol: rancher/calico-pod2daemon-flexvol:v3.17.2
  weave_node: weaveworks/weave-kube:2.8.1
  weave_cni: weaveworks/weave-npc:2.8.1
  pod_infra_container: rancher/pause:3.2
  ingress: rancher/nginx-ingress-controller:nginx-0.43.0-rancher1
  ingress_backend: rancher/nginx-ingress-controller-defaultbackend:1.5-rancher1
  metrics_server: rancher/metrics-server:v0.4.1
  windows_pod_infra_container: rancher/kubelet-pause:v0.1.6
  aci_cni_deploy_container: noiro/cnideploy:5.1.1.0.1ae238a
  aci_host_container: noiro/aci-containers-host:5.1.1.0.1ae238a
  aci_opflex_container: noiro/opflex:5.1.1.0.1ae238a
  aci_mcast_container: noiro/opflex:5.1.1.0.1ae238a
  aci_ovs_container: noiro/openvswitch:5.1.1.0.1ae238a
  aci_controller_container: noiro/aci-containers-controller:5.1.1.0.1ae238a
  aci_gbp_server_container: noiro/gbp-server:5.1.1.0.1ae238a
  aci_opflex_server_container: noiro/opflex-server:5.1.1.0.1ae238a
ssh_key_path: ~/.ssh/id_rsa
ssh_cert_path: ""
ssh_agent_auth: false
authorization:
  mode: rbac
  options: {}
ignore_docker_version: null
kubernetes_version: ""
private_registries: []
ingress:
  provider: ""
  options: {}
  node_selector: {}
  extra_args: {}
  dns_policy: ""
  extra_envs: []
  extra_volumes: []
  extra_volume_mounts: []
  update_strategy: null
  http_port: 0
  https_port: 0
  network_mode: ""
  tolerations: []
  default_backend: null
  default_http_backend_priority_class_name: ""
  nginx_ingress_controller_priority_class_name: ""
cluster_name: ""
cloud_provider:
  name: ""
prefix_path: ""
win_prefix_path: ""
addon_job_timeout: 0
bastion_host:
  address: ""
  port: ""
  user: ""
  ssh_key: ""
  ssh_key_path: ""
  ssh_cert: ""
  ssh_cert_path: ""
monitoring:
  provider: ""
  options: {}
  node_selector: {}
  update_strategy: null
  replicas: null
  tolerations: []
  metrics_server_priority_class_name: ""
restore:
  restore: false
  snapshot_name: ""
rotate_encryption_key: false
dns: null

Steps to Reproduce:

rke up

Results:
When run with capital letters in node hostnames, etcd fails to start because the generated .pem files are lowercase but the etcd docker container is looking for files with capital letters.

@aaronyeeski
Copy link

aaronyeeski commented May 28, 2021

This issue is reproduced with RKE version v1.2.6 with the following steps:
Bring up a Ubuntu node. Ubuntu version 20.04, docker version 20.10.03
Create a DNS record pointing to the node IP
Bring up a cluster with rke up using following yaml

nodes:
  - address: AARON.qa.rancher
    user: ubuntu
    role: [etcd, controlplane, worker]
    ssh_key_path: key.pem
services:
  etcd:
    snapshot: true
    creation: 6h
    retention: 24h

Cluster creation fails with:

INFO[0101] [etcd] Successfully started etcd plane.. Checking etcd cluster health
WARN[0207] [etcd] host [AARON.qa.rancher] failed to check etcd health: failed to get /health for host [AARON.qa.rancher]: Get "https://AARON.qa.rancher:2379/health": Unable to access the service on AARON.qa.rancher:2379. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused)
FATA[0207] [etcd] Failed to bring up Etcd Plane: etcd cluster is unhealthy: hosts [AARON.qa.rancher] failed to report healthy. Check etcd container logs on each host for more information

@aaronyeeski
Copy link

The bug fix is verified with RKE version v1.3.0-rc2
Tested with the same steps as above.
Cluster creation succeeds.

INFO[0165] [ingress] ingress controller nginx deployed successfully
INFO[0165] [addons] Setting up user addons
INFO[0165] [addons] no user addons defined
INFO[0165] Finished building Kubernetes cluster successfully

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants