Rke up fails in environment without default gateway #3359

Closed
antoinetran opened this issue Sep 8, 2023 · 9 comments

Comments

@antoinetran

RKE version:

1.4.0

Docker version: (docker version, docker info preferred)

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.9.1-docker)
  scan: Docker Scan (Docker Inc., v0.21.0)

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 20.10.21
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 1c90a442489720eec95342e1789ee8a5e1b9536f
 runc version: v1.1.4-0-g5fd4c4d
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 3.10.0-1160.95.1.el7.x86_64
 Operating System: CentOS Linux 7 (Core)
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 28.6GiB
 Name: ci1l2psto-s00.l2pf.continuousintegration1.mtg
 ID: JTAA:IJIO:Q3JZ:CYYD:MTHO:C6VP:7AUF:BLC3:JONZ:D5XK:JESI:6J45
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Operating system and kernel: (cat /etc/os-release, uname -r preferred)

cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
uname -r
3.10.0-1160.95.1.el7.x86_64

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)
Public Cloud OVH
cluster.yml file:
Relevant part:

...
services:
  kube-api:
    # IP range for any services created on Kubernetes
    # This must match the service_cluster_ip_range in kube-controller
    service_cluster_ip_range: 10.43.0.0/16
    extra_args:
      # kubectl get events retention. Default: 1h
      event-ttl: "72h0m0s"
      # Set the level of log output to debug-level
      #v: 4
  # Note for Rancher 2 users: If you are configuring Cluster Options
  # using a Config File when creating Rancher Launched Kubernetes,
  # the names of services should contain underscores only:
  # `kube_controller`. This only applies to Rancher v2.0.5 and v2.0.6.
  kube-controller:
    # CIDR pool used to assign IP addresses to pods in the cluster
    cluster_cidr: 10.41.0.0/16
    # IP range for any services created on Kubernetes
    # This must match the service_cluster_ip_range in kube-api
    service_cluster_ip_range: 10.43.0.0/16
...
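For reference: the first address of this service_cluster_ip_range, 10.43.0.1, should be the ClusterIP that kube-apiserver assigns to the built-in kubernetes service, i.e. the in-cluster API endpoint the failing pods below try to reach. A quick sketch to confirm it on a working cluster (illustrative output):

kubectl get svc kubernetes -n default
# NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
# kubernetes   ClusterIP   10.43.0.1    <none>        443/TCP   ...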

Steps to Reproduce:
Scenario (A)
Create a pool of VMs without a default gateway

ip route
10.0.0.0/16 via 10.0.21.254 dev eth0 proto static metric 102 
10.0.21.0/24 dev eth0 proto kernel scope link src 10.0.21.20 metric 102
169.254.169.254 via 192.168.21.1 dev eth1 proto dhcp metric 101 
169.254.169.254 via 10.0.21.1 dev eth0 proto dhcp metric 102 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 
192.168.21.0/24 dev eth1 proto kernel scope link src 192.168.21.20 metric 101

Then do rke up
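A quick sketch to confirm the routing state this scenario assumes (typical iproute2 output on CentOS 7, shown for illustration):

ip route show default
# (no output: there is no default route)
ip route get 10.43.0.1
# RTNETLINK answers: Network is unreachable   (the 10.0.0.0/16 route does not cover the service CIDR)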

Scenario (B)

ip route
10.0.0.0/8 via 10.0.21.254 dev eth0 proto static metric 102 
10.0.21.0/24 dev eth0 proto kernel scope link src 10.0.21.20 metric 102
169.254.169.254 via 192.168.21.1 dev eth1 proto dhcp metric 101 
169.254.169.254 via 10.0.21.1 dev eth0 proto dhcp metric 102 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 
192.168.21.0/24 dev eth1 proto kernel scope link src 192.168.21.20 metric 101
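The only change compared to scenario (A) is the 10.0.0.0/8 route in place of 10.0.0.0/16. That wider route happens to cover the 10.43.0.0/16 service range from cluster.yml, so the host kernel now has a path towards 10.43.0.1; a sketch of the check, with illustrative output:

ip route get 10.43.0.1
# 10.43.0.1 via 10.0.21.254 dev eth0 src 10.0.21.20

Then do rke up as in scenario (A).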

Scenario (C)

ip route
default via 10.0.21.254 dev eth0
10.0.0.0/8 via 10.0.21.254 dev eth0 proto static metric 102 
10.0.21.0/24 dev eth0 proto kernel scope link src 10.0.21.20 metric 102
169.254.169.254 via 192.168.21.1 dev eth1 proto dhcp metric 101 
169.254.169.254 via 10.0.21.1 dev eth0 proto dhcp metric 102 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 
192.168.21.0/24 dev eth1 proto kernel scope link src 192.168.21.20 metric 101
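Scenario (C) differs from (B) only by the default route in the first line. Starting from (B), it can be reproduced with a single command (assuming 10.0.21.254 is the reachable router already used by the static routes above):

ip route add default via 10.0.21.254 dev eth0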

Results:
(A) I get exactly the same failure as in #1231.
(B) I do not reproduce #1231 and get past that blocking step. rke up reports success, but in reality nothing works:

kubectl get pods -A
NAME                                       READY   STATUS             RESTARTS         AGE
calico-kube-controllers-5f95bd4b77-bskxk   0/1     CrashLoopBackOff   18 (3m24s ago)   71m
calico-node-67w7p                          1/1     Running            0                43m
calico-node-8ss7c                          1/1     Running            0                43m
calico-node-fsf5g                          1/1     Running            0                43m
calico-node-lz4cs                          1/1     Running            0                43m
calico-node-p7t7q                          1/1     Running            0                43m
calico-node-zqj94                          1/1     Running            0                43m
coredns-65cd75c4d-csstq                    0/1     CrashLoopBackOff   19 (23s ago)     71m
coredns-autoscaler-f4948fc95-2442w         0/1     CrashLoopBackOff   19 (88s ago)     71m
metrics-server-67c8745cb8-lxnxh            0/1     CrashLoopBackOff   12 (5m7s ago)    42m
rke-coredns-addon-deploy-job-lc8km         0/1     Completed          0                71m
rke-ingress-controller-deploy-job-2chjn    0/1     Completed          0                71m
rke-metrics-addon-deploy-job-4l6zb         0/1     Completed          0                71m
rke-network-plugin-deploy-job-zxz2b        0/1     Completed          0                71m

If I look at the logs of pods like metrics-server-67c8745cb8-lxnxh or calico-kube-controllers-5f95bd4b77-bskxk, I see a "no route to host" error towards 10.43.0.1:443, even though this address is reachable from the host and from a container.

curl -k https://10.43.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication                                                                                                    
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}
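For contrast with the failing pods, the same endpoint can also be probed from inside the cluster network with a throwaway client pod; this is only a sketch, and the pod name and image are illustrative:

kubectl run curl-debug --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -k https://10.43.0.1:443/version
# in scenarios (A)/(B) this is expected to end with the same error as in the pod logs:
# curl: (7) Failed to connect to 10.43.0.1 port 443: No route to host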

(C) The deployment works!!

It seems that a default gateway is a hard prerequisite for RKE, even though it should not be.
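A plausible explanation, sketched here (see also the issue referenced in the next comment): several Kubernetes components and CNIs auto-detect the node/host IP by asking the kernel which route and source address would be used to reach an arbitrary external destination, and that lookup only succeeds when a default route exists:

ip route get 8.8.8.8
# with a default route (scenario C):   8.8.8.8 via 10.0.21.254 dev eth0 src 10.0.21.20
# without one (scenarios A and B):     RTNETLINK answers: Network is unreachable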

@superseb
Contributor

The requirement for a default route has existed for a long time within Kubernetes and system components like CNI. Was it working before, or does it work in another Kubernetes distro? See kubernetes/kubernetes#57534 and other references.

@antoinetran
Author

The requirement for a default route has existed for a long time within Kubernetes and system components like CNI. Was it working before, or does it work in another Kubernetes distro? See kubernetes/kubernetes#57534 and other references.

I cannot tell whether it was working in previous Kubernetes versions. This is the first time we have deployed to an environment without a default gateway. I will try to add info to the ticket you mentioned, to see if I get an official answer.

github-actions bot

This repository uses an automated workflow to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the workflow can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the workflow will automatically close the issue in 14 days. Thank you for your contributions.

@antoinetran
Author

up

github-actions bot

This repository uses an automated workflow to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the workflow can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the workflow will automatically close the issue in 14 days. Thank you for your contributions.

@antoinetran
Author

up

github-actions bot

This repository uses an automated workflow to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the workflow can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the workflow will automatically close the issue in 14 days. Thank you for your contributions.

@antoinetran
Author

up


github-actions bot commented Jun 2, 2024

This repository uses an automated workflow to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the workflow can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the workflow will automatically close the issue in 14 days. Thank you for your contributions.
