Offline install or run with no gateway set - k3s service won't run #1103

Closed
danielbarron42 opened this issue Nov 19, 2019 · 6 comments
Labels
kind/documentation Improvements or additions to documentation

@danielbarron42

danielbarron42 commented Nov 19, 2019

Version:
k3s version v0.9.1 (755bd1c)
/usr/local/bin/k3s server --write-kubeconfig-mode 664 --no-deploy traefik --docker --cluster-cidr 10.244.0.0/16

Describe the bug
I am using k3s in air-gapped/offline environments. I can install and run it successfully without any internet access, but only if a default gateway is set. To be truly offline/air-gapped, I would like to be able to install and run without a gateway. Without one, the install fails; for example, I get:

systemctl status k3s.service -l
● k3s.service - Lightweight Kubernetes
   Loaded: loaded (/etc/systemd/system/k3s.service; disabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Tue 2019-11-19 06:32:10 PST; 1min 56s ago
     Docs: https://k3s.io
  Process: 5175 ExecStart=/usr/local/bin/k3s server --write-kubeconfig-mode 664 --no-deploy traefik --docker --cluster-cidr 10.244.0.0/16 (code=exited, status=1/FAILURE)
  Process: 5173 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
  Process: 5170 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
 Main PID: 5175 (code=exited, status=1/FAILURE)

Nov 19 06:32:10 myhostname k3s[5175]: -v, --v Level                          number for the log level verbosity
Nov 19 06:32:10 myhostname k3s[5175]: --version version[=true]           Print version information and quit
Nov 19 06:32:10 myhostname k3s[5175]: --vmodule moduleSpec               comma-separated list of pattern=N settings for file-filtered logging
Nov 19 06:32:10 myhostname k3s[5175]: time="2019-11-19T06:32:10.544522701-08:00" level=fatal msg="apiserver exited: unable to find suitable network address.error='no default routes found in \"/proc/net/route\" or \"/proc/net/ipv6_route\"'. Try to set the AdvertiseAddress directly or provide a valid BindAddress to fix this"
Nov 19 06:32:10 myhostname systemd[1]: k3s.service holdoff time over, scheduling restart.
Nov 19 06:32:10 myhostname systemd[1]: Stopped Lightweight Kubernetes.
Nov 19 06:32:10 myhostname systemd[1]: start request repeated too quickly for k3s.service
Nov 19 06:32:10 myhostname systemd[1]: Failed to start Lightweight Kubernetes.
Nov 19 06:32:10 myhostname systemd[1]: Unit k3s.service entered failed state.
Nov 19 06:32:10 myhostname systemd[1]: k3s.service failed.

The gateway can also become unset after installation. When that happens, some of the pods go down, and remote access to them (from the local LAN in the same subnet, through a service backed by the affected pods) becomes unavailable.

I believe this is caused by the code that determines which IP address to use for the node.
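For illustration, the fatal error above reports that no default route was found in /proc/net/route or /proc/net/ipv6_route. The function below is a minimal sketch of that kind of check, not the actual k3s or Kubernetes code: it prints the interface of the first IPv4 route whose Destination column is 00000000, and prints nothing when there is no default route. The name `default_route_iface` is hypothetical, and IPv6 is not handled.

```shell
# Hypothetical helper (not part of k3s): print the interface that carries an
# IPv4 default route, or nothing if there is none.
# /proc/net/route has a header line, then one route per line:
#   Iface  Destination  Gateway  Flags  RefCnt  Use  Metric  Mask  MTU  Window  IRTT
# Addresses are raw 32-bit hex values; a default route has Destination 00000000.
default_route_iface() {
    route_file=${1:-/proc/net/route}   # pass another file to test against a sample
    awk 'NR > 1 && $2 == "00000000" { print $1; exit }' "$route_file"
}
```

On a host with no gateway set, `default_route_iface` prints nothing, which corresponds to the "no default routes found" case in the log.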

I have tried setting --advertise-address to the node's IP and to 127.0.0.1, and I tried the same with --bind-address.

To Reproduce
Attempt an air-gapped install with no network gateway set.

Expected behavior
The install succeeds and all normal functionality works. After installation, if the gateway is removed, all normal functionality continues to work.

Actual behavior
The install fails, and the service errors out as shown above.

@erikwilson
Contributor

Sorry, it used to be documented that a default route is needed for air-gapped installs. The docs said something like this:


If networking is completely disabled (i.e. Ethernet unplugged or Wi-Fi disconnected), k3s may not be able to start, in which case it may be necessary to add a default route. For example:

sudo ip -c address add 192.168.123.123/24 dev eno1
sudo ip route add default via 192.168.123.1

We should investigate what flags are needed to work without a default route, add better docs, and maybe check for a default route in check-config.
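The commands in the quoted docs are not idempotent: running `ip route add default` when a default route already exists fails with "File exists". A minimal sketch of a boot-time wrapper, assuming a hypothetical function name `ensure_default_route` and that the interface already has an address in the gateway's subnet (otherwise the kernel rejects the route):

```shell
# Hypothetical boot-time helper: add a default route only if none exists, so
# the workaround from the old docs can be re-run safely.
# Arguments: gateway address, device (e.g. 192.168.123.1 eno1).
ensure_default_route() {
    gateway=$1
    dev=$2
    if [ -n "$(ip route show default)" ]; then
        return 0   # a default route is already present; leave it alone
    fi
    ip route add default via "$gateway" dev "$dev"
}
```

For example, `ensure_default_route 192.168.123.1 eno1` run as root after the `ip address add` step.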

@danielbarron42
Author

Thank you for looking at this.

I have already scripted a check for a default route during install. For example:

# Abort early if there is no default route (k3s will not start without one)
if [ -z "$(ip route show default)" ]; then
    echo "default route missing" >&2
    exit 1
fi

Docs are nice. What would help the most is being able to install without a default route.

I have looked at the code and could not see anything "easy". The main problem is how to determine the IP of the node, especially on a host with more than one NIC. The simple answer is "it's the one with the default route", so I can see why it currently works like that.

davidnuzik added the [zube]: To Triage and kind/documentation labels on Nov 25, 2019
davidnuzik added this to the v1.x - Backlog milestone on Nov 25, 2019
@danielbarron42
Author

danielbarron42 commented Jan 8, 2020

I have found a workaround that allows a truly air-gapped installation:

#/etc/sysconfig/network-scripts/ifcfg-tap0
DEVICE=tap0
ONBOOT=yes
BOOTPROTO=none
TYPE=Tap
DEFROUTE=no
IPADDR=10.243.255.254
PREFIX=32

ifup tap0
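On distributions without /etc/sysconfig/network-scripts, a roughly equivalent tap0 can be created with iproute2. This is a minimal, non-persistent sketch: the `create_tap` name is hypothetical, its defaults are taken from the ifcfg file above, it must run as root, and it has to be repeated after every reboot (for example from a boot-time unit).

```shell
# Hypothetical helper: create a tap interface with a single /32 address,
# mirroring the ifcfg-tap0 settings above. Stops at the first failing step.
create_tap() {
    name=${1:-tap0}
    addr=${2:-10.243.255.254/32}
    ip tuntap add dev "$name" mode tap &&   # no process needs to hold the device open
        ip address add "$addr" dev "$name" &&
        ip link set dev "$name" up
}
```

With no arguments it creates tap0 with 10.243.255.254/32; `create_tap tap1 10.243.255.253/32` would create a second one.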

Then add --flannel-iface tap0 to the ExecStart=/usr/local/bin/k3s server line in /etc/systemd/system/k3s.service.

Alternatively, just add --flannel-iface <your eth interface>.
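Rather than editing /etc/systemd/system/k3s.service directly, the flag could also be added with a systemd drop-in. A minimal sketch, assuming a hypothetical helper name and drop-in file name (10-flannel-iface.conf); the empty `ExecStart=` line is needed because systemd only allows multiple ExecStart= settings for Type=oneshot services, so the original value has to be cleared first.

```shell
# Hypothetical helper: write a systemd drop-in that redefines ExecStart with
# --flannel-iface appended, keeping the change out of the generated unit file.
# Arguments: drop-in directory, interface, then the existing k3s server flags.
write_k3s_dropin() {
    dir=$1
    iface=$2
    shift 2
    mkdir -p "$dir" || return 1
    {
        echo "[Service]"
        echo "ExecStart="   # empty assignment clears ExecStart= from k3s.service
        echo "ExecStart=/usr/local/bin/k3s server $* --flannel-iface $iface"
    } > "$dir/10-flannel-iface.conf"
}
```

For example: `write_k3s_dropin /etc/systemd/system/k3s.service.d tap0 --write-kubeconfig-mode 664 --no-deploy traefik --docker --cluster-cidr 10.244.0.0/16`, then `systemctl daemon-reload` and `systemctl restart k3s`. Flags are joined with spaces, so values containing spaces or quotes would need extra care.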

I traced the problem to flannel needing to know which IP address to bind to: to find it, flannel looks for the interface that has the default route and takes that interface's IP address. When the interface is specified explicitly, no default route is needed. By specifying a tap interface, you don't even need an Ethernet interface that is up with an IP.
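The selection logic described above can be modelled roughly as follows. This is a simplified sketch, not flannel's implementation: `pick_node_ip` (a hypothetical name) uses the interface passed to it, as --flannel-iface does, and otherwise falls back to the interface named in the default route; with neither, it fails, which mirrors the startup failure in this issue. It relies on `ip route show default` and `ip -4 -o address show dev IFACE` from iproute2.

```shell
# Simplified model (not flannel's code): choose an interface, then its IPv4 address.
pick_node_ip() {
    iface=${1:-}   # explicit interface, like --flannel-iface
    if [ -z "$iface" ]; then
        # "default via 192.168.1.1 dev eth0 ..." -> eth0
        iface=$(ip route show default |
            awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }')
        if [ -z "$iface" ]; then
            echo "no default route and no interface given" >&2
            return 1
        fi
    fi
    # "2: eth0    inet 192.168.1.10/24 brd ..." -> 192.168.1.10
    ip -4 -o address show dev "$iface" | awk '{ split($4, a, "/"); print a[1]; exit }'
}
```

With tap0 set up as above, `pick_node_ip tap0` would print 10.243.255.254 whether or not a default route exists, while `pick_node_ip` with no argument fails on a host without a gateway.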

It would be better if k3s did some of this itself and didn't rely on a default route being present. So I don't think this is only a documentation defect/feature request.

@branttaylor

Interested in this too. My use case is that I periodically keep my RPi k3s cluster off any network (in a camper), then bring it back home and put it back on my home network. k3s works great when connected to my home network, but when it has no network connection (and therefore no default gateway), the k3s service fails to start.

@erulabs
Contributor

erulabs commented Sep 2, 2022

It would be a real benefit if k3s could start more reliably under different network conditions. I'm testing @danielbarron42's suggestion, but this is currently a real foot-gun with k3s!

One note that I haven't fully confirmed yet: starting the cluster with --cluster-cidr= seems to sidestep this issue.

@danielbarron42
Author

I have had k3s working reliably air-gapped and without a gateway for years using the workaround above. Recently I found that a couple of other things I do are required as well. This is what I do:

--flannel-iface tap0 (see my workaround above for how to create the tap0 interface)
--kube-proxy-arg "proxy-mode=ipvs" (needed because of how it creates routes to pods and services, bypassing the need for a gateway)
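The Kubernetes IPVS proxy documentation lists ip_vs, ip_vs_rr, ip_vs_wrr, ip_vs_sh and nf_conntrack (nf_conntrack_ipv4 on older kernels) as kernel modules that IPVS mode needs. A minimal pre-flight sketch, with a hypothetical function name, that reports which of them are not loaded. Modules built into the kernel do not show up in /proc/modules, so an entry in the output is a hint to run modprobe, not proof that IPVS is unavailable.

```shell
# Hypothetical pre-flight check: print each IPVS-related module that is not
# listed in the modules file (default /proc/modules), one per line.
missing_ipvs_modules() {
    modules_file=${1:-/proc/modules}
    for m in ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack; do
        # /proc/modules lines start with the module name followed by a space
        if ! grep -q "^$m " "$modules_file"; then
            echo "$m"
        fi
    done
}
```

For example, `for m in $(missing_ipvs_modules); do modprobe "$m"; done` run as root before starting k3s.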

I make sure the coredns ConfigMap doesn't have a forward directive if no DNS server is configured (which I would expect when air-gapped); otherwise coredns won't start.
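A minimal sketch of removing the forward directive, assuming it is a single line in the Corefile (as in `forward . /etc/resolv.conf`). The name `strip_forward` is hypothetical, and multi-line `forward ... { }` blocks are not handled.

```shell
# Hypothetical filter: delete single-line "forward" directives from a Corefile,
# or from the coredns ConfigMap YAML that embeds it. Reads stdin, writes stdout.
strip_forward() {
    sed '/^[[:space:]]*forward[[:space:]]/d'
}
```

For example: `kubectl -n kube-system get configmap coredns -o yaml | strip_forward | kubectl replace -f -`. k3s may re-apply its bundled coredns manifest when the server restarts, in which case the edit would need to be repeated.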

I also set:
--cluster-cidr 10.242.0.0/16 --service-cidr 10.243.0.0/16 --cluster-dns 10.243.0.10
But I haven't tested whether that is required for air-gapped use; I set it only to avoid overlapping with some local networks.
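To check whether chosen --cluster-cidr / --service-cidr values overlap a local network, a small sketch could compare both ranges on the bits of the shorter prefix, since two CIDR blocks overlap exactly when they agree there. The function names are hypothetical; it is IPv4 only and assumes 64-bit shell arithmetic (as in bash, or dash on a 64-bit host).

```shell
# Hypothetical helper: convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1   # split "a.b.c.d" into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Hypothetical helper: succeed (exit 0) if two a.b.c.d/len ranges overlap.
cidr_overlap() {
    len1=${1#*/}
    len2=${2#*/}
    len=$len1
    if [ "$len2" -lt "$len" ]; then len=$len2; fi
    # Mask for the shorter prefix; a /0 gives mask 0, which matches everything.
    mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    a=$(ip_to_int "${1%/*}")
    b=$(ip_to_int "${2%/*}")
    [ $(( a & mask )) -eq $(( b & mask )) ]
}
```

For example, `cidr_overlap 10.242.0.0/16 192.168.1.0/24` returns non-zero (no overlap), while `cidr_overlap 10.242.0.0/16 10.242.5.0/24` returns zero. Both arguments must be in a.b.c.d/len form.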

dereknola self-assigned this on Feb 27, 2023
Automation in the Development [DEPRECATED] project moved this from Backlog to Done Issue on Mar 1, 2023
Projects
Status: Closed (archived in project)

Development: No branches or pull requests

7 participants