
Installing in an Alpine VM error #1387

Closed
MichelDiz opened this issue Feb 5, 2020 · 18 comments

@MichelDiz

MichelDiz commented Feb 5, 2020

Version:

k3s version v1.17.2+k3s1 (cdab19b0)

OS details:

localhost:~# cat /etc/os-release
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.11.3
PRETTY_NAME="Alpine Linux v3.11"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://bugs.alpinelinux.org/"

Describe the bug
It's probably not a bug, I guess (maybe a support question?); the issue is that the installation failed.
I just want to install k3s in a VM before setting up any hardware.

localhost:~# k3s kubectl get node
The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?
k3s server
INFO[2020-02-05T18:21:10.367578680Z] Starting k3s.cattle.io/v1, Kind=Addon controller 
INFO[2020-02-05T18:21:10.367703324Z] Waiting for master node  startup: resource name may not be empty 
INFO[2020-02-05T18:21:10.369016512Z] Node token is available at /var/lib/rancher/k3s/server/token 
INFO[2020-02-05T18:21:10.369043523Z] To join node to cluster: k3s agent -s https://10.0.2.15:6443 -t ${NODE_TOKEN} 
I0205 18:21:10.382825    3195 controller.go:606] quota admission added evaluator for: addons.k3s.cattle.io
2020-02-05 18:21:10.401164 I | http: TLS handshake error from 127.0.0.1:39748: remote error: tls: bad certificate
INFO[2020-02-05T18:21:10.405754261Z] Wrote kubeconfig /etc/rancher/k3s/k3s.yaml   
INFO[2020-02-05T18:21:10.405781202Z] Run: k3s kubectl                             
INFO[2020-02-05T18:21:10.405786573Z] k3s is up and running                        
WARN[2020-02-05T18:21:10.405831594Z] Failed to find cpuset cgroup, you may need to add "cgroup_enable=cpuset" to your linux cmdline (/boot/cmdline.txt on a Raspberry Pi) 
ERRO[2020-02-05T18:21:10.405840404Z] Failed to find memory cgroup, you may need to add "cgroup_memory=1 cgroup_enable=memory" to your linux cmdline (/boot/cmdline.txt on a Raspberry Pi) 
FATA[2020-02-05T18:21:10.405849805Z] failed to find memory cgroup, you may need to add "cgroup_memory=1 cgroup_enable=memory" to your linux cmdline (/boot/cmdline.txt on a Raspberry Pi) 
localhost:~# cat /boot/cmdline.txt
cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory
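The fatal "failed to find memory cgroup" above comes down to whether the memory controller is registered and enabled on the host. A minimal sketch of an equivalent check against `/proc/cgroups` (this is not k3s's actual code, and the sample line below is hypothetical):

```shell
# Hedged sketch: k3s's real check differs, but the gist is whether the
# "memory" controller appears in /proc/cgroups with its enabled flag set.
memory_cgroup_enabled() {
  # Columns of /proc/cgroups: subsys_name hierarchy num_cgroups enabled
  awk '$1 == "memory" { found = 1; enabled = $4; exit }
       END { exit !(found && enabled) }' "$1"
}

# Simulated /proc/cgroups extract (hypothetical values):
printf 'memory\t0\t86\t1\n' > /tmp/cgroups.sample
memory_cgroup_enabled /tmp/cgroups.sample && echo "memory cgroup enabled"
```

If this check fails even with the right kernel cmdline options, the controller exists but nothing has mounted the cgroup hierarchy yet, which is exactly the OpenRC situation discussed below.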

Other behavior also occurs.

When I reboot, it shows:

localhost:~# k3s server
INFO[2020-02-05T18:29:26.546848476Z] Starting k3s v1.17.2+k3s1 (cdab19b0)         
INFO[2020-02-05T18:29:26.547068084Z] Cluster bootstrap already complete           
FATA[2020-02-05T18:29:26.555324035Z] starting kubernetes: preparing server: start cluster and https: listen tcp :6443: bind: address already in use 
localhost:~# k3s kubectl get node
The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?

To Reproduce
Install alpine-virt-3.11.3-x86_64.iso on VirtualBox
Then install k3s.

More details

localhost:~# k3s check-config

Verifying binaries in /var/lib/rancher/k3s/data/7c4aaa633ac3ff4849166ba2759da158a70beb5734940e84b6e67011a35f4c59/bin:
- sha256sum: good
- links: good

System:
- /var/lib/rancher/k3s/data/7c4aaa633ac3ff4849166ba2759da158a70beb5734940e84b6e67011a35f4c59/bin iptables v1.8.3 (legacy): ok
- swap: should be disabled
- routes: ok

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000

info: reading kernel config from /proc/config.gz ...

Generally Necessary:
- cgroup hierarchy: nonexistent?? (fail)
    (see https://github.com/tianon/cgroupfs-mount)
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled (as module)
- CONFIG_BRIDGE_NETFILTER: enabled (as module)
- CONFIG_NF_NAT_IPV4: missing (fail)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled (as module)
- CONFIG_IP_NF_NAT: enabled (as module)
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_NF_NAT_NEEDED: missing (fail)
- CONFIG_POSIX_MQUEUE: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: enabled
- CONFIG_NET_CLS_CGROUP: enabled (as module)
- CONFIG_CGROUP_NET_PRIO: enabled
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: missing
- CONFIG_IP_NF_TARGET_REDIRECT: enabled (as module)
- CONFIG_IP_SET: enabled (as module)
- CONFIG_IP_VS: enabled (as module)
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_PROTO_TCP: enabled
- CONFIG_IP_VS_PROTO_UDP: enabled
- CONFIG_IP_VS_RR: enabled (as module)
- CONFIG_EXT4_FS: enabled (as module)
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled (as module)
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled (as module)
      - CONFIG_CRYPTO_GCM: enabled (as module)
      - CONFIG_CRYPTO_SEQIV: enabled (as module)
      - CONFIG_CRYPTO_GHASH: enabled (as module)
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled (as module)
      - CONFIG_XFRM_ALGO: enabled (as module)
      - CONFIG_INET_ESP: enabled (as module)
      - CONFIG_INET_XFRM_MODE_TRANSPORT: missing
- Storage Drivers:
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled (as module)

STATUS: 3 (fail)
@brandond
Contributor

brandond commented Feb 5, 2020

Did you just download the k3s binary and run it with k3s server or did you use the install script? The output from your attempt at starting it manually after a reboot suggests that you used the install script and it's already running (as a service):
FATA[2020-02-05T18:29:26.555324035Z] starting kubernetes: preparing server: start cluster and https: listen tcp :6443: bind: address already in use

@MichelDiz
Author

MichelDiz commented Feb 5, 2020

Yeah, I let the script install normally. However, I thought that running it manually again would have an effect (because I thought k3s was not running). On the first install (via the script), it just gave a bunch of errors. So I added the options to /boot/cmdline.txt and the errors got fewer.

But the point is, shouldn't it install just fine on Alpine?

BTW, it feels like Alpine doesn't have the cgroup hierarchy at all ("cgroup hierarchy: nonexistent?? (fail)").

But Alpine has OpenRC, which supports a kind of cgroups. So it seems k3s just isn't aware of it.

@brandond
Contributor

brandond commented Feb 5, 2020

OK, if you installed it with the script, then you should not be running k3s server manually. Start or stop the service via the init script.

My understanding is that the install script just sets up and then starts the service, so even if the service crashed the first time due to missing cgroups, it still started automatically after you rebooted with the right kernel options. You can tell it's running because your attempt to start a second copy of the server failed to bind to the port; it's already in use by the copy started by the init script.

That said, the cluster bootstrap may have only partially completed due to the missing cgroups. I would run k3s-uninstall.sh (should have been created by the install script), and then reinstall from scratch, now that you've got the cgroups configured properly. You may want to also finish configuring the network stack (give it a permanent IP address, set the hostname, etc) to avoid further problems down the road.

@galal-hussein
Contributor

@MichelDiz Thanks for submitting the issue. I agree with @brandond: the "address already in use" error you are seeing is due to running k3s server manually. Please let us know whether removing k3s and reinstalling after properly configuring the cgroups works correctly.

@MichelDiz
Author

Well, I've started it from scratch, but no success.

I also tried removing it and doing what brandond mentioned (no success).

It seems to me that this is a bug. The installation script supports OpenRC (see https://github.com/rancher/k3s/blob/0374c4f63d056df01c3e9e8cf4d77a6461169070/install.sh#L91). However, on this line https://github.com/rancher/k3s/blob/4cacffd7e64451b6b199c0561562c55510e46db3/contrib/util/check-config.sh#L313 the check against $cgroupSubsystemDir is wrong.
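For context, that check derives a cgroup directory from the mount table. A simplified sketch (not the actual check-config.sh code; the sample mount lines are hypothetical) of why an OpenRC-style layout, which mounts a named `openrc` hierarchy rather than per-controller mounts, can trip up such a check:

```shell
# Hedged sketch: derive a cgroup parent dir from a /proc/mounts-style file.
find_cgroup_dir() {
  # Print the parent dir of the first cgroup (v1) mount found in the file.
  awk '$3 == "cgroup" { print $2; exit }' "$1" | xargs -r dirname
}

# Simulated /proc/mounts contents for a host where OpenRC mounted cgroups:
cat > /tmp/mounts.sample <<'EOF'
cgroup_root /sys/fs/cgroup tmpfs rw,nosuid,nodev,noexec,relatime 0 0
openrc /sys/fs/cgroup/openrc cgroup rw,nosuid,nodev,noexec,name=openrc 0 0
EOF

find_cgroup_dir /tmp/mounts.sample   # prints /sys/fs/cgroup
```

Note there is no `cpuset` or `memory` controller mount in the sample, only the named `openrc` hierarchy, so any per-subsystem lookup under the detected directory would come up empty.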

I don't know how to set up cgroups correctly in the context of k3s and Alpine. I did some research and couldn't find a solution. So I believe it is a bug, since the script does not anticipate this problem with OpenRC (or perhaps it is specific to Alpine).

Maybe the point is whether or not to support Alpine at all.

Maybe I'll just give up.

localhost:/boot# curl -sfL https://get.k3s.io | sh -
[INFO]  Finding latest release
[INFO]  Using v1.17.2+k3s1 as release
[INFO]  Downloading hash https://github.com/rancher/k3s/releases/download/v1.17.2+k3s1/sha256sum-amd64.txt
[INFO]  Downloading binary https://github.com/rancher/k3s/releases/download/v1.17.2+k3s1/k3s
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Creating /usr/local/bin/kubectl symlink to k3s
[INFO]  Creating /usr/local/bin/crictl symlink to k3s
[INFO]  Creating /usr/local/bin/ctr symlink to k3s
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO]  env: Creating environment file /etc/rancher/k3s/k3s.env
[INFO]  openrc: Creating service file /etc/init.d/k3s
[INFO]  openrc: Enabling k3s service for default runlevel
[INFO]  openrc: Starting k3s
 * Caching service dependencies ...                                                                             [ ok ]
 * Starting k3s ...                                                                                             [ ok ]
localhost:/boot# k3s kubectl get node
The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?
localhost:/boot# rc-status
Runlevel: default
 acpid                          [  started  ]
 chronyd                        [  started  ]
 crond                          [  started  ]
 sshd                           [  started  ]
 k3s                            [  failed   ]
Dynamic Runlevel: hotplugged
Dynamic Runlevel: needed/wanted
 sysfs                          [  started  ]
 fsck                           [  started  ]
 root                           [  started  ]
 localmount                     [  started  ]
Dynamic Runlevel: manual
 k3s                            [  failed   ]

Logs from the system

localhost:/# cat k3s.log
(...)
I0205 22:02:53.184921    2645 cache.go:39] Caches are synced for APIServiceRegistrationController controller
I0205 22:02:53.184956    2645 cache.go:39] Caches are synced for AvailableConditionController controller
I0205 22:02:53.184941    2645 cache.go:39] Caches are synced for autoregister controller
I0205 22:02:53.185196    2645 shared_informer.go:204] Caches are synced for crd-autoregister 
I0205 22:02:53.185253    2645 shared_informer.go:204] Caches are synced for cluster_authentication_trust_controller 
E0205 22:02:53.209318    2645 controller.go:150] Unable to perform initial Kubernetes service initialization: Service "kubernetes" is invalid: spec.clusterIP: Invalid value: "10.43.0.1": cannot allocate resources of type serviceipallocations at this time
E0205 22:02:53.211639    2645 controller.go:155] Unable to remove old endpoints from kubernetes service: StorageError: key not found, Code: 1, Key: /registry/masterleases/10.0.2.15, ResourceVersion: 0, AdditionalErrorMsg: 
I0205 22:02:54.084319    2645 controller.go:107] OpenAPI AggregationController: Processing item 
I0205 22:02:54.084369    2645 controller.go:130] OpenAPI AggregationController: action for item : Nothing (removed from the queue).
I0205 22:02:54.084624    2645 controller.go:130] OpenAPI AggregationController: action for item k8s_internal_local_delegation_chain_0000000000: Nothing (removed from the queue).
I0205 22:02:54.090307    2645 storage_scheduling.go:133] created PriorityClass system-node-critical with value 2000001000
(...)
(...)
2020-02-05 22:02:56.446982 I | http: TLS handshake error from 127.0.0.1:34982: remote error: tls: bad certificate
time="2020-02-05T22:02:56.451657928Z" level=info msg="Wrote kubeconfig /etc/rancher/k3s/k3s.yaml"
time="2020-02-05T22:02:56.451674698Z" level=info msg="Run: k3s kubectl"
time="2020-02-05T22:02:56.451680268Z" level=info msg="k3s is up and running"
time="2020-02-05T22:02:56.451721450Z" level=warning msg="Failed to find cpuset cgroup, you may need to add \"cgroup_enable=cpuset\" to your linux cmdline (/boot/cmdline.txt on a Raspberry Pi)"
time="2020-02-05T22:02:56.451730290Z" level=error msg="Failed to find memory cgroup, you may need to add \"cgroup_memory=1 cgroup_enable=memory\" to your linux cmdline (/boot/cmdline.txt on a Raspberry Pi)"
time="2020-02-05T22:02:56.451740610Z" level=fatal msg="failed to find memory cgroup, you may need to add \"cgroup_memory=1 cgroup_enable=memory\" to your linux cmdline (/boot/cmdline.txt on a Raspberry Pi)"
time="2020-02-05T22:03:01.624779744Z" level=info msg="Starting k3s v1.17.2+k3s1 (cdab19b0)"
time="2020-02-05T22:03:01.625200709Z" level=info msg="Cluster bootstrap already complete"
time="2020-02-05T22:03:01.635023089Z" level=info msg="Kine listening on unix://kine.sock"
(...)
(...)
I0205 22:04:28.496762    3217 node_lifecycle_controller.go:77] Sending events to api server
I0205 22:04:28.496801    3217 controllermanager.go:247] Started "cloud-node-lifecycle"
E0205 22:04:28.497565    3217 core.go:90] Failed to start service controller: the cloud provider does not support external load balancers
W0205 22:04:28.497575    3217 controllermanager.go:244] Skipping "service"
W0205 22:04:28.497580    3217 core.go:108] configure-cloud-routes is set, but cloud provider does not support routes. Will not configure cloud provider routes.
W0205 22:04:28.497583    3217 controllermanager.go:244] Skipping "route"
time="2020-02-05T22:04:28.597723858Z" level=info msg="Starting k3s.cattle.io/v1, Kind=Addon controller"
time="2020-02-05T22:04:28.597879715Z" level=info msg="Waiting for master node  startup: resource name may not be empty"
time="2020-02-05T22:04:28.598052143Z" level=info msg="Node token is available at /var/lib/rancher/k3s/server/token"
time="2020-02-05T22:04:28.598073773Z" level=info msg="To join node to cluster: k3s agent -s https://10.0.2.15:6443 -t ${NODE_TOKEN}"
I0205 22:04:28.631159    3217 controller.go:606] quota admission added evaluator for: addons.k3s.cattle.io
2020-02-05 22:04:28.644870 I | http: TLS handshake error from 127.0.0.1:35618: remote error: tls: bad certificate
time="2020-02-05T22:04:28.649571725Z" level=info msg="Wrote kubeconfig /etc/rancher/k3s/k3s.yaml"
time="2020-02-05T22:04:28.649587515Z" level=info msg="Run: k3s kubectl"
time="2020-02-05T22:04:28.649592916Z" level=info msg="k3s is up and running"
time="2020-02-05T22:04:28.649632777Z" level=warning msg="Failed to find cpuset cgroup, you may need to add \"cgroup_enable=cpuset\" to your linux cmdline (/boot/cmdline.txt on a Raspberry Pi)"
time="2020-02-05T22:04:28.649641828Z" level=error msg="Failed to find memory cgroup, you may need to add \"cgroup_memory=1 cgroup_enable=memory\" to your linux cmdline (/boot/cmdline.txt on a Raspberry Pi)"
time="2020-02-05T22:04:28.649652468Z" level=fatal msg="failed to find memory cgroup, you may need to add \"cgroup_memory=1 cgroup_enable=memory\" to your linux cmdline (/boot/cmdline.txt on a Raspberry Pi)"

@galal-hussein
Contributor

@MichelDiz Is this Alpine on an RPi? Also, did you make sure to reboot after enabling the cgroups in cmdline?

@dweomer
Contributor

dweomer commented Feb 5, 2020

@MichelDiz Is this Alpine on an RPi? Also, did you make sure to reboot after enabling the cgroups in cmdline?

you'll also want to make sure the cgroups service is started with something like:

rc-update add cgroups boot

or you could make the k3s service explicitly depend on it with:

echo 'rc_want=cgroups' > /etc/conf.d/k3s
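For reference, the same dependency can be expressed inside the init script itself. A hedged sketch of what such a `depend()` block might look like in OpenRC (the actual fix later merged in #1354 may differ):

```shell
# Hypothetical excerpt from /etc/init.d/k3s (OpenRC init script).
# 'want' is a soft dependency: start cgroups first if it is available.
depend() {
    want cgroups    # equivalent to rc_want=cgroups in /etc/conf.d/k3s
}
```

The `conf.d` approach above has the advantage of not touching a file the install script owns, which is why it works as a user-side workaround.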

@MichelDiz
Author

@MichelDiz Is this Alpine on an RPi? Also, did you make sure to reboot after enabling the cgroups in cmdline?

No, as I said in the issue description, it is a VM. I was just trying it out before setting up a hardware cluster.

@dweomer after

echo 'rc_want=cgroups' > /etc/conf.d/k3s

it worked fine after a reboot!

localhost:~# k3s kubectl get node
NAME        STATUS   ROLES    AGE     VERSION
localhost   Ready    master   8m38s   v1.17.2+k3s1

The first command you shared, rc-update add ..., I had tried before with no luck.

It feels like this needs to go into the script. If not, feel free to close this.

Thank you all for your help! I'm very happy that the solution is that simple.

Cheers.

@erikwilson
Contributor

#1354 (merged just now) should also fix this by adding want cgroups to the OpenRC init script.

@MichelDiz
Author

Nice, I can test it out. Is it live?

@erikwilson
Contributor

Yes, thank you @MichelDiz! :)

@MichelDiz
Author

MichelDiz commented Feb 6, 2020

@erikwilson It worked, but I had to reboot. Wouldn't it be possible to automate this? Just as we have systemctl daemon-reload, there should be an equivalent. If a reboot really is needed, that should be documented.

@erikwilson
Contributor

Thanks for testing @MichelDiz, I don't think it should need a reboot. To test, I used the generic/alpine10 image with Vagrant and installed with the old install script via

curl -sfL https://raw.githubusercontent.com/rancher/k3s/0374c4f63d056df01c3e9e8cf4d77a6461169070/install.sh | sh -

verified the cgroup error, and re-ran the install with:
curl -sfL https://get.k3s.io | sh -
which will automatically restart the k3s service (and load the cgroups service /etc/init.d/cgroups in the process).

Sounds like something else is going on. If you are able to reproduce, please share the error from /var/log/k3s.log and check the output of the install script, which should end with something like:

[INFO]  openrc: Creating service file /etc/init.d/k3s
[INFO]  openrc: Enabling k3s service for default runlevel
[INFO]  openrc: Starting k3s
 * Caching service dependencies ...                                                                                                  [ ok ]
 * Stopping k3s ...                                                                                                                  [ ok ]
 * Starting k3s ...

There may be a few seconds after starting with OpenRC during which kubectl will report The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port? due to differences between OpenRC and systemd.
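Rather than retrying kubectl by hand during that window, a small poll loop can absorb the startup delay. A minimal sketch (the 60-second budget and the `k3s kubectl get node` probe in the usage comment are illustrative, not from this thread):

```shell
# Poll an arbitrary command until it succeeds or a timeout elapses.
wait_for() {
  # usage: wait_for TIMEOUT_SECONDS command [args...]
  timeout="$1"; shift
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    [ "$elapsed" -ge "$timeout" ] && return 1
    sleep 1
    elapsed=$((elapsed + 1))
  done
}

# e.g.: wait_for 60 k3s kubectl get node
```

This is plain POSIX sh, so it works under BusyBox ash on Alpine.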

@MichelDiz
Author

Okay, I believe I was just too impatient to see the magic happen. I think we're done.

Thank you again for the effort!

Cheers.

(I'll let you guys close it)

@cjellick
Contributor

This got backported to 1.17. @ShylajaDevadiga, testing this is low priority and should not block the 1.17.5 release, but I'm putting it in your queue just so you can minimally give it a read-through.

@chwzr

chwzr commented May 18, 2020

The cgroup fix worked for me, but your logs (and mine) show other errors:

  • CONFIG_NF_NAT_NEEDED: missing (fail)
  • CONFIG_NF_NAT_IPV4: missing (fail)

Is there something we may have to configure or install to get networking, especially NAT, working? Internal networking does not work: accessing a ClusterIP-based service from inside the cluster is not possible.

Any hints?

cheers,
chwzr

@brandond
Contributor

@chwzr see #1291 - these checks are outdated, as the modules have been renamed in newer kernels. I would guess that something else is broken in your environment.
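A quick way to see whether NAT support is actually present despite the outdated check is to accept either the legacy name or the one newer kernels consolidated it into (`CONFIG_NF_NAT_IPV4` was folded into `CONFIG_NF_NAT`). A hedged sketch; the sample config extract is hypothetical:

```shell
# Accept either the legacy per-family NAT option or the consolidated one,
# built-in (=y) or as a module (=m).
nat_config_ok() {
  grep -Eq '^CONFIG_(NF_NAT|NF_NAT_IPV4)=(y|m)$' "$1"
}

# Simulated kernel config extract (e.g. from zcat /proc/config.gz) for a
# newer kernel that no longer has CONFIG_NF_NAT_IPV4:
printf 'CONFIG_NF_NAT=m\nCONFIG_IP_NF_NAT=m\n' > /tmp/kconfig.sample
nat_config_ok /tmp/kconfig.sample && echo "NAT support present"
```

On a real Alpine host you would run this against the decompressed `/proc/config.gz` rather than the sample file.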

@rancher-max
Contributor

Validated using both the latest k3s version (v1.18.6+k3s1) and the v1.17 version mentioned here (v1.17.5+k3s1)

  • Ensured I see the cgroup error mentioned in the issue by doing an install of a previous version: curl -sfL https://raw.githubusercontent.com/rancher/k3s/0374c4f63d056df01c3e9e8cf4d77a6461169070/install.sh | sh -

  • Reinstalling while pointing to v1.17.5+k3s1 successfully resolves the error and gives a working cluster: curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.17.5+k3s1 sh -

  • Fresh installs using this version are also successful and do not have the cgroup error

  • Fresh installs using the default (latest) version are also successful (`curl -sfL https://get.k3s.io | sh -`)
