
Rancher single node won't start on M1 Mac with latest Docker Desktop #35930

Open
presidenten opened this issue Dec 20, 2021 · 10 comments
Labels
feature/rancher-docker-install release-note Note this issue in the milestone's release notes team/hostbusters The team that is responsible for provisioning/managing downstream clusters + K8s version support [zube]: To Triage

Comments

presidenten commented Dec 20, 2021

Rancher Server Setup

  • Rancher version: 2.6.2
  • Installation option (Docker install):
    • Docker single node install
  • Proxy/Cert Details: none

Environment

  • Mac M1 on macOS 12.0.1
  • Docker Desktop for Mac 4.3.1, Docker Engine 20.10.11

Describe the bug
Rancher crashes repeatedly, usually after 10-15 seconds, and thus never finishes starting.

I am getting lots of `[FATAL] k3s exited with: exit status 1` and `Unexpected watch close - watch lasted less than a second and no items received`. The exact error depends on how many times it has restarted; sometimes it installs CRDs and such before crashing.

To Reproduce

  • Start Rancher on an M1 Mac with the latest Docker Desktop version:
docker run -d --privileged --restart=unless-stopped --name rancher -p 4080:80 -p 4443:443 rancher/rancher:v2.6.2

Result

                                                                           ↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓
CONTAINER ID   IMAGE                    COMMAND           CREATED          STATUS                  PORTS                                         NAMES
c6b9cc8302f5   rancher/rancher:v2.6.2   "entrypoint.sh"   30 minutes ago   Up Less than a second   0.0.0.0:4080->80/tcp, 0.0.0.0:4443->443/tcp   rancher
                                                                           ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑

And

2021/12/20 08:31:38 [INFO] Waiting for k3s to start
time="2021-12-20T08:31:38Z" level=info msg="Acquiring lock file /var/lib/rancher/k3s/data/.lock"
time="2021-12-20T08:31:38Z" level=info msg="Preparing data dir /var/lib/rancher/k3s/data/d57e75cb49c3cfd88307a8669e8adcf6b7740b66d6125a45c00aaa54301a5746"
2021/12/20 08:31:39 [INFO] Waiting for k3s to start
exit status 1
2021/12/20 08:31:46 [FATAL] k3s exited with: exit status 1

Edit:
I have now verified that Rancher 2.6.2 works on Docker Desktop for Mac 4.2.0, but if I update to 4.3.1, it stops working.
Moving back to 4.2.0 makes it work again.


Expected Result
I expect the Rancher single-node container to start without crashing.

I have tried to prune containers, prune volumes, and even factory reset docker desktop.
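
(For reference, a sketch of the cleanup commands described above; the factory reset itself is done from the Docker Desktop Troubleshoot UI, not the CLI.)

# Destructive: removes all stopped containers and all unused local volumes.
docker container prune -f
docker volume prune -f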

Full logs until first crash

2021/12/20 08:31:37 [INFO] Rancher version v2.6.2 (64c748d16) is starting
2021/12/20 08:31:37 [INFO] Rancher arguments {ACMEDomains:[] AddLocal:true Embedded:false BindHost: HTTPListenPort:80 HTTPSListenPort:443 K8sMode:auto Debug:false Trace:false NoCACerts:false AuditLogPath:/var/log/auditlog/rancher-api-audit.log AuditLogMaxage:10 AuditLogMaxsize:100 AuditLogMaxbackup:10 AuditLevel:0 Features: ClusterRegistry:}
2021/12/20 08:31:37 [INFO] Listening on /tmp/log.sock
2021/12/20 08:31:37 [INFO] Running etcd --data-dir=management-state/etcd --heartbeat-interval=500 --election-timeout=5000
running etcd on unsupported architecture "arm64" since ETCD_UNSUPPORTED_ARCH is set
2021-12-20 08:31:37.558753 W | pkg/flags: unrecognized environment variable ETCD_URL=https://github.com/etcd-io/etcd/releases/download/v3.4.15/etcd-v3.4.15-linux-arm64.tar.gz
[WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
2021-12-20 08:31:37.558790 I | etcdmain: etcd Version: 3.4.15
2021-12-20 08:31:37.558792 I | etcdmain: Git SHA: aa7126864
2021-12-20 08:31:37.558810 I | etcdmain: Go Version: go1.12.17
2021-12-20 08:31:37.558816 I | etcdmain: Go OS/Arch: linux/arm64
2021-12-20 08:31:37.558820 I | etcdmain: setting maximum number of CPUs to 5, total number of available CPUs is 5
[WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
2021-12-20 08:31:37.559130 I | embed: name = default
2021-12-20 08:31:37.559140 I | embed: data dir = management-state/etcd
2021-12-20 08:31:37.559142 I | embed: member dir = management-state/etcd/member
2021-12-20 08:31:37.559143 I | embed: heartbeat = 500ms
2021-12-20 08:31:37.559144 I | embed: election = 5000ms
2021-12-20 08:31:37.559146 I | embed: snapshot count = 100000
2021-12-20 08:31:37.559151 I | embed: advertise client URLs = http://localhost:2379
2021-12-20 08:31:37.559208 W | pkg/fileutil: check file permission: directory "management-state/etcd" exist, but the permission is "drwxr-xr-x". The recommended permission is "-rwx------" to prevent possible unprivileged access to the data.
2021-12-20 08:31:37.564482 I | etcdserver: starting member 8e9e05c52164694d in cluster cdf818194e3a8c32
raft2021/12/20 08:31:37 INFO: 8e9e05c52164694d switched to configuration voters=()
raft2021/12/20 08:31:37 INFO: 8e9e05c52164694d became follower at term 0
raft2021/12/20 08:31:37 INFO: newRaft 8e9e05c52164694d [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
raft2021/12/20 08:31:37 INFO: 8e9e05c52164694d became follower at term 1
raft2021/12/20 08:31:37 INFO: 8e9e05c52164694d switched to configuration voters=(10276657743932975437)
2021-12-20 08:31:37.565972 W | auth: simple token is not cryptographically signed
2021-12-20 08:31:37.567814 I | etcdserver: starting server... [version: 3.4.15, cluster version: to_be_decided]
2021-12-20 08:31:37.568141 I | etcdserver: 8e9e05c52164694d as single-node; fast-forwarding 9 ticks (election ticks 10)
raft2021/12/20 08:31:37 INFO: 8e9e05c52164694d switched to configuration voters=(10276657743932975437)
2021-12-20 08:31:37.568395 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
2021-12-20 08:31:37.568794 I | embed: listening for peers on 127.0.0.1:2380
raft2021/12/20 08:31:38 INFO: 8e9e05c52164694d is starting a new election at term 1
raft2021/12/20 08:31:38 INFO: 8e9e05c52164694d became candidate at term 2
raft2021/12/20 08:31:38 INFO: 8e9e05c52164694d received MsgVoteResp from 8e9e05c52164694d at term 2
raft2021/12/20 08:31:38 INFO: 8e9e05c52164694d became leader at term 2
raft2021/12/20 08:31:38 INFO: raft.node: 8e9e05c52164694d elected leader 8e9e05c52164694d at term 2
2021-12-20 08:31:38.067422 I | etcdserver: setting up the initial cluster version to 3.4
2021-12-20 08:31:38.070460 N | etcdserver/membership: set the initial cluster version to 3.4
2021-12-20 08:31:38.070547 I | etcdserver/api: enabled capabilities for version 3.4
2021-12-20 08:31:38.070585 I | etcdserver: published {Name:default ClientURLs:[http://localhost:2379]} to cluster cdf818194e3a8c32
2021-12-20 08:31:38.071103 I | embed: ready to serve client requests
2021-12-20 08:31:38.074404 N | embed: serving insecure client requests on 127.0.0.1:2379, this is strongly discouraged!
2021/12/20 08:31:38 [INFO] Waiting for k3s to start
time="2021-12-20T08:31:38Z" level=info msg="Acquiring lock file /var/lib/rancher/k3s/data/.lock"
time="2021-12-20T08:31:38Z" level=info msg="Preparing data dir /var/lib/rancher/k3s/data/d57e75cb49c3cfd88307a8669e8adcf6b7740b66d6125a45c00aaa54301a5746"
2021/12/20 08:31:39 [INFO] Waiting for k3s to start
exit status 1
2021/12/20 08:31:46 [FATAL] k3s exited with: exit status 1
Oats87 (Contributor) commented Dec 20, 2021

@presidenten Are you able to run `docker cp <rancher-container>:/var/lib/rancher/k3s.log .` to copy k3s.log out of the Rancher container, and inspect those logs to determine why K3s won't start?

Alternatively, you can upload that log (or the crash from it) here.
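
(With the container name rancher from the reproduction command above, that would be:)

docker cp rancher:/var/lib/rancher/k3s.log .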

presidenten (Author) commented:

@Oats87, sure, I'll do so tomorrow, since I'm stuck in meetings all day today.

presidenten (Author) commented Dec 21, 2021

@Oats87 I found some space in the schedule today after all.

I start Rancher with:

docker run -d --privileged --restart=unless-stopped --name rancher -p 4080:80 -p 4443:443 rancher/rancher:v2.6.2

Then I copied the log after a while. After that I deleted the Docker container and updated Docker Desktop to 4.3.1, so I continued with the same Docker image.

For the 4.3.1 case I also set the Rancher log level to debug after starting the container:

docker exec rancher loglevel --set debug

k3s-docker-desktop-4.2.0.log
k3s-docker-desktop-4.3.1.log
docker-logs-rancher-docker-desktop-4.3.1.log

I noticed these lines today, which I seem to have missed when I opened the issue:

2021/12/21 14:48:56 [INFO] Running etcd --data-dir=management-state/etcd --heartbeat-interval=500 --election-timeout=5000
running etcd on unsupported architecture "arm64" since ETCD_UNSUPPORTED_ARCH is set
2021-12-21 14:48:56.147450 W | pkg/flags: unrecognized environment variable ETCD_URL=https://github.com/etcd-io/etcd/releases/download/v3.4.15/etcd-v3.4.15-linux-arm64.tar.gz

Seems pretty sus to me.

I couldn't find it in k3s.log, but I saw it in the terminal while looking at the logs, so I added those as well.

It's strange, because I run the exact same command to start Rancher.

For completeness I also deleted the Docker image and pulled a new one with `docker image pull rancher/rancher:v2.6.2 --platform linux/arm64/v8`, but I still get the same result of Rancher crashing all the time.

presidenten (Author) commented Dec 21, 2021

So I also did:

docker image pull rancher/rancher:v2.6.2 --platform linux/x86_64

Then I redid the test with log level debug, but I still hit the same issue.

Here are the logs:
k3s-docker-desktop-4.3.1-image-archx86.log

docker-logs-rancher-docker-desktop-4.3.1-archx86.log

@snasovich snasovich added the team/hostbusters The team that is responsible for provisioning/managing downstream clusters + K8s version support label Dec 22, 2021
@snasovich snasovich added this to the v2.6.4 - Triaged milestone Dec 22, 2021
throrin19 commented:

Same problem here on a MacBook Pro 2016 with Docker Desktop 4.3.2.

Oats87 (Contributor) commented Jan 3, 2022

This appears to be occurring with Docker Desktop > 4.3.0, as it has moved to cgroups v2, which is not directly supported by Rancher yet.
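
(A quick way to confirm which cgroup version the Docker engine is using; the format template assumes Docker 20.10+, which exposes CgroupVersion in docker info:)

docker info --format '{{.CgroupVersion}}'
# prints 2 on Docker Desktop >= 4.3.0, 1 on earlier versions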

The options to fix/work around this are:

  1. Build a custom rancher/rancher container with a modified entrypoint.sh that will evacuate the root cgroup. Instructions for this are below in this comment, but this is NOT a supported workaround and can easily lead to other problems. It has only been tested with v2.6.3 and will not work with v2.5.x versions of Rancher, as they currently use a version of K3s that is too old to support cgroups v2.
  2. Downgrade to a Docker Desktop version < 4.3.0.
  3. Use Rancher Desktop, as it still operates with cgroups v1 and will do so until Rancher supports cgroups v2.

It seems the best mitigation is going to be logic that attempts to evacuate the root cgroup when running in a containerized environment... and this will likely need to land in norman and get bumped into Rancher.

It's relatively easy to create a custom rancher/rancher container that will evacuate the root cgroup, thus allowing Rancher to start up.

Create an entrypoint.sh with evacuation logic cribbed from https://github.com/rancher/k3d/pull/579/files#diff-71e760f22ea8192fe65294b2330d4bd29fc3888fbf283ba4ac69fda1af3878dd and mark it executable, i.e. chmod +x entrypoint.sh:

#!/bin/bash
set -e

# Refuse to run unprivileged outside of Kubernetes; K3s needs /dev/kmsg etc.
if [ ! -e /run/secrets/kubernetes.io/serviceaccount ] && [ ! -e /dev/kmsg ]; then
    echo "ERROR: Rancher must be run with the --privileged flag when running outside of Kubernetes"
    exit 1
fi
rm -f /var/lib/rancher/k3s/server/cred/node-passwd
if [ -e /var/lib/rancher/etcd ] && [ ! -e /var/lib/rancher/k3s/server/db/etcd ]; then
  mkdir -p /var/lib/rancher/k3s/server/db
  ln -sf /var/lib/rancher/etcd /var/lib/rancher/k3s/server/db/etcd
  echo -n 'default' > /var/lib/rancher/k3s/server/db/etcd/name
fi
if [ -e /var/lib/rancher/k3s/server/db/etcd ]; then
  # Temporarily disable errexit so a failed cluster reset is reported
  # instead of silently aborting the whole entrypoint (with set -e active,
  # the $? check below would otherwise be unreachable).
  set +e
  k3s server --cluster-init --cluster-reset &> ./k3s-cluster-reset.log
  if [ $? -ne 0 ]; then
    echo "ERROR:" && cat ./k3s-cluster-reset.log
    rm -f /var/lib/rancher/k3s/server/db/reset-flag
  fi
  set -e
fi
if [ -x "$(command -v update-ca-certificates)" ]; then
  update-ca-certificates
fi
if [ -x "$(command -v c_rehash)" ]; then
  c_rehash
fi
if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
  echo "[$(date -Iseconds)] [CgroupV2 Fix] Evacuating Root Cgroup ..."
  # move the processes from the root group to the /init group,
  # otherwise writing subtree_control fails with EBUSY.
  mkdir -p /sys/fs/cgroup/init
  xargs -rn1 < /sys/fs/cgroup/cgroup.procs > /sys/fs/cgroup/init/cgroup.procs || :
  # enable controllers
  sed -e 's/ / +/g' -e 's/^/+/' <"/sys/fs/cgroup/cgroup.controllers" >"/sys/fs/cgroup/cgroup.subtree_control"
  echo "[$(date -Iseconds)] [CgroupV2 Fix] Done"
fi
exec tini -- rancher --http-listen-port=80 --https-listen-port=443 --audit-log-path=${AUDIT_LOG_PATH} --audit-level=${AUDIT_LEVEL} --audit-log-maxage=${AUDIT_LOG_MAXAGE} --audit-log-maxbackup=${AUDIT_LOG_MAXBACKUP} --audit-log-maxsize=${AUDIT_LOG_MAXSIZE} "${@}"

then create a Dockerfile:

FROM rancher/rancher:v2.6.3
COPY entrypoint.sh /usr/bin/entrypoint.sh

and running a docker build will produce a cgroups v2 compatible Rancher container.
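
For example (the image tag and host ports here are illustrative, not canonical):

docker build -t rancher-cgv2:v2.6.3 .
docker run -d --privileged --restart=unless-stopped --name rancher \
  -p 4080:80 -p 4443:443 rancher-cgv2:v2.6.3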

I have an example container at oats87/rancher:v2.6.3-cgv2, but I cannot stress enough that you should NOT use this container, as I cannot support it.

luctrate commented Mar 31, 2022

Any updates?
Same error on Debian 11:

2022/03/31 08:00:53 [INFO] Listening on /tmp/log.sock
2022/03/31 08:00:53 [INFO] Waiting for server to become available: Get "https://127.0.0.1:6443/version?timeout=15m0s": dial tcp 127.0.0.1:6443: connect: connection refused
2022/03/31 08:00:55 [INFO] Waiting for server to become available: Get "https://127.0.0.1:6443/version?timeout=15m0s": dial tcp 127.0.0.1:6443: connect: connection refused
2022/03/31 08:00:57 [INFO] Waiting for server to become available: Get "https://127.0.0.1:6443/version?timeout=15m0s": dial tcp 127.0.0.1:6443: connect: connection refused
2022/03/31 08:00:59 [INFO] Waiting for server to become available: Get "https://127.0.0.1:6443/version?timeout=15m0s": dial tcp 127.0.0.1:6443: connect: connection refused
2022/03/31 08:01:01 [INFO] Waiting for server to become available: the server is currently unable to handle the request
2022/03/31 08:01:03 [INFO] Waiting for server to become available: an error on the server ("apiserver not ready") has prevented the request from succeeding
2022/03/31 08:01:05 [INFO] Waiting for server to become available: an error on the server ("apiserver not ready") has prevented the request from succeeding
2022/03/31 08:01:07 [INFO] Waiting for server to become available: an error on the server ("apiserver not ready") has prevented the request from succeeding
2022/03/31 08:01:09 [INFO] Waiting for server to become available: an error on the server ("apiserver not ready") has prevented the request from succeeding
2022/03/31 08:01:19 [FATAL] k3s exited with: exit status 1

@snasovich snasovich modified the milestones: v2.6.5, v2.6.x Apr 15, 2022
yjqg6666 commented:

Any update on this issue? It is stopping me from trying Rancher out.

xrow commented Apr 28, 2022

The workaround also works on EL9:

podman run -d --restart=unless-stopped \
  --name rancher \
  -p 80:80 -p 443:443 \
  --privileged \
  docker.io/oats87/rancher:v2.6.3-cgv2

xrow commented Jul 21, 2022

I ended up using K3s, testing with the Rancher 2.6.7-rc3 Helm chart, on CentOS 9.
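
(A sketch of what such a Helm install typically looks like; the hostname is illustrative, cert-manager is assumed to already be installed, and --devel is assumed to be needed since 2.6.7-rc3 is a pre-release chart version.)

helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
helm repo update
kubectl create namespace cattle-system
helm install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=rancher.example.com \
  --devel --version 2.6.7-rc3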

@Jono-SUSE-Rancher Jono-SUSE-Rancher added this to the v2.x - Backlog milestone Oct 30, 2023