Create cluster fails - kind-control-plane does not work on zfs #1719

Closed
johnlane opened this issue Jul 9, 2020 · 21 comments · Fixed by #1818
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor.

@johnlane

johnlane commented Jul 9, 2020

What happened: cluster create failed

failed to create cluster: failed to init node with kubeadm: command "docker exec --privileged kind-control-plane kubeadm init --ignore-preflight-errors=all --config=/kind/kubeadm.conf --skip-token-print --v=6" failed with error: exit status 1

What you expected to happen: successful cluster creation

How to reproduce it (as minimally and precisely as possible):

$ kind create cluster 

Anything else we need to know?:

I'm using ZFS. I have read the other (now closed as resolved) issue about using zfs and I can see that /dev/mapper is bind-mounted into the kind-control-plane container. I'm running 0.8.1 so I believe that I should have the version that should work with ZFS.

Environment:

  • kind version: (use kind version):
    kind v0.8.1 go1.14.2 linux/amd64

  • Kubernetes version: (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"archive", BuildDate:"2020-04-23T22:11:11Z", GoVersion:"go1.14.2", Compiler:"gc", Platform:"linux/amd64"}

  • Docker version: (use docker info):
    Server Version: 19.03.8-ce containerd version: d76c121f76a5fc8a462dc64594aea72fe18e1178.m

  • OS (e.g. from /etc/os-release): Arch Linux
    5.6.11-arch1-1 #1 SMP PREEMPT Wed, 06 May 2020 17:32:37 +0000 x86_64 GNU/Linux

To capture more information, I ran with kind create cluster --loglevel=debug --retain. Here is some further log data...

In kind-control-plane/containerd.log

Jul 08 12:53:09 kind-control-plane containerd[127]: time="2020-07-08T12:53:09.553579553Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:kube-apiserver-kind-control-plane,Uid:350cc499f8fb9468ea828e0b13035d50,Namespace:kube-system,Attempt:0,} failed, error" error="failed to create containerd task: failed to mount rootfs component &{overlay overlay [workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/82/work upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/82/fs lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1/fs]}: invalid argument: unknown"

Also it appears to be trying to use zfs but doesn't have the zfs executable:

Jul 09 14:57:36 kind-control-plane containerd[161]: time="2020-07-09T14:57:36.850165016Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.zfs\"..." type=io.containerd.snapshotter.v1
Jul 09 14:57:36 kind-control-plane containerd[161]: time="2020-07-09T14:57:36.850516545Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.zfs" error="exec: \"zfs\": executable file not found in $PATH: \"zfs fs list -Hp -o name,origin,used,available,mountpoint,compression,type,volsize,quota,referenced,written,logicalused,usedbydataset system/storage/docker\" => "
Jul 09 14:57:36 kind-control-plane containerd[161]: time="2020-07-09T14:57:36.850555778Z" level=info msg="loading plugin \"io.containerd.metadata.v1.bolt\"..." type=io.containerd.metadata.v1
Jul 09 14:57:36 kind-control-plane containerd[161]: time="2020-07-09T14:57:36.850594684Z" level=warning msg="could not use snapshotter zfs in metadata plugin" error="exec: \"zfs\": executable file not found in $PATH: \"zfs fs list -Hp -o name,origin,used,available,mountpoint,compression,type,volsize,quota,referenced,written,logicalused,usedbydataset system/storage/docker\" => "
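For anyone comparing their own retained logs against the excerpts above, here is a minimal sketch that counts the two symptoms in a containerd.log. The function name `summarize_containerd_log` is hypothetical and not part of kind; the match strings are copied from the log lines quoted above, and other containerd versions may word them differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of kind): count two symptoms in a containerd log.
# $1 is the path to a containerd.log, e.g. from a cluster created with --retain.
summarize_containerd_log() {
  log_file=$1
  # grep -c prints the number of matching lines; it exits non-zero when that
  # number is 0, so "|| true" keeps the helper usable under "set -e".
  overlay_failures=$(grep -c -F 'failed to mount rootfs component &{overlay' "$log_file" || true)
  zfs_plugin_failures=$(grep -c -F 'failed to load plugin io.containerd.snapshotter.v1.zfs' "$log_file" || true)
  echo "overlay mount failures: $overlay_failures"
  echo "zfs snapshotter plugin load failures: $zfs_plugin_failures"
}

# Example: summarize_containerd_log kind-control-plane/containerd.log
```

A non-zero first count suggests the overlay snapshotter cannot mount on the backing filesystem. The second count only says that the zfs plugin was skipped, which by itself may be harmless.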

@johnlane added the kind/bug label on Jul 9, 2020
@johnlane
Author

johnlane commented Jul 9, 2020

I made a new kind-control-plane image by copying zfs and the necessary library dependencies from my host into a running container which I committed to an image. I then ran with kind create cluster --loglevel=debug --retain --image=kind-control-plane-zfs. The zfs errors are gone but it is still trying to use overlayfs. I've attached a zip of the logs...
287006805.tar.gz

Adding for reference, the following is what I copied in...

$ docker cp /usr/bin/zfs kind-control-plane:/usr/bin/zfs
$ docker cp /usr/lib/libnvpair.so.1 kind-control-plane:/usr/lib/
$ docker cp /usr/lib/libnvpair.so.1.0.1 kind-control-plane:/usr/lib/
$ docker cp /usr/lib/libuutil.so.1 kind-control-plane:/usr/lib/
$ docker cp /usr/lib/libuutil.so.1.0.1 kind-control-plane:/usr/lib/
$ docker cp /usr/lib/libzfs.so.2 kind-control-plane:/usr/lib/
$ docker cp /usr/lib/libzfs.so.2.0.0 kind-control-plane:/usr/lib/
$ docker cp /usr/lib/libzfs_core.so.1 kind-control-plane:/usr/lib/
$ docker cp /usr/lib/libzfs_core.so.1.0.0 kind-control-plane:/usr/lib/
$ docker cp /usr/lib/libtirpc.so.3 kind-control-plane:/usr/lib/
$ docker cp /usr/lib/libtirpc.so.3.0.0 kind-control-plane:/usr/lib/
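The list above was assembled by hand. As a rough sketch, the resolved library paths could instead be read from `ldd` output. The helper name `ldd_paths` is hypothetical, and the library name filter in the example comment is an assumption based on the files listed above.

```shell
#!/bin/sh
# Hypothetical helper: print the resolved library paths from `ldd` output on stdin.
# Resolved lines look like "libzfs.so.2 => /usr/lib/libzfs.so.2 (0x...)".
# Lines without "=>" (vdso, the dynamic loader) and "not found" entries are skipped.
ldd_paths() {
  awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
}

# Example (assumes the ZFS-related libraries match these name prefixes):
#   ldd /usr/bin/zfs | ldd_paths | grep -E 'lib(zfs|nvpair|uutil|tirpc)'
```

Filtering before copying matters: overwriting the container's own libc with the host's copy would probably break the node image.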

@BenTheElder
Member

one thing to consider: I'm not sure it's safe for us to ship zfs binaries with kind. Besides the legal questions, I'm not sure a ZFS binary works if it isn't shipped to match the DKMS module, and the kernel / module are going to come from the host.

in the short term you may have to run kind on some other filesystem that overlay functions on (most of them?)

@BenTheElder
Member

BenTheElder commented Jul 10, 2020

I don't think containerd/CRI will use ZFS unless forced to, kind is leaving the defaults here.

kind create cluster --config=config.yaml

config.yaml:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".containerd]
    snapshotter = "zfs"

@johnlane
Author

I get what you're saying about zfs and that is not a surprise given the whole thing surrounding the zfs license. It'd be very difficult to have zfs prepackaged inside the container without tying heavily to specific kernel/toolchain versions on the host.

I've also been trying k3d, and with that I have a wrapper script that uses docker-volume-loopback to create sparse, loop-mounted ext4 volumes that are passed into k3d. That works, but I don't know enough (anything?) about how kind works to port that approach to it.

I might have to put kind down until I can use another filesystem.

@BenTheElder
Member

I think licensing-wise we're actually probably fine, since we don't ship a kernel and wouldn't ship the DKMS or binary kernel module. It looks like Ubuntu does have a package for just the CLI utils, so we'd just ship that plus some automatic tweak to the config.

I'm having difficulty determining whether it's safe to run a zfs utils version that doesn't match the kernel module. I've never used them out of sync before (and barely at all).

Can you test if this works using the above cluster config to enable the patch? This is similar to the customization needed for microk8s

@BenTheElder
Member

if

cat <<EOF | kind create cluster --config=- --image=your-image-with-zfs-binaries
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".containerd]
    snapshotter = "zfs"
EOF

works, then we can look at automating this, (pending also if it's OK to ship the ZFS CLIs without regard to the host)

@BenTheElder self-assigned this on Jul 11, 2020
@cassandracomar

cassandracomar commented Jul 14, 2020

so setting the snapshotter to zfs doesn't work (it complains about a missing metadata.db when actually trying to take snapshots). however, I noticed this from the microk8s project: canonical/microk8s@a5ec1f9#diff-e263cbd0de8da1f880f701684ae8b035R35-R36

and sure enough,

cat <<EOF | kind create cluster --config=-                                 
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
containerdConfigPatches:
- |-
 [plugins."io.containerd.grpc.v1.cri".containerd]
 snapshotter = "native"
EOF

works without modifying the base image.
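To apply this workaround only on hosts that need it, here is a minimal sketch that picks the snapshotter from a filesystem type name and prints the matching cluster config. Both function names are hypothetical. The mapping (zfs to "native", everything else to containerd's default "overlayfs") is an assumption drawn from this thread, and this is not necessarily how kind itself handles it.

```shell
#!/bin/sh
# Hypothetical helper: map a filesystem type name to a containerd snapshotter.
# Assumption from this thread: overlayfs cannot use ZFS as an upperdir, so fall
# back to "native" there and keep the default "overlayfs" everywhere else.
snapshotter_for_fstype() {
  case "$1" in
    zfs) echo native ;;
    *)   echo overlayfs ;;
  esac
}

# Hypothetical helper: print a kind cluster config that sets the CRI snapshotter to $1.
kind_config_for_snapshotter() {
  cat <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".containerd]
    snapshotter = "$1"
EOF
}

# Example (assumes GNU stat and a local Docker daemon):
#   fstype=$(stat -f -c %T "$(docker info --format '{{.DockerRootDir}}')")
#   kind_config_for_snapshotter "$(snapshotter_for_fstype "$fstype")" | kind create cluster --config=-
```

Keeping the decision in a small function makes it easy to inspect the generated config before handing it to kind.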

@BenTheElder
Member

thanks!

the "native" snapshotter used to be called "naive" and isn't really meant to be used beyond simple testing IIRC, but that's probably an OK fallback on ZFS at least.

> so setting the snapshotter to zfs doesn't work (it complains about a missing metadata.db when actually trying to take snapshots).

this is while using a modified base image w/ the ZFS CLI installed?

@cassandracomar

> this is while using a modified base image w/ the ZFS CLI installed?

yep. there may be some mounts from the host missing? I didn't have time to dig into it further.

@BenTheElder
Member

BenTheElder commented Jul 15, 2020 via email

@BenTheElder
Member

/lifecycle active

@k8s-ci-robot added the lifecycle/active label on Jul 20, 2020
@BenTheElder changed the title from "Create cluster fails - kind-control-plane does not use zfs snapshotter" to "Create cluster fails - kind-control-plane does not work on zfs" on Jul 21, 2020
@teoincontatto

The "native" snapshotter fallback works only for Kubernetes versions from 1.15.11 on:

$ cat /tmp/kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
containerdConfigPatches:
- |-
 [plugins."io.containerd.grpc.v1.cri".containerd]
 snapshotter = "native"
networking:
  apiServerAddress: "0.0.0.0"
nodes:
- role: control-plane
$ kind create cluster --name kind --config /tmp/kind-config.yaml --image kindest/node:v1.14.10@sha256:6cd43ff41ae9f02bb46c8f455d5323819aec858b99534a290517ebc181b443c6
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.14.10) 🖼 
 ✓ Preparing nodes 📦  
 ✓ Writing configuration 📜 
 ✗ Starting control-plane 🕹️ 
ERROR: failed to create cluster: failed to init node with kubeadm: command "docker exec --privileged kind-control-plane kubeadm init --ignore-preflight-errors=all --config=/kind/kubeadm.conf --skip-token-print --v=6" failed with error: exit status 1

Here is the command output: kind-create-cluster-1.14.10-on-zfs.txt

I would like to know whether that is because versions of Kubernetes older than 1.15.11 do not support working on top of ZFS, or whether there is something that can be done to make it work.

Environment:

  • kind version: (use kind version):
    kind v0.8.1 go1.14.2 linux/amd64

  • Kubernetes version: (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-16T14:19:25Z", GoVersion:"go1.13.13", Compiler:"gc", Platform:"linux/amd64"}

  • Docker version: (use docker info):
    Server Version: 19.03.12

  • OS (e.g. from /etc/os-release): Ubuntu Linux
    Ubuntu 20.04.1 LTS

@BenTheElder
Member

@teoincontatto if you run with --retain and then kind export logs I'd be happy to take a look but I have no idea what changed wrt this in 1.14.10 => 1.15.11. I'm not going to have time to do this myself.

@BenTheElder
Member

I can't actually verify this myself at the moment but a fix based on this thread should be in v0.9.0 (later today?)

@johnlane
Author

johnlane commented Sep 1, 2020

@BenTheElder was hoping to test 0.9.0 over the weekend, any idea if that's going to be published soon? I might have some time this week to do a test.

@BenTheElder
Member

BenTheElder commented Sep 1, 2020 via email

@BenTheElder
Member

BenTheElder commented Sep 1, 2020 via email

@johnlane
Author

johnlane commented Oct 9, 2020

I finally got around to testing this. Using the above config snippet, it worked for me. I haven't gone further than firing it up and running an nginx hello example, but it appears to work.

@BenTheElder
Member

Thanks!
This should be automatic in 0.9.0+ now 👍

@taisph

taisph commented Oct 9, 2020

Could this be pushed to the gcloud sdk version as well?

@taisph

taisph commented Oct 12, 2020

I still have to add the containerdConfigPatches or kind fails to start with v0.9.0.

ERROR: failed to create cluster: failed to init node with kubeadm: command "docker exec --privileged cubs-control-plane kubeadm init --ignore-preflight-errors=all --config=/kind/kubeadm.conf --skip-token-print --v=6" failed with error: exit status 1

Syslog has a bunch of errors like below when starting without the patch:

[ 5192.792035] overlayfs: filesystem on '/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/36/fs' not supported as upperdir
[ 5200.782779] overlayfs: filesystem on '/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/39/fs' not supported as upperdir
[ 5204.773607] overlayfs: filesystem on '/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/40/fs' not supported as upperdir

I have the kind export logs if necessary.


6 participants