Support Rootless Docker and Rootless Podman, without patching Kubernetes #1935

Merged
merged 1 commit into from
Mar 11, 2021

AkihiroSuda
Member

@AkihiroSuda AkihiroSuda commented Nov 18, 2020

This PR adds support for running kind with the Rootless Docker provider.
It requires Docker 20.10 / Podman 3.0, with cgroup v2.

Unlike the previous PR (#1727), this version works without patching Kubernetes (1.20.4).

However, this version has dirty hacks such as faking sysctl keys by bind-mounting regular files under /proc/sys.
So I still want the Kubernetes PR to be merged: kubernetes/kubernetes#92863

Restrictions

The restrictions of Rootless Docker apply to kind clusters as well.

e.g.

  • OverlayFS cannot be used unless the host is Ubuntu or Debian
  • Cannot mount block storage
  • Cannot mount NFS

To work around the OverlayFS issue, we could use fuse-overlayfs on kernel >= 4.18: https://github.com/AkihiroSuda/containerd-fuse-overlayfs
However, to keep this PR simple, fuse-overlayfs support is not included here and will be introduced in a separate PR after this one is merged.

How this PR works

When the entrypoint script detects that it is running inside a user namespace (i.e. rootless), it modifies /etc/containerd/config.toml to:

  • Set restrict_oom_score_adj to true to adjust oomScoreAdj value
  • Change the snapshotter from overlay to native, unless running with Ubuntu/Debian kernel
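As a rough illustration of the two config changes listed above (not the PR's actual entrypoint code), the rewrite could be sketched with sed. The function name `rootless_fix_containerd_config` is hypothetical, and the sketch assumes both settings already appear in the file as plain `key = value` lines. It also leaves out the Ubuntu/Debian kernel check.

```shell
# Hypothetical helper: rewrite a containerd config.toml for rootless use.
# Assumes the two keys already exist as simple `key = value` lines.
rootless_fix_containerd_config() {
  config="$1"
  # 1) Ask containerd to restrict oom_score_adj instead of applying it verbatim.
  # 2) Fall back from the overlayfs snapshotter to the native one.
  sed \
    -e 's/restrict_oom_score_adj = false/restrict_oom_score_adj = true/' \
    -e 's/snapshotter = "overlayfs"/snapshotter = "native"/' \
    "$config" > "$config.tmp" && mv "$config.tmp" "$config"
}
```

It would be called on the config path, e.g. `rootless_fix_containerd_config /etc/containerd/config.toml`, only after the user-namespace check has succeeded.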

The entrypoint script also does:

How to test

Step 1: Prepare host

  • Install Ubuntu 20.10 host.

  • Add GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=1" to /etc/default/grub.

  • Create /etc/systemd/system/user@.service.d/delegate.conf with the following content:

[Service]
Delegate=yes

  • Run sudo update-grub and reboot.

  • Install Docker 20.10 or Podman 3.0.

$ curl -fsSL https://get.docker.com | sh
$ sudo apt-get install -y docker-ce-rootless-extras
$ dockerd-rootless-setuptool.sh install -f

  • Run docker info with DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock, and make sure it shows "rootless" as a Security Option:

$ export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock
$ docker info
...
 Cgroup Driver: systemd
 Cgroup Version: 2
...
 Security Options:
  seccomp
   Profile: default
  rootless
  cgroupns
...
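The manual check of the docker info output above could also be scripted. The sketch below is only an illustration: the function name `docker_info_is_rootless_cgroup2` is hypothetical, and it assumes the plain-text layout shown in the sample output (a `Cgroup Version: 2` line and a `rootless` line on its own).

```shell
# Hypothetical helper: read `docker info` text on stdin and succeed only if it
# reports cgroup v2 and lists "rootless" among the security options.
docker_info_is_rootless_cgroup2() {
  info="$(cat)"
  printf '%s\n' "$info" \
    | grep -Eq '^[[:space:]]*Cgroup Version:[[:space:]]*2[[:space:]]*$' || return 1
  printf '%s\n' "$info" \
    | grep -Eq '^[[:space:]]*rootless[[:space:]]*$' || return 1
}
```

Usage would look like `docker info | docker_info_is_rootless_cgroup2 && echo "host looks ready"`.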

Step 2: Prepare the node image

$ (cd $GOPATH/src/k8s.io/kubernetes && git checkout v1.20.4)
$ export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock
$ docker build -t kindest/base:latest ./images/base && kind build node-image --base-image kindest/base:latest --type=bazel

The image is registered as kindest/node:latest in Rootless Docker's image store.

NOTE: --type=bazel is required for running kind build node-image with Rootless Docker.

Step 3: Start kind with Rootless Docker

  • Run kind create cluster --image kindest/node:latest
  • Run ps auxw on the host, and make sure the kind processes are running as an unprivileged user
  • Make sure kubectl get pods -A shows all pods as Running
$ export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock
$ kind create cluster --image kindest/node:latest
...
$ kubectl get pods -A
NAMESPACE            NAME                                         READY   STATUS    RESTARTS   AGE
kube-system          coredns-74ff55c5b-cgdfs                      1/1     Running   0          113s
kube-system          coredns-74ff55c5b-hpldh                      1/1     Running   0          113s
kube-system          etcd-kind-control-plane                      1/1     Running   0          114s
kube-system          kindnet-qqm9j                                1/1     Running   0          113s
kube-system          kube-apiserver-kind-control-plane            1/1     Running   0          114s
kube-system          kube-controller-manager-kind-control-plane   1/1     Running   0          114s
kube-system          kube-proxy-qx849                             1/1     Running   0          113s
kube-system          kube-scheduler-kind-control-plane            1/1     Running   0          114s
local-path-storage   local-path-provisioner-78776bfc44-68n5f      1/1     Running   0          113s
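The "all pods Running" check above can be automated in a small way. This is a sketch under the assumption that the output has the default column layout shown above, with STATUS as the fourth whitespace-separated field; the function name `all_pods_running` is hypothetical.

```shell
# Hypothetical helper: read `kubectl get pods -A` output on stdin and fail if
# no pods are listed or if any pod's STATUS column (4th field) is not "Running".
all_pods_running() {
  awk '
    NR > 1 { n++; if ($4 != "Running") bad = 1 }   # skip the header row
    END    { exit ((n == 0 || bad) ? 1 : 0) }
  '
}
```

For example: `kubectl get pods -A | all_pods_running || echo "cluster not ready yet"`.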

A pre-built image is temporarily available on my Docker Hub: https://hub.docker.com/r/akihirosuda/tmp-kind-node/tags

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 18, 2020
@k8s-ci-robot
Contributor

Hi @AkihiroSuda. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 18, 2020
@AkihiroSuda AkihiroSuda changed the title Support Rootless Docker, with vanilla Kubernetes Support Rootless Docker, without patching Kubernetes Nov 18, 2020
AkihiroSuda added 7 commits to AkihiroSuda/kind that referenced this pull request (Dec 1, 2020 twice; Dec 2, 2020 twice; Dec 3, 2020; Dec 4, 2020; Jan 20, 2021), each with the message:
Discussed in kubernetes-sigs#1935 (comment)

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
@AkihiroSuda
Member Author

AkihiroSuda commented Jan 20, 2021

Now this PR needs to wait for the cgroup v2 fix: #2013 (EDIT: merged)

@AkihiroSuda
Member Author

Rebased. Tested with Kubernetes v1.20.2.

@BenTheElder
Member

runc update upstream should be picked up in #2057

@AkihiroSuda
Member Author

Rebased. Tested with Kubernetes v1.20.4.

@AkihiroSuda
Member Author

AkihiroSuda commented Feb 24, 2021

Podman CI is failing (#2085), unrelated to this PR.

@BenTheElder BenTheElder self-assigned this Feb 24, 2021
@k8s-ci-robot k8s-ci-robot added area/provider/docker Issues or PRs related to docker area/provider/podman Issues or PRs related to podman labels Mar 3, 2021
@AkihiroSuda
Member Author

AkihiroSuda commented Mar 3, 2021

Updated the PR to remove the prerequisite of "net.netfilter.nf_conntrack_max", by setting kubeProxyConfiguration.conntrack.maxPerCore to 0.
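For readers unfamiliar with the setting, the kube-proxy configuration it refers to would look roughly like the fragment printed below. This is an illustrative sketch only: the function name `print_kube_proxy_conntrack_config` is hypothetical, and how kind feeds this fragment to kubeadm is not shown here.

```shell
# Hypothetical helper: print a KubeProxyConfiguration fragment with the
# per-core conntrack limit set to 0.
print_kube_proxy_conntrack_config() {
  cat <<'EOF'
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
conntrack:
  # 0 leaves the kernel's current conntrack limit as-is, so the
  # net.netfilter.nf_conntrack_max sysctl does not need to be writable.
  maxPerCore: 0
EOF
}
```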

I confirmed this version works with Ubuntu 20.04 (kernel 5.4.0-66-generic, Docker 20.10.5).

Image: akihirosuda/tmp-kind-node:gfd99e3c9-v1.20.4

@aojea
Contributor

aojea commented Mar 3, 2021

Great, it works for me now. I had never used rootless before; it has some networking rough edges, e.g. you can't access containers from the host ...

I think this approach is better because it reduces the amount of bash and is easier to maintain in the long term. However, we need @BenTheElder to check the new changes introduced in the provider interface.

@AkihiroSuda AkihiroSuda changed the title Support Rootless Docker, without patching Kubernetes Support Rootless Docker and Rootless Podman, without patching Kubernetes Mar 3, 2021
@AkihiroSuda
Member Author

AkihiroSuda commented Mar 3, 2021

Updated again

  • Allow kernel.dmesg_restrict=1, by bind-mounting /dev/null to /dev/kmsg
  • Unlock Podman. Requires Podman 3.x.
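The /dev/kmsg workaround in the first bullet can be pictured as follows. This is a sketch under assumptions, not the entrypoint's actual code: the function name `fix_kmsg` and the "try to open it, then bind-mount" structure are illustrative, and the mount itself needs sufficient privileges inside the node container.

```shell
# Hypothetical helper: if the kernel log device cannot be opened for reading
# (e.g. kernel.dmesg_restrict=1 inside a user namespace), cover it with
# /dev/null so that readers see an empty stream instead of an error.
fix_kmsg() {
  kmsg="${1:-/dev/kmsg}"
  # Probe in a subshell so a failed redirection cannot abort the caller.
  if ! ( : < "$kmsg" ) 2>/dev/null; then
    mount --bind /dev/null "$kmsg"
  fi
}
```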

Image: akihirosuda/tmp-kind-node:g11c96b0c-v1.20.4

Tested with vanilla Kubernetes v1.20.4

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
@AkihiroSuda
Member Author

Rebased and squashed commits.

Image: akihirosuda/tmp-kind-node:g85d51d89-v1.20.4

@AkihiroSuda
Member Author

/test pull-kind-e2e-kubernetes
/test pull-kind-e2e-kubernetes-1-19
/test pull-kind-e2e-kubernetes-1-20

@AkihiroSuda
Member Author

/test pull-kind-e2e-kubernetes-1-19
/test pull-kind-e2e-kubernetes-1-20

@AkihiroSuda
Member Author

/test pull-kind-e2e-kubernetes-1-20

identifier: "rootless"
weight: 3
---
Starting with kind 0.11.0, [Rootless Docker](https://docs.docker.com/go/rootless/) and [Rootless Podman](https://github.com/containers/podman/blob/master/docs/tutorials/rootless_tutorial.md) can be used as the node provider of kind.
Member

nit:

Suggested change
- Starting with kind 0.11.0, [Rootless Docker](https://docs.docker.com/go/rootless/) and [Rootless Podman](https://github.com/containers/podman/blob/master/docs/tutorials/rootless_tutorial.md) can be used as the node provider of kind.
+ Starting with kind v0.11.0, [Rootless Docker](https://docs.docker.com/go/rootless/) and [Rootless Podman](https://github.com/containers/podman/blob/master/docs/tutorials/rootless_tutorial.md) can be used as the node provider of kind.

(we should try to be consistent about this)
(non-blocking nit, can easily handle this sort of thing in a follow-up)

@@ -281,3 +283,33 @@ func (p *provider) CollectLogs(dir string, nodes []nodes.Node) error {
	errs = append(errs, errors.AggregateConcurrent(fns))
	return errors.NewAggregate(errs)
}

// Info returns the provider info.
func (p *provider) Info() (*providers.ProviderInfo, error) {
Member

I'm not sure I love this API, but the point of node providers being internal is precisely so we can iterate on stuff like this in isolated implementation packages (still exporting) without worrying about users depending on it, versus the public cluster provider. (comment directed at @aojea)

Contributor

@aojea aojea Mar 9, 2021

Yeah, but that makes a big assumption about provider compatibility. Right now we have 2 providers that try to be completely compatible; I can't see how this will go with kata or windows native, ...

Member

we can pretty trivially rework this method later though, this PR has already seen a lot of back and forth

# If /proc/self/uid_map maps 4294967295 IDs (a single "0 0 4294967295" line), we are in the initial user namespace, i.e. the host.
# Otherwise we are in a non-initial user namespace.
# https://github.com/opencontainers/runc/blob/v1.0.0-rc92/libcontainer/system/linux.go#L109-L118
userns=""
Member

now that we have to detect rootless provider on the host anyhow, is there a reason to do this inside the container?
do we expect this to vary within rootless?
or should we just start passing a KIND_ROOTLESS_NODE_PROVIDER=true ?

Member Author

Checking userns here is beneficial for potential support of dockerd --userns-remap and LXD driver.
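The in-container check discussed in this thread can be sketched as a small function. The name `in_userns` and the optional path argument (added so the logic can be exercised against sample files) are assumptions for illustration; the grep pattern follows the uid_map rule described in the script comment quoted above.

```shell
# Succeeds (exit 0) when the given uid_map describes a non-initial user
# namespace. Defaults to the current process's map.
in_userns() {
  uid_map="${1:-/proc/self/uid_map}"
  # The initial namespace has exactly one line: "0 0 4294967295".
  # `grep -v -q` succeeds if any line does NOT match that full-range mapping.
  grep -Eqv '0[[:space:]]+0[[:space:]]+4294967295' "$uid_map"
}
```

Because the check looks only at the ID mapping, it does not depend on which tool created the namespace, which is the property the reply above relies on for `dockerd --userns-remap` and LXD.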

@BenTheElder
Member

thank you for keeping after this.
we've been having fun with kubernetes code freeze which has kept me a bit low bandwidth here. I'd like to go ahead and merge this this week and iterate from where we're at rather than let this linger longer.

@k8s-ci-robot
Contributor

@AkihiroSuda: The following test failed, say /retest to rerun all failed tests:

Test name | Commit | Details | Rerun command
pull-kind-e2e-kubernetes-1-20 | 85d51d8 | link | /test pull-kind-e2e-kubernetes-1-20

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@BenTheElder
Member

/lgtm
/approve
I think there's room to bikeshed some more for sure, but erring on the side of unblocking for now. Thank you.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 11, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AkihiroSuda, BenTheElder

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 11, 2021
@BenTheElder BenTheElder merged commit e8ad7f7 into kubernetes-sigs:master Mar 11, 2021
maelvls pushed a commit to maelvls/kind that referenced this pull request Jul 1, 2021
Discussed in kubernetes-sigs#1935 (comment)

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/docker Issues or PRs related to docker area/provider/podman Issues or PRs related to podman cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.