
v0.20.0 cannot create clusters on RHEL 7 #3311

Closed
ncouse opened this issue Jul 19, 2023 · 26 comments · Fixed by #3631
Labels
kind/bug Categorizes issue or PR as related to a bug.


@ncouse

ncouse commented Jul 19, 2023

We are using RHEL7 VMs and have been successfully using these with KinD for quite some time (thank you).

When trying to upgrade to 0.20.0, the cluster fails to install.

Docker is using cgroups v1, and the kernel is a rather antique 3.10.0-1160.81.1.el7.x86_64.

The root problem seems to be the use of --cgroupns=private, which does not work in this environment. I presume the issue is kernel support for the feature; I believe kernel 4.6 is required.
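For reference, a rough way to check whether a kernel supports cgroup namespaces (not how kind detects this, just a sanity check): the namespace file under /proc only exists on kernels that have the feature, i.e. 4.6+.

$ uname -r
3.10.0-1160.81.1.el7.x86_64
$ ls /proc/self/ns/cgroup
ls: cannot access /proc/self/ns/cgroup: No such file or directory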

While RHEL 7 is quite old, it is still in support, even with its old 3.x kernel.

What happened:

KinD 0.20.0 fails to install on a RHEL 7 VM (kernel 3.10.0).

What you expected to happen:

Cluster to be created.

How to reproduce it (as minimally and precisely as possible):

$ kind create cluster
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.27.3) 🖼
 ✗ Preparing nodes 📦
Deleted nodes: ["kind-control-plane"]
ERROR: failed to create cluster: command "docker run --name kind-control-plane --hostname kind-control-plane --label io.x-k8s.kind.role=control-plane --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run --volume /var --volume /lib/modules:/lib/modules:ro -e KIND_EXPERIMENTAL_CONTAINERD_SNAPSHOTTER --detach --tty --label io.x-k8s.kind.cluster=kind --net kind --restart=on-failure:1 --init=false --cgroupns=private --volume /dev/mapper:/dev/mapper --publish=127.0.0.1:33587:6443/TCP -e KUBECONFIG=/etc/kubernetes/admin.conf kindest/node:v1.27.3@sha256:3966ac761ae0136263ffdb6cfd4db23ef8a83cba8a463690e98317add2c9ba72" failed with error: exit status 125
Command Output: WARNING: Your kernel does not support cgroup namespaces.  Cgroup namespace setting discarded.
83f54548a6e5f603f7eac309719806364f9d7c226c77849a07cd363773f40d4b
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: cgroup namespaces aren't enabled in the kernel: unknown.

Anything else we need to know?:

It works in other environments. The other environments I have access to support cgroups v2, with modern kernels.

Environment:

  • kind version: (use kind version): 0.20.0
  • Runtime info: (use docker info or podman info): 24.0.2 (cgroups v1)
  • OS (e.g. from /etc/os-release): RHEL 7.9
  • Kubernetes version: (use kubectl version): N/A
  • Any proxies or other special environment settings?: N/A
@parkjeongryul

Same here.

@BenTheElder
Member

cgroupns=private is a Docker 20.10.0+ feature (circa 2020), and podman has supported it for even longer.

This is unfortunate :/

Switching to private cgroupns all the time makes the project's cgroups hackery a lot more reasonable.

However, we've seen other broken environments (Alpine) and will be revisiting this requirement in the short term.
Longer term, I think cgroups v2 will be a hard requirement whether we want it or not, because the ecosystem is moving on.

@ncouse
Author

ncouse commented Aug 2, 2023

So unfortunately RHEL 7 is stuck on kernel 3.10, which means the cgroupns=private feature cannot be used, even though we have a recent enough version of docker to support it.

Even trying to create any container with that feature fails:

$ docker run -ti --rm --cgroupns=private alpine
WARNING: Your kernel does not support cgroup namespaces.  Cgroup namespace setting discarded.
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: cgroup namespaces aren't enabled in the kernel: unknown.

On RHEL8, it uses kernel 4.18, which is sufficient for this feature, and that will also use cgroups v2.

While I understand the project wants to use this feature, with good reason, would it be possible to have a flag to disable its use on older environments?

@mybeloved

I have the same problem. Would it be possible to have a flag to disable '--cgroupns', or to force '--cgroupns=host', to ensure it works correctly?

@BenTheElder
Member

While I understand the project wants to use this feature, with good reason, would it be possible to have a flag to disable its use on older environments?

If we're going to do this we might as well just do it by default without adding another flag, because we're still stuck supporting non-namespaced cgroups and all the issues those bring anyhow.

I'm somewhat (not entirely ... undecided) disinclined to support RHEL, given it's no longer an environment we can replicate after the recent CentOS shenanigans from Red Hat. macOS is at least available in GitHub Actions (and on maintainers' local machines), and Windows is currently similarly receiving primarily community support.

It looks like RHEL7 will be out of support in less than a year and RHEL 8 will seemingly not have this issue, which is something else to consider ... 🤔

Sorry, both Antonio and I have been out recently and there's a lot to catch up on.

@anthosz

anthosz commented Aug 4, 2023

Hello,

FYI, same behaviour on Amazon Linux 2 (more or less based on RHEL 7); not tested on Amazon Linux 2023.

Workaround in progress -> move to Ubuntu 22

@BenTheElder
Member

Seems likely #3442 is related, given CentOS 7.9 which I assume roughly equals RHEL 7.

Kubernetes is likely going to stop supporting RHEL7 Kernels anyhow, I would strongly recommend moving to a newer OS:
kubernetes/kubernetes#116799 (comment)

@zhangtong007

I also encountered the same problem on CentOS 7. Is there any way to avoid this problem?
Maybe a temporary solution?

@Romain-Geissler-1A

I doubt anyone is going to invest much time trying to fix this issue. RHEL 7's end of "normal" support is at the end of June, so four more months. After that, I have no doubt some companies will pay for extended support until 2028, but those companies will have to make a choice: running "recent" cloud-related development tools on a 10+ year old OS is maybe not the most rational situation ;)

Being affected by this in a company with thousands of developers currently moving to cloud tools, here is what we are doing internally:

  • First, plan the decommissioning of our RHEL 7 machines before the end of normal support. This is on-going.
  • In the meantime, since we have some control over the development environment, we have pinned "kind" to the latest release that works on RHEL 7: 0.19 (see the sketch below). It's not great, but we can survive in these conditions until we migrate to RHEL 9 machines.
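For anyone doing the same, a minimal sketch of the pin, assuming a linux/amd64 host and kind's usual release download URL pattern:

$ # fetch the last release known to work on RHEL 7's 3.10 kernel
$ curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.19.0/kind-linux-amd64
$ chmod +x ./kind
$ sudo mv ./kind /usr/local/bin/kind
$ kind version   # should now report v0.19.0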

@BenTheElder
Member

Right, I can't speak for everyone contributing, but I just can't see prioritizing this above everything else. Even setting aside the EOL release, the reason this is broken is that the kernel is too old. Kubernetes, containerd, runc, etc. are not tested on RHEL 7 to my knowledge and expect a somewhat more current kernel. I expect the ecosystem will start to require cgroups v2 at some point in the not too distant future.

@ncouse
Author

ncouse commented Feb 27, 2024

@BenTheElder Yes, I understand that RHEL 7 support is a lower priority for you, and I can appreciate that. I originally raised this in hopes of a simple workaround.

It is unfortunate that, judging by the number of replies, many people are stuck on RHEL 7 for various reasons.

Besides the kernel, there are other issues on RHEL 7, such as older versions of libraries like glibc that cause problems, so it is a problematic platform.

Migration to RHEL 9 (or other platforms) is the obvious solution, but of course that won't work for everyone.

@KubeKyrie

Same problem.
kind v0.20.0 cannot create clusters on CentOS 7.9, kernel 3.10.0.

@BenTheElder
Member

Note that the ecosystem is moving away from cgroups v1, which will necessitate a newer kernel:

#3558 (comment)

One option might be developing Kubernetes things inside of a VM with a newer kernel if you can't upgrade the host.
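For example, a minimal sketch of that approach with Vagrant, assuming VirtualBox (or another provider) is available on the host; generic/ubuntu2204 is just one example of a box with a recent kernel:

$ vagrant init generic/ubuntu2204
$ vagrant up
$ vagrant ssh
$ # inside the VM, install docker and kind as usual, then:
$ kind create cluster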

@anthosz

anthosz commented Apr 17, 2024

I guess this issue can be closed, no?

Solution: upgrade your OS

@BenTheElder
Member

We'd be willing to consider reasonable proposed solutions if others wish to dig in and come up with something, and we still intermittently see more users with this issue.

At minimum to close it we'd have to add an entry here https://kind.sigs.k8s.io/docs/user/known-issues/ (we probably should anyhow but E_TOO_MUCH_TO_DO)

@pwyp

pwyp commented May 20, 2024

Same problem.

Citrix VDI + RHEL 7.9 (Maipo)
Kernel: Linux 3.10.0-1160.114.2.el7.x86_64

Unfortunately, migration to RHEL 9 is not an option for me.
On the other hand, it works perfectly on Win10 WSL2 + Ubuntu 22.

$ kind version
kind v0.23.0 go1.21.10 linux/amd64

$ kind create cluster
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.30.0) 🖼
✗ Preparing nodes 📦
Deleted nodes: ["kind-control-plane"]
ERROR: failed to create cluster: command "docker run --name kind-control-plane --hostname kind-control-plane --label io.x-k8s.kind.role=control-plane --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run --volume /var --volume /lib/modules:/lib/modules:ro -e KIND_EXPERIMENTAL_CONTAINERD_SNAPSHOTTER --detach --tty --label io.x-k8s.kind.cluster=kind --net kind --restart=on-failure:1 --init=false --cgroupns=private --volume /dev/mapper:/dev/mapper --publish=127.0.0.1:43665:6443/TCP -e KUBECONFIG=/etc/kubernetes/admin.conf kindest/node:v1.30.0@sha256:047357ac0cfea04663786a612ba1eaba9702bef25227a794b52890dd8bcd692e" failed with error: exit status 125
Command Output: WARNING: Your kernel does not support cgroup namespaces. Cgroup namespace setting discarded.
c55774d6753f3d8e257fb4f1dae6c10d12db12b44a933f65649da6df0c7351df
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: cgroup namespaces aren't enabled in the kernel: unknown.

@BenTheElder
Member

Please refrain from "same problem" comments that don't add new information to the discussion.

We're aware that RHEL 7 is not supported and does not work because the kernel is too old and does not support a required kernel feature (cgroup namespaces, introduced eight years ago https://lkml.org/lkml/2016/3/26/132/) that we adopted to work around other breaking changes in the cgroup v1 ecosystem. Someone will have to spend time designing a reasonable workaround that does not make kind less reliable for currently supported hosts and then we can review it.

I don't plan to design this myself: these old kernels aren't a priority for me personally, alternatives are available, and the assorted related projects are discussing cgroups v1 EOL anyhow; we cannot exceed the support of our dependencies, etc.

Please see the above discussion.

@pwyp

pwyp commented May 21, 2024

I can work around the error on RHEL 7 by replacing the
--cgroupns=private
parameter with
--cgroupns=host
when executing the failing docker command manually from the console (see my previous comment).
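For comparison, repeating the minimal alpine test from earlier with host mode succeeds on this kernel, since runc then skips creating a cgroup namespace:

$ docker run -ti --rm --cgroupns=host alpine
/ #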

Overall this does not help much, because even if I create 'kind-control-plane' manually using the docker run command:

$ docker run --name kind-control-plane --hostname kind-control-plane --label io.x-k8s.kind.role=control-plane (here comes the rest of args)

$ kind get nodes
kind-control-plane

$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1d6a0930376c kindest/node:v1.30.0 "/usr/local/bin/entr…" About a minute ago Up About a minute 127.0.0.1:43665->6443/tcp kind-control-plane

kind still does a lot more under the hood while creating a cluster, and simply re-running cluster creation ends up with another error:

$ kind create cluster --verbosity 5
ERROR: failed to create cluster: node(s) already exist for a cluster with the name "kind"
Stack Trace:
sigs.k8s.io/kind/pkg/errors.Errorf
sigs.k8s.io/kind/pkg/errors/errors.go:41
sigs.k8s.io/kind/pkg/cluster/internal/create.alreadyExists
sigs.k8s.io/kind/pkg/cluster/internal/create/create.go:182
sigs.k8s.io/kind/pkg/cluster/internal/create.Cluster
sigs.k8s.io/kind/pkg/cluster/internal/create/create.go:80
sigs.k8s.io/kind/pkg/cluster.(*Provider).Create
sigs.k8s.io/kind/pkg/cluster/provider.go:192
sigs.k8s.io/kind/pkg/cmd/kind/create/cluster.runE
sigs.k8s.io/kind/pkg/cmd/kind/create/cluster/createcluster.go:110
sigs.k8s.io/kind/pkg/cmd/kind/create/cluster.NewCommand.func1
sigs.k8s.io/kind/pkg/cmd/kind/create/cluster/createcluster.go:54
github.com/spf13/cobra.(*Command).execute
github.com/spf13/cobra@v1.4.0/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
github.com/spf13/cobra@v1.4.0/command.go:974
github.com/spf13/cobra.(*Command).Execute
github.com/spf13/cobra@v1.4.0/command.go:902
sigs.k8s.io/kind/cmd/kind/app.Run
sigs.k8s.io/kind/cmd/kind/app/main.go:53
sigs.k8s.io/kind/cmd/kind/app.Main
sigs.k8s.io/kind/cmd/kind/app/main.go:35
main.main
sigs.k8s.io/kind/main.go:25
runtime.main
runtime/proc.go:267
runtime.goexit
runtime/asm_amd64.s:1650

  1. Is there any way to force 'kind' to implicitly use --cgroupns=host rather than --cgroupns=private while creating a cluster? EDIT: I guess not, as already discussed above (I missed that point somehow).
  2. Or maybe 'kind' could accept an already existing 'kind-control-plane' and proceed, rather than ending up with the above error?
    These are just questions from an end-user point of view, and I cannot say whether such workarounds would have implications for reliability.

@BenTheElder
Member

Is there any way to force 'kind' to implicitly use --cgroupns=host rather than --cgroupns=private while creating a cluster? EDIT: I guess not, as already discussed #3311 (comment) (I missed that point somehow)

No, and the reason we require cgroupns is that otherwise there is more leaky behavior from the host cgroups, which frequently outright breaks kind. cgroupns=private solves this, and it is actually the default in docker / podman on cgroup v2 hosts.
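For context, a quick way to check which cgroup version a host's docker is using (the CgroupVersion field should be available in docker info on 20.10+); private is the default namespace mode when this reports 2:

$ docker info --format '{{.CgroupVersion}}'
2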

If we disable this feature then it just won't work on newer hosts (and may not work reliably on these old hosts either, even if it appears to bring up a cluster), and if we make it customizable users will start to depend on this detail even though it shouldn't even be allowed on cgroup v2 (with the nested hierarchy this makes no sense) and causes broken behavior on v1.

MAYBE we could automatically do this as a fallback after parsing the error, but this is brittle, slow, and we've already been moving to make the internals of the node setup more maintainable by dropping all the broken attempts at working around hostns issues.

Or maybe 'kind' could accept an already existing 'kind-control-plane' and proceed, rather than ending up with the above error?
These are just questions from an end-user point of view, and I cannot say whether such workarounds would have implications for reliability.

kind create cluster would not. It is responsible for creating the containers and the options it uses are an implementation detail that the further steps depend on.

@pwyp

pwyp commented May 22, 2024

I see the point now.
Thank you for the clarification and for sharing valuable insights.

@lowang-bh
Member

Same problem.

Command Output: WARNING: Your kernel does not support cgroup namespaces.  Cgroup namespace setting discarded.
622723da818fc19f164cdfec877be110348797b33aff47a82cb183177b64ee99
docker: Error response from daemon: OCI runtime create failed: cgroup namespaces aren't enabled in the kernel

kind v0.20.0 go1.20.4 linux/amd64
docker version 20.10.11
kernel 3.10.0

@stmcginnis
Contributor

Any reason to keep this issue open? Not sure if there are any actions here.

@anthosz

anthosz commented May 27, 2024

Any reason to keep this issue open? Not sure if there are any actions here.

We'd be willing to consider reasonable proposed solutions if others wish to dig in and come up with something, and we still intermittently see more users with this issue.

At minimum to close it we'd have to add an entry here https://kind.sigs.k8s.io/docs/user/known-issues/ (we probably should anyhow but E_TOO_MUCH_TO_DO)

@ncouse
Author

ncouse commented May 27, 2024

I am presuming this will not be addressed and can therefore be closed. I didn't close it myself, in case there were actions you wanted to take.
