
Sysbox installation on Rancher managed cluster failed #380

Closed
pwurbs opened this issue Aug 12, 2021 · 7 comments

pwurbs commented Aug 12, 2021

I tried to install Sysbox in a K8s cluster using the user guide.

  • It's a single node cluster, created and managed by Rancher 2.5.9, no special cluster properties.
  • Kubernetes v1.20.9
  • Host Ubuntu 20.04.2 LTS / Kernel 5.4.0-80-generic / Docker 20.10.7
  • Host is labeled sysbox-install: yes

So Sysbox requirements should be fulfilled.
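
(For completeness, these prerequisites can be double-checked with standard commands, nothing Sysbox-specific:)

kubectl get nodes -o wide                               # Kubernetes version, OS image, kernel
kubectl get nodes --show-labels | grep sysbox-install   # node label in place
docker version --format '{{.Server.Version}}'           # host Docker version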

RBAC and RuntimeClass have been successfully deployed.
But there are issues with the DaemonSet sysbox-deploy-k8s: its Pod is continuously crashing.
This is the log line before it crashes:
Job for kubelet-config-helper.service failed because the control process exited with error code. See "systemctl status kubelet-config-helper.service" and "journalctl -xe" for details.

This is the result of "systemctl status kubelet-config-helper.service":

kubelet-config-helper.service - Kubelet config service
     Loaded: loaded (/lib/systemd/system/kubelet-config-helper.service; static; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2021-08-11 09:34:05 UTC; 1min 3s ago
    Process: 98727 ExecStart=/bin/sh -c /usr/local/bin/kubelet-config-helper.sh (code=exited, status=1/FAILURE)
   Main PID: 98727 (code=exited, status=1/FAILURE)
Aug 11 09:34:05 rancher02-testsysbox systemd[1]: Starting Kubelet config service...
Aug 11 09:34:05 rancher02-testsysbox sh[98756]: Usage: grep [OPTION]... PATTERNS [FILE]...
Aug 11 09:34:05 rancher02-testsysbox sh[98756]: Try 'grep --help' for more information.
Aug 11 09:34:05 rancher02-testsysbox sh[98755]: Unit kubelet.service could not be found.
Aug 11 09:34:05 rancher02-testsysbox sh[98728]: Soft-linking dockershim socket to CRI-O socket on the host ...
Aug 11 09:34:05 rancher02-testsysbox sh[98777]: cp: cannot stat '/etc/default/kubelet': No such file or directory
Aug 11 09:34:05 rancher02-testsysbox systemd[1]: kubelet-config-helper.service: Main process exited, code=exited, status=1/FAILURE
Aug 11 09:34:05 rancher02-testsysbox systemd[1]: kubelet-config-helper.service: Failed with result 'exit-code'.
Aug 11 09:34:05 rancher02-testsysbox systemd[1]: Failed to start Kubelet config service.

The cluster was created in Rancher using the option "Create a new Kubernetes cluster", based on existing nodes; i.e., the single node was prepared and imported to create the new (downstream) cluster.
The cluster config exported from Rancher is attached:
cluster-config.txt

@rodnymolina rodnymolina self-assigned this Aug 13, 2021
@rodnymolina rodnymolina added the enhancement New feature or request label Aug 13, 2021
@rodnymolina rodnymolina added this to To do in Sysbox Dev via automation Aug 13, 2021

rodnymolina commented Aug 13, 2021

Thanks for filing this issue @pwurbs!

The Sysbox pods feature has not been validated / tested on Rancher yet. I'll take a look at this one tomorrow.


rodnymolina commented Aug 16, 2021

I was able to reproduce the issue by deploying a cluster directly through rke -- I had too many issues trying to import pre-existing nodes into rancher. Even though this setup may not exactly match the one originally described, there shouldn't be any relevant differences for us, as rancher internally relies on rke too.

There are various issues at play here:

  • The Sysbox-pods installer assumes that the kubelet is deployed as a systemd service, which is not the case in rke setups: there, the kubelet runs inside a privileged container (sharing pid namespaces with the host and bind-mounting a bunch of host resources). We could expand our installer to cover this deployment pattern, but then we would need to deal with the second issue below (a quick way to check which case a node falls into is sketched right after this list) ...

  • rke relies on docker to build (and monitor) all the components of the k8s control plane, so even if we find a way to install cri-o, the rke monitoring routines would fail to detect them. The same applies to any other high-level container runtime -- see here for similar concerns with containerd's runtime.

  • Now, rke seems to be on its way out, and AFAIK, rancher is about to replace it with rke2 as its k8s engine. The good news here is that rke2 supports containerd out of the box and should be able to talk to any CRI-compliant runtime (see here), so it looks like this is the winning horse we should focus on.
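
As a quick way to check which case a given node falls into (just a sketch; RKE's container names can vary across versions):

systemctl status kubelet.service      # on an rke node this reports "Unit kubelet.service could not be found."
docker ps --filter name=kubelet       # ... because kubelet runs as a privileged docker container instead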

@pwurbs, how does rke2 sound to you? Is an rke-to-rke2 migration already part of your roadmap?


pwurbs commented Aug 16, 2021

@rodnymolina
Thx for the analysis. So I understand that Sysbox currently can't be deployed on a Rancher-managed K8s cluster (RKE based), right?
Unfortunately we currently don't intend to move to RKE2.
Would it be a workaround to install Sysbox using the host installation procedure instead of deploying it using the K8S manifests?


rodnymolina commented Aug 16, 2021

Would it be a workaround to install Sysbox using the host installation procedure instead of deploying it using the K8S manifests?

Installing Sysbox through the traditional package won't help here, as Rancher (and its provisioning tools: rke, rke2, k3s) won't be aware of its existence on the remote hosts. Enabling that integration is precisely why we have the 'sysbox-deploy-k8s' daemonset.

Having said that, there may be an alternative approach that we are currently investigating to make this all work. Please stay tuned.


rodnymolina commented Aug 25, 2021

In the end we were able to make it work (see details below). RKE can now deploy Sysbox-powered pods in a cluster. The changes have been pushed to the latest sysbox-deploy-k8s installer, which deploys both CRI-O and Sysbox on the desired k8s nodes.

In terms of implementation, we went for the following approach:

As mentioned above, RKE heavily relies on docker to create both the k8s control-plane and its data-plane. The former components (i.e. kubelet, kube-proxy and nginx-proxy) are spawned as docker containers, whereas the latter ones (e.g. cni pods and all user workloads) are created as pods through the docker-shim interface.

As we don't want to (and can't) change RKE, we still rely on docker to create the basic control-plane components. However, we have switched all the data-plane components from docker-shim to CRI-O.
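
Conceptually, this boils down to pointing the kubelet at the CRI-O socket rather than the dockershim one. The sysbox-deploy-k8s daemonset takes care of the actual re-configuration; the lines below are only an illustrative sketch, not the literal RKE config:

# kubelet flags for a remote CRI runtime (k8s 1.20 syntax):
--container-runtime=remote
--container-runtime-endpoint=unix:///var/run/crio/crio.sock
# ... or, as hinted by the kubelet-config-helper log earlier in this thread, soft-link the dockershim socket to CRI-O's:
ln -sf /var/run/crio/crio.sock /var/run/dockershim.sock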

As is usually the case, we have incorporated all the required configuration steps into the sysbox-deploy-k8s daemonset. All that is required is to execute the following steps -- the k8s nodes' reconfiguration shouldn't take more than a minute:

kubectl label nodes <node-name> sysbox-install=yes
kubectl apply -f https://raw.githubusercontent.com/nestybox/sysbox/master/sysbox-k8s-manifests/rbac/sysbox-deploy-rbac.yaml
kubectl apply -f https://raw.githubusercontent.com/nestybox/sysbox/master/sysbox-k8s-manifests/daemonset/sysbox-deploy-k8s.yaml
kubectl apply -f https://raw.githubusercontent.com/nestybox/sysbox/master/sysbox-k8s-manifests/runtime-class/sysbox-runtimeclass.yaml
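
Once the daemonset finishes and the node comes back ready, a quick way to verify things is to check the runtime class and launch a test pod that uses it (pod name and image below are just an example):

kubectl get runtimeclass
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: sysbox-smoke-test
spec:
  runtimeClassName: sysbox-runc
  containers:
  - name: ubu
    image: ubuntu:20.04
    command: ["sleep", "infinity"]
EOF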

Refer to our k8s installation guide for more details.


pwurbs commented Aug 26, 2021

I could now successfully deploy Sysbox on a Rancher-managed (RKE) cluster node using the K8s manifest files.
I used Ubuntu 20.04-latest, Docker 20.x and Kubernetes v1.20.10.
The test pod described in https://github.com/nestybox/sysbox/blob/master/docs/user-guide/install-k8s.md#pod-deployment could be deployed successfully (without any privileged mode).
Within that container I could successfully pull and start an nginx container.
So far everything is fine, thank you.
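
(Just to spell out the inner-Docker check, it amounted to something like the following; <test-pod> is a placeholder for the name of the guide's example pod:)

kubectl exec -it <test-pod> -- docker run -d --name nginx nginx
kubectl exec -it <test-pod> -- docker ps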

Then I successfully started a pod with the docker:dind image (docker:19.03.15-dind-alpine3.13).
Running "docker pull nginx" in this container results in this error:
failed to register layer: Error processing tar file(exit status 1): replaceDirWithOverlayOpaque("/docker-entrypoint.d") failed: createDirWithOverlayOpaque("/rdwoo655593762") failed: failed to rmdir /rdwoo655593762/m/d: remove /rdwoo655593762/m/d: operation not permitted

This is the Docker version info from within the container:

Server: Docker Engine - Community
 Engine:
  Version:          19.03.15
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       99e3ed8
  Built:            Sat Jan 30 03:18:13 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.3.9
  GitCommit:        ea765aba0d05254012b0b9e595e995c09186427f
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

These versions are a bit different from the ones in your ubuntu-bionic-systemd-docker image.
I am not sure if this issue is K8s / RKE related; I just wanted to let you know...

@ctalledo

Hi @pwurbs,

Glad you were able to install Sysbox on your RKE nodes (great work by @rodnymolina to enable this).

Regarding the latest problem you reported:

failed to register layer: Error processing tar file(exit status 1): replaceDirWithOverlayOpaque("/docker-entrypoint.d") failed: createDirWithOverlayOpaque("/rdwoo655593762") failed: failed to rmdir /rdwoo655593762/m/d: remove /rdwoo655593762/m/d: operation not permitted

This looks very similar to issue #254, where the problem showed up when the inner Docker used slightly older versions.

However, in that issue we reported that the problem occurs when the inner Docker version is < 19.03, whereas in your case the inner Docker is at 19.03.

Could you retry with a docker dind image using Docker 20+ please?
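
For instance, something along these lines should do (just a sketch; the pod name and the TLS env tweak are only for a quick test):

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: dind-20
spec:
  runtimeClassName: sysbox-runc
  containers:
  - name: dind
    image: docker:20.10-dind
    env:
    - name: DOCKER_TLS_CERTDIR    # skip TLS cert generation for this quick test
      value: ""
EOF
kubectl exec -it dind-20 -c dind -- docker pull nginx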

I am not sure, if this issue is K8S / RKE related. I only wanted to let you know...

I don't believe so. Thus, it makes sense for us to move this discussion to issue #254. I'll copy your prior comment and this current one to that issue, so we can continue the discussion there. I'll close this one.
