feat: al2 support #119

Status: Open. szab100 wants to merge 4 commits into base: master.
Conversation

@szab100 commented Dec 11, 2023

Adding Amazon Linux 2 support.

Notes:

  • Requires 5.15 kernel (available from amazon-linux-extras RPM source)
    • can be done via the following AMI packer provisioner script:
        {
            "type": "shell",
            "inline": [
                "sudo amazon-linux-extras disable kernel-5.4",
                "sudo amazon-linux-extras install -y kernel-5.15",
                "sudo yum install -y kernel-devel"
            ]
        },
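    A quick sanity check after baking the AMI (not part of this PR, just a sketch) is to confirm on the instance that the kernel switch took effect:

        # Run on the baked AL2 instance after reboot:
        uname -r             # expect a 5.15.x kernel
        rpm -q kernel-devel  # headers installed by the provisioner above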

Known / pending issues:

  1. Auto K8S ServiceAccount secret mounts using tmpfs are not working (see workaround below).

    The resulting error looks like this:

    container create failed: time="2024-01-19T16:16:13Z" level=error msg="container_linux.go:424: starting container process caused: process_linux.go:607: container init caused: process_linux.go:578: handleReqOp caused: rootfs_init_linux.go:378: bind mounting /var/lib/kubelet/pods/4feffa97-9271-47c6-831f-6f4f2132c146/volumes/kubernetes.io~projected/kube-api-access-dm675 to run/secrets/kubernetes.io/serviceaccount caused: bind mount through procfd of /var/lib/kubelet/pods/4feffa97-9271-47c6-831f-6f4f2132c146/volumes/kubernetes.io~projected/kube-api-access-dm675 -> run/secrets/kubernetes.io/serviceaccount: open o_path procfd: open /var/lib/containers/storage/overlay/642e09746c44d554f6b2ffda373169624ef1c1609648924e6be5dd2a33073603/merged/run/secrets/kubernetes.io/serviceaccount: no such file or directory"
    

    Workaround: these auto-mounted SA tokens fail to mount at the default /var/run/secrets/.. location; however, disabling the auto-mount on the Pod and adding the volume and volumeMount entries manually with a different mountPath (like /secrets/...) works, as shown in the two steps below:

    1. Disable auto-mounting of these SA tokens (done by various K8S / AWS controllers) at the Pod or container level by adding either of these to your Pod spec:
      annotations:
        eks.amazonaws.com/skip-containers: "dev"

      or (depending on which AWS controller does the mounting):
      spec:
        automountServiceAccountToken: false
      
    2. Add the volume & volumeMount entries manually to mount the Secret-backed token(s) to your Pod/container:
        volumes:
        - name: aws-iam-token
          projected:
            defaultMode: 420
            sources:
            - serviceAccountToken:
                audience: sts.amazonaws.com
                expirationSeconds: 86400
                path: token
      
        containers:
        - name: dev
          volumeMounts:
            - mountPath: /secrets/eks.amazonaws.com/serviceaccount
              name: aws-iam-token
              readOnly: true
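
    With the token mounted at a non-default path, workloads that expect the standard location may need to be pointed at it explicitly. A minimal sketch (AWS_WEB_IDENTITY_TOKEN_FILE is the standard AWS SDK variable; the role ARN is a placeholder, and whether it must be set depends on whether the EKS webhook injection was skipped):

        # Inside the "dev" container (sketch only, not part of this PR):
        export AWS_WEB_IDENTITY_TOKEN_FILE=/secrets/eks.amazonaws.com/serviceaccount/token
        export AWS_ROLE_ARN=arn:aws:iam::123456789012:role/my-irsa-role   # placeholder
        aws sts get-caller-identity   # should resolve the role via the projected token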
      

@ctalledo (Member) left a comment

Thanks @szab100 for the contribution!

Looks good in general, just a couple of comments.

k8s/scripts/kubelet-config-helper.sh
@@ -1476,6 +1488,7 @@ function do_config_kubelet() {
clean_cgroups_kubepods
config_kubelet "host-based"
adjust_crio_config_dependencies
restart_containerd
Member:

Curious on why restarting containerd is needed; it's non-obvious to me because we are not changing any containerd config, and after this script switches kubelet to use CRI-O, containerd will no longer be used.

@szab100 (Author) commented Jan 23, 2024:

Yeah, I'm not quite sure. I know this was needed at some point (it may be AL2-specific), but I'm unsure whether it was ultimately needed or not. It shouldn't hurt, though.

@ctalledo (Member) commented Jan 30, 2024:

I would prefer to leave it out: since the script doesn't change any containerd config, restarting containerd is confusing, I think; unless it's actually needed, of course, in which case a comment explaining why would be helpful.

Member:

I wonder if it helps with the sysbox uninstall from the cluster, where we revert from CRI-O back to containerd.

Author:

Hey @ctalledo, I made a new build today on top of sysbox v0.6.3, applying the new sysbox-runc patch + this PR + reverting the change that deprecated v1.25 (we are still on that; it is officially supported by AWS/EKS until October '24), and I can confirm that this line (restarting containerd) does seem to be needed. Without it, the installer daemonset finishes its first run and the node just fails (the kubelet is down).

Upon SSH-ing to the failing node, I see the following issue:

[root@ip-10-194-243-117 user]# systemctl status kubelet-config-helper.service
● kubelet-config-helper.service - Kubelet config service
   Loaded: loaded (/usr/lib/systemd/system/kubelet-config-helper.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2024-02-01 20:53:48 UTC; 14min ago
  Process: 24294 ExecStart=/bin/sh -c /usr/local/bin/kubelet-config-helper.sh (code=exited, status=1/FAILURE)
 Main PID: 24294 (code=exited, status=1/FAILURE)

Feb 01 20:52:18 ip-10-194-243-117.vpc.internal sh[24294]: + systemctl restart crio
Feb 01 20:52:18 ip-10-194-243-117.vpc.internal sh[24294]: + restart_kubelet
Feb 01 20:52:18 ip-10-194-243-117.vpc.internal sh[24294]: + echo 'Restarting Kubelet ...'
Feb 01 20:52:18 ip-10-194-243-117.vpc.internal sh[24294]: Restarting Kubelet ...
Feb 01 20:52:18 ip-10-194-243-117.vpc.internal sh[24294]: + systemctl restart kubelet
Feb 01 20:53:48 ip-10-194-243-117.vpc.internal sh[24294]: A dependency job for kubelet.service failed. See 'journalctl -xe' for details.
Feb 01 20:53:48 ip-10-194-243-117.vpc.internal systemd[1]: kubelet-config-helper.service: main process exited, code=exited, status=1/FAILURE
Feb 01 20:53:48 ip-10-194-243-117.vpc.internal systemd[1]: Failed to start Kubelet config service.
Feb 01 20:53:48 ip-10-194-243-117.vpc.internal systemd[1]: Unit kubelet-config-helper.service entered failed state.
Feb 01 20:53:48 ip-10-194-243-117.vpc.internal systemd[1]: kubelet-config-helper.service failed.
[root@ip-10-194-243-117 user]# systemctl status kubelet.service
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubelet-args.conf, 30-kubelet-extra-args.conf
   Active: inactive (dead) since Thu 2024-02-01 20:52:13 UTC; 20min ago
     Docs: https://github.com/kubernetes/kubernetes
 Main PID: 6995 (code=exited, status=0/SUCCESS)
...
Feb 01 20:52:13 ip-10-194-243-117.vpc.internal kubelet[6995]: I0201 20:52:13.113055    6995 kubelet.go:2132] "SyncLoop (PLEG): event for pod" pod="addon-active-monitor-ns/aws-asg-activities-healthcheck-workflow...3aa88fa506f6a}
Feb 01 20:52:13 ip-10-194-243-117.vpc.internal systemd[1]: Stopping Kubernetes Kubelet...
Feb 01 20:52:13 ip-10-194-243-117.vpc.internal kubelet[6995]: I0201 20:52:13.564307    6995 dynamic_cafile_content.go:171] "Shutting down controller" name="client-ca-bundle::/etc/kubernetes/pki/ca.crt"
Feb 01 20:52:13 ip-10-194-243-117.vpc.internal systemd[1]: Stopped Kubernetes Kubelet.
Feb 01 20:53:48 ip-10-194-243-117.vpc.internal systemd[1]: Dependency failed for Kubernetes Kubelet.
Feb 01 20:53:48 ip-10-194-243-117.vpc.internal systemd[1]: Job kubelet.service/start failed with result 'dependency'.
Hint: Some lines were ellipsized, use -l to show in full.

This is probably because the kubelet systemd unit has containerd as a dependency (After & Requires):

[root@ip-10-194-243-117 user]# cat /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
After=containerd.service sandbox-image.service
Requires=containerd.service sandbox-image.service
...

So I added a new sed command to replace any potential dependencies on 'containerd' with 'crio', which fixes that. But even after this replacement, followed by the systemctl daemon-reload at the end of the config_kubelet() function, restarting kubelet still fails until the containerd service is stopped. That is why adding a containerd restart helped; I have since replaced it with a call to the existing stop_containerd() func instead.
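
In short, the order of operations that ends up working on AL2 is roughly the following (a sketch of the sequence described above, not a verbatim excerpt from kubelet-config-helper.sh):

# Drop the kubelet unit's dependency on containerd in favor of CRI-O.
sed -i "s/containerd.service/crio.service/" /etc/systemd/system/kubelet.service
systemctl daemon-reload      # pick up the edited kubelet unit
systemctl stop containerd    # kubelet won't (re)start while containerd is still up
systemctl restart kubelet    # kubelet now starts against CRI-O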

Member:

Hi @szab100, thank you very much for the detailed explanation; let's keep it there (and perhaps add a short comment explaining why it's needed).

> reverting the change that deprecated v1.25 (we are still on that, it is officially supported by AWS/EKS until '24 October)

Oh I had missed that, my mistake; we can bring it back then (it's a pretty simple change in the Makefiles IIRC), but I don't think we can actively test on it (but since it's been tested before, things should continue to work).

Author:

> Hi @szab100, thank you very much for the detailed explanation; let's keep it there (and perhaps add a short comment explaining why it's needed).
>
> > reverting the change that deprecated v1.25 (we are still on that, it is officially supported by AWS/EKS until '24 October)
>
> Oh I had missed that, my mistake; we can bring it back then (it's a pretty simple change in the Makefiles IIRC), but I don't think we can actively test on it (but since it's been tested before, things should continue to work).

It's fine, actually in the meantime we upgraded to 1.26.6 as well, so no need to bring it back.

@szab100 requested a review from ctalledo on January 23, 2024 at 22:30
@ctalledo (Member) left a comment

One more comment please.

else
kubelet_env_file=$(echo "$kubelet_env_files" | awk '{print $NF}')
fi

backup_config "$kubelet_env_file" "kubelet_env_file"

# Replace potential dependencies on 'containerd' with 'crio'
sed -i "s/containerd.service/crio.service/" /etc/systemd/system/kubelet.service
Member:

During the sysbox-deploy-k8s uninstall, don't we need to "undo" this change? Maybe create a copy of the kubelet.service, and then revert to it during uninstall.
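
Something along these lines could work; this is only a sketch of the suggestion (the .orig backup path and the uninstall-side location are assumptions, not code from this PR):

# Install side (kubelet-config-helper.sh): keep a pristine copy before editing the unit.
cp /etc/systemd/system/kubelet.service /etc/systemd/system/kubelet.service.orig
sed -i "s/containerd.service/crio.service/" /etc/systemd/system/kubelet.service

# Uninstall side (the kubelet revert/unconfig helper): restore the original unit.
if [ -f /etc/systemd/system/kubelet.service.orig ]; then
    mv /etc/systemd/system/kubelet.service.orig /etc/systemd/system/kubelet.service
    systemctl daemon-reload
fi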
