CC: Nydus-snapshotter trigger enablement cached? #8337

Open

stevenhorsman opened this issue Oct 30, 2023 · 9 comments
Labels
bug (Incorrect behaviour), needs-review (Needs to be assessed by the team)

Comments

@stevenhorsman
Member

stevenhorsman commented Oct 30, 2023

I'm not sure if this is the correct repo for this issue, but we can migrate it later. I've been trying to automate the nydus snapshotter pull-on-guest for the peer pods tests and as part of that I hit some strange findings. I think this is an accurate write-up of what I've done, though I was doing a lot of debugging before I found this pattern:

  • The first time a pod is created, whether it uses the nydus snapshotter to pull on the guest is controlled by whether the io.containerd.cri.runtime-handler annotation is set to the kata CC runtime class, e.g.:
apiVersion: v1
kind: Pod
metadata:
  name: busybox-cc
  annotations:
    io.containerd.cri.runtime-handler: kata-remote
spec:
  runtimeClassName: kata-remote
  containers:
  - name: busybox
    image: quay.io/prometheus/busybox:latest
    imagePullPolicy: Always

results in the pull being done on the guest via nydus snapshotter, as can be seen from the cloud-api-adaptor logs:

...
2023/10/30 15:46:20 [adaptor/proxy]     storages:
2023/10/30 15:46:20 [adaptor/proxy]         mount_point:/run/kata-containers/407d6a9dee87f84e0323998a3f57345cadb6f917847a83bdbfeb8db696977709/rootfs source:quay.io/prometheus/busybox:latest fstype:overlay driver:image_guest_pull
2023/10/30 15:46:20 [adaptor/proxy] CreateContainer: Ignoring PullImage before CreateContainer (cid: "407d6a9dee87f84e0323998a3f57345cadb6f917847a83bdbfeb8db696977709")
...

vs.

apiVersion: v1
kind: Pod
metadata:
  name: alpine
spec:
  runtimeClassName: kata-remote
  containers:
  - name: alpine
    image: alpine:latest
    imagePullPolicy: Always

which leads to:

2023/10/30 16:17:20 [adaptor/proxy]     annotations:
2023/10/30 16:17:20 [adaptor/proxy]         io.kubernetes.cri.sandbox-id: eb6f6294fb0ec74ade4dd8d1ba8785a817149121d660b121b55026d49961cc82
2023/10/30 16:17:20 [adaptor/proxy]         io.kubernetes.cri.container-type: container
2023/10/30 16:17:20 [adaptor/proxy]         io.kubernetes.cri.container-name: alpine
2023/10/30 16:17:20 [adaptor/proxy]         io.kubernetes.cri.sandbox-namespace: default
2023/10/30 16:17:20 [adaptor/proxy]         io.katacontainers.pkg.oci.container_type: pod_container
2023/10/30 16:17:20 [adaptor/proxy]         io.kubernetes.cri.image-name: docker.io/library/alpine:latest
2023/10/30 16:17:20 [adaptor/proxy]         io.kubernetes.cri.sandbox-uid: 85f08c03-ee8a-4241-ab92-a6b2ef57a619
2023/10/30 16:17:20 [adaptor/proxy]         io.katacontainers.pkg.oci.bundle_path: /run/containerd/io.containerd.runtime.v2.task/k8s.io/44d49a6ca260d41049bb179f0ac03b4c79c96c6fe63ab6ca22b88e9dbf625ebe
2023/10/30 16:17:20 [adaptor/proxy]         io.kubernetes.cri.sandbox-name: alpine
2023/10/30 16:17:20 [adaptor/proxy] getImageName: got image from annotations: docker.io/library/alpine:latest
2023/10/30 16:17:20 [adaptor/proxy] CreateContainer: calling PullImage for "docker.io/library/alpine:latest" before CreateContainer (cid: "44d49a6ca260d41049bb179f0ac03b4c79c96c6fe63ab6ca22b88e9dbf625ebe")

(Note that the CRI-O-style fallback of calling PullImage is used here, as there is no driver:image_guest_pull storage mount_point.)

This is as we expected and matches the findings that @fidencio originally saw, which led to us needing to use the annotation.

Subsequent uses of the same images are where it gets confusing, as the annotation appears to be ignored, suggesting that either the annotation itself or the effect of the annotation is cached:

I updated the busybox pod spec to remove the annotation:

apiVersion: v1
kind: Pod
metadata:
  name: busybox-cc
spec:
  runtimeClassName: kata-remote
  containers:
  - name: busybox
    image: quay.io/prometheus/busybox:latest
    imagePullPolicy: Always

so I would expect the nydus-snapshotter code path not to be activated, but it is:

2023/10/30 16:09:44 [adaptor/proxy]     storages:
2023/10/30 16:09:44 [adaptor/proxy]         mount_point:/run/kata-containers/d512856a2c67db28e1621b6c67056e0e110ee63c88cb2d2e0ccbcc5127fe38df/rootfs source:quay.io/prometheus/busybox:latest fstype:overlay driver:image_guest_pull
2023/10/30 16:09:44 [adaptor/proxy] CreateContainer: Ignoring PullImage before CreateContainer (cid: "d512856a2c67db28e1621b6c67056e0e110ee63c88cb2d2e0ccbcc5127fe38df")
2023/10/30 16:09:45 [adaptor/proxy] StartContainer: containerID:d512856a2c67db28e1621b6c67056e0e110ee63c88cb2d2e0ccbcc5127fe38df

then I updated the alpine pod spec to include the annotation:

apiVersion: v1
kind: Pod
metadata:
  name: alpine
  annotations:
    io.containerd.cri.runtime-handler: kata-remote
spec:
  runtimeClassName: kata-remote
  containers:
  - name: alpine
    image: alpine:latest
    imagePullPolicy: Always

which seemed to have no effect; the storages still have no driver:image_guest_pull entry:

2023/10/30 16:35:58 [adaptor/proxy]     annotations:
2023/10/30 16:35:58 [adaptor/proxy]         io.kubernetes.cri.sandbox-name: alpine
2023/10/30 16:35:58 [adaptor/proxy]         io.kubernetes.cri.container-type: container
2023/10/30 16:35:58 [adaptor/proxy]         io.kubernetes.cri.container-name: alpine
2023/10/30 16:35:58 [adaptor/proxy]         io.kubernetes.cri.sandbox-namespace: default
2023/10/30 16:35:58 [adaptor/proxy]         io.kubernetes.cri.image-name: docker.io/library/alpine:latest
2023/10/30 16:35:58 [adaptor/proxy]         io.kubernetes.cri.sandbox-uid: d7f53f05-d867-4782-9db6-12dceee87171
2023/10/30 16:35:58 [adaptor/proxy]         io.katacontainers.pkg.oci.container_type: pod_container
2023/10/30 16:35:58 [adaptor/proxy]         io.kubernetes.cri.sandbox-id: 5e58dc26d2965de1c16a4c242aeff4011b06264a9f056f433ef70ecf905a2662
2023/10/30 16:35:58 [adaptor/proxy]         io.katacontainers.pkg.oci.bundle_path: /run/containerd/io.containerd.runtime.v2.task/k8s.io/71dbdf8c5485bc69943612bc841c9d9c8c27b2b2523941c5a15fa57bf18b9992
2023/10/30 16:35:58 [adaptor/proxy] getImageName: got image from annotations: docker.io/library/alpine:latest
2023/10/30 16:35:58 [adaptor/proxy] CreateContainer: calling PullImage for "docker.io/library/alpine:latest" before CreateContainer (cid: "71dbdf8c5485bc69943612bc841c9d9c8c27b2b2523941c5a15fa57bf18b9992")

It seems that whatever nydus-snapshotter enablement is used the first time an image is pulled applies to all subsequent pulls of that image, meaning that if a user accidentally misses the cri runtime-handler annotation the first time, they can't do pull-on-guest for that image again. It might be that there is some file on the worker that I can remove to reset this, but I'm not sure that's a great plan. I observed this with the kata-remote runtime and cloud-api-adaptor, but I don't think it's related to that runtime class.
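
As a debugging aid, here is a minimal sketch of commands for inspecting and resetting the host-side image state, assuming containerd's k8s.io CRI namespace and a nydus proxy plugin registered under the name "nydus" (adjust both to your setup):

# List the images containerd has cached on the host (CRI view)
crictl --runtime-endpoint /run/containerd/containerd.sock images

# Remove a specific image from the host so the next pull starts fresh
crictl --runtime-endpoint /run/containerd/containerd.sock rmi quay.io/prometheus/busybox:latest

# List the snapshots currently held by the nydus snapshotter
ctr -n k8s.io snapshots --snapshotter nydus ls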

@stevenhorsman added the bug and needs-review labels Oct 30, 2023
@stevenhorsman stevenhorsman changed the title CC: Nydus-snapshotter trigger cached? CC: Nydus-snapshotter trigger enablement cached? Oct 30, 2023
@stevenhorsman
Member Author

\cc @fidencio @jiangliu @ChengyuZhu6 - you might be interested in this, or have some ideas

@huoqifeng
Contributor

huoqifeng commented Nov 2, 2023

What I found is that if I remove the images on the host, it works as expected, e.g.:

crictl --runtime-endpoint /run/containerd/containerd.sock rmi registry.k8s.io/pause:3.8
crictl --runtime-endpoint /run/containerd/containerd.sock rmi docker.io/library/nginx:latest

Log

- Pause
2023/11/02 11:52:36 [adaptor/proxy] CreateContainer: containerID:ac84f08f905d1157a48916214a947206a4ebe2bd1095ecb77c3edf2af95e7985
2023/11/02 11:52:36 [adaptor/proxy]     mounts:
2023/11/02 11:52:36 [adaptor/proxy]         destination:/proc source:proc type:proc
2023/11/02 11:52:36 [adaptor/proxy]         destination:/dev source:tmpfs type:tmpfs
2023/11/02 11:52:36 [adaptor/proxy]         destination:/dev/pts source:devpts type:devpts
2023/11/02 11:52:36 [adaptor/proxy]         destination:/dev/mqueue source:mqueue type:mqueue
2023/11/02 11:52:36 [adaptor/proxy]         destination:/sys source:sysfs type:sysfs
2023/11/02 11:52:36 [adaptor/proxy]         destination:/dev/shm source:/run/kata-containers/sandbox/shm type:bind
2023/11/02 11:52:36 [adaptor/proxy]         destination:/etc/resolv.conf source:/run/kata-containers/shared/containers/ac84f08f905d1157a48916214a947206a4ebe2bd1095ecb77c3edf2af95e7985-2a749cb3c800dedc-resolv.conf type:bind
2023/11/02 11:52:36 [adaptor/proxy]     annotations:
2023/11/02 11:52:36 [adaptor/proxy]         io.kubernetes.cri.sandbox-cpu-shares: 2
2023/11/02 11:52:36 [adaptor/proxy]         io.katacontainers.pkg.oci.container_type: pod_sandbox
2023/11/02 11:52:36 [adaptor/proxy]         io.kubernetes.cri.sandbox-id: ac84f08f905d1157a48916214a947206a4ebe2bd1095ecb77c3edf2af95e7985
2023/11/02 11:52:36 [adaptor/proxy]         io.kubernetes.cri.sandbox-cpu-period: 100000
2023/11/02 11:52:36 [adaptor/proxy]         io.kubernetes.cri.sandbox-name: nginx
2023/11/02 11:52:36 [adaptor/proxy]         io.kubernetes.cri.sandbox-cpu-quota: 0
2023/11/02 11:52:36 [adaptor/proxy]         nerdctl/network-namespace: /var/run/netns/cni-ed70521b-7e13-8f7a-0904-fae1524c62d4
2023/11/02 11:52:36 [adaptor/proxy]         io.kubernetes.cri.sandbox-log-directory: /var/log/pods/default_nginx_fbb552ad-c9ef-4e3d-9f5f-bdc771ee0f86
2023/11/02 11:52:36 [adaptor/proxy]         io.kubernetes.cri.sandbox-memory: 0
2023/11/02 11:52:36 [adaptor/proxy]         io.katacontainers.pkg.oci.bundle_path: /run/containerd/io.containerd.runtime.v2.task/k8s.io/ac84f08f905d1157a48916214a947206a4ebe2bd1095ecb77c3edf2af95e7985
2023/11/02 11:52:36 [adaptor/proxy]         io.kubernetes.cri.container-type: sandbox
2023/11/02 11:52:36 [adaptor/proxy]         io.kubernetes.cri.sandbox-namespace: default
2023/11/02 11:52:36 [adaptor/proxy]         io.kubernetes.cri.sandbox-uid: fbb552ad-c9ef-4e3d-9f5f-bdc771ee0f86
2023/11/02 11:52:36 [adaptor/proxy]     storages:
2023/11/02 11:52:36 [adaptor/proxy]         mount_point:/run/kata-containers/ac84f08f905d1157a48916214a947206a4ebe2bd1095ecb77c3edf2af95e7985/rootfs source:pause fstype:overlay driver:image_guest_pull
2023/11/02 11:52:36 [adaptor/proxy] CreateContainer: Ignoring PullImage before CreateContainer (cid: "ac84f08f905d1157a48916214a947206a4ebe2bd1095ecb77c3edf2af95e7985")
2023/11/02 11:52:36 [redirector] 

- nginx
2023/11/02 11:52:40 [adaptor/proxy] CreateContainer: containerID:8d85707322932fb8f0fd149e19462e7f2d4fe15bce018eead6f52d5ea964b866
2023/11/02 11:52:40 [adaptor/proxy]     mounts:
2023/11/02 11:52:40 [adaptor/proxy]         destination:/proc source:proc type:proc
2023/11/02 11:52:40 [adaptor/proxy]         destination:/dev source:tmpfs type:tmpfs
2023/11/02 11:52:40 [adaptor/proxy]         destination:/dev/pts source:devpts type:devpts
2023/11/02 11:52:40 [adaptor/proxy]         destination:/dev/mqueue source:mqueue type:mqueue
2023/11/02 11:52:40 [adaptor/proxy]         destination:/sys source:sysfs type:sysfs
2023/11/02 11:52:40 [adaptor/proxy]         destination:/sys/fs/cgroup source:cgroup type:cgroup
2023/11/02 11:52:40 [adaptor/proxy]         destination:/etc/hosts source:/run/kata-containers/shared/containers/8d85707322932fb8f0fd149e19462e7f2d4fe15bce018eead6f52d5ea964b866-d66fd02ae612cf71-hosts type:bind
2023/11/02 11:52:40 [adaptor/proxy]         destination:/dev/termination-log source:/run/kata-containers/shared/containers/8d85707322932fb8f0fd149e19462e7f2d4fe15bce018eead6f52d5ea964b866-1c23bf600bdc9a53-termination-log type:bind
2023/11/02 11:52:40 [adaptor/proxy]         destination:/etc/hostname source:/run/kata-containers/shared/containers/8d85707322932fb8f0fd149e19462e7f2d4fe15bce018eead6f52d5ea964b866-1e835ce09b05cf22-hostname type:bind
2023/11/02 11:52:40 [adaptor/proxy]         destination:/etc/resolv.conf source:/run/kata-containers/shared/containers/8d85707322932fb8f0fd149e19462e7f2d4fe15bce018eead6f52d5ea964b866-ebbb6b8dcc65d556-resolv.conf type:bind
2023/11/02 11:52:40 [adaptor/proxy]         destination:/dev/shm source:/run/kata-containers/sandbox/shm type:bind
2023/11/02 11:52:40 [adaptor/proxy]         destination:/var/run/secrets/kubernetes.io/serviceaccount source:/run/kata-containers/shared/containers/8d85707322932fb8f0fd149e19462e7f2d4fe15bce018eead6f52d5ea964b866-e07688da1ded0a5f-serviceaccount type:bind
2023/11/02 11:52:40 [adaptor/proxy]     annotations:
2023/11/02 11:52:40 [adaptor/proxy]         io.kubernetes.cri.sandbox-id: ac84f08f905d1157a48916214a947206a4ebe2bd1095ecb77c3edf2af95e7985
2023/11/02 11:52:40 [adaptor/proxy]         io.kubernetes.cri.image-name: docker.io/library/nginx:latest
2023/11/02 11:52:40 [adaptor/proxy]         io.kubernetes.cri.sandbox-uid: fbb552ad-c9ef-4e3d-9f5f-bdc771ee0f86
2023/11/02 11:52:40 [adaptor/proxy]         io.katacontainers.pkg.oci.bundle_path: /run/containerd/io.containerd.runtime.v2.task/k8s.io/8d85707322932fb8f0fd149e19462e7f2d4fe15bce018eead6f52d5ea964b866
2023/11/02 11:52:40 [adaptor/proxy]         io.kubernetes.cri.sandbox-namespace: default
2023/11/02 11:52:40 [adaptor/proxy]         io.kubernetes.cri.sandbox-name: nginx
2023/11/02 11:52:40 [adaptor/proxy]         io.kubernetes.cri.container-name: nginx
2023/11/02 11:52:40 [adaptor/proxy]         io.kubernetes.cri.container-type: container
2023/11/02 11:52:40 [adaptor/proxy]         io.katacontainers.pkg.oci.container_type: pod_container
2023/11/02 11:52:40 [adaptor/proxy]     storages:
2023/11/02 11:52:40 [adaptor/proxy]         mount_point:/run/kata-containers/8d85707322932fb8f0fd149e19462e7f2d4fe15bce018eead6f52d5ea964b866/rootfs source:docker.io/library/nginx:latest fstype:overlay driver:image_guest_pull
2023/11/02 11:52:40 [adaptor/proxy] CreateContainer: Ignoring PullImage before CreateContainer (cid: "8d85707322932fb8f0fd149e19462e7f2d4fe15bce018eead6f52d5ea964b866")
2023/11/02 11:52:40 [redirector]

If the pause or nginx image already exists on the host -- I suspect it's the image metadata rather than the image contents that matters -- then it behaves differently, and somewhat randomly:

  • Sometimes it succeeds completely, as above.
  • Sometimes it fails with an error like:
Nov  2 11:48:41 huoqif-ztest-s390x-self-jp-node-1 containerd[148229]: time="2023-11-02T11:48:41.609067537Z" level=info msg="CreateContainer within sandbox \"19df2ef541b8b8ceacdf64eaab2207d0858fccda60812d1e6beb3e99a4f73506\" for container &ContainerMetadata{Name:nginx,Attempt:0,}"
Nov  2 11:48:41 huoqif-ztest-s390x-self-jp-node-1 containerd-nydus-grpc[148220]: time="2023-11-02T11:48:41.612780700Z" level=info msg="[Prepare] snapshot with key k8s.io/57/5448e25ac40c7d8c6f993b21dde789dc7ec085578b2c49a4bbb1208792a76004 parent sha256:2eba5bfc4f9e1d514420079c70f86ac151039e6a804e6b83db2f095ea4213baa"
Nov  2 11:48:41 huoqif-ztest-s390x-self-jp-node-1 containerd-nydus-grpc[148220]: time="2023-11-02T11:48:41.614199673Z" level=debug msg="[Prepare] snapshot with labels map[]" key=k8s.io/57/5448e25ac40c7d8c6f993b21dde789dc7ec085578b2c49a4bbb1208792a76004 parent="sha256:2eba5bfc4f9e1d514420079c70f86ac151039e6a804e6b83db2f095ea4213baa"
Nov  2 11:48:41 huoqif-ztest-s390x-self-jp-node-1 containerd-nydus-grpc[148220]: time="2023-11-02T11:48:41.614233117Z" level=info msg="Prepare active snapshot k8s.io/57/5448e25ac40c7d8c6f993b21dde789dc7ec085578b2c49a4bbb1208792a76004 in proxy mode" key=k8s.io/57/5448e25ac40c7d8c6f993b21dde789dc7ec085578b2c49a4bbb1208792a76004 parent="sha256:2eba5bfc4f9e1d514420079c70f86ac151039e6a804e6b83db2f095ea4213baa"
Nov  2 11:48:41 huoqif-ztest-s390x-self-jp-node-1 containerd-nydus-grpc[148220]: time="2023-11-02T11:48:41.614246541Z" level=debug msg="Prepare remote snapshot 9" key=k8s.io/57/5448e25ac40c7d8c6f993b21dde789dc7ec085578b2c49a4bbb1208792a76004 parent="sha256:2eba5bfc4f9e1d514420079c70f86ac151039e6a804e6b83db2f095ea4213baa"
Nov  2 11:48:41 huoqif-ztest-s390x-self-jp-node-1 containerd[148229]: time="2023-11-02T11:48:41.629843898Z" level=error msg="CreateContainer within sandbox \"19df2ef541b8b8ceacdf64eaab2207d0858fccda60812d1e6beb3e99a4f73506\" for &ContainerMetadata{Name:nginx,Attempt:0,} failed" error="failed to create containerd container: create instance 9: object with key \"9\" already exists: unknown"
Nov  2 11:48:41 huoqif-ztest-s390x-self-jp-node-1 kubelet[5787]: E1102 11:48:41.629966    5787 remote_runtime.go:319] "CreateContainer in sandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to create containerd container: create instance 9: object with key \"9\" already exists: unknown" podSandboxID="19df2ef541b8b8ceacdf64eaab2207d0858fccda60812d1e6beb3e99a4f73506"
Nov  2 11:48:41 huoqif-ztest-s390x-self-jp-node-1 kubelet[5787]: E1102 11:48:41.630172    5787 kuberuntime_manager.go:1256] container &Container{Name:nginx,Image:nginx,Command:[],Args:[],WorkingDir:,Ports:[]ContainerPort{},Env:[]EnvVar{},Resources:ResourceRequirements{Limits:ResourceList{kata.peerpods.io/vm: {{1 0} {<nil>} 1 DecimalSI},},Requests:ResourceList{kata.peerpods.io/vm: {{1 0} {<nil>} 1 DecimalSI},},Claims:[]ResourceClaim{},},VolumeMounts:[]VolumeMount{VolumeMount{Name:kube-api-access-4lfl9,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:Always,SecurityContext:nil,Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,ResizePolicy:[]ContainerResizePolicy{},RestartPolicy:nil,} start failed in pod nginx_default(b8655acc-f449-4162-96b2-a44ca38e6f64): CreateContainerError: failed to create containerd container: create instance 9: object with key "9" already exists: unknown
  • Sometimes the pause image is pulled via the fallback (as in the CRI-O case) but nginx is pulled in the guest via nydus-snapshotter:
2023/11/02 06:21:15 [adaptor/cloud] agent proxy is ready
2023/11/02 06:21:15 [adaptor/proxy] CreateSandbox: hostname:nginx sandboxId:ac601892e1145f33119b825fa2c0ff9b22eb87b0ae01d4f363751bc1e2565824
2023/11/02 06:21:15 [adaptor/proxy]     storages:
2023/11/02 06:21:15 [adaptor/proxy]         mountpoint:/run/kata-containers/sandbox/shm source:shm fstype:tmpfs driver:ephemeral
2023/11/02 06:21:15 [redirector] Playload of CreateSandboxRequest: {
...
2023/11/02 06:21:15 [adaptor/proxy] CreateContainer: containerID:ac601892e1145f33119b825fa2c0ff9b22eb87b0ae01d4f363751bc1e2565824
2023/11/02 06:21:15 [adaptor/proxy]     mounts:
2023/11/02 06:21:15 [adaptor/proxy]         destination:/proc source:proc type:proc
2023/11/02 06:21:15 [adaptor/proxy]         destination:/dev source:tmpfs type:tmpfs
2023/11/02 06:21:15 [adaptor/proxy]         destination:/dev/pts source:devpts type:devpts
2023/11/02 06:21:15 [adaptor/proxy]         destination:/dev/mqueue source:mqueue type:mqueue
2023/11/02 06:21:15 [adaptor/proxy]         destination:/sys source:sysfs type:sysfs
2023/11/02 06:21:15 [adaptor/proxy]         destination:/dev/shm source:/run/kata-containers/sandbox/shm type:bind
2023/11/02 06:21:15 [adaptor/proxy]         destination:/etc/resolv.conf source:/run/kata-containers/shared/containers/ac601892e1145f33119b825fa2c0ff9b22eb87b0ae01d4f363751bc1e2565824-479454fcbf34d879-resolv.conf type:bind
2023/11/02 06:21:15 [adaptor/proxy]     annotations:
2023/11/02 06:21:15 [adaptor/proxy]         io.kubernetes.cri.sandbox-memory: 0
2023/11/02 06:21:15 [adaptor/proxy]         io.kubernetes.cri.sandbox-log-directory: /var/log/pods/default_nginx_81bd1851-c1af-459d-ad59-ef1bd3945b61
2023/11/02 06:21:15 [adaptor/proxy]         io.kubernetes.cri.container-type: sandbox
2023/11/02 06:21:15 [adaptor/proxy]         io.kubernetes.cri.sandbox-cpu-quota: 0
2023/11/02 06:21:15 [adaptor/proxy]         io.kubernetes.cri.sandbox-cpu-shares: 2
2023/11/02 06:21:15 [adaptor/proxy]         nerdctl/network-namespace: /var/run/netns/cni-29cb221b-d1d8-7abe-7f3f-4c1a19914fe4
2023/11/02 06:21:15 [adaptor/proxy]         io.katacontainers.pkg.oci.container_type: pod_sandbox
2023/11/02 06:21:15 [adaptor/proxy]         io.kubernetes.cri.sandbox-namespace: default
2023/11/02 06:21:15 [adaptor/proxy]         io.kubernetes.cri.sandbox-id: ac601892e1145f33119b825fa2c0ff9b22eb87b0ae01d4f363751bc1e2565824
2023/11/02 06:21:15 [adaptor/proxy]         io.kubernetes.cri.sandbox-name: nginx
2023/11/02 06:21:15 [adaptor/proxy]         io.kubernetes.cri.sandbox-uid: 81bd1851-c1af-459d-ad59-ef1bd3945b61
2023/11/02 06:21:15 [adaptor/proxy]         io.kubernetes.cri.sandbox-cpu-period: 100000
2023/11/02 06:21:15 [adaptor/proxy]         io.katacontainers.pkg.oci.bundle_path: /run/containerd/io.containerd.runtime.v2.task/k8s.io/ac601892e1145f33119b825fa2c0ff9b22eb87b0ae01d4f363751bc1e2565824
2023/11/02 06:21:15 [adaptor/proxy] getImageName: no pause image specified uses default pause image: registry.k8s.io/pause:3.7
2023/11/02 06:21:15 [adaptor/proxy] CreateContainer: calling PullImage for "registry.k8s.io/pause:3.7" before CreateContainer (cid: "ac601892e1145f33119b825fa2c0ff9b22eb87b0ae01d4f363751bc1e2565824")
...
2023/11/02 06:21:17 [adaptor/proxy] StartContainer: containerID:ac601892e1145f33119b825fa2c0ff9b22eb87b0ae01d4f363751bc1e2565824
...
2023/11/02 06:21:21 [adaptor/proxy] CreateContainer: containerID:bdc129c514a2bf41c8399b6f73304d698689d6e0c3d20b8332566a02bff3fc14
2023/11/02 06:21:21 [adaptor/proxy]     mounts:
2023/11/02 06:21:21 [adaptor/proxy]         destination:/proc source:proc type:proc
2023/11/02 06:21:21 [adaptor/proxy]         destination:/dev source:tmpfs type:tmpfs
2023/11/02 06:21:21 [adaptor/proxy]         destination:/dev/pts source:devpts type:devpts
2023/11/02 06:21:21 [adaptor/proxy]         destination:/dev/mqueue source:mqueue type:mqueue
2023/11/02 06:21:21 [adaptor/proxy]         destination:/sys source:sysfs type:sysfs
2023/11/02 06:21:21 [adaptor/proxy]         destination:/sys/fs/cgroup source:cgroup type:cgroup
2023/11/02 06:21:21 [adaptor/proxy]         destination:/etc/hosts source:/run/kata-containers/shared/containers/bdc129c514a2bf41c8399b6f73304d698689d6e0c3d20b8332566a02bff3fc14-b0a6c2dc6338b7d8-hosts type:bind
2023/11/02 06:21:21 [adaptor/proxy]         destination:/dev/termination-log source:/run/kata-containers/shared/containers/bdc129c514a2bf41c8399b6f73304d698689d6e0c3d20b8332566a02bff3fc14-ce8f60288cfa4ec3-termination-log type:bind
2023/11/02 06:21:21 [adaptor/proxy]         destination:/etc/hostname source:/run/kata-containers/shared/containers/bdc129c514a2bf41c8399b6f73304d698689d6e0c3d20b8332566a02bff3fc14-02270cf9222c189f-hostname type:bind
2023/11/02 06:21:21 [adaptor/proxy]         destination:/etc/resolv.conf source:/run/kata-containers/shared/containers/bdc129c514a2bf41c8399b6f73304d698689d6e0c3d20b8332566a02bff3fc14-26a575d6ce2f797a-resolv.conf type:bind
2023/11/02 06:21:21 [adaptor/proxy]         destination:/dev/shm source:/run/kata-containers/sandbox/shm type:bind
2023/11/02 06:21:21 [adaptor/proxy]         destination:/var/run/secrets/kubernetes.io/serviceaccount source:/run/kata-containers/shared/containers/bdc129c514a2bf41c8399b6f73304d698689d6e0c3d20b8332566a02bff3fc14-6a506d0d09eb1bcb-serviceaccount type:bind
2023/11/02 06:21:21 [adaptor/proxy]     annotations:
2023/11/02 06:21:21 [adaptor/proxy]         io.kubernetes.cri.container-type: container
2023/11/02 06:21:21 [adaptor/proxy]         io.kubernetes.cri.container-name: nginx
2023/11/02 06:21:21 [adaptor/proxy]         io.katacontainers.pkg.oci.bundle_path: /run/containerd/io.containerd.runtime.v2.task/k8s.io/bdc129c514a2bf41c8399b6f73304d698689d6e0c3d20b8332566a02bff3fc14
2023/11/02 06:21:21 [adaptor/proxy]         io.kubernetes.cri.sandbox-uid: 81bd1851-c1af-459d-ad59-ef1bd3945b61
2023/11/02 06:21:21 [adaptor/proxy]         io.kubernetes.cri.image-name: docker.io/library/nginx:latest
2023/11/02 06:21:21 [adaptor/proxy]         io.kubernetes.cri.sandbox-namespace: default
2023/11/02 06:21:21 [adaptor/proxy]         io.kubernetes.cri.sandbox-name: nginx
2023/11/02 06:21:21 [adaptor/proxy]         io.kubernetes.cri.sandbox-id: ac601892e1145f33119b825fa2c0ff9b22eb87b0ae01d4f363751bc1e2565824
2023/11/02 06:21:21 [adaptor/proxy]         io.katacontainers.pkg.oci.container_type: pod_container
2023/11/02 06:21:21 [adaptor/proxy]     storages:
2023/11/02 06:21:21 [adaptor/proxy]         mount_point:/run/kata-containers/bdc129c514a2bf41c8399b6f73304d698689d6e0c3d20b8332566a02bff3fc14/rootfs source:docker.io/library/nginx:latest fstype:overlay driver:image_guest_pull
2023/11/02 06:21:21 [adaptor/proxy] CreateContainer: Ignoring PullImage before CreateContainer (cid: "bdc129c514a2bf41c8399b6f73304d698689d6e0c3d20b8332566a02bff3fc14")
2023/11/02 06:21:21 [redirector] Playload of CreateContainerRequest:
...
2023/11/02 06:21:26 [adaptor/proxy] StartContainer: containerID:bdc129c514a2bf41c8399b6f73304d698689d6e0c3d20b8332566a02bff3fc14
...

@stevenhorsman @fidencio @ChengyuZhu6 FYI

@stevenhorsman
Member Author

> if I remove the images on the host, it works as expected

Thanks for the information. That's not ideal for managed Kubernetes, but good to know.

@huoqifeng
Contributor

> if I remove the images on the host, it works as expected
>
> Thanks for the information. That's not ideal for managed Kubernetes, but good to know.

Agreed. We definitely have cases where the image already exists on the host, for example:

  • a runc pod uses the same image (see the sketch below)
  • several pods refer to the same image
  • pause will always be on the host

Listing these cases might help @ChengyuZhu6 debug the nydus code...
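
For illustration, a plain runc pod like this sketch (the pod name is illustrative) would already seed the host image store with the same image before any kata-remote pod runs:

apiVersion: v1
kind: Pod
metadata:
  name: busybox-runc
spec:
  containers:
  - name: busybox
    image: quay.io/prometheus/busybox:latest
    imagePullPolicy: Always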

@fitzthum
Contributor

fitzthum commented Nov 3, 2023

I am seeing similar behavior on my node, btw. In fact, I had forgotten to add the annotation to my yaml files, but since I had already pulled the images when running the tests, I did not notice.

stevenhorsman added a commit to stevenhorsman/cloud-api-adaptor that referenced this issue Nov 13, 2023
Switch the nydus test to use alpine over nginx to avoid hitting
annotation caching issues as reported in:
kata-containers/kata-containers#8337

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
huoqifeng pushed a commit to confidential-containers/cloud-api-adaptor that referenced this issue Nov 14, 2023
Switch the nydus test to use alpine over nginx to avoid hitting
annotation caching issues as reported in:
kata-containers/kata-containers#8337

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
@ChengyuZhu6
Member

ChengyuZhu6 commented Nov 16, 2023

> It seems that whatever nydus-snapshotter enablement is used the first time an image is pulled applies to all subsequent pulls of that image, meaning that if a user accidentally misses the cri runtime-handler annotation the first time, they can't do pull-on-guest for that image again. It might be that there is some file on the worker that I can remove to reset this, but I'm not sure that's a great plan. I observed this with the kata-remote runtime and cloud-api-adaptor, but I don't think it's related to that runtime class.

@stevenhorsman I reproduced the problem on my local machine with the runtime class kata-qemu. When I created a pod without the annotation io.containerd.cri.runtime-handler: kata-qemu, it ran with overlayfs. However, even after I deleted the pod and re-created it with the annotation added, the new pod still ran with overlayfs. So I believe this is not caused by nydus-snapshotter.

Container runtimes identify container images only by their image name (or image reference) and/or image digest (a SHA256 hash of the content). Therefore, containerd would need to use a pair of (imageName, runtimeClass) to identify images. There is an issue on containerd to track this, and I expect the problem to be resolved once that work is done.
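
To make this concrete, a quick check (a sketch, assuming containerd's k8s.io CRI namespace): the image store keeps one record per image name with no runtime-class dimension, so whichever snapshotter handled the first pull effectively owns that image afterwards.

# one entry per image name, regardless of which runtime class pulled it
ctr -n k8s.io images ls | grep busybox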

I hope this information helps you understand the behavior.

@ChengyuZhu6
Member

ChengyuZhu6 commented Nov 16, 2023

BTW, I believe that work will be sufficient for us to use snapshotters in CoCo, and the cri-handler annotation will be deprecated soon. Therefore, if we want to use containerd's runtime-level snapshotter feature, we may need to assign one snapshotter per runtime class instead of using multiple snapshotters within the same runtime class.

These are the comments in the containerd code:

// RuntimeHandler an experimental annotation key for getting runtime handler from pod annotations.
// See https://github.com/containerd/containerd/issues/6657 and https://github.com/containerd/containerd/pull/6899 for details.
// The value of this annotation should be the runtime for sandboxes.
// e.g. for [plugins.cri.containerd.runtimes.runc] runtime config, this value should be runc
// TODO: we should deprecate this annotation as soon as kubelet supports passing RuntimeHandler from PullImageRequest
RuntimeHandler = "io.containerd.cri.runtime-handler"
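
For reference, a minimal sketch of what one-snapshotter-per-runtime-class could look like in the containerd config (containerd 1.7+; the socket path, runtime name, and runtime_type below are assumptions; adjust to your deployment):

# /etc/containerd/config.toml (excerpt)
[proxy_plugins]
  [proxy_plugins.nydus]
    type = "snapshot"
    address = "/run/containerd-nydus/containerd-nydus-grpc.sock"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu]
  runtime_type = "io.containerd.kata-qemu.v2"
  # per-runtime snapshotter, replacing the io.containerd.cri.runtime-handler annotation
  snapshotter = "nydus"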

@liudalibj
Contributor

liudalibj commented Dec 4, 2023

I used the busybox image to verify the caching issue, creating a new peer-pod environment for every case:

test case | first tee-pod      | second tee-pod     | result (from CAA log)
----------|--------------------|--------------------|----------------------------------------------
1         | with annotation    | without annotation | second pod still pulls image with nydus: FAIL
2         | with annotation    | with annotation    | second pod pulls image with nydus: PASS
3         | without annotation | without annotation | second pod pulls image without nydus: PASS
4         | without annotation | with annotation    | second pod pulls image without nydus: FAIL

liudalibj added a commit to liudalibj/kata-containers that referenced this issue Dec 13, 2023
- add test cases for guest pull images
- need revist after we use container2.0 with 'image pull per runtime class' feature

for kata-containers#8337 and kata-containers#8407

Signed-off-by: Da Li Liu <liudali@cn.ibm.com>
ChengyuZhu6 pushed a commit to ChengyuZhu6/kata-containers that referenced this issue Dec 18, 2023
- add test cases for guest pull images
- need revist after we use container2.0 with 'image pull per runtime class' feature

for kata-containers#8337 and kata-containers#8407

Signed-off-by: Da Li Liu <liudali@cn.ibm.com>