
Fallback to default registry endpoint is broken when using "*" wildcard mirror in registries.yaml with containerd 2.0 #11857

@lirtistan

Description


Environmental Info:
K3s Version:

k3s version v1.32.2+k3s1 (381620ef)
go version go1.23.6

Node(s) CPU architecture, OS, and Version:

2-node test cluster, uname -r reporting 6.1.0-30-amd64 on both nodes

Both installed with a minimal Debian 12 OS (Ansible deployment)

root@staging1:~# kubectl get nodes -o wide
NAME       STATUS   ROLES                  AGE   VERSION        INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION   CONTAINER-RUNTIME
staging1   Ready    control-plane,master   61m   v1.32.2+k3s1   172.16.0.1    172.16.0.1    Debian GNU/Linux 12 (bookworm)   6.1.0-30-amd64   containerd://2.0.2-k3s2
staging2   Ready    control-plane,master   60m   v1.32.2+k3s1   172.16.0.2    172.16.0.2    Debian GNU/Linux 12 (bookworm)   6.1.0-30-amd64   containerd://2.0.2-k3s2

Cluster Configuration:

  • 2 Node (both as control-plane,master) VMs running in stock KVM (libvirt)
  • besides lo, every node has two interfaces: eth0 for WAN traffic and eth1 for LAN traffic
  • Host: staging1 | LAN-IP: 172.16.0.1 | WAN-IP: 192.168.122.246
  • Host: staging2 | LAN-IP: 172.16.0.2 | WAN-IP: 192.168.122.90
  • The following Helm charts are installed:
    root@staging2:~# helm list -A
    NAME         	NAMESPACE      	REVISION	UPDATED                                	STATUS  	CHART               	APP VERSION
    cilium       	cilium-system  	1       	2025-02-28 07:28:32.804174608 +0100 CET	deployed	cilium-1.17.1       	1.17.1     
    ingress-nginx	ingress-nginx  	1       	2025-02-28 07:31:31.713591888 +0100 CET	failed  	ingress-nginx-4.12.0	1.12.0     
    longhorn     	longhorn-system	1       	2025-02-28 07:30:47.23270258 +0100 CET 	deployed	longhorn-1.8.0      	v1.8.0  
    

Describe the bug:
New workload deployments on K3s v1.32.2+k3s1 are failing/hanging in ContainerCreating status, because something must have changed in how the registries.yaml config is interpreted.

root@staging1:~# kubectl get pods -A -o wide
NAMESPACE         NAME                                       READY   STATUS              RESTARTS   AGE     IP           NODE       NOMINATED NODE   READINESS GATES
cilium-system     cilium-envoy-krklb                         0/1     ContainerCreating   0          4m29s   172.16.0.2   staging2   <none>           <none>
cilium-system     cilium-envoy-vmckv                         0/1     ContainerCreating   0          5m48s   172.16.0.1   staging1   <none>           <none>
cilium-system     cilium-operator-85bf6f5694-sc5x8           0/1     ContainerCreating   0          5m48s   172.16.0.2   staging2   <none>           <none>
cilium-system     cilium-operator-85bf6f5694-x6k7v           0/1     ContainerCreating   0          5m48s   172.16.0.1   staging1   <none>           <none>
cilium-system     cilium-p9fpg                               0/1     Init:0/6            0          4m29s   172.16.0.2   staging2   <none>           <none>
cilium-system     cilium-pnpn6                               0/1     Init:0/6            0          5m48s   172.16.0.1   staging1   <none>           <none>
cilium-system     hubble-relay-75d4f954d-gnlsg               0/1     ContainerCreating   0          5m48s   <none>       staging1   <none>           <none>
cilium-system     hubble-relay-75d4f954d-slc5r               0/1     ContainerCreating   0          5m48s   <none>       staging1   <none>           <none>
ingress-nginx     ingress-nginx-admission-create-n2852       0/1     ContainerCreating   0          2m51s   <none>       staging2   <none>           <none>
kube-system       coredns-ff8999cc5-mwjg6                    0/1     ContainerCreating   0          6m11s   <none>       staging1   <none>           <none>
kube-system       coredns-ff8999cc5-nzpfl                    0/1     ContainerCreating   0          6m11s   <none>       staging1   <none>           <none>
kube-system       local-path-provisioner-774c6665dc-bzlcr    0/1     ContainerCreating   0          6m11s   <none>       staging1   <none>           <none>
kube-system       metrics-server-6f4c6675d5-zjdpk            0/1     ContainerCreating   0          6m11s   <none>       staging1   <none>           <none>
longhorn-system   longhorn-driver-deployer-b8bc4675f-wfhw2   0/1     Init:0/1            0          3m34s   <none>       staging2   <none>           <none>
longhorn-system   longhorn-manager-qcg9t                     0/2     ContainerCreating   0          3m34s   <none>       staging1   <none>           <none>
longhorn-system   longhorn-manager-zdjmp                     0/2     ContainerCreating   0          3m34s   <none>       staging2   <none>           <none>
longhorn-system   longhorn-ui-7749bb466f-52gcb               0/1     ContainerCreating   0          3m34s   <none>       staging1   <none>           <none>
longhorn-system   longhorn-ui-7749bb466f-gjk9k               0/1     ContainerCreating   0          3m34s   <none>       staging2   <none>           <none>

Output from a cilium-agent Pod describe:

 Warning  FailedCreatePodSandBox  7s (x9 over 110s)  kubelet            (combined from similar events): Failed to create pod sandbox: rpc error: code = NotFound desc = failed to start sandbox "eb7a7309006ec34550497ac44a5aea5b18e4348f076babd763cb1c1a19fe5d6d": failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": docker.io/rancher/mirrored-pause:3.6: not found
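
For reference, the same failure can be reproduced on a node without going through the kubelet, by pulling the sandbox image via the CRI socket with the crictl bundled in k3s (image name taken from the event above; a default k3s install is assumed):

# reproduce the failed pull directly against the CRI runtime
k3s crictl pull docker.io/rancher/mirrored-pause:3.6
# with the "*" wildcard mirror in place this fails with the same "not found";
# after moving registries.yaml aside and restarting k3s, the pull succeeds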

So I moved /etc/rancher/k3s/registries.yaml to another location, restarted the k3s.service, and voilà, everything got pulled.
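
Before moving the file aside, it can also help to look at what k3s generated for containerd from the wildcard mirror. The paths below assume a default k3s data dir, and that the "*" entry is rendered into containerd's _default host directory (my understanding of the containerd 2.x host-config layout, not verified here):

# generated registry host config (default k3s paths assumed)
ls /var/lib/rancher/k3s/agent/etc/containerd/certs.d/
cat /var/lib/rancher/k3s/agent/etc/containerd/certs.d/_default/hosts.toml
# if only http://localhost:5000 shows up here, the pull has nowhere to fall
# back to once the local mirror does not have the image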

Content of the registries.yaml:

mirrors:
  "*":
    endpoint:
    - "http://localhost:5000"
configs:
  "docker.io":
  "quay.io":
  "*":
    tls:
      insecure_skip_verify: true
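
A possible interim workaround (sketched, not tested here) would be to drop the "*" wildcard and enumerate the mirrored registries explicitly, so the pull relies on containerd's default-endpoint fallback for named registries rather than on the wildcard path:

mirrors:
  docker.io:
    endpoint:
    - "http://localhost:5000"
  quay.io:
    endpoint:
    - "http://localhost:5000"
# whether the implicit fallback to the upstream registry still works for
# named mirrors on containerd 2.0 would need to be verified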

Steps To Reproduce:

See the bug description above.

Expected behavior:

registries.yaml hasn't changed between my earlier deployments, nor does the documentation mention any change.
So everything should be working.

Actual behavior:

Images can't be pulled. The root cause is still unknown; currently I don't have much time to dive into the code.

Additional context / logs:

See the bug description above.
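
For anyone digging further, the containerd-side view of the failed pulls should be visible in the k3s journal and in the embedded containerd log (units and paths below assume a systemd-managed default k3s install):

journalctl -u k3s --no-pager | grep -i mirrored-pause
grep -i "localhost:5000" /var/lib/rancher/k3s/agent/containerd/containerd.log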

Metadata

Labels

kind/upstream-issue: This issue appears to be caused by an upstream bug
waiting-for-RC: Issue is available to test only after we have an RC
