
CreateContainerError: "failed to reserve container name is reserved for" #2312

Closed
kradalby opened this issue Sep 25, 2020 · 16 comments

@kradalby

kradalby commented Sep 25, 2020

Environmental Info:
K3s Version:
v1.19.2+k3s1

Node(s) CPU architecture, OS, and Version:
Raspberry Pi 4GB (x 4)
Linux rpi2.ldn 5.4.51-v8+ #1333 SMP PREEMPT Mon Aug 10 16:58:35 BST 2020 aarch64 aarch64 aarch64 GNU/Linux
Ubuntu 20.04 (Netbooted over NFS)

Cluster Configuration:

1 master, 3 workers

Describe the bug:

Steps To Reproduce:

Running 3 Raspberry Pis netbooted off the "master" node:

  • Installed K3s
  • Applied a Kubernetes manifest
  • ContainerCreationError

Tried with multiple different manifests

Expected behavior:

Container to schedule and run on a worker node

Actual behavior:

Failing with possibly several errors; "failed to reserve container name ... is reserved for ..." stands out.

Additional context / logs:

k3s-agent log:

Sep 25 12:39:23 rpi2.ldn k3s[359]: I0925 12:39:23.517191     359 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: 4ef2837e74c3b6c5d39ba120cdb6082de84a470b1da2845416e6690b04cb6f6c
Sep 25 12:39:23 rpi2.ldn k3s[359]: E0925 12:39:23.550222     359 remote_runtime.go:224] CreateContainer in sandbox "e5209adbf83d2a5615705cf7f53d51e74f72c1cf4711735e41d16603ca13453a" from runtime service failed: rpc error: code = Unknown desc = failed to reserve container name "homebridge_homebridge-59788f6597-hmhm5_homebridge_5741ca94-e5bb-429e-a4dc-9aaf8fc7ec0a_1": name "homebridge_homebridge-59788f6597-hmhm5_homebridge_5741ca94-e5bb-429e-a4dc-9aaf8fc7ec0a_1" is reserved for "d2ab3fef7210908374693aa902a84e7b5b01abb5ff4b1d73834f835f9b2ac2d7"
Sep 25 12:39:23 rpi2.ldn k3s[359]: E0925 12:39:23.550581     359 kuberuntime_manager.go:804] container &Container{Name:homebridge,Image:oznu/homebridge:no-avahi-arm64v8,Command:[],Args:[],WorkingDir:,Ports:[]ContainerPort{ContainerPort{Name:,HostPort:51325,ContainerPort:51325,Protocol:TCP,HostIP:,},},Env:[]EnvVar{EnvVar{Name:HOMEBRIDGE_CONFIG_UI,Value:1,ValueFrom:nil,},EnvVar{Name:HOMEBRIDGE_CONFIG_UI_PORT,Value:9090,ValueFrom:nil,},EnvVar{Name:TERMINATE_ON_ERROR,Value:1,ValueFrom:nil,},EnvVar{Name:PGID,Value:1000,ValueFrom:nil,},EnvVar{Name:PUID,Value:1000,ValueFrom:nil,},EnvVar{Name:TZ,Value:Europe/London,ValueFrom:nil,},},Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{},},VolumeMounts:[]VolumeMount{VolumeMount{Name:homebridge-persist-nfs,ReadOnly:false,MountPath:/homebridge/persist,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:homebridge-dir,ReadOnly:false,MountPath:/homebridge,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:default-token-84rxw,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:nil,Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,} start failed in pod homebridge-59788f6597-hmhm5_homebridge(5741ca94-e5bb-429e-a4dc-9aaf8fc7ec0a): CreateContainerError: failed to reserve container name "homebridge_homebridge-59788f6597-hmhm5_homebridge_5741ca94-e5bb-429e-a4dc-9aaf8fc7ec0a_1": name "homebridge_homebridge-59788f6597-hmhm5_homebridge_5741ca94-e5bb-429e-a4dc-9aaf8fc7ec0a_1" is reserved for "d2ab3fef7210908374693aa902a84e7b5b01abb5ff4b1d73834f835f9b2ac2d7"
Sep 25 12:39:23 rpi2.ldn k3s[359]: E0925 12:39:23.550707     359 pod_workers.go:191] Error syncing pod 5741ca94-e5bb-429e-a4dc-9aaf8fc7ec0a ("homebridge-59788f6597-hmhm5_homebridge(5741ca94-e5bb-429e-a4dc-9aaf8fc7ec0a)"), skipping: failed to "StartContainer" for "homebridge" with CreateContainerError: "failed to reserve container name \"homebridge_homebridge-59788f6597-hmhm5_homebridge_5741ca94-e5bb-429e-a4dc-9aaf8fc7ec0a_1\": name \"homebridge_homebridge-59788f6597-hmhm5_homebridge_5741ca94-e5bb-429e-a4dc-9aaf8fc7ec0a_1\" is reserved for \"d2ab3fef7210908374693aa902a84e7b5b01abb5ff4b1d73834f835f9b2ac2d7\""

kubectl describe -n homebridge po homebridge-59788f6597-hmhm5

Name:         homebridge-59788f6597-hmhm5
Namespace:    homebridge
Priority:     0
Node:         rpi2.ldn/10.65.0.31
Start Time:   Fri, 25 Sep 2020 13:08:54 +0100
Labels:       app=homebridge
              pod-template-hash=59788f6597
Annotations:  kompose.cmd: kompose convert
              kompose.version: 1.20.0 ()
Status:       Running
IP:           10.42.1.64
IPs:
  IP:           10.42.1.64
Controlled By:  ReplicaSet/homebridge-59788f6597
Init Containers:
  correct-config-permissions:
    Container ID:  containerd://a7e28e203716cfeec3df3e512c5ff2f59fc59b239c73a77368347cfdb73449d3
    Image:         busybox
    Image ID:      docker.io/library/busybox@sha256:d366a4665ab44f0648d7a00ae3fae139d55e32f9712c67accd604bb55df9d05a
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      cp -r /homebridge-config/* /homebridge
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 25 Sep 2020 13:09:11 +0100
      Finished:     Fri, 25 Sep 2020 13:09:11 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /homebridge from homebridge-dir (rw)
      /homebridge-config/config.json from homebridge-config (rw,path="config.json")
      /homebridge-config/startup.sh from homebridge-config (rw,path="startup.sh")
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-84rxw (ro)
Containers:
  homebridge:
    Container ID:   containerd://4ef2837e74c3b6c5d39ba120cdb6082de84a470b1da2845416e6690b04cb6f6c
    Image:          oznu/homebridge:no-avahi-arm64v8
    Image ID:       docker.io/oznu/homebridge@sha256:7d91488eedf5b3ddfef6e5460065ce5d5f44baabe2d7273040f9f21b356c1dab
    Port:           51325/TCP
    Host Port:      51325/TCP
    State:          Waiting
      Reason:       CreateContainerError
    Last State:     Terminated
      Exit Code:    0
      Started:      Mon, 01 Jan 0001 00:00:00 +0000
      Finished:     Mon, 01 Jan 0001 00:00:00 +0000
    Ready:          False
    Restart Count:  0
    Environment:
      HOMEBRIDGE_CONFIG_UI:       1
      HOMEBRIDGE_CONFIG_UI_PORT:  9090
      TERMINATE_ON_ERROR:         1
      PGID:                       1000
      PUID:                       1000
      TZ:                         Europe/London
    Mounts:
      /homebridge from homebridge-dir (rw)
      /homebridge/persist from homebridge-persist-nfs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-84rxw (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  homebridge-persist-nfs:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    10.65.0.1
    Path:      /k3s/homebridge
    ReadOnly:  false
  homebridge-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      homebridge-config
    Optional:  false
  homebridge-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  default-token-84rxw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-84rxw
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  27m                    default-scheduler  Successfully assigned homebridge/homebridge-59788f6597-hmhm5 to rpi2.ldn
  Normal   Pulling    27m                    kubelet            Pulling image "busybox"
  Normal   Pulled     27m                    kubelet            Successfully pulled image "busybox" in 1.91139412s
  Normal   Created    27m                    kubelet            Created container correct-config-permissions
  Normal   Started    27m                    kubelet            Started container correct-config-permissions
  Warning  Failed     25m                    kubelet            Error: context deadline exceeded
  Warning  Failed     23m (x9 over 25m)      kubelet            Error: failed to reserve container name "homebridge_homebridge-59788f6597-hmhm5_homebridge_5741ca94-e5bb-429e-a4dc-9aaf8fc7ec0a_0": name "homebridge_homebridge-59788f6597-hmhm5_homebridge_5741ca94-e5bb-429e-a4dc-9aaf8fc7ec0a_0" is reserved for "4ef2837e74c3b6c5d39ba120cdb6082de84a470b1da2845416e6690b04cb6f6c"
  Normal   Pulled     2m42s (x106 over 27m)  kubelet            Container image "oznu/homebridge:no-avahi-arm64v8" already present on machine

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: homebridge
  name: homebridge
  namespace: homebridge
spec:
  replicas: 1
  strategy: {}
  selector:
    matchLabels:
      app: homebridge
  template:
    metadata:
      annotations:
        kompose.cmd: kompose convert
        kompose.version: 1.20.0 ()
      creationTimestamp: null
      labels:
        app: homebridge
    spec:
      volumes:
        - name: homebridge-persist-nfs
          nfs:
            path: /k3s/homebridge
            server: 10.65.0.1
        - name: homebridge-config
          configMap:
            name: homebridge-config
        - name: homebridge-dir
          emptyDir: {}
      initContainers:
        - name: correct-config-permissions
          image: busybox
          # Set the user within the container so the copied files
          # will have the correct permissions in the homebridge
          # container.
          securityContext:
            runAsUser: 1000
            runAsGroup: 1000
          # Give `homebridge` user (id 1000) permissions a mounted volume
          command:
            - sh
            - -c
            - cp -r /homebridge-config/* /homebridge
          volumeMounts:
            - name: homebridge-config
              mountPath: "/homebridge-config/startup.sh"
              subPath: "startup.sh"
            - name: homebridge-config
              mountPath: "/homebridge-config/config.json"
              subPath: "config.json"
            - name: homebridge-dir
              mountPath: "/homebridge"
      containers:
        - env:
            - name: HOMEBRIDGE_CONFIG_UI
              value: "1"
            - name: HOMEBRIDGE_CONFIG_UI_PORT
              value: "9090"
            - name: TERMINATE_ON_ERROR
              value: "1"
            - name: PGID
              value: "1000"
            - name: PUID
              value: "1000"
            - name: TZ
              value: Europe/London
          image: oznu/homebridge:no-avahi-arm64v8
          ports:
            - containerPort: 51325
              hostPort: 51325
          volumeMounts:
            - name: homebridge-persist-nfs
              mountPath: "/homebridge/persist"
            - name: homebridge-dir
              mountPath: "/homebridge"
          name: homebridge
          resources: {}
      restartPolicy: Always

@brandond
Contributor

Can you share more information about the process you used to netboot your nodes? What are you using for storage? What filesystem is /var on?

@kradalby
Author

Hi, of course.

I use the "official" netboot feature of the Raspberry Pi's, where it looks for a PXE boot server and fetches the bootfiles over TFTP, then the "cmdline" containing the "bootline" will point to the NFS server:

console=serial0,115200 console=tty1 cgroup_enable=cpuset cgroup_enable=memory cgroup_memory=1 root=/dev/nfs nfsroot=10.65.0.1:/nfs/rpi2,vers=4.1,proto=tcp rw ip=dhcp rootwait elevator=deadline

The main Raspberry Pi (which holds the disk) has a 250GB SSD with EXT4, served over NFS v4.1.

This means that / on the "worker" nodes is NFS (backed by EXT4 on the server), including /var.
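
For reference, the server side of this setup roughly corresponds to an NFS export along these lines (a minimal, hypothetical sketch; the actual export options are not shown in this thread, and the subnet is a placeholder):

# /etc/exports on the NFS server (10.65.0.1) -- hypothetical example
# serving the netbooted worker's root filesystem read-write, without
# root squashing so the node can own its own files
/nfs/rpi2  10.65.0.0/24(rw,sync,no_subtree_check,no_root_squash)

# re-export after editing
sudo exportfs -ra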

@brandond
Contributor

Are all nodes using the same /var? That might account for some of the errors about things already existing. Either way, I don't think you can run containerd or docker on an NFS filesystem. If you could mount some local (non-shared) storage at the correct locations, that would probably work.
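
For illustration, "mounting local storage at the correct locations" could look roughly like this on a node that does have a spare local disk (a hedged sketch; /dev/sda1 is a placeholder, and /var/lib/rancher plus /var/lib/kubelet are the default k3s state directories):

# hypothetical: keep containerd/kubelet state off the NFS root
sudo mkfs.ext4 /dev/sda1
sudo mkdir -p /mnt/local
sudo mount /dev/sda1 /mnt/local
sudo mkdir -p /mnt/local/rancher /mnt/local/kubelet /var/lib/rancher /var/lib/kubelet
# bind the local directories over the default k3s paths
echo '/mnt/local/rancher /var/lib/rancher none bind 0 0' | sudo tee -a /etc/fstab
echo '/mnt/local/kubelet /var/lib/kubelet none bind 0 0' | sudo tee -a /etc/fstab
sudo mount -a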

@kradalby
Author

kradalby commented Sep 26, 2020 via email

@brandond
Contributor

brandond commented Sep 27, 2020

If you're going to try to get k3s working with netboot, I would recommend getting it working with a network block device like iSCSI. That way you at least have a normal filesystem, even without local storage. You're in fairly uncharted waters with an entire system running off NFS.
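
For anyone going that route, attaching an iSCSI LUN and using it for the k3s state directory could look roughly like this (a hedged sketch assuming open-iscsi and an already-configured target; the portal address, IQN, and device name are placeholders):

sudo apt install open-iscsi
# discover and log in to the target (placeholder portal/IQN)
sudo iscsiadm -m discovery -t sendtargets -p 10.65.0.1
sudo iscsiadm -m node -T iqn.2020-09.example:rpi2 -p 10.65.0.1 --login
# the LUN appears as a normal block device (e.g. /dev/sdb) and gets a
# regular local filesystem, unlike the NFS root
sudo mkfs.ext4 /dev/sdb
sudo mkdir -p /var/lib/rancher
sudo mount /dev/sdb /var/lib/rancher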

@kradalby
Author

Shifted to iSCSI, things work as expected. I guess my recommendation is to not use NFS when doing Kubernetes/containers/K3s.

@jiangytcn

Hi @brandond, I hit the same issue recently. My test lab has 2 k3s nodes, 1 server and 1 agent. Both machines were powered off for a couple of days, and after booting them back up some pods failed to start. The k3s logs in systemd show the same failure message mentioned above.

Is there anything that can help reset the cluster?
Here is the server's disk layout (screenshots attached in the original issue).
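
Not an answer given in this thread, but a commonly suggested way to clear a stuck name reservation on a k3s node is to remove the dead container that holds it using the bundled crictl, or to restart the container runtime entirely (a hedged sketch; the container ID is a placeholder):

sudo k3s crictl ps -a                  # find the container ID the name is reserved for
sudo k3s crictl rm -f <container-id>   # force-remove it so the kubelet can recreate it
# or, more drastically, stop everything and restart the k3s service
sudo k3s-killall.sh
sudo systemctl restart k3s             # use k3s-agent on worker nodes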

@brandond
Contributor

@jiangytcn It doesn't look like you're using NFS, so I don't think this is the same problem. Can you open a new issue and fill out the template? Make sure to include the specific error messages you're seeing.

@jiangytcn

Thanks @brandond. I had to reset the cluster state for an urgent task; I will open a separate issue if I hit the problem again.

@rur0

rur0 commented Nov 8, 2020

Experiencing the same issue running k3s HA with an external DB (2 masters, 6 workers) over NFS, each node having its own root and boot.
Is there a technical limitation of NFS that makes it impossible to run k3s on it?

@brandond
Contributor

brandond commented Nov 9, 2020

I can't find this documented specifically anywhere, but I don't believe NFS is a supported filesystem for docker or containerd. It just doesn't seem to work right.
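
One concrete reason (my understanding, not stated in this thread): containerd's default overlayfs snapshotter needs its upper/work layers on a local filesystem, and the kernel refuses NFS there. A quick way to see this on an NFS-rooted node:

# hedged sketch: try an overlay mount with NFS-backed upper/work dirs
mkdir -p /var/lib/overlay-test/{lower,upper,work,merged}
sudo mount -t overlay overlay \
  -o lowerdir=/var/lib/overlay-test/lower,upperdir=/var/lib/overlay-test/upper,workdir=/var/lib/overlay-test/work \
  /var/lib/overlay-test/merged
# on an NFS-backed /var this mount typically fails, and dmesg shows
# overlayfs refusing the filesystem as an upper layer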

@rur0

rur0 commented Nov 10, 2020

That sucks, since iSCSI is a bit of a pain to set up on RPi devices because the kernel modules are not distributed with the official Raspberry Pi OS. That means compiling a custom kernel with the required modules, which likely means every kernel update has to be done manually.

Does only /var have to be mounted using iSCSI, or /?
Thanks

@brandond
Contributor

I am pretty sure that Ubuntu for Pi has the iSCSI modules available; you might give that a try.
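
A quick way to check (hedged; these are the standard Ubuntu module names, and the iscsiadm userspace tooling comes from the open-iscsi package in the regular archive):

sudo modprobe iscsi_tcp   # initiator module shipped with Ubuntu's Pi kernel
lsmod | grep iscsi        # should show iscsi_tcp / libiscsi if the module loaded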

@212850a

212850a commented May 31, 2021

I have the same setup, K3s on Raspberry Pi 4s with netboot over NFS, and the same problem ("failed to reserve container name ... is reserved for ...") when I try to deploy home-assistant via helm. However, it works fine for other containers that were also installed via helm (like kube-prometheus-stack, pihole, plex and so on).
Interesting that the same setup causes the problem only for homebridge and home-assistant...

@hedefalk

hedefalk commented Mar 6, 2023

+1 with the same issue. I'm running a cluster of 4 Pi 4s, with an SSD attached to one of them and the rest PXE booting off that; the k3s master runs on the one with the disk and the agents on the others, over NFS to the master. It has been running fine for a while, but I get spurious errors when adding a node:

 Warning  Failed  57m (x4 over 58m)      kubelet  Error: failed to reserve container name "lb-tcp-80_svclb-traefik-45d167d6-5lnh5_kube-system_b86cae2e-3be6-4adc-ba87-67c0390c3fb2_9": name "lb-tcp-80_svclb-traefik-45d167d6-5lnh5_kube-system_b86cae2e-3be6-4adc-ba87-67c0390c3fb2_9" is reserved for "d0fd070c0fbdf98e28031428e82371c3fdc4104e06943b1785331a24e0d8631e"

@hedefalk

hedefalk commented Jun 3, 2023

After moving away from NFS entirely I'm still seeing these problems, primarily after a reboot of the control-plane/master nodes. It's a Pi cluster of only two nodes, each with its own USB SSD.

It feels like congestion against containerd caused by timeouts?

Events:
  Type     Reason                  Age                    From                     Message
  ----     ------                  ----                   ----                     -------
  Normal   Scheduled               15m                    default-scheduler        Successfully assigned default/homeassistant-5cbf8d97df-gwrr2 to pi1
  Normal   SuccessfulAttachVolume  15m                    attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-8e69b2b4-15ea-44ac-a1d9-efb9bd803614"
  Warning  Failed                  12m (x2 over 12m)      kubelet                  Error: failed to reserve container name "homeassistant_homeassistant-5cbf8d97df-gwrr2_default_6e7b2d6b-b857-4cf8-adfb-a82c987517e1_0": name "homeassistant_homeassistant-5cbf8d97df-gwrr2_default_6e7b2d6b-b857-4cf8-adfb-a82c987517e1_0" is reserved for "a865d5946f01eae4e91b4ff40729345e3e610b3ac4e03c99bb17c83dc281a119"
  Warning  Failed                  8m8s                   kubelet                  Error: failed to reserve container name "homeassistant_homeassistant-5cbf8d97df-gwrr2_default_6e7b2d6b-b857-4cf8-adfb-a82c987517e1_2": name "homeassistant_homeassistant-5cbf8d97df-gwrr2_default_6e7b2d6b-b857-4cf8-adfb-a82c987517e1_2" is reserved for "925ed3a190a319273dc4345820f2ec32d59ae60c14a975698d29f3b3ace5299a"
  Warning  Failed                  5m54s                  kubelet                  Error: failed to reserve container name "homeassistant_homeassistant-5cbf8d97df-gwrr2_default_6e7b2d6b-b857-4cf8-adfb-a82c987517e1_3": name "homeassistant_homeassistant-5cbf8d97df-gwrr2_default_6e7b2d6b-b857-4cf8-adfb-a82c987517e1_3" is reserved for "2bf5dfe14206c4909d736443fcfdf3d996389b2ef44f1eed05108e2a05c6af34"
  Warning  Failed                  3m29s (x2 over 3m42s)  kubelet                  Error: failed to reserve container name "homeassistant_homeassistant-5cbf8d97df-gwrr2_default_6e7b2d6b-b857-4cf8-adfb-a82c987517e1_4": name "homeassistant_homeassistant-5cbf8d97df-gwrr2_default_6e7b2d6b-b857-4cf8-adfb-a82c987517e1_4" is reserved for "622668b5897b44f6f01b143dae12b54df4cddbe8d9def81973305537b83323e0"
  Warning  Failed                  75s (x6 over 12m)      kubelet                  Error: context deadline exceeded
  Warning  Failed                  74s                    kubelet                  Error: failed to reserve container name "homeassistant_homeassistant-5cbf8d97df-gwrr2_default_6e7b2d6b-b857-4cf8-adfb-a82c987517e1_5": name "homeassistant_homeassistant-5cbf8d97df-gwrr2_default_6e7b2d6b-b857-4cf8-adfb-a82c987517e1_5" is reserved for "9d810172d8f4b2d8f24e222aee4168f257979ea6fb2f6960b9f5fc57f66dc167"
  Normal   Pulled                  63s (x14 over 14m)     kubelet                  Container image "homeassistant/home-assistant:2023.5" already present on machine
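
If the failures are driven by CRI calls timing out (the "context deadline exceeded" events above), one mitigation that is sometimes suggested for slow nodes is raising the kubelet's runtime request timeout through k3s. A hedged sketch with an illustrative value, not verified against this cluster:

# on the affected node: pass --runtime-request-timeout to the kubelet via k3s
cat <<'EOF' | sudo tee -a /etc/rancher/k3s/config.yaml
kubelet-arg:
  - "runtime-request-timeout=5m"
EOF
sudo systemctl restart k3s-agent   # or k3s on a server node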
