ImagePullBackOff on hcloud-csi-node and hcloud-csi-controller #442

Closed
melalj opened this issue Dec 5, 2022 · 7 comments

@melalj commented Dec 5, 2022

I started a fresh cluster and the terraform run went well, but after running kubectl get pods --all-namespaces I see the hcloud-csi-node and hcloud-csi-controller pods stuck with the error ImagePullBackOff.

When I inspect one of them, I see that pulls from k8s.gcr.io are being rejected with 403 Forbidden:

Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  5m12s                default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
  Normal   Scheduled         3m8s                 default-scheduler  Successfully assigned kube-system/hcloud-csi-controller-bb7658b8f-5fbbq to mycluster-it05-worker-large-lmb
  Normal   Pulled            3m7s                 kubelet            Successfully pulled image "hetznercloud/hcloud-csi-driver:2.1.0" in 130.499337ms
  Warning  Failed            3m7s                 kubelet            Failed to pull image "k8s.gcr.io/sig-storage/csi-provisioner:v2.2.2": rpc error: code = Unknown desc = failed to pull and unpack image "k8s.gcr.io/sig-storage/csi-provisioner:v2.2.2": failed to resolve reference "k8s.gcr.io/sig-storage/csi-provisioner:v2.2.2": pulling from host k8s.gcr.io failed with status code [manifests v2.2.2]: 403 Forbidden
  Warning  Failed            3m7s                 kubelet            Error: ErrImagePull
  Normal   Pulling           3m7s                 kubelet            Pulling image "k8s.gcr.io/sig-storage/csi-resizer:v1.2.0"
  Warning  Failed            3m7s                 kubelet            Failed to pull image "k8s.gcr.io/sig-storage/csi-resizer:v1.2.0": rpc error: code = Unknown desc = failed to pull and unpack image "k8s.gcr.io/sig-storage/csi-resizer:v1.2.0": failed to resolve reference "k8s.gcr.io/sig-storage/csi-resizer:v1.2.0": pulling from host k8s.gcr.io failed with status code [manifests v1.2.0]: 403 Forbidden
  Warning  Failed            3m7s                 kubelet            Error: ErrImagePull
  Normal   Pulling           3m7s                 kubelet            Pulling image "k8s.gcr.io/sig-storage/csi-provisioner:v2.2.2"
  Normal   Created           3m7s                 kubelet            Created container hcloud-csi-driver
  Warning  Failed            3m7s                 kubelet            Error: ErrImagePull
  Normal   Pulling           3m7s                 kubelet            Pulling image "hetznercloud/hcloud-csi-driver:2.1.0"
  Normal   Pulling           3m7s                 kubelet            Pulling image "k8s.gcr.io/sig-storage/csi-attacher:v3.2.1"
  Warning  Failed            3m7s                 kubelet            Failed to pull image "k8s.gcr.io/sig-storage/csi-attacher:v3.2.1": rpc error: code = Unknown desc = failed to pull and unpack image "k8s.gcr.io/sig-storage/csi-attacher:v3.2.1": failed to resolve reference "k8s.gcr.io/sig-storage/csi-attacher:v3.2.1": pulling from host k8s.gcr.io failed with status code [manifests v3.2.1]: 403 Forbidden
  Warning  Failed            3m7s                 kubelet            Failed to pull image "k8s.gcr.io/sig-storage/livenessprobe:v2.3.0": rpc error: code = Unknown desc = failed to pull and unpack image "k8s.gcr.io/sig-storage/livenessprobe:v2.3.0": failed to resolve reference "k8s.gcr.io/sig-storage/livenessprobe:v2.3.0": pulling from host k8s.gcr.io failed with status code [manifests v2.3.0]: 403 Forbidden
  Normal   Pulling           3m7s                 kubelet            Pulling image "k8s.gcr.io/sig-storage/livenessprobe:v2.3.0"
  Normal   Started           3m7s                 kubelet            Started container hcloud-csi-driver
  Warning  Failed            3m7s                 kubelet            Error: ErrImagePull
  Warning  Failed            3m6s                 kubelet            Error: ImagePullBackOff
  Warning  Failed            3m6s                 kubelet            Error: ImagePullBackOff
  Normal   BackOff           3m6s                 kubelet            Back-off pulling image "k8s.gcr.io/sig-storage/csi-resizer:v1.2.0"
  Warning  Failed            3m6s                 kubelet            Error: ImagePullBackOff
  Normal   BackOff           3m6s                 kubelet            Back-off pulling image "k8s.gcr.io/sig-storage/csi-provisioner:v2.2.2"
  Normal   BackOff           3m6s                 kubelet            Back-off pulling image "k8s.gcr.io/sig-storage/livenessprobe:v2.3.0"
  Warning  Failed            3m6s                 kubelet            Error: ImagePullBackOff
  Normal   BackOff           3m5s (x2 over 3m6s)  kubelet            Back-off pulling image "k8s.gcr.io/sig-storage/csi-attacher:v3.2.1"

FYI here's my terraform file:

terraform {
  required_version = ">= 1.3.5"
  required_providers {
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = "1.36.1"
    }
  }
  backend "gcs" {
    bucket = "xxx"
    credentials = "xxx"
  }
}

provider "hcloud" {
  token = var.hcloud_token
}

module "kube-hetzner" {
  providers = {
    hcloud = hcloud
  }
  hcloud_token = var.hcloud_token
  source = "kube-hetzner/kube-hetzner/hcloud"
  version = "1.6.8"
  ssh_public_key = file(var.ssh_public_key)
  ssh_private_key = file(var.ssh_private_key)
  network_region = var.network_region
  enable_cert_manager = false

  control_plane_nodepools = [
    {
      name        = "master",
      server_type = "cpx11",
      location    = var.node_location,
      labels      = [],
      taints      = [],
      count       = var.node_count_master
    }
  ]

  agent_nodepools = [
    {
      name        = "worker-small",
      server_type = "cpx11",
      location    = var.node_location,
      labels      = [],
      taints = [],
      count       = var.node_count_workers_small
    },
    {
      name        = "worker-medium",
      server_type = "cpx21",
      location    = var.node_location,
      labels      = [],
      taints      = [],
      count = var.node_count_workers_medium
    },
    {
      name        = "worker-large",
      server_type = "cpx31",
      location    = var.node_location,
      labels      = [],
      taints      = [],
      count = var.node_count_workers_large
    }
  ]

  load_balancer_type     = "lb11"
  load_balancer_location = var.node_location
  base_domain = ""
  cluster_name = var.cluster_name

  extra_firewall_rules = [
    # all TCP
    {
      description     = "TCP all"
      direction       = "out"
      protocol        = "tcp"
      port            = "any"
      source_ips      = []
      destination_ips = ["0.0.0.0/0", "::/0"]
    },
    # all UDP
    {
      description     = "UDP all"
      direction       = "out"
      protocol        = "udp"
      port            = "any"
      source_ips      = []
      destination_ips = ["0.0.0.0/0", "::/0"]
    }
  ]
}

output "kubeconfig" {
  value     = module.kube-hetzner.kubeconfig
  sensitive = true
}

Any help would be much appreciated.

@mysticaltech (Collaborator)

@melalj Please run terraform destroy -auto-approve and terraform init -upgrade, then try again.

Also try clearing the content of extra_firewall_rules; I think the rules you have are blocking all outgoing comms, and that may be interfering with the container pulls.

@mysticaltech (Collaborator)

Here are the default firewall rules if you are curious; most of them are needed for the proper functioning of the cluster.

base_firewall_rules = concat([

@melalj (Author) commented Dec 6, 2022

I filed a ticket on the hcloud csi-driver repo (hetznercloud/csi-driver#339 (comment)), and they mentioned that it might be due to a temporary outage of the k8s.gcr.io registry. I solved it by using a proxy registry.
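
For reference, a minimal sketch of that kind of redirect using the module's k3s_registries input (assuming a module version that supports it; the exact YAML is illustrative, not my literal config):

  # Sketch only: goes inside the module "kube-hetzner" block. The k3s_registries
  # input takes a k3s registries.yaml document. This mirrors the deprecated
  # k8s.gcr.io host to its successor registry.k8s.io so the CSI sidecar images
  # can still be resolved when k8s.gcr.io misbehaves.
  k3s_registries = <<-EOT
    mirrors:
      k8s.gcr.io:
        endpoint:
          - "https://registry.k8s.io"
  EOT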

Regarding the firewall rules, what I did was open all outgoing network traffic:

extra_firewall_rules = [
    # all TCP
    {
      description     = "TCP all"
      direction       = "out"
      protocol        = "tcp"
      port            = "any"
      source_ips      = []
      destination_ips = ["0.0.0.0/0", "::/0"]
    },
    # all UDP
    {
      description     = "UDP all"
      direction       = "out"
      protocol        = "udp"
      port            = "any"
      source_ips      = []
      destination_ips = ["0.0.0.0/0", "::/0"]
    }
  ]

Any insights on why this is not a good practice?

Thanks :)

@mysticaltech (Collaborator)

My bad @melalj, yes, you did indeed open the outgoing traffic to the world. I got confused for a sec there.

IMHO I wouldn't call that best practice; unless you know a service needs a specific port, it's better to keep things closed. You can always open a port later by adding a firewall rule for it in terraform and applying again.
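
For example, a sketch of a narrower rule that only opens a single outbound port (587, SMTP submission, is just a hypothetical example here):

  # Sketch only: opens one specific outbound TCP port instead of all of them.
  extra_firewall_rules = [
    {
      description     = "Outbound SMTP submission (hypothetical example)"
      direction       = "out"
      protocol        = "tcp"
      port            = "587"
      source_ips      = []
      destination_ips = ["0.0.0.0/0", "::/0"]
    }
  ]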

@mysticaltech (Collaborator)

@melalj Out of curiosity, by proxy registry do you mean you used our new k3s_registries feature that adds support for k3s private registries, or something else?

@melalj (Author) commented Dec 7, 2022

Exactly :) that feature came in pretty handy!

I used a self-hosted Sonatype Nexus registry that hosts my private Docker images but also proxies all public registries (docker.io, k8s.gcr.io, registry.k8s.io, ...).
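
Roughly, the k3s_registries config looks like the sketch below (nexus.example.com and the credentials are placeholders, not my real setup):

  # Sketch only: every public registry is mirrored through the Nexus proxy host.
  k3s_registries = <<-EOT
    mirrors:
      docker.io:
        endpoint:
          - "https://nexus.example.com"
      k8s.gcr.io:
        endpoint:
          - "https://nexus.example.com"
      registry.k8s.io:
        endpoint:
          - "https://nexus.example.com"
    configs:
      "nexus.example.com":
        auth:
          username: "pull-user"   # hypothetical read-only account
          password: "changeme"    # placeholder
  EOT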

@mysticaltech (Collaborator)

Wonderful, good to hear!

@s3rius Thanks again for your contribution! 🙏
