restartPolicy Never does not fail pod if init container fails to pull image #83622

Open
sreya92 opened this issue Oct 8, 2019 · 10 comments

@sreya92
commented Oct 8, 2019

What happened:
A pod with restartPolicy: Never has an init container whose image does not exist. Instead of being marked Failed, the pod stays Pending indefinitely.

What you expected to happen:
The pod should be marked Failed after the image pull fails.

How to reproduce it (as minimally and precisely as possible):

apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  restartPolicy: Never
  initContainers:
    - name: init
      image: some-nonexistent-image
  containers:
    - name: main
      image: ubuntu

kubectl describe pod test

Name:         test
Namespace:    <redacted>
Priority:     0
Node:         <redacted>
Start Time:   Tue, 08 Oct 2019 11:54:24 -0500
Labels:       <none>
Annotations:  cni.projectcalico.org/podIP: 10.8.1.254/32
              kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"test","namespace":"<redacted>"},"spec":{"containers":[{"image...
Status:       Pending
IP:           10.8.1.254
Init Containers:
  init:
    Container ID:
    Image:          some-nonexistent-image
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-b5zn2 (ro)
Containers:
  main:
    Container ID:
    Image:          ubuntu
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-b5zn2 (ro)
Conditions:
  Type              Status
  Initialized       False
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-b5zn2:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-b5zn2
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason          Age                 From                                              Message
  ----     ------          ----                ----                                              -------
  Normal   Scheduled       25m                 default-scheduler                                 Successfully assigned <redacted>
  Normal   SandboxChanged  25m                 kubelet, <redacted>  Pod sandbox changed, it will be killed and re-created.
  Normal   Pulling         23m (x4 over 25m)   kubelet, <redacted>  Pulling image "some-nonexistent-image"
  Warning  Failed          23m (x4 over 25m)   <redacted>  Failed to pull image "some-nonexistent-image": rpc error: code = Unknown desc = Error response from daemon: pull access denied for some-nonexistent-image, repository does not exist or may require 'docker login'
  Warning  Failed          23m (x4 over 25m)   kubelet, <redacted>  Error: ErrImagePull
  Normal   BackOff         10m (x67 over 25m)  kubelet, <redacted>  Back-off pulling image "some-nonexistent-image"
  Warning  Failed          2s (x112 over 25m)  kubelet, <redacted>  Error: ImagePullBackOff
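
The output above shows the init container stuck in Waiting (ImagePullBackOff) and never reaching a terminated state. A minimal sketch of why that could leave the pod Pending, assuming a simplified phase rule in which only a terminated init container with a non-zero exit code fails the pod. The ContainerState type and podPhase function are hypothetical names made up for this illustration, not the kubelet's actual code:

```go
package main

import "fmt"

// ContainerState is a hypothetical, simplified stand-in for a container's
// reported state. It is not the Kubernetes API type.
type ContainerState struct {
	Name       string
	Waiting    bool   // container has not started yet
	Reason     string // e.g. "ImagePullBackOff"
	Terminated bool
	ExitCode   int
}

// podPhase applies an assumed rule for restartPolicy: Never.
// Only an init container that terminated with a non-zero exit code
// fails the pod. One that never started keeps the pod Pending.
func podPhase(initContainers []ContainerState) string {
	for _, c := range initContainers {
		if c.Terminated && c.ExitCode != 0 {
			return "Failed"
		}
	}
	for _, c := range initContainers {
		if !c.Terminated {
			return "Pending"
		}
	}
	return "Initialized"
}

func main() {
	// The case from this report: the image pull never succeeds.
	stuck := []ContainerState{{Name: "init", Waiting: true, Reason: "ImagePullBackOff"}}
	fmt.Println(podPhase(stuck)) // prints "Pending"

	// For contrast: the init container ran and exited with an error.
	crashed := []ContainerState{{Name: "init", Terminated: true, ExitCode: 1}}
	fmt.Println(podPhase(crashed)) // prints "Failed"
}
```

Under this assumed rule, an image pull failure never produces an exit code, so the Failed branch is unreachable for the pod in this report.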

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"archive", BuildDate:"2019-08-29T18:43:18Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.6-gke.1", GitCommit:"61c30f98599ad5309185df308962054d9670bafa", GitTreeState:"clean", BuildDate:"2019-08-28T11:06:42Z", GoVersion:"go1.12.9b4", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
    GKE
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:

@kubernetes/sig-api-machinery-bugs

/sig api-machinery

sreya92 added the kind/bug label Oct 8, 2019
@k8s-ci-robot

Contributor

commented Oct 8, 2019

@sreya92: The label(s) sig/bugs cannot be applied. These labels are supported: api-review, community/discussion, community/maintenance, community/question, cuj/build-train-deploy, cuj/multi-user, platform/aws, platform/azure, platform/gcp, platform/minikube, platform/other

In response to this:

/sig bugs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sreya92

Author

commented Oct 8, 2019

@kubernetes/sig-api-machinery-bugs

@tedyu

Contributor

commented Oct 8, 2019

In the kubelet log, do you see the following?

		klog.V(5).Infof("pod default case, pending")
@sreya92

Author

commented Oct 8, 2019

@tedyu On the node where the pod was scheduled, this command returns nothing:

sudo journalctl -u kubelet | grep -i "pod default case"

@sreya92

Author

commented Oct 8, 2019

There are plenty of these, however:

init container start failed: ErrImagePull: rpc error: code = Unknown desc = Error response from daemon: pull access denied for some-nonexistent-image, repository does not exist or may require 'docker login'
Error syncing pod 4e20faa3-ea07-11e9-a691-42010a8001d9 ("<redacted>(4e20faa3-ea07-11e9-a691-42010a8001d9)"), skipping: failed to "StartContainer" for "init" with ErrImagePull: "rpc error: code = Unknown desc = Error response from daemon: pull access denied for some-nonexistent-image, repository does not exist or may require 'docker login'"
@tedyu

Contributor

commented Oct 8, 2019

Initial thought:

diff --git a/pkg/kubelet/errors.go b/pkg/kubelet/errors.go
index 7d6a5dd9ce..bf14d26332 100644
--- a/pkg/kubelet/errors.go
+++ b/pkg/kubelet/errors.go
@@ -21,6 +21,7 @@ import "errors"
 const (
        // NetworkNotReadyErrorMsg is used to describe the error that network is not ready
        NetworkNotReadyErrorMsg = "network is not ready"
+       ImagePullErrorMsg = "with ErrImagePull"
 )

 var (
diff --git a/pkg/kubelet/kubelet.go b/pkg/kubelet/kubelet.go
index e343ca1536..c4eb7891e4 100644
--- a/pkg/kubelet/kubelet.go
+++ b/pkg/kubelet/kubelet.go
@@ -2015,6 +2015,9 @@ func (kl *Kubelet) dispatchWork(pod *v1.Pod, syncType kubetypes.SyncPodType, mir
                        if err != nil {
                                metrics.PodWorkerDuration.WithLabelValues(syncType.String()).Observe(metrics.SinceInSeconds(start))
                                metrics.DeprecatedPodWorkerLatency.WithLabelValues(syncType.String()).Observe(metrics.SinceInMicroseconds(start))
+                               if strings.Contains(err.Error(), ImagePullErrorMsg) {
+                                       // convey the error upstream
+                               }
                        }
                },
        })

Still need to figure out the proper way to bubble up the error in OnCompleteFunc.

@zhouya0

Contributor

commented Oct 9, 2019

I ran a pod with an invalid image on k8s 1.15.3. Whether the invalid image is in initContainers or in containers, the pod ends up with Pending status.

@sreya92

Author

commented Oct 9, 2019

@roycaihw

Member

commented Oct 10, 2019

/remove-sig api-machinery
/sig node
