Pod should not say "pending" when a fundamental failure has occurred #7968

Closed
aronchick opened this issue May 8, 2015 · 12 comments
Assignees
Labels
area/introspection sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.

Comments

@aronchick
Contributor

Repro:

  • Create an RC.yml with an unknown image
  • Attempt to create RC

Expected:

  • Reported error in status (rather than "Pending")

Actual:

  • Reports Pending indefinitely

Generally, we should have a different message/state when it's a known bad issue. Here's the Kubelet.log:

[...Standard log spam...]
I0508 16:03:33.080049    4546 server.go:635] GET /stats/: (7.30147ms) 0 [[Go 1.1 package http] 10.244.0.3:37284]
I0508 16:03:43.052231    4546 server.go:635] GET /stats/default/kibana-logging-f8x0s/0ab9c0db-f53a-11e4-8b22-42010af08867/kibana-logging: (2.374072ms) 0 [[Go 1.1 package http] 10.244.0.3:37286]
I0508 16:03:43.054652    4546 server.go:635] GET /stats/default/fluentd-elasticsearch-kubernetes-minion-htb6/2d0811ae-f53a-11e4-8b22-42010af08867/fluentd-elasticsearch: (1.95419ms) 0 [[Go 1.1 package http] 10.244.0.3:37284]
I0508 16:03:43.081482    4546 server.go:635] GET /stats/: (1.425988ms) 0 [[Go 1.1 package http] 10.244.0.3:37284]
I0508 16:03:47.438908    4546 container.go:242] Failed to update stats for container "/docker/70eb42c83a9e798e86355871c62224419dbc8feaf50442c1dbe61c7f950283c0": failed to read stat from "/sys/class/net/statistics/rx_bytes" for device "", continuing to push stats
I0508 16:03:53.052731    4546 server.go:635] GET /stats/default/kibana-logging-f8x0s/0ab9c0db-f53a-11e4-8b22-42010af08867/kibana-logging: (2.32469ms) 0 [[Go 1.1 package http] 10.244.0.3:37284]
I0508 16:03:53.057521    4546 server.go:635] GET /stats/default/fluentd-elasticsearch-kubernetes-minion-htb6/2d0811ae-f53a-11e4-8b22-42010af08867/fluentd-elasticsearch: (2.172404ms) 0 [[Go 1.1 package http] 10.244.0.3:37286]
I0508 16:03:53.078232    4546 server.go:635] GET /stats/: (1.415304ms) 0 [[Go 1.1 package http] 10.244.0.3:37286]
I0508 16:04:03.055660    4546 server.go:635] GET /stats/default/fluentd-elasticsearch-kubernetes-minion-htb6/2d0811ae-f53a-11e4-8b22-42010af08867/fluentd-elasticsearch: (2.008658ms) 0 [[Go 1.1 package http] 10.244.0.3:37286]
I0508 16:04:03.058324    4546 server.go:635] GET /stats/default/kibana-logging-f8x0s/0ab9c0db-f53a-11e4-8b22-42010af08867/kibana-logging: (2.153476ms) 0 [[Go 1.1 package http] 10.244.0.3:37284]
I0508 16:04:03.091466    4546 server.go:635] GET /stats/: (1.400028ms) 0 [[Go 1.1 package http] 10.244.0.3:37284]
I0508 16:04:13.068534    4546 server.go:635] GET /stats/default/fluentd-elasticsearch-kubernetes-minion-htb6/2d0811ae-f53a-11e4-8b22-42010af08867/fluentd-elasticsearch: (2.989418ms) 0 [[Go 1.1 package http] 10.244.0.3:37284]
I0508 16:04:13.072799    4546 server.go:635] GET /stats/default/kibana-logging-f8x0s/0ab9c0db-f53a-11e4-8b22-42010af08867/kibana-logging: (3.588085ms) 0 [[Go 1.1 package http] 10.244.0.3:37286]
I0508 16:04:13.096192    4546 server.go:635] GET /stats/: (1.425312ms) 0 [[Go 1.1 package http] 10.244.0.3:37286]
I0508 16:04:23.057099    4546 server.go:635] GET /stats/default/kibana-logging-f8x0s/0ab9c0db-f53a-11e4-8b22-42010af08867/kibana-logging: (2.545718ms) 0 [[Go 1.1 package http] 10.244.0.3:37284]
I0508 16:04:23.059278    4546 server.go:635] GET /stats/default/fluentd-elasticsearch-kubernetes-minion-htb6/2d0811ae-f53a-11e4-8b22-42010af08867/fluentd-elasticsearch: (1.713525ms) 0 [[Go 1.1 package http] 10.244.0.3:37286]
I0508 16:04:23.088538    4546 server.go:635] GET /stats/: (1.484895ms) 0 [[Go 1.1 package http] 10.244.0.3:37284]
I0508 16:04:33.041244    4546 container.go:242] Failed to update stats for container "/docker/e468701460395a7757ca6d1512ec87f248adf1f56f608b479252e2eb1875e8aa": failed to read stat from "/sys/class/net/statistics/rx_bytes" for device "", continuing to push stats
I0508 16:04:33.054764    4546 server.go:635] GET /stats/default/kibana-logging-f8x0s/0ab9c0db-f53a-11e4-8b22-42010af08867/kibana-logging: (2.438062ms) 0 [[Go 1.1 package http] 10.244.0.3:37286]
I0508 16:04:33.056940    4546 server.go:635] GET /stats/default/fluentd-elasticsearch-kubernetes-minion-htb6/2d0811ae-f53a-11e4-8b22-42010af08867/fluentd-elasticsearch: (1.788249ms) 0 [[Go 1.1 package http] 10.244.0.3:37284]
I0508 16:04:33.080508    4546 server.go:635] GET /stats/: (1.435836ms) 0 [[Go 1.1 package http] 10.244.0.3:37284]
I0508 16:04:43.062392    4546 server.go:635] GET /stats/default/fluentd-elasticsearch-kubernetes-minion-htb6/2d0811ae-f53a-11e4-8b22-42010af08867/fluentd-elasticsearch: (2.302632ms) 0 [[Go 1.1 package http] 10.244.0.3:37284]
I0508 16:04:43.065072    4546 server.go:635] GET /stats/default/kibana-logging-f8x0s/0ab9c0db-f53a-11e4-8b22-42010af08867/kibana-logging: (2.119145ms) 0 [[Go 1.1 package http] 10.244.0.3:37286]
I0508 16:04:43.080599    4546 server.go:635] GET /stats/: (1.314072ms) 0 [[Go 1.1 package http] 10.244.0.3:37286]
I0508 16:04:48.200695    4546 kubelet.go:1004] Need to restart pod infra container for "client-fh62u_default" because it is not found
I0508 16:04:48.208287    4546 event.go:200] Event(api.ObjectReference{Kind:"Pod", Namespace:"default", Name:"client-fh62u", UID:"f3752ac9-f59b-11e4-8b22-42010af08867", APIVersion:"v1beta3", ResourceVersion:"24269", FieldPath:"implicitly required container POD"}): reason: 'pulled' Successfully pulled image "gcr.io/google_containers/pause:0.8.0"
I0508 16:04:48.238861    4546 event.go:200] Event(api.ObjectReference{Kind:"Pod", Namespace:"default", Name:"client-fh62u", UID:"f3752ac9-f59b-11e4-8b22-42010af08867", APIVersion:"v1beta3", ResourceVersion:"24269", FieldPath:"implicitly required container POD"}): reason: 'created' Created with docker id 2b569c13359fad769873c7d428bf05d275875cc1379e41732f42d72c0b3326fe
I0508 16:04:48.303341    4546 event.go:200] Event(api.ObjectReference{Kind:"Pod", Namespace:"default", Name:"client-fh62u", UID:"f3752ac9-f59b-11e4-8b22-42010af08867", APIVersion:"v1beta3", ResourceVersion:"24269", FieldPath:"implicitly required container POD"}): reason: 'started' Started with docker id 2b569c13359fad769873c7d428bf05d275875cc1379e41732f42d72c0b3326fe
I0508 16:04:48.318329    4546 manager.go:662] Added container: "/docker/2b569c13359fad769873c7d428bf05d275875cc1379e41732f42d72c0b3326fe" (aliases: [k8s_POD.d41d03ce_client-fh62u_default_f3752ac9-f59b-11e4-8b22-42010af08867_750aff04 2b569c13359fad769873c7d428bf05d275875cc1379e41732f42d72c0b3326fe], namespace: "docker")
I0508 16:04:48.327464    4546 provider.go:91] Refreshing cache for provider: *gcp_credentials.dockerConfigUrlKeyProvider
I0508 16:04:48.329777    4546 config.go:119] body of failing http response: &{0xc2086ff8c0 {0 0} false <nil> 0x5bb190 0x5bb120}
E0508 16:04:48.329825    4546 metadata.go:121] while reading 'google-dockercfg-url' metadata: http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/attributes/google-dockercfg-url
I0508 16:04:48.331936    4546 provider.go:91] Refreshing cache for provider: *credentialprovider.defaultDockerConfigProvider
I0508 16:04:48.332039    4546 provider.go:91] Refreshing cache for provider: *gcp_credentials.dockerConfigKeyProvider
I0508 16:04:48.332554    4546 config.go:119] body of failing http response: &{0xc2086ffd00 {0 0} false <nil> 0x5bb190 0x5bb120}
E0508 16:04:48.332580    4546 metadata.go:109] while reading 'google-dockercfg' metadata: http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/attributes/google-dockercfg
I0508 16:04:48.438874    4546 container.go:242] Failed to update stats for container "/docker/70eb42c83a9e798e86355871c62224419dbc8feaf50442c1dbe61c7f950283c0": failed to read stat from "/sys/class/net/statistics/rx_bytes" for device "", continuing to push stats
W0508 16:04:48.606656    4546 kubelet.go:1199] Failed to pull image "gcr.io/foggy-willow-91020/client:v1.0.0" from pod "client-fh62u_default" and container "client": Error: image foggy-willow-91020/client:v1.0.0 not found
I0508 16:04:48.606890    4546 event.go:200] Event(api.ObjectReference{Kind:"Pod", Namespace:"default", Name:"client-fh62u", UID:"f3752ac9-f59b-11e4-8b22-42010af08867", APIVersion:"v1beta3", ResourceVersion:"24269", FieldPath:"spec.containers{client}"}): reason: 'failed' Failed to pull image "gcr.io/foggy-willow-91020/client:v1.0.0": Error: image foggy-willow-91020/client:v1.0.0 not found
I0508 16:04:53.050975    4546 server.go:635] GET /stats/default/kibana-logging-f8x0s/0ab9c0db-f53a-11e4-8b22-42010af08867/kibana-logging: (2.569726ms) 0 [[Go 1.1 package http] 10.244.0.3:37284]
I0508 16:04:53.054252    4546 server.go:635] GET /stats/default/fluentd-elasticsearch-kubernetes-minion-htb6/2d0811ae-f53a-11e4-8b22-42010af08867/fluentd-elasticsearch: (2.658673ms) 0 [[Go 1.1 package http] 10.244.0.3:37286]
I0508 16:04:53.057785    4546 server.go:635] GET /stats/default/client-fh62u/f3752ac9-f59b-11e4-8b22-42010af08867/client: (1.953283ms) 404 [[Go 1.1 package http] 10.244.0.3:56826]
I0508 16:04:53.088914    4546 server.go:635] GET /stats/: (13.844544ms) 0 [[Go 1.1 package http] 10.244.0.3:37284]
W0508 16:04:58.344203    4546 kubelet.go:1199] Failed to pull image "gcr.io/foggy-willow-91020/client:v1.0.0" from pod "client-fh62u_default" and container "client": Error: image foggy-willow-91020/client:v1.0.0 not found
I0508 16:04:58.344668    4546 event.go:200] Event(api.ObjectReference{Kind:"Pod", Namespace:"default", Name:"client-fh62u", UID:"f3752ac9-f59b-11e4-8b22-42010af08867", APIVersion:"v1beta3", ResourceVersion:"24269", FieldPath:"spec.containers{client}"}): reason: 'failed' Failed to pull image "gcr.io/foggy-willow-91020/client:v1.0.0": Error: image foggy-willow-91020/client:v1.0.0 not found
I0508 16:05:03.048297    4546 server.go:635] GET /stats/default/client-fh62u/f3752ac9-f59b-11e4-8b22-42010af08867/client: (1.429018ms) 404 [[Go 1.1 package http] 10.244.0.3:37284]
I0508 16:05:03.052707    4546 server.go:635] GET /stats/default/fluentd-elasticsearch-kubernetes-minion-htb6/2d0811ae-f53a-11e4-8b22-42010af08867/fluentd-elasticsearch: (1.852762ms) 0 [[Go 1.1 package http] 10.244.0.3:37286]
I0508 16:05:03.057067    4546 server.go:635] GET /stats/default/kibana-logging-f8x0s/0ab9c0db-f53a-11e4-8b22-42010af08867/kibana-logging: (2.273205ms) 0 [[Go 1.1 package http] 10.244.0.3:37286]
I0508 16:05:03.075094    4546 server.go:635] GET /stats/: (1.39575ms) 0 [[Go 1.1 package http] 10.244.0.3:37284]
W0508 16:05:08.435914    4546 kubelet.go:1199] Failed to pull image "gcr.io/foggy-willow-91020/client:v1.0.0" from pod "client-fh62u_default" and container "client": Error: image foggy-willow-91020/client:v1.0.0 not found
I0508 16:05:08.439094    4546 event.go:200] Event(api.ObjectReference{Kind:"Pod", Namespace:"default", Name:"client-fh62u", UID:"f3752ac9-f59b-11e4-8b22-42010af08867", APIVersion:"v1beta3", ResourceVersion:"24269", FieldPath:"spec.containers{client}"}): reason: 'failed' Failed to pull image "gcr.io/foggy-willow-91020/client:v1.0.0": Error: image foggy-willow-91020/client:v1.0.0 not found
I0508 16:05:13.055541    4546 server.go:635] GET /stats/default/kibana-logging-f8x0s/0ab9c0db-f53a-11e4-8b22-42010af08867/kibana-logging: (2.619076ms) 0 [[Go 1.1 package http] 10.244.0.3:37284]
I0508 16:05:13.059485    4546 server.go:635] GET /stats/default/fluentd-elasticsearch-kubernetes-minion-htb6/2d0811ae-f53a-11e4-8b22-42010af08867/fluentd-elasticsearch: (1.884172ms) 0 [[Go 1.1 package http] 10.244.0.3:56840]
I0508 16:05:13.068499    4546 server.go:635] GET /stats/default/client-fh62u/f3752ac9-f59b-11e4-8b22-42010af08867/client: (2.090625ms) 404 [[Go 1.1 package http] 10.244.0.3:37284]
I0508 16:05:13.095861    4546 server.go:635] GET /stats/: (2.549202ms) 0 [[Go 1.1 package http] 10.244.0.3:56849]
W0508 16:05:18.453630    4546 kubelet.go:1199] Failed to pull image "gcr.io/foggy-willow-91020/client:v1.0.0" from pod "client-fh62u_default" and container "client": Error: image foggy-willow-91020/client:v1.0.0 not found
I0508 16:05:18.454335    4546 event.go:200] Event(api.ObjectReference{Kind:"Pod", Namespace:"default", Name:"client-fh62u", UID:"f3752ac9-f59b-11e4-8b22-42010af08867", APIVersion:"v1beta3", ResourceVersion:"24269", FieldPath:"spec.containers{client}"}): reason: 'failed' Failed to pull image "gcr.io/foggy-willow-91020/client:v1.0.0": Error: image foggy-willow-91020/client:v1.0.0 not found
I0508 16:05:23.051284    4546 server.go:635] GET /stats/default/client-fh62u/f3752ac9-f59b-11e4-8b22-42010af08867/client: (1.652732ms) 404 [[Go 1.1 package http] 10.244.0.3:56849]
I0508 16:05:23.054063    4546 server.go:635] GET /stats/default/kibana-logging-f8x0s/0ab9c0db-f53a-11e4-8b22-42010af08867/kibana-logging: (2.23547ms) 0 [[Go 1.1 package http] 10.244.0.3:56840]
I0508 16:05:23.068526    4546 server.go:635] GET /stats/default/fluentd-elasticsearch-kubernetes-minion-htb6/2d0811ae-f53a-11e4-8b22-42010af08867/fluentd-elasticsearch: (3.331861ms) 0 [[Go 1.1 package http] 10.244.0.3:56849]
I0508 16:05:23.092492    4546 server.go:635] GET /stats/: (2.481458ms) 0 [[Go 1.1 package http] 10.244.0.3:56840]
W0508 16:05:28.453908    4546 kubelet.go:1199] Failed to pull image "gcr.io/foggy-willow-91020/client:v1.0.0" from pod "client-fh62u_default" and container "client": Error: image foggy-willow-91020/client:v1.0.0 not found
I0508 16:05:28.454536    4546 event.go:200] Event(api.ObjectReference{Kind:"Pod", Namespace:"default", Name:"client-fh62u", UID:"f3752ac9-f59b-11e4-8b22-42010af08867", APIVersion:"v1beta3", ResourceVersion:"24269", FieldPath:"spec.containers{client}"}): reason: 'failed' Failed to pull image "gcr.io/foggy-willow-91020/client:v1.0.0": Error: image foggy-willow-91020/client:v1.0.0 not found
[... repeats indefinitely...]
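
For anyone triaging a similar log, here is a minimal sketch that counts the repeated pull failures per pod and image. It is not part of the kubelet: the function name `count_pull_failures` is hypothetical, and the regular expression assumes the warning format shown in the log above, which other versions may not share.

```python
import re
from collections import Counter

# Assumption: the warning text matches the kubelet.go lines in the log above.
PULL_FAILURE = re.compile(r'Failed to pull image "([^"]+)" from pod "([^"]+)"')


def count_pull_failures(lines):
    """Count 'Failed to pull image' warnings per (pod, image) pair."""
    counts = Counter()
    for line in lines:
        # The matching event.go lines have no 'from pod' part, so only the
        # kubelet.go warnings are counted and nothing is counted twice.
        match = PULL_FAILURE.search(line)
        if match:
            image, pod = match.groups()
            counts[(pod, image)] += 1
    return counts
```

A count that keeps growing for the same pair is a hint that the failure will not resolve on its own.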
@ghost

ghost commented May 8, 2015

I tend to agree. However we have the general philosophy of continually trying to get reality to move towards desired state. So for example, were you to create the missing image in your example, the system would (in theory) retry the fetch, succeed, and all would be rosy. (If the system doesn't actually do that, then that's a bug, IMO).

I believe that we have the "Message" field in the status of at least some API objects to tell the user the last notable event for their object (in this case it should, in theory, tell the user something like "Failed to pull image" in the RC or Pod status).

@bgrant0607 probably has the full story behind this and should decide what the best fix, if any, is here.
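
The retry-toward-desired-state idea in this comment, with the last failure kept in a message field, could be sketched roughly as follows. This is an illustration only, not Kubernetes code: `PodStatus`, `reconcile`, and `pull_image` are hypothetical names, and `max_attempts` exists only to bound the sketch, whereas the real system keeps retrying.

```python
from dataclasses import dataclass


@dataclass
class PodStatus:
    # Hypothetical, simplified status: a phase plus the last notable message.
    phase: str = "Pending"
    message: str = ""


def reconcile(pull_image, status, max_attempts):
    """Retry pull_image, recording the most recent failure in status.message."""
    for _ in range(max_attempts):
        try:
            pull_image()
        except RuntimeError as err:
            # Stay Pending, but tell the user why.
            status.message = f"Failed to pull image: {err}"
            continue
        # Reality now matches the desired state; clear the stale message.
        status.phase = "Running"
        status.message = ""
        return status
    return status
```

If the missing image is pushed between attempts, a later call to `reconcile` succeeds and the message is cleared, which is the behaviour described above.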

@ghost ghost assigned ghost and bgrant0607 and unassigned ghost May 8, 2015
@ghost ghost added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label May 8, 2015
@aronchick
Contributor Author

If the "event" field was visible in 'get po', that'd be a perfect solution.

@lavalamp
Member

lavalamp commented May 8, 2015

Use 'describe pod' instead of 'get pod' and you will be pleasantly surprised.

@bgrant0607
Member

More later, but see also #7856 and #2529

@dchen1107
Member

I am re-assigning it to myself. @aronchick have you tried kubectl describe pod as @lavalamp suggested above? Does that meet your expectations? If not, please give us more details here and I will take a look. Thanks!

@aronchick
Contributor Author

'describe' actually did work great; I just wasn't aware of it. The biggest problem is that my terminal is too narrow and wrapped the error message, so I didn't see it. Obviously PEBCAK, but it might be interesting to explore a more compact default status layout so that people don't miss the important stuff.

@yujuhong
Contributor

@aronchick, the kubelet does best-effort reporting of failure reasons so that they are visible at a glance. The recent change in #7981 added support for surfacing image pull failures.
E.g.,

$ kubectl get pods
POD       IP          CONTAINER(S)   IMAGE(S)    HOST                  LABELS    STATUS    CREATED     MESSAGE
foo       10.0.0.53                              127.0.0.1/127.0.0.1   <none>    Pending   3 seconds   
                      bar1           fooimage1                                   Waiting               Error: image library/fooimage1:latest not found
                      bar2           fooimage2                                   Waiting               Error: image library/fooimage2:latest not found

Would this help you notice the failure?

@aronchick
Contributor Author

This is perfect, but my biggest issue is that I'm on an 80-character screen, so output gets wrapped and is tough to read. I'm not sure what the solution is, but maybe intelligent wrapping? Here's how it looks on my screen. Why are there two lines for one container?

Running: ../kubernetes/cluster/../cluster/gce/../../cluster/../platforms/darwin/amd64/kubectl get po -l name=fooboard
POD               IP             CONTAINER(S)   IMAGE(S)                                         HOST                                    LABELS                          STATUS    CREATED   MESSAGE
fooboard-vkpt3   10.24.3.133                                                                   kubernetes-minion-sird/104.17.21.146     name=fooboard,version=v1.0.6   Running   4 hours
                                 fooboard      gcr.io/mythical-willow-91020/foo_image:v1.0.6                                                                           Running   4 hours

@dchen1107
Member

@aronchick The first line is for the pod, and the following lines, with the first two columns empty, are for its containers. We are discussing the best way to present that information to end users.
cc/ @brendandburns and @bgrant0607

Besides the output layout, are there any other improvements we could make for you? Thanks!

@yujuhong
Contributor

As @dchen1107 said, the first line is the pod, and the subsequent lines (without pod names) are containers. We format it this way so that the container lines can reuse the columns (e.g. STATUS, CREATED, MESSAGE).

FYI, the "kubectl output is too wide" issue is #7843.
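
The pod-line-then-container-lines layout described in this comment can be sketched as below. This is only an illustration with a reduced, hypothetical set of columns (POD, CONTAINER, STATUS, MESSAGE) and a hypothetical `pod_rows` helper; it is not how kubectl builds its table.

```python
def pod_rows(pod):
    """Return one row for the pod, then one row per container.

    Container rows leave the POD column empty and reuse the STATUS and
    MESSAGE columns, mirroring the layout described above.
    """
    rows = [(pod["name"], "", pod["status"], pod.get("message", ""))]
    for container in pod["containers"]:
        rows.append(
            ("", container["name"], container["status"], container.get("message", ""))
        )
    return rows
```

A pod with two containers therefore produces three rows, and a row with an empty first column always belongs to the most recent pod row above it.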

@bgrant0607
Member

Yes, let's discuss output format on #7843

@dchen1107
Member

I am closing the issue for now based on the comments above. @aronchick please re-open it if we didn't answer your question. Thanks!
