
[k8s.io] MetricsGrabber should grab all metrics from a Kubelet. {Kubernetes e2e suite} #37543

Closed
k8s-github-robot opened this issue Nov 28, 2016 · 12 comments
Labels
kind/flake Categorizes issue or PR as related to a flaky test. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@k8s-github-robot

https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-etcd3/661/

Failed: [k8s.io] MetricsGrabber should grab all metrics from a Kubelet. {Kubernetes e2e suite}

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/metrics_grabber_test.go:57
Expected error:
    <*errors.StatusError | 0xc4211b9c00>: {
        ErrStatus: {
            TypeMeta: {Kind: "", APIVersion: ""},
            ListMeta: {SelfLink: "", ResourceVersion: ""},
            Status: "Failure",
            Message: "an error on the server (\"Error: 'dial tcp 10.240.0.5:10250: getsockopt: connection refused'\\nTrying to reach: 'https://bootstrap-e2e-minion-group-0v8s:10250/metrics'\") has prevented the request from succeeding (get nodes bootstrap-e2e-minion-group-0v8s:10250)",
            Reason: "InternalError",
            Details: {
                Name: "bootstrap-e2e-minion-group-0v8s:10250",
                Group: "",
                Kind: "nodes",
                Causes: [
                    {
                        Type: "UnexpectedServerResponse",
                        Message: "Error: 'dial tcp 10.240.0.5:10250: getsockopt: connection refused'\nTrying to reach: 'https://bootstrap-e2e-minion-group-0v8s:10250/metrics'",
                        Field: "",
                    },
                ],
                RetryAfterSeconds: 0,
            },
            Code: 503,
        },
    }
    an error on the server ("Error: 'dial tcp 10.240.0.5:10250: getsockopt: connection refused'\nTrying to reach: 'https://bootstrap-e2e-minion-group-0v8s:10250/metrics'") has prevented the request from succeeding (get nodes bootstrap-e2e-minion-group-0v8s:10250)
not to have occurred
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/metrics_grabber_test.go:55

Previous issues for this test: #27295 #35385 #36126 #37452

@k8s-github-robot k8s-github-robot added kind/flake Categorizes issue or PR as related to a flaky test. priority/backlog Higher priority than priority/awaiting-more-evidence. labels Nov 28, 2016
@gmarek
Contributor

gmarek commented Nov 28, 2016

@vishh @dchen1107 - it seems that we can't establish a proxy connection to the GCI node.
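
For context, the grabber reaches the kubelet through the apiserver's node proxy subresource (the nodes/<node>:10250/proxy/metrics path visible in the error above). A rough sketch of that request with client-go follows; this is a hypothetical illustration, not the actual e2e MetricsGrabber code, and the clientset setup and helper names are assumed:

```go
package main

import (
	"context"
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// grabKubeletMetrics is a hypothetical helper, not the e2e MetricsGrabber itself.
// It issues GET /api/v1/nodes/<node>:10250/proxy/metrics, so the apiserver has to
// dial the kubelet on port 10250; a refused dial comes back as the 503 in the log above.
func grabKubeletMetrics(ctx context.Context, c kubernetes.Interface, nodeName string) (string, error) {
	raw, err := c.CoreV1().RESTClient().Get().
		Resource("nodes").
		Name(fmt.Sprintf("%s:%d", nodeName, 10250)). // kubelet's secure port
		SubResource("proxy").
		Suffix("metrics").
		Do(ctx). // recent client-go; older releases used Do() without a context
		Raw()
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	// Assumes a reachable cluster and a kubeconfig at the default location.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	c := kubernetes.NewForConfigOrDie(cfg)
	out, err := grabKubeletMetrics(context.Background(), c, "bootstrap-e2e-minion-group-0v8s")
	if err != nil {
		panic(err)
	}
	fmt.Printf("got %d bytes of metrics\n", len(out))
}
```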

@k8s-github-robot
Author

https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-etcd3/2309/

Failed: [k8s.io] MetricsGrabber should grab all metrics from a Kubelet. {Kubernetes e2e suite}

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/metrics_grabber_test.go:57
Expected error:
    <*errors.errorString | 0xc420b5e020>: {
        s: "Timed out when waiting for proxy to gather metrics from bootstrap-e2e-minion-group-1pkc",
    }
    Timed out when waiting for proxy to gather metrics from bootstrap-e2e-minion-group-1pkc
not to have occurred
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/metrics_grabber_test.go:55

@gmarek
Contributor

gmarek commented Dec 5, 2016

It seems that the apiserver proxy wasn't able to connect to this node, but there are no logs, so it's impossible to tell what happened.

@gmarek
Contributor

gmarek commented Mar 12, 2017

Looking at recent failures, it seems this is still related to master-node connectivity, but not necessarily #36143. Adding a retry might help (or not, it's hard to tell), but it would shadow the real problem, which is that the node proxy isn't working (IIUC). @kubernetes/sig-api-machinery-misc
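
For reference, a retry would presumably look something like the sketch below. It's a hypothetical wrapper (not existing test code) around whatever grab call the test makes, using wait.PollImmediate from apimachinery; note how it swallows every error as transient, which is exactly the shadowing concern:

```go
package retryutil

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// grabWithRetry is a hypothetical wrapper, not existing test code: it retries
// whatever grab function it is given for up to two minutes, treating every
// failure as transient instead of failing on the first 503 from the proxy.
func grabWithRetry(grab func() (string, error)) (string, error) {
	var out string
	err := wait.PollImmediate(10*time.Second, 2*time.Minute, func() (bool, error) {
		m, err := grab()
		if err != nil {
			return false, nil // swallow the error and poll again
		}
		out = m
		return true, nil
	})
	return out, err
}
```

Usage would be something like grabWithRetry(func() (string, error) { return grabKubeletMetrics(ctx, c, nodeName) }), reusing the earlier sketch.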

@ethernetdan
Contributor

@cmluciano @gmarek should this flake block the 1.6 release?

@cmluciano

@gmarek Should this have a label for the apimachinery SIG as well?

@davidopp davidopp added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Mar 15, 2017
@davidopp
Member

This is clearly just flaking. Marking as non-release-blocker.

@davidopp
Member

(@gmarek is on vacation)

@ethernetdan ethernetdan modified the milestones: v1.7, v1.6 Mar 15, 2017
@lavalamp lavalamp removed the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Mar 15, 2017
@lavalamp
Member

This looks to be talking directly to kubelets? If so, it doesn't go through any API machinery code.

@davidopp davidopp added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. labels Mar 15, 2017
@davidopp
Member

Changing to @kubernetes/sig-node-bugs

@yujuhong
Contributor

It only flaked once in the past two weeks, and that was because the kubelet panicked. The root cause was already fixed by #42927.
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-etcd3/5190

Closing the issue.
