Flaky e2e: Proxy version v1 should proxy logs on node (Failed 4 times in the last 30 runs. Stability: 86%) #10792

Closed
ghost opened this issue Jul 7, 2015 · 45 comments

@ghost

ghost commented Jul 7, 2015

#9312 and #10739 provide further details. #10739, the intended fix, does not seem to have done the job.

@ghost ghost added the priority/important-soon and area/test-infra labels Jul 7, 2015
@ghost ghost assigned wojtek-t Jul 7, 2015
@ghost ghost added this to the v1.0 milestone Jul 7, 2015
@lavalamp
Member

lavalamp commented Jul 7, 2015

It's totally nuts that we have to change every test to check for nodes being ready. :/

@ghost
Author

ghost commented Jul 7, 2015

@lavalamp You're right (as usual). Better suggestions welcome. A few I can think of off the top of my head:

  1. Have the e2e Framework automatically validate/wait for all nodes to be ready before each test (not ideal, IMO - slow, and not really realistic, especially for larger clusters).
  2. Have each test "know" how many nodes it needs, and explicitly fail if there are an insufficient number ready when it starts.
  3. Have each test be immune to non-ready nodes (e.g. by always using only the set of known-ready nodes, determined by the Framework before each test; a rough sketch follows at the end of this comment).

Others?

It's not totally clear which of these is the best approach yet. Let me give it some more thought.
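
For concreteness, option 3 might look roughly like the sketch below - a hypothetical helper written against today's client-go API rather than the e2e framework's actual client, with illustrative names only:

package e2eutil

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// readyNodes returns only the nodes whose NodeReady condition is True, so a test
// can restrict itself to nodes that were healthy when it started.
func readyNodes(ctx context.Context, c kubernetes.Interface) ([]v1.Node, error) {
	list, err := c.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	var ready []v1.Node
	for _, node := range list.Items {
		for _, cond := range node.Status.Conditions {
			if cond.Type == v1.NodeReady && cond.Status == v1.ConditionTrue {
				ready = append(ready, node)
				break
			}
		}
	}
	return ready, nil
}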

@zmerlynn
Member

zmerlynn commented Jul 7, 2015

@lavalamp: That's why I was proposing a general fixture that could check for this type of thing on the way out, then we could catch bad actors.

@lavalamp
Member

lavalamp commented Jul 7, 2015

Yeah, it should be pretty trivial to check this when a framework is shut down.

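
Something along these lines could work as that teardown check - a rough sketch only; the Ginkgo and client-go calls are real, but the suite wiring (the client variable, package layout) is assumed:

package e2e

import (
	"context"

	"github.com/onsi/ginkgo/v2"
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

var client kubernetes.Interface // assumed to be set up by the suite

// Global AfterEach: fail the spec if any node stopped being Ready while it ran,
// so the offending ("bad actor") test is identified instead of a later victim.
var _ = ginkgo.AfterEach(func() {
	nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		ginkgo.Fail("could not list nodes in teardown check: " + err.Error())
	}
	for _, node := range nodes.Items {
		for _, cond := range node.Status.Conditions {
			if cond.Type == v1.NodeReady && cond.Status != v1.ConditionTrue {
				ginkgo.Fail("node " + node.Name + " is not Ready after this test")
			}
		}
	}
})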

ghost pushed a commit that referenced this issue Jul 7, 2015
Demote e2e test as per #10792.
yujuhong added a commit that referenced this issue Jul 7, 2015
@wojtek-t
Member

wojtek-t commented Jul 7, 2015

Regarding the original test - it seems that the problem was different in the last failure. Basically, the first 3 failures were failing with the error:
Error: 'dial tcp 10.240.41.154:10250: connection refused
which was due to a not-ready node.

The last failure (once the previous one was fixed) failed with:
Error: 'read tcp 10.240.216.116:10250: connection reset by peer
which is different - i.e. it seems that the node was reachable, but for some reason it didn't answer.

@wojtek-t
Member

wojtek-t commented Jul 7, 2015

Also - I'm not able to reproduce this failure.

@cjcullen
Member

cjcullen commented Jul 7, 2015

I was able to reproduce the connection refused failures pretty consistently (~75%) prior to @wojtek-t's 2 fixes by running:

go run hack/e2e.go -test -v --test_args="--ginkgo.focus=.*Resize.*|.*Proxy.*"

I've now run it 10 times without seeing the "connection refused" error, but I did see the "connection reset" failure once on the "proxy logs on node" test.

@ghost
Author

ghost commented Jul 8, 2015

It still seems to be failing occasionally in Jenkins, even after #10820 was merged.

e.g.

job/kubernetes-e2e-gce/7461/
job/kubernetes-e2e-gce-parallel/2989/
job/kubernetes-e2e-gce-parallel/2974/

It's quite possible that the failures are for other reasons - I've not looked into it deeply.

@wojtek-t
Member

wojtek-t commented Jul 8, 2015

It's quite possible that the failures are for other reasons - I've not looked into it deeply.

I'm pretty sure the reason is different here. I will try to look into it deeper today.

@wojtek-t
Member

wojtek-t commented Jul 8, 2015

@quinton-hoole @davidopp

I took a deeper look, and it seems that after adding #10820, all of the Proxy test failures are caused by some node being NOT ready at the end of the test. Other failures are caused by it as well.

@dchen1107: FYI
I looked into Kubelet logs and it seems that from time to time some Kubelet is restarted:

  • example: job/kubernetes-e2e-gce-parallel/2989/
  • Kubelet on node e2e-test-parallel-minion-wwfw (see 130.211.175.207-22-kubelet.log on GCS) was restarted at least 2 times there within 4 minutes

My current hypothesis is that a lot of different flakes that we are observing (e.g. Proxy flakes #10792 or EmptyDir flakes #10657) might in fact be caused by not-ready nodes at some random points in time.

@dchen1107 @lavalamp is there any way to get information about why the Kubelet was restarted? Can the Kubelet be restarted by monit more frequently than once per 5 minutes?

@dchen1107
Member

Kubelet restarted because of a /healthz failure?

@dchen1107
Member

cc/ @saad-ali Saad, could you please take a look at this one to figure out why the kubelet restarts so frequently?

@yujuhong
Contributor

yujuhong commented Jul 8, 2015

Can Kubelet be restarted by Monit more frequently than once per 5 minutes?

monit checks every two minutes.

information why Kubelet was restarted?

monit checks the existence of the pid file and /healthz
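
For reference, the /healthz side of that probe amounts to roughly the following (a sketch in Go rather than the actual monit configuration; the scheme, port, and timeout are assumptions based on the kubelet logs above):

package main

import (
	"fmt"
	"net/http"
	"time"
)

// kubeletHealthy mimics the effect of monit's periodic probe: GET the kubelet's
// /healthz endpoint and treat any connection error or non-200 status as unhealthy,
// which is what would trigger a restart.
func kubeletHealthy() bool {
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get("http://127.0.0.1:10250/healthz")
	if err != nil {
		return false // connection refused/reset counts as unhealthy
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	fmt.Println("kubelet healthy:", kubeletHealthy())
}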

@saad-ali
Member

saad-ali commented Jul 8, 2015

I0707 23:33:20.571568    2516 server.go:590] Started kubelet
...
I0707 23:33:20.595871    2516 server.go:63] Starting to listen on 0.0.0.0:10250
...
I0707 23:34:10.323520    3072 server.go:623] Started kubelet
...
I0707 23:34:10.324340    3072 server.go:63] Starting to listen on 0.0.0.0:10250
...
I0707 23:36:40.041878    6809 server.go:623] Started kubelet
...
I0707 23:36:40.052956    6809 server.go:63] Starting to listen on 0.0.0.0:10250
...
I0707 23:39:09.054991    6809 server.go:635] GET /healthz: (3.821042ms) 0 [[monit/5.4] 127.0.0.1:53086]

Looks like the first monit healthz check was successfully handled after the last restart. Meaning either there were no healthz checks before that, or kubelet was rejecting them.

@saad-ali
Member

saad-ali commented Jul 9, 2015

Looking at the thread numbers, it looks like Kubelet was restarted more than twice:

I0707 23:33:19.363342    2516 manager.go:126] cAdvisor running in container: "/"
...
E0707 23:33:30.649166    2516 kubelet.go:1538] error getting node: node e2e-test-parallel-minion-wwfw not found
I0707 23:33:36.928203    2928 manager.go:127] cAdvisor running in container: "/"
...
I0707 23:33:36.935360    2928 server.go:290] Successfully initialized cloud provider: "gce" from the config file: ""
I0707 23:33:38.535350    3072 manager.go:127] cAdvisor running in container: "/"
...
I0707 23:36:38.563064    3072 manager.go:1384] Need to restart pod infra container for "cleanup60-0201cac8-2501-11e5-a351-42010af01555-x8zoc_e2e-tests-kubelet-delete-iezg2" because it is not found
I0707 23:36:38.957506    6809 manager.go:127] cAdvisor running in container: "/system"

where 2928 was the shortest lived thread:

I0707 23:33:36.928037    2928 server.go:271] Using root directory: /var/lib/kubelet
I0707 23:33:36.928203    2928 manager.go:127] cAdvisor running in container: "/"
I0707 23:33:36.928637    2928 fs.go:93] Filesystem partitions: map[/dev/disk/by-uuid/d1f816ed-d274-4db7-bc93-4984de845be6:{mountpoint:/ major:8 minor:1}]
I0707 23:33:36.931964    2928 machine.go:229] Couldn't collect info from any of the files in "/etc/machine-id,/var/lib/dbus/machine-id"
I0707 23:33:36.932016    2928 manager.go:156] Machine: {NumCores:2 CpuFrequency:2500000 MemoryCapacity:7863918592 MachineID: SystemUUID:BF8E0FE0-E9D2-2808-EFA2-DC5AC948661C BootID:8ecec253-66cf-4a5e-a190-5c7569a85431 Filesystems:[{Device:/dev/disk/by-uuid/d1f816ed-d274-4db7-bc93-4984de845be6 Capacity:105553100800}] DiskMap:map[8:0:{Name:sda Major:8 Minor:0 Size:107374182400 Scheduler:cfq}] NetworkDevices:[{Name:eth0 MacAddress:42:01:0a:f0:86:0b Speed:0 Mtu:1460}] Topology:[{Id:0 Memory:7863918592 Cores:[{Id:0 Threads:[0 1] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2}]}] Caches:[{Size:31457280 Type:Unified Level:3}]}]}
I0707 23:33:36.932235    2928 manager.go:163] Version: {KernelVersion:3.16.0-0.bpo.4-amd64 ContainerOsVersion:Debian GNU/Linux 7 (wheezy) DockerVersion:Unknown CadvisorVersion:0.15.1}
I0707 23:33:36.935360    2928 server.go:290] Successfully initialized cloud provider: "gce" from the config file: ""

Nothing indicates why it was killed.

@saad-ali
Member

saad-ali commented Jul 9, 2015

There exists a race condition between kubelet startup and monit. If monit comes up before salt starts the kubelet, it will notice that the kubelet is not running and will start it. And with #8931, if salt then starts the kubelet, the second startup call will kill the previous instance of the kubelet.

But that should result in only 1 "restart". We see 4 starts (i.e. 3 restarts). It's possible that the healthz check happens during the restarts, triggering the same cycle.

Regardless, this seems to be very common. Checking random, otherwise-successful GCE E2E runs, I see at least 2 restarts:

I0708 20:57:30.557189    2487 manager.go:126] cAdvisor running in container: "/"
...
E0708 20:57:51.943819    2487 kubelet.go:1645] Couldn't sync containers: dial unix /var/run/docker.sock: no such file or directory
I0708 20:57:52.608773    2899 server.go:271] Using root directory: /var/lib/kubelet
...
I0708 20:57:53.691167    2899 kubelet.go:821] Successfully registered node e2e-gce-minion-nf16
I0708 20:57:54.417579    3107 server.go:271] Using root directory: /var/lib/kubelet
...

@wojtek-t
Member

wojtek-t commented Jul 9, 2015

@saad-ali - I think that the situation from my test is a bit more dangerous because it took more than 3 minutes (although I agree it seems to be a problem in general).
Do we have an idea why the Kubelet is restarted (i.e. why it's not responding to /healthz)? From the logs it seems that it's working correctly...

--Update--
As you pointed out, there were no logged /healthz requests during the first 6 minutes. Do you have an idea why they might have been rejected?

@ghost ghost removed the priority/important-soon label Jul 10, 2015
@wojtek-t
Member

I am not comfortable upgrading monit to a new version at such a critical time, especially as there is no evidence showing me that monit is causing the problem here. There are known races because we don't have full control of our boot sequences, and kubelet, docker, etc. services get restarted from various sources. That is our call, since we are not managing the node and are not requiring systemd as the only process-monitoring system. But we do agree to make our system resilient to those failures / races.

I agree it might be risky to change the monit version now. But the problem @saad-ali and I described is not a bootstrap problem. Although there are some restarts at the beginning (within 30 seconds after startup), I don't think this is serious.

However - we ARE also observing restarts in the middle of running tests, e.g. 10 minutes after creating the cluster. I don't think we can call those moments "boot sequences".

@ghost
Author

ghost commented Jul 10, 2015

Closing in favor of #10899 to track remaining issue.

@ghost ghost closed this as completed Jul 10, 2015
@dchen1107
Member

@quinton-hoole I think you closed the wrong one here. I am reopening it; re-close it if you disagree with me. :-)

@dchen1107 dchen1107 reopened this Jul 10, 2015
@ghost
Author

ghost commented Jul 10, 2015

@dchen1107 as I understand it, we still need to track down the reason for the seemingly unnecessary kubelet restarts. Is #10899 not the canonical tracking issue for that?

@wojtek-t
Member

@davidopp

I looked into those failures and both are exactly the same. To me this looks like some problem with the network. Basically, there is a log in the apiserver which seems to show that it correctly sent a request to the kubelet:

I0712 22:26:06.333028       8 handlers.go:137] GET /api/v1/proxy/nodes/e2e-gce-minion-4qfv:10250/logs/: (2.052688ms) 503
goroutine 28999 [running]:
github.com/GoogleCloudPlatform/kubernetes/pkg/httplog.(*respLogger).WriteHeader(0xc20ac35500, 0x1f7)
  /go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/pkg/httplog/log.go:188 +0x9a
net/http/httputil.(*ReverseProxy).ServeHTTP(0xc209f0ade0, 0x7ffb005a2e38, 0xc20ac35500, 0xc20a726d00)
  /usr/src/go/src/net/http/httputil/reverseproxy.go:159 +0x708
github.com/GoogleCloudPlatform/kubernetes/pkg/apiserver.(*ProxyHandler).ServeHTTP(0xc208327ef0, 0x7ffb005a2e38, 0xc20ac35500, 0xc209e71930)
  /go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/pkg/apiserver/proxy.go:212 +0x2618
github.com/GoogleCloudPlatform/kubernetes/pkg/apiserver.func·002(0xc209d9acf0, 0xc20b1920f0)
  /go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/pkg/apiserver/api_installer.go:732 +0x60
github.com/emicklei/go-restful.func·005(0xc209d9acf0, 0xc20b1920f0)
  /go/src/github.com/GoogleCloudPlatform/kubernetes/Godeps/_workspace/src/github.com/emicklei/go-restful/container.go:215 +0x41
github.com/emicklei/go-restful.(*FilterChain).ProcessFilter(0xc209d9ad80, 0xc209d9acf0, 0xc20b1920f0)
  /go/src/github.com/GoogleCloudPlatform/kubernetes/Godeps/_workspace/src/github.com/emicklei/go-restful/filter.go:21 +0xa2
github.com/GoogleCloudPlatform/kubernetes/pkg/apiserver.func·003(0xc209d9acf0, 0xc20b1920f0, 0xc209d9ad80)
  /go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/pkg/apiserver/apiserver.go:59 +0x88
github.com/emicklei/go-restful.(*FilterChain).ProcessFilter(0xc209d9ad80, 0xc209d9acf0, 0xc20b1920f0)
  /go/src/github.com/GoogleCloudPlatform/kubernetes/Godeps/_workspace/src/github.com/emicklei/go-restful/filter.go:19 +0x84
github.com/emicklei/go-restful.(*Container).dispatch(0xc208328540, 0x7ffb005a2e38, 0xc20ac35500, 0xc209e71930)
  /go/src/github.com/GoogleCloudPlatform/ [[e2e.test/v1.0.0 (linux/amd64) kubernetes/d108d7f] 104.197.45.156:36080]

However - there are no logs at all around that time in the Kubelet:

I0712 22:26:00.636718    3054 server.go:635] POST /stats/container/: (181.250092ms) 0 [[Go 1.1 package http] 10.245.2.8:38013]
I0712 22:26:03.611431    3054 server.go:635] GET /healthz: (1.026967ms) 0 [[monit/5.4] 127.0.0.1:46210]
I0712 22:26:11.084983    3054 server.go:635] GET /stats/kube-system/fluentd-elasticsearch-e2e-gce-minion-4qfv/4576f612-28e4-11e5-88ce-42010af0d8d7/fluentd-elasticsearch: (4.627107ms) 0 [     [Go 1.1 package http] 10.245.2.8:38006]

So it seems like the http request was lost somewhere in the network.

It doesn't seem to be a problem with Kubernetes - what we can do is simply retry the request in the test when such errors occur. What do you think?
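
A minimal sketch of that retry, assuming a hypothetical doProxyRequest helper standing in for the test's existing proxy GET (wait.Poll is the real utility from k8s.io/apimachinery; everything else here is illustrative):

package e2e

import (
	"errors"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// doProxyRequest is a hypothetical stand-in for the test's existing call that fetches
// /api/v1/proxy/nodes/<node>:10250/logs/ through the apiserver.
func doProxyRequest() ([]byte, error) { return nil, errors.New("not implemented") }

// fetchNodeLogsWithRetry retries the proxy request on transient errors instead of
// failing the test on the first 503 / "connection reset".
func fetchNodeLogsWithRetry() ([]byte, error) {
	var body []byte
	err := wait.Poll(2*time.Second, 30*time.Second, func() (bool, error) {
		b, err := doProxyRequest()
		if err != nil {
			// Treat the occasional proxy/network failure as retryable, not fatal.
			return false, nil
		}
		body = b
		return true, nil
	})
	return body, err
}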

@davidopp
Member

Yeah, retrying seems like a good idea; it can't hurt and maybe it will fix the problem as you say.

@wojtek-t
Member

@davidopp - ok - I can prepare a PR for it.

@ghost
Author

ghost commented Jul 13, 2015

Having a retry in the test seems fine, but it would be good to get to the bottom of why the node network is getting borked. We've seen that problem elsewhere also.

@wojtek-t
Member

@quinton-hoole - I don't think it's a Kubernetes problem. Basically, this particular request is not going through kube-proxy or anything like that - we just send an HTTP request directly to the Kubelet, and it seems the Kubelet doesn't even receive it. I'm not sure how/if we can debug it...

@ghost
Author

ghost commented Jul 13, 2015

I'm concerned about our iptables reconfiguration on the nodes, the e2e tests that take down the node network interface, and the kubelet process that is being restarted.

@lavalamp
Member

I don't know about retrying; the test already tries N times and expects them all to succeed.

@davidopp davidopp added the priority/important-soon label and removed the priority/backlog label Jul 16, 2015
@davidopp
Member

@erictune

@davidopp
Member

Did we merge something between 1:30 and 2:00 today that may have broken this?

@davidopp
Member

False alarm - these failures are all failed performance expectations. We should investigate why things got slower, but it's not a correctness bug.

@davidopp
Member

Actually this is frequently failing with a more worrisome error, e.g.
http://kubekins.dls.corp.google.com:8081/job/kubernetes-pull-build-test-e2e-gce/2737/testReport/junit/(root)/Kubernetes%20e2e%20suite/Proxy_version_v1_should_proxy_through_a_service_and_a_pod/

• Failure [76.026 seconds]
Proxy
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/proxy.go:41
  version v1
  /go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/proxy.go:40
    should proxy through a service and a pod [It]
    /go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/proxy.go:169

    0: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname1/ gave error: an error on the server has prevented the request from succeeding
    0: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname2/ gave error: an error on the server has prevented the request from succeeding
    0: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:80/ gave error: an error on the server has prevented the request from succeeding
    0: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:160/ gave error: an error on the server has prevented the request from succeeding
    0: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:162/ gave error: an error on the server has prevented the request from succeeding
    1: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname1/ gave error: an error on the server has prevented the request from succeeding
    1: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname2/ gave error: an error on the server has prevented the request from succeeding
    1: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:80/ gave error: an error on the server has prevented the request from succeeding
    1: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:160/ gave error: an error on the server has prevented the request from succeeding
    1: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:162/ gave error: an error on the server has prevented the request from succeeding
    2: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:162/ gave error: an error on the server has prevented the request from succeeding
    2: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname1/ gave error: an error on the server has prevented the request from succeeding
    2: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname2/ gave error: an error on the server has prevented the request from succeeding
    2: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:80/ gave error: an error on the server has prevented the request from succeeding
    2: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:160/ gave error: an error on the server has prevented the request from succeeding
    3: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:162/ gave error: an error on the server has prevented the request from succeeding
    3: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname1/ gave error: an error on the server has prevented the request from succeeding
    3: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname2/ gave error: an error on the server has prevented the request from succeeding
    3: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:80/ gave error: an error on the server has prevented the request from succeeding
    3: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:160/ gave error: an error on the server has prevented the request from succeeding
    4: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname1/ gave error: an error on the server has prevented the request from succeeding
    4: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname2/ gave error: an error on the server has prevented the request from succeeding
    4: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:80/ gave error: an error on the server has prevented the request from succeeding
    4: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:160/ gave error: an error on the server has prevented the request from succeeding
    4: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:162/ gave error: an error on the server has prevented the request from succeeding
    5: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname1/ gave error: an error on the server has prevented the request from succeeding
    5: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname2/ gave error: an error on the server has prevented the request from succeeding
    5: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:80/ gave error: an error on the server has prevented the request from succeeding
    5: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:160/ gave error: an error on the server has prevented the request from succeeding
    5: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:162/ gave error: an error on the server has prevented the request from succeeding
    6: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:160/ gave error: an error on the server has prevented the request from succeeding
    6: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:162/ gave error: an error on the server has prevented the request from succeeding
    6: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname1/ gave error: an error on the server has prevented the request from succeeding
    6: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname2/ gave error: an error on the server has prevented the request from succeeding
    6: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:80/ gave error: an error on the server has prevented the request from succeeding
    7: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:162/ gave error: an error on the server has prevented the request from succeeding
    7: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname1/ gave error: an error on the server has prevented the request from succeeding
    7: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname2/ gave error: an error on the server has prevented the request from succeeding
    7: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:80/ gave error: an error on the server has prevented the request from succeeding
    7: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:160/ gave error: an error on the server has prevented the request from succeeding
    8: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:160/ gave error: an error on the server has prevented the request from succeeding
    8: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:162/ gave error: an error on the server has prevented the request from succeeding
    8: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname1/ gave error: an error on the server has prevented the request from succeeding
    8: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname2/ gave error: an error on the server has prevented the request from succeeding
    8: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:80/ gave error: an error on the server has prevented the request from succeeding
    9: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname1/ gave error: an error on the server has prevented the request from succeeding
    9: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname2/ gave error: an error on the server has prevented the request from succeeding
    9: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:80/ gave error: an error on the server has prevented the request from succeeding
    9: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:160/ gave error: an error on the server has prevented the request from succeeding
    9: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:162/ gave error: an error on the server has prevented the request from succeeding
    10: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname1/ gave error: an error on the server has prevented the request from succeeding
    10: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname2/ gave error: an error on the server has prevented the request from succeeding
    10: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:80/ gave error: an error on the server has prevented the request from succeeding
    10: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:160/ gave error: an error on the server has prevented the request from succeeding
    10: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:162/ gave error: an error on the server has prevented the request from succeeding
    11: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname1/ gave error: an error on the server has prevented the request from succeeding
    11: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname2/ gave error: an error on the server has prevented the request from succeeding
    11: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:80/ gave error: an error on the server has prevented the request from succeeding
    11: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:160/ gave error: an error on the server has prevented the request from succeeding
    11: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:162/ gave error: an error on the server has prevented the request from succeeding
    12: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname1/ gave error: an error on the server has prevented the request from succeeding
    12: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname2/ gave error: an error on the server has prevented the request from succeeding
    12: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:80/ gave error: an error on the server has prevented the request from succeeding
    12: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:160/ gave error: an error on the server has prevented the request from succeeding
    12: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:162/ gave error: an error on the server has prevented the request from succeeding
    13: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname1/ gave error: an error on the server has prevented the request from succeeding
    13: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname2/ gave error: an error on the server has prevented the request from succeeding
    13: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:80/ gave error: an error on the server has prevented the request from succeeding
    13: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:160/ gave error: an error on the server has prevented the request from succeeding
    13: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:162/ gave error: an error on the server has prevented the request from succeeding
    14: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname1/ gave error: an error on the server has prevented the request from succeeding
    14: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname2/ gave error: an error on the server has prevented the request from succeeding
    14: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:80/ gave error: an error on the server has prevented the request from succeeding
    14: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:160/ gave error: an error on the server has prevented the request from succeeding
    14: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:162/ gave error: an error on the server has prevented the request from succeeding
    15: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname1/ gave error: an error on the server has prevented the request from succeeding
    15: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname2/ gave error: an error on the server has prevented the request from succeeding
    15: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:80/ gave error: an error on the server has prevented the request from succeeding
    15: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:160/ gave error: an error on the server has prevented the request from succeeding
    15: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:162/ gave error: an error on the server has prevented the request from succeeding
    16: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:80/ gave error: an error on the server has prevented the request from succeeding
    16: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:160/ gave error: an error on the server has prevented the request from succeeding
    16: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:162/ gave error: an error on the server has prevented the request from succeeding
    16: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname1/ gave error: an error on the server has prevented the request from succeeding
    16: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname2/ gave error: an error on the server has prevented the request from succeeding
    17: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname2/ gave error: an error on the server has prevented the request from succeeding
    17: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:80/ gave error: an error on the server has prevented the request from succeeding
    17: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:160/ gave error: an error on the server has prevented the request from succeeding
    17: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:162/ gave error: an error on the server has prevented the request from succeeding
    17: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname1/ gave error: an error on the server has prevented the request from succeeding
    18: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname1/ gave error: an error on the server has prevented the request from succeeding
    18: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname2/ gave error: an error on the server has prevented the request from succeeding
    18: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:80/ gave error: an error on the server has prevented the request from succeeding
    18: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:160/ gave error: an error on the server has prevented the request from succeeding
    18: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:162/ gave error: an error on the server has prevented the request from succeeding
    19: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname2/ gave error: an error on the server has prevented the request from succeeding
    19: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:80/ gave error: an error on the server has prevented the request from succeeding
    19: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:160/ gave error: an error on the server has prevented the request from succeeding
    19: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/pods/proxy-service-tf9mw-ngypj:162/ gave error: an error on the server has prevented the request from succeeding
    19: path /api/v1/proxy/namespaces/e2e-tests-proxy-w068s/services/proxy-service-tf9mw:portname1/ gave error: an error on the server has prevented the request from succeeding

    /go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/proxy.go:167

I'm reluctant to increase the timeout to fix the timeout failures ("took xxx > yyy") until we understand what is going on with this other failure.

@davidopp
Member

The "an error on the server has prevented the request from succeeding" failures seem to mostly be happening in the PR builder Jenkins (http://kubekins.dls.corp.google.com:8081/job/kubernetes-pull-build-test-e2e-gce/) whereas all the ones in the kubernetes-e2e-gce-parallel (http://kubekins.dls.corp.google.com/job/kubernetes-e2e-gce-parallel/3572/) seem to be of the timeout flavor.

@davidopp
Member

These failures appear to have been due to another issue that occurred at about the same time. Per-PR, regular, and parallel Jenkins runs appear to be back to normal. I'll check again later this evening to make sure. Thanks @lavalamp and @dchen1107 for noticing the connection between this and the other issue.

@ghost
Author

ghost commented Aug 17, 2015

I've confirmed that this test is again 100% stable. Closing.

@ghost ghost closed this as completed Aug 17, 2015