Flaky e2e: Proxy version v1 should proxy logs on node (Failed 4 times in the last 30 runs. Stability: 86%) #10792
Comments
It's totally nuts that we have to change every test to check for nodes being ready. :/
@lavalamp You're right (as usual). Better suggestions welcome. A few I can think of off the top of my head:
Others? It's not totally clear which of these is the best approach yet. Let me give it some more thought.
@lavalamp: That's why I was proposing a general fixture that could check for this type of thing on the way out, then we could catch bad actors.
Yeah, it should be pretty trivial to check this when a framework is shut down.
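As a concrete illustration of the general fixture being proposed above, here is a minimal, hypothetical sketch (not the actual e2e framework code) of a teardown check that polls the v1 nodes API and reports any node that is not Ready. The helper name, the unauthenticated apiserver URL, and the poll interval are all assumptions made for illustration.

```go
package e2e

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// nodeList is the minimal slice of the v1 NodeList needed for a readiness check.
type nodeList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Status struct {
			Conditions []struct {
				Type   string `json:"type"`
				Status string `json:"status"`
			} `json:"conditions"`
		} `json:"status"`
	} `json:"items"`
}

// allNodesReady polls the apiserver (apiURL is assumed to be reachable without
// auth, e.g. via a local proxy) until every node reports Ready or the timeout
// expires. It returns the names of any nodes that are still not Ready.
func allNodesReady(apiURL string, timeout time.Duration) ([]string, error) {
	deadline := time.Now().Add(timeout)
	var notReady []string
	for {
		notReady = notReady[:0]
		resp, err := http.Get(apiURL + "/api/v1/nodes")
		if err != nil {
			return nil, err
		}
		var list nodeList
		err = json.NewDecoder(resp.Body).Decode(&list)
		resp.Body.Close()
		if err != nil {
			return nil, err
		}
		for _, n := range list.Items {
			ready := false
			for _, c := range n.Status.Conditions {
				if c.Type == "Ready" && c.Status == "True" {
					ready = true
				}
			}
			if !ready {
				notReady = append(notReady, n.Metadata.Name)
			}
		}
		if len(notReady) == 0 || time.Now().After(deadline) {
			break
		}
		time.Sleep(5 * time.Second)
	}
	if len(notReady) > 0 {
		return notReady, fmt.Errorf("nodes still not Ready after %v: %v", timeout, notReady)
	}
	return nil, nil
}
```

Hooked into the framework's per-test teardown, a check like this would attribute node breakage to the spec that caused it, rather than to whichever later test happens to hit the unhealthy node.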
…te-proxy-e2e Demote e2e test as per #10792
Regarding the original test - it seems that the problem was different in the last failure. Basically, the first 3 failures were failing with error:
The last failure (once the previous one was fixed) failed with:
Also - I'm not able to reproduce this failure.
I was able to reproduce the connection refused failures pretty consistently (~75%) prior to @wojtek-t's 2 fixes by running:
I've now run it 10 times without seeing the "connection refused" error, but I did see the "connection reset" failure once on the "proxy logs on node" test.
It still seems to be failing occasionally in Jenkins, even after #10820 was merged, e.g. job/kubernetes-e2e-gce/7461/. It's quite possible that the failures are for other reasons - I've not looked into it deeply.
I'm pretty sure the reason is different here. I will try to look into it deeper today.
I took a deeper look, and it seems that after adding #10820, all of the Proxy test failures are caused by some NOT-ready node at the end of the test. Other failures are caused by it as well. @dchen1107: FYI
My current hypothesis is that a lot of the different flakes we are observing (e.g. Proxy flakes #10792 or EmptyDir flakes #10657) might in fact be caused by not-ready nodes at random points in time. @dchen1107 @lavalamp is there any way to get information about why Kubelet was restarted? Can Kubelet be restarted by Monit more frequently than once per 5 minutes?
Kubelet restart because of a /healthz failure?
cc/ @saad-ali Saad, could you please take a look at this one to figure out why kubelet restarts so frequently.
monit checks every two minutes.
monit checks the existence of the pid file and /healthz
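For illustration only, here is a rough Go approximation of what that monit check amounts to: the kubelet counts as healthy only if its pid file exists and /healthz answers 200 within a short timeout. The pid-file path, healthz port, and timeout are assumptions about the GCE node configuration of the time, not values taken from this issue.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

// kubeletHealthy mimics the two monit checks described above:
// 1. the pid file exists, and
// 2. /healthz responds with HTTP 200 before the timeout.
// Path and port are illustrative assumptions.
func kubeletHealthy() bool {
	if _, err := os.Stat("/var/run/kubelet.pid"); err != nil {
		return false
	}
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get("http://127.0.0.1:10248/healthz")
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	if !kubeletHealthy() {
		fmt.Println("kubelet looks unhealthy; monit would restart it at this point")
	}
}
```

When a check like this fails, monit restarts the kubelet, which is the restart mechanism being investigated in the surrounding comments.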
Looks like the first monit
Looking at the thread numbers, it looks like Kubelet was restarted more than twice:
where 2928 was the shortest-lived thread:
Nothing indicates why it was killed.
There is a race condition between kubelet startup and monit. If monit comes up before salt starts the kubelet, it will notice that the kubelet is not running and will start it. And with #8931, if salt then starts the kubelet, the second startup call will kill the previous instance of kubelet. But that should result in only 1 "restart". We see 4 starts (i.e. 3 restarts). It's possible that the healthz check happens during the restarts, triggering the same cycle. Regardless, this seems to be very common. Checking random otherwise successful GCE E2E runs, I see at least 2 restarts:
@saad-ali - I think that the situation from my test is a bit more dangerous because it took more than 3 minutes (although I agree it seems to be a problem in general). --Update--
I agree it might be risky to change the monit version now. But the problem @saad-ali and I described is not a bootstrap problem. Although there are some restarts at the beginning (within 30 seconds after restart), I don't think that is serious. However, we ARE also observing restarts in the middle of running tests, e.g. 10 minutes after creating the cluster. I don't think we can call those moments "boot sequences".
Closing in favor of #10899 to track the remaining issue.
@quinton-hoole I think you closed the wrong one here. I am reopening it; re-close it if you disagree with me. :-)
@dchen1107 as I understand it, we still need to track down the reason for the seemingly unnecessary kubelet restarts. Is #10899 not the canonical tracking issue for that?
I looked into those failures and both are exactly the same. To me this looks like some problem with the network. Basically, in the apiserver there is a log showing that it is correctly sending a request to the kubelet:
However - there are no logs at all around that time in the Kubelet:
So it seems like the http request was lost somewhere in the network. It doesn't seem to be a problem with Kubernetes - what we can do is simply retry the request in the test when such errors occur. What do you think?
Yeah, retrying seems like a good idea; it can't hurt and maybe it will fix the problem as you say.
@davidopp - ok - I can prepare a PR for it.
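For reference, here is a minimal sketch of the kind of retry being discussed (this is not the actual PR that was prepared): re-issue the proxied request a few times before declaring the test failed, so a single dropped connection does not fail the whole spec. The attempt count and backoff are placeholders.

```go
package e2e

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// getWithRetry performs an HTTP GET against url, retrying transient failures
// such as "connection refused" or "connection reset" up to attempts times,
// sleeping backoff between tries. It returns the body of the first successful
// 200 response, or the last error seen.
func getWithRetry(url string, attempts int, backoff time.Duration) ([]byte, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		resp, err := http.Get(url)
		if err == nil {
			body, readErr := io.ReadAll(resp.Body)
			resp.Body.Close()
			if readErr == nil && resp.StatusCode == http.StatusOK {
				return body, nil
			}
			lastErr = fmt.Errorf("status %d, read error: %v", resp.StatusCode, readErr)
		} else {
			lastErr = err
		}
		time.Sleep(backoff)
	}
	return nil, fmt.Errorf("GET %s failed after %d attempts: %v", url, attempts, lastErr)
}
```

Note that, as pointed out further down in the thread, the test already issues the request N times and expects every attempt to succeed, so wrapping each attempt in a retry does loosen that guarantee slightly.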
Having a retry in the test seems fine, but it would be good to get to the bottom of why the node network is getting borked. We've seen that problem elsewhere also.
@quinton-hoole - I don't think this is a Kubernetes problem. Basically, this particular request is not going through kube-proxy or anything like that - we just send an http request directly to the Kubelet, and it seems the Kubelet doesn't even receive it. I'm not sure how/if we can debug it...
I'm concerned about our iptables reconfiguration on the nodes, e2e tests that down the node network interface, and the kubelet process that is being restarted.
I don't know about retrying; the test already tries N times and expects them all to succeed.
This test is failing repeatedly on kubernetes-e2e-gce-parallel. Did we merge something today that might have broken this?
Did we merge something between 1:30 and 2:00 today that may have broken this?
False alarm, these failures are all failed performance expectations. We should investigate why things got slower, but it's not a correctness bug.
Actually this is frequently failing with a more worrisome error, e.g.
I'm reluctant to increase the timeout to fix the timeout failures ("took xxx > yyy") until we understand what is going on with this other failure.
The "an error on the server has prevented the request from succeeding" failures seem to mostly be happening in the PR builder Jenkins (http://kubekins.dls.corp.google.com:8081/job/kubernetes-pull-build-test-e2e-gce/), whereas all the ones in kubernetes-e2e-gce-parallel (http://kubekins.dls.corp.google.com/job/kubernetes-e2e-gce-parallel/3572/) seem to be of the timeout flavor.
These failures appear to have been due to another issue that occurred at about the same time. Per-PR, regular, and parallel Jenkins runs appear to be back to normal. I'll check again later this evening to make sure. Thanks @lavalamp and @dchen1107 for noticing the connection between this and the other issue.
I've confirmed that this test is again 100% stable. Closing.
#9312 and #10739 provide further details. #10739, the intended fix, does not seem to have done the job.