[Flaky Tests] Flaky reboot tests #78901

Open
alejandrox1 opened this issue Jun 11, 2019 · 2 comments

Comments

@alejandrox1
Contributor

commented Jun 11, 2019

Which jobs are failing:

  • ci-kubernetes-e2e-gci-gce-reboot
  • ci-kubernetes-e2e-gce-cos-k8sbeta-reboot

Which test(s) are failing:
There are multiple reboot tests failing, such as:

  • [sig-cluster-lifecycle] Reboot [Disruptive] [Feature:Reboot] each node by switching off the network interface and ensure they function upon switch on
  • [sig-cluster-lifecycle] Reboot [Disruptive] [Feature:Reboot] each node by dropping all outbound packets for a while and ensure they function afterwards
  • [sig-cluster-lifecycle] Reboot [Disruptive] [Feature:Reboot] each node by ordering unclean reboot and ensure they function upon restart
  • [sig-cluster-lifecycle] Reboot [Disruptive] [Feature:Reboot] each node by ordering clean reboot and ensure they function upon restart
  • [sig-cluster-lifecycle] Reboot [Disruptive] [Feature:Reboot] each node by dropping all inbound packets for a while and ensure they function afterwards

Since when has it been failing:
These tests have been flaking for a while (see triage).
There is a history of issues referencing them: #74661, #14772, #9062, etc.

Testgrid link:

Reason for failure:
The common error message between all of the failures looks like this:

[sig-cluster-lifecycle] Reboot [Disruptive] [Feature:Reboot] each node by switching off the network interface and ensure they function upon switch on	7m24s
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/lifecycle/reboot.go:111
Jun 11 10:29:36.739: Test failed; at least one node failed to reboot in the time given.
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/lifecycle/reboot.go:169

[sig-cluster-lifecycle] Reboot [Disruptive] [Feature:Reboot] each node by dropping all outbound packets for a while and ensure they function afterwards	5m13s
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/lifecycle/reboot.go:125
Jun 11 10:34:47.931: Test failed; at least one node failed to reboot in the time given.
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/lifecycle/reboot.go:169
 
[sig-cluster-lifecycle] Reboot [Disruptive] [Feature:Reboot] each node by ordering unclean reboot and ensure they function upon restart	5m11s
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/lifecycle/reboot.go:99
Jun 11 10:40:00.572: Test failed; at least one node failed to reboot in the time given.
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/lifecycle/reboot.go:169
 
[sig-cluster-lifecycle] Reboot [Disruptive] [Feature:Reboot] each node by ordering clean reboot and ensure they function upon restart	5m11s
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/lifecycle/reboot.go:93
Jun 11 10:45:11.861: Test failed; at least one node failed to reboot in the time given.
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/lifecycle/reboot.go:169
 
[sig-cluster-lifecycle] Reboot [Disruptive] [Feature:Reboot] each node by dropping all inbound packets for a while and ensure they function afterwards	5m12s
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/lifecycle/reboot.go:117
Jun 11 10:50:22.772: Test failed; at least one node failed to reboot in the time given.
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/lifecycle/reboot.go:169

/cc @kubernetes/sig-cluster-lifecycle
/sig cluster-lifecycle
/kind flake
/priority important-soon
/milestone v1.15

/cc @jimangel @smourapina @alenkacz

@neolit123

Member

commented Jun 11, 2019

/assign @justinsb
Justin, PTAL if you can, or delegate to someone with knowledge of the gc{e|i}-cos* jobs.
Thanks.

@alejandrox1

Contributor Author

commented Jun 14, 2019

/milestone v1.16

@k8s-ci-robot modified the milestones: v1.15, v1.16 on Jun 14, 2019
