[Failing Test] Network granular checks tests failing in ci-kubernetes-e2e-gci-gce-alpha-features #81193

Closed
alejandrox1 opened this issue Aug 8, 2019 · 12 comments
Labels: kind/flake, priority/critical-urgent, sig/network, sig/testing, triage/unresolved

@alejandrox1
Contributor

Which jobs are failing:
ci-kubernetes-e2e-gci-gce-alpha-features

Which test(s) are failing:

  • [sig-network] Networking Granular Checks: Pods should function for node-pod communication: http [LinuxOnly] [NodeConformance] [Conformance]
  • [sig-network] Networking Granular Checks: Services should function for client IP based session affinity: udp
  • [sig-network] Networking should provide Internet connection for containers [Feature:Networking-IPv4]

Since when has it been failing:
Failing since 8/8 at around 4pm PDT
See 6d49d69...ef88694
Possible cause #80978

Testgrid link:
https://testgrid.k8s.io/sig-release-master-blocking#gce-cos-master-alpha-features

Reason for failure:
Tests are failing with messages such as this one:

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:153
Aug  8 20:04:32.610: Couldn't delete ns: "nettest-5548": namespace nettest-5548 was not deleted with limit: timed out waiting for the condition, namespace is empty but is not yet removed 
(&errors.errorString{s:"namespace nettest-5548 was not deleted with limit: timed out waiting for the condition, namespace is empty but is not yet removed"})
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:337
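
To check whether other failures in a run share this signature, the message can be matched mechanically. The following is a minimal Python sketch, not part of the e2e framework. The regular expression is inferred only from the single log line quoted above, so other framework versions may word the error differently, and the names `NS_DELETE_RE` and `stuck_namespace` are hypothetical.

```python
import re
from typing import Optional

# Pattern inferred from the one failure message quoted above (an assumption,
# not a documented log format).
NS_DELETE_RE = re.compile(
    r"Couldn't delete ns: "
    r'"(?P<ns>[^"]+)": '
    r"namespace (?P=ns) was not deleted with limit: "
    r"timed out waiting for the condition"
)


def stuck_namespace(log_line: str) -> Optional[str]:
    """Return the namespace name if the line reports a namespace-deletion timeout."""
    match = NS_DELETE_RE.search(log_line)
    return match.group("ns") if match else None


line = (
    'Aug  8 20:04:32.610: Couldn\'t delete ns: "nettest-5548": namespace '
    "nettest-5548 was not deleted with limit: timed out waiting for the "
    "condition, namespace is empty but is not yet removed"
)
print(stuck_namespace(line))
```

Running the same function over every failure message in a job run would show whether the failures share the namespace-deletion cause or only some of them do.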

/milestone v1.16
/priority critical-urgent
/kind failing-test
/sig testing
/sig sig-network
/cc @kubernetes/sig-network-test-failures
/cc @Verolop @jimangel @soggiest @alenkacz

@alejandrox1 added the kind/failing-test label on Aug 8, 2019
@k8s-ci-robot added the sig/network and priority/critical-urgent labels on Aug 8, 2019
@k8s-ci-robot added this to the v1.16 milestone on Aug 8, 2019
@k8s-ci-robot added the sig/testing label on Aug 8, 2019
@alejandrox1
Contributor Author

This may be related to #81191.

@alejandrox1 added this to New (no response yet) in CI Signal team (SIG Release) on Aug 8, 2019
@athenabot

/triage unresolved

Comment /remove-triage unresolved when the issue is assessed and confirmed.

🤖 I am a bot run by vllry. 👩‍🔬

@k8s-ci-robot added the triage/unresolved label on Aug 8, 2019
@alejandrox1
Contributor Author

I guess these were flakes. I will continue watching.
/remove-kind failing-test
/kind flake

@k8s-ci-robot added the kind/flake label and removed the kind/failing-test label on Aug 10, 2019
@lachie83
Member

I've pinged @wojtek-t from sig-scalability on Slack to see if he can assist with triage.

@wojtek-t
Member

Those are unrelated to #81191 - I don't know these tests - those issues don't seem to be scalability related.

@lachie83
Member

> Those are unrelated to #81191 - I don't know these tests - those issues don't seem to be scalability related.

Thanks for reviewing, @wojtek-t. I understand that the failing CSI tests aren't in your domain, but the [sig-network] tests that are failing in this run all exhibit the same error (as listed under "Reason for failure"): https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-alpha-features/1160930238318776320. Could those specific failures be related to #80978?

@wojtek-t
Member

> Thanks for reviewing, @wojtek-t. I understand that the failing CSI tests aren't in your domain, but the [sig-network] tests that are failing in this run all exhibit the same error (as listed under "Reason for failure"): https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-alpha-features/1160930238318776320. Could those specific failures be related to #80978?

They don't seem to be - they are failing on namespace deletion, which is happening for every single test. This seems like something networking-related.
[Also: I ran the e2e tests against my PR (with the feature enabled). The CSI tests failed for me, and we all understood the reason, but I didn't see any networking test failures. I simply forgot that we have the alpha-features suite, which is why I broke it.]
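
The failure mode discussed here, cleanup that waits for a namespace to disappear and gives up after a limit, follows a generic poll-until-timeout pattern. The following Python sketch illustrates that general pattern only; it is not the framework's Go code, and `wait_until`, `namespace_gone`, and the timing values are hypothetical.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float, interval: float) -> bool:
    """Poll `condition` every `interval` seconds; return False if `timeout` elapses first."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical stand-in for "the namespace no longer exists". Even when the
# namespace is already empty, this check keeps failing until it is removed.
def namespace_gone() -> bool:
    return False


if not wait_until(namespace_gone, timeout=0.2, interval=0.05):
    print("timed out waiting for the condition")
```

Because a cleanup step like this runs after every test, a namespace that is never removed would fail each test with the same timeout message, whatever the test itself was checking.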

@alejandrox1
Contributor Author

This one is kind of stuck until we fix #82174.
We don't know whether the tests in this issue are still failing.

@vllry
Contributor

vllry commented Sep 5, 2019

/assign @bowei
cc @robscott

@robscott
Member

robscott commented Sep 6, 2019

Hey @alejandrox1, it looks like this may have been resolved by #82288. #82174 also seems to be resolved at this point, so hopefully it's just a matter of waiting for a few more test runs to ensure tests are all passing consistently again.

@alejandrox1 moved this from New (no response yet) to Observing (observe test failure/flake before marking as resolved) in CI Signal team (SIG Release) on Sep 7, 2019
@alejandrox1
Contributor Author

It was indeed!
Thank you very much for all the work you did to resolve these issues @robscott !
/close

@k8s-ci-robot
Contributor

@alejandrox1: Closing this issue.

@alejandrox1 moved this from Observing (observe test failure/flake before marking as resolved) to Resolved (week Sep 2) in CI Signal team (SIG Release) on Sep 9, 2019
@alejandrox1 moved this from Resolved (week Sep 2) to Resolved (2+ weeks) in CI Signal team (SIG Release) on Sep 16, 2019