[Failing test] [sig-network] Firewall rule [Slow] [Serial] should create valid firewall rules for LoadBalancer type service #74887
Comments
It appears that the test is failing while waiting for the node health check firewall rule (which is shared among external Load Balancers) to be deleted.
Also note that this test doesn't fail in non-upgrade jobs, e.g. https://k8s-testgrid.appspot.com/sig-network-gce#gci-gce-serial&width=20
My current suspicion is that the LB service created by the upgrade test wasn't properly deleted before this "serial" test started. If any other LB service exists (other than the ones created by this test), the node health check firewall rule will not be removed.
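To make the suspected condition concrete, here is a minimal sketch in Go. It is not the actual cloud-provider or e2e code: the `Service` type and both helper functions are hypothetical, and the model only encodes the behavior described above, namely that the shared node health check firewall rule is kept as long as any LoadBalancer service remains.

```go
package main

import "fmt"

// Service is a hypothetical, simplified stand-in for a Kubernetes Service.
type Service struct {
	Name string
	Type string // e.g. "LoadBalancer" or "ClusterIP"
}

// leftoverLoadBalancers returns the LoadBalancer services that the
// current test did not create (assumed to be tracked by name).
func leftoverLoadBalancers(all []Service, ownedByTest map[string]bool) []Service {
	var leftovers []Service
	for _, s := range all {
		if s.Type == "LoadBalancer" && !ownedByTest[s.Name] {
			leftovers = append(leftovers, s)
		}
	}
	return leftovers
}

// sharedRuleCanBeDeleted models the condition from the comment above:
// the shared rule is only removable once no LoadBalancer service is left.
func sharedRuleCanBeDeleted(remaining []Service) bool {
	for _, s := range remaining {
		if s.Type == "LoadBalancer" {
			return false
		}
	}
	return true
}

func main() {
	// State after the firewall test deleted its own service, with one
	// service left behind by an earlier test (hypothetical names).
	remaining := []Service{
		{Name: "upgrade-test-lb", Type: "LoadBalancer"},
		{Name: "kubernetes", Type: "ClusterIP"},
	}
	owned := map[string]bool{"firewall-test-lb": true}

	fmt.Println("rule can be deleted:", sharedRuleCanBeDeleted(remaining))
	for _, s := range leftoverLoadBalancers(remaining, owned) {
		fmt.Println("leftover LB service:", s.Name)
	}
}
```

Under this model, a single leftover LB service is enough to make the test wait for a deletion that never happens.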
cc @grayluck to see if he can help :)
Dug into this a bit with @grayluck. We found that the upgrade test itself panicked and exited partway through, and hence didn't clean up all the resources it had created (including LoadBalancer type services). Unexpectedly, the job continued to run the rest of the tests (including this serial firewall test, which assumes no other LB service exists), and this test failed. Some relevant logs from https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-new-master-upgrade-cluster-new/2135:
We will do some more investigation, but it would be great to have help from relevant folks to look at why the upgrade test itself hits a fatal error @mariantalla
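As a general illustration of how a test that dies partway through can leave resources behind, here is a minimal Go sketch of deferred cleanup. It is not taken from the upgrade test: `runWithCleanup` and the callbacks are hypothetical. A deferred cleanup runs while a panic unwinds, but deferred functions do not run after `os.Exit` or a Go runtime "fatal error", so this pattern alone would not cover every way a test process can exit.

```go
package main

import "fmt"

// runWithCleanup runs work and guarantees that cleanup is called even if
// work panics. It reports whether a panic occurred. Hypothetical helper.
func runWithCleanup(work func(), cleanup func()) (panicked bool) {
	// Registered first, so it runs last: converts a panic into a result.
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	// Registered second, so it runs first, including during a panic.
	defer cleanup()

	work()
	return false
}

func main() {
	cleaned := false
	panicked := runWithCleanup(
		func() { panic("simulated failure in the middle of a test") },
		func() { cleaned = true }, // e.g. delete LoadBalancer services
	)
	fmt.Println("panicked:", panicked, "cleaned up:", cleaned)
}
```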
Similar findings from @msau42 on #74890 (comment) as well.
Seems like we should use #74893 to track the upgrade test failure.
@MrHohn while the underlying issue gets fixed, is there another job we can look at that covers the same behavior and upgrade path? e.g. something from sig-network's dashboards perhaps?
@mariantalla From sig-network dashboards we run the same test but that doesn't trigger the upgrade path: This test is passing in another upgrade job though: https://k8s-testgrid.appspot.com/sig-release-master-upgrade#gce-new-master-upgrade-cluster-parallel&include-filter-by-regex=firewall
Hello! We are in code freeze for 1.14. It looks like investigation is still underway for this issue. Will it be resolved in the next week? If this is a non-release-blocking issue, can we move it to 1.15?
@soggiest we are tracking this issue under milestone v1.14 because these are failures in master-upgrade.
@MrHohn I see the tests clearing up in both of the aforementioned jobs, but no PRs referencing this issue. Did something else happen? 🤔
I'm guessing it will flake again; I'm still seeing the same error on some of the latest runs.
The workaround for #74890 seems to have worked and this test started passing. Will wait for it to become stable.
It appears go1.12.1 may have fixed this as well; moving to observation.
@spiffxp: Closing this issue. In response to this:
Which jobs are failing:
ci-kubernetes-e2e-gce-new-master-upgrade-cluster-new
Which test(s) are failing:
[sig-network] Firewall rule [Slow] [Serial] should create valid firewall rules for LoadBalancer type service
Since when has it been failing:
2019-03-01, test-infra:e858a8b2e (gce-new-master-upgrade-cluster-new shows fa9347840 as the first commit for which the test failed, but fa9347840 came after e858a8b2e).
Testgrid link:
Reason for failure:
/sig testing
/sig network
/kind failing-test
/priority critical-urgent
/milestone v1.14
cc @smourapina @alejandrox1 @kacole2 @mortent