Bug 1806594: Multi-AZ test should only check nodes the pods are scheduled to #24709
The test assumes that all nodes are schedulable when calculating zones, but masters are not, and in many e2e runs the worker nodes are only in two zones. The e2e suite needs to be fixed to take selectors for the nodes that workloads can schedule to by default, and the test should then use only those nodes to get zones, but that is a much more invasive change. The minimum workaround is to verify spreading only across the nodes the pods are actually scheduled to, and to error if they are scheduled to only one node (a multi-AZ test should fail if we are in a single AZ).
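A minimal Go sketch of the workaround described above, for illustration only: it is not the code from this PR. `Pod`, `checkSpreadOnScheduledNodes`, the `nodeZones` map, and the node and zone names are all hypothetical; a real e2e test would read each node's zone label through the Kubernetes client. It also assumes "evenly spread" means per-zone pod counts differ by at most one. Zones are derived only from the nodes the pods landed on, so unschedulable masters in a third zone no longer count as an empty zone.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a hypothetical, minimal stand-in for a scheduled pod; only the
// node it landed on matters for this check.
type Pod struct {
	Name     string
	NodeName string
}

// checkSpreadOnScheduledNodes (hypothetical) checks zone spreading using only
// the nodes the pods were actually scheduled to. nodeZones maps node name to
// zone and is an assumed input.
func checkSpreadOnScheduledNodes(pods []Pod, nodeZones map[string]string) error {
	scheduledNodes := make(map[string]bool)
	podsPerZone := make(map[string]int)
	for _, p := range pods {
		if p.NodeName == "" {
			return fmt.Errorf("pod %s is not scheduled", p.Name)
		}
		zone, ok := nodeZones[p.NodeName]
		if !ok {
			return fmt.Errorf("pod %s is on node %s with no known zone", p.Name, p.NodeName)
		}
		scheduledNodes[p.NodeName] = true
		podsPerZone[zone]++ // only zones of scheduled nodes ever appear here
	}

	// A multi-AZ test must not pass trivially when every pod is on one node.
	if len(scheduledNodes) < 2 {
		return errors.New("pods were scheduled to fewer than two nodes")
	}

	// Assumed definition of "evenly spread": counts differ by at most one.
	minCount, maxCount := -1, 0
	for _, n := range podsPerZone {
		if minCount == -1 || n < minCount {
			minCount = n
		}
		if n > maxCount {
			maxCount = n
		}
	}
	if maxCount-minCount > 1 {
		return fmt.Errorf("pods not evenly spread: %d in one zone, %d in another", minCount, maxCount)
	}
	return nil
}

func main() {
	// Masters span three zones, workers only two (hypothetical names).
	nodeZones := map[string]string{
		"master-0": "us-east-1a", "master-1": "us-east-1b", "master-2": "us-east-1c",
		"worker-0": "us-east-1a", "worker-1": "us-east-1b",
	}
	pods := []Pod{
		{"rc-0", "worker-0"}, {"rc-1", "worker-1"},
		{"rc-2", "worker-0"}, {"rc-3", "worker-1"},
	}
	fmt.Println(checkSpreadOnScheduledNodes(pods, nodeZones)) // <nil>
}
```

Counting zones from every node would include `us-east-1c` with zero pods and fail this same layout; restricting to scheduled nodes avoids that, at the cost of not detecting a zone the scheduler skipped entirely.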
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: smarterclayton
@smarterclayton: This pull request references Bugzilla bug 1806594, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
/cherrypick release-4.4
@smarterclayton: once the present PR merges, I will cherry-pick it on top of release-4.4 in a new PR and assign it to you.
/cherrypick release-4.3
@smarterclayton: once the present PR merges, I will cherry-pick it on top of release-4.3 in a new PR and assign it to you.
The 4.4 bug is 1814360 and the 4.3 bug is 1814363.
/hold I may have misdiagnosed the particular failure here in GCP. The general bug is still a bug (masters and workers don't have to match), but this may not fix the GCP flake correctly.
/test e2e-gcp
/test e2e-aws
/test e2e-gcp
Can you make sure we don't name that UPSTREAM commit with a number pointing to an issue? It will be confusing during the next rebase.
/test e2e-gcp
There is no upstream fix yet, so I'd rather reference the issue where you can find the bug than nothing.
/test e2e-aws-serial |
/test e2e-aws-serial
@smarterclayton: The following test failed:
/test e2e-aws-serial
/test e2e-aws-serial
@smarterclayton I've picked the main commit for the rebase PR to try to get the multi-AZ test passing. Will revisit before merge.
@smarterclayton or @damemi, can we get some resolution on this one?
Issues go stale after 90d of inactivity. /lifecycle stale
Stale issues rot after 30d of inactivity. /lifecycle rotten
Rotten issues close after 30d of inactivity. /close
@openshift-bot: Closed this PR.
@smarterclayton: This pull request references Bugzilla bug 1806594. The bug has been updated to no longer refer to the pull request using the external bug tracker.
This flakes 1 in 4 times on AWS because we run in only 2 zones.