New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1878776: ingressnodesavailable: add controller that checks if router can schedule pods #344
Bug 1878776: ingressnodesavailable: add controller that checks if router can schedule pods #344
Conversation
Seems more than a little cray that the auth operator needs to take responsibility for this. |
801b1e5
to
2328111
Compare
d97cb0b
to
7f98165
Compare
Type: "ReadyIngressNodesAvailable", | ||
Status: operatorv1.ConditionFalse, | ||
Reason: "NoReadyIngressNodes", | ||
Message: fmt.Sprintf("Authentication require functional ingress which require at least one schedulable and ready node. Got %d worker nodes and %d master nodes (none are schedulable or ready for ingress pods).", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requires
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
Make edge team owner of this. |
7f98165
to
3b1d363
Compare
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mfojtik, sttts The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest Please review the full test history for this PR and help us cut down flakes. |
3 similar comments
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest |
/retest Please review the full test history for this PR and help us cut down flakes. |
1 similar comment
/retest Please review the full test history for this PR and help us cut down flakes. |
3b1d363
to
88e6eba
Compare
New changes are detected. LGTM label has been removed. |
The following users are mentioned in OWNERS file(s) but are untrusted for the following reasons. One way to make the user trusted is to add them as members of the openshift org. You can then trigger verification by writing
|
/retest |
adding lgtm back, fixed stucked informer |
@mfojtik: This pull request references Bugzilla bug 1878776, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/bugzilla refresh |
@mfojtik: This pull request references Bugzilla bug 1878776, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
OWNERS file copied from ingress-operator |
/retest Please review the full test history for this PR and help us cut down flakes. |
@mfojtik: All pull requests linked via external trackers have merged: Bugzilla bug 1878776 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This reverts commit 88e6eba, openshift#344. The logic assumes the router will be scheduled on "worker"-labeled nodes. That leads to false-positives when there are no vanilla 'worker' compute nodes, but are schedulable compute nodes that have custom names ('infra', 'compute', etc.) [1,2]. Instead of trying to second-guess the scheduler and the ingress-operator, let the ingress operator handle reporting this issue [3]. [1]: https://github.com/openshift/machine-config-operator/blob/0170e082a8b8228373bd841d17555fff2cfb51b7/docs/custom-pools.md [2]: https://bugzilla.redhat.com/show_bug.cgi?id=1893386 [3]: https://bugzilla.redhat.com/show_bug.cgi?id=1881155
The purpose of this controller is to check whether nodes are available to schedule ingress pods (which require workload schedulable nodes).
Not having ingress available is very common failure case for authentication but it lacks a good signal (from ingress operator) and the authentication operator is not available resulting in longer bug triage and red herring.
This controller handle also case when masters are schedulable, however the workers schedulable is best-effort as there could be taints and toleration that can take effect and cause router pods to not be able to schedule.