Bug 1878776: ingressnodesavailable: add controller that checks if router can schedule pods #344

mfojtik · 2020-09-11T14:07:23Z

The purpose of this controller is to check whether nodes are available to schedule ingress pods (which require nodes that accept regular workloads).
Unavailable ingress is a very common failure case for authentication, but it lacks a good signal from the ingress operator: the authentication operator simply reports that it is not available, which results in longer bug triage and red herrings.

This controller also handles the case where masters are schedulable. However, the check for schedulable workers is best-effort, because taints and tolerations can take effect and prevent router pods from being scheduled.
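A minimal sketch of the kind of check described above. It uses hypothetical stand-in types (`Node`, `Taint`, `countIngressNodes`) rather than the real client-go/corev1 API, and it is not the controller's actual code. The role labels are the standard `node-role.kubernetes.io/worker` and `node-role.kubernetes.io/master` labels. Treating any `NoSchedule` taint as disqualifying is the best-effort simplification mentioned in the description: the sketch ignores whatever tolerations the router pods may have. A master counts only once its `NoSchedule` taint is gone, which roughly models the "masters are schedulable" case.

```go
package main

import "fmt"

// Node is a hypothetical, trimmed-down stand-in for corev1.Node that keeps
// only the fields this sketch inspects.
type Node struct {
	Name          string
	Labels        map[string]string
	Ready         bool    // stand-in for the NodeReady condition being True
	Unschedulable bool    // stand-in for node.Spec.Unschedulable (cordoned)
	Taints        []Taint // stand-in for node.Spec.Taints
}

// Taint keeps only the key and effect of a corev1.Taint.
type Taint struct {
	Key    string
	Effect string
}

const (
	workerLabel = "node-role.kubernetes.io/worker"
	masterLabel = "node-role.kubernetes.io/master"
)

// hasNoScheduleTaint reports whether the node carries any NoSchedule taint.
// Best-effort: it ignores whatever tolerations the router pods may have.
func hasNoScheduleTaint(n Node) bool {
	for _, t := range n.Taints {
		if t.Effect == "NoSchedule" {
			return true
		}
	}
	return false
}

// countIngressNodes returns how many worker and master nodes are Ready,
// not cordoned and free of NoSchedule taints. A node carrying both role
// labels is counted in both totals.
func countIngressNodes(nodes []Node) (workers, masters int) {
	for _, n := range nodes {
		if !n.Ready || n.Unschedulable || hasNoScheduleTaint(n) {
			continue
		}
		if _, ok := n.Labels[workerLabel]; ok {
			workers++
		}
		if _, ok := n.Labels[masterLabel]; ok {
			masters++
		}
	}
	return workers, masters
}

func main() {
	nodes := []Node{
		// Default master: tainted NoSchedule, so it cannot host router pods.
		{Name: "master-0", Labels: map[string]string{masterLabel: ""}, Ready: true,
			Taints: []Taint{{Key: masterLabel, Effect: "NoSchedule"}}},
		{Name: "worker-0", Labels: map[string]string{workerLabel: ""}, Ready: true},
		{Name: "worker-1", Labels: map[string]string{workerLabel: ""}, Ready: false},
	}
	w, m := countIngressNodes(nodes)
	fmt.Printf("usable workers=%d masters=%d\n", w, m) // prints "usable workers=1 masters=0"
}
```

If both counts are zero, a controller built this way would report that ingress pods likely cannot be scheduled.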

pkg/controllers/worker/worker_available_controller.go

pkg/operator/starter.go

marun · 2020-09-11T17:13:02Z

Seems more than a little crazy that the auth operator needs to take responsibility for this.

sttts · 2020-09-14T07:54:01Z

pkg/controllers/ingressnodesavailable/ingress_nodes_available_controller.go

+			Type:   "ReadyIngressNodesAvailable",
+			Status: operatorv1.ConditionFalse,
+			Reason: "NoReadyIngressNodes",
+			Message: fmt.Sprintf("Authentication requires functional ingress, which requires at least one schedulable and ready node. Got %d worker nodes and %d master nodes (none are schedulable or ready for ingress pods).",
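The reviewed snippet shows only the `False` branch of the condition. The following sketch shows how node counts could map onto such a condition. `Condition` is a hypothetical stand-in for `operatorv1.OperatorCondition` (only the fields used in the snippet), `ingressNodesCondition` is an illustrative helper, and the `True` branch without reason or message is an assumption, since the snippet does not show it.

```go
package main

import "fmt"

// Condition is a hypothetical stand-in for operatorv1.OperatorCondition,
// keeping only the fields that appear in the reviewed snippet.
type Condition struct {
	Type    string
	Status  string // "True" or "False", mirroring operatorv1.ConditionTrue/ConditionFalse
	Reason  string
	Message string
}

// ingressNodesCondition maps node counts to a condition. workers and masters
// are totals; usable is how many of them can host router pods.
// The "True" branch (no reason or message) is an assumption for illustration.
func ingressNodesCondition(workers, masters, usable int) Condition {
	if usable > 0 {
		return Condition{Type: "ReadyIngressNodesAvailable", Status: "True"}
	}
	return Condition{
		Type:   "ReadyIngressNodesAvailable",
		Status: "False",
		Reason: "NoReadyIngressNodes",
		Message: fmt.Sprintf("Authentication requires functional ingress, which requires at least one schedulable and ready node. "+
			"Got %d worker nodes and %d master nodes (none are schedulable or ready for ingress pods).", workers, masters),
	}
}

func main() {
	c := ingressNodesCondition(2, 3, 0)
	fmt.Println(c.Status, c.Reason)
	fmt.Println(c.Message)
}
```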


sttts · 2020-09-14T07:57:13Z

Make the edge team the owner of this.

sttts · 2020-09-14T08:22:16Z

/lgtm
/approve

openshift-ci-robot · 2020-09-14T08:22:32Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mfojtik, sttts

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

~~OWNERS~~ [sttts]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-bot · 2020-09-14T08:36:22Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-09-14T08:49:23Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-09-14T09:41:18Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-09-14T09:54:17Z

/retest

Please review the full test history for this PR and help us cut down flakes.

mfojtik · 2020-09-14T10:27:03Z

/retest

openshift-bot · 2020-09-14T10:46:17Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-09-14T10:59:18Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-ci-robot · 2020-09-14T11:34:03Z

New changes are detected. LGTM label has been removed.

openshift-ci-robot · 2020-09-14T11:36:35Z

The following users are mentioned in OWNERS file(s) but are untrusted for the following reasons. One way to make the user trusted is to add them as members of the openshift org. You can then trigger verification by writing /verify-owners in a comment.

pravisankar
- User is not a member of the org. User is not a collaborator. Satisfy at least one of these conditions to make the user trusted.
- pkg/controllers/ingressnodesavailable/OWNERS
ramr
- User is not a member of the org. User is not a collaborator. Satisfy at least one of these conditions to make the user trusted.
- pkg/controllers/ingressnodesavailable/OWNERS

mfojtik · 2020-09-14T13:41:21Z

/retest

mfojtik · 2020-09-14T13:41:44Z

Adding lgtm back; fixed the stuck informer.

openshift-ci-robot · 2020-09-14T13:44:29Z

@mfojtik: This pull request references Bugzilla bug 1878776, which is invalid:

expected the bug to target the "4.6.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1878776:ingressnodesavailable: add controller that checks if router can schedule pods

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

mfojtik · 2020-09-14T13:44:49Z

/bugzilla refresh

openshift-ci-robot · 2020-09-14T13:44:56Z

@mfojtik: This pull request references Bugzilla bug 1878776, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target release (4.6.0) matches configured target release for branch (4.6.0)
bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

mfojtik · 2020-09-14T13:45:20Z

OWNERS file copied from ingress-operator

openshift-bot · 2020-09-14T14:27:52Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-ci-robot · 2020-09-14T17:01:14Z

@mfojtik: All pull requests linked via external trackers have merged:

openshift/cluster-authentication-operator#344

Bugzilla bug 1878776 has been moved to the MODIFIED state.

In response to this:

Bug 1878776: ingressnodesavailable: add controller that checks if router can schedule pods

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

This reverts commit 88e6eba, openshift#344.

The logic assumes the router will be scheduled on "worker"-labeled nodes. That leads to false positives when there are no vanilla "worker" compute nodes but there are schedulable compute nodes with custom names ("infra", "compute", etc.) [1,2]. Instead of trying to second-guess the scheduler and the ingress-operator, let the ingress operator handle reporting this issue [3].

[1]: https://github.com/openshift/machine-config-operator/blob/0170e082a8b8228373bd841d17555fff2cfb51b7/docs/custom-pools.md
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1893386
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1881155
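To make the false positive described in the revert concrete, here is a hypothetical sketch (stand-in types, not the reverted code) comparing a check that counts only `node-role.kubernetes.io/worker` nodes with one that ignores role labels. On a cluster whose only untainted compute nodes carry a custom `node-role.kubernetes.io/infra` label, the first check finds zero usable nodes while the second finds two. The label-agnostic version is still only a heuristic: where router pods actually land depends on the scheduler and on the ingress operator's placement settings, which is the reason given for deferring to the ingress operator.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for corev1.Node.
type Node struct {
	Labels        map[string]string
	Ready         bool
	Unschedulable bool // cordoned
	NoSchedule    bool // true if the node has any NoSchedule taint
}

// usable reports whether a node is Ready, uncordoned and untainted.
func usable(n Node) bool {
	return n.Ready && !n.Unschedulable && !n.NoSchedule
}

// byWorkerLabel mimics the reverted assumption: only nodes labeled
// node-role.kubernetes.io/worker are considered.
func byWorkerLabel(nodes []Node) int {
	count := 0
	for _, n := range nodes {
		if _, ok := n.Labels["node-role.kubernetes.io/worker"]; ok && usable(n) {
			count++
		}
	}
	return count
}

// bySchedulability ignores role labels entirely.
func bySchedulability(nodes []Node) int {
	count := 0
	for _, n := range nodes {
		if usable(n) {
			count++
		}
	}
	return count
}

func main() {
	// Masters are tainted; the only compute nodes use a custom "infra" role.
	nodes := []Node{
		{Labels: map[string]string{"node-role.kubernetes.io/master": ""}, Ready: true, NoSchedule: true},
		{Labels: map[string]string{"node-role.kubernetes.io/infra": ""}, Ready: true},
		{Labels: map[string]string{"node-role.kubernetes.io/infra": ""}, Ready: true},
	}
	fmt.Println(byWorkerLabel(nodes), bySchedulability(nodes)) // prints "0 2"
}
```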

openshift-ci-robot requested review from deads2k and stlaz September 11, 2020 14:07

sttts reviewed Sep 11, 2020

pkg/controllers/worker/worker_available_controller.go (outdated)

sttts reviewed Sep 11, 2020

pkg/controllers/worker/worker_available_controller.go (outdated)

sttts reviewed Sep 11, 2020

pkg/controllers/worker/worker_available_controller.go (outdated)

sttts reviewed Sep 11, 2020

pkg/controllers/worker/worker_available_controller.go (outdated)

sttts reviewed Sep 11, 2020

pkg/controllers/worker/worker_available_controller.go (outdated)

sttts reviewed Sep 11, 2020

pkg/operator/starter.go (outdated)

mfojtik force-pushed the worker-controller branch from 801b1e5 to 2328111 September 14, 2020 07:37

mfojtik changed the title ~~workers: add controller that checks if router can schedule pods~~ ingressnodesavailable: add controller that checks if router can schedule pods Sep 14, 2020

mfojtik force-pushed the worker-controller branch 3 times, most recently from d97cb0b to 7f98165 September 14, 2020 07:42

sttts reviewed Sep 14, 2020


mfojtik force-pushed the worker-controller branch from 7f98165 to 3b1d363 September 14, 2020 08:20

openshift-ci-robot assigned sttts Sep 14, 2020

openshift-ci-robot added the lgtm label Sep 14, 2020

openshift-ci-robot added the approved and do-not-merge/invalid-owners-file labels Sep 14, 2020

workers: add controller that checks if router can schedule pods

88e6eba

mfojtik force-pushed the worker-controller branch from 3b1d363 to 88e6eba September 14, 2020 11:34

openshift-ci-robot removed the lgtm label Sep 14, 2020

mfojtik added the lgtm label Sep 14, 2020

mfojtik changed the title ~~ingressnodesavailable: add controller that checks if router can schedule pods~~ Bug 1878776:ingressnodesavailable: add controller that checks if router can schedule pods Sep 14, 2020

openshift-ci-robot added the bugzilla/severity-high label Sep 14, 2020

openshift-ci-robot added the bugzilla/invalid-bug label Sep 14, 2020

mfojtik changed the title ~~Bug 1878776:ingressnodesavailable: add controller that checks if router can schedule pods~~ Bug 1878776: ingressnodesavailable: add controller that checks if router can schedule pods Sep 14, 2020

openshift-ci-robot added the bugzilla/valid-bug label and removed the bugzilla/invalid-bug label Sep 14, 2020

mfojtik removed the do-not-merge/invalid-owners-file label Sep 14, 2020

openshift-merge-robot merged commit ee0dce6 into openshift:master Sep 14, 2020

wking mentioned this pull request Oct 30, 2020

Bug 1893386: Revert "workers: add controller that checks if router can schedule pods" #368

Closed
