
Expands IngressController status conditions #224

Closed
danehans wants to merge 1 commit from the addl_ing_conditions branch

Conversation

@danehans (Contributor) commented May 1, 2019

  • Refactors the IngressController Available condition to be based on the Available status of IngressController dependent resources (i.e. DNS).
  • Adds a Deployment status condition.
  • Adds unit tests.
  • Updates e2e tests.

@openshift-ci-robot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) on May 1, 2019
@Miciah (Contributor) commented May 2, 2019

/retest

var conditions []operatorv1.OperatorCondition
conditions = computeIngressStatusConditions(conditions, &appsv1.Deployment{}, false)
for _, c := range conditions {
	updated.Status.Conditions = append(updated.Status.Conditions, c)
}
@Miciah (Contributor) commented on this hunk:
Why not directly assign the slice value from computeIngressStatusConditions to updated.Status.Conditions?

		updated.Status.Conditions = computeIngressStatusConditions(conditions, &appsv1.Deployment{}, false)

Alternatively, would it make sense to call syncIngressControllerStatus instead?

In fact, syncIngressControllerStatus could subsume enforceEffectiveIngressDomain almost entirely: If ic.Status.Domain were empty, then enforceEffectiveIngressDomain would call syncIngressControllerStatus with a nil deployment pointer (we know there cannot be a deployment if there is no domain). syncIngressControllerStatus would need to (1) take an ingressConfig; (2) check if deployment were nil, in which case it would not set updated.Status.Selector or updated.Status.AvailableReplicas; and (3) check if ic.Status.Domain were empty, in which case syncIngressControllerStatus would proceed to do the defaulting, uniqueness check, and updating or reporting that enforceEffectiveIngressDomain does now. (We could do similar with enforceEffectiveEndpointPublishingStrategy, but we can save that refactoring for another PR.) What do you think?
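
For illustration only (not code from this PR), a minimal sketch of the consolidated syncIngressControllerStatus described above; isDomainUnique is a hypothetical helper, and the client and status-update plumbing are assumptions:

    func (r *reconciler) syncIngressControllerStatus(ic *operatorv1.IngressController, ingressConfig *configv1.Ingress, deployment *appsv1.Deployment) error {
    	updated := ic.DeepCopy()

    	// (3) If the status domain is empty, do the defaulting and uniqueness
    	// check that enforceEffectiveIngressDomain performs today.
    	uniqueDomain := len(updated.Status.Domain) != 0
    	if !uniqueDomain {
    		domain := ic.Spec.Domain
    		if len(domain) == 0 {
    			domain = ingressConfig.Spec.Domain
    		}
    		unique, err := r.isDomainUnique(domain) // hypothetical helper
    		if err != nil {
    			return err
    		}
    		if unique {
    			updated.Status.Domain = domain
    			uniqueDomain = true
    		}
    	}

    	// (2) Only set deployment-derived fields when a deployment exists; the
    	// condition computation would also need to tolerate a nil deployment.
    	if deployment != nil {
    		selector, err := metav1.LabelSelectorAsSelector(deployment.Spec.Selector)
    		if err != nil {
    			return fmt.Errorf("deployment has an invalid spec.selector: %v", err)
    		}
    		updated.Status.Selector = selector.String()
    		updated.Status.AvailableReplicas = deployment.Status.AvailableReplicas
    	}

    	updated.Status.Conditions = computeIngressStatusConditions(updated.Status.Conditions, deployment, uniqueDomain)

    	if !reflect.DeepEqual(ic.Status, updated.Status) {
    		return r.client.Status().Update(context.TODO(), updated)
    	}
    	return nil
    }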

@danehans (Contributor, author) commented May 2, 2019

> Alternatively, would it make sense to call syncIngressControllerStatus instead?

Thanks for the review. I was thinking about using syncIngressControllerStatus. However, it performs a status update, as does enforceEffectiveIngressDomain, and I thought multiple status-update calls would be suboptimal. With the changes you outline, using `syncIngressControllerStatus` sounds sensible. I'll start working through the changes.

@danehans (Contributor, author) commented:

@Miciah I implemented some of your suggestions above. I keep hitting e2e failures when I move the domain uniqueness logic from enforceEffectiveIngressDomain to syncIngressControllerStatus:

2019-05-03T11:43:31.721-0700	INFO	operator	log/log.go:26	started zapr logger
=== RUN   TestCreateIngressControllerThenSecret
--- FAIL: TestCreateIngressControllerThenSecret (32.82s)
    certificate_publisher_test.go:69: failed to observe reconciliation of ingresscontroller: timed out waiting for the condition
=== RUN   TestCreateSecretThenIngressController
--- FAIL: TestCreateSecretThenIngressController (62.82s)
    certificate_publisher_test.go:160: failed to observe updated global secret: timed out waiting for the condition
=== RUN   TestOperatorAvailable
--- FAIL: TestOperatorAvailable (12.57s)
    operator_test.go:102: did not get expected available condition: timed out waiting for the condition
=== RUN   TestDefaultIngressControllerExists
--- PASS: TestDefaultIngressControllerExists (2.48s)
=== RUN   TestIngressControllerControllerCreateDelete
--- FAIL: TestIngressControllerControllerCreateDelete (62.80s)
    operator_test.go:159: failed to reconcile IngressController openshift-ingress-operator/test: timed out waiting for the condition

(Several inline review comments on pkg/operator/controller/ingress_status.go and ingress_status_test.go were marked outdated and resolved.)
@Miciah (Contributor) commented May 2, 2019

/test e2e-aws-operator

@Miciah (Contributor) commented May 2, 2019

/refresh

@Miciah (Contributor) commented May 2, 2019

/test e2e-aws-operator
I want to see whether BZ#1705100 shows up again.

@Miciah (Contributor) commented May 2, 2019

/refresh

@Miciah (Contributor) commented May 2, 2019

Still not seeing a new run since yesterday.
/test e2e-aws-operator

@ironcladlou (Contributor) commented:

Need to determine whether these are necessary for 4.1

/hold

@openshift-ci-robot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on May 3, 2019
@openshift-ci-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on May 3, 2019
@openshift-ci-robot removed the needs-rebase label on May 22, 2019
@Miciah (Contributor) left a comment

In #224 (comment), I suggested that syncIngressControllerStatus could do the unique-domain check itself. Adding to that, Dan suggested in chat today that syncIngressControllerStatus should also look up the deployment. With those two changes, the only parameter for syncIngressControllerStatus would be the ingress controller, which would make the callers more uniform and consolidate the status computation logic. Do these changes seem reasonable, and if so, could we incorporate them into this PR, or would it be better to put them in a separate PR?
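
For illustration, a rough sketch (not from this PR) of the single-parameter shape being discussed; routerDeploymentName is a hypothetical helper returning the router deployment's namespaced name, and the not-found handling is an assumption:

    func (r *reconciler) syncIngressControllerStatus(ic *operatorv1.IngressController) error {
    	// Look up the router deployment here instead of taking it as a parameter.
    	deployment := &appsv1.Deployment{}
    	err := r.client.Get(context.TODO(), routerDeploymentName(ic), deployment) // routerDeploymentName: hypothetical helper
    	switch {
    	case errors.IsNotFound(err):
    		deployment = nil // no deployment yet; the condition computation must tolerate this
    	case err != nil:
    		return fmt.Errorf("failed to get deployment for ingresscontroller %s: %v", ic.Name, err)
    	}
    	// ...then do the domain defaulting/uniqueness check and condition
    	// computation as sketched earlier, and update status if it changed.
    	return nil
    }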

func computeIngressStatusConditions(oldConditions []operatorv1.OperatorCondition, deployment *appsv1.Deployment,
	uniqueDomain bool) []operatorv1.OperatorCondition {
	oldDegradedCondition := getIngressDegradedCondition(oldConditions)
	oldProgressingCondition := getIngressProgressingCondition(oldConditions)
	oldAvailableCondition := getIngressAvailableCondition(oldConditions)
@Miciah (Contributor) commented on this hunk:

I don't know that having separate getIngressDegradedCondition, getIngressProgressingCondition, and getIngressAvailableCondition functions is important for readability, and it means more code and more looping. What do you think of using a single loop in computeIngressStatusConditions to get all three values, similar to how the DNS operator does it? https://github.com/openshift/cluster-dns-operator/blob/540ab8bca50b50880a4eb44feaddee8352f565bf/pkg/operator/controller/status.go#L202-L212
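
For reference, a minimal sketch of that single-loop approach; the condition type strings here are assumed stand-ins for whatever the PR actually uses:

    // Single pass over the old conditions instead of three getter functions.
    var oldDegradedCondition, oldProgressingCondition, oldAvailableCondition *operatorv1.OperatorCondition
    for i := range oldConditions {
    	switch oldConditions[i].Type {
    	case "Degraded":
    		oldDegradedCondition = &oldConditions[i]
    	case "Progressing":
    		oldProgressingCondition = &oldConditions[i]
    	case "Available":
    		oldAvailableCondition = &oldConditions[i]
    	}
    }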

@danehans (Contributor, author) commented:

@Miciah I was going back and forth on whether to have separate functions for each condition or to loop through the conditions in a single function. I thought the former would be easier to read and understand. I will update the PR to use a single function.

@danehans (Contributor, author) commented:

> Do these changes seem reasonable, and if so, could we incorporate them into this PR, or would it be better to put them in a separate PR?

They do make sense. I'm working on updating the PR.

@danehans force-pushed the addl_ing_conditions branch 3 times, most recently from efbfdb0 to ea636f8 on May 22, 2019 18:04
@Miciah (Contributor) commented May 22, 2019

Doesn't syncIngressControllerStatus need to get the deployment if the status domain was not set and syncIngressControllerStatus was able to set it?

@danehans (Contributor, author) commented:

I don't believe so. If syncIngressControllerStatus() was called with an IngressController that did not have status.domain set, that means all the ensure/enforce methods of the main reconcile loop were bypassed and a Deployment does not exist. However, this approach does cause an IngressController without a status.domain to go through a second reconcile pass. On that pass, the status.domain check passes and the ensure/enforce methods are called. Do you have any suggestions to avoid the second pass? We could go back to the original enforceEffectiveIngressDomain and update status from within that function.

@ironcladlou (Contributor) commented:

Consider condition sets produced by the following:

computeLoadBalancerStatus // returns LoadBalancer* conditions
computeDeploymentStatus // returns Deployment* conditions
computeDNSStatus // returns DNS* conditions

For now, let's say that some subset S of those conditions must be "True" for the ingress controller to be considered available, for example [LoadBalancerReady=True, DeploymentReady=True, DNSReady=True].

Given the union of those conditions, filtered to Status=True and indexed by ingress controller, you can compute the availability of the ingress controller by checking whether every required condition is a member of that set.

New criteria can be later added by introducing new conditions to the set of availability-influencing conditions.

The same methodology could be applied to another meta-condition like Degraded or Progressing.
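
For illustration only, a sketch of that membership check with made-up condition type names and a hypothetical computeAvailableCondition helper (not code from this PR):

    // requiredForAvailable is the subset S of condition types that must be True
    // for the ingress controller to be considered available.
    var requiredForAvailable = []string{"LoadBalancerReady", "DeploymentReady", "DNSReady"}

    func computeAvailableCondition(conditions []operatorv1.OperatorCondition) operatorv1.OperatorCondition {
    	// Index the conditions that currently have Status=True.
    	trueConditions := map[string]bool{}
    	for _, c := range conditions {
    		if c.Status == operatorv1.ConditionTrue {
    			trueConditions[c.Type] = true
    		}
    	}
    	// Available is True only if every required condition is in the set.
    	available := operatorv1.OperatorCondition{Type: "Available", Status: operatorv1.ConditionTrue}
    	for _, t := range requiredForAvailable {
    		if !trueConditions[t] {
    			available.Status = operatorv1.ConditionFalse
    			available.Reason = "ConditionsNotMet"
    			available.Message = fmt.Sprintf("required condition %q is not True", t)
    			break
    		}
    	}
    	return available
    }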

@danehans (Contributor, author) commented:

@Miciah I updated the control flow of the main Reconcile() loop to remove the need to perform a 2nd reconciliation of an IngressController that does not have status.domain set. With the latest change, if an IngressController does not have status.domain set, syncIngressControllerStatus is called. The ensure/enforce methods are called if syncIngressControllerStatus does not return an error. syncIngressControllerStatus is called one last time at the end of the reconciliation.

@ironcladlou (Contributor) commented:

If there's a part of status processing that's really mutations for things like domain and publishing, I think that should be its own reconciler or something.

I think one of the outcomes of those mutations would need to be a condition indicating we're "admitting" the ingress controller according to constraints (e.g. it must have a unique domain).

Finally, the main reconciler should be ignoring (and not receiving events for) ingress controllers which haven't been admitted, giving it some bedrock assumptions to stand on. There has to be a trust boundary, and decoupling lets us further disentangle the admission/reconciling/status trifecta.

@Miciah (Contributor) commented May 23, 2019

> I don't believe so. If syncIngressControllerStatus() was called with an IngressController that did not have status.domain set, that means all the ensure/enforce methods of the main reconcile loop were bypassed and a Deployment does not exist.

Right, that makes sense. Might be worth a comment since the logic is a little hairy.

> I updated the control flow of the main Reconcile() loop to remove the need to perform a 2nd reconciliation of an IngressController that does not have status.domain set. With the latest change, if an IngressController does not have status.domain set, syncIngressControllerStatus is called.

Yeah, this is what I had in mind earlier.

> The ensure/enforce methods are called if syncIngressControllerStatus does not return an error. syncIngressControllerStatus is called one last time at the end of the reconciliation.

Should we call syncIngressControllerStatus even if an earlier ensureFoo method fails? Maybe check IsStatusDomainSet first:

			if !IsStatusDomainSet(ingress) {
				if err := r.syncIngressControllerStatus(ingress, ingressConfig); err != nil {
					// ...
			} else if // ...
			}
			if IsStatusDomainSet(ingress) {
				if err := r.syncIngressControllerStatus(ingress, ingressConfig); err != nil {
					// ...

@Miciah (Contributor) commented May 23, 2019

> If there's a part of status processing that's really mutations for things like domain and publishing, I think that should be its own reconciler or something.

Yeah, separating that logic out would make everything a lot more comprehensible.

> I think one of the outcomes of those mutations would need to be a condition indicating we're "admitting" the ingress controller according to constraints (e.g. it must have a unique domain).
>
> Finally, the main reconciler should be ignoring (and not receiving events for) ingress controllers which haven't been admitted, giving it some bedrock assumptions to stand on. There has to be a trust boundary, and decoupling lets us further disentangle the admission/reconciling/status trifecta.

I don't think this PR makes the situation worse; should we tackle this refactoring into separate controllers in a follow-up?

@Miciah (Contributor) commented May 23, 2019

> The ensure/enforce methods are called if syncIngressControllerStatus does not return an error.

On second look, I believe the above statement is incorrect because of the else if:

			if !IsStatusDomainSet(ingress) {
				if err := r.syncIngressControllerStatus(ingress, ingressConfig); err != nil {
					// ...
			} else if // ...

I amend my earlier suggestion as follows:

			if !IsStatusDomainSet(ingress) {
				if err := r.syncIngressControllerStatus(ingress, ingressConfig); err != nil {
					// ...
			}
			if IsStatusDomainSet(ingress) {
				 // ...
				if err := r.enforceEffectiveEndpointPublishingStrategy(ingress, infraConfig); err != nil {
					// ...
				} else if // ...
				}
				if err := r.syncIngressControllerStatus(ingress, ingressConfig); err != nil {
					// ...

@danehans (Contributor, author) commented:

The build "a234567890123456789012345678901234567890123456789012345678-1" status is "Failed" occurred

/test e2e-aws
/test e2e-aws-upgrade

@danehans (Contributor, author) commented:

fail [github.com/openshift/origin/test/extended/templates/templateinstance_readiness.go:107]: Unexpected error:
    <*errors.errorString | 0xc0018e3aa0>: {
        s: "Failed to import expected imagestreams",
    }

/test e2e-aws

@openshift-ci-robot added the needs-rebase label on Jun 7, 2019
@openshift-ci-robot removed the needs-rebase label on Jun 20, 2019
@openshift-ci-robot commented:
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: danehans
To complete the pull request process, please assign ironcladlou
You can assign the PR to them by writing /assign @ironcladlou in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@danehans force-pushed the addl_ing_conditions branch 2 times, most recently from 9be57c3 to 68be75f on June 21, 2019 18:47
@openshift-ci-robot added the size/XL label (denotes a PR that changes 500-999 lines, ignoring generated files) and removed the size/L label on Jun 21, 2019
@danehans (Contributor, author) commented:

/test e2e-aws-operator

@danehans force-pushed the addl_ing_conditions branch 2 times, most recently from f5fb727 to 206c1be on June 24, 2019 18:07
@openshift-ci-robot added the needs-rebase label on Jun 25, 2019
@openshift-ci-robot added the size/L label and removed the needs-rebase and size/XL labels on Jun 26, 2019
@openshift-ci-robot commented:
@danehans: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot added the needs-rebase label on Jun 28, 2019
@ironcladlou (Contributor) commented:

I think #282 and #283 make this one obsolete.

@ironcladlou (Contributor) commented:

I think this one is obsolete now.

/close

@openshift-ci-robot commented:

@ironcladlou: Closed this PR.

In response to this:

I think this one is obsolete now.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@danehans deleted the addl_ing_conditions branch on July 31, 2020 16:47