
Expands IngressController status conditions #224

Closed
danehans wants to merge 1 commit from the addl_ing_conditions branch

Conversation

@danehans (Contributor) commented May 1, 2019

  • Refactors the IngressController Available condition to be based on the Available status of IngressController dependent resources (i.e. DNS).
  • Adds a Deployment status condition.
  • Adds unit tests.
  • Updates e2e tests.

@openshift-ci-robot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) on May 1, 2019
@Miciah (Contributor) commented May 2, 2019

/retest

var conditions []operatorv1.OperatorCondition
conditions = computeIngressStatusConditions(conditions, &appsv1.Deployment{}, false)
for _, c := range conditions {
	updated.Status.Conditions = append(updated.Status.Conditions, c)
}
@Miciah (Contributor) commented on this hunk:
Why not directly assign the slice value from computeIngressStatusConditions to updated.Status.Conditions?

		updated.Status.Conditions = computeIngressStatusConditions(conditions, &appsv1.Deployment{}, false)

Alternatively, would it make sense to call syncIngressControllerStatus instead?

In fact, syncIngressControllerStatus could subsume enforceEffectiveIngressDomain almost entirely: If ic.Status.Domain were empty, then enforceEffectiveIngressDomain would call syncIngressControllerStatus with a nil deployment pointer (we know there cannot be a deployment if there is no domain). syncIngressControllerStatus would need to (1) take an ingressConfig; (2) check if deployment were nil, in which case it would not set updated.Status.Selector or updated.Status.AvailableReplicas; and (3) check if ic.Status.Domain were empty, in which case syncIngressControllerStatus would proceed to do the defaulting, uniqueness check, and updating or reporting that enforceEffectiveIngressDomain does now. (We could do similar with enforceEffectiveEndpointPublishingStrategy, but we can save that refactoring for another PR.) What do you think?
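
For illustration only (not code from this PR), a minimal sketch of the consolidated syncIngressControllerStatus described above; isDomainUnique is a hypothetical helper, and the client and status-update plumbing are assumptions:

    func (r *reconciler) syncIngressControllerStatus(ic *operatorv1.IngressController, ingressConfig *configv1.Ingress, deployment *appsv1.Deployment) error {
    	updated := ic.DeepCopy()

    	// (3) If the status domain is empty, do the defaulting and uniqueness
    	// check that enforceEffectiveIngressDomain performs today.
    	uniqueDomain := len(updated.Status.Domain) != 0
    	if !uniqueDomain {
    		domain := ic.Spec.Domain
    		if len(domain) == 0 {
    			domain = ingressConfig.Spec.Domain
    		}
    		unique, err := r.isDomainUnique(domain) // hypothetical helper
    		if err != nil {
    			return err
    		}
    		if unique {
    			updated.Status.Domain = domain
    			uniqueDomain = true
    		}
    	}

    	// (2) Only set deployment-derived fields when a deployment exists; the
    	// condition computation would also need to tolerate a nil deployment.
    	if deployment != nil {
    		selector, err := metav1.LabelSelectorAsSelector(deployment.Spec.Selector)
    		if err != nil {
    			return fmt.Errorf("deployment has an invalid spec.selector: %v", err)
    		}
    		updated.Status.Selector = selector.String()
    		updated.Status.AvailableReplicas = deployment.Status.AvailableReplicas
    	}

    	updated.Status.Conditions = computeIngressStatusConditions(updated.Status.Conditions, deployment, uniqueDomain)

    	if !reflect.DeepEqual(ic.Status, updated.Status) {
    		return r.client.Status().Update(context.TODO(), updated)
    	}
    	return nil
    }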

@danehans (Contributor, author) commented May 2, 2019

> Alternatively, would it make sense to call syncIngressControllerStatus instead?

Thanks for the review. I was thinking about using syncIngressControllerStatus. However, it performs a status update, as does enforceEffectiveIngressDomain, and I thought multiple status-update calls would be suboptimal. With the changes you outline, using `syncIngressControllerStatus` sounds sensible. I'll start working through the changes.

@danehans (Contributor, author) commented:

@Miciah I implemented some of your suggestions above. I keep hitting e2e failures when I move the domain uniqueness logic from enforceEffectiveIngressDomain to syncIngressControllerStatus:

2019-05-03T11:43:31.721-0700	INFO	operator	log/log.go:26	started zapr logger
=== RUN   TestCreateIngressControllerThenSecret
--- FAIL: TestCreateIngressControllerThenSecret (32.82s)
    certificate_publisher_test.go:69: failed to observe reconciliation of ingresscontroller: timed out waiting for the condition
=== RUN   TestCreateSecretThenIngressController
--- FAIL: TestCreateSecretThenIngressController (62.82s)
    certificate_publisher_test.go:160: failed to observe updated global secret: timed out waiting for the condition
=== RUN   TestOperatorAvailable
--- FAIL: TestOperatorAvailable (12.57s)
    operator_test.go:102: did not get expected available condition: timed out waiting for the condition
=== RUN   TestDefaultIngressControllerExists
--- PASS: TestDefaultIngressControllerExists (2.48s)
=== RUN   TestIngressControllerControllerCreateDelete
--- FAIL: TestIngressControllerControllerCreateDelete (62.80s)
    operator_test.go:159: failed to reconcile IngressController openshift-ingress-operator/test: timed out waiting for the condition

(Several inline review comments on pkg/operator/controller/ingress_status.go and ingress_status_test.go were marked outdated and resolved.)
@Miciah (Contributor) commented May 2, 2019

/test e2e-aws-operator

@Miciah (Contributor) commented May 2, 2019

/refresh

@Miciah (Contributor) commented May 2, 2019

/test e2e-aws-operator
I want to see whether BZ#1705100 shows up again.

@Miciah (Contributor) commented May 2, 2019

/refresh

@Miciah (Contributor) commented May 2, 2019

Still not seeing a new run since yesterday.
/test e2e-aws-operator

@ironcladlou (Contributor) commented:

Need to determine whether these are necessary for 4.1

/hold

@openshift-ci-robot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on May 3, 2019
@openshift-ci-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on May 3, 2019
@openshift-ci-robot removed the needs-rebase label on May 22, 2019
@Miciah (Contributor) left a comment

In #224 (comment), I suggested that syncIngressControllerStatus could do the unique-domain check itself. Adding to that, Dan suggested in chat today that syncIngressControllerStatus should also look up the deployment. With those two changes, the only parameter for syncIngressControllerStatus would be the ingress controller, which would make the callers more uniform and consolidate the status computation logic. Do these changes seem reasonable, and if so, could we incorporate them into this PR, or would it be better to put them in a separate PR?
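
For illustration, a rough sketch (not from this PR) of the single-parameter shape being discussed; routerDeploymentName is a hypothetical helper returning the router deployment's namespaced name, and the not-found handling is an assumption:

    func (r *reconciler) syncIngressControllerStatus(ic *operatorv1.IngressController) error {
    	// Look up the router deployment here instead of taking it as a parameter.
    	deployment := &appsv1.Deployment{}
    	err := r.client.Get(context.TODO(), routerDeploymentName(ic), deployment) // routerDeploymentName: hypothetical helper
    	switch {
    	case errors.IsNotFound(err):
    		deployment = nil // no deployment yet; the condition computation must tolerate this
    	case err != nil:
    		return fmt.Errorf("failed to get deployment for ingresscontroller %s: %v", ic.Name, err)
    	}
    	// ...then do the domain defaulting/uniqueness check and condition
    	// computation as sketched earlier, and update status if it changed.
    	return nil
    }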

func computeIngressStatusConditions(oldConditions []operatorv1.OperatorCondition, deployment *appsv1.Deployment,
	uniqueDomain bool) []operatorv1.OperatorCondition {
	oldDegradedCondition := getIngressDegradedCondition(oldConditions)
	oldProgressingCondition := getIngressProgressingCondition(oldConditions)
	oldAvailableCondition := getIngressAvailableCondition(oldConditions)
@Miciah (Contributor) commented on this hunk:

I don't know that having separate getIngressDegradedCondition, getIngressProgressingCondition, and getIngressAvailableCondition functions is important for readability, and it means more code and more looping. What do you think of using a single loop in computeIngressStatusConditions to get all three values, similar to how the DNS operator does it? https://github.com/openshift/cluster-dns-operator/blob/540ab8bca50b50880a4eb44feaddee8352f565bf/pkg/operator/controller/status.go#L202-L212
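
For reference, a minimal sketch of that single-loop approach; the condition type strings here are assumed stand-ins for whatever the PR actually uses:

    // Single pass over the old conditions instead of three getter functions.
    var oldDegradedCondition, oldProgressingCondition, oldAvailableCondition *operatorv1.OperatorCondition
    for i := range oldConditions {
    	switch oldConditions[i].Type {
    	case "Degraded":
    		oldDegradedCondition = &oldConditions[i]
    	case "Progressing":
    		oldProgressingCondition = &oldConditions[i]
    	case "Available":
    		oldAvailableCondition = &oldConditions[i]
    	}
    }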

@danehans (Contributor, author) commented:

@Miciah I was going back and forth on whether to have separate functions for each condition or to loop through the conditions in a single function. I thought the former would be easier to read and understand. I will update the PR to use a single function.

@danehans (Contributor, author) commented:

> Do these changes seem reasonable, and if so, could we incorporate them into this PR, or would it be better to put them in a separate PR?

They do make sense. I'm working on updating the PR.

@danehans force-pushed the addl_ing_conditions branch 3 times, most recently from efbfdb0 to ea636f8 on May 22, 2019 18:04
@Miciah (Contributor) commented May 22, 2019

Doesn't syncIngressControllerStatus need to get the deployment if the status domain was not set and syncIngressControllerStatus was able to set it?

@danehans (Contributor, author) commented:

I don't believe so. If syncIngressControllerStatus() was called with an IngressController that did not have status.domain set, that means all the ensure/enforce methods of the main reconcile loop were bypassed and a Deployment does not exist. However, this approach does cause an IngressController without a status.domain to go through a second reconcile pass. On that pass, the status.domain check passes and the ensure/enforce methods are called. Do you have any suggestions to avoid the second pass? We could go back to the original enforceEffectiveIngressDomain and update status from within that function.

@ironcladlou (Contributor) commented:

Consider condition sets produced by the following:

computeLoadBalancerStatus // returns LoadBalancer* conditions
computeDeploymentStatus // returns Deployment* conditions
computeDNSStatus // returns DNS* conditions

For now, let's say that some subset S of those conditions must be "True" for the ingress controller to be considered available, for example [LoadBalancerReady=True, DeploymentReady=True, DNSReady=True].

Given the union of those conditions, filtered to Status=True and indexed by ingress controller, you can compute the availability of the ingress controller by checking whether every required condition is a member of that set.

New criteria can be later added by introducing new conditions to the set of availability-influencing conditions.

The same methodology could be applied to another meta-condition like Degraded or Progressing.
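
For illustration only, a sketch of that membership check with made-up condition type names and a hypothetical computeAvailableCondition helper (not code from this PR):

    // requiredForAvailable is the subset S of condition types that must be True
    // for the ingress controller to be considered available.
    var requiredForAvailable = []string{"LoadBalancerReady", "DeploymentReady", "DNSReady"}

    func computeAvailableCondition(conditions []operatorv1.OperatorCondition) operatorv1.OperatorCondition {
    	// Index the conditions that currently have Status=True.
    	trueConditions := map[string]bool{}
    	for _, c := range conditions {
    		if c.Status == operatorv1.ConditionTrue {
    			trueConditions[c.Type] = true
    		}
    	}
    	// Available is True only if every required condition is in the set.
    	available := operatorv1.OperatorCondition{Type: "Available", Status: operatorv1.ConditionTrue}
    	for _, t := range requiredForAvailable {
    		if !trueConditions[t] {
    			available.Status = operatorv1.ConditionFalse
    			available.Reason = "ConditionsNotMet"
    			available.Message = fmt.Sprintf("required condition %q is not True", t)
    			break
    		}
    	}
    	return available
    }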

@danehans (Contributor, author) commented:

@Miciah I updated the control flow of the main Reconcile() loop to remove the need to perform a 2nd reconciliation of an IngressController that does not have status.domain set. With the latest change, if an IngressController does not have status.domain set, syncIngressControllerStatus is called. The ensure/enforce methods are called if syncIngressControllerStatus does not return an error. syncIngressControllerStatus is called one last time at the end of the reconciliation.

@ironcladlou (Contributor) commented:

If there's a part of status processing that's really mutations for things like domain and publishing, I think that should be its own reconciler or something.

I think one of the outcomes of those mutations would need to be a condition indicating we're "admitting" the ingress controller according to constraints (e.g. it must have a unique domain).

Finally, the main reconciler should be ignoring (and not receiving events for) ingress controllers which haven't been admitted, giving it some bedrock assumptions to stand on. There has to be a trust boundary, and decoupling lets us further disentangle the admission/reconciling/status trifecta.

@Miciah (Contributor) commented May 23, 2019

> I don't believe so. If syncIngressControllerStatus() was called with an IngressController that did not have status.domain set, that means all the ensure/enforce methods of the main reconcile loop were bypassed and a Deployment does not exist.

Right, that makes sense. Might be worth a comment since the logic is a little hairy.

> I updated the control flow of the main Reconcile() loop to remove the need to perform a 2nd reconciliation of an IngressController that does not have status.domain set. With the latest change, if an IngressController does not have status.domain set, syncIngressControllerStatus is called.

Yeah, this is what I had in mind earlier.

> The ensure/enforce methods are called if syncIngressControllerStatus does not return an error. syncIngressControllerStatus is called one last time at the end of the reconciliation.

Should we call syncIngressControllerStatus even if an earlier ensureFoo method fails? Maybe check IsStatusDomainSet first:

			if !IsStatusDomainSet(ingress) {
				if err := r.syncIngressControllerStatus(ingress, ingressConfig); err != nil {
					// ...
			} else if // ...
			}
			if IsStatusDomainSet(ingress) {
				if err := r.syncIngressControllerStatus(ingress, ingressConfig); err != nil {
					// ...

@Miciah (Contributor) commented May 23, 2019

> If there's a part of status processing that's really mutations for things like domain and publishing, I think that should be its own reconciler or something.

Yeah, separating that logic out would make everything a lot more comprehensible.

> I think one of the outcomes of those mutations would need to be a condition indicating we're "admitting" the ingress controller according to constraints (e.g. it must have a unique domain).
>
> Finally, the main reconciler should be ignoring (and not receiving events for) ingress controllers which haven't been admitted, giving it some bedrock assumptions to stand on. There has to be a trust boundary, and decoupling lets us further disentangle the admission/reconciling/status trifecta.

I don't think this PR makes the situation worse; should we tackle this refactoring into separate controllers in a follow-up?

@Miciah (Contributor) commented May 23, 2019

> The ensure/enforce methods are called if syncIngressControllerStatus does not return an error.

On second look, I believe the above statement is incorrect because of the else if:

			if !IsStatusDomainSet(ingress) {
				if err := r.syncIngressControllerStatus(ingress, ingressConfig); err != nil {
					// ...
			} else if // ...

I amend my earlier suggestion as follows:

			if !IsStatusDomainSet(ingress) {
				if err := r.syncIngressControllerStatus(ingress, ingressConfig); err != nil {
					// ...
			}
			if IsStatusDomainSet(ingress) {
				 // ...
				if err := r.enforceEffectiveEndpointPublishingStrategy(ingress, infraConfig); err != nil {
					// ...
				} else if // ...
				}
				if err := r.syncIngressControllerStatus(ingress, ingressConfig); err != nil {
					// ...

@danehans (Contributor, author) commented:

The build "a234567890123456789012345678901234567890123456789012345678-1" status is "Failed" occurred

/test e2e-aws
/test e2e-aws-upgrade

@danehans (Contributor, author) commented:

fail [github.com/openshift/origin/test/extended/templates/templateinstance_readiness.go:107]: Unexpected error:
    <*errors.errorString | 0xc0018e3aa0>: {
        s: "Failed to import expected imagestreams",
    }

/test e2e-aws

@openshift-ci-robot added the needs-rebase label on Jun 7, 2019
@openshift-ci-robot removed the needs-rebase label on Jun 20, 2019
@openshift-ci-robot commented:
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: danehans
To complete the pull request process, please assign ironcladlou
You can assign the PR to them by writing /assign @ironcladlou in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@danehans force-pushed the addl_ing_conditions branch 2 times, most recently from 9be57c3 to 68be75f on June 21, 2019 18:47
@openshift-ci-robot added the size/XL label (denotes a PR that changes 500-999 lines, ignoring generated files) and removed the size/L label on Jun 21, 2019
@danehans (Contributor, author) commented:

/test e2e-aws-operator

@danehans force-pushed the addl_ing_conditions branch 2 times, most recently from f5fb727 to 206c1be on June 24, 2019 18:07
@openshift-ci-robot added the needs-rebase label on Jun 25, 2019
@openshift-ci-robot added the size/L label and removed the needs-rebase and size/XL labels on Jun 26, 2019
@openshift-ci-robot commented:
@danehans: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot added the needs-rebase label on Jun 28, 2019
@ironcladlou (Contributor) commented:

I think #282 and #283 make this one obsolete.

@ironcladlou (Contributor) commented:

I think this one is obsolete now.

/close

@openshift-ci-robot commented:

@ironcladlou: Closed this PR.

In response to this:

I think this one is obsolete now.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@danehans deleted the addl_ing_conditions branch on July 31, 2020 16:47