Bug 1791022: Ensure removal of 4.4-only resources on downgrade to 4.3 #93

Merged

Conversation

marun
Contributor

marun commented Jan 14, 2020

This PR supports a 4.4 PR (#89) that unifies all service CA controllers into a single deployment to simplify maintenance and administration.

@openshift-ci-robot
Contributor

@marun: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

Ensure removal of 4.4-only resources on downgrade to 4.3

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 14, 2020
@marun marun changed the title Ensure removal of 4.4-only resources on downgrade to 4.3 Bug 1791022: Ensure removal of 4.4-only resources on downgrade to 4.3 Jan 14, 2020
@openshift-ci-robot openshift-ci-robot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Jan 14, 2020
@openshift-ci-robot
Contributor

@marun: This pull request references Bugzilla bug 1791022, which is invalid:

  • expected the bug to target the "4.3.0" release, but it targets "---" instead
  • expected Bugzilla bug 1791022 to depend on a bug in one of the following states: MODIFIED, ON_QA, VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but no dependents were found

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1791022: Ensure removal of 4.4-only resources on downgrade to 4.3

@marun
Contributor Author

marun commented Jan 14, 2020

/bugzilla refresh

@openshift-ci-robot
Contributor

@marun: This pull request references Bugzilla bug 1791022, which is invalid:

  • expected the bug to target the "4.3.0" release, but it targets "---" instead
  • expected Bugzilla bug 1791022 to depend on a bug in one of the following states: MODIFIED, ON_QA, VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but no dependents were found

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

@marun
Contributor Author

marun commented Jan 15, 2020

/bugzilla refresh

@openshift-ci-robot
Contributor

@marun: This pull request references Bugzilla bug 1791022, which is invalid:

  • expected the bug to target the "4.3.0" release, but it targets "4.3.z" instead
  • expected dependent Bugzilla bug 1791188 to be in one of the following states: MODIFIED, ON_QA, VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is POST instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

@marun
Contributor Author

marun commented Jan 15, 2020

/hold pending #89

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 15, 2020
@marun
Contributor Author

marun commented Jan 21, 2020

/bugzilla refresh

@openshift-ci-robot
Contributor

@marun: This pull request references Bugzilla bug 1791022, which is invalid:

  • expected the bug to target the "4.3.0" release, but it targets "4.3.z" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

@marun
Contributor Author

marun commented Jan 27, 2020

/hold cancel
/bugzilla refresh

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 27, 2020
@openshift-ci-robot
Contributor

@marun: This pull request references Bugzilla bug 1791022, which is invalid:

  • expected dependent Bugzilla bug 1791188 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ON_QA instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/hold cancel
/bugzilla refresh

@marun
Contributor Author

marun commented Feb 17, 2020

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Feb 17, 2020
@openshift-ci-robot
Contributor

@marun: This pull request references Bugzilla bug 1791022, which is valid.

In response to this:

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot removed the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Feb 17, 2020
@marun
Contributor Author

marun commented Feb 17, 2020

/test all

@marun
Contributor Author

marun commented Feb 17, 2020

/retest

@marun
Contributor Author

marun commented Feb 19, 2020

@sttts @stlaz This PR would ideally merge in time for the next 4.3.z release.

if err != nil {
	klog.Warningf("Failed to retrieve 4.4 deployment: %v", err)
}
err = deployClient.Delete(deployName, &metav1.DeleteOptions{})
Contributor

Why not delete directly? Why a GET first?

Shouldn't we stop the loop as soon as the delete succeeds?
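
As a rough illustration of the "delete directly" suggestion, the sketch below issues the delete without a preceding get and treats a not-found result as success. The `deploymentDeleter` interface, `errNotFound`, `fakeClient`, and the deployment name are hypothetical stand-ins so the example runs with the standard library alone; they are not the operator's actual client-go types.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for a Kubernetes "not found" API error.
var errNotFound = errors.New("not found")

// deploymentDeleter is a hypothetical, minimal view of a deployment client.
type deploymentDeleter interface {
	Delete(name string) error
}

// fakeClient is an in-memory deleter used only for this illustration.
type fakeClient struct {
	deployments map[string]bool
}

func (f *fakeClient) Delete(name string) error {
	if !f.deployments[name] {
		return errNotFound
	}
	delete(f.deployments, name)
	return nil
}

// ensureDeleted issues the delete directly, without a preceding get, and
// treats "not found" as success. It reports whether a deletion happened.
func ensureDeleted(c deploymentDeleter, name string) (bool, error) {
	err := c.Delete(name)
	switch {
	case err == nil:
		return true, nil
	case errors.Is(err, errNotFound):
		return false, nil
	default:
		return false, fmt.Errorf("failed to delete deployment %q: %w", name, err)
	}
}

func main() {
	c := &fakeClient{deployments: map[string]bool{"service-ca": true}}
	// The second pass finds nothing to delete and is a harmless no-op.
	for i := 0; i < 2; i++ {
		deleted, err := ensureDeleted(c, "service-ca")
		fmt.Println(deleted, err)
	}
}
```

Because a missing deployment is not an error here, the same function can be called on every pass of a continual reconciliation without a separate existence check.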

Contributor Author

Done.

Member

I think the original code is still in place. Did you add your changes to your latest commit?

Contributor

The Get is still there, but I don't care too much.

Contributor

But I would prefer to see the loop stop when the deployment is gone.

Contributor Author

I reverted the changes as per @deads2k. IIRC his assertion is that continual reconciliation is required, because inconsistent behavior (removing the deployment on startup but not afterwards) would confuse users.

As for why I use a get: I thought it would be preferable to see 'get' rather than 'delete' in the log when continually reconciling, but maybe that's specious logic?

Contributor

ok, fine with me.

Contributor

I am still confused about why the deployment is handled differently from the other resources in cleanupUnifiedDeployment.

Contributor Author

That's what I asked in #93 (comment). Those other resources don't have an operational impact for the operator, so deleting them isn't so critical, but the principle of consistent behavior suggests continual reconciliation for them too.

I guess I've answered my own question. Maybe reconcile on a longer timeline - 20m?

Contributor

Fair enough. 20m is also fine, so is 1m.

@marun marun force-pushed the cleanup-4.4-resources branch 2 times, most recently from ca636ac to 4a632a4 Compare February 20, 2020 06:34
//
// The 4.4 deployment does have an operational impact, and is continually monitored
// for removal via a goroutine started in RunOperator.
if err := once.Do(func() error { return cleanupUnifiedDeployment(c) }); err != nil {
Contributor

Why is there no loop around these, as there is in the other case?

Contributor Author

@marun marun Feb 20, 2020

The enclosing function is the operator sync loop.

Contributor

The combination with sync.Once confused me. Anyway, seems to be good enough.

@marun
Contributor Author

marun commented Feb 20, 2020

/hold

I need to revert the deployment cleanup to run continuously rather than once, as per @deads2k.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 20, 2020
@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 21, 2020
@marun
Contributor Author

marun commented Feb 21, 2020

/hold cancel

Updated to be in keeping with openshift/cluster-openshift-apiserver-operator#313, PTAL.

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 21, 2020
@@ -14,6 +36,16 @@ func syncControllers(c serviceCAOperator, operatorConfig *operatorv1.ServiceCA)
return err
}

// Remove resources related to the 4.4 controller deployment at most once. These
Contributor Author

@deads2k Is it acceptable to remove the resources related to the deployment at most once rather than continually, since the resources in question have no operational impact without the deployment (which is continually removed by the starter)?

Contributor

I don't care too much. It's good enough to do it once.

@marun
Contributor Author

marun commented Feb 23, 2020

/retest

@marun
Contributor Author

marun commented Feb 25, 2020

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 25, 2020
@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 26, 2020
@marun
Contributor Author

marun commented Feb 26, 2020

/hold cancel

Updated to attempt removal of 4.4 resources every 20 minutes.

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 26, 2020
@sttts
Contributor

sttts commented Feb 26, 2020

/approve
/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 26, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: marun, sttts

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 26, 2020
@shawn-hurley shawn-hurley added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Mar 6, 2020
@shawn-hurley

/retest

@openshift-merge-robot openshift-merge-robot merged commit dd04bc4 into openshift:release-4.3 Mar 6, 2020
@openshift-ci-robot
Contributor

@marun: All pull requests linked via external trackers have merged. Bugzilla bug 1791022 has been moved to the MODIFIED state.

In response to this:

Bug 1791022: Ensure removal of 4.4-only resources on downgrade to 4.3

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

6 participants