Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OperatorGroup expansion/contraction #736

Merged

Conversation

ecordell
Copy link
Member

@ecordell ecordell commented Mar 2, 2019

This takes the commits from #675, and rebases them one at a time to get them working with newer changes to OperatorGroups.

I'd like to double check / write a test for any possible edge cases here. The one that comes to mind is: the same CSV supports AllNamespaces and SingleNamespace, and we accidentally overwrite a "real" CSV with a "copied" CSV. It may be better to go ahead with these and follow up with that test case / fix, though, given that this is blocking.

@ecordell ecordell requested review from jpeeler and njhale March 2, 2019 21:22
@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 2, 2019
@ecordell
Copy link
Member Author

ecordell commented Mar 4, 2019

/retest

1 similar comment
@ecordell
Copy link
Member Author

ecordell commented Mar 4, 2019

/retest

@ecordell
Copy link
Member Author

ecordell commented Mar 4, 2019

/test unit

Copy link
Member

@njhale njhale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found some things that may cause problems. I also found some things that we may want to look at in follow up PRs. Here are my thoughts:

pkg/controller/operators/olm/operator.go Outdated Show resolved Hide resolved
pkg/controller/operators/olm/operator.go Show resolved Hide resolved
pkg/controller/operators/olm/operator.go Outdated Show resolved Hide resolved
pkg/controller/operators/olm/operator.go Outdated Show resolved Hide resolved
pkg/controller/operators/olm/operatorgroup.go Outdated Show resolved Hide resolved
test/e2e/operator_groups_e2e_test.go Outdated Show resolved Hide resolved

t.Log("Waiting to ensure copied CSV shows up in other namespace")
err = wait.Poll(pollInterval, pollDuration, func() (bool, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use awaitCSV(...) which will wait until the CSV exists and then apply the given status checker func.

test/e2e/operator_groups_e2e_test.go Outdated Show resolved Hide resolved

// verify created CSV is cleaned up after operator group is "contracted"
t.Log("Modifying operator group to no longer watch all namespaces")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should just generate a new OperatorGroup for this test so we avoid modifications to the initial setup.

test/e2e/util_test.go Outdated Show resolved Hide resolved
@@ -3578,7 +3578,7 @@ func TestSyncOperatorGroups(t *testing.T) {

operatorGroup, err := op.GetClient().OperatorsV1alpha2().OperatorGroups(tt.initial.operatorGroup.GetNamespace()).Get(tt.initial.operatorGroup.GetName(), metav1.GetOptions{})
require.NoError(t, err)
assert.Equal(t, tt.expectedStatus, operatorGroup.Status)
assert.EqualValues(t, tt.expectedStatus, operatorGroup.Status)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is annoying to fix, but doesn't this mask the difference between "" vs nil?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ecordell
Copy link
Member Author

ecordell commented Mar 4, 2019

/retest

2 similar comments
@ecordell
Copy link
Member Author

ecordell commented Mar 4, 2019

/retest

@ecordell
Copy link
Member Author

ecordell commented Mar 4, 2019

/retest

@ecordell
Copy link
Member Author

ecordell commented Mar 5, 2019

/test unit


errlist := []error{}
for _, csv := range fetchedCSVs {
if csv.IsCopied() && csv.GetAnnotations()[v1alpha2.OperatorGroupAnnotationKey] == operatorGroupName {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for the sake of completeness we should check the olm.operatorNamespace annotation as well.

@njhale
Copy link
Member

njhale commented Mar 5, 2019

/retest

@openshift-ci-robot openshift-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 5, 2019
@ecordell ecordell force-pushed the opgroup-expand-contract branch 6 times, most recently from 883f402 to be33ddf Compare March 5, 2019 23:39
ecordell and others added 19 commits March 10, 2019 15:27
This adds requeuing for both the namespace and operator group sync
loops, which fixes syncing issues after initial CSVs have been copied.
After switching to owner labels for cross-namespace ownerrefs, we didn't
update this requeueing logic, causing some delays for reconciliation of those
objects.
this adds annotations to all csvs in an operator group, and copies them
to target namespaces.
makes use of more of the new namespaceset utilities from the resolver

also doesn't bother precomputing the namespaces to add/prune. instead
it gets all namespaces and adjusts each as needed, visiting each once.
We were treating CSVs that had empty phases as "copyable", but that
happens before the checks to see if operatorgroups conflict.

This was causing a race where CSVs could be copied to target namespaces
incorrectly
are available

this avoids some unnecessary resyncing
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 10, 2019
@@ -1255,3 +1321,51 @@ func (a *Operator) cleanupCSVDeployments(logger *logrus.Entry, csv *v1alpha1.Clu
}
}
}

func (a *Operator) ensureDeploymentAnnotations(logger *logrus.Entry, csv *v1alpha1.ClusterServiceVersion) error {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this while looking into why copying was failing - small optimization that skips the walk through Failed if the annotations change in an operatorgroup.

@@ -2067,6 +2085,51 @@ func TestTransitionCSV(t *testing.T) {
},
},
},
{
name: "SingleCSVSucceededToSucceeded/OperatorGroupChanged",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verifying we don't have to transition to failed to update the operatorgroup

}
}

// requeue csvs in original namespaces or in new target namespaces (to capture removed/added namespaces)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this ended up being the reason contraction wasn't working after the rebase. somewhere in reconciling those commits we lost the code that requeues CSVs from the old set of namespaces, which is important to make sure we pick up CSVs that need to be cleaned up when an operatorgroup is contracted.

Added some abstractions around namespacesets to help make this clearer

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this effectively done around line 110?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 110 doesn't capture the "contraction" case, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you remove the the "not copied" predicate in getAllCSVsThatProvide(...), then I think it would since that returns the set of CSVs across all namespaces that provide the given APIs. As I recall, the input it's given in syncOperatorGroup represents the union of APIs in the providedAPIs annotation and the CSVs in the OperatorGroup's namespace.

Copy link
Member

@njhale njhale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 11, 2019
@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ecordell, njhale

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 25a917e into operator-framework:master Mar 11, 2019
@ecordell ecordell added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants