-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deploy before runlevel 30 so machine API can get creds. #31
Conversation
/test e2e-aws |
/test e2e-aws |
Ok I have an e2e-aws failure here I do not understand, and it appears to only be affecting this PR so it may be something in my change. This PR changes the ordering in the CVO to make sure the credentials operator is up before the machine-api needs it. Previously we were at the default 0000_70, now we're at 0000_30_00 (to get infront of machine api). You can see the new manifest here: https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/openshift_cloud-credential-operator/31/pull-ci-openshift-cloud-credential-operator-master-e2e-aws/160/artifacts/release-latest/release-payload/ The use of adding another 00 to the prefix appears ok as it's done in level 0000_50 as well. The failure surfaces in the install logs:
Note that this is my component.
In the logs it looks like the machine-api is deploying either in parallel, or before the cloud credential operator, despite my attempts to have them loaded first with the ordering. Does everything within a runlevel like 0000_30 apply in parallel? If so should I move to 0000_29? |
I saw you copied the CredentialsRequest from the Machine API Operator here, is the plan to have everything centralised here? |
That was the original plan but no more, we now want component repos to have their control of credentails and we'll audit by looking at what's in release manifest. I'd delete them in this PR but we've got a pending problem we need to solve, there's a component that reads these from source and uses it to validate permissions preflight, need to talk to @joelddiaz before we can delete them here. |
/hold |
Does the machine api operator not wait until the cred shows up? You don’t have to be before something for them to get a chance to load - things are launched in parallel at the same runlevel. |
They're not using a credential at all yet, so conceivably they could. There appears to be a problem at runlevel 30 though per #31 (comment), I haven't seen a result yet since changing it to 29 but at 30 we were getting: E0213 19:31:48.700663 1 sync_worker.go:263] unable to synchronize image (waiting 3m19.747206386s): Could not update service "openshift-cloud-credential-operator/controller-manager-service" (84 of 273): the server has forbidden updates to this resource |
Bearing in mind their cred CR won't apply cleanly until we create the CRD. 29 felt safer but can go either way. |
Cvo already has to handle that. You definitely can’t be after theirs, but we are trying to reduce the number of levels. You need to be after kube apiserver for sure. The error doesn’t look that bad. |
Ok will take it back to 30 and see if the error returns. It seems to have failed differently at 29 for some reason. The 30 failures were consistent and I've only seen them on this PR, not really sure where to go next. Issue filed in CVO repo a few days ago: openshift/cluster-version-operator#119 |
machine-api also runs at 30 but CVO will process all items in same runlevel in parallel, and machine-api will need to know to wait until it's creds exist. Per: https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/operators.md#how-do-i-get-added-as-a-special-run-level
@smarterclayton @crawford @derekwaynecarr I am majorly stuck here, I do not understand why the CVO is shutting us down saying "the server has forbidden updates to this resource". It is consistent, it only affects this PR, regardless of runlevel 29 or 30. I've filed last week here: openshift/cluster-version-operator#119 This PR is blocked as a result which means machine-api using minted creds is blocked. |
/retest trying to get a new release image for local testing... |
The actual reason why you are seeing this error is:
kube-apiserver rejects objects into all non Add this label to the operator namespace |
kube-apiserver rejects objects into all non run-level 0,1 namespaces until openshift-apiserver is ready, we need to be up before openshift-apiserver and thus require this annotation. Fixes the CI error: "the server has forbidden updates to this resource".
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dgoodwin, twiest The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold cancel |
Per: https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/operators.md#how-do-i-get-added-as-a-special-run-level
Also removes the credentials definitions that were moved to component repos, or are unused.
@smarterclayton @enxebre @derekwaynecarr does this look ok? The manifests dir should show what we'll be trying to include in release now.