Deploy before runlevel 30 so machine API can get creds. #31

Merged: 2 commits into openshift:master, Feb 20, 2019

Conversation

@dgoodwin (Contributor)

Per: https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/operators.md#how-do-i-get-added-as-a-special-run-level

Also removes the credentials definitions that were moved to component repos, or are unused.

@smarterclayton @enxebre @derekwaynecarr does this look OK? The manifests dir should show what we'll be trying to include in the release now.

@openshift-ci-robot added the size/M and approved labels (Feb 13, 2019)
@dgoodwin (Contributor, Author)

/test e2e-aws

@dgoodwin (Contributor, Author)

/test e2e-aws

@dgoodwin (Contributor, Author)

OK, I have an e2e-aws failure here that I do not understand, and it appears to only affect this PR, so it may be something in my change.

This PR changes the ordering in the CVO to make sure the credentials operator is up before the machine-api needs it. Previously we were at the default 0000_70; now we're at 0000_30_00 (to get in front of machine-api). You can see the new manifest here: https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/openshift_cloud-credential-operator/31/pull-ci-openshift-cloud-credential-operator-master-e2e-aws/160/artifacts/release-latest/release-payload/

Adding another 00 to the prefix appears to be OK, as it's also done at level 0000_50.
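
For anyone following along, the ordering hinges on the lexical sort of the manifest file names in the release payload. Roughly the idea is a layout like the following (file names invented here purely for illustration, not the actual names in the payload):

```
0000_30_00_cloud-credential-operator_00_namespace.yaml   # this PR (hypothetical names)
0000_30_00_cloud-credential-operator_01_crd.yaml
0000_30_machine-api-operator_00_namespace.yaml           # existing machine-api manifests
0000_50_...                                              # other operators
0000_70_...                                              # where we previously lived by default
```

Whether the CVO actually serializes entries that share the 0000_30 runlevel, or runs them in parallel, is exactly the question raised below.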

The failure surfaces in the install logs:

time="2019-02-13T19:01:47Z" level=debug
time="2019-02-13T19:01:47Z" level=debug msg="Destroy complete! Resources: 11 destroyed."
time="2019-02-13T19:01:47Z" level=info msg="Waiting up to 30m0s for the cluster to initialize..."
time="2019-02-13T19:01:47Z" level=debug msg="Still waiting for the cluster to initialize..."
time="2019-02-13T19:01:58Z" level=debug msg="Still waiting for the cluster to initialize: Could not update service \"openshift-cloud-credential-operator/controller-manager-service\" (84 of 273): the server has forbidden updates to this resource"
time="2019-02-13T19:31:47Z" level=fatal msg="failed to initialize the cluster: Could not update service \"openshift-cloud-credential-operator/controller-manager-service\" (84 of 273): the server has forbidden updates to this resource"

Note that this is my component.

I0213 19:30:54.611858       1 sync_worker.go:457] Running sync for clusteroperator "kube-controller-manager" (79 of 273)
I0213 19:30:55.615205       1 sync_worker.go:470] Done syncing for clusteroperator "kube-controller-manager" (79 of 273)
I0213 19:30:55.615269       1 task_graph.go:497] Running 7 on 1
I0213 19:30:55.615287       1 sync_worker.go:457] Running sync for namespace "openshift-machine-config-operator" (104 of 273)
I0213 19:30:55.615410       1 task_graph.go:497] Running 5 on 0
I0213 19:30:55.615435       1 sync_worker.go:457] Running sync for customresourcedefinition "credentialsrequests.cloudcredential.openshift.io" (80 of 273)
I0213 19:30:55.615605       1 task_graph.go:497] Running 6 on 2
I0213 19:30:55.615626       1 sync_worker.go:457] Running sync for namespace "openshift-machine-api" (87 of 273)
I0213 19:30:55.618005       1 sync_worker.go:470] Done syncing for namespace "openshift-machine-config-operator" (104 of 273)
I0213 19:30:55.618034       1 sync_worker.go:457] Running sync for customresourcedefinition "mcoconfigs.machineconfiguration.openshift.io" (105 of 273)
I0213 19:30:55.618838       1 sync_worker.go:470] Done syncing for namespace "openshift-machine-api" (87 of 273)
I0213 19:30:55.618865       1 sync_worker.go:457] Running sync for namespace "openshift-cluster-api" (88 of 273)
I0213 19:30:55.619653       1 sync_worker.go:470] Done syncing for customresourcedefinition "credentialsrequests.cloudcredential.openshift.io" (80 of 273)
I0213 19:30:55.619684       1 sync_worker.go:457] Running sync for namespace "openshift-cloud-credential-operator" (81 of 273)
I0213 19:30:55.621208       1 sync_worker.go:470] Done syncing for customresourcedefinition "mcoconfigs.machineconfiguration.openshift.io" (105 of 273)
I0213 19:30:55.621242       1 sync_worker.go:457] Running sync for configmap "openshift-machine-config-operator/machine-config-operator-images" (106 of 273)
I0213 19:30:55.622354       1 sync_worker.go:470] Done syncing for namespace "openshift-cluster-api" (88 of 273)
I0213 19:30:55.622384       1 sync_worker.go:457] Running sync for configmap "openshift-machine-api/machine-api-operator-images" (89 of 273)
I0213 19:30:55.622699       1 sync_worker.go:470] Done syncing for namespace "openshift-cloud-credential-operator" (81 of 273)
I0213 19:30:55.622723       1 sync_worker.go:457] Running sync for clusterrole "cloud-credential-operator-role" (82 of 273)
I0213 19:30:55.625154       1 sync_worker.go:470] Done syncing for configmap "openshift-machine-config-operator/machine-config-operator-images" (106 of 273)
I0213 19:30:55.625231       1 sync_worker.go:457] Running sync for clusterrolebinding "default-account-openshift-machine-config-operator" (107 of 273)
I0213 19:30:55.626722       1 sync_worker.go:470] Done syncing for configmap "openshift-machine-api/machine-api-operator-images" (89 of 273)
I0213 19:30:55.626755       1 sync_worker.go:457] Running sync for customresourcedefinition "machines.cluster.k8s.io" (90 of 273)
I0213 19:30:55.627346       1 sync_worker.go:470] Done syncing for clusterrole "cloud-credential-operator-role" (82 of 273)
I0213 19:30:55.627406       1 sync_worker.go:457] Running sync for clusterrolebinding "cloud-credential-operator-rolebinding" (83 of 273)
I0213 19:30:55.628106       1 sync_worker.go:470] Done syncing for clusterrolebinding "default-account-openshift-machine-config-operator" (107 of 273)
I0213 19:30:55.628180       1 sync_worker.go:457] Running sync for deployment "openshift-machine-config-operator/machine-config-operator" (108 of 273)
I0213 19:30:55.630426       1 sync_worker.go:470] Done syncing for clusterrolebinding "cloud-credential-operator-rolebinding" (83 of 273)
I0213 19:30:55.630455       1 sync_worker.go:457] Running sync for service "openshift-cloud-credential-operator/controller-manager-service" (84 of 273)
I0213 19:30:55.631196       1 sync_worker.go:470] Done syncing for customresourcedefinition "machines.cluster.k8s.io" (90 of 273)
I0213 19:30:55.631226       1 sync_worker.go:457] Running sync for customresourcedefinition "machinesets.cluster.k8s.io" (91 of 273)
I0213 19:30:55.636029       1 sync_worker.go:470] Done syncing for customresourcedefinition "machinesets.cluster.k8s.io" (91 of 273)
I0213 19:30:55.636059       1 sync_worker.go:457] Running sync for customresourcedefinition "machinedeployments.cluster.k8s.io" (92 of 273)
I0213 19:30:55.636332       1 sync_worker.go:470] Done syncing for deployment "openshift-machine-config-operator/machine-config-operator" (108 of 273)
I0213 19:30:55.636360       1 sync_worker.go:457] Running sync for configmap "openshift-machine-config-operator/machine-config-osimageurl" (109 of 273)
I0213 19:30:55.639809       1 sync_worker.go:470] Done syncing for customresourcedefinition "machinedeployments.cluster.k8s.io" (92 of 273)
I0213 19:30:55.639843       1 sync_worker.go:457] Running sync for customresourcedefinition "clusters.cluster.k8s.io" (93 of 273)
I0213 19:30:55.640162       1 sync_worker.go:470] Done syncing for configmap "openshift-machine-config-operator/machine-config-osimageurl" (109 of 273)
I0213 19:30:55.640195       1 sync_worker.go:457] Running sync for clusteroperator "machine-config" (110 of 273)
I0213 19:30:55.643586       1 sync_worker.go:470] Done syncing for customresourcedefinition "clusters.cluster.k8s.io" (93 of 273)
I0213 19:30:55.643613       1 sync_worker.go:457] Running sync for customresourcedefinition "machineclasses.cluster.k8s.io" (94 of 273)
I0213 19:30:55.645977       1 sync_worker.go:470] Done syncing for customresourcedefinition "machineclasses.cluster.k8s.io" (94 of 273)
I0213 19:30:55.646003       1 sync_worker.go:457] Running sync for customresourcedefinition "machinehealthchecks.healthchecking.openshift.io" (95 of 273)
I0213 19:30:55.648341       1 sync_worker.go:470] Done syncing for customresourcedefinition "machinehealthchecks.healthchecking.openshift.io" (95 of 273)
I0213 19:30:55.648369       1 sync_worker.go:457] Running sync for customresourcedefinition "machines.machine.openshift.io" (96 of 273)
I0213 19:30:55.651627       1 sync_worker.go:470] Done syncing for customresourcedefinition "machines.machine.openshift.io" (96 of 273)
I0213 19:30:55.651658       1 sync_worker.go:457] Running sync for customresourcedefinition "machinesets.machine.openshift.io" (97 of 273)
I0213 19:30:55.654611       1 sync_worker.go:470] Done syncing for customresourcedefinition "machinesets.machine.openshift.io" (97 of 273)
I0213 19:30:55.654637       1 sync_worker.go:457] Running sync for customresourcedefinition "machinedeployments.machine.openshift.io" (98 of 273)
I0213 19:30:55.657589       1 sync_worker.go:470] Done syncing for customresourcedefinition "machinedeployments.machine.openshift.io" (98 of 273)
I0213 19:30:55.657615       1 sync_worker.go:457] Running sync for customresourcedefinition "clusters.machine.openshift.io" (99 of 273)
I0213 19:30:55.660522       1 sync_worker.go:470] Done syncing for customresourcedefinition "clusters.machine.openshift.io" (99 of 273)
I0213 19:30:55.660548       1 sync_worker.go:457] Running sync for customresourcedefinition "machineclasses.machine.openshift.io" (100 of 273)
I0213 19:30:55.663049       1 sync_worker.go:470] Done syncing for customresourcedefinition "machineclasses.machine.openshift.io" (100 of 273)
I0213 19:30:55.663074       1 sync_worker.go:457] Running sync for clusterrolebinding "default-account-openshift-machine-api" (101 of 273)
I0213 19:30:55.665400       1 sync_worker.go:470] Done syncing for clusterrolebinding "default-account-openshift-machine-api" (101 of 273)
I0213 19:30:55.665425       1 sync_worker.go:457] Running sync for deployment "openshift-machine-api/machine-api-operator" (102 of 273)
I0213 19:30:55.675423       1 sync_worker.go:470] Done syncing for deployment "openshift-machine-api/machine-api-operator" (102 of 273)
I0213 19:30:55.675450       1 sync_worker.go:457] Running sync for clusteroperator "machine-api" (103 of 273)
I0213 19:30:58.678828       1 sync_worker.go:470] Done syncing for clusteroperator "machine-api" (103 of 273)
I0213 19:30:59.643622       1 sync_worker.go:470] Done syncing for clusteroperator "machine-config" (110 of 273)
E0213 19:31:05.652641       1 task.go:57] error running apply for service "openshift-cloud-credential-operator/controller-manager-service" (84 of 273): services "controller-manager-service" is forbidden: caches not synchronized
I0213 19:31:13.869279       1 leaderelection.go:209] successfully renewed lease openshift-cluster-version/version
I0213 19:31:15.933467       1 reflector.go:286] github.com/openshift/cluster-version-operator/vendor/github.com/openshift/client-go/config/informers/externalversions/factory.go:101: forcing resync
E0213 19:31:25.673291       1 task.go:57] error running apply for service "openshift-cloud-credential-operator/controller-manager-service" (84 of 273): services "controller-manager-service" is forbidden: caches not synchronized
I0213 19:31:43.878161       1 leaderelection.go:209] successfully renewed lease openshift-cluster-version/version
E0213 19:31:48.700363       1 task.go:57] error running apply for service "openshift-cloud-credential-operator/controller-manager-service" (84 of 273): services "controller-manager-service" is forbidden: caches not synchronized
I0213 19:31:48.700527       1 task_graph.go:438] No more reachable nodes in graph, continue
I0213 19:31:48.700545       1 task_graph.go:474] No more work
I0213 19:31:48.700563       1 task_graph.go:494] No more work for 3
I0213 19:31:48.700573       1 task_graph.go:494] No more work for 6
I0213 19:31:48.700578       1 task_graph.go:494] No more work for 7
I0213 19:31:48.700581       1 task_graph.go:494] No more work for 5
I0213 19:31:48.700589       1 task_graph.go:494] No more work for 4
I0213 19:31:48.700593       1 task_graph.go:494] No more work for 0
I0213 19:31:48.700598       1 task_graph.go:494] No more work for 1
I0213 19:31:48.700604       1 task_graph.go:494] No more work for 2
I0213 19:31:48.700614       1 task_graph.go:510] Workers finished
I0213 19:31:48.700625       1 task_graph.go:518] Result of work: [Could not update service "openshift-cloud-credential-operator/controller-manager-service" (84 of 273): the server has forbidden updates to this resource]
E0213 19:31:48.700663       1 sync_worker.go:263] unable to synchronize image (waiting 3m19.747206386s): Could not update service "openshift-cloud-credential-operator/controller-manager-service" (84 of 273): the server has forbidden updates to this resource
I0213 19:31:48.700716       1 cvo.go:298] Started syncing cluster version "openshift-cluster-version/version" (2019-02-13 19:31:48.700709155 +0000 UTC m=+2506.887267577)
I0213 19:31:48.700764       1 cvo.go:326] Desired version from operator is v1.Update{Version:"0.0.1-2019-02-13-183754", Image:"registry.svc.ci.openshift.org/ci-op-girsxxlp/release@sha256:f6f118048108d1d34c31dbd85f917c3165050b85f2e86c36bc3e34b05a65a0e1"}
I0213 19:31:48.700867       1 cvo.go:300] Finished syncing cluster version "openshift-cluster-version/version" (154.552µs)
I0213 19:32:13.892888       1 leaderelection.go:209] successfully renewed lease openshift-cluster-version/version
I0213 19:32:18.640558       1 reflector.go:286] github.com/openshift/cluster-version-operator/vendor/github.com/openshift/client-go/config/informers/externalversions/factory.go:101: forcing resync
I0213 19:32:18.640686       1 cvo.go:349] Started syncing available updates "openshift-cluster-version/version" (2019-02-13 19:32:18.640678331 +0000 UTC m=+2536.827236755)
I0213 19:32:18.640707       1 cvo.go:298] Started syncing cluster version "openshift-cluster-version/version" (2019-02-13 19:32:18.640698939 +0000 UTC m=+2536.827257395)
I0213 19:32:18.640755       1 cvo.go:326] Desired version from operator is v1.Update{Version:"0.0.1-2019-02-13-183754", Image:"registry.svc.ci.openshift.org/ci-op-girsxxlp/release@sha256:f6f118048108d1d34c31dbd85f917c3165050b85f2e86c36bc3e34b05a65a0e1"}
I0213 19:32:18.640902       1 cvo.go:300] Finished syncing cluster version "openshift-cluster-version/version" (197.489µs)
I0213 19:32:18.694484       1 availableupdates.go:132] Upstream server https://api.openshift.com/api/upgrades_info/v1/graph could not return available updates: unknown version 0.0.1-2019-02-13-183754
I0213 19:32:18.694520       1 cvo.go:351] Finished syncing available updates "openshift-cluster-version/version" (53.838074ms)
I0213 19:32:18.694537       1 cvo.go:298] Started syncing cluster version "openshift-cluster-version/version" (2019-02-13 19:32:18.69453419 +0000 UTC m=+2536.881092555)
I0213 19:32:18.694571       1 cvo.go:326] Desired version from operator is v1.Update{Version:"0.0.1-2019-02-13-183754", Image:"registry.svc.ci.openshift.org/ci-op-girsxxlp/release@sha256:f6f118048108d1d34c31dbd85f917c3165050b85f2e86c36bc3e34b05a65a0e1"}
I0213 19:32:18.694644       1 cvo.go:300] Finished syncing cluster version "openshift-cluster-version/version" (106.248µs)

In the logs it looks like the machine-api is deploying either in parallel with, or before, the cloud credential operator, despite my attempt to have our manifests loaded first via the ordering. Does everything within a runlevel like 0000_30 apply in parallel? If so, should I move to 0000_29?

@openshift-ci-robot added the size/XS label and removed the size/M label (Feb 15, 2019)
@spangenberg (Contributor)

I saw you copied the CredentialsRequest from the Machine API Operator here; is the plan to have everything centralised here?

@dgoodwin (Contributor, Author)

That was the original plan, but no longer; we now want component repos to have control of their own credentials, and we'll audit by looking at what's in the release manifest.

I'd delete them in this PR, but we've got a pending problem to solve first: there's a component that reads these from source and uses them to validate permissions preflight. I need to talk to @joelddiaz before we can delete them here.
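
For anyone unfamiliar, a CredentialsRequest is roughly the following shape. This is a minimal sketch for illustration only; the name, namespaces, permissions, and exact apiVersion below are assumptions, not the actual manifest from either repo:

```yaml
apiVersion: cloudcredential.openshift.io/v1beta1  # apiVersion assumed for illustration
kind: CredentialsRequest
metadata:
  name: openshift-machine-api                     # hypothetical request name
  namespace: openshift-cloud-credential-operator
spec:
  secretRef:
    name: aws-cloud-credentials                   # hypothetical secret the component would consume
    namespace: openshift-machine-api
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1beta1
    kind: AWSProviderSpec
    statementEntries:
    - effect: Allow
      action:
      - ec2:DescribeInstances                     # illustrative permission only
      resource: "*"
```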

@dgoodwin (Contributor, Author)

/hold

@openshift-ci-robot added the do-not-merge/hold label (Feb 15, 2019)
@smarterclayton (Contributor)

Does the machine-api operator not wait until the cred shows up? You don't have to be before something for it to get a chance to load; things at the same runlevel are launched in parallel.

@dgoodwin (Contributor, Author)

They're not using a credential at all yet, so conceivably they could.

There appears to be a problem at runlevel 30, though, per #31 (comment). I haven't seen a result yet since changing it to 29, but at 30 we were getting:

E0213 19:31:48.700663 1 sync_worker.go:263] unable to synchronize image (waiting 3m19.747206386s): Could not update service "openshift-cloud-credential-operator/controller-manager-service" (84 of 273): the server has forbidden updates to this resource

@dgoodwin (Contributor, Author) commented Feb 15, 2019

Bear in mind that their cred CR won't apply cleanly until we create the CRD. 29 felt safer, but I can go either way.

@smarterclayton (Contributor)

The CVO already has to handle that. You definitely can't be after theirs, but we are trying to reduce the number of levels. You need to be after the kube-apiserver for sure.

The error doesn’t look that bad.

@dgoodwin (Contributor, Author)

OK, I will take it back to 30 and see if the error returns. It seems to have failed differently at 29 for some reason. The failures at 30 were consistent, and I've only seen them on this PR; I'm not really sure where to go next. Issue filed in the CVO repo a few days ago: openshift/cluster-version-operator#119

machine-api also runs at 30, but CVO will process all items in the same
runlevel in parallel, and machine-api will need to know to wait until
its creds exist.

Per: https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/operators.md#how-do-i-get-added-as-a-special-run-level
@openshift-ci-robot added the size/S label and removed the size/XS label (Feb 15, 2019)
@dgoodwin (Contributor, Author)

@smarterclayton @crawford @derekwaynecarr I am majorly stuck here; I do not understand why the CVO is shutting us down with "the server has forbidden updates to this resource". It is consistent and only affects this PR, regardless of runlevel 29 or 30. I filed an issue last week: openshift/cluster-version-operator#119

This PR is blocked as a result, which means machine-api using minted creds is also blocked.

@abhinavdahiya (Contributor)

/retest

trying to get a new release image for local testing...

@abhinavdahiya (Contributor)

The actual reason you are seeing this error is:

error running apply for service "openshift-cloud-credential-operator/controller-manager-service" (84 of 273): services "controller-manager-service" is forbidden: caches not synchronized

The kube-apiserver rejects objects in all non-run-level 0/1 namespaces until the openshift-apiserver is ready (I think for admission purposes). Since the operator is moving from 70 -> 30 (before the openshift-apiserver), resources cannot be created in your namespace.

Add this label to the operator namespace:
https://github.com/openshift/cluster-version-operator/blob/93d4eff7a3f4aad8505702d875bb7169e294d221/install/0000_00_cluster-version-operator_00_namespace.yaml#L7
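
Concretely, that means something along these lines in the operator's namespace manifest. This is a sketch only; the exact label key and value should be copied from the linked cluster-version-operator file rather than trusted from here:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-cloud-credential-operator
  labels:
    # Assumed to mirror the label on the openshift-cluster-version namespace
    # linked above; verify the exact key/value against that file.
    openshift.io/run-level: "1"
```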

kube-apiserver rejects objects into all non run-level 0,1 namespaces
until openshift-apiserver is ready; we need to be up before
openshift-apiserver and thus require this annotation.

Fixes the CI error: "the server has forbidden updates to this
resource".
@twiest commented Feb 19, 2019

/lgtm

@openshift-ci-robot added the lgtm label (Feb 19, 2019)
@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgoodwin, twiest

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dgoodwin (Contributor, Author)

/hold cancel

@openshift-ci-robot removed the do-not-merge/hold label (Feb 20, 2019)
@openshift-merge-robot merged commit 4726195 into openshift:master on Feb 20, 2019