Bug 2065597: Add support for dynamic, user-managed config #78
Skipping CI for Draft Pull Request.
Force-pushed from 9a831f4 to 9b51a25
Seems like a very good start.
I'm less familiar with the controller changes, so I would have to give it a spin to check it's working as expected. You could perhaps mark your PR as ready for review so that we can start by testing it in CI.
}

isMultiAZDeployment, err := isMultiAZDeployment()
if err != nil {
Should it be an error case, or should we fallback and provide a default value?
So this was an error previously (see starter.go). I've just preserved the logic here. There's no reason we couldn't default to false though. WDYT?
blockStorage, _ := cfg.GetSection("BlockStorage")
if blockStorage != nil {
	klog.Infof("[BlockStorage] section found; dropping any legacy settings...")
	// Remove the legacy keys, once we ensure they're not overridden
Looks like we're removing trust-device-path unconditionally. Should we make the operator degraded like we did with CCCMO when this option is set?
This should really have a TODO here. The comment indicates what I want to do but I didn't know what the default value should be. I'll have a look at an existing cluster.
There are still TODOs on this but this is good enough to run CI on (and see what's crashing and burning)
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: stephenfin. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
No one should ever check a compiled binary in. Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Allow us to store the common namespace information as we add controllers in the future. Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Force-pushed from 4c2ba42 to cce8a89
Can you explain how settings from … are handled? We had a similar problem of incompatible cloud-configuration with the vsphere-csi driver and we ended up rewriting the configmap from …
The existing configmap configuration is not really static afaict. If the user changes the configmap inside …
It was not really that these settings were lost, but that they were not used by Cinder CSI. If the customer had set any blockstorage options in the user-managed cloud.conf (in …
Right, this is the purpose of this change too.
That was only valid for in-tree unfortunately. With this change, the user-managed cloud.conf is now the source for both the in-tree config and Cinder CSI's cloud.conf.
assets/controller.yaml (outdated)
@@ -284,12 +284,10 @@ spec:
       path: clouds.yaml
   - name: config-cinderplugin
     configMap:
-      name: openstack-cinder-config
+      name: csi-cinder-config
How about we call this CM cloud-conf and store the file under the cloud.conf key, to align with the CCM's cloud config?
Finally some green \o/ There are still a couple of oddities with this PR:
Force-pushed from c0b4614 to 277232a
@stephenfin: This pull request references Bugzilla bug 2065597, which is invalid:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/bugzilla refresh
@mandre: This pull request references Bugzilla bug 2065597, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
Requesting review from QA contact: …
Did we also find an explanation for why getCloudInfo was called so frequently?
pkg/controllers/config/configsync.go (outdated)
if err != nil {
	return false, fmt.Errorf("couldn't collect info about cloud availability zones: %w", err)
}
klog.V(2).Infof("found cloud info: %+v", ci)
We'll have to remove these debug logs before merging.
I've kept them for now while I try to debug the frequent calls, but I have added a TODO.
pkg/controllers/config/configsync.go (outdated)
klog.V(2).Infof("found volume zones: %+v", ci.VolumeZones)
// We consider a cloud multi-AZ when it either has several different zones
// or the compute and volume zones differ.
return len(ci.ComputeZones) != 1 || len(ci.VolumeZones) != 1 || ci.ComputeZones[0] != ci.VolumeZones[0], nil
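The multi-AZ predicate in this hunk can be isolated as a pure function over the zone lists, which makes it easy to exercise without a cloud. A minimal sketch, with hypothetical names (this is not the operator's actual code):

```go
package main

import "fmt"

// isMultiAZ reports whether a deployment spans multiple availability zones:
// more than one compute zone, more than one volume zone, or a single compute
// zone that differs from the single volume zone.
func isMultiAZ(computeZones, volumeZones []string) bool {
	return len(computeZones) != 1 || len(volumeZones) != 1 ||
		computeZones[0] != volumeZones[0]
}

func main() {
	// A cloud with one shared AZ vs. one with distinct compute AZs.
	fmt.Println(isMultiAZ([]string{"nova"}, []string{"nova"}))
	fmt.Println(isMultiAZ([]string{"az1", "az2"}, []string{"az1"}))
}
```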
We still have an issue where getCloudInfo() always returns empty compute and volume zones. I believe this comes from the implementation of getCloudInfo(), where we return a cached value if it's not nil.
The first invocation triggers an error (perhaps because it's too early?) and we never fetch the cloud info again. The error is different based on the cloud where the CI runs.
On vexxhost-managed, we get:
E0503 18:17:23.602278 1 base_controller.go:272] ConfigSync reconciliation failed: couldn't collect info about cloud availability zones: failed to create a compute client: Post "https://rdo.vexxhost.ca:5000/v3/auth/tokens": dial tcp: i/o timeout
While on vexxhost-mecha (using a self-signed cert) we get:
E0503 17:57:33.893921 1 base_controller.go:272] ConfigSync reconciliation failed: couldn't collect info about cloud availability zones: failed to create a compute client: Error parsing CA Cert from /etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem
I've modified getCloudInfo so that it no longer caches the CloudInfo result internally. Instead, isMultiAZDeployment will cache the result returned by getCloudInfo, but only if there are one or more volume and compute AZs. This makes sense since both Nova and Cinder have a default AZ of nova (which you shouldn't actually use! See warning here). Any request that doesn't return at least this is errant.
We probably want to discuss whether we even want to do this, since a user could configure it themselves and it feels a bit "magical"...
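The cache-on-complete-results behaviour described above can be sketched as follows. This is an illustration of the idea only, not the operator's code: CloudInfo is trimmed down, and fetch is a stub standing in for the real OpenStack queries.

```go
package main

import (
	"errors"
	"fmt"
)

// CloudInfo is a trimmed-down stand-in for the operator's type.
type CloudInfo struct {
	ComputeZones []string
	VolumeZones  []string
}

// cached holds the last complete result; nil until one is seen.
var cached *CloudInfo

// fetch stands in for the real OpenStack calls; it is a variable so the
// sketch can be exercised without a cloud.
var fetch = func() (*CloudInfo, error) {
	return nil, errors.New("no cloud configured")
}

// getCloudInfo returns cloud metadata but caches only complete results: a
// response with no compute or volume zones (e.g. from a too-early query) is
// returned uncached, so the next call fetches again instead of pinning a
// bad answer forever.
func getCloudInfo() (*CloudInfo, error) {
	if cached != nil {
		return cached, nil
	}
	ci, err := fetch()
	if err != nil {
		return nil, err
	}
	if len(ci.ComputeZones) > 0 && len(ci.VolumeZones) > 0 {
		cached = ci
	}
	return ci, nil
}

func main() {
	if _, err := getCloudInfo(); err != nil {
		fmt.Println("error:", err)
	}
}
```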
/retest
pkg/controllers/config/cloudinfo.go (outdated)
@@ -24,14 +26,38 @@ type clients struct {

 var ci *CloudInfo

-// getCloudInfo fetches and caches metadata from openstack
-func getCloudInfo() (*CloudInfo, error) {
+func isMultiAZDeployment() (bool, error) {
+	var err error
+
 	if ci != nil {
Suggested change:
-	if ci != nil {
+	if ci == nil {
Ugh
This is responsible for generating a new config map containing configuration for the Cinder CSI driver. This config map is based on a user-provided config map (found at ''). It is unlikely that this file can be used as-is and we therefore need to conduct a bit of surgery on it. Namely, we must do the following:
- Add the following to the [Global] section:
  - [Global] use-clouds
  - [Global] clouds-file
  - [Global] cloud
- Add the following to the [BlockStorage] section:
  - [BlockStorage] ignore-volume-az
- Drop the following from the [BlockStorage] section:
  - [BlockStorage] trust-device-path
With this done, we can remove our use of static config maps. That will be done in a separate change. Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
We're now creating this dynamically, meaning we no longer need these static assets. Remove them. Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Force-pushed from e12fa96 to fdaaac6
/lgtm
/hold for CI
The failure seems like a flake and unrelated to this change, however let's make sure of it:
@stephenfin: all tests passed! Full PR test history. Your PR dashboard.
/hold cancel
@stephenfin: All pull requests linked via external trackers have merged: Bugzilla bug 2065597 has been moved to the MODIFIED state.
When working on openshift/enhancements#1009, we noticed that we had the same problem with Cinder CSI that we did with the OpenStack cloud provider: namely, that we are using a static cloud.conf for configuration. This means all storage settings from openshift-config/cloud-provider-config are lost when we upgrade.

This PR resolves this issue by replacing the static config map containing the Cinder CSI driver config with a dynamic config map managed by the operator. To populate this, the copy-modify-save pattern we used to fix the cloud provider issue in the Cluster Cloud Controller Manager Operator (CCCMO) is reused.
TODO:
- … openshift-config namespace. This should be preferred if available, to allow us to get away from combining cloud provider and CSI configuration (these are different services nowadays).
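The copy-modify-save pattern mentioned above can be sketched abstractly as: read the user-managed source config, transform it, and write the result to the operator-managed destination only when it differs. In this sketch plain maps stand in for ConfigMaps, and the key and function names are illustrative, not the operator's.

```go
package main

import "fmt"

// syncConfig copies source["cloud.conf"] through transform into dest,
// writing only when the destination is out of date. It reports whether a
// write happened, which a controller could use to decide whether to
// update the ConfigMap via the API.
func syncConfig(source, dest map[string]string, transform func(string) string) bool {
	want := transform(source["cloud.conf"])
	if dest["cloud.conf"] == want {
		return false // already in sync; no write needed
	}
	dest["cloud.conf"] = want
	return true
}

func main() {
	src := map[string]string{"cloud.conf": "[Global]"}
	dst := map[string]string{}
	changed := syncConfig(src, dst, func(s string) string {
		return s + "\nuse-clouds = true"
	})
	fmt.Println(changed, dst["cloud.conf"])
}
```

The "write only on change" check is what keeps a reconcile loop from hammering the API server every sync interval.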