Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework to library-go #44

Merged
merged 28 commits into from Jul 30, 2020
Merged

Conversation

jsafrane
Copy link
Contributor

@jsafrane jsafrane commented Jul 27, 2020

Rework to NewCSIControllerSet provided by library-go. In addition, following controllers are running:

  • ManilaController
    • Checks if the cloud has Manila and starts CSIControllerSets for Manila and NFS.
    • Creates StorageClasses based on Manila shares
  • SecretSync controller that translates cloud credentials to format understood by the driver.

The operator assumes that all connection details that the operator needs when talking to OpenStack is prepared by CSO, namely:

  • CredentialRequest and its result in "manila-cloud-credentials" Secret.
  • (Optional) self-signed CA certificate in "cloud-provider-config" ConfigMap.

@openshift/storage

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 27, 2020
@jsafrane jsafrane force-pushed the library-go branch 3 times, most recently from 0c9b7b1 to fea78fd Compare July 29, 2020 08:45
Makefile Outdated
# $3 - Dockerfile path
# $4 - context directory for image build
# It will generate target "image-$(1)" for building the image and binding it as a prerequisite to target "images".
$(call build-image,csi-driver-manila-operator,$(IMAGE_REGISTRY)/ocp/4.5:csi-driver-manila-operator,./build/Dockerfile.openshift,.)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4.6?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Makefile Outdated
@@ -0,0 +1,54 @@
SHELL :=/bin/bash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not used (I need to remove from AWS EBS too)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

- key: CriticalAddonsOnly
operator: Exists
containers:
# Warning: the operator expects the first container to be the CSI driver
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now we can remove this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

allowPrivilegeEscalation: true
image: ${DRIVER_IMAGE}
args:
- --v=5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

${LOG_LEVEL} in all containers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

securityContext:
privileged: true
capabilities:
add: ["SYS_ADMIN"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just making sure: provisioner and snapshotter need to be privileged?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No! Removed privileged from controller containers

metadata:
name: ebs.csi.aws.com
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you might want to remove these files, or the whole test dir

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed whole test directory


klog.Info("Starting controllers")
go manilaController.Run(ctx, 1)
go secretSyncController.Run(ctx, 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This controller will run regardless of manila; can we move it to the manila.Runnable slice above and have it run only when manila is present?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

"csi.storage.k8s.io/node-publish-secret-namespace": util.OperatorNamespace,
},
}
return sc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

storageclass-base.yaml isn't being used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

return factory.New().WithSync(c.sync).WithSyncDegradedOnError(operatorClient).WithInformers(
operatorClient.Informer(),
secretInformer.Core().V1().Secrets().Informer(),
).ToController("SecretSync", eventRecorder)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no resync interval set with ResyncEvery() so this controller won't resync periodically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added 20 minutes resync

Data: data,
}

return &secret, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note: why does the CSI driver and the cluster-credentials-operator produce distinct secrets? Are both formats "officially" supported/valid in OpenStack? If so, I'd like to see either cloud-credentials-operator or the CSI driver supporting an additional format. But that's not a discussion for this PR...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked to @Fedosin and there are at least three different confing/secret formats in OpenStack, each component using its own. I agree it should be in cloud-credentials-operator

@jsafrane jsafrane force-pushed the library-go branch 3 times, most recently from 3f025c9 to 57addc4 Compare July 29, 2020 15:35
k8s.io/client-go v0.19.0-rc.2
k8s.io/component-base v0.19.0-rc.2
k8s.io/klog/v2 v2.3.0
sigs.k8s.io/yaml v1.2.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity: why are you pinning these deps?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go mod tidy does that.

// under (short?) maintenance / reconfiguration.
// Similarly, StorageClasses are not deleted when a share type disappears
// from Manila.
type Controller struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: in library-go (for debugging reasons) it's recommended to name these structs with specific names, like ManilaCSIDriverController. Same for the files. We're not in library-go, but just it might be good to follow th same convention

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed

err = nil
}
return err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't want to return if it's not found here, we want to fall through to ApplyStorageClass instead, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed + added comment about fallthrough

return err
}

func (c *Controller) setDisabled(msg string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setDisableCondition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed

kubeInformersForNamespaces,
openstackClient,
[]manila.Runnable{
manilaControllerSet,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: manila controller running the manila controller set sounds odd; perhaps rename:

manilaControllerSet -> csiDriverControllerSet
nfsController -> nfsCSIDriverController

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed

// Manila needs to replace two of them: Manila driver and NFS driver image.
// Let the Manila image be replaced by CSIDriverController and NFS in this
// custom asset loading func.
func assetWithNFSDriver(file string) []byte {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice trick


const (
operandName = "manila-csi-driver"
operatorName = "manila-csi-driver-operator"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: the operator namespace is in util, not sure if want to have them in the same place

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's name of the operator, not name of the namespace. It's just coincidence they're the same :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to say: have the operator name and namespace consts in the same place. But that's not really important

operatorClient,
assetWithNFSDriver,
kubeClient,
controllerConfig.EventRecorder,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order of these args is incorrect:

func NewCSIDriverController(
	name string,
	csiDriverName string,
	csiDriverNamespace string,
	instanceName string,
	operatorClient v1helpers.OperatorClient,
	assetFunc func(string) []byte,
	kubeClient kubernetes.Interface,
	eventRecorder events.Recorder,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, fixed

// Minimal interval between controller resyncs. The controller will detect
// new share types in Manila and create StorageClasses for them at least
// once per this interval.
resyncInterval = 1 * time.Minute
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to resync every minute?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When someone creates a new share type in OpenStack, this operator should create StorageClass for it. Question is what's the right frequency.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC the optimal resync interval for this type of storage backend (manila, EFS) will vary from cluster to cluster. For instance, in some clusters the number of share types will seldom change, while in others new share types will be added and removed quite often.

Having said that, whatever we hardcode here might not be the ideal for some use cases.

IMO the best way to approach this is to define a standard value, say 20 minutes, and allow the cluster admin to customize it through the CR. If we choose this approach, we could tackle the customization part in another PR.

Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm updating it to 20 min, however, I am not sure how important it is to have it customizable.

README.md Outdated

Operator to create, configure and manage CSI driver for OpenStack Manila in OpenShift.
An operator to deploy the [Manil CSI driver](https://github.com/openshift/cloud-provider-openstack/tree/master/pkg/csi/manila) in OpenShift.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manila

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

README.md Outdated

### Installing the operator
The operator is based on [openshift/library-go](https://github.com/openshift/library-go). It manages `ManilaDriver` instance named `cluster` and runs several controllers in parallel:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind and CR changed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

).WithCSIDriverController(
"ManilaDriverController",
operandName,
string(operatorapi.ManilaCSIDriver),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't the instance/config name and driver/operand name swapped here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, I think

"NFSDriverController",
string(operatorapi.ManilaCSIDriver),
util.OperatorNamespace,
"nfs-csi-driver",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swap line 83 with line 85?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, I think

README.md Outdated
* It starts `nfsController`: Runs `csidriverset.Controller` that installs NFS CSI driver itself.
* It creates `StorageClass` for each share type reported by Manila and periodically syncs them at least once per minute, in case a new share type appears in Manila.
* It never removes StorageClass if Manila service or share type disappears. It may be temporary OpenStack ir Manila re-configuration hiccup.
* If there is no Manila service, it marks the `ManilaDriver` instance with `ManilaControllerDisabled: True` condition. It does not stop any CSI drivers started when Manila service was present! This allows pod to at least unmount their volumes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace ManilaDriver with ClusterCSIDriver

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

type ManilaController struct {
operatorClient v1helpers.OperatorClient
kubeClient kubernetes.Interface
secretLister corelisters.SecretLister
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're not using this anymore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed


cert, err := getCloudProviderCert()
if err != nil && !os.IsNotExist(err) {
return nil, fmt.Errorf("Failed to get cloud provider CA certificate: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: my editor says errors should not start with capital letters

if len(cert) > 0 {
certPool, err := x509.SystemCertPool()
if err != nil {
return nil, fmt.Errorf("Create system cert pool failed: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@bertinatto
Copy link
Member

/lgtm

Adding hold in case @Fedosin wants to take a look
/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 30, 2020
@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 30, 2020
README.md Outdated
```sh
oc delete -f deploy/crds/csi.openshift.io_maniladrivers_crd.yaml -f deploy/role.yaml -f deploy/role_binding.yaml -f deploy/service_account.yaml -f deploy/namespace.yaml
```
* OpenStack cloud credentials are in Secret named "cloud-credentials" in the same namespace where the operator runs. The operator uses the credentials to check if Manila is present in the cluster and, since OpenStack does not allow any fine-grained access controll, it lets the CSI driver to use the same credentials.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

access control

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

README.md Outdated
oc delete -f deploy/crds/csi.openshift.io_maniladrivers_crd.yaml -f deploy/role.yaml -f deploy/role_binding.yaml -f deploy/service_account.yaml -f deploy/namespace.yaml
```
* OpenStack cloud credentials are in Secret named "cloud-credentials" in the same namespace where the operator runs. The operator uses the credentials to check if Manila is present in the cluster and, since OpenStack does not allow any fine-grained access controll, it lets the CSI driver to use the same credentials.
* If underlying OpenStack uses self-signed certificate, the operator expects the ceritificate is present in a ConfigMap named "cloud-provider-config" with key "ca-bundle.pem" in the namespace where it runs. Generally, it should be a copy of "openshift-config/cloud-provider-config" ConfigMap. It then uses the certificate to talk to OpenStack API.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

certificate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

func init() {
buildInfo := prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "openshift_aws_ebs_csi_driver_operator",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

openshift_manila_csi_driver_operator

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, fixed

buildInfo := prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "openshift_aws_ebs_csi_driver_operator",
Help: "A metric with a constant '1' value labeled by major, minor, git commit & git version from which OpenShift AWS EBS CSI Driver Operator was built.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Openshift Manila CSI Driver

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jul 30, 2020
Use OpenStack's ceritificates when OpenStack API uses self-signed ones.
Assume that CSO creates a ConfigMap with it when it starts the operator.

Since CSO is going to create the ConfigMap and operator Deployment in
parallel, make sure the operator recovers when the ConfigMap is created
later. In addition, the certificate may be added / updated later, when the
operator already runs for some time.
The operator must already have valid cloud credentials when it starts to
check if underlying OpenStack has Manila. Since OpenStack does not provide
fine-grained permissions, the credentials used by the operator and the
driver can be the same. Therefore let the driver reuse the operator
credentials.
Delete and re-create a StorageClass when there is a change in
StorageClass.Paramertes. Kubernetes API does not allow Parameters change
after creation.

This is useful when moving a StorageClass between namespaces - secrets
namespace changes.
@jsafrane
Copy link
Contributor Author

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 30, 2020
@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 30, 2020
@Fedosin
Copy link
Contributor

Fedosin commented Jul 30, 2020

/lgtm

@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Fedosin, jsafrane

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 4661d85 into openshift:master Jul 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants