Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AWS EBS CSI operator startup #59

Merged
merged 11 commits into from Jul 29, 2020

Conversation

jsafrane
Copy link
Contributor

@jsafrane jsafrane commented Jun 29, 2020

Three new controllers are introduced in this PR:

  • CSIDriverStarterController: observes Infrastructure and starts CSI driver operators that are registered for each platform. For each CSI driver operator it starts:

    • StaticController to create namespace and RBAC rules.
    • CSIDriverOperatorDeploymentController: see below.
    • CSIDriverOperatorCRController: see below.
  • CSIDriverOperatorDeploymentController: manages Deployment of the operator and fills <CSI driver name>CSIDriverOperatorDeployment conditions (e.g. AWSEBSCSIDriverOperatorDeploymentAvailable / Progressing / Degraded).

  • CSIDriverOperatorCRController: manages CR of the operator. It syncs all its conditions back to storage CR.

@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jun 29, 2020
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 15, 2020
@jsafrane jsafrane force-pushed the aws-operator branch 2 times, most recently from a0d3515 to 5c31422 Compare July 15, 2020 12:48
@jsafrane
Copy link
Contributor Author

/retest

@jsafrane jsafrane changed the title Add AWS EBS CSI operator startup WIP: Add AWS EBS CSI operator startup Jul 15, 2020
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 15, 2020
@jsafrane jsafrane force-pushed the aws-operator branch 6 times, most recently from b8743fe to 699f8ea Compare July 21, 2020 13:16
@jsafrane
Copy link
Contributor Author

/retest

@jsafrane jsafrane force-pushed the aws-operator branch 2 times, most recently from 7fd3516 to e52266c Compare July 24, 2020 12:29
For new library-go.
It's needed by the new library-go
@jsafrane jsafrane force-pushed the aws-operator branch 8 times, most recently from ee9b913 to 43faa51 Compare July 27, 2020 09:33
go.mod Show resolved Hide resolved
return err
}
operatorInformers := opinformers.NewSharedInformerFactoryWithOptions(operatorClientSet, resync,
opinformers.WithTweakListOptions(singleNameListOptions(operatorclient.GlobalConfigName)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's still one in csoclients.go:

$ ag singleNameListOptions
pkg/csoclients/csoclients.go
107:func singleNameListOptions(name string) func(opts *metav1.ListOptions) {

pkg/operator/starter.go Outdated Show resolved Hide resolved
pkg/operator/csidriveroperator/driverstarter.go Outdated Show resolved Hide resolved
pkg/operator/csidriveroperator/driverstarter.go Outdated Show resolved Hide resolved
@jsafrane jsafrane force-pushed the aws-operator branch 2 times, most recently from 52b2f56 to 89473a5 Compare July 28, 2020 15:05
@jsafrane jsafrane mentioned this pull request Jul 28, 2020
pkg/operator/csidriveroperator/deploymentcontroller.go Outdated Show resolved Hide resolved
for i := range c.controllers {
if c.controllers[i].operatorConfig.Platform == platform && !c.controllers[i].running {
go c.controllers[i].mgr.Start(ctx)
c.controllers[i].running = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't ever going to be false anymore, do we care?

When the ctx above is cancelled, all controllers from c.controllers[i].mgr will shutdown (which is what we want). However, running will still be true, and nothing sets it back to false. If for some reason we reach line 93 above again, the CSI driver operator will never run again.

I don't know if this can ever happen, but I wish we could get rid of .running and have the ControllerManager tell us if it's running or not (maybe we can).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we really need to solve this. The context gets cancelled at the operator shutdown and we won't ever want to statt the CSI driver operators again.

pkg/operator/csidriveroperator/deploymentcontroller.go Outdated Show resolved Hide resolved
pkg/operator/csidriveroperator/deploymentcontroller.go Outdated Show resolved Hide resolved
@jsafrane
Copy link
Contributor Author

Fixed remaining issues and squashed a lot.

The operator RBAC rules needs to include all rules collected from CSI
driver operators and for OLM, which is complicated. Using cluster-admin for
now to make the operator working, however, with goal to rework the CSI driver
operator (and CSI driver) to use common roles in the future.
The configuration should encapsulate all that CSO needs to know about a CSI
driver operator to be able to start it.
CSO has many clients and passing them around individually is cumbersome.
CSIDriverObserver checks Infracstructure CRD for the underlying cloud and
starts corresponding CSIDriverOperatorController(s).

CSIDriverOperatorController installs one CSI driver operator.
// This CSIDriverOperatorCRController installs and syncs CSI driver operator CR. It monitors the
// CR status and merges all its conditions to the CSO CR.
// It produces following Conditions:
// <CSI driver name>CSIDriverOperatorControllerDegraded on error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This controller actually produces 2 types of degraded conditions:

  • WithSyncDegradedOnError() will produce a c.name + "Degraded" condition

  • Sync() will produce a c.name + csiDriverControllerName + "Degraded" condition

IMO ideally we should have only 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The controller should produce two Degraded conditions because of WithSyncDegradedOnError - it clears its condition when sync() returns nil. And we still want copied Degraded conditions from the CR.

I added an explicit comment about that + separated their prefixes

// <CSI driver name>CSIDriverOperatorDegraded on error
// <CSI driver name>CSIDriverOperatorCRDegraded - copied from *Degraded conditions from CR.
// <CSI driver name>CSIDriverOperatorCRAvailable - copied from *Available conditions from CR.
// <CSI driver name>CSIDriverOperatorCRProgressing - copied from *Progressing conditions from CR.

(as a separate commit, because this one is a bigger change)

default:
recorder.Eventf(fmt.Sprintf("%sUpdated", gvk.Kind), "Updated %s:\n%s", resourcehelper.FormatResourceForCLIWithNamespace(obj), strings.Join(details, "\n"))
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: these report*Event() are generic enough to live in util.go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved (and squashed)


var platform configv1.PlatformType
if infrastructure.Status.PlatformStatus != nil {
platform = infrastructure.Status.PlatformStatus.Type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you intend to use csiDriverControllerManager.platform instead?

I don't think you did, if so we can remove it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed csiDriverControllerManager.platform

@bertinatto
Copy link
Member

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 29, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bertinatto, jsafrane

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [bertinatto,jsafrane]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit e31dc33 into openshift:master Jul 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants