Closed
165 changes: 145 additions & 20 deletions doc/proposals/sdk-integration-with-olm.md
@@ -3,15 +3,13 @@ title: Neat-Enhancement-Idea
authors:
- "@estroz"
reviewers:
- TBD
- "@joelanford"
- "@dmesser"
approvers:
- TBD
- "@joelanford"
- "@dmesser"
creation-date: 2019-09-12
last-updated: 2019-09-12
last-updated: 2019-12-11
status: implementable
see-also:
- "./cli-ux-phase1.md"
@@ -39,34 +37,49 @@ OLM is an incredibly useful cluster management tool. There is currently no integ

#### General

* Operator developers can use `operator-sdk` to quickly deploy OLM on a given Kubernetes cluster
* Operator developers can use `operator-sdk` to run their Operator under OLM
* Operator developers can use `operator-sdk` to build a catalog/bundle containing their Operator for use with OLM
- Operator developers can use `operator-sdk` to quickly deploy OLM on a given Kubernetes cluster
- Operator developers can use `operator-sdk` to run their Operator under OLM
- Operator developers can use `operator-sdk` to build a catalog/bundle containing their Operator for use with OLM

#### Specific

* `operator-sdk` creates a [bundle][bundle] from an Operator project to deploy with OLM
* `operator-sdk` has a CLI interface to interact with OLM
* `operator-sdk` installs a specific version of OLM onto Kubernetes cluster
* `operator-sdk` uninstalls a specific version of OLM onto Kubernetes cluster
* `operator-sdk` accepts a bundle and deploys that operator onto an OLM-enabled Kubernetes cluster
* `operator-sdk` accepts a bundle and removes that operator onto an OLM-enabled Kubernetes cluster
- `operator-sdk` creates a [bundle][bundle] from an Operator project to deploy with OLM
- `operator-sdk` has a CLI interface to interact with OLM
- `operator-sdk` installs a specific version of OLM onto a Kubernetes cluster
- `operator-sdk` uninstalls a specific version of OLM from a Kubernetes cluster
- `operator-sdk` accepts a bundle and deploys that operator onto an OLM-enabled Kubernetes cluster
- `operator-sdk` accepts a bundle and removes that operator from an OLM-enabled Kubernetes cluster

### Non-Goals

- Replicate mechanisms and abilities of OLM in `operator-sdk`.

## Proposal

### User Stories

**TODO**

Detail the things that people will be able to do if this is implemented.
Include as much detail as possible so that people can understand the "how" of
the system. The goal here is to make this feel real for users without getting
bogged down.
The following stories pertain to both upstream Kubernetes and OpenShift cluster types.

#### Story 1

I should be able to install a specific version of OLM onto a cluster

#### Story 2

I should be able to uninstall a specific version of OLM from a cluster

#### Story 3

I should be able to deploy a specific version of an Operator using OLM and a bundle directory.

#### Story 4

I should be able to remove a specific version of an Operator deployed using `operator-sdk` via OLM from a cluster.

#### Story 5

I should be able to specify one or more [required manifests](#olm-resources) saved locally or have `operator-sdk` generate them from bundled data during deployment.

### Implementation Details/Notes/Constraints

Initial PR: https://github.com/operator-framework/operator-sdk/pull/1912
@@ -75,14 +88,126 @@ Initial PR: https://github.com/operator-framework/operator-sdk/pull/1912

The SDK's approach to deployment should be as general and reliant on existing mechanisms as possible. To that end, [`operator-registry`][registry] should be used since it defines what a bundle contains and how one is structured. `operator-registry` libraries should be used to create and serve bundles, and interact with package manifests.

The idea is to create a `Deployment` containing the latest `operator-registry` [image][registry-image] to initialize a bundle database and run a registry server serving that database using binaries contained in the image. The `Deployment` will contain volume mounts from a `ConfigMap` containing bundle files and a package manifest for an operator. Using manifest data in the `ConfigMap` volume source, the registry initializer can build a local database and serve that database through the `Service`. OLM-specific resources created by the SDK or supplied by a user, described below, will establish communication between this registry server and OLM.
The idea is to create a `Deployment` containing the latest `operator-registry` [image][registry-image] to initialize a bundle database and run a registry server serving that database using binaries contained in the image. The `Deployment` will contain volume mounts from a `ConfigMap` containing bundle files and a package manifest for an Operator. Using manifest data in the `ConfigMap` volume source, the registry initializer can build a local database and serve that database through the `Service`. OLM-specific resources created by the SDK or supplied by a user, described below, will establish communication between this registry server and OLM.

#### OLM resources

OLM understands `operator-registry` servers and served data through several objects. A [`CatalogSource`][olm-catalogsource] specifies how to communicate with a registry server. A [`Subscription`][olm-subscription] links a particular CSV channel to a `CatalogSource`, indicating from which `CatalogSource` OLM should pull an Operator. Another OLM resource that _may_ be required is an [`OperatorGroup`][olm-operatorgroup], which provides Operator namespacing information to OLM; OLM creates two `OperatorGroup`'s by default, one of which can be used for globally scoped Operators.
OLM understands `operator-registry` servers and served data through several objects. A [`CatalogSource`][olm-catalogsource] specifies how to communicate with a registry server. A [`Subscription`][olm-subscription] links a particular CSV channel to a `CatalogSource`, indicating from which `CatalogSource` OLM should pull an Operator. Another OLM resource that _may_ be required is an [`OperatorGroup`][olm-operatorgroup], which provides Operator namespacing information to OLM. OLM creates a globally-scoped `OperatorGroup` by default, which can be used for globally-scoped Operators.

These resources can be created from bundle data with minimal user input. They can also be created from manifests defined by the user; however, the SDK cannot make guarantees that user-defined manifests will work as expected.
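
As a rough illustration of how little input these resources need, the sketch below builds minimal `CatalogSource` and `Subscription` manifests as Python dictionaries. The helper names, resource names, channel, and registry address are hypothetical. The field names follow the OLM `operators.coreos.com/v1alpha1` API as commonly documented and should be checked against the OLM version in use.

```python
def make_catalog_source(name, namespace, registry_address):
    """Build a CatalogSource that tells OLM how to reach a registry server."""
    return {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "CatalogSource",
        "metadata": {"name": name, "namespace": namespace},
        # "grpc" plus an address points OLM at an already-running registry.
        "spec": {"sourceType": "grpc", "address": registry_address},
    }


def make_subscription(name, namespace, package, channel, catalog_source):
    """Build a Subscription linking a package channel to a CatalogSource."""
    source_meta = catalog_source["metadata"]
    return {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "Subscription",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "name": package,        # package name served by the registry
            "channel": channel,     # CSV channel to follow
            "source": source_meta["name"],
            "sourceNamespace": source_meta["namespace"],
        },
    }


# Hypothetical values, for illustration only.
catalog = make_catalog_source(
    "my-operator-catalog", "default", "my-registry.default.svc:50051"
)
subscription = make_subscription(
    "my-operator-sub", "default", "my-operator", "alpha", catalog
)
```

Deriving the `Subscription`'s `source` fields from the `CatalogSource` keeps the two manifests consistent, which is the kind of linkage the SDK could generate from bundle data.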

#### OperatorGroups and tenancy requirements

[`OperatorGroup`s][olm-operatorgroup] configure CSV tenancy in multiple
namespaces in a cluster. Each Operator must be a
[member][olm-operatorgroup-membership] of one `OperatorGroup` resource in
the cluster, which defines a set of namespaces the CSV can exist in. A CSV's
Review comment (Member):

to be a little more specific than "namespaces the CSV can exist in" - it defines the set of namespaces the operator defined in the CSV is permitted to operate over; OperatorGroups are largely about RBAC and cluster visibility.

`installModes` determine what [type][olm-operatorgroup-installmodes] of
`OperatorGroup` it can be a member of.

No two `OperatorGroup`s can exist in the same namespace, and a CSV with
membership in an `OperatorGroup` of a type it does not support (determined
by `installModes`) will transition to a failure state.
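
The membership rule above can be sketched as a small check. This is a simplified illustration, not OLM's actual validation: it assumes an `OperatorGroup`'s type is determined only by its target namespaces, and that an empty list or `[""]` means all namespaces. The function names are hypothetical; `install_modes` mirrors the shape of a CSV's `spec.installModes` list.

```python
ALL_NAMESPACES = ""  # the empty string stands for "all namespaces"


def operator_group_type(group_namespace, target_namespaces):
    """Classify an OperatorGroup by the namespaces it targets (simplified)."""
    targets = list(target_namespaces)
    if not targets or targets == [ALL_NAMESPACES]:
        return "AllNamespaces"
    if len(targets) > 1:
        return "MultiNamespace"
    if targets[0] == group_namespace:
        return "OwnNamespace"
    return "SingleNamespace"


def csv_supports_group(install_modes, group_namespace, target_namespaces):
    """Return True if the CSV's installModes allow membership in the group."""
    supported = {m["type"] for m in install_modes if m.get("supported")}
    return operator_group_type(group_namespace, target_namespaces) in supported


# Example installModes for a CSV that cannot watch several namespaces.
modes = [
    {"type": "OwnNamespace", "supported": True},
    {"type": "SingleNamespace", "supported": True},
    {"type": "MultiNamespace", "supported": False},
    {"type": "AllNamespaces", "supported": True},
]
ok = csv_supports_group(modes, "my-namespace", ["my-namespace"])
rejected = csv_supports_group(modes, "my-namespace", ["ns1", "ns2"])
```

A CSV for which this check fails is the case the paragraph describes: it would join the group and then transition to a failure state.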

Given these rules and constraints, Operator developers may have a tough time
Review comment (Member):

The document from here on is not wrong, but I think it should probably do more to indicate that installmodes should describe the way the operator works and are not things that can be changed "after the fact" without rewriting/etc.

If the operator starts up and watches all namespaces, it should be AllNamespace, and should go into an OperatorGroup that watches all namespaces.

If the operator starts up and watches its own namespace, it should be OwnNamespace and go into an OperatorGroup that watches its own namespace only.

If the operator starts up and watches a single namespace based on an env var, and that env var is wired up to project the olm.targetNamespaces annotation from the deployment, then it should go into a SingleNamespace OperatorGroup.

Likewise, multinamespace mode and the configuration thereof is fundamental to the way the operator starts up and generating an operatorgroup with multiple namespaces will do nothing if the operator doesn't support watching n namespaces.

Operators can also support one or all of these, depending on how it is written.

A lot of this can be determined based on the properties of the operator itself:

  1. If it uses the downward API to read the namespace annotations, then it can potentially support Single and Multi namespace modes (the inverse might make more sense - if a deployment for an operator does not read from the annotations, then it cannot possibly satisfy single or multi namespace installmode requirements).
  2. If the operator can only start up control loops per a configured namespace value (be that a specific namespace or AllNamespaces ""), then it cannot support MultiNamespace

The current proposal is fine and makes sense! But I'm wondering if there's a way that the sdk can "just know" what operatorgroup to make and what installmodes to make more directly, because of knowledge of what namespaces can be watched by the operator.

Review comment (Contributor):

What about establishing a convention that is based on the order of priority of the following conditions:

1. If AllNamespace is supported, create an OperatorGroup that watches all namespaces.
2. If SingleNamespace is supported, create an OperatorGroup that watches the namespace the Operator is deployed in.
3. If OwnNamespace is supported, same as above.
4. If MultiNamespace is supported, same as above.

writing an `OperatorGroup` for their Operator initially. To assist them,
`operator-sdk` should automate `OperatorGroup` "compilation" if one is not
supplied.

To perform compilation, the user can optionally supply the desired install
Review comment (Member):

What command is this option added to? I would imagine it's on the command that actually installs/runs?

mode type by which the CSV is installed through an `--install-mode` flag, and
the set of namespaces (may be all namespaces, `""`) in which the CSV will be
installed. For example, `--install-mode=MultiNamespace=[ns1,ns2]` will create
this `OperatorGroup`:
```yaml
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: my-group
  namespace: my-namespace
  labels:
    # Label values must be strings, so the value is quoted.
    operator-sdk: "true"
spec:
  targetNamespaces:
  - ns1
  - ns2
```
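
A minimal sketch of how such a flag value could be turned into an `OperatorGroup` manifest is shown below. The `Type=[ns1,ns2]` format and the manifest fields come from the example above. The function names, the handling of a bare type with no namespace list, and the hard-coded label value are assumptions made for illustration.

```python
INSTALL_MODE_TYPES = (
    "OwnNamespace", "SingleNamespace", "MultiNamespace", "AllNamespaces",
)


def parse_install_mode(value):
    """Split 'Type=[ns1,ns2]' into (type, [namespaces])."""
    mode, _, raw = value.partition("=")
    if mode not in INSTALL_MODE_TYPES:
        raise ValueError(f"unknown install mode type: {mode!r}")
    inner = raw.strip()
    # Drop the surrounding brackets if present.
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1]
    namespaces = [ns.strip() for ns in inner.split(",") if ns.strip()]
    return mode, namespaces


def operator_group_manifest(name, namespace, target_namespaces):
    """Build the OperatorGroup manifest as a plain dictionary."""
    return {
        "apiVersion": "operators.coreos.com/v1",
        "kind": "OperatorGroup",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Label values are strings in Kubernetes.
            "labels": {"operator-sdk": "true"},
        },
        "spec": {"targetNamespaces": list(target_namespaces)},
    }


mode, namespaces = parse_install_mode("MultiNamespace=[ns1,ns2]")
group = operator_group_manifest("my-group", "my-namespace", namespaces)
```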

The compilation algorithm is as follows:

```
1. If an OperatorGroup manifest is supplied:
    1. Use the one supplied and return.
```

Comment on lines +139 to +140

Review comment (Member):

Are there use cases that will require an operator group to be supplied, or is it possible to always generate one with flags (or defaults)?

I'm wondering if we can keep it simple and cover 90% of the use cases so that the CLI flag set doesn't explode. Can we wait to see if there's demand for supplying the operator group directly?

EDIT: I kept reading. It sounds like there may be cases where this is needed for running an operator in an existing namespace that already has an operator group? If we say that that scenario is out of scope, would that simplify things? Would it kill a bunch of common use cases?

@shawn-hurley @robszumski thoughts?

Review comment (Member Author, @estroz, Dec 19, 2019):

I'm not sure how common it is to roll your own OperatorGroup. Given info in a CSV and --install-mode we can create a valid OperatorGroup 100% of the time.

If I understand correctly, if h exists in namespace n, then we can create g in another namespace m. The CSVs will still be deployed according to targetNamespaces and no OperatorGroup conflict errors will occur. I will test this theory to make sure.

```
2. Else if an OperatorGroup manifest is not supplied, compile an OperatorGroup g:
    1. If no installMode and set of namespaces is supplied:
        1. Initialize g as type OwnNamespace by setting g's targetNamespaces to the Operator's namespace, and return.
    2. Else if an installMode and set of namespaces is supplied:
        1. Validate the set of namespaces against the install mode's constraints and the Operator's namespace.
        2. Initialize g as the desired type with the set of namespaces and return.
```
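
The compilation steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the SDK's implementation: the proposal does not spell out the per-mode namespace constraints, so `_valid_namespaces` encodes one plausible reading, and an `OperatorGroup` is modelled as a plain dictionary with hypothetical `type` and `targetNamespaces` keys.

```python
def _valid_namespaces(install_mode, namespaces, operator_namespace):
    """Assumed per-mode constraints; the proposal leaves these unspecified."""
    ns = list(namespaces)
    if install_mode == "AllNamespaces":
        return ns in ([], [""])
    if not ns or "" in ns:
        return False
    if install_mode == "OwnNamespace":
        return ns == [operator_namespace]
    if install_mode == "SingleNamespace":
        return len(ns) == 1
    if install_mode == "MultiNamespace":
        return len(ns) > 1
    return False


def compile_operator_group(operator_namespace, supplied=None,
                           install_mode=None, namespaces=None):
    # Step 1: a user-supplied manifest is used as-is.
    if supplied is not None:
        return supplied
    # Step 2.1: nothing supplied, so default to OwnNamespace.
    if install_mode is None and not namespaces:
        return {"type": "OwnNamespace",
                "targetNamespaces": [operator_namespace]}
    # Step 2.2: validate, then build the requested type.
    # Namespaces without an install mode are treated as an error (assumption).
    if install_mode is None or not _valid_namespaces(
            install_mode, namespaces or [], operator_namespace):
        raise ValueError(
            f"namespaces {namespaces!r} are not valid for {install_mode!r}")
    targets = [""] if install_mode == "AllNamespaces" else list(namespaces)
    return {"type": install_mode, "targetNamespaces": targets}
```

For example, `compile_operator_group("op-ns")` yields an `OwnNamespace` group targeting `op-ns`, while passing `install_mode="OwnNamespace"` with a different namespace raises a `ValueError`.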

Managing `OperatorGroup` resources for multiple Operators _before_ deployment
is attempted is a more complex problem, but prevents annoying-to-debug
deployment issues that will occur in the following scenarios:

- A user wants to deploy two or more Operators with CSV install modes
  incompatible for one `OperatorGroup` to handle in the same namespace.
- A user wants to create an `OperatorGroup` in a namespace that already has
  an `OperatorGroup`.
  - The new and existing `OperatorGroup` namespace intersection is:
    - Equivalent to the set of new and existing namespaces (they have the
      same set).
    - The empty set (not intersecting).
    - A strict subset of either namespace set.

A solution to these types of conflicts is the following two algorithms:

Algorithm for creating an `OperatorGroup`:
```
1. Follow the compilation algorithm above to create an OperatorGroup g.
2. Determine whether an OperatorGroup exists in a given namespace n.
3. If no OperatorGroup exists in n:
    1. If g was not compiled by operator-sdk:
        1. Label g with a static label to signify g was not created by operator-sdk.
    2. Else if g was compiled by operator-sdk:
        1. Label g with a static label to signify g was created by operator-sdk.
    3. Create g in n and return.
4. Else if an OperatorGroup h exists in n:
    1. If h was not compiled by operator-sdk, return an error.
    2. Else if h was compiled by operator-sdk:
        1. Determine which CSVs are members of h, h's targetNamespaces hn, and g's targetNamespaces gn.
        2. If gn is equivalent to hn, return.
        3. Else if the intersection of gn and hn is the empty set or a subset of either:
            1. Label g with a static label to signify g was created by operator-sdk.
            2. Create g in another namespace m and return.
```
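
A hedged sketch of the creation algorithm follows. The cluster is modelled as a dictionary from namespace to `OperatorGroup`, standing in for API server calls. The label key is borrowed from the example manifest, but its `"true"`/`"false"` values, the function signature, and the error raised when the fallback namespace `m` is already occupied are assumptions.

```python
SDK_LABEL = "operator-sdk"  # key from the example manifest; values are assumed


def create_operator_group(cluster, g, n, m, compiled_by_sdk=True):
    """Place OperatorGroup g in namespace n, falling back to namespace m.

    Returns the namespace whose OperatorGroup should be used.
    """
    h = cluster.get(n)
    if h is None:
        # Step 3: n is free, so label g by origin and create it there.
        label = "true" if compiled_by_sdk else "false"
        g.setdefault("labels", {})[SDK_LABEL] = label
        cluster[n] = g
        return n
    # Step 4.1: a user-managed group is a conflict the user must resolve.
    if h.get("labels", {}).get(SDK_LABEL) != "true":
        raise RuntimeError(f"namespace {n!r} has a user-managed OperatorGroup")
    gn, hn = set(g["targetNamespaces"]), set(h["targetNamespaces"])
    if gn == hn:
        return n  # Step 4.2.2: h already targets the same namespaces.
    # Step 4.2.3: the intersection of two sets is always a subset of both,
    # so this branch covers every remaining case.
    if m in cluster:
        raise RuntimeError(f"fallback namespace {m!r} already has an OperatorGroup")
    g.setdefault("labels", {})[SDK_LABEL] = "true"
    cluster[m] = g
    return m
```

Returning the chosen namespace lets the caller record where the group ended up, which matters when `g` is placed in `m` instead of `n`.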

Algorithm for deleting an `OperatorGroup`:
```
1. Determine whether an OperatorGroup exists in a given namespace n.
2. If no OperatorGroup exists in n, return.
3. Else if an OperatorGroup g exists in n:
    1. If g is not labeled with an operator-sdk static label, return.
    2. Else if g is labeled with an operator-sdk static label:
        1. Determine the set of CSVs cs that are members of g.
        2. If cs is the empty set:
            1. Delete g and return.
        3. Else if cs is not the empty set, return.
```
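
The deletion algorithm reduces to three guards before the delete, sketched below against an in-memory cluster dictionary (namespace to `OperatorGroup`). The label key and value, the function name, and passing the member CSVs in as a list are assumptions for illustration; a real implementation would query the cluster for membership.

```python
SDK_LABEL = "operator-sdk"  # key from the example manifest; value is assumed


def delete_operator_group(cluster, n, member_csvs):
    """Delete the OperatorGroup in n only if the SDK created it and it is empty.

    member_csvs lists the CSVs that are members of the group.
    Returns True if the group was deleted, False otherwise.
    """
    g = cluster.get(n)
    if g is None:
        return False  # Step 2: nothing to delete.
    if g.get("labels", {}).get(SDK_LABEL) != "true":
        return False  # Step 3.1: user-managed groups are never deleted.
    if member_csvs:
        return False  # Step 3.2.3: deleting would break the member CSVs.
    del cluster[n]    # Step 3.2.2: SDK-created and empty, safe to delete.
    return True
```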

Notes on these algorithms:
- Labeling allows `operator-sdk` to determine whether an `OperatorGroup` can
  be deleted; `OperatorGroup`s not compiled by `operator-sdk` should not be
  deleted in any case.
- An `OperatorGroup` not compiled by `operator-sdk` is considered a
  user-managed resource. All conflicts must be resolved by the user, so an error
  is returned if a non-compiled `OperatorGroup` is already present in a namespace.
- Deleting an `OperatorGroup` associated with 1..N CSVs will cause those CSVs
  to transition to a failure state, so we should not delete if this is the case.

[olm-operatorgroup-membership]: https://github.com/operator-framework/operator-lifecycle-manager/blob/1cb0681/doc/design/operatorgroups.md
[olm-operatorgroup-installmodes]: https://github.com/operator-framework/operator-lifecycle-manager/blob/1cb0681/doc/design/operatorgroups.md

#### Use of operator-framework/api validation

Static validation lets users find problems before deploying their Operator. Bugs caught statically are usually more tractable than runtime bugs, especially those discovered in a live cluster. The [`operator-framework/api`][of-api] repo is intended to house a validation library for static, and potentially runtime, validation. The SDK should use this library as the source of truth for what constitutes a valid OLM manifest. The repo is a work in progress, and the SDK should adopt it as soon as it is ready.