Enhancement for operator-registry #24

Merged

Conversation

kevinrizza
Member

Adding an enhancement proposal for olm's operator-registry update to support native container bundle format.

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 18, 2019
@kevinrizza
Member Author


The purpose of this enhancement is to create a first class storage mechanism for Operators in Kubernetes.

This implementation is desinged with commonly used tools in mind (specifically, container images and registries) to provide a format of pushing and pulling yaml metadata needed to install and lifecycle Operators.
Contributor

typo: designed


In order to serve that manifest image, we cannot pull the image directly. Instead we will:

1. Create a pod spec that uses an init container to inject a binary into the manifest image.
Contributor

I may not be understanding this correctly. Is this a proposal to run a pod for each operator we are gathering metadata for?

Member Author

The premise is that we need some way of getting the manifests stored in these bundle images. We are going to accomplish it by bootstrapping a pod to push the metadata from it into a configmap and then kill the pod - we aren't going to run these pods perpetually.

This is a (non-ideal) method of using normal Kubernetes APIs to pull metadata files from an image that is just scratch plus files copied into a directory. In an ideal world we could use container tooling in OLM directly to pull the image and extract these files from the image, but that is not an option with what we have available to us now. @jwforres @shawn-hurley
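
For illustration, a minimal sketch of the kind of bootstrap pod described above -- image names, the volume layout, and the command are placeholders borrowed from the strawman later in this proposal, not the final design:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: foo-bundle-extract
  namespace: olm
spec:
  restartPolicy: Never
  initContainers:
  - name: inject-binary
    # Assumed image that carries the operator-registry binary.
    image: quay.io/operator-framework/operator-registry:latest
    command: ['cp', '/bin/operator-registry', '/injected/operator-registry']
    volumeMounts:
    - name: tools
      mountPath: /injected
  containers:
  - name: extract
    # The scratch-based bundle image that only contains manifest files.
    image: quay.io/community-operators/foo@sha256:abc123
    # Strawman: parse manifests/ and write the result to a well-known
    # configmap, then exit so the pod can be cleaned up.
    command: ['/tools/operator-registry', 'serve', '-manifestdir', './manifests/']
    volumeMounts:
    - name: tools
      mountPath: /tools
  volumes:
  - name: tools
    emptyDir: {}
```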

@deads2k
Contributor

deads2k commented Sep 19, 2019

/assign @jwforres

Member

@jwforres jwforres left a comment

I feel like some diagrams would really help explain some of the expected flows in this proposal. Where is the content coming from and where is it going?

-bundle: NULL
-bundlePath: "quay.io/community-operators/foo:sha256@abcd123"

Additionally, we will update the grpc API that OLM currently uses to get the bundle so that, if it attempts to pull a non-latest version, it returns null. In that case, it will make a second query to get the bundlePath. Then we will create a pod with that image that will serve the bundle data by writing the data to a configmap that can be read by OLM.
Member

Is there an unstated assumption here that the index will contain the complete set of metadata+manifests for the latest version of all operators? And that it will then only contain metadata for all non-latest versions of the operator?

Member Author

It is actually called out in this doc -- just not in the implementation detail section about the database add method. Instead it's called out explicitly in the index container definition user story a little further up this document.

If it would make it more clear, I can update the registry api section of the implementation details to call that part out.

2. Run the manifest image with that binary that writes to a configmap in a well known location
3. Read the configmap data
4. On return kill the pod
5. Delete the configmap
Member

Are you intending to start backing the operator registry pods containing the database with actual persisted storage? If not, and the operator registry pod is killed for whatever reason, then you've lost all of those modifications you made to the index and would have to re-run any of these pods that had been launched. Leaving the ConfigMap in place instead of deleting it would provide persistence for that data if the operator registry gets restarted and needs to be rebuilt.

Leaving the ConfigMaps does mean more management later, need to know when to clean them up (like pruning old versions that are no longer needed).

Member Author

Actually, we are not going to update the database at all at runtime. This implementation means that operator-registries are still immutable once built. Right now this bundle request is only used to explicitly create the CSV resource on the cluster -- we will just use that metadata to persist the data on that object.

It is true that if the CSV is explicitly deleted for some reason (most likely install) that this process would happen again. But at the moment the frequency of that is so low that trying to persist these configmaps and handle their state is probably not worth that optimization. Obviously though if we decide to remove the CSV API altogether we may have to revisit this (mutable database, persist them as config maps, etc)


CSV does not currently contain all the data needed to install for some operators, and I don't think that it makes sense to assume that. I would like to think about if we need to cache this, or if re-pulling/running this object is acceptable.

Are there times when we need the old files (CRD's maybe?) for upgrade? Imagine the skip_range case where this old bundle ref may be removed from the manifest-image.

Contributor

I would like to think about if we need to cache this, or if re-pulling/running this object is acceptable.

This enhancement does not propose removing InstallPlans, which perform the caching/recreation you're concerned with. When/if we remove them, we can discuss alternative caching solutions.

When I set the update frequency on an operator index, OLM will poll and ensure that, if there is a new version of the index, those changes are pulled and propagated into the cluster.

Internal details:
Polling process should be reviewed by quay team to ensure we scale on quay (although these indexes should work on ANY container image registry)
Member

What will this polling look like? Are you proposing that catalog-operator in OLM would be reaching out to an image registry directly to see if new versions of the index image are available?

This feels like scheduled imports on ImageStreams. @bparees thoughts here?

Contributor

I've had a few conversations with @shawn-hurley about the challenges involved in pulling images directly. Most of those concerns would also apply to anything that's checking an external registry for images as well. Off the top of my head that is:

  1. proxy support
  2. credential management (this is the big one... ideally you want to benefit from creds the admin already setup for the cluster, not force people to point you to yet another set of creds that this particular component needs)
  3. CA trust management (for registries that are signed with self-signed certs or CAs that aren't in system trust)
    (if you're actually pulling images yourself, there are additional concerns primarily around storage management).

So those are all things to think about here.

But as to your specific question, yes it sounds like imagestream scheduled import... if OLM pre-req'd openshift, it would be pretty reasonable to simply setup imagestreams with scheduled import turned on, and then use a form of imagechange trigger to have OLM take action when the imagestream picks up a new SHA.

But of course if OLM is expected to work on k8s, that's not viable, and it sounds like this polling implementation will have to rebuild the capabilities that exist in imagestream import.

containers:
- name: operator-manifest
  image: quay.io/community-operators/foo@sha256:abc123
  command: ['operator-registry', 'serve', '-manifestdir "./manifests/"']
Contributor

This is a strawman, but this command looks a lot more like starting the registry server from a set of manifests than it looks like copying manifests into a configmap

Member Author

yep, this is because this needs to be rewritten from when we were suggesting serving an api here. i'll update this now.


We will also need to update operator-registry to include a new `serve` command that knows how to parse the manifest/ folder in the operator bundle image. This command will write to a configmap that OLM will use to get the bundle data.
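
As a rough sketch of the output side (the configmap name and keys are illustrative, not specified by this proposal), the result of running `serve` against a bundle might look like:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: foo-bundle-v1.0.0   # hypothetical well-known name
  namespace: olm
data:
  # One key per file found under manifests/ in the bundle image.
  foo.v1.0.0.clusterserviceversion.yaml: |
    apiVersion: operators.coreos.com/v1alpha1
    kind: ClusterServiceVersion
    ...
  foos.example.com.crd.yaml: |
    apiVersion: apiextensions.k8s.io/v1beta1
    kind: CustomResourceDefinition
    ...
```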

*DB Versioning*
Contributor

We should call out our versioning strategy for index images and CatalogSources in general.

Proposal:

  • DB format is versioned with schemas
  • Tooling is versioned and supports a range of db schemas, indicated in operator-registry/README.md.
  • GRPC api follows convention and doesn't make backwards incompatible changes
  • Any large changes we wish to make (i.e. want to deprecate the current grpc api because we've changed how dependency resolution works) will be indicated via new CatalogSource types (current types are grpc, and configmap/internal).

spec:
  sourceType: grpc
  image: quay.io/operator-framework/community-operators:
  pollTimer: 1m
Contributor

Perhaps this should be a block allowing additional config in the future:

poll:
  interval: 1m
  startingAt: 09-20-2019T20:20:15

Member Author

This is a good point. Let's nest it but leave the single field for now, which will make extending it easier in the future.
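
A sketch of what the nested form agreed on here might look like on a CatalogSource (field names follow the strawman above and may change):

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: example-catalog
  namespace: olm
spec:
  sourceType: grpc
  image: quay.io/operator-framework/community-operators:latest
  poll:
    # Single field for now; the block leaves room for fields like startingAt later.
    interval: 1m
```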


### Version Skew Strategy

n/a
Contributor

This seems like a good place to put our versioning strategy 😄

Member Author

+1

@kevinrizza
Member Author

@deads2k @jwforres @ecordell @bparees Addressed your feedback, please take another look.

@shawn-hurley shawn-hurley left a comment

Some more questions and concerns

- inputs: none
- outputs: empty operator registry database

`operator-registry add`


Where and how can both the admin and the operator author say what channels this operator should be added to?

Contributor

@ecordell ecordell Sep 23, 2019

The current plan is to include package metadata in the bundle itself. This is specified in the bundle enhancement.

Future plans discussed (but not written down here) including having a separate, catalog-wide set of metadata describing how updates are laid out in a catalog. When such metadata exists, I imagine we will need to add options to this command or provide new commands for inspecting/modifying.


It may be nice to link to that here in a "how to build the update graph" section WDYT?

Member Author

@shawn-hurley when you say "that" are you referring to the bundle enhancement? Or are you referring to evan's description of the catalog wide metadata?

I think the former is in a pr that is yet to merge so I would hesitate to link to it, and the latter would need to be written.


Somewhere, either in the bundle format or in this PR, we need to write down how, given the input, we will develop the graph. IMO it would help a lot of folks understand what we are planning.

2. Run the manifest image with that binary that writes to a configmap in a well known location
3. Read the configmap data
4. On return kill the pod
5. Delete the configmap


CSV does not currently contain all the data needed to install for some operators, and I don't think that it makes sense to assume that. I would like to think about if we need to cache this, or if re-pulling/running this object is acceptable.

Are there times when we need the old files (CRD's maybe?) for upgrade? Imagine the skip_range case where this old bundle ref may be removed from the manifest-image.


The goal of this enhancement is to replace the app-registry implementation in a way that is usable by operator developers and operator author developers. To do this we will build an index image that is defined from a set of operator bundle images that are released independently and optimized in a way that can be productized by operator authors and pipeline developers.

This implementation makes the assumption that a separate process is built that generates container images that contain individual operator bundles for each version of an operator. The intent of the index image is to aggregate these images, which are themselves out of scope of this enhancement.
Member

Where is the operator bundle defined? Some challenges we have faced in operator development is only a subset of resources can be bundled along with the operator, so I'm interested what is supported if the bundle format changes.

Member Author

The operator bundle definition is being defined in a separate openshift enhancement proposal here: #27


*Update the Operator Registry to insert incremental updates*

Add create db, delete, and batch insert APIs to the model layer of operator-registry.
Member

How does the registry survive restart of pods and/or nodes?

Member Author

This is resolved exactly as it works today. This enhancement proposal is suggesting that we continue to use the same catalogsource API that we currently use.


Mitigation:

In that event, we will do a subset of this work. Instead of the operator-registry taking an image, it will take a local directory that has a set of yaml for the bundle. Additionally, we will not add the imagePath optimization to the registry schema.
Member

Will this support more than is supported in grpc bundles today? SRE would like to install Service, ServiceMonitor, PrometheusRules (and probably some other things I'm forgetting) as part of a bundle.

Member Author

This is out of scope of this proposal, as we aren't redefining the bundle here. For now it will support the exact same API as what currently exists.

### Test Plan

Update operator-registry tests to ensure new APIs and DB update commands work as expected.
Additionally, add e2e tests with mocked index containers to ensure that those workflows work as expected.
Member

Destructive tests would be nice. As a user, if I add an operator and a pod is deleted or a node is deleted I want to be sure it won't disappear from OperatorHub.

Member Author

Something like this would be useful, but we aren't redefining how the catalog-operator handles creating registry pods here at all. So something like that would be a separate proposal.

name: example-catalog
namespace: olm
spec:
  sourceType: grpc
Contributor

sourceType grpc? not "image" or something?

Member Author

This isn't a net-new API; I am planning on just reusing the existing type. OLM already supports handling images that host grpc APIs as input, so the intent here is to reuse the existing API that users are already familiar with and just expand on it.


### Non-Goals

- Tools to build the index images or operator manifest images


Providing this tool somehow will be essential to success here. It doesn't need to be a goal of this enhancement, but it needs to be taken up if this is all to work.

Contributor

We have a separate enhancement specifically for tooling (though some will be built as part of this enhancement).


### Implementation Details

The goal of this enhancement is to replace the app-registry implementation in a way that is usable by operator developers and operator author developers. To do this we will build an index image that is defined from a set of operator bundle images that are released independently and optimized in a way that can be productized by operator authors and pipeline developers.


What's the difference between an operator developer and an operator author developer?

Contributor

Looks like it should be operator authors and pipeline developers

Member Author

Yep, this is a typo. I'll fix it in an update.

- outputs: updated database file without latest version of $operatorName included
ex: `operator-registry delete-latest foo example.db`

As a point of context, these operations all output new database files. The intent of creating these commands is to allow the creation of new container images based on historical context of previous versions. These commands will be wrapped in tooling (outside the scope of this enhancement) that will be run as part of build environments or for local development. We are not creating a way for these commands to be run from inside the context of a cluster.
Member

In terms of building and releasing an artifact, and ensuring we have source data available for reproducible builds of that artifact: If I were maintaining an operator registry, would I have example.db directly committed to a git repo, alongside a Dockerfile? And to make changes, I'd run commands such as the above on my laptop, then submit a PR with the updated example.db?

Member Author

The database file is an implementation detail. Previously, this database was always generated by the data in a given namespace on quay.io and generated at runtime. Here, the database will be generated as part of building a container image (through a set of commands) and the image itself is stored in a given registry (quay.io, dockerhub, etc.). The database file itself is part of the filesystem of the image that actually runs the service that hosts the data in the database.

Outside the scope of this enhancement itself there is an initiative to build a set of commands that can generate these index images so that all a developer needs to maintain is the yaml itself to build a bundle (also outside the scope of this enhancement, see #27 for more details on that). The index image described in this enhancement would be managed by a CI system that aggregates a set of bundle container images. To reproduce this image, CI would be able to point to a given set of bundle images and rebuild the index (and therefore the database) from scratch.


The database file is an implementation detail.

Just to be clear, this is part of the implementation details. It might be nice to write down how we expect people to use this.

Member Author

@kevinrizza kevinrizza Oct 1, 2019

Yes, but from the perspective of a source control owner the database file itself is an implementation detail that will not be exposed to the user. The database file itself is described in detail elsewhere in this doc -- which is to say that the file itself is just built inside the container image and not saved or used anywhere.


Member Author

But that isn't how package repositories work today though. A yum repository is essentially a set of XML files that have pointers to the path for rpms along with some metadata about what the package is. The individual rpm includes the source of the installation.

We are essentially making the exact same distinction here. To use the same metaphor

index image == yum repo (which is the subject of this enhancement)
bundle image == rpm (which is the subject of #27)

This index image we are attempting to define is nothing more than a set of pointers to some stuff that can be installed along with some helpful metadata to define how to get it at install time. The concept is actually heavily influenced by createrepo_c which defines a lot of the same concepts. I do not believe that it should be necessary to be able to reliably put the entire set of manifests in a single image for the sake of reproducibility -- I would question the value of how important it is to reproduce something of this nature anyway just because it is an image. I certainly see the value of being able to rebuild the content of this image itself by storing a file which is essentially a list of images it was built from, however.

In reality, if you want to reproduce the yum repository by storing the xml somewhere, you can absolutely do that now. But if someone was to delete an rpm that the yum repo referred to, that would be mostly meaningless since the repository itself is no longer usable.

Member

The yum analogy is a good one. In that case, the RPMs must be in a local directory at the time the metadata files are generated. Once the XML has been created, it gets moved around with all of the RPMs; they stay together as one large release output.

If the RPMs were being retrieved from a live API somewhere, or perhaps even being retrieved from multiple API sources, that would fundamentally change the picture. Then there would need to be a way to snapshot, at a point in time, exactly what set of RPMs that live API was serving. To some extent we have that with brew and koji. They enable you to tag RPMs into a repository build or product release, and then take care to keep the resulting sets immutable. We could make something like that for bundle images. But just as brew is major overkill for most use cases, we likewise do not want to require all distributors of operators to have a sophisticated set of tooling and services to manage snapshots of content served by live APIs.

While the yum world has advanced tooling like brew, koji, and Pulp to help people manage curated repositories over time, there is also the simple and bullet-proof option of putting RPMs in a directory, running createrepo_c, and then keeping the resulting output together as one immutable and reproducible release. We need something that simple for operators.

Member Author

I still question why git repo == safe place to reproducibly store things and container registry == dangerous non reproducible place to store things. In both cases the source is stored and hosted in an API. If we need to ensure those things are secure and will always be available, then we have backup procedures in place to ensure that the content in the registry is maintained. If the concern is that these indexes can be built from multiple sources: that is true, it is possible. It makes developer workflows like "What happens when I add my package to this index and try to install?" possible. But in reality when a production environment is managing these things, the bundles should be stored in a single secure safe place -- a managed and hosted container registry.

We are defining exactly the thing you are describing: operator-registry will get the manifests onto a local repo, run something to insert that data into the repo, and then produce an immutable blob that can be published. That is literally what createrepo_c does. Can you point me to what differences there are?

Member

git is the canonical way to track the provenance of source material. Serving content over live APIs is not dangerous, but a container registry today won't tell you anything about what it was serving yesterday. At Red Hat we've built a lot of tooling to make repositories of container images trackable, immutable, promotable from qa -> stage -> prod, etc. That's all to establish trust via an audit trail. A live API itself is not a signable, auditable artifact, so we've had to create processes and tools around that.

Maybe it would help if you could elaborate on the curation mechanism you have in mind. Suppose a product owner wants to include a specific list of operators together in a catalog and keep them updated over time; where is that list kept, what does it look like, and how is it used? Put another way, what is the single source of truth for what should be put into the catalog, and how does that get used?

Contributor

@ecordell ecordell Oct 3, 2019

This is a very good discussion with lots of good points made. There are a couple of things I'd like to follow up on:

The fact that OLM and operator-registry happen to have an API that allows that is largely a user testing workflow and not something that exists in any production pipeline.

This is how we build the upstream catalog, and the non-appregistry tools were built for more than just testing. The appregistry server was added later to support the existing pipeline, which used appregistry, because we determined it would not be possible to change course at the particular stage of 4.1 development.

The way @mhrivnak is describing the use of operator-registry is exactly what was originally envisioned for operator-registry.

But there are some important points missing from this that changed our approach (and my mind):

  1. We got lots of feedback from users that managing a set of all versions of your manifests was hard. It was easy to make mistakes, get them out of sync, forget to update them, etc. We also had some "chicken and egg" issues, where users wanted to keep their manifests in the same git repo as their operator source.
  2. The embedded icons and the descriptions take up a good chunk of disk space, and get stored for each version of each operator.
  3. There is community interest in storing yaml manifests as images (see: ORAS, CNAB, helm, etc).
  4. The "one big directory" model is a little harder to support external contributors (we can't, for example, easily give a user access to sub-directory of a git repo).

Because of (3), we needed to consider images as a potential source of data. Because of (1), it seemed like that might be a good way to resolve current user frustration, and we'd get (4) and (2) solved for free (even though there are more options for 2).

Then there would need to be a way to snapshot, at a point in time, exactly what set of RPMs that live API was serving.

This snapshotting is essentially what we're doing with this proposal. In case it wasn't called out explicitly: all of the references to images are digest-only. Availability is a concern, though, which I think is a good point.

A live API itself is not a signable, auditable artifact, so we've had to create processes and tools around that.

OCI/Docker registries and the artifacts they host are auditable and signable (there are perhaps too many standard ways to do this), so I think Kevin's question is reasonable.

But you are making a good point that all of Red Hat's provenance tracking is git-based and likely will be for some time.

I also think that, since the file exists, users will treat it as an "API". One of the ways they can do that is to store it in a git repo and track changes to it there, and then load it up via the registry server.

One nice thing is that we can actually do both 😄

  • Since we need to migrate, we will continue to support "direct" bundles (not stored in images). We could choose to never drop support.
  • If it's valuable, we could provide "all-in-one" registry images, or tooling to make them. It's a straightforward process to take bundle images, extract them, and store them.

1. Create a pod spec that uses an init container to inject a binary into the manifest image.
2. Run the manifest image with that binary that writes to a configmap in a well known location
3. Read the configmap data
4. On return kill the pod
Member

Maybe a Job would be a good fit? It's expected to exit, and it may be useful with handling unexpected failures.

@ecordell
Contributor

ecordell commented Oct 1, 2019

@jwforres @shawn-hurley This is ready for a final pass, it looks good to me.

interval: 1m
```

Some open questions about the actual poll implementation: In OpenShift, for many of the same reasons that OLM cannot directly pull the operator bundle image, OLM cannot query for new shas on a given tag directly. In order to accomplish this, we will need to create a deployment process that has an implicit understanding of what images are available (for example, we may use a rollout to force the deployment to update on the timer -- with a pull policy of always this will result in a new pod that is on the latest image when there are updates).


Can we please write down how we are going to solve this?

Member Author

Updated the doc to specify.
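
For context, a rough sketch of the trivial polling approach described above: keep the registry deployment pointed at a tag with `imagePullPolicy: Always`, and bump a pod-template annotation on the poll interval so the rollout re-resolves the tag. Names and the annotation key are illustrative only:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-catalog-registry
  namespace: olm
spec:
  replicas: 1
  selector:
    matchLabels:
      catalog: example-catalog
  template:
    metadata:
      labels:
        catalog: example-catalog
      annotations:
        # Hypothetical annotation OLM would update on the poll timer to
        # force a new pod that re-pulls the tag.
        catalogsource.operators.coreos.com/last-poll: "2019-10-04T00:00:00Z"
    spec:
      containers:
      - name: registry-server
        image: quay.io/operator-framework/community-operators:latest
        imagePullPolicy: Always
```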

- Update polling details with initial trivial implementation
- Fix additional typos
`operator-registry add --batch`
- inputs: $operatorBundleImagePath, $dbFile
- outputs: updated database file
ex: `operator-registry add "quay.io/community-operators/foo:sha256@abcd123,quay.io/community-operators/bar:sha256@defg456" example.db`


For the downstream builds, how is this command going to be run inside the image?

I understand that we are going to wrap in tooling, but making sure that the underlying thing can be used by downstream is important.

Specifically, I am worried about prod image. This seems that the tool will pull the image, but we would not have published the production image (registry.redhat.io) yet when building this. Is there some other mechanism to do this?

@ralphbean

Member Author

Yes, the operator-registry will use container tooling to pull the image down, inspect the manifests, and build the database.

I'm confused about the comment:

Specifically, I am worried about prod image. This seems that the tool will pull the image, but we would not have published the production image (registry.redhat.io) yet when building this. Is there some other mechanism to do this?

What do you mean by this? The index build would expect that the images it is being built from do already exist. Otherwise whatever CI process that decides to build a new index won't have any idea what to build. How else could you build something from something else without the first thing already existing? Or am I misunderstanding?


So

Yes, the operator-registry will use container tooling to pull the image down, inspect the manifests, and build the database

  1. I don't think that the above will work. Brew does not have access to the internet, so when creating this image, wouldn't it need access to the internet? Would there be a way of saying use this internal registry image here but name it X as we do with OMPS on a push to application registries?
  2. inspecting the images by pulling the tarball and expanding the image into the filesystem?

Otherwise whatever CI process that decides to build a new index won't have any idea what to build. How else could you build something from something else without the first thing already existing? Or am I misunderstanding?

I am worried that this might not be true -- that we would want to pre-stage this image with the other to-be-delivered prod images. We could do a separate thing, but I think it would require much more CI work from the pipeline/RCM side and I wanted to make sure that everyone is aware of what we are thinking will happen here.

Comment on lines +162 to +165
`operator-registry add`
- inputs: $operatorBundleImagePath, $dbFile
- outputs: updated database file
ex: `operator-registry add quay.io/community-operators/foo:sha256@abcd123 example.db`

What's the behavior when an existing, indexed bundle is getting updated and supplied to the add command -- is there some sort of INSERT ON DUPLICATE KEY UPDATE behavior?
Background: it's going to be fairly hard for a pipeline to determine whether the bundle that just got pushed is a net-new version or replacing an existing version.

Member Author

@kevinrizza kevinrizza Oct 2, 2019

The behavior is that it will fail. This is the same existing enforcement that currently exists about not being able to easily update an existing version -- once that version is released into the wild modifying it is problematic because what happens when someone has already installed that version? Instead if another change is required just bump the version again.

If there is some more serious reason why that version needs to be removed entirely, then the safety valve is a manual process of deleting that bundle image version and either modifying the index by deleting the version and then adding the new one OR deleting the bundle image version and rebuilding from scratch.


Thanks. That also means that any cosmetic changes, e.g. typos in descriptions, or logos, will lead to an updated bundle, which leads to an updated version, which leads to updates happening on cluster. Unless we modify / recreate the index manually as you describe.
Just making sure we acknowledge this. From my side this is fine.

@mhrivnak
Member

mhrivnak commented Oct 2, 2019

This proposal seems to assume that if the catalog image contained all the manifests, it would become too large, thus this proposal splits the manifests into separate artifacts from a slimmed-down index. This begs the question: how many operator releases could we reasonably fit into one image with their manifests? At what point does the data become so big that we need to split it apart?

To get a data point, I looked at the source for operatorhub.io. It has 77 operators with a total of 178 operator releases (counted by CSVs). I tarred the whole thing up, and it came to 7.3MB uncompressed. When compressed with bzip2, it goes down to 1.8MB.

That's an average of 10KB per release if we compress the data.

Even if we put 10,000 operator releases into a catalog source, if we compressed their manifests, that data would only amount to 96MB. Not bad.

So what is our requirement for how many operator releases need to be in a single catalog source? Keep in mind that they do not need to grow indefinitely. For example, most projects and products do something like semantic versioning and could happily make a unique catalog source image per y-release. Upgrading from the 1.2 stream to the 1.3 stream would involve changing your catalog source to reference a different image. Bugfix releases would land in each stream's corresponding catalog source. That's quite similar to how rpm and deb repositories tend to be used.

A use case like operatorhub.io could publish the last 6 months or 12 months worth of releases in the default catalog source image. If someone needed an older release, just make those older images still available for use.

Member Author

@kevinrizza kevinrizza left a comment

This proposal seems to assume that if the catalog image contained all the manifests, it would become too large, thus this proposal splits the manifests into separate artifacts from a slimmed-down index. This begs the question: how many operator releases could we reasonably fit into one image with their manifests? At what point does the data become so big that we need to split it apart?

To get a data point, I looked at the source for operatorhub.io. It has 77 operators with a total of 178 operator releases (counted by CSVs). I tarred the whole thing up, and it came to 7.3MB uncompressed. When compressed with bzip2, it goes down to 1.8MB.

That's an average of 10KB per release if we compress the data.

Even if we put 10,000 operator releases into a catalog source, if we compressed their manifests, that data would only amount to 96MB. Not bad.

So what is our requirement for how many operator releases need to be in a single catalog source? Keep in mind that they do not need to grow indefinitely. For example, most projects and products do something like semantic versioning and could happily make a unique catalog source image per y-release. Upgrading from the 1.2 stream to the 1.3 stream would involve changing your catalog source to reference a different image. Bugfix releases would land in each stream's corresponding catalog source. That's quite similar to how rpm and deb repositories tend to be used.

A use case like operatorhub.io could publish the last 6 months or 12 months worth of releases in the default catalog source image. If someone needed an older release, just make those older images still available for use.

Couple of things to note here:

10,000 operator releases is actually a relatively small number of releases. Currently we have 77 operators; if we just 10x that number over the next year and each has 10 releases, then we are already on the order of magnitude of that estimate. If we actually have 10,000 operators and each has 10 versions, we have 100,000, which makes an image on the order of almost 1GB. If that image needs to be streamed every 60 seconds onto a cluster, even a 100MB file is not particularly sustainable.

We built a chart that defines some of the constraints we are considering:

| # operators | Full CSV, no compression |
|-------------|--------------------------|
| 10          | 200 KB                   |
| 100         | 3.4 MB                   |
| 1000        | 33.5 MB                  |
| 10000       | 300 MB (estimated)       |
| 100000      | 3 GB (estimated)         |

Additionally, some of the complexity arises from the way the OLM operator currently requires the entire history of the graph in order to understand and perform updates. It is true that some mechanism for putting subsets of version graphs onto clusters is possible, but that would also require some significant thought around the release engineering process. This enhancement is an attempt at simplifying the size of these things so that those constraints are not an immediate concern -- by defining a method of essentially listing a set of pointers to the content rather than shipping all the content directly.
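
To make the "set of pointers" idea concrete, a single entry in the slimmed-down index would carry little more than the following (purely illustrative; the actual storage is the operator-registry sqlite database):

```yaml
# Illustrative record only -- not a real API or file format.
package: foo
channel: stable
csvName: foo.v1.2.0
version: 1.2.0
replaces: foo.v1.1.0
# Pointer to the bundle image holding the full manifests.
bundlePath: quay.io/community-operators/foo@sha256:abcd123
```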


All the metadata that drives the on-cluster UI and install flow needs to be pulled by the cluster in order to display everything. Therefore it may be better to just build the index as a startable and queryable container that serves the content rather than a list of pointers to other content that needs to be pulled at runtime. There are some reasons to build the index as a simple list, largely stemming from a need for CI and build tools to build these indexes very rapidly. However, if the tools to generate the index are sufficiently well designed then bottlenecks can be avoided.

The format then looks something like the operator-registry database that exists today, but with only minimal information about every single container. The db schema will have a layer of "latest" for each operator name for all the metadata that drives the OpenShift console (descriptions, icon references, etc.). Then all operator image pointers will aggregate just the metadata that drives install and upgrade (version, channels, dependencies, owned crds). To add additional updates to the database, the database just needs to be inserted/batch updated with changes rather than rebuilt from scratch.
Contributor

... but with only minimal information about every single container.

"every single container" is a little confusing to me; which containers, are they running?. If we're talking about images that contain operators (manifests/metadata), then I think "but with only the metadata required to drive <resolution, ui, etc..> for each operator in the index." is slightly more clear.

Member Author

you're right, this is unclear. at the very least it should have been every single container image. will update

When I add an operator index to my cluster, I am able to get information about the operators defined in that index.

Internal detail note:
Create an image type CatalogSource that, when added to a cluster with a reference to an index image, pulls the index manifest down and generates the operator metadata database for that operator.
Contributor

nit: image is already a CatalogSource type -- perhaps we could use index image instead

Member Author

The intention here was to reuse that type specifically so that we do not need to create a new one. Since in both cases they result in a database serving the catalog, there is no need to create a separate type; the implementation remains the same. In fact, adding the polling implementation for this enhancement should also make polling work for the current full-size registry images.


As an OpenShift administrator, I want a way to install operators made available to me from my Operator index.
- So that I can view what operators are available to install
- So that I can view metadata about that operator (description, author, etc) on the OpenShift console
Contributor

Suggested change
- So that I can view metadata about that operator (description, author, etc) on the OpenShift console
- So that I can view metadata about available operators (description, author, etc) on the OpenShift console

Comment on lines +145 to +148
1. Takes a set of manifest files and outputs a database file.
2. Takes a database file and serves a grpc api that serves content from that db
3. Takes a configmap reference, builds a db and serves a grpc api that serves content from that db
4. Takes an app-registry path/namespace reference, builds a db and serves a grpc api that serves content from that db
Contributor

3 and 4 are for specific variants of operator-registry, while 1 and 2 are abstract and apply to all variants. The way this is currently written, I'm expecting them to be a set of successive steps rather than features.

Member Author

So, to be clear, these are all directly translated from the four CLI binaries built from the operator-registry today:

  1. initializer
  2. registry-server
  3. configmap-server
  4. appregistry-server

Agreed that 3 and 4 are specific implementations of 2, but that is the distinction that exists in terms of today's tooling.

@mhrivnak
Member

mhrivnak commented Oct 3, 2019

To have 10,000 operators would be quite a sight. I think that would be very optimistic; @dmesser how many operators do you think we need to be prepared for on operator hub (I assume that's the use case we're talking about) in 1-2 years?

For a comparable data point; helm has a ton of mind share, lots of promotion at the corporate level and from CNCF, and it's crazy-easy to make a chart; for as long as they've been in the game and as little investment as it takes to create a chart, Helm Hub currently has 850 charts. That's a lot, and they came from many different contributors over the course of years.

Speaking of helm hub, they moved to a distributed model for repositories, which I think we should consider carefully. The lesson they learned by operating a monolithic repo is that not all 854 charts have to exist in one repository, and there are many advantages to breaking that apart. Whether we have 70 or 10,000 operators, they likewise do not need to exist in one catalog image. Helm made it very easy for a chart author to host their own repository, and they solve the aggregation problem at the hub. We should likewise make it as easy as possible for someone to make and host their own catalog image. I do not think it's worth complicating the structure of an individual repository for the sake of supporting 10,000+ operators in a single repository.

I'm not saying we should run off and change how operator hub works right now. But if it grows to a big enough size (whatever we decide that is), it will make sense to split it out into many catalog images and let the website aggregate search across them.

If that image needs to be streamed every 60 seconds onto a cluster

If we need to download the entire catalog every 60 seconds, that sounds like its own problem we should solve. What needs to happen for us to avoid that?

Is this a product of the assumption that OLM itself cannot directly interact with remote registries? How fixed is that assumption, and might we be able to relax it? Perhaps there could be a middle-ground where we use ImageStreams if available, or otherwise directly interact with container image registries. Or perhaps we could just enable OLM to talk to registries all the time. Or ... ?

Additionally, some of the complexity arises from the way the OLM operator currently requires the entire history of the graph in order to understand and perform updates. It is true that some mechanism for putting subsets of version graphs onto clusters is possible, but that would also require some significant thought around the release engineering process.

@ecordell I thought the requirement to have the entire graph history available was going away? That's what I understood from the z-stream strategy discussions.

FROM quay.io/operator-registry-builder/builder:latest
COPY /build/bin/operator-registry /operator-registry

RUN operator-registry add "quay.io/community-operators/foo:sha256@abcd123,quay.io/community-operators/bar:sha256@defg456" exampledb.db


We have an open question/risk with this, correct? We do not know if it is possible to pull an image during image creation. Can we be more clear that this process may need to change as we fill in gaps?

Member Author

@shawn-hurley I spoke with @ralphbean and he seemed to think that this process of pulling images should be okay for the current pipeline systems. I can make a note of the fact that we may need to provide some alternative solutions for those internal pipelines (building the database upfront then passing that db file around, for example) but that we are not promoting that as a primary workflow.

@kevinrizza
Member Author

kevinrizza commented Oct 4, 2019

@mhrivnak

To have 10,000 operators would be quite a sight. I think that would be very optimistic

We don't need 10,000 operators for that size, we need 10,000 CSVs, which could just mean 500 operators with an average of 20 released versions each. With the way that we release CVEs, that is not a very long-term or unrealistic assumption at all.

Furthermore, a main concern of this proposal is that our current implementation is using a hard-to-support private API (app-registry) built inside Quay.io. We want the community to adopt our implementation, so we are attempting to propose the use of a common toolset (in this case container tooling) as a means of metadata storage.

If we need to download the entire catalog every 60 seconds, that sounds like its own problem we should solve. What needs to happen for us to avoid that?

This very proposal is attempting to solve that by instead pulling down a list of pointers instead of the metadata itself. We are redefining what a "catalog" looks like by just having a set of references to what operators can be installed, then pulling them down at runtime. If we make that catalog slim enough then polling for the whole update graph should not be an issue even with large scale.

Is this a product of the assumption that OLM itself cannot directly interact with remote registries?

We have had numerous discussions around what OLM can and cannot do in regards to interacting with image registries directly (I believe @jwforres can clarify if you would like more specific details, and see above comments around ImageStreams as well), but for now we are attempting to make something that works the same with upstream vanilla Kubernetes as well. Given the restrictions that OpenShift has right now in regards to how we can interact with images, the current polling implementation is relatively limited, and being able to inspect to see if a new sha is available would simplify it somewhat. However, I do not believe that is the only reason at the moment for us to maintain these single large catalogs. There are concerns around release process, dependency graph ownership (specifically, who owns a given CRD is a concern that the large catalog is attempting to solve), and upgrade graphs (how do we handle upgrades for a cluster that is running a very old version of an operator, for example).

@shawn-hurley

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 4, 2019
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kevinrizza, shawn-hurley

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 4, 2019
@openshift-merge-robot openshift-merge-robot merged commit 8262139 into openshift:master Oct 4, 2019