Requirements: Search #71

Open
SteveLasker opened this issue Oct 3, 2019 · 13 comments
Labels
enhancement New feature or request

Comments

@SteveLasker
Contributor

SteveLasker commented Oct 3, 2019

OCI Artifact Search Requirements

As registries support multiple artifact types, a search/catalog API that supports filtering on the artifact type will be needed.

The Docker v1 registry spec supported Docker Search. While some vendors, like Quay.io, implemented the v1 search API, the majority of vendors require the v2 registry API, which dropped search.

We believe revisiting the search API will support client CLIs that span registries, such as helm search, duffle search (CNAB), docker search, and other evolving artifact types.

By supporting a common search API across all registries, users could consistently use these new artifact CLIs across all registries.

This issue focuses on capturing the requirements for new Search and Eventing APIs. As the requirements are agreed upon, we'll move to a spec that captures the requirements.

KubeCon 2019 EU Notes: OCI Catalog Listing APIs

Use Cases

Search is a generic capability used across several different use cases.

Tool Specific Searches

Helm, Singularity, Docker, OPA, CNAB and other tools will need to query their specific artifact types across various registries.
Example: The Helm CLI would need to query a registry for charts that match a specific name. The result should return only Helm artifacts.

helm search demo42.azurecr.io hello-world

Results
--------------------------------------
samples/hello-world
marketing/products/hello-world-sample
dev/prototypes/sample-hello-world

Version specific searches:

helm search --versions demo42.azurecr.io samples/hello-world

Results
--------------------------------------
samples/hello-world   1.0
samples/hello-world   1.1
samples/hello-world   1.2
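
As a rough illustration of the tool-specific search above, the sketch below filters an in-memory index by name and artifact type. The index shape, the `search` function, and the `samples/hello-world-image` entry are assumptions made for the example; nothing here is a proposed API.

```python
HELM_CHART = "application/vnd.cncf.helm.chart.v1+json"
OCI_IMAGE = "application/vnd.oci.image.manifest.v1+json"

# Hypothetical in-memory index of (repository, media type) pairs.
INDEX = [
    ("samples/hello-world", HELM_CHART),
    ("marketing/products/hello-world-sample", HELM_CHART),
    ("dev/prototypes/sample-hello-world", HELM_CHART),
    ("samples/hello-world-image", OCI_IMAGE),  # invented entry, not from the issue
]


def search(index, term, media_type):
    """Return repositories whose name contains `term` and whose type matches."""
    return [repo for repo, kind in index if term in repo and kind == media_type]


for repo in search(INDEX, "hello-world", HELM_CHART):
    print(repo)
```

A Helm client would pass its own media type, so the container image in the same registry never shows up in its results.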

Registry Specific Search

Users want to query registries for the artifacts that match a specific name or list artifacts within a given path. In this case, the results contain multiple artifact types.

Today, registries have created unique client APIs and server APIs. Until we have a generic registry client, it's expected registries will have vendor specific APIs. However, having common registry server side APIs expands the possibility for common tooling across registries.

A registry search API would include

  • repo listings
  • tag/version listings
  • limit by artifact type
  • query by date range, such as what's changed/added since a given timestamp
  • as results may be paged, sorting the results by name and/or version with ascending and descending options
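
To make the list above concrete, here is a minimal sketch of how a client might express those options as a query. The `/v2/_search` path and every parameter name are invented for illustration; no spec defines them.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass
class SearchQuery:
    """Hypothetical query parameters; none of these names come from a spec."""
    name: Optional[str] = None           # substring to match in the repo name
    artifact_type: Optional[str] = None  # media type to limit results to
    since: Optional[str] = None          # timestamp lower bound for changes
    sort: str = "name"                   # "name" or "version"
    order: str = "asc"                   # "asc" or "desc"
    page_size: int = 100                 # number of results per page

    def to_url(self, registry: str) -> str:
        params = {
            "name": self.name,
            "artifactType": self.artifact_type,
            "since": self.since,
            "sort": self.sort,
            "order": self.order,
            "n": self.page_size,
        }
        # Drop unset filters so the URL only carries what the caller asked for.
        set_params = {k: v for k, v in params.items() if v is not None}
        # "/v2/_search" is an invented path used only for illustration.
        return f"https://{registry}/v2/_search?{urlencode(set_params)}"


query = SearchQuery(name="hello-world",
                    artifact_type="application/vnd.cncf.helm.chart.v1+json")
print(query.to_url("demo42.azurecr.io"))
```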

Existing examples

ACR list repo example:

Without a common search/catalog API, cloud vendors have had to implement vendor specific experiences:

az acr repository list -n demo42

Name                         
-----------------------------
samples/demo42/queueworker   
samples/demo42/quotes-api    
samples/demo42/web           
samples/demo42/deploy/chart  
samples/demo42/deploy/cnab   
samples/demo42/deploy/arm    

ACR list tags example, w/ future type added:

az acr repository show-tags -n demo42 --repository samples/demo42/deploy/chart

Result  Type
-------------
1.0     helm-chart
1.1     helm-chart
1.1.1   helm-chart
2.0     helm-chart
3.0     helm-chart

A repo could contain multiple artifact types

az acr repository show-tags -n demo42 --repository samples/demo42/deploy

Result       Type
------------ ----------------
helm-1.0     helm-chart
helm-1.1     helm-chart
helm-1.1.1   helm-chart
cnab-1.0     cnab
arm-1.0      arm

Rather than each registry vendor having to offer unique APIs, the goal would be to offer a common API.

Registry Tool Search - Scanners

Vendors and the community have attempted to build tools atop registries.

Without a common search/catalog API, these tools must work with individual images.

Among the most common registry tools are image scanners such as Aqua, Twistlock, NeuVector and Clair.
While the scanning tools protect runtime nodes, they all pre-scan registries to understand image vulnerabilities before the images are run.

Scanners evaluate images in registries with a combination of a search/catalog API and events.

These vulnerability scanners need the following:

  • list all repos and tags for the initial scan evaluation
  • get paged results as they may contain thousands of images
  • periodically list all new and updated images and tags, to keep a registry up to date
  • register for events to scan images as they arrive. Possibly using The Container Quarantine Pattern
  • filter, or at least understand the different artifact types
  • as new CVEs are found, re-scan the registry

Today, scanners assume all artifacts in a registry are a container image. As a registry stores new artifact types, scanners will either need to know how to scan these new artifacts, or at least filter the results to artifacts they support.
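
The filtering a scanner would need could look roughly like the sketch below. It assumes each listing entry carries a `mediaType` field, which is a made-up record shape, and that the scanner only handles the two OCI image media types.

```python
# Media types this hypothetical scanner knows how to scan.
SUPPORTED_MEDIA_TYPES = {
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.oci.image.manifest.v1+json",
}


def partition_scannable(artifacts):
    """Split a listing into artifacts the scanner supports and ones it skips."""
    scannable, skipped = [], []
    for artifact in artifacts:
        if artifact["mediaType"] in SUPPORTED_MEDIA_TYPES:
            scannable.append(artifact)
        else:
            skipped.append(artifact)
    return scannable, skipped


# Illustrative listing; the "ref" and "mediaType" keys are assumptions.
listing = [
    {"ref": "samples/demo42/web:1.0",
     "mediaType": "application/vnd.oci.image.manifest.v1+json"},
    {"ref": "samples/demo42/deploy/chart:1.0",
     "mediaType": "application/vnd.cncf.helm.chart.v1+json"},
]
to_scan, to_skip = partition_scannable(listing)
print([a["ref"] for a in to_scan])
```

Keeping the skipped list, rather than dropping it silently, lets a scanner report which artifacts it did not understand.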

Artifact Types

A registry must know the types it hosts for it to provide meaningful search results.
Artifact types will be internally identified by an expanded set of OCI Media Types.

However, displaying application/vnd.cncf.helm.chart.v1+json does not make for a good user experience. To provide clean user experiences, a list of artifact types, a short description, and info on the artifact tooling will be maintained.

Media Type Short Names

Media Type                                  Display Name  Info
------------------------------------------  ------------  --------------------------
application/vnd.oci.image.index.v1+json     OCI Image     Docker *
application/vnd.oci.image.manifest.v1+json  OCI Image     Docker *
application/vnd.cncf.helm.chart.v1+json     Helm          Helm
application/vnd.oci.cnab.index.v1+json      CNAB          Duffle, Docker-application

* most registry providers automatically convert oci.image manifests to the format requested by the client.
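
A client could apply such a short-name list with a simple lookup, sketched below. Falling back to the raw media type for unknown entries is an assumption of this example, not something the requirements state.

```python
# Display names taken from the Media Type Short Names list.
DISPLAY_NAMES = {
    "application/vnd.oci.image.index.v1+json": "OCI Image",
    "application/vnd.oci.image.manifest.v1+json": "OCI Image",
    "application/vnd.cncf.helm.chart.v1+json": "Helm",
    "application/vnd.oci.cnab.index.v1+json": "CNAB",
}


def display_name(media_type: str) -> str:
    """Return the short name, or the raw media type when none is registered."""
    return DISPLAY_NAMES.get(media_type, media_type)


print(display_name("application/vnd.cncf.helm.chart.v1+json"))
```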

Registry Search Requirements

Listing repos

Listing artifacts

Listing versions

Filtering by artifacts

Filtering by date ranges

Search queries may specify date ranges, enabling the return of artifacts that have been created or changed since a given date:time

  • date:time filters MUST be supported on manifests and tags.
  • Registry operators MAY add additional value by parsing manifest.config objects. This gives a registry the option to add value without burdening all registries with parsing every config object of every artifact type.
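
A date filter over manifest and tag timestamps might be sketched as follows. The `updated` field and the inclusive comparison are assumptions; the requirements do not say which timestamp a registry records or whether the bound is inclusive.

```python
from datetime import datetime, timezone


def changed_since(entries, since):
    """Return entries whose recorded timestamp is at or after `since`."""
    return [e for e in entries if e["updated"] >= since]


# Illustrative records; the "ref" and "updated" keys are hypothetical.
entries = [
    {"ref": "samples/hello-world:1.0",
     "updated": datetime(2019, 9, 1, tzinfo=timezone.utc)},
    {"ref": "samples/hello-world:1.1",
     "updated": datetime(2019, 10, 3, tzinfo=timezone.utc)},
]
recent = changed_since(entries, datetime(2019, 10, 1, tzinfo=timezone.utc))
print([e["ref"] for e in recent])
```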

Paging

Results may be paged so that clients can retrieve a full list of artifacts.
The default page size is 100, with the ability to change the page size.
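
One way to picture paging is a cursor that names the last item already returned. The sketch below works on an in-memory list; the `page` and `list_all` helpers and the cursor scheme are assumptions for illustration, not a proposed wire format.

```python
def page(names, page_size=100, last=None):
    """Return one page of sorted names after `last`, plus the next cursor."""
    ordered = sorted(names)
    if last is not None:
        ordered = [n for n in ordered if n > last]
    chunk = ordered[:page_size]
    # A cursor is only returned when more results remain after this page.
    next_last = chunk[-1] if len(ordered) > page_size else None
    return chunk, next_last


def list_all(names, page_size=100):
    """Walk every page until the cursor runs out."""
    results, last = [], None
    while True:
        chunk, last = page(names, page_size, last)
        results.extend(chunk)
        if last is None:
            return results


repos = [f"repo{i:03d}" for i in range(250)]
first_page, cursor = page(repos)
print(len(first_page), cursor)
```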

Sorting

As results may be paged, being able to sort provides the ability to get the top n results, based on a given sort order. Sorting includes ascending and descending.
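
A top-n query under a given sort order could be sketched like this. The record keys are hypothetical, and versions are compared as plain strings, which only works for simple cases such as the ones shown; how versions should really be ordered is left open in this issue.

```python
def top_n(results, n, key="name", descending=False):
    """Sort by the given key and keep the first n results."""
    return sorted(results, key=lambda r: r[key], reverse=descending)[:n]


# Illustrative rows; "name" and "version" are assumed field names.
rows = [
    {"name": "samples/hello-world", "version": "1.1"},
    {"name": "samples/hello-world", "version": "1.0"},
    {"name": "samples/hello-world", "version": "1.2"},
]
newest_two = top_n(rows, 2, key="version", descending=True)
print([r["version"] for r in newest_two])
```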

Role Based Access Control

Search results shall be limited to the artifacts the user has read access to. The user may be a person or service account. The spec shall not define specific rights or roles for how authorization should be implemented or managed; rather, it shall simply state that the registry must be cognizant of security and support the security of its product and/or platform. If a user has read access to repo1 and repo3, but not repo2, the repository listing should only return repo1 and repo3. The spec will not define a role where read differentiates between management and data operations.
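
The repo1/repo3 example can be sketched as a filter over the listing. How the set of readable repositories is computed is deliberately left out, since the requirements leave the authorization model to each registry; the function name and inputs are assumptions.

```python
def visible_repositories(all_repos, readable):
    """Keep only repositories the caller can read, preserving listing order."""
    return [repo for repo in all_repos if repo in readable]


# The caller can read repo1 and repo3, but not repo2.
print(visible_repositories(["repo1", "repo2", "repo3"], {"repo1", "repo3"}))
```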

@jonjohnsonjr
Contributor

A lot of this seems more about listing than searching. We really need to figure out what that will look like. I had a strawman here that suffered a painful death by bikeshedding.

as results may be paged, sorting the results by name and/or version with ascending and defending options

descending

What's a "version" here, from a registry perspective? A tag or digest? Just a tag? A new thing?

Example: The helm cli would need to query a registry for charts that match a specific name. The result should return helm only artifacts.

This might be a pain. The artifacts stuff works because registries can be (mostly) agnostic to the contents of what's being distributed. The more stuff we require to be indexed, the less flexible we make the distribution API.

Search queries may specify date ranges, enabling the return of artifacts that have been created or changed since a given date:time

Artifact creation from the client's perspective (i.e. created time) or when it was pushed?

Search results shall be limited to the artifacts the user has read access control. The user may be a person or service account.

The spec doesn't currently speak to access models, so I'm not sure how appropriate this is.

@SteveLasker
Contributor Author

Thanks @jonjohnsonjr ,
As I reviewed the HackMD doc while posting here, I realized we had a much better conversation structure in the KubeCon 2019 EU Notes: OCI Catalog Listing APIs

This topic should, IMO, cover listing, search and eventing as these are all required to meet the scenarios.

What's a "version" here, from a registry perspective? A tag or digest? Just a tag? A new thing?

Good point. I suppose we would need both.

Example: The helm cli would need to query a registry for charts that match a specific name. The result should return helm only artifacts.

This might be a pain. The artifacts stuff works because registries can be (mostly) agnostic to the contents of what's being distributed. The more stuff we require to be indexed, the less flexible we make the distribution API.

The premise of artifacts means all objects in a registry have a unique manifest.config.mediaType. To your point, search would likely incorporate top level data, such as :tag, digests and annotations. I think you're referring to information stored in the manifest.config. I'd suggest it would be up to the registry operator to decide what "value add" they wish to provide. If gcr wanted to surface additional search metadata which is parsed from configs of specific artifacts, that would be cool. But, I would imagine the search spec would say this was optional. Annotations, :tags, manifests would be required.

Search queries may specify date ranges, enabling the return of artifacts that have been created or changed since a given date:time

Artifact creation from the client's perspective (i.e. created time) or when it was pushed?

This is an interesting one where the value is stored in a config, unique to the OCI Image artifact type.
This goes to the conversation above, related to parsing config objects. I'd probably still say the spec would use MUST for the manifest and tag dates, while making config values an optional value add.

Search results shall be limited to the artifacts the user has read access control. The user may be a person or service account.

The spec doesn't currently speak to access models, so I'm not sure how appropriate this is.

One of the complaints I've heard about implementing _catalog consistently was it didn't address returning information the user doesn't have access to. I'm not suggesting we spec a specific auth model. Rather, stating results must match the rights of the user. I'd go further to say how rights/roles are defined should not be specified, rather leave it fairly high level, enabling registry operators the freedom to implement models that align with their products or platforms.

defending descending

I'll fix, thanks

@rchincha
Contributor

Has a GraphQL [1] endpoint been considered for queries?

[1] https://en.wikipedia.org/wiki/GraphQL

@SteveLasker
Contributor Author

Suggest adding a label for vNext to avoid this being incorporated into the v1 scope.

@vbatts vbatts added the enhancement New feature or request label Mar 4, 2020
@vbatts
Member

vbatts commented Apr 1, 2020

After having thought about the extensions proposal #111, I'm wondering whether this ought to be an extension that registries could choose to implement.

@mikebrow
Member

mikebrow commented Apr 2, 2020

+1 on extension

@SteveLasker
Contributor Author

SteveLasker commented Aug 30, 2021

Just bumping a few links as searching/discovering and indexing continues to come up:

@jonjohnsonjr
Contributor

continues to come up

Where?

@SteveLasker
Contributor Author

Helm, Bicep, Wasm and others that are trying to utilize registries as their package management, so they don't have to build one.

@rchincha
Contributor

rchincha commented Oct 25, 2021

As a data point, the zot project has used GraphQL [1] to help with this [2], [3].
One caveat: the client needs to be aware of the GraphQL schema.

[1] https://en.wikipedia.org/wiki/GraphQL
[2] https://github.com/anuvu/zot#listing-images
[3] https://github.com/anuvu/zot/blob/main/pkg/extensions/search/schema.graphql#L57

@rchincha
Contributor

rchincha commented Dec 5, 2022

An update on this, now more formalized as an OCI dist-spec extension:
https://github.com/project-zot/zot/blob/main/pkg/extensions/search/search.md

discoverable via:
https://github.com/opencontainers/distribution-spec/tree/main/extensions
https://github.com/opencontainers/distribution-spec/blob/main/extensions/_oci.md

@rchincha
Contributor

rchincha commented Jan 12, 2024

Any interest in reviving this conversation, now that we are converging towards adding the ability to store image and non-image artifacts in OCI conformant registries?

Can try to send out a draft proposal (roughly along the lines of what zot currently has)

rchincha added a commit to rchincha/distribution-spec that referenced this issue Jan 17, 2024
opencontainers#71

Signed-off-by: Ramkumar Chinchani <rchincha@cisco.com>
rchincha added a commit to rchincha/distribution-spec that referenced this issue Jan 17, 2024
opencontainers#71

Signed-off-by: Ramkumar Chinchani <rchincha@cisco.com>

5 participants