Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-2985: Public KRM Functions Registry #2986

Merged
merged 7 commits into from
Oct 12, 2021
Merged

Conversation

mengqiy
Copy link
Member

@mengqiy mengqiy commented Sep 22, 2021

  • One-line PR description: adding new KEP for public KRM functions registry
  • Other comments:

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/cli Categorizes an issue or PR as relevant to SIG CLI. labels Sep 22, 2021
keps/sig-cli/2985-public-krm-functions-registry/README.md Outdated Show resolved Hide resolved
keps/sig-cli/2985-public-krm-functions-registry/README.md Outdated Show resolved Hide resolved
Publishers are responsible for the security of their KRM functions. Publishers
are responsible for clearly communicating the expectation (e.g. maturity) to
their users (e.g. through a file for the publisher. We should standardize it
eventually). For example, Kustomize wants to provide a small set of carefully
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm missing something about what you mean here, we already have a means of indicating maturity: each function config's GV. I wouldn't expect a publisher's entire catalog to have a single maturity level.

_kustomize/experimental_ for the latter.

If some functions don't have any publishers, the users should use it at their
own risk.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the use case for anonymous publishers as opposed to the "experimental" SIG publisher above? Even if the publisher is an individual Github user, I feel they should still be identified as responsible for the function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maintainers are responsible for the functions they contributed no matter even when they are not published with a publisher.

keps/sig-cli/2985-public-krm-functions-registry/README.md Outdated Show resolved Hide resolved
It should be sponsored by SIG-CLI.

The repo name can be `KRM-functions`, `KRM-function-catalog`,
`KRM-function-registry` or something more reasonable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I like the first and last options, since this will generate more than one catalog.

Copy link
Contributor

@natasha41575 natasha41575 Sep 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the website is going to be krm-fn-registry.sigs.k8s.io, I prefer KRM-function-registry or KRM-fn-registry for consistency.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that we should align the domain and repo name, which implies using lower kebab case. We can mark the exact name as unresolved for now.

keps/sig-cli/2985-public-krm-functions-registry/README.md Outdated Show resolved Hide resolved
keps/sig-cli/2985-public-krm-functions-registry/README.md Outdated Show resolved Hide resolved
keps/sig-cli/2985-public-krm-functions-registry/README.md Outdated Show resolved Hide resolved
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
catalogs:
- https://krm-fn-registry.sigs.k8s.io/catalog.yaml?publisher=kustomize
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear to me why I need to specify the global registry. I generally would err on the side of providing reasonable defaults. Having to provide obvious stuff is tedious, discourages quick creation of kustomization.yaml files. The next thing we get is a request from someone to create a kustomization.yaml with all the boilerplate.

keps/sig-cli/2985-public-krm-functions-registry/README.md Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 24, 2021
@KnVerey
Copy link
Contributor

KnVerey commented Sep 24, 2021

Since the way the Registry generates Catalogs will determine what means of consuming Catalog are available to most users, I think we should more explicitly consider what we want them to do. E.g.:

  • For discovery purposes and the imperative kpt workflows, hitting a website endpoint that dynamically generates a catalog feels suitable. In this case, freshness arguably matters most, and I'd guess traffic levels wouldn't be a problem.
  • For declarative purposes like specification in Kustomizations (and in confirmatory CLI arguments), we should probably favor static options such as publishing OCI artifacts and/or git-committed snapshots addressable by SHA and semver tags. I think the KEP could use more discussion of what we'll make available/encourage in this regard, and I suspect @jeremyrickard will have some opinions on this as well. There's also a potential scalability discussion to be had here if the endpoints we encourage Kustomize users to reference is hosted on SIG infrastructure, since I'd expect significant traffic from sources like CI pipelines.

@mengqiy
Copy link
Member Author

mengqiy commented Sep 24, 2021

For declarative purposes like specification in Kustomizations (and in confirmatory CLI arguments), we should probably favor static options such as publishing OCI artifacts and/or git-committed snapshots addressable by SHA and semver tags.

I agree a static (versioned) catalog can be useful for declarative purposes and having reproducible build which is especially important for production use.
But if we should require semver for tagging a catalog is questionable, since it's not clear to me if it will provide more benefits than confusions. e.g. What are the exception for a major, minor or patch release of a catalog? @KnVerey @jeremyrickard Thoughts?

Add workflow to publish a function.
@mengqiy
Copy link
Member Author

mengqiy commented Oct 1, 2021

The KEP has been updated to include KRM function metadata and workflow to publish a function

@mengqiy
Copy link
Member Author

mengqiy commented Oct 1, 2021

@mikebz GitHub doesn't allow me to re-request review from you. I just ping you here.

- apiVersion: example.com/v1alpha1
kind: LegacySetNamespace
deprecated: true
usage: <a URL pointing to a README.md>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this work with a thing like docsify? Docsify wants your markdown files locally AFAIK? So will we then have to develop something that fetches the files when the index changes?

Copy link
Member Author

@mengqiy mengqiy Oct 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point!

So will we then have to develop something that fetches the files when the index changes?

I'd avoid it since it will introduce some maintenance burden.

One way to solve the freshness of the index is that we do both on-demand (e.g. merging PR) and cron-based (e.g. nightly) site building.
An alternative we can consider is that we require the contributors to check in these markdown files (usage and examples) in the registry repo and the URL in the metadata would be pointing to the markdown files in the same repo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also do both: allow a URL, in which case the website will have a hyperlink instead of inline docs, or a local markdown file, in which case the docs will be embedded into the equivalent field when Catalog is built.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I said using the markdown files in the same repo, I meant we can use a URL pointing the file in same repo like https://raw.githubusercontent.com/kubernetes-sigs/krm-functions-registry/master/path/to/markdown. It will make it easier to construct a Catalog resource than using a relative path.


```shell
├── publishers
│ ├── communities
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would we want to separate those? What advantage does that give us? Hypothetical question - why don't I just have a one long list fo functions at the root level by name and the owner metadata is already specified?

We now introduce a dependency between what's in the file and where it exists. People might forget to move if for instance the publisher changes (gets acquired for instance).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Different types of publishers will have slightly different requirements. When we saw a publisher Foo in a function metadata, we need to know the type of Foo (community, company or individual) to determine the verification requirement needed.
I'm open to suggestions about the layout.

We now introduce a dependency between what's in the file and where it exists. People might forget to move if for instance the publisher changes (gets acquired for instance).

Are you worrying about that the image may disappear (e.g. renamed or moved) after publishing the function?
To mitigate that, we can have some cron-based CI like checking nightly to see if the images still exist. If not, notify the maintainers by GH issue and(or) email.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about defining a kind: Publisher and having that be a second required file in each directory (which could then be flat as Mike suggests)? That kind could have fields for the publisher type, and perhaps some of its fields could also be used to default fields in KRMFunctions during Catalog construction.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean each publisher need to duplicate thekind: Publisher file in every function directories?
To avoid duplication, we can have a publisher directory that contains a list of kind: Publisher files. And we can have a list of functions at the root level directory and function metadata can point to a publisher.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the publishers each already have a directory in your proposed structure, I was picturing the publisher.yaml going in there. But we can iterate on details like that as we figure out what makes sense.

Copy link
Contributor

@KnVerey KnVerey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to get this in as provisional ahead of my and Jeff's related talk next Wednesday. To that end, can we tag the ongoing discussion points as unresolved and aim for a merge early next week? My main quibbles are either about things that I think are fine to mark unresolved at this stage, or about the schema of KRMFunction, which I think should ultimately move to the Catalog KEP, because they need to match IMO (maybe I'm missing something about why they shouldn't?).

It should be sponsored by SIG-CLI.

The repo name can be `KRM-functions`, `KRM-function-catalog`,
`KRM-function-registry` or something more reasonable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that we should align the domain and repo name, which implies using lower kebab case. We can mark the exact name as unresolved for now.

function. We will only support container-based KRM functions in the public
registry.

Ideally, the content under field `spec` should be able to be used directly in a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just ideal, or is it required? When should they differ, given that a catalog needs to be programatically constructed from this data? I'd expect the Registry's catalog compilation to essentially just be collecting a set of these and inserting them into the catalog's functions field. And if it HAS to match, then one KEP or the other (Catalog, IMO) should be the source of truth for it before either is moved to implementable, at the latest.

Currently Catalog has a definition: field that presumably points to a separate artifact with most of the information you have below. I think I agree with inlining it, but I also think we should make the structure closer to a CRD. That's really what a function is after all: a client-side CR, and kind: KRMFunction is shaping up to be the CRD analogue. I think we should embrace this analogy on principle, but there are also some issues below that the CRD structure would solve. Notably, in CRD, most fields fall under items in a versions list. That should be the case for most of the fields you have under spec directly right now: schema, runtime, examples, licence, idempotent, and license all could change with the version.

Please mark this entire "Function Metadata" section as unresolved.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the wording.
Marked this section as "unresolved"

keps/sig-cli/2985-public-krm-functions-registry/README.md Outdated Show resolved Hide resolved
keps/sig-cli/2985-public-krm-functions-registry/README.md Outdated Show resolved Hide resolved
```

The following is an example for exec-based KRM function. We will not allow
contributors to publish exec-based KRM functions. But we want to standardize the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should allow them in the registry even if we don't publish any of the first-party ones this way. Otherwise they're effectively not supported. Orchestrators can make their own choices about what to run/prefer. On the other hand, we should require all verification-related fields to be populated for all exec functions in the registry.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should allow them in the registry even if we don't publish any of the first-party ones this way. Otherwise they're effectively not supported.

How strong do you feel about allowing publishing exec-based KRM functions from day 1?

Orchestrators can make their own choices about what to run/prefer.

Orchestrators should be secure by default.
Ideally kustomize should have a --allow-exec-plugins flag or something similar. Imagine we have the following use case:
A kustomize user is trying to achieve a task. The user finds a kustomization.yaml from internet (e.g. stackoverflow) that claims it can solve it. kustomization.yaml uses a catalog that contains exec-based plugins. When the user runs kustomize build, it should ask the user to set the --allow-exec-plugins flag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How strong do you feel about allowing publishing exec-based KRM functions from day 1?

As long as the schema we propose supports it in theory, I think it's reasonable to implement container-based only in the initial pass and e.g. require exec support for beta. We can outline this in the rollout plan in the follow-up PR that moves this KEP to implementable.

Ideally kustomize should have a --allow-exec-plugins flag or something similar. Imagine we have the following use case:
A kustomize user is trying to achieve a task. The user finds a kustomization.yaml from internet (e.g. stackoverflow) that claims it can solve it. kustomization.yaml uses a catalog that contains exec-based plugins. When the user runs kustomize build, it should ask the user to set the --allow-exec-plugins flag.

This is a discussion for the "plugin graduation" KEP, but as a quick tl;dr it is proposing that Kustomize require the end user to explicitly trust the catalog rather than authorize categories of runtimes. E.g. they want to trust one specific exec plugin that that kustomization from the internet requires, and they've checked it out specifically--we should not ask them to authorize all exec plugins. By requiring them to type --trusted-catalog=[the catalog they've vetted], they are authorizing a specific set of things that they can audit and that will not silently change. If the Kustomization is remote and gets changed to use additional plugins, Kustomize will fail the build and complain that the catalog the user is trusting does not include the new ones. Another way to think about is that a catalog reference in a Kustomization says to Kustomize "you need this" and the command-line flag says "you can use this". If the latter isn't a superset of the former, the build fails.

- apiVersion: example.com/v1alpha1
kind: LegacySetNamespace
deprecated: true
usage: <a URL pointing to a README.md>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also do both: allow a URL, in which case the website will have a hyperlink instead of inline docs, or a local markdown file, in which case the docs will be embedded into the equivalent field when Catalog is built.

keps/sig-cli/2985-public-krm-functions-registry/README.md Outdated Show resolved Hide resolved
keps/sig-cli/2985-public-krm-functions-registry/README.md Outdated Show resolved Hide resolved

```shell
├── publishers
│ ├── communities
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about defining a kind: Publisher and having that be a second required file in each directory (which could then be flat as Mike suggests)? That kind could have fields for the publisher type, and perhaps some of its fields could also be used to default fields in KRMFunctions during Catalog construction.

@mengqiy
Copy link
Member Author

mengqiy commented Oct 8, 2021

@KnVerey I will update the KEP and mark unsolved as UNRESOLVED. We can aim to merge it as provisional before the KubeCon.

@mengqiy mengqiy requested a review from KnVerey October 11, 2021 17:45
@mengqiy
Copy link
Member Author

mengqiy commented Oct 12, 2021

/cc @KnVerey @monopole

Copy link
Contributor

@KnVerey KnVerey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 12, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: KnVerey, mengqiy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 12, 2021
@k8s-ci-robot k8s-ci-robot merged commit e7f51ff into kubernetes:master Oct 12, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Oct 12, 2021
@mengqiy mengqiy deleted the krmfn branch October 12, 2021 22:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/cli Categorizes an issue or PR as relevant to SIG CLI. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants