Flesh out the input data model and patterns #3396

bgrant0607 · 2022-07-20T21:53:51Z

Topic that needs more work.

We've figured out some aspects and requirements of package / function inputs:

Packages are not encapsulated and don't have monolithic package-specific interfaces: https://kpt.dev/guides/rationale
Setters are another form of manually specified parameters and are not recommended (see "Document problems with setters / parameters, and alternatives", #3131). We also ran into problems composing multiple packages when using setters.
Individual attributes should be editable in place, since we store the rendered output and update it in place, and functions should in general "patch" resources rather than blow them away (see "Special behavior for generators", #2528).
KRM functions can take their inputs from a specified "function config", which is either a ConfigMap or a client-side KRM type such as ApplyReplacements. We should be able to map such types to functions automatically (see "Map function inputs to functions automatically via a catalog mechanism", #3339).
We've noticed with value transformers such as set-namespace and set-labels (see "Make it easier to write value transformers", #3155) that input values often need to be copied from their sources to the input structures expected by the functions.
The variant-constructor pattern can use the package name to provide distinct identities for variants.
We've identified the need for a deployment target "context", such as cluster targets (see "Declarative cluster targeting", #3387), along the lines of kubeconfig, gcloud config, terraform provider config, etc.
Standardization of input types and APIs across packages increases the opportunity for plug-and-play-style automation.
There may be other properties associated with the variant "context" that we need to discover or users need to provide. If the latter, we may need to be able to identify those attributes automatically, so that we can prompt the user. Multiple sources of context are common, such as environment and application.
We need to be able to find sets of input contexts in order to automatically generate corresponding sets of deployment packages (see "Bulk package creation", #3347).
We need to be able to identify downstream inputs in order to implement a "replay" approach to package upgrades (see "Fixing updates in porch", #3329).
We want to support loosely coupled external dependencies, such as an application package like ghost requiring a namespace or a SQL database.
Fully dynamic data, like autoscaled replica counts, may not belong in config storage. Another common example is allocated IP addresses, for which service discovery systems and DNS are common. But there may be some that we want to "snapshot" and write to storage. GitOps image updaters are an example.
Some inputs may be reasonably self-contained, such as application config for ConfigMap generation (see "ConfigMap generation", #3119).

But we don't have a fully fleshed out model or recommended patterns yet.
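To make the "patch, don't blow away" requirement above concrete, here is a minimal sketch in plain Python rather than in any kpt SDK. The `set_labels` helper and the sample Deployment are hypothetical illustrations. The point is only that the function merges its inputs into the stored resource and leaves every other field, including in-place user edits, untouched.

```python
import copy


def set_labels(resources, labels):
    """Merge `labels` into metadata.labels of each resource.

    Hypothetical helper: all other fields, including values a user
    edited in place, are carried over unchanged.
    """
    patched = []
    for res in resources:
        res = copy.deepcopy(res)  # do not mutate the caller's objects
        meta = res.setdefault("metadata", {})
        meta.setdefault("labels", {}).update(labels)
        patched.append(res)
    return patched


deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "ghost", "labels": {"tier": "web"}},
    "spec": {"replicas": 3},  # imagine a user edited this value in place
}

result = set_labels([deployment], {"env": "staging"})
```

A generator-style function that re-emitted the whole Deployment from a template would instead overwrite `spec.replicas`, which is the behavior the list above argues against.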

kpt isn't the first config tool to encounter these issues. We should look at data-oriented, non-package-parameter-based models for inspiration.

Some examples:

Puppet's facter and hiera. Core facts are kind of like standardized context.
Ansible inventory. Ansible was described as infrastructure as data. Example of separating out input data.
Terraform data sources. Example
Kapitan inventory. Video. Generator reference.
Various runtime parameter stores: Consul, Pulumi config, etc.

Additional thoughts or findings should be posted back here.

cc @justinsb @johnbelamaric @droot @yuwenma

bgrant0607 · 2022-07-22T18:48:58Z

Related: When gathering inputs, we may need to allow network access: #2450. And probably a way to provide credentials.

bgrant0607 · 2022-07-22T19:54:47Z

It's also worth mentioning kustomize components:
https://github.com/kubernetes-sigs/kustomize/blob/master/examples/components.md
https://github.com/kubernetes/enhancements/blob/master/keps/sig-cli/1802-kustomize-components/README.md

johnbelamaric · 2022-07-22T20:10:10Z

> Related: When gathering inputs, we may need to allow network access: #2450. And probably a way to provide credentials.

Do we need to solve this in the CLI case / with kpt functions? That is, could more complex cases like this be handled instead only in the Porch incarnation of CaD, where we can build controllers that interact with other systems in any way we want? For example, if an interactive CLI-based session has to reach out over the network, it can more easily fail. Also, there are interactions we will never be able to handle that way. Imagine that getting an input requires filing a ticket, which a human then responds to. In the controller case, we can handle this sort of arbitrary time delay without any trouble. But it won't work at all in the interactive kpt fn render case.

bgrant0607 · 2022-07-22T20:24:19Z

@johnbelamaric I don't expect inputs to be generated during the kpt fn render pipeline, in general. It may consume the inputs. Input generation / gathering likely needs to be decoupled. Interactive forms or prompts are one such example.

Your ticket example is a good one, thanks. If you think of others, post them here.

johnbelamaric · 2022-07-22T21:42:42Z

A few quick thoughts, all slight variations on "fetch from external system":

Read from a CMDB or other external database
Allocate from IPAM or other external system
Read from another cluster (e.g., get the LB IP of an LB-backed K8s Service that has no DNS entry)
Read from the config of a package on which this depends

johnbelamaric · 2022-07-22T22:38:55Z

Read from the cloud provider API. For example, if an app is dependent on a DB application, that may be another subpackage, or it may be a cloud provider DB instance (which maybe is provisioned by a separate package, or maybe not).

Not all of these are necessarily only "function inputs". They could simply be ways of setting field values. For the example in the IPAM case, I can imagine a couple different approaches (this applies to others too, probably).

The resources that have an IP address field of course just accept an IP value; they do not have a concept of sourcing that value from anywhere. But we could use a placeholder value and a marker comment. The marker comment could indicate the inputs to the IPAM system. A controller (running in the Porch cluster) could see an unresolved placeholder (or the marker comment could indicate this, to avoid a conflict with that sentinel value), and use the data from the comment (which would be arbitrary from the package point of view, for example: "region", "cluster-name", "package-name") to call out to the IPAM, and get back an allocated IP. This would have to be an idempotent operation.
Another approach would be to use an intermediate resource, which could effectively define an API. So, you have some CR that represents an IPAM request. The controller (or arguably a function) processes that request and stores back the allocated value in a status field. This can then be referenced by the function input by whatever mechanism we come up with for field references. Or, if we support references in field values, it could be placed directly in there.

Reading that over, the second approach is probably more maintainable and flexible.
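A rough sketch of that second approach, assuming a made-up `IPAllocationRequest` kind and an in-memory stand-in for the IPAM system (neither exists in kpt or Porch). The reconcile step writes the allocated address into a status field and is safe to repeat, which is the idempotency requirement mentioned for the first approach as well.

```python
import ipaddress


class FakeIPAM:
    """In-memory stand-in for an external IPAM system (an assumption).

    Allocations are keyed, so asking again with the same key returns
    the same address.
    """

    def __init__(self, cidr):
        self._free = iter(ipaddress.ip_network(cidr).hosts())
        self._by_key = {}

    def allocate(self, key):
        if key not in self._by_key:
            self._by_key[key] = str(next(self._free))
        return self._by_key[key]


def reconcile(request, ipam):
    """Fill status.address on the request resource if it is not set yet."""
    status = request.setdefault("status", {})
    if "address" not in status:
        spec = request["spec"]
        key = (spec["region"], spec["clusterName"], spec["packageName"])
        status["address"] = ipam.allocate(key)
    return request


request = {
    "apiVersion": "example.com/v1alpha1",  # hypothetical group/version
    "kind": "IPAllocationRequest",         # hypothetical kind
    "metadata": {"name": "ghost-lb"},
    "spec": {"region": "us-east1", "clusterName": "prod-1", "packageName": "ghost"},
}

ipam = FakeIPAM("10.0.0.0/29")
reconcile(request, ipam)
first = request["status"]["address"]
reconcile(request, ipam)  # second pass is a no-op
second = request["status"]["address"]
```

Other resources or function inputs would then reference `status.address` through whatever field-reference mechanism we end up with; that part is deliberately left out here.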

bgrant0607 · 2022-07-25T20:06:46Z

CMDB is an example use case for dynamic inventory in ansible, such as via inventory plugins and inventory scripts.

In addition to querying inputs dynamically, adapting input data locations / schemas to expected function input locations / schemas (or, in the case of IaC, to parameters of off-the-shelf packages) appears to be one of the other core / common issues.
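As an illustration of the adaptation problem, here is a small sketch that copies values from one input document into the shape a function expects, driven by a path mapping. The context layout, the mapping, and the helper names are all assumptions for illustration; the target is assumed to be a ConfigMap-style function config whose `data` carries a `namespace` key.

```python
import copy


def get_path(doc, path):
    """Read a dotted path such as 'app.namespace' from nested dicts."""
    for key in path.split("."):
        doc = doc[key]
    return doc


def set_path(doc, path, value):
    """Write a dotted path, creating intermediate dicts as needed."""
    keys = path.split(".")
    for key in keys[:-1]:
        doc = doc.setdefault(key, {})
    doc[keys[-1]] = value


def adapt(source, mapping, template):
    """Copy source values into a copy of `template` per `mapping`."""
    target = copy.deepcopy(template)
    for src_path, dst_path in mapping.items():
        set_path(target, dst_path, get_path(source, src_path))
    return target


# Hypothetical context document gathered from some inventory.
context = {
    "cluster": {"name": "prod-1", "region": "us-east1"},
    "app": {"namespace": "ghost"},
}
mapping = {"app.namespace": "data.namespace"}
template = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {"name": "set-namespace-config"},  # hypothetical name
}

fn_config = adapt(context, mapping, template)
```

The mapping is data, so it could live next to the package instead of being hard-coded in each function, which is roughly what the hiera and inventory examples above do.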

bgrant0607 · 2022-07-25T20:48:47Z

Example from slack: https://kubernetes.slack.com/archives/C0155NSPJSZ/p1658760504705309

How to provide information to packages automatically.

bgrant0607 · 2022-07-26T20:40:28Z

The idea of "decorations" was discussed in the app config issue:
#3351 (comment)
#3351 (comment)

kubectl expose and autoscale are examples of this.

Resource creation might be imperative, but this does raise the issue of using information from resources themselves as function inputs.

In the ghost package, we're experimenting with that approach as a way to propagate the host name:
https://github.com/GoogleContainerTools/kpt/pull/3403/files

We could also use the approach to read resource requests and set application resource-dependent settings accordingly:
#3210 (comment)

In order to be understandable there probably needs to be an intuitive source of truth. A potential advantage of the approach is that the source of truth could be well known, as opposed to an input to an arbitrary function. However, if multiple locations disagreed and the source of truth were ambiguous, then the user would need to be asked to resolve the inconsistency, as when providing multiple values in an undiscriminated union.

This approach could have implications for update strategies.
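One way to picture the ambiguity problem is the sketch below, which assumes the candidate values have already been read from their locations. The location names and the `resolve` helper are hypothetical. If every location that sets the value agrees, that value is used; otherwise the conflict is surfaced so the user can be asked.

```python
class AmbiguousValue(Exception):
    """Raised when locations disagree and no source of truth is known."""

    def __init__(self, candidates):
        super().__init__(f"conflicting values: {candidates}")
        self.candidates = candidates


def resolve(candidates):
    """Return the single agreed value from {location: value}.

    Locations whose value is None are treated as unset. Returns None
    when nothing is set, and raises AmbiguousValue on disagreement.
    """
    set_values = {loc: v for loc, v in candidates.items() if v is not None}
    distinct = set(set_values.values())
    if len(distinct) > 1:
        raise AmbiguousValue(set_values)
    return distinct.pop() if distinct else None


# Hypothetical locations where a host name might appear.
agreed = resolve({
    "Ingress.spec.rules[0].host": "blog.example.com",
    "ConfigMap.data.url": "blog.example.com",
    "Deployment.env.HOST": None,
})

try:
    resolve({
        "Ingress.spec.rules[0].host": "blog.example.com",
        "ConfigMap.data.url": "www.example.com",
    })
    conflict = None
except AmbiguousValue as err:
    conflict = err.candidates
```

With a well-known source of truth, `resolve` would not be needed at all; the function would read that one location and overwrite the others.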

yuwenma · 2022-08-29T17:42:25Z

A Tekton example from Slack: https://github.com/marniks7/chaos-catalog

Slack message: https://kubernetes.slack.com/archives/C0155NSPJSZ/p1661457969525029?thread_ts=1661311193.053569&cid=C0155NSPJSZ
Another example, for non-KRM files: #2350 (comment)

bgrant0607 · 2024-04-19T20:13:31Z

Example from another domain:
https://support.microsoft.com/en-us/office/use-mail-merge-for-bulk-email-letters-labels-and-envelopes-f488ed5b-b849-4c11-9cff-932c49474705

bgrant0607 added the enhancement label Jul 20, 2022

bgrant0607 mentioned this issue Jul 26, 2022

Epic: WYSIWYG Kubernetes Application Configuration #3351

Open

29 tasks

yuwenma mentioned this issue Aug 8, 2022

set-image function is optimized for out-of-place and found it practically unusable forin-place mode #3444

Open

This was referenced Aug 18, 2022

WorkloadIdentityBinding operator #3456

Merged

Package dependencies: expressing ("my package requires") and fulfilling ("my package provides") #3448

Open

mortent added the triaged, area/porch, and epic labels Nov 16, 2022

mortent added this to ToDo in kpt kanban board Nov 16, 2022

johnbelamaric mentioned this issue May 16, 2023

CEL for function inputs #3964

Open

liamfallon mentioned this issue Apr 23, 2024

Flesh out the input data model and patterns nephio-project/nephio#662

Open
