Flesh out the input data model and patterns #3396

Open
bgrant0607 opened this issue Jul 20, 2022 · 11 comments
Labels: area/porch, enhancement, epic, triaged

Comments

@bgrant0607
Contributor

bgrant0607 commented Jul 20, 2022

Topic that needs more work.

We've figured out some aspects and requirements of package / function inputs:

  • Packages are not encapsulated and don't have monolithic package-specific interfaces: https://kpt.dev/guides/rationale
  • Setters are another form of manually specified parameters and are not recommended (see "Document problems with setters / parameters, and alternatives", #3131). We also ran into problems composing multiple packages when using setters.
  • Individual attributes should be editable in place, since we store the rendered output and update it in place, and functions should generally "patch" resources rather than blow them away (see "Special behavior for generators", #2528).
  • KRM functions can take their inputs from a specified "function config", which is either a ConfigMap or a client-side KRM type, such as ApplyReplacements, which we should be able to map to functions automatically (see "Map function inputs to functions automatically via a catalog mechanism", #3339).
  • We've noticed with value transformers, such as set-namespace and set-labels, that input values often need to be copied from their sources to the input structures expected by the functions (see "Make it easier to write value transformers", #3155).
  • The variant-constructor pattern can use the package name to provide distinct identities for variants.
  • We've identified the need for a deployment target "context", such as cluster targets (see "Declarative cluster targeting", #3387), along the lines of kubeconfig, gcloud config, terraform provider config, etc.
  • Standardizing input types and APIs across packages increases the opportunity for plug-and-play-style automation.
  • There may be other properties associated with the variant "context" that we need to discover or users need to provide. If the latter, we may need to be able to identify those attributes automatically, so that we can prompt the user. Multiple sources of context are common, such as environment and application.
  • We need to be able to find sets of input contexts in order to automatically generate corresponding sets of deployment packages (see "Bulk package creation", #3347).
  • We need to be able to identify downstream inputs in order to implement a "replay" approach to package upgrades (see "Fixing updates in porch", #3329).
  • We want to support loosely coupled external dependencies, such as an application package like ghost requiring a namespace or a SQL database.
  • Fully dynamic data, like autoscaled replica counts, may not belong in config storage. Another common example is allocated IP addresses, for which service discovery systems and DNS are common. But there may be some that we want to "snapshot" and write to storage. GitOps image updaters are an example.
  • Some inputs may be reasonably self-contained, such as application config for ConfigMap generation (see "ConfigMap generation", #3119).

But we don't have a fully fleshed out model or recommended patterns yet.
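To make the function-config and patch-in-place points above concrete, here is a minimal sketch of a value transformer in the style of set-namespace. It is not the actual kpt function: the function name `set_namespace`, the `namespace` key in the ConfigMap data, and the sample resources are assumptions for illustration. It reads its input from a ConfigMap function config inside a KRM `ResourceList` and only touches `metadata.namespace`, leaving every other field as it was.

```python
import copy


def set_namespace(resource_list):
    """Patch metadata.namespace on every item; all other fields are left untouched.

    Assumes the function config is a ConfigMap with a `namespace` key in `data`
    (an illustrative convention, not a documented kpt contract).
    """
    config = resource_list.get("functionConfig") or {}
    namespace = (config.get("data") or {}).get("namespace")
    if not namespace:
        raise ValueError("functionConfig.data.namespace is required")

    # Work on a copy so the caller's input is not mutated.
    result = copy.deepcopy(resource_list)
    for item in result.get("items", []):
        # Patch a single attribute instead of regenerating the resource.
        item.setdefault("metadata", {})["namespace"] = namespace
    return result


example = {
    "apiVersion": "config.kubernetes.io/v1",
    "kind": "ResourceList",
    "functionConfig": {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "ns-config"},
        "data": {"namespace": "ghost-prod"},  # hypothetical input value
    },
    "items": [
        {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "metadata": {"name": "ghost", "labels": {"app": "ghost"}},
            "spec": {"replicas": 3},
        },
    ],
}

patched = set_namespace(example)
```

Note that the value `ghost-prod` still had to be copied into the ConfigMap from wherever it really lives, which is the copying problem mentioned for value transformers. The sketch also ignores cluster-scoped resources and namespace references in other fields.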

kpt isn't the first config tool to encounter these issues. We should look at data-oriented, non-package-parameter-based models for inspiration.

Some examples:

Additional thoughts or findings should be posted back here.

cc @justinsb @johnbelamaric @droot @yuwenma

@bgrant0607 added the enhancement label Jul 20, 2022
@bgrant0607
Contributor Author

Related: When gathering inputs, we may need to allow network access: #2450. And probably a way to provide credentials.

@bgrant0607
Contributor Author

@johnbelamaric
Contributor

johnbelamaric commented Jul 22, 2022

> Related: When gathering inputs, we may need to allow network access: #2450. And probably a way to provide credentials.

Do we need to solve this in the CLI case / with kpt functions? That is, could more complex cases like this be handled only in the Porch incarnation of CaD, where we can build controllers that interact with other systems in any way we want? If an interactive CLI-based session has to reach out over the network, it can fail more easily, for example. Also, there are interactions we will never be able to handle that way. For example, imagine that getting an input requires filing a ticket, which a human then responds to. In the controller case, we can handle this sort of arbitrary time delay without any trouble. But it won't work at all in the interactive kpt fn render case.

@bgrant0607
Contributor Author

@johnbelamaric I don't expect inputs to be generated during the kpt fn render pipeline, in general. The pipeline may consume the inputs. Input generation / gathering likely needs to be decoupled. Interactive forms or prompts are one such example.

Your ticket example is a good one, thanks. If you think of others, post them here.

@johnbelamaric
Contributor

A few quick thoughts, all slight variations on "fetch from external system":

  • Read from a CMDB or other external database
  • Allocate from IPAM or other external system
  • Read from another cluster (e.g., get the LB IP of an LB-backed K8s Service that has no DNS entry)
  • Read from the config of a package on which this depends

@johnbelamaric
Contributor

  • Read from the cloud provider API. For example, if an app is dependent on a DB application, that may be another subpackage, or it may be a cloud provider DB instance (which maybe is provisioned by a separate package, or maybe not).

Not all of these are necessarily only "function inputs". They could simply be ways of setting field values. For the example in the IPAM case, I can imagine a couple different approaches (this applies to others too, probably).

  • The resources that have an IP address field of course just accept an IP value; they do not have a concept of sourcing that value from anywhere. But we could use a placeholder value and a marker comment. The marker comment could indicate the inputs to the IPAM system. A controller (running in the Porch cluster) could see an unresolved placeholder (or the marker comment could indicate this, to avoid a conflict with that sentinel value), and use the data from the comment (which would be arbitrary from the package point of view, for example: "region", "cluster-name", "package-name") to call out to the IPAM, and get back an allocated IP. This would have to be an idempotent operation.
  • Another approach would be to use an intermediate resource, which could effectively define an API. So, you have some CR that represents an IPAM request. The controller (or arguably a function) processes that request and stores back the allocated value in a status field. This can then be referenced by the function input by whatever mechanism we come up with for field references. Or, if we support references in field values, it could be placed directly in there.

Reading that over, the second approach is probably more maintainable and flexible.
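A rough sketch of the intermediate-resource approach, to show the moving parts. Everything named here is hypothetical: the `IPAMRequest` kind and its `example.com/v1alpha1` group, the `status.allocatedIP` field, the `FakeIPAM` stand-in for a real IPAM system, and the choice of `loadBalancerIP` as the consuming field. The point is only that allocation is idempotent and that the consumer reads the value back from status.

```python
class FakeIPAM:
    """In-memory stand-in for an external IPAM system (hypothetical)."""

    def __init__(self, base="10.0.0."):
        self._base = base
        self._by_key = {}

    def allocate(self, spec):
        # Keying on the request inputs makes repeated calls return the same IP.
        key = (spec["region"], spec["clusterName"], spec["packageName"])
        if key not in self._by_key:
            self._by_key[key] = f"{self._base}{len(self._by_key) + 10}"
        return self._by_key[key]


def reconcile_ipam_request(request, allocate):
    """Controller step: fill status.allocatedIP once, then leave it alone."""
    status = request.setdefault("status", {})
    if "allocatedIP" not in status:
        status["allocatedIP"] = allocate(request["spec"])
    return request


def resolve_reference(resource, field, request):
    """Copy the allocated IP into spec.<field>; return False if not yet allocated."""
    ip = request.get("status", {}).get("allocatedIP")
    if ip is None:
        return False  # unresolved: leave the placeholder in place
    resource["spec"][field] = ip
    return True


ipam = FakeIPAM()
request = {
    "apiVersion": "example.com/v1alpha1",  # hypothetical group/version
    "kind": "IPAMRequest",                 # hypothetical kind
    "metadata": {"name": "ghost-lb-ip"},
    "spec": {"region": "us-east1", "clusterName": "prod-1", "packageName": "ghost"},
}
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "ghost"},
    "spec": {"type": "LoadBalancer", "loadBalancerIP": "PLACEHOLDER"},
}

reconcile_ipam_request(request, ipam.allocate)
reconcile_ipam_request(request, ipam.allocate)  # second pass changes nothing
resolved = resolve_reference(service, "loadBalancerIP", request)
```

Because the request is an ordinary resource stored in the package, the allocation can complete at any later time (including the ticket-filing case) without blocking a render.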

@bgrant0607
Contributor Author

bgrant0607 commented Jul 25, 2022

CMDB is an example use case for dynamic inventory in ansible, such as via inventory plugins and inventory scripts.

In addition to querying inputs dynamically, adapting input data locations / schemas to expected function input locations / schemas (or, in the case of IaC, to parameters of off-the-shelf packages) appears to be one of the other core / common issues.
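As a small illustration of the adaptation problem, here is a sketch that copies values from the locations where an input source keeps them to the locations a function expects. The record shape, the dotted-path convention, and the helper names (`get_path`, `set_path`, `adapt_inputs`) are assumptions made up for this example, not an existing kpt or ansible API.

```python
def get_path(obj, path):
    """Read a nested value using a dotted path such as 'env.name'."""
    for key in path.split("."):
        obj = obj[key]
    return obj


def set_path(obj, path, value):
    """Write a nested value, creating intermediate dicts as needed."""
    keys = path.split(".")
    for key in keys[:-1]:
        obj = obj.setdefault(key, {})
    obj[keys[-1]] = value


def adapt_inputs(source, mapping):
    """Build a new structure by copying source paths to target paths."""
    target = {}
    for src_path, dst_path in mapping.items():
        set_path(target, dst_path, get_path(source, src_path))
    return target


# Hypothetical record as it might come back from a CMDB query.
cmdb_record = {
    "env": {"name": "prod", "region": "us-east1"},
    "owner": {"team": "blog"},
}

# Hypothetical mapping from CMDB locations to function-config locations.
mapping = {
    "env.name": "data.environment",
    "owner.team": "data.team",
}

adapted = adapt_inputs(cmdb_record, mapping)
```

Dotted paths break down for keys that themselves contain dots (common in label and annotation names), so a real mechanism would need a richer path syntax.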

@bgrant0607
Contributor Author

Example from slack: https://kubernetes.slack.com/archives/C0155NSPJSZ/p1658760504705309

How to provide information to packages automatically.

@bgrant0607
Contributor Author

The idea of "decorations" was discussed in the app config issue:
#3351 (comment)
#3351 (comment)

kubectl expose and autoscale are examples of this.

Resource creation might be imperative, but this does raise the issue of using information from resources themselves as function inputs.

In the ghost package, we're experimenting with that approach as a way to propagate the host name:
https://github.com/GoogleContainerTools/kpt/pull/3403/files

We could also use the approach to read resource requests and set application resource-dependent settings accordingly:
#3210 (comment)

In order to be understandable there probably needs to be an intuitive source of truth. A potential advantage of the approach is that the source of truth could be well known, as opposed to an input to an arbitrary function. However, if multiple locations disagreed and the source of truth were ambiguous, then the user would need to be asked to resolve the inconsistency, as when providing multiple values in an undiscriminated union.

This approach could have implications for update strategies.
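The ambiguity case can be sketched as follows. This is only an illustration under assumed names: the location strings, the `AmbiguousSourceError` exception, and `resolve_value` are hypothetical. The idea is that a value is accepted when all locations that set it agree, and the conflict is surfaced to the user otherwise.

```python
class AmbiguousSourceError(Exception):
    """Raised when locations disagree and no source of truth is designated."""

    def __init__(self, observed):
        super().__init__(f"conflicting values: {observed}")
        self.observed = observed


def resolve_value(candidates):
    """Return the single agreed value from a {location: value-or-None} dict."""
    observed = {loc: v for loc, v in candidates.items() if v is not None}
    distinct = set(observed.values())
    if not distinct:
        raise LookupError("no location provides a value")
    if len(distinct) > 1:
        # The caller would prompt the user to pick the source of truth.
        raise AmbiguousSourceError(observed)
    return distinct.pop()


# Hypothetical locations where a host name might appear in a package.
agreed = resolve_value({
    "Ingress/ghost spec.rules[0].host": "blog.example.com",
    "ConfigMap/ghost-config data.host": "blog.example.com",
    "Deployment/ghost env.HOST": None,
})

try:
    resolve_value({
        "Ingress/ghost spec.rules[0].host": "blog.example.com",
        "ConfigMap/ghost-config data.host": "staging.example.com",
    })
    conflict = None
except AmbiguousSourceError as err:
    conflict = err.observed
```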

@yuwenma
Contributor

yuwenma commented Aug 29, 2022

@mortent added the triaged, area/porch, and epic labels Nov 16, 2022
@mortent added this to ToDo in kpt kanban board Nov 16, 2022
@bgrant0607
Contributor Author
