Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Declaring preprocessor functions #2385

Open
karlkfi opened this issue Jul 10, 2021 · 14 comments
Open

Declaring preprocessor functions #2385

karlkfi opened this issue Jul 10, 2021 · 14 comments
Labels
area/hydrate customer deep engagement design-doc enhancement New feature or request triaged Issue has been triaged by adding an `area/` label
Milestone

Comments

@karlkfi
Copy link
Contributor

karlkfi commented Jul 10, 2021

I want to be able to use a function in a pipeline to perform processing of values AFTER they have been set by a parent package's apply-setters.

The following cluster package works intuitively as a single package, but it doesn't work when it it's inside a parent package that also uses apply-setters, because the parent package mutators are executed after the child package's mutators...

Parent setters:

apiVersion: kpt.dev/v1
kind: Kptfile
metadata:
  name: cluster
pipeline:
  mutators:
    - image: gcr.io/kpt-fn/apply-setters:v0.1
      configPath: setters.yaml

Child setters:

apiVersion: kpt.dev/v1
kind: Kptfile
metadata:
  name: nodepool
pipeline:
  mutators:
    - image: gcr.io/kpt-fn/apply-setters:v0.1
      configPath: setters.yaml
    - image: gcr.io/kpt-fn/starlark:v0.1
      configPath: truncate-service-accounts.yaml

The workaround I know of is to copy the function to the parent package kptfile, but if I do that, it will affect all resources in the parent package, not just the resources in the child package. And even if I could scope it to the right package, it would be a hassle to have to copy function config from child packages to parent packages.

One way to solve this might be to have early and late mutators.

Order of execution:

  • parent "early mutators"
  • child "early mutators"
  • child "late mutators"
  • parent "late mutators"

This would kill two birds with one stone, because it would also allow for parent packages to modify the setter values of child packages. I could put apply-setters as an "early mutator" and the starlark function in a "late mutator" and it would affect the setter values as passed from parent to child without affecting the parent or sibling packages.

For example...

Parent setters:

apiVersion: v1
kind: ConfigMap
metadata:
  name: setters
  annotations:
    config.kubernetes.io/local-config: "true"
data:
  name: cluster-1
  project-id: example-1234

Child setters:

apiVersion: v1
kind: ConfigMap
metadata:
  name: setters
  annotations:
    config.kubernetes.io/local-config: "true"
data:
  name: pool-1
  cluster-name: cluster-1 # kpt-set: ${name}
  project-id: example-1234 # kpt-set: ${project-id}

Child resource:

apiVersion: iam.cnrm.cloud.google.com/v1beta1
kind: IAMServiceAccount
metadata:
  name: gke-cluster-1-pool-1 # kpt-set: gke-${cluster-name}-${name}

Service account names are frequently too long when concatenating other names together. With this configuration the starlark function could truncate it after the setters have been applied. And with this pattern, the name setter means "node pool name" in the child package and "cluster name" in the parent package, allowing for hierarchical setter namespacing.

@karlkfi karlkfi added the enhancement New feature or request label Jul 10, 2021
@frankfarzan
Copy link
Contributor

frankfarzan commented Jul 12, 2021

This is a use case that has been discussed, referred to as "pre-processors" as opposed to current execution order where parent package functions run as "post-processors". This is something that can be added incrementally as it is purely additive (i.e. via a preprocessors field in the pipeline section) so was left out of v1 scope. We should look at this in v1.1 time frame.

@frankfarzan frankfarzan changed the title Early & Late Mutator Functions Declaring preprocessor functions Jul 12, 2021
@frankfarzan frankfarzan added this to To do in kpt kanban board via automation Jul 12, 2021
@frankfarzan frankfarzan added the triaged Issue has been triaged by adding an `area/` label label Jul 12, 2021
@frankfarzan frankfarzan added this to the v1.1 milestone Jul 12, 2021
@mikebz mikebz removed this from the v1.1 milestone Jul 14, 2021
@morgante
Copy link
Contributor

Please prioritize this. The lack of this basically makes setter inheritance unworkable as currently implemented. Let's say I want to apply a setter (ex. organization-id) but some functions in lower packages depend on that setter value. The functions in subpackages will always run before the parent apply-setters function, meaning my only option is to manually copy the setter value down into my subpackage.

@phanimarupaka
Copy link
Contributor

@karlkfi Not proposing a solution here, but trying to understand the problem better, does it help if we make the order of hydration configurable ? Currently, the default order if bottom up, where children are hydrated before parents. What if we make it configurable, and hydrate parents first and then children?(top down)

@morgante
Copy link
Contributor

@phanimarupaka That would not solve the problem, as we still have cases where parents need to be applied first (ex. setter inheritance, the one I mentioned). Shifting the burden to users via configuration isn't sufficient.

@phanimarupaka
Copy link
Contributor

So for this case, having early-mutators, early-validators which are applied in top-down fashion(additive change) and mutators, validators which apply in bottom-up fashion(current behavior) should help I guess. Advanced users can pick and choose where to include the functions.

@phanimarupaka
Copy link
Contributor

@morgante Another way is by using target selector for functions. The starlark function can be moved to parent package and specify target functions as mentioned in this #2015.

@morgante
Copy link
Contributor

@phanimarupaka No, that is not a solution. The entire point is that the parent package should not need to know the details of child packages. Once you start tightly coupling them like that, you've lost the package boundary.

We need either a form of pre and post hooks, or a way to assign weights to functions.

@karlkfi
Copy link
Contributor Author

karlkfi commented Jul 23, 2021

Morgante is right. The point is to be able to re-use packages by wrapping them in a parent package and have the parent be able to mutate the inputs and outputs of the child package, without needing the user to manually modify the child package directly.

In the common case, the setters are inputs and the resulting yaml is the output. Early mutators allow modifying the inputs. Late mutators allow modifying the outputs.

I prefer this to weights because any new parent package can be added as a wrapper without the child being able to “escape” the parent’s control. It’s encapsulation, like a function calling another function and being able to modify arguments and return values.

@droot
Copy link
Contributor

droot commented Jul 23, 2021

@phanimarupaka about: does it help if we make the order of hydration configurable ?

That won't help technically because we don't allow operating on meta resources during render, while in this case, primary objective for top-down processing is to customize the input to the pipeline.

The point is to be able to re-use packages by wrapping them in a parent package and have the parent be able to mutate the inputs and outputs of the child package, without needing the user to manually modify the child package directly.
In the common case, the setters are inputs and the resulting yaml is the output. Early mutators allow modifying the inputs. Late mutators allow modifying the outputs.

Thanks @karlkfi This articulates the use-case so well conceptually.

@bgrant0607 also touched on this in the comment #1280 (comment)

I think we should design this with filter functionality together because they seem to have an interplay here.

@phanimarupaka
Copy link
Contributor

phanimarupaka commented Jul 23, 2021

Morgante is right. The point is to be able to re-use packages by wrapping them in a parent package and have the parent be able to mutate the inputs and outputs of the child package, without needing the user to manually modify the child package directly.

So the early mutators should act only on the inputs, and early validators should validate the inputs to solve this problem. The functions declared in pre-processors section (which holds early mutators and validators) should act only on the Kptfile/function config (meta resources) files and not actual resources. I feel that this is pretty powerful in terms of input values mutations and validations. This would also solve issues like #2041.

Some prior art in this space is PersistentPreRun hook in Cobra https://github.com/spf13/cobra/blob/v1.2.1/user_guide.md#prerun-and-postrun-hooks

In the common case, the setters are inputs and the resulting yaml is the output. Early mutators allow modifying the inputs. Late mutators allow modifying the outputs.

I prefer this to weights because any new parent package can be added as a wrapper without the child being able to “escape” the parent’s control. It’s encapsulation, like a function calling another function and being able to modify arguments and return values.

@droot
Copy link
Contributor

droot commented Jul 23, 2021

The functions declared in pre-processors section (which holds early mutators and validators) should act only on the Kptfile/function config (meta resources) files and not actual resources.

Not sure, if we need to have such restriction. Being able to modify a resource in pre-processor is also a way to configure input to the pipeline of a package. For ex, changing the replica count of kafka cluster in the resource yaml and then reconcile function changing the dns names on the basis of new replica count is a simple use-case.

@phanimarupaka
Copy link
Contributor

The functions declared in pre-processors section (which holds early mutators and validators) should act only on the Kptfile/function config (meta resources) files and not actual resources.

Not sure, if we need to have such restriction. Being able to modify a resource in pre-processor is also a way to configure input to the pipeline of a package. For ex, changing the replica count of kafka cluster in the resource yaml and then reconcile function changing the dns names on the basis of new replica count is a simple use-case.

Agree. I think we got good inputs regarding the problem statement. We can switch to internal design doc to finalize the design.

@karlkfi
Copy link
Contributor Author

karlkfi commented Jul 23, 2021

Not sure, if we need to have such restriction. Being able to modify a resource in pre-processor is also a way to configure input to the pipeline of a package.

Agreed. Because kpt uses in-place hydration, all of the outputs may be used as inputs.

@phanimarupaka phanimarupaka added this to the Q3-2021 milestone Aug 24, 2021
@phanimarupaka phanimarupaka moved this from In progress to ToDo in kpt kanban board Sep 7, 2021
@phanimarupaka phanimarupaka modified the milestones: Q3-2021, Q4-2021 Sep 22, 2021
@droot droot removed this from ToDo in kpt kanban board Jan 19, 2022
@droot droot assigned droot and unassigned droot and phanimarupaka Mar 21, 2022
@RafalMaleska
Copy link

stumbled about this as well.
makes in our case the usage of child-parent kptfiles difficult

current workaround: wrapper script

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/hydrate customer deep engagement design-doc enhancement New feature or request triaged Issue has been triaged by adding an `area/` label
Projects
None yet
Development

No branches or pull requests

7 participants