Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-29: Operator dependencies #1454

Merged
merged 10 commits into from Apr 22, 2020
Merged

KEP-29: Operator dependencies #1454

merged 10 commits into from Apr 22, 2020

Conversation

zen-dog
Copy link
Contributor

@zen-dog zen-dog commented Apr 6, 2020

This KEP aims to improve operator user and developer experience by introducing operator dependencies.

Signed-off-by: Aleksey Dukhovniy alex.dukhovniy@googlemail.com
Co-authored-by: Marcin Owsiany marcin@owsiany.pl

This KEP aims to improve operator user and developer experience by introducing operator dependencies.

Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
keps/0029-operator-dependencies.md Outdated Show resolved Hide resolved
keps/0029-operator-dependencies.md Outdated Show resolved Hide resolved
keps/0029-operator-dependencies.md Outdated Show resolved Hide resolved
Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
Copy link
Contributor

@alenkacz alenkacz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty straightforward and elegant solution to dependencies 👍 I'll keep thinking about it in the back of my head but sounds good on first read :)

keps/0029-operator-dependencies.md Show resolved Hide resolved
keps/0029-operator-dependencies.md Show resolved Hide resolved
keps/0029-operator-dependencies.md Show resolved Hide resolved
Copy link
Member

@nfnt nfnt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great and well thought-out!
The only sore point is parameter handling for dependencies. It doesn't really fit into the current architecture of KUDO and there's nothing we can do about it. Only workarounds like the ones you're suggesting would work. To fully solve this we would need a different approach to KUDO's controllers. Something which, of course, cannot be part of this KEP. My suggested approach is to get rid of Operator and OperatorVersion and only keep Instance. Furthermore all template rendering would be done in the client and an Instance would consist of already rendered templates. As a result, the instance controller would be much simpler and it's only concern would be to apply resources (from the rendered template) at the respective steps. With this, all dependency management would be done on the client side.

@zen-dog
Copy link
Contributor Author

zen-dog commented Apr 8, 2020

The only sore point is parameter handling for dependencies.

I agree. Wdyt about this alternative? @nfnt

@nfnt
Copy link
Member

nfnt commented Apr 8, 2020

@zen-dog Yes, that sounds like a good approach to solve this in the model we have right now.

Copy link
Member

@porridge porridge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nitpicks and style improvements inline. However my biggest concern is with parameter passing. Here's my take on it:

KUDO should encourage operator composition by providing a way of operator encapsulation. In other words, operator users should not be allowed to arbitrarily modify the parameters of embedded operator instances. In the proposed model, where we do not support depending on pre-existing Instances, the set of instances that are depended on by the top-level one is an implementation detail.

Therefore I propose that we keep the current syntax and semantics for -p and -P and only allow the operator user to set the parameters on the top-level operator instance. If and when the operator developer wants to provide a way to customize these embedded instances, they can do so using the following mechanism:

The proposed new task type has a parameterFile attribute, which is defined to be the name of a file in the templates directory of the operator package. This file is expanded using the same templating mechanism as other templates, which could include (among other things) values of top-level-operator's parameters, possibly provided by the operator user. The expanded file is fed into the instance installation as parameter file content.

keps/0029-operator-dependencies.md Outdated Show resolved Hide resolved
keps/0029-operator-dependencies.md Outdated Show resolved Hide resolved
keps/0029-operator-dependencies.md Show resolved Hide resolved
keps/0029-operator-dependencies.md Outdated Show resolved Hide resolved
keps/0029-operator-dependencies.md Outdated Show resolved Hide resolved
keps/0029-operator-dependencies.md Outdated Show resolved Hide resolved
keps/0029-operator-dependencies.md Outdated Show resolved Hide resolved
keps/0029-operator-dependencies.md Show resolved Hide resolved
Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
@zen-dog
Copy link
Contributor Author

zen-dog commented Apr 14, 2020

The proposed new task type has a parameterFile attribute, which is defined to be the name of a file in the templates directory of the operator package. This file is expanded using the same templating mechanism as other templates, which could include (among other things) values of top-level-operator's parameters, possibly provided by the operator user. The expanded file is fed into the instance installation as parameter file content.

While I agree with the desire of the operator encapsulation I'm not sure if the above-described approach (or the way I understood it) will work.

Let's go through an example: say we have a top-level operator AA (1) and a dependency operator BB (2) where latter has a parameter PASSWORD which is empty and required. We expect the operator user to pick the password during the installation. Now, with the approach above, the BB operator-task would look like:

- name: deploy-bb
  kind: Operator
  spec:
    package: BB
    parameterFile: bb-params.yaml

and the bb-params.yaml smth. like:

BB-PASSWORD: secret

The BB instance also needs to know that password has to be fetched from another Instance and so we have to modify the instance-bb like:

...
spec:
   parameters:
      PASSWORD: {{.Instance-aa.BB-PASSWORD}}

where .Instance-aa is a special type of reference for the top-level AA operator instance with the name Instance-aa.

Now, when executing the BBs deploy plan we would fetch the instance-aa and fetch the parameter BB-PASSWORD.

There are two issues with this approach off the top of my head:

  1. We would need to generate the instance-bb.Spec.Parameters.PASSWORD key and value and for that, we need to know where BB-PASSWORD belongs too. I guess we'll need some namespacing construct for passed parameters here anyway
  2. When talking about referencing of instance resources, the most likely use-case is top-down: e.g. Kafka needs to know the dependency instance-zk.ZK_URI parameter. So, we might end up "referencing the references" which might create circular dependencies.

Summary:
Both issues fall into the referencing instance resources which is ideally a non-goal for this KEP. Both issues would stretch an already packed KEP with additional complexity. With all Instances being pre-created all we need right now is to point the parameters to the right Instance. However, we might have to revisit this approach later and improve on it.

@porridge
Copy link
Member

I got lost where you said "we have to modify the instance-bb like:". Perhaps I'm missing something, but I don't think there is a need for any special types of references or other such mechanisms. Let me explain my idea in the context of your example:

If the bb operator requires a parameter PASSWORD, and the aa operator developer wants to allow users to customize it (rather than hardcode it in the aa operator) then think this should look as follows:

  1. operator bb defines a required default-empty parameter PASSWORD
  2. operator aa defines a parameter BB_PASSWORD (which might also be required or not, up to developer of aa)
  3. operator aa has a task like you said:
    - name: deploy-bb
      kind: Operator
      spec:
        package: BB
        parameterFile: bb-params.yaml
    
  4. operator aa contains a file templates/bb-params.yaml with the following content:
    PASSWORD: {{ .Params.BB_PASSWORD }}
    

Now, let's say a user of aa installs it with: kubectl kudo install aa --parameter BB_PASSWORD=foobar.

I understand that:

  • the Instance resource named aa-instance will contain a spec.parameters.BB_PASSWORD with verbatim value foobar
  • the OperatorVersion resource aa-x.y.z will contain a spec.templates.bb-params.yaml with the (not yet expanded) value PASSWORD: {{ .Params.BB_PASSWORD }}\n.

Then, when the kudo controller runs the deploy-bb task, it needs to:

  • fetch the value of bb-params.yaml from the aa-x.y.z. OperatorVersion,
  • fetch the parameters from the aa-instance Instance,
  • expand the former using the latter, in this example the result will be:
PASSWORD: foobar
  • write that result into a temporary file,
  • feed that file as the value to the --parameter-file flag of the kudo install bb command

voila!

Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
@zen-dog
Copy link
Contributor Author

zen-dog commented Apr 15, 2020

Then, when the kudo controller runs the deploy-bb task, it needs to:
...
feed that file as the value to the --parameter-file flag of the kudo install bb command

I don't quite understand: KUDO Instance controller is already running the BB deploy plan on an already existing instance-bb, and now you want to run kudo install bb --parameter-file?

fetch the value of bb-params.yaml from the aa-x.y.z. OperatorVersion

  1. I guess you deduct the parent operator from the bb.meta.ownerReferences that we set during the installation? Even so, now that you effectively allowed for Instances to reference each other parameters I'd rather make it explicit rather than implicit (e.g. implicitly deducting that .Params.PASSWORD comes from someplace else). Think about terraform and how you need to explicitly pass vars to the child module:
module "bb" {
  password  = "${var.bb_password}
}
  1. What do you do when multiple dependency operators have a parameter named PASSWORD?

@porridge
Copy link
Member

porridge commented Apr 15, 2020

Then, when the kudo controller runs the deploy-bb task, it needs to:
...
feed that file as the value to the --parameter-file flag of the kudo install bb command

I don't quite understand: KUDO Instance controller is already running the BB deploy plan on an already existing instance-bb

Hang on, you what do you mean by "already existing"? aa depends on bb, so bb is only being installed now as part of aa being deployed.

and now you want to run kudo install bb --parameter-file?

That's what the Operator task does, no?

fetch the value of bb-params.yaml from the aa-x.y.z. OperatorVersion

1. I guess you deduct the parent operator from the `bb.meta.ownerReferences` that we set during the installation?

No, what makes you think so? bb does not even know that aa exists. aa explicilty passes parameters to bb.

1. What do you do when multiple dependency operators have a parameter named `PASSWORD`?

There is no clash. If aa depends on bb, cc and dd, and each of these dependencies have a parameter PASSWORD, then aa is free to define BB_PASSWORD, CC_PASSWORD and DD_PASSWORD.

Example: if a frobnicator operator depends on a mysql operator and on samba operartor, each of which has a notion of a password, then frobnicator exposes two parameters:

  • mysql_password and
  • samba_password

to the operator user.

@zen-dog
Copy link
Contributor Author

zen-dog commented Apr 15, 2020

Ok, talked offline about it and I'll update the KEP with the results.

Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
Copy link
Member

@porridge porridge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really looks great. I'm amazed we managed to come up with something so simple for such a complex topic in general.

Some nitpicks inline, and I'm also wondering whether we should start thinking about garbage-collecting old unreferenced Operator and OperatorVersion resources.

keps/0029-operator-dependencies.md Outdated Show resolved Hide resolved
keps/0029-operator-dependencies.md Show resolved Hide resolved
keps/0029-operator-dependencies.md Show resolved Hide resolved
keps/0029-operator-dependencies.md Outdated Show resolved Hide resolved
keps/0029-operator-dependencies.md Outdated Show resolved Hide resolved
keps/0029-operator-dependencies.md Outdated Show resolved Hide resolved
Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
Copy link
Member

@gerred gerred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. I had some comments, more option questions, and one possible change (the dep graph root).


In the first step we build a dependency graph. A set of all graph vertices (which are task-operators) `S` is defined as `S = {AA, BB, CC, EE, GG}`. A transitive relationship `R` between the vertices is defined as `(a, b) ∈ S` meaning _`a` needs `b` deployed first_. The transitive relationship for the above example is: `R = { (AA,BB), (AA,CC), (BB,EE), (BB,GG) }`. The resulting topological order `O` is therefor `O = (EE, GG, BB, CC, AA)` which has no cycles.

The instance controller (IC) then starts with the execution of the top-level `deploy` plan of the operator `AA`. The first task is the `BB` operator-task. When executing it, IC creates the `Instance-BB` resource and ends current reconciliation. Next, IC notices new `Instance-BB` resource, starts new reconciliation, and executes the `deploy` plan of the operator `BB` which then creates `Instance-EE` resource. This way we are basically performing the depth-first search for the dependency graph, executing each vertex in the right order e.g. `EE` has to be healthy before `BB` deploy plan can continue with the next step `F`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that Terraform does that I really like is add a root node to that dependency graph as the "permanent" start point. This is handy so that you can have multiple "roots" (still non-cyclical!) that you can execute in parallel. Do we want to include that concept here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the model above, the one root is the AA deploy plan. However, it becomes interesting if it has multiple parallel phases, each then having an operator-task. In this case, we might actually build multiple graphs and execute them in parallel. I'll keep that in the back of my head and revisit once we get to the implementation. Good point 👍


Note that in the above example if e.g. `EE` and `CC` task-operators reference the same operator package they must use distinct instance names `spec.instanceName` so that two separate `Instance`s are deployed. Otherwise, the dependency graph will have a cycle.

##### Dependencies Parametrization
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
##### Dependencies Parametrization
##### Dependencies Parameterization

Copy link
Contributor Author

@zen-dog zen-dog Apr 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seem that both notations exist, but wiki has Parametrization as the main one.


## Alternatives

One alternative is to use terraform and the existing [KUDO terraform provider](https://kudo.dev/blog/blog-2020-02-07-kudo-terraform-provider-1.html#current-process) to outsource the burden of dealing with the dependency graphs. On the upside, we would avoid the additional implementation complexity in KUDO _itself_ (though the complexity of the terraform provider is not going anywhere) and get [output values](https://www.terraform.io/docs/configuration/outputs.html) and [resource referencing](https://www.terraform.io/docs/configuration/resources.html#referring-to-instances) on top. On the downside, terraform is a heavy dependency which will completely replace KUDO UI. It is hard to quantify the pros and cons of both approaches, so it is left up for discussion.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would lean against using Terraform for this. Semantically, KUDO operators are their own "concept". The KUDO Terraform Provider is still very valuable even with a built-in dependency engine, as it allows KUDO to be a first class concept if a user is powering their IaC with Terraform.

However, there are others doing IaC with tools such as Pulumi, Cluster API, Crossplane, and others. Treating KUDO semantics at a separate level allows those semantics to be exposed as a single unit to any prevailing tool rather than tying a critical, core feature to TF specifics.

keps/0029-operator-dependencies.md Show resolved Hide resolved
Copy link
Member

@kensipe kensipe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great work... I would like to have some discussion around a few things...

current thought:

  1. I would prefer an overarching label or annotation that spans all instances of a multi operator install.
  2. most significant thought... is I would prefer to have a better scoping of params without a lot of burden on the operator dev.
  3. it would be nice to understand what will happen with plan status... will that show all status for all operators in a single view?
  4. "Other Plans" removes the scope of other plans... but couldn't we state that dependent operators will be able to be managed separately... for instance in a kakfa+zookeeper install... it should still be possible to run the backup plan of the zookeeper right? or are you explicitly saying that will not work?

* [Deployment](#deployment)
* [Client-Side](#client-side)
* [Server-Side](#server-side)
* [Parameterization](#dependencies-parametrization)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* [Parameterization](#dependencies-parametrization)
* [Dependencies Parameterization](#dependencies-parameterization)


### Non-Goals

Dependency on an already running `Instance` is a non-goal. It is easy to imagine a situation when a new operator (e.g Kafka) may want to depend on the existing Zookeeper instance. However, such life-cycle dependency presents major challenges e.g. what happens when Zookeeper is removed? What happens when Zookeeper is upgraded, and the new version is incompatible with the current Kafka `Instance`? How can we ensure the compatibility? This KEP deliberately ignores this area and instead focuses on installation dependencies. Additionally, this KEP does not address output variables or referencing `Instance` resources.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏

keps/0029-operator-dependencies.md Outdated Show resolved Hide resolved
repo: # optional, name of local repository configuration to use
appVersion: # optional, a specific app version in the official repo, defaults to the most recent one
operatorVersion: # optional, a specific operator version in the official repo, defaults to the most recent one
instanceName: # optional, the instance name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what guarantees do we make about instanceName. optional or not... if it is NOT supplied... do we only go with the default and if the default fails the install fails?
Does the instance-name have any linkage to the group? like kakfa-zookeeper-instance ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, It might be valuable to allow prefixes or some other linkage to the group. If I name my top-level operator abc, the dependency might/should have the name abc-zookeeper

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that's at the hand of the operator developer, right? If you want, you can name your instances accordingly. If not, KUDO will take the default name, like it does today.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we either have to extend the defaulting (i.e. if the instance name zookeeper-instance is already taken, use zookeeper-instance1 or provide a way for the operator user to configure this.

Otherwise we run into the problem of having multiple operators that use the same dependency:

Operator AA uses zk and doesn't define an instanceName. zk gets installed as zookeeper-instance.
Operator BB uses zk and doesn't define an instanceName. zk wants to install as zookeeper-instance but it's already there and it fails. The user now has no way to install Operator BB. Or am I missing something?

Making the instanceName required may help, but... Not sure if that's the best way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally - If i want to install AA twice, with different instance names, I can do that for the top-level operator. But if the dependencies don't rely on the top-level-instance name, they will always clash.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are all good points. Wdyt about making dependency instance names hierarchical? E.g. if the top-level operator instance is named kafka-prod then it's ZK instance is named kakfa-prod.zk-instance? This way, it is also immediately clear in the $ k get instances where the instance belongs to?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'd say that'd be a good idea. Not sure which separator we should use. . vs - vs _, but some kind of hierarchy would be good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice, but I'm not sure the schema for metadata.name will like the dots/underscores or the resulting long names in case of deeply nested operators...

Copy link
Contributor Author

@zen-dog zen-dog Apr 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ̀docs say that an object name should:

  • contain no more than 253 characters
  • contain only lowercase alphanumeric characters, ‘-’ or ‘.’
  • start with an alphanumeric character
  • end with an alphanumeric character

Dots are fine but 253 characters might become an issue somewhere down the road. We could encourage short names for dependencies and drop the -instance suffix. I'll add a note in the KEP.

instanceName: # optional, the instance name
```

As you can see, this closely mimics the `kudo install` CLI command [options](https://github.com/kudobuilder/kudo/blob/master/pkg/kudoctl/cmd/install.go#L56) because at the end the latter will be executed to install the operator. We omit `parameters` and `parameterFile` options at this point as they are discussed in detail [below](#dependencies-parametrization).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
As you can see, this closely mimics the `kudo install` CLI command [options](https://github.com/kudobuilder/kudo/blob/master/pkg/kudoctl/cmd/install.go#L56) because at the end the latter will be executed to install the operator. We omit `parameters` and `parameterFile` options at this point as they are discussed in detail [below](#dependencies-parametrization).
As you can see, this closely mimics the `kudo install` CLI command [options](https://github.com/kudobuilder/kudo/blob/master/pkg/kudoctl/cmd/install.go#L56) because at the end the latter will be executed to install the operator. We omit `parameters` and `parameterFile` options at this point as they are discussed in detail [below](#dependencies-parameterization).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting that you and Gerred insist on this notation. My dictionary and wiki suggest the other one: #1454 (comment)

Upon receiving a new operator Instance with dependencies KUDO mangers workflow engine will:

1. Build a [dependency graph](https://en.wikipedia.org/wiki/Dependency_graph) by transitively expanding top-level `deploy` plan using operator-tasks as vertices, and their execution order (`a` needs `b` to be installed first) as edges
2. Perform cycle detection and fail if circular dependencies found. We could additionally run this check on the client-side as part of the `kudo package verify` command to improve the UX
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

keps/0029-operator-dependencies.md Show resolved Hide resolved
keps/0029-operator-dependencies.md Outdated Show resolved Hide resolved
keps/0029-operator-dependencies.md Show resolved Hide resolved
@zen-dog zen-dog mentioned this pull request Apr 17, 2020
4 tasks
@zen-dog zen-dog linked an issue Apr 17, 2020 that may be closed by this pull request
4 tasks
Copy link
Member

@kensipe kensipe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great conversation regarding dependency parameter management via zoom.
I would like to see a solution around convention over configuration... however this model was explained to be a simpler model to develop and will accelerate the feature of operator dependencies in order to get more user feedback. The idea of "scoping" parameters isn't excluded with this solution...
lets do this!

Copy link
Member

@ANeumann82 ANeumann82 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with this as is - As Ken said, if we need scoping parameters at some point, we can still add them, so 🚀

Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
@zen-dog zen-dog merged commit 62921fc into master Apr 22, 2020
@zen-dog zen-dog deleted the ad/kep-30 branch April 22, 2020 12:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Operator Dependencies
7 participants