New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-29: Operator dependencies #1454
Conversation
This KEP aims to improve operator user and developer experience by introducing operator dependencies. Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty straightforward and elegant solution to dependencies 👍 I'll keep thinking about it in the back of my head but sounds good on first read :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great and well thought-out!
The only sore point is parameter handling for dependencies. It doesn't really fit into the current architecture of KUDO and there's nothing we can do about it. Only workarounds like the ones you're suggesting would work. To fully solve this we would need a different approach to KUDO's controllers. Something which, of course, cannot be part of this KEP. My suggested approach is to get rid of Operator
and OperatorVersion
and only keep Instance
. Furthermore all template rendering would be done in the client and an Instance
would consist of already rendered templates. As a result, the instance controller would be much simpler and it's only concern would be to apply resources (from the rendered template) at the respective steps. With this, all dependency management would be done on the client side.
I agree. Wdyt about this alternative? @nfnt |
@zen-dog Yes, that sounds like a good approach to solve this in the model we have right now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nitpicks and style improvements inline. However my biggest concern is with parameter passing. Here's my take on it:
KUDO should encourage operator composition by providing a way of operator encapsulation. In other words, operator users should not be allowed to arbitrarily modify the parameters of embedded operator instances. In the proposed model, where we do not support depending on pre-existing Instances
, the set of instances that are depended on by the top-level one is an implementation detail.
Therefore I propose that we keep the current syntax and semantics for -p
and -P
and only allow the operator user to set the parameters on the top-level operator instance. If and when the operator developer wants to provide a way to customize these embedded instances, they can do so using the following mechanism:
The proposed new task type has a parameterFile
attribute, which is defined to be the name of a file in the templates
directory of the operator package. This file is expanded using the same templating mechanism as other templates, which could include (among other things) values of top-level-operator's parameters, possibly provided by the operator user. The expanded file is fed into the instance installation as parameter file content.
Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
While I agree with the desire of the operator encapsulation I'm not sure if the above-described approach (or the way I understood it) will work. Let's go through an example: say we have a top-level operator - name: deploy-bb
kind: Operator
spec:
package: BB
parameterFile: bb-params.yaml and the BB-PASSWORD: secret The ...
spec:
parameters:
PASSWORD: {{.Instance-aa.BB-PASSWORD}} where Now, when executing the There are two issues with this approach off the top of my head:
Summary: |
I got lost where you said "we have to modify the If the bb operator requires a parameter
Now, let's say a user of I understand that:
Then, when the kudo controller runs the
voila! |
Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
I don't quite understand: KUDO Instance controller is already running the BB
module "bb" {
password = "${var.bb_password}
}
|
Hang on, you what do you mean by "already existing"? aa depends on bb, so bb is only being installed now as part of aa being deployed.
That's what the
No, what makes you think so? bb does not even know that aa exists. aa explicilty passes parameters to bb.
There is no clash. If aa depends on bb, cc and dd, and each of these dependencies have a parameter Example: if a
to the operator user. |
Ok, talked offline about it and I'll update the KEP with the results. |
Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This really looks great. I'm amazed we managed to come up with something so simple for such a complex topic in general.
Some nitpicks inline, and I'm also wondering whether we should start thinking about garbage-collecting old unreferenced Operator
and OperatorVersion
resources.
Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved. I had some comments, more option questions, and one possible change (the dep graph root).
|
||
In the first step we build a dependency graph. A set of all graph vertices (which are task-operators) `S` is defined as `S = {AA, BB, CC, EE, GG}`. A transitive relationship `R` between the vertices is defined as `(a, b) ∈ S` meaning _`a` needs `b` deployed first_. The transitive relationship for the above example is: `R = { (AA,BB), (AA,CC), (BB,EE), (BB,GG) }`. The resulting topological order `O` is therefor `O = (EE, GG, BB, CC, AA)` which has no cycles. | ||
|
||
The instance controller (IC) then starts with the execution of the top-level `deploy` plan of the operator `AA`. The first task is the `BB` operator-task. When executing it, IC creates the `Instance-BB` resource and ends current reconciliation. Next, IC notices new `Instance-BB` resource, starts new reconciliation, and executes the `deploy` plan of the operator `BB` which then creates `Instance-EE` resource. This way we are basically performing the depth-first search for the dependency graph, executing each vertex in the right order e.g. `EE` has to be healthy before `BB` deploy plan can continue with the next step `F`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing that Terraform does that I really like is add a root node to that dependency graph as the "permanent" start point. This is handy so that you can have multiple "roots" (still non-cyclical!) that you can execute in parallel. Do we want to include that concept here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the model above, the one root is the AA
deploy
plan. However, it becomes interesting if it has multiple parallel phases, each then having an operator-task. In this case, we might actually build multiple graphs and execute them in parallel. I'll keep that in the back of my head and revisit once we get to the implementation. Good point 👍
|
||
Note that in the above example if e.g. `EE` and `CC` task-operators reference the same operator package they must use distinct instance names `spec.instanceName` so that two separate `Instance`s are deployed. Otherwise, the dependency graph will have a cycle. | ||
|
||
##### Dependencies Parametrization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
##### Dependencies Parametrization | |
##### Dependencies Parameterization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seem that both notations exist, but wiki has Parametrization as the main one.
|
||
## Alternatives | ||
|
||
One alternative is to use terraform and the existing [KUDO terraform provider](https://kudo.dev/blog/blog-2020-02-07-kudo-terraform-provider-1.html#current-process) to outsource the burden of dealing with the dependency graphs. On the upside, we would avoid the additional implementation complexity in KUDO _itself_ (though the complexity of the terraform provider is not going anywhere) and get [output values](https://www.terraform.io/docs/configuration/outputs.html) and [resource referencing](https://www.terraform.io/docs/configuration/resources.html#referring-to-instances) on top. On the downside, terraform is a heavy dependency which will completely replace KUDO UI. It is hard to quantify the pros and cons of both approaches, so it is left up for discussion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would lean against using Terraform for this. Semantically, KUDO operators are their own "concept". The KUDO Terraform Provider is still very valuable even with a built-in dependency engine, as it allows KUDO to be a first class concept if a user is powering their IaC with Terraform.
However, there are others doing IaC with tools such as Pulumi, Cluster API, Crossplane, and others. Treating KUDO semantics at a separate level allows those semantics to be exposed as a single unit to any prevailing tool rather than tying a critical, core feature to TF specifics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great work... I would like to have some discussion around a few things...
current thought:
- I would prefer an overarching label or annotation that spans all instances of a multi operator install.
- most significant thought... is I would prefer to have a better scoping of params without a lot of burden on the operator dev.
- it would be nice to understand what will happen with plan status... will that show all status for all operators in a single view?
- "Other Plans" removes the scope of other plans... but couldn't we state that dependent operators will be able to be managed separately... for instance in a kakfa+zookeeper install... it should still be possible to run the backup plan of the zookeeper right? or are you explicitly saying that will not work?
* [Deployment](#deployment) | ||
* [Client-Side](#client-side) | ||
* [Server-Side](#server-side) | ||
* [Parameterization](#dependencies-parametrization) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* [Parameterization](#dependencies-parametrization) | |
* [Dependencies Parameterization](#dependencies-parameterization) |
keps/0029-operator-dependencies.md
Outdated
|
||
### Non-Goals | ||
|
||
Dependency on an already running `Instance` is a non-goal. It is easy to imagine a situation when a new operator (e.g Kafka) may want to depend on the existing Zookeeper instance. However, such life-cycle dependency presents major challenges e.g. what happens when Zookeeper is removed? What happens when Zookeeper is upgraded, and the new version is incompatible with the current Kafka `Instance`? How can we ensure the compatibility? This KEP deliberately ignores this area and instead focuses on installation dependencies. Additionally, this KEP does not address output variables or referencing `Instance` resources. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏
repo: # optional, name of local repository configuration to use | ||
appVersion: # optional, a specific app version in the official repo, defaults to the most recent one | ||
operatorVersion: # optional, a specific operator version in the official repo, defaults to the most recent one | ||
instanceName: # optional, the instance name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what guarantees do we make about instanceName. optional or not... if it is NOT supplied... do we only go with the default and if the default fails the install fails?
Does the instance-name have any linkage to the group? like kakfa-zookeeper-instance
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, It might be valuable to allow prefixes or some other linkage to the group. If I name my top-level operator abc
, the dependency might/should have the name abc-zookeeper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But that's at the hand of the operator developer, right? If you want, you can name your instances accordingly. If not, KUDO will take the default name, like it does today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we either have to extend the defaulting (i.e. if the instance name zookeeper-instance
is already taken, use zookeeper-instance1
or provide a way for the operator user to configure this.
Otherwise we run into the problem of having multiple operators that use the same dependency:
Operator AA uses zk and doesn't define an instanceName. zk gets installed as zookeeper-instance
.
Operator BB uses zk and doesn't define an instanceName. zk wants to install as zookeeper-instance
but it's already there and it fails. The user now has no way to install Operator BB. Or am I missing something?
Making the instanceName required may help, but... Not sure if that's the best way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally - If i want to install AA twice, with different instance names, I can do that for the top-level operator. But if the dependencies don't rely on the top-level-instance name, they will always clash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are all good points. Wdyt about making dependency instance names hierarchical? E.g. if the top-level operator instance is named kafka-prod
then it's ZK instance is named kakfa-prod.zk-instance
? This way, it is also immediately clear in the $ k get instances
where the instance belongs to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'd say that'd be a good idea. Not sure which separator we should use. .
vs -
vs _
, but some kind of hierarchy would be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice, but I'm not sure the schema for metadata.name
will like the dots/underscores or the resulting long names in case of deeply nested operators...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ̀docs say that an object name should:
- contain no more than 253 characters
- contain only lowercase alphanumeric characters, ‘-’ or ‘.’
- start with an alphanumeric character
- end with an alphanumeric character
Dots are fine but 253 characters might become an issue somewhere down the road. We could encourage short names for dependencies and drop the -instance
suffix. I'll add a note in the KEP.
instanceName: # optional, the instance name | ||
``` | ||
|
||
As you can see, this closely mimics the `kudo install` CLI command [options](https://github.com/kudobuilder/kudo/blob/master/pkg/kudoctl/cmd/install.go#L56) because at the end the latter will be executed to install the operator. We omit `parameters` and `parameterFile` options at this point as they are discussed in detail [below](#dependencies-parametrization). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you can see, this closely mimics the `kudo install` CLI command [options](https://github.com/kudobuilder/kudo/blob/master/pkg/kudoctl/cmd/install.go#L56) because at the end the latter will be executed to install the operator. We omit `parameters` and `parameterFile` options at this point as they are discussed in detail [below](#dependencies-parametrization). | |
As you can see, this closely mimics the `kudo install` CLI command [options](https://github.com/kudobuilder/kudo/blob/master/pkg/kudoctl/cmd/install.go#L56) because at the end the latter will be executed to install the operator. We omit `parameters` and `parameterFile` options at this point as they are discussed in detail [below](#dependencies-parameterization). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting that you and Gerred insist on this notation. My dictionary and wiki suggest the other one: #1454 (comment)
Upon receiving a new operator Instance with dependencies KUDO mangers workflow engine will: | ||
|
||
1. Build a [dependency graph](https://en.wikipedia.org/wiki/Dependency_graph) by transitively expanding top-level `deploy` plan using operator-tasks as vertices, and their execution order (`a` needs `b` to be installed first) as edges | ||
2. Perform cycle detection and fail if circular dependencies found. We could additionally run this check on the client-side as part of the `kudo package verify` command to improve the UX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great conversation regarding dependency parameter management via zoom.
I would like to see a solution around convention over configuration... however this model was explained to be a simpler model to develop and will accelerate the feature of operator dependencies in order to get more user feedback. The idea of "scoping" parameters isn't excluded with this solution...
lets do this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with this as is - As Ken said, if we need scoping parameters at some point, we can still add them, so 🚀
Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
This KEP aims to improve operator user and developer experience by introducing operator dependencies.
Signed-off-by: Aleksey Dukhovniy alex.dukhovniy@googlemail.com
Co-authored-by: Marcin Owsiany marcin@owsiany.pl