KEP 33 - Structured parameters #1666

Merged: 17 commits into main from an/kep-33-structured-parameters, Sep 25, 2020

Conversation

ANeumann82 (Member)

Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
@ANeumann82 ANeumann82 marked this pull request as draft August 31, 2020 10:24
zen-dog and others added 10 commits August 31, 2020 21:18
Updated defined types in Spec
Added small description to wizard and listing section

Member

@kensipe kensipe left a comment

1st pass


### Non-Goals

Extend the current format to support new features.
Member

Is it a goal or a non-goal to provide recommended groupings? I ask because there is UX value in having standard naming for things like "data node" and "jvm" etc. If not a goal here... perhaps a mention that there is value in a future kep based on this feature for grouping and standardizing the user experience.

Contributor

> Is it a goal or a non-goal to provide recommended groupings?

This isn't a goal at this point. However, this should be possible via extra `"ui:xxx": ...` properties on parameters.
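
For illustration, a minimal sketch of what such grouping hints could look like, assuming hypothetical `ui:group` / `ui:order` extension keywords (these are not defined by this KEP):

```yaml
properties:
  Zookeeper:
    type: object
    "ui:group": "Coordination"   # hypothetical UI hint, ignored by schema validation
    properties:
      URI:
        type: string
        "ui:order": 1            # hypothetical ordering hint for a wizard/UI
```

A UI could read such extra properties to build standard groupings like "data node" or "jvm" without the parameter schema itself having to change.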

"$schema": "http://json-schema.org/draft-07/schema#"
apiVersion: kudo.dev/v1beta2
properties:
Zookeeper:
Member

Is this intended to be Camel case? Or Sentence case? Can it be two words combined or not, and how is that managed? What about the display name... is it dependent on the casing of the property? And does the casing impact setting the value from the values file or passing the param via the CLI?

Contributor

> Is this intended to be Camel case? or Sentence case?

We don't have an established policy at this point, so it's up for debate. But yes, it is case-sensitive (in the CLI and in templates).

type: "string"
description: "host and port information for Zookeeper connection. e.g. zk:2181,zk2:2181,zk3:2181"
default: "zookeeper-instance-zookeeper-0.zookeeper-instance-hs:2181"
required: ["URI"]
Member

The semantics of this `required` seem different to me... the value of `required` before was to ensure that a value for that property was provided, and it was mainly useful when a default was NOT provided. When a default was provided, it was used.

Defining `required` in a schema feels different in that, if a default value is available, the user does not need to provide it... but if you have a values file that you want to validate against the schema, validation would fail because the required element is missing.

Contributor

That's a good point. I guess we'll have to enrich the values with the defaults from the schema and then validate it 🤔

Member Author

Yeah, we have to merge the data before validating. There are discussions about this, and the consensus seems to be that validation will (usually) ignore default values.
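
As a rough illustration of that order of operations (schema fragment and values are invented, based on the Zookeeper example above):

```yaml
# Schema fragment (sketch): URI is required but also has a default
Zookeeper:
  type: object
  required: ["URI"]
  properties:
    URI:
      type: string
      default: "zookeeper-instance-zookeeper-0.zookeeper-instance-hs:2181"
---
# User-supplied values omit URI entirely
Zookeeper: {}
---
# Values after merging in the defaults; `required` is only checked now, so validation passes
Zookeeper:
  URI: "zookeeper-instance-zookeeper-0.zookeeper-instance-hs:2181"
```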

### Non-Goals

Extend the current format to support new features.

Member

Additionally, will the grouping/hierarchy be limited? And is that a goal or non-goal? I believe DC/OS params were limited to one level deep, so it was possible to set a grouping or category of information. This also has an impact on the UI team and the UX: it is really hard to support infinite nesting of properties, but it is reasonable to expect 1 or 2 levels. One can expect the UI team to create a simple tab for each category and list all the properties in that category; it is challenging to imagine what infinite nesting looks like.

Contributor

Limiting the hierarchy is not a goal at this point. This is probably better solved "by convention" anyway. Also, dcos C* has e.g. four levels.


### CRDs

Normally, per Kubernetes conventions, CRDs would evolve over released versions without a breaking change from one version to the next, allowing clients to gradually convert to new CRD versions. This requires a CRD to be convertible from the old version to the new one _and vice versa_, so that all clients (new and old) can read and write both old and new CRD versions. However, such a conversion in both directions will not be possible with this change, as JSON-schema supports a lot of configurations that cannot be mapped back to a flat parameter list.
Contributor

@alenkacz alenkacz Sep 7, 2020

I wonder if this is right; I have to do some reading. I wonder if we should not have a really new API version / a different API when we're doing such a breaking change. Is one-way conversion the norm?

Contributor

@alenkacz alenkacz Sep 7, 2020

I think this is fine; this is what the docs say for beta features: "The schema and/or semantics of objects may change in incompatible ways in a subsequent beta or stable release. When this happens, we will provide instructions for migrating to the next version. This may require deleting, editing, and re-creating API objects. The editing process may require some thought. This may require downtime for applications that rely on the feature."

Contributor

But at the same time I found this: "In Kubernetes, all versions must be safely round-tripable through each other. This means that if we convert from version 1 to version 2, and then back to version 1, we must not lose information."

Contributor

After that, btw, they propose that you need to change v1 and enhance it so that it can contain all the information from v2. Can you imagine doing that?

Member Author

Enhancing the v1beta1 version so it can contain all the data from v1beta2 was my first idea, but that would mean we'd have to restrict the usage of JSON-schema to a quite limited subset. So, realistically, I don't see that as a viable option.

The "all versions must be safely round-tripable" rule is true, but only for all served versions. The current idea is to deprecate v1beta1 immediately and only serve v1beta2. In the code, we still need to have v1beta1, but only for migration purposes.

This ensures that only conversions from v1beta1 to v1beta2 are done, never the other way around.
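
For illustration, the served/storage flags described here map onto the standard CRD versioning fields; a rough sketch (CRD name abbreviated, schemas omitted, not the final manifest):

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: operatorversions.kudo.dev
spec:
  group: kudo.dev
  versions:
    - name: v1beta1
      served: false   # kept in the list only so stored objects can still be migrated
      storage: false
      schema: {}      # omitted in this sketch
    - name: v1beta2
      served: true    # the only version clients can read and write
      storage: true
      schema: {}      # omitted in this sketch
```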

Contributor

Alright, I cannot really evaluate how good that option is and whether it poses some problems; I would have to do a lot of reading. All the sources I found talk about round-trippability as the main requirement for all fields: https://groups.google.com/g/kubernetes-sig-api-machinery/c/07JKqCzQKsc?pli=1 🤷

Contributor

Do you have some examples/reading about someone who also did it this way? Or basically any conversion webhooks in the wild?

Member Author

For examples: the only real project I know that uses multiple CRD versions is cert-manager, and I looked at their code for some of the things. They do register a conversion webhook, but fully rely on the default schema conversion without any custom code. I haven't looked into their CRDs to see how closely they follow the full-roundtrip rule. I assume closely, as they might serve multiple/all versions.
I don't know any project that does "one-way conversion" or migration, though.

And yeah, the solution described here is not optimal and breaks the Kubernetes API compatibility rules, but it shouldn't pose any technical problems.

The alternatives would be one of these:

  1. Make it a fully breaking change - Ignore existing installations and just change the CRD. I'm very opposed to this solution.
  2. Do it following the K8s API rules - This would involve multiple new CRD versions over multiple KUDO releases and not really change much: the currently released clients still won't work with the final CRDs, but there is a time period in between, where we support both the old and the new model, during which clients can switch. I argue that we don't have many clients yet, and are not that integrated into existing structures, so we can get away with skipping this migration period.


The server-side of the proposal includes:
- Updating `Operator` and `Instance` CRDs to use JSON-schema as the parameter format
- Providing a conversion-webhook to convert in-cluster running operators to the new format
Contributor

Just a reminder that this will bump the minimal Kubernetes version for KUDO to 1.16 while we've been keeping it at 1.15 (is Konvoy 1.16 out?). I don't think it changes anything, just noting :)

Member Author

Well, it's feature gated and on by default for 1.15. But yes, we should probably mention it somewhere in the KEP. (https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definition-versioning/#webhook-conversion)
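
For reference, with the `apiextensions.k8s.io/v1` CRD API, webhook conversion is declared via `spec.conversion`; a rough sketch (service name, namespace, and path are placeholders, not KUDO's actual values):

```yaml
spec:
  conversion:
    strategy: Webhook
    webhook:
      conversionReviewVersions: ["v1"]
      clientConfig:
        service:
          namespace: kudo-system                 # placeholder
          name: kudo-controller-manager-service  # placeholder
          path: /convert                         # placeholder
          port: 443
        caBundle: "<base64-encoded CA bundle>"   # typically injected by cert tooling
```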

1. Deploy the new CRD version where the new (`v1beta2`) version will be the _only_ served and stored version:
- `v1beta1` will still be in the list, but with `storage=false, served=false`
- `v1beta2` will be marked as `storage=true, served=true`
2. Deploy the new manager with the conversion webhook
Contributor

I would personally deploy the manager before the CRD, so there's already a conversion implementation in place when the CRD comes.

Member Author

@ANeumann82 ANeumann82 Sep 7, 2020

Hmmm. I'm not sure. Starting the manager first would mean the manager would try to start reconciliations, or maybe couldn't even start, because it tries to register a watch on `Instance` with `v1beta2`, which is unknown to the cluster if the CRDs are not deployed first.

Maybe this is the reason that projects separate the manager and the conversion webhook? That's another option. I would prefer to keep it in the manager, but maybe we have to make it a separate deployment?

Contributor

Hmm, I see, maybe. But that would make our upgrade process even more complicated. Can we even express steps like this easily right now?

Member Author

Yeah, I think the steps here can be implemented pretty easily. I've updated the section a little bit.

Contributor

> I would personally deploy manager before the crd?

Like Andreas mentioned, you need the new CRD version before the manager can register itself to watch them, right? And the resources have to be converted too at this moment. So the whole update process will be something like:

  1. Kill the manager (or maybe step 3 can do it for the user?)
  2. `kubectl krew upgrade kudoctl` to upgrade the kudoctl
  3. `kudo init --update` to install new CRDs, run the new manager+webhook, and trigger the migration

Member Author

`kudo init --upgrade` kills the manager already, so there's no need to manually shut it down. The whole process described here is internal to `kudo init --upgrade`.

Contributor

@alenkacz alenkacz left a comment

This is a well-researched and very detailed KEP, thank you for that 👏

I have done my first pass; I have a couple of comments. I think my only big concern is this: #1666 (comment)

Other than that, it looks like a lot of work and it's a breaking change, so the only thing I question is whether it's worth it, and whether some easier but much less nice solution would not be good enough.


A better solution than `apiextensions.JSON` would be to have a typed structure, but CRDs currently cannot describe recursive data structures, as they [don't support $ref or $recursiveRef](https://github.com/kubernetes/kubernetes/issues/54579). The only way to define an arbitrary JSON structure in a Kubernetes API is with `apiextensions.JSON`. The resulting CRD for the operator version will therefore look like this:
Contributor

I see, this is fine on the CRD level. What would be nice is to have a method on the OV to get you that strongly typed version.

Member Author

Agreed. We will have to figure out what this strongly typed structure will look like. Maybe we can use something existing from a JSON-schema Go library; maybe we have to wrap that structure to make it easily usable.

Contributor

Well, I found that this json-schema validator has this `type Schema struct`. But dragging this into the `OperatorVersion` ...

Member Author

Nice find 👍 We can't use that in the OV itself, because it won't get rendered in the CRD due to the missing support for recursion. But we could maybe use it internally.
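
On the CRD side, an arbitrary JSON value is usually expressed by disabling pruning for that field; a rough sketch of what such a property could look like in the generated CRD schema (not the KEP's actual schema, and the field name `parameters` is only illustrative):

```yaml
openAPIV3Schema:
  type: object
  properties:
    spec:
      type: object
      properties:
        parameters:
          type: object
          x-kubernetes-preserve-unknown-fields: true   # accept arbitrary JSON without pruning
```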


// TODO: it seems that the conversion webhook doesn't have access to the cluster resources (at least by default). How do we unwrap the values (e.g. `MAP_PARAM`) without access to the `OperatorVersion` JSON-schema?
Contributor

I don't think this is an issue; you just inject a client there as we do with the other webhook. I can't think of why that would not work.

Member Author

It's a little bit annoying, because (at least in my current PoC) the conversion is mostly autogenerated and injecting the client may be a bit ugly. But nothing that can't be solved.

Contributor

ah okay, I did not see that code but yeah, should be possible :)

Member Author

The code is here, if you want to take a look: https://github.com/kudobuilder/kudo/blob/an/add_v1beta2_and_conversion/pkg/apis/kudo/v1beta1/conversion.go

It's a proof-of-concept branch and won't be merged, it's more about exploratory development ;) The conversion there isn't the real conversion either, of course.

Contributor

Yeah, I think that's similar to what we have done with admission. For admission, there's this `Validator` interface that's nicely wired in through the manager and everything. But that does not have a client and supports only simple use cases. I think for complex use cases we will have to reimplement some of the stuff ourselves, basically hooking in here: https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/webhook/conversion/conversion.go#L92

I think that's very much in line with what we have here. https://github.com/kudobuilder/kudo/blob/main/pkg/webhook/instance_admission.go#L45

Member Author

I would prefer not to reimplement that, but to use the code from the controller-runtime. The problem is, both patterns (conversion-gen that uses the schema to convert, and controller-runtime which uses Hub and Converter interfaces) do not allow for additional metadata to be plugged into the conversion process... Meh.

In the end it's an implementation detail, but it would be nice if we did not have to reinvent the wheel for once ;)

Contributor

But that's actually the recommended process. I talked with the controller-runtime people about this for the admission part. They provide these simplified interfaces with no metadata for the basic use. For anything more complex, you're on your own and yes, you have to handle the request/response object by yourself.

Member Author

Ah, good to know 👍

Contributor

I like the simplicity of having a basically side-effect-free conversion function. So maybe we can keep it simple and just try to unmarshal the values into either type and take the one that was successful. Brute force all things! :D

Member Author

Yeah, we'll have to try it. Would be nice if it works without external data.

### Trigger

With the flat list of parameters, each parameter had a trigger attribute that specified a plan to trigger when the parameter was changed. With the nested structure this gets more complicated, as one "parameter branch" might have multiple different triggers defined, which is clearly not something we want. Here is the new rule set:
- Every level of the structure *may* specify a trigger.
Contributor

Why not allow only leaves to have a trigger? And no inheritance?

Member Author

That would be an option, but at the same time it would

- lead to a lot of repeated trigger attributes
- be easy to forget for params of other plans, if we assume the default "when it's not set, it's deploy" rule. I think it's easier to only specify the trigger on the /Backup object, and have all parameters below that trigger the backup plan.

Only triggers on leaves is a valid strategy here, I think. Maybe inheritance is a premature optimization, I'm not fully sure. I still think it'd be nice to have it, from an operator developer's perspective.
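
A small sketch of the inheritance idea (parameter names are invented, and nothing here is final syntax; the `trigger` placement just follows the rules quoted above):

```yaml
properties:
  Backup:
    type: object
    trigger: backup        # set once at the group level ...
    properties:
      Schedule:
        type: string       # ... inherited: changing this triggers the backup plan
      Retention:
        type: integer      # ... inherited as well
  NodeCount:
    type: integer          # no trigger anywhere up the tree, so it falls back to deploy
```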

Contributor

Yeah, I am just afraid of some ambiguity in rich trees of params. That said, I don't even know how much trigger is used in the wild, whether people usually leave it at the default or ...?

Member Author

That depends. It's mostly used on custom plans; I'm not sure how widely they are used. But I think custom plans are one of the USPs of KUDO: the ability to trigger a backup plan or execute a custom action is something that's not easily possible with other solutions, and I think we should embrace it.

Contributor

Yeah I actually don't feel strongly either way about this, it's definitely not a blocker for me

Contributor

@zen-dog zen-dog Sep 9, 2020

> why not only leaves to have a trigger? And no inheritance

I tend to agree. While @ANeumann82 is right that it will "lead to a lot of repeated trigger attributes", this is no different from what we have today. So, on the one hand, we save the operator developer some lines by defining it at a higher level; on the other, there is an increased mental tax to pay when figuring out the trigger inheritance.

I think I'd rather save brain cycles than YAML lines. And we could always relax the rule and introduce inheritance later.

@ANeumann82 ANeumann82 marked this pull request as ready for review September 7, 2020 14:04
@ANeumann82 (Member Author)

> Other than that, it looks like a lot of work and it's a breaking change, so the only thing I question is whether it's worth it and some easier but much less nicer solution would not be good-enough.

That's a valid concern, and I'm not sure how to resolve it. One reason for doing it this way is that we will have to do a lot of this work anyway at some point, for example additional CRD versions, the conversion webhook, etc.

If we build this now, it will be a lot easier to utilize for smaller changes later on.

@alenkacz (Contributor)

alenkacz commented Sep 8, 2020

@ANeumann82 I think I would like to see at least one alternative: one of the "uglier" solutions to put alongside this one, so we can compare the "ugliness" vs. the amount of work. WDYT?

Contributor

@alenkacz alenkacz left a comment

I don't have any huge concerns with this approach other than "should we do it" and "is it worth it". So let's merge and figure out those two questions when we move this to implementable?

I think we leave KEPs in provisional open for too long and don't merge them.

@ANeumann82 ANeumann82 merged commit 8d66df8 into main Sep 25, 2020
@ANeumann82 ANeumann82 deleted the an/kep-33-structured-parameters branch September 25, 2020 12:30