API Versioning compatibility in the future #202

Closed
frodopwns opened this issue Feb 16, 2021 · 7 comments
Labels
question Further information is requested

Comments

@frodopwns
Contributor

Describe the feature

Backward compatibility. In the future, the API for the custom resources this project uses may change. Many maintainers of operators choose not to implement semver or versioning at all. This can lead to painful releases.

Luckily, kubebuilder has a system in place for managing this: https://book.kubebuilder.io/multiversion-tutorial/conversion-concepts.html

What would the new user story look like?

A cluster maintainer runs a cluster on version 1.0 of Capsule. That maintainer's customers would like the new functionality in version 2.0 of Capsule.

When the maintainer deploys Capsule 2.0, the existing CRDs get patched to add the new CRD version rather than being replaced. Any existing instances should still be accessible, and the new version of the controllers should handle conversion between 1.0 CRs and 2.0 CRs seamlessly.

What would the new interaction with Capsule look like? For example:

  1. What are the prerequisites for this?
    Each new version of a CRD would require a new folder in ./api with ConvertTo and ConvertFrom methods that handle conversion to the stored version of the CRD (a rough sketch follows below).

Kustomize templates would need changing to add the conversion webhooks.
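
To make the first point concrete, here is a minimal sketch of the kubebuilder Convertible pattern, assuming a hypothetical v1alpha1 spoke converting to a v1beta1 hub; the Tenant type, the import path of the hub package, and the copied fields are illustrative, not Capsule's actual schema:

package v1alpha1

import (
    "sigs.k8s.io/controller-runtime/pkg/conversion"

    v1beta1 "github.com/clastix/capsule/api/v1beta1" // assumed hub/stored version
)

// ConvertTo converts this (spoke) Tenant to the hub version.
func (src *Tenant) ConvertTo(dstRaw conversion.Hub) error {
    dst := dstRaw.(*v1beta1.Tenant)
    dst.ObjectMeta = src.ObjectMeta
    dst.Spec.Owner = src.Spec.Owner // copy the fields both versions share
    return nil
}

// ConvertFrom converts from the hub version back to this (spoke) version.
func (dst *Tenant) ConvertFrom(srcRaw conversion.Hub) error {
    src := srcRaw.(*v1beta1.Tenant)
    dst.ObjectMeta = src.ObjectMeta
    dst.Spec.Owner = src.Spec.Owner
    // hub-only fields would have to be dropped or stashed in annotations here
    return nil
}

The hub version itself only needs the marker method func (*Tenant) Hub() {} so that controller-runtime knows which version everything converts through.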

Is there a plan for this already in place?

@frodopwns frodopwns added the blocked-needs-validation Issue need triage and validation label Feb 16, 2021
@prometherion prometherion added question Further information is requested and removed blocked-needs-validation Issue need triage and validation labels Feb 17, 2021
@prometherion
Member

Thanks for opening this issue/question, really appreciated, especially considering it's such a hot topic!

At the current state, Capsule is adding features upon request or making minor fixes that don't affect the base schema, e.g. the allowed external service IPs to mitigate CVE-2020-8554. That's pretty easy since we're only adding nullable fields, so no side effects are expected.

If I understood the documentation correctly, once a conversion webhook is set up, the Kubernetes API Server should be able to return the converted object even when it is requested with the old version, as happened with ingresses.networking.k8s.io/v1beta1 and ingresses.extensions/v1beta1. I'm not sure this also works for the Shared Informer, although I'd say it should, since the underlying implementation is just a watch operation against the API server; I haven't had the time to dig into this, also considering it's not our priority in the current phase.

Anyway, if this holds true, we should only need to tweak the current Tenant controller codebase a little so that it interacts directly with the new version, dropping functionalities introduced in newer versions when dealing with old ones.

I'm aware that we're almost certainly going to face changes in the near future, as soon as the CRD API version moves from v1alpha1 to v1beta1 or beyond. This is a topic I had to focus on while working for other companies; at that time there wasn't much information or code base to address the issue, so I ended up with the following idea.

What I wanted to do is provide a domain interface, totally abstracted from the CRD version: each underlying version would implement the domain functions, such as retrieving the allowed hostnames, IPs, the NetworkPolicies to apply, etc. That said, I took a look at the relevant section of the kubebuilder CRD versioning docs and it seems that all we need to do is implement the Convertible interface.
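
Just to illustrate that idea (the names here are hypothetical, not actual Capsule code), the version-agnostic domain interface could look something like:

package domain

import networkingv1 "k8s.io/api/networking/v1"

// Tenant is a version-agnostic view of the CRD: each API version
// (v1alpha1, v1beta1, ...) would provide its own implementation.
type Tenant interface {
    AllowedHostnames() []string
    AllowedExternalServiceIPs() []string
    NetworkPolicies() []networkingv1.NetworkPolicySpec
}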

I haven't tested yet whether we'd still need two controllers, one per CRD version, but what popped into my mind while typing this reply is the interaction with the webhooks, since they're versioned:

https://github.com/clastix/capsule/blob/e3b927f1122cc6fd48030da807c2b094f5c5819c/pkg/webhook/tenant/validating.go#L32

I know I wrote a lot of stuff, so here's a tl;dr: right now we don't have anything in place; once the problem is actually there, we'll go for the smoothest and neatest implementation without decreasing the DX and reliability of the Operator (i.e. without causing downtime or deletion of resources upon CRD version conversion).

@frodopwns
Contributor Author

You shouldn't need additional controllers per version. The last time I used this, installing a new CRD version caused Kubernetes to use the conversion hook to upgrade all existing instances. Of course, in that case, I was updating the stored version to the new version. In theory, you could leave the oldest version as the stored version, which would mean new CR instances get converted downward before operations happen.

As long as the controller operates under the assumption that it is working on the hub version, you should only need one controller. The Kubernetes API converts your instance to that stored (hub) version before returning it to you.
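
A minimal sketch of that single-controller wiring, assuming a hypothetical v1beta1 hub type for Capsule's Tenant (import path and type name are illustrative):

import (
    ctrl "sigs.k8s.io/controller-runtime"

    v1beta1 "github.com/clastix/capsule/api/v1beta1" // assumed hub/stored version
)

// One reconciler, registered only against the hub version; the API server
// converts objects created as older versions before the controller sees them.
func (r *TenantReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&v1beta1.Tenant{}).
        Complete(r)
}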

Thanks for taking the time to respond! I know the conversion stuff doesn't matter too much early on for an operator's development but once in production, it becomes essential.

@prometherion
Member

Thanks for sharing all your insights @frodopwns, really appreciated, also because the topic, as I said, is hot and tough for most people, and getting feedback is definitely gold!

You shouldn't need additional controllers per version

This is great news!

I know the conversion stuff doesn't matter too much early on for an operator's development but once in production, it becomes essential

Right now we have some companies running Capsule in production and, honestly, this issue has been raised several times; I couldn't agree more on managing the lifecycle of CRDs as well as possible and automating it.


Regarding webhooks instead, did you have the chance to test whether the same logic (v1 >>> hub >>> v2) applies to them?

@frodopwns
Contributor Author

I haven't worked with webhooks much but I suspect you would need to implement a new hook for each version.

The hooks are attached to the versioned API struct itself:

package v1

import (
    ctrl "sigs.k8s.io/controller-runtime"
    logf "sigs.k8s.io/controller-runtime/pkg/log"
)

// log is for logging in this package (as in the kubebuilder CronJob tutorial).
var cronjoblog = logf.Log.WithName("cronjob-resource")

func (r *CronJob) SetupWebhookWithManager(mgr ctrl.Manager) error {
    return ctrl.NewWebhookManagedBy(mgr).
        For(r).
        Complete()
}

// Default implements webhook.Defaulter so a webhook will be registered for the type
func (r *CronJob) Default() {
    cronjoblog.Info("default", "name", r.Name)

    if r.Spec.ConcurrencyPolicy == "" {
        r.Spec.ConcurrencyPolicy = AllowConcurrent
    }
    if r.Spec.Suspend == nil {
        r.Spec.Suspend = new(bool)
    }
    if r.Spec.SuccessfulJobsHistoryLimit == nil {
        r.Spec.SuccessfulJobsHistoryLimit = new(int32)
        *r.Spec.SuccessfulJobsHistoryLimit = 3
    }
    if r.Spec.FailedJobsHistoryLimit == nil {
        r.Spec.FailedJobsHistoryLimit = new(int32)
        *r.Spec.FailedJobsHistoryLimit = 1
    }
}

And I believe the webhook gets called before the conversion to the stored version happens.

@frodopwns
Contributor Author

I have an example multi-version repo here: https://github.com/Azure/azure-service-operator
Could come in handy for reference.

One interesting thing about the hub and spokes: which version you choose as your hub can have ramifications.

I chose to make my newest version always be the hub version. I did this because when you set an old version as the hub and add new fields to your models, Kubernetes needs to store them in the old version. This is done by adding the new fields into labels when converting to the hub version.

Another side-effect is automatic updates to old CRs. When you implement conversion hooks and set your stored CRD version to the new version, Kubernetes converts the old CRs and resaves them. Could be dangerous...

The downside to updating the stored (hub) version is all the code that needs to be updated when you release a version.

@prometherion
Member

I have an example multi-version repo here: https://github.com/Azure/azure-service-operator
Could come in handy for reference.

Again, thanks a lot! <3 This is absolutely gold!

I chose to make my newest version always be the hub version

I couldn't agree more.

The downside to updating the stored (hub) version is all the code that needs to be updated when you release a version.

It's a trade-off we can easily accept when upgrading.

@prometherion
Member

Closing since #317 is landing on master, thanks a lot for the useful input @frodopwns!
