Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-21: Upgrading KUDO #1124

Merged
merged 22 commits into from
Sep 22, 2020
Merged

KEP-21: Upgrading KUDO #1124

merged 22 commits into from
Sep 22, 2020

Conversation

ANeumann82
Copy link
Member

What this PR does / why we need it:
At the moment, we do not have a process or code for upgrading KUDO.

This is an early draft to invite discussion.

@ANeumann82 ANeumann82 added the kind/kep Kudo Enhancement label Dec 2, 2019
Copy link
Contributor

@alenkacz alenkacz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just did a first pass, I need to process/think about it now. Few high level ideas that might simplify some parts:

  • I think a lot of migration code could be just updating the resources in the cluster, pretty much doing kudo init -o yaml --dry-run | kubectl apply -f -
  • I think using conversion webhooks will simplify this a lot and if we commit to always use it for newer version it will make our life easier, I think the supported k8s version is not limiting us here

Also probably worth some thought - what if we supported upgrades via helm? Would that actually help us in any way? How would that look like?


## Open Questions
- Lowest supported K8s version
- Do we want to support downgrades?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say no right now, definitely not in this kep - on the other hand, what if I upgrade but need to rollback again? 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't really want to support it either, but I think it's important enough to have a (small) discussion about it. And if we decide to go with the conversion webhooks, it might not be that difficult to do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh we have to do conversion webhooks - do you feel like there's any reason not to do them?

## Open Questions
- Lowest supported K8s version
- Do we want to support downgrades?
- Split versions between KUDO manager and KUDO CLI?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you write more about why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're basically two different programs that only communicate via CRDs. It might make something easier having split versions, for example bug fixes in manager only without having a new KUDO CLI version that does not change anything in the CLI.

But it would introduce a new compatibility matrix, and having "empty" version changes on manager or CLI isn't that big of an issue. I lean towards having a single version for both parts of KUDO.

Copy link
Contributor

@alenkacz alenkacz Dec 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, me too I think until there's a compelling reason to split them, having them as one is a big win, I would almost go as far as resolve this question with - no :) especially since there's no real as for this and the scope of this KEP is already pretty big

Versioned: No, but closely tied to KUDO manager

The KUDO manager has a set of prerequisites that need to be present for the manager to run successfully. They are
the least likely to change, but probably the most specific. If we make changes here, we need to implement custom
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by the way - new features might actually touch those, e.g. take a look at my webhook PR. I wonder what is a good strategy for those. I mean in there I:

  • added new prereqs, that's easy, you just create those on upgrade
  • updated manager - could we just patch the manager in that case?

Also could you maybe show how would the custom migration code look like? I think the webhook is a great example

Each migration should have a validate-step that checks if the migration is possible.

**Alternative update process**
- Can we just delete them all and reinstall them? Probably not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about updating them? just as you do with any other resource, what are the challenges with that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, updating is probably the only possible way. The idea with "Delete all and install fresh" would be to not do migrations but to always just install the expected state, but I don't think that's really possible.


- Use semantic versioning for the manager binary
- ToBeDiscussed: Use different versioning for Manager and CLI?
- The manager needs to be shut down before updating the CRDs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it? why? also are we talking about updating crd of the same version or updating to newer version or both?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well - this assumes we're not using the conversion web hook, and therefore can't serve different versions of a CR and need to migrate them manually. In that case, having a manager running that tries to read or modify the old version of CRs would be problematic.

Regarding updating CRDs with the same version - Is that something that is commonly used? I can imagine updating a CRD with new fields, assuming it's backwards compatible and defaults are provided, but in which case is the version of a CRD updated then? Only in case of breaking changes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just read the K8s documentation on API changes. (https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api_changes.md) Explains a lot. I'll update the KEP.

fields.
- Existing CRs need to be migrated to new versions
- K8s CRD support
- Which K8s version do we want/need to support?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is kind of given by what kubernetes version we support in which version of manager? Maybe we should somehow define that as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it mostly comes down to: Do we want to use conversion webhooks or not.

- MultiVersion is supported since 1.11 (manual conversion)
- WebHook conversion since 1.16
- WebHook conversion would allow us to transparently switch to a new CRD version without manually migrating all existing CRs
- CRD versioning
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably rather stick with one version if we can for now to not make things compilcated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm, if we don't enforce the same requirements on CRD stability as k8s itself, this might be possible. We would have to enforce that KUDO CLI is at least the same version as the installed KUDO manager though, otherwise and older CLI version could drop fields from a CR that it doesn't know about...

#### Proposal for update process
Integrated into `kudo init --upgrade`

To support older K8s versions, no WebHook conversion is used. We only serve a single CRD-Version and migrate existing CRs in the update process
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're fine in terms of kube version here Webhook conversion is available as beta since 1.15, and as alpha since Kubernetes 1.13.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which means, the operating cluster needs to have webhook conversion explicitly enabled for KUDO to work. I'm not sure if we want to make this a requirement for KUDO, tbh.

- Do we allow an older KUDO CLI to be used with a newer KUDO installation?

#### Proposal for update process
User has to download newest KUDO version.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I think replacing the binary or dowing upgrade via brew is good enough here 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, i think this is the most simple part of the update process :)

- Every migration step after the installed version is executed
- Every migration step only migrates the prerequisites from the previous version to the marked version of the migration

2. Have only one setup/update version in KUDO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should start with this and introduce migrations when there's very compelling reason to do so. I can see all our current migration needs in the last versions to be covered with this.

@kensipe kensipe mentioned this pull request Jan 6, 2020
Added section about operator compatibility

Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
@ANeumann82 ANeumann82 marked this pull request as ready for review January 7, 2020 16:22
We may arrive at a point where we need to implement custom migration logic between KUDO versions, but so far this would mean a lot of
implementation effort without a defined use case.

### KUDO Manager
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any expectation on the manager to re-process plans that have been run on previous versions? I'm thinking of scenarios where there might be a bug in a previous version that would have caused the plan to not be executed correctly, and want to re-process the plans to correct the mistake.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that would be a good idea to do automatically. For some plans that may be ok, but others not as much - restore plans, node replaces, etc.

There will be some kind of framework to write migrations, I guess, to fix major issues that need to be fixed, but nothing like just re-processing plans.

@kensipe kensipe added this to the 0.12.0 milestone Mar 10, 2020
@kensipe
Copy link
Member

kensipe commented Mar 10, 2020

@ANeumann82 ping... what is next here

@kensipe kensipe modified the milestones: 0.12.0, 0.13.0 Apr 9, 2020
@kensipe
Copy link
Member

kensipe commented Apr 9, 2020

retargeting for 0.13.0

Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
x
Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
@kensipe kensipe changed the base branch from master to main June 24, 2020 00:41
… scope here

Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
@ANeumann82 ANeumann82 merged commit 198260f into main Sep 22, 2020
@ANeumann82 ANeumann82 deleted the an/kep-kudo-upgrades branch September 22, 2020 13:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
KUDO Global
  
Awaiting triage
Development

Successfully merging this pull request may close these issues.

None yet

5 participants