
Auto-alias resource apiVersions #798

Merged · 6 commits · Oct 4, 2019
Conversation

@lblackstone (Member) commented Sep 11, 2019

Proposed changes

Kubernetes apiVersions are mostly forward/backward
compatible, so for cases where we know it's safe, we
auto-alias the apiVersions so that the engine does not
force a replacement when a resource is updated to a
compatible apiVersion.

Related issues (optional)

Fixes #573
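The aliasing idea can be sketched roughly as follows. This is an illustrative Go sketch only, not the provider's actual code; the function name and the exact token lists are assumptions (the real provider generates these per resource kind):

```go
// Illustrative sketch of auto-aliasing (not the actual provider code).
// For each kind, list the Pulumi type tokens of compatible apiVersions
// so the engine aliases them instead of forcing a replacement.
package main

import "fmt"

// aliasesForKind is a hypothetical helper; the real provider generates
// these alias lists for every resource kind it supports.
func aliasesForKind(kind string) []string {
	switch kind {
	case "Deployment":
		return []string{
			"kubernetes:apps/v1:Deployment",
			"kubernetes:apps/v1beta1:Deployment",
			"kubernetes:apps/v1beta2:Deployment",
			"kubernetes:extensions/v1beta1:Deployment",
		}
	case "DaemonSet":
		return []string{
			"kubernetes:apps/v1:DaemonSet",
			"kubernetes:apps/v1beta2:DaemonSet",
			"kubernetes:extensions/v1beta1:DaemonSet",
		}
	default:
		return nil
	}
}

func main() {
	// A Deployment registered under any of these tokens is treated as
	// the same resource, so changing apiVersion becomes an update.
	for _, alias := range aliasesForKind("Deployment") {
		fmt.Println(alias)
	}
}
```

With aliases attached, switching a resource between any two listed apiVersions shows up to the engine as an update of one resource rather than a delete-and-create pair.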

@lblackstone force-pushed the lblackstone/autoaliasing branch 4 times, most recently from 6e1c788 to d596f9e on September 12, 2019 22:53
@lblackstone force-pushed the lblackstone/autoaliasing branch 3 times, most recently from 4dcf058 to 74d9191 on September 19, 2019 18:30
@lblackstone marked this pull request as ready for review September 19, 2019 18:30
@metral (Contributor) left a comment

Overall LGTM.

For future PRs of this size, it'd be great if we could break it up into multiple smaller commits, preferably with identical changes segmented together. It took me a couple of review sessions to get through it so I'm sure I've missed something 😄

case kinds.DaemonSet:
    return []string{
        "kubernetes:apps/v1:DaemonSet",
        // For some reason, there is no `apps/v1beta1:DaemonSet`.
Contributor:

// It's unsafe to move between `extensions/v1beta1`, and the newer apiVersions due to differences in
// the behavior of the Deployment and ReplicaSet. Even if the apiVersion is changed,
// the resource will continue to use the old behavior, which will break the await logic. Without an
// alias set, the engine will recreate the resource with the newer apiVersion.
Contributor:

This general issue makes me think that I don't understand the implications of this PR:

  • It is true API versions are generally incompatible (e.g., Deployment.spec.selector is not required in e/v1b1, but is required in other versions), but I do not get the sense that we are super confident that we understand which API versions are actually schematically incompatible. Is this wrong?
  • In the event where there are incompatibilities, does aliasing actually cause problems? If a person changes a Deployment apiVersion from e/v1b1 to apps/v1, it seems like they'd get an error that .spec.selector is not set, and then they'll set it, and the upgrade will "just work"—right?
  • In the event where aliasing does cause problems, it seems like we'd want a very clear answer on which of these are schematically different, because any pair that is schematically different can't be aliased at all.

@lblackstone (Member Author):

  • I do not get the sense that we are super confident that we understand which API versions are actually schematically incompatible. Is this wrong?

Any schema incompatibilities will be caught during an update. For the example you mentioned with the Deployment selector, the SDK will raise an error if the selector is not present on newer apiVersions.
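As an illustration of that selector difference: a Deployment that validated under `extensions/v1beta1` without `spec.selector` must set it explicitly under `apps/v1`. The manifest below is only an example, not taken from the PR:

```yaml
# Under extensions/v1beta1, spec.selector was defaulted from the pod
# template labels; under apps/v1 it is required, and the apply fails
# with a validation error if it is omitted.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:            # required in apps/v1
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.17
```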

  • In the event where there are incompatibilities, does aliasing actually cause problems?

Yes, aliasing can cause problems in specific cases. The one I know about currently is the e/v1b1 Deployment. The problem is a bit subtle, which I tried to capture in the comments here:

  1. If the Deployment is created with the e/v1b1 API, the Deployment will never include the newer status fields, regardless of which API is used to retrieve the state. This is true even when the apiVersion of the Deployment is changed to a newer version. The Deployment has to be deleted and then recreated to get the new behavior.
  2. The await logic works differently for the e/v1b1 Deployment. The problem with aliasing is that it changes the associated apiVersion, and breaks the await logic (which can't tell that the Deployment was created with the e/v1b1 apiVersion). I suspect that this is a bug in Kubernetes.

In summary, it's not exactly aliasing that is the problem, it's the different behavior from e/v1b1 Deployment resources, even after the apiVersion is changed explicitly. We need to recreate these resources to get them to work properly, which is probably out of scope for this PR.

Member:

But Kubernetes itself treats these as the same, and doesn't replace, right? Like if I had an apps/v1/DaemonSet named foo, and then tried to post an extensions/v1beta1/DaemonSet named foo, it would modify the existing resource instead of creating a new one (or even replacing the old one), right? Which is not the same as what would happen if I posted a core/v1/Pod named foo.

So we should match that - right?

@lblackstone (Member Author):

But Kubernetes itself treats these as the same, and doesn't replace, right?

Yes, Kubernetes would not replace the resource for a changed apiVersion.

I think we have two choices here:

  1. Force a replacement for the extensions/v1beta1 apiVersion when the resource is updated to a newer apiVersion. The behavior is different between these versions, and a replacement is necessary to avoid having a resource that claims to be a newer version but keeps the old behavior.
  2. Change the await logic so that every Deployment apiVersion uses the extensions/v1beta1 logic. This has the advantage of not requiring a replacement, but is less robust and more complex. This has a chance of introducing bugs for newer apiVersions since the readiness logic would change.

I'm in favor of 1. Worst case, this is a one-time replacement for a limited number of resources. The extensions/v1beta1 apiVersion will be completely unsupported in another 6 months (k8s 1.18), at which time this will be a moot point, and we can likely remove the legacy code path.

@hausdorff @metral Thoughts?

Contributor:

Just to loop back on this, I believe it should be sufficient to mark apiVersion as a replacement-requiring property for e/v1b1 and alias all apiVersions of every resource. The schemas are incompatible, but if you're changing the apiVersion users will change the incompatible properties too. Replacing e/v1b1s will also fix the status issue (though obviously this is not an amazing experience).

Contributor:

Oh, just saw the new comments. We could match the non-replace behavior here, but we'd have to keep track of which resources used to be e/v1b1. That's kind of a pain, and I don't see it as the end of the world if we don't do it this way, but that's fine with me too.

@lblackstone (Member Author):

The latest commits to this PR take approach 1. I think this is the best tradeoff, and is vastly better than the current behavior, which requires manual user intervention to resolve. As @hausdorff mentioned, many Helm charts are rolling apiVersions now, so it's important to get this fix out ASAP.

Contributor:

It's a bit of a pain to do it the other way, but not that big of a pain. You could put this state in the annotations, for example.

@lblackstone (Member Author):

@hausdorff I decided to go with your annotation suggestion to minimize disruption to existing resources. That should also catch hard-to-diagnose cases where the user aliased a e/v1b1 Deployment themselves.

Tested this out and was able to seamlessly move between apiVersions without breaking the await logic.
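The annotation approach could look roughly like this in Go. The annotation key and helper names below are assumptions for illustration, not the provider's actual identifiers:

```go
// Illustrative sketch of recording the create-time apiVersion in an
// annotation so the await logic can pick the correct code path even
// after the user changes the manifest's apiVersion. The annotation key
// and function names are hypothetical.
package main

import "fmt"

const initialAPIVersionAnnotation = "pulumi.com/initialApiVersion" // assumed key

// setInitialAPIVersion records the apiVersion used at create time,
// without overwriting a value that is already present.
func setInitialAPIVersion(annotations map[string]string, apiVersion string) {
	if _, ok := annotations[initialAPIVersionAnnotation]; !ok {
		annotations[initialAPIVersionAnnotation] = apiVersion
	}
}

// usesLegacyAwaitLogic reports whether the resource was originally
// created as extensions/v1beta1 and should use the legacy await path.
func usesLegacyAwaitLogic(annotations map[string]string) bool {
	return annotations[initialAPIVersionAnnotation] == "extensions/v1beta1"
}

func main() {
	ann := map[string]string{}
	setInitialAPIVersion(ann, "extensions/v1beta1")
	// Later, the user changes the manifest to apps/v1; the annotation
	// still records the original create-time apiVersion.
	setInitialAPIVersion(ann, "apps/v1")
	fmt.Println(usesLegacyAwaitLogic(ann)) // prints true
}
```

Because the annotation is written only once, the resource's create-time behavior stays visible to the await logic no matter how many times the declared apiVersion changes afterward.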

@hausdorff (Contributor):

I think I don't understand @metral's comment about breaking up the PR because it's too big. It's not actually 375 files, it's 5 files and 370 autogenerated files. Does that seem unmanageable?

@hausdorff (Contributor):

This is causing issues because a lot of people are upgrading Charts: since extensions/v1beta1 has been removed in Kubernetes v1.16, many charts are transitioning to newer apiVersions. Pulumi treats identical resources with different values for this field as different resources, while Kubernetes considers them the same, so when the apiVersion changes we schedule a Create of the "new" resource and a subsequent Delete of the "old" resource. When we try to do the Create, Kubernetes rejects the operation because the resource already exists.

Kubernetes apiVersions are mostly forward/backward
compatible, so for cases where we know it's safe, we
auto-alias the apiVersions so that the engine does not
force a replacement when a resource is updated to a
compatible apiVersion.
Force a replacement for resources with the
extensions/v1beta1 apiVersion when the apiVersion
changes.
Reverse the force-replace change for e/v1b1 apiVersions,
and instead store the apiVersion for the original create as
an annotation on the k8s resource. The await logic uses
this annotation to choose the correct logic path.
pkg/gen/typegen.go (thread resolved)
pkg/metadata/annotations.go (thread resolved, outdated)
pkg/metadata/annotations.go (thread resolved)
@hausdorff (Contributor):

Moving from e/v1b1 -> apps/v1 triggers a replace. This is not intentional, right?

[Screenshot: Screen Shot 2019-10-04 at 11 25 39 AM]

@hausdorff (Contributor) left a comment

Alright I think this is good enough to ship. The previous replace issue I mentioned before turns out to have been because I was using a new build of the provider plugin, but somehow an old version of the TS code.

Can we file follow-up issues for the things I mentioned?

Also, it would be nice for the alias tests to specifically test that e/v1b1 does not hang when it's upgraded and a new rollout is triggered.

Check that e/v1b1 Deployment can update successfully after
changing the apiVersion.
@lblackstone (Member Author):

@hausdorff Updated the test to verify the rollout of an upgraded e/v1b1 Deployment and opened issues to track feedback.


Successfully merging this pull request may close these issues.

Automatically alias different ApiVersions
4 participants