
New Kops 1.17 AWS StorageClass Wreaks Havoc with Existing Defaults and Provides Uncertain Upgrade Path #9336

Closed
LeeHampton opened this issue Jun 11, 2020 · 9 comments · Fixed by #9337

Comments

@LeeHampton

This commit introduces a new StorageClass called kops-ssd-1-17. There seem to be a couple of issues with this.

First, even though it is no longer explicitly declared in this file, the gp2 StorageClass remains the default in existing pre-1.17 clusters (i.e., it still carries the "storageclass.beta.kubernetes.io/is-default-class":"true" annotation). That's because it used to be set as the default, as seen in this commit, and for whatever reason that annotation does not get removed during the upgrade to 1.17. Now that kops-ssd-1-17 has the same annotation explicitly set, what I ended up with after upgrading from 1.16 to 1.17 is two default StorageClasses, which leads to the following error when attempting to create a PV/PVC:

Internal error occurred: 2 default StorageClasses were found
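For anyone who wants to confirm the same state, listing the classes makes the duplicate obvious; both show up flagged as default, roughly like this (illustrative output, trimmed, columns vary by kubectl version):

kubectl get storageclass
# NAME                      PROVISIONER             AGE
# gp2 (default)             kubernetes.io/aws-ebs   120d
# kops-ssd-1-17 (default)   kubernetes.io/aws-ebs   1h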

Second, tying the name of this new StorageClass to the kops version seems pretty problematic. In my case, I'm going to need to remove the annotation on this StorageClass to get my existing infra working again. To do so, I'll add a simple kubectl patch operation to my init/upgrade scripts (see the sketch below). However, with this naming scheme I'm assuming subsequent versions of kops will change the name of the storage class to e.g. kops-ssd-1-18, which means I'm always going to be playing whack-a-mole with these version-specific names. Why can't we just use a generic name like ssd-ebs (or even just keep using the gp2 name, since it's still gp2)?
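Something along these lines is what I have in mind (untested sketch; setting the annotation to "false" has the same effect as removing it for default selection):

kubectl patch storageclass kops-ssd-1-17 -p \
  '{"metadata":{"annotations":{"storageclass.beta.kubernetes.io/is-default-class":"false"}}}'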

Is there any way to work around this behavior that I'm missing, or is kubectl patch my best bet for now? And does anyone have thoughts on what a good patch for 1.17 would be to fix this cluster-breaking behavior? Change the name to something more generic and drop the default status of the new StorageClass for now? Or ensure that the default status is removed from gp2 (riskier for existing clusters, IMO)? Happy to take this on if there's some consensus around it.

@joshbranham
Contributor

Hey @LeeHampton, sorry for the confusion here. We are discussing this in #kops-dev, but I will say that I don't anticipate this new class changing much, so the version suffix should not change unless we find new settings to set.

We could not just reuse gp2 because you can't modify the settings of an existing StorageClass, so that would break on upgrades as well.
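For example, trying to change a field like parameters on a live class gets rejected by the API server (error wording approximate):

kubectl patch storageclass gp2 -p '{"parameters":{"encrypted":"true"}}'
# The StorageClass "gp2" is invalid: parameters: Forbidden: updates to parameters are forbidden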

I think the missing piece here is what to set as the default, or whether to set one at all. We discussed this a bit in the original PR but did not nail anything down in terms of a release note. In our environment, we use our own StorageClass and use Open Policy Agent to enforce it, since we want things like encryption at rest and volume expansion by default.

@johngmyers
Member

The previous gp2 StorageClass was left in that file with the intention that the annotation would be removed upon upgrade.

@joshbranham
Contributor

Put up a fix: we needed to set the default annotation to false rather than omitting it completely.
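For anyone hitting this before the fix lands, the manual equivalent is roughly this (sketch only, not the exact change in the PR):

kubectl annotate storageclass gp2 --overwrite \
  storageclass.beta.kubernetes.io/is-default-class="false"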

@johngmyers
Member

I'm curious as to why, since including the class while omitting the annotation appeared to work. Perhaps because gp2 had previously been installed with kubectl apply?

@LeeHampton
Author

Thanks @joshbranham, that makes total sense. I think I've been bitten before by the whole dropping-an-annotation-doesn't-remove-it behavior.

I'm still a little concerned about this sudden change of default, though, especially because it doesn't appear to be easily modifiable in the kops YAML manifest (unless I'm missing something?). Given how silently this change happens, a developer could easily find themselves debugging some volume-related issue only to discover that, at some point, the underlying StorageClass for new PVs became different from the one used by older deploys. I could see that becoming a debugging nightmare, especially as the behavior of the default StorageClass gets more complex/custom.

I definitely get the desire to move forward with new StorageClass functionality, but I think this change in defaults should be accompanied by a Breaking Change notice, as well as an easy user-facing way to modify it through kops.

@joshbranham
Contributor

@LeeHampton does the default gp2 StorageClass in your cluster have a kubectl.kubernetes.io/last-applied-configuration annotation? It looks like that is causing the default value to be re-applied when the StorageClass is applied by kops. A net-new cluster upgraded from 1.16 -> 1.17 does not experience the duplicate default class (cc @olemarkus for the find).

@LeeHampton
Author

It does have that annotation. Here is the full annotation set from my gp2 class:

kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"storageclass.beta.kubernetes.io/is-default-class":"true"},"labels":{"k8s-addon":"storage-aws.addons.k8s.io"},"name":"gp2"},"parameters":{"type":"gp2"},"provisioner":"kubernetes.io/aws-ebs"}

@joshbranham
Contributor

Yeah, with that said: since you modified the StorageClass at some point, it would be hard for kops to work around that annotation forcing the default annotation back on.

https://kubernetes.io/docs/tasks/manage-kubernetes-objects/declarative-config/#replace-the-list-if-all-its-elements-are-primitives
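If you need to break that loop by hand in the meantime, clearing the stale record should do it (untested sketch; the trailing dash removes the annotation):

kubectl annotate storageclass gp2 kubectl.kubernetes.io/last-applied-configuration-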

@LeeHampton
Author

Is there anything in kops that would potentially result in a StorageClass being modified? The 1.16 clusters I upgraded from were all fairly new clusters and I can't find anything in my stack that edits the underlying StorageClasses.
