Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing the ServiceAccount issuer of a Shoot breaks the cluster #5817

Closed
rfranzke opened this issue Apr 19, 2022 · 7 comments · Fixed by #5888
Closed

Changing the ServiceAccount issuer of a Shoot breaks the cluster #5817

rfranzke opened this issue Apr 19, 2022 · 7 comments · Fixed by #5888
Assignees
Labels
area/usability Usability related kind/bug Bug

Comments

@rfranzke
Copy link
Member

How to categorize this issue?

/area usability
/kind bug

What happened:
When changing the ServiceAccount issuer for an existing shoot cluster then the control plane and system components start to fail.

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

  1. Create a new Shoot with a standard manifest (ref https://github.com/gardener/gardener/blob/master/example/provider-local/shoot.yaml)
  2. Set .spec.kubernetes.kubeAPIServer.serviceAccountConfig.issuer="https://foo.bar.com"
  3. See the control plane and system components fail

Anything else we need to know?:
According to https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/, the --service-account-issuer field does the following:

[...] The issuer will assert this identifier in "iss" claim of issued tokens. [...] When this flag is specified multiple times, the first is used to generate tokens and all are used to determine which issuers are accepted.

By default, the issuer is the external hostname of the cluster itself:

out := kubeapiserver.ServiceAccountConfig{Issuer: "https://" + externalHostname}
. Hence, after creating a shoot, all tokens used by control plane and system components for communication with the kube-apiserver have this issuer.
Now, after changing this for an existing shoot, we just set the flag to the provided value:
if config.ServiceAccountConfig.Issuer != nil {
out.Issuer = *config.ServiceAccountConfig.Issuer
}

This will make kube-apiserver no longer accepting old tokens.

@gardener-prow gardener-prow bot added area/usability Usability related kind/bug Bug labels Apr 19, 2022
@dimityrmirchev
Copy link
Member

With this PR was introduced a possibility to specify the --service-account-issuer multiple times for k8s 1.22 or later. This allows a non-disruptive migration of the issuer but it requires that the old issuer is set in the .spec.kubernetes.kubeAPIServer.serviceAccountConfig.acceptedIssuers field. Although this does not prevent a user from changing the issuer in a way that will be disruptive for the control plane and other services I thought that this information is good to be mentioned in this issue.

@rfranzke
Copy link
Member Author

Thanks @dimityrmirchev. I think what we should do is to always add the external hostname to the list of accepted issuers (

out := kubeapiserver.ServiceAccountConfig{Issuer: "https://" + externalHostname}
), even if the user specifies their own issuer. Today, this is not done automatically, hence all already issued tokens become invalid unless the user adds the external hostname manually to the list of acceptedIssuers (which is non-obvious for users and they shouldn't even care).

@dimityrmirchev
Copy link
Member

But I think that this again will not stop me from breaking a cluster. For example:

  1. I change the default issuer to iss-a and all is good because external hostname was implicitly added to the acceptedIssuers.
  2. I change it again to iss-b and then everything breaks because tokens were issued for iss-a but iss-a is not in the acceptedIssuers.

I am not sure if some kind of validation can be introduced to not allow changing the current issuer if the old value is not present in the acceptedIssuers. Even then I can remove it immediately from the acceptedIssuers after the update and probably still break something. But I guess if someone does this then it will be their responsibility because Gardener already made the validation before changing the initial issuer.

@dimityrmirchev
Copy link
Member

IMO the non obvious part of the whole thing is "When can I remove the old issuer from acceptedIssuers? Were all the tokens with old iss claim regenerated?"

@rfranzke
Copy link
Member Author

Right, thanks for pointing this out!

@timebertt
Copy link
Member

We discussed this again and came up with the following plan:

  • Always add the external hostname to the accepted issuers list. This will prevent invalidating service account tokens once users switch to a custom service account issuer.
  • We won't introduce validation for when an issuer can safely be removed from the accepted issuers list, because it's too complicated and in the users responsibility to check this (as pointed out by @dimityrmirchev)
  • add some documentation about what users need to do when migrating from one service account issuer to another one (similar to the CA rotation process: 1) make sure both old and new are accepted 2) make sure all tokens were issued by the new one (use projected volume tokens with short-lived tokens or roll all pods + deleted all service account token secrets) 3) only then drop the old issuer from the accepted issuers list)

@rfranzke
Copy link
Member Author

rfranzke commented May 2, 2022

/assign @acumino @rfranzke

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/usability Usability related kind/bug Bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants