Add warning to docs about enabling IRSA in existing cluster #13818

Closed
ReillyBrogan opened this issue Jun 16, 2022 · 4 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@ReillyBrogan
Contributor

/kind feature

With kops-managed IRSA, the following operations are very likely to be highly disruptive during a cluster upgrade (I am unsure whether externally managed OIDC is affected as well, but I suspect it is):

  • Enabling IRSA on a cluster that did not have IRSA enabled
  • Disabling IRSA on a cluster that had IRSA enabled
  • Changing the bucket name used as the discovery store
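
As a rough illustration of the three cases above, here is a minimal sketch that classifies a spec change before it is applied. It assumes the cluster spec has been loaded into a plain dict and that the presence of `discoveryStore` under `serviceAccountIssuerDiscovery` is a reasonable proxy for kops-managed IRSA being enabled. The function names and the bucket names are hypothetical, and the field names should be checked against the kops cluster spec docs.

```python
from typing import Optional


def discovery_store(spec: dict) -> Optional[str]:
    """Return the configured discovery store, or None if none is set.

    Assumption: the spec dict mirrors the cluster spec layout, with the
    store under serviceAccountIssuerDiscovery.discoveryStore.
    """
    block = spec.get("serviceAccountIssuerDiscovery") or {}
    return block.get("discoveryStore")


def disruptive_change(old_spec: dict, new_spec: dict) -> Optional[str]:
    """Name the disruptive operation implied by a spec change, if any."""
    old, new = discovery_store(old_spec), discovery_store(new_spec)
    if old is None and new is not None:
        return "enabling IRSA"
    if old is not None and new is None:
        return "disabling IRSA"
    if old is not None and new is not None and old != new:
        return "changing the discovery store"
    return None


if __name__ == "__main__":
    # Hypothetical bucket names, for illustration only.
    before = {"serviceAccountIssuerDiscovery": {"discoveryStore": "s3://store-a"}}
    after = {"serviceAccountIssuerDiscovery": {"discoveryStore": "s3://store-b"}}
    print(disruptive_change(before, after))
```

A check like this only flags that a change falls into one of the listed categories; it says nothing about how the running cluster will actually behave.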

Once the control plane has been upgraded, I've observed the following behaviors:

  • Any service account tokens issued to "old" pods are no longer valid against the Kubernetes API
    • This breaks basically every controller that is not scheduled onto the control plane (pods on the control plane are "new" and hold a service account token issued with the new issuer field).
    • This causes a network partition between "old" and "new" nodes, because "old" CNI pods cannot contact the Kubernetes API to learn about new nodes and establish network adjacencies with them. CNI setups that rely on some kind of direct external routing might not be affected.
  • The kubelet on an "old" node will not start "new" pods scheduled to that node, because the issuer field in the token issued to the pod does not match what the kubelet expects.
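
To make the issuer mismatch described above concrete, here is a minimal sketch that reads the `iss` claim from a service account token (a JWT) and compares it with the issuer currently in effect. It decodes the payload without verifying the signature, so it is only a diagnostic aid. The issuer URLs and the `make_unsigned_token` helper are hypothetical and exist only so the example is self-contained.

```python
import base64
import json


def token_issuer(jwt: str) -> str:
    """Return the `iss` claim of a JWT without verifying its signature."""
    payload_b64 = jwt.split(".")[1]
    # JWT segments are base64url-encoded with padding stripped; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims["iss"]


def issued_before_change(jwt: str, current_issuer: str) -> bool:
    """True if the token carries an issuer other than the current one."""
    return token_issuer(jwt) != current_issuer


def make_unsigned_token(claims: dict) -> str:
    """Build a JWT-shaped string for demonstration (dummy signature)."""
    def b64(data: bytes) -> str:
        return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

    header = b64(json.dumps({"alg": "none"}).encode())
    payload = b64(json.dumps(claims).encode())
    return f"{header}.{payload}.sig"


if __name__ == "__main__":
    # Hypothetical issuer values, for illustration only.
    old_issuer = "https://api.internal.example.com"
    new_issuer = "https://discovery.example.com"
    old_token = make_unsigned_token({"iss": old_issuer})
    print(issued_before_change(old_token, new_issuer))
```

Run against the token mounted into a pod, a check like this would show whether that pod is "old" or "new" in the sense used here.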

As such, the recommended way to make an IRSA change is to roll all cluster nodes as quickly as possible once the control plane has been rolled, so that only "new" nodes remain in the cluster. A warning to this effect should be added to the documentation for the IRSA feature.

For the same reason, we should ensure that migrating to a new kops version never enables IRSA on a pre-existing cluster that did not already have it enabled.

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 16, 2022
@olemarkus
Member

Would you be able to open a PR to the docs addressing this?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 15, 2022
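@olemarkus
Member

Warning was added a while ago: https://kops.sigs.k8s.io/cluster_spec/#service-account-issuer-discovery-and-aws-iam-roles-for-service-accounts-irsa

/close
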
@k8s-ci-robot
Contributor

@olemarkus: Closing this issue.

In response to this:

Warning was added a while ago: https://kops.sigs.k8s.io/cluster_spec/#service-account-issuer-discovery-and-aws-iam-roles-for-service-accounts-irsa

/close

