
Cannot enable IRSA on existing cluster #13101

Closed
dmitriishaburov opened this issue Jan 13, 2022 · 7 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@dmitriishaburov

dmitriishaburov commented Jan 13, 2022

/kind bug

1. What kops version are you running? The command kops version will display
this information.

Version 1.21.2

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

Client Version: v1.22.3
Server Version: v1.21.6

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

On an existing cluster:

  • Enable IRSA:
serviceAccountIssuerDiscovery:
  discoveryStore: s3://our-discovery-bucket/
  enableAWSOIDCProvider: true
  • Try to update cluster:
# kops update cluster cluster
-   serviceAccountIssuer: https://api.internal.cluster.com
+   serviceAccountIssuer: https://our-discovery-bucket/
+   serviceAccountJWKSURI: https://our-discovery-bucket/openid/v1/jwks
-   serviceAccountJWKSURI: https://api.internal.cluster.com/openid/v1/jwks

# kops update cluster cluster --yes
# kops rolling-update cluster --yes
Cluster did not pass validation, will retry in "30s": system-node-critical pod "calico-node-r4pvs" is not ready (calico-node).
Cluster did not pass validation, will retry in "30s": system-node-critical pod "calico-node-r4pvs" is not ready (calico-node).

# kubectl logs calico-node-r4pvs
2022-01-13 13:25:00.297 [INFO][9] startup/startup.go 450: Checking datastore connection
2022-01-13 13:25:00.321 [WARNING][9] startup/startup.go 462: Connection to the datastore is unauthorized error=connection is unauthorized: Unauthorized
2022-01-13 13:25:00.321 [WARNING][9] startup/startup.go 1347: Terminating
Calico node failed to start
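
The Unauthorized errors are consistent with the API server rejecting tokens whose iss claim still names the old issuer. As a rough check (not part of the original report; it assumes the default service account token mount path and a local python3), the issuer baked into the token mounted in the failing pod can be decoded like this:

    # Print the "iss" claim of the token mounted in the failing calico-node pod.
    kubectl -n kube-system exec calico-node-r4pvs -c calico-node -- \
      cat /var/run/secrets/kubernetes.io/serviceaccount/token \
      | cut -d. -f2 \
      | python3 -c 'import sys,base64,json; p=sys.stdin.read().strip(); p+="="*(-len(p)%4); print(json.loads(base64.urlsafe_b64decode(p))["iss"])'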

5. What happened after the commands executed?

The cluster failed to update; I was forced to roll back the changes with verification disabled.

6. What did you expect to happen?

The cluster updates successfully and I'm able to configure IRSA.

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

---
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: # removed
spec:
  additionalPolicies:
    node: # removed
  DisableSubnetTags: true
  api:
    loadBalancer:
      type: Internal
  kubeProxy:
    metricsBindAddress: 0.0.0.0
  kubeDNS:
    provider: CoreDNS
    nodeLocalDNS:
      enabled: true
  authentication:
    aws: {}
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  cloudLabels:
    Creator: kops
    Owner: # removed
    Project: # removed
    Repository: # removed
  configBase: # removed
  containerRuntime: docker
  dnsZone: # removed
  etcdClusters:
    - cpuRequest: 200m
      etcdMembers:
        - instanceGroup: master-eu-central-1a
          name: a
        - instanceGroup: master-eu-central-1b
          name: b
        - instanceGroup: master-eu-central-1c
          name: c
      memoryRequest: 100Mi
      name: main
    - cpuRequest: 100m
      etcdMembers:
        - instanceGroup: master-eu-central-1a
          name: a
        - instanceGroup: master-eu-central-1b
          name: b
        - instanceGroup: master-eu-central-1c
          name: c
      memoryRequest: 100Mi
      name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
  kubernetesApiAccess:
    - 0.0.0.0/0
  kubernetesVersion: 1.21.6
  masterPublicName: # removed
  networkCIDR: # removed
  networkID: # removed
  networking:
    calico:
      majorVersion: v3
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
    - # removed
  subnets:
    - cidr: # removed
      id: # removed
      name: eu-central-1a
      type: Private
      zone: eu-central-1a
    - cidr: # removed
      id: # removed
      name: eu-central-1b
      type: Private
      zone: eu-central-1b
    - cidr: # removed
      id: # removed
      name: eu-central-1c
      type: Private
      zone: eu-central-1c
    - cidr: # removed
      id: # removed
      name: utility-eu-central-1a
      type: Utility
      zone: eu-central-1a
    - cidr: # removed
      id: # removed
      name: utility-eu-central-1b
      type: Utility
      zone: eu-central-1b
    - cidr: # removed
      id: # removed
      name: utility-eu-central-1c
      type: Utility
      zone: eu-central-1c
  topology:
    dns:
      type: Private
    masters: private
    nodes: private
  clusterAutoscaler:
    enabled: true
    balanceSimilarNodeGroups: true
    skipNodesWithLocalStorage: false
    skipNodesWithSystemPods: false
  nodeTerminationHandler:
    enabled: true
    enableScheduledEventDraining: true
  awsLoadBalancerController:
    enabled: true
  certManager:
    enabled: true
  serviceAccountIssuerDiscovery:
    discoveryStore: # removed
    enableAWSOIDCProvider: true
@k8s-ci-robot added the kind/bug label on Jan 13, 2022
@dmitriishaburov
Author

Most likely it would be possible to work around this if

	// Identifier of the service account token issuer. The issuer will assert this identifier
	// in "iss" claim of issued tokens. This value is a string or URI.
	ServiceAccountIssuer *string `json:"serviceAccountIssuer,omitempty" flag:"service-account-issuer"`

in https://pkg.go.dev/k8s.io/kops/pkg/apis/kops#KubeAPIServerConfig were []string and resulted in multiple --service-account-issuer flags, which is supported in 1.22:

--service-account-issuer strings

Identifier of the service account token issuer. The issuer will assert this identifier in "iss" claim of issued tokens. This value is a string or URI. If this option is not a valid URI per the OpenID Discovery 1.0 spec, the ServiceAccountIssuerDiscovery feature will remain disabled, even if the feature gate is set to true. It is highly recommended that this value comply with the OpenID spec: https://openid.net/specs/openid-connect-discovery-1_0.html. In practice, this means that service-account-issuer must be an https URL. It is also highly recommended that this URL be capable of serving OpenID discovery documents at {service-account-issuer}/.well-known/openid-configuration. When this flag is specified multiple times, the first is used to generate tokens and all are used to determine which issuers are accepted.

In that case, specifying both the old and the new issuer would probably allow rolling out the cluster without breaking all existing service account tokens; a sketch of the resulting kube-apiserver flags is below.
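
For illustration only (this is a sketch, not something kops generates today; the hostnames are taken from the diff above), the resulting kube-apiserver invocation would list the new issuer first, so it is used to sign new tokens, while the old issuer remains accepted during the rollout:

    # Hypothetical flags produced by the proposed []string field; other flags unchanged.
    kube-apiserver \
      --service-account-issuer=https://our-discovery-bucket/ \
      --service-account-issuer=https://api.internal.cluster.com \
      --service-account-jwks-uri=https://our-discovery-bucket/openid/v1/jwks \
      ...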

@olemarkus
Member

I have not had much luck with that approach, unfortunately. But if you can make it work, I would be very happy to help add support for a graceful migration path.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Apr 16, 2022
@gtorre

gtorre commented Apr 20, 2022

@dmitriishaburov I'm curious to know if you've solved this problem. I'm also trying to enable IRSA on a cluster running v1.21.5

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on May 20, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to the /close command above.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
