
Cannot enable IRSA on existing cluster #13101

Closed
dmitriishaburov opened this issue Jan 13, 2022 · 7 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@dmitriishaburov

dmitriishaburov commented Jan 13, 2022

/kind bug

1. What kops version are you running? The command kops version will display
this information.

Version 1.21.2

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

Client Version: v1.22.3
Server Version: v1.21.6

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

On an existing cluster:

  • Enable IRSA:
serviceAccountIssuerDiscovery:
  discoveryStore: s3://our-discovery-bucket/
  enableAWSOIDCProvider: true
  • Try to update cluster:
# kops update cluster cluster
-   serviceAccountIssuer: https://api.internal.cluster.com
+   serviceAccountIssuer: https://our-discovery-bucket/
+   serviceAccountJWKSURI: https://our-discovery-bucket/openid/v1/jwks
-   serviceAccountJWKSURI: https://api.internal.cluster.com/openid/v1/jwks

# kops update cluster cluster --yes
# kops rolling-update cluster --yes
Cluster did not pass validation, will retry in "30s": system-node-critical pod "calico-node-r4pvs" is not ready (calico-node).
Cluster did not pass validation, will retry in "30s": system-node-critical pod "calico-node-r4pvs" is not ready (calico-node).

# kubectl logs calico-node-r4pvs
2022-01-13 13:25:00.297 [INFO][9] startup/startup.go 450: Checking datastore connection
2022-01-13 13:25:00.321 [WARNING][9] startup/startup.go 462: Connection to the datastore is unauthorized error=connection is unauthorized: Unauthorized
2022-01-13 13:25:00.321 [WARNING][9] startup/startup.go 1347: Terminating
Calico node failed to start
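
The Unauthorized errors are consistent with the API server rejecting tokens whose iss claim still names the old issuer. As a rough check (not part of the original report; it assumes the default service account token mount path and a local python3), the issuer baked into the token mounted in the failing pod can be decoded like this:

    # Print the "iss" claim of the token mounted in the failing calico-node pod.
    kubectl -n kube-system exec calico-node-r4pvs -c calico-node -- \
      cat /var/run/secrets/kubernetes.io/serviceaccount/token \
      | cut -d. -f2 \
      | python3 -c 'import sys,base64,json; p=sys.stdin.read().strip(); p+="="*(-len(p)%4); print(json.loads(base64.urlsafe_b64decode(p))["iss"])'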

5. What happened after the commands executed?

The cluster failed to update; I was forced to roll back the changes with verification disabled.

6. What did you expect to happen?

The cluster updates successfully and I'm able to configure IRSA.

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

---
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: # removed
spec:
  additionalPolicies:
    node: # removed
  DisableSubnetTags: true
  api:
    loadBalancer:
      type: Internal
  kubeProxy:
    metricsBindAddress: 0.0.0.0
  kubeDNS:
    provider: CoreDNS
    nodeLocalDNS:
      enabled: true
  authentication:
    aws: {}
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  cloudLabels:
    Creator: kops
    Owner: # removed
    Project: # removed
    Repository: # removed
  configBase: # removed
  containerRuntime: docker
  dnsZone: # removed
  etcdClusters:
    - cpuRequest: 200m
      etcdMembers:
        - instanceGroup: master-eu-central-1a
          name: a
        - instanceGroup: master-eu-central-1b
          name: b
        - instanceGroup: master-eu-central-1c
          name: c
      memoryRequest: 100Mi
      name: main
    - cpuRequest: 100m
      etcdMembers:
        - instanceGroup: master-eu-central-1a
          name: a
        - instanceGroup: master-eu-central-1b
          name: b
        - instanceGroup: master-eu-central-1c
          name: c
      memoryRequest: 100Mi
      name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
  kubernetesApiAccess:
    - 0.0.0.0/0
  kubernetesVersion: 1.21.6
  masterPublicName: # removed
  networkCIDR: # removed
  networkID: # removed
  networking:
    calico:
      majorVersion: v3
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
    - # removed
  subnets:
    - cidr: # removed
      id: # removed
      name: eu-central-1a
      type: Private
      zone: eu-central-1a
    - cidr: # removed
      id: # removed
      name: eu-central-1b
      type: Private
      zone: eu-central-1b
    - cidr: # removed
      id: # removed
      name: eu-central-1c
      type: Private
      zone: eu-central-1c
    - cidr: # removed
      id: # removed
      name: utility-eu-central-1a
      type: Utility
      zone: eu-central-1a
    - cidr: # removed
      id: # removed
      name: utility-eu-central-1b
      type: Utility
      zone: eu-central-1b
    - cidr: # removed
      id: # removed
      name: utility-eu-central-1c
      type: Utility
      zone: eu-central-1c
  topology:
    dns:
      type: Private
    masters: private
    nodes: private
  clusterAutoscaler:
    enabled: true
    balanceSimilarNodeGroups: true
    skipNodesWithLocalStorage: false
    skipNodesWithSystemPods: false
  nodeTerminationHandler:
    enabled: true
    enableScheduledEventDraining: true
  awsLoadBalancerController:
    enabled: true
  certManager:
    enabled: true
  serviceAccountIssuerDiscovery:
    discoveryStore: # removed
    enableAWSOIDCProvider: true
@k8s-ci-robot added the kind/bug label on Jan 13, 2022
@dmitriishaburov
Author

Most likely it would be possible to work around this if

	// Identifier of the service account token issuer. The issuer will assert this identifier
	// in "iss" claim of issued tokens. This value is a string or URI.
	ServiceAccountIssuer *string `json:"serviceAccountIssuer,omitempty" flag:"service-account-issuer"`

in https://pkg.go.dev/k8s.io/kops/pkg/apis/kops#KubeAPIServerConfig were []string and resulted in multiple --service-account-issuer flags, which is supported in 1.22:

--service-account-issuer strings

Identifier of the service account token issuer. The issuer will assert this identifier in "iss" claim of issued tokens. This value is a string or URI. If this option is not a valid URI per the OpenID Discovery 1.0 spec, the ServiceAccountIssuerDiscovery feature will remain disabled, even if the feature gate is set to true. It is highly recommended that this value comply with the OpenID spec: https://openid.net/specs/openid-connect-discovery-1_0.html. In practice, this means that service-account-issuer must be an https URL. It is also highly recommended that this URL be capable of serving OpenID discovery documents at {service-account-issuer}/.well-known/openid-configuration. When this flag is specified multiple times, the first is used to generate tokens and all are used to determine which issuers are accepted.

In that case, specifying both the old and the new issuer would probably allow rolling out the cluster without breaking all existing service account tokens; a sketch of the resulting kube-apiserver flags is below.
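
For illustration only (this is a sketch, not something kops generates today; the hostnames are taken from the diff above), the resulting kube-apiserver invocation would list the new issuer first, so it is used to sign new tokens, while the old issuer remains accepted during the rollout:

    # Hypothetical flags produced by the proposed []string field; other flags unchanged.
    kube-apiserver \
      --service-account-issuer=https://our-discovery-bucket/ \
      --service-account-issuer=https://api.internal.cluster.com \
      --service-account-jwks-uri=https://our-discovery-bucket/openid/v1/jwks \
      ...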

@olemarkus
Member

I have not had much luck with that approach, unfortunately. But if you can make it work, I would be very happy to help add support for a graceful migration path.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Apr 16, 2022
@gtorre

gtorre commented Apr 20, 2022

@dmitriishaburov I'm curious to know if you've solved this problem. I'm also trying to enable IRSA on a cluster running v1.21.5

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on May 20, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to the /close command above.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
