
"does not contain acceptable node role" error when starting a k8s 1.19 cluster specifying custom iam roles #10719

Closed
alanbover opened this issue Feb 3, 2021 · 8 comments

@alanbover commented Feb 3, 2021

1. What kops version are you running? (The command kops version will display this.)
1.19.0

2. What Kubernetes version are you running?
1.19.7

3. What cloud provider are you using?
AWS

4. What commands did you run? What is the simplest way to reproduce this issue?
kops create -f ${BUILD_PATH}/cluster.yaml --state ${ASSETS_S3_PATH} -v9

5. What happened after the commands executed?
The cluster master instances boot correctly, but the nodes are not able to start. They seem to be failing with the following error:

Feb 03 15:45:37 ip-10-82-39-61.ec2.internal nodeup[4153]: W0203 15:45:37.535214    4153 executor.go:139] error running task "BootstrapClient/BootstrapClient" (33s remaining to succeed): bootstrap returned status code 403: failed to verify token: arn "arn:aws:sts::<hiddenAWSaccount>:assumed-role/eu-west-1-dev-k8sWorker/i-0a508b0db9c93ce33" does not contain acceptable node role

6. What did you expect to happen?
The nodes start correctly, as they do with previous versions of Kubernetes (< 1.19).

7. Please provide your cluster manifest:

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: null
  generation: 1
  name: loko2.kops.mydomain.com
spec:
  additionalSans:
  - api.loko2.us-east-1.admin.mydomain.com
  - api.loko2.us-east-1.mydomain.com
  api:
    loadBalancer:
      class: Classic
      type: Internal
  authorization:
    rbac: {}
  channel: stable
  cloudLabels:
    billed-service: schip-next
    billed-team: cpr
    monitor-with-datadog: enabled
    realm: dev
    support-team: cpr
  cloudProvider: aws
  configBase: s3://schip-kops-dev-us-east-1-<accountId>/loko2.kops.mydomain.com
  containerRuntime: docker
  etcdClusters:
  - cpuRequest: "1"
    etcdMembers:
    - instanceGroup: master-us-east-1a
      name: a
      volumeIops: 100
      volumeSize: 100
      volumeType: io1
    - instanceGroup: master-us-east-1b
      name: b
      volumeIops: 100
      volumeSize: 100
      volumeType: io1
    - instanceGroup: master-us-east-1c
      name: c
      volumeIops: 100
      volumeSize: 100
      volumeType: io1
    memoryRequest: 2Gi
    name: main
  - cpuRequest: 500m
    etcdMembers:
    - instanceGroup: master-us-east-1a
      name: a
      volumeIops: 100
      volumeSize: 21
      volumeType: io1
    - instanceGroup: master-us-east-1b
      name: b
      volumeIops: 100
      volumeSize: 21
      volumeType: io1
    - instanceGroup: master-us-east-1c
      name: c
      volumeIops: 100
      volumeSize: 21
      volumeType: io1
    memoryRequest: 1Gi
    name: events
  fileAssets:
  - content: |
      clusters:
        - name: authn-webhook
          cluster:
            # 1.15 part is not needed but oktawebhook accepts /auth/
            # if provided /auth/ kubernetes would send /auth, be redirected to /auth/
            # and then issue a GET without the TokenReviewRequest.
            # adding an artificial 1.14 suffix to ensure kubernetes sends a requst starting with /auth/
            server: https://auth.mydomain.com/auth/1.15
      contexts:
        - context:
            cluster: authn-webhook
          name: authn-webhook
      current-context: authn-webhook
    name: okta-webhook
    path: /srv/kubernetes/authn.config
    roles:
    - Master
  - content: |
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
        name: system:psp:restricted
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: ClusterRole
        name: system:psp:restricted
      subjects:
      - kind: Group
        name: system:serviceaccounts
        apiGroup: rbac.authorization.k8s.io
      - kind: Group
        name: system:authenticated
        apiGroup: rbac.authorization.k8s.io
    name: psp-restricted-clusterrolebinding
    path: /srv/kubernetes/manifests/psp-restricted-clusterrolebinding.yaml
    roles:
    - Master
  - content: |
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: system:psp:privileged
      rules:
      - apiGroups:
        - extensions
        resourceNames:
        - privileged
        resources:
        - podsecuritypolicies
        verbs:
        - use
    name: psp-privileged-clusterrole
    path: /srv/kubernetes/manifests/psp-privileged-clusterrole.yaml
    roles:
    - Master
  - content: |
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: system:psp:restricted
      rules:
      - apiGroups:
        - extensions
        resourceNames:
        - restricted
        resources:
        - podsecuritypolicies
        verbs:
        - use
    name: psp-restricted-clusterrole
    path: /srv/kubernetes/manifests/psp-restricted-clusterrole.yaml
    roles:
    - Master
  - content: |
      apiVersion: extensions/v1beta1
      kind: PodSecurityPolicy
      metadata:
        annotations:
          seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
        name: privileged
      spec:
        allowPrivilegeEscalation: true
        allowedCapabilities:
        - '*'
        fsGroup:
          rule: RunAsAny
        hostIPC: true
        hostNetwork: true
        hostPID: true
        hostPorts:
        - max: 65535
          min: 0
        privileged: true
        runAsUser:
          rule: RunAsAny
        seLinux:
          rule: RunAsAny
        supplementalGroups:
          rule: RunAsAny
        volumes:
        - '*'
    name: psp-privileged
    path: /srv/kubernetes/manifests/psp-privileged.yaml
    roles:
    - Master
  - content: |
      apiVersion: extensions/v1beta1
      kind: PodSecurityPolicy
      metadata:
        name: restricted
      spec:
        # Required to prevent escalations to root.
        allowPrivilegeEscalation: false
        requiredDropCapabilities:
        # This is redundant with non-root + disallow privilege escalation,
        # but we can provide it for defense in depth.
        - ALL
        # Block host namespaces
        hostIPC: false
        hostNetwork: false
        hostPID: false
        # Block privileged mode
        privileged: false
        # Commenting user and group enforcement because without runAsGroup it is impossible to make smooth transition!
        # Enforcement explanation/example
        # https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-pod
        # We should do it because K8S doesn't support user namespaces remapping
        # https://github.com/kubernetes/enhancements/issues/127
        runAsUser:
          rule: RunAsAny
        #  rule: MustRunAs
        #  # By default first process with run with min uid
        #  # >= 10000 to avoid overlapping with hosts' uids
        #  ranges:
        #  - min: 10000
        #    max: 65535
        # Alpha support in K8S 1.10+
        #runAsGroup:
        #  rule: MustRunAs
        #  # By default first process with run with min gid
        #  # >= 10000 to avoid overlapping with hosts' gids
        #  ranges:
        #  - min: 10000
        #    max: 65535
        # Controls which group IDs containers add.
        supplementalGroups:
          rule: RunAsAny
        #  rule: MustRunAs
        #  ranges:
        #  - min: 10000
        #    max: 65535
        # Controls the supplemental group applied to some volumes
        fsGroup:
          rule: RunAsAny
        #  rule: MustRunAs
        #  ranges:
        #  - min: 10000
        #    max: 65535
        seLinux:
          # Since we didn't set any AppArmor not SeLinux profiles
          rule: RunAsAny
        # Allow core volume types.
        volumes:
        - configMap
        - downwardAPI
        - emptyDir
        # Assume that persistentVolumes set up by the cluster admin are safe to use.
        - persistentVolumeClaim
        - secret
        - projected
        # It would be nice to make containers` root filesystems read-only
        #readOnlyRootFilesystem: true
        readOnlyRootFilesystem: false
    name: psp-restricted
    path: /srv/kubernetes/manifests/psp-restricted.yaml
    roles:
    - Master
  - content: '{ "init": true, "bip": "172.18.0.1/16" }'
    name: custom-docker-config
    path: /etc/docker/daemon.json
    roles:
    - Master
    - Node
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeAPIServer:
    admissionControl:
    - NamespaceLifecycle
    - LimitRanger
    - ServiceAccount
    - PersistentVolumeLabel
    - DefaultStorageClass
    - DefaultTolerationSeconds
    - MutatingAdmissionWebhook
    - ValidatingAdmissionWebhook
    - NodeRestriction
    - ResourceQuota
    - PodSecurityPolicy
    - Priority
    authenticationTokenWebhookConfigFile: /srv/kubernetes/authn.config
  kubeDNS:
    provider: CoreDNS
  kubeProxy:
    enabled: true
    proxyMode: ipvs
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
  kubernetesApiAccess:
  - 127.0.0.1/32
  kubernetesVersion: 1.19.7
  masterPublicName: api.loko2.us-east-1.kopsinternal.mydomain.com
  metricsServer:
    enabled: true
  networkCIDR: 10.82.32.0/20
  networkID: <hiddenVPC>
  networking:
    canal:
      typhaReplicas: 3
  nonMasqueradeCIDR: 10.216.0.0/13
  sshAccess:
  - 10.82.0.0/16
  subnets:
  - id: subnet-0a7ec0c142261a22e
    name: PublicSubnetZoneA
    type: Public
    zone: us-east-1a
  - id: subnet-0a3262b707640ebee
    name: PublicSubnetZoneB
    type: Public
    zone: us-east-1b
  - id: subnet-0403231cc7f4a7805
    name: PublicSubnetZoneC
    type: Public
    zone: us-east-1c
  - id: subnet-08e0818b7e7ca93ac
    name: EtcdSubnetZoneA
    type: Private
    zone: us-east-1a
  - id: subnet-0584c698ca0781453
    name: EtcdSubnetZoneB
    type: Private
    zone: us-east-1b
  - id: subnet-012ba81d554592cb6
    name: EtcdSubnetZoneC
    type: Private
    zone: us-east-1c
  - id: subnet-060a2ba195fd365d8
    name: ControllerSubnetZoneA
    type: Private
    zone: us-east-1a
  - id: subnet-0c41a19126813ccc2
    name: ControllerSubnetZoneB
    type: Private
    zone: us-east-1b
  - id: subnet-05de58bcd0c15517c
    name: ControllerSubnetZoneC
    type: Private
    zone: us-east-1c
  - id: subnet-0c5c4566f8bbff797
    name: WorkerSubnetZoneA
    type: Private
    zone: us-east-1a
  - id: subnet-0e33df0b718d8fbc7
    name: WorkerSubnetZoneB
    type: Private
    zone: us-east-1b
  - id: subnet-02b0dd80b3e919282
    name: WorkerSubnetZoneC
    type: Private
    zone: us-east-1c
  topology:
    dns:
      type: Public
    masters: private
    nodes: private

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2021-02-03T16:26:35Z"
  labels:
    kops.k8s.io/cluster: loko2.kops.mydomain.com
  name: master-us-east-1a
spec:
  additionalSecurityGroups:
  - sg-01777a37c43f26f32
  - sg-04661fba074cc1685
  associatePublicIp: false
  externalLoadBalancers:
  - loadBalancerName: loko2-ApiPublicELB
  - loadBalancerName: loko2-ApiPrivilegedELB
  iam:
    profile: arn:aws:iam::<accountId>:instance-profile/schip-iam-roles-IAMInstanceProfileController-1ALNMRZN76RTO
  image: ami-047a51fa27710816e
  machineType: m5a.xlarge
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-1a
  role: Master
  subnets:
  - ControllerSubnetZoneA

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2021-02-03T16:26:35Z"
  labels:
    kops.k8s.io/cluster: loko2.kops.mydomain.com
  name: master-us-east-1b
spec:
  additionalSecurityGroups:
  - sg-01777a37c43f26f32
  - sg-04661fba074cc1685
  associatePublicIp: false
  externalLoadBalancers:
  - loadBalancerName: loko2-ApiPublicELB
  - loadBalancerName: loko2-ApiPrivilegedELB
  iam:
    profile: arn:aws:iam::<accountId>:instance-profile/schip-iam-roles-IAMInstanceProfileController-1ALNMRZN76RTO
  image: ami-047a51fa27710816e
  machineType: m5a.xlarge
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-1b
  role: Master
  subnets:
  - ControllerSubnetZoneB

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2021-02-03T16:26:35Z"
  labels:
    kops.k8s.io/cluster: loko2.kops.mydomain.com
  name: master-us-east-1c
spec:
  additionalSecurityGroups:
  - sg-01777a37c43f26f32
  - sg-04661fba074cc1685
  associatePublicIp: false
  externalLoadBalancers:
  - loadBalancerName: loko2-ApiPublicELB
  - loadBalancerName: loko2-ApiPrivilegedELB
  iam:
    profile: arn:aws:iam::<accountId>:instance-profile/schip-iam-roles-IAMInstanceProfileController-1ALNMRZN76RTO
  image: ami-047a51fa27710816e
  machineType: m5a.xlarge
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-1c
  role: Master
  subnets:
  - ControllerSubnetZoneC

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2021-02-03T16:26:36Z"
  labels:
    kops.k8s.io/cluster: loko2.kops.mydomain.com
  name: nodes
spec:
  associatePublicIp: false
  cloudLabels:
    k8s.io/cluster-autoscaler/enabled: enabled
  externalLoadBalancers:
  - loadBalancerName: loko2-IngressElb
  - loadBalancerName: loko2-InternalIngressElb2
  - loadBalancerName: loko2-PrivateIngressElb
  iam:
    profile: arn:aws:iam::<accountId>:instance-profile/schip-iam-roles-IAMInstanceProfileWorker-131899VPNFYHB
  image: ami-047a51fa27710816e
  machineType: m5a.4xlarge
  maxSize: 8
  minSize: 5
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  subnets:
  - WorkerSubnetZoneA
  - WorkerSubnetZoneB
  - WorkerSubnetZoneC

8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else we need to know?

I'm new to the kops code, so I may have gotten something wrong. I've tried to trace this problem, and it seems to be triggered by #9653.

The problem seems to be happening in this piece of code:

arn := result.GetCallerIdentityResult[0].Arn
parts := strings.Split(arn, ":")
if len(parts) != 6 {
    return nil, fmt.Errorf("arn %q contains unexpected number of colons", arn)
}
if parts[0] != "arn" {
    return nil, fmt.Errorf("arn %q doesn't start with \"arn:\"", arn)
}
if parts[1] != a.partition {
    return nil, fmt.Errorf("arn %q not in partion %q", arn, a.partition)
}
if parts[2] != "iam" && parts[2] != "sts" {
    return nil, fmt.Errorf("arn %q has unrecognized service", arn)
}
// parts[3] is region
// parts[4] is account
resource := strings.Split(parts[5], "/")
if resource[0] != "assumed-role" {
    return nil, fmt.Errorf("arn %q has unrecognized type", arn)
}
if len(resource) < 3 {
    return nil, fmt.Errorf("arn %q contains too few slashes", arn)
}
found := false
for _, role := range a.opt.NodesRoles {
    if resource[1] == role {
        found = true
        break
    }
}
if !found {
    return nil, fmt.Errorf("arn %q does not contain acceptable node role", arn)
}

Apparently (according to the error log) the ARN received for the comparison is arn:aws:sts:::assumed-role/eu-west-1-dev-k8sWorker/i-0a508b0db9c93ce33 (the role assigned to the instance).

It then checks whether the role name (eu-west-1-dev-k8sWorker) is in a list of roles, which I understand is populated by calling the bootstrap endpoint of kops-controller (https://kops-controller.internal.loko2.kops.mydomain.com:3988/bootstrap).
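
To make the failure concrete, here is a small standalone sketch (not kops code; the account ID is a placeholder) that applies the same parsing and comparison as the snippet above to the ARN from the error log and to the nodesRoles value that kops-controller is configured with (see the config further below):

package main

import (
    "fmt"
    "strings"
)

func main() {
    // ARN from the nodeup error log (placeholder account ID).
    arn := "arn:aws:sts::111111111111:assumed-role/eu-west-1-dev-k8sWorker/i-0a508b0db9c93ce33"
    // nodesRoles as rendered into the kops-controller config shown below.
    nodesRoles := []string{"schip-iam-roles-IAMInstanceProfileWorker-131899VPNFYHB"}

    parts := strings.Split(arn, ":")
    resource := strings.Split(parts[5], "/")
    // resource = ["assumed-role", "eu-west-1-dev-k8sWorker", "i-0a508b0db9c93ce33"]

    found := false
    for _, role := range nodesRoles {
        if resource[1] == role {
            found = true
            break
        }
    }
    // Prints false: the role name from the STS ARN is compared against an
    // instance-profile name, so the check never succeeds.
    fmt.Println(found)
}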

As I understand it, this is the configuration that backs that bootstrap call:
cat /var/lib/kubelet/pods/99ef619e-037b-47de-9f1b-c31e6ad12622/volumes/kubernetes.io~configmap/kops-controller-config/config.yaml

{"cloud":"aws","configBase":"s3://schip-kops-dev-us-east-1-<hiddenAWSaccount>/loko2.kops.mydomain.com","server":{"Listen":":3988","provider":{"aws":{"nodesRoles":["schip-iam-roles-IAMInstanceProfileWorker-131899VPNFYHB"],"Region":"us-east-1"}},"serverKeyPath":"/etc/kubernetes/kops-controller/pki/kops-controller.key","serverCertificatePath":"/etc/kubernetes/kops-controller/pki/kops-controller.crt","caBasePath":"/etc/kubernetes/kops-controller/pki","signingCAs":["ca"],"certNames":["kubelet","kubelet-server","kube-proxy"]}}

As we can see, nodesRoles contains instance profile names instead of role names (this list seems to be generated here: https://github.com/kubernetes/kops/blob/master/upup/pkg/fi/cloudup/template_functions.go#L446-L450).
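
For illustration, here is a minimal sketch of how the list could end up holding instance-profile names, assuming the entry is derived from the last path segment of the InstanceGroup's iam.profile ARN (nodeRoleFromProfileARN is a hypothetical helper, not the actual kops code):

package main

import (
    "fmt"
    "strings"
)

// nodeRoleFromProfileARN is a hypothetical helper illustrating the suspected
// behaviour: taking the last path segment of the configured instance-profile
// ARN yields the instance-profile name, not the name of the IAM role attached
// to that profile.
func nodeRoleFromProfileARN(profileARN string) string {
    parts := strings.Split(profileARN, "/")
    return parts[len(parts)-1]
}

func main() {
    profile := "arn:aws:iam::111111111111:instance-profile/schip-iam-roles-IAMInstanceProfileWorker-131899VPNFYHB"
    // Prints the instance-profile name, which never matches the role name
    // "eu-west-1-dev-k8sWorker" seen in the assumed-role STS ARN.
    fmt.Println(nodeRoleFromProfileARN(profile))
}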

If I haven't missed any intermediate step, I believe the problem is that IAM role names are being compared against instance profile names.

The same configuration works fine with 1.18 clusters. I also checked that starting a basic 1.19 cluster (without this custom configuration and without specifying a pre-existing instance profile) works fine.

@rifelpet rifelpet added the kind/regression Categorizes issue or PR as related to a regression from a prior release. label Feb 4, 2021
@rifelpet rifelpet added this to the v1.19 milestone Feb 4, 2021
@rifelpet (Member) commented Feb 4, 2021

Hi @alanbover, thanks for the report and the detailed investigation! You're right: it seems the code assumes that the IAM instance profile name matches the IAM role name, which is not necessarily true. We should call iam.GetInstanceProfile to get the instance profile's list of roles and append their names to this list.
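
A rough sketch of what that lookup might look like with aws-sdk-go (illustrative only, not necessarily how the eventual fix will be implemented):

package main

import (
    "fmt"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/iam"
)

// roleNamesForInstanceProfile resolves an instance-profile name to the names
// of the IAM roles attached to it, so the verifier could accept the role name
// found in the node's assumed-role STS ARN.
func roleNamesForInstanceProfile(svc *iam.IAM, profileName string) ([]string, error) {
    out, err := svc.GetInstanceProfile(&iam.GetInstanceProfileInput{
        InstanceProfileName: aws.String(profileName),
    })
    if err != nil {
        return nil, err
    }
    var names []string
    for _, role := range out.InstanceProfile.Roles {
        names = append(names, aws.StringValue(role.RoleName))
    }
    return names, nil
}

func main() {
    svc := iam.New(session.Must(session.NewSession()))
    names, err := roleNamesForInstanceProfile(svc, "schip-iam-roles-IAMInstanceProfileWorker-131899VPNFYHB")
    if err != nil {
        panic(err)
    }
    // For the reporter's setup this would be expected to include
    // "eu-west-1-dev-k8sWorker".
    fmt.Println(names)
}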

@h3poteto (Contributor) commented Feb 4, 2021

I got the same error.

@h3poteto (Contributor) commented Feb 4, 2021

I can probably fix it, so I will take this issue.

/assign

@olemarkus (Member) commented:

Thanks @h3poteto

@h3poteto (Contributor) commented Mar 3, 2021

I have confirmed that this issue has already been resolved in 1.19.1, so I will close it.

/close

@k8s-ci-robot (Contributor) commented:

@h3poteto: You can't close an active issue/PR unless you authored it or you are a collaborator.

In response to this:

I have confirmed that this issue has already been resolved in 1.19.1. So I will close.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@olemarkus (Member) commented:

Thanks :)
/close

@k8s-ci-robot (Contributor) commented:

@olemarkus: Closing this issue.

In response to this:

Thanks :)
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
