Cannot create cluster (following the getting_started guide) #15852

Closed
wo9999999999 opened this issue Sep 1, 2023 · 7 comments

Labels: kind/bug (categorizes issue or PR as related to a bug), lifecycle/rotten (denotes an issue or PR that has aged beyond stale and will be auto-closed)

Comments

@wo9999999999

/kind bug

1. What kops version are you running? The command kops version will display
this information.

1.27.0

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

Client Version: v1.28.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3

3. What cloud provider are you using?
aws

4. What commands did you run? What is the simplest way to reproduce this issue?
kops create cluster
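
The command above is shown without its flags; the following is a hypothetical reconstruction in the style of the getting-started guide, using only values that appear in the manifest in section 7. The reporter's exact flags are not shown, so treat every flag here as an illustration rather than a reproduction.

# Hypothetical reconstruction; name, zone, and discovery store are copied from
# the cluster manifest below, the remaining flags are assumptions.
kops create cluster \
  --name=woleung.k8s.local \
  --cloud=aws \
  --zones=us-east-1a \
  --discovery-store=s3://testwokops4-example-com-oidc-store/woleung.k8s.local/discovery
kops update cluster --name woleung.k8s.local --yes --admin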

5. What happened after the commands executed?
Cluster validation keeps failing:

NAME				ROLE		MACHINETYPE	MIN	MAX	SUBNETS
control-plane-us-east-1a	ControlPlane	t3.medium	1	1	us-east-1a
nodes-us-east-1a		Node		t3.medium	1	1	us-east-1a

NODE STATUS
NAME			ROLE		READY
i-0be13550669d0b732	control-plane	True
i-0c78304642cc52ef8	node		True

VALIDATION ERRORS
KIND	NAME						MESSAGE
Pod	kube-system/ebs-csi-controller-75fc64d98f-4dbzk	system-cluster-critical pod "ebs-csi-controller-75fc64d98f-4dbzk" is pending

Validation Failed
W0901 16:32:30.328628   45600 validate_cluster.go:232] (will retry): cluster not yet healthy
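
While validation retries, the unhealthy workload can be watched directly; a small sketch of the usual checks (standard kops/kubectl invocations, not taken from the report):

# Re-run validation with a timeout and list any kube-system pods stuck in Pending.
kops validate cluster --name woleung.k8s.local --wait 10m
kubectl -n kube-system get pods --field-selector=status.phase=Pending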

6. What did you expect to happen?
The cluster should start successfully.

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2023-09-01T08:22:42Z"
  name: woleung.k8s.local
spec:
  api:
    loadBalancer:
      class: Network
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://testwokops5-example-com-state-store/woleung.k8s.local
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-us-east-1a
      name: a
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-us-east-1a
      name: a
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
    useServiceAccountExternalPermissions: true
  kubeProxy:
    enabled: false
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  - ::/0
  kubernetesVersion: 1.27.5
  networkCIDR: 172.20.0.0/16
  networking:
    cilium:
      enableNodePort: true
  nonMasqueradeCIDR: 100.64.0.0/10
  serviceAccountIssuerDiscovery:
    discoveryStore: s3://testwokops4-example-com-oidc-store/woleung.k8s.local/discovery/woleung.k8s.local
    enableAWSOIDCProvider: true
  sshAccess:
  - 0.0.0.0/0
  - ::/0
  subnets:
  - cidr: 172.20.32.0/19
    name: us-east-1a
    type: Public
    zone: us-east-1a
  topology:
    dns:
      type: Private
    masters: public
    nodes: public

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2023-09-01T08:22:43Z"
  labels:
    kops.k8s.io/cluster: woleung.k8s.local
  name: control-plane-us-east-1a
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20230728
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - us-east-1a

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2023-09-01T08:22:44Z"
  labels:
    kops.k8s.io/cluster: woleung.k8s.local
  name: nodes-us-east-1a
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20230728
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  role: Node
  subnets:
  - us-east-1a

8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else we need to know?
Output of kubectl describe on the pending pod:

Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      ebs-csi-controller-sa
Node:                 <none>
Labels:               app=ebs-csi-controller
                      app.kubernetes.io/instance=aws-ebs-csi-driver
                      app.kubernetes.io/name=aws-ebs-csi-driver
                      app.kubernetes.io/version=v1.14.1
                      kops.k8s.io/managed-by=kops
                      pod-template-hash=75fc64d98f
Annotations:          <none>
Status:               Pending
IP:                   
IPs:                  <none>
Controlled By:        ReplicaSet/ebs-csi-controller-75fc64d98f
Containers:
  ebs-plugin:
    Image:       registry.k8s.io/provider-aws/aws-ebs-csi-driver:v1.14.1@sha256:f0c5de192d832e7c1daa6580d4a62e8fa6fc8eabc0917ae4cb7ed4d15e95b59e
    Ports:       9808/TCP, 3301/TCP
    Host Ports:  0/TCP, 0/TCP
    Args:
      controller
      --endpoint=$(CSI_ENDPOINT)
      --logtostderr
      --k8s-tag-cluster-id=woleung.k8s.local
      --extra-tags=KubernetesCluster=woleung.k8s.local
      --http-endpoint=0.0.0.0:3301
      --v=5
    Liveness:   http-get http://:healthz/healthz delay=10s timeout=3s period=10s #success=1 #failure=5
    Readiness:  http-get http://:healthz/healthz delay=10s timeout=3s period=10s #success=1 #failure=5
    Environment:
      CSI_ENDPOINT:                 unix:///var/lib/csi/sockets/pluginproxy/csi.sock
      CSI_NODE_NAME:                 (v1:spec.nodeName)
      AWS_ACCESS_KEY_ID:            <set to the key 'key_id' in secret 'aws-secret'>      Optional: true
      AWS_SECRET_ACCESS_KEY:        <set to the key 'access_key' in secret 'aws-secret'>  Optional: true
      AWS_ROLE_ARN:                 arn:aws:iam::861611878732:role/ebs-csi-controller-sa.kube-system.sa.woleung.k8s.local
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/amazonaws.com/token
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/amazonaws.com/ from token-amazonaws-com (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pz7x2 (ro)
  csi-provisioner:
    Image:      registry.k8s.io/sig-storage/csi-provisioner:v3.1.0@sha256:122bfb8c1edabb3c0edd63f06523e6940d958d19b3957dc7b1d6f81e9f1f6119
    Port:       <none>
    Host Port:  <none>
    Args:
      --csi-address=$(ADDRESS)
      --v=5
      --feature-gates=Topology=true
      --extra-create-metadata
      --leader-election=true
      --default-fstype=ext4
    Environment:
      ADDRESS:                      /var/lib/csi/sockets/pluginproxy/csi.sock
      AWS_ROLE_ARN:                 arn:aws:iam::861611878732:role/ebs-csi-controller-sa.kube-system.sa.woleung.k8s.local
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/amazonaws.com/token
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/amazonaws.com/ from token-amazonaws-com (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pz7x2 (ro)
  csi-attacher:
    Image:      registry.k8s.io/sig-storage/csi-attacher:v3.4.0@sha256:8b9c313c05f54fb04f8d430896f5f5904b6cb157df261501b29adc04d2b2dc7b
    Port:       <none>
    Host Port:  <none>
    Args:
      --csi-address=$(ADDRESS)
      --v=5
      --leader-election=true
    Environment:
      ADDRESS:                      /var/lib/csi/sockets/pluginproxy/csi.sock
      AWS_ROLE_ARN:                 arn:aws:iam::861611878732:role/ebs-csi-controller-sa.kube-system.sa.woleung.k8s.local
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/amazonaws.com/token
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/amazonaws.com/ from token-amazonaws-com (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pz7x2 (ro)
  csi-resizer:
    Image:      registry.k8s.io/sig-storage/csi-resizer:v1.4.0@sha256:9ebbf9f023e7b41ccee3d52afe39a89e3ddacdbb69269d583abfc25847cfd9e4
    Port:       <none>
    Host Port:  <none>
    Args:
      --csi-address=$(ADDRESS)
      --v=5
    Environment:
      ADDRESS:                      /var/lib/csi/sockets/pluginproxy/csi.sock
      AWS_ROLE_ARN:                 arn:aws:iam::861611878732:role/ebs-csi-controller-sa.kube-system.sa.woleung.k8s.local
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/amazonaws.com/token
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/amazonaws.com/ from token-amazonaws-com (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pz7x2 (ro)
  liveness-probe:
    Image:      registry.k8s.io/sig-storage/livenessprobe:v2.6.0@sha256:406f59599991916d2942d8d02f076d957ed71b541ee19f09fc01723a6e6f5932
    Port:       <none>
    Host Port:  <none>
    Args:
      --csi-address=/csi/csi.sock
    Environment:
      AWS_ROLE_ARN:                 arn:aws:iam::861611878732:role/ebs-csi-controller-sa.kube-system.sa.woleung.k8s.local
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/amazonaws.com/token
    Mounts:
      /csi from socket-dir (rw)
      /var/run/secrets/amazonaws.com/ from token-amazonaws-com (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pz7x2 (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  socket-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  token-amazonaws-com:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  86400
  kube-api-access-pz7x2:
    Type:                     Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:   3607
    ConfigMapName:            kube-root-ca.crt
    ConfigMapOptional:        <nil>
    DownwardAPI:              true
QoS Class:                    BestEffort
Node-Selectors:               <none>
Tolerations:                  node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints:  kubernetes.io/hostname:DoNotSchedule when max skew 1 is exceeded for selector app=ebs-csi-controller,app.kubernetes.io/instance=aws-ebs-csi-driver,app.kubernetes.io/name=aws-ebs-csi-driver
                              topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector app=ebs-csi-controller,app.kubernetes.io/instance=aws-ebs-csi-driver,app.kubernetes.io/name=aws-ebs-csi-driver
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  5m                     default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
  Warning  FailedScheduling  3m54s (x3 over 4m24s)  default-scheduler  0/2 nodes are available: 1 node(s) didn't match pod topology spread constraints, 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/2 nodes are available: 1 No preemption victims found for incoming pod, 1 Preemption is not helpful for scheduling..
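
The two FailedScheduling events line up with the "Topology Spread Constraints" line above: the controller Deployment runs more than one replica, the replicas must land on distinct hostnames (whenUnsatisfiable: DoNotSchedule), and the only other node carries the control-plane taint, so the second replica stays Pending. Below is a minimal sketch of that constraint, reconstructed from the describe output; the field names follow the standard Pod spec, the label selector is abbreviated, and this is not copied from the kops addon manifest.

# Reconstructed from the describe output above; illustrative only.
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: ebs-csi-controller
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      app: ebs-csi-controller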

k8s-ci-robot added the kind/bug label on Sep 1, 2023

@justinsb
Member

justinsb commented Sep 4, 2023

/assign

I was able to reproduce this. The issue is that we're trying to bring up two CSI controller pods, but we only have two nodes (one control plane, one worker), and the control-plane node is tainted.
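
Until a kops release containing the fix is available, one hedged workaround sketch (instance-group and cluster names taken from the manifest in this issue) is to give the scheduler a second untainted node so both controller replicas can satisfy the hostname spread constraint:

# Workaround sketch, not an official recommendation: scale the worker instance
# group to two nodes, apply the change, then re-validate.
kops edit ig nodes-us-east-1a --name woleung.k8s.local   # set minSize: 2 and maxSize: 2
kops update cluster --name woleung.k8s.local --yes
kops validate cluster --name woleung.k8s.local --wait 10m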

justinsb added a commit to justinsb/kops that referenced this issue Sep 4, 2023
Even when running on workers (using IRSA), if we try to run multiple
controllers we may have problems with node-spreading, and we don't
necessarily gain any availability, as we need an apiserver lease.

Issue kubernetes#15852
hakman pushed a commit to hakman/kops that referenced this issue Sep 6, 2023
Even when running on workers (using IRSA), if we try to run multiple
controllers we may have problems with node-spreading, and we don't
necessarily gain any availability, as we need an apiserver lease.

Issue kubernetes#15852
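
The referenced commit moves the addon toward a single controller replica. A rough manual stopgap on an already-running cluster (a sketch only; kops may reconcile the Deployment back to the shipped manifest on a later update) would be to scale the managed Deployment down by hand:

# Stopgap sketch: with one replica, the hostname spread constraint no longer
# needs a second schedulable node.
kubectl -n kube-system scale deployment ebs-csi-controller --replicas=1
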
@mmadrid

mmadrid commented Sep 22, 2023

I was having the same issue. It looks like it has been fixed and is part of the latest alpha release, v1.29.0-alpha.1. I was able to get a 2-node (1 master, 1 worker) cluster up using that version.
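
For anyone checking the same thing, here is a hedged verification sketch after installing a kops build that contains the fix (the version is per the comment above; names follow this issue's cluster):

# Re-apply the addon manifests with the newer binary, then confirm the
# controller pods schedule and the cluster validates.
kops version
kops update cluster --name woleung.k8s.local --yes
kubectl -n kube-system get pods -l app=ebs-csi-controller
kops validate cluster --name woleung.k8s.local --wait 10m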

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jan 28, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Feb 27, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot closed this as not planned on Mar 28, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
