Cannot create cluster (following the getting_started guide) #15852

Closed
wo9999999999 opened this issue Sep 1, 2023 · 7 comments

Labels: kind/bug (categorizes issue or PR as related to a bug), lifecycle/rotten (denotes an issue or PR that has aged beyond stale and will be auto-closed)

Comments

@wo9999999999

/kind bug

1. What kops version are you running? The command kops version will display
this information.

1.27.0

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

Client Version: v1.28.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3

3. What cloud provider are you using?
aws

4. What commands did you run? What is the simplest way to reproduce this issue?
kops create cluster
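
The command above is shown without its flags; the following is a hypothetical reconstruction in the style of the getting-started guide, using only values that appear in the manifest in section 7. The reporter's exact flags are not shown, so treat every flag here as an illustration rather than a reproduction.

# Hypothetical reconstruction; name, zone, and discovery store are copied from
# the cluster manifest below, the remaining flags are assumptions.
kops create cluster \
  --name=woleung.k8s.local \
  --cloud=aws \
  --zones=us-east-1a \
  --discovery-store=s3://testwokops4-example-com-oidc-store/woleung.k8s.local/discovery
kops update cluster --name woleung.k8s.local --yes --admin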

5. What happened after the commands executed?
Cluster validation keeps failing:

NAME				ROLE		MACHINETYPE	MIN	MAX	SUBNETS
control-plane-us-east-1a	ControlPlane	t3.medium	1	1	us-east-1a
nodes-us-east-1a		Node		t3.medium	1	1	us-east-1a

NODE STATUS
NAME			ROLE		READY
i-0be13550669d0b732	control-plane	True
i-0c78304642cc52ef8	node		True

VALIDATION ERRORS
KIND	NAME						MESSAGE
Pod	kube-system/ebs-csi-controller-75fc64d98f-4dbzk	system-cluster-critical pod "ebs-csi-controller-75fc64d98f-4dbzk" is pending

Validation Failed
W0901 16:32:30.328628   45600 validate_cluster.go:232] (will retry): cluster not yet healthy
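
While validation retries, the unhealthy workload can be watched directly; a small sketch of the usual checks (standard kops/kubectl invocations, not taken from the report):

# Re-run validation with a timeout and list any kube-system pods stuck in Pending.
kops validate cluster --name woleung.k8s.local --wait 10m
kubectl -n kube-system get pods --field-selector=status.phase=Pending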

6. What did you expect to happen?
The cluster should start successfully.

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2023-09-01T08:22:42Z"
  name: woleung.k8s.local
spec:
  api:
    loadBalancer:
      class: Network
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://testwokops5-example-com-state-store/woleung.k8s.local
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-us-east-1a
      name: a
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-us-east-1a
      name: a
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
    useServiceAccountExternalPermissions: true
  kubeProxy:
    enabled: false
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  - ::/0
  kubernetesVersion: 1.27.5
  networkCIDR: 172.20.0.0/16
  networking:
    cilium:
      enableNodePort: true
  nonMasqueradeCIDR: 100.64.0.0/10
  serviceAccountIssuerDiscovery:
    discoveryStore: s3://testwokops4-example-com-oidc-store/woleung.k8s.local/discovery/woleung.k8s.local
    enableAWSOIDCProvider: true
  sshAccess:
  - 0.0.0.0/0
  - ::/0
  subnets:
  - cidr: 172.20.32.0/19
    name: us-east-1a
    type: Public
    zone: us-east-1a
  topology:
    dns:
      type: Private
    masters: public
    nodes: public

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2023-09-01T08:22:43Z"
  labels:
    kops.k8s.io/cluster: woleung.k8s.local
  name: control-plane-us-east-1a
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20230728
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - us-east-1a

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2023-09-01T08:22:44Z"
  labels:
    kops.k8s.io/cluster: woleung.k8s.local
  name: nodes-us-east-1a
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20230728
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  role: Node
  subnets:
  - us-east-1a

8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else we need to know?
Output of kubectl describe on the pending pod:

Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      ebs-csi-controller-sa
Node:                 <none>
Labels:               app=ebs-csi-controller
                      app.kubernetes.io/instance=aws-ebs-csi-driver
                      app.kubernetes.io/name=aws-ebs-csi-driver
                      app.kubernetes.io/version=v1.14.1
                      kops.k8s.io/managed-by=kops
                      pod-template-hash=75fc64d98f
Annotations:          <none>
Status:               Pending
IP:                   
IPs:                  <none>
Controlled By:        ReplicaSet/ebs-csi-controller-75fc64d98f
Containers:
  ebs-plugin:
    Image:       registry.k8s.io/provider-aws/aws-ebs-csi-driver:v1.14.1@sha256:f0c5de192d832e7c1daa6580d4a62e8fa6fc8eabc0917ae4cb7ed4d15e95b59e
    Ports:       9808/TCP, 3301/TCP
    Host Ports:  0/TCP, 0/TCP
    Args:
      controller
      --endpoint=$(CSI_ENDPOINT)
      --logtostderr
      --k8s-tag-cluster-id=woleung.k8s.local
      --extra-tags=KubernetesCluster=woleung.k8s.local
      --http-endpoint=0.0.0.0:3301
      --v=5
    Liveness:   http-get http://:healthz/healthz delay=10s timeout=3s period=10s #success=1 #failure=5
    Readiness:  http-get http://:healthz/healthz delay=10s timeout=3s period=10s #success=1 #failure=5
    Environment:
      CSI_ENDPOINT:                 unix:///var/lib/csi/sockets/pluginproxy/csi.sock
      CSI_NODE_NAME:                 (v1:spec.nodeName)
      AWS_ACCESS_KEY_ID:            <set to the key 'key_id' in secret 'aws-secret'>      Optional: true
      AWS_SECRET_ACCESS_KEY:        <set to the key 'access_key' in secret 'aws-secret'>  Optional: true
      AWS_ROLE_ARN:                 arn:aws:iam::861611878732:role/ebs-csi-controller-sa.kube-system.sa.woleung.k8s.local
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/amazonaws.com/token
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/amazonaws.com/ from token-amazonaws-com (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pz7x2 (ro)
  csi-provisioner:
    Image:      registry.k8s.io/sig-storage/csi-provisioner:v3.1.0@sha256:122bfb8c1edabb3c0edd63f06523e6940d958d19b3957dc7b1d6f81e9f1f6119
    Port:       <none>
    Host Port:  <none>
    Args:
      --csi-address=$(ADDRESS)
      --v=5
      --feature-gates=Topology=true
      --extra-create-metadata
      --leader-election=true
      --default-fstype=ext4
    Environment:
      ADDRESS:                      /var/lib/csi/sockets/pluginproxy/csi.sock
      AWS_ROLE_ARN:                 arn:aws:iam::861611878732:role/ebs-csi-controller-sa.kube-system.sa.woleung.k8s.local
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/amazonaws.com/token
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/amazonaws.com/ from token-amazonaws-com (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pz7x2 (ro)
  csi-attacher:
    Image:      registry.k8s.io/sig-storage/csi-attacher:v3.4.0@sha256:8b9c313c05f54fb04f8d430896f5f5904b6cb157df261501b29adc04d2b2dc7b
    Port:       <none>
    Host Port:  <none>
    Args:
      --csi-address=$(ADDRESS)
      --v=5
      --leader-election=true
    Environment:
      ADDRESS:                      /var/lib/csi/sockets/pluginproxy/csi.sock
      AWS_ROLE_ARN:                 arn:aws:iam::861611878732:role/ebs-csi-controller-sa.kube-system.sa.woleung.k8s.local
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/amazonaws.com/token
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/amazonaws.com/ from token-amazonaws-com (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pz7x2 (ro)
  csi-resizer:
    Image:      registry.k8s.io/sig-storage/csi-resizer:v1.4.0@sha256:9ebbf9f023e7b41ccee3d52afe39a89e3ddacdbb69269d583abfc25847cfd9e4
    Port:       <none>
    Host Port:  <none>
    Args:
      --csi-address=$(ADDRESS)
      --v=5
    Environment:
      ADDRESS:                      /var/lib/csi/sockets/pluginproxy/csi.sock
      AWS_ROLE_ARN:                 arn:aws:iam::861611878732:role/ebs-csi-controller-sa.kube-system.sa.woleung.k8s.local
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/amazonaws.com/token
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/amazonaws.com/ from token-amazonaws-com (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pz7x2 (ro)
  liveness-probe:
    Image:      registry.k8s.io/sig-storage/livenessprobe:v2.6.0@sha256:406f59599991916d2942d8d02f076d957ed71b541ee19f09fc01723a6e6f5932
    Port:       <none>
    Host Port:  <none>
    Args:
      --csi-address=/csi/csi.sock
    Environment:
      AWS_ROLE_ARN:                 arn:aws:iam::861611878732:role/ebs-csi-controller-sa.kube-system.sa.woleung.k8s.local
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/amazonaws.com/token
    Mounts:
      /csi from socket-dir (rw)
      /var/run/secrets/amazonaws.com/ from token-amazonaws-com (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pz7x2 (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  socket-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  token-amazonaws-com:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  86400
  kube-api-access-pz7x2:
    Type:                     Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:   3607
    ConfigMapName:            kube-root-ca.crt
    ConfigMapOptional:        <nil>
    DownwardAPI:              true
QoS Class:                    BestEffort
Node-Selectors:               <none>
Tolerations:                  node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints:  kubernetes.io/hostname:DoNotSchedule when max skew 1 is exceeded for selector app=ebs-csi-controller,app.kubernetes.io/instance=aws-ebs-csi-driver,app.kubernetes.io/name=aws-ebs-csi-driver
                              topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector app=ebs-csi-controller,app.kubernetes.io/instance=aws-ebs-csi-driver,app.kubernetes.io/name=aws-ebs-csi-driver
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  5m                     default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
  Warning  FailedScheduling  3m54s (x3 over 4m24s)  default-scheduler  0/2 nodes are available: 1 node(s) didn't match pod topology spread constraints, 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/2 nodes are available: 1 No preemption victims found for incoming pod, 1 Preemption is not helpful for scheduling..
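
The two FailedScheduling events line up with the "Topology Spread Constraints" line above: the controller Deployment runs more than one replica, the replicas must land on distinct hostnames (whenUnsatisfiable: DoNotSchedule), and the only other node carries the control-plane taint, so the second replica stays Pending. Below is a minimal sketch of that constraint, reconstructed from the describe output; the field names follow the standard Pod spec, the label selector is abbreviated, and this is not copied from the kops addon manifest.

# Reconstructed from the describe output above; illustrative only.
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: ebs-csi-controller
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      app: ebs-csi-controller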

k8s-ci-robot added the kind/bug label on Sep 1, 2023

@justinsb
Member

justinsb commented Sep 4, 2023

/assign

I was able to reproduce this. The issue is that we're trying to bring up two CSI controller pods, but we only have two nodes (one control plane, one worker), and the control-plane node is tainted.
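
Until a kops release containing the fix is available, one hedged workaround sketch (instance-group and cluster names taken from the manifest in this issue) is to give the scheduler a second untainted node so both controller replicas can satisfy the hostname spread constraint:

# Workaround sketch, not an official recommendation: scale the worker instance
# group to two nodes, apply the change, then re-validate.
kops edit ig nodes-us-east-1a --name woleung.k8s.local   # set minSize: 2 and maxSize: 2
kops update cluster --name woleung.k8s.local --yes
kops validate cluster --name woleung.k8s.local --wait 10m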

justinsb added a commit to justinsb/kops that referenced this issue Sep 4, 2023
Even when running on workers (using IRSA), if we try to run multiple
controllers we may have problems with node-spreading, and we don't
necessarily gain any availability, as we need an apiserver lease.

Issue kubernetes#15852
hakman pushed a commit to hakman/kops that referenced this issue Sep 6, 2023
Even when running on workers (using IRSA), if we try to run multiple
controllers we may have problems with node-spreading, and we don't
necessarily gain any availability, as we need an apiserver lease.

Issue kubernetes#15852
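
The referenced commit moves the addon toward a single controller replica. A rough manual stopgap on an already-running cluster (a sketch only; kops may reconcile the Deployment back to the shipped manifest on a later update) would be to scale the managed Deployment down by hand:

# Stopgap sketch: with one replica, the hostname spread constraint no longer
# needs a second schedulable node.
kubectl -n kube-system scale deployment ebs-csi-controller --replicas=1
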
@mmadrid

mmadrid commented Sep 22, 2023

I was having the same issue. It looks like it has been fixed and is part of the latest alpha release, v1.29.0-alpha.1. I was able to get a 2-node (1 master, 1 worker) cluster up using that version.
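
For anyone checking the same thing, here is a hedged verification sketch after installing a kops build that contains the fix (the version is per the comment above; names follow this issue's cluster):

# Re-apply the addon manifests with the newer binary, then confirm the
# controller pods schedule and the cluster validates.
kops version
kops update cluster --name woleung.k8s.local --yes
kubectl -n kube-system get pods -l app=ebs-csi-controller
kops validate cluster --name woleung.k8s.local --wait 10m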

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jan 28, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Feb 27, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot closed this as not planned on Mar 28, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
