
gitops apply fails to get cluster autoscaler working #1237

gemagomez opened this issue Aug 30, 2019 · 13 comments

gemagomez commented Aug 30, 2019

What happened?
Deployed with eksctl gitops apply. After the deployment, and after adding Flux's SSH key to my gitops repo, the cluster autoscaler doesn't start:

My cluster looks as follows:

% kubectl get pods --all-namespaces                                      
NAMESPACE              NAME                                                      READY   STATUS             RESTARTS   AGE
amazon-cloudwatch      cloudwatch-agent-4dhc2                                    1/1     Running            0          56m
amazon-cloudwatch      cloudwatch-agent-dkvqr                                    1/1     Running            0          56m
amazon-cloudwatch      fluentd-cloudwatch-qgg2c                                  1/1     Running            0          56m
amazon-cloudwatch      fluentd-cloudwatch-vzgbj                                  1/1     Running            0          56m
demo                   podinfo-75b8547f78-lxgns                                  1/1     Running            0          55m
flux                   flux-bd67dd99c-vjvj8                                      1/1     Running            0          64m
flux                   flux-helm-operator-6bc7c85bb5-tdfld                       1/1     Running            0          64m
flux                   memcached-958f745c-7dsp2                                  1/1     Running            0          64m
flux                   tiller-deploy-7ccc4b4d45-rf7kn                            1/1     Running            0          64m
kube-system            alb-ingress-controller-69f845f8f9-49q59                   1/1     Running            0          56m
kube-system            aws-node-87v4z                                            1/1     Running            0          2d3h
kube-system            aws-node-rtjq5                                            1/1     Running            0          2d3h
kube-system            cluster-autoscaler-5d74cbcb5-sdwm6                        0/1     CrashLoopBackOff   15         56m
kube-system            coredns-79d667b89f-lwht7                                  1/1     Running            0          2d3h
kube-system            coredns-79d667b89f-lxtjf                                  1/1     Running            0          2d3h
kube-system            kube-proxy-4fpgz                                          1/1     Running            0          2d3h
kube-system            kube-proxy-r9d9k                                          1/1     Running            0          2d3h
kubernetes-dashboard   dashboard-metrics-scraper-f7b5dbf7d-mnpv6                 1/1     Running            0          56m
kubernetes-dashboard   kubernetes-dashboard-7447f48f55-v9h7r                     1/1     Running            0          56m
monitoring             alertmanager-prometheus-operator-alertmanager-0           2/2     Running            0          54m
monitoring             metrics-server-7dfc675884-7tnmm                           1/1     Running            0          56m
monitoring             prometheus-operator-grafana-9bb769cf-7dhkn                2/2     Running            0          55m
monitoring             prometheus-operator-kube-state-metrics-79f476bff6-8kzp5   1/1     Running            0          55m
monitoring             prometheus-operator-operator-58fcb66576-cvvph             1/1     Running            0          55m
monitoring             prometheus-operator-prometheus-node-exporter-lpbqf        1/1     Running            0          55m
monitoring             prometheus-operator-prometheus-node-exporter-rvnns        1/1     Running            0          55m
monitoring             prometheus-prometheus-operator-prometheus-0               3/3     Running            1          54m

The errors in the logs of the cluster-autoscaler container are:

75-NodeInstanceRole-13RPA3JTIZZ49/i-0528c95b7967742fa is not authorized to perform: autoscaling:DescribeTags
        status code: 403, request id: d6532115-cb13-11e9-ab64-bd252c647884
F0830 10:49:27.225451       1 cloud_provider_builder.go:149] Failed to create AWS Manager: cannot autodiscover ASGs: AccessDenied: User: arn:aws:sts::376248598259:assumed-role/eksctl-wonderful-painting-1566975-NodeInstanceRole-13RPA3JTIZZ49/i-0528c95b7967742fa is not authorized to perform: autoscaling:DescribeTags
        status code: 403, request id: d6532115-cb13-11e9-ab64-bd252c647884
@gemagomez gemagomez added this to the 0.5.0 milestone Aug 30, 2019

I'd say this should be an issue in the profile repo.

Yep, I agree with @errordeveloper

marccarre commented Sep 2, 2019

FWIW, it is a matter of having the right IAM policies in place when creating the cluster, namely:

   - name: ng-1
     instanceType: m5.large
-    desiredCapacity: 1
+    minSize: 1
+    maxSize: 2
+    iam:
+      withAddonPolicies:
+        albIngress: true
+        autoScaler: true
+        cloudWatch: true
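Put together, the diff above corresponds to a complete ClusterConfig manifest along the following lines. This is a minimal sketch, not the manifest from the quickstart repository: the cluster name and region are placeholders borrowed from the example file diffed later in this thread, and the comments describe what each addon policy is intended to grant.

```yaml
# Minimal sketch of a ClusterConfig with the addon policies from the diff.
# metadata.name and metadata.region are placeholders; adjust to your setup.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: cluster-12        # placeholder
  region: eu-north-1      # placeholder

nodeGroups:
  - name: ng-1
    instanceType: m5.large
    # min/max instead of a fixed desiredCapacity, so the autoscaler has room to scale
    minSize: 1
    maxSize: 2
    iam:
      withAddonPolicies:
        # IAM permissions for the ALB ingress controller
        albIngress: true
        # IAM permissions for cluster-autoscaler, which should include
        # autoscaling:DescribeTags (the action denied in the log above)
        autoScaler: true
        # IAM permissions for the CloudWatch agent / fluentd to publish metrics and logs
        cloudWatch: true
```

A file like this is what eksctl create cluster -f <file> consumes, as in the successful run later in this thread. The policies are attached to the nodegroup's instance role when the nodegroup is created, which is why a cluster created without them shows the AccessDenied error above.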

How do we want to proceed there?

  1. Is there a way to edit the cluster post creation?
  2. Do we want to simply document this in the profile repo? (A lot of users will likely miss this.)

We should add this to the quickstart guide for now. We'll fix it as part of a different issue later on.

I'll document this in the quickstart profile's repository, but I thought I'd also provide an example ClusterConfig manifest which users can use out-of-the-box. See also: #1249.

@errordeveloper errordeveloper modified the milestones: 0.5.0, 0.6.0 Sep 3, 2019

@marccarre should this issue be closed actually?

Not yet, only once we've merged weaveworks/eks-quickstart-app-dev#22

marccarre commented Sep 4, 2019

Fixed by weaveworks/eks-quickstart-app-dev#22

This happened to me today when applying the app-dev profile while creating a new cluster with gitops. How do I fix it?

@ilanpillemer, did you have the required IAM roles in place in your cluster?

If you have them, then this should work fine. See the steps below. (I just re-ran this myself to be sure it still works as expected. It does.)
$ git diff
diff --git a/examples/eks-quickstart-app-dev.yaml b/examples/eks-quickstart-app-dev.yaml
index 487cb46b..5783c605 100644
--- a/examples/eks-quickstart-app-dev.yaml
+++ b/examples/eks-quickstart-app-dev.yaml
@@ -5,8 +5,8 @@ apiVersion:
 kind: ClusterConfig
-  name: cluster-12
-  region: eu-north-1
+  name: mc-1237-testing-with-iam
+  region: ap-northeast-1
   - name: ng-1

$ eksctl create cluster -f examples/eks-quickstart-app-dev.yaml 
[ℹ]  eksctl version 0.11.1
[ℹ]  using region ap-northeast-1
[ℹ]  setting availability zones to [ap-northeast-1c ap-northeast-1d ap-northeast-1a]
[ℹ]  subnets for ap-northeast-1c - public: private:
[ℹ]  subnets for ap-northeast-1d - public: private:
[ℹ]  subnets for ap-northeast-1a - public: private:
[ℹ]  nodegroup "ng-1" will use "ami-02e124a380df41614" [AmazonLinux2/1.14]
[ℹ]  using Kubernetes version 1.14
[ℹ]  creating EKS cluster "mc-1237-testing-with-iam" in "ap-northeast-1" region with un-managed nodes
[ℹ]  1 nodegroup (ng-1) was included (based on the include/exclude rules)
[ℹ]  will create a CloudFormation stack for cluster itself and 1 nodegroup stack(s)
[ℹ]  will create a CloudFormation stack for cluster itself and 0 managed nodegroup stack(s)
[ℹ]  if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=ap-northeast-1 --cluster=mc-1237-testing-with-iam'
[ℹ]  CloudWatch logging will not be enabled for cluster "mc-1237-testing-with-iam" in "ap-northeast-1"
[ℹ]  you can enable it with 'eksctl utils update-cluster-logging --region=ap-northeast-1 --cluster=mc-1237-testing-with-iam'
[ℹ]  Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "mc-1237-testing-with-iam" in "ap-northeast-1"
[ℹ]  2 sequential tasks: { create cluster control plane "mc-1237-testing-with-iam", create nodegroup "ng-1" }
[ℹ]  building cluster stack "eksctl-mc-1237-testing-with-iam-cluster"
[ℹ]  deploying stack "eksctl-mc-1237-testing-with-iam-cluster"
[ℹ]  building nodegroup stack "eksctl-mc-1237-testing-with-iam-nodegroup-ng-1"
[ℹ]  deploying stack "eksctl-mc-1237-testing-with-iam-nodegroup-ng-1"
[✔]  all EKS cluster resources for "mc-1237-testing-with-iam" have been created
[✔]  saved kubeconfig as "${HOME}/.kube/config"
[ℹ]  adding identity "arn:aws:iam::083751696308:role/eksctl-mc-1237-testing-with-iam-n-NodeInstanceRole-1M7OF6KB2D8RV" to auth ConfigMap
[ℹ]  nodegroup "ng-1" has 0 node(s)
[ℹ]  waiting for at least 1 node(s) to become ready in "ng-1"
[ℹ]  nodegroup "ng-1" has 1 node(s)
[ℹ]  node "ip-192-168-13-77.ap-northeast-1.compute.internal" is ready
[ℹ]  kubectl command should work with "${HOME}/.kube/config", try 'kubectl get nodes'
[✔]  EKS cluster "mc-1237-testing-with-iam" in "ap-northeast-1" region is ready

$ EKSCTL_EXPERIMENTAL=true eksctl enable repo \
>     -f examples/eks-quickstart-app-dev.yaml \
>     --git-email \
>     --git-url

[ℹ]  Generating public key infrastructure for the Helm Operator and Tiller
[ℹ]    this may take up to a minute, please be patient
[!]  Public key infrastructure files were written into directory "/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/eksctl-helm-pki431635447"
[!]  please move the files into a safe place or delete them
[ℹ]  Generating manifests
[ℹ]  Cloning
Cloning into '/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/eksctl-install-flux-clone-956113642'...
remote: Enumerating objects: 59, done.        
remote: Counting objects: 100% (59/59), done.        
remote: Compressing objects: 100% (55/55), done.        
remote: Total 447 (delta 11), reused 50 (delta 3), pack-reused 388        
Receiving objects: 100% (447/447), 183.32 KiB | 514.00 KiB/s, done.
Resolving deltas: 100% (157/157), done.
Already on 'master'
Your branch is up to date with 'origin/master'.
[ℹ]  Writing Flux manifests
[ℹ]  created "Namespace/flux"
[ℹ]  Applying Helm TLS Secret(s)
[ℹ]  created "flux:Secret/flux-helm-tls-cert"
[ℹ]  created "flux:Secret/tiller-secret"
[!]  Note: certificate secrets aren't added to the Git repository for security reasons
[ℹ]  Applying manifests
[ℹ]  created "flux:Deployment.apps/flux"
[ℹ]  created "flux:ServiceAccount/flux-helm-operator"
[ℹ]  created ""
[ℹ]  created ""
[ℹ]  created ""
[ℹ]  created "flux:Secret/flux-git-deploy"
[ℹ]  created "flux:Deployment.apps/memcached"
[ℹ]  created "flux:Deployment.apps/flux-helm-operator"
[ℹ]  created "flux:Deployment.extensions/tiller-deploy"
[ℹ]  created "flux:Service/tiller-deploy"
[ℹ]  created "flux:Service/memcached"
[ℹ]  created "flux:ServiceAccount/flux"
[ℹ]  created ""
[ℹ]  created ""
[ℹ]  created "flux:ConfigMap/flux-helm-tls-ca-config"
[ℹ]  created "flux:ServiceAccount/tiller"
[ℹ]  created ""
[ℹ]  created "flux:ServiceAccount/helm"
[ℹ]  created ""
[ℹ]  created ""
[ℹ]  Waiting for Helm Operator to start
ERROR: logging before flag.Parse: E1210 18:44:24.787197    4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:24 socat[6735] E connect(5, AF=2, 16): Connection refused
[!]  Helm Operator is not ready yet (Get EOF), retrying ...
ERROR: logging before flag.Parse: E1210 18:44:26.816135    4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:26 socat[6814] E connect(5, AF=2, 16): Connection refused
[!]  Helm Operator is not ready yet (Get EOF), retrying ...
ERROR: logging before flag.Parse: E1210 18:44:28.844545    4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:28 socat[6870] E connect(5, AF=2, 16): Connection refused
[!]  Helm Operator is not ready yet (Get EOF), retrying ...
ERROR: logging before flag.Parse: E1210 18:44:30.877698    4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:30 socat[6967] E connect(5, AF=2, 16): Connection refused
[!]  Helm Operator is not ready yet (Get EOF), retrying ...
ERROR: logging before flag.Parse: E1210 18:44:32.914902    4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:32 socat[7082] E connect(5, AF=2, 16): Connection refused
[!]  Helm Operator is not ready yet (Get EOF), retrying ...
ERROR: logging before flag.Parse: E1210 18:44:34.944906    4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:34 socat[7084] E connect(5, AF=2, 16): Connection refused
[!]  Helm Operator is not ready yet (Get EOF), retrying ...
ERROR: logging before flag.Parse: E1210 18:44:36.971253    4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:36 socat[7085] E connect(5, AF=2, 16): Connection refused
[!]  Helm Operator is not ready yet (Get EOF), retrying ...
ERROR: logging before flag.Parse: E1210 18:44:38.998610    4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:39 socat[7090] E connect(5, AF=2, 16): Connection refused
[!]  Helm Operator is not ready yet (Get EOF), retrying ...
ERROR: logging before flag.Parse: E1210 18:44:41.023201    4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:41 socat[7093] E connect(5, AF=2, 16): Connection refused
[!]  Helm Operator is not ready yet (Get EOF), retrying ...
ERROR: logging before flag.Parse: E1210 18:44:43.053384    4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:43 socat[7113] E connect(5, AF=2, 16): Connection refused
[!]  Helm Operator is not ready yet (Get EOF), retrying ...
ERROR: logging before flag.Parse: E1210 18:44:45.084005    4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:45 socat[7115] E connect(5, AF=2, 16): Connection refused
[!]  Helm Operator is not ready yet (Get EOF), retrying ...
ERROR: logging before flag.Parse: E1210 18:44:47.115951    4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:47 socat[7116] E connect(5, AF=2, 16): Connection refused
[!]  Helm Operator is not ready yet (Get EOF), retrying ...
[ℹ]  Helm Operator started successfully
[ℹ]  see for details on how to use the Helm Operator
[ℹ]  Waiting for Flux to start
[ℹ]  Flux started successfully
[ℹ]  see for details on how to use Flux
[ℹ]  Committing and pushing manifests to
[master 15b0aad] Add Initial Flux configuration
 13 files changed, 803 insertions(+)
 create mode 100644 flux/flux-account.yaml
 create mode 100644 flux/flux-deployment.yaml
 create mode 100644 flux/flux-helm-operator-account.yaml
 create mode 100644 flux/flux-helm-release-crd.yaml
 create mode 100644 flux/flux-namespace.yaml
 create mode 100644 flux/flux-secret.yaml
 create mode 100644 flux/helm-operator-deployment.yaml
 create mode 100644 flux/memcache-dep.yaml
 create mode 100644 flux/memcache-svc.yaml
 create mode 100644 flux/tiller-ca-cert-configmap.yaml
 create mode 100644 flux/tiller-dep.yaml
 create mode 100644 flux/tiller-rbac.yaml
 create mode 100644 flux/tiller-svc.yaml
Enumerating objects: 17, done.
Counting objects: 100% (17/17), done.
Delta compression using up to 8 threads
Compressing objects: 100% (15/15), done.
Writing objects: 100% (16/16), 9.33 KiB | 9.33 MiB/s, done.
Total 16 (delta 1), reused 12 (delta 1)
remote: Resolving deltas: 100% (1/1), done.        
   e54ab6f..15b0aad  master -> master
[ℹ]  Flux will only operate properly once it has write-access to the Git repository
[ℹ]  please configure so that the following Flux SSH public key has write access to it
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDFgi4LH0m5lCSUf/qmBTTZIz3MASZOQMepyDUYxtmAycwC0158op7ykTvHgmAqfXMxS90LzDQ4qPUxWKgExfjnWv3u7gWJBhDJhhDyLEodJLO6/IljgC1rUPTj5QJ1AwcPM7cvoB5sIBVq1iU6Jmf0Hp/BL2QEiLdiBdpA4HkPGKOMvzB+nNiLg4iJbCdAKAefHJWqWvf2k+PPTkVgpQ9ujcyQ+KHczY8Aj4HPu9he8C8S9Sqj2Vxq/qKZVbAuxllINy/WXlCB9SdbPx1b66g9Hiw6meoXiYJPaLft78SVXLQBx7l1anDabmcRnNHSChwMY8AAVFBssm537DyAHuG5

### Then added the above SSH key to

$ kubectl get po --all-namespaces
NAMESPACE     NAME                                  READY   STATUS    RESTARTS   AGE
flux          flux-7696dbc4cd-sjbv7                 1/1     Running   0          17m
flux          flux-helm-operator-8687676b89-qw7kq   1/1     Running   0          17m
flux          memcached-5dcd7579-7bn6l              1/1     Running   0          17m
flux          tiller-deploy-69547b56b4-p6zxd        1/1     Running   0          17m
kube-system   aws-node-f8g7z                        1/1     Running   0          20m
kube-system   coredns-699bb99bf8-gptx4              1/1     Running   0          27m
kube-system   coredns-699bb99bf8-smzch              1/1     Running   0          27m
kube-system   kube-proxy-28xqt                      1/1     Running   0          20m

$ EKSCTL_EXPERIMENTAL=true eksctl enable profile app-dev \
>     -f examples/eks-quickstart-app-dev.yaml \
>     --git-email \
>     --git-url
Cloning into '/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/my-gitops-repo-547778386'...
remote: Enumerating objects: 63, done.        
remote: Counting objects: 100% (63/63), done.        
remote: Compressing objects: 100% (59/59), done.        
remote: Total 451 (delta 12), reused 53 (delta 3), pack-reused 388        
Receiving objects: 100% (451/451), 185.04 KiB | 104.00 KiB/s, done.
Resolving deltas: 100% (158/158), done.
Already on 'master'
Your branch is up to date with 'origin/master'.
[ℹ]  cloning repository "":master
Cloning into '/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/quickstart-008692361'...
remote: Enumerating objects: 5, done.        
remote: Counting objects: 100% (5/5), done.        
remote: Compressing objects: 100% (4/4), done.        
remote: Total 214 (delta 0), reused 0 (delta 0), pack-reused 209        
Receiving objects: 100% (214/214), 57.27 KiB | 335.00 KiB/s, done.
Resolving deltas: 100% (92/92), done.
Already on 'master'
Your branch is up to date with 'origin/master'.
[ℹ]  processing template files in repository
[ℹ]  writing new manifests to "/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/my-gitops-repo-547778386/base"
[master b7070d5] Add app-dev quickstart components
 27 files changed, 1380 insertions(+)
 create mode 100644 base/LICENSE
 create mode 100644 base/
 create mode 100644 base/amazon-cloudwatch/cloudwatch-agent-configmap.yaml
 create mode 100644 base/amazon-cloudwatch/cloudwatch-agent-daemonset.yaml
 create mode 100644 base/amazon-cloudwatch/cloudwatch-agent-rbac.yaml
 create mode 100644 base/amazon-cloudwatch/fluentd-configmap-cluster-info.yaml
 create mode 100644 base/amazon-cloudwatch/fluentd-configmap-fluentd-config.yaml
 create mode 100644 base/amazon-cloudwatch/fluentd-daemonset.yaml
 create mode 100644 base/amazon-cloudwatch/fluentd-rbac.yaml
 create mode 100644 base/demo/helm-release.yaml
 create mode 100644 base/kube-system/alb-ingress-controller-deployment.yaml
 create mode 100644 base/kube-system/alb-ingress-controller-rbac.yaml
 create mode 100644 base/kube-system/cluster-autoscaler-deployment.yaml
 create mode 100644 base/kube-system/cluster-autoscaler-rbac.yaml
 create mode 100644 base/kubernetes-dashboard/dashboard-metrics-scraper-deployment.yaml
 create mode 100644 base/kubernetes-dashboard/dashboard-metrics-scraper-service.yaml
 create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-configmap.yaml
 create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-deployment.yaml
 create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-rbac.yaml
 create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-secrets.yaml
 create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-service.yaml
 create mode 100644 base/monitoring/metrics-server.yaml
 create mode 100644 base/monitoring/prometheus-operator.yaml
 create mode 100644 base/namespaces/amazon-cloudwatch.yaml
 create mode 100644 base/namespaces/demo.yaml
 create mode 100644 base/namespaces/kubernetes-dashboard.yaml
 create mode 100644 base/namespaces/monitoring.yaml
Enumerating objects: 37, done.
Counting objects: 100% (37/37), done.
Delta compression using up to 8 threads
Compressing objects: 100% (28/28), done.
Writing objects: 100% (36/36), 13.54 KiB | 13.54 MiB/s, done.
Total 36 (delta 7), reused 27 (delta 7)
remote: Resolving deltas: 100% (7/7), done.        
   15b0aad..b7070d5  master -> master

$ kubectl get po --all-namespaces
NAMESPACE              NAME                                                      READY   STATUS    RESTARTS   AGE
amazon-cloudwatch      cloudwatch-agent-h9wr7                                    1/1     Running   0          15m
amazon-cloudwatch      fluentd-cloudwatch-8r5f6                                  1/1     Running   0          15m
demo                   podinfo-67b7886b6c-bvdtm                                  1/1     Running   0          15m
flux                   flux-7696dbc4cd-sjbv7                                     1/1     Running   0          36m
flux                   flux-helm-operator-8687676b89-qw7kq                       1/1     Running   0          36m
flux                   memcached-5dcd7579-7bn6l                                  1/1     Running   0          36m
flux                   tiller-deploy-69547b56b4-p6zxd                            1/1     Running   0          36m
kube-system            alb-ingress-controller-8df75bc98-gssb9                    1/1     Running   0          15m
kube-system            aws-node-f8g7z                                            1/1     Running   0          39m
kube-system            cluster-autoscaler-86d68b66cb-b9xqv                       1/1     Running   0          15m
kube-system            coredns-699bb99bf8-gptx4                                  1/1     Running   0          46m
kube-system            coredns-699bb99bf8-smzch                                  1/1     Running   0          46m
kube-system            kube-proxy-28xqt                                          1/1     Running   0          39m
kubernetes-dashboard   dashboard-metrics-scraper-65785bfbc-s8tq6                 1/1     Running   0          15m
kubernetes-dashboard   kubernetes-dashboard-76b969b44b-rwgk5                     1/1     Running   0          15m
monitoring             alertmanager-prometheus-operator-alertmanager-0           2/2     Running   0          14m
monitoring             metrics-server-5df4599bd7-cgh79                           1/1     Running   0          15m
monitoring             prometheus-operator-grafana-dd95fb7d4-n9ddh               2/2     Running   0          15m
monitoring             prometheus-operator-kube-state-metrics-5d7558d7cc-h8xgg   1/1     Running   0          15m
monitoring             prometheus-operator-operator-67895dd7c5-nqj7w             1/1     Running   0          15m
monitoring             prometheus-operator-prometheus-node-exporter-qp8gp        1/1     Running   0          15m
monitoring             prometheus-prometheus-operator-prometheus-0               3/3     Running   1          14m
If, however, you do NOT have the IAM roles in place, then the cluster-autoscaler will go into CrashLoopBackOff. The steps below reproduce the issue; I have also run them, to double-check things and make sure I can actually reproduce it.
$ eksctl create cluster --name mc-1237-testing
[ℹ]  eksctl version 0.11.1
[ℹ]  using region ap-northeast-1
[ℹ]  setting availability zones to [ap-northeast-1d ap-northeast-1a ap-northeast-1c]
[ℹ]  subnets for ap-northeast-1d - public: private:
[ℹ]  subnets for ap-northeast-1a - public: private:
[ℹ]  subnets for ap-northeast-1c - public: private:
[ℹ]  nodegroup "ng-7bfc0f1f" will use "ami-02e124a380df41614" [AmazonLinux2/1.14]
[ℹ]  using Kubernetes version 1.14
[ℹ]  creating EKS cluster "mc-1237-testing" in "ap-northeast-1" region with un-managed nodes
[ℹ]  will create 2 separate CloudFormation stacks for cluster itself and the initial nodegroup
[ℹ]  if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=ap-northeast-1 --cluster=mc-1237-testing'
[ℹ]  CloudWatch logging will not be enabled for cluster "mc-1237-testing" in "ap-northeast-1"
[ℹ]  you can enable it with 'eksctl utils update-cluster-logging --region=ap-northeast-1 --cluster=mc-1237-testing'
[ℹ]  Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "mc-1237-testing" in "ap-northeast-1"
[ℹ]  2 sequential tasks: { create cluster control plane "mc-1237-testing", create nodegroup "ng-7bfc0f1f" }
[ℹ]  building cluster stack "eksctl-mc-1237-testing-cluster"
[ℹ]  deploying stack "eksctl-mc-1237-testing-cluster"
[ℹ]  building nodegroup stack "eksctl-mc-1237-testing-nodegroup-ng-7bfc0f1f"
[ℹ]  --nodes-min=2 was set automatically for nodegroup ng-7bfc0f1f
[ℹ]  --nodes-max=2 was set automatically for nodegroup ng-7bfc0f1f
[ℹ]  deploying stack "eksctl-mc-1237-testing-nodegroup-ng-7bfc0f1f"
[✔]  all EKS cluster resources for "mc-1237-testing" have been created
[✔]  saved kubeconfig as "${HOME}/.kube/config"
[ℹ]  adding identity "arn:aws:iam::083751696308:role/eksctl-mc-1237-testing-nodegroup-NodeInstanceRole-KGOKLPVNIK10" to auth ConfigMap
[ℹ]  nodegroup "ng-7bfc0f1f" has 0 node(s)
[ℹ]  waiting for at least 2 node(s) to become ready in "ng-7bfc0f1f"
[ℹ]  nodegroup "ng-7bfc0f1f" has 2 node(s)
[ℹ]  node "ip-192-168-2-23.ap-northeast-1.compute.internal" is ready
[ℹ]  node "ip-192-168-48-84.ap-northeast-1.compute.internal" is ready
[ℹ]  kubectl command should work with "${HOME}/.kube/config", try 'kubectl get nodes'
[✔]  EKS cluster "mc-1237-testing" in "ap-northeast-1" region is ready

$ EKSCTL_EXPERIMENTAL=true eksctl enable repo \
>    --cluster mc-1237-testing \
>    --region ap-northeast-1 \
>    --git-email \
>    --git-url

[ℹ]  Generating public key infrastructure for the Helm Operator and Tiller
[ℹ]    this may take up to a minute, please be patient
[!]  Public key infrastructure files were written into directory "/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/eksctl-helm-pki563648596"
[!]  please move the files into a safe place or delete them
[ℹ]  Generating manifests
[ℹ]  Cloning
Cloning into '/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/eksctl-install-flux-clone-026154915'...
remote: Enumerating objects: 43, done.        
remote: Counting objects: 100% (43/43), done.        
remote: Compressing objects: 100% (40/40), done.        
remote: Total 431 (delta 9), reused 35 (delta 3), pack-reused 388        
Receiving objects: 100% (431/431), 177.90 KiB | 497.00 KiB/s, done.
Resolving deltas: 100% (155/155), done.
Already on 'master'
Your branch is up to date with 'origin/master'.
[ℹ]  Writing Flux manifests
[ℹ]  created "Namespace/flux"
[ℹ]  Applying Helm TLS Secret(s)
[ℹ]  created "flux:Secret/flux-helm-tls-cert"
[ℹ]  created "flux:Secret/tiller-secret"
[!]  Note: certificate secrets aren't added to the Git repository for security reasons
[ℹ]  Applying manifests
[ℹ]  created "flux:ServiceAccount/flux"
[ℹ]  created ""
[ℹ]  created ""
[ℹ]  created ""
[ℹ]  created "flux:Service/memcached"
[ℹ]  created "flux:ServiceAccount/tiller"
[ℹ]  created ""
[ℹ]  created "flux:ServiceAccount/helm"
[ℹ]  created ""
[ℹ]  created ""
[ℹ]  created "flux:Deployment.extensions/tiller-deploy"
[ℹ]  created "flux:Deployment.apps/flux"
[ℹ]  created "flux:ConfigMap/flux-helm-tls-ca-config"
[ℹ]  created "flux:Deployment.apps/flux-helm-operator"
[ℹ]  created "flux:Deployment.apps/memcached"
[ℹ]  created "flux:Secret/flux-git-deploy"
[ℹ]  created "flux:ServiceAccount/flux-helm-operator"
[ℹ]  created ""
[ℹ]  created ""
[ℹ]  created "flux:Service/tiller-deploy"
[ℹ]  Waiting for Helm Operator to start
[ℹ]  Helm Operator started successfully
[ℹ]  see for details on how to use the Helm Operator
[ℹ]  Waiting for Flux to start
[ℹ]  Flux started successfully
[ℹ]  see for details on how to use Flux
[ℹ]  Committing and pushing manifests to
[master f8e0c52] Add Initial Flux configuration
 13 files changed, 803 insertions(+)
 create mode 100644 flux/flux-account.yaml
 create mode 100644 flux/flux-deployment.yaml
 create mode 100644 flux/flux-helm-operator-account.yaml
 create mode 100644 flux/flux-helm-release-crd.yaml
 create mode 100644 flux/flux-namespace.yaml
 create mode 100644 flux/flux-secret.yaml
 create mode 100644 flux/helm-operator-deployment.yaml
 create mode 100644 flux/memcache-dep.yaml
 create mode 100644 flux/memcache-svc.yaml
 create mode 100644 flux/tiller-ca-cert-configmap.yaml
 create mode 100644 flux/tiller-dep.yaml
 create mode 100644 flux/tiller-rbac.yaml
 create mode 100644 flux/tiller-svc.yaml
Enumerating objects: 17, done.
Counting objects: 100% (17/17), done.
Delta compression using up to 8 threads
Compressing objects: 100% (15/15), done.
Writing objects: 100% (16/16), 9.33 KiB | 9.33 MiB/s, done.
Total 16 (delta 1), reused 12 (delta 1)
remote: Resolving deltas: 100% (1/1), done.        
   4b9a79d..f8e0c52  master -> master
[ℹ]  Flux will only operate properly once it has write-access to the Git repository
[ℹ]  please configure so that the following Flux SSH public key has write access to it
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCxoYrh1xqsHGQuJZnsY2hiOyplanBS/wmLQaxyPu2eMexmG1uy4Vq+e1qHQ6ukTlPSV92N2diz7Mml/VnfMIu6/S6WpcEa8s8cX+4X2w4DN5VGcOdMbRa76Td6me1Kp7X4BvQSpmtfj380+7dY+yxywTVf97ZFYq1atitxvjgVHIUCDLAXxqmM2t7OnH5nYEJFS+32BRmENMpzEfB+31PiOAgsUHENA4BCr0sbxDpKt3j4hzJbntgYQVyhaNLBH8S34Ogz1V0i8H5iplJ6YjsNXpeUhmRYFH4rKOTi0EJv7wEWMEH1gttQvLxhHAd6s4qDMB27aQSJFMh55/DW/r6Z

### Then added the above SSH key to

$ kubectl get po --all-namespaces
NAMESPACE     NAME                                  READY   STATUS    RESTARTS   AGE
flux          flux-7696dbc4cd-4h927                 1/1     Running   0          69s
flux          flux-helm-operator-8687676b89-hskbj   1/1     Running   0          68s
flux          memcached-5dcd7579-tpkvd              1/1     Running   0          69s
flux          tiller-deploy-69547b56b4-sp9md        1/1     Running   0          69s
kube-system   aws-node-97px5                        1/1     Running   0          7m5s
kube-system   aws-node-kxbzd                        1/1     Running   0          7m5s
kube-system   coredns-699bb99bf8-sn7ws              1/1     Running   0          13m
kube-system   coredns-699bb99bf8-zx26g              1/1     Running   0          13m
kube-system   kube-proxy-t2rvs                      1/1     Running   0          7m5s
kube-system   kube-proxy-tkncf                      1/1     Running   0          7m5s

$ EKSCTL_EXPERIMENTAL=true eksctl enable profile app-dev \
>    --cluster mc-1237-testing \
>    --region ap-northeast-1 \
>    --git-email \
>    --git-url

Cloning into '/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/my-gitops-repo-130038557'...
remote: Enumerating objects: 47, done.        
remote: Counting objects: 100% (47/47), done.        
remote: Compressing objects: 100% (44/44), done.        
remote: Total 435 (delta 10), reused 38 (delta 3), pack-reused 388        
Receiving objects: 100% (435/435), 179.62 KiB | 494.00 KiB/s, done.
Resolving deltas: 100% (156/156), done.
Already on 'master'
Your branch is up to date with 'origin/master'.
[ℹ]  cloning repository "":master
Cloning into '/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/quickstart-019213272'...
remote: Enumerating objects: 5, done.        
remote: Counting objects: 100% (5/5), done.        
remote: Compressing objects: 100% (4/4), done.        
remote: Total 214 (delta 0), reused 0 (delta 0), pack-reused 209        
Receiving objects: 100% (214/214), 57.27 KiB | 322.00 KiB/s, done.
Resolving deltas: 100% (92/92), done.
Already on 'master'
Your branch is up to date with 'origin/master'.
[ℹ]  processing template files in repository
[ℹ]  writing new manifests to "/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/my-gitops-repo-130038557/base"
[master 5e6bcf5] Add app-dev quickstart components
 27 files changed, 1380 insertions(+)
 create mode 100644 base/LICENSE
 create mode 100644 base/
 create mode 100644 base/amazon-cloudwatch/cloudwatch-agent-configmap.yaml
 create mode 100644 base/amazon-cloudwatch/cloudwatch-agent-daemonset.yaml
 create mode 100644 base/amazon-cloudwatch/cloudwatch-agent-rbac.yaml
 create mode 100644 base/amazon-cloudwatch/fluentd-configmap-cluster-info.yaml
 create mode 100644 base/amazon-cloudwatch/fluentd-configmap-fluentd-config.yaml
 create mode 100644 base/amazon-cloudwatch/fluentd-daemonset.yaml
 create mode 100644 base/amazon-cloudwatch/fluentd-rbac.yaml
 create mode 100644 base/demo/helm-release.yaml
 create mode 100644 base/kube-system/alb-ingress-controller-deployment.yaml
 create mode 100644 base/kube-system/alb-ingress-controller-rbac.yaml
 create mode 100644 base/kube-system/cluster-autoscaler-deployment.yaml
 create mode 100644 base/kube-system/cluster-autoscaler-rbac.yaml
 create mode 100644 base/kubernetes-dashboard/dashboard-metrics-scraper-deployment.yaml
 create mode 100644 base/kubernetes-dashboard/dashboard-metrics-scraper-service.yaml
 create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-configmap.yaml
 create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-deployment.yaml
 create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-rbac.yaml
 create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-secrets.yaml
 create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-service.yaml
 create mode 100644 base/monitoring/metrics-server.yaml
 create mode 100644 base/monitoring/prometheus-operator.yaml
 create mode 100644 base/namespaces/amazon-cloudwatch.yaml
 create mode 100644 base/namespaces/demo.yaml
 create mode 100644 base/namespaces/kubernetes-dashboard.yaml
 create mode 100644 base/namespaces/monitoring.yaml
Enumerating objects: 37, done.
Counting objects: 100% (37/37), done.
Delta compression using up to 8 threads
Compressing objects: 100% (28/28), done.
Writing objects: 100% (36/36), 13.52 KiB | 13.52 MiB/s, done.
Total 36 (delta 7), reused 25 (delta 7)
remote: Resolving deltas: 100% (7/7), done.        
   f8e0c52..5e6bcf5  master -> master

$ kubectl get po --all-namespaces
NAMESPACE              NAME                                                      READY   STATUS             RESTARTS   AGE
amazon-cloudwatch      cloudwatch-agent-6km5b                                    1/1     Running            0          109m
amazon-cloudwatch      cloudwatch-agent-kcpb9                                    1/1     Running            0          109m
amazon-cloudwatch      fluentd-cloudwatch-8wxxn                                  1/1     Running            0          109m
amazon-cloudwatch      fluentd-cloudwatch-nst52                                  1/1     Running            0          109m
demo                   podinfo-67b7886b6c-pjws4                                  1/1     Running            0          109m
flux                   flux-7696dbc4cd-4h927                                     1/1     Running            0          116m
flux                   flux-helm-operator-8687676b89-hskbj                       1/1     Running            0          115m
flux                   memcached-5dcd7579-tpkvd                                  1/1     Running            0          116m
flux                   tiller-deploy-69547b56b4-sp9md                            1/1     Running            0          116m
kube-system            alb-ingress-controller-776b5b58c9-bbt7t                   1/1     Running            0          109m
kube-system            aws-node-97px5                                            1/1     Running            0          121m
kube-system            aws-node-kxbzd                                            1/1     Running            0          121m
kube-system            cluster-autoscaler-55d556f787-rm7cc                       0/1     CrashLoopBackOff   25         109m
kube-system            coredns-699bb99bf8-sn7ws                                  1/1     Running            0          128m
kube-system            coredns-699bb99bf8-zx26g                                  1/1     Running            0          128m
kube-system            kube-proxy-t2rvs                                          1/1     Running            0          121m
kube-system            kube-proxy-tkncf                                          1/1     Running            0          121m
kubernetes-dashboard   dashboard-metrics-scraper-65785bfbc-52952                 1/1     Running            0          109m
kubernetes-dashboard   kubernetes-dashboard-76b969b44b-hf9kd                     1/1     Running            0          109m
monitoring             alertmanager-prometheus-operator-alertmanager-0           2/2     Running            0          108m
monitoring             metrics-server-5df4599bd7-l5b8q                           1/1     Running            0          109m
monitoring             prometheus-operator-grafana-dd95fb7d4-gzqxn               2/2     Running            0          109m
monitoring             prometheus-operator-kube-state-metrics-5d7558d7cc-qx4tl   1/1     Running            0          109m
monitoring             prometheus-operator-operator-67895dd7c5-nhbbv             1/1     Running            0          109m
monitoring             prometheus-operator-prometheus-node-exporter-77nb6        1/1     Running            0          109m
monitoring             prometheus-operator-prometheus-node-exporter-hfdv9        1/1     Running            0          109m
monitoring             prometheus-prometheus-operator-prometheus-0               3/3     Running            1          108m


ilanpillemer commented Dec 10, 2019 via email


If you follow the instructions word for word it fails.

Which instructions were you following exactly @ilanpillemer? (Could you please share a link to them to ensure we are on the same page, and/or so that we know if we need to update/correct anything published elsewhere? 🙇 )

If you are referring to something other than this, do you have any suggestions for making these instructions clearer?

Note that the prerequisites for the app-dev profile are documented here:, but any suggestions on how to improve this and make it more obvious are always welcome! ✨

You need to add the roles with the necessary config when creating the cluster.

Yes, this is what the first two commands in what I shared here were meant to show, i.e.:

  1. Use a ClusterConfig with the appropriate roles, e.g. examples/eks-quickstart-app-dev.yaml:

    $ git diff
    diff --git a/examples/eks-quickstart-app-dev.yaml b/examples/eks-quickstart-app-dev.yaml

    Indeed, this file defines the following IAM roles:

  2. Create the cluster by passing a reference to this file.

    $ eksctl create cluster -f examples/eks-quickstart-app-dev.yaml 
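For anyone hitting the same CrashLoopBackOff, here is a minimal sketch of such a ClusterConfig. It assumes the `eksctl.io/v1alpha5` schema and reuses the cluster name and region from the session above; the node group name, instance type and sizes are hypothetical placeholders, and the actual contents of examples/eks-quickstart-app-dev.yaml may differ. The part that matters is `iam.withAddonPolicies`, which attaches to the worker nodes the IAM policies that cluster-autoscaler, the ALB ingress controller and the CloudWatch agents presumably need in order to call the AWS APIs.

```yaml
# Sketch only: node group name, instance type and sizes are placeholders.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: mc-1237-testing        # cluster name from the session above
  region: ap-northeast-1

nodeGroups:
  - name: ng-1                 # hypothetical node group name
    instanceType: m5.large     # placeholder instance type
    desiredCapacity: 2
    minSize: 2
    maxSize: 4                 # headroom for cluster-autoscaler to scale into
    iam:
      withAddonPolicies:
        autoScaler: true       # lets cluster-autoscaler describe and resize the Auto Scaling group
        albIngress: true       # IAM permissions for alb-ingress-controller
        cloudWatch: true       # lets the CloudWatch agent and fluentd publish metrics and logs
```

Saved to a file, it is passed to `eksctl create cluster -f` exactly as in step 2; the `eksctl enable profile app-dev` step then stays the same.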


Yes. In hindsight, it now seems completely obvious what I had to do. I think a very minor tweak would help. I used the GitOps quick start guide at When I look at it now, it says some variant of the command should be used. Perhaps add a few more words, for example: if you need the cluster autoscaler or the ALB ingress controller, use the necessary switches described in the documentation. Or something similar. Great work with eksctl and Flux; they are game-changing.
