
gitops apply fails to get cluster autoscaler working #1237

Closed
gemagomez opened this issue Aug 30, 2019 · 13 comments
@gemagomez

gemagomez commented Aug 30, 2019

What happened?
I deployed with eksctl gitops apply, and after the deployment finished and I added Flux's SSH key to my gitops repo, the cluster autoscaler doesn't start:

My cluster looks as follows:

% kubectl get pods --all-namespaces                                      
NAMESPACE              NAME                                                      READY   STATUS             RESTARTS   AGE
amazon-cloudwatch      cloudwatch-agent-4dhc2                                    1/1     Running            0          56m
amazon-cloudwatch      cloudwatch-agent-dkvqr                                    1/1     Running            0          56m
amazon-cloudwatch      fluentd-cloudwatch-qgg2c                                  1/1     Running            0          56m
amazon-cloudwatch      fluentd-cloudwatch-vzgbj                                  1/1     Running            0          56m
demo                   podinfo-75b8547f78-lxgns                                  1/1     Running            0          55m
flux                   flux-bd67dd99c-vjvj8                                      1/1     Running            0          64m
flux                   flux-helm-operator-6bc7c85bb5-tdfld                       1/1     Running            0          64m
flux                   memcached-958f745c-7dsp2                                  1/1     Running            0          64m
flux                   tiller-deploy-7ccc4b4d45-rf7kn                            1/1     Running            0          64m
kube-system            alb-ingress-controller-69f845f8f9-49q59                   1/1     Running            0          56m
kube-system            aws-node-87v4z                                            1/1     Running            0          2d3h
kube-system            aws-node-rtjq5                                            1/1     Running            0          2d3h
kube-system            cluster-autoscaler-5d74cbcb5-sdwm6                        0/1     CrashLoopBackOff   15         56m
kube-system            coredns-79d667b89f-lwht7                                  1/1     Running            0          2d3h
kube-system            coredns-79d667b89f-lxtjf                                  1/1     Running            0          2d3h
kube-system            kube-proxy-4fpgz                                          1/1     Running            0          2d3h
kube-system            kube-proxy-r9d9k                                          1/1     Running            0          2d3h
kubernetes-dashboard   dashboard-metrics-scraper-f7b5dbf7d-mnpv6                 1/1     Running            0          56m
kubernetes-dashboard   kubernetes-dashboard-7447f48f55-v9h7r                     1/1     Running            0          56m
monitoring             alertmanager-prometheus-operator-alertmanager-0           2/2     Running            0          54m
monitoring             metrics-server-7dfc675884-7tnmm                           1/1     Running            0          56m
monitoring             prometheus-operator-grafana-9bb769cf-7dhkn                2/2     Running            0          55m
monitoring             prometheus-operator-kube-state-metrics-79f476bff6-8kzp5   1/1     Running            0          55m
monitoring             prometheus-operator-operator-58fcb66576-cvvph             1/1     Running            0          55m
monitoring             prometheus-operator-prometheus-node-exporter-lpbqf        1/1     Running            0          55m
monitoring             prometheus-operator-prometheus-node-exporter-rvnns        1/1     Running            0          55m
monitoring             prometheus-prometheus-operator-prometheus-0               3/3     Running            1          54m

The error in the cluster-autoscaler container's logs is:

75-NodeInstanceRole-13RPA3JTIZZ49/i-0528c95b7967742fa is not authorized to perform: autoscaling:DescribeTags
        status code: 403, request id: d6532115-cb13-11e9-ab64-bd252c647884
F0830 10:49:27.225451       1 cloud_provider_builder.go:149] Failed to create AWS Manager: cannot autodiscover ASGs: AccessDenied: User: arn:aws:sts::376248598259:assumed-role/eksctl-wonderful-paint
ing-1566975-NodeInstanceRole-13RPA3JTIZZ49/i-0528c95b7967742fa is not authorized to perform: autoscaling:DescribeTags
        status code: 403, request id: d6532115-cb13-11e9-ab64-bd252c647884
@gemagomez gemagomez added this to the 0.5.0 milestone Aug 30, 2019
@errordeveloper
Contributor

I'd say this should be an issue in the profile repo.

@2opremio
Contributor

Yep, I agree with @errordeveloper

@marccarre
Contributor

marccarre commented Sep 2, 2019

FWIW, it is a matter of having the right IAM policies in place when creating the cluster, namely:

 nodeGroups:
   - name: ng-1
     instanceType: m5.large
-    desiredCapacity: 1
+    minSize: 1
+    maxSize: 2
+    iam:
+      withAddonPolicies:
+        albIngress: true
+        autoScaler: true
+        cloudWatch: true
 
 cloudWatch:
     clusterLogging:
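
For reference, a complete ClusterConfig combining these settings would look roughly as follows. This is a minimal sketch: the cluster name, region, and logging types are placeholders rather than values taken from the quickstart profile.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: my-cluster        # placeholder
  region: eu-north-1      # placeholder

nodeGroups:
  - name: ng-1
    instanceType: m5.large
    minSize: 1
    maxSize: 2
    iam:
      withAddonPolicies:
        albIngress: true   # IAM permissions for the ALB ingress controller
        autoScaler: true   # IAM permissions for the cluster autoscaler (covers autoscaling:DescribeTags)
        cloudWatch: true   # IAM permissions for the CloudWatch agents

cloudWatch:
  clusterLogging:
    enableTypes: ["all"]   # placeholder: pick whichever control-plane log types you need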

How do we want to proceed here?

  1. Is there a way to edit the cluster post-creation?
  2. Do we want to simply document this in the profile repo? (A lot of users will likely miss this.)

@gemagomez
Author

We should add this to the quickstart guide for now. We'll fix it as part of a different issue later on.

@marccarre
Contributor

I'll document this in the quickstart profile's repository, but I thought I'd also provide an example ClusterConfig manifest which users can use out-of-the-box. See also: #1249.

@errordeveloper errordeveloper modified the milestones: 0.5.0, 0.6.0 Sep 3, 2019
@errordeveloper
Contributor

@marccarre should this issue be closed actually?

@marccarre
Contributor

Not yet, only once we've merged weaveworks/eks-quickstart-app-dev#22

@marccarre
Contributor

marccarre commented Sep 4, 2019

Fixed by weaveworks/eks-quickstart-app-dev#22

@ilanpillemer

This happened to me today when applying the app-dev profile while creating a new cluster with gitops. How do I fix it?

@marccarre
Contributor

@ilanpillemer, did you have the required IAM roles in place in your cluster?

If you have them, then this should work fine. See also the steps in the collapsible section below. (I just re-ran these myself to be sure this still works as expected. It does.)
$ git diff
diff --git a/examples/eks-quickstart-app-dev.yaml b/examples/eks-quickstart-app-dev.yaml
index 487cb46b..5783c605 100644
--- a/examples/eks-quickstart-app-dev.yaml
+++ b/examples/eks-quickstart-app-dev.yaml
@@ -5,8 +5,8 @@ apiVersion: eksctl.io/v1alpha5
 kind: ClusterConfig
 
 metadata:
-  name: cluster-12
-  region: eu-north-1
+  name: mc-1237-testing-with-iam
+  region: ap-northeast-1
 
 nodeGroups:
   - name: ng-1


$ eksctl create cluster -f examples/eks-quickstart-app-dev.yaml 
[ℹ]  eksctl version 0.11.1
[ℹ]  using region ap-northeast-1
[ℹ]  setting availability zones to [ap-northeast-1c ap-northeast-1d ap-northeast-1a]
[ℹ]  subnets for ap-northeast-1c - public:192.168.0.0/19 private:192.168.96.0/19
[ℹ]  subnets for ap-northeast-1d - public:192.168.32.0/19 private:192.168.128.0/19
[ℹ]  subnets for ap-northeast-1a - public:192.168.64.0/19 private:192.168.160.0/19
[ℹ]  nodegroup "ng-1" will use "ami-02e124a380df41614" [AmazonLinux2/1.14]
[ℹ]  using Kubernetes version 1.14
[ℹ]  creating EKS cluster "mc-1237-testing-with-iam" in "ap-northeast-1" region with un-managed nodes
[ℹ]  1 nodegroup (ng-1) was included (based on the include/exclude rules)
[ℹ]  will create a CloudFormation stack for cluster itself and 1 nodegroup stack(s)
[ℹ]  will create a CloudFormation stack for cluster itself and 0 managed nodegroup stack(s)
[ℹ]  if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=ap-northeast-1 --cluster=mc-1237-testing-with-iam'
[ℹ]  CloudWatch logging will not be enabled for cluster "mc-1237-testing-with-iam" in "ap-northeast-1"
[ℹ]  you can enable it with 'eksctl utils update-cluster-logging --region=ap-northeast-1 --cluster=mc-1237-testing-with-iam'
[ℹ]  Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "mc-1237-testing-with-iam" in "ap-northeast-1"
[ℹ]  2 sequential tasks: { create cluster control plane "mc-1237-testing-with-iam", create nodegroup "ng-1" }
[ℹ]  building cluster stack "eksctl-mc-1237-testing-with-iam-cluster"
[ℹ]  deploying stack "eksctl-mc-1237-testing-with-iam-cluster"
[ℹ]  building nodegroup stack "eksctl-mc-1237-testing-with-iam-nodegroup-ng-1"
[ℹ]  deploying stack "eksctl-mc-1237-testing-with-iam-nodegroup-ng-1"
[✔]  all EKS cluster resources for "mc-1237-testing-with-iam" have been created
[✔]  saved kubeconfig as "${HOME}/.kube/config"
[ℹ]  adding identity "arn:aws:iam::083751696308:role/eksctl-mc-1237-testing-with-iam-n-NodeInstanceRole-1M7OF6KB2D8RV" to auth ConfigMap
[ℹ]  nodegroup "ng-1" has 0 node(s)
[ℹ]  waiting for at least 1 node(s) to become ready in "ng-1"
[ℹ]  nodegroup "ng-1" has 1 node(s)
[ℹ]  node "ip-192-168-13-77.ap-northeast-1.compute.internal" is ready
[ℹ]  kubectl command should work with "${HOME}/.kube/config", try 'kubectl get nodes'
[✔]  EKS cluster "mc-1237-testing-with-iam" in "ap-northeast-1" region is ready


$ EKSCTL_EXPERIMENTAL=true eksctl enable repo \
>     -f examples/eks-quickstart-app-dev.yaml \
>     --git-email carre.marc+flux@gmail.com \
>     --git-url git@github.com:marccarre/my-gitops-repo.git

[ℹ]  Generating public key infrastructure for the Helm Operator and Tiller
[ℹ]    this may take up to a minute, please be patient
[!]  Public key infrastructure files were written into directory "/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/eksctl-helm-pki431635447"
[!]  please move the files into a safe place or delete them
[ℹ]  Generating manifests
[ℹ]  Cloning git@github.com:marccarre/my-gitops-repo.git
Cloning into '/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/eksctl-install-flux-clone-956113642'...
remote: Enumerating objects: 59, done.        
remote: Counting objects: 100% (59/59), done.        
remote: Compressing objects: 100% (55/55), done.        
remote: Total 447 (delta 11), reused 50 (delta 3), pack-reused 388        
Receiving objects: 100% (447/447), 183.32 KiB | 514.00 KiB/s, done.
Resolving deltas: 100% (157/157), done.
Already on 'master'
Your branch is up to date with 'origin/master'.
[ℹ]  Writing Flux manifests
[ℹ]  created "Namespace/flux"
[ℹ]  Applying Helm TLS Secret(s)
[ℹ]  created "flux:Secret/flux-helm-tls-cert"
[ℹ]  created "flux:Secret/tiller-secret"
[!]  Note: certificate secrets aren't added to the Git repository for security reasons
[ℹ]  Applying manifests
[ℹ]  created "flux:Deployment.apps/flux"
[ℹ]  created "flux:ServiceAccount/flux-helm-operator"
[ℹ]  created "ClusterRole.rbac.authorization.k8s.io/flux-helm-operator"
[ℹ]  created "ClusterRoleBinding.rbac.authorization.k8s.io/flux-helm-operator"
[ℹ]  created "CustomResourceDefinition.apiextensions.k8s.io/helmreleases.helm.fluxcd.io"
[ℹ]  created "flux:Secret/flux-git-deploy"
[ℹ]  created "flux:Deployment.apps/memcached"
[ℹ]  created "flux:Deployment.apps/flux-helm-operator"
[ℹ]  created "flux:Deployment.extensions/tiller-deploy"
[ℹ]  created "flux:Service/tiller-deploy"
[ℹ]  created "flux:Service/memcached"
[ℹ]  created "flux:ServiceAccount/flux"
[ℹ]  created "ClusterRole.rbac.authorization.k8s.io/flux"
[ℹ]  created "ClusterRoleBinding.rbac.authorization.k8s.io/flux"
[ℹ]  created "flux:ConfigMap/flux-helm-tls-ca-config"
[ℹ]  created "flux:ServiceAccount/tiller"
[ℹ]  created "ClusterRoleBinding.rbac.authorization.k8s.io/tiller"
[ℹ]  created "flux:ServiceAccount/helm"
[ℹ]  created "flux:Role.rbac.authorization.k8s.io/tiller-user"
[ℹ]  created "kube-system:RoleBinding.rbac.authorization.k8s.io/tiller-user-binding"
[ℹ]  Waiting for Helm Operator to start
ERROR: logging before flag.Parse: E1210 18:44:24.787197    4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:24 socat[6735] E connect(5, AF=2 127.0.0.1:3030, 16): Connection refused
[!]  Helm Operator is not ready yet (Get http://127.0.0.1:50846/healthz: EOF), retrying ...
ERROR: logging before flag.Parse: E1210 18:44:26.816135    4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:26 socat[6814] E connect(5, AF=2 127.0.0.1:3030, 16): Connection refused
[!]  Helm Operator is not ready yet (Get http://127.0.0.1:50846/healthz: EOF), retrying ...
ERROR: logging before flag.Parse: E1210 18:44:28.844545    4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:28 socat[6870] E connect(5, AF=2 127.0.0.1:3030, 16): Connection refused
[!]  Helm Operator is not ready yet (Get http://127.0.0.1:50846/healthz: EOF), retrying ...
ERROR: logging before flag.Parse: E1210 18:44:30.877698    4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:30 socat[6967] E connect(5, AF=2 127.0.0.1:3030, 16): Connection refused
[!]  Helm Operator is not ready yet (Get http://127.0.0.1:50846/healthz: EOF), retrying ...
ERROR: logging before flag.Parse: E1210 18:44:32.914902    4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:32 socat[7082] E connect(5, AF=2 127.0.0.1:3030, 16): Connection refused
[!]  Helm Operator is not ready yet (Get http://127.0.0.1:50846/healthz: EOF), retrying ...
ERROR: logging before flag.Parse: E1210 18:44:34.944906    4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:34 socat[7084] E connect(5, AF=2 127.0.0.1:3030, 16): Connection refused
[!]  Helm Operator is not ready yet (Get http://127.0.0.1:50846/healthz: EOF), retrying ...
ERROR: logging before flag.Parse: E1210 18:44:36.971253    4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:36 socat[7085] E connect(5, AF=2 127.0.0.1:3030, 16): Connection refused
[!]  Helm Operator is not ready yet (Get http://127.0.0.1:50846/healthz: EOF), retrying ...
ERROR: logging before flag.Parse: E1210 18:44:38.998610    4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:39 socat[7090] E connect(5, AF=2 127.0.0.1:3030, 16): Connection refused
[!]  Helm Operator is not ready yet (Get http://127.0.0.1:50846/healthz: EOF), retrying ...
ERROR: logging before flag.Parse: E1210 18:44:41.023201    4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:41 socat[7093] E connect(5, AF=2 127.0.0.1:3030, 16): Connection refused
[!]  Helm Operator is not ready yet (Get http://127.0.0.1:50846/healthz: EOF), retrying ...
ERROR: logging before flag.Parse: E1210 18:44:43.053384    4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:43 socat[7113] E connect(5, AF=2 127.0.0.1:3030, 16): Connection refused
[!]  Helm Operator is not ready yet (Get http://127.0.0.1:50846/healthz: EOF), retrying ...
ERROR: logging before flag.Parse: E1210 18:44:45.084005    4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:45 socat[7115] E connect(5, AF=2 127.0.0.1:3030, 16): Connection refused
[!]  Helm Operator is not ready yet (Get http://127.0.0.1:50846/healthz: EOF), retrying ...
ERROR: logging before flag.Parse: E1210 18:44:47.115951    4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:47 socat[7116] E connect(5, AF=2 127.0.0.1:3030, 16): Connection refused
[!]  Helm Operator is not ready yet (Get http://127.0.0.1:50846/healthz: EOF), retrying ...
[ℹ]  Helm Operator started successfully
[ℹ]  see https://docs.fluxcd.io/projects/helm-operator for details on how to use the Helm Operator
[ℹ]  Waiting for Flux to start
[ℹ]  Flux started successfully
[ℹ]  see https://docs.fluxcd.io/projects/flux for details on how to use Flux
[ℹ]  Committing and pushing manifests to git@github.com:marccarre/my-gitops-repo.git
[master 15b0aad] Add Initial Flux configuration
 13 files changed, 803 insertions(+)
 create mode 100644 flux/flux-account.yaml
 create mode 100644 flux/flux-deployment.yaml
 create mode 100644 flux/flux-helm-operator-account.yaml
 create mode 100644 flux/flux-helm-release-crd.yaml
 create mode 100644 flux/flux-namespace.yaml
 create mode 100644 flux/flux-secret.yaml
 create mode 100644 flux/helm-operator-deployment.yaml
 create mode 100644 flux/memcache-dep.yaml
 create mode 100644 flux/memcache-svc.yaml
 create mode 100644 flux/tiller-ca-cert-configmap.yaml
 create mode 100644 flux/tiller-dep.yaml
 create mode 100644 flux/tiller-rbac.yaml
 create mode 100644 flux/tiller-svc.yaml
Enumerating objects: 17, done.
Counting objects: 100% (17/17), done.
Delta compression using up to 8 threads
Compressing objects: 100% (15/15), done.
Writing objects: 100% (16/16), 9.33 KiB | 9.33 MiB/s, done.
Total 16 (delta 1), reused 12 (delta 1)
remote: Resolving deltas: 100% (1/1), done.        
To github.com:marccarre/my-gitops-repo.git
   e54ab6f..15b0aad  master -> master
[ℹ]  Flux will only operate properly once it has write-access to the Git repository
[ℹ]  please configure git@github.com:marccarre/my-gitops-repo.git so that the following Flux SSH public key has write access to it
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDFgi4LH0m5lCSUf/qmBTTZIz3MASZOQMepyDUYxtmAycwC0158op7ykTvHgmAqfXMxS90LzDQ4qPUxWKgExfjnWv3u7gWJBhDJhhDyLEodJLO6/IljgC1rUPTj5QJ1AwcPM7cvoB5sIBVq1iU6Jmf0Hp/BL2QEiLdiBdpA4HkPGKOMvzB+nNiLg4iJbCdAKAefHJWqWvf2k+PPTkVgpQ9ujcyQ+KHczY8Aj4HPu9he8C8S9Sqj2Vxq/qKZVbAuxllINy/WXlCB9SdbPx1b66g9Hiw6meoXiYJPaLft78SVXLQBx7l1anDabmcRnNHSChwMY8AAVFBssm537DyAHuG5


### Then added the above SSH key to https://github.com/marccarre/my-gitops-repo/deploy_keys


$ kubectl get po --all-namespaces
NAMESPACE     NAME                                  READY   STATUS    RESTARTS   AGE
flux          flux-7696dbc4cd-sjbv7                 1/1     Running   0          17m
flux          flux-helm-operator-8687676b89-qw7kq   1/1     Running   0          17m
flux          memcached-5dcd7579-7bn6l              1/1     Running   0          17m
flux          tiller-deploy-69547b56b4-p6zxd        1/1     Running   0          17m
kube-system   aws-node-f8g7z                        1/1     Running   0          20m
kube-system   coredns-699bb99bf8-gptx4              1/1     Running   0          27m
kube-system   coredns-699bb99bf8-smzch              1/1     Running   0          27m
kube-system   kube-proxy-28xqt                      1/1     Running   0          20m


$ EKSCTL_EXPERIMENTAL=true eksctl enable profile app-dev \
>     -f examples/eks-quickstart-app-dev.yaml \
>     --git-email carre.marc+flux@gmail.com \
>     --git-url git@github.com:marccarre/my-gitops-repo.git
Cloning into '/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/my-gitops-repo-547778386'...
remote: Enumerating objects: 63, done.        
remote: Counting objects: 100% (63/63), done.        
remote: Compressing objects: 100% (59/59), done.        
remote: Total 451 (delta 12), reused 53 (delta 3), pack-reused 388        
Receiving objects: 100% (451/451), 185.04 KiB | 104.00 KiB/s, done.
Resolving deltas: 100% (158/158), done.
Already on 'master'
Your branch is up to date with 'origin/master'.
[ℹ]  cloning repository "https://github.com/weaveworks/eks-quickstart-app-dev":master
Cloning into '/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/quickstart-008692361'...
remote: Enumerating objects: 5, done.        
remote: Counting objects: 100% (5/5), done.        
remote: Compressing objects: 100% (4/4), done.        
remote: Total 214 (delta 0), reused 0 (delta 0), pack-reused 209        
Receiving objects: 100% (214/214), 57.27 KiB | 335.00 KiB/s, done.
Resolving deltas: 100% (92/92), done.
Already on 'master'
Your branch is up to date with 'origin/master'.
[ℹ]  processing template files in repository
[ℹ]  writing new manifests to "/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/my-gitops-repo-547778386/base"
[master b7070d5] Add app-dev quickstart components
 27 files changed, 1380 insertions(+)
 create mode 100644 base/LICENSE
 create mode 100644 base/README.md
 create mode 100644 base/amazon-cloudwatch/cloudwatch-agent-configmap.yaml
 create mode 100644 base/amazon-cloudwatch/cloudwatch-agent-daemonset.yaml
 create mode 100644 base/amazon-cloudwatch/cloudwatch-agent-rbac.yaml
 create mode 100644 base/amazon-cloudwatch/fluentd-configmap-cluster-info.yaml
 create mode 100644 base/amazon-cloudwatch/fluentd-configmap-fluentd-config.yaml
 create mode 100644 base/amazon-cloudwatch/fluentd-daemonset.yaml
 create mode 100644 base/amazon-cloudwatch/fluentd-rbac.yaml
 create mode 100644 base/demo/helm-release.yaml
 create mode 100644 base/kube-system/alb-ingress-controller-deployment.yaml
 create mode 100644 base/kube-system/alb-ingress-controller-rbac.yaml
 create mode 100644 base/kube-system/cluster-autoscaler-deployment.yaml
 create mode 100644 base/kube-system/cluster-autoscaler-rbac.yaml
 create mode 100644 base/kubernetes-dashboard/dashboard-metrics-scraper-deployment.yaml
 create mode 100644 base/kubernetes-dashboard/dashboard-metrics-scraper-service.yaml
 create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-configmap.yaml
 create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-deployment.yaml
 create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-rbac.yaml
 create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-secrets.yaml
 create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-service.yaml
 create mode 100644 base/monitoring/metrics-server.yaml
 create mode 100644 base/monitoring/prometheus-operator.yaml
 create mode 100644 base/namespaces/amazon-cloudwatch.yaml
 create mode 100644 base/namespaces/demo.yaml
 create mode 100644 base/namespaces/kubernetes-dashboard.yaml
 create mode 100644 base/namespaces/monitoring.yaml
Enumerating objects: 37, done.
Counting objects: 100% (37/37), done.
Delta compression using up to 8 threads
Compressing objects: 100% (28/28), done.
Writing objects: 100% (36/36), 13.54 KiB | 13.54 MiB/s, done.
Total 36 (delta 7), reused 27 (delta 7)
remote: Resolving deltas: 100% (7/7), done.        
To github.com:marccarre/my-gitops-repo.git
   15b0aad..b7070d5  master -> master


$ kubectl get po --all-namespaces
NAMESPACE              NAME                                                      READY   STATUS    RESTARTS   AGE
amazon-cloudwatch      cloudwatch-agent-h9wr7                                    1/1     Running   0          15m
amazon-cloudwatch      fluentd-cloudwatch-8r5f6                                  1/1     Running   0          15m
demo                   podinfo-67b7886b6c-bvdtm                                  1/1     Running   0          15m
flux                   flux-7696dbc4cd-sjbv7                                     1/1     Running   0          36m
flux                   flux-helm-operator-8687676b89-qw7kq                       1/1     Running   0          36m
flux                   memcached-5dcd7579-7bn6l                                  1/1     Running   0          36m
flux                   tiller-deploy-69547b56b4-p6zxd                            1/1     Running   0          36m
kube-system            alb-ingress-controller-8df75bc98-gssb9                    1/1     Running   0          15m
kube-system            aws-node-f8g7z                                            1/1     Running   0          39m
kube-system            cluster-autoscaler-86d68b66cb-b9xqv                       1/1     Running   0          15m
kube-system            coredns-699bb99bf8-gptx4                                  1/1     Running   0          46m
kube-system            coredns-699bb99bf8-smzch                                  1/1     Running   0          46m
kube-system            kube-proxy-28xqt                                          1/1     Running   0          39m
kubernetes-dashboard   dashboard-metrics-scraper-65785bfbc-s8tq6                 1/1     Running   0          15m
kubernetes-dashboard   kubernetes-dashboard-76b969b44b-rwgk5                     1/1     Running   0          15m
monitoring             alertmanager-prometheus-operator-alertmanager-0           2/2     Running   0          14m
monitoring             metrics-server-5df4599bd7-cgh79                           1/1     Running   0          15m
monitoring             prometheus-operator-grafana-dd95fb7d4-n9ddh               2/2     Running   0          15m
monitoring             prometheus-operator-kube-state-metrics-5d7558d7cc-h8xgg   1/1     Running   0          15m
monitoring             prometheus-operator-operator-67895dd7c5-nqj7w             1/1     Running   0          15m
monitoring             prometheus-operator-prometheus-node-exporter-qp8gp        1/1     Running   0          15m
monitoring             prometheus-prometheus-operator-prometheus-0               3/3     Running   1          14m
If, however, you do NOT have the IAM roles in place, the cluster-autoscaler will CrashLoopBackOff. See the steps below, which reproduce the issue. (I have also run these, to double-check that I can actually reproduce it.)
$ eksctl create cluster --name mc-1237-testing
[ℹ]  eksctl version 0.11.1
[ℹ]  using region ap-northeast-1
[ℹ]  setting availability zones to [ap-northeast-1d ap-northeast-1a ap-northeast-1c]
[ℹ]  subnets for ap-northeast-1d - public:192.168.0.0/19 private:192.168.96.0/19
[ℹ]  subnets for ap-northeast-1a - public:192.168.32.0/19 private:192.168.128.0/19
[ℹ]  subnets for ap-northeast-1c - public:192.168.64.0/19 private:192.168.160.0/19
[ℹ]  nodegroup "ng-7bfc0f1f" will use "ami-02e124a380df41614" [AmazonLinux2/1.14]
[ℹ]  using Kubernetes version 1.14
[ℹ]  creating EKS cluster "mc-1237-testing" in "ap-northeast-1" region with un-managed nodes
[ℹ]  will create 2 separate CloudFormation stacks for cluster itself and the initial nodegroup
[ℹ]  if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=ap-northeast-1 --cluster=mc-1237-testing'
[ℹ]  CloudWatch logging will not be enabled for cluster "mc-1237-testing" in "ap-northeast-1"
[ℹ]  you can enable it with 'eksctl utils update-cluster-logging --region=ap-northeast-1 --cluster=mc-1237-testing'
[ℹ]  Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "mc-1237-testing" in "ap-northeast-1"
[ℹ]  2 sequential tasks: { create cluster control plane "mc-1237-testing", create nodegroup "ng-7bfc0f1f" }
[ℹ]  building cluster stack "eksctl-mc-1237-testing-cluster"
[ℹ]  deploying stack "eksctl-mc-1237-testing-cluster"
[ℹ]  building nodegroup stack "eksctl-mc-1237-testing-nodegroup-ng-7bfc0f1f"
[ℹ]  --nodes-min=2 was set automatically for nodegroup ng-7bfc0f1f
[ℹ]  --nodes-max=2 was set automatically for nodegroup ng-7bfc0f1f
[ℹ]  deploying stack "eksctl-mc-1237-testing-nodegroup-ng-7bfc0f1f"
[✔]  all EKS cluster resources for "mc-1237-testing" have been created
[✔]  saved kubeconfig as "${HOME}/.kube/config"
[ℹ]  adding identity "arn:aws:iam::083751696308:role/eksctl-mc-1237-testing-nodegroup-NodeInstanceRole-KGOKLPVNIK10" to auth ConfigMap
[ℹ]  nodegroup "ng-7bfc0f1f" has 0 node(s)
[ℹ]  waiting for at least 2 node(s) to become ready in "ng-7bfc0f1f"
[ℹ]  nodegroup "ng-7bfc0f1f" has 2 node(s)
[ℹ]  node "ip-192-168-2-23.ap-northeast-1.compute.internal" is ready
[ℹ]  node "ip-192-168-48-84.ap-northeast-1.compute.internal" is ready
[ℹ]  kubectl command should work with "${HOME}/.kube/config", try 'kubectl get nodes'
[✔]  EKS cluster "mc-1237-testing" in "ap-northeast-1" region is ready


$ EKSCTL_EXPERIMENTAL=true eksctl enable repo \
>    --cluster mc-1237-testing \
>    --region ap-northeast-1 \
>    --git-email carre.marc+flux@gmail.com \
>    --git-url git@github.com:marccarre/my-gitops-repo.git

[ℹ]  Generating public key infrastructure for the Helm Operator and Tiller
[ℹ]    this may take up to a minute, please be patient
[!]  Public key infrastructure files were written into directory "/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/eksctl-helm-pki563648596"
[!]  please move the files into a safe place or delete them
[ℹ]  Generating manifests
[ℹ]  Cloning git@github.com:marccarre/my-gitops-repo.git
Cloning into '/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/eksctl-install-flux-clone-026154915'...
remote: Enumerating objects: 43, done.        
remote: Counting objects: 100% (43/43), done.        
remote: Compressing objects: 100% (40/40), done.        
remote: Total 431 (delta 9), reused 35 (delta 3), pack-reused 388        
Receiving objects: 100% (431/431), 177.90 KiB | 497.00 KiB/s, done.
Resolving deltas: 100% (155/155), done.
Already on 'master'
Your branch is up to date with 'origin/master'.
[ℹ]  Writing Flux manifests
[ℹ]  created "Namespace/flux"
[ℹ]  Applying Helm TLS Secret(s)
[ℹ]  created "flux:Secret/flux-helm-tls-cert"
[ℹ]  created "flux:Secret/tiller-secret"
[!]  Note: certificate secrets aren't added to the Git repository for security reasons
[ℹ]  Applying manifests
[ℹ]  created "flux:ServiceAccount/flux"
[ℹ]  created "ClusterRole.rbac.authorization.k8s.io/flux"
[ℹ]  created "ClusterRoleBinding.rbac.authorization.k8s.io/flux"
[ℹ]  created "CustomResourceDefinition.apiextensions.k8s.io/helmreleases.helm.fluxcd.io"
[ℹ]  created "flux:Service/memcached"
[ℹ]  created "flux:ServiceAccount/tiller"
[ℹ]  created "ClusterRoleBinding.rbac.authorization.k8s.io/tiller"
[ℹ]  created "flux:ServiceAccount/helm"
[ℹ]  created "flux:Role.rbac.authorization.k8s.io/tiller-user"
[ℹ]  created "kube-system:RoleBinding.rbac.authorization.k8s.io/tiller-user-binding"
[ℹ]  created "flux:Deployment.extensions/tiller-deploy"
[ℹ]  created "flux:Deployment.apps/flux"
[ℹ]  created "flux:ConfigMap/flux-helm-tls-ca-config"
[ℹ]  created "flux:Deployment.apps/flux-helm-operator"
[ℹ]  created "flux:Deployment.apps/memcached"
[ℹ]  created "flux:Secret/flux-git-deploy"
[ℹ]  created "flux:ServiceAccount/flux-helm-operator"
[ℹ]  created "ClusterRole.rbac.authorization.k8s.io/flux-helm-operator"
[ℹ]  created "ClusterRoleBinding.rbac.authorization.k8s.io/flux-helm-operator"
[ℹ]  created "flux:Service/tiller-deploy"
[ℹ]  Waiting for Helm Operator to start
[ℹ]  Helm Operator started successfully
[ℹ]  see https://docs.fluxcd.io/projects/helm-operator for details on how to use the Helm Operator
[ℹ]  Waiting for Flux to start
[ℹ]  Flux started successfully
[ℹ]  see https://docs.fluxcd.io/projects/flux for details on how to use Flux
[ℹ]  Committing and pushing manifests to git@github.com:marccarre/my-gitops-repo.git
[master f8e0c52] Add Initial Flux configuration
 13 files changed, 803 insertions(+)
 create mode 100644 flux/flux-account.yaml
 create mode 100644 flux/flux-deployment.yaml
 create mode 100644 flux/flux-helm-operator-account.yaml
 create mode 100644 flux/flux-helm-release-crd.yaml
 create mode 100644 flux/flux-namespace.yaml
 create mode 100644 flux/flux-secret.yaml
 create mode 100644 flux/helm-operator-deployment.yaml
 create mode 100644 flux/memcache-dep.yaml
 create mode 100644 flux/memcache-svc.yaml
 create mode 100644 flux/tiller-ca-cert-configmap.yaml
 create mode 100644 flux/tiller-dep.yaml
 create mode 100644 flux/tiller-rbac.yaml
 create mode 100644 flux/tiller-svc.yaml
Enumerating objects: 17, done.
Counting objects: 100% (17/17), done.
Delta compression using up to 8 threads
Compressing objects: 100% (15/15), done.
Writing objects: 100% (16/16), 9.33 KiB | 9.33 MiB/s, done.
Total 16 (delta 1), reused 12 (delta 1)
remote: Resolving deltas: 100% (1/1), done.        
To github.com:marccarre/my-gitops-repo.git
   4b9a79d..f8e0c52  master -> master
[ℹ]  Flux will only operate properly once it has write-access to the Git repository
[ℹ]  please configure git@github.com:marccarre/my-gitops-repo.git so that the following Flux SSH public key has write access to it
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCxoYrh1xqsHGQuJZnsY2hiOyplanBS/wmLQaxyPu2eMexmG1uy4Vq+e1qHQ6ukTlPSV92N2diz7Mml/VnfMIu6/S6WpcEa8s8cX+4X2w4DN5VGcOdMbRa76Td6me1Kp7X4BvQSpmtfj380+7dY+yxywTVf97ZFYq1atitxvjgVHIUCDLAXxqmM2t7OnH5nYEJFS+32BRmENMpzEfB+31PiOAgsUHENA4BCr0sbxDpKt3j4hzJbntgYQVyhaNLBH8S34Ogz1V0i8H5iplJ6YjsNXpeUhmRYFH4rKOTi0EJv7wEWMEH1gttQvLxhHAd6s4qDMB27aQSJFMh55/DW/r6Z


### Then added the above SSH key to https://github.com/marccarre/my-gitops-repo/deploy_keys


$ kubectl get po --all-namespaces
NAMESPACE     NAME                                  READY   STATUS    RESTARTS   AGE
flux          flux-7696dbc4cd-4h927                 1/1     Running   0          69s
flux          flux-helm-operator-8687676b89-hskbj   1/1     Running   0          68s
flux          memcached-5dcd7579-tpkvd              1/1     Running   0          69s
flux          tiller-deploy-69547b56b4-sp9md        1/1     Running   0          69s
kube-system   aws-node-97px5                        1/1     Running   0          7m5s
kube-system   aws-node-kxbzd                        1/1     Running   0          7m5s
kube-system   coredns-699bb99bf8-sn7ws              1/1     Running   0          13m
kube-system   coredns-699bb99bf8-zx26g              1/1     Running   0          13m
kube-system   kube-proxy-t2rvs                      1/1     Running   0          7m5s
kube-system   kube-proxy-tkncf                      1/1     Running   0          7m5s


$ EKSCTL_EXPERIMENTAL=true eksctl enable profile app-dev \
>    --cluster mc-1237-testing \
>    --region ap-northeast-1 \
>    --git-email carre.marc+flux@gmail.com \
>    --git-url git@github.com:marccarre/my-gitops-repo.git

Cloning into '/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/my-gitops-repo-130038557'...
remote: Enumerating objects: 47, done.        
remote: Counting objects: 100% (47/47), done.        
remote: Compressing objects: 100% (44/44), done.        
remote: Total 435 (delta 10), reused 38 (delta 3), pack-reused 388        
Receiving objects: 100% (435/435), 179.62 KiB | 494.00 KiB/s, done.
Resolving deltas: 100% (156/156), done.
Already on 'master'
Your branch is up to date with 'origin/master'.
[ℹ]  cloning repository "https://github.com/weaveworks/eks-quickstart-app-dev":master
Cloning into '/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/quickstart-019213272'...
remote: Enumerating objects: 5, done.        
remote: Counting objects: 100% (5/5), done.        
remote: Compressing objects: 100% (4/4), done.        
remote: Total 214 (delta 0), reused 0 (delta 0), pack-reused 209        
Receiving objects: 100% (214/214), 57.27 KiB | 322.00 KiB/s, done.
Resolving deltas: 100% (92/92), done.
Already on 'master'
Your branch is up to date with 'origin/master'.
[ℹ]  processing template files in repository
[ℹ]  writing new manifests to "/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/my-gitops-repo-130038557/base"
[master 5e6bcf5] Add app-dev quickstart components
 27 files changed, 1380 insertions(+)
 create mode 100644 base/LICENSE
 create mode 100644 base/README.md
 create mode 100644 base/amazon-cloudwatch/cloudwatch-agent-configmap.yaml
 create mode 100644 base/amazon-cloudwatch/cloudwatch-agent-daemonset.yaml
 create mode 100644 base/amazon-cloudwatch/cloudwatch-agent-rbac.yaml
 create mode 100644 base/amazon-cloudwatch/fluentd-configmap-cluster-info.yaml
 create mode 100644 base/amazon-cloudwatch/fluentd-configmap-fluentd-config.yaml
 create mode 100644 base/amazon-cloudwatch/fluentd-daemonset.yaml
 create mode 100644 base/amazon-cloudwatch/fluentd-rbac.yaml
 create mode 100644 base/demo/helm-release.yaml
 create mode 100644 base/kube-system/alb-ingress-controller-deployment.yaml
 create mode 100644 base/kube-system/alb-ingress-controller-rbac.yaml
 create mode 100644 base/kube-system/cluster-autoscaler-deployment.yaml
 create mode 100644 base/kube-system/cluster-autoscaler-rbac.yaml
 create mode 100644 base/kubernetes-dashboard/dashboard-metrics-scraper-deployment.yaml
 create mode 100644 base/kubernetes-dashboard/dashboard-metrics-scraper-service.yaml
 create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-configmap.yaml
 create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-deployment.yaml
 create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-rbac.yaml
 create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-secrets.yaml
 create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-service.yaml
 create mode 100644 base/monitoring/metrics-server.yaml
 create mode 100644 base/monitoring/prometheus-operator.yaml
 create mode 100644 base/namespaces/amazon-cloudwatch.yaml
 create mode 100644 base/namespaces/demo.yaml
 create mode 100644 base/namespaces/kubernetes-dashboard.yaml
 create mode 100644 base/namespaces/monitoring.yaml
Enumerating objects: 37, done.
Counting objects: 100% (37/37), done.
Delta compression using up to 8 threads
Compressing objects: 100% (28/28), done.
Writing objects: 100% (36/36), 13.52 KiB | 13.52 MiB/s, done.
Total 36 (delta 7), reused 25 (delta 7)
remote: Resolving deltas: 100% (7/7), done.        
To github.com:marccarre/my-gitops-repo.git
   f8e0c52..5e6bcf5  master -> master


$ kubectl get po --all-namespaces
NAMESPACE              NAME                                                      READY   STATUS             RESTARTS   AGE
amazon-cloudwatch      cloudwatch-agent-6km5b                                    1/1     Running            0          109m
amazon-cloudwatch      cloudwatch-agent-kcpb9                                    1/1     Running            0          109m
amazon-cloudwatch      fluentd-cloudwatch-8wxxn                                  1/1     Running            0          109m
amazon-cloudwatch      fluentd-cloudwatch-nst52                                  1/1     Running            0          109m
demo                   podinfo-67b7886b6c-pjws4                                  1/1     Running            0          109m
flux                   flux-7696dbc4cd-4h927                                     1/1     Running            0          116m
flux                   flux-helm-operator-8687676b89-hskbj                       1/1     Running            0          115m
flux                   memcached-5dcd7579-tpkvd                                  1/1     Running            0          116m
flux                   tiller-deploy-69547b56b4-sp9md                            1/1     Running            0          116m
kube-system            alb-ingress-controller-776b5b58c9-bbt7t                   1/1     Running            0          109m
kube-system            aws-node-97px5                                            1/1     Running            0          121m
kube-system            aws-node-kxbzd                                            1/1     Running            0          121m
kube-system            cluster-autoscaler-55d556f787-rm7cc                       0/1     CrashLoopBackOff   25         109m
kube-system            coredns-699bb99bf8-sn7ws                                  1/1     Running            0          128m
kube-system            coredns-699bb99bf8-zx26g                                  1/1     Running            0          128m
kube-system            kube-proxy-t2rvs                                          1/1     Running            0          121m
kube-system            kube-proxy-tkncf                                          1/1     Running            0          121m
kubernetes-dashboard   dashboard-metrics-scraper-65785bfbc-52952                 1/1     Running            0          109m
kubernetes-dashboard   kubernetes-dashboard-76b969b44b-hf9kd                     1/1     Running            0          109m
monitoring             alertmanager-prometheus-operator-alertmanager-0           2/2     Running            0          108m
monitoring             metrics-server-5df4599bd7-l5b8q                           1/1     Running            0          109m
monitoring             prometheus-operator-grafana-dd95fb7d4-gzqxn               2/2     Running            0          109m
monitoring             prometheus-operator-kube-state-metrics-5d7558d7cc-qx4tl   1/1     Running            0          109m
monitoring             prometheus-operator-operator-67895dd7c5-nhbbv             1/1     Running            0          109m
monitoring             prometheus-operator-prometheus-node-exporter-77nb6        1/1     Running            0          109m
monitoring             prometheus-operator-prometheus-node-exporter-hfdv9        1/1     Running            0          109m
monitoring             prometheus-prometheus-operator-prometheus-0               3/3     Running            1          108m

@ilanpillemer

ilanpillemer commented Dec 10, 2019 via email

@marccarre
Contributor

> If you follow the instructions word for word, it fails.

Which instructions were you following exactly @ilanpillemer? (Could you please share a link to them to ensure we are on the same page, and/or so that we know if we need to update/correct anything published elsewhere? 🙇 )

If you are talking about something other than this, would you have any suggestions for making these instructions clearer?

Note that the pre-requisites for the app-dev profile are documented here: https://github.com/weaveworks/eks-quickstart-app-dev#pre-requisites, but any suggestion on how to improve this & make it more obvious is always welcome! ✨

> You need to add the roles with the necessary config when creating the cluster.

Yes, this is what the first two commands I shared above were meant to show, i.e.:

  1. Use a ClusterConfig with the appropriate roles, e.g. examples/eks-quickstart-app-dev.yaml:

    $ git diff
    diff --git a/examples/eks-quickstart-app-dev.yaml b/examples/eks-quickstart-app-dev.yaml
    [...]

    Indeed, this file defines the following IAM roles: https://github.com/weaveworks/eksctl/blob/796d9f48c2f70732e27aebeee1c38a864cda88a8/examples/eks-quickstart-app-dev.yaml#L16-L20 (a sketch of what that section presumably contains appears after this list).

  2. Create the cluster by passing a reference to this file.

    $ eksctl create cluster -f examples/eks-quickstart-app-dev.yaml 
    [...]
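
For completeness, the IAM section those referenced lines presumably contain looks roughly like this; it is a sketch based on the diff shared earlier in this thread, not a verbatim copy of the file:

    iam:
      withAddonPolicies:
        albIngress: true   # ALB ingress controller
        autoScaler: true   # cluster autoscaler
        cloudWatch: true   # CloudWatch agents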

@ilanpillemer

Yes. With hindsight it now seems completely obvious what I had to do. I think a very minor tweak would help: I used the gitops quick start guide at eksctl.io, and when I look now it says some variant of the command should be used. Perhaps a few more words, for example that if you need the autoscaler or ALB ingress then the necessary switches (which you can find in the docs) should be used, or something similar. Great work with eksctl and Flux, they are game-changing.
