AWS alb-ingress-controller failed to create ALB in EKS with fargate #1202

Closed
aspnet4you opened this issue Mar 23, 2020 · 16 comments

Comments

@aspnet4you

I was trying to follow the documentation below to create an alb-ingress-controller with ingress resources: https://aws.amazon.com/blogs/containers/using-alb-ingress-controller-with-amazon-eks-on-fargate/

The controller is supposed to create an ALB and populate the address field of the Kubernetes ingress, but the address field is empty, and there is no error. The Fargate profile has been given the proper IAM permissions, and the service account has been given RBAC permissions as described in the documentation.

I documented the steps, with screenshots, on my blog at https://blogs.aspnet4you.com/2020/03/17/run-serverless-kubernetes-pods-using-amazon-eks-and-aws-fargate/ where you can see that the ingress address is empty. The ingress pods are running fine.

I could create an ALB manually, which is what I did, but that defeats the purpose. Any idea why the ALB didn't get created?

Thanks,
Prodip

@M00nF1sh
Collaborator

M00nF1sh commented Mar 23, 2020

Hi, could you share the logs from the controller pod?

BTW, where is your controller running? If it's running as a Fargate pod itself, you need to specify --aws-vpc-id and --aws-region.
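
As a quick sanity check for the point above, here is a minimal sketch that reports which of the two flags are missing from a list of container args. The function name `missing_flags` and the example cluster name are hypothetical and not part of the controller; the sketch only inspects strings and does not talk to a cluster.

```python
# The two flags named in the comment above.
REQUIRED_FLAGS = ("--aws-vpc-id", "--aws-region")


def missing_flags(args: list) -> list:
    """Return the required flags that are absent or have an empty value."""
    present = {}
    for arg in args:
        # Split "--name=value" into its name and value parts.
        name, _, value = arg.partition("=")
        present[name] = value
    return [flag for flag in REQUIRED_FLAGS if not present.get(flag)]


if __name__ == "__main__":
    # Hypothetical args list that forgets both flags.
    example_args = ["--cluster-name=my-cluster", "--ingress-class=alb"]
    print(missing_flags(example_args))
```

An empty result means both flags are present with a non-empty value; it says nothing about whether the values themselves are correct.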

@aspnet4you
Author

aspnet4you commented Mar 23, 2020

@M00nF1sh,
Thank you for responding to my question. Unfortunately, I didn't check the ingress controller logs before deleting the EKS cluster. Any suggestions before I retry EKS Fargate with ALB?

The ingress controller pods were running in the kube-system namespace. I did specify aws-vpc-id and aws-region in the deployment YAML. For this PoC, I didn't have any node group, just a Fargate profile. Here is my ingress YAML: https://raw.githubusercontent.com/aspnet4you/eks-fargate-poc/master/alb-ingress-controller.yaml

@M00nF1sh
Collaborator

@aspnet4you
Pure Fargate (without any node group) should work fine. (I tested v1.1.4, which you are using, and it works fine.)
One tip: change v1.1.4 to v1.1.6 to get the latest code (though none of the fixes are related to your issue).

The controller log should show what's wrong; typically it's an IAM permission or a mistagged subnet.

@aspnet4you
Author

@M00nF1sh,
Thanks for the suggestion. I will try the latest version.

I was overly cautious with the subnet tags, and both the public and private pairs were tagged correctly. I learned that from a previous PoC with EKS and EC2! As a matter of fact, the eksctl tool did that for me, with security groups wide open to all traffic on all ports!

@aspnet4you
Author

@M00nF1sh ,
Below is what I see in the logs, and still no ALB!
I can't make anything out of the logs. What could be going wrong? I downloaded the latest IAM policy from GitHub.

kubectl logs -p alb-ingress-controller-5db898488b-bqrf6 -n kube-system


AWS ALB Ingress controller
Release: v1.1.6
Build: git-95ee2ac8
Repository: https://github.com/kubernetes-sigs/aws-alb-ingress-controller.git

W0324 00:42:59.659618 1 client_config.go:549] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
E0324 00:43:29.660449 1 manager.go:173] kubebuilder/manager "msg"="Failed to get API Group-Resources" "error"="Get https://10.100.0.1:443/api?timeout=32s: dial tcp 10.100.0.1:443: i/o timeout"
F0324 00:43:29.660488 1 main.go:84] Get https://10.100.0.1:443/api?timeout=32s: dial tcp 10.100.0.1:443: i/o timeout

image

Thanks,
Prodip

@aspnet4you
Author

@M00nF1sh :
More logs. See the attached file for formatted logs.
kubectl logs -f alb-ingress-controller-5db898488b-bqrf6 -n kube-system


AWS ALB Ingress controller
Release: v1.1.6
Build: git-95ee2ac8
Repository: https://github.com/kubernetes-sigs/aws-alb-ingress-controller.git

W0324 00:43:30.859177 1 client_config.go:549] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0324 00:43:30.970685 1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource" "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null}}
}
I0324 00:43:30.970902 1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource" "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null},
"spec":{},"status":{"loadBalancer":{}}}}
I0324 00:43:30.970963 1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource" "controller"="alb-ingress-controller" "source"=
I0324 00:43:30.971098 1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource" "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null},
"spec":{},"status":{"loadBalancer":{}}}}
I0324 00:43:30.971131 1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource" "controller"="alb-ingress-controller" "source"=
I0324 00:43:30.971266 1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource" "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null}}
}
I0324 00:43:30.971574 1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource" "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null},
"spec":{},"status":{"daemonEndpoints":{"kubeletEndpoint":{"Port":0}},"nodeInfo":{"machineID":"","systemUUID":"","bootID":"","kernelVersion":"","osImage":"","containerRuntimeVersion":"","kubeletVersion":"","
kubeProxyVersion":"","operatingSystem":"","architecture":""}}}}
I0324 00:43:31.044029 1 leaderelection.go:205] attempting to acquire leader lease kube-system/ingress-controller-leader-alb...
I0324 00:43:31.057484 1 leaderelection.go:214] successfully acquired lease kube-system/ingress-controller-leader-alb
I0324 00:43:31.057674 1 recorder.go:53] kubebuilder/manager/events "level"=1 "msg"="Normal" "message"="alb-ingress-controller-5db898488b-bqrf6_7bd33a30-6d68-11ea-994e-7290c1c88576 became leader" "obj
ect"={"kind":"ConfigMap","namespace":"kube-system","name":"ingress-controller-leader-alb","uid":"7bdf9bad-6d68-11ea-8108-0a9dec12172d","apiVersion":"v1","resourceVersion":"4864"} "reason"="LeaderElection"
I0324 00:43:31.158073 1 controller.go:134] kubebuilder/controller "level"=0 "msg"="Starting Controller" "controller"="alb-ingress-controller"
I0324 00:43:31.258364 1 controller.go:154] kubebuilder/controller "level"=0 "msg"="Starting workers" "controller"="alb-ingress-controller" "worker count"=1
W0324 00:51:50.249271 1 reflector.go:270] pkg/mod/k8s.io/client-go@v0.0.0-20181213151034-8d9ed539ba31/tools/cache/reflector.go:95: watch of *v1.Secret ended with: too old resource version: 3846 (6237)
E0324 01:26:23.432194 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="no object matching key "default/aspnetapp-ingress" in local store" "controller"="alb-ingress-cont
roller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:27:10.067226 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: fa
iled to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller
"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:27:56.072913 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: fa
iled to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller
"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:28:47.180817 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: fa
iled to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller
"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:29:37.624242 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: fa
iled to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller
"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:30:13.205391 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: fa
iled to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller
"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:30:51.391739 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: fa
iled to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller
"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:31:32.773034 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: fa
iled to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller
"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:32:21.837140 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: fa
iled to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller
"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:33:12.075720 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: fa
iled to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller
"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:33:50.826910 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: fa
iled to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller
"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:34:37.774838 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: fa
iled to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller
"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:35:28.136156 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: fa
iled to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller
"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:36:28.244697 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: fa
iled to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller
"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:38:00.702172 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: fa
iled to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller
"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:40:14.999806 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: fa
iled to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller
"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:43:45.026893 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: fa
iled to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller
"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:44:34.766833 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: fa
iled to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller
"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:50:01.950738 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: fa
iled to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller
"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}

alb-ingress-controller-error.txt

@aspnet4you
Author

Private subnet tagged by eksctl, looks fine to me:
image

Public subnet tagged by eksctl, looks fine to me:
image

@aspnet4you
Author

Hi @M00nF1sh,
Any idea what may be wrong with my configuration? It looks like EKS Fargate is not mature enough for production when it comes to ingress!

Thanks,
Prodip

@M00nF1sh
Collaborator

@aspnet4you
Apparently the real cause of your issue is "dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host". Does your VPC have an internet gateway or a NAT gateway?
Note: even with Fargate, internet requests from your pods still go through your VPC (we drop an ENI into your VPC).

@M00nF1sh
Collaborator

Also, specify these settings without the quotes. Change

- --cluster-name='eks-fargate-alb-ingress-demo'
- --aws-vpc-id='vpc-057af016ed6507b52'
- --aws-region='us-east-1'

to

- --cluster-name=eks-fargate-alb-ingress-demo
- --aws-vpc-id=vpc-057af016ed6507b52
- --aws-region=us-east-1

You can see this in the error message: the host is sts.'us-east-1'.amazonaws.com, where even the region is quoted.
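
To illustrate why the quotes survive, here is a minimal sketch. It assumes the args entries reach the controller process as written, with no shell in between to strip quotes, which is how Kubernetes container args generally behave. The helpers `flag_value` and `sts_host` are hypothetical; the host name format is modeled on the error message in the logs, not taken from the controller's source.

```python
import shlex


def flag_value(arg: str) -> str:
    """Return the value part of a --name=value argument."""
    return arg.split("=", 1)[1]


def sts_host(region: str) -> str:
    """Build a regional STS host name in the form seen in the error message."""
    return f"sts.{region}.amazonaws.com"


manifest_arg = "--aws-region='us-east-1'"

# No shell involved: the quotes are part of the value.
literal_region = flag_value(manifest_arg)

# What a POSIX shell would pass on after quote removal, for comparison.
shell_region = flag_value(shlex.split(manifest_arg)[0])

print(sts_host(literal_region))  # sts.'us-east-1'.amazonaws.com
print(sts_host(shell_region))    # sts.us-east-1.amazonaws.com
```

The first host name contains quote characters and can never resolve, which matches the "no such host" error. On a command line the same quoting would have been harmless, because the shell removes the quotes before the program sees them.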

@aspnet4you
Author

@M00nF1sh,
You are smart. 💯 That was it! I removed the quotes and the ALB was provisioned as designed. You can close the issue.

I like how the ALB automatically adjusts its targets. I scaled from 2 to 3 pods and could see the new IP added as a target automatically. Nice. :) This is the reason I didn't want to add the ALB manually and deal with the auto scaling myself.

Here is my ingress definition:
image

Ingress resource definition:
image

Thanks,
Prodip

@M00nF1sh
Collaborator

cool, glad it works :D

@zquintana

I have the exact same issue, and I can't figure out what's causing it.

Pod Logs:

{"level":"error","ts":1627148976.691803,"logger":"controller","msg":"Reconciler error","controller":"ingress","name":"hello","namespace":"default","error":"couldn't auto-discover subnets: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.us-west-2.amazonaws.com/\": dial tcp: lookup sts.us-west-2.amazonaws.com on 172.20.0.10:53: read udp 10.0.3.184:34703->172.20.0.10:53: read: connection refused"}
{"level":"error","ts":1627149158.8136048,"logger":"controller","msg":"Reconciler error","controller":"ingress","name":"hello","namespace":"default","error":"couldn't auto-discover subnets: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.us-west-2.amazonaws.com/\": dial tcp: lookup sts.us-west-2.amazonaws.com on 172.20.0.10:53: read udp 10.0.3.184:52341->172.20.0.10:53: read: connection refused"}
{"level":"error","ts":1627149331.7705815,"logger":"controller","msg":"Reconciler error","controller":"ingress","name":"hello","namespace":"default","error":"couldn't auto-discover subnets: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.us-west-2.amazonaws.com/\": dial tcp: lookup sts.us-west-2.amazonaws.com on 172.20.0.10:53: read udp 10.0.3.184:58778->172.20.0.10:53: read: connection refused"}
{"level":"error","ts":1627149528.279761,"logger":"controller","msg":"Reconciler error","controller":"ingress","name":"hello","namespace":"default","error":"couldn't auto-discover subnets: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.us-west-2.amazonaws.com/\": dial tcp: lookup sts.us-west-2.amazonaws.com on 172.20.0.10:53: read udp 10.0.3.184:55073->172.20.0.10:53: read: connection refused"}
{"level":"error","ts":1627149707.0748882,"logger":"controller","msg":"Reconciler error","controller":"ingress","name":"hello","namespace":"default","error":"couldn't auto-discover subnets: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.us-west-2.amazonaws.com/\": dial tcp: lookup sts.us-west-2.amazonaws.com on 172.20.0.10:53: read udp 10.0.3.184:48301->172.20.0.10:53: read: connection refused"}

Container args:

Args:
      --cluster-name=app-rylqFOXa
      --ingress-class=alb
      --aws-region=us-west-2
      --aws-vpc-id=vpc-0e200d3ae7e12447c

Role policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::203341958641:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/2917B2CCF25A5DC470EF1CF5DB059AE9"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-west-2.amazonaws.com/id/2917B2CCF25A5DC470EF1CF5DB059AE9:sub": "system:serviceaccount:kube-system:aws-load-balancer-controller"
        }
      }
    }
  ]
}

The public subnets are tagged with:

kubernetes.io/role/elb	1
kubernetes.io/cluster/app-rylqFOXa	shared

The private subnets are tagged basically the same way, but with elb-internal. I'm trying out Fargate as a POC for work. What might I be missing here?

@aspnet4you
Author

@zquintana,
Your issue is a little different from what I was facing. Your controller definition looks OK.

Could you double-check the VPC subnet tags on your private subnets? According to the documentation, the tag should be internal-elb, not elb-internal.
https://aws.amazon.com/premiumsupport/knowledge-center/eks-vpc-subnet-discovery/

Key: kubernetes.io/role/internal-elb
Value: 1
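
A minimal sketch of a tag check along these lines, using only the tag keys mentioned in this thread. The function name `subnet_tag_problems` is hypothetical, and the tags are passed in as a plain dict, so nothing here calls AWS.

```python
PUBLIC_ROLE_TAG = "kubernetes.io/role/elb"
INTERNAL_ROLE_TAG = "kubernetes.io/role/internal-elb"
KNOWN_ROLE_TAGS = (PUBLIC_ROLE_TAG, INTERNAL_ROLE_TAG)


def subnet_tag_problems(tags: dict, cluster_name: str, internal: bool) -> list:
    """Return human-readable problems with a subnet's discovery tags."""
    problems = []
    role_tag = INTERNAL_ROLE_TAG if internal else PUBLIC_ROLE_TAG
    if tags.get(role_tag) != "1":
        problems.append(f"missing or wrong role tag: {role_tag}=1")
    cluster_tag = f"kubernetes.io/cluster/{cluster_name}"
    if cluster_tag not in tags:
        problems.append(f"missing cluster tag: {cluster_tag}")
    # Catch near-miss keys such as the elb-internal typo.
    for key in tags:
        if key.startswith("kubernetes.io/role/") and key not in KNOWN_ROLE_TAGS:
            problems.append(f"unrecognized role tag: {key}")
    return problems


if __name__ == "__main__":
    private_tags = {
        "kubernetes.io/role/elb-internal": "1",
        "kubernetes.io/cluster/app-rylqFOXa": "shared",
    }
    for problem in subnet_tag_problems(private_tags, "app-rylqFOXa", internal=True):
        print(problem)
```

With the mistyped key, the check reports both the missing internal-elb tag and the unrecognized elb-internal one.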

Things may have changed a bit since I did the PoC. All the supporting files are on GitHub, and the entry point is https://github.com/aspnet4you/eks-fargate-poc/blob/master/eks-fargate-alb-ingress-v2.ps1

@zquintana

@aspnet4you, yeah, looks like it's internal-elb. Typo. I'm using the official AWS Helm chart.

@zquintana

Turns out my issue was #1360: CoreDNS wasn't set up for a Fargate-only cluster.
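
The two DNS failures in this thread look similar in the logs but had different causes: a malformed host name in the first case, and an unreachable cluster DNS service in the second. A minimal sketch for telling them apart from the error string follows. The function name and category labels are made up for illustration, and the patterns are based only on the log lines quoted above.

```python
import re

# Host name in errors like "lookup <host>: ..." or "lookup <host> on <resolver>: ...".
HOST_RE = re.compile(r"lookup (\S+?)(?::| on )")
# Resolver address in errors like "lookup <host> on <ip>:<port>: ...".
RESOLVER_RE = re.compile(r"lookup \S+ on (\d+\.\d+\.\d+\.\d+:\d+):")
# Characters a DNS host name may contain.
VALID_HOST_RE = re.compile(r"^[A-Za-z0-9.-]+$")


def classify_lookup_error(error: str) -> str:
    """Roughly classify a DNS lookup failure taken from the controller log."""
    host = HOST_RE.search(error)
    if host and not VALID_HOST_RE.match(host.group(1)):
        return "malformed host name: " + host.group(1)
    resolver = RESOLVER_RE.search(error)
    if resolver and "connection refused" in error:
        return "resolver unreachable: " + resolver.group(1)
    if "no such host" in error:
        return "name not found"
    return "other"


if __name__ == "__main__":
    print(classify_lookup_error(
        "dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host"))
    print(classify_lookup_error(
        "dial tcp: lookup sts.us-west-2.amazonaws.com on 172.20.0.10:53: "
        "read udp 10.0.3.184:34703->172.20.0.10:53: read: connection refused"))
```

A malformed host name points at the controller's own arguments, while an unreachable resolver points at cluster DNS.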

3 participants