
cluster CA validation fails with an HTTPS proxy #895

Closed
dhermanns opened this issue Jun 18, 2019 · 8 comments · Fixed by #2294
Labels
kind/feature New feature or request

Comments

@dhermanns commented Jun 18, 2019

What happened?
I tried to create a new cluster using eksctl create cluster. eksctl failed with a timeout.
Leading up to the timeout, I repeatedly see the following error message:

"control plane not read yet - certificate signed by unknown authority"

After 25 minutes, I get:

"2019-06-18T11:15:11+02:00 [✖] timed out waiting for control plane "lvm" after 25m0s"

Checking with kubectl get nodes shows that no nodes have been assigned to the cluster.

What you expected to happen?
A new cluster with assigned nodes should have been created.

How to reproduce it?

eksctl create cluster -v 4 \                                                                                       
--name lvm \
--version 1.12 \
--nodegroup-name standard-workers \
--node-type t2.small \
--nodes 2 \
--nodes-min 1 \
--nodes-max 4 \
--node-ami auto

Anything else we need to know?
I'm running on Ubuntu 16.04. aws-iam-authenticator is installed.

Versions
Please paste in the output of these commands:

$ eksctl version: 0.1.35
$ uname -a: Linux c518837 4.15.0-34-generic #37lvm6 SMP Thu Feb 28 17:25:59 CET 2019 x86_64 x86_64 x86_64 GNU/Linux
$ kubectl version: Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:11:31Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}

Logs
This is just an extract. Let me know if you need more (I don't want to paste something sensitive here):

2019-06-18T10:46:16+02:00 [ℹ]  deploying stack "eksctl-lvm-nodegroup-standard-workers"
2019-06-18T10:46:16+02:00 [▶]  start waiting for CloudFormation stack "eksctl-lvm-nodegroup-standard-workers" to reach "CREATE_COMPLETE" status
2019-06-18T10:46:16+02:00 [▶]  waiting for CloudFormation stack "eksctl-lvm-nodegroup-standard-workers" to reach "CREATE_COMPLETE" status
2019-06-18T10:46:35+02:00 [▶]  waiting for CloudFormation stack "eksctl-lvm-nodegroup-standard-workers" to reach "CREATE_COMPLETE" status
2019-06-18T10:46:53+02:00 [▶]  waiting for CloudFormation stack "eksctl-lvm-nodegroup-standard-workers" to reach "CREATE_COMPLETE" status
2019-06-18T10:47:12+02:00 [▶]  waiting for CloudFormation stack "eksctl-lvm-nodegroup-standard-workers" to reach "CREATE_COMPLETE" status
2019-06-18T10:47:32+02:00 [▶]  waiting for CloudFormation stack "eksctl-lvm-nodegroup-standard-workers" to reach "CREATE_COMPLETE" status
2019-06-18T10:47:48+02:00 [▶]  waiting for CloudFormation stack "eksctl-lvm-nodegroup-standard-workers" to reach "CREATE_COMPLETE" status
2019-06-18T10:48:05+02:00 [▶]  waiting for CloudFormation stack "eksctl-lvm-nodegroup-standard-workers" to reach "CREATE_COMPLETE" status
2019-06-18T10:48:24+02:00 [▶]  waiting for CloudFormation stack "eksctl-lvm-nodegroup-standard-workers" to reach "CREATE_COMPLETE" status
2019-06-18T10:48:43+02:00 [▶]  waiting for CloudFormation stack "eksctl-lvm-nodegroup-standard-workers" to reach "CREATE_COMPLETE" status
2019-06-18T10:49:01+02:00 [▶]  waiting for CloudFormation stack "eksctl-lvm-nodegroup-standard-workers" to reach "CREATE_COMPLETE" status
2019-06-18T10:49:20+02:00 [▶]  waiting for CloudFormation stack "eksctl-lvm-nodegroup-standard-workers" to reach "CREATE_COMPLETE" status
2019-06-18T10:49:37+02:00 [▶]  waiting for CloudFormation stack "eksctl-lvm-nodegroup-standard-workers" to reach "CREATE_COMPLETE" status
2019-06-18T10:49:55+02:00 [▶]  waiting for CloudFormation stack "eksctl-lvm-nodegroup-standard-workers" to reach "CREATE_COMPLETE" status
2019-06-18T10:50:11+02:00 [▶]  waiting for CloudFormation stack "eksctl-lvm-nodegroup-standard-workers" to reach "CREATE_COMPLETE" status
2019-06-18T10:50:11+02:00 [▶]  done after 3m55.339596715s of waiting for CloudFormation stack "eksctl-lvm-nodegroup-standard-workers" to reach "CREATE_COMPLETE" status
2019-06-18T10:50:11+02:00 [▶]  processing stack outputs
2019-06-18T10:50:11+02:00 [▶]  completed task: create nodegroup "standard-workers"
2019-06-18T10:50:11+02:00 [▶]  completed task: create nodegroup "standard-workers"
2019-06-18T10:50:11+02:00 [✔]  all EKS cluster resource for "lvm" had been created
2019-06-18T10:50:11+02:00 [▶]  merging kubeconfig files
2019-06-18T10:50:11+02:00 [▶]  setting current-context to dirk@lvm.eu-central-1.eksctl.io
2019-06-18T10:50:11+02:00 [✔]  saved kubeconfig as "/home/m500516/.kube/config"
2019-06-18T10:50:31+02:00 [▶]  control plane not ready yet – Get https://96B53D35625B928059BE883C7EAD4435.yl4.eu-central-1.eks.amazonaws.com/version?timeout=32s: x509: certificate signed by unknown authority
2019-06-18T10:50:51+02:00 [▶]  control plane not ready yet – Get https://96B53D35625B928059BE883C7EAD4435.yl4.eu-central-1.eks.amazonaws.com/version?timeout=32s: x509: certificate signed by unknown authority
@martina-if (Contributor)

Hi @dhermanns, thanks for your report. What region are you using? I have seen this problem before when using t2 instances in newer regions where they aren't available, and CloudFormation doesn't give us a good error message when that happens. You can try the newer t3.small instead and see if that works. If not, we might have a bug somewhere. I'll take a look.

@dhermanns (Author)

Hi Martina, I'm using eu-central-1.

I'll switch to t3.small and give it a try ;-)

Is there a way to disable TLS verification? Our proxy intercepts TLS connections, so I would have to add our CA to eksctl or disable verification to fix this.

I'm thinking of something like

eksctl --no-verify-ssl

@dhermanns (Author)

Same error with t3.small.

@dhermanns (Author)

When I try to delete e.g. a nodegroup afterwards using:

eksctl delete nodegroup -f eks-cluster.yml --approve

I get an error, too:
[✖] checking if cluster implements policy API: Get https://0567D3E19AD4BD198467F357028E.yl4.eu-central-1.eks.amazonaws.com/api?timeout=32s: x509: certificate signed by unknown authority

Again, that's because I would have to add our ca.crt to eksctl, or use something equivalent to .kube/config's insecure-skip-tls-verify: true.

Is there a way to let eksctl ignore TLS certificates?

@errordeveloper (Contributor) commented Jun 18, 2019

Is there a way to let eksctl ignore TLS certificates?

Not at the moment, but you should be able to work around this. And we'd like to fix it, of course.

Please try this:

  • eksctl create cluster --without-nodegroup --kubeconfig=./kubeconfig
  • edit the kubeconfig to disable TLS verification (a sketch of the relevant field follows this list)
  • use eksctl create nodegroup --update-auth-configmap=false (it may error out, but should still create the nodegroup)
  • update the aws-auth configmap manually to authorise the nodes (see the sketch after the note below)
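
For the kubeconfig step, the relevant change would be the standard insecure-skip-tls-verify field on the cluster entry that eksctl writes. A minimal sketch, with the cluster name and endpoint as placeholders rather than values from this issue (note that kubectl rejects a cluster entry that sets both certificate-authority-data and insecure-skip-tls-verify, so the CA data has to be removed):

apiVersion: v1
kind: Config
clusters:
- name: lvm.eu-central-1.eksctl.io
  cluster:
    server: https://<CLUSTER_ENDPOINT>.eu-central-1.eks.amazonaws.com
    # drop certificate-authority-data and skip verification instead,
    # so the certificate presented by the intercepting proxy is accepted
    insecure-skip-tls-verify: true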

To update the configmap, you will need the instance role ARN, which can be found in many different ways, e.g. in the CloudFormation stack outputs. You can get those outputs from the CloudFormation console, or with eksctl utils describe-stacks.
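
For the configmap step, the entry follows the shape documented for EKS worker nodes; a sketch, with the account ID and role name as placeholders to be replaced by the NodeInstanceRole ARN from the nodegroup's CloudFormation stack outputs:

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    # <ACCOUNT_ID> and <NODE_INSTANCE_ROLE> are placeholders; use the
    # NodeInstanceRole ARN from the nodegroup stack outputs
    - rolearn: arn:aws:iam::<ACCOUNT_ID>:role/<NODE_INSTANCE_ROLE>
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes

Apply it with kubectl apply -f aws-auth.yaml (or edit the existing configmap with kubectl edit -n kube-system configmap/aws-auth); once the role is mapped, the nodes should register and show up in kubectl get nodes.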

@errordeveloper added the kind/feature (New feature or request) label and removed the kind/bug label Jun 18, 2019
@errordeveloper changed the title from eksctl create cluster results in "control plane not read yet - certificate signed by unknown authority" to cluster CA validation fails with an HTTPS proxy Jun 18, 2019
@antoine-choimet-cbp

Hello, any update on this feature?

@citrusoft

+1

@michaelbeaumont (Contributor)

@antoine-choimet-cbp @citrusoft If you have time, please try out my branch with the new option #2294

torredil pushed a commit to torredil/eksctl that referenced this issue May 20, 2022
remove hardcoded namespace for pod disruption budget