Getting x509-certificate-signed-by-unknown-authority #1154

Closed
ackris opened this issue Feb 8, 2021 · 10 comments

ackris commented Feb 8, 2021

Hi Everyone,

I have been able to access an EKS cluster created via the EKS Terraform module, but with a caveat: I am unable to access the cluster securely.

Version Information

Terraform v0.14.5
+ provider registry.terraform.io/hashicorp/aws v3.26.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.0.2
+ provider registry.terraform.io/hashicorp/local v2.0.0
+ provider registry.terraform.io/hashicorp/null v3.0.0
+ provider registry.terraform.io/hashicorp/random v3.0.1
+ provider registry.terraform.io/hashicorp/template v2.2.0
+ provider registry.terraform.io/hashicorp/tls v3.0.0

As far as I can tell, the Kubernetes provider is not accepting the certificate generated during EKS instantiation as valid.

Unless I pass insecure = true, I am unable to access the cluster. Please find my scripts below.

k8s-provider.tf:

provider "kubernetes" {
    host = data.aws_eks_cluster.cluster.endpoint
    #cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
    token = data.aws_eks_cluster_auth.cluster.token
    config_path = "./kubeconfig_${var.cluster_name}"
    insecure = true
}

eks-cluster.tf:

data "aws_eks_cluster" "cluster" {
  name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.eks.cluster_id
}

module "eks" {
    source = "terraform-aws-modules/eks/aws"
    version = "14.0.0"
    cluster_version = var.cluster_version
    cluster_name = var.cluster_name
    subnets = module.vpc.private_subnets
    cluster_endpoint_private_access = true
    cluster_create_timeout = "1h"
    vpc_id = module.vpc.vpc_id
    worker_groups = [
        {
            name = "atomstate_worker_group_one"
            instance_type = "t2.small"
            asg_desired_capacity = 1
            additional_security_group_ids = [ aws_security_group.worker_group_one.id ]
        }
    ]
    workers_group_defaults = {
        root_volume_type = "gp2"
    }
    wait_for_cluster_interpreter = ["C:\\Program Files\\Git\\bin\\sh.exe", "-c"]
    wait_for_cluster_cmd = "until curl -sk $ENDPOINT >/dev/null; do sleep 4; done"
}

As you can see, I had to comment out the cluster_ca_certificate attribute and set insecure to true.

Steps to reproduce

  1. Use the versions highlighted above.
  2. Create an EKS cluster using the VPC and EKS Terraform modules.
  3. Set insecure to false and leave cluster_ca_certificate uncommented.
  4. Run terraform apply.
  5. Observe the x509 certificate error.

Expected Behavior
Access the cluster securely, without the x509 certificate error.

Actual Behavior
The cluster can only be accessed insecurely, with insecure set to true.

References
https://discuss.hashicorp.com/t/x509-certificate-signed-by-unknown-authority/8671

ackris added the bug label Feb 8, 2021
dak1n1 (Contributor) commented Feb 10, 2021

This could happen if the value of data.aws_eks_cluster.cluster.certificate_authority.0.data is unknown when the provider is initialized. Can you try running terraform refresh and see if that pulls in a new value for the CA cert? Alternatively, a targeted apply could help:

terraform apply -target=module.eks

I have a similar configuration, but I was fetching the certificate from the EKS module like this:

https://github.com/hashicorp/terraform-provider-kubernetes/blob/master/_examples/eks/main.tf#L56

I think your configuration is a better approach though. I'll update the example config using your approach and let you know the results.

ackris (Author) commented Feb 17, 2021

@dak1n1 I will try your suggestion, mate, and let you know if it works. Cheers!

ackris (Author) commented Feb 26, 2021

@dak1n1 - Could instance size be the reason behind this error? The error stops cropping up when I change the instance to a bigger one, like m4.large, instead of a smaller one like t2.small.

This is just a wild guess on my part.

My code block is now:

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
  config_path            = "./kubeconfig_${var.cluster_name}"
  insecure               = false
}

dak1n1 (Contributor) commented Feb 26, 2021

That is interesting... In my testing, I've been able to update the instance size of an EKS cluster without the cluster being re-created, so updating the instance size shouldn't cause data.aws_eks_cluster.cluster to become unknown, and therefore shouldn't trigger any authentication or certificate issues in the Kubernetes provider.

BTW, I did incorporate the data source you used into our EKS example, since it's a more reliable way to refer to the certificates than using the module outputs. So that part works well.

I tried instance size t2.small and even changed it to m4.large, and that worked.

Oh! You know what I just noticed... this configuration is actually using mutually exclusive authentication options. 🤦 Sorry I didn't see that before! When multiple ways of authenticating are specified, such as when using config_path with token, the Kubernetes provider will combine them in ways that are difficult to predict. There's a chance it could be pulling the CA cert from config_path instead of using the one you're passing in explicitly with cluster_ca_certificate:

provider "kubernetes" {
    host = data.aws_eks_cluster.cluster.endpoint
    #cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
    token = data.aws_eks_cluster_auth.cluster.token
    config_path = "./kubeconfig_${var.cluster_name}"
    insecure = true
}

I have a fix for that provider bug that will make it easier to configure this.

In the meantime, can you try a configuration that does not specify config_path? Also check for any environment variables on the system that start with KUBE. Those are pulled into the provider config and can override statically configured settings like this, until we release that fix.

For example, this is how I check for environment variables on my system, and how I unset any that might interfere:

env | grep KUBE
unset KUBE_CONFIG_PATH

Assuming there are no KUBE environment variables interfering, the following provider config should work. This is the one I have been using in my tests since I saw this issue:

data "aws_eks_cluster_auth" "default" {
  name = "module.eks.cluster_id"
}

data "aws_eks_cluster" "default" {
  name = "module.eks.cluster_id"
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.default.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.default.token
}
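
If you want to verify that the provider authenticates correctly with this configuration, a minimal smoke test like the following should work. This is just a sketch: kubernetes_namespace is a standard resource in this provider, and the namespace name below is arbitrary.

# Creating a namespace exercises the provider's credentials end-to-end.
# The resource and namespace names here are illustrative.
resource "kubernetes_namespace" "smoke_test" {
  metadata {
    name = "provider-smoke-test"
  }
}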

Once the fix is released, the provider will be more straightforward in telling you exactly what options are incompatible with what other options, rather than silently combining some of the given options and ignoring some other options, as it does today.

ackris (Author) commented Feb 27, 2021

@dak1n1 Those are some interesting observations!

I will try out the config without config_path and let you know the results.

Have a great weekend!

ackris (Author) commented Feb 27, 2021

@dak1n1 I've applied the following k8s provider configuration, per your suggestion.

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}

I removed config_path, and it worked. I no longer see the error highlighted in the issue summary.

I suspect the issue highlighted in the summary arose from a combination of the wrong instance size and the mixed authentication scenario. I say this because I was able to connect to the k8s cluster using kubectl after modifying the instance size, with no error cropping up (as mentioned in my previous message), even though I hadn't removed config_path at that time.

Looking forward to the new version!

dak1n1 (Contributor) commented Feb 27, 2021

I'm glad to hear it worked! I'll go ahead and close this for now. If you find the issue comes up again, we can re-open it.

dak1n1 closed this as completed Feb 27, 2021
ackris (Author) commented Mar 1, 2021

Sure 👍

Another advantage of omitting the config_path variable from the k8s provider configuration: when a user tries to destroy the k8s cluster, Terraform won't throw a "kubeconfig file is not available in the path" error.
You can mention this in the documentation.

Have a great day!

dak1n1 (Contributor) commented Mar 1, 2021

@ackris I haven't encountered that error myself, but my team would be happy to take a look in a new GitHub issue, especially if you have a config that reproduces it. I wouldn't want to leave the bug in place and document it as expected behavior, since we can probably fix it instead. Offhand, I think it could be solved in the configuration by ensuring that Terraform knows about the dependency between the file and the Kubernetes provider: referencing the file by its resource attribute, rather than by an output or a hard-coded file name, establishes an implicit dependency between the two (see the sketch below). We could document the configuration for that, if it ends up being better solved in configs than in the provider code.
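
Here's a rough sketch of that idea, assuming the kubeconfig is written out with a local_file resource and that the module exposes a kubeconfig output (as terraform-aws-modules/eks v14 did); the resource name is illustrative:

# Illustrative only: write the kubeconfig to disk with a local_file resource.
resource "local_file" "kubeconfig" {
  content  = module.eks.kubeconfig
  filename = "${path.module}/kubeconfig_${var.cluster_name}"
}

# Referencing the resource attribute, rather than a hard-coded path,
# gives Terraform an implicit dependency: the file must exist before
# the Kubernetes provider tries to read it.
provider "kubernetes" {
  config_path = local_file.kubeconfig.filename
}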

ghost commented Mar 30, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

ghost locked as resolved and limited conversation to collaborators Mar 30, 2021