New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Certificate issuing doesn't work with rewrite-target #286

Closed
turbolent opened this Issue Jan 27, 2018 · 12 comments

Comments

Projects
None yet
8 participants
@turbolent

turbolent commented Jan 27, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

I tried issuing a certificate for a host. I'm using the nginx ingress controller and its nginx.ingress.kubernetes.io/rewrite-target annotation, which allows rewriting the requested path when forwarding it to a service. The ingress looks something like this (I tried this for my own domain, not example.org, I only use it as an example here):

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    kubernetes.io/tls-acme: "true"
    nginx.ingress.kubernetes.io/rewrite-target: /
  name: test
spec:
  rules:
  - host: example.org
    http:
      paths:
      - backend:
          serviceName: test-service
          servicePort: 80
        path: /test
  tls:
  - hosts:
    - example.org
    secretName: example-tls

Logs:

I0127 19:46:14.045231       1 controller.go:187] certificates controller: syncing item 'default/example-tls'
I0127 19:46:14.046410       1 sync.go:107] Error checking existing TLS certificate: secret "example-tls" not found
I0127 19:46:14.046558       1 sync.go:238] Preparing certificate with issuer
I0127 19:46:14.047244       1 prepare.go:239] Compare "" with "https://acme-v01.api.letsencrypt.org/acme/reg/123456789"
I0127 20:01:16.530091       1 helpers.go:165] Setting lastTransitionTime for Certificate "example-tls" condition "Ready" to 2018-01-27 20:01:16.530040382 +0000 UTC m=+515317.420158477
I0127 20:01:16.530692       1 sync.go:242] Error preparing issuer for certificate: error waiting for key to be available for domain "example.org": context deadline exceeded
E0127 20:01:16.552113       1 sync.go:190] [default/example-tls] Error getting certificate 'example-tls': secret "example-tls" not found
E0127 20:01:16.552219       1 controller.go:196] certificates controller: Re-queuing item "default/example-tls" due to error processing: error waiting for key to be available for domain "example.org": context deadline exceeded
I0127 20:01:16.552288       1 controller.go:187] certificates controller: syncing item 'default/example-tls'
...

I think the cause is that cert-manager adds an additional rule to the ingress for the validation:

- backend:
    serviceName: cm-main-tls-uwson
    servicePort: 8089
  path: /.well-known/acme-challenge/XXX

It then tries to parse the challenge from the path (XXX), but it has been rewritten due to the use of the nginx ingress controller annotation.

Maybe it is possible to change the path that is added by cert-manager to just /.well-known/acme-challenge, so even if the rewrite is performed, the challenge component of the path is still avaialable?

What you expected to happen:

A certificate to be issued.

How to reproduce it (as minimally and precisely as possible):

See ingress definition above.

Environment:

  • Kubernetes version (use kubectl version):

    Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.2", GitCommit:"5fa2db2bd46ac79e5e00a4e6ed24191080aa463b", GitTreeState:"clean", BuildDate:"2018-01-18T10:09:24Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.2", GitCommit:"5fa2db2bd46ac79e5e00a4e6ed24191080aa463b", GitTreeState:"clean", BuildDate:"2018-01-18T09:42:01Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
    
  • Cloud provider or hardware configuration**: Self-hosted

@whereisaaron

This comment has been minimized.

Contributor

whereisaaron commented Jan 28, 2018

Hi @turbolent. By default for Ingress-triggered certificates cert-manager will modify the existing Ingress and use an http01 challenge. This means it can get screwed up by ingress controller special behaviors - as in your example.

You can avoid this drama by getting cert-manager to create its own temporary Ingress resource. You trigger this by creating a Certificate resource with http01.ingressClass set to your kubernetes.io/ingress.class value, e.g. the default nginx. See an example below.

@munnerz is there or should there be an Ingress annotation that can trigger cert-manager to create its own Ingress rather than modify?

apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: prometheus.example.com
  namespace: prometheus
spec:
  secretName: prometheus.example.com
  issuerRef:
    name: letsencrypt-staging
    kind: ClusterIssuer
  dnsNames:
  - prometheus.example.com
  acme:
    config:
    - http01:
        ingressClass: nginx
      domains:
      - prometheus.example.com

In general this is a far more robust way to create certificates, though some abhorrent ingress controller implementations don't correct support multiple Ingress resources for the same domain (a mandatory ability since Ingresses are namespaced). Apparently the commercial Nginx Plus ingress control is broken in this way. YMMV.

@turbolent

This comment has been minimized.

turbolent commented Jan 28, 2018

Thanks for the explanation and the example @whereisaaron!

Does cert-manager stop adding it's own rule to an ingress for the http01 challenge once I create a certificate as you described above, or does it need to exist before creating the ingress which should get a certificate?

I worked around it for now, but I think this will still be useful documentation for somebody else who might run into this.

@munnerz

This comment has been minimized.

Member

munnerz commented Jan 29, 2018

If you use the ingressClass field instead of ingress, it doesn't matter when the applications ingress exists as it the Certificate resource and your ingress resource are completely decoupled. cert-manager will then create a new Ingress resource each time a validation needs to be performed, which an ingress controller like nginx will flatten down to merge with any other ingresses for the same hostname/domain.

This has the advantage of allowing per-ingress rewrite-targets, which is why this resolves your problem.

@whereisaaron not right now, but we also have the 'setting annotations on ingresses/services to disable auth etc' problem to solve, which is of a similar ilk. Not sure on what our best strategy should be right now, it'd be good to write up a few solutions somewhere so we can model it/mock up some manifests.

@mactr0n

This comment has been minimized.

mactr0n commented Jan 31, 2018

@whereisaaron I got the same problem as @turbolent but by using shim. Since the certificate is created by ingress-shim I can't add ingressClass: nginx. Is it possible to create another annotation for that or is there a better way?

@whereisaaron

This comment has been minimized.

Contributor

whereisaaron commented Feb 1, 2018

HI @mactr0n, right now the better way is to create the Certificate resource yaml yourself, with ingressClass: nginx, instead of annotating the Ingress and relying on ingress-shim.

@Jokero

This comment has been minimized.

Jokero commented Apr 18, 2018

+1 to fail with ingress-shim: Error checking ACME domain validation: error waiting for key to be available for domain "mycooldomain.com": context deadline exceeded

@bananaspliff

This comment has been minimized.

bananaspliff commented Apr 18, 2018

@MartinodF

This comment has been minimized.

MartinodF commented Apr 19, 2018

Not only is the temporary workaround proposed by whereisaaron far from optimal, but the failing behaviour seems to be associated with some kind of memory leak.

One of our domains expired before the ingress was removed, and cert-manager has been trying to renew it for a few days. Here's the pod logs (the message repeats every 15 minutes):

I0419 12:01:01.486498       1 controller.go:187] certificates controller: syncing item 'turn/web-production-tls'
I0419 12:01:01.486981       1 sync.go:238] Preparing certificate with issuer
I0419 12:01:01.487532       1 prepare.go:239] Compare "" with "https://acme-v01.api.letsencrypt.org/acme/reg/12345678"
I0419 12:01:01.487649       1 prepare.go:239] Compare "https://acme-v01.api.letsencrypt.org/acme/reg/12345678" with "https://acme-v01.api.letsencrypt.org/acme/reg/12345678"
I0419 12:16:06.391451       1 sync.go:242] Error preparing issuer for certificate: error waiting for key to be available for domain "www.example.com": context deadline exceeded
I0419 12:16:06.401255       1 sync.go:200] Certificate scheduled for renewal in -53 hours
E0419 12:16:06.401535       1 controller.go:196] certificates controller: Re-queuing item "turn/web-production-tls" due to error processing: error waiting for key to be available for domain "www.example.com": context deadline exceeded

Here's the pod memory usage, from before the domain expired up until today:

image

The steeper part of the graph is associated with having two domains failing at the same time.
As you can see, memory usage went from a stable 24Mb to over 800Mb in a few days after being constant for a few weeks of normal operation.

Should I open a separate issue for this (apparent) memory leak?
Is there any additional info I can provide you?

@munnerz

This comment has been minimized.

Member

munnerz commented Apr 19, 2018

@MartinodF #493 tracks a new annotation that should allow you to do this without having to manually specify your Certificate resource.

Regarding the memory leak - I think this is due to the way HTTP challenges were cleaned up in the past. ACMEv2 (i.e. #309) has drastically changed how it works, and so I don't think that bug will exist still.

Are you able to confirm by trying the v0.3.0-alpha.1 release? You'll also need to update the acme URL in all of your Issuer resources after upgrading.

@MartinodF

This comment has been minimized.

MartinodF commented Apr 19, 2018

@munnerz thanks for the pointer to #493. Glad to see a more complete solution is being worked on!

As for the memory leak, we just deployed a working v0.3.0-alpha.1 pod. We'll report back to you in a few hours after we've gathered some data about its memory consumption. Hopefully the leak is gone in the ACMEv2 issuer!

@MartinodF

This comment has been minimized.

MartinodF commented Apr 19, 2018

@munnerz after a couple of hours of trying to renew the certificate once a minute, I can confirm that memory usage is stable around 21Mb on version v0.3.0-alpha.1.

Thank you for the help and for the great work on cert-manager!

@munnerz

This comment has been minimized.

Member

munnerz commented Jun 5, 2018

Closing this as we now support the edit-in-place annotation for ingress-shim!

@munnerz munnerz closed this Jun 5, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment