http-01 self check failed for domain #656

Closed
AmbroiseCouissin opened this issue Jun 14, 2018 · 51 comments

AmbroiseCouissin commented Jun 14, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
I get the message: http-01 self check failed for domain "<redacted>.com"

$ kubectl describe certificates website-cert

Name:         website-cert
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"certmanager.k8s.io/v1alpha1","kind":"Certificate","metadata":{"annotations":{},"name":"website-cert","namespace":"default"},"spe...
API Version:  certmanager.k8s.io/v1alpha1
Kind:         Certificate
Metadata:
  Cluster Name:
  Creation Timestamp:  2018-06-14T14:56:48Z
  Generation:          0
  Resource Version:    14514530
  Self Link:           /apis/certmanager.k8s.io/v1alpha1/namespaces/default/certificates/website-cert
  UID:                 2a172bc7-6fe3-11e8-a23d-00163e0067a2
Spec:
  Acme:
    Config:
      Domains:
        <redacted>.com
      Http 01:
        Ingress:  ingress
  Common Name:
  Dns Names:
    <redacted>.com
  Issuer Ref:
    Name:       letsencrypt-issuer-staging
  Secret Name:  website-cert
Status:
  Acme:
    Order:
      Challenges:
        Authz URL:  https://acme-staging-v02.api.letsencrypt.org/acme/authz/d4lkE7p4egv_GNHKOGkIZeNxANPhc4icVwX6ceSfvfQ
        Domain:     <redacted>.com
        Http 01:
          Ingress:  ingress
        Key:        VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw.UYrPMOqVi1SlKjy8hYE4t6mdtpuoNxCAANIaDzkZhw0
        Token:      VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw
        Type:       http-01
        URL:        https://acme-staging-v02.api.letsencrypt.org/acme/challenge/d4lkE7p4egv_GNHKOGkIZeNxANPhc4icVwX6ceSfvfQ/135522965
        Wildcard:   false
      URL:          https://acme-staging-v02.api.letsencrypt.org/acme/order/6285995/2040425
  Conditions:
    Last Transition Time:  2018-06-14T14:56:56Z
    Message:               http-01 self check failed for domain "<redacted>.com"
    Reason:                ValidateError
    Status:                False
    Type:                  Ready
Events:
  Type    Reason       Age   From          Message
  ----    ------       ----  ----          -------
  Normal  CreateOrder  4s    cert-manager  Created new ACME order, attempting validation...
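The Key field above has the shape <token>.<suffix>. In ACME (RFC 8555, section 8.1) the key authorization is the challenge token, a dot, and the base64url-encoded thumbprint of the account key, and that whole string is what the solver must return at /.well-known/acme-challenge/<token>. A minimal Go sketch that splits it, assuming only that format (splitKeyAuthorization is a hypothetical helper for illustration, not part of cert-manager):

```go
package main

import (
	"fmt"
	"strings"
)

// splitKeyAuthorization splits an ACME key authorization of the form
// "<token>.<account-key-thumbprint>" into its two parts. Both parts are
// base64url strings, which never contain '.', so the first '.' is the
// separator. ok is false if the separator is missing or either part is empty.
func splitKeyAuthorization(keyAuth string) (token, thumbprint string, ok bool) {
	i := strings.Index(keyAuth, ".")
	if i <= 0 || i == len(keyAuth)-1 {
		return "", "", false
	}
	return keyAuth[:i], keyAuth[i+1:], true
}

func main() {
	// Token and Key copied from the Certificate status above.
	const token = "VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw"
	const key = "VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw.UYrPMOqVi1SlKjy8hYE4t6mdtpuoNxCAANIaDzkZhw0"

	gotToken, thumbprint, ok := splitKeyAuthorization(key)
	fmt.Println("well-formed:", ok)
	fmt.Println("token matches:", gotToken == token)
	fmt.Println("thumbprint:", thumbprint)
}
```

If the token part of Key does not match the Token field, or the body served at the challenge URL differs from Key, the self check has nothing valid to compare against.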

If I look at the cert-manager logs:

I0614 15:03:16.667525       1 controller.go:177] certificates controller: syncing item 'default/website-cert'
I0614 15:03:16.667660       1 sync.go:239] Preparing certificate default/website-cert with issuer
I0614 15:03:16.667674       1 acme.go:159] getting private key (letsencrypt-issuer-staging->tls.key) for acme issuer default/letsencrypt-issuer-staging
I0614 15:03:16.668072       1 logger.go:27] Calling GetOrder
I0614 15:03:16.876856       1 logger.go:52] Calling GetAuthorization
I0614 15:03:17.065635       1 logger.go:72] Calling HTTP01ChallengeResponse
I0614 15:03:17.065678       1 prepare.go:263] Cleaning up old/expired challenges for Certificate default/website-cert
I0614 15:03:17.065696       1 logger.go:47] Calling GetChallenge
I0614 15:03:17.266766       1 helpers.go:162] Found status change for Certificate "website-cert" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-06-14 15:03:17.266752283 +0000 UTC m=+20046.828096097
I0614 15:03:17.266805       1 sync.go:241] Error preparing issuer for certificate default/website-cert: http-01 self check failed for domain "<redacted>.com"
E0614 15:03:17.272906       1 sync.go:168] [default/website-cert] Error getting certificate 'website-cert': secret "website-cert" not found
E0614 15:03:17.272958       1 controller.go:186] certificates controller: Re-queuing item "default/website-cert" due to error processing: http-01 self check failed for domain "<redacted>.com"

What you expected to happen:
The self check to succeed

How to reproduce it (as minimally and precisely as possible):
Here is my Ingress:

spec:
  tls:
    - hosts:
        - <redacted>.com
      secretName: website-cert
  rules:
    - host: <redacted>.com
      http:
        paths:
          - backend:
              servicePort: 80
              serviceName: website
            path: /
          - backend:
              servicePort: 8089
              serviceName: cm-acme-http-solver-7lvgt
            path: >-
              /.well-known/acme-challenge/VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw
apiVersion: extensions/v1beta1
status:
  loadBalancer:
    ingress:
      - ip: {IP}
kind: Ingress
metadata:
  uid: 6c304201-6fe2-11e8-8294-00163e020142
  resourceVersion: '14515959'
  name: ingress
  creationTimestamp: '2018-06-14T14:51:30Z'
  selfLink: /apis/extensions/v1beta1/namespaces/default/ingresses/ingress
  generation: 4
  namespace: default

Here is my Issuer:

apiVersion: certmanager.k8s.io/v1alpha1
kind: Issuer
metadata:
  name: letsencrypt-issuer-staging
  namespace: default
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: <redacted>

    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-issuer-staging
    http01: {}

Here is my certificate:

apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: website-cert
spec:
  secretName: website-cert
  dnsNames:
  - <redacted>.com
  acme:
    config:
    - http01:
        ingress: ingress
      domains:
      - <redacted>.com
  issuerRef:
    name: letsencrypt-issuer-staging

Anything else we need to know?:
When I navigate to

http://<redacted>.com/.well-known/acme-challenge/VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw

I get:

VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw.UYrPMOqVi1SlKjy8hYE4t6mdtpuoNxCAANIaDzkZhw0

Also, if I look at the logs of the cm-acme pod:

2018/06/14 17:31:58 [<redacted>.com] Validating request. basePath=/.well-known/acme-challenge, token=VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw
2018/06/14 17:31:58 [<redacted>.com] Comparing actual host '<redacted>.com' against expected '<redacted>.com'
2018/06/14 17:31:58 [<redacted>.com] Got successful challenge request, writing key...

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.1", GitCommit:"d4ab47518836c750f9949b9e0d387f20fb92260b", GitTreeState:"clean", BuildDate:"2018-04-12T14:26:04Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.7", GitCommit:"dd5e1a2978fd0b97d9b78e1564398aeea7e7fe92", GitTreeState:"clean", BuildDate:"2018-04-18T23:58:35Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: Aliyun Container Service
  • Install tools:
  • Others:

I've been struggling for two days. It's probably something really stupid on my side :)

Any idea?

AmbroiseCouissin (Author) commented Jun 15, 2018

The problem resolved itself today. I don't know how.

Thanks for cert-manager. It's really a great tool!

arianitu commented Jun 19, 2018

I'm running into the same thing. The logs say "writing key...", but if I look at the certificate, it says it's still validating.

Super buggy

munnerz (Member) commented Jun 19, 2018

dunjoye4real commented Jun 19, 2018

I am running into the same issue. How long does it take?

AmbroiseCouissin (Author) commented Jun 21, 2018

It took two to three days for me. But now, when I generate certificates for other subdomains, it takes less than a minute.

oanogin commented Jun 27, 2018

It's strange behavior. I have two DNS names (asd.team1.example.com and asd.another.com), and the interesting points are:

  • with the asd.team1.example.com domain, everything works fine
  • with certbot on this machine and both DNS names, everything works fine
  • both domains are also accessible (the HTTP version of the service works fine)

Only for asd.another.com can I not obtain a certificate with cert-manager, yet certbot on this machine works fine for it.

:(

hekonsek commented Jul 10, 2018

I encountered the same issue. Any address I choose for my app works except a single one, whose validation is blocked by the "http-01 self check failed for domain" error. In particular, http://foo.mydomain.com doesn't work, but http://foo-app.mydomain.com, for example, works like a charm and is validated in less than a minute.

I'm trying to figure out from the logs why this single subdomain fails the self check.

gabx commented Jul 28, 2018

Same error as the OP. Here is my certificate.yaml file. The Certificate has been created, but since then there has been no TLS certificate from Let's Encrypt.

% cat longhorn-certificate.yaml 
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: longhorn-thetradinghall-com
  namespace: default
spec:
  secretName: longhorn-thetradinghall-com-tls
  issuerRef:
    name: letsencrypt-cluster
    kind: ClusterIssuer
  dnsNames:
  - longhorn.thetradinghall.com
  acme:
    config:
    - http01:
        ingressClass: nginx
      domains:
      - longhorn.thetradinghall.com

maresende commented Aug 2, 2018

Same error as OP.(2)

Zetanova commented Aug 3, 2018

I have now updated from v2.5.
I tried both ingressClass: nginx and ingress: my-ingress.

With the latter, the ingress gets extended with the ACME challenge path
and can be queried successfully in the browser.

cert-manager still logs:
http-01 self check failed for domain "mydomain.tt"

Zetanova commented Aug 3, 2018

I solved it.

Hairpin mode on the NLB in front of the cluster wasn't working.

ngo275 commented Aug 10, 2018

I ran into the same problem, but when I tried again after a while, it succeeded!
This is weird.

Darwinyo commented Aug 20, 2018

I had the same problem today.
I have 5 domains to validate:
www.farmersflorals.com, farmersflorals.com, api.farmersflorals.com, identity.farmersflorals.com, and blog.farmersflorals.com.

Only 3 of them validated: www.farmersflorals.com, farmersflorals.com, and identity.farmersflorals.com.

The others did not.

All of them point to the same IP and I can access all of them, but only 3 validated. That's weird.

I'm using Helm chart version 0.4.1
with ingress on GKE.

Antiarchitect commented Aug 23, 2018

Having the same issue. I have two clusters tuned absolutely identically in terms of nginx-ingress and cert-manager, and the third one is lagging: the self check failed for all three domains. I have clusters for prod and staging; now it's QA's turn, and nothing works. The logs don't say anything useful.

Antiarchitect commented Aug 23, 2018

I'm on GCP using GKE. Removing nginx-ingress and turning it back on helped. The ephemeral external IP seems to have been preserved, magically.

innovia commented Aug 26, 2018

Same issue here: the challenge pods are up and running with no logs, and cert-manager is failing the self check.

I manually deleted the TLS secret and it then successfully generated the certificate.

I tested a pod with the same service account name creating and updating a secret, and it succeeded, so it's not an RBAC problem.

Here's my log:

sync.go:127] Certificate "web-backend-prod-tls" for ingress "backend-web-gunicorn-nginx-ingress-config" is up to date
controller.go:152] ingress-shim controller: syncing item 'backend-prod/cm-acme-http-solver-gp4h8'
logger.go:52] Calling GetChallenge
sync.go:49] Not syncing ingress backend-prod/cm-acme-http-solver-cr47k as it does not contain necessary annotations
controller.go:166] ingress-shim controller: Finished processing work item "backend-prod/cm-acme-http-solver-cr47k"
logger.go:52] Calling GetChallenge
controller.go:152] ingress-shim controller: syncing item 'backend-prod/cm-acme-http-solver-tmz4r'
logger.go:52] Calling GetChallenge
controller.go:152] ingress-shim controller: syncing item 'backend-prod/cm-acme-http-solver-srrjm'
controller.go:166] ingress-shim controller: Finished processing work item "backend-prod/cm-acme-http-solver-srrjm"
controller.go:195] certificates controller: Finished processing work item "backend-prod/web-backend-prod-tls"
controller.go:152] ingress-shim controller: syncing item 'backend-prod/backend-web-gunicorn-nginx-ingress-config'
service.go:35] No existing HTTP01 challenge solver service found for Certificate "backend-prod/web-backend-prod-tls". One will be created.
sync.go:124] Certificate "web-backend-prod-tls" for ingress "backend-web-gunicorn-nginx-ingress-config" already exists
helpers.go:188] Found status change for Certificate "web-backend-prod-tls" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-08-25 19:55:25.649155183 +0000 UTC m=+23.060496450
sync.go:174] Certificate backend-prod/web-backend-prod-tls scheduled for renewal in -728 hours
sync.go:49] Not syncing ingress backend-prod/cm-acme-http-solver-gp4h8 as it does not contain necessary annotations
ingress.go:33] Looking up Ingresses for selector certmanager.k8s.io/acme-http-domain=3600485562,certmanager.k8s.io/acme-http-token=1325141813
ingress.go:86] No existing HTTP01 challenge solver ingress found for Certificate "backend-prod/x-server-backend-prod-tls". One will be created.
controller.go:166] ingress-shim controller: Finished processing work item "backend-prod/cm-acme-http-solver-tmz4r"
ingress.go:86] No existing HTTP01 challenge solver ingress found for Certificate "backend-prod/web-backend-prod-tls". One will be created.
sync.go:49] Not syncing ingress backend-prod/cm-acme-http-solver-srrjm as it does not contain necessary annotations
controller.go:166] ingress-shim controller: Finished processing work item "backend-prod/backend-web-gunicorn-nginx-ingress-config"
controller.go:166] ingress-shim controller: Finished processing work item "backend-prod/cm-acme-http-solver-gp4h8"
pod.go:49] No existing HTTP01 challenge solver pod found for Certificate "backend-prod/web-backend-prod-tls". One will be created.
sync.go:282] Error preparing issuer for certificate backend-prod/web-backend-prod-tls: [http-01 self check failed for domain "web.backend.server.com", http-01 self check failed for domain "web.server.com"]

stefanvladvoinea commented Aug 28, 2018

I ran into the same issue today

sjbarrio commented Aug 29, 2018

I have the same issue too

jpfaria commented Sep 1, 2018

me too

xaralis commented Sep 5, 2018

@munnerz Could this issue be reopened? Seems to be happening to a lot of people, myself included.

kubectl reports "http-01 self check failed" while the solver logs claim "Got successful challenge request, writing key..." and seem to be stuck in a loop.

xaralis@h90-dockertest1-gateway1:~$ kubectl describe certificate cert-test-rancher-f-app-it-letsencrypt
Name:         cert-test-rancher-f-app-it-letsencrypt
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  certmanager.k8s.io/v1alpha1
Kind:         Certificate
Metadata:
  Creation Timestamp:  2018-09-05T12:16:48Z
  Generation:          1
  Resource Version:    5504
  Self Link:           /apis/certmanager.k8s.io/v1alpha1/namespaces/default/certificates/cert-test-rancher-f-app-it-letsencrypt
  UID:                 90110fb7-b105-11e8-8c33-00163e000206
Spec:
  Acme:
    Config:
      Domains:
        test.rancher.f-app.it
      Http 01:
        Ingress:
        Ingress Class:  nginx
  Common Name:          test.rancher.f-app.it
  Dns Names:
    test.rancher.f-app.it
  Issuer Ref:
    Kind:       ClusterIssuer
    Name:       letsencrypt-staging
  Secret Name:  test-rancher-f-app-it-letsencrypt-tls
Status:
  Acme:
    Order:
      Challenges:
        Authz URL:  https://acme-staging-v02.api.letsencrypt.org/acme/authz/fpiC_AFvxd3BK6450WXXWuu_18iHr0PQ6ewYHeT-e34
        Domain:     test.rancher.f-app.it
        Http 01:
          Ingress:
          Ingress Class:  nginx
        Key:              r7Lj-NqP1KWqmn76ccJdt-2nApm1WNFcVOUCjlyFzV0.TIJdwGgLPcC8d-Ki7ofbRruiCs47RHeBVc2TttYrT34
        Token:            r7Lj-NqP1KWqmn76ccJdt-2nApm1WNFcVOUCjlyFzV0
        Type:             http-01
        URL:              https://acme-staging-v02.api.letsencrypt.org/acme/challenge/fpiC_AFvxd3BK6450WXXWuu_18iHr0PQ6ewYHeT-e34/167935148
        Wildcard:         false
      URL:                https://acme-staging-v02.api.letsencrypt.org/acme/order/6875195/7290623
  Conditions:
    Last Transition Time:  2018-09-05T12:23:17Z
    Message:               http-01 self check failed for domain "test.rancher.f-app.it"
    Reason:                ValidateError
    Status:                False
    Type:                  Ready
Events:
  Type    Reason       Age   From          Message
  ----    ------       ----  ----          -------
  Normal  CreateOrder  6m    cert-manager  Created new ACME order, attempting validation...

Solver log:

2018/09/05 12:19:37 [test.rancher.f-app.it] Got successful challenge request, writing key...
2018/09/05 12:19:39 [test.rancher.f-app.it] Validating request. basePath=/.well-known/acme-challenge, token=r7Lj-NqP1KWqmn76ccJdt-2nApm1WNFcVOUCjlyFzV0
2018/09/05 12:19:39 [test.rancher.f-app.it] Comparing actual host 'test.rancher.f-app.it' against expected 'test.rancher.f-app.it'
2018/09/05 12:19:39 [test.rancher.f-app.it] Got successful challenge request, writing key...
2018/09/05 12:19:39 [test.rancher.f-app.it] Validating request. basePath=/.well-known/acme-challenge, token=r7Lj-NqP1KWqmn76ccJdt-2nApm1WNFcVOUCjlyFzV0
2018/09/05 12:19:39 [test.rancher.f-app.it] Comparing actual host 'test.rancher.f-app.it' against expected 'test.rancher.f-app.it'
2018/09/05 12:19:39 [test.rancher.f-app.it] Got successful challenge request, writing key...
2018/09/05 12:19:39 [test.rancher.f-app.it] Validating request. basePath=/.well-known/acme-challenge, token=r7Lj-NqP1KWqmn76ccJdt-2nApm1WNFcVOUCjlyFzV0
2018/09/05 12:19:39 [test.rancher.f-app.it] Comparing actual host 'test.rancher.f-app.it' against expected 'test.rancher.f-app.it'
2018/09/05 12:19:39 [test.rancher.f-app.it] Got successful challenge request, writing key...
2018/09/05 12:19:39 [test.rancher.f-app.it] Validating request. basePath=/.well-known/acme-challenge, token=r7Lj-NqP1KWqmn76ccJdt-2nApm1WNFcVOUCjlyFzV0
2018/09/05 12:19:39 [test.rancher.f-app.it] Comparing actual host 'test.rancher.f-app.it' against expected 'test.rancher.f-app.it'
2018/09/05 12:19:39 [test.rancher.f-app.it] Got successful challenge request, writing key...
2018/09/05 12:19:39 [test.rancher.f-app.it] Validating request. basePath=/.well-known/acme-challenge, token=r7Lj-NqP1KWqmn76ccJdt-2nApm1WNFcVOUCjlyFzV0
2018/09/05 12:19:39 [test.rancher.f-app.it] Comparing actual host 'test.rancher.f-app.it' against expected 'test.rancher.f-app.it'
2018/09/05 12:19:39 [test.rancher.f-app.it] Got successful challenge request, writing key...
2018/09/05 12:19:40 [test.rancher.f-app.it] Validating request. basePath=/.well-known/acme-challenge, token=r7Lj-NqP1KWqmn76ccJdt-2nApm1WNFcVOUCjlyFzV0
2018/09/05 12:19:40 [test.rancher.f-app.it] Comparing actual host 'test.rancher.f-app.it' against expected 'test.rancher.f-app.it'
2018/09/05 12:19:40 [test.rancher.f-app.it] Got successful challenge request, writing key...

Cert-manager log (repeats this again and again):

I0905 12:31:22.735124       1 sync.go:242] Preparing certificate default/cert-test-rancher-f-app-it-letsencrypt with issuer
I0905 12:31:22.735137       1 acme.go:169] getting private key (letsencrypt-staging->tls.key) for acme issuer kube-system/letsencrypt-staging
I0905 12:31:22.735520       1 logger.go:27] Calling GetOrder
I0905 12:31:22.952199       1 logger.go:57] Calling GetAuthorization
I0905 12:31:23.137759       1 logger.go:77] Calling HTTP01ChallengeResponse
I0905 12:31:23.137792       1 prepare.go:263] Cleaning up old/expired challenges for Certificate default/cert-test-rancher-f-app-it-letsencrypt
I0905 12:31:23.137826       1 logger.go:52] Calling GetChallenge
I0905 12:31:23.356645       1 ingress.go:33] Looking up Ingresses for selector certmanager.k8s.io/acme-http-domain=1490028511,certmanager.k8s.io/acme-http-token=923668044
I0905 12:31:23.356919       1 helpers.go:188] Found status change for Certificate "cert-test-rancher-f-app-it-letsencrypt" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-09-05 12:31:23.35691253 +0000 UTC m=+1534.926270721
I0905 12:31:23.357090       1 sync.go:244] Error preparing issuer for certificate default/cert-test-rancher-f-app-it-letsencrypt: http-01 self check failed for domain "test.rancher.f-app.it"
E0905 12:31:23.357267       1 sync.go:165] [default/cert-test-rancher-f-app-it-letsencrypt] Error getting certificate 'test-rancher-f-app-it-letsencrypt-tls': secret "test-rancher-f-app-it-letsencrypt-tls" not found
E0905 12:31:23.379806       1 controller.go:190] certificates controller: Re-queuing item "default/cert-test-rancher-f-app-it-letsencrypt" due to error processing: http-01 self check failed for domain "test.rancher.f-app.it"
I0905 12:32:23.379648       1 controller.go:181] certificates controller: syncing item 'default/cert-test-rancher-f-app-it-letsencrypt'

munnerz (Member) commented Sep 5, 2018

Hey @xaralis - this issue has been closed as this error message is expected whilst the self check is failing, and otherwise issues like this can become catch-alls for common misconfiguration by users.

If you are experiencing issues, please try and put together a reproducible test case and open a new issue with instructions for how it can be reproduced, if you think you've encountered an actual bug in the self checking flow so we can (1) encode that test case into an actual automated test and (2) fix that test 😄

There's a real wide variety of issues that can cause this message to be printed - although your case, with the self check pod clearly receiving requests, does seem odd. That said, the timestamps between the two differ by 12-13 minutes, so it seems like you may be looking at different self check attempts here.

We are trying to keep the repository's issue board clean of "support" related issues, and so would prefer if you could post on Slack in order to help debug the problem. Once we've identified that it is in fact a bug, and not simply a misconfiguration, opening an issue will then be the best route so we can track and triage an actual fix 😄

sjbarrio commented Sep 5, 2018

My problem persists only with the production server (https://acme-v02.api.letsencrypt.org/directory); with the staging server it works fine (https://acme-staging-v02.api.letsencrypt.org/directory). How can I get more information?

xaralis commented Sep 5, 2018

@munnerz OK, I'll try the slack tomorrow if this doesn't fix itself. What is the reasonable amount of time to wait?

saward commented Sep 5, 2018

Just in case it's helpful, I had a situation where the well-known path was set for both my main ingress and the one created by cert-manager. I think what happened is that the path set for my main ingress was the chosen one, and was automatically redirecting to SSL and failing because the certificate wasn't found.

Removing the main ingress completely and recreating seemed to resolve the issue for me.

innovia commented Sep 5, 2018

@saward, what do you mean by the main ingress for the well-known path? Is this a bug? Did you set it up manually before cert-manager? For me, once the secret was deleted, the certificate was created immediately with the already-running challenge pod.

saward commented Sep 5, 2018

It might be a bug. I'll explain a bit clearer, but I am not great with the terminology and concepts yet so I may not describe things well.

I have my own ingress I've created with a few rules. cert-manager appears to create its own ingress for the domain with a rule matching a specific path, the 'well-known' path, used by let's encrypt to verify ownership.

While cert-manager was trying and failing with the self check, I checked all ingresses (kubectl describe ing). I noticed that a 'well-known' path rule existed for both the ingress I'd created as well as the one created by cert manager, even though I had never added such a rule to my own ingress. I can only assume that cert manager created the rule under both ingresses, but why and under what conditions, I'm not sure.

Edit: I just remembered, this may have been a result of me misconfiguring the certificate object, leading to the creation of an extraneous rule.

xaralis commented Sep 6, 2018

For the record, my problem was:

I've been following the Rancher HA setup guide, which suggests having a public-facing nginx load balancer. That is OK, but the problem is that their sample nginx config redirects all HTTP traffic to HTTPS, and I had HTTPS enabled using their default self-signed certificate. That was obviously what stopped the challenge URL from being validated.

So, if you bump into this, make sure your traffic either allows HTTP or has HTTPS with a trusted cert.

szymonpk commented Sep 6, 2018

@xaralis I had the same issue, but that wasn't the cause. Nginx redirected HTTP traffic to the well-known location without a problem, yet validation still failed. I am using the nginx-ingress-0.23.0 and cert-manager-v0.4.1 Helm charts.

ernoaapa commented Sep 7, 2018

@saward I faced the same problem. What misconfiguration did you have?

saward commented Sep 7, 2018

@ernoaapa From the docs, I think I had something like this, except two separate ingresses for the same domain, when I shouldn't have had two for the same domain (not 100% sure, going off memory). i.e., having 'example.com' twice instead of 'example.com' in one and 'www.example.com' in the other:

apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: example-com
  namespace: default
spec:
  secretName: example-com-tls
  issuerRef:
    name: letsencrypt-staging
  commonName: example.com
  dnsNames:
  - www.example.com
  acme:
    config:
    - http01:
        ingressClass: nginx
      domains:
      - example.com
    - http01:
        ingress: my-ingress
      domains:
      - www.example.com

jfpucheu commented Sep 17, 2018

Hello,
in fact the problem is:

The ingress rule generated for the ACME challenge ends up being served over HTTPS, but with a bad certificate (because the certificate has not been generated yet), and that causes the challenge to fail.

The solution is to add the nginx.ingress.kubernetes.io/ssl-redirect: "false" annotation to the generated cm-acme-http-solver ingress rule.

Wait a minute and the certificate will be created.

To the developers: the solution should be to add this annotation to the ingress rule by default.

JEff

quorak commented Sep 21, 2018

Hey guys, I wasted a whole day getting this to run for our staging setup. I rolled nginx-ingress and cert-manager versions back and forth and tried all kinds of namespace/version combinations. In the end I got it working, and I'm almost certain it was because of the number in the domain name:

test.s1.staging.example.com always failed with "http-01 self check failed for domain",
but test.one.staging.example.com suddenly worked.

This looks like some strange regex bug to me. I hope this helps with further investigation.

kwilczek commented Sep 21, 2018

I had a similar problem. For some reason auto-regeneration stopped working, and I had the self check problem, etc. What helped me was deleting the old certificate data (the whole secret with its files) and the Certificate resource. After that, cert-manager managed ;) So a stale certificate was the problem, for an unknown reason.

kubectl delete secret myapp-tls (where .pem resides)
kubectl delete certificate myapp-tls

Before this I upgraded from 0.4.X to 0.5.0, but the version change had no effect on the problem.

kiuka commented Oct 16, 2018

Hey guys, maybe it helps someone:

In my case the problem was that I was using path: /* in my ingress, that was catching all the requests, even the /.well-known... ones.

I am quite new to ingress, so I'm not sure when they changed from /* to / but it seems to be working without the wildcard also.

Freyert (Contributor) commented Oct 18, 2018

Caveat Emptor: The Google Cloud Load Balancer Ingress is difficult to use with cert-manager http01

Suggest you use dns01 challenge instead.
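With DNS-01, no HTTP routing through the load balancer is involved: per RFC 8555 (section 8.4), the client publishes a TXT record at _acme-challenge.<domain> whose value is the unpadded base64url SHA-256 digest of the key authorization. A small Go sketch of that computation, assuming only the RFC's definition (dns01TXTValue is a hypothetical helper; the key string from the original report is reused purely as sample input, and my.cool.domain.com is the placeholder domain used in this comment):

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// dns01TXTValue returns the TXT record value for a DNS-01 challenge:
// the SHA-256 digest of the key authorization, base64url-encoded without padding.
func dns01TXTValue(keyAuthorization string) string {
	sum := sha256.Sum256([]byte(keyAuthorization))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

func main() {
	// Sample input only: a key authorization in the "<token>.<thumbprint>" format.
	keyAuth := "VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw.UYrPMOqVi1SlKjy8hYE4t6mdtpuoNxCAANIaDzkZhw0"
	fmt.Printf("_acme-challenge.my.cool.domain.com. TXT %q\n", dns01TXTValue(keyAuth))
}
```

Because validation then depends only on DNS, the GCLB ingress rules discussed below do not take part in issuance.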

I don't think you can use ingress shim

When using GCLB you MUST specify a preexisting ingress otherwise GCLB will create another load-balancer on a different IP. The self checks will fail because your loadbalancer with the correct DNS will not have the necessary rule.

apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
#...
spec:
  acme:
    config:
    - domains:
      - my.cool.domain.com
      http01:
        ingress: already-existing-ingress-resource-name

The ingress will not update unless everything is perfect

Namely, if the secret does not exist already the ingress will not update the loadbalancer rules. When working with GCLB always describe the ingress first when troubleshooting. Make sure events look happy.

The process I followed, starting from a preexisting certificate:

  1. Create a Certificate CRD as per cert-manager documentation
    • make sure you specify the preexisting ingress as above!
  2. Do not change anything about the ingress.
  3. Watch the ingress definition change! cert-manager adds a path to a service it creates in the cluster that hosts the file to serve. Try curling it.
  4. Find the load balancer in GCLB, does it match expectations? If not describe the ingress and remove errors until it has the challenge path listed as a backend!
  5. After the certificate is issued add the TLS secret to the ingress manually. Verify that GCLB has updated properly with a crazy random certificate name.

Because GCLB doesn't change the real configuration unless everything is OK, I believe you can migrate to cert-manager from a pre-shared certificate with no impact, especially if you follow these simplified steps:

  1. Create a Certificate resource as described in the cert-manager documentation
    • make sure you specify the preexisting Ingress as above!
  2. Add the TLS secret to the Ingress manually

I think @kiuka's point is interesting to consider as well. The GCLB Ingress asserts that /* points to a default backend. I manually deleted that rule from the LB several times, but it comes back almost instantly. I'm hoping this works despite /*.

Version Matters?

I believe I read in other issues that there are problems with certain versions of GCLB, but I don't remember where.

@irontoby irontoby commented Nov 27, 2018

@saward thanks!! Your description of the issue helped me track down the problem. In my case I'm running a cluster provisioned with Typhoon on DigitalOcean.

The problem was that I had left off the kubernetes.io/ingress.class annotation on my service's Ingress. Without this annotation, every Ingress controller will try to handle the incoming request. I had created the nginx Ingress Controller with --ingress-class=public, which meant I needed to annotate my Ingress with:

kubernetes.io/ingress.class: "public"

However the more "generic" version would probably also work:

kubernetes.io/ingress.class: "nginx"

Either way, once I added that annotation, the self-check request seemed to be routed to the correct solver, and the certificate update then succeeded.

@BumwooPark BumwooPark commented Dec 20, 2018

I have the same issue.

My domain's A record is applied successfully, and the site can be reached from other IP addresses.

But I keep getting http-01 self check failed for domain.

@lkoba lkoba commented Jan 10, 2019

Quoting @Freyert:

When using GCLB you MUST specify a preexisting ingress otherwise GCLB will create another load-balancer on a different IP. The self checks will fail because your loadbalancer with the correct DNS will not have the necessary rule.

I had this same issue, but I'm not using Certificate objects directly; I'm using Ingress annotations instead. To solve this on GKE when using Ingress annotations, you need to add this one:
certmanager.k8s.io/acme-http01-edit-in-place: "true"

This makes cert-manager add the challenge path to the existing Ingress instead of creating a new one.

@wilco wilco commented Feb 6, 2019

@quorak I had the same problem with the Let's Encrypt "http-01 self check failed for domain" error. When the domain contains a number (in my case a 2), the certificate is not issued; when I remove the number, it works. So rancher2.blabla.com does not work, but rancher.blabla.com works.

@EIrwin EIrwin commented Feb 25, 2019

I know there is a lot of chatter on this topic, and I wanted to share what I was seeing as well as what fixed it.

In my case, I had an Ingress successfully set up with cert-manager for two domains, mydomain.io and www.mydomain.io, and it had been running for a while without issues.

I recently added another host/rule/backend, api.mydomain.io, so my ingress.yaml now looks like the following:

kind: Ingress
metadata:
  name: web
  annotations:
    kubernetes.io/ingress.class: nginx
    certmanager.k8s.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
    - hosts:
        - mydomain.io
        - www.mydomain.io
        - api.mydomain.io  # THIS IS WHAT WAS ADDED
      secretName: letsencrypt-prod
  rules:
    - host: mydomain.io
      http:
        paths:
          - backend:
              serviceName: web
              servicePort: 80
    - host: www.mydomain.io
      http:
        paths:
          - backend:
              serviceName: web
              servicePort: 80
    - host: api.mydomain.io  # THIS IS WHAT WAS ADDED
      http:
        paths:
          - backend:
              serviceName: api
              servicePort: 80

I also saw the following in the ingress logs:

W0225 03:46:38.166926       7 controller.go:1080] Validating certificate against DNS names. This will be deprecated in a future version.
W0225 03:46:38.166932       7 controller.go:1085] SSL certificate "default/letsencrypt-prod" does not contain a Common Name or Subject Alternative Name for server "api.mydomain.io": x509: certificate is valid for mydomain.io, www.mydomain.io, not api.mydomain.io

Additionally (and this is what led me to this thread), the output of kubectl describe certificate showed a problem with the self check:

http-01 self check failed for domain "www.mydomain.io"

After trying different things, I deleted the letsencrypt-prod secret. Within seconds it was regenerated, and now everything works.

kubectl delete secret letsencrypt-prod

@rpetteruti rpetteruti commented Apr 17, 2019

Hello, I have the same problem described in this issue. I waited for 5 days, but cert-manager keeps looping on "http-01 self check failed for domain", and I don't know what to do to figure out the problem. On the same machine, if I shut down the Docker environment and use the certbot client, everything works fine. I'm using the http-01 challenge.

@rbq rbq commented Apr 17, 2019

I'd try curling the challenge endpoint from within your cluster. I had a similar problem, and in my case it was missing NAT reflection (or split DNS) that prevented cert-manager inside my cluster from verifying that the challenge was available.

@bertoost bertoost commented May 5, 2019

@rbq can you explain how to do that? I'm facing more or less the same issue and getting these errors:

I0505 20:10:05.505800       1 controller.go:206] challenges controller: syncing item 'example/letsencrypt-3860812899-1'
I0505 20:10:05.506489       1 ingress.go:49] Looking up Ingresses for selector certmanager.k8s.io/acme-http-domain=640746824,certmanager.k8s.io/acme-http-token=1208225418
I0505 20:10:05.545080       1 sync.go:176] propagation check failed: wrong status code '404', expected '200'

@rbq rbq commented May 6, 2019

@bertoost My Ingress was available from my workstation via HTTP, yet cert-manager complained that it couldn't verify that the HTTP challenge it added was visible to letsencrypt. So I finally figured out that it couldn't reach my WAN address from behind the NAT.

To verify, I started a container with curl (something like kubectl run -it --rm my-test --namespace=test --image=ubuntu -- bash) and tried to request anything from my Ingress using its public DNS name: curl myapp.example.com.
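If you would rather script the in-cluster check than curl it by hand, the sketch below approximates an HTTP-01 self check. It is not cert-manager's actual code. It requests http://<domain>/.well-known/acme-challenge/<token> directly, deliberately bypassing any proxy configured in the environment. It fails on anything other than a 200, and optionally also fails if the body does not match the expected key authorization. The function name and parameters are hypothetical. Run it from a pod inside the cluster so it goes through the same DNS and NAT paths that cert-manager does.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def http01_self_check(domain: str, token: str,
                      expected_key: Optional[str] = None,
                      timeout: float = 5.0) -> Tuple[bool, str]:
    """Approximate an HTTP-01 self check; returns (ok, reason).

    Illustrative sketch only, not cert-manager's implementation.
    """
    url = f"http://{domain}/.well-known/acme-challenge/{token}"
    # An empty ProxyHandler ignores http_proxy/HTTP_PROXY, so the request goes
    # straight to whatever address the local resolver returns for `domain`.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace").strip()
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for non-2xx answers, e.g. the 404 in the logs above.
        return False, f"wrong status code '{err.code}', expected '200'"
    except (urllib.error.URLError, OSError) as err:
        # DNS failure, connection refused, timeout, ...
        return False, f"could not reach '{url}': {err}"
    if status != 200:
        return False, f"wrong status code '{status}', expected '200'"
    if expected_key is not None and body != expected_key:
        return False, f"unexpected body {body!r}"
    return True, "ok"
```

For example, http01_self_check("myapp.example.com", "<token>") from inside a pod should fail in the same way cert-manager does if the request never reaches the solver.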

But yours looks like a totally different problem to me—it seems to be stuck before it even gets to the self-check.

@bertoost bertoost commented May 6, 2019

Hm, okay. It's on my hosted VPS (not my local machine), and the weird thing is that I can access the host and view the website. I have also successfully requested other certificates earlier for other projects, the same way, with the same setup, etc. So I really don't understand why this one is not working.

@rbq rbq commented May 6, 2019

@bertoost I think it would make sense to open a separate issue and post some configuration details.

@bertoost bertoost commented May 6, 2019

Somehow it is working now. I just wanted to continue working on it, and suddenly it had a valid certificate from Let's Encrypt. Weird, but okay.

@alepaez alepaez commented Jun 14, 2019

I just ran into this error:

wrong status code '404', expected '200'

This is my config:

---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: api
  namespace: production
  annotations:
    kubernetes.io/ingress.class: "nginx"
    certmanager.k8s.io/issuer: "letsencrypt-prod"
    certmanager.k8s.io/acme-challenge-type: http01
spec:
  tls:
  - hosts:
    - my.domain
    secretName: api-tls
  rules:
  - host: my.domain
    http:
      paths:
      - path: /
        backend:
          serviceName: api
          servicePort: 3000

I found this in my nginx ingress logs:

conflicting server name "my.domain" on 0.0.0.0:80, ignored

"GET /.well-known/acme-challenge/AK94LF_RCdMq_yriPKU7IlAdxPclVzNmIAxpIfEkX-c HTTP/1.1" 404 209 "http://my.domain/.well-known/acme-challenge/AK94LF_RCdMq_yriPKU7IlAdxPclVzNmIAxpIfEkX-c" "Go-http-client/1.1" "-"

I changed only the host in my ingress rule, and the issue was fixed:

---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: api
  namespace: production
  annotations:
    kubernetes.io/ingress.class: "nginx"
    certmanager.k8s.io/issuer: "letsencrypt-prod"
    certmanager.k8s.io/acme-challenge-type: http01
spec:
  tls:
  - hosts:
    - my.domain
    secretName: api-tls
  rules:
  - host: my2.domain
    http:
      paths:
      - path: /
        backend:
          serviceName: api
          servicePort: 3000

After that, I had to put the original host back in place.

@juwalter juwalter commented Aug 23, 2019

I also encountered this in the output of kubectl -n istio-system logs -f certmanager-1c1c1c1c1c1-xnxxnnxnx:

could not reach 'http://HOST.domain.NET/.well-known/acme-challenge/NldjKBM648vvka9A7VCSIKqqFwBCxM2DP5rIBgNr80s': wrong status code '404', expected '200'

After listing all Ingresses with kubectl get ingress --all-namespaces, I realized that cert-manager had created its own Ingress to intercept the .well-known/acme-challenge/ request from Let's Encrypt.

This "letsencrypt cm-acme-http-solver" Ingress is temporary. It exists to intercept and answer the request to .well-known/acme-challenge/. Its rules for matching a backend are identical to those of the original Ingress for my service, except that its paths: section contains a very specific path-matching rule. My service's Ingress initially had no path match and was probably chosen as the catch-all, which prevented the ACME challenge from resolving.
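To see every Ingress that claims a given host, including the temporary solver Ingress, you can filter the output of kubectl get ingress --all-namespaces -o json with a short script. This is a minimal sketch, and the function name is hypothetical. It reads the extensions/v1beta1 serviceName field used in this thread and falls back to the newer backend.service.name shape. An omitted path is reported as "<none>" rather than guessed, because how a missing path is matched depends on the ingress controller.

```python
import json
from typing import List, Tuple


def ingresses_for_host(ingress_list_json: str, host: str) -> List[Tuple[str, str, str]]:
    """List (namespace/name, path, service) for every Ingress rule matching `host`.

    Expects the JSON printed by `kubectl get ingress --all-namespaces -o json`.
    """
    data = json.loads(ingress_list_json)
    matches: List[Tuple[str, str, str]] = []
    for item in data.get("items", []):
        meta = item.get("metadata", {})
        ref = f"{meta.get('namespace', 'default')}/{meta.get('name', '?')}"
        for rule in item.get("spec", {}).get("rules", []) or []:
            if rule.get("host") != host:
                continue
            for p in (rule.get("http") or {}).get("paths", []):
                backend = p.get("backend", {})
                # extensions/v1beta1 uses serviceName; networking.k8s.io/v1 uses service.name
                service = backend.get("serviceName") or backend.get("service", {}).get("name", "?")
                # Report an omitted path explicitly instead of assuming "/".
                matches.append((ref, p.get("path", "<none>"), service))
    return matches
```

Look for a service rule listed with path <none> alongside a solver Ingress that has a /.well-known/acme-challenge/... path. That is the combination described above.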

Not working:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: istio
  name: my-dashboard-ingress
  namespace: frontend
spec:
  rules:
    - host: "host.domain.com"
      http:
        paths:
          - backend:
              serviceName: dashboard
              servicePort: 80

Working:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: istio
  name: my-dashboard-ingress
  namespace: frontend
spec:
  rules:
    - host: "host.domain.com"
      http:
        paths:
          - backend:
              serviceName: dashboard
              servicePort: 80
            path: /

(Notice the very last line: path: /)

I'm not sure whether this is just a lucky coincidence or whether it is really needed. YMMV.

@MakG10 MakG10 commented Nov 16, 2019

My case: I changed the NS DNS records, but after the TTL expired, the nameserver configured on the Kubernetes node was still pointing to the old server, which of course returned 404 for the HTTP challenge. You can verify this using curl from the node machine.

As a quick workaround, I temporarily changed the nameserver in /etc/resolv.conf on the node running cert-manager to Google's 8.8.8.8 and set dnsPolicy to Default in the cert-manager deployment. I guess you could also set dnsConfig on the cert-manager deployment instead of modifying the node's resolv.conf.
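You can confirm which address a node or pod actually resolves for your domain, as opposed to what public DNS reports. Resolve the name from that machine and compare the result against the IP you expect, such as your load balancer's public IP. The sketch below uses Python's standard socket.getaddrinfo, which goes through the system resolver configuration (for example /etc/resolv.conf) of the machine it runs on. The function names are hypothetical, and the expected IP in the usage note is a placeholder.

```python
import socket
from typing import Iterable, Set


def resolved_addresses(host: str) -> Set[str]:
    """Addresses the local system resolver returns for `host` (IPv4 and IPv6)."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {sockaddr[0] for _family, _type, _proto, _canon, sockaddr in infos}


def points_to(host: str, expected_ips: Iterable[str]) -> bool:
    """True if any address resolved for `host` is one of `expected_ips`."""
    return bool(resolved_addresses(host) & set(expected_ips))
```

For example, points_to("my.domain.example", ["203.0.113.10"]) returning False on the node running cert-manager would match the stale-nameserver situation described above.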

If there is a better solution, then I'd be happy to hear it.

@Bram-Zijp Bram-Zijp commented Dec 26, 2019

Removing the NGINX ingress, cert-manager, and the deployment that had the failing certificate, then adding them all back, fixed it for me too.
