This repository has been archived by the owner on Jan 12, 2022. It is now read-only.

GCP PR build failed waiting for Certificate ingress/https-cert to be ready #172

Closed
sreis opened this issue Dec 15, 2020 · 1 comment
Labels
type: ci-instability may or may not be a user-facing problem, too


Describe the bug

@jgehrcke opened a PR and its CI run failed: https://buildkite.com/opstrace/prs/builds/3284#244c5925-a40b-4a99-9f73-082e059539fe/240-6841

[2020-12-15T11:57:08Z] 2020-12-15T11:57:08.586Z info: waiting for 1 Certificates
[2020-12-15T11:57:08Z] 2020-12-15T11:57:08.586Z debug:     Waiting for Certificate ingress/https-cert to be ready
[2020-12-15T11:57:13Z] 2020-12-15T11:57:13.258Z warning: cluster creation attempt timed out after 2400 seconds

While triaging the cluster I found cert-manager/cert-manager#1507, but restarting the cert-manager pod didn't fix the problem.

This was in the cert-manager logs before I restarted the pod:

E1215 10:55:06.642995       1 sync.go:183] cert-manager/controller/challenges "msg"="propagation check failed" "error"="DNS record for \"default.bk-3284-19e-g.opstrace.io\" not yet propagated" "dnsName"="default.bk-3284-19e-g.opstrace.io" "resource_kind"="Challenge" "resource_name"="https-cert-j98dj-2004239255-1344158417" "resource_namespace"="ingress" "resource_version"="v1" "type"="DNS-01"

After the restart, it complained about a different DNS name:

E1215 11:48:48.493979       1 sync.go:183] cert-manager/controller/challenges "msg"="propagation check failed" "error"="DNS record for \"system.bk-3284-19e-g.opstrace.io\" not yet propagated" "dnsName"="system.bk-3284-19e-g.opstrace.io" "resource_kind"="Challenge" "resource_name"="https-cert-j98dj-2004239255-129808158" "resource_namespace"="ingress" "resource_version"="v1" "type"="DNS-01"
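When comparing log lines like the two above, it can help to pull out the `"key"="value"` fields rather than read them by eye. The following is a minimal sketch, assuming the log line keeps the format shown here; `parse_fields` and `PAIR_RE` are hypothetical names, not part of cert-manager, and the sample line is shortened to a few of its fields.

```python
import re

# Matches "key"="value" pairs; values may contain backslash-escaped quotes.
PAIR_RE = re.compile(r'"((?:[^"\\]|\\.)*)"="((?:[^"\\]|\\.)*)"')


def parse_fields(line: str) -> dict:
    """Return the key/value fields found in one structured log line."""
    fields = {}
    for key, value in PAIR_RE.findall(line):
        # Undo the \" escaping used inside quoted values.
        fields[key] = value.replace('\\"', '"')
    return fields


# Shortened copy of the post-restart log line from this report.
line = (
    'E1215 11:48:48.493979       1 sync.go:183] cert-manager/controller/challenges '
    '"msg"="propagation check failed" '
    '"error"="DNS record for \\"system.bk-3284-19e-g.opstrace.io\\" not yet propagated" '
    '"dnsName"="system.bk-3284-19e-g.opstrace.io" '
    '"resource_kind"="Challenge" "type"="DNS-01"'
)

fields = parse_fields(line)
print(fields["dnsName"], "->", fields["msg"])
```

Running the same function over both log lines makes it easy to see that only `dnsName` and `resource_name` changed between the two failures.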

I deleted the CertificateRequest and manually updated the TXT record in the Google Cloud DNS console with the new token. The lookup right after the change still returned a different value:

> dig TXT _acme-challenge.system.bk-3284-19e-g.opstrace.io. 
; <<>> DiG 9.16.1-Ubuntu <<>> TXT _acme-challenge.system.bk-3284-19e-g.opstrace.io.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28967
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;_acme-challenge.system.bk-3284-19e-g.opstrace.io. IN TXT
;; ANSWER SECTION:
_acme-challenge.system.bk-3284-19e-g.opstrace.io. 22 IN	TXT "3Gor8YS3A7G0HBuSiCvqoqnnme3U_0QKG2gjeKq-j48"
;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Tue Dec 15 10:54:06 -01 2020
;; MSG SIZE  rcvd: 133

The challenge token was taken from:

> kubectl -n ingress get orders.acme.cert-manager.io https-cert-dxr47-2004239255 -o yaml
apiVersion: acme.cert-manager.io/v1
kind: Order
metadata:
  annotations:
    cert-manager.io/certificate-name: https-cert
    cert-manager.io/certificate-revision: "1"
    cert-manager.io/private-key-secret-name: https-cert-c7zmj
    opstrace: owned
  creationTimestamp: "2020-12-15T11:49:17Z"
  generation: 1
    manager: controller
    operation: Update
    time: "2020-12-15T11:49:17Z"
  name: https-cert-dxr47-2004239255
  namespace: ingress
  ownerReferences:
  - apiVersion: cert-manager.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: CertificateRequest
    name: https-cert-dxr47
    uid: 8ab84a23-d8a0-4dcf-a6cb-b6de078e5939
  resourceVersion: "33186"
  selfLink: /apis/acme.cert-manager.io/v1/namespaces/ingress/orders/https-cert-dxr47-2004239255
  uid: fee1ad74-0752-4573-9caa-173075955269
spec:
  dnsNames:
  - bk-3284-19e-g.opstrace.io
  - system.bk-3284-19e-g.opstrace.io
  - '*.system.bk-3284-19e-g.opstrace.io'
  - default.bk-3284-19e-g.opstrace.io
  - '*.default.bk-3284-19e-g.opstrace.io'
  issuerRef:
    kind: Issuer
    name: letsencrypt-staging
  request: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJREhqQ0NBZ1lDQVFBd0FEQ0NBU0l3RFFZSktvWklodmNOQVFFQkJRQURnZ0VQQURDQ0FRb0NnZ0VCQU9ZTQpUSG03bGhFL0gxN1JjdjBhOW5mdmpxTFRDOEtEMTNrdnFCS09jaHVDY05oZm1pVUZVMGJqV0xCbnNkZG1QUE5hCm5BNmVWdU42cDdnNEFaK2lTQzRRa0k1a1U2ODNVQWRpb1IrM2RUVTBGQW01RTJaZ1B4OVNPNVVJMjlnR09xdWQKSmloaEtzS1M2VXVJN2dqRS96elBWVC9OYlpPVnUwWmovWitVVUM4dmw3elpxVUtNRHl5cjRaeld6WVJmMTBTMQpoNmJIUzZQZ2hVZWcvOHB5ZVdRK3IySWJjYUs5VGhIVkxFbTNidjlmc2ZFRno3R3VTOHdhVUs0ZFdYS2xnTXdHClRvVThDL0JtWWF6VTA4aWw5VUhLZHJjQ3Z1S2dmWWRjOVkzMjlEbFRUOHRCbmJvbmNQUWZIZFF1NWludHF0NnoKMHQ2ME4vallreHJ1QnRzT1RlTUNBd0VBQWFDQjJEQ0IxUVlKS29aSWh2Y05BUWtPTVlISE1JSEVNSUcwQmdOVgpIUkVFZ2F3d2dhbUNHV0pyTFRNeU9EUXRNVGxsTFdjdWIzQnpkSEpoWTJVdWFXK0NJSE41YzNSbGJTNWlheTB6Ck1qZzBMVEU1WlMxbkxtOXdjM1J5WVdObExtbHZnaUlxTG5ONWMzUmxiUzVpYXkwek1qZzBMVEU1WlMxbkxtOXcKYzNSeVlXTmxMbWx2Z2lGa1pXWmhkV3gwTG1KckxUTXlPRFF0TVRsbExXY3ViM0J6ZEhKaFkyVXVhVytDSXlvdQpaR1ZtWVhWc2RDNWlheTB6TWpnMExURTVaUzFuTG05d2MzUnlZV05sTG1sdk1Bc0dBMVVkRHdRRUF3SUZvREFOCkJna3Foa2lHOXcwQkFRc0ZBQU9DQVFFQVJHSjhUTEFwWmxibTU4R1M0c2NSNG1MT3RJS09HSEZNM3NkUFpZdUcKTE9jYWw0MEVsbmlNQi9NdFlNVUJxdmhQN2kycm1FT1FIZFJVWlMxbVZiTDQ1dkpjNmhvRURHYzFzUEZQU0RVWgpCbzZFQzdYek1oRTBzNlczK0F0bUJjZ0c4VEs1QU81MTBNdzlIWDFFSkYzMmYreEZ4MFhVL09LbWZPNlBJSHNoCk1ybWR4VUhkL1dkRXc3V1loaWZjV21XZmltMkJEbFN3R2x1M1NyRTVxN2E5SWtzNVNDbHl3c0o3R1VWTldGYjgKVm1Wdkp4SWU5cTVWS0JCMVhxMytuZUNERjZDaklJOVJDaE82dTgyNUUwbzl6bWd3VlRoWnNCNXBsVE5KQmc0aAoxUVE0R1g3TmNPdndYSVBkSmd0dnRZbUJTZWZOL2NrNk1ZbFREcVV6TkdOa3ZBPT0KLS0tLS1FTkQgQ0VSVElGSUNBVEUgUkVRVUVTVC0tLS0tCg==
status:
  authorizations:
  - challenges:
    - token: MA2X2QC-pp_PCSOsf9OwNkBa361kbjWAkQ1OMR6WS8Q
      type: dns-01
      url: https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/172950872/HOoKPg
    identifier: default.bk-3284-19e-g.opstrace.io
    initialState: valid
    url: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/172950872
    wildcard: true
  - challenges:
    - token: Luh0xiQ3Q8T7x-B8iru7CfSX5VuUEDUGiHP6Xhp2DjI
      type: dns-01
      url: https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/172950873/yGPuNQ
    identifier: system.bk-3284-19e-g.opstrace.io
    initialState: pending
    url: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/172950873
    wildcard: true
  - challenges:
    - token: ym1Ul5srFqf5g1OALVlGdplIVYYSB5Pocp-mnR0Vwrw
      type: dns-01
      url: https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/172950874/u-G88w
    identifier: bk-3284-19e-g.opstrace.io
    initialState: valid
    url: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/172950874
    wildcard: false
  - challenges:
    - token: Cu2IVXk9gbU3YY0h5t_wx9QjbOg4D_sWeVLxDbs2EBE
      type: dns-01
      url: https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/172950875/Sj2QiQ
    identifier: default.bk-3284-19e-g.opstrace.io
    initialState: valid
    url: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/172950875
    wildcard: false
  - challenges:
    - token: eAblhBsmpUQ6Pa9MBOh9EI1tgTrZwFNZlnRokY5rZt0
      type: http-01
      url: https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/172950876/1jffSg
    - token: eAblhBsmpUQ6Pa9MBOh9EI1tgTrZwFNZlnRokY5rZt0
      type: dns-01
      url: https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/172950876/l7b1Yw
    - token: eAblhBsmpUQ6Pa9MBOh9EI1tgTrZwFNZlnRokY5rZt0
      type: tls-alpn-01
      url: https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/172950876/6dPuSA
    identifier: system.bk-3284-19e-g.opstrace.io
    initialState: pending
    url: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/172950876
    wildcard: false
  finalizeURL: https://acme-staging-v02.api.letsencrypt.org/acme/finalize/17105378/202184011
  state: pending
  url: https://acme-staging-v02.api.letsencrypt.org/acme/order/17105378/202184011
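The Order status above lists five authorizations, and only some of them are not yet valid. A small sketch for listing those, assuming the Order was fetched with `-o json` and loaded into a dict; `pending_authorizations` is a hypothetical helper and the dict below is a trimmed copy of the status shown above. Note that `initialState` appears to record the state when the authorization was first fetched, so it may lag behind the live state.

```python
def pending_authorizations(order: dict) -> list:
    """List (identifier, wildcard) pairs whose initialState is 'pending'."""
    auths = order.get("status", {}).get("authorizations", [])
    return [
        (a["identifier"], a.get("wildcard", False))
        for a in auths
        if a.get("initialState") == "pending"
    ]


# Trimmed copy of the Order status from this report (challenges omitted).
order = {
    "status": {
        "authorizations": [
            {"identifier": "default.bk-3284-19e-g.opstrace.io", "initialState": "valid", "wildcard": True},
            {"identifier": "system.bk-3284-19e-g.opstrace.io", "initialState": "pending", "wildcard": True},
            {"identifier": "bk-3284-19e-g.opstrace.io", "initialState": "valid", "wildcard": False},
            {"identifier": "default.bk-3284-19e-g.opstrace.io", "initialState": "valid", "wildcard": False},
            {"identifier": "system.bk-3284-19e-g.opstrace.io", "initialState": "pending", "wildcard": False},
        ]
    }
}

for identifier, wildcard in pending_authorizations(order):
    print(("*." if wildcard else "") + identifier)
```

For this Order, both pending entries belong to the `system` name (wildcard and non-wildcard), which matches the name in the post-restart log line.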

It took several minutes for the TXT record to update, and by that time the cluster was already being deleted:

> dig TXT _acme-challenge.system.bk-3284-19e-g.opstrace.io. 

; <<>> DiG 9.16.1-Ubuntu <<>> TXT _acme-challenge.system.bk-3284-19e-g.opstrace.io.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39347
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;_acme-challenge.system.bk-3284-19e-g.opstrace.io. IN TXT

;; ANSWER SECTION:
_acme-challenge.system.bk-3284-19e-g.opstrace.io. 59 IN	TXT "eAblhBsmpUQ6Pa9MBOh9EI1tgTrZwFNZlnRokY5rZt0"

;; Query time: 68 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Tue Dec 15 11:00:49 -01 2020
;; MSG SIZE  rcvd: 133
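One thing worth checking against the output above: the TXT value returned here is identical to the raw `token` from the Order. Under RFC 8555 (section 8.4), the DNS-01 record is expected to contain the base64url-encoded SHA-256 digest of the key authorization (`token`, a dot, then the account key thumbprint), not the token itself. The sketch below shows that computation under stated assumptions: `dns01_txt_value` and `b64url` are hypothetical helper names, and the thumbprint is a placeholder because the real account key is not part of this report.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as ACME requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_txt_value(token: str, account_thumbprint: str) -> str:
    """Compute the TXT record content for an ACME DNS-01 challenge."""
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return b64url(digest)


# Token copied from the Order above.
token = "eAblhBsmpUQ6Pa9MBOh9EI1tgTrZwFNZlnRokY5rZt0"

# Placeholder only: the real value is the JWK thumbprint of the ACME account key.
thumbprint = b64url(hashlib.sha256(b"placeholder-account-key").digest())

value = dns01_txt_value(token, thumbprint)
print(value)
```

The result is a 43-character string that differs from the token. If the record is set by hand again, the value to publish should probably come from the corresponding cert-manager Challenge resource rather than from the `token` field of the Order; treat that as an assumption to verify against the cert-manager API reference.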