ingress-nginx intermittently serves the default certificate instead of a configured tls certificate for rules without a host #7153
Comments
Tried a new cluster with a static manifest for 0.46.0 instead of the 1-Click App and it's the same problem:
|
Started to explore the nginx Lua code that serves the cert and it looks like it'll be non-trivial to understand and add debug to, so for now I just used a workaround of specifying a default certificate which works:
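The exact command used for this workaround is not preserved here. A minimal sketch, assuming the workaround was the controller's `--default-ssl-certificate` flag (the one described in the ingress-nginx TLS user guide), could look like this. The helper name `default_cert_arg` and the namespace/secret values are hypothetical.

```shell
# Build the controller argument that names a fallback certificate.
# $1 = namespace of the TLS secret, $2 = secret name.
# Both values are illustrative assumptions, not taken from the original comment.
default_cert_arg() {
  printf '%s=%s/%s\n' '--default-ssl-certificate' "$1" "$2"
}

# Print the argument for a secret named example-tls in namespace testns1.
default_cert_arg testns1 example-tls
```

The printed argument would be added to the controller container's args. With the Helm chart, the same setting is usually passed through `controller.extraArgs.default-ssl-certificate`; treat that as an assumption to check against the chart version in use.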
|
I guess you could also experiment with creating a (I imagine that if the problem is that the config does not get re-loaded, perhaps having the |
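@irbekrm If I understand correctly, you mean creating the certificate resource manually and then pointing to that in the ingress with spec.tls.hosts and spec.tls.secretName, without metadata.annotations.cert-manager.io/issuer, right? If so, I tried that and it didn't work: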
FYI, every time I do a test, I destroy the cluster, load balancer, and DNS A record, and start fresh, though I'm skipping all the other initial steps enumerated in the original report; this also means the IP and other things may change after I post this comment as I do further tests. |
Show logs of your curl request from controller pod.
Thanks,
; Long
…On Sat, 5 Jun, 2021, 7:08 AM kevgrig, ***@***.***> wrote:
@irbekrm <https://github.com/irbekrm> If I understand correctly, you mean
creating the certificate resource manually and then pointing to that in the
ingress with spec.tls.hosts and spec.tls.secretName without
metadata.annotations.cert-manager.io/issuer, right?
If so, I tried that and it didn't work:
1. Create certificate manually:
printf '{"apiVersion":"cert-manager.io/v1","kind":"Certificate","metadata":{"name":"%s","namespace":"%s"},"spec":{"secretName":"%s","duration":"2160h","renewBefore":"360h","subject":{"organizations":["%s"]},"isCA":false,"privateKey":{"algorithm":"RSA","encoding":"PKCS1","size":4096},"usages":["server auth","client auth"],"dnsNames":["%s"],"issuerRef":{"name":"%s"}}}' "my-certificate" "testns1" "my-certificate-key" "MyOrganization" "example.myplaceonline.com" "letsencrypt-production-issuer" | kubectl create -f -
2. Certificate issued:
[...]
Normal Issuing 7s cert-manager The certificate has been successfully issued
3. Certificate looks good:
$ kubectl get secret my-certificate-key --namespace=testns1 -o "jsonpath={.data['tls\.crt']}" | base64 -d | openssl x509 -in - -text | grep Issuer:
Issuer: C = US, O = Let's Encrypt, CN = R3
4. Edit the ingress:
$ EDITOR=vi kubectl edit ingress ingress1 --namespace=testns1
To add:
spec:
  tls:
  - hosts:
    - example.myplaceonline.com
    secretName: my-certificate-key
5. Ingress looks good:
$ kubectl describe ingress ingress1 --namespace=testns1
Name: ingress1
Namespace: testns1
Address: 64.225.89.179
Default backend: default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
TLS:
my-certificate-key terminates example.myplaceonline.com
[...]
6. TLS request still shows the default certificate:
$ curl -vk https://example.myplaceonline.com/ 2>&1 | grep -e issuer:
* issuer: O=Acme Co; CN=Kubernetes Ingress Controller Fake Certificate
FYI, every time I do a test, I destroy the cluster, load balancer, and DNS
A record, and start fresh, though I'm skipping all the other initial steps
enumerated in the original report; this also means the IP and other things
may change after I post this comment as I do further tests.
|
@longwuyuan Hi Long, it shows a connection reset from the controller pod. The IP address is correct. What might this mean? Is this a potential networking issue inside Digital Ocean?
From outside the cluster, it works but shows the wrong cert:
|
Please curl from outside the cluster; that should produce logs in the
controller pod. Those log messages are what we can analyse. Use: kubectl logs -f
controllerpodname
Thanks,
; Long
…On Sat, 5 Jun, 2021, 12:04 PM kevgrig, ***@***.***> wrote:
@longwuyuan <https://github.com/longwuyuan> Hi Long, it shows a
connection reset from the controller pod. The IP address is correct. What
might this mean? Is this a potential networking issue inside Digital Ocean?
$ kubectl exec ingress-nginx-controller-57cb5bf694-vrpnm --namespace=ingress-nginx -- curl -kv https://example.myplaceonline.com/test/
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 64.225.89.58:443...
* Connected to example.myplaceonline.com (64.225.89.58) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
* CApath: none
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* OpenSSL SSL_connect: Connection reset by peer in connection to example.myplaceonline.com:443
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
* Closing connection 0
curl: (35) OpenSSL SSL_connect: Connection reset by peer in connection to example.myplaceonline.com:443
command terminated with exit code 35
From outside the cluster, it works but shows the wrong cert:
$ curl -kv https://example.myplaceonline.com/test/
* Trying 64.225.89.58:443...
* Connected to example.myplaceonline.com (64.225.89.58) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
* CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: O=Acme Co; CN=Kubernetes Ingress Controller Fake Certificate
* start date: Jun 5 06:19:37 2021 GMT
* expire date: Jun 5 06:19:37 2022 GMT
* issuer: O=Acme Co; CN=Kubernetes Ingress Controller Fake Certificate
* SSL certificate verify result: self signed certificate (18), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x557e12677d80)
> GET /test/ HTTP/2
> Host: example.myplaceonline.com
> user-agent: curl/7.76.1
> accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
< HTTP/2 200
< date: Sat, 05 Jun 2021 06:31:41 GMT
< content-type: text/html
< content-length: 120
< last-modified: Sat, 05 Jun 2021 06:18:23 GMT
< strict-transport-security: max-age=15724800; includeSubDomains
<
<html><head><title>HTTP Hello World</title></head><body><h1>Hello from helloworldweb-849f6d4b9f-t2j89</h1></body></html
* Connection #0 to host example.myplaceonline.com left intact
|
By any chance, is this related to the problem you are facing? https://kubernetes.github.io/ingress-nginx/user-guide/tls/#default-ssl-certificate |
@longwuyuan Hi Long,
Yes, if I use I'm a bit busy today but should be able to reproduce curl with controller logs later today or tomorrow. Thanks for your help. |
ok. |
Another thing needed is the sequence of steps below and the related logs:
|
I recreated the cluster and DNS A record between those tests, so those were two different clusters. Now that I have active help from you, I will stick to a single cluster where we can diagnose the issue, thanks! I'll gather everything from scratch. |
@longwuyuan Hi Long,
Current ingress:
Requested from external:
This resulted in the following in the log tail:
Requested from controller:
This resulted in the following in the log tail:
|
Also just to confirm that the referenced certificate is good:
|
Can you show this in a Zoom screen share? |
@longwuyuan Yes, sure, thanks! What days and times are good for you? I'm in Pacific Time (PT) |
Are you on Slack?
Thanks,
; Long
|
@longwuyuan I'm not on Slack but I will join. I have a customer call in 40 minutes for work tonight and going to bed, but I'll join the Slack tomorrow... |
Great. Please use the ingress controller users channel or DM me.
Thanks,
; Long
|
Please close the issue since it was resolved on Zoom. |
/remove-kind bug |
@longwuyuan Thank you for your time for the investigation. I started from scratch and found that the difference was that in the example we used, it had an explicit host:
Once I added an explicit host, then everything worked (including the cert-manager annotation shim). This was odd because lacking a host should apply to all hosts:
And previously describing the ingress showed a wildcard host:
So then I removed the host again, but it still worked! From previous testing, I knew that restarting the deployment didn't help which suggests it's not some initial process state issue. So there still seems to be some strange initial host mapping issue. If anyone wants to debug this, I think I have a reproducible test case. Nevertheless, for now, everything works for me (after using the above workaround) and I'm happy. |
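The working configuration described above, where the host in the rules matches the host in the tls section, can be sketched in the same printf style used elsewhere in this thread. This is an illustrative sketch, not the exact manifest that was applied: `render_ingress` is a hypothetical helper, the names are reused from this issue, and the path is simplified to `/` without the rewrite annotation.

```shell
# Render an Ingress whose rule host matches its TLS host.
# $1 = ingress name, $2 = namespace, $3 = host,
# $4 = TLS secret name, $5 = backend service name.
# The issuer annotation value is the one used in this issue's reproduction steps.
render_ingress() {
  printf '{"apiVersion":"networking.k8s.io/v1","kind":"Ingress","metadata":{"name":"%s","namespace":"%s","annotations":{"cert-manager.io/issuer":"letsencrypt-production-issuer"}},"spec":{"tls":[{"hosts":["%s"],"secretName":"%s"}],"rules":[{"host":"%s","http":{"paths":[{"path":"/","pathType":"Prefix","backend":{"service":{"name":"%s","port":{"number":80}}}}]}}]}}\n' \
    "$1" "$2" "$3" "$4" "$3" "$5"
}

# Print the manifest; to apply it, pipe the output to: kubectl apply -f -
render_ingress ingress1 testns1 example.myplaceonline.com example-tls helloworldweb
```

Passing the host once and using it for both `spec.tls[].hosts` and `spec.rules[].host` keeps the two from drifting apart, which is the mismatch this issue turned out to be about.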
You could change the title of the issue for coherence and then write
detailed steps to reproduce the problem.
Thanks,
; Long
…On 11/06/21 9:54 am, kevgrig wrote:
@longwuyuan <https://github.com/longwuyuan> Thank you for your time
for the investigation. I started from scratch and found that the
difference was that in the example we used, it had an explicit host:
|spec:
  rules:
  - host: "test0.myplaceonline.com"
    http: [...] |
Once I added an explicit host, then everything worked (including the
cert-manager annotation shim). This was odd because lacking a host
should apply to all hosts
<https://kubernetes.io/docs/concepts/services-networking/ingress/#ingress-rules>:
An optional host. In this example, no host is specified, so the
rule applies to all inbound HTTP traffic through the IP address
specified.
And previously describing the ingress showed a wildcard host:
|Rules:
  Host  Path   Backends
  ----  ----   --------
  *     /(.*)  helloworldweb:80 (10.244.0.47:80) |
So then I removed the host again, but it still worked! From previous
testing, I knew that restarting the deployment didn't help which
suggests it's not some initial process state issue.
So there still seems to be some strange initial host mapping issue. If
anyone wants to debug this, I think I have a reproducible test case.
Nevertheless, for now, everything works as expected for me and I'm happy.
|
@longwuyuan Updated title. Reproduction steps are the same as originally reported in the bug description. I've yet to try them outside of Digital Ocean. Note also that using helm did not help nor did manually fixing 0.46.0 to 0.47.0 that we observed (as detailed in #7229). |
Please post the reproduction steps in the DO (Digital Ocean) channel on Slack.
There are some wonderfully awesome DO engineers there.
Thanks,
; Long
|
/triage needs-information |
Reproduction steps:
|
If you tell the DO engineers about this in DO's k8s Slack channel,
it may help a lot.
Thanks,
; Long
…On 11/06/21 10:00 pm, kevgrig wrote:
Reproduction steps:
1. Create Digital Ocean Kubernetes cluster
<https://cloud.digitalocean.com/kubernetes/clusters/new> (a 1 node
cluster works)
2. Click |Download Config File| and save to |~/.kube/config|
3. Install ingress-nginx either with Helm
<https://kubernetes.github.io/ingress-nginx/deploy/#using-helm>,
the YAML file
<https://kubernetes.github.io/ingress-nginx/deploy/#digital-ocean>
(with fix for #7229
<#7229>), or the
NGINX Ingress Controller 1-Click App
<https://marketplace.digitalocean.com/apps/nginx-ingress-controller>.
All methods reproduce the problem.
4. Wait until the Digital Ocean Load Balancer is created and copy the
IP address.
5. Create DNS A record
<https://cloud.digitalocean.com/networking/domains> or an entry in
|/etc/hosts| for the load balancer IP and test host (in the
following example, |example.myplaceonline.com|).
6. Create |testns1| namespace:
|printf
'{"apiVersion":"v1","kind":"Namespace","metadata":{"name":"testns1"}}'
| kubectl create -f - |
7. Create Hello World website
<https://hub.docker.com/r/strm/helloworld-http> deployment:
|kubectl create deployment helloworldweb
--image=strm/helloworld-http --namespace=testns1 |
8. Expose deployment as a service:
|kubectl expose deployment helloworldweb --port=80
--target-port=80 --namespace=testns1 |
9. Create ingress pointing to the helloworldweb service:
|printf
'{"apiVersion":"networking.k8s.io/v1","kind":"Ingress","metadata":{"name":"%s","namespace":"%s","annotations":{"nginx.ingress.kubernetes.io/rewrite-target":"/$1"}},"spec":{"rules":[{"http":{"paths":[{"path":"%s","pathType":"Prefix","backend":{"service":{"name":"%s","port":{"number":80}}}}]}}]}}'
"ingress1" "testns1" "/(.*)" "helloworldweb" | kubectl create -f - |
10. Test that external port 443 shows the default self-signed
certificate with |CN=Kubernetes Ingress Controller Fake Certificate|:
|$ curl -vk https://example.myplaceonline.com/ 2>&1 | grep issuer:
* issuer: O=Acme Co; CN=Kubernetes Ingress Controller Fake
Certificate |
11. Install cert-manager:
|kubectl apply -f
https://github.com/jetstack/cert-manager/releases/download/v1.3.1/cert-manager.yaml
|
12. Wait until cert-manager is ready:
|$ kubectl get pods --namespace cert-manager
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-7dd5854bb4-rztj8              1/1     Running   0          18s
cert-manager-cainjector-64c949654c-9np86   1/1     Running   0          18s
cert-manager-webhook-6bdffc7c9d-lp924      1/1     Running   0          18s |
13. Create Digital Ocean Personal Access Token
<https://cloud.digitalocean.com/account/api/tokens/new> for
Digital Ocean DNS01 solver
<https://cert-manager.io/docs/configuration/acme/dns01/digitalocean/>
and convert to Base64 (replace |TOKEN|):
|echo -n 'TOKEN' | base64 -w 0 |
14. Create DNS01 secret (replace |BASE64TOKEN|):
|printf
'{"apiVersion":"v1","kind":"Secret","metadata":{"name":"digitalocean-dns","namespace":"testns1"},"data":{"access-token":"%s"}}'
"BASE64TOKEN" | kubectl create -f - |
15. Create Issuer (replace email address):
|printf
'{"apiVersion":"cert-manager.io/v1","kind":"Issuer","metadata":{"name":"letsencrypt-production-issuer","namespace":"testns1"},"spec":{"acme":{"email":"%s","server":"https://acme-v02.api.letsencrypt.org/directory","privateKeySecretRef":{"name":"letsencrypt-production-issuer-private-key"},"solvers":[{"dns01":{"digitalocean":{"tokenSecretRef":{"name":"digitalocean-dns","key":"access-token"}}}}]}}}'
"***@***.***" | kubectl create -f - |
16. Edit the ingress:
|$ EDITOR=vi kubectl edit ingress ingress1 --namespace=testns1 |
Add |cert-manager.io/issuer: letsencrypt-production-issuer| and
|tls|
<https://kubernetes.io/docs/concepts/services-networking/ingress/#tls>:
|metadata:
  annotations:
    cert-manager.io/issuer: letsencrypt-production-issuer
[...]
spec:
  tls:
  - hosts:
    - example.myplaceonline.com
    secretName: example-tls |
17. Wait a few minutes until the certificate is ready:
|$ kubectl get certificates --namespace=testns1
NAME          READY   SECRET        AGE
example-tls   True    example-tls   77s |
18. New request to port 443 still shows the old certificate:
|$ curl -vk https://example.myplaceonline.com/ 2>&1 | grep issuer:
* issuer: O=Acme Co; CN=Kubernetes Ingress Controller Fake
Certificate |
19. Edit the ingress and add a host to the rules and the curl works:
|rules:
- host: example.myplaceonline.com
  http: [...] |
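The certificate checks in steps 10 and 18 can be wrapped in a small helper so the result is easy to script. This is a sketch only; `is_fake_cert` is a hypothetical name, and the match string is the placeholder certificate CN shown in the curl output in this thread.

```shell
# Read `curl -v` output on stdin and succeed (exit 0) if the served
# certificate was issued by the ingress-nginx placeholder certificate.
is_fake_cert() {
  grep -q 'issuer:.*Kubernetes Ingress Controller Fake Certificate'
}

# Usage (requires network access to the test host):
#   curl -vk https://example.myplaceonline.com/ 2>&1 | is_fake_cert \
#     && echo "still serving the default certificate"
```

Because the helper only inspects text, it can also be run against saved curl logs when comparing results before and after editing the ingress.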
|
@longwuyuan I see no evidence that this is related to DO. I opened a forum post with DO and received no response. I can open a support ticket. What specifically should I ask? All evidence points to the issue in ingress-nginx: adding the host in the ingress fixes the issue; subsequently, after removing the host from the ingress, the issue remains fixed. It appears to be related to ingress-nginx initial handling of a wildcard host. |
@longwuyuan That's very interesting. I found this in the Ingress documentation:
It seems like it would be useful to print a warning about this somewhere? I doubt I will be the first person to hit this issue. |
What I think confused me the most is that describing the ingress shows the following which suggests it's configured correctly. Maybe the describe command could be the place for a warning?
|
In production, I have never seen an empty ingress.spec.rules.host when
ingress.spec.tls is set.
Thanks,
; Long
…On 12/06/21 6:59 pm, kevgrig wrote:
@longwuyuan <https://github.com/longwuyuan> That's very interesting. I
found this in the Ingress documentation
<https://kubernetes.io/docs/concepts/services-networking/ingress/#tls>:
Keep in mind that TLS will not work on the default rule because
the certificates would have to be issued for all the possible
sub-domains. Therefore, hosts in the tls section need to
explicitly match the host in the rules section.
It seems like it would be useful to print a warning about this
somewhere? I doubt I will be the first person to hit this issue.
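A warning of this kind could be approximated before applying a manifest. The following is a hypothetical pre-flight check, not something ingress-nginx or kubectl is known to provide: `unmatched_tls_hosts` is an invented helper, and it compares hosts by exact string equality only, ignoring wildcard semantics.

```shell
# Print every TLS host that has no exactly matching rule host.
# $1 = space-separated hosts from spec.tls[].hosts
# $2 = space-separated hosts from spec.rules[].host (empty if rules have no host)
# The body runs in a subshell with globbing disabled so that a host such as
# "*.example.com" is not expanded against files in the current directory.
unmatched_tls_hosts() (
  set -f
  for tls_host in $1; do
    found=no
    for rule_host in $2; do
      if [ "$tls_host" = "$rule_host" ]; then
        found=yes
      fi
    done
    if [ "$found" = no ]; then
      printf '%s\n' "$tls_host"
    fi
  done
)

# The configuration from this issue: a TLS host but no host in the rules.
unmatched_tls_hosts "example.myplaceonline.com" ""
```

Any host it prints is one for which, per the documentation quoted above, the configured certificate should not be expected to be served.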
|
Created PR #7239 |
I've just had this issue with 1.1.3 (which I assume has #7239 in it). It seems to match exactly: I had TLS configured with a default backend and the ingress refused to serve the certificate. After changing from a default backend to a specific host, it worked. I was then able to change back to a default backend and it continued to work, just as described here. We use a default backend because rewrite decodes the URI, which broke the particular service I'm hosting. |
This issue happened to me as well; for some reason the Ingress Controller was using the default (fake) certificate. I had to:
1 - Set the rule host to a wrong one, like:
...
spec:
  ingressClassName: nginx-custom-class
  rules:
  - host: potato.com
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
...
2 - Set the rule host to the correct one, like:
...
spec:
  ingressClassName: nginx-custom-class
  rules:
  - host: www.myservice.com
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
...
If I don't set the host to an invalid one first, it just doesn't work; the ingress controller keeps sending the wrong certificate. |
The same happened to me! I spent a day trying to figure it out before I landed here. Just want to thank @kevgrig for this gem! I fixed the issue by setting a |
I had a similar experience. In my case the secret for the ingress was accidentally in the default namespace. After moving it to the same namespace as the ingress, the problem seems to have gone away. I seem to remember that secrets are only available to pods within the same namespace. Does that go for ingresses as well? If not, why did it work intermittently? |
NGINX Ingress controller version:
Installed with the Digital Ocean NGINX Ingress Controller 1-Click App. Looks to be NGINX 0.44.0 through the 3.23.0 Helm chart:
$ kubectl get deployment ingress-nginx-controller --namespace=ingress-nginx -o jsonpath='{.metadata.labels}'
{"app.kubernetes.io/component":"controller","app.kubernetes.io/instance":"ingress-nginx","app.kubernetes.io/managed-by":"Helm","app.kubernetes.io/name":"ingress-nginx","app.kubernetes.io/version":"0.44.0","helm.sh/chart":"ingress-nginx-3.23.0"}
Kubernetes version:
Environment:
Kernel (e.g. uname -a): Linux pool-o1y0v82td-8wjdr 4.19.0-11-amd64 #1 SMP Debian 4.19.146-1 (2020-09-17) x86_64 GNU/Linux
What happened:
As detailed in cert-manager issue #4012, cert-manager is creating the certificate and ingress-nginx seems to be picking it up, but the default fake certificate is still being served:
cert-manager created the certificate:
Describing the ingress shows the TLS certificate:
Running the ingress with --v=5 debug logging shows the following "ssl" related entries which suggests it found the certificate but might not be applying it?
I reviewed /etc/nginx/nginx.conf but it seems the certificates are handled by a Lua module and I'm not sure how to dive into that.
How to reproduce it:
Reproduction steps detailed in cert-manager issue #4012.
Anything else we need to know: N/A
/kind bug