SDS fails to load available secret #18061

Closed
wpbeckwith opened this issue Oct 18, 2019 · 17 comments

wpbeckwith commented Oct 18, 2019

Bug description
I have installed Istio with SDS enabled and everything works fine, except that secrets are not getting reloaded by SDS. I can successfully submit a cert-manager Certificate like:

apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: httpbin-certs
  namespace: istio-system
spec:
  secretName: httpbin-tls
  issuerRef:
    name: letsencrypt-staging
    kind: ClusterIssuer
  commonName: httpbin.foo.io
  dnsNames:
  - httpbin.foo.io

And the secret is created in the istio-system namespace.

kubectl describe secret/httpbin-tls -n istio-system
Name:         httpbin-tls
Namespace:    istio-system
Labels:       <none>
Annotations:  cert-manager.io/alt-names: httpbin.foo.io
              cert-manager.io/certificate-name: httpbin-certs
              cert-manager.io/common-name: httpbin.foo.io
              cert-manager.io/ip-sans:
              cert-manager.io/issuer-kind: ClusterIssuer
              cert-manager.io/issuer-name: letsencrypt-staging
              cert-manager.io/uri-sans:

Type:  kubernetes.io/tls

Data
====
ca.crt:   0 bytes
tls.crt:  3590 bytes
tls.key:  1675 bytes

However, the logs for the ingress-sds process show:

2019-10-18T18:04:45.074022Z	info	secretFetcherLog	secret httpbin-tls is deleted
2019-10-18T18:04:45.074161Z	info	sdsServiceLog	CONNECTION ID: router~10.20.122.250~istio-ingressgateway-5bfbb64cc-76mg7.istio-system~istio-system.svc.cluster.local-5, RESOURCE NAME: httpbin-tls, EVENT: connection is terminated: rpc error: code = Canceled desc = context canceled
2019-10-18T18:04:50.415424Z	warn	secretFetcherLog	Cannot find secret httpbin-tls, searching for fallback secret gateway-fallback
2019-10-18T18:04:50.415453Z	error	secretFetcherLog	cannot find secret httpbin-tls and cannot find fallback secret gateway-fallback
2019-10-18T18:04:50.415459Z	warn	cacheLog	CONNECTION ID: router~10.20.122.250~istio-ingressgateway-5bfbb64cc-76mg7.istio-system~istio-system.svc.cluster.local-6, RESOURCE NAME: httpbin-tls, EVENT: SecretFetcher cannot find secret httpbin-tls from cache
2019-10-18T18:04:50.415469Z	warn	sdsServiceLog	CONNECTION ID: router~10.20.122.250~istio-ingressgateway-5bfbb64cc-76mg7.istio-system~istio-system.svc.cluster.local-6, RESOURCE NAME: httpbin-tls, EVENT: waiting for ingress gateway secret for proxy "router~10.20.122.250~istio-ingressgateway-5bfbb64cc-76mg7.istio-system~istio-system.svc.cluster.local"

2019-10-18T18:16:12.214700Z	warn	secretFetcherLog	failed load server cert/key pair from secret httpbin-tls: server cert or private key is empty
2019-10-18T18:17:49.074677Z	info	secretFetcherLog	scrtUpdated is called on kubernetes secret httpbin-tls
2019-10-18T18:17:49.074785Z	warn	secretFetcherLog	failed load server cert/key pair from secret httpbin-tls: server cert or private key is empty
2019-10-18T18:17:49.074997Z	warn	secretFetcherLog	unexpected server key/cert change in secret httpbin-tls
2019-10-18T18:26:49.610124Z	info	secretFetcherLog	secret httpbin-tls is deleted
2019-10-18T18:26:49.610278Z	info	sdsServiceLog	CONNECTION ID: router~10.20.122.250~istio-ingressgateway-5bfbb64cc-76mg7.istio-system~istio-system.svc.cluster.local-6, RESOURCE NAME: httpbin-tls, EVENT: connection is terminated: rpc error: code = Canceled desc = context canceled
2019-10-18T18:26:55.824339Z	warn	secretFetcherLog	Cannot find secret httpbin-tls, searching for fallback secret gateway-fallback
2019-10-18T18:26:55.824366Z	error	secretFetcherLog	cannot find secret httpbin-tls and cannot find fallback secret gateway-fallback
2019-10-18T18:26:55.824372Z	warn	cacheLog	CONNECTION ID: router~10.20.122.250~istio-ingressgateway-5bfbb64cc-76mg7.istio-system~istio-system.svc.cluster.local-7, RESOURCE NAME: httpbin-tls, EVENT: SecretFetcher cannot find secret httpbin-tls from cache
2019-10-18T18:26:55.824382Z	warn	sdsServiceLog	CONNECTION ID: router~10.20.122.250~istio-ingressgateway-5bfbb64cc-76mg7.istio-system~istio-system.svc.cluster.local-7, RESOURCE NAME: httpbin-tls, EVENT: waiting for ingress gateway secret for proxy "router~10.20.122.250~istio-ingressgateway-5bfbb64cc-76mg7.istio-system~istio-system.svc.cluster.local"

2019-10-18T18:27:46.226917Z	warn	secretFetcherLog	failed load server cert/key pair from secret httpbin-tls: server cert or private key is empty
2019-10-18T18:27:47.166576Z	info	secretFetcherLog	scrtUpdated is called on kubernetes secret httpbin-tls
2019-10-18T18:27:47.166694Z	warn	secretFetcherLog	failed load server cert/key pair from secret httpbin-tls: server cert or private key is empty
2019-10-18T18:27:47.167786Z	warn	secretFetcherLog	unexpected server key/cert change in secret httpbin-tls

The above logs show where I deleted the certificate and the secret from the istio-system namespace and then reapplied the above certificate. If I execute echo | openssl s_client -showcerts -servername httpbin.foo.io -connect httpbin.foo.io:443 2>/dev/null | openssl x509 -inform pem -noout -text, I can see that the cert returned is the old cert and not the new one. So far, killing the pod is the only way to get the new cert loaded.

Affected product area (please put an X in all that apply)

[ ] Configuration Infrastructure
[ ] Docs
[ ] Installation
[ ] Networking
[ ] Performance and Scalability
[ ] Policies and Telemetry
[X] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure

Expected behavior
The secret should be reloaded when changed by cert-manager.

Steps to reproduce the bug

  1. Configure an istio Gateway to use a secret, foo-tls.
  2. Create a secret, foo-tls, in a namespace with an ingressgateway and sds enabled.
  3. Watch the logs to verify the ingress-sds container loads the secret.
  4. Delete and recreate the secret.
  5. Verify the ingress-sds logs show that the secret can't be loaded.

Version (include the output of istioctl version --remote and kubectl version)
istioctl version --remote
client version: 1.3.2
ingressgateway version: 1.3.2
pilot version: 1.3.2

kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-07T09:55:27Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.7-eks-e9b1d0", GitCommit:"e9b1d0551216e1e8ace5ee4ca50161df34325ec2", GitTreeState:"clean", BuildDate:"2019-09-21T08:33:01Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

How was Istio installed?
With helm

Environment where bug was observed (cloud vendor, OS, etc)
AWS EKS


wpbeckwith (Author)

OK, I found the problem and it is a bug. The issue is that cert-manager creates secrets with 3 keys, like

kubectl describe secret/httpbin-tls -n istio-system
Name:         httpbin-tls
Namespace:    istio-system
Labels:       <none>
Annotations:  cert-manager.io/alt-names: httpbin.foo.io
              cert-manager.io/certificate-name: httpbin-certs
              cert-manager.io/common-name: httpbin.foo.io
              cert-manager.io/ip-sans:
              cert-manager.io/issuer-kind: ClusterIssuer
              cert-manager.io/issuer-name: letsencrypt-staging
              cert-manager.io/uri-sans:

Type:  kubernetes.io/tls

Data
====
ca.crt:   0 bytes
tls.crt:  3590 bytes
tls.key:  1675 bytes

I first tried deleting the ca.crt key from the secret, but cert-manager just adds it right back. I then edited the secret's ca.crt key to hold the base64-encoded value of 'foo' ('Zm9v') and updated the secret.
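For reference, the same edit can be applied in a single command. This is only a sketch of the workaround described above, assuming the secret name and namespace shown here:

# Overwrite the empty ca.crt key with a non-empty placeholder ('foo' base64-encoded)
# so that ingress-sds accepts the secret, as observed below. cert-manager may
# rewrite this value on its next sync.
kubectl patch secret httpbin-tls -n istio-system \
  --type=merge -p '{"data":{"ca.crt":"Zm9v"}}'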

kubectl describe secret httpbin-tls -n istio-system
Name:         httpbin-tls
Namespace:    istio-system
Labels:       <none>
Annotations:  cert-manager.io/alt-names: httpbin.foo.io
              cert-manager.io/certificate-name: httpbin-certs
              cert-manager.io/common-name: httpbin.foo.io
              cert-manager.io/ip-sans:
              cert-manager.io/issuer-kind: ClusterIssuer
              cert-manager.io/issuer-name: letsencrypt-staging
              cert-manager.io/uri-sans:

Type:  kubernetes.io/tls

Data
====
tls.key:  1679 bytes
ca.crt:   3 bytes
tls.crt:  3590 bytes

Once I did this, I got the following in the ingress-sds logs:

2019-10-18T21:36:33.035641Z	warn	secretFetcherLog	failed load server cert/key pair from secret httpbin-tls: server cert or private key is empty
2019-10-18T21:36:34.173213Z	info	secretFetcherLog	scrtUpdated is called on kubernetes secret httpbin-tls
2019-10-18T21:36:34.173269Z	warn	secretFetcherLog	failed load server cert/key pair from secret httpbin-tls: server cert or private key is empty
2019-10-18T21:36:34.173932Z	warn	secretFetcherLog	unexpected server key/cert change in secret httpbin-tls
2019-10-18T21:36:59.353755Z	info	secretFetcherLog	Return secret httpbin-tls found by direct api call
2019-10-18T21:36:59.359077Z	info	secretFetcherLog	Return secret httpbin-tls found by direct api call
2019-10-18T21:36:59.359151Z	info	sdsServiceLog	CONNECTION ID: router~10.20.122.250~istio-ingressgateway-5bfbb64cc-76mg7.istio-system~istio-system.svc.cluster.local-8, RESOURCE NAME: httpbin-tls, EVENT: pushed key/cert pair to proxy

2019-10-18T21:39:12.057115Z	info	secretFetcherLog	scrtUpdated is called on kubernetes secret httpbin-tls

And executing echo | openssl s_client -showcerts -servername httpbin.foo.io -connect httpbin.foo.io:443 2>/dev/null | openssl x509 -inform pem -noout -text shows the updated cert's info.

Thus the ingress-sds code needs to be updated to either ignore zero-length values for keys it doesn't care about, or only check the values of the keys it does care about (i.e. tls.key and tls.crt).

pib commented Oct 25, 2019

Looks like this should stop happening with cert-manager v0.11.0 because they will no longer be generating a temporary certificate while waiting for the ACME cert to be issued: https://github.com/jetstack/cert-manager/releases/tag/v0.11.0

wpbeckwith (Author)

@pib I'm already using cert-manager 0.11.

timurb commented Oct 28, 2019

I also see the described behaviour. I was able to get SDS to load my key pair generated by cert-manager/Let's Encrypt only once, and I'm still trying to reproduce that with no luck.

At the same time, I was not able to completely reproduce the trick of manually replacing ca.crt in the secret -- the "server cert or private key is empty" error is gone, but I'm still not able to access the SSL endpoint.

Another observation:

  • if I delete the secret and then generate it again with the same name, SDS goes into an infinite loop with the following ever-repeating log lines (debug level enabled through ControlZ, and no other lines in the logs):
2019-10-28T09:26:06.786907Z	info	secretFetcherLog	scrtUpdated is called on kubernetes secret wonderful-gnat-myservice
2019-10-28T09:26:06.988633Z	info	secretFetcherLog	scrtUpdated is called on kubernetes secret wonderful-gnat-myservice
2019-10-28T09:26:07.186930Z	info	secretFetcherLog	scrtUpdated is called on kubernetes secret wonderful-gnat-myservice

Here is the same logline with stack trace enabled:

2019-10-28T09:25:55.386836Z	info	secretFetcherLog	scrtUpdated is called on kubernetes secret wonderful-gnat-myservice
istio.io/istio/vendor/istio.io/pkg/log.(*Scope).emit
	/workspace/go/src/istio.io/istio/vendor/istio.io/pkg/log/scope.go:277
istio.io/istio/vendor/istio.io/pkg/log.(*Scope).Infof
	/workspace/go/src/istio.io/istio/vendor/istio.io/pkg/log/scope.go:213
istio.io/istio/security/pkg/nodeagent/secretfetcher.(*SecretFetcher).scrtUpdated
	/workspace/go/src/istio.io/istio/security/pkg/nodeagent/secretfetcher/secretfetcher.go:417
istio.io/istio/vendor/k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate
	/workspace/go/src/istio.io/istio/vendor/k8s.io/client-go/tools/cache/controller.go:202
istio.io/istio/vendor/k8s.io/client-go/tools/cache.NewInformer.func1
	/workspace/go/src/istio.io/istio/vendor/k8s.io/client-go/tools/cache/controller.go:309
istio.io/istio/vendor/k8s.io/client-go/tools/cache.(*DeltaFIFO).Pop
	/workspace/go/src/istio.io/istio/vendor/k8s.io/client-go/tools/cache/delta_fifo.go:436
istio.io/istio/vendor/k8s.io/client-go/tools/cache.(*controller).processLoop
	/workspace/go/src/istio.io/istio/vendor/k8s.io/client-go/tools/cache/controller.go:150
istio.io/istio/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
	/workspace/go/src/istio.io/istio/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
istio.io/istio/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/workspace/go/src/istio.io/istio/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
istio.io/istio/vendor/k8s.io/apimachinery/pkg/util/wait.Until
	/workspace/go/src/istio.io/istio/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
istio.io/istio/vendor/k8s.io/client-go/tools/cache.(*controller).Run
	/workspace/go/src/istio.io/istio/vendor/k8s.io/client-go/tools/cache/controller.go:124

and here is the brand new secret described after it was regenerated from scratch:

$ kubectl describe secret wonderful-gnat-myservice -n istio-system
Name:         wonderful-gnat-myservice
Namespace:    istio-system
Labels:       <none>
Annotations:  cert-manager.io/alt-names: mydomain.com
              cert-manager.io/certificate-name: wonderful-gnat-myservice
              cert-manager.io/common-name: mydomain.com
              cert-manager.io/ip-sans:
              cert-manager.io/issuer-kind: ClusterIssuer
              cert-manager.io/issuer-name: letsencrypt-staging
              cert-manager.io/uri-sans:

Type:  kubernetes.io/tls

Data
====
ca.crt:   0 bytes
tls.crt:  3574 bytes
tls.key:  1675 bytes

timurb commented Oct 29, 2019

Ok, I think I identified the failing scenario.

@wpbeckwith did you create the certificate in the same yaml manifest as your gateway? (I assume you tried to attach it to the gateway, as that's what I was doing.)

If I create the certificate and attach it to the gateway HTTPS endpoint in a single kubectl apply/helm install command, I always see the error described above (I'm using helm).
If I create the certificate, wait until it is issued by Let's Encrypt, and only then attach it to the gateway HTTPS endpoint, SDS picks up and loads the certificate like a charm.

Here is the related part of the helm template I use (redacted) for reference; a small ordering sketch follows it.

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: {{ $fullName }}
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http-80
        protocol: HTTP
      hosts:
        - "*"
    - port:
        number: 443
        name: https-443
        protocol: HTTPS
      tls:
        mode: SIMPLE
        credentialName: {{ $fullName }}
        privateKey: sds
        serverCertificate: sds
      hosts:
        - "*"
---
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: {{ $fullName }}
  namespace: istio-system
spec:
  secretName: {{ $fullName }}
  duration: 2160h # 90d
  renewBefore: 360h # 15d
  commonName: {{ .Values.https.dnsName }}
  dnsNames:
    - {{ .Values.https.dnsName }}
  issuerRef:
    name: letsencrypt-staging
    kind: ClusterIssuer

juliendangers commented Oct 31, 2019

We faced the same issue when migrating from istio 1.2 to 1.3 (confirmed with 1.3.3).

We're using cert-manager to generate our certificate through Let's Encrypt. The Istio Gateway is created at the same time we ask for the certificate, so before the certificate challenge is accepted.

Once the secret is generated/updated:

2019-10-31T14:46:53.618348Z	info	sdsServiceLog	CONNECTION ID: router~10.0.3.196~istio-ingressgateway-5b7c88b68f-zdkb8.istio-system~istio-system.svc.cluster.local-31, RESOURCE NAME: authentication-tech-tls-cert, EVENT: pushed key/cert pair to proxy
2019-10-31T14:48:10.072966Z	info	secretFetcherLog	scrtUpdated is called on kubernetes secret authentication-tech-tls-cert
2019-10-31T14:48:10.073226Z	warn	secretFetcherLog	unexpected server key/cert change in secret authentication-tech-tls-cert

Unfortunately the certificate being served is issued by cert-manager.local, but with the right common name, so I guess it might be the temporary certificate issued by cert-manager before the challenge is accepted.

If I create a secret with a valid certificate and then create a gateway asking SDS to use that secret, it works fine. Then if I edit the secret and replace it with a certificate for another domain, the initial certificate is still used and requests succeed.

It definitely looks like SDS is unable to take secret updates into account anymore.

timurb commented Oct 31, 2019

I wonder what happens when a Let's Encrypt certificate expires and needs to be renewed.
If SDS fails to load the updated cert, it could become challenging to update certs in production with this setup.

pib commented Nov 4, 2019

This seems to be fixed in 1.3.4

wpbeckwith (Author)

I'll check out 1.3.4 this weekend and report back.

timurb commented Nov 22, 2019

I've just checked this issue with 1.3.4 and I cannot confirm that it is fixed.

timurb commented Dec 2, 2019

I could not confirm it is fixed in 1.4.0 either.

JimmyCYJ self-assigned this Dec 3, 2019
JimmyCYJ (Member) commented Dec 3, 2019

@timurb Could you describe how you test it with Istio 1.4.0, and provide node agent logs so that I can take a look?

timurb commented Dec 3, 2019

Sure.
I use the following approach.

  • I'm using EKS.
  • I install cert-manager and a ClusterIssuer configured to use the staging Let's Encrypt endpoint. Please let me know if you need their YAML descriptions.
  • I install istio 1.4.0 using helm and the following custom values.yaml:
gateways:
  enabled: true
  istio-ingressgateway:
    enabled: true
    sds:
      enabled: true
sidecarInjectorWebhook:
  enabled: true
security:
  enabled: true
certmanager:
  enabled: false
global:
  tag: 1.4.0
  mtls:
    enabled: false
  sds:
    enabled: false
  • I configure default istio gateway using this yaml:
---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: release-name-istio-gateway
  namespace: kafka-services
spec:
  selector:
    istio: ingressgateway # use istio default controller
  servers:
    - port:
        number: 80
        name: http-80-release-name-istio-gateway
        protocol: HTTP
      hosts:
        - "*"
    - port:
        number: 443
        name: https-443-release-name-istio-gateway
        protocol: HTTPS
      tls:
        mode: SIMPLE # enables HTTPS on this port
        credentialName: release-name-istio-gateway
        privateKey: sds
        serverCertificate: sds
      hosts:
        - "*"
  • I create a Certificate object with a name matching the credentialName specified in the gateway definition, to request a Let's Encrypt certificate.
  • I obtain the ingressgateway ELB endpoint by looking at kubectl get svc -A

When the certificate reaches the READY state according to the output of kubectl get cert -A, I would expect it to be loaded into the ingressgateway, but it isn't. I test that by running curl over HTTPS against the above ingressgateway ELB endpoint, and I get an error like this:

$ curl -ik https://ad6fb8ab015c011eaada60686aba2f53-783532307.eu-west-2.elb.amazonaws.com
curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to ad6fb8ab015c011eaada60686aba2f53-783532307.eu-west-2.elb.amazonaws.com:443

Alternatively, I can make it work perfectly if I don't specify the SSL port in the ingressgateway definition at the very beginning, but instead edit it and add the port once the certificate gets to the READY state. In that case curl returns a 404, as I would expect (given that I don't create any VirtualService objects).

Logs of the ingressgateway when I see the problem:

$ kubectl logs -f istio-ingressgateway-5d8cd684b8-tss6c -n istio-system ingress-sds
2019-12-03T11:34:18.973787Z	info	ControlZ available at 127.0.0.1:9876
2019-12-03T11:34:19.074081Z	info	sdsServiceLog	SDS gRPC server for ingress gateway controller starts, listening on "/var/run/ingress_gateway/sds"

2019-12-03T11:34:19.074253Z	info	citadel agent monitor has started.
2019-12-03T11:34:19.074458Z	info	monitor	Monitor server started.
2019-12-03T11:34:19.074458Z	info	sdsServiceLog	Start SDS grpc server for ingress gateway proxy
>>>>>>> after this point I upload the gateway configuration
2019-12-03T11:38:06.484823Z	warn	secretFetcherLog	failed load server cert/key pair from secret istio-kafkagateway-istio-gateway: server cert or private key is empty
2019-12-03T11:38:06.487375Z	info	secretFetcherLog	Fail to extract secret istio-kafkagateway-istio-gateway found by direct api call
2019-12-03T11:38:06.487382Z	warn	secretFetcherLog	Cannot find secret istio-kafkagateway-istio-gateway, searching for fallback secret gateway-fallback
2019-12-03T11:38:06.487387Z	error	secretFetcherLog	cannot find secret istio-kafkagateway-istio-gateway and cannot find fallback secret gateway-fallback
2019-12-03T11:38:06.487393Z	warn	cacheLog	CONNECTION ID: router~10.10.102.206~istio-ingressgateway-5d8cd684b8-tss6c.istio-system~istio-system.svc.cluster.local-1, RESOURCE NAME: istio-kafkagateway-istio-gateway, EVENT: SecretFetcher cannot find secret istio-kafkagateway-istio-gateway from cache
2019-12-03T11:38:06.487510Z	warn	sdsServiceLog	CONNECTION ID: router~10.10.102.206~istio-ingressgateway-5d8cd684b8-tss6c.istio-system~istio-system.svc.cluster.local-1, RESOURCE NAME: istio-kafkagateway-istio-gateway, EVENT: waiting for ingress gateway secret for proxy "router~10.10.102.206~istio-ingressgateway-5d8cd684b8-tss6c.istio-system~istio-system.svc.cluster.local"
>>>> after this point I would expect lines for loading the updated cert, and there are none

I'm not completely confident that the log lines were the same for Istio 1.3.x; please let me know if you want me to verify that.

$ kubectl get secret -A | grep istio-kafkagateway-istio-gateway
istio-system      istio-kafkagateway-istio-gateway                     kubernetes.io/tls                     3      2m11s

Logs of everything working fine when I don't specify the HTTPS endpoint for the ingressgateway at first and create it later, once the secret is ready:

$ kubectl logs -f istio-ingressgateway-5d8cd684b8-jzt42 -n istio-system ingress-sds
2019-12-03T11:22:30.179813Z	info	ControlZ available at 127.0.0.1:9876
2019-12-03T11:22:30.280188Z	info	sdsServiceLog	SDS gRPC server for ingress gateway controller starts, listening on "/var/run/ingress_gateway/sds"

2019-12-03T11:22:30.280229Z	info	sdsServiceLog	Start SDS grpc server for ingress gateway proxy
2019-12-03T11:22:30.280540Z	info	citadel agent monitor has started.
2019-12-03T11:22:30.280653Z	info	monitor	Monitor server started.
2019-12-03T11:29:02.277259Z	info	secretFetcherLog	Return secret istio-kafkagateway-istio-gateway found by direct api call
2019-12-03T11:29:02.282154Z	info	secretFetcherLog	Return secret istio-kafkagateway-istio-gateway found by direct api call
2019-12-03T11:29:02.282332Z	info	sdsServiceLog	CONNECTION ID: router~10.10.0.254~istio-ingressgateway-5d8cd684b8-jzt42.istio-system~istio-system.svc.cluster.local-1, RESOURCE NAME: istio-kafkagateway-istio-gateway, EVENT: pushed key/cert pair to proxy

JimmyCYJ (Member) commented Dec 4, 2019

@timurb Thanks for the update.
From the gateway yaml, it seems that credentialName should be istio-kafkagateway-istio-gateway, because credentialName needs to match the gateway secret name.
In the ingress-sds log, it shows that the secret is rejected because at least one field is not filled.
2019-12-03T11:38:06.484823Z warn secretFetcherLog failed load server cert/key pair from secret istio-kafkagateway-istio-gateway: server cert or private key is empty
This log is generated from here, and extractCertAndKey() decides whether this secret has an empty field.
Is it possible for you to dump the secret istio-kafkagateway-istio-gateway and check whether it has the proper set of fields, i.e. {"cert", "key"} or {"tls.crt", "tls.key"}?
One more thing: please don't use the prefix "istio" for gateway secrets, as we reserve that prefix for Istio internal secrets. This secret should be filtered out here, but it gets parsed somehow.
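
For example, one way to list just the data keys present in the secret (a sketch, using the secret name from the logs above):

# Print each data key in the secret; values stay base64-encoded and are not shown.
kubectl get secret istio-kafkagateway-istio-gateway -n istio-system \
  -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}'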

timurb commented Dec 4, 2019

From the gateway yaml, it seems that credentialName should be istio-kafkagateway-istio-gateway, because credentialName needs to match the gateway secret name.

Sorry for misleading you. I'm using the correct name for the secret -- the one that shows up in the logs.
I just pasted the output of the helm template command here and didn't notice that the secret name was not correct. Exactly the same yaml (with the correct secret name) works fine if I attach the HTTPS endpoint later, once the certificate is issued.

In the ingress-sds log, it shows that the secret is rejected because at least one field is not filled.

Right, this is a correct log message: while the certificate is not yet issued by Let's Encrypt, one of the fields is empty. Once cert-manager passes the challenge-response with Let's Encrypt, it updates the secret with the certificate and private key.
The problem is that when that happens, I see no new log lines in the ingressgateway logs.
I think in 1.3.x there were additional attempts to load the secret, with the same "cert or private key is empty" message (please let me know if you need me to confirm this).
In 1.4.0 there are no log messages after the initial load of the secret fails -- even though the secret was updated.

One more thing: please don't use the prefix "istio" for gateway secrets, as we reserve that prefix for Istio internal secrets. This secret should be filtered out here, but it gets parsed somehow.

Thanks for the reminder!
In my case the secret name is constructed automatically by Helm: I use {{ $fullName }} as the secret name, and that starts with the "istio" prefix.
I will fix that shortly and get back to you with the result.
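
As a sanity check before renaming, a one-liner like this can surface TLS secrets in istio-system that collide with the reserved prefix; a sketch, not an official check:

# List kubernetes.io/tls secrets whose names start with the reserved "istio-" prefix.
kubectl get secrets -n istio-system --field-selector type=kubernetes.io/tls -o name \
  | grep '/istio-'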

timurb commented Dec 6, 2019

@JimmyCYJ it looks like that was my problem: having istio- at the start of the secret name.
Once I changed it to something different, the issue went away.
I don't know whether it relates to the originally reported issue, though.

I checked this only in Istio 1.4.0. If you want me to check it in 1.3.x, please let me know.

JimmyCYJ (Member)

Thanks for letting me know. I am going to close this issue now. Please feel free to reopen it.
