Istio Ingress integration with cert-manager #9030

Closed
vitalijsvolodenkovs opened this Issue Sep 28, 2018 · 23 comments

Comments

@vitalijsvolodenkovs
vitalijsvolodenkovs commented Sep 28, 2018

According to the Istio documentation, certificates have to be managed manually.

As you know, nginx-ingress-controller integrates with cert-manager and automatically creates and manages certificates (by using annotations).

Are you going to implement this kind of feature for the Istio ingressgateway?
Or do we need to create our own workaround?

Regards,

@prune998

Contributor

prune998 commented Oct 15, 2018

#6486 (comment)

This is an old discussion around Cert-Manager.
As of today, nothing has changed (or did I miss something?)

@ayj added the area/networking label Oct 15, 2018

@BrianChristie

BrianChristie commented Nov 28, 2018

I've done some research on the state of TLS Termination with Istio-IngressGateway, here's what I've found in case it's useful to others following this:

There appear to be two problems that block using cert-manager effectively:

  1. Multiple certificates are not supported
  2. Certificates are not reloaded when updated

Presently it looks like Istio-IngressGateway directly loads a local certificate into Envoy, and only supports a single certificate globally in the entire cluster, because the secret name is hardcoded to istio-ingressgateway-certs in the istio-system namespace.

In #7976 there is a link to a design doc that proposes an ingress gateway agent which will provision certificates to Envoy via the SecretDiscoveryService API.

The proposal adds tls.fromSecret: istio-ingress-certs.bookinfo-cert to the Gateway object which allows multiple certificates in the cluster, but doesn't allow multiple certificates per Gateway.

There's also a note in the design saying ingress gateway agent will support calling cert-manager in the future.

Edit:
This is also relevant, regarding adding HTTP01 challenge support with Istio to cert-manager: jetstack/cert-manager#783
cc: @munnerz can you add anything about the cert-manager integration plans?
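
Because the gateway only reads the single hardcoded secret described above, the manual setup amounts to creating that one secret by hand. A minimal sketch, assuming the key and certificate sit in local files (the file names are placeholders, not taken from this thread); it prints the `kubectl` command rather than running it:

```shell
#!/bin/sh
# Print the kubectl command that creates the one secret the ingress gateway
# mounts. Secret name and namespace come from the comment above; the key and
# certificate paths are placeholders.
ingress_secret_cmd() {
  key_file="$1"
  cert_file="$2"
  echo "kubectl create -n istio-system secret tls istio-ingressgateway-certs --key $key_file --cert $cert_file"
}

ingress_secret_cmd tls.key tls.crt
```

Since the secret name is fixed, creating a second certificate this way would replace the first one rather than add to it, which is the limitation the comment describes.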

@wstrange

wstrange commented Nov 28, 2018

@BrianChristie Thanks for the summary. This is a blocker for us moving to istio. We need smooth cert manager integration and support for multiple certs.

@prune998

Contributor

prune998 commented Nov 28, 2018

@wstrange you can check CertMerge-Operator in the meantime...
https://medium.com/@prune998/istio-1-0-2-envoy-cert-manager-lets-encrypt-for-tls-certificate-merge-7a774bff66c2

note that it will not automatically reload/restart the Ingress Gateway when certs are renewed...

@dunjut

Contributor

dunjut commented Nov 29, 2018

If you don't need updated certificates to be reloaded, you can try cert-sync as a current workaround for dynamic certificate provisioning.

The ingress gateway agent is still in review.

@prune998

Contributor

prune998 commented Nov 29, 2018

I don't want to push my own solution, but cert-sync requires a shared volume and some changes to the IngressGateway Deployment. CertMerge-Operator just updates the secret that the Gateway already uses. It's a drop-in solution.

Both have the same drawback: they don't make the IngressGateway reload the certs when they are updated.

One solution there is to "update" your Gateway definition so it will trigger the reload...

@dunjut

Contributor

dunjut commented Nov 30, 2018

@prune998 CertMerge-Operator is kind of tricky but a good idea.

However, there is a size limit (IIRC, 1MB) for a ConfigMap, so merging all certs into one may not be suitable for production.

@prune998

Contributor

prune998 commented Nov 30, 2018

On my side, a .crt is around 3600 bytes and a .key is 1600...
If my maths are good, it's around 174 certificates before reaching the 1MB (or is it one mega Bit ?)
From my point of view, having 174 certs in your external ingress endpoint is a bad idea... but it's just me :)
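
As a rough check of this estimate, the arithmetic can be sketched in shell. It assumes the limit is 1 MiB counted in bytes, not bits (the Kubernetes documentation states 1 MiB for both Secrets and ConfigMaps), and it counts only the raw file sizes quoted above, ignoring object metadata and any encoding overhead:

```shell
#!/bin/sh
# Rough capacity estimate using the per-file sizes quoted above.
crt_bytes=3600
key_bytes=1600
pair_bytes=$((crt_bytes + key_bytes))

# Assumption: the limit is 1 MiB (1,048,576 bytes) of raw data, ignoring
# object metadata and any encoding overhead.
limit_bytes=$((1024 * 1024))

pairs=$((limit_bytes / pair_bytes))
echo "bytes per cert/key pair: $pair_bytes"
echo "pairs that fit in 1 MiB: $pairs"
```

Under these assumptions the raw figure comes out somewhat above the 174 quoted in the comment; the gap would have to come from overhead this sketch leaves out. Either way the order of magnitude, and the conclusion drawn from it, stays the same.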

@prune998

Contributor

prune998 commented Nov 30, 2018

And thinking out loud... the other solution would be to have those 174 certificates in their own secret. That's 174 volumes to mount into your IngressGateway pod, and you'll have to recycle this pod every time you add a new Gateway with a new certificate... not sure it's better either :)

@incfly

Member

incfly commented Nov 30, 2018

@myidpt

Contributor

myidpt commented Nov 30, 2018

@BrianChristie
Thanks for bringing up the design doc.
To clarify, this design does support multiple keys and certs per gateway. Sorry I didn't specify it in the doc; I've updated the doc's CUJ/workflow to make the feature explicit.
Since having the agent call cert-manager directly is future work, as a quick solution you can have cert-manager write the key and cert into a secret and let the agent read them from that secret.

@myidpt changed the title from "Istio integration with cert-manager" to "Istio Ingress integration with cert-manager" Dec 5, 2018

@mmckane

mmckane commented Dec 6, 2018

So I have been struggling with cert-manager integration as well, especially with HTTP01 validation. I have been using @dunjut's cert-sync with cert-manager. Here are the issues I have run into so far on the HTTP01 front and am still struggling with:

  1. With the current version of cert-manager, the Let's Encrypt solver pod and service names are randomized, and the labels are all hashed with adler32. To work around this I compiled a version that adds a truncated label with the unhashed domain name, and created a service that selects it. (Not really an Istio issue.)
  2. Because the current cert-manager solver pods do not have sidecars enabled (via annotation), you have to create a service entry for a known internal cluster host name to route the traffic to; in my case, the unmanaged service created above.
  3. I have been using cert-sync to handle multiple certs. As stated above, though, there seem to be a few bugs/issues with this.
    A. If you create a Gateway spec before creating the certificate secret, all traffic for the virtual service, including HTTP, seems to fail, which means that when using Let's Encrypt the solver is inaccessible. To work around this I have been creating a certificate secret with a dummy self-signed cert and having cert-manager update it once the Let's Encrypt cert has been issued.
    B. As stated above in the thread, if cert-manager issues the certificate and updates/creates the secret, the Istio Gateway doesn't pick up the change until either the gateway is restarted or the gateway spec is "updated". This has been causing a huge headache with step A, as I am using Helm to do the app deployment and I can't make Helm update the spec after the app has been deployed.

As I have most of this working in my test environment, I agree with @BrianChristie: in my eyes the two biggest blockers in the Istio project regarding cert-manager are automatic reloading of certificates and support for multiple certificates in the ingress gateway. Currently all solutions require some kind of manual intervention, i.e. restarting pods or refreshing the gateway spec.
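
The dummy-certificate workaround from step A can be sketched roughly as follows. This is only an illustration: it assumes `openssl` is installed, and the secret name, namespace and directory are placeholders rather than values from this comment. The second function prints the `kubectl` command instead of running it.

```shell
#!/bin/sh
# Generate a short-lived self-signed certificate for a host name.
# Writes tls.key and tls.crt into the given directory.
make_dummy_cert() {
  host="$1"
  dir="$2"
  openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -subj "/CN=$host" \
    -keyout "$dir/tls.key" -out "$dir/tls.crt" 2>/dev/null
}

# Print the kubectl command that stores the pair as a TLS secret.
# The secret name and namespace are placeholders for illustration.
dummy_secret_cmd() {
  name="$1"
  dir="$2"
  echo "kubectl -n istio-system create secret tls $name --key $dir/tls.key --cert $dir/tls.crt"
}

# Example (requires openssl and an existing directory):
#   make_dummy_cert app.example.com /tmp/dummy
#   dummy_secret_cmd app-example-com-cert /tmp/dummy
```

The placeholder certificate only exists so that the Gateway has something to load while the HTTP01 challenge runs; it does nothing about the reload problem described in step B.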

@berstend

berstend commented Jan 28, 2019

Also I feel this issue is very much related: #7479

What's the current best practice to use istio's ingressgateway with certmanager in regards to having the certs updated in the ingressgateway automatically - without downtime?

Patching the ingressgateway deployment by hand/CronJob kinda works but causes a small downtime and feels like a hack?

@thomschke

thomschke commented Jan 28, 2019

how about increasing replicas of the ingressgateway deployment?

@berstend

berstend commented Jan 29, 2019

@thomschke thanks a lot for the suggestion!

Unfortunately patching the ingressgateway deployment with 3 replicas also results in downtime as all replicas are restarted at once (despite StrategyType: RollingUpdate and RollingUpdateStrategy: 0 max unavailable defined in the deployment).

Maybe this is due to the ingressgateway not having a liveness/readiness probe defined. Even upscaling the replicas will result in a brief service interruption. 😅

I'll continue toying with this approach, potentially killing the ingressgateway pods by hand/script with a brief delay, to force the pods rolling over without downtime.

Still, even if I find something that works without downtime, the next step would be to put this into a self-baked CronJob, which I feel isn't the right approach?

edit, FWIW: I'm using istio-on-gke which I've noticed comes with different defaults than the vanilla github istio.
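
Deleting the gateway pods one at a time with a pause in between could look roughly like this. It is a sketch only: the helper prints the commands instead of running them, and the label selector in the usage comment is an assumption about how the gateway pods are labelled in a given install.

```shell
#!/bin/sh
# Read pod names (one per line, as printed by "kubectl get pods -o name") on
# stdin and print a delete command for each, followed by a pause.
staggered_delete_cmds() {
  delay="$1"
  while IFS= read -r pod; do
    [ -n "$pod" ] || continue
    echo "kubectl -n istio-system delete $pod"
    echo "sleep $delay"
  done
}

# Hypothetical usage; the label selector istio=ingressgateway is an assumption
# about how the gateway pods are labelled. Pipe the output to "sh" to run it.
#   kubectl -n istio-system get pods -l istio=ingressgateway -o name \
#     | staggered_delete_cmds 60
```

Spacing out the deletions only limits how many pods are down at once; it does not by itself stop traffic from reaching a pod that is shutting down.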

@thomschke

thomschke commented Jan 29, 2019

@berstend: Very interesting, keep us informed !

Did you patch the annotations of the pod template (not the deployment itself) like:
kubectl patch -n istio-system deployment/istio-ingressgateway -p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"date\":\"$(date +'%s')\"}}}}}"

For upscaling -> #11001

@berstend

berstend commented Jan 29, 2019

Oh thanks @thomschke, that's the perfect PR for the problem I've encountered 😄

So far I had partial success using this combo:

❯❯❯ kubectl -n istio-system patch HorizontalPodAutoscalers/istio-ingressgateway -p '{"spec":{"minReplicas":3}}'
❯❯❯ kubectl -n istio-system patch deploy/istio-ingressgateway -p '{"spec":{"strategy":{"rollingUpdate":{"maxUnavailable": 0}}}}'
❯❯❯ kubectl -n istio-system patch deploy/istio-ingressgateway -p '{"spec":{"minReadySeconds":60}}'
❯❯❯ POD_NAME=foobar; kubectl -n istio-system patch deployment istio-ingressgateway -p '{"spec":{"template":{"metadata":{"annotations":{"restarted-by":"'${POD_NAME}'", "restarted-at":"'$(date +%s)'"}}}}}'

minReadySeconds will ensure that traffic will only hit new gateways after some delay, which actually works and results in rolling updates.

Unfortunately gateway pods that are in termination state are still being routed to and will cause connection errors. The more I play with this the stronger I feel that the only proper solution for the gateways is to have readiness probes.

I'm gonna look into monkey patching health checks into the ingressgateway now 😄

Still wondering how others run istio + certmanager in production at the moment without downtimes or manual scripting hacks.

edit, sorry - missed your question. I never patch pods directly but only through deployments, if that was your question :-)
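
The annotation-based restart used above can be wrapped in a small script. A minimal sketch, assuming the deployment name and namespace from this thread; the annotation key `restarted-at` is reused from the command above, and the script prints the commands rather than running them:

```shell
#!/bin/sh
# Build a patch that changes a pod-template annotation. Any change to the pod
# template makes the Deployment controller roll out new pods.
restart_patch() {
  stamp="${1:-$(date +%s)}"
  printf '{"spec":{"template":{"metadata":{"annotations":{"restarted-at":"%s"}}}}}' "$stamp"
}

# Print the commands instead of running them; pipe the output to "sh" to apply.
restart_cmds() {
  echo "kubectl -n istio-system patch deployment istio-ingressgateway -p '$(restart_patch "$1")'"
  echo "kubectl -n istio-system rollout status deployment/istio-ingressgateway"
}

restart_cmds
```

`kubectl rollout status` blocks until the rollout finishes, which makes the script usable from a CronJob or a CI step. It does not address the connection errors on terminating pods mentioned above.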

@esnible

Contributor

esnible commented Feb 2, 2019

Issue #11397 seems related.

@myidpt

Contributor

myidpt commented Feb 2, 2019

@JimmyCYJ is able to help.

@musicformellons

musicformellons commented Feb 11, 2019

Could anyone (@prune998 for instance?) maybe clarify why mixing with standard ingress (which does work with cert-manager) might not be a good idea? Or could this work fine?

@trevorlinton

trevorlinton commented Feb 22, 2019

I've pulled down the version of Istio that includes the commit @JimmyCYJ made 19 days ago (#11496) and tried it against Let's Encrypt.

I installed the latest nightly chart for Istio 1.1.x and followed the not-yet-released instructions posted here: https://github.com/istio/istio.io/pull/3224/files

I end up with these errors in my ingress gateway pod:

[2019-02-22 03:38:39.290][18][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_mux_subscription_lib/common/config/grpc_mux_subscription_impl.h:70] gRPC config for type.googleapis.com/envoy.api.v2.auth.Secret rejected: Proto constraint validation failed (SecretValidationError.TlsCertificate: ["embedded message failed validation"] | caused by TlsCertificateValidationError.CertificateChain: ["embedded message failed validation"] | caused by field: "specifier", reason: is required): name: "testsite2.example.com"
tls_certificate {
  certificate_chain {
  }
  private_key {
    inline_bytes: "-----BEGIN RSA PRIVATE KEY-----\nMIIEpAIBAAKCAQEAxt3D...redacted...\n-----END RSA PRIVATE KEY-----\n"
  }
}

It looks as if the certificate chain or cert isn't being pulled? I should note that I did not use mutual TLS authentication. Here are the Certificate, Gateway and VirtualService definitions:

apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: testsite2.example.com
  namespace: istio-system
spec:
  secretName: testsite2.example.com
  issuerRef:
    name: letsencrypt
    kind: ClusterIssuer
  commonName: 'testsite2.example.com'
  dnsNames:
  - testsite2.example.com
  acme:
    config:
    - dns01:
        provider: aws
      domains:
      - testsite2.example.com
---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: testsite2-example-com-gateway
  namespace: istio-system
spec:
  selector:
    istio: sites-public-ingressgateway
  servers:
  - port:
      number: 443
      name: https-testsite2-example-com
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: testsite2.example.com
      privateKey: /etc/istio/testsite2.example.com/tls.key
      serverCertificate: /etc/istio/testsite2.example.com/tls.crt
    hosts:
    - "testsite2.example.com"
  - port:
      number: 80
      name: http
      protocol: HTTP
    tls:
      httpsRedirect: true
    hosts:
    - "testsite2.example.com"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: testsite2-example-com
  namespace: istio-system
spec:
  hosts:
  - "testsite2.example.com"
  gateways:
  - testsite2-example-com-gateway
  http:
    - match:
      - uri:
          prefix: "/"
      rewrite:
        uri: "/productpage"
      route:
      - destination: 
          host: productpage.default.svc.cluster.local
          port:
            number: 9080
    - match:
      - uri:
          prefix: "/nginx"
      rewrite:
        uri: "/"
      route:
      - destination: 
          host: testapp.bashir.svc.cluster.local
          port:
            number: 80

The secret was created successfully, but it is of type kubernetes.io/tls and has entries in tls.crt and tls.key, while ca.crt is empty.

Ideas?
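
One way to double-check which fields of the secret actually carry data is to decode each one and see whether anything comes out. A small sketch; the helper name `field_state` is made up for illustration, and the `kubectl` calls in the comment need cluster access:

```shell
#!/bin/sh
# Report whether a base64-encoded Secret field (read from stdin) is empty.
field_state() {
  decoded="$(base64 -d 2>/dev/null)"
  if [ -z "$decoded" ]; then
    echo "empty"
  else
    echo "present"
  fi
}

# Hypothetical usage against the Secret from this comment:
#   for key in 'tls\.crt' 'tls\.key' 'ca\.crt'; do
#     kubectl -n istio-system get secret testsite2.example.com \
#       -o "jsonpath={.data.$key}" | field_state
#   done
```

The backslash in `tls\.crt` is needed because the key itself contains a dot, which jsonpath would otherwise treat as a path separator.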

@JimmyCYJ

Contributor

JimmyCYJ commented Feb 22, 2019

Thanks for trying it out. It looks like the gateway agent sends an empty certificate chain (TlsCertificateValidationError.CertificateChain) back to the ingress gateway. The gateway agent probably extracts an empty cert chain from the secret. I will test it tomorrow.

@thomschke

thomschke commented Feb 27, 2019
