HTTP/L7 metrics for TLS-terminated downstream requests in a Gateway #50758

Open
kaiburjack opened this issue Apr 30, 2024 · 6 comments

@kaiburjack

Describe the feature request

We currently use the Egress Gateway to terminate HTTPS/TLS requests that originate from our workloads and target external domains that we also control, and whose traffic we want to observe at L7. We have our own self-signed CA and a server certificate from our PKI, which we distribute securely to the Istio Egress Gateway. The Egress Gateway should terminate the HTTPS/TLS connections and make the HTTP requests visible for L7 metrics gathering. It then either forwards these HTTP requests to an egress cache (Varnish), whose Istio sidecar performs TLS origination to the actual external domains, or it originates TLS itself and sends the requests to the actual destination domain.

That all works splendidly. However, Istio/Envoy gathers no HTTP/L7 metrics for downstream HTTPS/TLS requests to the Egress Gateway, even though the Egress Gateway terminates TLS for them, so the HTTP requests should be visible to Istio/Envoy there.

I am assuming that when a Gateway CRD has an HTTPS/443 listener with downstream tls configured, the resulting Envoy listener is still treated as an HTTPS/TLS listener which "thinks" it cannot inspect HTTP/L7 traffic, even though we are doing TLS termination here.

Describe alternatives you've considered

Observing HTTP/L7 metrics for outbound/upstream requests in the Egress Gateway does work, though. So we currently have no L7 metrics for downstream requests to the Egress Gateway (which are originally HTTPS/TLS requests, but TLS-terminated by the Egress Gateway), but we do have HTTP/L7 metrics for the upstream requests made by the Egress Gateway.

However, that means we lose the correlation between the source service where the HTTPS/TLS connections/requests originate and the Egress Gateway, where those requests are received (and TLS-terminated) and then sent further upstream.

Affected product area (please put an X in all that apply)

[ ] Ambient
[ ] Docs
[ ] Dual Stack
[ ] Installation
[X] Networking
[ ] Performance and Scalability
[X] Extensions and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure

Affected features (please put an X in all that apply)

[ ] Multi Cluster
[ ] Virtual Machine
[ ] Multi Control Plane

Additional context

@howardjohn
Member

Can you share a config dump (istioctl pc all PODNAME)? Istio knows when it's HTTPS and does HTTP metrics, so this means either:

  • It's not actually doing what you think (terminating HTTPS)
  • Stats are not configured correctly
  • A new bug

@kaiburjack
Author

kaiburjack commented Apr 30, 2024

The Egress Gateway is definitely terminating the TLS connection. When I run e.g. curl -v https://httpbin.org/status/200 in a source workload/pod, I see the Common Name of our certificate (which we set to a distinctive name of our own, and which we also issued for httpbin.org for testing):

* Server certificate:
*  subject: CN=egress-server-cert
*  start date: Apr 30 14:41:18 2024 GMT
*  expire date: Apr 30 14:41:18 2025 GMT
*  subjectAltName: host "httpbin.org" matched cert's "httpbin.org"
*  issuer: CN=pki-ca
*  SSL certificate verify ok.

And the certificate is only ever mounted to the Egress Gateway's Gateway CRD, which looks like this:

spec:
  selector:
    app: istio-egressgateway
    istio: egressgateway
  servers:
  - hosts:
    - '*'
    port:
      name: https-port
      number: 443
      protocol: HTTPS
    tls:
      credentialName: egress-server-cert-secret
      mode: SIMPLE

I think it is much like an Ingress Gateway that terminates HTTPS/TLS connections for our own server domain (with a public Let's Encrypt certificate). There, I also do not see (via the Prometheus stats endpoint) the incoming/downstream requests from clients towards the Ingress Gateway: there are no istio_requests_total metrics with reporter="destination" on that Ingress Gateway pod. However, upstream requests are visible as Prometheus metrics when the TLS-terminated requests go from the Ingress Gateway to the upstream services/backends. There, I do see the HTTP requests, e.g. via istio_requests_total{reporter="source", source_app="istio-ingressgateway"}.

But there are no downstream HTTP metrics with istio_requests_total{reporter="destination", destination_app="istio-ingressgateway"}.

The Egress Gateway shows exactly the same behavior: it terminates the downstream HTTPS/TLS connections, but exposes no HTTP/L7 metrics on the Prometheus stats endpoint for incoming/downstream requests. For outgoing/upstream requests, though, the metrics are fine.
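To make the check described above repeatable, here is a minimal Python sketch that groups istio_requests_total samples from a Prometheus text scrape by their reporter label. The helper name requests_by_reporter and the sample scrape text are made up for illustration; the label names are the ones quoted in this comment. It assumes simple sample lines without timestamps and is not a full parser for the exposition format.

```python
import re

# One sample line in the Prometheus text format: name{labels} value
# (assumption: no trailing timestamp, no "}" followed by whitespace inside label values)
_SAMPLE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def requests_by_reporter(text, metric="istio_requests_total"):
    """Sum the sample values of `metric` per `reporter` label value."""
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        m = _SAMPLE.match(line)
        if m is None or m.group("name") != metric:
            continue
        labels = dict(_LABEL.findall(m.group("labels")))
        reporter = labels.get("reporter", "<none>")
        totals[reporter] = totals.get(reporter, 0.0) + float(m.group("value"))
    return totals


# Hypothetical scrape text, shaped like the gateway output described above.
SAMPLE_SCRAPE = '''
# TYPE istio_requests_total counter
istio_requests_total{reporter="source",source_app="istio-egressgateway",destination_service="httpbin.org",response_code="200"} 7
istio_requests_total{reporter="source",source_app="istio-egressgateway",destination_service="httpbin.org",response_code="503"} 1
istio_tcp_connections_opened_total{reporter="source",source_app="istio-egressgateway"} 3
'''

print(requests_by_reporter(SAMPLE_SCRAPE))
```

With this sample, only a "source" entry appears and there is no "destination" entry, which matches what I observe on the gateway pods.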

I am going to assemble a sensible istioctl pc all egress-gateway-... output, because it contains sensitive information in many places and is many tens of thousands of lines long.

By the way, we are using Istio 1.21.2.

@howardjohn
Member

1 request to a gateway generates 1 single metric, not a distinct one for upstream and downstream.

Sidecars generate 2 because there are two distinct sidecars (each generates 1)
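The counting rule above can be written down as a small toy model in Python. This is only an illustrative sketch, not Istio code: the Sample class, the two helper functions, and the app names are hypothetical, and the label values for the gateway case follow what is reported in this thread (the gateway appears as the source).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One istio_requests_total data point, reduced to three labels."""
    reporter: str
    source_app: str
    destination_app: str


def sidecar_to_sidecar(client_app, server_app):
    # Two proxies see the HTTP request, and each one reports one sample.
    return [
        Sample("source", client_app, server_app),       # client-side sidecar
        Sample("destination", client_app, server_app),  # server-side sidecar
    ]


def through_gateway(gateway_app, upstream_app):
    # A single proxy handles the request, so there is a single sample.
    # The workload that called the gateway does not appear in its labels.
    return [Sample("source", gateway_app, upstream_app)]


mesh = sidecar_to_sidecar("frontend", "backend")
egress = through_gateway("istio-egressgateway", "httpbin")
print(len(mesh), len(egress))
```

In this model the gateway case yields one sample whose source_app is the gateway itself, so the originating workload cannot be read from it.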

@kaiburjack
Author

kaiburjack commented Apr 30, 2024

> 1 request to a gateway generates 1 single metric, not a distinct one for upstream and downstream.

Oh, that makes total sense, since only one listener filter chain and a single HTTP connection manager process the request.
So, is there no way to correlate requests through a gateway between downstream and upstream?
What I am really after is to see in the metrics which downstream service sent a request through a gateway, which then sent that very request upstream to another service.
I cannot track the HTTPS request in the originating workload's outgoing sidecar, because it cannot inspect the HTTPS/TLS traffic at layer 7. Only the gateway could do that, but, as you said, it generates only a single metric, and that metric seems to look like an "outgoing/upstream" metric (so the source_app is the gateway and the destination_app is the service the request is destined for).
What I would like to have is a single metric point with the originating service as the source_app and the eventual upstream destination as the destination_app.

@howardjohn
Member

Basically no. You can use tracing to do this, but it's of course not quite the same as metrics.

Interestingly, as part of ambient mesh we decided to make the waypoint component work exactly how you want it to work: #42320 (comment). While we haven't talked about it much yet, there are plans to be able to use waypoints like egress gateways, which would ultimately fulfill that use case. It's a ways off to get the egress part though.

@kaiburjack
Author

kaiburjack commented Apr 30, 2024

Thank you for your answers and insights. Much appreciated!
Actually, the reason we opted for an egress gateway was that there does not seem to be an easy way for the outbound listener of a workload's sidecar to do the TLS termination: whenever a ServiceEntry was configured with an HTTPS port, Istio would not configure an HTTP connection manager on it, but would only make TLS metrics visible. Terminating the application-side HTTPS/TLS connection on the outbound listener would also suit us here.
Using the downstream tls config on a gateway was just the next best thing, since there is currently no such thing as a tls config on a sidecar's outbound/upstream listener (without resorting to EnvoyFilter magic, that is).
While researching this topic I came across at least the following Istio GitHub issues, which I think tackle the same problem:
