Metadata discovery service #43937

kyessenov · 2023-03-15T00:25:42Z

This is a proposal to use the Workload Discovery Service in the Istio mesh as the source of the peer metadata for the telemetry.
This effectively deprecates the requirement to supply the baggage header in the request/response pair in the HBONE protocol, and provides an alternative design for the existing metadata exchange protocols in the sidecar mesh.

Requirements

R1: Ambient mesh produces the Istio peer telemetry istio_requests_total for backwards compatibility with the sidecar mode. Additionally, any new telemetry proposals using OTel are supported.
R2: Pay-as-you-go for the telemetry - users should not carry costs unless they choose to get the value of the telemetry.
R3: Minimal requirements on the mesh members to join the mesh.
R4: Trust in the peer metadata.

Problems with the status quo

The current design for the telemetry production relies on several disjoint mechanisms to produce the standard telemetry in the Istio mesh:

1. HTTP metadata exchange

HTTP sidecar telemetry relies on the special x-envoy-peer-metadata header to announce the source metadata on each request. This design is very flexible, but it suffers from three issues: high cost, non-standard headers, and lack of integrity. First, the per-request cost of the telemetry is high: each request gets up to 4K overhead on the wire, and up to 10% CPU overhead decoding the header. This runs counter to goal R2: any Istio telemetry forces enablement of HTTP metadata exchange, which incurs significant data plane overhead. Second, special headers run counter to goal R3: to produce the telemetry on the server, clients have to synthesize the non-standard Istio header. Third, there is no way to establish the integrity of the metadata for R4, since the peer metadata is not signed by a trusted authority.

2. TCP ALPN metadata exchange

TCP traffic uses a different protocol that relies on a byte prefix in TCP connections, guarded by a special TLS ALPN string. This design violates R3: custom ALPN and custom wire encoding are incompatible with any client except the Istio sidecar. As with HTTP metadata exchange, R4 is not satisfied, and the metadata is untrusted.

3. Baggage: HTTP CONNECT metadata exchange

This is the same as the first design, but instead works on the longer-lived HTTP CONNECT tunnel streams. It suffers from the same problems: extra cost on the wire, extra cost for the interoperation, lack of integrity. The main benefit is better protocol encapsulation, since the applications no longer see the metadata header in-flight. The current implementation only supports server-side reporting, with client telemetry reporting depending on the EDS design below.

4. CDS and EDS

This is a fallback mechanism to use the dynamic metadata from CDS and EDS in case the metadata exchange fails. This is generally useful for the failure scenarios and for destinations outside the mesh. However, this clearly violates R2 - any load balancing server has to provide the peer metadata as part of the response, which is traditionally outside the domain of the global load balancing. A load balancing client also receives a much larger xDS even when not using the telemetry included with it. Interoperation in R3 is improved since there is no special data protocol, and the integrity R4 can be implemented by the control plane.

Proposal

We propose to replace all of the above with a dedicated workload metadata service, with the following payload and IP address as the resource key:

message Workload {
  // Name represents the name for the workload.
  // For Kubernetes, this is the pod name.
  // This is just for debugging and may be elided as an optimization.
  string name = 1;
  // Namespace represents the namespace for the workload.
  // This is just for debugging and may be elided as an optimization.
  string namespace = 2;
  // IP address of the workload. Serves as the lookup key.
  bytes address = 3;
  // The SPIFFE identity of the workload. The identity is joined to form spiffe://<trust_domain>/ns/<namespace>/sa/<service_account>.
  // TrustDomain of the workload. May be elided if this is the mesh wide default (typically cluster.local)
  string trust_domain = 6;
  // ServiceAccount of the workload. May be elided if this is "default"
  string service_account = 7;
  // CanonicalName for the workload. Used for telemetry.
  string canonical_name = 10;
  // CanonicalRevision for the workload. Used for telemetry.
  string canonical_revision = 11;
  // WorkloadType represents the type of the workload. Used for telemetry.
  WorkloadType workload_type = 12;
  // WorkloadName represents the name for the workload (of type WorkloadType). Used for telemetry.
  string workload_name = 13;
}
enum WorkloadType {
  DEPLOYMENT = 0;
  CRONJOB = 1;
  POD = 2;
  JOB = 3;
}
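As a rough illustration of how a consumer might use this payload, the sketch below mirrors the message as a Python dataclass, joins the SPIFFE identity with the elision defaults described in the field comments, and indexes workloads by packed IP address. The helper names (`spiffe_identity`, `build_index`, `lookup_by_ip`) are hypothetical and not part of the proposal.

```python
import ipaddress
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Optional


class WorkloadType(Enum):
    DEPLOYMENT = 0
    CRONJOB = 1
    POD = 2
    JOB = 3


@dataclass(frozen=True)
class Workload:
    address: bytes                 # packed IP address; serves as the lookup key
    name: str = ""
    namespace: str = ""
    trust_domain: str = ""         # empty means "mesh-wide default"
    service_account: str = ""      # empty means "default"
    canonical_name: str = ""
    canonical_revision: str = ""
    workload_type: WorkloadType = WorkloadType.DEPLOYMENT
    workload_name: str = ""


def spiffe_identity(w: Workload, default_trust_domain: str = "cluster.local") -> str:
    """Join the identity as spiffe://<trust_domain>/ns/<namespace>/sa/<service_account>."""
    trust_domain = w.trust_domain or default_trust_domain
    service_account = w.service_account or "default"
    return f"spiffe://{trust_domain}/ns/{w.namespace}/sa/{service_account}"


def build_index(workloads: Iterable[Workload]) -> Dict[bytes, Workload]:
    """Index workloads by their packed IP address."""
    return {w.address: w for w in workloads}


def lookup_by_ip(index: Dict[bytes, Workload], ip: str) -> Optional[Workload]:
    """Look up a workload by textual IP; returns None when the IP is unknown."""
    return index.get(ipaddress.ip_address(ip).packed)
```

Note that the identity can only be joined when `namespace` has not been elided, so a real consumer would presumably need that field present whenever identity matters.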

This metadata service is xDS-based, supports on-demand lookup, and allows querying metadata by network IP directly. The proposal satisfies all the requirements:

R1: The metadata content in the service subsumes the existing peer metadata.
R2: The service is optional and has zero wire and per-request CPU costs. The memory cost is O(workloads) with SotW xDS and is identical to the existing HTTP metadata exchange with on-demand delta xDS.
R3: Since this proposal eliminates the requirement for special headers or protocols, it’s simpler for endpoints to interoperate with the mesh.
R4: The integrity of the metadata can be established by using trusted lookup keys in the service, e.g. by using IPs when they are not easily spoofed, or by using extra attributes in the TLS certificates.
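The memory claim in R2 can be pictured with a small sketch: under an on-demand model the proxy only holds entries for peers it has actually seen, rather than one entry per workload in the mesh. The class below is an assumption-laden toy (the `fetch` callback stands in for an on-demand xDS request), not a description of any existing client.

```python
from typing import Callable, Dict, Optional


class OnDemandMetadataCache:
    """Holds metadata only for peers that were actually looked up."""

    def __init__(self, fetch: Callable[[str], Optional[dict]]):
        self._fetch = fetch                       # stand-in for an on-demand xDS request
        self._entries: Dict[str, Optional[dict]] = {}
        self.fetches = 0                          # number of control plane round trips

    def get(self, ip: str) -> Optional[dict]:
        if ip not in self._entries:
            self.fetches += 1
            # Unknown IPs are remembered as None so they are not re-fetched.
            self._entries[ip] = self._fetch(ip)
        return self._entries[ip]

    def __len__(self) -> int:
        return len(self._entries)
```

With a control plane that knows 1000 workloads and a proxy that only ever talks to two peers, the cache ends up holding two entries; a state-of-the-world subscription would hold all 1000.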

It is worth noting that a dedicated service is a better alternative to the CDS/EDS design above since it allows decoupling the metadata provider from the load balancing concern. It permits metadata to be keyed by something other than the endpoint addresses, and maximizes sharing of data (e.g. the same endpoints in two clusters). A dedicated client to the metadata service can be embedded in the user applications, or in the telemetry processing pipeline (e.g. OTel collector) for backfilling data, instead of shoehorning CDS/EDS for the same purpose.

Risks and Mitigations

1. Control plane load

As with any new xDS resource, there is a risk of new operational load on the control plane. It’s worth noting that EDS already includes the majority of the information needed, and is consumed by all endpoints, therefore already imposing that load in the sidecar model. To minimize the risk, we propose to share the xDS pipeline with ztunnel, and re-purpose the existing WDS as-is. We intentionally deleted the PTR parts of the proto related to the authorization, since that is outside the scope of the proposal.

2. Data plane load

There is additional memory overhead required to hold the peer metadata in the proxies. To minimize the risk, we propose to couple metadata discovery with HBONE. In other words, HBONE would require metadata discovery to produce Istio telemetry in all clients of HBONE. This would leave sidecars without HBONE safe, and allow us to gradually gain experience with the service as HBONE matures. A longer term approach to minimize the memory overhead is to switch to the on-demand model. This would require modifying the telemetry pipelines in Envoy to be asynchronous and to await the xDS response before flushing the telemetry reports. This can be delayed until there’s a strong need for it, since the xDS protocol fully supports this mode of operation.
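The asynchronous flush described for the on-demand model might look roughly like the following sketch, where a telemetry report is held until the peer lookup completes. Everything here is illustrative: `resolve` stands in for an on-demand xDS request, and the report shape is made up for the example.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


async def report_request(
    peer_ip: str,
    resolve: Callable[[str], Awaitable[Optional[dict]]],
    sink: List[dict],
) -> None:
    """Hold the report until peer metadata is resolved, then flush it."""
    metadata = await resolve(peer_ip)
    sink.append({
        "peer_ip": peer_ip,
        # Fall back to "unknown" when the lookup finds nothing.
        "source_workload": (metadata or {}).get("workload_name", "unknown"),
    })


async def demo() -> List[dict]:
    directory = {"10.0.0.7": {"workload_name": "reviews-v1"}}

    async def resolve(ip: str) -> Optional[dict]:
        await asyncio.sleep(0)  # stands in for the xDS round trip
        return directory.get(ip)

    sink: List[dict] = []
    # Two in-flight requests; neither blocks the other while waiting.
    await asyncio.gather(
        report_request("10.0.0.7", resolve, sink),
        report_request("10.0.0.9", resolve, sink),
    )
    return sink
```

Running `asyncio.run(demo())` yields one report labelled with the resolved workload and one labelled "unknown".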


howardjohn · 2023-03-15T17:36:44Z

One thing I wasn't sure about - if I get a request from 1.2.3.4, that IP may live in other networks. How do we handle this? (sorry if it's discussed already, wanted to comment before I forgot)

costinm · 2023-03-16T14:52:57Z

Related comment: please add a section on how we get the IP and network.

Default is simple - from the network layer.

As a server - x-forwarded-for or forwarded headers I assume, but only if the peer is auth and is a gateway or waypoint.

As client - don't remember if xff is propagated on return, need to check.

Would be worth adding a section on security (can a client forge its telemetry?).

I still believe we should include podname and clustername in all responses and allow hostname in addition to network+ip - I would use clustername instead of network, since Istiod may need to do on-demand lookup too in very large meshes.
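To make the network+ip idea concrete, here is a minimal sketch of a composite lookup key, so that the same IP in two networks (or clusters) resolves to different workloads. The function and type names are hypothetical.

```python
import ipaddress
from typing import Dict, Optional, Tuple

# (network or cluster name, packed IP address)
WorkloadKey = Tuple[str, bytes]


def workload_key(network: str, ip: str) -> WorkloadKey:
    """Build a key that stays unique even when IP ranges overlap across networks."""
    return (network, ipaddress.ip_address(ip).packed)


def lookup_in_network(index: Dict[WorkloadKey, dict], network: str, ip: str) -> Optional[dict]:
    return index.get(workload_key(network, ip))
```

Whether the first component should be the network or the cluster name is exactly the open question raised here; the sketch works the same either way.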

kyessenov · 2023-04-04T23:58:44Z

@howardjohn For multi-network, the proposal is to aggregate metadata in a federated way. That means that the metadata provider can access metadata from any mesh endpoint. Specifically, for multi-cluster endpoints we'd expose k8s metadata from all clusters to the provider.

@costinm Those are good points, and I don't have a good answer for all of them and deliberately left them underspecified.
There are several options:

Client or server lookup using mTLS property. ztunnel presents a client pod certificate with client pod name in it, and we look up based on that.
Client lookup using x-forwarded-for. ztunnel appends the header during forwarding and we use the first hop IP address as the key.
Client lookup using pod name and pod cluster. ztunnel injects or propagates the header identifying the "real client" as pod ID and pod cluster ID, and we use that as the key.
Server lookup using destination IP. A client uses the destination IP and/or endpoint metadata as the lookup key.
Server lookup using destination pod name and pod cluster. We'd require special response headers to identify the workload in the server gateway/ztunnel.
All of the above.
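As one example from the options above, the x-forwarded-for variant could derive the lookup key along these lines. This is only a sketch: it assumes the header holds plain comma-separated IP addresses, and it deliberately ignores the question of whether the header can be trusted, which depends on the peer being an authenticated gateway or ztunnel.

```python
import ipaddress
from typing import Optional


def client_key_from_xff(xff: str) -> Optional[bytes]:
    """Return the packed first-hop IP from an x-forwarded-for value, or None."""
    first_hop = xff.split(",")[0].strip()
    if not first_hop:
        return None
    try:
        return ipaddress.ip_address(first_hop).packed
    except ValueError:
        # Not a bare IP address (e.g. garbage or an obfuscated identifier).
        return None
```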

I think there's a need to standardize on the "workload identification headers" in HBONE protocol. Baggage is an indirect way to deliver this information, but I think we need a first class representation for it, not assume only telemetry usage.

For background information, OTEL k8s attribute processor fulfills the same design goal in a broader k8s context.

lei-tang · 2023-04-05T17:31:34Z

Hi Kuat, thanks for the proposal! Can you add a detailed workflow diagram to help readers better understand the proposal?

kyessenov · 2023-04-05T18:07:22Z

@lei-tang This is a basic workflow for metadata discovery by a gateway:

CC @markdroth : FYI a proposal to drop "peer metadata" header from transport protocol requirements, and rely on a separate "back-fill" metadata flow. This aligns well with OTel processor pipeline architecture, instantiated as a custom xDS-based telemetry processor in Envoy.

howardjohn · 2023-04-06T19:47:28Z

One thing I think Baggage gives that this doesn't is the ability for the client to tell the server which Service it accessed it through.

costinm · 2023-04-06T20:19:35Z

IMO telemetry is not an exact science - and should use whatever metadata is available, from the most accurate source it has access to.

If a Baggage header from the client exists - and it includes the service it accessed the server through - we should use it. Otherwise - canonical service is likely good enough, and is what we would use if the client is external (or not using Istio).

I also want to make sure we take into account other headers that are part of routing. For example, for session affinity to work, we need to encode the cluster (== service) into the cookie or some header. We discussed in the past that for many CONNECT clients the authority header will be the hostname (== service). While now we set it to the IP, the protocol should probably preserve the VIP and hostname when available and include it in standard headers (XFF, etc). And I think telemetry should also get info from standard headers (authority, XFF, etc), if available.


kyessenov · 2023-04-06T20:53:55Z

FYI destination service was never in scope for the metadata exchange. It's a property of a request, while the subject is a description of the peer. Many things break if you try to put per-request property onto peer metadata since the consumers assume peer metadata is immutable and it is aggressively cached.

Besides that, yes it could be useful. We had that before with a mixer attribute but it was not a popular design choice.

costinm · 2023-04-07T02:15:08Z

Destination is not part of the 'metadata exchange', just info from client to server. But it is valid and useful telemetry info for logs or traces - and we should send it if available ( VIP or clustername).


louiscryan · 2023-05-15T19:21:30Z

Trying to catch up on this a bit and it seems like it might be worth an explicit review.

I'm not totally sold that we can make federation work reliably for multi-network solutions. We already allow for connectivity without full multi-network knowledge via loose coupling. This might even be problematic within large networks with very high cluster counts.

As for keeping bandwidth down, I don't see the option listed of simply POSTing the metadata on connection initiation, which would outperform putting it into CONNECT headers. The reserved endpoint for this can be versioned so we could actually treat this like a real API. I do agree that many of the alternative methods are pretty terrible.

Finally for verification we can make sure inlined baggage is signed by an authority like the control plane which is what was proposed above. (Aside - this feels a lot like putting things into SANs so might be worth talking about those two things together)

I generally agree with @costinm that we can do both and fall back to the control plane if we can't resolve inline passing the caller identity etc.

kdorosh · 2023-05-15T19:43:50Z

For multi-network, the proposal is to aggregate metadata in a federated way. That means that the metadata provider can access metadata from any mesh endpoint. Specifically, for multi-cluster endpoints we'd expose k8s metadata from all clusters to the provider.

just reiterating what @louiscryan said above in a different way: beyond loose coupling, many isolated control plane architectures have strong requirements to preserve telemetry without allowing contact across boundaries (all communication is coming through network gateways, which is exactly what we want telemetry on. we cannot force users to make cross network requests to a federated metadata service for telemetry information)

kyessenov · 2023-05-15T20:03:02Z

@kdorosh @louiscryan

If every externally facing gateway acts as a metadata discovery service for its network, would that address your concern about the centralized metadata discovery service?

Concretely, a server telemetry producer would call gRPC POST asynchronously to retrieve metadata for an endpoint behind a gateway on a well-known endpoint. The gateway address would be either a network address or a dedicated header in the CONNECT request. Every sidecar could also respond to the Metadata discovery to itself.

I agree that signing the header would work as a delegation mechanism, although that would be the first instance in Istio. However, signing doesn't address the other problems with inline headers:

coupling with the protocol (CONNECT)
per-request overhead (higher in fact with signing)
coupling with Istio proxies, a telemetry intermediary like Otel collector cannot participate in the telemetry production and off-load the proxies.

kdorosh · 2023-05-15T21:14:31Z

If every externally facing gateway acts as a metadata discovery service for its network, would that address your concern about the centralized metadata discovery service?

I think it could, provided we update HBONE baggage or something to include the source network per istio/ztunnel#515

Also in #43937 (comment)

Many things break if you try to put per-request property onto peer metadata since the consumers assume peer metadata is immutable and it is aggressively cached.

Can you elaborate on this caching? Who is caching, why, and for how long? I'm a bit concerned with all this async stuff since IPs can be recycled in k8s.

Also generally worth discussing.. I think this metadata service may have to handle a very high amount of requests (barring aggressive caching, concerns noted above) and that comes with its own operational and CPU/memory costs. Given that for request metrics (not peer metadata) we will still need HTTP headers or TCP metadata for things like originating client IP, originating network, etc.. it seems like we're already paying the cost on each request and that baggage is not as painful as it seems.

kyessenov · 2023-05-15T22:04:54Z

Many things break if you try to put per-request property onto peer metadata since the consumers assume peer metadata is immutable and it is aggressively cached.

Can you elaborate on this caching? Who is caching, why, and for how long? I'm a bit concerned with all this async stuff since IPs can be recycled in k8s.

It is cached in the MX extension because the CPU cost to decode the header is significant https://github.com/istio/proxy/blob/master/extensions/metadata_exchange/plugin.cc#L116. There's no expiration or verification - it's easy to confuse telemetry with the same key.
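A toy illustration of the hazard described here (this is not the code of the actual extension): a cache that decodes once per key and never expires or re-verifies will keep returning the first metadata it saw, even if a later peer presents the same key with different content. The `decode` callback and key format are placeholders.

```python
import json
from typing import Callable, Dict


class PeerMetadataCache:
    """Decode-once cache with no expiration and no verification."""

    def __init__(self, decode: Callable[[str], dict]):
        self._decode = decode
        self._by_key: Dict[str, dict] = {}
        self.decodes = 0

    def get_or_decode(self, key: str, encoded: str) -> dict:
        if key not in self._by_key:
            self.decodes += 1                 # the expensive step the cache avoids
            self._by_key[key] = self._decode(encoded)
        # The encoded payload is ignored on a hit, so stale data can be returned.
        return self._by_key[key]
```

For example, if two different peers end up presenting the same key over time, the second one is reported with the first one's metadata.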

costinm · 2023-05-16T01:41:58Z

I'm not even sure what 'federated' means - Istio doesn't really support multiple meshes, and all multi-cluster and multi-network is based on a single security and discovery domain. Federations are interesting - and there are various options, but we can focus on the current feature set of Istio and maybe use a different solution for federation (where boundaries, 'authorities', root CAs, for each federated entity are defined).

With Istio as it is today and 'flat network' - I think the IP that is intercepted and used in the flat network is fine. With multi-network - either overlapping IPs or if we have to go through an East-West gateway - we will clearly need the gateway to pass information. But we do need this for ingress and egress as well.

It is not exclusive - i.e. it is not the case that we must pick only one source of metadata with nothing else allowed. We use the best metadata we find: if the peer is a waypoint/E-W/ingress/egress we rely on CONNECT and headers (after we verify the identity of the peer as a trusted gateway); if the peer is a same-cluster or a non-Istio workload in the cluster, we can get the info using MDS. And I agree that if a JWT or other signed info is present - either in-band or in a source we can pull on demand from (including DNSSEC-signed records, for example) - that's a golden signal, and Istiod could integrate with such sources.

Federated versions of 'telemetry meta for IPs' have worked relatively well for >20 years, as DNS PTR, WHOIS, geolocation and other sources, at least for telemetry and abuse, and they continue to be very useful for ingress traffic. I don't think it would be bad if someone wrote a 'workload discovery server' to integrate with such sources for ingress, and Istiod would delegate to it for the public IP ranges.
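The "use the best metadata we find" approach can be sketched as an ordered chain of sources, where headers are only believed when the peer was verified as a trusted gateway and MDS is the fallback. The request fields (`peer_is_trusted_gateway`, `baggage`, `peer_ip`) and source names are invented for the illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Source = Callable[[dict], Optional[dict]]


def from_gateway_headers(request: dict) -> Optional[dict]:
    """Use in-band metadata only when the peer is a verified, trusted gateway."""
    if request.get("peer_is_trusted_gateway"):
        return request.get("baggage")
    return None


def make_mds_source(directory: Dict[str, dict]) -> Source:
    """Build a source that looks the peer IP up in a metadata directory."""
    def from_mds(request: dict) -> Optional[dict]:
        return directory.get(request.get("peer_ip"))
    return from_mds


def resolve_peer_metadata(request: dict, sources: List[Tuple[str, Source]]) -> Tuple[str, dict]:
    """Return the first source that yields metadata, along with its name."""
    for name, source in sources:
        metadata = source(request)
        if metadata is not None:
            return name, metadata
    return "unknown", {}
```

Ordering the list is the policy decision: a signed in-band source could be placed first, with the directory lookup after it.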


bleggett · 2023-05-16T17:55:46Z

@kdorosh @louiscryan

If every externally facing gateway acts as a metadata discovery service for its network, would that address your concern about the centralized metadata discovery service?

Concretely, a server telemetry producer would call gRPC POST asynchronously to retrieve metadata for an endpoint behind a gateway on a well-known endpoint. The gateway address would be either a network address or a dedicated header in the CONNECT request. Every sidecar could also respond to the Metadata discovery to itself.

I agree that signing the header would work as a delegation mechanism, although that would be the first instance in Istio. However, signing doesn't address the other problems with inline headers:
* coupling with the protocol (CONNECT)

* per-request overhead (higher in fact with signing)

* coupling with Istio proxies, a telemetry intermediary like Otel collector cannot participate in the telemetry production and off-load the proxies.

1. If every gateway acts as a metadata source for its network, how do you control information leakage? Trusting remote proxies to be good consumers? Granted, information leakage is already a concern today with envoy-peer-metadata, but this seems messy.
2. Signing headers feels gross, adds a lot of overhead, and doesn't help with 1). If you need to sign a header, then you shouldn't be using a header. Something involving out-of-band checks against a singular identity to establish authenticity of source (e.g. how SPIRE does it), rather than relying on request header signatures, makes a whole lot more sense to me as a scalable long-term option here.

costinm · 2023-05-16T19:28:30Z

I don't think gateways should or can act as MDS. Ztunnel or Istiod - yes, they are trustworthy and relatively well isolated. Ztunnel could be trusted for same-node. Trust can be delegated - Istiod may use XDS (or other mechanisms) to talk with other trusted metadata sources.

But gateways should not be required to implement MDS - they do have a role in propagating the source (X-F-F or the proxy protocol), but we should be able to use non-Istio gateways and the infrastructure that has been around for a long time without inventing new requirements.

This covers normal mesh - as we have today, I don't think we should couple federation or foreign control planes. While I agree that signed metadata is valuable - in particular for federation - I don't think we need to complicate things for normal mesh, where Istiod is already trusted.

One problem is that current metadata is exposed only as XDS delta - and the ztunnel MDS server over HTTP only returns peer identity. We should also expose it over HTTP in Istiod - and expose more info in ztunnel, maybe also in envoy sidecars for consistency.


bleggett · 2023-05-23T14:26:28Z

Where I think I'm at on this:

1. If we have a metadata discovery service, it will have to be publicly exposed across clusters, and thus have basic authz controls at the very least. I don't think this means we shouldn't do it, but it's probably the largest risk to control for. We could do something where we expose an endpoint that takes a workload cert and returns a metadata blob if the cert was issued by our local CA and is still valid. At this point we're beginning to get pretty "NIH SPIRE"-y though and that means workload certs would have to be conveyed across clusters via XFCC or similar.
2. If we don't have a metadata discovery service, we need signed headers. And we should probably only construct/send those when crossing boundaries.
3. If we use signed headers, we should instead just use JWTs, as has been floated previously (which is effectively a purpose-built signed metadata blob header).
4. We can use a metadata service locally, and send JWTs across borders to avoid 1).

tl;dr it's a metadata service or JWTs, or a hybrid of the two. The hybrid is currently more appealing IMO especially since it can be implemented in stages.

The other option is "lean more heavily on SPIRE workload identity attestation and SPIRE workload identity federation" which is probably not simpler than any of the above for us to do, though it would offer additional attestation and PoP capabilities that the above options do not.

costinm · 2023-05-23T15:01:40Z

Where I think I'm at on this: 1. If we have a metadata discovery service, it will *have* to be publicly exposed across clusters, and thus have authz controls. I don't think this means we shouldn't do it, but it's probably the largest risk to control for. We could do something where we expose an endpoint that takes a workload cert and returns a metadata blob if the cert was issued by our local CA and is still valid. This is beginning to get pretty "NIH SPIRE"-y though.

Only in 'federation' cases (which we don't support yet in Istio), and only in a particular design for federation. In Istio we still have each Istiod watch all clusters and pods - so it can generate EDS - which means it can generate MDS. And the XDS federation model (which is partially supported) is also based on Istiod talking with other XDS servers. There is some auth and trust in both cases, of course, but istiod needs to authenticate itself to k8s or XDS servers. In other words - nothing special or different from how Istio EDS works, just a reverse index on the same info. Istiod in cluster is the trust anchor.

2. If we don't have a metadata discovery service, we need signed headers. And we should probably only send those when crossing boundaries.
3. If we use signed headers, we should instead just use JWTs.

I agree, a signed JWT (or peer cert) are good sources of info for remote clusters that are not part of the federated mesh. But parsing and verifying JWT for metadata has a cost - plus maintaining the roots, workloads getting signed JWTs - probably with audience because otherwise they're as good as regular headers. And we are moving into 'why not just use JWT plus TLS as alternative to client certs, and add meta to JWT instead of cert'. Which is not bad for peers outside of Istio MC or XDS federation.

4. We can use a metadata service locally, and send JWTs across borders to avoid 1)

tl;dr it's a metadata service or JWTs, or a hybrid of the two.

Metadata service for current Istio use cases, JWT or client cert for the 'federated' model we will need to design and implement.


bleggett · 2023-05-23T17:45:56Z

In Istio we still have each Istiod watch all clusters and pods - so it can generate EDS - which means it can generate MDS. And the XDS federation model (which is partially supported) is also based on Istiod talking with other XDS servers. There is some auth and trust in both cases, of course, but istiod needs to authenticate itself to k8s or XDS servers.

Makes sense, and as you mention this is simple for single-cluster to start with. I'm not sure how scalable it is to use XDS federation for all cases, but it's not the end of the world to convey the same info differently across boundaries if we need to, so it feels deferrable.

Either way, we'll probably eventually need either JWTs or a workload analog to the OIDC /userinfo endpoint for non-federated workload metadata.

And we are moving into 'why not just use JWT plus TLS as alternative to client certs, and add meta to JWT instead of cert'. Which is not bad for peers outside of Istio MC or XDS federation.

Certs are good simple identity documents, and terrible metadata stores. It makes sense to me to keep certs for identifying (used for authZ) metadata, and use a JWT (or a metadata endpoint that accepts an identity document) for non-identifying metadata (if we really ever need to shuttle baggage around as a blob).

probably with audience because otherwise they're as good as regular headers.

Meh, if they're signed by the workload cert I don't think aud helps us much but I might be wrong.

bleggett · 2023-05-25T17:03:28Z

Agree we probably need multiple mechanisms here. Putting peer metadata in certs is wildly impractical beyond the very very basic identity-specific fields the x509 spec dictates (which will not be sufficient for Istio's needs)

A peer metadata query service for cluster-local lookup makes sense since most of that info is readily available locally.
As @kdorosh said, we cannot force users to make cross network requests to a federated metadata service for telemetry information, so we need an alternate mechanism to propagate that metadata across boundaries.

If we have a workload metadata authority (whether strictly local or not), I think it's going to need to be replaceable/composable in an equivalent way to how workload identity authorities (e.g. CA) are.

If I want my workload identity authority to attest more things about the workload than the default istio CA is capable of attesting, before issuing that workload an identity, I can replace the Istio CA with my own CA that does this.
If I want my workload metadata authority to attest more metadata about the workload than the default Istio workload metadata authority cares about, there needs to be a way to compose/append (or replace) that.

This is much simpler if the workload identity document and the workload metadata document are the same (JWT), but that's infeasible for Istio so I think we have to contend with the alternative options.

It's a little less simple if the workload identity document and workload metadata document are disjoint (workload cert + potential baggage JWT).

costinm · 2023-05-26T00:17:44Z

Agree we probably need multiple mechanisms here. Putting peer metadata in certs is wildly impractical beyond the very very basic identity-specific fields the x509 spec dictates.

We have to disagree on the 'wildly' - x509 certs are broadly used for a lot of things outside of HTTPS, in SIM cards or passports and more. While ASN1 is not protobuf - and OIDs are inconvenient for OSS projects - it is a pretty extensible format that has stood the test of time. JWTs may sound easier - but there are a lot of problems with them as well. But either way - if a CA provider manages to put metadata in a cert and sign it - I think we can agree that it's trustworthy data and use it. It may turn out that very few CAs do, or only put little data - for example on GKE the pod name is included in an OID. I don't think it's unreasonable to have some config mapping OIDs to meta keys, if CAs are willing to sign them.
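
A config mapping OIDs to meta keys could look roughly like the sketch below. Everything named here is an assumption for illustration: the OID strings are placeholders rather than real registrations, the key names are made up, and the certificate extensions are modeled as an already-decoded dict instead of being parsed from a real x509 certificate.

```python
# Hypothetical mapping from certificate extension OIDs to peer metadata keys.
# The OIDs below are placeholders for illustration, not real registrations.
OID_TO_META_KEY = {
    "1.3.6.1.4.1.99999.1.1": "pod_name",
    "1.3.6.1.4.1.99999.1.2": "namespace",
}


def metadata_from_extensions(extensions, oid_map=OID_TO_META_KEY):
    """Return metadata for the extensions whose OID is in the configured map.

    `extensions` maps OID strings to already-decoded string values, as they
    might be extracted from a CA-signed workload certificate. Unmapped OIDs
    are ignored, so only explicitly configured fields become metadata.
    """
    return {
        oid_map[oid]: value
        for oid, value in extensions.items()
        if oid in oid_map
    }
```

Ignoring unmapped OIDs keeps the operator in control of which signed fields are surfaced as telemetry labels.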

A peer metadata query service for cluster-local lookup makes sense since most of that info is readily available locally.
As @kdorosh said - we cannot force users to make cross network requests to a federated metadata service for telemetry information, so we need an alternate mechanism to propagate that metadata across boundaries.

I agree we can't force users to make cross-network requests - but we can require that the control planes federate. That's what the XDS federation is about. The metadata service is not a new invention: DNS PTR and geo-location databases have been used for a long time, and as you mentioned, OIDC and OAuth2 provide the userinfo endpoint for exactly this purpose - get metadata about a principal based on the token. Email and name and more.

if we have a workload metadata authority (local or not) I think it's going to need to be replaceable/composable in an equivalent way to how workload identity authorities (e.g. CA) are.

If I want my workload identity authority to attest more things about the workload than the default istio CA is capable of attesting, before issuing that workload an identity, I can replace the Istio CA with my own CA that does this.
If I want my workload metadata authority to include more metadata about the workload than the default Istio workload metadata authority cares about, there needs to be a way to compose (or replace) that.

This is much simpler if the workload identity document and the workload metadata document are the same (JWT), it's a little less simple if the workload identity document and workload metadata document are disjoint (workload cert + potential baggage JWT).

I don't think dealing with JWTs and OIDC and federated JWT signers is as easy as it seems. I'm all for using the JWTs when available, but I don't think we should go overboard - just like we should not go overboard with certificates. If we do adopt JWT - we should just use existing standards, OIDC/OAuth2 are not limited to humans and are broadly used for machine to machine. And they include the infra to sign JWTs, federate - and their own metadata service. However we should not exclude XDS federation either, since it provides more than just metadata.

louiscryan · 2023-07-14T17:04:53Z

@kyessenov

I think I'm relatively convinced that using WDS (Workload Discovery API) is the right mechanism for the majority of use-cases. I chatted a bit with @howardjohn and @costinm about this as well as @bleggett

Some basic constraints and supporting information....

We are already distributing workload metadata for the entire fleet visible to a single istiod instance to every ztunnel, so we are not particularly concerned with availability issues for resolving client metadata on the server side. It does not seem like we need to use on-demand loading at this time, and if we have to add it later for scale reasons our mitigation approaches are viable.
For more federated control-plane use-cases, where traffic transits E-W Gateways but the remote workload information is opaque to the client, we should send baggage, so there needs to be a property in WDS indicating to ztunnel that it should do this. See @stevenctl's recent work on this for WorkloadEntry.
Using OTEL baggage as the encoding form seems inappropriate for this use case. Its specification indicates that it's not hop-by-hop and there is no verification mechanism for the authenticity of the data. Instead it seems appropriate to use a JWT with signed claims to convey origin information. Having signed claims has the added benefit of allowing trusted claims to be used in policy decisions and not just telemetry. Discussion of whether this was possible with x509 was extensive (and likely ongoing) but this was the consensus.
As noted above, this solution has the best performance dynamic for the majority of traffic flows (intra-cluster).
This solution allows the WDS and dataplanes to largely evolve independently, making it easier to enhance capabilities in either over time.

We discussed what the JWT should look like and how it should be constructed and signed. While details need to be worked out, as a strawman we thought that:

JWT should be issued by and signed by istiod and delivered to ztunnel inside the WDS API. Something must attest to the claims it contains and istiod is an authority here.
The JWT must be channel-bound to the x509 credential of the same workload identity. I.e. when receiving the JWT the receiver must validate that its 'sub' matches the SPIFFE ID of the SAN for the negotiated channel.
Since x509 certificates are already subject to re-issue, it is not clear that the JWTs need to be re-issued as well, so they may not require an expiry.
The issuer should be istiod's identity and not any other account, particularly not any other workload identity, so they cannot be used in other contexts to impersonate workloads.
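
The receiver-side checks in this strawman can be sketched as follows. This is a minimal illustration under stated assumptions, not a proposed implementation: the function names are hypothetical, HS256 with a shared secret stands in for whatever key istiod would actually sign with (chosen only to keep the sketch stdlib-only), and the peer SPIFFE ID is assumed to have been extracted from the SAN of the negotiated certificate already. No expiry is checked, matching the strawman.

```python
import base64
import hashlib
import hmac
import json


def _b64url_decode(segment):
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def mint_metadata_jwt(claims, secret):
    """Mint an HS256 JWT (stand-in for istiod signing; key type is an assumption)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    sig = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_metadata_jwt(token, secret, expected_issuer, peer_spiffe_id):
    """Return the claims if the token is authentic and bound to the channel.

    peer_spiffe_id is the SPIFFE ID from the SAN of the peer certificate
    negotiated on this connection. Raises ValueError on any failed check.
    """
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected_sig = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected_sig, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    # The issuer must be the metadata authority, never a workload identity.
    if claims.get("iss") != expected_issuer:
        raise ValueError("unexpected issuer")
    # Channel binding: 'sub' must equal the identity proven by the TLS channel.
    if claims.get("sub") != peer_spiffe_id:
        raise ValueError("sub does not match channel identity")
    return claims
```

With the channel-binding check in place, a token replayed over a connection authenticated as a different workload is rejected even though its signature is valid.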

The above needs to be put into a doc and thrashed out. It is of course unfortunate that we cannot use the certificate issuance and attestation flow to achieve the above effect but as long as we are confident in the security relationship between istiod and the apiserver this seems acceptable to layer on top.

bleggett · 2023-07-14T17:15:06Z

The JWT must be channel-bound to the x509 credential of the same workload identity. I.e. when receiving the JWT the receiver must validate that its 'sub' matches the SPIFFE ID of the SAN for the negotiated channel.

if we do this, then I don't think it matters what is used to sign the JWT (istiod cert or workload cert), and this:

The issuer should be istiod's identity and not any other account, particularly not any other workload identity so they cannot be used in other contexts to impersonate workloads.

becomes largely moot.

Unless we're just saying that impersonation is less likely with istiod because the istiod certs are "more protected" than the workload certs and therefore less susceptible to exfil - which is pretty tenuous - it will be ztunnel doing all impersonation no matter what.

If ztunnel is trusted to impersonate the workload for the purposes of constructing the channel then it arguably doesn't matter whether istiod or ztunnel sign/mint the JWT - it's a net-nil difference in impersonation risk.

But that's an impl detail, approach SGTM.

howardjohn · 2023-07-14T18:07:42Z

JWT should be issued by and signed by istiod and delivered to ztunnel inside the WDS API. Something must attest to the claims it contains and istiod is an authority here
The JWT must be channel-bound to the x509 credential of the same workload identity. I.e. when receiving the JWT the receiver must validate that its 'sub' matches the SPIFFE ID of the SAN for the negotiated channel.

Instead of having the CA issue a cert, and Istiod issue a JWT, then linking them by sub, is there a way to somehow directly sign the JWT with the workload certificate itself?

More or less agreeing with #43937 (comment) I think.

louiscryan · 2023-07-14T21:00:13Z

It depends whether we think the workload is a sufficiently reliable asserter of its claims. I think the receiving side would like some confidence that claims are trustable if policy is going to be enforced against them.

ztunnel can impersonate workloads but, ideally, it can't/shouldn't be able to impersonate the credential used to sign the claims. I agree this boils down to istiod's (or some CA's) credentials being better protected; we're certainly reliant on that being the case for a CA.

We don't strictly have to use the workload identity of istiod as the signing key, that choice is more flexible. A common secret for instance would suffice.

howardjohn · 2023-07-14T21:02:06Z

Ah good point. OK, I think I am on the same page as you then.

louiscryan · 2023-07-14T21:19:15Z

Capturing some related work going on for reference

Standardizing some OIDs for K8S, primarily for machine identification in kubelet but seems like it could be leveraged to improve some of the information in x509 with some effort, particularly a workload identity in addition to SA identity in the SAN. Likely insufficient to obviate the need for a JWT to capture info but worth tracking...

kubernetes/k8s.io#1959

costinm · 2023-07-15T01:10:52Z

Let's not over-complicate things. A JWT from any source that is configured as trusted (with current APIs) is considered good enough for authenticating users, so it should be more than sufficient for telemetry. Binding JWTs to the channel is an interesting improvement - we should do it for authentication JWTs first, but it is not required; they are not broadly used. I agree that ztunnel can't sign the JWTs - it may be trusted to impersonate and see traffic for a pod, but not to make or verify claims about the pod. Istiod could sign tokens as a fallback - but most orgs and clouds have IDP integrations and it's better to integrate with them and keep Istiod as last resort.

Another thing we may want to reconsider is federation on multi-network / separated control planes. The WDS is not very different from having PTR records for the hostnames - if the cert or JWT has a real workload identity (with FQDN of the workload) - it should be possible to expose just the core metadata. Duplicated IP addresses or ephemeral IPs are not a concern as long as FQDN is used - and normally the internet requirement is for FQDNs to be unique. There are some things to resolve if all clusters use 'cluster.local' (it's no longer a hierarchical naming with each admin domain having locally unique names) - but there are solutions. And that's where on-demand would fit well.

I mentioned before - we should not dismiss DNS. Modern DNS is secure and extremely fast and scalable - and broadly used. Programming TXT records into DNS is broadly used and supported too, including tools for k8s to sync up with enterprise DNS servers. While WDS is great in clusters - falling back to DNS would make the telemetry work on internet scale.

kyessenov · 2023-10-04T18:09:10Z

cc @whitneygriffith

costinm · 2023-10-05T15:21:08Z

My position has changed a bit: while ztunnel should not be able to sign arbitrary identity JWTs, the JWT is a very flexible format allowing anyone to sign a statement. Security is not black and white, but incremental. Today's status is that any pod can lie about its metadata. Ztunnel signing a JWT - plus a Ztunnel certificate or identity JWT attesting the ztunnel node and identity, plus a check on the peer ztunnel - would be a huge improvement, and the JWT can be passed through a chain of gateways (east-west, ingress, egress, global LBs, etc). Bearer tokens are not perfect - but every solution has tradeoffs and it's good to allow users to make the choice that fits their needs and/or combine multiple mechanisms.

So I would now vote with 'all of the above' (in time) - WDS, DNS records, JWTs signed by ztunnel, JWTs signed by an IDP, direct use of K8S APIs - depending on availability of the infra and the use cases. We already have plenty of experience with handling JWTs, DNS and XDS, and as we see in the OTel SDK, using plugins/extensions to extract telemetry from diverse platforms/discovery systems is quite valuable. However I think this work would be far more valuable in the context of extending the OTel collector - so it's not Istio specific and we can better integrate with other telemetry producers.

kyessenov · 2023-11-13T22:15:21Z

Refs: #47205, #47584

linsun · 2023-12-19T19:46:30Z

@kyessenov is this already implemented?

from our prior chat:

Metadata discovery requires "ambient WDS controller" on by default. Waypoint always uses it.

If I do enable the ambient PEER_METADATA_DISCOVERY, I'll be able to use the metadata discovery service for sidecar.

kyessenov · 2023-12-19T19:53:36Z

Yes, it can be used on sidecar as a fallback. The traditional headers will take priority.
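
The priority described here (traditional headers first, discovery as fallback) can be sketched as below. This is illustrative only: the function name and return shape are hypothetical, and keying the discovery data by peer address is an assumption about how the lookup would work.

```python
def resolve_peer_metadata(request_headers, peer_address, discovered_workloads):
    """Pick the peer metadata source for one request.

    The traditional x-envoy-peer-metadata header wins when present; otherwise
    fall back to a lookup by peer address in the locally held discovery data.
    Returns (source, metadata), or ("none", None) when neither is available.
    """
    header_value = request_headers.get("x-envoy-peer-metadata")
    if header_value:
        return "header", header_value
    workload = discovered_workloads.get(peer_address)
    if workload is not None:
        return "discovery", workload
    return "none", None
```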

linsun · 2024-01-26T21:28:44Z

Cool, anything remaining or should we close this out?

kyessenov · 2024-04-04T17:21:46Z

It's done as opt-in and on waypoints. Changing the defaults for Istio is difficult due to compatibility concerns; we'll need a separate issue for that.

zirain added the area/extensions and telemetry label Mar 17, 2023

kyessenov mentioned this issue Apr 5, 2023

Strip internal mesh-machinery headers when sending requests/responses out of mesh #17635

Open

kdorosh mentioned this issue May 15, 2023

Support Multi Network - HBONE protocol + metadata needs origin network istio/ztunnel#515

Open

keithmattix added the Ambient Beta Must have for Beta of Ambient Mesh label Jul 10, 2023

kyessenov mentioned this issue Jul 14, 2023

metadata_exchange: combine into native implementation istio/proxy#4789

Merged

bleggett mentioned this issue Jul 21, 2023

Initial MDS implementation istio/ztunnel#504

Closed

kyessenov mentioned this issue Aug 22, 2023

HBONE Specification? istio/ztunnel#660

Closed

kyessenov mentioned this issue Oct 5, 2023

tcp mx: implement WDS fallback istio/proxy#4994

Merged

stevenctl mentioned this issue Dec 14, 2023

Waypoint Sandwich #48362

Open

9 tasks

linsun assigned kyessenov Dec 19, 2023

kyessenov mentioned this issue Dec 21, 2023

Upstreaming istio filters envoyproxy/envoy#29681

Open

kyessenov closed this as completed Apr 4, 2024

howardjohn mentioned this issue Apr 4, 2024

Enable metadata support for peers not in mesh #24302

Closed
