Metadata discovery service #43937
Comments
One thing I wasn't sure about - if I get a request from 1.2.3.4, that IP may live in other networks. How do we handle this? (sorry if it's discussed already, wanted to comment before I forgot) |
Related comment: please add a section on how we get the IP and network. Default is simple - from the network layer. As a server - x-forwarded-for or forwarded headers I assume, but only if the peer is authenticated and is a gateway or waypoint. As client - don't remember if XFF is propagated on return, need to check. Would be worth adding a section on security (can a client forge its telemetry?). I still believe we should include podname and clustername in all responses and allow hostname in addition to network+ip - I would use clustername instead of network, since Istiod may need to do on-demand lookup too in very large meshes. |
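The address-selection rule described in the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Peer` type, the `role` labels, and the choice of the right-most forwarded entry are hypothetical and not taken from Istio.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical role labels for peers that are allowed to forward a source address.
TRUSTED_ROLES = {"gateway", "waypoint"}


@dataclass
class Peer:
    address: str          # address observed at the network layer
    authenticated: bool   # True if the peer identity was verified (e.g. via mTLS)
    role: Optional[str]   # hypothetical label such as "gateway" or "waypoint"


def source_address(peer: Peer, headers: Dict[str, str]) -> str:
    """Return the address to use as the metadata lookup key.

    The forwarded header is honored only when the direct peer is an
    authenticated gateway or waypoint; otherwise the network-layer
    address is used, so an arbitrary client cannot forge its source.
    """
    xff = headers.get("x-forwarded-for", "")
    if xff and peer.authenticated and peer.role in TRUSTED_ROLES:
        # Assumption: the trusted peer appended the address it observed,
        # so the right-most entry is the one written by that peer.
        return xff.split(",")[-1].strip()
    return peer.address
```

For example, `source_address(Peer("10.0.0.1", True, "gateway"), {"x-forwarded-for": "1.2.3.4, 10.1.1.5"})` yields the entry added by the gateway, while the same header from an unauthenticated peer is ignored.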
@howardjohn For multi-network, the proposal is to aggregate metadata in a federated way. That means that the metadata provider can access metadata from any mesh endpoint. Specifically, for multi-cluster endpoints we'd expose k8s metadata from all clusters to the provider. @costinm Those are good points, and I don't have a good answer for all of them and deliberately left them underspecified.
I think there's a need to standardize on the "workload identification headers" in HBONE protocol. Baggage is an indirect way to deliver this information, but I think we need a first class representation for it, not assume only telemetry usage. For background information, OTEL k8s attribute processor fulfills the same design goal in a broader k8s context. |
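To illustrate the multi-network question raised above (the same IP may exist in more than one network), here is a minimal sketch of an aggregated index keyed by network and IP. The class and field names are hypothetical and do not reflect the proposal's actual payload.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class WorkloadMetadata:
    # Hypothetical fields chosen for illustration only.
    name: str
    namespace: str
    cluster: str


class MetadataIndex:
    """In-memory index aggregated from several clusters or networks."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, str], WorkloadMetadata] = {}

    def add(self, network: str, ip: str, metadata: WorkloadMetadata) -> None:
        # Keying on (network, ip) keeps overlapping IPs apart.
        self._by_key[(network, ip)] = metadata

    def lookup(self, network: str, ip: str) -> Optional[WorkloadMetadata]:
        return self._by_key.get((network, ip))


index = MetadataIndex()
index.add("network-a", "1.2.3.4", WorkloadMetadata("reviews", "default", "cluster-a"))
index.add("network-b", "1.2.3.4", WorkloadMetadata("ratings", "default", "cluster-b"))
```

With this keying, a lookup needs the network of the peer in addition to its IP, which is why the thread discusses how the network is conveyed to the server.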
Hi Kuat, thanks for the proposal! Can you add a detailed workflow diagram to help readers better understand the proposal? |
@lei-tang This is a basic workflow for metadata discovery by a gateway: CC @markdroth : FYI a proposal to drop "peer metadata" header from transport protocol requirements, and rely on a separate "back-fill" metadata flow. This aligns well with OTel processor pipeline architecture, instantiated as a custom xDS-based telemetry processor in Envoy. |
One thing I think Baggage gives that this doesn't is the ability for the client to tell the server which Service it accesses it through |
IMO telemetry is not an exact science - and should use whatever metadata is
available, from the most accurate source it has access to.
If a Baggage header from the client exists - and it includes the service it
accesses through - we should use it. Otherwise - canonical service is likely
good enough,
and what we would use if the client is external ( or not using Istio ).
I want to also make sure we take into account other headers that are part
of routing. For example for session affinity to work, we need to encode
the cluster ( == service ) into the cookie or some header. We discussed in
the past that for many CONNECT clients the authority header will
be the hostname ( == service ). While now we set it to the IP, the protocol
should probably preserve the VIP and hostname when available and
include it in standard headers ( XFF, etc). And I think telemetry should
also get info from standard headers ( authority, XFF, etc), if available.
…On Thu, Apr 6, 2023 at 12:47 PM John Howard ***@***.***> wrote:
One thing I think Baggage gives that this doesn't is the ability for the
client to tell the server which Service it access it through
|
FYI destination service was never in scope for the metadata exchange. It's a property of a request, while the subject is a description of the peer. Many things break if you try to put per-request property onto peer metadata since the consumers assume peer metadata is immutable and it is aggressively cached. Besides that, yes it could be useful. We had that before with a mixer attribute but it was not a popular design choice. |
Destination is not part of the 'metadata exchange', just info from client
to server.
But it is valid and useful telemetry info for logs or traces - and we
should send it if available ( VIP or clustername).
…On Thu, Apr 6, 2023 at 1:54 PM Kuat ***@***.***> wrote:
FYI destination service was never in scope for the metadata exchange. It's
a property of a request, while the subject is a description of the peer.
Many things break if you try to put per-request property onto peer metadata
since the consumers assume peer metadata is immutable and it is
aggressively cached.
Besides that, yes it could be useful. We had that before with a mixer
attribute but it was not a popular design choice.
|
Trying to catch up on this a bit and it seems like it might be worth an explicit review. I'm not totally sold that we can make federation work reliably for multi-network solutions. We already allow for connectivity without full multi-network knowledge via loose coupling. This might even be problematic within large networks with very high cluster counts. As for keeping bandwidth down I don't see the option of simply POSTing the metadata on connection initiation which would outperform putting into CONNECT headers. The reserved endpoint for this can be versioned so we could actually treat this like a real API. I do agree that many of the alternative methods are pretty terrible. Finally for verification we can make sure inlined baggage is signed by an authority like the control plane which is what was proposed above. (Aside - this feels a lot like putting things into SANs so might be worth talking about those two things together) I generally agree with @costinm that we can do both and fallback to the control-plane if we can't resolve inline passing the caller identity etc. |
just reiterating what @louiscryan said above in a different way: beyond loose coupling, many isolated control plane architectures have strong requirements to preserve telemetry without allowing contact across boundaries (all communication is coming through network gateways, which is exactly what we want telemetry on. we cannot force users to make cross network requests to a federated metadata service for telemetry information) |
If every externally facing gateway acts as a metadata discovery service for its network, would that address your concern about the centralized metadata discovery service? Concretely, a server telemetry producer would call gRPC POST asynchronously to retrieve metadata for an endpoint behind a gateway on a well-known endpoint. The gateway address would be either a network address or a dedicated header in the CONNECT request. Every sidecar could also respond to the Metadata discovery to itself. I agree that signing the header would work as a delegation mechanism, although that would be the first instance in Istio. However, signing doesn't address the other problems with inline headers:
* coupling with the protocol (CONNECT)
* per-request overhead (higher in fact with signing)
* coupling with Istio proxies, a telemetry intermediary like Otel collector cannot participate in the telemetry production and off-load the proxies.
|
I think it could, provided we update HBONE baggage or something to include the source network per istio/ztunnel#515 Also in #43937 (comment)
Can you elaborate on this caching? Who is caching, why, and for how long? I'm a bit concerned with all this async stuff since IPs can be recycled in k8s. Also generally worth discussing.. I think this metadata service may have to handle a very high amount of requests (barring aggressive caching, concerns noted above) and that comes with its own operational and CPU/memory costs. Given that for request metrics (not peer metadata) we will still need HTTP headers or TCP metadata for things like originating client IP, originating network, etc.. it seems like we're already paying the cost on each request and that baggage is not as painful as it seems. |
It is cached in the MX extension because the CPU cost to decode the header is significant https://github.com/istio/proxy/blob/master/extensions/metadata_exchange/plugin.cc#L116. There's no expiration or verification - it's easy to confuse telemetry with the same key. |
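The IP-recycling concern can be made concrete with a small sketch. The cache below is not the MX extension's implementation (which, per the comment above, has no expiration); it is a hypothetical variant that shows how a time-to-live would bound the window in which a recycled key returns stale metadata.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringMetadataCache:
    """Cache keyed by peer key where every entry expires after a fixed TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so that expiry can be tested
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def put(self, key: str, metadata: dict) -> None:
        self._entries[key] = (self._clock() + self._ttl, metadata)

    def get(self, key: str) -> Optional[dict]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, metadata = entry
        if self._clock() >= expires_at:
            # Drop the stale entry so a recycled key triggers a fresh lookup.
            del self._entries[key]
            return None
        return metadata
```

The TTL value is a trade-off: a short TTL limits mislabelled telemetry after an IP is reused but increases lookups or decoding work.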
I'm not even sure what 'federated' means - Istio doesn't really support
multiple meshes, and all multi-cluster and multi-network
is based on a single security and discovery domain.
Federations are interesting - and there are various options, but we can
focus on the current feature set of Istio and
maybe use a different solution for federation ( where boundaries,
'authorities', root CAs, for each federated entity are defined).
With Istio as it is today and 'flat network' - I think the IP that is
intercepted and used in the flat network is fine.
With multi-network - either overlapping IPs or if we have to go through an
East-West gateway - we will clearly
need the gateway to pass information. But we do need this for ingress and
egress as well.
The choice is not exclusive - i.e. we do not have to pick only one source of metadata
with nothing else allowed. We use the best metadata
we find - if the peer is a waypoint/E-W/ingress/egress we rely on CONNECT
and headers ( after we verify
the identity of the peer as a trusted gateway), if the peer is a
same-cluster or a non-istio workload in the
cluster - we can get the info using MDS.
And I agree that if a JWT or other signed info is present - either in-band
or in a source we can pull on demand
from ( including DNS-SEC signed records for example ) - that's a golden
signal, and Istiod could integrate
with such sources.
Federated versions of 'telemetry meta for IPs' have worked relatively well
for >20 years, as DNS PTR, WHOIS, geolocation
and other sources. At least for telemetry and abuse - and continue to be
very useful for ingress traffic. I don't think
it would be bad if someone wrote a 'workload discovery server' to integrate
with such sources for ingress, and
Istiod would delegate to it for the public IP ranges.
…On Mon, May 15, 2023 at 3:05 PM Kuat ***@***.***> wrote:
Many things break if you try to put per-request property onto peer
metadata since the consumers assume peer metadata is immutable and *it is
aggressively cached.*
Can you elaborate on this caching? Who is caching, why, and for how long?
I'm a bit concerned with all this async stuff since IPs can be recycled in
k8s.
It is cached in the MX extension because the CPU cost to decode the header
is significant
https://github.com/istio/proxy/blob/master/extensions/metadata_exchange/plugin.cc#L116.
There's no expiration or verification - it's easy to confuse telemetry with
the same key.
|
|
I don't think gateways should or can act as MDS.
Ztunnel or Istiod - yes, they are trustworthy and relatively well isolated.
Ztunnel could be trusted for same-node.
Trust can be delegated - Istiod may use XDS ( or other mechanisms ) to talk
with other trusted metadata sources.
But gateways should not be required to implement MDS - they do have a role
in propagating the source ( X-F-F or
the proxy protocol ), but we should be able to use non-Istio gateways and
the infrastructure that has been around
for a long time without inventing new requirements.
This covers normal mesh - as we have today, I don't think we should couple
federation or foreign control planes.
While I agree that signed metadata is valuable - in particular for
federation - I don't think we need to complicate things
for normal mesh, where Istiod is already trusted.
One problem is that current metadata is exposed only as XDS delta - and the
ztunnel MDS server over HTTP only
returns peer identity. We should also expose it over HTTP in Istiod - and
expose more info in ztunnel, maybe also
in envoy sidecars for consistency.
…On Tue, May 16, 2023 at 10:55 AM Ben Leggett ***@***.***> wrote:
@kdorosh <https://github.com/kdorosh> @louiscryan
<https://github.com/louiscryan>
If every externally facing gateway acts as a metadata discovery service
for its network, would that address your concern about the centralized
metadata discovery service?
Concretely, a server telemetry producer would call gRPC POST
asynchronously to retrieve metadata for an endpoint behind a gateway on a
well-known endpoint. The gateway address would be either a network address
or a dedicated header in the CONNECT request. Every sidecar could also
respond to the Metadata discovery to itself.
I agree that signing the header would work as a delegation mechanism,
although that would be the first instance in Istio. However, signing
doesn't address the other problems with inline headers:
* coupling with the protocol (CONNECT)
* per-request overhead (higher in fact with signing)
* coupling with Istio proxies, a telemetry intermediary like Otel collector cannot participate in the telemetry production and off-load the proxies.
1.
If every gateway acts as a metadata source for its network, how do you
control information leakage? Trusting remote proxies? Granted,
information leakage is a concern already today with envoy-peer-metadata,
but this seems messy.
2.
Signing headers feels gross, it adds a lot of overhead, and doesn't
help with 1). If you need to sign a header, then you shouldn't be using a
header - something involving out-of-band checks to establish authenticity
of source versus relying on request header signatures to establish
authenticity of source (e.g. how SPIRE does it) makes a whole lot more
sense as a scalable option to me here.
|
Where I think I'm at on this:
tl;dr it's a metadata service or JWTs, or a hybrid of the two. The hybrid is currently more appealing IMO especially since it can be implemented in stages. The other option is "lean more heavily on SPIRE workload identity attestation and SPIRE workload identity federation" which is probably not simpler than any of the above for us to do, though it would offer additional attestation and PoP capabilities that the above options do not. |
On Tue, May 23, 2023, 07:26 Ben Leggett ***@***.***> wrote:
Where I think I'm at on this:
1.
If we have a metadata discovery service, it will *have* to be publicly
exposed across clusters, and thus have authz controls. I don't think this
means we shouldn't do it, but it's probably the largest risk to control
for. We could do something where we expose an endpoint that takes a
workload cert and returns a metadata blob if the cert was issued by our
local CA and is still valid. This is beginning to get pretty "NIH SPIRE"-y
though.
Only in 'federation' cases ( which we don't support yet in Istio), and only
in a particular design for federation.
In Istio we still have each Istiod watch all clusters and pods - so it can
generate EDS - which means it can generate MDS.
And the XDS federation model (which is partially supported) is also based on
Istiod talking with other XDS servers. There is some auth and trust in both
cases, of course, but istiod needs to authenticate itself to k8s or XDS
servers.
In other words - nothing special or different from how Istio EDS works,
just a reverse index on the same info. Istiod in cluster is the trust
anchor.
2.
If we don't have a metadata discovery service, we need signed headers.
And we should probably only send those when crossing boundaries.
3.
If we use signed headers, we should instead just use JWTs.
I agree, a signed JWT (or peer cert) are good sources of info for remote
clusters that are not part of the federated mesh.
But parsing and verifying JWT for metadata has a cost - plus maintaining
the roots, workloads getting signed JWTs - probably with audience because
otherwise they're as good as regular headers.
And we are moving into 'why not just use JWT plus TLS as alternative to
client certs, and add meta to JWT instead of cert'. Which is not bad for
peers outside of Istio MC or XDS federation.
4.
We can use a metadata service locally, and send JWTs across borders to
avoid 1)
tl;dr it's a metadata service or JWTs, or a hybrid of the two.
Metadata service for current Istio use cases, JWT or client cert for the
'federated' model we will need to design and implement.
|
Makes sense, and as you mention this is simple for single-cluster to start with. I'm not sure how scalable it is to use XDS federation for all cases, but it's not the end of the world to convey the same info differently across boundaries if we need to, so it feels deferrable. Either way, we'll probably eventually need either JWTs or a workload analog to the OIDC
Certs are good simple identity documents, and terrible metadata stores. It makes sense to me to keep certs for identifying (used for authZ) metadata, and use a JWT (or a metadata endpoint that accepts an identity document) for non-identifying metadata (if we really ever need to shuttle baggage around as a blob).
Meh, if they're signed by the workload cert I don't think |
Agree we probably need multiple mechanisms here. Putting peer metadata in certs is wildly impractical beyond the very very basic identity-specific fields the x509 spec dictates (which will not be sufficient for Istio's needs)
if we have a workload metadata authority (whether strictly local or not) I think it's going to need to be replaceable/composable in an equivalent way to how workload identity authorities (e.g. CA) are.
This is much simpler if the workload identity document and the workload metadata document are the same (JWT), but that's infeasible for Istio so I think we have to contend with the alternative options. It's a little less simple if the workload identity document and workload metadata document are disjoint (workload cert + potential baggage JWT). |
On Thu, May 25, 2023 at 10:03 AM Ben Leggett ***@***.***> wrote:
Agree we probably need multiple mechanisms here. Putting peer metadata in
certs is wildly impractical beyond the very very basic identity-specific
fields the x509 spec dictates.
We have to disagree on the 'wildly' - x509 certs are broadly used for a lot
of things outside of HTTPS, in SIM cards or passports and more.
While ASN1 is not protobuf - and OIDs are inconvenient for OSS projects -
it is a pretty extensible format that passed the test of time.
JWTs may sound easier - but there are a lot of problems with them as well.
But either way - if a CA provider manages to put metadata in a cert and
sign it - I think we can agree that it's trust-worthy data and use it.
It may turn out that very few CAs do, or only put little data - for example
on GKE the pod name is included in an OID.
I don't think it's unreasonable to have some config mapping OIDs to meta
keys, if CAs are willing to sign them.
- A peer metadata query service for cluster-local lookup makes sense
since most of that info is readily available locally.
- As @kdorosh <https://github.com/kdorosh> said - we cannot force
users to make cross network requests to a federated metadata service for
telemetry information) so we need an alternate mechanism to propagate
that metadata across boundaries.
I agree we can't force user to make cross-network requests - but we can
require that the control planes federate. That's what the XDS federation is
about.
The metadata service is not a new invention: DNS PTR and geo-location
databases have been used for a long time, and as you mentioned,
OIDC and OAuth2 provide the userinfo endpoint for exactly this purpose -
get metadata about a principal based on the token. Email and name and more.
*if* we have a workload metadata authority (local or not) I think it's
going to need to be replaceable/composable in an equivalent way to how
workload identity authorities (e.g. CA) are.
- If I want my workload identity authority to attest more things about
the workload than the default istio CA is capable of attesting, before
issuing that workload an identity, I can replace the Istio CA with my own
CA that does this.
- If I want my workload metadata authority to include more metadata
about the workload than the default Istio workload metadata authority cares
about, there needs to be a way to compose (or replace) that.
This is much simpler if the workload identity document and the workload
metadata document are the same (JWT), it's a little less simple if the
workload identity document and workload metadata document are disjoint
(workload cert + potential baggage JWT).
I don't think dealing with JWTs and OIDC and federated JWT signers is as
easy as it seems. I'm all for using the JWTs when available, but I don't
think we should
go overboard - just like we should not go overboard with certificates.
If we do adopt JWT - we should just use existing standards, OIDC/Oauth2 are
not limited to humans and are broadly used for machine to machine.
And they include the infra to sign JWTs, federate - and their own metadata
service.
However we should not exclude XDS federation either, since it provides more
than just metadata.
… |
I think I'm relatively convinced that using WDS (Workload Discovery API) is the right mechanism for the majority of use-cases. I chatted a bit with @howardjohn and @costinm about this as well as @bleggett Some basic constraints and supporting information....
We discussed what the JWT should look like and how it should be constructed and signed. While details need to be worked out as a strawman we thought that:
The above needs to be put into a doc and thrashed out. It is of course unfortunate that we cannot use the certificate issuance and attestation flow to achieve the above effect but as long as we are confident in the security relationship between istiod and the apiserver this seems acceptable to layer on top. |
if we do this, then I don't think it matters what is used to sign the JWT (istiod cert or workload cert), and this:
becomes largely moot. Unless we're just saying that impersonation is less likely with istiod because the istiod certs are "more protected" than the workload certs and therefore less susceptible to exfil - which is pretty tenuous - it will be ztunnel doing all impersonation no matter what. If ztunnel is trusted to impersonate the workload for the purposes of constructing the channel then it arguably doesn't matter whether istiod or ztunnel sign/mint the JWT - it's a net-nil difference in impersonation risk. But that's an impl detail, approach SGTM. |
Instead of having the CA issue a cert, and Istiod issue a JWT, then linking them by More or less agreeing with #43937 (comment) I think. |
It depends whether we think the workload is a sufficiently reliable asserter of its claims. I think the receiving side would like some confidence that claims are trustable if policy is going to be enforced against them. ztunnel can impersonate workloads but it can't/shouldn't be able to impersonate the credential used to sign the claims ideally. I agree this boils down to istiod's (or some CA's) credentials being better protected, we're certainly reliant on that being the case for a CA. We don't strictly have to use the workload identity of istiod as the signing key, that choice is more flexible. A common secret for instance would suffice. |
Ah good point. OK, I think I am on the same page as you then.
…On Fri, Jul 14, 2023 at 2:00 PM Louis Ryan ***@***.***> wrote:
It depends whether we think the workload is a sufficiently reliable
asserter of its claims. I think the receiving side would like some
confidence that claims are trustable if policy is going to be enforced
against them.
ztunnel can impersonate workloads but it can't/shouldn't be able to
impersonate the credential used to sign the claims ideally. I agree this
boils down to istiod's (or some CA's) credentials being better protected,
we're certainly reliant on that being the case for a CA.
We don't strictly have to use the workload identity of istiod as the
signing key, that choice is more flexible. A common secret for instance
would suffice.
|
Capturing some related work going on for reference Standardizing some OIDs for K8S, primarily for machine identification in kubelet but seems like it could be leveraged to improve some of the information in x509 with some effort, particularly a workload identity in addition to SA identity in the SAN. Likely insufficient to obviate the need for a JWT to capture info but worth tracking... |
Let's not over-complicate things.
A JWT - from any source that is configured as trusted ( with current APIs )
- is considered good enough for authenticating users - should be more
than sufficient for telemetry. Binding JWTs to channel is an interesting
improvement - we should do it for authentication JWTs first, but not
required,
they are not broadly used.
I agree that ztunnel can't sign the JWTs - it may be trusted to impersonate
and see traffic for a pod, but not to make or verify claims about the pod.
Istiod could sign tokens as a fallback - but most orgs and clouds have IDP
integrations and it's better to integrate with them and keep Istiod as last
resort.
Another thing we may want to reconsider is federation on multi-network /
separated control planes. The WDS is not very different from having PTR
records for the hostnames - if the cert or JWT has a real workload identity
( with FQDN of the workload ) - it should be possible to expose just the
core metadata. Duplicated IP addresses or ephemeral IPs are not a concern
as long as FQDN is used - and normally the internet requirements
are for FQDNs to be unique. There are some things to resolve if all
clusters use 'cluster.local' ( it's no longer a hierarchical naming with
each
admin domain having locally unique names) - but there are solutions. And
that's where on-demand would fit well.
I mentioned before - we should not dismiss DNS. Modern DNS is secure and
extremely fast and scalable - and broadly used. Programming
TXT records into DNS is broadly used and supported too, including tools
for k8s to sync up with enterprise DNS servers. While WDS is great
in clusters - falling back to DNS would make the telemetry work on internet
scale.
…On Fri, Jul 14, 2023 at 2:19 PM Louis Ryan ***@***.***> wrote:
Capturing some related work going on for reference
Standardizing some OIDs for K8S, primarily for machine identification in
kubelet but seems like it could be leveraged to improve some of the
information in x509 with some effort, particularly a workload identity in
addition to SA identity in the SAN. Likely insufficient to obviate the need
for a JWT to capture info but worth tracking...
kubernetes/k8s.io#1959 <kubernetes/k8s.io#1959>
|
My position has changed a bit: while ztunnel should not be able to sign
arbitrary identity JWTs, the JWT
is a very flexible format allowing anyone to sign a statement. Security is
not black and white, but incremental.
Today's status is that any pod can lie about its metadata. Ztunnel signing a JWT -
plus Ztunnel certificate or identity JWT
attesting ztunnel node and identity, plus a check on the peer ztunnel would
be a huge improvement, and the JWT
can be passed through a chain of gateways ( east-west, ingress, egress,
global LBs, etc). Bearer tokens are not
perfect - but every solution has tradeoffs and it's good to allow users to
make the choice that fits their needs and/or
combine multiple mechanisms.
So I would now vote with 'all of the above' ( in time ) - WDS, DNS records,
JWTs signed by ztunnel, JWTs signed
by an IDP, direct use of K8S APIs - depending on availability of the infra
and the use cases. We already have
plenty of experience with handling JWTs, DNS and XDS, and as we see in OTel
SDK, using plugins/extensions
to extract telemetry from diverse platforms/discovery systems is quite
valuable.
However I think this work would be far more valuable in the context of
extending OTel collector - so it's not Istio
specific and we can better integrate with other telemetry producers.
…On Wed, Oct 4, 2023 at 11:09 AM Kuat ***@***.***> wrote:
cc @whitneygriffith <https://github.com/whitneygriffith>
|
@kyessenov is this already implemented? from our prior chat: Metadata discovery requires "ambient WDS controller" on by default. Waypoint always uses it. If I do enable the ambient PEER_METADATA_DISCOVERY, I'll be able to use the metadata discovery service for sidecar. |
Yes, it can be used on sidecar as a fallback. The traditional headers will take priority. |
Cool, anything remaining or should we close this out? |
It's done as opt-in and on waypoints. Changing the defaults for Istio is difficult due to compatibility concerns, we'll need a separate issue for that. |
This is a proposal to use the Workload Discovery Service in the Istio mesh as the source of the peer metadata for the telemetry.
This effectively deprecates the requirement to supply the baggage header in the request/response pair in the HBONE protocol, and provides an alternative design for the existing metadata exchange protocols in the sidecar mesh.
Requirements
R1: Ambient mesh produces Istio peer telemetry istio_requests_total for backwards compatibility with the sidecar mode. Additionally, any new telemetry proposals using Otel are supported.
R2: Pay-as-you-go for the telemetry - users should not carry costs unless they choose to get the value of the telemetry.
R3: Minimal requirements on the mesh members to join the mesh.
R4: Trust in the peer metadata.
Problems with the status quo
The current design for the telemetry production relies on several disjoint mechanisms to produce the standard telemetry in the Istio mesh:
1. HTTP metadata exchange
HTTP sidecar telemetry relies on the special header x-envoy-peer-metadata to announce the source metadata on each request. This design is very flexible, but unfortunately it suffers from three issues: high cost, exclusive headers, and lack of integrity. First, the high per-request cost of the telemetry: each request gets up to 4K overhead on the wire, and up to 10% CPU overhead decoding the header. This is counter to the goal R2 - any Istio telemetry forces enablement of HTTP metadata exchange, which incurs significant data plane overhead. Second, special headers are counter to goal R3. To produce the telemetry on the server, clients have to synthesize the non-standard Istio header. Third, there is no way to establish the integrity of the metadata for R4, since the peer metadata is not signed by a trusted authority.
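To get a feel for the per-request wire cost, the sketch below estimates the size of an encoded metadata header and the resulting bandwidth at a given request rate. It is an approximation under stated assumptions: base64-encoded JSON stands in for the real header encoding, which this sketch does not reproduce, and the sample fields and request rate are hypothetical.

```python
import base64
import json


def encoded_header_size(metadata: dict) -> int:
    """Size in bytes of the metadata once serialized and base64-encoded."""
    raw = json.dumps(metadata, separators=(",", ":")).encode()
    return len(base64.b64encode(raw))


def wire_overhead_bytes_per_second(header_bytes: int, requests_per_second: int) -> int:
    """Extra bytes sent per second when every request carries the header."""
    return header_bytes * requests_per_second


# Hypothetical peer metadata with a few labels.
sample = {
    "name": "reviews-v1-abc",
    "namespace": "default",
    "labels": {"app": "reviews", "version": "v1"},
}
sample_size = encoded_header_size(sample)
# Upper bound from the text: up to 4K per request, here at 1000 requests/s.
worst_case = wire_overhead_bytes_per_second(4096, 1000)
```

At the 4K upper bound, 1000 requests per second add roughly 4 MB/s in one direction, which is the kind of cost R2 asks users not to pay unless they want the telemetry.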
2. TCP ALPN metadata exchange
TCP traffic uses a different protocol that relies on a byte prefix in TCP connections, guarded by a special TLS ALPN string. This design violates R3: the custom ALPN and custom wire encoding are incompatible with any client except the Istio sidecar. As with HTTP metadata exchange, R4 is not satisfied, and the metadata is untrusted.
3. Baggage: HTTP CONNECT metadata exchange
This is the same as the first design, but it works on the longer-lived HTTP CONNECT tunnel streams instead. It suffers from the same problems: extra cost on the wire, extra cost for interoperation, and lack of integrity. The main benefit is better protocol encapsulation, since the applications no longer see the metadata header in flight. The current implementation only supports server-side reporting, with client telemetry reporting depending on the EDS design below.
4. CDS and EDS
This is a fallback mechanism that uses the dynamic metadata from CDS and EDS in case the metadata exchange fails. It is generally useful for failure scenarios and for destinations outside the mesh. However, it clearly violates R2: any load balancing server has to provide the peer metadata as part of the response, which is traditionally outside the domain of global load balancing. A load balancing client also receives a much larger xDS payload even when it does not use the telemetry included with it. Interoperation in R3 is improved since there is no special data protocol, and the integrity in R4 can be implemented by the control plane.
Proposal
We propose to replace all of the above with a dedicated workload metadata service, with the following payload and IP address as the resource key:
This metadata service is xDS-based, supports on-demand lookup, and allows querying metadata by network IP directly. The proposal satisfies all the requirements:
R1: The metadata content in the service subsumes the existing peer metadata.
R2: The service is optional and has zero wire and per-request CPU costs. The memory cost is O(workloads) with SotW xDS and is identical to the existing HTTP metadata exchange with on-demand delta xDS.
R3: Since this proposal eliminates the requirement for special headers or protocols, it’s simpler for endpoints to interoperate with the mesh.
R4: The integrity of the metadata can be established by using trusted lookup keys in the service, e.g. by using IPs when they are not easily spoofed, or by using extra attributes in the TLS certificates.
It is worth noting that a dedicated service is a better alternative to the CDS/EDS design above since it allows decoupling the metadata provider from the load balancing concern. It permits metadata to be keyed by something other than the endpoint addresses, and it maximizes sharing of data (for example, the same endpoints in two clusters). A dedicated client to the metadata service can be embedded in the user applications, or in the telemetry processing pipeline (e.g. OTel collector) for backfilling data, instead of shoehorning CDS/EDS for the same purpose.
Risks and Mitigations
1. Control plane load
As with any new xDS service, there is a risk of new operational load on the control plane. It is worth noting that EDS already includes the majority of the information needed and is consumed by all endpoints, so the sidecar model already imposes this load. To minimize the risk, we propose to share the xDS pipeline with ztunnel, and re-purpose the existing WDS as-is. We intentionally deleted the PTR parts of the proto related to the authorization, since that is outside the scope of the proposal.
2. Data plane load
There is additional memory overhead required to hold the peer metadata in the proxies. To minimize the risk, we propose to couple metadata discovery with HBONE. In other words, HBONE would require metadata discovery to produce Istio telemetry in all clients of HBONE. This would leave sidecars without HBONE unaffected, and allow us to gradually gain experience with the service as HBONE matures. A longer term approach to minimize the memory overhead is to switch to the on-demand model. This would require modifying the telemetry pipelines in Envoy to be asynchronous and to await the xDS response before flushing the telemetry reports. This can be delayed until there is a strong need for it, since the xDS protocol fully supports this mode of operation.
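The on-demand model can be illustrated with a small Go sketch. It is only an assumption about how an asynchronous pipeline might look, not Envoy code: Report, Enriched, Reporter, and the request and flush callbacks are all hypothetical names. Reports for an unknown peer are held back, a single lookup is issued per peer, and the held reports are flushed once the metadata arrives. The sketch is single-threaded and has no timeout or eviction, which a real pipeline would need.

```go
package main

import "fmt"

// Report is a telemetry record waiting for peer metadata (hypothetical shape).
type Report struct {
	PeerIP string
	Metric string
}

// Enriched is a report with the peer workload resolved.
type Enriched struct {
	Report
	PeerWorkload string
}

// Reporter buffers reports until the peer metadata is known.
type Reporter struct {
	known   map[string]string   // peer IP -> workload name
	pending map[string][]Report // reports awaiting a lookup response
	request func(ip string)     // issues the on-demand lookup (assumed callback)
	flush   func(e Enriched)    // emits a completed report (assumed callback)
}

func NewReporter(request func(string), flush func(Enriched)) *Reporter {
	return &Reporter{
		known:   make(map[string]string),
		pending: make(map[string][]Report),
		request: request,
		flush:   flush,
	}
}

// Record flushes immediately when metadata is cached; otherwise it queues
// the report and requests the metadata once per peer.
func (r *Reporter) Record(rep Report) {
	if wl, ok := r.known[rep.PeerIP]; ok {
		r.flush(Enriched{Report: rep, PeerWorkload: wl})
		return
	}
	if _, waiting := r.pending[rep.PeerIP]; !waiting {
		r.request(rep.PeerIP)
	}
	r.pending[rep.PeerIP] = append(r.pending[rep.PeerIP], rep)
}

// OnMetadata is called when the lookup response arrives; it caches the
// result and flushes every report that was waiting on it.
func (r *Reporter) OnMetadata(ip, workload string) {
	r.known[ip] = workload
	for _, rep := range r.pending[ip] {
		r.flush(Enriched{Report: rep, PeerWorkload: workload})
	}
	delete(r.pending, ip)
}

func main() {
	r := NewReporter(
		func(ip string) { fmt.Println("lookup requested for", ip) },
		func(e Enriched) { fmt.Println("flushed", e.Metric, "peer", e.PeerWorkload) },
	)
	r.Record(Report{PeerIP: "10.0.0.7", Metric: "istio_requests_total"})
	r.OnMetadata("10.0.0.7", "reviews-v1")
}
```

The design choice here is to trade reporting latency for memory: only peers that are actually seen are held, at the cost of delaying the first reports for each new peer.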