
Initial MDS implementation #504

Closed
wants to merge 2 commits

Conversation

howardjohn (Member)

  • Store each connection in a shared map
  • Capture special IP address, redirect to MDS handler
  • MDS handler takes the 4-tuple as input.
  • Response is JSON containing identity

Also has an example Golang library implementation to add HTTP middleware that extracts the identity.

Istio tests: istio/istio#44536 (blocked by this PR, of course)

@istio-testing istio-testing added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 25, 2023
@costinm (Contributor) left a comment


A few comments. It's OK to just add TODOs and comments in the code; no need to resolve them in this PR. Better to merge and iterate.

fatal(err)

si, err := istiometadata.FetchFromClientConnection(conn)
fatal(err)
Contributor

In a future PR we should add some code to test that MDS doesn't return anything after close is called.

I would change this to a plain TCP echo (also in a future PR) and add more proper examples for HTTP and gRPC, i.e. using the library and usable in real-life apps.

But this is good enough for now.

}

// metadataContextKey is the key under which the metadata handler stores connection metadata
var metadataContextKey = &contextKey{}
Contributor

As an API, I'm not a big fan of using the context as a hash map and forcing users into this pattern.

Better to have a clean interface and let users wrap or save the info however they want. Most likely this will be integrated into another library (otel, authz).

Member Author

I generally agree, but I think this is the only way to do it as an HTTP handler? Unless you do the magic to look up the TCP connection for the request and then add caching yourself, which would just be duplicating this, I think?

Contributor

Not sure - see my other comment below. In a handler the user has access to the peer and local IP, and can make the MDS call directly when needed.

Contributor

I see what you mean - it's about avoiding making this call on each request.

How about we leave it out for now? Users can probably use the same pattern if they want; better to focus this library on TCP.

Contributor

I agree that I don't love implicit context storage, but this is what go-grpc uses all over the place, so in this instance it's probably better to follow the same pattern.

Contributor

There is a difference between a library and a framework. Yes, go-grpc and other frameworks follow this pattern, and it is not a bad pattern for a framework, where the user can get the peer identity in a consistent way from the context.

My point is that a library that interacts with ztunnel should not define those abstractions and implied requirements (chained for ALL requests, etc.). We will have other ways to get peer info (JWT, XFCC, etc.), so this library would be used by frameworks together with other auth mechanisms, and in the end expose the context data specific to the framework in use.

// ExtractFromRequest attempts to extract ConnectionMetadata from a request.
// This requires Handler to be used.
// If a connection is re-used between requests, only a single call will be made.
func ExtractFromRequest(r *http.Request) *ConnectionMetadata {
Contributor

That's what I don't like about this pattern: the user has to call ExtractFromRequest (so has a dependency on this library anyway), but the Handler is called on ALL requests. If the user only needs peer info for some requests, it's more complicated to set up (wrap only some requests with the handler chain), or not possible if the lookup is based on some other logic (like checking XFF or other standard headers).

Instead this method could just do the lookup on demand, and no wrapper would be needed.

.parse()?,
);
if remote.ip() != dst.ip() && remote.ip() != src.ip() {
anyhow::bail!("metadata server request must come from the src or dst address")
Contributor

Not sure this is guaranteed - for example, if the connection is IPv6, I'm pretty sure it won't be the case (dual stack, and we use an MDS v4 address).

Can we keep a pointer to the dest and source pod - and check its IPs ?

Or just add a TODO and bug - we can do this in a later phase.

Member Author

Ran into this myself with localhost testing, since we don't use 127.0.0.1 to connect to MDS. I put a TODO. I think we can handle it once the workload API supports dual stack (currently it doesn't).

orig_dst_addr: SocketAddr,
block_passthrough: bool,
) -> Result<(), Error> {
let remote_ip = remote_addr.ip();
if orig_dst_addr.ip() == METADATA_SERVER_IP {
Contributor

That's OK - but it may also be good to have MDS listen on a separate port (150xx) and let CNI handle the redirection (or, in whitebox mode, use the Service). Cleaner.

But it's good to keep this code too.

@bleggett (Contributor) commented Apr 28, 2023

have MDS listen on a separate port ( 150xx) and let CNI handle the redirection

+1 on this, I don't love the reliance on a hardcoded IP here, and would prefer a metadata port (like prometheus, xds, etc etc) so we don't need to rely on known IP keying in zt proper.

Contributor

For the record - changed my mind after further thinking. See my other comments on why it's best for ztunnel to handle all redirection based on destination IP.

Contributor

This is also something we can potentially revisit later if ztunnel/CNI integ gets tighter/more involved, but I'm ok punting on it now and going with this impl.

@bleggett (Contributor) commented Apr 27, 2023

Store each connection in a shared map

Have we thought about how this might conflict with, or be architecturally redundant to, eBPF-based, map-based conntracking?

Most eBPF CNIs will need to do something like this (track connections in a shared map), and it will work almost exactly the same way (a map of connections, just in kernelspace, maintained as an eBPF hash map).

I understand that ztunnel will need to work with things that are not just eBPF, but I wonder if this wouldn't be better implemented (eventually) as a BPF conntrack map that ztunnel can manage/query from userspace.

@linsun (Member) commented Apr 27, 2023

Sorry, I'm missing how this would benefit users, since we said we want to keep ztunnel small and lean. MDS: is this the Google Cloud metadata service, or something more generic?

@howardjohn (Member Author)
It has nothing to do with Google - this allows users' applications to find the peer identity. Today they can do this with the XFCC header, but that is HTTP-based, which we cannot do in ztunnel. This provides a generalized way.

Note this is needed for sandwiching as well

@linsun (Member) commented Apr 28, 2023

It has nothing to do with Google - this allows users applications to find the peer identity. Today they can do this with XFCC header, but that is HTTP based which we cannot do in ztunnel. This provides a generalized way.

Note this is needed for sandwiching as well

Thanks, btw, what does MDS stand for?

I thought we weren't using the sandwich approach at all?

@costinm (Contributor) commented Apr 28, 2023 via email

@costinm (Contributor) commented Apr 28, 2023 via email

orig_dst_addr: SocketAddr,
block_passthrough: bool,
) -> Result<(), Error> {
let remote_ip = remote_addr.ip();
if orig_dst_addr.ip() == METADATA_SERVER_IP {
Member

Would prefer this not be intercepted by ztunnel.

Member Author

I am not sure how we can opt out, since ztunnel will capture all traffic.

@costinm (Contributor) commented Apr 28, 2023 via email

@howardjohn (Member Author) commented Apr 28, 2023 via email

type ConnectionMetadata struct {
// Identity provides the identity of the peer.
// Example: spiffe://cluster.local/ns/a/sa/b.
Identity string `json:"identity"`


Can we open a tracking issue to capture the other information about the identity that we would like to expose here?

Over time I would expect to see us add either a field or a new API that provides details from the public key of the peer.

I think the interesting question is what we are trying to prove to the caller about the veracity of the identity we just provided. Options include:

  1. How can the caller know that the MDS is reliably acting on its behalf? Clearly there's a certain amount of infrastructural trust, seeing as this is coming from the 'network' as perceived by the app. One option here is for the MDS to offer a "who am I" API. This is a bit like SDS, but now it's an implicit part of the infrastructure. The response to the call should be either the current public cert or some shorthand of it. As a variant, the infra can prove that it has the private key (which is shielded from the app) by signing some random bytes passed to the API, which can then be verified with the public key. We likely want to deliver a shorthand in addition to x509, to shield users from parsing its gory details.

  2. How can the caller know that the infrastructure actually performed a secure handshake with the identified peer? This is harder and of debatable value. If we provide proof that the infrastructure already has the public and private keys as described in 1 above, then the infra is already capable of impersonating the application for whatever purposes it sees fit. We could do something like a 'signed ping':

      Client generates random bytes and encrypts them with the public key of the peer (opaque to the local ztunnel)
      Client app calls 'secure ping', passing the encrypted bytes
      Local ztunnel simply forwards the encrypted bytes to the identified peer
      Peer ztunnel decrypts the random bytes with its private key and returns their signature
      Local ztunnel forwards the message signature to the client
      Client compares the message signature to a locally computed hash

This proves that there is something between the client and the peer that has access to the private key associated with the identity. The client doesn't necessarily know who or what that is.

@bleggett (Contributor) commented Apr 28, 2023

Over time I would expect to see us add either a field or a new API that provided details from the public key of the peer

+1, ideally one that follows XFCC semantics, and chains certs of all workloads in the pathway, including the ztunnel instance.

One option here is for the MDS to offer a "who am I" API. This is a bit like SDS but now it's an implicit part of the infrastructure. The response to the call should be either the current public cert or some shorthand of it.

This is more or less what SPIRE already does, it's worth pointing out.

Contributor

XFCC is a pretty horrible UX - you need a parser library, have to parse certs, etc. It may be useful for some super-advanced users, but if they are willing to take all those dependencies they may be better off just using proxyless and having end-to-end mTLS.

All platforms I know of (not only Google) have a basic, simple MDS server that returns metadata and tokens in a very simple way.

I'm not saying it is an invalid use case to want fancy attestation - and it is clear that cloud vendors don't provide this right now, so it's not a bad idea if someone implemented it - but we should not mix it with the very simple use case for this PR.

@linsun (Member) commented Apr 28, 2023

MDS is 'metadata server' AFAIK, all cloud vendors have one. Not sure who was first. This has little to do with sandwich - any user application may need to know the identity of the peer. For example we may have authz to allow Alice to talk to Bob - but Bob still wants to know who's talking to.

Thanks. Thinking out loud here: could this be something opted into, so it doesn't impact users who don't use MDS?

@howardjohn (Member Author)
MDS is 'metadata server' AFAIK, all cloud vendors have one. Not sure who was first. This has little to do with sandwich - any user application may need to know the identity of the peer. For example we may have authz to allow Alice to talk to Bob - but Bob still wants to know who's talking to.

Thanks, thinking loud here - could this be something opted in so it doesn't impact users don't use MDS.

This doesn't impact anyone if they don't use it; they have to explicitly call the metadata server for it to do anything.


/// METADATA_SERVER_IP provides the well-known metadata server IP.
/// This is captured by the redirection.
pub const METADATA_SERVER_IP: IpAddr = IpAddr::V4(Ipv4Addr::new(169, 254, 169, 111));
Contributor

Can we just hardcode a specific metadata server IP in OSS ztunnel like this?

If it's "well-known", to whom is it well-known?

Member Author

I mean, we can - the IP is irrelevant other than that the client and server need to agree on it, and it should not conflict with other IPs users want to use for other purposes.

and would prefer a metadata port (like prometheus, xds, etc etc) so we don't need to rely on known IP keying in zt proper.

What is better about a special port instead of a special IP? Or did you want special IP + special port (which seems good?)

@bleggett (Contributor) commented Apr 28, 2023

I guess I'm not clear on why ztunnel needs a special IP at all?

If we can redirect any pod outbound destined for <whatever IP you want> to <whatever ztunnel you want> and <whatever ztunnel port we want> outside of ztunnel itself - why does ztunnel need to match a well-known IP?

That could be a pretty simple CNI rule.

Member Author

Is it any simpler to have it in the CNI layer instead of in ztunnel, though?

In general there is a tension over what goes at which layer. For example, we could skip ztunnel entirely in cases where we know it won't do anything but pass through.

So far we have leaned towards keeping the CNI simpler, so there is a better chance we can replace it with better approaches (and Rust is easier than iptables/eBPF).

@bleggett (Contributor) commented Apr 28, 2023

Is it any simpler to have it in the CNI layer instead of in ztunnel though?

Long term I think so, since the CNI (ours or someone else's) will be handling/redirecting the traffic to listeners within the ztunnel netns based on intended dest anyway, and we already have the concept of directing packets to different well-known ztunnel listener ports and handling them differently as a result (for purposes of early exit/handling bypass in eBPF/iptables redirect logic, for instance)

So far we have leaned towards keeping the CNI simpler so there is a better chance we can replace it with better approaches (and rust is easier than iptables/eBPF)

Absolutely a fan of keeping CNI simpler, but sending captured traffic based on intended dest to a given ztunnel listener is what CNI is for - that's what it does best. It's how we do DNS redir for instance. CNI redir already does special-casing around certain dest ports (or whether src is host ip, etc etc) for purposes of data path optimization - it's already handling this stuff, and adding this would probably be a single iptables rule.

Feels logically consistent to maintain that pattern here, and it has the added benefit of making it very easy for the IP to be changed or redefined however the network owner wants it, without ztunnel caring (or us needing to add a flag/envvar to ztunnel to let people define what they want).

Contributor

I strongly disagree - routing traffic based on dst is what ztunnel does.

CNI is responsible for redirecting pod egress to ztunnel - it does not know where to send it (unless ztunnel and WDS/PTR-DS are integrated into the CNI). Knowing where to redirect an IP, and how, is based on Istiod configs.

Contributor

And if we want at some point to have 'redirect all egress from pod to daemon set' - shared by all mesh implementations and all CNI providers - we need to keep the requirements on the CNI layer as small as possible.

We may add extra requirements - like excluding ports for proxyless - but getting into 'route IP address x in a different way' is too much.

@bleggett (Contributor) commented May 5, 2023

I strongly disagree - routing traffic based on dst is what ztunnel does.

CNI is responsible to redirect pod egress to ztunnel - it does not know where to send ( unless ztunnel and WDS/PTR-DS is integrated into the CNI ). Knowing where to redirect an IP and how is based on Istiod configs.

CNI knows what ztunnel listener to redirect traffic to, and ztunnel knows what dest to send traffic to - correct - using CNI to redirect specific MDS traffic to a specific ztunnel listener would be the former, which is definitely what CNI is responsible for doing today.

And if we want at some point to have the 'redirect all egress from pod to daemon set' - shared by all
mesh implementations and all CNI providers - we need to keep the requirements on the CNI layer as
small as possible.

Valid. I'm not sure that's a realistic or practical goal, given that no two CNIs are really the same and we want to keep ztunnel from being a kitchen-sink but also tightly integrate at the L4 layer, but we can live in hope.

We may add extra requirements - like exclude ports for proxyless - but getting into 'route IP address x in a different way' is too much.

It doesn't strike me as significantly different from 'route port X in a different way', which CNI already does and will always do for e.g. DNS, but I'm good with this approach for now.

@costinm (Contributor) commented Apr 29, 2023 via email

@costinm (Contributor) commented Apr 29, 2023 via email

@bleggett (Contributor) commented Apr 29, 2023

The reason for the hardcoded IP is pretty simple: in an app, how do you make the http request ? You need a DNS name or IP. Many MDS servers are also DNS servers - there is a plan to allow running a DNS proxy in ztunnel. Also what hostname would you use - and how would CNI know that some VIP is actually the MDS address ?

I'm suggesting if you have a need to map an arbitrary environmental IP of your choice to always get redirected to a listener in ztunnel, it's pretty simple to do that with a single iptables rule outside of ztunnel, by adding another rule to the ones we already have that are functionally similar, without ztunnel needing to care what that IP is, or needing to hardcode the IP in ztunnel. We could pick a fixed IP in OSS, anyone who wanted to do something different could change it by changing the rules outside ztunnel, ztunnel wouldn't care either way.

All MDS servers I know rely on a hardcoded 'well known' address in that range.

MDS clients do - I'm not sure the MDS server being added to ztunnel has a true need to know anything about a well-known/fixed MDS IP.

We can also use a special 150xx port, but if we want ztunnel to be a resolver it'll need to listen on 53 ( yes, we can do some complicated redirection to use a different port for DNS - but why not use the simplest solution ).

ztunnel already listens on 15053 for this purpose, I believe. We have iptables rules that rewrite UDP on port 53 to 15053 within ztunnel's netns on pod redirect. ztunnel doesn't care that this translation happened, or (for now) what the original dest IP was, it just treats traffic on port 15053 as DNS proxy traffic. I'm suggesting we do the same thing for MDS here.

If we are using this pattern - I would like to see a good reason why we should follow a different pattern, and what would that be - can't think of a simpler or cleaner solution. Also this whole thing doesn't need a fancy iptables or eBPF or k8s - CNI can simply add a route and add a second address with the well-known IP on the ztunnel pod. It will also work on VMs and non-K8S environments - even in cases where no iptables are possible ( ztunnel can work without any CNI or iptables, by setting SOCKS_PROXY env variable with most applications ).

I'm proposing we use the pattern already in use for all ambient ztunnel redirection - ztunnel exposes/hardcodes well-known ports, and CNI redirects traffic, based on (variously) dest IP, dest port, and dest packet type to those ports - for this as well. A fixed IP within ztunnel isn't a requirement for this, and it feels weird to add it here when nothing else currently in ztunnel works like this, and there's nothing really preventing it from being implemented exactly like all the existing pod->ztunnel redirections that rely on CNI to map traffic to fixed ztunnel listening ports.

@costinm (Contributor) commented Apr 29, 2023 via email

u := metadataServerURL()
params := url.Values{}
params.Add("src", src)
params.Add("dst", dst)
@costinm (Contributor) commented Apr 29, 2023

Getting connection metadata is very useful - as Louis mentioned, we may extend this to return all the details about the connection, including labels and more handshake info.

And since we return it as JSON - we may as well consider an MDS endpoint that also signs it.

But I think the use case for passing both src and dst is primarily for dealing with CNI implementations where the source IP can't be preserved. Calico, for example, seems to support that, and hopefully we can do it for more CNIs.

In the case of using ztunnel with a 'best practice' CNI, we may want to support metadata lookup using only the source IP (which would return data from the PTR-DS/WDS). If the ztunnel doesn't preserve the source IP, I think the lookup result should include the original source IP in addition to the peer identity. And maybe all the metadata lookup info too.

(All this in additional PRs, of course - not intending to hold this PR until everything is added.)

* Store each connection in a shared map
* Capture special IP address, redirect to MDS handler
* MDS handler takes 4 tuple as input. TODO: should probably validate
  that the caller matches at least one of the IPs?
* Response is JSON containing identity

Also has an example Golang library implementation to add HTTP middleware
that extracts the identity.
@howardjohn howardjohn requested a review from a team as a code owner May 19, 2023 18:21
@istio-policy-bot istio-policy-bot added the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Jun 12, 2023
@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Jun 12, 2023
@istio-testing (Contributor)

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bleggett (Contributor) commented Jul 21, 2023

Looking back at this:

  1. It probably needs a new name, since it's unrelated to the Metadata discovery service (istio#43937), which we also need.
  2. This is functionally similar to SPIFFE/SPIRE self-identity lookup in a lot of ways (consulting ztunnel instead of the SPIRE agent, and bound to the connection versus actively attested by out-of-band checks), and if you were using SPIRE with Istio this could be (a bit of, though not fully - ztunnel definitely needs to be the attesting authority for the connection) a redundant mechanism. That's not necessarily an objection, but it concerns me a bit on the face of it.

e.g. - if I was using SPIRE with Istio as the workload CA as documented by us, and as a workload/pod sidecar I asked SPIRE for my identity and also asked ztunnel for my (connection) identity - I would/should get the same x509 cert in both cases.

@istio-policy-bot istio-policy-bot removed the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Jul 21, 2023
@howardjohn (Member Author)

Let's drop this for now to keep things minimal.

Labels
needs-rebase Indicates a PR needs to be rebased before being merged size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
10 participants