
Initial MDS implementation #504

Closed
wants to merge 2 commits

Conversation

howardjohn (Member)

  • Store each connection in a shared map
  • Capture special IP address, redirect to MDS handler
  • MDS handler takes the 4-tuple as input.
  • Response is JSON containing identity

Also has an example Golang library implementation to add HTTP middleware that extracts the identity.

Istio tests: istio/istio#44536 (blocked by this PR, of course)

@istio-testing istio-testing added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 25, 2023
@costinm (Contributor) left a comment


A few comments. It's OK to just add TODOs and comments in the code; no need to resolve them in this PR. Better to merge and iterate.

fatal(err)

si, err := istiometadata.FetchFromClientConnection(conn)
fatal(err)
Contributor

In a future PR we should add some code to test that MDS doesn't return anything after close is called.

I would change this to a plain TCP echo (also in a future PR) and add more proper examples for HTTP and gRPC, i.e. using the library and usable in real-life apps.

But this is good enough for now.

}

// metadataContextKey is the key under which the metadata handler stores connection metadata
var metadataContextKey = &contextKey{}
Contributor

As an API, I'm not a big fan of using the context as a hash map and forcing users into this pattern.

Better to have a clean interface and let users wrap or save the info however they want. Most likely this will be integrated into another library (otel, authz).

Member Author

I generally agree, but I think this is the only way to do it as an HTTP handler? Unless you do the magic to look up the TCP connection for the request and then add caching yourself, which would just be duplicating this, I think?

Contributor

Not sure - see my other comment below. In a handler the user has access to the peer and local IP, and can make the MDS call directly when needed.

Contributor

I see what you mean - it's about avoiding making this call on each request.

How about we leave it out for now? Users can probably use the same pattern if they want; better to focus this library on TCP.

Contributor

I agree that I don't love implicit context storage, but this is what go-grpc uses all over the place, so in this instance it's probably better to follow the same pattern.

Contributor

There is a difference between a library and a framework. Yes, go-grpc and other frameworks follow this pattern, and it is not a bad pattern for a framework, where the user can get the peer identity in a consistent way from the context.

My point is that a library that interacts with ztunnel should not define those abstractions and implied requirements (chained for ALL requests, etc.). We will have other ways to get peer info (JWT, XFCC, etc.), so this library would be used by frameworks together with other auth mechanisms, and in the end expose the context data specific to the framework in use.

// ExtractFromRequest attempts to extract ConnectionMetadata from a request.
// This requires Handler to be used.
// If a connection is re-used between requests, only a single call will be made.
func ExtractFromRequest(r *http.Request) *ConnectionMetadata {
Contributor

That's what I don't like about this pattern: the user has to call ExtractFromRequest (so has a dependency on this library anyway), but the Handler is called on ALL requests. If the user only needs peer info for some requests, it's more complicated to set up (wrap only some requests with the handler chain), or not possible if the lookup is based on some other logic (like checking XFF or other standard headers).

Instead this method could just do the lookup on demand, and no wrapper would be needed.

.parse()?,
);
if remote.ip() != dst.ip() && remote.ip() != src.ip() {
anyhow::bail!("metadata server request must come from the src or dst address")
Contributor

Not sure this is guaranteed - for example, if the connection is IPv6, I'm pretty sure it won't be the case (dual stack, and we use an MDS v4 address).

Can we keep a pointer to the dest and source pod - and check its IPs ?

Or just add a TODO and bug - we can do this in a later phase.

Member Author

Ran into this myself with localhost testing, since we don't use 127.0.0.1 to connect to MDS. I put a TODO. I think we can handle it once the workload API supports dual stack (currently it doesn't).

orig_dst_addr: SocketAddr,
block_passthrough: bool,
) -> Result<(), Error> {
let remote_ip = remote_addr.ip();
if orig_dst_addr.ip() == METADATA_SERVER_IP {
Contributor

That's OK - but it may also be good to have MDS listen on a separate port (150xx) and let CNI handle the redirection (or, in whitebox mode, use the Service). Cleaner.

But it's good to keep this code too.

@bleggett (Contributor) commented Apr 28, 2023

have MDS listen on a separate port ( 150xx) and let CNI handle the redirection

+1 on this, I don't love the reliance on a hardcoded IP here, and would prefer a metadata port (like prometheus, xds, etc etc) so we don't need to rely on known IP keying in zt proper.

Contributor

For the record - changed my mind after further thinking. See my other comments on why it's best for ztunnel to handle all redirection based on destination IP.

Contributor

This is also something we can potentially revisit later if ztunnel/CNI integ gets tighter/more involved, but I'm ok punting on it now and going with this impl.

@bleggett (Contributor) commented Apr 27, 2023

Store each connection in a shared map

Have we thought about how this might conflict with, or be architecturally redundant to, eBPF-based, map-based conntracking?

Most eBPF CNIs will need to do something like this (track connections in a shared map), and it will work almost exactly the same way (a map of connections, just in kernelspace, maintained as an eBPF hash map).

I understand that ztunnel will need to work with things that are not just eBPF, but I wonder if this wouldn't be better implemented (eventually) as a BPF conntrack map that ztunnel can manage/query from userspace.

@linsun (Member) commented Apr 27, 2023

Sorry, I'm missing how this would benefit users, since we said we want to keep ztunnel small and lean. MDS: is this the Google Cloud metadata service, or something more generic?

@howardjohn (Member Author)
It has nothing to do with Google - this allows users' applications to find the peer identity. Today they can do this with the XFCC header, but that is HTTP-based, which we cannot do in ztunnel. This provides a generalized way.

Note this is needed for sandwiching as well

@linsun (Member) commented Apr 28, 2023

It has nothing to do with Google - this allows users applications to find the peer identity. Today they can do this with XFCC header, but that is HTTP based which we cannot do in ztunnel. This provides a generalized way.

Note this is needed for sandwiching as well

Thanks, btw, what does MDS stand for?

I thought we weren't using the sandwich approach at all?

@costinm (Contributor) commented Apr 28, 2023 via email

@costinm (Contributor) commented Apr 28, 2023 via email

orig_dst_addr: SocketAddr,
block_passthrough: bool,
) -> Result<(), Error> {
let remote_ip = remote_addr.ip();
if orig_dst_addr.ip() == METADATA_SERVER_IP {
Member

Would prefer this not be intercepted by ztunnel.

Member Author

I am not sure how we can opt out, since ztunnel will capture all traffic.

@costinm (Contributor) commented Apr 28, 2023 via email

@howardjohn (Member Author) commented Apr 28, 2023 via email

type ConnectionMetadata struct {
// Identity provides the identity of the peer.
// Example: spiffe://cluster.local/ns/a/sa/b.
Identity string `json:"identity"`


Can we open a tracking issue to capture the other information about the identity that we would like to expose here?

Over time I would expect to see us add either a field or a new API that provides details from the public key of the peer.

I think the interesting question is what we are trying to prove to the caller about the veracity of the identity we just provided. Options include:

  1. How can the caller know that the MDS is reliably acting on its behalf? Clearly there's a certain amount of infrastructural trust, seeing as this is coming from the 'network' as perceived by the app. One option here is for the MDS to offer a "who am I" API. This is a bit like SDS, but now it's an implicit part of the infrastructure. The response to the call should be either the current public cert or some shorthand of it. As a variant, the infra can prove that it has the private key (which is shielded from the app) by signing some random bytes passed to the API, which can then be verified with the public key. We likely want to deliver a shorthand in addition to x509, to shield users from parsing its gory details.

  2. How can the caller know that the infrastructure actually performed a secure handshake with the identified peer? This is harder and of debatable value. If we provide proof that the infrastructure already has the public and private keys as described in 1 above, then the infra is already capable of impersonating the application for whatever purposes it sees fit. We could do something like a 'signed ping':

      Client generates random bytes and encrypts them with the public key of the peer (opaque to the local ztunnel)
      Client app calls 'secure ping', passing the encrypted bytes
      Local ztunnel simply forwards the encrypted bytes to the identified peer
      Peer ztunnel decrypts the random bytes with its private key and returns their signature
      Local ztunnel forwards the message signature to the client
      Client compares the message signature to a locally computed hash

This proves that there is something between the client and the peer that has access to the private key associated with the identity. The client doesn't necessarily know who or what that is.

@bleggett (Contributor) commented Apr 28, 2023

Over time I would expect to see us add either a field or a new API that provided details from the public key of the peer

+1, ideally one that follows XFCC semantics, and chains certs of all workloads in the pathway, including the ztunnel instance.

One option here is for the MDS to offer a "who am I" API. This is a bit like SDS but now it's an implicit part of the infrastructure. The response to the call should be either the current public cert or some shorthand of it.

This is more or less what SPIRE already does, it's worth pointing out.

Contributor

XFCC is a pretty horrible UX - you need a parser library, have to parse certs, etc. It may be useful for some super-advanced users, but if they are willing to take all those dependencies they may be better off just using proxyless and having end-to-end mTLS.

All platforms I know of (not only Google) have a basic, simple MDS server that returns metadata and tokens in a very simple way.

I'm not saying it is an invalid use case to want fancy attestation - and it is clear that cloud vendors don't provide this right now, so it's not a bad idea if someone implemented it - but we should not mix it with the very simple use case for this PR.

@linsun (Member) commented Apr 28, 2023

MDS is 'metadata server' AFAIK, all cloud vendors have one. Not sure who was first. This has little to do with sandwich - any user application may need to know the identity of the peer. For example we may have authz to allow Alice to talk to Bob - but Bob still wants to know who's talking to.

Thanks. Thinking out loud here: could this be something opted into, so it doesn't impact users who don't use MDS?

@howardjohn (Member Author)
MDS is 'metadata server' AFAIK, all cloud vendors have one. Not sure who was first. This has little to do with sandwich - any user application may need to know the identity of the peer. For example we may have authz to allow Alice to talk to Bob - but Bob still wants to know who's talking to.

Thanks, thinking loud here - could this be something opted in so it doesn't impact users don't use MDS.

This doesn't impact anyone if they don't use it; they have to explicitly call the metadata server for it to do anything.


/// METADATA_SERVER_IP provides the well-known metadata server IP.
/// This is captured by the redirection.
pub const METADATA_SERVER_IP: IpAddr = IpAddr::V4(Ipv4Addr::new(169, 254, 169, 111));
Contributor

Can we just hardcode a specific metadata server IP in OSS ztunnel like this?

If it's "well-known", to whom is it well-known?

Member Author

I mean, we can - the IP is irrelevant other than that the client and server need to agree on it, and it should not conflict with other IPs users want to use for other purposes.

and would prefer a metadata port (like prometheus, xds, etc etc) so we don't need to rely on known IP keying in zt proper.

What is better about a special port instead of a special IP? Or did you want special IP + special port (which seems good?)

@bleggett (Contributor) commented Apr 28, 2023

I guess I'm not clear on why ztunnel needs a special IP at all?

If we can redirect any pod outbound destined for <whatever IP you want> to <whatever ztunnel you want> and <whatever ztunnel port we want> outside of ztunnel itself - why does ztunnel need to match a well-known IP?

That could be a pretty simple CNI rule.

Member Author

Is it any simpler to have it in the CNI layer instead of in ztunnel, though?

In general there is a tension over what goes at which layer. For example, we could skip ztunnel entirely in cases where we know it won't do anything but pass through.

So far we have leaned towards keeping the CNI simpler, so there is a better chance we can replace it with better approaches (and Rust is easier than iptables/eBPF).

@bleggett (Contributor) commented Apr 28, 2023

Is it any simpler to have it in the CNI layer instead of in ztunnel though?

Long term I think so, since the CNI (ours or someone else's) will be handling/redirecting the traffic to listeners within the ztunnel netns based on intended dest anyway, and we already have the concept of directing packets to different well-known ztunnel listener ports and handling them differently as a result (for purposes of early exit/handling bypass in eBPF/iptables redirect logic, for instance)

So far we have leaned towards keeping the CNI simpler so there is a better chance we can replace it with better approaches (and rust is easier than iptables/eBPF)

Absolutely a fan of keeping CNI simpler, but sending captured traffic based on intended dest to a given ztunnel listener is what CNI is for - that's what it does best. It's how we do DNS redir for instance. CNI redir already does special-casing around certain dest ports (or whether src is host ip, etc etc) for purposes of data path optimization - it's already handling this stuff, and adding this would probably be a single iptables rule.

Feels logically consistent to maintain that pattern here, and it has the added benefit of making it very easy for the IP to be changed or redefined however the network owner wants it, without ztunnel caring (or us needing to add a flag/envvar to ztunnel to let people define what they want).

Contributor

I strongly disagree - routing traffic based on dst is what ztunnel does.

CNI is responsible for redirecting pod egress to ztunnel - it does not know where to send it (unless ztunnel and WDS/PTR-DS are integrated into the CNI). Knowing where to redirect an IP, and how, is based on Istiod configs.

Contributor

And if we want at some point to have 'redirect all egress from pod to daemon set' - shared by all mesh implementations and all CNI providers - we need to keep the requirements on the CNI layer as small as possible.

We may add extra requirements - like excluding ports for proxyless - but getting into 'route IP address x in a different way' is too much.

@bleggett (Contributor) commented May 5, 2023

I strongly disagree - routing traffic based on dst is what ztunnel does.

CNI is responsible to redirect pod egress to ztunnel - it does not know where to send ( unless ztunnel and WDS/PTR-DS is integrated into the CNI ). Knowing where to redirect an IP and how is based on Istiod configs.

CNI knows what ztunnel listener to redirect traffic to, and ztunnel knows what dest to send traffic to - correct - using CNI to redirect specific MDS traffic to a specific ztunnel listener would be the former, which is definitely what CNI is responsible for doing today.

And if we want at some point to have the 'redirect all egress from pod to daemon set' - shared by all
mesh implementations and all CNI providers - we need to keep the requirements on the CNI layer as
small as possible.

Valid. I'm not sure that's a realistic or practical goal, given that no two CNIs are really the same and we want to keep ztunnel from being a kitchen-sink but also tightly integrate at the L4 layer, but we can live in hope.

We may add extra requirements - like exclude ports for proxyless - but getting into 'route IP address x in a different way' is too much.

It doesn't strike me as significantly different from 'route port X in a different way', which CNI already does and will always do for e.g. DNS, but I'm good with this approach for now.

@costinm (Contributor) commented Apr 29, 2023 via email

@costinm (Contributor) commented Apr 29, 2023 via email

@bleggett (Contributor) commented Apr 29, 2023

The reason for the hardcoded IP is pretty simple: in an app, how do you make the http request ? You need a DNS name or IP. Many MDS servers are also DNS servers - there is a plan to allow running a DNS proxy in ztunnel. Also what hostname would you use - and how would CNI know that some VIP is actually the MDS address ?

I'm suggesting if you have a need to map an arbitrary environmental IP of your choice to always get redirected to a listener in ztunnel, it's pretty simple to do that with a single iptables rule outside of ztunnel, by adding another rule to the ones we already have that are functionally similar, without ztunnel needing to care what that IP is, or needing to hardcode the IP in ztunnel. We could pick a fixed IP in OSS, anyone who wanted to do something different could change it by changing the rules outside ztunnel, ztunnel wouldn't care either way.

All MDS servers I know rely on a hardcoded 'well known' address in that range.

MDS clients do - I'm not sure the MDS server being added to ztunnel has a true need to know anything about a well-known/fixed MDS IP.

We can also use a special 150xx port, but if we want ztunnel to be a resolver it'll need to listen on 53 ( yes, we can do some complicated redirection to use a different port for DNS - but why not use the simplest solution ).

ztunnel already listens on 15053 for this purpose, I believe. We have iptables rules that rewrite UDP on port 53 to 15053 within ztunnel's netns on pod redirect. ztunnel doesn't care that this translation happened, or (for now) what the original dest IP was, it just treats traffic on port 15053 as DNS proxy traffic. I'm suggesting we do the same thing for MDS here.

If we are using this pattern - I would like to see a good reason why we should follow a different pattern, and what would that be - can't think of a simpler or cleaner solution. Also this whole thing doesn't need a fancy iptables or eBPF or k8s - CNI can simply add a route and add a second address with the well-known IP on the ztunnel pod. It will also work on VMs and non-K8S environments - even in cases where no iptables are possible ( ztunnel can work without any CNI or iptables, by setting SOCKS_PROXY env variable with most applications ).

I'm proposing we use the pattern already in use for all ambient ztunnel redirection - ztunnel exposes/hardcodes well-known ports, and CNI redirects traffic, based on (variously) dest IP, dest port, and dest packet type to those ports - for this as well. A fixed IP within ztunnel isn't a requirement for this, and it feels weird to add it here when nothing else currently in ztunnel works like this, and there's nothing really preventing it from being implemented exactly like all the existing pod->ztunnel redirections that rely on CNI to map traffic to fixed ztunnel listening ports.

@costinm (Contributor) commented Apr 29, 2023 via email

u := metadataServerURL()
params := url.Values{}
params.Add("src", src)
params.Add("dst", dst)
@costinm (Contributor) commented Apr 29, 2023

Getting connection metadata is very useful - as Louis mentioned, we may extend this to return all the details about the connection, including labels and more handshake info.

And since we return it as JSON - we may as well consider an MDS endpoint that also signs it.

But I think the use case for passing both src and dst is primarily for dealing with CNI implementations where the source IP can't be preserved. Calico, for example, seems to support that, and hopefully we can do it for more CNIs.

In the case of using ztunnel with a 'best practice' CNI, we may want to support metadata lookup using only the source IP (which would return data from the PTR-DS/WDS). If the ztunnel doesn't preserve the source IP, I think the lookup result should include the original source IP in addition to the peer identity. And maybe all the metadata lookup info too.

(All this in additional PRs, of course - not intending to hold this PR until everything is added.)

* Store each connection in a shared map
* Capture special IP address, redirect to MDS handler
* MDS handler takes 4 tuple as input. TODO: should probably validate
  that the caller matches at least one of the IPs?
* Response is JSON containing identity

Also has an example Golang library implementation to add HTTP middleware
that extracts the identity.
@howardjohn howardjohn requested a review from a team as a code owner May 19, 2023 18:21
@istio-policy-bot istio-policy-bot added the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Jun 12, 2023
@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Jun 12, 2023
@istio-testing (Contributor)

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bleggett (Contributor) commented Jul 21, 2023

Looking back at this:

  1. It probably needs a new name, since it's unrelated to the Metadata discovery service (istio#43937), which we also need.
  2. This is functionally similar to SPIFFE/SPIRE self-identity lookup in a lot of ways (consulting ztunnel instead of the SPIRE agent, and bound to the connection versus actively attested by out-of-band checks), and if you were using SPIRE with Istio this could be (a bit of, though not fully - ztunnel definitely needs to be the attesting authority for the connection) a redundant mechanism. That's not necessarily an objection, but it concerns me a bit on the face of it.

e.g. - if I was using SPIRE with Istio as the workload CA as documented by us, and as a workload/pod sidecar I asked SPIRE for my identity and also asked ztunnel for my (connection) identity - I would/should get the same x509 cert in both cases.

@istio-policy-bot istio-policy-bot removed the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Jul 21, 2023
@howardjohn (Member Author)

Let's drop this for now to keep things minimal.

Labels
needs-rebase Indicates a PR needs to be rebased before being merged size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
10 participants