ambient: stash hbone peer principal in endpoint metadata #50753
Skipping CI for Draft Pull Request. |
…data key instead Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
This implementation doesn't work; the metadata is on the wrong listener. I'm iterating locally and will push periodically |
Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
This is ready for review now; we decided to stash the identity in metadata due to complexity |
@@ -57,9 +58,32 @@ func buildInternalUpstreamCluster(name string, internalListener string) *cluster
}
}

func buildEncapInternalUpstreamCluster(name string, internalListener string) *cluster.Cluster {
This seems identical to buildInternalUpstreamCluster. is it?
Yeah this is from a former iteration where I had a custom filter. I'll clean this up once I get this all working locally
@@ -667,6 +667,10 @@ func buildEnvoyLbEndpoint(b *EndpointBuilder, e *model.IstioEndpoint, mtlsEnable
ep.HostIdentifier = &endpoint.LbEndpoint_Endpoint{Endpoint: &endpoint.Endpoint{
Address: util.BuildInternalAddressWithIdentifier(connectOriginate, net.JoinHostPort(address, strconv.Itoa(port))),
}}
peerPrincipal, _ := structpb.NewStruct(map[string]any{
Do we need this for everything doing hbone or just waypoints?
Like does sidecar have the issue today? Seems like probably yes?
Yeah I would think so; the simplest implementation seemed to be just always adding the identity in endpoint metadata
@keithmattix I'd prefer keeping ambient stuff to ambient code, so if we can avoid changing EDS for ambient, that's better. Adding another field to flatbuffer is not a problem, we have to refactor it anyways to support Otel telemetry. |
Ack - I just pushed up my stash. Serializing to the WorkloadMetadataObject is returning an empty string for some reason, and when I try to access the flat buffer directly, it looks corrupted. I've never worked with flat buffers, so any advice would be appreciated |
Converting back to draft as I'm still testing and trying to get it to work e2e |
Agree with Kuat, WDS is the right approach. We may need to iterate on what
is exposed - but identity is not controversial.
I think sidecars and regular gateways should use this as well - not only
ambient.
We are discussing the Istio sidecar and gateway behavior if mTLS is off
(for any reasons) - getting peer info from
MDS is the best option.
…On Thu, May 2, 2024 at 2:59 PM Keith Mattix II ***@***.***> wrote:
Converting back to draft as I'm still testing and trying to get it to work
e2e
WDS works well for ztunnel because it's the source of truth for things. It's IMO a poor choice when it's decoupled from the rest of the routing flow (reminds me of authz and HTTPRoute 🙂 ). For example, what happens when the IP we send to is the east-west GW IP? We want metadata for the underlying destination, not the EW GW. This information is already in EDS, we already put 4 pieces of metadata there -- and it correctly handles these cases. Why put a 5th in a different location? |
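To make the east-west gateway concern in the comment above concrete, here is a minimal Go sketch. It is not Istio code: the `Endpoint` type, the `peer_principal` key, and every address and identity are hypothetical, chosen only to contrast an address-keyed lookup with metadata carried on the endpoint itself. Whether an address-keyed source could be extended to cover gateway IPs is exactly what the rest of this thread debates.

```go
package main

import "fmt"

// Endpoint is a hypothetical stand-in for a load-balancing endpoint: the
// address the proxy dials plus metadata attached by the control plane.
type Endpoint struct {
	DialAddress string            // address the connection is actually sent to
	Metadata    map[string]string // per-endpoint metadata (EDS-style)
}

// principalByAddress models an address-keyed lookup table (WDS-style).
// All addresses and identities are made up for illustration.
var principalByAddress = map[string]string{
	"10.0.0.5":    "spiffe://cluster.local/ns/default/sa/backend",
	"203.0.113.9": "spiffe://cluster.local/ns/istio-system/sa/eastwest-gateway",
}

// principalFromAddress resolves the identity from the dialed address only.
func principalFromAddress(ep Endpoint) string {
	return principalByAddress[ep.DialAddress]
}

// principalFromMetadata reads the identity stored on the endpoint itself.
func principalFromMetadata(ep Endpoint) string {
	return ep.Metadata["peer_principal"]
}

func main() {
	// A remote backend reached through an east-west gateway: the dialed
	// address is the gateway's, the metadata describes the real destination.
	remote := Endpoint{
		DialAddress: "203.0.113.9",
		Metadata: map[string]string{
			"peer_principal": "spiffe://cluster.local/ns/default/sa/backend",
		},
	}
	fmt.Println("by address: ", principalFromAddress(remote))
	fmt.Println("by metadata:", principalFromMetadata(remote))
}
```

In this toy setup the address lookup yields the gateway's identity while the endpoint metadata yields the backend's, which is the mismatch the comment describes.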
Because we know that was not a good solution. Not every endpoint is in a cluster to start with, you have to support pass-through cases just the same way. There's no reason why WDS/MDS can't support gateway IPs. |
Ok but we don't do mTLS for those either, so the identity information doesn't matter. Having a disjoint source of information isn't really acceptable on the authorization path |
Waypoint absolutely relies on WDS for all source telemetry right now. We can just as well rely on it for destination telemetry, instead of trying to make it a hybrid of EDS on the destination side. I would agree if we didn't use WDS already, but it's heavily used at this point on Waypoints. |
TBH I am a bit confused by the position here. In the past, hadn't you advocated for deriving more metadata from trusted attributes? I.e. get the namespace, identity, etc from the TLS handshake. Using WDS seems further from that. What am I missing? |
@howardjohn Yes, when Envoy performs L4 functions as in the sidecar we should use the trusted attributes. But with Ambient, L4 is intentionally walled off from L7, so L7 ultimately has to trust something else to provide those attributes. Whether you trust ztunnel on the destination side, Cilium/kernel, or istiod doesn't really matter - you simply look up identity wherever you can. Costin points to a valid model of a sidecar also dropping L4 functions, where we'd need to do this lookup as well, but that's a separate topic. |
Even in the case where you have ztunnel, the model we (or, I) am pushing for is to have an exchange of information between these two layers. Blindly trusting is not a good model IMO. And we are blurring the layers by needing to look up the information out of band instead of just asking the layer. |
I don't think it's a problem. istiod is just a skip-level manager that you can ask instead of going to the direct manager :) There's a general consistency problem that can be improved if you do it on the wire - you can absolutely do the same with an internal protocol between HCM and tcp_proxy instead of raw TCP that read/write principals. But that requires custom filters for a relatively minor feature right now. |
I agree communication between the layers is more work and maybe not worth it, but I don't see how EDS vs WDS is more or less work. Both seem to be just plugging 1 new field into existing mechanisms. Then we will want to use that in SAN match, but probably that work is similar across both methods |
EDS is not meant for policy - it's a load assignment, not even config. Putting a policy for SAN into EDS feels like an abuse of the xDS data model. |
Where else would you put per-endpoint policies? If we are talking about abusing the xDS data model, that model does not include WDS either |
First - this is for telemetry, not authorization (I hope!).
At least with the current Istio mode, authz should only be based on the
peer certificate.
Using EDS for peer telemetry info doesn't make sense - even if we put 4
pieces of info there. We do a lot of things for
various legacy (good or bad) reasons.
There is a discussion to be had about trusting the IP address and the
discovery server for info used in the
authorization step. It is certainly more trustworthy than any unsigned
peer metadata sent by the peer - and as
secure as sending the same info in EDS (just more efficient).
Not sure I understand the issue with the gateways - but at least for HTTP
the entire path
should be reflected in X-Forwarded and I think anyone can ask WDS for info
about any of the IPs in the path,
assuming each gateway trusts the one in front with X-Forwarded headers.
…On Thu, May 2, 2024 at 5:01 PM John Howard ***@***.***> wrote:
Because we know that was not a good solution. Not every endpoint is in a
cluster to start with, you have to support pass-through cases just the same
way.
Ok but we don't do mTLS for those either, so the identity information
doesn't matter.
Having a disjoint source of information isn't really acceptable on the
authorization path
The intent (from me, maybe not Keith -- I will let him say) is to use this for telemetry and authz. Our current SAN match is trust-domain/* which is obviously unacceptable long term.
I am talking about how to verify the cert |
On Thu, May 2, 2024 at 5:07 PM John Howard ***@***.***> wrote:
TBH I am a bit confused by the position here. In the past, hadn't you
advocated for deriving more metadata from trusted attributes? I.e. get the
namespace, identity, etc from the TLS handshake. Using WDS seems further
from that. What am I missing?
If we can get signed info - from TLS peer cert, or some JWT - that's
absolutely best.
If we can't get it from a peer cert - the next best source of trust is WDS.
Istiod is also the one handling secure naming
and all ztunnel and sidecar trust decisions are based on info about
captured IP address and the same source of data
as WDS. Not as good as a signed cert or JWT - but still as secure as the
rest of Istio.
Least secure is the header (TCP or http) that peer is sending. Even for
telemetry it's pretty bad, but we have a history.
|
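The ordering of trust described in the comment above (signed information first, then the discovery server, then anything the peer reports about itself) can be sketched as a simple precedence rule. This is only an illustration under assumptions: the `PeerInfo` type, its field names, and the source labels are hypothetical and do not correspond to any Istio or Envoy API.

```go
package main

import "fmt"

// PeerInfo holds the candidate identities for one peer, one per source.
// Field names are hypothetical; an empty string means "not available".
type PeerInfo struct {
	CertPrincipal      string // from a verified TLS peer certificate or signed token
	DiscoveryPrincipal string // looked up from the discovery server by IP address
	HeaderPrincipal    string // self-reported by the peer in a TCP or HTTP header
}

// resolvePrincipal returns the most trusted identity available together
// with a label naming the source it came from.
func resolvePrincipal(p PeerInfo) (principal, source string) {
	switch {
	case p.CertPrincipal != "":
		return p.CertPrincipal, "signed"
	case p.DiscoveryPrincipal != "":
		return p.DiscoveryPrincipal, "discovery"
	case p.HeaderPrincipal != "":
		return p.HeaderPrincipal, "header"
	}
	return "", "none"
}

func main() {
	// No certificate is visible at this layer, so the discovery-server
	// value wins over the unsigned header value.
	p := PeerInfo{
		DiscoveryPrincipal: "spiffe://cluster.local/ns/default/sa/backend",
		HeaderPrincipal:    "spiffe://cluster.local/ns/default/sa/claimed",
	}
	principal, source := resolvePrincipal(p)
	fmt.Println(principal, source)
}
```

Returning the source alongside the principal lets a caller decide, for example, to accept a header-derived value for telemetry while refusing it for anything security related.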
Ok - I am confused... Is this in the context of the previous WG meeting and getting the principal for telemetry? Or for some authentication feature that I'm not aware of (normally CDS has secure naming)? John - I don't mind if ztunnel can pass the data (PROXY or other means) - but ztunnel gets it from Istiod as well. |
~medium term we definitely need to validate the SAN of the upstream endpoint, otherwise the telemetry on the wire could be false without us knowing. The immediate fix is to try and make sure telemetry is fixed, but I don't see why we shouldn't do SAN validation as well |
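The difference between the trust-domain-wide SAN match mentioned in this thread (`trust-domain/*`) and validating the SAN of one specific upstream endpoint can be shown with a small Go sketch. It assumes SPIFFE-style URI SANs; the function names and the example identities are hypothetical and are not taken from Istio or Envoy.

```go
package main

import (
	"fmt"
	"strings"
)

// matchTrustDomain accepts any SPIFFE ID inside the trust domain,
// which is the effect of a "trust-domain/*" style match.
func matchTrustDomain(san, trustDomain string) bool {
	return strings.HasPrefix(san, "spiffe://"+trustDomain+"/")
}

// matchExact accepts only the principal expected for this endpoint.
func matchExact(san, expected string) bool {
	return san == expected
}

func main() {
	expected := "spiffe://cluster.local/ns/default/sa/backend"
	// A certificate from another workload in the same trust domain.
	other := "spiffe://cluster.local/ns/other/sa/unrelated"

	fmt.Println("wildcard accepts other workload:", matchTrustDomain(other, "cluster.local"))
	fmt.Println("exact accepts other workload:   ", matchExact(other, expected))
	fmt.Println("exact accepts expected workload:", matchExact(expected, expected))
}
```

Under these assumptions the wildcard match accepts a certificate from any workload in the trust domain, while the exact match needs a per-endpoint expected principal, which is the piece of data this PR is trying to make available.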
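I would suggest not mixing (again) telemetry and authz. They have very different requirements, and I agree current SAN match is really bad and needs to be fixed.
I have been thinking about telemetry - and not sure I have full context on the authz issue. It can't be about waypoint verifying the client cert - that would be covered by authz policy matches, and I assume HA-PROXY discussion for sandwiches. Is it about 'secure naming' for waypoints? That would be covered by CDS for services - and there is no EDS for pods that are not in a service.
Keith - can you add some comments to the PR description to make the use case(s) more clear, and if it is security related - it would really, really help to have a doc that can be security reviewed by people who focus on this area. |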
Yes, it's secure naming; in CDS it doesn't work in waypoints due to the
internal listener hop.
If we are already giving a per-endpoint identity, I don't see why we would
then aggregate that into a cluster match
…On Thu, May 2, 2024, 5:54 PM Costin Manolache ***@***.***> wrote:
The intent (from me, maybe not Keith -- I will let him say) is to use this
for telemetry and authz. Our current SAN match is trust-domain/* which is
obviously unacceptable long term.
I would suggest not mixing (again) telemetry and authz. They have very
different requirements, and I agree current SAN match is really bad and
needs to be fixed.
At least with the current Istio mode, authz should only be based on the
peer certificate.
I am talking about how to verify the cert
I have been thinking about telemetry - and not sure I have full context on
the authz issue.
It can't be about waypoint verifying the client cert - that would be
covered by authz policy matches, and
I assume HA-PROXY discussion for sandwiches.
Is it about 'secure naming' for waypoints? That would be covered by CDS
for services - and there is no EDS
for pods that are not in a service.
Keith - can you add some comments to the PR description to make the use
case(s) more clear, and if it is
security related - it would really, really help to have a doc that can be
security reviewed by people who
focus on this area.
We should certainly do SAN validation - but not confuse the security and trust model used in authz with the one used I still don't know what 'upstream endpoint' you're validating and where. For clusters - I assume we still use |
So this is a HTTP request for "example.ns" that hits waypoint, gets routed to "example-v1" cluster - which has a bunch of Why we aggregate all pod identities in the cluster is a controversial issue, but the current approved design for If the request is NOT for a cluster/service - but directly to an endpoint ( workload to workload ) - that's a whole different |
I haven't seen the code - how did we end up with flatbuffers? Didn't we have enough protobufs and JSON and YAML? Is it something required by Envoy? Nothing wrong with flatbuffers - I like them - but it seems like one more complexity. |
We had flatbuffers since version 1.5 or so :D They were forced by Wasm, there's no good reason to use them now, so it just remains a tech debt. |
Closing because I was able to get this working with WDS |
Please provide a description of this PR:
If envoyproxy/envoy#33857 merges, we'll be able to use io.istio.upstream_peer_principal to grab the destination principal in telemetry