Enable true mesh-only, remove hard dependency on the knative-local-gateway #1371
@adamkgray I think I have figured out that the top level virtual service should use the
This is the current setup; we had to use the knative local gateway to route from the top-level virtual service.
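The original manifest was not preserved in this thread, so below is a hedged sketch of what such a setup typically looks like. All names (`sklearn-iris`, `user-ns`, the gateway and port) are illustrative assumptions, and the gateway/service names vary across kfserving releases. The point is that the top-level host only resolves because a mesh route bounces traffic through the local gateway with a rewritten Host header:

```yaml
# Hedged sketch only: names, namespaces, gateways, and the port are
# illustrative assumptions, not values taken from this thread.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: sklearn-iris                # top-level virtual service
  namespace: user-ns
spec:
  hosts:
  - sklearn-iris.user-ns.svc.cluster.local
  gateways:
  - mesh                            # match callers inside the mesh
  - knative-serving/knative-local-gateway
  http:
  - headers:
      request:
        set:
          Host: sklearn-iris-predictor-default.user-ns.svc.cluster.local
    route:
    - destination:
        # the hard dependency: traffic hairpins through the local gateway
        host: knative-local-gateway.istio-system.svc.cluster.local
        port:
          number: 80
```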
To avoid going through the local gateway, I tried the Istio virtual service delegate, which helps forward the traffic to the destination service when using the top-level host.
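For illustration, a minimal sketch of the delegate approach described above (names are hypothetical, and whether delegation is honored on mesh, i.e. sidecar, routes depends on the Istio version, so treat this as a sketch rather than a confirmed working config):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: sklearn-iris                # top-level virtual service
  namespace: user-ns
spec:
  hosts:
  - sklearn-iris.user-ns.svc.cluster.local
  gateways:
  - mesh                            # keep traffic sidecar-to-sidecar
  http:
  - delegate:                       # hand routing off to the predictor's
      name: sklearn-iris-predictor-default   # own virtual service, no gateway hop
      namespace: user-ns
```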
@yanniszark @theofpa I think there are two options here.
@yuzisun thanks for taking the time to write all of this down. I tried to summarize all the information that we talked about in the meeting at a high level, to sync our understanding on where we are now and where we want to go.

Current Scenario that Kubeflow supports out-of-the-box

Please keep in mind that everything refers to sidecar-injected Pods, which is the default in Kubeflow Profiles. With the work in kubeflow/kubeflow#5848, this is the currently supported scenario:

Current State - Invalid Scenarios

The following scenarios do not work with sidecar-injected Pods. The traffic FROM the cluster-local gateway TO Pods in the user's namespace is not allowed.

Desired State

Since we have sidecars, we should not need the cluster-local-gateway. Istio sidecars can do all the complex routing themselves.

Why this is important

By eliminating the cluster-local-gateway, we can write very straightforward Istio AuthorizationPolicies. However, since creating InferenceServices without sidecars already allows that, we could create a permissive AuthorizationPolicy for Knative Pods, right? WRONG! Because we can't express a selector that targets only Knative Pods:

Next Steps

I understand from our discussion and your comment above that:
P.S. Here are the
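To make the "straightforward AuthorizationPolicies" point above concrete, here is a minimal sketch, assuming the gateway hop is gone so the sidecar sees the real caller's identity instead of the gateway's (the namespace name is an illustrative assumption):

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-same-namespace
  namespace: user-ns
spec:
  action: ALLOW
  rules:
  - from:
    - source:
        # without the gateway in the path, this matches the original
        # caller's namespace rather than the gateway's identity
        namespaces: ["user-ns"]
```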
To enable mesh mode, all you need to do is inject the Istio sidecar, which I assume is done by default with the Kubeflow setup. @yanniszark I think the diagram isn't correct for the transformer-to-predictor routing; knative creates two virtual services for both. When you route the request to the transformer it is still an issue, because the top-level virtual service introduces a level of indirection which forces the use of the local gateway, but we are trying to figure out a way to do this without the local gateway. You can see in the following access logging that when routing initially to the transformer, the request goes through the local gateway, BUT from transformer to predictor it does NOT go through the local gateway when the Istio sidecar is injected; instead, the traffic goes directly from the transformer's Envoy proxy to the predictor's Envoy proxy, so you do not find the access logging on the gateway.
You can find the predictor call in the transformer envoy proxy log.
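For reference, a rough sketch of the shape of the mesh-side VirtualService that net-istio generates per route, which is what lets the transformer's sidecar route straight to the predictor (field values are illustrative assumptions; inspect the real objects with kubectl to confirm):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: sklearn-iris-predictor-default-mesh   # illustrative name
  namespace: user-ns
spec:
  gateways:
  - mesh                                      # applies to in-mesh sidecar traffic
  hosts:
  - sklearn-iris-predictor-default.user-ns.svc.cluster.local
  http:
  - route:
    - destination:
        # traffic goes straight to the revision's service,
        # never touching the local gateway
        host: sklearn-iris-predictor-default-00001.user-ns.svc.cluster.local
        port:
          number: 80
```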
@yuzisun I tried to reproduce but bumped into something very strange. First, requests were proxied through the activator Pod. I tried injecting it with a sidecar and allowing its traffic. However, I noticed that the inferenceservice sidecar was still blocking traffic. I tried to debug, and saw that the activator was not connecting with mTLS (the IP of the activator is
I tried creating a DestinationRule, to no avail. Was this not the case for you?
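A minimal sketch of the DestinationRule attempt mentioned above, assuming the goal was to force client sidecars to originate Istio mTLS toward the predictor (the host and names are illustrative assumptions):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: predictor-mtls
  namespace: user-ns
spec:
  host: sklearn-iris-predictor-default.user-ns.svc.cluster.local
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL   # client sidecars present Istio-issued certs
```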
@yanniszark I was seeing the same error while I was busy with #1688. In that PR, the AuthorizationPolicy for the namespace only allows access to the
This section should include some of the values found here, such as the
I was able to replicate this behavior using a notebook server, by making requests directly to the IP address of a Pod rather than the service. This is also backed up by the fact that the Activator logs should show the requests going through
@davidspek Do you have the Istio proxy injected for the knative activator pod? I think those values also require mTLS to be enabled.
@yuzisun Yes, the Activator pod is injected with the sidecar proxy.
Also, if you see PassthroughCluster, that means the request is not routed through the mesh; so, is the destination service not on the service mesh? We can try to debug with
@davidspek Actually, the knative activator might be directly addressing the pod IPs, bypassing the virtual service routes.
@yuzisun My conclusion is indeed that the KNative Activator is addressing the pods directly and not using the virtual service routes, which is causing the problems.
Let's bring this issue to the knative community? It looks like, so far, Knative has put an emphasis on the no-mesh scenario or basic mTLS (no policies).
@yuzisun That's what I was also thinking. The authorization policy they provide does work, but it's very broad and thus not really best practice, I'd say. You would want to limit access to the necessary paths to either the service account KNative uses, or the knative-serving namespace. In situations like Kubeflow, where you want to isolate namespaces, this is a must, and these fine-grained permissions are part of the reason for using a service mesh (controlling which service can talk to another service).
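A sketch of the tighter policy suggested here. The path is a purely illustrative placeholder, and whether you key on the namespace or pin the exact service account principal depends on your install:

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-knative-serving
  namespace: user-ns
spec:
  action: ALLOW
  rules:
  - from:
    - source:
        # or pin the exact identity with source.principals, e.g. the
        # service account the activator runs under in your install
        namespaces: ["knative-serving"]
    to:
    - operation:
        paths: ["/v1/models/*"]   # illustrative placeholder, not from the issue
```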
See this relevant issue: knative-extensions/net-istio#554
This Istio issue also seems relevant: istio/istio#23494
@yuzisun @Suresh-Nakkeran: Hasn't PR #2380 already solved this? If yes, then we can close this issue.
Original issue description:

/kind feature
Describe the solution you'd like
Remove hard dependency on the knative local gateway and let kfserving have the option to run 100% mesh-only with istio.
Anything else you would like to add:
Kfserving should be able to run in true mesh-only mode on Istio with sidecars enabled. In doing so, we ought to be able to remove the hard dependency on the knative local gateway, which is currently used to route top-level requests (name.ns.svc.cluster.local) to the other knative services that kfserving creates (name-predictor-default.ns.svc.cluster.local). Vanilla knative services support this feature, so kfserving ought to as well.
I'm not sure how the routing should be done in this case. Perhaps we can start by making the predictor ksvc the top-level service, so that name.ns.svc.cluster.local can resolve to its ClusterIP instead of just being a CNAME for the knative local gateway.
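To make the "just a CNAME" point concrete: the top-level hostname is backed by an ExternalName Service, which DNS renders as a CNAME. A hedged sketch, with illustrative names (the gateway service name varies by release):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: sklearn-iris      # top-level host: sklearn-iris.user-ns.svc.cluster.local
  namespace: user-ns
spec:
  type: ExternalName      # a DNS CNAME, no ClusterIP of its own
  externalName: knative-local-gateway.istio-system.svc.cluster.local
```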