Allow external traffic from Istio pods #515
Comments
The kubernetes service cluster IP and pod IP ranges are well defined and explicitly configured with the --service-cluster-ip-range and --cluster-cidr flags (see [1] and [2] below). Our iptables rules can exclude redirecting outbound traffic to envoy for addresses outside these ranges. All traffic internal to the cluster (pod-to-service, pod-to-pod) would continue to be captured by the proxy for both inbound and outbound pod traffic. Only outbound pod traffic to outside the cluster (i.e. egress) would be excluded from the proxy, avoiding the need for additional egress proxying. This should solve the initial UX problem. It looks like this approach is compatible with the ongoing egress proxy work (#67 and #463) since a dedicated k8s service (with its own service IP within the redirect range) is created for each headless service. cc @rshriram @louiscryan @GregHanson [1] - https://kubernetes.io/docs/admin/kube-controller-manager/ [2] - https://kubernetes.io/docs/admin/kube-apiserver/ |
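To make the exclusion idea concrete, here is a minimal sketch (not the real istio init script; the CIDR values and Envoy port are illustrative assumptions, and the rules are only printed, not applied) of restricting outbound capture to the cluster ranges while still capturing all inbound traffic:

```shell
#!/bin/sh
# Sketch only: emit the NAT rules an init container could install so that
# outbound traffic is redirected to Envoy only when its destination is
# inside the cluster's service or pod CIDRs. Values below are assumptions.
SERVICE_CIDR="10.23.240.0/20"   # --service-cluster-ip-range
POD_CIDR="10.20.0.0/14"         # --cluster-cidr
ENVOY_PORT=15001

emit_rules() {
  # Outbound: redirect only destinations inside the cluster ranges;
  # anything else (true egress) falls through and leaves the pod untouched.
  for cidr in "$SERVICE_CIDR" "$POD_CIDR"; do
    echo "iptables -t nat -A OUTPUT -p tcp -d ${cidr} -j REDIRECT --to-port ${ENVOY_PORT}"
  done
  # Inbound is still always captured, so server-side mTLS is unaffected.
  echo "iptables -t nat -A PREROUTING -p tcp -j REDIRECT --to-port ${ENVOY_PORT}"
}

emit_rules
```

The rules are emitted as text so the generation logic can be inspected or dry-run before being executed with root privileges inside the pod's network namespace.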
So by default we won't impose the proxy on egress services? It's opt-in?
Doesn't this change the entire security/policy model from enforced to
opt-in? I guess it doesn't matter much because during istio deployment, an
organization can decide between the two options. If enforcement is chosen,
then the init container can be passed an option to trap all traffic, while
if opt-in is chosen, the restricted iptables rules can apply.
The question is can the IP range be automatically determined from k8s API
server?
|
Yes. An added bonus is that things like apt-get would work out of the box.
Not necessarily. Inbound traffic (i.e. server side) is always redirected so mTLS should be enforceable. Auth to external services outside the cluster would be opt-in. cc @myidpt @wattli @lookuptable for auth perspective.
I'm still looking into this part. This could also be saved off into a ConfigMap as part of istio installation. For example, these ranges are available on GKE with gcloud, e.g. $ gcloud container clusters describe cluster0 | grep -i cidr
clusterIpv4Cidr: 10.20.0.0/14 # --cluster-cidr
nodeIpv4CidrSize: 24
servicesIpv4Cidr: 10.23.240.0/20 # --service-cluster-ip-range |
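As a sketch, the GKE output above could be parsed into the two values the iptables rules need. The gcloud call is stubbed with sample output here so the snippet is self-contained; in practice it would be the real `gcloud container clusters describe` invocation:

```shell
#!/bin/sh
# Stub standing in for `gcloud container clusters describe cluster0`.
describe_cluster() {
  cat <<'EOF'
clusterIpv4Cidr: 10.20.0.0/14
nodeIpv4CidrSize: 24
servicesIpv4Cidr: 10.23.240.0/20
EOF
}

# Extract the two ranges corresponding to --cluster-cidr and
# --service-cluster-ip-range.
CLUSTER_CIDR="$(describe_cluster | awk '/^clusterIpv4Cidr:/ {print $2}')"
SERVICE_CIDR="$(describe_cluster | awk '/^servicesIpv4Cidr:/ {print $2}')"
echo "cluster-cidr=${CLUSTER_CIDR}"
echo "service-cluster-ip-range=${SERVICE_CIDR}"
```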
Yes, but this is cloud-platform specific, so istio installation in this case becomes part of the workflow. Last but not least, the number of iptables rules we have per pod keeps increasing. While I don't see any obvious issue with the above strategy of restricting our traffic capture to a limited IP range, it makes me increasingly uncomfortable, given that we are mucking around a lot at L4 while really providing value at L7, and ignoring all the L4 policy enforcers (e.g., k8s network policies). It would be nice to get some external perspective on this. @christian-posta @liljenstolpe any thoughts? |
Sorry about coming late to the party on this one. Is the use case here a host-based proxy, rather than a side-car deployment? Since we are talking about egress capture, I assume this is the case. Currently, k8s network policy doesn't support a redirect, otherwise that could be an option. If we are talking side-car deployment, then ingress control to the proxy (from other pods or external) could be done via k8s network policy. Some implementations (full disclosure, ours included) can also do egress filtering, but that is not natively supported by k8s network policy. |
To give some context - the use case is a per-pod sidecar proxy that needs to capture both inbound and outbound traffic (that is two proxies on each data path). |
@rshriram Looking at this, I don't think that k8s security policy could solve the capture problem (wrong side of the namespace wall). However, you could certainly delegate all L3-L4 ingress protection to the k8s network policy api. There are options for egress filtering as well, out there (full transparency, Project Calico, which I am a member of, is one of those egress filtering options). |
@rshriram This might be different in a non-sidecar (non k8s) model. Let's cross that bridge when we come to it :) |
Ideally we could do something with k8s-networking, specifically related to this thread: |
@christian-posta the way I understand the issue is that since we are talking sidecar here, the iptables work would need to happen within the pod. If that is the case, then kube network policy doesn't have the right hooks - k8s network policy is external to the pod, not the intra-pod network. I may be wrong here, but that is my understanding of what we are talking about. |
@christian-posta, in that thread thockin refers to this thread, whose conclusion is what we have in Istio today (i.e. the author of that thread, Enrico Schiattarella, did the initial iptables rules for istio :)). |
The bad news: it looks like allowing external traffic by default and istio egress are currently mutually exclusive. The good news: allowing external traffic by default and istio egress can eventually coexist, but it requires a bit more work on egress re: rewriting ExternalName as described by (1) here. The recommendation for alpha is to default to allowing external traffic, thus bypassing the proxy and disabling istio egress. The argument for this recommendation is the principle of least surprise, i.e. existing kubernetes external traffic should work the same with or without istio, for both L7 and L4 traffic. A new flag will be added to |
Let's try to avoid having to rely on a k8s network policy for this. It is not k8s native, and there are multiple implementations, each with its pros and cons. For now, we do not want to create a dependency between istio and network policy. Besides, egress policy is still work in progress. |
Other clients have asked k8s to expose --service-cluster-ip-range and --cluster-cidr. Until this is implemented, we can add ServiceClusterIPRange and ClusterCIDR to the "istio" configMap, with the same default values as the k8s defaults, and require the admin to change them if the k8s defaults are changed. We could maybe also add a more generic list of excluded CIDRs, not necessarily just the k8s service and pod CIDRs. |
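A minimal sketch of what recording the ranges in the "istio" ConfigMap could look like. The key names and fallback defaults are assumptions, and the manifest is only rendered here; an installer would pipe it into `kubectl apply -f -`:

```shell
#!/bin/sh
# Render an "istio" ConfigMap carrying the two ranges. Defaults below are
# illustrative and, as noted above, must be overridden when the cluster
# was configured with different values.
SERVICE_CLUSTER_IP_RANGE="${SERVICE_CLUSTER_IP_RANGE:-10.0.0.0/24}"
CLUSTER_CIDR="${CLUSTER_CIDR:-10.244.0.0/16}"

render_configmap() {
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio
data:
  serviceClusterIpRange: "${SERVICE_CLUSTER_IP_RANGE}"
  clusterCidr: "${CLUSTER_CIDR}"
EOF
}

render_configmap
```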
Yes, a ConfigMap would be the best place to put this information so we avoid extra required flags. Re: k8s defaults, those seem to be cluster specific, so our installation instructions would need to address that somehow, e.g. $ grep -r SERVICE_CLUSTER_IP_RANGE= *
cluster/ubuntu/config-default.sh:export SERVICE_CLUSTER_IP_RANGE=${SERVICE_CLUSTER_IP_RANGE:-192.168.3.0/24} # formerly PORTAL_NET
cluster/rackspace/config-default.sh:SERVICE_CLUSTER_IP_RANGE="10.0.0.0/16" # formerly PORTAL_NET
cluster/centos/master/scripts/apiserver.sh:SERVICE_CLUSTER_IP_RANGE=${3:-"10.10.10.0/24"}
cluster/centos/config-default.sh:export SERVICE_CLUSTER_IP_RANGE=${SERVICE_CLUSTER_IP_RANGE:-"192.168.3.0/24"}
cluster/libvirt-coreos/config-default.sh:SERVICE_CLUSTER_IP_RANGE="${SERVICE_CLUSTER_IP_RANGE:-10.11.0.0/16}" # formerly PORTAL_NET
cluster/gce/config-test.sh:SERVICE_CLUSTER_IP_RANGE="10.0.0.0/16" # formerly PORTAL_NET
cluster/gce/config-default.sh:SERVICE_CLUSTER_IP_RANGE="${SERVICE_CLUSTER_IP_RANGE:-10.0.0.0/16}" # formerly PORTAL_NET
cluster/vagrant/util.sh: echo "SERVICE_CLUSTER_IP_RANGE='${SERVICE_CLUSTER_IP_RANGE}'"
cluster/vagrant/config-default.sh:SERVICE_CLUSTER_IP_RANGE=10.247.0.0/16 # formerly PORTAL_NET
cluster/openstack-heat/config-default.sh:SERVICE_CLUSTER_IP_RANGE=${SERVICE_CLUSTER_IP_RANGE:-10.0.0.0/16}
cluster/kubemark/gce/config-default.sh:SERVICE_CLUSTER_IP_RANGE="10.0.0.0/16" # formerly PORTAL_NET
cluster/photon-controller/util.sh: echo "readonly SERVICE_CLUSTER_IP_RANGE='${SERVICE_CLUSTER_IP_RANGE}'"
cluster/photon-controller/config-test.sh:SERVICE_CLUSTER_IP_RANGE="10.244.240.0/20"
cluster/photon-controller/config-default.sh:SERVICE_CLUSTER_IP_RANGE="10.244.240.0/20"
hack/local-up-cluster.sh:SERVICE_CLUSTER_IP_RANGE=${SERVICE_CLUSTER_IP_RANGE:-10.0.0.0/24}
test/kubemark/start-kubemark.sh:SERVICE_CLUSTER_IP_RANGE="${SERVICE_CLUSTER_IP_RANGE:-}" |
Also, at least in the case of GKE, those defaults don't match what is actually used, e.g. $ gcloud container clusters describe c1 --zone=us-central1-a | grep -i cidr
clusterIpv4Cidr: 10.20.0.0/14
nodeIpv4CidrSize: 24
servicesIpv4Cidr: 10.23.240.0/20 # SERVICE_CLUSTER_IP_RANGE |
Well, sadly, IBM Bluemix does not even expose these values to end users, so this solution is not going to work for us reliably :(. So before you do a PR based on this approach, please validate with other platforms, or let others validate before merging.
Andra, my point with respect to the other L4 stuff was not about relying on network policies but about actually being independent of them, or not messing with them, when we start adding arbitrary iptables rules.
|
@rshriram, thanks for the update re: bluemix. What do you propose as an alternative in the short term until we can pull this from kubernetes itself (e.g. kubernetes/kubernetes#25533)? Here are the options as I see them (other suggestions welcome).
(edit): added (5), @rshriram and @louiscryan's suggestion to add a passthrough filter to envoy (see @louiscryan's description below). The proposal is for (0) with a flag to revert to (3) for egress proxy testing. The longer term plan is to push to make this available via kubernetes directly (e.g. a cluster-wide ConfigMap). For cluster providers that don't expose SERVICE_CLUSTER_IP_RANGE, e.g. bluemix, the flag for option (3) can be used to retain the behavior we have today. We could also invert the defaults such that option (3) is used unless the cluster admin enables option (0). WDYT? |
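To illustrate the proposed default-with-escape-hatch, here is a hedged sketch of the init container's rule generation switching between option (0) (restricted capture) and option (3) (capture everything, today's behavior). The function name, flag shape, and port are assumptions, and rules are printed rather than applied:

```shell
#!/bin/sh
ENVOY_PORT=15001

# $1: space-separated CIDRs to capture, or empty to capture all outbound
# traffic (option (3), the escape hatch for clusters like Bluemix that do
# not expose their CIDRs).
emit_outbound_rules() {
  if [ -z "$1" ]; then
    echo "iptables -t nat -A OUTPUT -p tcp -j REDIRECT --to-port ${ENVOY_PORT}"
  else
    for cidr in $1; do
      echo "iptables -t nat -A OUTPUT -p tcp -d ${cidr} -j REDIRECT --to-port ${ENVOY_PORT}"
    done
  fi
}

emit_outbound_rules ""                              # option (3): capture all
emit_outbound_rules "10.23.240.0/20 10.20.0.0/14"   # option (0): cluster only
```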
@mattklein123 @rshriram @PiotrSikora There is another option here, but it would take some work in Envoy to make it happen. We introduce a special cluster type that is a 'passthrough' to the DST_IP captured when running in TPROXY mode. This allows Envoy to proxy traffic for which it doesn't have an explicit configuration. It provides some useful runway for gradual improvement, since L4 routing rules can be incrementally introduced that capture this ad-hoc traffic and move it over to explicitly defined services over time. It might also be possible to use a route to re-classify traffic as L7 and still address a pass-through cluster. Matt, is this something that has come up for Lyft? I remember some conversations with Netflix about the need for something like this when doing ad-hoc development, where the burden of requiring a cluster for every addressable service might be too high. |
@rshriram Most k8s network policy plugins capture the traffic themselves, either via iptables or kernel level changes. How istio will work in clusters with other network plugins is a more complex topic, that we should address when faced with concrete use cases. |
To prepare for when kubernetes/kubernetes#25533 is ready, @thockin advised us to process the CIDRs as a list of ranges. |
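A small sketch of treating the configuration as a generic comma-separated list of ranges rather than exactly two k8s values (the CIDRs below are examples only; the link-local range is just an illustrative extra entry):

```shell
#!/bin/sh
# Split a comma-separated CIDR list and emit one capture rule per range,
# so the config naturally extends beyond the two k8s ranges.
split_cidrs() {
  echo "$1" | tr ',' '\n'
}

CIDRS="10.23.240.0/20,10.20.0.0/14,169.254.0.0/16"
for cidr in $(split_cidrs "$CIDRS"); do
  echo "iptables -t nat -A OUTPUT -p tcp -d ${cidr} -j REDIRECT --to-port 15001"
done
```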
@louiscryan this is exactly what I proposed in slack: a passthrough filter. I wasn't happy about the abstraction though, as it seemed to nullify the purpose of a "proxy" and simply add overhead. But given the options, I think this could be viable in clusters where the subnet ranges are not available. In fact, we could offer the user the choice of option (0) (if they know the CIDR block) or the passthrough filter, until they migrate their services to external services. @ayj I am not too worried about apt-get not working. No point in supporting mutable infrastructure. |
@ayj To be clear, my concern about the iptables exclusion is not just for IBM Bluemix (I am sure I can get the CIDR info from our internal folks. cc @dancberg). My point was that expecting users to obtain installation information about their k8s clusters (CIDR blocks) may turn out to be a hurdle to getting istio up and running. |
@rshriram I agree that expecting users to obtain CIDR info is definitely the wrong long term solution for istio. Manually configuring CIDRs was only a short term workaround for the alpha release; sorry if I didn't make that clear in the original proposal. My comment re: apt-get was just an example, since we happened to hit that problem quite frequently early in development. I added the passthrough suggestion as option (5) in my previous comment. Thanks @louiscryan and @rshriram for the suggestion. The long term solution would then likely be option (0) (discovering CIDR ranges via a standard k8s API) or option (5). For maximum flexibility we might want to offer both as installation options (e.g. (0) for end-users concerned about maximum performance for non-istio services and (5) for everything else). The short term solution for alpha likely depends on how much work it is to add passthrough support to envoy. |
In general, this has not come up, mainly since at Lyft we force everyone to modify their applications slightly (we don't use iptables at all). So we have various egress ports that applications use for different things. The Netflix case is mostly about HTTP (and has come up from others also). That has been solved with the addition of CDS and allowing a routed cluster to be set by a header. This is a different case, I think. With that said, I don't think it would be too terribly difficult to implement a transparent version of tcp_proxy that does dynamic pass-through connections. I would recommend not building this into the tcp_proxy filter, but making a new filter called tcp_pass_through_proxy which implements this. I would estimate this could be done in 1-2 weeks depending on who is doing it. This sounds like a good first medium-size Envoy project for @PiotrSikora. :) |
Agreed, hey Piotr - feel like taking this on?
|
@mattklein123, would this also address envoyproxy/envoy#527? |
@ayj No, it wouldn't. TBH I don't fully grok envoyproxy/envoy#527. We should talk about that separately if that is a real issue that needs to be solved. |
@louiscryan @mattklein123 sure, although, it's not clear to me how tcp_pass_through_proxy would be different from tcp_proxy filter with "ad-hoc" cluster configuration, since TPROXY (retaining source IP) doesn't seem to be relevant to a sidecar proxy, does it? |
This feature itself is really only useful (or safe) for outbound traffic
using a side-car so constraining it to that environment seems sensible.
Enabling this feature on a middle-proxy would seem like a
security-escalation waiting to happen.
|
Building a dynamic cluster at runtime will be quite complicated. We can move this discussion over to an Envoy GH issue and do the design there, but I think a filter that specifically takes DST_IP, makes a synthetic Upstream::Host, and connects to it, will be simpler than trying to bolt this into tcp_proxy. (We might want to share some code in a base class or something, not sure). Either way, it's doable without a ton of work. |
re: envoyproxy/envoy#527, tcp_proxy and http filters cannot be used on the same port with http falling back to tcp if no domains/routes match. Our current workaround is to restrict end users from sharing ports between HTTP and TCP services. That is more difficult when services are external to the cluster. Maybe this isn't a problem for tcp_pass_through_proxy, but if it is we could discuss next week. |
I think it might make sense to have a brief meeting on this next week. This is kind of complicated and I'm concerned stuff is going to get lost in the comment trail. |
Opened issue. Let's move the impl discussion there |
Moving this to beta milestone now since we have #573 and post-alpha we will likely use envoyproxy/envoy#775 |
|
Traffic capture rules prevent any world-bound traffic from leaving the cluster. This is a major usability problem. The egress proxy is part of the solution, but we need to explore whether we can change the iptables rules to limit what they capture.