This repository has been archived by the owner on Jun 14, 2018. It is now read-only.

Allow external traffic from Istio pods #515

Closed
kyessenov opened this issue Apr 12, 2017 · 36 comments

@kyessenov
Contributor

Traffic capture rules prevent any world-bound traffic from leaving the cluster. This is a major usability problem. Egress proxy is part of the solution, but we need to explore whether we can change the iptables rules to limit what they capture.

@kyessenov kyessenov added this to the manager alpha milestone Apr 12, 2017
@ayj ayj self-assigned this Apr 12, 2017
@ayj
Contributor

ayj commented Apr 13, 2017

The kubernetes service cluster IP and pod IP ranges are well defined and explicitly configured with the --service-cluster-ip-range and --cluster-cidr flags (see [1] and [2] below). Our iptables rules can exclude redirecting outbound traffic to envoy for addresses outside these ranges. All traffic internal to the cluster (pod-to-service, pod-to-pod) would continue to be captured by the proxy, for both inbound and outbound pod traffic. Only outbound pod traffic headed outside the cluster (i.e. egress) would be excluded from the proxy, avoiding the need for additional egress proxying. This should solve the initial UX problem.
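A minimal sketch of what such rules might look like (the chain name, proxy port 15001, and the GKE-style CIDRs are assumptions for illustration, not the actual istio rules):

$ # Redirect outbound TCP to the sidecar only for in-cluster destinations;
$ # destinations outside these ranges bypass the proxy entirely.
$ iptables -t nat -N ISTIO_OUTPUT
$ iptables -t nat -A OUTPUT -p tcp -j ISTIO_OUTPUT
$ iptables -t nat -A ISTIO_OUTPUT -p tcp -d 10.23.240.0/20 -j REDIRECT --to-ports 15001  # --service-cluster-ip-range
$ iptables -t nat -A ISTIO_OUTPUT -p tcp -d 10.20.0.0/14 -j REDIRECT --to-ports 15001    # --cluster-cidr
$ iptables -t nat -A ISTIO_OUTPUT -j RETURN                                              # external traffic: no redirect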

It looks like this approach is compatible with the ongoing egress proxy work (#67 and #463), since a dedicated k8s service (with its own service IP within the redirect range) is created for each headless service.

cc @rshriram @louiscryan @GregHanson

[1] - https://kubernetes.io/docs/admin/kube-controller-manager/
[2] - https://kubernetes.io/docs/admin/kube-apiserver/

@rshriram
Member

rshriram commented Apr 13, 2017 via email

@ayj
Contributor

ayj commented Apr 13, 2017

> So by default we won't impose the proxy on egress services? It's opt-in?

Yes. An added bonus is that things like apt-get should work out of the box without mucking with iptables rules.

> Doesn't this change the entire security/policy model from enforced to opt-in?

Not necessarily. Inbound traffic (i.e. server side) is always redirected, so mTLS should be enforceable. Auth to external services outside the cluster would be opt-in. cc @myidpt @wattli @lookuptable for the auth perspective.
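A minimal sketch of that unconditional inbound capture (the proxy port 15001 is an assumption; the real rules may carve out exceptions):

$ # All inbound TCP to the pod is redirected to the sidecar regardless of
$ # source, so server-side policy such as mTLS can always be enforced.
$ iptables -t nat -A PREROUTING -p tcp -j REDIRECT --to-ports 15001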

> The question is: can the IP range be automatically determined from the k8s API server?

I'm still looking into this part. This could also be saved off into a ConfigMap as part of the istio installation. For example, these ranges are available on GKE via gcloud, e.g.

$ gcloud container clusters describe cluster0 | grep -i cidr
clusterIpv4Cidr: 10.20.0.0/14        # --cluster-cidr
nodeIpv4CidrSize: 24
servicesIpv4Cidr: 10.23.240.0/20     # --service-cluster-ip-range

@rshriram
Member

Yes, but this is cloud-platform specific; obtaining these values then becomes part of the istio installation workflow.

Last but not least, the number of iptables rules that we have per pod keeps increasing. While I don't see any obvious issue with the above strategy of restricting our traffic capture to a limited IP range, it makes me increasingly uncomfortable, given that we are mucking around a lot at L4 while really providing value at L7, and ignoring all the L4 policy enforcers (e.g., k8s network policies). It would be nice to get some external perspective on this.

@christian-posta @liljenstolpe any thoughts?

@liljenstolpe

Sorry about coming late to the party on this one. Is the use case here a host-based proxy, rather than a side-car deployment? Since we are talking about egress capture, I assume this is the case.

Currently, k8s network policy doesn't support a redirect; otherwise that could be an option. If we are talking about a side-car deployment, then ingress control to the proxy (from other pods or external sources) could be done via k8s network policy. Some implementations can also do egress filtering (full disclosure: ours does), but that is not natively supported by k8s network policy.

@kyessenov
Contributor Author

To give some context - the use case is a per-pod sidecar proxy that needs to capture both inbound and outbound traffic (that is two proxies on each data path).

@liljenstolpe

@rshriram Looking at this, I don't think that k8s network policy could solve the capture problem (wrong side of the namespace wall). However, you could certainly delegate all L3-L4 ingress protection to the k8s network policy API. There are egress filtering options out there as well (full transparency: Project Calico, which I am a member of, is one of those options).

@liljenstolpe

@rshriram This might be different in a non-sidecar (non k8s) model. Let's cross that bridge when we come to it :)

@christian-posta
Contributor

Ideally we could do something with k8s-networking, specifically related to this thread:

kubernetes/kubernetes#25961

@liljenstolpe

@christian-posta the way I understand the issue is that since we are talking sidecar here, the iptables work would need to happen within the pod. If that is the case, then k8s network policy doesn't have the right hooks - it is external to the pod, not the intra-pod network. I may be wrong here, but that is my understanding of what we are talking about.

@ayj
Contributor

ayj commented Apr 13, 2017

@christian-posta, in that thread thockin refers to this thread, whose conclusion is what we have in Istio today (i.e. the author of that thread, Enrico Schiattarella, did the initial iptables rules for istio :)).

@ayj
Contributor

ayj commented Apr 14, 2017

The bad news - it looks like allowing external traffic by default and istio egress are currently mutually exclusive.

The good news - allowing external traffic by default and istio egress can eventually coexist, but it requires a bit more work on egress re: rewriting ExternalName as described by (1) here.

The recommendation for alpha is to default to allowing external traffic, thus bypassing the proxy and disabling istio egress. The argument for this recommendation is the principle of least surprise, i.e. existing kubernetes external traffic should work similarly with and without istio, for both L7 and L4 traffic. A new flag will be added to istioctl kube-inject to disable this behavior so that development of egress can continue in parallel with the alpha release.
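A hypothetical invocation of such a flag (the flag name is purely illustrative, not a committed name; only kube-inject itself exists today):

$ # Opt back into full outbound capture for egress development
$ # (hypothetical flag name):
$ istioctl kube-inject --captureAllOutbound -f app.yaml | kubectl apply -f -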

@andraxylia
Contributor

Let's try to avoid having to rely on a k8s network policy for this. It is not k8s native, and there are multiple implementations, each with its pros and cons. For now, we do not want to create a dependency between istio and network policy. Besides, egress policy is still a work in progress.

@andraxylia
Contributor

Other clients have asked k8s to expose --service-cluster-ip-range and --cluster-cidr:
kubernetes/kubernetes#25533

Until this is implemented, we can add ServiceClusterIPRange and ClusterCIDR to the "istio" ConfigMap, with the same default values as the k8s defaults, and require the admin to change them if the k8s defaults are changed. We could also add a more generic list of excluded CIDRs, not necessarily just the k8s service and pod CIDRs.
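A minimal sketch of recording these values at install time (the key names and ranges are illustrative, not an actual istio schema):

$ # Store the cluster's CIDRs in the "istio" ConfigMap during installation
$ # (illustrative key names and values):
$ kubectl create configmap istio \
    --from-literal=serviceClusterIPRange=10.0.0.0/16 \
    --from-literal=clusterCIDR=10.180.0.0/14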

@ayj
Contributor

ayj commented Apr 14, 2017

Yes, a ConfigMap would be the best place to put this information so we avoid extra required flags to kube-inject. A generic list of CIDRs for excluding outbound traffic should work.

re: k8s defaults, those seem to be cluster-specific, so our installation instructions would need to address that somehow, e.g.

$ grep -r SERVICE_CLUSTER_IP_RANGE= *
cluster/ubuntu/config-default.sh:export SERVICE_CLUSTER_IP_RANGE=${SERVICE_CLUSTER_IP_RANGE:-192.168.3.0/24}  # formerly PORTAL_NET
cluster/rackspace/config-default.sh:SERVICE_CLUSTER_IP_RANGE="10.0.0.0/16"  # formerly PORTAL_NET
cluster/centos/master/scripts/apiserver.sh:SERVICE_CLUSTER_IP_RANGE=${3:-"10.10.10.0/24"}
cluster/centos/config-default.sh:export SERVICE_CLUSTER_IP_RANGE=${SERVICE_CLUSTER_IP_RANGE:-"192.168.3.0/24"}
cluster/libvirt-coreos/config-default.sh:SERVICE_CLUSTER_IP_RANGE="${SERVICE_CLUSTER_IP_RANGE:-10.11.0.0/16}"  # formerly PORTAL_NET
cluster/gce/config-test.sh:SERVICE_CLUSTER_IP_RANGE="10.0.0.0/16"  # formerly PORTAL_NET
cluster/gce/config-default.sh:SERVICE_CLUSTER_IP_RANGE="${SERVICE_CLUSTER_IP_RANGE:-10.0.0.0/16}"  # formerly PORTAL_NET
cluster/vagrant/util.sh:  echo "SERVICE_CLUSTER_IP_RANGE='${SERVICE_CLUSTER_IP_RANGE}'"
cluster/vagrant/config-default.sh:SERVICE_CLUSTER_IP_RANGE=10.247.0.0/16  # formerly PORTAL_NET
cluster/openstack-heat/config-default.sh:SERVICE_CLUSTER_IP_RANGE=${SERVICE_CLUSTER_IP_RANGE:-10.0.0.0/16}
cluster/kubemark/gce/config-default.sh:SERVICE_CLUSTER_IP_RANGE="10.0.0.0/16"  # formerly PORTAL_NET
cluster/photon-controller/util.sh:    echo "readonly SERVICE_CLUSTER_IP_RANGE='${SERVICE_CLUSTER_IP_RANGE}'"
cluster/photon-controller/config-test.sh:SERVICE_CLUSTER_IP_RANGE="10.244.240.0/20"
cluster/photon-controller/config-default.sh:SERVICE_CLUSTER_IP_RANGE="10.244.240.0/20"
hack/local-up-cluster.sh:SERVICE_CLUSTER_IP_RANGE=${SERVICE_CLUSTER_IP_RANGE:-10.0.0.0/24}
test/kubemark/start-kubemark.sh:SERVICE_CLUSTER_IP_RANGE="${SERVICE_CLUSTER_IP_RANGE:-}"

@ayj
Contributor

ayj commented Apr 14, 2017

Also, at least in the case of GKE, those defaults don't match what is actually used, e.g.

$ gcloud container clusters describe c1 --zone=us-central1-a | grep -i cidr
clusterIpv4Cidr: 10.20.0.0/14
nodeIpv4CidrSize: 24
servicesIpv4Cidr: 10.23.240.0/20                   # SERVICE_CLUSTER_IP_RANGE

@rshriram
Member

rshriram commented Apr 14, 2017 via email

@ayj
Contributor

ayj commented Apr 14, 2017

@rshriram, thanks for the update re: bluemix. What do you propose as an alternative in the short term until we can pull this from kubernetes itself (e.g. kubernetes/kubernetes#25533)?

Here are the options as I see them (other suggestions welcome).

  0. Outbound traffic from the pod destined to a k8s service is redirected to the proxy; all other outbound traffic bypasses the proxy. Principle of least surprise: istio becomes opt-in for external service proxying, and default external traffic works similarly to vanilla kubernetes.

  1. Outbound traffic from the pod bypasses the proxy unless explicitly redirected via iptables rules. Specifically, this means adding/removing dynamic iptables rules based on k8s services, similar to kube-proxy.

  2. Outbound traffic from the pod is redirected to the proxy unless explicitly excluded via iptables rules. Requires similar dynamic iptables rule management.

  3. Outbound traffic from the pod is redirected to the proxy (current solution). The egress proxy only works for explicit external services; all other external traffic is dropped by the proxy, e.g. apt-get doesn't work, and users need to audit their application for all external traffic and add ExternalName services accordingly, otherwise things are broken.

  4. Make the proxy non-transparent à la linkerd and use localhost with explicit ports for everything.

  5. (edit) @rshriram and @louiscryan's suggestion to add a passthrough filter to envoy (see @louiscryan's description below).

The proposal is for (0), with a flag to revert back to (3) for egress proxy testing. The longer-term plan is to push to make this available via kubernetes directly (e.g. a cluster-wide ConfigMap). For cluster providers that don't expose SERVICE_CLUSTER_IP_RANGE, e.g. bluemix, the flag for option (3) can be used to retain the behavior we have today. We could also invert the defaults such that option (3) is used unless the cluster admin enables option (0). WDYT?

@louiscryan

@mattklein123 @rshriram @PiotrSikora

There is another option here but it would take some work in Envoy to make it happen.

We introduce a special cluster type that is a 'passthrough' to the DST_IP captured when running in TPROXY mode. This allows Envoy to proxy traffic for which it doesn't have an explicit configuration.

This provides a useful gradual-improvement runway, as L4 routing rules can be incrementally introduced to capture this ad-hoc traffic and move it over to explicitly defined services over time. It might also be possible to use a route to re-classify traffic as L7 and still address a pass-through cluster.
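As a rough sketch of the shape such a cluster could take (expressed in later Envoy YAML syntax, where a similar idea eventually surfaced as the ORIGINAL_DST cluster type; nothing like this existed at the time of this comment):

$ # Append a passthrough cluster to an Envoy config (illustrative only):
$ cat >> envoy.yaml <<'EOF'
  clusters:
  - name: passthrough
    type: ORIGINAL_DST           # connect to the originally captured DST_IP
    lb_policy: CLUSTER_PROVIDED  # the cluster itself supplies the host
    connect_timeout: 5s
EOF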

Matt - is this something that has come up at Lyft? I remember some conversations with Netflix about the need for something like this when doing ad-hoc development, where the burden of requiring a cluster for every addressable service might be too high.

@andraxylia
Contributor

@rshriram Most k8s network policy plugins capture the traffic themselves, either via iptables or kernel-level changes. How istio will work in clusters with other network plugins is a more complex topic that we should address when faced with concrete use cases.

@andraxylia
Contributor

To prepare for when kubernetes/kubernetes#25533 is ready, @thockin advised us to process the CIDRs as a list of ranges.

@rshriram
Member

@louiscryan this is exactly what I proposed in slack: a passthrough filter. I wasn't happy about the abstraction, though, as it seemed to nullify the purpose of a "proxy" and simply add overhead. But given the options, I think this could be viable in clusters where the subnet ranges are not available. In fact, we could offer the user the choice of option (0) (if they know the cidr block) or the passthrough filter, until they migrate their services to external services.

@ayj I am not too worried about apt-get not working. No point in supporting a mutable infrastructure.

@rshriram
Member

@ayj To be clear, my concern about the iptables exclusion is not just for IBM bluemix (I am sure I can get the cidr info from our internal folks. cc @dancberg).

My point was that expecting users to obtain installation information about their k8s clusters (cidr blocks) may turn out to be a hurdle to getting istio up and running.

@ayj
Contributor

ayj commented Apr 14, 2017

@rshriram I agree that expecting users to obtain CIDR info is definitely the wrong long-term solution for istio. Manually configuring the CIDR was only a short-term workaround for the alpha release. Sorry if I didn't make that clear in the original proposal. My comment re: apt-get was just an example, since we happened to hit that problem quite frequently early in development.

I added the passthrough suggestion as option (5) in my previous comment. Thanks @louiscryan and @rshriram for the suggestion.

The long-term solution would then likely be option (0) (discovering CIDR ranges via a standard k8s API) or option (5). For maximum flexibility we might want to offer alternative installation options for both (e.g. (0) for end-users concerned about maximum performance for non-istio services, and (5) for everything else).

The short-term solution for alpha likely depends on how much work it is to add passthrough support to envoy.

@mattklein123

mattklein123 commented Apr 15, 2017

> Matt - is this something that has come up at Lyft? I remember some conversations with Netflix about the need for something like this when doing ad-hoc development, where the burden of requiring a cluster for every addressable service might be too high.

In general, this has not come up, mainly because at Lyft we force everyone to modify their applications slightly (we don't use iptables at all), so we have various egress ports that applications use for different things. The Netflix case is mostly about HTTP (and has come up from others also). That has been solved with the addition of CDS and allowing the routed cluster to be set by a header. This is a different case, I think.

With that said, I don't think it would be too terribly difficult to implement a transparent version of tcp_proxy that does dynamic pass-through connections. I would recommend not building this into the tcp_proxy filter, but making a new filter called tcp_pass_through_proxy which implements this. I would estimate this could be done in 1-2 weeks, depending on who is doing it. This sounds like a good first medium-sized Envoy project for @PiotrSikora. :)

@louiscryan

louiscryan commented Apr 15, 2017 via email

@ayj
Contributor

ayj commented Apr 15, 2017

@mattklein123, would this also address envoyproxy/envoy#527?

@mattklein123

@ayj No, it wouldn't. TBH I don't fully grok envoyproxy/envoy#527. We should talk about that separately if that is a real issue that needs to be solved.

@PiotrSikora

@louiscryan @mattklein123 sure, although it's not clear to me how tcp_pass_through_proxy would be different from the tcp_proxy filter with an "ad-hoc" cluster configuration, since TPROXY (retaining the source IP) doesn't seem to be relevant to a sidecar proxy, does it?

@louiscryan

louiscryan commented Apr 15, 2017 via email

@mattklein123

Building a dynamic cluster at runtime will be quite complicated. We can move this discussion over to an Envoy GH issue and do the design there, but I think a filter that specifically takes DST_IP, makes a synthetic Upstream::Host, and connects to it, will be simpler than trying to bolt this into tcp_proxy. (We might want to share some code in a base class or something, not sure). Either way, it's doable without a ton of work.

@ayj
Contributor

ayj commented Apr 15, 2017

re: envoyproxy/envoy#527, tcp_proxy and http filters cannot be used on the same port with http falling back to tcp when no domains/routes match. Our current workaround is to restrict end users from sharing ports between HTTP and TCP services. That is more difficult when services are external to the cluster. Maybe this isn't a problem for tcp_pass_through_proxy, but if it is, we could discuss next week.

@mattklein123

I think it might make sense to have a brief meeting on this next week. This is kind of complicated and I'm concerned stuff is going to get lost in the comment trail.

@louiscryan

Opened issue envoyproxy/envoy#775

Let's move the implementation discussion there.

@ayj
Contributor

ayj commented May 3, 2017

Moving this to the beta milestone now, since we have #573 and post-alpha we will likely use envoyproxy/envoy#775.

@ayj
Contributor

ayj commented Jun 7, 2017

> Egress proxy is part of the solution, but we need to explore whether we can change the iptables rules to limit what they capture.
