Requirement on containerPort to capture traffic is too onerous #6259

Closed
louiscryan opened this issue Jun 13, 2018 · 40 comments

@louiscryan louiscryan commented Jun 13, 2018

We've seen many customers report issues on 0.8 around the new requirement to use containerPort declarations to capture traffic. This requirement was introduced as a workaround for the security issues with the proxy config in 0.7 that allowed a proxy to be used as a trampoline.

This is yielding a sub-optimal user experience and I believe we can have a more complete solution for this problem without requiring user config intervention.

A variety of solutions have already been discussed.

@PiotrSikora @rshriram @costinm @andraxylia

I believe this is blocking for 1.0

@mandarjog mandarjog commented Jun 13, 2018

We are hearing about this problem often from customers, and even from within the team.

Even though this behaviour was introduced to fix a security problem, it has introduced another one.
Users can inadvertently (or maliciously) opt out of the mesh by not using the correct containerPort setting. If there is a standard RBAC policy that applies to all services, simply forgetting to set containerPort leaves that service open.

This is certainly blocking for 1.0.

@PiotrSikora PiotrSikora commented Jun 13, 2018

As mentioned elsewhere - Pilot simply needs to stop generating wildcard listeners, and then we can go back to intercepting all the traffic.

@rshriram rshriram commented Jun 14, 2018

For which we need filter chain matching on destination IP ranges.

@rshriram rshriram commented Jun 14, 2018

Otherwise, generating hundreds of listeners (one per service and port combination) is not feasible, given that we will have hundreds of RDS resource requests - one per HTTP connection manager. Any change in the Mixer filter or any other filter will cause a mass reload of all these listeners.

@kyessenov kyessenov commented Jun 14, 2018

I think technicalities in the configuration server should not prevent us from getting the model right. The xDS server can hold responses as needed to prevent mass reloads, or wait until all RDS resources are requested. In fact, I claim it's actually better to split RDS into smaller resources. Currently, in the worst case of all services listening on port 80, we have one giant RDS object that forces a mass reload. By splitting into many RDS objects, one per service VIP, the management server can be smarter and stage the update.

@rshriram rshriram commented Jun 14, 2018

You still can’t get rid of wildcard listeners when you add service entries or have stateful sets.

@kyessenov kyessenov commented Jun 14, 2018

Is it ok to trampoline from a random pod's sidecar to an external service? I think it's not as bad, since an external service does not use the source's credentials. It is similar to sharing an egress between pods.

Stateful sets are a separate problem, and using wildcard listeners for them is a workaround, not a final solution.

@rshriram rshriram commented Jun 14, 2018

Can someone clearly describe, with an example, what the original issue was and how that cryptic PR fixed it?

@costinm costinm commented Jun 14, 2018

Removing wildcard listeners will also break various cases in multicluster/mesh expansion - we can't know the full list of all IPs a service can use across the mesh.
It is also likely to impact performance and scalability, since the number of matches in Envoy will drastically increase.

@costinm costinm commented Jun 14, 2018

And yes, stateful sets would also break if we remove the wildcard ports, as would any non-k8s environment where service VIPs are not used.

@costinm costinm commented Jun 14, 2018

Mandar: I hope you are not proposing removing the option for users to exclude ports from interception - or to disable interception completely via annotations. Mandatory policy and firewalls are far more complex than capturing and blocking all ports - they require closing all other ways a user could break out.

@mandarjog mandarjog commented Jun 14, 2018

We should keep the exclusion options, as they are often needed.
However, it should be an explicit opt-out.
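
For illustration, a minimal sketch of what an explicit, auditable opt-out can look like using the sidecar injector's traffic annotations (the annotation name is the documented Istio one; the pod name, image, and port value are hypothetical):

```sh
# Hypothetical pod that keeps inbound capture on for everything except one port,
# making the opt-out explicit in the spec rather than an accidental omission.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: example-app
  annotations:
    # explicit, reviewable opt-out: capture all inbound ports except 9090
    traffic.sidecar.istio.io/excludeInboundPorts: "9090"
spec:
  containers:
  - name: app
    image: example/app:latest
EOF
```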

@andraxylia andraxylia commented Jun 14, 2018

@rshriram the problem is when traffic is sent to a pod with a spoofed host. The inbound and outbound listeners are mixed together; there is no separation. The incoming traffic is matched against a "0.0.0.0:port" listener with the spoofed host and sent to the outbound route for the microservice behind the spoofed host.

The inboundPorts fix does not let that traffic enter Envoy, so it is not trampolined. There is a small price to pay, which is to declare the ports.

The alternative to wildcard listeners is to split the inbound and outbound listeners. For this we need support in Envoy.

@kyessenov kyessenov commented Jun 14, 2018

If a user types `curl -H "Host: serviceB" serviceA` and the request reaches service B instead of service A, then that's a UX problem. The expectation is that service A's VIP receives the traffic no matter what host is specified. Therefore, we should respect that and utilize the VIP whenever it is available. That says nothing about services without a VIP; we may very well use wildcard IPs and ports for those.

Re: perf and scalability. That depends on whether dispatch by IP or dispatch by host is faster. I suspect the first is faster, since it happens sooner (no need to decode HTTP) and is simpler (no need to go through all the cases of how the host can be spelled). Given that the total number of choices is the same between the two, I don't see a clear reason why it would make performance worse. Of course, we need to measure that.

@andraxylia andraxylia commented Jun 14, 2018

Right - with the current inboundPorts restriction, the traffic will reach serviceA, which will see that the request is spoofed and contains hostB instead of hostA. We do not want serviceA, which is exposed externally, to be used as a trampoline to reach serviceB, which is internal only.

@rshriram rshriram commented Jun 14, 2018

I don't understand. We set up inbound listeners to listen on the pod IP only, and the HTTP connection manager in that listener can only route to the local app and nowhere else, because we don't have any other virtual hosts in the inbound path.
So, even if I send a request to a pod IP with a spoofed host, it ends up in the local app and does not go out.
If I send a request to a service VIP with a spoofed host from a pod with a sidecar (outbound), then it goes to the spoofed host as per the HTTP specs.
On the inbound path, you always see the pod IP. So the fix of requiring a containerPort doesn't make sense, nor do the alternatives being discussed here. Am I missing something?

@kyessenov kyessenov commented Jun 15, 2018

@andraxylia It's not spoofing. Imagine "serviceB" was just the external name for serviceA (e.g. coming through ingress).

@baodongli baodongli commented Jun 21, 2018

I'd like to understand this better as well. I agree with @rshriram that inbound listeners are set up to listen on the pod IP only. With regard to containerPort, the following is extracted from the k8s API spec https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/:

ports (ContainerPort array; patch strategy: merge; patch merge key: containerPort)
List of ports to expose from the container. Exposing a port here gives the system additional information about the network connections a container uses, but is primarily informational. Not specifying a port here DOES NOT prevent that port from being exposed. Any port which is listening on the default "0.0.0.0" address inside a container will be accessible from the network. Cannot be updated.

So I am trying to understand what is expected from setting containerPort in the deployment.

@andraxylia andraxylia commented Jun 21, 2018

@baodongli the traffic is sent to that pod IP, but the original destination IP of the traffic is re-written as the traffic is trampolined.

Correct, the k8s spec does not mandate that containerPorts be listed; Istio requires it so that it can add the ports to the list of inboundPorts.
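
For concreteness, this is the kind of declaration the injector derives inboundPorts from - a minimal, hypothetical deployment where only the `ports:` entry matters:

```sh
# Hypothetical workload: without the containerPort entry below, the 0.8-era
# injector would not add port 8000 to inboundPorts, so inbound traffic to it
# would bypass the sidecar.
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-a
spec:
  replicas: 1
  selector:
    matchLabels:
      app: service-a
  template:
    metadata:
      labels:
        app: service-a
    spec:
      containers:
      - name: app
        image: example/service-a:latest
        ports:
        - containerPort: 8000   # must be declared for the sidecar to capture it
EOF
```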

@louiscryan louiscryan added this to the 1.0 milestone Jun 22, 2018
@costinm costinm commented Jun 25, 2018

Is this realistic for 1.0?

Is there a clear design - and an understanding of the tradeoffs and what will break?

@costinm costinm commented Jun 25, 2018

@rshriram Scenario is:
Service A - port 8000
Service B - HTTP, port 9000
Attacker connects to a pod on Service A, on port 9000, and makes a request with "Host: b". It hits the *:9000 listener, which forwards to service B (with A's credentials).

This is only a problem for the outbound path - inbound is fine (connecting to A on port 8000 correctly goes to the inbound listener).
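
A hypothetical reproduction of that path (the pod IP and hostnames are placeholders, assuming the 0.7-era capture-everything config):

```sh
# Connect to a Service A pod on Service B's port with a forged Host header.
# The request hits the wildcard 0.0.0.0:9000 outbound listener and is forwarded
# to Service B using A's sidecar identity instead of being rejected.
POD_A_IP=10.0.0.12   # placeholder: IP of a pod backing Service A
curl -v -H "Host: b.default.svc.cluster.local" "http://${POD_A_IP}:9000/"
```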

@costinm costinm commented Jun 25, 2018

What setting containerPort does is change the iptables capture, so that only the ports used by the app are captured by the inbound table.

That means an app connecting to a Service A pod on port 9000 will not be intercepted by iptables/Envoy, so the request will be rejected.
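
Roughly, the inbound side of the capture ends up looking like the sketch below (chain names follow the istio-iptables script; the port list and exact flags are illustrative and vary by version):

```sh
# Sketch: only the declared containerPorts are redirected to the sidecar.
iptables -t nat -A PREROUTING -p tcp -j ISTIO_INBOUND
for port in 8000; do   # ports derived from the pod's containerPort declarations
  iptables -t nat -A ISTIO_INBOUND -p tcp --dport "${port}" -j ISTIO_IN_REDIRECT
done
# A connection to port 9000 falls through ISTIO_INBOUND without being redirected;
# since nothing in the pod listens on 9000, it is refused and never reaches Envoy.
# That closes the trampoline, but only if the ports are declared.
```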

@costinm costinm commented Jun 25, 2018

For tracking, if the change is made please reopen (and test):
#5865

@andraxylia andraxylia commented Jul 3, 2018

@jasmine-jaksic this should be decided/vetted by the TOC, since the TOC set the initial P0 priority.

@jasminejaksic jasminejaksic commented Jul 3, 2018

As you know, Sven and Louis are OOO, and @rshriram agrees with moving this out. It's either that or we delay 1.0 perpetually until we come up with a solution to fix this.

@PiotrSikora PiotrSikora commented Jul 3, 2018

@jasmine-jaksic I was under the impression that the agreed-upon solution for this is separating inbound/outbound listeners and reverting iptables back to intercepting all the traffic by default.

On the Envoy side, the code is the same as for #2468 (another P0); on the Istio side, it will require small changes to the iptables script (to use 2 ports instead of 1) and then Pilot changes to match it.
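
In iptables terms the split looks roughly like the sketch below; 15001/15006 are the outbound/inbound ports Istio eventually standardized on, and the exact numbers were still open at the time of this comment:

```sh
# Sketch of the two-port split: outbound and inbound capture redirect to
# different Envoy listener ports instead of sharing a single one.
# (chain creation and the usual exclusions omitted)
iptables -t nat -A ISTIO_REDIRECT    -p tcp -j REDIRECT --to-ports 15001  # outbound side
iptables -t nat -A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15006  # inbound side
iptables -t nat -A OUTPUT        -p tcp -j ISTIO_OUTPUT       # pod-originated traffic
iptables -t nat -A ISTIO_OUTPUT  -p tcp -j ISTIO_REDIRECT
iptables -t nat -A PREROUTING    -p tcp -j ISTIO_INBOUND      # traffic arriving at the pod
iptables -t nat -A ISTIO_INBOUND -p tcp -j ISTIO_IN_REDIRECT  # capture all inbound again
```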

@jasminejaksic jasminejaksic commented Jul 3, 2018

@PiotrSikora and @andraxylia This seems very risky but let's discuss the details in tomorrow's 1.0 stand-up.

@costinm costinm commented Jul 6, 2018

I looked a bit into this - TPROXY does not use the redirect port, so the previously agreed solution doesn't work for it. I'm trying to fix TPROXY for 1.0 - it appears REDIRECT will not work for UDP (which we want to support), and most docs I found indicate TPROXY scales better and has fewer bugs than REDIRECT.

I think we need to re-evaluate the implementation. Given the goal of supporting firewall rules, we will need more dynamic configuration of iptables, so instead of REDIRECT on different ports, I suggest:

  1. The init container will have a small app that queries Pilot and gets the list of inbound ports to intercept and the list of inbound ports to block
  2. The sidecar container will have NET_ADMIN capabilities and contact Pilot to dynamically update iptables
  3. Long term, define an API for dynamic port support

This moves us closer to real firewall support, works for TPROXY as well as REDIRECT - and avoids the need for users to define every containerPort (which we should still recommend).

@andraxylia andraxylia commented Sep 21, 2018

Any update on this?

@andraxylia andraxylia commented Sep 22, 2018

Once on-demand xDS is available, it will reduce the number of listeners in one Envoy to the ones actually needed, and we could change the outbound listeners for HTTP to be VIP:port instead of wildcard 0.0.0.0:port. This would solve the trampoline problem, but it would break headless services, which rely on a wildcard listener on a specific port. TCP and HTTPS use VIP:port listeners, so headless does not work for HTTPS anyway.

Why does ServiceEntry need a wildcard? What is the use case, @rshriram?

@stale stale bot commented Dec 21, 2018

This issue has been automatically marked as stale because it has not had activity in the last 90 days. It will be closed in the next 30 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@stale stale bot commented Feb 13, 2019

This issue has been automatically closed because it has not had activity in the last month and a half. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions.

@duderino duderino commented Jun 14, 2019

@silentdai could you describe your current progress and the remaining work?

@lambdai lambdai commented Jul 22, 2019

- [x] iptables redirects inbound traffic and outbound traffic to separate listeners
- [x] the inbound listener and outbound listener share no bind_to_port=false listeners
- [ ] redefine the readiness probe

By then, containerPort is not required when the user opts in to the above.
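
A quick way to check the split on a pod, assuming Istio 1.3+ and istioctl installed (the pod and namespace names are placeholders):

```sh
# List the sidecar's listeners and look for the two capture ports
# (15001 outbound, 15006 inbound).
istioctl proxy-config listeners my-app-7d4b9c-abcde -n default | grep -E '1500[16]'
```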

@howardjohn howardjohn commented Aug 26, 2019

@lambdai this is done now, right? Will be shipped in 1.3?


@lambdai lambdai commented Aug 27, 2019

The readiness probe: #16336

@lambdai lambdai commented Aug 27, 2019

Ideally, by removing the containerPort requirement we would also support dynamic ports, but we are not there yet.

@howardjohn howardjohn commented Sep 13, 2019

Shipped in 1.3!

@howardjohn howardjohn closed this Sep 13, 2019