Limitations around TCP Services make Istio pretty unusable on larger multi-tenant clusters #9784
Just to elaborate even further on the TCP services and multi-tenancy issues: despite the conflict above, in this example Application A is attempting to connect to … and we're therefore in a situation where the connection to …
The root cause of the limitation is that for headless services, the connection is initiated to the IP of the endpoint. We can't create one listener for each endpoint - some TCP services may have a large number of replicas, and besides, the endpoints change very often. For normal services, we listen on and intercept the cluster IP - which is stable - and Istio EDS finds the endpoint IP. To solve this for TCP, the plan is to use 'on-demand' configuration: Envoy will detect requests going to destinations it doesn't yet have configuration for and fetch that configuration on demand.

There is a separate effort - which may land in 1.1 (or 1.1.x) - where we isolate the namespaces. When this is enabled, apps in namespace A will only automatically get configs for other services and endpoints in the same namespace. To access a service in any other namespace you would need to add an explicit configuration (https://docs.google.com/document/d/1x8LI3T7SHW-yDrrt3ryEr6iSs9ewMCt4kXtJpIaSSlI/edit?ts=5be2c288#heading=h.m6yvqjh71gxi), similar to egress.

For now, separate ports are the only option with 1.0.x. Namespace isolation appears to be a small change and may be backported - but the current target is 1.1.x.
Hey @costinm - thanks for the detailed response. I'm not sure if here or the doc is the right place to discuss it further. The proposal seems great for scaling reasons, but if I'm reading the doc right, I'm not sure scoping will actually solve the issue above if you're limited to just "public" and "private". Take the following example, where we have an app that needs to talk to Kafka and Zookeeper in different namespaces; to achieve that, both services would need to be "public":

- Namespace "MyAppThatTalksToKafkaAndSolr"
- Namespace "Kafka"
- Namespace "Solr"

Surely at that point both zookeeper.kafka and zookeeper.solr services will once again globally conflict?
I have a similar scenario, but my 'kafka' and 'solr' are not in the k8s cluster. Looking forward to the solution.
I have a similar scenario; any solution or suggestion for this issue? Thanks!
/cc
/cc More of this in the wild: a recent post on Reddit from a confused user.
Are there any updates on this?
@mkjoerg I think #13970 will help with this; you'll lose some metrics etc. around the connection, but who cares, if it works.
Sidecars can be used to limit which services a workload can reach on egress.
@hzxuzhonghu If I read it right, using a Sidecar means we'd have to list every service we want to connect to?
Yes, you need to specify all the services you want to connect to; we can still set namespace-scoped services.
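For illustration, here is a minimal sketch of what such a Sidecar resource could look like; the consumer namespace and host names below are invented for this example:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Sidecar
metadata:
  name: default
  namespace: my-app          # hypothetical consumer namespace
spec:
  egress:
  - hosts:
    - "./*"                                        # everything in the local namespace
    - "kafka/zookeeper.kafka.svc.cluster.local"    # explicit cross-namespace imports
    - "solr/zookeeper.solr.svc.cluster.local"
```

With a Sidecar like this in each namespace, every proxy only receives configuration for the services it actually needs, at the cost of maintaining the list.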
@hzxuzhonghu This is making Istio even more complicated to use...
@hzxuzhonghu Is there any reason why Istio chose to group all the clusters by port? Currently, finding the route goes port -> RDS (cluster) -> endpoint. Why not directly match the cluster, like the inbound listeners do?
This is determined by Envoy. Personally I think for outbound there may be many clusters on the same port, but for inbound I think there can be only one, so outbound traffic is more complicated. I am not sure if this is right, haha.
As I mentioned earlier in the thread, the fundamental problem is how to route to stateful sets. We need to intercept the traffic somehow. The client app gets the IP of a specific instance and initiates a connection; in Envoy we need to know which service the IP belongs to, so we can apply routes/policy/telemetry. Distributing all the IPs of a stateful set and creating a listener for each is too expensive (memory usage, scale). Doing an on-demand lookup is an option, but the implementation is tricky and moving slowly; we are starting with on-demand RDS, which won't be ready in 1.3 and is now focused on scaling to a large number of vhosts.

For HTTP we have some more options - but in the TCP case the only option I currently know of is using the port. The port does not have to be unique per mesh - but we must have a way for a particular Envoy to map a port to a service name. The Sidecar API allows some options on what to import, so if you have namespaceA using port 8000 for serviceA, and namespaceB using port 8000 for serviceB, a client could import either A or B and will get the corresponding service. The 'whitebox' mode for Sidecar also allows more customization: you can choose the local ports using Sidecar.egress.port. HTTP and VIPs are easy and don't depend as much on the port - and if anyone has a viable proposal, please share it.
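As a sketch of the whitebox option mentioned above (the names and port number are invented for illustration), a Sidecar egress listener can pin a local port to a single imported service, so the same port can map to different services in different client namespaces:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Sidecar
metadata:
  name: default
  namespace: namespace-a     # hypothetical client namespace
spec:
  egress:
  - port:
      number: 8000           # local port the client app dials
      protocol: TCP
      name: tcp-egress
    hosts:
    - "namespace-a/service-a.namespace-a.svc.cluster.local"   # port 8000 maps to this service here
```

A Sidecar in another namespace could bind the same port 8000 to a different host, which is how the port stops being a mesh-wide concern.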
This doesn't seem like a problem. As for statefulsets, the common replica count is around 3, 5, or 7. This figure cannot be too big, as the cost of keeping them consistent is very high.
I think this issue should be prioritized, because it makes usage of headless services and StatefulSets very unscalable and error-prone. Here is my scenario, which is pretty much the same as the one @Stono mentioned in this thread:
Pilot errors
Conclusion / workaround: maybe we should think about at least a somewhat more scalable workaround for this.
@Demonian Your problem is that Istio does not support using the same port for both HTTP and TCP. Edit: the cause is a headless service with a TCP port.
@hzxuzhonghu It doesn't support this only when there is a headless service with TCP, because a normal TCP service will be mapped to its cluster IP.
Yes, you are right: for a headless service with multiple instances, the TCP port will conflict.
Our current approach is to add the following annotation to all the headless services:
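The annotation itself was lost from the quoted comment; assuming it is the standard Istio per-service visibility annotation, it would look something like this (`.` restricts the service to its own namespace):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: zookeeper              # hypothetical headless service
  namespace: kafka
  annotations:
    networking.istio.io/exportTo: "."   # assumed annotation: visible only within this namespace
spec:
  clusterIP: None              # headless
  ports:
  - name: tcp-client
    port: 2181
```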
Yes, but this means that every statefulset should be in a separate namespace, which is not super convenient in our case.
I think I commented already, but to be clear:
From the original motivation of the issue, this has been fixed by the protocol sniffing functionality introduced by @yxue in the current 1.3 release.
This is still an issue, and #16242 attempted to take an approach that should solve it for the most common case (99% of usage), where you typically have fewer than 5-7 pods for a headless service like etcd/elasticsearch/postgres/cassandra etc. Such headless services do not use autoscaling and are most often statefulsets. However, there was a lot of opposition to that PR, as it was merely buying time by postponing the occurrence of the problem until one had more than 8 pods for a headless service. This is the only concern that I feel is valid.

Concerns about scalability and memory use are overinflated. It is no different from a scenario where the user creates 7-8 service entries for the headless service manually, with the pod IPs [this is very bad UX btw]. And unless we hit a situation where an end user has 100 headless services on a cluster where each headless service has 5 pods, I would not be worried about scalability.

That said, I for sure do not want to make this a custom solution that only applies when there are two imports on the same port, or to the so-called "narrow set of apps that have this issue". Asking end users to learn Sidecar is not an answer. The system should just work out of the box; Sidecar is an optimization, not a panacea to fix problems. Secondly, there is nothing preventing end users from launching two headless services in the same namespace, where each one could be offering something on the same port (like a JMX metrics endpoint, for example).
We discussed this issue in the networking WG meeting, and there are some scale implications. @howardjohn will pick up a PR from @rshriram and do some testing so we can understand up to what scale we can enable listeners per IP:port. @Stono and others who thumbed up the issue: meanwhile, it would be good to understand what your scale is: how many services, how many headless services, how many pods per headless service, whether you use auto-scaling, and whether you use or intend to use the Sidecar API.
Hola! We also make heavy use of Sidecar; effectively we default to namespace-level isolation and expand out from there, whereas when I originally wrote this issue we didn't have that, and it was a cluster-wide problem (e.g. separate teams could deploy into separate namespaces but break each other's workloads). In terms of scale, we have 6 Kafka brokers right now and will probably scale to around 10 based on current estimates. No autoscaling.
Hey - we're also using stateful sets pretty extensively. Our main use is various Akka clusters (our own apps). Our internal platform allocates unique StatefulSet ports to avoid collisions - but we can't always do that with 3rd-party stuff (e.g. Kafka). We do use autoscaling in our own apps, and have no Sidecar API usage yet.
#16845 landed in master, which may address some of these issues. What we still need to do is add testing for this, as well as measure the performance impact.
CPU usage: https://snapshot.raintank.io/dashboard/snapshot/osN1j81pNaAMf9HAtnK6Er8cFhDLDubb?panelId=6&fullscreen&orgId=2

The test case is very much a worst case -- a headless service scaling between 0 and 10 replicas every 3 min. From a CPU profile it seems like a lot of time is spent on EnvoyFilter processing, since I have a few; I will retry with those off.

Edit: CPU drops to ~2.2 cores without the EnvoyFilter, vs 3.2 cores with it. So the change uses roughly 60% more CPU in some bad cases.
This is believed to be fixed in 1.4, but I haven't had a chance to verify all of the scenarios in the original issue. If someone has tried these, please let us know how it functions. We do have some testing for this as well, but it can't match the diversity of real kafka/zookeeper/etc. deployments.
I am going to go with the optimistic approach that this is fixed in 1.4 and hope no one proves me wrong. If you do see issues though, please let us know.
@howardjohn Hello. I did a deep dive into the configurations in 1.4.x. It looks like adding listeners bound to the individual pod IP:port pairs resolves the conflicts in some of the scenarios. However, it did not resolve cluster-wide conflicts between TCP headless services. I guess this is pretty much impossible to solve considering the factors mentioned in the comments above. The only way I can think of is to limit the visibility of each headless service (using something like the Sidecar resource or exportTo). I am more than happy to be corrected if I am wrong. Thanks in advance.
Is this problem fixed or not? We plan to deploy tens of ZooKeeper, Solr, and Kafka clusters to the same Kubernetes cluster with Istio installed; is this going to break?
We are the opposite of @AceHack: we already have Zookeeper and Kafka, and this issue makes us fearful of adding Istio.
@Fryuni and @AceHack I had a chat with @howardjohn, who fixed this. We think this really is fixed, but we haven't tested with Zookeeper or Kafka and don't have the bandwidth. So there's a testing gap there, but if you try it and find any issues, report them here and we'll fix them.
I may have run into this issue in Istio 1.5.1. I have two Redis deployments in two different namespaces, A and B. The sidecar proxy of an app in namespace B, which should be connecting to Redis in namespace B, is reporting outgoing connections to the Redis service in namespace A. Extract from the logs of the app in namespace B (namespace A is external-auth-server):
The eas-redis-ha.external-auth-server service is a headless service.
@cameronbraid Could you show your services YAML and what you expect?
The app is running in namespace drivenow-staging-z, using the following service to access redis-sentinel: redis-sentinel.drivenow-staging-z
eas-redis-ha.external-auth-server
Hi,
I've talked about this a few times now in different forums but never really collated it all together in one place. We really need to find a solution, otherwise we may well have to pull Istio out altogether.
In summary, these limitations make Istio pretty unusable for us. They're all based around the fact that TCP services are effectively a cluster-wide concern, because their listeners bind to 0.0.0.0:port:
To put this into a practical example so you can see just how painful this is, we have two namespaces that both have Zookeeper in them. One is for Kafka, and one is for Solr.
Neither of these namespaces uses istio/istio-proxy (because they need to talk to each node directly by FQDN hostname or IP, which we know Istio doesn't do). However, applications in other namespaces, which are on the mesh, need to be able to talk to both of these.
So you have a situation where an application in the mesh needs to be able to talk to two different sets of Zookeepers on a cluster (perfectly reasonable, different zookeeper clusters for different concerns).
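To make the collision concrete, here is a sketch of the two headless services involved; the actual manifests weren't included in the issue, and 2181 is simply the conventional ZooKeeper client port:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: zookeeper
  namespace: kafka
spec:
  clusterIP: None        # headless: clients connect to pod IPs directly
  ports:
  - name: tcp-client
    port: 2181
---
apiVersion: v1
kind: Service
metadata:
  name: zookeeper
  namespace: solr
spec:
  clusterIP: None        # same TCP port in a different namespace
  ports:
  - name: tcp-client
    port: 2181
```

Because both are plain TCP and headless, each sidecar ends up needing a 0.0.0.0:2181 listener for both of them, and the configurations collide.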
Just to throw some more into the mix, the solr-headless service, which Application A needs to talk to, happens to be on port 8080. All of the above is actually impossible (as far as I can see) with Istio, as you can see from the discovery logs:
The only solution I can see here is to have all headless services on different TCP ports, which is completely unmanageable at any sort of scale (we have 100s of applications on a single cluster, each managed by different teams; they should not need to coordinate to make sure ports don't conflict with HTTP or TCP ports defined more broadly on the mesh).
Semi-related issues:
And probably more...