
accessing regular k8s services from istio mesh #506

Closed
prune998 opened this issue Jul 27, 2017 · 76 comments

Comments

@prune998
Contributor

While I would like to route my traffic between my applications (http, grpc, tcp) using Istio/Envoy service mesh, some applications also need to reach some core TCP services like Zookeeper or Kafka.
I would like to be able to reach those core services using the regular K8s service endpoints.

app -> envoy proxy -> k8s service (by DNS name)

As far as I've found, it does not seem possible to route traffic out of the mesh, except by using Istio egress, which is HTTP(S) only and is not meant to talk to k8s services.

Do you have any solution or plan for that?
Thanks.

@kyessenov
Contributor

If you don't use the auth feature, then you should be able to reach non-Istio pods from Istio pods, and vice versa, in the normal way.

@prune998
Contributor Author

prune998 commented Jul 27, 2017

@kyessenov should I be able to reach pods or services?
Actually I can reach neither...

~ # nslookup kafka-zk-broker-kafka.dev

Name:      kafka-zk-broker-kafka.dev
Address 1: 10.33.0.11 kafka-zk-kafka-0.kafka-zk-broker-kafka.dev.svc.cluster.local
Address 2: 10.38.96.16 kafka-zk-kafka-2.kafka-zk-broker-kafka.dev.svc.cluster.local
Address 3: 10.40.128.13 kafka-zk-kafka-1.kafka-zk-broker-kafka.dev.svc.cluster.local

~ # ping kafka-zk-broker-kafka.dev
PING kafka-zk-broker-kafka.dev (10.33.0.11): 56 data bytes
64 bytes from 10.33.0.11: seq=0 ttl=64 time=0.376 ms
64 bytes from 10.33.0.11: seq=1 ttl=64 time=0.348 ms

--- kafka-zk-broker-kafka.dev ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.348/0.362/0.376 ms

~ # telnet kafka-zk-broker-kafka.dev 9092
Connection closed by foreign host

Everything works if I don't deploy the pod with the Istio sidecar.

@kyessenov
Contributor

Pods can only be reached through service names in istio (we don't program all individual pod routes).

This is likely due to a namespacing issue (Istio only cares about the namespace it's deployed in, cc @andraxylia). I'd hope that if you deploy Istio in the "dev" namespace, it would work (at least we test for that case).

@prune998
Contributor Author

In fact, everything is deployed in the "dev" namespace in my test.
My Kafka service is a headless one, and the client tries to reach each pod individually, as the nslookup output suggests.
So, if Istio can't route to the pod's IP:

  1. I'm screwed
  2. I will have to use a ClusterIP service

Any chance of having Istio route to the pod's IP? I think my use case is quite common, especially when you have a "service" like Kafka or MongoDB with a rich client, where you want the client to know about all the existing server endpoints.
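
For reference, the difference between the two Service shapes being discussed is basically the clusterIP: None field. A minimal sketch; only the kafka-zk-broker-kafka name and port 9092 come from my setup above, the selector and the second service name are illustrative assumptions:

apiVersion: v1
kind: Service
metadata:
  name: kafka-zk-broker-kafka     # headless: DNS returns one A record per pod
spec:
  clusterIP: None
  selector:
    app: kafka-zk                 # assumed selector, for illustration only
  ports:
  - port: 9092
---
apiVersion: v1
kind: Service
metadata:
  name: kafka-zk-kafka-vip        # hypothetical ClusterIP variant: DNS returns a single virtual IP
spec:
  selector:
    app: kafka-zk
  ports:
  - port: 9092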

@kyessenov
Contributor

kyessenov commented Jul 27, 2017

Thanks for the suggestion, we'll consider adding explicit network endpoints for headless TCP services. We were trying to preserve the service abstraction and reduce configuration load, but a rich client trying to address endpoints directly is a legitimate use-case. This is even more true for headless services.

At some point, we would want Envoy to take over some of the rich-client functionality by adding a Kafka filter, delegating LB and other features from the rich client to Envoy. This would require a ClusterIP service. Would that make sense to you?

@kyessenov kyessenov self-assigned this Jul 27, 2017
@kyessenov
Contributor

cc @rshriram @louiscryan

@prune998
Contributor Author

@kyessenov I'm not sure it's a good idea to go the route of filters in Envoy. You would end up re-writing application client logic in filters for a whole bunch of applications (Kafka, ZK, Mongo, Cassandra...).

The cool thing about rich clients is that they take care of the connection/disconnection/rebalance logic. There is no point in going through another tool and gaining nothing.

My suggestion would be to enable Istio/Envoy to route traffic to headless services, maybe via a command-line flag like --includeHeadlessServices (similar to includeIPRanges), or simply by discovering the headless services in the current namespace and maintaining a route for them...
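
(For comparison, the existing includeIPRanges mechanism is applied at injection time. A rough sketch, assuming the --includeIPRanges flag that istioctl kube-inject exposed around this release, with a placeholder cluster CIDR; traffic to IPs outside the listed ranges bypasses Envoy entirely:)

# only intercept traffic destined for the cluster CIDR; everything else bypasses the sidecar
istioctl kube-inject --includeIPRanges=10.0.0.0/16 -f myapp.yaml | kubectl apply -f -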

Sadly, this discussion means I can't use Istio for now... at least with Envoy... How would it be with Linkerd?

@kyessenov
Contributor

We'll add an option to route directly to endpoints for headless services in the next release.

I'm not sure about the state of TCP load balancing for linkerd.

@prune998
Contributor Author

can't wait for the next release then !
Any thought on the release date ? Will be testing right away !

@rshriram
Member

It's not the proxy that's the issue. It's pilot. We don't configure Envoy or Linkerd with pod IPs due to the potentially large number of listener blocks or configs.

The fix for headless services would be a hack at most. It will face issues as pods are added to or removed from the headless service (if it's a StatefulSet there might be less churn). The sensible option is to have passthrough mode support implemented in Envoy and then add a generic TCP proxy listener in Envoy that matches traffic for the kube-internal subnet range (e.g. 10.0.x.x) and passes it through to the original destination and port. Then one would be able to talk to pods directly, irrespective of headless services or normal TCP services. We would probably even eliminate TCP proxy configuration completely.
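
(To make the passthrough idea concrete, here is a rough sketch in later Envoy YAML syntax, not what Pilot generated at the time: an ORIGINAL_DST cluster plus a plain TCP proxy listener, so the proxy forwards to whatever pod IP and port the client originally dialed. The port and names are placeholders:)

static_resources:
  listeners:
  - name: virtual_passthrough
    address:
      socket_address: { address: 0.0.0.0, port_value: 15001 }
    listener_filters:
    - name: envoy.filters.listener.original_dst      # recover the pre-iptables destination IP:port
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.listener.original_dst.v3.OriginalDst
    filter_chains:
    - filters:
      - name: envoy.filters.network.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: passthrough
          cluster: passthrough_cluster                # Pilot would restrict this to the kube-internal range
  clusters:
  - name: passthrough_cluster
    type: ORIGINAL_DST                                # forward to the original destination address
    lb_policy: CLUSTER_PROVIDED
    connect_timeout: 5s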

@rshriram
Member

Here is the issue in Envoy that is attempting to add this support. envoyproxy/envoy#1246

@vadimeisenbergibm
Contributor

@kyessenov @rshriram How about enabling external traffic for TCP? Then headless services can be defined as external services (they are external to Istio).

A related question - does Istio handle TCP traffic (non-HTTP/HTTPS) for headful services?

@rshriram
Member

rshriram commented Jul 28, 2017 via email

@kyessenov
Contributor

How would transparent tcp proxying work with mTLS?

@FuzzOli87

I opened an issue as well on the other repo. Just gathering them into one.

istio/old_issues_repo#37

@hollinwilkins

hollinwilkins commented Aug 7, 2017

This would be very useful. I would like to use Istio but currently cannot because I need my services within the mesh to be able to access DBs and other services that are non-Istiofied.

@wuzhihui1123

I have the same problem; I can't access a StatefulSet.

@ldemailly ldemailly added this to the Istio 0.2 milestone Aug 17, 2017
@ldemailly
Contributor

Same as istio/old_pilot_repo#1015
(not sure we have time to address the full solution, but the minimum is k8s API server access)

@rshriram
Member

@ldemailly those are not the same. This particular issue is about accessing headless services via pod IPs. It's a bug in Pilot. @ijsnellf is working on it.

@ldemailly
Contributor

AFAIK the bug is that we intercept everything, but as long as we fix it, that's great.

@sakshigoel12
Contributor

@wattli can you add the details you explained this morning to this bug?

@prune998
Contributor Author

Do you have any news on this?

@ldemailly
Contributor

It likely won't be fixed in the very first 0.2 release, but should be soon after, depending on your exact case (for instance, access to the k8s API server and HTTPS services should be the first to work).

@rshriram
Member

@prune998 we have support for headless services in master. If you are feeling a bit adventurous, we would appreciate some feedback if you could try out the istio.yaml from the istio/istio master branch. You need to make sure that you name the ports of your headless services, and that the port a headless service listens on does not collide with an istio-fied service port (e.g., both the headless service and an Istio service on port 80). You can find an example of a headless service in istio/pilot/test/integration/testdata/
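
(A minimal sketch of the shape that description implies; the selector and port name are illustrative, the key points being clusterIP: None, a named port, and a port number not already used by an istio-fied service:)

apiVersion: v1
kind: Service
metadata:
  name: kafka-zk-broker-kafka
spec:
  clusterIP: None            # headless
  selector:
    app: kafka-zk            # illustrative selector
  ports:
  - name: tcp-broker         # ports must be named
    port: 9092               # must not collide with a port used by an istio-fied service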

@prune998
Contributor Author

That's good news. Will try it today.
Thanks.

@prune998
Contributor Author

Building the whole stack from master seems to be a mess... I think I'll wait for a release (or a nightly?).

@prune998
Contributor Author

@rshriram do you have a pointer to the commit that added headless services support?
Thanks

@prune998
Contributor Author

prune998 commented Jan 4, 2018

Well, there must be some timeout somewhere... do you see a consistent timeout?
What if you connect to your pod (kubectl exec -ti <pod> sh) and run a Postgres command by hand, like psql?
If it's a configured timeout, it should always close the connection after a fixed time...
I'm not sure how connections to headless services are made... are you still going through the Envoy proxy, or is it just iptables magic? Anyone with the answer is welcome to comment :)
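
(Something along these lines, with placeholder names throughout, would take the client library out of the picture:)

# open an interactive psql session from inside the application pod
kubectl exec -ti <app-pod> -c <app-container> -- psql -h <db-host> -U <db-user> <db-name>
# let the session sit idle for longer than the suspected timeout, then run any query, e.g.:
#   select 1;
# if the server reports the connection was closed, the problem is not the library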

@rshriram
Member

rshriram commented Jan 4, 2018

It still goes through envoy.

@prune998
Contributor Author

prune998 commented Jan 4, 2018

So maybe check the Envoy /stats (on port 15000) and look at the various timeout counters; if it's Envoy, it should show up there...
If it's not... well, maybe it's the TCP stack?
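
(For example, from the sidecar, assuming curl is available in the proxy image; the grep pattern is only a starting point:)

kubectl exec -ti <app-pod> -c istio-proxy -- sh -c 'curl -s localhost:15000/stats' | grep -iE 'timeout|cx_destroy'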

@hollinwilkins

hollinwilkins commented Jan 4, 2018

@rshriram I only see this issue from istiofied pods (I have about 13 pods that are not istiofied, running the same stack, which don't see this connection issue). I have one test pod right now, running one of the istiofied services, to work through these connection issues. It could be some other middleware that is ultimately responsible, but I think Istio has something to do with it because of the setup I just described. I am on GKE, and I don't think they impose any limits like this within their VMs. Also, I am not 100% sure whether there is data going across the wire or not. I have a connection pool, and I run health checks against the DB fairly frequently (several times a minute). I saw this issue even with health checks running, but that could be because only one connection from the pool was being used by the health check, and when I go to do another query, a different connection gets pulled.

@prune998 The pod consistently times out if I let it sit without making any database calls. I will try the psql client and see if it has the same issue; that will rule out any library issues I may be running into. Also, I am running a new test without the connection expiration so I can check the stats endpoint from the istio-proxy container.

@rshriram @prune998 Thank you both for the help! Thinking maybe we should put this into a separate issue?

@rshriram
Member

rshriram commented Jan 4, 2018

Yes. Do you have Istio CA enabled, i.e. mTLS enabled? Can you try with Istio auth disabled and Istio CA not deployed? I have a feeling that we are recycling Envoy every 10-15 minutes to refresh certificates, and as part of the recycle, old connections are being terminated.

@hollinwilkins

@rshriram I didn't intentionally install Istio CA; I really don't want to add that layer yet. I used this install command: kubectl apply -f install/kubernetes/istio.yaml

However, I get this back when I run kubectl get po -n istio-system:

➜  istio kubectl get po -n istio-system
NAME                            READY     STATUS    RESTARTS   AGE
istio-ca-55b954ff7-mdgvz        1/1       Running   0          1d
istio-ingress-948b746cb-7nm75   1/1       Running   0          1d
istio-mixer-59cc756b48-n67mc    3/3       Running   0          1d
istio-pilot-55bb7f5d9d-7ss2l    2/2       Running   0          1d

Is that istio-ca doing what I think it may be doing? How do I disable it?

@hollinwilkins

@prune998 I just ran the test using the psql command line utility, and I got this error after waiting 30 minutes:

> select * from service_accounts limit 1;
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

I think it is safe to assume it is a connection issue and not a library issue.

@prune998
Contributor Author

prune998 commented Jan 4, 2018

@hollinwilkins try stopping (scaling to 0) the istio-ca pod (see the command sketch below).
I had an issue where Envoy was restarted to sync the CA every 30 mins. It's a bug which may not be resolved yet.
You can do this safely if you're not using mTLS.
Will link the bug once I'm at the airport, I'm on the bus right now :)
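
(Assuming the deployment is named istio-ca, as the pod listing earlier in the thread suggests, something like:)

kubectl scale deployment istio-ca -n istio-system --replicas=0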

@hollinwilkins

@prune998 Can I just delete the deployment for istio-ca or should I scale?

@prune998
Contributor Author

prune998 commented Jan 4, 2018

Delete it if you're sure you're not using it...

@hollinwilkins

@prune998 Hehe, gotcha. Also, there is too much information to sift through when I collect stats from istio-proxy. Is there a grep I can use to get you the useful stuff with regard to disconnects?

@ldemailly
Contributor

You don't need to touch the CA to turn mTLS on/off; you just need to (from the security FAQ):

kubectl edit configmap -n istio-system istio
comment or uncomment authPolicy: MUTUAL_TLS to toggle mTLS, and then

kubectl delete pods -n istio-system -l istio=pilot
to restart Pilot. After a few seconds (depending on your *RefreshDelay) your Envoy proxies will have picked up the change from Pilot. During that time your services may be unavailable.

@hollinwilkins

hollinwilkins commented Jan 4, 2018

@ldemailly I just edited the config, that line was already commented.

I did see a previous revision of the config had MUTUAL_TLS enabled, but that must have been from a long time ago.

# Uncomment the following line to enable mutual TLS between proxies
# authPolicy: MUTUAL_TLS

@hollinwilkins

Actually, reviewing this:

Annotations:  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","data":{"mesh":"# Uncomment the following line to enable mutual TLS between proxies\n# authPolicy: MUTUAL_TLS\n#\n# Edit this list t...

Coming from the config map, it looks like it was never enabled even in the previous revision.

@hollinwilkins

@rshriram @prune998 Disabling istio-ca worked! I am no longer getting the disconnects after waiting 30 minutes.

@prune998
Contributor Author

prune998 commented Jan 4, 2018

Well, the bug is still here then... I think this will change with the new gRPC API in Envoy...
I can't find the issue related to this. Maybe there is no issue and I got that from someone else's comment... can't remember. Glad you got it working finally!

@hollinwilkins

@prune998 Added a PR for troubleshooting in Istio documentation: istio/istio.io#835

@andraxylia andraxylia removed this from the Istio 0.3 milestone Jan 4, 2018
@andraxylia
Contributor

@hollinwilkins @prune998 thanks for your patience in troubleshooting this and for writing the troubleshooting guide.

It appears there is still a minor bug (with an easy workaround), since pilot-agent should not restart Envoy if encryption is disabled, even if istio-ca is present and certificates are refreshed. Until support for SDS (#2120) makes Envoy restarts completely unnecessary, we need to allow both encrypted and un-encrypted services to co-exist in the same cluster, and we need istio-ca in general. I opened #2427 so that istio-ca does not have to be disabled. We can close this issue.

@hollinwilkins

@prune998 Starting to see another issue with this. Not sure if it is related to headless services, but it seems istio has an effect here. After deploying a certain number of pods in a namespace, connections to headless services stop working for some reason.

I deploy 6 services injected with the Istio sidecar, and they connect to my database fine. When I deploy the 7th and 8th ones, they cannot connect. Deploying all 8 without Istio causes no issue connecting to the database.
