
accessing regular k8s services from istio mesh #506

Closed
prune998 opened this issue Jul 27, 2017 · 76 comments

Comments

@prune998
Contributor

While I would like to route my traffic between my applications (http, grpc, tcp) using Istio/Envoy service mesh, some applications also need to reach some core TCP services like Zookeeper or Kafka.
I would like to be able to reach those core services using the regular K8s service endpoints.

app -> envoy proxy -> k8s service (by DNS name)

As far as I've found, it does not seem possible to route traffic out of the mesh, except by using Istio egress, which is HTTP(S) only and is not meant to talk to k8s services.

Do you have any solution or plan for that?
Thanks.

@kyessenov
Contributor

If you don't use the auth feature, then you should be able to reach non-Istio pods from Istio pods, and vice versa, in the normal way.

@prune998
Contributor Author

prune998 commented Jul 27, 2017

@kyessenov should I be able to reach pods or services?
Actually I can reach neither...

~ # nslookup kafka-zk-broker-kafka.dev

Name:      kafka-zk-broker-kafka.dev
Address 1: 10.33.0.11 kafka-zk-kafka-0.kafka-zk-broker-kafka.dev.svc.cluster.local
Address 2: 10.38.96.16 kafka-zk-kafka-2.kafka-zk-broker-kafka.dev.svc.cluster.local
Address 3: 10.40.128.13 kafka-zk-kafka-1.kafka-zk-broker-kafka.dev.svc.cluster.local

~ # ping kafka-zk-broker-kafka.dev
PING kafka-zk-broker-kafka.dev (10.33.0.11): 56 data bytes
64 bytes from 10.33.0.11: seq=0 ttl=64 time=0.376 ms
64 bytes from 10.33.0.11: seq=1 ttl=64 time=0.348 ms

--- kafka-zk-broker-kafka.dev ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.348/0.362/0.376 ms

~ # telnet kafka-zk-broker-kafka.dev 9092
Connection closed by foreign host

Everything works if I don't deploy the pod with the Istio sidecar.

@kyessenov
Contributor

Pods can only be reached through service names in istio (we don't program all individual pod routes).

This is likely due to a namespacing issue (Istio only cares about the namespace it's deployed in, cc @andraxylia). I'd hope that if you deploy Istio in the "dev" namespace, it would work (at least we test for that case).

@prune998
Contributor Author

In fact, everything is deployed in the "dev" namespace in my test.
My Kafka service is a headless one, and the client tries to reach each pod individually, as the nslookup output suggests.
So, if Istio can't route to the pod's IP:

  1. I'm screwed
  2. I will have to use a ClusterIP service

Any chance of having Istio route to the pod's IP? I think my use case is quite common, especially when you have a "service" like Kafka or MongoDB with a rich client, where you want the client to know about all the existing server endpoints.
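
For reference, the difference between the two Service shapes being discussed is basically the clusterIP: None field. A minimal sketch; only the kafka-zk-broker-kafka name and port 9092 come from my setup above, the selector and the second service name are illustrative assumptions:

apiVersion: v1
kind: Service
metadata:
  name: kafka-zk-broker-kafka     # headless: DNS returns one A record per pod
spec:
  clusterIP: None
  selector:
    app: kafka-zk                 # assumed selector, for illustration only
  ports:
  - port: 9092
---
apiVersion: v1
kind: Service
metadata:
  name: kafka-zk-kafka-vip        # hypothetical ClusterIP variant: DNS returns a single virtual IP
spec:
  selector:
    app: kafka-zk
  ports:
  - port: 9092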

@kyessenov
Contributor

kyessenov commented Jul 27, 2017

Thanks for the suggestion, we'll consider adding explicit network endpoints for headless TCP services. We were trying to preserve the service abstraction and reduce configuration load, but a rich client trying to address endpoints directly is a legitimate use-case. This is even more true for headless services.

At some point, we would want Envoy to take over some of the rich-client functionality by adding a Kafka filter, delegating LB and other features from the rich client to Envoy. This would require a ClusterIP service. Would that make sense to you?

@kyessenov kyessenov self-assigned this Jul 27, 2017
@kyessenov
Contributor

cc @rshriram @louiscryan

@prune998
Contributor Author

@kyessenov I'm not sure it's a good idea to go the route of filters in Envoy. You would end up re-writing application client logic in filters for a whole bunch of applications (Kafka, ZK, Mongo, Cassandra...).

The cool thing about rich clients is that they take care of the connection/disconnection/rebalance logic. There is no point in going through another tool and gaining nothing.

My suggestion would be to enable Istio/Envoy to route traffic to headless services, maybe via a command-line flag like --includeHeadlessServices (similar to includeIPRanges), or simply by discovering the headless services in the current namespace and maintaining a route for them...
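
(For comparison, the existing includeIPRanges mechanism is applied at injection time. A rough sketch, assuming the --includeIPRanges flag that istioctl kube-inject exposed around this release, with a placeholder cluster CIDR; traffic to IPs outside the listed ranges bypasses Envoy entirely:)

# only intercept traffic destined for the cluster CIDR; everything else bypasses the sidecar
istioctl kube-inject --includeIPRanges=10.0.0.0/16 -f myapp.yaml | kubectl apply -f -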

Sadly, this discussion means I can't use Istio for now... at least with Envoy... How would it be with Linkerd?

@kyessenov
Contributor

We'll add an option to route directly to endpoints for headless services in the next release.

I'm not sure about the state of TCP load balancing for linkerd.

@prune998
Contributor Author

can't wait for the next release then !
Any thought on the release date ? Will be testing right away !

@rshriram
Member

It's not the proxy that's the issue. It's pilot. We don't configure Envoy or Linkerd with pod IPs due to the potentially large number of listener blocks or configs.

The fix for headless services would be a hack at most. It will face issues as pods are added to or removed from the headless service (if it's a StatefulSet there might be less churn). The sensible option is to have passthrough mode support implemented in Envoy and then add a generic TCP proxy listener in Envoy that matches traffic for the kube-internal subnet range (e.g. 10.0.x.x) and passes it through to the original destination and port. Then one would be able to talk to pods directly, irrespective of headless services or normal TCP services. We would probably even eliminate TCP proxy configuration completely.
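
(To make the passthrough idea concrete, here is a rough sketch in later Envoy YAML syntax, not what Pilot generated at the time: an ORIGINAL_DST cluster plus a plain TCP proxy listener, so the proxy forwards to whatever pod IP and port the client originally dialed. The port and names are placeholders:)

static_resources:
  listeners:
  - name: virtual_passthrough
    address:
      socket_address: { address: 0.0.0.0, port_value: 15001 }
    listener_filters:
    - name: envoy.filters.listener.original_dst      # recover the pre-iptables destination IP:port
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.listener.original_dst.v3.OriginalDst
    filter_chains:
    - filters:
      - name: envoy.filters.network.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: passthrough
          cluster: passthrough_cluster                # Pilot would restrict this to the kube-internal range
  clusters:
  - name: passthrough_cluster
    type: ORIGINAL_DST                                # forward to the original destination address
    lb_policy: CLUSTER_PROVIDED
    connect_timeout: 5s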

@rshriram
Member

Here is the issue in Envoy that is attempting to add this support. envoyproxy/envoy#1246

@vadimeisenbergibm
Contributor

@kyessenov @rshriram How about enabling external traffic for TCP? Then headless services can be defined as external services (they are external to Istio).

A related question - does Istio handle TCP traffic (non-HTTP/HTTPS) for headful services?

@rshriram
Member

rshriram commented Jul 28, 2017 via email

@kyessenov
Contributor

How would transparent tcp proxying work with mTLS?

@FuzzOli87

I opened an issue as well on the other repo. Just gathering them into one.

istio/old_issues_repo#37

@hollinwilkins

hollinwilkins commented Aug 7, 2017

This would be very useful. I would like to use Istio but currently cannot because I need my services within the mesh to be able to access DBs and other services that are non-Istiofied.

@wuzhihui1123

I have the same problem; I can't access a StatefulSet.

@ldemailly ldemailly added this to the Istio 0.2 milestone Aug 17, 2017
@ldemailly
Contributor

Same as istio/old_pilot_repo#1015
(not sure we have time to address the full solution, but the minimum is k8s API server access)

@rshriram
Member

@ldemailly those are not the same. This particular issue is about accessing headless services via pod IPs. It's a bug in Pilot. @ijsnellf is working on it.

@ldemailly
Contributor

AFAIK the bug is that we intercept everything, but as long as we fix it, that's great.

@sakshigoel12
Contributor

@wattli can you add the details you explained this morning to this bug?

@prune998
Contributor Author

Do you have any news on this?

@ldemailly
Contributor

It likely won't be fixed in the very first 0.2 release, but should be soon after, depending on your exact case (for instance, access to the k8s API server and HTTPS services should be the first to work).

@rshriram
Member

@prune998 we have support for headless services in master. If you are feeling a bit adventurous, we would appreciate some feedback if you could try out the istio.yaml from the istio/istio master branch. You need to make sure that you name the ports of your headless services, and that the port a headless service listens on does not collide with an istio-fied service port (e.g., both the headless service and an Istio service on port 80). You can find an example of a headless service in istio/pilot/test/integration/testdata/
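
(A minimal sketch of the shape that description implies; the selector and port name are illustrative, the key points being clusterIP: None, a named port, and a port number not already used by an istio-fied service:)

apiVersion: v1
kind: Service
metadata:
  name: kafka-zk-broker-kafka
spec:
  clusterIP: None            # headless
  selector:
    app: kafka-zk            # illustrative selector
  ports:
  - name: tcp-broker         # ports must be named
    port: 9092               # must not collide with a port used by an istio-fied service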

@prune998
Contributor Author

That's good news. Will try it today.
Thanks.

@prune998
Contributor Author

Building the whole stack from master seems to be a mess... I think I'll wait for a release (or a nightly?).

@prune998
Contributor Author

@rshriram do you have a pointer to the commit that added headless services support?
Thanks

@prune998
Contributor Author

prune998 commented Jan 4, 2018

Well, there must be some timeout somewhere... do you see a consistent timeout?
What if you connect to your pod (kubectl exec -ti <pod> sh) and run a Postgres command by hand, like psql?
If it's a configured timeout, it should always close the connection after a fixed time...
I'm not sure how connections to headless services are made... are you still going through the Envoy proxy, or is it just iptables magic? Anyone with the answer is welcome to comment :)
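
(Something along these lines, with placeholder names throughout, would take the client library out of the picture:)

# open an interactive psql session from inside the application pod
kubectl exec -ti <app-pod> -c <app-container> -- psql -h <db-host> -U <db-user> <db-name>
# let the session sit idle for longer than the suspected timeout, then run any query, e.g.:
#   select 1;
# if the server reports the connection was closed, the problem is not the library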

@rshriram
Member

rshriram commented Jan 4, 2018

It still goes through envoy.

@prune998
Contributor Author

prune998 commented Jan 4, 2018

So maybe check the Envoy /stats (on port 15000) and look at the various timeout counters; if it's Envoy, it should show up there...
If it's not... well, maybe it's the TCP stack?
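
(For example, from the sidecar, assuming curl is available in the proxy image; the grep pattern is only a starting point:)

kubectl exec -ti <app-pod> -c istio-proxy -- sh -c 'curl -s localhost:15000/stats' | grep -iE 'timeout|cx_destroy'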

@hollinwilkins

hollinwilkins commented Jan 4, 2018

@rshriram I only see this issue from istiofied pods (I have about 13 pods that are not istiofied, running the same stack, which don't see this connection issue). I have one test pod right now, running one of the istiofied services, to work through these connection issues. It could be some other middleware that is ultimately responsible, but I think Istio has something to do with it because of the setup I just described. I am on GKE, and I don't think they impose any limits like this within their VMs. Also, I am not 100% sure whether there is data going across the wire or not. I have a connection pool, and I run health checks against the DB fairly frequently (several times a minute). I saw this issue even with health checks running, but that could be because only one connection from the pool was being used by the health check, and when I go to do another query, a different connection gets pulled.

@prune998 The pod consistently times out if I let it sit without making any database calls. I will try the psql client and see if it has the same issue; that will rule out any library issues I may be running into. Also, I am running a new test without the connection expiration so I can check the stats endpoint from the istio-proxy container.

@rshriram @prune998 Thank you both for the help! Thinking maybe we should put this into a separate issue?

@rshriram
Member

rshriram commented Jan 4, 2018

Yes. Do you have Istio CA enabled, i.e. mTLS enabled? Can you try with Istio auth disabled and Istio CA not deployed? I have a feeling that we are recycling Envoy every 10-15 minutes to refresh certificates, and as part of the recycle, old connections are being terminated.

@hollinwilkins

@rshriram I didn't intentionally install Istio CA; I really don't want to add that layer yet. I used this install command: kubectl apply -f install/kubernetes/istio.yaml

However, I get this back when I run kubectl get po -n istio-system:

➜  istio kubectl get po -n istio-system
NAME                            READY     STATUS    RESTARTS   AGE
istio-ca-55b954ff7-mdgvz        1/1       Running   0          1d
istio-ingress-948b746cb-7nm75   1/1       Running   0          1d
istio-mixer-59cc756b48-n67mc    3/3       Running   0          1d
istio-pilot-55bb7f5d9d-7ss2l    2/2       Running   0          1d

Is that istio-ca doing what I think it may be doing? How do I disable it?

@hollinwilkins

@prune998 I just ran the test using the psql command line utility, and I got this error after waiting 30 minutes:

> select * from service_accounts limit 1;
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

I think it is safe to assume it is a connection issue and not a library issue.

@prune998
Contributor Author

prune998 commented Jan 4, 2018

@hollinwilkins try stopping (scaling to 0) the istio-ca pod (see the command sketch below).
I had an issue where Envoy was restarted to sync the CA every 30 mins. It's a bug which may not be resolved yet.
You can do this safely if you're not using mTLS.
Will link the bug once I'm at the airport, I'm on the bus right now :)
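
(Assuming the deployment is named istio-ca, as the pod listing earlier in the thread suggests, something like:)

kubectl scale deployment istio-ca -n istio-system --replicas=0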

@hollinwilkins

@prune998 Can I just delete the deployment for istio-ca or should I scale?

@prune998
Contributor Author

prune998 commented Jan 4, 2018

Delete it if you're sure you're not using it...

@hollinwilkins

@prune998 Hehe, gotcha. Also, there is too much information to sift through when I collect stats from istio-proxy. Is there a grep I can use to get you the useful stuff with regard to disconnects?

@ldemailly
Contributor

You don't need to touch the CA to turn mTLS on/off; you just need to (from the security FAQ):

kubectl edit configmap -n istio-system istio
comment or uncomment authPolicy: MUTUAL_TLS to toggle mTLS, and then

kubectl delete pods -n istio-system -l istio=pilot
to restart Pilot. After a few seconds (depending on your *RefreshDelay) your Envoy proxies will have picked up the change from Pilot. During that time your services may be unavailable.

@hollinwilkins

hollinwilkins commented Jan 4, 2018

@ldemailly I just edited the config, that line was already commented.

I did see a previous revision of the config had MUTUAL_TLS enabled, but that must have been from a long time ago.

# Uncomment the following line to enable mutual TLS between proxies
# authPolicy: MUTUAL_TLS

@hollinwilkins

Actually, reviewing this:

Annotations:  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","data":{"mesh":"# Uncomment the following line to enable mutual TLS between proxies\n# authPolicy: MUTUAL_TLS\n#\n# Edit this list t...

Coming from the config map, it looks like it was never enabled even in the previous revision.

@hollinwilkins

@rshriram @prune998 Disabling istio-ca worked! I am no longer getting the disconnects after waiting 30 minutes.

@prune998
Contributor Author

prune998 commented Jan 4, 2018

Well, the bug is still here then... I think this will change with the new gRPC API in Envoy...
I can't find the issue related to this. Maybe there is no issue and I got that from someone else's comment... can't remember. Glad you got it working finally!

@hollinwilkins

@prune998 Added a PR for troubleshooting in Istio documentation: istio/istio.io#835

@andraxylia andraxylia removed this from the Istio 0.3 milestone Jan 4, 2018
@andraxylia
Contributor

@hollinwilkins @prune998 thanks for your patience in troubleshooting this and for writing the troubleshooting guide.

It appears there is still a minor bug (with an easy workaround), since pilot-agent should not restart Envoy if encryption is disabled, even if istio-ca is present and certificates are refreshed. Until support for SDS (#2120) makes Envoy restarts completely unnecessary, we need to allow both encrypted and un-encrypted services to co-exist in the same cluster, and we need istio-ca in general. I opened #2427 so that istio-ca does not have to be disabled. We can close this issue.

@hollinwilkins

@prune998 Starting to see another issue with this. Not sure if it is related to headless services, but it seems istio has an effect here. After deploying a certain number of pods in a namespace, connections to headless services stop working for some reason.

I deploy 6 services injected with the Istio sidecar, and they connect to my database fine. When I deploy the 7th and 8th ones, they cannot connect. Deploying all 8 without Istio causes no issue connecting to the database.
