
Exposing headless service with statefulset (in multi primary setup) is not resolvable across clusters. #31787

Closed
Tracked by #38
irajdeep opened this issue Mar 30, 2021 · 14 comments
Labels
area/networking, area/user experience, feature/Multi-cluster, kind/docs, lifecycle/automatically-closed, lifecycle/stale

Comments

irajdeep commented Mar 30, 2021

Bug description
I have a multi-primary cluster setup on the same network, based on these instructions. Cross-cluster connectivity works as expected, as described in the verification docs. However, a headless service (backed by a StatefulSet) exposed in cluster2 cannot be resolved from cluster1 (using the pod FQDN).

[x] Docs
[ ] Installation
[x] Networking
[ ] Performance and Scalability
[ ] Extensions and Telemetry
[ ] Security
[ ] Test and Release
[x] User Experience
[ ] Developer Infrastructure
[ ] Upgrade

Expected behavior
The headless service deployed in cluster2 should be resolvable from cluster1.

Steps to reproduce the bug
i. Follow the multi-primary installation docs to install Istio in a multi-cluster setup.
ii. Verify the setup is working as per the verification docs.
iii. Create a headless service backed by a StatefulSet in cluster2:

apiVersion: v1
kind: Service
metadata:
  name: helloworld
  labels:
    app: helloworld
    service: helloworld
spec:
  clusterIP: None
  ports:
  - port: 5000
    name: http
  selector:
    app: helloworld
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: helloworld
spec:
  serviceName: "helloworld"
  replicas: 2
  selector:
    matchLabels:
      app: helloworld
  template:
    metadata:
      labels:
        app: helloworld
    spec:
      containers:
      - name: helloworld
        image: docker.io/istio/examples-helloworld-v2
        resources:
          requests:
            cpu: "100m"
        imagePullPolicy: IfNotPresent #Always
        ports:
        - containerPort: 5000

The Service object was also created in cluster1 (not sure if that's necessary for this use case).
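For reference, these manifests were applied roughly as follows; the kubectl context names and file names below are placeholders in the style of the multi-primary install docs, not copied from this setup:

# CTX_CLUSTER1/CTX_CLUSTER2 and the file names are assumed placeholders
kubectl apply --context="${CTX_CLUSTER2}" -n test -f helloworld-statefulset.yaml
# Only the headless Service portion was also applied to cluster1
kubectl apply --context="${CTX_CLUSTER1}" -n test -f helloworld-service.yaml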

iv. Verify the pod FQDN is resolvable from within cluster2: curl helloworld-0.helloworld.test.svc.cluster.local:5000/hello works as expected.
v. Create ServiceEntry objects corresponding to each of the StatefulSet pods in the test namespace in cluster1, to make the pod FQDNs resolvable from cluster1, with the following specs:

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: helloworld-0
spec:
  hosts:
  - "helloworld-0.helloworld.test.svc.cluster.local"
  location: MESH_INTERNAL
  ports:
  - number: 5000
    name: http
    protocol: TCP
  resolution: DNS

and

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: helloworld-1
spec:
  hosts:
  - "helloworld-1.helloworld.test.svc.cluster.local"
  location: MESH_INTERNAL
  ports:
  - number: 5000
    name: http
    protocol: TCP
  resolution: DNS

vi. Exec into a pod in cluster1 (in the test namespace) and run curl helloworld-0.helloworld.test.svc.cluster.local:5000/hello -- this returns Could not resolve host: helloworld-0.helloworld.test.svc.cluster.local.

Version (include the output of istioctl version --remote and kubectl version --short and helm version --short if you used Helm)

client version: 1.9.1
control plane version: 1.9.1
data plane version: 1.9.1 (5 proxies)

How was Istio installed?
https://istio.io/latest/docs/setup/install/multicluster/multi-primary/

Environment where the bug was observed (cloud vendor, OS, etc)
Deployed on AWS using KOPS.

Note: the ServiceEntry objects were created based on the discussion here: #7495
Additionally, please consider running istioctl bug-report and attach the generated cluster-state tarball to this issue.
Refer to the attached cluster state archive for more details.

@irajdeep (Author)

@howardjohn would appreciate any input on this

@howardjohn (Member)

A ServiceEntry is not DNS-resolvable by default without https://istio.io/latest/docs/ops/configuration/traffic-management/dns-proxy/
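For reference, the linked DNS proxying guide enables this through proxy metadata in the mesh config; a minimal IstioOperator sketch along the lines of that guide (applied in each cluster via istioctl install -f) is:

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      proxyMetadata:
        # Enable the sidecar DNS proxy so ServiceEntry hosts become resolvable
        ISTIO_META_DNS_CAPTURE: "true"
        # Optional per the guide: auto-allocate VIPs for ServiceEntries without addresses
        ISTIO_META_DNS_AUTO_ALLOCATE: "true"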

irajdeep (Author) commented Mar 31, 2021

Thanks for the pointer @howardjohn. I enabled the DNS proxy in both clusters and verified that it works with this step.

I then created a ServiceEntry in cluster1 to make the StatefulSet pod (exposed via the headless service) in cluster2 reachable from cluster1:

ServiceEntry spec:

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: helloworld-0
spec:
  hosts:
  - "helloworld-0.helloworld.test.svc.cluster.local"
  location: MESH_INTERNAL
  ports:
  - number: 5000
    name: http
    protocol: TCP
  resolution: DNS

This time, running curl from a pod (the sleep pod from the verification sample) in the same namespace in cluster1, I get this error:

curl -v helloworld-0.helloworld.test.svc.cluster.local:5000/hello
*   Trying 240.240.0.1:5000...
* Connected to helloworld-0.helloworld.test.svc.cluster.local (240.240.0.1) port 5000 (#0)
> GET /hello HTTP/1.1
> Host: helloworld-0.helloworld.test.svc.cluster.local:5000
> User-Agent: curl/7.75.0-DEV
> Accept: */*
>
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer

Any pointers on what could be going wrong here?
Btw:

  • I have created the same ServiceEntry in both clusters (although I am not sure if it's needed in cluster2, where the actual headless service is created).
  • I also created the headless service in both clusters (I am not sure if it makes sense to create it in cluster1, where the ServiceEntry object exists).

irajdeep (Author) commented Mar 31, 2021

  • Running istioctl proxy-status | grep test from cluster2 (where the StatefulSet and headless service are created), I get the following output:
istioctl proxy-status | grep test
helloworld-0.test                                     SYNCED     SYNCED     SYNCED     SYNCED       istiod-6754bd759d-qmvpl     1.9.1
helloworld-1.test                                     SYNCED     SYNCED     SYNCED     SYNCED       istiod-6754bd759d-qmvpl     1.9.1
sleep-64d7d56698-x5lnd.test                           SYNCED     SYNCED     SYNCED     SYNCED       istiod-6754bd759d-qmvpl     1.9.1
  • Fetching the listeners and clusters for the pod in cluster1 from which I am running curl (an endpoint check for the resulting STRICT_DNS cluster is sketched after the output):
istioctl proxy-config listeners sleep-64d7d56698-lsrw4.test | grep test
240.240.0.1   5000  ALL                                           Cluster: outbound|5000||helloworld-0.helloworld.test.svc.cluster.local
istioctl proxy-config clusters sleep-64d7d56698-lsrw4.test | grep test
helloworld-0.helloworld.test.svc.cluster.local          5000      -          outbound      STRICT_DNS
helloworld.test.svc.cluster.local                       5000      -          outbound      ORIGINAL_DST
sleep.test.svc.cluster.local                            80        -          outbound      EDS

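To see which backend addresses that STRICT_DNS cluster actually resolved to (if cross-cluster discovery worked, it should point at the helloworld-0 pod IP in cluster2), something along the following lines should help; it assumes the same sleep pod as above, and --cluster simply filters by the Envoy cluster name shown in the output:

# Assumes the sleep pod name shown above; --cluster filters endpoints by Envoy cluster name
istioctl proxy-config endpoints sleep-64d7d56698-lsrw4.test \
    --cluster "outbound|5000||helloworld-0.helloworld.test.svc.cluster.local"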
irajdeep changed the title from "Exposing headless service(in multi primary setup) is not resolvable across clusters." to "Exposing headless service with statefulset (in multi primary setup) is not resolvable across clusters." on Mar 31, 2021
@hzxuzhonghu (Member)

@irajdeep For clusters on different networks, this does not work at the moment.

irajdeep (Author) commented Apr 1, 2021

@irajdeep For clusters on different networks, this does not work at the moment.

@hzxuzhonghu got it, thanks for the heads-up. But in this case, cluster1 and cluster2 are on the same network, i.e. the pods have direct connectivity across the clusters.

@hzxuzhonghu (Member)

I see, you should apply this patch: #31758

irajdeep (Author) commented Apr 2, 2021

@hzxuzhonghu thanks for pointing it out, I will test with that patch.
I am wondering whether there is any "unstable/nightly" release of Istio that includes this patch, or do I need to build it locally and then test it?

howardjohn (Member) commented Apr 2, 2021 via email

@bhavin192

@irajdeep thank you for creating this issue; we were trying to achieve something similar with DNS proxying enabled (the thread you replied to on Istio Slack).

irajdeep (Author) commented Apr 5, 2021

I see, you should apply this patch: #31758

Tried with Istio version 1.10-alpha.a274e872c9fd1f252eb3c07e62f43a30ba4c552e (current master); still getting the same error.

istio-policy-bot added the lifecycle/stale label on Jul 2, 2021
@istio-policy-bot

🚧 This issue or pull request has been closed due to not having had activity from an Istio team member since 2021-04-02. If you feel this issue or pull request deserves attention, please reopen the issue. Please see this wiki page for more information. Thank you for your contributions.

Created by the issue and PR lifecycle manager.

istio-policy-bot added the lifecycle/automatically-closed label on Jul 17, 2021
vin-mad commented Sep 29, 2021

@irajdeep did you manage to solve this issue? Facing a similar problem.

@asad-awadia

@howardjohn does this work automatically with headless StatefulSet services as pods come and go?
