Multi-Network VM Installation 503 no healthy upstream #33268

Open
nmnellis opened this issue Jun 3, 2021 · 6 comments
Labels
area/networking, feature/Virtual-machine, kind/docs, lifecycle/staleproof

Comments

nmnellis commented Jun 3, 2021

Bug description
Following the tutorial at https://istio.io/latest/docs/setup/install/virtual-machine/ for multi-network automatic service entry creation, requests from the VM fail with 503 "no healthy upstream" because no endpoint addresses are populated for helloworld.sample.svc.

[2021-06-03T18:33:47.453Z] "GET /hello HTTP/1.1" 503 UH no_healthy_upstream - "-" 0 19 0 - "-" "curl/7.64.0" "fc38de56-1be0-4556-8156-0d44a74c0375" "helloworld.sample.svc:5000" "-" outbound|5000||helloworld.sample.svc.cluster.local - 10.100.153.188:5000 10.128.0.52:34390 - default


nick_nellis_solo_io@nick-test-eks:~$ curl -v helloworld.sample.svc:5000/hello
*   Trying 10.100.153.188...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x5652928b7fb0)
* Connected to helloworld.sample.svc (10.100.153.188) port 5000 (#0)
> GET /hello HTTP/1.1
> Host: helloworld.sample.svc:5000
> User-Agent: curl/7.64.0
> Accept: */*
>
< HTTP/1.1 503 Service Unavailable
< content-length: 19
< content-type: text/plain
< date: Thu, 03 Jun 2021 18:29:18 GMT
< server: envoy
<
* Connection #0 to host helloworld.sample.svc left intact
no healthy upstream
nick_nellis_solo_io@nick-test-eks:~$ curl -sS localhost:15000/clusters | grep sample
outbound|5000||helloworld.sample.svc.cluster.local::observability_name::outbound|5000||helloworld.sample.svc.cluster.local
outbound|5000||helloworld.sample.svc.cluster.local::default_priority::max_connections::4294967295
outbound|5000||helloworld.sample.svc.cluster.local::default_priority::max_pending_requests::4294967295
outbound|5000||helloworld.sample.svc.cluster.local::default_priority::max_requests::4294967295
outbound|5000||helloworld.sample.svc.cluster.local::default_priority::max_retries::4294967295
outbound|5000||helloworld.sample.svc.cluster.local::high_priority::max_connections::1024
outbound|5000||helloworld.sample.svc.cluster.local::high_priority::max_pending_requests::1024
outbound|5000||helloworld.sample.svc.cluster.local::high_priority::max_requests::1024
outbound|5000||helloworld.sample.svc.cluster.local::high_priority::max_retries::3
outbound|5000||helloworld.sample.svc.cluster.local::added_via_api::true
nick_nellis_solo_io@nick-test-eks:~$
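For comparison, a cluster that actually has endpoints shows per-endpoint rows (address:port plus stats such as cx_active) in the /clusters output, in addition to the circuit-breaker settings above; none appear here, which matches the UH/no_healthy_upstream response. A couple of hedged checks, where the pod name and namespace are placeholders rather than values from this report:

# On the VM: per-endpoint rows carry stats such as cx_active; no matches
# means istiod pushed no endpoints for this cluster to the VM proxy.
curl -sS localhost:15000/clusters | grep 'helloworld.sample' | grep cx_active

# From any sidecar-injected pod inside the cluster, compare what istiod pushes there:
istioctl proxy-config endpoint <pod>.sample \
  --cluster "outbound|5000||helloworld.sample.svc.cluster.local"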

Affected product area (please put an X in all that apply)

[X] Docs
[ ] Installation
[X] Networking
[ ] Performance and Scalability
[ ] Extensions and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure
[ ] Upgrade

Expected behavior
The VM installation demo works as expected.

Steps to reproduce the bug
Follow https://istio.io/latest/docs/setup/install/virtual-machine/ for multi-network automatic service entry creation on an EKS cluster, with the VM reachable over the open internet and the required ports open.

Version (include the output of istioctl version --remote and kubectl version --short and helm version --short if you used Helm)

▶ istioctl version
client version: 1.10.0
control plane version: 1.10.0
data plane version: 1.10.0 (5 proxies)

How was Istio installed?
istioctl

Environment where the bug was observed (cloud vendor, OS, etc)
EKS 1.19 (platform version eks.5)

Additionally, please consider running istioctl bug-report and attach the generated cluster-state tarball to this issue
Refer to the attached cluster-state archive for more details.

bug-report.tar.gz

linsun commented Jun 3, 2021

@nmnellis It looks like, from the VM's point of view, the hello service's endpoint isn't correct (it should be the east-west ingress gateway's LB IP), since the VM has to reach the hello service through the east-west ingress gateway. One reason this could happen is that we are not detecting the networks properly for the VM workload, the hello service, or the east-west ingress gateway.

@howardjohn I saw we have troubleshooting instructions for multi-cluster, multi-network setups (https://istio.io/latest/docs/ops/diagnostic-tools/multicluster/#step-by-step-diagnosis) to check client and server network configurations; do we have similar docs for services running on VMs?

@nmnellis said he tried the same steps on GKE and it works fine.
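For anyone checking this on their own cluster, a quick way to sanity-check the network assignment on each side is sketched below; the resource names, namespaces, and the WorkloadGroup field are assumptions based on the standard multi-network setup docs, not values taken from this report.

# Network the cluster is labeled with (multi-network installs label istio-system):
kubectl get namespace istio-system -L topology.istio.io/network

# Network label on the east-west gateway service and pods:
kubectl -n istio-system get svc,pods -l istio=eastwestgateway -L topology.istio.io/network

# Network declared for the VM workloads in the WorkloadGroup template
# (<vm-namespace> and <vm-workloadgroup> are placeholders):
kubectl -n <vm-namespace> get workloadgroup <vm-workloadgroup> -o jsonpath='{.spec.template.network}'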

linsun commented Jun 4, 2021

Given #32962, it is likely the mesh network config won't work for you on 1.10.0. Is your GKE env using the same Istio 1.10.0 as the EKS env?
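One related check, assuming a default (non-revisioned) install where istiod reads MeshConfig from the istio configmap in istio-system, is to see whether an explicit meshNetworks entry is present at all:

# Dump the mesh config istiod is using and look for meshNetworks;
# empty output means no explicit meshNetworks entry is configured.
kubectl -n istio-system get configmap istio -o jsonpath='{.data.mesh}' | grep -A8 meshNetworks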

linsun commented Jun 4, 2021

I've validated that the hello pod and the east-west gateway are marked with the correct network, along with the service for the VM.

One thing to note: in EKS the east-west gateway gets a hostname, e.g. x.us-east-2.elb.amazonaws.com, as its EXTERNAL-IP value. @stevenctl @howardjohn - do we have any limitation with a gateway that has a hostname in a multi-cluster, multi-network environment, where Istio can't replace the target service pod IP with the east-west gateway's EXTERNAL-IP because that EXTERNAL-IP is a hostname?
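This is easy to confirm on EKS, where the LoadBalancer status carries a DNS name rather than an IP; the service name below assumes the default east-west gateway from the multi-network samples.

# Prints something like xxxx.us-east-2.elb.amazonaws.com; the .ip field stays empty on EKS.
kubectl -n istio-system get svc istio-eastwestgateway \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'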

stevenctl commented

do we have any limitation with a gateway that has a hostname in a multi-cluster, multi-network environment, where Istio can't replace the target service pod IP with the east-west gateway's EXTERNAL-IP because that EXTERNAL-IP is a hostname?

Yes, see #29359. I think this problem is a little worse for VMs, since they need the hostname resolvable just to reach the control plane, and the current design only handles DNS resolution in the control plane.

Maybe we can do something in the agent where we resolve this and use the resolved IPs plus the original discovery address, so that SNI will still work?

linsun commented Jun 9, 2021

Thank you @stevenctl. Yeah, I assume we need to resolve the east-west gateway's address on the VM so the VM can talk to istiod via the east-west gateway, and istio-agent seems like a good home for this.

I assume we will also need to resolve the east-west address in istiod so that istiod can determine what services' endpoints to push to the istio-proxy running on the VM.

istio-policy-bot added the lifecycle/stale label on Sep 7, 2021

stevenctl commented Sep 7, 2021

One workaround would be to set discoveryAddress in the proxy config to the hostname of the east-west gateway (we shouldn't pin to a single IP here). You can also set ISTIOD_SAN to istiod.istio-system... so that SNI still routes correctly.
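A rough sketch of what that could look like on the VM, assuming the file layout produced by istioctl x workload entry configure; the hostname, port, and SAN values are placeholders rather than values verified against this setup.

# 1. Point the proxy at the gateway hostname instead of an IP by editing the
#    discoveryAddress in the generated /etc/istio/config/mesh, e.g.:
#
#      defaultConfig:
#        discoveryAddress: <eastwest-gateway-hostname>:15012   # hostname, not a pinned IP
#
# 2. Export ISTIOD_SAN for the agent; here it is appended to the generated
#    cluster.env, assuming the istio service loads that file as its environment.
echo 'ISTIOD_SAN=istiod.istio-system.svc' | sudo tee -a /var/lib/istio/envoy/cluster.env

# 3. Restart the sidecar so it reconnects using the new discovery address.
sudo systemctl restart istio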

I assume we will also need to resolve the east-west address in istiod so that istiod can determine what services' endpoints to push to the istio-proxy running on the VM.

What config we send is more related to #29359. I think this issue (#33268) should focus on the VM bootstrap/istiod reachability.

istio-policy-bot removed the lifecycle/stale label on Sep 7, 2021
istio-policy-bot added the lifecycle/stale label on Dec 7, 2021
linsun added the lifecycle/staleproof label on Dec 7, 2021
istio-policy-bot removed the lifecycle/stale label on Dec 7, 2021