
Multi-Cluster/Multi-Network - Cannot use a hostname-based gateway for east-west traffic #29359

Closed
bryankaraffa opened this issue Dec 3, 2020 · 83 comments · Fixed by #35346 or #36422

Labels: area/environments, area/networking, feature/Multi-cluster, feature/Multi-control-plane, kind/docs

Comments

@bryankaraffa (Contributor) commented Dec 3, 2020

Bug description

Following the Install Multi-Primary on different networks guide, everything installs as expected without errors and is running in both clusters. For secret/cacerts I am using the example certificate material from samples/certs/*.pem in both cluster1 and cluster2.
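For reference, the cacerts secret was created roughly as in the plugin CA certificates guide, using the sample files (a sketch; run against both ${CTX_CLUSTER1} and ${CTX_CLUSTER2}, and adjust paths if yours differ):

$ kubectl create secret generic cacerts -n istio-system \
    --from-file=samples/certs/ca-cert.pem \
    --from-file=samples/certs/ca-key.pem \
    --from-file=samples/certs/root-cert.pem \
    --from-file=samples/certs/cert-chain.pem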

When I attempt to verify the installation using the Verify the installation guide, requests are not routed to the remote cluster as expected. I only get responses from the service in the local cluster:

# From Cluster 1 [where helloworld v1 is deployed]
$ while true; do kubectl exec --context="${CTX_CLUSTER1}" -n sample -c sleep     "$(kubectl get pod --context="${CTX_CLUSTER1}" -n sample -l \
    app=sleep -o jsonpath='{.items[0].metadata.name}')"     -- curl -s helloworld.sample:5000/hello; done
Hello version: v1, instance: helloworld-v1-578dd69f69-r9lkz
Hello version: v1, instance: helloworld-v1-578dd69f69-r9lkz
Hello version: v1, instance: helloworld-v1-578dd69f69-r9lkz
Hello version: v1, instance: helloworld-v1-578dd69f69-r9lkz
Hello version: v1, instance: helloworld-v1-578dd69f69-r9lkz
...
# From Cluster 2 [where helloworld v2 is deployed]
$ while true; do kubectl exec --context="${CTX_CLUSTER2}" -n sample -c sleep     "$(kubectl get pod --context="${CTX_CLUSTER2}" -n sample -l \
    app=sleep -o jsonpath='{.items[0].metadata.name}')"     -- curl -s helloworld.sample:5000/hello; done
Hello version: v2, instance: helloworld-v2-776f74c475-h5j2q
Hello version: v2, instance: helloworld-v2-776f74c475-h5j2q
Hello version: v2, instance: helloworld-v2-776f74c475-h5j2q
Hello version: v2, instance: helloworld-v2-776f74c475-h5j2q
Hello version: v2, instance: helloworld-v2-776f74c475-h5j2q
...

istioctl proxy-config endpoint output for the sleep pod in cluster1 and cluster2, filtered to the helloworld destination service:

# Cluster 1
$ istioctl -n sample --context=${CTX_CLUSTER1} proxy-config endpoint "$(kubectl get pod --context="${CTX_CLUSTER1}" -n sample -l app=sleep -o jsonpath='{.items[0].metadata.name}')" | grep helloworld
10.100.1.12:5000                 HEALTHY     OK                outbound|5000||helloworld.sample.svc.cluster.local

# Cluster 2
$ istioctl -n sample --context=${CTX_CLUSTER2} proxy-config endpoint "$(kubectl get pod --context="${CTX_CLUSTER2}" -n sample -l app=sleep -o jsonpath='{.items[0].metadata.name}')" | grep helloworld
10.100.2.188:5000                HEALTHY     OK                outbound|5000||helloworld.sample.svc.cluster.local

It seems like something may be missing from the docs or example configs. I understand there are tests for the docs/examples, which is why I have been troubleshooting my own clusters, but it still feels like something small is missing.

[X] Docs
[X] Installation
[X] Networking
[ ] Performance and Scalability
[ ] Extensions and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure
[ ] Upgrade

Expected behavior
I expected that following the guide would produce the behavior the guide describes (responses from both helloworld v1 and v2).

Steps to reproduce the bug
Follow the Multi-Primary on different networks install guide, then the verification guide.

Version (include the output of istioctl version --remote and kubectl version --short and helm version --short if you used Helm)

$ istioctl version
client version: 1.8.0
control plane version: 1.8.0
data plane version: 1.8.0 (4 proxies)

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-12T01:09:16Z", GoVersion:"go1.15.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.12-eks-7684af", GitCommit:"7684af4ac41370dd109ac13817023cb8063e3d45", GitTreeState:"clean", BuildDate:"2020-10-20T22:57:40Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

How was Istio installed?
The Istio Operator was installed with istioctl operator init; the rest of the installation in istio-system follows the steps in the guide [using the example manifests and the scripts that generate them].

Environment where the bug was observed (cloud vendor, OS, etc)
AWS EKS with Kubernetes v1.17
Istio 1.8.0 on Mac

$ uname -a
Darwin OAK-MAC-HLJHD2 19.6.0 Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64 x86_6

Additionally, please consider running istioctl bug-report and attach the generated cluster-state tarball to this issue.
See the attached cluster-state archive for more details.

@istio-policy-bot added the area/environments, area/networking, feature/Multi-cluster, feature/Multi-control-plane, and kind/docs labels on Dec 3, 2020
@rinormaloku (Contributor) commented Dec 3, 2020

Hi @bryankaraffa

This looks as though your clusters are not properly configured for endpoint discovery. That requires both the east-west gateways and the remote secrets to be configured correctly.

To verify the secrets, check that the remote kubeconfig secrets exist in the istio-system namespace of both clusters. As far as I know, you also need to validate that the context name in the secret matches the IstioOperator configuration, specifically the multiCluster.clusterName property:

      multiCluster:
        clusterName: cluster1
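For example, something along these lines should show the remote secret and its data key (a sketch; it assumes the secrets were created with istioctl x create-remote-secret, which labels them istio/multiCluster=true):

$ kubectl --context="${CTX_CLUSTER1}" -n istio-system get secrets -l istio/multiCluster=true
$ kubectl --context="${CTX_CLUSTER1}" -n istio-system describe secret istio-remote-secret-cluster2

The Data key shown by describe should match the remote cluster's multiCluster.clusterName.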

@sonnysideup

@rinormaloku This may be a related issue, so it's probably worth mentioning here.

I followed the Install Primary-Remote on different networks instructions using v1.8.0 and, insofar as I can tell, the cross-cluster communication is set up correctly. Yet, I'm seeing the same issue as @bryankaraffa. I also verified that the context name inside the secret created by istioctl x create-remote-secret matches the remote cluster name (i.e. dataplane; see full configs below).

I see the primary's Pilot logging metadata about Pods/Services created inside the remote cluster, but there are never any endpoints created in the primary (maybe that's expected). Either way, trying to route from the primary to the remote using the helloworld sample always fails.

I've included my full configuration files to ensure there's nothing wrong with them.

Primary configuration

  1. Install Istio
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      meshID: cdps-mesh
      multiCluster:
        clusterName: controlplane
      network: cp-network
  2. Configure control plane gateway
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: cdps
spec:
  revision: ""
  profile: empty
  components:
    ingressGateways:
      - name: istio-cdps-gateway
        label:
          istio: cdps-gateway
          app: istio-cdps-gateway
          topology.istio.io/network: cp-network
        enabled: true
        k8s:
          env:
            # sni-dnat adds the clusters required for AUTO_PASSTHROUGH mode
            - name: ISTIO_META_ROUTER_MODE
              value: "sni-dnat"
            # traffic through this gateway should be routed inside the network
            - name: ISTIO_META_REQUESTED_NETWORK_VIEW
              value: cp-network
          service:
            ports:
              - name: status-port
                port: 15021
                targetPort: 15021
              - name: tls
                port: 15443
                targetPort: 15443
              - name: tls-istiod
                port: 15012
                targetPort: 15012
              - name: tls-webhook
                port: 15017
                targetPort: 15017
  values:
    global:
      meshID: cdps-mesh
      network: cp-network
      multiCluster:
        clusterName: controlplane
  3. Expose control plane istiod (Pilot)
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: istiod-gateway
  namespace: istio-system
spec:
  selector:
    istio: cdps-gateway
  servers:
    - port:
        name: tcp-istiod
        number: 15012
        protocol: TCP
      hosts:
        - "*"
    - port:
        name: tcp-istiodwebhook
        number: 15017
        protocol: TCP
      hosts:
        - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: istiod-vs
  namespace: istio-system
spec:
  hosts:
  - istiod.istio-system.svc.cluster.local
  gateways:
  - istiod-gateway
  tcp:
  - match:
    - port: 15012
    route:
    - destination:
        host: istiod.istio-system.svc.cluster.local
        port:
          number: 15012
  - match:
    - port: 15017
    route:
    - destination:
        host: istiod.istio-system.svc.cluster.local
        port:
          number: 443
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: istiod-dr
  namespace: istio-system
spec:
  host: istiod.istio-system.svc.cluster.local
  trafficPolicy:
    portLevelSettings:
    - port:
        number: 15012
      tls:
        mode: DISABLE
    - port:
        number: 15017
      tls:
        mode: DISABLE
  4. Expose control plane services
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: services-gateway
  namespace: istio-system
spec:
  selector:
    istio: cdps-gateway
  servers:
    - port:
        number: 15443
        name: tls
        protocol: TLS
      tls:
        mode: AUTO_PASSTHROUGH
      hosts:
        - "*.local"

Remote configuration

  1. Install Istio
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: remote
  values:
    global:
      meshID: cdps-mesh
      multiCluster:
        clusterName: dataplane
      network: dp-network
      remotePilotAddress: <istio-cdps-gateway-loadbalancer-ip>
  2. Configure remote gateway
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: cdps
spec:
  revision: ""
  profile: empty
  components:
    ingressGateways:
      - name: istio-cdps-gateway
        label:
          istio: cdps-gateway
          app: istio-cdps-gateway
          topology.istio.io/network: dp-network
        enabled: true
        k8s:
          env:
            # sni-dnat adds the clusters required for AUTO_PASSTHROUGH mode
            - name: ISTIO_META_ROUTER_MODE
              value: "sni-dnat"
            # traffic through this gateway should be routed inside the network
            - name: ISTIO_META_REQUESTED_NETWORK_VIEW
              value: dp-network
          service:
            ports:
              - name: status-port
                port: 15021
                targetPort: 15021
              - name: tls
                port: 15443
                targetPort: 15443
              - name: tls-istiod
                port: 15012
                targetPort: 15012
              - name: tls-webhook
                port: 15017
                targetPort: 15017
  values:
    global:
      meshID: cdps-mesh
      network: dp-network
      multiCluster:
        clusterName: dataplane
  3. Expose remote services
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: services-gateway
  namespace: istio-system
spec:
  selector:
    istio: cdps-gateway
  servers:
    - port:
        number: 15443
        name: tls
        protocol: TLS
      tls:
        mode: AUTO_PASSTHROUGH
      hosts:
        - "*.local"

@rinormaloku (Contributor) commented Dec 3, 2020

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      meshID: cdps-mesh
      multiCluster:
        clusterName: controlplane
      network: cp-network             # <---   networks are named the same
  2. Configure control plane gateway
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: cdps
spec:
  revision: ""
  profile: empty
  values:
    global:
      meshID: cdps-mesh
      network: cp-network     # <---   networks are named the same
      multiCluster:
        clusterName: controlplane

Why are the networks of the different clusters named the same? Considering that you want the primary and remote to be on different networks, the networks need to be uniquely named.

@sonnysideup commented Dec 3, 2020

The two resources you listed above are both deployed into the primary (controlplane) cluster; I don't reference cp-network in any of the resources I deploy into the remote cluster. I was under the impression that the network names should be the same here. Is that not the case? Should it reference the remote cluster instead, or a different name? If so, what network should the remote cluster's gateway reference?

I'm sorry for the back-and-forth, I'm just a little confused at this point.

# installs istiod "default" profile into primary
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      meshID: cdps-mesh
      multiCluster:
        clusterName: controlplane
      network: cp-network

---
# installs gateway into primary cluster with the same "cp-network" name
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: cdps
spec:
  revision: ""
  profile: empty
  components:
    ingressGateways:
      - name: istio-cdps-gateway
        label:
          istio: cdps-gateway
          app: istio-cdps-gateway
          topology.istio.io/network: cp-network
        enabled: true
        k8s:
          env:
            # sni-dnat adds the clusters required for AUTO_PASSTHROUGH mode
            - name: ISTIO_META_ROUTER_MODE
              value: "sni-dnat"
            # traffic through this gateway should be routed inside the network
            - name: ISTIO_META_REQUESTED_NETWORK_VIEW
              value: cp-network
          service:
            ports:
              - name: status-port
                port: 15021
                targetPort: 15021
              - name: tls
                port: 15443
                targetPort: 15443
              - name: tls-istiod
                port: 15012
                targetPort: 15012
              - name: tls-webhook
                port: 15017
                targetPort: 15017
  values:
    global:
      meshID: cdps-mesh
      network: cp-network
      multiCluster:
        clusterName: controlplane

@rinormaloku (Contributor) commented Dec 3, 2020

@sonnysideup the back-and-forth is good; it will help others who are running into issues as well. I spent three days getting multi-cluster to work and getting an idea of how things are connected; that's to say the documentation is not yet on par with other Istio docs (understandable, as there were many recent changes).

From your config everything looks correct, so if the secret for the dataplane cluster is stored in the controlplane cluster, the only thing left is making sure that the istio-system namespace in each cluster is labeled according to the docs:

kubectl --context="${CTX_CP}" label namespace istio-system topology.istio.io/network=REPLACECPNetwork
kubectl --context="${CTX_DP}" label namespace istio-system topology.istio.io/network=REPLACEDPNetwork

@sonnysideup

Unfortunately, that step has already been performed. 😢

covid19:istio-files sonny$ kubectl --context control-plane get ns istio-system --show-labels
NAME           STATUS   AGE     LABELS
istio-system   Active   3h36m   topology.istio.io/network=cp-network
covid19:istio-files sonny$ kubectl --context data-plane get ns istio-system --show-labels
NAME           STATUS   AGE     LABELS
istio-system   Active   3h34m   topology.istio.io/network=dp-network

Nothing else readily comes to mind regarding setup steps I may have missed. I can see that the remote helloworld deployment has been synced to the primary cluster:

covid19:istio-files sonny$ istioctl --context control-plane proxy-status
NAME                                                   CDS        LDS        EDS          RDS          ISTIOD                      VERSION
helloworld-v2-7855866d4f-6lg7h.compute                 SYNCED     SYNCED     SYNCED       SYNCED       istiod-655559c95b-q7425     1.8.0
istio-cdps-gateway-579665f6b6-chl8p.istio-system       SYNCED     SYNCED     SYNCED       NOT SENT     istiod-655559c95b-q7425     1.8.0
istio-cdps-gateway-78df8697fb-htfn6.istio-system       SYNCED     SYNCED     SYNCED       NOT SENT     istiod-655559c95b-q7425     1.8.0
istio-ingressgateway-5779d8cd47-th7cm.istio-system     SYNCED     SYNCED     NOT SENT     NOT SENT     istiod-655559c95b-q7425     1.8.0
istio-ingressgateway-9469967d9-6hfr9.istio-system      SYNCED     SYNCED     NOT SENT     NOT SENT     istiod-655559c95b-q7425     1.8.0
sleep-8f795f47d-4xjwf.platform                         SYNCED     SYNCED     SYNCED       SYNCED       istiod-655559c95b-q7425     1.8.0

and the sleep pod inside the primary cluster where I'm performing verification is also up-to-date.

pieper:istio-files sonny$ istioctl --context control-plane proxy-status sleep-8f795f47d-4xjwf.platform
Clusters Match
Listeners Match
Routes Match (RDS last loaded at Thu, 03 Dec 2020 03:05:43 MST)

The only option I can think of right now is to enable DEBUG logging for istiod / istio-proxy and see if anything pops up there.

@sonnysideup commented Dec 3, 2020

Yeah, I created the service in both the primary and remote clusters, and I deployed the actual workload to the remote cluster. I'm not able to address the desired service without creating it inside the primary; I guess service mirroring is not supported. Whenever you do that, do you see endpoints created inside your primary cluster?

After I create the helloworld app inside the remote cluster, I see the following logs from istiod inside the primary cluster:

istiod-655559c95b-q7425 discovery 2020-12-03T13:35:37.473614Z   info    ads     Incremental push, service helloworld.example.svc.cluster.local has no endpoints
istiod-655559c95b-q7425 discovery 2020-12-03T13:35:37.573743Z   info    ads     Push debounce stable[93] 1: 100.081671ms since last change, 100.08147ms since last push, full=false
istiod-655559c95b-q7425 discovery 2020-12-03T13:35:37.573812Z   info    ads     XDS: Incremental Pushing:2020-12-03T13:30:33Z/50 ConnectedEndpoints:3

This looks suspicious and makes me think something is awry; otherwise I'd end up with endpoints being created inside the primary cluster, no?

@nmittler (Contributor) commented Dec 3, 2020

@sonnysideup @bryankaraffa

Can you look in the istiod logs to see if it was able to access the remote API server from the secret?

@bryankaraffa (Contributor, Author)

@nmittler --
Seeing this in istiod after setting global.proxy.logLevel: debug. From my blue cluster I see the green cluster is added after I create the secret:

2020-12-02T20:30:04.195802Z    info    Processing add: istio-system/istio-remote-secret-kg-cet-917-green-staging-us-west-2
2020-12-02T20:30:04.196166Z    info    Adding cluster_id=kg-cet-917-green-staging-us-west-2 from secret=istio-system/istio-remote-secret-kg-cet-917-green-staging-us-west-2
2020-12-02T20:30:04.196295Z    info    Processing add: istio-system/istio-remote-secret-kg-cet-917-green-staging-us-west-2
2020-12-02T20:30:04.196319Z    info    Adding cluster_id=kg-cet-917-green-staging-us-west-2 from secret=istio-system/istio-remote-secret-kg-cet-917-green-staging-us-west-2
2020-12-02T20:30:04.308083Z    info    Number of remote clusters: 1

@martin2176

I followed the procedure exactly as described in the 1.8 multi-primary, multi-network guide and it worked without any issues. I was able to successfully complete the multicluster verification as described here --> https://istio.io/latest/docs/setup/install/multicluster/verify/
As a matter of fact, the whole procedure worked perfectly the very first time, and in under an hour I was up and running with multi-cluster.

A few things I found out during the procedure (they are documented well in the instructions; a sketch of the connectivity checks follows this list):

  • You have to create a stub namespace and stub service in cluster1 (same namespace and same service name) to access the service in cluster2. This is well documented in the procedure, but it is an important step worth pointing out.
  • So is the remote secret.
  • Cluster1 should be able to access the kube API on cluster2 and vice versa. If you are running hosted Kubernetes, make sure your kube API is accessible from both clusters. If you have IP ACLs on the kube API, make sure you allow access from the other cluster.
  • Also, cluster1 should be able to access the east-west gateway on cluster2 and vice versa. Once again, IP ACLs, network security groups, etc. need to allow this.
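A rough sketch of those reachability checks (hypothetical endpoints; substitute your own API server and east-west gateway addresses):

# From a host or pod in cluster1 (and the reverse from cluster2):
$ curl -sk https://<cluster2-kube-api-endpoint>/version     # kube API reachability
$ nc -vz <cluster2-eastwest-gateway-address> 15443          # east-west gateway (mtls port) reachability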

@stevenctl (Contributor) commented Dec 3, 2020

@rinormaloku FYI the reason the Service must exist in both clusters is so that the Service's hostname resolves to some IP just to get the request out of the client workload and to its sidecar proxy.

@sonnysideup @bryankaraffa

The cluster name in the context field of the remote secret is not what istiod will use for the cluster name; rather, the key under the secret's stringData should be verified. These are likely the same, but worth checking. Do the cluster IDs in your logs match what you have in your IstioOperator config? Or are those generated names that might be assumed from your local kubeconfig file?

Also curious what happens if you restart istiod.
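For example, a sketch of that restart plus a log check (assuming the default istiod deployment in istio-system):

$ kubectl --context="${CTX_CLUSTER1}" -n istio-system rollout restart deployment/istiod
$ kubectl --context="${CTX_CLUSTER1}" -n istio-system logs deploy/istiod | grep -E 'Adding cluster|Number of remote clusters'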

@bryankaraffa (Contributor, Author) commented Dec 3, 2020

Here's hopefully some validation that I have followed the guide, and everything is deployed as expected:

Secret with certificate material created before Istio was deployed:

$ kubectl --context=$CTX_CLUSTER1 get secrets -n istio-system
NAMESPACE         NAME                                                           TYPE                                  DATA   AGE
istio-system      secret/cacerts                                                 Opaque                                4      6m36s

$ kubectl --context=$CTX_CLUSTER2 get secrets -n istio-system
NAMESPACE         NAME                                                           TYPE                                  DATA   AGE
istio-system      secret/cacerts                                                 Opaque                                4      5m19s

Istio primary + east-west gateway deployed
cluster1:

$ kubectl --context="${CTX_CLUSTER1}" get namespace istio-system && \
>   kubectl --context="${CTX_CLUSTER1}" label namespace istio-system topology.istio.io/network=network1
NAME           STATUS   AGE
istio-system   Active   10m
namespace/istio-system labeled

bash-3.2$ cat <<EOF > cluster1.yaml
> apiVersion: install.istio.io/v1alpha1
> kind: IstioOperator
> spec:
>   values:
>     global:
>       meshID: mesh1
>       multiCluster:
>         clusterName: cluster1
>       network: network1
> EOF

bash-3.2$ istioctl install --context="${CTX_CLUSTER1}" -f cluster1.yaml
This will install the Istio profile into the cluster. Proceed? (y/N) y
✔ Istio core installed
✔ Istiod installed
✔ Ingress gateways installed
✔ Installation complete                                                                                                                                                                                                                                                                                                                                   

bash-3.2$ samples/multicluster/gen-eastwest-gateway.sh \
>     --mesh mesh1 --cluster cluster1 --network network1 | \
>     istioctl --context="${CTX_CLUSTER1}" install -y -f -
✔ Ingress gateways installed
✔ Installation complete                                                                                                                                                                                                                                                                                                                                   

bash-3.2$ kubectl --context="${CTX_CLUSTER1}" get svc istio-eastwestgateway -n istio-system
NAME                    TYPE           CLUSTER-IP     EXTERNAL-IP                                                              PORT(S)                                                           AGE
istio-eastwestgateway   LoadBalancer   172.20.20.30   a5e21e07fd1a64a518ab6c02b4dfb9f5-826145575.us-west-2.elb.amazonaws.com   15021:31015/TCP,15443:30756/TCP,15012:32126/TCP,15017:32461/TCP   72s

bash-3.2$ kubectl --context="${CTX_CLUSTER1}" apply -n istio-system -f \
>     samples/multicluster/expose-services.yaml
gateway.networking.istio.io/cross-network-gateway created

and cluster2:

$ kubectl --context="${CTX_CLUSTER2}" get namespace istio-system && \
>   kubectl --context="${CTX_CLUSTER2}" label namespace istio-system topology.istio.io/network=network2
NAME           STATUS   AGE
istio-system   Active   10m
namespace/istio-system labeled

bash-3.2$ cat <<EOF > cluster2.yaml
> apiVersion: install.istio.io/v1alpha1
> kind: IstioOperator
> spec:
>   values:
>     global:
>       meshID: mesh1
>       multiCluster:
>         clusterName: cluster2
>       network: network2
> EOF

bash-3.2$ istioctl install --context="${CTX_CLUSTER2}" -f cluster2.yaml
This will install the Istio profile into the cluster. Proceed? (y/N) y
✔ Istio core installed
✔ Istiod installed
✔ Ingress gateways installed
✔ Installation complete                                                                                                                                                                                                                                                                                                                                   

bash-3.2$ samples/multicluster/gen-eastwest-gateway.sh \
>     --mesh mesh1 --cluster cluster2 --network network2 | \
>     istioctl --context="${CTX_CLUSTER2}" install -y -f -
✔ Ingress gateways installed
✔ Installation complete                                                                                                                                                                                                                                                                                                                                   

bash-3.2$ kubectl --context="${CTX_CLUSTER2}" get svc istio-eastwestgateway -n istio-system
NAME                    TYPE           CLUSTER-IP      EXTERNAL-IP                                                              PORT(S)                                                           AGE
istio-eastwestgateway   LoadBalancer   172.20.47.145   a98c08f2d1548410ea36cda99e19cc55-183263882.us-west-2.elb.amazonaws.com   15021:32469/TCP,15443:31613/TCP,15012:32636/TCP,15017:31475/TCP   14s

bash-3.2$ kubectl --context="${CTX_CLUSTER2}" apply -n istio-system -f \
>     samples/multicluster/expose-services.yaml
gateway.networking.istio.io/cross-network-gateway created

Enable Endpoint Discovery on cluster1 and cluster2:

bash-3.2$ istioctl x create-remote-secret \
>   --context="${CTX_CLUSTER1}" \
>   --name=cluster1 | \
>   kubectl apply -f - --context="${CTX_CLUSTER2}"
secret/istio-remote-secret-cluster1 created

bash-3.2$ istioctl x create-remote-secret \
>   --context="${CTX_CLUSTER2}" \
>   --name=cluster2 | \
>   kubectl apply -f - --context="${CTX_CLUSTER1}"
secret/istio-remote-secret-cluster2 created

Service/helloworld is deployed to both cluster1 and cluster2 in the sample namespace:

bash-3.2$ kubectl apply --context="${CTX_CLUSTER1}" \
>     -f samples/helloworld/helloworld.yaml \
>     -l service=helloworld -n sample
service/helloworld created

bash-3.2$ kubectl apply --context="${CTX_CLUSTER2}" \
>     -f samples/helloworld/helloworld.yaml \
>     -l service=helloworld -n sample
service/helloworld created

and the backend deployments:

bash-3.2$ kubectl apply --context="${CTX_CLUSTER1}" \
>     -f samples/helloworld/helloworld.yaml \
>     -l version=v1 -n sample
deployment.apps/helloworld-v1 created

bash-3.2$ kubectl get pod --context="${CTX_CLUSTER1}" -n sample -l app=helloworld
NAME                             READY   STATUS    RESTARTS   AGE
helloworld-v1-578dd69f69-svvwf   2/2     Running   0          27s

bash-3.2$ kubectl apply --context="${CTX_CLUSTER2}" \
>     -f samples/helloworld/helloworld.yaml \
>     -l version=v2 -n sample
deployment.apps/helloworld-v2 created
bash-3.2$ kubectl get pod --context="${CTX_CLUSTER2}" -n sample -l app=helloworld
NAME                             READY   STATUS    RESTARTS   AGE
helloworld-v2-776f74c475-lqkst   2/2     Running   0          30s

And the sleep pods have also been injected as expected [2 of 2 containers]:

bash-3.2$ kubectl get pod --context="${CTX_CLUSTER1}" -n sample -l app=sleep
NAME                    READY   STATUS    RESTARTS   AGE
sleep-f8cbf5b76-h82pz   2/2     Running   0          25s

bash-3.2$ kubectl get pod --context="${CTX_CLUSTER2}" -n sample -l app=sleep
NAME                    READY   STATUS    RESTARTS   AGE
sleep-f8cbf5b76-rj2jg   2/2     Running   0          31s

@stevenctl (Contributor) commented Dec 3, 2020

EXTERNAL-IP
a5e21e07fd1a64a518ab6c02b4dfb9f5-826145575.us-west-2.elb.amazonaws.com

This may be the issue. If you run kubectl --context="${CTX_CLUSTER1}" get svc istio-eastwestgateway -n istio-system -oyaml, is there an IP address at either status.loadBalancer.ingress or under spec.externalIPs? Those are the only two address types we allow for auto-gateway discovery (via that topology.istio.io/network label on the Service).
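A quick way to check just that (assuming the east-west gateway Service name and namespace from the guide):

$ kubectl --context="${CTX_CLUSTER1}" -n istio-system get svc istio-eastwestgateway \
    -o jsonpath='{.status.loadBalancer.ingress}'
# spec.externalIPs (if set) is visible in the full "-o yaml" output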

If you know the IPs, you may be able to use a legacy type of configuration, meshNetworks, to manually specify the addresses to use for the gateways:

values:
  global:
    meshNetworks:
      network1:
        endpoints:
        - fromRegistry: cluster1
        gateways:
        - address: 1.2.3.4
          port: 15443
      network2:
        endpoints:
        - fromRegistry: cluster2
        gateways:
        - address: 5.6.7.8
          port: 15443

This would be included in the install operator for all clusters and would need to be identical in every cluster in the mesh.

More info: https://istio.io/latest/docs/reference/config/istio.mesh.v1alpha1/#MeshNetworks
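As a sketch, folding this into the cluster1.yaml from the guide could look like the following (1.2.3.4 / 5.6.7.8 are placeholder addresses; the identical meshNetworks block would be repeated in cluster2.yaml):

cat <<EOF > cluster1.yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      meshID: mesh1
      multiCluster:
        clusterName: cluster1
      network: network1
      meshNetworks:
        network1:
          endpoints:
          - fromRegistry: cluster1
          gateways:
          - address: 1.2.3.4
            port: 15443
        network2:
          endpoints:
          - fromRegistry: cluster2
          gateways:
          - address: 5.6.7.8
            port: 15443
EOF
istioctl install --context="${CTX_CLUSTER1}" -f cluster1.yaml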

@bryankaraffa (Contributor, Author)

EXTERNAL-IP
a5e21e07fd1a64a518ab6c02b4dfb9f5-826145575.us-west-2.elb.amazonaws.com

This may be the issue. If you run kubectl --context="${CTX_CLUSTER1}" get svc istio-eastwestgateway -n istio-system -oyaml, is there an IP address at either status.loadBalancer.ingress or under spec.externalIPs? Those are the only two address types we allow for auto-gateway discovery (via that topology.istio.io/network label on the Service).

@nmittler -- confirming there are no IPs under status.LoadBalancer.Ingress or spec.ExternalIPs

$ kubectl get --context=$CTX_CLUSTER1 svc istio-eastwestgateway -n istio-system -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app":"istio-eastwestgateway","install.operator.istio.io/owning-resource":"eastwest","install.operator.istio.io/owning-resource-namespace":"istio-system","istio":"eastwestgateway","istio.io/rev":"default","operator.istio.io/component":"IngressGateways","operator.istio.io/managed":"Reconcile","operator.istio.io/version":"1.8.0","release":"istio","topology.istio.io/network":"network1"},"name":"istio-eastwestgateway","namespace":"istio-system"},"spec":{"ports":[{"name":"status-port","port":15021,"protocol":"TCP","targetPort":15021},{"name":"mtls","port":15443,"protocol":"TCP","targetPort":15443},{"name":"tcp-istiod","port":15012,"protocol":"TCP","targetPort":15012},{"name":"tcp-webhook","port":15017,"protocol":"TCP","targetPort":15017}],"selector":{"app":"istio-eastwestgateway","istio":"eastwestgateway","topology.istio.io/network":"network1"},"type":"LoadBalancer"}}
  creationTimestamp: "2020-12-03T19:19:35Z"
  finalizers:
  - service.kubernetes.io/load-balancer-cleanup
  labels:
    app: istio-eastwestgateway
    install.operator.istio.io/owning-resource: eastwest
    install.operator.istio.io/owning-resource-namespace: istio-system
    istio: eastwestgateway
    istio.io/rev: default
    operator.istio.io/component: IngressGateways
    operator.istio.io/managed: Reconcile
    operator.istio.io/version: 1.8.0
    release: istio
    topology.istio.io/network: network1
  name: istio-eastwestgateway
  namespace: istio-system
  resourceVersion: "2493"
  selfLink: /api/v1/namespaces/istio-system/services/istio-eastwestgateway
  uid: af17e671-8853-4235-a758-66e8a5b8617d
spec:
  clusterIP: 172.20.81.177
  externalTrafficPolicy: Cluster
  ports:
  - name: status-port
    nodePort: 30079
    port: 15021
    protocol: TCP
    targetPort: 15021
  - name: mtls
    nodePort: 30893
    port: 15443
    protocol: TCP
    targetPort: 15443
  - name: tcp-istiod
    nodePort: 32378
    port: 15012
    protocol: TCP
    targetPort: 15012
  - name: tcp-webhook
    nodePort: 31536
    port: 15017
    protocol: TCP
    targetPort: 15017
  selector:
    app: istio-eastwestgateway
    istio: eastwestgateway
    topology.istio.io/network: network1
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - hostname: aaf17e67188534235a75866e8a5b8617-938382465.us-west-2.elb.amazonaws.com

This seems similar/related to what I ran into with a multi-cluster setup on Istio 1.7 on AWS EKS, on this step where you get the remote cluster ingress hostname... The docs suggested using -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}', but I had to use -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}' to get a valid value for the ServiceEntry in the next step.

Same on my current cluster as well:

$ kubectl get --context=$CTX_CLUSTER2 svc --selector=app=istio-eastwestgateway     -n istio-system -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}'

$ kubectl get --context=$CTX_CLUSTER2 svc --selector=app=istio-eastwestgateway     -n istio-system -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}'
aaf17e67188534235a75866e8a5b8617-938382465.us-west-2.elb.amazonaws.com

@stevenctl (Contributor)

https://istio.io/latest/docs/reference/config/istio.mesh.v1alpha1/#Network-IstioNetworkGateway

The address field should support an externally resolvable hostname. I think we should also be able to auto-discover the LoadBalancer hostname field; I'll open a separate issue for adding that, and hopefully it will be available in a future release.

For the time being, we should add a section to the doc explaining this alternative config when there isn't an IP.

@nmittler (Contributor) commented Dec 3, 2020

@sonnysideup @bryankaraffa can you confirm that the work-around in #29359 (comment) resolves the issue?

We'll want to hold off on updating docs until we know this works.

@stevenctl (Contributor)

From looking at the code, this may have some odd behavior:

ResolveHostsInNetworksConfig(meshNetworks)

When we load meshNetworks, we resolve the DNS there and replace the hostname with an IP. This can have issues when:

  • DNS changes and the hostname is backed by a different IP
  • Pilot is running somewhere that it can't resolve DNS but the proxies can (unlikely)
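For illustration, resolving one of the ELB hostnames from this thread shows the backing A records, which can change over time, so an eagerly resolved IP can go stale (output will vary):

$ dig +short aaf17e67188534235a75866e8a5b8617-938382465.us-west-2.elb.amazonaws.com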

@bryankaraffa (Contributor, Author) commented Dec 3, 2020

@sonnysideup @nmittler -- It does not look like global.meshNetworks.*.gateways[].address accepts a hostname:

2020-12-03T20:40:37.838002Z	info	initializing mesh networks
2020-12-03T20:40:37.838315Z	info	failed to read mesh networks configuration from "/etc/istio/config/meshNetworks": 2 errors occurred:
	* invalid network network2: aec97d349f725404488c60706cda3bdd-1359780995.us-west-2.elb.amazonaws.com is not a valid IP
	* invalid network network1: aaf17e67188534235a75866e8a5b8617-938382465.us-west-2.elb.amazonaws.com is not a valid IP

https://github.com/istio/istio/blob/release-1.8/pkg/config/validation/validation.go#L2448-L2488

@stevenctl (Contributor)

@bryankaraffa Looks like that validation added in #23311 conflicts with other logic we have in pilot. It does still seem risky to use a hostname here the way hostname support is currently implemented (eagerly resolving DNS rather than resolving it at the proxy).

@stevenctl (Contributor)

Supporting this is probably non-trivial. A (not-so-great) workaround you can try is further customizing your east-west gateway to be a NodePort service, and then manually specifying the IP + node port in meshNetworks, as sketched below.
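A rough sketch of that workaround, building on the gen-eastwest-gateway.sh output from the guide (the service-type override path is assumed from the IstioOperator API; verify it against your version):

$ samples/multicluster/gen-eastwest-gateway.sh \
    --mesh mesh1 --cluster cluster1 --network network1 > eastwest.yaml
# edit eastwest.yaml: under spec.components.ingressGateways[0].k8s.service, set "type: NodePort"
$ istioctl --context="${CTX_CLUSTER1}" install -y -f eastwest.yaml
# then use <node IP> plus the nodePort assigned to port 15443 as the gateway address in meshNetworks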

@bryankaraffa (Contributor, Author) commented Dec 3, 2020

Because I won't know the hostname/IP until after I deploy the eastwest-gateway [perhaps as a NodePort, as suggested], how should I go about getting the hostname and reconfiguring meshNetworks with an IP address just as a test? It seems like there are only a few ways:

  • deploy the eastwest-gateway first, then configure the cluster as primary with the meshNetworks config [reverse order from the guide], or
  • install the eastwest-gateway after configuring the cluster as primary [same as the guide], then get the hostname/IP, edit the manifest to append the meshNetworks config, and re-install the eastwest-gateway.

@stevenctl (Contributor)

Instead of including meshNetworks in the install, you can update the ConfigMap named istio in your system namespace. There is a meshNetworks key there that should be hot-reloaded by pilot:

data:
  meshNetworks: |-
    networks:
      network1:
        endpoints:
          - fromRegistry: cluster1
        gateways:
          - address: 1.2.3.4
            port: 15443
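After editing the ConfigMap, a quick way to confirm pilot reloaded it (assuming the default istiod deployment) is to look for the update message in its logs:

$ kubectl -n istio-system logs deploy/istiod | grep "mesh networks configuration updated"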

@rinormaloku (Contributor)

More info: https://istio.io/latest/docs/reference/config/istio.mesh.v1alpha1/#MeshNetworks

@stevenctl, I started a discuss thread with points that are confusing about MeshNetworks. I'd really appreciate it if you could find some time to share your insights. I'll drive that knowledge into the Istio docs (right now I don't find them sufficiently clear).

@stevenctl (Contributor)

Thanks, I just posted a short reply there, but apparently I need moderator approval. The tl;dr is that meshNetworks is a legacy piece of config that we want to move away from.

The only reasons I can see to use it:

  • manually specifying gateway addresses (fromRegistry + registryServiceName is essentially configured with the Service and Namespace labels).
  • Inferring workload networks based on CIDR. This seems like an especially rare and advanced use case.

Also I forgot that we recently fixed support for NodePort services. You should be able to use the topology.istio.io/network annotation on your eastwest gateway service of type NodePort and let istiod do the rest. It isn't well tested outside of unit-tests, so I can't say that we claim full support but it is worth a try.

@bryankaraffa (Contributor, Author) commented Dec 4, 2020

@nmittler @sonnysideup @stevenctl ... I got it working by manually defining meshNetworks, so hopefully that helps confirm what is needed. I am surprised nobody else using AWS EKS and Istio 1.8 has run into this issue. Let me know if there are any other details I can provide that would help. Thanks for your assistance in the meantime!

Here are my notes from the test [all these resources have already been destroyed but can be replicated easily]:

Get Cluster 1 eastwestgateway Host/IP:

$ kubectl --context="${CTX_CLUSTER1}" -n istio-system get svc/istio-eastwestgateway
NAME                    TYPE           CLUSTER-IP       EXTERNAL-IP                                                               PORT(S)                                                           AGE
istio-eastwestgateway   LoadBalancer   172.20.165.219   a5700a42e952e42568fab49239b17071-2044969611.us-west-2.elb.amazonaws.com   15021:30119/TCP,15443:32032/TCP,15012:30470/TCP,15017:30736/TCP   13m

$ host a5700a42e952e42568fab49239b17071-2044969611.us-west-2.elb.amazonaws.com
a5700a42e952e42568fab49239b17071-2044969611.us-west-2.elb.amazonaws.com has address 44.235.109.1
a5700a42e952e42568fab49239b17071-2044969611.us-west-2.elb.amazonaws.com has address 35.155.121.141

Get Cluster 2 eastwestgateway Host/IP:

$ kubectl --context="${CTX_CLUSTER2}" -n istio-system get svc/istio-eastwestgateway
NAME                    TYPE           CLUSTER-IP       EXTERNAL-IP                                                               PORT(S)                                                           AGE
istio-eastwestgateway   LoadBalancer   172.20.252.110   a333ff3671278406f804e20da78c800d-1702858926.us-west-2.elb.amazonaws.com   15021:31407/TCP,15443:30253/TCP,15012:32116/TCP,15017:32546/TCP   12m

$ host a333ff3671278406f804e20da78c800d-1702858926.us-west-2.elb.amazonaws.com
a333ff3671278406f804e20da78c800d-1702858926.us-west-2.elb.amazonaws.com has address 44.232.122.66
a333ff3671278406f804e20da78c800d-1702858926.us-west-2.elb.amazonaws.com has address 35.155.125.147

Desired change to data.meshNetworks:

  meshNetworks: |-
    networks:
      network1:
        endpoints:
          - fromRegistry: cluster1
        gateways:
          - address: 44.235.109.1
            port: 15443
          - address: 35.155.121.141
            port: 15443
      network2:
        endpoints:
          - fromRegistry: cluster2
        gateways:
          - address: 44.232.122.66
            port: 15443
          - address: 35.155.125.147
            port: 15443

Define Cluster 1's configmap/istio data.meshNetworks manually

$ kubectl --context="${CTX_CLUSTER1}" -n istio-system edit configmap/istio

$ kubectl --context="${CTX_CLUSTER1}" -n istio-system get configmap/istio -o yaml
apiVersion: v1
data:
  mesh: |-
    accessLogFile: /dev/stdout
    defaultConfig:
      discoveryAddress: istiod.istio-system.svc:15012
      meshId: mesh1
      proxyMetadata:
        DNS_AGENT: ""
      tracing:
        zipkin:
          address: zipkin.istio-system:9411
    enablePrometheusMerge: true
    rootNamespace: istio-system
    trustDomain: cluster.local
  meshNetworks: |-
    networks:
      network1:
        endpoints:
          - fromRegistry: cluster1
        gateways:
          - address: 44.235.109.1
            port: 15443
          - address: 35.155.121.141
            port: 15443
      network2:
        endpoints:
          - fromRegistry: cluster2
        gateways:
          - address: 44.232.122.66
            port: 15443
          - address: 35.155.125.147
            port: 15443
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"mesh":"accessLogFile: /dev/stdout\ndefaultConfig:\n  discoveryAddress: istiod.istio-system.svc:15012\n  meshId: mesh1\n  proxyMetadata:\n    DNS_AGENT: \"\"\n  tracing:\n    zipkin:\n      address: zipkin.istio-system:9411\nenablePrometheusMerge: true\nrootNamespace: istio-system\ntrustDomain: cluster.local","meshNetworks":"networks: {}"},"kind":"ConfigMap","metadata":{"annotations":{},"labels":{"install.operator.istio.io/owning-resource":"unknown","install.operator.istio.io/owning-resource-namespace":"istio-system","istio.io/rev":"default","operator.istio.io/component":"Pilot","operator.istio.io/managed":"Reconcile","operator.istio.io/version":"1.8.0","release":"istio"},"name":"istio","namespace":"istio-system"}}
  creationTimestamp: "2020-12-04T18:53:32Z"
  labels:
    install.operator.istio.io/owning-resource: unknown
    install.operator.istio.io/owning-resource-namespace: istio-system
    istio.io/rev: default
    operator.istio.io/component: Pilot
    operator.istio.io/managed: Reconcile
    operator.istio.io/version: 1.8.0
    release: istio
  name: istio
  namespace: istio-system
  resourceVersion: "8054"
  selfLink: /api/v1/namespaces/istio-system/configmaps/istio
  uid: 134aeebe-19e1-4ed3-8347-d1069c7468ee

Define Cluster 2's configmap/istio data.meshNetworks manually

$ kubectl --context="${CTX_CLUSTER2}" -n istio-system edit configmap/istio

$ kubectl --context="${CTX_CLUSTER2}" -n istio-system get configmap/istio -o yaml
apiVersion: v1
data:
  mesh: |-
    accessLogFile: /dev/stdout
    defaultConfig:
      discoveryAddress: istiod.istio-system.svc:15012
      meshId: mesh1
      proxyMetadata:
        DNS_AGENT: ""
      tracing:
        zipkin:
          address: zipkin.istio-system:9411
    enablePrometheusMerge: true
    rootNamespace: istio-system
    trustDomain: cluster.local
  meshNetworks: |-
    networks:
      network1:
        endpoints:
          - fromRegistry: cluster1
        gateways:
          - address: 44.235.109.1
            port: 15443
          - address: 35.155.121.141
            port: 15443
      network2:
        endpoints:
          - fromRegistry: cluster2
        gateways:
          - address: 44.232.122.66
            port: 15443
          - address: 35.155.125.147
            port: 15443
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"mesh":"accessLogFile: /dev/stdout\ndefaultConfig:\n  discoveryAddress: istiod.istio-system.svc:15012\n  meshId: mesh1\n  proxyMetadata:\n    DNS_AGENT: \"\"\n  tracing:\n    zipkin:\n      address: zipkin.istio-system:9411\nenablePrometheusMerge: true\nrootNamespace: istio-system\ntrustDomain: cluster.local","meshNetworks":"networks: {}"},"kind":"ConfigMap","metadata":{"annotations":{},"labels":{"install.operator.istio.io/owning-resource":"unknown","install.operator.istio.io/owning-resource-namespace":"istio-system","istio.io/rev":"default","operator.istio.io/component":"Pilot","operator.istio.io/managed":"Reconcile","operator.istio.io/version":"1.8.0","release":"istio"},"name":"istio","namespace":"istio-system"}}
  creationTimestamp: "2020-12-04T18:55:04Z"
  labels:
    install.operator.istio.io/owning-resource: unknown
    install.operator.istio.io/owning-resource-namespace: istio-system
    istio.io/rev: default
    operator.istio.io/component: Pilot
    operator.istio.io/managed: Reconcile
    operator.istio.io/version: 1.8.0
    release: istio
  name: istio
  namespace: istio-system
  resourceVersion: "8089"
  selfLink: /api/v1/namespaces/istio-system/configmaps/istio
  uid: 06c181d7-5dd7-477e-9591-f51a680a409

istiod logs confirming configmap changes are picked up:

2020-12-04T19:11:21.877673Z    info    mesh networks configuration updated to: {
    "networks": {
        "network1": {
            "endpoints": [
                {
                    "fromRegistry": "cluster1"
                }
            ],
            "gateways": [
                {
                    "address": "44.235.109.1",
                    "port": 15443
                },
                {
                    "address": "35.155.121.141",
                    "port": 15443
                }
            ]
        },
        "network2": {
            "endpoints": [
                {
                    "fromRegistry": "cluster2"
                }
            ],
            "gateways": [
                {
                    "address": "44.232.122.66",
                    "port": 15443
                },
                {
                    "address": "35.155.125.147",
                    "port": 15443
                }
            ]
        }
    }
}

It works - we see responses from v1 and v2!

$ kubectl exec --context="${CTX_CLUSTER1}" -n sample -c sleep     "$(kubectl get pod --context="${CTX_CLUSTER1}" -n sample -l \
    app=sleep -o jsonpath='{.items[0].metadata.name}')"     -- sh -c "while true; do curl -s helloworld.sample:5000/hello; done"
Hello version: v2, instance: helloworld-v2-776f74c475-skrtm
Hello version: v2, instance: helloworld-v2-776f74c475-skrtm
Hello version: v1, instance: helloworld-v1-578dd69f69-x2nv9
Hello version: v1, instance: helloworld-v1-578dd69f69-x2nv9
Hello version: v1, instance: helloworld-v1-578dd69f69-x2nv9
Hello version: v1, instance: helloworld-v1-578dd69f69-x2nv9
Hello version: v1, instance: helloworld-v1-578dd69f69-x2nv9
Hello version: v2, instance: helloworld-v2-776f74c475-skrtm
Hello version: v1, instance: helloworld-v1-578dd69f69-x2nv9
Hello version: v2, instance: helloworld-v2-776f74c475-skrtm
Hello version: v1, instance: helloworld-v1-578dd69f69-x2nv9
Hello version: v1, instance: helloworld-v1-578dd69f69-x2nv9
Hello version: v1, instance: helloworld-v1-578dd69f69-x2nv9
Hello version: v2, instance: helloworld-v2-776f74c475-skrtm
Hello version: v2, instance: helloworld-v2-776f74c475-skrtm
Hello version: v2, instance: helloworld-v2-776f74c475-skrtm
Hello version: v2, instance: helloworld-v2-776f74c475-skrtm
Hello version: v1, instance: helloworld-v1-578dd69f69-x2nv9
Hello version: v1, instance: helloworld-v1-578dd69f69-x2nv9
Hello version: v1, instance: helloworld-v1-578dd69f69-x2nv9
Hello version: v2, instance: helloworld-v2-776f74c475-skrtm
Hello version: v1, instance: helloworld-v1-578dd69f69-x2nv9
Hello version: v1, instance: helloworld-v1-578dd69f69-x2nv9
Hello version: v2, instance: helloworld-v2-776f74c475-skrtm

@stevenctl
Contributor

@carnei-ro that's a great suggestion – I've put it in this doc and hopefully we can prioritize the impl soon

https://docs.google.com/document/d/1Sbg6hyO9NAOagtHxsg6H-OlQKoTpy8zaXVEphlj5R-M/edit?usp=sharing&resourcekey=0-Aqzm_-tOzxlN46Qijpf7cw

@nightmareze1

I fixed this problem by adding the temporary ELB IPs to the ConfigMap, in this part:

kubectl edit configmap istio -n istio-system

    meshNetworks: |-
      networks:
        network1:
          endpoints:
            - fromRegistry: cluster1
          gateways:
            - address: 52.21.68.210   # EKS ELB IP
              port: 15443
            - address: 100.24.96.26   # EKS ELB IP
              port: 15443
        network2:
          endpoints:
            - fromRegistry: cluster2
          gateways:
            - address: 52.127.65.88   # AKS IP
              port: 15443

Then you need to kill the istiod pod to reload the config.

This is working for me, but I'm waiting for a better solution and for the feature request to support CNAMEs in the gateway address, because load balancers in AWS use Elastic IPs / dynamic IPs!
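For reference, assuming a default install where istiod runs as the deployment named istiod in istio-system, the "kill the pod" step can also be done as a rolling restart:

kubectl rollout restart deployment/istiod -n istio-system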

[Screenshot attached: Screen Shot 2021-05-11 at 18 45 40]

Feature Request:
+1

@nightmareze1

nightmareze1 commented Jul 23, 2021

Any news about this issue? @stevenctl

@stevenctl
Contributor

It will take a non-trivial amount of work to support automatically reading the hostnames and resolving them at the proxy. We hope to prioritize in 1.12 but contributions are welcome (see doc)

@istio-policy-bot removed the lifecycle/stale label (indicates a PR or issue hasn't been touched by an Istio team member for a while) on Jul 23, 2021
@tr-srij

tr-srij commented Aug 11, 2021

We have written a Python snippet which runs as a cronjob on our cluster every x minutes and updates the configmap and service entries with the AWS NLB ENI IPs. Works wonders for us. :D
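For illustration only, here is a minimal Python sketch of that kind of job. This is not @tr-srij's actual script: the gateway hostname and network key are assumptions, it patches only meshNetworks (their job also updates ServiceEntries), and it relies on the istio ConfigMap layout shown later in this thread.

import socket
import yaml
from kubernetes import client, config

GATEWAY_HOSTNAME = "my-eastwest-gw.elb.us-east-1.amazonaws.com"  # assumption: the remote cluster's NLB DNS name
NETWORK_NAME = "network2"                                        # assumption: that cluster's key under meshNetworks

def resolve_ips(hostname):
    # gethostbyname_ex returns (hostname, aliases, ip_list); de-duplicate and sort for a stable comparison
    return sorted(set(socket.gethostbyname_ex(hostname)[2]))

def main():
    config.load_incluster_config()  # the job runs inside the cluster as a CronJob
    v1 = client.CoreV1Api()
    cm = v1.read_namespaced_config_map("istio", "istio-system")
    mesh_networks = yaml.safe_load(cm.data["meshNetworks"])
    gateways = [{"address": ip, "port": 15443} for ip in resolve_ips(GATEWAY_HOSTNAME)]
    network = mesh_networks["networks"][NETWORK_NAME]
    if network.get("gateways") == gateways:
        return  # nothing changed, leave the ConfigMap alone
    network["gateways"] = gateways
    cm.data["meshNetworks"] = yaml.safe_dump(mesh_networks)
    v1.patch_namespaced_config_map("istio", "istio-system", cm)

if __name__ == "__main__":
    main()

As with the manual workaround above, istiod may need to be restarted to pick up the new meshNetworks values.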

@carnei-ro

@tr-srij do the IPs of the NLB change? My Technical Account Manager told me they wouldn't change... Are you using an internal NLB or an external NLB?

@tr-srij

tr-srij commented Aug 11, 2021

Internal, and yes, they won't change. We manage creation of those NLBs via the ALB ingress controller. We just run this job on a cron so that if we ever need to delete/recreate this infra we aren't in the business of manually updating these configmaps and service entries. The job only updates them if the IPs have changed.

It can be a simple k8s Job as opposed to a CronJob as well.

@nightmareze1

nightmareze1 commented Aug 13, 2021

@tr-srij can you share your script? I wrote a script in Golang but I need more time for testing.

@markszabo

I ran into this issue too, and ended up writing up the workaround described above: https://szabo.jp/2021/09/22/multicluster-istio-on-eks/

@jwilner

jwilner commented Sep 22, 2021

@markszabo the TLS creds up there are borked. Either you're being MITM'd or you should sort them out :). Either way, would love to read your post.

@markszabo

@jwilner the site is hosted via Github pages which had an outage earlier today. Could you check it again?

@jwilner

jwilner commented Sep 23, 2021

@markszabo not sure what's up, but your cert still seems borked and Chrome is rejecting it. I downloaded the cert with openssl (visible here) and it looks pretty borked.

Back to the matter at hand though -- I'm not sure if anyone else has hit this when applying the workaround to a multi-primary deployment, but I allocated EIPs to two NLBs here and updated the associated config, and now the proxy endpoints are borked for the sample workloads and I see the envoy error LbEndpointValidationError.LoadBalancingWeight: value must be greater than or equal to 1 in the istiod logs (here from west):

2021-09-23T13:08:01.396294Z	warn	ads	ADS:EDS: ACK ERROR sleep-557747455f-8mtxg.sample-105 Internal:Proto constraint validation failed (ClusterLoadAssignmentValidationError.Endpoints[0]: embedded message failed validation | caused by LocalityLbEndpointsValidationError.LbEndpoints[1]: embedded message failed validation | caused by LbEndpointValidationError.LoadBalancingWeight: value must be greater than or equal to 1): cluster_name: "outbound|80||sleep.sample.svc.cluster.local"
endpoints {
  locality {
    region: "us-east-1"
    zone: "us-east-1a"
  }
  lb_endpoints {
    endpoint {
      address {
        socket_address {
          address: "US-EAST-1A-IP"
          port_value: 15443
        }
      }
    }
    metadata {
      filter_metadata {
        key: "envoy.transport_socket_match"
        value {
          fields {
            key: "tlsMode"
            value {
              string_value: "istio"
            }
          }
        }
      }
      filter_metadata {
        key: "istio"
        value {
          fields {
            key: "workload"
            value {
              string_value: ";;;;us-west-2-cluster"
            }
          }
        }
      }
    }
    load_balancing_weight {
      value: 1
    }
  }
  lb_endpoints {
    endpoint {
      address {
        socket_address {
          address: "US-EAST-1B-IP"
          port_value: 15443
        }
      }
    }
    metadata {
      filter_metadata {
        key: "envoy.transport_socket_match"
        value {
          fields {
            key: "tlsMode"
            value {
              string_value: "istio"
            }
          }
        }
      }
      filter_metadata {
        key: "istio"
        value {
          fields {
            key: "workload"
            value {
              string_value: ";;;;us-west-2-cluster"
            }
          }
        }
      }
    }
    load_balancing_weight {
    }
  }
...

Fwiw, kubectl get cm -n istio-system istio -ojsonpath='{.data.meshNetworks}':

networks:
  us-east-1-vpc-id:
    endpoints:
    - fromRegistry: us-east-1-cluster
    gateways:
    - address: US-EAST-1A-IP
      port: 15443
    - address: US-EAST-1B-IP
      port: 15443
    - address: US-EAST-1C-IP
      port: 15443
  us-west-2-vpc-id:
    endpoints:
    - fromRegistry: us-west-2-cluster
    gateways:
    - address: US-WEST-2A-IP
      port: 15443
    - address: US-WEST-2B-IP
      port: 15443
    - address: US-WEST-2C-IP
      port: 15443

@savealive

We're experiencing the same problem as described in comment #29359 (comment) after upgrading Istio to 1.11.2. On 1.11.1 everything is OK.

@stevenctl
Contributor

#34810 merged into release-1.11 between those two releases and could be the culprit (unrelated to the main problem in this issue).

@mangelajo

I'm experiencing this same issue:

2021-09-28T10:13:34.603924Z	warn	model	Failed parsing gateway address a9bb0aca6a026447b953d4b5c02d49ef-a670fe8f0737be62.elb.eu-central-1.amazonaws.com from Service Registry. Hostnames are not supported for gateways
2021-09-28T10:13:34.603947Z	warn	model	Failed parsing gateway address a9bb0aca6a026447b953d4b5c02d49ef-a670fe8f0737be62.elb.eu-central-1.amazonaws.com from Service Registry. Hostnames are not supported for gateways
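Until hostname support lands, the workaround discussed in this thread is to resolve a hostname like the one in that warning out of band and put the resulting IPs into meshNetworks (keeping in mind they can change). As a quick, illustrative way to list the current IPs behind that ELB hostname, assuming Python is handy:

import socket
print(sorted(set(socket.gethostbyname_ex(
    "a9bb0aca6a026447b953d4b5c02d49ef-a670fe8f0737be62.elb.eu-central-1.amazonaws.com")[2])))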

@markszabo

Here is my workaround for the problem as a shell script running in a CronJob: https://github.com/markszabo/istio-crosscluster-workaround-for-eks Any feedback is welcome!
