Multicluster Replicated control planes with operator controller #21544

Closed
elfinhe opened this issue Feb 26, 2020 · 6 comments
Labels
area/environments lifecycle/automatically-closed Indicates a PR or issue that has been closed automatically. lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while

Comments

@elfinhe
Member

elfinhe commented Feb 26, 2020

I am testing multicluster/gateways/ with the operator in controller mode for #21435.

Istio Version: istio-1.5.0-beta.3
K8s version: 1.15.8-gke.3
Nodes: 4

Bug description

  1. Follow the controller guide to install the Istio operator controller.
  2. Follow the multicluster user guide to install the multicluster mesh.

The same IstioOperator.yaml installs two different sets of pods depending on whether it is applied through the operator in controller mode or through istioctl. The pods installed by the controller never reach AVAILABLE status.

kubectl get deploy -n istio-system 
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
istio-ingressgateway   0/1     1            0           6m58s
istiod                 1/1     1            1           6m54s
prometheus             0/1     1            0           6m56s
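
To pick out the stuck deployments from a table like the one above, a small parser can help. This is only a sketch: the function name `unavailable_deployments` is hypothetical, and it assumes the default column layout of `kubectl get deploy` (whitespace-separated, with `NAME` and `AVAILABLE` headers).

```python
from typing import List


def unavailable_deployments(table: str) -> List[str]:
    """Return the names of deployments whose AVAILABLE column is 0.

    Assumes whitespace-separated columns with a header row that
    contains NAME and AVAILABLE, as in the output quoted above.
    """
    lines = [ln for ln in table.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    name_idx = header.index("NAME")
    avail_idx = header.index("AVAILABLE")

    stuck = []
    for ln in lines[1:]:
        cols = ln.split()
        if int(cols[avail_idx]) == 0:
            stuck.append(cols[name_idx])
    return stuck


# The table from this report, pasted verbatim.
SAMPLE = """\
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
istio-ingressgateway   0/1     1            0           6m58s
istiod                 1/1     1            1           6m54s
prometheus             0/1     1            0           6m56s
"""

print(unavailable_deployments(SAMPLE))
```

For the table in this report it flags istio-ingressgateway and prometheus, while istiod counts as available.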

IstioOperator.yaml

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  namespace: istio-system
  name: example-istiocontrolplane
spec:
  addonComponents:
    istiocoredns:
      enabled: true

  components:
    egressGateways:
      - name: istio-egressgateway
        enabled: true

  values:
    global:
      # Provides dns resolution for global services
      podDNSSearchNamespaces:
        - global
        - "{{ valueOrDefault .DeploymentMeta.Namespace \"default\" }}.global"

      multiCluster:
        enabled: true

      controlPlaneSecurityEnabled: true

    # Multicluster with gateways requires a root CA
    # Cluster local CAs are bootstrapped with the root CA.
    security:
      selfSigned: false

    gateways:
      istio-egressgateway:
        env:
          # Needed to route traffic via egress gateway if desired.
          ISTIO_META_REQUESTED_NETWORK_VIEW: "external"

Expected behavior
The operator controller should install the same resources as istioctl.

Steps to reproduce the bug

  1. Follow the controller guide to install the Istio operator controller.
  2. Follow the multicluster user guide to install the multicluster mesh.

Version (include the output of istioctl version --remote and kubectl version and helm version if you used Helm)

Istio Version: istio-1.5.0-beta.3
K8s version: 1.15.8-gke.3
Nodes: 4

Environment where bug was observed (cloud vendor, OS, etc)
Client Linux, Server GKE.

@elfinhe elfinhe pinned this issue Feb 26, 2020
@elfinhe elfinhe unpinned this issue Feb 26, 2020
@elfinhe
Member Author

elfinhe commented Feb 26, 2020

I keep seeing validation errors in the istiod logs:

2020-02-26T20:03:52.968090Z     info    Handle EDS: 0 endpoints for istiod in namespace istio-system
2020-02-26T20:03:53.004964Z     info    successfully acquired lease istio-system/istio-leader
2020-02-26T20:03:53.005169Z     info    Starting validation controller
2020-02-26T20:03:53.005377Z     info    validationController    Reconcile(enter): initial request to kickstart reconciliation
2020-02-26T20:03:53.005213Z     info    Starting ingress status controller
2020-02-26T20:03:53.005412Z     info    validationController    Endpoint istiod is not ready: resource not found
2020-02-26T20:03:53.005648Z     warn    validationController    validatingwebhookconfiguration.admissionregistration.k8s.io "istiod-istio-system" not found
2020-02-26T20:03:53.015887Z     info    ads     Push debounce stable[1] 35: 100.248639ms since last change, 199.904706ms since last push, full=true
2020-02-26T20:03:53.026174Z     info    ads     XDS: Pushing:2020-02-26T20:03:53Z/1 Services:14 ConnectedEndpoints:0
2020-02-26T20:03:53.026257Z     info    ads     Cluster init time 1.143µs 2020-02-26T20:03:53Z/1
2020-02-26T20:03:53.032474Z     info    validationController    Reconcile(enter): add event (v1, Kind=Endpoints) istio-system/istiod
2020-02-26T20:03:53.032510Z     info    validationController    Endpoint istiod is not ready: no subset addresses ready
2020-02-26T20:03:53.032753Z     warn    validationController    validatingwebhookconfiguration.admissionregistration.k8s.io "istiod-istio-system" not found

The following lines then keep repeating:

2020-02-26T20:04:15.367927Z     info    validationController    Reconcile(enter): retry dry-run creation of invalid config
2020-02-26T20:04:15.373298Z     info    validationController    Not ready to switch validation to fail-closed: dummy invalid config not rejected
2020-02-26T20:04:15.373549Z     warn    validationController    validatingwebhookconfiguration.admissionregistration.k8s.io "istiod-istio-system" not found

@elfinhe
Member Author

elfinhe commented Feb 26, 2020

IngressGateway logs:

2020-02-26T20:03:57.882900Z     info    Received new config, creating new Envoy epoch 0
2020-02-26T20:03:57.883041Z     info    Epoch 0 starting
2020-02-26T20:03:57.894888Z     info    Envoy command: [-c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster istio-ingressgateway --service-node router~10.16.4.35~istio-ingressgateway-864c79bd44-7zt9c.istio-system~istio-system.svc.cluster.local --max-obj-name-len 189 --local-address-ip-version v4 --log-format [Envoy (Epoch 0)] [%Y-%m-%d %T.%e][%t][%l][%n] %v -l warning --component-log-level misc:error]
[Envoy (Epoch 0)] [2020-02-26 20:03:57.947][22][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 14, no healthy upstream
[Envoy (Epoch 0)] [2020-02-26 20:03:57.947][22][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:54] Unable to establish new stream
2020-02-26T20:03:57.954919Z     info    sds     node:router~10.16.4.35~istio-ingressgateway-864c79bd44-7zt9c.istio-system~istio-system.svc.cluster.local-1 resource:default new connection
[Envoy (Epoch 0)] [2020-02-26 20:03:58.295][22][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 14, no healthy upstream
[Envoy (Epoch 0)] [2020-02-26 20:03:58.295][22][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:54] Unable to establish new stream
2020-02-26T20:03:58.698619Z     info    grpc: addrConn.createTransport failed to connect to {istio-pilot.istio-system.svc:15012  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.19.255.233:15012: connect: connection refused". Reconnecting...
2020-02-26T20:03:58.698736Z     info    pickfirstBalancer: HandleSubConnStateChange: 0xc000345a70, {TRANSIENT_FAILURE connection error: desc = "transport: Error while dialing dial tcp 10.19.255.233:15012: connect: connection refused"}
2020-02-26T20:03:58.698785Z     error   citadelclient   Failed to create certificate: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.19.255.233:15012: connect: connection refused"
2020-02-26T20:03:58.749039Z     warn    cache   node:router~10.16.4.35~istio-ingressgateway-864c79bd44-7zt9c.istio-system~istio-system.svc.cluster.local-1 resource:default request:5e74e0d5-3d7b-4fde-98aa-824d41d804a4 CSR failed with error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.19.255.233:15012: connect: connection refused", retry in 50 millisec
2020-02-26T20:03:58.749135Z     error   citadelclient   Failed to create certificate: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.19.255.233:15012: connect: connection refused"
2020-02-26T20:03:58.814571Z     info    Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2020-02-26T20:03:58.849343Z     warn    cache   node:router~10.16.4.35~istio-ingressgateway-864c79bd44-7zt9c.istio-system~istio-system.svc.cluster.local-1 resource:default request:5e74e0d5-3d7b-4fde-98aa-824d41d804a4 CSR failed with error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.19.255.233:15012: connect: connection refused", retry in 100 millisec
2020-02-26T20:03:58.849496Z     error   citadelclient   Failed to create certificate: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.19.255.233:15012: connect: connection refused"

@elfinhe
Member Author

elfinhe commented Mar 2, 2020

Trying again with the latest 1.5 branch, I only get EgressGateway pods (not ready) and CoreDNS pods (ready).

Logs in EgressGateway:

2020-03-02T20:16:44.165786Z     info    Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected                                                                                                                                                                        
2020-03-02T20:16:44.418964Z     info    sds     resource:default new connection                                                                                                                                                                                                                                                                                           
2020-03-02T20:16:44.621226Z     error   citadelclient   Failed to create certificate: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istio-pilot.istio-system.svc on 10.7.240.10:53: no such host"                                                                                                        
2020-03-02T20:16:44.671435Z     warn    cache   resource:default request:3e9f4c35-a57d-4722-8b51-13e18bc82fb6 CSR failed with error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istio-pilot.istio-system.svc on 10.7.240.10:53: no such host", retry in 50 millisec                                   
2020-03-02T20:16:44.671556Z     error   citadelclient   Failed to create certificate: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istio-pilot.istio-system.svc on 10.7.240.10:53: no such host"                                                                                                        
2020-03-02T20:16:44.771743Z     warn    cache   resource:default request:3e9f4c35-a57d-4722-8b51-13e18bc82fb6 CSR failed with error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istio-pilot.istio-system.svc on 10.7.240.10:53: no such host", retry in 100 millisec 
2020-03-02T20:16:44.771846Z     error   citadelclient   Failed to create certificate: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istio-pilot.istio-system.svc on 10.7.240.10:53: no such host"
2020-03-02T20:16:44.972062Z     warn    cache   resource:default request:3e9f4c35-a57d-4722-8b51-13e18bc82fb6 CSR failed with error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istio-pilot.istio-system.svc on 10.7.240.10:53: no such host", retry in 200 millisec
2020-03-02T20:16:44.972233Z     error   citadelclient   Failed to create certificate: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istio-pilot.istio-system.svc on 10.7.240.10:53: no such host"   

Installation with istioctl works well.
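
The two gateway logs above fail in different ways: the first shows `connection refused` when dialing 10.19.255.233:15012, the second shows `no such host` when looking up istio-pilot.istio-system.svc. A rough sketch for tallying these from a saved log follows. The function name `classify_dial_failures` and both regular expressions are assumptions written against the log lines quoted in this issue, not part of any Istio tooling.

```python
import re
from collections import Counter
from typing import Iterable

# Patterns written against the gRPC dial errors quoted in this issue.
REFUSED = re.compile(r"dial tcp (\S+): connect: connection refused")
NO_HOST = re.compile(r"dial tcp: lookup (\S+) on \S+: no such host")


def classify_dial_failures(lines: Iterable[str]) -> Counter:
    """Count dial failures per (kind, target) pair."""
    counts: Counter = Counter()
    for line in lines:
        match = REFUSED.search(line)
        if match:
            counts[("connection refused", match.group(1))] += 1
            continue
        match = NO_HOST.search(line)
        if match:
            counts[("no such host", match.group(1))] += 1
    return counts


# Shortened copies of two error lines from the logs above.
SAMPLE_LINES = [
    'error citadelclient Failed to create certificate: rpc error: code = Unavailable '
    'desc = connection error: desc = "transport: Error while dialing dial tcp '
    '10.19.255.233:15012: connect: connection refused"',
    'error citadelclient Failed to create certificate: rpc error: code = Unavailable '
    'desc = connection error: desc = "transport: Error while dialing dial tcp: lookup '
    'istio-pilot.istio-system.svc on 10.7.240.10:53: no such host"',
]

for (kind, target), n in classify_dial_failures(SAMPLE_LINES).items():
    print(kind, target, n)
```

Grouping by target makes it easier to see whether the service name fails to resolve at all or resolves but has nothing listening on the port.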

@istio-policy-bot istio-policy-bot added the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Jun 1, 2020
@istio-policy-bot

🚧 This issue or pull request has been closed due to not having had activity from an Istio team member since 2020-03-02. If you feel this issue or pull request deserves attention, please reopen the issue. Please see this wiki page for more information. Thank you for your contributions.

Created by the issue and PR lifecycle manager.

@istio-policy-bot istio-policy-bot added the lifecycle/automatically-closed Indicates a PR or issue that has been closed automatically. label Jun 16, 2020
@HelloChenHZ

Hi, I hit the same problem in a GCP private cluster with Istio 1.6.9 installed. Have you solved this?

@m0ps

m0ps commented Sep 16, 2020

@HelloChenHZ your issue may be related to #24609.
I had a similar problem, and #24609 helped me fix it.
