
Internal Kubernetes API Calls Blocked by Istio #8696

Closed · rsnj opened this issue Sep 13, 2018 · 26 comments

rsnj commented Sep 13, 2018

Describe the bug
I'm installing a monitoring service into my pod that needs to call the Kubernetes API server. The request is being blocked by the Istio sidecar. If I disable istio-injection and redeploy, everything works as expected. Do I need to enable anything to make this work?

Expected behavior
My pods can access the internal Kubernetes API

Steps to reproduce the bug

curl https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT/api/v1/namespaces/default/pods

from inside my pod gets no response.

Version
Istio:

Version: 1.0.2
GitRevision: d639408fded355fb906ef2a1f9e8ffddc24c3d64
User: root@
Hub: gcr.io/istio-release
GolangVersion: go1.10.1
BuildStatus: Clean

Kubernetes:

Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-08T16:31:10Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:08:19Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

Installation

helm install install/kubernetes/helm/istio \
    --name istio \
    --namespace istio-system \
    --set certmanager.enabled=true

Environment
Microsoft Azure AKS

@rsnj rsnj changed the title Internal Kubernetes API Calls Blocked by Istio Sidecar Internal Kubernetes API Calls Blocked by Istio Sep 20, 2018
@krancour (Contributor)

I can confirm this issue exists, and it is also the root cause of Knative not working on AKS: their autoscaler, which is a controller, is unable to sync with the Kubernetes apiserver. Disabling the istio-proxy sidecar on that pod fixes things, but my sense is that's not the right fix.

It's perplexing to me that this occurs in AKS but not elsewhere.

Can someone help me troubleshoot this? (I work for Azure.)
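For reference, the per-workload sidecar opt-out mentioned above is done with a pod-template annotation. A minimal sketch, assuming the standard `sidecar.istio.io/inject` annotation (the workload name and image are placeholders, not Knative's actual manifest):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: autoscaler              # placeholder name
spec:
  selector:
    matchLabels:
      app: autoscaler
  template:
    metadata:
      labels:
        app: autoscaler
      annotations:
        sidecar.istio.io/inject: "false"   # skip istio-proxy injection for this pod
    spec:
      containers:
      - name: autoscaler
        image: example/autoscaler:latest   # placeholder image
```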


m1o1 commented Nov 28, 2018

We have this problem too. Because the AKS apiserver is now accessed over a public URL, I made the following ServiceEntry and VirtualService to account for that:

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: azmk8s-ext
spec:
  hosts:
  - "<my-cluster>.hcp.centralus.azmk8s.io"
  location: MESH_EXTERNAL
  ports:
  - number: 443
    name: https
    protocol: HTTPS
  resolution: DNS
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: tls-routing
spec:
  hosts:
  - <my-cluster>.hcp.centralus.azmk8s.io
  tls:
  - match:
    - port: 443
      sniHosts:
      - <my-cluster>.hcp.centralus.azmk8s.io
    route:
    - destination:
        host: <my-cluster>.hcp.centralus.azmk8s.io

This gets me to the point where I can access the apiserver, but after 5 minutes it stops working and calls to the apiserver hang. Calls to Go's net.LookupIP(host) with the FQDN of the AKS apiserver also hang during this period.

I found that if I wait 10-15 minutes the problem seems to resolve itself, but it starts failing again after another 5 minutes. Making a request while it's working also seems to delay the point where it stops: I made a request, then another one minute later, and it started failing 5 minutes after the second request, not the first.

I should mention that curl requests made directly to the API server after I kubectl exec into the pod DO succeed. That made me think maybe I was using client-go incorrectly, but the fact that net.LookupIP(host) also hangs makes me think that probably isn't the case.

@containerpope

I am having the same issue with the RabbitMQ Kubernetes peer discovery plugin, which wants to list the other pods but can't connect to the AKS API server. The external ServiceEntry didn't work; the only way to fix it was to set the IP range.

I only experienced this issue on AKS.


m1o1 commented Dec 24, 2018

@mkjoerg, does the plugin use client-go? If so, there's a strange issue that seems to happen only when combining Istio, AKS, and client-go (any 2 of the 3 are fine): kubernetes/client-go#527

@containerpope

@adinunzio84, no, the RabbitMQ plugin is Erlang-based.


krancour commented Jan 29, 2019

While kubernetes/client-go#527 may technically still be an issue, I want to point out that there is now, at least, a workaround in place on the AKS end.

A mutating webhook now overwrites environment variables such as KUBERNETES_SERVICE_HOST for all pods with the external DNS name of your Kubernetes apiserver. The practical effect is that traffic bound for the apiserver exits and re-enters the cluster (which isn't ideal, hence my counting this as a workaround rather than a solution). The load balancer(s) involved in this alternative route to the apiserver are not subject to the difficulties explained in kubernetes/client-go#527.

While this may be more of a workaround than a strategic solution, it's fair to say that this issue is effectively remediated. Should we consider closing it?

EDIT: Because the apiserver address will appear to be external, you do have to add an appropriate ServiceEntry.
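A sketch of what such a ServiceEntry might look like, modeled on the one posted earlier in this thread (the FQDN is a placeholder for your own cluster's external apiserver address):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: aks-apiserver
spec:
  hosts:
  - "<my-cluster>.hcp.<region>.azmk8s.io"   # your cluster's external apiserver FQDN
  location: MESH_EXTERNAL
  ports:
  - number: 443
    name: https
    protocol: HTTPS
  resolution: DNS
```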


m1o1 commented Jan 29, 2019

@krancour are you sure about: "The load balancer(s) that are involved in this alternative route to the apiserver are not subject to the difficulties explained in kubernetes/client-go#527."?

I commented here with more info about the issue I describe in kubernetes/client-go#527; it seems it's actually related more to the load balancers involved with AKS than to client-go.

If applications with Istio sidecars are able to access the API server after the 5-minute window (where the LB closes the connection), then I think this can be closed. Otherwise, in my opinion, this should remain open and depends on envoyproxy/envoy#3634

@krancour (Contributor)

@adinunzio84, the cluster-internal load balancers and externally facing load balancers are different. The externally facing load balancers have had TCP reset as an opt-in "preview" feature since mid-September, while the cluster-internal load balancers, if I understand correctly, still lack this feature.

https://azure.microsoft.com/en-us/updates/load-balancer-outbound-rules/

Oddly, when I dig down into the LB details, I cannot see any evidence that AKS actually enabled the feature in question when it deployed the cluster; however, I am currently observing the correct/desired behavior.

I'll follow up with my colleagues on the AKS team to figure out what's going on here.

If you want to try this yourself, perhaps you can independently verify / refute that this works as I claim.


m1o1 commented Jan 29, 2019

Sure I'll test it out when I have a chance. If I understand correctly, that TCP reset preview feature is for a Standard Load Balancer. One of the people I spoke with on the AKS team said that AKS does not have Standard Load Balancer support enabled yet, but it should happen soon.

@krancour (Contributor)

That is correct. The post I linked to does reference standard load balancers, whilst AKS currently uses basic load balancers only, which deepens the mystery of why this now works.


vcanaa commented Mar 5, 2019

Internal Kubernetes API calls seem to fail for the first few seconds. You might find this repro useful: see the update in #12187


stale bot commented Jun 3, 2019

This issue has been automatically marked as stale because it has not had activity in the last 90 days. It will be closed in the next 30 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jun 3, 2019

stale bot commented Jul 3, 2019

This issue has been automatically closed because it has not had activity in the last month and a half. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions.

@stale stale bot closed this as completed Jul 3, 2019

omerfsen commented Nov 5, 2020

I know this is an old issue, but I solved it with:

K8S_INTERNAL_API_IP=$(kubectl get svc kubernetes -o jsonpath='{.spec.clusterIP}')

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  ...
      proxy:
        autoInject: enabled
        clusterDomain: cluster.local
        componentLogLevel: misc:error
        enableCoreDump: false
        envoyStatsd:
          enabled: false
        excludeIPRanges: "${K8S_INTERNAL_API_IP}/32"

Path is:

.spec.values.global.proxy.excludeIPRanges
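Spelled out in a complete minimal IstioOperator, that path looks like the following sketch; the clusterIP below is a placeholder for the value returned by the kubectl command above:

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: control-plane          # placeholder name
spec:
  values:
    global:
      proxy:
        # bypass the sidecar for traffic to the kubernetes Service clusterIP;
        # substitute ${K8S_INTERNAL_API_IP}/32 from the command above
        excludeIPRanges: "10.0.0.1/32"
```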


taitelman commented Mar 8, 2021

How about this?
(Let's assume kubernetes.default.svc.cluster.local = 172.21.0.1, since most cloud providers use a static IP range for Kubernetes internals.)

kind: IstioOperator
metadata:
  namespace: operators
  name: my-iop
spec:
  profile: empty
  values:
    global:
      proxy:
        excludeIPRanges: "172.21.0.1/32" # don't let Istio egress rules block the K8s API
        ... more code here

Or, if it's only one pod that is affected, simply inject a pod annotation:

traffic.sidecar.istio.io/excludeOutboundIPRanges: "172.21.0.1/32" # don't let Istio egress rules block the K8s API
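In context, that annotation goes on the pod template of the affected workload. A hypothetical sketch (the workload name and image are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                 # placeholder
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        # keep apiserver-bound traffic out of the sidecar for this pod only
        traffic.sidecar.istio.io/excludeOutboundIPRanges: "172.21.0.1/32"
    spec:
      containers:
      - name: my-app
        image: example/my-app:latest   # placeholder image
```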

but the thing that works best for me is:

apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: k8s-api-ext
  namespace: istio-system 
spec:
  hosts:
    - kubernetes.default.svc.cluster.local
  addresses:
    - 172.21.0.1
  endpoints:
    - address: 172.21.0.1
  exportTo:
    - "*"
  location: MESH_EXTERNAL
  resolution: STATIC
  ports:
    - number: 443
      name: https-k8s
      protocol: HTTPS


xlanor commented Apr 29, 2021

(quoting @omerfsen's excludeIPRanges workaround from above)

Thank you so much, this worked perfectly.

I was running into an issue because I'm using kubectl in init containers to check job statuses before deploying a ReplicaSet, and couldn't connect to the API at all.
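For context, the kind of init-container check described here looks roughly like the following sketch; the image, job name, and jsonpath are illustrative assumptions, not taken from this thread:

```yaml
# Hypothetical init container that blocks until a Job completes,
# by polling the Kubernetes API via kubectl.
initContainers:
- name: wait-for-job
  image: bitnami/kubectl:latest        # any image with kubectl works
  command:
  - sh
  - -c
  - |
    until kubectl get job my-job -o jsonpath='{.status.succeeded}' | grep -q '^1'; do
      echo "waiting for job my-job..."; sleep 5
    done
```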


jdelgadillo commented May 12, 2021

We just switched from Contour to Istio (1.9.4) in our dev environments and are running into this issue a lot.

We've modified our IstioOperator settings with the excludeIPRanges mentioned in this issue, but we're still seeing the problem.

It tends to happen regularly when our nightly builds are deployed to our dev clusters (~2 AM PT). After enough restarts the problem seems to go away, but we've yet to find a surefire workaround/remedy.

Any other things we should be looking at?

@jdelgadillo

We managed to get around this issue with the following DestinationRule in our services' namespace:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: k8s-default-destrule
  namespace: my-service-namespace
spec:
  host: "kubernetes.default.svc" #Disabling it for Kube API Server communication
  trafficPolicy:
    tls:
      mode: DISABLE


taitelman commented Jun 8, 2021

Be aware that something has changed in terms of egress rule precedence in 1.8.5+.

@jdelgadillo

Even with the DestinationRule, we're still seeing intermittent issues talking to kubernetes.default.svc and to Elasticsearch instances in different namespaces. We've added retries in our code, as well as a VirtualService for kubernetes.default.svc with retries, yet we still see intermittent failures. The issues were present before, but not nearly as common as since the switch to Istio.
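As a sketch, a retry VirtualService of the kind described might look like the following; this is a guess at the shape (resource name and retry values are placeholders), not the poster's actual config, and HTTP-level retries only take effect if the sidecar routes this traffic as HTTP rather than opaque TLS:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: k8s-api-retries        # placeholder name
spec:
  hosts:
  - kubernetes.default.svc
  http:
  - retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: connect-failure,refused-stream,5xx
    route:
    - destination:
        host: kubernetes.default.svc
```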


taitelman commented Jun 9, 2021

OK, so I have another solution: try this mesh-wide Sidecar config.
It will be injected into every Istio-managed pod, allowing each pod to connect only to other pods in the same namespace, plus the istio-system namespace and the Kubernetes API service:

apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: istio-system  # or just place here your container NS
spec:  # also consider using a workloadSelector to fine tune your K8s API permissions to a specific pod
  egress:
    - hosts:
        - ./*
        - default/kubernetes.default.svc.cluster.local
        - default/etcd.default.svc.cluster.local
        - istio-system/*

Credit goes not to me but to @WilliamNewshutz and @gregoryhanson.

nestorsalceda pushed a commit to sysdiglabs/charts that referenced this issue Aug 27, 2021
Istio and the api-server do not play well together, like pizza
and pineapple:

* istio/istio#8696
* istio/istio#12187

The chart is open enough for users to set the podAnnotation with the
api-server instead of not deploying the sidecar in the webhook, but by
default we offer an option that works in most scenarios.
nestorsalceda pushed a commit to sysdiglabs/charts that referenced this issue Aug 27, 2021

* fix: Don't let istio instrument the webhook (same commit message as above)
* chore: Bump version
@pmalmsten

My team and I recently observed a somewhat similar issue with Istio and the Kubernetes API server (kube-apiserver) on Microsoft Azure Kubernetes Service (AKS), and found the suggestions in this thread to be helpful.

For our issue, we saw connection resets when attempting to contact kube-apiserver from pods having the istio sidecar injected:

root [ / ]# curl -k https://kubernetes.default:443
curl: (35) OpenSSL SSL_connect: Connection reset by peer in connection to kubernetes.default:443

We also saw TLSV1_ALERT_NO_APPLICATION_PROTOCOL in corresponding Envoy access log lines, in the %UPSTREAM_TRANSPORT_FAILURE_REASON% position.

We tried @taitelman's solution of adding a ServiceEntry marking the kubernetes.default service as MESH_EXTERNAL, and so far, so good: the problem is gone in our test environments. We haven't rolled it out to production yet, but it looks promising. Thanks @taitelman for the suggestion.

Istio version: 1.15.3

Kubernetes versions:

Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.3", GitCommit:"434bfd82814af038ad94d62ebe59b133fcb50506", GitTreeState:"clean", BuildDate:"2022-10-12T10:57:26Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"windows/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.3", GitCommit:"63a397b187f0a635240b37ad7b85d65ea2ae0ec3", GitTreeState:"clean", BuildDate:"2022-10-06T17:46:47Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}


Chaitan1991 commented Sep 6, 2023

Hi @taitelman and @pmalmsten,

I have installed Istio v1.18.0 and am trying to install Kiali v1.67.0 in two different namespaces, but I am facing the issue below. I have tried adding the Sidecar, ServiceEntry, and DestinationRule resources mentioned above by @taitelman and @jdelgadillo, but I still cannot resolve the issue with Kiali. The error log is below; could you please take a look and let me know what restriction on the AKS cluster might cause this and how it can be resolved?

2023-09-06T11:29:16Z WRN Cannot resolve local cluster name. Err: Get "https://10.67.224.1:443/apis/apps/v1/namespaces/istio-system/deployments/istiod": dial tcp 10.67.224.1:443: connect: connection refused. Falling back to 'Kubernetes'
2023-09-06T11:29:16Z INF Using authentication strategy [token]
2023-09-06T11:29:16Z INF Some validation errors will be ignored [KIA1201]. If these errors do occur, they will still be logged. If you think the validation errors you see are incorrect, please report them to the Kiali team if you have not done so already and provide the details of your scenario. This will keep Kiali validations strong for the whole community.
2023-09-06T11:29:16Z DBG Rest perf config QPS: 175.000000 Burst: 200
2023-09-06T11:29:16Z INF Initializing Kiali Cache
2023-09-06T11:29:16Z INF Adding a RegistryRefreshHandler
2023-09-06T11:29:16Z DBG [Kiali Cache] Using 'cluster' scoped Kiali Cache
2023-09-06T11:29:16Z WRN Error checking Istio API configuration: Get "https://10.67.224.1:443/apis/networking.istio.io": dial tcp 10.67.224.1:443: connect: connection refused
2023-09-06T11:29:16Z WRN Error checking Kubernetes Gateway API configuration: Get "https://10.67.224.1:443/apis/gateway.networking.k8s.io": dial tcp 10.67.224.1:443: connect: connection refused
2023-09-06T11:29:16Z DBG [Kiali Cache] Starting cluster-scoped informers
2023-09-06T11:29:16Z INF [Kiali Cache] Waiting for cluster-scoped cache to sync

@jayakasadev

@Chaitan1991 were you able to fix your issue?


Chaitan1991 commented Oct 26, 2023

Yes. Check that the new versions of Istio and Kiali you are installing come from the correct repo with the correct image tag. We were pulling from an old testing repo with the latest tag, and that wasn't updated code at all!

@elouanKeryell-Even

(quoting @taitelman's excludeIPRanges and ServiceEntry suggestions from above)
Thank you @taitelman, your ServiceEntry saved my life.
