Internal Kubernetes API Calls Blocked by Istio #8696
I can confirm this issue exists, and it is also the root cause of Knative not working on AKS: their autoscaler, which is a controller, is unable to sync with the Kubernetes apiserver. Disabling the sidecar injection works around it. It's perplexing to me that this occurs on AKS but not elsewhere. Can someone help me troubleshoot this? (I work for Azure.)
We have this problem. Because the apiserver is now accessed over a public URL, I made the following ServiceEntry and VirtualService to account for it:
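(The manifests themselves did not survive in this copy of the thread. A minimal sketch of what such a ServiceEntry might look like; the FQDN is a placeholder, not from the original comment:)

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: aks-apiserver
spec:
  hosts:
  - example-aks-cluster.hcp.eastus.azmk8s.io   # placeholder apiserver FQDN
  location: MESH_EXTERNAL
  ports:
  - number: 443
    name: https
    protocol: TLS
  resolution: DNS
```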
This gets it to the point where I can access the apiserver, but after 5 minutes it stops working and calls to the apiserver hang. Calls to Go's net.LookupIP(host) also hang during this period when given the FQDN of the AKS apiserver. If I wait 10-15 minutes, the problem seems to resolve itself, but it starts failing again after another 5 minutes. I also found that making a request to the apiserver while it's working seems to delay the point where it stops: I made a request, then another one minute later, and it started failing 5 minutes after the second request, not the first. I should mention that curl requests made directly to the API server when I …
I am having the same issue with the RabbitMQ Kubernetes peer discovery plugin, which wants to list the other pods but can't connect to the AKS API server. The external ServiceEntry didn't work; the only way to fix this was to set the IP range … I only experienced this issue on AKS.
@mkjoerg, does the plugin use client-go? If so, there's a strange issue that seems to only happen when combining Istio, AKS, and client-go (any 2 of the 3 are fine): kubernetes/client-go#527
@adinunzio84, no, the RabbitMQ plugin is Erlang-based.
While kubernetes/client-go#527 may technically still be an issue, I want to point out that there is now, at least, a workaround in place on the AKS end: a mutating webhook is now overwriting environment variables such as … While this may be more of a workaround than a strategic solution, it's fair to say that this issue is effectively remediated. Should we consider closing it?

EDIT: Because the apiserver address will appear to be external, you do have to add an appropriate …
@krancour, are you sure about "The load balancer(s) that are involved in this alternative route to the apiserver are not subject to the difficulties explained in kubernetes/client-go#527."? I added more info here about the issue I describe in kubernetes/client-go#527, and it seems it's actually more related to the load balancers involved with AKS than to client-go. If applications with Istio sidecars are able to access the API server after the 5-minute window (where the LB closes the connection), then I think this can be closed. Otherwise, in my opinion, this should remain open and depends on envoyproxy/envoy#3634.
@adinunzio84, the cluster-internal load balancers and the externally facing load balancers are different. The externally facing load balancers have had TCP reset as an opt-in "preview" feature since mid-September, while the cluster-internal load balancers, if I understand correctly, still lack this feature. https://azure.microsoft.com/en-us/updates/load-balancer-outbound-rules/

Oddly, when I dig down into the LB details, I cannot see any evidence that AKS actually enabled the feature in question when it deployed the cluster; however, I am currently observing the correct/desired behavior. I'll follow up with my colleagues on the AKS team to figure out what's going on here. If you want to try this yourself, perhaps you can independently verify or refute that this works as I claim.
Sure, I'll test it out when I have a chance. If I understand correctly, that TCP reset preview feature is for the Standard Load Balancer. One of the people I spoke with on the AKS team said that AKS does not have Standard Load Balancer support enabled yet, but it should happen soon.
That is correct. The post I linked to does reference standard load balancers, whilst AKS currently uses basic load balancers only, which deepens the mystery of why this is now working.
Internal Kubernetes API calls seem to fail for the first few seconds. You might find this repro useful: see the update in #12187.
This issue has been automatically marked as stale because it has not had activity in the last 90 days. It will be closed in the next 30 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions. |
This issue has been automatically closed because it has not had activity in the last month and a half. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions. |
I know it is an old issue, but I solved it with:

```shell
K8S_INTERNAL_API_IP=$(kubectl get svc kubernetes -o jsonpath='{.spec.clusterIP}')
```

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      proxy:
        autoInject: enabled
        clusterDomain: cluster.local
        componentLogLevel: misc:error
        enableCoreDump: false
        envoyStatsd:
          enabled: false
        excludeIPRanges: "${K8S_INTERNAL_API_IP}/32"
```

The path is `.spec.values.global.proxy.excludeIPRanges`.
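(A note not in the original comment: since the manifest embeds a shell variable, it has to be substituted before it reaches istioctl. One hedged sketch, assuming the manifest is saved as `istio-operator.yaml`:)

```shell
# substitute the cluster IP into the manifest, then hand it to istioctl
K8S_INTERNAL_API_IP=$(kubectl get svc kubernetes -o jsonpath='{.spec.clusterIP}')
export K8S_INTERNAL_API_IP
envsubst < istio-operator.yaml | istioctl install -f -
```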
How about …

Or, if only one pod is affected, simply inject a pod annotation: …

But the thing that works best for me is: …
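(The annotation snippet was not preserved in this copy of the thread. Per-pod exclusion is done with Istio's `traffic.sidecar.istio.io/excludeOutboundIPRanges` annotation; a sketch in which the name, image, and CIDR are placeholders, the CIDR standing in for the API server's ClusterIP:)

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                 # placeholder name
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        # bypass the sidecar for traffic to the API server's ClusterIP
        traffic.sidecar.istio.io/excludeOutboundIPRanges: 10.0.0.1/32
    spec:
      containers:
      - name: my-app
        image: my-app:latest   # placeholder image
```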
Thank you so much, this worked perfectly. I was running into an issue because I'm using kubectl in init containers to check job statuses before deploying a ReplicaSet; I couldn't connect to the API at all.
We just switched from Contour to Istio (1.9.4) on our dev environments and are running into this issue a lot. We've modified our IstioOperator settings with the excludeIPRanges mentioned in this issue, but we're still seeing the problem. It tends to happen regularly when our nightly builds get deployed to our dev clusters (~2 AM PT). After enough restarts the problem seems to go away, but we've yet to find a surefire workaround/remedy. Any other things we should be looking at?
We managed to get around this issue with the following DestinationRule in our services' namespace: …
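(The DestinationRule body was lost in this copy of the thread. A common variant of this workaround disables Istio's TLS settings toward the API server so the client's own TLS passes through untouched; a sketch, not necessarily the exact rule used here:)

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: kubernetes-apiserver
spec:
  host: kubernetes.default.svc.cluster.local
  trafficPolicy:
    tls:
      mode: DISABLE   # let the client's own TLS go through untouched
```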
Be aware that something has changed in terms of egress-rule precedence in 1.8.5+.
Even with the DestinationRule, we're still seeing intermittent issues talking to kubernetes.default.svc and to ElasticSearch instances in different namespaces. We've added retries in our code as well as a VirtualService with retries for kubernetes.default.svc, yet we still see intermittent failures. The issues existed before, but have been far more common since the switch to Istio.
OK, so I have another solution: try this master sidecar config: …
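(The config itself did not survive in this copy of the thread. A hedged sketch of what a mesh-wide `Sidecar` default can look like; placing it in the root namespace and the exact policy are assumptions, not the original author's config:)

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: istio-system   # the root namespace makes this the mesh-wide default
spec:
  egress:
  - hosts:
    - "*/*"                 # allow egress to every service the proxy knows about
  outboundTrafficPolicy:
    mode: ALLOW_ANY         # pass through traffic to unknown destinations
```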
Credit goes not to me but to @WilliamNewshutz and @gregoryhanson.
* fix: Don't let Istio instrument the webhook

Istio and the apiserver do not play well together, like pizza and pineapple:
* istio/istio#8696
* istio/istio#12187

The chart is open enough for users to set the podAnnotation with the apiserver address instead of not deploying the sidecar in the webhook, but by default we offer an option that works in most possible scenarios.

* chore: Bump version
My team and I recently observed a somewhat similar issue with Istio and the Kubernetes API server (kube-apiserver) on Microsoft Azure Kubernetes Service (AKS), and found the suggestions in this thread to be helpful. For our issue, we saw connection resets when attempting to contact kube-apiserver from pods having the Istio sidecar injected: …
We also saw … We tried @taitelman's solution of adding a …

Istio version: 1.15.3
Kubernetes versions: …
Hi @taitelman and @pmalmsten, I have installed Istio v1.18.0 and I am trying to install Kiali v1.67.0 in two different namespaces, but I am facing the issue below. I have tried adding the Sidecar CRD, ServiceEntry CRD, and DestinationRule CRD mentioned by @taitelman and @jdelgadillo, but I am still not able to resolve the issue with Kiali. The error log is below; could you please check and let me know what restriction there might be on the AKS cluster and how it can be resolved? …
@Chaitan1991 were you able to fix your issue? |
Yes. Check that the new versions of Istio and Kiali you are installing come from the correct repo with the correct image tag. We were referring to an old testing repo with the latest tag, and that was not updated code at all!
Thanks @taitelman, your … worked for me.
Describe the bug
I'm installing a monitoring service into my pod which is trying to make a call to the Kubernetes API server. This request is being blocked by the Istio sidecar. If I disable istio-injection and redeploy, everything works as planned. Do I need to enable anything to make this work?

Expected behavior
My pods can access the internal Kubernetes API.

Steps to reproduce the bug
A call to the API server from inside my pod does not get a response.
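(The exact repro command was not preserved in this copy of the issue. A hedged sketch of an in-pod API call that exhibits the hang, assuming the standard service-account token mount and default RBAC:)

```shell
# run from inside a pod that has the Istio sidecar injected
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -sS --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  -H "Authorization: Bearer $TOKEN" \
  https://kubernetes.default.svc/version
```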
Version
Istio:
Kubernetes:
Installation
Environment
Microsoft Azure AKS