[EKS + Weave + Istio] Error creating: Internal error occurred: failed calling webhook "sidecar-injector.istio.io": Post https://istio-sidecar-injector.istio-system.svc:443/inject?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) #16434

Closed
hustshawn opened this issue Aug 21, 2019 · 9 comments

Comments

@hustshawn

hustshawn commented Aug 21, 2019

(NOTE: This is used to report product bugs:
To report a security vulnerability, please visit https://istio.io/about/security-vulnerabilities/
To ask questions about how to use Istio, please visit https://discuss.istio.io
)

Bug description

  1. If a namespace is labeled for automatic injection, no pods are created in it. The error on the corresponding ReplicaSet is: Error creating: Internal error occurred: failed calling webhook "sidecar-injector.istio.io": Post https://istio-sidecar-injector.istio-system.svc:443/inject?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    The problem is similar to the issue "Admission control webhooks (e.g. sidecar injector) don't work on EKS" (old_issues_repo#271). The conclusion of that issue was to open port 443 between the worker nodes and the control plane, but that does not work in my case.

  2. Unable to create traffic management resources (Gateway, VirtualService, DestinationRule), e.g.:

cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: grafana-gw
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway-internal
    # istio: istio-ingressgateway-internal
  servers:
  - hosts:
    - 'mydomain.com'
    port:
      name: https
      number: 443
      protocol: HTTP
EOF

The output is:
Error from server (Timeout): error when creating "STDIN": Timeout: request did not complete within requested timeout 30s
and the resource is not created.

Note that my Kubernetes cluster setup is unusual: I disabled the AWS CNI plugin.

Affected product area (please put an X in all that apply)

[ ] Configuration Infrastructure
[ ] Docs
[ ] Installation
[ ] Networking
[ ] Performance and Scalability
[ ] Policies and Telemetry
[ ] Security
[ ] Test and Release
[x] User Experience
[x] Developer Infrastructure

Expected behavior
The sidecar can be successfully injected.

Steps to reproduce the bug
The process I used to set up the EKS cluster:

  1. Create an EKS cluster from the AWS console.
  2. Get access to the control plane API via kubectl.
  3. Delete the AWS CNI plugin with kubectl delete daemonset aws-node -n kube-system
  4. Install Weave Net: kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
  5. Create worker nodes with the officially provided CloudFormation template. A small hack is to inject the bootstrap parameter --use-max-pods=false; sed -i 's/"maxPods":.*/"maxPods": 200/' /etc/kubernetes/kubelet/kubelet-config.json; systemctl restart kubelet;
  6. Follow the doc to make the worker nodes join the cluster; the cluster is then up with your nodes.
  7. Install Helm and Tiller with cluster-admin permission (for simplicity in this case).
  8. Install Istio:
helm upgrade --install istio install/kubernetes/helm/istio --namespace istio-system \
    --values install/kubernetes/helm/istio/values-istio-demo-auth.yaml \
    --set gateways.istio-egressgateway.enabled=false \
    --set gateways.istio-ingressgateway.sds.enabled=true \
    --set global.k8sIngress.enabled=true \
    --set global.k8sIngress.enableHttps=true \
    --set global.k8sIngress.gatewayName=ingressgateway \
    --set certmanager.enabled=true \
    --set certmanager.email="devops@mydomain.com"

===

Label namespace default
kubectl label namespace default istio-injection=enabled

Setup Application

kubectl run nginx --image=openresty/openresty:alpine --port 80
kubectl expose deploy/nginx --port 80 --target-port=80
$ kubectl get deploy nginx
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   0/1     0            0           15h

Version (include the output of istioctl version --remote and kubectl version)

$ istioctl version --remote
client version: 1.2.3
citadel version: 1.2.3
galley version: 1.2.3
ingressgateway version: 1.2.3
ingressgateway-internal version: 
pilot version: 1.2.3
policy version: 1.2.3
sidecar-injector version: 1.2.3
telemetry version: 1.2.3
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:44:30Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.8-eks-a977ba", GitCommit:"a977bab148535ec195f12edc8720913c7b943f9c", GitTreeState:"clean", BuildDate:"2019-07-29T20:47:04Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

How was Istio installed?
Using Helm

helm upgrade --install istio install/kubernetes/helm/istio --namespace istio-system \
    --values install/kubernetes/helm/istio/values-istio-demo-auth.yaml \
    --set gateways.istio-egressgateway.enabled=false \
    --set gateways.istio-ingressgateway.sds.enabled=true \
    --set global.k8sIngress.enabled=true \
    --set global.k8sIngress.enableHttps=true \
    --set global.k8sIngress.gatewayName=ingressgateway \
    --set certmanager.enabled=true \
    --set certmanager.email="devops@mydomain.com"

Environment where bug was observed (cloud vendor, OS, etc)
AWS EKS.
Additionally, please consider attaching a cluster state archive by attaching
the dump file to this issue.

@hustshawn
Author

Since port 443 between the control plane and the worker nodes is open by default, I began to suspect that the issue is caused by the Weave overlay network on EKS. But I am not sure, which is why I raised this issue here.

However, my other Kubernetes cluster, set up with kops on AWS, also uses the Weave overlay network, and Istio works fine there.
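
One way to narrow this down is to check whether the webhook pods have addresses from the Weave range rather than from the VPC. The following is only a sketch: it assumes Weave Net is running with its default allocation range, 10.32.0.0/12 (adjust the bounds if IPALLOC_RANGE was changed), and in_weave_default_range is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper (POSIX sh/bash): succeeds if an IPv4 address lies in
# 10.32.0.0/12, the default Weave Net allocation range.
# Assumption: the range was not overridden via IPALLOC_RANGE.
in_weave_default_range() {
  # Split the dotted quad on "." into the positional parameters.
  _old_ifs=$IFS
  IFS=.
  set -- $1
  IFS=$_old_ifs
  # Require four octets, the first equal to 10 and the second numeric.
  [ "$#" -eq 4 ] || return 1
  [ "$1" = "10" ] || return 1
  case "$2" in
    ''|*[!0-9]*) return 1 ;;
  esac
  # 10.32.0.0/12 spans 10.32.0.0 through 10.47.255.255.
  [ "$2" -ge 32 ] && [ "$2" -le 47 ]
}
```

For example, after listing pod IPs with kubectl -n istio-system get pods -o wide, the IP of the sidecar-injector pod can be passed to the helper. If it succeeds, the pod is addressed from the overlay, so the control plane would need a route into the Weave network to reach the webhook.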

@hustshawn
Author

hustshawn commented Aug 22, 2019

I just tried again and found that manual injection works.

  • Removed the auto-inject label from the namespace
  • Injected manually with kubectl get deploy nginx -oyaml | istioctl kube-inject -f - | kubectl apply -f -

So it looks like the sidecar-injector webhook is not working. However, in this blog post Amazon claims that EKS added support for the webhook/controller.

And I still cannot create a Gateway or VirtualService, e.g.:

$ kubectl apply -f samples/bookinfo/networking/bookinfo-gateway.yaml -n default
Error from server (Timeout): error when creating "samples/bookinfo/networking/bookinfo-gateway.yaml": Timeout: request did not complete within requested timeout 30s
Error from server (Timeout): error when creating "samples/bookinfo/networking/bookinfo-gateway.yaml": Timeout: request did not complete within requested timeout 30s

I'm really confused. Does anyone have an idea?

@cmanzi
Contributor

cmanzi commented Aug 24, 2019

@hustshawn I have been trying to get this to work for days. I ran into the same thing: you actually need to run both the AWS and WeaveNet CNIs in parallel (pods with multiple network interfaces). This can be accomplished with CNI-Genie. The reason is that the requests for the validation webhooks Istio uses cannot be sent from the control plane to the WeaveNet subnet CIDR range.

Here's how I did it:

  1. Modify the CNI-Genie config to use "default": "weave"
  2. Install CNI-Genie
  3. Install WeaveNet
  4. Add the annotation cni: 'aws,weave' to the Istio pods: galley, sidecar-injector, telemetry (mixer)
  5. Install Istio
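
Step 4 could be done in several ways. As a minimal, untested sketch, the annotation might be added to already-deployed components with a JSON merge patch. The deployment names below (istio-galley, istio-sidecar-injector, istio-telemetry) are assumptions based on the Istio 1.2 Helm chart, and both function names are hypothetical.

```shell
# Hypothetical helper: prints a JSON merge patch that sets the `cni`
# annotation on a Deployment's pod template.
cni_annotation_patch() {
  printf '{"spec":{"template":{"metadata":{"annotations":{"cni":"%s"}}}}}' "$1"
}

# Hypothetical helper: patches the three Istio components named in step 4.
# Deployment names are assumed; check `kubectl -n istio-system get deploy`.
annotate_istio_components() {
  for deploy in istio-galley istio-sidecar-injector istio-telemetry; do
    kubectl -n istio-system patch deployment "$deploy" \
      --type merge -p "$(cni_annotation_patch 'aws,weave')"
  done
}
```

Running cni_annotation_patch 'aws,weave' on its own prints the patch without touching a cluster. The steps above add the annotation before Istio is installed; patching afterwards is a variation that changes the pod template and therefore restarts the affected pods.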

Once I did that, the post-install-security job and sidecar-injection worked. I've now encountered another problem though. While pods can reach each other using the WeaveNet IPs, they are unable to reach any service (ClusterIP) IPs which map to WeaveNet IPs. Did you run into anything like that?

@hustshawn
Author

@cmanzi I have not tried your solution, but I think I understand it: the Istio components use both CNIs, while other applications use only Weave. However, given the problem you encountered, this is not acceptable.

I think this is a compatibility issue between Amazon EKS and Istio. There is very little I can do as an ordinary user, and overly hacky workarounds may lead to more unexpected results.

By the way, I found an AWS document showing EKS working with Istio, but in fact they simply disabled sidecar injection with helm install install/kubernetes/helm/istio --name istio --namespace istio-system --set global.configValidation=false --set sidecarInjectorWebhook.enabled=false --set grafana.enabled=true --set servicegraph.enabled=true. So at this stage, I suppose EKS is still not able to support an Istio installation with sidecar injection, although all other applications work fine.

@guojingyinan219

I have the same problem: I cannot create a Gateway or VirtualService, and automatic injection fails.

Error creating: Internal error occurred: failed calling admission webhook "sidecar-injector.istio.io": Post https://istio-sidecar-injector.istio-system.svc:443/inject?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

@guojingyinan219

I used manual injection, but I still cannot create a VirtualService. The error message is:

Error from server (Timeout): error when creating "hxtrip-product-vs.yaml": Timeout: request did not complete within requested timeout 30s
Error from server (Timeout): error when creating "hxtrip-product-vs.yaml": Timeout: request did not complete within requested timeout 30s

@howardjohn
Member

Hey all, thanks for the reports. We are tracking and actively working on this issue in #13840. Let's consolidate this in one place to make things easier. Thanks!

@anupash147

This problem is not solved, so why did @howardjohn close the issue?

@widdix123

@howardjohn - Is this issue fixed? I get the same error.

Can someone please let me know how to fix this issue?
