Linkerd does not inject proxy containers with custom CNI on AWS #12489

Open

gabbler97 opened this issue Apr 23, 2024 · 5 comments
Labels: bug

gabbler97 commented Apr 23, 2024

What is the issue?

Linkerd proxy injection does not work with a custom CNI (Cilium) on AWS EKS clusters.

How can it be reproduced?

Install cilium

helm list -n kube-system | grep cilium
cilium                          kube-system     4               2024-04-19 12:19:50.727550183 +0000 UTC deployed        cilium-1.15.4                           1.15.4
helm get values cilium -n kube-system
USER-SUPPLIED VALUES:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: cni-plugin
          operator: NotIn
          values:
          - aws
egressMasqueradeInterfaces: eth0
hubble:
  enabled: true
  relay:
    enabled: true
  ui:
    enabled: true
ipam:
  operator:
    clusterPoolIPv4PodCIDRList:
    - 10.0.0.0/8
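
For reference, Cilium was installed with Helm along these lines (a sketch; assuming the values above are saved as cilium-values.yaml):

# sketch: install Cilium 1.15.4 with the values shown above
helm repo add cilium https://helm.cilium.io/
helm upgrade --install cilium cilium/cilium \
  --version 1.15.4 \
  --namespace kube-system \
  -f cilium-values.yaml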

Install linkerd

helm list -n linkerd
NAME                    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
linkerd-control-plane   linkerd         1               2024-04-22 14:00:07.29744245 +0000 UTC  deployed        linkerd-control-plane-2024.3.5  edge-24.3.5
linkerd-crds            linkerd         5               2024-04-04 11:06:36.932480898 +0000 UTC deployed        linkerd-crds-2024.3.5
helm get values linkerd-control-plane -n linkerd
USER-SUPPLIED VALUES:
disableHeartBeat: true
identity:
  issuer:
    scheme: kubernetes.io/tls
identityTrustAnchorsPEM: |-    
  -----BEGIN CERTIFICATE-----
  $CERT_CONTENT
  -----END CERTIFICATE-----
linkerdVersion: edge-24.3.5
policyController:
  image:
    name: my-artifactory/ghcr-docker-remote/linkerd/policy-controller
    version: edge-24.3.5
profileValidator:
  externalSecret: false
proxy:
  image:
    name: my-artifactory/ghcr-docker-remote/linkerd/proxy
    version: edge-24.3.5
  resources:
    cpu:
      limit: 100m
      request: 50m
    memory:
      limit: 100Mi
      request: 40Mi
proxyInit:
  image:
    name: my-artifactory/ghcr-docker-remote/linkerd/proxy-init
    version: v2.2.4
  runAsRoot: false
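
Linkerd came from the edge Helm channel and was installed the same way (a sketch; assuming the values above are saved as linkerd-values.yaml and that linkerd-crds is already installed):

# sketch: install the control plane with the values shown above
helm repo add linkerd-edge https://helm.linkerd.io/edge
helm upgrade --install linkerd-control-plane linkerd-edge/linkerd-control-plane \
  --namespace linkerd \
  -f linkerd-values.yaml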

Annotate the namespace for automatic injection

apiVersion: v1
kind: Namespace
metadata:
  annotations:
    config.linkerd.io/proxy-await: enabled
    linkerd.io/inject: enabled
...
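
For an existing namespace, the equivalent imperative command (using the goldilocks namespace from below) is:

kubectl annotate namespace goldilocks \
  linkerd.io/inject=enabled \
  config.linkerd.io/proxy-await=enabled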

Delete the pods

k get pod -n goldilocks
NAME                                     READY   STATUS    RESTARTS   AGE
goldilocks-controller-7869c48649-nqwkl   1/1     Running   0          50m
goldilocks-dashboard-75df58d594-49cj2    1/1     Running   0          50m
goldilocks-dashboard-75df58d594-zgw6v    1/1     Running   0          50m
user@ip-10-x-x-65 ~ $ k delete pod --all -n goldilocks
pod "goldilocks-controller-7869c48649-nqwkl" deleted
pod "goldilocks-dashboard-75df58d594-49cj2" deleted
pod "goldilocks-dashboard-75df58d594-zgw6v" deleted
user@ip-10-x-x-65 ~ $ k get pod -n goldilocks
NAME                                     READY   STATUS    RESTARTS   AGE
goldilocks-controller-7869c48649-vq5g2   1/1     Running   0          8s
goldilocks-dashboard-75df58d594-jdrnm    1/1     Running   0          6s
goldilocks-dashboard-75df58d594-ppxjm    1/1     Running   0          8s
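
Equivalently, a rollout restart re-creates the pods and sends them back through the injector webhook:

kubectl rollout restart deployment -n goldilocks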

The sidecar proxy should be injected, so the last output should instead be

k get pod -n goldilocks
NAME                                     READY   STATUS    RESTARTS   AGE
goldilocks-controller-7869c48649-vq5g2   2/2     Running   0          8s
goldilocks-dashboard-75df58d594-jdrnm    2/2     Running   0          6s
goldilocks-dashboard-75df58d594-ppxjm    2/2     Running   0          8s

Logs, error output, etc.

https://gist.github.com/gabbler97/6734dc908cf7136df49a8d2ba5e67eb9

Output of linkerd check -o short:

linkerd check -o short
linkerd-identity
----------------
‼ issuer cert is valid for at least 60 days
    issuer certificate will expire on 2024-04-25T05:51:39Z
    see https://linkerd.io/2.13/checks/#l5d-identity-issuer-cert-not-expiring-soon for hints

linkerd-version
---------------
‼ cli is up-to-date
    is running version 2.13.4 but the latest stable version is 2.14.10
    see https://linkerd.io/2.13/checks/#l5d-version-cli for hints

control-plane-version
---------------------
‼ control plane is up-to-date
    is running version 24.3.5 but the latest edge version is 24.4.4
    see https://linkerd.io/2.13/checks/#l5d-version-control for hints
‼ control plane and cli versions match
    control plane running edge-24.3.5 but cli running stable-2.13.4
    see https://linkerd.io/2.13/checks/#l5d-version-control for hints

linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
    some proxies are not running the current version:
        * linkerd-destination-c6595f85b-b9tlz (edge-24.3.5)
        * linkerd-identity-6bfcf4bf97-cr8km (edge-24.3.5)
        * linkerd-proxy-injector-59d7d485b-crbgj (edge-24.3.5)
    see https://linkerd.io/2.13/checks/#l5d-cp-proxy-version for hints
‼ control plane proxies and cli versions match
    linkerd-destination-c6595f85b-b9tlz running edge-24.3.5 but cli running stable-2.13.4
    see https://linkerd.io/2.13/checks/#l5d-cp-proxy-cli-version for hints

linkerd-viz
-----------
‼ linkerd-viz pods are injected
    could not find proxy container for metrics-api-5bd869c749-6vqmt pod
    see https://linkerd.io/2.13/checks/#l5d-viz-pods-injection for hints
‼ viz extension pods are running
    container "linkerd-proxy" in pod "metrics-api-5bd869c749-6vqmt" is not ready
    see https://linkerd.io/2.13/checks/#l5d-viz-pods-running for hints
‼ viz extension proxies are healthy
    no "linkerd-proxy" containers found in the "linkerd" namespace
    see https://linkerd.io/2.13/checks/#l5d-viz-proxy-healthy for hints

Status check results are √

Environment

Client Version: v1.29.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.11-eks-b9c9ed7

Possible solution

No response

Additional context

I have tried running the linkerd-proxy-injector with hostNetwork=true.
In that case the proxy sidecar containers are injected automatically after a deployment rollout.
However, some nodes became NotReady because the kubelet stopped posting status; this resolved itself after about 10 minutes.
Pods that interact with the kube API server started to CrashLoopBackOff, but only on one specific node at a time (the node where the linkerd-proxy-injector pod was running):

k get pod -A -o wide | grep "("
backup                  node-agent-dh6tm                                             0/1     CrashLoopBackOff   6 (43s ago)    10m     172.24.2.7      ip-10-x-x-162   <none>           <none>
monitoring              datadog-jwgx6                                                3/4     Running            5 (73s ago)    10m     172.24.2.83     ip-10-x-x-162   <none>           <none>
storage-ebs             ebs-csi-node-sfkx6                                           1/3     CrashLoopBackOff   8 (21s ago)    4m25s   172.24.2.163    ip-10-x-x-162   <none>           <none>
storage-fsx             fsx-openzfs-csi-node-2tkw7                                   1/3     CrashLoopBackOff   12 (71s ago)   9m25s   172.24.2.251    ip-10-x-x-162   <none>           <none>

Inside the pod logs I found timeouts for API server requests:

k logs ebs-csi-node-775zv -n storage-ebs
Defaulted container "ebs-plugin" out of: ebs-plugin, node-driver-registrar, liveness-probe
I0405 08:52:12.308665       1 driver.go:83] "Driver Information" Driver="ebs.csi.aws.com" Version="v1.28.0"
I0405 08:52:12.308784       1 node.go:93] "regionFromSession Node service" region="eu-central-1"
I0405 08:52:12.308809       1 metadata.go:85] "retrieving instance data from ec2 metadata"
I0405 08:52:24.870306       1 metadata.go:88] "ec2 metadata is not available"
I0405 08:52:24.870333       1 metadata.go:96] "retrieving instance data from kubernetes api"
I0405 08:52:24.871040       1 metadata.go:101] "kubernetes api is available"
panic: error getting Node ip-10-x-x-77.eu-central-1.compute.internal: Get "https://172.20.0.1:443/api/v1/nodes/ip-10-x-x-77": dial tcp 172.20.0.1:443: i/o timeout

goroutine 1 [running]:
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.newNodeService(0xc00041cfc0)
        /go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/node.go:96 +0x3b1
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.NewDriver({0xc000477ec0, 0xd, 0x4?})
        /go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/driver.go:106 +0x3e6
main.main()
        /go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/cmd/main.go:64 +0x595
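
For completeness, hostNetwork was enabled on the injector roughly as follows (a sketch using a strategic-merge patch; the exact mechanism may differ):

# sketch: run the injector pod on the host network
kubectl -n linkerd patch deployment linkerd-proxy-injector \
  -p '{"spec":{"template":{"spec":{"hostNetwork":true,"dnsPolicy":"ClusterFirstWithHostNet"}}}}'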

Would you like to work on fixing this bug?

None

gabbler97 added the bug label Apr 23, 2024
alpeb (Member) commented Apr 25, 2024

Before attempting to use host networking, can you post the events (kubectl describe) for the deployments (not the pods) after rolling them out to see if there's any info about why they didn't get injected? Also the events for the injector pod and its logs might prove to be useful.
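
For example (a sketch; assuming the standard control-plane labels):

# deployment events after the rollout
kubectl describe deploy -n goldilocks
# injector pod events and logs
kubectl -n linkerd describe pod -l linkerd.io/control-plane-component=proxy-injector
kubectl -n linkerd logs deploy/linkerd-proxy-injector -c proxy-injector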

gabbler97 (Author) commented:
Thank you for your answer, @alpeb!

user@ip-10-x-x-65 ~ $ k logs linkerd-proxy-injector-55f86f4fc9-tsmgc  -n linkerd
Defaulted container "linkerd-proxy" out of: linkerd-proxy, proxy-injector, linkerd-init (init)
[     0.095648s]  INFO ThreadId(01) linkerd2_proxy: release 2.224.0 (d91421a) by linkerd on 2024-03-28T18:07:05Z
[     0.099989s]  INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
[     0.101281s]  INFO ThreadId(01) linkerd2_proxy: Admin interface on 0.0.0.0:4191
[     0.101298s]  INFO ThreadId(01) linkerd2_proxy: Inbound interface on 0.0.0.0:4143
[     0.101302s]  INFO ThreadId(01) linkerd2_proxy: Outbound interface on 127.0.0.1:4140
[     0.101305s]  INFO ThreadId(01) linkerd2_proxy: Tap interface on 0.0.0.0:4190
[     0.101309s]  INFO ThreadId(01) linkerd2_proxy: SNI is linkerd-proxy-injector.linkerd.serviceaccount.identity.linkerd.cluster.local
[     0.101312s]  INFO ThreadId(01) linkerd2_proxy: Local identity is linkerd-proxy-injector.linkerd.serviceaccount.identity.linkerd.cluster.local
[     0.101315s]  INFO ThreadId(01) linkerd2_proxy: Destinations resolved via linkerd-dst-headless.linkerd.svc.cluster.local:8086 (linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local)
[     0.104250s]  INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Adding endpoint addr=10.0.2.118:8090
[     0.195414s]  INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Adding endpoint addr=10.0.2.118:8086
[     0.202508s]  INFO ThreadId(02) identity:identity{server.addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}: linkerd_pool_p2c: Adding endpoint addr=10.0.31.152:8080
[     0.315761s]  INFO ThreadId(02) daemon:identity: linkerd_app: Certified identity id=linkerd-proxy-injector.linkerd.serviceaccount.identity.linkerd.cluster.local
user@ip-10-x-x-65 ~ $ k logs linkerd-proxy-injector-55f86f4fc9-tsmgc  -n linkerd -c proxy-injector
time="2024-04-25T11:25:20Z" level=info msg="running version edge-24.3.5"
time="2024-04-25T11:25:20Z" level=info msg="starting admin server on :9995"
time="2024-04-25T11:25:20Z" level=info msg="waiting for caches to sync"
time="2024-04-25T11:25:20Z" level=info msg="listening at :8443"
time="2024-04-25T11:25:20Z" level=info msg="caches synced"
user@ip-10-x-x-65 ~ $ k logs linkerd-proxy-injector-55f86f4fc9-tsmgc  -n linkerd -c linkerd-init
time="2024-04-25T11:25:12Z" level=info msg="/sbin/iptables-legacy-save -t nat"
time="2024-04-25T11:25:12Z" level=info msg="# Generated by iptables-save v1.8.10 on Thu Apr 25 11:25:12 2024\n*nat\n:PREROUTING ACCEPT [0:0]\n:INPUT ACCEPT [0:0]\n:OUTPUT ACCEPT [0:0]\n:POSTROUTING ACCEPT [0:0]\nCOMMIT\n# Completed on Thu Apr 25 11:25:12 2024\n"
time="2024-04-25T11:25:12Z" level=info msg="/sbin/iptables-legacy -t nat -N PROXY_INIT_REDIRECT"
time="2024-04-25T11:25:12Z" level=info msg="/sbin/iptables-legacy -t nat -A PROXY_INIT_REDIRECT -p tcp --match multiport --dports 4190,4191,4567,4568 -j RETURN -m comment --comment proxy-init/ignore-port-4190,4191,4567,4568/1714044312"
time="2024-04-25T11:25:12Z" level=info msg="/sbin/iptables-legacy -t nat -A PROXY_INIT_REDIRECT -p tcp -j REDIRECT --to-port 4143 -m comment --comment proxy-init/redirect-all-incoming-to-proxy-port/1714044312"
time="2024-04-25T11:25:12Z" level=info msg="/sbin/iptables-legacy -t nat -A PREROUTING -j PROXY_INIT_REDIRECT -m comment --comment proxy-init/install-proxy-init-prerouting/1714044312"
time="2024-04-25T11:25:12Z" level=info msg="/sbin/iptables-legacy -t nat -N PROXY_INIT_OUTPUT"
time="2024-04-25T11:25:12Z" level=info msg="/sbin/iptables-legacy -t nat -A PROXY_INIT_OUTPUT -m owner --uid-owner 2102 -j RETURN -m comment --comment proxy-init/ignore-proxy-user-id/1714044312"
time="2024-04-25T11:25:12Z" level=info msg="/sbin/iptables-legacy -t nat -A PROXY_INIT_OUTPUT -o lo -j RETURN -m comment --comment proxy-init/ignore-loopback/1714044312"
time="2024-04-25T11:25:12Z" level=info msg="/sbin/iptables-legacy -t nat -A PROXY_INIT_OUTPUT -p tcp --match multiport --dports 443,6443 -j RETURN -m comment --comment proxy-init/ignore-port-443,6443/1714044312"
time="2024-04-25T11:25:12Z" level=info msg="/sbin/iptables-legacy -t nat -A PROXY_INIT_OUTPUT -p tcp -j REDIRECT --to-port 4140 -m comment --comment proxy-init/redirect-all-outgoing-to-proxy-port/1714044312"
time="2024-04-25T11:25:12Z" level=info msg="/sbin/iptables-legacy -t nat -A OUTPUT -j PROXY_INIT_OUTPUT -m comment --comment proxy-init/install-proxy-init-output/1714044312"
time="2024-04-25T11:25:12Z" level=info msg="/sbin/iptables-legacy-save -t nat"
time="2024-04-25T11:25:12Z" level=info msg="# Generated by iptables-save v1.8.10 on Thu Apr 25 11:25:12 2024\n*nat\n:PREROUTING ACCEPT [0:0]\n:INPUT ACCEPT [0:0]\n:OUTPUT ACCEPT [0:0]\n:POSTROUTING ACCEPT [0:0]\n:PROXY_INIT_OUTPUT - [0:0]\n:PROXY_INIT_REDIRECT - [0:0]\n-A PREROUTING -m comment --comment \"proxy-init/install-proxy-init-prerouting/1714044312\" -j PROXY_INIT_REDIRECT\n-A OUTPUT -m comment --comment \"proxy-init/install-proxy-init-output/1714044312\" -j PROXY_INIT_OUTPUT\n-A PROXY_INIT_OUTPUT -m owner --uid-owner 2102 -m comment --comment \"proxy-init/ignore-proxy-user-id/1714044312\" -j RETURN\n-A PROXY_INIT_OUTPUT -o lo -m comment --comment \"proxy-init/ignore-loopback/1714044312\" -j RETURN\n-A PROXY_INIT_OUTPUT -p tcp -m multiport --dports 443,6443 -m comment --comment \"proxy-init/ignore-port-443,6443/1714044312\" -j RETURN\n-A PROXY_INIT_OUTPUT -p tcp -m comment --comment \"proxy-init/redirect-all-outgoing-to-proxy-port/1714044312\" -j REDIRECT --to-ports 4140\n-A PROXY_INIT_REDIRECT -p tcp -m multiport --dports 4190,4191,4567,4568 -m comment --comment \"proxy-init/ignore-port-4190,4191,4567,4568/1714044312\" -j RETURN\n-A PROXY_INIT_REDIRECT -p tcp -m comment --comment \"proxy-init/redirect-all-incoming-to-proxy-port/1714044312\" -j REDIRECT --to-ports 4143\nCOMMIT\n# Completed on Thu Apr 25 11:25:12 2024\n"

And the events for the deployments:

user@ip-10-x-x-65 ~ $ k describe deploy -n linkerd | grep Events
Events:          <none>
Events:          <none>
Events:          <none>
user@ip-10-x-x-65 ~ $ k describe deploy -n goldilocks | grep Events
Events:          <none>
Events:          <none>

alpeb (Member) commented Apr 25, 2024

Also, can you post what you get from kubectl get mutatingwebhookconfigurations.admissionregistration.k8s.io linkerd-proxy-injector-webhook-config -oyaml?

gabbler97 (Author) commented:
Yes, of course!

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  annotations:
    meta.helm.sh/release-name: linkerd-control-plane
    meta.helm.sh/release-namespace: linkerd
  labels:
    app.kubernetes.io/managed-by: Helm
    linkerd.io/control-plane-component: proxy-injector
    linkerd.io/control-plane-ns: linkerd
  name: linkerd-proxy-injector-webhook-config
webhooks:
- admissionReviewVersions:
  - v1
  - v1beta1
  clientConfig:
    caBundle: $CABUNDLE
    service:
      name: linkerd-proxy-injector
      namespace: linkerd
      path: /
      port: 443
  failurePolicy: Ignore
  matchPolicy: Equivalent
  name: linkerd-proxy-injector.linkerd.io
  namespaceSelector:
    matchExpressions:
    - key: config.linkerd.io/admission-webhooks
      operator: NotIn
      values:
      - disabled
    - key: kubernetes.io/metadata.name
      operator: NotIn
      values:
      - kube-system
      - cert-manager
  objectSelector:
    matchExpressions:
    - key: linkerd.io/control-plane-component
      operator: DoesNotExist
    - key: linkerd.io/cni-resource
      operator: DoesNotExist
  reinvocationPolicy: Never
  rules:
  - apiGroups:
    - ""
    apiVersions:
    - v1
    operations:
    - CREATE
    resources:
    - pods
    - services
    scope: Namespaced
  sideEffects: None
  timeoutSeconds: 10
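
One detail worth noting in this config: failurePolicy: Ignore means that if the API server cannot reach the injector webhook, pods are admitted uninjected without any visible error. To surface such failures while debugging, the policy can be flipped temporarily (a sketch; revert afterwards, since Fail blocks pod creation whenever the webhook is unreachable):

# sketch: make webhook failures visible instead of silently skipping injection
kubectl patch mutatingwebhookconfiguration linkerd-proxy-injector-webhook-config \
  --type=json -p '[{"op":"replace","path":"/webhooks/0/failurePolicy","value":"Fail"}]'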

gabbler97 (Author) commented Apr 29, 2024

Any idea how I should continue?
Thank you very much in advance!
