
Failed to validate networking configuration. Please ensure iptables rules are rewriting traffic as expected. #11735

Closed
wibed opened this issue Dec 11, 2023 · 7 comments

@wibed

wibed commented Dec 11, 2023

What is the issue?

linkerd-cni combined with Cilium causes the Linkerd pods (destination, proxy, ...) to crash.

How can it be reproduced?

repro:
  0. install cilium
  1. install the kustomize CLI (https://kubectl.docs.kubernetes.io/installation/kustomize/)
  2. adjust helmGlobals.chartHome as necessary (the path is arbitrary)
  3. run kustomize build -o build.yaml --enable-helm --load-restrictor=LoadRestrictionsNone .
apiVersion: v1
kind: Namespace
metadata:
  labels:
    kubernetes.io/metadata.name: linkerd-cni
    linkerd.io/cni-resource: "true"
    config.linkerd.io/admission-webhooks: disabled
    pod-security.kubernetes.io/enforce: privileged
  name: linkerd-cni
--- 
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

helmGlobals:
  chartHome: ../../../../base/linkerdcni

helmCharts:
- name: linkerd2-cni
  releaseName: cluster0
  namespace: linkerd-cni
  includeCRDs: true
  valuesInline:
    logLevel: debug
  repo: https://helm.linkerd.io/stable
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

helmGlobals:
  chartHome: ../../../../base/linkerdcrds

helmCharts:
- name: linkerd-crds
  releaseName: cluster0
  namespace: linkerd
  includeCRDs: true
  repo: https://helm.linkerd.io/stable
---
apiVersion: v1
kind: Namespace
metadata:
  labels:
    kubernetes.io/metadata.name: linkerd
    config.linkerd.io/admission-webhooks: disabled
    hnc.x-k8s.io/included-namespace: "true"
    linkerd.io/control-plane-ns: linkerd
    linkerd.io/is-control-plane: "true"
    linkerd.tree.hnc.x-k8s.io/depth: "0"
    pod-security.kubernetes.io/enforce: privileged
    staging.tree.hnc.x-k8s.io/depth: "1"
  annotations:
    hnc.x-k8s.io/subnamespace-of: staging
    linkerd.io/inject: disabled
  name: linkerd
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

helmGlobals:
  chartHome: ../../../../base/linkerdcontrolplane

helmCharts:
- name: linkerd-control-plane
  releaseName: cluster0
  namespace: linkerd
  includeCRDs: true
  valuesInline:
    controllerLogLevel: debug
    cniEnabled: true
    securityContext:
      capabilities:
        drop:
          - ALL
    identityTrustAnchorsPEM: |
      -----BEGIN CERTIFICATE-----
      MIIBjTCCATSgAwIBAgIUNSKycPwhnNVhIpMnxXdnDV0CBO8wCgYIKoZIzj0EAwIw
      JTEjMCEGA1UEAwwacm9vdC5saW5rZXJkLmNsdXN0ZXIubG9jYWwwHhcNMjMxMTMw
      MDY0NDUwWhcNMjQxMTI5MDY0NDUwWjAlMSMwIQYDVQQDDBpyb290LmxpbmtlcmQu
      Y2x1c3Rlci5sb2NhbDBZMBMGByqGSM49AgEGCCqGSM49AwEHA0IABGS40KJuqY91
      yQegKKyOacLKeEDK+75ICvJERZ4TmqMUB+W8denrziT2r6Ln0whGO8ddlslj91N5
      0yU3yF66j+ujQjBAMA8GA1UdEwEB/wQFMAMBAf8wDgYDVR0PAQH/BAQDAgGGMB0G
      A1UdDgQWBBQ3AH9bUjQDvGaMp1zLqZTYYU3MXTAKBggqhkjOPQQDAgNHADBEAiAv
      FbdxB+bwfXN6MwAT8mQm36x4ucJ7FRSjOwRGMVyCOAIgXdEfUwFvsltNf/beBnvC
      3nDcyuEZ6pQu8SGg41fp6O8=
      -----END CERTIFICATE-----
    identity:
      issuer:
        tls:
          crtPEM: |
            -----BEGIN CERTIFICATE-----
            MIIBszCCAVmgAwIBAgIUTOUpw0cZp4vDBfrBTYJEhHvhDaQwCgYIKoZIzj0EAwIw
            JTEjMCEGA1UEAwwacm9vdC5saW5rZXJkLmNsdXN0ZXIubG9jYWwwHhcNMjMxMTMw
            MDY0NDUwWhcNMjQwNTI4MDY0NDUwWjApMScwJQYDVQQDDB5pZGVudGl0eS5saW5r
            ZXJkLmNsdXN0ZXIubG9jYWwwWTATBgcqhkjOPQIBBggqhkjOPQMBBwNCAAQSyaRv
            wR2gJ0svRsxek/zO+afeJLAkYmTsvB2o4F5S7Q2lK+uchl3B/5Y8FCInZBJv3woE
            LyVKN9DVOE1uPRXgo2MwYTAPBgNVHRMBAf8EBTADAQH/MA4GA1UdDwEB/wQEAwIB
            hjAdBgNVHQ4EFgQU9nVUCb3VISXkjWoDBaixi26uyvUwHwYDVR0jBBgwFoAUNwB/
            W1I0A7xmjKdcy6mU2GFNzF0wCgYIKoZIzj0EAwIDSAAwRQIgMcc/XS43WU3nT0yN
            Z5UuUKce+NRpXbHp+X3QROnSDrwCIQCFAnMVdXeHGo6pPVlvaeYOQVfW0w5cQ0WZ
            rdyNP/xXJg==
            -----END CERTIFICATE-----
          keyPEM: |
            -----BEGIN EC PRIVATE KEY-----
            MHcCAQEEIFJn8Sq4KD1RYRIatP8DFyqxzbP+CjHrksQ6M3abPdl6oAoGCCqGSM49
            AwEHoUQDQgAEEsmkb8EdoCdLL0bMXpP8zvmn3iSwJGJk7LwdqOBeUu0NpSvrnIZd
            wf+WPBQiJ2QSb98KBC8lSjfQ1ThNbj0V4A==
            -----END EC PRIVATE KEY-----
  repo: https://helm.linkerd.io/stable
---

Logs, error output, etc

error:

 cmd$ kubectl logs deployment/linkerd-destination -c linkerd-network-validator -n linkerd
 # Output
 INFO linkerd_network_validator: Listening for connections on 0.0.0.0:4140
 DEBUG linkerd_network_validator: token="GYBfoS7BvgErC8QAVAOg29zzO0kfkbn3O0oz1qFsHnUm6VuwK45xoPNnou1wUt1\n"
 INFO linkerd_network_validator: Connecting to 1.1.1.1:20001
 ERROR linkerd_network_validator: Failed to validate networking configuration. Please ensure iptables rules are rewriting traffic as expected. timeout=10s
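For context on what this error means: the validator listens on the proxy's outbound port (4140 here) and dials an arbitrary address (1.1.1.1:20001). The nat rules installed by linkerd-cni are supposed to redirect that dial back to the local listener so the token round-trips; the timeout means the dial left the pod unredirected. Roughly, the rules it depends on look like the following (an abridged sketch based on the default proxy-init configuration, not output from this cluster; verify with iptables-save -t nat inside the pod's network namespace):

```
-A OUTPUT -j PROXY_INIT_OUTPUT
-A PROXY_INIT_OUTPUT -p tcp -j REDIRECT --to-port 4140
```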

output of linkerd check -o short

linkerd-existence
-----------------
× control plane pods are ready
    No running pods for "linkerd-destination"
    see https://linkerd.io/2.14/checks/#l5d-api-control-ready for hints

linkerd-viz
-----------
‼ linkerd-viz pods are injected
    could not find proxy container for metrics-api-799c844985-4hhph pod
    see https://linkerd.io/2.14/checks/#l5d-viz-pods-injection for hints
‼ viz extension pods are running
    container "linkerd-proxy" in pod "metrics-api-799c844985-4hhph" is not ready
    see https://linkerd.io/2.14/checks/#l5d-viz-pods-running for hints
‼ viz extension proxies are healthy
    no "linkerd-proxy" containers found in the "linkerd" namespace
    see https://linkerd.io/2.14/checks/#l5d-viz-proxy-healthy for hints

Environment

# talosctl version
Client:
	Tag:         v1.5.5
	SHA:         ad7361c7
	Built:       
	Go version:  go1.20.11
	OS/Arch:     darwin/amd64
Server:
	NODE:        10.0.48.20
	Tag:         v1.6.0-alpha.0
	SHA:         8670450d
	Built:       
	Go version:  go1.21.0 X:loopvar
	OS/Arch:     linux/amd64
	Enabled:     RBAC
	NODE:        10.0.48.22
	Tag:         v1.6.0-alpha.0
	SHA:         8670450d
	Built:       
	Go version:  go1.21.0 X:loopvar
	OS/Arch:     linux/amd64
	Enabled:     RBAC
	NODE:        10.0.48.23
	Tag:         v1.6.0-alpha.0
	SHA:         8670450d
	Built:       
	Go version:  go1.21.0 X:loopvar
	OS/Arch:     linux/amd64
	Enabled:     RBAC
	NODE:        10.0.48.21
	Tag:         v1.6.0-alpha.0
	SHA:         8670450d
	Built:       
	Go version:  go1.21.0 X:loopvar
	OS/Arch:     linux/amd64
	Enabled:     RBAC

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

maybe

@wibed wibed added the bug label Dec 11, 2023
@jdinsel-xealth

jdinsel-xealth commented Dec 11, 2023

See also #11073 and #11699. We also experienced this issue on AWS after upgrading linkerd-control-plane from v1.16.4. It does not reproduce in v1.16.4 (stable-2.14.3). We've witnessed it in all of the stable versions released after that.

@alpeb
Member

alpeb commented Dec 12, 2023

It's possible that this setup doesn't leave enough time for the linkerd-cni DaemonSet to become ready before the linkerd control plane is deployed, which is addressed by #11699. However, even with that fix in, you might bump into the Talos issue #7945, for which a fix is being looked at in linkerd/linkerd2-proxy-init#264. Please check whether the nsenter binary is available on your system to see if that's indeed the case.
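As a concrete way to run that check from a node shell (on Talos, e.g. via a privileged debug pod; the exact access method is up to you), something like:

```shell
# Report whether nsenter is on the PATH, per the comment above.
# (Plain sketch; adapt to however you get a shell on your nodes.)
if command -v nsenter >/dev/null 2>&1; then
  echo "nsenter: available at $(command -v nsenter)"
else
  echo "nsenter: missing"
fi
```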

@jdinsel-xealth

However, even with that fix in, you might bump into the Talos issue #7945, for which a fix is being looked at in linkerd/linkerd2-proxy-init#264. Please check whether the nsenter binary is available on your system to see if that's indeed the case.

For us, I don't believe we're hitting the Talos issue described in the link. We're able to get the linkerd-network-validator to start by deleting the pod that is stuck in the crash loop. In almost all circumstances, the pod starts successfully when it's recreated.

@olix0r
Member

olix0r commented Dec 14, 2023

@wibed Please note that you posted issuer private key credentials:

          keyPEM: |
            -----BEGIN EC PRIVATE KEY-----
            MHcCAQEEIFJn8Sq4KD1RYRIatP8DFyqxzbP+CjHrksQ6M3abPdl6oAoGCCqGSM49
            AwEHoUQDQgAEEsmkb8EdoCdLL0bMXpP8zvmn3iSwJGJk7LwdqOBeUu0NpSvrnIZd
            wf+WPBQiJ2QSb98KBC8lSjfQ1ThNbj0V4A==
            -----END EC PRIVATE KEY-----

These credentials could be used to forge certificates for your cluster. You should make sure to regenerate these credentials before using them in a real environment.
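For reference, one way to regenerate a fresh trust anchor and issuer pair with openssl (a sketch, not the canonical procedure; the Linkerd docs typically use the step CLI, and Linkerd requires ECDSA P-256 for these certificates; the lifetimes and file names here are arbitrary):

```shell
# Generate a new root (trust anchor) key and self-signed certificate.
openssl ecparam -name prime256v1 -genkey -noout -out ca.key
openssl req -x509 -new -key ca.key -sha256 -days 365 \
  -subj "/CN=root.linkerd.cluster.local" -out ca.crt

# Generate the issuer key and a CSR, then sign it with the root,
# marking it as a CA so it can issue proxy certificates.
openssl ecparam -name prime256v1 -genkey -noout -out issuer.key
openssl req -new -key issuer.key \
  -subj "/CN=identity.linkerd.cluster.local" -out issuer.csr
printf 'basicConstraints=critical,CA:true\nkeyUsage=critical,keyCertSign\n' > issuer.ext
openssl x509 -req -in issuer.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -sha256 -days 180 -extfile issuer.ext -out issuer.crt

# ca.crt -> identityTrustAnchorsPEM; issuer.crt/issuer.key -> identity.issuer.tls
openssl verify -CAfile ca.crt issuer.crt
```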

@wibed
Author

wibed commented Dec 15, 2023

@wibed Please note that you posted issuer private key credentials:

          keyPEM: |
            -----BEGIN EC PRIVATE KEY-----
            MHcCAQEEIFJn8Sq4KD1RYRIatP8DFyqxzbP+CjHrksQ6M3abPdl6oAoGCCqGSM49
            AwEHoUQDQgAEEsmkb8EdoCdLL0bMXpP8zvmn3iSwJGJk7LwdqOBeUu0NpSvrnIZd
            wf+WPBQiJ2QSb98KBC8lSjfQ1ThNbj0V4A==
            -----END EC PRIVATE KEY-----

These credentials could be used to forge certificates for your cluster. You should make sure to regenerate these credentials before using them in a real environment.

I know; this is so you can avoid having to google openssl commands... again.
These aren't production credentials. (It's play-dough, man =))

@alpeb
Member

alpeb commented Jan 4, 2024

@jdinsel-xealth for your specific case, that's gonna get addressed with the cni-repair controller (#11699), which should be included in an edge release as soon as that merges.

@wibed
Author

wibed commented Jan 5, 2024

I don't think so either, as I believe the CNI has not been correctly dispatched onto the host as assumed.
I am not sure how to approach this on Talos and am no longer interested in looking for a solution.

Kindly reopen if someone wants to take my place.

@wibed wibed closed this as completed Jan 5, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 5, 2024
4 participants