
no vxlan decap when host has multiple network interfaces #6051

Closed
MiddleMan5 opened this issue May 10, 2022 · 4 comments

MiddleMan5 commented May 10, 2022

The VXLAN header is still present when capturing traffic destined for a Docker container running on a node with multiple network interfaces. The result is endless TCP retransmission errors and connection timeouts.

Expected Behavior

VXLAN headers should be stripped from traffic that crosses subnet boundaries, regardless of whether the host has a network interface on the same subnet as the sender.

Alternatively, log errors or better document the limitations of multi-interface hosts.

Current Behavior

VXLAN headers are not stripped, and TCP connections time out.


Steps to Reproduce (for bugs)

  1. Configure host nodes in kubernetes as documented below
  2. Deploy tcp server on multi-interface node
  3. Deploy client on single interface node
  4. Deploy ksniff pod with tcpdump alongside server
  5. Send request from client to server
  6. Observe packets arriving on the server's network with VXLAN headers still intact (no decap)
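For step 6, a capture can be sanity-checked offline: if decap is not happening, the pod-side pcap will show UDP packets on port 4789 whose payload still begins with the 8-byte VXLAN header. A minimal stdlib sketch (a hypothetical helper, not part of Calico or ksniff) that recognizes that header per RFC 7348:

```python
import struct

VXLAN_PORT = 4789  # IANA-assigned VXLAN UDP port

def parse_vxlan_header(udp_payload: bytes):
    """Parse the 8-byte VXLAN header (RFC 7348) from a UDP payload.

    Returns the 24-bit VNI if the I flag is set, else None.
    """
    if len(udp_payload) < 8:
        return None
    flags, vni_word = struct.unpack("!I I", udp_payload[:8])
    # The "I" (valid-VNI) flag is bit 0x08 of the first byte,
    # i.e. 0x08000000 in the first network-order 32-bit word.
    if not flags & 0x08000000:
        return None
    return vni_word >> 8  # VNI occupies the top 24 bits of the second word

# Example: a header carrying VNI 4096 (Calico's default VXLAN VNI)
hdr = struct.pack("!I I", 0x08000000, 4096 << 8)
print(parse_vxlan_header(hdr + b"...inner-ethernet-frame..."))  # 4096
```

If the payload of packets seen inside the pod parses successfully here, the kernel is delivering still-encapsulated frames, matching what the screenshots and pcaps below show.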

Context

Machine 1:

  • interface on 172.24.204.0/24

Machine 2:

  • interface on 172.24.214.0/24 (primary)
  • interface on 172.24.204.0/24

See the attached pcap files of the bad TCP SYN with retransmissions, captured from both ksniff and host tcpdump:
calico_pcap.tar.gz
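Note that the two machines share the 172.24.204.0/24 subnet while Machine 2's primary interface sits on 172.24.214.0/24. The overlap can be checked with the stdlib (a sketch using only the subnets listed above; the routing consequence in the comment is a hypothesis, not confirmed):

```python
import ipaddress

machine1 = [ipaddress.ip_network("172.24.204.0/24")]
machine2 = [ipaddress.ip_network("172.24.214.0/24"),  # primary
            ipaddress.ip_network("172.24.204.0/24")]

# Find subnets directly reachable from both machines
shared = [n1 for n1 in machine1 for n2 in machine2 if n1.overlaps(n2)]
print(shared)  # [IPv4Network('172.24.204.0/24')]

# Hypothesis: with a directly connected shared subnet, the kernel may
# route some traffic over that interface instead of the VXLAN underlay,
# so packets arrive encapsulated on one path but replies take another.
```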

Relevant calico node config values

env:
  # Use Kubernetes API as the backing datastore.
  - name: DATASTORE_TYPE
    value: "kubernetes"
  # Wait for the datastore.
  - name: WAIT_FOR_DATASTORE
    value: "true"
  # Set based on the k8s node name.
  - name: NODENAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
  # Choose the backend to use.
  - name: CALICO_NETWORKING_BACKEND
    valueFrom:
      configMapKeyRef:
        name: calico-config
        key: calico_backend
  # Cluster type to identify the deployment type
  - name: CLUSTER_TYPE
    value: "k8s,bgp"
  # Auto-detect the BGP IP address.
  - name: IP
    value: "autodetect"
  - name: IP_AUTODETECTION_METHOD
    value: "can-reach=172.24.214.12"
  # Disable IPIP
  - name: CALICO_IPV4POOL_IPIP
    value: "Never"
  # Enable or Disable VXLAN on the default IP pool.
  - name: CALICO_IPV4POOL_VXLAN
    value: "Always"
  # Set MTU for tunnel device used if ipip is enabled
  - name: FELIX_IPINIPMTU
    valueFrom:
      configMapKeyRef:
        name: calico-config
        key: veth_mtu
  # Set MTU for the VXLAN tunnel device.
  - name: FELIX_VXLANMTU
    valueFrom:
      configMapKeyRef:
        name: calico-config
        key: veth_mtu
  # Set MTU for the Wireguard tunnel device.
  - name: FELIX_WIREGUARDMTU
    valueFrom:
      configMapKeyRef:
        name: calico-config
        key: veth_mtu
  # The default IPv4 pool to create on startup if none exists. Pod IPs will be
  # chosen from this range. Changing this value after installation will have
  # no effect. This should fall within `--cluster-cidr`.
  # - name: CALICO_IPV4POOL_CIDR
  #   value: "192.168.0.0/16"
  # Disable file logging so `kubectl logs` works.
  - name: CALICO_DISABLE_FILE_LOGGING
    value: "true"
  # Set Felix endpoint to host default action to ACCEPT.
  - name: FELIX_DEFAULTENDPOINTTOHOSTACTION
    value: "ACCEPT"
  # Disable IPv6 on Kubernetes.
  - name: FELIX_IPV6SUPPORT
    value: "false"
  - name: FELIX_HEALTHENABLED
    value: "true"
  - name: FELIX_LOGSEVERITYSCREEN
    value: "error"
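For reference, IP_AUTODETECTION_METHOD with can-reach=172.24.214.12 selects the interface the kernel would use to route to that address. A rough, hypothetical sketch of an equivalent subnet-based choice (not Calico's actual implementation, which consults the routing table; interface names and addresses below are assumed for illustration):

```python
import ipaddress

def pick_interface(interfaces, target):
    """Pick the first interface whose subnet contains the target address.

    `interfaces` maps interface names to CIDR strings; a real
    implementation would also fall back to the default route.
    """
    t = ipaddress.ip_address(target)
    for name, cidr in interfaces.items():
        if t in ipaddress.ip_interface(cidr).network:
            return name
    return None

# Machine 2's two interfaces (addresses assumed for illustration)
ifaces = {"eth0": "172.24.214.10/24", "eth1": "172.24.204.10/24"}
print(pick_interface(ifaces, "172.24.214.12"))  # eth0 (the primary)
print(pick_interface(ifaces, "172.24.204.5"))   # eth1
```

On a multi-interface node this shows why the detected BGP/VXLAN address can land on the primary interface even though pod traffic may arrive via the other one.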

Your Environment

  • Calico version: 3.19
  • Kubernetes 1.23
  • Ubuntu 16.04 server

lmm commented May 17, 2022

Thanks @MiddleMan5 , a couple of questions:

  • When you say "docker container running on a node", do you mean a container within a Kubernetes pod, or something else?
  • Are you doing anything custom or manual with any of the Calico node setup? In particular, the "server" node's VXLAN setup.

MiddleMan5 (author) commented
Sorry for the late reply!

  • The Docker containers are running in a Kubernetes pod, and the Kubernetes version is as stated above (1.23).
  • Apart from the environment configuration listed above, the Calico deployment has not been modified from the 3.19 defaults.


lmm commented May 31, 2022

Thanks @MiddleMan5. Could you please:

  • Share the ip addr and ip route output from the server node.
  • Share the calico-node pod logs from the server (not sure how much you'd need to redact). You would have to change FELIX_LOGSEVERITYSCREEN to "info"; not sure whether that's feasible.
  • Try client -> server traffic using the 172.24.204.x address? I wonder if that works.


lmm commented May 31, 2022

(I think this might be a bug, but I want to understand it more before flagging it as such.)
