
no vxlan decap when host has multiple network interfaces #6051

Closed
MiddleMan5 opened this issue May 10, 2022 · 4 comments

MiddleMan5 commented May 10, 2022

The VXLAN header is still present when capturing traffic destined for a Docker container running on a node with multiple network interfaces. The result is endless TCP retransmission errors and connection timeouts.

Expected Behavior

VXLAN headers should be stripped from traffic that crosses subnet boundaries, regardless of whether the host has a network interface on the same subnet as the sender.

Alternatively, log errors or better document the limitations of multi-interface hosts.

Current Behavior

VXLAN headers are not stripped, and TCP connections time out.


Steps to Reproduce (for bugs)

  1. Configure host nodes in kubernetes as documented below
  2. Deploy tcp server on multi-interface node
  3. Deploy client on single interface node
  4. Deploy ksniff pod with tcpdump alongside server
  5. Send request from client to server
  6. Observe packets arriving on the server's network with VXLAN headers still intact (no decap)
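For step 6, a capture can be sanity-checked offline: if decap is not happening, the pod-side pcap will show UDP packets on port 4789 whose payload still begins with the 8-byte VXLAN header. A minimal stdlib sketch (a hypothetical helper, not part of Calico or ksniff) that recognizes that header per RFC 7348:

```python
import struct

VXLAN_PORT = 4789  # IANA-assigned VXLAN UDP port

def parse_vxlan_header(udp_payload: bytes):
    """Parse the 8-byte VXLAN header (RFC 7348) from a UDP payload.

    Returns the 24-bit VNI if the I flag is set, else None.
    """
    if len(udp_payload) < 8:
        return None
    flags, vni_word = struct.unpack("!I I", udp_payload[:8])
    # The "I" (valid-VNI) flag is bit 0x08 of the first byte,
    # i.e. 0x08000000 in the first network-order 32-bit word.
    if not flags & 0x08000000:
        return None
    return vni_word >> 8  # VNI occupies the top 24 bits of the second word

# Example: a header carrying VNI 4096 (Calico's default VXLAN VNI)
hdr = struct.pack("!I I", 0x08000000, 4096 << 8)
print(parse_vxlan_header(hdr + b"...inner-ethernet-frame..."))  # 4096
```

If the payload of packets seen inside the pod parses successfully here, the kernel is delivering still-encapsulated frames, matching what the screenshots and pcaps below show.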

Context

Machine 1:

  • interface on 172.24.204.0/24

Machine 2:

  • interface on 172.24.214.0/24 (primary)
  • interface on 172.24.204.0/24

See the attached pcap files of the bad TCP SYN with retransmissions, captured from both ksniff and host tcpdump:
calico_pcap.tar.gz
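Note that the two machines share the 172.24.204.0/24 subnet while Machine 2's primary interface sits on 172.24.214.0/24. The overlap can be checked with the stdlib (a sketch using only the subnets listed above; the routing consequence in the comment is a hypothesis, not confirmed):

```python
import ipaddress

machine1 = [ipaddress.ip_network("172.24.204.0/24")]
machine2 = [ipaddress.ip_network("172.24.214.0/24"),  # primary
            ipaddress.ip_network("172.24.204.0/24")]

# Find subnets directly reachable from both machines
shared = [n1 for n1 in machine1 for n2 in machine2 if n1.overlaps(n2)]
print(shared)  # [IPv4Network('172.24.204.0/24')]

# Hypothesis: with a directly connected shared subnet, the kernel may
# route some traffic over that interface instead of the VXLAN underlay,
# so packets arrive encapsulated on one path but replies take another.
```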

Relevant calico node config values

env:
  # Use Kubernetes API as the backing datastore.
  - name: DATASTORE_TYPE
    value: "kubernetes"
  # Wait for the datastore.
  - name: WAIT_FOR_DATASTORE
    value: "true"
  # Set based on the k8s node name.
  - name: NODENAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
  # Choose the backend to use.
  - name: CALICO_NETWORKING_BACKEND
    valueFrom:
      configMapKeyRef:
        name: calico-config
        key: calico_backend
  # Cluster type to identify the deployment type
  - name: CLUSTER_TYPE
    value: "k8s,bgp"
  # Auto-detect the BGP IP address.
  - name: IP
    value: "autodetect"
  - name: IP_AUTODETECTION_METHOD
    value: "can-reach=172.24.214.12"
  # Disable IPIP
  - name: CALICO_IPV4POOL_IPIP
    value: "Never"
  # Enable or Disable VXLAN on the default IP pool.
  - name: CALICO_IPV4POOL_VXLAN
    value: "Always"
  # Set MTU for tunnel device used if ipip is enabled
  - name: FELIX_IPINIPMTU
    valueFrom:
      configMapKeyRef:
        name: calico-config
        key: veth_mtu
  # Set MTU for the VXLAN tunnel device.
  - name: FELIX_VXLANMTU
    valueFrom:
      configMapKeyRef:
        name: calico-config
        key: veth_mtu
  # Set MTU for the Wireguard tunnel device.
  - name: FELIX_WIREGUARDMTU
    valueFrom:
      configMapKeyRef:
        name: calico-config
        key: veth_mtu
  # The default IPv4 pool to create on startup if none exists. Pod IPs will be
  # chosen from this range. Changing this value after installation will have
  # no effect. This should fall within `--cluster-cidr`.
  # - name: CALICO_IPV4POOL_CIDR
  #   value: "192.168.0.0/16"
  # Disable file logging so `kubectl logs` works.
  - name: CALICO_DISABLE_FILE_LOGGING
    value: "true"
  # Set Felix endpoint to host default action to ACCEPT.
  - name: FELIX_DEFAULTENDPOINTTOHOSTACTION
    value: "ACCEPT"
  # Disable IPv6 on Kubernetes.
  - name: FELIX_IPV6SUPPORT
    value: "false"
  - name: FELIX_HEALTHENABLED
    value: "true"
  - name: FELIX_LOGSEVERITYSCREEN
    value: "error"
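For reference, IP_AUTODETECTION_METHOD with can-reach=172.24.214.12 selects the interface the kernel would use to route to that address. A rough, hypothetical sketch of an equivalent subnet-based choice (not Calico's actual implementation, which consults the routing table; interface names and addresses below are assumed for illustration):

```python
import ipaddress

def pick_interface(interfaces, target):
    """Pick the first interface whose subnet contains the target address.

    `interfaces` maps interface names to CIDR strings; a real
    implementation would also fall back to the default route.
    """
    t = ipaddress.ip_address(target)
    for name, cidr in interfaces.items():
        if t in ipaddress.ip_interface(cidr).network:
            return name
    return None

# Machine 2's two interfaces (addresses assumed for illustration)
ifaces = {"eth0": "172.24.214.10/24", "eth1": "172.24.204.10/24"}
print(pick_interface(ifaces, "172.24.214.12"))  # eth0 (the primary)
print(pick_interface(ifaces, "172.24.204.5"))   # eth1
```

On a multi-interface node this shows why the detected BGP/VXLAN address can land on the primary interface even though pod traffic may arrive via the other one.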

Your Environment

  • Calico version: 3.19
  • Kubernetes 1.23
  • Ubuntu 16.04 server

lmm commented May 17, 2022

Thanks @MiddleMan5 , a couple of questions:

  • When you say "docker container running on a node", do you mean a container within a Kubernetes pod, or something else?
  • Are you doing anything custom or manual with any of the Calico node setup? In particular, the "server" node's VXLAN setup.

MiddleMan5 (author) commented
Sorry for the late reply!

  • The Docker containers are running in a Kubernetes pod, and the Kubernetes version is as stated above (1.23).
  • Apart from the environment configuration listed above, the Calico deployment has not been modified from the 3.19 defaults.


lmm commented May 31, 2022

Thanks @MiddleMan5. Could you please:

  • Share the ip addr and ip route output from the server node.
  • Share the calico-node pod logs from the server (not sure how much you'd need to redact). You would have to change FELIX_LOGSEVERITYSCREEN to "info"; not sure whether that's feasible.
  • Try client -> server traffic using the 172.24.204.x address? I wonder if that works.


lmm commented May 31, 2022

(I think this might be a bug, but I want to understand it more before flagging it as such.)
