
Kube Proxy cannot route traffic to pods running on different nodes (Fedora IoT 31) #2069

Closed
dnoliver opened this issue Jul 27, 2020 · 4 comments


dnoliver commented Jul 27, 2020

Environmental Info:
K3s Version:

[root@first ~]# k3s -v
k3s version v1.18.6+k3s1 (6f56fa1d)

Node(s) CPU architecture, OS, and Version:

[root@first ~]# uname -a
Linux first.mshome.net 5.5.17-200.fc31.x86_64 #1  SMP Mon Apr 13 15:29:42 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
[root@second test]# uname -a
Linux second.mshome.net 5.5.17-200.fc31.x86_64 #1 SMP Mon Apr 13 15:29:42 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:

2 VMs running Fedora IoT 31
1 VM (first.mshome.net) is the primary
1 VM (second.mshome.net) is the secondary
host-gw flannel backend used instead of the default VXLAN because of #2049

Describe the bug:

Running the Kubernetes Basics sample to test my deployment.

The hosts were deployed following the "Kubernetes on Fedora IoT with k3s" post.

The primary is deployed with:

firewall-cmd --zone=public --add-port=6443/tcp --permanent
firewall-cmd --zone=public --add-port=8472/udp --permanent
firewall-cmd --reload
curl -sfL https://get.k3s.io | INSTALL_K3S_SKIP_START=true sh -
semanage permissive -a container_t
restorecon -R /var/lib/rancher
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 0 -i cni0 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 1 -s 10.42.0.0/15 -j ACCEPT

mkdir -p /etc/systemd/system/k3s.service.d/
cat > /etc/systemd/system/k3s.service.d/override.conf << EOA
[Unit]
After=var-lib-rancher.mount

[Service]
ExecStart=
ExecStart=/usr/local/bin/k3s server --flannel-backend=host-gw
EOA
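
Since the installer is run with INSTALL_K3S_SKIP_START=true and a systemd override is added, the server is then started manually. A minimal sketch of that step, assuming the stock k3s unit name installed by the script:

# pick up the override.conf, then start the server
systemctl daemon-reload
systemctl enable --now k3s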

The secondary is deployed with:

firewall-cmd --zone=public --add-port=6443/tcp --permanent
firewall-cmd --zone=public --add-port=8472/udp --permanent
firewall-cmd --reload
# Token generated in
# /var/lib/rancher/k3s/server/node-token
curl -sfL https://get.k3s.io | K3S_URL=https://first.mshome.net:6443 INSTALL_K3S_SKIP_START=true \
    K3S_TOKEN=<TOKEN> sh -
semanage permissive -a container_t
restorecon -R /var/lib/rancher
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 0 -i cni0 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 1 -s 10.42.0.0/15 -j ACCEPT

mkdir -p /etc/systemd/system/k3s-agent.service.d/
cat > /etc/systemd/system/k3s-agent.service.d/override.conf << EOA
[Unit]
After=var-lib-rancher.mount
EOA
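
As on the primary, the agent is then started manually; a sketch, assuming the k3s-agent unit name created by the installer:

systemctl daemon-reload
systemctl enable --now k3s-agent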

After everything is started, the nodes are up and running:

[root@first ~]# kubectl get nodes
NAME                STATUS   ROLES    AGE     VERSION
first.mshome.net    Ready    master   2d21h   v1.18.6+k3s1
second.mshome.net   Ready    <none>   2d21h   v1.18.6+k3s1

Deployment of the test pods is successful:

kubectl create deployment kubernetes-bootcamp --image=gcr.io/google-samples/kubernetes-bootcamp:v1
kubectl scale deployment kubernetes-bootcamp --replicas=4
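
To see which node each replica landed on (relevant below, since only the pods on the other node fail), the wide output can be used; a quick sketch:

# the NODE and IP columns show where each replica is running
kubectl get pods -o wide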

Trying to communicate with the pods through the Kube Proxy fails for pods that are not on the same host:

[root@first ~]# kubectl get pods
NAME                                   READY   STATUS    RESTARTS   AGE
kubernetes-bootcamp-6f6656d949-txbh8   1/1     Running   1          2d21h
kubernetes-bootcamp-6f6656d949-rtcz4   1/1     Running   1          2d21h
kubernetes-bootcamp-6f6656d949-6gnjj   1/1     Running   1          2d21h
kubernetes-bootcamp-6f6656d949-lh7s4   1/1     Running   1          2d21h

[root@first ~]# kubectl proxy &
[1] 8716

[root@first ~]# Starting to serve on 127.0.0.1:8001

[root@first ~]# curl http://localhost:8001/api/v1/namespaces/default/pods/kubernetes-bootcamp-6f6656d949-txbh8:8080/proxy/
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-txbh8 | v=1

[root@first ~]# curl http://localhost:8001/api/v1/namespaces/default/pods/kubernetes-bootcamp-6f6656d949-6gnjj:8080/proxy/
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-6gnjj | v=1

[root@first ~]# curl http://localhost:8001/api/v1/namespaces/default/pods/kubernetes-bootcamp-6f6656d949-lh7s4:8080/proxy/
Error trying to reach service: 'dial tcp 10.42.1.6:8080: connect: no route to host'

[root@first ~]# curl http://localhost:8001/api/v1/namespaces/default/pods/kubernetes-bootcamp-6f6656d949-rtcz4:8080/proxy/
Error trying to reach service: 'dial tcp 10.42.1.7:8080: connect: no route to host'

kill 8716
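
Given the "no route to host" errors, one thing worth checking is whether flannel's host-gw route to the other node's pod subnet is present on the server. A diagnostic sketch (the 10.42.1.0/24 subnet is inferred from the pod IPs above; the next-hop and interface are placeholders):

# with host-gw, flannel should install something like:
#   10.42.1.0/24 via <IP of second.mshome.net> dev <host interface>
ip route | grep 10.42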

Deploying a service and using the Kube Proxy to communicate with the service also fails when traffic is routed to pods on different nodes:

[root@first ~]# kubectl expose deployment/kubernetes-bootcamp --type="NodePort" --port 8080
service/kubernetes-bootcamp exposed

[root@first ~]# kubectl get service kubernetes-bootcamp
NAME                  TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
kubernetes-bootcamp   NodePort   10.43.66.205   <none>        8080:32611/TCP   14s

[root@first ~]# kubectl proxy &
[1] 9400
Starting to serve on 127.0.0.1:8001

[root@first ~]# curl localhost:8001/api/v1/namespaces/default/services/kubernetes-bootcamp:8080/proxy/
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-txbh8 | v=1

[root@first ~]# curl localhost:8001/api/v1/namespaces/default/services/kubernetes-bootcamp:8080/proxy/
Error trying to reach service: 'dial tcp 10.42.1.7:8080: connect: no route to host'

[root@first ~]# curl localhost:8001/api/v1/namespaces/default/services/kubernetes-bootcamp:8080/proxy/
Error trying to reach service: 'dial tcp 10.42.1.6:8080: connect: no route to host'

[root@first ~]# curl localhost:8001/api/v1/namespaces/default/services/kubernetes-bootcamp:8080/proxy/
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-6gnjj | v=1

[root@first ~]# kill 9400
[1]+  Terminated              kubectl proxy

[root@first ~]# kubectl delete service kubernetes-bootcamp 
service "kubernetes-bootcamp" deleted

Additionally, hitting the NodePort service on the nodes directly yields the same issue:

[root@first ~]# kubectl expose deployment/kubernetes-bootcamp --type="NodePort" --port 8080
service/kubernetes-bootcamp exposed

[root@first ~]# kubectl get services
NAME                  TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
kubernetes            ClusterIP   10.43.0.1     <none>        443/TCP          2d22h
kubernetes-bootcamp   NodePort    10.43.53.49   <none>        8080:32096/TCP   16s

[root@first ~]# curl first.mshome.net:32096
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-txbh8 | v=1

[root@first ~]# curl first.mshome.net:32096
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-6gnjj | v=1

[root@first ~]# curl first.mshome.net:32096
curl: (7) Failed to connect to first.mshome.net port 32096: No route to host

[root@first ~]# curl first.mshome.net:32096
curl: (7) Failed to connect to first.mshome.net port 32096: No route to host

[root@first ~]# curl second.mshome.net:32096
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-rtcz4 | v=1

[root@first ~]# curl second.mshome.net:32096
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-lh7s4 | v=1

[root@first ~]# curl second.mshome.net:32096
curl: (7) Failed to connect to second.mshome.net port 32096: No route to host

[root@first ~]# curl second.mshome.net:32096
curl: (7) Failed to connect to second.mshome.net port 32096: No route to host
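
The intermittent "No route to host" is consistent with the forwarded packets being rejected by firewalld rather than being genuinely unrouted: firewalld's default REJECT rule answers with icmp-host-prohibited, which curl reports as "No route to host". A diagnostic sketch to check whether the REJECT counters increase while the curls fail (generic iptables commands, not taken from this report):

# look for "REJECT ... reject-with icmp-host-prohibited" in the FORWARD chain
iptables -L FORWARD -v -n --line-numbers | grep -i reject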

Steps To Reproduce:

  • Installed K3s: included above

Expected behavior:

All the pods should be addressable using the Kube Proxy

Actual behavior:

Only pods on the same node are addressable using the Kube Proxy

Additional context / logs:

dnoliver changed the title from "Kube Proxy cannot route traffic to pods running on different host (Fedora IoT 31, HyperV VMs)" to "Kube Proxy cannot route traffic to pods running on different nodes (Fedora IoT 31, HyperV VMs)" on Jul 27, 2020

dnoliver (Author) commented:

Also tested this with libvirt, and it runs into the same problem, so I am removing "HyperV VMs" from the title.

dnoliver changed the title from "Kube Proxy cannot route traffic to pods running on different nodes (Fedora IoT 31, HyperV VMs)" to "Kube Proxy cannot route traffic to pods running on different nodes (Fedora IoT 31)" on Jul 30, 2020

dnoliver commented Aug 4, 2020

Using an Ingress to route traffic to pods on different hosts works!

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: kubernetes-bootcamp
spec:
  rules:
    - host: kubernetes-bootcamp.192.168.122.102.xip.io
      http:
        paths:
          - path: /
            backend:
              serviceName: kubernetes-bootcamp
              servicePort: 8080

kubectl apply -f my-ingress.yaml

[root@first ~]# curl kubernetes-bootcamp.192.168.122.102.xip.io
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-csxzp | v=1

[root@first ~]# curl kubernetes-bootcamp.192.168.122.102.xip.io
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-dph7z | v=1

[root@first ~]# curl kubernetes-bootcamp.192.168.122.102.xip.io
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-p4nrb | v=1

[root@first ~]# curl kubernetes-bootcamp.192.168.122.102.xip.io
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-qj97v | v=1

[root@first ~]# curl kubernetes-bootcamp.192.168.122.102.xip.io
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-csxzp | v=1

It also works from a different VM on the same virtual network that does not belong to the cluster.
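
For reference, the extensions/v1beta1 Ingress API used above is deprecated and removed in newer Kubernetes releases; an equivalent manifest under networking.k8s.io/v1 would look roughly like this (a sketch, not what was applied in this report):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kubernetes-bootcamp
spec:
  rules:
    - host: kubernetes-bootcamp.192.168.122.102.xip.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kubernetes-bootcamp
                port:
                  number: 8080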


stenwt commented Apr 5, 2021

I'm seeing the same with Fedora IoT 33 on Raspberry Pi 4s.

If I run this DaemonSet:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: overlaytest
spec:
  selector:
      matchLabels:
        name: overlaytest
  template:
    metadata:
      labels:
        name: overlaytest
    spec:
      tolerations:
      - operator: Exists      
      containers:
      - image: kubesphere/kubectl
        imagePullPolicy: Always
        name: overlaytest
        command: ["sh", "-c", "tail -f /dev/null"]
        terminationMessagePath: /dev/termination-log

And then run this script:

echo "=> Start network overlay test"
  kubectl get pods -l name=overlaytest -o jsonpath='{range .items[*]}{@.metadata.name}{" "}{@.spec.nodeName}{"\n"}{end}' |
  while read spod shost
    do kubectl get pods -l name=overlaytest -o jsonpath='{range .items[*]}{@.status.podIP}{" "}{@.spec.nodeName}{"\n"}{end}' |
    while read tip thost
      do kubectl --request-timeout='10s' exec $spod -c overlaytest -- /bin/sh -c "ping -c2 $tip > /dev/null 2>&1"
        RC=$?
        if [ $RC -ne 0 ]
          then echo FAIL: $spod on $shost cannot reach pod IP $tip on $thost
          else echo $shost can reach $thost
        fi
    done
  done
echo "=> End network overlay test"

Pods on the same node (so node 2 <-> node 2, node 3 <-> node 3) can communicate, but pods on different nodes cannot when deploying with the vxlan backend. If I set the backend to host-gw, all pods can communicate.

In my cluster, I've set SELinux to disabled and disabled firewalld. This is a new cluster with fresh installs of Fedora IoT 33 and k3s v1.20.5+k3s1.
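
For completeness, the host-gw backend can be set either with the --flannel-backend=host-gw server flag used earlier in this issue, or via the k3s config file on the server; a sketch, assuming a k3s release with config-file support:

# /etc/rancher/k3s/config.yaml
flannel-backend: host-gw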


stale bot commented Oct 2, 2021

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

stale bot added the status/stale label Oct 2, 2021
stale bot closed this as completed Oct 16, 2021