
Communication between pods in different nodes not working (Fedora IoT 31, HyperV VMs) #2049

Closed
dnoliver opened this issue Jul 22, 2020 · 13 comments


@dnoliver

dnoliver commented Jul 22, 2020

Environmental Info:
K3s Version:

[root@first ~]# k3s -v
k3s version v1.18.6+k3s1 (6f56fa1d)

Node(s) CPU architecture, OS, and Version:

[root@first ~]# uname -a
Linux first.mshome.net 5.5.17-200.fc31.x86_64 #1  SMP Mon Apr 13 15:29:42 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
[root@second test]# uname -a
Linux second.mshome.net 5.5.17-200.fc31.x86_64 #1 SMP Mon Apr 13 15:29:42 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:

2 VMs running Fedora IoT 31
1 VM (first.mshome.net) is the primary
1 VM (second.mshome.net) is the secondary

Describe the bug:

Running the Kubernetes Basics sample to test my deployment.

The hosts were deployed following the Kubernetes on Fedora IoT with k3s post.

The primary is deployed with:

curl -sfL https://get.k3s.io | INSTALL_K3S_SKIP_START=true sh -

The secondary is deployed with:

curl -sfL https://get.k3s.io | K3S_URL=https://first.mshome.net:6443 INSTALL_K3S_SKIP_START=true K3S_TOKEN=K10977bcc7b0cf459d06084e75a4055c4899c4c1a83f9b7df59f6b1565e95383821::server:183f8d6761f5ea728f8b15142b0c43d4 sh -

After everything is started, the nodes are up and running:

[root@first ~]# kubectl get nodes
NAME                STATUS   ROLES    AGE     VERSION
first.mshome.net    Ready    master   3h30m   v1.18.6+k3s1
second.mshome.net   Ready    <none>   104m    v1.18.6+k3s1

After following the tutorial, pods are running:

NAME                                   READY   STATUS    RESTARTS   AGE
kubernetes-bootcamp-6f6656d949-p7l8f   1/1     Running   2          89m
kubernetes-bootcamp-6f6656d949-dnsjk   1/1     Running   2          102m
kubernetes-bootcamp-6f6656d949-sffrn   1/1     Running   2          89m
kubernetes-bootcamp-6f6656d949-89ckv   1/1     Running   2          89m

Getting the pods' IP addresses:

[root@first ~]# kubectl get pods -l app=kubernetes-bootcamp -o go-template='{{range .items}}{{.status.podIP}}{{"\n"}}{{end}}'
10.42.0.28
10.42.0.31
10.42.1.13
10.42.1.11

Start a test container for pinging pods:

[root@first ~]# kubectl run -it --rm --restart=Never alpine --image=alpine sh
If you don't see a command prompt, try pressing enter.
/ #

The sample app listens on port 8080, but I can only wget the pods running on the same host:

/ # wget -qO- --timeout 5 10.42.0.28:8080
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-p7l8f | v=1
/ # wget -qO- --timeout 5 10.42.0.31:8080
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-dnsjk | v=1
/ # wget -qO- --timeout 5 10.42.1.13:8080
wget: download timed out
/ # wget -qO- --timeout 5 10.42.1.11:8080
wget: download timed out

I have seen several similar issues, and I have tried some of the firewalld commands posted:

[root@first ~]# history | grep firewall-cmd
  100  firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 0 -i cni0 -j ACCEPT
  101  firewall-cmd --reload
  111  firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
  112  firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 1 -s 10.42.0.0/15 -j ACCEPT
  113  firewall-cmd --reload

This was done on both nodes (primary and secondary), followed by several reboots, and the issue is still there.

One thing I have not seen in any other issue is that nmcli reports the flannel interface as disconnected:

[root@first ~]# nmcli 
eth0: connected to eth0
        "The Linux Foundation Microsoft Hyper-V"
        ethernet (hv_netvsc), 00:15:5D:E5:22:37, hw, mtu 1500
        ip4 default
        inet4 172.17.110.196/28
        route4 0.0.0.0/0
        route4 172.17.110.192/28
        inet6 fe80::361c:2bb7:30ee:4a9d/64
        route6 fe80::/64
        route6 ff00::/8

cni0: connected to cni0
        "cni0"
        bridge, 32:CA:5A:20:1E:B1, sw, mtu 1450
        inet4 10.42.0.1/24
        route4 10.42.0.0/16
        inet6 fe80::30ca:5aff:fe20:1eb1/64
        route6 fe80::/64
        route6 ff00::/8

flannel.1: disconnected
        "flannel.1"
        vxlan, B2:FF:73:E6:CF:A3, sw, mtu 1450

Also, I do not have flanneld. I am not sure if I need it, though.

[root@first ~]# systemctl status flanneld
Unit flanneld.service could not be found.
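For what it's worth, NetworkManager showing flannel.1 as "disconnected" may only mean NM has no active connection profile for the device, not that the link itself is down. A hedged way to check the kernel's own view of the interface and routes (interface name and subnet taken from the output above):

```shell
# Inspect the vxlan device directly at the kernel level, since nmcli's
# state only reflects NetworkManager's management of the interface:
ip -d link show flannel.1

# The route to the other node's pod subnet should also be present:
ip route show | grep 10.42
```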

Steps To Reproduce:

  • Installed K3s: described above

Expected behavior:

wget -qO- --timeout 5 10.42.1.11:8080 should not fail.

Actual behavior:

wget -qO- --timeout 5 10.42.1.11:8080 fails.

Additional context / logs:

Additionally, I can perform several Kubernetes-related operations without problems: drain, uncordon, update, rollback, and service creation all work. But when two pods on different hosts try to talk, or when I use a service to load-balance requests, things start failing.

@brandond
Contributor

Flannel is built into k3s; you will not see a separate service for it.

Can you try stopping k3s, stopping firewalld (or any other iptables-based firewall), and then starting k3s - just to ensure that it's not conflicting with something else?
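The suggested sequence can be sketched as follows (assuming the default k3s unit names used elsewhere in this thread):

```shell
# On the server node:
systemctl stop firewalld
systemctl restart k3s

# On each agent node:
systemctl stop firewalld
systemctl restart k3s-agent
```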

@dnoliver
Author

Stopped firewalld on both the primary and the secondary, and restarted k3s and k3s-agent.

Still the same result:

[root@first ~]# systemctl status firewalld.service 
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Wed 2020-07-22 15:00:53 PDT; 1min 35s ago
     Docs: man:firewalld(1)
  Process: 815 ExecStart=/usr/sbin/firewalld --nofork --nopid $FIREWALLD_ARGS (code=exited, status=0/SUCCESS)
 Main PID: 815 (code=exited, status=0/SUCCESS)

Jul 22 09:54:04 first.mshome.net systemd[1]: Starting firewalld - dynamic firewall daemon...
Jul 22 09:54:05 first.mshome.net systemd[1]: Started firewalld - dynamic firewall daemon.
Jul 22 15:00:52 first.mshome.net systemd[1]: Stopping firewalld - dynamic firewall daemon...
Jul 22 15:00:53 first.mshome.net systemd[1]: firewalld.service: Succeeded.
Jul 22 15:00:53 first.mshome.net systemd[1]: Stopped firewalld - dynamic firewall daemon.

[root@first ~]# kubectl get pods -l app=kubernetes-bootcamp -o go-template='{{range .items}}{{.status.podIP}}{{"\n"}}{{end}}'
10.42.1.13
10.42.1.11
10.42.0.28
10.42.0.31

[root@first ~]# kubectl run -it --rm --restart=Never alpine --image=alpine sh
If you don't see a command prompt, try pressing enter.
/ # wget -qO- --timeout 5 10.42.1.13:8080
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-sffrn | v=1
/ # wget -qO- --timeout 5 10.42.0.28:8080
wget: download timed out

@dnoliver
Author

Additional information:

Using the Kube Proxy, I can communicate with all the pods:

[root@first ~]# kubectl proxy   
Starting to serve on 127.0.0.1:8001

[root@first ~]# kubectl get pods
NAME                                   READY   STATUS    RESTARTS   AGE
kubernetes-bootcamp-6f6656d949-lqkgk   1/1     Running   1          13h
kubernetes-bootcamp-6f6656d949-dnsjk   1/1     Running   3          24h
kubernetes-bootcamp-6f6656d949-p7l8f   1/1     Running   3          24h
kubernetes-bootcamp-6f6656d949-z4hkc   1/1     Running   1          13h

[root@first ~]# curl http://localhost:8001/api/v1/namespaces/default/pods/kubernetes-bootcamp-6f6656d949-lqkgk:8080/proxy/                                                                                   
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-lqkgk | v=1

[root@first ~]# curl http://localhost:8001/api/v1/namespaces/default/pods/kubernetes-bootcamp-6f6656d949-dnsjk:8080/proxy/                                                                                   
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-dnsjk | v=1

[root@first ~]# curl http://localhost:8001/api/v1/namespaces/default/pods/kubernetes-bootcamp-6f6656d949-p7l8f:8080/proxy/                                                                                   
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-p7l8f | v=1

[root@first ~]# curl http://localhost:8001/api/v1/namespaces/default/pods/kubernetes-bootcamp-6f6656d949-z4hkc:8080/proxy/                                                                                   
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-z4hkc | v=1

@brandond
Contributor

brandond commented Jul 23, 2020

FWIW, I tried this on Fedora IoT 32 and wasn't even able to get it to start due to changes to kernel cgroups. I'll try again on 31.

@dnoliver
Author

About the SELinux comment that I no longer see: I am installing the following:

rpm-ostree install --reboot https://rpm.rancher.io/k3s-selinux-0.1.1-rc1.el7.noarch.rpm

I have these SELinux packages installed:

[root@first ~]# rpm -qa selinux*
selinux-policy-3.14.4-50.fc31.noarch
selinux-policy-targeted-3.14.4-50.fc31.noarch

[root@first ~]# rpm -qa *-selinux
k3s-selinux-0.1.1-rc1.el7.noarch
rpm-plugin-selinux-4.15.1-1.fc31.x86_64
container-selinux-2.124.0-3.fc31.noarch
cockpit-selinux-220-1.fc31.noarch

Also, after installing k3s, I need to do a restorecon on the created directory, otherwise the policy throws errors:

$ restorecon -R /var/lib/rancher

And also, for testing purposes, I am running with the container_t context as permissive:

[root@first ~]# semodule -l | grep permissive
permissive_container_t
permissivedomains

For the cgroups problem, you are right: you need to fall back to cgroups v1 on F32 to make any container runtime work. This is my kernel command line:

[root@first ~]# cat /proc/cmdline 
BOOT_IMAGE=(hd0,gpt2)/ostree/fedora-iot-108a44c2f0a50881cdd8c62efa9680697e3ae3eca304c727a89507ba1e53e219/vmlinuz-5.5.17-200.fc31.x86_64 ima_policy=tcb user_namespace.enable=1 systemd.unified_cgroup_hierarchy=0 lockdown=confidentiality resume=/dev/mapper/system-swap rd.lvm.lv=system/root rd.lvm.lv=system/swap rd.shell=0 root=/dev/mapper/system-root ostree=/ostree/boot.1/fedora-iot/108a44c2f0a50881cdd8c62efa9680697e3ae3eca304c727a89507ba1e53e219/0

@brandond
Contributor

brandond commented Jul 23, 2020

Thanks - I hadn't gotten that far yet, since I just had a spare moment to test. I didn't see any mention of those steps in the fedoramagazine post, which is odd since it won't work at all otherwise.

I was able to reproduce this, but I don't have any idea off the top of my head what it might be.

@brandond
Contributor

brandond commented Jul 24, 2020

I tried restarting the k3s server node with --flannel-backend=host-gw and everything works, so I suspect it is something related to vxlan.

There was a kernel vxlan issue triggered by kube-proxy's iptables rules, but that was supposed to have been fixed (and I confirmed the fix on several different systems) as of the most recent releases of 1.16/1.17/1.18, so I'm not sure what it might be.

In the meantime, you can use host-gw instead of vxlan.
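With host-gw, flannel installs a plain route to each remote node's pod subnet via that node's IP instead of tunneling over the flannel.1 vxlan device, so one way to confirm the backend actually switched (a sketch assuming the default 10.42.0.0/16 cluster CIDR) is:

```shell
# After restarting the server with --flannel-backend=host-gw, remote
# nodes' pod subnets should be routed via the node IP rather than via
# flannel.1, i.e. something shaped like:
#   10.42.1.0/24 via <other node's IP> dev eth0
ip route show | grep 10.42
```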

@dnoliver
Author

Updated k3s service to use host-gw backend:

[root@first ~]# systemctl cat k3s.service 
# /etc/systemd/system/k3s.service
[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
Wants=network-online.target

[Install]
WantedBy=multi-user.target

[Service]
Type=notify
EnvironmentFile=/etc/systemd/system/k3s.service.env
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s \
    server \
    --flannel-backend=host-gw

Restarted/rebooted primary

On the secondaries, I edited the following file to use host-gw. Is this necessary?

[root@second agent]# cat /var/lib/rancher/k3s/agent/etc/flannel/net-conf.json 
{
        "Network": "10.42.0.0/16",
        "Backend": {
                "Type": "host-gw"
        }
}

Restarted the secondaries. At this point, I could communicate with the pod running on one of the secondaries, but not the other (freshly deployed). After trying various commands on the non-working secondary, I applied the firewall-cmd customization:

firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 0 -i cni0 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 1 -s 10.42.0.0/15 -j ACCEPT
firewall-cmd --reload

And then, everything works! I can talk with my pods :)

After that, I tested the same with a service:

[root@first ~]# kubectl expose deployment/kubernetes-bootcamp --type="NodePort" --port 8080

[root@first ~]# kubectl get services
NAME                  TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)          AGE
kubernetes            ClusterIP   10.43.0.1    <none>        443/TCP          2d
kubernetes-bootcamp   NodePort    10.43.6.56   <none>        8080:31175/TCP   96s

And from the alpine pod:

[root@first ~]# kubectl run -it --rm --restart=Never alpine --image=alpine sh                                                                                                                                  
If you don't see a command prompt, try pressing enter.

/ # wget -qO- 10.43.6.56:8080
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-vm7p7 | v=1
/ # wget -qO- 10.43.6.56:8080                      
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-mslrq | v=1
/ # wget -qO- 10.43.6.56:8080                      
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-zgg82 | v=1
/ # wget -qO- 10.43.6.56:8080                      
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-pvc4k | v=1
/ # wget -qO- 10.43.6.56:8080                      
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-mslrq | v=1
/ # wget -qO- 10.43.6.56:8080                      
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-vm7p7 | v=1

NodePort Service working fine!

Follow up questions:

  1. Do I need to modify /var/lib/rancher/k3s/agent/etc/flannel/net-conf.json on the secondaries as well to enable host-gw?
  2. Are the 3 firewall-cmd customizations necessary?
  3. Based on the Supported Backends docs, I have the feeling that host-gw will work fine (if not better) for my use case, which is several hosts communicating over a LAN via Ethernet or Wi-Fi. In which case would I need vxlan? For example, if I want one of my pods to move from my local nodes to a cloud host?

Thank you!

@brandond
Contributor

You only need to set --flannel-backend=host-gw on the server, since the flannel configuration is part of the kubelet config distributed to agents by the servers. You will need to restart the servers first, and then the agents, to pull in the change. I generally run with firewalld off, but I suspect the additional rules will still be necessary.
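In other words, the rollout order would look roughly like this (unit names and file path as used earlier in this thread):

```shell
# 1. On the server: add --flannel-backend=host-gw to ExecStart
#    (e.g. via a systemd drop-in) and restart the server first:
systemctl restart k3s

# 2. Then restart each agent so it pulls the updated flannel config:
systemctl restart k3s-agent

# 3. On an agent, confirm it received host-gw from the server:
cat /var/lib/rancher/k3s/agent/etc/flannel/net-conf.json
```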

@dnoliver
Author

Thanks!

I have updated my kickstart configuration to apply these changes and re-deployed my cluster from scratch.
NOTE: All the nodes use a partition mounted at /var/lib/rancher.

My primary is deployed with:

curl -sfL https://get.k3s.io | INSTALL_K3S_SKIP_START=true sh -
semanage permissive -a container_t
restorecon -R /var/lib/rancher
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 0 -i cni0 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 1 -s 10.42.0.0/15 -j ACCEPT

mkdir -p /etc/systemd/system/k3s.service.d/
cat > /etc/systemd/system/k3s.service.d/override.conf << EOA
[Unit]
After=var-lib-rancher.mount

[Service]
ExecStart=
ExecStart=/usr/local/bin/k3s server --flannel-backend=host-gw
EOA

The empty ExecStart= line is mandatory; otherwise systemd refuses to run the service because of bad configuration (something like "service of type oneshot cannot have multiple ExecStart").

The secondaries are deployed with:

curl -sfL https://get.k3s.io | K3S_URL=https://first.mshome.net:6443 INSTALL_K3S_SKIP_START=true \
    K3S_TOKEN=<TOKEN> sh -
semanage permissive -a container_t
restorecon -R /var/lib/rancher
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 0 -i cni0 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 1 -s 10.42.0.0/15 -j ACCEPT

mkdir -p /etc/systemd/system/k3s-agent.service.d/
cat > /etc/systemd/system/k3s-agent.service.d/override.conf << EOA
[Unit]
After=var-lib-rancher.mount
EOA

And the /var/lib/rancher/k3s/agent/etc/flannel/net-conf.json is automatically populated with "host-gw" :)

After this new deployment, I executed these tests to check connectivity:

# Deployment
kubectl create deployment kubernetes-bootcamp --image=gcr.io/google-samples/kubernetes-bootcamp:v1
kubectl scale deployment kubernetes-bootcamp --replicas=4

# Communication between pods
kubectl get pods -l app=kubernetes-bootcamp -o go-template='{{range .items}}{{.status.podIP}}{{"\n"}}{{end}}'
kubectl run -it --rm --restart=Never alpine --image=alpine sh
for i in <POD 1 IP> <POD 2 IP> <POD 3 IP> <POD 4 IP>; do wget -qO- $i:8080; done

# Communication with Service from inside pod
kubectl expose deployment/kubernetes-bootcamp --type="NodePort" --port 8080
kubectl get service kubernetes-bootcamp
kubectl run -it --rm --restart=Never alpine --image=alpine sh
wget -qO- <SERVICE IP>:8080

# Communication with Kube Proxy
kubectl get pods
kubectl proxy
curl http://localhost:8001/api/v1/namespaces/default/pods/<POD NAME>:8080/proxy/

# Communication with Service using Kube Proxy
kubectl expose deployment/kubernetes-bootcamp --type="NodePort" --port 8080
kubectl get service kubernetes-bootcamp
curl localhost:8001/api/v1/namespaces/default/services/kubernetes-bootcamp:8080/proxy/

Communication between pods

Works!

[root@first ~]# kubectl get pods -l app=kubernetes-bootcamp -o go-template='{{range .items}}{{.status.podIP}}{{"\n"}}{{end}}'
10.42.0.8
10.42.0.9
10.42.1.3
10.42.1.4

[root@first ~]# kubectl run -it --rm --restart=Never alpine --image=alpine sh
If you don't see a command prompt, try pressing enter.
/ # for i in 10.42.0.8 10.42.0.9 10.42.1.3 10.42.1.4; do wget -qO- $i:8080; done
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-6gnjj | v=1
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-txbh8 | v=1
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-lh7s4 | v=1
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-rtcz4 | v=1
/ # exit
pod "alpine" deleted

Communication with Service from inside pod

Works!

[root@first ~]# kubectl expose deployment/kubernetes-bootcamp --type="NodePort" --port 8080
service/kubernetes-bootcamp exposed

[root@first ~]# kubectl get service kubernetes-bootcamp
NAME                  TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
kubernetes-bootcamp   NodePort   10.43.187.101   <none>        8080:31812/TCP   5s

[root@first ~]# kubectl run -it --rm --restart=Never alpine --image=alpine sh
If you don't see a command prompt, try pressing enter.
/ # wget -qO- 10.43.187.101:8080
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-txbh8 | v=1
/ # wget -qO- 10.43.187.101:8080
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-lh7s4 | v=1
/ # wget -qO- 10.43.187.101:8080
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-txbh8 | v=1
/ # wget -qO- 10.43.187.101:8080
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-6gnjj | v=1

Communication with Kube Proxy

Works only for pods on the primary. This was working before, as in #2049 (comment).

[root@first ~]# kubectl get pods
NAME                                   READY   STATUS    RESTARTS   AGE
kubernetes-bootcamp-6f6656d949-6gnjj   1/1     Running   0          30m
kubernetes-bootcamp-6f6656d949-txbh8   1/1     Running   0          30m
kubernetes-bootcamp-6f6656d949-lh7s4   1/1     Running   0          30m
kubernetes-bootcamp-6f6656d949-rtcz4   1/1     Running   0          30m

[root@first ~]# kubectl proxy &
[1] 23995

[root@first ~]# Starting to serve on 127.0.0.1:8001

[root@first ~]# curl http://localhost:8001/api/v1/namespaces/default/pods/kubernetes-bootcamp-6f6656d949-6gnjj:8080/proxy/
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-6gnjj | v=1

[root@first ~]# curl http://localhost:8001/api/v1/namespaces/default/pods/kubernetes-bootcamp-6f6656d949-txbh8:8080/proxy/
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-txbh8 | v=1

[root@first ~]# curl http://localhost:8001/api/v1/namespaces/default/pods/kubernetes-bootcamp-6f6656d949-lh7s4:8080/proxy/
Error trying to reach service: 'dial tcp 10.42.1.3:8080: connect: no route to host' 

[root@first ~]# curl http://localhost:8001/api/v1/namespaces/default/pods/kubernetes-bootcamp-6f6656d949-rtcz4:8080/proxy/
Error trying to reach service: 'dial tcp 10.42.1.4:8080: connect: no route to host'

[root@first ~]# kill 23995

Communication with Service using Kube Proxy

Only working for pods located on the same node as the proxy:

[root@first ~]# kubectl expose deployment/kubernetes-bootcamp --type="NodePort" --port 8080
service/kubernetes-bootcamp exposed

[root@first ~]# kubectl get service kubernetes-bootcamp
NAME                  TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
kubernetes-bootcamp   NodePort   10.43.62.217   <none>        8080:30303/TCP   4s

[root@first ~]# kubectl proxy &
[1] 24592
[root@first ~]# Starting to serve on 127.0.0.1:8001

[root@first ~]# curl localhost:8001/api/v1/namespaces/default/services/kubernetes-bootcamp:8080/proxy/
Error trying to reach service: 'dial tcp 10.42.1.4:8080: connect: no route to host' 

[root@first ~]# curl localhost:8001/api/v1/namespaces/default/services/kubernetes-bootcamp:8080/proxy/
Error trying to reach service: 'dial tcp 10.42.1.3:8080: connect: no route to host' 

[root@first ~]# curl localhost:8001/api/v1/namespaces/default/services/kubernetes-bootcamp:8080/proxy/
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-txbh8 | v=1

[root@first ~]# curl localhost:8001/api/v1/namespaces/default/services/kubernetes-bootcamp:8080/proxy/
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-6gnjj | v=1

So pod-to-pod and pod-to-service-to-pod networking are apparently working fine. That was the reported issue, and it seems to be solved by using host-gw and the firewall rules.

Kube Proxy Networking seems to have additional problems. I could create a new issue for that one, or continue here, whatever you prefer!

@brandond
Contributor

Let's do another issue for that one.

@dnoliver
Author

Done! So the state of this issue is:

  1. VXLAN does not work in this setup
  2. Use host-gw as a workaround
  3. Kube Proxy problems reported in another issue: Kube Proxy cannot route traffic to pods running on different nodes (Fedora IoT 31) #2069

@caroline-suse-rancher
Contributor

Closing due to age; this seems to be a relatively isolated incident.
