
Distributed hybrid and wireguard-native: wrong interface internal IP? #7355

Closed

acamacho-unige opened this issue Apr 26, 2023 · 8 comments

acamacho-unige commented Apr 26, 2023

Environmental Info:
K3s Version:

  • k3s version v1.26.3+k3s1 (01ea3ff)
  • go version go1.19.7

Node(s) CPU architecture, OS, and Version:

  • Linux k3s-master 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
  • Linux k3s-node1 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
  • Linux debian-test 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux

Cluster Configuration:

  • 1 server, 2 nodes
    • site1 (public_ip = XX.XX.XX.XX): k3s-master (192.168.1.161) + k3s-node1 (192.168.1.162)
    • site2 (public_ip = YY.YY.YY.YY): debian-test (192.168.1.230)

K3s-master configuration:

ExecStart=/usr/local/bin/k3s \
    server \
    --node-external-ip XX.XX.XX.XX \
    --kube-apiserver-arg service-node-port-range=31000-31767 \
    --flannel-backend=wireguard-native \
    --flannel-external-ip \

K3s-node1 configuration:

$ cat /etc/systemd/system/k3s-agent.service
ExecStart=/usr/local/bin/k3s \
    agent \

$ cat /etc/systemd/system/k3s-agent.service.env
K3S_TOKEN='XXXXXXXXXXXXXXXXXXXX'
K3S_URL='https://192.168.1.161:6443'

Debian-test configuration:

$ cat /etc/systemd/system/k3s-agent.service
ExecStart=/usr/local/bin/k3s \
    agent \
    --node-external-ip=YY.YY.YY.YY \

$ cat /etc/systemd/system/k3s-agent.service.env
K3S_TOKEN='K100723f093e1a69c6669619c6579228fa03f52f7861c34a4271b3a0917bb6146bc::server:e73ab9b910abae9aeb036ff4bea1c1f5'
K3S_URL='https://XX.XX.XX.XX:6443'

Describe the bug:

  • I am trying to create a hybrid cluster with two nodes on site1 and one node on site2.
  • I understood from the documentation that using --flannel-backend=wireguard-native should create a mesh VPN network between the nodes and use that network for internal communication.
  • I expected the INTERNAL-IP column of kubectl get nodes -o wide to show the WireGuard IP.
  • wg show on every node seems to confirm that the master can communicate with the workers, and that the workers can ping the master, over the VPN mesh network (an illustrative excerpt follows after this list).
  • What I did not understand:
    • whether worker nodes should be able to ping each other over the VPN mesh network;
    • whether I can build a distributed hybrid cluster when k3s-master and k3s-node1 share the same public IP address.
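
For reference, a trimmed illustration of what wg show reports in such a setup. Keys and timers are elided with "…"; the listen port 51820 and the per-node pod-subnet allowed-ips shown here are flannel wireguard-native defaults, given only as an assumed example:

$ wg show flannel-wg
interface: flannel-wg
  public key: …
  listening port: 51820

peer: …
  endpoint: YY.YY.YY.YY:51820
  allowed ips: 10.42.2.0/24
  latest handshake: …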

Steps To Reproduce:

  • Installed the K3s server using the configuration above
  • Installed the K3s workers using the configurations above
  • Deployed a hello-world application to confirm that I can reach a pod on debian-test through a Service or Ingress (a sketch of such a check follows below)
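
A minimal sketch of that check (hypothetical: the image, deployment name, and exact NodePort are illustrative, not the ones actually used; the port is assigned from the service-node-port-range configured above):

$ kubectl create deployment hello-world --image=nginxdemos/hello
$ kubectl expose deployment hello-world --type=NodePort --port=80
$ kubectl get svc hello-world            # note the assigned NodePort (31000-31767 here)
$ curl http://YY.YY.YY.YY:<nodeport>/    # reachable through the debian-test node as well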

Expected behavior:

  • Result of kubectl get nodes -o wide on master
NAME          STATUS   ROLES                  AGE   VERSION        INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION    CONTAINER-RUNTIME
k3s-master    Ready    control-plane,master   36d   v1.26.3+k3s1   10.42.0.1     XX.XX.XX.XX   Debian GNU/Linux 11 (bullseye)   5.10.0-21-amd64   containerd://1.6.19-k3s1
debian-test   Ready    <none>                 47h   v1.26.3+k3s1   10.42.2.1     YY.YY.YY.YY   Debian GNU/Linux 11 (bullseye)   5.10.0-21-amd64   containerd://1.6.19-k3s1
k3s-node1     Ready    <none>                 36d   v1.26.3+k3s1   10.42.1.1     <none>        Debian GNU/Linux 11 (bullseye)   5.10.0-21-amd64   containerd://1.6.19-k3s1

Actual behavior:

  • Result of kubectl get nodes -o wide on master
NAME          STATUS   ROLES                  AGE   VERSION        INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION    CONTAINER-RUNTIME
k3s-master    Ready    control-plane,master   36d   v1.26.3+k3s1   192.168.1.161   XX.XX.XX.XX   Debian GNU/Linux 11 (bullseye)   5.10.0-21-amd64   containerd://1.6.19-k3s1
debian-test   Ready    <none>                 47h   v1.26.3+k3s1   192.168.1.230   YY.YY.YY.YY   Debian GNU/Linux 11 (bullseye)   5.10.0-21-amd64   containerd://1.6.19-k3s1
k3s-node1     Ready    <none>                 36d   v1.26.3+k3s1   192.168.1.162   <none>        Debian GNU/Linux 11 (bullseye)   5.10.0-21-amd64   containerd://1.6.19-k3s1

Additional context / logs:

  • I am probably misunderstanding something network-related, but I can't figure out what.

Thanks a lot!

acamacho-unige (Author) commented Apr 26, 2023

Complementary information:

  • I think my setup revealed a problem when I deployed kube-prometheus: part of that stack automatically deploys node-exporter on every node.
  • kube-prometheus seems to try to scrape the node-exporter pods on the following IP addresses:

Endpoint | State | Labels | Last Scrape | Scrape Duration | Error
-- | -- | -- | -- | -- | --
https://192.168.1.161:9100/metrics | UP | container="kube-rbac-proxy", endpoint="https", instance="k3s-master", job="node-exporter", namespace="monitoring", pod="node-exporter-gk64g", service="node-exporter" | 14.433s ago | 17.187ms |
https://192.168.1.230:9100/metrics | DOWN | container="kube-rbac-proxy", endpoint="https", instance="debian-test", job="node-exporter", namespace="monitoring", pod="node-exporter-vp6ch", service="node-exporter" | 8.346s ago | 3.59s | Get "https://192.168.1.230:9100/metrics": dial tcp 192.168.1.230:9100: connect: no route to host
https://192.168.1.162:9100/metrics | UP | container="kube-rbac-proxy", endpoint="https", instance="k3s-node1", job="node-exporter", namespace="monitoring", pod="node-exporter-jjn9f", service="node-exporter" | 10.753s ago | 17.459ms |

  • I don't know whether a correctly set-up cluster should advertise the correct (external) IP for the debian-test node,
  • or whether this is entirely expected behaviour and I need to configure my prometheus stack to use the external IP when connecting to debian-test:9100/metrics.

Another example, from the output of kubectl get endpoints: it seems to me that, since these are local internal IPs, the master node will never be able to reach the debian-test endpoint (a quick check follows after the output).

root@k3s-master:/# kubectl -n monitoring get endpoints node-exporter
NAME            ENDPOINTS                                                  AGE
node-exporter   192.168.1.161:9100,192.168.1.162:9100,192.168.1.230:9100   21h
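
A quick check from the master makes the problem visible (hypothetical invocation; the failure is the same "no route to host" reported by the Prometheus target above):

root@k3s-master:/# curl -k https://192.168.1.230:9100/metrics
curl: (7) Failed to connect to 192.168.1.230 port 9100: No route to host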

Thanks a lot

brandond (Contributor) commented:

The wireguard flannel backend only handles cluster networking - traffic between pods and services on different nodes. It does not solve the problem of how to allow pods to access nodes across isolated networks. Things like prometheus or metrics-server that attempt to scrape nodes directly will not work, as traffic leaving the cluster to a node IP is not handled by the CNI.

I would take a look at #7353 - it addresses many of the challenges you're running into.
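
To make the distinction concrete, a hypothetical illustration (the pod name nettest and the 10.42.2.x pod IP are made up for this sketch):

$ kubectl run nettest --image=busybox --restart=Never -- sleep 3600
$ kubectl exec nettest -- ping -c 1 10.42.2.5      # pod-to-pod across sites: carried by the wireguard mesh, works
$ kubectl exec nettest -- ping -c 1 192.168.1.230  # pod to a remote node IP: leaves the CNI, fails across sites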

acamacho-unige (Author) commented:

> The wireguard flannel backend only handles cluster networking - traffic between pods and services on different nodes. It does not solve the problem of how to allow pods to access nodes across isolated networks. Things like prometheus or metrics-server that attempt to scrape nodes directly will not work, as traffic leaving the cluster to a node IP is not handled by the CNI.
>
> I would take a look at #7353 - it addresses many of the challenges you're running into.

Can I assume from your reply that my k3s setup is correct, and that my problem is simply that deploying kube-prometheus in my cluster will need some custom configuration?

brandond (Contributor) commented:

It looks good at first glance. You didn't mention any problems with pod-to-pod or pod-to-service connectivity, so I'm assuming the wireguard-native flannel backend is otherwise working properly?

acamacho-unige (Author) commented:

Yes, that seems to work without any problem: I can access a hello-world service that reaches a pod on my remote node (debian-test) as well as a pod on my master node (k3s-master).

Thank you very much for your valuable information and quick reply!

manuelbuil (Contributor) commented:

I can confirm that your config is totally correct. The node IP is expected to remain the private IP. If you run ip r you will see that non-local cluster traffic is routed via the wireguard interface (in my case: 10.42.0.0/16 dev flannel-wg scope link).
As @brandond mentioned, pod-->remote-host communication does not work as expected because it follows a different path. PR #7353 will include an alternative that solves it.
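
For completeness, what that can look like on the master (trimmed; the pod-CIDR route is the one quoted above, while the default gateway and LAN routes are illustrative assumptions for this topology):

$ ip r
default via 192.168.1.1 dev eth0
10.42.0.0/16 dev flannel-wg scope link
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.161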

acamacho-unige (Author) commented:

Thanks a lot.

For other people having the same problem: what I needed to specify on the master node and on each worker node was the --node-ip parameter.

Forcing the WireGuard VPN IP on each node solved my problems (a hypothetical sketch follows below).
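
A hypothetical sketch of that change (the 10.42.x.y addresses are illustrative only; read the real address off the flannel-wg interface on each node, e.g. with ip a show flannel-wg):

# server unit: original flags, plus --node-ip
ExecStart=/usr/local/bin/k3s \
    server \
    --node-external-ip XX.XX.XX.XX \
    --kube-apiserver-arg service-node-port-range=31000-31767 \
    --flannel-backend=wireguard-native \
    --flannel-external-ip \
    --node-ip 10.42.0.1 \

# agent unit (per node, each with its own WireGuard address)
ExecStart=/usr/local/bin/k3s \
    agent \
    --node-ip 10.42.1.1 \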

caroline-suse-rancher (Contributor) commented:

Wonderful to hear we were able to help! Closing for now since there's not a clear bug/request here.
