RKE INTERNAL-IP and EXTERNAL-IP addresses are not correctly set #22584
As a reference, at https://github.com/rgl/kubernetes-ubuntu-vagrant/blob/master/provision-kubernetes-master.sh, I launch k8s with:

```sh
kubeadm init \
  --kubernetes-version=1.15.3 \
  --apiserver-advertise-address=10.11.0.101 \
  --pod-network-cidr=10.12.0.0/16 \
  --service-cidr=10.13.0.0/16 \
  --service-dns-domain=vagrant.local
```

```sh
ip address show dev eth0 | grep 'inet '
# => inet 192.168.121.77/24 brd 192.168.121.255 scope global dynamic eth0
ip address show dev eth1 | grep 'inet '
# => inet 10.11.0.101/24 brd 10.11.0.255 scope global eth1
kubectl get nodes -o wide
# => NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
# => km1 Ready master 23m v1.15.3 10.11.0.101 <none> Ubuntu 18.04.3 LTS 4.15.0-58-generic docker://18.9.8
kubectl describe nodes
# => Name: km1
# => Roles: master
# => Labels: beta.kubernetes.io/arch=amd64
# => beta.kubernetes.io/os=linux
# => kubernetes.io/arch=amd64
# => kubernetes.io/hostname=km1
# => kubernetes.io/os=linux
# => node-role.kubernetes.io/master=
# => Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
# => node.alpha.kubernetes.io/ttl: 0
# => volumes.kubernetes.io/controller-managed-attach-detach: true
# => Addresses:
# => InternalIP: 10.11.0.101
# => Hostname: km1
ps auxw|grep 10.11.0.101
# => root 8378 4.0 4.8 1859952 49228 ? Ssl 10:23 0:46 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1 --resolv-conf=/run/systemd/resolve/resolv.conf --node-ip=10.11.0.101
# => root 8791 4.6 21.3 403328 214936 ? Ssl 10:23 0:53 kube-apiserver --advertise-address=10.11.0.101 --allow-privileged=true --authorization-mode=Node,RBAC --client-ca-file=/etc/kubernetes/pki/ca.crt --enable-admission-plugins=NodeRestriction --enable-bootstrap-token-auth=true --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key --etcd-servers=https://127.0.0.1:2379 --insecure-port=0 --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --requestheader-allowed-names=front-proxy-client --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6443 --service-account-key-file=/etc/kubernetes/pki/sa.pub --service-cluster-ip-range=10.13.0.0/16 --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
# => root 8873 2.5 3.2 10538668 32996 ? Ssl 10:23 0:28 etcd --advertise-client-urls=https://10.11.0.101:2379 --cert-file=/etc/kubernetes/pki/etcd/server.crt --client-cert-auth=true --data-dir=/var/lib/etcd --initial-advertise-peer-urls=https://10.11.0.101:2380 --initial-cluster=km1=https://10.11.0.101:2380 --key-file=/etc/kubernetes/pki/etcd/server.key --listen-client-urls=https://127.0.0.1:2379,https://10.11.0.101:2379 --listen-peer-urls=https://10.11.0.101:2380 --name=km1 --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt --peer-client-cert-auth=true --peer-key-file=/etc/kubernetes/pki/etcd/peer.key --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt --snapshot-count=10000 --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
```
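For context, the --node-ip=10.11.0.101 flag visible on the kubelet above is what makes this kubeadm setup report the right InternalIP. A sketch of the usual way to pin it on the Debian/Ubuntu kubeadm packages (an assumption about the mechanism, not necessarily what the linked script does):

```sh
# Pin the kubelet's reported InternalIP to the desired interface address.
# Note: this overwrites any existing KUBELET_EXTRA_ARGS.
echo 'KUBELET_EXTRA_ARGS=--node-ip=10.11.0.101' | sudo tee /etc/default/kubelet
sudo systemctl restart kubelet
```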
Is there a way to work around this? I just faced this issue and it is quite a blocker for me.
Facing this issue as well. The Internal-IP is set to a public IP address and I cannot get the nodes to communicate over the given private IP address. I've tried
Any news on this?
There seems to be no way to convince an RKE cluster to use only the specified IPs for a given node when the node's primary ethernet interface carries the public IP. In my use case, working with Hetzner Cloud (or bare metal), the nodes have public IPs and traffic is explicitly blocked on the public interface. All private traffic traverses a secondary VLAN interface with a different IP address. The workaround:
A huge caveat with the workaround is that the node still uses
This issue is currently hitting us in a (at the time of writing) benign way. This needs to be fixed ASAP, considering the age of the issue.
I've managed to "fix" the way Flannel extracts the host IP by passing the options below through the network section:

```yaml
network:
  plugin: canal
  options:
    canal_iface: ens10   # this interface is attached to a Private Network
    flannel_iface: ens10
  mtu: 0
  canal_network_provider:
    iface: ens10
  flannel_network_provider:
    iface: ens10
  node_selector: {}
  update_strategy: null
```

There's still a problem with the method used to extract the
* I'm trying to configure Kubernetes to communicate between hosts using Hetzner Private Networks while blocking all incoming traffic on the public IP address.

Update: I've managed to solve the
Probably related RKE issue, "add ability to force set node-ip argument on kubelet": rancher/rke#900
Any news on when this will be fixed?
Hitting the same issue on RKE2.
Hitting the same issue.
On RKE2, the issue with nodes being unable to communicate with each other was due to the kubelet defaulting to DNS or the ExternalIP. Our /etc/rancher/rke2/config.yaml looks like:
It's a workaround, but at least we can use the cluster.
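A minimal sketch of the kind of config meant here, assuming RKE2's node-ip and node-external-ip options; the addresses are placeholders:

```yaml
# Hypothetical /etc/rancher/rke2/config.yaml
node-ip: 10.0.0.2              # pin the kubelet's InternalIP to the private NIC
node-external-ip: 203.0.113.10 # advertise this as the ExternalIP explicitly
```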
Is there any way to apply these changes when launching an RKE cluster from Rancher? Like @iosifnicolae2, I am using Hetzner private networks for communications. No traffic is allowed over the public network, and we're deploying clusters using the node driver from Rancher. I've tried adding the network plugin settings; however, when deploying things like monitoring, I'm seeing that the node exporter endpoints use the public IP address and so cannot be reached.
Additionally, this causes issues with certificates: they are set up for the private LAN, and since requests go over the public network the certs aren't valid.
Were you able to solve your issue? I'm facing the same challenges and haven't had success so far. Whatever I do, it always boils down to the nodes not using the internal IPs that I assign to them.
A quick fix for this problem is to set up cert-manager to perform the verification using a DNS challenge.
@iosifnicolae2 I think what @haswalt actually tried to achieve is securing the internal network with SSL encryption. But the certs are generated for the public IP, which the nodes don't use in this case (they use the internal IP). Using a DNS challenge is good for issuing certs for a publicly reachable domain.

Has anyone figured out a working configuration for using a Rancher-generated RKE cluster with a Hetzner private network? This is what I'm after:

The cluster nodes are attached to Networks A and B. In network A we have the Hetzner Load Balancer with the public IP; in network B there is Rancher as a single Docker instance (for now). Adding a node to the cluster works, but the public IP is used. I want the node to use the internal IP in network B.
@riker09 You are correct, that was my aim. However, it's not just certs that are the problem. The incorrect IP setup means that endpoints like the Prometheus node exporters are set up using the public IP and can neither be accessed nor (and I think this is more important) secured with a firewall.
I think we're on the same page here. 🙂 What is puzzling me is the fact that this is still unsolved. When I explicitly tell the cluster node upon registration to use the private network interface (here: ens11):

```sh
docker run -d --privileged --restart=unless-stopped --net=host \
  -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run \
  rancher/rancher-agent:v2.5.7 \
  --server https://rancher.[REDACTED].tld --token [REDACTED] \
  --address ens11 --internal-address ens11 --worker
```

I don't have/see any problem with the nodes connecting to the Rancher cluster via its public IP. Does anybody else object to that?
Except I wouldn't want nodes to connect over the public network; that means Rancher communication is going over the WAN. I would expect to be able to make nodes use the private network exclusively.
Good point. But since the traffic should be SSL-encrypted, I didn't give it much thought. I will, however, set up a single Rancher node that will reside behind a load balancer.
No, you can issue HTTPS certificates for a domain name that points to an internal IP if you do a DNS challenge verification (I have HTTPS certificates for domains that point to internal IPs).
I think the point is being missed here and the focus is being pushed onto SSL. Let's ignore SSL for now, as SSL can be achieved in multiple ways.

The issue is that the external and internal IPs are set incorrectly, so communication between nodes ends up on the public network and does not work. In conversations with others in the Rancher community we managed to get a working setup (outside of Hetzner), but that involved using a stack where we could remove the public network interface and have only one, private, interface as eth0.

The setup we're aiming for here is that each node has two network interfaces (as provided by Hetzner); this can't really be changed. eth0 is connected to the public WAN. ens11 is the private network between nodes.

We want to be able to secure communication with Rancher using a firewall. With autoscaling in place, we don't know the IP address of each new node, but we do know the subnet of the private LAN, so we can allow access via that. We also want to avoid traffic going over the WAN entirely. Any public network access happens via the load balancers, which send traffic over the private network (ens11). So we essentially want to disable any communication over eth0.

Now, when we launch new nodes with RKE, they don't set the interface / internal network correctly, so connections to the Rancher server and various endpoints attempt to use eth0, which fails because the firewall blocks all traffic over the public WAN.
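To make the goal concrete, here is a minimal sketch of such a firewall policy, assuming ufw and a placeholder private subnet (the interface names are the ones used in this thread):

```sh
# Allow anything arriving on the private inter-node interface from the
# private subnet (placeholder: 10.0.0.0/16)...
sudo ufw allow in on ens11 from 10.0.0.0/16 to any
# ...and drop unsolicited traffic arriving on the public WAN interface.
sudo ufw deny in on eth0 to any
sudo ufw enable
```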
Yes, this! I couldn't say it any better (really, I tried), so thank you for your summary. Just a few notes, however:
You meant "...but we DO know the subnet", right? And the private networks between LB A and the Rancher Server, and between LB B and the Nodes, could differ; also between the Rancher Server and the Nodes. Hetzner names the virtual network interfaces [EDIT]
You're right, I did mean "DO". Thanks!
@riker09 The interface name is dictated by the VM type. From Hetzner's docs:
I believe I found a working solution for my problem. Let me explain my setup: there are three CX31 nodes running at Hetzner Cloud (the type shouldn't matter, I'm just being thorough). All are provisioned with a combination of cloud-init and Ansible.

```yaml
#cloud-config
groups:
  - mygroup
users:
  - name: myuser
    groups: users, admin, mygroup
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
    ssh_authorized_keys:
      - [REDACTED]
packages:
  - fail2ban
package_update: true
package_upgrade: true
runcmd:
  ## Enable Fail2Ban & SSH jail
  - mkdir -p /etc/fail2ban/jail.d
  - touch /etc/fail2ban/jail.d/sshd.local
  - printf "[sshd]\nenabled = true\nbanaction = iptables-multiport" > /etc/fail2ban/jail.d/sshd.local
  - systemctl enable fail2ban
  ## Harden SSH
  - sed -i -e '/^\(#\|\)PermitRootLogin/s/^.*$/PermitRootLogin no/' /etc/ssh/sshd_config
  - sed -i -e '/^\(#\|\)PasswordAuthentication/s/^.*$/PasswordAuthentication no/' /etc/ssh/sshd_config
  - sed -i -e '/^\(#\|\)X11Forwarding/s/^.*$/X11Forwarding no/' /etc/ssh/sshd_config
  - sed -i -e '/^\(#\|\)MaxAuthTries/s/^.*$/MaxAuthTries 2/' /etc/ssh/sshd_config
  - sed -i -e '/^\(#\|\)AllowTcpForwarding/s/^.*$/AllowTcpForwarding yes/' /etc/ssh/sshd_config
  - sed -i -e '/^\(#\|\)AllowAgentForwarding/s/^.*$/AllowAgentForwarding no/' /etc/ssh/sshd_config
  - sed -i -e '/^\(#\|\)AuthorizedKeysFile/s/^.*$/AuthorizedKeysFile .ssh\/authorized_keys/' /etc/ssh/sshd_config
  - sed -i '$a AllowUsers myuser' /etc/ssh/sshd_config
  ## Reboot
  - reboot
```

After the initial cloud config has successfully run, I further set up each node with Ansible. Nothing fancy, though. I install Docker with

One of the three nodes is hosting the Rancher installation, started by

I've created two networks in Hetzner Cloud named

This is the command that I've used to provision the cluster on the two remaining nodes:

```sh
## Main node
docker run -d \
--privileged \
--restart=unless-stopped \
--net=host \
-v /etc/kubernetes:/etc/kubernetes \
-v /var/run:/var/run \
rancher/rancher-agent:v2.5.7 \
--server https://10.2.0.4 \
--token [REDACTED] \
--ca-checksum [REDACTED] \
--address ens11 \
--internal-address ens10 \
--all-roles
## Worker node
docker run -d \
--privileged \
--restart=unless-stopped \
--net=host \
-v /etc/kubernetes:/etc/kubernetes \
-v /var/run:/var/run \
rancher/rancher-agent:v2.5.7 \
--server https://10.2.0.4 \
--token [REDACTED] \
--ca-checksum [REDACTED] \
--address ens11 \
--internal-address ens10 \
--worker
```

For the sake of completeness, this is the Rancher cluster.yaml config:

```yaml
answers: {}
docker_root_dir: /var/lib/docker
enable_cluster_alerting: false
enable_cluster_monitoring: false
enable_network_policy: false
fleet_workspace_name: fleet-default
local_cluster_auth_endpoint:
  enabled: false
name: mycluster
rancher_kubernetes_engine_config:
  addon_job_timeout: 45
  authentication:
    strategy: x509
  authorization: {}
  bastion_host:
    ssh_agent_auth: false
  cloud_provider: {}
  dns:
    linear_autoscaler_params: {}
    node_selector: null
    nodelocal:
      node_selector: null
      update_strategy: {}
    options: null
    reversecidrs: null
    stubdomains: null
    tolerations: null
    update_strategy: {}
    upstreamnameservers: null
  ignore_docker_version: true
  ingress:
    default_backend: false
    http_port: 0
    https_port: 0
    provider: none
  kubernetes_version: v1.20.5-rancher1-1
  monitoring:
    provider: metrics-server
    replicas: 1
  network:
    mtu: 0
    options:
      flannel_backend_type: vxlan
    plugin: weave
    weave_network_provider: {}
  restore:
    restore: false
  rotate_encryption_key: false
  services:
    etcd:
      backup_config:
        enabled: true
        interval_hours: 12
        retention: 6
        safe_timestamp: false
        timeout: 300
      creation: 12h
      extra_args:
        election-timeout: '5000'
        heartbeat-interval: '500'
      gid: 0
      retention: 72h
      snapshot: false
      uid: 0
    kube-api:
      always_pull_images: false
      pod_security_policy: false
      secrets_encryption_config:
        enabled: false
      service_node_port_range: 30000-32767
    kube-controller: {}
    kubelet:
      fail_swap_on: false
      generate_serving_certificate: false
    kubeproxy: {}
    scheduler: {}
  ssh_agent_auth: false
  upgrade_strategy:
    drain: false
    max_unavailable_controlplane: '1'
    max_unavailable_worker: 10%
    node_drain_input:
      delete_local_data: false
      force: false
      grace_period: -1
      ignore_daemon_sets: true
      timeout: 120
scheduled_cluster_scan:
  enabled: false
  scan_config:
    cis_scan_config:
      override_benchmark_version: rke-cis-1.5
      profile: permissive
  schedule_config:
    cron_schedule: 0 0 * * *
    retention: 24
```

With this configuration I was able to create a DaemonSet with an nginx image and a Service with NodePort 30080 that the Load Balancer routes to. The deployment of Longhorn also went through without any issues (it had failed in the past). The thing is, when I change the CNI from Weave to Canal, everything falls apart. So either the default setup for Canal is buggy, or it is missing some essential configuration. 🤷🏻♂️
@haswalt I figured out a way that works. The solution was already in this very issue, see #22584 (comment). The trick is to use two private networks and Weave as the CNI provider. In my experiments, when I used Canal it wouldn't work: although the nodes overview showed two private IPs, Longhorn would still try to access a public IP.

Long story short, here's my setup: two private networks that are attached to:

I have created a third private network that is attached to a Load Balancer and the cluster nodes, but that does not affect the Rancher cluster setup; I'm just mentioning it for completeness.

When creating the cluster in Rancher I select Weave as the CNI, answer "no" to Authorized Endpoint and "no" to Nginx Ingress, and that's it. Since my Rancher installation uses a Let's Encrypt certificate, I made one more change to my cluster nodes, and that is a modification to the

Let me know if this works for you.
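One plausible reading of that last change is a hosts-file entry so that the Rancher hostname resolves to its private IP on every node; a sketch with placeholder hostname and address:

```sh
# Hypothetical: pin the Rancher server's public hostname to its private IP
# so agent traffic stays on the private network while the Let's Encrypt
# certificate (issued for that hostname) still matches.
echo "10.2.0.4 rancher.example.tld" | sudo tee -a /etc/hosts
```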
@riker09 I'm not sure, but I think I ran into this problem a long time ago when experimenting with OpenStack.

The reduction in MTU is used so that whatever space is left can be used to add overlay routing information to the packet (for example, VXLAN encapsulation adds roughly 50 bytes, which is why a 1500-byte physical MTU typically yields a 1450-byte overlay MTU). I believe the solution was to either

/edit

Where does that 10.1.0.2 address fit in? Should that not have been an address from your "rancher" range?
You are absolutely right, this should have been an IP from one of the two networks. Achievement unlocked: typo spotter! 😄 Thanks for your insights into MTUs, I hope this is helpful for somebody else. I guess I will stick to my two-network solution for now.
This issue is still relevant. At least until a solution like #17180 is implemented.
This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.
What @cortopy said. 😄
If you look at Hetzner (private) networks and subnets more closely, the cloud servers are not in one L2 network and can only reach each other via the gateway of the given subnet (look at

So one either:
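An illustrative check of that routing behavior, with placeholder addresses and assuming the gateway is the network's first address (as Hetzner conventionally assigns):

```sh
# On a node attached to a Hetzner private network, the route to a peer in
# the same subnet goes via the network gateway rather than directly on-link.
ip route get 10.0.0.3
# => 10.0.0.3 via 10.0.0.1 dev ens10 src 10.0.0.2
```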
not stale
I just ran into this issue this week and found a workaround. It seems like the main issue is that Rancher does not pass the

The behavior that the kubelet uses to determine the IP can be found here: https://github.com/kubernetes/kubernetes/blob/0e0abd602fac12c4422f8fe89c1f04c34067a76f/pkg/kubelet/nodestatus/setters.go#L214; it boils down to:
So simply adding your desired node IP along with the node's hostname to
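Assuming the truncated sentence ends in the node's hosts file, a minimal sketch of the idea, reusing the hostname and private IP from the first comment in this thread as placeholders:

```sh
# Hypothetical: make the kubelet's hostname lookup return the desired
# private IP by adding a hosts entry.
echo "10.11.0.101 km1" | sudo tee -a /etc/hosts
```

With no --node-ip and no cloud provider, the kubelet falls back to resolving its own hostname, so an entry like this decides which address it reports.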
not stale
Hi! @jszanto, your analysis is right: nothing is sent to the kubelet, so it tries to determine the IP by itself. BUT the kubelet container itself is responsible for that wrong IP.

Line 163 in 13aff47

And the solution is easy and already there: just move the block from line 215

to line 162, and then use the $RESOLVED_ADDR in line 164.

There is a good chance that the route to the resolver is the route over the private network. Hey Rancher team, please add this small hack in the short term; it would really help bare-metal deployments with public/private networks. Thanks!

--------------- UPDATED ---------------
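A hypothetical sketch of the resolution step being proposed (the actual rke-tools entrypoint script is not reproduced in this thread, so this is only an illustration of the idea):

```sh
# Resolve this node's FQDN and take the first IPv4 answer as the kubelet
# address. If DNS (or /etc/hosts) maps the node name to its private IP,
# the kubelet ends up on the private network.
RESOLVED_ADDR=$(getent ahostsv4 "$(hostname -f)" | awk 'NR==1 {print $1}')
echo "kubelet address: ${RESOLVED_ADDR}"
```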
…etwork interfaces are on the node, and you want to use the private one: see rancher#22584 (comment)
I'm also facing the same issue, but the solution provided here didn't work for me.

Also tried

but no luck. :(
Hi everyone, I just wanted to add a solution that combines a few things from here and is working for me! By the way, I know some people have hard requirements for specific CNIs; if that's the case, then this may not help, because I had to use Calico for this to work (thanks bpesics for your comment above, it's what made me give it a try 😄). Just for context, this is a cluster on Hetzner Cloud that requires outgoing traffic, but I don't need all of that traffic to come from a specific/static IP. I used RKE to set this up and I wanted to load-balance requests across all of my worker nodes. The steps I took were as follows:
You can now load balance across the worker nodes on port

I know many may say that this is stupid because I am using

One final note: the reason that I chose a general network (

Hope this helps, and thanks to everyone in this issue (and others) who gave feedback that led to this solution. I appreciate the community and the knowledge shared here!
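As a rough sketch of the RKE side of such a setup, with placeholder names and addresses (RKE's cluster.yml does support per-node address/internal_address fields and the Calico plugin):

```yaml
# Hypothetical cluster.yml fragment, not the commenter's actual file.
nodes:
  - address: 203.0.113.10       # public IP, used by RKE for SSH provisioning
    internal_address: 10.0.0.2  # private network IP, used for cluster traffic
    user: deploy
    role: [controlplane, etcd, worker]
network:
  plugin: calico
```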
This repository uses an automated workflow to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the workflow can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the workflow will automatically close the issue in 14 days. Thank you for your contributions.
Not stale. Is there still no "proper" way to provision an RKE cluster with dual NICs?
Not stale
The ability to launch RKE in configurations with two network interfaces on the node machines is highly needed.
What kind of request is this (question/bug/enhancement/feature request):
bug.
Steps to reproduce (least amount of steps as possible):
Add a node to an RKE cluster as:
https://github.com/rgl/rancher-single-node-ubuntu-vagrant/blob/048567e05b87247ce14b1b3d2680314cbd7f3115/provision-rancher.sh#L182-L199
Result:
The INTERNAL-IP and EXTERNAL-IP are not correctly set, as can be seen in the following output:

Other details that may be helpful:
This is using a Vagrant VM which has two interfaces, eth0 (192.168.121.150) and eth1 (10.1.0.3). It should use the eth1 (10.1.0.3) IP address as the INTERNAL-IP and EXTERNAL-IP addresses.

The Vagrant environment is at https://github.com/rgl/rancher-single-node-ubuntu-vagrant.
Environment information

Rancher version (rancher/rancher / rancher/server image tag or shown bottom left in the UI): 2.2.8

Cluster information

Kubernetes version (kubectl version):

Docker version (docker version):