[HA] etcd cannot listen behind NAT if there is only one master node existing #2345

Closed
stevefan1999-personal opened this issue Oct 5, 2020 · 7 comments
Labels: area/etcd, kind/bug (Something isn't working), priority/important-soon (Must be staffed and worked on either currently, or very soon, ideally in time for the next release)

Comments

@stevefan1999-personal

stevefan1999-personal commented Oct 5, 2020

If you are behind a home router and want to use the embedded etcd, you are currently out of luck: the embedded etcd assumes that its own address is the same one k3s advertises. It therefore tries to listen on the external IP instead of the node's actual internal IP. For example, say your external IP is 123.123.123.123 and your router gives you the internal IP 192.168.1.123 from 192.168.1.0/24. etcd then tries to listen on 123.123.123.123, which is not a valid local address, and the resulting "deadlock" prevents bootstrapping from finishing. I tried to modify the etcd config, but it was overwritten every time. I need etcd to listen on 192.168.1.123, not 123.123.123.123.

If you change the advertise address to your internal IP (192.168.1.123 in this example), etcd starts working again, but none of your other nodes can connect to it, because you advertised the internal IP rather than the external IP 123.123.123.123. In other words, this is a dilemma.

This is only reproducible with a single master node. If you have more than one, the etcd client and the other nodes simply connect to the other reachable master nodes, where everything works, and they treat the unreachable node as partitioned.
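
To illustrate why listening on the external IP fails behind NAT, here is a minimal Python sketch (not k3s or etcd code). It assumes the default OS behavior, where a socket can only bind to an address assigned to a local interface. The helper name `can_bind` is made up for this example.

```python
import errno
import socket


def can_bind(ip: str, port: int = 0) -> bool:
    """Return True if a TCP socket can bind to `ip` on this host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((ip, port))  # port 0 lets the OS pick a free port
        except OSError as exc:
            # EADDRNOTAVAIL is the usual error for a non-local address.
            if exc.errno == errno.EADDRNOTAVAIL:
                return False
            raise
        return True


if __name__ == "__main__":
    # 123.123.123.123 stands in for the router's external IP from the
    # description; behind NAT it is not assigned to any local interface.
    for ip in ("127.0.0.1", "123.123.123.123"):
        print(ip, "bindable" if can_bind(ip) else "not bindable")
```

On a NATed host the external address belongs to the router, so the bind is rejected before any traffic is involved, which matches the listen failure described above.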

@stevefan1999-personal changed the title from "etcd cannot listen if behind NAT" to "[HA] etcd cannot listen if behind NAT if there is only one master node existing" Oct 5, 2020
@stevefan1999-personal changed the title from "[HA] etcd cannot listen if behind NAT if there is only one master node existing" to "[HA] etcd cannot listen behind NAT if there is only one master node existing" Oct 5, 2020
@stevefan1999-personal
Author

So here's what I think: we should allow the user to change the etcd listen addresses, or even listen on the wildcard 0.0.0.0. The latter is of course a bad catch-all solution, but who cares?
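
As a rough illustration of the wildcard idea, the Python sketch below (again, not k3s code) binds a listener to 0.0.0.0 and checks that a client can reach it through a specific local address. The function name is invented for this example.

```python
import socket


def wildcard_listener_reachable_via(ip: str) -> bool:
    """Listen on 0.0.0.0 and report whether a client can connect via `ip`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("0.0.0.0", 0))  # wildcard address, OS-chosen port
        server.listen(1)
        port = server.getsockname()[1]
        try:
            # The kernel completes the handshake for a listening socket,
            # so no explicit accept() is needed for this check.
            with socket.create_connection((ip, port), timeout=2):
                return True
        except OSError:
            return False


if __name__ == "__main__":
    print(wildcard_listener_reachable_via("127.0.0.1"))
```

A wildcard listener accepts connections on every local address, including whichever one the router forwards to. The cost is that the port is exposed on all interfaces, which is presumably why it is called a catch-all here.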

@brandond
Contributor

brandond commented Oct 5, 2020

Can you show the output of:

  • kubectl get nodes -o wide
  • AUTH=$(awk -F:: '{print $2}' /var/lib/rancher/k3s/server/node-token); curl -ks https://${AUTH}@localhost:6443/db/info
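
For readers unfamiliar with the awk step, here is a small Python sketch of what the second command does. It assumes the node token has the shape `<prefix>::<user>:<password>`, which is what the `::` separator implies. The token value and function names are hypothetical.

```python
def credentials_from_node_token(token: str) -> str:
    """Return the second '::'-separated field, like awk -F:: '{print $2}'."""
    fields = token.strip().split("::")
    if len(fields) < 2:
        raise ValueError("token does not contain a '::' separator")
    return fields[1]


def db_info_url(token: str, host: str = "localhost", port: int = 6443) -> str:
    """Build the URL that the curl command requests."""
    return f"https://{credentials_from_node_token(token)}@{host}:{port}/db/info"


if __name__ == "__main__":
    # Made-up token for illustration only; a real one lives in
    # /var/lib/rancher/k3s/server/node-token.
    print(db_info_url("K10deadbeef::server:s3cret"))
```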

@brandond
Contributor

brandond commented Oct 6, 2020

Oh I see, it won't even come up because it's trying to connect to itself at the public IP address.

@brandond added the [zube]: To Triage, area/etcd, kind/bug, and priority/important-soon labels Oct 6, 2020
@MonzElmasry self-assigned this Oct 30, 2020
@davidnuzik added this to the v1.19.4+k3s1 milestone Nov 3, 2020
@rancher-max
Contributor

Validated using latest master commit: ea916030c27acbb6bc22cfe235741ee485de426b

  • This is most easily reproduced by starting k3s with the flags --cluster-init --node-external-ip <public ip of node>
  • advertise-client-urls can be seen in the etcd config: cat /var/lib/rancher/k3s/server/db/etcd/config. When node-external-ip is passed, or in the scenario from the issue description, this value ends up being the public IP, which it should not be. The public IP also shows up in initial-advertise-peer-urls, initial-cluster, listen-client-urls, and listen-peer-urls.
  • The fix updates all of these to use the private IP of the node if it is available. Cases where no private IP is available are not relevant to this issue, but I regression tested them and saw no problems. In those cases, the public IP is correctly used in the etcd config.
  • k3s comes up successfully and uses the expected values.
  • Embedded etcd continues to work successfully.
  • etcd nodes can join the cluster successfully, whether or not an external IP is used.
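
The selection rule described in the validation notes can be sketched roughly as follows. This is an illustrative Python model, not the actual k3s change. The function names and the https scheme are assumptions, the ports are etcd's defaults (2379 for clients, 2380 for peers), and initial-cluster is left out because it also needs a member name.

```python
from typing import Dict, Optional

CLIENT_PORT = 2379  # etcd's default client port
PEER_PORT = 2380    # etcd's default peer port


def etcd_address(private_ip: Optional[str], public_ip: Optional[str]) -> str:
    """Prefer the private IP; fall back to the public IP if there is none."""
    if private_ip:
        return private_ip
    if public_ip:
        return public_ip
    raise ValueError("no usable IP address")


def etcd_url_settings(private_ip: Optional[str],
                      public_ip: Optional[str]) -> Dict[str, str]:
    """Build the URL-valued config fields named in the validation notes."""
    ip = etcd_address(private_ip, public_ip)
    client = f"https://{ip}:{CLIENT_PORT}"
    peer = f"https://{ip}:{PEER_PORT}"
    return {
        "advertise-client-urls": client,
        "listen-client-urls": client,
        "initial-advertise-peer-urls": peer,
        "listen-peer-urls": peer,
    }


if __name__ == "__main__":
    # Addresses taken from the example in the issue description.
    for key, value in etcd_url_settings("192.168.1.123", "123.123.123.123").items():
        print(f"{key}: {value}")
```

Note that under this rule the advertised address is also the private one, so a node behind NAT is only reachable by peers that can route to its private network.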

@adi90x

adi90x commented Dec 1, 2020

Hello,
Is there a way to keep etcd listening on the external address?
The setup is one master behind NAT and some other masters in the cloud.
So the first master needs to listen on the public IP if the other masters are to connect.
If the first master is in the cloud, it seems to be fine, as the other masters can connect.
Is there any workaround for that?

Regards,

@kable-wilmoth

kable-wilmoth commented Dec 2, 2020

I have a similar use case where I need etcd to advertise the external IP.

I'm working on a POC with K3s running in WSL2. WSL2 gets a new IP on every start, so I had to create a static IP address and pass it as --node-ip to keep k3s working between restarts. Windows holds the external IP and port-proxies to the dynamic IP of the WSL instance in which K3s and etcd run. It runs fine as a single master, and I can access it remotely with kubectl etc. The problem comes up when trying to form a cluster: another 'normal' K3s instance can't complete the cluster join, because the address etcd advertises is the Windows/WSL static IP, which of course is not routable.

If I could just add a flag telling etcd to advertise the external address instead of the internal private address, my use case would be solved.

I am able to get further by changing my static IP address to the Windows public IP (I am sure I have other problems ahead), but at least etcd now advertises a routable IP.
Yep, next problem: etcd is listening on the fake static Windows IP, so when Windows port-forwards to WSL2's dynamic eth0 IP, nothing is bound to it. Almost everything else listens on *, but etcd listens only on the private IP. So I am stuck; I guess I can play with some iptables forwarding...

I'm still wondering whether you can tell etcd what it should advertise, similar to the ETCD_ADVERTISE_CLIENT_URLS setting in their docs:
https://etcd.io/docs/v3.1.12/op-guide/configuration/
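
For context, standalone etcd lets the listen and advertise addresses differ through its own environment variables. The Python sketch below only assembles such an environment. It does not show a k3s option, since nothing in this thread indicates that k3s exposes one. The helper name, the https scheme, and the example addresses are assumptions.

```python
from typing import Dict


def split_etcd_env(listen_ip: str, advertise_ip: str) -> Dict[str, str]:
    """Listen on one address while advertising another (standalone etcd)."""
    return {
        # Must be an address the host can actually bind to.
        "ETCD_LISTEN_CLIENT_URLS": f"https://{listen_ip}:2379",
        "ETCD_LISTEN_PEER_URLS": f"https://{listen_ip}:2380",
        # Must be an address other members and clients can route to.
        "ETCD_ADVERTISE_CLIENT_URLS": f"https://{advertise_ip}:2379",
        "ETCD_INITIAL_ADVERTISE_PEER_URLS": f"https://{advertise_ip}:2380",
    }


if __name__ == "__main__":
    # Private address behind NAT, public address on the router.
    for name, value in split_etcd_env("192.168.1.123", "123.123.123.123").items():
        print(f"{name}={value}")
```

A split like this only works if the router or port proxy forwards the advertised ports to the listen address.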


8 participants