Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot connect to Lighhouse when client WAN IP changes until cache expires (if ever) #515

Closed
borrelan opened this issue Aug 24, 2021 · 6 comments

Comments

@borrelan
Copy link

borrelan commented Aug 24, 2021

First off I would like to thank you for opening this tool/service with the world as opensource software.

I tried searching for an issue that was similar in nature, but unsure if #512 qualifies.

I have a nagging issue which causes nebula to disconnect and not reconnect/recover for over 5 minutes. I believe the issue comes from a stale external IP cache, but I can't tell where it's stored as restarting Lighthouse (nebula_v1.4.0_linux_amd64) and client (nebula_v1.4.0_linux_arm64) doesn't resolve the issue. Eventually everything sorts itself out, but in a painfully slow timeframe.

Lighhouse:
vendor_id : GenuineIntel
cpu family : 6
model : 61
model name : Intel Core Processor (Broadwell, no TSX, IBRS)

Client:
Processor : AArch64 Processor rev 4 (aarch64)
Hardware : sun50iw1p1

Please find the logs below from the Lighhouse node:

Aug 24 17:02:52 cdn nebula[21623]: time="2021-08-24T17:02:52Z" level=info msg="Handshake message received" certName={filtered} fingerprint=9a8af4edd523c19c7c46d8fe3b1bad102f08d6c6e5b1c97e9ed76ab4f18b5bff handshake="map[stage:1 style:ix_psk0]" initiatorIndex=901163749 remoteIndex=0 responderIndex=0 udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:02:52 cdn nebula[21623]: time="2021-08-24T17:02:52Z" level=info msg="Taking new handshake" certName={filtered} vpnIp=192.168.3.2 Aug 24 17:02:52 cdn nebula[21623]: time="2021-08-24T17:02:52Z" level=info msg="Handshake message sent" certName={filtered} fingerprint=9a8af4edd523c19c7c46d8fe3b1bad102f08d6c6e5b1c97e9ed76ab4f18b5bff handshake="map[stage:2 style:ix_psk0]" initiatorIndex=901163749 remoteIndex=0 responderIndex=1226101364 sentCachedPackets=0 udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:02:52 cdn nebula[21623]: time="2021-08-24T17:02:52Z" level=info msg="Handshake message received" certName={filtered} fingerprint=9a8af4edd523c19c7c46d8fe3b1bad102f08d6c6e5b1c97e9ed76ab4f18b5bff handshake="map[stage:1 style:ix_psk0]" initiatorIndex=901163749 remoteIndex=0 responderIndex=0 udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:02:52 cdn nebula[21623]: time="2021-08-24T17:02:52Z" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:02:53 cdn nebula[21623]: time="2021-08-24T17:02:53Z" level=info msg="Handshake message received" certName={filtered} fingerprint=9a8af4edd523c19c7c46d8fe3b1bad102f08d6c6e5b1c97e9ed76ab4f18b5bff handshake="map[stage:1 style:ix_psk0]" initiatorIndex=901163749 remoteIndex=0 responderIndex=0 udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:02:53 cdn nebula[21623]: time="2021-08-24T17:02:53Z" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:02:53 cdn nebula[21623]: time="2021-08-24T17:02:53Z" level=info msg="Handshake message received" certName={filtered} fingerprint=9a8af4edd523c19c7c46d8fe3b1bad102f08d6c6e5b1c97e9ed76ab4f18b5bff handshake="map[stage:1 style:ix_psk0]" initiatorIndex=901163749 remoteIndex=0 responderIndex=0 udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:02:53 cdn nebula[21623]: time="2021-08-24T17:02:53Z" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:02:54 cdn nebula[21623]: time="2021-08-24T17:02:54Z" level=info msg="Handshake message received" certName={filtered} fingerprint=9a8af4edd523c19c7c46d8fe3b1bad102f08d6c6e5b1c97e9ed76ab4f18b5bff handshake="map[stage:1 style:ix_psk0]" initiatorIndex=901163749 remoteIndex=0 responderIndex=0 udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:02:54 cdn nebula[21623]: time="2021-08-24T17:02:54Z" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:02:55 cdn nebula[21623]: time="2021-08-24T17:02:55Z" level=info msg="Handshake message received" certName={filtered} fingerprint=9a8af4edd523c19c7c46d8fe3b1bad102f08d6c6e5b1c97e9ed76ab4f18b5bff handshake="map[stage:1 style:ix_psk0]" initiatorIndex=901163749 remoteIndex=0 responderIndex=0 udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:02:55 cdn nebula[21623]: time="2021-08-24T17:02:55Z" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:02:55 cdn nebula[21623]: time="2021-08-24T17:02:55Z" level=info msg="Handshake message received" certName={filtered} fingerprint=9a8af4edd523c19c7c46d8fe3b1bad102f08d6c6e5b1c97e9ed76ab4f18b5bff handshake="map[stage:1 style:ix_psk0]" initiatorIndex=901163749 remoteIndex=0 responderIndex=0 udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:02:55 cdn nebula[21623]: time="2021-08-24T17:02:55Z" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:02:57 cdn nebula[21623]: time="2021-08-24T17:02:57Z" level=info msg="Handshake message received" certName={filtered} fingerprint=9a8af4edd523c19c7c46d8fe3b1bad102f08d6c6e5b1c97e9ed76ab4f18b5bff handshake="map[stage:1 style:ix_psk0]" initiatorIndex=901163749 remoteIndex=0 responderIndex=0 udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:02:57 cdn nebula[21623]: time="2021-08-24T17:02:57Z" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:02:58 cdn nebula[21623]: time="2021-08-24T17:02:58Z" level=info msg="Handshake message received" certName={filtered} fingerprint=9a8af4edd523c19c7c46d8fe3b1bad102f08d6c6e5b1c97e9ed76ab4f18b5bff handshake="map[stage:1 style:ix_psk0]" initiatorIndex=901163749 remoteIndex=0 responderIndex=0 udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:02:58 cdn nebula[21623]: time="2021-08-24T17:02:58Z" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:02:59 cdn nebula[21623]: time="2021-08-24T17:02:59Z" level=info msg="Handshake message received" certName={filtered} fingerprint=9a8af4edd523c19c7c46d8fe3b1bad102f08d6c6e5b1c97e9ed76ab4f18b5bff handshake="map[stage:1 style:ix_psk0]" initiatorIndex=901163749 remoteIndex=0 responderIndex=0 udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:02:59 cdn nebula[21623]: time="2021-08-24T17:02:59Z" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:03:01 cdn nebula[21623]: time="2021-08-24T17:03:01Z" level=info msg="Handshake message received" certName={filtered} fingerprint=9a8af4edd523c19c7c46d8fe3b1bad102f08d6c6e5b1c97e9ed76ab4f18b5bff handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1064900936 remoteIndex=0 responderIndex=0 udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:03:01 cdn nebula[21623]: time="2021-08-24T17:03:01Z" level=info msg="Taking new handshake" certName={filtered} vpnIp=192.168.3.2 Aug 24 17:03:01 cdn nebula[21623]: time="2021-08-24T17:03:01Z" level=info msg="Handshake message sent" certName={filtered} fingerprint=9a8af4edd523c19c7c46d8fe3b1bad102f08d6c6e5b1c97e9ed76ab4f18b5bff handshake="map[stage:2 style:ix_psk0]" initiatorIndex=1064900936 remoteIndex=0 responderIndex=2616167467 sentCachedPackets=0 udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:03:01 cdn nebula[21623]: time="2021-08-24T17:03:01Z" level=info msg="Handshake message received" certName={filtered} fingerprint=9a8af4edd523c19c7c46d8fe3b1bad102f08d6c6e5b1c97e9ed76ab4f18b5bff handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1064900936 remoteIndex=0 responderIndex=0 udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:03:01 cdn nebula[21623]: time="2021-08-24T17:03:01Z" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:03:01 cdn nebula[21623]: time="2021-08-24T17:03:01Z" level=info msg="Handshake message received" certName={filtered} fingerprint=9a8af4edd523c19c7c46d8fe3b1bad102f08d6c6e5b1c97e9ed76ab4f18b5bff handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1064900936 remoteIndex=0 responderIndex=0 udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:03:01 cdn nebula[21623]: time="2021-08-24T17:03:01Z" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:03:02 cdn nebula[21623]: time="2021-08-24T17:03:02Z" level=info msg="Handshake message received" certName={filtered} fingerprint=9a8af4edd523c19c7c46d8fe3b1bad102f08d6c6e5b1c97e9ed76ab4f18b5bff handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1064900936 remoteIndex=0 responderIndex=0 udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:03:02 cdn nebula[21623]: time="2021-08-24T17:03:02Z" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:03:03 cdn nebula[21623]: time="2021-08-24T17:03:03Z" level=info msg="Handshake message received" certName={filtered} fingerprint=9a8af4edd523c19c7c46d8fe3b1bad102f08d6c6e5b1c97e9ed76ab4f18b5bff handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1064900936 remoteIndex=0 responderIndex=0 udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:03:03 cdn nebula[21623]: time="2021-08-24T17:03:03Z" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:03:03 cdn nebula[21623]: time="2021-08-24T17:03:03Z" level=info msg="Handshake message received" certName={filtered} fingerprint=9a8af4edd523c19c7c46d8fe3b1bad102f08d6c6e5b1c97e9ed76ab4f18b5bff handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1064900936 remoteIndex=0 responderIndex=0 udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:03:03 cdn nebula[21623]: time="2021-08-24T17:03:03Z" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:03:04 cdn nebula[21623]: time="2021-08-24T17:03:04Z" level=info msg="Handshake message received" certName={filtered} fingerprint=9a8af4edd523c19c7c46d8fe3b1bad102f08d6c6e5b1c97e9ed76ab4f18b5bff handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1064900936 remoteIndex=0 responderIndex=0 udpAddr="{filtered}:13695" vpnIp=192.168.3.2 Aug 24 17:03:04 cdn nebula[21623]: time="2021-08-24T17:03:04Z" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" udpAddr="{filtered}:13695" vpnIp=192.168.3.2

Logs on the client
Aug 24 17:02:52 homester nebula[12637]: time="2021-08-24T17:02:52Z" level=info msg="Handshake timed out" durationNs=8379105950 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=3972176064 remoteIndex=0 udpAddrs="[{filtered}:13695]" vpnIp=192.168.3.1 Aug 24 17:02:52 homester nebula[12637]: time="2021-08-24T17:02:52Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=901163749 udpAddrs="[{filtered}:13695]" vpnIp=192.168.3.1 Aug 24 17:02:52 homester nebula[12637]: time="2021-08-24T17:02:52Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=901163749 udpAddrs="[{filtered}:13695]" vpnIp=192.168.3.1 Aug 24 17:02:53 homester nebula[12637]: time="2021-08-24T17:02:53Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=901163749 udpAddrs="[{filtered}:13695]" vpnIp=192.168.3.1 Aug 24 17:02:53 homester nebula[12637]: time="2021-08-24T17:02:53Z" level=info msg="Handshake timed out" durationNs=8579430949 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=3580733626 remoteIndex=0 udpAddrs="[]" vpnIp=192.168.3.3 Aug 24 17:02:53 homester nebula[12637]: time="2021-08-24T17:02:53Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=901163749 udpAddrs="[{filtered}:13695]" vpnIp=192.168.3.1 Aug 24 17:02:54 homester nebula[12637]: time="2021-08-24T17:02:54Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=901163749 udpAddrs="[{filtered}:13695]" vpnIp=192.168.3.1 Aug 24 17:02:55 homester nebula[12637]: time="2021-08-24T17:02:55Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=901163749 udpAddrs="[{filtered}:13695]" vpnIp=192.168.3.1 Aug 24 17:02:55 homester nebula[12637]: time="2021-08-24T17:02:55Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=901163749 udpAddrs="[{filtered}:13695]" vpnIp=192.168.3.1 Aug 24 17:02:57 homester nebula[12637]: time="2021-08-24T17:02:57Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=901163749 udpAddrs="[{filtered}:13695]" vpnIp=192.168.3.1 Aug 24 17:02:58 homester nebula[12637]: time="2021-08-24T17:02:58Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=901163749 udpAddrs="[{filtered}:13695]" vpnIp=192.168.3.1 Aug 24 17:02:59 homester nebula[12637]: time="2021-08-24T17:02:59Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=901163749 udpAddrs="[{filtered}:13695]" vpnIp=192.168.3.1 Aug 24 17:03:01 homester nebula[12637]: time="2021-08-24T17:03:01Z" level=info msg="Handshake timed out" durationNs=8495165069 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=901163749 remoteIndex=0 udpAddrs="[{filtered}:13695]" vpnIp=192.168.3.1 Aug 24 17:03:01 homester nebula[12637]: time="2021-08-24T17:03:01Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1064900936 udpAddrs="[{filtered}:13695]" vpnIp=192.168.3.1 Aug 24 17:03:01 homester nebula[12637]: time="2021-08-24T17:03:01Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1064900936 udpAddrs="[{filtered}:13695]" vpnIp=192.168.3.1 Aug 24 17:03:01 homester nebula[12637]: time="2021-08-24T17:03:01Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1064900936 udpAddrs="[{filtered}:13695]" vpnIp=192.168.3.1 Aug 24 17:03:02 homester nebula[12637]: time="2021-08-24T17:03:02Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1064900936 udpAddrs="[{filtered}:13695]" vpnIp=192.168.3.1 Aug 24 17:03:02 homester nebula[12637]: time="2021-08-24T17:03:02Z" level=info msg="Handshake timed out" durationNs=8763083880 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=4048442694 remoteIndex=0 udpAddrs="[]" vpnIp=192.168.3.3 Aug 24 17:03:03 homester nebula[12637]: time="2021-08-24T17:03:03Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1064900936 udpAddrs="[{filtered}:13695]" vpnIp=192.168.3.1 Aug 24 17:03:03 homester nebula[12637]: time="2021-08-24T17:03:03Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1064900936 udpAddrs="[{filtered}:13695]" vpnIp=192.168.3.1

@cg31
Copy link

cg31 commented Aug 31, 2021

Does it happen every 5 minutes or just sometimes?

I have a similar issue, sometimes the connection would drop and reestablish. I haven't looked into it.

I guess it is related to:
https://github.com/slackhq/nebula/blob/v1.4.0/examples/config.yml#L204

in your log, it also happens:
https://github.com/slackhq/nebula/blob/v1.4.0/handshake_manager.go#L124

@borrelan
Copy link
Author

Thank you for taking the time to respond.

Issue occurs randomly after my public IP changes. It doesn’t happen every time nor does it happen on the same node. When it does happen, it lasts anywhere from 5m to hours while my logs fill up with continuous handshake attempts with the wrong external IP.

The timeout occurs because the handshake is using a public IP address that is no longer part of my network.

Recently I’ve been having some luck with
‘systemctl stop nebula.service; sleep 900; systemctl start nebula.service’. I wouldn’t say with high confidence that it solves the issue, but it does help reduce the recovery time.

@natemellendorf
Copy link

@borrelan

You mentioned that this happens when your public IP changes. Just to be clear, and while reviewing your logs, it appears that the public IP that changes is the public IP of your client and not the lighthouse. Just trying to confirm for clarity.

While this is happening, do you have the ability to review traffic on your lighthouse? I’d be curious if it’s seeing the traffic from your client once it’s public IP changes.

On your lighthouse, maybe use something like tcpdump: tcpdump host <client_public_ip>

Some details on using tcpdump here:
https://linuxhint.com/tcpdump-beginner-guide-2/

If the lighthouse is seeing the traffic but nebula isn’t updating, then we know it’s nebula. If you’re not seeing any traffic from your new public IP on the lighthouse, then we know it’s something else in between.

Thanks!

@borrelan
Copy link
Author

borrelan commented Sep 13, 2021

@natemellendorf

Correct! The IP address that keeps changing and randomly causing the issue is on the client side.

The lighthouse has a static IPv4 and IPv6 although I'm not using the IPv6 in this case since non of my client nodes have IPv6

I've attached the latest logs and the latest configuration that I'm using on both the lighthouse and client side
20210913_nebula_issue_515_logs.txt

It looks like the traffic is indeed coming from the new IP address, but the logs within lighthouse register as the old client public IP address. What I've found is if I stop the client service for a period of time (15m in my case; first time delay that I tried), the issue goes away. This is my current workaround to this issue.

Hope this helps.

@borrelan borrelan changed the title Cannot connect to Lighhouse when WAN IP changes until cache expires (if ever) Cannot connect to Lighhouse when client WAN IP changes until cache expires (if ever) Sep 13, 2021
@rawdigits
Copy link
Collaborator

rawdigits commented Sep 13, 2021

I looked through the config and have some ideas, as I've seen similar things before. This is likely all centered on port numbers being remapped but it happens in an unresolvably asynchronous way because of a conntrack race in the linux kernel. I have actually seen this a lot in some old tests. Luckily, if this is the case there is generally an easy fix, as long as you control the firewall on the client side:

Block unsolicited UDP into your routable internet ip address.

I did a long demonstration for the folks at Defined on how this happens and how blocking unsolicited UDP resolves it, but it took me a long time to figure out why I was having this issue at a particular location.

I would also recommend you consider changing the port setting on all clients to 0 instead of a fixed port number, as it avoids collisions among multiple clients using the same number behind a NAT. Those collisions usually resolve (as it is different to the conntrack issue), but it is cleaner to give them randomized ports.

Edit: I removed the suggestion to not forward the nebula port because that's actually fine and I just remembered why.

@johnmaguire
Copy link
Collaborator

Closed for inactivity. If the solution @rawdigits shared above doesn't work for you, please open a new ticket with relevant info. Thanks!

@johnmaguire johnmaguire closed this as not planned Won't fix, can't repro, duplicate, stale Aug 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants