Question: NAT Setup #33
Comments
---
Also, to note: in this setup all nodes are behind different NATs on different networks. It's hub and spoke, with the hub being the lighthouse and the spokes going to hosts on different networks.
---
My best guess (because I just messed this up in a live demo) is that am_lighthouse may be set to "true" on the individual nodes. Either way, can you post your lighthouse config and one of your node configs? (Feel free to replace any sensitive IP/config bits; just put consistent placeholders in their place.)
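For reference, a minimal sketch of how this stanza usually differs between the two roles (values here are illustrative, not taken from the poster's configs):

```yaml
# On the lighthouse itself:
lighthouse:
  am_lighthouse: true

# On every other node:
lighthouse:
  am_lighthouse: false
  hosts:
    - "192.168.100.1"   # the lighthouse's nebula IP, not its public IP
```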
---
Hi, I have the same issue. My lighthouse is on a DigitalOcean droplet with a public IP. My MacBook and Linux laptop at home are on the same network, both connected to the lighthouse. I can ping the lighthouse from both laptops, but I cannot ping from one laptop to the other.

Lighthouse config:

```yaml
pki:
  ca: /data/cert/nebula/ca.crt
  cert: /data/cert/nebula/lighthouse.crt
  key: /data/cert/nebula/lighthouse.key

static_host_map:
  "192.168.100.1": ["LIGHTHOUSE_PUBLIC_IP:4242"]

lighthouse:
  am_lighthouse: true
  interval: 60
  hosts:

listen:
  host: 0.0.0.0
  port: 4242

punchy: true

tun:
  dev: neb0
  drop_local_broadcast: false
  drop_multicast: false
  tx_queue: 500
  mtu: 1300

logging:
  level: info
  format: text

firewall:
  conntrack:
    tcp_timeout: 120h
    udp_timeout: 3m
    default_timeout: 10m
    max_connections: 100000

  outbound:
    - port: any
      proto: any
      host: any

  inbound:
    - port: any
      proto: icmp
      host: any
    - port: 443
      proto: tcp
      groups:
        - laptop
```

Macbook config:

```yaml
pki:
  ca: /Volumes/code/cert/nebula/ca.crt
  cert: /Volumes/code/cert/nebula/mba.crt
  key: /Volumes/code/cert/nebula/mba.key

static_host_map:
  "192.168.100.1": ["LIGHTHOUSE_PUBLIC_IP:4242"]

lighthouse:
  am_lighthouse: false
  interval: 60
  hosts:
    - "LIGHTHOUSE_PUBLIC_IP"

punchy: true

tun:
  dev: neb0
  drop_local_broadcast: false
  drop_multicast: false
  tx_queue: 500
  mtu: 1300

logging:
  level: debug
  format: text

firewall:
  conntrack:
    tcp_timeout: 120h
    udp_timeout: 3m
    default_timeout: 10m
    max_connections: 100000

  outbound:
    - port: any
      proto: any
      host: any

  inbound:
    - port: any
      proto: icmp
      host: any
    - port: 443
      proto: tcp
      groups:
        - laptop
```

Linux laptop config:

```yaml
pki:
  ca: /data/cert/nebula/ca.crt
  cert: /data/cert/nebula/server.crt
  key: /data/cert/nebula/server.key

static_host_map:
  "192.168.100.1": ["LIGHTHOUSE_PUBLIC_IP:4242"]

lighthouse:
  am_lighthouse: false
  interval: 60
  hosts:
    - "LIGHTHOUSE_PUBLIC_IP"

punchy: true

listen:
  host: 0.0.0.0
  port: 4242

tun:
  dev: neb0
  drop_local_broadcast: false
  drop_multicast: false
  tx_queue: 500
  mtu: 1300

logging:
  level: info
  format: text

firewall:
  conntrack:
    tcp_timeout: 120h
    udp_timeout: 3m
    default_timeout: 10m
    max_connections: 100000

  outbound:
    - port: any
      proto: any
      host: any

  inbound:
    - port: any
      proto: icmp
      host: any
    - port: 443
      proto: tcp
      groups:
        - laptop
```
---
@nfam thanks for sharing the config. My next best guess is that NAT isn't reflecting, and for some reason nodes also aren't finding each other locally. Try setting the
---
@nfam similar setup. Public lighthouse on DigitalOcean, laptop on home NAT, and server in AWS behind a NAT. Local and AWS are using different private ranges (though overlap should be handled).
---
@rawdigits setting
---
My config:

Lighthouse:

```yaml
pki:
  ca: /etc/nebula/ca.crt
  cert: /etc/nebula/lighthouse.crt
  key: /etc/nebula/lighthouse.key

static_host_map:
  "192.168.100.1": ["167.71.175.250:4242"]

lighthouse:
  am_lighthouse: true
  interval: 60

listen:
  host: 0.0.0.0
  port: 4242

punchy: true

tun:
  dev: nebula1
  mtu: 1300

logging:
  level: info
  format: text

firewall:
  conntrack:
    tcp_timeout: 12m
    udp_timeout: 3m
    default_timeout: 10m
    max_connections: 100000

  outbound:
    - port: any
      proto: any
      host: any

  inbound:
    - port: any
      proto: icmp
      host: any
```

Laptop:

```yaml
pki:
  # The CAs that are accepted by this node. Must contain one or more certificates created by 'nebula-cert ca'
  ca: /etc/nebula/ca.crt
  cert: /etc/nebula/laptop.crt
  key: /etc/nebula/laptop.key

static_host_map:
  "192.168.100.1": ["167.71.175.250:4242"]

lighthouse:
  am_lighthouse: false
  interval: 60
  hosts:
    - "192.168.100.1"

listen:
  host: 0.0.0.0
  port: 0

punchy: true

tun:
  dev: nebula1
  mtu: 1300

logging:
  level: info
  format: text

firewall:
  conntrack:
    tcp_timeout: 12m
    udp_timeout: 3m
    default_timeout: 10m
    max_connections: 100000

  outbound:
    - port: any
      proto: any
      host: any

  inbound:
    - port: any
      proto: icmp
      host: any
```

Server:

```yaml
pki:
  ca: /etc/nebula/ca.crt
  cert: /etc/nebula/server.crt
  key: /etc/nebula/server.key

static_host_map:
  "192.168.100.1": ["167.71.175.250:4242"]

lighthouse:
  am_lighthouse: false
  interval: 60
  hosts:
    - "192.168.100.1"

listen:
  host: 0.0.0.0
  port: 0

punchy: true

tun:
  dev: nebula1
  mtu: 1300

logging:
  level: info
  format: text

firewall:
  conntrack:
    tcp_timeout: 12m
    udp_timeout: 3m
    default_timeout: 10m
    max_connections: 100000

  outbound:
    - port: any
      proto: any
      host: any

  inbound:
    - port: any
      proto: icmp
      host: any
```

With this setup, both the server and the laptop can ping the lighthouse, and the lighthouse can ping the server and the laptop, but the laptop cannot ping the server and the server cannot ping the laptop. I get messages such as this as it's trying to make the connection:
---
@nfam similar error, not sure it's the problem.
---
As far as the handshakes go, for some reason hole punching isn't working. A few things to try:
Also, it appears the logs with the handshake messages are from the laptop? If so, can you also share nebula logs from the server as it tries to reach the laptop? Thanks!
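One knob worth double-checking here is punchy. A sketch of both spellings (the flat booleans used in the configs above, and the nested block newer Nebula releases use); treat the exact keys as version-dependent:

```yaml
# Flat form (as seen in the configs in this thread):
punchy: true        # periodically punch outbound to keep NAT mappings alive
punch_back: true    # ask the peer to punch back toward this node

# Nested form in newer Nebula releases:
# punchy:
#   punch: true
#   respond: true
```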
---
Aha, @nfam I think I spotted the config problem: instead of it should be
---
Adding #40 to cover the accidental misconfiguration noted above.
---
@rawdigits yes, it is. Now both laptops can ping each other.
Server log:
---
So, I tried a few more setups, and it just comes down to this: if the two hosts trying to communicate with each other are on different networks and both behind NAT, it will not work.
---
The dual-NAT scenario is a bit tricky; there's possibly room for improvement from nebula's perspective there. Do you have details on the type of NATs you are dealing with?
---
@nbrownus nothing crazy. I've done multiple AWS VPC NAT gateways with hosts behind them, and they cannot connect. I've also tried "home" NAT (Google WiFi router based NAT) with no success. From a networking perspective I get why it's "tricky"; I was hoping there was some trick nebula was doing.
---
@rawdigits can speak to the punching better than I can. If you are having problems in AWS then we can get a test running and sort out the issues.
---
Yeah, so all my tests have had at least one host behind an AWS NAT Gateway.
---
Longshot, but one more thing to try until I set up an AWS NAT GW: Probably won't work, but easy to test.
---
@rawdigits same issue. Network combination: I added a second server in a different VPC on AWS to remove the FIOS variable, and had the same results with server and server2 trying to communicate.
---
@jatsrt I'll stand up a testbed this week to explore what may be the cause of the issue. Thanks!
---
I have got the same situation. But I found that node_A and node_B can communicate with each other ONLY if both are connected to the same router, such as the same WiFi router. PS: No firewall on node_A, node_B, or the lighthouse.
---
Hole punching is very difficult and random.
---
I also can't get nebula to work properly when both nodes are behind a typical NAT (technically PAT), regardless of any port pinning I do in the config. They happily connect to the lighthouse I have in AWS, but it seems like something isn't working properly. I've got punchy and punch_back enabled on everything and it doesn't seem to help. I've tried setting the port on the nodes to 0, and also tried using the same port that the lighthouse is listening on. The nodes have no issues connecting to each other over the MPLS, but we don't want that (performance reasons).

Edit: To add a bit more detail, even Meraki's AutoVPN can't deal with this. In their situation the "hub" needs to be told its public IP and a fixed port that is open inbound. I'd be fine with that as an option, and it may be the only reliable one if both nodes are behind different NATs. Another option I had considered: what if we could use the lighthouses to hairpin traffic? I'd much rather pay AWS for the bandwidth than have to deal with unfriendly NATs everywhere.
---
I did a bit more research, and it appears that the AWS NAT Gateway uses symmetric NAT, which isn't friendly to hole punching of any kind. NAT gateways also don't appear to support any type of port forwarding, so fixing this by statically assigning and forwarding a port doesn't appear to be an option. A NAT instance would probably work, but I realize that's probably not a great option.

One thing I recommend considering would be to give instances a routable IP address but disallow all inbound traffic. This wouldn't greatly change the security of your network, since you still aren't allowing any unsolicited packets to reach the hosts, but it would allow hole punching to work properly.
---
I don't think NAT as such is the issue so much as PAT (port translation). Unfortunately, with PAT you can't predict what your public port will be, and hole punching becomes impossible if both ends are behind a similar PAT. I'm going to do some testing, but I think that as long as one of the two nodes has a 1:1 NAT (no port translation), a public IP directly on the node isn't a concern. If I get particularly ambitious I may attempt to whip up some code in the lighthouse to detect when one or both nodes are behind a PAT and throw a warning saying that this won't work out of the box.

I've thought about this before. You need at least two lighthouses, and I think it's best to implement it as a flag on the non-lighthouses (when you query the lighthouses for a host, if you get results with the same IP but different ports, then you know the remote is problematic).
---
I haven't dug into the handshake code, but if you include the source port in the handshake, the lighthouse can compare that to what it sees. If they differ, you know something in the middle is doing port translation.
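The detection idea sketched above is simple to state in code: if different vantage points (lighthouses) report the same source IP but different source ports for one node, something in the path is rewriting ports. A toy illustration of the idea (not Nebula code; the function name and data shape are invented for this sketch):

```python
def looks_port_translated(observations):
    """observations: (ip, port) pairs for one node, as seen by different lighthouses.

    Returns True when the IP is stable but the port varies across observers,
    which is the signature of a PAT/symmetric NAT that defeats hole punching.
    """
    ips = {ip for ip, _ in observations}
    ports = {port for _, port in observations}
    return len(ips) == 1 and len(ports) > 1
```

For example, `looks_port_translated([("203.0.113.7", 41000), ("203.0.113.7", 41017)])` returns `True`, while a stable mapping returns `False`.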
I bet this is also my issue... will test it soon. That section is confusing 😕 |
---
@schuft69 Are your nodes able to connect to the lighthouse? If so, you may just need to statically set a port for each extra node and then open those up on OPNsense.
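If you go the static-port route, the relevant stanza on each node might look like this (the port value is illustrative):

```yaml
listen:
  host: 0.0.0.0
  port: 4243   # pick a distinct fixed port per node, then forward/allow it on the firewall
```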
-> Setting everything up with static ports + DynDNS is working quite well. <-
I was hoping to get rid of static ports with nebula (which is what I have now with WireGuard). The hole punching (from a lighthouse on a 1€ droplet at strato.de) is working neither on the FritzBox (where I have devices at my parents' home) nor at my home (OPNsense); maybe because disabling UDP port rewriting, as described earlier, is somehow not working (I'll have to ask in the OPNsense community).
---
Glad you've got a workable solution! I'm hearing from a lot of people that the NAT punching isn't as successful as I think the Nebula devs had expected. I remember reading a comment in an older issue/PR thread from one of them about being disappointed that many users aren't able to use IPv6, since it doesn't have any of these NAT issues. I really hope they'll add in some support for partial-mesh deployments soon, and update the readme to explain that it currently only supports 100% full-mesh deployments.
---
In this workshop video it sounds like NAT-to-NAT traversal is supported, but here it sounds like NAT is still as messy as it always has been. What's the status on UPnP/NAT-PMP support?
---
I am really confused about what nebula supports or does not support with regard to NAT traversal. From reading the comments it sounds like this: say I have a lighthouse and two networks behind NATs. I assume:
Are these assumptions correct?
Sort of, but not really. Machine1 needs the network the printer is on specified within its cert.
Yes, that should be the case.
No, Nebula does not have a "proxy", "routing", or "connection hopping" mechanism built in.
PMP is not supported (but there's a PR to add it!) and UPnP should work.
Correct.
---
Thanks for the help, @tarrenj
So, where I am still a bit lost is the "OTHER" case. And machine2 should have a local LAN connection to the printer on another interface.
So that means I would have to open a port to every nebula instance?!
Found it! #148
No, the opposite. You'd use the network printer1 is on when creating the cert for machine1, and the network printer2 is on when creating the cert for machine2. Certs are all about trust. When you generate a cert on machine1 with subnet n specified, you then have to have it signed by the CA (which all other nodes trust). This effectively tells all other nodes: "According to the CA (which you already trust), machine1 is allowed to relay traffic to network n." Generating and signing the machine1 cert with the --subnet n argument basically grants machine1 "permission" to route traffic to that unsafe network.
Adding an The
Nebula assumes that each node is able to establish a direct connection with every other node (using NAT hole punching, through UPnP). Machine1 would not be able to access machine2 by connecting "through" the lighthouse in your example above.
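The signing step that grants that "permission" can be sketched as follows (hypothetical names, IPs, and subnets; `-subnets` is the `nebula-cert sign` flag that embeds the routable subnet in the cert):

```shell
# Sign machine1's cert so peers trust it to relay traffic into printer1's LAN
nebula-cert sign -ca-crt ca.crt -ca-key ca.key \
  -name machine1 -ip "192.168.100.2/24" \
  -subnets "10.1.1.0/24"
```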
---
So, to summarise: the machine1 cert would be signed for nat1, and the machine2 cert would be signed for nat2; that defines their trust relationship as an "exit node" into the LAN.
I guess the really important information in the context of this issue is that every nebula instance must be directly reachable through the NAT, i.e. it requires a punched/forwarded port. I didn't expect that. Thanks for clearing this up!
---
Thanks for this conversation. I might have found another way, after a bunch of trial and error. I had two laptops inside my regular home network, behind a NAT, that could not connect to a server on another network, also behind a NAT. The server network had a lighthouse with a perimeter firewall rule, as a lighthouse should, which the laptops could ping, but they could not reach the server endpoint.

Solution/workaround that worked for me: on the laptops, create an entry in the config.yml for the server endpoint as though it's a lighthouse (even though it's not actually a lighthouse), alongside the real lighthouse "hosts" entry in the config.yml. Put the external network IP and port for the other endpoint, even if there is no perimeter firewall rule for it. In my case the lighthouse has port 4242 and the server endpoint was on a different port (not sure if that is necessary). Note: I did NOT need to do anything with

My theory on why it works: the server endpoint's external network has an actual lighthouse, so the laptop client knows how to reach that network, meaning the laptop client associates the server's Nebula IP address with the external IP of the lighthouse on the same network. The actual lighthouse knows how to get to the server endpoint and provides the path once the Nebula connection is established.

Note: it's possible that some of my other troubleshooting put some temporary route in place that stuck, but I don't think so, because removing the workaround entry in config.yml made it stop working again; it's therefore reproducible. Good luck!
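A sketch of the workaround described above, with placeholder addresses (the 192.168.100.x nebula IPs, port numbers, and SERVER_SITE_PUBLIC_IP are made up for illustration):

```yaml
static_host_map:
  "192.168.100.1": ["LIGHTHOUSE_PUBLIC_IP:4242"]    # the real lighthouse
  "192.168.100.5": ["SERVER_SITE_PUBLIC_IP:4343"]   # the server endpoint, pinned like a lighthouse

lighthouse:
  am_lighthouse: false
  hosts:
    - "192.168.100.1"
    - "192.168.100.5"   # listed as a lighthouse even though it isn't one
```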
---
Nebula 1.6.0 is released with a Relay feature, to cover cases like symmetric NAT. Check out the example config to see how to configure a Nebula node to act as a relay, and how to configure other nodes to identify which relay can be used to reach a given peer. (Edited to provide some documentation of the feature.) For most personal users of Nebula, the lighthouse is the ideal relay. To use relays on your network, do the following:
Some rules around relays:
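A minimal sketch of the stanzas involved, following the shape of the example config (the nebula IP is a placeholder):

```yaml
# On the relay node (often the lighthouse):
relay:
  am_relay: true

# On the nodes that need relaying:
relay:
  relays:
    - 192.168.100.1   # nebula IP of the relay node
  use_relays: true
```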
---
Wow, I was running into this issue and this post appeared 15 hours ago! Thanks, nebula team. Can confirm it works! Home PC to a remote server (192.168.32.4), on a remote node with a public IP acting as a lighthouse (192.168.32.1).
---
Something I've noticed in using this: I've had to lower the MTU a bit from the original 1300. Not sure if it's because the other end of mine is doing 1:1 NAT or if it's relay related. Other than that, no problems.

```
$ ping 192.168.32.4 -s 1216
PING 192.168.32.4 (192.168.32.4) 1216(1244) bytes of data.
1224 bytes from 192.168.32.4: icmp_seq=1 ttl=64 time=26.6 ms
1224 bytes from 192.168.32.4: icmp_seq=2 ttl=64 time=26.0 ms
1224 bytes from 192.168.32.4: icmp_seq=3 ttl=64 time=25.2 ms
^C
--- 192.168.32.4 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2004ms
rtt min/avg/max/mdev = 25.186/25.913/26.592/0.575 ms

$ ping 192.168.32.4 -s 1217
PING 192.168.32.4 (192.168.32.4) 1217(1245) bytes of data.
^C
--- 192.168.32.4 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2039ms
```
---
@sfxworks thanks for the feedback! You're spot on: when relaying, Nebula sticks additional headers onto the packets, which will impact the MTU.
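The cutoff in the ping test above is consistent with this: a 1216-byte payload plus the 8-byte ICMP header and 20-byte IPv4 header makes a 1244-byte packet, so the relayed path behaves like an MTU of 1244, 56 bytes below the configured 1300. A quick helper for the arithmetic (ordinary IPv4/ICMP header sizes, nothing Nebula-specific):

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, echo request/reply header

def max_ping_payload(path_mtu: int) -> int:
    """Largest `ping -s` payload that fits in one packet at the given MTU."""
    return path_mtu - IPV4_HEADER - ICMP_HEADER
```

With these numbers, `max_ping_payload(1244)` is 1216, matching the largest payload that got through.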
There is still a special case needing attention (or yet another type of node) which I can't get my head wrapped around: gateways between two or more meshes. A server with several instances of nebula running on different addresses and/or ports could act as a relay node between them, permitting segmentation between the equivalent of VLANs. But that would probably also require a separate DNS service that can be shared among the meshes.
---
Something I noticed when dealing with more MTU issues:

```yaml
mtu: 1200
# Route based MTU overrides, you have known vpn ip paths that can support larger MTUs you can increase/decrease them here
routes:
  #- mtu: 8800
  #  route: 10.0.0.0/16

# Unsafe routes allows you to route traffic over nebula to non-nebula nodes
# Unsafe routes should be avoided unless you have hosts/services that cannot run nebula
# NOTE: The nebula certificate of the "via" node *MUST* have the "route" defined as a subnet in its certificate
# `mtu` will default to tun mtu if this option is not specified
# `metric` will default to 0 if this option is not specified
unsafe_routes:
  - route: 192.168.8.0/23
    via: 192.168.32.5
    mtu: 1300
```

Even with one unsafe route specified as 1300, the entire tunnel was configured to be 1300 instead of 1200. So while I could reach my unsafe route area:

```
14: nebula1: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1200 qdisc fq_codel state UNKNOWN group default qlen 500
    link/none
    inet 192.168.32.6/19 scope global nebula1
       valid_lft forever preferred_lft forever
```

I could not reach my office server that required the MTU of 1200, even though the default was set to 1200.
---
Wait, no, that's not the issue I was having. Logs from home PC: I can ping 192.168.32.7 from my Home PC just fine. So, not sure why I'm getting

This occurs on occasion with my config. I have two relay hosts based on two lighthouses with public IPs. These also act as routers for their respective zones. So I have Home PC (192.168.32.6) and Office Server (192.168.32.4) with:

```yaml
relay:
  relays:
    - 192.168.32.1
    - 192.168.32.7
  am_relay: false
  use_relays: true
```

With Home Router (192.168.32.7):

```yaml
relay:
  relays:
    - 192.168.32.1
  am_relay: true
  use_relays: true
```

And Office Router (192.168.32.1):

```yaml
relay:
  relays:
    - 192.168.32.7
  am_relay: true
  use_relays: true
```

The thing is, after either a systemctl restart and/or some time, the issue resolves itself and I can reach my office server again. It's intermittent. Is one of my relays just bad, or is this somehow the wrong way to set this up?

Edit:
Edit 2:
---
@sfxworks
---
@noseshimself I think the Nebula way to join two Nebula networks together is to run multiple instances of Nebula on all hosts joined to both networks, rather than one gateway host joining the networks. With direct connections between the peers, you get all the identity fidelity and corresponding firewall rules. If hosts are joined by an intermediary, their identity is lost: you will only have the identity of the gateway host, not the identity of the peer. That being said, I think the existing unsafe routes feature would accomplish what you described.
I prefer doing packet filtering on dedicated systems. Imagine having a set of server systems that are supposed to be reachable by "accounting" and "thieves", where I don't want the thieves to be able to access the systems on the accounting network, while not trusting the administrators of the servers either (but trusting the networking staff, who are under my control). I could of course trust the Nebula certificates to take care of that, but I don't know if $asshole-from-thieves would install a modified client removing the restriction.
---
Hi all! There's a lot of questions, answers, and information in this thread, but it's gotten a bit hard to follow. We believe that the relay feature should be sufficient for most tricky NAT scenarios. As such, I'm going to close this issue out as solved. If you're continuing to experience connectivity issues, please feel free to open up a new issue or join us on Slack. Thanks!

---
I seem to be missing something important. If I set up a mesh of hosts that all have direct public IP addresses, it works fine. However, if I have a network with a lighthouse (public IP) and all nodes behind NAT, the nodes will not connect to each other. The lighthouse is able to communicate with all hosts, but hosts are not able to communicate with each other.
Watching the logs, I see connection attempts to both the NAT public IPs and the private IPs.
I have enabled punchy and punch back, but it does not seem to help.
Hope it is something simple?