Domain name in static_host_map #176

Closed
enykeev opened this issue Jan 27, 2020 · 10 comments

Comments

@enykeev

enykeev commented Jan 27, 2020

Using a domain name instead of an IP in static_host_map may leave nebula in a state where the process is still running but never attempts to reconnect to the lighthouse.

The situation happens during initial boot when the interface is already up and network-online.target seems to be fulfilled, but dhcpcd has yet to set a nameserver. If nebula starts in that window (which in my case it does 9 times out of 10), it can't resolve the lighthouse IP, reports the lighthouse as unreachable due to a missing static_host_map entry (see #41), and continues to run while making no attempts to establish a connection.

In my opinion, if we consider #41 to be a configuration issue, the error should be fatal. If on the other hand we consider it an operational issue, we should resolve the host at connect time rather than during config parsing.
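
A minimal sketch of the second option (resolving at connect time instead of during config parsing), written in Python for illustration only. It is not nebula's code: `StaticHostEntry`, `parse_entry`, `lookup_ipv4`, and `connect_addr` are hypothetical names, and the `host:port` entry format is an assumption.

```python
import socket
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


def lookup_ipv4(host: str) -> Optional[str]:
    """Return one IPv4 address for host, or None if the lookup fails."""
    try:
        return socket.gethostbyname(host)
    except OSError:  # covers socket.gaierror, e.g. no nameserver configured yet
        return None


@dataclass
class StaticHostEntry:
    """Hypothetical config entry that keeps the name instead of a parsed IP."""
    host: str
    port: int

    def connect_addr(
        self, resolver: Callable[[str], Optional[str]] = lookup_ipv4
    ) -> Optional[Tuple[str, int]]:
        # Resolution happens here, on every connection attempt, so a DNS
        # failure at boot is temporary rather than permanent.
        ip = resolver(self.host)
        if ip is None:
            return None  # caller is expected to retry later
        return (ip, self.port)


def parse_entry(raw: str) -> StaticHostEntry:
    """Parse 'host:port' without touching DNS; only syntax errors are fatal."""
    host, sep, port = raw.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"malformed static host entry: {raw!r}")
    return StaticHostEntry(host, int(port))
```

With this split, a malformed entry is a configuration error and fails immediately, while a failed lookup is an operational error that the caller can retry.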

@bartmichu

I have the same problem on my Ubuntu laptop. I need to restart the nebula service every time the system boots to get it working.

@bartmichu

Nebula v1.2.0 is still affected.

@windwalker78

I confirm this is still not working 100% on 1.2.0. Seen on Ubuntu and Debian 9.

@ghost

ghost commented Jun 16, 2020

Confirmed not working on CentOS 7 with Nebula 1.1.0.

@keitme

keitme commented Nov 22, 2020

This is with the current 1.3.0 release.

I have reproduced this issue on several Windows revisions (10, server 2016, server 2019). On Windows, as a workaround, I have manually set the service start up type for Nebula Network Service to Automatic (Delayed). With this set the connection establishes properly when using the FQDN of the lighthouse server rather than its IP address, but the connection may not be available until several minutes after the machine is otherwise available.

I have also reproduced this issue on the two latest Mac OS X revisions, but I haven't developed a workaround yet, so I am specifying the IP address of the lighthouse. I have not yet tested on Mac OS 11. I expect similar behavior, but will confirm once I have an opportunity to test.

@rawdigits
Collaborator

The only useful solution here would probably involve regularly re-querying names. If we go down this path, we should also consider re-querying even after we get a successful answer, as this would allow us to migrate to a new IP if the underlying DNS entry changes.
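
The re-querying idea above can be sketched roughly as follows. This is an illustrative Python loop, not nebula's implementation: `resolve_once`, `watch_host`, and all parameter names are hypothetical, and the 30-second default is just a placeholder value.

```python
import socket
import time
from typing import Callable, List, Optional


def resolve_once(host: str) -> List[str]:
    """Return the sorted IPv4 addresses for host, or [] if the lookup fails."""
    try:
        infos = socket.getaddrinfo(
            host, None, family=socket.AF_INET, type=socket.SOCK_DGRAM
        )
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def watch_host(
    host: str,
    on_change: Callable[[List[str]], None],
    cadence_s: float = 30.0,
    max_rounds: Optional[int] = None,
    resolver: Callable[[str], List[str]] = resolve_once,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Re-query host on a fixed cadence and report every change of answer.

    A failed lookup (empty answer) keeps the last known addresses, and the
    loop keeps querying after a success so a changed DNS record is noticed.
    """
    last: List[str] = []
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        current = resolver(host)
        if current and current != last:
            on_change(current)
            last = current
        rounds += 1
        if max_rounds is None or rounds < max_rounds:
            sleep(cadence_s)
    return last
```

The `resolver` and `sleep` parameters exist only so the loop can be exercised without a network or real delays; `max_rounds=None` runs indefinitely.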

@SgtZapper

SgtZapper commented Jun 2, 2022

I think I might have encountered the same problem after a power outage left my single lighthouse unreachable for a few hours. When it came back on a new public IP, none of the nodes that were connected would ever reconnect without restarting nebula on them.

It would be good to try to re-resolve the DNS entry for a host in the static hosts every now and then when that static host is unreachable.

@johnmaguire
Collaborator

@enykeev @bartmichu @windwalker78 @keitme @SgtZapper If you're still encountering this error, would you try adding Wants=nss-lookup.target to your systemd unit file? This should cause the system to wait for DNS resolution to be available prior to starting nebula.

@johnmaguire
Collaborator

I'm closing this issue out as #791 has landed. We believe this should solve the startup race.

@johnmaguire
Collaborator

In addition to #791, #796 is released and working in v1.7.1 and should re-query for DNS even if the initial query for DNS fails. By default, we will re-query on a 30s cadence, but this can be configured via static_map.cadence.
