Nebula not connecting to lighthouse on linux boot #372
Comments
Sounds like you may be starting nebula before the network is up. Which init system are you using? |
systemd. I have attached my systemd service file. |
I had the exact same problem.
If it doesn't help, try adding ExecStartPre, like that:
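For anyone wanting to try the pre-start idea, here is a rough sketch of a helper that a unit could call from `ExecStartPre` to hold Nebula back until the lighthouse name resolves. It is not the snippet from the comment above; the function name `wait_for_dns`, the 60 second timeout, the 2 second interval, and the hostname `lighthouse.example.com` are all assumptions made for illustration.

```python
import socket
import sys
import time


def wait_for_dns(hostname, timeout=60.0, interval=2.0,
                 resolve=socket.getaddrinfo, sleep=time.sleep,
                 clock=time.monotonic):
    """Return True once `hostname` resolves, False if `timeout` seconds pass first.

    `resolve`, `sleep` and `clock` are injectable so the loop can be tested
    without touching the network.
    """
    deadline = clock() + timeout
    while True:
        try:
            resolve(hostname, None)  # raises socket.gaierror while DNS is down
            return True
        except socket.gaierror:
            pass
        if clock() >= deadline:
            return False
        sleep(interval)


def main(argv):
    # Hypothetical CLI: the lighthouse FQDN is the first argument.
    host = argv[1] if len(argv) > 1 else "lighthouse.example.com"
    # Exit status 0 lets systemd continue to ExecStart; 1 fails the start.
    return 0 if wait_for_dns(host) else 1
```

A script that ends with `sys.exit(main(sys.argv))` could then be referenced from `ExecStartPre=` in the unit file, so the service only proceeds once DNS answers.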
|
I have had this as well and was able to repro easily on 2 Linux computers (Ubuntu 16.04 server, Deepin 15.11) plus a Windows computer (Win10 Pro 20H2). Tested with Nebula 1.3.0. Reproduction on Linux as follows:
1. Set up Nebula and specify the lighthouse in config.yml by FQDN
2. Install using the systemd script
3. Restart the computer - Nebula will fail to start with the lighthouse unreachable error on the service log
Windows encounters the same issue on system startup when installed as a service with an FQDN for the lighthouse. It is also possible to replicate this by starting Nebula manually with the lighthouse as an FQDN when the network is disconnected (assuming no locally installed DNS, tested on both Linux hosts and the Windows host).
My guess for the startup issue is that DNS resolution is failing because the service starts early on both OSes. For some reason, Nebula never re-checks to see if the lighthouse is alive, it just gets marked as bad (at least, I waited 10 minutes and no retry was attempted).
My solution is to code the lighthouse IP directly in the config, and this works fine in both test cases (startup and no network). I'd rather be able to use an FQDN though, as it means no config updates if we have to change a lighthouse IP for some reason.
@dcwynar I like your exec prestart idea. I'm not sure how we would achieve the same thing in Windows though. I guess I could write a wrapper script, but that seems like more trouble than it's worth.
@nbrownus Perhaps if DNS resolution of a lighthouse fails on startup, it could be reattempted (for example) on a binary exponential backoff schedule starting with 5 seconds? The current method of marking it as dead on the first attempt seems overly aggressive. |
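To make the backoff suggestion concrete, here is a minimal sketch of a binary exponential schedule starting at 5 seconds, wrapped around a plain DNS lookup. This is only an illustration in Python, not how Nebula (which is written in Go) does or would implement it; the names `backoff_delays` and `resolve_with_backoff` and the 300 second cap are assumptions.

```python
import itertools
import socket
import time


def backoff_delays(initial=5.0, factor=2.0, cap=300.0):
    """Yield 5, 10, 20, ... seconds, never exceeding `cap` (assumed ceiling)."""
    delay = initial
    while True:
        yield min(delay, cap)
        delay *= factor


def resolve_with_backoff(hostname, max_attempts=None,
                         resolve=socket.getaddrinfo, sleep=time.sleep):
    """Retry resolution until it succeeds.

    Returns the resolver result, or None once `max_attempts` lookups have
    failed. With max_attempts=None it keeps retrying indefinitely.
    """
    delays = backoff_delays()
    for attempt in itertools.count(1):
        try:
            return resolve(hostname, None)
        except socket.gaierror:
            if max_attempts is not None and attempt >= max_attempts:
                return None
            sleep(next(delays))  # wait longer after each failure
```

The point of the cap is that a lighthouse that comes back after a long outage is still picked up within a bounded time, rather than the wait doubling forever.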
For my office I've implemented a systemd timer that I start prior to
starting the service. If the timer cannot ping the lighthouse on the
nebula IP, it will restart the nebula service.
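A check like that could look roughly like the sketch below, run periodically from a systemd timer. The unit name `nebula.service`, the function names, and the `ping` flags (Linux iputils: one packet, 2 second wait) are assumptions, not details taken from my actual setup.

```python
import subprocess


def lighthouse_reachable(nebula_ip, run=subprocess.run):
    """Send one ICMP echo with the system `ping` and report whether it succeeded."""
    result = run(["ping", "-c", "1", "-W", "2", nebula_ip],
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def check_and_restart(nebula_ip, service="nebula.service", run=subprocess.run):
    """Restart `service` if the lighthouse does not answer on its Nebula IP.

    Returns True if a restart was issued, False if the lighthouse replied.
    `run` is injectable so the logic can be exercised without real commands.
    """
    if lighthouse_reachable(nebula_ip, run=run):
        return False
    run(["systemctl", "restart", service], check=True)
    return True
```

Calling `check_and_restart("10.217.0.1")` would target the lighthouse address from the error message in this issue; substitute your own lighthouse's Nebula IP.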
…On Wed, Feb 10, 2021 at 2:15 AM jimpea21 ***@***.***> wrote:
I have had this as well and was able to repro easily on 2 Linux computers
(Ubuntu 16.04 server, Deepin 15.11) plus a Windows computer (Win10 Pro
20H2).
Tested with Nebula 1.3.0.
Reproduction on Linux as follows:
1. Setup Nebula and specify the lighthouse in config.yml by FQDN
2. Install using the systemd script
3. Restart the computer - Nebula will fail to start with the
lighthouse unreachable error on the service log
|
I also have this issue. Retrying in nebula makes the most sense to me. |
I've been trying to figure out how to get the DNS stuff working. I've read
some of the posts on that and I'm not certain what I'm doing wrong.
…On Tue, Apr 13, 2021 at 9:43 PM Nathan Brown ***@***.***> wrote:
[image: image]
<https://user-images.githubusercontent.com/957319/114645386-fdb20400-9c9e-11eb-84aa-896ddfc64535.png>
Jokes aside, we should support re-querying these names since we support
DNS names. We will give this some brain time after we cut the v1.4.0
release.
|
@brucealthompson @dcwynar @jimpea21 @wildardoc If any of you are still experiencing this issue could you try adding a For Windows, there's a similar solution mentioned here: #176 (comment) |
Hi all - I'm closing this issue out as stale. We believe that #791 should solve the race by ensuring that the DNS server is available before Nebula boots. Additionally, #796 is released and working in v1.7.1 and should re-query for DNS even if the initial query for DNS fails. By default, we will re-query on a 30s cadence, but this can be configured via Please let us know if you continue to experience issues! |
I am running a reasonably large Nebula network using Debian clients and lighthouse. I have noticed that my Nebula clients are not able to connect to the lighthouse when Nebula is started after initial boot. However, if I restart Nebula after the initial failure, the client connects to the lighthouse with no errors or issues. Here is the error I get from Nebula when I start it after initial boot:
level=error msg="Lighthouse unreachable" error="Lighthouse 10.217.0.1 does not have a static_host_map entry.
I have attached my config.yml which does include a static_host_map entry for the lighthouse.
config.yml.txt
Nebula Version: 1.3.0