Nebula not connecting to lighthouse on linux boot #372

brucealthompson · 2021-02-03T00:26:17Z

I am running a reasonably large Nebula network with Debian clients and a Debian lighthouse. My Nebula clients fail to connect to the lighthouse when Nebula starts at boot. However, if I restart Nebula after that initial failure, the client connects to the lighthouse with no errors or issues. Here is the error Nebula logs when it starts at boot:
level=error msg="Lighthouse unreachable" error="Lighthouse 10.217.0.1 does not have a static_host_map entry"

I have attached my config.yml which does include a static_host_map entry for the lighthouse.
config.yml.txt

Nebula Version: 1.3.0

nbrownus · 2021-02-03T00:40:44Z

Sounds like you may be starting nebula before the network is up.

Which init system are you using?

brucealthompson · 2021-02-03T01:41:20Z

> Which init system are you using?

systemd

I have attached my systemd service file.
nebula.service.txt

dcwynar · 2021-02-07T19:58:33Z

I had the exact same problem.
I solved it by changing the service's [Unit] section to:

[Unit]
Description=nebula
After=network-online.target
Wants=network-online.target

If that doesn't help, try adding an ExecStartPre line, like this:

[Unit]
Description=nebula
Wants=basic.target
After=basic.target network.target
[Service]
SyslogIdentifier=nebula
StandardOutput=syslog
StandardError=syslog
ExecReload=/bin/kill -HUP $MAINPID
ExecStartPre=/bin/sh -c 'until ping -c1 [your-nebula-lighthouse-public-host-or-ip]; do sleep 1; done;'
ExecStart=/usr/bin/nebula -config /etc/nebula/config.yml
Restart=always
[Install]
WantedBy=multi-user.target

jimpea21 · 2021-02-10T08:15:10Z

I have had this as well and was able to repro easily on 2 Linux computers (Ubuntu 16.04 server, Deepin 15.11) plus a Windows computer (Win10 Pro 20H2).

Tested with Nebula 1.3.0.

Reproduction on Linux as follows:

1. Set up Nebula and specify the lighthouse in config.yml by FQDN
2. Install it using the systemd script
3. Restart the computer: Nebula will fail to start, with the lighthouse unreachable error in the service log

Windows hits the same issue at system startup when Nebula is installed as a service and the lighthouse is specified by FQDN.

It is also possible to replicate this by starting Nebula manually with the lighthouse as an FQDN when the network is disconnected (assuming no locally installed DNS, tested on both Linux hosts and the Windows host).

My guess for the startup issue is that DNS resolution fails because the service starts so early on both OSes. For some reason, Nebula never re-checks whether the lighthouse is alive; it just marks it as bad (at least, I waited 10 minutes and no retry was attempted).

My solution is to code the lighthouse IP directly in the config, and this works fine in both test cases (startup and no network). I'd rather be able to use an FQDN though, as it means no config updates if we have to change a lighthouse IP for some reason.

@dcwynar I like your exec prestart idea. I'm not sure how we would achieve the same thing in Windows though. I guess I could write a wrapper script, but that seems like more trouble than it's worth.

@nbrownus Perhaps if DNS resolution of a lighthouse fails on startup, it could be reattempted (for example) on a binary exponential backoff schedule starting with 5 seconds? The current method of marking it as dead on the first attempt seems overly aggressive.

wildardoc · 2021-02-10T12:35:44Z

For my office I've implemented a systemd timer that I start prior to starting the service. If the timer cannot ping the lighthouse on the nebula IP, it will restart the nebula service.
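The timer approach can be sketched in Go as a one-shot check that a systemd timer would run periodically. Everything specific here is an assumption: the unit is named `nebula.service`, Linux iputils `ping` is installed (`-c 1` sends one probe, `-W 2` waits up to 2 seconds), the Nebula firewall allows ICMP to the lighthouse, the check gives up after 3 failed probes, and `10.217.0.1` is simply the lighthouse Nebula IP from the log earlier in this issue. The probe and restart actions are passed in as functions so the decision logic can be exercised without touching the network.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// checkAndRestart probes the lighthouse up to attempts times and calls
// restart only if every probe fails. It reports whether it restarted.
func checkAndRestart(probe func() bool, restart func() error, attempts int) (bool, error) {
	for i := 0; i < attempts; i++ {
		if probe() {
			return false, nil // lighthouse answered; leave Nebula alone
		}
	}
	return true, restart()
}

func main() {
	// Assumptions: 10.217.0.1 is the lighthouse Nebula IP from this issue's log,
	// the Nebula firewall allows ICMP to it, Linux iputils ping is installed,
	// and the unit is called nebula.service.
	const lighthouseNebulaIP = "10.217.0.1"
	probe := func() bool {
		return exec.Command("ping", "-c", "1", "-W", "2", lighthouseNebulaIP).Run() == nil
	}
	restart := func() error {
		return exec.Command("systemctl", "restart", "nebula.service").Run()
	}
	restarted, err := checkAndRestart(probe, restart, 3)
	if err != nil {
		fmt.Fprintln(os.Stderr, "restart failed:", err)
		os.Exit(1)
	}
	fmt.Println("restarted:", restarted)
}
```

A timer unit would typically run the compiled binary through a `Type=oneshot` service at a fixed interval; the interval is a judgment call between recovering quickly and restarting a healthy node during a brief outage.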


jamescorbett · 2021-04-14T00:02:38Z

I also have this issue. Retrying in nebula makes the most sense to me.

nbrownus · 2021-04-14T02:43:21Z

Jokes aside, we should support re-querying these names since we support DNS names. We will give this some brain time after we cut the v1.4.0 release.

wildardoc · 2021-04-15T18:50:20Z

I've been trying to figure out how to get the DNS stuff working. I've read some of the posts on that and I'm not certain what I'm doing wrong.


johnmaguire · 2022-12-07T18:32:39Z

@brucealthompson @dcwynar @jimpea21 @wildardoc

If any of you are still experiencing this issue, could you try adding a Wants=nss-lookup.target line to your Nebula systemd unit file? Thanks!

For Windows, there's a similar solution mentioned here: #176 (comment)

johnmaguire · 2023-05-18T20:41:12Z

Hi all - I'm closing this issue out as stale. We believe that #791 should solve the race by ensuring that the DNS server is available before Nebula boots.

Additionally, #796 is released and working in v1.7.1, and it should re-query DNS even if the initial query fails. By default, we re-query on a 30s cadence, but this can be configured via static_map.cadence.

Please let us know if you continue to experience issues!

johnmaguire mentioned this issue on Dec 19, 2022 in "Add nss-lookup to the systemd wants" #791 (merged)

johnmaguire closed this as completed May 18, 2023
