net/wireguard: 'vpn' configure plugin hook missing #3565

Closed
schuellerstefan opened this issue Aug 24, 2023 · 14 comments
Labels
feature Adding new functionality


@schuellerstefan

Describe the bug

Since 23.7, OPNsense randomly fails to initialise WireGuard interfaces at boot if the DNS lookup fails.
I use a host name as the Endpoint Address, which requires a DNS lookup. It seems that the
PPPoE session has not started yet, so the DNS query fails and the whole WireGuard interface that the endpoint is attached to becomes unavailable.

Expected behavior

The WireGuard interface is brought up even if a DNS name is used as the endpoint address.

Describe alternatives you considered

Switching to a static IP address solves the issue for me.

Relevant log files

See attachment
fail.log

@fichtner fichtner added support Community support feature Adding new functionality and removed support Community support labels Aug 24, 2023
@fichtner fichtner self-assigned this Aug 24, 2023
@fichtner
Member

I know why this happens...spotted it this week doing a review for upcoming wireguard changes.

@fichtner fichtner changed the title wireguard interface start can fail at boot net/wireguard: 'vpn' configure plugin hook missing Aug 24, 2023
@fichtner fichtner transferred this issue from opnsense/core Aug 24, 2023
@AdSchellevis
Member

Wasn't the dnsreresolve action intended for this kind of issue?

[renew]
command:/usr/local/opnsense/scripts/Wireguard/reresolve-dns.py
parameters:
type:script
message:Renew DNS for WireGuard
description:Renew DNS for WireGuard on stale connections

ref: https://github.com/WireGuard/wireguard-tools/tree/master/contrib/reresolve-dns

@fichtner
Member

I think this was meant for cron job use to unbreak.

What I mean is adding the 'vpn' hook, like OpenVPN and IPsec do in core, which is called by newwanip, which in turn is eventually called by PPPoE.

@fichtner
Member

To clarify, this is pre os-wireguard 2.0. The problem has always existed. Flaky DNS during boot is a main driver for this issue. Most people don't even realise that until it happens.

@fichtner
Member

@AdSchellevis looks like you are right... does this work for you then? d6df03a56

@wkochFPV

wkochFPV commented Sep 2, 2023

Same problem with a WireGuard tunnel here. It worked reliably in previous versions.
The cron job "Renew DNS for WireGuard on stale connections" does not resolve the issue for me!

/usr/local/opnsense/scripts/Wireguard/wg-service-control.php: The command '/usr/bin/wg setconf 'wg1' '/usr/local/etc/wireguard/wg1.conf'' returned exit code '1', the output was 'Name does not resolve: dyndnsexample.com:51993' Configuration parsing error'

@fichtner
Member

fichtner commented Sep 2, 2023

„Name does not resolve“ is a pretty basic support issue, not a technical problem. Nothing we do here will solve it.

@wkochFPV

wkochFPV commented Sep 16, 2023

Sorry for opening this up again. The problem persists with OPNsense 23.7.4.

I know that "Name does not resolve" is a basic error, but the endpoint domain name is correct and CAN be resolved. The problem is that DNS does not resolve during OPNsense startup, but does resolve later on.

  1. OPNsense boots.
  2. During WireGuard startup, the DNS lookup for the endpoint fails. WireGuard does not load the configuration at all (with the failure shown above).
  3. The "/usr/local/opnsense/scripts/Wireguard/reresolve-dns.py" (wireguard renew) action does not fix the issue, since it only sets the peer and endpoint, but not the pre-shared key, allowed IPs etc. -> the WireGuard handshake keeps failing.

Manual intervention is needed: restarting the WireGuard service via the GUI -> DNS now resolves -> the tunnel is successfully established.

Possible solution:
Reapply the config file before setting the peer in reresolve-dns.py, for example with the command
"wg setconf wg1 /usr/local/etc/wireguard/wg1.conf"

@fichtner
Member

I’m not sure it makes too much sense. It might not be a wireguard issue. It’s trying to restart all the time on dynamic changes. Maybe the root issue is the WAN configuration and DNS setup.

@wkochFPV

From my point of view, WireGuard is started too early during OPNsense bootup.

From the attached excerpts of various log files, you can see the following sequence of events:

  1. LAN interface (igc0) is up
  2. Wireguard instance wg1 is started and fails due to DNS problem (WAN is not up, yet!)
  3. WAN interface (igc3) is up
  4. "Renew DNS for Wireguard" is called; at that time the wg1 config is not loaded, so the command does not establish the tunnel

It seems that the startup sequence does not account for the possibility that WAN is not attached to the first available interface. (It probably does not account for temporary DNS server failures either.)
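
One way to tolerate a resolver that is not reachable yet is to retry the lookup a few times before the tunnel configuration is applied. The following is only a minimal sketch in Python of that idea, not what the plugin does. `wait_for_dns` and its parameters are hypothetical names chosen for illustration, and the resolver and sleep functions are injectable so the retry logic can be exercised without a network.

```python
import socket
import time


def wait_for_dns(host, port, attempts=5, delay=2.0,
                 resolve=socket.getaddrinfo, sleep=time.sleep):
    """Return the first resolved address for host, or None if every attempt fails.

    Hypothetical helper: attempts and delay are illustrative defaults,
    not values taken from OPNsense.
    """
    for attempt in range(attempts):
        try:
            infos = resolve(host, port)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the textual address.
            return infos[0][4][0]
        except socket.gaierror:
            # Resolver not ready yet (for example WAN still down): wait and retry.
            if attempt < attempts - 1:
                sleep(delay)
    return None
```

A caller could run `wg setconf` only once `wait_for_dns(...)` returns an address, and otherwise defer to a later interface event.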

Best regards, Walter

2023-09-16T15:20:38 Notice configd.py [2fd65f11-34be-4916-a523-2ccfceedbd30] Linkup starting igc0
2023-09-16T15:20:38 Notice configd.py [07508d5e-ecbf-4daa-8063-e48ef44be722] configure wireguard instances
2023-09-16T15:20:41 Notice wireguard Wireguard interface WKhomeWireguard (wg1) started
2023-09-16T15:20:41 Error wireguard /usr/local/opnsense/scripts/Wireguard/wg-service-control.php: The command '/usr/bin/wg setconf 'wg1' '/usr/local/etc/wireguard/wg1.conf'' returned exit code '1', the output was 'Name does not resolve: xxxx.mooo.com:51234' Configuration parsing error'
2023-09-16T15:20:46 Notice configd.py [3e84c15b-ae35-487c-9e00-88baaef5b98e] Linkup starting igc3
2023-09-16T15:20:44 Notice configd.py [ebca1ea6-4ce8-4154-a9ee-72e416e8c7ba] Renew DNS for WireGuard

@fichtner
Member

Linkup starting igc3 is suspicious. The only reason for this is probably netmap being used. I’d rather like to see the system log.

@fichtner fichtner reopened this Sep 16, 2023
@wkochFPV

wkochFPV commented Sep 16, 2023

> It’s trying to restart all the time on dynamic changes.

Unfortunately it doesn't. If the config is not loaded during initial wg startup, the "wireguard renew" action fails.

System log pulled from /var/logs: see syslog.log
General log copied from GUI: see general.log

Thank you for your efforts!

general.log
syslog.log

@wkochFPV

wkochFPV commented Sep 16, 2023

> It’s trying to restart all the time on dynamic changes.

It is probably not necessary to review the logs (see above). You're assuming that your plugin hook restarts wireguard on any type of change, but it doesn't. If the config is not loaded during initial wg startup, the "wireguard renew" action fails.

Here is what happens from WireGuard's point of view:

This is what "wg show all" displays after the initial startup with a failed DNS request. The interface seems to exist, but with no configuration at all.

interface: wg1
listening port: 25027

This is what happens after your "renew" (what you consider a restart) action (/usr/bin/wg set wg1 peer XXXX endpoint xxx.mooo.com:12345) with successful DNS resolution. Please note: Allowed ips and preshared key are empty/missing.

`interface: wg1
listening port: 25027

peer: XXXX
endpoint: 1xx.1xx.2xx.2:12345
allowed ips: (none)`

This is what a successfully established tunnel looks like:

`interface: wg1
public key: XXX
private key: (hidden)
listening port: 3826

peer: XXX
preshared key: (hidden)
endpoint: 1xx.1xx.2xx.2:12345
allowed ips: 192.168.26.0/24, 100.102.0.0/15, 161.156.128.32/28, 188.144.0.0/15, 192.168.50.0/24, 192.168.3.0/24, 10.11.0.2/32, 192.168.2.0/24
latest handshake: 19 seconds ago
transfer: 527.65 KiB received, 614.14 KiB sent`
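
The difference between the half-configured and the working state can also be detected by a script. Below is a minimal sketch in Python that parses the human-readable `wg show` output quoted above and lists peers that have no allowed IPs. `parse_wg_show` and `unconfigured_peers` are hypothetical helper names, and the parser only assumes the `key: value` layout visible in this thread. For real scripting, the tab-separated `wg show <interface> dump` output is probably a more stable input.

```python
def parse_wg_show(text):
    """Parse 'wg show <interface>' text into interface fields and a peer list."""
    result = {"interface": {}, "peers": []}
    current = result["interface"]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so "host:port" endpoints stay intact.
        key, value = line.split(":", 1)
        key, value = key.strip(), value.strip()
        if key == "peer":
            # A "peer:" line starts a new peer section.
            current = {"peer": value}
            result["peers"].append(current)
        else:
            current[key] = value
    return result


def unconfigured_peers(parsed):
    """Return public keys of peers whose allowed IPs are missing or '(none)'."""
    return [p["peer"] for p in parsed["peers"]
            if p.get("allowed ips", "(none)") == "(none)"]


# Output after the "renew" action, as quoted in this comment.
sample = """interface: wg1
listening port: 25027

peer: XXXX
endpoint: 1xx.1xx.2xx.2:12345
allowed ips: (none)
"""
print(unconfigured_peers(parse_wg_show(sample)))
```

An interface with no peers at all, as in the first output, yields an empty peer list, which a caller would also need to treat as "configuration not loaded".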

Reapplying the config file in reresolve-dns.py (wireguard renew) seems necessary ("wg setconf wg1 /usr/local/etc/wireguard/wg1.conf").
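
As a rough illustration of this proposal, the renew step could first check whether any peer has allowed IPs and fall back to `wg setconf` when none does. This is a sketch in Python under stated assumptions, not the actual reresolve-dns.py: `config_loaded` and `renew` are hypothetical names, the command runner is injectable for testing, and the output format assumed for `wg show <interface> allowed-ips` (public key, a tab, then the IPs or `(none)`) should be verified against the installed wireguard-tools.

```python
import subprocess


def run(cmd):
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def config_loaded(interface, runner=run):
    """True if at least one peer on the interface has allowed IPs set."""
    out = runner(["wg", "show", interface, "allowed-ips"])
    for line in out.splitlines():
        # Assumed format: "<public key>\t<ip> <ip> ..." or "<public key>\t(none)".
        parts = line.split("\t", 1)
        if len(parts) == 2 and parts[1].strip() not in ("", "(none)"):
            return True
    return False


def renew(interface, conf_path, peer, endpoint, runner=run):
    """Reload the full config if it never loaded, else only refresh the endpoint."""
    if not config_loaded(interface, runner):
        # Startup failed earlier: apply keys, allowed IPs and endpoint in one go.
        runner(["wg", "setconf", interface, conf_path])
    else:
        # Config is in place: re-resolve the endpoint only.
        runner(["wg", "set", interface, "peer", peer, "endpoint", endpoint])
```

With this shape, a working tunnel is never reset by the renew action, while an interface that came up empty gets its full configuration as soon as DNS works.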

Another fix (in my opinion a rather bad one) would be to start wg with a dummy static endpoint IP (so that it loads the config), then assign the correct endpoint IP or DNS name later (using your renew action).

@AdSchellevis
Copy link
Member

The reresolve dns script is only intended to resolve an address again as per upstream example.

We might consider calling configure on an interface event, but it likely needs some additional checks in the existing script (maybe required for future CARP changes anyway).

The root cause however might be something different as DNS should be available at that point in time and the link event looks rather suspicious (in which case forcing a resolve for a functional tunnel should be all it needs)


4 participants