wireguard: not fully configured after reboot #7148

Closed
hakkabara opened this issue Jan 19, 2024 · 31 comments

@hakkabara

Describe the bug
Hello, I have a bug.
I have a WireGuard tunnel for friends and family to reach my services.
I hadn't changed any rules for weeks and everything was fine. I just updated and rebooted, and then my rules no longer applied.
I have to add a new rule, delete it again, and apply; after that everything works as before.

See the screenshots: the green entries at the top are the rules after adding a dummy rule and deleting it again.
[screenshots]

The WireGuard connection is also stable and connected.
[screenshot]

My aliases (see screenshots):
debiandocker: 192.168.189.3
httphttps: 80, 443
DNS: 53, 853

[screenshots]

I use a VM running on Proxmox.

AdSchellevis added the support (Community support) label Jan 19, 2024
@boomer41

boomer41 commented Feb 4, 2024

I have the same behavior with Wireguard interfaces.
The problem started with some 23.x release and is still present in 24.1. :(

Just saving a rule without changing anything and reloading the firewall restores functionality.

Unfortunately, I have not found any log entries suggesting some kind of error.

@fichtner
Member

fichtner commented Feb 4, 2024

Unfortunately, this has been known to happen for unknown reasons.

Can you post /tmp/rules.debug and /tmp/ifconfig.debug from after boot (broken state) and after a manual rules apply (good state)?

Cheers,
Franco
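
One way to gather the two states for comparison is sketched below. This is only an illustration, not an official OPNsense tool: the function names snapshot_state and compare_states and the output directory /root/wg-debug are assumptions made for this example; only the two file names under /tmp come from the request above.

```shell
#!/bin/sh
# Copy the debug files into a labeled directory so that two states
# (for example "broken" and "good") can be compared later.
# Arguments: label, source directory (default /tmp),
# output directory (default /root/wg-debug, an assumed location).
snapshot_state() {
    label="$1"
    src_dir="${2:-/tmp}"
    out_dir="${3:-/root/wg-debug}"
    mkdir -p "$out_dir/$label" || return 1
    for f in rules.debug ifconfig.debug; do
        # Skip files that do not exist instead of failing.
        if [ -f "$src_dir/$f" ]; then
            cp "$src_dir/$f" "$out_dir/$label/$f"
        fi
    done
}

# Show a unified diff between the "broken" and "good" snapshots.
# diff exits non-zero when the two states differ.
compare_states() {
    out_dir="${1:-/root/wg-debug}"
    diff -ru "$out_dir/broken" "$out_dir/good"
}
```

After a reboot, run `snapshot_state broken`, apply the rules manually, then run `snapshot_state good` followed by `compare_states`. Review the copies for sensitive details before sharing them.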

fichtner changed the title from "Firewall rules dont apply after reboot" to "interfaces: WireGuard firewall rules do not apply after reboot" Feb 4, 2024
fichtner self-assigned this Feb 4, 2024
fichtner added the cleanup (Low impact changes) label and removed the support (Community support) label Feb 4, 2024
fichtner added this to the 24.7 milestone Feb 4, 2024
@boomer41

boomer41 commented Feb 4, 2024

> Can you post /tmp/rules.debug and /tmp/rules.ifconfig from after boot (broken state) and after manual rules apply (good state).

I have captured /tmp/rules.debug and /tmp/rules.limits. /tmp/rules.ifconfig does not exist on my 24.1.
Additionally, I have captured the output of pfctl -vvs all in both scenarios.

Can I send you the logs via e-mail or such? Appending sensitive information on a public issue doesn't seem like the best idea :)

@fichtner
Member

fichtner commented Feb 4, 2024

Sorry, the file is /tmp/ifconfig.debug.

You can send it to franco@opnsense.org

Thanks!

@boomer41

boomer41 commented Feb 4, 2024

> Sorry, the file is /tmp/ifconfig.debug.

Yep, that one works :)

> You can send it to franco@opnsense.org

Done.

@SeimusS

SeimusS commented Feb 4, 2024

Hello,

I have the same issue as boomer41. If there is anything else I can provide, let me know, Franco.

Regards,
S.

@Westie

Westie commented Feb 6, 2024

I posted this in my thread on the forums; however, if anyone wants a quick emergency fix, have a look at this gist.

https://gist.github.com/Westie/5557cffd927dd32de93255e5ac4a22e0

Feel free to adjust as needed!

@fichtner
Member

fichtner commented Feb 9, 2024

It appears that the current code has issues with assigned wireguard instances and you can try this patch on 24.1 and above: 9e01b27

# opnsense-patch 9e01b27

I'm not 100% sure it solves the issues reported, but it will bring us to a more consistent outcome after reload which can reveal another underlying issue better.

Cheers,
Franco

fichtner changed the title from "interfaces: WireGuard firewall rules do not apply after reboot" to "wireguard: not fully configured after reboot" Feb 9, 2024
@boomer41

boomer41 commented Feb 9, 2024

> It appears that the current code has issues with assigned wireguard instances and you can try this patch on 24.1 and above: 9e01b27
>
> # opnsense-patch 9e01b27
>
> I'm not 100% sure it solves the issues reported, but it will bring us to a more consistent outcome after reload which can reveal another underlying issue better.
>
> Cheers, Franco

I have applied your patch, but the issue still persists the same way :(
The code before did always reload the interface, didn't it? Now it doesn't when the interface exists.

@fichtner
Member

fichtner commented Feb 9, 2024

@boomer41 from your output that seemed to be the obvious issue, and I could reproduce it. Can you send me new data gathered the same way, but with the patch applied? The rules looked good in both cases (no diff between them), so I suspect a more fundamental issue relating to VIPs, DNS resolution, or requiring overlap in connectivity (boiling down to route complexity). I can't see that from the files gathered so far, but going step by step will get us there.

So just to be clear: after reboot it doesn't work, and then you do "x" and it works? What is "x" in precise terms, so I can focus on this?

Thanks,
Franco

@boomer41

boomer41 commented Feb 9, 2024

Logs sent via mail with the patch applied.

As stated in the email, I just log on to the management UI, then edit and save some arbitrary rule to get the "Apply changes" button. After hitting that button, things start working.

@fichtner
Member

fichtner commented Feb 9, 2024

Can you confirm that running

# configctl filter reload

Does the same thing?

@boomer41

boomer41 commented Feb 9, 2024

> Can you confirm that running
>
> # configctl filter reload
>
> Does the same thing?

Can confirm. Running that command fixes the issue the same way as the UI way does.

AdSchellevis added a commit that referenced this issue Feb 10, 2024
…alls its own routes, we are not able to track them properly. If that's the case and the user reconfigures, drop all interface addresses instead of removing the interface (and creating it again).

There is a small chance of remnants after the fact, but dropping the interface is more problematic to recover from as it will invalidate filter rulesets as well.
The user is still able to force a stop/start using the reload action, which also reloads the filter after the fact.

proposal for #7148
fichtner added a commit that referenced this issue Feb 27, 2024
…7148

(cherry picked from commit b8665c9)
(cherry picked from commit 7413ca6)
(cherry picked from commit 30862f8)
(cherry picked from commit dbe52ee)
(cherry picked from commit e0cee10)
@fichtner
Member

Here is a backport of our recent efforts which applies cleanly to 24.1.2: 7d35204f2a. To apply:

# opnsense-patch 7d35204f2a

Cheers,
Franco

@SeimusS

SeimusS commented Feb 27, 2024

I tried the patch on 24.1.2; sadly, it didn't fix the problem for me. After the reboot, the issue is still present.

sudo opnsense-patch 7d35204f2a
Password:
Fetched 7d35204f2a via https://github.com/opnsense/core
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|From 7d35204f2aab13311aee3a266f07972370922ccd Mon Sep 17 00:00:00 2001
|From: Franco Fichtner <franco@opnsense.org>
|Date: Thu, 8 Feb 2024 17:13:32 +0100
|Subject: [PATCH] wireguard: address assorted interface configuration
| inconsistencies #7148
|
|(cherry picked from commit b8665c9da0780a7744da5e84e6cafa4183f37f57)
|(cherry picked from commit 7413ca696dbb5e8c1f4786207054e43c05b9f8c4)
|(cherry picked from commit 30862f87113865898630127c2f1f790d34678be1)
|(cherry picked from commit dbe52eeaa9c17ec56a22ff6cefcf6b94615bd8b4)
|(cherry picked from commit e0cee10ad13e33e603a50c33c62a31b5dd8def6e)
|---
| .../scripts/Wireguard/wg-service-control.php  | 48 ++++++++++++++-----
| 1 file changed, 36 insertions(+), 12 deletions(-)
|
|diff --git a/src/opnsense/scripts/Wireguard/wg-service-control.php b/src/opnsense/scripts/Wireguard/wg-service-control.php
|index 09dcd57b19..7f801b9b44 100755
|--- a/src/opnsense/scripts/Wireguard/wg-service-control.php
|+++ b/src/opnsense/scripts/Wireguard/wg-service-control.php
--------------------------
Patching file opnsense/scripts/Wireguard/wg-service-control.php using Plan A...
Hunk #1 succeeded at 62.
Hunk #2 succeeded at 80.
Hunk #3 succeeded at 134.
Hunk #4 succeeded at 273.
Hunk #5 succeeded at 287.
Hunk #6 succeeded at 332.
done
All patches have been applied successfully.  Have a nice day.

*** FINAL System shutdown message from ***** ***

System going down IMMEDIATELY



Regards,
S.

@boomer41

# opnsense-patch 7d35204f2a

Reversed all debug patches and applied this one.
In contrast to @SeimusS, the patch works for me. Running 24.1_1.

@fichtner
Member

@SeimusS what was "the problem" again? There have been overlapping issues but some are beyond our control, especially with DNS-based endpoints and dynamic connections.

@boomer41 ok, that's good. We'll be adding this to 24.1.3 to bring us closer to being able to identify other issues then.

Cheers,
Franco

@SeimusS

SeimusS commented Feb 27, 2024

@boomer41
Happy to hear it worked for you.

@fichtner
Basically, the problem for me is that after a reboot of OPNsense, traffic from WG to external (public) destinations does not work. I can still reach the LAN without problems, but anything going to the internet fails if the source is a WG host. Looking closer, it seems that NAT is not being applied correctly for WG, even though I use automatic rules and can see the WG interface as part of the automatically created rule.

When hitting the apply button in FW > NAT > Outbound or using "configctl filter reload", it starts to work normally again.

Edit: I use an RA setup; DNS is not on OPNsense but on an RPi.

Regards,
S.

@fichtner
Member

The wireguard logs with the patch applied might help pin this down further. Basically it tries to retain wireguard interfaces so that NAT et al are applied correctly all the time. Not sure where the issue now lies. Are your endpoint addresses using hostnames instead of IPs?

@SeimusS

SeimusS commented Feb 27, 2024

No, clients use IPs, not hostnames. Each WG client has an IP/32 configured on its end.

[screenshot]

Similarly, an IP is configured on the dedicated host.

Regards,
S.

@fichtner
Member

@SeimusS Ok, it would be better to start with the basics as described in #7148 (comment) sent to franco AT opnsense DOT org

@SeimusS

SeimusS commented Feb 27, 2024

Done. Logs were generated (and sent to you) as per the comment mentioned: after reboot, before and after re-applying the NAT rules / running "configctl filter reload".

Let me know if you need anything else from my OPNsense box.

Regards,
S.

fichtner added a commit that referenced this issue Mar 5, 2024
…7148

(cherry picked from commit b8665c9)
(cherry picked from commit 7413ca6)
(cherry picked from commit 30862f8)
(cherry picked from commit dbe52ee)
(cherry picked from commit e0cee10)
fichtner pushed a commit that referenced this issue Mar 12, 2024
…alls its own routes, we are not able to track them properly. If that's the case and the user reconfigures, drop all interface addresses instead of removing the interface (and creating it again).

There is a small chance of remnants after the fact, but dropping the interface is more problematic to recover from as it will invalidate filter rulesets as well.
The user is still able to force a stop/start using the reload action, which also reloads the filter after the fact.

proposal for #7148
@fichtner
Member

In agreement with @SeimusS, we're closing this issue and will wait for other user reports or code changes to make the problem more visible. The main problem reported here, however, has been fixed since 24.1.3.

Cheers,
Franco

@illum1n4ti

Hello Franco,

I still have issues with WireGuard from Surfshark. After every update (currently OPNsense 24.1.7) I apply opnsense-patch 7d35204 and it works, but is this the way I have to make my WireGuard work?

Cheers

@fichtner
Member

@illum1n4ti it depends on how old your initial setup is and what "have issues" means. There have been a lot of individual cases that were cleared up, and re-applying the patch just removes it, so you are going backwards in time...

@illum1n4ti

@fichtner Thank you for your reply.
Previously I was using version 23.x, and after upgrading to 24.1 it was still working fine. The problems started when the WireGuard plugin was removed and WireGuard was integrated into OPNsense: I am not getting a handshake with Surfshark.

Thanks to the patch I can still use the Surfshark VPN with WireGuard, but I am afraid that with the patch I am using an old protocol, and that there may be security-relevant bugs.

I hope you can give me some advice.

@fichtner
Member

A common mistake was that the assigned WireGuard interface had its IPv4 mode (and IPv6 mode) not set to "none". This is now prohibited, and the IP address needs to be set in the instance as "tunnel address". Fixing that brought it back to life for most people.

@hsand

hsand commented Jul 17, 2024

I had the same issue with 24.1.10, seems to be related to NAT.

I was able to get things working by following Step 4(b) in the documentation (even though it says this step is not necessary) - https://docs.opnsense.org/manual/how-tos/wireguard-client.html#step-4-b-create-an-outbound-nat-rule

@itsMaxio

The problem is still not solved. Automatic NAT rules are still not added for the WireGuard interface: they appear in the GUI, but they are not in the debug files. They are only added after reloading the rules via the GUI or console.
[screenshot]

Does anyone have a solution? Maybe you, @fichtner? I have manual rules set, but I wonder why the automatic ones don't work.
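
A quick way to check whether a given rule made it into the generated ruleset is to search the debug file directly. The sketch below assumes that outbound NAT rules appear in /tmp/rules.debug as lines starting with "nat " (standard pf syntax) and that the WireGuard instance is named wg0; the helper name nat_rules_for and the interface name are assumptions for illustration only.

```shell
#!/bin/sh
# Print NAT rules from a pf ruleset file that mention the given text,
# for example an interface name or a tunnel network.
# Arguments: text to look for, ruleset file (default /tmp/rules.debug).
# The exit status is non-zero when no matching rule is found.
nat_rules_for() {
    needle="$1"
    rules_file="${2:-/tmp/rules.debug}"
    # Keep only lines that begin with "nat ", then filter by the fixed string.
    grep '^nat ' "$rules_file" | grep -F -- "$needle"
}
```

For example, run `nat_rules_for wg0` right after boot and again after `configctl filter reload`. The rules actually loaded into pf can be listed with `pfctl -s nat` for comparison.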

@SeimusS

SeimusS commented Sep 4, 2024

@itsMaxio
This could be a slightly different issue than the one fixed in that PR.

If I remember correctly, I have the same or a similar problem to the one you reported. Basically, WG is working but can only access LAN-related resources, not the internet, because the automatic NAT rules are missing when checking the debug files.

I tried to troubleshoot it with Franco's help, but I couldn't find out why it's happening. So I currently have a permanent workaround in place that reloads the filter 2 seconds after boot.

1. Create a file in the rc.syshook.d/start/ called 92-wireguard-firewall-workaround
vi /usr/local/etc/rc.syshook.d/start/92-wireguard-firewall-workaround

2. Put in

#!/bin/sh
sleep 2
configctl filter reload

This basically makes sure that WireGuard works as intended after the device is rebooted.
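
A fixed two-second delay may be too short or too long depending on the system. A possible variant, not tested against this bug, polls until a condition holds before reloading the filter. The helper name wait_for, the 30-second limit, and the instance name wg0 are assumptions for illustration; whether waiting for the interface is sufficient here is not confirmed anywhere in this thread.

```shell
#!/bin/sh
# Run a command once per second until it succeeds or the limit
# (in seconds) is reached. Returns 0 on success, 1 on timeout.
wait_for() {
    limit="$1"
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$limit" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Intended use inside the hook script (wg0 is an assumed instance name):
#   wait_for 30 ifconfig wg0
#   configctl filter reload
```

The reload is deliberately run even if the wait times out, so the behavior is never worse than the plain sleep version.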

You also may want to open a new Ticket for this WG issue as this thread was closed with a PR fix.

Regards,
S.

@b4zyl

b4zyl commented Sep 27, 2024

@SeimusS

Even after adding your workaround script to OPNsense, I still need to manually reload the WireGuard service after each OPNsense restart :(
