MultiWAN and Reset States #5387

ElXk6 · 2021-12-02T08:37:15Z

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

I have read the contributing guidelines at https://github.com/opnsense/core/blob/master/CONTRIBUTING.md
I am convinced that my issue is new after having checked both open and closed issues at https://github.com/opnsense/core/issues?q=is%3Aissue

Similar issues or feature requests:
#3979
https://forum.opnsense.org/index.php?topic=25818.0

Describe the bug

We are using MultiWAN with 2 Uplinks with:

Gateway switching (Allow default gateway switching => enabled)
Kill States ( Disable State Killing on Gateway Failure => not ticked)
Sticky Connections (Use sticky connections => not ticked)
Two Gateways, one with higher prio
Also tested Gateway group

On top of that, I run an OpenVPN client connection (TCP).

When I simulate a failure of the active gateway, gateway switching kicks in, the OpenVPN tunnel times out, and the takeover works fine. It also seems to reset the TCP states, since my SSH tunnel/access dies.

HOWEVER: when I bring the primary gateway back, the default route switches back to it, BUT the TCP states do not get killed.

The SSH session is still active. No state reset seems to happen.
If I kill the ESTABLISHED connection in the "States Dump" GUI, it reconnects via the active/correct gateway.

So I wonder whether:
- I set up something wrong?
- the state reset happens, by design, only on the first failover?
- the state reset is buggy and should also be triggered when switching back to the primary interface?

To Reproduce

Steps to reproduce the behavior:

Set up MultiWAN with
- Gateway switching (Allow default gateway switching => enabled)
- Kill States ( Disable State Killing on Gateway Failure => not ticked)
- Sticky Connections (Use sticky connections => not ticked)
Test interruption of default gateway
=> State Reset happens, ssh connection goes down
Wait until everything seems fine
Reconnect default gateway
Gateway with higher priority becomes the default gateway again
=> State doesn't get reset, SSH connection stays up
Wait (60 min)
=> Backup gateway is still in use
Kill states manually (Firewall: Diagnostics: States)
=> State Reset happens, ssh connection goes down, default gateway gets used

Expected behavior

If a gateway has a higher priority, the connection should jump back to it once it is the default gateway again, like all other connections do.
Also, the MultiWAN firewall rule seems to have no effect on this behavior; it gets ignored as well.

Describe alternatives you considered

An option to force connections back.

Additional context

All other connections like HTTP/HTTPS/ICMP are jumping back and forth between default gateway and backup gateway.

Environment

Software version used and hardware type if relevant, e.g.:

OPNsense 21.7.6-amd64
FreeBSD 12.1-RELEASE-p21-HBSD
OpenSSL 1.1.1l 24 Aug 2021
AMD G-SERIES SOC GX-416RA 1.6 GHz Quad-Core
Network Intel® I210-AT

AdSchellevis · 2021-12-02T09:10:37Z

You could try #5367 (comment), but if it's specifically for OpenVPN clients you might have to wait for @mimugmail, as he offered to set up a test on his end.

ElXk6 · 2022-01-10T11:21:21Z

Are there any updates on this issue?

you could try #5367 (comment)

In my case, it is mostly OpenVPN specific.

OPNsense-bot · 2022-05-31T07:05:12Z

This issue has been automatically timed-out (after 180 days of inactivity).

For more information about the policies for this repository,
please read https://github.com/opnsense/core/blob/master/CONTRIBUTING.md for further details.

If someone wants to step up and work on this issue,
just let us know, so we can reopen the issue and assign an owner to it.

ElXk6 · 2023-06-05T09:24:23Z

Is there anything new here?

It happened again recently: a VPN connection ran over the wrong WAN interface for several weeks before we noticed problems.

Is there an extra option for this?
Dynamic state reset was removed and can no longer be used, so that is not an option either.

For some setups it would also be nice to have a hard reset just for the second gateway, because the second gateway should really only be used in an emergency (e.g. LTE uplinks). Maybe this could be an option that can be set separately for each interface?

Is this currently "works as intended"?
The only solutions I currently see are to monitor whether a gateway is still in use, or even to write a script that always disables the second gateway while the first is up :/

fichtner · 2023-06-05T09:55:49Z

To most reporters in the past the disruptive clearing of states is the actual undesirable outcome. If you need it you could throw a script into /usr/local/etc/rc.syshook.d/monitor directory as per https://docs.opnsense.org/development/backend/autorun.html and do the relevant pfctl magic there.

Cheers,
Franco
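
The suggestion above could be sketched roughly as follows. This is a minimal, hedged sketch and not the script anyone in this thread actually uses. It assumes the backup WAN does outbound NAT to a single known address (`BACKUP_WAN_ADDRESS` is a placeholder from a documentation range), and it only relies on `pfctl -k <host>`, which kills states originating from that host. How a script in the monitor directory learns that the primary gateway is healthy again is not described in this thread, so `primary_is_up` is simply passed in by the caller here.

```python
import ipaddress
import subprocess

# Assumption: the address the backup WAN (e.g. the LTE uplink) uses for outbound NAT.
BACKUP_WAN_ADDRESS = "198.51.100.10"  # documentation range, replace with your own


def kill_command(address):
    """Build the pfctl call that kills states originating from one address."""
    ipaddress.ip_address(address)  # raises ValueError on a malformed address
    return ["/sbin/pfctl", "-k", address]


def plan(primary_is_up, address=BACKUP_WAN_ADDRESS):
    """Return the command to run, or None while the primary WAN is still down."""
    if not primary_is_up:
        return None  # the backup is carrying traffic, leave its states alone
    return kill_command(address)


def run(command, dry_run=True):
    """Print the command by default; execute it only when dry_run is False (needs root)."""
    if command is None:
        return
    if dry_run:
        print(" ".join(command))
    else:
        subprocess.run(command, check=False)
```

Calling `run(plan(primary_is_up=True))` only prints the command; pass `dry_run=False` once the address and the trigger condition have been verified on your own system. Killing by address keeps the reset limited to the demoted line instead of clearing all states everywhere.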

ElXk6 · 2023-06-05T10:03:39Z

To most reporters in the past the disruptive clearing of states is the actual undesirable outcome

Okay, I had already suspected that.

If you need it you could throw a script into /usr/local/etc/rc.syshook.d/monitor directory as per https://docs.opnsense.org/development/backend/autorun.html and do the relevant pfctl magic there.

Thanks, then I will think about something here.
An official configurable option would of course still be nice :).

fichtner · 2023-06-05T10:11:16Z

I don't mind a feature request, but it must be designed correctly: clearing all states everywhere is not an option anymore, as it will lead to the same reports again. Having worked on the gateway monitoring code over the past few weeks, I can say there is quite a bit of complexity involved, both in the setup at hand (these are already multiple requirements) and in how the failover is expected to go. At the moment monitoring is target driven: search for the best candidate. Handling the previous candidate can add a lot of complexity that might not be worth it (I don't remember any past request on how to deal selectively with lines being demoted).

mimugmail · 2023-06-05T12:25:51Z

A nightly cron job that resets the VPN should also do the trick.

ElXk6 · 2023-06-05T13:53:26Z

Yes, I thought about it a bit.
I think I agree with you: the current behavior is probably best for most users, and there are too many cases to cover.

You could of course add options such as restarting services at a fixed time if a gateway was down, as mimugmail suggested.
But for one person early morning is better, for another the evening, and so on.
I don't think all requirements can be met here without a lot of time and effort.

Yes, clearing all states is not a good idea; if anything, it should be done only for the IP range of the failover gateway. But even then I have seen connections resume after a clear, probably because the client answered on the UDP stream again.
In that case only the sequence deactivate gateway => clear states => activate gateway helped.
But I did not look into it more closely; maybe it was my fault.

Maybe a notification that services are still running through the failover gateway would be enough, so that you can react to it manually.
But I think everyone can also monitor for themselves and their requirements.
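
A self-made check along those lines could look like the sketch below. It is only an illustration under stated assumptions: the state lines in `SAMPLE` are made up, they follow the layout that `pfctl -ss` typically prints for NATed IPv4 states, and the backup WAN address is a placeholder. The function counts, per protocol, the states whose line mentions the backup WAN address, so that a non-empty result while the primary gateway is up can be turned into a notification.

```python
import re
from collections import Counter

# Matches IPv4 "address:port" pairs; IPv6 states are ignored in this sketch.
ADDR_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3}):\d+")


def states_via(address, state_dump):
    """Count states per protocol whose line contains the given IPv4 address."""
    counts = Counter()
    for line in state_dump.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if address in ADDR_RE.findall(line):
            counts[fields[1]] += 1  # second column is the protocol
    return counts


# Hypothetical excerpt of `pfctl -ss` output (all addresses are documentation ranges).
SAMPLE = """\
all tcp 198.51.100.10:50122 (192.168.1.20:50122) -> 203.0.113.7:1194       ESTABLISHED:ESTABLISHED
all udp 198.51.100.10:61001 (192.168.1.21:61001) -> 203.0.113.9:53       MULTIPLE:SINGLE
all tcp 192.0.2.4:40110 (192.168.1.20:40110) -> 203.0.113.7:443       ESTABLISHED:ESTABLISHED
"""

leftover = states_via("198.51.100.10", SAMPLE)
if leftover:
    print("states still using the backup WAN:", dict(leftover))
```

On a real system the text would come from running `pfctl -ss` as root instead of `SAMPLE`; what to do with the result (mail, log entry, manual reset) is left to each setup.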

Sorry for the reopening, I think the topic has settled again for now :D.

alex8654 · 2024-04-03T14:07:10Z

I noticed this problem not just on VPNs, but on any traffic. Let's say there is a state open via WAN1 and the interface goes down: the state stays on WAN1 and does not fail over to WAN2. I have confirmed this behaviour; I need to reset the states manually or wait for them to time out due to inactivity. If you leave ping or traceroute running, the state will never time out, and it will never use the other interface that is up.

gitmachtl · 2024-04-09T22:59:08Z

I have the same problem. I am currently migrating from Draytek routers to OPNsense, with dual WAN over a cable modem and an LTE connection. When the WAN (cable modem) comes back up, the states for WAN2 (LTE) are not killed and clients stick to those connections.

Can someone please do something about that? Can someone tell me how to write a script that kills the WAN2 states once WAN is OK again?

Thanks!

gitmachtl · 2024-04-10T19:21:25Z

I built a solution for myself. For those who are interested, it's here:
#6803 (comment)

fichtner added the support Community support label Dec 9, 2021

OPNsense-bot closed this as completed May 31, 2022

OPNsense-bot added the help wanted Contributor missing / timeout label May 31, 2022
