Multi WAN testbeds #1821

Closed
mimugmail opened this issue Sep 13, 2017 · 16 comments
Assignees
Labels
bug Production bug

Comments

@mimugmail
Member

I did some testing with Multi WAN in combination with the (hopefully not too soon) deprecated "Enable gateway switching" option. My intention is to test whether services enabled on the WAN side are reachable (WebGUI and a plain ping of the WAN address).

Scenario: LAN, WAN1 (Static IP, default route), WAN2 (dhcp, receives default gw, but not set to default)

Test1:
No gateways in FW rules.
No gateway groups
GW switching active

Result1:
Failover works
Both WANs reachable


Test2:
No gateways in FW rules.
No gateway groups
GW switching NOT active

Result2:
Failover doesn't work
Both WANs not reachable

Sidenote2:
Enabling gw switching afterwards doesn't fix this situation without a reboot. Re-enabling WAN makes the default gw available again, but in the UI it's marked as pending. Only a restart of apinger fixes this (just cosmetic, perhaps some syscall is missing here).


Test3:
No gateways in FW rules.
GWGROUP, WAN1 Tier1, WAN2 Tier2
GW switching active

Result3:
Failover works
Both WANs reachable

Sidenote3: This time re-enabling WAN1 didn't end with apinger status pending


Test4:
FW Rule LAN to ANY set to GWGROUP.
GWGROUP, WAN1 Tier1, WAN2 Tier2
GW switching active

Result4:
Same as Result3


Test5:
FW Rule LAN to ANY set to GWGROUP.
GWGROUP, WAN1 Tier1, WAN2 Tier2
GW switching NOT active

Result5:
Both WANs not reachable
Failover for LAN rule works
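
For easier comparison, the five tests above can be written down as data. The following is only a sketch: the `Scenario` class and its field names are made up for illustration, and the values are transcribed from the results reported in this comment (Result4 is taken as identical to Result3, and Result5 counts the working LAN rule failover as "failover works"). In this small sample, WAN-side reachability goes hand in hand with gateway switching being active.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One row of the test matrix (field names are illustrative only)."""
    name: str
    lan_rule_uses_group: bool  # FW rule LAN to ANY set to GWGROUP
    gateway_group: bool        # GWGROUP with WAN1 Tier1, WAN2 Tier2
    gw_switching: bool         # "Enable gateway switching" active
    failover_works: bool       # reported result
    wans_reachable: bool       # both WANs reachable (WebGUI / ping)


# Values transcribed from Test1..Test5 and Result1..Result5 above.
SCENARIOS = [
    Scenario("Test1", False, False, True, True, True),
    Scenario("Test2", False, False, False, False, False),
    Scenario("Test3", False, True, True, True, True),
    Scenario("Test4", True, True, True, True, True),
    Scenario("Test5", True, True, False, True, False),
]


def reachability_follows_switching(scenarios):
    """True if WAN reachability equals the gateway switching flag in every row."""
    return all(s.wans_reachable == s.gw_switching for s in scenarios)


if __name__ == "__main__":
    for s in SCENARIOS:
        print(f"{s.name}: switching={s.gw_switching} "
              f"failover={s.failover_works} reachable={s.wans_reachable}")
    print("reachability follows switching:",
          reachability_follows_switching(SCENARIOS))
```

Five data points are not proof, but they show why gateway switching matters for reaching the firewall itself via both WANs.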

I'm not sure if these results help in any way, but I hope they let gateway switching live a bit longer, since as an MSP we have the requirement to reach our managed firewalls via both/all WANs. I think this also relates to services like OpenVPN, to make them redundantly available.

Perhaps other people also want to contribute their experiences.

@fichtner fichtner self-assigned this Sep 13, 2017
@L1ghtn1ng

L1ghtn1ng commented Sep 13, 2017 via email

@mimugmail
Member Author

For the archive:
WAN static, set to default, everything works but gateway status "Pending"
Switched to WAN DHCP as default

Result: the default gateway route was lost. This is not reproducible when the status of both gateways is OK, so I will wait until it happens again. Restarting apinger doesn't help; /usr/local/etc/rc.reload_interfaces fixes it.
Will also try reload routing next time

@NOYB
Contributor

NOYB commented Dec 16, 2017

My config is:
Interfaces:
LAN static
WAN DHCP
WAN2 DHCP (OPT1)
Gateways:
WAN_DHCP (default), gtw ip
WAN2_DHCP, gtw dynamic

As mentioned in the forum, I can trigger loss of the default route on WAN with a release/renew on WAN2 (OPT1).

@mimugmail
Member Author

@NOYB have you tried the settings posted in the forums ( https://forum.opnsense.org/index.php?topic=6643.0 )?

@NOYB
Contributor

NOYB commented Dec 16, 2017

Not yet.

@AdSchellevis AdSchellevis modified the milestones: 18.1, 18.7 Dec 22, 2017
@AdSchellevis
Member

As just discussed with @mimugmail on IRC, there are some different issues with pppoe and multiwan setups. Because we don't have a pppoe setup here at our office it's difficult to track them down, and we really should refactor (at least some of) the ancient ppp interface code (one of the topics we skipped until now).

The plan is to work together in Q1 2018 and refactor the underlying code; @mimugmail can provide us with the needed equipment. Once it is decently structured, we can see if issues still remain and what they are exactly.
2018 sounds like a great year to fix these issues.

@fichtner I have assigned myself to do the work, but if you want it back, just let me know.

@AdSchellevis AdSchellevis added bug Production bug cleanup Low impact changes labels Dec 22, 2017
@fichtner
Member

@AdSchellevis sounds good to me, will be happy to offer review :)

One thing that seems to make PPPoE stumble on IPv6 is dyndns, we're tracking this via:

#1403 which is likely solved by 9f535ba

I'm about to merge this code into 18.1, it looks sane enough, reduces the risk of races and improves reload stability by deferring all plugin hooks... a second opinion from you is very much welcome.

@mimugmail
Member Author

Just for my reference to catch most constellations:

https://forum.opnsense.org/index.php?topic=6817.0
https://forum.opnsense.org/index.php?topic=6686.0

@NOYB
Contributor

NOYB commented Jan 12, 2018

For me, a DHCP renewal on the OPT1 interface causes the default route on the WAN interface to be removed. I can trigger it manually at will by doing a release/renew on the OPT1 interface.
https://forum.opnsense.org/index.php?topic=6643.msg28598#msg28598
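
A quick way to confirm the symptom after a release/renew is to look for a `default` entry in the routing table. The sketch below only parses text; it assumes the FreeBSD-style `netstat -rn -f inet` layout in which the default route is listed with the destination `default`. The sample tables, addresses and the `em0` interface name are made up for illustration and are not taken from any real system in this thread.

```python
def has_default_route(netstat_output: str) -> bool:
    """Return True if any routing table line has 'default' as destination."""
    for line in netstat_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "default":
            return True
    return False


# Hypothetical routing table with a default route present.
SAMPLE_OK = """\
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            192.0.2.1          UGS         em0
192.0.2.0/24       link#1             U           em0
"""

# Hypothetical routing table after the default route has been lost.
SAMPLE_LOST = """\
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
192.0.2.0/24       link#1             U           em0
"""

if __name__ == "__main__":
    print("before:", has_default_route(SAMPLE_OK))
    print("after: ", has_default_route(SAMPLE_LOST))
```

On a live box the same function could be fed the captured output of `netstat -rn -f inet` before and after the renew on OPT1.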

@mimugmail
Member Author

https://forum.opnsense.org/index.php?topic=7199.0

Not sure if it matches ... but when you follow the forums, dhcp and/or pppoe cause more and more problems. I'm back at work in March, perhaps we can start troubleshooting then? I have two machines connected via static and I can plug in pppoe directly or lte via dhcp.

@AdSchellevis
Member

@mimugmail let's try to do that, the coming weeks I'm quite busy, so ping me when you're back.

@fichtner fichtner removed this from the 18.7 milestone Feb 27, 2018
@fichtner
Member

fichtner commented Feb 27, 2018

Here's an interesting one... pfsense/pfsense@d35dfaaec

Tracking via #2164

@fichtner
Member

I've committed another related cleanup and removed the deprecation note for default gateway switching. Please test the development version on 18.1.3... it'll likely give us better behaviour and increased logging during switching routes.

fichtner added a commit that referenced this issue Feb 27, 2018
In order for default gateway switching to work we need to
call routing first, then set up gateways, lastly invoke
filter reload which currently chains the gateway switch code.

While here, remove deprecation notes.
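
The ordering described in this commit message can be sketched as follows. This is not the OPNsense code: the function name and the three callables are hypothetical stand-ins, and the sketch only illustrates the sequence "routing first, then gateways, lastly filter reload".

```python
from typing import Callable, List


def reload_for_gateway_switching(
    configure_routing: Callable[[], None],
    setup_gateways: Callable[[], None],
    reload_filter: Callable[[], None],
) -> List[str]:
    """Run the three steps in the order the commit message requires.

    All three callables are placeholders supplied by the caller; the
    returned list records the order in which they were executed.
    """
    executed = []
    steps = (
        ("routing", configure_routing),   # 1. call routing first
        ("gateways", setup_gateways),     # 2. then set up gateways
        ("filter", reload_filter),        # 3. lastly invoke filter reload
    )
    for label, step in steps:
        step()
        executed.append(label)
    return executed


if __name__ == "__main__":
    log = []
    order = reload_for_gateway_switching(
        lambda: log.append("routing configured"),
        lambda: log.append("gateways set up"),
        lambda: log.append("filter reloaded"),
    )
    print(order)
    print(log)
```

Keeping the order in one place makes it harder for a later change to call the filter reload before routing is in place.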
@fichtner
Member

fichtner commented Feb 27, 2018

there is also an additional patch here for default gateway switching 07785e2 (not yet on master)

fichtner added a commit that referenced this issue Feb 28, 2018
In order for default gateway switching to work we need to
call routing first, then set up gateways, lastly invoke
filter reload which currently chains the gateway switch code.

While here, remove deprecation notes.

(cherry picked from commit b30cbe1)
@mimugmail
Member Author

Seems @fichtner fixed the pppoe stuff today with cbad1bfe020 .. stable pppoe failovers 👍

Next week we'll face double DHCP setup like @NOYB runs it ..

@fichtner fichtner self-assigned this Mar 5, 2018
@mimugmail
Member Author

I'm closing this now since all kinds of combinations are working fine!
Attached is the XLS with my tests and setups.

Most important:
Firewall - Settings - Advanced: Under Gateway Monitoring, tick "Kill states" so that state killing is disabled. Tick Gateway Switching (it allows you to run local services, like an HA OpenVPN server).
At Multi WAN tick all three options
System - Gateways - Single: Choose the primary gateway as default, enable monitoring for ALL gateways, using EXTERNAL systems as monitor targets! Choose priorities for weighting in the advanced tab (1 or 2).
System - Gateways - Group: Set up the correct tiering.
Firewall - Rules: Set up your balancing rule
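
To illustrate what "correct tiering" means for failover, here is a minimal sketch of tier selection. It is an assumption-laden model, not OPNsense's implementation: the `Gateway` class and `active_gateways` function are invented for this example, and the rule modelled is simply "use the online gateways of the lowest-numbered tier that has any online member".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gateway:
    """A gateway group member (illustrative model only)."""
    name: str
    tier: int      # lower number = preferred
    online: bool   # as reported by gateway monitoring


def active_gateways(group: List[Gateway]) -> List[str]:
    """Return names of the online gateways in the best available tier."""
    online = [g for g in group if g.online]
    if not online:
        return []
    best_tier = min(g.tier for g in online)
    return [g.name for g in online if g.tier == best_tier]


if __name__ == "__main__":
    both_up = [Gateway("WAN1", 1, True), Gateway("WAN2", 2, True)]
    wan1_down = [Gateway("WAN1", 1, False), Gateway("WAN2", 2, True)]
    print(active_gateways(both_up))
    print(active_gateways(wan1_down))
```

With WAN1 on Tier1 and WAN2 on Tier2, as in the tests earlier in this issue, traffic in this model stays on WAN1 until monitoring marks it down. This is also why monitoring needs to be enabled for all gateways.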

I tested all possible combinations of DHCP, PPPOE and static WAN ... BUT .. only v4! Not sure how it works with v6 but I'm optimistic.

Thanks @fichtner and @AdSchellevis for your help :)
OPNsense_Multiwan.xlsx

5 participants