system: switch default gateway by optionally selected group #2279

Open
mimugmail opened this Issue Mar 21, 2018 · 61 comments

@mimugmail
Member

mimugmail commented Mar 21, 2018

Hi,

is it possible to set a metric in routes or better in gateway config?
The problem is that when you have more than two lines, you cannot set the order in which the remaining lines take over after the primary goes down (with gateway switching).

With Cisco you can set as many default gateways and use metrics to set the priority.

Is this also possible with FreeBSD / OPN?

@fichtner

Member

fichtner commented Mar 21, 2018

I think route metrics are not in FreeBSD #123

@mimugmail

Member Author

mimugmail commented Mar 21, 2018

Hm, OK, not used by recent kernels. In Gateway - Advanced you can set a priority; would it be possible to patch the script that adds routes so that it respects this setting?

@fichtner

Member

fichtner commented Apr 1, 2018

@mimugmail what exactly do you mean?

@fichtner fichtner added the support label Apr 1, 2018

@mimugmail

Member Author

mimugmail commented Apr 1, 2018

Under Advanced in the gateway settings you can set a priority. With three gateways you can number them 1 to 3, and when failover occurs, the script that changes the routes could (if possible) check whether the gateway that went down has a lower number than the others. If so, and another gateway with the next-lowest number is available, use that one as the next default gateway; if not, keep the current default gateway.
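
A minimal sketch of the selection described above, assuming each gateway has a numeric priority (lower number preferred) and an up/down flag. The `Gateway` class, the `pick_default` function and the gateway names are hypothetical and only illustrate the idea; this is not OPNsense code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gateway:
    name: str
    priority: int  # assumption: lower number = preferred
    up: bool


def pick_default(gateways: List[Gateway], current: Optional[str] = None) -> Optional[str]:
    """Return the online gateway with the lowest priority number.

    If no gateway is online, keep whatever is currently the default.
    """
    online = [gw for gw in gateways if gw.up]
    if not online:
        return current
    return min(online, key=lambda gw: gw.priority).name


lines = [
    Gateway("WAN1", 1, up=False),  # primary line is down
    Gateway("WAN2", 2, up=True),
    Gateway("WAN3", 3, up=True),
]
print(pick_default(lines, current="WAN1"))  # WAN2
```

With the primary down, the line numbered 2 is chosen before the line numbered 3, which is the ordering asked for here.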

@fichtner

Member

fichtner commented Apr 1, 2018

@mimugmail

Member Author

mimugmail commented Apr 2, 2018

Oh, weight, correct. But that doesn't matter, since in setups where weight is actively used you don't care which default gateway the system has. But when you have a 1G, a 100M and an LTE line, you don't want to see the firewall switch from 1G to LTE. Sure, the default gw is highest and there can only be one at a time. But it would be good to have a choice :)

In the long term (20.1, 20.7) I'd love to see the way Cisco does this with monitoring groups and tracked interfaces. 👍

@fichtner

Member

fichtner commented Apr 2, 2018

Okay, I think we’re on the same page then. 😊

@fichtner fichtner self-assigned this Apr 2, 2018

@fichtner fichtner added feature and removed support labels Apr 2, 2018

@fichtner fichtner modified the milestones: 18.1, 18.7 Apr 2, 2018

@fichtner fichtner changed the title from "set metrics/ad in routes" to "system: influence default gateway switching order by weight" Apr 2, 2018

@fichtner fichtner modified the milestones: 18.7, 19.1 Jun 23, 2018

@mimugmail

Member Author

mimugmail commented Jul 30, 2018

@AdSchellevis this was the topic we talked about on IRC recently :)

@AdSchellevis

Member

AdSchellevis commented Jul 30, 2018

@mimugmail ok, to summarize what we've discussed (if I remember correctly).

If @fichtner agrees, we could add a marker for "backup default" with a weight and use the weights to sort the gateways. This would allow us to prioritise default gateway switching. Ideally we would like to use policy-based routing for local traffic too, but that is more of a long-term solution. Agreed? (If so, I'm offering to do the work for this item.)

@fichtner

Member

fichtner commented Jul 30, 2018

yes, but I want to see how da4d25e works out on 18.7... it's preliminary work to exclude certain gateways from default gateway switching. it's not practical to mark a gateway down and should probably be a separate option, but if this separate option is what we talk about here with a priority setting that would be best.

@fichtner

Member

fichtner commented Jul 30, 2018

(it's correct to exclude down gateways from switching, I just mean it's impractical if you don't want it down and still not use it for default switching)

@mimugmail

Member Author

mimugmail commented Jul 30, 2018

Fully agree with this! :) (also with fichtner's comments a couple of seconds ago)

Just to have it here: The need for local PBR traffic would cover all transparent stuff like Squid, siproxd, ftp-proxy etc.

Local PBR can be a mid/long-term task :)

Thanks guys, much appreciated!

@fichtner fichtner removed their assignment Sep 13, 2018

@fichtner fichtner assigned fichtner and unassigned AdSchellevis Nov 19, 2018

@fichtner

Member

fichtner commented Nov 24, 2018

  • Add gateway group selection for IPv4 and IPv6
  • Move gateway switching back to System: Settings: General (not deprecated, not part of firewall)
  • Validate IPv4 / IPv6 on gateway groups (no model property!)
  • Possibly move fixup_default_gateways() to routing setup
  • Pass down gateways in order for fixup_default_gateways() to use them only or any available one
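
One possible reading of the last bullet, sketched under assumptions: the caller passes an ordered list of group members, and the selection either restricts itself to that list or falls back to any gateway that is up. The function name `choose_gateway`, its parameters and the gateway names are hypothetical and do not reflect the real signature of fixup_default_gateways().

```python
from typing import Dict, List, Optional


def choose_gateway(
    status: Dict[str, bool],
    preferred: List[str],
    only_preferred: bool = False,
) -> Optional[str]:
    """Pick a default gateway from `preferred`, optionally falling back."""
    # Try the passed-down gateways in the order given.
    for name in preferred:
        if status.get(name, False):
            return name
    if only_preferred:
        return None
    # Otherwise accept any gateway that is up (dict insertion order).
    for name, up in status.items():
        if up:
            return name
    return None


status = {"GW_A": False, "GW_B": False, "GW_C": True}
print(choose_gateway(status, ["GW_A", "GW_B"]))                       # GW_C
print(choose_gateway(status, ["GW_A", "GW_B"], only_preferred=True))  # None
```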
@mimugmail

Member Author

mimugmail commented Nov 24, 2018

If you need root access to some Multi WAN systems, ping me <3

@fichtner

Member

fichtner commented Nov 24, 2018

@mimugmail I'll let you know when to test. Still some things to figure out.

AdSchellevis added a commit that referenced this issue Apr 14, 2019

Routing, gateways. minor regression in getDefaultGW() we should only return a gateway with an address as default here. for #2279
@AdSchellevis

Member

AdSchellevis commented Apr 14, 2019

@fichtner can you try this 7a8b12f ? I might have been a bit too enthusiastic here by adding tunnel endpoints automatically as possible gateways. The old code only included the ones which wrote a /tmp/XX_router file. The other possible suspect is bfca97e, which accidentally returned a gateway without an address.

@mimugmail

Member Author

mimugmail commented Apr 15, 2019

When manually disabling gw with prio 1 it's changing correctly, after enabling it seems to recognize the correct gw (188) but sets 127.0.0.1 as default:

Apr 15 07:58:30 OPNsense opnsense-devel: /system_gateways.php: ROUTING: entering configure using defaults
Apr 15 07:58:30 OPNsense opnsense-devel: /system_gateways.php: ROUTING: IPv4 default gateway set to wan
Apr 15 07:58:30 OPNsense opnsense-devel: /system_gateways.php: ROUTING: setting IPv4 default route to 81.24.66.188
Apr 15 07:58:30 OPNsense opnsense-devel: /system_gateways.php: ROUTING: removing /tmp/em0_defaultgw
Apr 15 07:58:30 OPNsense opnsense-devel: /system_gateways.php: ROUTING: creating /tmp/em0_defaultgw using '81.24.66.188'
Apr 15 07:58:30 OPNsense opnsense-devel: /system_gateways.php: ROUTING: IPv6 default gateway set to loopback
Apr 15 07:58:30 OPNsense opnsense-devel: /system_gateways.php: ROUTING: setting IPv6 default route to ::1
Apr 15 07:58:30 OPNsense opnsense-devel: /system_gateways.php: ROUTING: keeping current default gateway '::1'
Apr 15 07:58:30 OPNsense opnsense-devel: /usr/local/etc/rc.filter_configure: ROUTING: removing /tmp/em0_defaultgw
Apr 15 07:58:30 OPNsense opnsense-devel: /usr/local/etc/rc.filter_configure: ROUTING: creating /tmp/lo0_defaultgw using '127.0.0.1'
Apr 15 07:58:30 OPNsense opnsense-devel: /usr/local/etc/rc.filter_configure: ROUTING: keeping current default gateway '::1'

@AdSchellevis

Member

AdSchellevis commented Apr 15, 2019

@mimugmail ok, I'll have to check where /tmp/lo0_defaultgw comes from in this case.

@AdSchellevis

Member

AdSchellevis commented Apr 15, 2019

ok, it seems that I missed a race condition, this is all just too stupid. When a file exists with the name /tmp/[intf]_defaultgw[v6], it's considered "default" (as it was before), but the file itself is written when setting a default route...

@file_put_contents("/tmp/{$realif}_defaultgw", $gateway);
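
To illustrate how I read this problem (an interpretation, not the actual code): if the existence of the marker file makes a candidate "default", and setting a default route writes that same file, then whatever was set last keeps winning regardless of priority. The helper names, the tuple layout and the priority value 255 for loopback are assumptions for this sketch.

```python
import os
import tempfile


def marker(tmpdir, ifname):
    return os.path.join(tmpdir, f"{ifname}_defaultgw")


def set_default_route(tmpdir, ifname, gateway):
    # Mirrors the file_put_contents() call quoted above: setting the
    # route also writes the marker file.
    with open(marker(tmpdir, ifname), "w") as fh:
        fh.write(gateway)


def pick(tmpdir, candidates):
    # candidates: (ifname, gateway, priority) tuples, lower priority wins.
    # An existing marker file sorts a candidate first (False < True).
    return min(
        candidates,
        key=lambda c: (not os.path.exists(marker(tmpdir, c[0])), c[2]),
    )


candidates = [("em0", "81.24.66.188", 1), ("lo0", "127.0.0.1", 255)]

with tempfile.TemporaryDirectory() as tmpdir:
    before = pick(tmpdir, candidates)[0]
    # Loopback is set as default once, for whatever reason...
    set_default_route(tmpdir, "lo0", "127.0.0.1")
    # ...and from then on its marker file outranks the real priority.
    after = pick(tmpdir, candidates)[0]

print(before, after)  # em0 lo0
```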

AdSchellevis added a commit that referenced this issue Apr 15, 2019

Routing, gateways. The `/tmp/*_defaultgw` construction has a race condition the way it is implemented now. for #2279

It is used by dhcp client to detect if a default route might be overwritten and it determines default gateway priority. Since I don't want to refactor the dhclient-script at the moment, we best keep the file, but remove the "default" detection.
So system_default_route() sets the file, which dhclient can pick up when a new gateway is propagated.
@AdSchellevis

Member

AdSchellevis commented Apr 15, 2019

@mimugmail @fichtner I'm not exactly sure yet why the gateway switched to localhost (other than the fact that localhost is in our list now, but with a very low priority), I'll have to do some more testing over here as well. The interaction with dhclient looks a bit fragile, so I'd rather not touch that one if possible.

In dd8d344 I've deleted the _defaultgw as prioritisation, which looks more logical, but I can't find any consumers for the _defaultgwv6 file. any ideas where that is/should be used?

AdSchellevis added a commit that referenced this issue Apr 15, 2019

Routing, gateways. don't consider lo0 as a default gateway candidate. Since it doesn't make much sense to send all traffic to localhost, we better exclude it to keep the previous behaviour. for #2279
@AdSchellevis

Member

AdSchellevis commented Apr 15, 2019

For reference, the _defaultgw originates from quite some time ago:

Add "new status file" in pfsense/pfsense@924f202 and then at some point in time, create a loop to read the same back to determine "default" pfsense/pfsense@999111c

Then IPv6 came in with pfsense/pfsense@5a5413b, which copied the same _defaultgw to _defaultgwv6, which, as far as I can find, is only written and not actually used the way dhclient uses _defaultgw when renewing an address.

AdSchellevis added a commit that referenced this issue Apr 15, 2019

Routing, gateways. When gathering gateway status from dpinger, don't consider dpinger endpoints down if not yet available. This could lead to some unexpected gateway switches. for #2279

We might consider another status for "startup", although since we report loss and delay as "~" it should already be obvious that we don't know the status yet.
@AdSchellevis

Member

AdSchellevis commented Apr 15, 2019

@mimugmail this 497f523 might be your issue, although I did commit some other small fixes as well.

While testing, I noticed my gateway overview reported a gateway down right after apply, where it actually didn't know the status yet. If a filter reload would be started at the same time (which your logs seem to suggest), it would mean that this gateway would not be considered anymore.
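
The behaviour described here can be sketched as follows, assuming a status map keyed by gateway name in which a missing entry means "no report yet". The function name, the status strings and the map layout are hypothetical; they are not dpinger's or OPNsense's actual data format.

```python
from typing import Dict, List


def usable_gateways(configured: List[str], reported: Dict[str, str]) -> List[str]:
    """Keep every gateway that is not explicitly reported as down.

    A gateway without a report yet (e.g. right after apply) is treated
    as unknown and therefore stays a candidate.
    """
    return [name for name in configured if reported.get(name) != "down"]


configured = ["GW_WAN", "NOCDialin", "GW_LTE"]
reported = {"GW_WAN": "none", "GW_LTE": "down"}  # NOCDialin: no report yet
print(usable_gateways(configured, reported))  # ['GW_WAN', 'NOCDialin']
```

The design choice is to drop a candidate only on an explicit "down", so a filter reload that runs before monitoring has produced data does not trigger a switch.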

@mimugmail

Member Author

mimugmail commented Apr 15, 2019

I'll fetch latest master tonight on a fresh machine. Seems my test machine has 2 gateways with same name in config.xml

@AdSchellevis

Member

AdSchellevis commented Apr 15, 2019

thanks!

@hippi-viking

hippi-viking commented Apr 15, 2019

Thanks for all the hard work on this guys!
To chime in I have some problems with the current (19.1.6) gateway-switching logic which might be relevant for the reworked code as well.

My setup has a (tier 1) cable modem connection considered to work most of the time, when it doesn't I plug in my mobile (tier 2) to use them as a failover group. (The mobile's interface is locked to prevent removal, just marked as down and the gateway disabled when the cable modem works normally and the mobile is not connected.)

The problem is that when the cable modem is offline it still reacts to the DHCP request from the OPNsense box, which in turn accepts it as the default route [edit: was gateway] without any further checks.
Steps to reproduce the issue:

  • tier 1 cable modem connection offline, tier 2 mobile connection online (dpinger thinks this as well), default gw is the mobile
  • DHCPREQUEST to cable modem (192.168.100.1)
  • DHCPACK from cable modem (192.168.100.10)
  • default route changed to 192.168.100.1
  • uplink connection is lost

The bad thing is that this happens every few minutes, even if I forcefully mark the cable modem's gateway as down. Would it be possible to insert a check in the logic to determine if the would-be default gateway is online at all? (Dpinger should already know about this.)

Thanks very much!
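
A rough sketch of the check requested above, assuming the monitoring status is available as a map keyed by gateway address. The function name `should_replace_default`, the status strings and the mobile gateway address 192.168.43.1 are made up for illustration; the real dhclient-script integration is not shown.

```python
from typing import Dict


def should_replace_default(
    offered_router: str,
    current_default: str,
    monitor_status: Dict[str, str],
) -> bool:
    """Accept a DHCP-offered router as default only if it is known to be up."""
    if offered_router == current_default:
        return False  # nothing to change
    return monitor_status.get(offered_router) == "up"


# Cable modem answers DHCP but has no uplink; the mobile line is the default.
monitor_status = {"192.168.100.1": "down", "192.168.43.1": "up"}
print(should_replace_default("192.168.100.1", "192.168.43.1", monitor_status))  # False
```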

AdSchellevis added a commit that referenced this issue Apr 15, 2019

Routing, gateway_groups, don't hide gateways on edit, which keeps presentation on new/edit equal. Previously you could have a group, containing an item that didn't exist anymore (interface removed), in which case you needed to remove the group to be able to edit it. related to #2279

AdSchellevis added a commit that referenced this issue Apr 15, 2019

Routing, gateways - groups. regression in #2279 , since "interface" contains the configured value now, we should use "if".

AdSchellevis added a commit that referenced this issue Apr 15, 2019

routing, gateways. In gateway groups you could originally select a vip, which isn't used in our system. originally this came from pfsense/pfsense@ab1112d

Let's remove it while working on #2279

AdSchellevis added a commit that referenced this issue Apr 16, 2019

AdSchellevis added a commit that referenced this issue Apr 16, 2019

@AdSchellevis

Member

AdSchellevis commented Apr 16, 2019

@fichtner we could probably move gwlb.inc now, I have gathered the last todo's in #3423 about the two functions that might be used in older plugins, not sure when we can kill those.

AdSchellevis added a commit that referenced this issue Apr 16, 2019

Routing, gateways. Technically we could add tunnel gateways automatically, but since you can easily add them manually, we better start without these and only add the ones found in the /tmp/XX_router[XX] files. for #2279
@mimugmail

Member Author

mimugmail commented Apr 16, 2019

Fetched latest master, only had one gateway with Prio1:

    <gateway_item>
      <interface>wan</interface>
      <gateway>81.24.66.129</gateway>
      <name>GW_WAN</name>
      <priority>1</priority>
      <weight>1</weight>
      <ipprotocol>inet</ipprotocol>
      <interval>1</interval>
      <descr>Interface WAN Gateway</descr>
    </gateway_item>

Then I added a second one with Prio2 and now it's active and new default:

  <gateways>
    <gateway_item>
      <interface>wan</interface>
      <gateway>81.24.66.129</gateway>
      <name>GW_WAN</name>
      <priority>1</priority>
      <weight>1</weight>
      <ipprotocol>inet</ipprotocol>
      <interval>1</interval>
      <descr>Interface WAN Gateway</descr>
    </gateway_item>
    <gateway_item>
      <interface>wan</interface>
      <gateway>81.24.66.188</gateway>
      <name>NOCDialin</name>
      <priority>2</priority>
      <weight>1</weight>
      <ipprotocol>inet</ipprotocol>
      <interval/>
      <descr/>
      <defaultgw>1</defaultgw>
    </gateway_item>
  </gateways>

Apr 16 23:06:37 OPNsense opnsense-devel: /system_gateways.php: Successful login for user 'root' from: 81.24.66.132
Apr 16 23:07:22 OPNsense opnsense-devel: /system_gateways.php: ROUTING: entering configure using defaults
Apr 16 23:07:22 OPNsense opnsense-devel: /system_gateways.php: ROUTING: IPv4 default gateway set to wan
Apr 16 23:07:22 OPNsense opnsense-devel: /system_gateways.php: ROUTING: setting IPv4 default route to 81.24.66.188
Apr 16 23:07:22 OPNsense opnsense-devel: /system_gateways.php: ROUTING: removing /tmp/em0_defaultgw
Apr 16 23:07:22 OPNsense opnsense-devel: /system_gateways.php: ROUTING: creating /tmp/em0_defaultgw using '81.24.66.188'
Apr 16 23:07:22 OPNsense opnsense-devel: /usr/local/etc/rc.filter_configure: ROUTING: keeping current default gateway '81.24.66.188'
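
For anyone comparing the config with the log, a small sketch that parses a trimmed copy of the `<gateways>` section above and lists the entries by priority, together with the `<defaultgw>` flag. It only shows which entry sorts first and which one carries the flag; whether that flag is what made 81.24.66.188 the default is not established in this thread.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the config above (only the fields used here).
CONFIG = """
<gateways>
  <gateway_item>
    <gateway>81.24.66.129</gateway>
    <name>GW_WAN</name>
    <priority>1</priority>
  </gateway_item>
  <gateway_item>
    <gateway>81.24.66.188</gateway>
    <name>NOCDialin</name>
    <priority>2</priority>
    <defaultgw>1</defaultgw>
  </gateway_item>
</gateways>
"""

root = ET.fromstring(CONFIG)
items = [
    {
        "name": item.findtext("name"),
        "gateway": item.findtext("gateway"),
        "priority": int(item.findtext("priority")),
        # findtext() returns None when the element is absent
        "defaultgw": item.findtext("defaultgw") == "1",
    }
    for item in root.findall("gateway_item")
]

by_priority = sorted(items, key=lambda g: g["priority"])
for g in by_priority:
    print(g["name"], g["gateway"], g["priority"], g["defaultgw"])
```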

@fichtner

Member

fichtner commented Apr 17, 2019

@AdSchellevis thanks, it works out of the box now, but I have a few questions later today about the code and backwards-compat

AdSchellevis added a commit that referenced this issue Apr 17, 2019

Routing, gateways for #2279 align automatic gateways to legacy behaviour. Since gif/gre interfaces already write _router files, we should only add openvpn client gateways to mimic the way it was before. Also skip disabled interfaces.