mwan3: No interface recovers from offline if all become offline #3885

Closed
joaochainho opened this issue Jan 19, 2017 · 18 comments

@joaochainho

Hi,
I noticed that no interface recovers from offline if all interfaces become offline.
The test scenario is the following: two interfaces (wan and wwan), and the default policy uses wan as primary and wwan as backup.
If I manually run 'ifup wan' and 'ifup wwan', both interfaces come back online.
I have noticed this issue for some time. I tested the latest mwan3 version (2.0-3) on OpenWrt and LEDE (ar71xx).
Meanwhile, I found that in this state the router sends ARP requests for the public IP addresses defined as track_ip:

root@Router1:~# tcpdump -qni eth1 arp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
11:53:49.551576 ARP, Request who-has 208.67.222.222 tell 192.168.100.245, length 28
11:53:50.548220 ARP, Request who-has 0.0.0.0 tell 0.0.0.0, length 28
11:53:51.548217 ARP, Request who-has 208.67.222.222 tell 192.168.100.245, length 28
11:53:56.579945 ARP, Request who-has 0.0.0.0 tell 0.0.0.0, length 28
11:53:57.578211 ARP, Request who-has 208.67.222.222 tell 192.168.100.245, length 28
11:53:58.578210 ARP, Request who-has 208.67.222.222 tell 192.168.100.245, length 28
11:54:03.616117 ARP, Request who-has 208.67.222.222 tell 192.168.100.245, length 28
11:54:04.608211 ARP, Request who-has 208.67.222.222 tell 192.168.100.245, length 28

My config:

config interface 'wan'
	option enabled '1'
	list track_ip '8.8.4.4'
	list track_ip '208.67.222.222'
	option reliability '2'
	option count '1'
	option timeout '2'
	option interval '5'
	option down '2'
	option up '5'

config interface 'wwan'
	option enabled '1'
	list track_ip '8.8.8.8'
	list track_ip '208.67.220.220'
	option reliability '1'
	option count '1'
	option timeout '3'
	option interval '5'
	option down '3'
	option up '8'

config member 'wan_m1_w3'
	option interface 'wan'
	option metric '1'
	option weight '3'

config member 'wan_m2_w3'
	option interface 'wan'
	option metric '2'
	option weight '3'

config member 'wwan_m1_w2'
	option interface 'wwan'
	option metric '1'
	option weight '2'

config member 'wwan_m2_w2'
	option interface 'wwan'
	option metric '2'
	option weight '2'

config policy 'wan_only'
	list use_member 'wan_m1_w3'

config policy 'wwan_only'
	list use_member 'wwan_m1_w2'

config policy 'balanced'
	list use_member 'wan_m1_w3'
	list use_member 'wwan_m1_w2'

config policy 'wan_wwan'
	list use_member 'wan_m1_w3'
	list use_member 'wwan_m2_w2'

config policy 'wwan_wan'
	list use_member 'wan_m2_w3'
	list use_member 'wwan_m1_w2'

config rule 'default_rule'
	option dest_ip '0.0.0.0/0'
	option use_policy 'wan_wwan'

I'm available to provide more info and do further testing if needed.

TIA

@joaochainho
Author

Version 1.6-3 doesn't have this issue.

@feckert
Member

feckert commented Mar 18, 2017

You have to change the last_resort option to default.
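For illustration only, here is a sketch of how this could look in the reporter's /etc/config/mwan3. It assumes the policy-level last_resort option that appears in feckert's own config later in the thread. The rule keeps use_policy 'wan_wwan', so failover between the members still applies. Only the fallback for the case where no member is online changes:

```
config policy 'wan_wwan'
	list use_member 'wan_m1_w3'
	list use_member 'wwan_m2_w2'
	# when no member is online, fall back to the main (default) routing table
	option last_resort 'default'

config rule 'default_rule'
	option dest_ip '0.0.0.0/0'
	option use_policy 'wan_wwan'
```

The change takes effect after mwan3 is restarted (for example with mwan3 restart).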

@joaochainho
Author

Hi @feckert, thanks for your feedback.
Do you mean to use the default routing table as the policy?

config rule 'default_rule'
	option dest_ip '0.0.0.0/0'
	option use_policy 'default'

@joaochainho
Author

Hi @feckert, I tested with the default routing table as the policy.
Indeed the primary (wan) interface recovers automatically, but traffic isn't routed through the secondary (wwan) interface unless wan is really down (cable unplugged and no default route on that interface).
Am I missing something?
TIA

@feckert
Member

feckert commented Mar 23, 2017

@joaochainho, I have the same scenario on my router (wan as main and wwan as backup). If mwan3track notices that interface wan is down, traffic will be routed to the wwan interface. And if wwan also goes down, then (if use_policy default is set) mwan3track will recover the interface because it will use the default routing table. Have you set different metrics for each interface (wan / wwan) in the network config?
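For reference, here is a sketch of what different per-interface metrics look like in /etc/config/network. It assumes the values 10 (wan) and 20 (wwan) and the protocols the reporter mentions later in the thread (DHCP on eth0, a 3G USB modem). The modem-specific 3G options (device, APN) are deliberately left out, so this is an excerpt rather than a complete config:

```
# /etc/config/network (excerpt)
config interface 'wan'
	option ifname 'eth0'
	option proto 'dhcp'
	# lower metric: this default route is preferred
	option metric '10'

config interface 'wwan'
	option proto '3g'
	# device/APN options for the 3G modem omitted
	option metric '20'
```

With these values both default routes can be installed at the same time, and the kernel prefers the one with the lower metric. This matches the ip route show output the reporter posts later (metric 10 via eth0, metric 20 via 3g-wwan).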

@joaochainho
Author

joaochainho commented Mar 24, 2017

> Have you set different metrics for each interface (wan / wwan) in the network config?

Yes, 10 for wan and 20 for wwan.
I installed and configured everything from scratch (LEDE r3844-c5e245a), and it's still not working for me.
Here's what I get from ping when both interfaces are down and wan should supposedly be online again (wan_wwan as the default policy):

root@LEDE:~# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
ping: sendto: Network unreachable

I'm trying to figure out what's different between MWAN3 v2.x and 1.6, but I have no clue yet.

@feckert
Member

feckert commented Mar 25, 2017

@joaochainho I don't know the version LEDE r3844-c5e245a.
Is it the latest LEDE stable 17.01 or master?
If it is master, I fixed an issue (#4158) that may be your problem.

Which proto do you use on the wan interface, dhcp or static?
And could you post the output of ip route show, so we can see the default route if one is set?

I think that if both interfaces are down, no default route is set and the router cannot route the ICMP packets (wan: DHCP, cable unplugged; wwan: DHCP lease removed and not renewed due to the connectivity loss). But if one of the WANs comes online again (wan cable plugged in / wwan DHCP lease renewed), the ICMP packets can be routed again (the default route is set by netifd on the ifup event), and after successful pings mwan3track declares this WAN online again.

@joaochainho
Author

Hi @feckert, thanks for your feedback and sorry for replying so late.
Regarding your questions,

> Is it the latest LEDE stable 17.01 or the master.
> If on master I fixed a issue #4158 maybe this could be your problem.

It's master commit c5e245a, and the fix for #4158 is already included.

> What interface proto do you have on the wan interface dhcp/static?

wan (eth0) uses DHCP. wwan (USB modem) uses the 3G protocol.

> Could you add your output of ip route show to get the default route if one is set?

wan online, wwan online

~# mwan3 interfaces
Interface status:
 interface wan is online and tracking is active
 interface wwan is online and tracking is active

~# ip route show
default via 192.168.100.254 dev eth0  proto static  src 192.168.100.205  metric 10 
default via 10.64.64.64 dev 3g-wwan  proto static  metric 20 
10.64.64.64 dev 3g-wwan  proto kernel  scope link  src 10.16.194.169 
192.168.90.0/24 dev br-lan  proto kernel  scope link  src 192.168.90.254 linkdown 
192.168.100.0/24 dev eth0  proto static  scope link  metric 10 
192.168.100.254 dev eth0  proto static  scope link  src 192.168.100.205  metric 10 

wan offline, wwan offline

  • The Ethernet cable between the router's WAN port and a switch stays connected.
  • The cable between the switch and the ISP router was disconnected until wan went offline, and reconnected afterwards.
  • The USB modem was unplugged.

~# mwan3 interfaces
Interface status:
 interface wan is offline and tracking is active
 interface wwan is unknown and tracking is active

~# ip route show
default via 192.168.100.254 dev eth0  proto static  src 192.168.100.205  metric 10 
192.168.90.0/24 dev br-lan  proto kernel  scope link  src 192.168.90.254 
192.168.100.0/24 dev eth0  proto static  scope link  metric 10 
192.168.100.254 dev eth0  proto static  scope link  src 192.168.100.205  metric 10 

My suspicion is that when all interfaces are offline, the specific MWAN rules/routes (based on the configured metrics/weights) are deleted. Because the Ethernet cable on the wan port is never unplugged, the physical link state never changes and there are no ifdown/ifup events, so the MWAN rules are never reloaded. Does this make sense?
Interestingly, this doesn't happen with MWAN 1.5x and 1.6x versions.

@feckert
Member

feckert commented Apr 3, 2017

> My suspicion is that when all interfaces are offline, the specific MWAN rules/routes (based on the configured metrics/weights) are deleted. Because the ethernet cable on the wan port is never unplugged, the physical link state never changes and there are no ifdown/ifup events. So the MWAN rules are never reloaded again. Does this make sense?

@joaochainho Yes, the rules/routes are deleted, but mwan3track is still running on interfaces wwan/wan:

  • If the cable is plugged in again (tested on my setup), mwan3track recognizes the interface as up again after the reliability check, and I am able to surf over the wan interface.

  • If I enable wwan (plug in the USB wwan modem), mwan3track also recognizes the wwan interface as online again after the reliability check.

I have attached my mwan3 config:

config policy 'wan_only'
	list use_member 'wan_m1_w1'

config policy 'xdsl_only'
	list use_member 'xdsl_m2_w1'

config policy 'wwan_only'
	list use_member 'wwan_m3_w1'

config member 'wan_m1_w1'
	option interface 'wan'
	option metric '1'
	option weight '1'

config member 'xdsl_m2_w1'
	option interface 'xdsl'
	option metric '2'
	option weight '1'

config rule 'default_rule'
	option dest_ip '0.0.0.0/0'
	option proto 'all'
	option sticky '0'
	option use_policy 'wan_xdsl_wwan'

config member 'wwan_m3_w1'
	option interface 'wwan'
	option metric '3'
	option weight '1'

config policy 'wan_xdsl_wwan'
	list use_member 'wan_m1_w1'
	list use_member 'xdsl_m2_w1'
	list use_member 'wwan_m3_w1'
	option last_resort 'default'

config interface 'wan'
	option enabled '1'
	list track_ip '8.8.8.8'
	list track_ip '8.8.4.4'
	option count '1'
	option timeout '2'
	option interval '60'
	option failure '10'
	option recovery '10'
	option down '3'
	option reliability '1'
	option up '3'
	option family 'ipv4'
	option flush_conntrack 'always'

config interface 'xdsl'
	option enabled '1'
	list track_ip '8.8.8.8'
	list track_ip '8.8.4.4'
	option reliability '1'
	option count '1'
	option timeout '2'
	option interval '60'
	option failure '10'
	option recovery '10'
	option down '3'
	option family 'ipv4'
	option up '3'
	option flush_conntrack 'always'

config interface 'wwan'
	option enabled '1'
	list track_ip '8.8.8.8'
	list track_ip '8.8.4.4'
	option reliability '1'
	option count '1'
	option timeout '5'
	option interval '60'
	option failure '10'
	option recovery '10'
	option down '3'
	option up '3'
	option family 'ipv4'
	option flush_conntrack 'always'

I have a simple backup scenario:

  1. wan -> all traffic goes over wan if online -> if it goes offline, then surf over xdsl
  2. xdsl -> all traffic goes over xdsl if online -> if it goes offline, then surf over wwan
  3. wwan -> all traffic goes over wwan if online; this is the last WAN interface -> if this last interface goes offline as well, then I have a problem ;-)

If a higher-priority interface comes online again during backup, traffic switches back to it (e.g. wwan -> wan).

@tpham3783

tpham3783 commented Apr 3, 2017 via email

@joaochainho
Author

Hi @tpham3783, your patch solved my issue! 😄

@joaochainho
Author

Hi @feckert, I only now noticed the new last_resort option.
I'll try last_resort = default then.

@joaochainho
Author

Hi @feckert, using last_resort = default also solves the problem 👍
However, during the tests I stumbled on another issue: mwan metrics don't seem to apply to traffic originating from the router itself. The interfaces' default metrics seem to apply instead.
Is this behaviour expected?

@feckert
Member

feckert commented Apr 5, 2017

@joaochainho If last_resort is not set to default and no interface is up, the lookup never falls through to the default table and the packet is dropped. An improvement, as suggested by @tpham3783, would be to add the ping targets to an ipset and not mangle those packets. The ipset should only contain the IP addresses of the targets, per interface.

> mwan metrics doesn't seem to apply to the traffic originated from the router itself.

See:
https://wiki.openwrt.org/doc/howto/mwan3
Section:
The routable loopback (self)

@joaochainho
Author

Hi @feckert, thanks for your feedback, and sorry for missing the wiki info 😄

@tpham3783

tpham3783 commented Apr 5, 2017 via email

@feckert
Member

feckert commented Apr 7, 2017

@joaochainho I think we can close this issue. I will try to implement a feature so that packets to the track_ips are not mangled in the OUTPUT chain.

@joaochainho
Author

> I think we could close this issue. I will try to implement a feature that the track_ips will not be mangled on the OUTPUT CHAIN.

Hi @feckert I agree, it can be closed. Thanks for your help and effort.

@hnyman closed this as completed on Apr 7, 2017