mwan3: packets are not mangled on mwan3 restart. #13277

Closed
misanthropos opened this issue Sep 2, 2020 · 38 comments

@misanthropos

misanthropos commented Sep 2, 2020

Maintainer: @feckert
Environment: (ZyXEL NBG6817, OpenWrt SNAPSHOT r14365-46abcb3ade / LuCI Master git-20.245.35471-b32eb12)

mwan3 - 2.9.0-1

Description:
I am not sure whether this is actually mwan3 related or a problem with iptables-save/restore, but mwan3 won't route after the router boots and/or after mwan3 is restarted.
It does, however, start working if I change a firewall rule afterwards.

From the logs:

Wed Sep  2 14:23:15 2020 user.notice mwan3-hotplug[18392]: Execute ifup event on interface VPN3 (tun3)
Wed Sep  2 14:23:15 2020 user.notice mwan3-hotplug[18326]: Started tracker [18527] on interface VPN0 (tun0)
Wed Sep  2 14:23:15 2020 user.notice mwan3-hotplug[18340]: Started tracker [18545] on interface VPN1 (tun1)
Wed Sep  2 14:23:15 2020 user.notice mwan3-hotplug[18363]: Started tracker [18591] on interface VPN2 (tun2)
Wed Sep  2 14:23:15 2020 user.notice mwan3-hotplug[18392]: Started tracker [18611] on interface VPN3 (tun3)
Wed Sep  2 14:23:16 2020 user.err mwan3[18119]: set_user_rules: ip6tables-restore v1.8.4 (legacy): host/network `172.20.1.22' not found Error occurred at line: 3 Try `ip6tables-restore -h' or 'ip6tables-restore --help' for more information.

another example:

Wed Sep  2 14:22:16 2020 user.notice mwan3-hotplug[17508]: Started tracker [17728] on interface VPN3 (tun3)
Wed Sep  2 14:22:17 2020 user.err mwan3[17241]: set_user_rules: ip6tables-restore v1.8.4 (legacy): The protocol family of set vpnbypass is IPv4, which is not applicable.  Error occurred at line: 3 Try `ip6tables-restore -h' or 'ip6tables-restore --help' for more information.

Changing the check

"command -v /usr/sbin/ip6tables >/dev/null"

in /lib/mwan3/mwan3.sh so that it looks for a command that does not exist (e.g. ip7tables) makes it work for me.

@feckert
Member

feckert commented Sep 2, 2020

The check you describe looks different in the official source?

command -v ip6tables > /dev/null

@feckert self-assigned this on Sep 2, 2020
@misanthropos
Author

My bad, I did not copy and paste it but wrote it from memory.
The symptom, however, is the same. If I effectively disable the ip6tables check, it will work (because no ip6tables commands will be issued). I am not sure, however, why ip6tables-restore wants to handle IPv4 addresses.

@aaronjg
Contributor

aaronjg commented Sep 2, 2020

You need to mark the rules as family "ipv4" rather than the default "any". A more graceful way of handling these errors is coming here:

fc50267

@misanthropos
Author

@aaronjg I tried your patch, but it has other issues so I stuck to my crude "solution" for now.

examples

rmdir: '': No such file or directory
sh: -gt: argument expected
/usr/sbin/mwan3: line 213: rule: not found

@misanthropos
Author

Short update: "disabling" IPv6 just prevents the errors from ip6tables-restore; the rules still won't apply. Routing (marking etc.) only works after the firewall is triggered again.

@aaronjg
Contributor

aaronjg commented Sep 2, 2020

That commit is part of a larger patch set. Possible that something that was needed in that patch was included in an earlier or later patch. Still a WIP, I will look to try to better separate these patches.

Setting family to ipv4 on the rules should also fix the issue.

@misanthropos
Author

misanthropos commented Sep 2, 2020

There is the catch: setting what to ipv4, exactly? All 4 VPN interfaces are set to ipv4 and have been that way for quite some time now. The same goes for the WAN interface. And I cannot find any other place in mwan3 to set the family.

OK, I added the ipv4 family to the rules with an editor. No more errors, BUT:

now the rules are not working at all, meaning every packet leaves via the default route.

@aaronjg
Contributor

aaronjg commented Sep 2, 2020

You can set the option in the rule section. It may not be in Luci yet though.

Anyway, if you are not using ipv6 at all, the warning is harmless. It was always happening, but it used to be that all iptables errors were redirected to /dev/null, which as you can imagine made debugging rules quite difficult.

@jamesmacwhite
Contributor

jamesmacwhite commented Sep 3, 2020

The commit is here if you are interested. As @aaronjg mentioned, iptables/ip6tables output used to be sent to /dev/null, so this will have been happening for a while; mwan3 simply masked the errors until this changed around version 2.8.9.

702a104#diff-ccaf6dffccf2fc134d63b5c33d5ee8e1

Sounds like the same issue here: #13003. LuCI is currently missing the family option when configuring rules via luci-app-mwan3. There is an open PR currently to fix this: openwrt/luci#4349 and also add in some dependency logic when using certain rule options.

I updated the documentation recently to add a note about this issue here: https://openwrt.org/docs/guide-user/network/wan/multiwan/mwan3#rule_configuration.

In the meantime, any rule using IPv4-specific src/dest values should look like the example below. You will need to edit /etc/config/mwan3 outside of LuCI to prevent the error:

config rule 'my_ipv4_rule'
      option src_ip '172.20.1.22'
      option family 'ipv4'
      option use_policy 'something'

Really, it's all about option family 'ipv4'. That's it.
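For anyone who prefers the shell to an editor, here is a rough sketch of the same change via uci. The helper names `guess_family` and `print_family_cmd` are hypothetical (they are not part of mwan3), and the section name `my_ipv4_rule` is just the one from the example above. The script only prints the uci command; the commit and restart steps are left as comments because they only make sense on the router itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of mwan3): guess the address family of a
# src_ip/dest_ip value. Anything containing a colon is treated as IPv6,
# anything containing a dot as IPv4; everything else is reported as "any".
guess_family() {
    case "$1" in
        *:*) echo ipv6 ;;
        *.*) echo ipv4 ;;
        *)   echo any ;;
    esac
}

# Print the uci command that would pin a rule to the guessed family.
# The section name is an assumption; substitute your own rule names.
print_family_cmd() {
    printf "uci set mwan3.%s.family='%s'\n" "$1" "$(guess_family "$2")"
}

print_family_cmd my_ipv4_rule 172.20.1.22
# On the router, run the printed command, then:
#   uci commit mwan3
#   /etc/init.d/mwan3 restart
```

The guess is deliberately crude: it does not validate the address, it only decides which family the rule should be restricted to.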

@misanthropos
Author

Well - I did all that. Every rule has a policy and family set.
See mwan3 config
And I get no errors. However, mangling does not occur until I mess with the firewall. (Restarting the firewall does not work, but changing something and then save/apply does.)
Again: only after that do I see packet mangling in the logger, and packets are actually routed correctly. So somehow mangling does not occur when mwan3 is restarted.

@jamesmacwhite
Contributor

jamesmacwhite commented Sep 3, 2020

Hmmm OK, that sounds more like something else. I think there are potentially two separate issues. The iptables error you reported shouldn't cause what you are describing.

I can see from your mwan3 config you've got what looks like one physical WAN and then four logical VPN interfaces?

Does this happen on 2.8.12? The iptables error will, but the behaviour with the firewall sounds like something else.

@misanthropos
Author

misanthropos commented Sep 3, 2020

short update - after upgrading some packages this morning, mangling does not apply anymore at all...

yes. one physical 4 logical VPNs

On 2.8.12 I had no such issues.

Well.. I went a step back to 2.8.16 and have again some old issue with losing all connections to the router on restarting mwan3 - forcing me to reboot it... oh well... at least that is gone with the current version :) -
BUT with 2.6.18-1-1 routing through vpn works (but logging does not).

@jamesmacwhite
Contributor

jamesmacwhite commented Sep 3, 2020

There was some refactoring done between 2.8.12 and 2.9.0, but mainly for performance and scalability. However, it might be worth @aaronjg chiming in on that, in case of a possible regression.

The current advice is to avoid snapshot builds if possible, there is a current PR open that is trying to make mwan3 routing more compatible with snapshot builds:

#13169

If you fancy it, could you try out that PR. You'll need to compile it with the SDK as it's got a helper library now but it would be interesting to see if it helps your configuration. More testers is always helpful!

For now, reverting back to 2.8.12 seems to be a temporary workaround.

@misanthropos
Author

Thanks @jamesmacwhite , I might try out that PR!

@aaronjg
Contributor

aaronjg commented Sep 3, 2020

Hmm. Possible there was a regression in 2.9.0. 2.8.x was flushing and recreating the iptables a lot, which slowed things down, in 2.9.0 it does it only as needed. I had no issues with this, but perhaps something was missed? If the issue persists in 2.10.x, please let me know.

@jamesmacwhite
Contributor

@aaronjg Can't say I've had any issues either, but I guess with different configurations you can never be entirely sure. I think testing 2.9.10 would be good if you can, @misanthropos. We know it has helped fix an issue with WireGuard for another mwan3 user on snapshot, and as you are using multiple logical interfaces as well, it might help in your case too.

The changes @aaronjg has made in that PR are generally more compliant with OpenWrt routing, so I'd be hopeful it will help your case.

@misanthropos
Author

OK!
After compiling with @aaronjg PR I still had the same issue. mwan3 worked, but no packets were mangled.
What I tried: some files were owned by network:102 others by root. After chowning all config files to network:102 and restarting mwan3 routing worked....

@aaronjg
Contributor

aaronjg commented Sep 3, 2020

Strange. Is 101:102 the uid:gid that you compiled with? Perhaps something is wrong in the mwan3 install script.

@misanthropos
Author

misanthropos commented Sep 4, 2020

I compiled with my normal user account here at home, and that is not 101, and I am not part of a group with ID 102. Maybe it was coincidence? Could it be a race condition and I got lucky? Every process is running as root except for dnsmasq atm.

@misanthropos changed the title from "mwan3: can not apply its rules to route packets on ip6tables-restore error." to "mwan3: packets are not mangled on mwan3 restart." on Sep 4, 2020
@misanthropos
Author

what actually did the trick is: restarting firewall then restarting mwan3

@aaronjg
Contributor

aaronjg commented Sep 4, 2020

Does it work on a clean reboot? Mwan3 startup is now much faster than it used to be, so perhaps there is a race condition between it and the firewall script.

@misanthropos
Author

No. Normally I have to restart OpenVPN - then firewall and mwan3.

@aaronjg
Contributor

aaronjg commented Sep 4, 2020

We should probably move this to later in the startup process. Though I have an R7800, which has a similar dual-core 1.7 GHz processor, and haven't had an issue. If you change the "START=19" line in /etc/init.d/mwan3 to something like "START=20" or "START=25", does that fix the problem?
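To check the ordering before and after such a change, something like the following sketch could help. The function names `init_start_prio` and `starts_after` are made up for illustration. It only reads the START= line of the init scripts; it does not look at the /etc/rc.d symlinks, which (as far as I know) have to be regenerated with "/etc/init.d/mwan3 disable" followed by "/etc/init.d/mwan3 enable" before an edited START value takes effect.

```shell
#!/bin/sh
# Print the START priority declared in an init script (empty if none).
init_start_prio() {
    sed -n 's/^START=\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Succeed if the first script has a strictly higher START priority than
# the second one, i.e. it is started later during boot.
starts_after() {
    a=$(init_start_prio "$1")
    b=$(init_start_prio "$2")
    [ -n "$a" ] && [ -n "$b" ] && [ "$a" -gt "$b" ]
}

# Example on a router (assumes both init scripts exist there):
#   starts_after /etc/init.d/mwan3 /etc/init.d/firewall \
#       && echo "mwan3 is ordered after the firewall"
```

Note that two scripts with the same START value are not covered by this check; in that case the order depends on how the rc.d symlink names sort.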

@misanthropos
Author

it seems OpenVPN does not need a restart anymore. But I still have to restart firewall then mwan3.
START=25 btw.

@misanthropos
Author

misanthropos commented Sep 4, 2020

@aaronjg I read that you are testing your patch with current master. I built with 19.07.3, and I have now made a new build with current master. Which one should I try (which helps you more)?

@aaronjg
Contributor

aaronjg commented Sep 6, 2020

@misanthropos, testing with the snapshot build on the 5.4.x kernel would be very helpful. There were some issues with mwan3 <=2.9 on the new kernel, and more testing to make sure they are resolved would be appreciated.

@misanthropos
Author

@aaronjg I tried... here is the thing: I checked out master, pointed the packages feed at your changes, and out comes an image with 4.14.180, and I have no clue where to set the kernel version.

@aaronjg
Contributor

aaronjg commented Sep 6, 2020

Strange. What commit are you on from openwrt/master? It looks like it should build LINUX_5_4 for your device:

Symbol: LINUX_5_4 [=y]
Type  : bool
Defined at tmp/.config-target.in:206406
Selected by [y]:
  - TARGET_ipq806x_generic [=y] && <choice> && TARGET_ipq806x [=y] && !TESTING_KERNEL [=n]

@misanthropos
Author

misanthropos commented Sep 6, 2020

commit 2c2fcbd (HEAD -> master, origin/master, origin/HEAD, misan/master)

could it be the image was rejected and booted with the one it had?

Must be it.. extracted the kernel from sysupgrade image and it is 5.4.63

sysupgrade does not work...
With -T it reports no problem.
I tried with -n. The system goes down... and comes back as it was.

@jamesmacwhite
Contributor

jamesmacwhite commented Sep 7, 2020

It's possible. Install luci-app-advanced-reboot and note which partition is booted (active). On dual-flash routers, the active partition alternates on every successful flash/sysupgrade, so that you always have one good partition; this helps prevent bricking or no-boot scenarios. If you reboot and the active partition hasn't changed, I don't think the upgrade worked.

@aaronjg
Contributor

aaronjg commented Sep 10, 2020

Were you able to boot the master branch?

Here is another resource on using the dual partitions on your router:
https://forum.openwrt.org/t/zyxel-nbg6817-problems-flashing-openwrt/62939/5

@misanthropos
Author

@jamesmacwhite - thank you for that hint. I have added that package - it might come in handy from here on.

@aaronjg - I have built an image from current master with your PR #13277 for packages. The problem is still there. After a reboot I have to restart firewall and mwan3 to get packets mangled and routed through the right VPN interfaces.

kernel: 5.4.71, mwan3 - 2.9.0-1

@aaronjg
Contributor

aaronjg commented Oct 17, 2020

Can you please share the output of
iptables -t mangle -S
after a restart when mwan3 is not working, and again after you have restarted the services and mwan3 is working?

@misanthropos
Author

misanthropos commented Oct 17, 2020

Sure thing:

mangles-after-boot.txt
mangles-after-1-firewall.txt
mangles-after-2-mwan3.txt

and

diff mangles-after-boot.txt mangles-after-2-mwan3.txt
6d5
< -N VPNBYPASS
23d21
< -A PREROUTING -m mark --mark 0x0/0xff0000 -g VPNBYPASS
37d34
< -A VPNBYPASS -m set --match-set vpnbypass dst -j MARK --set-xmark 0x10000/0xff0000
58c55,56
< -A mwan3_ifaces_in -m mark --mark 0x0/0x3f00 -j mwan3_iface_in_VPN4
---
> -A mwan3_ifaces_in -m mark --mark 0x0/0x3f00 -j mwan3_iface_in_VPN0
> -A mwan3_ifaces_in -m mark --mark 0x0/0x3f00 -j mwan3_iface_in_VPN1
61,62c59
< -A mwan3_ifaces_in -m mark --mark 0x0/0x3f00 -j mwan3_iface_in_VPN1
< -A mwan3_ifaces_in -m mark --mark 0x0/0x3f00 -j mwan3_iface_in_VPN0
---
> -A mwan3_ifaces_in -m mark --mark 0x0/0x3f00 -j mwan3_iface_in_VPN4
69a67,74
> -A mwan3_rules -s xxx.xxx.xxx.xxx/32 -m mark --mark 0x0/0x3f00 -j mwan3_policy_wwan_only
> -A mwan3_rules -m set --match-set vpnbypass dst -m mark --mark 0x0/0x3f00 -j mwan3_policy_wwan_only
> -A mwan3_rules -s xxx.xxx.xxx.xxx/32 -m mark --mark 0x0/0x3f00 -j mwan3_policy_vpnfixed
> -A mwan3_rules -s xxx.xxx.xxx.xxx/32 -m mark --mark 0x0/0x3f00 -j mwan3_policy_vpnfixedde
> -A mwan3_rules -s xxx.xxx.xxx.xxx/32 -m mark --mark 0x0/0x3f00 -j mwan3_policy_vpnfixed
> -A mwan3_rules -s xxx.xxx.xxx.xxx/32 -m mark --mark 0x0/0x3f00 -j mwan3_policy_vpnfixedde
> -A mwan3_rules -s xxx.xxx.xxx.xxx/32 -m mark --mark 0x0/0x3f00 -j mwan3_policy_vpnbalanced
> -A mwan3_rules -m mark --mark 0x0/0x3f00 -j mwan3_policy_wwan_only

Obviously the host marking rules are not set after reboot - interestingly the VPNx marks are gone but that does not matter.
The VPNBYPASS for hosts still works (but that might be related to a rule in mwan3 to use IPSET)
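Since a plain diff also reports rules that merely changed position (like the mwan3_ifaces_in lines above), a rough order-insensitive comparison can make the missing rules easier to spot. This is only a sketch: the function name `rules_only_in_second` is made up, and the file names in the usage comment are simply the ones attached above. It prints every line of the second dump that has no exact match in the first.

```shell
#!/bin/sh
# Print rules that appear in the second dump but not in the first,
# regardless of their position. Both arguments are files containing
# `iptables -t mangle -S` output.
#   -F  treat each line of the first file as a fixed string
#   -x  require whole-line matches
#   -v  invert: keep lines of the second file that match none of them
rules_only_in_second() {
    grep -Fxv -f "$1" "$2"
}

# Example, using the dumps attached above:
#   rules_only_in_second mangles-after-boot.txt mangles-after-2-mwan3.txt
```

Keep in mind that rule order does matter to iptables itself, so this only answers "which rules are missing", not "are the two rule sets equivalent".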

@aaronjg
Contributor

aaronjg commented Oct 17, 2020

I see. It appears that mwan3 may be trying to start before the vpnbypass set is created. Can you move the creation of that ipset earlier in the boot process?

Also, what is creating the "VPNBYPASS" chain? It appears to be present at first, and then lost after the firewall restart.

@misanthropos
Author

misanthropos commented Oct 17, 2020

I forgot to tell that restarting just mwan3 does not help.

mangles-after-1-mwan3.txt

VPNBYPASS is set by the package vpnbypass. This way one can route packets based on e.g. domains. It still works because I have an mwan3 rule based on that ipset, so mangling is not needed.

I have tried before with or without that package enabled. It makes no difference.

I have moved vpnbypass before mwan3 - the result is exactly the same.

mangles-after-boot-vpnbypass-moved-before-mwan3.txt
mangles-1-after-firewall-vpnbypass-moved-before-mwan3.txt
mangles-2-after-mwan3-vpnbypass-moved-before-mwan3.txt

@aaronjg
Contributor

aaronjg commented Oct 17, 2020

It appears the issue is with the vpnbypass package, and when you restart the firewall, you are removing the entries for that.

It looks like the difference between "mangles-after-1-mwan3.txt" which is not working and "mangles-after-2-mwan3.txt" which is working is that restarting the firewall has cleared the following rules:

-A PREROUTING -m mark --mark 0x0/0xff0000 -g VPNBYPASS
-A VPNBYPASS -m set --match-set vpnbypass dst -j MARK --set-xmark 0x10000/0xff0000

The -g VPNBYPASS causes the rest of the mwan3 rules to be skipped.
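The reason is that -g (goto) differs from -j (jump): when the target chain ends without a verdict, processing does not come back to the rule after the goto, so whatever follows in PREROUTING is never evaluated for those packets. A quick way to look for such rules is sketched below; the function name `prerouting_gotos` is made up for illustration, and the filter only inspects PREROUTING.

```shell
#!/bin/sh
# Read `iptables -t mangle -S` output on stdin and print every rule in
# the PREROUTING chain that uses a goto (-g) target.
prerouting_gotos() {
    awk '$1 == "-A" && $2 == "PREROUTING" {
        # The chain name after -g is the last field, so stop at NF-1.
        for (i = 3; i < NF; i++) {
            if ($i == "-g") { print; next }
        }
    }'
}

# Example on the router:
#   iptables -t mangle -S | prerouting_gotos
```

Any rule this prints that sits above the mwan3 rules in PREROUTING is a candidate for the behaviour described here.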

It appears the vpnbypass package is incompatible with mwan3. I just read up on the vpnbypass package. Since you already have an mwan3 rule for the ipset, you can replicate the functionality by adding the following rules to your firewall:

config  ipset
        option  name            'vpnbypass_4'
        option  match           'src_net'
        option  storage         'hash'
        option  enabled         '1'

config  ipset
        option  name            'vpnbypass_6'
        option  match           'src_net'
        option  family          'ipv6'
        option  storage         'hash'
        option  enabled         '1'

config  ipset
        option  name            'vpnbypass'
        option  match           'set'
        option  storage         'list'
        option  enabled         '1'
        list    entry           'vpnbypass_4'
        list    entry           'vpnbypass_6'

And the following rule to your dnsmasq configuration (dnsmasq-full required):

list ipset '/example.com/example2.com/vpnbypass_6,vpnbypass_4'

@misanthropos
Author

Man @aaronjg - BIG THANKS for that!!

I disabled vpnbypass with the effect that mwan3 would not work at all... (which I had tried in the past with the same effect). Only this time I checked logread and saw that mwan3 spat out an error because of it trying to apply the now missing vpnbypass set.
Removing that rule from mwan3 made it work!!!!

Kudos!
