
Traffic Shaping: Direction-"out"-Rules do not match any traffic! #1900

Closed · SeTk opened this issue Oct 27, 2017 · 72 comments
Labels: bug (Production bug)

@SeTk commented Oct 27, 2017

Using: OPNsense 17.7.7_1-amd64
with multiple gateways and gateway groups for redundancy.
Shared forwarding is enabled.

Problem: I am not able to shape outgoing traffic (e.g. for VoIP).
Issue: A direction-"out" rule only works with the default gateway.
On all other gateway interfaces, traffic does not match and bypasses the shaper.

In our setup, 004_WAN7 is our default gateway.
A rule for traffic leaving this interface works: outgoing traffic matches and goes through the shaper.
(screenshot: 01-defaultgateway)

But choosing any other interface, such as 0E4_WAN5, does not work.
There is outgoing traffic, but it does not match; it bypasses the shaper.
(screenshot: 02-nondefaultgateway)

Outgoing traffic flow:
WORKING: LAN interface > firewall rules (route-to / gateway = default gateway WAN7) > traffic shaper > WAN7 gateway interface
SHOULD WORK: LAN interface > firewall rules (route-to / gateway = WAN5) > traffic shaper > WAN5 gateway interface
WHAT HAPPENS: LAN interface > firewall rules (route-to / gateway = WAN5) > BYPASSES traffic shaper > WAN5 gateway interface

Incoming traffic works in the traffic shaper without any problems.
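Conceptually, a direction-"out" shaper rule compiles down to an ipfw rule that matches on the outgoing interface. A rough sketch of the two cases described above, with illustrative rule numbers, queue numbers, and interface names (not the exact rules OPNsense generates):

```shell
# Traffic leaving via the default-gateway interface (WAN7): matches and is shaped
ipfw add 60001 queue 10001 ip from 10.0.0.0/8 to any out via igb7

# Traffic policy-routed out via WAN5: should match the analogous rule,
# but per this report it bypasses the shaper instead
ipfw add 60002 queue 10002 ip from 10.0.0.0/8 to any out via igb5
```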

@mimugmail (Member)

Two pipes and two queues, with two different rules for the two WANs, please.

Changing interfaces sometimes requires a reboot.

@SeTk (Author) commented Oct 27, 2017

Sure — it is the same queue only for testing purposes.
Same behaviour with separate pipes/queues/rules.
Reboots/updates have not solved this issue.

@mimugmail (Member)

Does changing 0E4_WAN to Default fix this? (Just a test)

@SeTk (Author) commented Oct 27, 2017

Yes, the traffic shaper for 0E4_WAN5 works in this case (when it is the default gateway),
whether the firewall rule's route-to / gateway is set to 0E4_WAN5 explicitly or left at default.
But it no longer works for the old gateway interface 004_WAN7.

@fichtner fichtner self-assigned this Oct 29, 2017
@fichtner fichtner added the bug Production bug label Oct 29, 2017
@fichtner fichtner added this to the 18.1 milestone Oct 29, 2017
@fichtner (Member)

Hi guys,

It's perfectly possible that despite shared forwarding not everything is being shared correctly yet. This will take some time to get to the bottom of. I expect we can look at this in December. It will definitely require kernel patching and I'm not available most of November.

Cheers,
Franco

@fichtner (Member)

PS: @SeTk if you can try this with 11.1 base/kernel applied, shared forwarding was improved, now also includes IPv6 if that matters here... https://forum.opnsense.org/index.php?topic=6257.0

@SeTk (Author) commented Oct 31, 2017

Sorry, I don't want to try 11.1, because both of our OPNsense machines (in an HA configuration) are in 24/7 production use. You wouldn't recommend trying 11.1 in HA production use, would you?

@fichtner (Member)

We need the data points at some point before 18.1 is out to work on this, but I would not recommend it in your setup at the moment.

Theoretically 11.0 and 11.1 are similar in nature, so syncing between them should not be a problem. You could also skip the "b" in the test commands to only update the kernel, which gives you even less friction.

After a week or two with no other problem reports in the forum, it's worth a try.
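For reference, the kernel-only variant looks roughly like this (flags assumed from the opnsense-update usage of that era; the exact invocations are in the linked forum post):

```shell
# Full test update: base and kernel sets
opnsense-update -bkr 11.1

# Kernel only ("skip the b"), less invasive and easier to roll back
opnsense-update -kr 11.1
```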

@SeTk (Author) commented Oct 31, 2017

Ok, I've upgraded to 11.1, rebooted and tested again.
The issue is still there. :(
Default gateway: Out-Rules matching/working.
Other gateways: Out-Rules don't match/work.

@fichtner (Member)

fichtner commented Oct 31, 2017 via email

@mimugmail (Member)

@SeTk For me it's working. I'm currently playing with the traffic shaper and have built a small test bed. I have two WANs, one static with 100 Mbit (default) and one with an ADSL router in front. When I set a firewall rule to send traffic from LAN_X via ADSL, it's shaped exactly to the values I set in the pipes.

But what I experienced was that CoDel doesn't do what I want... I can shape to exact values, but ping times are always bad (also with the default gateway).

@SeTk (Author) commented Nov 19, 2017

@mimugmail OK, let's compare our configurations for the upload traffic shaper.

WAN1 is default.

Pipes (other settings default):

  • WAN1-Upload-Pipe: a fixed bandwidth of 5 Mbit/s and a mask "source"
  • WAN2-Upload-Pipe: same as WAN1, but with a fixed bandwidth of 20 Mbit/s

Queues are not configured, to keep the test simple.

Rules (other settings default):

  • WAN1-Upload-Rule: sequence 11, interface WAN1 (interface 2: none), direction "out", target "WAN1-Upload-Pipe"
  • WAN2-Upload-Rule: same as WAN1, but for interface WAN2 and target "WAN2-Upload-Pipe"

To make sure the configuration is in use, I've reset the traffic shaper.
In the status view I can then see: all upload is handled in WAN1-Upload-Pipe.

Testing from a VLAN with the WAN1 gateway or the default gateway configured: upload is limited to 5 Mbit/s.
Testing from a VLAN with the WAN2 gateway configured: upload is also limited to 5 Mbit/s.

Changing the setup so that WAN2 is the default gateway allows 20 Mbit/s upload in both cases.
Extending the test with download shaping (direction-"in" rules) shows that download traffic is assigned to the right pipe in both cases.
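For comparison, the pipe and rule setup described above corresponds roughly to this dummynet configuration (a sketch with illustrative pipe/rule numbers and interface names; OPNsense generates the actual rules from the GUI):

```shell
# Pipes: fixed bandwidth per WAN, per-source dynamic queues via the mask
ipfw pipe 1 config bw 5Mbit/s mask src-ip 0xffffffff    # WAN1-Upload-Pipe
ipfw pipe 2 config bw 20Mbit/s mask src-ip 0xffffffff   # WAN2-Upload-Pipe

# Rules: direction "out" on each WAN interface
ipfw add 11 pipe 1 ip from any to any out via igb1      # WAN1-Upload-Rule
ipfw add 12 pipe 2 ip from any to any out via igb2      # WAN2-Upload-Rule
```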

Could you attach screenshots of your pipes and rules in advanced mode?

Thank You for your support!

@mimugmail (Member)

@SeTk Hm, in my setup I only tested up/down at the same speeds... now I've tweaked the values and get the same behaviour as you.

WAN1 (default) UP works
WAN1 (default) DOWN works
WAN2 UP fails (used pipe WAN2 DOWN)
WAN2 DOWN works

@fichtner I'll back up the config and keep this lab alive if you want to do some testing after vacation.

@SeTk (Author) commented Nov 20, 2017

Thanks. :)
We don't have a lab environment, so a pre-alpha version isn't a good idea for us.
But after that, I'd like to help with testing too.

@patcsy88

Hi @SeTk @mimugmail,

I too am observing the same problem, on 17.7.8: the shaper is not restricting the traffic to the bandwidth set. I am open to testing this, as I have decoupled my HA.

@namezero111111 (Contributor) commented Feb 18, 2018

I am experiencing the exact same issue:

WAN1 (d) UP works
WAN1 (d) DOWN works
WAN2 UP fails (used PIPE WAN1 UP).
WAN2 DOWN works

I.e., in this example queue 10008 (em4, WAN2) should have been used, not 10006 (em2, WAN1):
60021 0 0 queue 10008 ip from 192.168.0.0/16 to any via em4
60022 94 15715 queue 10013 ip from any to 192.168.0.0/16 via em4
60023 153 16887 queue 10006 ip from 192.168.0.0/16 to any via em2
60024 0 0 queue 10011 ip from any to 192.168.0.0/16 via em2

Disabling shared forwarding results in no UP queue (10006, 10008) being used at all, though the DOWN queues still work.

Has there been any more progress tracking down this issue? Something I could try?

I was thinking about defining no default gateway, but then the policy-based routes return destination unreachable for some reason!?
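The per-rule packet/byte counters shown in listings like the one above can be watched live from a shell on the firewall (a sketch):

```shell
# List ipfw rules with their packet and byte counters
ipfw -a list | grep queue

# Show dynamic dummynet queues and their current backlog
ipfw queue show
```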

@fichtner fichtner removed this from the 18.7 milestone Feb 27, 2018
@namezero111111 (Contributor)

I would like to investigate this further, since it's blocking a major upgrade (from pfSense 2.0.1); is there any starting point for which direction to look in?

@namezero111111 (Contributor)

I have compared the sources from pfSense (where this works) to OPNsense.

In OPNsense, the function pf_route was split into pf_route and pf_route_shared, depending on the sysctl value for shared forwarding.
pfSense has not done this, but has pf_route differences (2.4) that carry the following comment:

Send it out since it came from state recorded ifp(rt_addr).
Routing table lookup might have chosen not correct interface!

There is also much more "meat" around r->rt being flagged PF_ROUTETO.

I'm not enough of a pf expert to determine whether this is relevant, and the code is mostly undocumented, but would that be something to start looking at?
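For orientation, the shared-forwarding toggle mentioned above is exposed as a sysctl on OPNsense and can be checked from the shell (the sysctl name is assumed from kernels of that era and may differ by version):

```shell
# 1 = pf hands policy-routed packets back to the stack so that
# ipfw/dummynet still sees them on the way out
sysctl net.pf.share_forward
```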

@fichtner (Member) commented Mar 4, 2018 via email

@namezero111111 (Contributor)

Thanks for the answer.
With shared forwarding off, upload (outgoing traffic) doesn't enter any queue at all; incoming traffic still does.

That's why I suspect a wrong interface index or something similar.

@fichtner (Member)

fichtner commented Mar 4, 2018

Baseline is important still, see shaper / limiter question.

We wrote shared forwarding because ipfw and pf in FreeBSD do not go together well. There could be further bugs, but it's essentially non-reproducible in pfSense or FreeBSD because of that....

@namezero111111 (Contributor)

I'm not an expert at pf at all; this is my first time looking at its source and trying to figure anything out. Just reading for now, because I'm not sure exactly what to look for.

@fichtner (Member)

fichtner commented Mar 4, 2018

Still, simple question: in pfSense did you compare against shaper or limiter?

@mimugmail (Member)

kernel -out (out WAN2):

      Cookie: lan10.1522139567.392524.52c4b88a6019
      TCP MSS: 1448 (default)
[  4] local 10.10.10.10 port 52258 connected to 62.75.151.240 port 5000
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 30 second test
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  98.8 KBytes   809 Kbits/sec   25   11.2 KBytes
[  4]   1.00-2.00   sec  92.8 KBytes   761 Kbits/sec    0   15.5 KBytes
[  4]   2.00-3.00   sec  61.9 KBytes   507 Kbits/sec    0   18.3 KBytes
[  4]   3.00-4.00   sec   155 KBytes  1.27 Mbits/sec    0   32.3 KBytes
[  4]   4.00-5.00   sec   155 KBytes  1.27 Mbits/sec    0   52.0 KBytes
[  4]   5.00-6.00   sec   205 KBytes  1.68 Mbits/sec    0   80.2 KBytes
[  4]   6.00-7.00   sec   180 KBytes  1.47 Mbits/sec    0    105 KBytes
[  4]   7.00-8.00   sec  54.8 KBytes   449 Kbits/sec   18   53.4 KBytes
[  4]   8.00-9.00   sec   122 KBytes  1.00 Mbits/sec    6   59.1 KBytes
[  4]   9.00-10.00  sec  67.5 KBytes   553 Kbits/sec    0   74.5 KBytes
[  4]  10.00-11.00  sec  63.3 KBytes   518 Kbits/sec    0   87.2 KBytes
[  4]  11.00-12.00  sec  0.00 Bytes  0.00 bits/sec    8   64.7 KBytes
[  4]  12.00-13.00  sec  63.3 KBytes   518 Kbits/sec    4   61.9 KBytes
[  4]  13.00-14.00  sec  63.3 KBytes   518 Kbits/sec    0   66.1 KBytes
[  4]  14.00-15.00  sec   127 KBytes  1.04 Mbits/sec    0   71.7 KBytes
[  4]  15.00-16.00  sec  63.3 KBytes   518 Kbits/sec    0   74.5 KBytes
[  4]  16.00-17.00  sec  0.00 Bytes  0.00 bits/sec    1   66.1 KBytes
[  4]  17.00-18.00  sec  63.3 KBytes   518 Kbits/sec    2   52.0 KBytes
[  4]  18.00-19.00  sec  63.3 KBytes   518 Kbits/sec    0   54.8 KBytes
[  4]  19.00-20.00  sec  63.3 KBytes   518 Kbits/sec    0   60.5 KBytes
[  4]  20.00-21.00  sec  63.3 KBytes   518 Kbits/sec    0   61.9 KBytes
[  4]  21.00-22.00  sec  63.3 KBytes   518 Kbits/sec    0   63.3 KBytes
[  4]  22.00-23.00  sec  63.3 KBytes   518 Kbits/sec    0   64.7 KBytes
[  4]  23.00-24.00  sec  63.3 KBytes   518 Kbits/sec    0   70.3 KBytes
[  4]  24.00-25.00  sec  63.3 KBytes   518 Kbits/sec    0   81.6 KBytes
[  4]  25.00-26.00  sec  0.00 Bytes  0.00 bits/sec    4   77.3 KBytes
[  4]  26.00-27.00  sec  63.3 KBytes   518 Kbits/sec   10   61.9 KBytes
[  4]  27.00-28.00  sec  63.3 KBytes   518 Kbits/sec    0   56.2 KBytes
[  4]  28.00-29.00  sec   127 KBytes  1.04 Mbits/sec    0   43.6 KBytes
[  4]  29.00-30.00  sec  63.3 KBytes   518 Kbits/sec    0   49.2 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-30.00  sec  2.34 MBytes   654 Kbits/sec   78             sender
[  4]   0.00-30.00  sec  1.74 MBytes   487 Kbits/sec                  receiver
CPU Utilization: local/sender 0.3% (0.1%u/0.2%s), remote/receiver 0.1% (0.1%u/0.1%s)
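The log above looks like a standard iperf3 upload run; a sketch of the client invocation consistent with it (server address and port taken from the log):

```shell
# 30-second single-stream TCP upload test through the shaped WAN
iperf3 -c 62.75.151.240 -p 5000 -t 30
```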

@fichtner (Member)

Yay. 👍

(more feedback welcome)

fichtner added a commit to opnsense/src that referenced this issue Mar 27, 2018
Many thanks to everyone testing, prodding and pushing for this!  :)

PR: opnsense/core#1900
@fichtner (Member)

Final patch is opnsense/src@d1cb3383d if all feedback is good

@namezero111111 (Contributor) commented Mar 27, 2018

I can confirm both patches work, and although I can't comment scientifically on the smoothness of the flips, the out patch apparently "behaves" better.

Interesting how the fixed code is exactly the same but patched elsewhere (save for the DIR_OUT in the final commit).

Congratulations on a job well done :} Excellent detective work :D

Will we see this in HEAD soon?

@fichtner (Member)

Symmetry issue... fixed pf in that area later, but ipfw needs the same obviously

The extra explaining made this abundantly clear so thanks for asking! :)

@namezero111111 (Contributor) commented Mar 27, 2018

Any time; especially after digging through that code I have plenty of questions :D.
In the end it's always "obvious" eh?
Either way I have much respect for people qualified enough to touch that code :}

@mimugmail (Member)

It's so much fun working on this project; I really enjoy the progress 👍

@fichtner (Member)

Alright so I don't know when we ship this, possibly with the next round of OS security fixes (FreeBSD advisories). If you want to keep this kernel, you can lock it under System: Firmware: Packages, but do not forget to unlock once this really ships, otherwise you'll miss out on security patches.

Thanks everyone, time to close this ❤️

@fichtner fichtner added this to the 18.7 milestone Mar 27, 2018
@fichtner (Member)

Glancing over this today it seems that IPv4 works now but not IPv6, so here's another kernel:

# opnsense-update -kr 18.1.5-ipfw

Cheers,
Franco

@fichtner (Member)

(confirmation that IPv4 works as before is probably enough at this point)

@mimugmail (Member)

Works as expected with v4

@mimugmail (Member)

By the way, enabling CoDel for the WF2Q+ scheduler makes the 500 kbit upload WAY smoother .. 👍

@namezero111111 (Contributor)

We don't have IPv6 yet so I'm unable to check for that, but the new kernel (18.1.5-ipfw) also works for IPv4 here :}

@namezero111111 (Contributor)

Ok interesting. Why rshift IPV6_VERSION?

@fichtner (Member)

@namezero111111 copied from the same function a number of lines below

@namezero111111 (Contributor) commented Mar 29, 2018

@fichtner Is it possible the patch messes with NAT? Or maybe there is another issue?

It appears as though with a gateway group, the wrong NAT IP is used.

Example:
DMZ1 192.168.4.131 with CARP 192.168.4.135
DMZ2 192.168.4.141 with CARP 192.168.4.150

DMZ1 and DMZ2 gateways are in a gateway group, patched kernel applied.
NAT 192.168.0.0/16 on DMZ1 via 192.168.4.135
NAT 192.168.0.0/16 on DMZ2 via 192.168.4.150

Pings time out, websites won't load, etc.
A tcpdump capture on DMZ2 shows packets originating from 192.168.4.135.

When the NAT rules are reversed in priority, a tcpdump capture on DMZ1 shows packets originating from 192.168.4.150.

It seems like something else is affected here by a similar issue.
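A capture along the lines described can show which CARP address outbound packets actually leave with (the interface name em3 is an assumption, taken from the rules.debug excerpt elsewhere in the thread):

```shell
# Watch which translated source address traffic leaves the DMZ2 interface with
tcpdump -ni em3 'src 192.168.4.135 or src 192.168.4.150'
```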

@namezero111111 (Contributor) commented Mar 30, 2018

Excerpt from rules.debug looks good:

nat on em3 inet proto tcp from $mx2 to any port 25 -> 192.168.4.150 port 1024:65535 # MX static outbound mapping
no nat on em2 inet from (self) to any # Do not NAT self
no nat on em3 inet from (self) to any # Do not NAT self
nat on em2 inet from 192.168.0.0/16 to any -> 192.168.4.135 port 1024:65535
nat on em3 inet from 192.168.0.0/16 to any -> 192.168.4.150 port 1024:65535

With shared forwarding off the problem does not occur. Turning it back on reproduces it again.
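To confirm which NAT rule a given connection actually matched, pf's state table can be inspected (a sketch):

```shell
# Show active states, including the translated source address pf chose
pfctl -s state | grep 192.168.4
```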

@mimugmail (Member)

18.1.5 or latest master?

@namezero111111 (Contributor)

Yes, unfortunately on both :/
Do you have any reference?

fichtner added a commit to opnsense/src that referenced this issue Mar 31, 2018
PR: opnsense/core#1900

(cherry picked from commit d1cb338)
(cherry picked from commit d59de14)
(cherry picked from commit 529bbe6)
@namezero111111 (Contributor)

I meant: why was he asking about the latest version?
I will test the "stock" kernel later to see if this is related, but it was first noticed with the patch.
@mimugmail Did you try with NAT?

@fichtner (Member) commented Apr 3, 2018

I'm not sure the patch does anything here. NAT is done in pf, not ipfw. That means you'll likely see the issue on a stock 18.1.5 as well. And I'm not even sure, since you said gateway group and NAT: it heavily depends on your WAN state, monitoring, and load-balancing rules.

@mimugmail (Member)

@namezero111111 I tested with NAT on both WANs, sure! I'm in the office today; if you need more tests, just post them here ..

@namezero111111 (Contributor)

OK, this seems solely related to "Disable State Killing on Gateway Failure". When it is unchecked (i.e. state killing enabled), the behavior occurs.
When it is checked (state killing disabled), everything seems fine ("stock" kernel or patched).
Please disregard; this is not related to the patch at all.

@mimugmail (Member)

Best news today :)

@fichtner (Member) commented Apr 3, 2018

:)

@dannykorpan commented Jan 10, 2021

Same issue occurs again with OPNsense 20.7.7_1.
