Feature: Knob-less QoS with fq_codel/cake and something like OpenWRT's "SQM" #505

Closed
obrienmd opened this Issue Dec 7, 2015 · 27 comments

@obrienmd

Given the amazing success of OpenWRT with fq_codel and SQM at beating back bufferbloat and providing low latency across flows without having to tune: http://www.bufferbloat.net/projects/codel/wiki

I'd like to move some of my company's pfsense boxes over to a distro that uses something like this. Right now IPFire (being linux-based) is able to do this pretty easily, but I would love to use OPNsense.

This seems seriously non-trivial to do in FreeBSD given the chatter in the pfsense community about this.

@AdSchellevis
OPNsense member

We're using a different system for traffic shaping and QoS (ipfw dummynet), which doesn't contain the codel algorithm.
There is a custom patch available for ALTQ/pf (which is in pfSense), but it won't match our codebase.

In ipfw/dummynet there also are some options for scheduling, which are not in our UI at the moment, but which should be a more logical approach in our case.
https://www.freebsd.org/cgi/man.cgi?ipfw%288%29#TRAFFIC%09SHAPER_%28DUMMYNET%29_CONFIGURATION

@obrienmd

Right - Would you give any consideration to fq_codel (not really the same as codel, see the bufferbloat.net link above), or cake? From my experience in all sorts of shaping / QoS systems, they are significantly ahead of the rest of the field in out-of-the-box effectiveness:

https://indico.uknof.org.uk/getFile.py/access?contribId=3&resId=0&materialId=slides&confId=27

@fichtner
OPNsense member

I really don't see the ALTQ/CoDel combination (it will be in FreeBSD 11 courtesy of pfSense) taking off, as ALTQ is (and most likely will remain) disabled in FreeBSD GENERIC. OpenBSD removed ALTQ as well some time ago.

I've read the page you provided and a bit more on the topic (thanks btw). There seems to be dual-licensed code that could make its way into FreeBSD in another way. The work looks very promising. I also like the zero-config approach, although in theory this shouldn't be a subsystem, it should be a holistic switch that covers all traffic flowing through the box (or an interface). As such it may have side effects with an enabled traffic shaper, but it's better than having to deal with "either this or that, not both" scenarios. Or at least that's how the GUI should handle it, right? :)

@obrienmd

Honestly, I'm not sure - it's dual licensed, but from what I've read (I'm no expert on kernels or low-level networking code) the port to FreeBSD is not easy.

With regard to the GUI side, having something like OpenWRT's SQM would be my personal ideal for my team's deployments.

@obrienmd

Looks like there is a Comcast-sponsored student working on getting fq_codel into FreeBSD dummynet:

http://lists.freebsd.org/pipermail/freebsd-net/2015-September/043443.html

@fichtner
OPNsense member

Great news indeed. This probably won't make it into FreeBSD 11.0 in time, but I'm sure we can backport or wait for 11.1. The OPNsense traffic shaper code is basically ready for this as it is now.

@obrienmd

Very cool. It's hard to overstate just how impressive fq_codel w/ BFQ is in action - I highly recommend spinning up OpenWRT and seeing it in action with SQM limited to 95% of bandwidth.
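
For anyone trying this, the "95% of bandwidth" setting is simple arithmetic: shape slightly below the measured line rate so the queue builds on the box running fq_codel rather than in the modem. A minimal sketch; the function name and the example line rates are hypothetical, and only the 95% figure comes from the comment above:

```python
def sqm_rate_kbit(measured_kbit: int, percent: int = 95) -> int:
    """Return the shaper rate in kbit/s as a percentage of the measured rate."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    return measured_kbit * percent // 100


# Hypothetical measured line rates in kbit/s; substitute your own speed-test results.
download_kbit = sqm_rate_kbit(100_000)
upload_kbit = sqm_rate_kbit(20_000)
print(download_kbit, upload_kbit)
```

The download and upload directions are shaped separately, so each gets its own value.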

@fichtner fichtner added the upstream label Feb 16, 2016
@fichtner fichtner added this to the Future milestone Feb 16, 2016
@obrienmd

Wow! I'm impressed they cranked this out that quickly...

@fichtner
OPNsense member

We also have a test kernel based on the latest OPNsense code ;)

In case anyone wants to try it:

# opnsense-update -bkr 16.1.3-aqm && /usr/local/etc/rc.reboot

Here's the doc to operate AQM on the command line...

http://caia.swin.edu.au/freebsd/aqm/patches/README-0.1.txt

@obrienmd

Great job! Seems to work OK, very interested in how it gets integrated into GUI - take a look at OpenWRT sqm in luci, it's super simple and works great.

@dtaht

Applause! Benchmarks wanted. (try using https://github.com/tohojo/flent )

Is the -0.2 patch incorporated yet? That fixed a few problems, notably one with ECN handling.

@fichtner
OPNsense member

@dtaht not yet shipped, but it's in the repo already opnsense/src@fb03383

I will push another test build soon enough. Preliminary compile looked good yesterday

Thanks for the link, will benchmark what I can from here :)

@fichtner fichtner added feature and removed upstream labels Apr 21, 2016
@fichtner
OPNsense member

Test base/kernel with v0.2 is up for amd64:

# opnsense-update -bkr 16.1.9-aqm && /usr/local/etc/rc.reboot
@AdSchellevis AdSchellevis added a commit that referenced this issue Apr 25, 2016
@AdSchellevis AdSchellevis (traffic shaper) add advanced option schedule type for pipe, makes current default (wf2q+) explicit. related to #505
6fbd2dc
@AdSchellevis
OPNsense member

I performed some simple tests to see how it works (using v0.1). Thanks @dtaht for the tip to use flent.

All tests performed using the following command:

flent rrul -p all_scaled -l 60 -H hostname -t "Title" -o filename.png

Test 1: ipfw enabled, but not passed through dummynet
[image: bufferbloat_no_codel_no_dummynet]

Test 2: dummynet enabled, using default Weighted Fair Queueing (wf2q+)
[image: bufferbloat_no_codel]

Test 3: CoDel enabled, using defaults
[image: bufferbloat_codel]

Test 4: FQ-CoDel enabled, using defaults
[image: bufferbloat_fq_codel]
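
The four runs above can be scripted so every test uses identical flent options. This is a rough sketch, not the script used for these results: the host name is a placeholder, the short titles are illustrative, and the output names simply mirror the image names above. It only prints the commands; the shaper still has to be reconfigured by hand between runs.

```python
import shlex

HOST = "netperf.example.net"  # hypothetical netperf server, not from the thread

# (plot title, output base name) for the four configurations listed above
TESTS = [
    ("ipfw enabled, no dummynet", "bufferbloat_no_codel_no_dummynet"),
    ("dummynet, wf2q+", "bufferbloat_no_codel"),
    ("CoDel, defaults", "bufferbloat_codel"),
    ("FQ-CoDel, defaults", "bufferbloat_fq_codel"),
]


def flent_argv(host, title, basename, length=60):
    """Build the flent command line used in this comment as an argument list."""
    return ["flent", "rrul", "-p", "all_scaled", "-l", str(length),
            "-H", host, "-t", title, "-o", basename + ".png"]


for title, basename in TESTS:
    # Print a copy-pasteable command; run it after switching the shaper config.
    print(shlex.join(flent_argv(HOST, title, basename)))
```

Building an argument list and quoting it with `shlex.join` keeps titles containing spaces intact when the command is pasted into a shell.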

@fichtner fichtner added a commit that referenced this issue Apr 27, 2016
@AdSchellevis AdSchellevis (traffic shaper) add advanced option schedule type for pipe, makes current default (wf2q+) explicit. related to #505

(cherry picked from commit 6fbd2dc)
d9cde46
@fichtner fichtner added a commit that referenced this issue Apr 27, 2016
@AdSchellevis AdSchellevis (traffic shaper) add Codel / FQ-CoDel support, #505
(cherry picked from commit 083ca3c)
3e0ee5f
@fichtner
OPNsense member
fichtner commented Apr 27, 2016 edited

Initial work is delivered with 16.1.12 today. I'm going to close this ticket now.

The work will continue on our end, e.g. AQM v0.2 will be merged shortly after a bit more testing.

Feel free to discuss this ticket / its results further and add new tickets for individual improvements and bugs so we can track them independently.

Thank you all for your input, testing and help. :)

@fichtner fichtner closed this Apr 27, 2016
@dtaht

your "codel" result (test 3) doesn't make any sense. You should have seen 10-20ms latency on this test with pure codel. Also, you can turn off log scales when generating test results in flent....

@dtaht

Also, @AdSchellevis, were you explicitly shaping to 400 Mbit/s or trying to run at line rate? Shaping certainly eats a great deal of CPU, and at this point we don't know how fast BSD boxes can actually forward packets at line rate at all, no matter the AQM/FQ technology in play.

(hint: you can produce a comparison test in flent-gui by loading up all the *.flent.gz files and selecting "Data->add other open files" - bar charts, etc.)

PS: If you could stick up your flent.gz files somewhere I could get them, I could do a fuller writeup elsewhere (blog.cerowrt.org probably)

Thx VERY much for showing classic FQ result, also.

@AdSchellevis
OPNsense member

@dtaht I'm not sure why you expect 10-20ms. Without codel it was around 10-20ms under stress, and it dropped to 2ms with codel enabled. Maybe I'm missing something or misinterpreting the readings.
While testing, I had 2 different pipes (1 up, 1 down) limited to the max line speed (1Gbps), so shaping could certainly have impacted my performance a bit.

I deleted the *.flent.gz files from my machine, the measurements are probably not the best in the world. I can rerun the tests later under approx the same circumstances and send them to you then.
Can you provide me with an email address to send the files (or a link) to?

Thanks for the hint, I will certainly try the flent-gui too.

@dtaht
dtaht commented Apr 28, 2016 edited

From eyeballing the tests, you only got 400mbit in both directions on the gbit link, when you could theoretically crack about 880 in both directions simultaneously on a switched network, with suitably fast clients/servers driving the test. Your first test without anything in play (again, from eyeballing; there's a bar chart and totals chart in the flent-gui that makes it easier to read) hit 680/600 or so, which is still well below theoretical. So what you were measuring was loss elsewhere in the stack. Probably. Sure! The end result looks good (I personally will take low jitter and latency all the time at some cost in bandwidth vs big spikes of throughput, high jitter/latency/loss), but...
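
To put those eyeballed figures in proportion, here is a quick calculation. The numbers are the approximate values quoted in this comment, not measured data, and the labels are only illustrative since the comment does not say which direction each figure belongs to:

```python
THEORETICAL_MBIT = 880  # approximate per-direction ceiling quoted above


def percent_of_theoretical(mbit, theoretical=THEORETICAL_MBIT):
    """Return throughput as a whole-number percentage of the ceiling."""
    return round(100 * mbit / theoretical)


# Approximate readings from the plots, in Mbit/s.
readings = {
    "shaped, each direction": 400,
    "unshaped, first figure": 680,
    "unshaped, second figure": 600,
}

for label, mbit in readings.items():
    print(f"{label}: {mbit} Mbit/s = {percent_of_theoretical(mbit)}% of theoretical")
```

By this rough measure the shaped runs reached under half of the theoretical rate, and the unshaped run also fell well short of it.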

Try shaping to 100mbit on both sides of the link to see a difference between codel and fq_codel.

@AdSchellevis
OPNsense member

Yesterday I wasn't able to redo some testing, a bit too busy. I will try to run the 100Mbps test like you suggested next week (yes, my measurements were well below max, probably old / slow switches in between, a bit too busy to build a decent test setup ;) ).
If you drop me an email (ad at project domain), I will send you the results next week.

@dtaht
dtaht commented Apr 30, 2016 edited

one result I think you are showing is that fq_codel tends to drop stuff sanely when router cpu is overloaded. ;) I look forward to your results and I dropped you an email a few minutes ago.

@RasoolAlSaadi

I would like to announce that we released Dummynet AQM version 0.2.1 (CoDel, FQ-CoDel, PIE and FQ-PIE), which includes important bug fixes. I highly recommend upgrading to this version.

@fichtner
OPNsense member

@RasoolAlSaadi thank you, added via opnsense/src@74aa1a1

@skarekrow

So with the recent 16.1.14 does one just create a pipe with FlowQueue-CoDel as the scheduler and with Enable CoDel checked to use FQ-CoDel? Or does the Enable CoDel checkbox interfere with it?

Thanks for the great work guys!

@AdSchellevis
OPNsense member

@skarekrow you can either use FlowQueue-CoDel or use "Enable CoDel" on another scheduling mechanism (for example the default wf2q+). I don't think the checkbox actually does anything on FlowQueue-CoDel.
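
The two options can be sketched as the ipfw commands each would roughly correspond to. This is an illustration only and not taken from the OPNsense code: the helper name is hypothetical, the command strings follow the dummynet syntax in ipfw(8) as far as I know and should be checked against the man page, and dropping `codel` for the fq_codel scheduler reflects the tentative statement above rather than confirmed behaviour.

```python
def dummynet_commands(pipe_id, bandwidth, scheduler="wf2q+", enable_codel=False):
    """Return ipfw command strings for one pipe and its scheduler (dry run only)."""
    pipe = f"ipfw pipe {pipe_id} config bw {bandwidth}"
    # Assumption: plain CoDel on the pipe is only meaningful when the scheduler
    # is not fq_codel, which manages its own per-flow queues.
    if enable_codel and scheduler != "fq_codel":
        pipe += " codel"
    sched = f"ipfw sched {pipe_id} config pipe {pipe_id} type {scheduler}"
    return [pipe, sched]


# Option 1: FlowQueue-CoDel as the scheduler, checkbox state irrelevant.
for cmd in dummynet_commands(1, "100Mbit/s", scheduler="fq_codel", enable_codel=True):
    print(cmd)

# Option 2: default wf2q+ scheduler with "Enable CoDel" checked.
for cmd in dummynet_commands(2, "100Mbit/s", enable_codel=True):
    print(cmd)
```

Nothing is executed here; the strings are printed so they can be reviewed before being run on a test box.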

@skarekrow

@AdSchellevis Ah, thank you sir :)

@fichtner fichtner modified the milestone: 16.7, Future Jul 23, 2016