feat: network shaping #1280

Merged
merged 2 commits into master from feat/network-limits on Aug 8, 2018

Conversation

@3Hren 3Hren commented Aug 7, 2018

This enables network traffic shaping on the worker, making it possible to limit network bandwidth for each deal separately.

How it works

Internally this is achieved using the Linux kernel traffic control (tc) mechanism, wired up with Docker.

Our first attempt used policing, which drops excess packets, thereby throttling TCP window sizes and reducing the overall output rate of the affected traffic streams. Burst sizes are tricky to set properly, and overly aggressive values led to excess packet drops and throttled the overall output rate, particularly for TCP-based flows.
All of the above applies to the TBF (token bucket filter) classless discipline, which is the easiest way to shape network traffic.
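
For reference, a minimal tc-level sketch of such a TBF/policing setup. The device name, burst, and latency values are illustrative assumptions, not the exact parameters the worker uses:

# Shape egress with a classless TBF qdisc (200 Kbit/s; burst/latency are guesses).
tc qdisc add dev eth0 root tbf rate 200kbit burst 32kbit latency 400ms

# Police ingress at 5 Mbit/s by dropping packets that exceed the rate.
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 \
    police rate 5mbit burst 64k drop flowid :1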

An example of iperf3 with ingress limited to 5 Mbit/s and egress limited to 200 Kbit/s using the TBF qdisc.

root@da2026107fa8:/# iperf3 -c bouygues.iperf.fr -p 5201 -t10 -R
Connecting to host bouygues.iperf.fr, port 5201
Reverse mode, remote host bouygues.iperf.fr is sending
[  4] local 172.18.0.2 port 51404 connected to 89.84.1.222 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   687 KBytes  5.63 Mbits/sec
[  4]   1.00-2.00   sec   597 KBytes  4.89 Mbits/sec
[  4]   2.00-3.00   sec   444 KBytes  3.64 Mbits/sec
[  4]   3.00-4.00   sec   725 KBytes  5.94 Mbits/sec
[  4]   4.00-5.00   sec   584 KBytes  4.78 Mbits/sec
[  4]   5.00-6.00   sec   583 KBytes  4.77 Mbits/sec
[  4]   6.00-7.00   sec   584 KBytes  4.78 Mbits/sec
[  4]   7.00-8.00   sec   584 KBytes  4.78 Mbits/sec
[  4]   8.00-9.00   sec   584 KBytes  4.78 Mbits/sec
[  4]   9.00-10.00  sec   581 KBytes  4.76 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  7.01 MBytes  5.88 Mbits/sec  143             sender
[  4]   0.00-10.00  sec  6.06 MBytes  5.09 Mbits/sec                  receiver

iperf Done.
root@da2026107fa8:/# iperf3 -c bouygues.iperf.fr -p 5201 -t10
Connecting to host bouygues.iperf.fr, port 5201
[  4] local 172.18.0.2 port 51408 connected to 89.84.1.222 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   205 KBytes  1.68 Mbits/sec   24   5.66 KBytes
[  4]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    9   2.83 KBytes
[  4]   2.00-3.00   sec  79.2 KBytes   649 Kbits/sec   11   5.66 KBytes
[  4]   3.00-4.00   sec  73.5 KBytes   602 Kbits/sec   13   4.24 KBytes
[  4]   4.00-5.00   sec  63.6 KBytes   521 Kbits/sec   13   5.66 KBytes
[  4]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec   15   4.24 KBytes
[  4]   6.00-7.00   sec  63.6 KBytes   521 Kbits/sec   11   5.66 KBytes
[  4]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec   13   2.83 KBytes
[  4]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec   12   2.83 KBytes
[  4]   9.00-10.00  sec  66.5 KBytes   544 Kbits/sec   13   2.83 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   551 KBytes   452 Kbits/sec  134             sender
[  4]   0.00-10.00  sec   420 KBytes   344 Kbits/sec                  receiver

iperf Done.


As you can see, shaping egress traffic this way is quite erratic. Moreover, it is impossible (or at least quite hard) to limit the network for the entire worker or for each container separately.
All of this forced us to investigate classful disciplines.

An alternative approach uses the HTB (hierarchical token bucket) queueing discipline, which is classful and allows building hierarchical rules for traffic shaping and policing. For egress traffic an intermediate functional block (IFB) device is used, which has its own packet queue.
This makes it possible to build hierarchical rules per packet type, network device, etc., and moreover to restrict traffic for each container and/or for the entire worker.
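
Roughly, the scheme looks like this in tc terms (device names and rates are illustrative assumptions, not the worker's exact configuration):

# HTB hierarchy on the deal bridge: a root qdisc with per-class rate limits.
tc qdisc add dev br-deal root handle 1: htb default 10
tc class add dev br-deal parent 1: classid 1:10 htb rate 10mbit ceil 10mbit

# The opposite direction is handled by an IFB device, which has its own packet
# queue and can carry a separate HTB hierarchy.
ip link add ifb-deal type ifb
ip link set ifb-deal up
tc qdisc add dev ifb-deal root handle 1: htb default 10
tc class add dev ifb-deal parent 1: classid 1:10 htb rate 5mbit ceil 5mbit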

An example of iperf3 with ingress limited to 10 Mbit/s and egress limited to 5 Mbit/s using HTB.

root@86148dfb9775:/# iperf3 -c bouygues.iperf.fr -p 5201 -t10 -R
Connecting to host bouygues.iperf.fr, port 5201
Reverse mode, remote host bouygues.iperf.fr is sending
[  4] local 172.18.0.2 port 56362 connected to 89.84.1.222 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec  1.12 MBytes  9.36 Mbits/sec
[  4]   1.00-2.00   sec  1.11 MBytes  9.29 Mbits/sec
[  4]   2.00-3.00   sec  1.06 MBytes  8.86 Mbits/sec
[  4]   3.00-4.00   sec  1.12 MBytes  9.38 Mbits/sec
[  4]   4.00-5.00   sec  1.14 MBytes  9.54 Mbits/sec
[  4]   5.00-6.00   sec  1.11 MBytes  9.29 Mbits/sec
[  4]   6.00-7.00   sec  1.06 MBytes  8.93 Mbits/sec
[  4]   7.00-8.00   sec  1.02 MBytes  8.57 Mbits/sec
[  4]   8.00-9.00   sec  1.13 MBytes  9.46 Mbits/sec
[  4]   9.00-10.00  sec  1.11 MBytes  9.28 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  11.2 MBytes  9.36 Mbits/sec   56             sender
[  4]   0.00-10.00  sec  11.1 MBytes  9.30 Mbits/sec                  receiver

iperf Done.
root@86148dfb9775:/# iperf3 -c bouygues.iperf.fr -p 5201 -t10
Connecting to host bouygues.iperf.fr, port 5201
[  4] local 172.18.0.2 port 56366 connected to 89.84.1.222 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  1.15 MBytes  9.68 Mbits/sec    0    100 KBytes
[  4]   1.00-2.00   sec   636 KBytes  5.21 Mbits/sec    6   56.6 KBytes
[  4]   2.00-3.00   sec   573 KBytes  4.69 Mbits/sec    0   62.2 KBytes
[  4]   3.00-4.00   sec   573 KBytes  4.69 Mbits/sec    0   69.3 KBytes
[  4]   4.00-5.00   sec   573 KBytes  4.69 Mbits/sec    0   74.9 KBytes
[  4]   5.00-6.00   sec   573 KBytes  4.69 Mbits/sec    0   80.6 KBytes
[  4]   6.00-7.00   sec   573 KBytes  4.69 Mbits/sec    0   86.3 KBytes
[  4]   7.00-8.00   sec   573 KBytes  4.69 Mbits/sec    0   90.5 KBytes
[  4]   8.00-9.00   sec   509 KBytes  4.17 Mbits/sec    4   74.9 KBytes
[  4]   9.00-10.00  sec   700 KBytes  5.73 Mbits/sec    9   41.0 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  6.31 MBytes  5.29 Mbits/sec   19             sender
[  4]   0.00-10.00  sec  5.71 MBytes  4.79 Mbits/sec                  receiver

iperf Done.


It is now clear that shaping in both directions works without packet-drop spikes. Epic win.

Workflow

After an ask plan is created, the following actions are performed (a rough command-level sketch follows the list):

  • Create a bridge device with its own iptables rules, where all containers for a specific deal will live. This bridge allows tasks to communicate with each other, but not with tasks spawned within another deal; for that purpose you have to use overlay network drivers.
  • Apply TC rules to that bridge to limit ingress traffic. Yep, ingress, because a container's egress traffic appears as ingress traffic to the Docker bridge device.
  • Create an IFB virtual device.
  • Mirror egress traffic to that IFB device by applying TC filters, and create separate rules for that traffic.
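
A shell-level sketch of these steps. The network, bridge, and IFB names below are made up for illustration, and the worker itself goes through netlink (see the to-do list below) rather than the CLI:

# 1. A per-deal bridge network; containers of one deal attach to it and stay
#    isolated from containers of other deals.
docker network create --driver bridge \
    --opt com.docker.network.bridge.name=br-deal0 deal0

# 2. An ingress qdisc on the bridge (traffic emitted by containers arrives here
#    as ingress), used to attach filters.
tc qdisc add dev br-deal0 handle ffff: ingress

# 3. An IFB virtual device.
ip link add ifb-deal0 type ifb
ip link set ifb-deal0 up

# 4. Redirect that traffic to the IFB device and shape it there, e.g. with an
#    HTB hierarchy like the one shown earlier.
tc filter add dev br-deal0 parent ffff: protocol ip u32 match u32 0 0 \
    action mirred egress redirect dev ifb-deal0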

What is left to do

  • Custom Docker bridges for each deal.
  • Attach deal tasks to their bridges.
  • TBF egress shaping.
  • TBF ingress shaping.
  • Utilize netlink.
  • Garbage-collect networks that are no longer used.
  • Use classful disciplines, like HTB, because they give smoother ingress traffic shaping in conjunction with IFB (Intermediate Functional Block) devices.
  • Restore network after worker restart.
  • Do not return too much detail to users about exactly what went wrong. For example, no one cares that the fifth filter was not applied; it simply means the network could not be configured for some reason.
  • Restrict by build flags to allow Travis compilation without installing libnl3.

NOTE: the previous PR was imprudently merged with conflicts, so it was reverted. Sorry.

@3Hren 3Hren added the S: Worker, T: feature, P: medium, and V: minor labels on Aug 7, 2018
nikonov1101
nikonov1101 previously approved these changes Aug 8, 2018
@3Hren 3Hren force-pushed the feat/network-limits branch 2 times, most recently from 552b4f9 to 42975a5 on August 8, 2018 15:54
@3Hren 3Hren merged commit 615b9be into master Aug 8, 2018
@3Hren 3Hren deleted the feat/network-limits branch August 8, 2018 16:24