poor NAT & networking performance #7857

Closed
hustcat opened this issue Sep 3, 2014 · 20 comments

Comments

@hustcat hustcat commented Sep 3, 2014

I use netperf to test network performance. These are some results:

| network     | packet size | Sum Trans Rate/s |
|-------------|-------------|------------------|
| no docker   | 1           | 742020           |
| Bridge+NAT  | 1           | 213721           |
| Bridge only | 1           | 432079           |
| docker host | 1           | 674737           |

As we can see, NAT performance is very poor, and bridge-only performance is also degraded. Is there any way to improve performance while maintaining network isolation, such as SR-IOV in KVM?

@unclejack unclejack (Contributor) commented Sep 3, 2014

@hustcat How were you benchmarking?

@hustcat hustcat (Author) commented Sep 4, 2014

Run netserver on one machine, and run netperf on 4 other machines with 400 processes per machine. This is the client script:

```bash
#!/bin/bash
# Launch <proc_num> concurrent netperf TCP_RR clients against the netserver host.
if [ $# -lt 2 ]; then
    echo "Usage: $0 <proc_num> <base_port>"
    exit 1
fi

num=$1
base=$2
port=$base
i=0
while [ $i -lt $num ]
do
    bin/netperf -H 10.x.x.x -p 12865 -l 300 -t TCP_RR -- -r 1,1 -P 0,$port &
    i=$(expr $i + 1)
    port=$(expr $i + $base)
done
```
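For example, assuming the script above is saved as client.sh, one machine's 400 clients might be started like this (the base port is illustrative):

```bash
./client.sh 400 20000
```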

@unclejack unclejack changed the title The NAT performance is poor, Is there any way improve it? poor NAT & networking performance Oct 10, 2014
@hustcat hustcat (Author) commented Nov 6, 2014

For bridge only, see #8277: the qdisc of the veth device becomes the bottleneck, see here.
Some test results:

| network_mode                    | Sum Trans Rate/s |
|---------------------------------|------------------|
| no docker                       | 742020           |
| bridge only                     | 432079           |
| bridge only (veth txqueuelen=0) | 704440           |

As we can see, with the veth's txqueuelen set to zero, the performance loss of bridge-only is small.
But for NAT, the conntrack module of the kernel becomes the bottleneck, and there seems to be no way to optimize it.
Overall, the bridge consumes a lot of CPU; MACVLAN is better. However, the kernel is working toward a non-promiscuous bridge, see here.
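For reference, a minimal sketch of the txqueuelen workaround discussed above, assuming the container's host-side veth device is named veth1234 (the real name differs per container):

```bash
# List veth devices, then zero the transmit queue length of the container's peer.
ip link show type veth
ip link set dev veth1234 txqueuelen 0   # veth1234 is a placeholder name
```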

@gdm85 gdm85 (Contributor) commented Nov 8, 2014

Is there an issue already covering native support for MACVLAN in Docker? I've read how to accomplish it here, but I would like to follow the issue that could unlock this feature.
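For context, before native support this was typically wired up by hand; a rough sketch, with the interface names, the address, and $CONTAINER_PID all illustrative:

```bash
# Create a macvlan child of eth0 and move it into the container's network namespace.
ip link add link eth0 name macvlan0 type macvlan mode bridge
ip link set macvlan0 netns "$CONTAINER_PID"
nsenter -t "$CONTAINER_PID" -n ip addr add 192.168.1.50/24 dev macvlan0
nsenter -t "$CONTAINER_PID" -n ip link set macvlan0 up
```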

@unclejack unclejack (Contributor) commented Nov 21, 2014

@hustcat Have you benchmarked NAT with bridge and veth txqueuelen=0? It would be interesting to see where that would fit in your benchmark above.

@hustcat hustcat (Author) commented Nov 23, 2014

@unclejack Yes, I've tested this. The result (243145/s) is better than NAT with the default veth txqueuelen, but it is still very poor, because the kernel's conntrack module (which NAT uses) becomes the bottleneck.

@oncletom oncletom commented Dec 19, 2014

I have the same issue for a container running in KVM. I am still not sure why, as the other containers' connectivity is fine. For some reason I also cannot access this container through localhost: I have to explicitly use the IP of the docker0 network interface.

@fulltopic fulltopic commented Jun 18, 2015

I've tested UDP bandwidth with a single netperf instance with the same configuration as your "No NAT" case, and the result is that the container gets about 2/3 of the host-to-host bandwidth. Is that reasonable?

Container to Host:
64Byte: 80Mbps
1024Byte: 1251Mbps
8192Byte: 4009Mbps

Bridge to Host:
64Byte: 146.3Mbps
1024Byte: 2239Mbps
8192Byte: 6355Mbps

CPU: 100%
Command: netperf -c -L 172.17.42.131 -H 172.17.42.170 -t UDP_STREAM -l 20 -T12,12 -P0 -- -r 1024,1024

Linux compute2 3.10.74-rt79 #2 SMP PREEMPT RT Fri May 29 15:30:35 CST 2015 x86_64 x86_64 x86_64 GNU/Linux

I already had txqueuelen = 0 set.

I found the cause: I had not enabled RPS.
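For reference, RPS (Receive Packet Steering) is configured per RX queue through sysfs; a minimal sketch, assuming interface eth0 and a CPU mask of f (CPUs 0-3; both are illustrative):

```bash
# Spread receive processing for eth0's first RX queue across CPUs 0-3.
echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus
```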

@cpuguy83 cpuguy83 (Contributor) commented Jul 23, 2015

Is this actionable, or is it just a side effect of using veth?
Also, in your NAT test, are you certain the traffic was not routed through the userland proxy?

@unclejack unclejack (Contributor) commented Jul 23, 2015

@cpuguy83 This is indeed the kind of performance you get through NAT. It is still a problem, because that's the default and some users resort to host networking to get around it.
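For context, the host-networking workaround mentioned above bypasses the bridge and NAT entirely (the image name is illustrative):

```bash
# Attach the container directly to the host's network stack: no veth, no bridge, no NAT.
docker run --net=host my-image
```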

@priyadarsh priyadarsh commented Jun 17, 2016

We are implementing microservices using Docker, and the poor network performance is something we found in our performance tests. For the time being, we are using host networking instead of the default bridge network. However, I am curious whether there is any plan to fix this issue. This ticket has been open for over a year with no updates.

@cpuguy83 cpuguy83 (Contributor) commented Jun 17, 2016

@priyadarsh How are you using the network?
Most people should not even notice performance issues with the network.

@priyadarsh priyadarsh commented Jun 19, 2016

@cpuguy83 Hi. We have deployed a REST/JSON-based microservice as a Docker image, and things work fine when the response size is in KBs. However, there are cases where the response size exceeds 3 MB. In such cases, the download time is over a minute. We tried gzipping the response, but to our surprise it took more time. The difference in network latency between host and bridge is very evident with such payload sizes.

@cpuguy83 cpuguy83 (Contributor) commented Jun 19, 2016

@priyadarsh I'm more interested in how you are accessing these services.
Are they on the same host? By what means are they communicating?

@priyadarsh priyadarsh commented Jun 20, 2016

@cpuguy83 The client consuming this service (via HTTP) is running on a different host and is not deployed as a Docker image.

@cpuguy83 cpuguy83 (Contributor) commented Jun 20, 2016

@priyadarsh Thank you. I would not expect the bridge interfaces or NAT to add that much overhead.
Maybe it's something else? MTU?
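For reference, a quick way to check for an MTU mismatch on the host (interface names are the common defaults, not guaranteed):

```bash
# Compare the Docker bridge MTU with the physical uplink's MTU.
ip link show docker0 | grep -o 'mtu [0-9]*'
ip link show eth0 | grep -o 'mtu [0-9]*'
```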

@tactical-drone tactical-drone commented Jun 22, 2016

docker-proxy seems to use a lot of CPU.

Therefore: if your CPU is slow, so will your networking be.

I really thought they could have achieved networking with some netfilter tricks instead of an executable that burns CPU. Unlucky.

@cpuguy83 cpuguy83 (Contributor) commented Jun 22, 2016

@pompomJuice docker-proxy is (or should be) only used for local traffic; it exists to facilitate hairpinning traffic back into the container.
If you are accessing the public host port from within the same host... basically, just don't do this.
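In other words, from the same host, talk to the container's bridge IP directly rather than the published host port; a sketch (the container name and port are illustrative):

```bash
# Resolve the container's bridge IP and connect to it directly, skipping docker-proxy.
IP=$(docker inspect -f '{{.NetworkSettings.IPAddress}}' mycontainer)
curl "http://$IP:8080/"
```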

@tactical-drone tactical-drone commented Jun 23, 2016

Aah, I see.

Thanks @cpuguy83

asfgit pushed a commit to apache/aurora-packaging that referenced this issue Aug 2, 2016
The default bridge network is known to be slow (1) and potentially
flaky (2). Switching to host networking is a desperate attempt to reduce
flaps in our nightly package builds.

(1) moby/moby#7857
(2) moby/moby#11407

Reviewed at https://reviews.apache.org/r/50716/
@cpuguy83 cpuguy83 (Contributor) commented Aug 11, 2016

Docker 1.12 has support for macvlan and ipvlan (L2 and L3), which should give even better performance than bridge networking and does not require NAT.
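For reference, a minimal sketch of using the new driver, assuming the host's uplink is eth0 and a subnet that matches the physical network (all values illustrative):

```bash
# Create a macvlan network on top of eth0 and run a container attached to it.
docker network create -d macvlan \
  --subnet=192.168.1.0/24 --gateway=192.168.1.1 \
  -o parent=eth0 macnet
docker run --rm --net=macnet alpine ip addr
```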

Closing; I believe this solves the problem.
