Performance Measurement #4
Comments
Are you using the default makefile settings, which compile libuinet with no optimization at all, or have you modified them?
I have not yet modified them. I am going to try them next.
Can you also please suggest any other optimizations I can try to get better numbers? What about the equivalent sysctl changes? Also, in the above program, my server runs with libuinet and netmap in a VM, and my clients are on the host machine. Thanks...
Also, if I increase the client request rate, I see accept failing. The numbers printed here are from my test program, which prints the number of connections handled in that one second:

this 1 sec : connections 3379

I think it is because connections are being picked up from the accept queue too slowly. In this case, how can we increase the queue size?
I am not seeing the accept-failed errors after I increased the value of SOMAXCONN in sys/sys/socket.h. Now I have compiled libuinet and the sample application with the -O3 flag and run them with nice (19), and the maximum CPS I was able to achieve is 10K connections per second, which is less than an equivalent kernel-space TCP/IP application. top output:

top - 05:34:28 up 38 min, 3 users, load average: 1.30, 1.16, 1.04
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

My observation is that the nm_tx thread uses less CPU in the -O3 run, while the nm_rx thread does not. I also printed various constants used in this program and observed that even though the maxsockets number is good, the maxfiles number is very low.

uinet starting: cpus=1, nmbclusters=262144
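(On the queue-size question above: if uinet_solisten() mirrors listen(2), the backlog can also be raised at the call site instead of by editing headers. A minimal sketch, with the header name and prototype assumed from memory of uinet_api.h rather than confirmed:)

```c
#include "uinet_api.h"  /* assumed libuinet API header */

#define LISTEN_BACKLOG 4096  /* ask for a deeper accept queue than the default */

/* Hypothetical helper: uinet_solisten() is assumed to take a backlog
 * argument like listen(2); the stack may still clamp the value to its
 * own internal limit. */
static int
start_listening(struct uinet_socket *listener)
{
	return (uinet_solisten(listener, LISTEN_BACKLOG));
}
```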
Peter, Thank you for all of the detailed information on what results you are getting and how you are getting them. I am really busy, but I am working my way towards reproducing what you are seeing and will get back to you.

In the meantime, one thing you could try if you are up to it is batch-processing accepts in accept_cb. You can look at accept_cb() in bin/passive.c for an example. If you ignore all of the references to peer sockets and connections there, I think the structure is pretty straightforward to transfer to your test program. Batching accepts should reduce the total event loop overhead under high connection rates.
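(To illustrate the batching pattern - the authoritative example remains accept_cb() in bin/passive.c - here is a minimal sketch. The uinet_soaccept(), uinet_soclose(), and uinet_free_sockaddr() signatures are assumptions recalled from uinet_api.h and should be checked against the headers:)

```c
#include <stddef.h>
#include "uinet_api.h"  /* assumed libuinet API header */

#define ACCEPT_BATCH_LIMIT 32  /* cap the work done per event-loop wakeup */

/* Hypothetical upcall invoked when the listen socket has pending
 * connections. */
static void
accept_cb(struct uinet_socket *listener, void *arg)
{
	struct uinet_socket *so;
	struct uinet_sockaddr *sa;
	unsigned int n;

	(void)arg;

	/* Drain the accept queue in one pass instead of taking a single
	 * connection per wakeup; this amortizes the event-loop overhead. */
	for (n = 0; n < ACCEPT_BATCH_LIMIT; n++) {
		so = NULL;
		sa = NULL;
		if (uinet_soaccept(listener, &sa, &so) != 0)
			break;	/* queue drained, or a real error */

		/* ... per-connection work for the test program ... */

		uinet_soclose(so);
		if (sa != NULL)
			uinet_free_sockaddr(sa);
	}
}
```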
Hi Patrick, From the callgrind analysis we figured out that we spend considerable time when the server closes the connection. We thought we would let the server not close the connection immediately and see if the performance improves. But it looks like we can't keep more than 65K concurrent connections, and I am not sure whether this is limited by some constants/defines. Do you remember what we can change to increase the concurrent connection limit? I have sent you the callgrind screenshot by email, which will help you see where the time is being spent. Please let us know. Thanks, Peter
What command line are you using on the server side, and what are you using to drive traffic? The first thing I am thinking of, given the apparent 65K limit, is exhaustion of the 16-bit port space on the client side.

The only limit on the libuinet side should be the maximum number of sockets configured via the second parameter to uinet_init(). This limit is really an upper bound on the size of the pool used for socket context - making it a huge number at init time will not result in any immediate additional allocation, it will just allow the pool to grow that large if required during operation. If the issue is that you are hitting the limit due to connections being in time-wait, increasing that parameter should relieve it.

libuinet has been tested with up to 1 million concurrent listen sockets plus 1 million concurrent active sockets, which requires a suitably large value for the second parameter to uinet_init(), and also a suitable multiplicity of available {server_IP, server_port, client_IP, client_port} tuples.
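(To make the tuple-space point concrete, here is a back-of-the-envelope check; the counts below are illustrative assumptions, not measurements:)

```c
#include <stdio.h>

/* Rough capacity of the {server_IP, server_port, client_IP, client_port}
 * tuple space.  With one client IP and one server endpoint this lands
 * near the ~65K ceiling observed above; each extra client IP (or server
 * endpoint) multiplies the total. */
int main(void)
{
	unsigned long client_ips = 1;			/* IPs on the traffic generator */
	unsigned long ephemeral_ports = 65535 - 1024;	/* usable client ports per IP */
	unsigned long server_endpoints = 1;		/* {server_IP, server_port} pairs */

	printf("max concurrent connections: %lu\n",
	    client_ips * ephemeral_ports * server_endpoints);
	return (0);
}
```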
I was using one client machine, which I think is running out of ports. I will use multiple clients and let you know. On the second front, the max sockets value is set to 262144; the other parameters are shown below. Also, I have sent you the callgrind output by email.

uinet starting: cpus=1, nmbclusters=262144
OK. To answer an earlier question of yours regarding the small value of maxfiles: don't worry about it. The maxfiles parameter exists as part of the FreeBSD common kernel infrastructure that is in libuinet, but libuinet makes no use of it - there is no emulation or use of kernel file descriptors at all in libuinet.
Thanks. Please let me know what you find from the callgrind output. I am going to try to find the CPS without closing the accepted connections (as "soclose" was taking significant CPU cycles, as shown in the callgrind output). I am also going to replace arc4random with a simple static variable for the random number generation, to save the time spent in arc4random. With these two changes, let me see how much CPS I can get. I am just trying to figure out the places where we need to do some optimization. I know you are busy with your presentation tomorrow, so please look at this when you have time. I will keep you updated on my progress. Thanks...Peter
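(The arc4random() replacement mentioned above could be as simple as the following hypothetical stand-in, trading randomness for speed in the test program:)

```c
#include <stdint.h>

/* Hypothetical stand-in for arc4random() in the test program: a plain
 * incrementing counter, which removes the cipher overhead entirely at
 * the cost of any actual randomness. */
static uint32_t
fake_random(void)
{
	static uint32_t counter;

	return (counter++);
}
```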
I tried to see what CPS I can achieve without closing the socket; it was around 18K connections per second, which is +7K sessions compared with the open-and-close case. I would like to try disabling the syncache next. Could you please let me know whether and how I can disable it? Thanks
I am getting closer to the point where I can spend a little time digging into this. It is interesting that the close reduces performance so significantly. Until I can reproduce this on my end and have something more concrete to comment on, here are a couple of things that I think frame the issue:

- It is a known issue that FreeBSD performance is currently lagging in the area of short-lived connections - see http://www.freebsd.org/cgi/query-pr.cgi?pr=183659. This doesn't mean further tuning and application-side work won't improve the numbers you are seeing, but I think it does set expectations for how high the numbers might go.
- libuinet itself is just entering the phase where performance will be analyzed and improved. One of the things that really needs to happen ahead of this work is updating the stack sources libuinet is using to something considerably more recent than the 9.1-RELEASE version it currently uses. Not only do we want to avoid measuring and 'fixing' issues that no longer exist due to subsequent improvements in the mainline sources, but in cases where the libuinet work indicates there could be general improvements made to the stack itself, we want to avoid the work of then reproducing the issue with more current sources and developing equivalent patches for submission.
Thank you. Please let me know once you finish the integration; I can do the testing for you and help you identify any remaining quirks. I have also integrated a small webserver with libuinet to measure the RPS and CPS, and I have a KVM VM handy to measure the performance. Looking forward to hearing from you.
Is there a time frame for the migration away from 9.1-RELEASE? Also, I wonder whether the userland stack will lose any benefit from checksum offloading, which the kernel stack running on a physical box can enjoy (I understand petergsnm's tests were done on KVM).
See issue #11 for information on the migration away from 9.1-RELEASE.
I can't compile it on Linux! The errors are as follows:
Please don't piggyback on existing unrelated issues. Open a new issue for this and include the necessary context for interpreting your problem, such as the specific Linux distribution and version you are using, whether you are using something other than the stock compiler for that distribution, the command you executed, and the directory you executed it in.
I was measuring the performance of libuinet on a KVM VM (which uses virtio) with one core. I performed two tests: first, a simple TCP server on the kernel stack (without libuinet), which reached about 200K PPS with the sysctl changes listed below. Next, I ran the same server on libuinet, and the numbers dropped.

I am not sure why the performance gets reduced when I run with libuinet. I am planning to run callgrind and see where in libuinet the time goes, but I expected libuinet to give me better performance.
I can't attach my simple TCP programs here, but if you drop me an email at peter.gsnm@gmail.com, I can share my sample programs.
One other thing: to get the 200K PPS (without libuinet), I made the sysctl changes listed below. I am not sure whether I need to modify some of your header files to get the equivalent effect with libuinet; can you please point me to where I need to make the changes? (One possible approach is sketched after the list.)
fs.file-max = 5000000
net.core.netdev_max_backlog = 4000000
net.core.optmem_max = 10000000
net.core.rmem_default = 10000000
net.core.rmem_max = 3000000
net.core.somaxconn = 10000000
net.core.wmem_default = 10000000
net.core.wmem_max = 30000000
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_congestion_control = bic
net.ipv4.tcp_ecn = 0
net.ipv4.tcp_max_syn_backlog = 65000
net.ipv4.tcp_max_tw_buckets = 6000000
net.ipv4.tcp_mem = 30000000 30000000 30000000
net.ipv4.tcp_rmem = 30000000 30000000 30000000
net.ipv4.tcp_sack = 1
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_wmem = 30000000 30000000 30000000
net.ipv4.tcp_early_retrans = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
net.ipv4.tcp_fin_timeout = 7
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_low_latency = 1
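(Sketch referenced above: since libuinet embeds the FreeBSD stack, the Linux sysctls map to FreeBSD names rather than header edits - for example, net.core.somaxconn roughly corresponds to kern.ipc.somaxconn, and net.ipv4.ip_local_port_range to net.inet.ip.portrange.first/last. The uinet_sysctlbyname() signature below is an assumption recalled from uinet_api.h and must be verified there:)

```c
#include <stddef.h>
#include "uinet_api.h"  /* assumed libuinet API header */

/* Hypothetical wrapper around libuinet's sysctl-by-name entry point;
 * verify the real prototype in uinet_api.h before use. */
static int
set_int_sysctl(const char *name, int value)
{
	return (uinet_sysctlbyname(name, NULL, NULL,
	    (char *)&value, sizeof(value), NULL, 0));
}

static void
apply_freebsd_equivalents(void)
{
	set_int_sysctl("kern.ipc.somaxconn", 4096);		/* ~ net.core.somaxconn */
	set_int_sysctl("net.inet.tcp.syncookies", 0);		/* ~ net.ipv4.tcp_syncookies */
	set_int_sysctl("net.inet.ip.portrange.first", 1024);	/* ~ ip_local_port_range */
}
```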
Please let me know what you think.
Thank you.
~Peter