Poor localhost performance on Ubuntu #1206
Description
This topic has been covered in other issues, most notably #468. I'm opening this one because I suspect that issue has fallen by the wayside, since it was reported in 2017.
I have found that using TCP_QUICKACK (#1201) improves performance on Ubuntu. Unfortunately, localhost performance is still woefully bad: when both servers are tested on localhost, a simple Go (golang) web server is 3X faster than the cpprestsdk server on a t3a.small instance and roughly 4.6X faster on an m5.large instance.
I posted some test results in #468.
The simple cpprestsdk server shows higher latency, and only slightly better requests/sec, when tested with wrk on localhost than when I run wrk on my desktop against the server over the network. It isn't clear to me where the bottlenecks come from. I've seen handler times of 40ms+ in a simple handler like the following, which seems inconceivable on today's hardware given how little work it does.
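```cpp
#include <chrono>
#include <cstdio>
#include <utility>
#include <cpprest/http_listener.h>

// Set to true to print per-request handler timing.
static bool show_timing = true;

class handler
{
public:
    explicit handler( web::http::uri uri )
        : m_listener{ std::move( uri ) }
    {
        m_listener.support(
            web::http::methods::GET,
            []( web::http::http_request message ) {
                // steady_clock is monotonic, so it suits interval timing better than system_clock.
                auto start = std::chrono::steady_clock::now();
                message.reply( web::http::status_codes::OK, "Hello world!" ).then( [start]( pplx::task<void> t )
                {
                    try {
                        t.get(); // rethrows any exception raised by reply()
                        if ( show_timing ) {
                            auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
                                std::chrono::steady_clock::now() - start ).count();
                            std::printf( "Handler took %lld usec\n", static_cast<long long>( elapsed ) );
                        }
                    }
                    catch ( ... ) {
                        // Ignore reply failures (e.g. the client disconnected).
                    }
                } );
            } );
    }

private:
    web::http::experimental::listener::http_listener m_listener;
};
```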
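Printing from every completed request while wrk holds 1000 connections open puts console I/O on the same pplx worker threads that serve other requests, which can itself skew the numbers. A minimal sketch of an alternative, assuming a hypothetical `latency_histogram` class (not part of cpprestsdk): record each elapsed time into power-of-two microsecond buckets with relaxed atomic increments, and print percentiles once after the run.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: counts latencies in power-of-two microsecond buckets.
// Recording is a single relaxed atomic increment, cheap enough to call per request.
class latency_histogram
{
public:
    static constexpr std::size_t kBuckets = 32;

    // Bucket 0 holds 0 usec; bucket i holds [2^(i-1), 2^i) usec.
    // The last bucket also collects anything larger.
    void record( std::uint64_t usec )
    {
        std::size_t bucket = 0;
        while ( usec > 0 && bucket + 1 < kBuckets ) {
            usec >>= 1;
            ++bucket;
        }
        m_counts[bucket].fetch_add( 1, std::memory_order_relaxed );
    }

    // Returns the exclusive upper bound (usec) of the bucket containing the p-th percentile.
    std::uint64_t percentile_upper_bound( double p ) const
    {
        assert( p >= 0.0 && p <= 100.0 );
        std::uint64_t total = 0;
        for ( const auto& c : m_counts ) total += c.load( std::memory_order_relaxed );
        if ( total == 0 ) return 0;
        const double target = p / 100.0 * static_cast<double>( total );
        std::uint64_t seen = 0;
        for ( std::size_t i = 0; i < kBuckets; ++i ) {
            seen += m_counts[i].load( std::memory_order_relaxed );
            if ( static_cast<double>( seen ) >= target ) return std::uint64_t{ 1 } << i;
        }
        return std::uint64_t{ 1 } << ( kBuckets - 1 );
    }

private:
    std::array<std::atomic<std::uint64_t>, kBuckets> m_counts{};
};
```

In the handler, the `printf` call would become something like `g_latencies.record( elapsed );` (with `g_latencies` a hypothetical global `latency_histogram`), and `percentile_upper_bound( 50 )` and `percentile_upper_bound( 99 )` would be printed after wrk finishes. Each bucket spans a factor of two, so the result is coarse, but it is enough to tell 100 usec from 40 ms.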
I suspect that pplx plays a part in this poor performance. To test this, I ran a localhost test on my desktop with a separate io_service dedicated to servicing pplx tasks. Note that, because I am on an iMac Pro, this build uses an io_service-based pplx scheduler instead of GCD (Grand Central Dispatch), which is the default implementation in pplxapple.cpp.
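The separate-scheduler idea can be sketched independently of the macOS-specific code. The block below is a standalone stand-in, not the implementation used for the numbers that follow: a hypothetical `dedicated_scheduler` that owns its own worker threads and exposes a `schedule(function pointer, void*)` method with the same shape as `pplx::scheduler_interface::schedule`. It uses only the standard library, so it compiles without cpprestsdk.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Same shape as pplx's TaskProc_t: a plain function pointer taking an opaque argument.
using task_proc_t = void ( * )( void* );

// Hypothetical stand-in for a scheduler that owns its worker threads, instead of
// sharing the io_service used by the HTTP listener.
class dedicated_scheduler
{
public:
    explicit dedicated_scheduler( std::size_t threads )
    {
        assert( threads > 0 ); // with no workers, queued tasks would never run
        for ( std::size_t i = 0; i < threads; ++i )
            m_workers.emplace_back( [this] { run(); } );
    }

    // Signals shutdown; workers drain the remaining queue before exiting.
    ~dedicated_scheduler()
    {
        {
            std::lock_guard<std::mutex> lock( m_mutex );
            m_stopping = true;
        }
        m_cv.notify_all();
        for ( auto& w : m_workers ) w.join();
    }

    // Mirrors the shape of scheduler_interface::schedule(TaskProc_t, void*).
    void schedule( task_proc_t proc, void* param )
    {
        {
            std::lock_guard<std::mutex> lock( m_mutex );
            m_queue.emplace_back( proc, param );
        }
        m_cv.notify_one();
    }

private:
    void run()
    {
        for ( ;; ) {
            std::pair<task_proc_t, void*> item;
            {
                std::unique_lock<std::mutex> lock( m_mutex );
                m_cv.wait( lock, [this] { return m_stopping || !m_queue.empty(); } );
                if ( m_queue.empty() ) return; // stopping and fully drained
                item = m_queue.front();
                m_queue.pop_front();
            }
            item.first( item.second ); // run the task outside the lock
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::deque<std::pair<task_proc_t, void*>> m_queue;
    bool m_stopping = false;
    std::vector<std::thread> m_workers; // declared last so the other members exist when threads start
};
```

To plug something like this into the SDK, it would have to derive from `pplx::scheduler_interface` and be installed with `pplx::set_ambient_scheduler` (declared in `pplx/pplx.h` on non-Windows builds; check the header in your cpprestsdk version). That wiring is left out so the block stays self-contained. A single mutex-protected queue is the simplest design; under heavy contention, per-thread queues or an io_service would likely scale better.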
Using a shared io_service (slower than GCD):
```
Running 10s test @ http://localhost:8989
  1 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    18.82ms   20.66ms 183.89ms   93.33%
    Req/Sec    13.13k     5.27k  22.93k    67.35%
  130080 requests in 10.08s, 11.41MB read
  Socket errors: connect 750, read 2898, write 0, timeout 0
```
Using a separate io_service:
```
Running 10s test @ http://localhost:8989
  1 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    10.33ms   12.27ms 158.32ms   92.82%
    Req/Sec    19.40k     6.41k  33.77k    67.74%
  188477 requests in 10.02s, 16.54MB read
  Socket errors: connect 750, read 3466, write 0, timeout 0
Requests/sec:  18812.64
Transfer/sec:      1.65MB
```
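Putting the two runs side by side: the shared-io_service output has no Requests/sec line, so its rate is derived here from the request count and the reported duration (wrk computes Requests/sec the same way, from a more precise runtime).

```math
\begin{aligned}
\text{shared io\_service:}\quad & \frac{130080\ \text{requests}}{10.08\ \text{s}} \approx 12905\ \text{req/s} \\
\text{separate io\_service:}\quad & 18812.64\ \text{req/s} \\
\text{throughput ratio:}\quad & \frac{18812.64}{12905} \approx 1.46 \\
\text{average latency ratio:}\quad & \frac{10.33\ \text{ms}}{18.82\ \text{ms}} \approx 0.55
\end{aligned}
```

On this machine and load, the separate io_service therefore delivered about 46% more throughput with roughly 45% lower average latency.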
I tried applying the same change to pplxlinux.cpp but did not see an improvement. I have also found that the number of threads affects performance considerably. The default is 40 threads; the AWS instance types I'm using have 2 vCPUs, and if I reduce the thread count to 2-4, I get better performance.
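Since fewer threads helped on 2-vCPU instances, one option is to size the SDK's pool from the hardware instead of keeping the default of 40. cpprestsdk provides `crossplat::threadpool::initialize_with_threads` (in `pplx/threadpool.h`), which should be called before the shared pool is first used. The sizing function below is a hypothetical heuristic (two threads per hardware thread, minimum 2), an assumption rather than a measured rule, chosen only because it lands at 4 on a 2-vCPU instance.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical sizing heuristic (an assumption, not a tuned rule):
// two worker threads per hardware thread, never fewer than two.
std::size_t pool_size_for( unsigned hardware_threads )
{
    // std::thread::hardware_concurrency() may return 0 when the value is unknown.
    const unsigned hw = hardware_threads == 0 ? 1 : hardware_threads;
    const std::size_t size = std::max<std::size_t>( 2, std::size_t{ 2 } * hw );
    assert( size >= 2 ); // invariant: at least two workers
    return size;
}

// Pool size for the machine this process is running on.
std::size_t default_pool_size()
{
    return pool_size_for( std::thread::hardware_concurrency() );
}
```

For example, `crossplat::threadpool::initialize_with_threads( default_pool_size() );` could be the first statement of `main()`, before any listener or client is created. The multiplier is worth benchmarking per instance type; the observation above only shows that 2-4 threads beat 40 on 2 vCPUs.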
Clearly there is something wrong in the code.
Are there any plans to improve this SDK's performance? Or is the mantra simply "convenient enough to use" (don't look under the hood)?
I am hoping Microsoft actually reads the issues that come up here. One would like to think this SDK has some value to them in helping to show that Microsoft is a "player" in the cloud. Why would I want to use Azure or other MS products if their software performs demonstrably poorly? How do I know that Microsoft software in general isn't "leaving CPU on the table" through bad implementation? While pplx is "optimized" for Windows platforms, are we really sure it isn't wasteful of CPU even on Microsoft's own platforms? I've looked at the pplx code in the code base, and it is pretty difficult to get through.
Microsoft could use this SDK as one of many ways to show people that it is a leader in cloud computing.
Instead, the message is that it can't even deliver "reasonable performance" from a simple "Hello world!" HTTP server implemented in C++.