Poor localhost performance on Ubuntu

This topic has been covered in other issues, most notably #468. I'm opening a new issue because I suspect #468 has fallen by the wayside, having been reported in 2017.

Using `TCP_QUICKACK` (#1201) improves performance on Ubuntu. Localhost performance, however, is still woefully bad. Compared with a simple golang web server tested on localhost, the golang server is 3X faster on a t3a.small instance and roughly 4.6X faster on an m5.large instance.
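For anyone unfamiliar with the option, here is a minimal sketch of how `TCP_QUICKACK` is typically set on a Linux TCP socket. This is not the change from #1201, which I am not reproducing here; the helper name `enable_quickack` is hypothetical, and the code assumes a Linux target where `<netinet/tcp.h>` defines `TCP_QUICKACK`.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (not part of cpprestsdk): ask the Linux kernel to send
// ACKs immediately instead of delaying them. Returns true on success.
bool enable_quickack(int fd)
{
    int on = 1;
    return ::setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &on, sizeof(on)) == 0;
}
```

Per the Linux `tcp(7)` man page the flag is not permanent, so a server that relies on it would probably need to set it again after reads rather than once per connection.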

Note I posted some test results in #468.

When tested on localhost, the simple cpprestsdk server shows higher latency and only slightly better requests/sec than when I run `wrk` on my desktop against the server over the wire. It isn't clear to me where the bottlenecks are. I've seen 40ms+ in a simple handler like the following. It seems inconceivable that today's hardware would run that slowly when doing so little work.

```
    handler( web::http::uri uri )
      :  m_listener{ std::move( uri ) }
    {
      m_listener.support(
        web::http::methods::GET,
        []( web::http::http_request message ){
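          // Timestamp (microseconds since epoch) taken when the handler is entered.
          auto start = std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::system_clock::now().time_since_epoch()).count();
          message.reply( web::http::status_codes::OK, "Hello world!" ).then( [start](pplx::task<void> t)
            {
              try{
                t.get();  // rethrows any error raised while sending the reply
                if (show_timing) {
                    auto now = std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::system_clock::now().time_since_epoch()).count();
                    // %lld expects a signed long long, so cast to match the format.
                    printf("Handler took %lld usec\n", static_cast<long long>(now - start));
                }
              }
              catch(...){
                // errors from the reply are deliberately ignored in this test server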
              }
            } );
        } );
    }
```

I suspect that `pplx` has some hand in this poor performance. I ran a test on my desktop (localhost testing) in which I created a separate `io_service` just for servicing `pplx`. Note that although I am on an iMac Pro, this build uses the `io_service` implementation rather than GCD, which is the default in `pplxapple.cpp`.

## Using shared `io_service` (slower than GCD)
```
Running 10s test @ http://localhost:8989
  1 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    18.82ms   20.66ms 183.89ms   93.33%
    Req/Sec    13.13k     5.27k   22.93k    67.35%
  130080 requests in 10.08s, 11.41MB read
  Socket errors: connect 750, read 2898, write 0, timeout 0
```

## Using separate `io_service`
```
Running 10s test @ http://localhost:8989
  1 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    10.33ms   12.27ms 158.32ms   92.82%
    Req/Sec    19.40k     6.41k   33.77k    67.74%
  188477 requests in 10.02s, 16.54MB read
  Socket errors: connect 750, read 3466, write 0, timeout 0
Requests/sec:  18812.64
Transfer/sec:      1.65MB
```

I tried applying the same change to `pplxlinux.cpp`, but did not see an improvement. I have also found that the thread count affects results quite a bit. The default is 40. The AWS instance types I'm using have 2 vCPUs; if I reduce the number of threads to 2-4, I get better performance.
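To make the thread-count observation concrete, here is a rough sketch of sizing the pool from the machine instead of using a fixed 40. The helper `pick_pool_size` and its parameters are hypothetical, and the 1-2 threads per vCPU figure is only what worked in my tests, not a general recommendation.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: choose `per_core` threads per hardware thread,
// clamped to the range [1, max_threads].
std::size_t pick_pool_size(std::size_t hw_threads, std::size_t per_core, std::size_t max_threads)
{
    if (hw_threads == 0) {
        hw_threads = 1; // std::thread::hardware_concurrency() may return 0 when unknown
    }
    const std::size_t wanted = hw_threads * per_core;
    return std::max<std::size_t>(1, std::min(wanted, max_threads));
}
```

On a 2 vCPU instance, `pick_pool_size(std::thread::hardware_concurrency(), 2, 40)` would give 4. As far as I can tell the result could be passed to `crossplat::threadpool::initialize_with_threads` (declared in `pplx/threadpool.h`) before the shared pool is first used, but check that against the SDK version you build.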

Clearly there is something wrong in the code.

Are there any plans to improve this SDK regarding performance? Or is the mantra simply "convenient enough to use" (don't look under the hood)?

I am hoping Microsoft actually reads the issues that come up. One would like to think that this SDK has some value to them in helping to illustrate that Microsoft is a "player" in the cloud. Why would I want to use Azure or other MS products if their software is demonstrably poor in performance? How do I know that Microsoft software in general isn't "leaving CPU on the table" through bad implementation? While `pplx` is "optimized" for Windows platforms, are we really sure it isn't wasteful of CPU even on their own platforms? I've looked at the `pplx` code in the code base and it is pretty difficult to get through.

Microsoft could use this SDK as one of many ways to illustrate/showcase to people that they are leaders in cloud/computing. 

Instead the message is that they can't even support "reasonable performance" of a simple "Hello world!" HTTP server implemented in C++.

