This repository has been archived by the owner on May 1, 2020. It is now read-only.

Performance Benchmarking #17

Open
KernelPryanic opened this issue Apr 2, 2018 · 3 comments
Labels: performance (Performance and/or resource usage improvement), question (Further information is requested), undecided (The solution is in discussion and yet undecided)

Comments

@KernelPryanic
Collaborator

Are there already any performance benchmarking results available?

@romshark
Owner

romshark commented Apr 2, 2018

I just published a small webwire benchmarking tool.

  1. Start the test server: go run test-server.go
  2. Run the benchmark: go run benchmark.go

The following parameters are available:

  • bench-dur: benchmark duration in seconds (default: 10)
  • addr: address of the target test server like localhost:80 (default: :8081)
  • clients: number of concurrent clients (default: 10)
  • req-timeo: default request timeout (default: 10000)
  • min-req-itv: min interval between each request in milliseconds (default: 250)
  • max-req-itv: max interval between each request in milliseconds (default: 500)
  • min-pld-sz: min request payload size in bytes (default: 32)
  • max-pld-sz: max request payload size in bytes (default: 128)
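A minimal sketch of how these parameters could be declared with Go's standard flag package. The actual benchmark.go source isn't shown in this thread, so the benchConfig struct, the parseConfig function and the min/max sanity check are hypothetical; only the flag names and defaults come from the list above.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// benchConfig is a hypothetical container for the parameters listed above.
type benchConfig struct {
	BenchDur  uint   // benchmark duration in seconds
	Addr      string // address of the target test server
	Clients   uint   // number of concurrent clients
	ReqTimeo  uint   // default request timeout
	MinReqItv uint   // min interval between requests in milliseconds
	MaxReqItv uint   // max interval between requests in milliseconds
	MinPldSz  uint   // min request payload size in bytes
	MaxPldSz  uint   // max request payload size in bytes
}

// parseConfig registers the flags on its own FlagSet, so it can be called
// with any argument slice (os.Args[1:] in main, literals in tests).
func parseConfig(args []string) (benchConfig, error) {
	var c benchConfig
	fs := flag.NewFlagSet("benchmark", flag.ContinueOnError)
	fs.UintVar(&c.BenchDur, "bench-dur", 10, "benchmark duration in seconds")
	fs.StringVar(&c.Addr, "addr", ":8081", "address of the target test server")
	fs.UintVar(&c.Clients, "clients", 10, "number of concurrent clients")
	fs.UintVar(&c.ReqTimeo, "req-timeo", 10000, "default request timeout")
	fs.UintVar(&c.MinReqItv, "min-req-itv", 250, "min interval between requests (ms)")
	fs.UintVar(&c.MaxReqItv, "max-req-itv", 500, "max interval between requests (ms)")
	fs.UintVar(&c.MinPldSz, "min-pld-sz", 32, "min request payload size (bytes)")
	fs.UintVar(&c.MaxPldSz, "max-pld-sz", 128, "max request payload size (bytes)")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	// Assumed sanity check (not described in the original): min must not exceed max.
	if c.MinReqItv > c.MaxReqItv || c.MinPldSz > c.MaxPldSz {
		return c, errors.New("min values must not exceed the corresponding max values")
	}
	return c, nil
}

func main() {
	c, err := parseConfig(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", c)
}
```

Using a dedicated FlagSet instead of the global flag functions keeps the parsing testable without touching os.Args.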

Here's an example of a 60-second benchmark with 1,000 concurrent clients, each sending requests with a 1 KiB payload at intervals of 10 to 30 milliseconds:

go run benchmark.go -clients 1000 -min-req-itv 10 -max-req-itv 30 -min-pld-sz 1024 -max-pld-sz 1024 -req-timeo 60000 -bench-dur 60

And here are the results of the above benchmark:

2018/04/02 21:20:19   Benchmark finished (60s)

  Requests performed:  1892900
  Requests timed out:  0

  Data sent:           1.81 GiB (1938329600 bytes)
  Data received:       1.81 GiB (1938329600 bytes)
  Avg payload size:    1.00 KiB

  Avg req itv:         19.955008ms
  Max req itv:         29ms
  Min req itv:         10ms

  Avg req time:        9.420078ms
  Max req time:        832.1403ms
  Min req time:        1.0004ms

  Req/s:               31548
  Bytes/s:             32305493
  Throughput:          30.81 MiB/s

System: Intel i7 3930K hexa-core @ 3.8 GHz; 64 GB DDR3 RAM @ 1833 MHz

As you can see, I was able to achieve around 31.5k requests per second with an average reply time of about 9 milliseconds at 1k concurrent clients.

@romshark romshark added the question Further information is requested label Apr 2, 2018
@romshark romshark self-assigned this Apr 2, 2018
@romshark
Owner

romshark commented Apr 3, 2018

Beware

The benchmark runs amok on Windows 10 when there are many concurrent connections.

Windows 10

It seems that TCP/IP connection establishment is very slow on Windows, which causes huge problems when creating many concurrent connections (> 1000). That many connections invoke an enormous number of syscalls on Windows. Because each syscall-blocked goroutine ties up an OS thread, the Go runtime ends up spawning thousands of OS threads, and the machine becomes unresponsive once the count reaches 10k threads.

[Screenshot: trace_benchmark_windows10]

In the above screenshot, the trace shows the enormous number of syscalls, the gradually degrading performance and the ever-growing number of spawned OS threads.

MacOS High Sierra

I also tested the same configuration on MacOS High Sierra and got very different results:

[Screenshot: trace_benchmark_macos_highsierra]

The Mac performed just fine with only 27 OS threads: no degrading performance and no syscall spam.

Conclusion

It looks more like a Windows-related problem than a WebWire server/client problem.

@romshark romshark added the undecided The solution is in discussion and yet undecided label Apr 3, 2018
@romshark
Owner

I performed a load test using the latest revision and got the following results:

Results

  Concurrent connections:  10,000
  Request payload:         1-64 KiB
  Requests performed:      5,919,046
  Timeout rate:            0.00%
  Sent:                    183.44 GiB
  Received:                183.44 GiB
  Throughput:              313.07 MiB/s
  Requests per second:     9,865
  Average latency:         1 millisecond
  Maximum latency:         4.23 seconds

Test System

Intel i7 3930K (12 threads @ 3.8 GHz, reached full load at 72°C)
64 GB DDR3 1833 MHz (around 4.75 GB was used during the benchmark)

Note that both the benchmark and the server ran on this same machine, which distorts the results; they could potentially be higher if the two were run on separate machines.

@romshark romshark added the performance Performance and/or resource usage improvement label Aug 31, 2018