
Ensure the eWallet continues performing well under heavy load testing #359

Closed
T-Dnzt opened this issue Jul 24, 2018 · 8 comments
Labels: kind/ops

@T-Dnzt commented Jul 24, 2018

No description provided.

T-Dnzt added this to the v1.0 milestone on Jul 24, 2018
@unnawut (Contributor) commented Jul 26, 2018

This is a crude but quick load test done within a few hours with Apache Bench. It's nowhere near accurate but given the setup, I believe it can serve as a minimum baseline:

Setup:

  • eWallet server ran on a 15" 2013 MacBook Pro (quad-core i7, 2.3 GHz)
  • Wifi network 😟
  • Used the basic ab benchmark (= pure brute force; see the sketch after this list)
  • Used dev environment config
  • Tested against two read-only endpoints only, i.e. /api/admin and /api/admin/token.all
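
For reference, this is roughly the shape of the ab runs; a minimal sketch only, assuming a local dev server on port 4000 and leaving out any authentication headers the real runs would have passed with -H:

```sh
# Minimal sketch of the ab runs above (host/port and exact flags are assumptions).
# ab's -n is the TOTAL number of requests, -c the concurrency level, so a run
# with concurrency 200 and 50 reqs/concurrency maps to -n 10000 -c 200.

# Read-only root endpoint
ab -n 10000 -c 200 -k http://localhost:4000/api/admin

# token.all is assumed here to be an RPC-style POST endpoint, so ab needs a
# request body and content type (an auth header would normally be added too).
echo '{}' > body.json
ab -n 10000 -c 200 -k -p body.json -T 'application/json' \
  http://localhost:4000/api/admin/token.all
```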

Cross-machine result:

  • Ping latency 65ms
  • Was able to do 200 requests concurrently with 170ms response time (including network latency)
  • Throughput 1,000 req/s
  • Beyond 200 concurrent requests, response time and throughput degraded significantly

☝️ Test infra limitation: a clean Phoenix project gets similar results.

Same-machine result:

  • Consecutive single requests: a consistent ~40ms response time.
  • 200 concurrent requests give < 100ms response time
  • Throughput ~2,000 req/s

Other observations:

  • Even so, the server came nowhere near 100% CPU or memory usage
  • We will need to set up a proper load testing suite (dedicated environment, network, load testing tools, test scripts with ramp up/down, etc.) for more accurate/realistic results, but at least this gives some quick data.

Summarized data:

| Script | Path | Concurrency | Reqs/concurrency | Reqs/second | Mean response time (ms) |
|---|---|---|---|---|---|
| static | /api/admin | 1 | 100 | 15.66 | 63.88 |
| | | 10 | 10 | 88.21 | 113.36 |
| | | 10 | 50 | 92.78 | 107.78 |
| | | 100 | 50 | 723.83 | 138.15 |
| | | 200 | 50 | 1354.16 | 147.69 |
| | | 300 | 50 | 1089.81 | 275.28 |
| | | 400 | 50 | 831.74 | 480.92 |
| | | 400 | 10 | 847.49 | 471.98 |
| | | 500 | 10 | 596.72 | 837.91 |
| | | 600 | 10 | 726.05 | 826.39 |
| | | 700 | 10 | 723.77 | 967.16 |
| | | 800 | 10 | 830.17 | 963.66 |
| token_all | /api/admin/token.all | 1 | 100 | 14.07 | 71.06 |
| | | 10 | 10 | 85.69 | 116.70 |
| | | 10 | 50 | 90.28 | 110.76 |
| | | 100 | 50 | 700.26 | 142.80 |
| | | 200 | 50 | 1148.22 | 174.18 |
| | | 300 | 50 | 953.18 | 314.74 |
| | | 400 | 50 | 581.29 | 688.13 |
| | | 400 | 10 | 830.27 | 481.77 |
| | | 500 | 10 | 837.67 | 596.90 |
| | | 600 | 10 | 707.50 | 848.05 |
| | | 700 | 10 | 446.96 | 1566.14 |
| | | 800 | 10 | 407.29 | 1964.20 |

(The concurrency and reqs/concurrency values are uneven, but again, this was a quick-and-dirty load test to get an initial feel. A sketch of a more systematic sweep follows below.)
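
If we want a more systematic sweep later, something along these lines could reproduce the concurrency ramp in the table; a sketch only, with the path, port, and a fixed 10 requests per concurrent client picked as assumptions:

```sh
#!/bin/bash
# Sketch: sweep ab over the concurrency levels used in the table above.
# Total requests passed to -n = concurrency * reqs/concurrency.
REQS_PER_CLIENT=10
for C in 100 200 300 400 500 600 700 800; do
  TOTAL=$((C * REQS_PER_CLIENT))
  echo "== concurrency=$C, total requests=$TOTAL =="
  ab -n "$TOTAL" -c "$C" -k "http://localhost:4000/api/admin" \
    | grep -E 'Requests per second|Time per request'
done
```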

@unnawut (Contributor) commented Jul 27, 2018

Too itchy about this. I'm going to baseline against an out-of-the-box Phoenix server to get another perspective.

@unnawut (Contributor) commented Jul 27, 2018

A clean Phoenix project gives similar results to the networked tests, so I'm crossing out the cross-machine results as invalid: they measure the test infrastructure rather than the eWallet.

@unnawut (Contributor) commented Jul 27, 2018

A same-machine load test to /api/admin/token.all gives the following:

| Script | Path | Concurrency | Reqs/concurrency | Reqs/second | Mean response time (ms) |
|---|---|---|---|---|---|
| token_all (same machine) | /api/admin/token.all | 1 | 100 | 23.23 | 43.04 |
| | | 10 | 10 | 121.17 | 82.53 |
| | | 10 | 50 | 114.98 | 86.97 |
| | | 100 | 50 | 896.74 | 111.52 |
| | | 200 | 50 | 1401.75 | 142.68 |
| | | 300 | 50 | 975.25 | 307.61 |
| | | 400 | 50 | 497.15 | 804.59 |
| | | 400 | 10 | 1948.77 | 205.26 |
| | | 500 | 10 | 2169.16 | 230.50 |
| | | 600 | 10 | 2324.55 | 258.11 |
| | | 700 | 10 | 2442.27 | 286.62 |
| | | 800 | 10 | 2431.54 | 329.01 |
| | | 900 | 10 | 2444.97 | 368.10 |
| | | 1000 | 10 | 2449.01 | 408.00 |

@unnawut (Contributor) commented Aug 7, 2018

Moving this back to to-do until a proper environment is set up.

@unnawut (Contributor) commented Nov 23, 2018

Latest results using the implementation in #499:

Setup:

  • 1 instance of GCP's n1-standard-2 (2 vCPUs Intel Broadwell, 7.5 GB memory) for the application
    • Database server resides on the same instance as the application server
  • 1 instance of GCP's n1-standard-2 (2 vCPUs Intel Broadwell, 7.5 GB memory) for generating and measuring the load

Results:

| Path | TPS | Max response time (ms) | Mean (ms) | Min (ms) |
|---|---|---|---|---|
| /api/admin/transaction.create | Ping | 0.228 | 0.162 | 0.120 |
| | 1 | 52 | 39 | 29 |
| | 10 | 40 | 31 | 28 |
| | 20 | 40 | 30 | 26 |
| | 30 | 61 | 31 | 28 |
| | 40 | 2455 | 1532 | 54 |
| | 50 | Timeout | Timeout | Timeout |
| | 100 | Timeout | Timeout | Timeout |
| | 200 | Crashed | Crashed | Crashed |

Quick takeaway:
The eWallet supports up to roughly 30 TPS with the standard configuration (although it can easily be scaled up, since it's a typical application server).
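
For reference, each request in the runs above hits the Admin API's transaction.create endpoint; a single call looks roughly like the sketch below. The host, auth scheme, and body fields are assumptions for illustration, not the actual load script from #499:

```sh
# Illustrative single transaction.create request (field names, auth scheme and
# host are assumptions; the real load generation lives in the #499 runner).
curl -s -X POST "http://<app-instance>/api/admin/transaction.create" \
  -H "Content-Type: application/json" \
  -H "Authorization: OMGAdmin <base64 of user_id:authentication_token>" \
  -d '{
        "idempotency_token": "<unique value per request>",
        "from_address": "<funded wallet address>",
        "to_address": "<receiving wallet address>",
        "token_id": "<token id>",
        "amount": 100
      }'
```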

@unnawut (Contributor) commented Nov 23, 2018

Next steps (v1.2):

  • Set up profiling so bottlenecks can be identified
  • Create issues to optimize the bottlenecks

unnawut modified the milestones: v1.1, v1.2 on Nov 23, 2018
@unnawut (Contributor) commented Jan 21, 2019

A basic load test runner is available with #499.
Profiling with AppSignal is available with #586.

The optimization will continue in #361.

unnawut closed this as completed on Jan 21, 2019
unnawut modified the milestones: v1.2, v1.1 on Jan 21, 2019