This repository has been archived by the owner. It is now read-only.

Test impact of metrics gathering on production traffic #81

Closed
rfk opened this issue Aug 21, 2015 · 23 comments

@rfk
Member

@rfk rfk commented Aug 21, 2015

(Filing this to capture the work as part of our planning process).

We're hoping to land some new metrics-gathering scripts in #72, but they're heavy queries that will make the auth db work hard. Let's try them out in stage and get a feel for how they might impact production traffic.

@rfk rfk added this to the multi-device-metrics milestone Aug 21, 2015
@philbooth
Contributor

@philbooth philbooth commented Aug 25, 2015

@rfk, I'm thinking this should be a script in the repo so that we can easily re-run it as we add further metrics queries in the future.

Rough methodology, first draft:

  1. Define a loop of representative SELECT statements.
  2. Define a loop of representative INSERT statements.
  3. Run each loop separately with, say, 1,000 iterations and minimal other work happening on the box (liaise with the rest of the team to ensure that). In each case, queries would be run concurrently to the level that available connections allow.
  4. Record, say, 9 timings for each, so that we can use the medians and discard outliers.
  5. Spawn a child process that runs the metrics queries serially until instructed to exit.
  6. Re-do steps 3 and 4.
  7. Report differences, medians, means, stdevs, variances for each loop.
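
A minimal sketch of how those steps might fit together. Everything here is a stand-in: the "queries" are no-op async tasks, and the step-5 metrics loop is simulated in-process (the real thing would be a forked child process running the metrics SQL serially until told to exit), so this shows only the shape of the measurement, not the actual implementation.

```javascript
// Rough sketch of the methodology above, with stand-ins throughout.
const medianOf = xs => [...xs].sort((a, b) => a - b)[Math.floor(xs.length / 2)];

async function timeLoop(iterations, runOnce) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) await runOnce();
  return Number(process.hrtime.bigint() - start) / 1e6; // elapsed ms
}

async function main() {
  const selectOnce = () => new Promise(r => setImmediate(r)); // step 1 stand-in

  // Steps 3-4: an odd number of timed runs so the median discards outliers.
  const baseline = [];
  for (let run = 0; run < 9; run++) baseline.push(await timeLoop(1000, selectOnce));

  // Step 5 stand-in for the child process running metrics queries serially.
  const metricsLoop = setInterval(() => {}, 10);

  // Step 6: repeat the timings while the metrics loop is running.
  const underLoad = [];
  for (let run = 0; run < 9; run++) underLoad.push(await timeLoop(1000, selectOnce));
  clearInterval(metricsLoop); // "instructed to exit"

  // Step 7: report the comparison.
  console.log('baseline median (ms):', medianOf(baseline).toFixed(1));
  console.log('under-load median (ms):', medianOf(underLoad).toFixed(1));
}

main();
```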
@rfk
Member Author

@rfk rfk commented Aug 25, 2015

Define a loop of representative SELECT statements.
Define a loop of representative INSERT statements.
Run each loop separately with, say, 1,000 iterations and minimal other work happening on the box
(liaise with the rest of the team to ensure that). In each case, queries would be run concurrently
to the level that available connections allow it.

We can probably use the existing loadtest for this: https://github.com/mozilla/fxa-auth-server/tree/master/test/load unless @jrgm has a more up-to-date version.

@philbooth
Contributor

@philbooth philbooth commented Sep 2, 2015

At long last, I have some timings for this. Bottom line: I see no impact.

Using the auth-server load tests and comparing benchmark timings against timings taken while a sequentially-repeating loop of metrics queries ran concurrently, I got these results:

Benchmark times (seconds): 300.45 300.45 300.46 300.46 300.45
  Metrics times (seconds): 300.45 300.44 300.45 300.45 300.45

Then, just to see what it was like under increased load, I repeated the above but with five further loops of the load test happening in the background:

Benchmark times (seconds): 300.53 300.53 300.60 300.56 300.57
  Metrics times (seconds): 300.53 300.52 300.54 300.54 300.53

The variance is minimal so, barring some error in my method, the results seem good to me.

@philbooth
Contributor

@philbooth philbooth commented Sep 2, 2015

@rfk, I've marked this as "in review". And hopefully I've not been premature in opening the following PRs off the back of it:

@philbooth philbooth removed their assignment Sep 2, 2015
@jrgm
Contributor

@jrgm jrgm commented Sep 2, 2015

Just curious, but are the values in
Benchmark times (seconds): 300.53 300.53 300.60 300.56 300.57
the report times for running make bench with the ./test/load/ tests?

@philbooth
Contributor

@philbooth philbooth commented Sep 2, 2015

@jrgm
Contributor

@jrgm jrgm commented Sep 2, 2015

The loads framework just does as much work as it can in a fixed time interval. So the fact that the tests took ~300 seconds is expected, independent of the actual work being done on the database. (It's good that the load test reports no errors, though.)
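
A minimal illustration of that point (not the loads framework itself, just the same shape): a fixed-duration load loop always runs for the full interval, so throughput shows up in the operation count, never in the elapsed time.

```javascript
// Fixed-duration busy loop: runs until the deadline and counts completed ops.
function loadLoop(durationMs, doWork) {
  const deadline = Date.now() + durationMs;
  let ops = 0;
  while (Date.now() < deadline) {
    doWork();
    ops++;
  }
  return ops; // compare ops (or ops/sec) between runs, not elapsed time
}

let sink = 0; // prevents the "slow" workload from being optimised away
const fast = loadLoop(50, () => {});
const slow = loadLoop(50, () => { for (let i = 0; i < 1e4; i++) sink += i; });
console.log(fast > slow); // the slower workload completes fewer ops
```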

@philbooth
Contributor

@philbooth philbooth commented Sep 2, 2015

Arrrgh! Thanks, I had no idea. Stupidly assumed it was performing a fixed number of operations. Back to the drawing board...

@rfk
Member Author

@rfk rfk commented Sep 3, 2015

@philbooth it should also give you some summary of RPS or other throughput measure; we can look at kibana graphs during the run and eyeball for differences there as well.

@philbooth
Contributor

@philbooth philbooth commented Sep 3, 2015

Using the fixed perf-testing script, I now have, quelle surprise, some realistic-looking numbers.

I'll paste the raw data in below, but the summary is:

  1. Running one instance of the load tests, there was a roughly 4% slowdown in performance when metrics queries are running.
  2. Running four concurrent instances of the load tests, there was a roughly 2% slowdown in performance when metrics queries are running.

Tests were executed on a large EC2 instance, 4 cores, running against the stage auth-server.

Raw data for one load-test process, five iterations:

Benchmark requests per second: 269 278 276 279 275 [median=276, mean=275.4, stdev=3.5]
  Metrics requests per second: 267 264 263 262 268 [median=264, mean=264.8, stdev=2.3]

Benchmark number of requests: 81029 83499 82936 83860 82614 [median=82936, mean=82787.6, stdev=979.9]
  Metrics number of requests: 80190 79528 79023 78862 80508 [median=79528, mean=79622.2, stdev=640.8]

Raw data for four load-test processes, nine iterations (I thought there might be greater variance in this set):

Benchmark requests per second: 68 70 67 68 68 68 68 67 67 [median=68, mean=67.9, stdev=0.9]
  Metrics requests per second: 66 65 65 66 66 66 67 67 68 [median=66, mean=66.2, stdev=0.8]

Benchmark number of requests: 20631 21303 20327 20698 20418 20625 20447 20410 20382 [median=20447, mean=20582.3, stdev=282.3]
  Metrics number of requests: 19969 19719 19584 19974 19840 19837 20220 20241 20650 [median=19969, mean=20003.8, stdev=304.5]

The percentage slowdown figures I gave are based on the median values for the total number of requests (I used those because the requests-per-second number is rounded by the load-tester).
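
For what it's worth, those figures can be reproduced from the raw numbers above (the quoted stdev values match the population form, i.e. dividing by n):

```javascript
// Recompute the quoted summary stats and the ~4% slowdown from the raw
// one-load-test request counts above.
const median = xs => [...xs].sort((a, b) => a - b)[Math.floor(xs.length / 2)];
const mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length;
const stdev = xs => {
  const m = mean(xs);
  return Math.sqrt(mean(xs.map(x => (x - m) ** 2))); // population stdev
};

const benchmark = [81029, 83499, 82936, 83860, 82614]; // number of requests
const metrics = [80190, 79528, 79023, 78862, 80508];

console.log(median(benchmark), mean(benchmark), stdev(benchmark).toFixed(1));
// → 82936 82787.6 979.9

const slowdown = (median(benchmark) - median(metrics)) / median(benchmark);
console.log((slowdown * 100).toFixed(1) + '%'); // → 4.1%
```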

@rfk, how do these fit in with your expectations / reasonable limits?

@rfk
Member Author

@rfk rfk commented Sep 4, 2015

This seems broadly as expected - not blocking other queries by taking row locks or anything, just tying up resources on the server and making other queries run slower. Some other things it would be interesting to look at:

  • how does it compare to performance without READ UNCOMMITTED on the queries?
  • can we get CPU/memory stats from the DB and see what sort of a spike it caused?
@rfk
Member Author

@rfk rfk commented Sep 4, 2015

Also, how long does the metrics script take to run in its entirety?

@philbooth
Contributor

@philbooth philbooth commented Sep 4, 2015

how does it compare to performance without READ UNCOMMITTED on the queries?

There doesn't seem to be any difference, actually. Removing the isolation level and running the queries outside a transaction, I got 266 requests per second from one instance of the load-tests and 66 requests per second with four instances running (both medians from five runs).

Raw data, one load test:

  Metrics number of requests: 79991 80182 80032 80279 79580
  Metrics requests per second: 266 266 266 267 264

Raw data, four load tests:

  Metrics number of requests: 19891 19897 20113 19784 20098
  Metrics requests per second: 66 66 66 65 66

how long does the metrics script take to run in its entirety?

~1.4 seconds without load, ~1.8 seconds under light load, ~2 seconds under heavy load (medians from five).

Combined, this and the previous answer surprised me enough to think something was wrong: the metrics script couldn't really be doing the work I thought it was. But if I run the queries individually in the console, they are very quick there too. The slowest one is:

MySQL [fxa]> SELECT COUNT(DISTINCT a.uid) AS count
    -> FROM accounts AS a
    -> INNER JOIN sessionTokens AS s
    -> ON a.uid = s.uid
    -> WHERE a.createdAt < 1441358306643
    -> AND s.uaDeviceType = 'mobile';
+-------+
| count |
+-------+
|    29 |
+-------+
1 row in set (0.60 sec)

My next fear was that the database wasn't as populated as I thought it was. But nope, it has 54 million session tokens in there:

MySQL [fxa]> SELECT COUNT(*) AS count FROM sessionTokens;
+----------+
| count    |
+----------+
| 54119099 |
+----------+
1 row in set (8.85 sec)

And notice how much slower that last query is than the metrics queries. Why is that? It smells wrong to me, am I doing something stupid?

can we get CPU/memory stats from the DB and see what sort of a spike it caused?

I'm not sure about this one yet because I don't know how to get on to the database host. Is there a way to find out from a mysql console or do I need to run top on the machine itself? @jrgm, do you have a preferred method for finding this out?

@philbooth
Contributor

@philbooth philbooth commented Sep 4, 2015

And notice how much slower that last query is than the metrics queries. Why is that? It smells wrong to me, am I doing something stupid?

Oh, hang on:

MySQL [fxa]> SELECT COUNT(*) AS count
    -> FROM accounts AS a
    -> INNER JOIN sessionTokens AS s
    -> ON a.uid = s.uid;
+--------+
| count  |
+--------+
| 119561 |
+--------+
1 row in set (0.38 sec)

It looks like the sessionTokens table is heavily populated with dummy data whose uids have no matching rows in accounts, so the JOIN is producing a very small record set for the rest of the query to work on. That means the short query times I'm getting on stage are not representative of what will happen with proper data in the wild.

Does that sound correct?

If so, it certainly invalidates my timings for the metrics script itself and my comparison without READ UNCOMMITTED. Does it also invalidate the broader performance tests or is it safe to say that, because the script is executed in a loop, the effects are comparable?

@philbooth
Contributor

@philbooth philbooth commented Sep 4, 2015

Fwiw, I'm running a script that will (eventually) give me a local database that is both big and more representative for these queries to run against. When it's ready, I'll re-run the perf tests against it to see how the numbers compare to those above.

@philbooth
Contributor

@philbooth philbooth commented Sep 5, 2015

Running locally against a database with 10 million accounts and 16 million session tokens, I saw:

  • A roughly 5% slowdown in performance when the metrics queries were run concurrently with one load test.
  • A roughly 8% slowdown in performance when they were run concurrently with four load tests.
  • A roughly 7% slowdown in performance when they were run without READ UNCOMMITTED against one load test.

Timings for the metrics script itself:

  • With no load: 6' 11"
  • With one load test: 6' 57"
  • With four concurrent load tests: 7' 13"

Raw data, one load test:

Benchmark requests per second: 528 522 523 520 357
  Metrics requests per second: 470 498 501 498 506

Benchmark number of requests: 158544 156893 157153 156273 107359
  Metrics number of requests: 141168 149749 150556 149614 151864

Raw data, four concurrent load tests:

Benchmark requests per second: 150 151 151 151 152
  Metrics requests per second: 140 140 139 139 138

Benchmark number of requests: 45228 45580 45413 45606 45680
  Metrics number of requests: 42053 42143 41801 41880 41538

Raw data, one load test, metrics queries running without READ UNCOMMITTED set:

Benchmark requests per second: 524 523 495 481 524
  Metrics requests per second: 486 499 458 499 444

Benchmark number of requests: 157283 157213 148651 144476 157338
  Metrics number of requests: 146130 150002 137574 149855 133538
@rfk
Member Author

@rfk rfk commented Sep 7, 2015

@philbooth does the test report avg request time/latency by any chance? It would be another interesting axis for comparison.

@philbooth
Contributor

@philbooth philbooth commented Sep 7, 2015

Using the updated script:

  • With one set of load tests, average request time deteriorated from 0.02 seconds to 0.03 seconds when metrics queries were running.
  • With four concurrent sets of load tests, average request time deteriorated from 0.11 seconds to 0.12 seconds when metrics queries were running.

Raw data, one load test:

Benchmark request time: 0.02s 0.02s 0.02s 0.02s 0.02s
  Metrics request time: 0.03s 0.03s 0.03s 0.03s 0.03s

Benchmark requests per second: 490 485 482 489 491
  Metrics requests per second: 439 441 446 442 443

Benchmark number of requests: 147075 145737 144855 146953 147530
  Metrics number of requests: 131890 132453 133883 132733 133193

Raw data, four concurrent load tests:

Benchmark request time: 0.11s 0.11s 0.11s 0.11s 0.11s
  Metrics request time: 0.12s 0.12s 0.12s 0.12s 0.12s

Benchmark requests per second: 135 136 134 136 135
  Metrics requests per second: 121 122 122 121 121

Benchmark number of requests: 40812 40879 40487 41112 40608
  Metrics number of requests: 36465 36659 36813 36329 36488
@philbooth philbooth removed their assignment Sep 8, 2015
@rfk rfk assigned jrgm Sep 8, 2015
@rfk
Member Author

@rfk rfk commented Sep 9, 2015

@jrgm what's your take on the above? Based on @philbooth's findings, it seems like this script is low-risk to try running occasionally in production in order to start getting these metrics flows, but we should continue to work towards a lower-impact way of gathering them for long-term use.

@vladikoff
Contributor

@vladikoff vladikoff commented Oct 5, 2015

add a blocked label

@rfk
Member Author

@rfk rfk commented Oct 14, 2015

This came up in my bi-weekly chat with Travis, who recommended that we push ahead with running it against the read-replica rather than trying to do it on the production db. I'm going to close this out, but leave #83 open to represent the outstanding work on that front.

@rfk rfk closed this Oct 14, 2015