
Baseline metrics - part 1: set up read replicas on stage #3618

Closed
4 of 5 tasks
patphongs opened this issue Mar 12, 2019 · 11 comments
Comments

@patphongs
Member

patphongs commented Mar 12, 2019

What we're after: We would like to start tracking our performance through the use of metrics. But before we start we need to take a baseline measurement.

Completion criteria:

  • Test against one read DB replica of stage. Increase the number of replicas based on performance testing of the schedule_a and schedule_b endpoints.

  • Take performance baseline of all the endpoints and document

  • Test slower query endpoints: schedule_a and schedule_b
    Receipts - 16


    /committee/{committee_id}/schedules/schedule_a/by_employer/
    /committee/{committee_id}/schedules/schedule_a/by_occupation/
    /committee/{committee_id}/schedules/schedule_a/by_size/
    /committee/{committee_id}/schedules/schedule_a/by_state/
    /committee/{committee_id}/schedules/schedule_a/by_zip/
    /schedules/schedule_a/
    /schedules/schedule_a/by_employer/
    /schedules/schedule_a/by_occupation/
    /schedules/schedule_a/by_size/
    /schedules/schedule_a/by_size/by_candidate/
    /schedules/schedule_a/by_state/
    /schedules/schedule_a/by_state/by_candidate/
    /schedules/schedule_a/by_state/totals/
    /schedules/schedule_a/by_zip/
    /schedules/schedule_a/efile/
    /schedules/schedule_a/{sub_id}/

    Disbursements - 9


    /committee/{committee_id}/schedules/schedule_b/by_purpose/
    /committee/{committee_id}/schedules/schedule_b/by_recipient/
    /committee/{committee_id}/schedules/schedule_b/by_recipient_id/
    /schedules/schedule_b/
    /schedules/schedule_b/by_purpose/
    /schedules/schedule_b/by_recipient/
    /schedules/schedule_b/by_recipient_id/
    /schedules/schedule_b/efile/
    /schedules/schedule_b/{sub_id}/

Updated the locust file and did the first round of testing.

  • [ ] Shut down replicas when we're done measuring the baseline
  • Ensure testing is completed within one sprint so that we can shut down the additional replicas when we're done and no longer need them
    Will run another two rounds of locust testing today and tomorrow and put things together, then generate a quick report based on that.

Locust testing results are uploaded here:
https://docs.google.com/spreadsheets/d/1SCrvrFEbCV57rtrXpClWrFg-lAeqyfjU4WDr6dBtW6Y/edit#gid=2097394488
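For the baseline write-up, the raw per-request timings exported by locust can be reduced to per-endpoint percentiles. A minimal stdlib-only sketch (the sample data below is made up for illustration, not actual stage results):

```python
import statistics
from collections import defaultdict
from math import ceil

def percentile(sorted_times, p):
    """Nearest-rank percentile (p in 0..100) on an ascending-sorted list."""
    k = max(0, ceil(p / 100 * len(sorted_times)) - 1)
    return sorted_times[k]

def summarize(samples):
    """Aggregate (endpoint, response_ms) samples into per-endpoint baseline stats."""
    by_endpoint = defaultdict(list)
    for endpoint, ms in samples:
        by_endpoint[endpoint].append(ms)
    report = {}
    for endpoint, times in by_endpoint.items():
        times.sort()
        report[endpoint] = {
            "count": len(times),
            "median_ms": statistics.median(times),
            "p95_ms": percentile(times, 95),
            "max_ms": times[-1],
        }
    return report

# Made-up samples standing in for a locust run's per-request timings
samples = [
    ("/schedules/schedule_a/", 850),
    ("/schedules/schedule_a/", 1200),
    ("/schedules/schedule_b/", 300),
    ("/schedules/schedule_b/", 450),
]
print(summarize(samples))
```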

  • Identify next steps and create followup ticket, including setting new performance goals and establishing a check-in process for regular tracking:

For schedule_a and schedule_b testing/benchmarking, we need to re-collect the data after the two-year restriction removal is deployed to stage/prod.

For regular check-ins, I think the slow-query analysis will do most of the important work here.

Other work done:

  1. Prepared a PR fixing the testing-data issue in the locust file; also added two endpoints, schedule_c/sub_id and schedule_d/sub_id, to endpoints_walk.py:
    Feature/update locust and endpoints #3698

  2. Python script for batch queries and query profiling (will demo).

  3. Script for batch API walks and timing (will demo).
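A batch API walk and timing script could look roughly like this stdlib-only sketch. The endpoint templates are taken from the lists above; the fetch function is injectable so the walk can be exercised without network access, and the committee ID and API key passed in are placeholders:

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://api-stage.open.fec.gov/v1"

# A few endpoint templates from the lists above; {committee_id} gets substituted.
ENDPOINTS = [
    "/schedules/schedule_a/",
    "/schedules/schedule_a/by_state/",
    "/committee/{committee_id}/schedules/schedule_b/by_purpose/",
]

def walk(endpoints, committee_id, api_key, fetch=urlopen):
    """Time one GET per endpoint; returns [(path, seconds)].

    `fetch` is injectable so the walk can be dry-run without hitting the API.
    """
    timings = []
    for template in endpoints:
        path = template.format(committee_id=committee_id)
        url = BASE + path + "?" + urlencode({"api_key": api_key, "per_page": 20})
        start = time.perf_counter()
        fetch(url)  # response body is discarded; we only measure latency
        timings.append((path, time.perf_counter() - start))
    return timings
```

Calling `walk(ENDPOINTS, "C00401224", "DEMO_KEY")` would record one latency per endpoint; locust remains the tool for concurrent load.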

Part 2: fecgov/fec-cms#2743 (We closed that issue.)

@patphongs patphongs changed the title API baseline metrics Baseline metrics Mar 12, 2019
@patphongs patphongs added this to the Sprint 8.4 milestone Mar 12, 2019
@JonellaCulmer JonellaCulmer changed the title Baseline metrics Baseline metrics - part 1 Mar 14, 2019
@lbeaufort lbeaufort changed the title Baseline metrics - part 1 Baseline metrics - part 1: set up read replicas on stage Mar 14, 2019
@pkfec pkfec modified the milestones: Sprint 8.4, Sprint 8.5 Mar 28, 2019
@qqss88
Contributor

qqss88 commented Apr 2, 2019

@pkfec Did we talk about group testing on this one, with each developer taking care of one endpoint? Or should each developer come up with some test cases for each endpoint, and I'll put things together and automate the metrics-collection process?

@qqss88
Contributor

qqss88 commented Apr 2, 2019

Since most of the slow queries are from schedule_a and schedule_b, I'll make some test cases for those endpoints first.

@pkfec
Contributor

pkfec commented Apr 2, 2019

@qqss88 The idea was to test each API endpoint after lifting the two-year restrictions. We talked about testing the endpoints by distributing them among the developers. Once your PR is reviewed and ready to be merged, in the same PR we can assign the endpoints and ping the developers to ask them to test.


@qqss88
Contributor

qqss88 commented Apr 4, 2019

Here is a link to a previous locust test for reference:
https://drive.google.com/drive/folders/1F4lFVVC6gFfN7rwhfNl41WvhUBkksimp?ths=true

@qqss88
Contributor

qqss88 commented Apr 10, 2019

Added sched_a_multi_cycle testing to the locust file, fixed some data issues in the existing locust file, and tested it against prod and stage. Will summarize the results later on.

@patphongs
Member Author

Moving to sprint 8.6 to do additional performance baselines in stage after API release cut today.

@patphongs patphongs modified the milestones: Sprint 8.5, Sprint 8.6 Apr 16, 2019
@pkfec
Contributor

pkfec commented May 1, 2019

@patphongs @qqss88 are you still doing performance baseline testing in stage? Do you have an estimate of how long it will take to wrap up the testing?

@PaulClark2
Contributor

@qqss88 I'd still like to see a concise summary of baseline and testing results.

@qqss88
Contributor

qqss88 commented May 1, 2019

For sched_a and sched_b queries, because of the amount of data we have, the main factors affecting query performance are (assuming server resource usage and DB caching are not factors here):

  1. how many two-year partitions are involved in the query
  2. whether the query takes advantage of an index scan (data skewness could be an issue)
  3. the size of the query result set returned for sorting and pagination (data skewness again)

Considering those factors, data skewness for a given argument value plays an important role when a query is submitted. I did some data-skewness analysis on the sched_a and sched_b arguments involved in the API; here is a quick summary:

Top skewed argument values in sched_a:

  • cmte_id = 'C00401224' (ActBlue) takes around 70% in the 2017-2018 cycle
  • cmte_tp = 'W' takes more than 70% in the 2017/2018 cycle
  • line_num = '11AI' takes around 99% in the 2017/2018 cycle
  • contribution state: skewness is not that bad; top 3 are CA, NY, and TX
  • top occupations: 'retired' and 'not employed' (combined > 50%)
  • zip code: skewness is not bad; top zip code is 20001
  • contribution cities: skewness not bad; top 3 cities: New York, Washington, Los Angeles
  • contributor names: skewness not bad; top 3 names: 'Gilman, Priscilla', 'Immasche, Sonia', 'Davis, Susan'
  • employers: skewed toward 4 values: not employed, none, retired, self employed
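Skewness figures like the ones above reduce to "what share of all rows does the top value hold." A minimal sketch of computing such shares from (value, row_count) pairs (the counts below are made up for illustration, not the real sched_a distribution):

```python
def top_value_shares(counts, top_n=3):
    """Given {value: row_count}, return the top_n values with their share of all rows."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(value, count / total) for value, count in ranked[:top_n]]

# Made-up line_num distribution illustrating heavy skew toward '11AI'
line_num_counts = {"11AI": 9_900_000, "11AII": 60_000, "12": 40_000}
print(top_value_shares(line_num_counts, top_n=1))
```

The input counts would come from a `GROUP BY` over each filterable column on the replica.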

As a result, most queries with a single top argument value plus the most recent 3 cycles load OK (from seconds to around 1 min). But these 3 queries do not load within the timeout window (502 error):

Top committee type 'W' not loading with cycles 14/16/18:
https://api-stage.open.fec.gov/v1/schedules/schedule_a?sort_hide_null=false&per_page=30&is_individual=true&two_year_transaction_period=2016&two_year_transaction_period=2018&two_year_transaction_period=2014&sort=-contribution_receipt_date&sort_nulls_last=false&api_key=DEMO_KEY&recipient_committee_type=W

Top line_num not loading with cycles 14/16/18:
https://api-stage.open.fec.gov/v1/schedules/schedule_a?sort_hide_null=false&per_page=30&is_individual=true&two_year_transaction_period=2016&two_year_transaction_period=2018&two_year_transaction_period=2014&sort=-contribution_receipt_date&sort_nulls_last=false&api_key=DEMO_KEY&line_number=11AI

'not employed' employer filter not loading:
https://api-stage.open.fec.gov/v1/schedules/schedule_a?sort_hide_null=false&per_page=30&is_individual=true&two_year_transaction_period=2018&two_year_transaction_period=2014&sort=-contribution_receipt_date&sort_nulls_last=false&api_key=DEMO_KEY&contributor_employer=not employed

sched_b:

sched_b's general performance is much better (I think mainly due to the smaller amount of data compared to sched_a). Here is a quick summary of data skewness:

  1. cmte_id: again, more than 90% is ActBlue
  2. cmte_tp: more than 90% is 'W'
  3. line_num: more than 90% is '23'
  4. disbursement purpose: more than 90% is 'other'
  5. city: around 60% is 'Washington'
  6. state: around 60% is from 'DC'
  7. recipient_name: skewness not bad; top value is 'Democratic Congressional Campaign Committee'

Performance is generally good; a top single argument value with the 14/16/18 cycles comes back within seconds. But I failed on one filter combination:

14/16/18 + line_num 23 + ActBlue + travel
https://api-stage.open.fec.gov/v1/schedules/schedule_b/?spender_committee_type=W&sort=-disbursement_date&line_number=23&per_page=20&sort_null_only=false&disbursement_purpose_category=other&sort_hide_null=false&api_key=DEMO_KEY&committee_id=C00401224&two_year_transaction_period=2018&two_year_transaction_period=2016&two_year_transaction_period=2014&disbursement_description=travel
(slow-query analysis needed on this one; my current guess is that a non-index scan is involved)

This test is database-oriented; it is not a thorough test identifying all slow queries. Adding some server-side resource-consumption metrics would be better.
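One way to check the non-index-scan guess is to run the underlying query under `EXPLAIN (ANALYZE, BUFFERS)` on the replica. A small sketch of wrapping a query for plan inspection; the table and column names in the sample query are assumptions for illustration, not the actual schema:

```python
def explain_wrap(sql):
    """Prefix a query with EXPLAIN (ANALYZE, BUFFERS) for plan inspection."""
    return "EXPLAIN (ANALYZE, BUFFERS) " + sql.strip().rstrip(";") + ";"

# Hypothetical sched_b query mirroring the failing filter combination above;
# table/column names are placeholders, not the real schema.
query = """
SELECT *
FROM disclosure.sched_b
WHERE cmte_id = 'C00401224'
  AND line_num = '23'
  AND disb_desc ILIKE '%travel%'
ORDER BY disb_dt DESC
LIMIT 20;
"""
print(explain_wrap(query))
```

Running the wrapped statement through psql against the read replica would show whether the planner falls back to a sequential scan on the skewed filter.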

@PaulClark2
Contributor

thanks @qqss88


6 participants