
API load testing setup #4327

Closed
8 of 11 tasks
lbeaufort opened this issue Apr 29, 2020 · 4 comments

lbeaufort commented Apr 29, 2020

Action item from #4314

Need to know what the system can handle before it "falls over", and what the load actually was on April 15, for example.

Completion criteria:

  • After Aurora migration, measure "fallover point" for number/complexity of requests
  • Put in tickets for findings

Technical steps

Prepare the "locusts"

Set up environment

  • Figure out what it's going to take to set up the DB environment to replicate prod now that we're using Aurora. How many clusters, PR to terraform, check with DB team on timing, etc.
  • Q: Stage and dev have 2 clusters of 4.8xlarge, prod has 3. Can we give stage and prod 3 each and dev one? Alternatively, can we spin up a 3rd replica for one day?
  • Per Rohan, currently:
    DEV (1 master, 1 replica)
    STG (1 master, 1 replica)
    PRD (1 master, 4 replicas)
    Changing this manually during the test is much easier than going through Terraform. Make sure to scale it back down after the test to minimize cost.
    Should test 2 vs. 4 clusters.
  • Confirm access to increase cluster count
  • Make sure everything actually works by running a sample test on stage as-is
  • Make a testing plan: 2 vs. 4 clusters, number of application instances and memory, CMS timeouts, API timeouts, Gunicorn workers (needs research). Application profiling? https://github.com/benfred/py-spy
  • Add downloads to locust tests? Will need to look at celery worker setups
  • First test production setup with 2 clusters
  • Test production setup with 4 clusters
  • Test more application memory
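The "prepare the locusts" step ultimately means replaying captured endpoint/query pairs against the API. A minimal stdlib-only sketch of turning one captured pair back into a request URL (the sample data, base URL default, and function name are illustrative, not the team's actual test config):

```python
import random
from urllib.parse import urlencode

# Illustrative sample of the endpoint/query lookup that the parsing
# script later in this thread produces -- not real captured traffic.
QUERIES = {
    "candidates/": [{"per_page": ["20"], "cycle": ["2018", "2020"]}],
    "schedules/schedule_a/": [{"contributor_state": ["VA"]}],
}


def pick_request(base="https://api.open.fec.gov/v1/"):
    """Choose a random captured endpoint/params pair and rebuild its URL."""
    endpoint = random.choice(sorted(QUERIES))
    params = random.choice(QUERIES[endpoint])
    # doseq=True expands list values into repeated key=value pairs,
    # which is how multi-value filters arrive in the query string.
    return base + endpoint + "?" + urlencode(params, doseq=True)
```

In a locust test the same pick-and-rebuild logic would sit inside a task, with the client issuing the GET.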

Communicate

@lbeaufort lbeaufort self-assigned this Apr 29, 2020
@lbeaufort lbeaufort added this to the Sprint 12.3 milestone Apr 29, 2020
@lbeaufort lbeaufort changed the title from "After Aurora migration, measure 'fallover point' for number/complexity of requests" to "API load testing setup" May 5, 2020
@lbeaufort

We should also let the team and cloud.gov know (ideally a week beforehand) which day we plan to test. Reference: https://cloud.gov/docs/compliance/pentest/


lbeaufort commented May 7, 2020

Parsing script:

"""Extract API query from Kibana log file"""
import csv
import json

with open("LB_API_RTR_requests_4-15-20.csv", "r") as file:
    reader = csv.reader(file, delimiter=',')
    # Endpoint/uery lookup
    queries = {}
    count = 0
    for row in reader:
        # Column 2 has the queries. Throw out some bad data
        if 'v1' in row[1]:
            # Get the endpoint- everything after the 'v1' to the first '?'
            endpoint = row[1].partition("v1/")[2].partition("?")[0].partition(" ")[0]
            if endpoint[-1] != "/":
                endpoint += "/"
            # Get the query param string - everything between the ? and the ' '
            query_parameters = row[1].partition("?")[2].partition(" ")[0]
            # Parse arguments
            query_dict = {}
            if "&" in query_parameters:
                # Clean up some double &&
                query_parameters = query_parameters.replace("&&", "&")
                # Split each query pair out into a list
                parameter_groups = query_parameters.split("&")
                # Make a dictionary of the parameters (this is how locust needs them)
                for result in parameter_groups:
                    # Split the parameters from the values
                    if "=" in result and "api_key" not in result:
                        key, value = result.split("=")
                        if not query_dict.get(key):
                            query_dict[key] = [value]
                        else:
                            query_dict[key].append(value)
            if query_dict:
                # Add to the endpoint/query lookup
                if not queries.get(endpoint):
                    queries[endpoint] = [query_dict]
                else:
                    if query_dict not in queries[endpoint]:
                        queries[endpoint].append(query_dict)

print(json.dumps(queries, indent=1))
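For a concrete sense of what the partition chains extract, here is the same logic applied to a single synthetic log line (the URL, key, and row format are illustrative, not real captured traffic):

```python
# A synthetic Kibana log column, shaped like the rows the script parses.
row = "GET https://api.open.fec.gov/v1/candidates/?per_page=20&cycle=2020&api_key=XYZ 200"

# Endpoint: everything after 'v1/' up to the first '?' or space.
endpoint = row.partition("v1/")[2].partition("?")[0].partition(" ")[0]

# Query string: everything between the '?' and the next space.
query_parameters = row.partition("?")[2].partition(" ")[0]

print(endpoint)          # candidates/
print(query_parameters)  # per_page=20&cycle=2020&api_key=XYZ
```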

@lbeaufort lbeaufort modified the milestones: Sprint 12.3, Sprint 12.4 May 19, 2020

lbeaufort commented May 26, 2020

  • Let the team know and set up a maintenance window in Pingdom, since this could cause downtime

In the console:

  • Update DB size (currently 4.8 in prod) - start with the writer.
  • Update the parameter group ("DB parameter group", NOT "DB cluster parameter group"). It usually needs to correspond with the instance size: fec-aurora-master for the writer and fec-aurora-replica-5 for the reader (we will clean these up later?)
  • Note: the reader(s) have enhanced monitoring enabled for autoscaling
  • Choose "apply immediately"
  • Repeat for reader instance
  • Add one instance (production currently at 3 instances)
    Steps: Select the cluster, Actions -> Add instance
  • Double check security groups
  • Set up autoscaling - we created a new policy that mirrored the policy for production docs
  • Run "warmup script" in stage for new reader - David will run this. this is a SQL package (pg_prewarm) that caches data in memory. How to run this? Can manually trigger with one statement one time after reboot, where does the script live, what machine runs it. DB130 and 029 on-prem servers. Could we run this with celery task? Currently on demand. David can share the script with the team. Best to run the readers individually. Best practice to run this from time to time - maybe after election? @dzhang-fec will document and share with the team.

https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.DBInstance.Modifying.html
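pg_prewarm is invoked once per relation, so a warmup script boils down to one `SELECT pg_prewarm(...)` statement per hot table. A minimal sketch that only builds those statements (the table names are placeholders; the actual script and table list live with the DB team):

```python
# Hypothetical hot-table list -- the real warmup script is maintained by
# the DB team and currently run from the on-prem DB130/029 servers.
HOT_TABLES = ["ofec_sched_a_master", "ofec_candidate_detail_mv"]


def prewarm_statements(tables):
    """Build one pg_prewarm call per table; pg_prewarm loads the
    relation's pages into the buffer cache."""
    return [f"SELECT pg_prewarm('{t}');" for t in tables]
```

Whatever runs these (a celery task, a manual psql session) would simply execute each statement against the reader being warmed, one reader at a time.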

@lbeaufort

There were some issues with formatting the requests, so I also did some testing with the normal locust setup.
