Load Testing Best Practices

Ingo edited this page May 20, 2016 · 4 revisions

Writing Load Test Scripts

  • Keep it simple & Don't repeat yourself: Load testing is complicated. Try to keep your scripts as simple and readable as possible. Your load test is another piece of code you will have to maintain.
  • Model Different User Types: Your service will be used by different types of users. To reflect this in your load test, try to categorize your Gatling sessions (i.e. virtual users) accordingly. Consider modeling user types based on their level of activity: e.g. 10% very active users, 60% moderately active users, and 30% hardly active users. This categorization can drive min/max pause times, the number of repetitions of certain calls, or enable/disable certain parts of your script.
  • Randomize Behavior: No two users of a service will behave in exactly the same way. Consider adding some randomization to min/max pause times, the number of repetitions of certain calls, and the payloads users send.
  • Debugging: Consider attaching a debugger to your load test script to troubleshoot issues. Compared to adding log output or running tests remotely, this can provide much quicker and more accurate feedback. To set up debugging, add the following <jvmArg> to the <configuration><jvmArgs>..</jvmArgs></configuration> section of the io.gatling:gatling-maven-plugin as described in the README.md. Once you have made this change in your pom.xml and started a test locally, you can attach a remote debugger to port 7000 and step through your script as it generates load. Because the argument sets suspend=y, the JVM waits for the debugger to attach before the test starts.
<build>
  <plugins>
    <plugin>
      <groupId>io.gatling</groupId>
      <artifactId>gatling-maven-plugin</artifactId>
      ...
      <configuration>
        <jvmArgs>
          <jvmArg>-Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=7000</jvmArg>
        </jvmArgs>
      </configuration>
    </plugin>
  </plugins>
</build>
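
The user-type and randomization advice above can be sketched in plain Java, independent of the Gatling DSL. This is only an illustration under stated assumptions: the class UserProfiles, its method names, and all pause and repetition numbers are hypothetical. Only the 10% / 60% / 30% split comes from the text above. A real simulation would feed values like these into its own pause and repeat steps.

```java
import java.util.Random;

/** Hypothetical helper: gives each simulated user an activity level and randomized behavior. */
final class UserProfiles {

    /** Activity levels with pause bounds (ms) and a repetition cap; all numbers are illustrative. */
    enum UserType {
        VERY_ACTIVE(500, 2_000, 20),
        MEDIUM_ACTIVE(2_000, 10_000, 8),
        HARDLY_ACTIVE(10_000, 60_000, 2);

        final int minPauseMs;
        final int maxPauseMs;
        final int maxRepetitions;

        UserType(int minPauseMs, int maxPauseMs, int maxRepetitions) {
            this.minPauseMs = minPauseMs;
            this.maxPauseMs = maxPauseMs;
            this.maxRepetitions = maxRepetitions;
        }
    }

    /** Picks a user type with the 10% / 60% / 30% split suggested in the text. */
    static UserType pickType(Random rnd) {
        int roll = rnd.nextInt(100); // uniform in 0..99
        if (roll < 10) {
            return UserType.VERY_ACTIVE;
        }
        if (roll < 70) {
            return UserType.MEDIUM_ACTIVE;
        }
        return UserType.HARDLY_ACTIVE;
    }

    /** Random pause in [minPauseMs, maxPauseMs], both bounds inclusive. */
    static int randomPauseMs(UserType type, Random rnd) {
        return type.minPauseMs + rnd.nextInt(type.maxPauseMs - type.minPauseMs + 1);
    }

    /** Random number of repetitions in [1, maxRepetitions]. */
    static int randomRepetitions(UserType type, Random rnd) {
        return 1 + rnd.nextInt(type.maxRepetitions);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42); // fixed seed so a run can be reproduced
        for (int user = 0; user < 5; user++) {
            UserType type = pickType(rnd);
            System.out.printf("user %d: %s, pause %d ms, %d repetitions%n",
                    user, type, randomPauseMs(type, rnd), randomRepetitions(type, rnd));
        }
    }
}
```

Keeping the user type in one place means the pause bounds and repetition caps stay consistent across the script, which also supports the "don't repeat yourself" advice.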

Logging

  • While you are developing and troubleshooting your load test script, consider combining debug messages with actual local debugging. If you leave the logging statements in your script, consider their impact on CPU utilization and disk space. Keep logging in large-scale tests to the absolute minimum.
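
One possible way to keep some log output in a large test without paying its full cost is to log only a sample of events. The sketch below is an assumption, not part of the original setup: the class SampledLog and the 1-in-1000 rate are hypothetical, and it uses only the JDK.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: lets through only every Nth log message to limit CPU and disk usage. */
final class SampledLog {
    private final long every;
    private final AtomicLong seen = new AtomicLong();

    SampledLog(long every) {
        if (every < 1) {
            throw new IllegalArgumentException("every must be >= 1");
        }
        this.every = every;
    }

    /** Returns true for call 1, N+1, 2N+1, and so on. Safe to call from several threads. */
    boolean shouldLog() {
        return seen.getAndIncrement() % every == 0;
    }

    public static void main(String[] args) {
        SampledLog sampler = new SampledLog(1000);
        for (int i = 0; i < 5000; i++) {
            if (sampler.shouldLog()) {
                System.out.println("request " + i + " failed (1 in 1000 logged)");
            }
        }
    }
}
```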

Monitoring

  • Monitor the load test while it is running to verify that it behaves the way you expect. Take advantage of the AWS CloudWatch metrics of your load generator instances. You can also configure Graphite monitoring for each load generator through the gatling.conf configuration file. For additional insight, SSH into the load generators during the test using the same SSH key pair that Jenkins uses.

Continuous Integration / Jenkins

  • Pipelines: Consider using Jenkins pipelines for the different tasks involved in load testing. We recommend creating a pipeline of the following jobs: (1) Update the load test environment with the latest code. (2) Kick off a load test against the freshly updated environment. (3) Append the results link to an external system like Confluence. Consider using build parameters to configure your load test. Among other things, we frequently wanted to parameterize the following values: the URL of the server you want to test (this allows you to vary the load test environment), the number and type of load generators, and the name of the specific scenario you want to load test (your simulation file can pick up this value to send different kinds of traffic patterns, e.g. cold spike, staircase, soak, sine wave).
  • Utilization: The Gatling AWS Maven plugin has a very specific performance profile. For the majority of the test, the plugin is network IO heavy (phase 1 and 2). Once the actual test is over, the aggregation of the results is very CPU intensive. As a result, we recommend having a dedicated Jenkins slave for running the tests. Ideally this slave is an EC2 instance itself. This benefits the network IO between the Jenkins build slave and the load generators, and it makes it simple to resize the slave as necessary to reduce costs.
  • Disk Space: Wipe the workspace at the start of the test to reduce the disk space requirements on Jenkins. Especially when long-running tests are logging errors and warnings for long periods of time, you will need a lot of temporary disk space on the Jenkins slave. This space can be reclaimed once the simulation results have been archived to S3.
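
The build-parameter idea from the Pipelines item can be sketched as follows. It assumes Jenkins passes the parameters to the JVM as system properties (for example -DtargetUrl=... and -Dscenario=...). The property names, the class LoadProfile, and the exact load shapes are hypothetical: the text names the traffic patterns but does not define them, so the formulas below are one illustrative interpretation.

```java
/** Hypothetical sketch: reads build parameters and maps a scenario name to a load shape. */
final class LoadProfile {

    /** Reads a JVM system property, falling back to a default if it is not set. */
    static String param(String name, String defaultValue) {
        return System.getProperty(name, defaultValue);
    }

    /**
     * Target users per second at elapsedSec into a test lasting durationSec,
     * with peakRate as the highest rate. The shapes are illustrative assumptions.
     */
    static double usersPerSecond(String scenario, double elapsedSec, double durationSec, double peakRate) {
        // Fraction of the test that has elapsed, clamped to [0, 1].
        double progress = Math.min(1.0, Math.max(0.0, elapsedSec / durationSec));
        switch (scenario) {
            case "cold-spike":
                return peakRate; // full load from the first second, no warm-up
            case "staircase": {
                int steps = 4; // four equal steps up to the peak
                int step = Math.min(steps, (int) (progress * steps) + 1);
                return peakRate * step / steps;
            }
            case "soak":
                return peakRate * 0.5; // moderate constant load for the whole run
            case "sine-wave":
                return peakRate * (0.5 + 0.5 * Math.sin(2 * Math.PI * progress)); // one full cycle
            default:
                throw new IllegalArgumentException("Unknown scenario: " + scenario);
        }
    }

    public static void main(String[] args) {
        String targetUrl = param("targetUrl", "http://localhost:8080");
        String scenario = param("scenario", "staircase");
        System.out.println("Testing " + targetUrl + " with scenario " + scenario);
        for (int t = 0; t <= 120; t += 30) {
            System.out.printf("t=%ds -> %.1f users/s%n", t, usersPerSecond(scenario, t, 120, 100));
        }
    }
}
```

Rejecting unknown scenario names early makes a mistyped build parameter fail the job immediately instead of silently running the wrong traffic pattern.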