lichenran1234/load-test

How to Load Test Databricks Model Serving Endpoints

Load Testing using Databricks Notebook (no setup needed)

This notebook provides a convenient way to load test your serving endpoints and obtain insights into workload size, QPS, and latency. By using the out-of-the-box setup, you can quickly get started with load testing without any additional setup. This notebook is recommended for low to mid QPS goals. If you have a QPS goal greater than 2k, we recommend following the rest of this tutorial to set up Locust.

Load Testing using Locust

Load testing a Databricks Model Serving endpoint is an important step before moving it to production. A load test verifies that latency meets your requirements, helps you estimate costs, and determines the throughput and concurrency you can expect.

This repository demonstrates how to load test a Databricks Model Serving endpoint using the open source load testing tool locust.io.

Expected Results

After setting up Locust, you will be able to send requests to your model endpoint at a configurable concurrency. Locust records the response latency (p50, p75, p99, etc.) and displays it in the web UI. The data can also be downloaded in CSV format.
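
To make the percentile figures concrete, here is a minimal sketch of how p50, p75, and p99 can be computed from raw latency samples. It uses the nearest-rank definition, which is an assumption chosen for simplicity; Locust's own estimator may differ slightly. The function name `percentile` and the sample values are hypothetical.

```python
import math


def percentile(latencies_ms, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(latencies_ms)
    # Nearest rank: the smallest sample with at least p% of samples at or below it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical latency samples in milliseconds, one per request.
samples = [12, 15, 11, 14, 230, 13, 16, 12, 18, 13]
summary = {f"p{p}": percentile(samples, p) for p in (50, 75, 99)}
print(summary)
```

In this made-up sample a single slow request sets the p99 value while leaving p50 almost unchanged, which is why tail percentiles are worth checking alongside the median.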

Requirements

Before starting the walkthrough, make sure you complete the following tasks:

  • Create a Model Serving endpoint and verify it is in the "Ready" state.
  • Have at least one sample payload for the model ready in JSON format. You can see the supported format options here.
  • Record the instance name of the Databricks workspace where the model is deployed.
  • Databricks APIs authenticate with Personal Access Tokens (PATs). Use an existing PAT or generate a new one that has "Can View" or "Can Manage" permissions on the model serving endpoint.
  • If your Databricks workspace restricts IP addresses to an IP access list, your load testing source's IP address must be within the access list. You may need to connect to your company's VPN. You can check if IP access lists are used in your Databricks workspace by following the instructions here.
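
Before starting Locust, it can help to confirm these prerequisites with a single request. The following is a rough sketch using only the Python standard library, not part of this repository. It assumes the endpoint status is available at `/api/2.0/serving-endpoints/<name>` with a `state.ready` field, and that scoring requests go to `/serving-endpoints/<name>/invocations`. All function names are hypothetical.

```python
import json
import urllib.request


def invocation_url(instance_name, endpoint_name):
    """URL used to score requests against a Model Serving endpoint."""
    return f"https://{instance_name}/serving-endpoints/{endpoint_name}/invocations"


def status_url(instance_name, endpoint_name):
    """URL assumed to return the endpoint's current state as JSON."""
    return f"https://{instance_name}/api/2.0/serving-endpoints/{endpoint_name}"


def auth_headers(token):
    """Headers carrying the Personal Access Token as a bearer token."""
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}


def is_ready(endpoint_info):
    # Assumption: the status response looks like {"state": {"ready": "READY"}}.
    return endpoint_info.get("state", {}).get("ready") == "READY"


def check_prerequisites(instance_name, endpoint_name, token, payload):
    """Verify the endpoint is ready, then send one sample payload to it."""
    headers = auth_headers(token)

    status_request = urllib.request.Request(
        status_url(instance_name, endpoint_name), headers=headers
    )
    with urllib.request.urlopen(status_request, timeout=30) as response:
        if not is_ready(json.load(response)):
            raise RuntimeError(f"endpoint {endpoint_name!r} is not in the Ready state")

    body = json.dumps(payload).encode("utf-8")
    score_request = urllib.request.Request(
        invocation_url(instance_name, endpoint_name),
        data=body,
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(score_request, timeout=30) as response:
        return json.load(response)
```

Call `check_prerequisites` with your workspace instance name, endpoint name, PAT, and the sample payload loaded from your JSON file. An HTTP error raised here most likely points to a token, permission, or IP access list problem that would also affect the load test.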

Setting up Locust on your own computer in single-process mode (5 min setup, supports low QPS)

Follow the steps outlined here to run a single-process load test from your local computer. Locust will use only one CPU core on your machine, and the maximum queries per second (QPS) it can generate depends on your payload size. For small payloads, it can reach hundreds of QPS. For large payloads, it may reach fewer than 10 QPS.
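
When choosing a concurrency for a given QPS goal, Little's law gives a rough starting point: requests in flight equal the request rate multiplied by the average latency. The sketch below assumes each simulated user sends requests back to back with no wait time and that the load generator itself is not the bottleneck. The function name `users_needed` and the example numbers are hypothetical.

```python
import math


def users_needed(target_qps, avg_latency_ms):
    """Estimate concurrent users via Little's law: in-flight = rate x latency."""
    if target_qps <= 0 or avg_latency_ms <= 0:
        raise ValueError("target_qps and avg_latency_ms must be positive")
    # Convert milliseconds to seconds, then round up to a whole user.
    return math.ceil(target_qps * avg_latency_ms / 1000)


# Hypothetical goal: 200 QPS against an endpoint averaging 50 ms per request.
print(users_needed(200, 50))
```

Treat the result as a lower bound. Latency usually rises under load, so the concurrency needed to sustain the target QPS in practice may be higher.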

Setting up distributed Locust on a powerful machine (30 min setup, supports high QPS)

Follow the steps outlined here to set up distributed Locust on an AWS EC2 instance. It should also be applicable to other cloud providers’ Ubuntu instances.
