Unify benchmark tool based on stresscli library (#66)
* Unify benchmarktool based on stresscli

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

* update code

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

* add locust template files

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

* fix issue

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix stress cli config path issue

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

* update code

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove hardcode in aistress.py

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

* add readme

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

* update document

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

* Support streaming response for getting correct first token latency and input output tokens number

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add input token number in output format

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

* update config.ini with input and output token number

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

---------

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
lvliang-intel and pre-commit-ci[bot] authored Aug 19, 2024
1 parent ebee50c commit 71637c0
Showing 26 changed files with 1,474 additions and 1,340 deletions.
123 changes: 74 additions & 49 deletions evals/benchmark/README.md
@@ -1,71 +1,96 @@
# Stress Test Script
# OPEA Benchmark Tool

## Introduction
This tool provides a microservices benchmarking framework that uses YAML configurations to define test cases for different services. It executes these tests using `stresscli`, which is built on top of [locust](https://github.com/locustio/locust), a performance/load testing tool for HTTP and other protocols, and it logs the results for performance analysis and data visualization.

This script is a load testing tool designed to simulate high-concurrency scenarios for a given server. It supports multiple task types and models, allowing users to evaluate the performance and stability of different configurations under heavy load.
## Features

## Prerequisites
- **Services load testing**: Simulates high concurrency levels to test services like LLM, reranking, ASR, E2E and more.
- **YAML-based configuration**: Define test cases, service endpoints, and testing parameters in YAML.
- **Service metrics collection**: Optionally collect service metrics for detailed performance analysis.
- **Flexible testing**: Supports various test cases like chatqna, codegen, codetrans, faqgen, audioqna, and visualqna.
- **Data analysis and visualization**: After tests are executed, results can be analyzed and visualized to gain insights into the performance and behavior of each service. Performance trends, bottlenecks, and other key metrics are highlighted for decision-making.

- Python 3.8+
- Required Python packages:
- argparse
- requests
- transformers
## Table of Contents

## Installation

1. Clone the repository or download the script to your local machine.
2. Install the required Python packages using `pip`:

```sh
pip install argparse requests transformers
```
- [Installation](#installation)
- [Usage](#usage)
- [Configuration](#configuration)
- [Test Suite Configuration](#test-suite-configuration)
- [Test Cases](#test-cases)

## Usage

The script can be executed with various command-line arguments to customize the test. Here is a breakdown of the available options:
## Installation

- `-f`: The file path containing the list of questions to be used for the test. If not provided, a default question will be used.
- `-s`: The server address in the format `host:port`. Default is `localhost:8080`.
- `-c`: The number of concurrent workers. Default is 20.
- `-d`: The duration for which the test should run. This can be specified in seconds (e.g., `30s`), minutes (e.g., `10m`), or hours (e.g., `1h`). Default is `1h`.
- `-u`: The delay time before each worker starts, specified in seconds (e.g., `2s`). Default is `1s`.
- `-t`: The task type to be tested. Options are `chatqna`, `openai`, `tgi`, `llm`, `tei_embedding`, `embedding`, `retrieval`, `tei_rerank` or `reranking`. Default is `chatqna`.
- `-m`: The model to be used. Default is `Intel/neural-chat-7b-v3-3`.
- `-z`: The maximum number of tokens for the model. Default is 1024.
### Prerequisites

### Example Commands
- Python 3.x
- Install the required Python packages:

```bash
python stress_benchmark.py -f data.txt -s localhost:8888 -c 50 -d 30m -t chatqna
pip install -r ../../requirements.txt
```

### Running the Test
## Usage

To start the test, execute the script with the desired options. The script will:
1. Define the test cases and configurations in the `benchmark.yaml` file.

1. Initialize the question pool from the provided file or use the default question.
2. Start a specified number of worker threads.
3. Each worker will repeatedly send requests to the server and collect response data.
4. Results will be written to a CSV file.
2. Run the benchmark script:

### Output
```bash
python benchmark.py
```

The results will be stored in a CSV file with the following columns:
The results will be stored in the directory specified by `test_output_dir` in the configuration.

- `question_len`: The length of the input question in tokens.
- `answer_len`: The length of the response in tokens.
- `first_chunk`: The time taken to receive the first chunk of the response.
- `overall`: The total time taken for the request to complete.
- `err`: Any error that occurred during the request.
- `code`: The HTTP status code of the response.

## Notes
## Configuration

- Ensure the server address is correctly specified and accessible.
- Adjust the concurrency level (`-c`) and duration (`-d`) based on the capacity of your server and the goals of your test.
- Monitor the server's performance and logs to identify any potential issues during the load test.
The benchmark.yaml file defines the test suite and individual test cases. Below are the primary sections:

## Logging
### Test Suite Configuration

The script logs detailed information about each request and any errors encountered. The logs can be useful for diagnosing issues and understanding the behavior of the server under load.
```yaml
test_suite_config:
  examples: ["chatqna"]  # Test cases to be run (e.g., chatqna, codegen)
  concurrent_level: 4  # The concurrency level
  user_queries: [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]  # Number of test requests
  random_prompt: false  # Use random prompts if true, fixed prompts if false
  run_time: 60m  # Total runtime for the test suite
  collect_service_metric: false  # Enable service metrics collection
  data_visualization: false  # Enable data visualization
  test_output_dir: "/home/sdp/benchmark_output"  # Directory for test outputs
```
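
As a rough illustration of how `concurrent_level` and `user_queries` interact, the sketch below mirrors the calculation in this commit's `benchmark.py`: each `user_queries` entry becomes one run whose user count is the request count integer-divided by `concurrent_level`, clamped to at least 1. The helper name `plan_runs` is hypothetical and used only for this example.

```python
def plan_runs(user_queries, concurrent_level):
    """Return (max_requests, users) pairs, one per entry in user_queries."""
    plans = []
    for max_requests in user_queries:
        # Same rule as benchmark.py: max(1, user_queries // concurrent_level)
        users = max(1, max_requests // concurrent_level)
        plans.append((max_requests, users))
    return plans


if __name__ == "__main__":
    for max_requests, users in plan_runs([1, 2, 4, 8, 16], concurrent_level=4):
        print(f"max-request={max_requests:>3}  users={users}")
```

With `concurrent_level: 4`, request counts below 4 still run with a single user, so the smallest entries in `user_queries` all exercise the service at the same concurrency.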
### Test Cases
Each test case includes multiple services, each of which can be toggled on/off using the `run_test` flag. You can also change specific parameters for each service for performance tuning.

Example test case configuration for `chatqna`:

```yaml
test_cases:
  chatqna:
    embedding:
      run_test: false
      service_name: "embedding-svc"
    retriever:
      run_test: false
      service_name: "retriever-svc"
      parameters:
        search_type: "similarity"
        k: 4
        fetch_k: 20
        lambda_mult: 0.5
        score_threshold: 0.2
    llm:
      run_test: false
      service_name: "llm-svc"
      parameters:
        model_name: "Intel/neural-chat-7b-v3-3"
        max_new_tokens: 128
        temperature: 0.01
        streaming: true
    e2e:
      run_test: true
      service_name: "chatqna-backend-server-svc"
```
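
To show how the `run_test` flag selects what gets benchmarked, here is a minimal sketch that filters a test case down to its enabled services. It uses a plain dictionary in place of the parsed YAML, and the function name `enabled_services` is hypothetical. In the commit, `benchmark.py` walks a fixed list of service types per example rather than the dictionary order used here.

```python
# Stand-in for the parsed benchmark.yaml (a subset of the chatqna case above).
CONFIG = {
    "test_cases": {
        "chatqna": {
            "embedding": {"run_test": False, "service_name": "embedding-svc"},
            "llm": {"run_test": False, "service_name": "llm-svc"},
            "e2e": {"run_test": True, "service_name": "chatqna-backend-server-svc"},
        }
    }
}


def enabled_services(config, example):
    """Return (service_type, service_name) pairs whose run_test flag is true."""
    case = config.get("test_cases", {}).get(example, {})
    return [
        (service_type, service["service_name"])
        for service_type, service in case.items()
        if service.get("run_test")
    ]


if __name__ == "__main__":
    for service_type, service_name in enabled_services(CONFIG, "chatqna"):
        print(f"would benchmark {service_type}: {service_name}")
```

An example that is missing from `test_cases` yields an empty list, so nothing is run for it.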
175 changes: 175 additions & 0 deletions evals/benchmark/benchmark.py
@@ -0,0 +1,175 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import os
from datetime import datetime

import yaml
from stresscli.commands.load_test import locust_runtests
from utils import get_service_cluster_ip, load_yaml

service_endpoints = {
    "chatqna": {
        "embedding": "/v1/embeddings",
        "embedding_serving": "/v1/embeddings",
        "retriever": "/v1/retrieval",
        "reranking": "/v1/reranking",
        "reranking_serving": "/rerank",
        "llm": "/v1/chat/completions",
        "llm_serving": "/v1/chat/completions",
        "e2e": "/v1/chatqna",
    },
    "codegen": {"llm": "/v1/chat/completions", "llm_serving": "/v1/chat/completions", "e2e": "/v1/codegen"},
    "codetrans": {"llm": "/v1/chat/completions", "llm_serving": "/v1/chat/completions", "e2e": "/v1/codetrans"},
    "faqgen": {"llm": "/v1/chat/completions", "llm_serving": "/v1/chat/completions", "e2e": "/v1/faqgen"},
    "audioqna": {
        "asr": "/v1/audio/transcriptions",
        "llm": "/v1/chat/completions",
        "llm_serving": "/v1/chat/completions",
        "tts": "/v1/audio/speech",
        "e2e": "/v1/audioqna",
    },
    "visualqna": {"lvm": "/v1/chat/completions", "lvm_serving": "/v1/chat/completions", "e2e": "/v1/visualqna"},
}


def extract_test_case_data(content):
    """Extract relevant data from the YAML based on the specified test cases."""
    # Extract test suite configuration
    test_suite_config = content.get("test_suite_config", {})

    return {
        "examples": test_suite_config.get("examples", []),
        "concurrent_level": test_suite_config.get("concurrent_level"),
        "user_queries": test_suite_config.get("user_queries", []),
        "random_prompt": test_suite_config.get("random_prompt"),
        "test_output_dir": test_suite_config.get("test_output_dir"),
        "run_time": test_suite_config.get("run_time"),
        "collect_service_metric": test_suite_config.get("collect_service_metric"),
        "llm_model": test_suite_config.get("llm_model"),
        "all_case_data": {
            example: content["test_cases"].get(example, {}) for example in test_suite_config.get("examples", [])
        },
    }


def create_run_yaml_content(service_name, base_url, bench_target, concurrency, user_queries, test_suite_config):
    """Create content for the run.yaml file."""
    return {
        "profile": {
            "storage": {"hostpath": test_suite_config["test_output_dir"]},
            "global-settings": {
                "tool": "locust",
                "locustfile": os.path.join(os.getcwd(), "stresscli/locust/aistress.py"),
                "host": base_url,
                "stop-timeout": 120,
                "processes": 2,
                "namespace": "default",
                "bench-target": bench_target,
                "run-time": test_suite_config["run_time"],
                "service-metric-collect": test_suite_config["collect_service_metric"],
                "llm-model": test_suite_config["llm_model"],
            },
            "runs": [{"name": "benchmark", "users": concurrency, "max-request": user_queries}],
        }
    }


def create_and_save_run_yaml(example, service_type, service_name, base_url, test_suite_config, index):
    """Create and save the run.yaml file for the service being tested."""
    os.makedirs(test_suite_config["test_output_dir"], exist_ok=True)

    run_yaml_paths = []
    for user_queries in test_suite_config["user_queries"]:
        concurrency = max(1, user_queries // test_suite_config["concurrent_level"])

        bench_target = (
            f"{example}{'bench' if service_type == 'e2e' and test_suite_config['random_prompt'] else 'fixed'}"
        )
        run_yaml_content = create_run_yaml_content(
            service_name, base_url, bench_target, concurrency, user_queries, test_suite_config
        )

        run_yaml_path = os.path.join(
            test_suite_config["test_output_dir"], f"run_{service_name}_{index}_users_{user_queries}.yaml"
        )
        with open(run_yaml_path, "w") as yaml_file:
            yaml.dump(run_yaml_content, yaml_file)

        run_yaml_paths.append(run_yaml_path)

    return run_yaml_paths


def run_service_test(example, service_type, service_name, parameters, test_suite_config):
    svc_ip, port = get_service_cluster_ip(service_name)
    base_url = f"http://{svc_ip}:{port}"
    endpoint = service_endpoints[example][service_type]
    url = f"{base_url}{endpoint}"
    print(f"[OPEA BENCHMARK] 🚀 Running test for {service_name} at {url}")

    # Generate a unique index based on the current time
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

    # Create the run.yaml for the service
    run_yaml_paths = create_and_save_run_yaml(
        example, service_type, service_name, base_url, test_suite_config, timestamp
    )

    # Run the test using locust_runtests function
    for index, run_yaml_path in enumerate(run_yaml_paths, start=1):
        print(f"[OPEA BENCHMARK] 🚀 The {index} time test is running, run yaml: {run_yaml_path}...")
        locust_runtests(None, run_yaml_path)

    print(f"[OPEA BENCHMARK] 🚀 Test completed for {service_name} at {url}")


def process_service(example, service_name, case_data, test_suite_config):
    service = case_data.get(service_name)
    if service and service.get("run_test"):
        print(f"[OPEA BENCHMARK] 🚀 Example: {example} Service: {service.get('service_name')}, Running test...")
        run_service_test(
            example, service_name, service.get("service_name"), service.get("parameters", {}), test_suite_config
        )


if __name__ == "__main__":
    # Load test suite configuration
    yaml_content = load_yaml("./benchmark.yaml")
    # Extract data
    parsed_data = extract_test_case_data(yaml_content)
    test_suite_config = {
        "concurrent_level": parsed_data["concurrent_level"],
        "user_queries": parsed_data["user_queries"],
        "random_prompt": parsed_data["random_prompt"],
        "run_time": parsed_data["run_time"],
        "collect_service_metric": parsed_data["collect_service_metric"],
        "llm_model": parsed_data["llm_model"],
        "test_output_dir": parsed_data["test_output_dir"],
    }

    # Mapping of example names to service types
    example_service_map = {
        "chatqna": [
            "embedding",
            "embedding_serving",
            "retriever",
            "reranking",
            "reranking_serving",
            "llm",
            "llm_serving",
            "e2e",
        ],
        "codegen": ["llm", "llm_serving", "e2e"],
        "codetrans": ["llm", "llm_serving", "e2e"],
        "faqgen": ["llm", "llm_serving", "e2e"],
        "audioqna": ["asr", "llm", "llm_serving", "tts", "e2e"],
        "visualqna": ["lvm", "lvm_serving", "e2e"],
    }

    # Process each example's services
    for example in parsed_data["examples"]:
        case_data = parsed_data["all_case_data"].get(example, {})
        service_types = example_service_map.get(example, [])
        for service_type in service_types:
            process_service(example, service_type, case_data, test_suite_config)