# FastAPI Dynamic Batching Benchmark

This notebook compares a FastAPI embedding endpoint that uses dynamic batching against one that does not. We'll create both endpoints, then benchmark them by sending the same number of concurrent requests to each.
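
To make the idea concrete, here is a minimal standard-library sketch of what dynamic batching does: concurrent single-item calls are queued, grouped for a short window, and passed to one batched function call. This is an illustration only and not how the `batch` library used below is implemented. `MicroBatcher`, `max_batch_size`, `max_wait_s`, and `demo` are hypothetical names, and the lambda in `demo` is a stand-in for a model's `encode`.

```python
import asyncio
from typing import Any, Callable, List, Tuple


class MicroBatcher:
    """Groups concurrent single-item calls into batched calls (illustrative only)."""

    def __init__(self, batch_fn: Callable[[List[Any]], List[Any]],
                 max_batch_size: int = 32, max_wait_s: float = 0.01):
        self.batch_fn = batch_fn
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue: asyncio.Queue = asyncio.Queue()
        self.batch_sizes: List[int] = []  # recorded only so the demo can inspect it

    async def submit(self, item: Any) -> Any:
        """Enqueue one item and wait for its individual result."""
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((item, future))
        return await future

    async def run(self) -> None:
        """Worker loop: wait for a first item, collect more briefly, run one batch."""
        while True:
            first = await self.queue.get()
            await asyncio.sleep(self.max_wait_s)  # give other callers time to arrive
            pending = [first]
            while len(pending) < self.max_batch_size and not self.queue.empty():
                pending.append(self.queue.get_nowait())
            items = [item for item, _ in pending]
            futures = [future for _, future in pending]
            self.batch_sizes.append(len(items))
            try:
                # Assumption: batch_fn is fast; a real model call would block the loop.
                results = self.batch_fn(items)
            except Exception as exc:
                for future in futures:
                    future.set_exception(exc)
                continue
            for future, result in zip(futures, results):
                future.set_result(result)


async def demo() -> Tuple[List[int], List[int]]:
    # Stand-in for model.encode: maps each text to its length.
    batcher = MicroBatcher(lambda texts: [len(t) for t in texts], max_batch_size=8)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit("x" * i) for i in range(20)))
    worker.cancel()
    return results, batcher.batch_sizes
```

In a notebook cell this can be run with `results, sizes = await demo()`; in a script, use `asyncio.run(demo())`. The 20 concurrent callers are served by a few batched calls of at most 8 items each instead of 20 separate calls, which is where the throughput gain comes from when the batched function is a model running on a GPU.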

## Step 1: Import Required Libraries

In [1]:
import asyncio
import time
from typing import List, Union
import threading

import aiohttp
import numpy as np
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer
import uvicorn

# Import the necessary batching libraries
import batch



## Step 2: Define FastAPI App and Endpoints

In [2]:
app = FastAPI()

# Placeholders: each model is loaded lazily on the first request to its endpoint
model_dynamic = None
model_no_dynamic = None

class TextInput(BaseModel):
    input: Union[str, List[str]]

class EmbeddingResponse(BaseModel):
    embedding: List[float]

@app.post("/embeddings_dynamic", response_model=EmbeddingResponse)
async def create_embedding_dynamic(input_data: TextInput):
    global model_dynamic
    if model_dynamic is None:
        model_dynamic = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
        model_dynamic.half()
        model_dynamic.encode = batch.aio.dynamically(model_dynamic.encode)
    embedding = await model_dynamic.encode([input_data.input])
    return EmbeddingResponse(embedding=embedding[0].tolist())

@app.post("/embeddings_no_dynamic", response_model=EmbeddingResponse)
async def create_embedding_no_dynamic(input_data: TextInput):
    global model_no_dynamic
    if model_no_dynamic is None:
        model_no_dynamic = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
        model_no_dynamic.half()
        model_no_dynamic.encode = batch.utils.ensure_async(model_no_dynamic.encode)
    embedding = await model_no_dynamic.encode([input_data.input])
    return EmbeddingResponse(embedding=embedding[0].tolist())

## Step 3: Define Server Start Function

In [3]:
def start_server():
    uvicorn.run(app, host="0.0.0.0", port=8000)

## Step 4: Define Benchmark Function

In [4]:
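async def fetch(session: aiohttp.ClientSession, url: str) -> None:
    # Read the full body so the request completes and the connection is released
    async with session.post(url, json={"input": "This is a test sentence."}) as response:
        response.raise_for_status()  # fail loudly instead of timing error responses
        await response.read()

async def benchmark(url: str, num_requests: int) -> float:
    async with aiohttp.ClientSession() as session:
        start_time = time.perf_counter()
        # Send all requests concurrently and wait for every one to finish
        await asyncio.gather(*(fetch(session, url) for _ in range(num_requests)))
        end_time = time.perf_counter()
    return end_time - start_time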
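
The benchmark function reports only the total wall-clock time. If you also want to see how individual requests behave, a sketch like the following collects per-request latencies and summarizes them. The names `timed_call`, `measure_latencies`, and `summarize` are hypothetical, and `make_request` is an assumed zero-argument coroutine function that performs one request (for example, one POST through an `aiohttp` session).

```python
import asyncio
import statistics
import time
from typing import Awaitable, Callable, Dict, List


async def timed_call(make_request: Callable[[], Awaitable[object]]) -> float:
    """Await one request and return how long it took, in seconds."""
    start = time.perf_counter()
    await make_request()
    return time.perf_counter() - start


async def measure_latencies(make_request: Callable[[], Awaitable[object]],
                            num_requests: int) -> List[float]:
    """Run num_requests requests concurrently and return each one's latency."""
    calls = (timed_call(make_request) for _ in range(num_requests))
    return list(await asyncio.gather(*calls))


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Mean, median, 95th percentile, and maximum (needs at least two samples)."""
    ordered = sorted(latencies)
    # 99 cut points; "inclusive" interpolates only within the observed range.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(ordered),
        "p50": statistics.median(ordered),
        "p95": cuts[94],
        "max": ordered[-1],
    }
```

With dynamic batching, expect the latency of an individual request to include the time it spends waiting for its batch to fill, so the median can rise even while total throughput improves.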

## Step 5: Start the Server in a Separate Thread

In [5]:
server_thread = threading.Thread(target=start_server)
server_thread.start()

INFO:     Started server process [73517]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
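
Because the server starts in a background thread, the next cell can run before the port is accepting connections. A small standard-library sketch for waiting until the port is reachable is shown here; `wait_for_port` is a hypothetical helper name, and the host and port in the usage note are the ones used in this notebook.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, else False."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.1)  # not up yet; retry shortly
    return False
```

For example, `assert wait_for_port("localhost", 8000)` before the benchmark cell keeps the first requests from failing with a connection error.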


## Step 6: Run Benchmarks

In [7]:
async def run_benchmarks():
    print("Running benchmarks...")
    num_requests = 1000
    
    dynamic_time = await benchmark("http://localhost:8000/embeddings_dynamic", num_requests)
    print(f"Dynamic batching: {dynamic_time:.2f} seconds for {num_requests} requests")
    
    no_dynamic_time = await benchmark("http://localhost:8000/embeddings_no_dynamic", num_requests)
    print(f"No dynamic batching: {no_dynamic_time:.2f} seconds for {num_requests} requests")
    return dynamic_time, no_dynamic_time

dt, no_dt = await run_benchmarks()
print(f"Speedup factor: {no_dt / dt:.2f}x")

Running benchmarks...
INFO:     127.0.0.1:57210 - "POST /embeddings_dynamic HTTP/1.1" 200 OK
INFO:     127.0.0.1:57214 - "POST /embeddings_dynamic HTTP/1.1" 200 OK
INFO:     127.0.0.1:57222 - "POST /embeddings_dynamic HTTP/1.1" 200 OK
INFO:     127.0.0.1:57230 - "POST /embeddings_dynamic HTTP/1.1" 200 OK
INFO:     127.0.0.1:57228 - "POST /embeddings_dynamic HTTP/1.1" 200 OK
INFO:     127.0.0.1:57226 - "POST /embeddings_dynamic HTTP/1.1" 200 OK
INFO:     127.0.0.1:57220 - "POST /embeddings_dynamic HTTP/1.1" 200 OK
INFO:     127.0.0.1:57224 - "POST /embeddings_dynamic HTTP/1.1" 200 OK
INFO:     127.0.0.1:57218 - "POST /embeddings_dynamic HTTP/1.1" 200 OK
INFO:     127.0.0.1:57212 - "POST /embeddings_dynamic HTTP/1.1" 200 OK
INFO:     127.0.0.1:57216 - "POST /embeddings_dynamic HTTP/1.1" 200 OK
INFO:     127.0.0.1:57232 - "POST /embeddings_dynamic HTTP/1.1" 200 OK
INFO:     127.0.0.1:57236 - "POST /embeddings_dynamic HTTP/1.1" 200 OK
INFO:     127.0.0.1:57244 - "POST /embeddings_dynamic H

## Step 7: Analyze Results

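The dynamic batching endpoint should finish the same number of requests in less time, and the gap should widen as the number of concurrent requests increases. Keep in mind that each endpoint loads its model lazily on the first request it receives, so each measured time includes a one-time model load. To exclude it, send a single warm-up request to each endpoint before timing.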

You can modify the `num_requests` variable in the `run_benchmarks()` function to test with different loads and see how the performance difference scales.
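
One way to compare several loads in a single run is a small sweep helper. This is a sketch: `sweep` is a hypothetical name, and it takes the benchmark coroutine as a parameter (anything with the same `(url, num_requests)` signature as `benchmark` above) so that it does not depend on the rest of the notebook.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable

# Any coroutine function shaped like benchmark(url, num_requests) -> seconds.
BenchFn = Callable[[str, int], Awaitable[float]]


async def sweep(bench: BenchFn, dynamic_url: str, baseline_url: str,
                loads: Iterable[int]) -> Dict[int, float]:
    """Return {num_requests: speedup}, where speedup = baseline time / dynamic time."""
    speedups: Dict[int, float] = {}
    for num_requests in loads:
        dynamic_s = await bench(dynamic_url, num_requests)
        baseline_s = await bench(baseline_url, num_requests)
        speedups[num_requests] = baseline_s / dynamic_s
    return speedups
```

In this notebook it could be called as `await sweep(benchmark, "http://localhost:8000/embeddings_dynamic", "http://localhost:8000/embeddings_no_dynamic", [10, 100, 1000])`. A speedup above 1.0 means the dynamic batching endpoint was faster at that load.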

## Step 8: Clean Up (Optional)

If you want to stop the server after running the benchmarks, you can use the following cell. Note that this will terminate the notebook kernel, so only run it when you're done with the notebook.

In [None]:
import os
import signal

os.kill(os.getpid(), signal.SIGINT)
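
If you would rather stop the server without signalling the kernel process, one option is to start it through a `uvicorn.Server` object instead of `uvicorn.run` and set its `should_exit` flag when you are done. The sketch below assumes that approach replaces the start function from Step 3; `start_stoppable_server` and `stop_server` are hypothetical helper names, and the default host and port are assumptions.

```python
import threading
from typing import Any, Tuple


def start_stoppable_server(app: Any, host: str = "127.0.0.1",
                           port: int = 8000) -> Tuple[Any, threading.Thread]:
    """Run the app in a background thread and return (server, thread)."""
    import uvicorn  # imported here so the helpers can be defined without uvicorn

    server = uvicorn.Server(uvicorn.Config(app, host=host, port=port))
    thread = threading.Thread(target=server.run, daemon=True)
    thread.start()
    return server, thread


def stop_server(server: Any, thread: threading.Thread,
                timeout_s: float = 10.0) -> bool:
    """Ask the server to exit and wait for its thread; True if it stopped in time."""
    server.should_exit = True  # uvicorn's main loop checks this flag periodically
    thread.join(timeout=timeout_s)
    return not thread.is_alive()
```

Usage would look like `server, thread = start_stoppable_server(app)` in place of Step 5, followed by `stop_server(server, thread)` once the benchmarks have finished.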