# Parallel Inference Demo: HuggingFace Sentiment Analysis API

This notebook demonstrates how to send concurrent POST requests to a locally deployed FastAPI server that wraps a HuggingFace model. The goal is to check that the server returns correct predictions when several inference requests arrive at the same time.

In [1]:
import requests
import concurrent.futures

## Step 1: Define the API Endpoint

The service is deployed with Docker Compose: NGINX listens on port `80` and forwards incoming requests to the FastAPI app. On the host machine, the prediction endpoint is therefore reachable at:

In [48]:
# Endpoint URL
API_URL = "http://localhost/predict"
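To exercise the client code without the Docker Compose stack, a throwaway local server that mimics the `/predict` response shape can stand in for NGINX and FastAPI. This is a sketch under stated assumptions: `StubPredictHandler`, `start_stub_server`, the keyword rule, and the fixed `score` are hypothetical placeholders, not the real HuggingFace model. Only the JSON shape (`{'result': [{'label': ..., 'score': ...}]}`) follows the output recorded in Step 4.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class StubPredictHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the FastAPI /predict route (no real model)."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        text = payload.get("text", "").lower()
        # Toy keyword rule, NOT sentiment analysis; it only reproduces the response shape
        negative = any(w in text for w in ("terrible", "horrible", "not", "disappoint"))
        label = "NEGATIVE" if negative else "POSITIVE"
        # Placeholder score; the real model returns its own confidence
        body = json.dumps({"result": [{"label": label, "score": 0.5}]}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging in the notebook


def start_stub_server(port=0):
    """Start the stub on a background thread; port=0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), StubPredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, actual_port = server.server_address
    return server, f"http://{host}:{actual_port}/predict"
```

Running `server, API_URL = start_stub_server()` before the later cells points them at the stub; call `server.shutdown()` when finished. `ThreadingHTTPServer` handles each connection on its own thread, so the stub can serve the parallel requests in Step 4.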

## Step 2: Prepare Input Texts

We define a small batch of example sentences for sentiment classification. These texts will be sent in parallel to simulate concurrent user requests.


In [49]:
# Sample inputs
texts = [
    "I love this!",
    "This is terrible.",
    "Fantastic experience.",
    "Horrible and boring.",
    "Absolutely amazing!",
    "Not worth watching.",
    "I enjoyed it a lot.",
    "Disappointing outcome."
]
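Eight sentences yield at most eight requests. To generate a larger batch from the same samples, for example to see how the server behaves with more requests than workers, the list can be repeated cyclically. A minimal sketch; `make_batch` is a hypothetical helper, not part of the original notebook.

```python
from itertools import cycle, islice


def make_batch(samples, n_requests):
    """Return n_requests texts by cycling through samples (hypothetical helper)."""
    if not samples:
        raise ValueError("samples must be non-empty")
    # cycle() repeats the list forever; islice() stops after n_requests items
    return list(islice(cycle(samples), n_requests))


batch = make_batch(["I love this!", "This is terrible."], 5)
print(batch)
# ['I love this!', 'This is terrible.', 'I love this!', 'This is terrible.', 'I love this!']
```

Passing `make_batch(texts, 200)` in place of `texts` in Step 4 would send 200 requests built from the eight sample sentences.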

## Step 3: Define the Request Function

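This function sends a POST request with the JSON body `{"text": ...}` to the `/predict` endpoint and returns the parsed JSON response. If the request fails, it returns a dictionary with an `"error"` key instead of raising, so a single failed request does not abort the whole batch.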

In [50]:
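# Function to send a POST request and return the parsed JSON response
def send_request(text):
    try:
        response = requests.post(API_URL, json={"text": text}, timeout=10)
        # Treat HTTP 4xx/5xx (e.g. a 502 from NGINX) as errors instead of parsing them as predictions
        response.raise_for_status()
        return response.json()
    except Exception as e:
        # Connection errors, timeouts, HTTP errors, and invalid JSON all end up here
        return {"error": str(e)}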
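The function above keeps neither the input text nor the elapsed time, which makes it hard to spot slow requests or to match a result back to its sentence once results arrive out of order. A standard-library variant that records both is sketched below. It uses `urllib.request` so it runs without `requests`; the name `timed_request` and the returned keys (`ok`, `body`, `error`, `text`, `latency_s`) are illustrative choices, not part of the deployed API.

```python
import json
import time
import urllib.request


def timed_request(url, text, timeout=10.0):
    """POST {"text": text} to url; return the input, the parsed body or error, and latency."""
    start = time.perf_counter()
    req = urllib.request.Request(
        url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        # urlopen raises HTTPError for 4xx/5xx responses, so those land in the except branch
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
        outcome = {"ok": True, "body": body}
    except (OSError, ValueError) as exc:
        # URLError/HTTPError and socket timeouts subclass OSError; ValueError covers bad JSON
        outcome = {"ok": False, "error": str(exc)}
    outcome["text"] = text
    outcome["latency_s"] = time.perf_counter() - start
    return outcome
```

Submitting `timed_request(API_URL, text)` to the executor instead of `send_request(text)` yields results that carry their own input and latency, so completion order no longer matters.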


## Step 4: Send Requests in Parallel

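We use Python's `ThreadPoolExecutor` to simulate several users sending requests at the same time. Threads suit this workload because each request spends most of its time waiting on the network, and Python releases the GIL during blocking socket I/O. With only eight short inputs, this is a concurrency smoke test rather than a load test.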


In [52]:
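# Send requests in parallel: one worker thread per input text
# (max_workers=1 would send them one at a time)
with concurrent.futures.ThreadPoolExecutor(max_workers=len(texts)) as executor:
    futures = [executor.submit(send_request, text) for text in texts]
    # as_completed yields futures as they finish, so output order may differ from input order
    for future in concurrent.futures.as_completed(futures):
        print(future.result())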

{'result': [{'label': 'POSITIVE', 'score': 0.9998764991760254}]}
{'result': [{'label': 'NEGATIVE', 'score': 0.9996345043182373}]}
{'result': [{'label': 'POSITIVE', 'score': 0.999881386756897}]}
{'result': [{'label': 'NEGATIVE', 'score': 0.9997796416282654}]}
{'result': [{'label': 'POSITIVE', 'score': 0.9998759031295776}]}
{'result': [{'label': 'NEGATIVE', 'score': 0.9997840523719788}]}
{'result': [{'label': 'POSITIVE', 'score': 0.9998775720596313}]}
{'result': [{'label': 'NEGATIVE', 'score': 0.9997884631156921}]}
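One way to confirm that the pool actually overlaps requests is to time the same batch with one worker and with several. The sketch below does this with a hypothetical `fake_request` that sleeps for 0.2 s to mimic network latency, so it runs without the server; `run_batch` is likewise a hypothetical helper. It also maps each future back to its input position, because `as_completed` yields in completion order.

```python
import concurrent.futures
import time


def run_batch(fn, items, max_workers):
    """Apply fn to every item with a thread pool; return (results in input order, elapsed seconds)."""
    start = time.perf_counter()
    results = [None] * len(items)
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which input each future belongs to
        future_to_index = {executor.submit(fn, item): i for i, item in enumerate(items)}
        for future in concurrent.futures.as_completed(future_to_index):
            results[future_to_index[future]] = future.result()
    return results, time.perf_counter() - start


def fake_request(text):
    """Hypothetical stand-in for send_request: sleeps ~0.2 s instead of calling the API."""
    time.sleep(0.2)
    return {"text": text, "length": len(text)}


sample = ["I love this!", "This is terrible.", "Fantastic experience.", "Horrible and boring."]
_, sequential_s = run_batch(fake_request, sample, max_workers=1)
_, parallel_s = run_batch(fake_request, sample, max_workers=len(sample))
print(f"1 worker: {sequential_s:.2f} s, {len(sample)} workers: {parallel_s:.2f} s")
```

With sleeping stand-ins the sequential run takes at least 0.8 s, while the parallel run should finish in roughly the time of one call. Passing `send_request` and `texts` instead measures the real endpoint, where the speedup also depends on how many requests the server processes concurrently.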


## Step 5: Conclusion

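Each printed dictionary shows the predicted sentiment label and its confidence score for one input text. All eight requests returned a prediction: four `POSITIVE` and four `NEGATIVE`, matching the four positive and four negative sample sentences. This indicates the API answers a small burst of concurrent requests correctly, though it says nothing yet about latency or behavior under sustained load.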
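Instead of inspecting the printed dictionaries by eye, the same check can be automated. This sketch assumes the response shape seen in the Step 4 output and the two labels `POSITIVE` and `NEGATIVE`; the helper name `is_valid_prediction` and the `{"error": "timed out"}` sample are hypothetical.

```python
def is_valid_prediction(response):
    """Return True if response looks like {'result': [{'label': ..., 'score': ...}]}."""
    # Error dicts such as {'error': ...} have no 'result' key and fail this check
    result = response.get("result")
    if not isinstance(result, list) or not result:
        return False
    first = result[0]
    return (
        isinstance(first, dict)
        and first.get("label") in {"POSITIVE", "NEGATIVE"}
        and isinstance(first.get("score"), (int, float))
        and 0.0 <= first["score"] <= 1.0
    )


responses = [
    {"result": [{"label": "POSITIVE", "score": 0.9998764991760254}]},  # from the Step 4 output
    {"result": [{"label": "NEGATIVE", "score": 0.9996345043182373}]},  # from the Step 4 output
    {"error": "timed out"},  # hypothetical failed request
]
print([is_valid_prediction(r) for r in responses])  # [True, True, False]
```

Collecting `future.result()` values into a list in Step 4 and asserting `all(is_valid_prediction(r) for r in results)` turns the visual check into one that fails loudly.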

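For heavier stress testing, with many simulated users, sustained throughput, and latency percentiles, dedicated load-testing tools such as `locust` or `wrk` are better suited than this script, run against a production-like deployment.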
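Load-test results are usually summarized as latency percentiles rather than averages, since a few slow requests can hide behind a good mean. A minimal standard-library sketch, assuming a list of per-request latencies in seconds (for example from timing each call with `time.perf_counter()`); the function name `latency_summary` is hypothetical.

```python
import statistics


def latency_summary(latencies_s):
    """Return p50, p95, and max latency in milliseconds for latencies given in seconds."""
    if len(latencies_s) < 2:
        raise ValueError("need at least two samples to estimate percentiles")
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99;
    # method="inclusive" keeps every estimate within the observed min/max
    cuts = statistics.quantiles(latencies_s, n=100, method="inclusive")
    return {
        "p50_ms": cuts[49] * 1000,
        "p95_ms": cuts[94] * 1000,
        "max_ms": max(latencies_s) * 1000,
    }
```

The default `method="exclusive"` can extrapolate beyond the largest sample when there are only a few measurements, which is why the sketch opts for `"inclusive"`.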
