
# Chapter 9: Deployment and Real-World Applications

This notebook focuses on:
- How to package and deploy generative models
- Real-world use cases and architecture
- Creating an API using FastAPI
- Dockerizing an LLM app for production

## Learning Objectives

- Understand deployment workflows for GenAI apps
- Serve LLMs through a FastAPI REST endpoint
- Package and containerize with Docker
- Monitor performance and handle scalability



## Real-World Applications of Generative AI

- Virtual Assistants and Chatbots
- Legal and Financial Document Summarization
- Personalized Education Systems
- Generative Code Assistants
- Content Creation for Marketing



## FastAPI Endpoint for LLM

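Wrapping the model in a FastAPI service exposes it through a REST endpoint that other applications can call over HTTP. The example below serves the small `gpt2` model through a Hugging Face `pipeline`.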


In [None]:

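from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
# Load the model once at import time so it is reused across requests.
generator = pipeline("text-generation", model="gpt2")


class GenerateRequest(BaseModel):
    # Request body schema; a missing or non-string "prompt" is rejected with HTTP 422.
    prompt: str


# A plain `def` handler runs in FastAPI's thread pool, so the blocking
# model call does not stall the event loop as it would inside `async def`.
@app.post("/generate")
def generate(body: GenerateRequest):
    # max_new_tokens caps only the generated tokens; max_length also counts the prompt.
    output = generator(body.prompt, max_new_tokens=50)[0]["generated_text"]
    return {"response": output}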



## Running the API

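Save the FastAPI code to `main.py`, then start a development server (`--reload` restarts the server on code changes and should be omitted in production):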

```bash
uvicorn main:app --reload
```

Test using:

```bash
curl -X POST http://127.0.0.1:8000/generate -H "Content-Type: application/json" -d '{"prompt": "Explain AI"}'
```
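The same request can also be sent from Python. The following is a minimal sketch using only the standard library. It assumes the server from `main.py` is running locally on port 8000, and the helper names `build_request` and `generate_text` are hypothetical, chosen for illustration.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/generate"  # assumed: local uvicorn default address


def build_request(prompt: str, url: str = API_URL) -> urllib.request.Request:
    """Build the same POST request that the curl command sends."""
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_text(prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt to the running API and return the "response" field."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running, `generate_text("Explain AI")` returns the generated string. The generous default timeout is a guess that allows for slow CPU inference.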



## Dockerizing a GenAI Service

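Create a `Dockerfile` next to `main.py`, along with a `requirements.txt` that lists the dependencies (at least `fastapi`, `uvicorn`, `transformers`, and `torch`):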

```Dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Build and run:

```bash
docker build -t genai-api .
docker run -p 8000:8000 genai-api
```
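The container may need some time before it can answer requests, since the model is loaded when `main.py` is imported. As a rough sketch, a script can poll the service until it responds before sending real traffic. The function name `wait_until_ready` and the default timings are assumptions. The suggested URL relies on FastAPI serving its interactive docs at `/docs` by default.

```python
import http.client
import time
import urllib.request


def wait_until_ready(url: str, timeout: float = 120.0, interval: float = 1.0) -> bool:
    """Poll url until it answers with HTTP 200; return False if timeout elapses first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Not ready yet: connection refused or reset, timeout, or an HTTP error status.
            pass
        time.sleep(interval)
    return False
```

For the container started above, `wait_until_ready("http://127.0.0.1:8000/docs")` should return `True` once the API is up. Checking at the HTTP level rather than only testing whether the port is open avoids a false positive when Docker's port forwarding accepts connections before the app itself is listening.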



## Monitoring and Logging

Use tools like:
- Prometheus + Grafana for metrics
- Loguru or Python's built-in `logging` module for logs
- Streamlit for quick dashboards

Tip: For each request, log the inference time, input length, error type (if any), and GPU memory usage.
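One way to capture most of these fields is a small wrapper around the model call. This is a minimal sketch using the standard `logging` module. The logger name `genai_api`, the function `timed_generate`, and the log message layout are all hypothetical choices, not part of any library.

```python
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("genai_api")  # hypothetical logger name
logger.setLevel(logging.INFO)


def timed_generate(generate_fn, prompt: str):
    """Call generate_fn(prompt) and log latency, input length, and any error type."""
    start = time.perf_counter()
    try:
        result = generate_fn(prompt)
    except Exception as exc:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.error(
            "inference failed error_type=%s input_chars=%d latency_ms=%.1f",
            type(exc).__name__, len(prompt), elapsed_ms,
        )
        raise  # re-raise so the API framework can still return an error response
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.info("inference ok input_chars=%d latency_ms=%.1f", len(prompt), elapsed_ms)
    return result
```

Inside the endpoint, the model call could be wrapped as `timed_generate(lambda p: generator(p, max_new_tokens=50), prompt)`. Input length is measured in characters here; token counts would need the tokenizer. When running on a GPU with PyTorch, `torch.cuda.memory_allocated()` could be added to the log line.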



## Exercises

1. Modify the FastAPI endpoint to return a streaming response.
2. Add OpenAPI documentation using FastAPI decorators.
3. Dockerize a LangChain RAG application.
4. Set up a logging system to monitor your app in real time.

## References

- FastAPI Docs: https://fastapi.tiangolo.com
- Docker: https://docs.docker.com
- Hugging Face Pipelines: https://huggingface.co/docs/transformers/main_classes/pipelines
