<div class="prez-title"></div>

# Deploying a Model Prediction Server

*Ethan Swan&nbsp;&nbsp;&nbsp;&nbsp;•&nbsp;&nbsp;&nbsp;&nbsp;PyCon 2023*

# Welcome!

# Today's Goal

Take a **pre-trained model** and deploy it within a **FastAPI app**.


- Using a scikit-learn `LogisticRegression` model
- Predicting the species of an iris flower

# About Me

### Day Job
- **Backend Engineer** on the Analysis Team at [ReviewTrackers](https://www.reviewtrackers.com/)
- Previously: **Data Scientist** at [84.51°](https://www.8451.com/) (5 years)

### Outside Teaching and Consulting
- Teaching Python for 6+ years
    - Adjunct at University of Cincinnati
- I offer **consulting** and **corporate training** services
    - Web development & ML engineering

# Agenda

1. Setting up your project workspace
2. A "hello world" FastAPI app
3. Pydantic models and payloads
4. Connecting a model to an API

<div class="section-title"></div>

# Setting up your project workspace

<div class="your-turn"></div>

# ❗ Your Turn ❗

1. Create folders: `app`, `models`, `tests`
2. Save model file in `models` folder
3. Save `requirements.txt` file in base of project
4. Create a virtual environment and install requirements
    - `python3 -m venv .venv`
    - `source .venv/bin/activate`
    - `pip install -r requirements.txt`


<div class="section-title"></div>

# A "hello world"<br>FastAPI app

<div class="your-turn"></div>

# ❗ Your Turn ❗

1. Build a `GET` endpoint for `/status`
    - At `app/main.py`
    - It should return `"the API is running"` when pinged
2. Test the endpoint interactively
    - `uvicorn app.main:app`
    - `http://localhost:8000/status` in the browser
3. Write a test fixture for a `TestClient`
    - At `tests/conftest.py`
4. Write a test for the `/status` endpoint
    - At `tests/test_app.py`
5. Run tests
    - `python -m pytest`


<div class="section-title"></div>

# Pydantic models and payloads

<div class="your-turn"></div>

# ❗ Your Turn ❗

1. Add `Observation` Pydantic model
    - Fields: `sepal_length`, `sepal_width`, `petal_length`, `petal_width` (floats)
2. Add `Prediction` Pydantic model
    - Fields: `flower_type` (literal)
3. Write a "fake" POST `/predict` endpoint
    - Test it interactively: `http://localhost:8000/docs`
4. Write a test for it
    - At `tests/test_app.py`

<div class="section-title"></div>

# Connecting our model to the API

<div class="your-turn"></div>

# ❗ Your Turn ❗

1. Write `load_model()` function
    - At `app/main.py`
2. Add `Observation.as_row()` method
    - Return a `pandas.Series` object
3. Implement `/predict` endpoint with the real model
    - Test it interactively: `http://localhost:8000/docs`
4. Update the test for the POST `/predict` endpoint
    - Add an observation: `[7.1, 3.5, 3.0, 0.8]` -> `versicolour`

## Optional

1. Add a POST `/batch_predict` endpoint
    - Takes `list[Observation]` as input
    - Returns `list[Prediction]` as output
2. Add a test for it


# Questions

<div class="section-title"></div>

# Other topics

# Package managers

- Using `requirements.txt` alone is a bit hacky
    - No way to separate *direct* dependencies from *transitive* dependencies
- A tool like `poetry` is a good choice
    - Separates direct deps (in `pyproject.toml`) from transitive deps (in `poetry.lock`)
    - Handles upgrades
    - Allows for installing your project as a package, making imports easier

# Model storage formats

- We used pickle for simplicity
- Pickle has some compatibility concerns
    - Not always portable across Python versions, package versions, and OSes/architectures
- However, there aren't many other common options, in my experience
    - Can save a matrix of weights if it's a neural net
    - Some packages have their own serialization formats

# Alternatives to API-based deployment

- **Batch prediction**: run predictions on a schedule and save results to a database
    - If model scoring is slow, this means predictions are ready when needed
    - *But* your predictions can be out-of-date
- **Streaming prediction**: score data in small batches as it arrives
    - Again, predictions are ready when needed (usually)
    - *But* more complicated to set up than batch or API-based prediction
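
To make the batch option concrete, here is a minimal sketch using an in-memory SQLite database. The table layout, the `run_batch()` and `lookup()` helpers, and the rule-based `score()` function are all hypothetical; a scheduler such as cron would call `run_batch()` with a real model.

```python
import sqlite3
from datetime import datetime, timezone


def score(features: tuple[float, float, float, float]) -> str:
    """Hypothetical stand-in for a slow model: a crude petal-length rule."""
    petal_length = features[2]
    if petal_length < 2.5:
        return "setosa"
    return "versicolour" if petal_length < 5.0 else "virginica"


def run_batch(conn: sqlite3.Connection, observations: dict[int, tuple]) -> int:
    """Score every observation and store the results for later lookup."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predictions ("
        "observation_id INTEGER PRIMARY KEY, "
        "flower_type TEXT NOT NULL, "
        "scored_at TEXT NOT NULL)"
    )
    scored_at = datetime.now(timezone.utc).isoformat()
    rows = [(obs_id, score(features), scored_at) for obs_id, features in observations.items()]
    # Re-running the batch overwrites earlier predictions for the same id.
    conn.executemany("INSERT OR REPLACE INTO predictions VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def lookup(conn: sqlite3.Connection, observation_id: int):
    """Serve a stored prediction; returns None if it was never scored."""
    return conn.execute(
        "SELECT flower_type, scored_at FROM predictions WHERE observation_id = ?",
        (observation_id,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
n_scored = run_batch(conn, {1: (5.1, 3.5, 1.4, 0.2), 2: (6.3, 3.3, 6.0, 2.5)})
```

The `scored_at` column is what lets a consumer notice the staleness trade-off: a lookup is fast, but the stored answer is only as fresh as the last batch run.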

# Thorough testing

- We only really wrote one test
- Ideally you'd have several tests for each endpoint
    - Test the "happy path" with a few predictions
    - Test error handling with bad inputs
- How to handle testing the model itself? Tricky question
    - Often not so bad to test a few predictions, but this may change with new model versions
    - An active area right now; I haven't seen a clear consensus

# Authentication

- A data scientist probably won't (and shouldn't) write authentication code
- However, it's good to be aware of the options
    - Basic auth: send a username and password with each request
    - API keys: issue a token to the user that they send back with their requests
    - OAuth: a more complicated protocol, technically for delegated authorization but widely used in sign-in flows

# Deploying an API

- As a data scientist or ML engineer, you're unlikely to do this yourself, but it does happen
- Typically, host it in the cloud:
    - Simple: Heroku
    - Medium: containerize (with Docker) and run app on AWS, GCP, Azure
    - Hard: Kubernetes