<div class="prez-title"></div>

# Deploying a Model Prediction Server

*Ethan Swan&nbsp;&nbsp;&nbsp;•&nbsp;&nbsp;&nbsp;PyCon 2023&nbsp;&nbsp;&nbsp;•&nbsp;&nbsp;&nbsp;Slides: [eswan18.github.io/sklearn-api-deploy-slides](https://eswan18.github.io/sklearn-api-deploy-slides)*

# Welcome!

# Today's Goal

Take a **pre-trained model** and deploy it within a **FastAPI app**.


- Using a scikit-learn `LogisticRegression` model
- Predicting the species of an iris flower.

# About Me

### Day Job
- **Backend Engineer** on the Analysis Team at [ReviewTrackers](https://www.reviewtrackers.com/)
- Previously: **Data Scientist** at [84.51˚](https://www.8451.com/) (5 years)

### Outside Teaching and Consulting
- Teaching Python for 6+ years
    - Adjunct at University of Cincinnati
- I offer **consulting** and **corporate training** services
    - Web development & ML engineering

# Find Me Online

- Website: [ethanswan.com](https://ethanswan.com/)
- GitHub: [eswan18](https://github.com/eswan18)
- Twitter: [@eswan18](https://twitter.com/eswan18)

# Agenda

1. Setting up your project workspace
2. A "hello world" FastAPI app
3. Pydantic models and payloads
4. Connecting a model to an API

# Resources

### Slides
[eswan18.github.io/sklearn-api-deploy-slides](https://eswan18.github.io/sklearn-api-deploy-slides)

### Application Code
[github.com/eswan18/sklearn-api-deploy](https://github.com/eswan18/sklearn-api-deploy)

### Incremental Diffs
Section 1: [eswan18.github.io/sklearn-api-deploy-slides/diffs/1.html](https://eswan18.github.io/sklearn-api-deploy-slides/diffs/1.html)

Section 2: [eswan18.github.io/sklearn-api-deploy-slides/diffs/2.html](https://eswan18.github.io/sklearn-api-deploy-slides/diffs/2.html)

Section 3: [eswan18.github.io/sklearn-api-deploy-slides/diffs/3.html](https://eswan18.github.io/sklearn-api-deploy-slides/diffs/3.html)

Section 4: [eswan18.github.io/sklearn-api-deploy-slides/diffs/4.html](https://eswan18.github.io/sklearn-api-deploy-slides/diffs/4.html)


<div class="section-title"></div>

# Setting up your project workspace

# Goals
- Create a virtual environment with the packages we need
- Set up the folder structure we'll use for the rest of the tutorial

# Getting Started

Before we get started...

- Choose a folder where you're going to save your work during this tutorial
- Make sure you can access it from your IDE


# Project Layout

- Keeping a regular project structure makes it easier to find things
    - If something is broken, you know where to look
- What sorts of things do we need to keep track of?
    - Code
    - Models
    - Tests
    - Metadata (dependencies, etc.)

# Project Layout


- Make folders for application code (`app`), models (`models`), and tests (`tests`)
- Any metadata will go at the root of the repo.

```
project
├── app/
├── models/
└── tests/
```

# Project Layout


We'll also download and add a couple of files to get started
- `requirements.txt` -> https://github.com/eswan18/sklearn-api-deploy/blob/main/app-section-1/requirements.txt
- `iris_regression.pickle` -> https://github.com/eswan18/sklearn-api-deploy/blob/main/app-section-1/models/iris_regression.pickle

```
project
├── app/
├── models/
│   └── iris_regression.pickle
├── tests/
└── requirements.txt
```
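
If you'd rather script the folder setup than create each folder by hand, a minimal sketch could look like this. The `create_layout` helper is a hypothetical name used only for illustration; it is not part of the tutorial repo.

```python
from pathlib import Path

# The three folders named on this slide.
FOLDERS = ("app", "models", "tests")


def create_layout(root: Path) -> list[Path]:
    """Create the tutorial's folders under `root` and return their paths."""
    created = []
    for name in FOLDERS:
        folder = root / name
        # exist_ok=True makes the function safe to call more than once.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `create_layout(Path("project"))` from the folder you chose earlier should produce the tree above, minus the two downloaded files.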

# Downloading GitHub Files

Go to the link and look for the download button (upper right)

![GH File Download](images/gh-file-download.png)

# Readmes

- Always include a short "readme" file with your projects
- Explain the purpose of the project and how to install/run it


```
project
├── app/
├── models/
│   └── iris_regression.pickle
├── tests/
├── README.md
└── requirements.txt
```

# Readmes

- Readmes are usually written in **Markdown**
    - Markdown is simple and will just show what you type, but certain symbols (`*`, `#`) have special meaning
    - Use file extension `.md`

- You can write your own or use mine:
    - https://github.com/eswan18/sklearn-api-deploy/blob/main/app-section-1/README.md

<div class="slide-squeeze"></div>

# Iris Prediction API

This repo contains an Iris prediction server.
To start the application, run:
```
uvicorn app.main:app --host 0.0.0.0 --port 8000
```

## Fetching Predictions

If the API server is running at `http://localhost:8000`, then the following should work in a local Python session:
```text
>>> import requests
>>> response = requests.post(
...     "http://localhost:8000/predict",
...     json={
...         "sepal_width": 1,
...         "sepal_length": 1,
...         "petal_length": 1,
...         "petal_width": 1,
...     },
... )
>>> response.status_code
200
>>> response.json()
{'flower_type': 0}
```

# Virtual Environments

- Different projects we work on will typically require different libraries
- A "virtual environment" is a way to keep a project-specific set of "dependencies"

# Creating Virtual Environments

1. Navigate to the base of your project folder in the terminal
    - `cd ~/path/to/project` (I can help with this)
2. Create a fresh virtual environment with `python3 -m venv .venv`
3. "Activate" this environment
    - `source .venv/bin/activate` (Mac/Linux)
    - `.venv\Scripts\activate` (Windows)
4. Install the libraries from our requirements file
    - `pip install -r requirements.txt`

# Trying It Out

You can make sure it worked by starting up Python and trying to import FastAPI

```
(.venv) $ python
>>> import fastapi
```

If that runs without error, we're good to go!
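
A successful import only proves FastAPI is installed *somewhere*. As an extra check, this small sketch asks Python whether it is running inside a virtual environment at all. The `in_virtualenv` helper is a made-up name for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when Python is running inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the Python install it was created from.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```

If the first line prints `False`, the environment probably was not activated in this terminal.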

<div class="your-turn"></div>

# ❗ Your Turn ❗

1. Create folders: `app`, `models`, `tests`
2. Download & save model file in `models` folder
3. Download & save `requirements.txt` file in base of project folder
4. Write a `README.md` file in base of project folder
5. Create a virtual environment and install requirements
    - `python3 -m venv .venv`
    - `source .venv/bin/activate`
    - `pip install -r requirements.txt`


<div class="section-title"></div>

# A "hello world"<br>FastAPI app

# Goals
- Build a FastAPI app that returns `"the API is up and running!"` at `localhost:8000/`

# Web APIs

A Web API is a bit like **a function that you can call over the internet**
- You send a **request** and get back a **response**
    - A request is like function arguments
    - A response is like a function return value
- Requests specify a **method** -- a special argument for what type of action to take
    - `GET` -> asking for some data
    - `POST` -> sending some data
    - ... some others we won't use today

# HTTP

The protocol for doing this is called **HTTP**

![Request/response diagram](images/request-response.jpeg)

# Web APIs

APIs are a little bit different from functions though...

- **Network issues**: can result in slow (or no) response
- **Status codes**: success vs error indicated by a 3-digit code in the response
    - Codes 200-299 = success
    - Most others = error
- **Routes**: APIs are called by URL, not function name
    - e.g. `https://myweatherapi.com/chicago/temp`
    - We call `/chicago/temp` the "route" or "path"
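
To make routes and status codes concrete, here is a small standard-library sketch. The helper names `route_of` and `is_success` are invented for this example, and the weather URL is the made-up one from the bullet above.

```python
from http import HTTPStatus
from urllib.parse import urlsplit


def route_of(url: str) -> str:
    """Return the route (path) portion of a URL."""
    return urlsplit(url).path


def is_success(status_code: int) -> bool:
    """Codes 200-299 mean the request succeeded."""
    return 200 <= status_code <= 299


print(route_of("https://myweatherapi.com/chicago/temp"))
for code in (200, 404, 500):
    # HTTPStatus maps each numeric code to its standard name.
    print(code, HTTPStatus(code).phrase, is_success(code))
```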

# FastAPI

Awesome Python library for easily building web APIs
- **Simple**: Represents each route as a Python function
    - Uses type hints to figure out what the request should look like.
- **Documentation**: Automatically generates docs
- **Performance**: Handles requests asynchronously without any extra work

# A Simple FastAPI App



```python
# app/main.py

from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def status():
    """Check that the API is working."""
    return "the API is up and running!"
```

# Running the App

```text
$ uvicorn app.main:app --host 0.0.0.0 --port 8000
```

```text
INFO:     Started server process [36347]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
```

Let's look at `http://localhost:8000/` and `http://localhost:8000/docs`.

![FastAPI GET /status](images/fastapi-get-status.png)

# Testing Our App

- Writing automated tests for your project can help ensure that things work as you expect
    - You don't have to manually test your whole app after every change

- We're going to use the `pytest` library for testing
    - Most popular third-party testing library in Python
    - Easy to get started with

# Pytest

- Keep your test files in the `tests/` directory
- Files should be named `tests/test_<thing>.py`
- Tests themselves are just functions with `assert` statements

```python
# tests/test_addition.py

def test_addition_of_2_and_2():
    result = 2 + 2
    assert result == 4
```

# Pytest Fixtures

- Pytest lets you write setup code to use across multiple tests through **fixture functions**
- These are generally kept in `tests/conftest.py`

# Pytest Fixtures

In our case, we want to create an instance of our app to use in our tests:

```python
# tests/conftest.py

import pytest
from fastapi.testclient import TestClient

from app.main import app


@pytest.fixture
def client() -> TestClient:
    return TestClient(app)
```

# Testing Our Status Endpoint

Then we can use it in a test that checks the status endpoint:

```python
# tests/test_app.py

from fastapi.testclient import TestClient


def test_status_endpoint(client: TestClient):
    response = client.get("/")
    assert response.status_code == 200
    payload = response.json()
    assert payload == "the API is up and running!"
```

# Running Tests

Kick off tests from the command line with `python -m pytest`.

![pytest output](images/pytest.png)

<div class="your-turn"></div>

# ❗ Your Turn ❗

1. Build a `GET` endpoint for `/`
    - At `app/main.py`
    - It should return `"the API is up and running!"` when pinged
2. Test the endpoint interactively
    - `uvicorn app.main:app`
    - `http://localhost:8000/` in the browser
3. Write a test fixture for a `TestClient`
    - At `tests/conftest.py`
4. Write a test for the `/` endpoint
    - At `tests/test_app.py`
5. Run tests
    - `python -m pytest`


<div class="section-title"></div>

# Pydantic models and payloads

# Goals
- Build Pydantic models to represent observations and prediction values for our model
- Create a `/predict` endpoint that accepts an observation and returns a (dummy) prediction

<div class="your-turn"></div>

# ❗ Your Turn ❗

1. Add `Observation` Pydantic model
    - Fields: `sepal_length`, `sepal_width`, `petal_length`, `petal_width` (floats)
2. Add `Prediction` Pydantic model
    - Fields: `flower_type` (literal)
3. Write a "fake" POST `/predict` endpoint
    - Test it interactively: `http://localhost:8000/docs`
4. Write a test for it
    - At `tests/test_app.py`

<div class="section-title"></div>

# Connecting our model to the API

# Goals
- Update the implementation of the `/predict` endpoint to use our sklearn model

<div class="your-turn"></div>

# ❗ Your Turn ❗

1. Write `load_model()` function
    - At `app/main.py`
2. Add `Observation.as_row()` method
    - Return a `pandas.Series` object
3. Implement `/predict` endpoint with the real model
    - Test it interactively: `http://localhost:8000/docs`
4. Update test for POST `/predict` endpoint
    - Add an observation: `[7.1, 3.5, 3.0, 0.8]` -> `versicolour`
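
A hedged sketch of steps 1 and 2, plus a small scoring helper, might look like the following. `predict_one` is a hypothetical name, and the feature order in `as_row` is an assumption about how the pickled model was trained.

```python
import pickle
from pathlib import Path

import pandas as pd
from pydantic import BaseModel

# Location of the pickled model in the project layout from section 1.
MODEL_PATH = Path("models/iris_regression.pickle")


def load_model(path: Path = MODEL_PATH):
    """Unpickle and return the model stored at `path`."""
    with open(path, "rb") as f:
        return pickle.load(f)


class Observation(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

    def as_row(self) -> pd.Series:
        """Return the four measurements as a Series, in the field order above."""
        # Assumption: the model was trained on features in this order.
        return pd.Series(
            {
                "sepal_length": self.sepal_length,
                "sepal_width": self.sepal_width,
                "petal_length": self.petal_length,
                "petal_width": self.petal_width,
            }
        )


def predict_one(model, observation: Observation):
    """Score a single observation with any object exposing a `predict` method."""
    # scikit-learn estimators expect 2-D input, so wrap the row in a
    # one-row DataFrame before calling predict.
    frame = observation.as_row().to_frame().T
    return model.predict(frame)[0]
```

Loading the model once at startup, rather than inside the endpoint, avoids re-reading the file on every request. Only unpickle files you trust, since loading a pickle can execute arbitrary code.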

## Optional

1. Add a POST `/batch_predict` endpoint
    - `def batch_predict(observations: List[Observation]) -> List[Prediction]:`
2. Add a test for it
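
For the optional exercise, the main design choice is to call the model once for the whole batch instead of once per observation. The sketch below shows that idea with a hypothetical `score_batch` helper that the endpoint could delegate to. It assumes the model returns one species label (a string) per row.

```python
from typing import List, Literal

import pandas as pd
from pydantic import BaseModel

FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


class Observation(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float


class Prediction(BaseModel):
    flower_type: Literal["setosa", "versicolour", "virginica"]  # assumed labels


def score_batch(model, observations: List[Observation]) -> List[Prediction]:
    """Score many observations with a single call to `model.predict`."""
    if not observations:
        return []  # avoid handing the model an empty frame
    frame = pd.DataFrame(
        [[getattr(obs, name) for name in FEATURES] for obs in observations],
        columns=FEATURES,
    )
    # One predict call for the whole frame; str() normalizes NumPy strings.
    return [Prediction(flower_type=str(label)) for label in model.predict(frame)]
```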


# Questions

<div class="section-title"></div>

# Other topics

# Package managers

- Using `requirements.txt` alone is a bit hacky
    - No way to separate *direct* dependencies from *transitive* dependencies
- A tool like `poetry` is a good choice
    - Separates direct deps (in `pyproject.toml`) from transitive deps (in `poetry.lock`)
    - Handles upgrades
    - Allows for installing your project as a package, making imports easier

# Model storage formats

- We used pickle for simplicity
- Pickle has some compatibility concerns
    - Not always portable across Python versions, package versions, and OSes/architectures
- However, not a lot of other common options in my experience
    - Can save a matrix of weights if it's a neural net
    - Some packages have their own serialization formats

# Alternatives to API-based deployment

- **Batch prediction**: run predictions on a schedule and save results to a database
    - If model scoring is slow, this means predictions are ready when needed
    - *But* your predictions can be out-of-date
- **Streaming prediction**: score data in small batches as it arrives
    - Again, predictions are ready when needed (usually)
    - *But* more complicated to set up than batch or API-based prediction
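
To give a feel for the batch approach, here is a minimal sketch that scores rows and saves the results to a SQLite table. The `run_batch` function, the `predictions` table, and its columns are all invented for this example. A scheduler such as cron would be responsible for calling it.

```python
import sqlite3
from typing import Callable, Iterable, Sequence, Tuple


def run_batch(
    conn: sqlite3.Connection,
    rows: Iterable[Tuple[int, Sequence[float]]],
    score: Callable[[Sequence[float]], str],
) -> int:
    """Score every (row_id, features) pair and save the results."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predictions ("
        "row_id INTEGER PRIMARY KEY, prediction TEXT NOT NULL)"
    )
    results = [(row_id, score(features)) for row_id, features in rows]
    # INSERT OR REPLACE keeps one row per id, so re-running the job refreshes
    # stale predictions instead of duplicating them.
    conn.executemany(
        "INSERT OR REPLACE INTO predictions (row_id, prediction) VALUES (?, ?)",
        results,
    )
    conn.commit()
    return len(results)
```

Consumers then read from the table instead of calling the model, which is why results can be stale between runs.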

# Thorough testing

- We only really wrote one test
- Ideally you'd have several tests for each endpoint
    - Test the "happy path" with a few predictions
    - Test error handling with bad inputs
- How to handle testing the model itself? Tricky question
    - Often not so bad to test a few predictions, but this may change with new model versions
    - An active field right now, I haven't seen clear consensus

# Authentication

- A data scientist probably won't (and shouldn't) write authentication code
- However, it's good to be aware of the options
    - Basic auth: just pass username and password
    - API keys: issue a token to the user that they send back with their requests
    - OAuth: a more complicated protocol, built around delegated authorization and access tokens
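
To show what the first two options amount to on the wire, here is a small standard-library sketch. Both helper names are hypothetical, and this is only an illustration of the mechanics; a real service should rely on a vetted authentication library.

```python
import base64
import hmac


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic `Authorization` header."""
    # Basic auth is just "username:password", base64-encoded (not encrypted).
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def api_key_is_valid(presented: str, expected: str) -> bool:
    """Compare a client's API key with the stored one in constant time."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

Because base64 is trivially reversible, Basic auth is only reasonable over HTTPS.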

# Deploying an API

- As a data scientist or ML engineer, you're unlikely to do this yourself, but it can happen
- Typically, host it in the cloud:
    - Simple: Heroku
    - Medium: containerize (with Docker) and run app on AWS, GCP, Azure
    - Hard: Kubernetes