<style>
    div:has(> hr) h1 {
        font-size: 5em !important;
    }
</style>

# Model Deployment

*Model formats, batch vs realtime scoring, and deployment pipelines*

---

Ethan Swan, 2023

# Today's Topic: Deployment

Once a model has been selected, we usually need to do something with it.

*e.g.* We want to predict what products we should suggest to an online user. 

Setting up a model to run automatically on new data is called **deployment**.

# Agenda

1. About Me
2. Exporting a Model
3. Batch, Realtime, & On-demand Scoring
4. Deployment Pipelines

# About Me
---

# About Me

- Senior Backend Developer at [ReviewTrackers](https://www.reviewtrackers.com/)
    - Startup, ~100 employees
    - SaaS platform for online reputation management
- Analytics & ML Engineering Team
- NLP Microservice (Python), Main API Layer (Go)

- Started in February 2022
- Wanted to see a techy startup from the inside
    - especially engineering practices

# About Me

- Previously: [84.51˚](https://www.8451.com/)
    - Marketing Analytics Branch of Kroger
    - Lead Data Scientist - Internal Tools & Infrastructure
- Education: University of Notre Dame
    - B.S. in Computer Science
    - M.B.A.

- At 84.51˚:
    - Did some measurement work
    - Quickly transitioned to functional support
    - Taught classes and helped with tech strategy

# What is Model Deployment?
---

# Why Do We Deploy Models?

- A model is ultimately a function that maps inputs (*features*) to outputs (*targets*).
    - Usually Python or R code
- How can we use that function in a real-world application?
    - How do we get new data into the model?
    - What happens if we shut down the session? Is the model gone forever?

# Deploying the Model

1. Export (save) the model in a reusable place and format.
2. Build a batch, streaming, or on-demand scoring system that loads the model.
3. Run the scoring system and feed it new data.
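The three steps above can be sketched in a few lines of Python. This is a minimal sketch, not a production recipe: `ThresholdModel`, its `cutoff` parameter, and the file name are hypothetical stand-ins for a real trained model, and pickle is assumed as the export format. Pickle stores a reference to the model's class, so the scoring system must be able to import that class when it loads the file.

```python
import pickle
import tempfile
from pathlib import Path


class ThresholdModel:
    """Hypothetical stand-in for a trained model: flags inputs above a cutoff."""

    def __init__(self, cutoff: float) -> None:
        self.cutoff = cutoff

    def predict(self, rows: list[float]) -> list[int]:
        return [int(x > self.cutoff) for x in rows]


# Step 1: export the trained model to a reusable file.
def export_model(model: ThresholdModel, path: Path) -> None:
    with path.open("wb") as f:
        pickle.dump(model, f)


# Step 2: the scoring system loads the model, possibly in a different
# process long after the training session has ended.
def load_model(path: Path) -> ThresholdModel:
    with path.open("rb") as f:
        return pickle.load(f)


# Step 3: feed the loaded model new data.
def score(path: Path, new_rows: list[float]) -> list[int]:
    return load_model(path).predict(new_rows)


if __name__ == "__main__":
    model_path = Path(tempfile.gettempdir()) / "threshold_model.pkl"
    export_model(ThresholdModel(cutoff=0.5), model_path)
    print(score(model_path, [0.2, 0.7, 0.9]))  # [0, 1, 1]
```

Because the model lives in a file rather than in the session, shutting down the training session no longer loses it.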

# Exporting Models
---

# Exporting Models: Formats

- Pickle
    - Python-specific binary format; not human-readable
    - Can save almost any Python object
    - Files may fail to load under a different Python or library version
- Raw weights/parameters
    - Just a bunch of numbers in a file
    - More common for TensorFlow, PyTorch, etc.

- Pickle and similar libraries are easier and **more flexible**
    - but raise compatibility concerns
- Raw model weights are **more portable**
    - but not necessarily easy to reload
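To make the trade-off concrete, here is a sketch of exporting raw parameters instead of a pickle, assuming a hypothetical linear model made of a `weights` list and a `bias`. The JSON file can be read from any language, but it only holds numbers: whoever reloads it must re-implement how those numbers are combined (`predict_linear` here).

```python
import json
import tempfile
from pathlib import Path


def predict_linear(weights: list[float], bias: float, row: list[float]) -> float:
    # The hypothetical model "architecture": a weighted sum of features plus a bias.
    return sum(w * x for w, x in zip(weights, row)) + bias


def save_weights(weights: list[float], bias: float, path: Path) -> None:
    # Only numbers go into the file, so any language can read it back.
    path.write_text(json.dumps({"weights": weights, "bias": bias}))


def load_weights(path: Path) -> tuple[list[float], float]:
    # The file does not say how to combine the numbers; the loading side
    # must re-implement predict_linear (or an equivalent) on its own.
    params = json.loads(path.read_text())
    return params["weights"], params["bias"]


if __name__ == "__main__":
    path = Path(tempfile.gettempdir()) / "linear_weights.json"
    save_weights([2.0, -1.0], 0.5, path)
    weights, bias = load_weights(path)
    print(predict_linear(weights, bias, [3.0, 1.0]))  # 5.5
```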


# Exporting Models: Locations


- Local filesystem
    - Only if the model is going to be deployed locally
- Cloud storage
    - S3, GCS, Azure Blob Storage, etc.
    - Works for almost any deployment location

# Deployment Approaches

---

# Batch, Streaming, and On-demand Scoring
- **Batch**
    - Run the model in advance and save the output
    - Think: Spotify Discover Weekly
- **Realtime**
    - Run the model on new data as it comes in
    - Think: GitHub Copilot recommends code as you type
    - **Streaming**
        - Queue up incoming data and run the model on it as it arrives
        - Think: New 
    - **True On-demand**
        - Call the model through an API only when a prediction is needed

# Batch Scoring

Architecture
- Airflow or a cloud-based scheduler to kick off the model
- Chain parts of the job (tasks) together
- Save output to a persistent location: database or cloud storage
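A minimal sketch of that architecture, assuming a hypothetical `users` table and a toy spend-threshold model. Plain Python functions stand in for the scheduler's tasks (in Airflow each would be a task in a DAG), and an in-memory SQLite database stands in for the persistent output location.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    # In-memory SQLite stands in for a real database or cloud storage.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id INTEGER, recent_spend REAL)")
    conn.execute(
        "CREATE TABLE predictions (user_id INTEGER, send_promo INTEGER, run_date TEXT)"
    )
    # Hypothetical input data.
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, 250.0), (2, 40.0)])
    return conn


def extract(conn: sqlite3.Connection) -> list[tuple[int, float]]:
    # Task 1: pull every row that needs a prediction.
    return conn.execute("SELECT user_id, recent_spend FROM users").fetchall()


def score_rows(rows: list[tuple[int, float]]) -> list[tuple[int, int]]:
    # Task 2: hypothetical toy model, applied to the whole batch at once.
    return [(user_id, int(spend > 100.0)) for user_id, spend in rows]


def save(conn: sqlite3.Connection, predictions: list[tuple[int, int]], run_date: str) -> None:
    # Task 3: persist the output so applications only need a lookup later.
    conn.executemany(
        "INSERT INTO predictions VALUES (?, ?, ?)",
        [(user_id, pred, run_date) for user_id, pred in predictions],
    )
    conn.commit()


def run_batch_job(conn: sqlite3.Connection, run_date: str) -> None:
    # What the scheduler would trigger on each run (e.g. daily); tasks run in order.
    save(conn, score_rows(extract(conn)), run_date)


if __name__ == "__main__":
    conn = make_demo_db()
    run_batch_job(conn, "2023-01-01")
    rows = conn.execute(
        "SELECT user_id, send_promo FROM predictions ORDER BY user_id"
    ).fetchall()
    print(rows)  # [(1, 1), (2, 0)]
```

Because predictions are written ahead of time, the application serving users only runs a cheap lookup such as `SELECT send_promo FROM predictions WHERE user_id = ?`.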

# Batch Scoring

Pros
- Predictable workload (always one run per hour/day/week)
- Relatively easy to set up

Cons
- Predictions are stale until the next run
- Reruns happen even when nothing has changed, wasting resources

# Realtime Scoring

Two main kinds...

- **Streaming** – Trigger the model on new "events"
- **True On-demand** – Access the model via an API when new predictions are needed
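A minimal sketch of true on-demand scoring, using Python's standard-library `http.server` in place of a production web framework. The `/predict` route, the `recent_spend` feature, and the toy threshold model are hypothetical; a real service would load an exported model once at startup and call it inside the request handler.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features: dict) -> dict:
    # Hypothetical toy model standing in for a loaded, exported model.
    return {"send_promo": int(features.get("recent_spend", 0) > 100)}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Only the hypothetical /predict route is served.
        if self.path != "/predict":
            self.send_error(404)
            return
        # Read the JSON request body and score it immediately.
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length))
        body = json.dumps(predict(features)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging for the demo


def start_server(port: int = 0) -> HTTPServer:
    # Port 0 asks the OS for any free port; requests are served on a background thread.
    server = HTTPServer(("127.0.0.1", port), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request_prediction(port: int, features: dict) -> dict:
    # Client side: the application calls the model only when it needs a prediction.
    conn = HTTPConnection("127.0.0.1", port)
    try:
        conn.request(
            "POST",
            "/predict",
            body=json.dumps(features).encode(),
            headers={"Content-Type": "application/json"},
        )
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()


if __name__ == "__main__":
    server = start_server()
    print(request_prediction(server.server_port, {"recent_spend": 250}))  # {'send_promo': 1}
    server.shutdown()
```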

# Batch: Streaming

Kick off a model run when a certain event occurs

This is technically "batch" scoring, but it's realtime-ish

- e.g. when new customer data is uploaded, regenerate product recommendations


Architecture
- A "publisher" sends a message to a "subscriber" when an event occurs
- Messages are "queued" up until the subscriber is ready to process them
    - Thus not truly realtime
- Platforms: Kafka, RabbitMQ, cloud-based pub/sub, etc.
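A single-process sketch of the publisher/subscriber pattern, with Python's `queue.Queue` standing in for Kafka, RabbitMQ, or a cloud pub/sub service. The event fields (`customer_id`, `hiking_purchases`) and the toy recommendation rule are hypothetical. Unlike a real message broker, an in-memory queue does not persist messages, so this sketch shows queuing and parallel consumers but not crash resilience.

```python
import queue
import threading


def score_event(event: dict) -> dict:
    # Hypothetical toy model: recommend a category based on the uploaded data.
    category = "outdoor" if event["hiking_purchases"] > 0 else "general"
    return {"customer_id": event["customer_id"], "recommend": category}


def consumer(events: queue.Queue, results: list, lock: threading.Lock) -> None:
    # Subscriber: take the next queued message whenever this worker is free.
    while True:
        event = events.get()
        if event is None:  # sentinel from the publisher: stop this worker
            break
        result = score_event(event)
        with lock:
            results.append(result)


def run_stream(uploads: list[dict], n_consumers: int = 2) -> list[dict]:
    events: queue.Queue = queue.Queue()
    results: list[dict] = []
    lock = threading.Lock()
    # Several consumers share one queue: horizontal scaling in miniature.
    workers = [
        threading.Thread(target=consumer, args=(events, results, lock))
        for _ in range(n_consumers)
    ]
    for worker in workers:
        worker.start()
    # Publisher: each upload event becomes a message that waits in the queue.
    for upload in uploads:
        events.put(upload)
    for _ in workers:
        events.put(None)  # one stop signal per consumer, queued after all events
    for worker in workers:
        worker.join()
    # Consumers finish in no fixed order, so sort for a stable result.
    return sorted(results, key=lambda r: r["customer_id"])


if __name__ == "__main__":
    uploads = [
        {"customer_id": 2, "hiking_purchases": 0},
        {"customer_id": 1, "hiking_purchases": 3},
    ]
    print(run_stream(uploads))
    # [{'customer_id': 1, 'recommend': 'outdoor'}, {'customer_id': 2, 'recommend': 'general'}]
```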


# Batch: Streaming

Pros
- Queues help manage spikes in workload
- Consumers (scorers) can run in parallel
    - Enables easy horizontal scaling
- Queues add resilience
    - A consumer crash doesn't result in lost data

Cons
- Requires an additional system (the message queue service)
- Data flow through queues is difficult to trace and reason about
- Large workloads can cause long backups

# Batch: Manual

This is exactly what you think: log in and kick off the scoring yourself.

Pros
- No setup

Cons
- Fragile, error-prone, and time-consuming
