## Deployment Patterns - an iterative process
Different cases call for different patterns:
- New feature/function/capability, i.e. no benchmark: start with a small share of traffic and then ramp up gradually
- Replacing Human/Automation, i.e. benchmark available:
    - run the new system in parallel with the benchmark first
    - inspect the examples where the new model and the benchmark differ (these can be valuable for further improvement)
    - then, again, roll out in small steps
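
The parallel-run step can be sketched as follows. This is a minimal illustration, not a reference implementation: `shadow_compare`, `benchmark` and `candidate` are hypothetical names, and the two "models" are toy threshold rules standing in for the existing system and the new one.

```python
from typing import Any, Callable, Iterable, List, Tuple


def shadow_compare(
    inputs: Iterable[Any],
    benchmark: Callable[[Any], Any],
    candidate: Callable[[Any], Any],
) -> Tuple[float, List[Tuple[Any, Any, Any]]]:
    """Run the candidate in parallel with the benchmark.

    Only the benchmark's output would be served; the candidate's output
    is recorded so that disagreements can be inspected afterwards.
    """
    disagreements = []
    total = 0
    for x in inputs:
        served = benchmark(x)   # the decision that is actually used
        shadow = candidate(x)   # logged only, never shown to users
        total += 1
        if served != shadow:
            disagreements.append((x, served, shadow))
    agreement_rate = 1.0 if total == 0 else 1 - len(disagreements) / total
    return agreement_rate, disagreements


# Toy stand-ins (assumptions): two threshold rules on a score.
benchmark = lambda score: score >= 0.5
candidate = lambda score: score >= 0.6

rate, diffs = shadow_compare([0.1, 0.55, 0.7, 0.9], benchmark, candidate)
print(rate)   # share of inputs where both systems agree
print(diffs)  # (input, benchmark output, candidate output) for each mismatch
```

The list of disagreements is the part worth reading by hand, since it shows exactly where the new system departs from the current one.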

Think of the deployment of an ML algorithm as a spectrum running from human only, through shadow mode, AI assistance and partial automation, to full automation, rather than as a binary choice. The appropriate level of automation depends on the use case. For example, in quick internet search it may be infeasible or too costly to keep a human in the loop, whereas in financial market applications concept drift makes it necessary for a human to be able to override the model.

Some common best practices in deployment:
- Always be able to ramp traffic up or down gradually, with monitoring and QC.
- Build fallbacks and guardrails, including the ability to roll back or roll out different models quickly.
- Relatedly, modularize the data pipeline and the model deployment.
- Be able to replay or backtest on historical data to debug issues (arguably more of a data pipeline concern).
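
A gradual ramp with a fallback might look like the sketch below. Everything here is an assumption made for illustration: the function names, the use of a request ID as the routing key, and the 10,000-bucket hashing scheme. Hashing the ID keeps the assignment deterministic, so a request that is in the rollout at 10% is still in it at 50%, and setting `fraction` to 0 acts as a rollback.

```python
import hashlib
from typing import Any, Callable, Tuple

N_BUCKETS = 10_000  # routing granularity (assumed): 0.01% of traffic per bucket


def in_rollout(request_id: str, fraction: float) -> bool:
    """Deterministically route `fraction` of requests to the new model."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % N_BUCKETS
    return bucket < fraction * N_BUCKETS


def predict_with_fallback(
    x: Any,
    request_id: str,
    new_model: Callable[[Any], Any],
    old_model: Callable[[Any], Any],
    fraction: float,
) -> Tuple[Any, str]:
    """Return (prediction, route), falling back to the old model on error."""
    if not in_rollout(request_id, fraction):
        return old_model(x), "old"
    try:
        return new_model(x), "new"
    except Exception:
        # Guardrail: a failure in the new model never reaches the caller.
        return old_model(x), "fallback"


# Toy stand-ins (assumptions) for the two models.
old_model = lambda x: x + 1
new_model = lambda x: x + 2

print(predict_with_fallback(10, "request-42", new_model, old_model, fraction=0.0))
print(predict_with_fallback(10, "request-42", new_model, old_model, fraction=1.0))
```

In a real system the `route` label would be logged alongside the prediction so that monitoring can be split by model version.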

## Checklist of questions for software engineering issues
- Realtime or Batch: in both training and prediction
- Cloud vs. Edge/Browser
- Differences in computing resources between research and deployment: hardware resources, ML package dependencies
- Latency and throughput: whether the prediction speed suits your needs
- Logging
- Security, privacy and ethical considerations.
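
For the latency and throughput question, a rough first check is to time the prediction function on representative inputs. The sketch below is only a single-threaded, in-process measurement under assumed names (`measure_latency`, `predict`); it ignores network and serialization overhead, which often dominate in a real service.

```python
import math
import statistics
import time
from typing import Any, Callable, Dict, Iterable


def measure_latency(predict: Callable[[Any], Any], inputs: Iterable[Any]) -> Dict[str, float]:
    """Time each call to `predict` and summarise latency and throughput."""
    latencies = []
    start = time.perf_counter()
    for x in inputs:
        t0 = time.perf_counter()
        predict(x)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    if not latencies:
        raise ValueError("need at least one input to measure")

    latencies.sort()
    # Nearest-rank 95th percentile.
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)
    return {
        "p50_s": statistics.median(latencies),
        "p95_s": latencies[p95_index],
        "throughput_per_s": len(latencies) / elapsed if elapsed > 0 else float("inf"),
    }


# Toy stand-in (assumption) for a model's predict function.
predict = lambda x: sum(i * x for i in range(1000))

stats = measure_latency(predict, range(200))
print(stats)
```

Comparing `p95_s` rather than the mean against the latency budget is a common choice, since tail latency is what users notice.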

## Metrics to Monitor in prod
It usually takes a few tries to converge on the right set of metrics to monitor, which is another reason to deploy to a small share of traffic first.
- Software metrics: memory, compute, latency, throughput, server load
- Input metrics - help you understand the distribution of $X$ and $y|X$, to guard against domain shift: number of missing values, descriptive statistics of features
- Output metrics - help you understand the general health of the machine learning system: frequency of null outputs, outputs breaching guardrails, performance metrics of the ML algorithm (e.g. accuracy or MSE)
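
The input and output metrics above can be computed per batch with a few lines of code. This is a sketch under stated assumptions: the function names are hypothetical, features are treated one numeric column at a time, and the guardrail is assumed to be a simple `[lower, upper]` range on the prediction.

```python
import math
import statistics
from typing import Dict, Optional, Sequence


def input_metrics(values: Sequence[Optional[float]]) -> Dict[str, Optional[float]]:
    """Summarise one numeric feature; None and NaN both count as missing."""
    present = [v for v in values if v is not None and not math.isnan(v)]
    n = len(values)
    return {
        "missing_rate": (n - len(present)) / n if n else 0.0,
        "mean": statistics.fmean(present) if present else None,
        "stdev": statistics.pstdev(present) if present else None,
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


def output_metrics(
    predictions: Sequence[Optional[float]], lower: float, upper: float
) -> Dict[str, float]:
    """Rate of null outputs and of outputs outside the [lower, upper] guardrail."""
    n = len(predictions)
    nulls = sum(p is None for p in predictions)
    breaches = sum(p is not None and not (lower <= p <= upper) for p in predictions)
    return {
        "null_rate": nulls / n if n else 0.0,
        "guardrail_breach_rate": breaches / n if n else 0.0,
    }


# Made-up batch, for illustration only.
feature_summary = input_metrics([1.0, None, 3.0, float("nan")])
prediction_summary = output_metrics([0.2, None, 1.5, 0.9], lower=0.0, upper=1.0)
print(feature_summary)
print(prediction_summary)
```

Tracking these summaries over time, and comparing them against the values seen on the training data, is one simple way to spot a shift in the inputs before the accuracy metrics degrade.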

## References
- [Introduction to Machine Learning in Production](https://www.coursera.org/learn/introduction-to-machine-learning-in-production?specialization=machine-learning-engineering-for-production-mlops)