## Clarifying Problems

### What is the Business Problem?

What is our business problem? Understanding the business requirements and motivations of our stakeholders will save us a lot of time. After all, we do not want to design a model only to find out that it is not what the user or stakeholder wants.

So the first step is to discuss and understand the business requirements and lay out some technical constraints and assumptions.

### Clarifying The Business Problem

Sometimes it is good to raise new questions in case the stakeholders have missed them.

Potential questions:

- Is there already a baseline for your problem?
- Is there a benefit structure?

## Clarifying Constraints



### Data

As always, we want to ask:

> How much data will we have access to? The answer informs the modelling process later. As a very naive example, less complex models tend to be more appropriate for smaller datasets, while larger models such as deep neural networks tend to work better on bigger datasets.

If we have little data overall, or, in a classification problem, little data in the minority class, we can ask:

> Are we able to collect more data? Is it expensive to do so, in terms of both manual effort and cost?

However, if the minority class is inherently rare (e.g. cancer cases), we can sometimes keep the data as it is, since it reflects the population distribution.
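
Before deciding whether to collect more data, it helps to quantify how rare the minority class actually is. The snippet below is a minimal sketch of such a check; the function names and the toy label list are assumptions made for illustration, not part of any particular library.

```python
from collections import Counter


def class_distribution(labels):
    """Return each class's share of the dataset as a fraction in [0, 1]."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


def minority_share(labels):
    """Return (class, share) for the rarest class in the labels."""
    dist = class_distribution(labels)
    rarest = min(dist, key=dist.get)
    return rarest, dist[rarest]


# Hypothetical toy labels: 1 = positive (rare), 0 = negative.
labels = [0] * 95 + [1] * 5
rarest, share = minority_share(labels)
print(f"rarest class={rarest}, share={share:.1%}")  # rarest class=1, share=5.0%
```

A share like this can then be compared against what is known about the real population before concluding that the dataset is skewed by the collection process.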

### Hardware Constraints

Do we have sufficient hardware for the task at hand? GPUs matter a great deal, especially in deep learning tasks.

#### Where should the model be deployed?

Does the model need to fit on a smartphone? I remember downloading **YOLOv5**'s iOS app and having fun using it to detect objects around me. I reckon it runs real-time inference in the app itself.

### Latency

When deployed, does the model need extremely fast inference time? 

#### Online (Real Time)

Is the application real time? A classic example is Google's autocomplete feature. I really like how autocomplete roughly knows what I want to ask after I have typed just a few words. However, if the model inference behind it is slow (e.g. it only makes a recommendation after I have finished typing the query), then the feature is of little use.

#### Offline (Non-Real Time)

Maybe our model only needs to do **Customer Segmentation**, in which case we can afford more time for inference.
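
Whether a model is fast enough for an online or offline setting can be checked empirically. Below is a rough sketch that times a prediction function and compares the tail latency against a budget. `dummy_predict` and `BUDGET_MS` are placeholders assumed for illustration; a real check would call the actual model on representative inputs and hardware.

```python
import statistics
import time


def measure_latency_ms(predict, inputs, warmup=3):
    """Time predict(x) for each input and return latencies in milliseconds."""
    for x in inputs[:warmup]:
        predict(x)  # warm-up calls, not timed
    latencies = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies_ms):
    """Return the median and an approximate 95th-percentile latency."""
    ordered = sorted(latencies_ms)
    p95_index = min(len(ordered) - 1, (95 * len(ordered)) // 100)
    return {"p50_ms": statistics.median(ordered), "p95_ms": ordered[p95_index]}


def dummy_predict(x):
    # Placeholder standing in for a real model's inference call.
    return sum(i * i for i in range(1000)) + x


BUDGET_MS = 100.0  # hypothetical real-time latency budget
stats = summarize(measure_latency_ms(dummy_predict, list(range(50))))
print(stats, "within budget:", stats["p95_ms"] <= BUDGET_MS)
```

Looking at a high percentile rather than the mean is a deliberate choice here: for a real-time feature, the slowest requests are the ones users notice.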

## Metrics

### Offline Metrics

The offline metrics are fairly standard and are usually tied to the type of machine learning problem at hand.

- Classification: Metrics like AUROC, F1 etc. Note that accuracy may not be the best choice, as it can be misleading on imbalanced data.
- Regression: The typical R-squared, MSE, RMSE and MAPE.
- Object Detection: mAP, IoU.
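
To make two of these concrete, here is a small sketch in plain Python. It shows why accuracy can mislead on imbalanced labels (a model that always predicts the majority class still scores 95% accuracy but has an F1 of zero) and how IoU is computed for two axis-aligned boxes. The helper names and the `(x_min, y_min, x_max, y_max)` box format are assumptions for illustration; in practice a library such as scikit-learn would normally be used for the classification metrics.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# A classifier that always predicts the majority class.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(accuracy(y_true, y_pred))  # 0.95
print(f1_score(y_true, y_pred))  # 0.0

# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```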

### Online Metrics

Online metrics are measured after we have deployed the model in production. Say it is a web application that takes user-uploaded images and predicts which animal is in them.

Then we can design metrics like **Click Through Rate (CTR)** because we want users to come to our website more.

As a small (modified) example, my iPhone offers a **scan credit card** option that autofills the card details for me.

- I envision this as an object detection problem behind the scenes, where it first localizes and classifies the credit card (as sometimes the user puts the credit card on a table and the camera captures other objects).
- After this, it performs OCR on the card and gives the user the needed details.

Then a good online metric would be how many times the user **failed and fell back to keying in the details manually**.
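
As a rough illustration, this fallback metric could be computed from logged sessions. The session schema below (`scan_attempted`, `entered_manually`) is entirely hypothetical; a real app would log whatever events its analytics stack supports.

```python
def manual_fallback_rate(sessions):
    """Share of scan attempts that ended with the user typing details by hand."""
    attempts = [s for s in sessions if s["scan_attempted"]]
    if not attempts:
        return 0.0
    fallbacks = sum(1 for s in attempts if s["entered_manually"])
    return fallbacks / len(attempts)


# Hypothetical logged sessions.
sessions = [
    {"scan_attempted": True, "entered_manually": False},
    {"scan_attempted": True, "entered_manually": True},
    {"scan_attempted": True, "entered_manually": False},
    {"scan_attempted": False, "entered_manually": True},  # never tried scanning
]
print(f"fallback rate: {manual_fallback_rate(sessions):.2f}")  # fallback rate: 0.33
```

Sessions where the user never attempted a scan are excluded from the denominator, since they say nothing about the model's quality.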

### Non-Functional Metrics

- Training speed etc.

## Model Registry (Artifacts and Experiment Tracking) [^Comprehensive Article on Model Registry][^Made With ML's Experiment Tracking Section][^Weights & Biases as Model Registry]

This is where MLOps comes into play. We basically store each version of a model and its corresponding artifacts in a centralized store. Teams can view each other's versions on the cloud.

Tracking models eases the deployment process, since we can find the best model and deploy it.
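
To illustrate the idea (not any specific tool), here is a toy in-memory registry that records versions with their metrics and artifact locations and then picks the best one. The class names, the `val_f1` metric, and the `s3://...` paths are all made up for this sketch; hosted registries such as those in the footnotes offer far more (lineage, stages, access control).

```python
from dataclasses import dataclass


@dataclass
class ModelVersion:
    name: str
    version: int
    metrics: dict
    artifact_path: str


class ModelRegistry:
    """Toy in-memory registry: one list of versions per model name."""

    def __init__(self):
        self._versions = {}

    def register(self, name, metrics, artifact_path):
        """Store a new version and return it; versions start at 1."""
        versions = self._versions.setdefault(name, [])
        entry = ModelVersion(name, len(versions) + 1, dict(metrics), artifact_path)
        versions.append(entry)
        return entry

    def best(self, name, metric, higher_is_better=True):
        """Return the version with the best value of the given metric."""
        pick = max if higher_is_better else min
        return pick(self._versions[name], key=lambda v: v.metrics[metric])


registry = ModelRegistry()
# Hypothetical artifact locations and metric values.
registry.register("animal-classifier", {"val_f1": 0.81}, "s3://bucket/models/v1.pt")
registry.register("animal-classifier", {"val_f1": 0.86}, "s3://bucket/models/v2.pt")

best = registry.best("animal-classifier", "val_f1")
print(best.version, best.artifact_path)  # 2 s3://bucket/models/v2.pt
```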



[^Comprehensive Article on Model Registry]: Comprehensive Article on Model Registry: [https://neptune.ai/blog/ml-model-registry](https://neptune.ai/blog/ml-model-registry)
[^Made With ML's Experiment Tracking Section]: Made With ML's Experiment Tracking Section: [https://madewithml.com/courses/mlops/experiment-tracking/](https://madewithml.com/courses/mlops/experiment-tracking/)
[^Weights & Biases as Model Registry]: Weights & Biases as Model Registry: [https://docs.wandb.ai/guides/models](https://docs.wandb.ai/guides/models)

## Model Deployment

- [TorchServe (PyTorch)](https://github.com/pytorch/serve/blob/master/README.md#serve-a-model)

## MLOps [^Google Cloud's Comprehensive Overview of MLOps][^Neptune's Blog on MLOps Architecture]


[^Google Cloud's Comprehensive Overview of MLOps]: Google Cloud's Comprehensive Overview of MLOps: [https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning](https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning)
[^Neptune's Blog on MLOps Architecture]: Neptune's Blog on MLOps Architecture: [https://neptune.ai/blog/mlops-architecture-guide](https://neptune.ai/blog/mlops-architecture-guide) has a lot of good diagrams to illustrate.

### Data Science Steps for ML

Google's article details some common steps in an ML pipeline. Most descriptions of this pipeline agree, differing only in terminology. The article's main point is that the level of automation of the steps below defines the **maturity of your MLOps and DevOps**.

In any ML project, after you define the business use case and establish the success criteria, the process of delivering an ML model to production involves the following steps. These steps can be completed manually or can be completed by an automatic pipeline.

1. Data extraction: You select and integrate the relevant data from various data sources for the ML task.
2. Data analysis: You perform exploratory data analysis (EDA) to understand the available data for building the ML model. This process leads to the following:
    - Understanding the data schema and characteristics that are expected by the model.
    - Identifying the data preparation and feature engineering that are needed for the model.
3. Data preparation: The data is prepared for the ML task. This preparation involves data cleaning and splitting the data into training, validation, and test sets. You also apply data transformations and feature engineering to the model that solves the target task. The outputs of this step are the data splits in the prepared format.
4. Model training: The data scientist implements different algorithms with the prepared data to train various ML models. In addition, you subject the implemented algorithms to hyperparameter tuning to get the best performing ML model. The output of this step is a trained model.
5. Model evaluation: The model is evaluated on a holdout test set to evaluate the model quality. The output of this step is a set of metrics to assess the quality of the model.
6. Model validation: The model is confirmed to be adequate for deployment—that its predictive performance is better than a certain baseline.
7. Model serving: The validated model is deployed to a target environment to serve predictions. This deployment can be one of the following:
    - Microservices with a REST API to serve online predictions.
    - An embedded model to an edge or mobile device.
    - Part of a batch prediction system.
8. Model monitoring: The model predictive performance is monitored to potentially invoke a new iteration in the ML process.
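
Steps 3 to 6 above can be sketched end to end in a few lines. This is a deliberately toy version under stated assumptions: the data is synthetic, the "model" is a single learned threshold, and every function name is hypothetical. The point is only to show how the output of each step feeds the next, and how validation compares the evaluated metric against a baseline.

```python
import random


def split_data(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Step 3: shuffle and split rows into train/validation/test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * train_frac)
    n_val = round(len(rows) * val_frac)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def train_threshold_model(train_rows):
    """Step 4: 'train' a toy model, a threshold midway between class means."""
    pos = [x for x, y in train_rows if y == 1]
    neg = [x for x, y in train_rows if y == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > threshold else 0


def evaluate(model, rows):
    """Step 5: accuracy of the model on a holdout set."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def majority_baseline_accuracy(rows):
    """Accuracy obtained by always predicting the most common label."""
    positives = sum(y for _, y in rows)
    return max(positives, len(rows) - positives) / len(rows)


# Synthetic data: the label is 1 when the feature is at least 50.
rows = [(x, 1 if x >= 50 else 0) for x in range(100)]
train, val, test = split_data(rows)  # val would be used for tuning; unused here
model = train_threshold_model(train)
test_accuracy = evaluate(model, test)

# Step 6: only promote the model if it beats the baseline.
is_deployable = test_accuracy > majority_baseline_accuracy(test)
print(f"test accuracy={test_accuracy:.2f}, deployable={is_deployable}")
```

At MLOps level 0, each of these calls would be run by hand in a notebook; higher maturity levels wrap the same steps in an automated pipeline.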


The level of automation of these steps defines the maturity of the ML process, which reflects the velocity of training new models given new data or training new models given new implementations. The following sections describe three levels of MLOps, starting from the most common level, which involves no automation, up to automating both ML and CI/CD pipelines.

### MLOps Architecture Level 0 

Many teams have data scientists and ML researchers who can build state-of-the-art models, but their process for building and deploying ML models is entirely manual. This is considered the basic level of maturity, or level 0. The following diagram shows the workflow of this process.

<img src="https://storage.googleapis.com/reighns/reighns_ml_projects/docs/deep_learning/MLOps/architecture.svg" style="margin-left:auto; margin-right:auto"/>
<p style="text-align: center">
    <b>MLOps Level 0: Manual ML steps to serve the model as a prediction service.</b>
</p>

#### Characteristics

The following list highlights the characteristics of the MLOps level 0 process, as shown in the figure above:

- Manual, script-driven, and interactive process: Every step is manual, including data analysis, data preparation, model training, and validation. It requires manual execution of each step, and manual transition from one step to another. This process is usually driven by experimental code that is written and executed in notebooks by data scientists interactively, until a workable model is produced.

- Disconnection between ML and operations: The process separates data scientists who create the model and engineers who serve the model as a prediction service. The data scientists hand over a trained model as an artifact to the engineering team to deploy on their API infrastructure. This handoff can include putting the trained model in a storage location, checking the model object into a code repository, or uploading it to a model registry. Then engineers who deploy the model need to make the required features available in production for low-latency serving, which can lead to training-serving skew.

- Infrequent release iterations: The process assumes that your data science team manages a few models that don't change frequently—either changing model implementation or retraining the model with new data. A new model version is deployed only a couple of times per year.

- No CI: Because few implementation changes are assumed, CI is ignored. Usually, testing the code is part of the notebooks or script execution. The scripts and notebooks that implement the experiment steps are source controlled, and they produce artifacts such as trained models, evaluation metrics, and visualizations.

- No CD: Because there aren't frequent model version deployments, CD isn't considered.

- Deployment refers to the prediction service: The process is concerned only with deploying the trained model as a prediction service (for example, a microservice with a REST API), rather than deploying the entire ML system.

- Lack of active performance monitoring: The process doesn't track or log the model predictions and actions, which are required in order to detect model performance degradation and other model behavioral drifts.

The engineering team might have their own complex setup for API configuration, testing, and deployment, including security, regression, and load and canary testing. In addition, production deployment of a new version of an ML model usually goes through A/B testing or online experiments before the model is promoted to serve all the prediction request traffic.
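
One common way to run such an A/B or canary rollout is to route a fixed share of users to the new model deterministically, so that each user always sees the same variant. The sketch below assumes string user IDs and uses a hash to assign buckets; the function name and the 10% default share are illustrative choices, not a prescribed method.

```python
import hashlib


def assign_variant(user_id, candidate_share=0.1):
    """Deterministically route a user to the 'candidate' or 'current' model."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    # Map the first 32 bits of the hash to a value in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "candidate" if bucket < candidate_share else "current"


# Hypothetical user IDs; roughly 10% should land on the candidate model.
users = [f"user-{i}" for i in range(10_000)]
share = sum(assign_variant(u) == "candidate" for u in users) / len(users)
print(f"candidate traffic share: {share:.3f}")
```

Hashing rather than random sampling keeps the assignment stable across requests without storing any per-user state.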


#### Challenges

MLOps level 0 is common in many businesses that are beginning to apply ML to their use cases. This manual, data-scientist-driven process might be sufficient when models are rarely changed or trained. In practice, models often break when they are deployed in the real world. The models fail to adapt to changes in the dynamics of the environment, or changes in the data that describes the environment. For more information, see Why Machine Learning Models Crash and Burn in Production.

To address these challenges and to maintain your model's accuracy in production, you need to do the following:

- Actively monitor the quality of your model in production: Monitoring lets you detect performance degradation and model staleness. It acts as a cue to a new experimentation iteration and (manual) retraining of the model on new data.

- Frequently retrain your production models: To capture the evolving and emerging patterns, you need to retrain your model with the most recent data. For example, if your app recommends fashion products using ML, its recommendations should adapt to the latest trends and products.

- Continuously experiment with new implementations to produce the model: To harness the latest ideas and advances in technology, you need to try out new implementations such as feature engineering, model architecture, and hyperparameters. For example, if you use computer vision in face detection, face patterns are fixed, but better new techniques can improve the detection accuracy.
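
The first two points can be tied together with a simple monitor that tracks accuracy over a sliding window of recent predictions and flags when retraining is due. This is a minimal sketch that assumes ground-truth labels eventually become available for production predictions, which is not always the case; the class name, window size, and threshold are hypothetical.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent predictions with known outcomes."""

    def __init__(self, window=100, threshold=0.9):
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def record(self, prediction, actual):
        """Log whether a production prediction turned out to be correct."""
        self.outcomes.append(prediction == actual)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """Signal retraining once a full window falls below the threshold."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.threshold


monitor = AccuracyMonitor(window=4, threshold=0.75)
for prediction, actual in [(1, 1), (0, 0), (1, 1), (0, 0)]:
    monitor.record(prediction, actual)
print(monitor.needs_retraining())  # False

# The data shifts and the model starts making mistakes.
for prediction, actual in [(1, 0), (0, 1)]:
    monitor.record(prediction, actual)
print(monitor.rolling_accuracy(), monitor.needs_retraining())  # 0.5 True
```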

To address the challenges of this manual process, MLOps practices for CI/CD and CT are helpful. By deploying an ML training pipeline, you can enable CT, and you can set up a CI/CD system to rapidly test, build, and deploy new implementations of the ML pipeline. These features are discussed in more detail in the next sections.

- [https://www.mle-interviews.com/ml-design-template](https://www.mle-interviews.com/ml-design-template)
- Ace the Machine Learning Interview
- [https://madewithml.com/courses/mlops](https://madewithml.com/courses/mlops)
- [Minimizing real-time prediction serving latency in machine learning](https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning): Talks about things like online and offline predictions.
- [Google Cloud's Tutorial on MLOps](https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning): Quite a good read.