# Introduction

MLOps is the abbreviation for Machine Learning Operations. It is the set of practices to **design**, **deploy** and **maintain** machine learning models **in production** **continuously**, **reliably** and **efficiently**.

The full machine learning lifecycle:
![image.png](attachment:415cf0ed-73db-44f8-a0e8-e4359cd8eb5b.png)



MLOps originates from DevOps, a set of practices and tools that ensure software can be delivered continuously, reliably and efficiently.

![image.png](attachment:1b02c3b5-2feb-4f87-a397-04d1116b3ce1.png)

Traditionally, development and operations have been kept separate; DevOps brings them together.

[more info](https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning)

In the real world, many components come into play when running machine learning solutions in production:
- configuration
- automation
- data collection
- data verification
- feature engineering
- testing and debugging
- model analysis
- metadata management
- resource management
- serving infrastructure
- monitoring
- process management
- ML code

MLOps aims to: 
- bridge the gap between machine learning and operations teams, enhancing collaboration
- automate the deployment of models
- facilitate monitoring of model performance



# The Machine Learning Lifecycle 

Design → Development → Deployment → back to Design (the cycle repeats)

- It structures the ML process
- It defines the key players at each stage
- It is our toolkit for optimization


## Design 

- Clarify the context of the problem
- Assess the added value of using machine learning
- Gather clear business requirements
- Define key metrics to track progress
- Ensure high-quality data processing
- Involve stakeholders to evaluate the project's viability and make informed decisions

## Development 

- Develop the machine learning model
- Experiment with data, algorithms and hyperparameters
- Goal: a model ready for deployment

## Deployment

- Integrate the model into business processes
- Deploy the model in production
- Monitor its performance, set up alerts, etc.

# Roles in MLOps 

## Business Roles

### Business Stakeholder

- Makes budget decisions
- Ensures alignment with the company vision
- Involved throughout the lifecycle, but participates mainly in problem definition and requirements gathering, model evaluation and model monitoring.

### Subject Matter Expert

- Provides domain knowledge
- Involved throughout the lifecycle, but participates mainly in exploratory data analysis (EDA) and feature engineering, model evaluation and model monitoring.

## Technical Roles

### Data Scientist

- Data Analysis
- Model Training and Evaluation
- Involved from EDA through model training and evaluation, as well as monitoring.

### Data Engineer

- Collects, stores and processes data.
- Checks and maintains data quality.
- Involved in implementing the design, feature engineering, model training and evaluation, and monitoring.

### ML Engineer
- Versatile role
- Specifically designed to cover the complete machine learning lifecycle
- Cross-functional role

### Other Roles
- Data Analysts
- Developers
- Software Engineers
- Backend Engineers

In startups, roles and responsibilities tend to differ from those in large companies.


# MLOps Design Phase

## Added Value 

The added value of a machine learning model is often measured in money or time.


## Business Requirements 

It is key to keep the end user of the machine learning model in mind. Their requirements may concern:
- Speed
- Accuracy
- Explainability/Transparency

Compliance and regulations can also impact our project.

Budget and team size are important too.

## Key Metrics

The data scientist looks at model accuracy.

The subject matter expert (SME) looks at customer satisfaction.

The business stakeholder focuses on the generated revenue.

## Data Quality 

Data quality refers to both the characteristics associated with high-quality data and the processes used to measure or improve the quality of data.

Four dimensions of data quality:
- Accuracy: how well the data represents reality
- Completeness: how thoroughly the data describes its subject
- Consistency: whether the same definitions are used across the data
- Timeliness: whether the data is available when it is needed
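
To make the four dimensions concrete, here is a minimal sketch in plain Python, assuming a small list of customer records. The field names, the plausible age range (0 to 120), the allowed country codes and the 30-day freshness window are hypothetical choices for illustration; each function measures one dimension.

```python
from datetime import datetime, timedelta

# Hypothetical customer records; field names are assumptions for illustration.
records = [
    {"customer_id": 1, "age": 34, "country": "ES", "updated_at": datetime(2024, 5, 1)},
    {"customer_id": 2, "age": None, "country": "es", "updated_at": datetime(2024, 5, 2)},
    {"customer_id": 3, "age": 250, "country": "FR", "updated_at": datetime(2023, 1, 1)},
]

def completeness(rows, field):
    """Completeness: share of rows where `field` is present (not None)."""
    return sum(r[field] is not None for r in rows) / len(rows)

def accuracy_violations(rows, field, low, high):
    """Accuracy: rows whose value falls outside a plausible real-world range."""
    return [r for r in rows if r[field] is not None and not (low <= r[field] <= high)]

def consistency_violations(rows, field, allowed):
    """Consistency: rows whose value does not follow the agreed definition."""
    return [r for r in rows if r[field] not in allowed]

def stale_rows(rows, field, now, max_age):
    """Timeliness: rows that are older than the accepted freshness window."""
    return [r for r in rows if now - r[field] > max_age]

now = datetime(2024, 5, 3)
print(completeness(records, "age"))  # 0.6666666666666666
print([r["customer_id"] for r in accuracy_violations(records, "age", 0, 120)])  # [3]
print([r["customer_id"] for r in consistency_violations(records, "country", {"ES", "FR"})])  # [2]
print([r["customer_id"] for r in stale_rows(records, "updated_at", now, timedelta(days=30))])  # [3]
```

In a real project these checks would run on every new batch of data, with thresholds agreed with the subject matter expert.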

## Data Ingestion 

Data ingestion should be automated, using ETL (extract, transform, load) pipelines with built-in data quality checks.
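
A minimal sketch of such a pipeline, assuming a hypothetical CSV export of orders and a plain Python list standing in for the target table. Each stage is a separate function, and rows that fail the checks are rejected with a reason instead of silently reaching the warehouse.

```python
import csv
import io

# Hypothetical raw export; in practice it would come from a database or an API.
RAW_CSV = """order_id,amount,currency
1,19.99,EUR
2,,EUR
3,5.50,USD
4,-3.00,EUR
"""

def extract(text):
    """Extract: read raw rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def check(row):
    """Return the problems found in a raw row (an empty list means it passes)."""
    problems = []
    if not row["amount"]:
        problems.append("missing amount")
    elif float(row["amount"]) < 0:
        problems.append("negative amount")
    if row["currency"] not in {"EUR", "USD"}:
        problems.append("unknown currency")
    return problems

def transform(row):
    """Transform: cast the values into the types expected downstream."""
    return {"order_id": int(row["order_id"]), "amount": float(row["amount"]), "currency": row["currency"]}

def load(rows, table):
    """Load: append clean rows to the target store (a list stands in for a table)."""
    table.extend(rows)

def run_pipeline(text, table):
    clean, rejected = [], []
    for row in extract(text):
        problems = check(row)
        if problems:
            rejected.append((row["order_id"], problems))
        else:
            clean.append(transform(row))
    load(clean, table)
    return rejected

warehouse = []
rejected = run_pipeline(RAW_CSV, warehouse)
print(len(warehouse))  # 2
print(rejected)        # [('2', ['missing amount']), ('4', ['negative amount'])]
```

Returning the rejected rows, rather than dropping them, makes it possible to alert on a sudden rise in bad records.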


# MLOps Development Phase 

## Feature Engineering and the Feature Store 

Feature engineering is the process of selecting, manipulating and transforming raw data into features.

More features can: 
- produce a very accurate model
- achieve more stability
- be more expensive due to additional pre-processing steps
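
As an illustration, the sketch below turns raw purchase transactions into one feature vector per customer. It assumes hypothetical `(customer_id, purchase_date, amount)` tuples and builds recency, frequency and monetary (RFM-style) features; each extra feature is one more pre-processing step to compute and maintain.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw transactions: (customer_id, purchase_date, amount).
transactions = [
    ("a", date(2024, 1, 5), 20.0),
    ("a", date(2024, 2, 10), 40.0),
    ("b", date(2024, 2, 20), 15.0),
]

def build_features(rows, as_of):
    """Turn raw transactions into one feature dictionary per customer."""
    grouped = defaultdict(list)
    for customer, day, amount in rows:
        grouped[customer].append((day, amount))
    features = {}
    for customer, items in grouped.items():
        amounts = [amount for _, amount in items]
        last_purchase = max(day for day, _ in items)
        features[customer] = {
            "n_purchases": len(items),                        # frequency
            "avg_amount": sum(amounts) / len(amounts),        # monetary value
            "days_since_last": (as_of - last_purchase).days,  # recency
        }
    return features

features = build_features(transactions, as_of=date(2024, 3, 1))
print(features["a"])  # {'n_purchases': 2, 'avg_amount': 30.0, 'days_since_last': 20}
```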

A feature store is a place to store commonly used engineered features. Features can be monitored there too. In general, it is a good idea to use a feature store when the engineered features are computationally expensive.
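
The core idea can be sketched with a toy in-memory store. `ToyFeatureStore` is a hypothetical class for illustration only, not the API of Feast, Hopsworks or any other product: an expensive feature is computed once, reused by every consumer afterwards, and summarized for basic monitoring.

```python
import time

class ToyFeatureStore:
    """In-memory stand-in for a feature store (hypothetical, not a real library API)."""

    def __init__(self):
        self._features = {}     # (feature_name, entity_id) -> value
        self.compute_calls = 0  # how often the expensive computation actually ran

    def get_or_compute(self, name, entity_id, compute_fn):
        key = (name, entity_id)
        if key not in self._features:
            self.compute_calls += 1
            self._features[key] = compute_fn(entity_id)
        return self._features[key]

    def summary(self, name):
        """Basic monitoring: count and mean of the stored values of one feature."""
        values = [v for (n, _), v in self._features.items() if n == name]
        return {"count": len(values), "mean": sum(values) / len(values) if values else None}

def expensive_customer_score(customer_id):
    time.sleep(0.01)  # simulate a costly pre-processing step
    return len(customer_id) * 1.5

store = ToyFeatureStore()
# Two consumers (e.g. two models) ask for the same engineered feature.
for _ in range(2):
    store.get_or_compute("customer_score", "alice", expensive_customer_score)
store.get_or_compute("customer_score", "bob", expensive_customer_score)
print(store.compute_calls)              # 2
print(store.summary("customer_score"))  # {'count': 2, 'mean': 6.0}
```

The saving grows with the cost of `compute_fn` and the number of consumers, which is why a feature store pays off mainly for expensive features.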

## Experiment Tracking

During a machine learning project many models are trained. Each of them has its own: 
- Machine learning algorithm
- Model hyperparameters
- Versions of data
- Execution scripts
- Environment configurations

Each combination of these elements leads to a different experiment, each with its own outcome. Keeping track of the configuration and outcome of every experiment allows us to compare and reproduce results, collaborate on experiments and report results to stakeholders.

Depending on the number of experiments, the team size, etc., we can use a spreadsheet, an in-house platform or an experiment tracking tool (which can be expensive).
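
The spreadsheet option can be sketched as an append-only CSV log. The column names mirror the elements listed above; the run values are hypothetical, and an in-memory buffer stands in for a real `.csv` file so the example is self-contained.

```python
import csv
import io
import json

FIELDS = ["run_id", "algorithm", "hyperparameters", "data_version", "script", "environment", "accuracy"]

def log_experiment(sink, **run):
    """Append one experiment (configuration + outcome) as a CSV row."""
    writer = csv.DictWriter(sink, fieldnames=FIELDS)
    if sink.tell() == 0:
        writer.writeheader()
    run["hyperparameters"] = json.dumps(run["hyperparameters"], sort_keys=True)
    writer.writerow(run)

def best_run(sink, metric="accuracy"):
    """Read the log back and return the run with the highest metric."""
    sink.seek(0)
    rows = list(csv.DictReader(sink))
    sink.seek(0, io.SEEK_END)  # keep appending at the end afterwards
    return max(rows, key=lambda r: float(r[metric]))

log = io.StringIO(newline="")  # a real project would open a .csv file instead
log_experiment(log, run_id="r1", algorithm="logistic_regression",
               hyperparameters={"C": 1.0}, data_version="v1",
               script="train.py", environment="py3.11", accuracy=0.81)
log_experiment(log, run_id="r2", algorithm="random_forest",
               hyperparameters={"n_estimators": 200}, data_version="v1",
               script="train.py", environment="py3.11", accuracy=0.86)
print(best_run(log)["run_id"])  # r2
```

Dedicated tracking tools add a UI, artifact storage and multi-user access on top of the same idea.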


# MLOps Deployment Phase 

It is now time to move from the development environment to the production environment.

Special attention must be paid to the runtime environment, to make sure it is the same in development and production.
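
One simple way to spot differences is to snapshot each environment and compare the snapshots. This sketch uses the standard library's `importlib.metadata` and `platform`; the package list is an assumption, and the production snapshot is faked here because it would normally be recorded on the production host.

```python
import platform
from importlib import metadata

def snapshot_environment(packages):
    """Record the Python version and the installed version of each listed package."""
    env = {"python": platform.python_version()}
    for name in packages:
        try:
            env[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            env[name] = None  # package not installed in this environment
    return env

def diff_environments(dev, prod):
    """Return {key: (dev_value, prod_value)} for every entry that differs."""
    keys = sorted(set(dev) | set(prod))
    return {k: (dev.get(k), prod.get(k)) for k in keys if dev.get(k) != prod.get(k)}

dev_env = snapshot_environment(["numpy", "scikit-learn"])
# Hypothetical production snapshot with an older scikit-learn.
prod_env = dict(dev_env, **{"scikit-learn": "0.24.2"})
print(diff_environments(dev_env, prod_env))
```

A non-empty diff is exactly the kind of mismatch that containers are meant to prevent.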

To mitigate differences between environments, we can use containers. Deploying the ML solution as a container is the standard in MLOps nowadays. Containers are easy to maintain, portable and fast to start up.

## Machine Learning Deployment Architecture 

Monolithic vs. microservices architecture

ML models are mainly deployed as microservices.

**Inferencing** is the process in which we send new input to the machine learning model and receive output from the model. 

An API is often used to interact with a machine learning model.
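
A minimal sketch of such an API, using only the standard library's WSGI support. The `/predict` route, the JSON payload and the linear "model" with fixed weights are hypothetical stand-ins; a real service would load a trained model instead.

```python
import io
import json
from wsgiref.simple_server import make_server

def predict(features):
    """Stand-in model: a hypothetical linear score with fixed weights."""
    weights = {"age": 0.02, "income": 0.00001}
    score = sum(weights[k] * features.get(k, 0.0) for k in weights)
    return {"churn_probability": round(min(max(score, 0.0), 1.0), 3)}

def app(environ, start_response):
    """Minimal WSGI endpoint: POST /predict with a JSON body of features."""
    if environ["REQUEST_METHOD"] != "POST" or environ["PATH_INFO"] != "/predict":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    features = json.loads(environ["wsgi.input"].read(length))
    body = json.dumps(predict(features)).encode()
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]

def serve(port=8000):
    """Start a local development server (blocks until interrupted)."""
    with make_server("", port, app) as server:
        server.serve_forever()
```

Calling `app` directly with a hand-built `environ` dictionary (method, path, content length and an `io.BytesIO` body) is enough to test the endpoint without opening a network port; in production the same `app` could be placed behind any WSGI server inside a container.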

## Integration 

Integration is the last step before the machine learning model becomes part of the business process.

# CI/CD and deployment strategies 

CI/CD is another concept that comes from the software development domain.

![image.png](attachment:c0b5cd0c-3df1-4df4-bb47-1dbe4b915fde.png)

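Continuous integration (CI) covers the practices applied while code is being written: changes are merged frequently and automatically built and tested.

Continuous deployment (CD) covers the practices applied after the code is completed: validated changes are released to production automatically.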
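
In an ML project, the automated tests run by CI can include checks on the model itself. Below is a hypothetical pytest-style test file: the majority-class "model", the tiny datasets and the 0.80 accuracy threshold are assumptions for illustration. If a test fails, the pipeline stops and the model is not deployed.

```python
# test_model.py: hypothetical CI checks, run automatically (e.g. with pytest) on every commit.

MIN_ACCURACY = 0.80  # assumed business threshold

def train_model(data):
    """Hypothetical 'model': always predicts the majority label seen in training."""
    labels = [label for _, label in data]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

TRAIN = [(1, "no"), (2, "no"), (3, "no"), (4, "no"), (5, "yes")]
HOLDOUT = [(6, "no"), (7, "no"), (8, "no"), (9, "no"), (10, "yes")]

def test_model_meets_minimum_accuracy():
    model = train_model(TRAIN)
    assert accuracy(model, HOLDOUT) >= MIN_ACCURACY

def test_prediction_is_a_known_label():
    model = train_model(TRAIN)
    assert model(42) in {"yes", "no"}
```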

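Deployment strategies:
- Basic: replace the old model with the new one.
- Shadow: send incoming data to both the old and the new model, but only use the old model's predictions; this lets us test the new model on live data without risk.
- Canary: gradually route a growing share of the incoming data to the new model.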
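
The three strategies can be sketched as request routers. The two "models" are hypothetical stand-ins that just return a label, and the canary shares used in the loop are illustrative.

```python
import random

def basic(request, old_model, new_model):
    """Basic: the new model simply replaces the old one."""
    return new_model(request)

def shadow(request, old_model, new_model, shadow_records):
    """Shadow: both models see the request; only the old model's answer is served."""
    shadow_records.append((request, new_model(request)))  # kept for offline comparison
    return old_model(request)

def canary(request, old_model, new_model, new_share, rng=random):
    """Canary: route a fraction `new_share` of the traffic to the new model."""
    return new_model(request) if rng.random() < new_share else old_model(request)

old_model = lambda x: "old"
new_model = lambda x: "new"

shadow_records = []
print(shadow(1, old_model, new_model, shadow_records), shadow_records)  # old [(1, 'new')]

rng = random.Random(0)
# Increase the canary share step by step while monitoring the new model.
for share in (0.05, 0.25, 1.0):
    answers = [canary(i, old_model, new_model, share, rng) for i in range(1000)]
    print(share, answers.count("new") / len(answers))
```

If monitoring shows problems during a canary rollout, setting `new_share` back to 0 returns all traffic to the old model.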

## Automation and Scaling

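Each part of the lifecycle offers opportunities for automation and scaling:
- Templates standardize the gathering of business requirements.
- Data acquisition can be automated.
- A feature store saves time and helps to scale.
- Experiment tracking automates record keeping and enables reproducibility.
- Containerization makes it easy to spawn multiple copies of a service (scalability).
- CI/CD enables collaboration.
- Microservices can be deployed and scaled independently.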

# Monitoring

**Statistical Monitoring**: focuses on the input and output data, including predictions 

**Computational Monitoring**: focuses on technical metrics (cpu usage, number of incoming requests...)

Once the model is live, we start gathering the actual outcomes, which are considered the **ground truth**.

**Feedback loop**: process through which the ground truth is used to improve the machine learning model.
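
A minimal sketch covering both kinds of monitoring plus the feedback loop. `ModelMonitor` is a hypothetical helper and the threshold model is a stand-in: it records inputs and predictions (statistical), request count and latency (computational), and later compares stored predictions with the ground truth once it arrives.

```python
import time

class ModelMonitor:
    """Hypothetical helper that records statistical and computational signals."""

    def __init__(self):
        self.predictions = {}  # request_id -> prediction (statistical: outputs)
        self.inputs = []       # one input feature per request (statistical: inputs)
        self.latencies = []    # seconds per request (computational)

    def record(self, request_id, feature_value, predict_fn):
        start = time.perf_counter()
        prediction = predict_fn(feature_value)
        self.latencies.append(time.perf_counter() - start)
        self.inputs.append(feature_value)
        self.predictions[request_id] = prediction
        return prediction

    def report(self):
        n = len(self.inputs)
        return {
            "requests": n,
            "max_latency_s": max(self.latencies),
            "input_mean": sum(self.inputs) / n,
            "positive_rate": sum(p == 1 for p in self.predictions.values()) / n,
        }

    def accuracy_against_ground_truth(self, ground_truth):
        """Feedback loop: compare earlier predictions with outcomes observed later."""
        matched = [rid for rid in ground_truth if rid in self.predictions]
        if not matched:
            return None
        return sum(self.predictions[rid] == ground_truth[rid] for rid in matched) / len(matched)

model = lambda x: int(x > 50)  # hypothetical threshold model
monitor = ModelMonitor()
for rid, value in [("r1", 30), ("r2", 70), ("r3", 90)]:
    monitor.record(rid, value, model)
print(monitor.report())
# Ground truth arrives later, e.g. whether each customer actually churned.
print(monitor.accuracy_against_ground_truth({"r1": 0, "r2": 0, "r3": 1}))  # 0.6666666666666666
```

The measured accuracy on ground truth is the signal that feeds back into the decision to improve or retrain the model.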

# Retraining

Data changes over time, and those changes impact machine learning models. **Retraining** is the process of using new data to create a more up-to-date model.

**Drifts** in data can be: 
- Data drift: a change in the input data. It may affect the performance of the model.
- Concept drift: a change in the relationship between the input and the output. This means that the patterns the model once learned no longer apply.
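
Data drift in a single numeric feature can be quantified by comparing its distribution at training time with its distribution in production. The sketch below uses the Population Stability Index (PSI), one common choice among several; the bin count, the simulated Gaussian data and the small floor for empty bins are assumptions. A frequently quoted rule of thumb treats PSI above roughly 0.25 as significant drift, though thresholds vary by team.

```python
import math
import random

def population_stability_index(reference, current, bins=10):
    """PSI between two samples of one numeric feature; larger means more drift."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def shares(sample):
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clip values outside the reference range
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

rng = random.Random(42)
training_data = [rng.gauss(50, 10) for _ in range(5000)]
live_same = [rng.gauss(50, 10) for _ in range(5000)]      # no drift
live_shifted = [rng.gauss(60, 10) for _ in range(5000)]   # mean shifted: data drift
psi_same = population_stability_index(training_data, live_same)
psi_shifted = population_stability_index(training_data, live_shifted)
print(round(psi_same, 3), round(psi_shifted, 3))
```

Note that PSI only looks at inputs, so it detects data drift; concept drift shows up instead as a drop in accuracy once the ground truth is available.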

## How often to retrain

It depends on business criteria, the cost of retraining, or model degradation.
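
These criteria can be combined into a simple retraining policy. `RetrainingPolicy` is a hypothetical sketch: the 90-day age limit, the 0.80 accuracy floor, the 0.25 drift threshold (on a drift score such as PSI) and the monthly retraining budget are all assumed values to be set per project.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class RetrainingPolicy:
    """Hypothetical policy combining the criteria above; all thresholds are assumptions."""
    max_model_age: timedelta = timedelta(days=90)  # business / schedule criterion
    min_accuracy: float = 0.80                     # model degradation criterion
    max_drift_score: float = 0.25                  # data drift criterion
    max_retrains_per_month: int = 2                # cost criterion

    def reasons_to_retrain(self, trained_on, today, live_accuracy, drift_score, retrains_this_month):
        if retrains_this_month >= self.max_retrains_per_month:
            return []  # budget exhausted: wait, even if other signals fire
        reasons = []
        if today - trained_on > self.max_model_age:
            reasons.append("model too old")
        if live_accuracy < self.min_accuracy:
            reasons.append("accuracy degraded")
        if drift_score > self.max_drift_score:
            reasons.append("input data drifted")
        return reasons

policy = RetrainingPolicy()
print(policy.reasons_to_retrain(date(2024, 1, 1), date(2024, 5, 1), 0.76, 0.1, 0))
# ['model too old', 'accuracy degraded']
```

In an automated setup, a scheduled job would evaluate the policy and trigger the training pipeline whenever the returned list is not empty.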

## Automated retraining 



# MLOps maturity level

The **MLOps maturity level** measures the level of automation, collaboration and monitoring within MLOps processes.

A higher level is not necessarily better: the appropriate level depends on the needs of the project.

![image.png](attachment:22664420-6a21-4d92-8250-74bbdfc70dab.png)


# MLOps tools

- **Feature store**: Feast, Hopsworks
- **Experiment tracking**: MLflow, ClearML, Weights & Biases
- **Containerization and orchestration**: Docker and Kubernetes
- **CI/CD**: Jenkins
- **Monitoring**: Fiddler, Great Expectations
- **Full MLOps platforms**: Amazon SageMaker, Azure Machine Learning, Google Cloud AI Platform


