# Machine Learning Operations

### Framework for Building Machine Learning Models

- The generic MLOps workflow brings together data engineering, DevOps, and machine learning.
- It is generally composed of the MLOps pipeline and drivers.

#### *MLOps Pipeline*

- The MLOps pipeline performs operations including building, deploying and monitoring models.
- All models trained, deployed, and monitored using the MLOps method are end-to-end traceable. Their lineage is logged so that the origins of each model can be traced, including the source code used to train it, the data used to train and test it, and the parameters used to converge it.
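
As a minimal sketch of what a lineage record could contain, the snippet below fingerprints the training code and data with SHA-256 and stores them next to the parameters. The function names, field names, and inputs are hypothetical, chosen for illustration; they are not part of any specific MLOps tool.

```python
import hashlib
import json


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest that identifies the exact content."""
    return hashlib.sha256(content).hexdigest()


def build_lineage_record(source_code: bytes, train_data: bytes,
                         test_data: bytes, params: dict) -> dict:
    """Collect what is needed to trace a trained model back to its origins."""
    return {
        "source_code_sha256": fingerprint(source_code),
        "train_data_sha256": fingerprint(train_data),
        "test_data_sha256": fingerprint(test_data),
        "params": dict(params),  # copy so later changes do not alter the record
    }


# Hypothetical inputs, for illustration only.
record = build_lineage_record(
    source_code=b"def train(): ...",
    train_data=b"x,y\n1,2\n3,4\n",
    test_data=b"x,y\n5,6\n",
    params={"learning_rate": 0.01, "epochs": 10},
)
print(json.dumps(record, indent=2, sort_keys=True))
```

Hashing the inputs rather than copying them keeps the record small while still revealing whether the code or data changed between two training runs.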

#### *Drivers*

- The key drivers for the MLOps pipeline include data, code, artifacts, middleware and infrastructure.

**Data**
- To manage data in ML applications, data is handled in these steps: data acquisition, data annotation, data cataloging, data preparation, data quality checking, data sampling, and data augmentation.
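
Two of the steps above, data quality checking and data sampling, can be sketched in a few lines. This is only an illustration under simple assumptions: rows are plain dictionaries, "quality" means no missing values, and the function names and example rows are made up.

```python
import random


def check_quality(rows):
    """Data quality checking: keep only rows with no missing (None) values."""
    return [row for row in rows
            if all(value is not None for value in row.values())]


def sample_rows(rows, k, seed=0):
    """Data sampling: draw up to k rows without replacement, reproducibly."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


# Hypothetical raw data, for illustration only.
raw_rows = [
    {"height_cm": 170.0, "label": "a"},
    {"height_cm": None, "label": "b"},   # missing feature value
    {"height_cm": 182.5, "label": "b"},
    {"height_cm": 165.0, "label": None},  # missing label
    {"height_cm": 158.0, "label": "a"},
]

clean_rows = check_quality(raw_rows)
subset = sample_rows(clean_rows, k=2)
print(len(raw_rows), len(clean_rows), len(subset))
```

Fixing the random seed makes the sample repeatable, which helps when the same subset has to be reproduced later for traceability.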

**Code**
- Three essential modules of code drive the MLOps pipeline: training code, testing code, and application code.
- These scripts are executed by the CI/CD and data pipelines to keep the MLOps pipeline working robustly.

**Artifacts**
- The MLOps pipeline generates artifacts such as data, serialized models, code snippets, system logs, and ML model training and testing metrics.
- All these artifacts are useful for the successful working of the MLOps pipeline, ensuring its traceability and sustainability. 
- These artifacts are managed using middleware services such as the model registry, workspaces, logging services, source code management services, databases, and so on.
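
To make the idea of a model registry concrete, here is a toy in-memory version that stores serialized models together with their metrics under increasing version numbers. The `ModelRegistry` class and its methods are hypothetical and written for this note. A real registry service would persist artifacts to durable storage and offer far more features.

```python
import pickle


class ModelRegistry:
    """A toy in-memory registry for serialized models and their metrics."""

    def __init__(self):
        # Maps a model name to a list of (serialized model, metrics) entries.
        self._versions = {}

    def register(self, name, model, metrics):
        """Serialize the model, store it with its metrics, return its version."""
        entries = self._versions.setdefault(name, [])
        entries.append((pickle.dumps(model), dict(metrics)))
        return len(entries)  # version numbers start at 1

    def load(self, name, version):
        """Return the deserialized model and its metrics for a given version."""
        blob, metrics = self._versions[name][version - 1]
        return pickle.loads(blob), metrics


# Hypothetical "models": plain dictionaries of weights, for illustration only.
registry = ModelRegistry()
v1 = registry.register("price-model", {"weights": [0.5, 1.5]}, {"mse": 0.42})
v2 = registry.register("price-model", {"weights": [0.6, 1.4]}, {"mse": 0.31})

model, metrics = registry.load("price-model", v1)
print(v1, v2, model, metrics)
```

Keeping every version, instead of overwriting the latest one, is what lets an older model be reloaded and compared when a newer one misbehaves.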

**Middleware**
- Middleware refers to computer software that provides services to software applications beyond those available from the operating system.
- Middleware services enable multiple applications to automate and orchestrate processes for the MLOps pipeline.

**Infrastructure**
- Infrastructure essentially refers to the storage and computing resources that ensure the successful working of the MLOps pipeline.
- There are various infrastructure options, such as on-premises resources or infrastructure as a service (IaaS), that is, cloud services.

- A fully automated MLOps workflow can be achieved through the optimization and synergy of the drivers with the MLOps pipeline.
- An advantage of having an automated MLOps workflow is the increase in the efficiency of the IT team by reducing the time spent working on repeatable tasks.

## Characterizing your Machine Learning Problem

### Machine learning solution development process

- ML offers many possibilities to augment and automate business. To get the best out of ML, teams involved in ML-driven business transformation need to understand both ML and the business itself, including aspects such as value-chain analysis, use-case identification, and business simulations to validate the transformation.
- Understanding the business is the first step of an ML solution. It is followed by data analysis, where data is acquired, versioned, and stored. Data pipelines then consume the data for ML modeling, where feature engineering produces the right features to train the model.
- The trained models are evaluated and packaged for deployment. Deployment and monitoring are done using a pipeline that takes advantage of Continuous Integration/Continuous Deployment (CI/CD) features, which enable real-time and continuous deployment to serve trained ML models to users.
- This process ensures robust and scalable ML solutions.

### Types of ML Models

*Learning Models*
- Supervised learning
- Unsupervised learning

*Hybrid models*
- Semi-supervised learning (some data is labeled and large amounts of data are unlabeled)
- Self-supervised learning (differs from unsupervised learning in that it does not focus on clustering and grouping)
- Multi-instance learning (supervised learning where data is labeled not per individual sample but per group of samples)
- Multitask learning (a model trained on one dataset is then used to solve multiple tasks, e.g. using word embeddings in NLP)
- Reinforcement learning (an agent, such as a robot system, learns to operate in a defined environment to perform sequential decision-making tasks or achieve a pre-defined goal. Simultaneously, the agent learns based on continuously evaluated feedback and rewards from the environment.)
- Ensemble learning (two or more models are trained on the same data, and the final prediction is the average of their outputs)
- Transfer learning (a model is trained to perform one task and is then transferred to another model as a starting point for fine-tuning or training on another task, e.g. pretrained models such as BERT)
- Federated learning (ML done in a collaborative fashion: the training process is distributed across devices, and data is not shared, for privacy and security)
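
The averaging step described under ensemble learning can be sketched as follows. The three "models" are hand-written stand-in functions, assumed here only for illustration. In practice each would be a model trained on the same data.

```python
from statistics import mean


# Hypothetical stand-ins for three trained regression models.
def model_a(x):
    return 2.0 * x - 1.0


def model_b(x):
    return 2.0 * x + 1.0


def model_c(x):
    return 2.0 * x + 3.0


def ensemble_predict(models, x):
    """Average the predictions of all models for the input x."""
    return mean(model(x) for model in models)


# Individual predictions for x = 5.0 are 9.0, 11.0, and 13.0.
prediction = ensemble_predict([model_a, model_b, model_c], 5.0)
print(prediction)  # 11.0
```

Averaging is the simplest way to combine models. Weighted averages or majority voting are common alternatives, depending on the task.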

*Statistical models*
- Inductive learning (a process of learning by example, where a system tries to derive a general function or rule from a set of observed instances. For example, fitting an ML model is a process of induction.)
- Deductive learning
- Transductive learning
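
As a small illustration of inductive learning, the sketch below derives a general rule (a straight line) from a handful of observed instances and then applies that rule to an input it has never seen. Ordinary least squares is used here as one possible fitting method, and the data points are invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Observed instances (hypothetical), generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

# Induction: go from specific examples to a general rule.
slope, intercept = fit_line(xs, ys)

# Apply the learned rule to an input that was not among the examples.
unseen_x = 10.0
predicted_y = slope * unseen_x + intercept
print(slope, intercept, predicted_y)
```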