```
From: https://github.com/ksatola
Version: 0.0.1

```

# MLOps

**Pipeline** - a clear (recurring) sequence of well-defined steps

Pros:
- reusability: reduces the effort of deployment (build once, use many times and everywhere)
- standardization: faster onboarding, shorter learning curve
- automation: reduces or eliminates human errors, speeds up execution, allows continuous pipeline execution
- reproducibility: consistent results for repetitive tasks
- the final product is more predictable, consistent, and reliable
- once built, saves time and money
- frees time for more interesting, higher-value tasks

Cons:
- can be a long and costly process to build
- not ideal when you want to produce small quantities or highly customized products
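A minimal sketch of the definition above, assuming a pipeline is simply an ordered list of functions where each step's output feeds the next. The step functions (`drop_missing`, `scale`, `total`) are hypothetical and only illustrate the idea; the point is that the composed `pipeline` is built once and can be rerun on any input with the same, predictable sequence of steps.

```python
from typing import Any, Callable, Iterable, List

Step = Callable[[Any], Any]


def make_pipeline(steps: Iterable[Step]) -> Callable[[Any], Any]:
    """Compose well-defined steps into one reusable callable."""
    ordered: List[Step] = list(steps)

    def run(data: Any) -> Any:
        # Each step receives the previous step's output.
        for step in ordered:
            data = step(data)
        return data

    return run


# Hypothetical steps, for illustration only.
def drop_missing(rows):
    return [r for r in rows if r is not None]


def scale(rows):
    return [r / 10 for r in rows]


def total(rows):
    return sum(rows)


pipeline = make_pipeline([drop_missing, scale, total])
print(pipeline([10, None, 20, 30]))  # 6.0
```

The same `pipeline` object can be reused on new data without rewriting any step, which is the "build once, use many times" benefit in miniature.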

**Machine Learning (ML) pipelines** implement and formalize processes to accelerate, reuse, manage, and deploy ML models. The key benefit of ML pipelines lies in the automation of the model life cycle steps. Other benefits include:
- Ability to focus on new models, not maintaining existing ones.
- Prevention of bugs.
- Useful paper trail.
- Standardization.
- More development time for novel models.
- Simpler processes to update existing models.
- Less time spent to reproduce models.

Not every data science (DS) project needs a pipeline. However, as soon as a model has users, it will require continuous updates and fine-tuning. Pipelines also become more important as an ML project grows.
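To make the "paper trail" and "less time spent to reproduce models" benefits concrete, here is a toy split, train, evaluate pipeline in plain Python. Everything in it is a hypothetical illustration, not any framework's API: the "model" just predicts the training mean, and each step logs its parameters plus a short SHA-256 fingerprint of its output. With a fixed seed, two runs produce identical trails, so a result can be traced back and reproduced.

```python
import hashlib
import json
import random


def fingerprint(obj) -> str:
    """Short, stable SHA-256 hash of a JSON-serialisable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def run_ml_pipeline(values, seed=42, train_fraction=0.8):
    """Toy split -> train -> evaluate pipeline returning (mae, trail)."""
    trail = []  # the paper trail: one entry per executed step

    def record(step, params, output):
        trail.append({"step": step, "params": params,
                      "output_hash": fingerprint(output)})
        return output

    # 1. Split with a seeded RNG so the split is reproducible.
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    train, test = record("split",
                         {"seed": seed, "train_fraction": train_fraction},
                         [shuffled[:cut], shuffled[cut:]])

    # 2. "Train" a baseline model that always predicts the training mean.
    model = record("train", {"model": "mean_baseline"},
                   {"mean": sum(train) / len(train)})

    # 3. Evaluate with mean absolute error on the held-out values.
    mae = sum(abs(v - model["mean"]) for v in test) / len(test)
    record("evaluate", {"metric": "mae"}, {"mae": mae})
    return mae, trail
```

Comparing the `output_hash` values of two runs is a cheap way to check whether a rerun really reproduced the same data split, model, and metric.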

### Automation tools
- Jenkins
- Travis

### Data versioning tools
- [Data Version Control (DVC)](https://dvc.org)
- [Pachyderm](https://www.pachyderm.com)

### Pipeline orchestration tools
- [Apache Beam](https://beam.apache.org)
- Apache Airflow
- Kubeflow Pipelines

### ML Pipeline Frameworks
- [TensorFlow Extended (TFX)](https://www.tensorflow.org/tfx)
- Aerosolve
- Railyard
- Luigi
- Michelangelo
- Metaflow

## Apache Beam
[Apache Beam](https://beam.apache.org) - an open-source tool for defining and executing data-processing jobs. Its uses in TFX:
- Runs under the hood of many TFX components to carry out processing steps such as data validation or preprocessing.
- Can be used as a pipeline orchestrator.

## TensorFlow Extended (TFX)
[TensorFlow Extended (TFX)](https://www.tensorflow.org/tfx) is the open-source version of Google's internal ML pipeline framework. It contains the pipeline components and glue code for orchestration tools like Apache Airflow, Apache Beam and Kubeflow Pipelines. The pipeline components are:
- Data ingestion with ExampleGen.
- Data validation with StatisticsGen, SchemaGen and ExampleValidator.
- Data preprocessing with Transform.
- Model training with Trainer.
- Checking for previously trained models with ResolverNode.
- Model analysis and validation with Evaluator.
- Model deployment with Pusher.
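The components above form a dependency graph that an orchestrator turns into an execution order. The plain-Python sketch below is not TFX's API. The dependency map is a simplified assumption based on the inputs these components typically consume (for example, Evaluator needs examples, the trained model, and the baseline found by ResolverNode). The standard-library `graphlib.TopologicalSorter` (Python 3.9+) then derives a valid run order.

```python
from graphlib import TopologicalSorter

# Simplified assumption: component -> upstream components whose outputs it uses.
TFX_DEPENDENCIES = {
    "ExampleGen": set(),
    "StatisticsGen": {"ExampleGen"},
    "SchemaGen": {"StatisticsGen"},
    "ExampleValidator": {"StatisticsGen", "SchemaGen"},
    "Transform": {"ExampleGen", "SchemaGen"},
    "Trainer": {"Transform", "SchemaGen"},
    "ResolverNode": set(),
    "Evaluator": {"ExampleGen", "Trainer", "ResolverNode"},
    "Pusher": {"Trainer", "Evaluator"},
}


def execution_order(dependencies):
    """Return one valid run order; raises graphlib.CycleError on cycles."""
    return list(TopologicalSorter(dependencies).static_order())


print(execution_order(TFX_DEPENDENCIES))
```

Several orders are valid (ResolverNode, for instance, can run at any point before Evaluator). Real orchestrators exploit exactly this freedom to run independent components in parallel.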

```
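# Optionally pin a version for reproducibility, e.g. pip install tfx==0.22.0
pip install tfx
# Quote the extras so shells such as zsh do not expand the brackets as a glob
pip install "apache-beam[interactive]"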

# see: 20010 - TFX_interactive_pipeline.ipynb
```
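After installing, it can help to record exactly which versions ended up in the environment, since TFX and Beam versions must be compatible. A minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+). The helper name `installed_versions` is hypothetical, and missing packages are reported as `None` rather than raising.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    print(installed_versions(["tfx", "apache-beam"]))
```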