# Introducing MLflow

MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.

- It is library- and language-agnostic: it works with any ML library and offers APIs for several languages (Python, R, Java, and a REST API)
- It supports both local and cloud development environments
- It is modular and simple to use, so it can be added to existing ML code with minimal changes
- It is easy to get started with, which makes for a positive developer experience!

## Install

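The easiest way to install MLflow is with `pip`:

```bash
pip install --upgrade pip
pip install --quiet mlflow
```

In a Jupyter notebook, prefix each command with `!` (for example, `!pip install --quiet mlflow`) to run it as a shell command.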

MLflow comes with a rich CLI that provides a simple interface to much of MLflow's functionality. You can use the CLI to run projects, start the tracking UI, create and list experiments, download run artifacts, serve MLflow Python Function and scikit-learn models, and serve models on Microsoft Azure Machine Learning and Amazon SageMaker. Run `mlflow --help` to see the available commands; the exact list varies between MLflow versions.

```bash
mlflow --help
```

```
Usage: mlflow [OPTIONS] COMMAND [ARGS]...

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  artifacts    Upload, list, and download artifacts from an MLflow...
  azureml      Serve models on Azure ML.
  db           Commands for managing an MLflow tracking database.
  deployments  Deploy MLflow models to custom targets.
  experiments  Manage experiments.
  gc           Permanently delete runs in the `deleted` lifecycle stage.
  models       Deploy MLflow models locally.
  pipelines    Run MLflow Pipelines and inspect pipeline results.
  run          Run an MLflow project from the given URI.
  runs         Manage runs.
  sagemaker    Serve models on SageMaker.
  server       Run the MLflow tracking server.
  ui           Launch the MLflow tracking UI for local viewing of run...
```


## MLflow Components

MLflow currently offers four components: MLflow Tracking, MLflow Projects, MLflow Models, and the MLflow Model Registry.

![](./images/mlflow_components.png)

::: {.callout-note}
MLflow Models and the MLflow Model Registry are outside the scope of this training.
:::

### MLflow Tracking

With MLflow Tracking, you can train a variety of machine learning models and record each attempt so that they can be compared. Models logged this way can then be used interchangeably through MLflow's standardized model prediction interface, and you can register them in the MLflow Model Registry to keep track of which model is in production, so that this information is easily accessible to everyone you work with.

MLflow Tracking is organized around the concept of `runs`, which are executions of some piece of data science code. Runs are grouped into experiments:

- `Run`: a single execution of code tracked by MLflow
- `Experiment`: a named group of runs, {`Run`, ..., `Run`}

Each run records the following information:

- `Parameters`: key-value inputs to your code
- `Metrics`: numeric values, which can be updated over the course of the run
- `Tags and Notes`: additional information about a run
- `Artifacts`: output files in any format, such as data, images, and models
- `Source`: the name of the file used to launch the run
- `Version`: the version of the source code (e.g. the Git commit hash when the code runs from a Git repository)
- `Start & End Time`: the start and end time of the run

![](./images/mlflow_experiments.jpg)

### Runs and Artifacts Store

MLflow provides a wide variety of storage options for logging runs and artifacts.

#### MLflow Runs

- Runs can be recorded to local files, to a SQLAlchemy-compatible database, or remotely to a tracking server.
- MLflow uses the `backend store` component to store `runs`
- The backend store persists MLflow entities (runs, parameters, metrics, tags, notes, metadata, etc.)
- Backend store options:
    - A file store backend: a local file path
    - A database-backed store: `mysql`, `mssql`, `sqlite`, or `postgresql`
    - A remote MLflow tracking server, specified by its HTTP(S) URI (e.g. `https://my-server:5000`)

:::{.callout-note}
By default, the MLflow Python API logs runs locally to files in an `mlruns` directory wherever you ran your program. You can then run `mlflow ui` to see the logged runs.
:::

#### MLflow Artifacts

- Artifacts can be persisted to local files and a variety of remote file storage solutions.
- MLflow uses the `artifact store` component to store `artifacts`
- The artifact store persists artifacts (files, models, images, in-memory objects, model summaries, etc.)
- Artifact store options:
    - Local file path
    - Amazon S3
    - Azure Blob Storage
    - Google Cloud Storage
    - SFTP Server
    - NFS

## Common MLflow configurations

The MLflow client can interface with a variety of backend and artifact storage configurations. We will look at three common scenarios:

![Scenario 1 - MLflow on localhost](./images/scenario1.png){fig-align="center"}

![Scenario 2 - MLflow on localhost with an SQLAlchemy-compatible database (SQLite) as the backend store](./images/scenario2.png){fig-align="center"}

![Scenario 3 - Tracking server launched at localhost: `mlflow server --backend-store-uri /workspace/mlruns`](./images/scenario3.png){fig-align="center"}