# mlflow quickstart

* website: https://mlflow.org/

mlflow is an open source platform for the machine learning lifecycle. It currently offers three main components: `Tracking`, `Projects`, and `Models`.
mlflow uses `git` (through the `gitpython` package) to record the source-code commit of each run, so install `git` first.

In [1]:
!apt update
!apt-get -y install git

In [2]:
!python --version
!git --version

Python 3.5.2
git version 2.7.4


# Installing

In [1]:
!pip install mlflow

Installing collected packages: tabulate, configparser, databricks-cli, gunicorn, smmap2, gitdb2, gitpython, querystring-parser, simplejson, argparse, nose-exclude, mleap, mlflow
Successfully installed argparse-1.4.0 configparser-3.5.0 databricks-cli-0.8.2 gitdb2-2.0.5 gitpython-2.1.11 gunicorn-19.9.0 mleap-0.8.1 mlflow-0.8.0 nose-exclude-0.5.0 querystring-parser-1.2.3 simplejson-3.16.0 smmap2-2.0.5 tabulate-0.8.2


On macOS, install system tools such as `git` with `homebrew` instead of `apt-get`; mlflow itself is still installed with `pip` as above.

# A simple example

In [2]:
import os
from mlflow import log_metric, log_param, log_artifact

In [5]:
def simple_example():
    # Log a parameter (key-value pair)
    log_param("param1", 5)

    # Log a metric; metrics can be updated throughout the run.
    # Repeated calls append to this metric's history within the run
    # (in the UI, each run is shown as one row).
    log_metric("foo", 1)
    log_metric("foo", 2)
    log_metric("foo", 3)

    # Log an artifact (output file)
    with open("output.txt", "w") as f:
        f.write("Hello world!")
    log_artifact("output.txt")

In [6]:
simple_example()

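MlflowException: Run '5cbd19ccb8754a8fa49e5a77b23c1765' not found

This error most likely means the notebook still holds an active run from an earlier cell whose ID no longer exists in the current tracking store (for example, because the `mlruns` folder was removed after that run started). Restarting the kernel clears the stale run, after which `simple_example()` can be run again.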
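To make the difference between `log_param` and `log_metric` concrete, here is a toy, in-memory sketch. `ToyRun` is a hypothetical class, not mlflow's implementation; it only assumes what the comments above say: a parameter holds one value per key, while every `log_metric` call appends to that metric's history within the run.

```python
import time


class ToyRun:
    """Hypothetical in-memory stand-in for a tracking run (not mlflow's code)."""

    def __init__(self):
        self.params = {}   # key -> single value (stored as a string in this sketch)
        self.metrics = {}  # key -> list of (timestamp, value) entries

    def log_param(self, key, value):
        # A parameter describes the run's configuration: one value per key.
        self.params[key] = str(value)

    def log_metric(self, key, value):
        # A metric can be logged many times; every call appends to its history.
        self.metrics.setdefault(key, []).append((time.time(), float(value)))

    def latest(self, key):
        # Return the most recently logged value of a metric.
        return self.metrics[key][-1][1]


run = ToyRun()
run.log_param("param1", 5)
for v in (1, 2, 3):
    run.log_metric("foo", v)

print(run.params)                          # {'param1': '5'}
print([v for _, v in run.metrics["foo"]])  # [1.0, 2.0, 3.0]
print(run.latest("foo"))                   # 3.0
```

Keeping the full history (rather than overwriting) is what lets a tracking UI plot a metric over the course of a run.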

## Tracking Server

```sh
# --host 0.0.0.0 listens on all network interfaces, so other machines can reach the UI
mlflow server --file-store ./mlruns --host 0.0.0.0
```
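With `--file-store ./mlruns`, the server reads runs from a plain directory tree. The sketch below walks such a tree, assuming the layout `mlruns/<experiment_id>/<run_id>/{params,metrics}/<key>`, where a param file holds the value and each line of a metric file starts with `<timestamp> <value>`. Treat that layout as an assumption about the file store and `read_file_store` as a hypothetical helper, not an mlflow API.

```python
import tempfile
from pathlib import Path


def read_file_store(root):
    """Collect params and last-logged metric values per run from an mlruns-style tree.

    Hypothetical helper; the directory layout is an assumption, not a stable API.
    """
    runs = {}
    for exp_dir in sorted(Path(root).iterdir()):
        # Skip plain files (e.g. meta.yaml) and hidden dirs (e.g. .trash).
        if not exp_dir.is_dir() or exp_dir.name.startswith("."):
            continue
        for run_dir in sorted(exp_dir.iterdir()):
            if not run_dir.is_dir():
                continue
            params, metrics = {}, {}
            if (run_dir / "params").is_dir():
                for f in (run_dir / "params").iterdir():
                    params[f.name] = f.read_text().strip()
            if (run_dir / "metrics").is_dir():
                for f in (run_dir / "metrics").iterdir():
                    lines = [ln for ln in f.read_text().splitlines() if ln.strip()]
                    if lines:
                        # Second field of the last line = last logged value.
                        metrics[f.name] = float(lines[-1].split()[1])
            runs[(exp_dir.name, run_dir.name)] = {"params": params, "metrics": metrics}
    return runs


# Build a tiny fake store in a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp, "0", "abc123")
    (run_dir / "params").mkdir(parents=True)
    (run_dir / "metrics").mkdir()
    (run_dir / "params" / "param1").write_text("5")
    (run_dir / "metrics" / "foo").write_text("1540000000 1\n1540000001 2\n1540000002 3\n")
    result = read_file_store(tmp)

print(result)  # {('0', 'abc123'): {'params': {'param1': '5'}, 'metrics': {'foo': 3.0}}}
```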

![mlflow tracking UI](../data/images/mlflow_simple.png)

# Clone the example

```sh
cd ~
git clone https://github.com/mlflow/mlflow
cd ~/mlflow/

# the mlflow CLI is built on Click, which aborts on Python 3 under an ASCII locale
export LC_ALL=C.UTF-8
export LANG=C.UTF-8
```
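Which locale actually applies follows the POSIX precedence: `LC_ALL` overrides `LC_CTYPE`, which overrides `LANG`. A minimal sketch of that rule, using hypothetical helpers `effective_ctype` and `is_utf8`, to predict whether an environment selects a UTF-8 encoding (the goal of the two exports):

```python
import os


def effective_ctype(env):
    """Return the locale name that governs character encoding (POSIX precedence)."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:  # an empty value counts as unset
            return value
    return "C"  # nothing set: the POSIX "C" locale, i.e. ASCII


def is_utf8(env):
    name = effective_ctype(env)
    # The codeset follows the '.', before any '@modifier', e.g. "C.UTF-8" or "en_US.utf8".
    codeset = name.split(".", 1)[1].split("@", 1)[0] if "." in name else ""
    return codeset.replace("-", "").lower() == "utf8"


print(is_utf8({}))                                  # False
print(is_utf8({"LANG": "C.UTF-8"}))                 # True
print(is_utf8({"LC_ALL": "C", "LANG": "C.UTF-8"}))  # False: LC_ALL wins
print(is_utf8(dict(os.environ)))                    # depends on your shell
```

This is why setting `LC_ALL` is the safer of the two exports: it overrides anything inherited in `LC_CTYPE`.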

## Training a model

Here we use the example code from the mlflow repository.

```sh
# under path ~/mlflow
cd ~/mlflow
python examples/sklearn_elasticnet_wine/train.py
```

or pass the hyperparameters explicitly:

```sh
# alpha and l1_ratio are positional arguments read by train.py
python examples/sklearn_elasticnet_wine/train.py <alpha> <l1_ratio>
```
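A sketch of how a script like `train.py` can read the two optional positional arguments. The name `parse_hyperparams` and the fallback value of 0.5 are assumptions for illustration, so check the actual script for its defaults; the `[0, 1]` range check reflects scikit-learn's constraint on `l1_ratio`.

```python
def parse_hyperparams(argv):
    """Hypothetical: read alpha and l1_ratio from positional args, with fallbacks."""
    alpha = float(argv[1]) if len(argv) > 1 else 0.5     # assumed default
    l1_ratio = float(argv[2]) if len(argv) > 2 else 0.5  # assumed default
    if not 0.0 <= l1_ratio <= 1.0:
        raise ValueError("l1_ratio must lie in [0, 1]")
    return alpha, l1_ratio


print(parse_hyperparams(["train.py"]))                # (0.5, 0.5)
print(parse_hyperparams(["train.py", "0.42", "0.1"]))  # (0.42, 0.1)
```

Inside the script itself you would pass `sys.argv`.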

The `mlruns` folder is created in the current working directory (here `~/mlflow`).
Now start the tracking server to browse the logged runs.

```sh
mlflow server --file-store ~/mlflow/mlruns --host 0.0.0.0
```

In the UI you can filter runs with a search expression, e.g. `metrics.r2 > 0.1`.
![mlflow UI filtering runs by a metric](../data/images/mlflow_criteria.png)
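As a rough model of what such a filter does, here is a hypothetical evaluator for a single `metrics.<key> <op> <number>` clause over in-memory run records. It is not mlflow's search implementation, whose grammar is richer; the run records below are made up for the example.

```python
import operator
import re

# Comparison operators this sketch accepts.
OPS = {">=": operator.ge, "<=": operator.le, "!=": operator.ne,
       "=": operator.eq, ">": operator.gt, "<": operator.lt}

# Two-character operators come first in the alternation so ">=" is not read as ">".
CLAUSE = re.compile(r"^\s*metrics\.(\w+)\s*(>=|<=|!=|=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")


def filter_runs(runs, expression):
    """Keep runs whose metric satisfies one 'metrics.<key> <op> <number>' clause."""
    match = CLAUSE.match(expression)
    if match is None:
        raise ValueError("unsupported expression: " + expression)
    key, op, threshold = match.group(1), OPS[match.group(2)], float(match.group(3))
    # Runs that never logged the metric are excluded.
    return [r for r in runs if key in r["metrics"] and op(r["metrics"][key], threshold)]


runs = [
    {"run_id": "a", "metrics": {"r2": 0.05}},
    {"run_id": "b", "metrics": {"r2": 0.25}},
    {"run_id": "c", "metrics": {"rmse": 0.8}},
]
print([r["run_id"] for r in filter_runs(runs, "metrics.r2 > 0.1")])  # ['b']
```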

## Packaging the training code

* Install the requirements first.

Here we use Python 3.5.2 as the example. `mlflow run` builds the environment described in the project's `conda.yaml` with `conda`, so install Miniconda first. Pick the installer that matches your Python version; here that is `Miniconda3-4.2.12-Linux-x86_64.sh`.

```sh
cd ~
apt update
apt install -y wget
wget https://repo.anaconda.com/miniconda/Miniconda3-4.2.12-Linux-x86_64.sh
bash Miniconda3-4.2.12-Linux-x86_64.sh
source ~/.bashrc  # pick up the PATH change the installer adds, so `conda` is available
conda --version  # check that conda is installed
```

* the `MLproject` example: `conda_env` points to `conda.yaml`, which lists the script's requirements

```yaml
# tutorial/MLproject

name: tutorial

conda_env: conda.yaml

entry_points:
  main:
    parameters:
      alpha: float
      l1_ratio: {type: float, default: 0.1}
    command: "python train.py {alpha} {l1_ratio}"
```
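Conceptually, running a project fills the `command` template from the declared `parameters`: supplied values are checked against their type, missing ones fall back to `default`, and a parameter without a default (here `alpha`) must be supplied. A hypothetical sketch of that substitution, with the spec above transcribed by hand into a Python dict; it handles only the `float` and `string` types and is not mlflow's implementation.

```python
# The MLproject entry point above, transcribed by hand (illustrative only).
ENTRY_POINT = {
    "parameters": {
        "alpha": {"type": "float"},  # "alpha: float" has no default, so it is required
        "l1_ratio": {"type": "float", "default": 0.1},
    },
    "command": "python train.py {alpha} {l1_ratio}",
}

CASTS = {"float": float, "string": str}


def build_command(entry_point, user_params):
    """Hypothetical: validate parameters and substitute them into the command."""
    values = {}
    for name, spec in entry_point["parameters"].items():
        if name in user_params:
            values[name] = CASTS[spec["type"]](user_params[name])  # raises if not castable
        elif "default" in spec:
            values[name] = spec["default"]
        else:
            raise ValueError("missing required parameter: " + name)
    return entry_point["command"].format(**values)


print(build_command(ENTRY_POINT, {"alpha": "0.42"}))
# python train.py 0.42 0.1
```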

* the `conda.yaml` example

```yaml
name: tutorial
channels:
  - defaults
dependencies:
  - numpy=1.14.3
  - pandas=0.22.0
  - scikit-learn=0.19.1
  - pip:
    - mlflow
```

* example directory for the MLproject

```text
+ example
    - MLproject
    - conda.yaml
    - train.py
```
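Before starting a run, a quick pre-flight check that a project directory contains the three files listed above can save a failed attempt. `check_project` is a hypothetical helper, and `train.py` is only required because this example's command calls it.

```python
import tempfile
from pathlib import Path

# Files this example project needs (train.py only because the command runs it).
REQUIRED = ("MLproject", "conda.yaml", "train.py")


def check_project(path):
    """Return the required files missing from a project directory (empty list = OK)."""
    root = Path(path)
    return [name for name in REQUIRED if not (root / name).is_file()]


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "MLproject").write_text("name: tutorial\n")
    (Path(tmp) / "train.py").write_text("print('training')\n")
    missing = check_project(tmp)

print(missing)  # ['conda.yaml']
```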

* start the training

```sh
mlflow run example -P alpha=0.42
```
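Each `-P name=value` flag supplies one entry-point parameter. A hypothetical sketch of collecting such flags into a dict, splitting on the first `=` only so that values may themselves contain `=`; values stay strings here, and typing them is left to a later step.

```python
def parse_param_flags(args):
    """Hypothetical: collect '-P name=value' pairs from a CLI-style argument list."""
    params = {}
    it = iter(args)
    for arg in it:
        if arg == "-P":
            pair = next(it, None)  # the token right after -P
            if pair is None or "=" not in pair:
                raise ValueError("-P expects name=value, got: {!r}".format(pair))
            name, value = pair.split("=", 1)
            params[name] = value  # kept as a string
    return params


print(parse_param_flags(["example", "-P", "alpha=0.42", "-P", "l1_ratio=0.2"]))
# {'alpha': '0.42', 'l1_ratio': '0.2'}
```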

* the same workflow with the bundled mlflow example (`sklearn_elasticnet_wine`)

```sh
cd ~/mlflow/examples
mlflow run sklearn_elasticnet_wine -P alpha=0.42
mlflow server --file-store ~/mlflow/examples/mlruns --host 0.0.0.0
```

After starting the server, open `http://<IP, localhost, or 127.0.0.1>:5000` in a browser; 5000 is the server's default port.
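To confirm from a script that the UI is reachable (for example, from another machine when the server was started with `--host 0.0.0.0`), a small standard-library probe can help. `ui_is_up` is hypothetical and only checks that the page answers with HTTP 200.

```python
from urllib.error import URLError
from urllib.request import urlopen


def ui_is_up(url="http://127.0.0.1:5000", timeout=2.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx HTTPError.
        return False


print(ui_is_up())  # True only while a server is listening on port 5000
```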