# MLflow

MLflow is an open-source tool for managing machine learning workflows.

## Why MLflow? 
- Track experiments by logging parameters (e.g. hyperparameters), metrics, and artifacts.
- Collaborate on and compare experiments.
- Share data and models. 
- Deploy models more easily, tightening the loop between development and production.

## Installation

```bash
pip install mlflow
```

## Setup

```bash
mlflow server
```

This launches an MLflow tracking server at http://127.0.0.1:5000, which records your experiments, runs, and models.

```{note}
This setup runs MLflow locally. To set up MLflow in a shared space, follow [this guide](mlflow_server.ipynb).
```

If you open http://127.0.0.1:5000 in your browser, you should see the following page:

![](assets/mlflow_ss_1.png)
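Before connecting from Python, it can help to confirm that the server is actually reachable. The sketch below uses only the standard library and assumes the server answers `GET /health` with HTTP 200, which recent MLflow servers do; `tracking_server_is_up` is a hypothetical helper name, not part of MLflow.

```python
import urllib.error
import urllib.request


def tracking_server_is_up(base_uri: str = "http://127.0.0.1:5000", timeout: float = 2.0) -> bool:
    """Return True if GET <base_uri>/health answers with HTTP 200 (assumed MLflow health endpoint)."""
    url = base_uri.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


print(tracking_server_is_up())  # True only while `mlflow server` is running
```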

## New Experiment

```python
# sample dataset

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, class_sep=0.3, random_state=2)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
print(X_train[:2], "\n", y_train[:2])
```

```
(670, 20) (330, 20) (670,) (330,)
[[-0.78669445  0.50422194  0.14499686  0.27023408 -0.41568319  2.29765363
  -0.78492925  0.04416244  0.01998036 -1.56917287  1.63286779 -0.89135744
   0.08933603 -0.80613711  0.05557609  1.36246637  2.43942101 -1.00591414
  -1.80253155 -1.22509769]
 [ 0.7902646   1.58800122 -1.22745766  2.45455804  1.41597967 -0.9577786
  -0.07197365  0.26854757 -0.13132486  1.30857502  0.28586077 -2.27043702
   0.01217551  0.97624736  1.23833836 -0.19059581  0.05277705  0.45574733
   0.23691436 -0.78250385]]
 [0 1]
```


```python
import mlflow
```

```python
# connect to the mlflow server
MLFLOW_TRACKING_URI = "http://127.0.0.1:5000/"
mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
```

```python
# create experiment and get experiment ID
experiment_name = "testing_mlflow2"
experiment_id = mlflow.create_experiment(experiment_name)
```

If you go back to the MLflow UI, you will see a new experiment named `testing_mlflow2`.

## Log Parameters and Metrics

An experiment can contain multiple runs. Each run begins with a call to `mlflow.start_run()`, and the parameters and metrics logged while it is active are attached to that run.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
```

```python
with mlflow.start_run(experiment_id=experiment_id):

    penalty = "l2"
    C = 0.01

    clf = LogisticRegression(penalty=penalty, C=C)
    clf.fit(X_train, y_train)

    train_auc = roc_auc_score(y_train, clf.predict_proba(X_train)[:, 1])
    test_auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

    mlflow.log_params({"penalty": penalty, "C": C})  # logs hyperparameters
    mlflow.log_metrics({"train_auc": train_auc, "test_auc": test_auc})  # logs metrics
```

```python
# without a `with` statement, the run has to be ended explicitly
mlflow.start_run(experiment_id=experiment_id)

penalty = "l2"
C = 0.001

clf = LogisticRegression(penalty=penalty, C=C)
clf.fit(X_train, y_train)

train_auc = roc_auc_score(y_train, clf.predict_proba(X_train)[:, 1])
test_auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

mlflow.log_params({"penalty": penalty, "C": C})  # logs hyperparameters
mlflow.log_metrics({"train_auc": train_auc, "test_auc": test_auc})  # logs metrics

mlflow.end_run()  # be sure to end the run
```