# Part I. Using *mlpanel* from a notebook

## Preparation

In the *UI*:

1. create a new project named "InNotebook";
2. run the project;
3. copy the tracking server URI.



![create project](docs/images/1.create.png)

![create project](docs/images/2.create.png)

![run project](docs/images/3.run.png)

![run project](docs/images/4.run.png)

![copy tracking uri](docs/images/5.copy.tracking.url.png)

## Implement ML workflow functions 

In [None]:
import joblib
from IPython.display import display, HTML
import itertools
import matplotlib.pyplot as plt
import mlflow
from mlflow import log_artifact, log_metric, log_param, log_params
import mlflow.sklearn
from mlflow.tracking import MlflowClient
import numpy as np
import os
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from typing import Dict, Text, Tuple

### Load dataset

In [None]:
def load_dataset(csv: Text) -> pd.DataFrame:
    
    dataset = pd.read_csv(csv)
    
    return dataset

### Feature engineering

In [None]:
def create_features(dataset: pd.DataFrame) -> pd.DataFrame:
    
    dataset['sepal_length_to_sepal_width'] = dataset['sepal_length'] / dataset['sepal_width']
    dataset['petal_length_to_petal_width'] = dataset['petal_length'] / dataset['petal_width']

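    # Take an explicit copy so later column assignments do not act on a slice
    # of the original frame (avoids pandas' SettingWithCopyWarning).
    dataset = dataset[[
        'sepal_length', 'sepal_width', 'petal_length', 'petal_width',
        'sepal_length_to_sepal_width', 'petal_length_to_petal_width',
        'species'
    ]].copy()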
    
    return dataset

### Translate target values to labels

In [None]:
def target_values_to_labels(dataset: pd.DataFrame, target_column: Text) -> pd.DataFrame:
    
    dataset[target_column] = LabelEncoder().fit_transform(dataset[target_column])
    
    return dataset
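
`LabelEncoder` assigns integer codes to the distinct target values in sorted order, so the code a class receives depends on its name, not on where it first appears in the file. The sketch below reproduces that mapping in plain Python; the helper `encode_labels` and the species names (taken from the standard iris dataset) are assumptions for illustration only.

```python
from typing import Dict, List


def encode_labels(values: List[str]) -> Dict[str, int]:
    """Map each distinct label to an integer code, in sorted label order."""
    classes = sorted(set(values))
    return {label: index for index, label in enumerate(classes)}


# Assumed species names, deliberately listed out of alphabetical order.
species = ['virginica', 'setosa', 'versicolor', 'setosa']
mapping = encode_labels(species)
encoded = [mapping[name] for name in species]

print(mapping)
print(encoded)
```

Any list of class names used later for plotting should follow the same sorted order, otherwise the tick labels will not line up with the encoded classes.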

### Split dataset

In [None]:
def split_train_test(dataset: pd.DataFrame,
                     test_size: float,
                     random_state: int = 42) -> Tuple[pd.DataFrame, pd.DataFrame]:
    
    train_dataset, test_dataset = train_test_split(dataset, test_size=test_size, random_state=random_state)
    
    return train_dataset, test_dataset
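
When `test_size` is a float, `train_test_split` rounds the test share up to a whole number of rows and gives the remainder to the training set. The sketch below mirrors that arithmetic without scikit-learn; `split_sizes` is a hypothetical helper, and the 150-row figure assumes the standard iris dataset.

```python
import math
from typing import Tuple


def split_sizes(n_samples: int, test_size: float) -> Tuple[int, int]:
    """Return (n_train, n_test) for a fractional test_size."""
    n_test = math.ceil(test_size * n_samples)  # test share is rounded up
    n_train = n_samples - n_test               # the rest goes to training
    return n_train, n_test


N_ROWS = 150  # assumption: the standard iris dataset

for fraction in (0.2, 0.5):
    n_train, n_test = split_sizes(N_ROWS, fraction)
    print(f'test_size={fraction}: train={n_train}, test={n_test}')
```

With a small training set, keep in mind that `cv` folds are cut from the training rows only, so a large `test_size` combined with a large `cv` leaves few rows per fold.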

### Train

In [None]:
PARAM_GRIDS = {
    
    'LogisticRegression': {
                'C': [0.001, 0.01],
                'max_iter': [100, 200, 300],
                'solver': ['lbfgs'],
                'multi_class': ['multinomial']
    },
    
    'SVC': {
        'C': [0.1, 1.0],
        'kernel': ['rbf', 'linear'],
        'gamma': ['scale'],
        'degree': [3, 5]
    },
    
}

ESTIMATORS = {
        'LogisticRegression': LogisticRegression,
        'SVC': SVC
}
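
`GridSearchCV` tries every combination of the values in a parameter grid, and fits each combination once per cross-validation fold. The sketch below expands the two grids above with `itertools.product` to show how many candidates and fits that implies; `expand_grid` is a hypothetical helper, not part of scikit-learn, and `cv = 5` is just an example value.

```python
import itertools
from typing import Any, Dict, List


def expand_grid(param_grid: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """List every parameter combination in a grid (keys in sorted order)."""
    keys = sorted(param_grid)
    value_lists = [param_grid[key] for key in keys]
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]


logreg_grid = {
    'C': [0.001, 0.01],
    'max_iter': [100, 200, 300],
    'solver': ['lbfgs'],
    'multi_class': ['multinomial'],
}
svc_grid = {
    'C': [0.1, 1.0],
    'kernel': ['rbf', 'linear'],
    'gamma': ['scale'],
    'degree': [3, 5],
}

cv = 5  # example number of folds
logreg_candidates = expand_grid(logreg_grid)
svc_candidates = expand_grid(svc_grid)

print(f'LogisticRegression: {len(logreg_candidates)} candidates, '
      f'{len(logreg_candidates) * cv} fits')
print(f'SVC: {len(svc_candidates)} candidates, {len(svc_candidates) * cv} fits')
```

Note that `SVC` uses `degree` only with the `'poly'` kernel, so for the `'rbf'` and `'linear'` kernels listed here the two `degree` values produce equivalent models and double the search time.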

In [None]:
def train(estimator, param_grid: Dict, cv: int, dataset: pd.DataFrame, target_column: Text):
    
    X_train = dataset.drop(target_column, axis=1)
    y_train = dataset[target_column]
    f1_scorer = make_scorer(f1_score, average='weighted')
    # The `iid` argument was removed in scikit-learn 0.24, so it is not passed here.
    clf = GridSearchCV(estimator=estimator,
                       param_grid=param_grid,
                       cv=cv,
                       verbose=1,
                       scoring=f1_scorer)

    clf.fit(X_train, y_train)
    
    return clf

### Evaluate

In [None]:
def plot_confusion_matrix(cm,
                          target_names,
                          title='Confusion matrix',
                          cmap=None,
                          normalize=True):
    """
    given a sklearn confusion matrix (cm), make a nice plot

    Arguments
    ---------
    cm:           confusion matrix from sklearn.metrics.confusion_matrix

    target_names: given classification classes such as [0, 1, 2]
                  the class names, for example: ['high', 'medium', 'low']

    title:        the text to display at the top of the matrix

    cmap:         the gradient of the values displayed from matplotlib.pyplot.cm
                  see http://matplotlib.org/examples/color/colormaps_reference.html
                  plt.get_cmap('jet') or plt.cm.Blues

    normalize:    If False, plot the raw numbers
                  If True, plot the proportions

    Usage
    -----
    plot_confusion_matrix(cm           = cm,                  # confusion matrix created by
                                                              # sklearn.metrics.confusion_matrix
                          normalize    = True,                # show proportions
                          target_names = y_labels_vals,       # list of names of the classes
                          title        = best_estimator_name) # title of graph

    Citation
    ---------
    http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html

    """

    accuracy = np.trace(cm) / float(np.sum(cm))
    misclass = 1 - accuracy

    if cmap is None:
        cmap = plt.get_cmap('Blues')

    plt.figure(figsize=(8, 6))
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()

    if target_names is not None:
        tick_marks = np.arange(len(target_names))
        plt.xticks(tick_marks, target_names, rotation=45)
        plt.yticks(tick_marks, target_names)

    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

    thresh = cm.max() / 1.5 if normalize else cm.max() / 2
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        if normalize:
            plt.text(j, i, "{:0.4f}".format(cm[i, j]),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")
        else:
            plt.text(j, i, "{:,}".format(cm[i, j]),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label\naccuracy={:0.4f}; misclass={:0.4f}'.format(accuracy, misclass))
    plt.show()

In [None]:
def evaluate(clf, dataset: pd.DataFrame, target_column: Text) -> Tuple[np.ndarray, float]:
    
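    X_test = dataset.drop(target_column, axis=1)
    y_test = dataset[target_column]
    y_pred = clf.predict(X_test)

    # confusion_matrix expects (y_true, y_pred): rows are true labels,
    # columns are predicted labels, matching the axes of the plot.
    cm = confusion_matrix(y_test, y_pred)
    f1 = f1_score(y_true=y_test, y_pred=y_pred, average='macro')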
    
    return cm, f1
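
In scikit-learn's `confusion_matrix(y_true, y_pred)`, rows index the true class and columns the predicted class, so argument order matters. The sketch below builds the same kind of matrix in plain Python to make the orientation explicit; `confusion_counts`, `accuracy_from_matrix` and the toy labels are assumptions for illustration.

```python
from typing import List, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """Count samples per (true class, predicted class) pair."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1  # row = true, column = predicted
    return matrix


def accuracy_from_matrix(matrix: List[List[int]]) -> float:
    """Accuracy is the diagonal sum divided by the total count."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Toy labels, not taken from the iris data.
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]

toy_cm = confusion_counts(toy_true, toy_pred, n_classes=3)
swapped_cm = confusion_counts(toy_pred, toy_true, n_classes=3)

for row in toy_cm:
    print(row)
print('accuracy:', accuracy_from_matrix(toy_cm))
```

Swapping the two arguments transposes the matrix: accuracy stays the same, but per-class errors are then read off the wrong axis.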

### Run experiment

In [None]:
def print_dataframe(df: pd.DataFrame, caption='DF', head_only=True):
    
    print(f'{caption}: ')
    
    print('\tshape:', df.shape)
    print_df = df
    
    if head_only:
        print_df = df.head()
    
    print('\tdataframe:')
    display(HTML(print_df.to_html()))
    
    print('\n\n')

In [None]:
def run_experiment(data_folder: Text,
                   dataset_csv: Text, 
                   target_column: Text, 
                   test_size, 
                   estimator_name: Text,
                   cv: int,
                   experiment_name: Text = None,
                   tracking_uri: Text = None,
                  ):
    
    if tracking_uri:
        print(tracking_uri)
        mlflow.set_tracking_uri(tracking_uri)
    
    mlflow.set_experiment(experiment_name or 'Default')
    
    with mlflow.start_run():
        
        log_param('target_column', target_column)
        log_param('test_size', test_size)
        log_param('estimator_name', estimator_name)
        log_param('cv', cv)

        
        dataset = load_dataset(dataset_csv)
        print_dataframe(dataset, 'raw dataset')
        log_artifact(dataset_csv)

        # Sort the names so they match the order LabelEncoder assigns to the classes.
        target_names = sorted(dataset[target_column].unique().tolist())
        print(f'target names: {target_names}\n\n')
        log_param('target_names', target_names)

        dataset = create_features(dataset)
        dataset = target_values_to_labels(dataset, target_column)
        print_dataframe(dataset, 'processed dataset')
        processed_csv = os.path.join(data_folder, 'processed_iris.csv')
        dataset.to_csv(processed_csv, index=False)
        log_artifact(processed_csv)

        trainset, testset = split_train_test(dataset, test_size)
        print_dataframe(trainset, 'trainset')
        print_dataframe(testset, 'testset')
        train_csv = os.path.join(data_folder, 'train_iris.csv')
        test_csv = os.path.join(data_folder, 'test_iris.csv')
        trainset.to_csv(train_csv, index=False)
        testset.to_csv(test_csv, index=False)
        log_artifact(train_csv)
        log_artifact(test_csv)
        

        estimator = ESTIMATORS[estimator_name]()
        param_grid = PARAM_GRIDS[estimator_name]
        log_param('param_grid', param_grid)

        clf = train(estimator, param_grid, cv, trainset, target_column)
        print(f'best estimator: {clf.best_estimator_}\n\n')
        log_params(clf.best_estimator_.get_params())
        mlflow.sklearn.log_model(clf, 'model')
        
        cm, f1 = evaluate(clf, testset, target_column)
        print(f'f1 score: {f1}\n\n')
        
        log_param('cm', str(cm.tolist()))
        log_metric('f1', f1)
        plot_confusion_matrix(cm, target_names, normalize=False)

## Run experiments

### Logistic regression

In [None]:
run_experiment(data_folder='data', 
               dataset_csv='data/iris.csv', 
               target_column='species', 
               test_size=0.2, 
               estimator_name='LogisticRegression', 
               cv=5, 
               experiment_name='IrisLogregNotebook',
               tracking_uri='http://0.0.0.0:5001'
)

In [None]:
run_experiment(data_folder='data', 
               dataset_csv='data/iris.csv', 
               target_column='species', 
               test_size=0.5, 
               estimator_name='LogisticRegression', 
               cv=5, 
               experiment_name='IrisLogregNotebook',
               tracking_uri='http://0.0.0.0:5001'
)

### SVC

In [None]:
run_experiment(data_folder='data', 
               dataset_csv='data/iris.csv', 
               target_column='species', 
               test_size=0.2, 
               estimator_name='SVC', 
               cv=5, 
               experiment_name='IrisSVCNotebook',
               tracking_uri='http://0.0.0.0:5001'
)

In [None]:
run_experiment(data_folder='data', 
               dataset_csv='data/iris.csv', 
               target_column='species', 
               test_size=0.1, 
               estimator_name='SVC', 
               cv=10, 
               experiment_name='IrisSVCNotebook',
               tracking_uri='http://0.0.0.0:5001'
)

In [None]:
run_experiment(data_folder='data', 
               dataset_csv='data/iris.csv', 
               target_column='species', 
               test_size=0.5, 
               estimator_name='SVC', 
               cv=10, 
               experiment_name='IrisSVCNotebook',
               tracking_uri='http://0.0.0.0:5001'
)

## View experiments

### In MLflow UI

![view experiments](docs/images/6.view.experiments.png)

### In mlpanel UI
![view experiments](docs/images/7.view.experiments.png)



## To show a specific experiment, click "Show"

![show experiment](docs/images/8.show.experiment.png)

## Show run
![show run](docs/images/9.show.run.png)

## Register model
![register model](docs/images/10.register.model.png)

## Deploy model

![deploy model](docs/images/11.deploy.model.png)
![view deployment](docs/images/12.deployment.png)

# Part II. Using *mlpanel* from code

## Preparation

As in the previous part, in the *UI*:

1. create a new project named "IrisProject";
2. run the project;
3. copy the tracking server URI.

![iris project](docs/images/13.iris.project.png)

### Assume the project already exists

Go to the `IrisProject/` folder:

In [None]:
%cd IrisProject/

### The workflow is managed through a config file, *config/pipeline_config.yml*:

In [None]:
print(open('config/pipeline_config.yml').read())

### Edit the config and start a run

In [None]:
!MLFLOW_TRACKING_URI=http://0.0.0.0:5002 python run.py
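
The command above sets `MLFLOW_TRACKING_URI` for that single process; MLflow reads this environment variable when no tracking URI has been set explicitly in code. The contents of `run.py` are not shown here, so the following is only a sketch of how a script could combine an explicit argument with the environment variable; `resolve_tracking_uri` is a hypothetical helper, not part of MLflow or mlpanel.

```python
import os
from typing import Mapping, Optional


def resolve_tracking_uri(explicit_uri: Optional[str] = None,
                         env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Prefer an explicit URI; otherwise fall back to MLFLOW_TRACKING_URI."""
    env = os.environ if env is None else env
    return explicit_uri or env.get('MLFLOW_TRACKING_URI')


# Simulated environment, using the URI from the command above.
fake_env = {'MLFLOW_TRACKING_URI': 'http://0.0.0.0:5002'}

print(resolve_tracking_uri(env=fake_env))
print(resolve_tracking_uri('http://0.0.0.0:5001', env=fake_env))
print(resolve_tracking_uri(env={}))
```

Passing the URI through the environment keeps it out of the code and the config, so the same project can be pointed at a different tracking server without any edits.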

### Repeat: edit the config and start more runs

# Task

In **Part II**, repeat the steps from **Part I**:
* open the experiments and runs;
* register models;
* create deployments.