# **Lab: Model Deployment**




## Exercise 3: MLflow

We will train a RandomForest to predict the spending score in the mall customers dataset and use MLflow as our model registry.


**Pre-requisites:**
- Create a GitHub account (https://github.com/join)
- Install git (https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
- Install Docker (https://docs.docker.com/get-docker/)

The steps are:
1.   Build MLflow
2.   Build docker-compose
3.   Load and prepare data
4.   Configure MLflow
5.   Train RandomForest and log to MLflow
6.   Push changes



### 1. Build MLflow

**[1.1]** Go to the folder you created previously `adv_dsi_lab_4`

In [None]:
# Placeholder for student's code (1 command line)
# Task: Go to the folder you created previously adv_dsi_lab_4

In [None]:
# Solution:
cd ~/Projects/adv_dsi_lab_4

**[1.2]** Create a folder called `mlflow` and go into it

In [None]:
# Placeholder for student's code (2 command lines)
# Task: Create a folder called mlflow and go into it

In [None]:
# Solution:
mkdir mlflow
cd mlflow

**[1.3]** Create 2 folders called `mlruns` and `mlstore`

In [None]:
# Placeholder for student's code (2 command lines)
# Task: Create 2 folders called mlruns and mlstore

In [None]:
# Solution:
mkdir mlruns
mkdir mlstore

**[1.4]** Create a `Dockerfile` with the following content: 

`FROM python:3.7.0`

`RUN pip install --upgrade pip`

`RUN pip install mlflow==1.18.0`

`RUN pip install psycopg2-binary==2.8.5`

`RUN mkdir /mlflow/`

`RUN mkdir /mlflow/mlstore/`

`RUN mkdir /mlflow/mlruns/`

`CMD ["mlflow", "server", "--host", "0.0.0.0", "--backend-store-uri", "/mlflow/mlstore", "--default-artifact-root" , "/mlflow/mlruns"]`


In [None]:
# Placeholder for student's code (1 command line)
# Task: Create a Dockerfile

In [None]:
# Solution:
vi Dockerfile
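If you prefer to script this step instead of typing the file in `vi`, a minimal Python sketch can write the same content listed in step 1.4. The helper names `render_dockerfile` and `write_dockerfile` are hypothetical; only the file content comes from this lab.

```python
from pathlib import Path

# Dockerfile content from step 1.4, one instruction per line
DOCKERFILE_LINES = [
    "FROM python:3.7.0",
    "RUN pip install --upgrade pip",
    "RUN pip install mlflow==1.18.0",
    "RUN pip install psycopg2-binary==2.8.5",
    "RUN mkdir /mlflow/",
    "RUN mkdir /mlflow/mlstore/",
    "RUN mkdir /mlflow/mlruns/",
    'CMD ["mlflow", "server", "--host", "0.0.0.0", '
    '"--backend-store-uri", "/mlflow/mlstore", '
    '"--default-artifact-root", "/mlflow/mlruns"]',
]


def render_dockerfile() -> str:
    """Join the instructions into the text of a Dockerfile (hypothetical helper)."""
    return "\n".join(DOCKERFILE_LINES) + "\n"


def write_dockerfile(folder: str = ".") -> Path:
    """Write the Dockerfile into `folder` and return its path (hypothetical helper)."""
    # Capital D: this is the file name `docker build .` looks for by default
    path = Path(folder) / "Dockerfile"
    path.write_text(render_dockerfile())
    return path
```

Run `write_dockerfile()` from inside the `mlflow` folder to create the file next to `mlruns` and `mlstore`.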

**[1.5]** Build the image from this Dockerfile

In [None]:
# Placeholder for student's code (1 command line)
# Task: Build the image from this Dockerfile

In [None]:
# Solution:
docker build -t mlflow:latest .

**[1.6]** Run the built image

In [None]:
# Placeholder for student's code (1 command line)
# Task: Run the built image

In [None]:
# Solution:
docker run -dit --rm --name adv_dsi_mlflow -v ~/Projects/adv_dsi_lab_4/mlflow/mlruns:/mlflow/mlruns -v ~/Projects/adv_dsi_lab_4/mlflow/mlstore:/mlflow/mlstore -p 5000:5000 mlflow:latest
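The `-v host:container` mounts only persist the MLflow data on your machine if the container-side paths match the folders the Dockerfile creates and passes to `mlflow server` (`/mlflow/mlruns` and `/mlflow/mlstore`). A minimal sketch that assembles an equivalent `docker run` command in Python makes those pairs explicit; `build_run_command` and `as_shell` are hypothetical helpers, and the folder layout is the one assumed throughout this lab.

```python
import shlex
from pathlib import Path


def build_run_command(project_dir: str, image: str = "mlflow:latest") -> list:
    """Return the `docker run` arguments for the MLflow container (hypothetical helper)."""
    mlflow_dir = Path(project_dir).expanduser() / "mlflow"
    # host folder -> folder inside the container; the right-hand side must match
    # the directories created by the Dockerfile and used in its CMD
    mounts = [
        (mlflow_dir / "mlruns", "/mlflow/mlruns"),
        (mlflow_dir / "mlstore", "/mlflow/mlstore"),
    ]
    cmd = ["docker", "run", "-dit", "--rm", "--name", "adv_dsi_mlflow"]
    for host, container in mounts:
        cmd += ["-v", f"{host}:{container}"]
    cmd += ["-p", "5000:5000", image]
    return cmd


def as_shell(cmd: list) -> str:
    """Quote each argument so the command can be pasted into a terminal."""
    return " ".join(shlex.quote(part) for part in cmd)
```

For example, `print(as_shell(build_run_command("~/Projects/adv_dsi_lab_4")))` prints the command, and passing the list to `subprocess.run(..., check=True)` would start the container.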

**[1.7]** Open a browser and navigate to http://127.0.0.1:5000/#/
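The server can take a few seconds to start after `docker run`. Instead of refreshing the browser, you could poll the URL with a small standard-library sketch; `wait_for_http` is a hypothetical helper, and it only checks that the page answers with HTTP 200, not that tracking works end to end.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet (connection refused, reset or timed out)
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_http("http://127.0.0.1:5000/")` should return `True` once the container is serving the UI.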

**[1.8]** Stop the Docker container


In [None]:
# Placeholder for student's code (1 command line)
# Task: Stop the docker container

In [None]:
# Solution:
docker stop adv_dsi_mlflow

### 2. Build docker-compose

**[2.1]** Go to the folder `adv_dsi_lab_4`

In [None]:
# Placeholder for student's code (1 command line)
# Task: Go to the folder adv_dsi_lab_4

In [None]:
# Solution:
cd ~/Projects/adv_dsi_lab_4

**[2.2]** Download the `docker-compose.yml` file from https://raw.githubusercontent.com/aso-uts/advanced_dsi/master/unit4/docker-compose.yml

In [None]:
# Placeholder for student's code (1 command line)
# Task: Copy the docker-compose.yml

In [None]:
# Solution:
wget https://raw.githubusercontent.com/aso-uts/advanced_dsi/master/unit4/docker-compose.yml

**[2.3]** Start the Docker containers

In [None]:
docker-compose up -d

**[2.4]** List all running docker containers

In [None]:
# Placeholder for student's code (1 command line)
# Task: List all running docker containers

In [None]:
# Solution:
docker ps
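To check from a script that the containers started by `docker-compose up -d` are running, `docker ps` accepts a Go-template `--format` option. A minimal sketch that runs `docker ps --format "{{.Names}}\t{{.Status}}"` and parses the result into a dictionary; the function names are hypothetical, and apart from `jupyter_docker` (used in step 2.5) the container names depend on your `docker-compose.yml`.

```python
import subprocess


def running_containers(ps_output: str) -> dict:
    """Map container name -> status from tab-separated `docker ps` output."""
    containers = {}
    for line in ps_output.strip().splitlines():
        name, _, status = line.partition("\t")
        containers[name] = status
    return containers


def docker_ps_output() -> str:
    """Run docker ps with a name/status template (requires Docker on PATH)."""
    return subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=True,
    ).stdout
```

For example, `"jupyter_docker" in running_containers(docker_ps_output())` tells you whether the Jupyter container is up.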

**[2.5]** Display the last 50 lines of the `jupyter_docker` container logs

In [None]:
# Placeholder for student's code (1 command line)
# Task: Display last 50 lines of logs of jupyter_docker

In [None]:
# Solution:
docker logs --tail 50 jupyter_docker

Copy the URL displayed in the logs and paste it into a browser to launch JupyterLab.
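If you would rather extract the URL programmatically, a regular expression over the log text is enough. This sketch assumes the usual `http://127.0.0.1:8888/...?token=...` form printed by Jupyter, which may differ with your `docker-compose.yml` settings; `jupyter_urls` is a hypothetical helper.

```python
import re

# http(s) URLs on localhost or 127.0.0.1 that carry a hex `token=` query parameter
TOKEN_URL = re.compile(r"https?://(?:127\.0\.0\.1|localhost):\d+/\S*token=[0-9a-f]+")


def jupyter_urls(log_text: str) -> list:
    """Return every tokenised Jupyter URL found in the log text, in order."""
    return TOKEN_URL.findall(log_text)
```

Feed it the output of `docker logs --tail 50 jupyter_docker` (for example captured with `subprocess.run`) and open the first match.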

**[2.6]** Create a new git branch called `xgboost_spending`

In [None]:
git checkout -b xgboost_spending

Documentation: https://www.atlassian.com/git/tutorials/using-branches/git-checkout

**[2.7]** Navigate to the `notebooks` folder and create a new Jupyter notebook called `2_xgboost_spending.ipynb`

### 3. Load and Prepare Data

**[3.1]** Import the pandas and numpy packages

In [None]:
# Placeholder for student's code (2 lines of Python code)
# Task: Import the pandas and numpy packages

In [None]:
# Solution
import pandas as pd
import numpy as np

**[3.2]** Load the prepared dataset from `data/interim` into a dataframe called `df`



In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Load the prepared dataset from data/interim into a dataframe called df

In [None]:
#Solution:
df = pd.read_csv('../data/interim/Mall_Customers.csv')

**[3.3]** Create a copy of `df` and save it into a variable called `df_cleaned`

In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Create a copy of df and save it into a variable called df_cleaned

In [None]:
# Solution
df_cleaned = df.copy()

**[3.4]** Import `OneHotEncoder` from `sklearn.preprocessing`

In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Import OneHotEncoder from sklearn.preprocessing

In [None]:
# Solution
from sklearn.preprocessing import OneHotEncoder

**[3.5]** Instantiate a `OneHotEncoder` with `sparse=False` and `drop='first'` and save it to a variable called `ohe`

In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Instantiate a OneHotEncoder with sparse=False and drop='first' and save it to a variable called ohe

In [None]:
# Solution
ohe = OneHotEncoder(sparse=False, drop='first')  # scikit-learn >= 1.2 renames `sparse` to `sparse_output`

**[3.6]** Fit and transform the `Gender` feature of `df_cleaned` and replace the data into it

In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Fit and transform the `Gender` feature of `df_cleaned` and replace the data into it

In [None]:
# Solution
df_cleaned['Gender'] = ohe.fit_transform(df_cleaned[['Gender']])
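To see what `OneHotEncoder(drop='first')` does to a two-category column like `Gender`, here is a pure-Python sketch of the same idea: the categories are sorted, the first one is dropped, and each remaining category becomes a 0/1 column. It only mirrors the encoder's behaviour for illustration; it is not scikit-learn's implementation, and `one_hot_drop_first` is a hypothetical name.

```python
def one_hot_drop_first(values):
    """Encode categorical values as 0/1 columns, dropping the first sorted category."""
    categories = sorted(set(values))  # the default categories='auto' also uses sorted unique values
    kept = categories[1:]             # drop='first' removes the reference category
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


kept, rows = one_hot_drop_first(["Male", "Female", "Female", "Male"])
print(kept)  # ['Male']
print(rows)  # [[1], [0], [0], [1]]
```

With exactly two categories only one column remains (1 for `Male`, 0 for `Female`), which is why the encoded result fits back into the single `Gender` column.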

**[3.7]** Import `split_sets_random`, `save_sets` from `src.data.sets`

In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Import split_sets_random, save_sets from src.data.sets

In [None]:
# Solution
from src.data.sets import split_sets_random, save_sets

**[3.8]** Split the data into training, validation and testing sets with an 80-20 ratio

In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Split the data into training, validation and testing sets with an 80-20 ratio

In [None]:
# Solution
X_train, y_train, X_val, y_val, X_test, y_test = split_sets_random(df_cleaned, target_col='Spending Score (1-100)', test_ratio=0.2, to_numpy=False)
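`split_sets_random` comes from your own `src/data/sets.py`, so its exact logic is not shown here. As an illustration only, one common way to get three sets from a single `test_ratio` is to hold out that fraction for testing and then the same fraction of the remainder for validation. The sketch below assumes that scheme, which may differ from your helper; `three_way_split` is a hypothetical name.

```python
import random


def three_way_split(rows, test_ratio=0.2, seed=8):
    """Shuffle `rows`, carve off a test set, then a validation set from what remains."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is reproducible
    n_test = int(len(rows) * test_ratio)
    test, rest = rows[:n_test], rows[n_test:]
    n_val = int(len(rest) * test_ratio)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


train, val, test = three_way_split(range(200))
print(len(train), len(val), len(test))  # 128 32 40
```

Under this scheme the test set gets 20% of the rows and the validation set 20% of the remaining 80%, so the training set ends up with 64% rather than 80%.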

**[3.9]** Save the sets into `data/processed` folder

In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Save the sets into data/processed folder

In [None]:
# Solution
save_sets(X_train, y_train, X_val, y_val, X_test, y_test, path='../data/processed/')

### 4. Configure MLflow

**[4.1]** Import mlflow and mlflow.sklearn


In [None]:
# Placeholder for student's code (2 lines of Python code)
# Task: Import mlflow and mlflow.sklearn

In [None]:
# Solution
import mlflow
import mlflow.sklearn

**[4.2]** Set the MLflow Server URI to `http://mlflow:5000` using `.set_tracking_uri()`

In [None]:
# Placeholder for student's code (1 line of code)
# Task: Set the MLflow Server URI to http://mlflow:5000 using .set_tracking_uri()

In [None]:
# Solution
mlflow.set_tracking_uri('http://mlflow:5000')
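The hostname `mlflow` only resolves inside the Docker Compose network (presumably it is the service name defined in `docker-compose.yml`); from a script on your host machine the same server is reached at `http://127.0.0.1:5000`. MLflow also falls back to the `MLFLOW_TRACKING_URI` environment variable when no URI is set in code, so one option is to read that variable with the in-network address as the default. `resolve_tracking_uri` is a hypothetical helper.

```python
import os


def resolve_tracking_uri(default: str = "http://mlflow:5000") -> str:
    """Prefer MLFLOW_TRACKING_URI when it is set and non-empty, else the in-network default."""
    return os.environ.get("MLFLOW_TRACKING_URI") or default
```

Inside the notebook you would then call `mlflow.set_tracking_uri(resolve_tracking_uri())`, and the same code works on the host once `MLFLOW_TRACKING_URI=http://127.0.0.1:5000` is exported.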

**[4.3]** Define `xgboost_spending` as the MLflow experiment to be used with `.set_experiment()`

In [None]:
# Placeholder for student's code (1 line of code)
# Task: Define xgboost_spending as the MLflow experiment to be used with .set_experiment()

In [None]:
# Solution
mlflow.set_experiment('xgboost_spending')

INFO: 'xgboost_spending' does not exist. Creating a new experiment


**[4.4]** Start tracking with MLflow using `.start_run()`

In [None]:
# Placeholder for student's code (1 line of Python code)
# Task: Start tracking with MLflow using .start_run()

In [None]:
# Solution
run = mlflow.start_run()

### 5. Train RandomForest and log to MLflow

**[5.1]** Set an MLflow tag with `model.description` as key and `RandomForest with default hyperparameter` as value using `.set_tag()`

In [None]:
# Placeholder for student's code (1 line of code)
# Task: Set an MLflow tag with model.description as key and RandomForest with default hyperparameter as value using .set_tag()

In [None]:
# Solution
mlflow.set_tag("model.description", "RandomForest with default hyperparameter")

**[5.2]** Set an MLflow tag with `model.version` as key and `0.1` as value using `.set_tag()`

In [None]:
# Placeholder for student's code (1 line of code)
# Task: Set an MLflow tag with model.version as key and 0.1 as value using .set_tag()

In [None]:
# Solution
mlflow.set_tag("model.version", "0.1")

**[5.3]** Turn on MLflow automatic logging for scikit-learn

In [None]:
# Placeholder for student's code (1 line of code)
# Task: Turn on automatic logging with sklearn

In [None]:
# Solution
mlflow.sklearn.autolog()

**[5.4]** Import `RandomForestRegressor` from `sklearn.ensemble` and instantiate it into a variable called `rf1` with `random_state=8`

In [None]:
# Placeholder for student's code (2 lines of code)
# Task: Import RandomForestRegressor from sklearn.ensemble and instantiate it into a variable called rf1 with random_state=8

In [None]:
# Solution
from sklearn.ensemble import RandomForestRegressor

rf1 = RandomForestRegressor(random_state=8)

**[5.5]** Fit the model on the training set

In [None]:
# Placeholder for student's code (1 line of code)
# Task: Fit the model on the training set

In [None]:
# Solution
rf1.fit(X_train, y_train)

2021/06/28 11:53:48 INFO mlflow.utils.autologging_utils: sklearn autologging will track hyperparameters, performance metrics, model artifacts, and lineage information for the current sklearn workflow to the MLflow run with ID '2fd68d49527649d89759d4158950bfc1'

RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',
                      max_depth=None, max_features='auto', max_leaf_nodes=None,
                      max_samples=None, min_impurity_decrease=0.0,
                      min_impurity_split=None, min_samples_leaf=1,
                      min_samples_split=2, min_weight_fraction_leaf=0.0,
                      n_estimators=100, n_jobs=None, oob_score=False,
                      random_state=8, verbose=0, warm_start=False)
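Autologging records training-set metrics for regressors such as mean squared error, mean absolute error and R² (the exact metric names depend on your MLflow version). The lab does not score the validation set; if you want to compare against it yourself, the formulas are short. A dependency-free sketch follows (in practice `sklearn.metrics` provides `mean_squared_error`, `mean_absolute_error` and `r2_score`); `regression_metrics` is a hypothetical helper you could call as `regression_metrics(y_val, rf1.predict(X_val))`.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R² for two equal-length sequences of numbers."""
    y_true, y_pred = list(y_true), list(y_pred)
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("need two non-empty sequences of the same length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total variation around the mean
    # R² = 1 - (sum of squared residuals) / (total sum of squares)
    r2 = 1 - (mse * n) / ss_tot if ss_tot else float("nan")
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}
```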

**[5.6]** Import `infer_signature` from `mlflow.models.signature`

In [None]:
# Placeholder for student's code (1 line of code)
# Task: Import infer_signature from mlflow.models.signature

In [None]:
# Solution
from mlflow.models.signature import infer_signature

**[5.7]** Apply `infer_signature()` to the training set and save the result in a variable called `signature`

In [None]:
# Placeholder for student's code (1 line of code)
# Task: Apply infer_signature() to the training set and save the result in a variable called signature

In [None]:
# Solution
signature = infer_signature(X_train, y_train)
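`infer_signature(model_input, model_output)` records the column names and types of the model's inputs and outputs, which MLflow can use to check data passed to the logged model. The second argument is meant to be the model output; `y_train` has the same shape as the predictions, so `rf1.predict(X_train)` would work as well. To illustrate the idea of input-schema checking without MLflow, here is a hypothetical helper that reports missing or unexpected columns in a record. It is a simplification, not how MLflow validates inputs, and the column names in the usage example are assumptions.

```python
def schema_problems(expected_columns, record):
    """List problems between an expected column list and a dict record (hypothetical helper)."""
    expected, given = set(expected_columns), set(record)
    problems = [f"missing column: {c}" for c in expected_columns if c not in given]
    problems += [f"unexpected column: {c}" for c in sorted(given - expected)]
    return problems


cols = ["Gender", "Age", "Annual Income (k$)"]  # assumed feature names
print(schema_problems(cols, {"Gender": 1.0, "Age": 30, "Income": 60}))
```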



**[5.8]** Log the trained model with its signature under the artifact path `model` and register it as `sklearn-rf-spending`

In [None]:
mlflow.sklearn.log_model(rf1, artifact_path="model", signature=signature, registered_model_name="sklearn-rf-spending") 

Successfully registered model 'sklearn-rf-spending'.
2021/06/28 11:54:44 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: sklearn-rf-spending, version 1
Created version '1' of model 'sklearn-rf-spending'.
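Besides the UI, the tracking server exposes a REST API, which can confirm the registration from a script. A standard-library sketch, assuming the Model Registry endpoint `/api/2.0/mlflow/registered-models/get` and a server reachable at `http://127.0.0.1:5000` from your host; the helper names are hypothetical and error handling is left out.

```python
import json
import urllib.parse
import urllib.request


def registered_model_url(base_url: str, name: str) -> str:
    """Build the REST URL that describes a registered model."""
    query = urllib.parse.urlencode({"name": name})
    return f"{base_url.rstrip('/')}/api/2.0/mlflow/registered-models/get?{query}"


def get_registered_model(base_url: str, name: str) -> dict:
    """Fetch the registered model description as a dict (raises on HTTP or network errors)."""
    with urllib.request.urlopen(registered_model_url(base_url, name), timeout=10) as resp:
        return json.load(resp)
```

For example, `get_registered_model("http://127.0.0.1:5000", "sklearn-rf-spending")` should return a JSON object whose `registered_model` entry lists the latest versions.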


**[5.9]** End the active MLflow run




In [None]:
# Placeholder for student's code (1 line of code)
# Task: Close the MLflow experiment run 

In [None]:
# Solution
mlflow.end_run()

**[5.10]** Open a browser and navigate to http://127.0.0.1:5000/#/

**[5.11]** Open the `xgboost_spending` experiment and select the run

### 6. Push changes

**[6.1]** Add your changes to the git staging area

In [None]:
# Placeholder for student's code (1 command line)
# Task: Add your changes to the git staging area

In [None]:
# Solution:
git add .

**[6.2]** Commit a snapshot of your repository with a descriptive message

In [None]:
# Placeholder for student's code (1 command line)
# Task: Create the snapshot of your repository and add a description

In [None]:
# Solution:
git commit -m "randomforest default"

**[6.3]** Push your snapshot to GitHub

In [None]:
# Placeholder for student's code (1 command line)
# Task: Push your snapshot to GitHub

In [None]:
# Solution:
git push --set-upstream origin xgboost_spending

**[6.4]** Check out the master branch

In [None]:
# Placeholder for student's code (1 command line)
# Task: Check out the master branch

In [None]:
# Solution:
git checkout master

**[6.5]** Pull the latest updates

In [None]:
# Placeholder for student's code (1 command line)
# Task: Pull the latest updates

In [None]:
# Solution:
git pull

**[6.6]** Check out the `xgboost_spending` branch

In [None]:
# Placeholder for student's code (1 command line)
# Task: Check out the xgboost_spending branch

In [None]:
# Solution:
git checkout xgboost_spending

**[6.7]** Merge the `master` branch and push your changes




In [None]:
# Placeholder for student's code (2 command lines)
# Task: Merge the master branch and push your changes

In [None]:
# Solution:
git merge master
git push

Documentation: https://www.atlassian.com/git/tutorials/using-branches/git-merge
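Steps 6.4 to 6.7 always run the same command sequence. If you repeat them often, a small Python sketch can run them in order and stop at the first failure. The function name is hypothetical; it assumes `git` is on your PATH and that `master` is the default branch, as in this lab, and with `dry_run=True` it only returns the commands.

```python
import subprocess


def sync_branch_with_master(branch: str, dry_run: bool = True) -> list:
    """Update master, merge it into `branch`, then push `branch` (steps 6.4 to 6.7)."""
    commands = [
        ["git", "checkout", "master"],
        ["git", "pull"],
        ["git", "checkout", branch],
        ["git", "merge", "master"],
        ["git", "push"],
    ]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError, e.g. when the merge hits conflicts
            subprocess.run(cmd, check=True)
    return commands
```

Calling `sync_branch_with_master("xgboost_spending", dry_run=False)` runs the sequence; if the merge stops on conflicts, resolve them by hand before pushing.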

**[6.8]** Go to GitHub, open a pull request for `xgboost_spending`, review the code, fix any conflicts and merge the branch




**[6.9]** Stop and remove the Docker containers

In [None]:
docker-compose down

Documentation: https://docs.docker.com/engine/reference/commandline/stop/