<a href="https://colab.research.google.com/github/mfligiel/Models-for-MLOPS-Review/blob/main/MLFlow_Weather_adapted.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# MLflow

**Topic**: MLflow: Model Monitoring using ML pipelines
<br>
**Problem**: Regression performance metrics for weather prediction
<br>
**Author**: Matthew Fligiel, adapted from Devanshi Verma

In [1]:
#mounting Google Drive, where the data and the saved model are stored
from google.colab import drive
drive.mount('/content/drive', force_remount=True)
#!cd /content/drive/MyDrive/ModelMonitoringBlog

Mounted at /content/drive


In [2]:
#install mlflow and pyngrok
!pip install mlflow --quiet
!pip install pyngrok --quiet



In [24]:
#loading the libraries
import os
import pandas as pd
import sklearn
import joblib
import pickle
import numpy as np

#logging
import mlflow
import mlflow.sklearn
import logging
from urllib.parse import urlparse
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from pyngrok import ngrok


## Loading the saved data and the model

With the saved data and the fitted model loaded, I can prepare the inputs, predict, and see how different the predictions really are from the observed temperatures.

In [4]:
!ls drive/MyDrive/ModelMonitoringBlog

 Chicago.csv
 Detroit.csv
'First take on Evidently, and potential Datasets, MF 6.18.21.gdoc'
 Milwaukee.csv
'Model Code'
'Notes, 6.23.2021.gdoc'
'Notes 6.30.21.gdoc'
'Notes, 7.14.21.gdoc'
'Notes, 7.21.21.gdoc'
'Notes, 7.7.21.gdoc'
'Notes, 8.11.21.gdoc'
'Notes, 8.25.21.gdoc'
 Omaha.csv
 reports
 Rupas_Files_WIP
 Rupas_Final_Files
 St_Louis.csv
'Table of Contents.gdoc'
 Test.csv
 Toronto.csv
 Weather2020.csv
 weather_model.pkl


In [5]:
#load in data
df = pd.read_csv(r'/content/drive/MyDrive/ModelMonitoringBlog/Test.csv')
#load in model (the with-block closes the file once the model is unpickled)
with open(r'/content/drive/MyDrive/ModelMonitoringBlog/weather_model.pkl', 'rb') as f:
    model = pickle.load(f)

In [6]:
#Now, to rename the columns
df.columns = ['drp', 'city', 'date', 'maxtemp']
df.drop('drp', axis=1, inplace=True)

In [7]:
#reshape the data
df = df.pivot(index='date', columns='city', values='maxtemp')

In [8]:
df.head()

| date | Chicago | Detroit | Milwaukee | Omaha | St. Louis | Toronto |
|---|---|---|---|---|---|---|
| 2021-06-02 | 23.7 | 25.79 | 23.15 | 26.06 | 29.23 | 40.12 |
| 2021-06-03 | 24.42 | 24.91 | 22.73 | 27.475 | 28.09 | 41.715 |
| 2021-06-04 | 28.055 | 26.21 | 28.65 | 29.965 | 28.675 | 41.96 |
| 2021-06-05 | 30.675 | 30.885 | 31.51 | 33.61 | 30.565 | 42.41 |
| 2021-06-06 | 31.375 | 31.79 | 31.8 | 33.165 | 33.33 | 42.07 |


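Let's make a prediction. The model estimates Chicago's maximum temperature from the other five cities, and subtracting the predictions from the observed values gives the residuals.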

In [9]:
predictions = model.predict(df.drop('Chicago', axis=1))

In [10]:
df['Chicago'] - predictions

date
2021-06-02    0.252094
2021-06-03    1.663958
2021-06-04    1.463601
2021-06-05    0.412679
2021-06-06    0.194342
2021-06-07    1.332528
2021-06-08    0.825372
2021-06-09    0.262758
2021-06-10   -1.233267
2021-06-11   -0.465028
2021-06-12   -0.681110
2021-06-13   -0.408895
2021-06-14   -0.015240
2021-06-15    0.750277
Name: Chicago, dtype: float64
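
The residuals above can be condensed into a few summary numbers. The following is a minimal sketch, assuming the rounded values printed above are typed in by hand (in the notebook itself you would work from `df['Chicago'] - predictions` directly). The names `bias`, `mae`, and `rmse` are illustrative and not part of the original code.

```python
import math

# Residuals (observed minus predicted), copied by hand from the output above.
residuals = [
    0.252094, 1.663958, 1.463601, 0.412679, 0.194342, 1.332528, 0.825372,
    0.262758, -1.233267, -0.465028, -0.681110, -0.408895, -0.015240, 0.750277,
]

n = len(residuals)
bias = sum(residuals) / n                            # mean error; positive means the model under-predicts
mae = sum(abs(r) for r in residuals) / n             # mean absolute error
rmse = math.sqrt(sum(r * r for r in residuals) / n)  # root mean squared error

print(f"bias={bias:.3f}  mae={mae:.3f}  rmse={rmse:.3f}")
```

Because the inputs are rounded to six decimals, the MAE and RMSE computed this way should agree only approximately with the values MLflow logs for the new data later in the notebook.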


In [14]:
#city names mapped to their ID strings; only the keys (city names) are used here, to locate each city's CSV file
cities = {'Milwaukee':'2451822', 'Detroit':'2391585', 'Toronto':'4118', 'St. Louis':'2486982', 'Omaha':'2465512', 'Chicago':'2379574'}

frames = []

for city in cities:
  #the St. Louis file is saved as St_Louis.csv
  fname = 'St_Louis' if city == 'St. Louis' else city
  pth = "drive/MyDrive/ModelMonitoringBlog/" + fname + ".csv"
  print(pth)
  frames.append(pd.read_csv(pth))

#stack the per-city files into one long table
df_old = pd.concat(frames, ignore_index=True)

drive/MyDrive/ModelMonitoringBlog/Milwaukee.csv
drive/MyDrive/ModelMonitoringBlog/Detroit.csv
drive/MyDrive/ModelMonitoringBlog/Toronto.csv
drive/MyDrive/ModelMonitoringBlog/St_Louis.csv
drive/MyDrive/ModelMonitoringBlog/Omaha.csv
drive/MyDrive/ModelMonitoringBlog/Chicago.csv


In [15]:
#Now, to rename the columns
df_old.columns = ['drp', 'city', 'date', 'maxtemp']
df_old.drop('drp', axis=1, inplace=True)
df_old = df_old.pivot(index='date', columns='city', values='maxtemp')

In [16]:
#looking at the data
df_old.head()

| date | Chicago | Detroit | Milwaukee | Omaha | St. Louis | Toronto |
|---|---|---|---|---|---|---|
| 2021-03-02 | 6.06 | 8.36 | 3.505 | 7.59 | 15.105 | 6.035 |
| 2021-03-03 | 6.1 | 6.82 | 7.49 | 14.695 | 15.48 | 3.36 |
| 2021-03-04 | 10.15 | 12.215 | 9.355 | 17.6 | 20.425 | 7.225 |
| 2021-03-05 | 9.785 | 11.255 | 9.935 | 17.77 | 18.55 | 7.525 |
| 2021-03-06 | 8.965 | 10.325 | 9.345 | 17.01 | 19.07 | 7.01 |


In [17]:
#looking at the data again
df.head()

| date | Chicago | Detroit | Milwaukee | Omaha | St. Louis | Toronto |
|---|---|---|---|---|---|---|
| 2021-06-02 | 23.7 | 25.79 | 23.15 | 26.06 | 29.23 | 40.12 |
| 2021-06-03 | 24.42 | 24.91 | 22.73 | 27.475 | 28.09 | 41.715 |
| 2021-06-04 | 28.055 | 26.21 | 28.65 | 29.965 | 28.675 | 41.96 |
| 2021-06-05 | 30.675 | 30.885 | 31.51 | 33.61 | 30.565 | 42.41 |
| 2021-06-06 | 31.375 | 31.79 | 31.8 | 33.165 | 33.33 | 42.07 |


Now that all of the data has been loaded, we can use MLflow to log one run per dataset: the same fitted model evaluated on the original data and on the new data. The version this notebook is adapted from focuses on classification metrics; here the focus is on regression.

First, code from the <a href="https://mlflow.org/docs/latest/tutorials-and-examples/tutorial.html">MLflow website</a> for computing the evaluation metrics:

In [22]:
def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2
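
To make the three metrics concrete, here is a minimal pure-Python sketch that mirrors `eval_metrics` on toy numbers. The helper name `eval_metrics_plain` and the toy series `wide` and `narrow` are hypothetical and exist only for this illustration. Both series receive exactly the same prediction errors; only the spread of the actual values differs.

```python
import math

def eval_metrics_plain(actual, pred):
    """Pure-Python RMSE, MAE and R2 (hypothetical helper, for illustration only)."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return rmse, mae, r2

offsets = [1, -1, 1, -1]   # identical prediction errors for both toy series
wide = [10, 20, 30, 40]    # actual values spread over a wide range
narrow = [24, 25, 26, 27]  # actual values spread over a narrow range

wide_metrics = eval_metrics_plain(wide, [a + o for a, o in zip(wide, offsets)])
narrow_metrics = eval_metrics_plain(narrow, [a + o for a, o in zip(narrow, offsets)])
print("wide:  ", wide_metrics)
print("narrow:", narrow_metrics)
```

RMSE and MAE are the same for both series, while R2 is much lower for the narrow one. R2 measures error relative to the variance of the target, so it is worth reading alongside RMSE and MAE when comparing datasets that cover different temperature ranges.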


In [29]:
with mlflow.start_run(run_name = "00-Original-Training"):
    #score the already-fitted model on the original data
    predicted_temps = model.predict(df_old.drop('Chicago', axis=1))

    (rmse, mae, r2) = eval_metrics(df_old['Chicago'], predicted_temps)

    print("  RMSE: %s" % rmse)
    print("  MAE: %s" % mae)
    print("  R2: %s" % r2)

    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)
    mlflow.log_metric("mae", mae)

    tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme

    # Model registry does not work with file store
    if tracking_url_type_store != "file":
        # Register the model
        # There are other ways to use the Model Registry, depending on the use case;
        # please refer to the docs for more information:
        # https://mlflow.org/docs/latest/model-registry.html#api-workflow
        mlflow.sklearn.log_model(model, "model with original data", registered_model_name="OrigData")
    else:
        mlflow.sklearn.log_model(model, "model with original data")

  RMSE: 1.22873218125703
  MAE: 0.9610292016803056
  R2: 0.9582508175292798


In [30]:
with mlflow.start_run(run_name = "01-New-Data"):
    #score the same fitted model on the new data
    predicted_temps = model.predict(df.drop('Chicago', axis=1))

    (rmse, mae, r2) = eval_metrics(df['Chicago'], predicted_temps)

    print("  RMSE: %s" % rmse)
    print("  MAE: %s" % mae)
    print("  R2: %s" % r2)

    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)
    mlflow.log_metric("mae", mae)

    tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme

    # Model registry does not work with file store
    if tracking_url_type_store != "file":
        # Register the model
        # There are other ways to use the Model Registry, depending on the use case;
        # please refer to the docs for more information:
        # https://mlflow.org/docs/latest/model-registry.html#api-workflow
        mlflow.sklearn.log_model(model, "model with newer data", registered_model_name="NewData")
    else:
        mlflow.sklearn.log_model(model, "model with newer data")

  RMSE: 0.8717745040233268
  MAE: 0.7115106165145207
  R2: 0.8482232858273254
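
The two sets of metrics can be turned into a simple monitoring check. The sketch below is one possible approach, not part of the original notebook: the metric values are copied from the two runs above, while the thresholds `MAX_ERROR_INCREASE` and `MAX_R2_DROP` and the helper `flag_degradation` are hypothetical and would need to be chosen for the use case at hand.

```python
# Metrics copied from the two runs above.
original = {"rmse": 1.22873218125703, "mae": 0.9610292016803056, "r2": 0.9582508175292798}
new = {"rmse": 0.8717745040233268, "mae": 0.7115106165145207, "r2": 0.8482232858273254}

# Hypothetical tolerances, chosen only for illustration.
MAX_ERROR_INCREASE = 0.10  # errors may grow by at most 10 percent of the baseline
MAX_R2_DROP = 0.05         # R2 may fall by at most 0.05 in absolute terms

def flag_degradation(baseline, current):
    """Return the names of the metrics that got worse by more than the tolerances."""
    flagged = []
    for name in ("rmse", "mae"):
        relative_change = (current[name] - baseline[name]) / baseline[name]
        if relative_change > MAX_ERROR_INCREASE:
            flagged.append(name)
    if baseline["r2"] - current["r2"] > MAX_R2_DROP:
        flagged.append("r2")
    return flagged

flagged = flag_degradation(original, new)
print(flagged)
```

With these assumed thresholds, only R2 is flagged: RMSE and MAE are lower on the new data, but R2 falls by about 0.11. A narrower temperature range in the new data is one possible explanation, since R2 is measured relative to the variance of the target.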


In [28]:
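#start the MLflow UI in the background on port 5000
get_ipython().system_raw("mlflow ui --port 5000 &")
#stop any ngrok process left over from earlier runs
ngrok.kill()
#paste your own ngrok auth token here (left blank on purpose)
NGROK_AUTH_TOKEN = ""
ngrok.set_auth_token(NGROK_AUTH_TOKEN)
#open an HTTPS tunnel to port 5000 and print its public URL
ngrok_tunnel = ngrok.connect(addr="5000", proto="http", bind_tls=True)
print("MLflow Tracking UI:", ngrok_tunnel.public_url)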

MLflow Tracking UI: https://c9e4-35-193-226-166.ngrok.io
