# Model auditing with MLflow on SherlockML

After installing MLflow (`pip install mlflow`), import it like any other Python library.

In [1]:
import mlflow
from mlflow import log_artifact, log_metric, log_param
import mlflow.sklearn
import mlflow.tracking

MLflow can log various objects (models, parameters, runs, entire projects) and keep track of them through its UI. When something is logged, MLflow creates an `mlruns` folder where it stores the information. You can then run the MLflow UI as described in this project's README to see everything that was logged.

## Load the data (car evaluation dataset)

Data downloaded from: http://mlr.cs.umass.edu/ml/datasets/Car+Evaluation

In [2]:
import pandas as pd
import numpy as np

In [3]:
data_df = pd.read_csv("../data/car.data", header=None)
data_df.columns = ['buying','maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']

In [4]:
data_df.head()

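| | buying | maint | doors | persons | lug_boot | safety | class |
|---|---|---|---|---|---|---|---|
| 0 | vhigh | vhigh | 2 | 2 | small | low | unacc |
| 1 | vhigh | vhigh | 2 | 2 | small | med | unacc |
| 2 | vhigh | vhigh | 2 | 2 | small | high | unacc |
| 3 | vhigh | vhigh | 2 | 2 | med | low | unacc |
| 4 | vhigh | vhigh | 2 | 2 | med | med | unacc |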


## Data exploration

The dataset is completely categorical. Let's get some simple information about it.

In [5]:
data_df.describe()

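| | buying | maint | doors | persons | lug_boot | safety | class |
|---|---|---|---|---|---|---|---|
| count | 1728 | 1728 | 1728 | 1728 | 1728 | 1728 | 1728 |
| unique | 4 | 4 | 4 | 3 | 3 | 3 | 4 |
| top | vhigh | vhigh | 4 | 4 | med | med | unacc |
| freq | 432 | 432 | 432 | 576 | 576 | 576 | 1210 |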


Let's have a closer look at the classes to which the datapoints belong.

In [6]:
data_df.groupby(['class']).size()

class
acc       384
good       69
unacc    1210
vgood      65
dtype: int64
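The counts above show that the classes are far from balanced, which matters when reading an accuracy figure. As a rough sketch, the snippet below turns those counts into proportions and computes the accuracy of a trivial model that always predicts the most frequent class. It uses the full dataset rather than any particular test split, so it is only an approximate baseline.

```python
# Class counts copied from the groupby output above.
class_counts = {"acc": 384, "good": 69, "unacc": 1210, "vgood": 65}

total = sum(class_counts.values())
majority_class = max(class_counts, key=class_counts.get)

# Accuracy of always predicting the most frequent class.
baseline_accuracy = class_counts[majority_class] / total

for label, count in class_counts.items():
    print(f"{label:>6}: {count / total:.1%}")
print(f"majority class = {majority_class}, baseline accuracy = {baseline_accuracy:.3f}")
```

Any model worth keeping should score clearly above this baseline of roughly 0.70.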

## Training and selecting a model

Since the dataset is completely categorical, we choose to use a random forest to predict the class of a datapoint given its features. In particular we want to find the value of the `min_samples_leaf` parameter that gives the best accuracy and log all the results with MLflow.

In [7]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

First we one-hot-encode the categorical features.

In [8]:
X = pd.get_dummies(data_df.drop('class', axis=1))
Y = data_df['class']
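To illustrate what `pd.get_dummies` does to categorical columns, here is a small sketch on a toy frame. The toy values are made up for the example and only mimic the style of the car dataset.

```python
import pandas as pd

# Toy data (hypothetical rows): two categorical columns stored as strings.
toy = pd.DataFrame({
    "safety": ["low", "med", "high"],
    "doors": ["2", "4", "2"],
})

# Each distinct value becomes its own indicator column named <column>_<value>.
encoded = pd.get_dummies(toy)

print(list(encoded.columns))
print(encoded.astype(int))
```

Every row has exactly one active indicator per original column, so the row sums equal the number of original columns.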

Hold-out validation: we split the data into a training set and a test set (33% of the rows). The model is trained on the training set and its accuracy is evaluated on the test set.

In [9]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.33, random_state=42)
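As a quick check of what this split produces, the sketch below applies the same `test_size` and `random_state` to a plain array of 1728 row indices (the dataset size reported by `describe()`). The index array is a stand-in for the real features.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for the 1728 rows of the car dataset.
indices = np.arange(1728)

train_idx, test_idx = train_test_split(indices, test_size=0.33, random_state=42)

# scikit-learn rounds the test size up: ceil(0.33 * 1728) = 571.
print(len(train_idx), len(test_idx))  # → 1157 571
```

The two index sets are disjoint, so no row is used both for training and for evaluation.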

We choose a range of values for the `min_samples_leaf` parameter and use MLflow to log the accuracy obtained with each value. The information is stored in the `mlruns` folder. We can then run the MLflow UI as described in the README (please start it in the parent folder of `mlruns`) and visualise the information about our runs.

In [10]:
msl_scores = []

for min_samples_leaf in range(1, 11):
    with mlflow.start_run():
        rf_random_state = 42
        rf = RandomForestClassifier(min_samples_leaf=min_samples_leaf, random_state=rf_random_state)
        rf.fit(X_train, Y_train)
        accuracy = rf.score(X_test, Y_test)
        msl_scores.append([min_samples_leaf, accuracy])

        log_param('min_samples_leaf', min_samples_leaf)
        log_param('rf_random_state', rf_random_state)
        log_metric('accuracy', accuracy)
        mlflow.sklearn.log_model(rf, "model")
        
msl_scores = np.array(msl_scores)

Let's select the best model.

In [11]:
msl_scores

array([[ 1.        ,  0.91243433],
       [ 2.        ,  0.92994746],
       [ 3.        ,  0.9352014 ],
       [ 4.        ,  0.93169877],
       [ 5.        ,  0.91593695],
       [ 6.        ,  0.90367776],
       [ 7.        ,  0.87390543],
       [ 8.        ,  0.88791594],
       [ 9.        ,  0.85814361],
       [10.        ,  0.86690018]])

In [12]:
msl_scores[np.argmax(msl_scores, axis=0)[-1]]

array([3.       , 0.9352014])

In [13]:
print("Best model:")
print("min_samples_leaf =", msl_scores[np.argmax(msl_scores, axis=0)[-1]][0])
print("accuracy =", msl_scores[np.argmax(msl_scores, axis=0)[-1]][1])

Best model:
min_samples_leaf = 3.0
accuracy = 0.9352014010507881
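The same selection can be written a little more directly by taking the argmax of the accuracy column only. The sketch below uses the scores printed above, copied by hand.

```python
import numpy as np

# (min_samples_leaf, accuracy) pairs copied from the output above.
msl_scores = np.array([
    [1, 0.91243433], [2, 0.92994746], [3, 0.9352014], [4, 0.93169877],
    [5, 0.91593695], [6, 0.90367776], [7, 0.87390543], [8, 0.88791594],
    [9, 0.85814361], [10, 0.86690018],
])

# Column 1 holds the accuracies; pick the row where it is largest.
best_row = msl_scores[msl_scores[:, 1].argmax()]
best_min_samples_leaf = int(best_row[0])
best_accuracy = best_row[1]

print(best_min_samples_leaf, best_accuracy)
```

Converting the first entry back to `int` avoids carrying `min_samples_leaf` around as `3.0`.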


Load the best model using MLflow. The UI shows the ID of the run that obtained the best accuracy; given that ID, `mlflow.sklearn.load_model()` retrieves the model logged in that run.

In [14]:
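# Replace this with the ID of the best run, as shown in the MLflow UI.
run_id = '3a066de05bdc4fe2a03de8b71c0b1fe3'

# Note: MLflow 1.0 and later removed the run_id argument; there the equivalent
# call is mlflow.sklearn.load_model(f"runs:/{run_id}/model").
rf_best = mlflow.sklearn.load_model('model', run_id=run_id)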

Evaluate feature importance (how much each feature influences the classification).

In [15]:
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot

init_notebook_mode(connected=True)

In [16]:
# Build the table in one step; DataFrame.append was removed in pandas 2.0.
feature_importances_df = pd.DataFrame({
    'feature': X.columns,
    'importance': rf_best.feature_importances_
})

# Sort from most to least important and renumber the rows.
feature_importances_df = (
    feature_importances_df
    .sort_values(by='importance', ascending=False)
    .reset_index(drop=True)
)

feature_importances_df

| | feature | importance |
|---|---|---|
| 0 | safety_low | 0.215676 |
| 1 | persons_2 | 0.192787 |
| 2 | safety_high | 0.104212 |
| 3 | persons_4 | 0.0636 |
| 4 | safety_med | 0.061845 |
| 5 | persons_more | 0.057349 |
| 6 | maint_vhigh | 0.053617 |
| 7 | buying_vhigh | 0.033833 |
| 8 | buying_low | 0.032218 |
| 9 | maint_low | 0.03109 |
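Because the features were one-hot-encoded, each original column is spread over several rows of this table. One possible way to read it, sketched below, is to sum the importances per original column. The sketch uses only the ten rows shown above, so the totals are partial, and it assumes the original column name is everything before the last underscore.

```python
import pandas as pd

# The ten rows shown above, copied by hand (not the full importance table).
top_importances = pd.DataFrame({
    "feature": ["safety_low", "persons_2", "safety_high", "persons_4", "safety_med",
                "persons_more", "maint_vhigh", "buying_vhigh", "buying_low", "maint_low"],
    "importance": [0.215676, 0.192787, 0.104212, 0.0636, 0.061845,
                   0.057349, 0.053617, 0.033833, 0.032218, 0.03109],
})

# Recover the original column name, e.g. "safety_low" -> "safety".
top_importances["source"] = top_importances["feature"].str.rsplit("_", n=1).str[0]

# Sum the importances of all indicator columns from the same original column.
grouped = (
    top_importances.groupby("source")["importance"]
    .sum()
    .sort_values(ascending=False)
)
print(grouped)
```

On these rows, `safety` and `persons` together account for most of the listed importance.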


In [17]:
trace = go.Bar(
    x = feature_importances_df['feature'],
    y = feature_importances_df['importance']
)

data = [trace]

layout = go.Layout(
    xaxis=dict(tickangle=-45),
    hovermode='closest'
)

fig = go.Figure(data=data, layout=layout)

iplot(fig)