# Permutation importance as an XAI technique for global model interpretation

## Load the model to be explained
##Let's load the random forest model built with the AutoML tool TPOT
##and explain it!

In [9]:
!ls ../automl_tutorials/*.model

../automl_tutorials/diabetes_hyperopt_automl.model
../automl_tutorials/diabetes_tpot_automl.model


In [4]:
import pickle
# Load the pickled TPOT model; the context manager closes the file afterwards.
with open("../automl_tutorials/diabetes_tpot_automl.model", "rb") as filehandler:
    rf_model = pickle.load(filehandler)

In [10]:
# Load the pickled hyperopt-sklearn model the same way.
with open("../automl_tutorials/diabetes_hyperopt_automl.model", "rb") as filehandler2:
    hyperopt_model = pickle.load(filehandler2)

## Load train and test data

In [5]:
!ls ../automl_tutorials/*csv

../automl_tutorials/diabetes_test.csv  ../automl_tutorials/diabetes_train.csv


In [6]:
import pandas as pd
df_train = pd.read_csv("../automl_tutorials/diabetes_train.csv")
df_test = pd.read_csv("../automl_tutorials/diabetes_test.csv")

## Use feature importance to explain

In [None]:
##explain the rf_model

In [8]:
import eli5
from eli5.sklearn import PermutationImportance

# Features are the first 10 columns of the test set; "target" is passed as the label.
perms = PermutationImportance(rf_model, random_state=1231).fit(df_test.iloc[:, 0:10], df_test["target"])
eli5.show_weights(perms, feature_names=df_test.columns[0:10].tolist())

| Weight | Feature |
|---|---|
| 0.2293 ± 0.0817 | bmi |
| 0.2270 ± 0.1038 | ltg |
| 0.1011 ± 0.0577 | bp |
| 0.0679 ± 0.0395 | hdl |
| 0.0287 ± 0.0202 | sex |
| 0.0116 ± 0.0024 | tch |
| 0.0099 ± 0.0171 | glu |
| 0.0046 ± 0.0054 | age |
| -0.0034 ± 0.0124 | ldl |
| -0.0036 ± 0.0086 | tc |
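
To make the weights in the table more concrete, here is a minimal sketch of the idea behind permutation importance, written with NumPy only. It does not reproduce eli5's implementation: the synthetic data, the least-squares stand-in model, and the helper names (`predict`, `r2`, `permutation_importance_sketch`) are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (hypothetical, not the diabetes set): y depends strongly on
# column 0, weakly on column 1, and not at all on column 2.
n = 500
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Stand-in model: ordinary least squares with an intercept.
A = np.column_stack([X, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)


def predict(X):
    """Predict with the fitted least-squares coefficients."""
    return np.column_stack([X, np.ones(len(X))]) @ coef


def r2(y_true, y_pred):
    """Coefficient of determination, used here as the model score."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def permutation_importance_sketch(X, y, n_iter=5):
    """Mean and std of the score drop when each column is shuffled."""
    baseline = r2(y, predict(X))
    drops = np.zeros((n_iter, X.shape[1]))
    for i in range(n_iter):
        for j in range(X.shape[1]):
            X_perm = X.copy()
            # Shuffle one column to break its link with the target.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops[i, j] = baseline - r2(y, predict(X_perm))
    return drops.mean(axis=0), drops.std(axis=0)


mean_drop, std_drop = permutation_importance_sketch(X, y)
for j, (m, s) in enumerate(zip(mean_drop, std_drop)):
    print(f"feature {j}: {m:.4f} +/- {s:.4f}")
```

The feature the target depends on most shows the largest score drop, while the unused feature lands near zero and may come out slightly negative by chance.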


In [None]:
##explain the hyperopt_model

In [11]:
perms2 = PermutationImportance(hyperopt_model, random_state=1231).fit(df_test.iloc[:,0:10],df_test["target"])
eli5.show_weights(perms2, feature_names = df_test.columns[0:10].tolist())



| Weight | Feature |
|---|---|
| 0.2053 ± 0.0815 | bmi |
| 0.1814 ± 0.0586 | ltg |
| 0.0492 ± 0.0374 | bp |
| 0.0353 ± 0.0302 | tch |
| 0.0249 ± 0.0141 | hdl |
| 0.0121 ± 0.0083 | glu |
| 0.0084 ± 0.0092 | sex |
| 0.0081 ± 0.0055 | age |
| 0.0003 ± 0.0091 | tc |
| -0.0005 ± 0.0090 | ldl |


## Summary of the global model interpretation:
Although the two AutoML results (from tpot and hyperopt-sklearn) are different models, permutation importance ranks the same top 3 predictors for both: bmi, ltg, and bp.
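
The comparison can also be checked programmatically. The sketch below copies the weights from the two tables above into plain dictionaries; `top_k` is a hypothetical helper, not part of eli5.

```python
# Weights copied from the two permutation importance tables above.
tpot_weights = {
    "bmi": 0.2293, "ltg": 0.2270, "bp": 0.1011, "hdl": 0.0679, "sex": 0.0287,
    "tch": 0.0116, "glu": 0.0099, "age": 0.0046, "ldl": -0.0034, "tc": -0.0036,
}
hyperopt_weights = {
    "bmi": 0.2053, "ltg": 0.1814, "bp": 0.0492, "tch": 0.0353, "hdl": 0.0249,
    "glu": 0.0121, "sex": 0.0084, "age": 0.0081, "tc": 0.0003, "ldl": -0.0005,
}


def top_k(weights, k):
    """Return the k feature names with the largest weights, best first."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


top3_tpot = top_k(tpot_weights, 3)
top3_hyperopt = top_k(hyperopt_weights, 3)
print("tpot top 3:    ", top3_tpot)
print("hyperopt top 3:", top3_hyperopt)
print("same top 3:", top3_tpot == top3_hyperopt)

# Below the top 3 the rankings diverge, so compare larger sets by overlap.
shared_top5 = set(top_k(tpot_weights, 5)) & set(top_k(hyperopt_weights, 5))
print("shared among top 5:", sorted(shared_top5))
```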

###  Utilities and limitations

We can rank the features by importance. Each weight is the drop in model score after a feature's values are shuffled, so a weight near zero or slightly negative, as for ldl, means that permuting the feature did not increase the error: the model does not rely on it.
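
One rough way to separate important features from unimportant ones is to check whether the reported interval around each weight includes zero. The sketch below uses the numbers from the rf_model table above and assumes the ± value can be read as a spread across shuffles; `indistinguishable_from_zero` is a hypothetical helper.

```python
# (weight, spread) pairs copied from the rf_model table above.
rf_results = {
    "bmi": (0.2293, 0.0817), "ltg": (0.2270, 0.1038), "bp": (0.1011, 0.0577),
    "hdl": (0.0679, 0.0395), "sex": (0.0287, 0.0202), "tch": (0.0116, 0.0024),
    "glu": (0.0099, 0.0171), "age": (0.0046, 0.0054), "ldl": (-0.0034, 0.0124),
    "tc": (-0.0036, 0.0086),
}


def indistinguishable_from_zero(results):
    """Features whose interval [weight - spread, weight + spread] contains 0."""
    return [
        name
        for name, (weight, spread) in results.items()
        if weight - spread <= 0.0 <= weight + spread
    ]


weak_features = indistinguishable_from_zero(rf_results)
print(weak_features)
```

By this reading, glu and age fall in the same group as ldl and tc even though their weights are positive.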

The limitation is that permutation importance does not show the direction of a variable's effect on the outcome: is it positive or negative?
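
This limitation can be illustrated with a toy case. In the sketch below, a hypothetical linear model has two coefficients of equal size and opposite sign; the data and the model are made up for illustration. Shuffling either feature increases the error by about the same amount, so the importances alone cannot tell the two directions apart.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear model: feature 0 pushes the outcome up, feature 1 down.
coef = np.array([2.0, -2.0])
X = rng.normal(size=(2000, 2))
y = X @ coef


def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))


baseline = mse(y, X @ coef)  # the model reproduces y exactly here

importances = []
for j in range(X.shape[1]):
    X_perm = X.copy()
    # Shuffle one column and measure how much the error increases.
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    importances.append(mse(y, X_perm @ coef) - baseline)

for j, imp in enumerate(importances):
    print(f"feature {j}: coefficient {coef[j]:+.1f}, error increase {imp:.3f}")
```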