# Accumulated Local Effects (ALE) Plots

Accumulated Local Effects (ALE) is a method for computing feature effects, introduced in the paper [Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models](https://arxiv.org/abs/1612.08468) by Apley and Zhu. It provides model-agnostic (black-box) global explanations for classification and regression models on tabular data.
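
To make the idea concrete, here is a minimal sketch of a first-order ALE computation for one numeric feature, written in plain Python so it runs without dependencies. It is a simplified illustration and not the implementation used by any of the libraries below: the names `quantile_edges` and `ale_1d`, the quantile rule, and the toy `model` are all assumptions made for this example.

```python
from bisect import bisect_left


def quantile_edges(values, n_bins):
    """Return sorted, de-duplicated bin edges taken at empirical quantiles."""
    ordered = sorted(values)
    last = len(ordered) - 1
    picks = [ordered[round(k * last / n_bins)] for k in range(n_bins + 1)]
    return sorted(set(picks))


def ale_1d(predict, rows, feature, n_bins=10):
    """First-order ALE of one numeric feature (illustrative sketch).

    predict: callable mapping a list of rows to a list of numbers
    rows:    list of feature lists
    feature: column index of the feature of interest
    Returns (edges, centered ALE values at those edges).
    """
    edges = quantile_edges([row[feature] for row in rows], n_bins)
    if len(edges) < 2:
        raise ValueError("feature needs at least two distinct values")
    n_intervals = len(edges) - 1
    sums = [0.0] * n_intervals
    counts = [0] * n_intervals

    for row in rows:
        # Interval k covers (edges[k], edges[k + 1]]; the minimum joins interval 0.
        k = max(bisect_left(edges, row[feature]) - 1, 0)
        lower = list(row)
        upper = list(row)
        lower[feature] = edges[k]
        upper[feature] = edges[k + 1]
        # Local effect: prediction change when only this feature moves
        # across its own interval, all other features kept as observed.
        low_pred, up_pred = predict([lower, upper])
        sums[k] += up_pred - low_pred
        counts[k] += 1

    # Average the local effects per interval, then accumulate them.
    ale = [0.0]
    for total, count in zip(sums, counts):
        ale.append(ale[-1] + total / count)

    # Center so that the count-weighted mean effect is zero.
    mean = sum(
        0.5 * (ale[k] + ale[k + 1]) * counts[k] for k in range(n_intervals)
    ) / len(rows)
    return edges, [value - mean for value in ale]


# Toy additive model (an assumption for this example): f(x) = 2*x0 + 3*x1.
def model(batch):
    return [2.0 * x0 + 3.0 * x1 for x0, x1 in batch]


rows = [[float(i), float(i % 5)] for i in range(50)]
edges, ale = ale_1d(model, rows, feature=0, n_bins=5)
```

For an additive model like this one, the ALE curve of feature 0 should rise by twice the width of each interval, regardless of how feature 1 is distributed. The sketch calls `predict` once per row for clarity; a practical implementation would batch those calls.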

## Implementations

* Aleplot (only works with numeric features)
* PyAle (only for regression models)
* alibi
* Dalex

### Load requirements

In [1]:
# packages
import joblib

In [2]:
# data
%run ./../data/data.py

In [7]:
df = load_adult_data('train')
y = df['Income']
X = df.drop(columns='Income')

In [8]:
# model
adult_rf = joblib.load('../models/adult_rf.pkl')

### Alibi

In [16]:
from alibi.explainers import ALE

In [31]:
proba_fun_rf = adult_rf.predict_proba
feature_names = X.columns.values.tolist()
target_names = y.unique().tolist()

In [36]:
proba_ale_rf = ALE(proba_fun_rf, feature_names=feature_names, target_names=target_names)

In [38]:
X.head()

| | Age | Workclass | Final Weight | Education | Years of Education | Marital Status | Occupation | Relationship | Race | Sex | Capital Gain | Capital Loss | Hours per Week | Native Country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States |
| 1 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States |
| 2 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States |
| 3 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba |
| 4 | 37 | Private | 284582 | Masters | 14 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 0 | 0 | 40 | United-States |


In [40]:
proba_exp_rf = proba_ale_rf.explain(X, features=[0, 2])

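TypeError: '(slice(None, None, None), 0)' is an invalid key

The call fails because `explain` indexes its input positionally (as in `X[:, 0]`), which a pandas DataFrame does not support. alibi's `ALE.explain` expects a NumPy array rather than a DataFrame.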

### Dalex

In [48]:
import dalex as dx

In [49]:
X.columns

Index(['Age', 'Workclass', 'Final Weight', 'Education', 'Years of Education',
       'Marital Status', 'Occupation', 'Relationship', 'Race', 'Sex',
       'Capital Gain', 'Capital Loss', 'Hours per Week', 'Native Country'],
      dtype='object')

In [42]:
adult_rf_exp = dx.Explainer(adult_rf, X, y, label = "Adult Random Forest")

Preparation of a new explainer is initiated

  -> data              : 32560 rows 14 cols
  -> target variable   : Parameter 'y' was a pandas.Series. Converted to a numpy.ndarray.
  -> target variable   : 32560 values
  -> target variable   : Please note that 'y' is a string array.
  -> target variable   : 'y' should be a numeric or boolean array.
  -> target variable   : Otherwise an Error may occur in calculating residuals or loss.
  -> model_class       : sklearn.ensemble._forest.ExtraTreesClassifier (default)
  -> label             : Adult Random Forest
  -> predict function  : <function yhat_proba_default at 0x7fb8374e2200> will be used (default)
  -> predict function  : Accepts only pandas.DataFrame, numpy.ndarray causes problems.
  -> predicted values  : min = 0.241, mean = 0.241, max = 0.241
  -> model type        : classification will be used (default)
  -> residual function : difference between y and yhat (default)
  -> residuals         :  'residual_function' returns an Error
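
The log above warns that `y` is a string array and that the residual function fails as a result. One possible way to address this is to encode the target as 0/1 before building the explainer. The sketch below uses made-up label strings (`'<=50K'`, `'>50K'`) as a stand-in for the notebook's `y`; the actual values returned by `load_adult_data` are not shown in this notebook, so check `y.unique()` first and adjust `positive_label` accordingly.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `y`; the real label strings may differ.
y = pd.Series(["<=50K", ">50K", "<=50K", ">50K"], name="Income")

# Assumption: this is the class that should be encoded as 1.
positive_label = ">50K"

# Boolean comparison followed by an integer cast gives a 0/1 target.
y_numeric = (y == positive_label).astype(int)
```

Passing `y_numeric` instead of `y` to `dx.Explainer` should let dalex compute residuals as the difference between the target and the predicted probability.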

In [44]:
ale_rf_exp = adult_rf_exp.model_profile(type='accumulated')

Calculating ceteris paribus: 100%|██████████| 14/14 [00:19<00:00,  1.39s/it]
Calculating accumulated dependency: 100%|██████████| 6/6 [00:00<00:00,  8.92it/s]


In [47]:
ale_rf_exp.plot(variables = ['Education'])

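TypeError: variables do not overlap with Education

The plot call fails because `model_profile` uses `variable_type='numerical'` by default (see the signature below), so only the six numerical columns were profiled. `Education` is categorical and is therefore absent from the result; profiling it requires `variable_type='categorical'`.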

In [43]:
?adult_rf_exp.model_profile

Signature:
adult_rf_exp.model_profile(
    type=('partial', 'accumulated', 'conditional'),
    N=300,
    variables=None,
    variable_type='numerical',
    groups=None,
    span=0.25,
    grid_points=101,
    variable_splits=None,
    variable_splits_type='uniform',
    center=