# Accumulated Local Effects (ALE)

This notebook uses ALE to determine the effects of features on a model's predictions and shows how these effects behave for linear and non-linear models.

In [None]:
pip install alibi

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
# Note: load_boston was removed in scikit-learn 1.2, so this import needs an older release.
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from alibi.explainers import ALE, plot_ale

The dataset used is Boston Housing Prices (a regression task).

In [None]:
data = load_boston()
feature_names = data.feature_names
X = data.data
y = data.target
print(feature_names)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

Two separate models, one linear and one non-linear, are fitted for the analysis:
### 1. Linear Regression

In [None]:
lr = LinearRegression()

In [None]:
lr.fit(X_train, y_train)

In [None]:
mean_squared_error(y_test, lr.predict(X_test))

### 2. Random Forest

In [None]:
rf = RandomForestRegressor()
rf.fit(X_train, y_train)

In [None]:
mean_squared_error(y_test, rf.predict(X_test))

The scatter plot below shows the feature 'RM' against the predictions of the linear regression model.

In [None]:
FEATURE = 'RM'
index = np.where(feature_names==FEATURE)[0][0]

fig, ax = plt.subplots()
ax.scatter(X_train[:, index], lr.predict(X_train));

ax.set_xlabel(FEATURE);
ax.set_ylabel('Value in $1000\'s');

INTERPRETATION:
The plot is clearly increasing (positive correlation): as the value of RM increases, the predicted price on the y-axis, in 1000's, also increases.

A scatter plot like this mixes the effect of RM with the effects of every other feature that varies alongside it. To study the influence of each feature on its own, and not only RM, we need ALE, which blocks out the effects of the other features and isolates the impact of a particular feature.
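To see why a raw scatter plot can mislead, here is a minimal sketch on toy data (not the Boston set). The two features and the `predict` function are hypothetical: the model ignores `x1` entirely, yet its predictions still correlate strongly with `x1` because `x1` and `x2` move together.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two strongly correlated toy features (illustrative data, not the Boston set)
x1 = rng.normal(size=1000)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9 ** 2) * rng.normal(size=1000)
X = np.column_stack([x1, x2])

def predict(X):
    # Hypothetical model that uses only the second feature
    return 3.0 * X[:, 1]

# A scatter plot of predictions against x1 would still show a clear trend
corr = np.corrcoef(X[:, 0], predict(X))[0, 1]

# Changing x1 alone, with x2 held fixed, leaves the prediction untouched
X_shifted = X.copy()
X_shifted[:, 0] += 1.0
direct_effect = (predict(X_shifted) - predict(X)).mean()

print(f"correlation with x1: {corr:.2f}, direct effect of x1: {direct_effect:.2f}")
```

The trend in the scatter plot comes from the correlated feature, which is the kind of leakage ALE is meant to remove.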


ALE is a global explanation method: it takes a complete dataset and computes the model's feature effects over it.
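The idea can be sketched in a few lines of NumPy. This is a simplified first-order ALE for one numerical feature, written for illustration; the function name `ale_1d`, the quantile binning and the centring rule are assumptions of this sketch, not alibi's implementation.

```python
import numpy as np

def ale_1d(predict, X, feature, n_bins=10):
    """Simplified first-order ALE for one numerical feature (illustrative sketch)."""
    x = X[:, feature]
    # Bin edges at empirical quantiles, so each bin holds a similar number of points
    edges = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1)))
    n_intervals = len(edges) - 1
    # Bin k holds points with edges[k] < x <= edges[k + 1]; the minimum goes to bin 0
    bin_idx = np.clip(np.searchsorted(edges, x, side="left") - 1, 0, n_intervals - 1)

    local_effects = np.zeros(n_intervals)
    counts = np.zeros(n_intervals)
    for k in range(n_intervals):
        in_bin = bin_idx == k
        counts[k] = in_bin.sum()
        if counts[k] == 0:
            continue
        # Move only this feature to the bin's lower and upper edge
        X_lo, X_hi = X[in_bin].copy(), X[in_bin].copy()
        X_lo[:, feature] = edges[k]
        X_hi[:, feature] = edges[k + 1]
        # Average change in prediction across the points that fall in the bin
        local_effects[k] = (predict(X_hi) - predict(X_lo)).mean()

    # Accumulate the local effects, then centre so the data-weighted mean is ~0
    ale = np.concatenate([[0.0], np.cumsum(local_effects)])
    bin_mid = (ale[:-1] + ale[1:]) / 2
    ale -= np.sum(bin_mid * counts) / counts.sum()
    return edges, ale

# Toy check with a hypothetical linear model: the ALE slope should equal the coefficient
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
edges, ale = ale_1d(lambda X: 2.0 * X[:, 0] + 3.0 * X[:, 1], X, feature=0)
print(np.diff(ale) / np.diff(edges))
```

Because only the feature of interest is moved, and only within the bin where each point already lies, the differences do not pick up the effects of the other features.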

Applying ALE

In [None]:
lr_ale = ALE(lr.predict, feature_names=feature_names, target_names=['Value in $1000\'s'])
rf_ale = ALE(rf.predict, feature_names=feature_names, target_names=['Value in $1000\'s'])

In [None]:
lr_exp = lr_ale.explain(X_train)
rf_exp = rf_ale.explain(X_train)

In [None]:
lr_exp.feature_names

Plotting ALE first for the linear regression model (linear):

In [None]:
plot_ale(lr_exp, fig_kw={'figwidth':10, 'figheight': 10});

To see how the plots are interpreted, let us pick a single feature, 'RM', and analyse its effect.

In [None]:
plot_ale(lr_exp, features=['RM']);

The plot above can be interpreted as follows.
For a given value of RM on the x-axis, the y-axis shows by how much the feature raises (positive values) or lowers (negative values) the predicted price.
The ALE on the y-axis is in the units of the prediction variable which, in this case, is the value of the house in $1000's.

For example, the ALE value at RM=8 is ~7.5. This means that for neighbourhoods where the average number of rooms is ~8, the model predicts an increase of ~$7500 due to the feature RM. On the other hand, for neighbourhoods with an average number of rooms lower than ~6.25, the impact on the prediction becomes negative, i.e. a smaller number of rooms lowers the predicted value.

All ALE values are relative to the average prediction, which is discussed in detail below.
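Reading a value off the curve amounts to interpolating between grid points. The helper below is a hypothetical convenience function, and the grid and ALE numbers are toy values chosen for illustration, not the output of this notebook.

```python
import numpy as np

def ale_at(feature_values, ale_values, x):
    """Linearly interpolate an ALE curve at feature value x."""
    return float(np.interp(x, feature_values, ale_values))

# Toy grid and ALE values for illustration only (not the notebook's output)
grid = np.array([4.0, 5.0, 6.0, 7.0, 8.0])
ale = np.array([-6.0, -3.0, 0.0, 3.0, 6.0])

effect = ale_at(grid, ale, 7.5)
print(f"ALE at 7.5: {effect:.1f} (about ${effect * 1000:,.0f})")
```

With the explanation object from this notebook, the inputs would presumably be `lr_exp.feature_values[5]` and `lr_exp.ale_values[5][:, 0]`, assuming `ale_values` holds one column per target.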

First, select the neighbourhoods for which the average number of rooms is close to 8:

In [None]:
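# Column 5 is RM. Find the ALE grid points immediately below and above RM = 8.
lower_index = np.where(lr_exp.feature_values[5] < 8)[0][-1]
upper_index = np.where(lr_exp.feature_values[5] > 8)[0][0]
# Keep the training instances whose RM lies strictly between those two grid points.
subset = X_train[(X_train[:, 5] > lr_exp.feature_values[5][lower_index])
                 & (X_train[:, 5] < lr_exp.feature_values[5][upper_index])]
print(subset.shape)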

The mean prediction on this subset is:

In [None]:
subset_pred = lr.predict(subset).mean()
subset_pred

The mean prediction averaged across the whole dataset is:

In [None]:
mean_pred = lr.predict(X_train).mean()
mean_pred

In [None]:
subset_pred - mean_pred

This difference is the total increase in price, in 1000's, for neighbourhoods with an average room number close to 8. The mean over the entire dataset is subtracted from the mean prediction on the subset (RM ~ 8) because ALE is always relative to the average prediction. Note that this total reflects all features of those neighbourhoods, not RM alone.
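For a linear model the link to the average prediction can be written down exactly. The sketch below uses toy data and a least-squares fit in plain NumPy (both assumptions of this example): the centred effect of feature j at value x_j is w_j * (x_j - mean_j), which is the curve ALE recovers for a linear model up to binning approximation, and summing these effects over all features gives the prediction minus the mean prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (illustrative, not the Boston set)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=500)

# Ordinary least squares with an intercept column
A = np.column_stack([X, np.ones(len(X))])
coef = np.linalg.lstsq(A, y, rcond=None)[0]
w, b = coef[:-1], coef[-1]

pred = X @ w + b
mean_pred = pred.mean()

# Centred per-feature effects for a linear model: w_j * (x_j - mean_j)
effects = w * (X - X.mean(axis=0))
total = effects.sum(axis=1)

print(np.allclose(total, pred - mean_pred))
```

This is why a difference such as `subset_pred - mean_pred` should be compared with the effects of all features together rather than with the ALE of a single feature.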

Let us take another example to analyse the advantages of ALE and why it is needed over other methods:

In [None]:
#Crime level
plot_ale(lr_exp, features=['CRIM']);

Although ALE is a global interpretation method and therefore takes the whole dataset into account, the plot shows the feature effect at each feature value separately, unlike feature importance plots that summarise a feature with a single number. In the plot above, each dot gives the ALE value on the y-axis for one grid value of the feature CRIM. For example, at CRIM ~ 30 the ALE value is about -3, meaning the model predicts a decrease of about $3000 in price (the higher the crime level, the lower the price).

Between CRIM values of roughly 30 and 85 there is little or no data, so the plot simply interpolates a straight line across the interval.

The spacing of the dots therefore helps assess in which regions of the feature space the estimated feature effects are more reliable.
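The same check can be done numerically. The sketch below assumes a quantile-based grid and a skewed toy feature that stands in for something like CRIM; wide intervals mark regions where few observations support the curve.

```python
import numpy as np

rng = np.random.default_rng(0)

# Skewed toy feature: most values are small, a few are very large
x = rng.exponential(scale=3.0, size=400)

# Assumed quantile-based grid with 10 intervals of roughly equal counts
grid = np.quantile(x, np.linspace(0, 1, 11))
widths = np.diff(grid)
counts, _ = np.histogram(x, bins=grid)

for lo, hi, n in zip(grid[:-1], grid[1:], counts):
    print(f"[{lo:6.2f}, {hi:6.2f}]  width={hi - lo:6.2f}  points={n}")
```

Every interval holds a similar number of points, but the last one spans a much larger range of the feature, so the curve there rests on sparse data.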

Similarly, ALE is now applied to the random forest model (non-linear):

In [None]:
axes = plot_ale(rf_exp, fig_kw={'figwidth':10, 'figheight': 10});

Because the model is non-linear, the ALE plots are non-linear and, in some cases, non-monotonic. As in the previous examples, the ALE value at a point is the feature effect relative to the mean feature effect.

From these plots, the feature RM appears to have the largest impact on the prediction.

In [None]:
fig, ax = plt.subplots()
plot_ale(lr_exp, features=['RM'], ax=ax, line_kw={'label': 'Linear regression'});
plot_ale(rf_exp, features=['RM'], ax=ax, line_kw={'label': 'Random forest'});

INTERPRETATION:

For linear regression, the feature effect of RM is linear and increasing (the higher the number of rooms, the higher the price), whereas the random forest feature effect increases in a non-linear fashion.


### Comparing ALE for Linear Regression and Random Forest for all the features

In [None]:
fig, ax = plt.subplots(5, 3, sharey='all');

plot_ale(lr_exp, ax=ax, fig_kw={'figwidth':10, 'figheight': 10},
         line_kw={'label': 'Linear regression'});
plot_ale(rf_exp, ax=ax, line_kw={'label': 'Random forest'});