# Machine Learning Readability

The following techniques help you interpret a model's results and identify which features drive its predictions.

## 1. Permutation Importance

Permutation importance is a simple way to identify which features a fitted model relies on most; the example below computes it with the `eli5` library on an sklearn model. This is useful if the data isn't intuitive or the columns aren't clearly labeled - that way, you can focus on the features whose shuffling hurts the score most and play with them to engineer even better features.

Permutation importance is computed __after a model has been fit!__
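
To make the mechanism concrete, here is a minimal from-scratch sketch of the idea: shuffle one column of the validation data at a time and record how much the model's score drops. The helper name `manual_permutation_importance` and the synthetic data are made up for illustration, and this is not how any particular library implements it.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


def manual_permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean drop in model.score() when each column of X is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = model.score(X, y)  # R^2 for sklearn regressors
    importances = np.zeros(X.shape[1])
    for col in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_shuffled = X.copy()
            # Shuffling breaks the link between this column and the target
            X_shuffled[:, col] = rng.permutation(X_shuffled[:, col])
            drops.append(baseline - model.score(X_shuffled, y))
        importances[col] = np.mean(drops)
    return importances


# Synthetic data (hypothetical): the target depends strongly on column 0,
# weakly on column 1, and not at all on column 2.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3))
y_demo = 3.0 * X_demo[:, 0] + 0.5 * X_demo[:, 1] + rng.normal(scale=0.1, size=500)

X_fit, X_val, y_fit, y_val = train_test_split(X_demo, y_demo, random_state=1)
demo_model = RandomForestRegressor(n_estimators=30, random_state=1).fit(X_fit, y_fit)

demo_importances = manual_permutation_importance(demo_model, X_val, y_val)
print(demo_importances)
```

A large drop means the model leaned heavily on that column; a drop near zero means the column could be scrambled with little effect on the score.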

In [None]:
# obtain feature importance via the "eli5" library (it wraps sklearn estimators)

import eli5
from eli5.sklearn import PermutationImportance
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

base_features = ['pickup_longitude',
                 'pickup_latitude',
                 'dropoff_longitude',
                 'dropoff_latitude',
                 'passenger_count']

# X (the base_features columns) and y (the target) are assumed to be loaded already

# fit model first!
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
first_model = RandomForestRegressor(n_estimators=30,
                                    random_state=1)
first_model.fit(train_X, train_y)

# PermutationImportance.fit() does not retrain first_model: it shuffles each
# feature of the validation data and records how much the score drops

perm = PermutationImportance(first_model, random_state=1).fit(val_X, val_y)

eli5.show_weights(perm, feature_names=base_features)
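
As an alternative that needs no extra package, sklearn provides `sklearn.inspection.permutation_importance`. The sketch below runs it on synthetic data; the data and variable names are assumptions for illustration and are not the taxi dataset used above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data (hypothetical): only column 0 influences the target
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(400, 3))
y_syn = 2.0 * X_syn[:, 0] + rng.normal(scale=0.1, size=400)

X_tr, X_va, y_tr, y_va = train_test_split(X_syn, y_syn, random_state=1)
rf = RandomForestRegressor(n_estimators=30, random_state=1).fit(X_tr, y_tr)

# Shuffle each column n_repeats times on the validation split
result = permutation_importance(rf, X_va, y_va, n_repeats=10, random_state=1)

# Print features from most to least important
for idx in result.importances_mean.argsort()[::-1]:
    mean = result.importances_mean[idx]
    std = result.importances_std[idx]
    print(f"feature {idx}: {mean:.3f} +/- {std:.3f}")
```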

## 2. Partial Dependence Plots

Feature importance above shows _what_ variables most affect predictions, but it doesn't show _how_ a feature affects a prediction.

That's where __Partial Dependence Plots__ come in. Like permutation importance, PD plots are calculated __after__ a model has been fit.

__How it Works:__ Take the fitted model and, for every row of the data, set _one feature_ to each value in a range while leaving the other features untouched. Averaging the predictions at each value shows how the prediction changes as that feature changes.
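
The procedure can be sketched from scratch: for each candidate value, overwrite the feature in every row, predict, and average. The helper `partial_dependence_1d` and the noiseless linear data are hypothetical, chosen so that the result is easy to check by eye (the curve's slope should match the coefficient used to build the target).

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def partial_dependence_1d(model, X, feature_idx, grid):
    """Average prediction when one feature is forced to each grid value."""
    averaged = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = value  # every row gets the same value
        averaged.append(model.predict(X_mod).mean())
    return np.array(averaged)


# Synthetic, noiseless data (hypothetical): y = 4*x0 - 1*x1
rng = np.random.default_rng(0)
X_lin = rng.normal(size=(200, 2))
y_lin = 4.0 * X_lin[:, 0] - 1.0 * X_lin[:, 1]
lin_model = LinearRegression().fit(X_lin, y_lin)

grid = np.linspace(-2.0, 2.0, 5)
pd_curve = partial_dependence_1d(lin_model, X_lin, feature_idx=0, grid=grid)

# For a linear model the curve is a straight line with the feature's coefficient as slope
slopes = np.diff(pd_curve) / np.diff(grid)
print(slopes)
```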

In [None]:
# EXAMPLE
# X (a pandas DataFrame of features) and y (the target) are assumed to be loaded already

from sklearn.model_selection import train_test_split
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

from sklearn.tree import DecisionTreeClassifier
tree_model = DecisionTreeClassifier(random_state=0,
                                    max_depth=5,
                                    min_samples_split=5).fit(train_X, train_y)


from matplotlib import pyplot as plt
from pdpbox import pdp

# assumption: the model was trained on every column of X
feature_names = list(X.columns)

# Create the data that we will plot
# (pdp_isolate and pdp_plot are pdpbox's function-style API; newer releases may expose this differently)
pdp_goals = pdp.pdp_isolate(model=tree_model, dataset=val_X, model_features=feature_names, feature='Goal Scored')

# plot it
pdp.pdp_plot(pdp_goals, 'Goal Scored')
plt.show()

For the PDP graph:
- The x axis shows the feature values (price, goals scored, etc.).
- The y axis shows the change in prediction relative to the prediction at the baseline value (the leftmost point of the graph). The shaded area is a confidence interval.
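
The baseline-relative y axis can be reproduced by computing one prediction curve per row and subtracting each curve's value at the leftmost grid point. In this sketch the band is taken to be the standard deviation across rows, which is an assumption; the plotting library may define its shaded area differently. The helper `centered_ice` and the data are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def centered_ice(model, X, feature_idx, grid):
    """Per-row prediction curves, shifted to start at 0 on the leftmost grid value."""
    curves = np.empty((X.shape[0], len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature_idx] = value
        curves[:, j] = model.predict(X_mod)
    return curves - curves[:, [0]]  # subtract each row's baseline prediction


# Synthetic data (hypothetical): the effect of x0 depends on x1
rng = np.random.default_rng(0)
X_ice = rng.uniform(0.0, 5.0, size=(300, 2))
y_ice = X_ice[:, 0] * X_ice[:, 1] + rng.normal(scale=0.1, size=300)
ice_model = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X_ice, y_ice)

ice_grid = np.linspace(0.0, 5.0, 6)
curves = centered_ice(ice_model, X_ice, feature_idx=0, grid=ice_grid)

mean_change = curves.mean(axis=0)  # the line that would be plotted
spread = curves.std(axis=0)        # one possible definition of the band
print(mean_change)
print(spread)
```

Because every curve is shifted to start at zero, the band has zero width at the baseline and widens wherever rows disagree about the feature's effect.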

A PDP graph analyzes only one feature at a time. However, you can extend it to account for changes in _two_ features with a heatmap-like graph, where the heat is the degree of change in the prediction. See below:

In [None]:
'''
2 feature PDP graph is similar to regular PDP plot except we use pdp_interact
instead of pdp_isolate and pdp_interact_plot instead of pdp_plot
'''
features_to_plot = ['Goal Scored', 'Distance Covered (Kms)']
inter1 = pdp.pdp_interact(model=tree_model, dataset=val_X, model_features=feature_names, features=features_to_plot)

pdp.pdp_interact_plot(pdp_interact_out=inter1, feature_names=features_to_plot, plot_type='contour')
plt.show()
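
The two-feature version follows the same recipe as the one-feature case, except that two columns are overwritten at once and the averages form a grid instead of a line. This is a from-scratch sketch with a hypothetical helper `partial_dependence_2d` and synthetic data, not pdpbox's implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def partial_dependence_2d(model, X, idx_a, idx_b, grid_a, grid_b):
    """Average prediction for every combination of two forced feature values."""
    surface = np.empty((len(grid_a), len(grid_b)))
    for i, a in enumerate(grid_a):
        for j, b in enumerate(grid_b):
            X_mod = X.copy()
            X_mod[:, idx_a] = a
            X_mod[:, idx_b] = b
            surface[i, j] = model.predict(X_mod).mean()
    return surface


# Synthetic, noiseless data (hypothetical): purely additive, no interaction
rng = np.random.default_rng(0)
X_add = rng.normal(size=(200, 3))
y_add = 2.0 * X_add[:, 0] + 3.0 * X_add[:, 1] - X_add[:, 2]
add_model = LinearRegression().fit(X_add, y_add)

grid_a = np.linspace(-1.0, 1.0, 4)
grid_b = np.linspace(-1.0, 1.0, 5)
surface = partial_dependence_2d(add_model, X_add, 0, 1, grid_a, grid_b)
print(surface.shape)
```

With an additive model like this one, moving along one axis changes the surface by the same amount regardless of the other feature's value; an interaction between the two features would show up as contours that bend.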