In [1]:
%run Latex_macros.ipynb
%run beautify_plots.py

In [2]:
# My standard magic !  You will see this in almost all my notebooks.

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# Reload all modules imported with %aimport
%load_ext autoreload
%autoreload 1

%matplotlib inline

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


# Feature Importance

Given the $n$ features in $\x$, which are the "most important"?

The multiple trees in a Random Forest offer several ways to answer this question.

## Importance: Decrease in Impurity

Recall that the question used to split the examples at a node is chosen
to maximize Information Gain.

One way to measure the importance of $\x_j$ is the average decrease in impurity it produces when it is used in a question.

- For each feature $\x_j$
    - find each node $\node{n}$ in *any* tree in the forest with question $(j, v)$ for *any* $v$
        - compute the information gain of the split on $(j, v)$ 
    - average the information gain across all such nodes
    
That is: how much does impurity decrease when $\x_j$ is used in a question?
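
The averaging procedure above can be sketched in a few lines. This is only an illustration, not the notebook's own implementation: it assumes impurity is measured by entropy, and it represents each split node by a hypothetical tuple `(j, left_labels, right_labels)` pooled across all trees in the forest. The names `entropy`, `information_gain`, and `impurity_importance` are invented for this sketch.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = len(parent)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - children

def impurity_importance(split_nodes):
    """Average information gain per feature over every node that splits on it.

    split_nodes: iterable of (j, left_labels, right_labels), pooled across
    all trees in the forest (a hypothetical representation for this sketch).
    """
    gains = defaultdict(list)
    for j, left, right in split_nodes:
        gains[j].append(information_gain(left + right, left, right))
    return {j: sum(g) / len(g) for j, g in gains.items()}

# Toy forest: two nodes split on feature 0, one node splits on feature 1.
nodes = [
    (0, [0, 0, 0, 0], [1, 1, 1, 1]),   # perfect split
    (0, [0, 0, 1, 1], [0, 0, 1, 1]),   # uninformative split
    (1, [0, 0], [1, 1]),               # perfect split
]
importance = impurity_importance(nodes)
```

In the toy data, feature 0 is averaged over one perfect and one uninformative split, while feature 1 appears only in a perfect split, so feature 1 receives the higher score.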

- This is a biased method
    - Recall the universe of possible values of $\x_j$ is $V_j$
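    - Larger $| V_j |$ means there are more candidate questions $(j, v)$, so $\x_j$ is more likely to appear in a question
        - e.g., when $\x_j$ is a continuous variable that has been made discrete
    - So $\x_j$ will appear in more questions, even when it carries little information about the target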
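
A small experiment can make the source of this bias concrete. The sketch below is illustrative only and rests on several assumptions: the target is pure noise, `fine` is a synthetic feature with many distinct values, and `coarse` is the same feature binned into 4 values. The helper `best_gain` is a hypothetical name. Because every question available on `coarse` is also available on `fine`, the best information gain found on `fine` can never be lower.

```python
import random
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_gain(values, labels):
    """Return (number of candidate thresholds, best information gain) for one feature."""
    distinct = sorted(set(values))
    thresholds = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    n, parent = len(labels), entropy(labels)
    best = 0.0
    for v in thresholds:
        left = [y for x, y in zip(values, labels) if x <= v]
        right = [y for x, y in zip(values, labels) if x > v]
        gain = parent - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        best = max(best, gain)
    return len(thresholds), best

rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(200)]   # target unrelated to the feature
fine = [rng.random() for _ in range(200)]          # many distinct values
coarse = [int(x * 4) for x in fine]                # same feature, binned into 4 values

n_fine, gain_fine = best_gain(fine, labels)
n_coarse, gain_coarse = best_gain(coarse, labels)
print(f"fine:   {n_fine} candidate questions, best gain {gain_fine:.4f}")
print(f"coarse: {n_coarse} candidate questions, best gain {gain_coarse:.4f}")
```

Neither version of the feature is related to the target, yet the version with more distinct values has more chances to find a split that looks good by accident.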


## Importance: Permutation importance

Let's consider building one tree from bootstrapped sample $S$.

Create another sample $S'$, derived from $S$ by *permuting* the values of $\x_j$.
- maintains the unconditional distribution of $\x_j$
- breaks the correlation of $\x_j$ with the target and other features

We can now measure the importance of $\x_j$ as
- the difference in out-of-bag accuracy between the tree built from $S$ and the tree built from $S'$.

That is, if $\x_j$ is unimportant, then permuting its values should have little effect on accuracy.
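
The procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: a one-question "stump" chosen by training accuracy stands in for a full tree, the data are synthetic with only feature 0 related to the target, and the names `fit_stump`, `accuracy`, and `permutation_importance` are hypothetical.

```python
import random

def fit_stump(X, y):
    """Fit a one-question tree: pick (j, v, left label) with the best training accuracy.

    A simplified stand-in for a full decision tree, to keep the sketch short.
    """
    best_acc, best_rule = -1.0, None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for a, b in zip(values, values[1:]):
            v = (a + b) / 2
            for left_label in (0, 1):
                pred = [left_label if row[j] <= v else 1 - left_label for row in X]
                acc = sum(p == t for p, t in zip(pred, y)) / len(y)
                if acc > best_acc:
                    best_acc, best_rule = acc, (j, v, left_label)
    return best_rule

def accuracy(rule, X, y):
    """Fraction of examples the stump classifies correctly."""
    j, v, left_label = rule
    pred = [left_label if row[j] <= v else 1 - left_label for row in X]
    return sum(p == t for p, t in zip(pred, y)) / len(y)

def permutation_importance(X, y, in_bag, out_of_bag, j, rng):
    """OOB accuracy of the tree built from S minus that of the tree built from S'."""
    S_X = [X[i] for i in in_bag]
    S_y = [y[i] for i in in_bag]
    oob_X = [X[i] for i in out_of_bag]
    oob_y = [y[i] for i in out_of_bag]

    # S': same examples as S, but the values of feature j are permuted
    column = [row[j] for row in S_X]
    rng.shuffle(column)
    S_prime_X = [row[:j] + [c] + row[j + 1:] for row, c in zip(S_X, column)]

    base = accuracy(fit_stump(S_X, S_y), oob_X, oob_y)
    permuted = accuracy(fit_stump(S_prime_X, S_y), oob_X, oob_y)
    return base - permuted

rng = random.Random(0)
n = 200
X = [[rng.random(), rng.random()] for _ in range(n)]
y = [int(row[0] > 0.5) for row in X]               # only feature 0 matters

in_bag = [rng.randrange(n) for _ in range(n)]      # bootstrapped sample S
out_of_bag = sorted(set(range(n)) - set(in_bag))   # examples never drawn into S

importance = [permutation_importance(X, y, in_bag, out_of_bag, j, rng) for j in range(2)]
print(importance)
```

Permuting feature 1 leaves the stump unchanged, because the stump never asks a question about it, so its importance is zero. Permuting feature 0 removes the only useful question, so out-of-bag accuracy is expected to fall sharply.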

<table>
    <tr>
        <th><center>Permutation Importance, feature j</center></th>
    </tr>
    <tr>
        <td><img src=images/Permutation_importance.png width=800></td>
    </tr>
</table>

Permutation importance also has issues
- may be biased if $\x_j$ is strongly correlated with another feature $\x_{j'}$

In that case $\x_{j'}$ may compensate for the permuted $\x_j$, making $\x_j$ seem unimportant.

In [4]:
print("Done")

Done
