# Feature Importance in NimbusML
It is often desirable not only to obtain predictions from a machine learning model, but also to get some sort of 'explanation': why did the model make this prediction? Which features affected the prediction the most?

This might be especially relevant in cases with business or regulatory requirements to have explainable decisions, for example explaining the most important factors for a credit application being denied.

In addition, this information helps the experimenter to understand the model better, check for overfitting, and verify the quality of features. NimbusML provides mechanisms for model analysis that provide both model-wide and example-level feature importances.


### Model-wide Analysis: Permutation Feature Importance (PFI)
Permutation Feature Importance (PFI) is a technique that measures how much each feature 'matters' to the predictions. It asks: how much does the quality of the model's predictions change if we randomly permute the values of one feature across the evaluation set? If the quality doesn't change much, the feature is not very important. If the quality drops drastically, the feature is very important. NimbusML implements PFI as the `permutation_feature_importance()` method of the `Pipeline()` object and of individual prediction estimators.
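
The idea can be illustrated with a minimal, standard-library-only sketch. It is not NimbusML's implementation: the helper names (`accuracy`, `permutation_importance`), the toy data, and the choice of accuracy as the metric are all assumptions made for illustration.

```python
import random
import statistics


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, labels, n_features,
                           permutation_count=5, seed=0):
    """Return (mean change, standard error) of accuracy for each feature.

    Hypothetical helper: `predict` maps one row (a list of values) to a label.
    """
    rng = random.Random(seed)
    baseline = accuracy(labels, [predict(r) for r in rows])
    results = []
    for j in range(n_features):
        changes = []
        for _ in range(permutation_count):
            # Shuffle only column j; every other column stays untouched.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            permuted_score = accuracy(labels, [predict(r) for r in permuted])
            changes.append(permuted_score - baseline)
        mean_change = statistics.mean(changes)
        std_err = statistics.stdev(changes) / permutation_count ** 0.5
        results.append((mean_change, std_err))
    return results


# Toy data (made up): the label depends only on feature 0, feature 1 is noise.
data_rng = random.Random(1)
rows = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(200)]
labels = [int(r[0] > 0) for r in rows]


def predict(row):
    # Stand-in for a trained model that has learned the rule exactly.
    return int(row[0] > 0)


importances = permutation_importance(predict, rows, labels, n_features=2)
for j, (mean_change, std_err) in enumerate(importances):
    print(f"feature {j}: mean change {mean_change:+.3f} (std err {std_err:.3f})")
```

In this toy setup, permuting feature 0 lowers accuracy noticeably, while permuting feature 1 leaves it unchanged, because the stand-in model never looks at feature 1.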


### Example-level Analysis: Feature Contributions
Example-level feature importances explain which features were most important when making a *specific* prediction. When predictions are made on a dataset, a score is produced for each example. For classification, this score is converted to a probability to make a prediction; for regression, the score is the prediction itself. To understand and explain these predictions, it can be useful to inspect which features influenced them most.

The `get_feature_contributions()` method of the NimbusML `Pipeline()` object and of individual prediction estimators computes per-feature contributions to the score for each example. These contributions can be positive (they make the score higher) or negative (they make the score lower). Feature contributions are implemented for **linear and tree models** in NimbusML.

## Tutorial
The following tutorial shows how to use model-level and example-level feature importances in NimbusML, using the UCI Adult Income dataset as an example. The dataset poses a binary classification problem in which the label indicates whether an individual's income is over $50,000.

#### Loading Data

In [1]:
import os
from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.ensemble import FastTreesBinaryClassifier
from nimbusml.feature_extraction.categorical import OneHotVectorizer
from nimbusml.linear_model import LogisticRegressionBinaryClassifier
from nimbusml.preprocessing.schema import ColumnSelector

In [2]:
train_path = get_dataset('uciadult_train').as_filepath()
test_path = get_dataset('uciadult_test').as_filepath()
print("Train data file path: " + str(os.path.basename(train_path)))
print("Test data file path: " + str(os.path.basename(test_path)))

train_data = FileDataStream.read_csv(train_path)
test_data = FileDataStream.read_csv(test_path)

train_data.head()

Train data file path: train-500.uciadult.sample.csv
Test data file path: test-100.uciadult.sample.csv


Unnamed: 0,label,workclass,education,marital-status,occupation,relationship,ethnicity,sex,native-country-region,age,fnlwgt,education-num,capital-gain,capital-loss,hours-per-week
0,0,Private,11th,Never-married,Machine-op-inspct,Own-child,Black,Male,United-States,25,226802,7,0,0,40
1,0,Private,HS-grad,Married-civ-spouse,Farming-fishing,Husband,White,Male,United-States,38,89814,9,0,0,50
2,1,Local-gov,Assoc-acdm,Married-civ-spouse,Protective-serv,Husband,White,Male,United-States,28,336951,12,0,0,40
3,1,Private,Some-college,Married-civ-spouse,Machine-op-inspct,Husband,Black,Male,United-States,44,160323,10,7688,0,40
4,0,?,Some-college,Never-married,?,Own-child,White,Female,United-States,18,103497,10,0,0,30


#### Train linear and tree binary classifiers

In [3]:
%%capture
# suppress training output, as it is not relevant to a discussion of feature importances

feature_columns = ['age', 'capital-gain', 'hours-per-week',
                   'education', 'marital-status', 'ethnicity', 'sex']

cat = OneHotVectorizer(columns=['education', 'marital-status', 'ethnicity', 'sex'])

linear_clf = LogisticRegressionBinaryClassifier(feature=feature_columns, label='label')
linear_model = Pipeline(steps=[cat, linear_clf])
linear_model.fit(train_data)

tree_clf = FastTreesBinaryClassifier(feature=feature_columns, label='label')
tree_model = Pipeline(steps=[cat, tree_clf])
tree_model.fit(train_data)

#### Permutation Feature Importance (PFI)
Evaluate PFI for the linear model on the test data to get feature importance when making predictions. The training data can be used similarly to analyze important features during training.

Here, we permute each of the `Features.*` columns 5 times and report the mean change in each metric, along with the standard error of the mean. Note that the most important features can differ from one metric to another. It is up to the user to decide which metric(s) they care about most and to look at the PFI for that metric.

Let's look at the most important features with respect to Area Under ROC Curve (AUC). Since a higher AUC is better, the features whose permutation decreases AUC the most are the most important.

In [4]:
pfi = linear_model.permutation_feature_importance(test_data, permutation_count=5)
pfi.sort_values('AreaUnderRocCurve').head()

Unnamed: 0,FeatureName,AreaUnderRocCurve,AreaUnderRocCurve.StdErr,Accuracy,Accuracy.StdErr,PositivePrecision,PositivePrecision.StdErr,PositiveRecall,PositiveRecall.StdErr,NegativePrecision,NegativePrecision.StdErr,NegativeRecall,NegativeRecall.StdErr,F1Score,F1Score.StdErr,AreaUnderPrecisionRecallCurve,AreaUnderPrecisionRecallCurve.StdErr
19,marital-status.Married-civ-spouse,-0.153399,0.019996,-0.042,0.005831,-0.060563,0.048983,-0.2,0.01559,-0.041178,0.003361,0.007895,0.005263,-0.23131,0.022871,-0.239532,0.043835
18,marital-status.Never-married,-0.047752,0.011941,-0.022,0.008,-0.024848,0.039564,-0.108333,0.02826,-0.022613,0.006113,0.005263,0.003223,-0.116835,0.03617,-0.107423,0.028333
1,capital-gain,-0.022643,0.002877,-0.016,0.002449,-0.041616,0.013287,-0.058333,0.010206,-0.013252,0.002082,-0.002632,0.002632,-0.064925,0.010544,-0.078153,0.022393
11,education.Masters,-0.016941,0.003446,-0.02,0.003162,-0.061616,0.018499,-0.066667,0.010206,-0.015474,0.002194,-0.005263,0.003223,-0.07669,0.011168,-0.053875,0.007807
12,education.Doctorate,-0.013268,0.001894,-0.012,0.002,-0.032727,0.014545,-0.041667,0.0,-0.009638,0.0004,-0.002632,0.002632,-0.046387,0.002689,-0.044245,0.016611
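
One way to read these numbers is to compare each mean change with its standard error. The sketch below rebuilds a small pandas DataFrame from the `AreaUnderRocCurve` columns shown above and flags features whose AUC drop exceeds two standard errors. The two-standard-error threshold is a rule of thumb assumed here, not something NimbusML prescribes, and the names `pfi_sample` and `auc_drop_in_stderrs` are hypothetical.

```python
import pandas as pd

# Values copied from the AreaUnderRocCurve columns of the PFI output above.
pfi_sample = pd.DataFrame({
    "FeatureName": [
        "marital-status.Married-civ-spouse",
        "marital-status.Never-married",
        "capital-gain",
        "education.Masters",
        "education.Doctorate",
    ],
    "AreaUnderRocCurve": [-0.153399, -0.047752, -0.022643, -0.016941, -0.013268],
    "AreaUnderRocCurve.StdErr": [0.019996, 0.011941, 0.002877, 0.003446, 0.001894],
})

# Size of the AUC change measured in standard errors (assumed heuristic).
pfi_sample["auc_drop_in_stderrs"] = (
    pfi_sample["AreaUnderRocCurve"].abs() / pfi_sample["AreaUnderRocCurve.StdErr"]
)

# Keep features whose drop is larger than two standard errors.
clear_drops = pfi_sample[pfi_sample["auc_drop_in_stderrs"] > 2]
print(clear_drops.sort_values("AreaUnderRocCurve")[["FeatureName", "auc_drop_in_stderrs"]])
```

With only 5 permutations per feature, the standard errors are themselves rough estimates, so a check like this is best treated as a sanity filter rather than a significance test.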


#### Example-level Feature Contributions (Linear Models)
Let's look at feature contributions for individual predictions on the test data using the linear model. For linear models, each feature's contribution to the score is the product of the feature's value and its corresponding weight.
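
As a minimal sketch of that product rule, consider a linear score with hypothetical weights, bias, and feature values (none of them come from the model trained in this tutorial). The values NimbusML reports may be scaled differently, so this only illustrates the principle.

```python
# Hypothetical weights, bias, and one example; not taken from the trained model.
weights = {"age": 0.5, "capital-gain": -2.0, "hours-per-week": 0.25}
bias = -1.0
example = {"age": 2.0, "capital-gain": 1.0, "hours-per-week": 4.0}

# Contribution of each feature: feature value times its weight.
contributions = {name: weights[name] * example[name] for name in weights}

# The linear score is the bias plus the sum of all contributions.
score = bias + sum(contributions.values())

for name, value in contributions.items():
    print(f"{name}: {value:+.2f}")
print(f"score: {score:+.2f}")
```

Positive contributions push the score up and negative ones push it down, which matches the sign interpretation described earlier.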

In [5]:
linear_fc = linear_model.get_feature_contributions(test_data)
linear_fc.filter(regex='label|PredictedLabel|Score|Probability|FeatureContributions').head()

Unnamed: 0,label,PredictedLabel,Score,Probability,FeatureContributions.age,FeatureContributions.capital-gain,FeatureContributions.hours-per-week,FeatureContributions.education.11th,FeatureContributions.education.HS-grad,FeatureContributions.education.Assoc-acdm,...,FeatureContributions.marital-status.Separated,FeatureContributions.marital-status.Married-spouse-absent,FeatureContributions.marital-status.Married-AF-spouse,FeatureContributions.ethnicity.Black,FeatureContributions.ethnicity.White,FeatureContributions.ethnicity.Asian-Pac-Islander,FeatureContributions.ethnicity.Other,FeatureContributions.ethnicity.Amer-Indian-Inuit,FeatureContributions.sex.Male,FeatureContributions.sex.Female
0,0,0,-4.047609,0.017164,0.030594,0.0,0.360155,-0.59735,0.0,0.0,...,0.0,0.0,0.0,0.059523,0.0,0.0,0.0,0.0,0.05366,0.0
1,0,0,-0.463503,0.386155,0.02975,0.0,0.288005,0.0,-0.181008,0.0,...,0.0,0.0,0.0,0.0,0.148458,0.0,0.0,0.0,0.034328,0.0
2,1,0,-0.059127,0.485223,0.021921,0.0,0.230404,0.0,0.0,0.112218,...,0.0,0.0,0.0,0.0,0.148458,0.0,0.0,0.0,0.034328,0.0
3,1,0,-0.618038,0.350228,0.034447,0.072675,0.230404,0.0,0.0,0.0,...,0.0,0.0,0.0,0.038079,0.0,0.0,0.0,0.0,0.034328,0.0
4,0,0,-3.723567,0.023578,0.022028,0.0,0.270116,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.232061,0.0,0.0,0.0,0.0,-0.054896
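
The `Score` and `Probability` columns in the five rows above appear consistent with the standard logistic (sigmoid) function. The sketch below checks this using only values copied from that output. It is an observation about these rows, not a statement about how NimbusML computes probabilities in general, and the tree model's `Probability` values later in this tutorial do not follow the same formula.

```python
import math


def sigmoid(score):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-score))


# (Score, Probability) pairs copied from the linear-model output above.
linear_rows = [
    (-4.047609, 0.017164),
    (-0.463503, 0.386155),
    (-0.059127, 0.485223),
    (-0.618038, 0.350228),
    (-3.723567, 0.023578),
]

# Largest absolute gap between sigmoid(Score) and the reported Probability.
max_gap = max(abs(sigmoid(score) - prob) for score, prob in linear_rows)
print(f"largest gap: {max_gap:.1e}")
```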


#### Example-level Feature Contributions (Tree Models)
Feature contributions for tree models are determined by which splits in the decision trees have the most impact on the final score. The calculation evaluates the score we would have gotten *if we had taken the opposite split* every time we encountered a given feature. The contribution of this feature is then given by the difference between this score and the original score.
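
The opposite-split idea can be sketched on a single toy decision tree. Everything here is hypothetical: the `Leaf` and `Node` classes, the tree itself, and the example values. The sign convention (original score minus the opposite-split score) is also an assumption, and a real model would combine many trees.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    value: float


@dataclass
class Node:
    feature: str
    threshold: float
    left: Union["Node", Leaf]   # taken when example[feature] <= threshold
    right: Union["Node", Leaf]  # taken otherwise


def score(tree, example, flip_feature=None):
    """Walk the tree; take the opposite branch at every split on flip_feature."""
    while isinstance(tree, Node):
        go_left = example[tree.feature] <= tree.threshold
        if tree.feature == flip_feature:
            go_left = not go_left
        tree = tree.left if go_left else tree.right
    return tree.value


def split_features(tree):
    """Collect the names of all features used in splits."""
    if isinstance(tree, Leaf):
        return set()
    return {tree.feature} | split_features(tree.left) | split_features(tree.right)


def feature_contributions(tree, example):
    original = score(tree, example)
    # Assumed sign convention: original score minus the opposite-split score.
    return {
        name: original - score(tree, example, flip_feature=name)
        for name in sorted(split_features(tree))
    }


# Toy tree (made up): split on age first, then on hours-per-week.
toy_tree = Node(
    feature="age", threshold=30.0,
    left=Leaf(-1.0),
    right=Node(
        feature="hours-per-week", threshold=40.0,
        left=Leaf(0.5),
        right=Leaf(2.0),
    ),
)

example = {"age": 45.0, "hours-per-week": 50.0}
tree_contributions = feature_contributions(toy_tree, example)
print(score(toy_tree, example), tree_contributions)
```

For this example the original score is 2.0. Taking the opposite branch at the `age` split leads to the leaf with value -1.0, and taking the opposite branch at the `hours-per-week` split leads to the leaf with value 0.5, so under the assumed convention both features contribute positively.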

In [6]:
tree_fc = tree_model.get_feature_contributions(test_data)
tree_fc.filter(regex='label|PredictedLabel|Score|Probability|FeatureContributions').head()

Unnamed: 0,label,PredictedLabel,Score,Probability,FeatureContributions.age,FeatureContributions.capital-gain,FeatureContributions.hours-per-week,FeatureContributions.education.11th,FeatureContributions.education.HS-grad,FeatureContributions.education.Assoc-acdm,...,FeatureContributions.marital-status.Separated,FeatureContributions.marital-status.Married-spouse-absent,FeatureContributions.marital-status.Married-AF-spouse,FeatureContributions.ethnicity.Black,FeatureContributions.ethnicity.White,FeatureContributions.ethnicity.Asian-Pac-Islander,FeatureContributions.ethnicity.Other,FeatureContributions.ethnicity.Amer-Indian-Inuit,FeatureContributions.sex.Male,FeatureContributions.sex.Female
0,0,0,-25.577993,3.6e-05,-0.72882,-1.0,-0.367933,-0.162649,-0.027093,0.0,...,-0.154559,0.0,0.0,0.111279,0.0,0.0,0.0,0.0,0.0,-0.03548
1,0,0,-10.135474,0.017054,-1.0,-0.668069,-0.490714,0.225786,-0.101992,0.0,...,-0.111768,0.0,0.0,-0.096452,0.069662,0.0,0.0,0.0,0.009991,-0.013131
2,1,1,2.394207,0.722658,0.892277,-0.850473,-0.321354,0.745953,0.117656,0.0,...,-0.172352,0.0,0.0,-0.150833,0.756565,0.0,0.0,0.0,0.0,0.0
3,1,1,18.900896,0.99948,0.23666,1.0,0.417579,0.437051,0.0,0.0,...,0.0,0.0,0.0,0.335541,0.0,0.0,0.0,0.0,-0.00597,-0.007641
4,0,0,-26.024494,3e-05,-1.0,-0.90215,-0.912701,0.162933,0.114411,0.0,...,-0.138418,0.0,0.0,-0.031655,0.040154,0.0,0.0,0.0,0.0,0.0
