Data Source:

De Cock, D. (2011). Ames, Iowa: Alternative to the Boston Housing Data as an End of Semester Regression Project. Journal of Statistics Education, 19(3). https://doi.org/10.1080/10691898.2011.11889627

In [2]:
from pyreal.sample_applications import ames_housing



In this tutorial, we use Pyreal to generate feature contribution explanations for the Ames Housing dataset.

First, we load the data. Pyreal expects all data as pandas DataFrames whose column names are the feature names.

In [3]:
x_orig, y = ames_housing.load_data(include_targets=True)
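As a rough illustration of the expected input format, the sketch below builds a small DataFrame by hand. The feature column names mirror those used in this tutorial, but the `Id` values and the second row are hypothetical and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy rows; only the column names mirror the Ames Housing features.
toy_x = pd.DataFrame(
    {
        "Id": [0, 1],
        "MSSubClass": [20, 60],
        "LotFrontage": [75.0, 80.0],
        "LotArea": [9937, 11250],
    }
)

# Each column is one feature, addressed by name rather than by position.
feature_names = [col for col in toy_x.columns if col != "Id"]
print(feature_names)

# A single row, selected the same way as x_orig.iloc[0] later in the tutorial.
first_row = toy_x.iloc[0]
print(first_row["LotArea"])
```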

Next, we load in the interpretable feature descriptions.

In [4]:
feature_descriptions = ames_housing.load_feature_descriptions()
feature_descriptions

{'MSSubClass': ' Identifies the type of dwelling involved in the sale.',
 'MSZoning': ' Identifies the general zoning classification of the sale.',
 'LotFrontage': ' Linear feet of street connected to property',
 'LotArea': ' Lot size in square feet',
 'Street': ' Type of road access to property',
 'Alley': ' Type of alley access to property',
 'LotShape': ' General shape of property',
 'LandContour': ' Flatness of the property',
 'Utilities': ' Type of utilities available',
 'LotConfig': ' Lot configuration',
 'LandSlope': ' Slope of property',
 'Neighborhood': ' Physical locations within Ames city limits',
 'Condition1': ' Proximity to various conditions',
 'Condition2': ' Proximity to various conditions (if more than one is present)',
 'BldgType': ' Type of dwelling',
 'HouseStyle': ' Style of dwelling',
 'OverallQual': ' Rates the overall material and finish of the house',
 'OverallCond': ' Rates the overall condition of the house',
 'YearBuilt': ' Original construction date',
 'Ye
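The descriptions act as display labels for the raw column names. The sketch below shows one plausible way to look up a label, using two entries copied from the dictionary above. The helper `readable_name` is hypothetical and is not part of Pyreal; stripping the leading space and falling back to the raw name are assumptions made for illustration.

```python
# Two entries copied from the dictionary above (note the leading spaces).
descriptions = {
    "LotArea": " Lot size in square feet",
    "Street": " Type of road access to property",
}


def readable_name(feature, descriptions):
    """Return the stripped description, falling back to the raw feature name."""
    return descriptions.get(feature, feature).strip()


# "Id" has no description here, so it keeps its raw name.
labels = [readable_name(name, descriptions) for name in ["LotArea", "Street", "Id"]]
print(labels)
```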

Next, we load the transformers.

The first transformer manually imputes the data based on information we know about the dataset.
We will call this the `AmesHousingImputer`.

This imputation code comes from https://www.kaggle.com/juliencs/a-study-on-regression-applied-to-the-ames-dataset

The second transformer is a `OneHotEncoder`, which one-hot encodes the data.

In [5]:
transformers = ames_housing.load_transformers()
transformers

[<pyreal.sample_applications.ames_housing.AmesHousingImputer at 0x174caadeb30>,
 <pyreal.transformers.one_hot_encode.OneHotEncoder at 0x174caadeb90>]
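To make the two steps concrete, here is a minimal sketch of imputation followed by one-hot encoding, written with plain pandas. It is not Pyreal's implementation: the functions `impute_alley` and `one_hot_encode` are hypothetical, the toy rows are made up, and a missing `Alley` value is assumed here to mean that the property has no alley access.

```python
import pandas as pd

# Hypothetical toy data; a missing Alley is assumed to mean "no alley access".
toy = pd.DataFrame(
    {"Alley": ["Grvl", None, "Pave"], "LotArea": [9937, 10125, 8450]}
)


def impute_alley(df):
    """Fill missing Alley values with an explicit 'None' category."""
    out = df.copy()
    out["Alley"] = out["Alley"].fillna("None")
    return out


def one_hot_encode(df, columns):
    """Replace each listed categorical column with one indicator column per category."""
    return pd.get_dummies(df, columns=columns)


# Apply the steps in order, the way a list of transformers would be applied.
transformed = one_hot_encode(impute_alley(toy), columns=["Alley"])
print(list(transformed.columns))
```

The order matters in this sketch: imputing first means the missing value becomes its own category instead of producing a row with no indicator set.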

In [6]:
model = ames_housing.load_model()
model

Now, we can initialize a RealApp object.

In [7]:
from pyreal import RealApp

realApp = RealApp(
    model,
    X_train_orig=x_orig,
    y_train=y,
    transformers=transformers,
    feature_descriptions=feature_descriptions,
    id_column="Id",
)

We can make predictions using RealApp objects.

In [8]:
realApp.predict(x_orig.iloc[0:10])


{0: 157585.2499694071,
 1: 208451.08625945353,
 2: 200785.2076056447,
 3: 208661.747407954,
 4: 166794.2494034586,
 5: 285660.2014755854,
 6: 156554.26930558967,
 7: 279133.15592026466,
 8: 225810.47104341377,
 9: 138028.17155712727}

We can also generate different types of explanations, starting with the feature contributions for a single house.

In [9]:
realApp.produce_feature_contributions(x_orig.iloc[0])

Index,Feature Name,Feature Value,Contribution,Average/Mode
0,Identifies the type of dwelling involved in t...,20,1719.226755,20
1,Linear feet of street connected to property,75.0,219.032356,75.0
2,Lot size in square feet,9937,-1529.914407,9937
3,Rates the overall material and finish of the ...,5,-10743.763013,5
4,Rates the overall condition of the house,6,2565.738373,6
...,...,...,...,...
74,Type of roof,Gable,-273.27385,Gable
75,Condition of sale,Normal,1302.65723,Normal
76,Type of sale,WD,-2609.610771,WD
77,Type of road access to property,Pave,0.0,Pave
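A common way to read a table like this is to rank features by the magnitude of their contribution. The sketch below does this in plain Python for a few rows copied from the output above. It assumes that contributions are expressed in the same units as the prediction, which the tutorial does not state explicitly.

```python
# (description, contribution) pairs copied from rows shown in the output above.
contributions = [
    ("Lot size in square feet", -1529.914407),
    ("Rates the overall material and finish of the house", -10743.763013),
    ("Rates the overall condition of the house", 2565.738373),
    ("Type of sale", -2609.610771),
    ("Type of road access to property", 0.0),
]

# Sort by magnitude so large negative contributions rank as high as large positive ones.
ranked = sorted(contributions, key=lambda item: abs(item[1]), reverse=True)

for name, value in ranked[:3]:
    direction = "raises" if value > 0 else "lowers"
    print(f"{name}: {direction} the prediction by about {abs(value):,.0f}")
```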


In [10]:
realApp.produce_similar_examples(x_orig.iloc[0], num_examples=2, fast=True)

{'X':       Identifies the type of dwelling involved in the sale.  \
 0                                                   20        
 202                                                 20        
 
      Identifies the general zoning classification of the sale.  \
 0                                                   RL           
 202                                                 RL           
 
       Linear feet of street connected to property   Lot size in square feet  \
 0                                            75.0                      9937   
 202                                          75.0                     10125   
 
      Type of road access to property  Type of alley access to property  \
 0                               Pave                               NaN   
 202                             Pave                               NaN   
 
      General shape of property  Flatness of the property  \
 0                          Reg                       Lvl   
 202   
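The idea behind similar examples can be sketched as a nearest-neighbor search. The example below is not Pyreal's implementation and does not model the `fast=True` option. The function `nearest` is hypothetical, and only the values for rows 0 and 202 come from the output above; the other rows are made up.

```python
import math

# Rows keyed by row index: (LotFrontage, LotArea).
# Rows 0 and 202 use values from the output above; 17 and 305 are hypothetical.
rows = {
    0: (75.0, 9937.0),
    17: (60.0, 7200.0),
    202: (75.0, 10125.0),
    305: (90.0, 14000.0),
}


def nearest(query_id, rows, num_examples):
    """Return the ids of the rows closest to the query by Euclidean distance.

    The query row itself is included, since its distance to itself is zero.
    """
    query = rows[query_id]
    by_distance = sorted(rows, key=lambda row_id: math.dist(rows[row_id], query))
    return by_distance[:num_examples]


similar_ids = nearest(0, rows, num_examples=2)
print(similar_ids)
```

In practice the features would usually be scaled before computing distances, because otherwise a large-valued column such as lot area dominates the result.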

In [11]:
realApp.produce_feature_importance()

Index,Feature Name,Importance
Identifies the type of dwelling involved in the sale.,Identifies the type of dwelling involved in t...,1783.873890
Linear feet of street connected to property,Linear feet of street connected to property,349.504024
Lot size in square feet,Lot size in square feet,3205.578462
Rates the overall material and finish of the house,Rates the overall material and finish of the ...,8240.334718
Rates the overall condition of the house,Rates the overall condition of the house,4576.497005
...,...,...
Type of roof,Type of roof,4487.693167
Condition of sale,Condition of sale,2432.473278
Type of sale,Type of sale,5139.319727
Type of road access to property,Type of road access to property,126.182607
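One common way to derive a global importance score from local contributions is to average their absolute values across rows. Whether Pyreal computes importance this way is an assumption; the sketch below only illustrates the general technique, with hypothetical numbers.

```python
from statistics import mean

# Hypothetical per-row contributions for two features (not taken from the dataset).
local_contributions = {
    "LotArea": [-1500.0, 2500.0, -500.0],
    "Street": [0.0, 100.0, -200.0],
}

# Mean absolute contribution: the sign is dropped so that positive and
# negative effects do not cancel each other out.
importance = {
    feature: mean(abs(value) for value in values)
    for feature, values in local_contributions.items()
}
print(importance)
```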


In [12]:
realApp.explainers["lfc"]

{'shap': <pyreal.explainers.lfc.local_feature_contribution.LocalFeatureContribution at 0x174caadff40>}
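The `'shap'` key suggests that these contributions are SHAP values. A useful property of SHAP values is additivity: the contributions for a row sum to the difference between that row's prediction and a baseline prediction. The sketch below checks this for a hypothetical linear model with independent features, where the contribution of each feature is its weight times the feature's deviation from its mean. The weights, means, and input row are made up, and this is not how Pyreal or the SHAP library computes values for general models.

```python
# Hypothetical linear model: price = intercept + sum(weight * value).
intercept = 50_000.0
weights = {"LotArea": 2.0, "OverallQual": 20_000.0}
feature_means = {"LotArea": 10_000.0, "OverallQual": 6.0}


def predict(row):
    """Prediction of the hypothetical linear model for one row."""
    return intercept + sum(weights[name] * row[name] for name in weights)


def linear_contributions(row):
    """Contribution of each feature, relative to that feature's mean value."""
    return {
        name: weights[name] * (row[name] - feature_means[name]) for name in weights
    }


row = {"LotArea": 9937.0, "OverallQual": 5.0}
contribs = linear_contributions(row)

# The baseline is the prediction for a row with every feature at its mean.
baseline = predict(feature_means)

# Additivity: baseline plus contributions reproduces the prediction.
reconstructed = baseline + sum(contribs.values())
print(contribs, reconstructed, predict(row))
```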