Data Source:

De Cock, D. (2011). Ames, Iowa: Alternative to the Boston Housing Data as an End of Semester Regression Project. Journal of Statistics Education, 19(3). https://doi.org/10.1080/10691898.2011.11889627

In [1]:
from pyreal.sample_applications import ames_housing

In this tutorial, we use Pyreal to generate feature contribution explanations for the Ames Housing dataset.
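For intuition about what a feature contribution is, here is a minimal sketch for a linear model, where each feature's contribution can be taken as its weight times its deviation from the training average. The weights, averages, and intercept are made up for illustration, and this is not necessarily the algorithm Pyreal uses internally.

```python
# Illustrative only: a hypothetical linear model, not the Ames Housing model.
weights = {"LotArea": 2.0, "OverallQual": 10000.0}  # made-up coefficients
means = {"LotArea": 10000.0, "OverallQual": 6.0}    # made-up training averages
intercept = 50000.0


def predict(row):
    """Linear prediction: intercept plus the weighted sum of feature values."""
    return intercept + sum(weights[name] * row[name] for name in weights)


def linear_contributions(row):
    """Contribution of each feature: weight * (value - training average)."""
    return {name: weights[name] * (row[name] - means[name]) for name in weights}


row = {"LotArea": 9000.0, "OverallQual": 7.0}
contributions = linear_contributions(row)
baseline = predict(means)  # prediction for an "average" house

# Contributions are additive: baseline + sum(contributions) equals the prediction.
assert abs(baseline + sum(contributions.values()) - predict(row)) < 1e-9
print(contributions)
```

A positive contribution means the feature value pushed the prediction above the baseline; a negative one pushed it below.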

First, we load the data. Pyreal expects all data as pandas DataFrames whose column names are the feature names.

In [2]:
x_orig, y = ames_housing.load_data(include_targets=True)
x_orig

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,...,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition
0,0,20,RL,75.0,9937,Pave,,Reg,Lvl,AllPub,...,0,0,,,,0,6,2008,WD,Normal
1,1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,...,0,0,,,,0,2,2008,WD,Normal
2,2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,...,0,0,,,,0,5,2007,WD,Normal
3,3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,...,0,0,,,,0,9,2008,WD,Normal
4,4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,...,0,0,,,,0,2,2006,WD,Abnorml
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1455,1455,20,FV,62.0,7500,Pave,Pave,Reg,Lvl,AllPub,...,0,0,,,,0,10,2009,WD,Normal
1456,1456,60,RL,62.0,7917,Pave,,Reg,Lvl,AllPub,...,0,0,,,,0,8,2007,WD,Normal
1457,1457,20,RL,85.0,13175,Pave,,Reg,Lvl,AllPub,...,0,0,,MnPrv,,0,2,2010,WD,Normal
1458,1458,70,RL,66.0,9042,Pave,,Reg,Lvl,AllPub,...,0,0,,GdPrv,Shed,2500,5,2010,WD,Normal


Next, we load in the interpretable feature descriptions.

In [3]:
feature_descriptions = ames_housing.load_feature_descriptions()
feature_descriptions

{'MSSubClass': ' Identifies the type of dwelling involved in the sale.',
 'MSZoning': ' Identifies the general zoning classification of the sale.',
 'LotFrontage': ' Linear feet of street connected to property',
 'LotArea': ' Lot size in square feet',
 'Street': ' Type of road access to property',
 'Alley': ' Type of alley access to property',
 'LotShape': ' General shape of property',
 'LandContour': ' Flatness of the property',
 'Utilities': ' Type of utilities available',
 'LotConfig': ' Lot configuration',
 'LandSlope': ' Slope of property',
 'Neighborhood': ' Physical locations within Ames city limits',
 'Condition1': ' Proximity to various conditions',
 'Condition2': ' Proximity to various conditions (if more than one is present)',
 'BldgType': ' Type of dwelling',
 'HouseStyle': ' Style of dwelling',
 'OverallQual': ' Rates the overall material and finish of the house',
 'OverallCond': ' Rates the overall condition of the house',
 'YearBuilt': ' Original construction date',
 'Ye
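The feature descriptions are a plain dictionary from column name to readable text, so they can be used for lookups directly. The sketch below copies two entries from the output above (note the leading spaces) and uses a hypothetical helper, `describe`, that is not part of Pyreal.

```python
# Two entries copied from the output above; the values start with a space.
feature_descriptions = {
    "LotFrontage": " Linear feet of street connected to property",
    "LotArea": " Lot size in square feet",
}


def describe(column, descriptions):
    """Hypothetical helper: return a trimmed description, or the column name if none exists."""
    return descriptions.get(column, column).strip()


print(describe("LotArea", feature_descriptions))
print(describe("Id", feature_descriptions))
```

Falling back to the column name keeps columns without a description, such as `Id`, readable.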

Next, we load the transformers.

The first transformer manually imputes missing values based on what we know about the dataset.
We will call this the `AmesHousingImputer`.

This imputation code comes from https://www.kaggle.com/juliencs/a-study-on-regression-applied-to-the-ames-dataset
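As a rough illustration of knowledge-based imputation, the sketch below fills missing values with per-column defaults, on the assumption that a missing `Alley` or `PoolQC` means the house has no alley access or no pool. The fill values and the `impute_row` helper are assumptions for illustration; the actual `AmesHousingImputer` may apply different rules.

```python
import math

# Assumed rules for illustration; the real AmesHousingImputer may differ.
FILL_VALUES = {"Alley": "None", "PoolQC": "No", "Fence": "No", "LotFrontage": 0.0}


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_row(row, fill_values=FILL_VALUES):
    """Return a copy of row with missing values replaced by per-column defaults."""
    return {
        col: fill_values.get(col, val) if is_missing(val) else val
        for col, val in row.items()
    }


row = {"Alley": float("nan"), "PoolQC": None, "Fence": "MnPrv", "LotArea": 13175}
imputed = impute_row(row)
print(imputed)
```

Columns without a rule are left unchanged, so a missing value with no known meaning stays visible rather than being silently filled.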

In [4]:
transformers = ames_housing.load_transformers()
transformers

[<pyreal.sample_applications.ames_housing.AmesHousingImputer at 0x1ff966caca0>,
 <pyreal.transformers.one_hot_encode.OneHotEncoder at 0x1ffde3d0640>]
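The second transformer in the list is a one-hot encoder. The sketch below shows the general idea of one-hot encoding in plain Python, using `LotShape` values from the table above. The `one_hot_encode` function and the output column naming are illustrative and do not reproduce Pyreal's `OneHotEncoder`.

```python
def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 column per observed category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every column except the one being encoded.
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(row[column] == category)
        encoded.append(new_row)
    return encoded


rows = [{"Id": 0, "LotShape": "Reg"}, {"Id": 3, "LotShape": "IR1"}]
encoded = one_hot_encode(rows, "LotShape")
print(encoded)
```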

In [5]:
model = ames_housing.load_model()
model

Now, we can initialize a RealApp object.

In [6]:
from pyreal import RealApp

realApp = RealApp(model,
                  x_orig,
                  y_train=y,
                  transformers=transformers,
                  feature_descriptions=feature_descriptions,
                  id_column="Id")

We can make predictions using RealApp objects.

In [7]:
realApp.predict(x_orig.iloc[0:10])


{0: 157585.24996940728,
 1: 208451.08625945327,
 2: 200785.20760564407,
 3: 208661.74740795366,
 4: 166794.24940345713,
 5: 285660.2014755842,
 6: 156554.26930558987,
 7: 279133.15592026524,
 8: 225810.4710434134,
 9: 138028.17155712598}
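The result is an ordinary dictionary keyed by row ID, so standard dictionary operations apply. A minimal sketch, using three values rounded from the output above:

```python
# Values rounded from the prediction output above.
predictions = {0: 157585.25, 1: 208451.09, 5: 285660.20}

# ID of the house with the highest predicted price.
highest_id = max(predictions, key=predictions.get)

# Format each prediction as a whole-dollar string.
formatted = {house_id: f"${price:,.0f}" for house_id, price in predictions.items()}
print(highest_id, formatted)
```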

In [8]:
realApp.produce_feature_contributions(x_orig.iloc[0:10])


{0:                                          Feature Name Feature Value  \
 0    Identifies the type of dwelling involved in t...            20   
 1         Linear feet of street connected to property          75.0   
 2                             Lot size in square feet          9937   
 3    Rates the overall material and finish of the ...             5   
 4            Rates the overall condition of the house             6   
 ..                                                ...           ...   
 74                                       Pool quality           NaN   
 75                                      Fence quality           NaN   
 76   Miscellaneous feature not covered in other ca...           NaN   
 77                                       Type of sale            WD   
 78                                  Condition of sale        Normal   
 
     Contribution Average/Mode  
 0    2400.592543         47.0  
 1     177.749488    71.444444  
 2   -1893.089348      10374.8  