**Description of the Ames Iowa Housing Data columns:**

SOURCE: https://rdrr.io/cran/AmesHousing/man/ames_raw.html

Order: Observation number

PID: Parcel identification number - can be used with city web site for parcel review.

MS SubClass: Identifies the type of dwelling involved in the sale.

MS Zoning: Identifies the general zoning classification of the sale.

Lot Frontage: Linear feet of street connected to property

Lot Area: Lot size in square feet

Street: Type of road access to property

Alley: Type of alley access to property

Lot Shape: General shape of property

Land Contour: Flatness of the property

Utilities: Type of utilities available

Lot Config: Lot configuration

Land Slope: Slope of property

Neighborhood: Physical locations within Ames city limits (map available)

Condition 1: Proximity to various conditions

Condition 2: Proximity to various conditions (if more than one is present)

Bldg Type: Type of dwelling

House Style: Style of dwelling

Overall Qual: Rates the overall material and finish of the house

Overall Cond: Rates the overall condition of the house

Year Built: Original construction date

Year Remod/Add: Remodel date (same as construction date if no remodeling or additions)

Roof Style: Type of roof

Roof Matl: Roof material

Exterior 1: Exterior covering on house

Exterior 2: Exterior covering on house (if more than one material)

Mas Vnr Type: Masonry veneer type

Mas Vnr Area: Masonry veneer area in square feet

Exter Qual: Evaluates the quality of the material on the exterior

Exter Cond: Evaluates the present condition of the material on the exterior

Foundation: Type of foundation

Bsmt Qual: Evaluates the height of the basement

Bsmt Cond: Evaluates the general condition of the basement

Bsmt Exposure: Refers to walkout or garden level walls

BsmtFin Type 1: Rating of basement finished area

BsmtFin SF 1: Type 1 finished square feet

BsmtFinType 2: Rating of basement finished area (if multiple types)

BsmtFin SF 2: Type 2 finished square feet

Bsmt Unf SF: Unfinished square feet of basement area

Total Bsmt SF: Total square feet of basement area

Heating: Type of heating

HeatingQC: Heating quality and condition

Central Air: Central air conditioning

Electrical: Electrical system

1st Flr SF: First Floor square feet

2nd Flr SF: Second floor square feet

Low Qual Fin SF: Low quality finished square feet (all floors)

Gr Liv Area: Above grade (ground) living area square feet

Bsmt Full Bath: Basement full bathrooms

Bsmt Half Bath: Basement half bathrooms

Full Bath: Full bathrooms above grade

Half Bath: Half baths above grade

Bedroom: Bedrooms above grade (does NOT include basement bedrooms)

Kitchen: Kitchens above grade

KitchenQual: Kitchen quality

TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)

Functional: Home functionality (Assume typical unless deductions are warranted)

Fireplaces: Number of fireplaces

FireplaceQu: Fireplace quality

Garage Type: Garage location

Garage Yr Blt: Year garage was built

Garage Finish: Interior finish of the garage

Garage Cars: Size of garage in car capacity

Garage Area: Size of garage in square feet

Garage Qual: Garage quality

Garage Cond: Garage condition

Paved Drive: Paved driveway

Wood Deck SF: Wood deck area in square feet

Open Porch SF: Open porch area in square feet

Enclosed Porch: Enclosed porch area in square feet

3-Ssn Porch: Three season porch area in square feet

Screen Porch: Screen porch area in square feet

Pool Area: Pool area in square feet

Pool QC: Pool quality

Fence: Fence quality

Misc Feature: Miscellaneous feature not covered in other categories

Misc Val: $Value of miscellaneous feature

Mo Sold: Month Sold

Yr Sold: Year Sold

Sale Type: Type of sale

Sale Condition: Condition of sale

**Load packages:**

In [1]:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

%config InlineBackend.figure_format = 'retina'
%matplotlib inline


**Load the data:**

In [2]:
ames_file_train = '../data/train.csv'


In [3]:
df = pd.read_csv(ames_file_train)


**Describe the basic format:**

In [4]:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)


In [5]:
df.dtypes


Id                   int64
PID                  int64
MS SubClass          int64
MS Zoning           object
Lot Frontage       float64
Lot Area             int64
Street              object
Alley               object
Lot Shape           object
Land Contour        object
Utilities           object
Lot Config          object
Land Slope          object
Neighborhood        object
Condition 1         object
Condition 2         object
Bldg Type           object
House Style         object
Overall Qual         int64
Overall Cond         int64
Year Built           int64
Year Remod/Add       int64
Roof Style          object
Roof Matl           object
Exterior 1st        object
Exterior 2nd        object
Mas Vnr Type        object
Mas Vnr Area       float64
Exter Qual          object
Exter Cond          object
Foundation          object
Bsmt Qual           object
Bsmt Cond           object
Bsmt Exposure       object
BsmtFin Type 1      object
BsmtFin SF 1       float64
BsmtFin Type 2      object
B

In [6]:
df.head(5)


Unnamed: 0,Id,PID,MS SubClass,MS Zoning,Lot Frontage,Lot Area,Street,Alley,Lot Shape,Land Contour,Utilities,Lot Config,Land Slope,Neighborhood,Condition 1,Condition 2,Bldg Type,House Style,Overall Qual,Overall Cond,Year Built,Year Remod/Add,Roof Style,Roof Matl,Exterior 1st,Exterior 2nd,Mas Vnr Type,Mas Vnr Area,Exter Qual,Exter Cond,Foundation,Bsmt Qual,Bsmt Cond,Bsmt Exposure,BsmtFin Type 1,BsmtFin SF 1,BsmtFin Type 2,BsmtFin SF 2,Bsmt Unf SF,Total Bsmt SF,Heating,Heating QC,Central Air,Electrical,1st Flr SF,2nd Flr SF,Low Qual Fin SF,Gr Liv Area,Bsmt Full Bath,Bsmt Half Bath,Full Bath,Half Bath,Bedroom AbvGr,Kitchen AbvGr,Kitchen Qual,TotRms AbvGrd,Functional,Fireplaces,Fireplace Qu,Garage Type,Garage Yr Blt,Garage Finish,Garage Cars,Garage Area,Garage Qual,Garage Cond,Paved Drive,Wood Deck SF,Open Porch SF,Enclosed Porch,3Ssn Porch,Screen Porch,Pool Area,Pool QC,Fence,Misc Feature,Misc Val,Mo Sold,Yr Sold,Sale Type,SalePrice
0,109,533352170,60,RL,,13517,Pave,,IR1,Lvl,AllPub,CulDSac,Gtl,Sawyer,RRAe,Norm,1Fam,2Story,6,8,1976,2005,Gable,CompShg,HdBoard,Plywood,BrkFace,289.0,Gd,TA,CBlock,TA,TA,No,GLQ,533.0,Unf,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
1,544,531379050,60,RL,43.0,11492,Pave,,IR1,Lvl,AllPub,CulDSac,Gtl,SawyerW,Norm,Norm,1Fam,2Story,7,5,1996,1997,Gable,CompShg,VinylSd,VinylSd,BrkFace,132.0,Gd,TA,PConc,Gd,TA,No,GLQ,637.0,Unf,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
2,153,535304180,20,RL,68.0,7922,Pave,,Reg,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,1Story,5,7,1953,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,Gd,CBlock,TA,TA,No,GLQ,731.0,Unf,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
3,318,916386060,60,RL,73.0,9802,Pave,,Reg,Lvl,AllPub,Inside,Gtl,Timber,Norm,Norm,1Fam,2Story,5,5,2006,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,Unf,0.0,Unf,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
4,255,906425045,50,RL,82.0,14235,Pave,,IR1,Lvl,AllPub,Inside,Gtl,SawyerW,Norm,Norm,1Fam,1.5Fin,6,8,1900,1993,Gable,CompShg,Wd Sdng,Plywood,,0.0,TA,TA,PConc,Fa,Gd,No,Unf,0.0,Unf,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [7]:
df.shape


(2051, 81)

**Re-organize data:**

In [8]:
ames_file_train = '../data/train.csv'


In [9]:
df = pd.read_csv(ames_file_train, index_col="Id")


In [10]:
df = df.sort_values(by='Id')


In [11]:
df.index = df.index.set_names([''])


In [12]:
df.head(5)

Unnamed: 0,PID,MS SubClass,MS Zoning,Lot Frontage,Lot Area,Street,Alley,Lot Shape,Land Contour,Utilities,Lot Config,Land Slope,Neighborhood,Condition 1,Condition 2,Bldg Type,House Style,Overall Qual,Overall Cond,Year Built,Year Remod/Add,Roof Style,Roof Matl,Exterior 1st,Exterior 2nd,Mas Vnr Type,Mas Vnr Area,Exter Qual,Exter Cond,Foundation,Bsmt Qual,Bsmt Cond,Bsmt Exposure,BsmtFin Type 1,BsmtFin SF 1,BsmtFin Type 2,BsmtFin SF 2,Bsmt Unf SF,Total Bsmt SF,Heating,Heating QC,Central Air,Electrical,1st Flr SF,2nd Flr SF,Low Qual Fin SF,Gr Liv Area,Bsmt Full Bath,Bsmt Half Bath,Full Bath,Half Bath,Bedroom AbvGr,Kitchen AbvGr,Kitchen Qual,TotRms AbvGrd,Functional,Fireplaces,Fireplace Qu,Garage Type,Garage Yr Blt,Garage Finish,Garage Cars,Garage Area,Garage Qual,Garage Cond,Paved Drive,Wood Deck SF,Open Porch SF,Enclosed Porch,3Ssn Porch,Screen Porch,Pool Area,Pool QC,Fence,Misc Feature,Misc Val,Mo Sold,Yr Sold,Sale Type,SalePrice
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1.0,526301100.0,20.0,RL,141.0,31770.0,Pave,,IR1,Lvl,AllPub,Corner,Gtl,NAmes,Norm,Norm,1Fam,1Story,6.0,5.0,1960.0,1960.0,Hip,CompShg,BrkFace,Plywood,Stone,112.0,TA,TA,CBlock,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,TA,TA,P,210.0,62.0,0.0,0.0,0.0,0.0,,,,0.0,5.0,2010.0,WD,215000.0
3.0,526351010.0,20.0,RL,81.0,14267.0,Pave,,IR1,Lvl,AllPub,Corner,Gtl,NAmes,Norm,Norm,1Fam,1Story,6.0,6.0,1958.0,1958.0,Hip,CompShg,Wd Sdng,Wd Sdng,BrkFace,108.0,TA,TA,CBlock,TA,TA,No,ALQ,923.0,Unf,0.0,406.0,1329.0,GasA,TA,Y,SBrkr,1329.0,0.0,0.0,1329.0,0.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1958.0,Unf,1.0,312.0,TA,TA,Y,393.0,36.0,0.0,0.0,0.0,0.0,,,Gar2,12500.0,6.0,2010.0,WD,172000.0
5.0,527105010.0,60.0,RL,74.0,13830.0,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Gilbert,Norm,Norm,1Fam,2Story,5.0,5.0,1997.0,1998.0,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,GLQ,791.0,Unf,0.0,137.0,928.0,GasA,Gd,Y,SBrkr,928.0,701.0,0.0,1629.0,0.0,0.0,2.0,1.0,3.0,1.0,TA,6.0,Typ,1.0,TA,Attchd,1997.0,Fin,2.0,482.0,TA,TA,Y,212.0,34.0,0.0,0.0,0.0,0.0,,MnPrv,,0.0,3.0,2010.0,WD,189900.0
8.0,527145080.0,120.0,RL,43.0,5005.0,Pave,,IR1,HLS,AllPub,Inside,Gtl,StoneBr,Norm,Norm,TwnhsE,1Story,8.0,5.0,1992.0,1992.0,Gable,CompShg,HdBoard,HdBoard,,0.0,Gd,TA,PConc,Gd,TA,No,ALQ,263.0,Unf,0.0,1017.0,1280.0,GasA,Ex,Y,SBrkr,1280.0,0.0,0.0,1280.0,0.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,0.0,,Attchd,1992.0,RFn,2.0,506.0,TA,TA,Y,0.0,82.0,0.0,0.0,144.0,0.0,,,,0.0,1.0,2010.0,WD,191500.0
9.0,527146030.0,120.0,RL,39.0,5389.0,Pave,,IR1,Lvl,AllPub,Inside,Gtl,StoneBr,Norm,Norm,TwnhsE,1Story,8.0,5.0,1995.0,1996.0,Gable,CompShg,CemntBd,CmentBd,,0.0,Gd,TA,PConc,Gd,TA,No,GLQ,1180.0,Unf,0.0,415.0,1595.0,GasA,Ex,Y,SBrkr,1616.0,0.0,0.0,1616.0,1.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,1.0,TA,Attchd,1995.0,RFn,2.0,608.0,TA,TA,Y,237.0,152.0,0.0,0.0,0.0,0.0,,,,0.0,3.0,2010.0,WD,236500.0


In [13]:
df.shape


(2051, 80)

**Drop unwanted columns:**

* SAVE AS MUCH OF THE DATA AS YOU CAN; ONLY DROP COLUMNS THAT MODELS WILL NOT NEED (PID).

In [14]:
df.drop(columns=['PID'], inplace=True)


In [15]:
df.head(5)

Unnamed: 0,MS SubClass,MS Zoning,Lot Frontage,Lot Area,Street,Alley,Lot Shape,Land Contour,Utilities,Lot Config,Land Slope,Neighborhood,Condition 1,Condition 2,Bldg Type,House Style,Overall Qual,Overall Cond,Year Built,Year Remod/Add,Roof Style,Roof Matl,Exterior 1st,Exterior 2nd,Mas Vnr Type,Mas Vnr Area,Exter Qual,Exter Cond,Foundation,Bsmt Qual,Bsmt Cond,Bsmt Exposure,BsmtFin Type 1,BsmtFin SF 1,BsmtFin Type 2,BsmtFin SF 2,Bsmt Unf SF,Total Bsmt SF,Heating,Heating QC,Central Air,Electrical,1st Flr SF,2nd Flr SF,Low Qual Fin SF,Gr Liv Area,Bsmt Full Bath,Bsmt Half Bath,Full Bath,Half Bath,Bedroom AbvGr,Kitchen AbvGr,Kitchen Qual,TotRms AbvGrd,Functional,Fireplaces,Fireplace Qu,Garage Type,Garage Yr Blt,Garage Finish,Garage Cars,Garage Area,Garage Qual,Garage Cond,Paved Drive,Wood Deck SF,Open Porch SF,Enclosed Porch,3Ssn Porch,Screen Porch,Pool Area,Pool QC,Fence,Misc Feature,Misc Val,Mo Sold,Yr Sold,Sale Type,SalePrice
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1.0,20.0,RL,141.0,31770.0,Pave,,IR1,Lvl,AllPub,Corner,Gtl,NAmes,Norm,Norm,1Fam,1Story,6.0,5.0,1960.0,1960.0,Hip,CompShg,BrkFace,Plywood,Stone,112.0,TA,TA,CBlock,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,TA,TA,P,210.0,62.0,0.0,0.0,0.0,0.0,,,,0.0,5.0,2010.0,WD,215000.0
3.0,20.0,RL,81.0,14267.0,Pave,,IR1,Lvl,AllPub,Corner,Gtl,NAmes,Norm,Norm,1Fam,1Story,6.0,6.0,1958.0,1958.0,Hip,CompShg,Wd Sdng,Wd Sdng,BrkFace,108.0,TA,TA,CBlock,TA,TA,No,ALQ,923.0,Unf,0.0,406.0,1329.0,GasA,TA,Y,SBrkr,1329.0,0.0,0.0,1329.0,0.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1958.0,Unf,1.0,312.0,TA,TA,Y,393.0,36.0,0.0,0.0,0.0,0.0,,,Gar2,12500.0,6.0,2010.0,WD,172000.0
5.0,60.0,RL,74.0,13830.0,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Gilbert,Norm,Norm,1Fam,2Story,5.0,5.0,1997.0,1998.0,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,GLQ,791.0,Unf,0.0,137.0,928.0,GasA,Gd,Y,SBrkr,928.0,701.0,0.0,1629.0,0.0,0.0,2.0,1.0,3.0,1.0,TA,6.0,Typ,1.0,TA,Attchd,1997.0,Fin,2.0,482.0,TA,TA,Y,212.0,34.0,0.0,0.0,0.0,0.0,,MnPrv,,0.0,3.0,2010.0,WD,189900.0
8.0,120.0,RL,43.0,5005.0,Pave,,IR1,HLS,AllPub,Inside,Gtl,StoneBr,Norm,Norm,TwnhsE,1Story,8.0,5.0,1992.0,1992.0,Gable,CompShg,HdBoard,HdBoard,,0.0,Gd,TA,PConc,Gd,TA,No,ALQ,263.0,Unf,0.0,1017.0,1280.0,GasA,Ex,Y,SBrkr,1280.0,0.0,0.0,1280.0,0.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,0.0,,Attchd,1992.0,RFn,2.0,506.0,TA,TA,Y,0.0,82.0,0.0,0.0,144.0,0.0,,,,0.0,1.0,2010.0,WD,191500.0
9.0,120.0,RL,39.0,5389.0,Pave,,IR1,Lvl,AllPub,Inside,Gtl,StoneBr,Norm,Norm,TwnhsE,1Story,8.0,5.0,1995.0,1996.0,Gable,CompShg,CemntBd,CmentBd,,0.0,Gd,TA,PConc,Gd,TA,No,GLQ,1180.0,Unf,0.0,415.0,1595.0,GasA,Ex,Y,SBrkr,1616.0,0.0,0.0,1616.0,1.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,1.0,TA,Attchd,1995.0,RFn,2.0,608.0,TA,TA,Y,237.0,152.0,0.0,0.0,0.0,0.0,,,,0.0,3.0,2010.0,WD,236500.0


**Determine missing observations:**

In [16]:
df.isnull().sum()

MS SubClass           0
MS Zoning             0
Lot Frontage        330
Lot Area              0
Street                0
Alley              1911
Lot Shape             0
Land Contour          0
Utilities             0
Lot Config            0
Land Slope            0
Neighborhood          0
Condition 1           0
Condition 2           0
Bldg Type             0
House Style           0
Overall Qual          0
Overall Cond          0
Year Built            0
Year Remod/Add        0
Roof Style            0
Roof Matl             0
Exterior 1st          0
Exterior 2nd          0
Mas Vnr Type         22
Mas Vnr Area         22
Exter Qual            0
Exter Cond            0
Foundation            0
Bsmt Qual            55
Bsmt Cond            55
Bsmt Exposure        58
BsmtFin Type 1       55
BsmtFin SF 1          1
BsmtFin Type 2       56
BsmtFin SF 2          1
Bsmt Unf SF           1
Total Bsmt SF         1
Heating               0
Heating QC            0
Central Air           0
Electrical      
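The full `isnull().sum()` listing is long, and most columns have nothing missing. A minimal sketch of a shorter view follows; the helper name `missing_summary` and the toy frame are assumptions made for illustration, not part of the original notebook. It keeps only the columns that have missing values and adds a percentage.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values, only for columns that have any."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    # Drop fully populated columns and list the worst offenders first.
    summary = summary[summary["missing"] > 0]
    return summary.sort_values("missing", ascending=False)


# Toy frame standing in for the Ames data (values are made up for the example).
toy = pd.DataFrame({
    "Lot Area": [31770, 14267, 13830, 5005],
    "Alley": [np.nan, "Pave", np.nan, np.nan],
    "Lot Frontage": [141.0, np.nan, 74.0, 43.0],
})
result = missing_summary(toy)
print(result)
```

On the real data the same call would be `missing_summary(df)`.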

**Make the column names searchable:**

In [17]:
df.columns = df.columns.str.replace('3','three')


In [18]:
df.columns = df.columns.str.replace('1st','first')


In [19]:
df.columns = df.columns.str.replace('/','_')


In [20]:
df.columns = df.columns.str.replace(' ','_')


In [21]:
df.columns = df.columns.str.lower()
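The five renaming cells above can also be written as one chain. This is a sketch only: the function name `clean_column_names` is hypothetical, and `regex=False` is passed explicitly so that every pattern is treated as a literal string whichever pandas version is installed.

```python
import pandas as pd


def clean_column_names(columns: pd.Index) -> pd.Index:
    """Apply the same replacements as the cells above, in the same order."""
    return (
        columns
        .str.replace("3", "three", regex=False)
        .str.replace("1st", "first", regex=False)
        .str.replace("/", "_", regex=False)
        .str.replace(" ", "_", regex=False)
        .str.lower()
    )


# A few of the original column names, used here as sample input.
sample = pd.Index(["MS SubClass", "Year Remod/Add", "1st Flr SF", "3Ssn Porch", "2nd Flr SF"])
cleaned = clean_column_names(sample)
print(list(cleaned))
```

Names that still begin with a digit, such as `2nd_flr_sf`, cannot be reached with attribute access (`df.2nd_flr_sf` is a syntax error) and need bracket access, `df["2nd_flr_sf"]`.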

In [22]:
df.dtypes


ms_subclass          int64
ms_zoning           object
lot_frontage       float64
lot_area             int64
street              object
alley               object
lot_shape           object
land_contour        object
utilities           object
lot_config          object
land_slope          object
neighborhood        object
condition_1         object
condition_2         object
bldg_type           object
house_style         object
overall_qual         int64
overall_cond         int64
year_built           int64
year_remod_add       int64
roof_style          object
roof_matl           object
exterior_first      object
exterior_2nd        object
mas_vnr_type        object
mas_vnr_area       float64
exter_qual          object
exter_cond          object
foundation          object
bsmt_qual           object
bsmt_cond           object
bsmt_exposure       object
bsmtfin_type_1      object
bsmtfin_sf_1       float64
bsmtfin_type_2      object
bsmtfin_sf_2       float64
bsmt_unf_sf        float64
t

**Analyze & Evaluate Columns:**

* I will try to predict housing prices; any data that could be useful in this task will be kept.

In [23]:
# COLUMN:		ms_subclass
# DEFINITION:	Identifies the type of dwelling involved in the sale.
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.ms_subclass.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING VALUES!


array([ 20,  60, 120, 160,  80,  50,  90,  30, 190,  70,  85,  75,  45,
       180,  40, 150])
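The column-by-column review below repeats the same three checks each time: data type, missing values, and unique values. A small helper can collect them for every column at once. This is a sketch; `summarize_column` and the toy frame are assumptions for illustration and do not appear in the original notebook.

```python
import numpy as np
import pandas as pd


def summarize_column(frame: pd.DataFrame, column: str) -> dict:
    """Collect the facts recorded by hand in each review cell."""
    series = frame[column]
    return {
        "column": column,
        "dtype": str(series.dtype),
        "missing": int(series.isnull().sum()),
        "n_unique": int(series.nunique(dropna=True)),
    }


# Toy frame standing in for the Ames data (values are made up for the example).
toy = pd.DataFrame({
    "ms_subclass": [20, 60, 120, 20],
    "alley": [np.nan, "Pave", "Grvl", np.nan],
})
report = pd.DataFrame([summarize_column(toy, name) for name in toy.columns])
print(report)
```

The written evaluation of each column still has to be made by hand; the helper only gathers the numbers it is based on.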

In [24]:
# COLUMN:		ms_zoning
# DEFINITION:	Identifies the general zoning classification of the sale.
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.ms_zoning.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO MISSING VALUES!


array(['RL', 'FV', 'RH', 'RM', 'C (all)', 'I (all)', 'A (agr)'],
      dtype=object)

In [25]:
ms_zoning = {'RL':'0', 'FV':'1', 'RH':'2', 'RM':'3', 'C (all)':'4', 'I (all)':'5', 'A (agr)':'6'}


In [26]:
df.ms_zoning = [ms_zoning[item] for item in df.ms_zoning] 


In [27]:
df.ms_zoning.unique()


array(['0', '1', '2', '3', '4', '5', '6'], dtype=object)

In [28]:
# np.int was removed in NumPy 1.24; the built-in int gives the same result.
df["ms_zoning"] = df["ms_zoning"].astype(int)


In [29]:
df['ms_zoning'].dtype


dtype('int64')
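One caveat with the 0 to 6 codes: the zoning classes have no natural order, but a linear model would read the codes as magnitudes (for example, treating 'RM' as three times 'FV'). A possible alternative, sketched here on a made-up toy frame rather than on `df`, is one-hot encoding with `pd.get_dummies`. The `zone` prefix is an arbitrary choice for the example.

```python
import pandas as pd

# Toy frame standing in for the Ames data (values are made up for the example).
toy = pd.DataFrame({
    "ms_zoning": ["RL", "FV", "RH", "RL"],
    "saleprice": [215000, 172000, 189900, 191500],
})

# One indicator column per zoning class; dtype=int gives 0/1 instead of booleans.
encoded = pd.get_dummies(toy, columns=["ms_zoning"], prefix="zone", dtype=int)
print(encoded)
```

Tree-based models are less sensitive to this issue, so the integer mapping above may still be adequate depending on the model chosen later.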

In [30]:
df.head(5)

Unnamed: 0,ms_subclass,ms_zoning,lot_frontage,lot_area,street,alley,lot_shape,land_contour,utilities,lot_config,land_slope,neighborhood,condition_1,condition_2,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,roof_style,roof_matl,exterior_first,exterior_2nd,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1.0,20.0,0.0,141.0,31770.0,Pave,,IR1,Lvl,AllPub,Corner,Gtl,NAmes,Norm,Norm,1Fam,1Story,6.0,5.0,1960.0,1960.0,Hip,CompShg,BrkFace,Plywood,Stone,112.0,TA,TA,CBlock,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,TA,TA,P,210.0,62.0,0.0,0.0,0.0,0.0,,,,0.0,5.0,2010.0,WD,215000.0
3.0,20.0,0.0,81.0,14267.0,Pave,,IR1,Lvl,AllPub,Corner,Gtl,NAmes,Norm,Norm,1Fam,1Story,6.0,6.0,1958.0,1958.0,Hip,CompShg,Wd Sdng,Wd Sdng,BrkFace,108.0,TA,TA,CBlock,TA,TA,No,ALQ,923.0,Unf,0.0,406.0,1329.0,GasA,TA,Y,SBrkr,1329.0,0.0,0.0,1329.0,0.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1958.0,Unf,1.0,312.0,TA,TA,Y,393.0,36.0,0.0,0.0,0.0,0.0,,,Gar2,12500.0,6.0,2010.0,WD,172000.0
5.0,60.0,0.0,74.0,13830.0,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Gilbert,Norm,Norm,1Fam,2Story,5.0,5.0,1997.0,1998.0,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,GLQ,791.0,Unf,0.0,137.0,928.0,GasA,Gd,Y,SBrkr,928.0,701.0,0.0,1629.0,0.0,0.0,2.0,1.0,3.0,1.0,TA,6.0,Typ,1.0,TA,Attchd,1997.0,Fin,2.0,482.0,TA,TA,Y,212.0,34.0,0.0,0.0,0.0,0.0,,MnPrv,,0.0,3.0,2010.0,WD,189900.0
8.0,120.0,0.0,43.0,5005.0,Pave,,IR1,HLS,AllPub,Inside,Gtl,StoneBr,Norm,Norm,TwnhsE,1Story,8.0,5.0,1992.0,1992.0,Gable,CompShg,HdBoard,HdBoard,,0.0,Gd,TA,PConc,Gd,TA,No,ALQ,263.0,Unf,0.0,1017.0,1280.0,GasA,Ex,Y,SBrkr,1280.0,0.0,0.0,1280.0,0.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,0.0,,Attchd,1992.0,RFn,2.0,506.0,TA,TA,Y,0.0,82.0,0.0,0.0,144.0,0.0,,,,0.0,1.0,2010.0,WD,191500.0
9.0,120.0,0.0,39.0,5389.0,Pave,,IR1,Lvl,AllPub,Inside,Gtl,StoneBr,Norm,Norm,TwnhsE,1Story,8.0,5.0,1995.0,1996.0,Gable,CompShg,CemntBd,CmentBd,,0.0,Gd,TA,PConc,Gd,TA,No,GLQ,1180.0,Unf,0.0,415.0,1595.0,GasA,Ex,Y,SBrkr,1616.0,0.0,0.0,1616.0,1.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,1.0,TA,Attchd,1995.0,RFn,2.0,608.0,TA,TA,Y,237.0,152.0,0.0,0.0,0.0,0.0,,,,0.0,3.0,2010.0,WD,236500.0


In [31]:
# COLUMN:		lot_frontage
# DEFINITION:	Linear feet of street connected to property
# DATA TYPE:	float64
# MISSING VALUES:	330
# UNIQUE VALUES:	
df.lot_frontage.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING VALUES, COLUMN TO BE DROPPED!


array([141.,  81.,  74.,  43.,  39.,  60.,  75.,  nan,  63.,  85.,  47.,
       140., 105.,  65.,  70.,  26.,  21.,  53.,  24., 102.,  98.,  95.,
        79., 100., 110.,  61.,  41.,  36.,  67., 108.,  59.,  92.,  58.,
        56.,  73.,  72.,  76.,  50.,  55.,  68., 107.,  25.,  30.,  57.,
        40.,  80.,  77.,  90.,  88., 120., 137., 119.,  78.,  71.,  87.,
        69.,  52.,  51.,  54.,  94.,  44.,  83.,  64.,  82.,  38.,  48.,
        89.,  66.,  35., 129.,  93.,  42.,  99.,  96., 104.,  97., 103.,
        34., 117.,  62., 174., 106.,  84., 128.,  91., 144., 122., 112.,
        86.,  45., 130., 109., 113., 125., 101.,  46., 114., 135.,  37.,
        22.,  32., 313.,  49., 124., 123., 150., 160., 195., 118., 134.,
       116., 138., 155., 115., 200., 111., 121.,  33., 153.])

In [32]:
df = df.drop(columns=['lot_frontage'])
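Dropping the column also discards the 1,721 frontage values that are present. If the column were kept instead, one common option would be to fill the gaps with the median frontage of the same neighborhood. The sketch below shows the idea on a made-up toy frame; the `lot_frontage_filled` column name is hypothetical, and the notebook itself does not take this route.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the Ames data (values are made up for the example).
toy = pd.DataFrame({
    "neighborhood": ["NAmes", "NAmes", "NAmes", "StoneBr", "StoneBr"],
    "lot_frontage": [141.0, 81.0, np.nan, 43.0, np.nan],
})

# Median frontage per neighborhood, aligned row by row with the original frame.
neighborhood_median = toy.groupby("neighborhood")["lot_frontage"].transform("median")

# Fill only the missing entries; observed values are left untouched.
toy["lot_frontage_filled"] = toy["lot_frontage"].fillna(neighborhood_median)
print(toy)
```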


In [33]:
df.head(5)

Unnamed: 0,ms_subclass,ms_zoning,lot_area,street,alley,lot_shape,land_contour,utilities,lot_config,land_slope,neighborhood,condition_1,condition_2,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,roof_style,roof_matl,exterior_first,exterior_2nd,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1.0,20.0,0.0,31770.0,Pave,,IR1,Lvl,AllPub,Corner,Gtl,NAmes,Norm,Norm,1Fam,1Story,6.0,5.0,1960.0,1960.0,Hip,CompShg,BrkFace,Plywood,Stone,112.0,TA,TA,CBlock,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,TA,TA,P,210.0,62.0,0.0,0.0,0.0,0.0,,,,0.0,5.0,2010.0,WD,215000.0
3.0,20.0,0.0,14267.0,Pave,,IR1,Lvl,AllPub,Corner,Gtl,NAmes,Norm,Norm,1Fam,1Story,6.0,6.0,1958.0,1958.0,Hip,CompShg,Wd Sdng,Wd Sdng,BrkFace,108.0,TA,TA,CBlock,TA,TA,No,ALQ,923.0,Unf,0.0,406.0,1329.0,GasA,TA,Y,SBrkr,1329.0,0.0,0.0,1329.0,0.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1958.0,Unf,1.0,312.0,TA,TA,Y,393.0,36.0,0.0,0.0,0.0,0.0,,,Gar2,12500.0,6.0,2010.0,WD,172000.0
5.0,60.0,0.0,13830.0,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Gilbert,Norm,Norm,1Fam,2Story,5.0,5.0,1997.0,1998.0,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,GLQ,791.0,Unf,0.0,137.0,928.0,GasA,Gd,Y,SBrkr,928.0,701.0,0.0,1629.0,0.0,0.0,2.0,1.0,3.0,1.0,TA,6.0,Typ,1.0,TA,Attchd,1997.0,Fin,2.0,482.0,TA,TA,Y,212.0,34.0,0.0,0.0,0.0,0.0,,MnPrv,,0.0,3.0,2010.0,WD,189900.0
8.0,120.0,0.0,5005.0,Pave,,IR1,HLS,AllPub,Inside,Gtl,StoneBr,Norm,Norm,TwnhsE,1Story,8.0,5.0,1992.0,1992.0,Gable,CompShg,HdBoard,HdBoard,,0.0,Gd,TA,PConc,Gd,TA,No,ALQ,263.0,Unf,0.0,1017.0,1280.0,GasA,Ex,Y,SBrkr,1280.0,0.0,0.0,1280.0,0.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,0.0,,Attchd,1992.0,RFn,2.0,506.0,TA,TA,Y,0.0,82.0,0.0,0.0,144.0,0.0,,,,0.0,1.0,2010.0,WD,191500.0
9.0,120.0,0.0,5389.0,Pave,,IR1,Lvl,AllPub,Inside,Gtl,StoneBr,Norm,Norm,TwnhsE,1Story,8.0,5.0,1995.0,1996.0,Gable,CompShg,CemntBd,CmentBd,,0.0,Gd,TA,PConc,Gd,TA,No,GLQ,1180.0,Unf,0.0,415.0,1595.0,GasA,Ex,Y,SBrkr,1616.0,0.0,0.0,1616.0,1.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,1.0,TA,Attchd,1995.0,RFn,2.0,608.0,TA,TA,Y,237.0,152.0,0.0,0.0,0.0,0.0,,,,0.0,3.0,2010.0,WD,236500.0


In [34]:
# COLUMN:		lot_area
# DEFINITION:	Lot size in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.lot_area.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING VALUES!


array([31770, 14267, 13830, ..., 17400,  7937,  8885])

In [35]:
# COLUMN:		alley
# DEFINITION:	Type of alley access to property
# DATA TYPE:	object
# MISSING VALUES:	1911
# UNIQUE VALUES:	
df.alley.unique()
# EVALUATION: AN ALLEY IS AN ADDITIONAL POINT OF ENTRY, COULD AFFECT PRICE. DATA IS MISSING BECAUSE THE LOT DOESN'T ABUT AN ALLEY.
# CHANGED DATA TO REFLECT WHETHER LOT HAS ALLEY.


array([nan, 'Pave', 'Grvl'], dtype=object)

In [37]:
# Flag alley access: 1 where the value is 'Pave' or 'Grvl', 0 where it is missing.
# A Python list has no .astype(), so the conversion is done on the Series itself.
df["alley"] = df["alley"].notna().astype(int)


In [42]:
df['alley'].dtype


dtype('int64')

In [43]:
df.alley.unique()


array([0, 1])

In [46]:
df.head(5)

Unnamed: 0,ms_subclass,ms_zoning,lot_area,street,alley,land_contour,utilities,lot_config,land_slope,neighborhood,condition_1,condition_2,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,roof_style,roof_matl,exterior_first,exterior_2nd,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1.0,20.0,0.0,31770.0,Pave,0.0,Lvl,AllPub,Corner,Gtl,NAmes,Norm,Norm,1Fam,1Story,6.0,5.0,1960.0,1960.0,Hip,CompShg,BrkFace,Plywood,Stone,112.0,TA,TA,CBlock,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,TA,TA,P,210.0,62.0,0.0,0.0,0.0,0.0,,,,0.0,5.0,2010.0,WD,215000.0
3.0,20.0,0.0,14267.0,Pave,0.0,Lvl,AllPub,Corner,Gtl,NAmes,Norm,Norm,1Fam,1Story,6.0,6.0,1958.0,1958.0,Hip,CompShg,Wd Sdng,Wd Sdng,BrkFace,108.0,TA,TA,CBlock,TA,TA,No,ALQ,923.0,Unf,0.0,406.0,1329.0,GasA,TA,Y,SBrkr,1329.0,0.0,0.0,1329.0,0.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1958.0,Unf,1.0,312.0,TA,TA,Y,393.0,36.0,0.0,0.0,0.0,0.0,,,Gar2,12500.0,6.0,2010.0,WD,172000.0
5.0,60.0,0.0,13830.0,Pave,0.0,Lvl,AllPub,Inside,Gtl,Gilbert,Norm,Norm,1Fam,2Story,5.0,5.0,1997.0,1998.0,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,GLQ,791.0,Unf,0.0,137.0,928.0,GasA,Gd,Y,SBrkr,928.0,701.0,0.0,1629.0,0.0,0.0,2.0,1.0,3.0,1.0,TA,6.0,Typ,1.0,TA,Attchd,1997.0,Fin,2.0,482.0,TA,TA,Y,212.0,34.0,0.0,0.0,0.0,0.0,,MnPrv,,0.0,3.0,2010.0,WD,189900.0
8.0,120.0,0.0,5005.0,Pave,0.0,HLS,AllPub,Inside,Gtl,StoneBr,Norm,Norm,TwnhsE,1Story,8.0,5.0,1992.0,1992.0,Gable,CompShg,HdBoard,HdBoard,,0.0,Gd,TA,PConc,Gd,TA,No,ALQ,263.0,Unf,0.0,1017.0,1280.0,GasA,Ex,Y,SBrkr,1280.0,0.0,0.0,1280.0,0.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,0.0,,Attchd,1992.0,RFn,2.0,506.0,TA,TA,Y,0.0,82.0,0.0,0.0,144.0,0.0,,,,0.0,1.0,2010.0,WD,191500.0
9.0,120.0,0.0,5389.0,Pave,0.0,Lvl,AllPub,Inside,Gtl,StoneBr,Norm,Norm,TwnhsE,1Story,8.0,5.0,1995.0,1996.0,Gable,CompShg,CemntBd,CmentBd,,0.0,Gd,TA,PConc,Gd,TA,No,GLQ,1180.0,Unf,0.0,415.0,1595.0,GasA,Ex,Y,SBrkr,1616.0,0.0,0.0,1616.0,1.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,1.0,TA,Attchd,1995.0,RFn,2.0,608.0,TA,TA,Y,237.0,152.0,0.0,0.0,0.0,0.0,,,,0.0,3.0,2010.0,WD,236500.0


In [38]:
# COLUMN:		lot_shape
# DEFINITION:	General shape of property
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.lot_shape.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, NO MISSING VALUES, COLUMN TO BE DROPPED!


array(['IR1', 'Reg', 'IR2', 'IR3'], dtype=object)

In [41]:
df = df.drop(columns=['lot_shape'])


In [43]:
df.head(5)

Unnamed: 0,ms_subclass,ms_zoning,lot_area,street,alley,land_contour,utilities,lot_config,land_slope,neighborhood,condition_1,condition_2,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,roof_style,roof_matl,exterior_first,exterior_2nd,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1.0,20.0,0.0,31770.0,Pave,0.0,Lvl,AllPub,Corner,Gtl,NAmes,Norm,Norm,1Fam,1Story,6.0,5.0,1960.0,1960.0,Hip,CompShg,BrkFace,Plywood,Stone,112.0,TA,TA,CBlock,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,TA,TA,P,210.0,62.0,0.0,0.0,0.0,0.0,,,,0.0,5.0,2010.0,WD,215000.0
3.0,20.0,0.0,14267.0,Pave,0.0,Lvl,AllPub,Corner,Gtl,NAmes,Norm,Norm,1Fam,1Story,6.0,6.0,1958.0,1958.0,Hip,CompShg,Wd Sdng,Wd Sdng,BrkFace,108.0,TA,TA,CBlock,TA,TA,No,ALQ,923.0,Unf,0.0,406.0,1329.0,GasA,TA,Y,SBrkr,1329.0,0.0,0.0,1329.0,0.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1958.0,Unf,1.0,312.0,TA,TA,Y,393.0,36.0,0.0,0.0,0.0,0.0,,,Gar2,12500.0,6.0,2010.0,WD,172000.0
5.0,60.0,0.0,13830.0,Pave,0.0,Lvl,AllPub,Inside,Gtl,Gilbert,Norm,Norm,1Fam,2Story,5.0,5.0,1997.0,1998.0,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,GLQ,791.0,Unf,0.0,137.0,928.0,GasA,Gd,Y,SBrkr,928.0,701.0,0.0,1629.0,0.0,0.0,2.0,1.0,3.0,1.0,TA,6.0,Typ,1.0,TA,Attchd,1997.0,Fin,2.0,482.0,TA,TA,Y,212.0,34.0,0.0,0.0,0.0,0.0,,MnPrv,,0.0,3.0,2010.0,WD,189900.0
8.0,120.0,0.0,5005.0,Pave,0.0,HLS,AllPub,Inside,Gtl,StoneBr,Norm,Norm,TwnhsE,1Story,8.0,5.0,1992.0,1992.0,Gable,CompShg,HdBoard,HdBoard,,0.0,Gd,TA,PConc,Gd,TA,No,ALQ,263.0,Unf,0.0,1017.0,1280.0,GasA,Ex,Y,SBrkr,1280.0,0.0,0.0,1280.0,0.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,0.0,,Attchd,1992.0,RFn,2.0,506.0,TA,TA,Y,0.0,82.0,0.0,0.0,144.0,0.0,,,,0.0,1.0,2010.0,WD,191500.0
9.0,120.0,0.0,5389.0,Pave,0.0,Lvl,AllPub,Inside,Gtl,StoneBr,Norm,Norm,TwnhsE,1Story,8.0,5.0,1995.0,1996.0,Gable,CompShg,CemntBd,CmentBd,,0.0,Gd,TA,PConc,Gd,TA,No,GLQ,1180.0,Unf,0.0,415.0,1595.0,GasA,Ex,Y,SBrkr,1616.0,0.0,0.0,1616.0,1.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,1.0,TA,Attchd,1995.0,RFn,2.0,608.0,TA,TA,Y,237.0,152.0,0.0,0.0,0.0,0.0,,,,0.0,3.0,2010.0,WD,236500.0


In [29]:
# COLUMN:		land_contour
# DEFINITION:	Flatness of the property
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.land_contour.unique()


array(['Lvl', 'HLS', 'Bnk', 'Low'], dtype=object)

In [30]:
# COLUMN:		utilities
# DEFINITION:	Type of utilities available
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.utilities.unique()


array(['AllPub', 'NoSewr', 'NoSeWa'], dtype=object)

In [31]:
# COLUMN:		lot_config
# DEFINITION:	Lot configuration
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.lot_config.unique()


array(['Corner', 'Inside', 'CulDSac', 'FR2', 'FR3'], dtype=object)

In [32]:
# COLUMN:		lot_config
# DEFINITION:	Lot configuration
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.lot_config.unique()


array(['Corner', 'Inside', 'CulDSac', 'FR2', 'FR3'], dtype=object)
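
The comment header repeated above each cell (column, definition, data type, missing values) could be generated rather than typed by hand, which would avoid copy-paste slips in the definitions. The following is a minimal sketch, not the notebook's actual method: `describe_column` and the `toy` frame are hypothetical names, and the toy values are illustrative rather than taken from the real data.

```python
import pandas as pd


def describe_column(df, column, definition):
    """Build the comment header used above each cell for one column (hypothetical helper)."""
    series = df[column]
    return (
        f"# COLUMN:\t\t{column}\n"
        f"# DEFINITION:\t{definition}\n"
        f"# DATA TYPE:\t{series.dtype}\n"
        f"# MISSING VALUES:\t{series.isna().sum()}\n"
        # dropna=False counts a missing value as one extra distinct value
        f"# UNIQUE VALUES:\t{series.nunique(dropna=False)}"
    )


# Tiny stand-in frame; the values are illustrative, not the real data.
toy = pd.DataFrame({"lot_config": ["Corner", "Inside", "Inside", None]})
header = describe_column(toy, "lot_config", "Lot configuration")
print(header)
```

Passing the definition in as an argument keeps the text tied to the column name in one place, so a mismatch between the two is easier to spot.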

In [33]:
# COLUMN:		land_slope
# DEFINITION:	Slope of property
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.land_slope.unique()


array(['Gtl', 'Mod', 'Sev'], dtype=object)

In [34]:
# COLUMN:		neighborhood
# DEFINITION:	Physical locations within Ames city limits
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.neighborhood.unique()


array(['NAmes', 'Gilbert', 'StoneBr', 'NWAmes', 'Somerst', 'BrDale',
       'NPkVill', 'NridgHt', 'Blmngtn', 'NoRidge', 'SawyerW', 'Sawyer',
       'Greens', 'BrkSide', 'OldTown', 'IDOTRR', 'ClearCr', 'SWISU',
       'Edwards', 'CollgCr', 'Crawfor', 'Blueste', 'Mitchel', 'Timber',
       'MeadowV', 'Veenker', 'GrnHill', 'Landmrk'], dtype=object)

In [35]:
# COLUMN:		condition_1
# DEFINITION:	Proximity to various conditions
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.condition_1.unique()


array(['Norm', 'RRAe', 'RRNe', 'Feedr', 'Artery', 'PosA', 'PosN', 'RRAn',
       'RRNn'], dtype=object)

In [36]:
# COLUMN:		condition_2
# DEFINITION:	Proximity to various conditions (if more than one is present)
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.condition_2.unique()


array(['Norm', 'Feedr', 'PosN', 'Artery', 'PosA', 'RRNn', 'RRAe', 'RRAn'],
      dtype=object)

In [37]:
# COLUMN:		bldg_type
# DEFINITION:	Type of dwelling
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.bldg_type.unique()


array(['1Fam', 'TwnhsE', 'Twnhs', 'Duplex', '2fmCon'], dtype=object)

In [38]:
# COLUMN:		house_style 
# DEFINITION:	Style of dwelling
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.house_style.unique()


array(['1Story', '2Story', 'SLvl', '1.5Fin', 'SFoyer', '2.5Unf', '1.5Unf',
       '2.5Fin'], dtype=object)

In [39]:
# COLUMN:		overall_qual
# DEFINITION:	Rates the overall material and finish of the house
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.overall_qual.unique()


array([ 6,  5,  8,  7,  4,  9,  3,  2, 10,  1])

In [40]:
# COLUMN:		overall_cond        
# DEFINITION:	Rates the overall condition of the house
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.overall_cond.unique()


array([5, 6, 7, 8, 2, 4, 9, 3, 1])

In [41]:
# COLUMN:		year_built
# DEFINITION:	Original construction date
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.year_built.unique()


array([1960, 1958, 1997, 1992, 1995, 1999, 1993, 1998, 1990, 1985, 2003,
       1951, 1978, 1977, 2000, 1970, 1971, 1968, 1975, 2009, 2007, 2005,
       2004, 2002, 2006, 2001, 1996, 1994, 2008, 1980, 1979, 1984, 1965,
       1967, 1962, 1974, 2010, 1976, 1988, 1963, 1959, 1966, 1964, 1949,
       1940, 1954, 1955, 1956, 1953, 1920, 1948, 1952, 1927, 1957, 1945,
       1929, 1923, 1928, 1900, 1915, 1910, 1885, 1922, 1950, 1939, 1942,
       1936, 1930, 1921, 1912, 1875, 1969, 1947, 1946, 1987, 1941, 1924,
       1989, 1896, 1991, 1972, 1981, 1973, 1961, 1916, 1925, 1890, 1935,
       1938, 1898, 1917, 1937, 1926, 1931, 1934, 1983, 1880, 1932, 1986,
       1905, 1914, 1872, 1893, 1911, 1895, 1982, 1879, 1901, 1918, 1913,
       1908, 1892, 1919])

In [42]:
# COLUMN:		year_remod_add      
# DEFINITION:	Remodel date (same as construction date if no remodeling or additions)
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.year_remod_add.unique()


array([1960, 1958, 1998, 1992, 1996, 1999, 1994, 2007, 1990, 1985, 2003,
       1951, 1988, 1977, 2000, 1970, 2008, 1968, 1971, 1975, 2010, 2005,
       2006, 2004, 2002, 2001, 1995, 2009, 1980, 1979, 1978, 1967, 1993,
       1963, 1959, 1966, 1964, 1950, 1954, 1972, 1956, 1955, 1952, 1962,
       1984, 1957, 1997, 1965, 1969, 1987, 1976, 1989, 1991, 1986, 1981,
       1974, 1973, 1961, 1983, 1953, 1982])

In [43]:
# COLUMN:		roof_style         
# DEFINITION:	Type of roof
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.roof_style.unique()


array(['Hip', 'Gable', 'Mansard', 'Flat', 'Gambrel', 'Shed'], dtype=object)

In [44]:
# COLUMN:		roof_matl                  
# DEFINITION:	Roof material
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.roof_matl.unique()


array(['CompShg', 'WdShngl', 'Tar&Grv', 'WdShake', 'Membran', 'ClyTile'],
      dtype=object)

In [45]:
# COLUMN:		exterior_first
# DEFINITION:	Exterior covering on house
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.exterior_first.unique()


array(['BrkFace', 'Wd Sdng', 'VinylSd', 'HdBoard', 'CemntBd', 'Plywood',
       'MetalSd', 'AsbShng', 'WdShing', 'Stucco', 'BrkComm', 'CBlock',
       'AsphShn', 'Stone', 'ImStucc'], dtype=object)

In [46]:
# COLUMN:		exterior_2nd                              
# DEFINITION:	Exterior covering on house (if more than one material)
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.exterior_2nd.unique()


array(['Plywood', 'Wd Sdng', 'VinylSd', 'HdBoard', 'CmentBd', 'Wd Shng',
       'MetalSd', 'ImStucc', 'Brk Cmn', 'AsbShng', 'BrkFace', 'Stucco',
       'CBlock', 'Stone', 'AsphShn'], dtype=object)

In [47]:
# COLUMN:		mas_vnr_type                                     
# DEFINITION:	Masonry veneer type
# DATA TYPE:	object
# MISSING VALUES:	22
# UNIQUE VALUES:	
df.mas_vnr_type.unique()


array(['Stone', 'BrkFace', 'None', nan, 'BrkCmn'], dtype=object)
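
The output above contains both the string `'None'` and a true missing value (`nan`), which are easy to conflate. As a rough sketch on made-up values (the `toy` series is hypothetical and not drawn from the data), the two cases can be counted separately:

```python
import pandas as pd

# Illustrative values only: one genuinely missing entry and two 'None' labels.
toy = pd.Series(["Stone", "None", None, "BrkFace", "None"], name="mas_vnr_type")

# dropna=False keeps the missing entry as its own row in the tally.
counts = toy.value_counts(dropna=False)

n_missing = int(toy.isna().sum())            # true missing values
n_none_label = int((toy == "None").sum())    # rows labelled with the string 'None'

print(counts)
print(n_missing, n_none_label)
```

Whether the missing entries should be folded into the `'None'` label is a modelling decision that this sketch does not make.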

In [48]:
# COLUMN:		mas_vnr_area                                          
# DEFINITION:	Masonry veneer area in square feet
# DATA TYPE:	float64
# MISSING VALUES:	22
# UNIQUE VALUES:	
df.mas_vnr_area.unique()


array([1.120e+02, 1.080e+02, 0.000e+00, 6.030e+02, 1.190e+02, 4.800e+02,
       1.800e+02, 5.040e+02, 3.810e+02, 1.620e+02, 2.000e+02, 2.260e+02,
       2.400e+02, 1.680e+02, 7.600e+02, 1.095e+03, 2.320e+02, 4.120e+02,
       1.780e+02, 1.060e+02, 1.600e+01,       nan, 1.650e+02, 3.380e+02,
       3.620e+02, 3.480e+02, 3.000e+01, 5.790e+02, 3.600e+01, 1.220e+02,
       3.100e+01, 2.500e+02, 1.200e+02, 2.160e+02, 4.320e+02, 2.890e+02,
       2.800e+01, 4.200e+01, 4.510e+02, 2.680e+02, 8.600e+01, 3.400e+02,
       1.100e+02, 1.640e+02, 3.610e+02, 5.060e+02, 1.500e+02, 2.200e+02,
       3.240e+02, 2.610e+02, 2.180e+02, 3.510e+02, 2.940e+02, 3.000e+02,
       4.700e+01, 1.430e+02, 2.880e+02, 9.600e+01, 3.360e+02, 1.770e+02,
       8.500e+01, 2.460e+02, 7.200e+01, 2.400e+01, 3.200e+02, 4.790e+02,
       4.420e+02, 1.700e+02, 1.090e+02, 9.800e+01, 2.030e+02, 4.400e+01,
       1.860e+02, 3.350e+02, 6.000e+01, 8.400e+01, 1.880e+02, 1.600e+02,
       2.200e+01, 4.000e+01, 3.440e+02, 7.480e+02, 

In [49]:
# COLUMN:		exter_qual                                                   
# DEFINITION:	Evaluates the quality of the material on the exterior
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.exter_qual.unique()


array(['TA', 'Gd', 'Ex', 'Fa'], dtype=object)

In [50]:
# COLUMN:		exter_cond                                                           
# DEFINITION:	Evaluates the present condition of the material on the exterior
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.exter_cond.unique()


array(['TA', 'Gd', 'Po', 'Fa', 'Ex'], dtype=object)

In [51]:
# COLUMN:		foundation                                                                    
# DEFINITION:	Type of foundation
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.foundation.unique()


array(['CBlock', 'PConc', 'Slab', 'BrkTil', 'Stone', 'Wood'], dtype=object)

In [52]:
# COLUMN:		bsmt_qual                                                                              
# DEFINITION:	Evaluates the height of the basement
# DATA TYPE:	object
# MISSING VALUES:	55
# UNIQUE VALUES:	
df.bsmt_qual.unique()


array(['TA', 'Gd', 'Ex', nan, 'Fa', 'Po'], dtype=object)

In [53]:
# COLUMN:		bsmt_cond                                                                                        
# DEFINITION:	Evaluates the general condition of the basement
# DATA TYPE:	object
# MISSING VALUES:	55
# UNIQUE VALUES:	
df.bsmt_cond.unique()


array(['Gd', 'TA', nan, 'Po', 'Fa', 'Ex'], dtype=object)

In [54]:
# COLUMN:		bsmt_exposure                                                                                              
# DEFINITION:	Refers to walkout or garden level walls
# DATA TYPE:	object
# MISSING VALUES:	58
# UNIQUE VALUES:	
df.bsmt_exposure.unique()


array(['Gd', 'No', 'Av', 'Mn', nan], dtype=object)

In [55]:
# COLUMN:		bsmtfin_type_1                                                                                                   
# DEFINITION:	Rating of basement finished area
# DATA TYPE:	object
# MISSING VALUES:	55
# UNIQUE VALUES:	
df.bsmtfin_type_1.unique()


array(['BLQ', 'ALQ', 'GLQ', 'Unf', 'LwQ', 'Rec', nan], dtype=object)

In [56]:
# COLUMN:		bsmtfin_sf_1                                                                                                        
# DEFINITION:	Type 1 finished square feet
# DATA TYPE:	float64
# MISSING VALUES:	1
# UNIQUE VALUES:	
df.bsmtfin_sf_1.unique()


array([6.390e+02, 9.230e+02, 7.910e+02, 2.630e+02, 1.180e+03, 0.000e+00,
       9.350e+02, 6.370e+02, 3.680e+02, 1.416e+03, 1.200e+02, 7.900e+02,
       7.050e+02, 5.330e+02, 5.780e+02, 7.340e+02, 7.750e+02, 4.320e+02,
       1.051e+03, 1.560e+02, 3.600e+02, 5.140e+02, 1.218e+03, 1.201e+03,
       2.800e+01, 2.000e+00, 2.188e+03, 1.373e+03, 4.560e+02, 2.400e+01,
       3.260e+02, 6.250e+02, 9.190e+02, 1.032e+03, 5.240e+02, 8.160e+02,
       1.078e+03, 2.220e+02, 6.560e+02, 6.950e+02, 5.430e+02, 3.380e+02,
       5.530e+02, 4.500e+02, 6.590e+02, 1.260e+02, 6.740e+02, 1.298e+03,
       2.800e+02, 3.760e+02, 3.780e+02, 2.440e+02, 1.052e+03, 5.060e+02,
       1.137e+03, 1.200e+03, 3.940e+02, 5.690e+02, 1.059e+03, 1.010e+03,
       1.014e+03, 3.000e+02, 6.960e+02, 3.540e+02, 4.430e+02, 2.470e+02,
       1.188e+03, 8.560e+02, 1.018e+03, 1.000e+03, 6.970e+02, 6.480e+02,
       5.320e+02, 7.310e+02, 7.200e+01, 4.810e+02, 3.400e+02, 5.070e+02,
       2.340e+02, 7.170e+02, 5.790e+02, 2.740e+02, 

In [57]:
# COLUMN:		bsmtfin_type_2                                                                                                             
# DEFINITION:	Rating of basement finished area (if multiple types)
# DATA TYPE:	object
# MISSING VALUES:	56
# UNIQUE VALUES:	
df.bsmtfin_type_2.unique()


array(['Unf', 'BLQ', 'Rec', nan, 'GLQ', 'ALQ', 'LwQ'], dtype=object)

In [58]:
# COLUMN:		bsmtfin_sf_2                                                                                                                   
# DEFINITION:	Type 2 finished square feet
# DATA TYPE:	float64
# MISSING VALUES:	1
# UNIQUE VALUES:	
df.bsmtfin_sf_2.unique()


array([   0., 1120.,  163.,  119.,  117.,  859.,   42.,   81.,  290.,
        132.,  713.,  162.,  258.,  174.,  906.,  486.,  263., 1073.,
        692.,   12.,  159.,  474.,  387.,  688.,  712.,  127.,  334.,
        232.,  590.,  284.,  168.,  239.,  294.,  622.,  495.,  539.,
        479.,  113.,  180.,  774.,  364.,  596.,  311.,   92.,  147.,
       1127.,  466.,  201.,  345.,  230.,  247.,  661.,  620.,  202.,
        483.,  750.,  105.,   60.,  102.,   95.,  465.,  262.,  500.,
        670.,  768.,  286.,  450.,  177.,  344.,   72.,  144.,  420.,
        210.,  875.,  507.,  419.,  116.,  354.,  624.,  273.,   76.,
        270.,  110.,  288.,  411.,  276.,  228.,  186.,   93.,  613.,
        852.,  555.,  811.,  842.,  382.,  182.,   80.,   64.,  306.,
        308.,  374.,  872.,  108.,   52.,  196.,  128.,  488.,  532.,
        106.,  169.,  608.,   nan,  240.,   41.,  645.,  181.,  956.,
       1080., 1063.,  380.,  531.,  435.,  120.,  612.,  125.,  400.,
        208.,  823.,

In [59]:
# COLUMN:		bsmt_unf_sf                                                                                                                         
# DEFINITION:	Unfinished square feet of basement area
# DATA TYPE:	float64
# MISSING VALUES:	1
# UNIQUE VALUES:	
df.bsmt_unf_sf.unique()


array([ 441.,  406.,  137., 1017.,  415.,  994.,  763.,  233.,  789.,
        663.,    0.,  234.,  744.,  589., 1139.,  281.,  426.,  344.,
        432.,  354.,  327.,  525.,  709.,  341.,  836., 1590.,  486.,
        340., 1794., 1515.,  142., 1473., 1093., 1296., 1346.,  764.,
       1324., 1232.,   58.,  235.,  847.,  884., 1393.,  801.,  431.,
        628., 1195., 1217., 1595., 1218.,  732.,  488.,  769.,  300.,
        831.,  253.,  261., 1055.,  918.,  702.,  224.,  381.,  223.,
         76.,  190.,  320.,  600.,  892.,  378.,  286.,  610.,  174.,
        192., 1323.,  143.,  410.,  586.,  678.,  150.,  500.,  533.,
        138.,  482.,  162.,  728.,  412.,  350.,  662.,  557., 1604.,
        292.,  125.,  380., 1194.,  188.,  571.,  832.,  324.,  456.,
        326.,  576.,  912.,  733.,  161.,  403.,  180., 1116.,  756.,
        357., 1008.,  747.,  278.,  247.,  624.,  930.,  840.,  312.,
        622.,  777.,  455.,  200.,  144.,  308.,  316.,  480.,  252.,
        164.,  888.,

In [60]:
# COLUMN:		total_bsmt_sf                                                                                                                             
# DEFINITION:	Total square feet of basement area
# DATA TYPE:	float64
# MISSING VALUES:	1
# UNIQUE VALUES:	
df.total_bsmt_sf.unique()


array([1080., 1329.,  928., 1280., 1595.,  994.,  763., 1168.,  789.,
       1300., 1488., 1650.,  864., 1542., 1844.,  814., 1004., 1078.,
       1056., 1405.,  483.,  525., 1069.,  855.,  836., 1590., 1704.,
       1541., 1822., 1517., 2330., 2846., 1671., 1752., 1370.,  764.,
       1324., 1256.,  384.,  860.,  847.,  884., 1393., 1720., 1463.,
       1152., 1195., 2033., 1218.,  756., 1566.,  991.,  956.,  831.,
        948.,  923., 1055.,  918., 1040.,    0.,  894.,  882., 1208.,
        750.,  600., 1470.,  530., 1642., 1226.,  725., 1829., 1610.,
        980., 1328.,  950., 1209., 1510.,  533.,  782.,  858.,  728.,
       1156., 1105., 1604., 1480., 1143., 1398., 1194., 1188., 1268.,
        832.,  972.,  988., 1057.,  576.,  912., 1063.,  816., 1246.,
        910.,  900., 1116., 1395.,  936., 1008., 1347.,  747.,  788.,
        926., 1027.,  678.,  624.,  930.,  686.,  840.,  622.,  777.,
        738.,  608.,  572.,  835.,  780.,  528., 1124.,  888.,  662.,
       1032., 1768.,

In [61]:
# COLUMN:		heating                                                                                                                                         
# DEFINITION:	Type of heating
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.heating.unique()


array(['GasA', 'GasW', 'Grav', 'Wall', 'OthW'], dtype=object)

In [62]:
# COLUMN:		heating_qc                                                                                                                                                  
# DEFINITION:	Heating quality and condition
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.heating_qc.unique()


array(['Fa', 'TA', 'Gd', 'Ex', 'Po'], dtype=object)

In [63]:
# COLUMN:		central_air                                                                                                                                                          
# DEFINITION:	Central air conditioning
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.central_air.unique()


array(['Y', 'N'], dtype=object)

In [64]:
# COLUMN:		electrical                                                                                                                                                                   
# DEFINITION:	Electrical system
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.electrical.unique()


array(['SBrkr', 'FuseA', 'FuseF', 'FuseP', 'Mix'], dtype=object)

In [65]:
# COLUMN:		first_flr_sf
# DEFINITION:	First Floor square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.first_flr_sf.unique()


array([1656, 1329,  928, 1280, 1616, 1028,  763, 1187,  789, 1341, 1502,
       1690,  864, 2073, 1844,  814, 1004, 1078, 1056, 1337,  483,  525,
       1069,  855,  836, 1627, 1704, 1541, 1822, 1535, 2364, 2696, 1687,
       1752, 1370,  764, 1324, 1269,  744,  860,  847,  884, 1422, 1720,
       1500, 1164, 1195, 2053, 1595, 1218, 1566,  991,  956,  831, 1222,
        923, 1055,  918, 1097, 1318,  894,  900, 1040, 1494, 1061, 1488,
        600, 1478,  769,  530, 1418, 1226,  725, 1829, 1610,  980, 1328,
       1225, 1209, 1510, 1131, 1152, 1019,  858, 1306, 1063, 1520, 1105,
       1888, 1604, 1480, 1143, 1700, 1194, 1188, 1264,  832,  972,  988,
       1057,  985,  827,  912, 1287,  816, 1246,  910, 1116, 1395, 1051,
        936, 1347,  747,  804,  926, 1027,  720,  930,  966, 1128, 1236,
        741,  868, 1030,  608,  848,  955,  780,  548, 1068,  902,  888,
        662,  372, 1207, 1768, 1039, 1392,  892,  663, 1373, 1483,  756,
       1067, 1117,  835, 1074, 1169, 1172, 1508, 12

In [66]:
# COLUMN:		low_qual_fin_sf              
# DEFINITION:	Low quality finished square feet (all floors)
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.low_qual_fin_sf.unique()


array([   0,  390,  362,  144, 1064,  120,  436,  371,  259,  397,  312,
        513,  108,  205,  156,  697,  384,  473,  512,  528,  114,  479,
        515,   53,   80,  572,  360,  234,  140,  450,  514])

In [67]:
# COLUMN:		gr_liv_area                       
# DEFINITION:	Above grade (ground) living area square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.gr_liv_area.unique()


array([1656, 1329, 1629, ..., 1003, 1389, 2000])
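
For continuous columns such as this one, `unique()` returns so many values that the display is elided, so a few summary statistics may be more informative. A minimal sketch, assuming a small made-up series (`toy` is hypothetical; only the six values visible in the output above are reused):

```python
import pandas as pd

# Only the values visible in the elided output above; the real column has many more.
toy = pd.Series([1656, 1329, 1629, 1003, 1389, 2000], name="gr_liv_area")

# Summarise the distribution instead of listing every distinct value.
summary = {
    "n_unique": int(toy.nunique()),
    "min": int(toy.min()),
    "median": float(toy.median()),
    "max": int(toy.max()),
}
print(summary)
```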

In [68]:
# COLUMN:		bsmt_full_bath                           
# DEFINITION:	Basement full bathrooms
# DATA TYPE:	float64
# MISSING VALUES:	2
# UNIQUE VALUES:	
df.bsmt_full_bath.unique()


array([ 1.,  0.,  2.,  3., nan])

In [69]:
# COLUMN:		bsmt_half_bath                              
# DEFINITION:	Basement half bathrooms
# DATA TYPE:	float64
# MISSING VALUES:	2
# UNIQUE VALUES:	
df.bsmt_half_bath.unique()

array([ 0.,  1., nan,  2.])

In [70]:
# COLUMN:		full_bath                                        
# DEFINITION:	Full bathrooms above grade
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.full_bath.unique()


array([1, 2, 3, 0, 4])

In [71]:
# COLUMN:		half_bath                                                  
# DEFINITION:	Half baths above grade
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.half_bath.unique()

array([0, 1, 2])

In [72]:
# COLUMN:		bedroom_abvgr                                                  
# DEFINITION:	Bedrooms above grade (does NOT include basement bedrooms)
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.bedroom_abvgr.unique()


array([3, 2, 1, 4, 5, 6, 0, 8])

In [73]:
# COLUMN:		kitchen_abvgr                                                        
# DEFINITION:	Kitchens above grade
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.kitchen_abvgr.unique()

array([1, 2, 3, 0])

In [74]:
# COLUMN:		kitchen_qual                                                              
# DEFINITION:	Kitchen quality
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.kitchen_qual.unique()


array(['TA', 'Gd', 'Ex', 'Fa'], dtype=object)

In [75]:
# COLUMN:		totrms_abvgrd                                                                     
# DEFINITION:	Total rooms above grade (does not include bathrooms)
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.totrms_abvgrd.unique()


array([ 7,  6,  5,  4, 12, 10,  8, 11,  9,  3, 13,  2, 15, 14])

In [76]:
# COLUMN:		functional                                                                             
# DEFINITION:	Home functionality (Assume typical unless deductions are warranted)
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.functional.unique()


array(['Typ', 'Mod', 'Min1', 'Min2', 'Maj1', 'Maj2', 'Sev', 'Sal'],
      dtype=object)

In [77]:
# COLUMN:		fireplaces                                                                                       
# DEFINITION:	Number of fireplaces
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.fireplaces.unique()

array([2, 0, 1, 3, 4])

In [78]:
# COLUMN:		fireplace_qu                                                                                              
# DEFINITION:	Fireplace quality
# DATA TYPE:	object
# MISSING VALUES:	1000
# UNIQUE VALUES:	
df.fireplace_qu.unique()


array(['Gd', nan, 'TA', 'Po', 'Fa', 'Ex'], dtype=object)

In [79]:
# COLUMN:		garage_type                                                                                                      
# DEFINITION:	Garage location
# DATA TYPE:	object
# MISSING VALUES:	113
# UNIQUE VALUES:	
df.garage_type.unique()


array(['Attchd', 'BuiltIn', 'Detchd', nan, 'Basment', '2Types', 'CarPort'],
      dtype=object)

In [80]:
# COLUMN:		garage_yr_blt                                                                                                           
# DEFINITION:	Year garage was built
# DATA TYPE:	float64
# MISSING VALUES:	114
# UNIQUE VALUES:	
df.garage_yr_blt.unique()


array([1960., 1958., 1997., 1992., 1995., 1999., 1993., 1998., 1990.,
       1985., 2003., 1951., 1978., 1977., 2000., 1970., 1971., 1968.,
         nan, 1975., 2009., 2008., 2005., 2004., 2002., 2006., 2001.,
       1996., 1994., 1980., 1979., 1986., 1973., 1962., 1974., 2010.,
       1976., 1967., 1988., 1963., 1966., 1964., 1949., 1954., 1955.,
       1959., 1956., 1953., 1989., 1948., 1950., 1927., 1957., 1945.,
       1940., 1928., 1930., 1961., 1939., 1942., 1923., 1915., 1920.,
       1965., 1969., 1987., 1947., 1946., 1941., 1922., 1952., 1896.,
       2007., 1984., 1972., 1983., 1981., 1991., 1982., 1916., 1938.,
       1910., 1917., 1936., 1926., 1935., 1931., 1934., 1900., 1925.,
       1929., 1921., 1937., 1932., 1895., 1933., 1918., 2207., 1924.,
       1914., 1919.])
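
The output above includes the value `2207.`, which is later than every value of `yr_sold` in this dataset and so looks like a data-entry error. One way to surface such rows is to compare the garage year against the sale year. This is a sketch on made-up rows (the `toy` frame is hypothetical), not a statement about how the notebook handles the value:

```python
import pandas as pd

# Illustrative rows only; the second one mimics the implausible year seen above.
toy = pd.DataFrame({
    "garage_yr_blt": [1960.0, 2207.0, None, 2006.0],
    "yr_sold": [2010, 2007, 2008, 2006],
})

# A garage cannot be built after the house was sold.
# Comparisons against NaN evaluate to False, so rows without a garage year are not flagged.
suspect = toy[toy["garage_yr_blt"] > toy["yr_sold"]]
print(suspect)
```

How to treat a flagged row (correct it, set it to missing, or drop it) is left open here.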

In [81]:
# COLUMN:		garage_finish                                                                                                                
# DEFINITION:	Interior finish of the garage
# DATA TYPE:	object
# MISSING VALUES:	114
# UNIQUE VALUES:	
df.garage_finish.unique()


array(['Fin', 'Unf', 'RFn', nan], dtype=object)

In [82]:
# COLUMN:		garage_cars                                                                                                                      
# DEFINITION:	Size of garage in car capacity
# DATA TYPE:	float64
# MISSING VALUES:	1
# UNIQUE VALUES:	
df.garage_cars.unique()


array([ 2.,  1.,  3.,  0.,  4.,  5., nan])

In [83]:
# COLUMN:		garage_area                                                                                                                             
# DEFINITION:	Size of garage in square feet
# DATA TYPE:	float64
# MISSING VALUES:	1
# UNIQUE VALUES:	
df.garage_area.unique()


array([ 528.,  312.,  482.,  506.,  608.,  442.,  440.,  420.,  393.,
        841.,  400.,  500.,  546.,  663.,  480.,  304.,    0.,  511.,
        264.,  308.,  751.,  772.,  532.,  678.,  820.,  958.,  756.,
        576.,  484.,  474.,  430.,  433.,  434.,  779.,  527.,  712.,
        671.,  486.,  666.,  880.,  676.,  614.,  750.,  618.,  463.,
        462.,  539.,  336.,  280.,  260.,  461.,  564.,  496.,  852.,
        475.,  535.,  660.,  504.,  517.,  470.,  364.,  578.,  620.,
        447.,  531.,  263.,  305.,  246.,  392.,  330.,  720.,  360.,
        551.,  220.,  240.,  780.,  288.,  416.,  923.,  560.,  624.,
        363.,  572.,  180.,  349.,  231.,  299.,  591.,  533.,  690.,
        436.,  522.,  366.,  467.,  209.,  476., 1017.,  574.,  776.,
        632.,  594.,  850.,  670.,  598.,  606.,  494.,  319.,  352.,
        672.,  216.,  252.,  567.,  473.,  200.,  384.,  525.,  741.,
        573.,  888.,  520.,  680.,  510.,  431.,  495.,  275.,  616.,
        538.,  758.,

In [84]:
# COLUMN:		garage_qual                                                                                                                                     
# DEFINITION:	Garage quality
# DATA TYPE:	object
# MISSING VALUES:	114
# UNIQUE VALUES:	
df.garage_qual.unique()


array(['TA', nan, 'Fa', 'Gd', 'Ex', 'Po'], dtype=object)

In [85]:
# COLUMN:		garage_cond                                                                                                                                            
# DEFINITION:	Garage condition
# DATA TYPE:	object
# MISSING VALUES:	114
# UNIQUE VALUES:	
df.garage_cond.unique()


array(['TA', nan, 'Fa', 'Gd', 'Ex', 'Po'], dtype=object)

In [86]:
# COLUMN:		paved_drive                                                                                                                                                  
# DEFINITION:	Paved driveway
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.paved_drive.unique()


array(['P', 'Y', 'N'], dtype=object)

In [87]:
# COLUMN:		wood_deck_sf                                                                                                                                                       
# DEFINITION:	Wood deck area in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.wood_deck_sf.unique()


array([ 210,  393,  212,    0,  237,  140,  157,  483,  192,  503,  349,
        203,  275,  173,  144,  220,  238,  196,  120,   36,  146,  100,
        288,  668,  240,  186,  132,  283,  169,   80,  635,   28,  416,
        296,  168,  160,  223,  224,  228,  352,  227,  117,  263,   42,
        252,  364,  414,  218,  222,  657,   51,  106,   54,  221,  306,
         12,  344,   56,  379,  226,  496,  290,  336,  450,  156,  105,
        367,  316,  365,  188,  180,   60,  257,  141,  112,   30,   68,
        128,  375,  135,  182,  200,  431,   22,  287,  129,  162,  269,
        201,   52,  256,  342,   63,  233,  474,   32,   96,  250,   87,
        260,  108,  147,  216,  404,  382,  319,   99,  184,  125,  165,
        248,  114,  230,  170,  172,  208,  148,  143,   24,  298,  340,
        517,  297,   70,  205,  195,  158,  171,  462,  371,  312,  321,
         78,   85,  164,  110,  289,  280,   66,  126,  187,   26,   40,
         48,  266,   74,  244,   45,  189,  509,  3

In [88]:
# COLUMN:		open_porch_sf                                                                                                                                                         
# DEFINITION:	Open porch area in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.open_porch_sf.unique()


array([ 62,  36,  34,  82, 152,  60,  84,  21,  75,   0,  54, 122,  96,
        85,  68,  55,  30, 133,  50,  70, 119,  67, 150, 130,  49,  27,
        23,  20,  48,  56,  32,  57,  81,  86, 136,  45, 168, 104, 144,
        39, 172, 166, 192,  78,  44,  76,  66,  26,  40,  73,  38,  52,
        17, 124, 100, 228,  18, 158,  10,  11,  46, 278,  92,  90,  33,
        61,  59,  25,  35, 105,  64, 140, 207,  53, 312, 111,  72,  94,
       176, 195, 120,  28, 162, 102, 197,  98, 274, 170, 185, 190, 116,
        63, 235, 183,  16,  51, 128, 146, 126, 165, 226, 121, 175, 113,
        91,  41,  42,  93,  74, 234,  24,  99,  58,  88,  80, 110, 189,
       204,  12, 156, 103, 523, 135, 198, 215, 142,  29, 151, 200, 148,
       112, 160, 118, 154,  95, 238, 304, 101, 173,  22, 282,  69, 180,
       134, 153,  87, 174, 108, 210, 251,  65, 243, 240, 211, 129,   4,
       114, 213, 547, 291, 502, 299, 365, 182,  89, 117, 137,   8, 187,
       155, 159, 106, 372, 292, 184, 141, 123, 276, 265, 164, 22

In [89]:
# COLUMN:		enclosed_porch                                                                                                                                                              
# DEFINITION:	Enclosed porch area in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.enclosed_porch.unique()


array([  0, 184, 154, 186, 156, 120, 150, 164, 189, 205, 113, 216, 135,
       130, 126, 246, 196,  18, 158, 114, 128,  35,  48,  32,  64, 364,
       112, 318,  45, 176,  77,  52,  56, 168,  36, 136, 162,  98, 265,
        50, 280, 222, 202, 144,  24, 236,  84, 264, 260, 203, 140, 100,
       134, 432, 198,  42,  40, 148,  25,  80, 160, 226, 244, 115,  94,
       105,  54,  34, 268,  30, 213, 288,  90, 177, 211, 185, 180,  44,
        57,  81, 218,  78,  72, 368,  70, 165,  92,  16, 192, 123,  96,
       102,  66, 210, 109,  60, 194, 219, 259, 116, 212,  20, 101,  87,
       117, 204, 122, 231, 239, 138, 301, 207, 224, 172, 174, 137,  99,
       249, 252, 291, 145, 214, 275, 175,  26, 143, 183, 230, 170,  88,
        39,  68,  43,  19, 200, 169, 133, 234,  37, 240, 324, 161,  75,
       167, 104, 296, 330, 228, 256,  55, 129, 225, 294, 121, 190, 208,
       272,  67,  23])

In [90]:
# COLUMN:		threessn_porch
# DEFINITION:	Three season porch area in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.threessn_porch.unique()


array([  0, 224, 144, 508, 168, 255, 162, 140, 150, 182, 153, 304, 407,
        96, 245, 216, 120, 176,  86, 290, 180, 323])

In [91]:
# COLUMN:		screen_porch                                                                                                                                                                             
# DEFINITION:	Screen porch area in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.screen_porch.unique()

array([  0, 144, 140, 210, 165, 204, 143, 182, 385, 240, 168, 148,  95,
       266, 160, 161, 155, 291, 200, 490, 170, 192, 180, 156, 152, 288,
       342, 189, 252, 216, 234, 255, 111, 112, 231, 120, 100, 142, 110,
       396,  92, 195, 145, 224, 233, 190, 141, 208, 176, 196,  94, 164,
       130, 480, 220,  64, 163,  90, 227, 265, 171, 135, 322, 174, 147,
       276, 260, 175, 198, 217, 201, 109, 225, 150, 126, 259, 184,  84,
       154, 116,  53, 153, 108,  88, 280, 440, 374, 222, 264, 270, 122,
       162, 115, 410, 271, 312, 348, 113, 104, 138])

In [92]:
# COLUMN:		pool_area                                                                                                                                                                                       
# DEFINITION:	Pool area in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.pool_area.unique()

array([  0, 480, 576, 368, 228, 561, 519, 648, 800, 738])

In [93]:
# COLUMN:		pool_qc                                                                                                                                                                                                   
# DEFINITION:	Pool quality
# DATA TYPE:	object
# MISSING VALUES:	2042
# UNIQUE VALUES:	
df.pool_qc.unique()

array([nan, 'Gd', 'TA', 'Ex', 'Fa'], dtype=object)

In [94]:
# COLUMN:		fence              
# DEFINITION:	Fence quality
# DATA TYPE:	object
# MISSING VALUES:	1651
# UNIQUE VALUES:	
df.fence.unique()


array([nan, 'MnPrv', 'GdPrv', 'GdWo', 'MnWw'], dtype=object)

In [95]:
# COLUMN:		misc_feature       
# DEFINITION:	Miscellaneous feature not covered in other categories
# DATA TYPE:	object
# MISSING VALUES:	1986
# UNIQUE VALUES:	
df.misc_feature.unique()

array([nan, 'Gar2', 'Shed', 'Othr', 'Elev', 'TenC'], dtype=object)
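
Several of the columns above (`pool_qc`, `fence`, `misc_feature`) are missing for most rows. Rather than reading the counts one cell at a time, they could be ranked in a single step. A minimal sketch on a made-up frame (`toy` is hypothetical and its values are illustrative):

```python
import pandas as pd

# Illustrative frame; the real data has far more rows and columns.
toy = pd.DataFrame({
    "pool_qc": [None, None, "Gd", None],
    "fence": [None, "MnPrv", None, "GdPrv"],
    "mo_sold": [5, 6, 3, 1],
})

# Count missing values per column, largest first, and keep only columns with gaps.
missing = toy.isna().sum().sort_values(ascending=False)
missing = missing[missing > 0]
print(missing)
```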

In [96]:
# COLUMN:		misc_val            
# DEFINITION:	Dollar value of miscellaneous feature
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.misc_val.unique()

array([    0, 12500,   500,   700,   400,   450,   300,  1200,  3500,
        2000,  2500,    54,    80,   650,   600,   900,   800,  1500,
        6500,  1150,  4500,  3000,  1300,  8300,   480, 17000,   455,
         460])

In [97]:
# COLUMN:		mo_sold             
# DEFINITION:	Month Sold
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.mo_sold.unique()

array([ 5,  6,  3,  1,  4,  2,  7,  8, 10,  9, 12, 11])

In [98]:
# COLUMN:		yr_sold             
# DEFINITION:	Year Sold
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.yr_sold.unique()


array([2010, 2009, 2008, 2007, 2006])

In [99]:
# COLUMN:		sale_type          
# DEFINITION:	Type of sale
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.sale_type.unique()

array(['WD ', 'COD', 'ConLI', 'New', 'Con', 'ConLD', 'Oth', 'ConLw',
       'CWD'], dtype=object)
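
The `'WD '` entry above carries a trailing space, so a comparison against `'WD'` would silently miss those rows. A minimal sketch of one way to normalize the labels, using a small stand-in Series rather than the real column:

```python
import pandas as pd

# Stand-in for df.sale_type, including the padded 'WD ' label seen in the output above.
sale_type = pd.Series(["WD ", "COD", "New", "WD "], name="sale_type")

# With the trailing space present, an exact match on "WD" finds nothing.
matches_before = int((sale_type == "WD").sum())

# Strip leading and trailing whitespace from every label.
cleaned = sale_type.str.strip()
matches_after = int((cleaned == "WD").sum())

print(matches_before, matches_after)
```

In the notebook itself the same call would presumably be applied as `df.sale_type = df.sale_type.str.strip()`.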

In [100]:
# COLUMN:		saleprice                     
# DEFINITION:	Sale price in dollars
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.saleprice.unique()

array([215000, 172000, 189900, 191500, 236500, 189000, 175900, 185000,
       180400, 171500, 212000, 538000, 141000, 210000, 190000, 216000,
       149000, 149900, 142000, 115000, 184000,  96000,  88000, 127500,
       120000, 376162, 306000, 220000, 259000, 214000, 611657, 500000,
       320000, 319900, 205000, 175500, 199500, 192000, 184500, 216500,
       185088, 222500, 333168, 260400, 325000, 290000, 221000, 410000,
       221500, 204500, 215200, 262500, 254900, 233000, 181000, 143000,
        99500, 152000, 112000, 138500, 122000, 127000, 169000, 260000,
       155000, 151000, 149500, 222000, 177500, 147110, 267916, 206000,
       130500, 218500, 243500, 196500, 128950, 159000, 178900, 136300,
       180500, 172500, 116500,  76500, 128000, 153000, 154300, 135000,
       136000, 165500, 148000, 167500, 108000, 122500, 119000, 109000,
       105000, 107500,  97500, 162000, 132000, 154000, 166000, 134800,
       160000, 109500,  80000, 130000, 129000,  12789, 105900, 150000,
      

COLUMNS THAT ARE NOT IN DATABASE: ms_subclass
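
A list like the one above can be derived programmatically instead of by eye. The sketch below compares a hypothetical list of documented column names against a stand-in DataFrame. Both `documented` and `demo` are illustrative and are not the notebook's actual objects:

```python
import pandas as pd

# Hypothetical subset of documented column names (snake_case, as used in this notebook).
documented = ["ms_subclass", "pool_area", "mo_sold", "yr_sold"]

# Stand-in frame; the notebook's real `df` would be used here instead.
demo = pd.DataFrame(columns=["pool_area", "mo_sold", "yr_sold", "saleprice"])

# Documented names that have no matching column in the frame.
missing_from_df = sorted(set(documented) - set(demo.columns))
print(missing_from_df)
```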

------------------------------------------ STOP HERE!!! ---------------------------------------------------------

**Export Dataframe:**

In [101]:
df.to_csv('../data/train_clean.csv')
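
By default `DataFrame.to_csv` also writes the row index, which comes back as an `Unnamed: 0` column when the file is read with `pd.read_csv`. If the index carries no information, passing `index=False` avoids that. Whether that applies here is an assumption, since the notebook does not show what the index holds. A small round-trip sketch using in-memory buffers instead of the real path:

```python
import io

import pandas as pd

# Stand-in frame with two of the columns described above.
demo = pd.DataFrame({"mo_sold": [5, 6], "yr_sold": [2010, 2009]})

# Default export: the index is written as an unnamed first column.
with_index = io.StringIO()
demo.to_csv(with_index)
with_index.seek(0)
cols_with = pd.read_csv(with_index).columns.tolist()

# Export without the index: only the data columns are written.
without_index = io.StringIO()
demo.to_csv(without_index, index=False)
without_index.seek(0)
cols_without = pd.read_csv(without_index).columns.tolist()

print(cols_with)
print(cols_without)
```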