**Description of the Ames Iowa Housing Data columns:**

SOURCE: https://rdrr.io/cran/AmesHousing/man/ames_raw.html

Order: Observation number

PID: Parcel identification number - can be used with city web site for parcel review.

MS SubClass: Identifies the type of dwelling involved in the sale.

MS Zoning: Identifies the general zoning classification of the sale.

Lot Frontage: Linear feet of street connected to property

Lot Area: Lot size in square feet

Street: Type of road access to property

Alley: Type of alley access to property

Lot Shape: General shape of property

Land Contour: Flatness of the property

Utilities: Type of utilities available

Lot Config: Lot configuration

Land Slope: Slope of property

Neighborhood: Physical locations within Ames city limits (map available)

Condition 1: Proximity to various conditions

Condition 2: Proximity to various conditions (if more than one is present)

Bldg Type: Type of dwelling

House Style: Style of dwelling

Overall Qual: Rates the overall material and finish of the house

Overall Cond: Rates the overall condition of the house

Year Built: Original construction date

Year Remod/Add: Remodel date (same as construction date if no remodeling or additions)

Roof Style: Type of roof

Roof Matl: Roof material

Exterior 1: Exterior covering on house

Exterior 2: Exterior covering on house (if more than one material)

Mas Vnr Type: Masonry veneer type

Mas Vnr Area: Masonry veneer area in square feet

Exter Qual: Evaluates the quality of the material on the exterior

Exter Cond: Evaluates the present condition of the material on the exterior

Foundation: Type of foundation

Bsmt Qual: Evaluates the height of the basement

Bsmt Cond: Evaluates the general condition of the basement

Bsmt Exposure: Refers to walkout or garden level walls

BsmtFin Type 1: Rating of basement finished area

BsmtFin SF 1: Type 1 finished square feet

BsmtFinType 2: Rating of basement finished area (if multiple types)

BsmtFin SF 2: Type 2 finished square feet

Bsmt Unf SF: Unfinished square feet of basement area

Total Bsmt SF: Total square feet of basement area

Heating: Type of heating

HeatingQC: Heating quality and condition

Central Air: Central air conditioning

Electrical: Electrical system

1st Flr SF: First Floor square feet

2nd Flr SF: Second floor square feet

Low Qual Fin SF: Low quality finished square feet (all floors)

Gr Liv Area: Above grade (ground) living area square feet

Bsmt Full Bath: Basement full bathrooms

Bsmt Half Bath: Basement half bathrooms

Full Bath: Full bathrooms above grade

Half Bath: Half baths above grade

Bedroom: Bedrooms above grade (does NOT include basement bedrooms)

Kitchen: Kitchens above grade

KitchenQual: Kitchen quality

TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)

Functional: Home functionality (Assume typical unless deductions are warranted)

Fireplaces: Number of fireplaces

FireplaceQu: Fireplace quality

Garage Type: Garage location

Garage Yr Blt: Year garage was built

Garage Finish: Interior finish of the garage

Garage Cars: Size of garage in car capacity

Garage Area: Size of garage in square feet

Garage Qual: Garage quality

Garage Cond: Garage condition

Paved Drive: Paved driveway

Wood Deck SF: Wood deck area in square feet

Open Porch SF: Open porch area in square feet

Enclosed Porch: Enclosed porch area in square feet

3-Ssn Porch: Three season porch area in square feet

Screen Porch: Screen porch area in square feet

Pool Area: Pool area in square feet

Pool QC: Pool quality

Fence: Fence quality

Misc Feature: Miscellaneous feature not covered in other categories

Misc Val: $Value of miscellaneous feature

Mo Sold: Month Sold

Yr Sold: Year Sold

Sale Type: Type of sale

Sale Condition: Condition of sale

**Load packages:**

In [1]:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

%config InlineBackend.figure_format = 'retina'
%matplotlib inline


**Load the data:**

In [2]:
ames_file_train = '../data/train.csv'


In [3]:
df = pd.read_csv(ames_file_train)


**Describe the basic format:**

In [4]:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)


In [5]:
df.dtypes


Id                   int64
PID                  int64
MS SubClass          int64
MS Zoning           object
Lot Frontage       float64
Lot Area             int64
Street              object
Alley               object
Lot Shape           object
Land Contour        object
Utilities           object
Lot Config          object
Land Slope          object
Neighborhood        object
Condition 1         object
Condition 2         object
Bldg Type           object
House Style         object
Overall Qual         int64
Overall Cond         int64
Year Built           int64
Year Remod/Add       int64
Roof Style          object
Roof Matl           object
Exterior 1st        object
Exterior 2nd        object
Mas Vnr Type        object
Mas Vnr Area       float64
Exter Qual          object
Exter Cond          object
Foundation          object
Bsmt Qual           object
Bsmt Cond           object
Bsmt Exposure       object
BsmtFin Type 1      object
BsmtFin SF 1       float64
BsmtFin Type 2      object
B

In [6]:
df.head(5)


Unnamed: 0,Id,PID,MS SubClass,MS Zoning,Lot Frontage,Lot Area,Street,Alley,Lot Shape,Land Contour,Utilities,Lot Config,Land Slope,Neighborhood,Condition 1,Condition 2,Bldg Type,House Style,Overall Qual,Overall Cond,Year Built,Year Remod/Add,Roof Style,Roof Matl,Exterior 1st,Exterior 2nd,Mas Vnr Type,Mas Vnr Area,Exter Qual,Exter Cond,Foundation,Bsmt Qual,Bsmt Cond,Bsmt Exposure,BsmtFin Type 1,BsmtFin SF 1,BsmtFin Type 2,BsmtFin SF 2,Bsmt Unf SF,Total Bsmt SF,Heating,Heating QC,Central Air,Electrical,1st Flr SF,2nd Flr SF,Low Qual Fin SF,Gr Liv Area,Bsmt Full Bath,Bsmt Half Bath,Full Bath,Half Bath,Bedroom AbvGr,Kitchen AbvGr,Kitchen Qual,TotRms AbvGrd,Functional,Fireplaces,Fireplace Qu,Garage Type,Garage Yr Blt,Garage Finish,Garage Cars,Garage Area,Garage Qual,Garage Cond,Paved Drive,Wood Deck SF,Open Porch SF,Enclosed Porch,3Ssn Porch,Screen Porch,Pool Area,Pool QC,Fence,Misc Feature,Misc Val,Mo Sold,Yr Sold,Sale Type,SalePrice
0,109,533352170,60,RL,,13517,Pave,,IR1,Lvl,AllPub,CulDSac,Gtl,Sawyer,RRAe,Norm,1Fam,2Story,6,8,1976,2005,Gable,CompShg,HdBoard,Plywood,BrkFace,289.0,Gd,TA,CBlock,TA,TA,No,GLQ,533.0,Unf,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
1,544,531379050,60,RL,43.0,11492,Pave,,IR1,Lvl,AllPub,CulDSac,Gtl,SawyerW,Norm,Norm,1Fam,2Story,7,5,1996,1997,Gable,CompShg,VinylSd,VinylSd,BrkFace,132.0,Gd,TA,PConc,Gd,TA,No,GLQ,637.0,Unf,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
2,153,535304180,20,RL,68.0,7922,Pave,,Reg,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,1Story,5,7,1953,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,Gd,CBlock,TA,TA,No,GLQ,731.0,Unf,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
3,318,916386060,60,RL,73.0,9802,Pave,,Reg,Lvl,AllPub,Inside,Gtl,Timber,Norm,Norm,1Fam,2Story,5,5,2006,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,Unf,0.0,Unf,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
4,255,906425045,50,RL,82.0,14235,Pave,,IR1,Lvl,AllPub,Inside,Gtl,SawyerW,Norm,Norm,1Fam,1.5Fin,6,8,1900,1993,Gable,CompShg,Wd Sdng,Plywood,,0.0,TA,TA,PConc,Fa,Gd,No,Unf,0.0,Unf,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [7]:
df.shape


(2051, 81)

**Re-organize data:**

In [8]:
ames_file_train = '../data/train.csv'


In [9]:
df = pd.read_csv(ames_file_train, index_col="Id")


In [10]:
df = df.sort_values(by='Id')


In [11]:
df.index = df.index.set_names([''])


In [12]:
df.head(5)

Unnamed: 0,PID,MS SubClass,MS Zoning,Lot Frontage,Lot Area,Street,Alley,Lot Shape,Land Contour,Utilities,Lot Config,Land Slope,Neighborhood,Condition 1,Condition 2,Bldg Type,House Style,Overall Qual,Overall Cond,Year Built,Year Remod/Add,Roof Style,Roof Matl,Exterior 1st,Exterior 2nd,Mas Vnr Type,Mas Vnr Area,Exter Qual,Exter Cond,Foundation,Bsmt Qual,Bsmt Cond,Bsmt Exposure,BsmtFin Type 1,BsmtFin SF 1,BsmtFin Type 2,BsmtFin SF 2,Bsmt Unf SF,Total Bsmt SF,Heating,Heating QC,Central Air,Electrical,1st Flr SF,2nd Flr SF,Low Qual Fin SF,Gr Liv Area,Bsmt Full Bath,Bsmt Half Bath,Full Bath,Half Bath,Bedroom AbvGr,Kitchen AbvGr,Kitchen Qual,TotRms AbvGrd,Functional,Fireplaces,Fireplace Qu,Garage Type,Garage Yr Blt,Garage Finish,Garage Cars,Garage Area,Garage Qual,Garage Cond,Paved Drive,Wood Deck SF,Open Porch SF,Enclosed Porch,3Ssn Porch,Screen Porch,Pool Area,Pool QC,Fence,Misc Feature,Misc Val,Mo Sold,Yr Sold,Sale Type,SalePrice
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1.0,526301100.0,20.0,RL,141.0,31770.0,Pave,,IR1,Lvl,AllPub,Corner,Gtl,NAmes,Norm,Norm,1Fam,1Story,6.0,5.0,1960.0,1960.0,Hip,CompShg,BrkFace,Plywood,Stone,112.0,TA,TA,CBlock,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,TA,TA,P,210.0,62.0,0.0,0.0,0.0,0.0,,,,0.0,5.0,2010.0,WD,215000.0
3.0,526351010.0,20.0,RL,81.0,14267.0,Pave,,IR1,Lvl,AllPub,Corner,Gtl,NAmes,Norm,Norm,1Fam,1Story,6.0,6.0,1958.0,1958.0,Hip,CompShg,Wd Sdng,Wd Sdng,BrkFace,108.0,TA,TA,CBlock,TA,TA,No,ALQ,923.0,Unf,0.0,406.0,1329.0,GasA,TA,Y,SBrkr,1329.0,0.0,0.0,1329.0,0.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1958.0,Unf,1.0,312.0,TA,TA,Y,393.0,36.0,0.0,0.0,0.0,0.0,,,Gar2,12500.0,6.0,2010.0,WD,172000.0
5.0,527105010.0,60.0,RL,74.0,13830.0,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Gilbert,Norm,Norm,1Fam,2Story,5.0,5.0,1997.0,1998.0,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,GLQ,791.0,Unf,0.0,137.0,928.0,GasA,Gd,Y,SBrkr,928.0,701.0,0.0,1629.0,0.0,0.0,2.0,1.0,3.0,1.0,TA,6.0,Typ,1.0,TA,Attchd,1997.0,Fin,2.0,482.0,TA,TA,Y,212.0,34.0,0.0,0.0,0.0,0.0,,MnPrv,,0.0,3.0,2010.0,WD,189900.0
8.0,527145080.0,120.0,RL,43.0,5005.0,Pave,,IR1,HLS,AllPub,Inside,Gtl,StoneBr,Norm,Norm,TwnhsE,1Story,8.0,5.0,1992.0,1992.0,Gable,CompShg,HdBoard,HdBoard,,0.0,Gd,TA,PConc,Gd,TA,No,ALQ,263.0,Unf,0.0,1017.0,1280.0,GasA,Ex,Y,SBrkr,1280.0,0.0,0.0,1280.0,0.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,0.0,,Attchd,1992.0,RFn,2.0,506.0,TA,TA,Y,0.0,82.0,0.0,0.0,144.0,0.0,,,,0.0,1.0,2010.0,WD,191500.0
9.0,527146030.0,120.0,RL,39.0,5389.0,Pave,,IR1,Lvl,AllPub,Inside,Gtl,StoneBr,Norm,Norm,TwnhsE,1Story,8.0,5.0,1995.0,1996.0,Gable,CompShg,CemntBd,CmentBd,,0.0,Gd,TA,PConc,Gd,TA,No,GLQ,1180.0,Unf,0.0,415.0,1595.0,GasA,Ex,Y,SBrkr,1616.0,0.0,0.0,1616.0,1.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,1.0,TA,Attchd,1995.0,RFn,2.0,608.0,TA,TA,Y,237.0,152.0,0.0,0.0,0.0,0.0,,,,0.0,3.0,2010.0,WD,236500.0


In [13]:
df.shape


(2051, 80)
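The reload, sort, and index-renaming cells above can also be chained into one expression. This is a minimal sketch, not the notebook's own code: `io.StringIO` stands in for `../data/train.csv`, the three rows reuse `Id`, `PID`, and `SalePrice` values displayed earlier, and `sort_index()` is swapped in for `sort_values(by='Id')` because `Id` is already the index.

```python
import io

import pandas as pd

# In-memory stand-in for the CSV file (assumption: only three columns kept).
csv_text = (
    "Id,PID,SalePrice\n"
    "109,533352170,130500\n"
    "1,526301100,215000\n"
    "3,526351010,172000\n"
)

# Read with Id as the index, then order the rows by that index.
toy = pd.read_csv(io.StringIO(csv_text), index_col="Id").sort_index()

# Drop the index label so it no longer prints as an extra header row.
toy.index.name = None

print(toy)
```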

**Drop unwanted columns:**

* SAVE AS MUCH OF THE DATA AS YOU CAN; ONLY DROP COLUMNS THAT MODELS WILL NOT NEED (PID)

In [14]:
df.drop(columns=['PID'], inplace=True)


In [15]:
df.head(5)

Unnamed: 0,MS SubClass,MS Zoning,Lot Frontage,Lot Area,Street,Alley,Lot Shape,Land Contour,Utilities,Lot Config,Land Slope,Neighborhood,Condition 1,Condition 2,Bldg Type,House Style,Overall Qual,Overall Cond,Year Built,Year Remod/Add,Roof Style,Roof Matl,Exterior 1st,Exterior 2nd,Mas Vnr Type,Mas Vnr Area,Exter Qual,Exter Cond,Foundation,Bsmt Qual,Bsmt Cond,Bsmt Exposure,BsmtFin Type 1,BsmtFin SF 1,BsmtFin Type 2,BsmtFin SF 2,Bsmt Unf SF,Total Bsmt SF,Heating,Heating QC,Central Air,Electrical,1st Flr SF,2nd Flr SF,Low Qual Fin SF,Gr Liv Area,Bsmt Full Bath,Bsmt Half Bath,Full Bath,Half Bath,Bedroom AbvGr,Kitchen AbvGr,Kitchen Qual,TotRms AbvGrd,Functional,Fireplaces,Fireplace Qu,Garage Type,Garage Yr Blt,Garage Finish,Garage Cars,Garage Area,Garage Qual,Garage Cond,Paved Drive,Wood Deck SF,Open Porch SF,Enclosed Porch,3Ssn Porch,Screen Porch,Pool Area,Pool QC,Fence,Misc Feature,Misc Val,Mo Sold,Yr Sold,Sale Type,SalePrice
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1.0,20.0,RL,141.0,31770.0,Pave,,IR1,Lvl,AllPub,Corner,Gtl,NAmes,Norm,Norm,1Fam,1Story,6.0,5.0,1960.0,1960.0,Hip,CompShg,BrkFace,Plywood,Stone,112.0,TA,TA,CBlock,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,TA,TA,P,210.0,62.0,0.0,0.0,0.0,0.0,,,,0.0,5.0,2010.0,WD,215000.0
3.0,20.0,RL,81.0,14267.0,Pave,,IR1,Lvl,AllPub,Corner,Gtl,NAmes,Norm,Norm,1Fam,1Story,6.0,6.0,1958.0,1958.0,Hip,CompShg,Wd Sdng,Wd Sdng,BrkFace,108.0,TA,TA,CBlock,TA,TA,No,ALQ,923.0,Unf,0.0,406.0,1329.0,GasA,TA,Y,SBrkr,1329.0,0.0,0.0,1329.0,0.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1958.0,Unf,1.0,312.0,TA,TA,Y,393.0,36.0,0.0,0.0,0.0,0.0,,,Gar2,12500.0,6.0,2010.0,WD,172000.0
5.0,60.0,RL,74.0,13830.0,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Gilbert,Norm,Norm,1Fam,2Story,5.0,5.0,1997.0,1998.0,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,GLQ,791.0,Unf,0.0,137.0,928.0,GasA,Gd,Y,SBrkr,928.0,701.0,0.0,1629.0,0.0,0.0,2.0,1.0,3.0,1.0,TA,6.0,Typ,1.0,TA,Attchd,1997.0,Fin,2.0,482.0,TA,TA,Y,212.0,34.0,0.0,0.0,0.0,0.0,,MnPrv,,0.0,3.0,2010.0,WD,189900.0
8.0,120.0,RL,43.0,5005.0,Pave,,IR1,HLS,AllPub,Inside,Gtl,StoneBr,Norm,Norm,TwnhsE,1Story,8.0,5.0,1992.0,1992.0,Gable,CompShg,HdBoard,HdBoard,,0.0,Gd,TA,PConc,Gd,TA,No,ALQ,263.0,Unf,0.0,1017.0,1280.0,GasA,Ex,Y,SBrkr,1280.0,0.0,0.0,1280.0,0.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,0.0,,Attchd,1992.0,RFn,2.0,506.0,TA,TA,Y,0.0,82.0,0.0,0.0,144.0,0.0,,,,0.0,1.0,2010.0,WD,191500.0
9.0,120.0,RL,39.0,5389.0,Pave,,IR1,Lvl,AllPub,Inside,Gtl,StoneBr,Norm,Norm,TwnhsE,1Story,8.0,5.0,1995.0,1996.0,Gable,CompShg,CemntBd,CmentBd,,0.0,Gd,TA,PConc,Gd,TA,No,GLQ,1180.0,Unf,0.0,415.0,1595.0,GasA,Ex,Y,SBrkr,1616.0,0.0,0.0,1616.0,1.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,1.0,TA,Attchd,1995.0,RFn,2.0,608.0,TA,TA,Y,237.0,152.0,0.0,0.0,0.0,0.0,,,,0.0,3.0,2010.0,WD,236500.0


**Determine missing observations:**

In [16]:
df.isnull().sum()

MS SubClass           0
MS Zoning             0
Lot Frontage        330
Lot Area              0
Street                0
Alley              1911
Lot Shape             0
Land Contour          0
Utilities             0
Lot Config            0
Land Slope            0
Neighborhood          0
Condition 1           0
Condition 2           0
Bldg Type             0
House Style           0
Overall Qual          0
Overall Cond          0
Year Built            0
Year Remod/Add        0
Roof Style            0
Roof Matl             0
Exterior 1st          0
Exterior 2nd          0
Mas Vnr Type         22
Mas Vnr Area         22
Exter Qual            0
Exter Cond            0
Foundation            0
Bsmt Qual            55
Bsmt Cond            55
Bsmt Exposure        58
BsmtFin Type 1       55
BsmtFin SF 1          1
BsmtFin Type 2       56
BsmtFin SF 2          1
Bsmt Unf SF           1
Total Bsmt SF         1
Heating               0
Heating QC            0
Central Air           0
Electrical      
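With `display.max_rows` unset, the full listing is long and mostly zeros. A possible way to narrow it to the columns that actually have gaps is sketched here; `missing_summary` is a hypothetical helper name and the toy frame uses made-up values in place of the housing data.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values, only for columns that have any."""
    counts = frame.isnull().sum()
    counts = counts[counts > 0].sort_values(ascending=False)
    return pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })


# Toy frame standing in for df (values are illustrative only).
toy = pd.DataFrame({
    "Lot Area": [31770, 14267, 13830, 5005],
    "Alley": [np.nan, np.nan, "Pave", np.nan],
    "Lot Frontage": [141.0, np.nan, 74.0, 43.0],
})

summary = missing_summary(toy)
print(summary)
```

Calling `missing_summary(df)` on the real frame would list only the columns with a non-zero count, largest first.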

**Make the column names searchable**

In [17]:
df.columns = df.columns.str.replace('3','three')


In [18]:
df.columns = df.columns.str.replace('1st','first')


In [19]:
df.columns = df.columns.str.replace('/','_')


In [20]:
df.columns = df.columns.str.replace(' ','_')


In [21]:
df.columns = df.columns.str.lower()
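The five renaming cells above can be collected into a single function so the same cleaning can be reapplied to another file, such as a test set. This is a sketch under that assumption; `clean_column_names` is a hypothetical name, and `regex=False` is added so that each pattern is treated as literal text.

```python
import pandas as pd


def clean_column_names(columns: pd.Index) -> pd.Index:
    """Apply the same replacements as the cells above, in the same order."""
    return (
        columns
        .str.replace("3", "three", regex=False)
        .str.replace("1st", "first", regex=False)
        .str.replace("/", "_", regex=False)
        .str.replace(" ", "_", regex=False)
        .str.lower()
    )


# A few of the original column names as sample input.
sample = pd.Index(["3Ssn Porch", "1st Flr SF", "Year Remod/Add", "MS SubClass"])
cleaned = clean_column_names(sample)
print(list(cleaned))
```

On the real frame this would be used as `df.columns = clean_column_names(df.columns)`.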

In [22]:
df.dtypes


ms_subclass          int64
ms_zoning           object
lot_frontage       float64
lot_area             int64
street              object
alley               object
lot_shape           object
land_contour        object
utilities           object
lot_config          object
land_slope          object
neighborhood        object
condition_1         object
condition_2         object
bldg_type           object
house_style         object
overall_qual         int64
overall_cond         int64
year_built           int64
year_remod_add       int64
roof_style          object
roof_matl           object
exterior_first      object
exterior_2nd        object
mas_vnr_type        object
mas_vnr_area       float64
exter_qual          object
exter_cond          object
foundation          object
bsmt_qual           object
bsmt_cond           object
bsmt_exposure       object
bsmtfin_type_1      object
bsmtfin_sf_1       float64
bsmtfin_type_2      object
bsmtfin_sf_2       float64
bsmt_unf_sf        float64
t

**Analyze & Evaluate Columns**

* I will try to predict housing prices; any data that could be useful in this task will be kept.

In [23]:
# COLUMN:		ms_subclass
# DEFINITION:	Identifies the type of dwelling involved in the
# sale.
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.ms_subclass.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([ 20,  60, 120, 160,  80,  50,  90,  30, 190,  70,  85,  75,  45,
       180,  40, 150])

In [24]:
# COLUMN:		ms_zoning
# DEFINITION:	Identifies the general zoning classification of
# the sale.
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.ms_zoning.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['RL', 'FV', 'RH', 'RM', 'C (all)', 'I (all)', 'A (agr)'],
      dtype=object)

In [25]:
ms_zoning = {'RL':'0', 'FV':'1', 'RH':'2', 'RM':'3', 'C (all)':'4', 'I (all)':'5', 'A (agr)':'6'}


In [26]:
df.ms_zoning = [ms_zoning[item] for item in df.ms_zoning] 


In [27]:
df.ms_zoning.unique()


array(['0', '1', '2', '3', '4', '5', '6'], dtype=object)

In [28]:
# np.int was removed from NumPy (1.24+); request int64 explicitly instead
df["ms_zoning"] = df["ms_zoning"].astype("int64")


In [29]:
df['ms_zoning'].dtype


dtype('int64')
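The map-then-cast steps above can be shortened with `Series.map` and a dictionary whose values are already integers. The sketch below assumes the same code assignment as the `ms_zoning` dictionary and uses a short toy column in place of `df.ms_zoning`; the explicit check for unmapped labels is an addition, not part of the original cells.

```python
import pandas as pd

# Same category-to-code assignment as above, with integer values.
zoning_codes = {"RL": 0, "FV": 1, "RH": 2, "RM": 3,
                "C (all)": 4, "I (all)": 5, "A (agr)": 6}

# Toy column standing in for df.ms_zoning.
zoning = pd.Series(["RL", "FV", "RM", "C (all)", "RL"], name="ms_zoning")

encoded = zoning.map(zoning_codes)

# map() yields NaN for labels missing from the dictionary, so check first.
unmapped = zoning[encoded.isna()].unique()
if len(unmapped) > 0:
    raise ValueError(f"Unmapped categories: {list(unmapped)}")

encoded = encoded.astype("int64")
print(encoded.tolist(), encoded.dtype)
```

Unlike the list comprehension, which raises `KeyError` at the first unknown label, this version reports every unknown label at once.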

In [30]:
# COLUMN:		lot_frontage
# DEFINITION:	Linear feet of street connected to property
# DATA TYPE:	float64
# MISSING VALUES:	330
# UNIQUE VALUES:	
df.lot_frontage.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array([141.,  81.,  74.,  43.,  39.,  60.,  75.,  nan,  63.,  85.,  47.,
       140., 105.,  65.,  70.,  26.,  21.,  53.,  24., 102.,  98.,  95.,
        79., 100., 110.,  61.,  41.,  36.,  67., 108.,  59.,  92.,  58.,
        56.,  73.,  72.,  76.,  50.,  55.,  68., 107.,  25.,  30.,  57.,
        40.,  80.,  77.,  90.,  88., 120., 137., 119.,  78.,  71.,  87.,
        69.,  52.,  51.,  54.,  94.,  44.,  83.,  64.,  82.,  38.,  48.,
        89.,  66.,  35., 129.,  93.,  42.,  99.,  96., 104.,  97., 103.,
        34., 117.,  62., 174., 106.,  84., 128.,  91., 144., 122., 112.,
        86.,  45., 130., 109., 113., 125., 101.,  46., 114., 135.,  37.,
        22.,  32., 313.,  49., 124., 123., 150., 160., 195., 118., 134.,
       116., 138., 155., 115., 200., 111., 121.,  33., 153.])

In [31]:
df = df.drop(columns=['lot_frontage'])


In [32]:
df.head(5)

Unnamed: 0,ms_subclass,ms_zoning,lot_area,street,alley,lot_shape,land_contour,utilities,lot_config,land_slope,neighborhood,condition_1,condition_2,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,roof_style,roof_matl,exterior_first,exterior_2nd,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1.0,20.0,0.0,31770.0,Pave,,IR1,Lvl,AllPub,Corner,Gtl,NAmes,Norm,Norm,1Fam,1Story,6.0,5.0,1960.0,1960.0,Hip,CompShg,BrkFace,Plywood,Stone,112.0,TA,TA,CBlock,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,TA,TA,P,210.0,62.0,0.0,0.0,0.0,0.0,,,,0.0,5.0,2010.0,WD,215000.0
3.0,20.0,0.0,14267.0,Pave,,IR1,Lvl,AllPub,Corner,Gtl,NAmes,Norm,Norm,1Fam,1Story,6.0,6.0,1958.0,1958.0,Hip,CompShg,Wd Sdng,Wd Sdng,BrkFace,108.0,TA,TA,CBlock,TA,TA,No,ALQ,923.0,Unf,0.0,406.0,1329.0,GasA,TA,Y,SBrkr,1329.0,0.0,0.0,1329.0,0.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1958.0,Unf,1.0,312.0,TA,TA,Y,393.0,36.0,0.0,0.0,0.0,0.0,,,Gar2,12500.0,6.0,2010.0,WD,172000.0
5.0,60.0,0.0,13830.0,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Gilbert,Norm,Norm,1Fam,2Story,5.0,5.0,1997.0,1998.0,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,GLQ,791.0,Unf,0.0,137.0,928.0,GasA,Gd,Y,SBrkr,928.0,701.0,0.0,1629.0,0.0,0.0,2.0,1.0,3.0,1.0,TA,6.0,Typ,1.0,TA,Attchd,1997.0,Fin,2.0,482.0,TA,TA,Y,212.0,34.0,0.0,0.0,0.0,0.0,,MnPrv,,0.0,3.0,2010.0,WD,189900.0
8.0,120.0,0.0,5005.0,Pave,,IR1,HLS,AllPub,Inside,Gtl,StoneBr,Norm,Norm,TwnhsE,1Story,8.0,5.0,1992.0,1992.0,Gable,CompShg,HdBoard,HdBoard,,0.0,Gd,TA,PConc,Gd,TA,No,ALQ,263.0,Unf,0.0,1017.0,1280.0,GasA,Ex,Y,SBrkr,1280.0,0.0,0.0,1280.0,0.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,0.0,,Attchd,1992.0,RFn,2.0,506.0,TA,TA,Y,0.0,82.0,0.0,0.0,144.0,0.0,,,,0.0,1.0,2010.0,WD,191500.0
9.0,120.0,0.0,5389.0,Pave,,IR1,Lvl,AllPub,Inside,Gtl,StoneBr,Norm,Norm,TwnhsE,1Story,8.0,5.0,1995.0,1996.0,Gable,CompShg,CemntBd,CmentBd,,0.0,Gd,TA,PConc,Gd,TA,No,GLQ,1180.0,Unf,0.0,415.0,1595.0,GasA,Ex,Y,SBrkr,1616.0,0.0,0.0,1616.0,1.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,1.0,TA,Attchd,1995.0,RFn,2.0,608.0,TA,TA,Y,237.0,152.0,0.0,0.0,0.0,0.0,,,,0.0,3.0,2010.0,WD,236500.0


In [33]:
# COLUMN:		lot_area
# DEFINITION:	Lot size in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.lot_area.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([31770, 14267, 13830, ..., 17400,  7937,  8885])

In [34]:
# COLUMN:		street
# DEFINITION:	Type of road access to property
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.street.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['Pave', 'Grvl'], dtype=object)

In [35]:
street = {'Pave':'0', 'Grvl':'1'}


In [36]:
df.street = [street[item] for item in df.street] 


In [37]:
df.street.unique()


array(['0', '1'], dtype=object)

In [38]:
# np.int was removed from NumPy (1.24+); request int64 explicitly instead
df["street"] = df["street"].astype("int64")


In [39]:
df['street'].dtype


dtype('int64')

In [40]:
# COLUMN:		Alley
# DEFINITION:	Type of alley access to property
# DATA TYPE:	object
# MISSING VALUES:	1911
# UNIQUE VALUES:	
df.alley.unique()
# EVALUATION: A LOT WITH MORE THAN ONE POINT OF ENTRY COULD
# AFFECT PRICE. MISSING VALUES MEAN THE LOT DOESN'T ABUT AN
# ALLEY. COLUMN NEEDS TO BE CONVERTED TO AN INTEGER FLAG
# (0 = NO ALLEY, 1 = ALLEY).


array([nan, 'Pave', 'Grvl'], dtype=object)

In [41]:
alley = {np.nan:'0', 'Pave':'1', 'Grvl':'1'}


In [42]:
df.alley = [alley[item] for item in df.alley] 


In [43]:
df['alley'] = pd.to_numeric(df['alley'], errors='coerce')


In [44]:
df['alley'].dtype


dtype('int64')

In [45]:
df.alley.unique()


array([0, 1])
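The dictionary lookup above uses `np.nan` as a key. Because NaN never compares equal to itself, that lookup only succeeds when each missing entry is the very same `np.nan` object. A sketch of an alternative that avoids the lookup altogether, assuming the goal is the same 0/1 alley flag (the toy column is illustrative):

```python
import numpy as np
import pandas as pd

# Toy column standing in for df.alley before encoding.
alley = pd.Series([np.nan, "Pave", "Grvl", np.nan], name="alley")

# 1 where any alley type is recorded, 0 where the value is missing.
has_alley = alley.notna().astype("int64")

print(has_alley.tolist())
```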

In [46]:
# COLUMN:		lot_shape
# DEFINITION:	General shape of property
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.lot_shape.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


array(['IR1', 'Reg', 'IR2', 'IR3'], dtype=object)

In [47]:
df=df.drop(columns=['lot_shape'])


In [48]:
df.head(5)

Unnamed: 0,ms_subclass,ms_zoning,lot_area,street,alley,land_contour,utilities,lot_config,land_slope,neighborhood,condition_1,condition_2,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,roof_style,roof_matl,exterior_first,exterior_2nd,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1.0,20.0,0.0,31770.0,0.0,0.0,Lvl,AllPub,Corner,Gtl,NAmes,Norm,Norm,1Fam,1Story,6.0,5.0,1960.0,1960.0,Hip,CompShg,BrkFace,Plywood,Stone,112.0,TA,TA,CBlock,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,TA,TA,P,210.0,62.0,0.0,0.0,0.0,0.0,,,,0.0,5.0,2010.0,WD,215000.0
3.0,20.0,0.0,14267.0,0.0,0.0,Lvl,AllPub,Corner,Gtl,NAmes,Norm,Norm,1Fam,1Story,6.0,6.0,1958.0,1958.0,Hip,CompShg,Wd Sdng,Wd Sdng,BrkFace,108.0,TA,TA,CBlock,TA,TA,No,ALQ,923.0,Unf,0.0,406.0,1329.0,GasA,TA,Y,SBrkr,1329.0,0.0,0.0,1329.0,0.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1958.0,Unf,1.0,312.0,TA,TA,Y,393.0,36.0,0.0,0.0,0.0,0.0,,,Gar2,12500.0,6.0,2010.0,WD,172000.0
5.0,60.0,0.0,13830.0,0.0,0.0,Lvl,AllPub,Inside,Gtl,Gilbert,Norm,Norm,1Fam,2Story,5.0,5.0,1997.0,1998.0,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,GLQ,791.0,Unf,0.0,137.0,928.0,GasA,Gd,Y,SBrkr,928.0,701.0,0.0,1629.0,0.0,0.0,2.0,1.0,3.0,1.0,TA,6.0,Typ,1.0,TA,Attchd,1997.0,Fin,2.0,482.0,TA,TA,Y,212.0,34.0,0.0,0.0,0.0,0.0,,MnPrv,,0.0,3.0,2010.0,WD,189900.0
8.0,120.0,0.0,5005.0,0.0,0.0,HLS,AllPub,Inside,Gtl,StoneBr,Norm,Norm,TwnhsE,1Story,8.0,5.0,1992.0,1992.0,Gable,CompShg,HdBoard,HdBoard,,0.0,Gd,TA,PConc,Gd,TA,No,ALQ,263.0,Unf,0.0,1017.0,1280.0,GasA,Ex,Y,SBrkr,1280.0,0.0,0.0,1280.0,0.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,0.0,,Attchd,1992.0,RFn,2.0,506.0,TA,TA,Y,0.0,82.0,0.0,0.0,144.0,0.0,,,,0.0,1.0,2010.0,WD,191500.0
9.0,120.0,0.0,5389.0,0.0,0.0,Lvl,AllPub,Inside,Gtl,StoneBr,Norm,Norm,TwnhsE,1Story,8.0,5.0,1995.0,1996.0,Gable,CompShg,CemntBd,CmentBd,,0.0,Gd,TA,PConc,Gd,TA,No,GLQ,1180.0,Unf,0.0,415.0,1595.0,GasA,Ex,Y,SBrkr,1616.0,0.0,0.0,1616.0,1.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,1.0,TA,Attchd,1995.0,RFn,2.0,608.0,TA,TA,Y,237.0,152.0,0.0,0.0,0.0,0.0,,,,0.0,3.0,2010.0,WD,236500.0


In [49]:
# COLUMN:		land_contour
# DEFINITION:	Flatness of the property
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.land_contour.unique()
# EVALUATION: LOT CONTOUR COULD AFFECT PRICE, E.G. A HOUSE WITH
# A VIEW MAY BE WORTH MORE. COLUMN NEEDS TO BE CONVERTED TO
# INTEGER, NO MISSING VALUES!


array(['Lvl', 'HLS', 'Bnk', 'Low'], dtype=object)

In [50]:
land_contour = {'Lvl':'0', 'HLS':'1', 'Bnk':'2', 'Low':'3'}


In [51]:
df.land_contour = [land_contour[item] for item in df.land_contour] 


In [52]:
df.land_contour.unique()


array(['0', '1', '2', '3'], dtype=object)

In [53]:
# np.int was removed from NumPy (1.24+); request int64 explicitly instead
df["land_contour"] = df["land_contour"].astype("int64")


In [54]:
df['land_contour'].dtype

dtype('int64')

In [55]:
# COLUMN:		utilities
# DEFINITION:	Type of utilities available
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.utilities.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['AllPub', 'NoSewr', 'NoSeWa'], dtype=object)

In [56]:
utilities = {'AllPub':'0', 'NoSewr':'1', 'NoSeWa':'2'}


In [57]:
df.utilities = [utilities[item] for item in df.utilities] 


In [58]:
df.utilities.unique()


array(['0', '1', '2'], dtype=object)

In [59]:
# np.int was removed from NumPy (1.24+); request int64 explicitly instead
df["utilities"] = df["utilities"].astype("int64")


In [60]:
df['utilities'].dtype


dtype('int64')

In [61]:
# COLUMN:		lot_config
# DEFINITION:	Lot configuration
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.lot_config.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!

array(['Corner', 'Inside', 'CulDSac', 'FR2', 'FR3'], dtype=object)

In [62]:
lot_config = {'Corner':'0', 'Inside':'1', 'CulDSac':'2', 'FR2':'3', 'FR3':'4'}


In [63]:
df.lot_config = [lot_config[item] for item in df.lot_config] 


In [64]:
df.lot_config.unique()


array(['0', '1', '2', '3', '4'], dtype=object)

In [65]:
# np.int was removed from NumPy (1.24+); request int64 explicitly instead
df["lot_config"] = df["lot_config"].astype("int64")


In [66]:
df['lot_config'].dtype


dtype('int64')
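Integer codes such as 0 to 4 for `lot_config` give the categories an order that a linear model would treat as meaningful, although the labels themselves are unordered. One common alternative is one-hot encoding. The sketch below uses `pd.get_dummies` on a toy frame with made-up rows; whether it suits the models used later is an assumption, not something the notebook establishes.

```python
import pandas as pd

# Toy frame standing in for df before lot_config is encoded.
toy = pd.DataFrame({
    "lot_area": [31770, 14267, 13830],
    "lot_config": ["Corner", "Inside", "CulDSac"],
})

# One 0/1 indicator column per category; the original column is replaced.
dummies = pd.get_dummies(toy, columns=["lot_config"], dtype=int)

print(dummies.columns.tolist())
```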

In [67]:
# COLUMN:		land_slope
# DEFINITION:	Slope of property
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.land_slope.unique()
# EVALUATION: COLUMN IS SIMILAR TO LAND CONTOUR, ONLY
# DESCRIBING DEGREE OF SLOPE. COLUMN DOES NOT PERTAIN TO
# HOUSING PRICE, COLUMN TO BE DROPPED!


array(['Gtl', 'Mod', 'Sev'], dtype=object)

In [68]:
df = df.drop(columns=['land_slope'])


In [69]:
df.head(5)

Unnamed: 0,ms_subclass,ms_zoning,lot_area,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,condition_2,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,roof_style,roof_matl,exterior_first,exterior_2nd,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1.0,20.0,0.0,31770.0,0.0,0.0,0.0,0.0,0.0,NAmes,Norm,Norm,1Fam,1Story,6.0,5.0,1960.0,1960.0,Hip,CompShg,BrkFace,Plywood,Stone,112.0,TA,TA,CBlock,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,TA,TA,P,210.0,62.0,0.0,0.0,0.0,0.0,,,,0.0,5.0,2010.0,WD,215000.0
3.0,20.0,0.0,14267.0,0.0,0.0,0.0,0.0,0.0,NAmes,Norm,Norm,1Fam,1Story,6.0,6.0,1958.0,1958.0,Hip,CompShg,Wd Sdng,Wd Sdng,BrkFace,108.0,TA,TA,CBlock,TA,TA,No,ALQ,923.0,Unf,0.0,406.0,1329.0,GasA,TA,Y,SBrkr,1329.0,0.0,0.0,1329.0,0.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1958.0,Unf,1.0,312.0,TA,TA,Y,393.0,36.0,0.0,0.0,0.0,0.0,,,Gar2,12500.0,6.0,2010.0,WD,172000.0
5.0,60.0,0.0,13830.0,0.0,0.0,0.0,0.0,1.0,Gilbert,Norm,Norm,1Fam,2Story,5.0,5.0,1997.0,1998.0,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,GLQ,791.0,Unf,0.0,137.0,928.0,GasA,Gd,Y,SBrkr,928.0,701.0,0.0,1629.0,0.0,0.0,2.0,1.0,3.0,1.0,TA,6.0,Typ,1.0,TA,Attchd,1997.0,Fin,2.0,482.0,TA,TA,Y,212.0,34.0,0.0,0.0,0.0,0.0,,MnPrv,,0.0,3.0,2010.0,WD,189900.0
8.0,120.0,0.0,5005.0,0.0,0.0,1.0,0.0,1.0,StoneBr,Norm,Norm,TwnhsE,1Story,8.0,5.0,1992.0,1992.0,Gable,CompShg,HdBoard,HdBoard,,0.0,Gd,TA,PConc,Gd,TA,No,ALQ,263.0,Unf,0.0,1017.0,1280.0,GasA,Ex,Y,SBrkr,1280.0,0.0,0.0,1280.0,0.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,0.0,,Attchd,1992.0,RFn,2.0,506.0,TA,TA,Y,0.0,82.0,0.0,0.0,144.0,0.0,,,,0.0,1.0,2010.0,WD,191500.0
9.0,120.0,0.0,5389.0,0.0,0.0,0.0,0.0,1.0,StoneBr,Norm,Norm,TwnhsE,1Story,8.0,5.0,1995.0,1996.0,Gable,CompShg,CemntBd,CmentBd,,0.0,Gd,TA,PConc,Gd,TA,No,GLQ,1180.0,Unf,0.0,415.0,1595.0,GasA,Ex,Y,SBrkr,1616.0,0.0,0.0,1616.0,1.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,1.0,TA,Attchd,1995.0,RFn,2.0,608.0,TA,TA,Y,237.0,152.0,0.0,0.0,0.0,0.0,,,,0.0,3.0,2010.0,WD,236500.0


In [70]:
# COLUMN:		neighborhood
# DEFINITION:	Physical locations within Ames city limits
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.neighborhood.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['NAmes', 'Gilbert', 'StoneBr', 'NWAmes', 'Somerst', 'BrDale',
       'NPkVill', 'NridgHt', 'Blmngtn', 'NoRidge', 'SawyerW', 'Sawyer',
       'Greens', 'BrkSide', 'OldTown', 'IDOTRR', 'ClearCr', 'SWISU',
       'Edwards', 'CollgCr', 'Crawfor', 'Blueste', 'Mitchel', 'Timber',
       'MeadowV', 'Veenker', 'GrnHill', 'Landmrk'], dtype=object)
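
The comment header used in each cell (data type, missing values, unique values) is typed by hand, so it can drift from the data. As a minimal sketch, the same facts can be computed with a small helper. The helper name `column_report` and the toy frame are hypothetical; they are not part of the original notebook.

```python
import pandas as pd


def column_report(frame: pd.DataFrame, name: str) -> dict:
    """Return dtype, missing-value count, and unique values for one column."""
    col = frame[name]
    return {
        "column": name,
        "dtype": str(col.dtype),
        "missing": int(col.isna().sum()),
        # dropna() keeps NaN out of the list of unique labels
        "unique": col.dropna().unique().tolist(),
    }


# Toy stand-in for the notebook's df (hypothetical values)
toy = pd.DataFrame({"neighborhood": ["NAmes", "Gilbert", "NAmes", None]})
report = column_report(toy, "neighborhood")
print(report)
```

Calling `column_report(df, 'neighborhood')` on the real frame would give the numbers that are otherwise copied into the comments by hand.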

In [71]:
# Every neighborhood label must map to its own distinct code.
neighborhood = {'NAmes':'0', 'Gilbert':'1', 'StoneBr':'2', 'NWAmes':'3', 'Somerst':'4', 'BrDale':'5', 'NPkVill':'6','NridgHt':'7', 'Blmngtn':'8', 'NoRidge':'9', 'SawyerW':'10', 'Sawyer':'11', 'Greens':'12', 'BrkSide':'13','OldTown':'14', 'IDOTRR':'15', 'ClearCr':'16', 'SWISU':'17', 'Edwards':'18', 'CollgCr':'19', 'Crawfor':'20','Blueste':'21', 'Mitchel':'22', 'Timber':'23', 'MeadowV':'24', 'Veenker':'25', 'GrnHill':'26', 'Landmrk':'27'}


In [72]:
df.neighborhood = [neighborhood[item] for item in df.neighborhood]


In [73]:
df.neighborhood.unique()


array(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12',
       '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23',
       '24', '25', '26', '27'], dtype=object)

In [74]:
# np.int was removed in NumPy 1.24; the built-in int gives the same result.
df["neighborhood"] = df["neighborhood"].astype(int)


In [75]:
df['neighborhood'].dtype


dtype('int64')
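
Hand-written label-to-code dictionaries are easy to mistype, and two labels can silently end up sharing one code. As a hedged alternative, `pd.factorize` assigns one integer per distinct label in order of first appearance. The toy frame and the `neighborhood_code` column name are illustrative assumptions, not the notebook's data.

```python
import pandas as pd

# Toy stand-in for the notebook's df (hypothetical rows)
toy = pd.DataFrame(
    {"neighborhood": ["NAmes", "Gilbert", "StoneBr", "NAmes", "MeadowV"]}
)

# factorize returns integer codes plus the labels they stand for
codes, labels = pd.factorize(toy["neighborhood"])
toy["neighborhood_code"] = codes

# Keep the reverse lookup so the codes stay interpretable later
lookup = dict(enumerate(labels))
print(toy)
print(lookup)
```

Note that integer codes for a nominal column such as neighborhood imply an ordering that does not exist in the data; whether that matters depends on the model used downstream.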

In [76]:
# COLUMN:		condition_1
# DEFINITION:	Proximity to various conditions
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.condition_1.unique()
# EVALUATION: NO MISSING VALUES, BUT COLUMN JUDGED NOT TO PERTAIN
# TO HOUSING PRICE, COLUMN TO BE DROPPED!


array(['Norm', 'RRAe', 'RRNe', 'Feedr', 'Artery', 'PosA', 'PosN', 'RRAn',
       'RRNn'], dtype=object)
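
The evaluation above is a judgment call. Before dropping a categorical column, one quick check is to compare sale prices across its levels. The following is only a sketch on made-up numbers; the prices are hypothetical and say nothing about the real Ames data.

```python
import pandas as pd

# Hypothetical rows; the real notebook would use df instead of toy
toy = pd.DataFrame(
    {
        "condition_1": ["Norm", "Norm", "Feedr", "Feedr", "PosN"],
        "saleprice": [200000, 180000, 140000, 150000, 260000],
    }
)

# Count and median price per level, cheapest level first
summary = (
    toy.groupby("condition_1")["saleprice"]
    .agg(["count", "median"])
    .sort_values("median")
)
print(summary)
```

If the medians on the real data differ widely between levels with reasonable counts, the column may carry price information after all.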

In [77]:
df = df.drop(columns=['condition_1'])


In [78]:
df.head(5)

Unnamed: 0,ms_subclass,ms_zoning,lot_area,street,alley,land_contour,utilities,lot_config,neighborhood,condition_2,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,roof_style,roof_matl,exterior_first,exterior_2nd,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1.0,20.0,0.0,31770.0,0.0,0.0,0.0,0.0,0.0,0.0,Norm,1Fam,1Story,6.0,5.0,1960.0,1960.0,Hip,CompShg,BrkFace,Plywood,Stone,112.0,TA,TA,CBlock,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,TA,TA,P,210.0,62.0,0.0,0.0,0.0,0.0,,,,0.0,5.0,2010.0,WD,215000.0
3.0,20.0,0.0,14267.0,0.0,0.0,0.0,0.0,0.0,0.0,Norm,1Fam,1Story,6.0,6.0,1958.0,1958.0,Hip,CompShg,Wd Sdng,Wd Sdng,BrkFace,108.0,TA,TA,CBlock,TA,TA,No,ALQ,923.0,Unf,0.0,406.0,1329.0,GasA,TA,Y,SBrkr,1329.0,0.0,0.0,1329.0,0.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1958.0,Unf,1.0,312.0,TA,TA,Y,393.0,36.0,0.0,0.0,0.0,0.0,,,Gar2,12500.0,6.0,2010.0,WD,172000.0
5.0,60.0,0.0,13830.0,0.0,0.0,0.0,0.0,1.0,1.0,Norm,1Fam,2Story,5.0,5.0,1997.0,1998.0,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,GLQ,791.0,Unf,0.0,137.0,928.0,GasA,Gd,Y,SBrkr,928.0,701.0,0.0,1629.0,0.0,0.0,2.0,1.0,3.0,1.0,TA,6.0,Typ,1.0,TA,Attchd,1997.0,Fin,2.0,482.0,TA,TA,Y,212.0,34.0,0.0,0.0,0.0,0.0,,MnPrv,,0.0,3.0,2010.0,WD,189900.0
8.0,120.0,0.0,5005.0,0.0,0.0,1.0,0.0,1.0,2.0,Norm,TwnhsE,1Story,8.0,5.0,1992.0,1992.0,Gable,CompShg,HdBoard,HdBoard,,0.0,Gd,TA,PConc,Gd,TA,No,ALQ,263.0,Unf,0.0,1017.0,1280.0,GasA,Ex,Y,SBrkr,1280.0,0.0,0.0,1280.0,0.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,0.0,,Attchd,1992.0,RFn,2.0,506.0,TA,TA,Y,0.0,82.0,0.0,0.0,144.0,0.0,,,,0.0,1.0,2010.0,WD,191500.0
9.0,120.0,0.0,5389.0,0.0,0.0,0.0,0.0,1.0,2.0,Norm,TwnhsE,1Story,8.0,5.0,1995.0,1996.0,Gable,CompShg,CemntBd,CmentBd,,0.0,Gd,TA,PConc,Gd,TA,No,GLQ,1180.0,Unf,0.0,415.0,1595.0,GasA,Ex,Y,SBrkr,1616.0,0.0,0.0,1616.0,1.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,1.0,TA,Attchd,1995.0,RFn,2.0,608.0,TA,TA,Y,237.0,152.0,0.0,0.0,0.0,0.0,,,,0.0,3.0,2010.0,WD,236500.0


In [79]:
# COLUMN:		condition_2
# DEFINITION:	Proximity to various conditions (if more than one is present)
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.condition_2.unique()
# EVALUATION: NO MISSING VALUES, BUT COLUMN JUDGED NOT TO PERTAIN
# TO HOUSING PRICE, COLUMN TO BE DROPPED!


array(['Norm', 'Feedr', 'PosN', 'Artery', 'PosA', 'RRNn', 'RRAe', 'RRAn'],
      dtype=object)

In [80]:
df = df.drop(columns=['condition_2'])


In [81]:
df.head(5)

Unnamed: 0,ms_subclass,ms_zoning,lot_area,street,alley,land_contour,utilities,lot_config,neighborhood,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,roof_style,roof_matl,exterior_first,exterior_2nd,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1.0,20.0,0.0,31770.0,0.0,0.0,0.0,0.0,0.0,0.0,1Fam,1Story,6.0,5.0,1960.0,1960.0,Hip,CompShg,BrkFace,Plywood,Stone,112.0,TA,TA,CBlock,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,TA,TA,P,210.0,62.0,0.0,0.0,0.0,0.0,,,,0.0,5.0,2010.0,WD,215000.0
3.0,20.0,0.0,14267.0,0.0,0.0,0.0,0.0,0.0,0.0,1Fam,1Story,6.0,6.0,1958.0,1958.0,Hip,CompShg,Wd Sdng,Wd Sdng,BrkFace,108.0,TA,TA,CBlock,TA,TA,No,ALQ,923.0,Unf,0.0,406.0,1329.0,GasA,TA,Y,SBrkr,1329.0,0.0,0.0,1329.0,0.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1958.0,Unf,1.0,312.0,TA,TA,Y,393.0,36.0,0.0,0.0,0.0,0.0,,,Gar2,12500.0,6.0,2010.0,WD,172000.0
5.0,60.0,0.0,13830.0,0.0,0.0,0.0,0.0,1.0,1.0,1Fam,2Story,5.0,5.0,1997.0,1998.0,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,GLQ,791.0,Unf,0.0,137.0,928.0,GasA,Gd,Y,SBrkr,928.0,701.0,0.0,1629.0,0.0,0.0,2.0,1.0,3.0,1.0,TA,6.0,Typ,1.0,TA,Attchd,1997.0,Fin,2.0,482.0,TA,TA,Y,212.0,34.0,0.0,0.0,0.0,0.0,,MnPrv,,0.0,3.0,2010.0,WD,189900.0
8.0,120.0,0.0,5005.0,0.0,0.0,1.0,0.0,1.0,2.0,TwnhsE,1Story,8.0,5.0,1992.0,1992.0,Gable,CompShg,HdBoard,HdBoard,,0.0,Gd,TA,PConc,Gd,TA,No,ALQ,263.0,Unf,0.0,1017.0,1280.0,GasA,Ex,Y,SBrkr,1280.0,0.0,0.0,1280.0,0.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,0.0,,Attchd,1992.0,RFn,2.0,506.0,TA,TA,Y,0.0,82.0,0.0,0.0,144.0,0.0,,,,0.0,1.0,2010.0,WD,191500.0
9.0,120.0,0.0,5389.0,0.0,0.0,0.0,0.0,1.0,2.0,TwnhsE,1Story,8.0,5.0,1995.0,1996.0,Gable,CompShg,CemntBd,CmentBd,,0.0,Gd,TA,PConc,Gd,TA,No,GLQ,1180.0,Unf,0.0,415.0,1595.0,GasA,Ex,Y,SBrkr,1616.0,0.0,0.0,1616.0,1.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,1.0,TA,Attchd,1995.0,RFn,2.0,608.0,TA,TA,Y,237.0,152.0,0.0,0.0,0.0,0.0,,,,0.0,3.0,2010.0,WD,236500.0


In [82]:
# COLUMN:		bldg_type
# DEFINITION:	Type of dwelling
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.bldg_type.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['1Fam', 'TwnhsE', 'Twnhs', 'Duplex', '2fmCon'], dtype=object)

In [83]:
bldg_type = {'1Fam':'0', 'TwnhsE':'1', 'Twnhs':'2', 'Duplex':'3', '2fmCon':'4'}


In [84]:
df.bldg_type = [bldg_type[item] for item in df.bldg_type] 


In [85]:
df.bldg_type.unique()


array(['0', '1', '2', '3', '4'], dtype=object)

In [86]:
# np.int was removed in NumPy 1.24; the built-in int gives the same result.
df["bldg_type"] = df["bldg_type"].astype(int)


In [87]:
df['bldg_type'].dtype

dtype('int64')

In [88]:
# COLUMN:		house_style 
# DEFINITION:	Style of dwelling
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.house_style.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['1Story', '2Story', 'SLvl', '1.5Fin', 'SFoyer', '2.5Unf', '1.5Unf',
       '2.5Fin'], dtype=object)

In [89]:
house_style = {'1Story':'0', '2Story':'1', 'SLvl':'2', '1.5Fin':'3', 'SFoyer':'4', '2.5Unf':'5', '1.5Unf':'6','2.5Fin':'7'}


In [90]:
df.house_style = [house_style[item] for item in df.house_style] 


In [91]:
df.house_style.unique()


array(['0', '1', '2', '3', '4', '5', '6', '7'], dtype=object)

In [92]:
# np.int was removed in NumPy 1.24; the built-in int gives the same result.
df["house_style"] = df["house_style"].astype(int)


In [93]:
df['house_style'].dtype

dtype('int64')

In [94]:
# COLUMN:		overall_qual
# DEFINITION:	Rates the overall material and finish of the
# house
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.overall_qual.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([ 6,  5,  8,  7,  4,  9,  3,  2, 10,  1])

In [95]:
# COLUMN:		overall_cond        
# DEFINITION:	Rates the overall condition of the house
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.overall_cond.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([5, 6, 7, 8, 2, 4, 9, 3, 1])

**COME BACK TO THIS COLUMN AND TRY GROUPING BY DECADE**

In [96]:
# COLUMN:		year_built
# DEFINITION:	Original construction date
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.year_built.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!

array([1960, 1958, 1997, 1992, 1995, 1999, 1993, 1998, 1990, 1985, 2003,
       1951, 1978, 1977, 2000, 1970, 1971, 1968, 1975, 2009, 2007, 2005,
       2004, 2002, 2006, 2001, 1996, 1994, 2008, 1980, 1979, 1984, 1965,
       1967, 1962, 1974, 2010, 1976, 1988, 1963, 1959, 1966, 1964, 1949,
       1940, 1954, 1955, 1956, 1953, 1920, 1948, 1952, 1927, 1957, 1945,
       1929, 1923, 1928, 1900, 1915, 1910, 1885, 1922, 1950, 1939, 1942,
       1936, 1930, 1921, 1912, 1875, 1969, 1947, 1946, 1987, 1941, 1924,
       1989, 1896, 1991, 1972, 1981, 1973, 1961, 1916, 1925, 1890, 1935,
       1938, 1898, 1917, 1937, 1926, 1931, 1934, 1983, 1880, 1932, 1986,
       1905, 1914, 1872, 1893, 1911, 1895, 1982, 1879, 1901, 1918, 1913,
       1908, 1892, 1919])

**COME BACK TO THIS COLUMN AND TRY GROUPING BY DECADE**

In [97]:
# COLUMN:		year_remod_add      
# DEFINITION:	Remodel date (same as construction date if no remodeling or additions)
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.year_remod_add.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!

array([1960, 1958, 1998, 1992, 1996, 1999, 1994, 2007, 1990, 1985, 2003,
       1951, 1988, 1977, 2000, 1970, 2008, 1968, 1971, 1975, 2010, 2005,
       2006, 2004, 2002, 2001, 1995, 2009, 1980, 1979, 1978, 1967, 1993,
       1963, 1959, 1966, 1964, 1950, 1954, 1972, 1956, 1955, 1952, 1962,
       1984, 1957, 1997, 1965, 1969, 1987, 1976, 1989, 1991, 1986, 1981,
       1974, 1973, 1961, 1983, 1953, 1982])
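
The notes above mention grouping the year columns by decade. A minimal sketch of that idea uses integer division; the `decade_built` column name and the toy years are assumptions for illustration.

```python
import pandas as pd

# A few years taken from the unique values shown above
toy = pd.DataFrame({"year_built": [1960, 1958, 1997, 1992, 2010, 1872]})

# Floor each year to the start of its decade, e.g. 1997 -> 1990
toy["decade_built"] = (toy["year_built"] // 10) * 10

# Number of houses per decade, in chronological order
counts = toy["decade_built"].value_counts().sort_index()
print(counts)
```

The same expression should work for `year_remod_add`, since both columns are int64 with no missing values.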

In [98]:
# COLUMN:		roof_style         
# DEFINITION:	Type of roof
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.roof_style.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


array(['Hip', 'Gable', 'Mansard', 'Flat', 'Gambrel', 'Shed'], dtype=object)

In [99]:
df = df.drop(columns=['roof_style'])


In [100]:
df.head(5)

Unnamed: 0,ms_subclass,ms_zoning,lot_area,street,alley,land_contour,utilities,lot_config,neighborhood,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,roof_matl,exterior_first,exterior_2nd,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1.0,20.0,0.0,31770.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,5.0,1960.0,1960.0,CompShg,BrkFace,Plywood,Stone,112.0,TA,TA,CBlock,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,TA,TA,P,210.0,62.0,0.0,0.0,0.0,0.0,,,,0.0,5.0,2010.0,WD,215000.0
3.0,20.0,0.0,14267.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,6.0,1958.0,1958.0,CompShg,Wd Sdng,Wd Sdng,BrkFace,108.0,TA,TA,CBlock,TA,TA,No,ALQ,923.0,Unf,0.0,406.0,1329.0,GasA,TA,Y,SBrkr,1329.0,0.0,0.0,1329.0,0.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1958.0,Unf,1.0,312.0,TA,TA,Y,393.0,36.0,0.0,0.0,0.0,0.0,,,Gar2,12500.0,6.0,2010.0,WD,172000.0
5.0,60.0,0.0,13830.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,5.0,5.0,1997.0,1998.0,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,GLQ,791.0,Unf,0.0,137.0,928.0,GasA,Gd,Y,SBrkr,928.0,701.0,0.0,1629.0,0.0,0.0,2.0,1.0,3.0,1.0,TA,6.0,Typ,1.0,TA,Attchd,1997.0,Fin,2.0,482.0,TA,TA,Y,212.0,34.0,0.0,0.0,0.0,0.0,,MnPrv,,0.0,3.0,2010.0,WD,189900.0
8.0,120.0,0.0,5005.0,0.0,0.0,1.0,0.0,1.0,2.0,1.0,0.0,8.0,5.0,1992.0,1992.0,CompShg,HdBoard,HdBoard,,0.0,Gd,TA,PConc,Gd,TA,No,ALQ,263.0,Unf,0.0,1017.0,1280.0,GasA,Ex,Y,SBrkr,1280.0,0.0,0.0,1280.0,0.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,0.0,,Attchd,1992.0,RFn,2.0,506.0,TA,TA,Y,0.0,82.0,0.0,0.0,144.0,0.0,,,,0.0,1.0,2010.0,WD,191500.0
9.0,120.0,0.0,5389.0,0.0,0.0,0.0,0.0,1.0,2.0,1.0,0.0,8.0,5.0,1995.0,1996.0,CompShg,CemntBd,CmentBd,,0.0,Gd,TA,PConc,Gd,TA,No,GLQ,1180.0,Unf,0.0,415.0,1595.0,GasA,Ex,Y,SBrkr,1616.0,0.0,0.0,1616.0,1.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,1.0,TA,Attchd,1995.0,RFn,2.0,608.0,TA,TA,Y,237.0,152.0,0.0,0.0,0.0,0.0,,,,0.0,3.0,2010.0,WD,236500.0


In [101]:
# COLUMN:		roof_matl                  
# DEFINITION:	Roof material
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.roof_matl.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


array(['CompShg', 'WdShngl', 'Tar&Grv', 'WdShake', 'Membran', 'ClyTile'],
      dtype=object)

In [102]:
df = df.drop(columns=['roof_matl'])


In [103]:
df.head(5)

Unnamed: 0,ms_subclass,ms_zoning,lot_area,street,alley,land_contour,utilities,lot_config,neighborhood,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,exterior_first,exterior_2nd,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1.0,20.0,0.0,31770.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,5.0,1960.0,1960.0,BrkFace,Plywood,Stone,112.0,TA,TA,CBlock,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,TA,TA,P,210.0,62.0,0.0,0.0,0.0,0.0,,,,0.0,5.0,2010.0,WD,215000.0
3.0,20.0,0.0,14267.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,6.0,1958.0,1958.0,Wd Sdng,Wd Sdng,BrkFace,108.0,TA,TA,CBlock,TA,TA,No,ALQ,923.0,Unf,0.0,406.0,1329.0,GasA,TA,Y,SBrkr,1329.0,0.0,0.0,1329.0,0.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1958.0,Unf,1.0,312.0,TA,TA,Y,393.0,36.0,0.0,0.0,0.0,0.0,,,Gar2,12500.0,6.0,2010.0,WD,172000.0
5.0,60.0,0.0,13830.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,5.0,5.0,1997.0,1998.0,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,GLQ,791.0,Unf,0.0,137.0,928.0,GasA,Gd,Y,SBrkr,928.0,701.0,0.0,1629.0,0.0,0.0,2.0,1.0,3.0,1.0,TA,6.0,Typ,1.0,TA,Attchd,1997.0,Fin,2.0,482.0,TA,TA,Y,212.0,34.0,0.0,0.0,0.0,0.0,,MnPrv,,0.0,3.0,2010.0,WD,189900.0
8.0,120.0,0.0,5005.0,0.0,0.0,1.0,0.0,1.0,2.0,1.0,0.0,8.0,5.0,1992.0,1992.0,HdBoard,HdBoard,,0.0,Gd,TA,PConc,Gd,TA,No,ALQ,263.0,Unf,0.0,1017.0,1280.0,GasA,Ex,Y,SBrkr,1280.0,0.0,0.0,1280.0,0.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,0.0,,Attchd,1992.0,RFn,2.0,506.0,TA,TA,Y,0.0,82.0,0.0,0.0,144.0,0.0,,,,0.0,1.0,2010.0,WD,191500.0
9.0,120.0,0.0,5389.0,0.0,0.0,0.0,0.0,1.0,2.0,1.0,0.0,8.0,5.0,1995.0,1996.0,CemntBd,CmentBd,,0.0,Gd,TA,PConc,Gd,TA,No,GLQ,1180.0,Unf,0.0,415.0,1595.0,GasA,Ex,Y,SBrkr,1616.0,0.0,0.0,1616.0,1.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,1.0,TA,Attchd,1995.0,RFn,2.0,608.0,TA,TA,Y,237.0,152.0,0.0,0.0,0.0,0.0,,,,0.0,3.0,2010.0,WD,236500.0


In [104]:
# COLUMN:		exterior_first                        
# DEFINITION:	Exterior covering on house
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.exterior_first.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


array(['BrkFace', 'Wd Sdng', 'VinylSd', 'HdBoard', 'CemntBd', 'Plywood',
       'MetalSd', 'AsbShng', 'WdShing', 'Stucco', 'BrkComm', 'CBlock',
       'AsphShn', 'Stone', 'ImStucc'], dtype=object)

In [105]:
df = df.drop(columns=['exterior_first'])


In [106]:
df.head(5)

Unnamed: 0,ms_subclass,ms_zoning,lot_area,street,alley,land_contour,utilities,lot_config,neighborhood,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,exterior_2nd,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1.0,20.0,0.0,31770.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,5.0,1960.0,1960.0,Plywood,Stone,112.0,TA,TA,CBlock,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,TA,TA,P,210.0,62.0,0.0,0.0,0.0,0.0,,,,0.0,5.0,2010.0,WD,215000.0
3.0,20.0,0.0,14267.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,6.0,1958.0,1958.0,Wd Sdng,BrkFace,108.0,TA,TA,CBlock,TA,TA,No,ALQ,923.0,Unf,0.0,406.0,1329.0,GasA,TA,Y,SBrkr,1329.0,0.0,0.0,1329.0,0.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1958.0,Unf,1.0,312.0,TA,TA,Y,393.0,36.0,0.0,0.0,0.0,0.0,,,Gar2,12500.0,6.0,2010.0,WD,172000.0
5.0,60.0,0.0,13830.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,5.0,5.0,1997.0,1998.0,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,GLQ,791.0,Unf,0.0,137.0,928.0,GasA,Gd,Y,SBrkr,928.0,701.0,0.0,1629.0,0.0,0.0,2.0,1.0,3.0,1.0,TA,6.0,Typ,1.0,TA,Attchd,1997.0,Fin,2.0,482.0,TA,TA,Y,212.0,34.0,0.0,0.0,0.0,0.0,,MnPrv,,0.0,3.0,2010.0,WD,189900.0
8.0,120.0,0.0,5005.0,0.0,0.0,1.0,0.0,1.0,2.0,1.0,0.0,8.0,5.0,1992.0,1992.0,HdBoard,,0.0,Gd,TA,PConc,Gd,TA,No,ALQ,263.0,Unf,0.0,1017.0,1280.0,GasA,Ex,Y,SBrkr,1280.0,0.0,0.0,1280.0,0.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,0.0,,Attchd,1992.0,RFn,2.0,506.0,TA,TA,Y,0.0,82.0,0.0,0.0,144.0,0.0,,,,0.0,1.0,2010.0,WD,191500.0
9.0,120.0,0.0,5389.0,0.0,0.0,0.0,0.0,1.0,2.0,1.0,0.0,8.0,5.0,1995.0,1996.0,CmentBd,,0.0,Gd,TA,PConc,Gd,TA,No,GLQ,1180.0,Unf,0.0,415.0,1595.0,GasA,Ex,Y,SBrkr,1616.0,0.0,0.0,1616.0,1.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,1.0,TA,Attchd,1995.0,RFn,2.0,608.0,TA,TA,Y,237.0,152.0,0.0,0.0,0.0,0.0,,,,0.0,3.0,2010.0,WD,236500.0


In [107]:
# COLUMN:		exterior_2nd                              
# DEFINITION:	Exterior covering on house (if more than one
# material)
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.exterior_2nd.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


array(['Plywood', 'Wd Sdng', 'VinylSd', 'HdBoard', 'CmentBd', 'Wd Shng',
       'MetalSd', 'ImStucc', 'Brk Cmn', 'AsbShng', 'BrkFace', 'Stucco',
       'CBlock', 'Stone', 'AsphShn'], dtype=object)

In [108]:
df = df.drop(columns=['exterior_2nd'])


In [109]:
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,lot_area,street,alley,land_contour,utilities,lot_config,neighborhood,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1.0,20.0,0.0,31770.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,5.0,1960.0,1960.0,Stone,112.0,TA,TA,CBlock,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,TA,TA,P,210.0,62.0,0.0,0.0,0.0,0.0,,,,0.0,5.0,2010.0,WD,215000.0
3.0,20.0,0.0,14267.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,6.0,1958.0,1958.0,BrkFace,108.0,TA,TA,CBlock,TA,TA,No,ALQ,923.0,Unf,0.0,406.0,1329.0,GasA,TA,Y,SBrkr,1329.0,0.0,0.0,1329.0,0.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1958.0,Unf,1.0,312.0,TA,TA,Y,393.0,36.0,0.0,0.0,0.0,0.0,,,Gar2,12500.0,6.0,2010.0,WD,172000.0
5.0,60.0,0.0,13830.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,5.0,5.0,1997.0,1998.0,,0.0,TA,TA,PConc,Gd,TA,No,GLQ,791.0,Unf,0.0,137.0,928.0,GasA,Gd,Y,SBrkr,928.0,701.0,0.0,1629.0,0.0,0.0,2.0,1.0,3.0,1.0,TA,6.0,Typ,1.0,TA,Attchd,1997.0,Fin,2.0,482.0,TA,TA,Y,212.0,34.0,0.0,0.0,0.0,0.0,,MnPrv,,0.0,3.0,2010.0,WD,189900.0
8.0,120.0,0.0,5005.0,0.0,0.0,1.0,0.0,1.0,2.0,1.0,0.0,8.0,5.0,1992.0,1992.0,,0.0,Gd,TA,PConc,Gd,TA,No,ALQ,263.0,Unf,0.0,1017.0,1280.0,GasA,Ex,Y,SBrkr,1280.0,0.0,0.0,1280.0,0.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,0.0,,Attchd,1992.0,RFn,2.0,506.0,TA,TA,Y,0.0,82.0,0.0,0.0,144.0,0.0,,,,0.0,1.0,2010.0,WD,191500.0
9.0,120.0,0.0,5389.0,0.0,0.0,0.0,0.0,1.0,2.0,1.0,0.0,8.0,5.0,1995.0,1996.0,,0.0,Gd,TA,PConc,Gd,TA,No,GLQ,1180.0,Unf,0.0,415.0,1595.0,GasA,Ex,Y,SBrkr,1616.0,0.0,0.0,1616.0,1.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,1.0,TA,Attchd,1995.0,RFn,2.0,608.0,TA,TA,Y,237.0,152.0,0.0,0.0,0.0,0.0,,,,0.0,3.0,2010.0,WD,236500.0


In [110]:
# COLUMN:		mas_vnr_type                                     
# DEFINITION:	Masonry veneer type
# DATA TYPE:	object
# MISSING VALUES:	22
# UNIQUE VALUES:	
df.mas_vnr_type.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array(['Stone', 'BrkFace', 'None', nan, 'BrkCmn'], dtype=object)

In [111]:
df = df.drop(columns=['mas_vnr_type'])


In [112]:
df.head(5)

Unnamed: 0,ms_subclass,ms_zoning,lot_area,street,alley,land_contour,utilities,lot_config,neighborhood,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1.0,20.0,0.0,31770.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,5.0,1960.0,1960.0,112.0,TA,TA,CBlock,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,TA,TA,P,210.0,62.0,0.0,0.0,0.0,0.0,,,,0.0,5.0,2010.0,WD,215000.0
3.0,20.0,0.0,14267.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,6.0,1958.0,1958.0,108.0,TA,TA,CBlock,TA,TA,No,ALQ,923.0,Unf,0.0,406.0,1329.0,GasA,TA,Y,SBrkr,1329.0,0.0,0.0,1329.0,0.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1958.0,Unf,1.0,312.0,TA,TA,Y,393.0,36.0,0.0,0.0,0.0,0.0,,,Gar2,12500.0,6.0,2010.0,WD,172000.0
5.0,60.0,0.0,13830.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,5.0,5.0,1997.0,1998.0,0.0,TA,TA,PConc,Gd,TA,No,GLQ,791.0,Unf,0.0,137.0,928.0,GasA,Gd,Y,SBrkr,928.0,701.0,0.0,1629.0,0.0,0.0,2.0,1.0,3.0,1.0,TA,6.0,Typ,1.0,TA,Attchd,1997.0,Fin,2.0,482.0,TA,TA,Y,212.0,34.0,0.0,0.0,0.0,0.0,,MnPrv,,0.0,3.0,2010.0,WD,189900.0
8.0,120.0,0.0,5005.0,0.0,0.0,1.0,0.0,1.0,2.0,1.0,0.0,8.0,5.0,1992.0,1992.0,0.0,Gd,TA,PConc,Gd,TA,No,ALQ,263.0,Unf,0.0,1017.0,1280.0,GasA,Ex,Y,SBrkr,1280.0,0.0,0.0,1280.0,0.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,0.0,,Attchd,1992.0,RFn,2.0,506.0,TA,TA,Y,0.0,82.0,0.0,0.0,144.0,0.0,,,,0.0,1.0,2010.0,WD,191500.0
9.0,120.0,0.0,5389.0,0.0,0.0,0.0,0.0,1.0,2.0,1.0,0.0,8.0,5.0,1995.0,1996.0,0.0,Gd,TA,PConc,Gd,TA,No,GLQ,1180.0,Unf,0.0,415.0,1595.0,GasA,Ex,Y,SBrkr,1616.0,0.0,0.0,1616.0,1.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,1.0,TA,Attchd,1995.0,RFn,2.0,608.0,TA,TA,Y,237.0,152.0,0.0,0.0,0.0,0.0,,,,0.0,3.0,2010.0,WD,236500.0


In [113]:
# COLUMN:		mas_vnr_area                                          
# DEFINITION:	Masonry veneer area in square feet
# DATA TYPE:	float64
# MISSING VALUES:	22
# UNIQUE VALUES:	
df.mas_vnr_area.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array([1.120e+02, 1.080e+02, 0.000e+00, 6.030e+02, 1.190e+02, 4.800e+02,
       1.800e+02, 5.040e+02, 3.810e+02, 1.620e+02, 2.000e+02, 2.260e+02,
       2.400e+02, 1.680e+02, 7.600e+02, 1.095e+03, 2.320e+02, 4.120e+02,
       1.780e+02, 1.060e+02, 1.600e+01,       nan, 1.650e+02, 3.380e+02,
       3.620e+02, 3.480e+02, 3.000e+01, 5.790e+02, 3.600e+01, 1.220e+02,
       3.100e+01, 2.500e+02, 1.200e+02, 2.160e+02, 4.320e+02, 2.890e+02,
       2.800e+01, 4.200e+01, 4.510e+02, 2.680e+02, 8.600e+01, 3.400e+02,
       1.100e+02, 1.640e+02, 3.610e+02, 5.060e+02, 1.500e+02, 2.200e+02,
       3.240e+02, 2.610e+02, 2.180e+02, 3.510e+02, 2.940e+02, 3.000e+02,
       4.700e+01, 1.430e+02, 2.880e+02, 9.600e+01, 3.360e+02, 1.770e+02,
       8.500e+01, 2.460e+02, 7.200e+01, 2.400e+01, 3.200e+02, 4.790e+02,
       4.420e+02, 1.700e+02, 1.090e+02, 9.800e+01, 2.030e+02, 4.400e+01,
       1.860e+02, 3.350e+02, 6.000e+01, 8.400e+01, 1.880e+02, 1.600e+02,
       2.200e+01, 4.000e+01, 3.440e+02, 7.480e+02, 

In [114]:
df = df.drop(columns=['mas_vnr_area'])
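
Dropping is one way to deal with the 22 missing masonry veneer entries. Another option, sketched here under the unverified assumption that a missing entry means the house has no veneer, is to fill the gaps instead. The toy frame is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical rows mimicking the two masonry veneer columns
toy = pd.DataFrame(
    {
        "mas_vnr_type": ["Stone", "BrkFace", np.nan, "None"],
        "mas_vnr_area": [112.0, 108.0, np.nan, 0.0],
    }
)

filled = toy.copy()
# ASSUMPTION: missing type means no veneer, so reuse the existing 'None' label
filled["mas_vnr_type"] = filled["mas_vnr_type"].fillna("None")
# ASSUMPTION: missing area means zero square feet of veneer
filled["mas_vnr_area"] = filled["mas_vnr_area"].fillna(0.0)
print(filled)
```

Whether filling or dropping is the better choice depends on how much the veneer columns turn out to matter for price.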


In [115]:
df.head(5)

Unnamed: 0,ms_subclass,ms_zoning,lot_area,street,alley,land_contour,utilities,lot_config,neighborhood,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1.0,20.0,0.0,31770.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,5.0,1960.0,1960.0,TA,TA,CBlock,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,TA,TA,P,210.0,62.0,0.0,0.0,0.0,0.0,,,,0.0,5.0,2010.0,WD,215000.0
3.0,20.0,0.0,14267.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,6.0,1958.0,1958.0,TA,TA,CBlock,TA,TA,No,ALQ,923.0,Unf,0.0,406.0,1329.0,GasA,TA,Y,SBrkr,1329.0,0.0,0.0,1329.0,0.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1958.0,Unf,1.0,312.0,TA,TA,Y,393.0,36.0,0.0,0.0,0.0,0.0,,,Gar2,12500.0,6.0,2010.0,WD,172000.0
5.0,60.0,0.0,13830.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,5.0,5.0,1997.0,1998.0,TA,TA,PConc,Gd,TA,No,GLQ,791.0,Unf,0.0,137.0,928.0,GasA,Gd,Y,SBrkr,928.0,701.0,0.0,1629.0,0.0,0.0,2.0,1.0,3.0,1.0,TA,6.0,Typ,1.0,TA,Attchd,1997.0,Fin,2.0,482.0,TA,TA,Y,212.0,34.0,0.0,0.0,0.0,0.0,,MnPrv,,0.0,3.0,2010.0,WD,189900.0
8.0,120.0,0.0,5005.0,0.0,0.0,1.0,0.0,1.0,2.0,1.0,0.0,8.0,5.0,1992.0,1992.0,Gd,TA,PConc,Gd,TA,No,ALQ,263.0,Unf,0.0,1017.0,1280.0,GasA,Ex,Y,SBrkr,1280.0,0.0,0.0,1280.0,0.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,0.0,,Attchd,1992.0,RFn,2.0,506.0,TA,TA,Y,0.0,82.0,0.0,0.0,144.0,0.0,,,,0.0,1.0,2010.0,WD,191500.0
9.0,120.0,0.0,5389.0,0.0,0.0,0.0,0.0,1.0,2.0,1.0,0.0,8.0,5.0,1995.0,1996.0,Gd,TA,PConc,Gd,TA,No,GLQ,1180.0,Unf,0.0,415.0,1595.0,GasA,Ex,Y,SBrkr,1616.0,0.0,0.0,1616.0,1.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,1.0,TA,Attchd,1995.0,RFn,2.0,608.0,TA,TA,Y,237.0,152.0,0.0,0.0,0.0,0.0,,,,0.0,3.0,2010.0,WD,236500.0


In [116]:
# COLUMN:		exter_qual                                                   
# DEFINITION:	Evaluates the quality of the material on the
# exterior
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.exter_qual.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['TA', 'Gd', 'Ex', 'Fa'], dtype=object)

In [117]:
exter_qual = {'TA':'0', 'Gd':'1', 'Ex':'2', 'Fa':'3'}


In [118]:
df.exter_qual = [exter_qual[item] for item in df.exter_qual] 


In [119]:
df.exter_qual.unique()


array(['0', '1', '2', '3'], dtype=object)

In [120]:
# np.int was removed in NumPy 1.24; the built-in int gives the same result.
df["exter_qual"] = df["exter_qual"].astype(int)


In [121]:
df['exter_qual'].dtype


dtype('int64')
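
The mapping above numbers the labels in order of appearance, so the integers do not follow the quality scale. If an ordered encoding is wanted, a sketch like the following could be used. It assumes the usual Ames reading of the labels (Po = poor, Fa = fair, TA = typical/average, Gd = good, Ex = excellent); the `exter_qual_ord` column name is hypothetical.

```python
import pandas as pd

# ASSUMPTION: quality labels ordered from worst to best
quality_order = {"Po": 0, "Fa": 1, "TA": 2, "Gd": 3, "Ex": 4}

# Toy stand-in holding the four labels seen in exter_qual
toy = pd.DataFrame({"exter_qual": ["TA", "Gd", "Ex", "Fa"]})

# Series.map looks up each label; unmapped labels would become NaN
toy["exter_qual_ord"] = toy["exter_qual"].map(quality_order)
print(toy)
```

The same dictionary could be reused for other columns that share this scale, such as `exter_cond`, which also contains 'Po'.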

In [122]:
# COLUMN:		exter_cond                                                           
# DEFINITION:	Evaluates the present condition of the material on the exterior
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.exter_cond.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['TA', 'Gd', 'Po', 'Fa', 'Ex'], dtype=object)

In [123]:
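# NOTE: this drops bsmt_qual, not exter_cond, so exter_cond is left
# unconverted (still dtype object, as the next cell shows).
df = df.drop(columns=['bsmt_qual'])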

In [124]:
df['exter_cond'].dtype


dtype('O')

In [125]:
# COLUMN:		foundation                                                                    
# DEFINITION:	Type of foundation
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.foundation.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


array(['CBlock', 'PConc', 'Slab', 'BrkTil', 'Stone', 'Wood'], dtype=object)

In [126]:
df = df.drop(columns=['foundation'])


In [127]:
df.head(5)

Unnamed: 0,ms_subclass,ms_zoning,lot_area,street,alley,land_contour,utilities,lot_config,neighborhood,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,exter_qual,exter_cond,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1.0,20.0,0.0,31770.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,5.0,1960.0,1960.0,0.0,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,TA,TA,P,210.0,62.0,0.0,0.0,0.0,0.0,,,,0.0,5.0,2010.0,WD,215000.0
3.0,20.0,0.0,14267.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,6.0,1958.0,1958.0,0.0,TA,TA,No,ALQ,923.0,Unf,0.0,406.0,1329.0,GasA,TA,Y,SBrkr,1329.0,0.0,0.0,1329.0,0.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1958.0,Unf,1.0,312.0,TA,TA,Y,393.0,36.0,0.0,0.0,0.0,0.0,,,Gar2,12500.0,6.0,2010.0,WD,172000.0
5.0,60.0,0.0,13830.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,5.0,5.0,1997.0,1998.0,0.0,TA,TA,No,GLQ,791.0,Unf,0.0,137.0,928.0,GasA,Gd,Y,SBrkr,928.0,701.0,0.0,1629.0,0.0,0.0,2.0,1.0,3.0,1.0,TA,6.0,Typ,1.0,TA,Attchd,1997.0,Fin,2.0,482.0,TA,TA,Y,212.0,34.0,0.0,0.0,0.0,0.0,,MnPrv,,0.0,3.0,2010.0,WD,189900.0
8.0,120.0,0.0,5005.0,0.0,0.0,1.0,0.0,1.0,2.0,1.0,0.0,8.0,5.0,1992.0,1992.0,1.0,TA,TA,No,ALQ,263.0,Unf,0.0,1017.0,1280.0,GasA,Ex,Y,SBrkr,1280.0,0.0,0.0,1280.0,0.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,0.0,,Attchd,1992.0,RFn,2.0,506.0,TA,TA,Y,0.0,82.0,0.0,0.0,144.0,0.0,,,,0.0,1.0,2010.0,WD,191500.0
9.0,120.0,0.0,5389.0,0.0,0.0,0.0,0.0,1.0,2.0,1.0,0.0,8.0,5.0,1995.0,1996.0,1.0,TA,TA,No,GLQ,1180.0,Unf,0.0,415.0,1595.0,GasA,Ex,Y,SBrkr,1616.0,0.0,0.0,1616.0,1.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,1.0,TA,Attchd,1995.0,RFn,2.0,608.0,TA,TA,Y,237.0,152.0,0.0,0.0,0.0,0.0,,,,0.0,3.0,2010.0,WD,236500.0


In [128]:
# COLUMN:		bsmt_qual                                                                              
# DEFINITION:	Evaluates the height of the basement
# DATA TYPE:	object
# MISSING VALUES:	55
# UNIQUE VALUES:	
df.bsmt_qual.unique()
# COLUMN NEEDS TO BE CONVERTED TO INTEGER, CONVERT
# MISSING VALUES!


AttributeError: 'DataFrame' object has no attribute 'bsmt_qual'

In [None]:
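# bsmt_qual WAS ALREADY DROPPED IN AN EARLIER CELL; errors='ignore' AVOIDS A KeyError
df = df.drop(columns=['bsmt_qual'], errors='ignore')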


In [None]:
df.head(5)

In [None]:
# COLUMN:		bsmt_cond                                                                                        
# DEFINITION:	Evaluates the general condition of the basement
# DATA TYPE:	object
# MISSING VALUES:	55
# UNIQUE VALUES:	
df.bsmt_cond.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, CONVERT
# MISSING VALUES!


In [None]:
bsmt_cond = {np.nan:'0', 'Gd':'1', 'TA':'2', 'Po':'3', 'Fa':'4', 'Ex':'5'}


In [None]:
df.bsmt_cond = [bsmt_cond[item] for item in df.bsmt_cond] 


In [None]:
df['bsmt_cond'] = pd.to_numeric(df['bsmt_cond'], errors='coerce')


In [None]:
df['bsmt_cond'].dtype


In [None]:
df.bsmt_cond.unique()
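
The `bsmt_cond` dictionary uses `np.nan` as a key, which only works while every missing entry is the very same `np.nan` object (NaN never compares equal to itself). A sketch of a less fragile variant, using a made-up Series (`sample`, hypothetical) and assuming that a missing value means "no basement":

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for df.bsmt_cond
sample = pd.Series(['TA', 'Gd', np.nan, 'Fa', 'TA'])

# Codes copied from the bsmt_cond dictionary in the notebook, minus the NaN key
codes = {'Gd': 1, 'TA': 2, 'Po': 3, 'Fa': 4, 'Ex': 5}

# Map known labels, then give missing entries the assumed "no basement" code 0
encoded = sample.map(codes).fillna(0).astype(int)
print(encoded.tolist())
```

Note that `fillna(0)` also assigns 0 to any label that is absent from `codes`, so it is worth checking `unique()` first.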


In [None]:
# COLUMN:		bsmt_exposure                                                                                              
# DEFINITION:	Refers to walkout or garden level walls
# DATA TYPE:	object
# MISSING VALUES:	58
# UNIQUE VALUES:	
df.bsmt_exposure.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


In [None]:
df=df.drop(columns=['bsmt_exposure'])


In [None]:
df.head(5)

In [None]:
# COLUMN:		bsmtfin_type_1                                                                                                   
# DEFINITION:	Rating of basement finished area
# DATA TYPE:	object
# MISSING VALUES:	55
# UNIQUE VALUES:	
df.bsmtfin_type_1.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


In [None]:
df=df.drop(columns=['bsmtfin_type_1'])


In [None]:
df.head(5)

In [None]:
# COLUMN:		bsmtfin_sf_1                                                                                                        
# DEFINITION:	Type 1 finished square feet
# DATA TYPE:	float64
# MISSING VALUES:	1
# UNIQUE VALUES:	
df.bsmtfin_sf_1.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


In [None]:
df=df.drop(columns=['bsmtfin_sf_1'])


In [None]:
df.head(5)

In [None]:
# COLUMN:		bsmtfin_type_2                                                                                                             
# DEFINITION:	Rating of basement finished area (if multiple types)
# DATA TYPE:	object
# MISSING VALUES:	56
# UNIQUE VALUES:	
df.bsmtfin_type_2.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


In [None]:
df=df.drop(columns=['bsmtfin_type_2'])


In [None]:
df.head(5)

In [None]:
# COLUMN:		bsmtfin_sf_2                                                                                                                   
# DEFINITION:	Type 2 finished square feet
# DATA TYPE:	float64
# MISSING VALUES:	1
# UNIQUE VALUES:	
df.bsmtfin_sf_2.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


In [None]:
df=df.drop(columns=['bsmtfin_sf_2'])


In [None]:
df.head(5)

In [None]:
# COLUMN:		bsmt_unf_sf                                                                                                                         
# DEFINITION:	Unfinished square feet of basement area
# DATA TYPE:	float64
# MISSING VALUES:	1
# UNIQUE VALUES:	
df.bsmt_unf_sf.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


In [None]:
df=df.drop(columns=['bsmt_unf_sf'])


In [None]:
df.head(5)

In [None]:
# COLUMN:		total_bsmt_sf                                                                                                                             
# DEFINITION:	Total square feet of basement area
# DATA TYPE:	float64
# MISSING VALUES:	1
# UNIQUE VALUES:	
df.total_bsmt_sf.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


In [None]:
df=df.drop(columns=['total_bsmt_sf'])


In [None]:
df.head(5)
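
The basement columns above are dropped one cell at a time. As a sketch, the same result can be reached with a single `drop` call; the `demo` frame is hypothetical and holds only a few of the columns:

```python
import pandas as pd

# Hypothetical frame with a few of the basement columns
demo = pd.DataFrame({
    'bsmtfin_sf_1': [639.0, 923.0],
    'bsmt_unf_sf': [441.0, 406.0],
    'total_bsmt_sf': [1080.0, 1329.0],
    'saleprice': [215000, 172000],
})

to_drop = ['bsmtfin_sf_1', 'bsmtfin_sf_2', 'bsmt_unf_sf', 'total_bsmt_sf']

# errors='ignore' skips names that are not present (bsmtfin_sf_2 here),
# so the cell can be re-run without raising a KeyError
demo = demo.drop(columns=to_drop, errors='ignore')
print(demo.columns.tolist())
```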

In [None]:
# COLUMN:		heating                                                                                                                                         
# DEFINITION:	Type of heating
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.heating.unique()
# COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


In [None]:
heating = {'GasA':'0', 'GasW':'1', 'Grav':'2', 'Wall':'3', 'OthW':'4'}


In [None]:
df.heating = [heating[item] for item in df.heating] 


In [None]:
df.heating.unique()


In [None]:
df["heating"] = df["heating"].astype(int)


In [None]:
df['heating'].dtype


In [None]:
# COLUMN:		heating_qc                                                                                                                                                  
# DEFINITION:	Heating quality and condition
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.heating_qc.unique()
# COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


In [None]:
heating_qc = {'Fa':'0', 'TA':'1', 'Gd':'2', 'Ex':'3', 'Po':'4'}


In [None]:
df.heating_qc = [heating_qc[item] for item in df.heating_qc] 


In [None]:
df.heating_qc.unique()


In [None]:
df["heating_qc"] = df["heating_qc"].astype(int)


In [None]:
df['heating_qc'].dtype


In [None]:
# COLUMN:		central_air                                                                                                                                                          
# DEFINITION:	Central air conditioning
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.central_air.unique()
# COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


In [None]:
central_air = {'Y':'0', 'N':'1'}


In [None]:
df.central_air = [central_air[item] for item in df.central_air] 


In [None]:
df.central_air.unique()


In [None]:
df["central_air"] = df["central_air"].astype(int)


In [None]:
df['central_air'].dtype


In [None]:
# COLUMN:		electrical                                                                                                                                                                   
# DEFINITION:	Electrical system
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.electrical.unique()


In [None]:
electrical = {'SBrkr':'0', 'FuseA':'1', 'FuseF':'2', 'FuseP':'3', 'Mix':'4'}


In [None]:
df.electrical = [electrical[item] for item in df.electrical] 


In [None]:
df.electrical.unique()


In [None]:
df["electrical"] = df["electrical"].astype(int)


In [None]:
df['electrical'].dtype
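
The `heating`, `heating_qc`, `central_air` and `electrical` cells repeat the same three steps: build a dictionary, look up every value, cast to integer. A sketch of one helper that covers all three; `encode_column` and `demo` are hypothetical names, and the codes are the ones used in the cells above:

```python
import pandas as pd

def encode_column(frame, name, mapping):
    """Replace labels in frame[name] with integer codes from mapping."""
    encoded = frame[name].map(mapping)
    # Anything that did not get a code (including NaN) is reported, not hidden
    unmapped = frame.loc[encoded.isna(), name].unique()
    if len(unmapped) > 0:
        raise ValueError(f"{name}: labels without a code: {list(unmapped)}")
    frame[name] = encoded.astype(int)
    return frame

# Hypothetical frame standing in for the housing data
demo = pd.DataFrame({
    'central_air': ['Y', 'N', 'Y'],
    'electrical': ['SBrkr', 'FuseA', 'SBrkr'],
})

demo = encode_column(demo, 'central_air', {'Y': 0, 'N': 1})
demo = encode_column(demo, 'electrical',
                     {'SBrkr': 0, 'FuseA': 1, 'FuseF': 2, 'FuseP': 3, 'Mix': 4})
print(demo.dtypes)
```

Raising on unmapped labels is a design choice: it surfaces a missing value or a misspelled category immediately instead of at the integer cast.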


In [None]:
# COLUMN:		first_flr_sf                                                                                                                                                                                     
# DEFINITION:	First Floor square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.first_flr_sf.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


In [None]:
df = df.drop(columns=['first_flr_sf'])


In [None]:
df.head(5)

In [None]:
# COLUMN:		low_qual_fin_sf              
# DEFINITION:	Low quality finished square feet (all floors)
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.low_qual_fin_sf.unique()
# COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


In [None]:
df = df.drop(columns=['low_qual_fin_sf'])


In [None]:
df.head(5)

In [None]:
# COLUMN:		gr_liv_area                       
# DEFINITION:	Above grade (ground) living area square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.gr_liv_area.unique()
# COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!

In [None]:
df = df.drop(columns=['gr_liv_area'])


In [129]:
df.head(5)

Unnamed: 0,ms_subclass,ms_zoning,lot_area,street,alley,land_contour,utilities,lot_config,neighborhood,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,exter_qual,exter_cond,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1.0,20.0,0.0,31770.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,5.0,1960.0,1960.0,0.0,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,TA,TA,P,210.0,62.0,0.0,0.0,0.0,0.0,,,,0.0,5.0,2010.0,WD,215000.0
3.0,20.0,0.0,14267.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,6.0,1958.0,1958.0,0.0,TA,TA,No,ALQ,923.0,Unf,0.0,406.0,1329.0,GasA,TA,Y,SBrkr,1329.0,0.0,0.0,1329.0,0.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1958.0,Unf,1.0,312.0,TA,TA,Y,393.0,36.0,0.0,0.0,0.0,0.0,,,Gar2,12500.0,6.0,2010.0,WD,172000.0
5.0,60.0,0.0,13830.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,5.0,5.0,1997.0,1998.0,0.0,TA,TA,No,GLQ,791.0,Unf,0.0,137.0,928.0,GasA,Gd,Y,SBrkr,928.0,701.0,0.0,1629.0,0.0,0.0,2.0,1.0,3.0,1.0,TA,6.0,Typ,1.0,TA,Attchd,1997.0,Fin,2.0,482.0,TA,TA,Y,212.0,34.0,0.0,0.0,0.0,0.0,,MnPrv,,0.0,3.0,2010.0,WD,189900.0
8.0,120.0,0.0,5005.0,0.0,0.0,1.0,0.0,1.0,2.0,1.0,0.0,8.0,5.0,1992.0,1992.0,1.0,TA,TA,No,ALQ,263.0,Unf,0.0,1017.0,1280.0,GasA,Ex,Y,SBrkr,1280.0,0.0,0.0,1280.0,0.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,0.0,,Attchd,1992.0,RFn,2.0,506.0,TA,TA,Y,0.0,82.0,0.0,0.0,144.0,0.0,,,,0.0,1.0,2010.0,WD,191500.0
9.0,120.0,0.0,5389.0,0.0,0.0,0.0,0.0,1.0,2.0,1.0,0.0,8.0,5.0,1995.0,1996.0,1.0,TA,TA,No,GLQ,1180.0,Unf,0.0,415.0,1595.0,GasA,Ex,Y,SBrkr,1616.0,0.0,0.0,1616.0,1.0,0.0,2.0,0.0,2.0,1.0,Gd,5.0,Typ,1.0,TA,Attchd,1995.0,RFn,2.0,608.0,TA,TA,Y,237.0,152.0,0.0,0.0,0.0,0.0,,,,0.0,3.0,2010.0,WD,236500.0


In [130]:
# COLUMN:		bsmt_full_bath                           
# DEFINITION:	Basement full bathrooms
# DATA TYPE:	float64
# MISSING VALUES:	2
# UNIQUE VALUES:	
df.bsmt_full_bath.unique()
# COLUMN NEEDS TO BE CONVERTED TO INTEGER, CONVERT
# MISSING VALUES!

array([ 1.,  0.,  2.,  3., nan])

In [None]:
df = df.dropna(subset=['bsmt_full_bath'])

In [None]:
df.bsmt_full_bath.unique()

In [None]:
df["bsmt_full_bath"] = df["bsmt_full_bath"].astype(int)


In [None]:
df['bsmt_full_bath'].dtype


In [None]:
df.bsmt_full_bath.unique()
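
Dropping rows with `dropna` removes the whole observation, not just the missing bathroom count. The sketch below contrasts that with pandas' nullable `Int64` dtype, which keeps the rows; the `demo` frame is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the housing data
demo = pd.DataFrame({'bsmt_full_bath': [1.0, 0.0, np.nan, 2.0]})

# Option 1 (what the notebook does): drop the rows, then cast
dropped = demo.dropna(subset=['bsmt_full_bath']).copy()
dropped['bsmt_full_bath'] = dropped['bsmt_full_bath'].astype(int)

# Option 2: keep every row by using the nullable integer dtype
kept = demo['bsmt_full_bath'].astype('Int64')

print(len(dropped), len(kept), int(kept.isna().sum()))
```

Option 2 only helps if the downstream model accepts missing values; otherwise the rows still have to be dropped or imputed later.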


In [None]:
# COLUMN:		bsmt_half_bath                              
# DEFINITION:	Basement half bathrooms
# DATA TYPE:	float64
# MISSING VALUES:	2
# UNIQUE VALUES:	
df.bsmt_half_bath.unique()
# COLUMN NEEDS TO BE CONVERTED TO INTEGER, CONVERT
# MISSING VALUES!

In [None]:
df = df.dropna(subset=['bsmt_half_bath'])


In [None]:
df.bsmt_half_bath.unique()


In [None]:
df["bsmt_half_bath"] = df["bsmt_half_bath"].astype(int)


In [None]:
df['bsmt_half_bath'].dtype


In [None]:
df.bsmt_half_bath.unique()


In [None]:
# COLUMN:		full_bath                                        
# DEFINITION:	Full bathrooms above grade
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.full_bath.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


In [None]:
# COLUMN:		half_bath                                                  
# DEFINITION:	Half baths above grade
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.half_bath.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


In [None]:
# COLUMN:		bedroom_abvgr                                                  
# DEFINITION:	Bedrooms above grade (does NOT include basement bedrooms)
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.bedroom_abvgr.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


In [None]:
# COLUMN:		kitchen_abvgr                                                        
# DEFINITION:	Kitchens above grade
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.kitchen_abvgr.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!

In [None]:
# COLUMN:		kitchen_qual                                                              
# DEFINITION:	Kitchen quality
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.kitchen_qual.unique()


In [None]:
# COLUMN:		totrms_abvgrd                                                                     
# DEFINITION:	Total rooms above grade (does not include bathrooms)
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.totrms_abvgrd.unique()


In [None]:
# COLUMN:		functional                                                                             
# DEFINITION:	Home functionality (Assume typical unless deductions are warranted)
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.functional.unique()


In [None]:
# COLUMN:		fireplaces                                                                                       
# DEFINITION:	Number of fireplaces
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.fireplaces.unique()

In [None]:
# COLUMN:		fireplace_qu                                                                                              
# DEFINITION:	Fireplace quality
# DATA TYPE:	object
# MISSING VALUES:	1000
# UNIQUE VALUES:	
df.fireplace_qu.unique()


In [174]:
df = df.drop(columns=['fireplace_qu'])


In [175]:
df.head(5)

Unnamed: 0,ms_subclass,ms_zoning,lot_area,street,alley,land_contour,utilities,lot_config,neighborhood,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,exter_qual,exter_cond,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_qc,fence,misc_val,mo_sold,yr_sold,sale_type,saleprice
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1.0,20.0,0.0,31770.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,5.0,1960.0,1960.0,0.0,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,1.0,1.0,210.0,62.0,0.0,0.0,0.0,,,0.0,5.0,2010.0,0.0,215000.0
160.0,20.0,0.0,9830.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,7.0,1959.0,2006.0,0.0,Gd,TA,No,ALQ,72.0,Rec,258.0,733.0,1063.0,GasA,Ex,Y,SBrkr,1287.0,0.0,0.0,1287.0,1.0,0.0,1.0,0.0,3.0,1.0,Gd,7.0,Typ,1.0,Gd,Detchd,1997.0,Fin,2.0,576.0,1.0,1.0,364.0,17.0,0.0,0.0,182.0,,,0.0,3.0,2010.0,0.0,162000.0
163.0,20.0,0.0,7500.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,5.0,7.0,1959.0,1994.0,0.0,TA,TA,No,LwQ,340.0,Rec,906.0,0.0,1246.0,GasA,Ex,Y,SBrkr,1246.0,0.0,0.0,1246.0,1.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1959.0,RFn,1.0,305.0,1.0,1.0,218.0,0.0,0.0,0.0,0.0,,GdPrv,0.0,5.0,2010.0,0.0,154000.0
89.0,20.0,0.0,6897.0,0.0,0.0,0.0,0.0,0.0,11.0,0.0,0.0,5.0,8.0,1962.0,2010.0,0.0,Gd,TA,No,ALQ,659.0,Unf,0.0,381.0,1040.0,GasA,Ex,Y,SBrkr,1040.0,0.0,0.0,1040.0,1.0,0.0,1.0,1.0,3.0,1.0,TA,6.0,Typ,0.0,,Detchd,1962.0,Unf,1.0,260.0,1.0,1.0,0.0,104.0,0.0,0.0,0.0,,,0.0,4.0,2010.0,0.0,127000.0
164.0,50.0,0.0,8520.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,5.0,4.0,1952.0,1952.0,0.0,Fa,TA,No,Rec,507.0,Unf,0.0,403.0,910.0,GasA,Fa,Y,SBrkr,910.0,475.0,0.0,1385.0,0.0,0.0,2.0,0.0,4.0,1.0,TA,6.0,Typ,0.0,,Detchd,2000.0,Unf,2.0,720.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,,MnPrv,0.0,6.0,2010.0,0.0,166000.0


In [129]:
# COLUMN:		garage_type                                                                                                      
# DEFINITION:	Garage location
# DATA TYPE:	object
# MISSING VALUES:	113
# UNIQUE VALUES:	
df.garage_type.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, CONVERT
# MISSING VALUES!


array(['Attchd', 'BuiltIn', 'Detchd', nan, 'Basment', '2Types', 'CarPort'],
      dtype=object)

In [130]:
garage_type = {np.nan:'0', 'Attchd':'1', 'Detchd':'2', 'BuiltIn':'3', 'CarPort':'4', 'Basment':'5', '2Types':'6'}
df.garage_type = [garage_type[item] for item in df.garage_type] 
# errors MUST BE ONE OF THE VALUES pd.to_numeric ACCEPTS; 'coerce' TURNS BAD VALUES INTO NaN
df['garage_type'] = pd.to_numeric(df['garage_type'], errors='coerce')


In [131]:
df['garage_type'].dtype


dtype('int64')

In [132]:
df.garage_type.unique()


array([1, 3, 2, 0, 5, 6, 4])

In [186]:
# COLUMN:		garage_yr_blt                                                                                                           
# DEFINITION:	Year garage was built
# DATA TYPE:	float64
# MISSING VALUES:	114
# UNIQUE VALUES:	
df.garage_yr_blt.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array([1960., 1997., 1959., 1962., 2000., 2005., 1957., 1945., 1974.,
       1928., 2003., 1958., 1966., 1992., 2004., 2009., 1978., 1976.,
       1948., 1967., 1989., 1953., 1951., 1970., 1971.,   nan, 1999.,
       1977., 1975., 2008., 1973., 1950., 1956., 1993., 1963., 1961.,
       1968., 1969., 1998., 1965., 1964., 1987., 1920., 1955., 1954.,
       1952., 1985., 1972., 2006., 1995., 1994., 1984., 1983., 1979.,
       1949., 1941., 1996., 1930., 1990., 1980., 1986., 2001., 1922.,
       2002., 2010., 2007., 1939., 1910., 1917., 1923., 1936., 1931.,
       1915., 1991., 1981., 1934., 1900., 1988., 1940., 1942., 1926.,
       1946., 1925., 1929., 1916., 1982., 1947., 1937., 1935., 1938.,
       1895., 1924., 1914., 1918., 2207., 1932., 1921., 1927., 1933.,
       1919., 1896.])
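
The unique values above include 2207, a year later than any sale in the data. A sketch of how such rows could be located before deciding what to do with the column; the `demo` frame and its `yr_sold` values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical rows; 2207 is the value visible in the output above
demo = pd.DataFrame({
    'garage_yr_blt': [1960.0, 2207.0, np.nan, 2006.0],
    'yr_sold': [2010, 2007, 2009, 2008],
})

# A garage cannot be built after the sale; comparisons with NaN are False,
# so rows without a garage year are not flagged
suspect = demo[demo['garage_yr_blt'] > demo['yr_sold']]
print(suspect)
```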

In [174]:
df = df.drop(columns=['garage_yr_blt'])


In [175]:
df.head(5)

Unnamed: 0,ms_subclass,ms_zoning,lot_area,street,alley,land_contour,utilities,lot_config,neighborhood,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,exter_qual,exter_cond,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_qc,fence,misc_val,mo_sold,yr_sold,sale_type,saleprice
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1.0,20.0,0.0,31770.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,5.0,1960.0,1960.0,0.0,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,1.0,1.0,210.0,62.0,0.0,0.0,0.0,,,0.0,5.0,2010.0,0.0,215000.0
160.0,20.0,0.0,9830.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,7.0,1959.0,2006.0,0.0,Gd,TA,No,ALQ,72.0,Rec,258.0,733.0,1063.0,GasA,Ex,Y,SBrkr,1287.0,0.0,0.0,1287.0,1.0,0.0,1.0,0.0,3.0,1.0,Gd,7.0,Typ,1.0,Gd,Detchd,1997.0,Fin,2.0,576.0,1.0,1.0,364.0,17.0,0.0,0.0,182.0,,,0.0,3.0,2010.0,0.0,162000.0
163.0,20.0,0.0,7500.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,5.0,7.0,1959.0,1994.0,0.0,TA,TA,No,LwQ,340.0,Rec,906.0,0.0,1246.0,GasA,Ex,Y,SBrkr,1246.0,0.0,0.0,1246.0,1.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1959.0,RFn,1.0,305.0,1.0,1.0,218.0,0.0,0.0,0.0,0.0,,GdPrv,0.0,5.0,2010.0,0.0,154000.0
89.0,20.0,0.0,6897.0,0.0,0.0,0.0,0.0,0.0,11.0,0.0,0.0,5.0,8.0,1962.0,2010.0,0.0,Gd,TA,No,ALQ,659.0,Unf,0.0,381.0,1040.0,GasA,Ex,Y,SBrkr,1040.0,0.0,0.0,1040.0,1.0,0.0,1.0,1.0,3.0,1.0,TA,6.0,Typ,0.0,,Detchd,1962.0,Unf,1.0,260.0,1.0,1.0,0.0,104.0,0.0,0.0,0.0,,,0.0,4.0,2010.0,0.0,127000.0
164.0,50.0,0.0,8520.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,5.0,4.0,1952.0,1952.0,0.0,Fa,TA,No,Rec,507.0,Unf,0.0,403.0,910.0,GasA,Fa,Y,SBrkr,910.0,475.0,0.0,1385.0,0.0,0.0,2.0,0.0,4.0,1.0,TA,6.0,Typ,0.0,,Detchd,2000.0,Unf,2.0,720.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,,MnPrv,0.0,6.0,2010.0,0.0,166000.0


In [184]:
# COLUMN:		garage_finish                                                                                                                
# DEFINITION:	Interior finish of the garage
# DATA TYPE:	object
# MISSING VALUES:	114
# UNIQUE VALUES:	
df.garage_finish.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!



array(['Fin', 'RFn', 'Unf', nan], dtype=object)

In [174]:
df = df.drop(columns=['garage_finish'])


In [175]:
df.head(5)

Unnamed: 0,ms_subclass,ms_zoning,lot_area,street,alley,land_contour,utilities,lot_config,neighborhood,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,exter_qual,exter_cond,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_qc,fence,misc_val,mo_sold,yr_sold,sale_type,saleprice
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1.0,20.0,0.0,31770.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,5.0,1960.0,1960.0,0.0,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,1.0,1.0,210.0,62.0,0.0,0.0,0.0,,,0.0,5.0,2010.0,0.0,215000.0
160.0,20.0,0.0,9830.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,7.0,1959.0,2006.0,0.0,Gd,TA,No,ALQ,72.0,Rec,258.0,733.0,1063.0,GasA,Ex,Y,SBrkr,1287.0,0.0,0.0,1287.0,1.0,0.0,1.0,0.0,3.0,1.0,Gd,7.0,Typ,1.0,Gd,Detchd,1997.0,Fin,2.0,576.0,1.0,1.0,364.0,17.0,0.0,0.0,182.0,,,0.0,3.0,2010.0,0.0,162000.0
163.0,20.0,0.0,7500.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,5.0,7.0,1959.0,1994.0,0.0,TA,TA,No,LwQ,340.0,Rec,906.0,0.0,1246.0,GasA,Ex,Y,SBrkr,1246.0,0.0,0.0,1246.0,1.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1959.0,RFn,1.0,305.0,1.0,1.0,218.0,0.0,0.0,0.0,0.0,,GdPrv,0.0,5.0,2010.0,0.0,154000.0
89.0,20.0,0.0,6897.0,0.0,0.0,0.0,0.0,0.0,11.0,0.0,0.0,5.0,8.0,1962.0,2010.0,0.0,Gd,TA,No,ALQ,659.0,Unf,0.0,381.0,1040.0,GasA,Ex,Y,SBrkr,1040.0,0.0,0.0,1040.0,1.0,0.0,1.0,1.0,3.0,1.0,TA,6.0,Typ,0.0,,Detchd,1962.0,Unf,1.0,260.0,1.0,1.0,0.0,104.0,0.0,0.0,0.0,,,0.0,4.0,2010.0,0.0,127000.0
164.0,50.0,0.0,8520.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,5.0,4.0,1952.0,1952.0,0.0,Fa,TA,No,Rec,507.0,Unf,0.0,403.0,910.0,GasA,Fa,Y,SBrkr,910.0,475.0,0.0,1385.0,0.0,0.0,2.0,0.0,4.0,1.0,TA,6.0,Typ,0.0,,Detchd,2000.0,Unf,2.0,720.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,,MnPrv,0.0,6.0,2010.0,0.0,166000.0


In [178]:
# COLUMN:		garage_cars                                                                                                                      
# DEFINITION:	Size of garage in car capacity
# DATA TYPE:	float64
# MISSING VALUES:	1
# UNIQUE VALUES:	
df.garage_cars.unique()


array([ 2.,  1.,  3.,  0.,  4.,  5., nan])

In [179]:
df = df.dropna(subset=['garage_cars'])

In [180]:
df.garage_cars.unique()

array([2., 1., 3., 0., 4., 5.])

In [181]:
df["garage_cars"] = df["garage_cars"].astype(int)


In [182]:
df['garage_cars'].dtype

dtype('int64')

In [183]:
df.garage_cars.unique()


array([2, 1, 3, 0, 4, 5])

In [None]:
# COLUMN:		garage_area                                                                                                                             
# DEFINITION:	Size of garage in square feet
# DATA TYPE:	float64
# MISSING VALUES:	1
# UNIQUE VALUES:	
df.garage_area.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


In [174]:
df = df.drop(columns=['garage_area'])


In [175]:
df.head(5)

Unnamed: 0,ms_subclass,ms_zoning,lot_area,street,alley,land_contour,utilities,lot_config,neighborhood,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,exter_qual,exter_cond,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_qc,fence,misc_val,mo_sold,yr_sold,sale_type,saleprice
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1.0,20.0,0.0,31770.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,5.0,1960.0,1960.0,0.0,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,1.0,1.0,210.0,62.0,0.0,0.0,0.0,,,0.0,5.0,2010.0,0.0,215000.0
160.0,20.0,0.0,9830.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,7.0,1959.0,2006.0,0.0,Gd,TA,No,ALQ,72.0,Rec,258.0,733.0,1063.0,GasA,Ex,Y,SBrkr,1287.0,0.0,0.0,1287.0,1.0,0.0,1.0,0.0,3.0,1.0,Gd,7.0,Typ,1.0,Gd,Detchd,1997.0,Fin,2.0,576.0,1.0,1.0,364.0,17.0,0.0,0.0,182.0,,,0.0,3.0,2010.0,0.0,162000.0
163.0,20.0,0.0,7500.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,5.0,7.0,1959.0,1994.0,0.0,TA,TA,No,LwQ,340.0,Rec,906.0,0.0,1246.0,GasA,Ex,Y,SBrkr,1246.0,0.0,0.0,1246.0,1.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1959.0,RFn,1.0,305.0,1.0,1.0,218.0,0.0,0.0,0.0,0.0,,GdPrv,0.0,5.0,2010.0,0.0,154000.0
89.0,20.0,0.0,6897.0,0.0,0.0,0.0,0.0,0.0,11.0,0.0,0.0,5.0,8.0,1962.0,2010.0,0.0,Gd,TA,No,ALQ,659.0,Unf,0.0,381.0,1040.0,GasA,Ex,Y,SBrkr,1040.0,0.0,0.0,1040.0,1.0,0.0,1.0,1.0,3.0,1.0,TA,6.0,Typ,0.0,,Detchd,1962.0,Unf,1.0,260.0,1.0,1.0,0.0,104.0,0.0,0.0,0.0,,,0.0,4.0,2010.0,0.0,127000.0
164.0,50.0,0.0,8520.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,5.0,4.0,1952.0,1952.0,0.0,Fa,TA,No,Rec,507.0,Unf,0.0,403.0,910.0,GasA,Fa,Y,SBrkr,910.0,475.0,0.0,1385.0,0.0,0.0,2.0,0.0,4.0,1.0,TA,6.0,Typ,0.0,,Detchd,2000.0,Unf,2.0,720.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,,MnPrv,0.0,6.0,2010.0,0.0,166000.0


In [162]:
# COLUMN:		garage_qual                                                                                                                                     
# DEFINITION:	Garage quality
# DATA TYPE:	object
# MISSING VALUES:	114
# UNIQUE VALUES:	
df.garage_qual.unique()
# COLUMN APPEARS TO BE THE SAME AS garage_cond. DROPPING
# garage_cond AND KEEPING garage_qual. CHANGE DATA TYPE,
# ADDRESS MISSING VALUES!


array(['TA', 'Fa', nan, 'Ex', 'Gd', 'Po'], dtype=object)
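
The comment above says `garage_qual` appears to be the same as `garage_cond`. That can be measured before one of them is dropped. A sketch on a made-up frame (`demo`, hypothetical), run on the raw labels before any encoding:

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the two garage columns
demo = pd.DataFrame({
    'garage_qual': ['TA', 'TA', 'Fa', np.nan, 'Ex', 'TA'],
    'garage_cond': ['TA', 'TA', 'Fa', np.nan, 'Ex', 'Fa'],
})

# Treat missing as its own label so that two missing entries count as agreement
qual = demo['garage_qual'].fillna('None')
cond = demo['garage_cond'].fillna('None')

agreement = (qual == cond).mean()
print(f"share of identical rows: {agreement:.2f}")

# The cross-tabulation shows where the two ratings differ
print(pd.crosstab(qual, cond))
```

A share close to 1 would support keeping only one of the two columns.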

In [164]:
print(df['garage_qual'])


1        TA
160      TA
163      TA
89       TA
164      TA
165      TA
166      TA
174      TA
176      TA
178      TA
182      Fa
140      TA
139      TA
138      TA
136      TA
91       TA
94       TA
96       TA
97       TA
98       TA
99       TA
100      TA
101      TA
104      TA
108      TA
109      TA
158      TA
155      TA
154      TA
153      TA
19       TA
20       TA
23       TA
43       TA
24       TA
25       TA
28      NaN
29       TA
30       TA
32       TA
33       TA
34       TA
111      TA
35       TA
38       TA
41       TA
84       TA
86       TA
87       TA
162      TA
141      TA
142      TA
145      TA
146      TA
149      TA
152      TA
37       TA
16       TA
112      TA
119      TA
620      TA
561      TA
591      TA
563      TA
568      TA
571      TA
572      TA
573      TA
579      TA
580      TA
581      TA
582      TA
586      TA
588      TA
589      TA
590      TA
592      TA
615     NaN
594      TA
595      TA
596      TA
601      TA
602      Ex
604

In [167]:
garage_qual = {np.nan:'0', 'TA':'1', 'Fa':'2', 'Ex':'3', 'Gd':'4', 'Po':'5'}


In [168]:
df.garage_qual = [garage_qual[item] for item in df.garage_qual] 

In [169]:
df['garage_qual'] = pd.to_numeric(df['garage_qual'], errors='coerce')


In [171]:
df['garage_qual'].dtype

dtype('int64')

In [173]:
df.garage_qual.unique()

array([1, 2, 0, 3, 4, 5])

In [163]:
# COLUMN:		garage_cond                                                                                                                                            
# DEFINITION:	Garage condition
# DATA TYPE:	object
# MISSING VALUES:	114
# UNIQUE VALUES:	
df.garage_cond.unique()


array(['TA', 'Fa', nan, 'Ex', 'Po', 'Gd'], dtype=object)

In [165]:
print(df['garage_cond'])


1        TA
160      TA
163      TA
89       TA
164      TA
165      TA
166      TA
174      TA
176      TA
178      TA
182      Fa
140      TA
139      TA
138      TA
136      TA
91       TA
94       TA
96       TA
97       TA
98       TA
99       TA
100      TA
101      TA
104      TA
108      TA
109      TA
158      TA
155      TA
154      TA
153      TA
19       TA
20       TA
23       TA
43       TA
24       TA
25       TA
28      NaN
29       TA
30       TA
32       TA
33       TA
34       TA
111      TA
35       TA
38       TA
41       TA
84       TA
86       TA
87       TA
162      TA
141      TA
142      TA
145      TA
146      TA
149      TA
152      TA
37       TA
16       TA
112      TA
119      TA
620      TA
561      TA
591      TA
563      TA
568      TA
571      TA
572      TA
573      TA
579      TA
580      TA
581      TA
582      TA
586      TA
588      TA
589      TA
590      TA
592      TA
615     NaN
594      TA
595      TA
596      TA
601      TA
602      Ex
604

In [174]:
df = df.drop(columns=['garage_cond'])


In [175]:
df.head(5)

Unnamed: 0,ms_subclass,ms_zoning,lot_area,street,alley,land_contour,utilities,lot_config,neighborhood,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,exter_qual,exter_cond,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_qc,fence,misc_val,mo_sold,yr_sold,sale_type,saleprice
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1.0,20.0,0.0,31770.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,5.0,1960.0,1960.0,0.0,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,1.0,1.0,210.0,62.0,0.0,0.0,0.0,,,0.0,5.0,2010.0,0.0,215000.0
160.0,20.0,0.0,9830.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,7.0,1959.0,2006.0,0.0,Gd,TA,No,ALQ,72.0,Rec,258.0,733.0,1063.0,GasA,Ex,Y,SBrkr,1287.0,0.0,0.0,1287.0,1.0,0.0,1.0,0.0,3.0,1.0,Gd,7.0,Typ,1.0,Gd,Detchd,1997.0,Fin,2.0,576.0,1.0,1.0,364.0,17.0,0.0,0.0,182.0,,,0.0,3.0,2010.0,0.0,162000.0
163.0,20.0,0.0,7500.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,5.0,7.0,1959.0,1994.0,0.0,TA,TA,No,LwQ,340.0,Rec,906.0,0.0,1246.0,GasA,Ex,Y,SBrkr,1246.0,0.0,0.0,1246.0,1.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1959.0,RFn,1.0,305.0,1.0,1.0,218.0,0.0,0.0,0.0,0.0,,GdPrv,0.0,5.0,2010.0,0.0,154000.0
89.0,20.0,0.0,6897.0,0.0,0.0,0.0,0.0,0.0,11.0,0.0,0.0,5.0,8.0,1962.0,2010.0,0.0,Gd,TA,No,ALQ,659.0,Unf,0.0,381.0,1040.0,GasA,Ex,Y,SBrkr,1040.0,0.0,0.0,1040.0,1.0,0.0,1.0,1.0,3.0,1.0,TA,6.0,Typ,0.0,,Detchd,1962.0,Unf,1.0,260.0,1.0,1.0,0.0,104.0,0.0,0.0,0.0,,,0.0,4.0,2010.0,0.0,127000.0
164.0,50.0,0.0,8520.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,5.0,4.0,1952.0,1952.0,0.0,Fa,TA,No,Rec,507.0,Unf,0.0,403.0,910.0,GasA,Fa,Y,SBrkr,910.0,475.0,0.0,1385.0,0.0,0.0,2.0,0.0,4.0,1.0,TA,6.0,Typ,0.0,,Detchd,2000.0,Unf,2.0,720.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,,MnPrv,0.0,6.0,2010.0,0.0,166000.0


In [158]:
# COLUMN:		paved_drive                                                                                                                                                  
# DEFINITION:	Paved driveway
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.paved_drive.unique()


array(['P', 'Y', 'N'], dtype=object)
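
The per-column comment header used throughout this notebook (column, data type, missing values, unique values) could also be produced by a small helper rather than typed by hand. The sketch below is one way to do that; `summarize_column` and the toy frame are hypothetical names introduced for illustration and are not part of the original notebook.

```python
import pandas as pd


def summarize_column(frame: pd.DataFrame, name: str) -> dict:
    """Collect the facts recorded in each per-column comment header."""
    col = frame[name]
    return {
        "column": name,
        "dtype": str(col.dtype),
        "missing": int(col.isna().sum()),
        "unique": list(col.unique()),
    }


# Toy frame standing in for the Ames data (values are illustrative only).
toy = pd.DataFrame({"paved_drive": ["P", "Y", "N", "Y"]})
summary = summarize_column(toy, "paved_drive")
print(summary)
```

Calling the helper on the real `df` for each column would keep the header comments and the data in sync after every drop or conversion.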

In [159]:
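# 'P' (partial pavement) and 'Y' (paved) both map to 1; 'N' maps to 0.
paved_drive = {'P':'1', 'Y':'1', 'N':'0'}
df.paved_drive = [paved_drive[item] for item in df.paved_drive]
# np.int was removed in NumPy 1.24; the built-in int is the drop-in replacement.
df["paved_drive"] = df["paved_drive"].astype(int)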


In [160]:
df.paved_drive.unique()


array([1, 0])

In [161]:
df['paved_drive'].dtype


dtype('int64')
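
The same encoding can probably be written with `Series.map`, which avoids the intermediate string values and makes it easy to check for labels the dictionary does not cover. This is a minimal sketch on a toy frame, not the notebook's own code; `paved_map` and `toy` are names assumed for illustration.

```python
import pandas as pd

# Toy frame standing in for the Ames data (values are illustrative only).
toy = pd.DataFrame({"paved_drive": ["P", "Y", "N", "Y"]})

# Same grouping as the cell above: 'P' and 'Y' become 1, 'N' becomes 0.
paved_map = {"P": 1, "Y": 1, "N": 0}
encoded = toy["paved_drive"].map(paved_map)

# map() leaves NaN for any label missing from the dictionary, so check before casting.
unmapped = toy.loc[encoded.isna(), "paved_drive"].unique()
if len(unmapped) > 0:
    raise ValueError(f"unmapped paved_drive labels: {list(unmapped)}")

toy["paved_drive"] = encoded.astype(int)
print(toy["paved_drive"].tolist())
```

The explicit check reproduces the list comprehension's habit of failing loudly on an unknown label, while giving a clearer error message.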

In [None]:
# COLUMN:		wood_deck_sf                                                                                                                                                       
# DEFINITION:	Wood deck area in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.wood_deck_sf.unique()
# EVALUATION: COLUMN JUDGED NOT TO PERTAIN TO HOUSING PRICE. NO
# MISSING VALUES, BUT COLUMN TO BE DROPPED!

In [153]:
df = df.drop(columns=['wood_deck_sf'])
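
Before dropping a numeric column on the grounds that it does not pertain to price, it may be worth checking its correlation with `saleprice`. The sketch below uses `DataFrame.corrwith` on invented toy numbers; the values are not taken from the Ames data and only show the mechanics.

```python
import pandas as pd

# Toy data; the numbers are invented for illustration only.
toy = pd.DataFrame({
    "wood_deck_sf": [0, 100, 200, 300],
    "open_porch_sf": [50, 0, 50, 0],
    "saleprice": [100000, 150000, 200000, 250000],
})

# Pearson correlation of each candidate column with the target.
corr = toy[["wood_deck_sf", "open_porch_sf"]].corrwith(toy["saleprice"])
print(corr)
```

A correlation near zero would support the decision to drop; a large value in either direction suggests the column deserves a second look.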


In [154]:
df.head(5)

Unnamed: 0,ms_subclass,ms_zoning,lot_area,street,alley,land_contour,utilities,lot_config,neighborhood,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,exter_qual,exter_cond,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_qc,fence,misc_val,mo_sold,yr_sold,sale_type,saleprice
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1.0,20.0,0.0,31770.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,5.0,1960.0,1960.0,0.0,TA,Gd,Gd,BLQ,639.0,Unf,0.0,441.0,1080.0,GasA,Fa,Y,SBrkr,1656.0,0.0,0.0,1656.0,1.0,0.0,1.0,0.0,3.0,1.0,TA,7.0,Typ,2.0,Gd,Attchd,1960.0,Fin,2.0,528.0,TA,TA,P,210.0,62.0,0.0,0.0,0.0,,,0.0,5.0,2010.0,0.0,215000.0
160.0,20.0,0.0,9830.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,7.0,1959.0,2006.0,0.0,Gd,TA,No,ALQ,72.0,Rec,258.0,733.0,1063.0,GasA,Ex,Y,SBrkr,1287.0,0.0,0.0,1287.0,1.0,0.0,1.0,0.0,3.0,1.0,Gd,7.0,Typ,1.0,Gd,Detchd,1997.0,Fin,2.0,576.0,TA,TA,Y,364.0,17.0,0.0,0.0,182.0,,,0.0,3.0,2010.0,0.0,162000.0
163.0,20.0,0.0,7500.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,5.0,7.0,1959.0,1994.0,0.0,TA,TA,No,LwQ,340.0,Rec,906.0,0.0,1246.0,GasA,Ex,Y,SBrkr,1246.0,0.0,0.0,1246.0,1.0,0.0,1.0,1.0,3.0,1.0,Gd,6.0,Typ,0.0,,Attchd,1959.0,RFn,1.0,305.0,TA,TA,Y,218.0,0.0,0.0,0.0,0.0,,GdPrv,0.0,5.0,2010.0,0.0,154000.0
89.0,20.0,0.0,6897.0,0.0,0.0,0.0,0.0,0.0,11.0,0.0,0.0,5.0,8.0,1962.0,2010.0,0.0,Gd,TA,No,ALQ,659.0,Unf,0.0,381.0,1040.0,GasA,Ex,Y,SBrkr,1040.0,0.0,0.0,1040.0,1.0,0.0,1.0,1.0,3.0,1.0,TA,6.0,Typ,0.0,,Detchd,1962.0,Unf,1.0,260.0,TA,TA,Y,0.0,104.0,0.0,0.0,0.0,,,0.0,4.0,2010.0,0.0,127000.0
164.0,50.0,0.0,8520.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,5.0,4.0,1952.0,1952.0,0.0,Fa,TA,No,Rec,507.0,Unf,0.0,403.0,910.0,GasA,Fa,Y,SBrkr,910.0,475.0,0.0,1385.0,0.0,0.0,2.0,0.0,4.0,1.0,TA,6.0,Typ,0.0,,Detchd,2000.0,Unf,2.0,720.0,TA,TA,Y,0.0,0.0,0.0,0.0,0.0,,MnPrv,0.0,6.0,2010.0,0.0,166000.0


In [141]:
# COLUMN:		open_porch_sf                                                                                                                                                         
# DEFINITION:	Open porch area in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.open_porch_sf.unique()
# EVALUATION: COLUMN JUDGED NOT TO PERTAIN TO HOUSING PRICE. NO
# MISSING VALUES, BUT COLUMN TO BE DROPPED!


array([ 62,  36,  34,  82, 152,  60,  84,  21,  75,   0,  54, 122,  96,
        85,  68,  55,  30, 133,  50,  70, 119,  67, 150, 130,  49,  27,
        23,  20,  48,  56,  32,  57,  81,  86, 136,  45, 168, 104, 144,
        39, 172, 166, 192,  78,  44,  76,  66,  26,  40,  73,  38,  52,
        17, 124, 100, 228,  18, 158,  10,  11,  46, 278,  92,  90,  33,
        61,  59,  25,  35, 105,  64, 140, 207,  53, 312, 111,  72,  94,
       176, 195, 120,  28, 162, 102, 197,  98, 274, 170, 185, 190, 116,
        63, 235, 183,  16,  51, 128, 146, 126, 165, 226, 121, 175, 113,
        91,  41,  42,  93,  74, 234,  24,  99,  58,  88,  80, 110, 189,
       204,  12, 156, 103, 523, 135, 198, 215, 142,  29, 151, 200, 148,
       112, 160, 118, 154,  95, 238, 304, 101, 173,  22, 282,  69, 180,
       134, 153,  87, 174, 108, 210, 251,  65, 243, 240, 211, 129,   4,
       114, 213, 547, 291, 502, 299, 365, 182,  89, 117, 137,   8, 187,
       155, 159, 106, 372, 292, 184, 141, 123, 276, 265, 164, 22

In [153]:
df = df.drop(columns=['open_porch_sf'])


In [154]:
df.head(5)



In [139]:
# COLUMN:		enclosed_porch                                                                                                                                                              
# DEFINITION:	Enclosed porch area in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.enclosed_porch.unique()
# EVALUATION: DATA TYPE GOOD, NO MISSING VALUES! 
# COMBINE threessn_porch WITH enclosed_porch TO FORM NEW
# enclosed_porch.

array([  0, 184, 154, 186, 156, 120, 150, 164, 189, 205, 113, 216, 135,
       130, 126, 246, 196,  18, 158, 114, 128,  35,  48,  32,  64, 364,
       112, 318,  45, 176,  77,  52,  56, 168,  36, 136, 162,  98, 265,
        50, 280, 222, 202, 144,  24, 236,  84, 264, 260, 203, 140, 100,
       134, 432, 198,  42,  40, 148,  25,  80, 160, 226, 244, 115,  94,
       105,  54,  34, 268,  30, 213, 288,  90, 177, 211, 185, 180,  44,
        57,  81, 218,  78,  72, 368,  70, 165,  92,  16, 192, 123,  96,
       102,  66, 210, 109,  60, 194, 219, 259, 116, 212,  20, 101,  87,
       117, 204, 122, 231, 239, 138, 301, 207, 224, 172, 174, 137,  99,
       249, 252, 291, 145, 214, 275, 175,  26, 143, 183, 230, 170,  88,
        39,  68,  43,  19, 200, 169, 133, 234,  37, 240, 324, 161,  75,
       167, 104, 296, 330, 228, 256,  55, 129, 225, 294, 121, 190, 208,
       272,  67,  23])

In [149]:
df.sort_values(["enclosed_porch"], axis=0, ascending=True, inplace=True)
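
Sorting with `inplace=True` reorders the whole DataFrame just to inspect one column. A non-mutating alternative, sketched here on toy data with assumed values, is to compute a summary and sort only a view of the column.

```python
import pandas as pd

# Toy frame; index labels and values are illustrative only.
toy = pd.DataFrame(
    {"enclosed_porch": [0, 0, 184, 0, 154, 0]},
    index=[1, 160, 163, 89, 164, 165],
)

# Share of rows with no enclosed porch at all.
zero_share = (toy["enclosed_porch"] == 0).mean()

# Sorted copy for inspection; toy itself keeps its original row order.
largest = toy["enclosed_porch"].sort_values(ascending=False).head(3)

print(zero_share)
print(largest)
```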


In [150]:
print(df['enclosed_porch'])


1         0
160       0
163       0
89        0
164       0
165       0
166       0
174       0
176       0
178       0
182       0
140       0
139       0
138       0
136       0
91        0
94        0
96        0
97        0
98        0
99        0
100       0
101       0
104       0
108       0
109       0
158       0
155       0
154       0
153       0
19        0
20        0
23        0
43        0
24        0
25        0
28        0
29        0
30        0
32        0
33        0
34        0
111       0
35        0
38        0
41        0
84        0
86        0
87        0
162       0
141       0
142       0
145       0
146       0
149       0
152       0
37        0
16        0
112       0
119       0
620       0
561       0
591       0
563       0
568       0
571       0
572       0
573       0
579       0
580       0
581       0
582       0
586       0
588       0
589       0
590       0
592       0
615       0
594       0
595       0
596       0
601       0
602       0
604

In [145]:
# COLUMN:		threessn_porch                                                                                                                                                                       
# DEFINITION:	Three season porch area in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.threessn_porch.unique()
# EVALUATION: DATA TYPE GOOD, NO MISSING VALUES! DATA APPEARS
# TO BE THE SAME AS screen_porch. WILL COMBINE threessn_porch
# WITH enclosed_porch TO FORM NEW enclosed_porch, THEN DROP
# threessn_porch.

array([  0, 224, 144, 508, 168, 255, 162, 140, 150, 182, 153, 304, 407,
        96, 245, 216, 120, 176,  86, 290, 180, 323])

In [146]:
df.sort_values(["threessn_porch"], axis=0, ascending=True, inplace=True)


In [148]:
print(df['threessn_porch'])


1         0
1960      0
1959      0
1957      0
1956      0
1955      0
1953      0
1951      0
1950      0
1949      0
1948      0
1947      0
1945      0
1942      0
1941      0
1940      0
1938      0
1937      0
1934      0
1933      0
1932      0
1930      0
1929      0
1928      0
1927      0
1926      0
1925      0
1924      0
1961      0
1962      0
1963      0
1965      0
2007      0
2006      0
2004      0
2002      0
2001      0
2000      0
1998      0
1996      0
1995      0
1994      0
1993      0
1990      0
1987      0
1921      0
1986      0
1982      0
1981      0
1979      0
1978      0
1976      0
1975      0
1974      0
1973      0
1972      0
1971      0
1970      0
1968      0
1967      0
1983      0
1919      0
1916      0
1915      0
1872      0
1871      0
1870      0
1869      0
1864      0
1863      0
1862      0
1861      0
1860      0
1859      0
1857      0
1854      0
1853      0
1873      0
1852      0
1849      0
1848      0
1847      0
1846      0
184

In [155]:
df['enclosed_porch'] = df['enclosed_porch'] + df['threessn_porch']


In [156]:
print(df['enclosed_porch'])


1         0
160       0
163       0
89        0
164       0
165       0
166       0
174       0
176       0
178       0
182       0
140       0
139       0
138       0
136       0
91        0
94        0
96        0
97        0
98        0
99        0
100       0
101       0
104       0
108       0
109       0
158       0
155       0
154       0
153       0
19        0
20        0
23        0
43        0
24        0
25        0
28        0
29        0
30        0
32        0
33        0
34        0
111       0
35        0
38        0
41        0
84        0
86        0
87        0
162       0
141       0
142       0
145       0
146       0
149       0
152       0
37        0
16        0
112       0
119       0
620       0
561       0
591       0
563       0
568       0
571       0
572       0
573       0
579       0
580       0
581       0
582       0
586       0
588       0
589       0
590       0
592       0
615       0
594       0
595       0
596       0
601       0
602       0
604

In [153]:
df = df.drop(columns=['threessn_porch'])
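
Because the DataFrame was re-sorted between cells, it helps to know that pandas adds two Series by index label rather than by position, so the porch totals stay attached to the right rows. The sketch below illustrates this on a toy frame; the values and the `shuffled` name are assumptions made for the example.

```python
import pandas as pd

# Toy frame; index labels and values are illustrative only.
toy = pd.DataFrame(
    {"enclosed_porch": [0, 184, 0], "threessn_porch": [0, 0, 224]},
    index=[1, 160, 163],
)

# Reorder one operand: Series addition aligns on index labels, not on position.
shuffled = toy["threessn_porch"].sort_values(ascending=False)
toy["enclosed_porch"] = toy["enclosed_porch"] + shuffled

# The folded-in column is now redundant and can be dropped.
toy = toy.drop(columns=["threessn_porch"])
print(toy)
```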


In [154]:
df.head(5)



In [151]:
# COLUMN:		screen_porch                                                                                                                                                                             
# DEFINITION:	Screen porch area in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.screen_porch.unique()
# EVALUATION: DATA TYPE GOOD, NO MISSING VALUES! DATA APPEARS
# TO BE THE SAME AS threessn_porch. WILL DROP screen_porch!!!


array([  0, 182, 385, 165, 143, 210, 168, 156, 152, 112, 204, 288, 189,
       252, 216, 234, 255, 342, 140, 291, 200, 160,  95, 144, 161, 155,
       180, 374, 270, 224, 192, 170, 231, 490,  92, 100, 396, 120, 145,
       142, 195, 233, 190, 208, 141, 130,  94, 266, 164, 220,  64, 163,
        90, 480, 176, 196, 135, 322, 174, 147, 276, 265, 271, 260, 175,
       198, 217, 201, 109, 150, 225, 259, 184, 126, 171,  84, 154, 116,
       280, 153, 104, 113, 240,  88, 138, 410, 312, 222, 108, 440, 162,
       115, 122, 148, 264, 348, 110,  53, 111, 227])

In [146]:
df.sort_values(["screen_porch"], axis=0, ascending=True, inplace=True)


In [148]:
print(df['screen_porch'])


1         0
1960      0
1959      0
1957      0
1956      0
1955      0
1953      0
1951      0
1950      0
1949      0
1948      0
1947      0
1945      0
1942      0
1941      0
1940      0
1938      0
1937      0
1934      0
1933      0
1932      0
1930      0
1929      0
1928      0
1927      0
1926      0
1925      0
1924      0
1961      0
1962      0
1963      0
1965      0
2007      0
2006      0
2004      0
2002      0
2001      0
2000      0
1998      0
1996      0
1995      0
1994      0
1993      0
1990      0
1987      0
1921      0
1986      0
1982      0
1981      0
1979      0
1978      0
1976      0
1975      0
1974      0
1973      0
1972      0
1971      0
1970      0
1968      0
1967      0
1983      0
1919      0
1916      0
1915      0
1872      0
1871      0
1870      0
1869      0
1864      0
1863      0
1862      0
1861      0
1860      0
1859      0
1857      0
1854      0
1853      0
1873      0
1852      0
1849      0
1848      0
1847      0
1846      0
184

In [153]:
df = df.drop(columns=['screen_porch'])


In [154]:
df.head(5)



In [152]:
# COLUMN:		pool_area                                                                                                                                                                                       
# DEFINITION:	Pool area in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.pool_area.unique()
# EVALUATION: COLUMN JUDGED NOT TO PERTAIN TO HOUSING PRICE. NO
# MISSING VALUES, BUT COLUMN TO BE DROPPED!

array([  0, 576, 480, 368, 800, 561, 738, 519, 648, 228])

In [153]:
df = df.drop(columns=['pool_area'])


In [154]:
df.head(5)



In [138]:
# COLUMN:		pool_qc                                                                                                                                                                                                   
# DEFINITION:	Pool quality
# DATA TYPE:	object
# MISSING VALUES:	2042
# UNIQUE VALUES:	
df.pool_qc.unique()
# EVALUATION: COLUMN JUDGED NOT TO PERTAIN TO HOUSING PRICE AND
# HAS 2042 MISSING VALUES. COLUMN TO BE DROPPED!


array([nan, 'Gd', 'TA', 'Ex', 'Fa'], dtype=object)

In [137]:
df = df.drop(columns=['pool_qc'])
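
Columns such as `pool_qc`, `fence`, and `misc_feature` are dropped here largely because of their missing-value counts. One way to make that rule explicit is to compute the share of missing values per column and drop everything above a cutoff. The sketch below assumes a hypothetical threshold of 0.5 and uses toy data. In the Ames data dictionary, NA in Pool QC and Fence denotes "No Pool" and "No Fence" rather than an unrecorded value, so recoding may be an alternative to dropping.

```python
import numpy as np
import pandas as pd

# Toy frame; values are illustrative only.
toy = pd.DataFrame({
    "pool_qc": [np.nan, np.nan, np.nan, "Gd"],
    "fence": [np.nan, "MnPrv", np.nan, np.nan],
    "misc_val": [0, 500, 0, 0],
})

# Fraction of missing entries in each column.
missing_share = toy.isna().mean()

threshold = 0.5  # hypothetical cutoff, not taken from the notebook
to_drop = missing_share[missing_share > threshold].index.tolist()

cleaned = toy.drop(columns=to_drop)
print(to_drop)
print(cleaned.columns.tolist())
```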


In [None]:
df.head(5)

In [None]:
# COLUMN:		fence              
# DEFINITION:	Fence quality
# DATA TYPE:	object
# MISSING VALUES:	1651
# UNIQUE VALUES:	
df.fence.unique()
# EVALUATION: COLUMN JUDGED NOT TO PERTAIN TO HOUSING PRICE AND
# HAS 1651 MISSING VALUES. COLUMN TO BE DROPPED!

In [137]:
df = df.drop(columns=['fence'])


In [None]:
df.head(5)

In [None]:
# COLUMN:		misc_feature       
# DEFINITION:	Miscellaneous feature not covered in other categories
# DATA TYPE:	object
# MISSING VALUES:	1986
# UNIQUE VALUES:	
df.misc_feature.unique()
# EVALUATION: DATA TYPE NEEDS TO BE CONVERTED. LOTS OF MISSING
# VALUES. MISC_VAL DOES NOT EQUAL VALUES FROM MISC_FEATURE.
# COLUMNS APPEAR TO CORRELATE. COLUMN TO BE DROPPED!


In [137]:
df = df.drop(columns=['misc_feature'])


In [None]:
df.head(5)

In [136]:
# COLUMN:		misc_val            
# DEFINITION:	Value of miscellaneous feature
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.misc_val.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!

array([    0, 12500,   500,   700,   400,   450,   300,  1200,  3500,
        2000,  2500,    54,    80,   650,   600,   900,   800,  1500,
        6500,  1150,  4500,  3000,  1300,  8300,   480, 17000,   455,
         460])

In [135]:
# COLUMN:		mo_sold             
# DEFINITION:	Month Sold
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.mo_sold.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!

array([ 5,  6,  3,  1,  4,  2,  7,  8, 10,  9, 12, 11])

In [134]:
# COLUMN:		yr_sold             
# DEFINITION:	Year Sold
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.yr_sold.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([2010, 2009, 2008, 2007, 2006])

In [130]:
# COLUMN:		sale_type          
# DEFINITION:	Type of sale
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.sale_type.unique()
# EVALUATION: DATA TYPE NEEDS TO BE CONVERTED. NO MISSING
# VALUES!

array(['WD ', 'COD', 'ConLI', 'New', 'Con', 'ConLD', 'Oth', 'ConLw',
       'CWD'], dtype=object)

In [131]:
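# Keys must match the raw labels exactly, including the trailing space in 'WD '.
sale_type = {'WD ':'0', 'COD':'1', 'ConLI':'2', 'New':'3', 'Con':'4', 'ConLD':'5', 'Oth':'6', 'ConLw':'7','CWD':'8'}
df.sale_type = [sale_type[item] for item in df.sale_type]
# np.int was removed in NumPy 1.24; the built-in int is the drop-in replacement.
df["sale_type"] = df["sale_type"].astype(int)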


In [132]:
df.sale_type.unique()


array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [133]:
df['sale_type'].dtype


dtype('int64')
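
The integer codes assigned to `sale_type` are arbitrary labels, but a linear model would read them as an ordered scale. Two alternatives are sketched below on toy data: `pd.factorize` reproduces the integer coding without a hand-written dictionary, and `pd.get_dummies` builds one indicator column per label so no ordering is implied. Stripping whitespace first is an assumption about how the trailing space in `'WD '` should be handled.

```python
import pandas as pd

# Toy frame; values are illustrative only.
toy = pd.DataFrame({"sale_type": ["WD ", "COD", "New", "WD ", "Con"]})

# Strip stray whitespace so 'WD ' and 'WD' are treated as one label.
labels = toy["sale_type"].str.strip()

# Option 1: integer codes in order of first appearance.
codes, uniques = pd.factorize(labels)

# Option 2: one indicator column per label, which imposes no ordering.
dummies = pd.get_dummies(labels, prefix="sale_type")

print(codes.tolist())
print(list(uniques))
print(dummies.columns.tolist())
```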

In [100]:
# COLUMN:		saleprice                     
# DEFINITION:	Sale price
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.saleprice.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!

array([215000, 172000, 189900, 191500, 236500, 189000, 175900, 185000,
       180400, 171500, 212000, 538000, 141000, 210000, 190000, 216000,
       149000, 149900, 142000, 115000, 184000,  96000,  88000, 127500,
       120000, 376162, 306000, 220000, 259000, 214000, 611657, 500000,
       320000, 319900, 205000, 175500, 199500, 192000, 184500, 216500,
       185088, 222500, 333168, 260400, 325000, 290000, 221000, 410000,
       221500, 204500, 215200, 262500, 254900, 233000, 181000, 143000,
        99500, 152000, 112000, 138500, 122000, 127000, 169000, 260000,
       155000, 151000, 149500, 222000, 177500, 147110, 267916, 206000,
       130500, 218500, 243500, 196500, 128950, 159000, 178900, 136300,
       180500, 172500, 116500,  76500, 128000, 153000, 154300, 135000,
       136000, 165500, 148000, 167500, 108000, 122500, 119000, 109000,
       105000, 107500,  97500, 162000, 132000, 154000, 166000, 134800,
       160000, 109500,  80000, 130000, 129000,  12789, 105900, 150000,
      

COLUMNS THAT ARE NOT IN DATABASE: ms_subclass, 

------------------------------------------ STOP HERE!!! ---------------------------------------------------------

**Export Dataframe:**

In [101]:
df.to_csv('../data/train_clean.csv')
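
The `Unnamed: 0` column visible in the `df.head(5)` output is what pandas produces when a CSV written with its index is read back without `index_col`. If that column is not wanted in the cleaned file, passing `index=False` to `to_csv` is the usual remedy. The sketch below shows the difference using in-memory buffers instead of the real file path; the toy frame is an assumption for illustration.

```python
import io

import pandas as pd

# Toy frame; values are illustrative only.
toy = pd.DataFrame(
    {"lot_area": [31770, 9830], "saleprice": [215000, 162000]},
    index=[1, 160],
)

# Default to_csv writes the index as an unnamed first column.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
reloaded_default = pd.read_csv(with_index)

# index=False leaves the index out of the file.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_clean = pd.read_csv(without_index)

print(reloaded_default.columns.tolist())
print(reloaded_clean.columns.tolist())
```

If the original row labels matter downstream, keeping the index and reloading with `index_col=0` is the other consistent choice.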