**Description of the Ames Iowa Housing Data columns:**

SOURCE: https://rdrr.io/cran/AmesHousing/man/ames_raw.html

Order: Observation number

PID: Parcel identification number - can be used with city web site for parcel review.

MS SubClass: Identifies the type of dwelling involved in the sale.

MS Zoning: Identifies the general zoning classification of the sale.

Lot Frontage: Linear feet of street connected to property

Lot Area: Lot size in square feet

Street: Type of road access to property

Alley: Type of alley access to property

Lot Shape: General shape of property

Land Contour: Flatness of the property

Utilities: Type of utilities available

Lot Config: Lot configuration

Land Slope: Slope of property

Neighborhood: Physical locations within Ames city limits (map available)

Condition 1: Proximity to various conditions

Condition 2: Proximity to various conditions (if more than one is present)

Bldg Type: Type of dwelling

House Style: Style of dwelling

Overall Qual: Rates the overall material and finish of the house

Overall Cond: Rates the overall condition of the house

Year Built: Original construction date

Year Remod/Add: Remodel date (same as construction date if no remodeling or additions)

Roof Style: Type of roof

Roof Matl: Roof material

Exterior 1st: Exterior covering on house

Exterior 2nd: Exterior covering on house (if more than one material)

Mas Vnr Type: Masonry veneer type

Mas Vnr Area: Masonry veneer area in square feet

Exter Qual: Evaluates the quality of the material on the exterior

Exter Cond: Evaluates the present condition of the material on the exterior

Foundation: Type of foundation

Bsmt Qual: Evaluates the height of the basement

Bsmt Cond: Evaluates the general condition of the basement

Bsmt Exposure: Refers to walkout or garden level walls

BsmtFin Type 1: Rating of basement finished area

BsmtFin SF 1: Type 1 finished square feet

BsmtFin Type 2: Rating of basement finished area (if multiple types)

BsmtFin SF 2: Type 2 finished square feet

Bsmt Unf SF: Unfinished square feet of basement area

Total Bsmt SF: Total square feet of basement area

Heating: Type of heating

Heating QC: Heating quality and condition

Central Air: Central air conditioning

Electrical: Electrical system

1st Flr SF: First Floor square feet

2nd Flr SF: Second floor square feet

Low Qual Fin SF: Low quality finished square feet (all floors)

Gr Liv Area: Above grade (ground) living area square feet

Bsmt Full Bath: Basement full bathrooms

Bsmt Half Bath: Basement half bathrooms

Full Bath: Full bathrooms above grade

Half Bath: Half baths above grade

Bedroom AbvGr: Bedrooms above grade (does NOT include basement bedrooms)

Kitchen AbvGr: Kitchens above grade

Kitchen Qual: Kitchen quality

TotRms AbvGrd: Total rooms above grade (does not include bathrooms)

Functional: Home functionality (Assume typical unless deductions are warranted)

Fireplaces: Number of fireplaces

Fireplace Qu: Fireplace quality

Garage Type: Garage location

Garage Yr Blt: Year garage was built

Garage Finish: Interior finish of the garage

Garage Cars: Size of garage in car capacity

Garage Area: Size of garage in square feet

Garage Qual: Garage quality

Garage Cond: Garage condition

Paved Drive: Paved driveway

Wood Deck SF: Wood deck area in square feet

Open Porch SF: Open porch area in square feet

Enclosed Porch: Enclosed porch area in square feet

3Ssn Porch: Three season porch area in square feet

Screen Porch: Screen porch area in square feet

Pool Area: Pool area in square feet

Pool QC: Pool quality

Fence: Fence quality

Misc Feature: Miscellaneous feature not covered in other categories

Misc Val: Dollar value of miscellaneous feature

Mo Sold: Month Sold

Yr Sold: Year Sold

Sale Type: Type of sale

Sale Condition: Condition of sale

**Load packages:**

In [1]:
import numpy as np
import pandas as pd


**Load the data:**

In [2]:
ames_file_train = '../data/train.csv'


In [3]:
df = pd.read_csv(ames_file_train)


**Change Index type and start at #1:**

In [4]:
df.index = df.index.astype(int, copy = False)


In [5]:
df.index = df.index + 1
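The two index cells above can also be written as one assignment. This is a minimal sketch, assuming the frame still has the default zero-based index that `pd.read_csv` produces; `toy` is a hypothetical stand-in for `df`.

```python
import pandas as pd

# Hypothetical stand-in for the training data (three sale prices only).
toy = pd.DataFrame({"SalePrice": [130500, 220000, 109000]})

# Build a 1-based integer index in a single step.
toy.index = pd.RangeIndex(start=1, stop=len(toy) + 1)

print(toy.index.tolist())
```

A `RangeIndex` is already integer-typed, so no separate `astype` call should be needed in this variant.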


In [6]:
df.head(5)


Unnamed: 0,Id,PID,MS SubClass,MS Zoning,Lot Frontage,Lot Area,Street,Alley,Lot Shape,Land Contour,...,Screen Porch,Pool Area,Pool QC,Fence,Misc Feature,Misc Val,Mo Sold,Yr Sold,Sale Type,SalePrice
1,109,533352170,60,RL,,13517,Pave,,IR1,Lvl,...,0,0,,,,0,3,2010,WD,130500
2,544,531379050,60,RL,43.0,11492,Pave,,IR1,Lvl,...,0,0,,,,0,4,2009,WD,220000
3,153,535304180,20,RL,68.0,7922,Pave,,Reg,Lvl,...,0,0,,,,0,1,2010,WD,109000
4,318,916386060,60,RL,73.0,9802,Pave,,Reg,Lvl,...,0,0,,,,0,4,2010,WD,174000
5,255,906425045,50,RL,82.0,14235,Pave,,IR1,Lvl,...,0,0,,,,0,3,2010,WD,138500


**Describe the basic format:**

In [7]:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)


In [8]:
df.dtypes


Id                   int64
PID                  int64
MS SubClass          int64
MS Zoning           object
Lot Frontage       float64
Lot Area             int64
Street              object
Alley               object
Lot Shape           object
Land Contour        object
Utilities           object
Lot Config          object
Land Slope          object
Neighborhood        object
Condition 1         object
Condition 2         object
Bldg Type           object
House Style         object
Overall Qual         int64
Overall Cond         int64
Year Built           int64
Year Remod/Add       int64
Roof Style          object
Roof Matl           object
Exterior 1st        object
Exterior 2nd        object
Mas Vnr Type        object
Mas Vnr Area       float64
Exter Qual          object
Exter Cond          object
Foundation          object
Bsmt Qual           object
Bsmt Cond           object
Bsmt Exposure       object
BsmtFin Type 1      object
BsmtFin SF 1       float64
BsmtFin Type 2      object
B

In [9]:
df.shape


(2051, 81)

**Drop unwanted columns:**

* This is a preliminary pass through the data; more columns are dropped in the Analyze & Evaluate section.

In [10]:
df.drop(columns=['PID', 'Id'], inplace=True)


- The 'PID' and 'Id' columns are for identification purposes only and are not relevant to predicting housing prices.

In [11]:
df.head(5)


Unnamed: 0,MS SubClass,MS Zoning,Lot Frontage,Lot Area,Street,Alley,Lot Shape,Land Contour,Utilities,Lot Config,Land Slope,Neighborhood,Condition 1,Condition 2,Bldg Type,House Style,Overall Qual,Overall Cond,Year Built,Year Remod/Add,Roof Style,Roof Matl,Exterior 1st,Exterior 2nd,Mas Vnr Type,Mas Vnr Area,Exter Qual,Exter Cond,Foundation,Bsmt Qual,Bsmt Cond,Bsmt Exposure,BsmtFin Type 1,BsmtFin SF 1,BsmtFin Type 2,BsmtFin SF 2,Bsmt Unf SF,Total Bsmt SF,Heating,Heating QC,Central Air,Electrical,1st Flr SF,2nd Flr SF,Low Qual Fin SF,Gr Liv Area,Bsmt Full Bath,Bsmt Half Bath,Full Bath,Half Bath,Bedroom AbvGr,Kitchen AbvGr,Kitchen Qual,TotRms AbvGrd,Functional,Fireplaces,Fireplace Qu,Garage Type,Garage Yr Blt,Garage Finish,Garage Cars,Garage Area,Garage Qual,Garage Cond,Paved Drive,Wood Deck SF,Open Porch SF,Enclosed Porch,3Ssn Porch,Screen Porch,Pool Area,Pool QC,Fence,Misc Feature,Misc Val,Mo Sold,Yr Sold,Sale Type,SalePrice
1,60,RL,,13517,Pave,,IR1,Lvl,AllPub,CulDSac,Gtl,Sawyer,RRAe,Norm,1Fam,2Story,6,8,1976,2005,Gable,CompShg,HdBoard,Plywood,BrkFace,289.0,Gd,TA,CBlock,TA,TA,No,GLQ,533.0,Unf,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,RL,43.0,11492,Pave,,IR1,Lvl,AllPub,CulDSac,Gtl,SawyerW,Norm,Norm,1Fam,2Story,7,5,1996,1997,Gable,CompShg,VinylSd,VinylSd,BrkFace,132.0,Gd,TA,PConc,Gd,TA,No,GLQ,637.0,Unf,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,RL,68.0,7922,Pave,,Reg,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,1Story,5,7,1953,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,Gd,CBlock,TA,TA,No,GLQ,731.0,Unf,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,RL,73.0,9802,Pave,,Reg,Lvl,AllPub,Inside,Gtl,Timber,Norm,Norm,1Fam,2Story,5,5,2006,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,Unf,0.0,Unf,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,RL,82.0,14235,Pave,,IR1,Lvl,AllPub,Inside,Gtl,SawyerW,Norm,Norm,1Fam,1.5Fin,6,8,1900,1993,Gable,CompShg,Wd Sdng,Plywood,,0.0,TA,TA,PConc,Fa,Gd,No,Unf,0.0,Unf,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


**Determine missing observations:**

In [12]:
df.isnull().sum()


MS SubClass           0
MS Zoning             0
Lot Frontage        330
Lot Area              0
Street                0
Alley              1911
Lot Shape             0
Land Contour          0
Utilities             0
Lot Config            0
Land Slope            0
Neighborhood          0
Condition 1           0
Condition 2           0
Bldg Type             0
House Style           0
Overall Qual          0
Overall Cond          0
Year Built            0
Year Remod/Add        0
Roof Style            0
Roof Matl             0
Exterior 1st          0
Exterior 2nd          0
Mas Vnr Type         22
Mas Vnr Area         22
Exter Qual            0
Exter Cond            0
Foundation            0
Bsmt Qual            55
Bsmt Cond            55
Bsmt Exposure        58
BsmtFin Type 1       55
BsmtFin SF 1          1
BsmtFin Type 2       56
BsmtFin SF 2          1
Bsmt Unf SF           1
Total Bsmt SF         1
Heating               0
Heating QC            0
Central Air           0
Electrical      
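The raw counts are easier to read when limited to columns that actually have gaps and shown next to the share of rows affected. The following is a sketch on a small hypothetical frame (`toy`), not on the full training data; the `summary` name is an assumption of this example.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row frame with the same kind of gaps as the real data.
toy = pd.DataFrame({
    "Alley": [np.nan, "Pave", np.nan, np.nan],
    "Lot Frontage": [np.nan, 43.0, 68.0, 73.0],
    "Lot Area": [13517, 11492, 7922, 9802],
})

missing = toy.isnull().sum()

# Keep only columns with at least one missing value, largest count first.
summary = pd.DataFrame({"missing": missing, "share": missing / len(toy)})
summary = summary[summary["missing"] > 0].sort_values("missing", ascending=False)

print(summary)
```

Replacing `toy` with `df` should give the same kind of table for the full dataset.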

**Make the column names searchable**

In [13]:
df.columns = df.columns.str.lower()


In [14]:
df.columns = df.columns.str.replace(' ','_')


In [15]:
df.columns = df.columns.str.replace('/','_')


In [16]:
df.columns = df.columns.str.replace('3','three')


In [17]:
df.columns = df.columns.str.replace('1st','first')


In [18]:
df.columns = df.columns.str.replace('2nd','second')
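The renaming cells above can be gathered into one helper so the substitutions stay in a single place. This is a sketch only: `clean_column_name` is a hypothetical helper, and it applies the same replacements in the same order as the cells above.

```python
import pandas as pd

def clean_column_name(name: str) -> str:
    """Lowercase the name and apply the substitutions used in the cells above."""
    name = name.lower().replace(" ", "_").replace("/", "_")
    name = name.replace("3", "three")
    name = name.replace("1st", "first").replace("2nd", "second")
    return name

# Empty frame carrying a few of the original column names.
toy = pd.DataFrame(columns=["Year Remod/Add", "3Ssn Porch", "1st Flr SF", "Exterior 2nd"])
toy = toy.rename(columns=clean_column_name)

print(list(toy.columns))
```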


In [19]:
df.dtypes


ms_subclass          int64
ms_zoning           object
lot_frontage       float64
lot_area             int64
street              object
alley               object
lot_shape           object
land_contour        object
utilities           object
lot_config          object
land_slope          object
neighborhood        object
condition_1         object
condition_2         object
bldg_type           object
house_style         object
overall_qual         int64
overall_cond         int64
year_built           int64
year_remod_add       int64
roof_style          object
roof_matl           object
exterior_first      object
exterior_second     object
mas_vnr_type        object
mas_vnr_area       float64
exter_qual          object
exter_cond          object
foundation          object
bsmt_qual           object
bsmt_cond           object
bsmt_exposure       object
bsmtfin_type_1      object
bsmtfin_sf_1       float64
bsmtfin_type_2      object
bsmtfin_sf_2       float64
bsmt_unf_sf        float64
t

**Analyze & Evaluate Columns**

* I will try to predict housing prices; any data that could be useful in this task will be kept.

In [20]:
# COLUMN:		ms_subclass
# DEFINITION:	Identifies the type of dwelling involved in the
#               sale.
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.ms_subclass.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([ 60,  20,  50, 180, 160,  70, 120, 190,  85,  30,  90,  80,  75,
        45,  40, 150])

In [21]:
# COLUMN:		ms_zoning
# DEFINITION:	Identifies the general zoning classification of
#               the sale.
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.ms_zoning.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['RL', 'RM', 'FV', 'C (all)', 'A (agr)', 'RH', 'I (all)'],
      dtype=object)

In [22]:
ms_zoning = {'RL':'0', 'FV':'1', 'RH':'2', 'RM':'3', 'C (all)':'4', 'I (all)':'5', 'A (agr)':'6'}
df.ms_zoning = [ms_zoning[item] for item in df.ms_zoning] 
df.ms_zoning.unique()


array(['0', '3', '1', '4', '6', '2', '5'], dtype=object)

In [23]:
df["ms_zoning"] = df["ms_zoning"].astype(int)
df['ms_zoning'].dtype


dtype('int64')
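The same conversion can be done in one step with `Series.map` and integer codes, which also makes it possible to catch labels that are missing from the dictionary. A minimal sketch, assuming the same code assignments as above; `toy` and `zoning_codes` are hypothetical names for this example.

```python
import pandas as pd

# Same assignments as the ms_zoning dictionary above, stored as integers.
zoning_codes = {"RL": 0, "FV": 1, "RH": 2, "RM": 3,
                "C (all)": 4, "I (all)": 5, "A (agr)": 6}

toy = pd.Series(["RL", "RM", "FV", "C (all)", "RL"], name="ms_zoning")

# Labels that are not in the dictionary come back as NaN.
encoded = toy.map(zoning_codes)

unmapped = toy[encoded.isna()].unique()
if len(unmapped) > 0:
    raise ValueError(f"Unmapped categories: {list(unmapped)}")

encoded = encoded.astype(int)
print(encoded.tolist())
```

With the list comprehension used above, an unknown label raises a `KeyError` instead; the explicit check here reports every unknown label at once.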

In [24]:
# COLUMN:		lot_frontage
# DEFINITION:	Linear feet of street connected to property
# DATA TYPE:	float64
# MISSING VALUES:	330
# UNIQUE VALUES:	
df.lot_frontage.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array([ nan,  43.,  68.,  73.,  82., 137.,  35.,  70.,  21.,  64., 120.,
        24.,  74.,  93.,  34.,  80.,  71.,  72., 109.,  40., 103., 100.,
        92.,  65.,  75.,  60.,  30.,  79.,  41., 105., 107.,  81.,  36.,
        63.,  32.,  94.,  44.,  50.,  48.,  67.,  88.,  83.,  53.,  58.,
        57.,  52.,  87., 134.,  56.,  54., 140.,  78.,  85.,  90.,  96.,
        62.,  49.,  59., 155.,  91.,  61.,  86., 128.,  77.,  42.,  89.,
        51.,  69.,  55., 112.,  76., 125.,  98., 113., 102.,  22., 122.,
        84., 119., 118.,  66.,  95., 108., 195., 106.,  39., 110., 130.,
        97.,  45.,  37., 123.,  38., 129., 115.,  47., 114., 104.,  46.,
       121., 124., 313., 141., 101.,  99., 160., 174.,  26., 144., 138.,
       111.,  25.,  33., 200., 150., 117., 153., 116., 135.])

In [25]:
df = df.drop(columns=['lot_frontage'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,lot_area,street,alley,lot_shape,land_contour,utilities,lot_config,land_slope,neighborhood,condition_1,condition_2,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,roof_style,roof_matl,exterior_first,exterior_second,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,13517,Pave,,IR1,Lvl,AllPub,CulDSac,Gtl,Sawyer,RRAe,Norm,1Fam,2Story,6,8,1976,2005,Gable,CompShg,HdBoard,Plywood,BrkFace,289.0,Gd,TA,CBlock,TA,TA,No,GLQ,533.0,Unf,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,11492,Pave,,IR1,Lvl,AllPub,CulDSac,Gtl,SawyerW,Norm,Norm,1Fam,2Story,7,5,1996,1997,Gable,CompShg,VinylSd,VinylSd,BrkFace,132.0,Gd,TA,PConc,Gd,TA,No,GLQ,637.0,Unf,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,7922,Pave,,Reg,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,1Story,5,7,1953,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,Gd,CBlock,TA,TA,No,GLQ,731.0,Unf,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,9802,Pave,,Reg,Lvl,AllPub,Inside,Gtl,Timber,Norm,Norm,1Fam,2Story,5,5,2006,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,Unf,0.0,Unf,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,14235,Pave,,IR1,Lvl,AllPub,Inside,Gtl,SawyerW,Norm,Norm,1Fam,1.5Fin,6,8,1900,1993,Gable,CompShg,Wd Sdng,Plywood,,0.0,TA,TA,PConc,Fa,Gd,No,Unf,0.0,Unf,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500
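The evaluation of `lot_frontage` rests on two things: how much of the column is missing and how closely it tracks the sale price. Both can be measured before dropping. The sketch below uses only the five rows shown by `df.head(5)` earlier, which is far too few to draw a conclusion; running the same two lines on the full frame before the drop would be the real check.

```python
import numpy as np
import pandas as pd

# The first five rows of the training data, as shown by df.head(5) earlier.
toy = pd.DataFrame({
    "lot_frontage": [np.nan, 43.0, 68.0, 73.0, 82.0],
    "saleprice": [130500, 220000, 109000, 174000, 138500],
})

# Share of rows with no frontage recorded.
missing_share = toy["lot_frontage"].isna().mean()

# Pearson correlation; rows with a missing value are skipped.
corr = toy["lot_frontage"].corr(toy["saleprice"])

print(f"missing share: {missing_share:.2f}, correlation with price: {corr:.2f}")
```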


In [26]:
# COLUMN:		lot_area
# DEFINITION:	Lot size in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.lot_area.unique()
# EVALUATION: DATA TYPE GOOD, NO MISSING VALUES, BUT DECISION
# MADE TO DROP THE COLUMN.


array([13517, 11492,  7922, ..., 12444, 11449,  7558])

In [27]:
df = df.drop(columns=['lot_area'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,lot_shape,land_contour,utilities,lot_config,land_slope,neighborhood,condition_1,condition_2,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,roof_style,roof_matl,exterior_first,exterior_second,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,Pave,,IR1,Lvl,AllPub,CulDSac,Gtl,Sawyer,RRAe,Norm,1Fam,2Story,6,8,1976,2005,Gable,CompShg,HdBoard,Plywood,BrkFace,289.0,Gd,TA,CBlock,TA,TA,No,GLQ,533.0,Unf,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,Pave,,IR1,Lvl,AllPub,CulDSac,Gtl,SawyerW,Norm,Norm,1Fam,2Story,7,5,1996,1997,Gable,CompShg,VinylSd,VinylSd,BrkFace,132.0,Gd,TA,PConc,Gd,TA,No,GLQ,637.0,Unf,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,Pave,,Reg,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,1Story,5,7,1953,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,Gd,CBlock,TA,TA,No,GLQ,731.0,Unf,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,Pave,,Reg,Lvl,AllPub,Inside,Gtl,Timber,Norm,Norm,1Fam,2Story,5,5,2006,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,Unf,0.0,Unf,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,Pave,,IR1,Lvl,AllPub,Inside,Gtl,SawyerW,Norm,Norm,1Fam,1.5Fin,6,8,1900,1993,Gable,CompShg,Wd Sdng,Plywood,,0.0,TA,TA,PConc,Fa,Gd,No,Unf,0.0,Unf,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [28]:
# COLUMN:		street
# DEFINITION:	Type of road access to property
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.street.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['Pave', 'Grvl'], dtype=object)

In [29]:
street = {'Pave':'0', 'Grvl':'1'}
df.street = [street[item] for item in df.street] 
df.street.unique()


array(['0', '1'], dtype=object)

In [30]:
df["street"] = df["street"].astype(int)
df['street'].dtype


dtype('int64')

In [31]:
# COLUMN:		alley
# DEFINITION:	Type of alley access to property
# DATA TYPE:	object
# MISSING VALUES:	1911
# UNIQUE VALUES:	
df.alley.unique()
# EVALUATION: A LOT WITH MORE THAN ONE POINT OF ENTRY COULD
# AFFECT PRICE. DATA IS MISSING WHERE THE LOT DOESN'T ABUT AN
# ALLEY. COLUMN NEEDS TO BE CONVERTED TO INTEGER!


array([nan, 'Pave', 'Grvl'], dtype=object)

In [32]:
alley = {np.nan:'0', 'Pave':'1', 'Grvl':'1'}
df.alley = [alley[item] for item in df.alley]
df.alley.unique()


array(['0', '1'], dtype=object)

In [33]:
df['alley'] = pd.to_numeric(df['alley'], errors='coerce')
df['alley'].dtype



dtype('int64')
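Because both alley types are mapped to 1 and a missing value to 0, the column ends up as an "alley present" flag. That flag can be computed directly from the missing-value mask, which avoids looking up `np.nan` as a dictionary key. A minimal sketch on a hypothetical series `toy`:

```python
import numpy as np
import pandas as pd

toy = pd.Series([np.nan, "Pave", "Grvl", np.nan], name="alley")

# 1 where any alley type is recorded, 0 where the value is missing.
has_alley = toy.notna().astype(int)

print(has_alley.tolist())  # [0, 1, 1, 0]
```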

In [34]:
# COLUMN:		lot_shape
# DEFINITION:	General shape of property
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.lot_shape.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


array(['IR1', 'Reg', 'IR2', 'IR3'], dtype=object)

In [35]:
df=df.drop(columns=['lot_shape'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,land_slope,neighborhood,condition_1,condition_2,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,roof_style,roof_matl,exterior_first,exterior_second,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,Lvl,AllPub,CulDSac,Gtl,Sawyer,RRAe,Norm,1Fam,2Story,6,8,1976,2005,Gable,CompShg,HdBoard,Plywood,BrkFace,289.0,Gd,TA,CBlock,TA,TA,No,GLQ,533.0,Unf,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,Lvl,AllPub,CulDSac,Gtl,SawyerW,Norm,Norm,1Fam,2Story,7,5,1996,1997,Gable,CompShg,VinylSd,VinylSd,BrkFace,132.0,Gd,TA,PConc,Gd,TA,No,GLQ,637.0,Unf,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,1Story,5,7,1953,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,Gd,CBlock,TA,TA,No,GLQ,731.0,Unf,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,Lvl,AllPub,Inside,Gtl,Timber,Norm,Norm,1Fam,2Story,5,5,2006,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,Unf,0.0,Unf,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,Lvl,AllPub,Inside,Gtl,SawyerW,Norm,Norm,1Fam,1.5Fin,6,8,1900,1993,Gable,CompShg,Wd Sdng,Plywood,,0.0,TA,TA,PConc,Fa,Gd,No,Unf,0.0,Unf,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [36]:
# COLUMN:		land_contour
# DEFINITION:	Flatness of the property
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.land_contour.unique()
# EVALUATION: LOT CONTOUR COULD AFFECT PRICE, E.G. A HOUSE WITH
# A VIEW IS WORTH MORE. COLUMN NEEDS TO BE CONVERTED TO INTEGER,
# NO MISSING VALUES!


array(['Lvl', 'HLS', 'Bnk', 'Low'], dtype=object)

In [37]:
land_contour = {'Lvl':'0', 'HLS':'1', 'Bnk':'2', 'Low':'3'}
df.land_contour = [land_contour[item] for item in df.land_contour] 
df.land_contour.unique()


array(['0', '1', '2', '3'], dtype=object)

In [38]:
df["land_contour"] = df["land_contour"].astype(int)
df['land_contour'].dtype


dtype('int64')

In [39]:
# COLUMN:		utilities
# DEFINITION:	Type of utilities available
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.utilities.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['AllPub', 'NoSeWa', 'NoSewr'], dtype=object)

In [40]:
utilities = {'AllPub':'0', 'NoSewr':'1', 'NoSeWa':'2'}
df.utilities = [utilities[item] for item in df.utilities] 
df.utilities.unique()


array(['0', '2', '1'], dtype=object)

In [41]:
df["utilities"] = df["utilities"].astype(int)
df['utilities'].dtype


dtype('int64')
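The `utilities` levels appear to have a natural order, from all public utilities down to fewer services, and the mapping above follows it. An ordered categorical can express that order explicitly and yield the same codes. This is a sketch under that ordering assumption; `utility_levels` and `toy` are hypothetical names.

```python
import pandas as pd

# Assumed order, most services first; matches the 0/1/2 codes used above.
utility_levels = ["AllPub", "NoSewr", "NoSeWa"]

toy = pd.Series(["AllPub", "NoSeWa", "NoSewr", "AllPub"], name="utilities")

ordered = pd.Categorical(toy, categories=utility_levels, ordered=True)

# .codes gives each value's position in utility_levels (-1 for unknown labels).
codes = pd.Series(ordered.codes, index=toy.index, name="utilities")

print(codes.tolist())  # [0, 2, 1, 0]
```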

In [42]:
# COLUMN:		lot_config
# DEFINITION:	Lot configuration
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.lot_config.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['CulDSac', 'Inside', 'Corner', 'FR2', 'FR3'], dtype=object)

In [43]:
lot_config = {'Corner':'0', 'Inside':'1', 'CulDSac':'2', 'FR2':'3', 'FR3':'4'}
df.lot_config = [lot_config[item] for item in df.lot_config] 
df.lot_config.unique()


array(['2', '1', '0', '3', '4'], dtype=object)

In [44]:
df["lot_config"] = df["lot_config"].astype(int)
df['lot_config'].dtype


dtype('int64')

In [45]:
# COLUMN:		land_slope
# DEFINITION:	Slope of property
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.land_slope.unique()
# EVALUATION: COLUMN IS SIMILAR TO LAND CONTOUR, ONLY
# DESCRIBING DEGREE OF SLOPE. COLUMN DOES NOT PERTAIN TO
# HOUSING PRICE, COLUMN TO BE DROPPED!


array(['Gtl', 'Sev', 'Mod'], dtype=object)

In [46]:
df = df.drop(columns=['land_slope'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,condition_2,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,roof_style,roof_matl,exterior_first,exterior_second,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,Sawyer,RRAe,Norm,1Fam,2Story,6,8,1976,2005,Gable,CompShg,HdBoard,Plywood,BrkFace,289.0,Gd,TA,CBlock,TA,TA,No,GLQ,533.0,Unf,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,SawyerW,Norm,Norm,1Fam,2Story,7,5,1996,1997,Gable,CompShg,VinylSd,VinylSd,BrkFace,132.0,Gd,TA,PConc,Gd,TA,No,GLQ,637.0,Unf,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,NAmes,Norm,Norm,1Fam,1Story,5,7,1953,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,Gd,CBlock,TA,TA,No,GLQ,731.0,Unf,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,Timber,Norm,Norm,1Fam,2Story,5,5,2006,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,Unf,0.0,Unf,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,SawyerW,Norm,Norm,1Fam,1.5Fin,6,8,1900,1993,Gable,CompShg,Wd Sdng,Plywood,,0.0,TA,TA,PConc,Fa,Gd,No,Unf,0.0,Unf,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [47]:
# COLUMN:		neighborhood
# DEFINITION:	Physical locations within Ames city limits
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.neighborhood.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['Sawyer', 'SawyerW', 'NAmes', 'Timber', 'Edwards', 'OldTown',
       'BrDale', 'CollgCr', 'Somerst', 'Mitchel', 'StoneBr', 'NridgHt',
       'Gilbert', 'Crawfor', 'IDOTRR', 'NWAmes', 'Veenker', 'MeadowV',
       'SWISU', 'NoRidge', 'ClearCr', 'Blmngtn', 'BrkSide', 'NPkVill',
       'Blueste', 'GrnHill', 'Greens', 'Landmrk'], dtype=object)

In [48]:
neighborhood = {'NAmes':'0', 'Gilbert':'1', 'StoneBr':'2', 'NWAmes':'3', 'Somerst':'4', 'BrDale':'5', 'NPkVill':'6','NridgHt':'7', 'Blmngtn':'8', 'NoRidge':'9', 'SawyerW':'10', 'Sawyer':'11', 'Greens':'12', 'BrkSide':'13','OldTown':'14', 'IDOTRR':'15', 'ClearCr':'16', 'SWISU':'17', 'Edwards':'18', 'CollgCr':'19', 'Crawfor':'20','Blueste':'21', 'Mitchel':'22', 'Timber':'23', 'MeadowV':'27', 'Veenker':'24', 'GrnHill':'25', 'Landmrk':'26'}
df.neighborhood = [neighborhood[item] for item in df.neighborhood] 
df.neighborhood.unique()


array(['11', '10', '0', '23', '18', '14', '5', '19', '4', '22', '2', '7',
       '1', '20', '15', '3', '24', '27', '17', '9', '16', '8', '13', '6',
       '21', '25', '12', '26'], dtype=object)

In [49]:
df["neighborhood"] = df["neighborhood"].astype(int)
df['neighborhood'].dtype


dtype('int64')
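With a hand-written dictionary of 28 neighborhoods it is easy to assign the same code to two labels, which silently merges them. The sketch below shows a small check for such collisions, plus `pd.factorize` as a way to generate distinct codes automatically. `duplicated_codes`, `example_mapping`, and `toy` are hypothetical names used only for illustration.

```python
from collections import Counter

import pandas as pd

def duplicated_codes(mapping: dict) -> dict:
    """Return {code: [labels]} for every code assigned to more than one label."""
    counts = Counter(mapping.values())
    return {code: [label for label, c in mapping.items() if c == code]
            for code, n in counts.items() if n > 1}

# Deliberately broken example: two labels share code 1.
example_mapping = {"NAmes": 0, "Gilbert": 1, "StoneBr": 1}
print(duplicated_codes(example_mapping))  # {1: ['Gilbert', 'StoneBr']}

# factorize assigns one integer per distinct label, in order of first appearance.
toy = pd.Series(["Sawyer", "SawyerW", "NAmes", "Sawyer"], name="neighborhood")
codes, labels = pd.factorize(toy)
print(codes.tolist(), list(labels))
```

The codes from `pd.factorize` depend on row order, so the `labels` array would need to be kept if the same encoding has to be applied to another file.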

In [50]:
# COLUMN:		condition_1
# DEFINITION:	Proximity to various conditions
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.condition_1.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['RRAe', 'Norm', 'PosA', 'Artery', 'Feedr', 'PosN', 'RRAn', 'RRNe',
       'RRNn'], dtype=object)

In [51]:
condition_1 = {'Norm':'0', 'RRAe':'1', 'RRNe':'2', 'Feedr':'3', 'Artery':'4', 'PosA':'5', 'PosN':'6', 'RRAn':'7','RRNn':'8'}
df.condition_1 = [condition_1[item] for item in df.condition_1] 
df.condition_1.unique()


array(['1', '0', '5', '4', '3', '6', '7', '2', '8'], dtype=object)

In [52]:
df["condition_1"] = df["condition_1"].astype(int)
df['condition_1'].dtype


dtype('int64')
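The integer codes for `condition_1` carry no natural order, so a model that treats them as numbers would read 'RRNn' (8) as larger than 'Norm' (0). One common alternative, shown here only as a sketch and not as the approach taken in this notebook, is one 0/1 indicator column per category via `pd.get_dummies`. The `cond1` prefix and the `toy` frame are assumptions of this example.

```python
import pandas as pd

# Hypothetical frame with the raw (string) condition labels.
toy = pd.DataFrame({"condition_1": ["RRAe", "Norm", "PosA", "Norm"]})

# One integer indicator column per distinct label.
dummies = pd.get_dummies(toy, columns=["condition_1"], prefix="cond1", dtype=int)

print(dummies)
```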

In [53]:
# COLUMN:		condition_2
# DEFINITION:	Proximity to various conditions (if more than one is present)
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.condition_2.unique()
# EVALUATION: REDUNDANT INFORMATION. COLUMN DOES NOT PERTAIN TO HOUSING PRICE, COLUMN
# TO BE DROPPED!


array(['Norm', 'RRNn', 'Feedr', 'Artery', 'PosA', 'PosN', 'RRAe', 'RRAn'],
      dtype=object)

In [54]:
df = df.drop(columns=['condition_2'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,roof_style,roof_matl,exterior_first,exterior_second,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,1Fam,2Story,6,8,1976,2005,Gable,CompShg,HdBoard,Plywood,BrkFace,289.0,Gd,TA,CBlock,TA,TA,No,GLQ,533.0,Unf,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,1Fam,2Story,7,5,1996,1997,Gable,CompShg,VinylSd,VinylSd,BrkFace,132.0,Gd,TA,PConc,Gd,TA,No,GLQ,637.0,Unf,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,1Fam,1Story,5,7,1953,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,Gd,CBlock,TA,TA,No,GLQ,731.0,Unf,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,1Fam,2Story,5,5,2006,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,Unf,0.0,Unf,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,1Fam,1.5Fin,6,8,1900,1993,Gable,CompShg,Wd Sdng,Plywood,,0.0,TA,TA,PConc,Fa,Gd,No,Unf,0.0,Unf,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [55]:
# COLUMN:		bldg_type
# DEFINITION:	Type of dwelling
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.bldg_type.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['1Fam', 'TwnhsE', 'Twnhs', '2fmCon', 'Duplex'], dtype=object)

In [56]:
bldg_type = {'1Fam':'0', 'TwnhsE':'1', 'Twnhs':'2', 'Duplex':'3', '2fmCon':'4'}
df.bldg_type = [bldg_type[item] for item in df.bldg_type] 
df.bldg_type.unique()


array(['0', '1', '2', '4', '3'], dtype=object)

In [57]:
# np.int was removed in NumPy 1.24; an explicit "int64" works on all versions.
df["bldg_type"] = df["bldg_type"].astype("int64")
df['bldg_type'].dtype


dtype('int64')
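The two-step conversion above (string codes, then a cast) can also be written in one pass. The following is a minimal sketch, assuming a small stand-in frame named `toy` (hypothetical, not the notebook's `df`); it uses `Series.map` with integer codes and checks for unmapped labels before casting.

```python
import pandas as pd

# Toy frame standing in for the notebook's df (illustrative rows only).
toy = pd.DataFrame(
    {"bldg_type": ["1Fam", "TwnhsE", "Twnhs", "2fmCon", "Duplex", "1Fam"]}
)

# Same category-to-code assignment as the cell above, with integer codes.
bldg_type_codes = {"1Fam": 0, "TwnhsE": 1, "Twnhs": 2, "Duplex": 3, "2fmCon": 4}

encoded = toy["bldg_type"].map(bldg_type_codes)

# map() yields NaN for labels missing from the dict, so check before casting.
if encoded.isna().any():
    raise ValueError("unmapped bldg_type label found")

toy["bldg_type"] = encoded.astype("int64")
print(toy["bldg_type"].tolist())
```

Unlike the list comprehension, an unknown label here triggers an explicit error message rather than a bare `KeyError`.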

In [58]:
# COLUMN:		house_style 
# DEFINITION:	Style of dwelling
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.house_style.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['2Story', '1Story', '1.5Fin', 'SFoyer', 'SLvl', '2.5Unf', '2.5Fin',
       '1.5Unf'], dtype=object)

In [59]:
house_style = {'1Story':'0', '2Story':'1', 'SLvl':'2', '1.5Fin':'3', 'SFoyer':'4', '2.5Unf':'5', '1.5Unf':'6','2.5Fin':'7'}
df.house_style = [house_style[item] for item in df.house_style] 
df.house_style.unique()


array(['1', '0', '3', '4', '2', '5', '7', '6'], dtype=object)

In [60]:
df["house_style"] = df["house_style"].astype("int64")
df['house_style'].dtype


dtype('int64')

In [61]:
# COLUMN:		overall_qual
# DEFINITION:	Rates the overall material and finish of the
# house
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.overall_qual.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([ 6,  7,  5,  8, 10,  4,  9,  3,  2,  1])

In [62]:
# COLUMN:		overall_cond        
# DEFINITION:	Rates the overall condition of the house
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.overall_cond.unique()
# EVALUATION: REDUNDANT INFORMATION. COLUMN DOES NOT PERTAIN TO
# HOUSING PRICE, COLUMN TO BE DROPPED!


array([8, 5, 7, 6, 3, 9, 2, 4, 1])

In [63]:
df = df.drop(columns=['overall_cond'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,year_built,year_remod_add,roof_style,roof_matl,exterior_first,exterior_second,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1976,2005,Gable,CompShg,HdBoard,Plywood,BrkFace,289.0,Gd,TA,CBlock,TA,TA,No,GLQ,533.0,Unf,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1996,1997,Gable,CompShg,VinylSd,VinylSd,BrkFace,132.0,Gd,TA,PConc,Gd,TA,No,GLQ,637.0,Unf,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,1953,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,Gd,CBlock,TA,TA,No,GLQ,731.0,Unf,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,2006,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,Unf,0.0,Unf,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,1900,1993,Gable,CompShg,Wd Sdng,Plywood,,0.0,TA,TA,PConc,Fa,Gd,No,Unf,0.0,Unf,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [64]:
# COLUMN:		year_built
# DEFINITION:	Original construction date
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.year_built.unique()
# EVALUATION: DATA TYPE GOOD, NO MISSING VALUES, BUT MODELS
# DON'T DO WELL WITH THIS MANY UNIQUE VALUES. DROP COLUMN.


array([1976, 1996, 1953, 2006, 1900, 1966, 2005, 1959, 1952, 1969, 1971,
       1880, 1999, 2007, 2004, 1916, 1963, 1977, 2009, 1968, 2000, 1992,
       1955, 1961, 1965, 1937, 1895, 1949, 1981, 1929, 1995, 1958, 1973,
       1994, 1978, 1954, 1935, 1941, 1931, 2003, 1928, 1970, 1951, 1920,
       1930, 1924, 1927, 1960, 1925, 1910, 2008, 1915, 1997, 1956, 1979,
       1964, 2001, 1972, 1957, 1939, 1962, 1947, 1940, 1932, 1967, 1993,
       1875, 1912, 2010, 1987, 1918, 1988, 1922, 1926, 1984, 1942, 1890,
       2002, 1975, 1998, 1936, 1938, 1985, 1923, 1948, 1950, 1980, 1991,
       1917, 1986, 1946, 1885, 1914, 1896, 1983, 1921, 1945, 1901, 1990,
       1974, 1913, 1905, 1982, 1919, 1872, 1892, 1934, 1879, 1893, 1898,
       1911, 1908, 1989])
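As an alternative to dropping a year column with many unique values, the year can be turned into a coarser feature. This is only a sketch: the column names `house_age` and `decade_built` are hypothetical, and the five rows reuse the `year_built` and `yr_sold` values shown in the earlier `df.head(5)` output.

```python
import pandas as pd

# First five rows of year_built / yr_sold as displayed earlier in the notebook.
toy = pd.DataFrame(
    {
        "year_built": [1976, 1996, 1953, 2006, 1900],
        "yr_sold": [2010, 2009, 2010, 2010, 2010],
    }
)

# Age of the house at the time of sale, in years (hypothetical feature name).
toy["house_age"] = toy["yr_sold"] - toy["year_built"]

# Decade of construction: collapses individual years into coarse buckets.
toy["decade_built"] = (toy["year_built"] // 10) * 10

print(toy[["house_age", "decade_built"]])
```

Whether either derived feature helps a model is not something this notebook tests; the sketch only shows how the information could be kept in a lower-cardinality form.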

In [65]:
df = df.drop(columns=['year_built'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,year_remod_add,roof_style,roof_matl,exterior_first,exterior_second,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,2005,Gable,CompShg,HdBoard,Plywood,BrkFace,289.0,Gd,TA,CBlock,TA,TA,No,GLQ,533.0,Unf,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1997,Gable,CompShg,VinylSd,VinylSd,BrkFace,132.0,Gd,TA,PConc,Gd,TA,No,GLQ,637.0,Unf,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,Gd,CBlock,TA,TA,No,GLQ,731.0,Unf,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,2007,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,Unf,0.0,Unf,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,1993,Gable,CompShg,Wd Sdng,Plywood,,0.0,TA,TA,PConc,Fa,Gd,No,Unf,0.0,Unf,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [66]:
# COLUMN:		year_remod_add      
# DEFINITION:	Remodel date (same as construction date if no remodeling or additions)
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.year_remod_add.unique()
# EVALUATION: DATA TYPE GOOD, NO MISSING VALUES, BUT COLUMN IS
# DROPPED IN THE NEXT CELL.


array([2005, 1997, 2007, 1993, 2002, 2006, 1959, 1952, 1969, 1971, 2000,
       1950, 1963, 1977, 2009, 1968, 1955, 1961, 1995, 1981, 1996, 2008,
       1958, 1973, 1994, 1965, 1978, 1954, 1960, 2004, 1970, 1951, 1975,
       1953, 2001, 2010, 2003, 1979, 1964, 1956, 1972, 1957, 1992, 1962,
       1998, 1990, 1967, 1985, 1987, 1988, 1976, 1984, 1999, 1966, 1980,
       1989, 1991, 1986, 1982, 1983, 1974])

In [67]:
df = df.drop(columns=['year_remod_add'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,roof_style,roof_matl,exterior_first,exterior_second,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,Gable,CompShg,HdBoard,Plywood,BrkFace,289.0,Gd,TA,CBlock,TA,TA,No,GLQ,533.0,Unf,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,Gable,CompShg,VinylSd,VinylSd,BrkFace,132.0,Gd,TA,PConc,Gd,TA,No,GLQ,637.0,Unf,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,Gd,CBlock,TA,TA,No,GLQ,731.0,Unf,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,Gable,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,Unf,0.0,Unf,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,Gable,CompShg,Wd Sdng,Plywood,,0.0,TA,TA,PConc,Fa,Gd,No,Unf,0.0,Unf,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [68]:
# COLUMN:		roof_style         
# DEFINITION:	Type of roof
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.roof_style.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


array(['Gable', 'Hip', 'Flat', 'Mansard', 'Shed', 'Gambrel'], dtype=object)
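Before dropping a categorical column on the grounds that it does not pertain to price, one quick check is to compare sale prices across its categories. The sketch below assumes a toy frame with made-up prices (not values from the Ames data) and summarizes `saleprice` per `roof_style`.

```python
import pandas as pd

# Illustrative values only; these prices are NOT taken from the Ames data.
toy = pd.DataFrame(
    {
        "roof_style": ["Gable", "Gable", "Hip", "Hip", "Flat"],
        "saleprice": [100000, 120000, 200000, 220000, 150000],
    }
)

# Median sale price and row count for each category.
summary = toy.groupby("roof_style")["saleprice"].agg(["median", "count"])
print(summary)
```

Large gaps between group medians, backed by reasonable counts, would suggest the column carries some price signal; similar medians would support the decision to drop it.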

In [69]:
df = df.drop(columns=['roof_style'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,roof_matl,exterior_first,exterior_second,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,CompShg,HdBoard,Plywood,BrkFace,289.0,Gd,TA,CBlock,TA,TA,No,GLQ,533.0,Unf,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,CompShg,VinylSd,VinylSd,BrkFace,132.0,Gd,TA,PConc,Gd,TA,No,GLQ,637.0,Unf,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,CompShg,VinylSd,VinylSd,,0.0,TA,Gd,CBlock,TA,TA,No,GLQ,731.0,Unf,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,CompShg,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,Unf,0.0,Unf,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,CompShg,Wd Sdng,Plywood,,0.0,TA,TA,PConc,Fa,Gd,No,Unf,0.0,Unf,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [70]:
# COLUMN:		roof_matl                  
# DEFINITION:	Roof material
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.roof_matl.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


array(['CompShg', 'WdShngl', 'Tar&Grv', 'WdShake', 'Membran', 'ClyTile'],
      dtype=object)

In [71]:
df = df.drop(columns=['roof_matl'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exterior_first,exterior_second,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,HdBoard,Plywood,BrkFace,289.0,Gd,TA,CBlock,TA,TA,No,GLQ,533.0,Unf,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,VinylSd,VinylSd,BrkFace,132.0,Gd,TA,PConc,Gd,TA,No,GLQ,637.0,Unf,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,VinylSd,VinylSd,,0.0,TA,Gd,CBlock,TA,TA,No,GLQ,731.0,Unf,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,VinylSd,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,Unf,0.0,Unf,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,Wd Sdng,Plywood,,0.0,TA,TA,PConc,Fa,Gd,No,Unf,0.0,Unf,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [72]:
# COLUMN:		exterior_first                        
# DEFINITION:	Exterior covering on house
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.exterior_first.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


array(['HdBoard', 'VinylSd', 'Wd Sdng', 'BrkFace', 'Plywood', 'MetalSd',
       'AsbShng', 'CemntBd', 'WdShing', 'Stucco', 'BrkComm', 'Stone',
       'CBlock', 'ImStucc', 'AsphShn'], dtype=object)

In [73]:
df = df.drop(columns=['exterior_first'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exterior_second,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,Plywood,BrkFace,289.0,Gd,TA,CBlock,TA,TA,No,GLQ,533.0,Unf,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,VinylSd,BrkFace,132.0,Gd,TA,PConc,Gd,TA,No,GLQ,637.0,Unf,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,VinylSd,,0.0,TA,Gd,CBlock,TA,TA,No,GLQ,731.0,Unf,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,VinylSd,,0.0,TA,TA,PConc,Gd,TA,No,Unf,0.0,Unf,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,Plywood,,0.0,TA,TA,PConc,Fa,Gd,No,Unf,0.0,Unf,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [74]:
# COLUMN:		exterior_second                              
# DEFINITION:	Exterior covering on house (if more than one
# material)
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.exterior_second.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


array(['Plywood', 'VinylSd', 'Wd Sdng', 'HdBoard', 'MetalSd', 'AsbShng',
       'CmentBd', 'Wd Shng', 'BrkFace', 'Stucco', 'Brk Cmn', 'ImStucc',
       'Stone', 'CBlock', 'AsphShn'], dtype=object)

In [75]:
df = df.drop(columns=['exterior_second'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,BrkFace,289.0,Gd,TA,CBlock,TA,TA,No,GLQ,533.0,Unf,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,BrkFace,132.0,Gd,TA,PConc,Gd,TA,No,GLQ,637.0,Unf,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,,0.0,TA,Gd,CBlock,TA,TA,No,GLQ,731.0,Unf,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,,0.0,TA,TA,PConc,Gd,TA,No,Unf,0.0,Unf,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,,0.0,TA,TA,PConc,Fa,Gd,No,Unf,0.0,Unf,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [76]:
# COLUMN:		mas_vnr_type                                     
# DEFINITION:	Masonry veneer type
# DATA TYPE:	object
# MISSING VALUES:	22
# UNIQUE VALUES:	
df.mas_vnr_type.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array(['BrkFace', 'None', nan, 'Stone', 'BrkCmn'], dtype=object)
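Note that the output above contains both the string `'None'` and a true `nan`; only the latter counts as missing. The following sketch, on a hypothetical toy frame, shows how the missing counts can be computed and one possible way to fill them. Treating a missing veneer as "no veneer" is an assumption made here for illustration, not something the source states.

```python
import numpy as np
import pandas as pd

# Toy frame; the labels mirror the unique values printed above.
toy = pd.DataFrame(
    {
        "mas_vnr_type": ["BrkFace", "None", np.nan, "Stone", "BrkCmn"],
        "mas_vnr_area": [289.0, 0.0, np.nan, 82.0, 180.0],
    }
)

# Count true missing values per column; the string "None" is not counted.
missing_counts = toy.isna().sum()
print(missing_counts)

# ASSUMPTION: a missing veneer entry means the house has no veneer.
toy["mas_vnr_type"] = toy["mas_vnr_type"].fillna("None")
toy["mas_vnr_area"] = toy["mas_vnr_area"].fillna(0.0)
```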

In [77]:
df = df.drop(columns=['mas_vnr_type'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,289.0,Gd,TA,CBlock,TA,TA,No,GLQ,533.0,Unf,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,132.0,Gd,TA,PConc,Gd,TA,No,GLQ,637.0,Unf,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0.0,TA,Gd,CBlock,TA,TA,No,GLQ,731.0,Unf,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0.0,TA,TA,PConc,Gd,TA,No,Unf,0.0,Unf,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0.0,TA,TA,PConc,Fa,Gd,No,Unf,0.0,Unf,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [78]:
# COLUMN:		mas_vnr_area                                          
# DEFINITION:	Masonry veneer area in square feet
# DATA TYPE:	float64
# MISSING VALUES:	22
# UNIQUE VALUES:	
df.mas_vnr_area.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array([2.890e+02, 1.320e+02, 0.000e+00, 8.200e+01, 1.800e+02, 1.920e+02,
       2.320e+02, 4.560e+02, 1.480e+02,       nan, 3.000e+00, 3.360e+02,
       4.800e+02, 3.200e+02, 1.040e+02, 3.350e+02, 4.000e+01, 1.100e+02,
       1.060e+02, 5.130e+02, 1.840e+02, 5.220e+02, 1.430e+02, 3.480e+02,
       1.170e+02, 5.100e+02, 1.450e+02, 9.600e+01, 6.680e+02, 5.000e+01,
       2.280e+02, 6.500e+01, 3.610e+02, 7.480e+02, 1.970e+02, 5.720e+02,
       1.280e+02, 1.200e+02, 2.540e+02, 8.600e+01, 3.000e+01, 5.400e+01,
       2.460e+02, 3.970e+02, 2.960e+02, 1.440e+02, 9.020e+02, 2.610e+02,
       2.600e+02, 2.750e+02, 5.700e+01, 1.050e+03, 3.590e+02, 1.080e+02,
       6.620e+02, 5.000e+02, 2.100e+02, 1.650e+02, 2.080e+02, 3.600e+02,
       1.600e+02, 6.400e+01, 2.240e+02, 2.060e+02, 1.160e+02, 6.510e+02,
       5.040e+02, 2.520e+02, 3.370e+02, 8.400e+01, 3.090e+02, 4.660e+02,
       6.000e+02, 1.890e+02, 3.680e+02, 1.980e+02, 1.400e+02, 9.220e+02,
       1.600e+01, 1.800e+01, 9.000e+01, 4.250e+02, 

In [79]:
df = df.drop(columns=['mas_vnr_area'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,Gd,TA,CBlock,TA,TA,No,GLQ,533.0,Unf,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,Gd,TA,PConc,Gd,TA,No,GLQ,637.0,Unf,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,TA,Gd,CBlock,TA,TA,No,GLQ,731.0,Unf,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,TA,TA,PConc,Gd,TA,No,Unf,0.0,Unf,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,TA,TA,PConc,Fa,Gd,No,Unf,0.0,Unf,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [80]:
# COLUMN:		exter_qual                                                   
# DEFINITION:	Evaluates the quality of the material on the
# exterior
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.exter_qual.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['Gd', 'TA', 'Ex', 'Fa'], dtype=object)

In [81]:
exter_qual = {'TA':'0', 'Gd':'1', 'Ex':'2', 'Fa':'3'}
df.exter_qual = [exter_qual[item] for item in df.exter_qual] 
df.exter_qual.unique()


array(['1', '0', '2', '3'], dtype=object)

In [82]:
df["exter_qual"] = df["exter_qual"].astype("int64")
df['exter_qual'].dtype


dtype('int64')
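The codes assigned above (TA=0, Gd=1, Ex=2, Fa=3) do not follow the quality ranking. If the model should see the rating as ordered, codes that increase with quality are one option. This sketch assumes the usual reading of the Ames rating labels (Po poor, Fa fair, TA typical/average, Gd good, Ex excellent) and writes to a hypothetical column `exter_qual_ord`.

```python
import pandas as pd

# Toy frame with the four labels seen in exter_qual.
toy = pd.DataFrame({"exter_qual": ["Gd", "TA", "Ex", "Fa"]})

# Codes that increase with quality: Po < Fa < TA < Gd < Ex.
quality_order = {"Po": 0, "Fa": 1, "TA": 2, "Gd": 3, "Ex": 4}

# Hypothetical output column; the original column is left untouched.
toy["exter_qual_ord"] = toy["exter_qual"].map(quality_order).astype("int64")
print(toy["exter_qual_ord"].tolist())
```

The same dictionary could be reused for the other columns that share this rating scale.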

In [83]:
# COLUMN:		exter_cond                                                           
# DEFINITION:	Evaluates the present condition of the material on the exterior
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.exter_cond.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES, SIMILAR TO exter_qual! DROPPING IN FAVOR OF
# exter_qual!


array(['TA', 'Gd', 'Fa', 'Ex', 'Po'], dtype=object)

In [84]:
df = df.drop(columns=['exter_cond'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,CBlock,TA,TA,No,GLQ,533.0,Unf,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,PConc,Gd,TA,No,GLQ,637.0,Unf,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,CBlock,TA,TA,No,GLQ,731.0,Unf,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,PConc,Gd,TA,No,Unf,0.0,Unf,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,PConc,Fa,Gd,No,Unf,0.0,Unf,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [85]:
# COLUMN:		foundation                                                                    
# DEFINITION:	Type of foundation
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.foundation.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['CBlock', 'PConc', 'BrkTil', 'Slab', 'Stone', 'Wood'], dtype=object)

In [86]:
foundation = {'CBlock':'0', 'PConc':'1', 'Slab':'2', 'BrkTil':'3', 'Stone':'4', 'Wood':'5'}
df.foundation = [foundation[item] for item in df.foundation]
df.foundation.unique()


array(['0', '1', '3', '2', '4', '5'], dtype=object)

In [87]:
df["foundation"] = df["foundation"].astype("int64")
df['foundation'].dtype


dtype('int64')
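Foundation type has no natural order, so integer codes such as 0 to 5 may suggest a ranking that is not there. One-hot encoding is a common alternative; the sketch below applies `pd.get_dummies` to a hypothetical toy frame and is not what the notebook itself does.

```python
import pandas as pd

# Toy frame with four of the foundation labels.
toy = pd.DataFrame({"foundation": ["CBlock", "PConc", "BrkTil", "Slab"]})

# One 0/1 indicator column per category, named foundation_<label>.
dummies = pd.get_dummies(toy["foundation"], prefix="foundation", dtype=int)
print(dummies)
```

The trade-off is width: each category becomes its own column, which matters more for high-cardinality features than for the six foundation types here.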

In [88]:
# COLUMN:		bsmt_qual                                                                              
# DEFINITION:	Evaluates the height of the basement
# DATA TYPE:	object
# MISSING VALUES:	55
# UNIQUE VALUES:	
df.bsmt_qual.unique()
# EVALUATION: COLUMN WOULD NEED INTEGER CONVERSION AND HAS
# MISSING VALUES; DROPPED IN THE NEXT CELL.


array(['TA', 'Gd', 'Fa', nan, 'Ex', 'Po'], dtype=object)

In [89]:
df = df.drop(columns=['bsmt_qual'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,TA,No,GLQ,533.0,Unf,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,TA,No,GLQ,637.0,Unf,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,TA,No,GLQ,731.0,Unf,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,TA,No,Unf,0.0,Unf,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,Gd,No,Unf,0.0,Unf,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [90]:
# COLUMN:		bsmt_cond                                                                                        
# DEFINITION:	Evaluates the general condition of the basement
# DATA TYPE:	object
# MISSING VALUES:	55
# UNIQUE VALUES:	
df.bsmt_cond.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, HAS
# MISSING VALUES!


array(['TA', 'Gd', nan, 'Fa', 'Po', 'Ex'], dtype=object)

In [91]:
bsmt_cond = {np.nan:'0', 'Gd':'1', 'TA':'2', 'Po':'3', 'Fa':'4', 'Ex':'5'}
df.bsmt_cond = [bsmt_cond[item] for item in df.bsmt_cond] 
df.bsmt_cond.unique()


array(['2', '1', '0', '4', '3', '5'], dtype=object)

In [92]:
df['bsmt_cond'] = pd.to_numeric(df['bsmt_cond'], errors='coerce')
df['bsmt_cond'].dtype


dtype('int64')
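
The codes assigned above do not follow the quality ranking (for example, `Po` maps to 3 and `Gd` to 1). If an order-preserving encoding is wanted, one possible sketch uses `Series.map` with a ranked dictionary. The ranking Po < Fa < TA < Gd < Ex follows the usual reading of the abbreviations (Poor, Fair, Typical, Good, Excellent). Coding a missing rating as 0 is an assumption, and the helper name `encode_quality` is hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed ordering, worst to best; 0 is reserved for a missing rating.
QUALITY_ORDER = {"Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}

def encode_quality(series):
    """Map quality labels to ordered integers; missing values become 0."""
    return series.map(QUALITY_ORDER).fillna(0).astype(int)

# Toy series containing every value seen in bsmt_cond above.
toy = pd.Series(["TA", "Gd", np.nan, "Fa", "Po", "Ex"])
encoded = encode_quality(toy)
print(encoded.tolist())
```

Because the result is numeric from the start, no separate string-to-integer conversion step is needed with this variant.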

In [93]:
# COLUMN:		bsmt_exposure                                                                                              
# DEFINITION:	Refers to walkout or garden level walls
# DATA TYPE:	object
# MISSING VALUES:	58
# UNIQUE VALUES:	
df.bsmt_exposure.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array(['No', 'Gd', 'Av', nan, 'Mn'], dtype=object)

In [94]:
df=df.drop(columns=['bsmt_exposure'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,GLQ,533.0,Unf,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,GLQ,637.0,Unf,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,GLQ,731.0,Unf,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,Unf,0.0,Unf,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,Unf,0.0,Unf,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [95]:
# COLUMN:		bsmtfin_type_1                                                                                                   
# DEFINITION:	Rating of basement finished area
# DATA TYPE:	object
# MISSING VALUES:	55
# UNIQUE VALUES:	
df.bsmtfin_type_1.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array(['GLQ', 'Unf', 'ALQ', 'Rec', nan, 'BLQ', 'LwQ'], dtype=object)

In [96]:
df=df.drop(columns=['bsmtfin_type_1'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,533.0,Unf,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,637.0,Unf,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,731.0,Unf,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,0.0,Unf,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,0.0,Unf,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [97]:
# COLUMN:		bsmtfin_sf_1                                                                                                        
# DEFINITION:	Type 1 finished square feet
# DATA TYPE:	float64
# MISSING VALUES:	1
# UNIQUE VALUES:	
df.bsmtfin_sf_1.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array([5.330e+02, 6.370e+02, 7.310e+02, 0.000e+00, 2.470e+02, 5.470e+02,
       1.000e+03, 2.920e+02, 6.500e+02, 3.870e+02, 3.930e+02, 8.130e+02,
       2.400e+01, 6.510e+02, 4.920e+02, 1.158e+03, 9.350e+02, 1.056e+03,
       1.312e+03, 5.530e+02, 6.060e+02, 1.104e+03, 4.370e+02, 4.410e+02,
       1.288e+03, 5.480e+02, 7.050e+02, 9.160e+02, 4.200e+02, 8.300e+02,
       1.386e+03, 1.097e+03, 9.060e+02, 2.100e+02, 4.080e+02, 3.540e+02,
       3.530e+02, 6.220e+02, 7.900e+02, 1.760e+02, 3.710e+02, 3.680e+02,
       4.860e+02, 8.500e+01, 1.380e+02, 5.240e+02, 6.400e+01, 1.092e+03,
       3.600e+02, 1.720e+02, 2.060e+02, 2.460e+02, 1.600e+01, 1.346e+03,
       7.000e+02, 6.550e+02, 4.430e+02, 1.680e+02, 1.904e+03, 2.400e+02,
       8.640e+02, 4.150e+02, 8.330e+02, 3.770e+02, 2.800e+02, 8.280e+02,
       7.620e+02, 3.600e+01, 1.014e+03, 6.000e+01, 5.880e+02, 4.380e+02,
       1.153e+03, 5.270e+02, 1.337e+03, 3.480e+02, 1.044e+03, 6.900e+02,
       9.620e+02, 6.410e+02, 1.110e+03, 4.210e+02, 

In [98]:
df=df.drop(columns=['bsmtfin_sf_1'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,Unf,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,Unf,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,Unf,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,Unf,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,Unf,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [99]:
# COLUMN:		bsmtfin_type_2                                                                                                             
# DEFINITION:	Rating of basement finished area (if multiple types)
# DATA TYPE:	object
# MISSING VALUES:	56
# UNIQUE VALUES:	
df.bsmtfin_type_2.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array(['Unf', 'Rec', nan, 'BLQ', 'GLQ', 'LwQ', 'ALQ'], dtype=object)

In [100]:
df=df.drop(columns=['bsmtfin_type_2'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,0.0,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,0.0,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,0.0,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,0.0,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,0.0,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [101]:
# COLUMN:		bsmtfin_sf_2                                                                                                                   
# DEFINITION:	Type 2 finished square feet
# DATA TYPE:	float64
# MISSING VALUES:	1
# UNIQUE VALUES:	
df.bsmtfin_sf_2.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array([   0.,  713.,  117., 1057.,  173.,  290.,  420.,  469.,   42.,
        955.,  483.,  147.,  596.,  206.,  382.,  546.,  202.,  228.,
        661.,  279.,  106.,  321.,  232.,  956.,  670.,  915.,  116.,
       1080.,   80.,  215.,  144.,  590.,  149.,  281.,  297.,  612.,
        468.,  891.,  622.,  507.,  432.,  852.,  108.,  128.,  294.,
       1061.,  127.,  712.,  125.,  324.,  252.,  247.,   72.,  150.,
        906.,  555.,   38.,  180.,   64.,  288.,  311.,  227.,  842.,
        620.,  181.,  162.,  354.,  539.,  551.,  110.,  219.,  547.,
        186.,  774.,  123.,  613.,  167.,  230.,  495.,  208.,  308.,
        604.,  154.,  334.,  417.,  624.,  442.,  497.,  211., 1474.,
        532.,  132.,  829., 1127.,  435.,  174.,  105.,  375.,  608.,
       1039., 1063.,  264.,  270.,  259.,  531.,  488.,  500.,   41.,
        177.,  169.,  344.,  869.,  182.,  768.,  119.,  619.,  345.,
        645.,  278.,  113.,  466.,  522.,    6.,  377.,   92.,  859.,
        479.,  239.,

In [102]:
df=df.drop(columns=['bsmtfin_sf_2'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,192.0,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,276.0,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,326.0,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,384.0,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,676.0,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [103]:
# COLUMN:		bsmt_unf_sf                                                                                                                         
# DEFINITION:	Unfinished square feet of basement area
# DATA TYPE:	float64
# MISSING VALUES:	1
# UNIQUE VALUES:	
df.bsmt_unf_sf.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array([ 192.,  276.,  326.,  384.,  676.,  557.,    0.,  188.,  632.,
        390.,   96.,  815.,  147., 1327., 1430.,  624.,  470.,  660.,
        732.,  402.,  343., 1209.,  233.,  141.,  224.,  336.,  957.,
        672., 1420.,  792.,  507.,  417.,  160.,   36., 1139.,  570.,
        136.,  690., 1050.,  134.,  398.,  156.,  776.,  412.,  764.,
        403.,  500.,  133.,  370.,  216.,  292.,  190.,  450.,  778.,
        628., 1120.,  328.,  576.,  727.,  914.,  285.,  976., 1802.,
       1346.,  162., 1140.,  970., 1616.,  768.,  872.,  520.,  662.,
        936.,  448.,  312.,  876.,  325., 1251.,  551.,  588.,  320.,
        599.,   98.,  122.,  138., 1081.,  550.,  250.,  408.,  547.,
        180.,  245.,  114.,  191.,   32.,  595.,  269.,  978., 1078.,
       1116.,  308., 1290.,  587.,  107.,  706.,  565.,  161., 1008.,
        584.,  707.,  637.,  113.,  367.,  677.,  466.,  100.,  396.,
         25.,  780., 1530., 1528.,  744.,  381.,  218.,  610.,  459.,
        606.,  144.,

In [104]:
df=df.drop(columns=['bsmt_unf_sf'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,725.0,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,913.0,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,1057.0,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,384.0,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,676.0,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [105]:
# COLUMN:		total_bsmt_sf                                                                                                                             
# DEFINITION:	Total square feet of basement area
# DATA TYPE:	float64
# MISSING VALUES:	1
# UNIQUE VALUES:	
df.total_bsmt_sf.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array([ 725.,  913., 1057.,  384.,  676., 1517.,  547., 1188.,  924.,
       1040.,  483., 1208.,    0.,  960., 1351., 1430.,  624., 1121.,
        660.,  756.,  894., 1501., 1209., 1168., 1056., 1453.,  942.,
        957.,  672., 2524.,  792., 1494.,  948., 1705.,  160.,  757.,
       1844., 1242.,  990., 1256., 2076., 1097., 1050.,  608.,  984.,
        776.,  764., 1122., 1392.,  546.,  216.,  663.,  600.,  936.,
        916., 1152., 1184., 1420., 1382., 1120.,  531.,  976., 1802.,
       1362., 1508., 1840.,  655.,  970., 1616.,  768.,  872.,  520.,
       1105.,  616., 2216.,  876.,  864., 1189., 1666., 1176.,  928.,
       1288., 1427.,  860.,  754., 1141., 1138., 1561., 1342., 1884.,
        528., 1426.,  245.,  804., 1153.,  673.,  714.,  690.,  978.,
       1078., 1116., 1140., 1466., 1614.,  988.,  996., 1202., 1008.,
        720.,  994.,  707.,  637.,  813., 1055.,  855., 1719.,  780.,
       1554., 1528.,  982., 1172., 1642.,  884., 1422.,  985.,  888.,
       1992., 1478.,
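
Before a numeric column is dropped as unrelated to price, its relationship to `saleprice` can be checked directly. The sketch below computes a Pearson correlation with `Series.corr`. The helper name `price_correlation` is hypothetical, and the five rows are the ones shown by `df.head(5)` earlier, which is far too few to draw a conclusion; on the real data the call would use the full `df`.

```python
import pandas as pd

# The five rows shown by df.head(5); too few to conclude anything by themselves.
toy = pd.DataFrame({
    "total_bsmt_sf": [725.0, 913.0, 1057.0, 384.0, 676.0],
    "saleprice": [130500, 220000, 109000, 174000, 138500],
})

def price_correlation(frame, column, target="saleprice"):
    """Pearson correlation between one numeric column and the target."""
    return frame[column].corr(frame[target])

r = price_correlation(toy, "total_bsmt_sf")
print(round(r, 3))
```

A value near zero on the full data set would support dropping the column; a value far from zero would suggest keeping it.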

In [106]:
df=df.drop(columns=['total_bsmt_sf'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,GasA,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,GasA,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,GasA,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,GasA,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,GasA,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [107]:
# COLUMN:		heating                                                                                                                                         
# DEFINITION:	Type of heating
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.heating.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['GasA', 'GasW', 'Grav', 'Wall', 'OthW'], dtype=object)

- COLUMN HAS DIFFERENT VALUES THAN TEST.CSV
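
A mismatch like this can be made explicit by comparing the sets of values in the two files. The sketch below is one way to do that. The helper name `category_mismatch` is hypothetical, and the test values are invented stand-ins because the contents of test.csv are not shown in this notebook.

```python
import pandas as pd

def category_mismatch(train_col, test_col):
    """Return values that appear in only one of the two columns."""
    train_values = set(train_col.dropna().unique())
    test_values = set(test_col.dropna().unique())
    return {
        "only_in_train": sorted(train_values - test_values),
        "only_in_test": sorted(test_values - train_values),
    }

# Training values are the ones printed above; the test values are made up.
train_heating = pd.Series(["GasA", "GasW", "Grav", "Wall", "OthW"])
test_heating = pd.Series(["GasA", "GasW", "Grav", "Floor"])
diff = category_mismatch(train_heating, test_heating)
print(diff)
```

Listing the differing values shows whether a few rare categories cause the mismatch, which helps when deciding between dropping the column and grouping those categories.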

In [108]:
df = df.drop(columns=['heating'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,Ex,Y,SBrkr,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,Ex,Y,SBrkr,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,TA,Y,SBrkr,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,Gd,Y,SBrkr,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,TA,Y,SBrkr,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [109]:
# COLUMN:		heating_qc                                                                                                                                                  
# DEFINITION:	Heating quality and condition
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.heating_qc.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['Ex', 'TA', 'Gd', 'Fa', 'Po'], dtype=object)

In [110]:
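# Map each rating to a string code.
# Note: 'Po' receives the highest code, so the codes are not strictly ordinal.
heating_qc = {'Fa':'0', 'TA':'1', 'Gd':'2', 'Ex':'3', 'Po':'4'}
df.heating_qc = [heating_qc[item] for item in df.heating_qc]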
df.heating_qc.unique()


array(['3', '1', '2', '0', '4'], dtype=object)

In [111]:
# np.int was removed in NumPy 1.24; the built-in int is the replacement.
df["heating_qc"] = df["heating_qc"].astype(int)
df['heating_qc'].dtype


dtype('int64')

In [112]:
# COLUMN:		central_air                                                                                                                                                          
# DEFINITION:	Central air conditioning
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.central_air.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['Y', 'N'], dtype=object)

In [113]:
central_air = {'Y':'0', 'N':'1'}
df.central_air = [central_air[item] for item in df.central_air] 
df.central_air.unique()


array(['0', '1'], dtype=object)

In [114]:
# np.int was removed in NumPy 1.24; the built-in int is the replacement.
df["central_air"] = df["central_air"].astype(int)
df['central_air'].dtype


dtype('int64')

In [115]:
# COLUMN:		electrical                                                                                                                                                                   
# DEFINITION:	Electrical system
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.electrical.unique()


array(['SBrkr', 'FuseF', 'FuseA', 'FuseP', 'Mix'], dtype=object)

- COLUMN HAS DIFFERENT VALUES THAN TEST.CSV

In [116]:
df = df.drop(columns=['electrical'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,3,0,725,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,3,0,913,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,1,0,1057,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,2,0,744,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,1,0,831,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [117]:
# COLUMN:		first_flr_sf                                                                                                                                                                                     
# DEFINITION:	First Floor square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.first_flr_sf.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


array([ 725,  913, 1057,  744,  831, 1888, 1072, 1188,  924, 1040,  483,
       1208, 1288,  962, 1361, 1430,  624, 1121, 1285,  764,  894, 1659,
       1209, 1187, 1056, 1453, 1265, 1034,  672, 2524,  792, 1494,  948,
       1718, 1142,  925, 1844, 1242,  990, 1256, 2076, 1110, 1050,  983,
        984,  851, 1063,  768, 1328, 1392,  546, 1575,  663,  600,  936,
        916, 1164, 1184, 1483,  960, 1382, 1120,  567,  976, 1802, 1506,
       1508, 2032, 1194,  970, 1616,  872,  520, 1105,  616, 2234,  876,
        864, 1189, 1666, 1200,  928, 1336, 1427, 1212,  754, 1535, 1152,
       1151, 1138, 1074, 1561, 1358, 1884,  605, 1671,  797,  804, 1193,
        673, 1664, 1390,  868, 1422, 1128, 1116,  660, 1707, 1466, 1638,
       1721,  996, 1202,  988, 1028,  720, 1599,  707,  959,  813, 1092,
       1055,  676,  855, 1719,  780, 1554, 1528, 1008, 1172, 1418,  495,
        985,  888, 1992, 1478,  950,  912, 1326, 1204,  798, 1236, 1416,
        946, 1088,  816,  536,  572, 1126,  992,  7

In [118]:
df = df.drop(columns=['first_flr_sf'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,3,0,754,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,3,0,1209,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,1,0,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,2,0,700,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,1,0,614,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [119]:
# COLUMN:		second_flr_sf              
# DEFINITION:	Second floor square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.second_flr_sf.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


array([ 754, 1209,    0,  700,  614, 1040,  504,  728,  645,  720,  783,
       1044,  673,  957,  252,  725,  550,  745,  890,  620,  651,  862,
        756,  653, 1392,  546,  689,  600, 1106, 1426,  780,  531,  732,
        739, 1169,  665,  616,  540,  786, 1368, 1157,  709,  690,  445,
        836,  564, 1345,  707,  650,  712,  676,  601,  240, 1862,  881,
        408, 1427,  857, 1074,  876, 1045,  813,  927,  576,  539,  586,
        624, 1182,  702, 1093,  884,  743,  941, 1312, 1384, 1296,  492,
        462,  981, 1277, 1254,  272,  750,  608,  656, 1370,  595,  797,
        960,  549,  703, 1250,  453,  561,  685,  636,  872,  886,  840,
        829,  670,  795,  505,  698,  537,  864,  804,  704,  412,  924,
        896,  376,  438, 1371, 1089,  755,  589, 1158,  980, 1038,  517,
        925,  602,  887,  741,  348,  390, 1036,  672,  475,  464, 1194,
        530,  701,  929,  584,  465,  319,  563,  695,  668,  582,  545,
       1005,  757,  585,  684,  787,  888,  336,  5

In [120]:
df = df.drop(columns=['second_flr_sf'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,3,0,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,3,0,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,1,0,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,2,0,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,1,0,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [121]:
# COLUMN:		low_qual_fin_sf              
# DEFINITION:	Low quality finished square feet (all floors)
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.low_qual_fin_sf.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


array([   0,  205,   80,  528,  513,  371,  473,  108, 1064,  515,  120,
        312,  572,  234,  390,  697,  114,  512,  144,  514,  397,  140,
        479,  259,  436,  156,  384,  360,   53,  362,  450])

In [122]:
df = df.drop(columns=['low_qual_fin_sf'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,3,0,1479,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,3,0,2122,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,1,0,1057,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,2,0,1444,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,1,0,1445,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [123]:
# COLUMN:		gr_liv_area                       
# DEFINITION:	Above grade (ground) living area square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.gr_liv_area.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


array([1479, 2122, 1057, ..., 2668, 1913, 1804])

In [124]:
df = df.drop(columns=['gr_liv_area'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,3,0,0.0,0.0,2,1,3,1,Gd,6,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,3,0,1.0,0.0,2,1,4,1,Gd,8,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,1,0,1.0,0.0,1,0,3,1,Gd,5,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,2,0,0.0,0.0,2,1,3,1,TA,7,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,1,0,0.0,0.0,2,0,3,1,TA,6,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500
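
One way to sanity-check a "does not pertain to housing price" judgment before dropping a numeric column is to look at its correlation with `saleprice`. The sketch below uses only the five rows printed by `df.head(5)` earlier, which is far too few to draw a conclusion from; the same call on the full `df` would be the meaningful check.

```python
import pandas as pd

# The five rows shown by df.head(5) before the column was dropped.
sample = pd.DataFrame({
    "gr_liv_area": [1479, 2122, 1057, 1444, 1445],
    "saleprice": [130500, 220000, 109000, 174000, 138500],
})

# Pearson correlation between the candidate column and the target.
corr = sample["gr_liv_area"].corr(sample["saleprice"])
print(f"correlation with saleprice: {corr:.2f}")
```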


In [125]:
# COLUMN:		bsmt_full_bath                           
# DEFINITION:	Basement full bathrooms
# DATA TYPE:	float64
# MISSING VALUES:	2
# UNIQUE VALUES:	
df.bsmt_full_bath.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, ROWS WITH
# MISSING VALUES TO BE DROPPED!


array([ 0.,  1.,  2., nan,  3.])

In [126]:
df = df.dropna(subset=['bsmt_full_bath'])


In [127]:
df["bsmt_full_bath"] = df["bsmt_full_bath"].astype(int)
df['bsmt_full_bath'].dtype


dtype('int64')

In [128]:
df.bsmt_full_bath.unique()


array([0, 1, 2, 3])

In [129]:
# COLUMN:		bsmt_half_bath                              
# DEFINITION:	Basement half bathrooms
# DATA TYPE:	float64
# MISSING VALUES:	2
# UNIQUE VALUES:	
df.bsmt_half_bath.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, ROWS WITH
# MISSING VALUES TO BE DROPPED!


array([0., 1., 2.])

In [130]:
df = df.dropna(subset=['bsmt_half_bath'])


In [131]:
df["bsmt_half_bath"] = df["bsmt_half_bath"].astype(int)
df['bsmt_half_bath'].dtype


dtype('int64')

In [132]:
df.bsmt_half_bath.unique()


array([0, 1, 2])

In [133]:
# COLUMN:		full_bath                                        
# DEFINITION:	Full bathrooms above grade
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.full_bath.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([2, 1, 3, 0, 4])

- totrms_abvgrd DOES NOT INCLUDE BATHROOMS! BELOW, full_bath IS ADDED TO totrms_abvgrd.

In [134]:
df['totrms_abvgrd'] = df['totrms_abvgrd'] + df['full_bath']


In [135]:
# COLUMN:		half_bath                                                  
# DEFINITION:	Half baths above grade
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.half_bath.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([1, 0, 2])

- totrms_abvgrd DOES NOT INCLUDE BATHROOMS! BELOW, half_bath IS ADDED TO totrms_abvgrd.

In [136]:
df['totrms_abvgrd'] = df['totrms_abvgrd'] + df['half_bath']
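
The two additions above update `totrms_abvgrd` in place, so re-running either cell would count those bathrooms a second time. As a sketch, both counts can be added in a single step. The values are taken from the first five rows of the `df.head(5)` output shown earlier; the frame name `rooms` is hypothetical.

```python
import pandas as pd

# Values taken from the first five rows shown by df.head(5) earlier.
rooms = pd.DataFrame({
    "totrms_abvgrd": [6, 8, 5, 7, 6],
    "full_bath": [2, 2, 1, 2, 2],
    "half_bath": [1, 1, 0, 1, 0],
})

# Add both bathroom counts to the room count in one step.
rooms["totrms_abvgrd"] = rooms[["totrms_abvgrd", "full_bath", "half_bath"]].sum(axis=1)
print(rooms["totrms_abvgrd"].tolist())  # [9, 11, 6, 10, 8]
```

The result matches the `totrms_abvgrd` values that appear in later `df.head(5)` outputs.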


In [137]:
# COLUMN:		bedroom_abvgr                                                  
# DEFINITION:	Bedrooms above grade (does NOT include basement bedrooms)
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.bedroom_abvgr.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([3, 4, 2, 5, 1, 0, 6, 8])

In [138]:
# COLUMN:		kitchen_abvgr                                                        
# DEFINITION:	Kitchens above grade
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.kitchen_abvgr.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([1, 2, 0, 3])

In [139]:
# COLUMN:		kitchen_qual                                                              
# DEFINITION:	Kitchen quality
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.kitchen_qual.unique()
# EVALUATION: COLUMN DATA TYPE NEEDS TO BE CHANGED, NO MISSING
# VALUES!


array(['Gd', 'TA', 'Fa', 'Ex'], dtype=object)

- COLUMN HAS DIFFERENT VALUES THAN TEST.CSV

In [140]:
df = df.drop(columns=['kitchen_qual'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,3,0,0,0,2,1,3,1,9,Typ,0,,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,3,0,1,0,2,1,4,1,11,Typ,1,TA,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,1,0,1,0,1,0,3,1,6,Typ,0,,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,2,0,0,0,2,1,3,1,10,Typ,0,,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,1,0,0,0,2,0,3,1,8,Typ,0,,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [141]:
# COLUMN:		totrms_abvgrd                                                                     
# DEFINITION:	Total rooms above grade (does not include bathrooms)
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.totrms_abvgrd.unique()
# EVALUATION: COLUMN DATA TYPE IS ACCEPTABLE, NO MISSING VALUES.
# NOW INCLUDES FULL AND HALF BATHROOMS (ADDED ABOVE).


array([ 9, 11,  6, 10,  8,  7, 12,  5, 16, 14, 13, 15,  4,  3, 18])

In [142]:
# COLUMN:		functional                                                                             
# DEFINITION:	Home functionality (Assume typical unless deductions are warranted)
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.functional.unique()


array(['Typ', 'Mod', 'Min2', 'Maj1', 'Min1', 'Sev', 'Sal', 'Maj2'],
      dtype=object)

In [143]:
functional = {'Typ':'0', 'Mod':'1', 'Min2':'2', 'Maj1':'3', 'Min1':'4', 'Sev':'5', 'Sal':'6', 'Maj2':'7'}
df.functional = [functional[item] for item in df.functional]
df.functional.unique()


array(['0', '1', '2', '3', '4', '5', '6', '7'], dtype=object)

In [144]:
df["functional"] = df["functional"].astype(int)
df['functional'].dtype


dtype('int64')
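
The encoding above goes through string codes and a list comprehension before casting to integer. A minimal sketch of the same idea with `Series.map` and integer codes is shown below. The codes repeat the dictionary used above, which follows the order in which the labels appeared rather than a severity ranking; the toy series is for illustration only.

```python
import pandas as pd

# Same codes as the dictionary above, but stored as integers from the start.
functional_codes = {"Typ": 0, "Mod": 1, "Min2": 2, "Maj1": 3,
                    "Min1": 4, "Sev": 5, "Sal": 6, "Maj2": 7}

# Toy values for illustration only.
functional = pd.Series(["Typ", "Mod", "Typ", "Sal"], name="functional")

encoded = functional.map(functional_codes)

# map() yields NaN for labels missing from the dictionary, so check before casting.
assert encoded.notna().all(), "unmapped label found"
encoded = encoded.astype(int)
print(encoded.tolist())  # [0, 1, 0, 6]
```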

In [145]:
# COLUMN:		fireplaces                                                                                       
# DEFINITION:	Number of fireplaces
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.fireplaces.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([0, 1, 2, 4, 3])

In [146]:
# COLUMN:		fireplace_qu                                                                                              
# DEFINITION:	Fireplace quality
# DATA TYPE:	object
# MISSING VALUES:	1000
# UNIQUE VALUES:	
df.fireplace_qu.unique()


array([nan, 'TA', 'Gd', 'Po', 'Ex', 'Fa'], dtype=object)

In [147]:
df = df.drop(columns=['fireplace_qu'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,3,0,0,0,2,1,3,1,9,0,0,Attchd,1976.0,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,3,0,1,0,2,1,4,1,11,0,1,Attchd,1997.0,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,1,0,1,0,1,0,3,1,6,0,0,Detchd,1953.0,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,2,0,0,0,2,1,3,1,10,0,0,BuiltIn,2007.0,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,1,0,0,0,2,0,3,1,8,0,0,Detchd,1957.0,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [148]:
# COLUMN:		garage_type                                                                                                      
# DEFINITION:	Garage location
# DATA TYPE:	object
# MISSING VALUES:	113
# UNIQUE VALUES:	
df.garage_type.unique()
# EVALUATION: DATA TYPE NEEDS TO BE CONVERTED. MISSING VALUES
# TO BE ENCODED AS 0!


array(['Attchd', 'Detchd', 'BuiltIn', 'Basment', nan, '2Types', 'CarPort'],
      dtype=object)

In [149]:
garage_type = {np.nan:'0', 'Attchd':'1', 'Detchd':'2', 'BuiltIn':'3', 'CarPort':'4', 'Basment':'5', '2Types':'6'}
df.garage_type = [garage_type[item] for item in df.garage_type] 
df.garage_type.unique()


array(['1', '2', '3', '5', '0', '6', '4'], dtype=object)

In [150]:
df['garage_type'] = pd.to_numeric(df['garage_type'], errors='coerce')
df['garage_type'].dtype


dtype('int64')
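
For a column with missing values, such as `garage_type`, the dictionary lookup and numeric conversion can be sketched as one chain: `map` leaves missing entries as NaN, and `fillna(0)` then assigns them the code 0 used above. The toy series is for illustration only. Note that a label absent from the dictionary would also end up as 0 with this approach.

```python
import numpy as np
import pandas as pd

garage_codes = {"Attchd": 1, "Detchd": 2, "BuiltIn": 3,
                "CarPort": 4, "Basment": 5, "2Types": 6}

# Toy values for illustration only.
garage_type = pd.Series(["Attchd", np.nan, "Detchd", "2Types"])

# Missing entries stay NaN after map(); fillna(0) gives them the "missing" code.
encoded = garage_type.map(garage_codes).fillna(0).astype(int)
print(encoded.tolist())  # [1, 0, 2, 6]
```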

In [151]:
# COLUMN:		garage_yr_blt                                                                                                           
# DEFINITION:	Year garage was built
# DATA TYPE:	float64
# MISSING VALUES:	114
# UNIQUE VALUES:	
df.garage_yr_blt.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array([1976., 1997., 1953., 2007., 1957., 1966., 2005., 1959., 1952.,
       1969., 1971., 1900., 2000., 2004., 1916., 1963., 1977., 2009.,
       1968., 1992., 1955., 1961., 1973., 1937.,   nan, 2003., 1981.,
       1931., 1995., 1958., 1965., 2006., 1978., 1954., 1935., 1951.,
       1996., 1999., 1920., 1930., 1924., 1960., 1949., 1986., 1956.,
       1994., 1979., 1964., 2001., 1972., 1939., 1962., 1927., 1948.,
       1967., 1993., 2010., 1915., 1987., 1970., 1988., 1982., 1941.,
       1984., 1942., 1950., 2002., 1975., 2008., 1974., 1998., 1918.,
       1938., 1985., 1923., 1980., 1991., 1946., 1940., 1990., 1896.,
       1983., 1914., 1945., 1921., 1925., 1926., 1936., 1932., 1947.,
       1929., 1910., 1917., 1922., 1934., 1989., 1928., 2207., 1933.,
       1895., 1919.])

In [152]:
df = df.drop(columns=['garage_yr_blt'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,3,0,0,0,2,1,3,1,9,0,0,1,RFn,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,3,0,1,0,2,1,4,1,11,0,1,1,RFn,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,1,0,1,0,1,0,3,1,6,0,0,2,Unf,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,2,0,0,0,2,1,3,1,10,0,0,3,Fin,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,1,0,0,0,2,0,3,1,8,0,0,2,Unf,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [153]:
# COLUMN:		garage_finish                                                                                                                
# DEFINITION:	Interior finish of the garage
# DATA TYPE:	object
# MISSING VALUES:	114
# UNIQUE VALUES:	
df.garage_finish.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array(['RFn', 'Unf', 'Fin', nan], dtype=object)

In [154]:
df = df.drop(columns=['garage_finish'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,3,0,0,0,2,1,3,1,9,0,0,1,2.0,475.0,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,3,0,1,0,2,1,4,1,11,0,1,1,2.0,559.0,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,1,0,1,0,1,0,3,1,6,0,0,2,1.0,246.0,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,2,0,0,0,2,1,3,1,10,0,0,3,2.0,400.0,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,1,0,0,0,2,0,3,1,8,0,0,2,2.0,484.0,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [155]:
# COLUMN:		garage_cars                                                                                                                      
# DEFINITION:	Size of garage in car capacity
# DATA TYPE:	float64
# MISSING VALUES:	1
# UNIQUE VALUES:	
df.garage_cars.unique()


array([ 2.,  1.,  3.,  0.,  4.,  5., nan])

In [156]:
df = df.dropna(subset=['garage_cars'])


In [157]:
df["garage_cars"] = df["garage_cars"].astype(int)
df['garage_cars'].dtype


dtype('int64')

In [158]:
df.garage_cars.unique()


array([2, 1, 3, 0, 4, 5])
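
Dropping rows is one way to get an integer column when values are missing. An alternative, sketched here on toy values, is the pandas nullable integer dtype `"Int64"`, which keeps the rows and stores the gaps as `<NA>`. Whether the downstream modeling code accepts missing values is a separate question that this sketch does not address.

```python
import numpy as np
import pandas as pd

# Toy values for illustration only.
garage_cars = pd.Series([2.0, 1.0, np.nan, 3.0], name="garage_cars")

# "Int64" (capital I) is the pandas nullable integer dtype; NaN becomes <NA>.
as_nullable = garage_cars.astype("Int64")
print(as_nullable.dtype)         # Int64
print(as_nullable.isna().sum())  # 1
```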

In [159]:
# COLUMN:		garage_area                                                                                                                             
# DEFINITION:	Size of garage in square feet
# DATA TYPE:	float64
# MISSING VALUES:	1
# UNIQUE VALUES:	
df.garage_area.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array([ 475.,  559.,  246.,  400.,  484.,  578.,  525.,  531.,  420.,
        504.,  264.,  632.,  576.,  480.,  610.,  624.,  513.,  528.,
        614.,  450.,  499.,  575.,  572.,  530.,  336.,  240.,    0.,
        542.,  481.,  410.,  826.,  384.,  546.,  276.,  850.,  602.,
        352.,  786.,  660.,  270.,  280.,  474.,  440.,  564.,  299.,
        293.,  386.,  671.,  550.,  690.,  225.,  350.,  216.,  380.,
        843.,  539.,  834.,  322., 1166.,  720.,  392.,  555.,  252.,
        502.,  516.,  608.,  495.,  396.,  556.,  725.,  670.,  560.,
        501.,  490.,  286.,  360.,  479.,  626.,  470.,  304.,  864.,
        403.,  579.,  288.,  473.,  627.,  758.,  431.,  260.,  366.,
        852.,  672.,  486.,  656.,  716.,  442.,  297.,  388.,  461.,
        447.,  619.,  308.,  506.,  319.,  676.,  312.,  478.,  342.,
        393.,  983.,  923.,  487.,  543.,  453.,  541.,  754.,  666.,
        529.,  714.,  968.,  788.,  812.,  600.,  483.,  300.,  430.,
        230.,  505.,

In [160]:
df = df.drop(columns=['garage_area'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,3,0,0,0,2,1,3,1,9,0,0,1,2,TA,TA,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,3,0,1,0,2,1,4,1,11,0,1,1,2,TA,TA,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,1,0,1,0,1,0,3,1,6,0,0,2,1,TA,TA,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,2,0,0,0,2,1,3,1,10,0,0,3,2,TA,TA,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,1,0,0,0,2,0,3,1,8,0,0,2,2,TA,TA,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [161]:
# COLUMN:		garage_qual                                                                                                                                     
# DEFINITION:	Garage quality
# DATA TYPE:	object
# MISSING VALUES:	114
# UNIQUE VALUES:	
df.garage_qual.unique()
# COLUMN APPEARS TO BE THE SAME AS garage_cond. DROPPING
# garage_cond AND KEEPING garage_qual. CHANGE DATA TYPE,
# ADDRESS MISSING VALUES!


array(['TA', 'Fa', nan, 'Gd', 'Ex', 'Po'], dtype=object)

In [162]:
garage_qual = {np.nan:'0', 'TA':'1', 'Fa':'2', 'Ex':'3', 'Gd':'4', 'Po':'5'}
df.garage_qual = [garage_qual[item] for item in df.garage_qual] 
df.garage_qual.unique()


array(['1', '2', '0', '4', '3', '5'], dtype=object)

In [163]:
df['garage_qual'] = pd.to_numeric(df['garage_qual'], errors='coerce')
df['garage_qual'].dtype


dtype('int64')
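
The codes assigned above (`TA` = 1, `Fa` = 2, `Ex` = 3, `Gd` = 4, `Po` = 5) do not follow the quality ranking. If the model is meant to treat the rating as ordered, one alternative is an ordinal mapping from poor to excellent, sketched below on toy values. The dictionary `quality_order` is hypothetical and is not what the notebook uses.

```python
import numpy as np
import pandas as pd

# Ordered from worst to best; 0 is reserved for a missing rating.
quality_order = {"Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}

# Toy values for illustration only.
garage_qual = pd.Series(["TA", "Fa", np.nan, "Gd", "Ex", "Po"])

ordinal = garage_qual.map(quality_order).fillna(0).astype(int)
print(ordinal.tolist())  # [3, 2, 0, 4, 5, 1]
```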

In [164]:
# COLUMN:		garage_cond                                                                                                                                            
# DEFINITION:	Garage condition
# DATA TYPE:	object
# MISSING VALUES:	114
# UNIQUE VALUES:	
df.garage_cond.unique()


array(['TA', 'Fa', nan, 'Po', 'Gd', 'Ex'], dtype=object)

In [165]:
df = df.drop(columns=['garage_cond'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_qual,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,3,0,0,0,2,1,3,1,9,0,0,1,2,1,Y,0,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,3,0,1,0,2,1,4,1,11,0,1,1,2,1,Y,0,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,1,0,1,0,1,0,3,1,6,0,0,2,1,1,Y,0,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,2,0,0,0,2,1,3,1,10,0,0,3,2,1,Y,100,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,1,0,0,0,2,0,3,1,8,0,0,2,2,1,N,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [166]:
# COLUMN:		paved_drive                                                                                                                                                  
# DEFINITION:	Paved driveway
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.paved_drive.unique()


array(['Y', 'N', 'P'], dtype=object)

In [167]:
paved_drive = {'P':'1', 'Y':'1', 'N':'0'}
df.paved_drive = [paved_drive[item] for item in df.paved_drive] 
df.paved_drive.unique()


array(['1', '0'], dtype=object)

In [168]:
df["paved_drive"] = df["paved_drive"].astype(int)
df['paved_drive'].dtype


dtype('int64')
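
Because `paved_drive` is reduced to two classes (paved or partially paved versus not paved), the same result can be sketched without a dictionary by testing membership and casting the boolean result. The toy series is for illustration only.

```python
import pandas as pd

# Toy values for illustration only.
paved_drive = pd.Series(["Y", "N", "P", "Y"])

# True for paved ("Y") or partially paved ("P"), then cast to 0/1.
is_paved = paved_drive.isin(["Y", "P"]).astype(int)
print(is_paved.tolist())  # [1, 0, 1, 1]
```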

In [169]:
# COLUMN:		wood_deck_sf                                                                                                                                                       
# DEFINITION:	Wood deck area in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.wood_deck_sf.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


array([   0,  100,  335,  105,  169,  138,  212,  243,  483,  264,  416,
        474,  208,  104,  256,  736,  216,  303,  431,  200,  379,  168,
        132,  496,  280,  158,  142,   49,  418,  228,  261,  312,  225,
        140,  227,  203,  307,  214,  125,  153,   88,  230,   50,  276,
        144,  421,  187,  122,   52,  156,  204,  344,  240,  160,  193,
        114,  275,   12,  120,  328,  108,  182,   28,  178,   60,  250,
         38,  324,  646,  176,  248,  112,  306,  143,  302,  257,  146,
        409,  224,  232,  192,  136,  441,   48,  180,  221,  263,   81,
        134,  164,  268,  194,  191,   32,  161,   96,  210,  173,   90,
        106,  172,  393,   63,  196,  246,  288,  385,  237,  205,  188,
        242,   30,   54,  439,  209,   22,  236,  282,   24,   86,  300,
        384,  148,  147,  365,  238,  201,  370,  327,  521,  184,  298,
        315,  371,  128,  211,  218,  270,  390,  133,  135,  206,  342,
        262,  329,   94,  319,  126,  530,  190,   

In [170]:
df = df.drop(columns=['wood_deck_sf'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_qual,paved_drive,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,3,0,0,0,2,1,3,1,9,0,0,1,2,1,1,44,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,3,0,1,0,2,1,4,1,11,0,1,1,2,1,1,74,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,1,0,1,0,1,0,3,1,6,0,0,2,1,1,1,52,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,2,0,0,0,2,1,3,1,10,0,0,3,2,1,1,0,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,1,0,0,0,2,0,3,1,8,0,0,2,2,1,0,59,0,0,0,0,,,,0,3,2010,WD,138500


In [171]:
# COLUMN:		open_porch_sf                                                                                                                                                         
# DEFINITION:	Open porch area in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.open_porch_sf.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


array([ 44,  74,  52,   0,  59, 324,  58,  50,  80,  45, 142,  21,  49,
       144,  40, 120,  30,  48,  28,  26, 122,  68, 229,  27,  32, 112,
        36, 172,  84,  57, 364, 105,  20,  46,  60,  75,  41,  11,  72,
        47, 169,  76,  90, 192, 153,  15, 189, 140,  99,  35,  70, 180,
        38,  73,  34, 104,  96, 162, 108, 170, 285,  23, 128, 288,  56,
        54, 136, 299, 154,  64, 158,  88,  63,  33, 160,  66, 100,  37,
        93,  24,  97,  39, 121, 319,  12,  42, 304, 110, 168,  87,  43,
       235, 166,  25,  77, 101,  98,  82, 126, 211, 205, 200, 127,  85,
       208,  55,  69, 228, 198, 131,  95, 133, 130, 341, 124, 152,  22,
       175,  78, 291,  29, 173,  65, 274, 111,  62, 114, 118,  18, 129,
       365, 125, 116,  16,  92, 234, 141, 444, 102,  51, 155, 119,  91,
         8, 251, 258, 226,  81, 182, 103, 146, 113, 238, 213,  94, 150,
       184,  53, 207, 174, 117, 187, 240,  61, 123, 278, 191, 106, 292,
       547, 243, 214,  67, 215, 225, 199, 138, 312, 260, 137, 20

In [172]:
df = df.drop(columns=['open_porch_sf'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_qual,paved_drive,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,3,0,0,0,2,1,3,1,9,0,0,1,2,1,1,0,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,3,0,1,0,2,1,4,1,11,0,1,1,2,1,1,0,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,1,0,1,0,1,0,3,1,6,0,0,2,1,1,1,0,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,2,0,0,0,2,1,3,1,10,0,0,3,2,1,1,0,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,1,0,0,0,2,0,3,1,8,0,0,2,2,1,0,0,0,0,0,,,,0,3,2010,WD,138500


In [173]:
# COLUMN:		enclosed_porch                                                                                                                                                              
# DEFINITION:	Enclosed porch area in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.enclosed_porch.unique()
# EVALUATION: DATA TYPE GOOD, NO MISSING VALUES! 
# COMBINE threessn_porch WITH enclosed_porch TO FORM NEW
# enclosed_porch.


array([  0,  96, 133,  64, 207, 112, 164, 160, 236, 192,  84, 116,  87,
       143, 194, 156, 168, 144,  94,  36, 100, 120, 130,  16, 128, 176,
       177, 364,  56, 216, 158, 208,  32,  70, 272, 324, 190,  48,  52,
        81,  24, 259, 291, 268, 228,  40, 137, 252, 205, 240, 123, 246,
        30, 180, 114,  45, 172, 115, 265, 264, 211,  90, 280, 150, 134,
        77, 368,  60, 213, 167, 102,  18,  80, 222, 234, 105, 101, 135,
        34, 104, 148, 239, 109,  26,  50, 145, 140, 219, 189, 183, 175,
        66,  75,  72, 198, 122, 432, 330,  44, 162, 296, 200, 244, 214,
       204,  20, 169,  43, 260, 121,  78, 184, 117,  54,  25, 318, 210,
       212, 186, 129, 185,  37, 203, 126,  39,  35, 174, 202, 224, 275,
       196, 161,  92, 138,  55, 218, 225,  88, 165, 170, 294, 249, 154,
        42, 288, 226, 136, 231, 113,  68, 301,  57, 256,  19,  99, 230,
        23,  98,  67])

In [174]:
# df.sort_values(["enclosed_porch"], axis=0, ascending=True, inplace=True)


In [175]:
# print(df['enclosed_porch'])


In [176]:
# COLUMN:		threessn_porch                                                                                                                                                                       
# DEFINITION:	Three season porch area in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.threessn_porch.unique()
# EVALUATION: DATA TYPE GOOD, NO MISSING VALUES! WILL ADD
# threessn_porch TO enclosed_porch TO FORM NEW enclosed_porch,
# THEN DROP threessn_porch.

array([  0, 176, 224, 162, 168, 120, 407, 144, 150, 255, 508, 180, 140,
        96, 323, 153,  86, 216, 245, 182, 290, 304])

In [177]:
# df.sort_values(["threessn_porch"], axis=0, ascending=True, inplace=True)


In [178]:
# print(df['threessn_porch'])


- AFTER INVESTIGATION OF THE VALUES IN enclosed_porch & threessn_porch, IT WAS DISCOVERED THAT THE DATA COULD BE ENGINEERED INTO ONE COLUMN. enclosed_porch NOW CONTAINS threessn_porch.

In [179]:
df['enclosed_porch'] = df['enclosed_porch'] + df['threessn_porch']

In [180]:
# print(df['enclosed_porch'])


In [181]:
df = df.drop(columns=['threessn_porch'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_qual,paved_drive,enclosed_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,3,0,0,0,2,1,3,1,9,0,0,1,2,1,1,0,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,3,0,1,0,2,1,4,1,11,0,1,1,2,1,1,0,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,1,0,1,0,1,0,3,1,6,0,0,2,1,1,1,0,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,2,0,0,0,2,1,3,1,10,0,0,3,2,1,1,0,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,1,0,0,0,2,0,3,1,8,0,0,2,2,1,0,0,0,0,,,,0,3,2010,WD,138500
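
The combine-then-drop step for the porch columns can be sketched as a single chained expression, so the sum and the drop cannot be run out of order. The frame `porches` and its values are hypothetical and for illustration only.

```python
import pandas as pd

# Toy values for illustration only.
porches = pd.DataFrame({
    "enclosed_porch": [0, 96, 0, 112],
    "threessn_porch": [0, 0, 176, 0],
})

# Fold the three-season area into enclosed_porch, then drop the source column.
porches = (
    porches
    .assign(enclosed_porch=lambda d: d["enclosed_porch"] + d["threessn_porch"])
    .drop(columns=["threessn_porch"])
)
print(porches["enclosed_porch"].tolist())  # [0, 96, 176, 112]
```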


In [182]:
# COLUMN:		screen_porch                                                                                                                                                                             
# DEFINITION:	Screen porch area in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.screen_porch.unique()
# EVALUATION: DATA TYPE GOOD, NO MISSING VALUES! DATA APPEARS
# TO BE THE SAME AS threessn_porch. WILL DROP screen_porch!!!


array([  0, 288, 216, 440, 140, 182, 385, 100, 104, 168, 120, 189, 144,
       126, 224, 201, 252, 348, 147, 192,  53, 260, 164, 143, 342, 150,
       108,  94,  92, 156, 130, 145, 233, 122, 111, 196, 225, 227,  90,
       322, 110, 255, 270, 200, 291, 112, 116, 210, 155, 162, 195, 174,
       266, 163, 142, 480, 175, 152, 410, 153, 271, 220, 165, 135, 141,
       170, 312, 264, 217, 161, 208,  84, 490, 180, 160, 198, 240, 148,
        64, 222, 113, 109, 259,  95, 138, 184, 276,  88, 115, 154, 234,
       176, 265, 374, 231, 280, 171, 396, 204, 190])

In [183]:
# df.sort_values(["screen_porch"], axis=0, ascending=True, inplace=True)


In [184]:
# print(df['screen_porch'])


In [185]:
df = df.drop(columns=['screen_porch'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_qual,paved_drive,enclosed_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,3,0,0,0,2,1,3,1,9,0,0,1,2,1,1,0,0,,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,3,0,1,0,2,1,4,1,11,0,1,1,2,1,1,0,0,,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,1,0,1,0,1,0,3,1,6,0,0,2,1,1,1,0,0,,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,2,0,0,0,2,1,3,1,10,0,0,3,2,1,1,0,0,,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,1,0,0,0,2,0,3,1,8,0,0,2,2,1,0,0,0,,,,0,3,2010,WD,138500


In [186]:
# COLUMN:		pool_area                                                                                                                                                                                       
# DEFINITION:	Pool area in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.pool_area.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([  0, 519, 576, 800, 228, 480, 648, 738, 368, 561])

In [187]:
# COLUMN:		pool_qc                                                                                                                                                                                                   
# DEFINITION:	Pool quality
# DATA TYPE:	object
# MISSING VALUES:	2042
# UNIQUE VALUES:	
df.pool_qc.unique()
# EVALUATION: COLUMN DATA TYPE NEEDS TO BE CONVERTED. MISSING
# VALUES, COLUMN TO BE DROPPED!


array([nan, 'Fa', 'Gd', 'Ex', 'TA'], dtype=object)

In [188]:
df = df.drop(columns=['pool_qc'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_qual,paved_drive,enclosed_porch,pool_area,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,3,0,0,0,2,1,3,1,9,0,0,1,2,1,1,0,0,,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,3,0,1,0,2,1,4,1,11,0,1,1,2,1,1,0,0,,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,1,0,1,0,1,0,3,1,6,0,0,2,1,1,1,0,0,,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,2,0,0,0,2,1,3,1,10,0,0,3,2,1,1,0,0,,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,1,0,0,0,2,0,3,1,8,0,0,2,2,1,0,0,0,,,0,3,2010,WD,138500


In [189]:
# COLUMN:		fence              
# DEFINITION:	Fence quality
# DATA TYPE:	object
# MISSING VALUES:	1651
# UNIQUE VALUES:	
df.fence.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array([nan, 'MnPrv', 'GdPrv', 'GdWo', 'MnWw'], dtype=object)

In [190]:
df = df.drop(columns=['fence'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_qual,paved_drive,enclosed_porch,pool_area,misc_feature,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,3,0,0,0,2,1,3,1,9,0,0,1,2,1,1,0,0,,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,3,0,1,0,2,1,4,1,11,0,1,1,2,1,1,0,0,,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,1,0,1,0,1,0,3,1,6,0,0,2,1,1,1,0,0,,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,2,0,0,0,2,1,3,1,10,0,0,3,2,1,1,0,0,,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,1,0,0,0,2,0,3,1,8,0,0,2,2,1,0,0,0,,0,3,2010,WD,138500


In [191]:
# COLUMN:		misc_feature       
# DEFINITION:	Miscellaneous feature not covered in other
#               categories
# DATA TYPE:	object
# MISSING VALUES:	1986
# UNIQUE VALUES:	
df.misc_feature.unique()
# EVALUATION: DATA TYPE NEEDS TO BE CONVERTED. LOTS OF MISSING
# VALUES. MISC_VAL DOES NOT LINE UP WITH MISC_FEATURE; THE TWO
# COLUMNS DO NOT APPEAR TO CORRELATE. COLUMN TO BE DROPPED!


array([nan, 'Shed', 'TenC', 'Gar2', 'Othr', 'Elev'], dtype=object)

In [192]:
df = df.drop(columns=['misc_feature'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_qual,paved_drive,enclosed_porch,pool_area,misc_val,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,3,0,0,0,2,1,3,1,9,0,0,1,2,1,1,0,0,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,3,0,1,0,2,1,4,1,11,0,1,1,2,1,1,0,0,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,1,0,1,0,1,0,3,1,6,0,0,2,1,1,1,0,0,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,2,0,0,0,2,1,3,1,10,0,0,3,2,1,1,0,0,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,1,0,0,0,2,0,3,1,8,0,0,2,2,1,0,0,0,0,3,2010,WD,138500


In [193]:
# COLUMN:		misc_val            
# DEFINITION:	Value of miscellaneous feature
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.misc_val.unique()
# EVALUATION: DATA TYPE GOOD, NO MISSING VALUES, BUT MISC_VAL
# DOES NOT LINE UP WITH MISC_FEATURE; THE TWO COLUMNS DO NOT
# APPEAR TO CORRELATE. COLUMN TO BE DROPPED!
 

array([    0,   400,   500,  2000,   650,   600,  1200,   480,   700,
         450,  3000, 12500,  4500,   460,  3500,  8300,   455,   300,
        1150,   900,    54,  6500,   800,  1500,  2500,  1300, 17000,
          80])
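
The claim that `misc_val` does not line up with `misc_feature` can be checked directly, as long as the check runs before `misc_feature` is dropped. The sketch below uses a toy frame with made-up values and treats "a feature is recorded" and "a value above zero is recorded" as the two things that should agree; that definition of agreement is an assumption, not something stated in the notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in; in the notebook this would be df before misc_feature is dropped.
toy = pd.DataFrame({
    "misc_feature": [np.nan, "Shed", np.nan, "Gar2", "Shed"],
    "misc_val": [0, 400, 500, 0, 650],
})

has_feature = toy["misc_feature"].notna()  # a feature is recorded
has_value = toy["misc_val"] > 0            # a non-zero value is recorded

# Cross-tabulate the two flags; off-diagonal cells are rows that disagree.
agreement = pd.crosstab(has_feature, has_value)

# Rows where a value exists without a feature, or a feature without a value.
mismatches = toy[has_feature != has_value]

print(agreement)
print(mismatches)
```

If the off-diagonal counts are small on the real data, the two columns are more consistent than the evaluation suggests, and the drop may be worth revisiting.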

In [194]:
df = df.drop(columns=['misc_val'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_qual,paved_drive,enclosed_porch,pool_area,mo_sold,yr_sold,sale_type,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,3,0,0,0,2,1,3,1,9,0,0,1,2,1,1,0,0,3,2010,WD,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,3,0,1,0,2,1,4,1,11,0,1,1,2,1,1,0,0,4,2009,WD,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,1,0,1,0,1,0,3,1,6,0,0,2,1,1,1,0,0,1,2010,WD,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,2,0,0,0,2,1,3,1,10,0,0,3,2,1,1,0,0,4,2010,WD,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,1,0,0,0,2,0,3,1,8,0,0,2,2,1,0,0,0,3,2010,WD,138500


In [195]:
# COLUMN:		mo_sold             
# DEFINITION:	Month Sold
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.mo_sold.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([ 3,  4,  1,  6,  5,  9,  7,  2, 12, 10, 11,  8])

In [196]:
# COLUMN:		yr_sold             
# DEFINITION:	Year Sold
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.yr_sold.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([2010, 2009, 2006, 2007, 2008])

In [197]:
# COLUMN:		sale_type          
# DEFINITION:	Type of sale
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.sale_type.unique()
# EVALUATION: DATA TYPE NEEDS TO BE CONVERTED. NO MISSING
# VALUES!


array(['WD ', 'New', 'COD', 'ConLD', 'Con', 'CWD', 'Oth', 'ConLI',
       'ConLw'], dtype=object)

- COLUMN HAS DIFFERENT VALUES THAN TEST.CSV, SO IT IS DROPPED.
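
Note that the first value in the output above is `'WD '`, with a trailing space. Part of the mismatch with the test file could come from whitespace alone. A minimal sketch of comparing the category sets after stripping, using two toy Series (the test values here are made up, not taken from test.csv):

```python
import pandas as pd

# Toy stand-ins for the sale_type columns of the train and test files.
train_sale_type = pd.Series(["WD ", "New", "COD", "WD "])
test_sale_type = pd.Series(["WD", "New", "VWD"])  # hypothetical test values

# Without cleaning, 'WD ' and 'WD' count as different categories.
raw_only_in_train = sorted(set(train_sale_type) - set(test_sale_type))

# Strip surrounding whitespace before comparing.
train_clean = train_sale_type.str.strip()
test_clean = test_sale_type.str.strip()

only_in_train = sorted(set(train_clean) - set(test_clean))
only_in_test = sorted(set(test_clean) - set(train_clean))

print(raw_only_in_train)
print(only_in_train)
print(only_in_test)
```

Any categories that remain in `only_in_train` or `only_in_test` after stripping are genuine differences between the two files.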

In [198]:
df = df.drop(columns=['sale_type'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_qual,paved_drive,enclosed_porch,pool_area,mo_sold,yr_sold,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,3,0,0,0,2,1,3,1,9,0,0,1,2,1,1,0,0,3,2010,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,3,0,1,0,2,1,4,1,11,0,1,1,2,1,1,0,0,4,2009,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,1,0,1,0,1,0,3,1,6,0,0,2,1,1,1,0,0,1,2010,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,2,0,0,0,2,1,3,1,10,0,0,3,2,1,1,0,0,4,2010,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,1,0,0,0,2,0,3,1,8,0,0,2,2,1,0,0,0,3,2010,138500


In [199]:
# COLUMN:		saleprice                     
# DEFINITION:	Sale price in dollars
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.saleprice.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([130500, 220000, 109000, 174000, 138500, 190000, 140000, 142000,
       112500, 135000,  85400, 183600, 131000, 200000, 193000, 173500,
        98000, 139000, 143500, 215200, 129000, 278000, 344133, 185000,
       145000, 187500, 198000, 119600, 122900, 230000, 270000, 125000,
       297000, 113500, 127000, 175500, 146000, 147500, 465000, 165500,
       131500, 129500, 257076, 117000, 149000, 128000, 155000, 166000,
       250000,  76000, 158000, 149500, 121000, 136000, 173000, 290000,
       303477, 122250, 153000, 147000, 148500, 130000, 372000, 213490,
       308030, 300000, 159500, 137500, 232000,  93850, 105000,  68500,
       154300, 129850, 114000, 501837, 153900, 160500, 310090, 184900,
       132000, 163000, 183000, 211000, 184000, 118858, 180500, 148000,
       124000, 277500, 350000, 387000,  86000,  44000, 215000, 146500,
       165000, 252000, 150000, 139900, 162900, 160000,  63900, 149900,
       231500, 108000, 120000, 128500, 115000, 110000, 178000, 199500,
      

In [200]:
df['saleprice'].min()


12789

In [201]:
df['saleprice'].max()


611657

**Describe the basic format:**

In [202]:
df.dtypes


ms_subclass       int64
ms_zoning         int64
street            int64
alley             int64
land_contour      int64
utilities         int64
lot_config        int64
neighborhood      int64
condition_1       int64
bldg_type         int64
house_style       int64
overall_qual      int64
exter_qual        int64
foundation        int64
bsmt_cond         int64
heating_qc        int64
central_air       int64
bsmt_full_bath    int64
bsmt_half_bath    int64
full_bath         int64
half_bath         int64
bedroom_abvgr     int64
kitchen_abvgr     int64
totrms_abvgrd     int64
functional        int64
fireplaces        int64
garage_type       int64
garage_cars       int64
garage_qual       int64
paved_drive       int64
enclosed_porch    int64
pool_area         int64
mo_sold           int64
yr_sold           int64
saleprice         int64
dtype: object

In [203]:
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_qual,paved_drive,enclosed_porch,pool_area,mo_sold,yr_sold,saleprice
1,60,0,0,0,0,0,2,11,1,0,1,6,1,0,2,3,0,0,0,2,1,3,1,9,0,0,1,2,1,1,0,0,3,2010,130500
2,60,0,0,0,0,0,2,10,0,0,1,7,1,1,2,3,0,1,0,2,1,4,1,11,0,1,1,2,1,1,0,0,4,2009,220000
3,20,0,0,0,0,0,1,0,0,0,0,5,0,0,2,1,0,1,0,1,0,3,1,6,0,0,2,1,1,1,0,0,1,2010,109000
4,60,0,0,0,0,0,1,23,0,0,1,5,0,1,2,2,0,0,0,2,1,3,1,10,0,0,3,2,1,1,0,0,4,2010,174000
5,50,0,0,0,0,0,1,10,0,0,3,6,0,1,1,1,0,0,0,2,0,3,1,8,0,0,2,2,1,0,0,0,3,2010,138500


In [204]:
df.shape


(2048, 35)

**Export Dataframe:**

In [205]:
df.to_csv('../data/train_clean.csv')
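
By default `to_csv` also writes the index as an unnamed first column, which comes back as `Unnamed: 0` when the file is read without `index_col`. A small round-trip sketch, using an in-memory buffer and a toy frame instead of the real file path:

```python
import io

import pandas as pd

# Toy frame with a 1-based index, similar in shape to the cleaned data.
toy = pd.DataFrame(
    {"mo_sold": [3, 4], "saleprice": [130500, 220000]},
    index=[1, 2],
)

buf = io.StringIO()
toy.to_csv(buf)  # default index=True writes the index as an unnamed column

# Reading back naively turns the index into an extra 'Unnamed: 0' column.
buf.seek(0)
naive = pd.read_csv(buf)

# Passing index_col=0 restores the index instead.
buf.seek(0)
restored = pd.read_csv(buf, index_col=0)

print(list(naive.columns))
print(list(restored.columns))
```

If the index carries no meaning, `df.to_csv(path, index=False)` avoids the extra column altogether; if it does, reading with `index_col=0` keeps it out of the feature set.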
