**Description of the Ames Iowa Housing Data columns:**

SOURCE: https://rdrr.io/cran/AmesHousing/man/ames_raw.html

Order: Observation number

PID: Parcel identification number - can be used with city web site for parcel review.

MS SubClass: Identifies the type of dwelling involved in the sale.

MS Zoning: Identifies the general zoning classification of the sale.

Lot Frontage: Linear feet of street connected to property

Lot Area: Lot size in square feet

Street: Type of road access to property

Alley: Type of alley access to property

Lot Shape: General shape of property

Land Contour: Flatness of the property

Utilities: Type of utilities available

Lot Config: Lot configuration

Land Slope: Slope of property

Neighborhood: Physical locations within Ames city limits (map available)

Condition 1: Proximity to various conditions

Condition 2: Proximity to various conditions (if more than one is present)

Bldg Type: Type of dwelling

House Style: Style of dwelling

Overall Qual: Rates the overall material and finish of the house

Overall Cond: Rates the overall condition of the house

Year Built: Original construction date

Year Remod/Add: Remodel date (same as construction date if no remodeling or additions)

Roof Style: Type of roof

Roof Matl: Roof material

Exterior 1st: Exterior covering on house

Exterior 2nd: Exterior covering on house (if more than one material)

Mas Vnr Type: Masonry veneer type

Mas Vnr Area: Masonry veneer area in square feet

Exter Qual: Evaluates the quality of the material on the exterior

Exter Cond: Evaluates the present condition of the material on the exterior

Foundation: Type of foundation

Bsmt Qual: Evaluates the height of the basement

Bsmt Cond: Evaluates the general condition of the basement

Bsmt Exposure: Refers to walkout or garden level walls

BsmtFin Type 1: Rating of basement finished area

BsmtFin SF 1: Type 1 finished square feet

BsmtFin Type 2: Rating of basement finished area (if multiple types)

BsmtFin SF 2: Type 2 finished square feet

Bsmt Unf SF: Unfinished square feet of basement area

Total Bsmt SF: Total square feet of basement area

Heating: Type of heating

Heating QC: Heating quality and condition

Central Air: Central air conditioning

Electrical: Electrical system

1st Flr SF: First Floor square feet

2nd Flr SF: Second floor square feet

Low Qual Fin SF: Low quality finished square feet (all floors)

Gr Liv Area: Above grade (ground) living area square feet

Bsmt Full Bath: Basement full bathrooms

Bsmt Half Bath: Basement half bathrooms

Full Bath: Full bathrooms above grade

Half Bath: Half baths above grade

Bedroom AbvGr: Bedrooms above grade (does NOT include basement bedrooms)

Kitchen AbvGr: Kitchens above grade

Kitchen Qual: Kitchen quality

TotRms AbvGrd: Total rooms above grade (does not include bathrooms)

Functional: Home functionality (Assume typical unless deductions are warranted)

Fireplaces: Number of fireplaces

Fireplace Qu: Fireplace quality

Garage Type: Garage location

Garage Yr Blt: Year garage was built

Garage Finish: Interior finish of the garage

Garage Cars: Size of garage in car capacity

Garage Area: Size of garage in square feet

Garage Qual: Garage quality

Garage Cond: Garage condition

Paved Drive: Paved driveway

Wood Deck SF: Wood deck area in square feet

Open Porch SF: Open porch area in square feet

Enclosed Porch: Enclosed porch area in square feet

3Ssn Porch: Three season porch area in square feet

Screen Porch: Screen porch area in square feet

Pool Area: Pool area in square feet

Pool QC: Pool quality

Fence: Fence quality

Misc Feature: Miscellaneous feature not covered in other categories

Misc Val: Dollar value of miscellaneous feature

Mo Sold: Month Sold

Yr Sold: Year Sold

Sale Type: Type of sale

Sale Condition: Condition of sale

**Load packages:**

In [1]:
import numpy as np
import pandas as pd


**Load the data:**

In [2]:
ames_file_test = '../data/test.csv'


In [3]:
df = pd.read_csv(ames_file_test)


**Change Index type and start at #1:**

In [4]:
df.index = df.index.astype(int, copy = False)


In [5]:
df.index = df.index + 1


In [6]:
df.head(5)


Unnamed: 0,Id,PID,MS SubClass,MS Zoning,Lot Frontage,Lot Area,Street,Alley,Lot Shape,Land Contour,...,3Ssn Porch,Screen Porch,Pool Area,Pool QC,Fence,Misc Feature,Misc Val,Mo Sold,Yr Sold,Sale Type
1,2658,902301120,190,RM,69.0,9142,Pave,Grvl,Reg,Lvl,...,0,0,0,,,,0,4,2006,WD
2,2718,905108090,90,RL,,9662,Pave,,IR1,Lvl,...,0,0,0,,,,0,8,2006,WD
3,2414,528218130,60,RL,58.0,17104,Pave,,IR1,Lvl,...,0,0,0,,,,0,9,2006,New
4,1989,902207150,30,RM,60.0,8520,Pave,,Reg,Lvl,...,0,0,0,,,,0,7,2007,WD
5,625,535105100,20,RL,,9500,Pave,,IR1,Lvl,...,0,185,0,,,,0,7,2009,WD


**Describe the basic format:**

In [7]:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)


In [8]:
df.dtypes


Id                   int64
PID                  int64
MS SubClass          int64
MS Zoning           object
Lot Frontage       float64
Lot Area             int64
Street              object
Alley               object
Lot Shape           object
Land Contour        object
Utilities           object
Lot Config          object
Land Slope          object
Neighborhood        object
Condition 1         object
Condition 2         object
Bldg Type           object
House Style         object
Overall Qual         int64
Overall Cond         int64
Year Built           int64
Year Remod/Add       int64
Roof Style          object
Roof Matl           object
Exterior 1st        object
Exterior 2nd        object
Mas Vnr Type        object
Mas Vnr Area       float64
Exter Qual          object
Exter Cond          object
Foundation          object
Bsmt Qual           object
Bsmt Cond           object
Bsmt Exposure       object
BsmtFin Type 1      object
BsmtFin SF 1         int64
BsmtFin Type 2      object
B

In [9]:
df.shape


(879, 80)

**Drop unwanted columns:**

* Preliminary look through data, more columns to be dropped in Analyze & Evaluate section.

In [10]:
df.drop(columns=['PID', 'Id'], inplace=True)


- The 'PID' and 'Id' columns are for identification purposes only. They are not relevant to predicting housing prices.

In [11]:
df.head(5)


Unnamed: 0,MS SubClass,MS Zoning,Lot Frontage,Lot Area,Street,Alley,Lot Shape,Land Contour,Utilities,Lot Config,Land Slope,Neighborhood,Condition 1,Condition 2,Bldg Type,House Style,Overall Qual,Overall Cond,Year Built,Year Remod/Add,Roof Style,Roof Matl,Exterior 1st,Exterior 2nd,Mas Vnr Type,Mas Vnr Area,Exter Qual,Exter Cond,Foundation,Bsmt Qual,Bsmt Cond,Bsmt Exposure,BsmtFin Type 1,BsmtFin SF 1,BsmtFin Type 2,BsmtFin SF 2,Bsmt Unf SF,Total Bsmt SF,Heating,Heating QC,Central Air,Electrical,1st Flr SF,2nd Flr SF,Low Qual Fin SF,Gr Liv Area,Bsmt Full Bath,Bsmt Half Bath,Full Bath,Half Bath,Bedroom AbvGr,Kitchen AbvGr,Kitchen Qual,TotRms AbvGrd,Functional,Fireplaces,Fireplace Qu,Garage Type,Garage Yr Blt,Garage Finish,Garage Cars,Garage Area,Garage Qual,Garage Cond,Paved Drive,Wood Deck SF,Open Porch SF,Enclosed Porch,3Ssn Porch,Screen Porch,Pool Area,Pool QC,Fence,Misc Feature,Misc Val,Mo Sold,Yr Sold,Sale Type
1,190,RM,69.0,9142,Pave,Grvl,Reg,Lvl,AllPub,Inside,Gtl,OldTown,Norm,Norm,2fmCon,2Story,6,8,1910,1950,Gable,CompShg,AsbShng,AsbShng,,0.0,TA,Fa,Stone,Fa,TA,No,Unf,0,Unf,0,1020,1020,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,RL,,9662,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Sawyer,Norm,Norm,Duplex,1Story,5,4,1977,1977,Gable,CompShg,Plywood,Plywood,,0.0,TA,TA,CBlock,Gd,TA,No,Unf,0,Unf,0,1967,1967,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,RL,58.0,17104,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Gilbert,Norm,Norm,1Fam,2Story,7,5,2006,2006,Gable,CompShg,VinylSd,VinylSd,,0.0,Gd,TA,PConc,Gd,Gd,Av,GLQ,554,Unf,0,100,654,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,RM,60.0,8520,Pave,,Reg,Lvl,AllPub,Inside,Gtl,OldTown,Norm,Norm,1Fam,1Story,5,6,1923,2006,Gable,CompShg,Wd Sdng,Wd Sdng,,0.0,Gd,TA,CBlock,TA,TA,No,Unf,0,Unf,0,968,968,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,RL,,9500,Pave,,IR1,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,1Story,6,5,1963,1963,Gable,CompShg,Plywood,Plywood,BrkFace,247.0,TA,TA,CBlock,Gd,TA,No,BLQ,609,Unf,0,785,1394,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


**Determine missing observations:**

In [12]:
df.isnull().sum()


MS SubClass          0
MS Zoning            0
Lot Frontage       160
Lot Area             0
Street               0
Alley              821
Lot Shape            0
Land Contour         0
Utilities            0
Lot Config           0
Land Slope           0
Neighborhood         0
Condition 1          0
Condition 2          0
Bldg Type            0
House Style          0
Overall Qual         0
Overall Cond         0
Year Built           0
Year Remod/Add       0
Roof Style           0
Roof Matl            0
Exterior 1st         0
Exterior 2nd         0
Mas Vnr Type         1
Mas Vnr Area         1
Exter Qual           0
Exter Cond           0
Foundation           0
Bsmt Qual           25
Bsmt Cond           25
Bsmt Exposure       25
BsmtFin Type 1      25
BsmtFin SF 1         0
BsmtFin Type 2      25
BsmtFin SF 2         0
Bsmt Unf SF          0
Total Bsmt SF        0
Heating              0
Heating QC           0
Central Air          0
Electrical           1
1st Flr SF           0
2nd Flr SF 
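The per-column counts above are easier to scan once columns without gaps are filtered out. A minimal sketch, using a small made-up frame named `toy` (an assumption for illustration, not the real data):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the Ames data; the values are made up.
toy = pd.DataFrame({
    "Lot Frontage": [69.0, np.nan, 58.0, np.nan],
    "Alley": ["Grvl", None, None, None],
    "Lot Area": [9142, 9662, 17104, 8520],
})

missing = toy.isnull().sum()
# Keep only the columns that actually have gaps, largest first.
missing = missing[missing > 0].sort_values(ascending=False)
missing_pct = (missing / len(toy) * 100).round(1)

summary = pd.DataFrame({"n_missing": missing, "pct_missing": missing_pct})
print(summary)
```

On the real frame the same three lines would apply to `df` in place of `toy`.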

**Make the column names searchable**

In [13]:
df.columns = df.columns.str.lower()


In [14]:
df.columns = df.columns.str.replace(' ','_')


In [15]:
df.columns = df.columns.str.replace('/','_')


In [16]:
df.columns = df.columns.str.replace('3','three')


In [17]:
df.columns = df.columns.str.replace('1st','first')


In [18]:
df.columns = df.columns.str.replace('2nd','second')
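The six renaming cells above could also be chained into one helper so the same cleanup can be reused on another file. This is only a sketch: `clean_column_names` is a hypothetical name, and `regex=False` is added so each pattern is treated as a literal string.

```python
import pandas as pd

def clean_column_names(columns):
    """Apply the same replacements as the cells above, in the same order."""
    return (
        columns.str.lower()
        .str.replace(" ", "_", regex=False)
        .str.replace("/", "_", regex=False)
        .str.replace("3", "three", regex=False)
        .str.replace("1st", "first", regex=False)
        .str.replace("2nd", "second", regex=False)
    )

# A few raw column names taken from the dtypes listing earlier in the notebook.
raw = pd.Index(["MS SubClass", "Year Remod/Add", "3Ssn Porch", "1st Flr SF", "Exterior 2nd"])
cleaned = clean_column_names(raw)
print(list(cleaned))
```

Usage on the real frame would be `df.columns = clean_column_names(df.columns)`.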


In [19]:
df.dtypes


ms_subclass          int64
ms_zoning           object
lot_frontage       float64
lot_area             int64
street              object
alley               object
lot_shape           object
land_contour        object
utilities           object
lot_config          object
land_slope          object
neighborhood        object
condition_1         object
condition_2         object
bldg_type           object
house_style         object
overall_qual         int64
overall_cond         int64
year_built           int64
year_remod_add       int64
roof_style          object
roof_matl           object
exterior_first      object
exterior_second     object
mas_vnr_type        object
mas_vnr_area       float64
exter_qual          object
exter_cond          object
foundation          object
bsmt_qual           object
bsmt_cond           object
bsmt_exposure       object
bsmtfin_type_1      object
bsmtfin_sf_1         int64
bsmtfin_type_2      object
bsmtfin_sf_2         int64
bsmt_unf_sf          int64
t

**Analyze & Evaluate Columns**

* I will try to predict housing prices; any data that could be useful in this task will be kept.
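The dtype, missing count, and unique values in each column header could be computed instead of typed by hand, which keeps them in step with the data. A sketch with a hypothetical helper, `describe_column`, shown on a made-up frame:

```python
import numpy as np
import pandas as pd

def describe_column(frame, name):
    """Return dtype, missing count, and unique non-null values for one column."""
    col = frame[name]
    return {
        "column": name,
        "dtype": str(col.dtype),
        "missing": int(col.isnull().sum()),
        "unique": col.dropna().unique().tolist(),
    }

# Toy data for illustration only.
toy = pd.DataFrame({"alley": ["Grvl", np.nan, "Pave", np.nan]})
info = describe_column(toy, "alley")
print(info)
```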

In [20]:
# COLUMN:		ms_subclass
# DEFINITION:	Identifies the type of dwelling involved in the
#               sale.
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.ms_subclass.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([190,  90,  60,  30,  20, 160, 120,  70,  80,  50,  85,  45,  75,
       180,  40])

In [21]:
# COLUMN:		ms_zoning
# DEFINITION:	Identifies the general zoning classification of
#               the sale.
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.ms_zoning.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['RM', 'RL', 'FV', 'RH', 'C (all)', 'I (all)'], dtype=object)

In [22]:
ms_zoning = {'RL':'0', 'FV':'1', 'RH':'2', 'RM':'3', 'C (all)':'4', 'I (all)':'5', 'A (agr)':'6'}
df.ms_zoning = [ms_zoning[item] for item in df.ms_zoning] 
df.ms_zoning.unique()


array(['3', '0', '1', '2', '4', '5'], dtype=object)

In [23]:
df["ms_zoning"] = df["ms_zoning"].astype(int)  # np.int was removed in NumPy 1.24
df['ms_zoning'].dtype


dtype('int64')
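The two-step conversion above (string codes, then a cast) can be collapsed by storing integer codes in the dictionary and using `Series.map`. A minimal sketch on a made-up Series; the codes mirror the `ms_zoning` dictionary, and the check for unmapped labels is an addition for illustration:

```python
import pandas as pd

# Same codes as the ms_zoning dictionary above, stored as integers.
zoning_codes = {"RL": 0, "FV": 1, "RH": 2, "RM": 3,
                "C (all)": 4, "I (all)": 5, "A (agr)": 6}

zoning = pd.Series(["RM", "RL", "FV", "RH", "C (all)", "I (all)"])
encoded = zoning.map(zoning_codes)

# map() leaves NaN for labels that are not in the dictionary; fail loudly if so.
unmapped = zoning[encoded.isnull()].unique()
if len(unmapped) > 0:
    raise ValueError(f"unmapped zoning labels: {list(unmapped)}")

encoded = encoded.astype(int)
print(encoded.tolist())
```

Unlike the list comprehension, which raises a bare `KeyError` on an unknown label, this reports every unmapped label at once.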

In [24]:
# COLUMN:		lot_frontage
# DEFINITION:	Linear feet of street connected to property
# DATA TYPE:	float64
# MISSING VALUES:	160
# UNIQUE VALUES:	
df.lot_frontage.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array([ 69.,  nan,  58.,  60.,  21.,  52.,  39.,  75.,  50.,  68.,  80.,
       121.,  51.,  65.,  74.,  73.,  40.,  46.,  61.,  85., 136.,  34.,
        70.,  59., 130., 100.,  77., 131.,  67.,  79.,  98.,  53.,  95.,
        37., 106.,  90., 120.,  72.,  76., 118.,  24.,  78.,  86., 102.,
        66., 112.,  63.,  57., 110.,  89., 103., 105.,  48.,  43., 160.,
        31.,  55.,  84.,  44., 150.,  35.,  64.,  71., 122., 174.,  49.,
        47.,  93.,  56., 149.,  87., 168.,  82.,  36.,  41.,  96.,  88.,
        83.,  45.,  94.,  54.,  30., 124.,  81.,  42., 152., 115., 113.,
       101., 104., 116.,  62., 114., 107.,  33.,  99.,  91.,  97., 126.,
        28., 108., 133.,  32.,  92., 182.])

In [25]:
df = df.drop(columns=['lot_frontage'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,lot_area,street,alley,lot_shape,land_contour,utilities,lot_config,land_slope,neighborhood,condition_1,condition_2,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,roof_style,roof_matl,exterior_first,exterior_second,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,9142,Pave,Grvl,Reg,Lvl,AllPub,Inside,Gtl,OldTown,Norm,Norm,2fmCon,2Story,6,8,1910,1950,Gable,CompShg,AsbShng,AsbShng,,0.0,TA,Fa,Stone,Fa,TA,No,Unf,0,Unf,0,1020,1020,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,9662,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Sawyer,Norm,Norm,Duplex,1Story,5,4,1977,1977,Gable,CompShg,Plywood,Plywood,,0.0,TA,TA,CBlock,Gd,TA,No,Unf,0,Unf,0,1967,1967,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,17104,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Gilbert,Norm,Norm,1Fam,2Story,7,5,2006,2006,Gable,CompShg,VinylSd,VinylSd,,0.0,Gd,TA,PConc,Gd,Gd,Av,GLQ,554,Unf,0,100,654,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,8520,Pave,,Reg,Lvl,AllPub,Inside,Gtl,OldTown,Norm,Norm,1Fam,1Story,5,6,1923,2006,Gable,CompShg,Wd Sdng,Wd Sdng,,0.0,Gd,TA,CBlock,TA,TA,No,Unf,0,Unf,0,968,968,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,9500,Pave,,IR1,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,1Story,6,5,1963,1963,Gable,CompShg,Plywood,Plywood,BrkFace,247.0,TA,TA,CBlock,Gd,TA,No,BLQ,609,Unf,0,785,1394,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [26]:
# COLUMN:		lot_area
# DEFINITION:	Lot size in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.lot_area.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES! DECISION TO DROP DATA.


array([  9142,   9662,  17104,   8520,   9500,   1890,   8516,   9286,
         3515,  10125,   7175,   7200,  11310,   7976,  11737,   9060,
        10800,   9571,  17671,   8246,   8499,   8012,  21453,   9605,
         7180,  12513,   9000,   8800,   3500,   7340,   6240,  11700,
        10000,  11166,  13204,   5520,   8892,   7321,   3951,  11214,
         7388,  14559,   4054,   1533,  11250,   7800,   5868,   7500,
         9439,   9345,   6000,   7930,   9120,   9503,  11050,  10316,
        11675,  12900,  13770,  13355,   7153,  10750,   8125,   3903,
         4270,   4426,   5825,  23303,  25485,  10784,   8510,  17871,
        40094,  21750,   9510,   8339,  14828,   8749,   9140,   3600,
         8777,  12798,  12704,   7480,  16500,  16669,   8064,   7758,
        11767,   6900,   4671,   6125,   9863,   4590,   8450,   8400,
        14694,  14250,   6951,  12720,  11988,   3880,   7685,  18837,
        11136,  10773,  10200,  11800,  17217,  16196,  35760,   7630,
      

In [27]:
df = df.drop(columns=['lot_area'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,lot_shape,land_contour,utilities,lot_config,land_slope,neighborhood,condition_1,condition_2,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,roof_style,roof_matl,exterior_first,exterior_second,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,Pave,Grvl,Reg,Lvl,AllPub,Inside,Gtl,OldTown,Norm,Norm,2fmCon,2Story,6,8,1910,1950,Gable,CompShg,AsbShng,AsbShng,,0.0,TA,Fa,Stone,Fa,TA,No,Unf,0,Unf,0,1020,1020,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Sawyer,Norm,Norm,Duplex,1Story,5,4,1977,1977,Gable,CompShg,Plywood,Plywood,,0.0,TA,TA,CBlock,Gd,TA,No,Unf,0,Unf,0,1967,1967,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,Pave,,IR1,Lvl,AllPub,Inside,Gtl,Gilbert,Norm,Norm,1Fam,2Story,7,5,2006,2006,Gable,CompShg,VinylSd,VinylSd,,0.0,Gd,TA,PConc,Gd,Gd,Av,GLQ,554,Unf,0,100,654,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,Pave,,Reg,Lvl,AllPub,Inside,Gtl,OldTown,Norm,Norm,1Fam,1Story,5,6,1923,2006,Gable,CompShg,Wd Sdng,Wd Sdng,,0.0,Gd,TA,CBlock,TA,TA,No,Unf,0,Unf,0,968,968,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,Pave,,IR1,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,1Story,6,5,1963,1963,Gable,CompShg,Plywood,Plywood,BrkFace,247.0,TA,TA,CBlock,Gd,TA,No,BLQ,609,Unf,0,785,1394,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [28]:
# COLUMN:		street
# DEFINITION:	Type of road access to property
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.street.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['Pave', 'Grvl'], dtype=object)

In [29]:
street = {'Pave':'0', 'Grvl':'1'}
df.street = [street[item] for item in df.street] 
df.street.unique()


array(['0', '1'], dtype=object)

In [30]:
df["street"] = df["street"].astype(int)  # np.int was removed in NumPy 1.24
df['street'].dtype


dtype('int64')

In [31]:
# COLUMN:		alley
# DEFINITION:	Type of alley access to property
# DATA TYPE:	object
# MISSING VALUES:	821
# UNIQUE VALUES:	
df.alley.unique()
# EVALUATION: A LOT WITH MORE THAN ONE POINT OF ENTRY COULD
# AFFECT PRICE. MISSING DATA MEANS THE LOT DOESN'T ABUT AN ALLEY.
# COLUMN NEEDS TO BE CONVERTED TO INTEGER!


array(['Grvl', nan, 'Pave'], dtype=object)

In [32]:
alley = {np.nan:'0', 'Pave':'1', 'Grvl':'1'}
df.alley = [alley[item] for item in df.alley]
df.alley.unique()


array(['1', '0'], dtype=object)

In [33]:
df['alley'] = pd.to_numeric(df['alley'], errors='coerce')
df['alley'].dtype



dtype('int64')
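Because both alley surfaces are mapped to 1 and a missing value to 0, the column ends up as a has-alley flag. A sketch of the same result without a dictionary lookup keyed on NaN, shown on a made-up Series:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the alley column: NaN means the lot has no alley access.
alley = pd.Series(["Grvl", np.nan, "Pave", np.nan])

# notna() is True where an alley type is recorded; casting gives 1/0.
has_alley = alley.notna().astype(int)
print(has_alley.tolist())
```

This avoids relying on how NaN behaves as a dictionary key.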

In [34]:
# COLUMN:		lot_shape
# DEFINITION:	General shape of property
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.lot_shape.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, NO
# MISSING VALUES, COLUMN TO BE DROPPED!


array(['Reg', 'IR1', 'IR3', 'IR2'], dtype=object)

In [35]:
df=df.drop(columns=['lot_shape'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,land_slope,neighborhood,condition_1,condition_2,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,roof_style,roof_matl,exterior_first,exterior_second,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,Lvl,AllPub,Inside,Gtl,OldTown,Norm,Norm,2fmCon,2Story,6,8,1910,1950,Gable,CompShg,AsbShng,AsbShng,,0.0,TA,Fa,Stone,Fa,TA,No,Unf,0,Unf,0,1020,1020,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,Lvl,AllPub,Inside,Gtl,Sawyer,Norm,Norm,Duplex,1Story,5,4,1977,1977,Gable,CompShg,Plywood,Plywood,,0.0,TA,TA,CBlock,Gd,TA,No,Unf,0,Unf,0,1967,1967,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,Lvl,AllPub,Inside,Gtl,Gilbert,Norm,Norm,1Fam,2Story,7,5,2006,2006,Gable,CompShg,VinylSd,VinylSd,,0.0,Gd,TA,PConc,Gd,Gd,Av,GLQ,554,Unf,0,100,654,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,Lvl,AllPub,Inside,Gtl,OldTown,Norm,Norm,1Fam,1Story,5,6,1923,2006,Gable,CompShg,Wd Sdng,Wd Sdng,,0.0,Gd,TA,CBlock,TA,TA,No,Unf,0,Unf,0,968,968,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,1Story,6,5,1963,1963,Gable,CompShg,Plywood,Plywood,BrkFace,247.0,TA,TA,CBlock,Gd,TA,No,BLQ,609,Unf,0,785,1394,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [36]:
# COLUMN:		land_contour
# DEFINITION:	Flatness of the property
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.land_contour.unique()
# EVALUATION: LOT CONTOUR COULD AFFECT PRICE, E.G. A HOUSE WITH A
# VIEW IS WORTH MORE. COLUMN NEEDS TO BE CONVERTED TO INTEGER,
# NO MISSING VALUES!


array(['Lvl', 'Bnk', 'Low', 'HLS'], dtype=object)

In [37]:
land_contour = {'Lvl':'0', 'HLS':'1', 'Bnk':'2', 'Low':'3'}
df.land_contour = [land_contour[item] for item in df.land_contour] 
df.land_contour.unique()


array(['0', '2', '3', '1'], dtype=object)

In [38]:
df["land_contour"] = df["land_contour"].astype(int)  # np.int was removed in NumPy 1.24
df['land_contour'].dtype


dtype('int64')

In [39]:
# COLUMN:		utilities
# DEFINITION:	Type of utilities available
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.utilities.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['AllPub', 'NoSewr'], dtype=object)

In [40]:
utilities = {'AllPub':'0', 'NoSewr':'1', 'NoSeWa':'2'}
df.utilities = [utilities[item] for item in df.utilities] 
df.utilities.unique()


array(['0', '1'], dtype=object)

In [41]:
df["utilities"] = df["utilities"].astype(int)  # np.int was removed in NumPy 1.24
df['utilities'].dtype


dtype('int64')

In [42]:
# COLUMN:		lot_config
# DEFINITION:	Lot configuration
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.lot_config.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['Inside', 'CulDSac', 'Corner', 'FR2', 'FR3'], dtype=object)

In [43]:
lot_config = {'Corner':'0', 'Inside':'1', 'CulDSac':'2', 'FR2':'3', 'FR3':'4'}
df.lot_config = [lot_config[item] for item in df.lot_config] 
df.lot_config.unique()


array(['1', '2', '0', '3', '4'], dtype=object)

In [44]:
df["lot_config"] = df["lot_config"].astype(int)  # np.int was removed in NumPy 1.24
df['lot_config'].dtype


dtype('int64')

In [45]:
# COLUMN:		land_slope
# DEFINITION:	Slope of property
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.land_slope.unique()
# EVALUATION: COLUMN IS SIMILAR TO LAND CONTOUR, ONLY
# DESCRIBING DEGREE OF SLOPE. COLUMN DOES NOT PERTAIN TO
# HOUSING PRICE, COLUMN TO BE DROPPED!


array(['Gtl', 'Mod', 'Sev'], dtype=object)

In [46]:
df = df.drop(columns=['land_slope'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,condition_2,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,roof_style,roof_matl,exterior_first,exterior_second,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,OldTown,Norm,Norm,2fmCon,2Story,6,8,1910,1950,Gable,CompShg,AsbShng,AsbShng,,0.0,TA,Fa,Stone,Fa,TA,No,Unf,0,Unf,0,1020,1020,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,Sawyer,Norm,Norm,Duplex,1Story,5,4,1977,1977,Gable,CompShg,Plywood,Plywood,,0.0,TA,TA,CBlock,Gd,TA,No,Unf,0,Unf,0,1967,1967,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,Gilbert,Norm,Norm,1Fam,2Story,7,5,2006,2006,Gable,CompShg,VinylSd,VinylSd,,0.0,Gd,TA,PConc,Gd,Gd,Av,GLQ,554,Unf,0,100,654,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,OldTown,Norm,Norm,1Fam,1Story,5,6,1923,2006,Gable,CompShg,Wd Sdng,Wd Sdng,,0.0,Gd,TA,CBlock,TA,TA,No,Unf,0,Unf,0,968,968,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,NAmes,Norm,Norm,1Fam,1Story,6,5,1963,1963,Gable,CompShg,Plywood,Plywood,BrkFace,247.0,TA,TA,CBlock,Gd,TA,No,BLQ,609,Unf,0,785,1394,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [47]:
# COLUMN:		neighborhood
# DEFINITION:	Physical locations within Ames city limits
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.neighborhood.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['OldTown', 'Sawyer', 'Gilbert', 'NAmes', 'MeadowV', 'CollgCr',
       'Somerst', 'Mitchel', 'SawyerW', 'IDOTRR', 'BrkSide', 'Edwards',
       'ClearCr', 'NWAmes', 'Timber', 'NoRidge', 'NridgHt', 'Crawfor',
       'StoneBr', 'Veenker', 'BrDale', 'SWISU', 'Blmngtn', 'NPkVill',
       'Greens', 'Blueste'], dtype=object)

In [48]:
# Every neighborhood needs a distinct code so that no two are merged ('MeadowV' must not share '10' with 'SawyerW').
neighborhood = {'NAmes':'0', 'Gilbert':'1', 'StoneBr':'2', 'NWAmes':'3', 'Somerst':'4', 'BrDale':'5', 'NPkVill':'6','NridgHt':'7', 'Blmngtn':'8', 'NoRidge':'9', 'SawyerW':'10', 'Sawyer':'11', 'Greens':'12', 'BrkSide':'13','OldTown':'14', 'IDOTRR':'15', 'ClearCr':'16', 'SWISU':'17', 'Edwards':'18', 'CollgCr':'19', 'Crawfor':'20','Blueste':'21', 'Mitchel':'22', 'Timber':'23', 'MeadowV':'27', 'Veenker':'24', 'GrnHill':'25', 'Landmrk':'26'}
df.neighborhood = [neighborhood[item] for item in df.neighborhood] 
df.neighborhood.unique()


array(['14', '11', '1', '0', '27', '19', '4', '22', '10', '15', '13',
       '18', '16', '3', '23', '9', '7', '20', '2', '24', '5', '17', '8',
       '6', '12', '21'], dtype=object)

In [49]:
df["neighborhood"] = df["neighborhood"].astype(int)  # np.int was removed in NumPy 1.24
df['neighborhood'].dtype


dtype('int64')
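In a hand-written dictionary this long it is easy to reuse a code by accident, which silently merges two categories. A small check can catch that before the mapping is applied. The helper name `duplicated_codes` and the four-entry dictionary are hypothetical, with a collision planted on purpose:

```python
from collections import Counter

def duplicated_codes(mapping):
    """Return {code: [labels]} for every code assigned to more than one label."""
    counts = Counter(mapping.values())
    return {
        code: sorted(label for label, value in mapping.items() if value == code)
        for code, n in counts.items()
        if n > 1
    }

# Illustrative subset with a deliberate collision on '10'.
example = {"SawyerW": "10", "Sawyer": "11", "MeadowV": "10", "Veenker": "24"}
collisions = duplicated_codes(example)
print(collisions)
```

An empty result means every label has its own code.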

In [50]:
# COLUMN:		condition_1
# DEFINITION:	Proximity to various conditions
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.condition_1.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['Norm', 'Feedr', 'Artery', 'PosA', 'RRAn', 'PosN', 'RRNn', 'RRAe',
       'RRNe'], dtype=object)

In [51]:
condition_1 = {'Norm':'0', 'RRAe':'1', 'RRNe':'2', 'Feedr':'3', 'Artery':'4', 'PosA':'5', 'PosN':'6', 'RRAn':'7','RRNn':'8'}
df.condition_1 = [condition_1[item] for item in df.condition_1] 
df.condition_1.unique()


array(['0', '3', '4', '5', '7', '6', '8', '1', '2'], dtype=object)

In [52]:
df["condition_1"] = df["condition_1"].astype(int)  # np.int was removed in NumPy 1.24
df['condition_1'].dtype


dtype('int64')

In [53]:
# COLUMN:		condition_2
# DEFINITION:	Proximity to various conditions (if more than one is present)
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.condition_2.unique()
# EVALUATION: REDUNDANT INFORMATION. COLUMN DOES NOT PERTAIN TO HOUSING PRICE, COLUMN
# TO BE DROPPED!


array(['Norm', 'PosN', 'Feedr', 'PosA'], dtype=object)

In [54]:
df = df.drop(columns=['condition_2'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,overall_cond,year_built,year_remod_add,roof_style,roof_matl,exterior_first,exterior_second,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,2fmCon,2Story,6,8,1910,1950,Gable,CompShg,AsbShng,AsbShng,,0.0,TA,Fa,Stone,Fa,TA,No,Unf,0,Unf,0,1020,1020,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,Duplex,1Story,5,4,1977,1977,Gable,CompShg,Plywood,Plywood,,0.0,TA,TA,CBlock,Gd,TA,No,Unf,0,Unf,0,1967,1967,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,1Fam,2Story,7,5,2006,2006,Gable,CompShg,VinylSd,VinylSd,,0.0,Gd,TA,PConc,Gd,Gd,Av,GLQ,554,Unf,0,100,654,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,1Fam,1Story,5,6,1923,2006,Gable,CompShg,Wd Sdng,Wd Sdng,,0.0,Gd,TA,CBlock,TA,TA,No,Unf,0,Unf,0,968,968,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,1Fam,1Story,6,5,1963,1963,Gable,CompShg,Plywood,Plywood,BrkFace,247.0,TA,TA,CBlock,Gd,TA,No,BLQ,609,Unf,0,785,1394,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [55]:
# COLUMN:		bldg_type
# DEFINITION:	Type of dwelling
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.bldg_type.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['2fmCon', 'Duplex', '1Fam', 'TwnhsE', 'Twnhs'], dtype=object)

In [56]:
bldg_type = {'1Fam':'0', 'TwnhsE':'1', 'Twnhs':'2', 'Duplex':'3', '2fmCon':'4'}
df.bldg_type = [bldg_type[item] for item in df.bldg_type] 
df.bldg_type.unique()


array(['4', '3', '0', '1', '2'], dtype=object)

In [57]:
# Cast the string codes to integers
df["bldg_type"] = df["bldg_type"].astype(int)
df['bldg_type'].dtype


dtype('int64')
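
The two-step conversion above (map to string codes, then cast) can also be written as a single mapping to integers. The following is only a minimal sketch: `toy` is a small stand-in for `df`, and `encode_with_codes` is a hypothetical helper that is not part of this notebook. It reuses the same code assignment as the cell above and raises an error if a label has no code, rather than failing with a `KeyError` halfway through a list comprehension.

```python
import pandas as pd

# Same category-to-code assignment as the bldg_type cell, with integer values.
BLDG_TYPE_CODES = {"1Fam": 0, "TwnhsE": 1, "Twnhs": 2, "Duplex": 3, "2fmCon": 4}


def encode_with_codes(series: pd.Series, codes: dict) -> pd.Series:
    """Map each category to its integer code; fail loudly on unmapped labels."""
    encoded = series.map(codes)
    if encoded.isna().any():
        unknown = sorted(series[encoded.isna()].astype(str).unique())
        raise ValueError(f"unmapped categories: {unknown}")
    return encoded.astype("int64")


# Toy data standing in for df.bldg_type (hypothetical, five rows only).
toy = pd.DataFrame({"bldg_type": ["2fmCon", "Duplex", "1Fam", "1Fam", "1Fam"]})
toy["bldg_type"] = encode_with_codes(toy["bldg_type"], BLDG_TYPE_CODES)
print(toy["bldg_type"].tolist())
```

The same helper could be applied to `house_style`, `exter_qual`, and `foundation` with their own dictionaries.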

In [58]:
# COLUMN:		house_style 
# DEFINITION:	Style of dwelling
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.house_style.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['2Story', '1Story', '1.5Fin', 'SLvl', 'SFoyer', '2.5Fin', '2.5Unf',
       '1.5Unf'], dtype=object)

In [59]:
house_style = {'1Story':'0', '2Story':'1', 'SLvl':'2', '1.5Fin':'3', 'SFoyer':'4', '2.5Unf':'5', '1.5Unf':'6','2.5Fin':'7'}
df.house_style = [house_style[item] for item in df.house_style] 
df.house_style.unique()


array(['1', '0', '3', '2', '4', '7', '5', '6'], dtype=object)

In [60]:
# Cast the string codes to integers
df["house_style"] = df["house_style"].astype(int)
df['house_style'].dtype


dtype('int64')

In [61]:
# COLUMN:		overall_qual
# DEFINITION:	Rates the overall material and finish of the
# house
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.overall_qual.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([ 6,  5,  7,  4,  8,  3, 10,  9,  2])

In [62]:
# COLUMN:		overall_cond        
# DEFINITION:	Rates the overall condition of the house
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.overall_cond.unique()
# EVALUATION: REDUNDANT INFORMATION. COLUMN DOES NOT PERTAIN TO HOUSING PRICE, COLUMN
# TO BE DROPPED!


array([8, 4, 5, 6, 7, 9, 3, 2, 1])

In [63]:
df = df.drop(columns=['overall_cond'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,year_built,year_remod_add,roof_style,roof_matl,exterior_first,exterior_second,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,1910,1950,Gable,CompShg,AsbShng,AsbShng,,0.0,TA,Fa,Stone,Fa,TA,No,Unf,0,Unf,0,1020,1020,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,1977,1977,Gable,CompShg,Plywood,Plywood,,0.0,TA,TA,CBlock,Gd,TA,No,Unf,0,Unf,0,1967,1967,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,2006,2006,Gable,CompShg,VinylSd,VinylSd,,0.0,Gd,TA,PConc,Gd,Gd,Av,GLQ,554,Unf,0,100,654,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1923,2006,Gable,CompShg,Wd Sdng,Wd Sdng,,0.0,Gd,TA,CBlock,TA,TA,No,Unf,0,Unf,0,968,968,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,1963,1963,Gable,CompShg,Plywood,Plywood,BrkFace,247.0,TA,TA,CBlock,Gd,TA,No,BLQ,609,Unf,0,785,1394,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [64]:
# COLUMN:		year_built
# DEFINITION:	Original construction date
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.year_built.unique()
# EVALUATION: DATA TYPE GOOD, NO MISSING VALUES, BUT MODELS
# DON'T DO WELL WITH THIS MANY UNIQUE VALUES.
# DROP COLUMN.


array([1910, 1977, 2006, 1923, 1963, 1972, 1958, 2004, 1991, 1925, 1954,
       2000, 1924, 1957, 1940, 1956, 1961, 1882, 1968, 1993, 1969, 2007,
       1920, 1880, 1945, 1971, 1973, 2001, 1953, 1999, 2009, 1998, 1959,
       1951, 1987, 1970, 1965, 1930, 1890, 1938, 1975, 1992, 1994, 2005,
       1931, 1926, 1960, 1967, 1962, 2002, 1921, 1997, 2008, 1981, 1948,
       1988, 1939, 1927, 1984, 2003, 1978, 1964, 1974, 1949, 1995, 1900,
       1912, 1976, 1966, 1941, 1983, 1950, 1906, 1952, 1947, 1915, 1955,
       1946, 1996, 1922, 1918, 1980, 1935, 1932, 1928, 2010, 1902, 1919,
       1917, 1982, 1979, 1937, 1990, 1916, 1934, 1892, 1936, 1989, 1905,
       1914, 1904, 1986, 1908, 1985, 1907, 1885])

In [65]:
df = df.drop(columns=['year_built'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,year_remod_add,roof_style,roof_matl,exterior_first,exterior_second,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,1950,Gable,CompShg,AsbShng,AsbShng,,0.0,TA,Fa,Stone,Fa,TA,No,Unf,0,Unf,0,1020,1020,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,1977,Gable,CompShg,Plywood,Plywood,,0.0,TA,TA,CBlock,Gd,TA,No,Unf,0,Unf,0,1967,1967,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,2006,Gable,CompShg,VinylSd,VinylSd,,0.0,Gd,TA,PConc,Gd,Gd,Av,GLQ,554,Unf,0,100,654,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,2006,Gable,CompShg,Wd Sdng,Wd Sdng,,0.0,Gd,TA,CBlock,TA,TA,No,Unf,0,Unf,0,968,968,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,1963,Gable,CompShg,Plywood,Plywood,BrkFace,247.0,TA,TA,CBlock,Gd,TA,No,BLQ,609,Unf,0,785,1394,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD
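
If the concern is only the number of unique years, one alternative to dropping the column is to collapse each year into its decade. This is a sketch of that option, not what the notebook does; `toy` and the `decade_built` column name are hypothetical.

```python
import pandas as pd

# Toy data standing in for df.year_built (hypothetical, five rows only).
toy = pd.DataFrame({"year_built": [1910, 1977, 2006, 1923, 1963]})

# Integer-divide by 10 and scale back up to get the decade each year falls in.
toy["decade_built"] = (toy["year_built"] // 10) * 10
print(toy["decade_built"].tolist())
```

Whether the coarser column helps a model is an empirical question that would need to be checked against the sale price.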


In [66]:
# COLUMN:		year_remod_add      
# DEFINITION:	Remodel date (same as construction date if no remodeling or additions)
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.year_remod_add.unique()
# EVALUATION: DATA TYPE GOOD, NO MISSING VALUES! COLUMN IS
# DROPPED IN THE NEXT CELL.


array([1950, 1977, 2006, 1963, 1972, 1989, 2004, 1991, 1992, 1954, 2000,
       1996, 1956, 1986, 2001, 1961, 1994, 1969, 2007, 1971, 2003, 1980,
       2009, 1999, 2002, 1987, 2008, 1965, 1995, 2005, 1983, 1975, 1998,
       1953, 1960, 1976, 1985, 1959, 1997, 1981, 1957, 1978, 1964, 1967,
       1974, 1970, 1968, 1952, 1993, 1955, 1962, 1973, 1990, 1984, 1988,
       2010, 1958, 1966, 1982, 1979, 1951])

In [67]:
df = df.drop(columns=['year_remod_add'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,roof_style,roof_matl,exterior_first,exterior_second,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,Gable,CompShg,AsbShng,AsbShng,,0.0,TA,Fa,Stone,Fa,TA,No,Unf,0,Unf,0,1020,1020,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,Gable,CompShg,Plywood,Plywood,,0.0,TA,TA,CBlock,Gd,TA,No,Unf,0,Unf,0,1967,1967,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,Gable,CompShg,VinylSd,VinylSd,,0.0,Gd,TA,PConc,Gd,Gd,Av,GLQ,554,Unf,0,100,654,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,Gable,CompShg,Wd Sdng,Wd Sdng,,0.0,Gd,TA,CBlock,TA,TA,No,Unf,0,Unf,0,968,968,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,Gable,CompShg,Plywood,Plywood,BrkFace,247.0,TA,TA,CBlock,Gd,TA,No,BLQ,609,Unf,0,785,1394,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [68]:
# COLUMN:		roof_style         
# DEFINITION:	Type of roof
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.roof_style.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


array(['Gable', 'Hip', 'Gambrel', 'Flat', 'Mansard', 'Shed'], dtype=object)

In [69]:
df = df.drop(columns=['roof_style'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,roof_matl,exterior_first,exterior_second,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,CompShg,AsbShng,AsbShng,,0.0,TA,Fa,Stone,Fa,TA,No,Unf,0,Unf,0,1020,1020,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,CompShg,Plywood,Plywood,,0.0,TA,TA,CBlock,Gd,TA,No,Unf,0,Unf,0,1967,1967,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,CompShg,VinylSd,VinylSd,,0.0,Gd,TA,PConc,Gd,Gd,Av,GLQ,554,Unf,0,100,654,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,CompShg,Wd Sdng,Wd Sdng,,0.0,Gd,TA,CBlock,TA,TA,No,Unf,0,Unf,0,968,968,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,CompShg,Plywood,Plywood,BrkFace,247.0,TA,TA,CBlock,Gd,TA,No,BLQ,609,Unf,0,785,1394,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [70]:
# COLUMN:		roof_matl                  
# DEFINITION:	Roof material
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.roof_matl.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


array(['CompShg', 'Metal', 'WdShake', 'Tar&Grv', 'WdShngl', 'Roll'],
      dtype=object)

In [71]:
df = df.drop(columns=['roof_matl'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exterior_first,exterior_second,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,AsbShng,AsbShng,,0.0,TA,Fa,Stone,Fa,TA,No,Unf,0,Unf,0,1020,1020,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,Plywood,Plywood,,0.0,TA,TA,CBlock,Gd,TA,No,Unf,0,Unf,0,1967,1967,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,VinylSd,VinylSd,,0.0,Gd,TA,PConc,Gd,Gd,Av,GLQ,554,Unf,0,100,654,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,Wd Sdng,Wd Sdng,,0.0,Gd,TA,CBlock,TA,TA,No,Unf,0,Unf,0,968,968,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,Plywood,Plywood,BrkFace,247.0,TA,TA,CBlock,Gd,TA,No,BLQ,609,Unf,0,785,1394,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [72]:
# COLUMN:		exterior_first                        
# DEFINITION:	Exterior covering on house
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.exterior_first.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


array(['AsbShng', 'Plywood', 'VinylSd', 'Wd Sdng', 'CemntBd', 'MetalSd',
       'HdBoard', 'BrkComm', 'Stucco', 'WdShing', 'BrkFace', 'PreCast',
       'AsphShn'], dtype=object)

In [73]:
df = df.drop(columns=['exterior_first'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exterior_second,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,AsbShng,,0.0,TA,Fa,Stone,Fa,TA,No,Unf,0,Unf,0,1020,1020,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,Plywood,,0.0,TA,TA,CBlock,Gd,TA,No,Unf,0,Unf,0,1967,1967,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,VinylSd,,0.0,Gd,TA,PConc,Gd,Gd,Av,GLQ,554,Unf,0,100,654,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,Wd Sdng,,0.0,Gd,TA,CBlock,TA,TA,No,Unf,0,Unf,0,968,968,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,Plywood,BrkFace,247.0,TA,TA,CBlock,Gd,TA,No,BLQ,609,Unf,0,785,1394,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [74]:
# COLUMN:		exterior_second                              
# DEFINITION:	Exterior covering on house (if more than one
# material)
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.exterior_second.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


array(['AsbShng', 'Plywood', 'VinylSd', 'Wd Sdng', 'CmentBd', 'MetalSd',
       'BrkFace', 'Stucco', 'HdBoard', 'Wd Shng', 'ImStucc', 'Brk Cmn',
       'PreCast', 'CBlock', 'AsphShn', 'Other'], dtype=object)

In [75]:
df = df.drop(columns=['exterior_second'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,mas_vnr_type,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,,0.0,TA,Fa,Stone,Fa,TA,No,Unf,0,Unf,0,1020,1020,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,,0.0,TA,TA,CBlock,Gd,TA,No,Unf,0,Unf,0,1967,1967,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,,0.0,Gd,TA,PConc,Gd,Gd,Av,GLQ,554,Unf,0,100,654,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,,0.0,Gd,TA,CBlock,TA,TA,No,Unf,0,Unf,0,968,968,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,BrkFace,247.0,TA,TA,CBlock,Gd,TA,No,BLQ,609,Unf,0,785,1394,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD
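
The roof and exterior columns above are dropped one cell at a time. As a sketch only, the same result can be reached with a single `drop` call that takes a list of names; `toy` and `to_drop` are hypothetical stand-ins.

```python
import pandas as pd

# Toy frame standing in for df (hypothetical, three rows only).
toy = pd.DataFrame(
    {
        "overall_qual": [6, 5, 7],
        "roof_style": ["Gable", "Gable", "Hip"],
        "roof_matl": ["CompShg", "CompShg", "Metal"],
        "exterior_first": ["AsbShng", "Plywood", "VinylSd"],
        "exterior_second": ["AsbShng", "Plywood", "VinylSd"],
    }
)

# Columns the notebook drops in separate cells.
to_drop = ["roof_style", "roof_matl", "exterior_first", "exterior_second"]

# With the default errors="raise", a misspelled or already-dropped name fails loudly.
toy = toy.drop(columns=to_drop)
print(toy.columns.tolist())
```

Keeping the names in one list also leaves a single record of which columns were removed.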


In [76]:
# COLUMN:		mas_vnr_type                                     
# DEFINITION:	Masonry veneer type
# DATA TYPE:	object
# MISSING VALUES:	22
# UNIQUE VALUES:	
df.mas_vnr_type.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array(['None', 'BrkFace', 'Stone', 'BrkCmn', 'CBlock', nan], dtype=object)
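
The facts recorded in each comment header (data type, missing values, unique values) can be computed rather than typed by hand. The sketch below assumes a hypothetical helper, `summarize_column`, and a toy frame; neither is part of this notebook.

```python
import numpy as np
import pandas as pd


def summarize_column(frame: pd.DataFrame, name: str) -> dict:
    """Collect the facts recorded in each cell's comment header."""
    col = frame[name]
    return {
        "column": name,
        "dtype": str(col.dtype),
        "missing": int(col.isna().sum()),
        "unique": col.dropna().unique().tolist(),
    }


# Toy data standing in for df.mas_vnr_type (hypothetical values).
toy = pd.DataFrame({"mas_vnr_type": ["None", "BrkFace", np.nan, "Stone", "BrkFace"]})
summary = summarize_column(toy, "mas_vnr_type")
print(summary)
```

Note that the string `'None'` is an ordinary category here, while `np.nan` is what `isna()` counts as missing.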

In [77]:
df = df.drop(columns=['mas_vnr_type'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,mas_vnr_area,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0.0,TA,Fa,Stone,Fa,TA,No,Unf,0,Unf,0,1020,1020,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0.0,TA,TA,CBlock,Gd,TA,No,Unf,0,Unf,0,1967,1967,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,0.0,Gd,TA,PConc,Gd,Gd,Av,GLQ,554,Unf,0,100,654,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,0.0,Gd,TA,CBlock,TA,TA,No,Unf,0,Unf,0,968,968,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,247.0,TA,TA,CBlock,Gd,TA,No,BLQ,609,Unf,0,785,1394,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [78]:
# COLUMN:		mas_vnr_area                                          
# DEFINITION:	Masonry veneer area in square feet
# DATA TYPE:	float64
# MISSING VALUES:	22
# UNIQUE VALUES:	
df.mas_vnr_area.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array([   0.,  247.,   23.,   98.,  104.,  156.,  180.,   44.,   76.,
         70.,  352.,  162.,  444.,  495.,  340.,  634.,  182.,  147.,
        108.,   20.,  423.,  178.,  359.,  762.,   75.,  161.,  674.,
        100.,  306.,  509.,  653.,  450.,  360.,  680.,  112.,   72.,
        440., 1378.,  304.,  364.,  754.,  788.,  230.,  368.,  120.,
        113.,  216.,  371.,  153.,  151.,  396.,  215.,  472.,  500.,
        468.,   14.,   50.,   96.,   99.,  342.,  174.,  310.,  114.,
         74.,  270.,  260.,  123.,  218.,  415.,  921.,  771.,  726.,
         16.,  362.,  473.,  870., 1224.,  285.,  420.,  137.,  259.,
         82.,  632.,  170.,  408.,   53.,  532.,  286.,  206.,  308.,
        405.,  128.,  236.,  350.,  302.,  256.,  657.,  194.,  567.,
        116.,   65.,  305.,  188.,  281.,  300.,  198.,   95.,  481.,
        226.,  459.,  480.,  422.,  877.,  166.,  149.,  190.,  189.,
        492.,  205.,  130.,  250.,  223.,  280.,  435.,  229.,  438.,
        975.,   67.,

In [79]:
df = df.drop(columns=['mas_vnr_area'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,exter_cond,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,TA,Fa,Stone,Fa,TA,No,Unf,0,Unf,0,1020,1020,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,TA,TA,CBlock,Gd,TA,No,Unf,0,Unf,0,1967,1967,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,Gd,TA,PConc,Gd,Gd,Av,GLQ,554,Unf,0,100,654,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,Gd,TA,CBlock,TA,TA,No,Unf,0,Unf,0,968,968,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,TA,TA,CBlock,Gd,TA,No,BLQ,609,Unf,0,785,1394,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [80]:
# COLUMN:		exter_qual                                                   
# DEFINITION:	Evaluates the quality of the material on the
# exterior
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.exter_qual.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['TA', 'Gd', 'Fa', 'Ex'], dtype=object)

In [81]:
exter_qual = {'TA':'0', 'Gd':'1', 'Ex':'2', 'Fa':'3'}
df.exter_qual = [exter_qual[item] for item in df.exter_qual] 
df.exter_qual.unique()


array(['0', '1', '3', '2'], dtype=object)

In [82]:
# Cast the string codes to integers
df["exter_qual"] = df["exter_qual"].astype(int)
df['exter_qual'].dtype


dtype('int64')
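
The codes assigned above (TA=0, Gd=1, Ex=2, Fa=3) do not follow the quality scale, so a model that treats the column as numeric would see Fair ranked above Excellent. As an alternative sketch, and assuming the scale runs Po < Fa < TA < Gd < Ex (Poor, Fair, Typical/Average, Good, Excellent), an ordered categorical yields codes in scale order. `toy`, `QUALITY_ORDER`, and the `exter_qual_ord` column name are hypothetical.

```python
import pandas as pd

# Assumed quality scale, worst to best.
QUALITY_ORDER = ["Po", "Fa", "TA", "Gd", "Ex"]

# Toy data standing in for df.exter_qual (hypothetical values).
toy = pd.DataFrame({"exter_qual": ["TA", "TA", "Gd", "Gd", "TA", "Fa", "Ex"]})

# An ordered categorical exposes integer codes 0..4 that follow QUALITY_ORDER.
# Labels outside QUALITY_ORDER (and missing values) would get the code -1.
quality = pd.Categorical(toy["exter_qual"], categories=QUALITY_ORDER, ordered=True)
toy["exter_qual_ord"] = quality.codes.astype("int64")
print(toy["exter_qual_ord"].tolist())
```

The same scale appears in other quality columns of this data set, so one ordering list could serve several columns.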

In [83]:
# COLUMN:		exter_cond                                                           
# DEFINITION:	Evaluates the present condition of the material on the exterior
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.exter_cond.unique()
# EVALUATION: COLUMN WOULD NEED TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES. SIMILAR TO exter_qual, SO DROPPING IN FAVOR OF
# exter_qual!


array(['Fa', 'TA', 'Gd', 'Ex', 'Po'], dtype=object)

In [84]:
df = df.drop(columns=['exter_cond'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,Stone,Fa,TA,No,Unf,0,Unf,0,1020,1020,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,CBlock,Gd,TA,No,Unf,0,Unf,0,1967,1967,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,PConc,Gd,Gd,Av,GLQ,554,Unf,0,100,654,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,CBlock,TA,TA,No,Unf,0,Unf,0,968,968,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,CBlock,Gd,TA,No,BLQ,609,Unf,0,785,1394,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [85]:
# COLUMN:		foundation                                                                    
# DEFINITION:	Type of foundation
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.foundation.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['Stone', 'CBlock', 'PConc', 'BrkTil', 'Slab', 'Wood'], dtype=object)

In [86]:
foundation = {'CBlock':'0', 'PConc':'1', 'Slab':'2', 'BrkTil':'3', 'Stone':'4', 'Wood':'5'}
df.foundation = [foundation[item] for item in df.foundation]
df.foundation.unique()


array(['4', '0', '1', '3', '2', '5'], dtype=object)

In [87]:
# Cast the string codes to integers
df["foundation"] = df["foundation"].astype(int)
df['foundation'].dtype


dtype('int64')

In [88]:
# COLUMN:		bsmt_qual                                                                              
# DEFINITION:	Evaluates the height of the basement
# DATA TYPE:	object
# MISSING VALUES:	55
# UNIQUE VALUES:	
df.bsmt_qual.unique()
# EVALUATION: MISSING VALUES, COLUMN TO BE DROPPED!


array(['Fa', 'Gd', 'TA', 'Ex', nan, 'Po'], dtype=object)

In [89]:
df = df.drop(columns=['bsmt_qual'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,TA,No,Unf,0,Unf,0,1020,1020,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,TA,No,Unf,0,Unf,0,1967,1967,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,Gd,Av,GLQ,554,Unf,0,100,654,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,TA,No,Unf,0,Unf,0,968,968,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,TA,No,BLQ,609,Unf,0,785,1394,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [90]:
# COLUMN:		bsmt_cond                                                                                        
# DEFINITION:	Evaluates the general condition of the basement
# DATA TYPE:	object
# MISSING VALUES:	55
# UNIQUE VALUES:	
df.bsmt_cond.unique()
# EVALUATION: COLUMN NEEDS TO BE CONVERTED TO INTEGER,
# MISSING VALUES TO BE ENCODED AS 0!


array(['TA', 'Gd', nan, 'Fa'], dtype=object)

In [91]:
bsmt_cond = {np.nan:'0', 'Gd':'1', 'TA':'2', 'Po':'3', 'Fa':'4', 'Ex':'5'}
df.bsmt_cond = [bsmt_cond[item] for item in df.bsmt_cond] 
df.bsmt_cond.unique()


array(['2', '1', '0', '4'], dtype=object)

In [92]:
df['bsmt_cond'] = pd.to_numeric(df['bsmt_cond'], errors='coerce')
df['bsmt_cond'].dtype


dtype('int64')
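
Looking up `np.nan` as a dictionary key, as in the `bsmt_cond` cell above, only works when every missing entry is the very same `np.nan` object, because NaN does not compare equal to itself. A sketch that avoids depending on this, using a toy frame as a stand-in for `df`: map the known labels and fill whatever is left over with the code for "missing".

```python
import numpy as np
import pandas as pd

# Same code assignment as the bsmt_cond cell, minus the NaN key.
BSMT_COND_CODES = {"Gd": 1, "TA": 2, "Po": 3, "Fa": 4, "Ex": 5}

# Toy data standing in for df.bsmt_cond; float("nan") is a different object from np.nan.
toy = pd.DataFrame({"bsmt_cond": ["TA", "Gd", np.nan, "Fa", float("nan")]})

# map() leaves missing entries as NaN; fillna(0) then assigns them code 0.
# Caveat: a label absent from BSMT_COND_CODES would also end up as 0.
toy["bsmt_cond"] = toy["bsmt_cond"].map(BSMT_COND_CODES).fillna(0).astype("int64")
print(toy["bsmt_cond"].tolist())
```

Because unmapped labels silently become 0 here, it may be worth comparing the column's unique values with the dictionary keys first.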

In [93]:
# COLUMN:		bsmt_exposure                                                                                              
# DEFINITION:	Refers to walkout or garden level walls
# DATA TYPE:	object
# MISSING VALUES:	58
# UNIQUE VALUES:	
df.bsmt_exposure.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array(['No', 'Av', nan, 'Mn', 'Gd'], dtype=object)

In [94]:
df = df.drop(columns=['bsmt_exposure'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,Unf,0,Unf,0,1020,1020,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,Unf,0,Unf,0,1967,1967,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,GLQ,554,Unf,0,100,654,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,Unf,0,Unf,0,968,968,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,BLQ,609,Unf,0,785,1394,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [95]:
# COLUMN:		bsmtfin_type_1                                                                                                   
# DEFINITION:	Rating of basement finished area
# DATA TYPE:	object
# MISSING VALUES:	55
# UNIQUE VALUES:	
df.bsmtfin_type_1.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array(['Unf', 'GLQ', 'BLQ', 'Rec', 'ALQ', nan, 'LwQ'], dtype=object)

In [96]:
df=df.drop(columns=['bsmtfin_type_1'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,0,Unf,0,1020,1020,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,0,Unf,0,1967,1967,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,554,Unf,0,100,654,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,0,Unf,0,968,968,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,609,Unf,0,785,1394,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [97]:
# COLUMN:		bsmtfin_sf_1                                                                                                        
# DEFINITION:	Type 1 finished square feet
# DATA TYPE:	float64
# MISSING VALUES:	1
# UNIQUE VALUES:	
df.bsmtfin_sf_1.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array([   0,  554,  609,  294,  196,  641,  278,  820,  590,  257,  913,
        216,  188,  660,  165,  938,  390,  763,  144,  322,  152, 1084,
         24,  284,  405,  650,  949,  486,  248,  824,  324,  220,  439,
        329,  457,  636,  735, 1660, 1300,  190,  583,  387,  236,  614,
       1030,  544,  697, 1230,  540,  500,  528, 2260,  701, 1383,  308,
        316,  173,  462,  480,  962,  120,  588,  767,  306,  870,  472,
        643, 1252,  658, 2257,  777,  518,  687,  704,  844, 1443, 1387,
       1104,  602, 1572, 1191,  182, 1249,  360,   68,  502,  828,  267,
        944,  550,  426,  866,  596,  982,   49,  378,  121,  491,  769,
        902,  873,   48,  994,  352, 2288,   33,  814,   16,  788, 2096,
        746,  612,  600,  483,  252,  739,  552,  125,  204,  652,  474,
        194,  915,  210,  489,  773,  729,  646,  271,  456,   70,  871,
        442,  224, 1236, 1141,  397,  810,  749,  574,  450, 1056,  400,
        238,  793, 1035,  150, 1036,  592, 1360, 17

In [98]:
df=df.drop(columns=['bsmtfin_sf_1'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,Unf,0,1020,1020,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,Unf,0,1967,1967,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,Unf,0,100,654,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,Unf,0,968,968,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,Unf,0,785,1394,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [99]:
# COLUMN:		bsmtfin_type_2                                                                                                             
# DEFINITION:	Rating of basement finished area (if multiple types)
# DATA TYPE:	object
# MISSING VALUES:	56
# UNIQUE VALUES:	
df.bsmtfin_type_2.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array(['Unf', 'LwQ', nan, 'ALQ', 'GLQ', 'Rec', 'BLQ'], dtype=object)

In [100]:
df=df.drop(columns=['bsmtfin_type_2'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,0,1020,1020,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,0,1967,1967,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,0,100,654,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,0,968,968,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,0,785,1394,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [101]:
# COLUMN:		bsmtfin_sf_2                                                                                                                   
# DEFINITION:	Type 2 finished square feet
# DATA TYPE:	float64
# MISSING VALUES:	1
# UNIQUE VALUES:	
df.bsmtfin_sf_2.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array([   0,  279,  668,  841,  180,  240,  472,  374,  873, 1526, 1020,
        337,  210,  136,   46,  453,  252, 1029,  634,  468, 1164, 1031,
        884,  606,  820,  826,   63,  480,  184,  336,  391,  147,  398,
        182,  250,  165,  627,  393,  723,  449,  319,  682,  912,  530,
        276,  831,  121,   93,  288,  287,   48,   32,  193,  684,  456,
        904,  764,  755,  544,  402,  491,  691,  243,   40,  344,  543,
         42,  712,   78,  352,  350,  438,  690,  679,  216,  362,  694,
        435,  972,  144,  360,  791,  448,  264,  168,  981,  176,  174,
        396,  799,  400,  630,  139,  512,  153,  492])

In [102]:
df=df.drop(columns=['bsmtfin_sf_2'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,1020,1020,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,1967,1967,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,100,654,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,968,968,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,785,1394,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [103]:
# COLUMN:		bsmt_unf_sf                                                                                                                         
# DEFINITION:	Unfinished square feet of basement area
# DATA TYPE:	float64
# MISSING VALUES:	1
# UNIQUE VALUES:	
df.bsmt_unf_sf.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array([1020, 1967,  100,  968,  785,  252,  869, 1072,  840,  276,  939,
       1040, 1367,  348,  848,    0,  294,  816,  400,  700,  204,  598,
       1218,  474,  715,  636,  173,  226,  536,  628, 1240,   92, 1468,
        608,  863,  105, 1339,  612,  930,  658,  178,  552,  546,  434,
        187,  448,  216,  588,  346, 1615,  803,  248,  115,  697,  193,
        540,  257,  322,   95,  242,  374,  262,  244,  272,  151,  600,
        278,  384,  543, 1152,  878,  397, 1459,  321,  371,  166,  154,
       2042,  396,  270, 1686,  744,  411,  560,  827,  461,  422,  639,
       1530,  355,  167,  306,  998,  218,  467,  357,  179,  491, 1921,
       1128,  590, 1058, 1140,   39,  360,  630,  616,  324,  394, 1085,
        523,  795,  165,  504,  365,  497,  513,  136,  194,  850, 1369,
       1114,  436,  220,  756, 1632,  960, 1048,  311,  592, 1619,  860,
        901,  768,  196,   63, 1055,  484,  416,  108,  342, 1451,  300,
       1129,  172, 1421,  712,  312,  691, 1466,  5

In [104]:
df=df.drop(columns=['bsmt_unf_sf'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,total_bsmt_sf,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,1020,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,1967,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,654,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,968,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,1394,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [105]:
# COLUMN:		total_bsmt_sf                                                                                                                             
# DEFINITION:	Total square feet of basement area
# DATA TYPE:	float64
# MISSING VALUES:	1
# UNIQUE VALUES:	
df.total_bsmt_sf.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array([1020, 1967,  654,  968, 1394,  546,  869, 1268,  840, 1196, 1217,
       1040, 1367, 1168,  848,    0,  884, 1073, 1313,  916, 1060,  864,
       1604,  938, 1218,  715,  636,  936,  370,  858,  780, 1240, 1176,
       1492,  608, 1147,  105, 1339,  612,  930, 1063, 1008, 1501,  920,
        828,  912,  346, 1615,  803,  468, 1026, 1024,  992, 1982, 1300,
       1158,  825,  761, 2024, 1302,  544,  600, 1508, 1560,  384, 1043,
       1680, 3138, 1135, 1780, 1459,  629,  687,  676,  616, 2042,  876,
       1232, 1686,  999,  560,  827, 1228,  728, 1509, 1554,  810, 1694,
        998, 2535, 1244,  686,  697, 1224, 1921, 1832, 1434, 1058, 1140,
       1482, 1930,  360,  630, 1720,  926, 1966, 1085, 1714,  794,  795,
       1414,  565, 1530, 1015,  943,  720, 1080,  744, 1528, 1369, 1114,
        816, 1161, 1078,  756, 1632,  960, 1048,  689,  592, 1740,  860,
        988,  901,  768, 1098, 1104, 1055, 1478, 1056, 2630, 1451, 1563,
       1145, 1445, 2396,  990, 1776,  712,  691, 19

In [106]:
df=df.drop(columns=['total_bsmt_sf'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,GasA,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,GasA,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,GasA,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,GasA,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,GasA,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD
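The basement columns above are dropped one cell at a time. As a sketch, the same result can be reached with a single `drop` call. The miniature frame `df_demo` below is a made-up stand-in for `df`, and `errors="ignore"` is used so that re-running the cell does not fail once the columns are gone.

```python
import pandas as pd

# Hypothetical miniature frame reusing a few of the notebook's column names
df_demo = pd.DataFrame({
    "bsmt_cond": [2, 2, 1],
    "bsmtfin_sf_1": [0, 0, 554],
    "bsmtfin_sf_2": [0, 0, 0],
    "bsmt_unf_sf": [1020, 1967, 100],
    "total_bsmt_sf": [1020, 1967, 654],
    "heating_qc": ["Gd", "TA", "Ex"],
})

# Columns the notebook decides to drop in this stretch
basement_cols_to_drop = [
    "bsmt_exposure", "bsmtfin_type_1", "bsmtfin_sf_1",
    "bsmtfin_type_2", "bsmtfin_sf_2", "bsmt_unf_sf", "total_bsmt_sf",
]

# errors="ignore" skips names that are not present instead of raising KeyError
df_demo = df_demo.drop(columns=basement_cols_to_drop, errors="ignore")

print(list(df_demo.columns))
```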


In [107]:
# COLUMN:		heating                                                                                                                                         
# DEFINITION:	Type of heating
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.heating.unique()
# COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['GasA', 'GasW', 'Grav', 'Floor'], dtype=object)

- COLUMN HAS DIFFERENT VALUES THAN TEST.CSV
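One way to see which values differ between the two files, and to keep the column rather than drop it, is to compare the sets of levels and encode both columns against their union. This is only a sketch: the test values below are invented for illustration, since the contents of TEST.CSV are not shown here.

```python
import pandas as pd

# Hypothetical stand-ins for the train and test 'heating' columns
train_heating = pd.Series(["GasA", "GasW", "Grav", "Floor"])
test_heating = pd.Series(["GasA", "GasW", "Grav", "OthW"])  # made-up values

train_levels = set(train_heating.dropna())
test_levels = set(test_heating.dropna())

# Levels that appear in only one of the two files
only_in_train = train_levels - test_levels
only_in_test = test_levels - train_levels

# Encode both columns against the same sorted union of levels,
# so a given code means the same thing in train and test
all_levels = sorted(train_levels | test_levels)
train_codes = pd.Categorical(train_heating, categories=all_levels).codes
test_codes = pd.Categorical(test_heating, categories=all_levels).codes

print(only_in_train, only_in_test)
```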

In [108]:
df = df.drop(columns=['heating'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,electrical,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,Gd,N,FuseP,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,TA,Y,SBrkr,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,Ex,Y,SBrkr,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,TA,Y,SBrkr,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,Gd,Y,SBrkr,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [109]:
# COLUMN:		heating_qc                                                                                                                                                  
# DEFINITION:	Heating quality and condition
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.heating_qc.unique()
# COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['Gd', 'TA', 'Ex', 'Fa'], dtype=object)

In [110]:
heating_qc = {'Fa':'0', 'TA':'1', 'Gd':'2', 'Ex':'3', 'Po':'4'}
df.heating_qc = [heating_qc[item] for item in df.heating_qc] 
df.heating_qc.unique()


array(['2', '1', '3', '0'], dtype=object)

In [111]:
df["heating_qc"] = df["heating_qc"].astype(int)  # np.int was removed in NumPy 1.24; the builtin int works
df['heating_qc'].dtype


dtype('int64')

In [112]:
# COLUMN:		central_air                                                                                                                                                          
# DEFINITION:	Central air conditioning
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.central_air.unique()
# COLUMN NEEDS TO BE CONVERTED TO INTEGER, NO
# MISSING VALUES!


array(['N', 'Y'], dtype=object)

In [113]:
central_air = {'Y':'0', 'N':'1'}
df.central_air = [central_air[item] for item in df.central_air] 
df.central_air.unique()


array(['1', '0'], dtype=object)

In [114]:
df["central_air"] = df["central_air"].astype(int)  # np.int was removed in NumPy 1.24; the builtin int works
df['central_air'].dtype


dtype('int64')
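For a two-valued column such as `central_air`, a boolean comparison gives the integer flag in one step. The sketch below keeps the notebook's coding ('Y' becomes 0, 'N' becomes 1) and uses a made-up sample series.

```python
import pandas as pd

# Hypothetical sample standing in for df.central_air
central_air = pd.Series(["N", "Y", "Y"], name="central_air")

# True where the house has no central air; casting to int gives 1/0,
# matching the notebook's coding of 'N' -> 1 and 'Y' -> 0
central_air_flag = (central_air == "N").astype(int)

print(central_air_flag.tolist())
```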

In [115]:
# COLUMN:		electrical                                                                                                                                                                   
# DEFINITION:	Electrical system
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.electrical.unique()


array(['FuseP', 'SBrkr', 'FuseA', 'FuseF', nan], dtype=object)

- COLUMN HAS DIFFERENT VALUES THAN TEST.CSV

In [116]:
df = df.drop(columns=['electrical'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,first_flr_sf,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,2,1,908,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,1,0,1967,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,3,0,664,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,1,0,968,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,2,0,1394,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [117]:
# COLUMN:		first_flr_sf                                                                                                                                                                                     
# DEFINITION:	First Floor square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.first_flr_sf.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


array([ 908, 1967,  664,  968, 1394,  546, 1093, 1268,  840, 1279, 1217,
       1040, 1375, 1168, 1017, 1340,  884, 1073, 1773,  916, 1060,  864,
       1617,  988, 1218, 1281, 1089, 1054,  442,  858,  848, 1320, 1178,
       1492,  608, 1147,  998,  910, 1358,  612,  956, 1327, 1363, 1501,
        798,  920,  965,  936,  912, 1157, 1615,  803,  822, 1026, 1133,
       1344, 1193,  992, 2006, 1140, 1176,  845,  810, 2063, 1302,  774,
        747, 1508, 1560,  802, 1050, 1724, 3138, 1771, 1207,  882, 1780,
       1459,  727,  687,  760,  616, 2042,  876, 1707, 1064,  999,  796,
        827, 1228,  728,  914, 1509, 1554, 1694, 1790,  923, 2470, 1244,
        866,  697, 1287, 1921, 1832, 1434, 1058, 1382, 1494, 1831, 1032,
        630, 1720,  926, 1966, 1120, 1714,  794,  954, 1414, 1008,  565,
       1530, 1025,  943,  720, 1128,  757, 1225, 1369, 1114,  816, 1381,
       1078,  756, 1474,  929,  960, 1048,  689,  432, 1740,  860, 1088,
        900,  792, 1098, 1055, 1493,  768, 1056, 26

In [118]:
df = df.drop(columns=['first_flr_sf'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,second_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,2,1,1020,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,1,0,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,3,0,832,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,1,0,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,2,0,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [119]:
# COLUMN:		second_flr_sf              
# DEFINITION:	Second floor square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.second_flr_sf.unique()
# COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


array([1020,    0,  832,  546,  840, 1619,  810,  552,  826,  457,  661,
        228, 1320,  850,  510,  612,  930,  866,  336, 1111,  557,  320,
        687,  873,  825,  793, 1012,  670, 1538,  380,  671,  676, 1072,
        550,  424,  728,  855,  804,  604,  816, 1796,  780,  672,  678,
        795,  917,  651,  695,  720,  792,  908,  713,  208, 1000,  703,
        432,  860,  638,  110, 1215, 1208, 1358, 2065,  623,  809,  649,
        772,  880,  754,  560,  683,  701,  919,  473,  580,  830,  878,
        686,  601,  648, 1039,  767,  933,  343,  893,  682,  739,  813,
        876, 1141,  995, 1067, 1360,  807,  978, 1140, 1150,  403, 1182,
        434,  348,  800,  630,  768, 1134,  769, 1031,  798,  764, 1479,
        900,  308,  349,  871, 1175,  504,  368,  812,  967,  784,  979,
        167, 1037, 1321, 1788, 1286,  224,  240,  406,  665,  976,  704,
        858,  992,  901,  544,  646,  823,  955,  702,  886,  714, 1080,
        741,  520,  620,  499,  942, 1296, 1315,  5

In [120]:
df = df.drop(columns=['second_flr_sf'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,2,1,0,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,1,0,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,3,0,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,1,0,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,2,0,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [121]:
# COLUMN:		low_qual_fin_sf              
# DEFINITION:	Low quality finished square feet (all floors)
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.low_qual_fin_sf.unique()
# COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


array([  0, 360, 431, 481,  80, 392, 232, 420])

In [122]:
df = df.drop(columns=['low_qual_fin_sf'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,2,1,1928,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,1,0,1967,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,3,0,1496,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,1,0,968,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,2,0,1394,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [123]:
# COLUMN:		gr_liv_area                       
# DEFINITION:	Above grade (ground) living area square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.gr_liv_area.unique()
# COLUMN DOES NOT PERTAIN TO HOUSING PRICE,
# COLUMN TO BE DROPPED!


array([1928, 1967, 1496,  968, 1394, 1092, 1093, 1268, 1680, 1279, 1217,
       1040, 1375, 2787, 1827, 1340, 1436, 1073, 1773, 1742, 1060,  864,
       1617,  988, 1218, 1738, 1750, 1054,  670,  858, 1208, 2640, 1178,
       1492, 1458, 1657,  998,  910, 1358, 1224, 1886, 1327, 1363, 1501,
       1344, 1786,  965,  936, 1248, 2268, 1615, 1360, 1142, 1026, 1820,
       1193, 1865, 2006, 1140, 1176, 1670, 1603, 2063, 1302,  774,  848,
        747, 2520, 1560, 1472, 1050, 1724, 4676, 1771, 1207,  882, 1780,
       1459, 1107, 1688, 2042,  876, 1320, 1707, 1495,  999, 1346, 1251,
       1228, 1274, 1642, 1509, 1554, 1677, 1665, 1694, 1790,  923, 2470,
       1244,  866, 1891, 1921, 1832, 1434, 1874, 1382, 1494, 3627, 1812,
       1720, 1604, 1966, 1970, 1714, 1470, 2230, 2331, 1008, 1216, 1530,
       1025, 1638, 1440, 1128, 1549, 2133, 1369, 1114,  816, 1381, 1078,
       1469, 1474, 1137, 1960, 1768, 1392, 1740, 1626, 1198,  900,  792,
       1098, 2127, 2263, 1493, 1200, 1056, 2674, 14

In [124]:
df = df.drop(columns=['gr_liv_area'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,2,1,0,0,2,0,4,2,Fa,9,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,1,0,0,0,2,0,6,2,TA,10,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,3,0,1,0,2,1,3,1,Gd,7,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,1,0,0,0,1,0,2,1,TA,5,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,2,0,1,0,1,1,3,1,TA,6,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [125]:
# COLUMN:		bsmt_full_bath                           
# DEFINITION:	Basement full bathrooms
# DATA TYPE:	float64
# MISSING VALUES:	2
# UNIQUE VALUES:	
df.bsmt_full_bath.unique()
# COLUMN NEEDS TO BE CONVERTED TO INTEGER, DROP ROWS WITH
# MISSING VALUES!


array([0, 1, 2])

In [126]:
df = df.dropna(subset=['bsmt_full_bath'])


In [127]:
df["bsmt_full_bath"] = df["bsmt_full_bath"].astype(int)  # np.int was removed in NumPy 1.24; the builtin int works
df['bsmt_full_bath'].dtype


dtype('int64')

In [128]:
df.bsmt_full_bath.unique()


array([0, 1, 2])

In [129]:
# COLUMN:		bsmt_half_bath                              
# DEFINITION:	Basement half bathrooms
# DATA TYPE:	float64
# MISSING VALUES:	2
# UNIQUE VALUES:	
df.bsmt_half_bath.unique()
# COLUMN NEEDS TO BE CONVERTED TO INTEGER, DROP ROWS WITH
# MISSING VALUES!


array([0, 1])

In [130]:
df = df.dropna(subset=['bsmt_half_bath'])


In [131]:
df["bsmt_half_bath"] = df["bsmt_half_bath"].astype(int)  # np.int was removed in NumPy 1.24; the builtin int works
df['bsmt_half_bath'].dtype


dtype('int64')

In [132]:
df.bsmt_half_bath.unique()


array([0, 1])
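The two basement bath columns are cleaned the same way, so both can be handled together. The sketch below shows the notebook's approach (drop rows with a missing count) next to an alternative that fills missing counts with 0. Treating a missing value as "no basement bathroom" is an assumption, and the sample frame is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the two columns in df
baths = pd.DataFrame({
    "bsmt_full_bath": [0.0, 1.0, np.nan, 2.0],
    "bsmt_half_bath": [0.0, np.nan, 1.0, 0.0],
})
bath_cols = ["bsmt_full_bath", "bsmt_half_bath"]
int_types = {col: int for col in bath_cols}

# Option A (as in the notebook): drop rows missing either count, then cast
dropped = baths.dropna(subset=bath_cols).astype(int_types)

# Option B (assumption: missing means zero bathrooms): fill with 0, keep all rows
filled = baths.fillna({col: 0 for col in bath_cols}).astype(int_types)

print(len(dropped), len(filled))
```

Option B keeps every row, which matters if the same cleaning has to be applied to a test file where rows cannot be discarded.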

In [133]:
# COLUMN:		full_bath                                        
# DEFINITION:	Full bathrooms above grade
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.full_bath.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([2, 1, 4, 3, 0])

- totrms_abvgrd DOES NOT INCLUDE BATHROOMS! BELOW, full_bath IS ADDED TO totrms_abvgrd.

In [134]:
df['totrms_abvgrd'] = df['totrms_abvgrd'] + df['full_bath']


In [135]:
# COLUMN:		half_bath                                                  
# DEFINITION:	Half baths above grade
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.half_bath.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([0, 1, 2])

- totrms_abvgrd DOES NOT INCLUDE BATHROOMS! THE CELL BELOW ADDS half_bath TO totrms_abvgrd.

In [136]:
df['totrms_abvgrd'] = df['totrms_abvgrd'] + df['half_bath']
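The two additions above (full_bath, then half_bath) could also be written as a single step. This is a minimal sketch on a toy frame with made-up values, not the notebook's data.

```python
import pandas as pd

# Toy frame standing in for the notebook's df (values are made up).
toy = pd.DataFrame(
    {"totrms_abvgrd": [9, 10], "full_bath": [2, 2], "half_bath": [0, 1]}
)

# Add both above-grade bathroom counts to the room count in one assignment.
toy["totrms_abvgrd"] = toy["totrms_abvgrd"] + toy["full_bath"] + toy["half_bath"]

print(toy["totrms_abvgrd"].tolist())
```

Either form overwrites the column in place, so re-running the cell adds the bathrooms a second time. Restarting the kernel and running top to bottom avoids that.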


In [137]:
# COLUMN:		bedroom_abvgr                                                  
# DEFINITION:	Bedrooms above grade (does NOT include basement bedrooms)
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.bedroom_abvgr.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([4, 6, 3, 2, 1, 5, 0])

In [138]:
# COLUMN:		kitchen_abvgr                                                        
# DEFINITION:	Kitchens above grade
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.kitchen_abvgr.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([2, 1, 0, 3])

In [139]:
# COLUMN:		kitchen_qual                                                              
# DEFINITION:	Kitchen quality
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.kitchen_qual.unique()
# EVALUATION: COLUMN DATA TYPE NEEDS TO BE CHANGED, NO MISSING
# VALUES!


array(['Fa', 'TA', 'Gd', 'Ex', 'Po'], dtype=object)

- COLUMN HAS DIFFERENT VALUES THAN TEST.CSV
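One way to check whether a categorical column has different values in the training and test files is to compare the two sets of labels. The frames `train` and `test` below are hypothetical stand-ins with made-up values; the notebook does not show how the comparison was actually done.

```python
import pandas as pd

# Hypothetical stand-ins for the training and test frames.
train = pd.DataFrame({"kitchen_qual": ["Fa", "TA", "Gd", "Ex", "Po"]})
test = pd.DataFrame({"kitchen_qual": ["TA", "Gd", "Ex", "Fa"]})

train_values = set(train["kitchen_qual"].dropna().unique())
test_values = set(test["kitchen_qual"].dropna().unique())

# Labels that appear in only one of the two files.
only_in_train = sorted(train_values - test_values)
only_in_test = sorted(test_values - train_values)

print("only in train:", only_in_train)
print("only in test:", only_in_test)
```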

In [140]:
df = df.drop(columns=['kitchen_qual'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,2,1,0,0,2,0,4,2,11,Typ,0,,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,1,0,0,0,2,0,6,2,12,Typ,0,,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,3,0,1,0,2,1,3,1,10,Typ,1,Gd,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,1,0,0,0,1,0,2,1,6,Typ,0,,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,2,0,1,0,1,1,3,1,8,Typ,2,Gd,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [141]:
# COLUMN:		totrms_abvgrd                                                                     
# DEFINITION:	Total rooms above grade (does not include bathrooms)
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.totrms_abvgrd.unique()
# EVALUATION: COLUMN DATA TYPE IS ACCEPTABLE, NO MISSING VALUES.
# VALUES SHOWN NOW INCLUDE FULL AND HALF BATHROOMS (ADDED ABOVE).


array([11, 12, 10,  6,  8,  7, 14,  5,  9, 13,  4, 15, 16])

In [142]:
# COLUMN:		functional                                                                             
# DEFINITION:	Home functionality (Assume typical unless deductions are warranted)
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.functional.unique()


array(['Typ', 'Min2', 'Min1', 'Mod', 'Maj1', 'Maj2'], dtype=object)

In [143]:
functional = {'Typ':'0', 'Mod':'1', 'Min2':'2', 'Maj1':'3', 'Min1':'4', 'Sev':'5', 'Sal':'6', 'Maj2':'7'}
df.functional = [functional[item] for item in df.functional]
df.functional.unique()


array(['0', '2', '4', '1', '3', '7'], dtype=object)

In [144]:
df["functional"] = df["functional"].astype("int64")  # np.int was removed in NumPy 1.24
df['functional'].dtype


dtype('int64')
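The two-step encoding above (map labels to strings, then cast to integer) could be collapsed with `Series.map` and integer codes. This is a sketch under the same label-to-code assignment as the notebook; the toy frame and the name `functional_codes` are assumptions for illustration.

```python
import pandas as pd

# Same label-to-code assignment as the notebook, with integer values.
functional_codes = {
    "Typ": 0, "Mod": 1, "Min2": 2, "Maj1": 3,
    "Min1": 4, "Sev": 5, "Sal": 6, "Maj2": 7,
}

# Toy frame standing in for the notebook's df.
toy = pd.DataFrame({"functional": ["Typ", "Min2", "Min1", "Mod", "Maj1", "Maj2"]})

encoded = toy["functional"].map(functional_codes)

# Labels missing from the dict become NaN, so fail loudly if any slipped through.
assert encoded.notna().all(), "unmapped label in functional"

toy["functional"] = encoded.astype("int64")
print(toy["functional"].tolist())
```

Note that these codes are arbitrary identifiers rather than a severity ranking, so a model will treat the numeric order as meaningful even though it is not.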

In [145]:
# COLUMN:		fireplaces                                                                                       
# DEFINITION:	Number of fireplaces
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.fireplaces.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([0, 1, 2, 3])

In [146]:
# COLUMN:		fireplace_qu                                                                                              
# DEFINITION:	Fireplace quality
# DATA TYPE:	object
# MISSING VALUES:	1000
# UNIQUE VALUES:	
df.fireplace_qu.unique()


array([nan, 'Gd', 'Fa', 'TA', 'Po', 'Ex'], dtype=object)

In [147]:
df = df.drop(columns=['fireplace_qu'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,2,1,0,0,2,0,4,2,11,0,0,Detchd,1910.0,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,1,0,0,0,2,0,6,2,12,0,0,Attchd,1977.0,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,3,0,1,0,2,1,3,1,10,0,1,Attchd,2006.0,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,1,0,0,0,1,0,2,1,6,0,0,Detchd,1935.0,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,2,0,1,0,1,1,3,1,8,0,2,Attchd,1963.0,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [148]:
# COLUMN:		garage_type                                                                                                      
# DEFINITION:	Garage location
# DATA TYPE:	object
# MISSING VALUES:	113
# UNIQUE VALUES:	
df.garage_type.unique()
# EVALUATION: DATA TYPE NEEDS TO BE CONVERTED. MISSING VALUES
# WILL BE ENCODED AS 0 AND THE COLUMN KEPT!


array(['Detchd', 'Attchd', 'BuiltIn', nan, '2Types', 'CarPort', 'Basment'],
      dtype=object)

In [149]:
garage_type = {np.nan:'0', 'Attchd':'1', 'Detchd':'2', 'BuiltIn':'3', 'CarPort':'4', 'Basment':'5', '2Types':'6'}
df.garage_type = [garage_type[item] for item in df.garage_type] 
df.garage_type.unique()


array(['2', '1', '3', '0', '6', '4', '5'], dtype=object)

In [150]:
df['garage_type'] = pd.to_numeric(df['garage_type'], errors='coerce')  # errors must be 'raise' or 'coerce'
df['garage_type'].dtype


dtype('int64')
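Putting `np.nan` in the dictionary as a key works in the list comprehension above, but a sketch that handles missing values separately may be easier to follow. The toy frame and the name `garage_codes` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Same codes as the notebook for the known labels; 0 is reserved for missing.
garage_codes = {
    "Attchd": 1, "Detchd": 2, "BuiltIn": 3,
    "CarPort": 4, "Basment": 5, "2Types": 6,
}

# Toy frame standing in for the notebook's df (values are made up).
toy = pd.DataFrame({"garage_type": ["Detchd", "Attchd", np.nan, "2Types"]})

# map() leaves NaN where the value is missing, fillna(0) assigns the reserved code.
toy["garage_type"] = toy["garage_type"].map(garage_codes).fillna(0).astype("int64")

print(toy["garage_type"].tolist())
```

One caveat with this form: a label that is absent from the dictionary is also sent to 0, so it is worth checking the unique values first.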

In [151]:
# COLUMN:		garage_yr_blt                                                                                                           
# DEFINITION:	Year garage was built
# DATA TYPE:	float64
# MISSING VALUES:	114
# UNIQUE VALUES:	
df.garage_yr_blt.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array([1910., 1977., 2006., 1935., 1963., 1972., 1959., 1978., 2004.,
       1980., 1991., 1993., 1954., 2000., 1943., 1957., 1940., 1956.,
       1961., 1925., 1968., 1982., 1969., 2007., 1989., 1920., 1937.,
         nan, 1979., 1962., 1950., 1973., 2001., 1952., 1953., 1999.,
       2009., 1998., 1974., 1951., 1987., 2003., 1924., 1970., 1975.,
       1958., 1971., 1992., 1994., 2005., 1960., 1967., 2002., 1997.,
       1981., 1938., 1988., 1939., 1927., 1984., 1964., 1995., 2008.,
       1966., 1941., 1983., 1906., 1930., 1955., 1946., 1996., 1928.,
       1976., 1915., 1923., 1965., 1947., 1900., 1918., 1934., 1932.,
       1986., 2010., 1985., 1949., 1948., 1917., 1990., 1921., 1926.,
       1931., 1916., 1922., 1908.])

In [152]:
df = df.drop(columns=['garage_yr_blt'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,2,1,0,0,2,0,4,2,11,0,0,2,Unf,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,1,0,0,0,2,0,6,2,12,0,0,1,Fin,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,3,0,1,0,2,1,3,1,10,0,1,1,RFn,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,1,0,0,0,1,0,2,1,6,0,0,2,Unf,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,2,0,1,0,1,1,3,1,8,0,2,1,RFn,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [153]:
# COLUMN:		garage_finish                                                                                                                
# DEFINITION:	Interior finish of the garage
# DATA TYPE:	object
# MISSING VALUES:	114
# UNIQUE VALUES:	
df.garage_finish.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array(['Unf', 'Fin', 'RFn', nan], dtype=object)

In [154]:
df = df.drop(columns=['garage_finish'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,2,1,0,0,2,0,4,2,11,0,0,2,1,440,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,1,0,0,0,2,0,6,2,12,0,0,1,2,580,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,3,0,1,0,2,1,3,1,10,0,1,1,2,426,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,1,0,0,0,1,0,2,1,6,0,0,2,2,480,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,2,0,1,0,1,1,3,1,8,0,2,1,2,514,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [155]:
# COLUMN:		garage_cars                                                                                                                      
# DEFINITION:	Size of garage in car capacity
# DATA TYPE:	float64
# MISSING VALUES:	1
# UNIQUE VALUES:	
df.garage_cars.unique()


array([1, 2, 4, 0, 3])

In [156]:
df = df.dropna(subset=['garage_cars'])


In [157]:
df["garage_cars"] = df["garage_cars"].astype("int64")  # np.int was removed in NumPy 1.24
df['garage_cars'].dtype


dtype('int64')

In [158]:
df.garage_cars.unique()


array([1, 2, 4, 0, 3])

In [159]:
# COLUMN:		garage_area                                                                                                                             
# DEFINITION:	Size of garage in square feet
# DATA TYPE:	float64
# MISSING VALUES:	1
# UNIQUE VALUES:	
df.garage_area.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array([ 440,  580,  426,  480,  514,  286,  308,  252,  588,  473,  484,
        320,  451,  820,  240,  828,  340,  418,  424,  270,  732,  533,
        540,  576,  352,  368,    0,  684,  539,  864,  439,  608,  454,
        162,  460,  414,  625,  528,  431,  624,  288,  512,  300,  160,
        297,  506,  839,  938,  400,  303,  464,  815,  631,  420,  640,
        402,  336,  884,  264,  294,  816,  527,  603, 1390,  495,  511,
        384,  472,  224,  374,  322,  627,  642,  789,  678,  807,  672,
        846,  470, 1092,  753,  517,  388, 1003,  355,  984,  370,  315,
        312,  605,  676,  216,  299,  565,  332,  260,  833,  905,  508,
        762,  758,  437,  541,  813,  409,  586,  324,  486,  518,  478,
        280,  253,  275,  276,  393,  880,  962,  432,  379,  668,  205,
        513,  521,  483,  399,  779,  574,  307,  889,  795,  754, 1134,
        466,  685,  656,  502,  746,  870,  396,  463,  617,  615,  258,
        706,  461,  422,  425,  520,  525,  614, 10

In [160]:
df = df.drop(columns=['garage_area'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,2,1,0,0,2,0,4,2,11,0,0,2,1,Po,Po,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,1,0,0,0,2,0,6,2,12,0,0,1,2,TA,TA,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,3,0,1,0,2,1,3,1,10,0,1,1,2,TA,TA,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,1,0,0,0,1,0,2,1,6,0,0,2,2,Fa,TA,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,2,0,1,0,1,1,3,1,8,0,2,1,2,TA,TA,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [161]:
# COLUMN:		garage_qual                                                                                                                                     
# DEFINITION:	Garage quality
# DATA TYPE:	object
# MISSING VALUES:	114
# UNIQUE VALUES:	
df.garage_qual.unique()
# COLUMN APPEARS TO BE THE SAME AS garage_cond. DROPPING
# garage_cond AND KEEPING garage_qual. CHANGE DATA TYPE,
# ADDRESS MISSING VALUES!


array(['Po', 'TA', 'Fa', nan, 'Gd'], dtype=object)

In [162]:
garage_qual = {np.nan:'0', 'TA':'1', 'Fa':'2', 'Ex':'3', 'Gd':'4', 'Po':'5'}
df.garage_qual = [garage_qual[item] for item in df.garage_qual] 
df.garage_qual.unique()


array(['5', '1', '2', '0', '4'], dtype=object)

In [163]:
df['garage_qual'] = pd.to_numeric(df['garage_qual'], errors='coerce')
df['garage_qual'].dtype


dtype('int64')

In [164]:
# COLUMN:		garage_cond                                                                                                                                            
# DEFINITION:	Garage condition
# DATA TYPE:	object
# MISSING VALUES:	114
# UNIQUE VALUES:	
df.garage_cond.unique()


array(['Po', 'TA', nan, 'Fa', 'Gd', 'Ex'], dtype=object)
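The claim that garage_qual and garage_cond carry the same information could be checked by measuring how often the two columns agree. This sketch assumes the check runs before either column is recoded, and the toy values are made up.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the raw (not yet encoded) columns.
toy = pd.DataFrame(
    {
        "garage_qual": ["Po", "TA", "Fa", np.nan, "Gd"],
        "garage_cond": ["Po", "TA", "TA", np.nan, "Gd"],
    }
)

# Rows agree when the labels match or when both are missing.
both_missing = toy["garage_qual"].isna() & toy["garage_cond"].isna()
same = (toy["garage_qual"] == toy["garage_cond"]) | both_missing

agreement = same.mean()
print(f"share of rows that agree: {agreement:.2f}")
```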

In [165]:
df = df.drop(columns=['garage_cond'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_qual,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,2,1,0,0,2,0,4,2,11,0,0,2,1,5,Y,0,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,1,0,0,0,2,0,6,2,12,0,0,1,2,1,Y,170,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,3,0,1,0,2,1,3,1,10,0,1,1,2,1,Y,100,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,1,0,0,0,1,0,2,1,6,0,0,2,2,2,N,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,2,0,1,0,1,1,3,1,8,0,2,1,2,1,Y,0,76,0,0,185,0,,,,0,7,2009,WD


In [166]:
# COLUMN:		paved_drive                                                                                                                                                  
# DEFINITION:	Paved driveway
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.paved_drive.unique()


array(['Y', 'N', 'P'], dtype=object)

In [167]:
paved_drive = {'P':'1', 'Y':'1', 'N':'0'}
df.paved_drive = [paved_drive[item] for item in df.paved_drive] 
df.paved_drive.unique()


array(['1', '0'], dtype=object)

In [168]:
df["paved_drive"] = df["paved_drive"].astype("int64")  # np.int was removed in NumPy 1.24
df['paved_drive'].dtype


dtype('int64')
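Because 'Y' and 'P' both map to 1 above, the column ends up as a yes/no flag. A sketch of the same result with `Series.isin`, on a toy frame with illustrative values:

```python
import pandas as pd

# Toy frame standing in for the notebook's df.
toy = pd.DataFrame({"paved_drive": ["Y", "N", "P"]})

# Paved ('Y') and partially paved ('P') both count as 1, everything else as 0.
toy["paved_drive"] = toy["paved_drive"].isin(["Y", "P"]).astype("int64")

print(toy["paved_drive"].tolist())
```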

In [169]:
# COLUMN:		wood_deck_sf                                                                                                                                                       
# DEFINITION:	Wood deck area in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.wood_deck_sf.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE. NO MISSING
# VALUES, BUT COLUMN TO BE DROPPED!


array([  0, 170, 100, 173, 238, 312,  27, 116, 355, 406,  55, 120, 181,
       224, 196, 176,  89, 324, 240, 421, 168, 171, 316,  40, 144, 182,
       110, 149, 192, 164, 208, 147, 403, 574, 180, 158, 156, 501, 362,
       154,  58, 190,  80,  36, 402, 361, 344, 360,  76, 177, 212, 127,
       248, 330, 366,  57, 228, 175,  72, 140, 390, 453, 143,  32, 220,
       174, 280,  23,  77, 256, 124, 264, 483,  96,  84, 315,  50, 135,
       104, 210, 123, 400, 274, 250, 234, 314,  44, 304, 296, 185, 231,
       670, 113, 215,  90, 285, 245, 133, 690, 198, 286, 322, 237, 152,
       142, 411, 247, 128, 172, 200, 339,  85, 261, 199,  95, 325, 467,
       169,  56, 179, 178, 216,  38, 468, 257, 300, 132, 370, 239, 155,
        60, 225, 188, 382, 150, 353, 121, 186, 426, 351, 235, 305, 141,
       233, 112,  68, 268, 295, 262, 136, 184, 253, 209,  25, 486, 108,
       145, 328,  71, 490, 160,  74, 277, 129, 270, 297, 106, 306, 161,
        42, 125, 197, 272,  70,  49, 474, 502, 202, 460, 204, 25

In [170]:
df = df.drop(columns=['wood_deck_sf'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_qual,paved_drive,open_porch_sf,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,2,1,0,0,2,0,4,2,11,0,0,2,1,5,1,60,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,1,0,0,0,2,0,6,2,12,0,0,1,2,1,1,0,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,3,0,1,0,2,1,3,1,10,0,1,1,2,1,1,24,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,1,0,0,0,1,0,2,1,6,0,0,2,2,2,0,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,2,0,1,0,1,1,3,1,8,0,2,1,2,1,1,76,0,0,185,0,,,,0,7,2009,WD


In [171]:
# COLUMN:		open_porch_sf                                                                                                                                                         
# DEFINITION:	Open porch area in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.open_porch_sf.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE. NO MISSING
# VALUES, BUT COLUMN TO BE DROPPED!


array([ 60,   0,  24,  76, 111,  83,  64, 132,  30,  36,  98, 169,  90,
       312,  69, 130, 178,  21,  23, 114,  33, 174, 234,  42, 108,  65,
        48, 100,  28, 184, 124,  56,  50, 273,  75, 406, 240,  68,  39,
        32,  20, 120,  73,  45,  40,  63, 180,  58, 134,  25, 104,  52,
        74,  55, 188,  18, 136,  38, 203, 102,  70, 144, 192,  99,  78,
       116, 246,  35, 253, 166,  46, 156,  43,  47,  72, 126,  34, 172,
       341,  29, 236,  49, 113,  85, 115,  53,  16,  44,  81, 267, 241,
        51, 274,  54, 155, 182, 160, 216,  12,  92,  27,  95, 217, 139,
        26,  77,  91, 263,  62, 105, 133, 138,  84, 208, 112,  61,  96,
       154,  22, 150, 193,  59, 266, 287, 152, 204, 189, 382,  82,  15,
       162, 170, 131, 254, 570, 122, 222, 137, 110,  80,  57,  87, 103,
       195, 250, 177, 165, 262,  88,  66,  41, 742,  89, 140, 201,   6,
       265, 224, 194, 247, 175, 198, 230])

In [172]:
df = df.drop(columns=['open_porch_sf'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_qual,paved_drive,enclosed_porch,threessn_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,2,1,0,0,2,0,4,2,11,0,0,2,1,5,1,112,0,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,1,0,0,0,2,0,6,2,12,0,0,1,2,1,1,0,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,3,0,1,0,2,1,3,1,10,0,1,1,2,1,1,0,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,1,0,0,0,1,0,2,1,6,0,0,2,2,2,0,184,0,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,2,0,1,0,1,1,3,1,8,0,2,1,2,1,1,0,0,185,0,,,,0,7,2009,WD


In [173]:
# COLUMN:		enclosed_porch                                                                                                                                                              
# DEFINITION:	Enclosed porch area in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.enclosed_porch.unique()
# EVALUATION: DATA TYPE GOOD, NO MISSING VALUES! 
# COMBINE threessn_porch WITH enclosed_porch TO FORM NEW
# enclosed_porch.


array([ 112,    0,  184,   64,   42,  126,  293,  386,  140,  150,   80,
        220,  192,  190,   98,  286,   32,  584,  185,   40,  180,   86,
        239,  144,  160,  218,  120,  224,   94,   60,   56,   88,  183,
        168,  105,  196,  552,  123,  108,  169,  221,  254,  139,   91,
        209,   55,  170,  248,  242,   81,  244,  102,  238,  252,  264,
         41,  290,  240,  205,   35,  132,  228,  208,   51,   84,  116,
        128,  100,   20,   96,   68,   70,  231, 1012,  121,  429,   90,
        334,  137,  164,   48,   28,  202])

In [174]:
# df.sort_values(["enclosed_porch"], axis=0, ascending=True, inplace=True)


In [175]:
# print(df['enclosed_porch'])


In [176]:
# COLUMN:		threessn_porch                                                                                                                                                                       
# DEFINITION:	Three season porch area in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.threessn_porch.unique()
# EVALUATION: DATA TYPE GOOD, NO MISSING VALUES! DATA APPEARS
# TO BE THE SAME AS screen_porch. WILL COMBINE threessn_porch
# WITH enclosed_porch TO FORM NEW enclosed_porch, THEN DROP
# threessn_porch.

array([  0, 180, 130, 225, 219, 360, 238,  23, 174, 196, 216, 320])

In [177]:
# df.sort_values(["threessn_porch"], axis=0, ascending=True, inplace=True)


In [178]:
# print(df['threessn_porch'])


- AFTER INVESTIGATING THE VALUES IN enclosed_porch & threessn_porch, IT WAS DISCOVERED THAT THE DATA COULD BE ENGINEERED INTO ONE COLUMN. enclosed_porch NOW CONTAINS threessn_porch.

In [179]:
df['enclosed_porch'] = df['enclosed_porch'] + df['threessn_porch']
    

In [180]:
# print(df['enclosed_porch'])


In [181]:
df = df.drop(columns=['threessn_porch'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_qual,paved_drive,enclosed_porch,screen_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,2,1,0,0,2,0,4,2,11,0,0,2,1,5,1,112,0,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,1,0,0,0,2,0,6,2,12,0,0,1,2,1,1,0,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,3,0,1,0,2,1,3,1,10,0,1,1,2,1,1,0,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,1,0,0,0,1,0,2,1,6,0,0,2,2,2,0,184,0,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,2,0,1,0,1,1,3,1,8,0,2,1,2,1,1,0,185,0,,,,0,7,2009,WD
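The porch merge above (add threessn_porch into enclosed_porch, then drop threessn_porch) can be sketched end to end on a toy frame. The values are made up for illustration.

```python
import pandas as pd

# Toy frame standing in for the notebook's df.
toy = pd.DataFrame(
    {"enclosed_porch": [112, 0, 184], "threessn_porch": [0, 180, 0]}
)

# Fold the three-season porch area into the enclosed porch area.
toy["enclosed_porch"] = toy["enclosed_porch"] + toy["threessn_porch"]

# The source column is now redundant and can be removed.
toy = toy.drop(columns=["threessn_porch"])

print(toy["enclosed_porch"].tolist())
print(list(toy.columns))
```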


In [182]:
# COLUMN:		screen_porch                                                                                                                                                                             
# DEFINITION:	Screen porch area in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.screen_porch.unique()
# EVALUATION: DATA TYPE GOOD, NO MISSING VALUES! DATA APPEARS
# TO BE THE SAME AS threessn_porch. WILL DROP screen_porch!!!


array([  0, 185, 144, 168, 160, 200, 576, 216, 189, 128, 259, 108, 192,
       256, 166, 115, 153, 138, 287,  99, 112, 263, 155, 266, 204, 156,
       119,  60, 273, 117,  90,  40,  92, 110, 126, 225, 195, 288, 228,
       116,  80, 123, 184, 120, 175, 178, 227, 100,  63, 198, 221, 196,
       197, 121])

In [183]:
# df.sort_values(["screen_porch"], axis=0, ascending=True, inplace=True)


In [184]:
# print(df['screen_porch'])


In [185]:
df = df.drop(columns=['screen_porch'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_qual,paved_drive,enclosed_porch,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,2,1,0,0,2,0,4,2,11,0,0,2,1,5,1,112,0,,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,1,0,0,0,2,0,6,2,12,0,0,1,2,1,1,0,0,,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,3,0,1,0,2,1,3,1,10,0,1,1,2,1,1,0,0,,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,1,0,0,0,1,0,2,1,6,0,0,2,2,2,0,184,0,,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,2,0,1,0,1,1,3,1,8,0,2,1,2,1,1,0,0,,,,0,7,2009,WD


In [186]:
# COLUMN:		pool_area                                                                                                                                                                                       
# DEFINITION:	Pool area in square feet
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.pool_area.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([  0, 144, 555, 512, 444])

In [187]:
# COLUMN:		pool_qc                                                                                                                                                                                                   
# DEFINITION:	Pool quality
# DATA TYPE:	object
# MISSING VALUES:	2042
# UNIQUE VALUES:	
df.pool_qc.unique()
# EVALUATION: COLUMN DATA TYPE NEEDS TO BE CONVERTED. MISSING
# VALUES, COLUMN TO BE DROPPED!


array([nan, 'Ex', 'TA'], dtype=object)

In [188]:
df = df.drop(columns=['pool_qc'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_qual,paved_drive,enclosed_porch,pool_area,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,2,1,0,0,2,0,4,2,11,0,0,2,1,5,1,112,0,,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,1,0,0,0,2,0,6,2,12,0,0,1,2,1,1,0,0,,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,3,0,1,0,2,1,3,1,10,0,1,1,2,1,1,0,0,,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,1,0,0,0,1,0,2,1,6,0,0,2,2,2,0,184,0,,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,2,0,1,0,1,1,3,1,8,0,2,1,2,1,1,0,0,,,0,7,2009,WD


In [189]:
# COLUMN:		fence              
# DEFINITION:	Fence quality
# DATA TYPE:	object
# MISSING VALUES:	1651
# UNIQUE VALUES:	
df.fence.unique()
# EVALUATION: COLUMN DOES NOT PERTAIN TO HOUSING PRICE, MISSING
# VALUES, COLUMN TO BE DROPPED!


array([nan, 'MnPrv', 'GdPrv', 'GdWo', 'MnWw'], dtype=object)

In [190]:
df = df.drop(columns=['fence'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_qual,paved_drive,enclosed_porch,pool_area,misc_feature,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,2,1,0,0,2,0,4,2,11,0,0,2,1,5,1,112,0,,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,1,0,0,0,2,0,6,2,12,0,0,1,2,1,1,0,0,,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,3,0,1,0,2,1,3,1,10,0,1,1,2,1,1,0,0,,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,1,0,0,0,1,0,2,1,6,0,0,2,2,2,0,184,0,,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,2,0,1,0,1,1,3,1,8,0,2,1,2,1,1,0,0,,0,7,2009,WD


In [191]:
# COLUMN:		misc_feature       
# DEFINITION:	Miscellaneous feature not covered in other
#               categories
# DATA TYPE:	object
# MISSING VALUES:	1986
# UNIQUE VALUES:	
df.misc_feature.unique()
# EVALUATION: DATA TYPE NEEDS TO BE CONVERTED. LOTS OF MISSING
# VALUES. MISC_VAL DOES NOT LINE UP WITH MISC_FEATURE.
# COLUMNS DO NOT APPEAR TO CORRELATE. COLUMN TO BE DROPPED!


array([nan, 'Shed', 'Othr', 'Gar2'], dtype=object)
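One way to test whether misc_val lines up with misc_feature is to look for rows where a feature is recorded without a value, or a value without a feature. The toy frame below uses made-up values and is only a sketch of such a check.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the notebook's df.
toy = pd.DataFrame(
    {
        "misc_feature": [np.nan, "Shed", "Othr", np.nan],
        "misc_val": [0, 400, 0, 500],
    }
)

has_feature = toy["misc_feature"].notna()
has_value = toy["misc_val"] > 0

# Rows where exactly one of the two columns carries information.
mismatch = toy[has_feature != has_value]

print(len(mismatch), "mismatched rows")
print(mismatch.index.tolist())
```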

In [192]:
df = df.drop(columns=['misc_feature'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_qual,paved_drive,enclosed_porch,pool_area,misc_val,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,2,1,0,0,2,0,4,2,11,0,0,2,1,5,1,112,0,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,1,0,0,0,2,0,6,2,12,0,0,1,2,1,1,0,0,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,3,0,1,0,2,1,3,1,10,0,1,1,2,1,1,0,0,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,1,0,0,0,1,0,2,1,6,0,0,2,2,2,0,184,0,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,2,0,1,0,1,1,3,1,8,0,2,1,2,1,1,0,0,0,7,2009,WD


In [193]:
# COLUMN:		misc_val            
# DEFINITION:	Value of miscellaneous feature
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.misc_val.unique()
# EVALUATION: DATA TYPE GOOD, NO MISSING VALUES! HOWEVER,
# MISC_VAL DOES NOT LINE UP WITH MISC_FEATURE. COLUMNS DO NOT
# APPEAR TO CORRELATE. COLUMN TO BE DROPPED!
 

array([    0,  2000,  1200,   420,   400,  1500,   750,   450,   500,
         650,   600,   700,   560,   350,  1400,   480,  1000,   490,
       15500,  1512,   620])

In [194]:
df = df.drop(columns=['misc_val'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_qual,paved_drive,enclosed_porch,pool_area,mo_sold,yr_sold,sale_type
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,2,1,0,0,2,0,4,2,11,0,0,2,1,5,1,112,0,4,2006,WD
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,1,0,0,0,2,0,6,2,12,0,0,1,2,1,1,0,0,8,2006,WD
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,3,0,1,0,2,1,3,1,10,0,1,1,2,1,1,0,0,9,2006,New
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,1,0,0,0,1,0,2,1,6,0,0,2,2,2,0,184,0,7,2007,WD
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,2,0,1,0,1,1,3,1,8,0,2,1,2,1,1,0,0,7,2009,WD


In [195]:
# COLUMN:		mo_sold             
# DEFINITION:	Month Sold
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.mo_sold.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([ 4,  8,  9,  7,  6,  5, 10,  1,  2, 11, 12,  3])

In [196]:
# COLUMN:		yr_sold             
# DEFINITION:	Year Sold
# DATA TYPE:	int64
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.yr_sold.unique()
# EVALUATION: COLUMN READY TO GO... DATA TYPE GOOD, NO MISSING
# VALUES!


array([2006, 2007, 2009, 2010, 2008])
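
The two date columns can also be checked with a quick range test rather than by reading the `unique()` output. The sketch below uses a hypothetical `toy` frame containing one deliberately bad month and one bad year; the 2006 to 2010 bounds are taken from the output above.

```python
import pandas as pd

# Hypothetical toy data with one invalid month (13) and one out-of-range year (2011).
toy = pd.DataFrame({
    "mo_sold": [4, 8, 13],
    "yr_sold": [2006, 2010, 2011],
})

# between() is inclusive on both ends by default.
bad_month = ~toy["mo_sold"].between(1, 12)
bad_year = ~toy["yr_sold"].between(2006, 2010)

print("rows with invalid month:", int(bad_month.sum()))
print("rows with invalid year:", int(bad_year.sum()))
```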

In [197]:
# COLUMN:		sale_type          
# DEFINITION:	Type of sale
# DATA TYPE:	object
# MISSING VALUES:	0
# UNIQUE VALUES:	
df.sale_type.unique()
# EVALUATION: DATA TYPE NEEDS TO BE CONVERTED. NO MISSING
# VALUES!


array(['WD ', 'New', 'Con', 'COD', 'VWD', 'CWD', 'ConLD', 'ConLI', 'Oth',
       'ConLw'], dtype=object)

- COLUMN HAS DIFFERENT VALUES THAN TEST.CSV, SO IT IS DROPPED.
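
The mismatch between the two files can be made explicit by comparing their category sets after stripping whitespace (note the trailing space in `'WD '` above). This is only a sketch: the first list is copied from the output above, while `other_file` is a hypothetical stand-in for the other CSV's values.

```python
import pandas as pd

# Values copied from the unique() output above; 'WD ' carries a trailing space.
this_file = pd.Series(["WD ", "New", "Con", "COD", "VWD", "CWD",
                       "ConLD", "ConLI", "Oth", "ConLw"])
# Hypothetical category list for the other file, for illustration only.
other_file = pd.Series(["WD ", "New", "COD", "ConLD", "Con", "CWD",
                        "Oth", "ConLI", "ConLw"])

# Strip whitespace so 'WD ' and 'WD' count as the same level.
this_levels = set(this_file.str.strip())
other_levels = set(other_file.str.strip())

only_in_this = sorted(this_levels - other_levels)
only_in_other = sorted(other_levels - this_levels)
print("only in this file:", only_in_this)
print("only in other file:", only_in_other)
```

If only a few rare levels differ, mapping them to a shared level such as `Oth` may be an alternative to dropping the column; the notebook chooses to drop it.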

In [198]:
df = df.drop(columns=['sale_type'])
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_qual,paved_drive,enclosed_porch,pool_area,mo_sold,yr_sold
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,2,1,0,0,2,0,4,2,11,0,0,2,1,5,1,112,0,4,2006
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,1,0,0,0,2,0,6,2,12,0,0,1,2,1,1,0,0,8,2006
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,3,0,1,0,2,1,3,1,10,0,1,1,2,1,1,0,0,9,2006
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,1,0,0,0,1,0,2,1,6,0,0,2,2,2,0,184,0,7,2007
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,2,0,1,0,1,1,3,1,8,0,2,1,2,1,1,0,0,7,2009


**Describe the basic format:**

In [199]:
df.dtypes


ms_subclass       int64
ms_zoning         int64
street            int64
alley             int64
land_contour      int64
utilities         int64
lot_config        int64
neighborhood      int64
condition_1       int64
bldg_type         int64
house_style       int64
overall_qual      int64
exter_qual        int64
foundation        int64
bsmt_cond         int64
heating_qc        int64
central_air       int64
bsmt_full_bath    int64
bsmt_half_bath    int64
full_bath         int64
half_bath         int64
bedroom_abvgr     int64
kitchen_abvgr     int64
totrms_abvgrd     int64
functional        int64
fireplaces        int64
garage_type       int64
garage_cars       int64
garage_qual       int64
paved_drive       int64
enclosed_porch    int64
pool_area         int64
mo_sold           int64
yr_sold           int64
dtype: object
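
Instead of scanning the `dtypes` listing by eye, the same check can be expressed programmatically. The sketch below runs on a hypothetical `toy` frame that still contains one string column, and lists every column that is not an integer type.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

# Hypothetical toy data: two integer columns and one leftover string column.
toy = pd.DataFrame({
    "mo_sold": [4, 8, 9],
    "yr_sold": [2006, 2006, 2007],
    "sale_type": ["WD ", "WD ", "New"],
})

# Collect any column whose dtype is not an integer type.
non_integer = [col for col in toy.columns if not is_integer_dtype(toy[col])]
print("columns still needing conversion:", non_integer)
```

An empty list would correspond to the all-`int64` result shown above.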

In [200]:
df.head(5)


Unnamed: 0,ms_subclass,ms_zoning,street,alley,land_contour,utilities,lot_config,neighborhood,condition_1,bldg_type,house_style,overall_qual,exter_qual,foundation,bsmt_cond,heating_qc,central_air,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,totrms_abvgrd,functional,fireplaces,garage_type,garage_cars,garage_qual,paved_drive,enclosed_porch,pool_area,mo_sold,yr_sold
1,190,3,0,1,0,0,1,14,0,4,1,6,0,4,2,2,1,0,0,2,0,4,2,11,0,0,2,1,5,1,112,0,4,2006
2,90,0,0,0,0,0,1,11,0,3,0,5,0,0,2,1,0,0,0,2,0,6,2,12,0,0,1,2,1,1,0,0,8,2006
3,60,0,0,0,0,0,1,1,0,0,1,7,1,1,1,3,0,1,0,2,1,3,1,10,0,1,1,2,1,1,0,0,9,2006
4,30,3,0,0,0,0,1,14,0,0,0,5,1,0,2,1,0,0,0,1,0,2,1,6,0,0,2,2,2,0,184,0,7,2007
5,20,0,0,0,0,0,1,0,0,0,0,6,0,0,2,2,0,1,0,1,1,3,1,8,0,2,1,2,1,1,0,0,7,2009


In [201]:
df.shape


(879, 34)

**Export Dataframe:**

In [202]:
df.to_csv('../data/test_clean.csv')
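
By default `to_csv` also writes the index, which pandas labels `Unnamed: 0` when the file is read back without `index_col`. Whether the index should be kept here is not stated in the notebook; the sketch below only illustrates the difference, using a hypothetical `toy` frame and in-memory buffers instead of real files.

```python
import io
import pandas as pd

# Hypothetical toy data standing in for the cleaned frame.
toy = pd.DataFrame({"mo_sold": [4, 8], "yr_sold": [2006, 2006]})

# Default export: the index is written as an extra, unnamed first column.
buf_default = io.StringIO()
toy.to_csv(buf_default)
buf_default.seek(0)
with_index = pd.read_csv(buf_default)

# Export without the index: only the data columns are written.
buf_no_index = io.StringIO()
toy.to_csv(buf_no_index, index=False)
buf_no_index.seek(0)
without_index = pd.read_csv(buf_no_index)

print(list(with_index.columns))
print(list(without_index.columns))
```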
