<img src="https://github.com/insaid2018/Term-1/blob/master/Images/INSAID_Full%20Logo.png?raw=true" width="240" height="360" />

# ML1 Project using HousePrice Dataset

## Table of Contents

1. [Problem Statement](#section1)<br>
2. [Data Loading and Description](#section2)
3. [Pandas Profiling](#section3)
4. [Preprocessing](#section4)
5. [EDA](#section5)
6. [Feature Selection using Random Forest](#section6)
7. [Linear Regression](#section7)<br>
    - 4.1 [Introduction to Random Forest](#section401)<br>
    - 4.2 [Real Life Analogy](#section402)<br>
    - 4.3 [Wisdom of Crowd](#section403)<br>
    - 4.4 [Concept behind random forest](#section404)<br>
         - 4.4.1 [Random Sampling](#randomsampling)<br>
         - 4.4.2 [Random Subsets of Features](#randomsubset)<br>
    - 4.5 [Advantages and Disadvantages](#section405)<br>
    - 4.6 [Use Cases](#section406)<br>
    - 4.7 [Preparing X and y using pandas](#section407)<br>
    - 4.8 [Splitting X and y into training and test datasets.](#section408)<br>
    - 4.9 [Random Forest in scikit-learn](#section409)<br>
    - 4.10 [Using the Model for Prediction](#section410)<br>
8. [Logistic Regression](#section8)<br>
    - 5.1 [Model Evaluation using accuracy score](#section501)<br>
    - 5.2 [Model Evaluation using confusion matrix](#section502)<br>
9. [Decision Tree](#section9)<br>
10. [Random Forest](#section10)<br>
11. [Comparision of Models](#section11)<br>
12. [Conclusion](#section12)

### 1. Problem Statement

The goal is to __predict house price__ using __Linear Regression, Logistic Regression, Decision Tree and Random Forest__.

### 2. Data Loading and Description

- The dataset consists of the information about people boarding the famous RMS Titanic. Various variables present in the dataset includes data of age, sex, fare, ticket etc. 
- The dataset comprises of __891 observations of 12 columns__. Below is a table showing names of all the columns and their description.

| Column Name   | Description                                               |
| ------------- |:-------------                                            :| 
| PassengerId   | Passenger Identity                                        | 
| Survived      | Whether passenger survived or not                         |  
| Pclass        | Class of ticket                                           | 
| Name          | Name of passenger                                         |   
| Sex           | Sex of passenger                                          |
| Age           | Age of passenger                                          |
| SibSp         | Number of sibling and/or spouse travelling with passenger |
| Parch         | Number of parent and/or children travelling with passenger|
| Ticket        | Ticket number                                             |
| Fare          | Price of ticket                                           |
| Cabin         | Cabin number                                              |
| Embarked      | Port of embarkation                                       |

#### Importing packages                                          

In [1]:
import numpy as np                     

import pandas as pd
pd.set_option('mode.chained_assignment', None)      # To suppress pandas warnings.
pd.set_option('display.max_colwidth', None)         # To display all the data in each column
pd.set_option('display.max_columns', None)          # To display every column of the dataset in head()

import warnings
warnings.filterwarnings('ignore')                   # To suppress all the warnings in the notebook.

In [2]:
import matplotlib.pyplot as plt
%matplotlib inline

import seaborn as sns
sns.set(style='whitegrid', font_scale=1.3, color_codes=True)      # To apply seaborn styles to the plots.

In [3]:
# Making plotly specific imports

# pip install chart-studio

'''
# If you're using an older version of plotly, you might have to import the below modules as well.
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go
import chart_studio.plotly as py
from plotly import tools
init_notebook_mode(connected=True)
'''

import plotly.graph_objs as go

#### Importing the Dataset

In [5]:
houseprice = pd.read_csv('https://raw.githubusercontent.com/insaid2018/Term-2/master/Projects/house_data.csv')

In [6]:
houseprice.head()

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,HouseStyle,OverallQual,OverallCond,YearBuilt,YearRemodAdd,RoofStyle,RoofMatl,Exterior1st,Exterior2nd,MasVnrType,MasVnrArea,ExterQual,ExterCond,Foundation,BsmtQual,BsmtCond,BsmtExposure,BsmtFinType1,BsmtFinSF1,BsmtFinType2,BsmtFinSF2,BsmtUnfSF,TotalBsmtSF,Heating,HeatingQC,CentralAir,Electrical,1stFlrSF,2ndFlrSF,LowQualFinSF,GrLivArea,BsmtFullBath,BsmtHalfBath,FullBath,HalfBath,BedroomAbvGr,KitchenAbvGr,KitchenQual,TotRmsAbvGrd,Functional,Fireplaces,FireplaceQu,GarageType,GarageYrBlt,GarageFinish,GarageCars,GarageArea,GarageQual,GarageCond,PavedDrive,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
0,1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,2Story,7,5,2003,2003,Gable,CompShg,VinylSd,VinylSd,BrkFace,196.0,Gd,TA,PConc,Gd,TA,No,GLQ,706,Unf,0,150,856,GasA,Ex,Y,SBrkr,856,854,0,1710,1,0,2,1,3,1,Gd,8,Typ,0,,Attchd,2003.0,RFn,2,548,TA,TA,Y,0,61,0,0,0,0,,,,0,2,2008,WD,Normal,208500
1,2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,1Story,6,8,1976,1976,Gable,CompShg,MetalSd,MetalSd,,0.0,TA,TA,CBlock,Gd,TA,Gd,ALQ,978,Unf,0,284,1262,GasA,Ex,Y,SBrkr,1262,0,0,1262,0,1,2,0,3,1,TA,6,Typ,1,TA,Attchd,1976.0,RFn,2,460,TA,TA,Y,298,0,0,0,0,0,,,,0,5,2007,WD,Normal,181500
2,3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,2Story,7,5,2001,2002,Gable,CompShg,VinylSd,VinylSd,BrkFace,162.0,Gd,TA,PConc,Gd,TA,Mn,GLQ,486,Unf,0,434,920,GasA,Ex,Y,SBrkr,920,866,0,1786,1,0,2,1,3,1,Gd,6,Typ,1,TA,Attchd,2001.0,RFn,2,608,TA,TA,Y,0,42,0,0,0,0,,,,0,9,2008,WD,Normal,223500
3,4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,2Story,7,5,1915,1970,Gable,CompShg,Wd Sdng,Wd Shng,,0.0,TA,TA,BrkTil,TA,Gd,No,ALQ,216,Unf,0,540,756,GasA,Gd,Y,SBrkr,961,756,0,1717,1,0,1,0,3,1,Gd,7,Typ,1,Gd,Detchd,1998.0,Unf,3,642,TA,TA,Y,0,35,272,0,0,0,,,,0,2,2006,WD,Abnorml,140000
4,5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,2Story,8,5,2000,2000,Gable,CompShg,VinylSd,VinylSd,BrkFace,350.0,Gd,TA,PConc,Gd,TA,Av,GLQ,655,Unf,0,490,1145,GasA,Ex,Y,SBrkr,1145,1053,0,2198,1,0,2,1,4,1,Gd,9,Typ,1,TA,Attchd,2000.0,RFn,3,836,TA,TA,Y,192,84,0,0,0,0,,,,0,12,2008,WD,Normal,250000


In [7]:
houseprice.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             1460 non-null   int64  
 1   MSSubClass     1460 non-null   int64  
 2   MSZoning       1460 non-null   object 
 3   LotFrontage    1201 non-null   float64
 4   LotArea        1460 non-null   int64  
 5   Street         1460 non-null   object 
 6   Alley          91 non-null     object 
 7   LotShape       1460 non-null   object 
 8   LandContour    1460 non-null   object 
 9   Utilities      1460 non-null   object 
 10  LotConfig      1460 non-null   object 
 11  LandSlope      1460 non-null   object 
 12  Neighborhood   1460 non-null   object 
 13  Condition1     1460 non-null   object 
 14  Condition2     1460 non-null   object 
 15  BldgType       1460 non-null   object 
 16  HouseStyle     1460 non-null   object 
 17  OverallQual    1460 non-null   int64  
 18  OverallC

In [8]:
houseprice.shape

(1460, 81)

In [9]:
houseprice.describe(include='all')

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,HouseStyle,OverallQual,OverallCond,YearBuilt,YearRemodAdd,RoofStyle,RoofMatl,Exterior1st,Exterior2nd,MasVnrType,MasVnrArea,ExterQual,ExterCond,Foundation,BsmtQual,BsmtCond,BsmtExposure,BsmtFinType1,BsmtFinSF1,BsmtFinType2,BsmtFinSF2,BsmtUnfSF,TotalBsmtSF,Heating,HeatingQC,CentralAir,Electrical,1stFlrSF,2ndFlrSF,LowQualFinSF,GrLivArea,BsmtFullBath,BsmtHalfBath,FullBath,HalfBath,BedroomAbvGr,KitchenAbvGr,KitchenQual,TotRmsAbvGrd,Functional,Fireplaces,FireplaceQu,GarageType,GarageYrBlt,GarageFinish,GarageCars,GarageArea,GarageQual,GarageCond,PavedDrive,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
count,1460.0,1460.0,1460,1201.0,1460.0,1460,91,1460,1460,1460,1460,1460,1460,1460,1460,1460,1460,1460.0,1460.0,1460.0,1460.0,1460,1460,1460,1460,1452.0,1452.0,1460,1460,1460,1423,1423,1422,1423,1460.0,1422,1460.0,1460.0,1460.0,1460,1460,1460,1459,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460,1460.0,1460,1460.0,770,1379,1379.0,1379,1460.0,1460.0,1379,1379,1460,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,7,281,54,1460.0,1460.0,1460.0,1460,1460,1460.0
unique,,,5,,,2,2,4,4,2,5,3,25,9,8,5,8,,,,,6,8,15,16,4.0,,4,5,6,4,4,4,6,,6,,,,6,5,2,5,,,,,,,,,,,4,,7,,5,6,,3,,,5,5,3,,,,,,,3,4,4,,,,9,6,
top,,,RL,,,Pave,Grvl,Reg,Lvl,AllPub,Inside,Gtl,NAmes,Norm,Norm,1Fam,1Story,,,,,Gable,CompShg,VinylSd,VinylSd,,,TA,TA,PConc,TA,TA,No,Unf,,Unf,,,,GasA,Ex,Y,SBrkr,,,,,,,,,,,TA,,Typ,,Gd,Attchd,,Unf,,,TA,TA,Y,,,,,,,Gd,MnPrv,Shed,,,,WD,Normal,
freq,,,1151,,,1454,50,925,1311,1459,1052,1382,225,1260,1445,1220,726,,,,,1141,1434,515,504,864.0,,906,1282,647,649,1311,953,430,,1256,,,,1428,741,1365,1334,,,,,,,,,,,735,,1360,,380,870,,605,,,1311,1326,1340,,,,,,,3,157,49,,,,1267,1198,
mean,730.5,56.89726,,70.049958,10516.828082,,,,,,,,,,,,,6.099315,5.575342,1971.267808,1984.865753,,,,,,103.685262,,,,,,,,443.639726,,46.549315,567.240411,1057.429452,,,,,1162.626712,346.992466,5.844521,1515.463699,0.425342,0.057534,1.565068,0.382877,2.866438,1.046575,,6.517808,,0.613014,,,1978.506164,,1.767123,472.980137,,,,94.244521,46.660274,21.95411,3.409589,15.060959,2.758904,,,,43.489041,6.321918,2007.815753,,,180921.19589
std,421.610009,42.300571,,24.284752,9981.264932,,,,,,,,,,,,,1.382997,1.112799,30.202904,20.645407,,,,,,181.066207,,,,,,,,456.098091,,161.319273,441.866955,438.705324,,,,,386.587738,436.528436,48.623081,525.480383,0.518911,0.238753,0.550916,0.502885,0.815778,0.220338,,1.625393,,0.644666,,,24.689725,,0.747315,213.804841,,,,125.338794,66.256028,61.119149,29.317331,55.757415,40.177307,,,,496.123024,2.703626,1.328095,,,79442.502883
min,1.0,20.0,,21.0,1300.0,,,,,,,,,,,,,1.0,1.0,1872.0,1950.0,,,,,,0.0,,,,,,,,0.0,,0.0,0.0,0.0,,,,,334.0,0.0,0.0,334.0,0.0,0.0,0.0,0.0,0.0,0.0,,2.0,,0.0,,,1900.0,,0.0,0.0,,,,0.0,0.0,0.0,0.0,0.0,0.0,,,,0.0,1.0,2006.0,,,34900.0
25%,365.75,20.0,,59.0,7553.5,,,,,,,,,,,,,5.0,5.0,1954.0,1967.0,,,,,,0.0,,,,,,,,0.0,,0.0,223.0,795.75,,,,,882.0,0.0,0.0,1129.5,0.0,0.0,1.0,0.0,2.0,1.0,,5.0,,0.0,,,1961.0,,1.0,334.5,,,,0.0,0.0,0.0,0.0,0.0,0.0,,,,0.0,5.0,2007.0,,,129975.0
50%,730.5,50.0,,69.0,9478.5,,,,,,,,,,,,,6.0,5.0,1973.0,1994.0,,,,,,0.0,,,,,,,,383.5,,0.0,477.5,991.5,,,,,1087.0,0.0,0.0,1464.0,0.0,0.0,2.0,0.0,3.0,1.0,,6.0,,1.0,,,1980.0,,2.0,480.0,,,,0.0,25.0,0.0,0.0,0.0,0.0,,,,0.0,6.0,2008.0,,,163000.0
75%,1095.25,70.0,,80.0,11601.5,,,,,,,,,,,,,7.0,6.0,2000.0,2004.0,,,,,,166.0,,,,,,,,712.25,,0.0,808.0,1298.25,,,,,1391.25,728.0,0.0,1776.75,1.0,0.0,2.0,1.0,3.0,1.0,,7.0,,1.0,,,2002.0,,2.0,576.0,,,,168.0,68.0,0.0,0.0,0.0,0.0,,,,0.0,8.0,2009.0,,,214000.0


## 3. Pandas Profiling

In [10]:
import pandas_profiling

In [11]:
profile = houseprice.profile_report(title="House Data Profiling before Data Preprocessing", progress_bar=False, minimal=True)
profile.to_file(output_file="house_data_profiling_before_preprocessing.html")

#### Observations from Profiling

**** SalePrice - the property's sale price in dollars. This is the target variable that you're trying to predict.**** <BR>
MSSubClass: The building class **good data, continous values, value 20 has 536 count**<BR>
MSZoning: The general zoning classification **categorical - 5 values, RL has 1151 count**<BR>
LotFrontage: Linear feet of street connected to property **Good histogram, 17.7% missing value**<BR>
LotArea: Lot size in square feet **good histogram, 0% missing value**<BR>
Street: Type of road access ** Categorical, 2 Distinct, 0% missing value**<BR>
-- Alley: Type of alley access **categorical, 2 distinct, 93.8 % missing value, can be deleted.**<BR>
LotShape: General shape of property **Categorical, 4 distinct value, 0% Missing values**<BR>
LandContour: Flatness of the property **Categorical, 4 distinct value, 0% Missing values**<BR>
Utilities: Type of utilities available **CAtegorical, 2 distinct values, 0% Missing vale**<BR>
LotConfig: Lot configuration **Categorical, 5 distinct values, 0% missing values**<BR>
LandSlope: Slope of property **Categorical, 3 distinct values, 0% missing values**<BR>
Neighborhood: Physical locations within Ames city limits **Categorical,  distinct values, 0% missing values**<BR>
Condition1: Proximity to main road or railroad**Categorical,  distinct values, 0% missing values**<BR>
Condition2: Proximity to main road or railroad (if a second is present)**Categorical,  distinct values, 0% missing values**<BR>
BldgType: Type of dwelling**Categorical,  distinct values, 0% missing values**<BR>
HouseStyle: Style of dwelling**Categorical,  distinct values, 0% missing values**<BR>
OverallQual: Overall material and finish quality **good histogram, 0% missing values**<BR>
OverallCond: Overall condition rating **good histogram, 0% missing values**<BR>
YearBuilt: Original construction date **year column**<BR>
YearRemodAdd: Remodel date **year 1950 has 178 values**<BR>
RoofStyle: Type of roof **Categorical,  distinct values, 0% missing values**<BR>
RoofMatl: Roof material**Categorical,  distinct values, 0% missing values**<BR>
Exterior1st: Exterior covering on house**Categorical,  distinct values, 0% missing values**<BR>
Exterior2nd: Exterior covering on house (if more than one material)**Categorical,  distinct values, 0% missing values**<BR>
MasVnrType: Masonry veneer type **Categorical,  distinct values, 0.5% missing values**<BR>
MasVnrArea: Masonry veneer area in square feet **Right Skewed, Median is 0, mean is 104**<BR>
ExterQual: Exterior material quality **Categorical,  distinct values, 0% missing values**<BR>
ExterCond: Present condition of the material on the exterior **Categorical,  distinct values, 0% missing values**<BR>
Foundation: Type of foundation **Categorical,  distinct values, 0% missing values**<BR>
BsmtQual: Height of the basement**Categorical,  distinct values, 2.5% missing values**<BR>
BsmtCond: General condition of the basement **Categorical,  distinct values, 2.5% missing values**<BR>
BsmtExposure: Walkout or garden level basement walls **Categorical,  distinct values, 2.6% missing values**<BR>
BsmtFinType1: Quality of basement finished area **Categorical,  distinct values, 2.5% missing values**<BR>
BsmtFinSF1: Type 1 finished square feet **Right skewed, 0% missing values.**<BR>
BsmtFinType2: Quality of second finished area (if present)**Categorical,  distinct values, 2.6% missing values**<BR>
BsmtFinSF2: Type 2 finished square feet **Right skewed, 0 % missing values**<BR>
BsmtUnfSF: Unfinished square feet of basement area **right skewed, 0% ,missing values**<BR>
TotalBsmtSF: Total square feet of basement area **Good histogram, 0% missing values**<BR>
Heating: Type of heating **Categorical,  distinct values, 0% missing values**<BR>
HeatingQC: Heating quality and condition **Categorical,  distinct values, 0% missing values**<BR>
CentralAir: Central air conditioning**Boolean,  2 distinct values, 0% missing values**<BR>
Electrical: Electrical system **Categorical,  distinct values, 0.1% missing values**<BR>
1stFlrSF: First Floor square feet **good histogram, 0% missing values**<BR>
2ndFlrSF: Second floor square feet **0 median, 0% missing data**<BR>
LowQualFinSF: Low quality finished square feet (all floors) **0 median, 0% missing data**<BR>
GrLivArea: Above grade (ground) living area square feet **Good histogram, 0% missing values**<BR>
BsmtFullBath: Basement full bathrooms**Categorical,  distinct values, 0% missing values**<BR>
BsmtHalfBath: Basement half bathrooms**Categorical,  distinct values, 0% missing values**<BR>
FullBath: Full bathrooms above grade**Categorical,  distinct values, 0% missing values**<BR>
HalfBath: Half baths above grade**Categorical,  distinct values, 0% missing values**<BR>
Bedroom: Number of bedrooms above basement level **good histogram, 0% missing values**<BR>
Kitchen: Number of kitchens **Categorical,  distinct values, 0% missing values**<BR>
KitchenQual: Kitchen quality**Categorical,  distinct values, 0% missing values**<BR>
TotRmsAbvGrd: Total rooms above grade (does not include bathrooms) **Good histogram, 0% missing values**<BR>
Functional: Home functionality rating **Categorical,  distinct values, 0% missing values**<BR>
Fireplaces: Number of fireplaces **Categorical,  distinct values, 0% missing values**<BR>
-- FireplaceQu: Fireplace quality **Categorical,  distinct values, 47.3% missing values**<BR>
GarageType: Garage location **Categorical,  distinct values, 5.5% missing values**<BR>
GarageYrBlt: Year garage was built **year column, 5.5% missing data**<BR>
GarageFinish: Interior finish of the garage **Categorical,  distinct values, 5.5% missing values**<BR>
GarageCars: Size of garage in car capacity **good histogram, 0% missing values**<BR>
GarageArea: Size of garage in square feet **good histogram, 0% missing data.**<BR>
GarageQual: Garage quality **Categorical,  distinct values, 5.5% missing values**<BR>
GarageCond: Garage condition **Categorical,  distinct values, 5.5% missing values**<BR>
PavedDrive: Paved driveway **Categorical,  distinct values, 0% missing values**<BR>
WoodDeckSF: Wood deck area in square feet **continuous data,  0% missing values, 52% zeroes**<BR>
-- OpenPorchSF: Open porch area in square feet **CONTINUOUS DATA, 44% zero values**<BR>
-- EnclosedPorch: Enclosed porch area in square feet **Continuous data, 85% of zeroes**<BR>
-- 3SsnPorch: Three season porch area in square feet **continuous data, 98.4% zeroes**<BR>
-- ScreenPorch: Screen porch area in square feet **continous data, 92.1%**<BR>
-- PoolArea: Pool area in square feet **Continuous data, 99.5% zeroes**<BR>
-- PoolQC: Pool quality **Categorical,  distinct values, 99.9% missing values**<BR>
-- Fence: Fence quality **Categorical,  distinct values, 88.8% missing values**<BR>
-- MiscFeature: Miscellaneous feature not covered in other categories **Categorical,  distinct values, 96.3% missing values**<BR>
-- MiscVal: $Value of miscellaneous feature **Continuous data, 96.4 %zeroes**<BR>
MoSold: Month Sold **good histogram, 0% missing data**<BR>
YrSold: Year Sold **0% missing data**<BR>
SaleType: Type of sale **Categorical,  distinct values, 0% missing values**<BR>
SaleCondition: Condition of sale **Categorical,  distinct values, 0% missing values**<BR>

#### 3.1 Data plot's before data cleaning and pre processing for understanding of raw data.

##### 3.1.1 Distplot before data pre Processing .

## 4. Preprocessing the data

#### Creating dummies for categorical features

In [12]:
MSZoning = pd.get_dummies(houseprice.MSZoning, prefix='MSZoning').iloc[:, 1:]
Street = pd.get_dummies(houseprice.MSZoning, prefix='Street').iloc[:, 1:]
LotShape = pd.get_dummies(houseprice.MSZoning, prefix='LotShape').iloc[:, 1:]
LandContour = pd.get_dummies(houseprice.MSZoning, prefix='LandContour').iloc[:, 1:]
Utilities = pd.get_dummies(houseprice.MSZoning, prefix='Utilities').iloc[:, 1:]
LotConfig = pd.get_dummies(houseprice.MSZoning, prefix='LotConfig').iloc[:, 1:]
LandSlope = pd.get_dummies(houseprice.MSZoning, prefix='LandSlope').iloc[:, 1:]
Neighborhood = pd.get_dummies(houseprice.MSZoning, prefix='Neighborhood').iloc[:, 1:]
Condition1 = pd.get_dummies(houseprice.MSZoning, prefix='Condition1').iloc[:, 1:]
BldgType = pd.get_dummies(houseprice.MSZoning, prefix='BldgType').iloc[:, 1:]
HouseStyle = pd.get_dummies(houseprice.MSZoning, prefix='HouseStyle').iloc[:, 1:]
RoofStyle = pd.get_dummies(houseprice.MSZoning, prefix='RoofStyle').iloc[:, 1:]
RoofMatl = pd.get_dummies(houseprice.MSZoning, prefix='RoofMatl').iloc[:, 1:]
Exterior1st = pd.get_dummies(houseprice.MSZoning, prefix='Exterior1st').iloc[:, 1:]
MasVnrType = pd.get_dummies(houseprice.MSZoning, prefix='MasVnrType').iloc[:, 1:]
Alley = pd.get_dummies(houseprice.Alley, prefix='Alley').iloc[:, 1:]
Condition2 = pd.get_dummies(houseprice.Condition2, prefix='Condition2').iloc[:, 1:]
Exterior2nd = pd.get_dummies(houseprice.Exterior2nd, prefix='Exterior2nd').iloc[:, 1:]
ExterQual = pd.get_dummies(houseprice.ExterQual, prefix='ExterQual').iloc[:, 1:]
ExterCond = pd.get_dummies(houseprice.ExterCond, prefix='ExterCond').iloc[:, 1:]
Foundation = pd.get_dummies(houseprice.Foundation, prefix='Foundation').iloc[:, 1:]
BsmtQual = pd.get_dummies(houseprice.BsmtQual, prefix='BsmtQual').iloc[:, 1:]
BsmtCond = pd.get_dummies(houseprice.BsmtCond, prefix='BsmtCond').iloc[:, 1:]
BsmtExposure = pd.get_dummies(houseprice.BsmtExposure, prefix='BsmtExposure').iloc[:, 1:]
BsmtFinType1 = pd.get_dummies(houseprice.BsmtFinType1, prefix='BsmtFinType1').iloc[:, 1:]
BsmtFinType2 = pd.get_dummies(houseprice.BsmtFinType2, prefix='BsmtFinType2').iloc[:, 1:]
Heating = pd.get_dummies(houseprice.Heating, prefix='Heating').iloc[:, 1:]
HeatingQC = pd.get_dummies(houseprice.HeatingQC, prefix='HeatingQC').iloc[:, 1:]
CentralAir = pd.get_dummies(houseprice.CentralAir, prefix='CentralAir').iloc[:, 1:]
Electrical = pd.get_dummies(houseprice.Electrical, prefix='Electrical').iloc[:, 1:]
KitchenQual = pd.get_dummies(houseprice.KitchenQual, prefix='KitchenQual').iloc[:, 1:]
Functional = pd.get_dummies(houseprice.Functional, prefix='Functional').iloc[:, 1:]
FireplaceQu = pd.get_dummies(houseprice.FireplaceQu, prefix='FireplaceQu').iloc[:, 1:]
GarageType = pd.get_dummies(houseprice.GarageType, prefix='GarageType').iloc[:, 1:]
GarageFinish = pd.get_dummies(houseprice.GarageFinish, prefix='GarageFinish').iloc[:, 1:]
GarageQual = pd.get_dummies(houseprice.GarageQual, prefix='GarageQual').iloc[:, 1:]
GarageCond = pd.get_dummies(houseprice.GarageCond, prefix='GarageCond').iloc[:, 1:]
PavedDrive = pd.get_dummies(houseprice.PavedDrive, prefix='PavedDrive').iloc[:, 1:]
PoolQC = pd.get_dummies(houseprice.PoolQC, prefix='PoolQC').iloc[:, 1:]
Fence = pd.get_dummies(houseprice.Fence, prefix='Fence').iloc[:, 1:]
MiscFeature = pd.get_dummies(houseprice.MiscFeature, prefix='MiscFeature').iloc[:, 1:]
SaleType = pd.get_dummies(houseprice.SaleType, prefix='SaleType').iloc[:, 1:]
SaleCondition = pd.get_dummies(houseprice.SaleCondition, prefix='SaleCondition').iloc[:, 1:]

In [13]:
houseprice = pd.concat([houseprice, MSZoning], axis=1)
houseprice = pd.concat([houseprice, Street], axis=1)
houseprice = pd.concat([houseprice, LotShape], axis=1)
houseprice = pd.concat([houseprice, LandContour], axis=1)
houseprice = pd.concat([houseprice, Utilities], axis=1)
houseprice = pd.concat([houseprice, LotConfig], axis=1)
houseprice = pd.concat([houseprice, LandSlope], axis=1)
houseprice = pd.concat([houseprice, Neighborhood], axis=1)
houseprice = pd.concat([houseprice, Condition1], axis=1)
houseprice = pd.concat([houseprice, BldgType], axis=1)
houseprice = pd.concat([houseprice, HouseStyle], axis=1)
houseprice = pd.concat([houseprice, RoofStyle], axis=1)
houseprice = pd.concat([houseprice, RoofMatl], axis=1)
houseprice = pd.concat([houseprice, Exterior1st], axis=1)
houseprice = pd.concat([houseprice, MasVnrType], axis=1)
houseprice = pd.concat([houseprice, Alley], axis=1)
houseprice = pd.concat([houseprice, Condition2], axis=1)
houseprice = pd.concat([houseprice, Exterior2nd], axis=1)
houseprice = pd.concat([houseprice, ExterQual], axis=1)
houseprice = pd.concat([houseprice, ExterCond], axis=1)
houseprice = pd.concat([houseprice, Foundation], axis=1)
houseprice = pd.concat([houseprice, BsmtQual], axis=1)
houseprice = pd.concat([houseprice, BsmtCond], axis=1)
houseprice = pd.concat([houseprice, BsmtExposure], axis=1)
houseprice = pd.concat([houseprice, BsmtFinType1], axis=1)
houseprice = pd.concat([houseprice, BsmtFinType2], axis=1)
houseprice = pd.concat([houseprice, Heating], axis=1)
houseprice = pd.concat([houseprice, HeatingQC], axis=1)
houseprice = pd.concat([houseprice, CentralAir], axis=1)
houseprice = pd.concat([houseprice, Electrical], axis=1)
houseprice = pd.concat([houseprice, KitchenQual], axis=1)
houseprice = pd.concat([houseprice, Functional], axis=1)
houseprice = pd.concat([houseprice, FireplaceQu], axis=1)
houseprice = pd.concat([houseprice, GarageType], axis=1)
houseprice = pd.concat([houseprice, GarageFinish], axis=1)
houseprice = pd.concat([houseprice, GarageQual], axis=1)
houseprice = pd.concat([houseprice, GarageCond], axis=1)
houseprice = pd.concat([houseprice, PavedDrive], axis=1)
houseprice = pd.concat([houseprice, PoolQC], axis=1)
houseprice = pd.concat([houseprice, Fence], axis=1)
houseprice = pd.concat([houseprice, MiscFeature], axis=1)
houseprice = pd.concat([houseprice, SaleType], axis=1)
houseprice = pd.concat([houseprice, SaleCondition], axis=1)

In [14]:
houseprice.drop(['MSZoning'], axis = 1, inplace = True)
houseprice.drop(['Street'], axis = 1, inplace = True)
houseprice.drop(['LotShape'], axis = 1, inplace = True)
houseprice.drop(['LandContour'], axis = 1, inplace = True)
houseprice.drop(['Utilities'], axis = 1, inplace = True)
houseprice.drop(['LotConfig'], axis = 1, inplace = True)
houseprice.drop(['LandSlope'], axis = 1, inplace = True)
houseprice.drop(['Neighborhood'], axis = 1, inplace = True)
houseprice.drop(['Condition1'], axis = 1, inplace = True)
houseprice.drop(['BldgType'], axis = 1, inplace = True)
houseprice.drop(['HouseStyle'], axis = 1, inplace = True)
houseprice.drop(['RoofStyle'], axis = 1, inplace = True)
houseprice.drop(['RoofMatl'], axis = 1, inplace = True)
houseprice.drop(['Exterior1st'], axis = 1, inplace = True)
houseprice.drop(['MasVnrType'], axis = 1, inplace = True)
houseprice.drop(['Alley'], axis = 1, inplace = True)
houseprice.drop(['Condition2'], axis = 1, inplace = True)
houseprice.drop(['Exterior2nd'], axis = 1, inplace = True)
houseprice.drop(['ExterQual'], axis = 1, inplace = True)
houseprice.drop(['ExterCond'], axis = 1, inplace = True)
houseprice.drop(['Foundation'], axis = 1, inplace = True)
houseprice.drop(['BsmtQual'], axis = 1, inplace = True)
houseprice.drop(['BsmtCond'], axis = 1, inplace = True)
houseprice.drop(['BsmtExposure'], axis = 1, inplace = True)
houseprice.drop(['BsmtFinType1'], axis = 1, inplace = True)
houseprice.drop(['BsmtFinType2'], axis = 1, inplace = True)
houseprice.drop(['Heating'], axis = 1, inplace = True)
houseprice.drop(['HeatingQC'], axis = 1, inplace = True)
houseprice.drop(['CentralAir'], axis = 1, inplace = True)
houseprice.drop(['Electrical'], axis = 1, inplace = True)
houseprice.drop(['KitchenQual'], axis = 1, inplace = True)
houseprice.drop(['Functional'], axis = 1, inplace = True)
houseprice.drop(['FireplaceQu'], axis = 1, inplace = True)
houseprice.drop(['GarageType'], axis = 1, inplace = True)
houseprice.drop(['GarageFinish'], axis = 1, inplace = True)
houseprice.drop(['GarageQual'], axis = 1, inplace = True)
houseprice.drop(['GarageCond'], axis = 1, inplace = True)
houseprice.drop(['PavedDrive'], axis = 1, inplace = True)
houseprice.drop(['PoolQC'], axis = 1, inplace = True)
houseprice.drop(['Fence'], axis = 1, inplace = True)
houseprice.drop(['MiscFeature'], axis = 1, inplace = True)
houseprice.drop(['SaleType'], axis = 1, inplace = True)
houseprice.drop(['SaleCondition'], axis = 1, inplace = True)
houseprice

Unnamed: 0,Id,MSSubClass,LotFrontage,LotArea,OverallQual,OverallCond,YearBuilt,YearRemodAdd,MasVnrArea,BsmtFinSF1,BsmtFinSF2,BsmtUnfSF,TotalBsmtSF,1stFlrSF,2ndFlrSF,LowQualFinSF,GrLivArea,BsmtFullBath,BsmtHalfBath,FullBath,HalfBath,BedroomAbvGr,KitchenAbvGr,TotRmsAbvGrd,Fireplaces,GarageYrBlt,GarageCars,GarageArea,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SalePrice,MSZoning_FV,MSZoning_RH,MSZoning_RL,MSZoning_RM,Street_FV,Street_RH,Street_RL,Street_RM,LotShape_FV,LotShape_RH,LotShape_RL,LotShape_RM,LandContour_FV,LandContour_RH,LandContour_RL,LandContour_RM,Utilities_FV,Utilities_RH,Utilities_RL,Utilities_RM,LotConfig_FV,LotConfig_RH,LotConfig_RL,LotConfig_RM,LandSlope_FV,LandSlope_RH,LandSlope_RL,LandSlope_RM,Neighborhood_FV,Neighborhood_RH,Neighborhood_RL,Neighborhood_RM,Condition1_FV,Condition1_RH,Condition1_RL,Condition1_RM,BldgType_FV,BldgType_RH,BldgType_RL,BldgType_RM,HouseStyle_FV,HouseStyle_RH,HouseStyle_RL,HouseStyle_RM,RoofStyle_FV,RoofStyle_RH,RoofStyle_RL,RoofStyle_RM,RoofMatl_FV,RoofMatl_RH,RoofMatl_RL,RoofMatl_RM,Exterior1st_FV,Exterior1st_RH,Exterior1st_RL,Exterior1st_RM,MasVnrType_FV,MasVnrType_RH,MasVnrType_RL,MasVnrType_RM,Alley_Pave,Condition2_Feedr,Condition2_Norm,Condition2_PosA,Condition2_PosN,Condition2_RRAe,Condition2_RRAn,Condition2_RRNn,Exterior2nd_AsphShn,Exterior2nd_Brk Cmn,Exterior2nd_BrkFace,Exterior2nd_CBlock,Exterior2nd_CmentBd,Exterior2nd_HdBoard,Exterior2nd_ImStucc,Exterior2nd_MetalSd,Exterior2nd_Other,Exterior2nd_Plywood,Exterior2nd_Stone,Exterior2nd_Stucco,Exterior2nd_VinylSd,Exterior2nd_Wd Sdng,Exterior2nd_Wd Shng,ExterQual_Fa,ExterQual_Gd,ExterQual_TA,ExterCond_Fa,ExterCond_Gd,ExterCond_Po,ExterCond_TA,Foundation_CBlock,Foundation_PConc,Foundation_Slab,Foundation_Stone,Foundation_Wood,BsmtQual_Fa,BsmtQual_Gd,BsmtQual_TA,BsmtCond_Gd,BsmtCond_Po,BsmtCond_TA,BsmtExposure_Gd,BsmtExposure_Mn,BsmtExposure_No,BsmtFinType1_BLQ,BsmtFinType1_GLQ,BsmtFinType1_LwQ,BsmtFinType1_Rec,BsmtFinType1_Unf,BsmtFinType2_BLQ,BsmtFinType2_GLQ,BsmtFinType2_LwQ,BsmtFinType2_Rec,BsmtFinType2_Unf,Heating_GasA,Heating_GasW,Heating_Grav,Heating_OthW,Heating_Wall,HeatingQC_Fa,HeatingQC_Gd,HeatingQC_Po,HeatingQC_TA,CentralAir_Y,Electrical_FuseF,Electrical_FuseP,Electrical_Mix,Electrical_SBrkr,KitchenQual_Fa,KitchenQual_Gd,KitchenQual_TA,Functional_Maj2,Functional_Min1,Functional_Min2,Functional_Mod,Functional_Sev,Functional_Typ,FireplaceQu_Fa,FireplaceQu_Gd,FireplaceQu_Po,FireplaceQu_TA,GarageType_Attchd,GarageType_Basment,GarageType_BuiltIn,GarageType_CarPort,GarageType_Detchd,GarageFinish_RFn,GarageFinish_Unf,GarageQual_Fa,GarageQual_Gd,GarageQual_Po,GarageQual_TA,GarageCond_Fa,GarageCond_Gd,GarageCond_Po,GarageCond_TA,PavedDrive_P,PavedDrive_Y,PoolQC_Fa,PoolQC_Gd,Fence_GdWo,Fence_MnPrv,Fence_MnWw,MiscFeature_Othr,MiscFeature_Shed,MiscFeature_TenC,SaleType_CWD,SaleType_Con,SaleType_ConLD,SaleType_ConLI,SaleType_ConLw,SaleType_New,SaleType_Oth,SaleType_WD,SaleCondition_AdjLand,SaleCondition_Alloca,SaleCondition_Family,SaleCondition_Normal,SaleCondition_Partial
0,1,60,65.0,8450,7,5,2003,2003,196.0,706,0,150,856,856,854,0,1710,1,0,2,1,3,1,8,0,2003.0,2,548,0,61,0,0,0,0,0,2,2008,208500,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0
1,2,20,80.0,9600,6,8,1976,1976,0.0,978,0,284,1262,1262,0,0,1262,0,1,2,0,3,1,6,1,1976.0,2,460,298,0,0,0,0,0,0,5,2007,181500,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0
2,3,60,68.0,11250,7,5,2001,2002,162.0,486,0,434,920,920,866,0,1786,1,0,2,1,3,1,6,1,2001.0,2,608,0,42,0,0,0,0,0,9,2008,223500,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0
3,4,70,60.0,9550,7,5,1915,1970,0.0,216,0,540,756,961,756,0,1717,1,0,1,0,3,1,7,1,1998.0,3,642,0,35,272,0,0,0,0,2,2006,140000,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,1,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
4,5,60,84.0,14260,8,5,2000,2000,350.0,655,0,490,1145,1145,1053,0,2198,1,0,2,1,4,1,9,1,2000.0,3,836,192,84,0,0,0,0,0,12,2008,250000,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1455,1456,60,62.0,7917,6,5,1999,2000,0.0,0,0,953,953,953,694,0,1647,0,0,2,1,3,1,7,1,1999.0,2,460,0,40,0,0,0,0,0,8,2007,175000,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0
1456,1457,20,85.0,13175,6,6,1978,1988,119.0,790,163,589,1542,2073,0,0,2073,1,0,2,0,3,1,7,2,1978.0,2,500,349,0,0,0,0,0,0,2,2010,210000,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,1,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0
1457,1458,70,66.0,9042,7,9,1941,2006,0.0,275,0,877,1152,1188,1152,0,2340,0,0,2,0,4,1,9,2,1941.0,1,252,0,60,0,0,0,0,2500,5,2010,266500,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0
1458,1459,20,68.0,9717,5,6,1950,1996,0.0,49,1029,0,1078,1078,0,0,1078,1,0,1,0,2,1,5,0,1950.0,1,240,366,0,112,0,0,0,0,4,2010,142125,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,1,0,0,1,0,1,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0


In [18]:
#Use these lines of code to print out the column names which contain at least one null value:

for column in houseprice:
    if houseprice[column].isnull().any():
       print('{0} has {1} null values'.format(column, houseprice[column].isnull().sum()))

LotFrontage has 259 null values
MasVnrArea has 8 null values
GarageYrBlt has 81 null values


##### Details of null values in Dataset
LotFrontage has 259 null values <br>
MasVnrArea has 8 null values <br>
GarageYrBlt has 81 null values <br>

#### Replace null or NA with mean of the column

In [19]:
# Replace value NA with mean of the column

houseprice['LotFrontage'].fillna((houseprice['LotFrontage'].mean()), inplace=True)
houseprice['MasVnrArea'].fillna((houseprice['MasVnrArea'].mean()), inplace=True)
houseprice['GarageYrBlt'].fillna((houseprice['GarageYrBlt'].mean()), inplace=True)

## 5. EDA

## 6. Feature Selection using Random Forest

In [24]:
import pandas as pd
from sklearn.ensemble.forest import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split

In [22]:
features = ['MSSubClass', 'LotFrontage', 'LotArea', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces', 'GarageYrBlt', 'GarageCars', 'GarageArea', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'MiscVal', 'MoSold', 'YrSold', 'MSZoning_FV', 'MSZoning_RH', 'MSZoning_RL', 'MSZoning_RM', 'Street_FV', 'Street_RH', 'Street_RL', 'Street_RM', 'LotShape_FV', 'LotShape_RH', 'LotShape_RL', 'LotShape_RM', 'LandContour_FV', 'LandContour_RH', 'LandContour_RL', 'LandContour_RM', 'Utilities_FV', 'Utilities_RH', 'Utilities_RL', 'Utilities_RM', 'LotConfig_FV', 'LotConfig_RH', 'LotConfig_RL', 'LotConfig_RM', 'LandSlope_FV', 'LandSlope_RH', 'LandSlope_RL', 'LandSlope_RM', 'Neighborhood_FV', 'Neighborhood_RH', 'Neighborhood_RL', 'Neighborhood_RM', 'Condition1_FV', 'Condition1_RH', 'Condition1_RL', 'Condition1_RM', 'BldgType_FV', 'BldgType_RH', 'BldgType_RL', 'BldgType_RM', 'HouseStyle_FV', 'HouseStyle_RH', 'HouseStyle_RL', 'HouseStyle_RM', 'RoofStyle_FV', 'RoofStyle_RH', 'RoofStyle_RL', 'RoofStyle_RM', 'RoofMatl_FV', 'RoofMatl_RH', 'RoofMatl_RL', 'RoofMatl_RM', 'Exterior1st_FV', 'Exterior1st_RH', 'Exterior1st_RL', 'Exterior1st_RM', 'MasVnrType_FV', 'MasVnrType_RH', 'MasVnrType_RL', 'MasVnrType_RM', 'Alley_Pave', 'Condition2_Feedr', 'Condition2_Norm', 'Condition2_PosA', 'Condition2_PosN', 'Condition2_RRAe', 'Condition2_RRAn', 'Condition2_RRNn', 'Exterior2nd_AsphShn', 'Exterior2nd_Brk Cmn', 'Exterior2nd_BrkFace', 'Exterior2nd_CBlock', 'Exterior2nd_CmentBd', 'Exterior2nd_HdBoard', 'Exterior2nd_ImStucc', 'Exterior2nd_MetalSd', 'Exterior2nd_Other', 'Exterior2nd_Plywood', 'Exterior2nd_Stone', 'Exterior2nd_Stucco', 'Exterior2nd_VinylSd', 'Exterior2nd_Wd Sdng', 'Exterior2nd_Wd Shng', 'ExterQual_Fa', 'ExterQual_Gd', 'ExterQual_TA', 'ExterCond_Fa', 'ExterCond_Gd', 'ExterCond_Po', 'ExterCond_TA', 'Foundation_CBlock', 'Foundation_PConc', 'Foundation_Slab', 'Foundation_Stone', 'Foundation_Wood', 'BsmtQual_Fa', 'BsmtQual_Gd', 'BsmtQual_TA', 'BsmtCond_Gd', 'BsmtCond_Po', 'BsmtCond_TA', 'BsmtExposure_Gd', 'BsmtExposure_Mn', 'BsmtExposure_No', 'BsmtFinType1_BLQ', 'BsmtFinType1_GLQ', 'BsmtFinType1_LwQ', 'BsmtFinType1_Rec', 'BsmtFinType1_Unf', 'BsmtFinType2_BLQ', 'BsmtFinType2_GLQ', 'BsmtFinType2_LwQ', 'BsmtFinType2_Rec', 'BsmtFinType2_Unf', 'Heating_GasA', 'Heating_GasW', 'Heating_Grav', 'Heating_OthW', 'Heating_Wall', 'HeatingQC_Fa', 'HeatingQC_Gd', 'HeatingQC_Po', 'HeatingQC_TA', 'CentralAir_Y', 'Electrical_FuseF', 'Electrical_FuseP', 'Electrical_Mix', 'Electrical_SBrkr', 'KitchenQual_Fa', 'KitchenQual_Gd', 'KitchenQual_TA', 'Functional_Maj2', 'Functional_Min1', 'Functional_Min2', 'Functional_Mod', 'Functional_Sev', 'Functional_Typ', 'FireplaceQu_Fa', 'FireplaceQu_Gd', 'FireplaceQu_Po', 'FireplaceQu_TA', 'GarageType_Attchd', 'GarageType_Basment', 'GarageType_BuiltIn', 'GarageType_CarPort', 'GarageType_Detchd', 'GarageFinish_RFn', 'GarageFinish_Unf', 'GarageQual_Fa', 'GarageQual_Gd', 'GarageQual_Po', 'GarageQual_TA', 'GarageCond_Fa', 'GarageCond_Gd', 'GarageCond_Po', 'GarageCond_TA', 'PavedDrive_P', 'PavedDrive_Y', 'PoolQC_Fa', 'PoolQC_Gd', 'Fence_GdWo', 'Fence_MnPrv', 'Fence_MnWw', 'MiscFeature_Othr', 'MiscFeature_Shed', 'MiscFeature_TenC', 'SaleType_CWD', 'SaleType_Con', 'SaleType_ConLD', 'SaleType_ConLI', 'SaleType_ConLw', 'SaleType_New', 'SaleType_Oth', 'SaleType_WD', 'SaleCondition_AdjLand', 'SaleCondition_Alloca', 'SaleCondition_Family', 'SaleCondition_Normal', 'SaleCondition_Partial']
# create a Python list of feature names
target = ['SalePrice']                                     # Define the target variable
houseprice[features].shape

(1460, 215)

In [25]:
X_train, X_test, y_train, y_test = train_test_split(houseprice[features], houseprice[target], test_size=0.3, random_state=0)

In [26]:
print(X_train.shape)
print(y_train.shape)

(1022, 215)
(1022, 1)


In [27]:
sel = SelectFromModel(RandomForestClassifier(n_estimators = 100))
sel.fit(X_train, y_train)

SelectFromModel(estimator=RandomForestClassifier(bootstrap=True, ccp_alpha=0.0,
                                                 class_weight=None,
                                                 criterion='gini',
                                                 max_depth=None,
                                                 max_features='auto',
                                                 max_leaf_nodes=None,
                                                 max_samples=None,
                                                 min_impurity_decrease=0.0,
                                                 min_impurity_split=None,
                                                 min_samples_leaf=1,
                                                 min_samples_split=2,
                                                 min_weight_fraction_leaf=0.0,
                                                 n_estimators=100, n_jobs=None,
                                                 oob_score=False,

In [28]:
selected_feat= X_train.columns[(sel.get_support())]
len(selected_feat)

56

In [29]:
print(selected_feat)

Index(['MSSubClass', 'LotFrontage', 'LotArea', 'OverallQual', 'OverallCond',
       'YearBuilt', 'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1', 'BsmtFinSF2',
       'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF', '2ndFlrSF', 'GrLivArea',
       'BsmtFullBath', 'FullBath', 'HalfBath', 'BedroomAbvGr', 'TotRmsAbvGrd',
       'Fireplaces', 'GarageYrBlt', 'GarageCars', 'GarageArea', 'WoodDeckSF',
       'OpenPorchSF', 'EnclosedPorch', 'ScreenPorch', 'MoSold', 'YrSold',
       'Exterior2nd_HdBoard', 'Exterior2nd_MetalSd', 'Exterior2nd_Plywood',
       'Exterior2nd_VinylSd', 'Exterior2nd_Wd Sdng', 'ExterQual_Gd',
       'Foundation_CBlock', 'Foundation_PConc', 'BsmtQual_Gd', 'BsmtQual_TA',
       'BsmtExposure_No', 'BsmtFinType1_BLQ', 'BsmtFinType1_GLQ',
       'BsmtFinType1_Unf', 'HeatingQC_Gd', 'HeatingQC_TA', 'KitchenQual_Gd',
       'KitchenQual_TA', 'FireplaceQu_Gd', 'FireplaceQu_TA',
       'GarageType_Attchd', 'GarageType_Detchd', 'GarageFinish_RFn',
       'GarageFinish_Unf', 'Fence_MnPrv', 'Sal

#### Feature selected using Random Forest:
['MSSubClass', 'LotFrontage', 'LotArea', 'OverallQual', 'OverallCond',
       'YearBuilt', 'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1', 'BsmtFinSF2',
       'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF', '2ndFlrSF', 'GrLivArea',
       'BsmtFullBath', 'FullBath', 'HalfBath', 'BedroomAbvGr', 'TotRmsAbvGrd',
       'Fireplaces', 'GarageYrBlt', 'GarageCars', 'GarageArea', 'WoodDeckSF',
       'OpenPorchSF', 'EnclosedPorch', 'ScreenPorch', 'MoSold', 'YrSold',
       'Exterior2nd_HdBoard', 'Exterior2nd_MetalSd', 'Exterior2nd_Plywood',
       'Exterior2nd_VinylSd', 'Exterior2nd_Wd Sdng', 'ExterQual_Gd',
       'ExterQual_TA', 'ExterCond_TA', 'Foundation_CBlock', 'Foundation_PConc',
       'BsmtQual_Gd', 'BsmtQual_TA', 'BsmtExposure_Mn', 'BsmtExposure_No',
       'BsmtFinType1_BLQ', 'BsmtFinType1_GLQ', 'BsmtFinType1_Unf',
       'HeatingQC_Gd', 'HeatingQC_TA', 'KitchenQual_Gd', 'KitchenQual_TA',
       'FireplaceQu_Gd', 'FireplaceQu_TA', 'GarageType_Attchd',
       'GarageType_Detchd', 'GarageFinish_RFn', 'GarageFinish_Unf',
       'SaleType_WD', 'SaleCondition_Normal']