# House Prices: Advanced Regression Techniques

This notebook is a reference for anyone who wants to get started with neural networks using Keras. It is aimed at beginners.

Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.

With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.

# Data fields
Here's a brief version of what you'll find in the data description file.

* SalePrice: The property's sale price in dollars. This is the target variable that you're trying to predict.
* MSSubClass: The building class
* MSZoning: The general zoning classification
* LotFrontage: Linear feet of street connected to property
* LotArea: Lot size in square feet
* Street: Type of road access
* Alley: Type of alley access
* LotShape: General shape of property
* LandContour: Flatness of the property
* Utilities: Type of utilities available
* LotConfig: Lot configuration
* LandSlope: Slope of property
* Neighborhood: Physical locations within Ames city limits
* Condition1: Proximity to main road or railroad
* Condition2: Proximity to main road or railroad (if a second is present)
* BldgType: Type of dwelling
* HouseStyle: Style of dwelling
* OverallQual: Overall material and finish quality
* OverallCond: Overall condition rating
* YearBuilt: Original construction date
* YearRemodAdd: Remodel date
* RoofStyle: Type of roof
* RoofMatl: Roof material
* Exterior1st: Exterior covering on house
* Exterior2nd: Exterior covering on house (if more than one material)
* MasVnrType: Masonry veneer type
* MasVnrArea: Masonry veneer area in square feet
* ExterQual: Exterior material quality
* ExterCond: Present condition of the material on the exterior
* Foundation: Type of foundation
* BsmtQual: Height of the basement
* BsmtCond: General condition of the basement
* BsmtExposure: Walkout or garden level basement walls
* BsmtFinType1: Quality of basement finished area
* BsmtFinSF1: Type 1 finished square feet
* BsmtFinType2: Quality of second finished area (if present)
* BsmtFinSF2: Type 2 finished square feet
* BsmtUnfSF: Unfinished square feet of basement area
* TotalBsmtSF: Total square feet of basement area
* Heating: Type of heating
* HeatingQC: Heating quality and condition
* CentralAir: Central air conditioning
* Electrical: Electrical system
* 1stFlrSF: First Floor square feet
* 2ndFlrSF: Second floor square feet
* LowQualFinSF: Low quality finished square feet (all floors)
* GrLivArea: Above grade (ground) living area square feet
* BsmtFullBath: Basement full bathrooms
* BsmtHalfBath: Basement half bathrooms
* FullBath: Full bathrooms above grade
* HalfBath: Half baths above grade
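* BedroomAbvGr (listed as Bedroom in the data description file): Number of bedrooms above basement level
* KitchenAbvGr (listed as Kitchen in the data description file): Number of kitchens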
* KitchenQual: Kitchen quality
* TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)
* Functional: Home functionality rating
* Fireplaces: Number of fireplaces
* FireplaceQu: Fireplace quality
* GarageType: Garage location
* GarageYrBlt: Year garage was built
* GarageFinish: Interior finish of the garage
* GarageCars: Size of garage in car capacity
* GarageArea: Size of garage in square feet
* GarageQual: Garage quality
* GarageCond: Garage condition
* PavedDrive: Paved driveway
* WoodDeckSF: Wood deck area in square feet
* OpenPorchSF: Open porch area in square feet
* EnclosedPorch: Enclosed porch area in square feet
* 3SsnPorch: Three season porch area in square feet
* ScreenPorch: Screen porch area in square feet
* PoolArea: Pool area in square feet
* PoolQC: Pool quality
* Fence: Fence quality
* MiscFeature: Miscellaneous feature not covered in other categories
* MiscVal: Dollar value of miscellaneous feature
* MoSold: Month Sold
* YrSold: Year Sold
* SaleType: Type of sale
* SaleCondition: Condition of sale

# Goal

It is your job to predict the sales price for each house. For each Id in the test set, you must predict the value of the SalePrice variable. 

# Metric

Submissions are evaluated on Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sales price. (Taking logs means that errors in predicting expensive houses and cheap houses will affect the result equally.)
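
The metric described above can be sketched in a few lines of NumPy. This is only an illustration: the function name `rmse_of_logs` and the prices are made up for the example and are not part of the competition code.

```python
import numpy as np

def rmse_of_logs(y_true, y_pred):
    """RMSE between log(predicted price) and log(observed price)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2)))

# Hypothetical prices: both predictions are 10% too high.
cheap_error = rmse_of_logs([100_000], [110_000])
expensive_error = rmse_of_logs([500_000], [550_000])

# The two errors match because the log turns a relative error
# into the same absolute difference at any price level.
print(cheap_error, expensive_error)
```

A 10% miss costs the same on a cheap house as on an expensive one, which is the point of taking logs.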

# Import the Packages

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt # Data Visualization
import seaborn as sns # Advanced Data Visualization
%matplotlib inline

#OS packages
import os

#Label Encoding of Columns
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()

#Keras Neural Network Package
from keras.models import Sequential, load_model
from keras.layers import Dense, Dropout, Activation

In [None]:
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# Load the Data

In [None]:
df_Train = pd.read_csv('../input/house-prices-advanced-regression-techniques/train.csv')
df_Test = pd.read_csv('../input/house-prices-advanced-regression-techniques/test.csv')

# Exploratory Data Analysis

In [None]:
#First five rows of the training data
df_Train.head()

In [None]:
#Column datatypes and non-null counts
df_Train.info()

In [None]:
#Summary statistics of the numeric columns
df_Train.describe()

In [None]:
#Columns List
df_Train.columns

In [None]:
#Shape of the Train and Test Data
print('Shape of Train Data: ', df_Train.shape)
print('Shape of Test Data: ', df_Test.shape)

In [None]:
#Null values in the Train Dataset
print('Null values in Train Data: \n', df_Train.isnull().sum())

In [None]:
#Null Values in the Test Dataset
print('Null Values in Test Data: \n', df_Test.isnull().sum())

# Basic Feature Engineering

## Joining the Train and Test Data

In [None]:
# Concatenate the train and test sets so both get identical preprocessing;
# the is_train flag lets us separate them again later
df_Train['is_train'] = 1
df_Test['is_train'] = 0

df_Total = pd.concat([df_Train, df_Test], ignore_index=True)

## Fill Missing Values

### Find the Percentage of Missing Values in Each Column

In [None]:
#Percentage of missing values per column, largest first
#(SalePrice is listed because the test rows have no target value)

null_value = (df_Total.isnull().mean() * 100).to_frame('DF_TOTAL')
null_value[null_value['DF_TOTAL'] > 0].sort_values(by='DF_TOTAL', ascending=False)

### Remove the Columns with more than 40% Missing Values

In [None]:
#Dropping the feature columns with more than 40% null values

df_Total.drop(columns=['PoolQC', 'MiscFeature', 'Alley', 'Fence', 'FireplaceQu'], inplace=True)
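
The columns above are listed by hand. As a rough sketch, the same selection could be derived from the threshold itself. The toy frame, the `threshold` variable, and the values below are made up for illustration; the target column is excluded explicitly because it is empty for every test row.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df_Total; all values are hypothetical.
toy = pd.DataFrame({
    "PoolQC": [np.nan, np.nan, np.nan, "Gd", np.nan],
    "LotFrontage": [65.0, np.nan, 68.0, 60.0, 84.0],
    "SalePrice": [208500.0, 181500.0, np.nan, np.nan, np.nan],
})

threshold = 40  # percent of missing values above which a column is dropped
missing_pct = toy.isnull().mean() * 100

# Collect columns over the threshold, but never drop the target.
to_drop = [col for col in missing_pct[missing_pct > threshold].index
           if col != "SalePrice"]
toy_reduced = toy.drop(columns=to_drop)
print(to_drop)
```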

### Fill the Missing Values with Forward Fill

In [None]:
#Forward fill: each missing value is replaced by the last valid value above it in the same column

ffill_columns = [
    'LotFrontage', 'GarageCond', 'GarageYrBlt', 'GarageFinish', 'GarageQual',
    'GarageType', 'BsmtExposure', 'BsmtCond', 'BsmtQual', 'BsmtFinType2',
    'BsmtFinType1', 'MasVnrType', 'MasVnrArea', 'MSZoning', 'Functional',
    'BsmtHalfBath', 'BsmtFullBath', 'Utilities', 'KitchenQual', 'TotalBsmtSF',
    'BsmtUnfSF', 'GarageCars', 'GarageArea', 'BsmtFinSF2', 'BsmtFinSF1',
    'Exterior2nd', 'Exterior1st', 'SaleType', 'Electrical',
]
# DataFrame.ffill() replaces the deprecated fillna(method="ffill")
df_Total[ffill_columns] = df_Total[ffill_columns].ffill()
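
To make the behaviour of forward fill concrete, here is a small sketch on a made-up series. It also shows a median fill for comparison; the median fill is not used in this notebook and appears only as an assumption-labelled alternative.

```python
import numpy as np
import pandas as pd

# Hypothetical LotFrontage values with gaps.
s = pd.Series([np.nan, 65.0, np.nan, np.nan, 80.0], name="LotFrontage")

# Forward fill copies the last valid value downwards.
# A gap at the very top has nothing above it and stays missing.
forward_filled = s.ffill()

# Alternative (not the notebook's method): fill with the column median.
median_filled = s.fillna(s.median())

print(forward_filled.tolist())
print(median_filled.tolist())
```

Because the rows of this dataset are not ordered in any meaningful way, a forward-filled value simply comes from whichever house happens to sit in the row above.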

In [None]:
#Percentage of missing values per column after filling

null_value = (df_Total.isnull().mean() * 100).to_frame('DF_TOTAL')
null_value[null_value['DF_TOTAL'] > 0].sort_values(by='DF_TOTAL', ascending=False)

## Encoding Of Columns

### One Hot Encoding

In [None]:
"""
#get dummies

Column_Object = df_Total.dtypes[df_Total.dtypes == 'object'].index
df_Total = pd.get_dummies(df_Total, columns = Column_Object, dummy_na = True)

"""

In [None]:
# One-hot encode the multi-category columns in a single call
one_hot_columns = [
    "MSZoning", "LotShape", "LandContour", "LotConfig", "LandSlope",
    "Neighborhood", "Condition1", "Condition2", "BldgType", "HouseStyle",
    "RoofStyle", "RoofMatl", "Exterior1st", "Exterior2nd", "MasVnrType",
    "ExterQual", "ExterCond", "Foundation", "BsmtQual", "BsmtCond",
    "BsmtExposure", "BsmtFinType1", "BsmtFinType2", "Heating", "HeatingQC",
    "Electrical", "KitchenQual", "Functional", "GarageType", "GarageFinish",
    "GarageQual", "GarageCond", "PavedDrive", "SaleType", "SaleCondition",
]
df_Total = pd.get_dummies(df_Total, columns=one_hot_columns)

df_Total['Street'] = le.fit_transform(df_Total['Street'])
df_Total['Utilities'] = le.fit_transform(df_Total['Utilities'])
df_Total['CentralAir'] = le.fit_transform(df_Total['CentralAir'])
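
The two encodings used above behave differently, which a toy example may make clearer. The frame below is made up, and `pd.factorize(..., sort=True)` is used here only as a pandas stand-in for `LabelEncoder`; for string columns without missing values it should produce the same alphabetical integer codes.

```python
import pandas as pd

# Hypothetical rows with one multi-category and one two-valued column.
toy = pd.DataFrame({
    "MSZoning": ["RL", "RM", "RL"],
    "CentralAir": ["Y", "N", "Y"],
})

# One-hot encoding: one 0/1 column per category.
one_hot = pd.get_dummies(toy, columns=["MSZoning"], dtype=int)

# Integer (label) encoding: each category becomes a single number.
codes, classes = pd.factorize(toy["CentralAir"], sort=True)
one_hot["CentralAir"] = codes

print(one_hot)
print(list(classes))
```

Integer codes are compact but imply an order between categories, so they fit two-valued columns best; one-hot columns avoid that implied order at the cost of a wider table.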

In [None]:
df_Total.shape

## Un-Merge the Train and Test Data after Feature Engineering

In [None]:
#Un-Merge code
df_Train_final = df_Total[df_Total['is_train'] == 1]
df_Test_final = df_Total[df_Total['is_train'] == 0]
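
The reason for merging before encoding and splitting afterwards can be shown on a toy example. The two small frames below are made up; the point is that a category present in only one of the sets would otherwise produce mismatched columns.

```python
import pandas as pd

# Hypothetical train and test sets with different category values.
train = pd.DataFrame({"MSZoning": ["RL", "RM"]})
test = pd.DataFrame({"MSZoning": ["RL", "FV"]})

# Encoding each set on its own gives different column sets.
separate_train_cols = list(pd.get_dummies(train).columns)
separate_test_cols = list(pd.get_dummies(test).columns)

# Encoding the merged frame and splitting afterwards keeps them aligned.
combined = pd.concat(
    [train.assign(is_train=1), test.assign(is_train=0)], ignore_index=True
)
combined = pd.get_dummies(combined, columns=["MSZoning"])
train_enc = combined[combined["is_train"] == 1].drop(columns="is_train")
test_enc = combined[combined["is_train"] == 0].drop(columns="is_train")

print(separate_train_cols, separate_test_cols)
print(list(train_enc.columns))
```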

# Keras NN Model

## Split the Data into x and y Variables

In [None]:
# Features and target for training; cast to float32 so Keras receives a purely numeric array
x = df_Train_final.drop(['Id', 'is_train', 'SalePrice'], axis=1).astype('float32')
y = df_Train_final['SalePrice']

# Features for prediction (SalePrice is empty for the test rows)
x_pred = df_Test_final.drop(['Id', 'is_train', 'SalePrice'], axis=1).astype('float32')

In [None]:
x.shape

In [None]:
x_pred.shape

In [None]:
model = Sequential()

# Take the input width from the data instead of hard-coding the column count
model.add(Dense(128, input_shape=(x.shape[1],)))
model.add(Activation('relu'))
model.add(Dropout(0.5))

model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))

model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dropout(0.5))

model.add(Dense(1))

model.compile(loss='mean_squared_error', metrics=['mse'], optimizer='adam')

In [None]:
history = model.fit(x, np.ravel(y), epochs=500, verbose=2)
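
Since the competition metric is computed on log prices, one common option is to train on a log-transformed target and convert the predictions back afterwards. This is not what the notebook does; the sketch below only shows the round trip with NumPy, using made-up prices and a placeholder in place of real model output.

```python
import numpy as np

# Hypothetical sale prices standing in for y.
prices = np.array([208500.0, 181500.0, 223500.0])

# Train on log(1 + price) instead of the raw price.
y_log = np.log1p(prices)

# Placeholder: pretend the model reproduced the log targets exactly.
predicted_log = y_log.copy()

# Undo the transform to get predictions back in dollars.
predicted_prices = np.expm1(predicted_log)
print(predicted_prices)
```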

In [None]:
#Plotting the training metrics per epoch
#(the 'mse' metric and the loss coincide because the loss is mean squared error)
fig = plt.figure()
plt.subplot(2,1,1)
plt.plot(history.history['mse'])
plt.plot(history.history['loss'])
plt.title('Training Loss')
plt.ylabel('Mean Squared Error')
plt.xlabel('Epoch')
plt.legend(['Mean Squared Error', 'Loss'], loc='upper right')
fig

In [None]:
# This is a regression model, so use predict() to get continuous prices
y_pred = model.predict(x_pred)

In [None]:
# Flatten the (n, 1) prediction array to shape (n,)
y_pred = y_pred.reshape(-1)

In [None]:
submission_df = pd.DataFrame({'Id':df_Test['Id'], 'SalePrice':y_pred})
submission_df.to_csv('Sample Submission.csv', index=False)

**If you have any additional information, ideas, or feedback for improving this notebook, please mention them in the comments below.**

**If you find this notebook useful, please show your appreciation; it will encourage me to cover more topics for beginners.**

# Warning

**The dataset for the House Prices regression challenge is too small to train a neural network well. I created this notebook purely for educational purposes, for anyone who needs base code to get started with Keras neural networks.**