# Predicting House Prices with Regression

1. Understand the problem statement
2. Understand the dataset
3. Data normalization
4. Convert Label Value
5. Train and Test split
6. Create a neural network model
7. Train the model to fit the dataset
8. Evaluate the model
9. Visualize the predictions

## Understand the problem statement

For this project, we will predict the sale price of a house from the following features:

1. Year of sale of the house
2. The age of the house at the time of sale
3. Distance from city center
4. Number of stores in the locality
5. The latitude
6. The longitude

![Regression](regression.png)

Note: This notebook uses Python 3 and these packages: `tensorflow`, `pandas`, `matplotlib`, `scikit-learn`.

## Importing Libraries & Helper Functions

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
import logging

from utils import *
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping, LambdaCallback

%matplotlib inline
logging.getLogger("tensorflow").setLevel(logging.ERROR)

print('Libraries imported.')

Libraries imported.


In [3]:
# data.csv has no header row, so the column names are supplied explicitly
df = pd.read_csv("data.csv", names=['Serial', 'Date', 'Age', 'Distance', 'Stores', 'Latitude', 'Longitude', 'Price'])
df.head(5)

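|   | Serial | Date | Age | Distance | Stores | Latitude | Longitude | Price |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2009 | 21 | 9 | 6 | 84 | 121 | 14264 |
| 1 | 1 | 2007 | 4 | 2 | 3 | 86 | 121 | 12032 |
| 2 | 2 | 2016 | 18 | 3 | 7 | 90 | 120 | 13560 |
| 3 | 3 | 2002 | 13 | 2 | 2 | 80 | 128 | 12029 |
| 4 | 4 | 2014 | 25 | 5 | 8 | 81 | 122 | 14157 |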


## Understand the dataset

In [4]:
df.columns

Index(['Serial', 'Date', 'Age', 'Distance', 'Stores', 'Latitude', 'Longitude',
       'Price'],
      dtype='object')

In [5]:
df.describe()

|   | Serial | Date | Age | Distance | Stores | Latitude | Longitude | Price |
|---|---|---|---|---|---|---|---|---|
| count | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 |
| mean | 2499.5 | 2008.9128 | 18.945 | 4.9778 | 4.915 | 84.9714 | 124.9942 | 13906.6386 |
| std | 1443.520003 | 5.457578 | 11.329539 | 3.199837 | 3.142889 | 3.16199 | 3.167992 | 1020.774876 |
| min | 0.0 | 2000.0 | 0.0 | 0.0 | 0.0 | 80.0 | 120.0 | 11263.0 |
| 25% | 1249.75 | 2004.0 | 9.0 | 2.0 | 2.0 | 82.0 | 122.0 | 13197.75 |
| 50% | 2499.5 | 2009.0 | 19.0 | 5.0 | 5.0 | 85.0 | 125.0 | 13893.5 |
| 75% | 3749.25 | 2014.0 | 29.0 | 8.0 | 8.0 | 88.0 | 128.0 | 14614.0 |
| max | 4999.0 | 2018.0 | 38.0 | 10.0 | 10.0 | 90.0 | 130.0 | 16964.0 |


### Check Missing Data

It's good practice to check whether the data has any missing values. Missing values are common in real-world data and must be handled before any data preprocessing or model training.

In [6]:
df.isna().sum()

Serial       0
Date         0
Age          0
Distance     0
Stores       0
Latitude     0
Longitude    0
Price        0
dtype: int64
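This dataset has no missing values, so nothing needs to be done here. When data does have gaps, two common pandas options are dropping incomplete rows with `dropna()` or filling each gap with a column statistic, such as the mean, via `fillna()`. Below is a minimal sketch on a small made-up DataFrame. The name `toy` and its values are hypothetical and do not come from `data.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value per column (not from data.csv)
toy = pd.DataFrame({
    'Age': [21, np.nan, 18, 13],
    'Price': [14264, 12032, np.nan, 12029],
})

# Count missing values per column, as in the check above
print(toy.isna().sum())

# Option 1: drop every row that contains at least one missing value
dropped = toy.dropna()

# Option 2: replace each missing value with the mean of its column
filled = toy.fillna(toy.mean())

print(dropped)
print(filled)
```

Dropping rows is simplest but discards data. Mean-filling keeps every row but can shrink the spread of a column when many values are missing.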

## Data normalization

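Normalizing the data before training makes it easier for gradient-based optimization algorithms to find minima. Standardizing each column to zero mean and unit standard deviation puts features with very different ranges on a comparable scale. For example, `Date` spans 2000 to 2018 while `Distance` spans 0 to 10. The `Serial` column is only a row identifier, so it is dropped first.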

In [None]:
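# Drop the first column ('Serial'), which only identifies rows
df = df.iloc[:, 1:]
# Standardize every column: subtract its mean, then divide by its standard deviation
df_norm = (df - df.mean()) / df.std()
df_norm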

|   | Age | Distance | Stores | Latitude | Longitude | Price |
|---|---|---|---|---|---|---|
| 0 | 0.181384 | 1.257002 | 0.345224 | -0.307212 | -1.260799 | 0.350088 |
| 1 | -1.319118 | -0.930610 | -0.609312 | 0.325301 | -1.260799 | -1.836486 |
| 2 | -0.083410 | -0.618094 | 0.663402 | 1.590328 | -1.576456 | -0.339584 |
| 3 | -0.524735 | -0.930610 | -0.927491 | -1.572238 | 0.948803 | -1.839425 |
| 4 | 0.534444 | 0.006938 | 0.981581 | -1.255981 | -0.945141 | 0.245266 |
| ... | ... | ... | ... | ... | ... | ... |
| 4995 | -0.171675 | 0.319454 | -0.609312 | 1.590328 | 0.001831 | -0.360156 |
| 4996 | -1.054324 | 1.569518 | -1.563848 | 0.009045 | 1.264460 | 0.833055 |
| 4997 | -1.142588 | 1.569518 | 0.027045 | 1.590328 | 0.001831 | 0.191385 |
| 4998 | 1.593622 | -0.618094 | 0.027045 | -1.255981 | 0.948803 | 0.398091 |
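Written out, the normalization applied to each column is the z-score. Here $\mu$ and $\sigma$ are that column's mean and standard deviation; pandas' `std()` computes the sample standard deviation by default. As a check against the first row of the table above, using the `Age` statistics from `df.describe()`:

$$
z = \frac{x - \mu}{\sigma},
\qquad
z_{\text{Age}} = \frac{21 - 18.945}{11.329539} \approx 0.1814
$$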


## Convert Label Value


Because the labels (prices) were normalized, a trained model's predictions will also be on the normalized scale. To recover prices, we reverse the normalization: multiply each prediction by the standard deviation of the original prices, then add their mean.

In [7]:
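# Mean and standard deviation of the original (un-normalized) prices
y_mean = df['Price'].mean()
y_std = df['Price'].std()

def convert_label_values(pred):
    # Undo the z-score normalization and round to the nearest whole price
    return int(round(pred * y_std + y_mean))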

convert_label_values(0.350088)

14264
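`convert_label_values` converts one scalar at a time, while a Keras model's `predict()` returns a NumPy array of predictions. Below is a sketch of a vectorized variant; the function name `convert_label_array` is hypothetical. The mean and standard deviation are copied from the `df.describe()` output so the snippet runs on its own. In the notebook itself you would use `y_mean` and `y_std` instead.

```python
import numpy as np

# Price statistics copied from the df.describe() output (display-rounded)
PRICE_MEAN = 13906.6386
PRICE_STD = 1020.774876

def convert_label_array(preds):
    """Hypothetical vectorized counterpart of convert_label_values."""
    # Accepts any array shape, e.g. the (n, 1) output of a Dense(1) layer
    preds = np.asarray(preds, dtype=float)
    # Undo the normalization, then round to the nearest whole price
    return np.rint(preds * PRICE_STD + PRICE_MEAN).astype(int)

# Normalized prices from the first two rows of df_norm
print(convert_label_array([0.350088, -1.836486]))  # [14264 12032]
```

`np.rint` rounds halves to the nearest even integer, the same rule Python's built-in `round` uses. The results therefore match `convert_label_values` element by element.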

## Train and Test split


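In [46]:
# Separate the features from the label
x = df_norm.drop(columns=['Price'])
y = df_norm['Price']

# train_test_split returns x_train, x_test, y_train, y_test, in that order
x_train, x_test, y_train, y_test = train_test_split(x, y)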
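`train_test_split` returns a train/test pair for every array passed to it, flattened into one list. With a single DataFrame it returns just two objects; with separate features and labels it returns `x_train, x_test, y_train, y_test`. For this reason, `(x_train, y_train), (x_test, y_test) = train_test_split(df_norm)` fails with `ValueError: too many values to unpack`: each returned DataFrame is then unpacked over its column labels. The sketch below shows the return structure on a hypothetical 8-row DataFrame named `toy`, which is not the notebook's data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data (not from data.csv): Price is always Age + 100
toy = pd.DataFrame({'Age': range(8), 'Price': range(100, 108)})

# One input array -> two outputs: the train rows and the test rows
parts = train_test_split(toy, test_size=0.25, random_state=0)
print(len(parts))  # 2

# Two input arrays -> four outputs, train/test for each input in turn
x_train, x_test, y_train, y_test = train_test_split(
    toy[['Age']], toy['Price'], test_size=0.25, random_state=0
)
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
# (6, 1) (2, 1) (6,) (2,)

# Arrays passed in the same call are split with the same row indices,
# so every label still lines up with its features
print((y_train - x_train['Age']).eq(100).all())  # True
```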