# Let's Start Our Journey

In this journey, we are going to try to predict house prices.

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns

raw_data = pd.read_csv('../input/housesalesprediction/kc_house_data.csv')
raw_data.head()

In [None]:
raw_data.columns

In [None]:
raw_data.info()

There are no missing values.
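A quick way to double-check this is to count the nulls per column. This is a minimal sketch on a small made-up frame (the `toy` DataFrame is hypothetical, not the King County data):

```python
import numpy as np
import pandas as pd

# hypothetical toy frame with one missing value, for illustration only
toy = pd.DataFrame({'price': [221900.0, 538000.0, np.nan],
                    'bedrooms': [3, 3, 2]})

# count missing values per column
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# True only when no column contains a missing value
has_no_missing = bool(missing_per_column.sum() == 0)
print(has_no_missing)
```

On the real data, `raw_data.isnull().sum()` should show 0 for every column if the reading of `info()` is right.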

# **Variable Definition**

* *id*: unique ID for each home sold
* *date*: date of the home sale
* *price*: price of each home sold
* *bedrooms*: number of bedrooms
* *bathrooms*: number of bathrooms
* *sqft_living*: square footage of the apartment's interior living space
* *sqft_lot*: square footage of the land space
* *floors*: number of floors
* *waterfront*: a dummy variable for whether the apartment overlooks the waterfront or not
* *view*: an index from 0 to 4 of how good the view of the property is
* *condition*: an index from 1 to 5 on the condition of the apartment
* *grade*: an index from 1 to 13, where 1-3 falls short of building construction and design, 7 has an average level of construction and design, and 11-13 have a high-quality level of construction and design
* *sqft_above*: the square footage of the interior housing space that is above ground level
* *sqft_basement*: the square footage of the interior housing space that is below ground level
* *yr_built*: the year the house was initially built
* *yr_renovated*: the year of the house's last renovation
* *zipcode*: the zipcode area the house is in
* *lat*: latitude
* *long*: longitude
* *sqft_living15*: the square footage of interior housing living space for the nearest 15 neighbors
* *sqft_lot15*: the square footage of the land lots of the nearest 15 neighbors

Source: https://www.slideshare.net/PawanShivhare1/predicting-king-county-house-prices

# First Things First

## Drop the unnecessary variables




In [None]:
# We don't need the ID, so we remove it from our data
data = raw_data.drop('id', axis = 1)

# Drop zipcode, because latitude and longitude already describe the location
data = data.drop('zipcode', axis = 1)

# I don't fully understand sqft_living15 and sqft_lot15, so I decided to remove them
data = data.drop(['sqft_living15','sqft_lot15'], axis = 1)

## Remove Impossible Houses

In this section, we remove the rows that meet any of these conditions:

* sqft_lot < sqft_living
* sqft_lot < sqft_above
* sqft_lot < sqft_basement

because we assume a house can't be larger than the land it stands on. This is a simplification: a multi-story house can have more living area than lot area.

In [None]:
data = data[data['sqft_lot'] >= data['sqft_living']]
data = data[data['sqft_lot'] >= data['sqft_above']]
data = data[data['sqft_lot'] >= data['sqft_basement']]
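The three filters can also be written as one boolean mask, which makes it easy to see how many rows are removed. A minimal sketch, assuming a small hypothetical frame (`toy`) in place of the real data:

```python
import pandas as pd

# hypothetical rows: the second one has more living space than land
toy = pd.DataFrame({'sqft_lot':      [5000, 1000, 8000],
                    'sqft_living':   [2000, 1500, 3000],
                    'sqft_above':    [1500, 1000, 3000],
                    'sqft_basement': [500,  500,  0]})

# combine the three conditions into one boolean mask
possible = (
    (toy['sqft_lot'] >= toy['sqft_living'])
    & (toy['sqft_lot'] >= toy['sqft_above'])
    & (toy['sqft_lot'] >= toy['sqft_basement'])
)
filtered = toy[possible]
print(len(toy) - len(filtered), 'row(s) removed')
```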

# Preprocessing each Variable

## Date

We should split this column into day and month.

In [None]:

# convert the date strings into pandas datetime values
# (one call is enough; pandas infers the format by default)
data['date'] = pd.to_datetime(data['date'])

In [None]:
data['day'] = data['date'].dt.day
data['month'] = data['date'].dt.month

# We don't need the year, because it contains only two values (2014 and 2015)
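Here is a small sketch of the parse-then-split step on made-up dates. The date strings and the explicit `'%Y%m%dT%H%M%S'` format are assumptions for illustration; check them against the actual `date` column before relying on them.

```python
import pandas as pd

# hypothetical date strings; the '%Y%m%dT%H%M%S' layout is an assumption
toy = pd.DataFrame({'date': ['20141013T000000', '20150225T000000']})

toy['date'] = pd.to_datetime(toy['date'], format='%Y%m%dT%H%M%S')

# the .dt accessor exposes the calendar parts of each timestamp
toy['day'] = toy['date'].dt.day
toy['month'] = toy['date'].dt.month
toy['year'] = toy['date'].dt.year
print(toy)
```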

In [None]:
# drop the date column, because we have already extracted the day and month
data = data.drop('date', axis=1)

# reorder the columns
data = data[['price', 'day', 'month','bedrooms', 'bathrooms', 'sqft_living',
       'sqft_lot', 'floors', 'waterfront', 'view', 'condition', 'grade',
       'sqft_above', 'sqft_basement', 'yr_built', 'yr_renovated',
       'lat', 'long']]

## Price

In [None]:
#make a checkpoint
data_price  = data.copy()

In [None]:
sns.distplot(data_price['price'])

In [None]:
sns.boxplot(data_price['price'])

In [None]:
# normal distribution check
price_kurtosis = data_price['price'].kurt()
price_skewness = data_price['price'].skew()

print ('Kurtosis and Skewness of Price \n')
print ('Kurtosis: ' + str(price_kurtosis))
print ('Skewness: ' + str(price_skewness))

According to [this journal article](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3591587/), data can be assumed to be normally distributed if skewness < 2 and kurtosis < 7.

As you can see, our data is "not normal". We should do something about it.
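Since the same check is repeated for every variable, it can be wrapped in a small helper. This is only a sketch: `looks_normal` is a hypothetical name, the thresholds are the ones quoted above, and the two sample series are synthetic.

```python
import numpy as np
import pandas as pd

def looks_normal(series, max_skew=2.0, max_kurt=7.0):
    """Return True when |skewness| and |kurtosis| are below the thresholds.

    Hypothetical helper. pandas' kurt() reports excess kurtosis,
    which is 0 for a perfect normal distribution.
    """
    return bool(abs(series.skew()) < max_skew and abs(series.kurt()) < max_kurt)

rng = np.random.default_rng(0)
bell_shaped = pd.Series(rng.normal(size=10_000))               # synthetic
right_skewed = pd.Series(rng.lognormal(sigma=1.5, size=10_000))  # synthetic

print(looks_normal(bell_shaped))
print(looks_normal(right_skewed))
```

With such a helper, each section would only need `looks_normal(data_price['price'])` and so on.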

In [None]:
#Based on our boxplot, let's try to remove the outlier with threshold 6000000
data_price = data_price[data_price['price'] <= 6000000]

#let's check kurtosis and skewness again
price_kurtosis = data_price['price'].kurt()
price_skewness = data_price['price'].skew()
print ('Kurtosis and Skewness of Price \n')
print ('Kurtosis: ' + str(price_kurtosis))
print ('Skewness: ' + str(price_skewness))

There is no significant change, so let's go back to our checkpoint and apply a natural log transform instead.

Our kurtosis is greater than 10, which indicates a heavy right tail, so the data should behave better on a log scale. Let's try.
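To see why a log transform helps with a long right tail, here is a sketch on synthetic data. The "prices" are random lognormal numbers, not the King County prices, so only the direction of the change matters.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# synthetic right-skewed "prices" (made up for illustration)
prices = pd.Series(rng.lognormal(mean=13.0, sigma=0.5, size=5_000))
log_prices = np.log(prices)

skew_before, skew_after = prices.skew(), log_prices.skew()
kurt_before, kurt_after = prices.kurt(), log_prices.kurt()
print('skewness:', skew_before, '->', skew_after)
print('kurtosis:', kurt_before, '->', kurt_after)
```

The log compresses large values much more than small ones, which pulls the right tail in.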

In [None]:
data_price  = data.copy()

In [None]:
data_price['log_price'] = np.log(data_price['price'])

In [None]:
sns.distplot(data_price['log_price'])

In [None]:
sns.boxplot(data_price['log_price'])

In [None]:
# recheck our kurtosis and skewness
price_kurtosis = data_price['log_price'].kurt()
price_skewness = data_price['log_price'].skew()

print ('Our Final Kurtosis and Skewness \n')
print ('Kurtosis: ' + str(price_kurtosis))
print ('Skewness: ' + str(price_skewness))

That's better: the change is significant.

In [None]:
# before moving on, drop the price column, because we have log_price now
data_price = data_price.drop('price', axis = 1)

## Bedroom
 
Note: zero means the house doesn't have a separate bedroom.

In [None]:
#make a checkpoint
data_bedroom = data_price.copy()

In [None]:
data_bedroom['bedrooms'].unique()

In [None]:
sns.distplot(data_bedroom['bedrooms'])

In [None]:
sns.boxplot(data_bedroom['bedrooms'])

In [None]:
# normal distribution check
bedroom_kurtosis = data_bedroom['bedrooms'].kurt()
bedroom_skewness = data_bedroom['bedrooms'].skew()

print ('Kurtosis and Skewness of Bedroom \n')
print ('Kurtosis: ' + str(bedroom_kurtosis))
print ('Skewness: ' + str(bedroom_skewness))

According to [this journal article](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3591587/), data can be assumed to be normally distributed if skewness < 2 and kurtosis < 7.

As you can see, our data is "not normal". We should do something about it.

In [None]:
# based on our boxplot, let's try to remove the outlier with a threshold of 15
data_bedroom = data_bedroom[data_bedroom['bedrooms'] <= 15]

# let's check kurtosis and skewness again
bedroom_kurtosis = data_bedroom['bedrooms'].kurt()
bedroom_skewness = data_bedroom['bedrooms'].skew()
print ('Our Final Kurtosis and Skewness \n')
print ('Kurtosis: ' + str(bedroom_kurtosis))
print ('Skewness: ' + str(bedroom_skewness))

Wow, that is a significant change, so we don't need to worry about this variable anymore.

## Bathrooms

From [this article](https://www.badeloftusa.com/buying-guides/bathrooms/), we know that a full bathroom has four features: toilet, sink, shower, and bathtub.

* 0.25 bathroom means 1 feature
* 0.50 bathroom means 2 features
* 0.75 bathroom means 3 features
* 1.00 bathroom means 4 features
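The list above boils down to "each feature counts as 0.25". A tiny sketch of that arithmetic (`bathroom_value` is a hypothetical helper, not something the dataset provides):

```python
def bathroom_value(feature_count):
    """Hypothetical helper: each of the four features counts as 0.25."""
    if not 0 <= feature_count <= 4:
        raise ValueError('a single bathroom has between 0 and 4 features')
    return feature_count / 4

# one full bathroom (4 features) plus one with only toilet and sink (2 features)
total = bathroom_value(4) + bathroom_value(2)
print(total)
```

This is why the `bathrooms` column contains values such as 1.5 or 2.25 instead of whole numbers.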


In [None]:
#checkpoint
data_bathroom = data_bedroom.copy()

In [None]:
sns.distplot(data_bathroom['bathrooms'])

In [None]:
sns.boxplot(data_bathroom['bathrooms'])

In [None]:
# normal distribution check
bathroom_kurtosis = data_bathroom['bathrooms'].kurt()
bathroom_skewness = data_bathroom['bathrooms'].skew()

print ('Kurtosis and Skewness of Bathrooms \n')
print ('Kurtosis: ' + str(bathroom_kurtosis))
print ('Skewness: ' + str(bathroom_skewness))

According to [this journal article](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3591587/), data can be assumed to be normally distributed if skewness < 2 and kurtosis < 7.

As you can see, our data is "normal".

## sqft_living

In [None]:
#checkpoint
data_sqft_living = data_bathroom.copy()

In [None]:
sns.distplot(data_sqft_living['sqft_living'])

In [None]:
sns.boxplot(data_sqft_living['sqft_living'])

In [None]:
# normal distribution check

sqft_living_kurtosis = data_sqft_living['sqft_living'].kurt()
sqft_living_skewness = data_sqft_living['sqft_living'].skew()

print ('Kurtosis and Skewness of Sqft Living \n')
print ('Kurtosis: ' + str(sqft_living_kurtosis))
print ('Skewness: ' + str(sqft_living_skewness))

According to [this journal article](http://web.b.ebscohost.com/ehost/pdfviewer/pdfviewer?vid=0&sid=bdfdb145-c94f-4f92-bf57-9fd88ce4cd02%40sessionmgr103), data can be assumed to be normally distributed if skewness < 3 and kurtosis < 10.

As you can see, our data is "normal".

## sqft_lot

In [None]:
# checkpoint
data_sqft_lot = data_sqft_living.copy()

In [None]:
sns.distplot(data_sqft_lot['sqft_lot'])

In [None]:
sns.boxplot(data_sqft_lot['sqft_lot'])

In [None]:
# normal distribution check
sqft_lot_kurtosis = data_sqft_lot['sqft_lot'].kurt()
sqft_lot_skewness = data_sqft_lot['sqft_lot'].skew()

print ('Kurtosis and Skewness of Sqft Lot \n')
print ('Kurtosis: ' + str(sqft_lot_kurtosis))
print ('Skewness: ' + str(sqft_lot_skewness))

According to [this journal article](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3591587/), data can be assumed to be normally distributed if skewness < 2 and kurtosis < 7.

As you can see, our data is "not normal" (it's really bad data). We should do something about it.

In [None]:
#Based on our boxplot, let's try to remove the outlier with threshold 1250000
data_sqft_lot = data_sqft_lot[data_sqft_lot['sqft_lot'] <= 1250000]

#let's check kurtosis and skewness again
sqft_lot_kurtosis = data_sqft_lot['sqft_lot'].kurt()
sqft_lot_skewness = data_sqft_lot['sqft_lot'].skew()

print ('Kurtosis and Skewness of Sqft Lot \n')
print ('Kurtosis: ' + str(sqft_lot_kurtosis))
print ('Skewness: ' + str(sqft_lot_skewness))

There is no significant change, so let's go back to our checkpoint and apply a natural log transform instead.

Our kurtosis is greater than 10, which indicates a heavy right tail, so the data should behave better on a log scale. Let's try.

In [None]:
data_sqft_lot = data_sqft_living.copy()

In [None]:
data_sqft_lot['log_sqft_lot'] = np.log(data_sqft_lot['sqft_lot'])

In [None]:
sns.distplot(data_sqft_lot['log_sqft_lot'])

In [None]:
sns.boxplot(data_sqft_lot['log_sqft_lot'])

In [None]:
#let's check kurtosis and skewness again
sqft_lot_kurtosis = data_sqft_lot['log_sqft_lot'].kurt()
sqft_lot_skewness = data_sqft_lot['log_sqft_lot'].skew()

print ('Our Final Kurtosis and Skewness \n')
print ('Kurtosis: ' + str(sqft_lot_kurtosis))
print ('Skewness: ' + str(sqft_lot_skewness))

That's better: the change is significant.

In [None]:
# before moving on, drop the sqft_lot column, because we have log_sqft_lot now
data_sqft_lot = data_sqft_lot.drop('sqft_lot', axis = 1)

## Floors

In [None]:
data_floor = data_sqft_lot.copy()

In [None]:
sns.distplot(data_floor['floors'])

In [None]:
sns.boxplot(data_floor['floors'])

In [None]:
# normal distribution check
floor_kurtosis = data_floor['floors'].kurt()
floor_skewness = data_floor['floors'].skew()

print ('Kurtosis and Skewness of Floors \n')
print ('Kurtosis: ' + str(floor_kurtosis))
print ('Skewness: ' + str(floor_skewness))

According to [this journal article](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3591587/), data can be assumed to be normally distributed if skewness < 2 and kurtosis < 7 (for negative values, skewness > -2 and kurtosis > -7).

As you can see, our data is "normal".

## Waterfront

waterfront: a dummy variable for whether the apartment overlooks the waterfront or not.

It's categorical data

In [None]:
data['waterfront'].unique()

This variable has only 2 categories (0 and 1), so we don't have a problem with this data.

## View

view: an index from 0 to 4 of how good the view of the property was

It's categorical data

In [None]:
# checkpoint
data_view = data_floor.copy()

In [None]:
data_view['view'].unique()

We should create dummy variables for this variable.

In [None]:
# one-hot encoding (dtype=int keeps the dummies as 0/1 instead of True/False)
data_view = pd.get_dummies(data_view, columns=['view'], dtype=int)
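To make the effect of one-hot encoding concrete, here is a sketch on a made-up frame. Note that `get_dummies` only creates columns for the levels that actually occur in the data.

```python
import pandas as pd

# hypothetical toy data
toy = pd.DataFrame({'view': [0, 2, 4, 0], 'floors': [1.0, 2.0, 1.5, 1.0]})

# one indicator column per distinct value of 'view'; dtype=int gives 0/1
encoded = pd.get_dummies(toy, columns=['view'], dtype=int)
print(encoded.columns.tolist())
print(encoded)
```

The original `view` column disappears, and each row has a 1 in exactly one of the new `view_*` columns.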

In [None]:
data_view.info()

## Condition

condition: an index from 1 to 5 on the condition of the apartment

It's categorical data

In [None]:
data['condition'].unique()

We should turn this variable into dummies as well.

In [None]:
# a checkpoint
data_condition = data_view.copy()

In [None]:
# one-hot encoding (dtype=int keeps the dummies as 0/1 instead of True/False)
data_condition = pd.get_dummies(data_condition, columns=['condition'], dtype=int)

In [None]:
data_condition.info()

## Grade

In [None]:
# checkpoint
data_grade = data_condition.copy()

In [None]:
sns.distplot(data_grade['grade'])

In [None]:
sns.boxplot(data_grade['grade'])

In [None]:
# normal distribution check
grade_kurtosis = data_grade['grade'].kurt()
grade_skewness = data_grade['grade'].skew()

print ('Kurtosis and Skewness of Grade \n')
print ('Kurtosis: ' + str(grade_kurtosis))
print ('Skewness: ' + str(grade_skewness))

According to [this journal article](http://web.b.ebscohost.com/ehost/pdfviewer/pdfviewer?vid=0&sid=bdfdb145-c94f-4f92-bf57-9fd88ce4cd02%40sessionmgr103), data can be assumed to be normally distributed if skewness < 3 and kurtosis < 10.

As you can see, our data is "normal".

## Sqft_Above

Before we analyze the data, make sure this variable is meaningful.

In [None]:
# checkpoint
data_above = data_grade.copy()

In [None]:
sns.distplot(data_above['sqft_above'])

In [None]:
sns.boxplot(data_above['sqft_above'])

In [None]:
# normal distribution check
above_kurtosis = data_above['sqft_above'].kurt()
above_skewness = data_above['sqft_above'].skew()

print ('Kurtosis and Skewness of Sqft_Above \n')
print ('Kurtosis: ' + str(above_kurtosis))
print ('Skewness: ' + str(above_skewness))

According to [this journal article](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3591587/), data can be assumed to be normally distributed if skewness < 2 and kurtosis < 7.

As you can see, our data is "normal".

## sqft_basement

In [None]:
# checkpoint
data_basement = data_above.copy()

In [None]:
sns.distplot(data_basement['sqft_basement'])

Many houses don't have a basement. I think we should transform the data into a categorical variable: 0 means the house doesn't have a basement and 1 means it does.

In [None]:
# set every value greater than 0 to 1, which means the house has a basement
data_basement.loc[data_basement.sqft_basement > 0, 'sqft_basement'] = 1

In [None]:
#rename the columns
data_basement = data_basement.rename({'sqft_basement': 'basement'}, axis=1)
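An alternative way to build the same 0/1 flag is to compare and cast in one step. A minimal sketch on hypothetical values:

```python
import pandas as pd

# hypothetical basement sizes in square feet
toy = pd.DataFrame({'sqft_basement': [0, 400, 0, 1200]})

# the comparison yields True/False; astype(int) turns that into 1/0
toy['basement'] = (toy['sqft_basement'] > 0).astype(int)
toy = toy.drop('sqft_basement', axis=1)
print(toy['basement'].tolist())
```

This creates the new column directly, so no in-place overwrite and rename are needed.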

## Year Built

In [None]:
data_built = data_basement.copy()

In [None]:
sns.distplot(data_built['yr_built'])

In [None]:
sns.boxplot(data_built['yr_built'])

In [None]:
# normal distribution check
built_kurtosis = data_built['yr_built'].kurt()
built_skewness = data_built['yr_built'].skew()

print ('Kurtosis and Skewness of Year Built \n')
print ('Kurtosis: ' + str(built_kurtosis))
print ('Skewness: ' + str(built_skewness))

According to [this journal article](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3591587/), data can be assumed to be normally distributed if skewness < 2 and kurtosis < 7 (for negative values, skewness > -2 and kurtosis > -7).

As you can see, our data is "normal".

## Year Renovated

In [None]:
data_renovated = data_built.copy()

In [None]:
data_renovated['yr_renovated'].value_counts()

Many houses haven't been renovated since they were built. I think we should transform the data into a categorical variable: 0 means the house hasn't been renovated and 1 means it has.

In [None]:
# set every value greater than 0 to 1, which means the house has been renovated
data_renovated.loc[data_renovated.yr_renovated > 0, 'yr_renovated'] = 1

In [None]:
#rename the columns
data_renovated = data_renovated.rename({'yr_renovated': 'renovated'}, axis=1)

## Latitude

In [None]:
# checkpoint
data_lat = data_renovated.copy()

In [None]:
sns.distplot(data_lat['lat'])

In [None]:
sns.boxplot(data_lat['lat'])

In [None]:
# normal distribution check
lat_kurtosis = data_lat['lat'].kurt()
lat_skewness = data_lat['lat'].skew()

print ('Kurtosis and Skewness of Latitude \n')
print ('Kurtosis: ' + str(lat_kurtosis))
print ('Skewness: ' + str(lat_skewness))

According to [this journal article](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3591587/), data can be assumed to be normally distributed if skewness < 2 and kurtosis < 7 (for negative values, skewness > -2 and kurtosis > -7).

As you can see, our data is "normal". Let's move on.

## Longitude

In [None]:
# checkpoint
data_long = data_lat.copy()

In [None]:
sns.distplot(data_long['long'])

In [None]:
sns.boxplot(data_long['long'])

In [None]:
# normal distribution check (on the 'long' column, not 'lat')
long_kurtosis = data_long['long'].kurt()
long_skewness = data_long['long'].skew()

print ('Kurtosis and Skewness of Longitude \n')
print ('Kurtosis: ' + str(long_kurtosis))
print ('Skewness: ' + str(long_skewness))

According to [this journal article](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3591587/), data can be assumed to be normally distributed if skewness < 2 and kurtosis < 7 (for negative values, skewness > -2 and kurtosis > -7).

As you can see, our data is "normal".

We are almost done cleaning our data. Yeeaaa


In [None]:
# Let's make a checkpoint and reset the index (drop=True avoids keeping the old index as a column)
data_cleaned = data_long.reset_index(drop=True)

# Standardizing

We have several types of data. In this section, we standardize the quantitative data.

In [None]:
data_cleaned.head(100)

In [None]:
data_cleaned.columns

In [None]:
# First, split the data into quantitative and qualitative columns

quantitative_data = data_cleaned[['day', 'month','bedrooms', 'bathrooms', 'sqft_living', 'floors', 'grade', 'sqft_above', 'yr_built','lat', 'long', 'log_sqft_lot']]

# log_price is kept in this group, because it doesn't need to be standardized
qualitative_data = data_cleaned[['waterfront', 'basement','renovated','view_0','view_1','view_2', 'view_3', 'view_4', 'condition_1','condition_2', 'condition_3','condition_4','condition_5', 'log_price',]]

In [None]:
# Warmup the engine
from sklearn.preprocessing import StandardScaler

data_scaler = StandardScaler()

In [None]:
# standardize the data
scaled = data_scaler.fit_transform(quantitative_data)

In [None]:
scaled

In [None]:
# turn into pandas
scaled_quan = pd.DataFrame(scaled, columns=quantitative_data.columns)

In [None]:
# combine with categorical data
scaled_data = pd.concat([scaled_quan,qualitative_data], axis=1)
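For reference, standardizing means subtracting each column's mean and dividing by its standard deviation. The sketch below checks this by hand against `StandardScaler` on made-up numbers (the `toy` values are hypothetical):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# hypothetical values
toy = pd.DataFrame({'bedrooms': [2.0, 3.0, 4.0, 3.0],
                    'sqft_living': [900.0, 1500.0, 2600.0, 1800.0]})

scaled_by_sklearn = StandardScaler().fit_transform(toy)

# same result by hand: subtract the column mean and divide by the
# population standard deviation (ddof=0), which StandardScaler uses
scaled_by_hand = (toy - toy.mean()) / toy.std(ddof=0)

print(np.allclose(scaled_by_sklearn, scaled_by_hand.values))
```

After scaling, every column has mean 0 and standard deviation 1, so columns measured in very different units become comparable.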

In [None]:
scaled_data.head(100)

In [None]:
scaled_data.info()

Our data is ready to be analyzed

# Regression

Before we run the regression, we should check for [multicollinearity](https://www.analyticsvidhya.com/blog/2020/03/what-is-multicollinearity/). I use VIF (Variance Inflation Factor) to analyze it.

In [None]:
# First, declare which variable is dependent and which ones are independent
y = scaled_data['log_price']
x = scaled_data.drop('log_price', axis=1)

In [None]:
# Import library for VIF
from statsmodels.stats.outliers_influence import variance_inflation_factor

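def calc_vif(X):

    # Calculating VIF; cast to float so that boolean dummy columns
    # do not turn X.values into an object array
    values = X.astype(float).values
    vif = pd.DataFrame()
    vif["variables"] = X.columns
    vif["VIF"] = [variance_inflation_factor(values, i) for i in range(X.shape[1])]

    return vif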

In [None]:
calc_vif(x)


* VIF starts at 1 and has no upper limit
* VIF = 1, no correlation between the independent variable and the other variables
* VIF exceeding 5 or 10 indicates high multicollinearity between this independent variable and the others

Source: https://www.analyticsvidhya.com/blog/2020/03/what-is-multicollinearity/
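The VIF of a variable is 1 / (1 - R²), where R² comes from regressing that variable on all the other independent variables. Below is a by-hand sketch on synthetic data. `vif_by_hand` is a hypothetical helper, and unlike the statsmodels function it always adds an intercept, so its numbers can differ slightly from `calc_vif`.

```python
import numpy as np

def vif_by_hand(X, i):
    """VIF of column i: regress it on the other columns plus an intercept."""
    target = X[:, i]
    others = np.delete(X, i, axis=1)
    design = np.column_stack([np.ones(len(X)), others])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    r_squared = 1 - residuals.var() / target.var()
    return 1 / (1 - r_squared)

rng = np.random.default_rng(1)
x1 = rng.normal(size=1_000)
x2 = rng.normal(size=1_000)
x3 = x1 + 0.1 * rng.normal(size=1_000)   # almost a copy of x1
X = np.column_stack([x1, x2, x3])

vifs = [vif_by_hand(X, i) for i in range(X.shape[1])]
print(vifs)
```

The independent column `x2` gets a VIF close to 1, while `x1` and `x3` explain each other almost perfectly and get very large values.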

Based on our VIF result, we should do something about sqft_living and sqft_above, because their VIF is greater than 5.

We should also do something about the view and condition variables, because their VIF is infinite.

I think we should drop sqft_above and keep sqft_living, because it is more meaningful than sqft_above.

We should also drop one of the view dummies and one of the condition dummies, for [this reason](https://www.quora.com/How-and-why-having-the-same-number-of-dummy-variables-as-categories-is-problematic-in-linear-regression-Dummy-variable-trap-Im-looking-for-a-purely-mathematical-not-intuitive-explanation-Also-please-avoid-using-the/answer/Iliya-Valchanov?share=9494e990&srid=uX7Kg) (the dummy variable trap).
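The reason behind the infinite VIF can be seen on a tiny example: a full set of dummies always sums to 1 in every row, so any one of them is perfectly predictable from the others. A sketch on hypothetical data:

```python
import pandas as pd

# hypothetical condition values
toy = pd.DataFrame({'condition': [1, 3, 5, 3, 4]})

all_levels = pd.get_dummies(toy, columns=['condition'], dtype=int)
# every row has exactly one 1, so the columns always sum to the constant 1
row_sums = all_levels.sum(axis=1)

# drop_first=True leaves out the first level, which becomes the baseline
without_first = pd.get_dummies(toy, columns=['condition'], dtype=int,
                               drop_first=True)
print(all_levels.shape[1], without_first.shape[1])
```

`drop_first=True` always removes the first level. Dropping a different column by hand works just as well, since any single level can serve as the baseline.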

In [None]:
pre_regression = scaled_data.drop(['sqft_above', 'view_0', 'condition_5'], axis = 1)

In [None]:
# Big Checkpoint
pre_regression.to_csv('pre_regression.csv', index=False)

In [None]:
import statsmodels.api as sm

In [None]:
# First, declare which variable is dependent and which ones are independent
y = pre_regression['log_price']
x1 = pre_regression.drop('log_price', axis=1)

In [None]:
# cast to float so that boolean dummy columns do not break the OLS fit
x = sm.add_constant(x1.astype(float))
results = sm.OLS(y,x).fit()

In [None]:
results.summary()

Summary

* R-squared: 0.766 -> the model explains about 77% of the variance in log price
* Prob (F-statistic): 0.00 -> the regression as a whole is statistically significant
* Each variable has a p-value below 0.05, which means every variable we used is statistically significant in our model

Yeaah, we did it guys :)

But I also want to try building a model with sklearn and TensorFlow.

# Bonus

## Machine Learning with Sklearn

In [None]:
# import the module
from sklearn.model_selection import train_test_split

In [None]:
# define input and target: log_price is the dependent variable (target),
# everything else is an independent variable (input)
dependent_vr = pre_regression['log_price']
independent_vr = pre_regression.drop('log_price', axis=1)

In [None]:
# split the data into 80% train and 20% test; by default the data is shuffled before the split
x_train, x_test, y_train, y_test = train_test_split(independent_vr, dependent_vr, test_size=0.2)

In [None]:
# regression
from sklearn import linear_model
reg = linear_model.LinearRegression()

In [None]:
# create the model based on training data
reg.fit(x_train, y_train)

In [None]:
# R² score on the test data
reg.score(x_test, y_test)

The R² score of our model on the test data is about 0.77, similar to what we got before with the statsmodels module.
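For a regressor, `score` returns the coefficient of determination R², not a classification accuracy. A by-hand sketch of the formula on made-up numbers (`r_squared` is a hypothetical helper, and the values are not from this notebook):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

# hypothetical log prices and predictions
y_true = [12.5, 13.0, 13.5, 14.0]
y_pred = [12.6, 12.9, 13.6, 13.9]
score = r_squared(y_true, y_pred)
print(score)
```

A score of 1 means perfect predictions, and 0 means the model does no better than always predicting the mean.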

## Machine Learning with Tensorflow

In [None]:
# Shuffle the data first, because the split below slices the rows by position
data_tf = pre_regression.sample(frac=1).reset_index(drop=True)

In [None]:
# split the data into 80% train, 10% validation, and 10% test data
row_count = data_tf.shape[0]

train_data_count = int(0.8*row_count)
validation_data_count = int(0.1*row_count)
test_data_count = row_count - train_data_count - validation_data_count

In [None]:
#divide the data into targets and inputs
targets = data_tf['log_price']
inputs = data_tf.drop('log_price', axis=1)

In [None]:
train_inputs = inputs[:train_data_count]
train_targets = targets[:train_data_count]

validation_inputs = inputs[train_data_count:train_data_count+validation_data_count]
validation_targets = targets[train_data_count:train_data_count+validation_data_count]

test_inputs = inputs[train_data_count+validation_data_count:]
test_targets = targets[train_data_count+validation_data_count:]
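The same positional split, sketched on a 10-row made-up frame so the sizes are easy to verify. `.iloc` is used here to make it explicit that the slicing is by position; the `toy` frame is hypothetical.

```python
import pandas as pd

# hypothetical frame with 10 rows, assumed to be already shuffled
toy = pd.DataFrame({'feature': range(10), 'log_price': range(10, 20)})

n_rows = len(toy)
n_train = int(0.8 * n_rows)
n_val = int(0.1 * n_rows)

# .iloc slices by position, so the three parts never overlap
train = toy.iloc[:n_train]
val = toy.iloc[n_train:n_train + n_val]
test = toy.iloc[n_train + n_val:]
print(len(train), len(val), len(test))
```

Giving the test set "whatever is left" guarantees that no row is lost to rounding.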

Let's build our model.

In [None]:
# Import our module
import tensorflow as tf

In [None]:
# The number of inputs
inputs_size = len(inputs.columns)

# The number of targets
targets_size = 1

# The number of units in each hidden layer
hidden_layer_size = 64

In [None]:
model = tf.keras.Sequential([
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(targets_size)
    ])

model.compile(optimizer='adam', loss='mse', metrics=['mae','mse'])

batch_size = 100
max_epochs = 100

# to limit overfitting: stop when val_loss has not improved for 10 epochs
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)

# start the engine
model.fit(train_inputs, 
          train_targets, 
          batch_size = batch_size, 
          epochs = max_epochs, 
          callbacks = [early_stopping],
          validation_data = (validation_inputs, validation_targets), 
          verbose=2)

In [None]:
# Evaluate our model with the test data
loss, mae, mse = model.evaluate(test_inputs, test_targets, verbose=2)

In [None]:
print("Testing set Mean Abs Error: " + str(mae))
print("Testing set Mean Sq Error: " + str(mse))

This means our model predicts the log price quite well, because the error is small.
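Because the target is `log_price`, the errors are in log units. A rough way to read them in price terms is sketched below. Both numbers are hypothetical examples, not results from this notebook.

```python
import numpy as np

# hypothetical MAE on log_price, NOT the value produced by this notebook
example_mae = 0.2

# a log-space error of d corresponds to a price ratio of exp(d)
typical_ratio = np.exp(example_mae)
print(round((typical_ratio - 1) * 100, 1), '% typical deviation')

# converting a (hypothetical) predicted log price back to a price
predicted_log_price = 13.0
predicted_price = np.exp(predicted_log_price)
print(round(predicted_price))
```

So an MAE of 0.2 on the log scale would mean predictions are typically off by a factor of about 1.22, which is easier to judge than the raw log error.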

We did it, guys. **Thank you** so much for your time. Maybe next time we should try another journey. Hahaha

:)