# Car Price Prediction::

Download dataset from this link:

https://www.kaggle.com/hellbuoy/car-price-prediction

# Problem Statement::

A Chinese automobile company, Geely Auto, aspires to enter the US market by setting up a manufacturing unit there and producing cars locally to compete with its US and European counterparts.

They have contracted an automobile consulting company to understand the factors on which the pricing of cars depends. Specifically, they want to understand the factors affecting the pricing of cars in the American market, since those may be very different from the Chinese market. The company wants to know:

- Which variables are significant in predicting the price of a car
- How well those variables describe the price of a car

Based on various market surveys, the consulting firm has gathered a large data set of different types of cars across the American market.

# Task::
We are required to model the price of cars with the available independent variables. It will be used by the management to understand how exactly the prices vary with the independent variables. They can accordingly manipulate the design of the cars, the business strategy etc. to meet certain price levels. Further, the model will be a good way for management to understand the pricing dynamics of a new market.

# WORKFLOW ::

1. Load Data

2. Check Missing Values (if any exist, fill each missing entry with the mean of its feature)

3. Split into 50% Training (Samples, Labels), 30% Test (Samples, Labels) and 20% Validation Data (Samples, Labels).

4. Model: input layer (No. of features), 3 hidden layers with 10, 8 and 6 units, and an output layer; use relu/tanh as the activation function (check by experiment).

5. Compilation Step (Note: it's a regression problem, so select the loss and metrics accordingly)

6. Train the Model for 100 epochs and validate it

7. If the model overfits, tune it by changing the units, number of layers, activation function or epochs, or by adding a dropout layer or a regularizer as needed.

8. Evaluation Step

9. Prediction
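
Step 3 asks for a 50/30/20 split, whereas the cells in this notebook use a 70/30 train/test split combined with k-fold validation. As a minimal sketch of the split described in the workflow, assuming 205 rows (143 training plus 62 test rows, per the shapes printed later) and a hypothetical helper named `split_indices`, the index arithmetic could look like this:

```python
import numpy as np

def split_indices(n_rows, train_frac=0.5, test_frac=0.3, seed=42):
    """Shuffle row positions and cut them into train / test / validation parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)
    n_train = int(n_rows * train_frac)
    n_test = int(n_rows * test_frac)
    train_idx = order[:n_train]
    test_idx = order[n_train:n_train + n_test]
    val_idx = order[n_train + n_test:]  # whatever remains (about 20%)
    return train_idx, test_idx, val_idx

train_idx, test_idx, val_idx = split_indices(205)
print(len(train_idx), len(test_idx), len(val_idx))
```

The returned positions could then be passed to `DataFrame.iloc` to select the rows of each part.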

In [None]:
import numpy as np
import pandas as pd

In [None]:
car_data = pd.read_csv("CarPrice_Assignment.csv")


In [None]:
car_data

Unnamed: 0,car_ID,symboling,CarName,fueltype,aspiration,doornumber,carbody,drivewheel,enginelocation,wheelbase,carlength,carwidth,carheight,curbweight,enginetype,cylindernumber,enginesize,fuelsystem,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,price
0,1,3,alfa-romero giulia,gas,std,two,convertible,rwd,front,88.6,168.8,64.1,48.8,2548,dohc,four,130,mpfi,3.47,2.68,9.0,111,5000,21,27,13495.0
1,2,3,alfa-romero stelvio,gas,std,two,convertible,rwd,front,88.6,168.8,64.1,48.8,2548,dohc,four,130,mpfi,3.47,2.68,9.0,111,5000,21,27,16500.0
2,3,1,alfa-romero Quadrifoglio,gas,std,two,hatchback,rwd,front,94.5,171.2,65.5,52.4,2823,ohcv,six,152,mpfi,2.68,3.47,9.0,154,5000,19,26,16500.0
3,4,2,audi 100 ls,gas,std,four,sedan,fwd,front,99.8,176.6,66.2,54.3,2337,ohc,four,109,mpfi,3.19,3.40,10.0,102,5500,24,30,13950.0
4,5,2,audi 100ls,gas,std,four,sedan,4wd,front,99.4,176.6,66.4,54.3,2824,ohc,five,136,mpfi,3.19,3.40,8.0,115,5500,18,22,17450.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
200,201,-1,volvo 145e (sw),gas,std,four,sedan,rwd,front,109.1,188.8,68.9,55.5,2952,ohc,four,141,mpfi,3.78,3.15,9.5,114,5400,23,28,16845.0
201,202,-1,volvo 144ea,gas,turbo,four,sedan,rwd,front,109.1,188.8,68.8,55.5,3049,ohc,four,141,mpfi,3.78,3.15,8.7,160,5300,19,25,19045.0
202,203,-1,volvo 244dl,gas,std,four,sedan,rwd,front,109.1,188.8,68.9,55.5,3012,ohcv,six,173,mpfi,3.58,2.87,8.8,134,5500,18,23,21485.0
203,204,-1,volvo 246,diesel,turbo,four,sedan,rwd,front,109.1,188.8,68.9,55.5,3217,ohc,six,145,idi,3.01,3.40,23.0,106,4800,26,27,22470.0


In [None]:
car_data.isnull().any()

car_ID              False
symboling           False
CarName             False
fueltype            False
aspiration          False
doornumber          False
carbody             False
drivewheel          False
enginelocation      False
wheelbase           False
carlength           False
carwidth            False
carheight           False
curbweight          False
enginetype          False
cylindernumber      False
enginesize          False
fuelsystem          False
boreratio           False
stroke              False
compressionratio    False
horsepower          False
peakrpm             False
citympg             False
highwaympg          False
price               False
dtype: bool
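
No column has missing values, so the mean-fill from workflow step 2 is not needed for this dataset. For reference, a minimal sketch of that step on a toy frame (the values and the `toy` frame are illustrative, not taken from the dataset) might be:

```python
import numpy as np
import pandas as pd

# Toy frame with one gap per numeric column (illustrative values only)
toy = pd.DataFrame({
    "horsepower": [111.0, np.nan, 154.0],
    "price": [13495.0, 16500.0, np.nan],
    "fueltype": ["gas", "gas", "diesel"],
})

# Compute the mean of every numeric column, then fill each gap
# with the mean of its own column; text columns are left untouched
numeric_means = toy.mean(numeric_only=True)
filled = toy.fillna(numeric_means)
print(filled)
```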

In [None]:
car_data.dtypes

car_ID                int64
symboling             int64
CarName              object
fueltype             object
aspiration           object
doornumber           object
carbody              object
drivewheel           object
enginelocation       object
wheelbase           float64
carlength           float64
carwidth            float64
carheight           float64
curbweight            int64
enginetype           object
cylindernumber       object
enginesize            int64
fuelsystem           object
boreratio           float64
stroke              float64
compressionratio    float64
horsepower            int64
peakrpm               int64
citympg               int64
highwaympg            int64
price               float64
dtype: object

In [None]:
# one-hot encode all categorical columns
# (dtype=int keeps the 0/1 values on newer pandas, where get_dummies returns booleans by default)
car_transform_data = pd.get_dummies(car_data, columns=['symboling', 'fueltype', 'aspiration', 'doornumber', 'carbody', 'drivewheel', 'enginelocation', 'enginetype', 'cylindernumber', 'fuelsystem'], dtype=int)
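
To make the column naming concrete, here is a small sketch on a toy frame (the `toy` frame is an assumption for illustration). Each category becomes its own `<column>_<value>` indicator column, and the untouched columns stay in front:

```python
import pandas as pd

# Two-column toy frame: one categorical feature and the target
toy = pd.DataFrame({
    "fueltype": ["gas", "diesel", "gas"],
    "price": [13495.0, 22470.0, 16500.0],
})

# One indicator column is created per distinct fueltype value
encoded = pd.get_dummies(toy, columns=["fueltype"], dtype=int)
print(encoded.columns.tolist())
print(encoded)
```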


In [None]:
car_transform_data.columns

Index(['car_ID', 'CarName', 'wheelbase', 'carlength', 'carwidth', 'carheight',
       'curbweight', 'enginesize', 'boreratio', 'stroke', 'compressionratio',
       'horsepower', 'peakrpm', 'citympg', 'highwaympg', 'price',
       'symboling_-2', 'symboling_-1', 'symboling_0', 'symboling_1',
       'symboling_2', 'symboling_3', 'fueltype_diesel', 'fueltype_gas',
       'aspiration_std', 'aspiration_turbo', 'doornumber_four',
       'doornumber_two', 'carbody_convertible', 'carbody_hardtop',
       'carbody_hatchback', 'carbody_sedan', 'carbody_wagon', 'drivewheel_4wd',
       'drivewheel_fwd', 'drivewheel_rwd', 'enginelocation_front',
       'enginelocation_rear', 'enginetype_dohc', 'enginetype_dohcv',
       'enginetype_l', 'enginetype_ohc', 'enginetype_ohcf', 'enginetype_ohcv',
       'enginetype_rotor', 'cylindernumber_eight', 'cylindernumber_five',
       'cylindernumber_four', 'cylindernumber_six', 'cylindernumber_three',
       'cylindernumber_twelve', 'cylindernumber_two', 'fuels

In [None]:
car_transform_data.drop(columns = ['car_ID', 'CarName'], inplace = True)
car_transform_data.columns

Index(['wheelbase', 'carlength', 'carwidth', 'carheight', 'curbweight',
       'enginesize', 'boreratio', 'stroke', 'compressionratio', 'horsepower',
       'peakrpm', 'citympg', 'highwaympg', 'price', 'symboling_-2',
       'symboling_-1', 'symboling_0', 'symboling_1', 'symboling_2',
       'symboling_3', 'fueltype_diesel', 'fueltype_gas', 'aspiration_std',
       'aspiration_turbo', 'doornumber_four', 'doornumber_two',
       'carbody_convertible', 'carbody_hardtop', 'carbody_hatchback',
       'carbody_sedan', 'carbody_wagon', 'drivewheel_4wd', 'drivewheel_fwd',
       'drivewheel_rwd', 'enginelocation_front', 'enginelocation_rear',
       'enginetype_dohc', 'enginetype_dohcv', 'enginetype_l', 'enginetype_ohc',
       'enginetype_ohcf', 'enginetype_ohcv', 'enginetype_rotor',
       'cylindernumber_eight', 'cylindernumber_five', 'cylindernumber_four',
       'cylindernumber_six', 'cylindernumber_three', 'cylindernumber_twelve',
       'cylindernumber_two', 'fuelsystem_1bbl', 'fuelsys

In [None]:
car_transform_data.head()

Unnamed: 0,wheelbase,carlength,carwidth,carheight,curbweight,enginesize,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,price,symboling_-2,symboling_-1,symboling_0,symboling_1,symboling_2,symboling_3,fueltype_diesel,fueltype_gas,aspiration_std,aspiration_turbo,doornumber_four,doornumber_two,carbody_convertible,carbody_hardtop,carbody_hatchback,carbody_sedan,carbody_wagon,drivewheel_4wd,drivewheel_fwd,drivewheel_rwd,enginelocation_front,enginelocation_rear,enginetype_dohc,enginetype_dohcv,enginetype_l,enginetype_ohc,enginetype_ohcf,enginetype_ohcv,enginetype_rotor,cylindernumber_eight,cylindernumber_five,cylindernumber_four,cylindernumber_six,cylindernumber_three,cylindernumber_twelve,cylindernumber_two,fuelsystem_1bbl,fuelsystem_2bbl,fuelsystem_4bbl,fuelsystem_idi,fuelsystem_mfi,fuelsystem_mpfi,fuelsystem_spdi,fuelsystem_spfi
0,88.6,168.8,64.1,48.8,2548,130,3.47,2.68,9.0,111,5000,21,27,13495.0,0,0,0,0,0,1,0,1,1,0,0,1,1,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0
1,88.6,168.8,64.1,48.8,2548,130,3.47,2.68,9.0,111,5000,21,27,16500.0,0,0,0,0,0,1,0,1,1,0,0,1,1,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0
2,94.5,171.2,65.5,52.4,2823,152,2.68,3.47,9.0,154,5000,19,26,16500.0,0,0,0,1,0,0,0,1,1,0,0,1,0,0,1,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0
3,99.8,176.6,66.2,54.3,2337,109,3.19,3.4,10.0,102,5500,24,30,13950.0,0,0,0,0,1,0,0,1,1,0,1,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0
4,99.4,176.6,66.4,54.3,2824,136,3.19,3.4,8.0,115,5500,18,22,17450.0,0,0,0,0,1,0,0,1,1,0,1,0,0,0,0,1,0,1,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0


In [None]:
# Separate the features (x) from the target column (y)
x = car_transform_data.loc[:, car_transform_data.columns != 'price']
y = car_transform_data.loc[:, car_transform_data.columns == 'price']


In [None]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.30, random_state=42)

In [None]:
x_train.shape

(143, 57)

In [None]:
y_train.shape

(143, 1)

In [None]:
x_test.shape

(62, 57)

In [None]:
y_test.shape

(62, 1)

In [None]:
x_train.head()

Unnamed: 0,wheelbase,carlength,carwidth,carheight,curbweight,enginesize,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,symboling_-2,symboling_-1,symboling_0,symboling_1,symboling_2,symboling_3,fueltype_diesel,fueltype_gas,aspiration_std,aspiration_turbo,doornumber_four,doornumber_two,carbody_convertible,carbody_hardtop,carbody_hatchback,carbody_sedan,carbody_wagon,drivewheel_4wd,drivewheel_fwd,drivewheel_rwd,enginelocation_front,enginelocation_rear,enginetype_dohc,enginetype_dohcv,enginetype_l,enginetype_ohc,enginetype_ohcf,enginetype_ohcv,enginetype_rotor,cylindernumber_eight,cylindernumber_five,cylindernumber_four,cylindernumber_six,cylindernumber_three,cylindernumber_twelve,cylindernumber_two,fuelsystem_1bbl,fuelsystem_2bbl,fuelsystem_4bbl,fuelsystem_idi,fuelsystem_mfi,fuelsystem_mpfi,fuelsystem_spdi,fuelsystem_spfi
177,102.4,175.6,66.5,53.9,2458,122,3.31,3.54,8.7,92,4200,27,32,0,1,0,0,0,0,0,1,1,0,1,0,0,0,1,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0
75,102.7,178.4,68.0,54.8,2910,140,3.78,3.12,8.0,175,5000,19,24,0,0,0,1,0,0,0,1,0,1,0,1,0,0,1,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0
174,102.4,175.6,66.5,54.9,2480,110,3.27,3.35,22.5,73,4500,30,33,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0
31,86.6,144.6,63.9,50.8,1819,92,2.91,3.41,9.2,76,6000,31,38,0,0,0,0,1,0,0,1,1,0,0,1,0,0,1,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0
12,101.2,176.8,64.8,54.3,2710,164,3.31,3.19,9.0,121,4250,21,28,0,0,1,0,0,0,0,1,1,0,0,1,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0


In [None]:
x_test.head()

Unnamed: 0,wheelbase,carlength,carwidth,carheight,curbweight,enginesize,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,symboling_-2,symboling_-1,symboling_0,symboling_1,symboling_2,symboling_3,fueltype_diesel,fueltype_gas,aspiration_std,aspiration_turbo,doornumber_four,doornumber_two,carbody_convertible,carbody_hardtop,carbody_hatchback,carbody_sedan,carbody_wagon,drivewheel_4wd,drivewheel_fwd,drivewheel_rwd,enginelocation_front,enginelocation_rear,enginetype_dohc,enginetype_dohcv,enginetype_l,enginetype_ohc,enginetype_ohcf,enginetype_ohcv,enginetype_rotor,cylindernumber_eight,cylindernumber_five,cylindernumber_four,cylindernumber_six,cylindernumber_three,cylindernumber_twelve,cylindernumber_two,fuelsystem_1bbl,fuelsystem_2bbl,fuelsystem_4bbl,fuelsystem_idi,fuelsystem_mfi,fuelsystem_mpfi,fuelsystem_spdi,fuelsystem_spfi
15,103.5,189.0,66.9,55.7,3230,209,3.62,3.39,8.0,182,5400,16,22,0,0,1,0,0,0,0,1,1,0,1,0,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0
9,99.5,178.2,67.9,52.0,3053,131,3.13,3.4,7.0,160,5500,16,22,0,0,1,0,0,0,0,1,0,1,0,1,0,0,1,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0
100,97.2,173.4,65.2,54.7,2302,120,3.33,3.47,8.5,97,5200,27,34,0,0,1,0,0,0,0,1,1,0,1,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0
132,99.1,186.6,66.5,56.1,2658,121,3.54,3.07,9.31,110,5250,21,28,0,0,0,0,0,1,0,1,1,0,0,1,0,0,1,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0
68,110.0,190.9,70.3,58.7,3750,183,3.58,3.64,21.5,123,4350,22,25,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,1,0,0,1,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0


In [None]:
# Normalization: standardize the 13 numeric columns using training-set statistics

x_train = x_train.copy()  # explicit copy, so pandas does not warn about writing to a slice
num_cols = x_train.columns[0:13]

mean = x_train[num_cols].mean(axis = 0)
std = x_train[num_cols].std(axis = 0)
x_train[num_cols] = (x_train[num_cols] - mean) / std


In [None]:
x_train.head()

Unnamed: 0,wheelbase,carlength,carwidth,carheight,curbweight,enginesize,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,symboling_-2,symboling_-1,symboling_0,symboling_1,symboling_2,symboling_3,fueltype_diesel,fueltype_gas,aspiration_std,aspiration_turbo,doornumber_four,doornumber_two,carbody_convertible,carbody_hardtop,carbody_hatchback,carbody_sedan,carbody_wagon,drivewheel_4wd,drivewheel_fwd,drivewheel_rwd,enginelocation_front,enginelocation_rear,enginetype_dohc,enginetype_dohcv,enginetype_l,enginetype_ohc,enginetype_ohcf,enginetype_ohcv,enginetype_rotor,cylindernumber_eight,cylindernumber_five,cylindernumber_four,cylindernumber_six,cylindernumber_three,cylindernumber_twelve,cylindernumber_two,fuelsystem_1bbl,fuelsystem_2bbl,fuelsystem_4bbl,fuelsystem_idi,fuelsystem_mfi,fuelsystem_mpfi,fuelsystem_spdi,fuelsystem_spfi
177,0.571301,0.076145,0.234282,0.043705,-0.220606,-0.121004,-0.038773,0.918727,-0.342851,-0.339563,-1.95928,0.368822,0.269301,0,1,0,0,0,0,0,1,1,0,1,0,0,0,1,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0
75,0.620694,0.301819,0.921744,0.406597,0.644684,0.313395,1.687015,-0.447422,-0.529766,1.717747,-0.317217,-0.983524,-1.014369,0,0,0,1,0,0,0,1,0,1,0,1,0,0,1,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0
174,0.571301,0.076145,0.234282,0.446919,-0.17849,-0.410604,-0.185649,0.300707,3.342036,-0.810513,-1.343506,0.875951,0.42976,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0
31,-2.030063,-2.422386,-0.957319,-1.206257,-1.443881,-0.845003,-1.507529,0.495871,-0.209341,-0.736153,1.735362,1.044994,1.232054,0,0,0,0,1,0,0,1,1,0,0,1,0,0,1,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0
12,0.373729,0.172863,-0.544842,0.20499,0.261812,0.892594,-0.038773,-0.21973,-0.262745,0.379256,-1.856651,-0.645438,-0.372534,0,0,1,0,0,0,0,1,1,0,0,1,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0


In [None]:
mean

wheelbase             98.930070
carlength            174.655245
carwidth              65.988811
carheight             53.791608
curbweight          2573.237762
enginesize           127.013986
boreratio              3.320559
stroke                 3.257552
compressionratio       9.983986
horsepower           105.699301
peakrpm             5154.545455
citympg               24.818182
highwaympg            30.321678
dtype: float64

In [None]:
std

wheelbase             6.073737
carlength            12.407288
carwidth              2.181939
carheight             2.480075
curbweight          522.368306
enginesize           41.436526
boreratio             0.272339
stroke                0.307434
compressionratio      3.745026
horsepower           40.343950
peakrpm             487.191907
citympg               5.915647
highwaympg            6.232131
dtype: float64
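
The standardized values can be checked by hand with z = (x - mean) / std. The sketch below uses numbers copied from the outputs above (training row 177); the dictionary names are only for illustration:

```python
# Raw values of training row 177 and the training-set statistics printed above
raw = {"wheelbase": 102.4, "curbweight": 2458}
mean_vals = {"wheelbase": 98.930070, "curbweight": 2573.237762}
std_vals = {"wheelbase": 6.073737, "curbweight": 522.368306}

# z-score: subtract the column mean, then divide by the column std
z = {name: (raw[name] - mean_vals[name]) / std_vals[name] for name in raw}
for name, value in z.items():
    print(name, round(value, 6))
```

The results agree with the standardized row 177 shown earlier (0.571301 for `wheelbase`, -0.220606 for `curbweight`).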

In [None]:
# Standardize the test features with the training-set mean and std
x_test = x_test.copy()  # explicit copy, so pandas does not warn about writing to a slice
x_test[mean.index] = (x_test[mean.index] - mean) / std


In [None]:
x_test.head()

Unnamed: 0,wheelbase,carlength,carwidth,carheight,curbweight,enginesize,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,symboling_-2,symboling_-1,symboling_0,symboling_1,symboling_2,symboling_3,fueltype_diesel,fueltype_gas,aspiration_std,aspiration_turbo,doornumber_four,doornumber_two,carbody_convertible,carbody_hardtop,carbody_hatchback,carbody_sedan,carbody_wagon,drivewheel_4wd,drivewheel_fwd,drivewheel_rwd,enginelocation_front,enginelocation_rear,enginetype_dohc,enginetype_dohcv,enginetype_l,enginetype_ohc,enginetype_ohcf,enginetype_ohcv,enginetype_rotor,cylindernumber_eight,cylindernumber_five,cylindernumber_four,cylindernumber_six,cylindernumber_three,cylindernumber_twelve,cylindernumber_two,fuelsystem_1bbl,fuelsystem_2bbl,fuelsystem_4bbl,fuelsystem_idi,fuelsystem_mfi,fuelsystem_mpfi,fuelsystem_spdi,fuelsystem_spfi
15,0.752408,1.156156,0.417605,0.769489,1.257278,1.978593,1.099513,0.430817,-0.529766,1.891255,0.503815,-1.490654,-1.335286,0,0,1,0,0,0,0,1,1,0,1,0,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0
9,0.093835,0.285699,0.875913,-0.722401,0.918437,0.096196,-0.699713,0.463344,-0.796786,1.345944,0.709073,-1.490654,-1.335286,0,0,1,0,0,0,0,1,0,1,0,1,0,0,1,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0
100,-0.284844,-0.10117,-0.361518,0.366276,-0.519246,-0.169271,0.034665,0.691035,-0.396255,-0.215628,0.093299,0.368822,0.590219,0,0,1,0,0,0,0,1,1,0,1,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0
132,0.027978,0.962721,0.234282,0.930775,0.162265,-0.145137,0.805762,-0.610058,-0.179968,0.106601,0.195928,-0.645438,-0.372534,0,0,0,0,0,1,0,1,1,0,0,1,0,0,1,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0
68,1.822589,1.309291,1.975852,1.97913,2.252744,1.351127,0.952637,1.244,3.075016,0.42883,-1.651393,-0.476395,-0.85391,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,1,0,0,1,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0


**Model with relu activation function**

In [None]:
from keras import models
from keras import layers

def build_model():
  model = models.Sequential()
  model.add(layers.Dense(10, activation='relu', input_shape=(x_train.shape[1],)))
  model.add(layers.Dense(8, activation='relu'))
  model.add(layers.Dense(6, activation='relu'))
  model.add(layers.Dense(1))
  
  model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
  return model
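
For a sense of model size, the number of trainable parameters can be worked out by hand: each Dense layer has one weight per input-unit pair plus one bias per unit. A short sketch, assuming the 57 input features reported by `x_train.shape` (the helper name `dense_params` is hypothetical):

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

# 57 input features, followed by the widths of the four Dense layers
layer_sizes = [57, 10, 8, 6, 1]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)
print(counts, total)
```

That comes to 729 parameters against 143 training rows, which is one reason the overfitting checks in the workflow are worth running.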

In [None]:
len(x_train)

143

In [None]:
# K-fold validation: each fold holds out len(x_train) // k = 143 // 5 = 28 training rows (about 20%)

k = 5
num_val_samples = len(x_train) // k
num_epochs = 100
all_scores = []
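
The fold boundaries implied by these settings can be listed explicitly. This is a small sketch with a hypothetical helper `k_fold_slices`; it mirrors the slicing used in the training loop rather than replacing it:

```python
def k_fold_slices(n_samples, k):
    """Yield (fold, start, stop) for contiguous validation blocks of equal size."""
    fold_size = n_samples // k
    for fold in range(k):
        yield fold, fold * fold_size, (fold + 1) * fold_size

folds = list(k_fold_slices(143, 5))
leftover = 143 - folds[-1][2]  # rows that never land in a validation block
print(folds)
print("rows never used for validation:", leftover)
```

Because 143 is not divisible by 5, the last three training rows are used for training in every fold and never for validation.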

In [None]:
for i in range(k):
  print('processing fold #', i)
  val_data = x_train[i * num_val_samples: (i + 1) * num_val_samples]
  val_targets = y_train[i * num_val_samples: (i + 1) * num_val_samples]

  partial_train_data = np.concatenate([x_train[:i * num_val_samples], x_train[(i + 1) * num_val_samples:]], axis=0)
  partial_train_targets = np.concatenate([y_train[:i * num_val_samples], y_train[(i + 1) * num_val_samples:]], axis=0)
  model = build_model()
  
  model.fit(partial_train_data, partial_train_targets, epochs=num_epochs, batch_size=1, verbose=0)
  
  val_mse, val_mae = model.evaluate(val_data, val_targets, verbose=0) 
  all_scores.append(val_mae)

processing fold # 0
processing fold # 1
processing fold # 2
processing fold # 3
processing fold # 4


In [None]:
all_scores

[1987.1640625,
 2073.07666015625,
 2080.31201171875,
 2417.559814453125,
 2922.708984375]

In [None]:
np.mean(all_scores)

2296.164306640625
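
The mean alone hides how much the folds disagree. A short sketch using the five scores printed above, with the standard library's `statistics` module, reports the spread alongside the mean:

```python
import statistics

# Validation MAE per fold, copied from the output above
fold_scores = [1987.1640625, 2073.07666015625, 2080.31201171875,
               2417.559814453125, 2922.708984375]

mean_mae = statistics.mean(fold_scores)
spread = statistics.stdev(fold_scores)  # sample standard deviation across folds
print(round(mean_mae, 2), round(spread, 2))
```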

In [None]:
# Note: `model` here is the network trained in the last fold (fold 4) only
test_mse, test_mae = model.evaluate(x_test, y_test)



In [None]:
y_predict = model.predict(x_test)

In [None]:
y_test1 = np.array(y_test)

In [None]:
for i in range(len(y_predict)):
  print("Actual: ", y_test1[i], "Prediction: ", y_predict[i])

Actual:  [30760.] Prediction:  [25611.627]
Actual:  [17859.167] Prediction:  [18885.516]
Actual:  [9549.] Prediction:  [10014.344]
Actual:  [11850.] Prediction:  [15087.986]
Actual:  [28248.] Prediction:  [21651.912]
Actual:  [7799.] Prediction:  [5670.316]
Actual:  [7788.] Prediction:  [6830.4155]
Actual:  [9258.] Prediction:  [8369.083]
Actual:  [10198.] Prediction:  [10990.948]
Actual:  [7775.] Prediction:  [7861.6445]
Actual:  [13295.] Prediction:  [17531.67]
Actual:  [8238.] Prediction:  [7845.8647]
Actual:  [18280.] Prediction:  [17206.025]
Actual:  [9988.] Prediction:  [9684.515]
Actual:  [40960.] Prediction:  [30674.084]
Actual:  [6488.] Prediction:  [4761.7974]
Actual:  [5151.] Prediction:  [262.8854]
Actual:  [12629.] Prediction:  [14411.19]
Actual:  [8189.] Prediction:  [10638.231]
Actual:  [9960.] Prediction:  [12113.7]
Actual:  [8495.] Prediction:  [10742.384]
Actual:  [13499.] Prediction:  [21508.406]
Actual:  [8249.] Prediction:  [5459.6377]
Actual:  [6479.] Prediction: 
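
As an illustration of how the MAE metric is formed, the sketch below computes it for the first five actual/prediction pairs printed above. This covers only those five rows, so it is not the test-set MAE:

```python
# (actual, predicted) pairs copied from the first five printed rows
pairs = [
    (30760.0, 25611.627),
    (17859.167, 18885.516),
    (9549.0, 10014.344),
    (11850.0, 15087.986),
    (28248.0, 21651.912),
]

# Mean absolute error: average of |actual - predicted|
abs_errors = [abs(actual - predicted) for actual, predicted in pairs]
mae_first_five = sum(abs_errors) / len(abs_errors)
print(round(mae_first_five, 3))
```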

**Model with tanh activation function**

In [None]:
from keras import models
from keras import layers

def build_model_tanh():
  model = models.Sequential()
  model.add(layers.Dense(10, activation='tanh', input_shape=(x_train.shape[1],)))
  model.add(layers.Dense(8, activation='tanh'))
  model.add(layers.Dense(6, activation='tanh'))
  model.add(layers.Dense(1))
  
  model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
  return model

In [None]:
# K-fold validation: each fold holds out len(x_train) // k = 143 // 5 = 28 training rows (about 20%)

k = 5
num_val_samples = len(x_train) // k
num_epochs = 100
all_scores_tanh = []

for i in range(k):
  print('processing fold #', i)
  val_data = x_train[i * num_val_samples: (i + 1) * num_val_samples]
  val_targets = y_train[i * num_val_samples: (i + 1) * num_val_samples]

  partial_train_data = np.concatenate([x_train[:i * num_val_samples], x_train[(i + 1) * num_val_samples:]], axis=0)
  partial_train_targets = np.concatenate([y_train[:i * num_val_samples], y_train[(i + 1) * num_val_samples:]], axis=0)
  model = build_model_tanh()
  
  model.fit(partial_train_data, partial_train_targets, epochs=num_epochs, batch_size=1, verbose=0)
  
  val_mse, val_mae = model.evaluate(val_data, val_targets, verbose=0) 
  all_scores_tanh.append(val_mae)

processing fold # 0
processing fold # 1
processing fold # 2
processing fold # 3
processing fold # 4


In [None]:
all_scores_tanh

[12604.087890625,
 12067.203125,
 11902.5244140625,
 14879.2392578125,
 15335.8662109375]

In [None]:
np.mean(all_scores_tanh)

13357.7841796875
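
The tanh scores (roughly 12,000 to 15,000) are on the order of the prices themselves, which is what one would see if the network's outputs stayed far below the targets. One possible cause, not verified in this notebook, is that the price targets are left unscaled. A common remedy to try is to standardize the targets before training and map predictions back afterwards. The sketch below shows only that scaling round trip with NumPy; the sample prices are taken from the data preview and the variable names are assumptions:

```python
import numpy as np

# A few prices from the data preview, for illustration only
prices = np.array([13495.0, 16500.0, 13950.0, 17450.0, 22470.0])

# In practice these statistics would come from the training targets only
y_mean, y_std = prices.mean(), prices.std()

scaled = (prices - y_mean) / y_std   # what the network would be trained on
restored = scaled * y_std + y_mean   # map scaled predictions back to prices
print(scaled.round(3))
print(restored)
```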

In [None]:
test_mse_tanh, test_mae_tanh = model.evaluate(x_test, y_test)



**Regularized Model Structure**

In [None]:
# Regularized model
from keras import regularizers
def build_model_regular(act):
  model = models.Sequential()
  model.add(layers.Dense(10, activation= act,kernel_regularizer= regularizers.l1_l2(l1=0.001, l2=0.001),input_shape=(x_train.shape[1],)))
  model.add(layers.Dense(8, activation= act,kernel_regularizer= regularizers.l1_l2(l1=0.001, l2=0.001)))
  model.add(layers.Dense(6, activation= act,kernel_regularizer= regularizers.l1_l2(l1=0.001, l2=0.001)))
  model.add(layers.Dense(1))
  model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
  return model
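
For intuition, the term that `regularizers.l1_l2(l1=0.001, l2=0.001)` adds to the loss for one layer is `l1 * sum(|w|) + l2 * sum(w^2)` over that layer's kernel. A NumPy sketch on a made-up weight matrix (the helper `l1_l2_penalty` and the weights are illustrative):

```python
import numpy as np

def l1_l2_penalty(weights, l1=0.001, l2=0.001):
    """Combined L1 + L2 penalty for a single weight matrix."""
    return l1 * np.sum(np.abs(weights)) + l2 * np.sum(np.square(weights))

# Made-up 2x2 kernel: sum|w| = 3.5 and sum(w^2) = 5.25
w = np.array([[1.0, -2.0], [0.5, 0.0]])
penalty = l1_l2_penalty(w)
print(penalty)
```

With these factors the penalty is small next to an MSE computed on raw prices, so its effect on training may be limited unless the factors are increased or the targets are scaled.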

**Regularized model with relu activation function**

In [None]:
# K-fold validation: each fold holds out len(x_train) // k = 143 // 5 = 28 training rows (about 20%)
# regularized model with relu activation

k = 5
num_val_samples = len(x_train) // k
num_epochs = 100
all_scores_regular_relu = []

for i in range(k):
  print('processing fold #', i)
  val_data = x_train[i * num_val_samples: (i + 1) * num_val_samples]
  val_targets = y_train[i * num_val_samples: (i + 1) * num_val_samples]

  partial_train_data = np.concatenate([x_train[:i * num_val_samples], x_train[(i + 1) * num_val_samples:]], axis=0)
  partial_train_targets = np.concatenate([y_train[:i * num_val_samples], y_train[(i + 1) * num_val_samples:]], axis=0)
  model = build_model_regular('relu')
  
  model.fit(partial_train_data, partial_train_targets, epochs=num_epochs, batch_size=1, verbose=0)
  
  val_mse, val_mae = model.evaluate(val_data, val_targets, verbose=0) 
  all_scores_regular_relu.append(val_mae)

processing fold # 0
processing fold # 1
processing fold # 2
processing fold # 3
processing fold # 4


In [None]:
all_scores_regular_relu

[2060.09912109375,
 2359.301513671875,
 2132.947509765625,
 2444.255615234375,
 2637.423095703125]

In [None]:
np.mean(all_scores_regular_relu)

2326.80537109375

In [None]:
test_mse_regular_relu, test_mae_regular_relu = model.evaluate(x_test, y_test)



In [None]:
y_predict_regular_relu = model.predict(x_test)

In [None]:
for i in range(len(y_predict_regular_relu)):
  print("Actual: ", y_test1[i], "Prediction: ", y_predict_regular_relu[i])

Actual:  [30760.] Prediction:  [26860.035]
Actual:  [17859.167] Prediction:  [19586.117]
Actual:  [9549.] Prediction:  [9906.404]
Actual:  [11850.] Prediction:  [14810.027]
Actual:  [28248.] Prediction:  [23281.33]
Actual:  [7799.] Prediction:  [5914.882]
Actual:  [7788.] Prediction:  [6952.822]
Actual:  [9258.] Prediction:  [8283.431]
Actual:  [10198.] Prediction:  [10519.544]
Actual:  [7775.] Prediction:  [7864.8135]
Actual:  [13295.] Prediction:  [18117.559]
Actual:  [8238.] Prediction:  [7376.785]
Actual:  [18280.] Prediction:  [17155.467]
Actual:  [9988.] Prediction:  [8745.9375]
Actual:  [40960.] Prediction:  [31987.66]
Actual:  [6488.] Prediction:  [5256.8647]
Actual:  [5151.] Prediction:  [201.69736]
Actual:  [12629.] Prediction:  [14283.629]
Actual:  [8189.] Prediction:  [10203.536]
Actual:  [9960.] Prediction:  [11682.656]
Actual:  [8495.] Prediction:  [10472.642]
Actual:  [13499.] Prediction:  [21992.61]
Actual:  [8249.] Prediction:  [5808.4834]
Actual:  [6479.] Prediction: 

**Regularized model with tanh activation function**

In [None]:
# K-fold validation: each fold holds out len(x_train) // k = 143 // 5 = 28 training rows (about 20%)
# regularized model with tanh activation

k = 5
num_val_samples = len(x_train) // k
num_epochs = 100
all_scores_regular_tanh = []

for i in range(k):
  print('processing fold #', i)
  val_data = x_train[i * num_val_samples: (i + 1) * num_val_samples]
  val_targets = y_train[i * num_val_samples: (i + 1) * num_val_samples]

  partial_train_data = np.concatenate([x_train[:i * num_val_samples], x_train[(i + 1) * num_val_samples:]], axis=0)
  partial_train_targets = np.concatenate([y_train[:i * num_val_samples], y_train[(i + 1) * num_val_samples:]], axis=0)
  model = build_model_regular('tanh')
  
  model.fit(partial_train_data, partial_train_targets, epochs=num_epochs, batch_size=1, verbose=0)
  
  val_mse, val_mae = model.evaluate(val_data, val_targets, verbose=0) 
  all_scores_regular_tanh.append(val_mae)

processing fold # 0
processing fold # 1
processing fold # 2
processing fold # 3
processing fold # 4


In [None]:
all_scores_regular_tanh

[12601.578125,
 12068.994140625,
 11900.5537109375,
 14880.9931640625,
 15337.6591796875]

In [None]:
np.mean(all_scores_regular_tanh)

13357.9556640625

In [None]:
test_mse_regular_tanh, test_mae_regular_tanh = model.evaluate(x_test, y_test)



**Dropout Model Structure**

In [None]:
# dropout model
from keras import regularizers
def build_model_drop(act):
  model = models.Sequential()
  model.add(layers.Dense(10, activation= act,input_shape=(x_train.shape[1],)))
  model.add(layers.Dropout(0.1))
  model.add(layers.Dense(8, activation= act))
  model.add(layers.Dropout(0.1))
  model.add(layers.Dense(6, activation= act))
  model.add(layers.Dropout(0.1))
  model.add(layers.Dense(1))
  model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
  return model
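
What a `Dropout(0.1)` layer does during training can be sketched in NumPy: each activation is zeroed with probability 0.1 and the survivors are scaled by 1 / (1 - 0.1), while at inference time the input passes through unchanged. The function below is an illustrative stand-in, not the Keras implementation:

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of entries, rescale the rest."""
    if not training:
        return activations
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
acts = np.ones((1000, 10))
dropped = dropout(acts, rate=0.1, rng=rng)
print("fraction zeroed:", np.mean(dropped == 0.0))
print("mean after dropout:", dropped.mean())
```

The rescaling keeps the expected activation unchanged, which is why no adjustment is needed when the layer is switched off for evaluation.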

**Dropout model with relu activation function**

In [None]:
# K-fold validation: each fold holds out len(x_train) // k = 143 // 5 = 28 training rows (about 20%)
# drop model with relu activation

k = 5
num_val_samples = len(x_train) // k
num_epochs = 100
all_scores_drop_relu = []

for i in range(k):
  print('processing fold #', i)
  val_data = x_train[i * num_val_samples: (i + 1) * num_val_samples]
  val_targets = y_train[i * num_val_samples: (i + 1) * num_val_samples]

  partial_train_data = np.concatenate([x_train[:i * num_val_samples], x_train[(i + 1) * num_val_samples:]], axis=0)
  partial_train_targets = np.concatenate([y_train[:i * num_val_samples], y_train[(i + 1) * num_val_samples:]], axis=0)
  model = build_model_drop('relu')
  
  model.fit(partial_train_data, partial_train_targets, epochs=num_epochs, batch_size=1, verbose=0)
  
  val_mse, val_mae = model.evaluate(val_data, val_targets, verbose=0) 
  all_scores_drop_relu.append(val_mae)

processing fold # 0
processing fold # 1
processing fold # 2
processing fold # 3
processing fold # 4


In [None]:
all_scores_drop_relu

[2177.500244140625,
 12129.4267578125,
 2362.256103515625,
 2641.40087890625,
 3120.885498046875]

In [None]:
np.mean(all_scores_drop_relu)

4486.293896484375

In [None]:
test_mse_drop_relu, test_mae_drop_relu = model.evaluate(x_test, y_test)



In [None]:
y_predict_drop_relu = model.predict(x_test)

In [None]:
for i in range(len(y_predict_drop_relu)):
  print("Actual: ", y_test1[i], "Prediction: ", y_predict_drop_relu[i])

Actual:  [30760.] Prediction:  [24088.422]
Actual:  [17859.167] Prediction:  [17455.018]
Actual:  [9549.] Prediction:  [9375.627]
Actual:  [11850.] Prediction:  [14642.888]
Actual:  [28248.] Prediction:  [21004.166]
Actual:  [7799.] Prediction:  [5645.852]
Actual:  [7788.] Prediction:  [6261.7637]
Actual:  [9258.] Prediction:  [7687.3335]
Actual:  [10198.] Prediction:  [10561.333]
Actual:  [7775.] Prediction:  [7153.511]
Actual:  [13295.] Prediction:  [16060.99]
Actual:  [8238.] Prediction:  [7397.5713]
Actual:  [18280.] Prediction:  [16369.846]
Actual:  [9988.] Prediction:  [9665.814]
Actual:  [40960.] Prediction:  [29867.268]
Actual:  [6488.] Prediction:  [4733.7705]
Actual:  [5151.] Prediction:  [172.73401]
Actual:  [12629.] Prediction:  [14394.422]
Actual:  [8189.] Prediction:  [9983.305]
Actual:  [9960.] Prediction:  [11260.6455]
Actual:  [8495.] Prediction:  [10263.103]
Actual:  [13499.] Prediction:  [20049.555]
Actual:  [8249.] Prediction:  [5325.867]
Actual:  [6479.] Prediction

**Dropout model with tanh activation function**

In [None]:
# K-fold validation: each fold holds out len(x_train) // k = 143 // 5 = 28 training rows (about 20%)
# drop model with tanh activation

k = 5
num_val_samples = len(x_train) // k
num_epochs = 100
all_scores_drop_tanh = []

for i in range(k):
  print('processing fold #', i)
  val_data = x_train[i * num_val_samples: (i + 1) * num_val_samples]
  val_targets = y_train[i * num_val_samples: (i + 1) * num_val_samples]

  partial_train_data = np.concatenate([x_train[:i * num_val_samples], x_train[(i + 1) * num_val_samples:]], axis=0)
  partial_train_targets = np.concatenate([y_train[:i * num_val_samples], y_train[(i + 1) * num_val_samples:]], axis=0)
  model = build_model_drop('tanh')
  
  model.fit(partial_train_data, partial_train_targets, epochs=num_epochs, batch_size=1, verbose=0)
  
  val_mse, val_mae = model.evaluate(val_data, val_targets, verbose=0) 
  all_scores_drop_tanh.append(val_mae)

processing fold # 0
processing fold # 1
processing fold # 2
processing fold # 3
processing fold # 4


In [None]:
all_scores_drop_tanh

[12605.5224609375,
 12071.220703125,
 11904.5263671875,
 14885.4970703125,
 15340.263671875]

In [None]:
np.mean(all_scores_drop_tanh)

13361.4060546875
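
To compare the six experiments side by side, the mean validation MAEs printed in this notebook can be collected and ranked. The dictionary keys below are labels chosen for this sketch:

```python
# Mean validation MAE per model, copied from the outputs in this notebook
mean_val_mae = {
    "relu": 2296.164306640625,
    "tanh": 13357.7841796875,
    "regularized relu": 2326.80537109375,
    "regularized tanh": 13357.9556640625,
    "dropout relu": 4486.293896484375,
    "dropout tanh": 13361.4060546875,
}

# Lower MAE is better, so sort ascending
ranking = sorted(mean_val_mae, key=mean_val_mae.get)
for name in ranking:
    print(f"{name:18s} {mean_val_mae[name]:10.2f}")
```

On these numbers the plain relu model has the lowest mean validation MAE, the regularized relu model is close behind, and all three tanh variants sit near 13,360.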

In [None]:
test_mse_drop_tanh, test_mae_drop_tanh = model.evaluate(x_test, y_test)

