Link for colab https://colab.research.google.com/drive/1IrYsLYKXo7e9BLxI0ix5Vmi454xTKXtP?usp=sharing

# CS412 - Machine Learning - 2021
## Homework 2
100 pts


## Goal

The goal of this homework is two-fold:

*   Gain experience with neural network approaches
*   Gain experience with the Keras library

## Dataset
You are going to use a house price dataset that we prepared for you, that contains four independent variables (predictors) and one target variable. The task is predicting the target variable (house price) from the predictors (house attributes).


Download the data from SuCourse. Reserve 10% of the training data for validation and use the rest for development (learning your models). The official test data we provide (1,200 samples) should only be used for testing at the end, and not model selection.

## Task 
Build a regressor with a neural network that has only one hidden layer, using the Keras library function calls to predict house prices in the provided dataset.

Your code should follow the given skeleton and try the indicated parameters.

## Preprocessing and Meta-parameters
You should try 10, 50, and 100 as the hidden node count.

You should decide on the learning rate (step size). You can try values such as 0.001, 0.01, and 0.1, but you may need to increase it if learning is very slow or decrease it if you see the loss increase.

You can use either sigmoid or ReLU activations for the hidden nodes (indicate which with your results). You should know what to use for the output-layer activation, the input and output layer sizes, and a suitable loss function.

## Software: 

Keras is a library that we will use especially for deep learning, but also with basic neural network functionality of course.

You may find the necessary function references here: 

http://scikit-learn.org/stable/supervised_learning.html
https://keras.io/api/

When you search for Dense for instance, you should find the relevant function and explained parameters, easily.

## Submission: 

Fill this notebook. Write the report section at the end.

You should prepare a separate pdf document as your homework (name hw1-CS412-yourname.pdf) which consists of the report (Part 8) of the notebook for easy viewing -and- include a link to your notebook from within the pdf report (make sure to include the link obtained from the #share link on top right, **be sure to share with Sabancı University first** as otherwise there will be access problems.). 

## 1) Initialize

*   First make a copy of the notebook given to you as a starter.

*   Make sure you choose Connect from the upper right.


## 2) Load training dataset

* Load the datasets (train.csv, test.csv) provided on SuCourse on your Google drive and read the datasets using Google Drive's mount functions. 
You may find the necessary functions here: 
https://colab.research.google.com/notebooks/io.ipynb

In [None]:
from google.colab import drive
drive.mount('/content/drive/') 
# click on the url that pops up and give the necessary authorizations

Mounted at /content/drive/




*   Set your notebook's working directory to the path where the datasets are uploaded (cd is the Linux command for changing directory).
*   You may need to use cd drive/MyDrive depending on the path to the datasets on your Google Drive. (Don't add comments to the code in cells that use Linux commands.)






In [None]:
cd drive/MyDrive/Cs412_hw2/

/content/drive/MyDrive/Cs412_hw2


* List the files in the current directory.

In [None]:
ls

test.csv  train.csv


## 3) Understanding the dataset

There are a lot of functions that can be used to learn more about this dataset:

- What is the shape of the training set (num of samples X number of attributes) ***[shape function can be used]***

- Display attribute names ***[columns function can be used]***

- Display the first 5 rows from training dataset ***[head or sample functions can be used]***

..

In [None]:
# import the necessary libraries
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# use the Keras bundled with TensorFlow throughout, rather than mixing
# the standalone keras package with tensorflow.keras
from tensorflow import keras
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.optimizers import SGD, Adam
from sklearn.model_selection import train_test_split

train_df = pd.read_csv("train.csv")
# show first 10 elements of the training data
train_df.head(10)

Unnamed: 0,sqmtrs,nrooms,view,crime_rate,price
0,251,5,west,low,925701.721399
1,211,3,west,high,622237.482636
2,128,5,east,low,694998.182376
3,178,3,east,high,564689.015926
4,231,3,west,low,811222.970379
5,253,5,north,high,766250.032506
6,101,1,north,low,512749.401548
7,242,1,north,high,637010.760148
8,174,5,west,high,638136.374869
9,328,2,south,high,787704.988273


In [None]:
# print the shape of data

print("Data dimensionality is:", train_df.shape)

print("Data features are as follows:", train_df.columns, "\n")
print("Data types are as follows:")
train_df.dtypes


Data dimensionality is: (4800, 5)
Data features are as follows: Index(['sqmtrs', 'nrooms', 'view', 'crime_rate', 'price'], dtype='object') 

Data types are as follows:


sqmtrs          int64
nrooms          int64
view           object
crime_rate     object
price         float64
dtype: object

In [None]:
# also give some statistics about the data like mean, standard deviation etc.
print("General statistics of the numerical features are as follows;")
train_df.describe()

General statistics of the numerical features are as follows;


Unnamed: 0,sqmtrs,nrooms,price
count,4800.0,4800.0,4800.0
mean,225.033542,2.983958,725757.0
std,71.851436,1.421251,151041.1
min,100.0,1.0,356498.5
25%,163.0,2.0,617953.6
50%,226.0,3.0,729299.9
75%,287.0,4.0,838928.4
max,349.0,5.0,1076067.0


## 4) Preprocessing Steps

As some of the features (predictive variables) on this dataset are categorical (non-numeric) you need to do some preprocessing for those features.

Please read 7-features.pdf under SuCourse for converting categorical variables to dummy variables. You can use as many **dummy or indicator variables** as there are categories within one feature. You can also look at pandas' get_dummies or keras.utils.to_categorical functions.
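
A minimal sketch of indicator (dummy) encoding, written in plain Python so that each step is visible. The helpers `fit_categories` and `one_hot` are hypothetical names and are not part of pandas or Keras; the category values are taken from the `view` column shown earlier. Learning the category list once from the training data and reusing it keeps the column order identical for any data encoded later.

```python
def fit_categories(values):
    """Return the sorted list of distinct categories seen in the training column."""
    return sorted(set(values))


def one_hot(values, categories):
    """Encode each value as a 0/1 indicator row, one column per category."""
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # raises KeyError for a category unseen in training
        rows.append(row)
    return rows


train_view = ["west", "west", "east", "east", "west", "north", "south"]
view_categories = fit_categories(train_view)  # ['east', 'north', 'south', 'west']

encoded = one_hot(["west", "east"], view_categories)
print(view_categories)
print(encoded)  # [[0, 0, 0, 1], [1, 0, 0, 0]]
```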

In neural networks, scaling of the features is important, because the features affect the net input of a neuron as a whole. You should use the **MinMax scaler** from sklearn for this task, which scales the variables between 0 and 1 by default. (Remember that the mean-squared error loss function tends to be extremely large with unscaled features.)
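
The min-max rule is x' = (x - min) / (max - min), with min and max taken from the training data only. The sketch below reproduces it with NumPy as an illustration; `fit_min_max` and `apply_min_max` are hypothetical helpers, not sklearn functions, and the sample values follow the `sqmtrs` column summary above (min 100, max 349).

```python
import numpy as np


def fit_min_max(train_column):
    """Learn the scaling statistics from the training column only."""
    return float(np.min(train_column)), float(np.max(train_column))


def apply_min_max(column, col_min, col_max):
    """Scale with previously learned statistics; new data may fall outside [0, 1]."""
    return (np.asarray(column, dtype=float) - col_min) / (col_max - col_min)


train_sqmtrs = np.array([251, 211, 128, 100, 349])
col_min, col_max = fit_min_max(train_sqmtrs)

scaled_train = apply_min_max(train_sqmtrs, col_min, col_max)
# later data reuses the training statistics; 360 exceeds the training max
scaled_new = apply_min_max([169, 360], col_min, col_max)

print(scaled_train.round(6))
print(scaled_new.round(6))
```

Reusing the training statistics means every dataset passes through the same mapping, so a given square-meter value always produces the same network input.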


In [None]:
# encode the categorical variables
categorical_features = ['view', 'crime_rate']
train_df = pd.get_dummies(data=train_df, columns=categorical_features)
#train_df = train_df.reindex(columns=['sqmtrs','nrooms','view_east','view_north','view_south','view_west', 'crime_rate_high', 'crime_rate_low', 'price'])

tempprice = train_df[['price']]
train_df = train_df.drop("price", axis=1)
# scale the features between 0-1
from sklearn.preprocessing import MinMaxScaler
msc = MinMaxScaler()
scaled_train = msc.fit_transform(train_df)
scaled_train_df2 = pd.DataFrame(scaled_train, columns = train_df.columns.values)
scaled_train_df = pd.concat((scaled_train_df2, tempprice), axis=1)
print("Data dimensionality is:", scaled_train_df.shape)
scaled_train_df.head(10)

Data dimensionality is: (4800, 9)


Unnamed: 0,sqmtrs,nrooms,view_east,view_north,view_south,view_west,crime_rate_high,crime_rate_low,price
0,0.606426,1.0,0.0,0.0,0.0,1.0,0.0,1.0,925701.721399
1,0.445783,0.5,0.0,0.0,0.0,1.0,1.0,0.0,622237.482636
2,0.11245,1.0,1.0,0.0,0.0,0.0,0.0,1.0,694998.182376
3,0.313253,0.5,1.0,0.0,0.0,0.0,1.0,0.0,564689.015926
4,0.526104,0.5,0.0,0.0,0.0,1.0,0.0,1.0,811222.970379
5,0.614458,1.0,0.0,1.0,0.0,0.0,1.0,0.0,766250.032506
6,0.004016,0.0,0.0,1.0,0.0,0.0,0.0,1.0,512749.401548
7,0.570281,0.0,0.0,1.0,0.0,0.0,1.0,0.0,637010.760148
8,0.297189,1.0,0.0,0.0,0.0,1.0,1.0,0.0,638136.374869
9,0.915663,0.25,0.0,0.0,1.0,0.0,1.0,0.0,787704.988273


Don't forget to split the training data to obtain a validation set. **Use random_state=42**

In [None]:
# split 90-10
from sklearn.utils import shuffle
target = 'price'
X = scaled_train_df.drop(target, axis=1)
Y = scaled_train_df[[target]]
#X, Y = shuffle(X, Y, random_state = 42)

Xtrain, Xval, Ytrain, Yval = train_test_split(X, Y, test_size = 0.1, random_state=42)
print(Xtrain.shape, Xval.shape, Ytrain.shape, Yval.shape)



(4320, 8) (480, 8) (4320, 1) (480, 1)
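
As an illustration of what a seeded 90/10 split does, the sketch below shuffles the row indices with a fixed seed and holds out 10% of them. It is not the procedure `train_test_split` uses internally and will not select the same rows; `split_indices` is a hypothetical helper.

```python
import numpy as np


def split_indices(n_rows, val_fraction=0.1, seed=42):
    """Shuffle row indices reproducibly and hold out a validation slice."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)
    n_val = int(round(n_rows * val_fraction))
    return order[n_val:], order[:n_val]  # (training indices, validation indices)


train_idx, val_idx = split_indices(4800)
print(len(train_idx), len(val_idx))  # 4320 480
```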


## 5) Train neural networks on development data and do model selection using the validation data


* Train a neural network with **one hidden layer** (try 3 different values for the number of neurons in that hidden layer, as 25, 50, 100), you will need to correctly choose the optimizer and the loss function that this model will train with. Use batch_size as 64 and train each model for 30 epochs. 

* Train another neural network with two hidden layers with meta-parameters of your choice. Again, use batch_size as 64 and train the model for 30 epochs. 

* **Bonus (5 pts)** Train a KNN or a Decision Tree model with your own choice of meta parameters to predict the house prices.
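
For the one-hidden-layer networks in the first item, a plain NumPy sketch can make the layer sizes concrete: 8 input features (after encoding), one ReLU hidden layer, and a single linear output unit. This is an illustration only, not how Keras implements it; `init_params`, `forward`, and `count_params` are hypothetical helpers and the weight initialization is arbitrary.

```python
import numpy as np


def init_params(n_inputs, n_hidden, seed=0):
    """Random weights and zero biases for a one-hidden-layer regressor."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.1, size=(n_inputs, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.1, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }


def forward(params, X):
    """ReLU hidden layer followed by a linear (identity) output unit."""
    hidden = np.maximum(0.0, X @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]


def count_params(params):
    """Total number of trainable weights and biases."""
    return sum(p.size for p in params.values())


X_batch = np.random.default_rng(1).random((64, 8))  # one batch of 8 scaled features
for n_hidden in (25, 50, 100):
    params = init_params(8, n_hidden)
    preds = forward(params, X_batch)
    print(n_hidden, preds.shape, count_params(params))
```

With 8 inputs the parameter count is 10h + 1 for h hidden neurons, which gives 251, 501, and 1001 for the three sizes.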


In [None]:
# train one-hidden layered neural networks
# define your model architecture
model = tf.keras.Sequential()
#model.add(Flatten())
model.add(Dense(25, activation='relu', input_shape=(Xtrain.shape[1],), name='hidden_1'))
model.add(tf.keras.layers.Dense(1, name='output_layer'))

model3 = tf.keras.Sequential()
#model.add(Flatten())
model3.add(Dense(50, activation='relu', input_shape=(Xtrain.shape[1],), name='hidden_1'))
model3.add(tf.keras.layers.Dense(1, name='output_layer'))

model4 = tf.keras.Sequential()
#model.add(Flatten())
model4.add(Dense(100, activation='relu', input_shape=(Xtrain.shape[1],), name='hidden_1'))
model4.add(tf.keras.layers.Dense(1, name='output_layer'))


In [None]:
# compile each model with its own optimizer instance
# (an optimizer keeps per-model state, so one instance should not be shared)
# SGD was also tried: tf.keras.optimizers.SGD(learning_rate=0.1)
model.compile(loss='mean_squared_error', optimizer=Adam(learning_rate=0.1))
model3.compile(loss='mean_squared_error', optimizer=Adam(learning_rate=0.1))
model4.compile(loss='mean_squared_error', optimizer=Adam(learning_rate=0.1))

In [None]:
# fit the model on training data
model.fit(Xtrain, Ytrain, batch_size=64, epochs=30)

Epoch 1/30
...
Epoch 30/30


<keras.callbacks.History at 0x7f6c4017e950>

In [None]:
model3.fit(Xtrain, Ytrain, batch_size=64, epochs=30)

Epoch 1/30
...
Epoch 30/30


<keras.callbacks.History at 0x7f6bc9b8b090>

In [None]:
model4.fit(Xtrain, Ytrain, batch_size=64, epochs=30)

Epoch 1/30
...
Epoch 30/30


<keras.callbacks.History at 0x7f6bc9afda90>

In [None]:
# train a two-hidden layered neural network
model2 = tf.keras.Sequential()
#model2.add(Flatten())
model2.add(Dense(100, activation='relu', input_shape=(Xtrain.shape[1],), name='hidden_1'))
model2.add(Dense(100, activation='relu', name='hidden_2'))
model2.add(tf.keras.layers.Dense(1, name = 'output_layer'))
# ...


In [None]:
adam = Adam(learning_rate=0.1)
model2.compile(loss='mean_squared_error', optimizer=adam)

In [None]:
model2.fit(Xtrain, Ytrain, batch_size=64, epochs=30)

Epoch 1/30
...
Epoch 30/30


<keras.callbacks.History at 0x7f6bc99f32d0>

In [None]:
#Bonus Part
#I am using a decision tree regressor
#Test part will be at the bottom
from sklearn.tree import DecisionTreeRegressor
from timeit import default_timer as timer
from sklearn.metrics import mean_squared_error
start = timer()
max_depth_array = np.arange(5, 41, 5)
min_samples_array = [3, 5, 10, 15, 25, 50]
best_acc = 10000000000
best_depth = 0
best_sample_split = 0
accs = []
for i in max_depth_array:
  for j in min_samples_array:
    decision_tree = DecisionTreeRegressor(max_depth = i, min_samples_split = j)
    decision_tree.fit(Xtrain, Ytrain)
    ypred_on_train = decision_tree.predict(Xtrain)
    acc_score = mean_squared_error(Ytrain,ypred_on_train)
    print("Training mean squared error is ", acc_score, 
          ", obtained with maximum depth = ", i, " and min_samples_split = ", j)
    accs.append(acc_score)
    if(acc_score < best_acc):
      best_acc = acc_score
      best_depth = i 
      best_sample_split = j
# ...
end = timer()
print("\nBest training mean squared error is ", 
      best_acc, ", obtained with maximum depth = ", 
      best_depth, " and min_samples_split = ", best_sample_split)
print("Total time = ", '{:.3f}'.format((end-start)), "s")
# Report your results 

Training mean squared error is  628301063.6566485 , obtained with maximum depth =  5  and min_samples_split =  3
Training mean squared error is  628301063.6566485 , obtained with maximum depth =  5  and min_samples_split =  5
Training mean squared error is  628301063.6566485 , obtained with maximum depth =  5  and min_samples_split =  10
Training mean squared error is  628301063.6566485 , obtained with maximum depth =  5  and min_samples_split =  15
Training mean squared error is  628301063.6566485 , obtained with maximum depth =  5  and min_samples_split =  25
Training mean squared error is  628301063.6566485 , obtained with maximum depth =  5  and min_samples_split =  50
Training mean squared error is  21283396.977072828 , obtained with maximum depth =  10  and min_samples_split =  3
Training mean squared error is  22358991.135699973 , obtained with maximum depth =  10  and min_samples_split =  5
Training mean squared error is  27886935.351051968 , obtained with maximum depth =  10  

In [None]:
start = timer()
best_acc = 10000000000
best_depth = 0
best_sample_split = 0
accs = []
for i in max_depth_array:
  for j in min_samples_array:
    decision_tree = DecisionTreeRegressor(max_depth = i, min_samples_split = j)
    decision_tree.fit(Xtrain, Ytrain)
    ypred_on_val = decision_tree.predict(Xval)
    acc_score = mean_squared_error(Yval, ypred_on_val)
    print("Validation mean squared error is ", 
          acc_score,
          "obtained with maximum depth = ", i, " and min_samples_split = ", j)
    accs.append(acc_score)
    if(acc_score < best_acc):
      best_acc = acc_score
      best_depth = i 
      best_sample_split = j
# ...
end = timer()
print("\nBest validation mean squared error is ", 
      best_acc,
       ", obtained with maximum depth = ", 
      best_depth, " and min_samples_split = ", best_sample_split)
print("Total time = ", '{:.3f}'.format((end-start)), "s")

Validation mean squared error is  724648862.1645591 obtained with maximum depth =  5  and min_samples_split =  3
Validation mean squared error is  724648862.1645582 obtained with maximum depth =  5  and min_samples_split =  5
Validation mean squared error is  724648862.1645586 obtained with maximum depth =  5  and min_samples_split =  10
Validation mean squared error is  724648862.1645585 obtained with maximum depth =  5  and min_samples_split =  15
Validation mean squared error is  724648862.1645583 obtained with maximum depth =  5  and min_samples_split =  25
Validation mean squared error is  724648862.1645586 obtained with maximum depth =  5  and min_samples_split =  50
Validation mean squared error is  55625850.794416934 obtained with maximum depth =  10  and min_samples_split =  3
Validation mean squared error is  56175503.23947415 obtained with maximum depth =  10  and min_samples_split =  5
Validation mean squared error is  57424018.84212972 obtained with maximum depth =  10  an

## 6) Test your trained models on the Validation set
Test your trained models on the validation set and print the mean squared errors.


In [None]:
# tests on validation
score = model.evaluate(Xval, Yval)
#print('Validation loss of model 1:', score)
score2 = model2.evaluate(Xval, Yval) #two hidden layered
score3 = model3.evaluate(Xval, Yval)
score4 = model4.evaluate(Xval, Yval)


#print('Validation loss of model 2:', score2)
#...




## 7) Test your model on the Test set

- Load test data
- Apply same pre-processing as training data (encoding categorical variables, scaling)
- Predict the target values of the test data **using the best model that you have selected according to your validation results** and report the mean squared error. 

In [None]:
test_df = pd.read_csv("test.csv")
# show first 10 elements of the test data
test_df.head(10)

Unnamed: 0,sqmtrs,nrooms,view,crime_rate,price
0,349,3,south,high,836553.5
1,169,1,west,high,512741.6
2,233,3,south,high,663880.6
3,340,4,north,low,1000086.0
4,199,2,east,low,745015.1
5,332,1,east,high,774017.1
6,294,3,west,low,913263.4
7,111,3,east,low,586111.6
8,310,5,north,low,1012929.0
9,307,4,west,low,971532.7


In [None]:
categorical_features = ['view', 'crime_rate']
test_df = pd.get_dummies(data=test_df, columns=categorical_features)
#test_df = test_df.reindex(columns=['sqmtrs','nrooms','view_east','view_north','view_south','view_west', 'crime_rate_high', 'crime_rate_low', 'price'])

tempprice2 = test_df[['price']]
test_df = test_df.drop("price", axis=1)
# scale the test features with the scaler that was fitted on the training data
# (refitting on the test set would apply different min/max values than the model saw)
scaled_test = msc.transform(test_df)
scaled_test_df2 = pd.DataFrame(scaled_test, columns = test_df.columns.values)
scaled_test_df = pd.concat((scaled_test_df2, tempprice2), axis=1)

print("Data dimensionality is:", scaled_test_df.shape)
scaled_test_df.head(10)

Data dimensionality is: (1200, 9)


Unnamed: 0,sqmtrs,nrooms,view_east,view_north,view_south,view_west,crime_rate_high,crime_rate_low,price
0,1.0,0.5,0.0,0.0,1.0,0.0,1.0,0.0,836553.5
1,0.277108,0.0,0.0,0.0,0.0,1.0,1.0,0.0,512741.6
2,0.534137,0.5,0.0,0.0,1.0,0.0,1.0,0.0,663880.6
3,0.963855,0.75,0.0,1.0,0.0,0.0,0.0,1.0,1000086.0
4,0.39759,0.25,1.0,0.0,0.0,0.0,0.0,1.0,745015.1
5,0.931727,0.0,1.0,0.0,0.0,0.0,1.0,0.0,774017.1
6,0.779116,0.5,0.0,0.0,0.0,1.0,0.0,1.0,913263.4
7,0.044177,0.5,1.0,0.0,0.0,0.0,0.0,1.0,586111.6
8,0.843373,1.0,0.0,1.0,0.0,0.0,0.0,1.0,1012929.0
9,0.831325,0.75,0.0,0.0,0.0,1.0,0.0,1.0,971532.7


In [None]:
target = 'price'
Xtest = scaled_test_df.drop(target, axis=1)
Ytest = scaled_test_df[[target]]

In [None]:
Xtest.head()

Unnamed: 0,sqmtrs,nrooms,view_east,view_north,view_south,view_west,crime_rate_high,crime_rate_low
0,1.0,0.5,0.0,0.0,1.0,0.0,1.0,0.0
1,0.277108,0.0,0.0,0.0,0.0,1.0,1.0,0.0
2,0.534137,0.5,0.0,0.0,1.0,0.0,1.0,0.0
3,0.963855,0.75,0.0,1.0,0.0,0.0,0.0,1.0
4,0.39759,0.25,1.0,0.0,0.0,0.0,0.0,1.0


In [None]:
Ytest.head()

Unnamed: 0,price
0,836553.5
1,512741.6
2,663880.6
3,1000086.0
4,745015.1


In [None]:
scoretest = model.evaluate(Xtest, Ytest)

scoretest2 = model4.evaluate(Xtest, Ytest)




In [None]:
scoretesthidden2 = model2.evaluate(Xtest, Ytest)



In [None]:
scorecheck3 = pd.DataFrame({'Real Values':Ytest.values.reshape(-1), 'Predicted Values':model4.predict(Xtest).reshape(-1)})
scorecheck3.head(20)

Unnamed: 0,Real Values,Predicted Values
0,836553.5,856033.4
1,512741.6,504827.0
2,663880.6,674134.1
3,1000086.0,1007487.0
4,745015.1,731869.2
5,774017.1,760613.8
6,913263.4,909844.1
7,586111.6,623069.6
8,1012929.0,989636.9
9,971532.7,959422.1


In [None]:
scorecheck2 = pd.DataFrame({'Real Values':Ytest.values.reshape(-1), 'Predicted Values':model2.predict(Xtest).reshape(-1)})
scorecheck2.head(20)

Unnamed: 0,Real Values,Predicted Values
0,836553.5,880860.6
1,512741.6,498931.1
2,663880.6,680999.3
3,1000086.0,1030760.0
4,745015.1,728669.4
5,774017.1,777684.0
6,913263.4,924991.9
7,586111.6,607607.6
8,1012929.0,1009629.0
9,971532.7,977946.9


In [None]:
# test results of Decision Tree model
start = timer()
decision_tree = DecisionTreeRegressor(max_depth = best_depth, min_samples_split = best_sample_split)
decision_tree.fit(Xtrain, Ytrain)
ypred_on_test = decision_tree.predict(Xtest)
acc_score_test = mean_squared_error(Ytest, ypred_on_test)
end = timer()


print("Mean Squared error is ", acc_score_test, ", obtained with maximum depth = ", best_depth, " and min_samples_split = ", best_sample_split)
print("Total time = ", '{:.3f}'.format((end-start)), "s") 

Mean Squared error is  54527567.66444928 , obtained with maximum depth =  35  and min_samples_split =  5
Total time =  0.022 s


In [None]:
scorecheck = pd.DataFrame({'Real Values':Ytest.values.reshape(-1), 'Predicted Values':ypred_on_test.reshape(-1)})
scorecheck.head(20)

Unnamed: 0,Real Values,Predicted Values
0,836553.5,842133.6
1,512741.6,506514.6
2,663880.6,668993.8
3,1000086.0,1006515.0
4,745015.1,734043.5
5,774017.1,770451.6
6,913263.4,903374.2
7,586111.6,574652.4
8,1012929.0,1014505.0
9,971532.7,969273.2



## 8) Report Your Results

**Notebook should be RUN:** As training and testing may take a long time, we may just look at your notebook results without running the code again; so make sure **each cell is run**, so outputs are there.

**Report:** Write a **1-2 page summary** of your approach to this problem **below**. 

**Must include statements such as those below:**
**(Remove the text in parentheses, below, and include your own report)**

The purpose of this homework was to implement a neural network that predicts house prices from each house's area in square meters, view, crime rate, and number of rooms.

We have two CSV files. I used the data in train.csv for training and validation, and the data in test.csv to evaluate the mean squared error loss of the model.

As stated in the homework guideline, I split train.csv 90%-10% into training and validation sets and used test.csv as the test data. I split the training dataframe with the train_test_split function, using random_state=42.

Since some features (crime rate and view) are categorical, both the training and the test data need preprocessing. Using the get_dummies function from the pandas library, I converted the categorical features into numerical indicator features. Since we were asked not to scale the price between 0 and 1, I left the price unscaled. However, to help learning, I scaled the other features between 0 and 1 with sklearn's MinMaxScaler. I applied the same operations to the test.csv data.
 

**Add your observations as follows** (keep the questions for easy grading/context) in the report part of your notebook.

**Observations**

- Try a few learning rates for N=25 hidden neurons,  train for the indicated amount of epochs. Comment on what happens when learning rate is large or small? What is a good number/range for the learning rate?

**Neural Network with ReLU activation function Training Loss Results**
  >Mean Squared Error Results | learning_rate = 0.1 | learning_rate = 0.01| learning_rate = 0.001 |
>---|---|:---|:---|
>Neuron amount = 25|3643362304.0|473466208256.0|549252988928.0|
>Neuron amount = 50|975219776.0|388201906176.0|548941365248.0|
>Neuron amount = 100|**314840288.0**|275274334208.0|546926297088.0|

**Neural Network with Softmax activation function Training Loss Results**
  >Mean Squared Error Results | learning_rate = 0.1 | learning_rate = 0.01| learning_rate = 0.001 |
>---|---|:---|:---|
>Neuron amount = 25|549759811584.0|550286852096.0|550340788224.0|
>Neuron amount = 50|549756862464.0|550285869056.0|550340591616.0|
>Neuron amount = 100|**549754994688.0**|550285475840.0|550340657152.0|

**Neural Network with Sigmoid activation function Training Loss Results**
  >Mean Squared Error Results | learning_rate = 0.1 | learning_rate = 0.01| learning_rate = 0.001 |
>---|---|:---|:---|
>Neuron amount = 25|542758797312.0|549559009280.0|550263324672.0|
>Neuron amount = 50|535541022720.0|548814192640.0|550183108608.0|
>Neuron amount = 100|**521323970560.0**|547319447552.0|550018351104.0|

For every activation function and hidden layer size I tried, the training mean squared error decreased as the learning rate increased over the tested values (0.001, 0.01, 0.1), so the predicted target values got closer to the real ones. Among these learning rates, the best training MSE was obtained with 0.1, and the worst generally with 0.001.
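
Two reference points may help when reading the tables above: the MSE of a model that always predicts 0, and the MSE of a model that always predicts the mean price. The sketch below estimates both from the `describe()` summary of the full training file (mean 725757.0, std 151041.1). These are approximations, since the reported losses were computed on the 90% training split and pandas reports the sample standard deviation.

```python
mean_price = 725757.0   # from train_df.describe()
std_price = 151041.1    # sample standard deviation from train_df.describe()

# For a constant prediction c: E[(y - c)^2] = Var(y) + (E[y] - c)^2
mse_predict_mean = std_price ** 2
mse_predict_zero = std_price ** 2 + mean_price ** 2

# training losses copied from the tables above
reported = {
    "ReLU, 100 neurons, lr=0.1": 314840288.0,
    "Sigmoid, 100 neurons, lr=0.1": 521323970560.0,
    "ReLU, 25 neurons, lr=0.001": 549252988928.0,
}

print(f"predict mean: {mse_predict_mean:.3e}")
print(f"predict zero: {mse_predict_zero:.3e}")
for name, mse in reported.items():
    print(f"{name}: {mse:.3e} ({mse / mse_predict_zero:.3f} of the predict-zero MSE)")
```

Under these assumptions, losses near 5.5e11 are about what a network whose outputs are still close to zero would produce, which suggests that the runs with small learning rates had not yet moved the output up to the price scale within 30 epochs.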

- Use that learning rate and vary the number of hidden neurons for the given values and try the indicated number of epochs. Give the validation mean squared errors for the different approaches and meta-parameters tried **in a table** and state which one you selected as your model. How many hidden neurons give the best model? 

**Neural Network Validation Loss Results with learning rate 0.1**
  >Mean Squared Error Results | ReLU | Softmax | Sigmoid |
>---|---|:---|:---|
>Neuron amount = 25|3802443776.0|530309087232.0|541256679424.0|
>Neuron amount = 50|1002926912.0|519058423808.0|541252321280.0|
>Neuron amount = 100|**326972032.0**|497063755776.0|541252059136.0|

The best validation result came from the model with 100 hidden neurons and the ReLU activation function.

- State what your test results are with the chosen approach and meta-parameters:
We obtained the best results on the validation set with the one-hidden-layer approach with **100 neurons**, using a learning rate of 0.1, the ReLU activation function, and mean squared error as the loss; the validation loss was **326972032.0**. On the test data, this model gives a mean squared error of **320377216.0**.

Additionally, I tested both SGD and Adam and decided to use Adam as the optimizer.

- How slow is learning? Any other problems?
When the hidden layer was smaller, the mean squared error was higher. When I changed the learning rate between 0.1 and 0.001, the loss value changed as well. The MSE is still very high, so I think one hidden layer does not give a satisfactory result: the model with two hidden layers gives a better one. Adding hidden layers might be advantageous.

When I also scaled the price, as I saw in the recitation, the best learning rate was 0.001. However, Berrin Hoca emailed us asking us not to scale the price values. If you do scale the price, you should convert the predictions back to the original price units. With that approach, every feature and the target lie between 0 and 1, so taking small steps (a small learning rate) gives better results.
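
A minimal sketch of that alternative, assuming min-max scaling of the target with the price range from the `describe()` output above. `scale_price` and `unscale_price` are hypothetical helpers, and the prediction errors are made up for illustration. It also shows that an MSE measured on scaled prices must be multiplied by the squared price range to be comparable with an MSE in original units.

```python
import numpy as np

price_min, price_max = 356498.5, 1076067.0  # from train_df.describe()
price_range = price_max - price_min


def scale_price(price):
    """Map prices to [0, 1] using the training min and max."""
    return (np.asarray(price, dtype=float) - price_min) / price_range


def unscale_price(scaled):
    """Map scaled predictions back to original price units."""
    return np.asarray(scaled, dtype=float) * price_range + price_min


true_price = np.array([925701.72, 622237.48, 694998.18])
# pretend model outputs in scaled units (errors are made up)
pred_scaled = scale_price(true_price) + np.array([0.01, -0.02, 0.005])

pred_price = unscale_price(pred_scaled)
mse_scaled = np.mean((pred_scaled - scale_price(true_price)) ** 2)
mse_price = np.mean((pred_price - true_price) ** 2)

print(mse_scaled, mse_price)
print(np.isclose(mse_price, mse_scaled * price_range ** 2))  # True
```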

In our case, however, the scale of the price differs from that of the other features, so taking bigger steps helped decrease the MSE, as the values I got with the different activation functions show. I think learning is slow, since the MSE is still huge. However, changing the learning rate and adding hidden layers might decrease it. For instance, with 2 hidden layers, the ReLU activation function, and a learning rate of 0.1, I get a smaller MSE of 242528768.0.
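
Because the MSE is in squared price units, taking its square root (RMSE) may make the magnitudes easier to judge. The sketch below converts the MSE values reported in this notebook and compares them with the mean price from `describe()`. The dictionary labels are descriptive only, and the two-hidden-layer value is the one quoted in the paragraph above.

```python
import math

mean_price = 725757.0  # from train_df.describe()

reported_mse = {
    "1 hidden layer, 100 neurons (validation)": 326972032.0,
    "1 hidden layer, 100 neurons (test)": 320377216.0,
    "2 hidden layers (value quoted above)": 242528768.0,
    "decision tree (test)": 54527567.66444928,
}

# RMSE is in the same units as the price itself
rmse = {name: math.sqrt(mse) for name, mse in reported_mse.items()}
for name, value in rmse.items():
    print(f"{name}: RMSE = {value:,.0f} ({100 * value / mean_price:.1f}% of mean price)")
```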

I also implemented a decision tree regressor to predict prices. Above, I listed the values predicted by each model next to the actual values for a visual comparison.

