# **CS412 - Machine Learning - 2022**
## Assignment #2
100 pts


## Goal

The goal of this homework is two-fold:

*   Gain experience with neural network approaches
*   Gain experience with the Keras library

## Dataset
You are going to use a house price dataset that we prepared for you; it contains four independent variables (predictors) and one target variable. The task is to predict the target variable (house price) from the predictors (house attributes).


Download the data from SuCourse. Reserve 10% of the training data for validation and use the rest for development (learning your models). The official test data we provide (1,200 samples) should only be used for testing at the end, not for model selection.

## Task 
Build a regressor with a neural network that has only one hidden layer, using the Keras library function calls to predict house prices in the provided dataset.

Your code should follow the given skeleton and try the indicated parameters.

## Preprocessing and Meta-parameters
You should try 25, 50, and 100 as the hidden node count.

You should decide on the learning rate (step size). You can try values such as 0.001, 0.01, and 0.1, but you may need to increase it if learning is very slow or decrease it if you see the loss increase.
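The effect of the step size can be seen on a toy problem. The following is a minimal sketch, not part of the assignment skeleton: it runs plain gradient descent on the made-up function f(w) = w², and the helper name `run_gradient_descent` and the three learning rates are chosen only for illustration.

```python
# Gradient descent on f(w) = w**2, whose gradient is 2*w and whose minimum is at w = 0.
def run_gradient_descent(learning_rate, steps=20, w_start=1.0):
    w = w_start
    for _ in range(steps):
        w = w - learning_rate * 2 * w  # update rule: w <- w - lr * gradient
    return w

# A very small rate barely moves, a moderate rate approaches 0,
# and a rate that is too large makes |w| grow at every step.
for lr in (0.001, 0.1, 1.1):
    print(f"lr={lr}: w after 20 steps = {run_gradient_descent(lr):.4f}")
```

On this toy function the smallest rate is still far from the minimum after 20 steps, while the largest one diverges. Real networks behave less cleanly, so treat this only as intuition.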

You can use either sigmoid or ReLU activations for the hidden nodes (indicate your choice with your results). You should know what to use for the output layer activation, the input and output layer sizes, and the suitable loss function.
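To make the layer sizes concrete, here is a small NumPy sketch of one forward pass through a one-hidden-layer regressor. It is an illustration under assumptions, not the Keras code you are asked to write: the data is random, the weights are untrained, 25 hidden nodes is just one of the suggested sizes, and 4 inputs assumes one numeric column per predictor (dummy variables would increase the input size).

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_hidden, n_outputs = 4, 25, 1   # 4 predictors, one predicted price
X = rng.random((8, n_inputs))              # 8 made-up samples scaled to [0, 1]
y = rng.random((8, n_outputs))             # made-up scaled targets

# Untrained weights, only used to show the shapes involved.
W1 = rng.normal(scale=0.1, size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_outputs))
b2 = np.zeros(n_outputs)

hidden = np.maximum(0.0, X @ W1 + b1)      # ReLU hidden activation
y_pred = hidden @ W2 + b2                  # linear (identity) output for regression
mse = np.mean((y_pred - y) ** 2)           # mean squared error loss

print(y_pred.shape, float(mse))
```

The linear output lets the network produce any real value, which is why it pairs naturally with a squared-error loss in regression.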

## Software: 

Keras is a library that we will use mainly for deep learning, but it also provides basic neural network functionality.

You may find the necessary function references here: 

http://scikit-learn.org/stable/supervised_learning.html
https://keras.io/api/

When you search for Dense, for instance, you should easily find the relevant function and an explanation of its parameters.

## Submission: 

Fill this notebook. Write the report section at the end.

You should prepare a separate PDF document as your homework (named hw2-CS412-yourname.pdf) that consists of the report (Part 8) of the notebook for easy viewing, and include a link to your notebook within the PDF report (use the link obtained from the share link at the top right; **be sure to share with Sabancı University first**, as otherwise there will be access problems). Also, do not forget to add your answers for Questions 2 and 3 on the assignment document.

## 1) Initialize

*   First make a copy of the notebook given to you as a starter.

*   Make sure you choose Connect from the upper right.


## 2) Load training dataset

* Upload the datasets (train.csv, test.csv) provided on SuCourse to your Google Drive and read them using Google Drive's mount functions. 
You may find the necessary functions here: 
https://colab.research.google.com/notebooks/io.ipynb

In [1]:
from google.colab import drive
drive.mount('/content/drive/') 
# click on the url that pops up and give the necessary authorizations

Mounted at /content/drive/




*   Set your notebook's working directory to the path where the datasets are uploaded (cd is the Linux command for changing the directory).
*   You may need to use cd drive/MyDrive depending on your path to the datasets on Google Drive. (Do not add comments to cells that contain Linux commands.)






In [2]:
cd drive/My\ Drive/CS412/

/content/drive/My Drive/CS412


* List the files in the current directory.

In [3]:
ls

dfBotTweets.csv  test.csv  train.csv


## 3) Understanding the dataset (5 pts)

There are a lot of functions that can be used to learn more about this dataset:

- What is the shape of the training set (num of samples X number of attributes) **[shape function can be used]**

- Display attribute names **[columns function can be used]**

- Display the first 5 rows from training dataset **[head or sample functions can be used]**

..

In [4]:
# import the necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


In [5]:
# load the data
train_df = pd.read_csv("train.csv")

In [6]:
# show first 10 elements of the training data
train_df.head(10)

Unnamed: 0,sqmtrs,nrooms,view,crime_rate,price
0,251,5,west,low,925701.721399
1,211,3,west,high,622237.482636
2,128,5,east,low,694998.182376
3,178,3,east,high,564689.015926
4,231,3,west,low,811222.970379
5,253,5,north,high,766250.032506
6,101,1,north,low,512749.401548
7,242,1,north,high,637010.760148
8,174,5,west,high,638136.374869
9,328,2,south,high,787704.988273


In [7]:
# print the shape of data
print("Data dimensionality is:", train_df.shape, "\n")

# also give some statistics about the data like mean, standard deviation etc.
# numeric_only=True skips the categorical columns (view, crime_rate)
print("Mean of the data is:\n", train_df.mean(numeric_only=True), "\n")
print("Standard deviation of the data is:\n", train_df.std(numeric_only=True))


Data dimensionality is: (4800, 5) 

Mean of the data is:
 sqmtrs       225.033542
nrooms         2.983958
price     725756.960758
dtype: float64 

Standard deviation of the data is:
 sqmtrs        71.851436
nrooms         1.421251
price     151041.121658
dtype: float64




## 4) Preprocessing Steps (10 pts)

As some of the features (predictive variables) on this dataset are categorical (non-numeric) you need to do some preprocessing for those features.

You can use as many **dummy or indicator variables** as there are categories within one feature. You can also look at pandas' get_dummies or keras.utils.to_categorical functions.

In neural networks, the scaling of the features is important because it affects the net input of a neuron as a whole. You should use the **MinMax scaler** from sklearn for this task, which scales the variables between 0 and 1 by default. (Remember that the mean-squared error loss function tends to be extremely large with unscaled features.)
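The two suggestions above, indicator variables and min-max scaling, can be combined as in the following sketch. It is one possible approach, not the required solution: the three-row DataFrame is made up and only reuses the column names of the assignment data.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Tiny made-up frame with the same column names as the assignment data.
df = pd.DataFrame({
    "sqmtrs": [100, 200, 300],
    "nrooms": [1, 3, 5],
    "view": ["east", "west", "north"],
    "crime_rate": ["low", "high", "low"],
})

# One indicator column per category of each categorical feature.
encoded = pd.get_dummies(df, columns=["view", "crime_rate"], dtype=float)

# Rescale every column to the [0, 1] range.
scaled = pd.DataFrame(MinMaxScaler().fit_transform(encoded), columns=encoded.columns)
print(scaled)
```

With indicator variables the network input size grows with the number of categories, so the input layer would no longer have just four units.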


In [8]:
# The view and crime_rate features are categorical.
# To train on them, we have to convert them to numerical values.
# I used LabelEncoder to do this.

In [9]:
# encode the categorical variables
# first encode the view column
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
label = le.fit_transform(train_df['view'])
train_df.drop("view", axis=1, inplace=True)
train_df["view"] = label
print(train_df)

      sqmtrs  nrooms crime_rate          price  view
0        251       5        low  925701.721399     3
1        211       3       high  622237.482636     3
2        128       5        low  694998.182376     0
3        178       3       high  564689.015926     0
4        231       3        low  811222.970379     3
...      ...     ...        ...            ...   ...
4795     231       5        low  886024.190216     0
4796     229       2        low  781430.834086     0
4797     334       2       high  786639.104497     3
4798     332       2       high  778546.285214     1
4799     244       4        low  867039.582695     0

[4800 rows x 5 columns]
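The integer codes in the view column follow from how LabelEncoder works: it sorts the distinct categories and uses each category's position as its code. The short sketch below shows this on a made-up list of view values rather than on the assignment data.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(["west", "west", "east", "east", "north", "south"])

# classes_ holds the sorted categories; a category's position is its integer code.
mapping = {str(category): int(code) for code, category in enumerate(le.classes_)}
print(mapping)          # {'east': 0, 'north': 1, 'south': 2, 'west': 3}
print(codes.tolist())   # [3, 3, 0, 0, 1, 2]
```

Note that these codes impose an artificial order on the categories (east < north < south < west), which is one reason indicator variables are often preferred for features without a natural order.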


In [10]:
# encode the crime_rate column
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
label = le.fit_transform(train_df['crime_rate'])
train_df.drop("crime_rate", axis=1, inplace=True)
train_df["crime_rate"] = label
print(train_df)

      sqmtrs  nrooms          price  view  crime_rate
0        251       5  925701.721399     3           1
1        211       3  622237.482636     3           0
2        128       5  694998.182376     0           1
3        178       3  564689.015926     0           0
4        231       3  811222.970379     3           1
...      ...     ...            ...   ...         ...
4795     231       5  886024.190216     0           1
4796     229       2  781430.834086     0           1
4797     334       2  786639.104497     3           0
4798     332       2  778546.285214     1           0
4799     244       4  867039.582695     0           1

[4800 rows x 5 columns]


In [11]:
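# I would use this part for the KNN and the Decision tree
# Note: plain assignment does not copy the DataFrame; train_df_bonus and train_df
# refer to the same object, so later in-place edits affect both.
train_df_bonus = train_df
train_df_bonus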


Unnamed: 0,sqmtrs,nrooms,price,view,crime_rate
0,251,5,925701.721399,3,1
1,211,3,622237.482636,3,0
2,128,5,694998.182376,0,1
3,178,3,564689.015926,0,0
4,231,3,811222.970379,3,1
...,...,...,...,...,...
4795,231,5,886024.190216,0,1
4796,229,2,781430.834086,0,1
4797,334,2,786639.104497,3,0
4798,332,2,778546.285214,1,0


In [12]:
# label-encode the price column (each distinct price becomes an integer label)
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
label = le.fit_transform(train_df_bonus['price'])
train_df_bonus.drop("price", axis=1, inplace=True)
train_df_bonus["price"] = label
print(train_df_bonus)

      sqmtrs  nrooms  view  crime_rate  price
0        251       5     3           1   4337
1        211       3     3           0   1222
2        128       5     0           1   1994
3        178       3     0           0    749
4        231       3     3           1   3317
...      ...     ...   ...         ...    ...
4795     231       5     0           1   4025
4796     229       2     0           1   2984
4797     334       2     3           0   3039
4798     332       2     1           0   2943
4799     244       4     0           1   3872

[4800 rows x 5 columns]


In [13]:
# Define X:
X = train_df_bonus[['sqmtrs', 'nrooms', 'view', 'crime_rate']]

# Define y:
y = train_df_bonus['price']

# split 90-10
from sklearn.model_selection import train_test_split

X_train_bonus, X_validation_bonus, y_train_bonus, y_validation_bonus = train_test_split(X, y, test_size=0.1, random_state=42)

print("Length of the training set:", len(y_train_bonus), "\n",
      "Length of the validation set: ", len(y_validation_bonus))

Length of the training set: 4320 
 Length of the validation set:  480


In [14]:
# scale the features between 0-1
from sklearn.preprocessing import MinMaxScaler
msc = MinMaxScaler()

# Scale both the training inputs and outputs
scaled_train = msc.fit_transform(train_df)
scaled_train

array([[0.6064257 , 1.        , 1.        , 1.        , 0.90372994],
       [0.44578313, 0.5       , 1.        , 0.        , 0.25463638],
       [0.1124498 , 1.        , 0.        , 1.        , 0.41550323],
       ...,
       [0.93975904, 0.25      , 1.        , 0.        , 0.63325693],
       [0.93172691, 0.25      , 0.33333333, 0.        , 0.61325276],
       [0.57831325, 0.75      , 0.        , 1.        , 0.80683476]])

In [15]:
# scaling created a numpy array, so we need to convert it to dataframe object
scaled_train_df = pd.DataFrame(scaled_train, columns=train_df.columns.values)
scaled_train_df

Unnamed: 0,sqmtrs,nrooms,view,crime_rate,price
0,0.606426,1.00,1.000000,1.0,0.903730
1,0.445783,0.50,1.000000,0.0,0.254636
2,0.112450,1.00,0.000000,1.0,0.415503
3,0.313253,0.50,0.000000,0.0,0.156074
4,0.526104,0.50,1.000000,1.0,0.691186
...,...,...,...,...,...
4795,0.526104,1.00,0.000000,1.0,0.838716
4796,0.518072,0.25,0.000000,1.0,0.621796
4797,0.939759,0.25,1.000000,0.0,0.633257
4798,0.931727,0.25,0.333333,0.0,0.613253


Don't forget to split the training data to obtain a validation set. **Use random_state=42**

In [16]:
# Define X:
X = scaled_train_df[['sqmtrs', 'nrooms', 'view', 'crime_rate']]

# Define y:
y = scaled_train_df['price']

In [17]:
# split 90-10
from sklearn.model_selection import train_test_split

X_train, X_validation, y_train, y_validation = train_test_split(X, y, test_size=0.1, random_state=42)

print("Length of the training set:", len(y_train), "\n",
      "Length of the validation set: ", len(y_validation))

Length of the training set: 4320 
 Length of the validation set:  480


In [18]:
y_train

3839    0.248385
3226    0.862680
2894    0.752865
1345    0.404668
4713    0.686393
          ...   
4426    0.652428
466     0.724943
3092    0.469056
3772    0.843509
860     0.131694
Name: price, Length: 4320, dtype: float64

## 5) Train neural networks on development data and do model selection using the validation data (55 pts)


* Train a neural network with **one hidden layer** (try 3 different values for the number of neurons in that hidden layer, as 25, 50, 100), you will need to correctly choose the optimizer and the loss function that this model will train with. Use batch_size as 64 and train each model for 30 epochs. 

* Train another neural network with two hidden layers with meta-parameters of your choice. Again, use batch_size as 64 and train the model for 30 epochs. 

* **Bonus (5 pts)** Train a KNN or a Decision Tree model with your own choice of meta parameters to predict the house prices.
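For the bonus item, a K-nearest-neighbours regressor could look like the sketch below. This is a hedged illustration, not the expected solution: the data is synthetic, the variable names (`X_demo`, `y_demo`) are hypothetical, and `n_neighbors=5` is an arbitrary starting value that you would tune on the validation set.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Made-up data standing in for the scaled features and prices.
rng = np.random.default_rng(42)
X_demo = rng.random((200, 4))
y_demo = X_demo @ np.array([0.5, 0.2, 0.1, 0.2]) + rng.normal(scale=0.01, size=200)

X_tr, X_val, y_tr, y_val = train_test_split(X_demo, y_demo, test_size=0.1, random_state=42)

knn = KNeighborsRegressor(n_neighbors=5)   # k = 5 is an arbitrary starting point
knn.fit(X_tr, y_tr)

val_mse = mean_squared_error(y_val, knn.predict(X_val))
print("Validation MSE:", val_mse)
```

Because the target is a continuous price, a regressor is used here rather than a classifier; `DecisionTreeRegressor` from `sklearn.tree` could be swapped in the same way.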


In [19]:
import keras
import keras.utils
import tensorflow as tf
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.optimizers import SGD, Adam

In [20]:
# train one-hidden layered neural networks
# define your model architecture
# hidden layer 25
model_25 = tf.keras.Sequential() #would create an empty network for us
model_25.add(tf.keras.layers.Dense(25, activation='relu', name="hidden_layer1"))
model_25.add(tf.keras.layers.Dense(1, activation='linear', name="output_layer"))


# 'accuracy' is tracked here only as an extra metric; it is not meaningful for regression
model_25.compile(loss='mean_squared_error', optimizer=Adam(), metrics=['accuracy'])
# model.compile(loss='mean_squared_error', optimizer=opt)

histories=[]

# fit the model on training data
history_hidden_layer25 = model_25.fit(X_train, y_train, epochs=30, batch_size=64,shuffle=True,verbose=1)
histories.append(history_hidden_layer25)

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


In [21]:
# hidden layer 50
model_50 = tf.keras.Sequential() #would create an empty network for us
model_50.add(tf.keras.layers.Dense(50, activation='relu', name="hidden_layer1"))
model_50.add(tf.keras.layers.Dense(1, activation='linear', name="output_layer"))

model_50.compile(loss='mean_squared_error', optimizer=Adam())

history_hidden_layer50 = model_50.fit(X_train, y_train, epochs=30, batch_size=64,shuffle=True,verbose=1)
histories.append(history_hidden_layer50)

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


In [22]:
# hidden layer 100
model_100 = tf.keras.Sequential() #would create an empty network for us
model_100.add(tf.keras.layers.Dense(100, activation='relu', name="hidden_layer1"))
model_100.add(tf.keras.layers.Dense(1, activation='linear', name="output_layer"))

model_100.compile(loss='mean_squared_error', optimizer=Adam())

history_hidden_layer100 = model_100.fit(X_train, y_train, epochs=30, batch_size=64,shuffle=True,verbose=1)
histories.append(history_hidden_layer100)

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


In [42]:
# train a two-hidden layered neural network
training_model = tf.keras.Sequential() #would create an empty network for us
training_model.add(tf.keras.layers.Dense(250, activation='relu', name="hidden_layer1"))
training_model.add(tf.keras.layers.Dense(100, activation='relu', name="hidden_layer2"))
training_model.add(tf.keras.layers.Dense(1, activation='linear', name="output_layer"))
# ...
training_model.compile(loss='mean_squared_error', optimizer=Adam())
history1 = training_model.fit(X_train, y_train, epochs=30, batch_size=64,shuffle=True,verbose=1)
histories.append(history1)

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


In [24]:
# Before testing my trained models on the validation set,
# I set the learning rate manually to see its effect on training more clearly.
# I chose the two-hidden-layer model because it gave the best result compared to the others.
# train a two-hidden-layer neural network

model_learning_rate = tf.keras.Sequential() #would create an empty network for us
model_learning_rate.add(tf.keras.layers.Dense(250, activation='relu', name="hidden_layer1"))
model_learning_rate.add(tf.keras.layers.Dense(100, activation='relu', name="hidden_layer2"))
model_learning_rate.add(tf.keras.layers.Dense(1, activation='linear', name="output_layer"))

opt = SGD(learning_rate=0.001)
model_learning_rate.compile(loss='mean_squared_error', optimizer=opt)
history_learning_rate = model_learning_rate.fit(X_train, y_train, epochs=70, batch_size=64,shuffle=True,verbose=1)
history_learning_rate

Epoch 1/70
Epoch 2/70
Epoch 3/70
Epoch 4/70
Epoch 5/70
Epoch 6/70
Epoch 7/70
Epoch 8/70
Epoch 9/70
Epoch 10/70
Epoch 11/70
Epoch 12/70
Epoch 13/70
Epoch 14/70
Epoch 15/70
Epoch 16/70
Epoch 17/70
Epoch 18/70
Epoch 19/70
Epoch 20/70
Epoch 21/70
Epoch 22/70
Epoch 23/70
Epoch 24/70
Epoch 25/70
Epoch 26/70
Epoch 27/70
Epoch 28/70
Epoch 29/70
Epoch 30/70
Epoch 31/70
Epoch 32/70
Epoch 33/70
Epoch 34/70
Epoch 35/70
Epoch 36/70
Epoch 37/70
Epoch 38/70
Epoch 39/70
Epoch 40/70
Epoch 41/70
Epoch 42/70
Epoch 43/70
Epoch 44/70
Epoch 45/70
Epoch 46/70
Epoch 47/70
Epoch 48/70
Epoch 49/70
Epoch 50/70
Epoch 51/70
Epoch 52/70
Epoch 53/70
Epoch 54/70
Epoch 55/70
Epoch 56/70
Epoch 57/70
Epoch 58/70
Epoch 59/70
Epoch 60/70
Epoch 61/70
Epoch 62/70
Epoch 63/70
Epoch 64/70
Epoch 65/70
Epoch 66/70
Epoch 67/70
Epoch 68/70
Epoch 69/70
Epoch 70/70


<keras.callbacks.History at 0x7f870ea370d0>

In [25]:

model_learning_rate = tf.keras.Sequential() #would create an empty network for us
model_learning_rate.add(tf.keras.layers.Dense(250, activation='relu', name="hidden_layer1"))
model_learning_rate.add(tf.keras.layers.Dense(100, activation='relu', name="hidden_layer2"))
model_learning_rate.add(tf.keras.layers.Dense(1, activation='linear', name="output_layer"))

opt = SGD(learning_rate=0.01)
model_learning_rate.compile(loss='mean_squared_error', optimizer=opt)
history_learning_rate = model_learning_rate.fit(X_train, y_train, epochs=50, batch_size=64,shuffle=True,verbose=1)
history_learning_rate

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


<keras.callbacks.History at 0x7f870e9d6750>

In [26]:
model_learning_rate = tf.keras.Sequential() #would create an empty network for us
model_learning_rate.add(tf.keras.layers.Dense(250, activation='relu', name="hidden_layer1"))
model_learning_rate.add(tf.keras.layers.Dense(100, activation='relu', name="hidden_layer2"))
model_learning_rate.add(tf.keras.layers.Dense(1, activation='linear', name="output_layer"))

opt = SGD(learning_rate=0.1)
model_learning_rate.compile(loss='mean_squared_error', optimizer=opt)
history_learning_rate = model_learning_rate.fit(X_train, y_train, epochs=30, batch_size=64,shuffle=True,verbose=1)
history_learning_rate

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


<keras.callbacks.History at 0x7f870e84af90>

## 6) Test your trained classifiers on the Validation set (10 pts)
Test your trained classifiers on the validation set and print the mean squared errors.
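Since the models are trained on min-max scaled targets, the reported mean squared errors are in scaled units. The sketch below computes an MSE by hand and converts it back to price units. It assumes the target was scaled directly from raw prices, and the price range used (400,000 to 1,000,000) is hypothetical, not taken from the assignment data.

```python
import numpy as np

# Made-up scaled targets and predictions.
y_true_scaled = np.array([0.20, 0.50, 0.90])
y_pred_scaled = np.array([0.25, 0.45, 0.80])

# MSE is the average of the squared differences.
mse_scaled = np.mean((y_true_scaled - y_pred_scaled) ** 2)

# Hypothetical price range seen by the scaler (an assumption for illustration).
price_min, price_max = 400_000.0, 1_000_000.0

# Min-max scaling divides by (max - min), so errors scale back by the same factor.
rmse_original = np.sqrt(mse_scaled) * (price_max - price_min)

print(mse_scaled, rmse_original)
```

Reporting the root mean squared error in original units can make a small-looking scaled loss easier to interpret.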


In [27]:
# tests on validation
validation_score = model_25.evaluate(X_validation, y_validation,verbose=1)
validation_score 




[0.0006571318372152746, 0.0]

In [28]:
validation_score = model_learning_rate.evaluate(X_validation, y_validation,verbose=1)
validation_score 



0.00037761146086268127

In [29]:
validation_score = model_50.evaluate(X_validation, y_validation,verbose=1)
validation_score 



0.0003528165107127279

In [30]:
validation_score = model_100.evaluate(X_validation, y_validation,verbose=1)
validation_score 



0.00021457819093484432

In [44]:
validation_score = training_model.evaluate(X_validation, y_validation,verbose=1)
validation_score



0.00013911847781855613

In [32]:
# The two-hidden-layer model (training_model) has the lowest loss on the validation set, so we choose it for testing on new data.

## 7) Test your classifier on Test set (10 pts)

- Load test data
- Apply same pre-processing as training data (encoding categorical variables, scaling)
- Predict the labels of testing data **using the best model that you have selected according to your validation results** and report the mean squared error. 
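One common reading of "same pre-processing" is to reuse the scaler that was fitted on the training data rather than fitting a new one on the test data. The sketch below illustrates that idea on made-up single-column arrays; it is an assumption about good practice, not a statement of how the notebook must be written.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train_values = np.array([[100.0], [200.0], [300.0]])   # made-up training column
test_values = np.array([[150.0], [350.0]])             # made-up test column

scaler = MinMaxScaler()
scaler.fit(train_values)                      # learn min and max from training data only

train_scaled = scaler.transform(train_values)
test_scaled = scaler.transform(test_values)   # reuse the training min and max

print(train_scaled.ravel(), test_scaled.ravel())
```

With this approach test values can fall slightly outside the 0 to 1 range, which is expected: the test set is mapped with exactly the same transformation the model saw during training.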

In [33]:
# test results
test_df = pd.read_csv("/content/test.csv")
test_df.head(10)

Unnamed: 0,sqmtrs,nrooms,view,crime_rate,price
0,349,3,south,high,836553.5
1,169,1,west,high,512741.6
2,233,3,south,high,663880.6
3,340,4,north,low,1000086.0
4,199,2,east,low,745015.1
5,332,1,east,high,774017.1
6,294,3,west,low,913263.4
7,111,3,east,low,586111.6
8,310,5,north,low,1012929.0
9,307,4,west,low,971532.7


In [45]:
# encode the categorical variables
# first encode the view column
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
label = le.fit_transform(test_df['view'])
test_df.drop("view", axis=1, inplace=True)
test_df["view"] = label
print(test_df)

      sqmtrs  nrooms         price  crime_rate  view
0        349       3  8.365535e+05           0     2
1        169       1  5.127416e+05           0     3
2        233       3  6.638806e+05           0     2
3        340       4  1.000086e+06           1     1
4        199       2  7.450151e+05           1     0
...      ...     ...           ...         ...   ...
1195     213       5  7.081836e+05           0     3
1196     136       1  5.940682e+05           1     0
1197     130       3  6.271434e+05           1     2
1198     291       3  7.596893e+05           0     2
1199     333       2  9.331981e+05           1     3

[1200 rows x 5 columns]


In [46]:
# encode the crime_rate column
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
label = le.fit_transform(test_df['crime_rate'])
test_df.drop("crime_rate", axis=1, inplace=True)
test_df["crime_rate"] = label
print(test_df)

      sqmtrs  nrooms         price  view  crime_rate
0        349       3  8.365535e+05     2           0
1        169       1  5.127416e+05     3           0
2        233       3  6.638806e+05     2           0
3        340       4  1.000086e+06     1           1
4        199       2  7.450151e+05     0           1
...      ...     ...           ...   ...         ...
1195     213       5  7.081836e+05     3           0
1196     136       1  5.940682e+05     0           1
1197     130       3  6.271434e+05     2           1
1198     291       3  7.596893e+05     2           0
1199     333       2  9.331981e+05     3           1

[1200 rows x 5 columns]


In [47]:
# scale the features between 0-1
from sklearn.preprocessing import MinMaxScaler
msc = MinMaxScaler()

# Scale both the test inputs and outputs
# (this fits a new scaler on the test data rather than reusing the one fitted on the training data)
scaled_test = msc.fit_transform(test_df)
scaled_test

array([[1.        , 0.5       , 0.69044087, 0.66666667, 0.        ],
       [0.27710843, 0.        , 0.22900755, 1.        , 0.        ],
       [0.53413655, 0.5       , 0.4443812 , 0.66666667, 0.        ],
       ...,
       [0.12048193, 0.5       , 0.39203058, 0.66666667, 1.        ],
       [0.76706827, 0.5       , 0.58090906, 0.66666667, 0.        ],
       [0.93574297, 0.25      , 0.82815994, 1.        , 1.        ]])

In [48]:
# scaling created a numpy array, so we need to convert it to dataframe object
scaled_test_df = pd.DataFrame(scaled_test, columns=test_df.columns.values)
scaled_test_df

Unnamed: 0,sqmtrs,nrooms,price,view,crime_rate
0,1.000000,0.50,0.690441,0.666667,0.0
1,0.277108,0.00,0.229008,1.000000,0.0
2,0.534137,0.50,0.444381,0.666667,0.0
3,0.963855,0.75,0.923476,0.333333,1.0
4,0.397590,0.25,0.559998,0.000000,1.0
...,...,...,...,...,...
1195,0.453815,1.00,0.507513,1.000000,0.0
1196,0.144578,0.00,0.344898,0.000000,1.0
1197,0.120482,0.50,0.392031,0.666667,1.0
1198,0.767068,0.50,0.580909,0.666667,0.0


In [49]:
target='price'
X_test = scaled_test_df.drop(target, axis=1).values
Y_test = scaled_test_df[[target]].values

In [50]:
score = training_model.evaluate(X_test, Y_test, verbose=0)
print('Test loss:', score)

Test loss: 0.008553351275622845


## 8) Bonus (5pt)

In [51]:
train_df_bonus

Unnamed: 0,sqmtrs,nrooms,view,crime_rate,price
0,251,5,3,1,4337
1,211,3,3,0,1222
2,128,5,0,1,1994
3,178,3,0,0,749
4,231,3,3,1,3317
...,...,...,...,...,...
4795,231,5,0,1,4025
4796,229,2,0,1,2984
4797,334,2,3,0,3039
4798,332,2,1,0,2943


## 9) Report Your Results (10 pts)

**Notebook should be RUN:** As training and testing may take a long time, we may just look at your notebook results without running the code again; so make sure **each cell is run**, so outputs are there.

**Report:** Write a **1-2 page summary** of your approach to this problem **as indicated below**. 

**Must include statements such as those below:**
**(Remove the text in parentheses, below, and include your own report)**

( Include the problem definition: 1-2 lines )
The dataset describes each house by its size in square meters (sqmtrs), its number of rooms (nrooms), its view (view), and the crime rate of its neighborhood (crime_rate), together with its price (price).

Our independent variables are sqmtrs, nrooms, view, and crime_rate, and our dependent variable is price. Our aim is therefore to predict the price of a house from these independent variables.

 (Talk about train/val/test sets, size and how split. )
 (Talk about feature extraction or preprocessing.)
To predict house prices, I first looked at the mean and standard deviation of the data to understand it. I then converted the categorical features to numerical ones using LabelEncoder() so that the models could be trained on them.

Then, I scaled all features to the range 0 to 1 using MinMaxScaler().
Lastly, I split the dataset into 90% training and 10% validation.

After that, I trained one-hidden-layer models with 25, 50, and 100 nodes, as well as a two-hidden-layer model.
I evaluated all of these models on the validation set, and the two-hidden-layer model clearly had the lowest MSE loss. Therefore, I decided to use it on the test dataset.

Finally, I applied the same steps to the test dataset as to the training data and evaluated the two-hidden-layer model, which had the lowest validation loss.

 


**Add your observations as follows** (keep the questions for easy grading/context) in the report part of your notebook.

**Observations**

- Try a few learning rates for N=25 hidden neurons,  train for the indicated amount of epochs. Comment on what happens when learning rate is large or small? What is a good number/range for the learning rate?
Choosing the most appropriate learning rate was the most challenging part of this homework for me.
The learning rate is usually a small positive value between 0.0 and 1.0.
A smaller learning rate requires more training epochs because each update changes the weights only slightly; this lengthens training, and the model can get stuck.
On the other hand, a larger learning rate requires fewer training epochs, but it can converge too quickly to a sub-optimal set of weights or make training unstable.

First, I chose the Adam optimizer with its default settings to be on the safe side, expecting it to give reasonable results.
I then tried other learning rates with the SGD optimizer to see how the results change.

I set the learning rate to 0.001, which is quite small, and trained for 70 epochs. For another model, I set the learning rate to 0.01 and trained for 50 epochs. Both improved at the beginning but got stuck at some point.
Then I set the learning rate to 0.1, which gave a better result than the previous two. However, it was still not as good as the Adam() optimizer.

- Use that learning rate and vary the number of hidden neurons for the given values and try the indicated number of epochs. Give the validation mean squared errors for different approach and meta-parameters tried **in a table** and state which one you selected as your model. How many hidden neurons give the best model? 
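The best model is the one with two hidden layers (250 and 100 hidden neurons, followed by 1 output neuron). I used the Adam optimizer for it because Adam gives the most accurate results in general, and I wanted to be on the safe side. Among the one-hidden-layer models, 100 hidden neurons gave the lowest validation MSE. The validation mean squared errors were:

| Model | Optimizer | Validation MSE |
|---|---|---|
| One hidden layer, 25 neurons | Adam | 0.000657 |
| One hidden layer, 50 neurons | Adam | 0.000353 |
| One hidden layer, 100 neurons | Adam | 0.000215 |
| Two hidden layers, 250 and 100 neurons | Adam | 0.000139 |
| Two hidden layers, 250 and 100 neurons | SGD, learning rate 0.1 | 0.000378 |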

- State  what your test results are with the chosen approach and meta-parameters: e.g. "We have obtained the best results on the validation set with the two hidden layered with using Adam optimizer approach using a value of 
This is a regression problem, so I did not consider the accuracy result; the main measure for me is the MSE. The test MSE of my model is quite small (0.00855), which is good. 

- How slow is learning? Any other problems?
Learning was not slow, and I did not encounter any other problems.

- Any other observations (not obligatory)

You can add additional visualizations as separate pages if you want; think of them as an appendix, keeping the summary to 1-2 pages.

