# **CS412 - Machine Learning - 2022**
## Assignment #2
100 pts


## Goal

The goal of this homework is two-fold:

*   Gain experience with neural network approaches
*   Gain experience with the Keras library

## Dataset
You are going to use a house price dataset that we prepared for you, that contains four independent variables (predictors) and one target variable. The task is predicting the target variable (house price) from the predictors (house attributes).


Download the data from SuCourse. Reserve 10% of the training data for validation and use the rest for development (learning your models). The official test data we provide (1,200 samples) should only be used for testing at the end, and not model selection.

## Task 
Build a regressor with a neural network that has only one hidden layer, using the Keras library function calls to predict house prices in the provided dataset.

Your code should follow the given skeleton and try the indicated parameters.

## Preprocessing and Meta-parameters
You should try 10,50 and 100 as hidden node count. 

You should  decide on the learning rate (step size), you can try values such as 0.001, 0.01, 0.1, but you may need to increase if learning is very slow or decrease if you see the loss increase!

You can use either sigmoid or Relu activations for the hidden nodes (indicate with your results) and you should know what to use for the activation for the output layer, input, output layer sizes, and the suitable loss function. 

## Software: 

Keras is a library that we will use especially for deep learning, but also with basic neural network functionality of course.

You may find the necessary function references here: 

http://scikit-learn.org/stable/supervised_learning.html
https://keras.io/api/

When you search for Dense for instance, you should find the relevant function and explained parameters, easily.

## Submission: 

Fill this notebook. Write the report section at the end.

You should prepare a separate pdf document as your homework (name hw2-CS412-yourname.pdf) which consists of the report (Part 8) of the notebook for easy viewing -and- include a link to your notebook from within the pdf report (make sure to include the link obtained from the #share link on top right, **be sure to share with Sabancı University first** as otherwise there will be access problems.). Also, do not forget to add your answers for Questions 2 and 3 on the assignment document.

##1) Initialize

*   First make a copy of the notebook given to you as a starter.

*   Make sure you choose Connect form upper right.


## 2) Load training dataset

* Load the datasets (train.csv, test.csv) provided on SuCourse on your Google drive and read the datasets using Google Drive's mount functions. 
You may find the necessary functions here: 
https://colab.research.google.com/notebooks/io.ipynb

In [None]:
from google.colab import drive
drive.mount('/content/drive/') 
# click on the url that pops up and give the necessary authorizations

Mounted at /content/drive/




*   Set your notebooks working directory to the path where the datasets are uploaded (cd is the linux command for change directory) 
*   You may need to use cd drive/MyDrive depending on your path to the datasets on Google Drive. (don't comment the code in the cells when using linux commands)






In [None]:
cd drive/MyDrive/CS412_Assignmet2/

/content/drive/MyDrive/CS412_Assignmet2


* List the files in the current directory.

In [None]:
ls 

test.csv  train.csv


##3) Understanding the dataset (5 pts)

There are alot of functions that can be used to know more about this dataset

- What is the shape of the training set (num of samples X number of attributes) **[shape function can be used]**

- Display attribute names **[columns function can be used]**

- Display the first 5 rows from training dataset **[head or sample functions can be used]**

..

In [None]:
# import the necessary libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

train_df = pd.read_csv("train.csv")

# show first 10 elements of the training data
print(train_df.head(10))
print("\n")

   sqmtrs  nrooms   view crime_rate          price
0     251       5   west        low  925701.721399
1     211       3   west       high  622237.482636
2     128       5   east        low  694998.182376
3     178       3   east       high  564689.015926
4     231       3   west        low  811222.970379
5     253       5  north       high  766250.032506
6     101       1  north        low  512749.401548
7     242       1  north       high  637010.760148
8     174       5   west       high  638136.374869
9     328       2  south       high  787704.988273




In [None]:
# print the shape of data
print("Data dimensionality is: ")
print(train_df.shape)
print("\n")


print("Unique valus for categorical data: ")
print("view: " + train_df['view'].unique())
print("crime_rate: " + train_df['crime_rate'].unique())
print("\n")


# also give some statistics about the data like mean, standard deviation etc.
print("Numerical data statistics: ")
print(train_df.describe())
print("\n")

print("View Mode: ")
print(train_df['view'].mode())
print("\n")

print("Crime Rate Mode: ")
print(train_df['crime_rate'].mode())
print("\n")


Data dimensionality is: 
(4800, 5)


Unique valus for categorical data: 
['view: west' 'view: east' 'view: north' 'view: south']
['crime_rate: low' 'crime_rate: high']


Numerical data statistics: 
            sqmtrs       nrooms         price
count  4800.000000  4800.000000  4.800000e+03
mean    225.033542     2.983958  7.257570e+05
std      71.851436     1.421251  1.510411e+05
min     100.000000     1.000000  3.564985e+05
25%     163.000000     2.000000  6.179536e+05
50%     226.000000     3.000000  7.292999e+05
75%     287.000000     4.000000  8.389284e+05
max     349.000000     5.000000  1.076067e+06


View Mode: 
0    east
dtype: object


Crime Rate Mode: 
0    high
dtype: object




##4) Preprocessing Steps (10 pts)

As some of the features (predictive variables) on this dataset are categorical (non-numeric) you need to do some preprocessing for those features.

You can use as many **dummy or indicator variables** as there are categories within one feature. You can also look at pandas' get_dummies or keras.utils.to_categorical functions.

In neural networks, scaling of the features are important, because they affect the net input of a neuron as a whole. You should use **MinMax scaler** on sklearn for this task, which scales the variables between 0 and 1 on by default. (Remember that mean-squared error loss function tends to be extremely large with unscaled features.)


In [None]:
from sklearn.preprocessing import MinMaxScaler

train_df = pd.get_dummies(train_df)

# scale the features between 0-1
msc = MinMaxScaler(feature_range=(0, 1))

scaled = msc.fit_transform(train_df)

scaled_df = pd.DataFrame(scaled, columns=train_df.columns.values)

# Define X:
X = scaled_df.drop(columns=['price'])

# Define y:
y = scaled_df[['price']]


Don't forget the split the training data to obtain a validation set. **Use random_state=42**

In [None]:
# split 90-10
from sklearn.model_selection import train_test_split
X_train, X_validate, y_train, y_validate = train_test_split(X, y, test_size=0.1, random_state=42)

print(X_train.shape, X_validate.shape, y_train.shape,y_validate.shape)

(4320, 8) (480, 8) (4320, 1) (480, 1)


##5) Train neural networks on development data and do model selection using the validation data (55 pts)


* Train a neural network with **one hidden layer** (try 3 different values for the number of neurons in that hidden layer, as 25, 50, 100), you will need to correctly choose the optimizer and the loss function that this model will train with. Use batch_size as 64 and train each model for 30 epochs. 

* Train another neural network with two hidden layers with meta-parameters of your choice. Again, use batch_size as 64 and train the model for 30 epochs. 

* **Bonus (5 pts)** Train a KNN or a Decision Tree model with your own choice of meta parameters to predict the house prices.


In [None]:
import keras
import keras.utils
from tensorflow.keras import utils as np_utils
from tensorflow.keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.optimizers import SGD, Adam
from sklearn.model_selection import train_test_split
from keras.datasets import mnist
from sklearn.utils import shuffle
import tensorflow as tf


### NN w/ 1 Hidden Layer

In [None]:
from keras import models
# train one-hidden layered neural networks
# define your model architecture

# Creates a NN with given layer size and activation function architecture then compiles it with given learning rate
def nn_oneHid(layer1, learning_rate, activation):
  model = tf.keras.Sequential()
  model.add(Flatten())
  model.add(tf.keras.layers.Dense(layer1, activation=activation, input_shape = (X_train.shape[1],), name='hidden_layer_1'))
  model.add(tf.keras.layers.Dense(1, name='output_layer'))

  model.compile(loss='mean_squared_error', optimizer=Adam(learning_rate = learning_rate))
  fit_model = model.fit(X_train, y_train, batch_size = 64, epochs = 30, verbose=1)

  return model

Training

In [None]:
activations = ['sigmoid', 'relu']
learning_rate = [0.001, 0.01, 0.1]
layer = [25, 50, 100]

models = []
one_layered = {}



for i in range(len(activations)):
  print("Activation: ", activations[i], "\n")
  for j in range(len(learning_rate)):
    print("Learning Rate: ", learning_rate[j], "\n")
    for k in range(len(layer)):
      print("Nodes in Layer: ", layer[k], "\n")
      model = nn_oneHid(layer[k], learning_rate[j], activations[i])
      models.append(model)
      one_layered[model] = {'model': model, 'activation': activations[i], 'layer1': layer[k], 'learning_rate': learning_rate[j]}




Activation:  sigmoid 

Learning Rate:  0.001 

Nodes in Layer:  25 

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Nodes in Layer:  50 

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Nodes in Layer:  100 

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
E

###NN w/ 2 Hidden Layer

In [None]:
# Functions that trains a two-hidden layered neural network

# Creates a NN with given layer size and activation function architecture then compiles it with given learning rate
def nn_twoHid(layer1, layer2, learning_rate, activation):
  model = tf.keras.Sequential()
  model.add(Flatten())
  model.add(tf.keras.layers.Dense(layer1, activation=activation, input_shape = (X_train.shape[1],), name='hidden_layer_1'))
  model.add(tf.keras.layers.Dense(layer2, activation=activation, name='hidden_layer_2'))
  model.add(tf.keras.layers.Dense(1, name='output_layer'))

  model.compile(loss='mean_squared_error', optimizer=Adam(learning_rate = learning_rate))
  fit_model = model.fit(X_train, y_train, batch_size = 64, epochs = 30, verbose=1)
  
  return model

Training


In [None]:
activations = ['sigmoid', 'relu']
learning_rate = [0.001, 0.01, 0.1]
layer = [25, 50, 100]
layer2 = [25, 50, 100]

models_two = []
two_layered = {}

for i in range(len(activations)):
  print("Activation: ", activations[i], "\n")
  for j in range(len(learning_rate)):
    print("Learning Rate: ", learning_rate[j], "\n")
    for k in range(len(layer2)):
      print("Nodes in Layer-2: ", layer2[k], "\n")
      for l in range(len(layer)):
        print("Nodes in Layer-1: ", layer[l], "\n")
        model = nn_twoHid(layer[l], layer2[k], learning_rate[j], activations[i])
        models_two.append(model)
        two_layered[model] = {'model': model, 'activation': activations[i], 'layer1': layer[k], 'layer2': layer2[l],'learning_rate': learning_rate[j]}

        
# ...


Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 2

Getting the best model out of all combinations

## 6) Test your trained classifiers on the Validation set (10 pts)
Test your trained classifiers on the validation set and print the mean squared errors.


In [None]:
scores = []

for model in models:
  score = model.evaluate(X_validate, y_validate, verbose=0)  
  scores.append(score)
  one_layered[model]['score'] = score
  print(one_layered[model]['activation'],one_layered[model]['learning_rate'],one_layered[model]['layer1'],"MSE Score: ", score)


sigmoid 0.001 25 MSE Score:  0.00036183500196784735
sigmoid 0.001 50 MSE Score:  0.00038931670133024454
sigmoid 0.001 100 MSE Score:  0.00035190890775993466
sigmoid 0.01 25 MSE Score:  0.0006680361111648381
sigmoid 0.01 50 MSE Score:  0.00031313151703216136
sigmoid 0.01 100 MSE Score:  0.00048528294428251684
sigmoid 0.1 25 MSE Score:  0.00012579838221427053
sigmoid 0.1 50 MSE Score:  0.00013945625687483698
sigmoid 0.1 100 MSE Score:  0.00030518174753524363
relu 0.001 25 MSE Score:  0.0005061730043962598
relu 0.001 50 MSE Score:  6.825794844189659e-05
relu 0.001 100 MSE Score:  6.558789755217731e-05
relu 0.01 25 MSE Score:  0.00011399292998248711
relu 0.01 50 MSE Score:  0.00015610178525093943
relu 0.01 100 MSE Score:  7.513321907026693e-05
relu 0.1 25 MSE Score:  0.0006180982454679906
relu 0.1 50 MSE Score:  0.0005088504403829575
relu 0.1 100 MSE Score:  0.0004444707010407001


In [None]:
min_score_1 = 10000000.0
best_model_one_layer = {}

for item in one_layered:
  a = one_layered[item]
  if a['score'] < min_score_1:
    best_model_one_layer = item
    min_score_1 = a['score']
  

print("Score of the best scored model: ", one_layered[best_model_one_layer]['score'], '\n')
print("1-Hidden Layer Model with minimum loss: ", one_layered[best_model_one_layer])

Score of the best scored model:  6.558789755217731e-05 

1-Hidden Layer Model with minimum loss:  {'model': <keras.engine.sequential.Sequential object at 0x7f3ff9a48850>, 'activation': 'relu', 'layer1': 100, 'learning_rate': 0.001, 'score': 6.558789755217731e-05}


In [None]:
scores_two = []

for model in models_two:
  score = model.evaluate(X_validate, y_validate, verbose=0)
  scores_two.append(score)
  two_layered[model]['score'] = score
  print(two_layered[model]['activation'],two_layered[model]['learning_rate'],two_layered[model]['layer1'],two_layered[model]['layer2'],"MSE Score: ", score)
  


sigmoid 0.001 25 25 MSE Score:  0.00032741163158789277
sigmoid 0.001 25 50 MSE Score:  0.00032446306431666017
sigmoid 0.001 25 100 MSE Score:  0.00033768106368370354
sigmoid 0.001 50 25 MSE Score:  0.00031512408168055117
sigmoid 0.001 50 50 MSE Score:  0.0003219610371161252
sigmoid 0.001 50 100 MSE Score:  0.0003967750817537308
sigmoid 0.001 100 25 MSE Score:  0.00041346819489263
sigmoid 0.001 100 50 MSE Score:  0.00036869768518954515
sigmoid 0.001 100 100 MSE Score:  0.00033728653215803206
sigmoid 0.01 25 25 MSE Score:  0.00029342962079681456
sigmoid 0.01 25 50 MSE Score:  0.00020361946371849626
sigmoid 0.01 25 100 MSE Score:  0.0002670257235877216
sigmoid 0.01 50 25 MSE Score:  0.0002399661170784384
sigmoid 0.01 50 50 MSE Score:  0.0003942179901059717
sigmoid 0.01 50 100 MSE Score:  0.00021896559337619692
sigmoid 0.01 100 25 MSE Score:  0.0004476413887459785
sigmoid 0.01 100 50 MSE Score:  0.0004406751540955156
sigmoid 0.01 100 100 MSE Score:  0.0005425632116384804
sigmoid 0.1 25 25 

In [None]:
min_score_2 = 10000000.0
best_model_two_layer = {}

for item in two_layered:
  a = two_layered[item]
  if a['score'] < min_score_2:
    best_model_two_layer = item
    min_score_2 = a['score']
  

print(two_layered[best_model_two_layer]['score'], '\n')
print("2-Hidden Layer Model with minimum loss: ", two_layered[best_model_two_layer])

5.8343037380836904e-05 

2-Hidden Layer Model with minimum loss:  {'model': <keras.engine.sequential.Sequential object at 0x7f3fe0840590>, 'activation': 'relu', 'layer1': 25, 'layer2': 100, 'learning_rate': 0.001, 'score': 5.8343037380836904e-05}


In [None]:
compare = (lambda x1, x2: best_model_one_layer if x1 < x2 else best_model_two_layer)

best_model = compare(min_score_1,min_score_2)

def checkKey(dict, key):
    if key in dict:
        return 1
    else:
        return 0

inOne = checkKey(one_layered, best_model)
inTwo = checkKey(two_layered, best_model)


best_model_content = {}
if inOne == 1:
  best_model_content = one_layered[best_model]
  print("Best model is a 1-hidden layer NN: \n")
else:
  best_model_content = two_layered[best_model]
  print("Best model is a 2-hidden layer NN: \n")

best_model_content


Best model is a 2-hidden layer NN: 



{'activation': 'relu',
 'layer1': 25,
 'layer2': 100,
 'learning_rate': 0.001,
 'model': <keras.engine.sequential.Sequential at 0x7f3fe0840590>,
 'score': 5.8343037380836904e-05}

## 7) Test your classifier on Test set (10 pts)

- Load test data
- Apply same pre-processing as training data (encoding categorical variables, scaling)
- Predict the labels of testing data **using the best model that you have selected according to your validation results** and report the mean squared error. 

In [None]:
# test results
test_df = pd.read_csv("test.csv")
test_df.head(10)

Unnamed: 0,sqmtrs,nrooms,view,crime_rate,price
0,349,3,south,high,836553.5
1,169,1,west,high,512741.6
2,233,3,south,high,663880.6
3,340,4,north,low,1000086.0
4,199,2,east,low,745015.1
5,332,1,east,high,774017.1
6,294,3,west,low,913263.4
7,111,3,east,low,586111.6
8,310,5,north,low,1012929.0
9,307,4,west,low,971532.7


In [None]:
from sklearn.preprocessing import MinMaxScaler

test_df = pd.get_dummies(test_df)

# scale the features between 0-1
msc = MinMaxScaler(feature_range=(0, 1))

scaled_test = msc.fit_transform(test_df)

scaled_test_df = pd.DataFrame(scaled_test, columns=test_df.columns.values)

# Define X:
X_test = scaled_test_df.drop(columns=['price'])

# Define y:
y_test = scaled_test_df[['price']]



In [32]:
score = best_model.evaluate(X_test, y_test, verbose=0)
print("MSE testing score for the best model: ", score, "\n")
print("MSE testing parameters for the best model: ")
best_model_content['score'] = score
best_model_content

MSE testing score for the best model:  0.0004797478031832725 

MSE testing parameters for the best model: 


{'activation': 'relu',
 'layer1': 25,
 'layer2': 100,
 'learning_rate': 0.001,
 'model': <keras.engine.sequential.Sequential at 0x7f3fe0840590>,
 'score': 0.0004797478031832725}

##8) Report Your Results (10 pts)

**Notebook should be RUN:** As training and testing may take a long time, we may just look at your notebook results without running the code again; so make sure **each cell is run**, so outputs are there.

**Report:** Write an **1-2 page summary** of your approach to this problem **as indicated below**. 

**Must include statements such as those below:**
**(Remove the text in parentheses, below, and include your own report)**

**Problem Definition**: House price prediction problem is a regression problem where as inputs there exist area of the houses, number of rooms, view of the house and crime rates in the neighborhood that the house is located and output is the price of the given house according to the above stated parameters. In order to solve this problem, a regression model can be built. Therefore, I build Neural Network regression models with 1-Hidden Layers and 2-Hidden Layers with different activation functions and layer size parameters.

**Train/val/test sets, size and how split**: As the training set, I had a data set with 4800 examples with 5 attributes such as number of rooms, square meters, view, crime rate and price of the house. I used 10% of the training set as the validation set and trained my models with the remaining 4320 (90%) examples. For the testing set, I had a data set with 1200 examples with 4 attributes which are the same as the training set except the price since it is the parameter that I would predict.

 **Feature extraction and preprocessing**: Attributes view and crime_rates were categorical variables and in order to efficiently process them I used get_dummies function of the pandas library and turned them into numerical values. Also, in order to employ the data efficiently, I used minmax scaler and scaled all variables between 0 and 1.


**Add your observations as follows** (keep the questions for easy grading/context) in the report part of your notebook.

**Observations**

- Try a few learning rates for N=25 hidden neurons,  train for the indicated amount of epochs. Comment on what happens when learning rate is large or small? What is a good number/range for the learning rate?

1-Hidden Layer w/ 25 nodes:

sigmoid 0.001 25 MSE Score:  0.00036183500196784735

sigmoid 0.01  25 MSE Score:  0.0006680361111648381

sigmoid 0.1   25 MSE Score:  0.00012579838221427053

Relu    0.001 25 MSE Score:  0.0005061730043962598

Relu    0.01  25 MSE Score:  0.00011399292998248711

Relu    0.1   25 MSE Score:  0.0006180982454679906

I observed that the optimal score (min loss for the cost function) occurs with the learning rate around 0.01 for a NN which has 1-hidden layer and ReLU activation function for models that have 25 nodes. Also, I tried this learning rate along with other given rates on models which have 1 or 2 hidden layers with 'RELU' or 'Sigmoid' activation functions and node sizes vary around 25, 50, 100 nodes along with learning rates 0.001 and 0.1. Consequently, I found that the best performing model has two layers with 0.001 learning rate, RELU activation function and 25 nodes on first layer and 100 nodes in second layer along with the best performing 1-hidden layered model also have 0.001 learning rate.


- Use that learning rate and vary the number of hidden neurons for the given values and try the indicated number of epochs. Give the validation mean squared errors for different approach and meta-parameters tried **in a table** and state which one you selected as your model. How many hidden neurons give the best model? 

| Num hid layers | #Layer 1 | #Layer 2 | Learning Rate | Activation Func. | MSE Score |
| --- | --- | --- | --- | --- | --- |
| 2 | 25 | 100 | 0.001 | ReLU | 5.8343037380836904e-05 |
| 2 | 25 | 100 | 0.001 | Sigmoid | 0.00033768106368370354 |
| 1 | 100 | - | 0.001 | ReLU | 6.558789755217731e-05 |
| 1 |100 | - | 0.001 | Sigmoid | 0.00035190890775993466|
|...|
| 1 | 25 | - | 0.001 | ReLU | 0.0005061730043962598 |
| 1 | 50 | - | 0.001 | ReLU | 6.825794844189659e-05 |
| 2 | 25 | 50 | 0.001 | ReLU | 7.015091978246346e-05 |
| 2 | 25 | 25 | 0.001 | ReLU | 8.615724800620228e-05 |
| 2 | 50 | 100 | 0.001 | ReLU | 6.011507866787724e-05|
| 2 | 100 | 100 | 0.001 | ReLU | 7.17694201739505e-05 |
|...|
|Test Score|
| 2 | 25 | 100 | 0.001 | ReLU | 0.0004797478031832725 |

Best result for 1-hidden layered model is 6.558789755217731e-05 with 100 nodes however, best result overall is 5.8343037380836904e-05 with a 2-hidden layered model with 25 nodes in hidden layer 1 and 100 nodes with hidden layer 2. Since it is the best performing model (gives the minimum loss), I selected the 2-hidden layered (25,100) with ReLU function as my model to evaluate the test set.

- State  what your test results are with the chosen approach and meta-parameters: 

"We have obtained the best results on the validation set with the two-hidden layered NN approach using a value of 0.001 for learning_rate parameter, 25 nodes on first hidden layer, 100 nodes on second hidden layer and RELU for activation function. The result of this model on the test data gave MSE 0.0004797478031832725 as the score."" 

- How slow is learning? Any other problems?

Training 1-hidden layer models took ~3 minutes while training 2-hidden layer models took ~8-9 minutes but in return for long training time 2-hidden layer models performed better on validations. Also, models with lower learning rates took more time to train. Additionally, validation and testing took under 20 seconds. I did not encounter any problems.

- Any other observations (not obligatory)

 You can add additional visualization as separate pages if you want, think of them as appendix, keeping the summary to 1-2-pages.

