## Session Objectives

In this programming lab, we will explore how to use a package called TensorFlow to build our first neural network, which predicts whether a house price is above or below the median value. In particular, we will go through the full Deep Learning pipeline:

* Exploring and Processing the Data
* Building and Training our Neural Network
* Visualizing Loss and Accuracy
* Adding Regularization to our Neural Network

## Pre-requisites

This programming lab assumes you’ve got Jupyter Notebook set up with an environment that has the following packages:
* keras "2.10.0"
* tensorflow "2.10.0"
* pandas
* scikit-learn "1.2.2"
* matplotlib "3.5.1"

# Exploring and Processing the Data

Before we code any ML algorithm, the first thing we need to do is to put our data in a format that the algorithm will want. In particular, we need to:

* Read in the CSV (comma separated values) file and convert it to an array. Arrays are a data format that our algorithm can process.
* Split our dataset into the input features (which we call X) and the label (which we call Y).
* Scale the data (we call this normalization) so that the input features have similar orders of magnitude.
* Split our dataset into the training set, the validation set and the test set. 

In [1]:
import pandas as pd

In [None]:
# Read the dataset from the path ./input/housepricedata.csv

df = 

In [None]:
# Show the first 5 records of the dataset


Here, you can explore the data a little. We have our input features in the first ten columns:

* Lot Area (in sq ft)
* Overall Quality (scale from 1 to 10)
* Overall Condition (scale from 1 to 10)
* Total Basement Area (in sq ft)
* Number of Full Bathrooms
* Number of Half Bathrooms
* Number of Bedrooms above ground
* Total Number of Rooms above ground
* Number of Fireplaces
* Garage Area (in sq ft)

In our last column, we have the feature that we would like to predict:

* Is the house price above the median or not? (1 for yes and 0 for no)

Now that we’ve seen what our data looks like, we want to convert it into arrays for our machine to process.

The dataset that we have now is in what we call a pandas DataFrame. To convert it to an array, simply access its values attribute:

In [None]:
# Convert the dataset into an array of values using the "values" attribute

dataset = 


In [None]:
# Print the dataset


We now split our dataset into input features (X) and the feature we wish to predict (Y). To do that split, we simply assign the first 10 columns of our array to a variable called X and the last column of our array to a variable called Y.
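
As a minimal sketch of the slicing involved, the snippet below splits a small made-up array that has 3 feature columns and 1 label column. The numbers and the names `toy`, `X_toy` and `Y_toy` are invented for illustration; the real dataset has 10 feature columns, so the column indices will differ.

```python
import numpy as np

# Made-up stand-in for the real dataset: 4 rows, 3 feature columns + 1 label column.
toy = np.array([
    [8450, 7, 5, 1],
    [9600, 6, 8, 1],
    [11250, 7, 5, 1],
    [9550, 7, 5, 0],
])

X_toy = toy[:, 0:3]   # every row, columns 0, 1 and 2 (the features)
Y_toy = toy[:, 3]     # every row, column 3 (the label)

print(X_toy.shape, Y_toy.shape)
```

The colon before the comma means "all rows"; the part after the comma selects the columns.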

In [None]:
# Split input features (X) and the label (Y)
X = 
Y = 

The next step in our processing is to make sure that the scales of the input features are similar. Right now, features such as lot area are on the order of thousands, the overall quality score ranges from 1 to 10, and the number of fireplaces tends to be 0, 1 or 2.

Such different scales make it difficult to initialize the neural network well, which causes practical problems during training. One way to scale the data is to use an existing package from scikit-learn.
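
To make the arithmetic concrete, here is a hand-written sketch of min-max scaling in NumPy on a made-up two-column matrix. It is only an illustration of the formula (x - min) / (max - min) applied per column, which is what scikit-learn's MinMaxScaler computes with its default range of 0 to 1. The sketch assumes no column is constant, since that would divide by zero.

```python
import numpy as np

# Made-up feature matrix: column 0 is in the thousands, column 1 is a 1-10 score.
X_toy = np.array([
    [8000.0, 5.0],
    [12000.0, 7.0],
    [10000.0, 9.0],
])

col_min = X_toy.min(axis=0)   # smallest value in each column
col_max = X_toy.max(axis=0)   # largest value in each column

# Rescale each column so that its minimum maps to 0 and its maximum to 1.
X_toy_scaled = (X_toy - col_min) / (col_max - col_min)

print(X_toy_scaled)
```

After scaling, both columns span the same range, so neither dominates simply because of its units.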

In [2]:
from sklearn import preprocessing

In [None]:
# Use MinMaxScaler to normalize our input features (X)
min_max_scaler = 
X_scale =

Now, we are down to our last step in processing the data, which is to split our dataset into a training set, a validation set and a test set.

We will use the scikit-learn function ‘train_test_split’, which, as the name suggests, splits our dataset into a training set and a test set. Since it produces only two parts per call, we call it twice to obtain three sets. We first import the code we need:

In [3]:
from sklearn.model_selection import train_test_split

In [None]:
# Split the dataset into 70% training, 15% validation, and 15% testing
# Your code here


In [None]:
print(X_train.shape, X_val.shape, X_test.shape, Y_train.shape, Y_val.shape, Y_test.shape)

In summary, we now have six variables for the datasets we will use:

* X_train (10 input features, 70% of full dataset)
* X_val (10 input features, 15% of full dataset)
* X_test (10 input features, 15% of full dataset)
* Y_train (1 label, 70% of full dataset)
* Y_val (1 label, 15% of full dataset)
* Y_test (1 label, 15% of full dataset)
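
One possible way to arrive at a 70/15/15 split is to call train_test_split twice. The sketch below does this on synthetic data; the array sizes, the `_demo` and short variable names, and the `random_state` values are assumptions made for illustration, not part of the lab's dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 rows, 10 features, binary label.
rng = np.random.default_rng(0)
X_demo = rng.random((200, 10))
Y_demo = rng.integers(0, 2, size=200)

# Stage 1: keep 70% for training and set the remaining 30% aside.
X_tr, X_rest, Y_tr, Y_rest = train_test_split(
    X_demo, Y_demo, test_size=0.3, random_state=0)

# Stage 2: split the 30% in half, giving 15% validation and 15% test.
X_va, X_te, Y_va, Y_te = train_test_split(
    X_rest, Y_rest, test_size=0.5, random_state=0)

print(X_tr.shape, X_va.shape, X_te.shape)
```

Fixing `random_state` makes the split reproducible; leaving it out gives a different split on every run.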

# Building and Training Our First Neural Network

As we know, Machine Learning consists of two steps. The first step is to specify a template (an architecture) and the second step is to find the best numbers from the data to fill in that template. Our code from here on will also follow these two steps.

## First Step: Setting up the Architecture

The first thing we have to do is to set up the architecture. Let’s first think about what kind of neural network architecture we want. Suppose we want this neural network:

<figure style="padding: 1em;">
<center><img src="https://cdn-media-1.freecodecamp.org/images/H3eAYjXcA2asaCjCYrVT7lc2IIBQGQWzQlPG" width="400" alt="Diagram of network architecture: BatchNorm, Dense, BatchNorm, Dropout, Dense, BatchNorm, Dropout, Dense."></center>
<figcaption style="text-align: center; font-style: italic"><center>Neural network architecture that we will use for our problem</center></figcaption>
</figure>

In words, we want to have these layers:

* Hidden layer 1: 32 neurons, ReLU activation
* Hidden layer 2: 32 neurons, ReLU activation
* Output Layer: 1 neuron, Sigmoid activation
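
To see what these three layers compute, here is a rough NumPy sketch of a single forward pass through a 10 → 32 → 32 → 1 network. The weights are random and untrained, and the helper names are invented for this illustration; it is not how Keras is used, only a picture of the shapes involved.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Random (untrained) weights, only to show the shapes involved.
W1, b1 = rng.normal(scale=0.1, size=(10, 32)), np.zeros(32)  # input -> hidden 1
W2, b2 = rng.normal(scale=0.1, size=(32, 32)), np.zeros(32)  # hidden 1 -> hidden 2
W3, b3 = rng.normal(scale=0.1, size=(32, 1)), np.zeros(1)    # hidden 2 -> output

x = rng.random((5, 10))      # a batch of 5 houses with 10 scaled features each
h1 = relu(x @ W1 + b1)       # shape (5, 32)
h2 = relu(h1 @ W2 + b2)      # shape (5, 32)
p = sigmoid(h2 @ W3 + b3)    # shape (5, 1), values between 0 and 1

n_params = sum(a.size for a in (W1, b1, W2, b2, W3, b3))
print(p.shape, n_params)
```

The sigmoid output can be read as the predicted probability that a house is above the median price. Training, covered in the second step, is what replaces the random weights with useful ones.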

Now, we need to describe this architecture to Keras. We will be using the Sequential model, which means that we merely need to describe the layers above in sequence.

Let's import the code from Keras that we will need:

In [1]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

In [None]:
# Build the architecture
model = 


## Second Step: Filling in the Best Numbers


Now that we've got our architecture specified, we need to find the best numbers for it. Before we start training, we have to configure the model by:
- Telling it which algorithm to use for the optimization (we'll use stochastic gradient descent)
- Telling it which loss function to use (for binary classification, we will use binary cross entropy)
- Telling it which other metrics to track apart from the loss function (we want to track accuracy as well)

We do so below:

In [None]:
# use sgd as optimizer
# use binary_crossentropy as loss
# use accuracy as metrics

model.

‘sgd’ refers to stochastic gradient descent (here, it means mini-batch gradient descent), which we’ve seen in Intuitive Deep Learning Part 1b.


The loss function for outputs that take the values 1 or 0 is called binary cross entropy.
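
As an illustration of what this loss measures, the sketch below computes the mean binary cross entropy by hand for made-up labels and predictions. The function name and the clipping constant `eps` are assumptions of this sketch rather than the exact Keras implementation.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross entropy; clipping by eps avoids taking log(0)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_sample = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(per_sample))

y_true = np.array([1, 0, 1, 0])
mostly_right = np.array([0.9, 0.1, 0.8, 0.2])   # confident and correct
mostly_wrong = np.array([0.1, 0.9, 0.2, 0.8])   # confident and incorrect

loss_right = binary_cross_entropy(y_true, mostly_right)
loss_wrong = binary_cross_entropy(y_true, mostly_wrong)
print(loss_right, loss_wrong)
```

The loss is small when the predicted probabilities agree with the labels and grows quickly when the model is confidently wrong.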


Lastly, we want to track accuracy on top of the loss function. Once we’ve run that cell, we are ready to train!



Training on the data is pretty straightforward and requires us to write one line of code. The function is called 'fit' as we are fitting the parameters to the data. We specify:
- what data we are training on, which is X_train and Y_train
- the size of our mini-batch 
- how long we want to train it for (epochs)
- what our validation data is so that the model will tell us how we are doing on the validation data at each point.
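
To get a feel for how the mini-batch size and the number of epochs interact, here is a small sketch that counts the weight updates they imply. The figure of 1000 training rows is made up; substitute the number of rows in your own training set.

```python
import math
import numpy as np

n_samples, batch_size, epochs = 1000, 32, 100   # n_samples is a made-up figure

# One update per mini-batch; the last batch of an epoch may be smaller.
steps_per_epoch = math.ceil(n_samples / batch_size)
total_updates = steps_per_epoch * epochs

# Slicing one epoch into mini-batches of shuffled row indices:
indices = np.random.default_rng(0).permutation(n_samples)
batches = [indices[i:i + batch_size] for i in range(0, n_samples, batch_size)]

print(steps_per_epoch, total_updates, len(batches[-1]))
```

A smaller batch size means more, noisier updates per epoch; a larger one means fewer, smoother updates.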

This function will output a history, which we save under the variable hist. We'll use this variable a little later.

In [None]:
# batch size of 32
# use 100 training epochs
hist = model.

You can now see that the model is training! By looking at the numbers, you should be able to see the loss decrease and the accuracy increase over time. At this point, you can experiment with the hyperparameters and the neural network architecture. Run the cells again to see how your training changes once you’ve tweaked your hyperparameters.

Once you’re happy with your final model, we can evaluate it on the test set. To find the accuracy on our test set, we run this code snippet:

In [None]:
# evaluate the model using test set

model.

The evaluate function returns the loss as the first element and the accuracy as the second element. To output only the accuracy, simply access the second element (which has index 1, since indexing starts from 0).

Due to the randomness in how we have split the dataset and in the initialization of the weights, the numbers and graphs will differ slightly each time we run our notebook. Nevertheless, you should get a test accuracy anywhere between 80% and 95% if you’ve followed the architecture specified above!

**Summary:** Coding up our first neural network required only a few lines of code:

* We specify the architecture with the Keras Sequential model.
* We specify some of our settings (optimizer, loss function, metrics to track) with model.compile.
* We train our model (find the best parameters for our architecture) on the training data with model.fit.
* We evaluate our model on the test set with model.evaluate.

# Visualizing Loss and Accuracy

In [2]:
import matplotlib.pyplot as plt

In [None]:
plt.plot(hist.history['loss'])
plt.plot(hist.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper right')
plt.show()

We’ll explain each line of the above code snippet. 

* The first two lines say that we want to plot the loss and the val_loss.
* The third line specifies the title of this graph, “Model loss”.
* The fourth and fifth lines specify the labels of the y and x axes respectively.
* The sixth line adds a legend to our graph and places it in the upper right.
* The seventh line tells Jupyter notebook to display the graph.

In [None]:
plt.plot(hist.history['accuracy'])
plt.plot(hist.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='lower right')
plt.show()

Since the improvements of our model on the training set roughly match the improvements on the validation set, overfitting does not seem to be a huge problem in our model.

**Summary:** We use matplotlib to visualize the training and validation loss / accuracy over time to see if there’s overfitting in our model.

# Adding Regularization to our Neural Network

For the sake of introducing regularization to our neural network, let’s build a neural network that will badly overfit our training set. We’ll call this Model 2.

In [None]:
model_2 = Sequential([
    Dense(1000, activation='relu', input_shape=(10,)),
    Dense(1000, activation='relu'),
    Dense(1000, activation='relu'),
    Dense(1000, activation='relu'),
    Dense(1, activation='sigmoid'),
])
model_2.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
hist_2 = model_2.fit(X_train, Y_train,
          batch_size=32, epochs=100,
          validation_data=(X_val, Y_val))

Here, we’ve made a much larger model and we’ve used the Adam optimizer. Adam is one of the most common optimizers; it adds some tweaks to stochastic gradient descent so that it reaches a lower loss faster.

Let's plot the same graphs for hist_2 to see what overfitting looks like in terms of the loss and accuracy (the code is the same, except that we use ‘hist_2’ instead of ‘hist’).

In [None]:
plt.plot(hist_2.history['loss'])
plt.plot(hist_2.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper right')
plt.show()

In [None]:
plt.plot(hist_2.history['accuracy'])
plt.plot(hist_2.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='lower right')
plt.show()

Now, let’s try out some of our strategies to reduce overfitting (apart from changing our architecture back to our first model). We’ll incorporate L2 regularization and dropout here. We don’t add early stopping because, once the first two strategies are in place, the validation loss no longer takes the U-shape we saw above, so early stopping would not be as effective.

In [3]:
from tensorflow.keras.layers import Dropout
from tensorflow.keras import regularizers

In [None]:
# Build 4 layers with 1000 nodes each
# for each layer use L2 regularization with 0.01
# for each layer use dropout 0.3
# for each layer use ReLU as activation function
# for the 5th layer (output layer) use sigmoid as activation function and L2 regularization
model_3 = 

In [None]:
# compile the model
# use adam as optimizer
# use binary_crossentropy as loss
# use accuracy as metrics

model_3.


In [None]:
# train the model
# use 100 epochs
# use 32 as batch size
# use training set
# use validation set
hist_3 = model_3.

Can you spot the differences between Model 3 and Model 2? There are two main differences:

**Difference 1:**  L2 Regularization

This tells Keras to add the squared values of the layer's weights to our overall loss function, weighted by 0.01.
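
The extra term can be sketched by hand as follows. The weight matrices and the value of `data_loss` are invented for illustration; the formula, a coefficient times the sum of squared weights, is what an L2 regularizer with coefficient 0.01 contributes for one layer.

```python
import numpy as np

def l2_penalty(weights, lam=0.01):
    """lam times the sum of squared weights, the term an L2 regularizer adds."""
    return lam * float(np.sum(np.square(weights)))

W_small = np.array([[0.1, -0.2], [0.05, 0.0]])   # made-up small weights
W_large = np.array([[3.0, -4.0], [1.0, 2.0]])    # made-up large weights

data_loss = 0.30   # pretend value of the cross-entropy part of the loss

total_small = data_loss + l2_penalty(W_small)
total_large = data_loss + l2_penalty(W_large)
print(total_small, total_large)
```

Because large weights are penalized, the optimizer is nudged towards smaller weights, which tends to give a smoother model that fits the training noise less closely.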

**Difference 2:**  Dropout

This means that each neuron in the previous layer has a probability of 0.3 of dropping out during training. Model 3 is compiled and trained with the same settings as Model 2 (the overfitting one).
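
A rough NumPy sketch of what dropout does to a layer's output during training is shown below. The function name and the all-ones activations are made up for illustration. The surviving values are scaled up by 1 / (1 - rate), as the Keras Dropout layer does, so that the expected sum stays the same; at prediction time nothing is dropped.

```python
import numpy as np

def dropout_train(activations, rate, rng):
    """Zero each activation with probability `rate` and rescale the survivors."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
h = np.ones((1, 10000))                       # pretend layer output, all ones
h_dropped = dropout_train(h, rate=0.3, rng=rng)

frac_zeroed = float((h_dropped == 0).mean())  # should be close to 0.3
mean_after = float(h_dropped.mean())          # should stay close to 1.0
print(frac_zeroed, mean_after)
```

Since a different random subset of neurons is silenced on every update, the network cannot rely too heavily on any single neuron.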

We'll now plot the loss and accuracy graphs for Model 3. You'll notice that the loss is a lot higher at the start, and that's because we've changed our loss function. To zoom the plot window in to between 0 and 1.2 for the loss, we add an additional line of code (plt.ylim) when plotting:

In [None]:
plt.plot(hist_3.history['loss'])
plt.plot(hist_3.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper right')
plt.ylim(top=1.2, bottom=0)
plt.show()

Compared to Model 2, we’ve reduced overfitting substantially! And that’s how we apply our regularization techniques to reduce overfitting to the training set.

**Summary:** To deal with overfitting, we can add the following strategies to our model, each with about one line of code:

* L2 Regularization
* Dropout

---




*Have questions or comments? Feel free to reach out to me (S.Lamchoudi@aui.ma).*