# Neural Nets

Build a deep neural network that improves on a baseline linear regression model.

## Learning Objectives:
  * Create a baseline model with linear regression.
  * Create a simple deep neural network to compare results.
  * Regularize the deep neural network using:
    * L1 Regularization
    * L2 Regularization
    * Dropout Regularization


In [None]:
#Run on TensorFlow 2.x
%tensorflow_version 2.x
from __future__ import absolute_import, division, print_function, unicode_literals

In [None]:
#Import relevant modules
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers
from matplotlib import pyplot as plt
import seaborn as sns

# The following lines adjust the granularity of reporting. 
pd.options.display.max_rows = 10
pd.options.display.float_format = "{:.1f}".format

## Load the dataset

This exercise uses the California Housing Dataset. The following code cell loads the training and test .csv files into two pandas DataFrames:

* `train_df`, which contains the training set
* `test_df`, which contains the test set
   

In [None]:
train_df = pd.read_csv("https://download.mlcc.google.com/mledu-datasets/california_housing_train.csv")

#shuffle the examples
train_df = train_df.reindex(np.random.permutation(train_df.index)) 
test_df = pd.read_csv("https://download.mlcc.google.com/mledu-datasets/california_housing_test.csv")

## Normalize values

When building a model with multiple features, the values of each feature should cover roughly the same range.  The following code cell normalizes datasets by converting each raw value to its Z-score. (For more information about Z-scores, see the Classification exercise.)
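
As a minimal sketch of the Z-score calculation, the block below normalizes a small made-up column with NumPy. The values are illustrative and are not taken from the dataset; `ddof=1` is used so the result matches the sample standard deviation that pandas computes by default.

```python
import numpy as np

# Toy feature column; the values are illustrative, not taken from the dataset.
raw = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# ddof=1 gives the sample standard deviation, the pandas default for std().
mean = raw.mean()
std = raw.std(ddof=1)

# Z-score: how many standard deviations each value lies from the mean.
z_scores = (raw - mean) / std
print(z_scores)
```

After normalization the column has a mean of 0 and a standard deviation of 1, so features measured in very different units end up on a comparable scale.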

In [None]:
train_df.describe()

In [None]:
#Convert raw values to their Z-scores 

# Calculate the Z-scores of each column in the training set:
train_df_mean = train_df.mean()
train_df_std = train_df.std()
train_df_norm = (train_df - train_df_mean)/train_df_std

# Calculate the Z-scores of each column in the test set.
test_df_mean = test_df.mean()
test_df_std = test_df.std()
test_df_norm = (test_df - test_df_mean)/test_df_std

In [None]:
#train_df_norm.describe()

In [None]:
#Plot a scatter matrix (histograms on the diagonal). Use ; to avoid printing the text
pd.plotting.scatter_matrix(train_df_norm, figsize=(15,15));

## Feature Engineering

Create one synthetic feature, a feature cross of `latitude` and `longitude`, by multiplying the two normalized columns.


In [None]:
train_df_norm['longxlat'] = train_df_norm['longitude']*train_df_norm['latitude']
train_features = train_df_norm[['population','median_income','longxlat','median_house_value']]
train_features

## Build a linear regression model as a baseline

Before creating a deep neural net, let's find a baseline loss by running a simple linear regression model that uses the features we just created.
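
A single `Dense` layer with one unit and no activation computes a weighted sum of its inputs plus a bias, which is a linear regression model. The sketch below fits the same kind of model in closed form with NumPy least squares instead of gradient descent. The data is synthetic and noise-free, and the names `true_weights` and `true_bias` are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic stand-in for three normalized features and a label (not the housing data).
features = rng.normal(size=(200, 3))
true_weights = np.array([0.5, -1.0, 2.0])  # hypothetical values
true_bias = 0.3                            # hypothetical value
labels = features @ true_weights + true_bias

# Append a column of ones so the bias is learned as a fourth weight.
design = np.hstack([features, np.ones((features.shape[0], 1))])
solution, *_ = np.linalg.lstsq(design, labels, rcond=None)
weights, bias = solution[:3], solution[3]

# Mean squared error of the fitted linear model.
predictions = features @ weights + bias
mse = np.mean((labels - predictions) ** 2)
print("weights:", weights, "bias:", bias, "MSE:", mse)
```

Because the synthetic labels are an exact linear function of the features, the fit recovers the weights and the error is essentially zero. On real data the remaining error is the baseline that a deeper model has to beat.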


In [None]:
#Instantiate the model
model = None
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(units=1, input_shape=(3,)))
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.01), loss="mean_squared_error", metrics=[tf.keras.metrics.MeanSquaredError()])

In [None]:
#Train the model
history = model.fit(x=train_features.iloc[:,0:3], y=train_features['median_house_value'], verbose = 0, batch_size=1000, epochs=15)

In [None]:
#Plot the loss curve
plt.figure()
plt.xlabel("Epoch")
plt.ylabel("Mean Squared Error")

plt.plot(history.history['mean_squared_error'], label='Training Loss')
print("Training Loss is:", history.history['loss'][-1])
plt.legend()

## Evaluate the model performance on the test dataset

In [None]:
test_df_norm['longxlat'] = test_df_norm['longitude']*test_df_norm['latitude']
test_features = test_df_norm[['population','median_income','longxlat','median_house_value']]
#test_features

In [None]:
results = model.evaluate(x=test_features.iloc[:,0:3], y=test_features['median_house_value'],verbose = 0, batch_size=300)
print("Test loss is:", results[0])
print("Test MSE is:", results[1])

## Build and train a deep neural net model

The following code cell defines the topology of the deep neural net, specifying the following:

* The number of layers in the deep neural net.
* The number of nodes in each layer.
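
To make the role of layers and nodes concrete, here is a rough NumPy sketch of a forward pass through a network with 3 inputs, hidden layers of 20 and 12 nodes, and 1 output. The weights are random placeholders rather than trained values, and the helper `forward` is a hypothetical name used only in this example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Layer widths: 3 input features, two hidden layers (20 and 12 nodes), 1 output.
layer_sizes = [3, 20, 12, 1]

# Randomly initialised weights and zero biases, one pair per layer (illustrative only).
params = [
    (rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

def forward(x, params):
    """Apply each layer in turn, with ReLU on every layer except the last."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

batch = rng.normal(size=(5, 3))
output = forward(batch, params)

# Each layer contributes (inputs * nodes) weights plus one bias per node.
n_params = sum(w.size + b.size for w, b in params)
print(output.shape, n_params)
```

With these widths the network has 80 + 252 + 13 = 345 trainable parameters, compared with 4 for the linear baseline. Adding layers or nodes increases that count and with it the capacity of the model.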


In [None]:
#Instantiate the model
nnmodel = None
nnmodel = tf.keras.Sequential()

#Define the first hidden layer with 20 nodes.   
nnmodel.add(tf.keras.layers.Dense(units=20, activation='relu', name='Hidden1'))

#Define the second hidden layer with 12 nodes. 
nnmodel.add(tf.keras.layers.Dense(units=12, activation='relu', name='Hidden2'))
  
#Define the output layer.
nnmodel.add(tf.keras.layers.Dense(units=1, name='Output')) 

nnmodel.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                loss="mean_squared_error",
                metrics=[tf.keras.metrics.MeanSquaredError()])

In [None]:
#Train the model
nnhistory = nnmodel.fit(x=train_features.iloc[:,0:3], y=train_features['median_house_value'], verbose = 0, batch_size=1000, epochs=15)

In [None]:
#Plot the loss curve
plt.figure()
plt.xlabel("Epoch")
plt.ylabel("Mean Squared Error")

plt.plot(nnhistory.history['mean_squared_error'], label='Training Loss')
print("Training Loss is:", nnhistory.history['loss'][-1])
plt.legend()

In [None]:
#Test the model
nnresults = nnmodel.evaluate(x=test_features.iloc[:,0:3], y=test_features['median_house_value'],verbose = 0, batch_size=300)
print("Test loss is:", nnresults[0])
print("Test MSE is:", nnresults[1])

## Compare the two models

The training loss of the deep neural network model (0.47) was consistently lower than that of the linear regression model (0.51), which suggests that the deep neural network model will make better predictions than the linear regression model. Adding more layers to the network, or more nodes to each layer, may improve performance further.

However, the model's loss against the test set is **still higher** than the loss against the training set.  In other words, the deep neural network is overfitting to the data in the training set.  To reduce overfitting, regularize the model. 
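
The gap between training loss and test loss can be reproduced on a toy problem. The sketch below is not the housing model: it fits two polynomials to ten noisy samples of a sine curve, on the assumption that a high-degree polynomial plays the role of an over-flexible network. The helpers `make_data` and `mse` are hypothetical names for this example.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(seed=0)

def make_data(n):
    """Noisy samples of a sine curve (synthetic, for illustration only)."""
    x = np.linspace(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y

def mse(model, x, y):
    """Mean squared error of a fitted polynomial on the given points."""
    return float(np.mean((model(x) - y) ** 2))

x_train, y_train = make_data(10)
x_test, y_test = make_data(50)

simple = Polynomial.fit(x_train, y_train, deg=1)    # low-capacity model
flexible = Polynomial.fit(x_train, y_train, deg=9)  # enough capacity to memorise 10 points

for name, model in [("degree 1", simple), ("degree 9", flexible)]:
    print(name, "train:", mse(model, x_train, y_train), "test:", mse(model, x_test, y_test))
```

The degree 9 fit passes through every training point, so its training error is close to zero, yet its error on the fresh test points is much larger. That mismatch is the signature of overfitting described above.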

## Regularize the deep neural network

The following regularization mechanisms can reduce overfitting:

  * L1 regularization
  * L2 regularization
  * Dropout regularization

Experiment with one or more regularization mechanisms to bring the test loss closer to the training loss (while still keeping test loss relatively low).  

**Note:** When you add a regularization function to a model, you might need to tweak other hyperparameters. 

### Implementing L1 or L2 regularization

To use L1 or L2 regularization on a hidden layer, specify the `kernel_regularizer` argument to [tf.keras.layers.Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense). Assign one of the following methods to this argument:

* `tf.keras.regularizers.l1` for L1 regularization
* `tf.keras.regularizers.l2` for L2 regularization

Each of the preceding methods takes the [regularization rate](https://developers.google.com/machine-learning/glossary/#regularization_rate) as its first argument. The keyword name of this argument differs between TensorFlow releases, so passing the value positionally is the safest choice. Assign a decimal value between 0 and 1.0; the higher the value, the greater the regularization. For example, the following applies L2 regularization at a strength of 0.01.

```
model.add(tf.keras.layers.Dense(units=20,
                                activation='relu',
                                kernel_regularizer=tf.keras.regularizers.l2(0.01),
                                name='Hidden1'))
```
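
To see what the regularization rate scales, the following sketch computes both penalties by hand with NumPy, assuming the usual definitions: the L1 penalty is the rate times the sum of absolute weights, and the L2 penalty is the rate times the sum of squared weights. The function names and the weight matrix are hypothetical and exist only for this example.

```python
import numpy as np

def l1_penalty(weights, rate):
    """L1 penalty: rate times the sum of absolute weight values."""
    return rate * np.sum(np.abs(weights))

def l2_penalty(weights, rate):
    """L2 penalty: rate times the sum of squared weight values."""
    return rate * np.sum(np.square(weights))

# Hypothetical weight matrix for a tiny layer.
weights = np.array([[0.5, -1.0], [2.0, 0.0]])

print(l1_penalty(weights, rate=0.04))
print(l2_penalty(weights, rate=0.01))
```

The penalty is added to the training loss, so larger weights cost more. L2 punishes large weights disproportionately, while L1 applies the same pressure per unit of weight and tends to push small weights to exactly zero.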

In [None]:
#Instantiate the model
l1model = None
l1model = tf.keras.Sequential()

#Define the first hidden layer with 20 nodes.   
l1model.add(tf.keras.layers.Dense(units=20, activation='relu', name='Hidden1', kernel_regularizer=tf.keras.regularizers.l1(0.04)))

#Define the second hidden layer with 12 nodes. 
l1model.add(tf.keras.layers.Dense(units=12, activation='relu', name='Hidden2', kernel_regularizer=tf.keras.regularizers.l1(0.04)))
  
#Define the output layer.
l1model.add(tf.keras.layers.Dense(units=1, name='Output')) 

l1model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                loss="mean_squared_error",
                metrics=[tf.keras.metrics.MeanSquaredError()])

In [None]:
#Train the model
l1history = l1model.fit(x=train_features.iloc[:,0:3], y=train_features['median_house_value'], verbose = 0, batch_size=1000, epochs=15)

In [None]:
#Plot the loss curve
plt.figure()
plt.xlabel("Epoch")
plt.ylabel("Mean Squared Error")

plt.plot(l1history.history['mean_squared_error'], label='Training Loss')
print("Training Loss is:", l1history.history['loss'][-1])
plt.legend()

In [None]:
#Test the model
l1results = l1model.evaluate(x=test_features.iloc[:,0:3], y=test_features['median_house_value'],verbose = 0, batch_size=300)
print("Test loss is:", l1results[0])
print("Test MSE is:", l1results[1])

The training loss and test loss are now closer together, which indicates that regularization has reduced overfitting.

### Implementing Dropout regularization

You implement dropout regularization as a separate layer in the topology. For example, the following code demonstrates how to add a dropout regularization layer between the first hidden layer and the second hidden layer:

```
# First hidden layer.
model.add(tf.keras.layers.Dense(units=20, activation='relu', name='Hidden1'))

# Randomly drop 25% of the first hidden layer's outputs during training.
model.add(tf.keras.layers.Dropout(rate=0.25))

# Second hidden layer.
model.add(tf.keras.layers.Dense(units=12, activation='relu', name='Hidden2'))
```

The `rate` parameter to [tf.keras.layers.Dropout](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout) specifies the fraction of nodes that the model should drop out during training. 
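
The effect of `rate` can be sketched in NumPy. The function below is a simplified illustration of inverted dropout, not the Keras implementation: it zeroes a random fraction of values during training and scales the survivors by `1 / (1 - rate)` so the expected sum is unchanged. The name `dropout` and its `training` flag are assumptions made for this example.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero a random fraction of values and rescale the rest."""
    if not training or rate == 0.0:
        # At inference time every node is kept and nothing is rescaled.
        return activations
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(seed=0)
activations = np.ones((4, 5))

print(dropout(activations, rate=0.25, training=True, rng=rng))
print(dropout(activations, rate=0.25, training=False, rng=rng))
```

Dropout is active only while the model trains. When the model is evaluated or used for prediction, all nodes contribute, which is why the second call returns the input unchanged.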