# Overview

In this notebook we are going to cover the basics of the Keras API and introduce some key hyper-parameters for neural networks. I have provided this translation to Python in case the R installation does not work.

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.datasets import boston_housing
from keras.optimizers import SGD
import pandas as pd
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Data

We're going to get the Boston housing data and split it into a training and a testing set. Afterwards, we will perform some exploratory analysis. To do this, you will need to use pandas functionality. These exercises are not essential to the course; they are meant as a warm-up to coding.

In [None]:
# Load the data
(X_train, y_train), (X_test, y_test) = boston_housing.load_data()

# Convert the training data into a DataFrame so that we can do some exploratory analysis
train_df = pd.DataFrame(X_train, columns=['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 
                                          'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT'])

# Add the column for price
train_df["PRICE"] = y_train

## Exercise 1

One of the features in our data, CHAS, is binary: it indicates whether the property is "bounded" by the Charles River. For our first warm-up exercise, I would like you to tell me how being by the Charles River affects the median selling price of a home. Do this using a "tidy" methodology. Once you have done this, think about why you are seeing this result; we will briefly discuss it as a class.

In [None]:
# Group by CHAS and compute the mean PRICE for each group
train_df.groupby("CHAS")["PRICE"].mean()

## Exercise 2

For our final warm-up, I want to remind us how to filter data and then make plots. One of the features in the data is RAD, which indicates the "index of accessibility" to radial highways. One of these indices is 24. There is also a feature called AGE, which gives the proportion of houses built before 1940. For this exercise, I want you to focus on the instances where RAD is 24 and plot the relation between AGE and price. Tell me what you see.

In [None]:
# Filter to RAD == 24 and make a scatter plot of AGE vs PRICE
train_df.loc[train_df["RAD"] == 24, :].plot.scatter(x="AGE", y="PRICE")
plt.show()

# Data Pre-Processing

A standard practice in ML is to normalize the data so that each column has a mean of zero and a variance of 1. We will find the mean and standard deviation of each column on the training data and then apply those same values to the test set. A simple way to do this is with Scikit-Learn, which has built-in functionality for this operation.

In [None]:
# Define a scaler object
scaler = StandardScaler()

# Scale the training and testing data (notice how we do not use the testing data to inform the scaling factors)
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
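
To make the scaling step less of a black box, here is a minimal sketch of the same idea in plain Python. The helper names `fit_scaler` and `transform` and the toy numbers are my own, for illustration only; this is not how Scikit-Learn is implemented internally, but it follows the same recipe: learn the mean and (population) standard deviation from the training rows, then reuse them on the test rows.

```python
from statistics import fmean, pstdev


def fit_scaler(rows):
    """Return per-column means and standard deviations from the training rows."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Population standard deviation (divide by n).
    # A constant column has std 0, so fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Apply previously learned means and stds to any set of rows."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Toy data (hypothetical values, two columns)
toy_train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
toy_test = [[4.0, 40.0]]

means, stds = fit_scaler(toy_train)             # learned from training data only
train_scaled = transform(toy_train, means, stds)
test_scaled = transform(toy_test, means, stds)  # reuses the training statistics

print(train_scaled)
print(test_scaled)
```

Notice that the scaled test row does not have to sit near zero: it is measured against the training statistics, which is exactly what we want.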

# Keras API

Now that we've done some simple data preparation, we're ready to introduce the Keras API. It has three major components: defining a model, compiling it, and fitting it. I will provide a simple example of how to use it, and then we will run some exercises so you have a chance to practice with the API and gain some intuition about neural networks.

In [None]:
# The first component of the Keras API is defining a model. This can be done by typing
model = Sequential([
    Dense(32, activation="relu", input_shape=(X_train.shape[1],)),
    Dense(1, activation="linear")
])

In [None]:
# We can see the parameters of the model by typing
model.summary()
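
If you want to check the parameter counts that the summary reports, the arithmetic for a Dense layer is one weight per input-unit pair plus one bias per unit. The sketch below assumes 13 input features (the 13 columns we named in the DataFrame); `dense_params` is a helper name I made up for this illustration.

```python
def dense_params(n_inputs, n_units):
    """Number of trainable parameters in a fully connected layer."""
    weights = n_inputs * n_units  # one weight per (input, unit) pair
    biases = n_units              # one bias per unit
    return weights + biases


n_features = 13  # assumption: the 13 feature columns of the housing data

hidden = dense_params(n_features, 32)  # Dense(32) on top of the inputs
output = dense_params(32, 1)           # Dense(1) on top of the hidden layer
total = hidden + output

print(hidden, output, total)
```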

In [None]:
# Now we need to compile the model -- tell the system how it is to be optimized
model.compile(optimizer=SGD(), loss='mse')

In [None]:
# Finally we have to tell the system how we want to train the model
model.fit(X_train, y_train, epochs=10, verbose=1, 
          validation_split=0.25, batch_size=128)
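
It can help to work out what `validation_split` and `batch_size` mean in terms of row counts. The sketch below is my own back-of-the-envelope arithmetic, not Keras code. It assumes 404 training rows, which is what I would expect from the default `load_data()` split; check `X_train.shape[0]` on your machine. `split_and_steps` is a hypothetical helper name.

```python
import math


def split_and_steps(n_samples, validation_split, batch_size):
    """Rows used for training, rows held out, and weight updates per epoch."""
    n_train = int(n_samples * (1.0 - validation_split))  # rows the model trains on
    n_val = n_samples - n_train                          # rows held out for validation
    steps_per_epoch = math.ceil(n_train / batch_size)    # one update per batch
    return n_train, n_val, steps_per_epoch


# Assumption: 404 rows in X_train
n_train, n_val, steps = split_and_steps(404, validation_split=0.25, batch_size=128)
print(n_train, n_val, steps)
```

With so few updates per epoch, 10 epochs amounts to only a few dozen gradient steps, which is worth keeping in mind when we look at the loss values.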

And that's all there is to defining and training a neural network in Keras. Now let's do some exercises that give you a chance to work with the API and gain some intuition about key hyper-parameters.

## Exercise 3

The first hyper-parameter we will focus on is the learning rate. This defines how much we update the inferred parameters in our model at each iteration. Using the learning rate that was specified for your group, I want you to train the exact same neural network as we did before. As a hint, type ?SGD; this might also lead to some other questions. Also, make sure to TYPE out the code from memory rather than copying what we did previously; this will force you to understand what each part of the model is doing and how the pieces fit together. When you're finished, we will discuss the results that we're seeing.

In [None]:
# I'll show this for lr = 1e-3; it's the same code for the other situations
model = Sequential([
    Dense(32, activation="relu", input_shape=(X_train.shape[1],)),
    Dense(1, activation="linear")
])

model.compile(optimizer=SGD(lr=1e-3), loss='mse')

model.fit(X_train, y_train, epochs=10, verbose=1, 
          validation_split=0.25, batch_size=128)
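
To build some intuition for what the learning rate does, here is a toy sketch that is separate from Keras entirely: plain gradient descent on the one-parameter function f(w) = (w - 3)^2, whose minimum is at w = 3. The function, the starting point, and the three learning rates are my own choices for illustration; they are not the rates assigned to your groups.

```python
def gradient_descent(learning_rate, n_steps=20, w=0.0):
    """Minimize f(w) = (w - 3)^2 starting from w, and return the final w."""
    for _ in range(n_steps):
        grad = 2.0 * (w - 3.0)        # derivative of (w - 3)^2
        w = w - learning_rate * grad  # the update rule that SGD applies
    return w


# Too small: barely moves. Moderate: gets close to 3. Too large: overshoots and diverges.
for rate in (1e-3, 1e-1, 1.1):
    print(rate, gradient_descent(rate))
```

The same three regimes (too slow, about right, unstable) are what we should look for when each group reports its loss values.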

## Exercise 4

Another key hyper-parameter is the number of layers to use in your model. So, for our next exercise, add one more hidden layer to the model and report the results. Namely, plot the loss profile, determine the final validation loss, and compare the results to the model with only one hidden layer. When adding layers, keep the number of units the same and use a learning rate of 1e-3.

**HINT**: To plot the loss profile, you can save the training history returned by the fit command to a variable.


In [None]:
model = Sequential([
    Dense(32, activation="relu", input_shape=(X_train.shape[1],)),
    Dense(32, activation="relu"), # this is how easy it is to add another layer in Keras
    Dense(1, activation="linear")
])

model.compile(optimizer=SGD(lr=1e-3), loss='mse')

res = model.fit(X_train, y_train, epochs=10, verbose=1, 
                validation_split=0.25, batch_size=128)

In [None]:
# Here's how to plot the validation loss for each epoch
plt.plot(res.history["val_loss"])
plt.show()
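
To read off the final validation loss without squinting at the plot, you can work directly with the history dictionary, which holds one list of values per epoch under keys such as "loss" and "val_loss". Below is a minimal sketch; `summarize_history` is a helper I made up, and the numbers in `fake_history` are invented purely to show the mechanics. They are not results from this notebook.

```python
def summarize_history(history):
    """Pull the final and best validation loss out of a Keras-style history dict."""
    val_loss = history["val_loss"]
    best_index = min(range(len(val_loss)), key=val_loss.__getitem__)
    return {
        "final_val_loss": val_loss[-1],
        "best_val_loss": val_loss[best_index],
        "best_epoch": best_index + 1,  # epochs are counted from 1
    }


# Invented numbers for illustration only
fake_history = {
    "loss": [500.0, 300.0, 150.0, 90.0],
    "val_loss": [520.0, 310.0, 170.0, 180.0],
}

summary = summarize_history(fake_history)
print(summary)
```

In the notebook you would call `summarize_history(res.history)` instead of passing the invented dictionary.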

## Exercise 5

Another hyper-parameter that can be changed is the number of nodes, or units, in a particular layer of a neural network. For this exercise, using the single-hidden-layer architecture, fit a model whose hidden layer has 512 nodes. Evaluate this model both in-sample and out-of-sample.

In [None]:
# All we have to do is change the number of units in the first Dense layer
model = Sequential([
    Dense(512, activation="relu", input_shape=(X_train.shape[1],)),
    Dense(1, activation="linear")
])

model.compile(optimizer=SGD(lr=1e-3), loss='mse')

model.fit(X_train, y_train, epochs=10, verbose=1, 
          validation_split=0.25, batch_size=128)

In [None]:
# In-sample evaluation
model.evaluate(X_train, y_train)

In [None]:
# Out-of-sample evaluation
model.evaluate(X_test, y_test)
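
Because we compiled the model with loss='mse', the number that evaluate reports is a mean squared error. Here is a small sketch of that calculation in plain Python so the value is easier to interpret. The prices and predictions are toy values I invented, and `mean_squared_error` is my own helper, not the Keras implementation.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values (invented): prices in thousands of dollars, as in the housing data
y_true = [20.0, 30.0, 25.0]
y_pred = [22.0, 27.0, 25.0]

mse = mean_squared_error(y_true, y_pred)
rmse = math.sqrt(mse)  # back on the same scale as the prices

print(mse, rmse)
```

Taking the square root puts the error back in the units of the target, which makes the in-sample and out-of-sample numbers easier to compare by eye.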

## Exercise 6

Using your knowledge of the Keras API as well as the architecture of the model that we created for Exercise 5, add a Dropout layer (layer_dropout in the R version) with a rate of 0.7 to the model to regularize it.

In [None]:
# We can add the Dropout layer right after the Dense layer
model = Sequential([
    Dense(512, activation="relu", input_shape=(X_train.shape[1],)),
    Dropout(rate=0.7), # here's where we add the Dropout
    Dense(1, activation="linear")
])

model.compile(optimizer=SGD(lr=1e-3), loss='mse')

model.fit(X_train, y_train, epochs=10, verbose=1, 
          validation_split=0.25, batch_size=128)
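
To see what a dropout rate of 0.7 actually does to a layer's outputs, here is a rough sketch in plain Python. During training, each activation is zeroed with probability equal to the rate and the survivors are scaled up by 1 / (1 - rate) so the expected total stays the same; at evaluation time nothing is dropped. The function below is my own illustration with a made-up input of all ones, not the Keras implementation.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate`; rescale the ones that survive."""
    if not training:
        return list(activations)  # dropout is switched off outside of training
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]


rng = random.Random(0)        # fixed seed so the sketch is repeatable
activations = [1.0] * 10_000  # made-up layer output

dropped = dropout(activations, rate=0.7, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)

print(zero_fraction, mean_after)
```

Roughly 70% of the units are silenced on every training step, which is why a wide layer like our 512-unit one can tolerate such a high rate.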