# Deep Learning

In this exercise, you will use a deep neural network to predict house values from the provided input data. You will build the model with Keras. Below is a description of how Keras models are set up.

**Please read more about the data [here](https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html).**

In [6]:
!pip install --upgrade tensorflow==2.0

Collecting tensorflow==2.0
  Downloading https://files.pythonhosted.org/packages/46/0f/7bd55361168bb32796b360ad15a25de6966c9c1beb58a8e30c01c8279862/tensorflow-2.0.0-cp36-cp36m-manylinux2010_x86_64.whl (86.3MB)
Collecting tensorboard<2.1.0,>=2.0.0 (from tensorflow==2.0)
  Downloading https://files.pythonhosted.org/packages/9b/a6/e8ffa4e2ddb216449d34cfcb825ebb38206bee5c4553d69e7bc8bc2c5d64/tensorboard-2.0.0-py3-none-any.whl (3.8MB)
Collecting tensorflow-estimator<2.1.0,>=2.0.0 (from tensorflow==2.0)
  Downloading https://files.pythonhosted.org/packages/fc/08/8b927337b7019c374719145d1dceba21a8bb909b93b1ad6f8fb7d22c1ca1/tensorflow_estimator-2.0.1-py2.py3-none-any.whl (449kB)
Installing collected packages: tensorboard, tensorflow-estimator, tensorflow
  Found existing installation: tensorboard 1.15

In [1]:
import tensorflow as tf
import tensorflow.keras as keras
import numpy as np
from tensorflow.keras.layers import Dense, Dropout, Activation

np.random.seed(0)

In [2]:
(x_train, y_train), (x_test, y_test) = keras.datasets.boston_housing.load_data()

In [3]:
x_train.shape

(404, 13)

In [4]:
y_train.shape

(404,)

In [5]:
y_train.mean(), y_train.std()

(22.395049504950492, 9.199035423364862)

Working with a Keras model involves several steps:

1. Construct the model: choose the layers, the number of neurons per layer, and the activation functions
1. Choose the loss function, the metrics, and the optimization method
1. Fit the model to the training data
1. Evaluate the model on the test data using the same metrics

Some relevant documentation:
 - [initializers](https://keras.io/initializers/)
 - [loss functions](https://www.tensorflow.org/api_docs/python/tf/keras/losses)
 - [regularizations](https://keras.io/regularizers/)
 - [optimizers](https://keras.io/optimizers/)
 - [metrics](https://www.tensorflow.org/api_docs/python/tf/keras/metrics)


First, to construct a model, use the [Sequential](https://keras.io/getting-started/sequential-model-guide/) object. You can pass multiple layers to the sequential object. For this next step, we will only be using the [Dense](https://keras.io/layers/core/#dense) layers.

In [7]:
# Create 3 hidden layers with 10 neurons each and an output layer with 1 neuron, and assign the list to the variable layers
# Use any activation you like such as "relu" or "tanh", you can alternate for each layer
# For your first run, try using the linear activation and then see if modifying the activations improved the result.
# Optional - give each layer a name and see how that shows up in the summary.
# YOUR CODE HERE

layers = [
    Dense(10, input_shape=(13,), activation='relu'),
    Dense(10, activation='relu'),
    Dense(10, activation='tanh'),
    Dense(1),
]
model = keras.Sequential(layers)
#model.add(Activation('relu'))
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_4 (Dense)              (None, 10)                140       
_________________________________________________________________
dense_5 (Dense)              (None, 10)                110       
_________________________________________________________________
dense_6 (Dense)              (None, 10)                110       
_________________________________________________________________
dense_7 (Dense)              (None, 1)                 11        
Total params: 371
Trainable params: 371
Non-trainable params: 0
_________________________________________________________________
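
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The short sketch below reproduces the numbers for this architecture; the helper name `dense_params` is only illustrative and is not part of Keras.

```python
def dense_params(n_inputs, n_units):
    """Parameters of a fully connected layer: weights plus one bias per unit."""
    return n_inputs * n_units + n_units

# 13 input features, three hidden layers of 10 units, one output unit.
layer_sizes = [13, 10, 10, 10, 1]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # per-layer parameter counts
print(sum(counts))  # total trainable parameters
```

This gives 140, 110, 110, and 11 parameters per layer, for a total of 371, matching the summary.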


In [0]:
assert len(layers) == 4
for i,layer in enumerate(layers):
    assert isinstance(layers[i], keras.layers.Dense) 
    assert layer.weights[1].shape == [10,10,10,1][i]

In [0]:
# Set up the model to do the following:
# - use stochastic gradient descent to fit the model
# - use mean absolute error as its loss function
# - use mean absolute error as one of its metrics
# - use mean squared error as one of its metrics
# YOUR CODE HERE

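# SGD with Nesterov momentum; `learning_rate` is the current name of the older `lr` argument
sgd = keras.optimizers.SGD(learning_rate=0.01, decay=1e-6, momentum=0.9, nesterov=True)
# MAE is both the loss being minimized and a reported metric; MSE is reported only
model.compile(loss='mean_absolute_error',
              optimizer=sgd,
              metrics=['mean_absolute_error', 'mean_squared_error'])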
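
To make the optimizer settings above more concrete, here is a minimal pure-Python sketch of SGD with Nesterov momentum and inverse-time learning-rate decay, applied to a one-dimensional quadratic. It assumes the update rule commonly documented for the Keras SGD optimizer and is meant as an illustration, not as TensorFlow's exact implementation; the function name `sgd_nesterov` and the toy objective are made up for this example.

```python
def sgd_nesterov(grad_fn, w, steps, learning_rate=0.01, decay=1e-6, momentum=0.9):
    """Minimize a 1-D function given its gradient, using Nesterov momentum."""
    velocity = 0.0
    for t in range(steps):
        # Inverse-time decay: the learning rate shrinks slowly with the step count.
        lr_t = learning_rate / (1.0 + decay * t)
        g = grad_fn(w)
        velocity = momentum * velocity - lr_t * g
        # Nesterov variant: apply the updated velocity plus a fresh gradient step.
        w = w + momentum * velocity - lr_t * g
    return w

# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3.
w_final = sgd_nesterov(lambda w: 2.0 * (w - 3.0), w=0.0, steps=300)
print(w_final)
```

With a decay of `1e-6`, the learning rate barely changes over a few hundred steps, so in this notebook the momentum term probably matters far more than the decay.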

In [35]:
assert isinstance(model.optimizer, keras.optimizers.SGD)
assert model.loss in ["mae", "mean_absolute_error"]
assert "mae" in model.metrics or "mean_absolute_error" in model.metrics

AssertionError: ignored
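
The failing check is most likely the last one. In TensorFlow 2.0, `model.metrics` appears to hold metric objects rather than strings, so testing whether `"mae"` is in that list compares a string against objects and never matches. The sketch below reproduces the pitfall with a hypothetical stand-in class, `FakeMetric` (not a Keras class), so that it runs without TensorFlow.

```python
class FakeMetric:
    """Hypothetical stand-in for a Keras metric object; only `name` matters here."""

    def __init__(self, name):
        self.name = name

metrics = [FakeMetric("mean_absolute_error"), FakeMetric("mean_squared_error")]

# Strings never compare equal to these objects, so the membership test fails.
found_by_object = "mae" in metrics or "mean_absolute_error" in metrics

# Comparing against the metric names gives the intended result.
names = [m.name for m in metrics]
found_by_name = "mae" in names or "mean_absolute_error" in names

print(found_by_object, found_by_name)
```

With a real compiled model, the analogous check would compare against the names of the metric objects, for example `[m.name for m in model.metrics]`, assuming the metrics expose a `name` attribute as Keras metrics generally do.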

In [0]:
# Now fit the model
model.fit(x_train, y_train, epochs=50)

In [63]:
model.summary()

Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_26 (Dense)             (None, 10)                140       
_________________________________________________________________
dense_27 (Dense)             (None, 10)                110       
_________________________________________________________________
dense_28 (Dense)             (None, 10)                110       
_________________________________________________________________
dense_29 (Dense)             (None, 1)                 11        
Total params: 371
Trainable params: 371
Non-trainable params: 0
_________________________________________________________________


In [64]:
# Here we can evaluate how our model does based on the test data
model.evaluate(x_test, y_test)



[6.645743126962699, 6.645743, 88.366104]
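
The three numbers returned by `model.evaluate` are the loss followed by the compiled metrics, here MAE (as the loss), MAE again (as a metric), and MSE. The target is the median home value in thousands of dollars, so an MAE of about 6.65 corresponds to an average error of roughly $6,650. The sketch below uses only values printed earlier in this notebook; taking the square root of the MSE gives an RMSE in the same units as the target, which can be compared with the standard deviation of `y_train`.

```python
import math

# Values copied from the model.evaluate output above: [loss, MAE, MSE].
loss, mae, mse = 6.645743126962699, 6.645743, 88.366104
rmse = math.sqrt(mse)

# Standard deviation of y_train, copied from the earlier cell output.
y_train_std = 9.199035423364862

print(f"MAE  = {mae:.2f}")
print(f"RMSE = {rmse:.2f}")
print(f"RMSE / std(y_train) = {rmse / y_train_std:.2f}")
```

An RMSE close to the spread of the targets suggests that this run does little better than predicting a constant value. The comparison is only rough, since the standard deviation was computed on the training set rather than the test set.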

Now let's try another optimizer instead of stochastic gradient descent. [Adam](https://keras.io/optimizers/#adam) is a widely used default for training neural networks because it usually performs well without much tuning. In the next cell, compile the model with Adam instead of SGD, using the same loss and metrics. After compiling, fit the model for as many epochs as it takes for the loss to start converging.
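
For intuition about how many epochs Adam might need, the following pure-Python sketch implements the standard Adam update on a one-dimensional quadratic. The hyperparameter values are the usual Keras defaults; the function name `adam_minimize` and the toy objective are illustrative assumptions, not part of the exercise. It also counts the update steps in 50 epochs, assuming the default batch size of 32.

```python
import math

def adam_minimize(grad_fn, w, steps, learning_rate=0.001,
                  beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """Run Adam on a 1-D function given its gradient; return every iterate."""
    m, v = 0.0, 0.0
    history = [w]
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta_1 * m + (1 - beta_1) * g        # running mean of gradients
        v = beta_2 * v + (1 - beta_2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta_1 ** t)            # bias correction
        v_hat = v / (1 - beta_2 ** t)
        w = w - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
        history.append(w)
    return history

# Toy objective f(w) = (w - 3)^2, starting from w = 0.
history = adam_minimize(lambda w: 2.0 * (w - 3.0), w=0.0, steps=2000)
first_step = history[1] - history[0]

# Update steps in 50 epochs over 404 samples with batch size 32.
steps_in_50_epochs = 50 * math.ceil(404 / 32)
print(first_step, history[-1], steps_in_50_epochs)
```

The first step has a size of about `learning_rate` regardless of how large the gradient is, and later steps stay on that order. With a learning rate of 0.001 and only a few hundred update steps, each weight can move only a limited distance, which may be one reason a short Adam run needs more epochs than SGD with a learning rate of 0.01.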

In [0]:
# Compile the model using adam
# YOUR CODE HERE
# Rebuild the network so that training starts from freshly initialized weights
layers = [
    Dense(10, input_shape=(13,), activation='relu'),
    Dense(10, activation='relu'),
    Dense(10, activation='tanh'),
    Dense(1),
]
model = keras.Sequential(layers)
# Adam with its default hyperparameters written out explicitly
ad = keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, amsgrad=False)
model.compile(loss='mean_absolute_error',
              optimizer=ad,
              metrics=['mean_absolute_error', 'mean_squared_error'])
model.fit(x_train, y_train, epochs=50)

In [68]:
model.summary()

Model: "sequential_8"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_34 (Dense)             (None, 10)                140       
_________________________________________________________________
dense_35 (Dense)             (None, 10)                110       
_________________________________________________________________
dense_36 (Dense)             (None, 10)                110       
_________________________________________________________________
dense_37 (Dense)             (None, 1)                 11        
Total params: 371
Trainable params: 371
Non-trainable params: 0
_________________________________________________________________


In [69]:
assert isinstance(model.optimizer, keras.optimizers.Adam)
assert model.loss in ["mae", "mean_absolute_error"]
assert "mae" in model.metrics or "mean_absolute_error" in model.metrics

AssertionError: ignored

In [70]:
model.evaluate(x_test, y_test)



[15.372428557452034, 15.372429, 316.80313]

Now recreate the model with dropout layers. Add two dropout layers, one after the first hidden layer and one after the second. Select a low dropout rate that results in a good score.
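
As a reminder of what a dropout layer does, here is a minimal pure-Python sketch of inverted dropout: during training each activation is zeroed with probability `rate` and the survivors are scaled by `1 / (1 - rate)`, while at inference the input passes through unchanged. The function name `dropout` and the sample activations are illustrative assumptions; the Keras `Dropout` layer handles this internally.

```python
import random

def dropout(values, rate, training, rng):
    """Inverted dropout on a list of activations."""
    if not training or rate == 0.0:
        return list(values)  # inference: identity
    keep = 1.0 - rate
    # Drop each value with probability `rate`; rescale the ones that are kept.
    return [x / keep if rng.random() >= rate else 0.0 for x in values]

rng = random.Random(0)
activations = [0.5, 1.2, 0.3, 2.0, 0.7, 1.5, 0.9, 0.1, 1.1, 0.4]  # 10 hidden units

train_out = dropout(activations, rate=0.015, training=True, rng=rng)
eval_out = dropout(activations, rate=0.015, training=False, rng=rng)

expected_dropped = 0.015 * len(activations)
print(train_out)
print(eval_out)
print(expected_dropped)
```

With 10 units per layer, a rate of 0.015 drops only 0.15 units per sample on average, so such a low rate regularizes very lightly.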

In [0]:
# YOUR CODE HERE
layers = [
    Dense(10, input_shape=(13,), activation='relu'),
    Dropout(0.015),
    Dense(10, activation='relu'),
    Dropout(0.015),
    Dense(10, activation='relu'),
    Dense(1),
]
model = keras.Sequential(layers)

In [0]:
assert len(layers) == 6
assert isinstance(layers[1], Dropout)
assert isinstance(layers[3], Dropout)
for i,layer in enumerate(layers):
    if i not in [1,3]:
        assert isinstance(layers[i], keras.layers.Dense) 
        assert layer.weights[1].shape == [10,0,10,0,10,1][i]

In [103]:
model.compile(optimizer='adam', loss='mae', metrics=['mae', "mse"])
model.fit(x_train, y_train, epochs=500, verbose=0)
model.summary()

Model: "sequential_17"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_80 (Dense)             (None, 10)                140       
_________________________________________________________________
dropout_16 (Dropout)         (None, 10)                0         
_________________________________________________________________
dense_81 (Dense)             (None, 10)                110       
_________________________________________________________________
dropout_17 (Dropout)         (None, 10)                0         
_________________________________________________________________
dense_82 (Dense)             (None, 10)                110       
_________________________________________________________________
dense_83 (Dense)             (None, 1)                 11        
Total params: 371
Trainable params: 371
Non-trainable params: 0
_________________________________________________________________

In [104]:
model.evaluate(x_test, y_test)



[4.050873924704159, 4.0508738, 33.364864]
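
For a side-by-side view, the sketch below collects the test-set results printed by the three `model.evaluate` calls in this notebook. The run labels are descriptive names chosen for this example. Note that the runs differ in more than one way (optimizer, number of epochs, dropout, and the activation of the third hidden layer), so the comparison does not isolate the effect of dropout.

```python
# Test-set results copied from the three model.evaluate outputs in this notebook.
runs = {
    "SGD, 50 epochs": {"mae": 6.645743, "mse": 88.366104},
    "Adam, 50 epochs": {"mae": 15.372429, "mse": 316.80313},
    "Adam + dropout, 500 epochs": {"mae": 4.0508738, "mse": 33.364864},
}

# Pick the run with the lowest mean absolute error on the test set.
best = min(runs, key=lambda name: runs[name]["mae"])

for name, result in runs.items():
    print(f"{name:28s} MAE = {result['mae']:6.2f}  MSE = {result['mse']:7.2f}")
print("Lowest test MAE:", best)
```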

Select a dropout rate for which the test loss (MAE) stays below 7, as the following assertion requires.

In [105]:
assert model.evaluate(x_test, y_test)[0] < 7



## Feedback

In [0]:
def feedback():
    """Provide feedback on the contents of this exercise
    
    Returns:
        string
    """
    # YOUR CODE HERE
    return "Asserts are broken"