## Basics of deep learning and neural networks


In [None]:
# Coding the forward propagation algorithm

"""https://s3.amazonaws.com/assets.datacamp.com/production/course_3524/datasets/1_4.png"""

# Calculate node 0 value: node_0_value
node_0_value = (input_data * weights['node_0']).sum()

# Calculate node 1 value: node_1_value
node_1_value = (input_data * weights['node_1']).sum()

# Put node values into array: hidden_layer_outputs
hidden_layer_outputs = np.array([node_0_value, node_1_value])

# Calculate output: output
output = (hidden_layer_outputs * weights['output']).sum()

# Print output
print(output)

"""-39"""
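
The cell above relies on `input_data` and `weights` being preloaded. A minimal self-contained sketch follows; the concrete input and weight values are assumptions (they are not given in these notes), chosen only because they reproduce the printed output of -39.

```python
import numpy as np

# Assumed example values (not given in the notes); they reproduce the -39 above.
input_data = np.array([3, 5])
weights = {
    'node_0': np.array([2, 4]),
    'node_1': np.array([4, -5]),
    'output': np.array([2, 7]),
}

# Hidden layer: each node is the sum of inputs times that node's weights
node_0_value = (input_data * weights['node_0']).sum()   # 3*2 + 5*4 = 26
node_1_value = (input_data * weights['node_1']).sum()   # 3*4 + 5*(-5) = -13
hidden_layer_outputs = np.array([node_0_value, node_1_value])

# Output node: sum of hidden values times the output weights
output = (hidden_layer_outputs * weights['output']).sum()  # 26*2 + (-13)*7 = -39
print(output)
```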

In [None]:
# The Rectified Linear Activation Function

def relu(input):
    '''Return the rectified linear activation of a scalar: max(input, 0).'''
    # Calculate the value for the output of the relu function: output
    output = max(input, 0)

    # Return the value just calculated
    return output

# Calculate node 0 value: node_0_output
node_0_input = (input_data * weights['node_0']).sum()
node_0_output = relu(node_0_input)

# Calculate node 1 value: node_1_output
node_1_input = (input_data * weights['node_1']).sum()
node_1_output = relu(node_1_input)

# Put node values into array: hidden_layer_outputs
hidden_layer_outputs = np.array([node_0_output, node_1_output])

# Calculate model output (do not apply relu)
model_output = (hidden_layer_outputs * weights['output']).sum()

# Print model output
print(model_output)

"""
52

You predicted 52 transactions. Without this activation function, you would have predicted a negative number! 
The real power of activation functions will come soon when you start tuning model weights.
"""
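
The `relu` defined in this cell uses Python's built-in `max`, which only handles one scalar at a time. As a hedged sketch, an element-wise variant can be written with `np.maximum`; the name `relu_array` and the hidden-node values and output weights below are hypothetical, picked so the result matches the 52 printed above.

```python
import numpy as np

def relu_array(values):
    """Element-wise ReLU: negative entries become 0, the rest pass through."""
    return np.maximum(values, 0)

# Hypothetical pre-activation values for the two hidden nodes
hidden_inputs = np.array([26, -13])
hidden_outputs = relu_array(hidden_inputs)
print(hidden_outputs)

# Hypothetical output weights; no activation is applied to the output node
output_weights = np.array([2, 7])
model_output = (hidden_outputs * output_weights).sum()
print(model_output)
```

Applying the activation to the whole layer at once avoids one `relu` call per node.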

In [None]:
# Applying the network to many observations/rows of data

# Define predict_with_network()
def predict_with_network(input_data_row, weights):

    # Calculate node 0 value
    node_0_input = (input_data_row*weights['node_0']).sum()
    node_0_output = relu(node_0_input)

    # Calculate node 1 value
    node_1_input = (input_data_row*weights['node_1']).sum()
    node_1_output = relu(node_1_input)

    # Put node values into array: hidden_layer_outputs
    hidden_layer_outputs = np.array([node_0_output, node_1_output])
    
    # Calculate model output
    input_to_final_layer = (hidden_layer_outputs*weights['output'])
    model_output = input_to_final_layer.sum()
    
    # Return model output
    return(model_output)


# Create empty list to store prediction results
results = []
for input_data_row in input_data:
    # Append prediction to results
    results.append(predict_with_network(input_data_row, weights))

# Print results
print(results)

"""[52, 63, 0, 148]"""
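
The row-by-row loop can also be expressed as two matrix products. This is a sketch under assumptions: the four input rows and the weights are illustrative values (not listed in these notes) that happen to reproduce the results printed above.

```python
import numpy as np

# Assumed inputs: one row per observation
X = np.array([[3, 5], [1, -1], [0, 0], [8, 4]])

# Assumed weights; each hidden node's weight vector becomes one column
node_0_weights = [2, 4]
node_1_weights = [4, -5]
W_hidden = np.column_stack([node_0_weights, node_1_weights])
w_output = np.array([2, 7])

# Hidden layer for every row at once, followed by ReLU
hidden = np.maximum(X @ W_hidden, 0)

# Output layer (no activation)
predictions = hidden @ w_output
print(predictions.tolist())
```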

In [None]:
# Multi-layer neural networks

def predict_with_network(input_data):
    # Calculate node 0 in the first hidden layer
    node_0_0_input = (input_data * weights['node_0_0']).sum()
    node_0_0_output = relu(node_0_0_input)

    # Calculate node 1 in the first hidden layer
    node_0_1_input = (input_data * weights['node_0_1']).sum()
    node_0_1_output = relu(node_0_1_input)

    # Put node values into array: hidden_0_outputs
    hidden_0_outputs = np.array([node_0_0_output, node_0_1_output])
    
    # Calculate node 0 in the second hidden layer
    node_1_0_input = (hidden_0_outputs * weights['node_1_0']).sum()
    node_1_0_output = relu(node_1_0_input)

    # Calculate node 1 in the second hidden layer
    node_1_1_input = (hidden_0_outputs * weights['node_1_1']).sum()
    node_1_1_output = relu(node_1_1_input)

    # Put node values into array: hidden_1_outputs
    hidden_1_outputs = np.array([node_1_0_output, node_1_1_output])

    # Calculate model output: model_output
    model_output = (hidden_1_outputs*weights['output']).sum()
    
    # Return model_output
    return(model_output)

output = predict_with_network(input_data)
print(output)

"""182 The network generated a prediction of 182."""
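
The two hidden layers above repeat the same pattern, so the forward pass can be sketched as a loop over layers. The function `forward`, its argument layout, and the weight values are assumptions for illustration; the values were chosen because they yield the 182 reported above.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def forward(input_row, hidden_layers, output_weights):
    """hidden_layers is a list of layers; each layer is a list of per-node weight vectors."""
    activations = np.asarray(input_row)
    for layer in hidden_layers:
        # One weighted sum per node in this layer, then ReLU
        node_inputs = np.array([(activations * np.asarray(w)).sum() for w in layer])
        activations = relu(node_inputs)
    # Output node: weighted sum without activation
    return (activations * np.asarray(output_weights)).sum()

# Assumed weights (not given in the notes)
hidden_layers = [
    [[2, 4], [4, -5]],    # first hidden layer: node_0_0, node_0_1
    [[-1, 2], [1, 2]],    # second hidden layer: node_1_0, node_1_1
]
output_weights = [2, 7]

prediction = forward([3, 5], hidden_layers, output_weights)
print(prediction)
```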

In [None]:
"""
How are the weights that determine the features/interactions in Neural Networks created?

--> The model training process sets them to optimize predictive accuracy.

Which layers of a model capture more complex or "higher level" interactions?

--> The last layers capture the most complex interactions.
"""

## Optimizing a neural network with backward propagation



In [None]:
"""
For the exercises in this chapter, you'll continue working with the network to predict transactions for a bank.

What is the error (predicted - actual) for the following network using the ReLU activation function when the 
input data is [3, 2] and the actual value of the target (what you are trying to predict) is 5? 
It may be helpful to get out a pen and piece of paper to calculate these values.

https://s3.amazonaws.com/assets.datacamp.com/production/course_3524/datasets/ch2_ex2_3.png

--> The network generates a prediction of 16, which results in an error of 11.

Imagine you have to make a prediction for a single data point. The actual value of the target is 7. 
The weight going from node_0 to the output is 2, as shown below. If you increased it slightly, changing it to 2.01, 
would the predictions become more accurate, less accurate, or stay the same?

--> less accurate
"""

In [None]:
# Coding how weight changes affect accuracy

# The data point you will make a prediction for
input_data = np.array([0, 3])

# Sample weights
weights_0 = {'node_0': [2, 1],
             'node_1': [1, 2],
             'output': [1, 1]
            }

# The actual target value, used to calculate the error
target_actual = 3

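# Make prediction using original weights
# (this needs the two-argument predict_with_network(input_data_row, weights) defined earlier,
# not the one-argument multi-layer version)
model_output_0 = predict_with_network(input_data, weights_0)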

# Calculate error: error_0
error_0 = model_output_0 - target_actual

# Create weights that cause the network to make perfect prediction (3): weights_1
weights_1 = {'node_0': [2, 1],
             'node_1': [1, 0],
             'output': [1, 1]
            }

# Make prediction using new weights: model_output_1
model_output_1 = predict_with_network(input_data, weights_1)

# Calculate error: error_1 (predicted - actual, same convention as error_0)
error_1 = model_output_1 - target_actual

# Print error_0 and error_1
print(error_0)
print(error_1)

"""
6
0
"""

In [None]:
# Scaling up to multiple data points

from sklearn.metrics import mean_squared_error

# Create model_output_0 
model_output_0 = []
# Create model_output_1
model_output_1 = []

# Loop over input_data
for row in input_data:
    # Append prediction to model_output_0
    model_output_0.append(predict_with_network(row, weights_0))
    
    # Append prediction to model_output_1
    model_output_1.append(predict_with_network(row, weights_1))

# Calculate the mean squared error for model_output_0: mse_0
mse_0 = mean_squared_error(target_actuals, model_output_0)

# Calculate the mean squared error for model_output_1: mse_1
mse_1 = mean_squared_error(target_actuals, model_output_1)

# Print mse_0 and mse_1
print("Mean squared error with weights_0: %f" %mse_0)
print("Mean squared error with weights_1: %f" %mse_1)

"""
Mean squared error with weights_0: 37.500000
Mean squared error with weights_1: 49.890625
"""
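
For reference, mean squared error is simply the average of the squared differences between predictions and targets. The sketch below computes it with NumPy instead of scikit-learn; the helper name `mse` and the two data points are hypothetical.

```python
import numpy as np

def mse(actuals, predictions):
    """Mean of squared (predicted - actual) differences."""
    actuals = np.asarray(actuals, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    return np.mean((predictions - actuals) ** 2)

# Hypothetical targets and predictions: errors are 6 and 0
actuals = [3, 3]
predictions = [9, 3]
print(mse(actuals, predictions))   # (36 + 0) / 2 = 18.0
```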

In [None]:
# Calculating slopes

"""
You're now going to practice calculating slopes. For the squared-error loss (xb - y)**2, the slope with respect
to the weights b is 2 * x * (xb - y), or 2 * input_data * error. Note that x and b may have multiple numbers
(x is a vector for each data point, and b is a vector). In this case, the output will also be a vector,
which is exactly what you want.
"""

# Calculate the predictions: preds
preds = (weights*input_data).sum()

# Calculate the error: error
error = preds - target

# Calculate the slope: slope
slope = input_data * error * 2

# Print the slope
print(slope)

"""
[14 28 42]
"""
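
One way to sanity-check the slope formula is to compare it against a finite-difference estimate of the squared-error loss. This is a sketch, not part of the original exercise: the helper names are made up, and the weights, inputs, and target are assumed values that reproduce the `[14 28 42]` printed above.

```python
import numpy as np

def loss(weights, input_data, target):
    """Squared error of a single linear prediction."""
    return ((weights * input_data).sum() - target) ** 2

def analytic_slope(weights, input_data, target):
    error = (weights * input_data).sum() - target
    return 2 * input_data * error

def numeric_slope(weights, input_data, target, eps=1e-6):
    """Central finite difference, one weight at a time."""
    slope = np.zeros_like(weights, dtype=float)
    for i in range(len(weights)):
        step = np.zeros_like(weights, dtype=float)
        step[i] = eps
        slope[i] = (loss(weights + step, input_data, target)
                    - loss(weights - step, input_data, target)) / (2 * eps)
    return slope

# Assumed values (not given in the notes)
weights = np.array([0.0, 2.0, 1.0])
input_data = np.array([1.0, 2.0, 3.0])
target = 0.0

exact = analytic_slope(weights, input_data, target)
approx = numeric_slope(weights, input_data, target)
print(exact)
print(approx)
```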

In [None]:
# Improving model weights

# Set the learning rate: learning_rate
learning_rate = 0.01

# Calculate the predictions: preds
preds = (weights * input_data).sum()

# Calculate the error: error
error = preds - target

# Calculate the slope: slope
slope = 2 * input_data * error

# Update the weights: weights_updated
weights_updated = weights - (learning_rate*slope)

# Get updated predictions: preds_updated
preds_updated = (weights_updated * input_data).sum()

# Calculate updated error: error_updated
error_updated = preds_updated - target

# Print the original error
print(error)

# Print the updated error
print(error_updated)

"""
7
5.04
"""

In [None]:
# Making multiple updates to weights

n_updates = 20
mse_hist = []

# Iterate over the number of updates
for i in range(n_updates):
    # Calculate the slope: slope
    slope = get_slope(input_data, target, weights)
    
    # Update the weights: weights
    weights = weights - 0.01 * slope
    
    # Calculate mse with new weights: mse
    mse = get_mse(input_data, target, weights)
    
    # Append the mse to mse_hist
    mse_hist.append(mse)

# Plot the mse history
plt.plot(mse_hist)
plt.xlabel('Iterations')
plt.ylabel('Mean Squared Error')
plt.show()
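
`get_slope` and `get_mse` are used above without being defined in these notes. The following is one plausible sketch of them for a single data point and a linear prediction, with assumed starting values; the actual helpers may differ.

```python
import numpy as np

def get_error(input_data, target, weights):
    """Prediction minus target for a single linear prediction."""
    return (weights * input_data).sum() - target

def get_slope(input_data, target, weights):
    """Slope of the squared error with respect to the weights."""
    return 2 * input_data * get_error(input_data, target, weights)

def get_mse(input_data, target, weights):
    """Squared error; with one data point this equals the mean squared error."""
    return get_error(input_data, target, weights) ** 2

# Assumed starting values (not given in the notes)
input_data = np.array([1.0, 2.0, 3.0])
target = 0.0
weights = np.array([0.0, 2.0, 1.0])

mse_hist = []
for _ in range(20):
    weights = weights - 0.01 * get_slope(input_data, target, weights)
    mse_hist.append(get_mse(input_data, target, weights))

print(mse_hist[0], mse_hist[-1])
```

With these values the error shrinks by the same factor on every update, so the recorded loss decreases steadily.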

In [None]:
"""
The relationship between forward and backward propagation
If you have gone through 4 iterations of calculating slopes (using backward propagation) and then updated weights, 
how many times must you have done forward propagation?

--> 4

If your predictions were all exactly right, and your errors were all exactly 0, the slope of the loss function 
with respect to your predictions would also be 0. In that circumstance, which of the following statements would be correct?

--> The updates to all weights in the network would also be 0.

A round of backpropagation
In the network shown below, we have done forward propagation, and node values calculated as part of forward propagation 
are shown in white. The weights are shown in black. Layers after the question mark show the slopes calculated as part of 
back-prop, rather than the forward-prop values. Those slope values are shown in purple.
This network again uses the ReLU activation function, so the slope of the activation function is 1 for any node 
receiving a positive value as input. Assume the node being examined had a positive value 
(so the activation function's slope is 1).
What is the slope needed to update the weight marked with the question mark?
https://s3.amazonaws.com/assets.datacamp.com/production/course_3524/datasets/ch2ex14_1.png

--> 6
"""

## Building deep learning models with Keras
In this chapter, you'll use the Keras library to build deep learning models for both regression and classification. You'll learn about the Specify-Compile-Fit workflow that you can use to make predictions, and by the end of the chapter, you'll have all the tools necessary to build deep neural networks.



In [None]:
# Specifying a model

# Import necessary modules
import keras
from keras.layers import Dense
from keras.models import Sequential

# Save the number of columns in predictors: n_cols
n_cols = predictors.shape[1]

# Set up the model: model
model = Sequential()

# Add the first layer
model.add(Dense(50, activation='relu', input_shape=(n_cols,)))

# Add the second layer
model.add(Dense(32, activation='relu'))

# Add the output layer
model.add(Dense(1))
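
To get a feel for the size of this network, the number of trainable parameters in a stack of fully connected layers can be counted by hand: each layer has (inputs + 1 bias) times units parameters. The helper name and the value of `n_cols` below are hypothetical.

```python
def dense_param_count(layer_sizes):
    """layer_sizes = [n_inputs, units_layer_1, ..., units_output]; biases included."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

n_cols = 10  # hypothetical number of predictor columns

# Layers of 50, 32 and 1 units, as in the model above:
# (10+1)*50 + (50+1)*32 + (32+1)*1 = 550 + 1632 + 33
print(dense_param_count([n_cols, 50, 32, 1]))
```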

In [None]:
# Compiling the model

"""
keras optimizers: https://keras.io/api/optimizers/#adam
Adam's paper: https://arxiv.org/abs/1412.6980v8
"""

# Import necessary modules
import keras
from keras.layers import Dense
from keras.models import Sequential

# Specify the model
n_cols = predictors.shape[1]
model = Sequential()
model.add(Dense(50, activation='relu', input_shape = (n_cols,)))
model.add(Dense(32, activation='relu'))
model.add(Dense(1))

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Verify that model contains information from compiling
print("Loss function: " + model.loss)

In [None]:
# Fitting the model

# Import necessary modules
import keras
from keras.layers import Dense
from keras.models import Sequential

# Specify the model
n_cols = predictors.shape[1]
model = Sequential()
model.add(Dense(50, activation='relu', input_shape = (n_cols,)))
model.add(Dense(32, activation='relu'))
model.add(Dense(1))

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Fit the model
model.fit(predictors, target)

"""
Epoch 1/10
534/534 [==============================] - 0s - loss: 80.8377
Epoch 2/10
534/534 [==============================] - 0s - loss: 30.6980
Epoch 3/10
534/534 [==============================] - 0s - loss: 27.1645
Epoch 4/10
534/534 [==============================] - 0s - loss: 25.1634
Epoch 5/10
534/534 [==============================] - 0s - loss: 24.0788
Epoch 6/10
534/534 [==============================] - 0s - loss: 23.2859
Epoch 7/10
534/534 [==============================] - 0s - loss: 22.5868
Epoch 8/10
534/534 [==============================] - 0s - loss: 22.1744
Epoch 9/10
534/534 [==============================] - 0s - loss: 21.7692
Epoch 10/10
534/534 [==============================] - 0s - loss: 21.5639
"""

In [None]:
# Last steps in classification models

# Import necessary modules
import keras
from keras.layers import Dense
from keras.models import Sequential
from keras.utils import to_categorical

# Convert the target to categorical: target
target = to_categorical(df.survived)

# Set up the model
model = Sequential()

# Add the first layer
model.add(Dense(32, activation='relu', input_shape=(n_cols,)))

# Add the output layer
model.add(Dense(2, activation='softmax'))

# Compile the model
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

# Fit the model
model.fit(predictors, target)

"""
This simple model is generating an accuracy of 68%!
"""
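
`to_categorical` turns integer class labels into one-hot rows, which is the target format `categorical_crossentropy` expects. A NumPy sketch of the same idea follows; `one_hot` is a hypothetical helper and the labels are made up.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """Return one row per label with a 1 in the column of that label's class."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        num_classes = labels.max() + 1
    return np.eye(num_classes)[labels]

survived = [0, 1, 1, 0]          # hypothetical labels
target = one_hot(survived)
print(target)
```

Column 1 of the result marks survival, which is why the output layer above has two units.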

In [None]:
# Making predictions

# Specify, compile, and fit the model
model = Sequential()
model.add(Dense(32, activation='relu', input_shape = (n_cols,)))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='sgd', 
              loss='categorical_crossentropy', 
              metrics=['accuracy'])
model.fit(predictors, target)

# Calculate predictions: predictions
predictions = model.predict(pred_data)

# Calculate predicted probability of survival: predicted_prob_true
predicted_prob_true = predictions[:,1]

# print predicted_prob_true
print(predicted_prob_true)

"""
[0.27609792 0.43608963 0.78664505 0.48941714 0.2427727  0.21639577
     0.09800863 0.36834946 0.22350675 0.5689344  0.26583907 0.33334616
     0.22368406 0.33599177 0.22191    0.1795439  0.3084657  0.4662342
     0.13396168 0.40895924 0.65803    0.26842186 0.10306429 0.36911148
     0.37949595 0.22505754 0.57825166 0.50606996 0.23680483 0.5697499
     0.48121956 0.46790472 0.23128356 0.29944226 0.37085712 0.6701378
     0.33963767 0.22245668 0.5848665  0.46539986 0.33743626 0.41610914
     0.49419317 0.19858138 0.3917999  0.14388466 0.4169323  0.20333302
     0.4759615  0.7201449  0.32034287 0.03958427 0.4666656  0.5926019
     0.2937687  0.42009    0.8911674  0.25865832 0.46557653 0.23128356
     0.17487614 0.36022252 0.28130895 0.4185874  0.36777553 0.20134513
     0.35476878 0.5614948  0.24505188 0.45120114 0.26598433 0.47363704
     0.19699349 0.12412009 0.47023526 0.4334928  0.37694576 0.34953842
     0.22025892 0.5982637  0.4834284  0.20240319 0.37316647 0.29936978
     0.2621868  0.5070689  0.34357235 0.5416718  0.42867297 0.48576564
     0.2194321 ]
"""
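
Each row returned by a softmax output layer sums to 1, so column 1 alone carries the survival probability. As a small sketch, the first three probabilities printed above can be turned into class labels with `argmax`; the first column is reconstructed here as `1 - p`, which is an assumption made for illustration.

```python
import numpy as np

# First three predicted survival probabilities from the output above
prob_true = np.array([0.27609792, 0.43608963, 0.78664505])

# Rebuild a two-column prediction array: [P(did not survive), P(survived)]
predictions = np.column_stack([1 - prob_true, prob_true])

# The predicted class is the column with the larger probability
predicted_class = predictions.argmax(axis=1)
print(predicted_class)
```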

## Fine-tuning Keras models
Start by learning how to validate your models, then understand the concept of model capacity, and finally, experiment with wider and deeper networks.



In [None]:
"""
Diagnosing optimization problems
Which of the following could prevent a model from showing an improved loss in its first few epochs?

--> All of the following:
Learning rate too low.
Learning rate too high.
Poor choice of activation function.
"""

In [None]:
# Changing optimization parameters

# Import the SGD optimizer
from keras.optimizers import SGD

# Create list of learning rates: lr_to_test
lr_to_test = [0.000001, 0.01, 1]

# Loop over learning rates
for lr in lr_to_test:
    print('\n\nTesting model with learning rate: %f\n'%lr )
    
    # Build new model to test, unaffected by previous models
    model = get_new_model()
    
    # Create SGD optimizer with specified learning rate: my_optimizer
    # (recent Keras versions name this argument learning_rate; older ones used lr)
    my_optimizer = SGD(learning_rate=lr)
    
    # Compile the model
    model.compile(optimizer=my_optimizer, loss='categorical_crossentropy')
    
    # Fit the model
    model.fit(predictors, target)
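
The effect of these three learning rates can be previewed on a tiny problem without Keras. The sketch below runs plain gradient descent on a single linear prediction with squared-error loss; the data, starting weights, and the helper name are assumptions made for illustration, not the model trained above.

```python
import numpy as np

input_data = np.array([1.0, 2.0, 3.0])   # assumed data point
target = 0.0

def final_squared_error(lr, n_updates=20):
    """Run n_updates gradient steps and return the resulting squared error."""
    weights = np.array([0.0, 2.0, 1.0])  # assumed starting weights
    for _ in range(n_updates):
        error = (weights * input_data).sum() - target
        weights = weights - lr * 2 * input_data * error
    return ((weights * input_data).sum() - target) ** 2

# The squared error before any update is 49
for lr in [0.000001, 0.01, 1]:
    print(lr, final_squared_error(lr))
```

On this toy problem the smallest rate barely changes the loss, the middle one drives it close to zero, and the largest makes it blow up.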

In [None]:
# Evaluating model accuracy on validation dataset

# Save the number of columns in predictors: n_cols
n_cols = predictors.shape[1]
input_shape = (n_cols,)

# Specify the model
model = Sequential()
model.add(Dense(100, activation='relu', input_shape = input_shape))
model.add(Dense(100, activation='relu'))
model.add(Dense(2, activation='softmax'))

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Fit the model
hist = model.fit(predictors, target, validation_split=0.3)

"""
Train on 623 samples, validate on 268 samples
Epoch 1/10
623/623 [==============================] - 0s - loss: 1.3118 - acc: 0.6003 - val_loss: 0.6823 - val_acc: 0.7201
Epoch 2/10
623/623 [==============================] - 0s - loss: 0.8810 - acc: 0.5714 - val_loss: 1.1022 - val_acc: 0.6418
Epoch 3/10
623/623 [==============================] - 0s - loss: 0.7976 - acc: 0.6228 - val_loss: 0.8591 - val_acc: 0.6306
Epoch 4/10
623/623 [==============================] - 0s - loss: 0.7540 - acc: 0.6517 - val_loss: 0.6891 - val_acc: 0.7090
Epoch 5/10
623/623 [==============================] - 0s - loss: 0.6770 - acc: 0.6437 - val_loss: 0.5911 - val_acc: 0.7201
Epoch 6/10
623/623 [==============================] - 0s - loss: 0.6597 - acc: 0.6501 - val_loss: 0.5279 - val_acc: 0.7463
Epoch 7/10
623/623 [==============================] - 0s - loss: 0.6008 - acc: 0.6806 - val_loss: 0.5111 - val_acc: 0.7201
Epoch 8/10
623/623 [==============================] - 0s - loss: 0.5913 - acc: 0.6902 - val_loss: 0.5253 - val_acc: 0.7649
Epoch 9/10
623/623 [==============================] - 0s - loss: 0.6743 - acc: 0.6597 - val_loss: 0.5660 - val_acc: 0.7052
Epoch 10/10
623/623 [==============================] - 0s - loss: 0.6207 - acc: 0.6886 - val_loss: 0.5381 - val_acc: 0.7388
"""
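
The "623 / 268" figures in the log follow from holding out 30% of the 891 rows. Keras documents that the validation rows are taken from the end of the data before shuffling; the sketch below reproduces only the counts, and the helper name and rounding rule are assumptions.

```python
def split_counts(n_samples, validation_split):
    """Return (n_train, n_validation) when the last fraction of rows is held out."""
    n_train = int(n_samples * (1 - validation_split))
    return n_train, n_samples - n_train

n_train, n_val = split_counts(891, 0.3)
print(n_train, n_val)
```

Because the held-out rows come from the end of the arrays, data sorted by class or by time should be shuffled before calling `fit` with `validation_split`.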

In [None]:
# Early stopping: Optimizing the optimization

# Import EarlyStopping
from keras.callbacks import EarlyStopping

# Save the number of columns in predictors: n_cols
n_cols = predictors.shape[1]
input_shape = (n_cols,)

# Specify the model
model = Sequential()
model.add(Dense(100, activation='relu', input_shape = input_shape))
model.add(Dense(100, activation='relu'))
model.add(Dense(2, activation='softmax'))

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Define early_stopping_monitor
early_stopping_monitor = EarlyStopping(patience=2)

# Fit the model
model.fit(predictors, target, epochs=30, validation_split=0.3, callbacks=[early_stopping_monitor])

"""
Train on 623 samples, validate on 268 samples
Epoch 1/30
623/623 [==============================] - 0s - loss: 1.6580 - acc: 0.5634 - val_loss: 1.0181 - val_acc: 0.6791
Epoch 2/30
623/623 [==============================] - 0s - loss: 0.8405 - acc: 0.6003 - val_loss: 0.5838 - val_acc: 0.7388
Epoch 3/30
623/623 [==============================] - 0s - loss: 0.7984 - acc: 0.6260 - val_loss: 0.6721 - val_acc: 0.7276
Epoch 4/30
623/623 [==============================] - 0s - loss: 0.7350 - acc: 0.6372 - val_loss: 0.5389 - val_acc: 0.7164
Epoch 5/30
623/623 [==============================] - 0s - loss: 0.6439 - acc: 0.6581 - val_loss: 0.5773 - val_acc: 0.6940
Epoch 6/30
623/623 [==============================] - 0s - loss: 0.5946 - acc: 0.6966 - val_loss: 0.5682 - val_acc: 0.6828
Epoch 7/30
623/623 [==============================] - 0s - loss: 0.6401 - acc: 0.7063 - val_loss: 0.6818 - val_acc: 0.6493

Because optimization will automatically stop when it is no longer helpful, it is okay to specify the maximum number 
of epochs as 30 rather than the 10 epochs used so far (the default in the Keras version that produced these logs; 
recent Keras releases default to a single epoch). 
Here, the optimization stopped after 7 epochs.
"""
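
The idea behind `patience` can be sketched in plain Python: track the best validation loss seen so far and stop once it has failed to improve for too many epochs in a row. This is a simplified illustration, not Keras's implementation; the function name is hypothetical, and the counting convention was chosen so that it reproduces the 7-epoch log above (library versions may count differently).

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop."""
    best = float("inf")
    wait = 0   # epochs without improvement since the best loss
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            if wait >= patience:
                return epoch
            wait += 1
    return len(val_losses)   # never triggered: all epochs were run

# val_loss values from the log above
val_losses = [1.0181, 0.5838, 0.6721, 0.5389, 0.5773, 0.5682, 0.6818]
print(early_stopping_epoch(val_losses, patience=2))
```

The best value (0.5389) occurs at epoch 4, and the following epochs never beat it, which is what triggers the stop.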

In [None]:
# Experimenting with wider networks

# Define early_stopping_monitor
early_stopping_monitor = EarlyStopping(patience=2)

# Create the new model: model_2
model_2 = Sequential()

# Add the first and second layers
model_2.add(Dense(100, activation='relu', input_shape=input_shape))
model_2.add(Dense(100, activation='relu'))

# Add the output layer
model_2.add(Dense(2, activation='softmax'))

# Compile model_2
model_2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Fit model_1
model_1_training = model_1.fit(predictors, target, epochs=15, validation_split=0.2, callbacks=[early_stopping_monitor], verbose=False)

# Fit model_2
model_2_training = model_2.fit(predictors, target, epochs=15, validation_split=0.2, callbacks=[early_stopping_monitor], verbose=False)

# Create the plot
plt.plot(model_1_training.history['val_loss'], 'r', model_2_training.history['val_loss'], 'b')
plt.xlabel('Epochs')
plt.ylabel('Validation loss')
plt.show()

"""The blue model is the one you made, the red is the original model. Your model had a lower loss value, so it is the better model. Nice job!"""

In [None]:
# Adding layers to a network

# The input shape to use in the first hidden layer
input_shape = (n_cols,)

# Create the new model: model_2
model_2 = Sequential()

# Add the first, second, and third hidden layers
model_2.add(Dense(50, activation='relu', input_shape=input_shape))
model_2.add(Dense(50, activation='relu'))
model_2.add(Dense(50, activation='relu'))

# Add the output layer
model_2.add(Dense(2, activation='softmax'))

# Compile model_2
model_2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Fit model 1
model_1_training = model_1.fit(predictors, target, epochs=20, validation_split=0.4, callbacks=[early_stopping_monitor], verbose=False)

# Fit model 2
model_2_training = model_2.fit(predictors, target, epochs=20, validation_split=0.4, callbacks=[early_stopping_monitor], verbose=False)

# Create the plot
plt.plot(model_1_training.history['val_loss'], 'r', model_2_training.history['val_loss'], 'b')
plt.xlabel('Epochs')
plt.ylabel('Validation loss')
plt.show()

"""The blue model is the one you made and the red is the original model. The model with the lower loss value is the better model."""

In [None]:
"""
Experimenting with model structures
You've just run an experiment where you compared two networks that were identical except that the 2nd network 
had an extra hidden layer. You see that this 2nd network (the deeper network) had better performance. 
Given that, which of the following would be a good experiment to run next for even better performance?

--> Use more units in each hidden layer.
"""

In [None]:
# Building your own digit recognition model

# Create the model: model
model = Sequential()

# Add the first hidden layer
model.add(Dense(50, activation='relu', input_shape=(784,)))

# Add the second hidden layer
model.add(Dense(50, activation='relu'))

# Add the output layer
model.add(Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Fit the model
model.fit(X, y, validation_split=0.3)

"""
Train on 1750 samples, validate on 750 samples
    Epoch 1/10
    
  32/1750 [..............................] - ETA: 3s - loss: 2.1979 - acc: 0.2188
 448/1750 [======>.......................] - ETA: 0s - loss: 2.1749 - acc: 0.2299
 832/1750 [=============>................] - ETA: 0s - loss: 2.0266 - acc: 0.3113
1312/1750 [=====================>........] - ETA: 0s - loss: 1.8423 - acc: 0.3849
1750/1750 [==============================] - 0s - loss: 1.6738 - acc: 0.4651 - val_loss: 1.0040 - val_acc: 0.7680
    Epoch 2/10
    
  32/1750 [..............................] - ETA: 0s - loss: 0.9442 - acc: 0.7500
 544/1750 [========>.....................] - ETA: 0s - loss: 0.7563 - acc: 0.8419
1088/1750 [=================>............] - ETA: 0s - loss: 0.7282 - acc: 0.8281
1632/1750 [==========================>...] - ETA: 0s - loss: 0.6824 - acc: 0.8284
1750/1750 [==============================] - 0s - loss: 0.6777 - acc: 0.8269 - val_loss: 0.5345 - val_acc: 0.8573
    Epoch 3/10
    
  32/1750 [..............................] - ETA: 0s - loss: 0.3754 - acc: 0.9688
 576/1750 [========>.....................] - ETA: 0s - loss: 0.4164 - acc: 0.8993
1152/1750 [==================>...........] - ETA: 0s - loss: 0.3999 - acc: 0.9062
1696/1750 [============================>.] - ETA: 0s - loss: 0.4158 - acc: 0.8909
1750/1750 [==============================] - 0s - loss: 0.4170 - acc: 0.8886 - val_loss: 0.4493 - val_acc: 0.8667
    Epoch 4/10
    
  32/1750 [..............................] - ETA: 0s - loss: 0.1787 - acc: 0.9062
 576/1750 [========>.....................] - ETA: 0s - loss: 0.2951 - acc: 0.9288
1120/1750 [==================>...........] - ETA: 0s - loss: 0.3239 - acc: 0.9107
1664/1750 [===========================>..] - ETA: 0s - loss: 0.3254 - acc: 0.9087
1750/1750 [==============================] - 0s - loss: 0.3233 - acc: 0.9086 - val_loss: 0.3930 - val_acc: 0.8813
    Epoch 5/10
    
1750/1750 [==============================] - 0s - loss: 0.2594 - acc: 0.9286 - val_loss: 0.3715 - val_acc: 0.8893
    Epoch 6/10
    
1750/1750 [==============================] - 0s - loss: 0.2074 - acc: 0.9446 - val_loss: 0.3482 - val_acc: 0.8933
    Epoch 7/10
    
1750/1750 [==============================] - 0s - loss: 0.1711 - acc: 0.9606 - val_loss: 0.3334 - val_acc: 0.8947
    Epoch 8/10
    
1750/1750 [==============================] - 0s - loss: 0.1401 - acc: 0.9697 - val_loss: 0.3232 - val_acc: 0.9067
    Epoch 9/10
    
1750/1750 [==============================] - 0s - loss: 0.1140 - acc: 0.9789 - val_loss: 0.3377 - val_acc: 0.8907
    Epoch 10/10
    
1750/1750 [==============================] - 0s - loss: 0.0963 - acc: 0.9829 - val_loss: 0.3198 - val_acc: 0.9040

Last epoch: acc: 0.9829, val_acc: 0.9040
"""