**Fine-tuning keras models**

Here, you'll learn how to optimize your deep learning models in keras. You'll learn how to validate your models, understand the concept of model capacity, and experiment with wider and deeper networks. Enjoy!

**Changing optimization parameters**

It's time to get your hands dirty with optimization. You'll now try optimizing a model at a very low learning rate, a very high learning rate, and a "just right" learning rate. You'll want to look at the results after running this exercise, remembering that a low value for the loss function is good.

For these exercises, we've pre-loaded the predictors and target values from your previous classification models (predicting who would survive on the Titanic). You'll want the optimization to start from scratch every time you change the learning rate, to give a fair comparison of how each learning rate did in your results. So we have created a function `get_new_model()` that creates an unoptimized model to optimize.

In [4]:
import pandas as pd
import numpy as np

In [5]:
df = pd.read_csv('titanic_all_numeric.csv')
df.describe()

| | survived | pclass | age | sibsp | parch | fare | male | embarked_from_cherbourg | embarked_from_queenstown | embarked_from_southampton |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 891.0 | 891.0 | 891.0 | 891.0 | 891.0 | 891.0 | 891.0 | 891.0 | 891.0 | 891.0 |
| mean | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 | 0.647587 | 0.188552 | 0.08642 | 0.722783 |
| std | 0.486592 | 0.836071 | 13.002015 | 1.102743 | 0.806057 | 49.693429 | 0.47799 | 0.391372 | 0.281141 | 0.447876 |
| min | 0.0 | 1.0 | 0.42 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 2.0 | 22.0 | 0.0 | 0.0 | 7.9104 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50% | 0.0 | 3.0 | 29.699118 | 0.0 | 0.0 | 14.4542 | 1.0 | 0.0 | 0.0 | 1.0 |
| 75% | 1.0 | 3.0 | 35.0 | 1.0 | 0.0 | 31.0 | 1.0 | 0.0 | 0.0 | 1.0 |
| max | 1.0 | 3.0 | 80.0 | 8.0 | 6.0 | 512.3292 | 1.0 | 1.0 | 1.0 | 1.0 |


In [6]:
predictors = df.iloc[:, 1:].values
predictors.shape

(891, 10)

In [7]:
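from keras.utils import to_categorical

# One-hot encode the labels to match the two-unit softmax output layer
target = to_categorical(df.survived)
target.shape

(891, 2)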
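
The models in this section end in a two-unit softmax layer trained with `categorical_crossentropy`, which expects one target column per class rather than a single 0/1 column. As a rough sketch of what that one-hot layout looks like, here is a plain-Python version. The helper name `one_hot` and the sample labels are made up for illustration; in practice `keras.utils.to_categorical` does this job.

```python
def one_hot(labels, num_classes=None):
    """Turn integer class labels into one-hot rows (hypothetical helper)."""
    if num_classes is None:
        # Assume classes are numbered 0..max(labels)
        num_classes = max(labels) + 1
    return [[1.0 if k == label else 0.0 for k in range(num_classes)]
            for label in labels]

# Made-up labels for illustration, not rows from the Titanic file
survived = [0, 1, 1, 0]
encoded = one_hot(survived)
print(encoded)
```

Each row has a single 1.0 in the column of its class, so a target with two classes ends up with two columns.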

In [8]:
n_cols = predictors.shape[1]
n_cols

10

In [17]:
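def get_new_model(input_shape=(n_cols,)):
    # Build a fresh, untrained network so each learning rate starts from scratch.
    # input_shape must be a tuple, e.g. (10,), not a bare integer.
    model = Sequential()
    model.add(Dense(100, activation='relu', input_shape=input_shape))
    model.add(Dense(100, activation='relu'))
    model.add(Dense(2, activation='softmax'))
    return model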

In [None]:
from importlib import reload  # reload() is not a builtin in Python 3
from os import environ

from keras import backend as K

# User-defined function to change the Keras backend
def set_keras_backend(backend):
    if K.backend() != backend:
        environ['KERAS_BACKEND'] = backend
        reload(K)
        assert K.backend() == backend

# Call the function with "tensorflow"
set_keras_backend("tensorflow")

In [None]:
# Import necessary modules
import keras
from keras.layers import Dense
from keras.models import Sequential
from keras.utils import to_categorical

In [None]:
# Import the SGD optimizer
from keras.optimizers import SGD

# Create list of learning rates: lr_to_test
lr_to_test = [.000001, 0.01, 1]

# Loop over learning rates
for lr in lr_to_test:
    print('\n\nTesting model with learning rate: %f\n'%lr )
    
    # Build new model to test, unaffected by previous models
    model = get_new_model()
    
    # Create SGD optimizer with specified learning rate: my_optimizer
    # (newer Keras releases name this argument learning_rate instead of lr)
    my_optimizer = SGD(lr=lr)
    
    # Compile the model
    model.compile(optimizer=my_optimizer, loss='categorical_crossentropy')
    
    # Fit the model
    model.fit(predictors, target)
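
To build intuition for why the three learning rates behave so differently, here is a toy sketch of plain gradient descent on a single parameter. It is not the Titanic model: the loss function `(w - target)**2`, the starting point, and the step count are all assumptions chosen to make the effect easy to see.

```python
def final_loss(learning_rate, steps=100, start=0.0, target=3.0):
    """Run plain gradient descent on loss(w) = (w - target)**2."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * (w - target)   # derivative of the loss at w
        w -= learning_rate * gradient   # one gradient descent update
    return (w - target) ** 2

# Same three learning rates as in lr_to_test
losses = {lr: final_loss(lr) for lr in [0.000001, 0.01, 1]}
for lr, loss in losses.items():
    print('learning rate %f -> final loss %.4f' % (lr, loss))
```

On this toy problem the tiny rate barely moves away from the starting loss, the large rate overshoots the minimum and bounces back and forth without improving, and the middle rate makes steady progress. Which rate counts as "just right" depends on the problem, so the same values may rank differently on a real network.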
    

**Evaluating model accuracy on validation dataset**

Now it's your turn to monitor model accuracy with a validation data set. A model definition has been provided as `model`. Your job is to add the code to compile it and then fit it. You'll check the validation score in each epoch.

In [None]:
# Save the number of columns in predictors: n_cols
n_cols = predictors.shape[1]
input_shape = (n_cols,)

# Specify the model
model = Sequential()
model.add(Dense(100, activation='relu', input_shape = input_shape))
model.add(Dense(100, activation='relu'))
model.add(Dense(2, activation='softmax'))

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Fit the model
hist = model.fit(predictors, target, validation_split=0.3)
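
Keras takes the validation rows from the end of the arrays passed to `.fit()`, before any shuffling, so the order of the data matters. The following sketch mimics that split with plain Python lists; the helper name `split_sizes` is hypothetical and the rounding rule is an assumption about how the cut point is computed.

```python
def split_sizes(n_samples, validation_split):
    """Return (n_train, n_val), holding out the final fraction of rows."""
    split_at = int(n_samples * (1.0 - validation_split))
    return split_at, n_samples - split_at

rows = list(range(891))                  # stand-in for the 891 passengers
n_train, n_val = split_sizes(len(rows), 0.3)
train_rows = rows[:n_train]              # first rows are used for training
val_rows = rows[n_train:]                # last rows are held out for validation
print(n_train, n_val)
```

If the file happens to be sorted by some column, the held-out tail may not be representative, so shuffling the rows before fitting is worth considering.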


**Early stopping: Optimizing the optimization**

Now that you know how to monitor your model performance throughout optimization, you can use early stopping to stop optimization when it isn't helping any more. Since the optimization stops automatically when it isn't helping, you can also set a high value for `epochs` in your call to `.fit()`, as Dan showed in the video.

The model you'll optimize has been specified as `model`. As before, the data is pre-loaded as `predictors` and `target`.

In [None]:
# Import EarlyStopping
from keras.callbacks import EarlyStopping

# Save the number of columns in predictors: n_cols
n_cols = predictors.shape[1]
input_shape = (n_cols,)

# Specify the model
model = Sequential()
model.add(Dense(100, activation='relu', input_shape = input_shape))
model.add(Dense(100, activation='relu'))
model.add(Dense(2, activation='softmax'))

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Define early_stopping_monitor
early_stopping_monitor = EarlyStopping(patience=2)

# Fit the model
model.fit(predictors, target, epochs=30, validation_split=0.3, callbacks=[early_stopping_monitor])


Train on 623 samples, validate on 268 samples
Epoch 1/30

 32/623 [>.............................] - ETA: 1s - loss: 5.6563 - acc: 0.4688
352/623 [===============>..............] - ETA: 0s - loss: 1.9121 - acc: 0.5000
623/623 [==============================] - 0s - loss: 1.6422 - acc: 0.5698 - val_loss: 0.9923 - val_acc: 0.6828
Epoch 2/30

 32/623 [>.............................] - ETA: 0s - loss: 1.7664 - acc: 0.4375
448/623 [====================>.........] - ETA: 0s - loss: 0.9028 - acc: 0.5692
623/623 [==============================] - 0s - loss: 0.8335 - acc: 0.6019 - val_loss: 0.5871 - val_acc: 0.7351
Epoch 3/30

 32/623 [>.............................] - ETA: 0s - loss: 0.9432 - acc: 0.6250
448/623 [====================>.........] - ETA: 0s - loss: 0.7740 - acc: 0.5938
623/623 [==============================] - 0s - loss: 0.7909 - acc: 0.6308 - val_loss: 0.6608 - val_acc: 0.7313
Epoch 4/30

 32/623 [>.............................] - ETA: 0s - loss: 1.3154 - acc: 0.5625
384/623 [=================>............] - ETA: 0s - loss: 0.7669 - acc: 0.5911
623/623 [==============================] - 0s - loss: 0.7288 - acc: 0.6308 - val_loss: 0.5404 - val_acc: 0.7276
Epoch 5/30

 32/623 [>.............................] - ETA: 0s - loss: 0.5698 - acc: 0.7188
352/623 [===============>..............] - ETA: 0s - loss: 0.6634 - acc: 0.6648
623/623 [==============================] - 0s - loss: 0.6558 - acc: 0.6629 - val_loss: 0.6004 - val_acc: 0.6866
Epoch 6/30

 32/623 [>.............................] - ETA: 0s - loss: 0.4422 - acc: 0.8750
512/623 [=======================>......] - ETA: 0s - loss: 0.5886 - acc: 0.7109
623/623 [==============================] - 0s - loss: 0.6034 - acc: 0.6998 - val_loss: 0.5831 - val_acc: 0.6866
Epoch 7/30

 32/623 [>.............................] - ETA: 0s - loss: 0.6229 - acc: 0.6875
384/623 [=================>............] - ETA: 0s - loss: 0.5948 - acc: 0.7083
623/623 [==============================] - 0s - loss: 0.6509 - acc: 0.7095 - val_loss: 0.6876 - val_acc: 0.6455
Out[2]: <keras.callbacks.History at 0x7f55b43a0940>
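
The idea behind `patience` can be sketched in a few lines of plain Python: keep track of the best validation loss so far and stop once it has failed to improve for `patience` epochs in a row. The function name and the loss values below are made up for illustration, and this is a simplified rule, so treat it as an illustration of the idea rather than a reproduction of Keras internals.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop."""
    best = float('inf')
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)  # never triggered: ran all epochs

# Hypothetical validation losses, one per epoch
val_losses = [0.90, 0.60, 0.55, 0.58, 0.57, 0.50]
stopped_at = early_stopping_epoch(val_losses, patience=2)
print(stopped_at)
```

Note that the last value in the list is the lowest, but training never reaches it: a small `patience` can stop before a later improvement, which is the trade-off to keep in mind when choosing it.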

**Experimenting with wider networks**

Now you know everything you need to begin experimenting with different models!

A model called `model_1` has been pre-loaded. You can see a summary of this model printed in the IPython Shell. This is a relatively small network, with only 10 units in each hidden layer.

In this exercise you'll create a new model called `model_2`, which is similar to `model_1` except that it has 100 units in each hidden layer.

After you create `model_2`, both models will be fitted, and a graph showing both models' loss scores at each epoch will be shown. We added the argument `verbose=False` to the fitting commands to print out fewer updates, since you will look at the results graphically instead of as text.

Because you are fitting two models, it will take a moment to see the outputs after you hit run, so be patient.

In [None]:
# Define early_stopping_monitor
early_stopping_monitor = EarlyStopping(patience=2)

# Create the new model: model_2
model_2 = Sequential()

# Add the first and second layers
model_2.add(Dense(100, activation='relu', input_shape=input_shape))
model_2.add(Dense(100, activation='relu'))

# Add the output layer
model_2.add(Dense(2,activation='softmax'))

# Compile model_2
model_2.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])

# Fit model_1
model_1_training = model_1.fit(predictors, target, epochs=15, validation_split=0.2, callbacks=[early_stopping_monitor], verbose=False)

# Fit model_2
model_2_training = model_2.fit(predictors, target, epochs=15, validation_split=0.2, callbacks=[early_stopping_monitor], verbose=False)

# Create the plot (red: model_1, blue: model_2)
import matplotlib.pyplot as plt
plt.plot(model_1_training.history['val_loss'], 'r', model_2_training.history['val_loss'], 'b')
plt.xlabel('Epochs')
plt.ylabel('Validation loss')
plt.show()
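
One way to quantify how much capacity the wider network adds is to count trainable parameters. Each `Dense` layer has one weight per input-unit pair plus one bias per unit. The sketch below assumes 10 input columns and two hidden layers in both models (the text only states the units per hidden layer), and `dense_param_count` is a hypothetical helper, not a Keras function.

```python
def dense_param_count(n_inputs, layer_units):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    fan_in = n_inputs
    for units in layer_units:
        total += fan_in * units + units  # weight matrix plus bias vector
        fan_in = units                   # this layer feeds the next one
    return total

# Assumed architectures: 10 inputs, two hidden layers, 2 output units
narrow = dense_param_count(10, [10, 10, 2])
wide = dense_param_count(10, [100, 100, 2])
print(narrow, wide)
```

Under these assumptions, widening each hidden layer tenfold multiplies the parameter count by far more than ten, mostly because of the hidden-to-hidden weight matrix.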

**Adding layers to a network**

You've seen how to experiment with wider networks. In this exercise, you'll try a deeper network (more hidden layers).

Once again, you have a baseline model called `model_1` as a starting point. It has 1 hidden layer, with `50` units. You can see a summary of that model's structure printed out. You will create a similar network with 3 hidden layers (still keeping 50 units in each layer).

This will again take a moment to fit both models, so you'll need to wait a few seconds to see the results after you run your code.

In [None]:
# The input shape to use in the first hidden layer
input_shape = (n_cols,)

# Create the new model: model_2
model_2 = Sequential()

# Add the first, second, and third hidden layers
model_2.add(Dense(50,activation='relu',input_shape=input_shape))
model_2.add(Dense(50,activation='relu'))
model_2.add(Dense(50,activation='relu'))

# Add the output layer
model_2.add(Dense(2,activation='softmax'))

# Compile model_2
model_2.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])

# Fit model 1
model_1_training = model_1.fit(predictors, target, epochs=20, validation_split=0.4, callbacks=[early_stopping_monitor], verbose=False)

# Fit model 2
model_2_training = model_2.fit(predictors, target, epochs=20, validation_split=0.4, callbacks=[early_stopping_monitor], verbose=False)

# Create the plot (red: model_1, blue: model_2)
plt.plot(model_1_training.history['val_loss'], 'r', model_2_training.history['val_loss'], 'b')
plt.xlabel('Epochs')
plt.ylabel('Validation loss')
plt.show()
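
Besides reading the plot, the two runs can be compared numerically. The `history` attribute returned by `.fit()` is a dictionary that maps each metric name to a list with one value per epoch. The sketch below uses two hand-written dictionaries with made-up numbers in place of real training results, and `best_epoch` is a hypothetical helper.

```python
def best_epoch(val_losses):
    """Return (1-based epoch, loss) for the lowest validation loss."""
    idx = min(range(len(val_losses)), key=val_losses.__getitem__)
    return idx + 1, val_losses[idx]

# Hypothetical histories standing in for model_1_training.history
# and model_2_training.history; the values are invented.
history_1 = {'val_loss': [0.70, 0.62, 0.60, 0.61, 0.63]}
history_2 = {'val_loss': [0.68, 0.59, 0.57, 0.58]}

best_1 = best_epoch(history_1['val_loss'])
best_2 = best_epoch(history_2['val_loss'])
print('model_1 best:', best_1)
print('model_2 best:', best_2)
```

Comparing the minimum rather than the final value matters here, because early stopping ends training a couple of epochs after the best one.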

**Building your own digit recognition model**

You've reached the final exercise of the course - you now know everything you need to build an accurate model to recognize handwritten digits!

We've already done the basic manipulation of the MNIST dataset shown in the video, so you have `X` and `y` loaded and ready to model with. `Sequential` and `Dense` from keras are also pre-imported.

To add an extra challenge, we've loaded only 2500 images, rather than the 60000 you will see in some published results. Deep learning models perform better with more data; however, they also take longer to train, especially as they become more complex.

If you have a computer with a CUDA compatible GPU, you can take advantage of it to improve computation time. If you don't have a GPU, no problem! You can set up a deep learning environment in the cloud that can run your models on a GPU. Here is a [blog post](https://www.datacamp.com/community/tutorials/deep-learning-jupyter-aws) by Dan that explains how to do this - check it out after completing this exercise! It is a great next step as you continue your deep learning journey.

In [None]:
# Create the model: model
model = Sequential()

# Add the first hidden layer
model.add(Dense(50,activation='relu',input_shape=(784,)))

# Add the second hidden layer
model.add(Dense(50,activation='relu'))

# Add the output layer
model.add(Dense(10,activation='softmax'))

# Compile the model
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])

# Fit the model
model.fit(X,y,validation_split=0.3)
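
The `input_shape=(784,)` in the first layer comes from the image size: each MNIST digit is a 28 by 28 grid of grayscale pixels, and a `Dense` layer needs that grid unrolled into one row of 784 values. The sketch below shows the unrolling on a dummy image in plain Python. The helper `flatten_image` is hypothetical, and dividing by 255 to scale pixels into the 0 to 1 range is a common convention that is assumed here, not something the exercise states was applied to `X`.

```python
def flatten_image(image):
    """Unroll a 2-D grid of 0-255 pixels into a flat list scaled to 0-1."""
    return [pixel / 255.0 for row in image for pixel in row]

# Dummy 28x28 image: all black except one white pixel in the top-left corner
image = [[0] * 28 for _ in range(28)]
image[0][0] = 255

vector = flatten_image(image)
print(len(vector), vector[0], vector[1])
```

The output layer's 10 units follow the same logic on the target side: one unit per digit class, 0 through 9.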