# Understanding `Model Optimization` (Learning Rate)
----
1. Why is optimization hard in deep learning?
    - Thousands of parameters with complex interdependencies must be optimized simultaneously
    - An update may not improve the model meaningfully
    - Updates may be too small (if the learning rate is too low) or too large (if the learning rate is too high)
- **Scenario:** Optimize the same model with a very low learning rate, a very high learning rate, and a "just right" learning rate.
- After running the exercise, compare the results, remembering that **a lower loss is better**.
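
Before turning to the Titanic data, the trade-off can be seen on a toy problem. The sketch below is an illustration, not part of the original exercise. It runs plain gradient descent on the one-parameter loss `L(w) = w**2`, whose gradient is `2*w`, starting from an assumed `w = 1.0` and using the same three learning rates as the exercise. The function name `descend` is hypothetical.

```python
def descend(lr, steps=100, w=1.0):
    """Plain gradient descent on the toy loss L(w) = w**2 (gradient 2*w).

    Returns the list of losses recorded after each update.
    """
    losses = []
    for _ in range(steps):
        w = w - lr * 2 * w  # step against the gradient
        losses.append(w ** 2)
    return losses


for lr in [0.000001, 0.01, 1]:
    final_loss = descend(lr)[-1]
    print(f"lr={lr}: loss after 100 steps = {final_loss:.4f}")
```

With `lr=0.000001` the loss barely moves from 1.0, and with `lr=0.01` it falls steadily. With `lr=1` every step jumps from `w` to `-w`, so the loss never drops below 1.0 (on this toy loss, any rate above 1 would make it grow).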

## 1. Exercise 1: Optimize using Stochastic Gradient Descent (`SGD`)

### 1.1 Step 1 of 3: Get Data

```python
# Get data
import pandas as pd
from keras.utils import to_categorical

# Training set: the features are every column except the target 'survived'
df = pd.read_csv("titanic_all_numeric_train.csv")
X_train = df.drop(['survived'], axis=1).values
y_train = to_categorical(df.survived)  # one-hot encode the 0/1 target

# Test set, prepared the same way
df = pd.read_csv("titanic_all_numeric_test.csv")
X_test = df.drop(['survived'], axis=1).values
y_test = to_categorical(df.survived)
```
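
`to_categorical` converts the integer `survived` labels (0 or 1) into one-hot rows, the target format that `categorical_crossentropy` expects alongside a two-unit softmax output. As a hedged illustration, assuming integer labels in `0..k-1`, a NumPy-only equivalent looks like this (`one_hot` is a hypothetical helper, not part of Keras):

```python
import numpy as np


def one_hot(labels, num_classes=None):
    """Hypothetical NumPy equivalent of keras.utils.to_categorical for integer labels."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        num_classes = labels.max() + 1
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes)[labels]


print(one_hot([0, 1, 1, 0]))
```

Each label picks out the matching row of an identity matrix, so class 0 becomes `[1, 0]` and class 1 becomes `[0, 1]`.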

```python
# Number of input features (columns) per passenger
n_features = X_train.shape[1]
n_features
```

```
10
```

### 1.2 Step 2 of 3: Create the model in a function so each run starts from scratch

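```python
from keras.models import Sequential
from keras.layers import Dense

def get_new_model(input_dim=n_features):
    """Build a fresh, untrained network so each learning rate starts from scratch."""
    model = Sequential()
    model.add(Dense(100, activation='relu', input_dim=input_dim))  # input layer + 1st hidden layer
    model.add(Dense(100, activation='relu'))  # 2nd hidden layer
    model.add(Dense(2, activation='softmax'))  # output layer: one probability per class
    return model
```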
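
The opening section notes that optimization involves thousands of parameters. For this architecture the count can be worked out by hand: a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. A small sketch, assuming `n_features = 10` as printed in Step 1 (the helper `dense_params` is illustrative):

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per unit (n_out) for a fully connected layer."""
    return n_in * n_out + n_out


n_features = 10  # value printed in Step 1
layers = [(n_features, 100), (100, 100), (100, 2)]  # (inputs, units) for each Dense layer
counts = [dense_params(n_in, n_out) for n_in, n_out in layers]
print(counts, sum(counts))  # [1100, 10100, 202] 11402
```

Calling `get_new_model().count_params()` in Keras should report the same total of 11,402 trainable values, all of which SGD updates on every step.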

### 1.3 Step 3 of 3: Change the optimization parameters

```python
# Import the model and layer classes and the SGD optimizer
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
```

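
For reference, with its default settings (`momentum=0.0`), Keras's `SGD` applies the update below to every weight $w$ after each mini-batch. Here $\eta$ is the learning rate passed to the optimizer and $L$ is the loss on that mini-batch:

$$
w_{t+1} = w_t - \eta \, \nabla_w L(w_t)
$$

A tiny $\eta$ makes each step negligible, while a large $\eta$ can step past the minimum. The three values in this exercise probe exactly that trade-off.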


```python
# Create list of learning rates: lr_to_test (very low, moderate, very high)
lr_to_test = [0.000001, 0.01, 1]
```
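
The three rates sit at `1e-6`, `1e-2` and `1e0`, so they only sample the range coarsely. For a finer search over the same range, one common option is a grid that is evenly spaced on a log scale. Here is a sketch using NumPy's `np.logspace` (the grid size of 7 is an arbitrary choice, not from the exercise):

```python
import numpy as np

# 7 learning rates from 1e-6 to 1e0, one per power of ten, as plain Python floats
lr_grid = np.logspace(-6, 0, num=7).tolist()
print(lr_grid)
```

Looping over `lr_grid` instead of `lr_to_test` would train one fresh model per power of ten.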

```python
# Loop over learning rates
for lr in lr_to_test:
    print(f'\n\n\nTesting model with learning rate: {lr}\n\n')
    # Build a new model to test, unaffected by previous models
    model = get_new_model()
    # Create SGD optimizer with the specified learning rate: my_optimizer
    # (Keras >= 2.3 names this argument `learning_rate`; older releases use `lr`)
    my_optimizer = SGD(learning_rate=lr)
    # Compile the model
    model.compile(optimizer=my_optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    # Fit the model
    model.fit(X_train, y_train, epochs=10, verbose=1)
```




```
Testing model with learning rate: 1e-06

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

Testing model with learning rate: 0.01

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

Testing model with learning rate: 1

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
```
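
Each model is compiled with `loss='categorical_crossentropy'`. For one-hot targets this is the average, over samples, of the negative log of the probability the softmax assigns to the true class, so lower is better and a perfect model scores 0. A NumPy sketch for reading the training logs (`cross_entropy` is an illustrative helper, not Keras's own code; the clipping constant `eps` is an assumption to avoid `log(0)`):

```python
import numpy as np


def cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross-entropy for one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))


y_true = np.array([[0, 1], [1, 0]])             # one-hot labels: class 1, then class 0
confident = np.array([[0.2, 0.8], [0.6, 0.4]])  # mostly right
uniform = np.full((2, 2), 0.5)                  # predicts 50/50 for every sample

print(cross_entropy(y_true, confident))  # (-log 0.8 - log 0.6) / 2, about 0.367
print(cross_entropy(y_true, uniform))    # log 2, about 0.693
```

This gives a rough yardstick: a two-class loss that stays near `log(2) ≈ 0.693` means the network is doing no better than predicting 50/50 for every passenger.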


1. Based on the results above, which learning rate works best?
    
    - <input type="radio" disabled> 0.000001
    - <input type="radio" disabled checked> 0.01
    - <input type="radio" disabled> 1
    - <input type="radio" disabled> None of the above



2. Which of the following could prevent a model from showing an improved loss in its *first few epochs*?

    - <input type="radio" disabled> Learning rate too low
    - <input type="radio" disabled> Learning rate too high
    - <input type="radio" disabled> Poor choice of activation function
    - <input type="radio" disabled checked> All of the above
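
On the third option: backpropagation multiplies gradients by each activation's derivative, so an activation whose derivative is near zero can stall learning regardless of the learning rate. A small sketch comparing the sigmoid derivative `s * (1 - s)`, which shrinks toward 0 for large `|x|`, with ReLU's derivative, which is 1 for positive inputs and 0 for negative ones (the function names are illustrative):

```python
import math


def sigmoid_grad(x):
    """Derivative of the logistic sigmoid: s * (1 - s)."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def relu_grad(x):
    """Derivative of ReLU (taking 0 at x = 0 by convention)."""
    return 1.0 if x > 0 else 0.0


for x in [-10.0, 0.0, 10.0]:
    print(f"x={x:6}: sigmoid'={sigmoid_grad(x):.2e}  relu'={relu_grad(x)}")
```

At `|x| = 10` a saturated sigmoid passes back only about `4.5e-5` of the incoming gradient, and a ReLU unit whose inputs are all negative passes back none, so the weights feeding it stop updating.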

