# Deep learning

## Forward propagation
<img src="https://s3.amazonaws.com/assets.datacamp.com/production/course_3524/datasets/1_4.png"/>

In [5]:
import numpy as np

input_data = np.array([2, 3])

weights = {
    'node_0' : np.array([1, 1]),
    'node_1' : np.array([-1, 1]),
    'output' : np.array([2, -1])
}
node_0_value = input_data.dot(weights['node_0'])
node_1_value = input_data.dot(weights['node_1'])

hidden_layer_values = np.array([node_0_value, node_1_value])

print(hidden_layer_values)

output = hidden_layer_values.dot(weights['output'])

print(output)

[5 1]
9
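
The two hidden-node dot products above can also be written as one matrix product. This is a minimal sketch, assuming the hidden weights are stacked as rows of a matrix; the names `hidden_weights` and `output_weights` are illustrative and not part of the original cell.

```python
import numpy as np

input_data = np.array([2, 3])

# Assumed layout: each row holds the weights feeding one hidden node
hidden_weights = np.array([[1, 1],     # node_0
                           [-1, 1]])   # node_1
output_weights = np.array([2, -1])

# One matrix-vector product computes both hidden node values at once
hidden_layer_values = hidden_weights.dot(input_data)
output = hidden_layer_values.dot(output_weights)

print(hidden_layer_values)
print(output)
```

This should reproduce the hidden values `[5 1]` and the output `9` shown above.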


##  Activation functions
Activation functions allow the model to capture **non-linearities**. Common choices:
- Tanh
- ReLU 
  <img src="https://www.safaribooksonline.com/library/view/python-natural-language/9781787121423/assets/02c4f3a4-8c9b-405a-88bd-47b79e3981dc.png" />

In [7]:
node_0_input = input_data.dot(weights['node_0'])
node_0_output = np.tanh(node_0_input)

node_1_input = input_data.dot(weights['node_1'])
node_1_output = np.tanh(node_1_input)

hidden_layer_outputs = np.array([node_0_output, node_1_output])

print(hidden_layer_outputs)

output = hidden_layer_outputs.dot(weights['output'])

print(output)

[ 0.9999092   0.76159416]
1.23822425257
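
To see how the two activation functions listed above differ, the sketch below applies both to a few sample values. The sample inputs are made up for illustration: tanh squashes every value into the open interval (-1, 1), while ReLU sets negative values to 0 and passes positive values through unchanged.

```python
import numpy as np

def relu(x):
    # ReLU: negative inputs become 0, positive inputs are unchanged
    return np.maximum(x, 0)

# Illustrative inputs, not taken from the network above
x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

tanh_values = np.tanh(x)
relu_values = relu(x)

print(tanh_values)
print(relu_values)
```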


##  Deeper networks
- Deep networks internally build representations of patterns in the data
- Partially replace the need for feature engineering 
- Subsequent layers build increasingly sophisticated representations of raw data

In [55]:
weights = {
 'node_0_0': np.array([2, 4]),
 'node_0_1': np.array([ 4, -5]),
 'node_1_0': np.array([-1,  2]),
 'node_1_1': np.array([1, 2]),
 'output': np.array([2, 7])
}

def relu(X):
    return np.maximum(X, 0)

def predict_with_network(input_data):
    # Calculate node 0 in the first hidden layer
    node_0_0_input = input_data.dot(weights['node_0_0'])
    node_0_0_output = relu(node_0_0_input)

    # Calculate node 1 in the first hidden layer
    node_0_1_input = input_data.dot(weights['node_0_1'])
    node_0_1_output = relu(node_0_1_input)

    # Put node values into array: hidden_0_outputs
    hidden_0_outputs = np.array([node_0_0_output, node_0_1_output])
    
    # Calculate node 0 in the second hidden layer
    node_1_0_input = hidden_0_outputs.dot(weights['node_1_0'])
    node_1_0_output = relu(node_1_0_input)

    # Calculate node 1 in the second hidden layer
    node_1_1_input = hidden_0_outputs.dot(weights['node_1_1'])
    node_1_1_output = relu(node_1_1_input)

    # Put node values into array: hidden_1_outputs
    hidden_1_outputs = np.array([node_1_0_output, node_1_1_output])

    # Calculate model output: model_output
    model_output = relu(hidden_1_outputs.dot(weights['output']))
    
    # Return model_output
    return(model_output)

In [57]:
input_data = np.array([3, 5])
predict_with_network(input_data)

182
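
The same forward pass can be written as a loop over layers, which avoids repeating code for every node. This is a sketch under the assumption that each layer's weights are stored as a matrix with one row per node; `layers` and `predict_with_layers` are hypothetical names, not part of the original cell.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

# Assumed layout: one matrix per layer, one row of weights per node
layers = [
    np.array([[2, 4], [4, -5]]),   # node_0_0, node_0_1
    np.array([[-1, 2], [1, 2]]),   # node_1_0, node_1_1
    np.array([[2, 7]]),            # output node
]

def predict_with_layers(input_data, layers):
    values = input_data
    for layer_weights in layers:
        # Weighted sum for every node in the layer, then ReLU
        values = relu(layer_weights.dot(values))
    # The last layer has a single node, so return its value as a scalar
    return values[0]

prediction = predict_with_layers(np.array([3, 5]), layers)
print(prediction)
```

With the weights above this should give the same result as `predict_with_network`, namely 182.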

## The need for optimization
- A loss function aggregates the errors in predictions from many data points into a single number
- It is a measure of the model's predictive performance
- Example: mean squared error
- Goal: find the weights that give the lowest value for the loss function
- Gradient descent is the algorithm used for this optimization
- Steps for gradient descent:
    - Start at a random point
    - until you are somewhere flat:
        - find the slope
        - take a step downhill
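
As a small illustration of a loss function, here is a sketch of mean squared error in NumPy. The targets and predictions are made-up values, and the function name `mean_squared_error` is chosen for this example only.

```python
import numpy as np

def mean_squared_error(targets, predictions):
    targets = np.asarray(targets, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    # Average of the squared differences over all data points
    return np.mean((predictions - targets) ** 2)

# Illustrative data: errors are 1, 0 and 2, so the squared errors are 1, 0 and 4
targets = [1, 3, 5]
predictions = [2, 3, 7]

mse = mean_squared_error(targets, predictions)
print(mse)  # 5/3, roughly 1.667
```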

### Gradient descent

In [82]:
def run_one_step_gradient(input_data, weights, target, learning_rate):
    preds = input_data.dot(weights)
    error = preds - target
    gradient = 2 * input_data * error
    weights_updated  = weights - learning_rate * gradient
    return weights_updated

import numpy as np

weights = np.array([1, 2])
input_data = np.array([3, 4])
target = 6
learning_rate  = 0.01
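# One update is applied here, before the loop, so "Step 1" below reports the state after the second update
weights_updated = run_one_step_gradient(input_data, weights, target, learning_rate)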
for i in range(1, 21):
    weights_updated = run_one_step_gradient(input_data, weights_updated, target, learning_rate)
    pred = input_data.dot(weights_updated)
    print("Step ", i, 'Pred = ', pred, 'Error ', pred- target)

Step  1 Pred =  7.25 Error  1.25
Step  2 Pred =  6.625 Error  0.625
Step  3 Pred =  6.3125 Error  0.3125
Step  4 Pred =  6.15625 Error  0.15625
Step  5 Pred =  6.078125 Error  0.078125
Step  6 Pred =  6.0390625 Error  0.0390625
Step  7 Pred =  6.01953125 Error  0.01953125
Step  8 Pred =  6.009765625 Error  0.009765625
Step  9 Pred =  6.0048828125 Error  0.0048828125
Step  10 Pred =  6.00244140625 Error  0.00244140625
Step  11 Pred =  6.00122070313 Error  0.001220703125
Step  12 Pred =  6.00061035156 Error  0.0006103515625
Step  13 Pred =  6.00030517578 Error  0.00030517578125
Step  14 Pred =  6.00015258789 Error  0.000152587890625
Step  15 Pred =  6.00007629395 Error  7.62939453125e-05
Step  16 Pred =  6.00003814697 Error  3.81469726563e-05
Step  17 Pred =  6.00001907349 Error  1.90734863281e-05
Step  18 Pred =  6.00000953674 Error  9.53674316406e-06
Step  19 Pred =  6.00000476837 Error  4.76837158203e-06
Step  20 Pred =  6.00000238419 Error  2.38418579102e-06


### Backpropagation
- Allows gradient descent to update all weights in a neural network by computing the gradient for every weight
- Estimates the slope of the loss function with respect to each weight
<img src="https://i.ytimg.com/vi/An5z8lR8asY/maxresdefault.jpg" >

**Backpropagation process**
- Go back one layer at a time
- The gradient for a weight is the product of:
 - Node value feeding into that weight
 - Slope of the loss function w.r.t. the node it feeds into
 - Slope of the activation function at the node it feeds into
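
The three-factor product above can be checked on the smallest possible case: one node value feeding through one weight into a ReLU output node, with squared error as the loss. All numbers and names below are illustrative assumptions; the analytic gradient is compared against a finite-difference estimate.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

# Illustrative values, not taken from the notebook
node_value = 3.0   # value feeding into the weight
weight = 2.0
target = 5.0

def loss(w):
    # Squared error of a single ReLU output node
    return (relu(node_value * w) - target) ** 2

node_input = node_value * weight
prediction = relu(node_input)

# Factor 2: slope of the loss w.r.t. the node the weight feeds into
slope_loss = 2 * (prediction - target)
# Factor 3: slope of ReLU at that node (1 for positive input, 0 otherwise)
slope_activation = 1.0 if node_input > 0 else 0.0

gradient = node_value * slope_loss * slope_activation

# Numerical check with a central finite difference
eps = 1e-6
numeric_gradient = (loss(weight + eps) - loss(weight - eps)) / (2 * eps)

print(gradient, numeric_gradient)
```

Both values should be close to 6, since the loss is (3w - 5)^2 and its derivative at w = 2 is 6.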

## Creating a keras model

In [13]:
import numpy as np
import pandas as pd
from keras.layers import Dense
from keras.models import Sequential

predictors = pd.read_csv('https://assets.datacamp.com/production/course_1975/datasets/hourly_wages.csv')
X = predictors.drop(columns=['wage_per_hour']).values
y = predictors['wage_per_hour'].values

n_cols = X.shape[1] #number of nodes in the input layer

model = Sequential() 

# Dense layer: every node in the previous layer is connected to every node in this layer
# 100 nodes in this layer
model.add(Dense(100, activation="relu", input_shape=(n_cols, )))

model.add(Dense(100, activation="relu"))

#output layer
model.add(Dense(1))

from ann_visualizer.visualize import ann_viz
#ann_viz(model)

<img src="./graphs/nn.PNG" >

In [8]:
# compiling the model
# (no accuracy metric: accuracy is not meaningful for a regression target such as wage_per_hour)
model.compile(optimizer='adam', loss='mean_squared_error')

In [12]:
# Fit the model (this applies backpropagation)
model.fit(X, y, verbose=0)

<keras.callbacks.History at 0x1e12ffa5208>

### Classification models
- Uses the `categorical_crossentropy` loss function
- Similar to log loss
- Uses softmax as the activation function of the output layer
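
To make the points above concrete, here is a NumPy sketch of softmax followed by categorical cross-entropy. The scores and one-hot targets are invented for illustration, and the function names are this example's own; Keras computes these internally.

```python
import numpy as np

def softmax(scores):
    # Subtract the row maximum for numerical stability; the result is unchanged
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(one_hot_targets, probabilities):
    # Mean over samples of -log(probability assigned to the true class)
    return -np.mean(np.sum(one_hot_targets * np.log(probabilities), axis=1))

# Illustrative raw output-layer scores for two samples and two classes
scores = np.array([[2.0, 0.0],
                   [0.0, 0.0]])
targets = np.array([[1, 0],
                    [0, 1]])

probabilities = softmax(scores)
loss = categorical_crossentropy(targets, probabilities)

print(probabilities)  # each row sums to 1
print(loss)
```

With two classes this loss coincides with the usual binary log loss, which is why the two are described as similar.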

In [47]:
from keras.utils import to_categorical
import pandas as pd
from keras.layers import Dense
from keras.models import Sequential

df = pd.read_csv('https://assets.datacamp.com/production/course_1975/datasets/titanic_all_numeric.csv')
X = df.drop(columns=['survived']).values

target = to_categorical(df.survived)

n_cols = X.shape[1]

In [48]:
model = Sequential() 
# Add the first layer
model.add(Dense(32, activation="relu", input_shape=(n_cols, )))
# Add the output layer
model.add(Dense(2, activation="softmax"))
# Compile the model
model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=['accuracy'])
#fit the model
model.fit(X, target)

Epoch 1/1


<keras.callbacks.History at 0x1e132c140f0>

In [4]:
# Save the model
model.save('model_file.h5')

In [3]:
from keras.models import load_model
my_model = load_model('model_file.h5')
my_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 32)                352       
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 66        
Total params: 418
Trainable params: 418
Non-trainable params: 0
_________________________________________________________________
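
The parameter counts in the summary can be reproduced by hand: a dense layer has one weight per input per node, plus one bias per node. In the sketch below the input width of 10 is inferred from 352 = (10 + 1) * 32 and is an assumption about the dataset; `dense_param_count` is a helper written for this example.

```python
def dense_param_count(n_inputs, n_units):
    # One weight per (input, unit) pair plus one bias per unit
    return (n_inputs + 1) * n_units

n_cols = 10  # assumption inferred from the summary, not read from the data

first_layer = dense_param_count(n_cols, 32)
output_layer = dense_param_count(32, 2)
total = first_layer + output_layer

print(first_layer, output_layer, total)
```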


## Understanding model optimization
- **Dying neuron** problem:
    - Once a node starts always getting negative inputs, it may keep getting negative inputs; with ReLU its output is then always 0, so it contributes nothing to the model
- **Vanishing gradient** problem:
    - Occurs when many layers have very small slopes
    - In deep networks, the updates from backpropagation then end up close to 0
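
Both problems can be illustrated with a few lines of NumPy. This is a simplified sketch: it assumes every layer sees the same illustrative input of 2.0 and ignores the weights, so it only shows how multiplying many small slopes drives the gradient towards 0.

```python
import numpy as np

def tanh_slope(x):
    # Derivative of tanh
    return 1 - np.tanh(x) ** 2

def relu_slope(x):
    # Derivative of ReLU: 0 for negative inputs, 1 for positive inputs
    return 1.0 if x > 0 else 0.0

# Vanishing gradient: backpropagation multiplies one slope per layer
slope_per_layer = tanh_slope(2.0)       # small, because tanh is flat here
slope_through_10_layers = slope_per_layer ** 10

# Dying neuron: a ReLU node with negative input has slope 0, so no update flows through it
dead_slope = relu_slope(-3.0)

print(slope_per_layer, slope_through_10_layers, dead_slope)
```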

In [51]:
def get_new_model(input_shape):
    model = Sequential()
    model.add(Dense(100, activation='relu', input_shape = (input_shape, )))
    model.add(Dense(100, activation='relu'))
    model.add(Dense(2, activation='softmax'))
    return(model)

In [56]:
# Import the SGD optimizer
from keras.optimizers import SGD

n_cols = X.shape[1]

# Create list of learning rates: lr_to_test
lr_to_test = [0.000001, 0.01, 1]

# Loop over learning rates
for lr in lr_to_test:
    print('\n\nTesting model with learning rate: %f\n'%lr )
    
    # Build new model to test, unaffected by previous models
    model = get_new_model(n_cols)
    
    # Create SGD optimizer with specified learning rate: my_optimizer
    # (newer Keras releases name this argument learning_rate instead of lr)
    my_optimizer = SGD(lr=lr)
    
    # Compile the model
    model.compile(optimizer=my_optimizer, loss="categorical_crossentropy", metrics=['accuracy'])
    
    # Fit the model
    model.fit(X, target, epochs=10)



Testing model with learning rate: 0.000001

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


Testing model with learning rate: 0.010000

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


Testing model with learning rate: 1.000000

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


## Model validation

In [57]:
model.fit(X, target, epochs=10, validation_split=0.3)

Train on 623 samples, validate on 268 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1e132ea0668>
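
The sample counts in the output follow from how `validation_split` works: Keras holds out the last fraction of the rows, taken before any shuffling. The sketch below mimics that split in NumPy on placeholder arrays; the row count of 891 comes from the output above (623 + 268), while the column counts and the helper name are assumptions for illustration.

```python
import numpy as np

def split_like_validation_split(X, y, validation_split):
    # Hold out the last `validation_split` fraction of the rows
    split_at = int(len(X) * (1.0 - validation_split))
    return X[:split_at], y[:split_at], X[split_at:], y[split_at:]

# Placeholder data with the same number of rows as in the output above
X_demo = np.zeros((891, 10))
y_demo = np.zeros((891, 2))

X_train, y_train, X_val, y_val = split_like_validation_split(X_demo, y_demo, 0.3)
print(len(X_train), len(X_val))
```

Because the held-out rows are simply the last ones, data sorted by class or by time should be shuffled before fitting.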

In [80]:
#Early stopping

from keras.callbacks import EarlyStopping

# patience: how many epochs the model can go without improving before training stops; 2 or 3 is typical (beyond that it is unlikely the model will improve)
early_stopping_monitor = EarlyStopping(patience=3)

model = Sequential()
model.add(Dense(100, activation='relu', input_shape = (n_cols, )))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=['accuracy'])


model.fit(X, target, validation_split=0.3, epochs=50, callbacks=[early_stopping_monitor])

Train on 623 samples, validate on 268 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50


<keras.callbacks.History at 0x1e1499ae550>
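
Training stopped at epoch 16 of 50 because the monitored validation score had not improved for `patience` epochs in a row. The idea behind patience can be sketched in plain Python; this is a simplified illustration with made-up validation losses, not the Keras implementation, and `epochs_run_with_patience` is a hypothetical helper.

```python
def epochs_run_with_patience(val_losses, patience):
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # Improvement: remember the new best value and reset the counter
            best = loss
            wait = 0
        else:
            # No improvement: count it, and stop once patience is used up
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)

# Made-up validation losses, one per epoch
val_losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.5]

stopped_at_3 = epochs_run_with_patience(val_losses, patience=3)
stopped_at_4 = epochs_run_with_patience(val_losses, patience=4)
print(stopped_at_3, stopped_at_4)
```

With a patience of 3 the run ends after epoch 6 and never reaches the better loss at epoch 7; a patience of 4 lets it continue, which shows the trade-off in choosing this value.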

In [81]:
model = load_model('classifier_nn.h5')

### Thinking about model capacity
<img src="./graphs/model_complexity_error_training_test.jpg" />

### Stepping up to images

In [85]:
df = pd.read_csv('https://assets.datacamp.com/production/course_1975/datasets/mnist.csv', header=None)

In [96]:
from keras.utils import to_categorical

X = df.drop(columns=[df.columns[0]])
y = to_categorical(df[df.columns[0]])

In [113]:
# Create the model: model
model = Sequential()

# Add the first hidden layer
model.add(Dense(50, activation='relu',  input_shape=(784, )))

# Add the second hidden layer
model.add(Dense(50, activation='relu'))

# Add the output layer
model.add(Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=['accuracy'])

early_stopping_monitor = EarlyStopping(patience=10)
# Fit the model
model.fit(X, y, epochs=10000, validation_split=0.3, callbacks=[early_stopping_monitor])


Train on 1400 samples, validate on 601 samples
Epoch 1/10000
Epoch 2/10000
Epoch 3/10000
Epoch 4/10000
Epoch 5/10000
Epoch 6/10000
Epoch 7/10000
Epoch 8/10000
Epoch 9/10000
Epoch 10/10000
Epoch 11/10000
Epoch 12/10000
Epoch 13/10000
Epoch 14/10000
Epoch 15/10000
Epoch 16/10000
Epoch 17/10000
Epoch 18/10000
Epoch 19/10000
Epoch 20/10000
Epoch 21/10000
Epoch 22/10000
Epoch 23/10000
Epoch 24/10000
Epoch 25/10000
Epoch 26/10000
Epoch 27/10000
Epoch 28/10000
Epoch 29/10000
Epoch 30/10000
Epoch 31/10000
Epoch 32/10000
Epoch 33/10000
Epoch 34/10000
Epoch 35/10000
Epoch 36/10000
Epoch 37/10000
Epoch 38/10000
Epoch 39/10000
Epoch 40/10000
Epoch 41/10000
Epoch 42/10000
Epoch 43/10000
Epoch 44/10000


<keras.callbacks.History at 0x1e14ce5fe48>