### Basics of deep learning and neural networks

* [Introduction to deep learning](#Itdl)
* [Comparing neural network models to classical regression models](#Cnnmtcrm)
* [Forward propagation](#Fp)
* [Coding the forward propagation algorithm](#Ctfpa)
* [Activation functions](#Af)
* [The Rectified Linear Activation Function](#TRLAF)
* [Applying the network to many observations/rows of data](#Atntmood)
* [Deeper networks](#Dn)
* [Forward propagation in a deeper network](#Fpiadn)
* [Multi-layer neural networks](#Mnn)
* [Representations are learned](#Ral)
* [Levels of representation](#Lor)



### Optimizing a neural network with backward propagation
* [The need for optimization](#Tnfo)
* [Calculating model errors](#Cme)
* [Understanding how weights change model accuracy](#Uhwcma)
* [Coding how weight changes affect accuracy](#Chwcaa)
* [Scaling up to multiple data points](#Sutmdp)
* [Gradient descent](#Gd)
* [Calculating slopes](#Cs)
* [Improving model weights](#Imw)
* [Making multiple updates to weights](#Mmutw)
* [Backpropagation](#B)
* [The relationship between forward and backward propagation](#Trbfabp)
* [Thinking about backward propagation](#Tabp)
* [Backpropagation in practice](#Bip)
* [A round of backpropagation](#Arob)


### Building deep learning models with keras
* [Creating a keras model](#Cakm)
* [Understanding your data](#Uyd)
* [Specifying a model](#Sam)
* [Compiling and fitting a model](#Cafam)
* [Compiling the model](#Ctm)
* [Fitting the model](#Ftm)
* [Classification models](#Cm)
* [Understanding your classification data](#Uycd)
* [Last steps in classification models](#Lsicm)
* [Using models](#Um)
* [Making predictions](#Mp)


### Fine-tuning keras models
* [Understanding model optimization](#Umo)
* [Diagnosing optimization problems](#Dop)
* [Changing optimization parameters](#Cop)
* [Model validation](#Mv)
* [Evaluating model accuracy on validation dataset](#Emaovd)
* [Early stopping: Optimizing the optimization](#EsOto)
* [Experimenting with wider networks](#Ewwn)
* [Adding layers to a network](#Altan)
* [Thinking about model capacity](#Tamc)
* [Experimenting with model structures](#Ewms)
* [Stepping up to images](#Suti)
* [Building your own digit recognition model](#Byodrm)


<p id ='Itdl'><p>
### Introduction to deep learning
![Untitled.png](attachment:Untitled.png)

# Basics of deep learning and neural networks



<p id ='Cnnmtcrm'><p>
### Comparing neural network models to classical regression models

<p id ='Fp'><p>
### Forward propagation
![Screenshot%202019-03-25%20at%203.12.04%20PM.png](attachment:Screenshot%202019-03-25%20at%203.12.04%20PM.png)

<p id ='Ctfpa'><p>
### Coding the forward propagation algorithm

![Screenshot%202019-03-25%20at%203.22.02%20PM.png](attachment:Screenshot%202019-03-25%20at%203.22.02%20PM.png)

Each data point is a customer. The first input is how many accounts they have, and the second input is how many children they have. The model will predict **how many transactions the user makes in the next year.**



In [1]:
import numpy as np

In [2]:
input_data = np.array([3, 5])
# No. of accounts = 3, No. of children = 5

In [4]:
weights = {'node_0': np.array([2, 4]), 'node_1': np.array([ 4, -5]), 'output': np.array([2, 7])}

In [6]:
weights

{'node_0': array([2, 4]), 'node_1': array([ 4, -5]), 'output': array([2, 7])}

In [33]:
node_0_value = np.matmul(input_data, weights['node_0'])
node_1_value = np.matmul(input_data, weights['node_1'])
hidden_layer_outputs = np.array([node_0_value, node_1_value])
outputs = np.matmul(hidden_layer_outputs, weights['output'])
outputs

-39

Wonderful work! It looks like the network generated a prediction of -39.
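The same forward pass can also be written with one weight matrix per layer instead of one array per node. The sketch below is only an illustration of that idea: the names `hidden_weights`, `output_weights` and `forward_linear` are made up here, and the numbers are copied from the cells above.

```python
import numpy as np

# Inputs and weights copied from the cells above
input_data = np.array([3, 5])
hidden_weights = np.array([[2, 4],    # weights into node_0
                           [4, -5]])  # weights into node_1
output_weights = np.array([2, 7])

def forward_linear(x, hidden_w, output_w):
    """Forward propagation with no activation function."""
    hidden = hidden_w @ x       # one dot product per hidden node
    return output_w @ hidden    # weighted sum of the hidden node values

prediction = forward_linear(input_data, hidden_weights, output_weights)
print(prediction)
```

Each row of `hidden_weights` plays the role of one `node_*` entry in the `weights` dictionary, so the result should match the prediction computed node by node.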



<p id ='Af'><p>
### Activation functions
    
For neural networks to achieve their maximum predictive power, we must apply an activation function in the hidden layers.
An activation function allows the model to capture non-linearities.

Non-linearities capture patterns such as how going from one child to two may affect banking transactions differently than going from two to three.
![Screenshot%202019-03-25%20at%204.41.07%20PM.png](attachment:Screenshot%202019-03-25%20at%204.41.07%20PM.png)

If the relationships in the data aren't straight-line relationships, we will need an activation function that captures non-linearities.


**Activation Functions:**
* Applied to node input to produce node output
![Screenshot%202019-03-25%20at%204.47.56%20PM.png](attachment:Screenshot%202019-03-25%20at%204.47.56%20PM.png)
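One way to see why the hidden layer needs an activation function: without one, two layers of weights collapse into a single linear map. The sketch below reuses the weights from the forward propagation example. `np.tanh` is chosen here purely as an example of a non-linear function (the text above does not name one), and all variable names are illustrative.

```python
import numpy as np

hidden_weights = np.array([[2, 4], [4, -5]])  # one row per hidden node
output_weights = np.array([2, 7])
x = np.array([3, 5])

# Without an activation, two layers are equivalent to one set of weights
collapsed_weights = output_weights @ hidden_weights
linear_two_layer = output_weights @ (hidden_weights @ x)
linear_one_layer = collapsed_weights @ x

# With an activation, each node output is a non-linear function of its input
hidden_tanh = np.tanh(hidden_weights @ x)
nonlinear_output = output_weights @ hidden_tanh

print(linear_two_layer, linear_one_layer)
print(nonlinear_output)
```

The first two values are identical, which shows that stacking purely linear layers adds no modelling power. The third differs because the activation is applied to each node's input before it is passed on.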

<p id ='TRLAF'><p>
### The Rectified Linear Activation Function
![Screenshot%202019-03-25%20at%204.54.57%20PM.png](attachment:Screenshot%202019-03-25%20at%204.54.57%20PM.png)
![Screenshot%202019-03-25%20at%204.54.02%20PM.png](attachment:Screenshot%202019-03-25%20at%204.54.02%20PM.png)

In [35]:
def relu(input):
    return max(0, input)

In [40]:
print(relu(-3))
print(relu(889))

0
889


Apply **ReLU** to the hidden units in this neural network.
![Screenshot%202019-03-25%20at%205.04.37%20PM.png](attachment:Screenshot%202019-03-25%20at%205.04.37%20PM.png)


In [43]:
node_0_output = relu(np.matmul(input_data, weights['node_0']))
node_0_output


26

In [46]:
node_1_output = relu(np.matmul(input_data, weights['node_1']))
node_1_output

0

In [48]:
hidden_layer_outputs = np.array([node_0_output, node_1_output])
hidden_layer_outputs

array([26,  0])

In [51]:
model_output = np.matmul(hidden_layer_outputs, weights['output'])
model_output

52

You predicted 52 transactions. Without this activation function, you would have predicted a negative number! The real power of activation functions will come soon when you start tuning model weights.



<p id ='Atntmood'><p>
### Applying the network to many observations/rows of data
Define a function called `predict_with_network()` that generates a prediction for one observation, then call it on each row of data.

In [54]:
input_data =  [np.array([3, 5]), np.array([ 1, -1]), np.array([0, 0]), np.array([8, 4])]
input_data

[array([3, 5]), array([ 1, -1]), array([0, 0]), array([8, 4])]

In [56]:
def predict_with_network(input_data, weights):
    node_0_output = relu(np.matmul(input_data, weights['node_0']))
    node_1_output = relu(np.matmul(input_data, weights['node_1']))
    hidden_layer_outputs = np.array([node_0_output, node_1_output])
    model_output = np.matmul(hidden_layer_outputs, weights['output'])
    return model_output

In [58]:
results = [predict_with_network(input_data_row, weights) for input_data_row in input_data]
results

[52, 63, 0, 148]
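The list comprehension above calls the network once per row. As a sketch of an alternative, the rows can be stacked into a 2-D array and pushed through the network in one go. `predict_batch` is a hypothetical helper, not part of the original exercise, and it uses `np.maximum` because the built-in `max` in `relu()` only works on single numbers.

```python
import numpy as np

def predict_batch(inputs, weights):
    """Forward pass for a 2-D array with one observation per row."""
    # Columns of hidden_w hold the weights into node_0 and node_1
    hidden_w = np.column_stack([weights['node_0'], weights['node_1']])
    hidden = np.maximum(inputs @ hidden_w, 0)   # element-wise ReLU
    return hidden @ weights['output']

weights = {'node_0': np.array([2, 4]),
           'node_1': np.array([4, -5]),
           'output': np.array([2, 7])}
rows = np.array([[3, 5], [1, -1], [0, 0], [8, 4]])

batch_results = predict_batch(rows, weights)
print(batch_results)
```

This should give the same four predictions as the row-by-row version.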

<p id ='Dn'><p>
### Deeper networks
![Screenshot%202019-03-25%20at%205.42.39%20PM.png](attachment:Screenshot%202019-03-25%20at%205.42.39%20PM.png)
    
**Representation learning:**
* Deep networks internally build representations of patterns in the data.
* They partially replace the need for feature engineering.
* Subsequent layers build increasingly sophisticated representations of the raw data.

![Screenshot%202019-03-25%20at%205.49.38%20PM.png](attachment:Screenshot%202019-03-25%20at%205.49.38%20PM.png)
When we train the model, the neural network gets weights that find the relevant patterns to make better predictions.

<p id ='Fpiadn'><p>
### Forward propagation in a deeper network

<p id ='Mnn'><p>
### Multi-layer neural networks
![Screenshot%202019-03-25%20at%206.11.14%20PM.png](attachment:Screenshot%202019-03-25%20at%206.11.14%20PM.png)

Code for forward propagation in a neural network with two hidden layers.


In [61]:
input_data

[array([3, 5]), array([ 1, -1]), array([0, 0]), array([8, 4])]

In [63]:
weights = {'node_0_0': np.array([2, 4]),
 'node_0_1': np.array([ 4, -5]),
 'node_1_0': np.array([-1,  2]),
 'node_1_1': np.array([1, 2]),
 'output': np.array([2, 7])}

In [90]:
def predict_with_networks(input_data):
    # Calculate node 0  and node 1 in the first hidden layer
    node_00_output = relu(np.matmul(input_data, weights['node_0_0']))
    node_01_output = relu(np.matmul(input_data, weights['node_0_1']))
    # Put node values into array: hidden_0_outputs
    hidden_0_outputs = np.array([node_00_output, node_01_output])
    # Calculate node 0  and node 1 in the second hidden layer
    node_10_output = relu(np.matmul(hidden_0_outputs, weights['node_1_0']))
    node_11_output = relu(np.matmul(hidden_0_outputs, weights['node_1_1']))
    # Put node values into array: hidden_1_outputs
    hidden_1_outputs = np.array([node_10_output, node_11_output])
    
    model_output = np.matmul(hidden_1_outputs, weights['output'])
    return model_output

In [91]:
dat = np.array([3, 5])

In [93]:
results = [predict_with_networks(input_data_row) for input_data_row in input_data ]

In [94]:
results

[182, 162, 0, 392]
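The two-hidden-layer function above repeats the same step per layer, so it can be generalised to any number of hidden layers with a loop. This is a minimal sketch under that assumption: `forward` and `hidden_layers` are illustrative names, and each matrix holds one row per node using the same numbers as the `weights` dictionary above.

```python
import numpy as np

def forward(x, hidden_layers, output_weights):
    """Apply each hidden layer (weights, then ReLU), then the output weights."""
    activations = x
    for layer_w in hidden_layers:
        activations = np.maximum(layer_w @ activations, 0)
    return output_weights @ activations

# One row per node: first hidden layer, then second hidden layer
hidden_layers = [np.array([[2, 4], [4, -5]]),
                 np.array([[-1, 2], [1, 2]])]
output_weights = np.array([2, 7])

rows = [np.array([3, 5]), np.array([1, -1]), np.array([0, 0]), np.array([8, 4])]
deep_results = [int(forward(row, hidden_layers, output_weights)) for row in rows]
print(deep_results)
```

Adding a third hidden layer would then only mean appending another matrix to `hidden_layers`.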

<p id ='Ral'><p>
### Representations are learned

<p id ='Lor'><p>
### Levels of representation

# Optimizing a neural network with backward propagation

<p id ='Tnfo'><p>
### The need for optimization
![Screenshot%202019-03-25%20at%206.45.27%20PM.png](attachment:Screenshot%202019-03-25%20at%206.45.27%20PM.png)
![Screenshot%202019-03-25%20at%206.45.52%20PM.png](attachment:Screenshot%202019-03-25%20at%206.45.52%20PM.png)

<p id ='Chwcaa'><p>
### Coding how weight changes affect accuracy

![Screenshot%202019-03-25%20at%207.11.14%20PM.png](attachment:Screenshot%202019-03-25%20at%207.11.14%20PM.png)

In [96]:
input_data = np.array([0,3])
weights_0 = {'node_0':np.array([2,1]), 'node_1':np.array([1, 2]), 'output':np.array([1,1])}

Update a single weight in `weights_0` to create `weights_1`, which gives a perfect prediction (in which the predicted value is equal to target_actual: **3**).

In [105]:
# Make prediction using original weights
model_output_0 = predict_with_network(input_data, weights_0)
print('Initial Model predicts', model_output_0)

Initial Model predicts 9


In [106]:
# The actual target value, used to calculate the error
target_actual = 3
# Calculate error: error_0
error_0 = model_output_0-target_actual
print('Initial Error is', error_0)

Initial Error is 6


In [114]:
# Create weights that cause the network to make perfect prediction (3): weights_1
weights_1 = {'node_0': [2, 1],
             'node_1': [1, 2],
             'output': [1, 0]
            }

In [115]:
# Make prediction using the updated weights
model_output_1 = predict_with_network(input_data, weights_1)
print('Tuned Model predicts', model_output_1)

Tuned Model predicts 3


In [116]:
# Calculate error: error_1
error_1 = model_output_1-target_actual
print('Tuned Error is', error_1)

Tuned Error is 0


<p id ='Sutmdp'><p>
### Scaling up to multiple data points
Write code to compare model accuracies for two different sets of weights, which have been stored as `weights_0` and `weights_1`.
    
    
A loss function aggregates the errors in predictions from many data points into a single number.



In [117]:
from sklearn.metrics import mean_squared_error

In [136]:
weights_0

{'node_0': array([2, 1]), 'node_1': array([1, 2]), 'output': array([1, 1])}

In [138]:
target_actuals = [1, 3, 5, 7]

In [139]:
weights_1 = {'node_0': np.array([2, 1]),
 'node_1': np.array([1. , 1.5]),
 'output': np.array([1. , 1.5])}

In [140]:
input_data = [np.array([0, 3]), np.array([1, 2]), np.array([-1, -2]), np.array([4, 0])]


In [141]:
model_0_output = [predict_with_network(row, weights_0) for row in input_data]
model_0_output

[9, 9, 0, 12]

In [142]:
model_1_output = [predict_with_network(row, weights_1) for row in input_data]
model_1_output


[9.75, 10.0, 0.0, 14.0]

In [144]:
# Calculate the mean squared error for model_0_output: mse_0
mse_0 = mean_squared_error(target_actuals, model_0_output)

# Calculate the mean squared error for model_1_output: mse_1
mse_1 = mean_squared_error(target_actuals, model_1_output)

# Print mse_0 and mse_1
print("Mean squared error with weights_0: %f" %mse_0)
print("Mean squared error with weights_1: %f" %mse_1)


Mean squared error with weights_0: 37.500000
Mean squared error with weights_1: 49.890625
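Mean squared error is the average of the squared differences between predictions and targets, $\frac{1}{n}\sum_i (\hat{y}_i - y_i)^2$. As a cross-check on `mean_squared_error`, the sketch below computes it directly with NumPy. The helper name `mse` is made up here, and the prediction lists are copied from the outputs above.

```python
import numpy as np

def mse(targets, predictions):
    """Average squared difference between predictions and targets."""
    targets = np.asarray(targets, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    return np.mean((predictions - targets) ** 2)

target_actuals = [1, 3, 5, 7]
mse_0_manual = mse(target_actuals, [9, 9, 0, 12])
mse_1_manual = mse(target_actuals, [9.75, 10.0, 0.0, 14.0])
print(mse_0_manual, mse_1_manual)
```

Both values should agree with the scikit-learn results printed above, which confirms that `weights_1` is the worse of the two weight sets on these four points.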


<p id ='Gd'><p>
### Gradient descent
![Screenshot%202019-03-25%20at%209.02.20%20PM.png](attachment:Screenshot%202019-03-25%20at%209.02.20%20PM.png)

<p id ='Cs'><p>
### Calculating slopes

In [145]:
target = 0
input_data = np.array([1,2,3])
weights = np.array([0,2,1])

In [150]:
preds = np.matmul(input_data, weights)
preds

7

In [158]:
error = target-preds
error

-7

In [159]:
slope = 2*input_data*error
slope

array([-14, -28, -42])

In [160]:
slope = 2*input_data*(target-np.matmul(input_data, weights))
slope

array([-14, -28, -42])
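For a single data point with squared error $(target - preds)^2$, the derivative with respect to each weight is $-2 \cdot input_i \cdot (target - preds)$. The `slope` computed in the cells above is `2 * input_data * (target - preds)`, so it has the opposite sign: it points in the direction that lowers the error. The sketch below checks this numerically with a central-difference estimate; `squared_error`, `eps` and `numeric_gradient` are names introduced only for this check.

```python
import numpy as np

target = 0
input_data = np.array([1, 2, 3])
weights = np.array([0.0, 2.0, 1.0])

def squared_error(w):
    """Squared error of the single-point linear model for weights w."""
    return (target - input_data @ w) ** 2

# Slope as computed in the cells above
slope = 2 * input_data * (target - input_data @ weights)

# Central-difference estimate of the gradient of the squared error
eps = 1e-6
numeric_gradient = np.array([
    (squared_error(weights + eps * unit) - squared_error(weights - eps * unit)) / (2 * eps)
    for unit in np.eye(len(weights))
])

print(slope)
print(numeric_gradient)
```

The numerical gradient should come out as the negative of `slope`, up to rounding.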

<p id ='Imw'><p>
### Improving model weights
The `slope` computed above is `2 * input_data * (target - preds)`, which is the negative of the squared-error gradient, so it already points in the direction that lowers the error. If you add the slopes to your weights, you will move in the right direction. However, it's possible to move too far in that direction. So you will want to take a small step first, using a low learning rate, and verify that the model is improving.

In [156]:
learning_rate = 0.01

In [161]:
# Update the weights by adding a small multiple of the slope: weights_updated
weights_updated = weights + learning_rate*slope

# Get updated prediction (a single number): preds_updated
preds_updated = np.matmul(input_data, weights_updated)

# Calculate updated error: error_updated
error_updated = target-preds_updated
# Print the original error
print(error)

# Print the updated error
print(round(error_updated, 2))


-7
-5.04
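To illustrate why the step size matters, the sketch below takes a single update step from the same starting weights with two different learning rates. It is a self-contained illustration: `error_after_one_step` is a hypothetical helper, and the rate `0.1` is an arbitrary "too large" value chosen for contrast.

```python
import numpy as np

target = 0
input_data = np.array([1, 2, 3])
weights = np.array([0.0, 2.0, 1.0])

def error_after_one_step(learning_rate):
    """Take one error-reducing step and return the new error (target - prediction)."""
    error = target - input_data @ weights
    downhill = 2 * input_data * error   # negative gradient of the squared error
    new_weights = weights + learning_rate * downhill
    return target - input_data @ new_weights

for lr in (0.01, 0.1):
    print(lr, error_after_one_step(lr))
```

With the small rate the error shrinks in magnitude from 7 to roughly 5.04, while the larger rate overshoots the minimum and leaves an error of roughly 12.6, worse than where it started.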


<p id ='Mmutw'><p>
### Making multiple updates to weights

In [164]:
n_updates = 20
mse_hist = []

In [166]:
input_data

array([1, 2, 3])

In [168]:
weights

array([0.14, 2.56, 2.26])

In [170]:
input_data*np.matmul(input_data, weights)

array([12.04, 24.08, 36.12])

In [172]:
target_actuals

[1, 3, 5, 7]

In [173]:
for i in range(n_updates):
    # Error for the single data point; target is the scalar defined earlier
    error = target - np.matmul(input_data, weights)
    # slope is the negative of the squared-error gradient, so it is added
    slope = 2*input_data*error
    weights = weights + learning_rate*slope
    # With one data point, the MSE is just the squared error after the update
    mse = mean_squared_error([target], [np.matmul(input_data, weights)])
    mse_hist.append(mse)

<p id ='B'><p>
### Backpropagation

<p id ='Trbfabp'><p>
### The relationship between forward and backward propagation

<p id ='Tabp'><p>
### Thinking about backward propagation

<p id ='Bip'><p>
### Backpropagation in practice

<p id ='Arob'><p>
### A round of backpropagation

# Building deep learning models with keras

<p id ='Cakm'><p>
### Creating a keras model

<p id ='Uyd'><p>
### Understanding your data

<p id ='Sam'><p>
### Specifying a model

<p id ='Cafam'><p>
### Compiling and fitting a model

<p id ='Ctm'><p>
### Compiling the model

<p id ='Ftm'><p>
### Fitting the model

<p id ='Cm'><p>
### Classification models

<p id ='Uycd'><p>
### Understanding your classification data

<p id ='Lsicm'><p>
### Last steps in classification models

<p id ='Um'><p>
### Using models

<p id ='Mp'><p>
### Making predictions

# Fine-tuning keras models

<p id ='Umo'><p>
### Understanding model optimization

<p id ='Dop'><p>
### Diagnosing optimization problems

<p id ='Cop'><p>
### Changing optimization parameters

<p id ='Mv'><p>
### Model validation

<p id ='Emaovd'><p>
### Evaluating model accuracy on validation dataset

<p id ='EsOto'><p>
### Early stopping: Optimizing the optimization

<p id ='Ewwn'><p>
### Experimenting with wider networks

<p id ='Altan'><p>
### Adding layers to a network

<p id ='Tamc'><p>
### Thinking about model capacity

<p id ='Ewms'><p>
### Experimenting with model structures

<p id ='Suti'><p>
### Stepping up to images

<p id ='Byodrm'><p>
### Building your own digit recognition model