# Python ML library: Keras


## 0. Motivation

Imagine we need to build a model that predicts how many transactions each customer will make next year.  
We have predictive data (features) such as each customer's age, bank balance, and whether they are retired.  

Consider how a linear regression would solve this problem:  
Linear regression embeds the assumption that the outcome, in this case how many transactions a customer makes, is the sum of individual parts. It starts by asking "what is the average?" and then adds the effect of each part.  
So the linear regression model does *not* identify interactions between these parts, or how those interactions affect banking activity.  

We can plot the predictions for retired and non-retired people as two lines, with current bank balance on the x-axis and the predicted number of transactions on the y-axis. In a model with no interactions the two lines are parallel; if interactions are allowed, the lines need not be parallel. In the no-interaction model we simply add up the effects of retirement status and current bank balance.  

Neural networks are a powerful modeling approach that accounts for interactions like this well.     
Deep learning uses especially powerful neural networks.     

Because deep learning models account for interactions so well, they perform well on most prediction problems we have seen before.  
Their ability to capture extremely complex interactions also allows them to do impressive things with text, images, video, source code, and so on.  

Going back to the transaction problem: if there is an interaction between retirement status and bank balance, then instead of letting them affect the outcome *separately*, we can calculate a function of these variables that accounts for their interaction and use it, along with the other features, to predict the outcome.  
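To make the difference concrete, here is a minimal sketch that compares a purely additive model with one that also uses a feature computed from both variables. All coefficients, and the choice of `balance * retired` as the interaction function, are assumptions made up for illustration; they are not fitted to any data.

```python
import numpy as np

# Hypothetical coefficients, chosen only for illustration.
INTERCEPT = 20.0
COEF_BALANCE = 0.002
COEF_RETIRED = -5.0
COEF_INTERACTION = -0.001  # extra balance effect that applies only to retirees

def predict_additive(balance, retired):
    """No interactions: the prediction is a plain sum of separate effects."""
    return INTERCEPT + COEF_BALANCE * balance + COEF_RETIRED * retired

def predict_with_interaction(balance, retired):
    """Adds a feature computed from both variables: balance * retired."""
    return predict_additive(balance, retired) + COEF_INTERACTION * balance * retired

balances = np.array([0.0, 5000.0, 10000.0])

# Gap between the retired and non-retired lines at each balance.
gap_additive = predict_additive(balances, 1) - predict_additive(balances, 0)
gap_interaction = (predict_with_interaction(balances, 1)
                   - predict_with_interaction(balances, 0))

print(gap_additive)     # constant gap: the two lines are parallel
print(gap_interaction)  # gap changes with balance: the lines are not parallel
```

In the additive model the gap between the two lines is the retirement coefficient at every balance, which is what "parallel" means on the plot. Once the interaction feature is included, the gap depends on the balance.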

In practice, a neural network represents interactions with layers of nodes:  
Leftmost: the input layer (the predictive features, e.g. age, income)  
Rightmost: the output layer (the model's prediction, e.g. number of transactions)  
All layers that are neither the input nor the output layer are called *hidden layers*.  
They are hidden because they are not something we have data about or observe directly from the world.  
Each *node* in a hidden layer represents an aggregation of information from the input data, and each node adds to the model's ability to capture interactions. More nodes -> more interactions we can capture.  




## 0.1 Forward propagation

Forward propagation algorithm: Use data to make predictions.    

Going back to the bank example, we try to predict how many transactions a customer will make at our bank.  
Suppose we make predictions based only on the number of children and the number of existing accounts.  
For a customer with 2 children and 3 accounts, the input layer has 2 nodes, holding the values 2 and 3.  
The hidden layer (also 2 nodes) is connected to the input-layer nodes by lines. Each line has a weight indicating how strongly that input affects the hidden node the line ends at. These are the first set of weights: the top node of the hidden layer is connected to the number of children (2) by a line with weight 1, and to the number of accounts (3) by a line with weight 1. The weights are the parameters we train, or change, when we fit a neural network to data.  

To compute the value of the top node of the hidden layer, we take the value of each node in the input layer, multiply it by the weight on the line that ends at the hidden node, and sum the results. So this node has the value 2x1 + 3x1 = 5.  

Once each hidden node has been calculated this way, we repeat the process with the next set of weights, propagating forward until we reach the value at the output layer.  

So forward propagation is a multiply-then-add process, which is exactly a dot product.  
This is forward propagation for a single data point. In general, we do forward propagation for one data point at a time.  
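As a quick check of the dot-product claim, the sketch below computes the top hidden node both ways for the customer with 2 children and 3 accounts. The variable names `node_0_weights` and `hidden_weights` are introduced here only for illustration, and the bottom-node weights `[-1, 1]` are taken from the example cell that follows.

```python
import numpy as np

input_data = np.array([2, 3])       # 2 children, 3 accounts
node_0_weights = np.array([1, 1])   # weights on the lines into the top hidden node

# Multiply-then-add written out, and the same calculation as a dot product.
multiply_add = (input_data * node_0_weights).sum()
dot_product = np.dot(input_data, node_0_weights)
print(multiply_add, dot_product)    # 5 5

# Stacking one weight vector per hidden node as rows lets a single
# matrix-vector product compute the whole hidden layer at once.
hidden_weights = np.array([[1, 1],     # into the top hidden node
                           [-1, 1]])   # into the bottom hidden node
hidden_layer_values = hidden_weights @ input_data
print(hidden_layer_values)          # [5 1]
```

Writing the layer as one matrix product is only a different notation for the same per-node sums; the results match the node-by-node calculation.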



In [5]:
# Forward propagation: simple example
import numpy as np

# One customer: 2 children, 3 accounts
input_data = np.array([2, 3])

# Weights into each node of the hidden layer, and from the hidden layer to the output
weights = {"node_0": np.array([1, 1]),
           "node_1": np.array([-1, 1]),
           "output": np.array([2, -1])}

# Top hidden node value: multiply inputs by weights, then add
node_0_value = (input_data * weights["node_0"]).sum()
# Bottom hidden node value
node_1_value = (input_data * weights["node_1"]).sum()

hidden_layer_values = np.array([node_0_value, node_1_value])
print(hidden_layer_values)

# Output node value, computed from the hidden layer in the same way
output = (hidden_layer_values * weights["output"]).sum()
print(output)

[5 1]
9


## 0.2 Activation functions

The multiply-then-add process is only half the story for hidden layers.  
For neural networks to achieve their maximum predictive power, we must apply an *activation function* in the hidden layers.  

An *activation function* allows the model to capture non-linearities. That is, if the relationships in the data are not straight-line relationships, we need an activation function that captures non-linearities.  

The *activation function* is applied to the value coming into a node and transforms it into the value stored in that node (the node output).  
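One way to see why the activation function matters is the sketch below. Without it, two stacked layers collapse into a single linear layer, so the network can only represent straight-line relationships. The weight matrices `W1` and `W2` and the inputs are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical weights for two stacked layers.
W1 = np.array([[1.0, 1.0],
               [-1.0, 1.0]])   # input -> hidden
W2 = np.array([[2.0, -1.0]])   # hidden -> output

x = np.array([2.0, 3.0])

# Two layers with no activation in between...
two_layer_output = W2 @ (W1 @ x)
# ...give the same result as one linear layer with weights W2 @ W1.
collapsed_weights = W2 @ W1
one_layer_output = collapsed_weights @ x
print(two_layer_output, one_layer_output)

# With a non-linear activation (here tanh) applied to the hidden layer,
# the network no longer matches the single collapsed linear layer.
x_neg = np.array([-1.0, -1.0])
with_tanh = W2 @ np.tanh(W1 @ x_neg)
linear_only = collapsed_weights @ x_neg
print(with_tanh, linear_only)
```

The first pair of outputs is identical, while the second pair differs. That difference is the extra modeling capacity the activation function provides.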

An example of an activation function is the tanh function. Applied to the top node of the hidden layer in the example above, the node would store tanh(5) instead of 5.  

Today, the standard in both industry and research applications is ReLU, the rectified linear activation function:  
ReLU(x) = 0 if x < 0, x if x >= 0  
It consists of two linear pieces, and it can be powerful when composed through multiple successive hidden layers.  
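The piecewise definition can be sketched for a whole array of values at once with `np.maximum`. The helper name `relu_array` and the sample values are assumptions for illustration only.

```python
import numpy as np

def relu_array(x):
    """Element-wise ReLU: 0 where x is negative, x otherwise."""
    return np.maximum(0, x)

values = np.array([-3.0, -0.5, 0.0, 2.0, 5.0])
print(relu_array(values))   # [0. 0. 0. 2. 5.]
```

Negative inputs fall on the flat piece and map to 0, while non-negative inputs fall on the identity piece and pass through unchanged.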



In [6]:
# Forward propagation with tanh as the activation function
import numpy as np

input_data = np.array([-1, 2])

# Weights into each node of the hidden layer, and from the hidden layer to the output
weights = {"node_0": np.array([3, 3]),
           "node_1": np.array([1, 5]),
           "output": np.array([2, -1])}

# Top hidden node: weighted sum of the inputs, then the activation function
node_0_input = (input_data * weights["node_0"]).sum()
node_0_output = np.tanh(node_0_input)
# Bottom hidden node
node_1_input = (input_data * weights["node_1"]).sum()
node_1_output = np.tanh(node_1_input)

hidden_layer_output = np.array([node_0_output, node_1_output])
print(hidden_layer_output)

# Output node: weighted sum of the hidden-layer outputs (no activation applied)
output = (hidden_layer_output * weights["output"]).sum()
print(output)

[0.99505475 0.99999997]
0.9901095378334199


In [7]:
# Activation function: relu(x)
def relu(x):
    '''Return the ReLU activation of a single number: 0 if x is negative, otherwise x.'''
    # The argument is named x so that it does not shadow the built-in input()
    output = max(0, x)

    # Return the value just calculated
    return output

# Calculate node 0 value: node_0_output
node_0_input = (input_data * weights['node_0']).sum()
node_0_output = relu(node_0_input)

# Calculate node 1 value: node_1_output
node_1_input = (input_data * weights['node_1']).sum()
node_1_output = relu(node_1_input)

# Put node values into array: hidden_layer_outputs
hidden_layer_outputs = np.array([node_0_output, node_1_output])

# Calculate model output (do not apply relu)
model_output = (hidden_layer_outputs * weights['output']).sum()

# Print model output
print(model_output)

-3


In [None]:
# Generate predictions for multiple data observations

# Define predict_with_network()
def predict_with_network(input_data_row, weights):
    '''Forward-propagate one observation (row) through the network.'''

    # Calculate node 0 value
    node_0_input = (input_data_row * weights["node_0"]).sum()
    node_0_output = relu(node_0_input)

    # Calculate node 1 value
    node_1_input = (input_data_row * weights["node_1"]).sum()
    node_1_output = relu(node_1_input)

    # Put node values into array: hidden_layer_outputs
    hidden_layer_outputs = np.array([node_0_output, node_1_output])

    # Calculate model output (here relu is also applied to the output node)
    input_to_final_layer = (hidden_layer_outputs * weights["output"]).sum()
    model_output = relu(input_to_final_layer)

    # Return model output
    return model_output


# input_data must be a sequence of rows, one per observation. np.atleast_2d()
# wraps a single observation (such as the 1-D input_data defined earlier) into a
# one-row array, so the loop iterates over rows rather than individual numbers.
results = []
for input_data_row in np.atleast_2d(input_data):
    # Append prediction to results
    results.append(predict_with_network(input_data_row, weights))

# Print results
print(results)
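As an alternative to looping over rows, the same forward pass can be sketched for all observations at once with matrix products. The input rows and weights below are made-up illustrative values, and `input_rows` and `hidden_weights` are names introduced only for this sketch. ReLU is applied to the output as well, mirroring `predict_with_network()`.

```python
import numpy as np

# Illustrative inputs: each row is one observation (values are made up).
input_rows = np.array([[3, 5],
                       [1, -1],
                       [0, 0]])

# Hypothetical weights in the same dictionary layout as the cells above.
weights = {"node_0": np.array([2, 4]),
           "node_1": np.array([4, -5]),
           "output": np.array([2, 7])}

# Stack the hidden-node weights as columns: shape (n_inputs, n_hidden).
hidden_weights = np.column_stack([weights["node_0"], weights["node_1"]])

# Forward propagation for every row at once, with ReLU after each layer.
hidden_outputs = np.maximum(0, input_rows @ hidden_weights)
predictions = np.maximum(0, hidden_outputs @ weights["output"])
print(predictions)
```

Each entry of `predictions` is the value a per-row loop would produce for the corresponding row with these weights; the matrix form only batches the same multiply-then-add steps.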
        