# Multi-Layer Perceptron and Basics

## 1 Forward Propagation

Making a prediction takes four steps:
1. Initialize the weights and biases (assign random numbers).
2. Compute the weighted sum at each node.
3. Compute the activation of each node.
4. Propagate the data forward, layer by layer, to the output.

<img src="http://cocl.us/neural_network_example" alt="Neural Network Example" width="600px">

In [None]:
# Install necessary libraries

# %pip install --user numpy==1.26.4

Note: you may need to restart the kernel to use updated packages.


In [4]:
# Import libraries

import numpy as np

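## 1.1 Randomly initialize the weights and biases
`np.random.uniform(low, high, size)` draws samples from a uniform distribution over the half-open interval [low, high).

1. `low`: lower bound of the sampling interval; a float, an int, or an array-like. Defaults to 0.
2. `high`: upper bound of the sampling interval; a float, an int, or an array-like. Defaults to 1.
3. `size`: number of samples to draw (an int or a tuple of ints).
4. Return value: an `ndarray` whose shape matches `size`.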
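
The parameters above can be tried out in a short sketch. The seed value and the variable names `default_samples` and `bounded_samples` are arbitrary choices for illustration, not part of the notebook.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only so that repeated runs draw the same numbers

# Default bounds: six samples from [0, 1)
default_samples = np.random.uniform(size=6)

# Explicit bounds and a tuple for size: a 2x3 array of samples from [-1, 1)
bounded_samples = np.random.uniform(low=-1.0, high=1.0, size=(2, 3))

print(default_samples.shape)   # (6,)
print(bounded_samples.shape)   # (2, 3)
```

Passing a tuple as `size` is what later makes it possible to draw a whole weight matrix in one call.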

In [5]:
# Initialize the weights and biases with randomly generated numbers
# Here: sample from a uniform distribution over [0, 1)

weights = np.around(np.random.uniform(size=6), decimals=2) # initialize the weights
biases = np.around(np.random.uniform(size=3), decimals=2) # initialize the biases

print("Weights: ", weights)
print("Biases: ", biases)

Weights:  [0.37 0.61 0.95 0.83 0.85 0.75]
Biases:  [0.41 0.45 0.56]


## 1.2 Compute the prediction with the weights and biases above
Compute the network output for the two inputs $x_1$ and $x_2$. <br>
Given: $x_1 = 0.5$, $x_2 = 0.85$

In [6]:
x_1 = 0.5 # input 1
x_2 = 0.85 # input 2
print('x1 is {} and x2 is {}'.format(x_1, x_2))

# Compute the weighted sum of inputs for the nodes of the hidden layer

z_11 = x_1 * weights[0] + x_2 * weights[1] + biases[0]
print('The weighted sum of the inputs at the first node in the hidden layer is {}'.format(z_11))

z_12 = x_1 * weights[2] + x_2 * weights[3] + biases[1]
print('The weighted sum of the inputs at the second node in the hidden layer is {}'.format(np.around(z_12, decimals=4)))

x1 is 0.5 and x2 is 0.85
The weighted sum of the inputs at the first node in the hidden layer is 1.1135
The weighted sum of the inputs at the second node in the hidden layer is 1.6305


Assuming a sigmoid activation function, compute the activated values for the nodes.

In [7]:
a_11 = 1.0 / (1.0 + np.exp(-z_11))
print('The activation of the first node in the hidden layer is {}'.format(np.around(a_11, decimals=4)))

a_12 = 1.0 / (1.0 + np.exp(-z_12))
print('The activation of the second node in the hidden layer is {}'.format(np.around(a_12, decimals=4)))

The activation of the first node in the hidden layer is 0.7528
The activation of the second node in the hidden layer is 0.8362


Compute the output of the network as the activation of the node in the output layer.

In [8]:
z_2 = a_11 * weights[4] + a_12 * weights[5] + biases[2]
print('The weighted sum of the inputs at the node in the output layer is {}'.format(np.around(z_2, decimals=4)))

a_2 = 1.0 / (1.0 + np.exp(-z_2))
print('The output of the network for x1 = 0.5 and x2 = 0.85 is {}'.format(np.around(a_2, decimals=4)))

The weighted sum of the inputs at the node in the output layer is 1.827
The output of the network for x1 = 0.5 and x2 = 0.85 is 0.8614
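
The three cells above can be summarized as equations. The indexing $w_1, \dots, w_6$ and $b_1, b_2, b_3$ is an assumed notation that maps to `weights[0]` ... `weights[5]` and `biases[0]` ... `biases[2]` in the code; $\sigma$ denotes the sigmoid function.

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

$$z_{11} = x_1 w_1 + x_2 w_2 + b_1, \qquad a_{11} = \sigma(z_{11})$$

$$z_{12} = x_1 w_3 + x_2 w_4 + b_2, \qquad a_{12} = \sigma(z_{12})$$

$$z_2 = a_{11} w_5 + a_{12} w_6 + b_3, \qquad a_2 = \sigma(z_2)$$

With the values printed above, $z_{11} = 0.5 \cdot 0.37 + 0.85 \cdot 0.61 + 0.41 = 1.1135$, which matches the first printed weighted sum.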


## 1.3 Generalize the network-initialization process

Neural networks for real problems have many hidden layers and many more nodes in each layer, so computing the weighted sum and the activation of every node by hand does not scale.
<br>
Instead, we can write code that makes predictions automatically.<br>

A general network takes $n$ inputs, has many hidden layers, each with $m$ nodes (where $m$ may differ from layer to layer), and ends with an output layer.

Although the figure below shows one hidden layer, we will code the network to support many hidden layers. Similarly, although the figure shows an output layer with one node, we will code the network to support more than one node in the output layer.

<img src="http://cocl.us/general_neural_network" alt="Neural Network General" width="400px">

### 1.3.1 Build the network 

First, define the structure of the network.

In [9]:
# Define the structure of the network

n = 2                    # number of inputs
num_hidden_layers = 2    # number of hidden layers
m = [2, 2]               # number of nodes in each hidden layer
num_nodes_output = 1     # number of nodes in the output layer

#### - 1 Initialize the weights and biases

Initialize the weights and biases in the network to random numbers.

The logic is:
1. To initialize the weights and biases, we need to know how many numbers to generate, which depends on the number of nodes in each layer.
2. In fully connected layers, each node needs one weight for every node in the previous layer.
    1. We traverse the layers from the first hidden layer to the output layer.
    2. Number of weights per node: the number of nodes in the previous layer (which ranges from the input layer to the last hidden layer).
    3. Number of biases: one per node in the current layer (which ranges from the first hidden layer to the output layer).
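
The counting rule above can be checked with a small sketch for the structure defined earlier (2 inputs, hidden layers of 2 and 2 nodes, 1 output node). The names `layer_sizes` and `param_counts` are hypothetical helpers introduced only for this illustration.

```python
n = 2                   # number of inputs
m = [2, 2]              # number of nodes in each hidden layer
num_nodes_output = 1    # number of nodes in the output layer

# Sizes of all layers in order: input, hidden layers, output
layer_sizes = [n] + m + [num_nodes_output]

# Pair each layer with the one before it: (previous size, current size)
param_counts = []
for prev_size, cur_size in zip(layer_sizes[:-1], layer_sizes[1:]):
    num_weights = prev_size * cur_size   # one weight per (previous node, current node) pair
    num_biases = cur_size                # one bias per node in the current layer
    param_counts.append((num_weights, num_biases))

print(param_counts)   # [(4, 2), (4, 2), (2, 1)]
```

Each tuple is (weights, biases) for one layer, listed from the first hidden layer to the output layer.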
    

In [10]:
import numpy as np


num_nodes_previous = n      # number of nodes in the previous layer, initialized by input nodes

network = {}                # initialize network in an empty dictionary

# Loop through each layer and randomly initialize the weights and biases associated with each node
# Adding 1 to the number of hidden layers in order to include the output layer

for layer in range(num_hidden_layers + 1):
    
    # determine name of the layer & num_nodes in this layer
    if layer == num_hidden_layers:
        # it's the last layer we iterate - output layer
        layer_name = 'output'
        num_nodes = num_nodes_output
    else:
        layer_name = 'layer_{}'.format(layer + 1)  # naming
        num_nodes = m[layer]
        
    # initialize weights and biases associated with each node in the current layer
    network[layer_name] = {}        # store all weights and biases for this layer
    for node in range(num_nodes):
        node_name = 'node_{}'.format(node + 1)
        network[layer_name][node_name] = {
            'weights': np.around(np.random.uniform(size=num_nodes_previous), decimals = 2),
            'biases': np.around(np.random.uniform(size=1), decimals = 2)
        }
    
    # Update layer nodes to the next one
    num_nodes_previous = num_nodes
    
print(network)

{'layer_1': {'node_1': {'weights': array([0.29, 0.1 ]), 'biases': array([0.82])}, 'node_2': {'weights': array([0.94, 0.04]), 'biases': array([0.89])}}, 'layer_2': {'node_1': {'weights': array([0.61, 0.51]), 'biases': array([0.04])}, 'node_2': {'weights': array([0.35, 0.58]), 'biases': array([0.07])}}, 'output': {'node_1': {'weights': array([0.67, 0.93]), 'biases': array([0.11])}}}


Initialization for network weights and biases is done.

Put the initialization process into a function: `initialize_network()`

In [15]:
def initialize_network(num_inputs, num_hidden_layers, num_nodes_hidden, num_nodes_output):
    
    num_nodes_previous = num_inputs   # number of nodes in the previous layer, initialized by input nodes
    network = {}                      # initialize network in an empty dictionary

    # Loop through each layer and randomly initialize the weights and biases associated with each node

    for layer in range(num_hidden_layers + 1):
        
        # determine name of the layer & num_nodes in this layer
        if layer == num_hidden_layers:
            layer_name = 'output'    # the last layer is output layer
            num_nodes = num_nodes_output
        else:
            layer_name = 'layer_{}'.format(layer + 1)  
            num_nodes = num_nodes_hidden[layer]
            
        # initialize weights and biases associated with each node in the current layer
        network[layer_name] = {}        # store all weights and biases for this layer
        for node in range(num_nodes):
            node_name = 'node_{}'.format(node + 1)
            network[layer_name][node_name] = {
                'weights': np.around(np.random.uniform(size=num_nodes_previous), decimals = 2),
                'biases': np.around(np.random.uniform(size=1), decimals = 2)
            }
        
        # Update layer nodes to the next one
        num_nodes_previous = num_nodes
    
    return network

Use the `initialize_network()` function to create a network that:

1. takes 5 inputs
2. has three hidden layers
3. has 3 nodes in the first layer, 2 nodes in the second layer, and 3 nodes in the third layer
4. has 1 node in the output layer

Call the network **small_network**.

In [17]:
# Exercise:

small_network = initialize_network(5, 3, [3, 2, 3], 1)
print(small_network)

{'layer_1': {'node_1': {'weights': array([0.45, 0.51, 0.05, 0.51, 0.53]), 'biases': array([0.75])}, 'node_2': {'weights': array([0.38, 0.99, 0.88, 0.41, 0.33]), 'biases': array([0.3])}, 'node_3': {'weights': array([0.72, 0.74, 0.57, 0.65, 0.14]), 'biases': array([0.67])}}, 'layer_2': {'node_1': {'weights': array([0.61, 0.39, 0.12]), 'biases': array([0.45])}, 'node_2': {'weights': array([0.55, 0.18, 0.81]), 'biases': array([0.46])}}, 'layer_3': {'node_1': {'weights': array([0.56, 0.65]), 'biases': array([0.01])}, 'node_2': {'weights': array([0.37, 0.39]), 'biases': array([0.97])}, 'node_3': {'weights': array([0.72, 0.39]), 'biases': array([0.94])}}, 'output': {'node_1': {'weights': array([0.2 , 0.31, 0.68]), 'biases': array([0.77])}}}


#### - 2 Compute Weighted Sum at Each Node


Weighted sum at each node: the dot product of the inputs and the weights, plus the bias.

$z = \mathbf{w} \cdot \mathbf{x} + b$

We start with a single node.

In [24]:
def computed_weighted_sum(inputs, weights, biases):
    return np.sum(inputs * weights) + biases  # dot product

In [25]:
np.random.seed(12)  # seed NumPy's random generator so the inputs are reproducible

# Generate random inputs
inputs = np.around(np.random.uniform(size=5), decimals=2)    
print('The inputs to the network are {}'.format(inputs))

# Compute the weighted sum at the first node in the first hidden layer
# used the "small_network" defined above
weights = small_network['layer_1']['node_1']['weights']
biases = small_network['layer_1']['node_1']['biases']

weighted_sum = computed_weighted_sum(inputs, weights, biases)
print('The weighted sum at the first node in the first hidden layer is {}'.
      format(np.around(weighted_sum[0], decimals=4)))


The inputs to the network are [0.15 0.74 0.26 0.53 0.01]
The weighted sum at the first node in the first hidden layer is 1.4835
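
As a cross-check, the element-wise product followed by a sum gives the same number as `np.dot`, and stacking the weight vectors of a layer into a matrix computes every node of that layer in one step. The sketch below copies the inputs and the `layer_1` weights and biases from the outputs printed above; the names `W`, `b`, and `z_layer` are assumptions made for this illustration.

```python
import numpy as np

# Values copied from the printed outputs in this notebook
inputs = np.array([0.15, 0.74, 0.26, 0.53, 0.01])
W = np.array([
    [0.45, 0.51, 0.05, 0.51, 0.53],   # layer_1 / node_1
    [0.38, 0.99, 0.88, 0.41, 0.33],   # layer_1 / node_2
    [0.72, 0.74, 0.57, 0.65, 0.14],   # layer_1 / node_3
])
b = np.array([0.75, 0.30, 0.67])

# Single node: element-wise product then sum equals the dot product
z_sum = np.sum(inputs * W[0]) + b[0]
z_dot = np.dot(W[0], inputs) + b[0]
assert np.isclose(z_sum, z_dot)

# Whole layer at once: one matrix-vector product yields all three weighted sums
z_layer = W @ inputs + b
print(np.around(z_layer, decimals=4))   # first entry is 1.4835, as printed above
```

The per-node dictionary layout used in this notebook is easier to read; the matrix form is the more common choice once networks grow.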


In [26]:
# # AI generated code:
# def computed_weighted_sum(inputs, weights, biases):
#     """
#     Compute the weighted sum of inputs for a given layer.
    
#     Parameters:
#     inputs (list): List of inputs to the layer.
#     weights (list): List of weights for each node in the layer.
#     biases (list): List of biases for each node in the layer.
    
#     Returns:
#     list: Weighted sum for each node in the layer.
#     """
    
#     # Compute the weighted sum of inputs for each node
#     z = np.around(np.dot(weights, inputs) + biases, decimals=4)
    
#     return z

#### - 3 Compute Node Activation 单个node上的激活值

Define the non-linear transformation of the input using Sigmoid function.

In [28]:
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # sigmoid activation function; the output lies in (0, 1)

In [None]:
node_output = sigmoid(weighted_sum)  # apply sigmoid activation function

# Or, in one step: node_output = sigmoid(computed_weighted_sum(inputs, weights, biases))

print('The output of the first node in the first hidden layer is {}'.
      format(np.around(node_output[0], decimals=4)))

The output of the first node in the first hidden layer is 0.8151
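
A few properties make any sigmoid implementation easy to sanity-check: it returns 0.5 at zero, it always stays strictly between 0 and 1, and it satisfies $\sigma(-x) = 1 - \sigma(x)$. A value above 1, for instance, usually points to missing parentheses around the denominator. The sketch below uses its own local helper, `sigmoid_reference` (a hypothetical name), so it can run on its own.

```python
import numpy as np

def sigmoid_reference(x):
    """Logistic sigmoid, 1 / (1 + e^(-x)); local helper for this check only."""
    return 1.0 / (1.0 + np.exp(-x))

xs = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
ys = sigmoid_reference(xs)

assert np.isclose(sigmoid_reference(0.0), 0.5)          # midpoint
assert np.all((ys > 0.0) & (ys < 1.0))                  # always inside (0, 1)
assert np.allclose(ys + sigmoid_reference(-xs), 1.0)    # sigmoid(-x) = 1 - sigmoid(x)
assert np.all(np.diff(ys) > 0)                          # increasing in x
```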


### 1.3.2 Forward Propagation in the generalized network

Apply the `computed_weighted_sum` and `sigmoid` functions to each node in the network, propagating the data all the way to the output layer to produce a prediction for each node in the output layer.

The procedure is:

1. Start with the input layer as the input to the first hidden layer.
2. Compute the weighted sum at the nodes of the current layer.
3. Compute the output of the nodes of the current layer.
4. Set the output of the current layer to be the input to the next layer.
5. Move to the next layer in the network.
6. Repeat steps 2 - 5 until we compute the output of the output layer.

In [None]:
def forward_propagation(network, inputs):
    """
    Perform forward propagation through the network.
    
    Parameters:
    network (dict): The neural network structure with weights and biases.
    inputs (list): The input data to the network.
    
    Returns:
    (dict): The outputs of each layer in the network.
    """
    
    inputs = np.asarray(inputs, dtype=float)   # the input layer is the input to the 1st hidden layer
    outputs = {}                               # outputs of each layer, keyed by layer name
    
    for layer_name, layer in network.items():
        layer_outputs = []  # initialize layer outputs
        
        for node_name, node in layer.items():
            # Compute the weighted sum
            weighted_sum = computed_weighted_sum(inputs, node['weights'], node['biases'])
            
            # Apply activation function
            output = sigmoid(weighted_sum)
            
            # Store the output of the node
            layer_outputs.append(output[0])
        
        # Store the outputs of the current layer
        outputs[layer_name] = np.around(layer_outputs, decimals=4)
        
        # Update inputs for the next layer
        inputs = np.array(layer_outputs)
    

    
    return outputs
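
One way to gain confidence in the layer-by-layer procedure is to write the hand-computed network from section 1.2 in the dictionary layout and confirm that propagation reproduces the same prediction. The sketch below is self-contained: `weighted_sum_sketch`, `sigmoid_sketch`, `forward_sketch`, and `tiny_network` are hypothetical names, and the functions are compact stand-ins rather than the notebook's own definitions.

```python
import numpy as np

def weighted_sum_sketch(inputs, weights, bias):
    # Dot product of inputs and weights, plus the bias
    return np.dot(inputs, weights) + bias

def sigmoid_sketch(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward_sketch(network, inputs):
    """Propagate inputs through a {layer: {node: {'weights', 'biases'}}} dict."""
    activations = np.asarray(inputs, dtype=float)
    for layer in network.values():   # dicts keep insertion order (Python 3.7+)
        activations = np.array([
            sigmoid_sketch(weighted_sum_sketch(activations, node['weights'], node['biases'][0]))
            for node in layer.values()
        ])
    return activations

# The network from section 1.2: 2 inputs, one hidden layer with 2 nodes, 1 output node
tiny_network = {
    'layer_1': {
        'node_1': {'weights': np.array([0.37, 0.61]), 'biases': np.array([0.41])},
        'node_2': {'weights': np.array([0.95, 0.83]), 'biases': np.array([0.45])},
    },
    'output': {
        'node_1': {'weights': np.array([0.85, 0.75]), 'biases': np.array([0.56])},
    },
}

prediction = forward_sketch(tiny_network, [0.5, 0.85])
print(np.around(prediction, decimals=4))   # [0.8614]
```

The result agrees with the value 0.8614 obtained by hand in section 1.2 for $x_1 = 0.5$ and $x_2 = 0.85$.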