# Neural Network

Look at the neural network architecture in the image below. It contains 3 input layer nodes, 4 hidden layer nodes, and 2 output layer nodes. The connections between the input layer and the hidden layer therefore carry $3 \times 4 = 12$ weights ($W$), and the hidden layer has 4 biases ($b$), one per hidden node.

<img src="img/nn_single_layer.png">
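The parameter counts for this architecture can be checked with a few NumPy arrays. This is only a sketch: the names `W_hidden`, `b_hidden`, `W_output` and `b_output` are made up for illustration, and the hidden-to-output parameters assume that layer is fully connected as well, which the text above does not state.

```python
import numpy as np

# Layer sizes taken from the figure: 3 input, 4 hidden, 2 output nodes.
n_input, n_hidden, n_output = 3, 4, 2

# Input -> hidden parameters (the counts stated in the text).
W_hidden = np.zeros((n_input, n_hidden))   # 3 x 4 = 12 weights
b_hidden = np.zeros(n_hidden)              # 4 biases, one per hidden node

# Hidden -> output parameters (assumption: this layer is fully connected too).
W_output = np.zeros((n_hidden, n_output))  # 4 x 2 = 8 weights
b_output = np.zeros(n_output)              # 2 biases, one per output node

print("input -> hidden :", W_hidden.size, "weights,", b_hidden.size, "biases")
print("hidden -> output:", W_output.size, "weights,", b_output.size, "biases")
```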

The main parts of a neural network:

- **Input layer**: The entry point of the neural network. It is represented as nodes, and the number of nodes depends on the data to process.
- **Hidden layer**: The layer that processes the input, located between the input layer and the output layer. It is represented as nodes, and the number of nodes is chosen by the user.
- **Output layer**: The layer that produces the output. It is represented as nodes, and the number of nodes depends on the result or model that we want.
- **Weight**: A value attached to each connection between an input layer node and a hidden layer node, updated during training. The number of weights ($W$) is therefore `number of input layer nodes` $\times$ `number of hidden layer nodes`.
- **Bias**: The bias ($b$) is an additional value added to the weighted sum at each hidden layer node, so the number of biases depends on the number of hidden layer nodes.
- **Activation function**: As the name suggests by analogy with biological neurons, it is used to decide which neurons activate. In this scheme, that means deciding which hidden layer nodes activate. One activation is applied per node, so the number of activations equals `number of hidden layer nodes` $+$ `number of output layer nodes`. Many activation functions exist, such as linear, sigmoid, tanh, and softmax.
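The activation functions named in the list can each be written in a line or two of NumPy. The following is a minimal sketch rather than a reference implementation; the input vector `y` is a hypothetical set of weighted sums chosen for illustration.

```python
import numpy as np

def linear(y):
    # Identity: passes the weighted sum through unchanged.
    return y

def sigmoid(y):
    # Squashes each value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-y))

def tanh(y):
    # Squashes each value into the range (-1, 1).
    return np.tanh(y)

def softmax(y):
    # Turns a vector into positive values that sum to 1.
    # Subtracting the maximum first avoids overflow in exp.
    e = np.exp(y - np.max(y))
    return e / e.sum()

y = np.array([-1.0, 0.0, 2.0])  # hypothetical weighted sums
for fn in (linear, sigmoid, tanh, softmax):
    print(fn.__name__, fn(y))
```

Linear, sigmoid and tanh act on each node independently, while softmax normalizes across all nodes of a layer, which is why it is typically used on an output layer.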

## Learning Process

In a neural network, the learning process usually consists of these stages:

- Training
- Evaluation
- Testing (Optional)

During **training**, the weight $W$ and bias $b$ of each neuron (node) are updated until the expected result is reached. **Evaluation** runs alongside training: it checks whether the expected result has been reached, and if so, training stops. **Testing** applies the model produced by training and evaluation to new data (the test data).
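One common way to support these three stages is to split the available data into three disjoint subsets. The sketch below is an assumption for illustration only: the data, the 70/15/15 ratio, and the variable names are not taken from this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the split is repeatable

x = np.arange(20, dtype=float)   # hypothetical inputs
y = 3 * x + 2                    # hypothetical targets

# Shuffle the sample indices, then cut them into three disjoint parts.
idx = rng.permutation(len(x))
n_train = int(0.7 * len(x))      # about 70% for training (assumed ratio)
n_eval = int(0.15 * len(x))      # about 15% for evaluation (assumed ratio)

train_idx = idx[:n_train]
eval_idx = idx[n_train:n_train + n_eval]
test_idx = idx[n_train + n_eval:]  # the remainder is held out for testing

x_train, y_train = x[train_idx], y[train_idx]
x_eval, y_eval = x[eval_idx], y[eval_idx]
x_test, y_test = x[test_idx], y[test_idx]

print(len(x_train), len(x_eval), len(x_test))
```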

This section covers only the two steps of training:

- Forward Pass
- Backward Pass

**Forward pass**, also called forward propagation, is the process in which data is carried from the input through each neuron in the hidden layer to the output layer, where the error is later calculated.

$y_{j} = \sum_{i = 0}^{M-1} W_{ji}x_{i} + b_{j}$

$h_{j} = \theta(y_{j}) = \max(0, y_{j})$

The equations above are an example of a forward pass on the first architecture (see the architecture image above), using ReLU as the activation function $\theta$. Here $i$ indexes the $M$ nodes of the input layer (3 input nodes, indexed 0 to 2), $j$ indexes the nodes of the hidden layer, and $h_{j}$ is the output of hidden node $j$.
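The two equations can be sketched in NumPy as follows. The input, weight, and bias values are hypothetical, chosen only to make the arithmetic easy to follow, and `hidden_forward` is an illustrative name rather than a function used elsewhere in this notebook.

```python
import numpy as np

def relu(y):
    # ReLU activation: max(0, y), applied element-wise.
    return np.maximum(0, y)

def hidden_forward(x, W, b):
    # W[j, i] connects input node i to hidden node j, so W has shape (4, 3).
    y = W @ x + b          # y_j = sum_i W_ji * x_i + b_j
    return relu(y)         # h_j = max(0, y_j)

# Hypothetical values for illustration.
x = np.array([1.0, 2.0, 3.0])             # 3 input nodes
W = np.array([[1.0, 0.0,  0.0],
              [0.0, 1.0,  0.0],
              [0.0, 0.0, -1.0],
              [1.0, 1.0,  1.0]])          # 4 hidden nodes x 3 input nodes
b = np.array([0.0, -5.0, 1.0, 0.5])       # one bias per hidden node

h = hidden_forward(x, W, b)
print(h)  # hidden outputs: 1, 0, 0, 6.5
```

The weighted sums here are 1, -3, -2 and 6.5, so ReLU sets the second and third hidden outputs to zero.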

**Backward pass** takes the error obtained in the **forward pass** and uses it to update each weight $W$ and bias $b$, with the size of each update controlled by a learning rate.

These two steps are repeated until the weight $W$ and bias $b$ give the smallest possible error at the output layer (as measured during the forward pass).
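The repeated forward and backward passes can be sketched for a single linear neuron as below. This is not necessarily the training procedure used to produce the values later in this notebook: the mean squared error loss, the zero initialization, the learning rate, the number of iterations, and the training data (sampled from $f(x) = 3x + 2$) are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical training data drawn from f(x) = 3x + 2.
x = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = 3 * x + 2

W = np.zeros((1, 1))      # start from zero (assumed initialization)
b = np.zeros(1)
learning_rate = 0.05      # assumed value

for epoch in range(5000):
    # Forward pass: prediction and error.
    y_pred = np.dot(x, W) + b
    error = y_pred - y

    # Backward pass: gradients of the mean squared error.
    grad_W = 2 * np.dot(x.T, error) / len(x)
    grad_b = 2 * error.mean(axis=0)

    # Update step, scaled by the learning rate.
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b

print("Weight:", W)   # should end up close to 3
print("Bias:", b)     # should end up close to 2
```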

In this section, the forward pass is implemented using Python and NumPy, without a framework, to make it clearer. Later parts will use TensorFlow and Keras.

## Problem

Solve a regression problem where the actual equation is:

$f(x) = 3x + 2$

The goal is to obtain a model as close as possible to that equation:

$f(x) = \theta_{0}x + \theta_{1}$

That is, find the $\theta$ values closest to those of the actual equation. The neural network architecture consists of:
- 1 node on the input layer $\rightarrow$ $(x)$
- 1 node on the output layer $\rightarrow$ $f(x)$

The neural network has already been trained, and the forward pass will use the weight and bias obtained during that training.

## Forward Propagation

The forward pass below is very simple: a `dot` operation is performed between each input element and the weight $W$ connected to it, and the bias $b$ is added. The result of this operation is then passed to the activation function, which is linear (the identity) in this example.

In [1]:
import numpy as np

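# Forward pass with a linear (identity) activation: f(x) = xW + b
def forward_pass(inputs, weight, bias):
    # inputs: (n_samples, n_inputs), weight: (n_inputs, n_outputs), bias: (n_outputs,)
    w_sum = np.dot(inputs, weight) + bias

    return w_sum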

## Pre-Trained Weight

The weight and bias used here were obtained in a training process carried out beforehand. How to obtain these two values will be explained in the following parts.

In [2]:
# Pre-Trained Weights & Biases after Training
W = np.array([[2.99999928]])
b = np.array([1.99999976])

print("Weight: ", W)
print("Bias: ", b)

Weight:  [[2.99999928]]
Bias:  [1.99999976]


In [3]:
W.shape

(1, 1)

In [4]:
b.shape

(1,)

In [5]:
# Initialize Input Data
inputs = np.array([[7], [8], [9], [10]])

inputs

array([[ 7],
       [ 8],
       [ 9],
       [10]])

In [6]:
inputs.shape

(4, 1)

The calculation of the inputs and the weight $W$ is expressed in matrix form. The input layer actually contains only one neuron; in this case, each row of the matrix represents a different input sample $x$.
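To make the role of the rows concrete, the sketch below compares the matrix form with a per-sample loop, using the same `inputs`, `W` and `b` values as the cells above. The names `batched` and `one_by_one` are introduced only for this illustration.

```python
import numpy as np

inputs = np.array([[7], [8], [9], [10]])  # shape (4, 1): four samples, one input node
W = np.array([[2.99999928]])              # shape (1, 1): one input node, one output node
b = np.array([1.99999976])                # shape (1,)

# Matrix form: (4, 1) dot (1, 1) -> (4, 1); the bias is broadcast to every row.
batched = np.dot(inputs, W) + b

# Same result, computed one sample at a time.
one_by_one = np.array([np.dot(row, W) + b for row in inputs])

print(batched.shape)                      # (4, 1)
print(np.allclose(batched, one_by_one))   # True
```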

$f(x) = W x + b$

Since the weight and bias have already been trained, the **forward pass** looks like this (written in the same order as `np.dot(inputs, W)`, with the bias added to every row):

$f(x) = \begin{bmatrix}7 \\8 \\9 \\10 \end{bmatrix} \times [2.99999928] + [1.99999976]$

The trained values should be very close to those of the actual equation:

$f(x) = 3x + 2 \approx 2.99999928x + 1.99999976$

In [7]:
# Output of Output Layer
o_out = forward_pass(inputs, W, b)

print('Output Layer Output (Linear)')
print('============================')
print(o_out)

Output Layer Output (Linear)
[[22.99999472]
 [25.999994  ]
 [28.99999328]
 [31.99999256]]


In this experiment we predict the values for inputs 7, 8, 9 and 10. The expected outputs are 23, 26, 29 and 32, and the predicted results are 22.99999472, 25.999994, 28.99999328 and 31.99999256. The predictions still contain errors, but they are very small (below $10^{-5}$).
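The size of these errors can be measured directly. The sketch below recomputes the predictions with the same weight, bias and inputs as above and compares them with the actual equation; the choice of absolute error and mean squared error as metrics is an assumption made for illustration.

```python
import numpy as np

W = np.array([[2.99999928]])
b = np.array([1.99999976])
inputs = np.array([[7], [8], [9], [10]])

predicted = np.dot(inputs, W) + b   # forward pass with the trained parameters
expected = 3 * inputs + 2           # the actual equation f(x) = 3x + 2

abs_error = np.abs(expected - predicted)
mse = np.mean((expected - predicted) ** 2)

print(abs_error.ravel())
print("Max absolute error:", abs_error.max())
print("Mean squared error:", mse)
```

The absolute error grows slightly with $x$ because the small error in the weight is multiplied by the input.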