<a href="https://colab.research.google.com/github/r-moret/neural_networks_from_scratch/blob/main/basic_neural_network.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Basic Neural Network Implementation
----------------------------------

In this notebook I implement the most basic possible neural network, going through each of its steps:

1. Initialization (`initialize_nn`)
2. Forward propagation (`forward_propagation`)
3. Compute cost (`compute_cost`)
4. Backpropagation
5. Update weights

First, let's import the necessary libraries:

In [None]:
import numpy as np

## Network initialization
To initialize the network we need to create matrices to store the weights and biases of the neurons, which we'll use later in our computations. Their shapes are $W_i \equiv (n_i \times n_{i-1})$ and $b_i \equiv (n_i \times 1)$, where $n_i$ is the number of neurons in layer $i$ (they could be transposed, but I find this form clearer).

$$
W_i = \begin{bmatrix} 
    w_{1,1} & w_{1,2} & \dots & w_{1,n_{i-1}} \\
    w_{2,1} & w_{2,2} & \dots & w_{2,n_{i-1}} \\
    \vdots & \vdots & \ddots & \vdots \\
    w_{n_i,1} & w_{n_i,2} & \dots & w_{n_i,n_{i-1}}
    \end{bmatrix} \qquad
b_i = \begin{bmatrix}
     b_1 \\
     \vdots \\
     b_{n_i}
     \end{bmatrix} =
     \begin{bmatrix}
     0 \\
     \vdots \\
     0
     \end{bmatrix}
$$

To **break the symmetry**, the weights need to be initialized **randomly**; the biases, on the other hand, can be initialized to zero.
If we did not break the symmetry, every neuron in a layer would compute the same output and receive the same gradient update, so the network would not be able to learn anything useful.
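
To make the symmetry problem concrete, here is a minimal sketch comparing a hidden layer whose neurons all share the same weights with a randomly initialized one. The names `W_same`, `W_rand`, and the constant `0.5` are illustrative assumptions, not part of the notebook's code.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

# Column-vector input with 2 features, shape (2, 1)
x = np.array([[1.0], [2.0]])
b = np.zeros((3, 1))

# Symmetric initialization (assumed constant 0.5): every neuron has the same weights
W_same = np.full((3, 2), 0.5)
A_same = relu(W_same @ x + b)

# Random initialization: each neuron starts from different weights
rng = np.random.default_rng(0)
W_rand = rng.random((3, 2))
A_rand = relu(W_rand @ x + b)

print(A_same.ravel())  # three identical activations
print(A_rand.ravel())  # three different activations
```

With `W_same`, the three hidden neurons are indistinguishable, so the layer carries no more information than a single neuron would.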

In [None]:
def initialize_nn(architecture):
    np.random.seed(0)
    
    # num of hidden layers + output layer
    num_layers = len(architecture)-1
    # weights and bias
    # W1, W2, ... 
    # b1, b2, ...
    parameters = dict()

    for l in range(num_layers):

        parameters["W" + str(l+1)] = np.random.rand(architecture[l+1], architecture[l])
        parameters["b" + str(l+1)] = np.zeros((architecture[l+1], 1))

    return parameters

## Forward propagation step

First, the activation functions I'm using in this example are ReLU and sigmoid. I'm coding them by hand to make the forward step more readable.

In [None]:
def relu(x):
    return np.maximum(0,x)

def sigmoid(x):
    return 1./(1+np.exp(-x))
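
As a side note, `np.exp(-x)` can overflow for large negative inputs, which produces a runtime warning even though the result still rounds to 0. The sketch below shows one possible alternative built on `np.logaddexp`; the name `stable_sigmoid` is hypothetical and this is not the version used in the rest of the notebook.

```python
import numpy as np

def sigmoid(x):
    # Same definition as in the notebook
    return 1. / (1 + np.exp(-x))

def stable_sigmoid(x):
    # Hypothetical variant: sigmoid(x) = exp(-log(1 + exp(-x))),
    # where np.logaddexp(0, -x) computes log(1 + exp(-x)) without overflow
    return np.exp(-np.logaddexp(0.0, -x))

xs = np.array([-5.0, 0.0, 5.0])
print(sigmoid(xs))
print(stable_sigmoid(xs))                   # matches sigmoid on moderate inputs
print(stable_sigmoid(np.array([-1000.0])))  # extreme input, no overflow in exp
```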

Next, to implement the forward step, I compute the output of each layer using the previous layer's activation values as input. For example, for layer 1:

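$$
\begin{aligned}
Z_1 &= W_1 A_0 + b_1 \\
A_1 &= g_1(Z_1)
\end{aligned}
$$

$A_0$: the network input, treated as the "activation value" of the input layer

$g_i$: activation function used in the neurons of layer $i$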
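
As a small worked example of the layer 1 computation, the sketch below uses hand-picked values for `W1`, `b1`, and `A0` (illustrative assumptions, not the notebook's initialized parameters) with ReLU as $g_1$.

```python
import numpy as np

# Illustrative values: 3 neurons in layer 1, 2 input features
W1 = np.array([[ 1.0, -1.0],
               [ 0.5,  0.5],
               [-2.0,  1.0]])   # shape (3, 2)
b1 = np.zeros((3, 1))           # shape (3, 1)
A0 = np.array([[1.0], [2.0]])   # shape (2, 1)

Z1 = W1 @ A0 + b1               # (3, 2) @ (2, 1) + (3, 1) -> (3, 1)
A1 = np.maximum(0, Z1)          # ReLU zeroes out the negative potential

print(Z1.ravel())  # values -1, 1.5, 0
print(A1.ravel())  # values 0, 1.5, 0
```

Each row of `W1` holds the weights of one neuron, which is why the product yields one potential per neuron.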

In [None]:
def forward_propagation(parameters, inputs):
    # each layer contributes one W and one b entry
    num_layers = len(parameters)//2
    # Z: synaptic potentials
    # A: activation values
    Z = dict()
    A = dict()

    # the input value is the "activation value" of the input layer
    A["A" + str(0)] = np.reshape(inputs, (len(inputs),1))
    
    # hidden layers: relu() (the output layer is handled separately below)
    for l in range(num_layers-1):

        Z["Z" + str(l+1)] = np.add(np.matmul(parameters["W" + str(l+1)], A["A" + str(l)]), parameters["b" + str(l+1)])
        A["A" + str(l+1)] = relu(Z["Z" + str(l+1)])
    
    # last layer: sigmoid()
    Z["Z" + str(num_layers)] = np.add(np.matmul(parameters["W" + str(num_layers)], A["A" + str(num_layers-1)]), parameters["b" + str(num_layers)])
    A["A" + str(num_layers)] = sigmoid(Z["Z" + str(num_layers)])

    return A["A" + str(num_layers)]

## Computing cost

In this simple implementation I'm using the **Cross Entropy** function to compute the cost of the network.

$$
J = -\left(y \cdot \log(\hat y) + (1-y) \cdot \log(1-\hat y)\right)
$$

$y$: target value (`real_output`) \\
$\hat y$: predicted value (`prediction`) \\
$\log(x)$: natural logarithm of $x$
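
To get a feel for the formula, the sketch below evaluates it for a confident correct prediction and a confident wrong one. It also clips the prediction away from 0 and 1 so the logarithm stays finite; the function name `cross_entropy` and the `eps` parameter are assumptions added for illustration, not part of the notebook's `compute_cost`.

```python
import numpy as np

def cross_entropy(prediction, real_output, eps=1e-12):
    # Assumed safeguard: keep the prediction inside (0, 1) so log() never sees 0
    prediction = np.clip(prediction, eps, 1 - eps)
    return -(real_output * np.log(prediction)
             + (1 - real_output) * np.log(1 - prediction))

good = cross_entropy(0.8, 1)   # target 1, prediction close to 1: low cost
bad = cross_entropy(0.8, 0)    # target 0, prediction close to 1: high cost
edge = cross_entropy(0.0, 1)   # would be infinite without clipping

print(good, bad, edge)
```

The cost grows as the prediction moves away from the target, which is the behavior the cost needs in order to penalize wrong outputs.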

In [None]:
# cross entropy cost
def compute_cost(prediction, real_output):
    cost = -(real_output * np.log(prediction) + (1-real_output) * np.log(1-prediction))
    return cost

-----------------------------

In [None]:
arch = [2, 3, 1]
x = np.array([1, 2])
real_y = np.array([1])

network = initialize_nn(arch)
y = forward_propagation(network, x)
cost = compute_cost(y, real_y)

for l in range(len(arch)-1):
    print("W" + str(l+1) + ": " + str(network["W" + str(l+1)]))

print(y)
print(cost)