# Neural Network From Scratch. 🔨

## The goal of this project is to implement a Neural Network from scratch without using python libraries for it.

## Base Layer:

The base layer will be a class named "Layer" and it has two attributes:
- **Input**
- **Output**

As well as two methods:
- **Forward:** Takes in the input and gives you the output.
- **Backward:** Takes in the derivative of the error with respect to the output, which is gonna be called "Output gradient" and is responsible for two things:
  - Updating the trainable parameters, if any.
  - Returning the derivative of the error with respect to the input of the layer.


In [14]:
class Layer:
  def __init__(self):
    self.input = None
    self.output = None

  def forward(self, input):
    pass

  def backward(self, output_gradient, learning_rate):
    pass

## Dense Layer:

The Dense Layer (Or Fully Connected Layer) will connect a set of *i* input neurons to *j* output neurons. The input will be denoted as *x* and the output as *y*. Each input neuron is connected to every output neuron and each connection represents a "weight", noted by 𝑤ⱼᵢ, meaning the weight that connects the output neuron *j* to the input neuron *i*. Every output value is computed as the sum of all the inputs multiplied by the weights connecting them to that specific output, plus the bias, as shown by the equations:
<br><br>
$$
  y_{1} = x_{1}w_{11} + x_{2}w_{12} + \dots + x_{i}w{1i} + b_{1}\\
  y_{2} = x_{1}w_{21} + x_{2}w_{22} + \dots + x_{i}w{2i} + b_{2}\\
  y_{3} = x_{1}w_{31} + x_{2}w_{32} + \dots + x_{i}w{3i} + b_{3}\\
  y_{j} = x_{1}w_{j1} + x_{2}w_{j2} + \dots + x_{i}w{ji} + b_{j}
$$
<br>
This equation can be turned into a matrix multiplication, represented by:
<br><br>
$$\begin{bmatrix}
  y_{1} \\
  y_{2} \\
  \vdots \\
  y_{j} \\
\end{bmatrix} =
\begin{bmatrix}
  w_{11} & w_{12} & \dots & w_{1i} \\
  w_{21} & w_{22} & \dots & w_{2i} \\
  \vdots & \vdots & \ddots & \vdots \\
  w_{j1} & w_{j2} & \dots & w_{ji} \\
\end{bmatrix}
\begin{bmatrix}
  x_{1} \\
  x_{2} \\
  \vdots \\
  x_{i} \\
\end{bmatrix} +
\begin{bmatrix}
  b_{1} \\
  b_{2} \\
  \vdots \\
  b_{j} \\
\end{bmatrix}$$
<br>
And finally, the equation can be simplified to:
<br><br>
$$ Y = W ⋅ X + B $$




In [2]:
import numpy as np

class Dense(Layer):
  def __init__ (self, input_size, output_size):
    self.weights = np.random.randn(output_size, input_size)
    self.bias = np.random.randn(output_size, 1)

  def forward(self, input):
    self.input = input
    return np.dot(self.weights, self.input) + self.bias

  def backward(self, output_gradient, learning_rate):
    weights_gradient = np.dot(output_gradient, self.input.T)
    self.weights -= learning_rate * output_gradient
    return np.dot(self.weights.T, output_gradient)

## Activation Layer:

The activation layer takes in input neurons and passes them through an activation function, therefore, the output has the same shape as the input.

The forward propagation can be represented by:

$$ Y = f(X) $$
<br>
Where f is the activation function and f of X means that you apply the function f to every element of X.

In [4]:
class Activation(Layer):
  def __init__(self, activation, activation_prime):
    self.activation = activation
    self.activation_prime = activation_prime

  def forward(self,input):
    self.input = input
    return self.activation(self.input)

  def backward(self, output_gradient, learning_rate):
    return np.multiply(output_gradient, self.activation_prime(self.input))

##Hyperbolic Tangent:

In [16]:
class Tanh(Activation):
  def __init__(self):
    tanh= lambda x: np.tanh(x)
    tanh_prime = lambda x: 1 - np.tanh(x) ** 2
    super().__init__(tanh, tanh_prime)

##Loss function (Mean Squared Error):

In [19]:
def mse(y_true, y_pred):
  return np.mean(np.power(y_true - y_pred, 2))

def mse_prime(y_true, y_pred):
  return 2 * (y_pred - y_true) / np.size(y_true)

## Solving the XOR problem with the Neural Network:

In [12]:
X = np.reshape([[0,0],[0,1],[1,0],[1,1]], (4,2,1))
Y = np.reshape([[0],[1],[1],[0]], (4,1,1))

In [17]:
network = [
    Dense(2,3),
    Tanh(),
    Dense(3,1),
    Tanh()
]


In [20]:
epochs = 10000
learning_rate = 0.1

# Train
for e in range(epochs):
  error = 0
  for x, y in zip(X, Y):
    # Forward
    output = x
    for layer in network:
      output = layer.forward(output)

    # Error
    error += mse(y, output)

    # Backward
    grad = mse_prime(y, output)
    for layer in reversed(network):
      grad = layer.backward(grad, learning_rate)

  error /= len(x)
  print('%d/%d, error=%f' % (e + 1, epochs, error))

[1;30;43mA saída de streaming foi truncada nas últimas 5000 linhas.[0m
5001/10000, error=0.588620
5002/10000, error=0.588620
5003/10000, error=0.588620
5004/10000, error=0.588620
5005/10000, error=0.588620
5006/10000, error=0.588620
5007/10000, error=0.588620
5008/10000, error=0.588620
5009/10000, error=0.588620
5010/10000, error=0.588620
5011/10000, error=0.588620
5012/10000, error=0.588620
5013/10000, error=0.588620
5014/10000, error=0.588620
5015/10000, error=0.588620
5016/10000, error=0.588620
5017/10000, error=0.588620
5018/10000, error=0.588620
5019/10000, error=0.588620
5020/10000, error=0.588620
5021/10000, error=0.588620
5022/10000, error=0.588620
5023/10000, error=0.588620
5024/10000, error=0.588620
5025/10000, error=0.588620
5026/10000, error=0.588620
5027/10000, error=0.588619
5028/10000, error=0.588619
5029/10000, error=0.588619
5030/10000, error=0.588619
5031/10000, error=0.588619
5032/10000, error=0.588619
5033/10000, error=0.588619
5034/10000, error=0.588619
5035/1000