# Neural Network (Rede Neural)
**[EN-US]**

Each layer ($l$) passes a vector of numbers to the next layer. The neurons of that layer apply a linear function ($\vec{z}$) to the vector, followed by a non-linear activation function ($g(\vec{z})$), and the resulting vector is passed on to the following layer, which performs the same calculation. The vector moves from layer to layer in this way until it reaches the output layer, whose result is the neural network's prediction.

We use the notation $a^{[0]} = X$: the matrix $X$ of the input layer, the initial layer of our neural network, is treated as $a^{[0]}$, the activation of layer 0.

**[PT-BR]**

Cada layer ($l$) passa um vetor de números à próxima layer. Os neurônios dessa layer aplicam ao vetor uma função linear ($\vec{z}$) e, em seguida, uma função de ativação não linear ($g(\vec{z})$), e o vetor resultante é passado para a layer seguinte, que realiza o mesmo cálculo. O vetor segue assim de layer em layer até chegar à output layer, cujo resultado é a previsão da rede neural.

Usamos a notação $a^{[0]} = X$: a matriz $X$ da input layer, a layer inicial da nossa rede neural, é tratada como $a^{[0]}$, a ativação da layer 0.
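
The layer-by-layer computation described above can be written compactly. As a sketch, assuming $\mathbf{w}_j^{[l]}$ and $b_j^{[l]}$ denote the weights and bias of neuron $j$ in layer $l$, and $L$ the number of layers (symbols introduced here for illustration only):

$$z_j^{[l]} = \mathbf{w}_j^{[l]} \cdot a^{[l-1]} + b_j^{[l]}, \qquad a^{[l]} = g\left(\vec{z}^{[l]}\right), \qquad l = 1, \dots, L$$

Here $a^{[l-1]}$ is the activation vector of the previous layer for one example (one row of $X$ when $l = 1$), and the prediction is $a^{[L]}$.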

## Table of Contents
* [Libraries](#Libraries)
* [Simple Neural Network](#Simple-Neural-Network-(Rede-Neural-Simples))
    * [Activations](#Activations-(Ativações))
    * [Dense Layer](#Dense-Layer-(Câmada-Densa))
    * [Sequential](#Sequential-(Sequencial))
* [Vectorized Implementation](#Vectorized-Implementation-(Implementação-Vetorizada))

## Libraries

In [3]:
import numpy as np

## Simple Neural Network (Rede Neural Simples)
### Activations (Ativações)
**[EN-US]**

`Hidden layer`: the name reflects the fact that the training set does not contain the true values of the `hidden units`. We see the inputs and what the outputs should be, but the values inside the hidden layers are never observed in the training set.

`Activations`: refers to the values that different layers of the neural network are passing to subsequent layers. Activation functions:

**[PT-BR]**

`Hidden layer`: refere-se ao fato de que no training set os valores verdadeiros para essas `hidden units` não são observados, ou seja, não vemos o que eles deveriam ser no training set. Vemos quais são os inputs e os outputs, o que os outputs deveriam ser, mas as coisas nas hidden layers não são vistas no training set.

`Activations`: refere-se aos valores que diferentes layers da rede neural estão passando para as layers subsequentes. Funções de ativação:
$$z = \mathbf{w} \cdot \mathbf{x} + b$$
- linear/no activation (sem ativação):
$$a = g(z) = z$$
- sigmoid:
$$a = g(z) = \frac{1}{1 + e^{-z}}$$
- tanh:
$$a = g(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}$$
- ReLU (Rectified Linear Unit):
$$a = g(z) = \max(0,\ z)$$
- Leaky ReLU:
$$a = g(z) = \max(0.01 z,\ z)$$
- softmax:
$$a_j = g(z_j) = \frac{e^{z_j}}{\sum\limits_{k=0}^{N-1}e^{z_k}}$$
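
The element-wise formulas above translate almost directly into NumPy. The following is a minimal sketch, not part of the original notebook: the function names and the sample vector `z` are chosen here for illustration, and `slope=0.01` is simply the constant from the Leaky ReLU formula.

```python
import numpy as np

# Hypothetical helper names; each body mirrors one formula from the list above.
def linear(z):
    return z

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def tanh(z):
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

def leaky_relu(z, slope=0.01):
    return np.maximum(slope * z, z)

# Apply every activation to the same sample vector to compare their shapes
z = np.array([-2.0, 0.0, 3.0])
for g in (linear, sigmoid, tanh, relu, leaky_relu):
    print(f'{g.__name__:>10}: {g(z)}')
```

All five functions work on scalars and on arrays alike, because every NumPy operation used is applied element by element.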

In [None]:
def softmax(z):
    ez = np.exp(z)
    a = ez / np.sum(ez)
    return a
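
For large inputs, `np.exp(z)` overflows to `inf` and the direct softmax formula returns `nan`. A common remedy is to subtract the largest entry of `z` before exponentiating, which leaves the result unchanged because the extra factor cancels in the ratio. The sketch below illustrates this; the name `softmax_stable` and the sample values are assumptions made for the example.

```python
import numpy as np

def softmax_stable(z):
    # Shifting by max(z) does not change the ratio e^{z_j} / sum_k e^{z_k},
    # but the largest exponent becomes 0, so np.exp cannot overflow.
    ez = np.exp(z - np.max(z))
    return ez / np.sum(ez)

# Values this large make np.exp(z) overflow in the unshifted formula
z = np.array([1000.0, 1001.0, 1002.0])
a = softmax_stable(z)
print(a)
print(a.sum())
```

The result is the same as the softmax of `[0, 1, 2]`, since only the differences between the entries of `z` matter.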

### Dense Layer (Câmada Densa)
**[EN-US]**

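A layer is dense because every feature in the input layer is connected to every neuron in the next layer, and the same holds for each following pair of layers.
* It is called either a Dense or a Fully Connected (FC) layer. A Flatten layer is a different thing: it only reshapes its input into a vector and has no weights.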

**[PT-BR]**

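Uma layer é densa porque todas as features da input layer estão conectadas a cada neurônio da próxima layer, e o mesmo vale para cada par de layers seguinte.
* Podemos chamá-la de Dense ou de Fully Connected (FC) layer. Uma Flatten layer é outra coisa: ela apenas transforma o input em um vetor e não tem pesos.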

In [1]:
def dense(a_in, W, b, activation='linear'):
    units = W.shape[1]
    z = np.zeros(units)

    # Linear part: one dot product per unit (column j of W holds the weights of unit j)
    for j in range(units):
        w = W[:, j]
        z[j] = np.dot(w, a_in) + b[j]

    # The activation is applied to the whole vector z,
    # because softmax needs every z_j of the layer to normalize
    if activation == 'sigmoid':
        a_out = 1 / (1 + np.exp(-z))
    elif activation == 'tanh':
        a_out = (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))
    elif activation == 'relu':
        a_out = np.maximum(0, z)
    elif activation == 'leaky_relu':
        a_out = np.maximum(0.01 * z, z)
    elif activation == 'softmax':
        a_out = np.exp(z) / np.sum(np.exp(z))
    else:  # 'linear': no activation
        a_out = z
    return a_out

### Sequential (Sequencial)
**[EN-US]**

`sequential` chains the layers: the output of each layer is the input of the next.


**[PT-BR]**

`sequential` encadeia as layers: o output de cada layer é o input da próxima.

In [58]:
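def sequential(X, W1, b1, W2, b2, dense):
    # dense is called with its default activation ('linear') in both layers
    a1 = dense(X, W1, b1)   # hidden layer
    a2 = dense(a1, W2, b2)  # output layer
    return a2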

In [62]:
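def predict(X, W1, b1, W2, b2, sequential, dense):
    m, _ = X.shape
    yhat = np.zeros((m, 1))

    # One forward pass per example (row of X)
    for i in range(m):
        # sequential returns an array of shape (1,), so take its single element
        yhat[i, 0] = sequential(X[i], W1, b1, W2, b2, dense)[0]
    return yhat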

In [63]:
W1 = np.array( [[-8.93,  0.29, 12.9 ], [-0.1,  -7.32, 10.81]] )
b1 = np.array( [-9.82, -9.28,  0.96] )
W2 = np.array( [[-31.18], [-27.59], [-32.56]] )
b2 = np.array( [15.41] )
X = np.array([
    [200,13.9],
    [200,17]])

In [None]:
yhat = predict(X, W1, b1, W2, b2, sequential, dense)
print(f'decisions = {yhat}')

## Vectorized Implementation (Implementação Vetorizada)

In [2]:
def dense_vectorized(A_in, W, B, activation='linear'):
    # A_in: (m, n), one example per row; W: (n, units); B: (units,) or (1, units)
    Z = np.matmul(A_in, W) + B  # Ou/Or: Z = A_in @ W + B

    if activation == 'sigmoid':
        A_out = 1 / (1 + np.exp(-Z))
    elif activation == 'tanh':
        A_out = (np.exp(Z) - np.exp(-Z)) / (np.exp(Z) + np.exp(-Z))
    elif activation == 'relu':
        A_out = np.maximum(0, Z)
    elif activation == 'leaky_relu':
        A_out = np.maximum(0.01 * Z, Z)
    elif activation == 'softmax':
        # Normalize each example (row) separately, not the whole matrix
        ez = np.exp(Z)
        A_out = ez / np.sum(ez, axis=-1, keepdims=True)
    else:  # 'linear': no activation
        A_out = Z
    return A_out
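
To see why a single matrix product can replace the loops, the sketch below computes the linear part of one layer twice, once unit by unit and once with `@`, and checks that the results agree. The shapes, the random data and the variable names are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A_in = rng.normal(size=(4, 2))  # 4 examples, 2 features each
W = rng.normal(size=(2, 3))     # 2 inputs, 3 units (one column per unit)
b = rng.normal(size=3)          # one bias per unit

# Loop version: one example and one unit at a time
Z_loop = np.zeros((4, 3))
for i in range(4):
    for j in range(3):
        Z_loop[i, j] = np.dot(W[:, j], A_in[i]) + b[j]

# Vectorized version: one matrix product; b is broadcast across the rows
Z_vec = A_in @ W + b

print(Z_vec.shape)                 # (4, 3)
print(np.allclose(Z_loop, Z_vec))  # True
```

Each row of `Z_vec` holds the three `z` values for one example, so the whole batch goes through the layer in a single call.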