# Neural Networks

## Creating a neural network

A typical training procedure for a neural network is as follows:

1. **Define the neural network that has some learnable parameters (or weights)**
2. **Iterate over a dataset of inputs**
3. **Process input through the network**
4. **Compute the loss (how far is the output from being correct)**
5. Propagate gradients back into the network’s parameters
6. Update the weights of the network, typically using a simple update rule: `weight = weight - lr * gradient`

Previously, we defined a linear classifier by hand, creating two learnable weights (`w_a` and `w_b`) and updating them manually with gradient descent. Now let's see how PyTorch offers a module abstraction, so that we only need to worry about defining our neural network (the straight line from before).
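As a reminder, the manual approach could look roughly like the sketch below. The toy data, the learning rate, and the number of steps are assumptions made for illustration; only the names `w_a` and `w_b` come from the text.

```python
import torch

torch.manual_seed(0)

# Toy data (assumed): points on the line y = 2x + 1
x = torch.linspace(-1, 1, 20)
y = 2 * x + 1

# Two learnable weights: slope (w_a) and intercept (w_b)
w_a = torch.zeros(1, requires_grad=True)
w_b = torch.zeros(1, requires_grad=True)

lr = 0.1
for step in range(200):
    y_hat = w_a * x + w_b              # forward pass
    loss = ((y_hat - y) ** 2).mean()   # mean squared error
    loss.backward()                    # compute gradients
    with torch.no_grad():              # manual gradient descent update
        w_a -= lr * w_a.grad
        w_b -= lr * w_b.grad
    w_a.grad.zero_()                   # reset gradients for the next step
    w_b.grad.zero_()

print(w_a.item(), w_b.item())
```

Everything in this loop except the forward pass is bookkeeping, which is what `nn.Module` and the optimizers take over.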


In [25]:
import torch
import torch.nn as nn


class MyMLP(nn.Module):
    
    def __init__(self, vocab_size, emb_size, hidden_size, nb_classes):
        super().__init__()
        self.emb_layer = nn.Embedding(vocab_size, emb_size)
        self.hidden_layer = nn.Linear(emb_size, hidden_size)
        self.out_layer = nn.Linear(hidden_size, nb_classes)
        self.dropout_hidden = nn.Dropout(0.5)
    
    def forward(self, x, nonlinearity='sigmoid'):
        x = self.emb_layer(x)
        x = torch.sigmoid(x) if nonlinearity == 'sigmoid' else torch.tanh(x)
        x = self.hidden_layer(x)
        x = self.dropout_hidden(x)
        print(x)
        print(x.shape)
        x = torch.sigmoid(x) if nonlinearity == 'sigmoid' else torch.tanh(x)
        x = self.out_layer(x)
        x = torch.softmax(x, dim=-1)
        return x
    

# A class that inherits from nn.Module is treated as a PyTorch module

class Net(nn.Module):  # an MLP with one hidden layer!

    def __init__(self, num_features, hidden_size, nb_classes):
        super(Net, self).__init__()
        self.in_layer = nn.Linear(num_features, hidden_size)
        self.out_layer = nn.Linear(hidden_size, nb_classes)

    def forward(self, x):
        # Hidden layer followed by a ReLU nonlinearity
        x = self.in_layer(x)
        x = torch.relu(x)
        x = self.out_layer(x)
        x = torch.sigmoid(x)
        return x
    
#     def backward(self, x):
#         #  no need to implement this; autograd derives it from forward
#         pass

mymlp = MyMLP(100, 50, 20, 2)
net = Net(64*64, 120, 10)
print(mymlp)
print(net)

MyMLP(
  (emb_layer): Embedding(100, 50)
  (hidden_layer): Linear(in_features=50, out_features=20, bias=True)
  (out_layer): Linear(in_features=20, out_features=2, bias=True)
  (dropout_hidden): Dropout(p=0.5)
)
Net(
  (in_layer): Linear(in_features=4096, out_features=120, bias=True)
  (out_layer): Linear(in_features=120, out_features=10, bias=True)
)
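For a plain stack of layers such as `Net`, the same architecture can also be written with `nn.Sequential`. This is only an alternative sketch, not the way the notebook defines its models; the name `seq_net` is made up for the example.

```python
import torch
import torch.nn as nn

# Same layer sizes as Net(64*64, 120, 10), expressed as a sequence of modules
seq_net = nn.Sequential(
    nn.Linear(64 * 64, 120),
    nn.ReLU(),
    nn.Linear(120, 10),
    nn.Sigmoid(),
)

x = torch.randn(1, 64 * 64)   # a batch with one flattened 64x64 input
out = seq_net(x)
print(out.shape)
```

Subclassing `nn.Module` remains the better fit when `forward` needs arguments or branching, as in `MyMLP` with its `nonlinearity` option.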


The parameters (weights) that must be trained for our network (module) can be accessed with `.parameters()`

In [6]:
params = list(net.parameters())
print(len(params))
print(params[0].size())  # in_layer size

4
torch.Size([120, 4096])
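The four tensors counted above are the weight and bias of each of the two linear layers. A small sketch using `named_parameters()` makes that explicit; `Net` is redefined here only so the snippet runs on its own.

```python
import torch
import torch.nn as nn


class Net(nn.Module):  # same architecture as the Net defined above
    def __init__(self, num_features, hidden_size, nb_classes):
        super().__init__()
        self.in_layer = nn.Linear(num_features, hidden_size)
        self.out_layer = nn.Linear(hidden_size, nb_classes)

    def forward(self, x):
        x = torch.relu(self.in_layer(x))
        return torch.sigmoid(self.out_layer(x))


net = Net(64 * 64, 120, 10)

# named_parameters() yields (name, tensor) pairs in registration order
names = []
for name, p in net.named_parameters():
    names.append(name)
    print(name, tuple(p.shape))

# Total number of trainable scalars in the network
n_params = sum(p.numel() for p in net.parameters() if p.requires_grad)
print(n_params)
```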


Let's feed a random input through the network, even without training, just to see what happens

In [7]:
input = torch.randn(1, 64*64)  # the first dim is the batch_size
out = net(input)
print(out)
print(torch.max(out, dim=-1))

tensor([[0.5136, 0.4990, 0.5216, 0.5382, 0.5776, 0.5110, 0.5005, 0.5645, 0.4996,
         0.4662]], grad_fn=<SigmoidBackward>)
(tensor([0.5776], grad_fn=<MaxBackward0>), tensor([4]))
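`torch.max` with a `dim` argument returns both the largest value and its index, which is why the last line of the output is a pair. The sketch below uses a rounded copy of the output above as a hand-written tensor (an assumption made to keep it deterministic) and shows `torch.argmax` as a shortcut when only the predicted class is needed.

```python
import torch

# Rounded copy of the network output shown above (one sample, ten classes)
out = torch.tensor([[0.51, 0.50, 0.52, 0.54, 0.58, 0.51, 0.50, 0.56, 0.50, 0.47]])

values, indices = torch.max(out, dim=-1)   # best score and its position per row
pred = torch.argmax(out, dim=-1)           # only the position

print(values, indices, pred)
```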


## Loss function

Defining a loss function in PyTorch is very simple! The ones most used in the literature are already implemented, and you only need to call them by name. If you want to create a new one, just define its `forward` and PyTorch takes care of the rest! Let's look at this with some examples

In [13]:
output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
print(target.shape)
print(target, output)
my_loss = nn.MSELoss()

torch.Size([1, 10])
tensor([[-2.3338,  0.7477, -0.4520, -0.3884,  0.0286,  0.9032, -1.0489,  1.1140,
          2.5504, -0.1358]]) tensor([[0.5136, 0.4990, 0.5216, 0.5382, 0.5776, 0.5110, 0.5005, 0.5645, 0.4996,
         0.4662]], grad_fn=<SigmoidBackward>)


Full list of all loss functions:
https://pytorch.org/docs/stable/nn.html#loss-functions

In [11]:
loss = my_loss(output, target)
loss

tensor(0.6290, grad_fn=<MseLossBackward>)
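To illustrate the remark about writing your own loss, here is a minimal sketch of a custom loss module that only defines `forward`. The class name `MyMSELoss` and the random tensors are hypothetical; the result is compared against the built-in `nn.MSELoss` as a sanity check.

```python
import torch
import torch.nn as nn


class MyMSELoss(nn.Module):  # hypothetical hand-written mean squared error
    def forward(self, output, target):
        return ((output - target) ** 2).mean()


torch.manual_seed(0)
demo_output = torch.rand(1, 10, requires_grad=True)  # stands in for a network output
demo_target = torch.randn(1, 10)

custom = MyMSELoss()(demo_output, demo_target)
builtin = nn.MSELoss()(demo_output, demo_target)
print(custom.item(), builtin.item())

# No backward() was written: autograd differentiates the forward computation
custom.backward()
print(demo_output.grad.shape)
```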

If we follow the loss backward through the computation, the graph looks like this:

input -> linear -> relu -> linear -> sigmoid
      -> MSELoss
      -> loss
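This graph can be inspected through the `grad_fn` attribute of the loss and the `next_functions` of each node. The sketch below rebuilds a smaller network with the same layer types (the sizes and the `walk` helper are assumptions for the example) and prints the nodes depth-first. Exact node names vary between PyTorch versions.

```python
import torch
import torch.nn as nn

# Small stand-in with the same structure: linear -> relu -> linear -> sigmoid
net = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 3), nn.Sigmoid())
x = torch.randn(1, 8)
target = torch.randn(1, 3)
loss = nn.MSELoss()(net(x), target)


def walk(fn, depth=0, names=None):
    """Depth-first walk over the autograd graph, collecting node names."""
    if names is None:
        names = []
    if fn is None:  # branches that do not require gradients
        return names
    names.append(type(fn).__name__)
    print("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1, names)
    return names


node_names = walk(loss.grad_fn)
```

The `AccumulateGrad` leaves are the places where gradients are written into the `.grad` attribute of each parameter.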

Backpropagation now works just as before! Simply call `loss.backward()`

In [14]:
net.zero_grad()     # zeroes the gradient buffers of all parameters

print('in_layer.bias.grad before backward')
print(net.in_layer.bias.grad)

loss.backward()

print('in_layer.bias.grad after backward')
print(net.in_layer.bias.grad)

in_layer.bias.grad before backward
None
in_layer.bias.grad after backward
tensor([ 0.0000e+00,  1.6721e-03, -4.4005e-03,  1.1139e-02,  1.0901e-02,
         0.0000e+00,  0.0000e+00,  0.0000e+00, -2.0454e-03,  0.0000e+00,
         8.1291e-03,  0.0000e+00,  0.0000e+00, -8.9753e-03, -4.0322e-03,
         0.0000e+00, -3.5479e-03,  3.7232e-05,  2.2290e-03,  0.0000e+00,
         4.3925e-03, -8.2164e-03,  7.2483e-03,  9.7614e-03,  0.0000e+00,
        -1.1811e-02,  0.0000e+00, -4.8036e-03,  0.0000e+00, -5.1431e-03,
         0.0000e+00,  9.2927e-03, -3.8401e-03,  0.0000e+00,  7.7450e-04,
        -9.0245e-04,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,
        -5.7607e-03,  5.2065e-03, -7.3092e-03,  0.0000e+00,  0.0000e+00,
         0.0000e+00,  8.5516e-03,  0.0000e+00,  0.0000e+00,  0.0000e+00,
        -4.8287e-03,  1.2702e-03,  0.0000e+00,  0.0000e+00,  0.0000e+00,
         5.0386e-03,  1.1476e-02, -1.0987e-03,  0.0000e+00, -6.5559e-03,
         7.6140e-03,  0.0000e+00, -1.9423e-03, -7.
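The reason for clearing gradients before `backward()` is that PyTorch adds new gradients to whatever is already stored in `.grad`. A minimal sketch with a hand-made tensor (the values are arbitrary) shows the accumulation:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
x = torch.tensor([3.0, 4.0])

(w * x).sum().backward()
first = w.grad.clone()        # gradient of sum(w * x) with respect to w is x

(w * x).sum().backward()      # second backward without clearing
accumulated = w.grad.clone()  # the new gradient was added to the old one

w.grad.zero_()                # manual reset of this one tensor
print(first, accumulated, w.grad)
```

`net.zero_grad()` and `optimizer.zero_grad()` perform this reset for every parameter at once; depending on the PyTorch version they set the gradients to zero or to `None`.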

## Updating the weights (Optimizer)

In our linear regression we were updating the weights with the well-known "delta" rule

    wi = wi - lr * grad_wi

where `grad_wi` is the gradient of the loss with respect to `wi`. There is, however, a range of algorithms that try to update the weights in a smarter way:

<img src="http://cs231n.github.io/assets/nn3/opt2.gif" width="45%" style="float:left;" />
<img src="http://cs231n.github.io/assets/nn3/opt1.gif" width="45%" style="float:left;" />

In [15]:
# delta rule by hand:
learning_rate = 0.01
with torch.no_grad():  # the update itself must not be tracked by autograd
    for p in net.parameters():
        p.sub_(p.grad * learning_rate)

In [17]:
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = my_loss(output, target)
loss.backward()
optimizer.step()    # Does the update
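With no momentum or weight decay, `optim.SGD` applies exactly the delta rule. The sketch below checks this on a small linear model; the model size, the random data, and the names `net_a` and `net_b` are assumptions for the example.

```python
import copy

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
net_a = nn.Linear(5, 3)           # updated by hand
net_b = copy.deepcopy(net_a)      # updated by the optimizer
x = torch.randn(4, 5)
y = torch.randn(4, 3)
loss_fn = nn.MSELoss()
lr = 0.01

# Manual delta rule
loss_fn(net_a(x), y).backward()
with torch.no_grad():
    for p in net_a.parameters():
        p -= lr * p.grad

# Same step through the optimizer
opt = optim.SGD(net_b.parameters(), lr=lr)
opt.zero_grad()
loss_fn(net_b(x), y).backward()
opt.step()

same = all(
    torch.allclose(pa, pb)
    for pa, pb in zip(net_a.parameters(), net_b.parameters())
)
print(same)
```

Trying a different algorithm only changes the constructor line, for example `optim.Adam(net_b.parameters(), lr=1e-3)`; the rest of the loop stays the same.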

## Training

Just repeat the steps above for a number of epochs!

In [22]:
my_loss = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=0.01)  # check the documentation for the other parameters each optimizer takes!
dataset = []
  
for _ in range(10):
    input = torch.randn(10, 64*64) # random input
    target = torch.randn(10, 10) # a dummy target, for example and make it the same shape as output
    dataset.append((input, target))

for x,y in dataset:
    print(x.shape, y.shape)

torch.Size([10, 4096]) torch.Size([10, 10])
torch.Size([10, 4096]) torch.Size([10, 10])
torch.Size([10, 4096]) torch.Size([10, 10])
torch.Size([10, 4096]) torch.Size([10, 10])
torch.Size([10, 4096]) torch.Size([10, 10])
torch.Size([10, 4096]) torch.Size([10, 10])
torch.Size([10, 4096]) torch.Size([10, 10])
torch.Size([10, 4096]) torch.Size([10, 10])
torch.Size([10, 4096]) torch.Size([10, 10])
torch.Size([10, 4096]) torch.Size([10, 10])


In [23]:
for epoch in range(100):
    for input, target in dataset:
        optimizer.zero_grad()   # clear gradients left over from the previous step
        output = net(input)
        loss = my_loss(output, target)
        loss.backward()
        optimizer.step()
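A training loop is easier to judge when it records the loss. The sketch below is a self-contained variant that logs the mean loss per epoch. It uses toy regression data where the target is learnable (the sum of the input features) instead of the random targets above, and a small stand-in model; both are assumptions for the example.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Toy regression data (assumed): the target is the sum of the input features
dataset = []
for _ in range(5):
    x = torch.randn(10, 8)
    dataset.append((x, x.sum(dim=1, keepdim=True)))

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

history = []  # mean loss of each epoch
for epoch in range(100):
    epoch_loss = 0.0
    for x, y in dataset:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    history.append(epoch_loss / len(dataset))

print(history[0], history[-1])
```

With purely random targets, as in the notebook's dataset, the network can only memorize noise, so a low accuracy afterwards is expected.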



In [24]:
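with torch.no_grad():
    c = 0
    for input, target in dataset:
        output = net(input)
        # NOTE: argmax without `dim` runs over the flattened (10, 10) batch,
        # so this compares a single index per batch; pass dim=-1 to compare per sample
        if torch.argmax(output) == torch.argmax(target):
            c += 1
    print('Acc: ', c/len(dataset))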

Acc:  0.1
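Accuracy is usually computed per sample, taking the argmax over the class dimension of each row. A minimal sketch with hand-written tensors (the values are made up so the result can be checked by hand):

```python
import torch

# Four samples, three classes (made-up scores and one-hot targets)
output = torch.tensor([[0.1, 0.9, 0.0],
                       [0.8, 0.1, 0.1],
                       [0.2, 0.3, 0.5],
                       [0.6, 0.3, 0.1]])
target = torch.tensor([[0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0],
                       [0.0, 0.0, 1.0],
                       [1.0, 0.0, 0.0]])

pred = output.argmax(dim=-1)   # predicted class per sample: 1, 0, 2, 0
gold = target.argmax(dim=-1)   # true class per sample:      1, 2, 2, 0

correct = (pred == gold).sum().item()
acc = correct / target.size(0)
print('Acc: ', acc)            # 3 of 4 samples match
```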
