![IFCE Logo](https://ifce.edu.br/fortaleza/comunicacao-2/logo-ifce-campus-de-fortaleza-horizontal.jpg/@@images/ba68e46f-55df-436f-9bfb-5eeed5d6fa0b.jpeg)

**Course:** Machine Learning

**Professor:** Amauri Holanda

**Student:** Keven Carneiro

# Neural Networks

**Question 1.** The file https://www.dropbox.com/s/s2cyx82uxsv03rq/dados-ex5.txt?dl=0 contains training samples for a binary classification problem with X = $\mathbb{R}^2$ and Y = {0, 1}.
In this exercise, you must evaluate multilayer perceptron (MLP) neural networks on the proposed problem. To do so, use hold-out validation for model selection (show the validation error for each configuration evaluated) and plot the decision surfaces of the best and the worst configurations (according to validation error).
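The selection rule requested above can be sketched in plain Python. This is only an illustration: the accuracies are copied from the first lines of the grid-search log printed further down (2 layers, learning rate 1e-05), and the names `val_accuracy`, `val_error`, `best_units` and `worst_units` are assumptions that do not appear in the notebook.

```python
# Validation accuracies copied from the grid-search log of this exercise
# (2 layers, learning rate 1e-05); keys are the number of hidden units.
val_accuracy = {
    2: 0.5266272189349113,
    17: 0.5266272189349113,
    32: 0.3431952662721893,
    47: 0.7100591715976331,
    62: 0.5266272189349113,
}

# Validation error is the complement of validation accuracy.
val_error = {units: 1.0 - acc for units, acc in val_accuracy.items()}

# Hold-out model selection: rank configurations by validation error only.
best_units = min(val_error, key=val_error.get)
worst_units = max(val_error, key=val_error.get)

print(f"best: {best_units} units, validation error {val_error[best_units]:.4f}")
print(f"worst: {worst_units} units, validation error {val_error[worst_units]:.4f}")
```

Ranking by validation error rather than training loss keeps the choice tied to data the model did not see during training.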

In [2]:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import sklearn
import sys
import torch
from torch.nn import functional as F

from sklearn.model_selection import train_test_split

In [37]:
df = pd.read_csv('https://www.dropbox.com/s/s2cyx82uxsv03rq/dados-ex5.txt?dl=1', names=['x1', 'x2', 'y'])

In [38]:
x = torch.tensor(df.drop('y', axis='columns').values).float()
y = torch.tensor(df['y']).float()
x_train, x_test, y_train, y_test = train_test_split(x, y)
x_train, x_cv, y_train, y_cv = train_test_split(x_train, y_train)

In [39]:
class Feedforward(torch.nn.Module):
  def __init__(self, input_size, output_size, number_layers, number_units):
    super(Feedforward, self).__init__()
    self.input_size = input_size
    self.output_size = output_size
    self.number_layers = number_layers
    self.number_units = number_units
    fc = torch.nn.ModuleList() # fully connected
    for i in range(number_layers):
      if i == 0:
        fc.append(torch.nn.Linear(input_size, number_units))
      elif i == number_layers-1:
        fc.append(torch.nn.Linear(number_units, output_size))
      else:
        fc.append(torch.nn.Linear(number_units, number_units))
    self.fc = fc

  def forward(self, x):
    for i in range(self.number_layers):
      if i == self.number_layers-1:
        x = torch.sigmoid(self.fc[i](x))  # F.sigmoid is deprecated
      else:
        x = F.relu(self.fc[i](x))
    return x

In [40]:
number_units = range(2, 101, 15)
number_layers = range(2, 10, 1)
learning_rates = [10**-x for x in range(5, -3, -1)]
num_epochs = 50

best_model = {'model': None, 'optimizer': None, 'loss': sys.maxsize, 'accuracy': 0}
worst_model = {'model': None, 'optimizer': None, 'loss': 0, 'accuracy': 0}

for learning_rate in learning_rates:
  for number_layer in number_layers:
    for number_unit in number_units:
      model = Feedforward(2, 1, number_layer, number_unit)
      criterion = torch.nn.BCELoss()
      optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
      model.train()
      # Full-batch gradient descent: one update per epoch on the whole training set
      for epoch in range(1, num_epochs+1):
        optimizer.zero_grad()
        # Forward pass
        y_pred = model(x_train)
        # Compute Loss
        loss = criterion(y_pred.squeeze(), y_train)
        # Backward pass
        loss.backward()
        optimizer.step()
      print(f'#Layers: {number_layer} #Units: {number_unit} LR: {learning_rate} Epoch {epoch}: train loss: {loss.item()}')

      model.eval()
      with torch.no_grad():
        y_pred = model(x_cv)
        accuracy = ((y_pred.squeeze() >= 0.5) == y_cv).sum().item() / y_cv.shape[0]
        print(f'#Layers: {number_layer} #Units: {number_unit} LR: {learning_rate} Training loss: {loss.item()} Cross validation accuracy: {accuracy}')

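        # Note: best and worst models are ranked by the final training loss, not by the validation accuracy
        if loss.item() < best_model['loss']: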
          best_model['model'] = model
          best_model['optimizer'] = optimizer
          best_model['loss'] = loss.item()
          best_model['accuracy'] = accuracy

        if loss.item() > worst_model['loss']:
          worst_model['model'] = model
          worst_model['optimizer'] = optimizer
          worst_model['loss'] = loss.item()
          worst_model['accuracy'] = accuracy


nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.



#Layers: 2 #Units: 2 LR: 1e-05 Epoch 50: train loss: 0.7381122708320618
#Layers: 2 #Units: 2 LR: 1e-05 Training loss: 0.7381122708320618 Cross validation accuracy: 0.5266272189349113
#Layers: 2 #Units: 17 LR: 1e-05 Epoch 50: train loss: 0.7710756063461304
#Layers: 2 #Units: 17 LR: 1e-05 Training loss: 0.7710756063461304 Cross validation accuracy: 0.5266272189349113
#Layers: 2 #Units: 32 LR: 1e-05 Epoch 50: train loss: 0.7807587385177612
#Layers: 2 #Units: 32 LR: 1e-05 Training loss: 0.7807587385177612 Cross validation accuracy: 0.3431952662721893
#Layers: 2 #Units: 47 LR: 1e-05 Epoch 50: train loss: 0.6551284193992615
#Layers: 2 #Units: 47 LR: 1e-05 Training loss: 0.6551284193992615 Cross validation accuracy: 0.7100591715976331
#Layers: 2 #Units: 62 LR: 1e-05 Epoch 50: train loss: 0.6901153922080994
#Layers: 2 #Units: 62 LR: 1e-05 Training loss: 0.6901153922080994 Cross validation accuracy: 0.5266272189349113
#Layers: 2 #Units: 77 LR: 1e-05 Epoch 50: train loss: 0.7057189345359802
#Lay

In [41]:
print(f'Best model: {best_model}')
print(f'Worst model: {worst_model}')

Best model: {'model': Feedforward(
  (fc): ModuleList(
    (0): Linear(in_features=2, out_features=77, bias=True)
    (1): Linear(in_features=77, out_features=77, bias=True)
    (2): Linear(in_features=77, out_features=1, bias=True)
  )
), 'optimizer': SGD (
Parameter Group 0
    dampening: 0
    lr: 1
    momentum: 0
    nesterov: False
    weight_decay: 0
), 'loss': 0.10805312544107437, 'accuracy': 0.9881656804733728}
Worst model: {'model': Feedforward(
  (fc): ModuleList(
    (0): Linear(in_features=2, out_features=92, bias=True)
    (1): Linear(in_features=92, out_features=92, bias=True)
    (2): Linear(in_features=92, out_features=92, bias=True)
    (3): Linear(in_features=92, out_features=1, bias=True)
  )
), 'optimizer': SGD (
Parameter Group 0
    dampening: 0
    lr: 10
    momentum: 0
    nesterov: False
    weight_decay: 0
), 'loss': 50.79051208496094, 'accuracy': 0.47337278106508873}
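To relate the printed architectures to the grid-search parameters, the sketch below mirrors the layer-sizing logic of the `Feedforward` constructor in plain Python and counts the trainable parameters. The helpers `layer_shapes` and `parameter_count` are hypothetical names introduced only for this illustration. The values `number_layers=3, number_units=77` and `number_layers=4, number_units=92` are read off the two printouts above.

```python
def layer_shapes(input_size, output_size, number_layers, number_units):
    """Return the (in_features, out_features) pair of each Linear layer."""
    shapes = []
    for i in range(number_layers):
        if i == 0:
            shapes.append((input_size, number_units))
        elif i == number_layers - 1:
            shapes.append((number_units, output_size))
        else:
            shapes.append((number_units, number_units))
    return shapes


def parameter_count(shapes):
    """Weights plus biases of every Linear layer."""
    return sum(n_in * n_out + n_out for n_in, n_out in shapes)


best = layer_shapes(2, 1, number_layers=3, number_units=77)
worst = layer_shapes(2, 1, number_layers=4, number_units=92)

print(best)   # [(2, 77), (77, 77), (77, 1)]
print(parameter_count(best), parameter_count(worst))  # 6315 17481
```

Note that `number_layers` counts `Linear` layers, so a value of 3 corresponds to two hidden layers plus the output layer.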


In [42]:
def plot_decision_graph(x, model, y_true):
  # Grid covering the data range with a small margin on each side
  x0_contour = torch.linspace(x[:, 0].min().item()-0.2, x[:, 0].max().item()+0.2, 100)
  x1_contour = torch.linspace(x[:, 1].min().item()-0.2, x[:, 1].max().item()+0.2, 100)

  # Rows follow the vertical axis (x1), columns the horizontal axis (x0)
  y_contour = np.zeros((len(x1_contour), len(x0_contour)))
  model.eval()
  with torch.no_grad():
    for idx, x1_item in enumerate(x1_contour):
      # One grid row: every x0 value paired with a fixed x1
      row = torch.stack((x0_contour, torch.full_like(x0_contour, x1_item.item())), dim=-1)
      y_pred_probs = model(row).squeeze()  # sigmoid outputs (probabilities), not logits
      y_contour[idx] = (y_pred_probs >= 0.5).float().numpy()

  colorscale = [[0., 'gold'], [1., 'mediumturquoise']]
  fig = go.Figure()
  fig.add_trace(go.Contour(
        z=y_contour,
        x=x0_contour.numpy(), # horizontal axis
        y=x1_contour.numpy(), # vertical axis
        opacity=0.3,
        contours=dict(
            start=0,
            end=1,
            size=1,
        ),
        colorscale=colorscale
    ))
  fig.add_trace(go.Scatter(
        marker_color=(y_true.squeeze().numpy() >= 0.5).astype(float),
        x=x[:, 0].numpy(), # horizontal axis
        y=x[:, 1].numpy(), # vertical axis
        mode='markers',
        marker={'colorscale': [[0., 'red'], [1., 'blue']] }
    ))
  fig.show()

In [43]:
plot_decision_graph(x_test, best_model['model'], y_test)



In [44]:
plot_decision_graph(x_test, worst_model['model'], y_test)



**Question 2.** In this exercise, you must choose a dataset of your preference (except Boston Housing and MNIST) to evaluate neural networks. At a minimum, you must:
* Explain the problem associated with the chosen dataset;
* Compare your neural network with a simpler model (e.g., k-NN or logistic regression);
* Plot the evolution of the cost function (loss) over training (epochs) to verify the correctness of the training algorithm;
* Report accuracy (or error) rates on the training and test sets for different neural network configurations.

Note: UCI Datasets (https://archive.ics.uci.edu/ml/datasets.php) and the TorchVision library (https://pytorch.org/docs/stable/torchvision/datasets.html) offer several dataset options.

The chosen dataset is Kaggle Dogs vs Cats, available at https://www.microsoft.com/en-us/download/details.aspx?id=54765 and also included in the TensorFlow datasets library at https://www.tensorflow.org/datasets/catalog/cats_vs_dogs. As the name suggests, the task is to classify images as cats or dogs, which can be treated as a binary classification problem. Here the images are resized to a fixed, reasonable size for identification, and alternative network architectures that might solve this problem well are not analyzed, since the main goal is to demonstrate the behavior of an MLP.
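To give a sense of scale for an MLP on flattened images, the arithmetic below computes the input dimension and the size of the first layer. It assumes the 128×128 RGB resolution used in the preprocessing code further down and the hidden-unit values 2, 12 and 22 from the grid search. The function name `first_layer_params` is hypothetical.

```python
channels, height, width = 3, 128, 128      # RGB image resized to 128x128
input_size = channels * height * width     # length of the flattened vector

# Logistic regression: one weight per input feature plus one bias.
logreg_params = input_size + 1


def first_layer_params(number_units):
    """Weights and biases of the first Linear layer of the MLP."""
    return input_size * number_units + number_units


print(input_size)      # 49152
print(logreg_params)   # 49153
for units in (2, 12, 22):
    print(units, first_layer_params(units))
```

Under these assumptions nearly all parameters sit in the first layer, because each hidden unit is connected to all 49152 input values.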

In [3]:
import torchvision
import torch

from PIL import Image

In [4]:
!wget https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip

https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip
Resolving download.microsoft.com (download.microsoft.com)... 23.193.24.126, 2600:1417:76:586::e59, 2600:1417:76:58e::e59
Connecting to download.microsoft.com (download.microsoft.com)|23.193.24.126|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 824894548 (787M) [application/octet-stream]
Saving to: ‘kagglecatsanddogs_3367a.zip.1’


(164 MB/s) - ‘kagglecatsanddogs_3367a.zip.1’ saved [824894548/824894548]



In [5]:
!unzip kagglecatsanddogs_3367a.zip

Archive:  kagglecatsanddogs_3367a.zip
replace PetImages/Cat/0.jpg? [y]es, [n]o, [A]ll, [N]one, [r]ename: 

In [14]:
!ls

 kagglecatsanddogs_3367a.zip	'MSR-LA - 3467.docx'  'readme[1].txt'
 kagglecatsanddogs_3367a.zip.1	 PetImages	       sample_data


In [15]:
if torch.cuda.is_available():  
  dev = "cuda:0" 
else:  
  dev = "cpu"  
device = torch.device(dev)

In [16]:
device

device(type='cuda', index=0)

In [25]:
folder = 'PetImages'
image_size = 128

transforms = torchvision.transforms.Compose([
    torchvision.transforms.Resize((image_size, image_size)),
    torchvision.transforms.ToTensor(), # (3, 128, 128)
    #torchvision.transforms.Lambda(lambda x: x.to(device)),
    torchvision.transforms.Lambda(lambda x: x.flatten())
])

def verify_image(image):
    # Return True only if PIL can open the file and its integrity check passes
    try:
        Image.open(image).verify()
        return True
    except Exception:
        return False

ds = torchvision.datasets.ImageFolder(folder, transform=transforms, is_valid_file=verify_image)


Possibly corrupt EXIF data.  Expecting to read 32 bytes but only got 0. Skipping tag 270
Possibly corrupt EXIF data.  Expecting to read 5 bytes but only got 0. Skipping tag 271
Possibly corrupt EXIF data.  Expecting to read 8 bytes but only got 0. Skipping tag 272
Possibly corrupt EXIF data.  Expecting to read 8 bytes but only got 0. Skipping tag 282
Possibly corrupt EXIF data.  Expecting to read 8 bytes but only got 0. Skipping tag 283
Possibly corrupt EXIF data.  Expecting to read 20 bytes but only got 0. Skipping tag 306
Possibly corrupt EXIF data.  Expecting to read 48 bytes but only got 0. Skipping tag 532
Corrupt EXIF data.  Expecting to read 2 bytes but only got 0.



In [26]:
subset_size = int(len(ds) * 0.1) # The dataset is very large, so only 10% of it is used
ignored_size = len(ds) - subset_size
train_size = int(0.8 * subset_size)
test_size = subset_size - train_size

subset, _ = torch.utils.data.random_split(ds, [subset_size, ignored_size])
train_dataset, test_dataset = torch.utils.data.random_split(subset, [train_size, test_size])

In [27]:
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=512)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=512)

In [45]:
class LogisticRegression(torch.nn.Module):
  def __init__(self):
    super(LogisticRegression, self).__init__()
    self.linear = torch.nn.Linear(3 * 128 * 128, 1)
  def forward(self, x):
    x = torch.sigmoid(self.linear(x))  # F.sigmoid is deprecated
    return x
baseline_model = LogisticRegression()

In [46]:
best_baseline_model = {'model': None, 'optimizer': None, 'loss': sys.maxsize, 'accuracy': 0}
learning_rates = [10**-x for x in range(4, 1, -1)]

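for learning_rate in learning_rates:
  # Note: baseline_model is created once above, so each learning rate continues from the previous weights
  baseline_model.train()
  criterion = torch.nn.BCELoss(reduction='mean')  # size_average is deprecated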
  optimizer = torch.optim.SGD(baseline_model.parameters(), lr=learning_rate)
  for epoch in range(20):
    for idx, (images, labels) in enumerate(train_loader):
      baseline_model.train()
      optimizer.zero_grad()
      # Forward pass
      y_pred = baseline_model(images)
      # Compute Loss
      loss = criterion(y_pred.squeeze(), labels.float())
      # Backward pass
      loss.backward()
      optimizer.step()

  with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
      y_pred = baseline_model(images)
      correct += ((y_pred.squeeze() >= 0.5) == labels.float()).sum().item()
      total += labels.shape[0]

    accuracy = correct / total
    print(f'LR: {learning_rate} Train loss: {loss.item()} Test accuracy: {accuracy}')

    if loss.item() < best_baseline_model['loss']:
      best_baseline_model['model'] = baseline_model
      best_baseline_model['optimizer'] = optimizer
      best_baseline_model['loss'] = loss.item()
      best_baseline_model['accuracy'] = accuracy


size_average and reduce args will be deprecated, please use reduction='mean' instead.



LR: 0.0001 Train loss: 0.6929877996444702 Test accuracy: 0.508
LR: 0.001 Train loss: 1.1297045946121216 Test accuracy: 0.482
LR: 0.01 Train loss: 12.711209297180176 Test accuracy: 0.496
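One way to read the first line of this log: a binary classifier that always outputs probability 0.5 has a binary cross-entropy of ln 2 ≈ 0.6931, whichever label is correct. The short check below compares that value with the logged train loss for LR 0.0001. The function `binary_cross_entropy` is a hypothetical helper written for this illustration, not the PyTorch implementation.

```python
import math


def binary_cross_entropy(p, y):
    """BCE of one example with predicted probability p and label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


# Loss of a model that always predicts 0.5, for either label.
chance_loss = binary_cross_entropy(0.5, 1)

# Final train loss logged above for LR 0.0001.
observed = 0.6929877996444702

print(chance_loss)
print(abs(observed - chance_loss))
```

A train loss this close to ln 2, together with a test accuracy near 0.5, suggests that the baseline is still predicting at roughly chance level for that learning rate.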


In [47]:
class FeedForward(torch.nn.Module):
  def __init__(self, input_size, output_size, number_layers, number_units):
    super(FeedForward, self).__init__()
    self.input_size = input_size
    self.output_size = output_size
    self.number_layers = number_layers
    self.number_units = number_units
    fc = torch.nn.ModuleList() # fully connected
    for i in range(number_layers):
      if i == 0:
        fc.append(torch.nn.Linear(input_size, number_units))
      elif i == number_layers-1:
        fc.append(torch.nn.Linear(number_units, output_size))
      else:
        fc.append(torch.nn.Linear(number_units, number_units))
    self.fc = fc

  def forward(self, x):
    for i in range(self.number_layers):
      if i == self.number_layers-1:
        x = torch.sigmoid(self.fc[i](x))  # F.sigmoid is deprecated
      else:
        x = F.relu(self.fc[i](x))
    return x

In [None]:
number_units = range(2, 23, 10)
number_layers = range(2, 11, 4)
learning_rates = [10**-x for x in range(3, -1, -1)]
num_epochs = 10

best_model = {'model': None, 'optimizer': None, 'loss': sys.maxsize, 'accuracy': 0}

for learning_rate in learning_rates:
  for number_layer in number_layers:
    for number_unit in number_units:
        model = FeedForward(3 * 128 * 128, 1, number_layer, number_unit)
        criterion = torch.nn.BCELoss()
        optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
        model.train()
        for epoch in range(1, num_epochs+1):
          for idx, (images, labels) in enumerate(train_loader):
            optimizer.zero_grad()
            # Forward pass
            y_pred = model(images)
            # Compute Loss
            loss = criterion(y_pred.squeeze(), labels.float())
            # Backward pass
            loss.backward()
            optimizer.step()
          print(f'#Layers: {number_layer} #Units: {number_unit} LR: {learning_rate} Epoch {epoch}: train loss: {loss.item()}')

        with torch.no_grad():
          correct = 0
          total = 0
          for images, labels in test_loader:
            y_pred = model(images)
            correct += ((y_pred.squeeze() >= 0.5) == labels.float()).sum().item()
            total += labels.shape[0]

          accuracy = correct / total
          print(f'#Layers: {number_layer} #Units: {number_unit} LR: {learning_rate} Train loss: {loss.item()} Test accuracy: {accuracy}')

          if loss.item() < best_model['loss']:
            best_model['model'] = model
            best_model['optimizer'] = optimizer
            best_model['loss'] = loss.item()
            best_model['accuracy'] = accuracy



#Layers: 2 #Units: 2 LR: 0.001 Epoch 1: train loss: 0.708114504814148
#Layers: 2 #Units: 2 LR: 0.001 Epoch 2: train loss: 0.7080109715461731
#Layers: 2 #Units: 2 LR: 0.001 Epoch 3: train loss: 0.7079711556434631
#Layers: 2 #Units: 2 LR: 0.001 Epoch 4: train loss: 0.7079395055770874
#Layers: 2 #Units: 2 LR: 0.001 Epoch 5: train loss: 0.7078931331634521
#Layers: 2 #Units: 2 LR: 0.001 Epoch 6: train loss: 0.707851767539978
#Layers: 2 #Units: 2 LR: 0.001 Epoch 7: train loss: 0.7078334093093872
#Layers: 2 #Units: 2 LR: 0.001 Epoch 8: train loss: 0.7077964544296265
#Layers: 2 #Units: 2 LR: 0.001 Epoch 9: train loss: 0.7077520489692688
#Layers: 2 #Units: 2 LR: 0.001 Epoch 10: train loss: 0.7076995372772217
#Layers: 2 #Units: 2 LR: 0.001 Train loss: 0.7076995372772217 Test accuracy: 0.518
#Layers: 2 #Units: 12 LR: 0.001 Epoch 1: train loss: 0.6939381957054138
#Layers: 2 #Units: 12 LR: 0.001 Epoch 2: train loss: 0.692548930644989
#Layers: 2 #Units: 12 LR: 0.001 Epoch 3: train loss: 0.6916207671

### Observations and Limitations
* I used only 10% of the dataset because training even a simple neural network on the full dataset takes an impractical amount of time.
* During training I did not store the loss values in a list to plot their evolution, because GPU memory usage was at its limit.
* After a few training runs the GPU started raising an out-of-memory error that persisted even after restarting the kernel; the only workable solution was to fall back to the CPU, which made training even slower.
* Despite a slightly better result, neither Logistic Regression nor the MLP achieves a good result on this dataset.
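Regarding the second point, one possible way to keep a loss history cheaply is to store plain Python floats instead of tensors. The sketch below is not part of the original notebook: `LossHistory` is a hypothetical helper, and the batch losses are made-up numbers standing in for the values that `loss.item()` would return inside the training loop.

```python
class LossHistory:
    """Collects per-batch losses as plain floats and averages them per epoch."""

    def __init__(self):
        self.epoch_means = []
        self._batch_losses = []

    def add_batch(self, loss_value):
        # Expects a Python number (e.g. loss.item()), never the tensor itself.
        self._batch_losses.append(float(loss_value))

    def end_epoch(self):
        # Average the batches seen in this epoch and reset the buffer.
        mean = sum(self._batch_losses) / len(self._batch_losses)
        self.epoch_means.append(mean)
        self._batch_losses = []
        return mean


history = LossHistory()
# Made-up batch losses for two epochs, used only to exercise the helper.
for epoch_losses in ([0.70, 0.72], [0.69, 0.68]):
    for value in epoch_losses:
        history.add_batch(value)
    history.end_epoch()

print(history.epoch_means)
```

Since `loss.item()` copies the scalar to host memory as a Python float, a list like `history.epoch_means` should not hold GPU memory. It could then be passed to a plotting call such as `px.line(y=history.epoch_means)` to show the loss curve per epoch.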