# Transfer Learning 


How to reuse pretrained networks.
In this script we use a convolutional network, reusing its weights and retraining only the last layer, a dense layer originally trained on ImageNet with 1000 classes.

In [1]:
import os
os.environ['KMP_DUPLICATE_LIB_OK']='True'

In [2]:
%matplotlib inline

import matplotlib.pyplot as plt

import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models

## Dataset and transforms

Most of the pretrained networks available in torchvision expect 224x224 inputs with normalized values.

The channels were normalized with means of **[0.485, 0.456, 0.406]** and standard deviations of **[0.229, 0.224, 0.225]**
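As a quick illustration of what this normalization does, the sketch below applies `(value - mean) / std` per channel to a single RGB pixel in plain Python. The helper name `normalize_pixel` is made up for this example; `transforms.Normalize` applies the same formula to whole image tensors.

```python
# ImageNet channel statistics quoted above (RGB order).
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Normalize one RGB pixel whose values are already scaled to [0, 1]."""
    return [(value - m) / s for value, m, s in zip(rgb, MEAN, STD)]


# A pixel equal to the channel means maps to zero in every channel.
print(normalize_pixel([0.485, 0.456, 0.406]))

# A pure white pixel lands between about 2.2 and 2.7 standard deviations
# above the mean, depending on the channel.
print(normalize_pixel([1.0, 1.0, 1.0]))
```

This is why `ToTensor()` comes before `Normalize` in the pipelines: the statistics assume pixel values already scaled to the [0, 1] range.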

In [16]:
data_dir = 'Cat_Dog_data'

train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomResizedCrop(224),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.485, 0.456, 0.406],[0.229, 0.224, 0.225])])  # besides normalization, data augmentation is applied for training

test_transforms = transforms.Compose([transforms.Resize(255),
                                      transforms.CenterCrop(224), 
                                      transforms.ToTensor(),
                                      transforms.Normalize([0.485, 0.456, 0.406],[0.229, 0.224, 0.225])])

In [19]:

train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)

trainloader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
testloader = torch.utils.data.DataLoader(test_data, batch_size=64)

- Downloading the model and displaying the network architecture

In [6]:
# Note: newer torchvision releases deprecate `pretrained=True` in favor of the `weights=` argument
model = models.densenet121(pretrained=True)
model

Downloading: "https://download.pytorch.org/models/densenet121-a639ec97.pth" to C:\Users\igor/.cache\torch\hub\checkpoints\densenet121-a639ec97.pth
100%|█████████████████████████████████████████████████████████████████████████████| 30.8M/30.8M [00:02<00:00, 11.6MB/s]


DenseNet(
  (features): Sequential(
    (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace=True)
    (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (denseblock1): _DenseBlock(
      (denselayer1): _DenseLayer(
        (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      )
      (denselayer2): _DenseLayer(
        (norm1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu

- The last layer is the **classifier**, which will be retrained for our purpose
- The next step is to freeze the weights of the whole network and replace the final classification layer

Freezing prevents those weights from being trained and speeds up execution, since the network no longer needs to compute or store gradients for them.

In [7]:

# freeze the weights
for param in model.parameters():
    param.requires_grad = False
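
To make the effect of freezing concrete, here is a minimal sketch on a tiny stand-in network (not DenseNet; the layer sizes are arbitrary assumptions chosen so it runs instantly). After setting `requires_grad = False`, a backward pass leaves the frozen parameters without gradients while the trainable head still receives them.

```python
import torch
from torch import nn

# Tiny stand-in for a pretrained feature extractor plus a trainable head.
features = nn.Sequential(nn.Linear(8, 4), nn.ReLU())
head = nn.Linear(4, 2)

# Freeze the feature extractor the same way the full model is frozen above.
for param in features.parameters():
    param.requires_grad = False

# One forward/backward pass on random data.
x = torch.randn(5, 8)
loss = head(features(x)).sum()
loss.backward()

# Frozen parameters never get a .grad tensor; trainable ones do.
frozen_have_grad = any(p.grad is not None for p in features.parameters())
head_have_grad = all(p.grad is not None for p in head.parameters())
print("frozen layers received gradients:", frozen_have_grad)
print("head received gradients:", head_have_grad)
```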

In [13]:
# Define the new layer
from collections import OrderedDict
classifier = nn.Sequential(OrderedDict([ 
                                ('fc1', nn.Linear(1024, 512)),
                                ('relu', nn.ReLU()),
                                ('dropout', nn.Dropout(0.2)),
                                ('fc2', nn.Linear(512, 2)),
                                ('output', nn.LogSoftmax(dim=1))
                            ]))

classifier

Sequential(
  (fc1): Linear(in_features=1024, out_features=512, bias=True)
  (relu): ReLU()
  (dropout): Dropout(p=0.2, inplace=False)
  (fc2): Linear(in_features=512, out_features=2, bias=True)
  (output): LogSoftmax(dim=1)
)

In [14]:
# assign it as the last layer of the network to be used

model.classifier = classifier
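
The replacement works because modules created after the freeze have `requires_grad=True` by default, so only the new head is trainable. The sketch below checks this on a hypothetical `TinyNet` that merely mimics the `features`/`classifier` layout; its sizes are assumptions for illustration. It also reads `in_features` from the old head instead of hard-coding it (the hard-coded 1024 used above matches the input width of DenseNet-121's original classifier).

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in with a `features`/`classifier` layout."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(16, 8), nn.ReLU())
        self.classifier = nn.Linear(8, 1000)  # original 1000-class head

    def forward(self, x):
        return self.classifier(self.features(x))


model = TinyNet()
for param in model.parameters():
    param.requires_grad = False  # freeze everything, including the old head

# Read the input width from the old head so the new one matches it.
in_features = model.classifier.in_features
model.classifier = nn.Sequential(
    nn.Linear(in_features, 4),
    nn.ReLU(),
    nn.Linear(4, 2),
    nn.LogSoftmax(dim=1),
)

# Only parameters of the freshly created head remain trainable.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```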

- Device

For a network this large, both the computational cost and the processing time will be high.
If a compatible GPU is available, using it is recommended for better performance.

In [11]:
# check whether a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
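
A small sketch of how the selected device is then used: the model and every input batch must be moved to the same device before the forward pass. The layer and tensor sizes here are arbitrary placeholders, not values from this notebook.

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move the module's parameters to the device.
layer = nn.Linear(4, 2).to(device)

# Tensors are not moved in place: .to() returns a tensor on the target device.
x = torch.randn(3, 4).to(device)

out = layer(x)
print(out.device.type, tuple(out.shape))
```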

- Training

Defining the loss function and training the new layer for the desired task

In [20]:
criterion = nn.NLLLoss()

# Only train the classifier parameters, feature parameters are frozen
optimizer = optim.Adam(model.classifier.parameters(), lr=0.003)
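
`nn.NLLLoss` is used here because the new classifier ends in `LogSoftmax`, so the model already outputs log-probabilities. As a hedged sanity check on random data (the shapes and labels below are made up), the sketch shows that this pairing gives the same value as applying cross-entropy directly to raw scores.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 2)               # raw scores: 4 samples, 2 classes
labels = torch.tensor([0, 1, 1, 0])

log_probs = F.log_softmax(logits, dim=1)  # what the LogSoftmax layer outputs
nll = F.nll_loss(log_probs, labels)       # what nn.NLLLoss computes
ce = F.cross_entropy(logits, labels)      # combined alternative on raw scores

print(torch.allclose(nll, ce))
```

An equivalent design would drop the `LogSoftmax` layer and use `nn.CrossEntropyLoss`; keeping `LogSoftmax` makes it easy to recover probabilities with `torch.exp`, as the evaluation loop does.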


In [21]:

model.to(device);

### Training

In [23]:
epochs = 2
steps = 0
running_loss = 0
print_every = 10



for epoch in range(epochs):
    
    for inputs, labels in trainloader:
        steps += 1
        
        inputs = inputs.to(device)
        labels = labels.to(device)
        
        logps = model(inputs)
        loss = criterion(logps, labels)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        
        if steps % print_every == 0:
            test_loss = 0
            accuracy = 0
            model.eval()
            with torch.no_grad():
                for inputs, labels in testloader:
                    inputs, labels = inputs.to(device), labels.to(device)
                    logps = model(inputs)
                    batch_loss = criterion(logps, labels)
                    
                    test_loss += batch_loss.item()
                    
                    # Calculate accuracy
                    ps = torch.exp(logps)
                    top_p, top_class = ps.topk(1, dim=1)
                    equals = top_class == labels.view(*top_class.shape)
                    accuracy += equals.float().mean().item()
                    
            print(f"Epoch {epoch+1}/{epochs}.. "
                  f"Train loss: {running_loss/print_every:.3f}.. "
                  f"Test loss: {test_loss/len(testloader):.3f}.. "
                  f"Test accuracy: {accuracy/len(testloader):.3f}")
            running_loss = 0
            model.train()

Epoch 1/2.. Train loss: 0.824.. Test loss: 0.248.. Test accuracy: 0.930
Epoch 1/2.. Train loss: 0.372.. Test loss: 0.132.. Test accuracy: 0.966
Epoch 1/2.. Train loss: 0.231.. Test loss: 0.128.. Test accuracy: 0.969
Epoch 1/2.. Train loss: 0.188.. Test loss: 0.116.. Test accuracy: 0.970
Epoch 1/2.. Train loss: 0.184.. Test loss: 0.124.. Test accuracy: 0.966
Epoch 1/2.. Train loss: 0.160.. Test loss: 0.139.. Test accuracy: 0.965
Epoch 1/2.. Train loss: 0.161.. Test loss: 0.130.. Test accuracy: 0.970
Epoch 1/2.. Train loss: 0.150.. Test loss: 0.126.. Test accuracy: 0.970
Epoch 1/2.. Train loss: 0.131.. Test loss: 0.131.. Test accuracy: 0.970
Epoch 1/2.. Train loss: 0.142.. Test loss: 0.143.. Test accuracy: 0.970
Epoch 1/2.. Train loss: 0.148.. Test loss: 0.141.. Test accuracy: 0.969
Epoch 1/2.. Train loss: 0.139.. Test loss: 0.130.. Test accuracy: 0.973
Epoch 1/2.. Train loss: 0.193.. Test loss: 0.133.. Test accuracy: 0.973
Epoch 1/2.. Train loss: 0.144.. Test loss: 0.144.. Test accuracy

KeyboardInterrupt: 
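
The accuracy calculation inside the evaluation loop can be checked on a tiny hand-made batch. The log-probabilities and labels below are hypothetical values chosen for illustration. The sketch also counts correct predictions directly, since averaging per-batch means is only approximate when the final batch is smaller than the others.

```python
import torch

# Hypothetical log-probabilities for 3 samples and 2 classes.
logps = torch.log(torch.tensor([[0.9, 0.1],
                                [0.2, 0.8],
                                [0.6, 0.4]]))
labels = torch.tensor([0, 1, 1])

ps = torch.exp(logps)                     # back to probabilities
top_p, top_class = ps.topk(1, dim=1)      # most likely class per row
equals = top_class == labels.view(*top_class.shape)
batch_accuracy = equals.float().mean().item()

# Counting correct predictions avoids weighting a smaller final batch
# the same as a full one when results are combined across batches.
correct = equals.sum().item()
total = labels.size(0)
print(batch_accuracy, correct / total)
```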