
Reference links:

*   https://aws.amazon.com/pt/blogs/aws-brasil/criando-um-workflow-de-rotulamento-treinamento-e-deploy-de-machine-learning-utilizando-o-amazon-sagemaker-pytorch-e-amazon-sagemaker-ground-truth/
*   https://docs.aws.amazon.com/sagemaker/latest/dg/howitworks-create-ws.html
*   https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html

Dataset used:

*   http://www.vision.caltech.edu/Image_Datasets/Caltech256/



# **Training and deploying a machine learning model with Amazon SageMaker and PyTorch**

---


My project consisted of training a model to classify images of different kinds of cups and drink containers. The images were selected from Caltech 256, a public dataset with more than 30 thousand labeled images in 256 categories. From this dataset, I worked with a sample of the following 4 classes:

*   Beer Mug;
*   Coffee Mug;
*   Teapot;
*   Wine Bottle.

I selected 20 images for training and 5 for testing.
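
As a rough illustration of how such a selection could be made reproducible, the sketch below draws disjoint training and test samples from a list of file names. The helper `split_train_test`, the seed, and the file names are hypothetical, and the sketch assumes the 20/5 split is applied to one list of images at a time.

```python
import random

def split_train_test(filenames, n_train=20, n_test=5, seed=1):
    """Randomly pick disjoint training and test samples from a list of file names."""
    if n_train + n_test > len(filenames):
        raise ValueError("not enough images for the requested split")
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    picked = rng.sample(sorted(filenames), n_train + n_test)
    return picked[:n_train], picked[n_train:]

# Hypothetical file names for one class; real names come from the dataset folder.
candidates = ["010_{:04d}.jpg".format(i) for i in range(1, 51)]
train_files, test_files = split_train_test(candidates)
print(len(train_files), len(test_files))
```

Sampling once and slicing the result guarantees that no image ends up in both sets.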

To get started, I followed these steps:

1.   I created a bucket in Amazon S3 named piiv;
2.   Next, I selected twenty random images from the dataset mentioned above, covering the four categories listed, and uploaded them to S3 under two directories, "train" and "test".
3.   Moving on to Amazon SageMaker, I started with model training. Using a machine learning technique called [Transfer Learning](https://cs231n.github.io/transfer-learning/), I uploaded the script transfer_learning.py, which contains the complete code applying transfer learning with the PyTorch framework. This script was the entry point for training and deploying my model:

In [None]:
import argparse
import json
import logging
import os
import time
import sys
import torch
import torch.distributed as dist
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.optim import lr_scheduler
import torch.utils.data
import torch.utils.data.distributed
import torchvision
import numpy as np
from torchvision import datasets, transforms, models
import copy
from collections import OrderedDict

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler(sys.stdout))

def _get_data_loader(batch_size, training_dir, is_distributed, **kwargs):
    logger.info("Get dataset into data_loader")
    
    data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(size=256, scale=(0.8, 1.0)),
        transforms.RandomRotation(degrees=15),
        transforms.RandomHorizontalFlip(),
        transforms.CenterCrop(size=224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'test': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    }

    data_dir = training_dir
    image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'test']}

    # Shard only the training set across hosts when training is distributed.
    train_sampler = (torch.utils.data.distributed.DistributedSampler(image_datasets['train'])
                     if is_distributed else None)
    samplers = {'train': train_sampler, 'test': None}

    dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size,
                                                shuffle=samplers[x] is None,
                                                sampler=samplers[x], **kwargs)
                  for x in ['train', 'test']}
    
    dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'test']}

    return dataloaders, dataset_sizes

def model_fn(model_dir):
    try:
        logger.info('model_fn')
        device = "cuda" if torch.cuda.is_available() else "cpu"
        with open(os.path.join(model_dir, 'model.pth'), 'rb') as f:
            ckpt = torch.load(f, map_location='cpu')
        optimizer = ckpt['optimizer']
        epoch = ckpt['epoch']
        model = ckpt['model']
        load_dict = OrderedDict()
        for k, v in model.items():
            if k.startswith('module.'):
                k_ = k.replace('module.', '')
                load_dict[k_] = v
            else:
                load_dict[k] = v
        
        model = models.resnet18(pretrained=False)
        num_ftrs = model.fc.in_features
        
        model.fc = nn.Sequential(
            nn.Linear(num_ftrs, 256),
            nn.ReLU(),
            nn.Dropout(0.4),
            nn.Linear(256, 4), 
            nn.LogSoftmax(dim=1) # For using NLLLoss()
        )
        
        model.load_state_dict(load_dict)
        return model.to(device)
    except Exception as err:
        print(err)
        raise

def save_model(model, optimizer, epoch, model_dir):
    logger.info("Saving the model.")
    path = os.path.join(model_dir, 'model.pth')
    # recommended way from http://pytorch.org/docs/master/notes/serialization.html
    torch.save(
        {
            "model" : model.state_dict(), 
            "optimizer": optimizer.state_dict(),
            "epoch": epoch
        },
        path)

def train_model(dataloaders, dataset_sizes, device, model, criterion, optimizer, 
                scheduler, num_epochs=10):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'test']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            if phase == 'train':
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'test' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model

def train(args):
    is_distributed = len(args.hosts) > 1 and args.backend is not None
    logger.debug("Distributed training - {}".format(is_distributed))
    use_cuda = args.num_gpus > 0
    logger.debug("Number of gpus available - {}".format(args.num_gpus))
    kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
    device = torch.device("cuda" if use_cuda else "cpu")

    if is_distributed:
        # Initialize the distributed environment.
        world_size = len(args.hosts)
        os.environ['WORLD_SIZE'] = str(world_size)
        host_rank = args.hosts.index(args.current_host)
        dist.init_process_group(backend=args.backend, rank=host_rank, world_size=world_size)
        logger.info('Initialized the distributed environment: \'{}\' backend on {} nodes. '.format(
            args.backend, dist.get_world_size()) + 'Current host rank is {}. Number of gpus: {}'.format(
            dist.get_rank(), args.num_gpus))

    # set the seed for generating random numbers
    torch.manual_seed(args.seed)
    if use_cuda:
        torch.cuda.manual_seed(args.seed)

    dataloaders, dataset_sizes = _get_data_loader(args.batch_size, args.data_dir, is_distributed, **kwargs)

    model_ft = models.resnet18(pretrained=True)
    num_ftrs = model_ft.fc.in_features
    
    # Change the final layer of ResNet18 Model for Transfer Learning
    #model_ft.fc = nn.Linear(num_ftrs, 4)
    model_ft.fc = nn.Sequential(
        nn.Linear(num_ftrs, 256),
        nn.ReLU(),
        nn.Dropout(0.4),
        nn.Linear(256, 4), 
        nn.LogSoftmax(dim=1) # For using NLLLoss()
    )

    model_ft = model_ft.to(device)
    if is_distributed and use_cuda:
        # multi-machine multi-gpu case
        model_ft = torch.nn.parallel.DistributedDataParallel(model_ft)
    else:
        # single-machine multi-gpu case or single-machine or multi-machine cpu case
        model_ft = torch.nn.DataParallel(model_ft)

    criterion = nn.NLLLoss()

    # Observe that all parameters are being optimized
    optimizer_ft = optim.SGD(model_ft.parameters(), lr=args.lr, momentum=args.momentum)

    # Decay LR by a factor of 0.1 every 7 epochs
    exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)

    # Training
    model_ft = train_model(dataloaders, dataset_sizes, device, model_ft, criterion, 
                            optimizer_ft, exp_lr_scheduler, args.epochs)

    # Save Model
    save_model(model_ft, optimizer_ft, args.epochs, args.model_dir)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    # Data and model checkpoints directories
    parser.add_argument('--batch-size', type=int, default=4, metavar='N',
                        help='input batch size for training (default: 4)')
    parser.add_argument('--test-batch-size', type=int, default=1000, metavar='N',
                        help='input batch size for testing (default: 1000)')
    parser.add_argument('--epochs', type=int, default=10, metavar='N',
                        help='number of epochs to train (default: 10)')
    parser.add_argument('--lr', type=float, default=0.001, metavar='LR',
                        help='learning rate (default: 0.001)')
    parser.add_argument('--momentum', type=float, default=0.9, metavar='M',
                        help='SGD momentum (default: 0.9)')
    parser.add_argument('--seed', type=int, default=1, metavar='S',
                        help='random seed (default: 1)')
    parser.add_argument('--log-interval', type=int, default=100, metavar='N',
                        help='how many batches to wait before logging training status')
    parser.add_argument('--backend', type=str, default=None,
                        help='backend for distributed training (tcp, gloo on cpu and gloo, nccl on gpu)')

    # Container environment
    parser.add_argument('--hosts', type=list, default=json.loads(os.environ['SM_HOSTS']))
    parser.add_argument('--current-host', type=str, default=os.environ['SM_CURRENT_HOST'])
    parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
    parser.add_argument('--data-dir', type=str, default=os.environ['SM_CHANNEL_TRAINING'])
    parser.add_argument('--num-gpus', type=int, default=os.environ['SM_NUM_GPUS'])

    train(parser.parse_args())
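
To make the learning-rate schedule in the script concrete, here is a small plain-Python sketch of the step decay that `StepLR(step_size=7, gamma=0.1)` applies when `scheduler.step()` is called once per epoch. The helper `step_lr` is hypothetical and only reproduces the arithmetic; it is not part of PyTorch.

```python
def step_lr(base_lr, epoch, step_size=7, gamma=0.1):
    """Learning rate in effect during `epoch` (0-based) under a step decay."""
    return base_lr * gamma ** (epoch // step_size)

# Script defaults: lr=0.001 and 10 epochs.
schedule = [step_lr(0.001, epoch) for epoch in range(10)]
for epoch, lr in enumerate(schedule):
    print("epoch {}: lr = {:.6f}".format(epoch, lr))
```

With the script's defaults, epochs 0 to 6 train at 0.001 and epochs 7 to 9 at 0.0001, so the decay only affects the last three epochs of a 10-epoch run.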

4. Right after that, I created the notebook instance using the ***conda_pytorch_p36*** kernel

NOTE: Since the purpose of this project is to try out some of the machine learning services available on AWS and to practice what I learned in the AWS Academy Machine Learning Foundations course, an ***ml.t3.medium*** instance can be used.


> In the first part of this notebook, I imported some dependencies that I will use and defined the path to the bucket




In [None]:
import os, json
import pandas as pd
import fnmatch
import boto3
from botocore.config import Config
import sagemaker

from sagemaker.tuner import IntegerParameter, CategoricalParameter, ContinuousParameter, HyperparameterTuner

sagemaker_session = sagemaker.Session()

s3 = boto3.resource('s3')

# Destination Bucket
bucket = 'piiv'
role = sagemaker.get_execution_role()



> In the same notebook, I created a SageMaker estimator, which holds the instructions needed for the training process



In [None]:
from sagemaker.pytorch import PyTorch

# Create the Estimator
pytorch_estimator = PyTorch('transfer_learning.py',
                            role=role,
                            instance_type='ml.m5.large',
                            instance_count=1,
                            framework_version='1.5.0',
                            py_version='py3',
                           )
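
The estimator above sets no `hyperparameters`, so the training script runs with its argparse defaults. In script mode, SageMaker forwards the entries of the estimator's `hyperparameters` dictionary to the entry point as command-line arguments. The sketch below rebuilds a subset of the script's parser and parses an example argument list; the values 8 and 5 are illustrative and were not used in this project.

```python
import argparse

def build_parser():
    """Subset of the argument parser defined in transfer_learning.py."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--batch-size', type=int, default=4)
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--lr', type=float, default=0.001)
    parser.add_argument('--momentum', type=float, default=0.9)
    return parser

# Roughly what the script would receive if the estimator were created with
# hyperparameters={'batch-size': 8, 'epochs': 5} (illustrative values).
args = build_parser().parse_args(['--batch-size', '8', '--epochs', '5'])
print(args.batch_size, args.epochs, args.lr, args.momentum)
```

Arguments that are not passed keep their defaults, which is why the training job above ran with a batch size of 4 and 10 epochs.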



> In the next snippet, we take the path to the images in S3 and pass it as a parameter to the SageMaker training process. During training, SageMaker launches the required instances (defined in the previous step), runs the training, saves the trained model to S3, and shuts the instances down.



In [None]:
bucket_uri = 's3://' + bucket
pytorch_estimator.fit({'training': bucket_uri})
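
Because the `training` channel points at the bucket root, the script's `datasets.ImageFolder` calls look for `train/<class>/` and `test/<class>/` folders under the directory given by `SM_CHANNEL_TRAINING`. The sketch below builds the S3 key an image would need under that layout. The helper `expected_key` is hypothetical, the class folder names are assumed to match the label list used later when testing the model, and the file name is only an example.

```python
import posixpath

# Assumed class folder names, taken from the label list in the test script.
CLASSES = ['01-beer-mug', '02-coffee-mug', '03-teapot', '04-wine-bottle']

def expected_key(split, class_name, filename):
    """S3 key for one image under the split/class/file layout ImageFolder expects."""
    if split not in ('train', 'test'):
        raise ValueError("split must be 'train' or 'test'")
    if class_name not in CLASSES:
        raise ValueError("unknown class: " + class_name)
    return posixpath.join(split, class_name, filename)

key = expected_key('train', '01-beer-mug', '010_0001.jpg')
print('s3://piiv/' + key)
```

`ImageFolder` assigns label indices from the sorted folder names, so numeric prefixes such as `01-` keep the class order stable.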

> Once training is complete, the model can be deployed to a SageMaker inference instance. This instance is optimized and comes prepared with the necessary dependencies, which reduces the need for infrastructure management.

> The following command creates an inference instance for testing the model:



In [None]:
predictor = pytorch_estimator.deploy(initial_instance_count=1, instance_type='ml.m5.xlarge')
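
When the endpoint starts, the serving container loads the model through the `model_fn` function in transfer_learning.py. Since training wrapped the network in `DataParallel`, the saved state-dict keys carry a `module.` prefix that has to be removed before loading them into a plain ResNet18. The sketch below isolates that renaming step with placeholder values instead of real weight tensors; `strip_module_prefix` is a hypothetical helper, not a PyTorch function.

```python
from collections import OrderedDict

def strip_module_prefix(state_dict):
    """Return a copy of state_dict whose keys have a leading 'module.' removed."""
    prefix = 'module.'
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned

# Placeholder values stand in for the weight tensors of a real checkpoint.
saved = OrderedDict([('module.conv1.weight', 'w0'), ('module.fc.0.bias', 'b0')])
print(list(strip_module_prefix(saved)))
```

Slicing off only the leading prefix leaves untouched any key that happens to contain `module.` further along in its name.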

# Testing the model

To test it, I used the script below, which takes an image from the validation folder, made up of images from the original dataset that were not used during training.

In [None]:
import boto3
from PIL import Image
import torch
from torchvision import transforms
import numpy as np

loader = transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ])        

def image_loader(image_name):
    image = Image.open(image_name)
    image = loader(image).float()
    image = image.unsqueeze(0) 
    return image

s3 = boto3.resource('s3')
bucket = s3.Bucket('piiv')
s3_object = bucket.Object('010_0009.jpg')  # object key of the test image
response = s3_object.get()
img = response['Body']  # streaming body with the raw image bytes

image = image_loader(img)

objects_category = ['01-beer-mug','02-coffee-mug','03-teapot','04-wine-bottle']

response = predictor.predict(image)
output = torch.exp(torch.tensor(response))
index = np.argmax(output)
print("Result --> label: " + objects_category[index] + " | probability: " + str(output[0][index]))

The result is the output below:

In [None]:
Result --> label: 01-beer-mug | probability: tensor(0.6802, dtype=torch.float64)
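
The probability in this output comes from applying `torch.exp` to the model's response: the network ends in a `LogSoftmax` layer, so it returns log-probabilities, and exponentiating them recovers values that sum to 1. The plain-Python sketch below shows the same arithmetic on made-up scores; the numbers are not actual model output.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_sum for x in logits]

# Made-up raw scores for the four classes, not actual model output.
log_probs = log_softmax([2.0, 0.5, 0.1, -1.0])
probs = [math.exp(lp) for lp in log_probs]
best = max(range(len(probs)), key=probs.__getitem__)
print(best, round(sum(probs), 6))
```

The index of the largest probability is the predicted class, which the test script maps to a label through `objects_category`.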




