# Deep Learning Brasilia - Lesson 3

In this notebook, we walk through every step needed to take part in the Kaggle **Dog Breed Identification** competition.



Model-training skeleton for the Dog Breed Identification competition: https://www.kaggle.com/c/dog-breed-identification

## Getting the Data

1. Install the official Kaggle command-line client:
```
pip install kaggle
```
2. Open the "Account" section of the Kaggle site (replace USUARIO with your username):
https://www.kaggle.com/USUARIO/account
3. Click "Create New API Token" to download the `kaggle.json` file.
4. Copy the downloaded token from your local machine to the `~/.kaggle` folder on the server, creating that folder first if it does not exist (example for Paperspace; replace `seu.ip.aq.ui` with your server's IP address):
```
scp ~/Downloads/kaggle.json paperspace@seu.ip.aq.ui:~/.kaggle
```
5. Test the setup by listing the competitions that contain the word "breed":
```
kaggle competitions list -s breed
```
6. Open the competition's "Rules" section at the link below and click "I Understand and Accept" (the download is only enabled after you accept):
https://www.kaggle.com/c/dog-breed-identification/rules
7. Download the competition data (with the CLI version used here, the files are saved to `~/.kaggle/competitions/dog-breed-identification`):
```
kaggle competitions download -c dog-breed-identification
```
8. Unzip the downloaded files:
```
cd ~/.kaggle/competitions/dog-breed-identification/
unzip '*.zip'
```
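
As an optional sanity check before training, the sketch below verifies that the expected entries are present after unzipping. The helper name `missing_competition_files` is hypothetical, and the list of expected entries is an assumption based on the files and folders this notebook uses later (`labels.csv`, `sample_submission.csv`, `train`, `test`).

```python
import os

# Entries this notebook relies on later (assumed layout after unzipping).
EXPECTED = ['labels.csv', 'sample_submission.csv', 'train', 'test']

def missing_competition_files(data_dir, expected=EXPECTED):
    """Return the expected entries that are absent from data_dir."""
    return [name for name in expected
            if not os.path.exists(os.path.join(data_dir, name))]

if __name__ == '__main__':
    data_dir = os.path.expanduser('~/.kaggle/competitions/dog-breed-identification')
    missing = missing_competition_files(data_dir)
    print('All files present' if not missing else f'Missing: {missing}')
```

An empty result means the data folder has everything the later cells read.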

## Training the Model

In [1]:
# Setting AutoReload
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
import sys, os
BASE_DIR = '/home/paperspace/fastai/courses/dl1/'
sys.path.append(BASE_DIR)

In [3]:
from fastai.conv_learner import ConvLearner
from fastai.transforms import tfms_from_model
from fastai.dataset import ImageClassifierData, get_cv_idxs
from torchvision.models import resnet34

import numpy as np
import pandas as pd

SIZE = 224

In [4]:
DATA_DIR = os.path.join('/home','paperspace', '.kaggle', 'competitions','dog-breed-identification')
print(DATA_DIR)

/home/paperspace/.kaggle/competitions/dog-breed-identification


In [5]:
tfms = tfms_from_model(resnet34, SIZE)
labels_csv = os.path.join(DATA_DIR, 'labels.csv')
n = len(list(open(labels_csv)))-1
val_idxs = get_cv_idxs(n)
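
To make the validation split less opaque: `get_cv_idxs(n)` returns a random subset of row indices to hold out for validation. The sketch below illustrates the idea with plain NumPy. The function name `random_val_idxs`, the 20% fraction, and the seed are assumptions for illustration, and the result is not guaranteed to match fastai's exact indices.

```python
import numpy as np

def random_val_idxs(n, val_pct=0.2, seed=42):
    """Pick int(val_pct * n) distinct row indices at random (illustrative only)."""
    rng = np.random.RandomState(seed)
    n_val = int(val_pct * n)
    return rng.permutation(n)[:n_val]

idxs = random_val_idxs(1000)
print(len(idxs), len(set(idxs.tolist())))  # number of held-out rows, and how many are distinct
```

Fixing the seed keeps the held-out rows the same across runs, so validation scores stay comparable.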

In [6]:
#data = ImageClassifierData.from_paths(DATA_DIR, tfms=transformations)
data = ImageClassifierData.from_csv(DATA_DIR, 'train', labels_csv, tfms=tfms, suffix='.jpg', val_idxs=val_idxs, test_name='test')
learn = ConvLearner.pretrained(resnet34, data, precompute=True)
learn.fit(0.01, 2)

epoch      trn_loss   val_loss   accuracy                   
    0      2.15012    1.115417   0.762305  
    1      1.105205   0.759622   0.806315                   



[0.75962174, 0.806315103545785]

## Improving the Model

Feel free to experiment!

## Submitting the Results

We load the sample submission file and display its first rows below.

In [7]:
import pandas as pd
d=pd.read_csv(os.path.join(DATA_DIR, 'sample_submission.csv'))
d.head()

| | id | affenpinscher | afghan_hound | african_hunting_dog | airedale | american_staffordshire_terrier | appenzeller | australian_terrier | basenji | basset | ... | toy_poodle | toy_terrier | vizsla | walker_hound | weimaraner | welsh_springer_spaniel | west_highland_white_terrier | whippet | wire-haired_fox_terrier | yorkshire_terrier |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 000621fb3cbb32d8935728e48679680e | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | ... | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 |
| 1 | 00102ee9d8eb90812350685311fe5890 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | ... | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 |
| 2 | 0012a730dfa437f5f3613fb75efcd4ce | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | ... | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 |
| 3 | 001510bc8570bbeee98c8d80c8a95ec1 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | ... | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 |
| 4 | 001a5f3114548acdefa3d4da05474c2e | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | ... | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 | 0.008333 |
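
Every entry in the sample submission is 0.008333, which is what a uniform guess over 120 classes looks like (1/120 ≈ 0.008333). The count of 120 breeds is an assumption here, since the table above is truncated; in the notebook it can be checked with `len(d.columns) - 1`.

```python
n_classes = 120  # assumed number of breed columns in the sample submission
uniform_prob = 1 / n_classes
print(round(uniform_prob, 6))  # 0.008333
```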


In the `learn.predict()` call below, we pass `is_test=True` to generate predictions for the test set instead of the validation set, which is the default.

In [8]:
log_preds = learn.predict(is_test=True)
probs = np.exp(log_preds)
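
The call to `np.exp` is needed because the model's final layer is assumed here to output log-probabilities (log-softmax), so exponentiating yields probabilities that sum to 1 for each image. A minimal NumPy sketch with made-up scores, not real model output:

```python
import numpy as np

def log_softmax(scores):
    """Numerically stable log-softmax over the last axis."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

# Made-up raw scores for 2 images and 3 classes (illustrative values only).
scores = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.5, 3.0]])
example_log_preds = log_softmax(scores)
example_probs = np.exp(example_log_preds)

print(example_probs.sum(axis=1))     # each row sums to ~1
print(example_probs.argmax(axis=1))  # most likely class per image
```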

Using pandas, we build a data frame with the same structure as the sample submission file, filled with the probabilities the model inferred for the test data.

In [9]:
# Build the submission data frame from the probabilities computed by the model
df=pd.DataFrame(
    data=probs,
    columns=d.columns[1:], # Skip the first column, which is the ID
    index=[f[5:-4] for f in data.test_dl.dataset.fnames] # Strip the 'test/' prefix and '.jpg' suffix to get the ID
)
df.index.name = 'id'

# Save the data frame to a CSV file for upload to Kaggle
arquivo_submissao = os.path.join(DATA_DIR,'fastai_submission.csv')
df.to_csv(arquivo_submissao)
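
Before uploading, it can help to check that the frame matches the expected layout. The sketch below uses tiny made-up data; `check_submission`, the file names, and the breed columns are hypothetical stand-ins for `df`, `data.test_dl.dataset.fnames`, and `d.columns[1:]`.

```python
import numpy as np
import pandas as pd

def check_submission(sub, expected_columns):
    """Raise AssertionError if the submission frame looks malformed."""
    assert list(sub.columns) == list(expected_columns), 'column mismatch'
    assert sub.index.is_unique, 'duplicate ids'
    assert np.allclose(sub.sum(axis=1), 1.0, atol=1e-3), 'rows should sum to ~1'

# Hypothetical test file names in the 'test/<id>.jpg' form assumed by f[5:-4].
fnames = ['test/aaa111.jpg', 'test/bbb222.jpg']
ids = [f[5:-4] for f in fnames]  # strips the 'test/' prefix and the '.jpg' suffix

toy_probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.1, 0.8]])
toy = pd.DataFrame(toy_probs, columns=['breed_a', 'breed_b', 'breed_c'], index=ids)
toy.index.name = 'id'

check_submission(toy, ['breed_a', 'breed_b', 'breed_c'])
print(toy.index.tolist())
```

In the notebook, the equivalent call would be `check_submission(df, d.columns[1:])`.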

We build the submission command for the official Kaggle tool.

In [10]:
arquivo_submissao = '/home/paperspace/.kaggle/competitions/dog-breed-identification/fastai_submission.csv'
comando = f'kaggle competitions submit -c dog-breed-identification -f {arquivo_submissao} -m "Submissão de teste"'
print(comando)

kaggle competitions submit -c dog-breed-identification -f /home/paperspace/.kaggle/competitions/dog-breed-identification/fastai_submission.csv -m "Submissão de teste"


Running the line below executes the submission command stored in the variable `comando`. The same command can also be run directly in the Paperspace shell. After a few seconds, you should see the message "Successfully submitted to Dog Breed Identification".

In [11]:
!($comando)

Successfully submitted to Dog Breed Identification
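
Outside a notebook, where the `!` shell escape is unavailable, the same command could be run with the standard library. This is a sketch that assumes the `kaggle` executable is on the PATH; `build_submit_args` is a hypothetical helper, and the actual call is left commented out so the example has no side effects.

```python
import shlex
import subprocess

def build_submit_args(competition, csv_path, message):
    """Build the argument list for `kaggle competitions submit`."""
    return ['kaggle', 'competitions', 'submit',
            '-c', competition, '-f', csv_path, '-m', message]

args = build_submit_args('dog-breed-identification',
                         'fastai_submission.csv',
                         'Test submission')
print(' '.join(shlex.quote(a) for a in args))

# Uncomment to actually submit (requires the Kaggle CLI and a valid token):
# subprocess.run(args, check=True)
```

Passing a list of arguments rather than one shell string avoids quoting problems when the message contains spaces.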