# Cats vs. Dogs

This problem has only two classes: cats and dogs. The example provides two datasets, and you need to choose one of them:


* Reduced dataset: a small set of cat and dog images for faster (but less effective) training. A simple CNN with few layers is recommended for this set (see the model for the "reduced dataset").

* Full dataset: a larger set of cat and dog images, used to train a more effective model (higher accuracy) at the cost of slower training. A more complex CNN with more layers is recommended for this set (see the model for the "full dataset").

A notable feature of this example is that the training and testing sets are predefined rather than chosen at random. The validation set, however, is drawn at random from the training set. The predefined sets are loaded with `ImageFolder`, and the validation subset is split off with `random_split`.
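
The random validation split described above can be pictured with a small plain-Python sketch. It only illustrates the idea; in the notebook itself the split is done by `random_split`. The function name `split_indices`, the seed and the toy sizes are assumptions made for illustration.

```python
import random

def split_indices(n_samples, val_size, seed=0):
    """Shuffle the indices 0..n_samples-1 and hold out val_size of them."""
    if not 0 <= val_size <= n_samples:
        raise ValueError("val_size must be between 0 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)            # seeded, so the split is reproducible
    return indices[val_size:], indices[:val_size]   # (train, validation)

train_idx, val_idx = split_indices(n_samples=10, val_size=3)
print(len(train_idx), len(val_idx))
```

Every index ends up in exactly one of the two subsets, so the validation images are never used to update the weights.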

Referencia: [CNN Model With PyTorch For Image Classification](https://medium.com/thecyphy/train-cnn-model-with-pytorch-21dafb918f48), by Pranjal Soni (Medium)

In [None]:
# Reduced dataset
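# -O sets the local file name; otherwise wget keeps "?dl=0" in the name and unzip cannot find the archive
!wget -O catdog_reduced.zip "https://www.dropbox.com/s/kugbkmznlyb4krv/catdog_reduced.zip?dl=0"
!unzip -qq catdog_reduced.zip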

In [None]:
# Full dataset
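# -O sets the local file names so that unzip can find the archives
!wget -O test_set.zip "https://www.dropbox.com/s/rk2vrsow7yk7651/test_set.zip?dl=0"
!wget -O training_set.zip "https://www.dropbox.com/s/6sgiquis94t1r9t/training_set.zip?dl=0"

!unzip -qq test_set.zip
!unzip -qq training_set.zip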

In [None]:
import torch
import torch.nn as nn   # used by the model definitions below
import torchvision
from   torchvision import transforms
from   torchvision.datasets import ImageFolder
from   torch.utils.data.dataloader import DataLoader
from   torch.utils.data import random_split
from   sklearn.metrics import confusion_matrix, accuracy_score

#train and test data directory
#full dataset
#train_dir = "training_set/"
#test_dir  = "test_set/"
#reduced dataset
train_dir = "catdog/train/"
test_dir  = "catdog/test/"

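#load the train and test data
#training images: resize to 64x64, random horizontal flip (augmentation), convert to tensor
train_set = ImageFolder(train_dir, transform = transforms.Compose([
    transforms.Resize((64,64)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor()]))

#test images: same resize and conversion, but no augmentation
test_set = ImageFolder(test_dir, transform = transforms.Compose([
    transforms.Resize((64,64)),
    transforms.ToTensor()]))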

img,_ = train_set[0]
print('Image size: '+ str(img.shape))
print('   Classes: ',train_set.classes)
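
`ImageFolder` expects one subfolder per class inside the root directory and assigns label indices to the class names in sorted order. The sketch below mimics that mapping with the standard library only, on a temporary directory. The folder names `cats` and `dogs` and the helper name `find_classes` are hypothetical; the real class names are the ones printed by `train_set.classes`.

```python
import os
import tempfile

def find_classes(root):
    """Return the sorted class-folder names and a name -> index mapping."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return classes, {name: idx for idx, name in enumerate(classes)}

with tempfile.TemporaryDirectory() as root:
    for name in ("dogs", "cats"):              # hypothetical class folders
        os.makedirs(os.path.join(root, name))
    classes, class_to_idx = find_classes(root)

print(classes)
print(class_to_idx)
```

The label of an image therefore depends only on the folder it sits in, which is why the train and test directories need the same subfolder names.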

In [None]:
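# -O keeps the local name cnn_utils.py (without it, wget keeps "?dl=0" in the name and the import fails)
!wget -O cnn_utils.py "https://www.dropbox.com/s/caz30t81td7zxgl/cnn_utils.py?dl=0"
# the helpers used below (display_img, show_batch, ImageClassificationBase, fit, ...) come from this module
from cnn_utils import *
print('cnn_utils module has been loaded')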

In [None]:
#display an image in the training dataset
display_img(train_set,5)

In [None]:
#split the training set into training and validation subsets and load them into batches

batch_size = 100
val_size   = 800                        # must be smaller than len(train_set)
train_size = len(train_set) - val_size

train_data,val_data = random_split(train_set,[train_size,val_size])
print(f"Length of Train Data : {len(train_data)}")
print(f"Length of Validation Data : {len(val_data)}")

train_dl = DataLoader(train_data, batch_size, shuffle = True, num_workers = 2, pin_memory = True)
val_dl = DataLoader(val_data, batch_size, num_workers = 2, pin_memory = True)
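
Because `DataLoader` keeps the last, possibly smaller, batch by default, the number of batches per epoch is the subset size divided by the batch size, rounded up. A quick check of that arithmetic follows, using a hypothetical training-set size of 2000 images (the real size is `len(train_set)`); the helper name `batches_per_epoch` is also an assumption.

```python
import math

def batches_per_epoch(n_samples, batch_size):
    """Number of batches when the final partial batch is kept."""
    return math.ceil(n_samples / batch_size)

n_total    = 2000                      # hypothetical; use len(train_set) in the notebook
val_size   = 800
batch_size = 100
train_size = n_total - val_size

print(batches_per_epoch(train_size, batch_size))   # training batches per epoch
print(batches_per_epoch(val_size, batch_size))     # validation batches per epoch
```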


In [None]:
show_batch(train_dl,nrow=10)

In [None]:
# model for the reduced dataset (run either this cell or the next one, not both: the two classes share the same name)
class CNN_Classification(ImageClassificationBase):
    def __init__(self):
        super().__init__()
        self.network = nn.Sequential(
            
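            nn.Conv2d(3, 4, kernel_size = 5, stride = 1, padding = 0),  # 3x64x64 -> 4x60x60
            nn.ReLU(),
            nn.Conv2d(4, 8, kernel_size = 5, stride = 4, padding = 0),  # 4x60x60 -> 8x14x14
            nn.ReLU(),
            nn.MaxPool2d(2,2),                                          # 8x14x14 -> 8x7x7

            nn.Flatten(),                                               # 8*7*7 = 392 features
            nn.Linear(392,32),
            nn.Linear(32,2)                                             # one score per class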
        )
    
    def forward(self, xb):
        return self.network(xb)


In [None]:
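# model for the full dataset (run either this cell or the previous one, not both: the two classes share the same name)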
class CNN_Classification(ImageClassificationBase):
    def __init__(self):
        super().__init__()
        self.network = nn.Sequential(
            
            nn.Conv2d(3, 8, kernel_size = 3, stride = 1, padding = 1),    # 3x64x64  -> 8x64x64
            nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size = 3, stride = 2, padding = 1),   # 8x64x64  -> 16x32x32
            nn.ReLU(),
            nn.MaxPool2d(2,2),                                            # 16x32x32 -> 16x16x16

            nn.Conv2d(16, 32, kernel_size = 3, stride = 1, padding = 1),  # 16x16x16 -> 32x16x16
            nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size = 3, stride = 1, padding = 1),  # 32x16x16 -> 32x16x16
            nn.ReLU(),
            nn.MaxPool2d(2,2),                                            # 32x16x16 -> 32x8x8

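            nn.Flatten(),                                                 # 32*8*8 = 2048 features
            nn.Linear(2048,1024),
            nn.Linear(1024,128),
            nn.Dropout(0.25),    # moved before the output layer so the class scores are not zeroed at random
            nn.Linear(128,2)     # one score per class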
        )
    
    def forward(self, xb):
        return self.network(xb)
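
The input sizes of the first `nn.Linear` layer in the two models (392 and 2048) follow from the usual output-size formula for convolution and pooling layers, floor((n + 2p - k) / s) + 1, applied to the 64x64 input. The helper `out_size` is a hypothetical name used only to check that arithmetic.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1

# Reduced-dataset model: conv(5, s=1) -> conv(5, s=4) -> maxpool(2, 2)
n = 64
n = out_size(n, kernel=5, stride=1)   # 60
n = out_size(n, kernel=5, stride=4)   # 14
n = out_size(n, kernel=2, stride=2)   # 7
reduced_features = 8 * n * n          # 8 channels after the last convolution

# Full-dataset model: conv(3, p=1) -> conv(3, s=2, p=1) -> pool -> conv -> conv -> pool
m = 64
m = out_size(m, kernel=3, stride=1, padding=1)   # 64
m = out_size(m, kernel=3, stride=2, padding=1)   # 32
m = out_size(m, kernel=2, stride=2)              # 16
m = out_size(m, kernel=3, stride=1, padding=1)   # 16
m = out_size(m, kernel=3, stride=1, padding=1)   # 16
m = out_size(m, kernel=2, stride=2)              # 8
full_features = 32 * m * m                       # 32 channels after the last convolution

print(reduced_features, full_features)   # 392 2048
```

If the resize in the transforms is changed from 64x64, these two numbers change as well and the first `nn.Linear` layer has to be updated to match.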


In [None]:
model = CNN_Classification()
print(model)
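
`print(model)` lists the layers but not their size. As a rough measure of how much larger the full-dataset model is, the number of trainable parameters can be worked out by hand from the layer definitions above: a convolution has `out * in * k * k + out` parameters and a linear layer has `in * out + out` (weights plus biases). The helper names are hypothetical.

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of a k x k convolution."""
    return c_out * c_in * k * k + c_out

def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

reduced = (conv_params(3, 4, 5) + conv_params(4, 8, 5)
           + linear_params(392, 32) + linear_params(32, 2))

full = (conv_params(3, 8, 3) + conv_params(8, 16, 3)
        + conv_params(16, 32, 3) + conv_params(32, 32, 3)
        + linear_params(2048, 1024) + linear_params(1024, 128)
        + linear_params(128, 2))

print(reduced)   # 13754
print(full)      # 2244914
```

Most of the parameters of the full-dataset model sit in its first linear layer (2048 x 1024).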

In [None]:
#fit the model on the training data and record the results after each epoch
num_epochs = 50
opt_func   = torch.optim.Adam
lr         = 0.0005
history    = fit(num_epochs, lr, model, train_dl, val_dl, opt_func)

In [None]:
plot_accuracies(history)

In [None]:
plot_losses(history)

In [None]:
#reload the weights stored in 'best_model.pt' (presumably written by fit during training)
model = load_model(CNN_Classification,'best_model.pt')

In [None]:
ytest = get_labels(model,test_set)
ypred = get_prediction(model,test_set)

acc   = accuracy_score(ytest,ypred)
C     = confusion_matrix(ytest,ypred)   # rows: true classes, columns: predicted classes

print('Performance on Testing subset:')
print('Accuracy:')
print(acc)
print(' ')
print('Confusion Matrix = ')
print(C)
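
To make the two reported metrics concrete, the sketch below builds a confusion matrix by hand, using the same convention as `confusion_matrix` (rows are true classes, columns are predicted classes), and derives the accuracy from its diagonal. The labels are made-up toy values, not results from this notebook, and the function names are hypothetical.

```python
def confusion(y_true, y_pred, n_classes=2):
    """C[i][j] = number of samples of true class i predicted as class j."""
    C = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        C[t][p] += 1
    return C

def accuracy(C):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(C[i][i] for i in range(len(C)))
    total = sum(sum(row) for row in C)
    return correct / total

y_true = [0, 0, 0, 1, 1, 1, 1, 0]   # toy ground truth
y_pred = [0, 1, 0, 1, 1, 0, 1, 0]   # toy predictions
C = confusion(y_true, y_pred)
print(C)             # [[3, 1], [1, 3]]
print(accuracy(C))   # 0.75
```

The off-diagonal entries show which class is mistaken for which, something the accuracy alone does not reveal.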

In [None]:
C1,acc1 = performance(model,train_data,'Training')
C2,acc2 = performance(model,val_data,'Validation')
C3,acc3 = performance(model,test_set,'Testing')