<h1>Pento Machine Learning Engineer Challenge</h1>
<h3>Juan Andres Tabarez Santias's solution</h3>

<h4>Summary</h4>

As the dataset is very small, I think the best solution is to use a pre-trained model.
The chosen one is ResNet18. I'm going to replace its output layer with my own output layer, so the features computed by the pre-trained network become the input of my new layer.

I'll start by importing the libraries I'm going to use.

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, models, transforms
from torch.utils.data import DataLoader

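Then I load the model with pretrained=True, which downloads weights that were already trained on the ImageNet dataset instead of initializing them randomly. (In torchvision 0.13 and later this argument is deprecated in favour of weights=models.ResNet18_Weights.IMAGENET1K_V1, which loads the same weights.)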

In [2]:
model = models.resnet18(pretrained=True)

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /Users/juan/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 48.5MB/s]


I use transforms.Compose to chain three transforms. First, I resize the image to 224x224, the input size ResNet18 was trained with. Second, I convert it to a tensor, because PyTorch models require their input data to be tensors; this step also scales pixel values from [0, 255] to [0, 1]. Finally, I normalize each channel with the mean and standard deviation of ImageNet, the dataset the model was pre-trained on.

In [3]:
composed = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225])
])
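To make the arithmetic concrete, here is a plain-Python sketch of what ToTensor followed by Normalize does to a single 8-bit RGB pixel. It is only an illustration (the helper preprocess_pixel is hypothetical); torchvision applies the same formula to whole tensors.

```python
# Per-channel ImageNet statistics, copied from the Normalize call above.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def preprocess_pixel(rgb):
    """Mimic ToTensor + Normalize for one 8-bit RGB pixel."""
    # ToTensor scales 8-bit values from [0, 255] to [0.0, 1.0].
    scaled = [value / 255.0 for value in rgb]
    # Normalize computes (x - mean) / std independently for each channel.
    return tuple((x - m) / s for x, m, s in zip(scaled, MEAN, STD))


print(preprocess_pixel((0, 0, 0)))        # black pixel
print(preprocess_pixel((255, 255, 255)))  # white pixel
```

A black pixel maps to roughly -2.12 in the red channel and a white pixel to roughly 2.25, so the inputs end up on the same scale the network saw during pre-training.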

Load the data with ImageFolder, which expects one sub-folder per class inside the root folder and uses the folder names as labels.

In [4]:
dataset = datasets.ImageFolder(root='./pento-ssr-challenge/dogs', transform=composed)
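ImageFolder assigns labels by listing the sub-folders of the root directory, sorting their names, and numbering them in that order. The sketch below mimics that rule with the standard library; the class names are hypothetical placeholders, not the real folder names of the challenge data. On the real dataset the resulting mapping is available as dataset.class_to_idx.

```python
import os
import tempfile


def find_classes(root):
    """Sketch of ImageFolder's labelling rule: one sub-folder per class,
    sorted by name, numbered in that order."""
    with os.scandir(root) as entries:
        classes = sorted(entry.name for entry in entries if entry.is_dir())
    return classes, {name: index for index, name in enumerate(classes)}


# Hypothetical class folders, created in a temporary directory for the demo.
with tempfile.TemporaryDirectory() as root:
    for name in ["class_c", "class_a", "class_d", "class_b"]:
        os.makedirs(os.path.join(root, name))
    classes, class_to_idx = find_classes(root)

print(classes)       # ['class_a', 'class_b', 'class_c', 'class_d']
print(class_to_idx)  # {'class_a': 0, 'class_b': 1, 'class_c': 2, 'class_d': 3}
```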

I randomly split the dataset: 80% for training and 20% for validation.

In [6]:
train_size = int(len(dataset) * 0.8)
val_size = len(dataset) - train_size
train_dataset, val_dataset = torch.utils.data.random_split(dataset, [train_size, val_size])
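random_split shuffles the indices with PyTorch's global random generator, so the split changes every time the notebook runs. A minimal sketch of a reproducible 80/20 split, assuming a fixed seed is acceptable, passes a seeded torch.Generator. Here a range of 10 integers stands in for the image dataset, and split_80_20 is a hypothetical helper.

```python
import torch
from torch.utils.data import random_split


def split_80_20(dataset, seed=0):
    """Split a dataset 80/20 using a seeded generator so the split is reproducible."""
    train_size = int(len(dataset) * 0.8)
    val_size = len(dataset) - train_size
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [train_size, val_size], generator=generator)


# Two splits with the same seed select exactly the same indices.
first_train, first_val = split_80_20(range(10), seed=42)
second_train, second_val = split_80_20(range(10), seed=42)

print(len(first_train), len(first_val))             # 8 2
print(first_train.indices == second_train.indices)  # True
```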


Since I will only train the new output layer, I freeze the pre-trained weights by setting requires_grad to False for every parameter of the model.

In [7]:
for param in model.parameters():
    param.requires_grad = False

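Now I replace the output layer with my own fully connected layer. The last hidden layer of ResNet18 has 512 neurons, so the new layer has 512 inputs and 4 outputs, one for each class. Because it is created after the freezing step, its parameters keep the default requires_grad=True and are the only ones that will be trained.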

In [8]:
model.fc = nn.Linear(512, 4)

Create DataLoader objects for the training and validation data. The training loader reshuffles the samples at every epoch.

In [9]:
train_loader = DataLoader(dataset=train_dataset, batch_size=16, shuffle=True)

val_loader = DataLoader(dataset=val_dataset, batch_size=16)
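With batch_size=16, the last batch is smaller whenever the number of samples is not a multiple of 16 (drop_last defaults to False). This sketch uses 50 random tensors as a hypothetical stand-in for the images to show the resulting batch sizes.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 50 dummy samples with 4 possible labels (hypothetical stand-in data).
features = torch.randn(50, 3)
labels = torch.randint(0, 4, (50,))
loader = DataLoader(TensorDataset(features, labels), batch_size=16, shuffle=True)

# Shuffling changes the order of samples, not the batch sizes.
batch_sizes = [x.shape[0] for x, y in loader]
print(len(loader), batch_sizes)  # 4 [16, 16, 16, 2]
```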

Here I create the loss function (cross-entropy, the usual choice for multi-class classification) and the optimizer. The optimizer only receives the parameters whose requires_grad attribute is True, so only the parameters of the new output layer are updated.

In [10]:
criterion = nn.CrossEntropyLoss()

optimizer = optim.Adam([p for p in model.parameters() if p.requires_grad], lr=0.001)

Finally, I train the model on mini-batches with the Adam optimizer for 5 epochs, evaluate it on the validation set after each epoch, and print the validation accuracy after the last epoch.

In [13]:
num_epochs = 5

for epoch in range(num_epochs):
    # Training: switch to training mode once per epoch.
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        z = model(x)
        loss = criterion(z, y)
        loss.backward()
        optimizer.step()

    # Validation: evaluation mode and no gradient tracking.
    model.eval()
    correct = 0
    with torch.no_grad():
        for x_test, y_test in val_loader:
            z = model(x_test)
            _, yhat = torch.max(z, 1)
            correct += (yhat == y_test).sum().item()

    accuracy = correct / len(val_dataset)

# Validation accuracy after the last epoch.
print(f"Accuracy: {accuracy}")

Accuracy: 1.0
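The accuracy computation relies on torch.max along dimension 1, which returns both the maximum values and their indices; the indices are the predicted classes. A small example with hypothetical logits for three images and four classes shows the arithmetic.

```python
import torch

# Hypothetical logits for 3 images over 4 classes, and their true labels.
toy_logits = torch.tensor([[2.0, 0.1, 0.3, 0.0],
                           [0.2, 0.1, 1.5, 0.4],
                           [0.9, 1.1, 0.0, 0.2]])
toy_labels = torch.tensor([0, 2, 0])

# torch.max over dim 1 returns (values, indices); the indices are the predictions.
_, predictions = torch.max(toy_logits, 1)
toy_correct = (predictions == toy_labels).sum().item()
toy_accuracy = toy_correct / len(toy_labels)
print(predictions.tolist(), toy_accuracy)  # [0, 2, 1] 0.6666666666666666
```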


The function below uses the trained model to predict the class of a single image.

In [None]:
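from PIL import Image

def image_prediction(image_path):
    # convert("RGB") handles grayscale or RGBA files; unsqueeze(0) adds the batch dimension.
    image = composed(Image.open(image_path).convert("RGB")).unsqueeze(0)
    model.eval()
    with torch.no_grad():
        output = model(image)
    _, prediction = torch.max(output, 1)
    return dataset.classes[prediction.item()]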

In [None]:
image_path = 'path to image'
predicted_class = image_prediction(image_path)
print(f"The predicted class is: {predicted_class}")