This model recognizes fresh and rotten fruit. I trained it to a validation accuracy of `92%` using a combination of transfer learning, data augmentation, and fine-tuning.

In [1]:
import torch
import torch.nn as nn
from torch.optim import Adam
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms.v2 as transforms
import torchvision.io as tv_io

import glob
from PIL import Image

import utils

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.cuda.is_available()

True

## 7.1 The Dataset

The dataset comes from [Kaggle](https://www.kaggle.com/sriramr/fruits-fresh-and-rotten-for-classification) and is stored in the `data/fruits` folder. There are 6 categories of fruit: fresh apples, fresh oranges, fresh bananas, rotten apples, rotten oranges, and rotten bananas. This means the model needs an output layer of 6 neurons to do the categorization successfully. Because there are more than two categories, the model is trained with a categorical cross-entropy loss (`nn.CrossEntropyLoss` in PyTorch).

<img src="./images/fruits.png" style="width: 600px;">
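
To make the six-way output concrete, here is a minimal sketch of how six logits per image pair with `nn.CrossEntropyLoss`. The random `logits` and the `targets` values are made up for illustration; they stand in for real model output and real labels.

```python
import torch
import torch.nn as nn

N_CLASSES = 6  # one logit per fruit category

# Stand-in for model output: a batch of 4 images, one raw score (logit) per class
logits = torch.randn(4, N_CLASSES)
# Integer class indices in the range 0..5 (illustrative values)
targets = torch.tensor([0, 2, 5, 3])

# CrossEntropyLoss applies log-softmax internally, so the model outputs raw logits
loss_function = nn.CrossEntropyLoss()
loss = loss_function(logits, targets)

# Softmax turns the logits into per-class probabilities that sum to 1 per image
probs = torch.softmax(logits, dim=1)
print(loss.item(), probs.sum(dim=1))
```

Note that the targets are plain class indices, not one-hot vectors, which is why the dataset only needs to store one integer per image.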

## 7.2 Load ImageNet Base Model

I started with a model pretrained on ImageNet and loaded it with the corresponding weights. Because these pictures are in color, the input has three channels (red, green, and blue), which is what VGG16 expects. For a reference on setting up the pretrained model, see [notebook 05b](05b_presidential_doggy_door.ipynb), where transfer learning is implemented.

In [2]:
from torchvision.models import vgg16
from torchvision.models import VGG16_Weights

weights = VGG16_Weights.IMAGENET1K_V1
vgg_model = vgg16(weights=weights)

Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to C:\Users\amohd/.cache\torch\hub\checkpoints\vgg16-397923af.pth


100%|██████████| 528M/528M [00:18<00:00, 29.3MB/s] 


## 7.3 Freeze Base Model

Next, I froze the base model, as done in [notebook 05b](05b_presidential_doggy_door.ipynb). This keeps what the model learned from the ImageNet dataset from being destroyed in the initial training.

In [4]:
# Freeze base model
vgg_model.requires_grad_(False)
next(iter(vgg_model.parameters())).requires_grad

False
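
The effect of freezing can be checked by counting trainable parameters. The sketch below uses a tiny stand-in network rather than VGG16 so it runs instantly; `count_trainable` is a hypothetical helper, not part of `utils.py`.

```python
import torch.nn as nn

def count_trainable(model: nn.Module) -> int:
    """Number of parameters that will receive gradients."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Small stand-in for a pretrained base (hypothetical, not VGG16)
base = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
before = count_trainable(base)      # 8*4 + 4 + 4*2 + 2 = 46

base.requires_grad_(False)          # freeze every parameter in the base
frozen = count_trainable(base)      # 0

# A new head added on top of the frozen base stays trainable
model = nn.Sequential(base, nn.Linear(2, 6))
head_only = count_trainable(model)  # 2*6 + 6 = 18

print(before, frozen, head_only)
```

With the base frozen, only the newly added layers are updated during the initial training.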

## 7.4 Add Layers to Model

Now it's time to add layers to the pretrained model, using [notebook 05b](05b_presidential_doggy_door.ipynb) as a guide. I paid close attention to the last dense layer to make sure it has the correct number of neurons to classify the different types of fruit.

The later layers of a model become more specific to the data the model was trained on. Since we want the more general learnings from VGG, we can select parts of it, like so:

In [5]:
vgg_model.classifier[0:3]

Sequential(
  (0): Linear(in_features=25088, out_features=4096, bias=True)
  (1): ReLU(inplace=True)
  (2): Dropout(p=0.5, inplace=False)
)

Once we've taken what we want from VGG16, we can add our own modifications. No matter what additional modules we add, the model still needs to end with one output value for each class.

In [6]:
N_CLASSES = 6

my_model = nn.Sequential(
    vgg_model.features,
    vgg_model.avgpool,
    nn.Flatten(),
    vgg_model.classifier[0:3],
    nn.Linear(4096, 500),
    nn.ReLU(),
    nn.Linear(500, N_CLASSES)
)
my_model

Sequential(
  (0): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1
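
The layer sizes in the model above follow from the shapes flowing through it. As a rough sketch, the snippet below traces them with random tensors: VGG16's `features` and `avgpool` produce 512 channels of 7x7, which flattens to the 25088 inputs of `classifier[0]`. The `head` here is a stand-alone copy of the new layers only, fed a made-up 4096-dimensional input so that the large pretrained layers are not needed.

```python
import torch
import torch.nn as nn

N_CLASSES = 6

# Shape of the VGG16 feature extractor output after avgpool, for a batch of 2
features = torch.randn(2, 512, 7, 7)
flat = nn.Flatten()(features)
print(flat.shape)  # 512 * 7 * 7 = 25088 features per image

# The new layers only; input stands in for the 4096-d output of classifier[0:3]
head = nn.Sequential(
    nn.Linear(4096, 500),
    nn.ReLU(),
    nn.Linear(500, N_CLASSES),
)
out = head(torch.randn(2, 4096))
print(out.shape)   # one logit per class for each image
```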

## 7.5 Compile Model

Now it's time to set up the loss function and optimizer. We have 6 classes, so the `CrossEntropyLoss()` loss function is used.

In [7]:
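# Unfreeze the entire base model so its weights are fine-tuned too
# (this undoes the freeze from section 7.3)
vgg_model.requires_grad_(True)

# Move the model to the same device as the data
my_model = my_model.to(device)

# Define the loss function (the training cells below refer to it as `loss_function`)
loss_function = nn.CrossEntropyLoss()

# Define the optimizer with a very low learning rate for fine-tuning
optimizer = Adam(my_model.parameters(), lr=1e-5)

# Optional: learning rate scheduler
# It only takes effect if scheduler.step(valid_loss) is called after each epoch
from torch.optim.lr_scheduler import ReduceLROnPlateau
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=2)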
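
A `ReduceLROnPlateau` scheduler only changes the learning rate when it is stepped with a monitored value. The sketch below shows one way that could look, using a tiny stand-in model and made-up validation losses that stop improving; in a real loop the value passed to `scheduler.step` would be the validation loss of each epoch.

```python
import torch.nn as nn
from torch.optim import Adam
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = nn.Linear(4, 2)  # hypothetical stand-in for my_model
optimizer = Adam(model.parameters(), lr=1e-5)
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=2)

# Made-up validation losses that never improve after the first epoch
fake_valid_losses = [1.0, 1.0, 1.0, 1.0]

lrs = []
for valid_loss in fake_valid_losses:
    scheduler.step(valid_loss)                   # report the monitored metric
    lrs.append(optimizer.param_groups[0]['lr'])  # learning rate after this epoch

print(lrs)
```

Once the loss has failed to improve for more than `patience` epochs, the learning rate is multiplied by `factor`.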


## 7.6 Data Transforms

To preprocess input images, we use the transforms included with the VGG16 weights.

In [8]:
pre_trans = weights.transforms()

I randomly augmented the data to improve the dataset. See [notebook 04a](04a_asl_augmentation.ipynb) and [notebook 05b](05b_presidential_doggy_door.ipynb) for augmentation examples. There is also documentation for the [TorchVision Transforms class](https://pytorch.org/vision/stable/transforms.html).

**Hint**: Remember not to make the data augmentation too extreme.

In [18]:
IMG_WIDTH, IMG_HEIGHT = (224, 224)

random_trans = transforms.Compose([
    transforms.Resize((IMG_HEIGHT, IMG_WIDTH)),  # Resize expects (height, width)
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(10),
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToDtype(torch.float32, scale=True)
])


## 7.7 Load Dataset

Now it's time to load the train and validation datasets. 

In [19]:
DATA_LABELS = ["freshapples", "freshbanana", "freshoranges", "rottenapples", "rottenbanana", "rottenoranges"] 
    
class MyDataset(Dataset):
    def __init__(self, data_dir):
        self.imgs = []
        self.labels = []
        
        for l_idx, label in enumerate(DATA_LABELS):
            data_paths = glob.glob(data_dir + label + '/*.png', recursive=True)
            for path in data_paths:
                img = tv_io.read_image(path, tv_io.ImageReadMode.RGB)
                self.imgs.append(pre_trans(img).to(device))
                self.labels.append(torch.tensor(l_idx).to(device))


    def __getitem__(self, idx):
        img = self.imgs[idx]
        label = self.labels[idx]
        return img, label

    def __len__(self):
        return len(self.imgs)

Select the batch size `n` and set `shuffle` to `True` or `False` depending on whether we are `train`ing or `valid`ating. For a reference, check out [notebook 05b](05b_presidential_doggy_door.ipynb).

In [20]:
n = 32

train_path = "data/fruits/train/"
train_data = MyDataset(train_path)
train_loader = DataLoader(train_data, batch_size=n, shuffle=True)
train_N = len(train_loader.dataset)

valid_path = "data/fruits/valid/"
valid_data = MyDataset(valid_path)
valid_loader = DataLoader(valid_data, batch_size=n, shuffle=False)
valid_N = len(valid_loader.dataset)
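
How the `DataLoader` splits a dataset into batches can be seen with a small in-memory example. `TinyDataset` below is a hypothetical stand-in with the same `__getitem__` and `__len__` interface as `MyDataset`, holding blank images instead of files read from disk.

```python
import math
import torch
from torch.utils.data import Dataset, DataLoader

class TinyDataset(Dataset):
    """Hypothetical in-memory dataset with the same interface as MyDataset."""
    def __init__(self, n_items):
        self.imgs = [torch.zeros(3, 8, 8) for _ in range(n_items)]
        self.labels = [torch.tensor(i % 6) for i in range(n_items)]

    def __getitem__(self, idx):
        return self.imgs[idx], self.labels[idx]

    def __len__(self):
        return len(self.imgs)

n = 32
data = TinyDataset(70)
loader = DataLoader(data, batch_size=n, shuffle=True)

# 70 items with batch size 32 gives two full batches and one partial batch
batch_sizes = [len(labels) for _, labels in loader]
print(batch_sizes, len(loader) == math.ceil(len(data) / n))
```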

## 7.8 Train the Model

Time to train the model! I've moved the `train` and `validate` functions to our [utils.py](./utils.py) file. Before running the cell below, make sure all the variables are correctly defined.
It may help to rerun this cell or change the number of `epochs`.

In [21]:
epochs = 20

for epoch in range(epochs):
    print('Epoch: {}'.format(epoch))
    utils.train(my_model, train_loader, train_N, random_trans, optimizer, loss_function)
    utils.validate(my_model, valid_loader, valid_N, loss_function)

Epoch: 0
Train - Loss: 70.9824 Accuracy: 0.6946
Valid - Loss: 16.3028 Accuracy: 0.7781
Epoch: 1
Train - Loss: 21.5779 Accuracy: 0.8130
Valid - Loss: 11.0210 Accuracy: 0.7690
Epoch: 2
Train - Loss: 16.2375 Accuracy: 0.8604
Valid - Loss: 5.5058 Accuracy: 0.8693
Epoch: 3
Train - Loss: 13.0727 Accuracy: 0.8900
Valid - Loss: 9.1529 Accuracy: 0.8359
Epoch: 4
Train - Loss: 9.0874 Accuracy: 0.9188
Valid - Loss: 8.7047 Accuracy: 0.8663
Epoch: 5
Train - Loss: 7.9512 Accuracy: 0.9255
Valid - Loss: 4.6246 Accuracy: 0.8936
Epoch: 6
Train - Loss: 9.7744 Accuracy: 0.9103
Valid - Loss: 3.6779 Accuracy: 0.9027
Epoch: 7
Train - Loss: 7.8933 Accuracy: 0.9239
Valid - Loss: 3.9176 Accuracy: 0.8997
Epoch: 8
Train - Loss: 6.1387 Accuracy: 0.9467
Valid - Loss: 3.8745 Accuracy: 0.9210
Epoch: 9
Train - Loss: 5.5798 Accuracy: 0.9492
Valid - Loss: 4.8223 Accuracy: 0.9301
Epoch: 10
Train - Loss: 5.9077 Accuracy: 0.9501
Valid - Loss: 3.2444 Accuracy: 0.9331
Epoch: 11
Train - Loss: 8.7027 Accuracy: 0.9349
Valid - Lo
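
The contents of `utils.py` are not shown here, so the following is only a sketch of what `train` and `validate` might look like, written to match the call signatures and log format above. Summing the loss over batches (rather than averaging) and returning the totals are assumptions; the demo at the end uses a tiny linear model and synthetic data in place of the fruit images.

```python
import torch
import torch.nn as nn
from torch.optim import Adam
from torch.utils.data import DataLoader, TensorDataset

def get_batch_accuracy(output, y, N):
    """This batch's correct predictions as a fraction of the full dataset size N."""
    pred = output.argmax(dim=1)
    return (pred == y).sum().item() / N

def train(model, train_loader, train_N, random_trans, optimizer, loss_function):
    loss, accuracy = 0.0, 0.0
    model.train()
    for x, y in train_loader:
        output = model(random_trans(x))  # augment each batch on the fly
        optimizer.zero_grad()
        batch_loss = loss_function(output, y)
        batch_loss.backward()
        optimizer.step()
        loss += batch_loss.item()        # assumption: summed over batches
        accuracy += get_batch_accuracy(output, y, train_N)
    print('Train - Loss: {:.4f} Accuracy: {:.4f}'.format(loss, accuracy))
    return loss, accuracy                # assumption: returned for convenience

def validate(model, valid_loader, valid_N, loss_function):
    loss, accuracy = 0.0, 0.0
    model.eval()
    with torch.no_grad():                # no gradients needed for evaluation
        for x, y in valid_loader:
            output = model(x)
            loss += loss_function(output, y).item()
            accuracy += get_batch_accuracy(output, y, valid_N)
    print('Valid - Loss: {:.4f} Accuracy: {:.4f}'.format(loss, accuracy))
    return loss, accuracy

# Tiny synthetic run: 64 random samples, 6 classes, no real augmentation
torch.manual_seed(0)
x = torch.randn(64, 10)
y = torch.randint(0, 6, (64,))
loader = DataLoader(TensorDataset(x, y), batch_size=32)
model = nn.Linear(10, 6)
optimizer = Adam(model.parameters(), lr=1e-3)
loss_function = nn.CrossEntropyLoss()

train_loss, train_acc = train(model, loader, 64, nn.Identity(), optimizer, loss_function)
valid_loss, valid_acc = validate(model, loader, 64, loss_function)
```

A summed loss would be consistent with the log above, where loss values well above 1 appear alongside high accuracy.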

## 7.9 Unfreeze Model for Fine Tuning

If the model has already reached 92% validation accuracy, this next step is optional. If not, fine-tune the model with a very low learning rate.

In [22]:
# Unfreeze the base model
vgg_model.requires_grad_(True)
optimizer = Adam(my_model.parameters(), lr=.0001)

In [14]:
epochs = 1

for epoch in range(epochs):
    print('Epoch: {}'.format(epoch))
    utils.train(my_model, train_loader, train_N, random_trans, optimizer, loss_function)
    utils.validate(my_model, valid_loader, valid_N, loss_function)

Epoch: 0
Train - Loss: 2.4129 Accuracy: 0.9772
Valid - Loss: 6.2580 Accuracy: 0.9088


## 7.10 Evaluate the Model

At this point the model should have a validation accuracy of 92% or higher. If not, you may want to go back and either run more epochs of training or adjust your data augmentation.


In [15]:
utils.validate(my_model, valid_loader, valid_N, loss_function)

Valid - Loss: 6.2580 Accuracy: 0.9088
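
Beyond the aggregate accuracy, it can be useful to look at a single prediction. The sketch below maps the model's output back to a label name; `predict_label` is a hypothetical helper, and a small untrained stand-in model with a random image is used so the snippet runs on its own. With the real model, the input would be an image passed through `pre_trans`.

```python
import torch
import torch.nn as nn

DATA_LABELS = ["freshapples", "freshbanana", "freshoranges",
               "rottenapples", "rottenbanana", "rottenoranges"]

def predict_label(model, img):
    """Hypothetical helper: label name and probability for one preprocessed image."""
    model.eval()
    with torch.no_grad():
        logits = model(img.unsqueeze(0))  # add a batch dimension
    probs = torch.softmax(logits, dim=1)[0]
    idx = int(probs.argmax())
    return DATA_LABELS[idx], float(probs[idx])

# Untrained stand-in model and random image, for illustration only
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, len(DATA_LABELS)))
label, prob = predict_label(model, torch.rand(3, 8, 8))
print(label, prob)
```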
