# Aerial Cactus Identification

Goal : classify whether an aerial image contains a cactus or not
Dataset : [Aerial Cactus Image](https://www.kaggle.com/c/aerial-cactus-identification)

## Table of Contents
- [0. Prerequisites](#prerequisites)
- [1. Data Preparation](#dataprep)
- [2. Model Building](#modelbuild)
- [3. Model Training and Evaluation](#modeltraineval)
- [-1. Summary](#summary)


author @otivedani | github.com/otivedani

---
<a name='prerequisites'></a>
## #0 :: Prerequisites 

*Here we define dependencies, environments and dataset download*

In [None]:
import random
import sys
from pathlib import Path
import numpy as np
import pandas as pd
from PIL import Image
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import transforms
from typing import Callable, Optional, Tuple
import matplotlib.pyplot as plt


In [None]:
# DEFINE random seed
RANDOM_SEED = 86
# RANDOM_SEED = random.randrange(sys.maxsize)
#!DEFINE

print(f"seed : {RANDOM_SEED}")

# initiate seeds
random.seed(RANDOM_SEED)
torch.manual_seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)

# use accelerator when available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

Download the dataset from https://www.kaggle.com/c/aerial-cactus-identification

Set the dataset path in the cell below and run it.
The directory listing should look something like this :
```
!tree /kaggle/working/aerial-cactus-identification
.
├── sample_submission.csv
├── test.zip
├── train.csv
└── train.zip
```

In [None]:
# run this cell only if the notebook is hosted on kaggle.com: /kaggle/input is read-only, so copy the data to a writable directory first
!cp -r /kaggle/input/aerial-cactus-identification /kaggle/working/

In [None]:
# DEFINE our dataset directory, e.g. /kaggle/working/aerial-cactus-identification
ACI_DATASET_PATH = '/kaggle/working/aerial-cactus-identification'
#!DEFINE

!tree -L 1 {ACI_DATASET_PATH}

!test ! -d {ACI_DATASET_PATH}/train && unzip -q {ACI_DATASET_PATH}/train.zip -d {ACI_DATASET_PATH}
!test ! -d {ACI_DATASET_PATH}/test && unzip -q {ACI_DATASET_PATH}/test.zip -d {ACI_DATASET_PATH}

---
<a name='dataprep'></a>
## #1 :: Data Preparation

*Dataset definition, and splitting training and validation*

### Dataset Definition

The dataset class below pairs each image with its label and applies optional transformations to the images and labels :

In [None]:
# DEFINE how to load dataset, pairing labels and images here

class AerialCactus(torch.utils.data.Dataset):
    """
    Aerial Cactus Identification Dataset 
    from https://www.kaggle.com/c/aerial-cactus-identification
    """
    def __init__(self, 
                 root: str,
                 train: bool = True,
                 transform: Optional[Callable] = None, 
                 target_transform: Optional[Callable] = None,
                ) -> None:
        super().__init__()
        self.root = Path(root)
        self.transform = transform
        self.target_transform = target_transform
        
        # assuming train.zip and test.zip were already unzipped
        mode = 'train/' if train else 'test/'
        self.images_path = self.root/mode
        
        # the dataset index comes from the supplied csv
        csv_dataframe = 'train.csv' if train else 'sample_submission.csv'
        self.df = pd.read_csv(self.root/csv_dataframe)
    
    def __getitem__(self, idx:int) -> Tuple[torch.Tensor, int]:
        """Get one item from the dataset by index"""
        
        # the supplied csv has 2 fields
        data = self.df.loc[idx]
        
        # `has_cactus` : 1 means the image has cactus, 0 means no cactus
        label = data['has_cactus']
        if self.target_transform:
            label = self.target_transform(label)
            
        # `id` : the filename, which are inside train/test zip
        image = Image.open(self.images_path/data['id'])
        image = np.array(image)
        if self.transform:
            image = self.transform(image)
        if not isinstance(image, torch.Tensor):
            image = transforms.functional.to_tensor(image)
        
        return image, label
    
    def __len__(self) -> int:
        return len(self.df)
    
#!DEFINE

# preview dataset definition
preview_dataset = AerialCactus(ACI_DATASET_PATH, train=True)
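label_counts = preview_dataset.df['has_cactus'].value_counts()
# name the classes by label value, not by position (value_counts() orders by count)
label_counts = label_counts.rename(index={1: 'has_cactus', 0: 'no_cactus'})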
print(label_counts)

The dataset is imbalanced: class 1 (cactus) images outnumber class 0 (no cactus) images roughly 3 to 1, i.e. the class 0 : class 1 ratio is ~0.33. This could be addressed by adding more data, by augmentation, or by tweaking the class weight; the last option is used later.
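The ratio can be checked with a few lines of arithmetic. This sketch uses the per-class totals from the split reported in the summary section (train plus validation); `class_ratios` is a hypothetical helper, not part of the notebook.

```python
def class_ratios(n_pos: int, n_neg: int) -> dict:
    """Hypothetical helper summarising a binary class imbalance."""
    return {
        "neg_to_pos": n_neg / n_pos,  # class 0 : class 1
        "pos_to_neg": n_pos / n_neg,  # class 1 : class 0
    }

# per-class totals over train + validation, taken from the summary section
n_has_cactus = 10537 + 2599  # class 1
n_no_cactus = 3463 + 901     # class 0

ratios = class_ratios(n_has_cactus, n_no_cactus)
print(f"class 0 : class 1 = {ratios['neg_to_pos']:.2f}")  # 0.33
print(f"class 1 : class 0 = {ratios['pos_to_neg']:.2f}")  # 3.01
# the BCEWithLogitsLoss docs suggest pos_weight = negatives / positives, i.e. ratios['neg_to_pos']
```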

In [None]:
# DEFINE how to use the dataset, without transformations
base_dataset = AerialCactus(ACI_DATASET_PATH, train=True, transform=None)
#!DEFINE

print(f"Using a total of {len(base_dataset)} images")

# preview how our dataset is loaded
base_dataloader = torch.utils.data.DataLoader(base_dataset, batch_size=8, shuffle=True)
example_data = next(iter(base_dataloader))

f, axs = plt.subplots(1,8, figsize=(16,16))
for ax, img, label in zip(axs, *example_data,):
    ax.imshow(np.moveaxis(img.numpy(), 0, -1))
    ax.set_title(['no_cactus', 'has_cactus'][int(label)])

### Train-Validation Split

Set up the data loaders for training and validation in the cell below.

In [None]:
# DEFINE loader batch sizes, train-validation split ratio
TRAIN_RATIO = 0.8
BATCH_SIZE = 32
#!DEFINE 

# split our base dataset to `train` and `validation` datasets
n_train = int(len(base_dataset) * TRAIN_RATIO)
n_valid = len(base_dataset) - n_train
train_dataset, valid_dataset = torch.utils.data.random_split(base_dataset, [n_train, n_valid])

train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=BATCH_SIZE, num_workers=0, shuffle=True)
valid_dataloader = torch.utils.data.DataLoader(valid_dataset, batch_size=BATCH_SIZE, num_workers=0, shuffle=True)
print(f"Each loader yields batches of {BATCH_SIZE} images")

# count each target class per split from the csv labels, without loading any image
all_targets = base_dataset.df['has_cactus'].to_numpy()
train_targets = all_targets[train_dataset.indices]
valid_targets = all_targets[valid_dataset.indices]

print(f"Using {n_train} data for training and {n_valid} data for validation")
print(f"Train set class\n 1 : {train_targets.sum()}, 0 : {train_targets.size - train_targets.sum()}")
print(f"Validation set class\n 1 : {valid_targets.sum()}, 0 : {valid_targets.size - valid_targets.sum()}")
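`random_split` draws indices uniformly, so the class ratio in each subset only approximately matches the overall ratio. If an exact match is wanted, a stratified split is one alternative. Below is a minimal numpy sketch; `stratified_split_indices` is a hypothetical helper, not part of PyTorch, and the toy labels merely imitate the ~3:1 imbalance.

```python
import numpy as np

def stratified_split_indices(labels, train_ratio=0.8, seed=86):
    """Split indices so each class keeps the same train/validation proportion."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, valid_idx = [], []
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)  # positions of this class
        rng.shuffle(cls_idx)
        n_train = int(len(cls_idx) * train_ratio)
        train_idx.append(cls_idx[:n_train])
        valid_idx.append(cls_idx[n_train:])
    # merge the per-class pieces and shuffle so classes are interleaved
    train_idx = np.concatenate(train_idx)
    valid_idx = np.concatenate(valid_idx)
    rng.shuffle(train_idx)
    rng.shuffle(valid_idx)
    return train_idx.tolist(), valid_idx.tolist()

# toy labels with the same ~3:1 imbalance as the dataset
toy_labels = [1] * 300 + [0] * 100
train_idx, valid_idx = stratified_split_indices(toy_labels)
print(len(train_idx), len(valid_idx))         # 320 80
print(sum(toy_labels[i] for i in valid_idx))  # 60 positives in validation
```

With the real data, the labels would be `base_dataset.df['has_cactus'].to_numpy()`, and the two index lists could be wrapped with `torch.utils.data.Subset(base_dataset, train_idx)` in place of `random_split`.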

---
<a name='modelbuild'></a>
## #2 :: Model Building

*Model architecture, loss and optimizers definition*

### Model Definition

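The input tensor size is (3, 32, 32), i.e. (channels, height, width).

The network architecture we will use is inspired by VGG.

Since this is a binary classification task with a single output, the model's raw output is mapped to a probability in (0, 1) with a sigmoid. The sigmoid is not part of the model itself; it is applied inside the loss function and the evaluation helpers.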
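Because every convolution uses a 3x3 kernel with `padding=2` and every max-pool uses `stride=1`, the feature map actually grows through `features` (unlike VGG, whose stride-2 max-pools halve the resolution). A quick plain-Python trace using the standard output-size formula for `Conv2d`/`MaxPool2d`, floor((n + 2p - k) / s) + 1 (dilation 1, `ceil_mode=False`), shows the spatial size that `AdaptiveAvgPool2d((7, 7))` receives; the layer names in the table are just labels for this sketch.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of one spatial dim for Conv2d / MaxPool2d (dilation=1, ceil_mode=False)."""
    return (size + 2 * padding - kernel) // stride + 1

# (label, kernel, stride, padding) mirroring ACIModel.features
layers = [
    ("conv1", 3, 1, 2), ("pool1", 2, 1, 0),
    ("conv2", 3, 1, 2), ("pool2", 2, 1, 0),
    ("conv3a", 3, 1, 2), ("conv3b", 3, 1, 2), ("pool3", 2, 1, 0),
    ("conv4a", 3, 1, 2), ("conv4b", 3, 1, 2), ("pool4", 2, 1, 0),
]

size = 32
for name, k, s, p in layers:
    size = conv_out(size, k, s, p)
    print(f"{name:>6} -> {size}x{size}")
# the final 256-channel 40x40 map is then averaged down to 7x7 by AdaptiveAvgPool2d
```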

In [None]:
# DEFINE model architecture for our task

class ACIModel(nn.Module):
    """Model definition for Aerial Cactus Identification"""
    def __init__(self) -> None:
        super().__init__()
        # inspired by VGG
        self.features = nn.Sequential(
                            # 1
                            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=2),
                            nn.ReLU(inplace=True),
                            nn.MaxPool2d(kernel_size=2, stride=1),
                            # 2
                            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=2),
                            nn.ReLU(inplace=True),
                            nn.MaxPool2d(kernel_size=2, stride=1),
                            # 3
                            nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=2),
                            nn.ReLU(inplace=True),
                            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=2),
                            nn.ReLU(inplace=True),
                            nn.MaxPool2d(kernel_size=2, stride=1),
                            # 4
                            nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=2),
                            nn.ReLU(inplace=True),
                            nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=2),
                            nn.ReLU(inplace=True),
                            nn.MaxPool2d(kernel_size=2, stride=1),
                        )
        self.avgpool = nn.AdaptiveAvgPool2d((7,7))
        self.classifier = nn.Sequential(
                            nn.Linear(256 * 7 * 7, 1024),
                            nn.ReLU(inplace=True),
                            nn.Dropout(p=0.5),
                            nn.Linear(1024, 128),
                            nn.ReLU(inplace=True),
                            nn.Dropout(p=0.7),
                            nn.Linear(128, 1),
                            # usually followed by nn.Sigmoid() when paired with BCELoss();
                            # omitted here since BCEWithLogitsLoss() applies the sigmoid internally
                        )
        
    def forward(self, x:torch.Tensor) -> torch.Tensor:
        """Forward pass: image batch (N, 3, 32, 32) -> raw logits (N, 1)"""
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x,1)
        x = self.classifier(x)
        return x
                
#!DEFINE

# model initiate and summary
model = ACIModel().to(device)
model, f"running on {device}"

### Loss Functions and Optimizer

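Instead of `nn.Sigmoid()` followed by `nn.BCELoss()`, we use `nn.BCEWithLogitsLoss()`, which combines both into one layer. According to the PyTorch documentation it is more numerically stable, and its `pos_weight` argument lets us reweight the positive class to deal with imbalanced data.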
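To make the stability claim concrete, here is a minimal pure-Python sketch of the per-element loss documented for `BCEWithLogitsLoss`, rewritten with the softplus identity. The function names are hypothetical, and the real PyTorch kernel is vectorised and also handles `weight` and `reduction`; this only illustrates the math and the role of `pos_weight`.

```python
import math

def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def bce_with_logits(x: float, y: float, pos_weight: float = 1.0) -> float:
    """Per-element loss on a raw logit x with label y, per the BCEWithLogitsLoss docs:
        -[pos_weight * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]
    rewritten with log(sigmoid(x)) = -softplus(-x) and log(1 - sigmoid(x)) = -softplus(x).
    """
    return pos_weight * y * softplus(-x) + (1.0 - y) * softplus(x)

def naive_bce(x: float, y: float) -> float:
    """Sigmoid first, then the log: the two-step computation, without any clamping."""
    p = 1.0 / (1.0 + math.exp(-x))
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

# both forms agree for moderate logits
print(bce_with_logits(0.3, 1.0), naive_bce(0.3, 1.0))

# pos_weight scales only the positive-label term
print(bce_with_logits(0.0, 1.0, pos_weight=4.0) / bce_with_logits(0.0, 1.0))  # 4.0

# a confident wrong prediction: sigmoid(50) rounds to exactly 1.0 in float64
print(bce_with_logits(50.0, 0.0))  # about 50, still finite
try:
    naive_bce(50.0, 0.0)
except ValueError:
    print("naive form fails: log(1 - 1.0) = log(0)")
```

`nn.BCELoss` itself does not crash in this case, because its documentation states that it clamps the log outputs to be greater than or equal to -100; that caps the loss rather than computing it exactly.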

As the optimizer, we use plain Stochastic Gradient Descent (`optim.SGD`, no momentum).

In [None]:
# DEFINE loss function and optimizers
WEIGHT_MOD = 12/3  # = 4.0, close to the pos/neg ratio (~3), chosen empirically (see notes in the summary)
LEARNING_RATE = 0.001

loss_function = nn.BCEWithLogitsLoss(reduction='mean', 
                                     pos_weight=torch.FloatTensor([WEIGHT_MOD]).to(device)
                                    )
optimizer = optim.SGD(model.parameters(), lr=LEARNING_RATE)
#!DEFINE

<a name='modeltraineval'></a>
## #3 :: Model Training and Evaluation

*Train, Validate and Measure*

### Helpers

Helpers for training and validation are defined below.

In [None]:
def confusion_matrix_lin2sig(y_predict: torch.Tensor, y_target: torch.Tensor) -> np.ndarray:
    """ Calculate confusion matrix
    Args:
        y_predict - model output from the last linear layer (-inf,inf), i.e. before sigmoid
        y_target  - dataset label (0,1)

    Return:
        confusion_matrix: np.ndarray - 2x2 confusion matrix of [[TN, FP]
                                                                [FN, TP]]
    """
    # feed linear output to sigmoid function [(-inf,inf) -> (0,1)], then threshold at 0.5
    _y_predict = torch.sigmoid(y_predict.reshape(-1)).round()
    _y_target = y_target.reshape(-1).to(_y_predict.dtype)

    assert len(_y_predict) == len(_y_target)

    # a prediction is a "hit" when it equals the label; its value decides positive/negative
    hit = _y_predict == _y_target
    tn = ((_y_predict == 0) &  hit).sum().item()
    tp = ((_y_predict == 1) &  hit).sum().item()
    fn = ((_y_predict == 0) & ~hit).sum().item()
    fp = ((_y_predict == 1) & ~hit).sum().item()

    return np.array([[tn, fp], [fn, tp]], dtype=np.intc)

# test function
confusion_matrix_lin2sig(torch.FloatTensor([-1,-2,-3,-4, 1, 2, 3, 4, 5, 9, 8,-9,-8,-7]),
                           torch.IntTensor([ 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1])
                        ) # [[4, 2], [3, 5]]


def train(model: nn.Module,
          train_dataloader,
          loss_function,
          optimizer,
         ) -> Tuple[float, np.ndarray]:
    """
    Training function wrapper for one pass over the dataloader (one epoch)

    Return:
        train_loss: float              - mean of the per-batch losses
        confusion_matrix: np.ndarray   - 2x2 numpy array of confusion matrix
    """
    total_loss = 0.0
    confmat = np.zeros((2,2), dtype=np.intc)

    # train mode (enables dropout)
    model.train()

    # per-batches:
    for images, labels in train_dataloader:
        images, labels = images.to(device), labels.to(device)

        # training; squeeze(1) keeps the batch dim even for a batch of size 1
        optimizer.zero_grad()
        outputs = model(images).squeeze(1)
        loss = loss_function(outputs, labels.to(torch.float32))
        loss.backward()
        optimizer.step()

        # store results (detached, so no autograd graph is kept alive)
        total_loss += loss.item()
        confmat += confusion_matrix_lin2sig(outputs.detach(), labels)

    train_loss = total_loss/len(train_dataloader)

    return train_loss, confmat
    

def validate(model: nn.Module,
             valid_dataloader,
             loss_function
            ) -> Tuple[float, np.ndarray]:
    """
    Validation function wrapper for one pass over the dataloader

    Return:
        valid_loss: float              - mean of the per-batch losses
        confusion_matrix: np.ndarray   - 2x2 numpy array of confusion matrix
    """
    total_loss = 0.0
    confmat = np.zeros((2,2), dtype=np.intc)

    # evaluation mode (disables dropout)
    model.eval()

    # no gradients are needed for validation, which saves memory and time
    with torch.no_grad():
        # per-batches:
        for images, labels in valid_dataloader:
            images, labels = images.to(device), labels.to(device)

            # validation
            outputs = model(images).squeeze(1)
            loss = loss_function(outputs, labels.to(torch.float32))

            # store results
            total_loss += loss.item()
            confmat += confusion_matrix_lin2sig(outputs, labels)

    valid_loss = total_loss/len(valid_dataloader)

    return valid_loss, confmat

### Training Section

Define how many epochs we want, and we are ready to go.

In [None]:
# DEFINE number of epochs
EPOCHS = 100
#!DEFINE


# keep track of loss and accuracy for each epoch
train_losses = []
train_accuracies = []
valid_losses = []
valid_accuracies = []

print("Running...")
for epoch in range(1, EPOCHS+1):
    train_loss, train_cm = train(model, train_dataloader, loss_function, optimizer)
    valid_loss, valid_cm = validate(model, valid_dataloader, loss_function)
    
    train_accuracy = (train_cm[0,0] + train_cm[1,1]) / train_cm.sum()
    valid_accuracy = (valid_cm[0,0] + valid_cm[1,1]) / valid_cm.sum()
    
    print(f"Epoch [{epoch}/{EPOCHS}] :: ")
    print(f"\tTrain Loss: {train_loss:.16f}, Accu: {train_accuracy}")
    print(f"\tValid Loss: {valid_loss:.16f}, Accu: {valid_accuracy}")
    
    train_losses.append(train_loss)
    train_accuracies.append(train_accuracy)
    valid_losses.append(valid_loss)
    valid_accuracies.append(valid_accuracy)
    
print("End of training.")

### Model Evaluation

Evaluate the latest trained model against the validation dataset by calculating **Accuracy, Precision, Recall, F1 Score** in the following cell.

In [None]:
valid_loss, valid_cm = validate(model, valid_dataloader, loss_function)

tn, fp, fn, tp = tuple(valid_cm.ravel())

accuracy = ( tn + tp ) / valid_cm.sum()
precision = tp / ( tp + fp )
recall = tp / ( tp + fn )
f1_score = 2 * precision * recall / ( precision + recall )

print(f"Confusion Matrix :\n {valid_cm}")
print(f"Accuracy  : {accuracy}")
print(f"Precision : {precision}")
print(f"Recall    : {recall}")
print(f"F1 Score  : {f1_score}")
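The cell above divides by `tp + fp` and `tp + fn` directly, which gives `nan` (with a NumPy warning) if the model never predicts, or never sees, the positive class. A small reusable helper with a zero-division guard is sketched below; the name `metrics_from_confmat` is hypothetical, and returning 0.0 in that case mirrors the `zero_division=0` option of scikit-learn's metrics rather than anything in this notebook.

```python
def metrics_from_confmat(cm) -> dict:
    """Accuracy, precision, recall, F1 from a 2x2 matrix laid out as [[TN, FP], [FN, TP]].

    Precision, recall and F1 fall back to 0.0 when their denominator is zero.
    """
    (tn, fp), (fn, tp) = cm

    def safe_div(a, b):
        return a / b if b else 0.0

    accuracy = safe_div(tn + tp, tn + fp + fn + tp)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# toy matrix: TN=4, FP=2, FN=3, TP=5
print(metrics_from_confmat([[4, 2], [3, 5]]))

# a model that never predicts the positive class: no division error, metrics are 0.0
print(metrics_from_confmat([[10, 0], [5, 0]]))
```

The helper also accepts the 2x2 NumPy arrays produced by `validate`, e.g. `metrics_from_confmat(valid_cm)`.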

### Visualization

Loss and accuracy per epoch, for both training and validation, are plotted below.

In [None]:
plt.plot(train_losses,'-o')
plt.plot(valid_losses,'-o')
plt.xlabel('epoch')
plt.ylabel('losses')
plt.legend(['Train','Valid'])
plt.title('Train vs Valid Losses')
 
plt.show()

In [None]:
plt.plot(train_accuracies,'-o')
plt.plot(valid_accuracies,'-o')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend(['Train','Valid'])
plt.title('Train vs Valid Accuracy')
 
plt.show()

### Save Trained Model

In [None]:
model.eval()
torch.save(model.state_dict(), 'aci_classifier.pth')

### Model Evaluation v2

The [earth-vision](https://github.com/jakartaresearch/earth-vision) library by Jakarta AI Research provides a labelled test dataset derived from the Kaggle data: 3000 cactus images and 1000 no-cactus images.

We evaluate the latest trained model against this dataset by calculating **Accuracy, Precision, Recall, F1 Score** below.

In [None]:
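# DEFINE path of the saved weights; this cell reloads them into a fresh model
SAVED_MODEL_PATH = 'aci_classifier.pth'
#!DEFINE
model = ACIModel().to(device)
# map_location lets GPU-saved weights load on a CPU-only machine too
model.load_state_dict(torch.load(SAVED_MODEL_PATH, map_location=device))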
model.eval()
model

In [None]:
!pip install -q earth-vision
import earthvision
test_dataset = earthvision.datasets.AerialCactus(
                   root='./', 
                   data_mode='validation_set',
                    transform=transforms.Compose([
                        transforms.Resize((32,32)),
                        transforms.ToTensor(),
                        lambda x: x.moveaxis(-1,1)
                    ])
               )

test_dataloader = torch.utils.data.DataLoader(test_dataset, batch_size=2, num_workers=0)

test_cm = np.zeros((2,2))

model.eval()
with torch.no_grad():
    for image, label in test_dataloader:
        image, label = image.to(device), label.to(device)
        output = model(image).squeeze(1)

        test_cm += confusion_matrix_lin2sig(output, label)

tn, fp, fn, tp = tuple(test_cm.ravel())

accuracy = ( tn + tp ) / test_cm.sum()
precision = tp / ( tp + fp )
recall = tp / ( tp + fn )
f1_score = 2 * precision * recall / ( precision + recall )

print(f"Confusion Matrix :\n {test_cm}")
print(f"Accuracy  : {accuracy}")
print(f"Precision : {precision}")
print(f"Recall    : {recall}")
print(f"F1 Score  : {f1_score}")

### Submission Artifact

In [None]:
_submit_df = pd.read_csv(Path(ACI_DATASET_PATH) / 'sample_submission.csv')
_submit_df = _submit_df.set_index('id')

# make sure dropout is disabled for inference
model.eval()
with torch.no_grad():
    for idx in _submit_df.index:
        img = Image.open(Path(ACI_DATASET_PATH) / 'test' / idx)
        t_img = transforms.functional.to_tensor(img).unsqueeze(0).to(device)
        prediction = model(t_img)
        prediction = torch.sigmoid(prediction.squeeze()).round()
        # assign with a single .loc call; chained indexing (.loc[idx]['has_cactus']) may not modify the frame
        _submit_df.loc[idx, 'has_cactus'] = prediction.item()

_submit_df.to_csv('submission.csv')

!head -10 submission.csv


In [None]:
# cleanups 
!rm -rf /kaggle/working/aerial-cactus-identification
!rm -rf /kaggle/working/cactus-aerial-photos

---
<a name='summary'></a>
## #-1 :: Summary

summary :

:: Dataset 
- Train set class
  1 : 10537, 0 : 3463, total 14000
- Validation set class
  1 : 2599, 0 : 901, total 3500
- Batch size = 32

:: Model
```
ACIModel(
   (features): Sequential(
     (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2))
     (1): ReLU(inplace=True)
     (2): MaxPool2d(kernel_size=2, stride=1, padding=0, dilation=1, ceil_mode=False)
     (3): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2))
     (4): ReLU(inplace=True)
     (5): MaxPool2d(kernel_size=2, stride=1, padding=0, dilation=1, ceil_mode=False)
     (6): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2))
     (7): ReLU(inplace=True)
     (8): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2))
     (9): ReLU(inplace=True)
     (10): MaxPool2d(kernel_size=2, stride=1, padding=0, dilation=1, ceil_mode=False)
     (11): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2))
     (12): ReLU(inplace=True)
     (13): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2))
     (14): ReLU(inplace=True)
     (15): MaxPool2d(kernel_size=2, stride=1, padding=0, dilation=1, ceil_mode=False)
   )
   (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
   (classifier): Sequential(
     (0): Linear(in_features=12544, out_features=1024, bias=True)
     (1): ReLU(inplace=True)
     (2): Dropout(p=0.5, inplace=False)
     (3): Linear(in_features=1024, out_features=128, bias=True)
     (4): ReLU(inplace=True)
     (5): Dropout(p=0.7, inplace=False)
     (6): Linear(in_features=128, out_features=1, bias=True)
   )
```
- Loss Function = BCEWithLogitsLoss, pos_weight = 4.0
- Optimizer = SGD

:: Result (seed = 86)

- test.zip dataset
```
Confusion Matrix ([[TN, FP], [FN, TP]]) :
 [[ 843.  157.]
 [ 152. 2848.]]
Accuracy  : 0.92275
Precision : 0.9478
Recall    : 0.9493
F1 Score  : 0.9485
```
- Finish time : ~60 minutes on GPU


notes :

While the documentation suggests setting `pos_weight` to the negative/positive class ratio (ref [BCEWithLogitsLoss docs](https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html)), the experiments here gave better results with a value close to the positive/negative ratio instead.


some room for improvement :

- use image augmentations
```
transform = transforms.Compose([
                # simulates drones from any rotation
                # transforms.RandomRotation([360], expand=False), 
                # simulates out-of-focus 
                transforms.GaussianBlur(kernel_size=3), 
                # simulates lowlight and color shift
                transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.1, hue=0.3),
                transforms.ToTensor(),
            ])
```

- use mix of original and augmented image as dataset
```
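# split train/validation first: concatenating before random_split would put
# augmented copies of validation images into the training set
transformed_dataset = AerialCactus(ACI_DATASET_PATH, train=True, transform=transform)
base_dataset = torch.utils.data.ConcatDataset([base_dataset, transformed_dataset])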
```

- use weight initialization on model declaration

- use pretrained networks as feature layer e.g. VGG, ResNet
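For the weight-initialization idea, one common choice for ReLU networks is He/Kaiming normal initialization. The numpy sketch below only illustrates the rule used by `torch.nn.init.kaiming_normal_` with `mode='fan_in'` (its default) and `nonlinearity='relu'`: std = sqrt(2 / fan_in), where fan_in = in_channels * k * k for a conv weight. The function name `kaiming_normal` is hypothetical.

```python
import numpy as np

def kaiming_normal(out_channels: int, in_channels: int, kernel_size: int,
                   rng: np.random.Generator) -> np.ndarray:
    """He/Kaiming normal init for a conv weight of shape (out, in, k, k).

    fan_in = in_channels * k * k and std = sqrt(2 / fan_in), the ReLU gain
    applied to the fan-in of each output unit.
    """
    fan_in = in_channels * kernel_size * kernel_size
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(out_channels, in_channels, kernel_size, kernel_size))

rng = np.random.default_rng(86)
w = kaiming_normal(256, 256, 3, rng)  # same shape as the 256 -> 256 conv layers in ACIModel
print(w.shape, w.std(), np.sqrt(2.0 / (256 * 3 * 3)))
```

In PyTorch itself this would typically be done by looping over `model.modules()` and calling `nn.init.kaiming_normal_(m.weight, nonlinearity='relu')` on each `nn.Conv2d`.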