<a href="https://colab.research.google.com/github/jeffaudi/notebooks/blob/main/Aircraft_Detection_with_IceVision_and_Airbus_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Testing various models for aircraft detection on Airbus imagery

In recent years, artificial intelligence has made great strides in the field of computer vision. One area that has seen particularly impressive progress is object detection, with a variety of deep learning models achieving high levels of accuracy. However, this abundance of choice can be overwhelming for practitioners who are looking to implement an object detection system.

On top of this, most public models and academic research are benchmarked on COCO, a dataset made of photographs. Satellite images are quite different from photographs: the objects to detect are usually much smaller and far more numerous, they can be oriented in any direction, and their colors differ from those of consumer photographs. In photographs, trees almost always appear as green objects with the trunk below the foliage. This is not the case in aerial or satellite images, which see them from above.

So, if a model architecture performs well on a photographic dataset, it does not mean that it will perform as well on an aerial dataset. And finding vehicles, wind turbines, buildings, roads, floods or crops are very different tasks. How can one find which model will work best for their data and application? In many cases, it is necessary to experiment with a few different models before finding the one that gives the best results for the specific task.

How can we test many architectures?
In this notebook, we will use the open-source IceVision framework to experiment with different model architectures and compare their performance on a dataset representing the desired task, i.e., finding aircraft in satellite imagery.

This notebook was inspired by: https://www.kaggle.com/code/aninda/icevision/notebook


## Install IceVision

IceVision is an agnostic computer vision framework that integrates hundreds of high-quality source code implementations and pre-trained models from Torchvision, OpenMMLab, Ultralytics YOLOv5 and Ross Wightman’s EfficientDet. It enables an end-to-end deep learning workflow by offering a unified interface to robust, high-performance training libraries like PyTorch Lightning and Fastai.

IceVision Unique Features:

- Data curation/cleaning with auto-fix
- Access to an exploratory data analysis dashboard
- Pluggable transforms for better model generalization
- Access to hundreds of neural net models
- Access to multiple training loop libraries
- Multi-task training to efficiently combine object detection, segmentation, and classification models

Here, we will see how we can easily compare multiple models and backbones on the same dataset and task.

IceVision provides an installation script that takes care of installing the various libraries. Make sure to get the most up-to-date versions, or versions adapted to your specific hardware.

In [None]:
# to correct 'distutils has no attribute version' error due to incompatible torch version.
!pip install setuptools==59.5.0

In [None]:
# Torch - Torchvision - IceVision - IceData - MMDetection - YOLOv5 - EfficientDet Installation
!wget https://raw.githubusercontent.com/airctic/icevision/master/icevision_install.sh

# Choose your installation target: cuda11 or cuda10 or cpu
!bash icevision_install.sh cuda11 master #> /dev/null

In [None]:
# Standard imports
import os
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import glob
import random
import PIL
from pathlib import Path

import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline  

def clear_pyplot_memory():
    plt.clf()
    plt.cla()
    plt.close()

In [None]:
# Here we import all functions from IceVision (similar to fastai)
# This also redefines (and extends) some standard Python functions
# If this fails, just re-run the cell
from icevision.all import *

Here, we retrieve and display the versions of PyTorch, Torchvision and PyTorch Lightning. This is useful when working in environments like Google Cloud, Google Colab or Kaggle, since they do not always provide the latest versions of our favorite deep learning packages.

In [None]:
import torch
import torchvision
import pytorch_lightning as pl

# Check PyTorch version
print(f"      Torch version: {torch.__version__}")
print(f"Torchvision version: {torchvision.__version__}")
print(f"  Lightning version: {pl.__version__}")

In [None]:
!nvidia-smi

### Deterministic behavior

In this notebook, we are going to compare various deep learning architectures, so we will run the same training several times and compare the results (i.e., the final loss and the final metric). For these comparisons to be fair, the training must be as deterministic as possible: all the random numbers used by our functions (for example, when splitting between the train and valid datasets) should be the same from one run to the next. I call the seeding function at the beginning of many cells so that I can replay them individually. If you run each cell only once, seeding once at the beginning of the notebook should be enough.

This function is pretty simple and does not seed the DataLoader workers, so we will load data in the main process only (`num_workers=0`). We will lose some speed, but this is not an issue here since we only want to figure out which architecture is best for this task.
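To see why seeding matters for the train/valid split, here is a minimal plain-Python sketch. It is not IceVision's `RandomSplitter` implementation: the helper `seeded_split` and the tile names are hypothetical. A split is reproducible as long as the shuffle is driven by a seeded generator; using a private `random.Random(seed)` instance, rather than the global generator, also makes the result independent of how many random numbers earlier cells consumed.

```python
import random


def seeded_split(items, valid_fraction=0.2, seed=42):
    """Hypothetical helper: shuffle a copy of `items` with a private,
    seeded RNG and cut it into (train, valid) lists."""
    rng = random.Random(seed)  # local generator, unaffected by global random state
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_valid = int(round(len(shuffled) * valid_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]


tile_ids = [f"tile_{i:03d}" for i in range(100)]  # made-up identifiers
train_a, valid_a = seeded_split(tile_ids, seed=42)
train_b, valid_b = seeded_split(tile_ids, seed=42)
print(len(train_a), len(valid_a))  # 80 20
print(valid_a == valid_b)          # True: same seed, same split
```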

In [None]:
def seed_everything(s=42):
    random.seed(s)
    os.environ['PYTHONHASHSEED'] = str(s)
    np.random.seed(s)
    torch.manual_seed(s)
    #imgaug.random.seed(s)

    if torch.cuda.is_available():
        torch.cuda.manual_seed(s)
        torch.cuda.manual_seed_all(s)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

SEED = 42
seed_everything(SEED)

## Explore the dataset and Pleiades images

For this article, we will use the [Airbus Aircraft Sample Dataset](https://www.kaggle.com/datasets/airbusgeo/airbus-aircrafts-sample-dataset), which was published on Kaggle in 2020. Keep in mind that this is a sample dataset, i.e., it is too small for a real benchmark. It also only contains commercial airports and commercial aircraft. A real benchmark would need more objects, and a mix of objects adapted to the business needs. The objective here is only to show the methodology, so that you can run it on a real benchmark dataset.



In [None]:
# download the Airbus Aircraft sample dataset
import requests
import zipfile

def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter below
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192): 
                # If you have a chunk-encoded response, uncomment the `if`
                # below and set the chunk_size parameter to None.
                #if chunk: 
                f.write(chunk)
    return local_filename

DATA_DIR = Path('./airbus-aircrafts-sample-dataset')
if not os.path.exists(DATA_DIR):

    # download the zip archive from Google Cloud Storage
    url = 'https://storage.googleapis.com/dl4eo-public/airbus/airbus-aircrafts-sample-dataset.zip'
    filename = download_file(url)

    # extract all the contents of the zip file into DATA_DIR
    # (the context manager closes the archive automatically)
    with zipfile.ZipFile(filename, 'r') as zipobj:
        zipobj.extractall(DATA_DIR)

## Explore images

Let us first explore the content of the `images` directory. It contains **JPEG** extracts of **Airbus Pleiades** imagery. We will display one of these images to check its size.

Looking at satellite images of airports can be pretty fascinating: you can see all the different buildings and airstrips and get a sense of how busy the airport is. But did you know that you can also learn a lot about the activity of an airport by automatically monitoring the location and type of the aircraft?

*Note: if you get a `module 'PIL.TiffTags' has no attribute 'IFD'` error here, it might be a Google Colab glitch. Just restart the runtime and rerun the cells without reinstalling all the packages.*

In [None]:
img_list = list(DATA_DIR.glob('images/*.jpg'))
pickone = random.choice(img_list)
img = PIL.Image.open(pickone)
display(img)
print(img)

The images in the dataset are too large to fit into GPU memory at full resolution. We do not want to downsample them, because we would lose small details. So the tiling code below cuts the 2560x2560 images into 512x512 tiles. We choose fairly small tiles so that several of them fit on the GPU at once and we can keep a reasonable `batch_size`. But we also make sure that the tiles are large enough to contain most planes: at 0.5 m per pixel, a 512-pixel tile covers 256 meters, much more than the wingspan of most planes.
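The tile grid can be checked with a little arithmetic. The sketch below reproduces, for one axis, the start-offset rule used in the tiling loop further down (`tile_starts` is a hypothetical helper, and the 0.5 m pixel size is the value assumed in the paragraph above). It also shows what happens with an overlap: the last tile is shifted back so that it ends exactly on the image border instead of running past it.

```python
def tile_starts(image_size, tile_size=512, overlap=0):
    """Start offsets of the tiles along one axis, following the tiling loop:
    step by (tile_size - overlap) and shift the last tile back so that it
    ends exactly on the image border."""
    step = tile_size - overlap
    n_tiles = (image_size + tile_size - overlap - 1) // step  # = ceil(image_size / step)
    starts = []
    for i in range(n_tiles):
        end = min(tile_size + i * step, image_size)
        starts.append(end - tile_size)
    return starts


GSD_M = 0.5  # assumed ground sampling distance of the Pleiades extracts (m/pixel)

starts = tile_starts(2560)
print(starts)                          # [0, 512, 1024, 1536, 2048]
print(len(starts) ** 2)                # 25 tiles per 2560x2560 image
print(512 * GSD_M)                     # 256.0 meters per tile side
print(tile_starts(2560, overlap=64))   # [0, 448, 896, 1344, 1792, 2048]
```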

## Explore annotations

Next, let's explore the annotations file `annotations.csv`. It lists all the aircraft visible in the images, with the ID of the associated image, a list of coordinates describing the outer boundary of the aircraft, and a label, usually `Aircraft` and sometimes `Truncated_aircraft`.

Here, we convert the geometries to bounding boxes, as this is the usual format for detection models, and merge the two categories into a single `aircraft` class.
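For a single annotation, the conversion boils down to taking the minimum and maximum of the polygon's x and y coordinates. The snippet below illustrates this on a made-up polygon string; the exact formatting of the `geometry` column and the helper name `geometry_to_bbox` are assumptions for illustration. Note that the axis-aligned box of an aircraft parked at an angle also includes some background around it, a known limitation of horizontal bounding boxes for oriented objects.

```python
import ast

# Hypothetical `geometry` cell: a closed polygon as a list of (x, y) pixel
# coordinates, with a trailing line break as handled in the cell below.
geometry_str = "[(10, 20), (30, 20), (30, 45), (10, 45), (10, 20)]\r\n"


def geometry_to_bbox(geometry_text):
    """Parse a polygon string and return its axis-aligned box (xmin, ymin, xmax, ymax)."""
    points = ast.literal_eval(geometry_text.rstrip("\r\n"))
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


bbox = geometry_to_bbox(geometry_str)
xmin, ymin, xmax, ymax = bbox
print(bbox)                      # (10, 20, 30, 45)
print(xmax - xmin, ymax - ymin)  # 20 25  (width, height)
```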

In [None]:
import ast

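# convert a string record (stripped of its trailing line break) into a valid Python object
def parse_geometry(x):
    return ast.literal_eval(x.rstrip('\r\n'))

# both classes ('Aircraft' and 'Truncated_aircraft') are mapped to a single 'aircraft' label
df = pd.read_csv(DATA_DIR / "annotations.csv",
                 converters={'geometry': parse_geometry, 'class': lambda l: "aircraft"})
df.rename(columns={'class': 'label'}, inplace=True)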

def getBounds(geometry):
    try: 
        arr = np.array(geometry).T
        xmin = np.min(arr[0])
        ymin = np.min(arr[1])
        xmax = np.max(arr[0])
        ymax = np.max(arr[1])
        return (xmin, ymin, xmax, ymax)
    except:
        return np.nan

def getWidth(bounds):
    try: 
        (xmin, ymin, xmax, ymax) = bounds
        return np.abs(xmax - xmin)
    except:
        return np.nan

def getHeight(bounds):
    try: 
        (xmin, ymin, xmax, ymax) = bounds
        return np.abs(ymax - ymin)
    except:
        return np.nan
    
# Create bounds, width and height
df.loc[:,'bounds'] = df.loc[:,'geometry'].apply(getBounds)
df.loc[:,'width'] = df.loc[:,'bounds'].apply(getWidth)
df.loc[:,'height'] = df.loc[:,'bounds'].apply(getHeight)

df.head(10)

We then process the annotation DataFrame to associate each annotation with the tile that contains it rather than with its source image.

*Note: here we do not use overlapping tiles. This is because IceVision performs the train/valid split on the tiles and not on the source images, unlike in my other <a href="https://www.kaggle.com/code/jeffaudi/aircraft-detection-with-yolov5">notebook on aircraft detection</a>. With overlapping tiles, the same aircraft could appear both in a training tile and in a validation tile, which would leak information between the two splits and bias the validation results.*
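If you want to use overlapping tiles anyway, one way to avoid the leak is to split by source image rather than by tile, so that all the tiles cut from one image land in the same split. Here is a minimal sketch of such a group-wise split on plain dictionaries; `split_by_image` is a hypothetical helper, not an IceVision function, and its output would still have to be turned into train/valid records for IceVision.

```python
import random
from collections import defaultdict


def split_by_image(tile_rows, valid_fraction=0.2, seed=42):
    """Hypothetical helper: assign whole source images to train or valid so
    that overlapping tiles of the same image never end up in both splits."""
    image_ids = sorted({row["image_id"] for row in tile_rows})  # sorted for reproducibility
    random.Random(seed).shuffle(image_ids)
    n_valid = max(1, int(round(len(image_ids) * valid_fraction)))
    valid_images = set(image_ids[:n_valid])

    splits = defaultdict(list)
    for row in tile_rows:
        key = "valid" if row["image_id"] in valid_images else "train"
        splits[key].append(row["tile_id"])
    return splits["train"], splits["valid"]


# 10 made-up source images, each cut into 3 overlapping tiles
rows = [
    {"image_id": f"img_{i}.jpg", "tile_id": f"img_{i}_{x}_0.jpg"}
    for i in range(10)
    for x in (0, 448, 896)
]
train_tiles, valid_tiles = split_by_image(rows)
print(len(train_tiles), len(valid_tiles))  # 24 6
```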

In [None]:
import os 
import tqdm.notebook
import copy
import PIL

# Create 512x512 tiles with 0 pix overlap in df
TILE_WIDTH = 512
TILE_HEIGHT = 512
TILE_OVERLAP = 0 # for now since splitting by image needs to be implemented in IceVision 64
TRUNCATED_PERCENT = 0.30
_overwriteFiles = True

# Location for tiles
TILES_PATH = Path('./tiles/')
if not os.path.isdir(TILES_PATH):
    os.makedirs(TILES_PATH)

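# Clip an annotation to a tile: return the clipped box in tile coordinates,
# or None if the box is outside the tile or too truncated to be kept
def tag_is_inside_tile(bounds, x_start, y_start, width, height, truncated_percent):
    x_min, y_min, x_max, y_max = bounds
    # shift the box into tile coordinates
    x_min, y_min, x_max, y_max = x_min - x_start, y_min - y_start, x_max - x_start, y_max - y_start

    # discard boxes that do not intersect the tile at all
    if (x_min > width) or (x_max < 0.0) or (y_min > height) or (y_max < 0.0):
        return None

    # discard boxes whose visible width is below truncated_percent of their full width
    x_max_trunc = min(x_max, width)
    x_min_trunc = max(x_min, 0)
    if (x_max_trunc - x_min_trunc) / (x_max - x_min) < truncated_percent:
        return None

    # same check along the vertical axis (clip against the tile height, not its width)
    y_max_trunc = min(y_max, height)
    y_min_trunc = max(y_min, 0)
    if (y_max_trunc - y_min_trunc) / (y_max - y_min) < truncated_percent:
        return None

    return (x_min_trunc, y_min_trunc, x_max_trunc, y_max_trunc)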
            
rows = []
for img_path in tqdm.notebook.tqdm(img_list):
    # Open image and related data
    pil_img = PIL.Image.open(img_path, mode='r')
    np_img = np.array(pil_img, dtype=np.uint8)
    image_width, image_height = pil_img.size

    # Get annotations for image
    img_labels = df[df['image_id'] == img_path.name]
    #print(img_labels)

    # Count number of sections to make
    X_TILES = (image_width + TILE_WIDTH - TILE_OVERLAP - 1) // (TILE_WIDTH - TILE_OVERLAP)
    Y_TILES = (image_height + TILE_HEIGHT - TILE_OVERLAP - 1) // (TILE_HEIGHT - TILE_OVERLAP)

    # Cut each tile
    for x in range(X_TILES):
        for y in range(Y_TILES):

            x_end = min(TILE_WIDTH + x * (TILE_WIDTH - TILE_OVERLAP), image_width)
            x_start = x_end - TILE_WIDTH
            y_end = min(TILE_HEIGHT + y * (TILE_HEIGHT - TILE_OVERLAP), image_height)
            y_start = y_end - TILE_HEIGHT
            #print(x_start, y_start)
            
            # Save the tile if the file doesn't exist (or if overwriting is enabled)
            tile_id = img_path.stem + "_" + str(x_start) + "_" + str(y_start) + img_path.suffix
            save_tile_path = TILES_PATH / tile_id
            if _overwriteFiles or not os.path.isfile(save_tile_path):
                # NumPy image arrays are indexed as (rows=height, columns=width, channels)
                cut_tile = np.zeros(shape=(TILE_HEIGHT, TILE_WIDTH, 3), dtype=np.uint8)
                cut_tile[0:TILE_HEIGHT, 0:TILE_WIDTH, :] = np_img[y_start:y_end, x_start:x_end, :]
                cut_tile_img = PIL.Image.fromarray(cut_tile)
                cut_tile_img.save(save_tile_path)

            # Get annotations in tile
            for index, tag in img_labels.iterrows():
                bounds = tag_is_inside_tile(tag['bounds'], x_start, y_start, TILE_WIDTH, TILE_HEIGHT, TRUNCATED_PERCENT)
                if bounds is not None:
                    x_min, y_min, x_max, y_max = bounds
                    row = {
                        'image_id': img_path.name,
                        'tile_id': tile_id,
                        'x_start': x_start,
                        'y_start': y_start,
                        'x_min': x_min,
                        'x_max': x_max,
                        'y_min': y_min,
                        'y_max': y_max,
                        'label': tag['label'],
                    }
                    rows.append(copy.deepcopy(row))

tiles_df = pd.DataFrame(rows)
tiles_df.head(20)

Next, we need to write an IceVision `Parser`. This is one of the most magical pieces of code in IceVision: it lets us feed our data smoothly into `PyTorch Lightning` and `Fastai` DataLoaders. The following cell generates a template of the methods that we need to implement to create an IceVision `Parser`.

In [None]:
TEMPLATE_RECORD = ObjectDetectionRecord()
Parser.generate_template(TEMPLATE_RECORD)

Fortunately, when the annotations are already in a pandas `DataFrame`, this is pretty straightforward.
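Conceptually, the DataFrame has one row per bounding box, whereas IceVision expects one record per image (here, per tile). `record_id` tells the parser which rows belong together, and `is_new` signals the first row of a record. The plain-Python sketch below, with made-up rows and a hypothetical `group_into_records` function, mimics that grouping without IceVision.

```python
# A few rows shaped like `tiles_df` (values are made up for illustration)
rows = [
    {"tile_id": "a_0_0.jpg", "x_min": 10, "y_min": 20, "x_max": 30, "y_max": 45, "label": "aircraft"},
    {"tile_id": "a_0_0.jpg", "x_min": 100, "y_min": 120, "x_max": 140, "y_max": 150, "label": "aircraft"},
    {"tile_id": "a_512_0.jpg", "x_min": 5, "y_min": 5, "x_max": 50, "y_max": 60, "label": "aircraft"},
]


def group_into_records(rows):
    """Group one-box-per-row annotations into one record per tile,
    mimicking what `record_id` and `is_new` achieve in the parser."""
    records = {}  # dicts keep insertion order
    for row in rows:
        is_new = row["tile_id"] not in records
        if is_new:
            # first row for this tile: create the record (filepath, size, ... in IceVision)
            records[row["tile_id"]] = {"bboxes": [], "labels": []}
        record = records[row["tile_id"]]
        # every row adds one box and one label to its record
        record["bboxes"].append((row["x_min"], row["y_min"], row["x_max"], row["y_max"]))
        record["labels"].append(row["label"])
    return records


records = group_into_records(rows)
print(len(records))                         # 2
print(len(records["a_0_0.jpg"]["bboxes"]))  # 2
```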

In [None]:
class AirbusAircraftParser(Parser):
    def __init__(self, template_record, df):
        super().__init__(template_record=template_record)

        self.df = df
        self.class_map = ClassMap(list(self.df['label'].unique()))

    def __iter__(self) -> Any:
        for o in self.df.itertuples():
            yield o

    def __len__(self) -> int:
        return len(self.df)

    def record_id(self, o) -> Hashable:
        return o.tile_id

    def parse_fields(self, o, record, is_new):
        if is_new:
            record.set_filepath(TILES_PATH / o.tile_id)
            record.set_img_size(ImgSize(width=TILE_WIDTH, height=TILE_HEIGHT))
            record.detection.set_class_map(self.class_map)

        record.detection.add_bboxes([BBox.from_xyxy(o.x_min, o.y_min, o.x_max, o.y_max)])
        record.detection.add_labels([o.label])

In [None]:
# here we create the parser for the Airbus aircraft dataset
parser = AirbusAircraftParser(TEMPLATE_RECORD, tiles_df)

In [None]:
# we check the number and name of classes
print(parser.class_map)

In [None]:
# here, the parser takes care of splitting train/valid
# we set the seed to get the same train/valid split between runs
# the parser also fixes or removes invalid records
seed_everything(SEED)
train_records, valid_records = parser.parse(RandomSplitter([0.8, 0.2], seed=SEED))

In [None]:
# let's display some records!
show_records(random.choices(train_records, k=6), ncols=3)

### Transformations

The purpose of transformations (data augmentation) is to programmatically increase the number and variety of images used to train the network. With too few images, the model quickly overfits, i.e., it learns to replicate the training data exactly instead of generalizing. Applying random transformations to the training data lets us use larger backbones and longer training times, globally improving the models while limiting overfitting.
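Geometric transformations must move the bounding boxes together with the pixels; the Albumentations `Adapter` takes care of this for us. As an illustration of what happens under the hood, here is a sketch of the flip arithmetic for pixel-coordinate boxes `(x_min, y_min, x_max, y_max)` with the origin in the top-left corner (the helper names are hypothetical). Flips are safe augmentations for satellite imagery because, unlike photographs, overhead images have no natural up direction.

```python
def hflip_bbox(bbox, image_width):
    """Mirror an (x_min, y_min, x_max, y_max) box left-right."""
    x_min, y_min, x_max, y_max = bbox
    return (image_width - x_max, y_min, image_width - x_min, y_max)


def vflip_bbox(bbox, image_height):
    """Mirror an (x_min, y_min, x_max, y_max) box top-bottom."""
    x_min, y_min, x_max, y_max = bbox
    return (x_min, image_height - y_max, x_max, image_height - y_min)


box = (10, 20, 30, 45)
print(hflip_bbox(box, 512))  # (482, 20, 502, 45)
print(vflip_bbox(box, 512))  # (10, 467, 30, 492)
```

Flipping twice gives the original box back, which is a quick sanity check for this kind of coordinate arithmetic.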

In [None]:
# seed everything
seed_everything(SEED)

# define some transformations adapted to satellite imagery
train_tfms = tfms.A.Adapter([
    tfms.A.VerticalFlip(p=0.5),
    tfms.A.HorizontalFlip(p=0.5),
    tfms.A.Rotate(limit=20),
    tfms.A.GaussNoise(p=0.2),
    tfms.A.RandomBrightnessContrast(p=0.2),
    tfms.A.Normalize(),
])

# no augmentation on the validation split, only normalization
valid_tfms = tfms.A.Adapter([tfms.A.Normalize()])

# this is the size of the image after transformations are applied
image_size = TILE_WIDTH

In [None]:
# we create the train/valid Dataset objects
train_ds = Dataset(train_records, train_tfms)
valid_ds = Dataset(valid_records, valid_tfms)

### Model architecture

**And this is where the magic happens!** By leveraging the IceVision integration layers, we can easily access hundreds of neural net models.

In this notebook, I will test four different architectures with different backbones:
- RetinaNet (with ResNet34 or ResNet50 backbones)
- FasterRCNN (with ResNet34 or ResNet50 backbones)
- EfficientDet (with two small backbones)
- YOLOv5 (with medium backbone - similar in size to previous backbones)

**Relaunch all cells below when changing the SELECTION**
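The cells below identify a model by a `library-architecture-backbone` string and resolve each part with `getattr` on IceVision's `models` namespace. Here is a standalone sketch of the string-splitting step (`parse_model_name` is a hypothetical helper). It relies on the fact that, in the names used here, architectures and backbones contain underscores but never hyphens, so splitting on `-` is unambiguous; unpacking into exactly three names fails loudly if that assumption is ever broken.

```python
def parse_model_name(model_name):
    """Split 'library-architecture-backbone' into its three parts.

    Raises ValueError if the name does not contain exactly two hyphens."""
    library, arch, backbone = model_name.split("-")
    return library, arch, backbone


print(parse_model_name("torchvision-faster_rcnn-resnet34_fpn"))
# ('torchvision', 'faster_rcnn', 'resnet34_fpn')
print(parse_model_name("ultralytics-yolov5-medium"))
# ('ultralytics', 'yolov5', 'medium')
```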

In [None]:
# the selected architecture
SELECTION = 3

# default parameters
BATCH_SIZE = 16
extra_args = {}

if SELECTION == 1:
    model_name = "torchvision-retinanet-resnet34_fpn"

elif SELECTION == 2:
    model_name = "torchvision-retinanet-resnet50_fpn"
    
elif SELECTION == 3:
    model_name = "torchvision-faster_rcnn-resnet34_fpn"

elif SELECTION == 4:
    model_name = "torchvision-faster_rcnn-resnet50_fpn"
    
elif SELECTION == 5:
    model_name = "ross-efficientdet-tf_lite0"
    extra_args['img_size'] = image_size
    
elif SELECTION == 6:
    model_name = "ross-efficientdet-d0"
    extra_args['img_size'] = image_size
    
elif SELECTION == 7:
    model_name = "ultralytics-yolov5-medium"
    extra_args['img_size'] = image_size

In [None]:
# seed everything
seed_everything(SEED)

tokens = model_name.split("-")
library_name = tokens[-3]
print(f"Library name: {library_name}")
arch_name = tokens[-2]
print(f"Architecture name: {arch_name}")
backbone_name = tokens[-1]
print(f"Backbone name: {backbone_name}")

model_type = getattr(getattr(models, library_name), arch_name)
backbone = getattr(model_type.backbones, backbone_name)

model = model_type.model(backbone=backbone(pretrained=True), num_classes=len(parser.class_map), **extra_args)

In [None]:
# we create the train/valid DataLoaders objects
train_dl = model_type.train_dl(train_ds, batch_size=BATCH_SIZE, num_workers=0, shuffle=False)
valid_dl = model_type.valid_dl(valid_ds, batch_size=BATCH_SIZE, num_workers=0, shuffle=False)

In [None]:
# and display the first batch
model_type.show_batch(first(valid_dl), ncols=4)

In [None]:
# define the metric
metrics = [COCOMetric(metric_type=COCOMetricType.bbox)]

## Finding LR with Fastai
You can uncomment the following two cells to use Fastai to find a good learning rate for each architecture.

*Note: when you are done with this part, comment the code again and rerun the previous cells from the architecture SELECTION before moving on to the next part.*
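For reference, a common way to read the `lr_find` curve is to pick a learning rate where the loss decreases fastest, i.e., where its slope with respect to `log10(lr)` is most negative. The sketch below applies this heuristic to a made-up curve; `steepest_lr` is a hypothetical helper, and fastai's own suggestion functions differ in their details, so treat it as an illustration rather than a reimplementation.

```python
import numpy as np


def steepest_lr(lrs, losses):
    """Return the learning rate where the loss falls fastest with respect to log10(lr)."""
    log_lrs = np.log10(np.asarray(lrs, dtype=float))
    # numerical derivative d(loss)/d(log10 lr), using the actual sample positions
    slopes = np.gradient(np.asarray(losses, dtype=float), log_lrs)
    return float(lrs[int(np.argmin(slopes))])


lrs = np.logspace(-6, -1, 6)               # 1e-6, 1e-5, ..., 1e-1
losses = [1.0, 0.98, 0.8, 0.5, 0.45, 2.0]  # made-up lr_find curve
suggested = steepest_lr(lrs, losses)
print(suggested)                           # close to 1e-4
```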

In [None]:
#seed_everything(SEED)
#learn = model_type.fastai.learner(dls=[train_dl, valid_dl], model=model, metrics=metrics)

In [None]:
#learn.lr_find()

In [None]:
if SELECTION == 1 or SELECTION == 3:
    LR = 1e-4
elif SELECTION == 2 or SELECTION == 4:
    LR = 5e-5
else:
    LR = 0.001
    
print(f"Model is {model_name}")
print(f"Learning rate is {LR}")

## Training using PyTorch Lightning

For the sake of demonstration, we will now use PyTorch Lightning to train our models, showing that we can easily move from one training library to another depending on our needs.

This is the minimum code that we need:

In [None]:
from torch.optim import Adam

class LightModel(model_type.lightning.ModelAdapter):
    # Adam optimizer with the per-architecture learning rate chosen above
    def configure_optimizers(self):
        return Adam(self.parameters(), lr=LR)

light_model = LightModel(model, metrics=metrics)

In [None]:
seed_everything(SEED)

MAX_EPOCHS = 10
trainer = pl.Trainer(max_epochs=MAX_EPOCHS, accelerator='gpu', devices=1, log_every_n_steps=10)
trainer.fit(light_model, train_dl, valid_dl)

## Computing the final metric
To compute the final metric, we create a new PyTorch Lightning `Trainer` to make sure that we start from a clean state, and we call its `test()` method.
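The `COCOMetric` used here presumably follows the standard COCO evaluation protocol, whose headline bbox score is the average precision over IoU thresholds from 0.50 to 0.95 in steps of 0.05. A prediction only counts as a true positive at a given threshold if its intersection over union (IoU) with a ground-truth box is high enough. The plain-Python sketch below (hypothetical helper `box_iou`) shows why this is demanding for small objects: a 1-pixel shift on a 10-pixel-wide box already drops the IoU to about 0.82, so the match only counts at 7 of the 10 thresholds.

```python
def box_iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# the 10 IoU thresholds of COCO-style AP: 0.50, 0.55, ..., 0.95
coco_iou_thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]

iou_half = box_iou((0, 0, 10, 10), (5, 0, 15, 10))  # shifted by half its width
iou_one = box_iou((0, 0, 10, 10), (1, 0, 11, 10))   # shifted by one pixel
print(round(iou_half, 4), sum(iou_half >= t for t in coco_iou_thresholds))  # 0.3333 0
print(round(iou_one, 4), sum(iou_one >= t for t in coco_iou_thresholds))    # 0.8182 7
```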

In [None]:
# compute the final metric with a new Trainer object
seed_everything(SEED)
valid_dl = model_type.valid_dl(valid_ds, batch_size=BATCH_SIZE, num_workers=0, shuffle=False)
trainer = pl.Trainer(accelerator='gpu', devices=1)
results = trainer.test(light_model, valid_dl)

In [None]:
# print architecture, loss and metric
loss = results[0]['test_loss']
metric = results[0]['COCOMetric']
print(f"| {model_name} | {loss} | {metric} |")

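We accumulated the results of successive runs in the following table. Each row corresponds to one training run followed by the evaluation above; architectures listed several times were trained several times.

| Architecture | Final validation loss | Final COCOMetric |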
| --- | --- | --- |
| torchvision-retinanet-resnet34_fpn | 0.46982264518737793 | 0.49574658971328095 |
| torchvision-retinanet-resnet34_fpn | 0.46982264518737793 | 0.49574658971328095 |
| torchvision-retinanet-resnet50_fpn | 0.3995179533958435 | 0.5556400150095998 |
| torchvision-retinanet-resnet50_fpn | 0.3995179533958435 | 0.5556400150095998 |
| torchvision-faster_rcnn-resnet18_fpn | 0.8484785556793213 | 0.1651134579186384 |
| torchvision-faster_rcnn-resnet34_fpn | 0.24665088951587677 | 0.5957194089424052 |
| torchvision-faster_rcnn-resnet34_fpn | 0.2362484633922577 | 0.6065852414795327 |
| torchvision-faster_rcnn-resnet34_fpn | 0.2500587999820709 | 0.5787572670787772 |
| torchvision-faster_rcnn-resnet50_fpn | 0.2642669975757599 | 0.5958341939769323 |
| torchvision-faster_rcnn-resnet50_fpn | 0.26830238103866577 | 0.5765243207445334 |
| ross-efficientdet-tf_lite0 | 0.30016225576400757 | 0.6021635117326537 |
| ross-efficientdet-tf_lite0 | 0.30016225576400757 | 0.6021635117326537 |
| ross-efficientdet-d0 | 0.29007476568222046 | 0.6077941592823384 |
| ross-efficientdet-d0 | 0.29007476568222046 | 0.6077941592823384 |
| ultralytics-yolov5-medium | 0.3424062132835388 | 0.5123324077224909 |
| ultralytics-yolov5-medium | 0.508611261844635 | 0.23745982558895537 |
| ultralytics-yolov5-medium | 0.42536768317222595 | 0.3929547212353217 |
| ultralytics-yolov5-medium | 0.4761926829814911 | 0.3119549499080895 |
| ultralytics-yolov5-medium | 0.44334524869918823 | 0.34793269230592194 |
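Since some architectures (Faster R-CNN, YOLOv5) did not give identical results across repeated runs, comparing a mean over runs, together with the spread between runs, is fairer than comparing single rows. The snippet below aggregates the COCOMetric values copied from the table; it only summarizes these few runs on a small sample dataset and is not a benchmark. On these numbers, the spread of the three Faster R-CNN ResNet34 runs (about 0.03) is larger than the gaps between the EfficientDet variants and the Faster R-CNN ResNet34 mean, so the top of the ranking should be read with caution.

```python
from collections import defaultdict
from statistics import mean

# (architecture, COCOMetric) pairs copied from the table above
runs = [
    ("torchvision-retinanet-resnet34_fpn", 0.49574658971328095),
    ("torchvision-retinanet-resnet34_fpn", 0.49574658971328095),
    ("torchvision-retinanet-resnet50_fpn", 0.5556400150095998),
    ("torchvision-retinanet-resnet50_fpn", 0.5556400150095998),
    ("torchvision-faster_rcnn-resnet18_fpn", 0.1651134579186384),
    ("torchvision-faster_rcnn-resnet34_fpn", 0.5957194089424052),
    ("torchvision-faster_rcnn-resnet34_fpn", 0.6065852414795327),
    ("torchvision-faster_rcnn-resnet34_fpn", 0.5787572670787772),
    ("torchvision-faster_rcnn-resnet50_fpn", 0.5958341939769323),
    ("torchvision-faster_rcnn-resnet50_fpn", 0.5765243207445334),
    ("ross-efficientdet-tf_lite0", 0.6021635117326537),
    ("ross-efficientdet-tf_lite0", 0.6021635117326537),
    ("ross-efficientdet-d0", 0.6077941592823384),
    ("ross-efficientdet-d0", 0.6077941592823384),
    ("ultralytics-yolov5-medium", 0.5123324077224909),
    ("ultralytics-yolov5-medium", 0.23745982558895537),
    ("ultralytics-yolov5-medium", 0.3929547212353217),
    ("ultralytics-yolov5-medium", 0.3119549499080895),
    ("ultralytics-yolov5-medium", 0.34793269230592194),
]

by_arch = defaultdict(list)
for arch, metric in runs:
    by_arch[arch].append(metric)

# per architecture: (number of runs, mean metric, max - min across runs)
summary = {
    arch: (len(values), mean(values), max(values) - min(values))
    for arch, values in by_arch.items()
}

# rank architectures by their mean COCOMetric across runs
for arch, (n, avg, spread) in sorted(summary.items(), key=lambda kv: kv[1][1], reverse=True):
    print(f"{arch:40s} runs={n}  mean={avg:.4f}  max-min={spread:.4f}")

best = max(summary, key=lambda a: summary[a][1])
print("Best mean COCOMetric:", best)
```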

We can visualize the detections by using the following code:

In [None]:
model_type.show_results(model, valid_ds, detection_threshold=.30)

To better understand the behavior of our model, we can display the validation samples with the highest losses:

In [None]:
samples_plus_losses, preds, losses_stats = model_type.interp.plot_top_losses(model=model, dataset=valid_ds, sort_by="loss_total", n_samples=6)


Thank you for reading till the end :)

Please check out my other notebooks:
- https://www.kaggle.com/code/jeffaudi/aircraft-detection-with-yolov5
- https://www.kaggle.com/code/jeffaudi/eda-airbus-oil-storage-tanks-dataset
- https://www.kaggle.com/code/jeffaudi/oil-storage-detection-on-airbus-imagery-with-yolox