# Faster R-CNN Analysis

This notebook analyzes how various layer additions affect the performance of the Faster R-CNN model as implemented in PyTorch.  The dataset is SARscope, available at the link below.  The objective is to determine whether the proposed image processing methods improve model performance on Synthetic Aperture Radar (SAR) imagery of maritime vessels.

Dataset: https://www.kaggle.com/datasets/kailaspsudheer/sarscope-unveiling-the-maritime-landscape

## Section 1 - Workspace Preparation

To run this notebook without issues, do the following:

1. Ensure your Python installation is version 3.8.10 or higher.
2. Ensure you are using the pip3 package manager.
3. Run the installation steps below. These cover all third-party packages used in this notebook.

In [None]:
!pip3 install torch
!pip3 install torchvision
!pip3 install torchmetrics
!pip3 install kagglehub
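# json is part of the Python standard library and needs no installation.
!pip3 install matplotlib
# The cv2 module is provided by the opencv-python package.
!pip3 install opencv-python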
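
Before moving on, it can help to confirm that the environment matches the prerequisites listed above. The following is a minimal sketch, not part of the original notebook: the helper name `find_missing_modules` and the `REQUIRED_MODULES` list are illustrative assumptions. It checks import names, which can differ from pip package names (the `cv2` module is provided by the `opencv-python` package).

```python
import importlib.util
import sys

# Import names used by this notebook (assumed list, mirroring the installs above).
REQUIRED_MODULES = ["torch", "torchvision", "torchmetrics", "kagglehub", "matplotlib", "cv2"]


def find_missing_modules(module_names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Warn if the interpreter is older than the version this notebook expects.
if sys.version_info < (3, 8, 10):
    print(f"Python 3.8.10 or higher is expected; found {sys.version.split()[0]}")

missing = find_missing_modules(REQUIRED_MODULES)
print(f"Missing modules: {missing if missing else 'none'}")
```

If any names are reported as missing, rerun the matching install command before continuing.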

## Section 2 - Dataset Loading

There are two options for loading the data: provide a path to a local copy of the dataset folder, or download it with the Kagglehub module.  For the local option, set the `data_path` variable to the absolute path of the 'SARscope' folder itself, not a subfolder.  For the Kagglehub option, leave `data_path` as `None`.

In [None]:
# Utility Imports
import json
import os
import pathlib
import random
import shutil


# Data Handling Imports
import cv2
import kagglehub
import matplotlib.pyplot as plt


# Model & Metric Imports
import torch
import torchmetrics
import torchvision

project_path = pathlib.Path.cwd().parent.resolve()
print(f"Project path: {project_path}")

### Section 2.1: Note on Kagglehub

Kagglehub does not natively support downloading to a specific directory on the user's file system.  It instead downloads to a cache folder, whose location may vary between users.  For consistency, every download is therefore moved into the project's '*/data/kaggle/*' folder.

If the block below raises an error, the most likely cause is the `shutil.move()` call failing because the dataset is still cached.  To fix this, `cd` into the cache directory printed in the output and delete the entire data folder.  Then run the block again.

Once the dataset is downloaded, you can either continue or change the `data_path` variable from `None` to the absolute path of the *SARscope* folder, which avoids downloading everything again.
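
One possible way to make reruns less fragile, sketched here as an alternative to this notebook's `shutil.move()` approach, is to copy the dataset out of the cache rather than move it. This leaves the cached copy intact at the cost of extra disk space. The helper name `copy_dataset` is hypothetical, and the demonstration uses a throwaway directory tree, not the real dataset.

```python
import shutil
import tempfile
from pathlib import Path


def copy_dataset(cached_dir, dest_parent):
    """Copy `cached_dir` into `dest_parent`, leaving the cached copy in place."""
    cached_dir = Path(cached_dir)
    target = Path(dest_parent) / cached_dir.name
    # dirs_exist_ok (Python 3.8+) lets a rerun overwrite an earlier copy
    # instead of raising FileExistsError.
    shutil.copytree(cached_dir, target, dirs_exist_ok=True)
    return target


# Demonstration on a temporary directory tree standing in for the cache.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "cache" / "SARscope"
    cache.mkdir(parents=True)
    (cache / "sample.txt").write_text("placeholder")

    first = copy_dataset(cache, Path(tmp) / "data" / "kaggle")
    second = copy_dataset(cache, Path(tmp) / "data" / "kaggle")  # a rerun does not fail
    print(first == second, cache.exists())
```

Whether copying is preferable to moving depends on the dataset size and available disk space, so treat this as an option rather than a replacement.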

In [None]:
data_path = None # Leave None to download data through Kagglehub.  Otherwise, provide the path to your dataset.

if data_path is None:
    # Create the Kaggle directory to move the downloaded data to
    kaggle_path = os.path.join(project_path, "data", "kaggle")

    os.makedirs(kaggle_path, exist_ok=True)

    # Download the SARscope dataset from Kaggle
    try:
        cached_path = kagglehub.dataset_download("kailaspsudheer/sarscope-unveiling-the-maritime-landscape")
    except Exception as err:
        # Chain the original exception so the underlying cause stays visible.
        raise LookupError("Unable to download SARscope dataset.") from err

    # Get the absolute path and move it.
    cached_path = os.path.abspath(os.path.join(cached_path, "SARscope"))

    print(f"Moving cached dataset from directory {cached_path} to {kaggle_path}")
    shutil.move(cached_path, kaggle_path)

    data_path = os.path.join(kaggle_path, "SARscope")

In [None]:
print(f"Using path to dataset: {data_path}")

## Section 3 - Data Visualization

Below, we visualize a few randomly selected images from the training dataset, both to show the kinds of images the model will encounter and to confirm that the annotations load as expected.

In [135]:
train_files = os.listdir(os.path.join(data_path, "train"))

# Extract the annotation file
train_annotation_file = [x for x in train_files if x.endswith(".json")][0]
train_annotation_file = os.path.join(data_path, "train", train_annotation_file)

# Remove the annotation file from the list of image files and add the absolute paths
train_files = [x for x in train_files if not x.endswith(".json")]
train_files = [os.path.join(data_path, "train", x) for x in train_files]

In [134]:
train_sample = random.sample(train_files, 3)

with open(train_annotation_file, 'r') as fAnn:
    annotations = json.load(fAnn)
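
To pair each sampled image with its bounding boxes, the loaded annotations need to be grouped by image. The sketch below assumes the annotation file follows the COCO layout (an `images` list with `id` and `file_name`, and an `annotations` list with `image_id` and `bbox` as `[x, y, width, height]`); this notebook has not confirmed that layout up to this point, so verify it against the loaded `annotations` object first. The helper names `boxes_by_file` and `to_corners` and the example data are illustrative.

```python
from collections import defaultdict


def boxes_by_file(coco):
    """Map each image file name to its list of [x, y, width, height] boxes."""
    id_to_name = {img["id"]: img["file_name"] for img in coco["images"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        grouped[id_to_name[ann["image_id"]]].append(ann["bbox"])
    return dict(grouped)


def to_corners(bbox):
    """Convert [x, y, width, height] to [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


# Hand-written example in the assumed layout (not real SARscope data).
example = {
    "images": [{"id": 0, "file_name": "a.jpg"}, {"id": 1, "file_name": "b.jpg"}],
    "annotations": [
        {"image_id": 0, "bbox": [10, 20, 30, 40]},
        {"image_id": 0, "bbox": [50, 60, 5, 5]},
        {"image_id": 1, "bbox": [0, 0, 8, 8]},
    ],
}

grouped = boxes_by_file(example)
print(grouped["a.jpg"])
print([to_corners(b) for b in grouped["a.jpg"]])
```

The corner form is useful in two places: `cv2.rectangle` takes two corner points, and torchvision's Faster R-CNN expects target boxes as `[x1, y1, x2, y2]`.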