<center><img src="https://i.imgur.com/1efyCJQ.png"></center>

<center><h1>Great Barrier Reef - Image & Bounding Box Augmentation</h1></center>

# 1. Introduction

> 🌊 **Competition Goal**: accurately identify starfish (*COTS - coral-eating Crown-Of-Thorns Starfish*) in real-time by building an object detection model trained on underwater videos of coral reefs. This way we can help researchers & scientists to control **COTS outbreaks**, which are a threat to the Great Barrier Reef.

### Crown of Thorns Starfish

🌟 **What is this creature?** [The crown-of-thorns starfish](https://en.wikipedia.org/wiki/Crown-of-thorns_starfish) is a large starfish that preys upon hard, or stony, coral polyps. It receives its name from the *venomous thorn-like spines* that cover its upper surface, resembling the biblical crown of thorns. It is one of the largest starfish in the world.

<center><img src="https://i.imgur.com/LqpLu9c.png" width=600></center>

🌟 **Why is this a problem?** *One or two crown-of-thorns starfish on a reef may arguably be beneficial* for biological diversity, as they keep down the growth of fast-growing coral species and leave space for other, slow-growing corals. However, as the starfish population multiplies or the starfish begin eating coral tissue faster than it can grow back, a devastating crown-of-thorns starfish (COTS) outbreak can occur. It is not known exactly what causes a COTS outbreak; however, scientists agree it could have something to do with increased levels of nutrients in the water due to agricultural runoff or warming oceans, leading to a plankton bloom, which is a necessary food source for starfish larvae ([source here](https://oceangardener.org/crown-of-thorns-starfish/)).

🌟 **Can an AI spot them?** A COTS outbreak can have devastating impacts on an entire coral reef, and depending on the event the ravenous starfish could wipe out nearly all living corals. Crown-of-thorns starfish are among the *largest starfish species*, generally 25-35 cm (10-14 in) in diameter, and can grow to 80 cm (31 in), which makes them easy to spot on a reef ([source here](https://oceangardener.org/crown-of-thorns-starfish/)).

### ⬇ Libraries Below

In [None]:
# Libraries
import os
import sys
import wandb
import time
import random
from tqdm import tqdm
import warnings
import cv2
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib as mpl
import matplotlib.patches as patches
import matplotlib.pyplot as plt
from IPython.display import display_html


# Environment check
warnings.filterwarnings("ignore")
os.environ["WANDB_SILENT"] = "true"
CONFIG = {'competition': 'greatReef', '_wandb_kernel': 'aot'}

# 🐝 Secrets
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
secret_value_0 = user_secrets.get_secret("wandb")

! wandb login $secret_value_0

# Custom colors
class color:
    S = '\033[1m' + '\033[94m'
    E = '\033[0m'
    
my_colors = ["#16558F", "#1583D2", "#61B0B7", "#ADDEFF", "#A99AEA", "#7158B7"]
print(color.S+"Notebook Color Scheme:"+color.E)
sns.palplot(sns.color_palette(my_colors))

# Set Style
sns.set_style("white")
mpl.rcParams['xtick.labelsize'] = 14
mpl.rcParams['ytick.labelsize'] = 14
mpl.rcParams['axes.spines.left'] = False
mpl.rcParams['axes.spines.right'] = False
mpl.rcParams['axes.spines.top'] = False
plt.rcParams.update({'font.size': 14})

### ⬇ Helper Functions Below

In [None]:
def show_values_on_bars(axs, h_v="v", space=0.4):
    '''Plots the value at the end of each bar of a seaborn barplot.
    axs: the ax of the plot
    h_v: whether the barplot is vertical ("v") or horizontal ("h")'''
    
    def _show_on_single_plot(ax):
        if h_v == "v":
            for p in ax.patches:
                _x = p.get_x() + p.get_width() / 2
                _y = p.get_y() + p.get_height()
                value = int(p.get_height())
                ax.text(_x, _y, format(value, ','), ha="center") 
        elif h_v == "h":
            for p in ax.patches:
                _x = p.get_x() + p.get_width() + float(space)
                _y = p.get_y() + p.get_height()
                value = int(p.get_width())
                ax.text(_x, _y, format(value, ','), ha="left")

    if isinstance(axs, np.ndarray):
        for idx, ax in np.ndenumerate(axs):
            _show_on_single_plot(ax)
    else:
        _show_on_single_plot(axs)
    
    
# === 🐝 W&B ===
def save_dataset_artifact(run_name, artifact_name, path):
    '''Saves dataset to W&B Artifactory.
    run_name: name of the experiment
    artifact_name: under what name should the dataset be stored
    path: path to the dataset'''
    
    run = wandb.init(project='GreatBarrierReef', 
                     name=run_name, 
                     config=CONFIG, anonymous="allow")
    artifact = wandb.Artifact(name=artifact_name, 
                              type='dataset')
    artifact.add_file(path)

    wandb.log_artifact(artifact)
    wandb.finish()
    print("Artifact has been saved successfully.")
    
    
def create_wandb_plot(x_data=None, y_data=None, x_name=None, y_name=None, title=None, log=None, plot="line"):
    '''Create and save lineplot/barplot in W&B Environment.
    x_data & y_data: Pandas Series containing x & y data
    x_name & y_name: strings containing axis names
    title: title of the graph
    log: string containing name of log'''
    
    data = [[label, val] for (label, val) in zip(x_data, y_data)]
    table = wandb.Table(data=data, columns = [x_name, y_name])
    
    if plot == "line":
        wandb.log({log : wandb.plot.line(table, x_name, y_name, title=title)})
    elif plot == "bar":
        wandb.log({log : wandb.plot.bar(table, x_name, y_name, title=title)})
    elif plot == "scatter":
        wandb.log({log : wandb.plot.scatter(table, x_name, y_name, title=title)})
        
        
def create_wandb_hist(x_data=None, x_name=None, title=None, log=None):
    '''Create and save histogram in W&B Environment.
    x_data: Pandas Series containing x values
    x_name: strings containing axis name
    title: title of the graph
    log: string containing name of log'''
    
    data = [[x] for x in x_data]
    table = wandb.Table(data=data, columns=[x_name])
    wandb.log({log : wandb.plot.histogram(table, x_name, title=title)})

# 2. 🌊 Dataset Understanding

## 2.1 [train.csv]

The `train.csv` dataset contains 5 columns that help identify the position within the video and sequence of the .jpg images within the `train_images` folder.

Additionally, it has an `annotations` column, which can be empty (`[]`) or contain one or more bounding boxes, each giving the location of a COTS.

<center><img src="https://i.imgur.com/xSuUaxf.png" width=700></center>

In [None]:
# W&B Experiment
run = wandb.init(project='GreatBarrierReef', name='DataUnderstanding', config=CONFIG, anonymous="allow")

# Read training dataset
train_df = pd.read_csv("../input/tensorflow-great-barrier-reef/train.csv")
test_df = pd.read_csv("../input/tensorflow-great-barrier-reef/test.csv")

In [None]:
df1_styler = train_df.sample(n=5, random_state=24).style.set_table_attributes("style='display:inline'").set_caption('Sample Train Data')
df2_styler = test_df.head().style.set_table_attributes("style='display:inline'").set_caption('Test Data (the rest is hidden)')

display_html(df1_styler._repr_html_(), raw=True)
print("\n")
display_html(df2_styler._repr_html_(), raw=True)

### I. Length of Videos, Sequences and Frames

🐡 There are **3 videos in total**, with the last one having the most frames (.jpg images). However, they are not extremely imbalanced: each of the 3 videos has a sizeable number of frames.

🐡 Each **video is split into sequences**. 1 video is split into 4 sequences, while the other 2 videos are split into 8 sequences each. Each sequence has a unique ID and a varying number of frames, ranging from 71 frames per sequence all the way to ~3,000 frames per sequence.
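
A quick way to double-check these counts is a `groupby`. The snippet below is only a sketch on a made-up toy frame: the values in `toy_df` are invented, and only the column names `video_id` and `sequence` come from `train.csv`. The same two `groupby` lines should carry over to `train_df`.

```python
import pandas as pd

# Toy stand-in for train_df; the values are invented for illustration
toy_df = pd.DataFrame({
    "video_id": [0, 0, 0, 1, 1, 2],
    "sequence": [11, 11, 12, 21, 21, 31],
})

# Number of distinct sequences in each video
sequences_per_video = toy_df.groupby("video_id")["sequence"].nunique()

# Number of frames (rows) in each sequence
frames_per_sequence = toy_df.groupby("sequence").size()

print(sequences_per_video.to_dict())
print(frames_per_sequence.to_dict())
```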

In [None]:
fig, ((ax1, ax2)) = plt.subplots(nrows=1, ncols=2, figsize=(23, 10))

# --- Plot 1 ---
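# Name the columns explicitly so the result is the same across pandas versions
df1 = train_df["video_id"].value_counts().rename_axis("index").reset_index(name="video_id")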

sns.barplot(data=df1, x="index", y="video_id", ax=ax1,
            palette=my_colors)
show_values_on_bars(ax1, h_v="v", space=0.1)
ax1.set_xlabel("Video ID")
ax1.set_ylabel("")
ax1.title.set_text("Frequency of Frames per Video")
ax1.set_yticks([])

# --- Plot 2  ---
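# Name the columns explicitly so the result is the same across pandas versions
df2 = train_df["sequence"].value_counts().rename_axis("index").reset_index(name="sequence")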

sns.barplot(data=df2, y="index", x="sequence", order=df2["index"],
            ax=ax2, orient="h", palette="BuPu_r")
show_values_on_bars(ax2, h_v="h", space=0.1)
ax2.set_xlabel("")
ax2.set_ylabel("Sequence ID")
ax2.title.set_text("Frequency of Frames per Sequence")
ax2.set_xticks([])

sns.despine(top=True, right=True, left=True, bottom=True, ax=ax1)
sns.despine(top=True, right=True, left=True, bottom=True, ax=ax2)

In [None]:
# 🐝 Log plots into W&B Dashboard
# Pass the "index" column (the actual IDs), not the DataFrame's row index
create_wandb_plot(x_data=df1["index"],
                  y_data=df1.video_id,
                  x_name="Video ID", y_name=" ",
                  title="-Frequency of Frames per Video-",
                  log="frames", plot="bar")

create_wandb_plot(x_data=df2["index"],
                  y_data=df2.sequence,
                  x_name="Sequence ID", y_name=" ",
                  title="-Frequency of Frames per Sequence-",
                  log="frames2", plot="bar")

### II. Target Variable - `annotations`

We can compute the total number of annotations per frame (or .jpg image) by counting how many coordinates can be found within a frame.
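
The cell below uses `eval` for brevity. As a hedged alternative, `ast.literal_eval` from the standard library parses the same strings but only accepts plain Python literals, so nothing in the CSV can execute as code. The helper name `count_annotations` is mine, and the example strings reuse box values quoted elsewhere in this notebook.

```python
import ast

def count_annotations(raw):
    """Return how many bounding boxes a raw `annotations` string holds."""
    # literal_eval only evaluates literals (lists, dicts, numbers, strings)
    return len(ast.literal_eval(raw))

# Strings in the same format as the `annotations` column
examples = [
    "[]",
    "[{'x': 628, 'y': 321, 'width': 42, 'height': 47}]",
    "[{'x': 520, 'y': 151, 'width': 78, 'height': 62}, "
    "{'x': 598, 'y': 204, 'width': 58, 'height': 32}]",
]

counts = [count_annotations(s) for s in examples]
print(counts)  # [0, 1, 2]
```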

In [None]:
# Calculate the number of total annotations within the frame
train_df["no_annotations"] = train_df["annotations"].apply(lambda x: len(eval(x)))

🐡 The distribution of annotations is extremely skewed, with **most of the frames having no annotation** at all.

🐡 For the frames that do have annotations, most have between **1 and 3 annotations**, with a few outlier frames that have more than 10 unique coordinates (bounding boxes) identified within the image.

In [None]:
# % annotations
n = len(train_df)
no_annot = round(train_df[train_df["no_annotations"]==0].shape[0]/n*100)
with_annot = round(train_df[train_df["no_annotations"]!=0].shape[0]/n*100)

print(color.S + f"There are ~{no_annot}% frames with no annotation and" + color.E,
      "\n",
      color.S + f"only ~{with_annot}% frames with at least 1 annotation." + color.E)

# Plot
plt.figure(figsize=(23, 6))
sns.histplot(train_df["no_annotations"], bins=19, kde=True, element="step", 
             color=my_colors[5])

plt.xlabel("Number of Annotations")
plt.ylabel("Frequency")
plt.title("Distribution for Number of Annotations per Frame")

sns.despine(top=True, right=True, left=False, bottom=True)


In [None]:
# 🐝 Log info and plots into W&B Dashboard
wandb.log({"no annotations": no_annot,
           "with annotations": with_annot})

create_wandb_hist(x_data=train_df["no_annotations"],
                  x_name="Number of Annotations",
                  title="Distribution for Number of Annotations per Frame",
                  log="annotations")

I also wanted to look at all sequences and see **how the annotations are distributed through time**. We know that each sequence has its frames (.jpg images) numbered in the order in which they appear within the video, from 1 to n. Hence, we can plot the number of annotations per frame over time to see *whether they differ irregularly between sequences or follow some kind of systematic pattern*.

🐡 **Sequences 53708, 8503, 60754, 22643 and 8399**: these have **lots of annotations** throughout the entire sequence, with no particular pattern of appearance (that is, the annotations don't cluster at the beginning, middle or end of the sequence).

🐡 **Sequences 44160, 29424, 37114**: these **don't have ANY annotations** in any of their frames, meaning that no COTS has been identified and tagged within these images.

🐡 **All other sequences**: most of the remaining sequences have few or close to no annotations. They don't seem to follow an appearance pattern either, so I tend to believe that **the COTS appear sporadically within the videos** (which is very good, since we want to mimic a natural setting as much as possible).
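
The grouping into "busy", "empty" and "in-between" sequences can also be read from a small per-sequence table rather than from the plots alone. This is a sketch on invented toy data (`toy_df` and its values are made up; the column names match the ones used in this notebook), so treat it as an illustration of the idea rather than the actual numbers.

```python
import pandas as pd

# Toy stand-in for train_df; the values are invented for illustration
toy_df = pd.DataFrame({
    "sequence":       [101, 101, 101, 202, 202, 303],
    "sequence_frame": [0, 1, 2, 0, 1, 0],
    "no_annotations": [0, 2, 1, 0, 0, 3],
})

# One row per sequence: frame count, frames with >= 1 box, total boxes
summary = toy_df.groupby("sequence")["no_annotations"].agg(
    frames="size",
    annotated_frames=lambda s: int((s > 0).sum()),
    total_boxes="sum",
)

# Sequences in which no frame carries an annotation
empty_sequences = summary.index[summary["annotated_frames"] == 0].tolist()

print(summary)
print(empty_sequences)
```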

In [None]:
# List of unique sequence values
sequences = list(train_df["sequence"].unique())

plt.figure(figsize=(23,20))
plt.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=0.2, hspace=0.5)
plt.suptitle("Frequency of annotations on sequence length", fontsize = 20)

# Enumerate through all sequences
for k, sequence in enumerate(sequences):
    # Keep only the frames that belong to the current sequence
    df_seq = train_df[train_df["sequence"] == sequence]
    
    plt.subplot(5, 4, k+1)
    plt.title(f"Sequence: {sequence}", fontsize = 12)
    plt.xlabel("Seq Frame", fontsize=10)
    plt.ylabel("No. Annot", fontsize=10)
    plt.xticks(fontsize=10); plt.yticks(fontsize=10)
    sns.lineplot(x=df_seq["sequence_frame"], y=df_seq["no_annotations"],
                 color=my_colors[2], lw=3)

In [None]:
wandb.finish()

## 2.2 [train_images]

The `train_images` folder is structured as follows:

<center><img src="https://i.imgur.com/AZzvcs4.png" width=700></center>

### I. Showing 1 Frame

Before doing anything else, let's explore the frames and see what they look like. Again, a **frame is actually a .jpg image**, a picture captured at one moment of the video.

In [None]:
# W&B Experiment
run = wandb.init(project='GreatBarrierReef', name='ExampleImages', config=CONFIG, anonymous="allow")

# Create a "path" column containing full path to the frames
base_folder = "../input/tensorflow-great-barrier-reef/train_images"

train_df["path"] = base_folder + "/video_" + \
                    train_df['video_id'].astype(str) + "/" +\
                    train_df['video_frame'].astype(str) +".jpg"

In [None]:
# === Show image and annotations if applicable ===
def show_image(path, annot, axs=None):
    '''Shows an image and marks any COTS annotated within the frame.
    path: full path to the .jpg image
    annot: string of the annotation for the coordinates of COTS'''
    
    # This is in case we plot only 1 image
    if axs is None:
        fig, axs = plt.subplots(figsize=(23, 8))
    
    img = plt.imread(path)
    axs.imshow(img)

    if annot:
        for a in eval(annot):
            rect = patches.Rectangle((a["x"], a["y"]), a["width"], a["height"], 
                                     linewidth=3, edgecolor="#FF6103", facecolor='none')
            axs.add_patch(rect)

    axs.axis("off")
    
    
# === 🐝 W&B Log ===
def wandb_annotation(image, annotations):
    '''Source: https://www.kaggle.com/ayuraj/visualize-bounding-boxes-interactively
    image: the cv2.imread() output
    annotations: the original annotations from the train dataset'''
    
    all_annotations = []
    if annotations:
        for annot in eval(annotations):
            data = {"position": {
                            "minX": annot["x"],
                            "minY": annot["y"],
                            "maxX": annot["x"]+annot["width"],
                            "maxY": annot["y"]+annot["height"]
                        },
                    "class_id" : 1,
                    "domain" : "pixel"}
            all_annotations.append(data)
    
    return wandb.Image(image, 
                       boxes={"ground_truth": {"box_data": all_annotations}}
                      )

🐡 This is an example of a "naked" image - there are **no annotations found**, meaning that there are no COTS present.

In [None]:
# Show only 1 image as example
path = list(train_df[train_df["no_annotations"]==0]["path"])[0]
annot = list(train_df[train_df["no_annotations"]==0]["annotations"])[0]

# 🐝 Log Image to W&B
image = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
wandb_images = []
wandb_images.append(wandb_annotation(image, annot))

print(color.S+"Path:"+color.E, path)
print(color.S+"Annotation:"+color.E, annot)
print(color.S+"Frame:"+color.E)
show_image(path, annot, axs=None)

🐡 The image below is a case that has the most annotations a frame can have (**18** bounding boxes in total).

🐡 Some COTS can be seen with the naked eye, while others are well hidden in the background.

In [None]:
# Show only 1 image as example
path = list(train_df[train_df["no_annotations"]==18]["path"])[0]
annot = list(train_df[train_df["no_annotations"]==18]["annotations"])[0]

# 🐝 Log Image to W&B
image = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
wandb_images.append(wandb_annotation(image, annot))
wandb.log({"example_image": wandb_images})

print(color.S+"Path:"+color.E, path)
print(color.S+"Annotation:"+color.E, annot)
print(color.S+"Frame:"+color.E)
show_image(path, annot, axs=None)

### II. Show Multiple Consecutive Frames

Now let's look at multiple consecutive frames within a few sequences.

In [None]:
def show_multiple_images(seq_id, frame_no):
    '''Shows multiple images within a sequence.
    seq_id: a number corresponding with the sequence unique ID
    frame_no: a list containing the first and last frame to plot'''
    
    # Select image paths & their annotations
    paths = list(train_df[(train_df["sequence"]==seq_id) & 
                 (train_df["sequence_frame"]>=frame_no[0]) & 
                 (train_df["sequence_frame"]<=frame_no[1])]["path"])
    annotations = list(train_df[(train_df["sequence"]==seq_id) & 
                 (train_df["sequence_frame"]>=frame_no[0]) & 
                 (train_df["sequence_frame"]<=frame_no[1])]["annotations"])

    # Plot
    fig, axs = plt.subplots(2, 3, figsize=(23, 10))
    axs = axs.flatten()
    fig.suptitle(f"Showing consecutive frames for Sequence ID: {seq_id}", fontsize = 20)

    for k, (path, annot) in enumerate(zip(paths, annotations)):
        axs[k].set_title(f"Frame No: {frame_no[0]+k}", fontsize = 12)
        show_image(path, annot, axs[k])

    plt.tight_layout()
    plt.show()

🐡 The frames below have **no COTS** identified within them.

In [None]:
seq_id = 44160
frame_no = [51, 56]

show_multiple_images(seq_id, frame_no)

🐡 These frames, however, have **1 and 2 COTS** identified. Notice that in the first 3 frames only 1 COTS is annotated; the second COTS is also visible but NOT annotated. It is identified and annotated only from the 4th frame onwards.

In [None]:
seq_id = 59337
frame_no = [38, 43]

show_multiple_images(seq_id, frame_no)

🐡 At the polar opposite, these images show the presence of **multiple COTS** within them.

🐡 My question would be - could we somehow distort/enhance these images so we could better identify the presence of COTS within them? We already know that all the images will have around the same tonal colors (blue, green, yellow) and around the same texture.

In [None]:
seq_id = 53708
frame_no = [801, 806]

show_multiple_images(seq_id, frame_no)

### III. Comparison between No Annotated vs Annotated Images

I wanted to look at multiple random images/frames and see if they look significantly different.

In [None]:
def plot_comparison(no_annot, state=24):
    
    # Select image paths & their annotations
    paths_compare = list(train_df[train_df["no_annotations"]==no_annot]\
                         .sample(n=9, random_state=state)["path"])
    annotations_compare = list(train_df[train_df["no_annotations"]==no_annot]\
                               .sample(n=9, random_state=state)["annotations"])

    # Plot
    fig, axs = plt.subplots(3, 3, figsize=(23, 13))
    axs = axs.flatten()
    fig.suptitle(f"{no_annot} annotations", fontsize = 20)

    for k, (path, annot) in enumerate(zip(paths_compare, annotations_compare)):
        video_id = path.split("/")[4]
        frame_id = path.split("/")[-1].split(".")[0]
        
        axs[k].set_title(f"{video_id} | Frame {frame_id}",
                         fontsize = 12)
        show_image(path, annot, axs[k])

    plt.tight_layout()
    plt.show()

In [None]:
# No annotations
no_annot = 0
plot_comparison(no_annot, state=24)

In [None]:
# 5 annotations
no_annot = 5
plot_comparison(no_annot, state=24)

In [None]:
# 17 annotations
no_annot = 17
plot_comparison(no_annot, state=24)

# 3. Bounding Box Augmentation

In this part I wanted to explore some ways to do Image Augmentation **and** adjust the annotations (aka bounding boxes) to match all sorts of augmentations applied on the image.

🐡 Before we do that, we will need to format the annotations we have now:
* from this: {'x': 628, 'y': 321, 'width': 42, 'height': 47}
* to this: {'x1': 628, 'y1': 321, 'x2': 670, 'y2': 368} => [628, 321, 670, 368]

<center><img src="https://i.imgur.com/7sYUdCb.png" width=950></center>

In order to do so, we just need to compute as follows:
* x1 = x
* y1 = y
* x2 = x + width
* y2 = y + height

> 🦦 **Note**: we add (rather than subtract) the height to get y2 because image coordinates are pixel positions, not a Cartesian plane: the origin `[0, 0]` is the top left corner of the image and y grows downwards, so the bottom right corner has the coordinates `[width_max, height_max]`.
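
To make the "y grows downwards" convention concrete, here is a tiny sketch on a made-up 4x6 array standing in for an image (the array and the box values are invented). NumPy indexes an image as `img[row, column]`, which is `img[y, x]`, and row 0 is the top of the picture.

```python
import numpy as np

# A blank 4 x 6 "image": 4 rows (height), 6 columns (width)
img = np.zeros((4, 6), dtype=np.uint8)

# A box in the competition format: top-left corner plus size
x, y, width, height = 1, 1, 3, 2

# Bottom-right corner: both coordinates are ADDED to
x2, y2 = x + width, y + height

# Rows are indexed by y (top to bottom), columns by x (left to right)
img[y:y2, x:x2] = 1
print(img)
```

The filled block sits in rows 1-2 counted from the top, which is why `y2 = y + height` points further *down* the image.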

*Example*:

* first bbox (bigger one): `{'x': 520, 'y': 151, 'width': 78, 'height': 62}`
* second bbox (smaller one): `{'x': 598, 'y': 204, 'width': 58, 'height': 32}`

<center><img src="https://i.imgur.com/KfLQKma.png" width=600></center>
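
Plugging the two example boxes above into the formulas gives the numbers below. The helper names `to_corners` and `to_xywh` are mine (hypothetical, just for this check); the second one converts back so we can confirm that nothing is lost in the round trip.

```python
def to_corners(box):
    """{x, y, width, height} -> [x1, y1, x2, y2]"""
    return [box["x"], box["y"],
            box["x"] + box["width"], box["y"] + box["height"]]

def to_xywh(corners):
    """[x1, y1, x2, y2] -> {x, y, width, height}"""
    x1, y1, x2, y2 = corners
    return {"x": x1, "y": y1, "width": x2 - x1, "height": y2 - y1}

boxes = [
    {"x": 520, "y": 151, "width": 78, "height": 62},   # bigger box
    {"x": 598, "y": 204, "width": 58, "height": 32},   # smaller box
]

corners = [to_corners(b) for b in boxes]
print(corners)  # [[520, 151, 598, 213], [598, 204, 656, 236]]

# Converting back recovers the original annotations
round_trip = [to_xywh(c) for c in corners]
print(round_trip == boxes)  # True
```

Note that the first box ends at `x2 = 598`, exactly where the second one starts, so the two boxes touch.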

In [None]:
def format_annotations(x):
    '''Changes annotations from format {x, y, width, height} to {x1, y1, x2, y2}.
    x: a string of the initial format.'''
    
    annotations = eval(x)
    new_annotations = []

    if annotations:
        for annot in annotations:
            new_annotations.append([annot["x"],
                                    annot["y"],
                                    annot["x"]+annot["width"],
                                    annot["y"]+annot["height"]
                                   ])
    
    if new_annotations: return str(new_annotations)
    else: return "[]"

In [None]:
# Create a new column with the newly formatted annotations
train_df["f_annotations"] = train_df["annotations"].apply(lambda x: format_annotations(x))

🐡 One last thing I would like to do is create a new function called `show_image_bbox` that receives the newly formatted annotations ({x1, y1, x2, y2}) and displays the augmented image with its bounding boxes.

In [None]:
def show_image_bbox(img, annot, axs=None):
    '''Shows an image and marks any COTS annotated within the frame.
    img: the output from cv2.imread()
    annot: FORMATTED annotation (list of [x1, y1, x2, y2])'''
    
    # This is in case we plot only 1 image
    if axs is None:
        fig, axs = plt.subplots(figsize=(23, 8))
    
    axs.imshow(img)

    if annot:
        for a in annot:
            rect = patches.Rectangle((a[0], a[1]), a[2]-a[0], a[3]-a[1], 
                                     linewidth=3, edgecolor="#FF6103", facecolor='none')
            axs.add_patch(rect)

    axs.axis("off")

## 3.1 (Random) Horizontal Flip

Creates a class that (randomly) flips the image (and the bounding box with it).

> **Note**: Keep in mind that cv2 works with *BGR* images - so, in order to view the original image within RGB, we need to convert using `cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)`.

*🐡 Note: Most of my inspiration and research is from here: https://blog.paperspace.com/data-augmentation-for-bounding-boxes/*
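
Before the class itself, here is the box arithmetic on its own. For an image of width `W`, mirroring left-to-right sends `x` to `W - x`, and because left and right swap, the new box is `[W - x2, y1, W - x1, y2]`. The sketch below is a compact equivalent, not the class used in this notebook: the function name `hflip_boxes` is hypothetical and the width of 1280 px is an assumption made for the example.

```python
import numpy as np

def hflip_boxes(boxes, img_width):
    """Mirror [x1, y1, x2, y2] boxes around the vertical centre line."""
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    flipped = boxes.copy()
    # The old right edge becomes the new left edge and vice versa
    flipped[:, 0] = img_width - boxes[:, 2]
    flipped[:, 2] = img_width - boxes[:, 0]
    return flipped

boxes = [[628, 321, 670, 368]]
flipped = hflip_boxes(boxes, img_width=1280)   # 1280 is an assumed width
print(flipped.tolist())  # [[610.0, 321.0, 652.0, 368.0]]

# Flipping twice brings every box back to where it started
print(hflip_boxes(flipped, img_width=1280).tolist())
```

The `reshape(-1, 4)` also lets a frame without annotations (an empty list) pass through as an empty `(0, 4)` array.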

In [None]:
class RandomHorizontalFlip(object):

    def __init__(self, p=0.5):
        # p = probability of the image to be flipped
        # set p = 1 to always flip
        self.p = p
        
    def __call__(self, img, bboxes):
        '''img : the image to be flipped
        bboxes : the annotations within the image'''
        
        # Convert bboxes; reshape keeps an empty annotation list as shape (0, 4)
        bboxes = np.array(bboxes).reshape(-1, 4)
        
        img_center = np.array(img.shape[:2])[::-1]/2
        img_center = np.hstack((img_center, img_center))
        
        # If random number between 0 and 1 < probability p
        if random.random() < self.p:
            # Reverse the columns (axis 1) to mirror the image left-to-right
            img =  img[:,::-1,:]
            bboxes[:,[0,2]] = bboxes[:,[0,2]] + 2*(img_center[[0,2]] - bboxes[:,[0,2]])
            
            # Convert the bounding boxes
            box_w = abs(bboxes[:,0] - bboxes[:,2])
            bboxes[:,0] -= box_w
            bboxes[:,2] += box_w
            
        return img, bboxes.tolist()

🐡 Let's see an example of the original image and then the **flipped** one.

In [None]:
# Take an example
path = list(train_df[train_df["no_annotations"]==18]["path"])[0]

img_original = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
annot_original = eval(list(train_df[train_df["no_annotations"]==18]["f_annotations"])[0])

# Horizontal Flip
horizontal_flip = RandomHorizontalFlip(p=1)  
img_flipped, annot_flipped = horizontal_flip(img_original, annot_original)



# Show the Before and After
fig, axs = plt.subplots(1, 2, figsize=(23, 10))
axs = axs.flatten()
fig.suptitle(f"(Random) Horizontal Flip", fontsize = 20)

axs[0].set_title("Original Image", fontsize = 20)
show_image_bbox(img_original, annot_original, axs=axs[0])

axs[1].set_title("With Horizontal Flip", fontsize = 20)
show_image_bbox(img_flipped, annot_flipped, axs[1])

plt.tight_layout()
plt.show()

## 3.2 (Random) Scaling

When we scale the image, we **shrink or enlarge it by a random factor**. In this case, *bounding boxes that keep less than 25% of their area inside the transformed image are dropped*. The resolution is maintained, and the remaining area, if any, is filled with black.

*🐡 Note: Most of my inspiration and research is from here: https://blog.paperspace.com/data-augmentation-bounding-boxes-scaling-translation/*
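
The 25% rule is easier to see on numbers. The sketch below is a simplified stand-in for the clipping helper, not the function used in this notebook: the name `clip_and_filter`, the 100x100 image size and the three boxes are all made up for illustration.

```python
import numpy as np

def clip_and_filter(boxes, img_w, img_h, alpha=0.25):
    """Clip [x1, y1, x2, y2] boxes to the image and drop boxes that keep
    no more than `alpha` of their original area."""
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    area_before = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])

    clipped = boxes.copy()
    clipped[:, [0, 2]] = np.clip(boxes[:, [0, 2]], 0, img_w)
    clipped[:, [1, 3]] = np.clip(boxes[:, [1, 3]], 0, img_h)
    area_after = (clipped[:, 2] - clipped[:, 0]) * (clipped[:, 3] - clipped[:, 1])

    # Keep a box only if strictly more than `alpha` of it is still visible
    keep = area_after > alpha * area_before
    return clipped[keep]

boxes = [
    [10, 10, 50, 50],    # fully inside          -> kept unchanged
    [80, 10, 120, 50],   # 50% still inside      -> kept, but clipped
    [95, 10, 135, 50],   # only 12.5% inside     -> dropped
]
kept = clip_and_filter(boxes, img_w=100, img_h=100)
print(kept.tolist())
```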

In [None]:
# ==== Clips the bboxes ====
def bbox_area(bbox):
    return (bbox[:,2] - bbox[:,0])*(bbox[:,3] - bbox[:,1])

def clip_box(bbox, clip_box, alpha):
    """
    Clip the bounding boxes to the borders of an image
    bbox: numpy.ndarray
        Numpy array containing bounding boxes of shape `N X 4` where N is the 
        number of bounding boxes and the bounding boxes are represented in the
        format `x1 y1 x2 y2`
    
    clip_box: numpy.ndarray
        An array of shape (4,) specifying the diagonal co-ordinates of the image
        The coordinates are represented in the format `x1 y1 x2 y2`
        
    alpha: float
        If the fraction of a bounding box left in the image after being clipped is 
        less than `alpha` the bounding box is dropped. 
    
    Returns
    -------
    numpy.ndarray
        Numpy array containing **clipped** bounding boxes of shape `N X 4` where N is the 
        number of bounding boxes left after clipping; the bounding boxes are represented in the
        format `x1 y1 x2 y2` 
    """
    ar_ = (bbox_area(bbox))
    x_min = np.maximum(bbox[:,0], clip_box[0]).reshape(-1,1)
    y_min = np.maximum(bbox[:,1], clip_box[1]).reshape(-1,1)
    x_max = np.minimum(bbox[:,2], clip_box[2]).reshape(-1,1)
    y_max = np.minimum(bbox[:,3], clip_box[3]).reshape(-1,1)
    
    bbox = np.hstack((x_min, y_min, x_max, y_max, bbox[:,4:]))
    
    delta_area = ((ar_ - bbox_area(bbox))/ar_)
    
    mask = (delta_area < (1 - alpha)).astype(int)
    
    bbox = bbox[mask == 1,:]


    return bbox

In [None]:
class RandomScale(object):

    def __init__(self, scale = 0.2, diff = False):
        
        # scale must always be a positive number
        self.scale = scale
        self.scale = (max(-1, -self.scale), self.scale)
        
        # Maintain the aspect ratio
        # (scaling factor remains the same for width & height)
        self.diff = diff
        
        
    def __call__(self, img, bboxes):
        
        # Convert bboxes
        bboxes = np.array(bboxes)

        # Remember the original shape; the random scale factor is drawn below
        img_shape = img.shape

        if self.diff:
            scale_x = random.uniform(*self.scale)
            scale_y = random.uniform(*self.scale)
        else:
            scale_x = random.uniform(*self.scale)
            scale_y = scale_x

        resize_scale_x = 1 + scale_x
        resize_scale_y = 1 + scale_y

        # Resize the image by scale factor
        img = cv2.resize(img, None, fx = resize_scale_x, fy = resize_scale_y)

        bboxes[:,:4] = bboxes[:,:4] * [resize_scale_x, resize_scale_y, resize_scale_x, resize_scale_y]

        # Black canvas with the original shape; the scaled image is pasted onto it
        canvas = np.zeros(img_shape, dtype = np.uint8)

        # Determine the size of the scaled image
        y_lim = int(min(resize_scale_y,1)*img_shape[0])
        x_lim = int(min(resize_scale_x,1)*img_shape[1])

        canvas[:y_lim,:x_lim,:] =  img[:y_lim,:x_lim,:]

        img = canvas
        # Adjust the bboxes - remove all annotations that disappeared after the scaling
        bboxes = clip_box(bboxes, [0, 0, img_shape[1], img_shape[0]], 0.25)

        return img, bboxes.tolist()

🐡 Let's see an example of the original image and then the **scaled** one.

In [None]:
random.seed(24)

# Scaling
scale = RandomScale(scale=1.3, diff = False) 
img_scaled, annot_scaled = scale(img_original, annot_original)



# Show the Before and After
fig, axs = plt.subplots(1, 2, figsize=(23, 10))
axs = axs.flatten()
fig.suptitle(f"(Random) Image Scaling", fontsize = 20)

axs[0].set_title("Original Image", fontsize = 20)
show_image_bbox(img_original, annot_original, axs=axs[0])

axs[1].set_title("Scaled (zoomed in) Image", fontsize = 20)
show_image_bbox(img_scaled, annot_scaled, axs[1])

plt.tight_layout()
plt.show()

## 3.3 (Random) Translate

When we translate the image we **move it around on the canvas**. It's as if you were looking through a camera lens at a piece of paper on a table and then moved the paper left, right, up or down, leaving some parts of the table exposed and some areas of the paper out of view.

As in the case of scaling, *bounding boxes that keep less than 25% of their area inside the transformed image are dropped*. The resolution is maintained, and the remaining area, if any, is filled with black.

*🐡 Note: Most of my inspiration and research is from here: https://blog.paperspace.com/data-augmentation-bounding-boxes-scaling-translation/*
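
Here is the "moving the paper" idea on a tiny array, as a hedged sketch rather than the class used in this notebook. The function name `translate_image_and_boxes`, the 3x4 "image" and the box are invented, and the sketch assumes the shift is smaller than the image itself. Boxes pushed past the border would afterwards go through the same `clip_box` rule the notebook applies.

```python
import numpy as np

def translate_image_and_boxes(img, boxes, dx, dy):
    """Shift `img` by (dx, dy) pixels on a black canvas of the same size
    and shift the [x1, y1, x2, y2] boxes by the same amount.
    Assumes |dx| < width and |dy| < height."""
    h, w = img.shape[:2]
    canvas = np.zeros_like(img)

    # Where the image lands on the canvas ...
    dst_x0, dst_y0 = max(dx, 0), max(dy, 0)
    dst_x1, dst_y1 = min(w, w + dx), min(h, h + dy)
    # ... and which part of the image is still visible
    src_x0, src_y0 = max(-dx, 0), max(-dy, 0)
    src_x1, src_y1 = min(w, w - dx), min(h, h - dy)

    canvas[dst_y0:dst_y1, dst_x0:dst_x1] = img[src_y0:src_y1, src_x0:src_x1]

    shifted = np.asarray(boxes, dtype=int).reshape(-1, 4) + [dx, dy, dx, dy]
    return canvas, shifted

img = np.arange(12).reshape(3, 4)      # made-up 3 x 4 "image"
canvas, shifted = translate_image_and_boxes(img, [[1, 1, 3, 2]], dx=1, dy=-1)
print(canvas)
print(shifted.tolist())  # [[2, 0, 4, 1]]
```

Moving one pixel to the right and one pixel up leaves a black first column and a black last row, and the box coordinates move by exactly the same `(dx, dy)`.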

In [None]:
class RandomTranslate(object):

    def __init__(self, translate = 0.2, diff = False):
        
        self.translate = translate
        self.translate = (-self.translate, self.translate)
            
        # If diff is False, the same translation factor is used
        # for both the x and the y direction
        self.diff = diff
        
    def __call__(self, img, bboxes):  
        
        # Convert bboxes
        bboxes = np.array(bboxes)
        
        # Remember the original shape of the image
        img_shape = img.shape

        # Percentage of the dimension of the image to translate
        translate_factor_x = random.uniform(*self.translate)
        translate_factor_y = random.uniform(*self.translate)

        if not self.diff:
            translate_factor_y = translate_factor_x

        canvas = np.zeros(img_shape).astype(np.uint8)

        corner_x = int(translate_factor_x*img.shape[1])
        corner_y = int(translate_factor_y*img.shape[0])

        #Change the origin to the top-left corner of the translated box
        orig_box_cords =  [max(0,corner_y), max(corner_x,0), min(img_shape[0], corner_y + img.shape[0]), min(img_shape[1],corner_x + img.shape[1])]

        mask = img[max(-corner_y, 0):min(img.shape[0], -corner_y + img_shape[0]), max(-corner_x, 0):min(img.shape[1], -corner_x + img_shape[1]),:]
        canvas[orig_box_cords[0]:orig_box_cords[2], orig_box_cords[1]:orig_box_cords[3],:] = mask
        img = canvas

        bboxes[:,:4] += [corner_x, corner_y, corner_x, corner_y]

        bboxes = clip_box(bboxes, [0,0,img_shape[1], img_shape[0]], 0.25)

        return img, bboxes.tolist()

🐡 Let's see an example of the original image and then the **translated** one.

In [None]:
random.seed(25)

# Translate
translate = RandomTranslate(translate=0.4, diff = False) 
img_translated, annot_translated = translate(img_original, annot_original)



# Show the Before and After
fig, axs = plt.subplots(1, 2, figsize=(23, 10))
axs = axs.flatten()
fig.suptitle(f"(Random) Image Translation", fontsize = 20)

axs[0].set_title("Original Image", fontsize = 20)
show_image_bbox(img_original, annot_original, axs=axs[0])

axs[1].set_title("Translated (shifted) Image", fontsize = 20)
show_image_bbox(img_translated, annot_translated, axs[1])

plt.tight_layout()
plt.show()

## 3.4 (Random) Rotation

Rotation is when (you guessed it) you rotate the image by a random number of degrees. It might be the hardest augmentation to deal with when trying to accommodate bounding boxes.

*🐡 Note: Most of my inspiration and research is from here: https://blog.paperspace.com/data-augmentation-for-object-detection-rotation-and-shearing/*

TODO: needs work. Check that the bboxes rotate together with the image.

### ⬇️ Function for Image Rotation

In [None]:
# === Image Rotation ===

def rotate_im(image, angle):
    '''
    image: numpy array of the image
    angle: a float that specifies the angle (in degrees) the image should be rotated
    '''

    # Image dimensions
    (h, w) = image.shape[:2]
    # Image Centre
    (cX, cY) = (w // 2, h // 2)

    # Rotation Matrix from cv2
    M = cv2.getRotationMatrix2D((cX, cY), angle, 1.0)
    # Sine & Cosine - rotation components of the matrix
    cos = np.abs(M[0, 0])
    sin = np.abs(M[0, 1])

    # NEW Bounding Dimensions of the image
    nW = int((h * sin) + (w * cos))
    nH = int((h * cos) + (w * sin))

    # Adjust the rotation matrix to take into account translation
    M[0, 2] += (nW / 2) - cX
    M[1, 2] += (nH / 2) - cY

    # Perform the Rotation
    image = cv2.warpAffine(image, M, (nW, nH))

    return image

### ⬇️ Functions for BBox Rotation

In [None]:
# === Get Corners of Bounding Boxes ===

def get_corners(bboxes):
    '''bboxes: array of the original bounding boxes.'''
    
    width = (bboxes[:,2] - bboxes[:,0]).reshape(-1,1)
    height = (bboxes[:,3] - bboxes[:,1]).reshape(-1,1)
    
    x1 = bboxes[:,0].reshape(-1,1)
    y1 = bboxes[:,1].reshape(-1,1)
    
    x2 = x1 + width
    y2 = y1 
    
    x3 = x1
    y3 = y1 + height
    
    x4 = bboxes[:,2].reshape(-1,1)
    y4 = bboxes[:,3].reshape(-1,1)
    
    # Each bounding box is described by 8 coordinates x1,y1,x2,y2,x3,y3,x4,y4
    corners = np.hstack((x1,y1,x2,y2,x3,y3,x4,y4))
    
    return corners


# === Box Rotation ===

def rotate_box(corners, angle, cx, cy, h, w):
    '''
    corners: output from get_corners()
    angle:  a float that specifies the angle the image should be rotated
    cx, cy: coordinates for the center of the image
    h, w: height and width of the image
    '''
    
    # corners = x1,y1,x2,y2,x3,y3,x4,y4
    corners = corners.reshape(-1,2)
    corners = np.hstack((corners, np.ones((corners.shape[0],1), dtype = type(corners[0][0]))))
    
    # Rotation Matrix from cv2
    M = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
    # Sine & Cosine - rotation components of the matrix
    cos = np.abs(M[0, 0])
    sin = np.abs(M[0, 1])
    
    # NEW Bounding Dimensions of the image
    nW = int((h * sin) + (w * cos))
    nH = int((h * cos) + (w * sin))
    
    # Adjust the rotation matrix to take into account translation
    M[0, 2] += (nW / 2) - cx
    M[1, 2] += (nH / 2) - cy
    
    # Prepare the vector to be transformed
    calculated = np.dot(M,corners.T).T
    calculated = calculated.reshape(-1,8)
    
    return calculated


# === Get the Enclosing Box ===

def get_enclosing_box(corners):
    '''corners: rotated corners (output from rotate_box()), followed by any extra columns'''
    
    x_ = corners[:,[0,2,4,6]]
    y_ = corners[:,[1,3,5,7]]
    
    xmin = np.min(x_,1).reshape(-1,1)
    ymin = np.min(y_,1).reshape(-1,1)
    xmax = np.max(x_,1).reshape(-1,1)
    ymax = np.max(y_,1).reshape(-1,1)
    
    # Notation where each bounding box is determined by 4 coordinates or two corners
    final = np.hstack((xmin, ymin, xmax, ymax,corners[:,8:]))
    
    return final
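
As a sanity check for the helpers above, the same idea (rotate the four corners of each box, then take the axis-aligned box that encloses them) can be sketched with NumPy alone. The names `rotation_matrix` and `rotate_boxes` are hypothetical, and the matrix is written out by hand following the formula documented for `cv2.getRotationMatrix2D`, so treat this as an illustration under those assumptions and not as a replacement for the functions in this notebook.

```python
import numpy as np

def rotation_matrix(cx, cy, angle_deg):
    """2x3 affine matrix for a rotation around (cx, cy) with scale 1.
    Positive angles rotate counter-clockwise (origin in the top-left corner)."""
    a = np.cos(np.radians(angle_deg))
    b = np.sin(np.radians(angle_deg))
    return np.array([[a, b, (1 - a) * cx - b * cy],
                     [-b, a, b * cx + (1 - a) * cy]])

def rotate_boxes(boxes, angle_deg, w, h):
    """Rotate [x1, y1, x2, y2] boxes together with a w x h image.
    Returns the enclosing boxes and the size of the enlarged canvas."""
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    cx, cy = w / 2, h / 2
    M = rotation_matrix(cx, cy, angle_deg)

    # Size of the canvas that holds the whole rotated image
    cos, sin = abs(M[0, 0]), abs(M[0, 1])
    new_w = h * sin + w * cos
    new_h = h * cos + w * sin

    # Shift the rotation so the image stays centered on the new canvas
    M[0, 2] += new_w / 2 - cx
    M[1, 2] += new_h / 2 - cy

    # Four corners per box, in homogeneous coordinates: shape (n, 4, 3)
    x1, y1, x2, y2 = boxes.T
    corners = np.stack([np.stack([x1, y1], 1), np.stack([x2, y1], 1),
                        np.stack([x1, y2], 1), np.stack([x2, y2], 1)], axis=1)
    ones = np.ones((corners.shape[0], 4, 1))
    rotated = np.concatenate([corners, ones], axis=2) @ M.T  # shape (n, 4, 2)

    # Axis-aligned box that encloses the rotated corners
    enclosing = np.concatenate([rotated.min(axis=1), rotated.max(axis=1)], axis=1)
    return enclosing, (new_w, new_h)


# Hypothetical box on a 200x100 image, rotated by 90 degrees
rotated_boxes, canvas_size = rotate_boxes([[0, 0, 50, 20]], 90, w=200, h=100)
print(np.round(rotated_boxes, 2), canvas_size)
```

Note that the enclosing box is usually larger than the object itself, because the rotated rectangle no longer has sides parallel to the image axes.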

In [None]:
class RandomRotate(object):

    def __init__(self, angle = 10):
        
        self.angle = angle
        self.angle = (-self.angle, self.angle)
        
        
    def __call__(self, img, bboxes):

        # Convert bboxes
        bboxes = np.array(bboxes)
        
        # Compute the random angle
        angle = random.uniform(*self.angle)

        # width, height and center of the image
        w,h = img.shape[1], img.shape[0]
        cx, cy = w//2, h//2

        # Rotate the image
        img = rotate_im(img, angle)

        # --- Rotate the bounding boxes ---
        # Get the 4 point corner coordinates
        corners = get_corners(bboxes)
        corners = np.hstack((corners, bboxes[:,4:]))
        # Rotate the bounding box
        corners[:,:8] = rotate_box(corners[:,:8], angle, cx, cy, h, w)
        # Get the enclosing (new bboxes)
        new_bbox = get_enclosing_box(corners)

        # Get scaling factors to clip the image and bboxes
        scale_factor_x = img.shape[1] / w
        scale_factor_y = img.shape[0] / h

        # Rescale the image - to w,h and not nW,nH
        img = cv2.resize(img, (w,h))

        # Rescale the rotated boxes (new_bbox, not the original bboxes) to the resized image,
        # then clip them (in case there are any outside of the rotated image)
        new_bbox[:,:4] = new_bbox[:,:4] / [scale_factor_x, scale_factor_y, scale_factor_x, scale_factor_y]
        bboxes = clip_box(new_bbox, [0,0,w, h], 0.25)

        return img, bboxes.tolist()

🐡 Let's see an example of the original image and then the **rotated** one.

In [None]:
random.seed(25)

# Rotate
rotate = RandomRotate(angle=25) 
img_rotated, annot_rotated = rotate(img_original, annot_original)



# Show the Before and After
fig, axs = plt.subplots(1, 2, figsize=(23, 10))
axs = axs.flatten()
fig.suptitle(f"(Random) Image Rotation", fontsize = 20)

axs[0].set_title("Original Image", fontsize = 20)
show_image_bbox(img_original, annot_original, axs=axs[0])

axs[1].set_title("Rotated Image", fontsize = 20)
show_image_bbox(img_rotated, annot_rotated, axs[1])

plt.tight_layout()
plt.show()

## 3.5 (Random) Shearing

Finally, shearing is when the image is slanted, as if it were dragged by one corner in one direction and by the opposite corner in the other, so the image ends up looking like a parallelogram.
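
For the bounding boxes, a horizontal shear moves every point by an amount proportional to its `y` coordinate: `x' = x + shear_factor * y`. Below is a minimal sketch of that arithmetic for boxes in the `[x1, y1, x2, y2]` format. The function name `shear_boxes` is hypothetical, and the sketch only covers non-negative shear factors and assumes the widened image is resized back to its original width afterwards.

```python
import numpy as np

def shear_boxes(boxes, shear_factor, img_w, img_h):
    """Horizontally shear [x1, y1, x2, y2] boxes (shear_factor >= 0) and
    rescale the x coordinates back to the original image width."""
    if shear_factor < 0:
        raise ValueError("this sketch only handles non-negative shear factors")

    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4).copy()

    # x' = x + shear_factor * y, applied to the top-left and bottom-right corners
    boxes[:, 0] += shear_factor * boxes[:, 1]
    boxes[:, 2] += shear_factor * boxes[:, 3]

    # The sheared image is wider; squeeze the x axis back to img_w
    new_w = img_w + shear_factor * img_h
    boxes[:, [0, 2]] *= img_w / new_w
    return boxes


# Hypothetical box on a 100x50 image with a shear factor of 0.5
print(shear_boxes([[10, 20, 30, 40]], shear_factor=0.5, img_w=100, img_h=50))
```

With a positive factor the top-left corner moves the least and the bottom-right corner moves the most, which is why using only these two corners still gives a box that encloses the sheared object.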

*🐡 Note: Most of my inspiration and research is from here: https://blog.paperspace.com/data-augmentation-for-object-detection-rotation-and-shearing/*

In [None]:
class RandomShear(object):

    def __init__(self, shear_factor = 0.2):
        
        self.shear_factor = shear_factor
        self.shear_factor = (-self.shear_factor, self.shear_factor)
        
        
        
    def __call__(self, img, bboxes):
        
        # Convert bboxes
        bboxes = np.array(bboxes)

        # Get the shear factor and size of the image
        shear_factor = random.uniform(*self.shear_factor)
        w,h = img.shape[1], img.shape[0]

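        # Flip the image and boxes horizontally
        # (HorizontalFlip may return the boxes as a plain list, so convert them back to an array)
        if shear_factor < 0:
            img, bboxes = HorizontalFlip()(img, bboxes)
            bboxes = np.array(bboxes)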

        # Apply the shear transformation
        M = np.array([[1, abs(shear_factor), 0],[0,1,0]])
        nW =  img.shape[1] + abs(shear_factor*img.shape[0])

        bboxes[:,[0,2]] += ((bboxes[:,[1,3]]) * abs(shear_factor) ).astype(int) 

        # Transform using cv2 warpAffine (like in rotation)
        img = cv2.warpAffine(img, M, (int(nW), img.shape[0]))

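        # Flip the image back again
        # (convert the boxes back to an array in case HorizontalFlip returned a list)
        if shear_factor < 0:
            img, bboxes = HorizontalFlip()(img, bboxes)
            bboxes = np.array(bboxes)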

        # Resize
        img = cv2.resize(img, (w,h))

        scale_factor_x = nW / w
        bboxes[:,:4] = bboxes[:,:4] / [scale_factor_x, 1, scale_factor_x, 1] 
        
        return img, bboxes.tolist()

🐡 Let's see an example of the original image and then the **sheared** one.

In [None]:
random.seed(25)

# Shear
shear = RandomShear(shear_factor=0.9) 
img_sheared, annot_sheared = shear(img_original, annot_original)



# Show the Before and After
fig, axs = plt.subplots(1, 2, figsize=(23, 10))
axs = axs.flatten()
fig.suptitle(f"(Random) Image Shear", fontsize = 20)

axs[0].set_title("Original Image", fontsize = 20)
show_image_bbox(img_original, annot_original, axs=axs[0])

axs[1].set_title("Sheared Image", fontsize = 20)
show_image_bbox(img_sheared, annot_sheared, axs[1])

plt.tight_layout()
plt.show()

### 🐝 Log Augmented Images to W&B

Let's now log a sample of each augmentation to the W&B Dashboard.

In [None]:
# === 🐝W&B Log (redone for formatted annotations) ===
def wandb_bboxes(image, annotations):
    '''Source: https://www.kaggle.com/ayuraj/visualize-bounding-boxes-interactively
    image: the cv2.imread() output
    annotations: the FORMATTED annotations from the train dataset'''
    
    all_annotations = []
    if annotations:
        for annot in annotations:
            data = {"position": {
                            "minX": annot[0],
                            "minY": annot[1],
                            "maxX": annot[2],
                            "maxY": annot[3]
                        },
                    "class_id" : 1,
                    "domain" : "pixel"}
            all_annotations.append(data)
    
    return wandb.Image(image, 
                       boxes={"ground_truth": {"box_data": all_annotations}}
                      )

# Log all augmented images to the Dashboard
wandb.log({"flipped": wandb_bboxes(img_flipped, annot_flipped)})
wandb.log({"scaled": wandb_bboxes(img_scaled, annot_scaled)})
wandb.log({"translated": wandb_bboxes(img_translated, annot_translated)})
wandb.log({"rotated": wandb_bboxes(img_rotated, annot_rotated)})
wandb.log({"sheared": wandb_bboxes(img_sheared, annot_sheared)})

In [None]:
wandb.finish()

# 4. Final Changes to Train datasets

This is the part where we create the last *helper features* for our dataset.

## What is the COCO format?

🐡 As we have seen, an Object Detection model locates an object within an image using a **bounding box**. However, a bounding box can be described in multiple ways, as there is no "wrong" way to locate a rectangle within an image:

* `[x, y, width, height]` - this is the case in our training dataset (also called the COCO format).
* `[x1, y1, x2, y2]` - the *formatted* version we have created during the BBox Augmentation phase, also called `[xmin, ymin, xmax, ymax]`. This format is used within the [SSD/ RCNN/ Fast RCNN/ Faster RCNN models](https://lohithmunakala.medium.com/bounding-box-formats-for-models-like-yolo-ssd-rcnn-fast-rcnn-faster-rcnn-807be7721527).
* `[x_center, y_center, width, height]` - this is the YOLO format, or rather the format used when training a YOLO model. `x_center`, `y_center` are the normalized coordinates of the center of the bounding box and `width`, `height` are the normalized width and height of the bounding box.

🐡 **COCO** comes from Common Objects in Context, which is a database that aims to support and improve models for Object Detection, Instance Segmentation and Image Captioning.
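
Converting between the three formats is simple arithmetic. Here is a minimal sketch, with hypothetical helper names (`coco_to_xyxy`, `xyxy_to_coco`, `coco_to_yolo`) and with the default image size assumed to be the 1280x720 frames of this dataset:

```python
def coco_to_xyxy(box):
    """[x, y, width, height] -> [x1, y1, x2, y2]"""
    x, y, w, h = box
    return [x, y, x + w, y + h]

def xyxy_to_coco(box):
    """[x1, y1, x2, y2] -> [x, y, width, height]"""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]

def coco_to_yolo(box, img_w=1280, img_h=720):
    """[x, y, width, height] in pixels -> normalized
    [x_center, y_center, width, height], each between 0 and 1."""
    x, y, w, h = box
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]


# Hypothetical COCO box, in pixels
coco_box = [559, 213, 50, 32]
print(coco_to_xyxy(coco_box))
print(coco_to_yolo(coco_box))
```

The YOLO values depend on the image size, so the same pixel box gives different numbers on frames with a different resolution.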

In [None]:
# Create separate paths for images and their labels (annotations)
# these will come in handy later for the YOLO model
train_df["path_images"] = "/kaggle/images/video_" + train_df["video_id"].astype(str) + "_" + \
                                                train_df["video_frame"].astype(str) + ".jpg"
train_df["path_labels"] = "/kaggle/labels/video_" + train_df["video_id"].astype(str) + "_" + \
                                                train_df["video_frame"].astype(str) + ".txt"

# Save the width and height of the images
# it is the same for the entire dataset
train_df["width"] = 1280
train_df["height"] = 720

# Simplify the annotation format
train_df["coco_bbox"] = train_df["annotations"].apply(lambda annot: [list(item.values()) for item in eval(annot)])

# Data Sample
train_df.sample(5, random_state=24)
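
As a side note, here is a small sketch of an alternative way to parse one `annotations` string and build the two paths for a single frame. It uses `ast.literal_eval`, which only accepts Python literals, and reads the keys by name so the result does not depend on the order of the dictionary values. The helper names `parse_annotations` and `frame_paths` and the sample string are hypothetical; the sketch assumes each annotation is a dict with the keys `x`, `y`, `width` and `height`.

```python
import ast

def parse_annotations(annot):
    """Turn a stringified list of dicts into [[x, y, width, height], ...]."""
    return [[d["x"], d["y"], d["width"], d["height"]]
            for d in ast.literal_eval(annot)]

def frame_paths(video_id, video_frame, root="/kaggle"):
    """Build the image and label paths for one frame."""
    stem = f"video_{video_id}_{video_frame}"
    return f"{root}/images/{stem}.jpg", f"{root}/labels/{stem}.txt"


# Hypothetical annotation string for one frame
sample = "[{'x': 559, 'y': 213, 'width': 50, 'height': 32}]"
print(parse_annotations(sample))
print(frame_paths(video_id=0, video_frame=16))
```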

In [None]:
# Save dataset
train_df.to_csv("train.csv", index=False)


# 🐝 Save dataset Artifact
save_dataset_artifact(run_name="save-train-data",
                      artifact_name="train_meta",
                      path="../input/2021-greatbarrierreef-prep-data/train.csv")

<center><img src="https://i.imgur.com/0cx4xXI.png"></center>

### 🐝 W&B Dashboard

> My W&B Dashboard is [here](https://wandb.ai/andrada/GreatBarrierReef/workspace?workspace=user-andrada).

<center><video src="https://i.imgur.com/qMGR4Xe.mp4" width=800 controls></video></center>

<center><img src="https://i.imgur.com/knxTRkO.png"></center>

### My Specs

* 🖥 Z8 G4 Workstation
* 💾 2 CPUs & 96GB Memory
* 🎮 NVIDIA Quadro RTX 8000
* 💻 Zbook Studio G7 on the go