This notebook contains data and code for the Kaggle competition [TensorFlow - Help Protect the Great Barrier Reef](https://www.kaggle.com/c/tensorflow-great-barrier-reef/overview) and follows the [Training Custom Object Detector](https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html) guide from the TensorFlow Object Detection API Tutorial.

Additional code references:  
https://www.kaggle.com/khanhlvg/cots-detection-w-tensorflow-object-detection-api#Prepare-the-training-dataset  
https://www.kaggle.com/andradaolteanu/greatbarrierreef-full-guide-to-bboxaugmentation

### Install Object Detection API in Kaggle notebook
- To install and import the Object Detection API successfully, I ran the commands below:  

```python
!pip uninstall tf-models-official --yes
!pip install tensorflow==2.8.0
!pip install tf-models-official==2.8.0

!git clone --depth 1 https://github.com/tensorflow/models

### next cell (the %%bash magic must be the first line of its cell)
%%bash
cd models/research/
protoc object_detection/protos/*.proto --python_out=.

### next cell
%%bash
cd models/research/
cp object_detection/packages/tf2/setup.py .
python -m pip install --use-feature=2020-resolver .

### The API is now installed;
### you can run model_builder_tf2_test.py here to check the installation

### httplib2 was also needed for the import to succeed
!pip3 install httplib2==0.20.2
```
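
As a lighter check than the full test script, the sketch below only asks Python whether the packages can be located. The helper name `check_importable` is made up for this illustration and is not part of the Object Detection API.

```python
import importlib.util

def check_importable(module_names):
    """Return {name: True/False} for whether each module can be located."""
    status = {}
    for name in module_names:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package is missing or the name is malformed.
            status[name] = False
    return status

# Top-level packages that the rest of this notebook imports.
print(check_importable(["tensorflow", "object_detection", "httplib2"]))
```

Any `False` entry suggests that the corresponding install step above did not complete in the current kernel.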

## Import dependencies

In [None]:
import numpy as np
import pandas as pd
import time
import os
import tensorflow as tf
import matplotlib.pyplot as plt
import matplotlib.patches as patches
%matplotlib inline
import io
from PIL import Image
tf.get_logger().setLevel('ERROR')

## 1. Load data

In [None]:
train_df = pd.read_csv("../input/tensorflow-great-barrier-reef/train.csv")
print(train_df.head())
print(train_df.info())
print("object dtype check for `image_id` and `annotations`")
print(type(train_df.image_id[0]), type(train_df.annotations[0]))

In [None]:
# print random sample
print(train_df.sample(5)) # .iloc[:, [0, 2, 4]]

- When `train.csv` is loaded with the default settings, `image_id` and `annotations` are read as `str` objects. `image_id` is only a `video_id-video_frame` encoding of two columns that already exist, while `annotations` is the string form of a list of dict annotations. Reload the data with `ast.literal_eval` as a converter and drop the redundant `image_id` column.
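
A minimal sketch of what the converter does to a single cell. The annotation values here are made up for illustration, not taken from the dataset.

```python
from ast import literal_eval

# Hypothetical cell value, as pandas reads it from the CSV (a plain string).
raw = "[{'x': 559, 'y': 213, 'width': 50, 'height': 32}]"

boxes = literal_eval(raw)  # now a list of dicts
print(type(raw).__name__, "->", type(boxes).__name__)
print(boxes[0]["width"])

# A frame without starfish is stored as "[]", which parses to an empty list.
print(literal_eval("[]"))
```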

In [None]:
from ast import literal_eval
train_df = pd.read_csv("../input/tensorflow-great-barrier-reef/train.csv",
                    converters={"annotations": literal_eval})
train_df.drop(['image_id'], axis=1, inplace=True)
print(train_df.info())

In [None]:
# Grouped by `video_id` and `sequence`
print(train_df.groupby(['video_id', 'sequence']).size())

### Visualize image sequence with annotation bboxes

In [None]:
def show_image(path, annot, axs=None):
    '''Shows an image and marks any starfish annotated within the frame.
    path: full path to the .jpg image
    annot: list of the annotation for the coordinates of starfish'''
    
    # This is in case we plot only 1 image
    if axs is None:
        fig, axs = plt.subplots(figsize=(23, 8))
    img = plt.imread(path)
    axs.imshow(img)
    if annot:
        for a in annot:
            rect = patches.Rectangle((a["x"], a["y"]), a["width"], a["height"], 
                                     linewidth=3, edgecolor="#FF6103", facecolor='none')
            axs.add_patch(rect)
    axs.axis("off")

def show_multiple_images(seq_id, frame_no): # 6 images
    '''Shows multiple images within a sequence.
    seq_id: a number corresponding with the sequence unique ID
    frame_no: a number of first frame to plot'''
    
    # Select image paths & their annotations
    sample = train_df[(train_df["sequence"]==seq_id) &
                      (train_df["sequence_frame"]>=frame_no) &
                      (train_df["sequence_frame"]<=frame_no+5)]
    paths = []
    annotations = []
    IMAGE_DIR = "../input/tensorflow-great-barrier-reef/train_images"
    for vid, _, vframe, _, annot in sample.values:
        paths.append(os.path.join(IMAGE_DIR, "video_"+str(vid), str(vframe)+'.jpg'))
        annotations.append(annot)    
    # Plot
    fig, axs = plt.subplots(2, 3, figsize=(23, 10))
    axs = axs.flatten()
    fig.suptitle(f"Showing consecutive frames for Sequence ID: {seq_id}", fontsize = 20)

    for k, (path, annot) in enumerate(zip(paths, annotations)):
        axs[k].set_title(f"Frame No: {frame_no+k}", fontsize = 12)
        show_image(path, annot, axs[k])

    plt.tight_layout()
    plt.show()

In [None]:
seq_id, frame_no = 60754, 698
show_multiple_images(seq_id, frame_no)

## 2. Prepare dataset for TensorFlow Object Detection API

- Generate label_map.pbtxt

In [None]:
# Create a label map to map between label index and human-readable label name.
label_map_str = """item {
  id: 1
  name: 'starfish'
}"""
with open('label_map.pbtxt', 'w') as f: # created on current working dir
    f.write(label_map_str)

- From dataset `train_df`, extract annotated observations and generate tfrecord file
    - Save 20% of dataset for evaluation

In [None]:
from sklearn.model_selection import train_test_split

# Keep only frames that have at least one annotation
annot_df = train_df[train_df.annotations.map(len) > 0]
train_annot_df, test_annot_df = train_test_split(annot_df, test_size=0.2, random_state=1243)
print("Total annotated images:{} \nNumber of Train images:{} \nNumber of Test images:{}".format(len(annot_df), len(train_annot_df), len(test_annot_df)))

In [None]:
# created on current working dir
train_annot_df.to_csv('train_annot_df.csv', index=None)
test_annot_df.to_csv('test_annot_df.csv', index=None)

- Convert the training and validation datasets into the TFRecord format required by the TensorFlow Object Detection API. See the [installation guide](https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/install.html) for setting up the API.

- We will use the pretrained model EfficientDet-D0:  
    Bounding boxes are given as {xmin, ymin, width, height} in pixel coordinates. Convert them to {xmin, ymin, xmax, ymax} and divide each value by the image size along its axis, so that all values lie in the range $[0, 1]$.
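
The conversion can be sketched on its own as follows. The helper name `normalize_box` and the sample box are illustrative assumptions; the 1280x720 frame size is the one noted in the code comment in `generate_tfrecord`.

```python
def normalize_box(box, img_width, img_height):
    """Convert {x, y, width, height} in pixels to (xmin, ymin, xmax, ymax) in [0, 1]."""
    xmin = box["x"] / img_width
    ymin = box["y"] / img_height
    xmax = (box["x"] + box["width"]) / img_width
    ymax = (box["y"] + box["height"]) / img_height
    return xmin, ymin, xmax, ymax

# Hypothetical annotation on a 1280x720 frame.
box = {"x": 320, "y": 180, "width": 64, "height": 36}
print(normalize_box(box, 1280, 720))
```

For this sample box the result is `(0.25, 0.25, 0.3, 0.3)`: x values are divided by the width and y values by the height.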

In [None]:
from object_detection.utils import dataset_util
def class_text_to_int(cname):
    '''Maps a class name to its id in label_map.pbtxt (None if unknown).'''
    if cname == 'starfish':
        return 1
    return None
def generate_tfrecord(dataframe, output_path):
    writer = tf.io.TFRecordWriter(output_path)
    for v in dataframe.values:
        vid, _, fpath, _, annot = v
        IMAGE_DIR = "../input/tensorflow-great-barrier-reef/train_images"
        fpath = os.path.join(IMAGE_DIR, 'video_'+str(vid), str(fpath)+'.jpg')
        with tf.io.gfile.GFile(fpath, 'rb') as fid:
            encoded_jpg = fid.read()
        encoded_jpg_io = io.BytesIO(encoded_jpg)
        image = Image.open(encoded_jpg_io)
        width, height = image.size # (1280, 720)
        filename = fpath.split(os.sep)[-1]
        filename = filename.encode('utf8')
        cname = 'starfish'
        image_format = b'jpg'
        xmins = []
        xmaxs = []
        ymins = []
        ymaxs = []
        classes_text = []
        classes = []
        for box in annot:
            # Read by key so the result does not depend on dict ordering
            xmin, ymin = box["x"], box["y"]
            w, h = box["width"], box["height"]
            xmax, ymax = xmin + w, ymin + h
            xmins.append(xmin / width)
            xmaxs.append(xmax / width)
            ymins.append(ymin / height)
            ymaxs.append(ymax / height)
            classes_text.append(cname.encode('utf8'))
            classes.append(class_text_to_int(cname))
        tf_example = tf.train.Example(features=tf.train.Features(feature={
            'image/height': dataset_util.int64_feature(height),
            'image/width': dataset_util.int64_feature(width),
            'image/filename': dataset_util.bytes_feature(filename),
            'image/source_id': dataset_util.bytes_feature(filename),
            'image/encoded': dataset_util.bytes_feature(encoded_jpg),
            'image/format': dataset_util.bytes_feature(image_format),
            'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
            'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
            'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
            'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
            'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
            'image/object/class/label': dataset_util.int64_list_feature(classes),
        }))
        writer.write(tf_example.SerializeToString())
    writer.close()

In [None]:
# created on current working dir
# generate_tfrecord(train_annot_df, "train.record")
# generate_tfrecord(test_annot_df, "test.record")

## 3. Train an object detection model

### Local environment settings:  
- CUDA on WSL: Windows 11 > Docker Desktop > tensorflow-gpu  
- GPU: RTX 3070  

###  Model: EfficientDet-D0  
- Download pretrained model
```python
!wget http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d0_coco17_tpu-32.tar.gz
!tar -xvzf efficientdet_d0_coco17_tpu-32.tar.gz
```
- pipeline.config
    - num_classes: 1
    - image_resizer: 768x768 # GPU memory limit
    - batch_size: 2
    - num_steps: 20000
    - use_bfloat16: false # TPU disabled
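
One way to apply these overrides is a plain text substitution, sketched below. The fragment is abbreviated and illustrative: the starting values, the `keep_aspect_ratio_resizer` block and the helper `set_field` are assumptions, and a real `pipeline.config` has many more fields (some names, such as `batch_size`, can occur more than once).

```python
import re

# Abbreviated, illustrative fragment; not the full file shipped with the model.
config_text = """
model {
  ssd {
    num_classes: 90
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 512
        max_dimension: 512
      }
    }
  }
}
train_config {
  batch_size: 128
  num_steps: 300000
  use_bfloat16: true
}
"""

def set_field(text, name, value):
    """Replace every `name: <old>` occurrence with `name: value`."""
    pattern = rf"(\b{re.escape(name)}:\s*)\S+"
    return re.sub(pattern, lambda m: m.group(1) + str(value), text)

# Settings listed above for this notebook's run.
overrides = {
    "num_classes": 1,
    "min_dimension": 768,
    "max_dimension": 768,
    "batch_size": 2,
    "num_steps": 20000,
    "use_bfloat16": "false",
}
for name, value in overrides.items():
    config_text = set_field(config_text, name, value)
print(config_text)
```

Because `set_field` rewrites every occurrence of a field name, check the result by eye before training with a full config file.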

### Command-line commands
```python
### train: took less than 2 hours on my local environment
!python model_main_tf2.py --model_dir=models/my_efficientDet \
    --pipeline_config_path=models/my_efficientDet/pipeline.config

### evaluation
!python model_main_tf2.py --model_dir=models/my_efficientDet \
    --pipeline_config_path=models/my_efficientDet/pipeline.config \
    --checkpoint_dir=models/my_efficientDet

### export
!python exporter_main_v2.py --input_type image_tensor \
    --pipeline_config_path models/my_efficientDet/pipeline.config \
    --trained_checkpoint_dir models/my_efficientDet \
    --output_directory exported-models/my_efficientDet
```

### Evaluation result
```
Accumulating evaluation results...
DONE (t=0.52s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.365
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.712
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.314
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.097
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.427
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.602
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.220
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.433
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.492
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.262
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.548
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.665
INFO:tensorflow:Eval metrics at step 20000
I0214 20:29:34.667814 139819700154944 model_lib_v2.py:1015] Eval metrics at step 20000
INFO:tensorflow:        + DetectionBoxes_Precision/mAP: 0.364686
I0214 20:29:34.711701 139819700154944 model_lib_v2.py:1018]     + DetectionBoxes_Precision/mAP: 0.364686
INFO:tensorflow:        + DetectionBoxes_Precision/mAP@.50IOU: 0.712469
I0214 20:29:34.713693 139819700154944 model_lib_v2.py:1018]     + DetectionBoxes_Precision/mAP@.50IOU: 0.712469
INFO:tensorflow:        + DetectionBoxes_Precision/mAP@.75IOU: 0.313904
I0214 20:29:34.715082 139819700154944 model_lib_v2.py:1018]     + DetectionBoxes_Precision/mAP@.75IOU: 0.313904
INFO:tensorflow:        + DetectionBoxes_Precision/mAP (small): 0.097257
I0214 20:29:34.716626 139819700154944 model_lib_v2.py:1018]     + DetectionBoxes_Precision/mAP (small): 0.097257
INFO:tensorflow:        + DetectionBoxes_Precision/mAP (medium): 0.426918
I0214 20:29:34.717952 139819700154944 model_lib_v2.py:1018]     + DetectionBoxes_Precision/mAP (medium): 0.426918
INFO:tensorflow:        + DetectionBoxes_Precision/mAP (large): 0.601684
I0214 20:29:34.719259 139819700154944 model_lib_v2.py:1018]     + DetectionBoxes_Precision/mAP (large): 0.601684
INFO:tensorflow:        + DetectionBoxes_Recall/AR@1: 0.219556
I0214 20:29:34.720585 139819700154944 model_lib_v2.py:1018]     + DetectionBoxes_Recall/AR@1: 0.219556
INFO:tensorflow:        + DetectionBoxes_Recall/AR@10: 0.433014
I0214 20:29:34.721925 139819700154944 model_lib_v2.py:1018]     + DetectionBoxes_Recall/AR@10: 0.433014
INFO:tensorflow:        + DetectionBoxes_Recall/AR@100: 0.491768
I0214 20:29:34.723235 139819700154944 model_lib_v2.py:1018]     + DetectionBoxes_Recall/AR@100: 0.491768
INFO:tensorflow:        + DetectionBoxes_Recall/AR@100 (small): 0.262473
I0214 20:29:34.724538 139819700154944 model_lib_v2.py:1018]     + DetectionBoxes_Recall/AR@100 (small): 0.262473
INFO:tensorflow:        + DetectionBoxes_Recall/AR@100 (medium): 0.548051
I0214 20:29:34.725793 139819700154944 model_lib_v2.py:1018]     + DetectionBoxes_Recall/AR@100 (medium): 0.548051
INFO:tensorflow:        + DetectionBoxes_Recall/AR@100 (large): 0.664706
I0214 20:29:34.727559 139819700154944 model_lib_v2.py:1018]     + DetectionBoxes_Recall/AR@100 (large): 0.664706
INFO:tensorflow:        + Loss/localization_loss: 0.163040
I0214 20:29:34.728614 139819700154944 model_lib_v2.py:1018]     + Loss/localization_loss: 0.163040
INFO:tensorflow:        + Loss/classification_loss: 0.422540
I0214 20:29:34.729672 139819700154944 model_lib_v2.py:1018]     + Loss/classification_loss: 0.422540
INFO:tensorflow:        + Loss/regularization_loss: 0.032527
I0214 20:29:34.730714 139819700154944 model_lib_v2.py:1018]     + Loss/regularization_loss: 0.032527
INFO:tensorflow:        + Loss/total_loss: 0.618107
I0214 20:29:34.731794 139819700154944 model_lib_v2.py:1018]     + Loss/total_loss: 0.618107
```
### F2 score (evaluation metric for the competition)  
    DetectionBoxes_Precision/mAP : 0.364686  
    DetectionBoxes_Recall/AR@100 : 0.491768  
    Expected F2 score for IoU=0.50:0.95 : 0.459727  
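
The expected value can be reproduced with the standard F-beta formula, $F_\beta = (1+\beta^2)\,PR / (\beta^2 P + R)$ with $\beta = 2$. Plugging in mAP for precision and AR@100 for recall is only a rough estimate, not the competition's official scoring; the function name `f_beta` is chosen for this sketch.

```python
def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0

map_all = 0.364686  # DetectionBoxes_Precision/mAP
ar_100 = 0.491768   # DetectionBoxes_Recall/AR@100
print(round(f_beta(map_all, ar_100), 4))
```

This prints 0.4597, in line with the figure quoted above.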

## 4. Run inference and visualization
- The fine-tuned model is uploaded to the directory `/eff-768-v1`.  
    Test on the held-out dataset: test_annot_df.csv

In [None]:
#PATH_TO_MODEL_DIR = os.path.join(os.getcwd(),'exported-models', 'my_efficientDet') # in local env
PATH_TO_MODEL_DIR = "../input/eff-768-v1"

In [None]:
PATH_TO_SAVED_MODEL = PATH_TO_MODEL_DIR # + 'saved_model' in local env

print('Loading model...', end='')
start_time = time.time()

# Load saved model and build the detection function
detect_fn = tf.saved_model.load(PATH_TO_SAVED_MODEL)

end_time = time.time()
elapsed_time = end_time - start_time
print('Done! Took {} seconds'.format(elapsed_time))

In [None]:
IMAGE_DIR = "../input/tensorflow-great-barrier-reef/train_images"

data = pd.read_csv("test_annot_df.csv",
                    converters={"annotations": literal_eval})

In [None]:
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils
PATH_TO_LABELS = 'label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS,
                                                                    use_display_name=True)
def load_image_into_numpy_array(path):
    return np.array(Image.open(path))

In [None]:
# %matplotlib inline
test_data = data.sample(5)
for vid, _, vframe, _, annot in test_data.values:
    if not annot:
        continue
    image_path = os.path.join(IMAGE_DIR, 'video_' + str(vid), str(vframe) + '.jpg')
    print('Running inference for {}... '.format(image_path), end='')
    image_np = load_image_into_numpy_array(image_path)
    input_tensor = tf.convert_to_tensor(np.expand_dims(image_np, 0), dtype=tf.uint8)
    detections = detect_fn(input_tensor)

    # All outputs are batched tensors.
    # Convert to numpy arrays, and take index [0] to remove the batch dimension.
    # We're only interested in the first num_detections.
    num_detections = int(detections.pop('num_detections'))
    detections = {key: value[0, :num_detections].numpy()
                  for key, value in detections.items()}
    detections['num_detections'] = num_detections

    # detection_classes should be ints.
    detections['detection_classes'] = detections['detection_classes'].astype(np.int64)

    label_id_offset = 0
    image_np_with_detections = image_np.copy()

    viz_utils.visualize_boxes_and_labels_on_image_array(
            image_np_with_detections,
            detections['detection_boxes'],
            detections['detection_classes']+label_id_offset,
            detections['detection_scores'],
            category_index,
            use_normalized_coordinates=True,
            max_boxes_to_draw=20,
            min_score_thresh=0.1,
            agnostic_mode=False,
            line_thickness=3
            )
    height, width = image_np.shape[:2]
    if annot:
        boxes = []
        for box in annot:
            xmin, ymin, w, h = box["x"], box["y"], box["width"], box["height"]
            boxes.append([ymin/height, xmin/width, (ymin+h)/height, (xmin+w)/width])
        image_np_with_detections = tf.image.draw_bounding_boxes(
                image_np_with_detections.reshape([1, *image_np_with_detections.shape]), np.array(boxes)[tf.newaxis,...], [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
        image_np_with_detections = np.array(image_np_with_detections, dtype=np.uint8)
        image_np_with_detections = image_np_with_detections[0,:]
        plt.figure(figsize=(16,9))
        plt.imshow(image_np_with_detections)
        print('Done')
plt.show()

- Note that:  
    Black boxes are the ground-truth bounding boxes  
    Green boxes are predicted bounding boxes with a score threshold of 0.1  
    The model works fairly well on large objects but struggles with small ones (AP 0.602 for large versus 0.097 for small objects in the evaluation above)  
- Ideas that may improve performance:  
    - Higher input resolution than 768x768 (may need a GPU with more memory)
    - Data augmentation
    - A better base model architecture
    - Longer training
    - etc.