# Intro
Welcome to the [TensorFlow - Help Protect the Great Barrier Reef](https://www.kaggle.com/c/tensorflow-great-barrier-reef) competition.

![](https://storage.googleapis.com/kaggle-competitions/kaggle/31703/logos/header.png)

<span style="color: royalblue;">Please upvote the notebook if it helps you. Feel free to leave a comment. Thank you. </span>

# Libraries

In [None]:
import os
import cv2
import ast
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

# Path

In [None]:
path = '/kaggle/input/tensorflow-great-barrier-reef/'
os.listdir(path)

# Load Data

In [None]:
train_data = pd.read_csv(path+'train.csv')
test_data = pd.read_csv(path+'test.csv')
samp_subm = pd.read_csv(path+'example_sample_submission.csv')

# Overview

In [None]:
print('Number train samples:', len(train_data))
print('Number test samples:', len(test_data))

* video_id - ID number of the video the image was part of. The video ids are not meaningfully ordered.
* video_frame - The frame number of the image within the video. Expect to see occasional gaps in the frame number from when the diver surfaced.
* sequence - ID of a gap-free subset of a given video. The sequence ids are not meaningfully ordered.
* sequence_frame - The frame number within a given sequence.
* image_id - ID code for the image, in the format '{video_id}-{video_frame}'
* annotations - The bounding boxes of any starfish detections in a string format that can be evaluated directly with Python. Does not use the same format as the predictions you will submit. Not available in test.csv. A bounding box is described by the pixel coordinate (x_min, y_min) of its upper left corner within the image (image coordinates start at the top left, with y increasing downwards) together with its width and height in pixels.
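
As a minimal sketch of the annotations format described above, the string can be parsed with `ast.literal_eval` and each box converted to corner coordinates. The helper name `parse_annotations` and the sample string are hypothetical; only the keys `x`, `y`, `width` and `height` come from the data.

```python
import ast


def parse_annotations(raw):
    """Parse an annotations string into (x_min, y_min, x_max, y_max) tuples."""
    boxes = ast.literal_eval(raw)
    return [
        (b["x"], b["y"], b["x"] + b["width"], b["y"] + b["height"])
        for b in boxes
    ]


# Hypothetical annotation string in the same format as train.csv
example = "[{'x': 559, 'y': 213, 'width': 50, 'height': 32}]"
corners = parse_annotations(example)
print(corners)
```

Rows without starfish hold the string `'[]'`, for which the helper returns an empty list.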

In [None]:
train_data.head()

# EDA
The video_id column takes 3 distinct values. Each value identifies the folder (video_{video_id}) that holds the frames of that video.

In [None]:
train_data['video_id'].value_counts()

There are 20 different sequences:

In [None]:
train_data['sequence'].value_counts()

# Load Image Files
**train_images/** - Folder containing training set photos of the form video_{video_id}/{video_frame_number}.jpg.
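
The folder layout above can be turned into a small path helper. This is only a sketch: the function names `image_path` and `split_image_id` are hypothetical, and the root folder passed in is whatever `path` points to in this notebook.

```python
import os


def image_path(root, video_id, video_frame):
    """Build the path train_images/video_{video_id}/{video_frame}.jpg under root."""
    return os.path.join(
        root, "train_images", f"video_{video_id}", f"{video_frame}.jpg"
    )


def split_image_id(image_id):
    """Split an image_id of the form '{video_id}-{video_frame}' into two ints."""
    video_id, video_frame = image_id.split("-")
    return int(video_id), int(video_frame)


video_id, video_frame = split_image_id("2-7981")
print(image_path("/kaggle/input/tensorflow-great-barrier-reef", video_id, video_frame))
```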

We consider the image with the video_frame id 7981:

In [None]:
video_frame = 7981
file_name = str(video_frame)+'.jpg'
train_data[train_data['video_frame']==video_frame]

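As the output shows, two rows share this video_frame value: one image lies in folder video_0 and the other in folder video_2. The video_frame alone therefore does not identify an image; the video_id is needed as well.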

In [None]:
print('file 7981.jpg in folder video_0:', file_name in os.listdir(path+'train_images/video_0'))
print('file 7981.jpg in folder video_2:', file_name in os.listdir(path+'train_images/video_2'))

We load the images:

In [None]:
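# cv2.imread returns BGR; convert to RGB so matplotlib shows the correct colors
image_folder_0 = cv2.cvtColor(cv2.imread(path+'train_images/video_0/'+file_name), cv2.COLOR_BGR2RGB)
image_folder_2 = cv2.cvtColor(cv2.imread(path+'train_images/video_2/'+file_name), cv2.COLOR_BGR2RGB)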

In [None]:
print('shape of image in folder video_0:', image_folder_0.shape)
print('shape of image in folder video_2:', image_folder_2.shape)

**Plot images**

The image in folder video_0 has no annotations.

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(20, 8))
ax.imshow(image_folder_0)
ax.set_xticklabels([])
ax.set_yticklabels([])
plt.show()

The image in the folder video_2 has one annotation:

In [None]:
train_data[(train_data['video_frame']==video_frame)&(train_data['video_id']==2)]['annotations']

In [None]:
row = 20722

We extract the boxes:

In [None]:
boxes = ast.literal_eval(train_data.loc[row, 'annotations'])

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(20, 8))
ax.imshow(image_folder_2)
for box in boxes:
    p = matplotlib.patches.Rectangle((box['x'], box['y']), box['width'], box['height'],
                                     ec='r', fc='none', lw=2.)
    ax.add_patch(p)
ax.set_xticklabels([])
ax.set_yticklabels([])
plt.show()

# Example With Multiple Boxes

In [None]:
row = 5454
file_name = str(train_data.loc[row, 'video_frame'])+'.jpg'
video_folder = 'video_'+str(train_data.loc[row, 'video_id'])
boxes = ast.literal_eval(train_data.loc[row, 'annotations'])

In [None]:
print('video folder:', video_folder)
print('file name:', file_name)

In [None]:
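# cv2.imread returns BGR; convert to RGB so matplotlib shows the correct colors
image = cv2.cvtColor(cv2.imread(path+'train_images/'+video_folder+'/'+file_name), cv2.COLOR_BGR2RGB)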
image.shape

Plot image and annotations:

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(20, 8))
ax.imshow(image)
for box in boxes:
    p = matplotlib.patches.Rectangle((box['x'], box['y']), box['width'], box['height'],
                                     ec='r', fc='none', lw=2.)
    ax.add_patch(p)
ax.set_xticklabels([])
ax.set_yticklabels([])
plt.show()
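
To find further frames with several boxes, one could count the annotations per row. The sketch below works on a few hypothetical annotation strings; the helper name `count_boxes` is an assumption, not part of the competition data or API.

```python
import ast


def count_boxes(raw):
    """Return the number of bounding boxes in an annotations string."""
    return len(ast.literal_eval(raw))


# Hypothetical annotation strings in the same format as train.csv
samples = [
    "[]",
    "[{'x': 10, 'y': 20, 'width': 30, 'height': 40}]",
    "[{'x': 10, 'y': 20, 'width': 30, 'height': 40}, "
    "{'x': 100, 'y': 200, 'width': 35, 'height': 45}]",
]

counts = [count_boxes(s) for s in samples]
# Positions of the samples that hold more than one box
multi = [i for i, c in enumerate(counts) if c > 1]
print(counts, multi)
```

On the full table the same helper could be applied with `train_data['annotations'].apply(count_boxes)`.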

# Use API
To use the API we follow the instructions in [this notebook](https://www.kaggle.com/sohier/great-barrier-reef-api-tutorial/notebook).

In [None]:
import PIL.Image
import greatbarrierreef

env = greatbarrierreef.make_env()
iter_test = env.iter_test()

In [None]:
pixel_array, sample_prediction_df = next(iter_test)
pixel_array

In [None]:
PIL.Image.fromarray(pixel_array)
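
Predictions are handed back to the API as one string per frame. As far as I recall from the linked tutorial, each detection is written as `confidence x_min y_min width height`, with detections separated by spaces; treat this format as an assumption and check it against the competition's evaluation page. The helper `format_prediction` below is hypothetical.

```python
def format_prediction(detections):
    """Join (confidence, x_min, y_min, width, height) tuples into one string.

    Assumed format per detection: 'confidence x_min y_min width height'.
    """
    return " ".join(
        f"{conf:.2f} {x} {y} {w} {h}" for conf, x, y, w, h in detections
    )


# Hypothetical detections for a single frame
prediction = format_prediction([(0.5, 0, 0, 100, 100), (0.87, 12, 34, 56, 78)])
print(prediction)
```

Following the tutorial, this string would be assigned to `sample_prediction_df['annotations']` and submitted with `env.predict(sample_prediction_df)` for each frame yielded by `iter_test`.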

# Export Data

In [None]:
samp_subm.to_csv('submission.csv', index=False)