# **The "Help Protect the Great Barrier Reef" Code Explained by Dino** #

## Intro ##
The Great Barrier Reef is the world's largest coral reef system. However, it is at a critical tipping point that will determine its long-term survival. Coral bleaching caused by global warming is a key reason for the reef's decline. On top of that, the infamous crown-of-thorns starfish are overpopulating. Like sea urchins devouring kelp in California (where I live), the starfish devour the coral day by day. Because of that, scientists are worried about the coral being eaten up, so the Great Barrier Reef Foundation and Google teamed up to create a competition on Kaggle. Today, we are explaining a starter solution for the "Help Protect the Great Barrier Reef" contest.

## Imports and Modules ##
Before looking for starfish in the images, we have to import the necessary libraries: os for file paths, cv2 (OpenCV) for reading images, ast for parsing the annotation strings, NumPy and pandas for the data, and Matplotlib for drawing the images and their boxes.

In [None]:
import os
import cv2
import ast
import numpy as np
import pandas as pd
import matplotlib.patches  # binds the name matplotlib and explicitly loads matplotlib.patches (used for Rectangle)
import matplotlib.pyplot as plt

## Variables setup ##
Well, well, we've made good progress with the imports, so let's now define some variables! First, define the path variable, which points to the "tensorflow-great-barrier-reef" input folder. Calling os.listdir on it lists all six files and folders inside.

In [None]:
path = '/kaggle/input/tensorflow-great-barrier-reef/'
os.listdir(path)
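The cells in this notebook build every path by concatenating strings, which only works because path ends with '/'. Here is a minimal sketch of the same idea with os.path.join, which adds the separator when it is missing; data_file is a hypothetical helper name, not part of the competition code.

```python
import os

# The notebook's base folder; the trailing '/' is what makes path + 'train.csv' work.
path = '/kaggle/input/tensorflow-great-barrier-reef/'


def data_file(name, base=path):
    """Hypothetical helper: join the base folder and a file name with os.path.join."""
    return os.path.join(base, name)


# Plain concatenation silently breaks if the trailing slash is dropped...
print(path.rstrip('/') + 'train.csv')   # .../tensorflow-great-barrier-reeftrain.csv
# ...while os.path.join gives the same path either way.
print(data_file('train.csv'))
print(data_file('train.csv', base=path.rstrip('/')))
```

With this helper, pd.read_csv(data_file('train.csv')) would read the same file as pd.read_csv(path+'train.csv').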

And... we have to set up train_data, test_data, and even samp_subm! How? We use pandas to read each CSV file, concatenating the path variable with the file names train.csv, test.csv, and example_sample_submission.csv.

In [None]:
train_data = pd.read_csv(path+'train.csv')
test_data = pd.read_csv(path+'test.csv')
samp_subm = pd.read_csv(path+'example_sample_submission.csv')

In [None]:
train_data.head()
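Outside Kaggle the real CSV files are not available, so here is a minimal sketch that reads a made-up two-row excerpt from a string. The column names follow train.csv as I understand its layout; every value is invented. What it shows: the annotations column comes back as plain text, not as Python lists, which is why ast.literal_eval appears later.

```python
import io

import pandas as pd

# A made-up excerpt with the same columns as train.csv; the values are invented.
csv_text = """video_id,sequence,video_frame,sequence_frame,image_id,annotations
0,1,0,0,0-0,[]
1,2,621,10,1-621,"[{'x': 540, 'y': 303, 'width': 52, 'height': 40}]"
"""

toy_train = pd.read_csv(io.StringIO(csv_text))
print(toy_train.dtypes)

# The annotations column is stored as a string, even when it looks like a list.
print(type(toy_train.loc[1, 'annotations']))
```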

## Drawing the Box Around a Crown-of-Thorns Starfish ##
Now that all the variables are defined, pick a frame number, video_frame, and build its file name by converting the number to a string and appending '.jpg'. Then filter train_data to show only the rows whose video_frame column equals that frame number.

In [None]:
video_frame = 621
file_name = str(video_frame)+'.jpg'
train_data[train_data['video_frame']==video_frame]
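Under the hood, train_data['video_frame']==video_frame produces a Series of True/False values, one per row, and indexing the DataFrame with it keeps only the True rows. A small sketch on hypothetical rows (not the real data) where frame 621 appears in two videos:

```python
import pandas as pd

# Hypothetical rows: the same frame number can appear in different videos.
toy_train = pd.DataFrame({
    'video_id':    [0, 1, 1],
    'video_frame': [621, 621, 622],
    'annotations': ['[]', "[{'x': 540, 'y': 303, 'width': 52, 'height': 40}]", '[]'],
})

video_frame = 621
file_name = str(video_frame) + '.jpg'            # '621.jpg'

mask = toy_train['video_frame'] == video_frame   # one True/False per row
print(mask.tolist())                             # [True, True, False]
print(toy_train[mask])                           # only the rows where mask is True
```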

The filter returned two rows, because the same frame number can appear in more than one video, and only one of them, from video_id 1, contains annotations about where the starfish prowl. The cell below has OpenCV read the file with that name from each of the three video folders, video_0 through video_2, although only image_folder_1 is plotted later.

In [None]:
image_folder_0 = cv2.imread(path+'train_images/video_0/'+file_name)
image_folder_1 = cv2.imread(path+'train_images/video_1/'+file_name)
image_folder_2 = cv2.imread(path+'train_images/video_2/'+file_name)
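cv2.imread returns None instead of raising an error when a file does not exist, so it can be worth checking which video folders actually contain the frame before reading. A minimal sketch using only the standard library; existing_frame_paths is a hypothetical helper, and the demo builds a throwaway folder that mimics the train_images layout rather than touching the real data.

```python
import os
import tempfile


def existing_frame_paths(base, file_name, n_videos=3):
    """Hypothetical helper: map video_id -> path for every video folder that
    actually contains file_name. cv2.imread would just return None for the
    others, so checking first avoids silently reading nothing."""
    found = {}
    for video_id in range(n_videos):
        candidate = os.path.join(base, 'train_images', f'video_{video_id}', file_name)
        if os.path.exists(candidate):
            found[video_id] = candidate
    return found


# Demo on a throwaway folder that mimics the layout, with 621.jpg only in video_1.
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, 'train_images', 'video_1'))
    open(os.path.join(base, 'train_images', 'video_1', '621.jpg'), 'wb').close()
    found_ids = sorted(existing_frame_paths(base, '621.jpg'))

print(found_ids)   # [1]
```

On Kaggle, the same call would be existing_frame_paths(path, file_name).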

Now we filter train_data again, but this time we combine two conditions with &: the video_frame column must equal video_frame, and the video_id column must equal 1, the video whose row contains the annotations. Selecting the 'annotations' column of the result displays the box coordinates together with the row's index label.

In [None]:
train_data[(train_data['video_frame']==video_frame)&(train_data['video_id']==1)]['annotations']
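The parentheses around each comparison are required: in Python, & binds more tightly than ==, so leaving them out changes the meaning. A small sketch on hypothetical rows (the index labels are chosen to echo the notebook, not taken from train.csv):

```python
import pandas as pd

# Hypothetical rows; index labels chosen to echo the notebook, not real data.
toy_train = pd.DataFrame({
    'video_id':    [0, 1, 1],
    'video_frame': [621, 621, 622],
}, index=[2100, 7329, 7330])

# Each comparison sits in its own parentheses, then & combines them row by row.
both = toy_train[(toy_train['video_frame'] == 621) & (toy_train['video_id'] == 1)]
print(both.index.tolist())   # [7329]

# Without parentheses, Python parses this as the chained comparison
# video_frame == (621 & video_id) == 1, which pandas cannot evaluate.
error_message = None
try:
    toy_train[toy_train['video_frame'] == 621 & toy_train['video_id'] == 1]
except ValueError as err:
    error_message = str(err)
print(error_message)
```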

The output shows that the annotated row has the index label 7329, so we store it in the row variable.

In [None]:
row = 7329
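Hard-coding 7329 works for this one frame, but the index label can also be read straight from the filtered result. A minimal sketch on made-up rows, assuming, as in train.csv, that frames without boxes store the string '[]':

```python
import pandas as pd

# Hypothetical rows; the index labels mimic train.csv row numbers.
toy_train = pd.DataFrame({
    'video_id':    [0, 1],
    'video_frame': [621, 621],
    'annotations': ['[]', "[{'x': 540, 'y': 303, 'width': 52, 'height': 40}]"],
}, index=[2100, 7329])

has_boxes = toy_train['annotations'] != '[]'      # rows with at least one box
match = toy_train[(toy_train['video_frame'] == 621) & has_boxes]

row = match.index[0]          # the index label, which is what .loc expects
print(row)                    # 7329
print(toy_train.loc[row, 'annotations'])
```

Using .loc with an index label (rather than .iloc with a position) matches how the notebook looks up row 7329.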

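The annotations column stores each list of boxes as text that looks like a Python list of dictionaries. So we look up that text with train_data.loc[row, 'annotations'] and turn it into a real list with ast.literal_eval. Each dictionary holds the x, y, width, and height of one box, in pixels.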

In [None]:
boxes = ast.literal_eval(train_data.loc[row, 'annotations'])
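A short sketch of what ast.literal_eval does with an annotations-style string (the values here are made up). json.loads would fail on it because the keys use single quotes, and eval would work but would also run any code hidden in the text; literal_eval only accepts Python literals:

```python
import ast

# An annotations string shaped like the ones in train.csv (values made up).
raw = ("[{'x': 540, 'y': 303, 'width': 52, 'height': 40}, "
       "{'x': 10, 'y': 20, 'width': 30, 'height': 25}]")

boxes = ast.literal_eval(raw)          # now a real list of dicts
print(len(boxes), boxes[0]['width'])   # 2 52

# literal_eval refuses anything that is not a literal, such as a function call.
rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True
print('code rejected:', rejected)      # code rejected: True
```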

## Now let's plot! ##

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(20, 8))
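# cv2.imread returns BGR channels, while Matplotlib expects RGB
ax.imshow(cv2.cvtColor(image_folder_1, cv2.COLOR_BGR2RGB))
for box in boxes:
    # (x, y) is the box's top-left corner in pixels; draw a red outline with no fill
    p = matplotlib.patches.Rectangle((box['x'], box['y']), box['width'], box['height'],
                                     ec='r', fc='none', lw=2.)
    ax.add_patch(p)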
ax.set_xticklabels([])
ax.set_yticklabels([])
plt.savefig("oh_no_one_reef_eater.png")
plt.show()
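Each box is given by its top-left corner (x, y) plus a width and height, and since imshow puts y = 0 at the top of the image, that corner is exactly what Rectangle receives. Many tools expect corner coordinates instead. to_corners below is a hypothetical helper for that conversion, and clipping to a 1280x720 frame is an assumption about the image size, not something the notebook does.

```python
def to_corners(box, img_w=1280, img_h=720):
    """Hypothetical helper: convert an {'x', 'y', 'width', 'height'} box into
    (x_min, y_min, x_max, y_max), clipped to an assumed img_w x img_h frame."""
    x_min = max(0, box['x'])
    y_min = max(0, box['y'])
    x_max = min(img_w, box['x'] + box['width'])
    y_max = min(img_h, box['y'] + box['height'])
    return x_min, y_min, x_max, y_max


print(to_corners({'x': 540, 'y': 303, 'width': 52, 'height': 40}))   # (540, 303, 592, 343)
# A box that runs past the bottom-right edge gets clipped.
print(to_corners({'x': 1250, 'y': 700, 'width': 60, 'height': 40}))  # (1250, 700, 1280, 720)
```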

Alas, we see at least one crown-of-thorns starfish looking for coral to devour. But if there are more crown-of-thorns starfish lurking around the coral reef, how does the machine detect more of those? Well, let's find out ways to tackle the reef-eater problem.

## Multi-Box Object Detection over more "Reef-Eaters" ##
After plotting one starfish, let's look at a frame with more of them! This time, instead of typing the frame number and video by hand, we choose a row index and read everything else from that row of train_data. First, we set row. Then file_name becomes the row's video_frame value converted to a string with '.jpg' appended, and video_folder becomes the string 'video_' followed by the row's video_id. Lastly, boxes is again the row's annotations string parsed with ast.literal_eval.

In [None]:
row = 2845
file_name = str(train_data.loc[row, 'video_frame'])+'.jpg'
video_folder = 'video_'+str(train_data.loc[row, 'video_id'])
boxes = ast.literal_eval(train_data.loc[row, 'annotations'])
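The lines above can be wrapped into one helper so that any row can be turned into an image path. A minimal sketch; frame_path is a hypothetical name, and the demo row uses invented values, not the real contents of row 2845:

```python
import os

import pandas as pd


def frame_path(df, row, base='/kaggle/input/tensorflow-great-barrier-reef/'):
    """Hypothetical helper: build the image path for one row of train.csv."""
    video_folder = f"video_{df.loc[row, 'video_id']}"
    file_name = f"{df.loc[row, 'video_frame']}.jpg"
    return os.path.join(base, 'train_images', video_folder, file_name)


# Made-up row whose index label mimics the notebook's row 2845.
toy_train = pd.DataFrame({'video_id': [0], 'video_frame': [4321]}, index=[2845])
print(frame_path(toy_train, 2845))
# /kaggle/input/tensorflow-great-barrier-reef/train_images/video_0/4321.jpg
```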

In [None]:
print('video folder:', video_folder)
print('file name:', file_name)

Now we build the full image path by concatenation and load the image with cv2.imread. Checking image.shape returns a tuple of three numbers: the image's height, width, and number of color channels (three, stored in OpenCV's BGR order).

In [None]:
image = cv2.imread(path+'train_images/'+video_folder+'/'+file_name)
image.shape
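Since the real image is only available on Kaggle, here is a sketch with a NumPy stand-in of the assumed 720x1280 frame size. It shows how the shape unpacks and how reversing the last axis swaps OpenCV's BGR order to the RGB order Matplotlib expects:

```python
import numpy as np

# Stand-in for cv2.imread's result: an all-blue frame of the assumed 720x1280 size.
image = np.zeros((720, 1280, 3), dtype=np.uint8)
image[..., 0] = 255                 # channel 0 is blue in OpenCV's BGR order

height, width, channels = image.shape
print(height, width, channels)      # 720 1280 3

# Reversing the channel axis gives RGB, like cv2.cvtColor(image, cv2.COLOR_BGR2RGB).
rgb = image[..., ::-1]
print(rgb[0, 0].tolist())           # [0, 0, 255]: blue is now the last channel
```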

## And now, let's plot again! ##

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(20, 8))
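# cv2.imread returns BGR channels, while Matplotlib expects RGB
ax.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
for box in boxes:
    # (x, y) is the box's top-left corner in pixels; draw a red outline with no fill
    p = matplotlib.patches.Rectangle((box['x'], box['y']), box['width'], box['height'],
                                     ec='r', fc='none', lw=2.)
    ax.add_patch(p)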
ax.set_xticklabels([])
ax.set_yticklabels([])
plt.show()

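The annotations for this frame mark three reef-eating starfish. Note that no model has been trained yet: these boxes are the ground-truth labels from train.csv, which a detection model would have to learn to reproduce. Now it's time to look at how predictions get submitted.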
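The notebook does not say how row 2845 was chosen. One possible way to find crowded frames is to count the boxes in every row; here is a minimal sketch on made-up annotations (the index labels are illustrative only):

```python
import ast

import pandas as pd

# Made-up annotations for three frames (index labels are illustrative only).
toy_train = pd.DataFrame({
    'annotations': [
        '[]',
        "[{'x': 540, 'y': 303, 'width': 52, 'height': 40}]",
        "[{'x': 10, 'y': 20, 'width': 30, 'height': 25}, "
        "{'x': 400, 'y': 220, 'width': 45, 'height': 38}, "
        "{'x': 900, 'y': 500, 'width': 60, 'height': 50}]",
    ],
}, index=[100, 7329, 2845])

# Parse every annotations string and count the boxes it contains.
toy_train['n_boxes'] = toy_train['annotations'].map(ast.literal_eval).map(len)

# Frames with several starfish, most crowded first.
crowded = toy_train[toy_train['n_boxes'] >= 2].sort_values('n_boxes', ascending=False)
print(crowded['n_boxes'])
```

On the full train_data, the same two lines would rank every frame by its number of annotated starfish.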

## API Usage ##
Predictions for this competition are submitted through an API that serves the test images one frame at a time, so let's try it out. To use the API, we follow the directions in [this notebook made by Sohier](https://www.kaggle.com/sohier/great-barrier-reef-api-tutorial/notebook).

In [None]:
import PIL.Image

# these sys calls aren't actually necessary in Kaggle notebooks, but you may need to add the data directory to your PYTHONPATH to run the sample API off of Kaggle
import sys
sys.path.append('../input/tensorflow-great-barrier-reef')   

import greatbarrierreef
env = greatbarrierreef.make_env()   # initialize the environment
iter_test = env.iter_test() 

In [None]:
pixel_array, sample_prediction_df = next(iter_test)
pixel_array

In [None]:
PIL.Image.fromarray(pixel_array)
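To actually score, each frame's sample_prediction_df needs its annotations column filled in before it is passed to env.predict. Based on my reading of the competition's evaluation page and Sohier's tutorial (which uses the placeholder '0.5 0 0 100 100'), each box is written as 'confidence x y width height' and boxes are joined by spaces; treat that format as an assumption to double-check. format_predictions is a hypothetical helper, and the loop in the comments assumes a detection model that this notebook does not have.

```python
def format_predictions(detections):
    """Hypothetical helper: turn (confidence, x, y, width, height) tuples into
    one annotations string, assuming the 'conf x y w h' space-separated format.
    An empty list gives an empty string (no starfish predicted)."""
    return ' '.join(
        f'{conf:.2f} {int(x)} {int(y)} {int(w)} {int(h)}'
        for conf, x, y, w, h in detections
    )


print(repr(format_predictions([])))   # ''
print(format_predictions([(0.5, 0, 0, 100, 100), (0.91, 540, 303, 52, 40)]))
# 0.50 0 0 100 100 0.91 540 303 52 40

# On Kaggle, the loop would look roughly like this (my_model is hypothetical):
# for pixel_array, sample_prediction_df in iter_test:
#     detections = my_model(pixel_array)
#     sample_prediction_df['annotations'] = format_predictions(detections)
#     env.predict(sample_prediction_df)
```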

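## And finally, let's submit the data!!!!! ##

This cell writes the example sample submission back out unchanged, so it is only a placeholder, not real predictions. In this API-based competition, predictions are registered frame by frame with env.predict(sample_prediction_df), as shown in Sohier's tutorial.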

In [None]:
samp_subm.to_csv('submission.csv', index=False) # Remember: name your submission, submission.csv!

## Conclusion ##
Job well done! We just located and plotted the annotated starfish in a couple of images and saw how the competition API serves the test frames. Overall, environmental conservationists and scientists will keep injecting vinegar into the infamous reef-eaters to get rid of them once and for all! With no crown-of-thorns starfish lurking through the Great Barrier Reef, the corals there would thrive happily ever after... or at least, that's the fairy-tale ending I can write in this notebook, hehe.

## Acknowledgements ##
Special thanks to DrCapa, who provided the starter notebook that serves as an example for people like me who are new to coding competitions. Details here:
https://www.kaggle.com/drcapa/great-barrier-reef-starter