# Get data and explore 
Download the dataset from Kaggle using the Kaggle [API](https://github.com/Kaggle/kaggle-api). See the API [credentials](https://github.com/Kaggle/kaggle-api#api-credentials) documentation to retrieve the `kaggle.json` file and save it on SageMaker under `/home/ec2-user/.kaggle`. For security, restrict the file so that other users cannot read it: `chmod 600 ~/.kaggle/kaggle.json`.

Make sure to select `pytorch_p36` as the kernel.

### Import Needed Packages

In [None]:
import kaggle
import imageio
from PIL import Image
import cv2
import numpy as np 
import pandas as pd 
import os
import subprocess

import matplotlib.pyplot as plt
import matplotlib.patches as patches
%matplotlib inline
plt.rcParams['figure.dpi'] = 150

import seaborn as sns

from IPython.display import Video, display

# Suppress all warnings, including the pandas warning about setting values on a slice
import warnings
warnings.filterwarnings('ignore')


### Download data from Kaggle
Next, download the dataset from Kaggle using the Kaggle [API](https://github.com/Kaggle/kaggle-api). See the API [credentials](https://github.com/Kaggle/kaggle-api#api-credentials) documentation to retrieve the `kaggle.json` file and save it on SageMaker under `/home/ec2-user/.kaggle`. For security, restrict the file's permissions so that other users cannot read it:

chmod 600 ~/.kaggle/kaggle.json
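
The same permission change can be made from Python. The sketch below is only an illustration: `secure_kaggle_credentials` is a hypothetical helper name, not part of the Kaggle API, and it assumes the credentials file already sits at the default location described above.

```python
import os
import stat
from pathlib import Path


def secure_kaggle_credentials(path=Path.home() / ".kaggle" / "kaggle.json"):
    """Restrict the credentials file to owner read/write (equivalent to chmod 600).

    Hypothetical helper for illustration; returns the resulting permission bits.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Kaggle credentials not found at {path}")
    # Owner may read and write; group and others get no access
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return stat.S_IMODE(path.stat().st_mode)
```

Calling `secure_kaggle_credentials()` with no arguments targets `~/.kaggle/kaggle.json`; a return value of `0o600` indicates the file is readable and writable by the owner only.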

In [None]:
# !pip install kaggle
# !mkdir /home/ec2-user/.kaggle
# !mv kaggle.json /home/ec2-user/.kaggle

In [None]:
!kaggle competitions download -c nfl-impact-detection

In [None]:
# !mkdir input output model

In [None]:
# !unzip nfl-impact-detection.zip

In [None]:
# !mv images/ input/
# !mv train/ input/
# !mv image_labels.csv input/
# !mv train_labels.csv input/
# !rm -r test/
# !rm -r nflimpact/
# !rm sample_submission.csv test_player_tracking.csv train_player_tracking.csv
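
The commented shell commands above can also be approximated with the Python standard library. This is a sketch under assumptions: `extract_inputs` is a hypothetical helper, and the archive is assumed to hold `images/`, `train/`, `image_labels.csv` and `train_labels.csv` at its top level, as the `mv` commands suggest. Rather than unzipping everything and deleting the unused files, it extracts only the wanted members into `input/`.

```python
import zipfile
from pathlib import Path

# Members to keep, inferred from the mv commands above (assumed archive layout)
KEEP = ("images/", "train/", "image_labels.csv", "train_labels.csv")


def extract_inputs(zip_path, root="."):
    """Create input/, output/ and model/ under root and extract the kept members into input/."""
    root = Path(root)
    for name in ("input", "output", "model"):
        (root / name).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # str.startswith accepts a tuple, so one test covers folders and files
        wanted = [m for m in archive.namelist() if m.startswith(KEEP)]
        archive.extractall(root / "input", members=wanted)
    return wanted
```

For example, `extract_inputs("nfl-impact-detection.zip")` would leave the test videos and tracking files inside the archive instead of writing them to disk first.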

### Image Data Overview

The labeled image dataset consists of 9947 labeled images and a CSV file, `image_labels.csv`, that contains the labeled bounding boxes for all images. This dataset is provided to support the development of helmet detection algorithms.

#### Let's check the raw images

In [None]:
!ls /home/ec2-user/SageMaker/helmet_detection/input/images/ >image_name.txt

In [None]:
# Read in the list of image file names
img_name = pd.read_csv('image_name.txt', header=None)
img_name.columns =['image'] 
img_name['view']=img_name['image'].str.split("_", expand=True)[2]
img_name['image_id']=img_name['image'].str[0:21]
print(img_name.shape)
img_name.head()
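
To make the file-name parsing concrete, here is a small sketch on made-up names. The names are hypothetical and assume the layout `<game>_<play>_<view>_<frame>.jpg`. It also builds `image_id` from the first three tokens instead of a fixed-width slice: because `Endzone` and `Sideline` differ in length, a slice such as `str[0:21]` cuts the two views at different points.

```python
import pandas as pd

# Hypothetical file names, assumed to follow <game>_<play>_<view>_<frame>.jpg
names = pd.DataFrame({"image": [
    "00001_000001_Endzone_frame0001.jpg",
    "00001_000001_Sideline_frame0001.jpg",
    "00001_000001_Endzone_frame0002.jpg",
]})

# Split each name on underscores; column 2 holds the camera view
parts = names["image"].str.split("_", expand=True)
names["view"] = parts[2]

# Join game, play and view so the id does not depend on the view's length
names["image_id"] = parts[0] + "_" + parts[1] + "_" + parts[2]

print(names[["view", "image_id"]])
```

With ids built this way, two frames from the same play and view share one `image_id`, which is what the duplicate check further down relies on.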

In [None]:
img_name['view'].value_counts()

In [None]:
# Count distinct image ids; there are multiple frames from the same play
len(img_name['image_id'].unique())

In [None]:
img_name_dp = img_name[img_name.duplicated(['image_id'], keep=False)]
img_name_dp.head()

#### Let's check the image label file

In [None]:
# Read in the image labels file
img_labels = pd.read_csv('input/image_labels.csv')
img_labels['view']=img_labels['image'].str.split("_", expand=True)[2]
img_labels['image_id']=img_labels['image'].str[0:20]
print(img_labels.shape)
img_labels.head()

In [None]:
# Get a summary on the data type
img_labels.info()

In [None]:
img_labels['view'].value_counts()

In [None]:
len(img_labels['image_id'].unique())

In [None]:
img_labels['label'].value_counts()

In [None]:
img_labels.label.value_counts(normalize=True)

#### Let's bring in an image and add the labels

In [None]:
# Set the name of our working image
img_name = img_labels['image'][100]
img_name

In [None]:
# Define the path to our selected image
img_path = f"input/images/{img_name}"

In [None]:
# Read in and plot the image
img = imageio.imread(img_path) 
plt.imshow(img)
plt.show()

Let's write a function for adding the bounding boxes from the labels to the image. Note that the pixel coordinates start at (0,0) in the top left of the image. To draw a bounding box, we need to specify the top-left and bottom-right pixel locations of the box.

In [None]:
### Function to add labels to an image

def add_img_boxes(image_name, image_labels):
    # Set label colors for bounding boxes
    HELMET_COLOR = (0, 0, 0)    # Black

    # Load a fresh copy of the image so repeated calls do not stack boxes
    image = np.array(imageio.imread(f"input/images/{image_name}"))

    # Select the label rows that belong to this image
    boxes = image_labels.loc[image_labels['image'] == image_name]
    for _, box in boxes.iterrows():
        # Add a box around the helmet
        # Note that cv2.rectangle requires the top left pixel and the bottom right pixel as integers
        top_left = (int(box.left), int(box.top))
        bottom_right = (int(box.left + box.width), int(box.top + box.height))
        cv2.rectangle(image, top_left, bottom_right, HELMET_COLOR, thickness=3)

    # Display the image with bounding boxes added
    plt.imshow(image)
    plt.show()

In [None]:
add_img_boxes(img_name, img_labels)

We can now see in the image above that bounding boxes have been added to every helmet.  
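
The corner arithmetic passed to `cv2.rectangle` above can be checked on its own. The sketch below uses a hypothetical helper, `box_corners`, and a made-up box; it only illustrates how a `left`/`top`/`width`/`height` label maps to the two corners in a coordinate system whose origin is the top left of the image.

```python
def box_corners(left, top, width, height):
    """Return ((x1, y1), (x2, y2)): top-left and bottom-right corners as integers."""
    x1, y1 = int(left), int(top)
    # x grows to the right and y grows downward, so both offsets are added
    x2, y2 = int(left + width), int(top + height)
    return (x1, y1), (x2, y2)


# Hypothetical box: 20 px wide and 15 px tall, 100 px from the left, 50 px from the top
print(box_corners(100, 50, 20, 15))  # prints ((100, 50), (120, 65))
```

Because y increases downward, the "bottom" corner has the larger y value, which is the opposite of the usual plotting convention.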

## Basic EDA

In [None]:
# !pip install basic-image-eda

In [None]:
!pwd

In [None]:
from basic_image_eda import BasicImageEDA

data_dir = "/home/ec2-user/SageMaker/helmet_detection/input/images"
extensions = ['png', 'jpg', 'jpeg']
threads = 0
dimension_plot = True
channel_hist = True
nonzero = False
hw_division_factor = 1.0

BasicImageEDA.explore(data_dir, extensions, threads, dimension_plot, channel_hist, nonzero, hw_division_factor)