### Objective

The main objective is to track players' helmets and assign each one the correct player identity, to better understand collisions during a game. Each play is filmed from two views, Sideline and Endzone. The task combines detection, which means predicting accurate bounding boxes for every helmet, with tracking multiple helmets across frames. The label assigned to each helmet should match the one on the player's jersey.

#### Datasets
Directory Information:
* Train Data: train/
* Train Labels: train_labels.csv
* Test Videos: test/
* Images of Helmets: images/
* Bounding Box Info of Helmets: image_labels.csv
* Baseline Helmet Detection Boxes: train_baseline_helmets.csv
* Player Tracking Data: train_player_tracking.csv

### Import Libraries

In [None]:
import os
import cv2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as patch

import torch.nn as nn
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

#### Exploring the csv files

In [None]:
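home_dir = "../input/nfl-health-and-safety-helmet-assignment/"
# Print the first rows of every CSV file in the competition directory
for file_name in sorted(os.listdir(home_dir)):
    if file_name.endswith('.csv'):
        print(file_name, '\n')
        dataset = pd.read_csv(os.path.join(home_dir, file_name))
        print(dataset.head(), '\n')
        print("---------------------------------")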
        print("---------------------------------")

### About Custom Datasets
A custom dataset is an efficient way of loading data in PyTorch, and it makes the code more readable.  
Our custom dataset inherits from `torch.utils.data.Dataset` and overrides the `__len__` and `__getitem__` methods:
* **`__len__`** returns the size of the dataset
* **`__getitem__`** returns the sample at a given index

A general approach:
* load the CSV files in the `__init__` method
* load the images in the `__getitem__` magic method

This avoids loading all the images at once, so it is memory efficient.
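A minimal sketch of this pattern, written without PyTorch so it runs anywhere: the class keeps only lightweight metadata from `__init__` and defers the expensive read to `__getitem__`. `LazyImageDataset`, its `load_fn` parameter and `fake_loader` are hypothetical names for illustration; in the notebook the class would subclass `torch.utils.data.Dataset` and the loader would be something like `cv2.imread`.

```python
from typing import Callable, Dict, List


class LazyImageDataset:
    """Hypothetical map-style dataset: metadata up front, images on demand.

    A PyTorch version would subclass torch.utils.data.Dataset, which
    expects the same two methods.
    """

    def __init__(self, image_names: List[str], load_fn: Callable[[str], object]):
        # Only the cheap list of names is kept in memory.
        self.image_names = image_names
        self.load_fn = load_fn

    def __len__(self) -> int:
        # Number of samples, i.e. number of unique images.
        return len(self.image_names)

    def __getitem__(self, index: int) -> Dict[str, object]:
        # The expensive read happens only when a sample is requested.
        name = self.image_names[index]
        return {"name": name, "image": self.load_fn(name)}


loaded = []


def fake_loader(name: str) -> str:
    # Stand-in for cv2.imread; records which files were actually read.
    loaded.append(name)
    return f"<pixels of {name}>"


ds = LazyImageDataset(["a.jpg", "b.jpg", "c.jpg"], fake_loader)
print(len(ds))          # 3
print(loaded)           # [] (nothing read yet)
print(ds[1]["image"])   # <pixels of b.jpg>
print(loaded)           # ['b.jpg']
```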

In [None]:
class NFLDataset(Dataset):
    """NFL Helmet Assignment Dataset"""
    def __init__(self, path_to_labels, path_to_images):
        # Label tables are read once, when the dataset is created
        self.bounding_boxes = pd.read_csv(path_to_labels['detection'])
        self.tracking = pd.read_csv(path_to_labels['tracking'])
        self.identity = pd.read_csv(path_to_labels['identity'])
        self.image_info = pd.read_csv(path_to_labels['images'])
        # DataFrame with one row per unique image name (column 'unqimgs')
        self.unique_img_info = path_to_labels['unqimgs']
        self.image_dir = path_to_images

    def __len__(self):
        # One sample per unique image
        return len(self.unique_img_info)

    def __getitem__(self, index):
        # Return only the file name; the image and its boxes are read from it later
        image = self.unique_img_info['unqimgs'].iloc[index]
        sample = {'image': image}
        return sample

<span style="color:red">**NOTE**</span>   In ***image_labels.csv*** each row describes one bounding box, so the boxes of a single image are spread over multiple rows. Using this file directly as the custom dataset is therefore not recommended: when the data is loaded by a `DataLoader` with a given batch size, the bounding boxes of a single image can be split across different batches, which is not what we want.  
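To make the problem concrete, here is a small sketch with made-up rows (the image names and box values are hypothetical): batching the rows of a one-box-per-row table four at a time splits the boxes of `img_1.jpg` across two batches.

```python
# Hypothetical rows in the layout of image_labels.csv:
# (image, left, top, width, height), one row per box.
rows = [
    ("img_0.jpg", 10, 20, 15, 15),
    ("img_0.jpg", 50, 60, 14, 16),
    ("img_1.jpg", 5, 5, 12, 12),
    ("img_1.jpg", 30, 40, 13, 13),
    ("img_1.jpg", 70, 80, 11, 12),
    ("img_2.jpg", 90, 10, 15, 14),
]

batch_size = 4
# Row-wise batching, as a DataLoader with shuffle=False would do
# over a dataset indexed by row.
batches = [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]

for b, batch in enumerate(batches):
    print(b, [r[0] for r in batch])
# 0 ['img_0.jpg', 'img_0.jpg', 'img_1.jpg', 'img_1.jpg']
# 1 ['img_1.jpg', 'img_2.jpg']
```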

Instead, for a given image we want all of its bounding boxes combined into a single label. In the cell below this is done by creating a separate DataFrame that contains only the unique image names; each name is then used to load the image together with all of its bounding boxes.
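The same idea can be sketched in plain Python: group the box rows by image name so that each image appears exactly once, with all of its boxes attached. The rows are the same hypothetical values as before, and precomputing the groups is an illustrative alternative to filtering the table at load time.

```python
from collections import defaultdict

# Hypothetical rows: (image, left, top, width, height), one box per row.
rows = [
    ("img_0.jpg", 10, 20, 15, 15),
    ("img_0.jpg", 50, 60, 14, 16),
    ("img_1.jpg", 5, 5, 12, 12),
    ("img_1.jpg", 30, 40, 13, 13),
    ("img_1.jpg", 70, 80, 11, 12),
    ("img_2.jpg", 90, 10, 15, 14),
]

# Collect every box under its image name; dicts keep insertion order,
# so images appear in the order they are first seen.
boxes_per_image = defaultdict(list)
for image, left, top, width, height in rows:
    boxes_per_image[image].append((left, top, width, height))

unique_images = list(boxes_per_image)  # one entry per image, like 'unqimgs'
print(unique_images)
# ['img_0.jpg', 'img_1.jpg', 'img_2.jpg']
print(len(boxes_per_image["img_1.jpg"]))
# 3
```

Indexing by position into `unique_images` now yields an image together with all of its boxes, so batching can no longer split them. With pandas, `imgs.groupby('image')` builds the same image-to-rows mapping.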

In [None]:
image_root_dir = os.path.join(home_dir, 'images')
train_identity_labels = os.path.join(home_dir, 'train_labels.csv')
train_tracking_labels = os.path.join(home_dir, 'train_player_tracking.csv')
train_helmet_detection_labels = os.path.join(home_dir, 'train_baseline_helmets.csv')
train_image_labels = os.path.join(home_dir, 'image_labels.csv')

# One row per unique image name, so an image's boxes are never split across batches
imgs = pd.read_csv(train_image_labels)
unique_imgs = pd.DataFrame({'unqimgs': imgs['image'].unique()})

path_to_train_labels = {'identity': train_identity_labels, 'tracking': train_tracking_labels,
                        'detection': train_helmet_detection_labels, 'images': train_image_labels,
                        'unqimgs': unique_imgs}

train_dataset = NFLDataset(path_to_train_labels, image_root_dir)

### Using DataLoader to wrap over the dataset
Using `Dataset` and `DataLoader` together simplifies the overall pipeline: the `DataLoader` iterates over the dataset in minibatches and can shuffle the data along the way.  
It can also load samples in parallel worker processes (the `num_workers` argument), and because samples are fetched one batch at a time, the complete dataset never has to be held in memory.
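The batching and shuffling part of a `DataLoader` can be modelled in a few lines of plain Python. This is a simplified sketch for intuition only (no worker processes, no collation), and `minibatch_indices` is a hypothetical helper, not part of PyTorch.

```python
import random
from typing import Iterator, List, Optional


def minibatch_indices(n: int, batch_size: int, shuffle: bool,
                      seed: Optional[int] = None) -> Iterator[List[int]]:
    """Yield lists of dataset indices, batch_size at a time."""
    order = list(range(n))
    if shuffle:
        # A new permutation of the indices; seeded here for reproducibility.
        random.Random(seed).shuffle(order)
    for start in range(0, n, batch_size):
        # The last batch may be smaller, as with DataLoader's drop_last=False.
        yield order[start:start + batch_size]


print(list(minibatch_indices(10, 4, shuffle=False)))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Only the indices are created up front; each index is passed to `__getitem__` when its batch is requested, which is what keeps memory use bounded.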

In [None]:
train_dataloader = DataLoader(train_dataset, batch_size=4, shuffle=False)

### Visualize Data
Each iteration through the DataLoader returns one batch. Because `__getitem__` returns `{'image': name}`, a batch here is a dictionary whose `'image'` entry holds the file names of the images in that batch.  
The built-in functions `iter()` and `next()` work as their names suggest:
* **iter()** creates an iterator (a stream of data) over an iterable
* **next()** returns the next item from that iterator

Combined as `next(iter(train_dataloader))`, they fetch a single batch; the loop below does the same thing with `for ... break`.
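A small sketch of `iter()` and `next()` on an ordinary list that stands in for the DataLoader (the file names are made up): each `next()` call advances the iterator, and once it is exhausted a default value can be returned instead of raising `StopIteration`.

```python
batches = [["a.jpg", "b.jpg"], ["c.jpg", "d.jpg"], ["e.jpg"]]

stream = iter(batches)   # create the iterator ("stream")
first = next(stream)     # ['a.jpg', 'b.jpg']
second = next(stream)    # ['c.jpg', 'd.jpg']
third = next(stream)     # ['e.jpg']

# The iterator is now exhausted; the default avoids StopIteration.
print(next(stream, None))  # None
```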

In [None]:
data_info = pd.read_csv(train_image_labels)

def _get_bbox_info(img_path):
    """Return the top-left and bottom-right corners of every box in img_path."""
    # All rows of image_labels.csv that belong to this image
    boxes = data_info[data_info['image'] == img_path]
    left = boxes['left'].tolist()
    top = boxes['top'].tolist()
    # Convert (left, top, width, height) to corner coordinates
    right = [x + w for x, w in zip(left, boxes['width'].tolist())]
    bottom = [y + h for y, h in zip(top, boxes['height'].tolist())]
    start_point = list(zip(left, top))
    end_point = list(zip(right, bottom))
    return (start_point, end_point)

def _draw_bbox(img, sp, ep):
    # (255, 0, 0) is red in RGB order and blue in OpenCV's default BGR order
    return cv2.rectangle(img, sp, ep, (255, 0, 0), 2)

In [None]:
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(20, 12))
for batch in train_dataloader:
    for index, img_path in enumerate(batch['image']):
        img = cv2.imread(os.path.join(image_root_dir, img_path))
        # OpenCV loads images as BGR; matplotlib expects RGB
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        sp, ep = _get_bbox_info(img_path)
        for start, end in zip(sp, ep):
            img = _draw_bbox(img, start, end)
        ax[index // 2][index % 2].imshow(img)
    # Only the first batch is visualized
    break

Further details related to training and inference will be added soon.