# Contents
* 1. [Understanding the Competition](#1)
* 2. [Understanding Data](#2)
* 3. [Libraries](#3)
* 4. [Importing Data](#4)
* 5. [Visualizing Data](#5)
    * 5.1 [Image Data](#6)
    * 5.2 [Video Data](#7)
* 6. [Bounding Boxes](#8)
    * 6.1 [Bounding Boxes in Images](#9)
    
* 7. [Exploratory Data Analysis](#11)

* [Upcoming Updates](#12)
* [Note to Readers](#13)

<a id="1"></a> 

# 1. Understanding the Competition
This competition is part of the NFL’s annual 1st and Future Competition, which has been designed to spur innovation in athlete safety and performance.

The NFL is actively addressing the need for a computer vision system to detect on-field helmet impacts as part of the “Digital Athlete” platform, and the league is calling on Kagglers to help.

In this competition, the goal is to develop a computer vision model that automatically detects helmet impacts that occur on the field. The dataset contains more than one thousand definitive head impacts drawn from thousands of game images, labelled video from the sidelines and end zones, and player tracking data.

The data also documents the position, speed, acceleration, and orientation for every player on the field during NFL games.

This competition is evaluated using a micro F1 score at an Intersection over Union (IoU) threshold of 0.35.
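To make the metric concrete, here is a minimal sketch of IoU for two boxes in the `(left, top, width, height)` layout used by the label files, plus an F1 score computed from pooled counts. The helper names `iou` and `f1_from_counts` are illustrative only, and the official scorer may apply matching rules that are not covered here.

```python
IOU_THRESHOLD = 0.35  # threshold quoted in the competition description


def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (left, top, width, height)."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh

    # Width and height of the overlapping rectangle (zero if the boxes do not overlap)
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def f1_from_counts(tp, fp, fn):
    """F1 from true positives, false positives and false negatives pooled over all samples."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# A predicted box counts as a match only if its IoU with a labelled box reaches the threshold
is_match = iou((0, 0, 10, 10), (5, 0, 10, 10)) >= IOU_THRESHOLD
```

In the example, the two boxes overlap on half of their width, so the IoU is 50 / 150 and falls just below the 0.35 threshold.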



<a id="2"></a> 
# 2. Understanding Data

### The dataset consists of three types of data:

* **Image Data:**
    Image data consists of about 10,000 images with associated helmet labels. It is intended for building a helmet detection system.

* **Video Data:**
    Video data consists of 120 videos (60 plays), with a sideline and an endzone view for each play. The videos come with helmet and helmet impact labels, which are intended for building a helmet impact detection system.

* **Tracking Data:**
    Tracking data consists of tracking for all players in the provided 60 plays.


### Data files:

* **train_labels.csv** - Helmet tracking and collision labels for the training set.
* **sample_submission.csv** - A valid sample submission file.
* **image_labels.csv** - Contains the bounding boxes corresponding to the images.
* **[train/test]_player_tracking.csv** - Player tracking data. Each player wears a sensor that allows us to precisely locate them on the field.



### Folders:
* **/train/** contains the mp4 video files for the training plays. 
  (Both an endzone and sideline view.)
    
* **/test/** contains the videos for the test set. 
    
* **/images/** contains the additional annotated images of player helmets.

<a id="3"></a> 
# 3. Libraries

In [None]:
# Libraries
import numpy as np 
import pandas as pd

import seaborn as sns


import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.dpi'] = 150

import cv2
import imageio
from IPython.display import Video, display

import warnings
warnings.filterwarnings('ignore')

<a id="4"></a> 
# 4. Importing Data

In [None]:
train_tracking = pd.read_csv('../input/nfl-impact-detection/train_player_tracking.csv')
test_tracking = pd.read_csv('../input/nfl-impact-detection/test_player_tracking.csv')


train_labels = pd.read_csv('../input/nfl-impact-detection/train_labels.csv')
image_labels = pd.read_csv('../input/nfl-impact-detection/image_labels.csv')
# Same file as train_labels, loaded under a second name for the video section
video_labels = pd.read_csv('../input/nfl-impact-detection/train_labels.csv')

sub_sample = pd.read_csv('../input/nfl-impact-detection/sample_submission.csv')

<a id="5"></a> 
# 5. Visualizing Data
<a id="6"></a> 
## 5.1 Image Data

In [None]:
image_labels.head()

In [None]:
image_labels.info()

In [None]:
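def img_show(index):
    im = cv2.imread("../input/nfl-impact-detection/images/" + image_labels["image"][index])
    # OpenCV loads images as BGR; convert to RGB so matplotlib shows the true colors
    im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
    plt.imshow(im)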

In [None]:
img_show(0)

<a id="7"></a> 
## 5.2 Video Data

In [None]:
# Preview the video labels (loaded earlier from train_labels.csv)
video_labels.head()

In [None]:
def vid_show(index):
    video_name = video_labels['video'][index]
    video_path = f"/kaggle/input/nfl-impact-detection/train/{video_name}"
    display(Video(data=video_path, embed=True))

In [None]:
vid_show(0)

<a id="8"></a>
# 6. Bounding Boxes

<a id="9"></a>
## 6.1 Bounding Boxes in Images

In [None]:
image_labels.head(5)

In [None]:
# Bounding box function for Images
def box_image(index):
    name = image_labels['image'][index]
    box_color = (0, 0, 0)    # Bounding box color -> Black
    img = imageio.imread(f"/kaggle/input/nfl-impact-detection/images/{name}")
    # Select every labelled helmet that belongs to this image
    boxes = image_labels.loc[image_labels['image'] == name]
    for _, row in boxes.iterrows():
        # Add a box around the helmet
        # Note that cv2.rectangle requires the top-left and bottom-right pixels as integers
        top_left = (int(row.left), int(row.top))
        bottom_right = (int(row.left + row.width), int(row.top + row.height))
        cv2.rectangle(img, top_left, bottom_right, box_color, thickness=1)

    # Display the image with bounding boxes added
    plt.imshow(img)
    plt.show()

In [None]:
box_image(1)
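The labels store each box as `left`, `top`, `width` and `height`, while `cv2.rectangle` expects two corner points. A small sketch of that conversion is shown here; `to_corners` is a hypothetical helper, not part of OpenCV or of the competition files.

```python
def to_corners(left, top, width, height):
    """Convert a (left, top, width, height) box to ((x1, y1), (x2, y2)) integer corners."""
    top_left = (int(left), int(top))
    bottom_right = (int(left + width), int(top + height))
    return top_left, bottom_right


# Example with made-up values: a 5 x 8 box whose top-left pixel is (10, 20)
corners = to_corners(10, 20, 5, 8)
```

Casting to `int` keeps the call safe even when a value arrives as a float.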

<a id="11"></a>
# 7. Exploratory Data Analysis

### Number of unique elements in each feature

In [None]:
train_labels.nunique().to_frame().rename(columns={0:"Count"})

In [None]:
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(10, 8))

sns.distplot(train_labels["gameKey"].value_counts(), ax=ax[0, 0], rug=True, color="red")
ax[0, 0].set_title("Game Counts")

sns.distplot(train_labels["playID"].value_counts(), ax=ax[0, 1], rug=True, color="blue")
ax[0, 1].set_title("Play Counts")

sns.distplot(train_labels["label"].value_counts(), ax=ax[1, 0], rug=True, color="green")
ax[1, 0].set_title("Labels Counts")

sns.distplot(train_labels["video"].value_counts(), ax=ax[1, 1], rug=True, color="yellow")
ax[1, 1].set_title("Videos Counts")

plt.show()

Let us find the number of unique videos in our dataset.

In [None]:
train_labels['video'].nunique()

There are 120 unique videos, and each one shows one of the two views of a play.
Therefore, there are 60 plays with two views each.
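One way to check the two-views-per-play claim is to count the distinct views for each `(gameKey, playID)` pair. The sketch below runs on a small made-up frame so that it is self-contained; the column names follow `train_labels`, but the values are invented for illustration.

```python
import pandas as pd

# Toy stand-in for train_labels (values are made up)
toy_labels = pd.DataFrame({
    "gameKey": [1, 1, 1, 2, 2],
    "playID": [10, 10, 10, 20, 20],
    "view": ["Endzone", "Sideline", "Endzone", "Endzone", "Sideline"],
})

# Number of distinct camera views recorded for each play
views_per_play = toy_labels.groupby(["gameKey", "playID"])["view"].nunique()

n_plays = len(views_per_play)
every_play_has_two_views = bool((views_per_play == 2).all())
```

Running the same `groupby` on `train_labels` should confirm the count of plays, assuming every play really has both views.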

### Length of videos

In [None]:
play_frame_count = train_labels[['gameKey','playID','frame']].drop_duplicates()[['gameKey','playID']].value_counts()

fig, ax = plt.subplots(figsize=(10, 8))
sns.distplot(play_frame_count, bins=15)
ax.set_title('Distribution of frames per video file')
plt.show()

The videos range from approximately 300 frames to 600 frames per video.

### Bounding box size
The size of a bounding box depends on several factors:
* The distance between player and camera.
* The camera's angle and zoom relative to the field.
* One player's helmet may be blocked from view by another player.


Here, we measure the size of a bounding box by its area (width × height).

In [None]:
train_labels['area'] = train_labels['width'] * train_labels['height']
fig, ax = plt.subplots(figsize=(10, 5))

# Plot the areas themselves, not how often each distinct area value occurs
sns.distplot(train_labels['area'],
             bins=10)
ax.set_title('Distribution of bounding box sizes')
plt.show()

### Impact Type Count
The impact types recorded here are:
* Helmet
* Shoulder
* Body
* Ground
* Hand
* shoulder (lowercase spelling, listed separately from Shoulder)
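Since the list contains both a capitalized and a lowercase spelling of the shoulder type, it may help to normalize the labels before counting. This is a minimal sketch on made-up values; whether the two spellings should really be merged is an assumption to check against the data.

```python
import pandas as pd

# Toy stand-in for train_labels['impactType'] (values are made up; None marks "no impact")
toy_types = pd.Series(["Helmet", "Shoulder", "shoulder", "Body", None, "Ground", "Hand"])

# Strip stray whitespace and capitalize the first letter so spelling variants collapse
normalized = toy_types.str.strip().str.capitalize()

# Missing values are left out of the counts by default
counts = normalized.value_counts()
```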

In [None]:
train_labels['impactType'].value_counts().plot(kind='bar',title='Impact Type Count',figsize=(12, 4))

plt.show()

train_labels['impactType'].value_counts()

In [None]:
sns.catplot(x="view", hue="impactType", col="confidence",
                data=train_labels, kind="count")

In [None]:
sns.catplot(x="view", hue="impactType", col="visibility",
                data=train_labels, kind="count")

### Impact Occurrence Percentage

In [None]:
impact_occ = train_labels[['video','impact']].fillna(0)['impact'].mean() * 100
print(f'Of all bounding boxes, {impact_occ:0.4f}% of them involve an impact event')
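The same calculation can be split by camera view to see whether impacts are labelled more often from one angle. Here is a sketch on a made-up frame, assuming (as in the cell above) that a missing `impact` value means no impact.

```python
import pandas as pd

# Toy stand-in for train_labels (values are made up)
toy_labels = pd.DataFrame({
    "view": ["Endzone", "Endzone", "Sideline", "Sideline", "Sideline"],
    "impact": [1.0, None, None, None, 1.0],
})

# Treat missing impact flags as 0, then take the mean per view and express it as a percentage
impact_rate_by_view = toy_labels["impact"].fillna(0).groupby(toy_labels["view"]).mean() * 100
```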

### Confidence
* Possible = 1
* Definitive = 2
* Definitive and Obvious = 3

In [None]:
train_labels['confidence'].dropna().astype('int').value_counts().plot(kind='bar',
          title='Confidence Type Label Count',
          figsize=(12, 4))
plt.show()

train_labels['confidence'].value_counts()

In [None]:
sns.catplot(x="impactType", hue="confidence", col="view",
                data=train_labels, kind="count")

### Visibility
* Not Visible from View = 0 
* Minimum = 1 
* Visible = 2
* Clearly Visible = 3
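The numeric codes for confidence and visibility can be mapped to their names to make plots and tables easier to read. The dictionaries in this sketch are copied from the lists in this notebook; the `*_name` column names and the toy values are illustrative.

```python
import pandas as pd

CONFIDENCE_NAMES = {1: "Possible", 2: "Definitive", 3: "Definitive and Obvious"}
VISIBILITY_NAMES = {0: "Not Visible from View", 1: "Minimum", 2: "Visible", 3: "Clearly Visible"}

# Toy stand-in for train_labels (values are made up; None marks rows without an impact)
toy_labels = pd.DataFrame({
    "confidence": [1.0, 3.0, None],
    "visibility": [0.0, 2.0, None],
})

# Codes that are missing stay missing after the mapping
toy_labels["confidence_name"] = toy_labels["confidence"].map(CONFIDENCE_NAMES)
toy_labels["visibility_name"] = toy_labels["visibility"].map(VISIBILITY_NAMES)
```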

In [None]:
train_labels['visibility'].dropna() \
    .astype('int').value_counts() \
    .plot(kind='bar',
          title='Visibility Label Count',
          figsize=(12, 4))
plt.show()

train_labels['visibility'].value_counts()

In [None]:
sns.catplot(x="impactType", hue="visibility", col="view",
                data=train_labels, kind="count")

<a id="12"></a> 
# Upcoming Updates

<b> 
* Bounding Box displays in Videos
* Finding in-depth insights about the dataset
* Model design and implementation
* Final submission.
</b>

All these updates will be here soon. Keep motivating me till then.


<a id="13"></a> 
# Note to the Readers
<b>
This is my first attempt on Kaggle, and I am still finding my way around here. Motivate me and push me to learn more.

If you wish to suggest updates, feel free to do so.
</b>

## Did you upvote or comment yet? Please do... :D