In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

<h1><center>NFL 1st and Future - Impact Detection</center></h1>
<h2><center>Detect helmet impacts in videos of NFL plays</center></h2>

<center><img src="https://storage.googleapis.com/kaggle-competitions/kaggle/12125/logos/header.png?t=2018-11-30-18-08-32"></center>

# About the Competition

<h2 style="color:red">Work in Progress.</h2>
<h3 style="color:brown">Shoot your thoughts in the comment section and Don't forget to upvote if you like the notebook :)</h3>

## Organisers and additional perks

- This competition is part of the **NFL’s annual 1st and Future competition**, which is designed to spur innovation in athlete safety and performance. 
- For the first time this year, 1st and Future will be broadcast in primetime during Super Bowl LV week on NFL Network, and winning Kagglers may have the opportunity to present their computer vision systems as part of this exciting event.  
- If successful, you could support the NFL’s research programs in a big way: improving athletes' safety. Backed by this research, the NFL may implement rule changes and helmet design improvements to try to better protect the athletes who play the game millions watch each week.

## What to do?

- We’ll develop a computer vision model that automatically detects helmet impacts that occur on the field. 
- Kick off with the dataset of more than one thousand definitive head impacts from thousands of game images, labeled video from the sidelines and end zones, and player tracking data. 

## Data Source

- This information is sourced from the NFL’s Next Gen Stats (NGS) system, which documents the position, speed, acceleration, and orientation for every player on the field during NFL games.

# Evaluation Metric

<h2 style="color:brown">Task: </h2>
<p>Segment helmet collisions in videos of football plays using bounding boxes.</p>
<h2 style="color:brown">Metric: </h2>
<p>Evaluated using <em>micro F1 score</em> at an <em>Intersection Over Union</em> threshold of 0.35.</p>

## Why F1-Score?

- The main departure from a traditional metric is that some imprecision on the timing of the impact is acceptable. For a given ground truth impact, a prediction within **+/- 4 frames (9 frames total)** within the same play can be accepted as valid without necessarily degrading the score. Assuming the player is moving over the course of those frames, the exact bounding box predicted to achieve an IoU of 1.0 would also vary depending on the frame.
- As one helmet may partially obscure another from the camera's perspective, both predicted and ground truth bounding boxes may overlap. However, at most one prediction will ever be assigned to a given ground truth box.

The two criteria described above mean that one or more predictions could theoretically be assigned to more than one ground truth box. If this happens, the metric chooses the assignment between your predictions and the ground truth boxes that yields the highest total number of True Positives (thereby maximizing the F1 score). At most one prediction will be assigned to any ground truth box and vice versa.
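
To make the "highest total number of True Positives" idea concrete, here is a minimal sketch of a one-to-one assignment. It is not the official scorer. It assumes we have already listed every (prediction, ground truth) pair that passes both criteria (within +/- 4 frames and IoU of at least 0.35); the function name `max_true_positives` and the toy pairs are made up for illustration.

```python
def max_true_positives(num_preds, candidates):
    """Return the largest number of one-to-one (prediction, ground truth) pairs.

    `candidates` holds (pred_idx, gt_idx) pairs that already pass both criteria.
    """
    # For each prediction, the ground truth boxes it could be assigned to
    options = {p: [] for p in range(num_preds)}
    for p, g in candidates:
        options[p].append(g)

    matched_pred = {}  # gt_idx -> pred_idx currently assigned to it

    def try_assign(p, visited):
        for g in options[p]:
            if g in visited:
                continue
            visited.add(g)
            # Take g if it is free, or if its current owner can move elsewhere
            if g not in matched_pred or try_assign(matched_pred[g], visited):
                matched_pred[g] = p
                return True
        return False

    return sum(try_assign(p, set()) for p in range(num_preds))


# Prediction 0 fits ground truths 0 and 1; prediction 1 fits only ground truth 0.
pairs = [(0, 0), (0, 1), (1, 0)]
tp = max_true_positives(num_preds=2, candidates=pairs)
print(tp)  # 2
```

Matching prediction 0 to ground truth 0 first and stopping there would leave prediction 1 unmatched (one True Positive). Moving prediction 0 to ground truth 1 frees ground truth 0 for prediction 1, which gives two.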

## But What is F1 Score?

F1 is calculated as follows:
\begin{equation}
F1 = 2 \cdot \frac{precision \cdot recall}{precision + recall}
\end{equation}

where:

\begin{align}
precision = \frac{TP}{TP+FP} \\
recall = \frac{{TP}}{TP+FN}
\end{align}
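
As a quick sanity check of the formulas above, here is a small sketch that computes F1 from raw counts. The counts are made up, and returning 0 when there are no True Positives is our own convention for avoiding a division by zero.

```python
def f1_score(tp, fp, fn):
    """F1 from true positive, false positive, and false negative counts."""
    if tp == 0:
        # Precision and recall are both 0 (or undefined), so report 0
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up counts: 8 correct boxes, 2 spurious boxes, 4 missed impacts
score = f1_score(tp=8, fp=2, fn=4)
print(round(score, 4))  # 0.7273
```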

## What is IoU?

The IoU of a proposed bounding box and a ground truth bounding box is calculated as:

\begin{equation}
IoU(A,B) = \frac{|A \cap B|}{|A \cup B|}
\end{equation}
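
Here is a minimal sketch of the IoU computation for boxes in the competition's `left, width, top, height` format. Treating the box as covering the range `[left, left + width)` is an assumption on our side; the official scorer may handle pixel edges differently.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (left, width, top, height) tuples."""
    left_a, width_a, top_a, height_a = box_a
    left_b, width_b, top_b, height_b = box_b

    # Width and height of the overlapping rectangle (zero if the boxes are disjoint)
    inter_w = max(0, min(left_a + width_a, left_b + width_b) - max(left_a, left_b))
    inter_h = max(0, min(top_a + height_a, top_b + height_b) - max(top_a, top_b))
    intersection = inter_w * inter_h

    union = width_a * height_a + width_b * height_b - intersection
    return intersection / union if union > 0 else 0.0


# Made-up boxes: the prediction is shifted 5 pixels to the right of the truth
truth = (100, 20, 50, 20)
pred = (105, 20, 50, 20)
print(iou(truth, pred))  # 0.6, which clears the 0.35 threshold
```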

# How to make a submission?

Due to the custom metric, this competition relies on an evaluation pipeline that is slightly different from a typical code competition. Your notebook must import and submit via the custom `nflimpact` Python module available in Kaggle notebooks.

To submit, simply add these three lines at the end of your code:

```python
import nflimpact
env = nflimpact.make_env()
env.predict(df) # df is a pandas dataframe of your entire submission file
```

The dataframe should be in the following format:
- Each row in your submission represents a single predicted bounding box for the given frame.
- Note that it is not required to include labels of which players had an impact, only a bounding box where it occurred.

```
gameKey,playID,view,video,frame,left,width,top,height
57590,3607,Endzone,57590_003607_Endzone.mp4,1,1,1,1,1
57590,3607,Sideline,57590_003607_Sideline.mp4,1,1,1,1,1
57595,1252,Endzone,57595_001252_Endzone.mp4,1,1,1,1,1
57595,1252,Sideline,57595_001252_Sideline.mp4,1,1,1,1,1
etc.
```
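
One way to assemble such a dataframe is sketched below. `SUBMISSION_COLUMNS` and `build_submission` are hypothetical helpers of ours, not part of `nflimpact`, and the two rows simply reuse the placeholder values from the sample above.

```python
import pandas as pd

# Column order taken from the sample rows above
SUBMISSION_COLUMNS = ["gameKey", "playID", "view", "video", "frame",
                      "left", "width", "top", "height"]


def build_submission(predictions):
    """Build a submission dataframe from a list of per-box dicts (hypothetical helper)."""
    df = pd.DataFrame(predictions, columns=SUBMISSION_COLUMNS)
    # A missing key shows up as NaN, so fail early instead of submitting gaps
    incomplete = df.columns[df.isna().any()].tolist()
    if incomplete:
        raise ValueError(f"Missing values in columns: {incomplete}")
    return df


predictions = [
    {"gameKey": 57590, "playID": 3607, "view": "Endzone",
     "video": "57590_003607_Endzone.mp4", "frame": 1,
     "left": 1, "width": 1, "top": 1, "height": 1},
    {"gameKey": 57590, "playID": 3607, "view": "Sideline",
     "video": "57590_003607_Sideline.mp4", "frame": 1,
     "left": 1, "width": 1, "top": 1, "height": 1},
]
submission = build_submission(predictions)
print(submission)
```

The resulting `submission` is what we would pass as `df` to `env.predict`.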

## More on Guidelines/submissions:

- CPU Notebook <= 9 hours run-time
- GPU Notebook <= 9 hours run-time
- Freely & publicly available external data is allowed, including pre-trained models

# Data Overview

- We are tasked with identifying helmet collisions in video files. 
- Each play has two associated videos, showing a `sideline` and `endzone` view, and the videos are aligned so that frames correspond between the videos. 
- The training set videos are in `train` with corresponding labels in `train_labels.csv`, while the videos for which you must predict are in the `test` folder.
- We are also provided an ancillary dataset of images showing helmets with labeled bounding boxes. These files are located in `images` and the bounding boxes in `image_labels.csv`.

<p style="color:red">This is a code competition. When you submit, your model will be rerun on a set of 15 unseen videos located in the same test location. The publicly provided test videos are simply a set of mock plays (copied from the training set) which are not used in scoring.</p>

<p style="color:blue">The dataset provided for this competition has been carefully designed for the purposes of training computer vision models and therefore contains plays that have much higher incidence of helmet impacts than is normal. This dataset should not be used to make inferences about the incidence of helmet impact rates during football games, as it is not a representative sample of those rates.</p>

**Files**

- **[train/test]:** mp4 videos of each play. Each play has two copies, one shot from the endzone and the other shot from the sideline. The video pairs are matched frame for frame in time, but different players may be visible in each view. You only need to make predictions for the view that a player is actually visible in.

# Helper Functions

In [None]:
def count_plot(df, col, top_most=50, title=None, is_top=True):
    """Plot a horizontal bar chart of the most (or least) frequent values in `col`."""
    # ascending=True puts the rarest values first, so head() then returns the least frequent
    temp = (
        df[col].astype("str")
        .value_counts(ascending=not is_top)
        .to_frame()
        .reset_index()
        .head(top_most)
    )
    temp.columns = [col, "count"]

    plt.figure(figsize=(6, 10))
    # Keyword arguments: newer seaborn versions do not accept x and y positionally
    sns.barplot(x="count", y=col, data=temp, orient="h", order=temp[col].values.tolist())
    if title is not None:
        plt.title(title)
    plt.show()

# Imports

In [None]:
import os
import gc
import numpy as np
import pandas as pd
from collections import Counter

import cv2
import imageio
import subprocess
from PIL import Image

import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt
import matplotlib.patches as ptc

from IPython.display import Video, display

sns.set_style("whitegrid")
colorpal = sns.color_palette("husl", 9)

import warnings
warnings.filterwarnings("ignore")

%matplotlib inline
plt.rcParams['figure.dpi'] = 150
plt.rcParams['figure.figsize'] = 12, 8

In [None]:
image_path = "../input/nfl-impact-detection/images"
train_videos = "../input/nfl-impact-detection/train"
test_videos = "../input/nfl-impact-detection/test"
image_labels = "../input/nfl-impact-detection/image_labels.csv"
train_labels = "../input/nfl-impact-detection/train_labels.csv"
train_player_tracking = "../input/nfl-impact-detection/train_player_tracking.csv"
test_player_tracking = "../input/nfl-impact-detection/test_player_tracking.csv"
sample_submissions = "../input/nfl-impact-detection/sample_submission.csv"

# Train Labels

**Helmet tracking and collision labels for the training set.**

- **gameKey:** the ID code for the game.

- **playID:** the ID code for the play.

- **view:** the camera orientation.

- **video:** the filename of the associated video.

- **frame:** the frame number for this play.

- **label:** the associated player's number.

- **[left/width/top/height]:** the specification of the bounding box of the prediction.

- **impact:** an indicator (1 = helmet impact) for bounding boxes associated with helmet impacts

- **impactType:** a description of the type of helmet impact: helmet, shoulder, body, ground, etc.

- **confidence:** 1 = Possible, 2 = Definitive, 3 = Definitive and Obvious

- **visibility:** 0 = Not Visible from View, 1 = Minimum, 2 = Visible, 3 = Clearly Visible

For the purposes of evaluation, definitive helmet impacts are defined as meeting three criteria:

- `impact = 1`
- `confidence > 1`
- `visibility > 0` 

Those labels with confidence = 1 document cases in which human labelers asserted it was possible that a helmet impact occurred, but it was not clear that the helmet impact altered the trajectory of the helmet. Those labels with visibility = 0 indicate that although there is reason to believe that an impact occurred to that helmet at that time, the impact itself was not visible from the view.
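
The three criteria translate directly into a boolean mask. The sketch below uses a tiny made-up dataframe with only the three relevant columns; `definitive_impacts` is a hypothetical helper name.

```python
import pandas as pd


def definitive_impacts(labels):
    """Keep only rows that count as definitive helmet impacts for evaluation."""
    mask = (
        (labels["impact"] == 1)
        & (labels["confidence"] > 1)
        & (labels["visibility"] > 0)
    )
    return labels[mask]


# Made-up rows: only the first one meets all three criteria
toy_labels = pd.DataFrame({
    "impact":     [1, 1, 1, 0],
    "confidence": [3, 1, 2, 0],
    "visibility": [2, 3, 0, 0],
})
print(len(definitive_impacts(toy_labels)))  # 1
```

The same function can be applied to the training labels once they are loaded, assuming the column names match the list above.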

In [None]:
tr_labels = pd.read_csv(train_labels)
tr_labels

In [None]:
tr_labels.info()

In [None]:
# Unique Videos
tr_labels["video"].nunique()

So we have `120` unique videos. As per the description, that is `60` plays, each captured from `2` views: **Endzone** and **Sideline**. The last token of each video file name indicates the view it was captured from. We can confirm this by counting the unique `gameKey_playID` prefixes (the first 12 characters of each file name).

In [None]:
tr_labels["video"].apply(lambda x: x[:12]).nunique()
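
Slicing the first 12 characters relies on every file name having the same fixed-width prefix. A slightly more explicit alternative is to split the name on underscores. This is only a sketch: `parse_video_name` is a hypothetical helper, and it assumes names follow the `gameKey_playID_view.mp4` pattern seen in the sample rows.

```python
def parse_video_name(video):
    """Split a name like '57590_003607_Endzone.mp4' into (gameKey, playID, view)."""
    stem = video.rsplit(".", 1)[0]           # drop the file extension
    game_key, play_id, view = stem.split("_")
    return int(game_key), int(play_id), view  # int() also strips the zero padding


print(parse_video_name("57590_003607_Endzone.mp4"))  # (57590, 3607, 'Endzone')
```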

### Number of unique values in each Column

In [None]:
tr_labels.nunique().to_frame().rename(columns={0:"count"}).style.background_gradient(cmap="gnuplot")

### Top 50 gameKeys

In [None]:
count_plot(tr_labels, "gameKey")

### Top 50 PlayIDs

In [None]:
count_plot(tr_labels, "playID")

### Top 50 Labels

In [None]:
count_plot(tr_labels, "label")

### 50 Least Frequent Labels

In [None]:
count_plot(tr_labels, "label", is_top=False)

### Top 50 Videos

In [None]:
count_plot(tr_labels, "video")

### 50 Least Frequent Videos

In [None]:
count_plot(tr_labels, "video", is_top=False)

In [None]:
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(12, 10))
# (column, subplot title, color) for each panel
panels = [
    ("gameKey", "Game Counts", "red"),
    ("playID", "Play Counts", "blue"),
    ("label", "Labels Counts", "green"),
    ("video", "Videos Counts", "yellow"),
]
for axis, (col, title, color) in zip(ax.ravel(), panels):
    counts = tr_labels[col].value_counts()
    # distplot is deprecated in recent seaborn; histplot + rugplot give a comparable plot
    sns.histplot(counts, kde=True, stat="density", ax=axis, color=color)
    sns.rugplot(counts, ax=axis, color=color)
    axis.set_title(title)
plt.show()

In [None]:
_ = sns.catplot(x="impactType", hue="visibility", col="view",
                data=tr_labels, kind="count")

In [None]:
_ = sns.catplot(x="impactType", hue="confidence", col="view",
                data=tr_labels, kind="count")

In [None]:
_ = sns.catplot(x="view", hue="impactType", col="confidence",
                data=tr_labels, kind="count")

In [None]:
_ = sns.catplot(x="view", hue="impactType", col="visibility",
                data=tr_labels, kind="count")

# Image Labels


**Contains the bounding boxes corresponding to the images.**

- **image:** the image file name.

- **label:** the label type.

- **[left/width/top/height]:** the specification of the bounding box of the label, with left=0 and top=0 being the top left corner.

In [None]:
img_labels = pd.read_csv(image_labels)
img_labels

In [None]:
img_labels.label.value_counts()

In [None]:
_ = sns.catplot(x="label", data=img_labels, kind="count")
plt.gcf().set_size_inches(20, 8)

In [None]:
# take a sample image
ridx = np.random.randint(0, len(os.listdir(image_path)))
img_fn = os.listdir(image_path)[ridx]
print("Image: ", img_fn)
img_sample = Image.open(os.path.join(image_path, img_fn))

plt.imshow(img_sample)
plt.show()

In [None]:
# Edge color for each helmet label type; unknown labels fall back to purple
LABEL_COLORS = {
    "Helmet": "blue",
    "Helmet-Blurred": "orange",
    "Helmet-Difficult": "green",
    "Helmet-Sideline": "red",
}

def add_img_boxes(image_name, labels_df=img_labels):
    """Draw the labeled bounding boxes for `image_name` on top of the image."""
    _, ax = plt.subplots(1)

    # Use the dataframe passed in rather than the global one
    boxes = labels_df.loc[labels_df['image'] == image_name]

    for _, box in boxes.iterrows():
        edc = LABEL_COLORS.get(box.label, "purple")
        patch = ptc.Rectangle((box.left, box.top), width=box.width, height=box.height, fill=False, edgecolor=edc)
        ax.text(box.left, box.top, box.label, fontsize=8, bbox=dict(facecolor=edc, alpha=0.1))
        ax.add_patch(patch)

    # Load the image by name so the boxes always match the picture shown
    ax.imshow(Image.open(os.path.join(image_path, image_name)))
    ax.set_title(f"{image_name} with bounding boxes")
    plt.show()

add_img_boxes(img_fn)

# Player Tracking

**Each player wears a sensor that allows us to precisely locate them on the field; that information is reported in these two files.**

- **gameKey:** the ID code for the game.

- **playID:** the ID code for the play.

- **player:** the player's ID code.

- **time:** timestamp at **10 Hz**.

- **x:** player position along the long axis of the field. See figure below.

- **y:** player position along the short axis of the field. See figure below.

- **s:** speed in yards/second.

- **a:** acceleration in yards/second^2.

- **dis:** distance traveled from prior time point, in yards.

- **o:** orientation of player (deg).

- **dir:** angle of player motion (deg).

- **event:** game events like a snap, whistle, etc.

<img src="https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3258%2F820e86013d48faacf33b7a32a15e814c%2FIncreasing%20Dir%20and%20O.png?generation=1572285857588233&alt=media">

## Train Player Tracking

In [None]:
tr_track = pd.read_csv(train_player_tracking)
tr_track

## Test Player Tracking

In [None]:
ts_track = pd.read_csv(test_player_tracking)
ts_track

# Videos Analysis

In [None]:
# take a sample video
ridx = np.random.randint(0, len(os.listdir(train_videos)))
vid_fn = os.listdir(train_videos)[ridx]
print("Video: ", vid_fn)

display(Video(data=os.path.join(train_videos, vid_fn), embed=True))

# Have a look at sample submission!

**A valid sample submission file.**

- **gameKey:** the ID code for the game.

- **playID:** the ID code for the play.

- **view:** the camera orientation.

- **video:** the filename of the associated video.

- **frame:** the frame number for this play.

- **[left/width/top/height]:** the specification of the bounding box of the prediction.

In [None]:
ss = pd.read_csv(sample_submissions)
ss

As noted above, each row of the submission is a single predicted bounding box for the given frame. The bounding box is represented by `left`, `width`, `top`, and `height`.
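
Drawing routines and many detection libraries work with corner coordinates instead, so a small conversion helper can be handy. The sketch below is our own illustration (`to_corners` is a hypothetical name) and assumes the origin is the top left corner, as stated for the image labels.

```python
def to_corners(left, width, top, height):
    """Convert (left, width, top, height) to (x_min, y_min, x_max, y_max)."""
    return left, top, left + width, top + height


# Made-up box: 20 pixels wide and 30 pixels tall, starting at (100, 50)
print(to_corners(left=100, width=20, top=50, height=30))  # (100, 50, 120, 80)
```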