### Data Overview

This notebook gives an overview of the provided data, with examples of how to load it and produce some initial plots.  Three types of data are provided for this problem:

* **Image Data:**  Almost 10,000 images and associated helmet labels for the purpose of building a helmet detection computer vision system.

* **Video Data:**  120 videos (60 plays) from both a sideline and endzone point of view (one each per play) with associated helmet and helmet impact labels for the purpose of building a helmet impact detection computer vision system.

* **Tracking Data:**  Tracking data for all players that participate in the provided 60 plays.

This overview provides an example of how to parse and plot each of these data types.  It also briefly summarizes the steps needed to submit a solution for scoring.

### Import Needed Packages

In [None]:
import imageio
from PIL import Image
import cv2
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
import subprocess

import matplotlib.pyplot as plt
import matplotlib.patches as patches
%matplotlib inline
plt.rcParams['figure.dpi'] = 150

import seaborn as sns

from IPython.display import Video, display

# Suppress all warnings, including the pandas warnings about setting values on a slice
import warnings
warnings.filterwarnings('ignore')


### Image Data Overview

The labeled image dataset consists of 9947 labeled images and a .csv file named image_labels.csv that contains the labeled bounding boxes for all images.  This dataset is provided to support the development of helmet detection algorithms. 

In [None]:
# Read in the image labels file
img_labels = pd.read_csv('/kaggle/input/nfl-impact-detection/image_labels.csv')
img_labels.head()

In [None]:
# Get a summary on the data type

img_labels.info()

Let's bring in an image and go ahead and add the labels.  

In [None]:
# Set the name of our working image
img_name = img_labels['image'][0]
img_name

In [None]:
# Define the path to our selected image
img_path = f"/kaggle/input/nfl-impact-detection/images/{img_name}"

In [None]:
# Read in and plot the image
img = imageio.imread(img_path) 
plt.imshow(img)
plt.show()

Let's write a function that adds the bounding boxes from the labels to the image.  Note that the pixel geometry starts with (0,0) in the top left of the image.  To draw a bounding box, we need to specify the top left and bottom right pixel locations of the box.

In [None]:
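### Function to add labels to an image

def add_img_boxes(image_name, image_labels):
    # Set label colors for bounding boxes
    HELMET_COLOR = (0, 0, 0)    # Black

    # Read the image here so the function does not depend on a global `img`
    # np.array makes a writable copy that cv2 can draw on
    image = np.array(imageio.imread(f"/kaggle/input/nfl-impact-detection/images/{image_name}"))

    # Use the function arguments rather than the notebook-level globals
    boxes = image_labels.loc[image_labels['image'] == image_name]
    for j, box in boxes.iterrows():
        # cv2.rectangle requires integer pixel coordinates for the top left and bottom right corners
        top_left = (int(box.left), int(box.top))
        bottom_right = (int(box.left + box.width), int(box.top + box.height))
        cv2.rectangle(image, top_left, bottom_right, HELMET_COLOR, thickness=1)

    # Display the image with bounding boxes added
    plt.imshow(image)
    plt.show()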

In [None]:
add_img_boxes(img_name, img_labels)

We can now see in the image above that bounding boxes have been added to every helmet.  
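The box geometry used above can be checked in isolation.  The following is a minimal sketch, assuming the label columns are `left`, `top`, `width`, and `height` in pixels; `box_corners` is a hypothetical helper name, not part of the provided data or any library.

```python
from typing import Tuple

Point = Tuple[int, int]

def box_corners(left: int, top: int, width: int, height: int) -> Tuple[Point, Point]:
    """Return the (x, y) top left and bottom right corners of a box.

    The origin (0, 0) is the top left of the image, so x grows to the
    right and y grows downward.
    """
    top_left = (int(left), int(top))
    bottom_right = (int(left) + int(width), int(top) + int(height))
    return top_left, bottom_right

# Hypothetical label values, for illustration only
print(box_corners(left=100, top=50, width=20, height=30))
# ((100, 50), (120, 80))
```

The two tuples are in the order that `cv2.rectangle` expects for its corner arguments.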

### Video Data

The labeled video dataset provides video for 60 plays observed from both the sideline and endzone perspective (120 videos total).  The train_labels.csv file contains labeled bounding boxes for every helmet that is visible in every frame of every video.  

In [None]:
# Read in the video labels file
video_labels = pd.read_csv('/kaggle/input/nfl-impact-detection/train_labels.csv')
video_labels.head()

The gameKey, playID, video, and frame fields make it possible to match each bounding box to the appropriate video file and video frame.  The label field corresponds to the player field in the tracking data and uniquely identifies the helmet of each player participating in the play.  Helmets of players who appear in the videos but are not participating in the play are labeled V00 (non-participant on the visiting team) or H00 (non-participant on the home team).  In the rare cases where a participating player cannot be uniquely identified (for example, when only the helmet is visible in a pile-up), the appropriate generic V00 or H00 label is applied to that helmet bounding box. 
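The video file names appear to encode the same keys.  The sketch below assumes every name follows the `{gameKey}_{playID}_{view}.mp4` pattern seen in the sample submission rows (for example `57590_003607_Endzone.mp4`); `parse_video_name` is a hypothetical helper, not part of the competition tooling.

```python
def parse_video_name(video_name: str) -> dict:
    """Split a name like '57590_003607_Endzone.mp4' into its parts.

    Assumes the pattern {gameKey}_{playID}_{view}.mp4; the playID is
    zero-padded in the file name, so it is converted to an int.
    """
    stem = video_name.rsplit(".", 1)[0]       # drop the file extension
    game_key, play_id, view = stem.split("_")
    return {"gameKey": int(game_key), "playID": int(play_id), "view": view}

print(parse_video_name("57590_003607_Endzone.mp4"))
# {'gameKey': 57590, 'playID': 3607, 'view': 'Endzone'}
```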

The Sideline and Endzone views have been time-synced such that the snap occurs 10 frames into the video.  This time alignment should be considered to be accurate to within +/- 3 frames or 0.05 seconds (video data is recorded at approximately 59.94 frames per second). 
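To relate a video frame to elapsed time, the stated frame rate and snap offset can be combined as in the sketch below.  It assumes that "10 frames into the video" corresponds to frame index 10 in the labels and uses the approximate rate of 59.94 frames per second; both function names are hypothetical.

```python
FPS = 59.94       # approximate frame rate stated above
SNAP_FRAME = 10   # assumption: the snap falls on frame index 10

def frame_to_seconds_after_snap(frame: int, fps: float = FPS,
                                snap_frame: int = SNAP_FRAME) -> float:
    """Seconds elapsed since the snap (negative before the snap)."""
    return (frame - snap_frame) / fps

def seconds_after_snap_to_frame(seconds: float, fps: float = FPS,
                                snap_frame: int = SNAP_FRAME) -> int:
    """Nearest frame index for a time measured from the snap."""
    return round(seconds * fps) + snap_frame

# Three frames after the snap is about 0.05 seconds, the stated sync tolerance
print(round(frame_to_seconds_after_snap(13), 3))  # 0.05
print(seconds_after_snap_to_frame(1.0))           # 70
```

Because the views are only aligned to within a few frames, any match between a video frame and a tracking timestamp should allow for that tolerance.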

For the purposes of evaluation, **definitive helmet impacts are defined as meeting three criteria:**

* impact = 1

* confidence > 1

* visibility > 0

Those labels with confidence = 1 document cases in which human labelers asserted it was possible that a helmet impact occurred, but it was not clear that the helmet impact altered the trajectory of the helmet.  Those labels with visibility = 0 indicate that although there is reason to believe that an impact occurred to that helmet at that time, the impact itself was not visible from the view.

Let's bring in the very first video and display it.

In [None]:
# Define the video we'll process
video_name = video_labels['video'][0]
video_name

In [None]:
# Define the path and then display the video
video_path = f"/kaggle/input/nfl-impact-detection/train/{video_name}"
display(Video(data=video_path, embed=True))

Let's develop a function that will add bounding boxes to every frame in the video.

In [None]:
# Create a function to annotate the video at the provided path using labels from the provided dataframe, return the path of the video
def annotate_video(video_path: str, video_labels: pd.DataFrame) -> str:
    VIDEO_CODEC = "MP4V"
    HELMET_COLOR = (0, 0, 0)    # Black
    IMPACT_COLOR = (0, 0, 255)  # Red
    video_name = os.path.basename(video_path)
    
    vidcap = cv2.VideoCapture(video_path)
    fps = vidcap.get(cv2.CAP_PROP_FPS)
    width = int(vidcap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(vidcap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    output_path = "labeled_" + video_name
    tmp_output_path = "tmp_" + output_path
    output_video = cv2.VideoWriter(tmp_output_path, cv2.VideoWriter_fourcc(*VIDEO_CODEC), fps, (width, height))
    frame = 0
    while True:
        it_worked, img = vidcap.read()
        if not it_worked:
            break
        
        # We need to add 1 to the frame count to match the label frame index that starts at 1
        frame += 1
        
        # Let's add a frame index to the video so we can track where we are
        img_name = f"{video_name}_frame{frame}"
        cv2.putText(img, img_name, (0, 50), cv2.FONT_HERSHEY_SIMPLEX, 1.0, HELMET_COLOR, thickness=2)
    
        # Now, add the boxes
        boxes = video_labels.query("video == @video_name and frame == @frame")
        for box in boxes.itertuples(index=False):
            if box.impact == 1 and box.confidence > 1 and box.visibility > 0:    # Filter for definitive head impacts and turn labels red
                color, thickness = IMPACT_COLOR, 2
            else:
                color, thickness = HELMET_COLOR, 1
            # Add a box around the helmet
            cv2.rectangle(img, (box.left, box.top), (box.left + box.width, box.top + box.height), color, thickness=thickness)
            cv2.putText(img, box.label, (box.left, max(0, box.top - 5)), cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, thickness=1)
        output_video.write(img)
    output_video.release()
    
    # Not all browsers support this codec, so we re-encode the file at tmp_output_path with ffmpeg using a more broadly readable codec (H.264)
    if os.path.exists(output_path):
        os.remove(output_path)
    subprocess.run(["ffmpeg", "-i", tmp_output_path, "-crf", "18", "-preset", "veryfast", "-vcodec", "libx264", output_path])
    os.remove(tmp_output_path)
    
    return output_path

In [None]:
# Label the video and display it - this will take a bit
labeled_video = annotate_video(f"/kaggle/input/nfl-impact-detection/train/{video_name}", video_labels)
display(Video(data=labeled_video, embed=True))

If you watch the video carefully (or pause at the appropriate time), you will note that the bounding box flashes red at the moment of impact.  

You can get a list of the helmet impacts for this view in the following manner (this will provide the frames and labels for the players with impacts).

In [None]:
# Filter for definitive impacts labeled for this video
video_impacts = video_labels.loc[(video_labels.video == video_name) & (video_labels.impact == 1) & (video_labels.confidence > 1) & (video_labels.visibility > 0)]

len(video_impacts) # how many definitive impacts in this play

In [None]:
# Get this list of definitive impacts
video_impacts

Note that every play consists of two views - a sideline view and an endzone view.  So, to find the other view of this play:

In [None]:
# Get the name of the sideline view associated with this play and display it

sideline_video_name = video_name.replace("Endzone", "Sideline")
# Define the path and then display the video
sideline_video_path = f"/kaggle/input/nfl-impact-detection/train/{sideline_video_name}"
display(Video(data=sideline_video_path, embed=True))
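The replacement above only works when the starting video is an Endzone view.  A minimal sketch of a lookup that works in both directions follows, assuming the view name appears in the file name exactly as `Endzone` or `Sideline`; `other_view` is a hypothetical helper.

```python
def other_view(video_name: str) -> str:
    """Return the file name of the opposite camera view for the same play."""
    if "Endzone" in video_name:
        return video_name.replace("Endzone", "Sideline")
    if "Sideline" in video_name:
        return video_name.replace("Sideline", "Endzone")
    raise ValueError(f"No recognized view in video name: {video_name}")

print(other_view("57590_003607_Endzone.mp4"))   # 57590_003607_Sideline.mp4
print(other_view("57590_003607_Sideline.mp4"))  # 57590_003607_Endzone.mp4
```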

### Tracking Data

The player track file in .csv format includes player position, direction, and orientation data for each player during the entire course of the play collected using the Next Gen Stats (NGS) system. This data is indexed by gameKey, playID, and player, with the time variable providing a temporal index within an individual play.

In [None]:
track_data = pd.read_csv('/kaggle/input/nfl-impact-detection/train_player_tracking.csv')
track_data.head()
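Because the tracking rows are ordered in time within each player, per-player quantities can be derived with a group-wise difference.  The sketch below runs on a small made-up table rather than the real file; it assumes columns named `player`, `time`, `x`, and `y`, and `distance_travelled` is a hypothetical helper.  The result is in the same units as `x` and `y`.

```python
import numpy as np
import pandas as pd

# Made-up tracking rows for two players, for illustration only
toy_track = pd.DataFrame({
    "player": ["H1", "H1", "H1", "V2", "V2", "V2"],
    "time": pd.to_datetime([
        "2020-01-01 00:00:00.0", "2020-01-01 00:00:00.1", "2020-01-01 00:00:00.2",
        "2020-01-01 00:00:00.0", "2020-01-01 00:00:00.1", "2020-01-01 00:00:00.2",
    ]),
    "x": [10.0, 10.3, 10.7, 50.0, 50.0, 50.0],
    "y": [20.0, 20.4, 20.4, 30.0, 30.0, 30.0],
})

def distance_travelled(track: pd.DataFrame) -> pd.Series:
    """Total path length per player, summed over consecutive samples."""
    track = track.sort_values(["player", "time"])
    grouped = track.groupby("player")
    # Step length between consecutive samples of the same player;
    # the first sample of each player has no predecessor and yields NaN
    step = np.hypot(grouped["x"].diff(), grouped["y"].diff())
    return step.groupby(track["player"]).sum()

print(distance_travelled(toy_track).round(2))
```

Here the first player moves 0.5 and then 0.4 units for a total of 0.9, while the second player does not move.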

Let's filter the track data to analyze the same play we displayed above (it happens to be the first play in the file).

In [None]:
# Filter the track data to the play of interest
game_key = track_data['gameKey'][0]
play_id = track_data['playID'][0]
# Take a copy so that columns can be added later without modifying a slice of track_data
play_track = track_data.loc[(track_data.gameKey == game_key) & (track_data.playID == play_id)].copy()
len(play_track)

In [None]:
# See what events are stored in the data
play_track['event'].unique()

In [None]:
# Build a dataframe for the player positions at the snap

# Take a copy so that columns can be added later without modifying a slice of play_track
at_snap = play_track.loc[play_track.event == 'ball_snap'].copy()
at_snap

The following code for generating an image of a football field is borrowed (with permission) from Kaggle Grandmaster Rob Mulla.  You can see his original notebook here:  

https://www.kaggle.com/robikscube/nfl-big-data-bowl-plotting-player-position/notebook

In [None]:
def create_football_field(linenumbers=True,
                          endzones=True,
                          highlight_line=False,
                          highlight_line_number=50,
                          highlighted_name='Line of Scrimmage',
                          fifty_is_los=False,
                          figsize=(12, 6.33)):
    """
    Function that plots the football field for viewing plays.
    Allows for showing or hiding endzones.
    """
    rect = patches.Rectangle((0, 0), 120, 53.3, linewidth=0.1,
                             edgecolor='r', facecolor='forestgreen', zorder=0)  # changed the field color to forestgreen

    fig, ax = plt.subplots(1, figsize=figsize)
    ax.add_patch(rect)

    plt.plot([10, 10, 10, 20, 20, 30, 30, 40, 40, 50, 50, 60, 60, 70, 70, 80,
              80, 90, 90, 100, 100, 110, 110, 120, 0, 0, 120, 120],
             [0, 0, 53.3, 53.3, 0, 0, 53.3, 53.3, 0, 0, 53.3, 53.3, 0, 0, 53.3,
              53.3, 0, 0, 53.3, 53.3, 0, 0, 53.3, 53.3, 53.3, 0, 0, 53.3],
             color='white')
    if fifty_is_los:
        plt.plot([60, 60], [0, 53.3], color='gold')
        plt.text(62, 50, '<- Player Yardline at Snap', color='gold')
    # Endzones
    if endzones:
        ez1 = patches.Rectangle((0, 0), 10, 53.3,
                                linewidth=0.1,
                                edgecolor='r',
                                facecolor='blue',
                                alpha=0.2,
                                zorder=0)
        ez2 = patches.Rectangle((110, 0), 10, 53.3,  # width is 10 yards, matching ez1
                                linewidth=0.1,
                                edgecolor='r',
                                facecolor='blue',
                                alpha=0.2,
                                zorder=0)
        ax.add_patch(ez1)
        ax.add_patch(ez2)
    plt.xlim(0, 120)
    plt.ylim(-5, 58.3)
    plt.axis('off')
    if linenumbers:
        for x in range(20, 110, 10):
            numb = x
            if x > 50:
                numb = 120 - x
            plt.text(x, 5, str(numb - 10),
                     horizontalalignment='center',
                     fontsize=20,  # fontname='Arial',
                     color='white')
            plt.text(x - 0.95, 53.3 - 5, str(numb - 10),
                     horizontalalignment='center',
                     fontsize=20,  # fontname='Arial',
                     color='white', rotation=180)
    if endzones:
        hash_range = range(11, 110)
    else:
        hash_range = range(1, 120)

    for x in hash_range:
        ax.plot([x, x], [0.4, 0.7], color='white')
        ax.plot([x, x], [53.0, 52.5], color='white')
        ax.plot([x, x], [22.91, 23.57], color='white')
        ax.plot([x, x], [29.73, 30.39], color='white')

    if highlight_line:
        hl = highlight_line_number + 10
        plt.plot([hl, hl], [0, 53.3], color='yellow')
        plt.text(hl + 2, 50, '<- {}'.format(highlighted_name),
                 color='yellow')
    return fig, ax

create_football_field()
plt.show()

To start, we are going to plot the player positions at the snap.  Let's use a helper function to set the color for the home and visiting team.

In [None]:
# This is just a small helper function to set the color mapping for the Track Plot
# The visiting team *usually* wears white 
def set_color(row):
    if 'H' in row['player']:
        return "black"
    else:
        return "white"

at_snap['color'] = at_snap.apply(lambda row: set_color(row), axis=1)
at_snap

In [None]:
# Plot the positions of players at the snap

fig, ax = create_football_field()
at_snap.plot(x="x", y="y",  kind='scatter', ax=ax, color = at_snap['color'], s=300)
at_snap_home = at_snap.loc[at_snap['player'].str.contains('H')]
at_snap_away = at_snap.loc[at_snap['player'].str.contains('V')]

for index, row in at_snap_away.iterrows():
    ax.annotate(row['player'], (row['x'], row['y']), verticalalignment='center', horizontalalignment='center')
for index, row in at_snap_home.iterrows():
    ax.annotate(row['player'], (row['x'], row['y']), verticalalignment='center', horizontalalignment='center', color = 'white')
x_min = min(at_snap['x']) - 5
x_max = max(at_snap['x']) + 5
y_min = min(at_snap['y']) - 5
y_max = max(at_snap['y']) + 5
ax.set_xlim(x_min, x_max)
ax.set_ylim(y_min, y_max)
plt.show()

In [None]:
# Plot the positions of players through the play

play_track['color'] = play_track.apply(lambda row: set_color(row), axis=1)

# Filter to only include time after the snap
snap_time = at_snap['time'].iloc[0]
play_track = play_track.loc[play_track['time'] > snap_time]

fig, ax = create_football_field()
play_track.plot(x="x", y="y",  kind='scatter', ax=ax, color = play_track['color'], s= 1)

plt.show()

### Submission Instructions

Due to the custom metric, this competition relies on an evaluation pipeline that is slightly different from that of a typical code competition. Your notebook must import and submit via the custom nflimpact python module available in Kaggle notebooks. To submit, simply add these three lines at the end of your code:

```
# Code for generating a dataframe of your solution goes here
# solution_df = {a dataframe containing all of your predicted impacts}

import nflimpact
env = nflimpact.make_env()

env.predict(solution_df) # solution_df is a pandas dataframe of your entire submission file

```

The dataframe should be in the following format. Each row in your submission represents a single predicted bounding box for a helmet impact for the given frame. Note that it is not required to include labels of which players had an impact, only a bounding box where it occurred.

```
gameKey,playID,view,video,frame,left,width,top,height
57590,3607,Endzone,57590_003607_Endzone.mp4,1,1,1,1,1
57590,3607,Sideline,57590_003607_Sideline.mp4,1,1,1,1,1
57595,1252,Endzone,57595_001252_Endzone.mp4,1,1,1,1,1
57595,1252,Sideline,57595_001252_Sideline.mp4,1,1,1,1,1
etc.
```
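As a minimal sketch of preparing such a dataframe, the snippet below keeps only the columns listed in the header above, in that order, and fails early if one is missing.  The prediction values are made up, and `to_submission`, `SUBMISSION_COLUMNS`, and the extra `score` column are hypothetical names used only for illustration.

```python
import pandas as pd

# Column order taken from the sample submission header above
SUBMISSION_COLUMNS = ["gameKey", "playID", "view", "video", "frame",
                      "left", "width", "top", "height"]

def to_submission(predictions: pd.DataFrame) -> pd.DataFrame:
    """Select and order the submission columns, dropping any extras."""
    missing = [c for c in SUBMISSION_COLUMNS if c not in predictions.columns]
    if missing:
        raise ValueError(f"Missing submission columns: {missing}")
    return predictions[SUBMISSION_COLUMNS].reset_index(drop=True)

# Made-up predictions with an extra model score column
predictions = pd.DataFrame({
    "score": [0.91, 0.78],
    "gameKey": [57590, 57590],
    "playID": [3607, 3607],
    "view": ["Endzone", "Sideline"],
    "video": ["57590_003607_Endzone.mp4", "57590_003607_Sideline.mp4"],
    "frame": [1, 1],
    "left": [1, 1],
    "width": [1, 1],
    "top": [1, 1],
    "height": [1, 1],
})

solution_df = to_submission(predictions)
print(solution_df.columns.tolist())
```

The resulting `solution_df` would then be passed to `env.predict` as described above.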