# NFL Dataset quick overview 🏉

In this competition, we are given data in several formats.

I'll give a quick introduction to each one, focusing on how to load it.

## Contents

1. [Video data](#1)
1. [Tracking data](#2)
1. [Image data](#3)

# <div class="alert alert-block alert-info">Preparation</div>

Load libraries and create utility functions. 

In [None]:
import os

import cv2
from IPython.display import Video, display
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns

%matplotlib inline

In [None]:
#https://www.kaggle.com/robikscube/nfl-big-data-bowl-plotting-player-position 
def create_football_field(linenumbers=True,
                          endzones=True,
                          highlight_line=False,
                          highlight_line_number=50,
                          highlighted_name='Line of Scrimmage',
                          fifty_is_los=False,
                          figsize=(12, 6.33)):
    """
    Function that plots the football field for viewing plays.
    Allows for showing or hiding endzones.
    """
    rect = patches.Rectangle((0, 0), 120, 53.3, linewidth=0.1,
                             edgecolor='r', facecolor='darkgreen', zorder=0)

    fig, ax = plt.subplots(1, figsize=figsize)
    ax.add_patch(rect)

    plt.plot([10, 10, 10, 20, 20, 30, 30, 40, 40, 50, 50, 60, 60, 70, 70, 80,
              80, 90, 90, 100, 100, 110, 110, 120, 0, 0, 120, 120],
             [0, 0, 53.3, 53.3, 0, 0, 53.3, 53.3, 0, 0, 53.3, 53.3, 0, 0, 53.3,
              53.3, 0, 0, 53.3, 53.3, 0, 0, 53.3, 53.3, 53.3, 0, 0, 53.3],
             color='white')
    if fifty_is_los:
        plt.plot([60, 60], [0, 53.3], color='gold')
        plt.text(62, 50, '<- Player Yardline at Snap', color='gold')
    # Endzones
    if endzones:
        ez1 = patches.Rectangle((0, 0), 10, 53.3,
                                linewidth=0.1,
                                edgecolor='r',
                                facecolor='blue',
                                alpha=0.2,
                                zorder=0)
        ez2 = patches.Rectangle((110, 0), 10, 53.3,  # width is 10 yards, like ez1
                                linewidth=0.1,
                                edgecolor='r',
                                facecolor='blue',
                                alpha=0.2,
                                zorder=0)
        ax.add_patch(ez1)
        ax.add_patch(ez2)
    plt.xlim(0, 120)
    plt.ylim(-5, 58.3)
    plt.axis('off')
    if linenumbers:
        for x in range(20, 110, 10):
            numb = x
            if x > 50:
                numb = 120 - x
            plt.text(x, 5, str(numb - 10),
                     horizontalalignment='center',
                     fontsize=20,  # fontname='Arial',
                     color='white')
            plt.text(x - 0.95, 53.3 - 5, str(numb - 10),
                     horizontalalignment='center',
                     fontsize=20,  # fontname='Arial',
                     color='white', rotation=180)
    if endzones:
        hash_range = range(11, 110)
    else:
        hash_range = range(1, 120)

    for x in hash_range:
        ax.plot([x, x], [0.4, 0.7], color='white')
        ax.plot([x, x], [53.0, 52.5], color='white')
        ax.plot([x, x], [22.91, 23.57], color='white')
        ax.plot([x, x], [29.73, 30.39], color='white')

    if highlight_line:
        hl = highlight_line_number + 10
        plt.plot([hl, hl], [0, 53.3], color='yellow')
        plt.text(hl + 2, 50, '<- {}'.format(highlighted_name),
                 color='yellow')
    return fig, ax

create_football_field()
plt.show()

<a id="1"></a> <br>
# <div class="alert alert-block alert-success">Video data</div>

train_labels.csv tells us which mp4 files make up the training data.

In [None]:
video_train = pd.read_csv("../input/nfl-impact-detection/train_labels.csv")
video_train.head()

The mp4 file names are in the video column. The mp4 files referenced in train_labels.csv are stored in this directory:

In [None]:
!ls ../input/nfl-impact-detection/train
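
The file names listed above, such as 58098_001193_Endzone.mp4, appear to consist of three parts joined by underscores. Here is a minimal sketch of splitting a name into those parts and building its full path. The helper names `parse_video_name` and `video_path` are hypothetical, and reading the three parts as a game key, a play ID, and a camera view is my assumption based on the example file name.

```python
import os


def parse_video_name(video_name):
    """Split a name like '58098_001193_Endzone.mp4' into its three parts."""
    stem, _ext = os.path.splitext(video_name)
    game_key, play_id, view = stem.split("_")
    # Assumption: the first two parts are integer keys, so leading zeros are dropped.
    return int(game_key), int(play_id), view


def video_path(video_name, root="../input/nfl-impact-detection/train"):
    """Build the path to a training video from its file name."""
    return os.path.join(root, video_name)


print(parse_video_name("58098_001193_Endzone.mp4"))
print(video_path("58098_001193_Endzone.mp4"))
```

Parsing the name this way could help match a video to rows in the other files, provided those files really use the same keys.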

With the IPython.display module, we can play an mp4 file in a Jupyter notebook.

In [None]:
display(Video(data="/kaggle/input/nfl-impact-detection/train/58098_001193_Endzone.mp4", embed=True))

We can load an mp4 video as image data with OpenCV.

In [None]:
video = cv2.VideoCapture("/kaggle/input/nfl-impact-detection/train/58098_001193_Endzone.mp4")

We can also get the video's width, height, FPS, and frame count.

In [None]:
print("Width", video.get(cv2.CAP_PROP_FRAME_WIDTH))

print("Height",video.get(cv2.CAP_PROP_FRAME_HEIGHT))

print("FPS",video.get(cv2.CAP_PROP_FPS))

print("Frame Count",video.get(cv2.CAP_PROP_FRAME_COUNT))
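
From the FPS and frame count we can derive the video length and the timestamp of any frame. The sketch below shows the arithmetic with hypothetical helper names. The numbers in the example are made up and are not taken from the dataset.

```python
def video_duration_seconds(frame_count, fps):
    """Length of the video in seconds: number of frames divided by frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame_count / fps


def frame_timestamp_seconds(frame_index, fps):
    """Time at which the frame with the given 0-based index is shown."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame_index / fps


# Made-up example values, not read from a real video.
print(video_duration_seconds(frame_count=300, fps=60.0))
print(frame_timestamp_seconds(frame_index=120, fps=60.0))
```

With a real video, the values returned by `video.get(cv2.CAP_PROP_FRAME_COUNT)` and `video.get(cv2.CAP_PROP_FPS)` can be passed in directly.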

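The first frame is:

In [None]:
ret, frame = video.read()
# OpenCV returns frames in BGR order; convert to RGB so matplotlib shows the right colors.
plt.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))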

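The frame at index 100 (counting from 0) is:

In [None]:
video.set(cv2.CAP_PROP_POS_FRAMES, 100)
ret, frame = video.read()
# OpenCV returns frames in BGR order; convert to RGB so matplotlib shows the right colors.
plt.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))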

<a id="2"></a> <br>
# <div class="alert alert-block alert-success">Tracking data</div>

This data contains players' tracking data during games.

In this section, I'll visualize the data.

Many analyses of tracking data can also be found in [NFL Big Data Bowl 2021](https://www.kaggle.com/c/nfl-big-data-bowl-2021).

In [None]:
tracking_train = pd.read_csv("../input/nfl-impact-detection/train_player_tracking.csv")
tracking_train.head()

There are many time points.

In [None]:
tracking_train.shape

In [None]:
tracking_train[(tracking_train["playID"]==82)&(tracking_train["player"]=="H96")]

x and y are floats that represent field coordinates.

In [None]:
playID = 82
player = "H90"
tracking_data = tracking_train[(tracking_train["playID"]==playID)&(tracking_train["player"]==player)]

fig, ax = create_football_field()
g = sns.scatterplot(data=tracking_data, x="x", y="y", color="red", ax=ax)
g.set_title(f"Tracking data of playID: {playID} and player: {player}", fontsize=15)

s and a are the player's speed and acceleration at each time point. dis is the Euclidean distance between the player's position at this time point and at the previous one.

In [None]:
# Create the figure and Axes, and set the title.
fig, axes = plt.subplots(2, 2, figsize=(10,6), gridspec_kw=dict(wspace=0.1, hspace=0.6))
fig.suptitle(f"Movement information of playID: {playID} and player: {player}", fontsize=15)

# Remove the two left-hand Axes and replace them with one tall Axes.
gs = axes[0, 1].get_gridspec()
axes[0, 0].remove()
axes[1, 0].remove()
axbig = fig.add_subplot(gs[:, 0])

# Add three histograms (sns.histplot replaces the deprecated sns.distplot).
sns.histplot(tracking_data["dis"], kde=False, color="red", ax=axbig)
axbig.set_title("Distribution of dis", fontsize=12)

sns.histplot(tracking_data["s"], kde=False, color="blue", ax=axes[0, 1])
axes[0, 1].set_title('Distribution of s', fontsize=12)

sns.histplot(tracking_data["a"], kde=False, color="green", ax=axes[1, 1])
axes[1, 1].set_title('Distribution of a', fontsize=12)
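
To make the meaning of dis concrete, here is a minimal sketch that recomputes a per-step Euclidean distance from consecutive (x, y) positions. The helper name `step_distances` is hypothetical and the coordinates in the example are made up. The dataset's own dis values may have been computed differently, so treat this only as an illustration of the definition.

```python
import math


def step_distances(xs, ys):
    """Euclidean distance from each point to the previous one.

    The first entry is None because the first point has no predecessor.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    distances = [None] if xs else []
    for i in range(1, len(xs)):
        distances.append(math.hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1]))
    return distances


# Made-up positions: one step along a 3-4-5 triangle, then no movement.
print(step_distances([0.0, 3.0, 3.0], [0.0, 4.0, 4.0]))
```

For one player in one play, the x and y columns (sorted by time and converted with `.tolist()`) could be passed in and the result compared with the dis column.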

o and dir are angles in degrees. o is the player's orientation, meaning the direction the player is facing. dir is the direction of the player's motion.

In [None]:
f,ax=plt.subplots(1,2,figsize=(13,5))

# sns.histplot replaces the deprecated sns.distplot.
sns.histplot(tracking_data["o"], color="olive", kde=False, ax=ax[0])
ax[0].set_title("o distribution")

sns.histplot(tracking_data["dir"], color="darkmagenta", kde=False, ax=ax[1])
ax[1].set_xlabel("dir")
ax[1].set_title('dir distribution')


plt.show()

### Note how these angles are defined!

![field](https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3258%2F820e86013d48faacf33b7a32a15e814c%2FIncreasing%20Dir%20and%20O.png?generation=1572285857588233&alt=media)
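
As a rough sketch of what this angle definition means in practice, the code below converts an angle into x and y components. It assumes my reading of the figure: 0 degrees points along increasing y, and angles increase clockwise so that 90 degrees points along increasing x. If the data uses a different convention, swap or negate the components. The function names are hypothetical.

```python
import math


def angle_to_unit_vector(angle_deg):
    """Unit vector (dx, dy) for an angle measured clockwise from the +y axis.

    Assumption: 0 degrees = +y and 90 degrees = +x, as read from the figure.
    """
    theta = math.radians(angle_deg)
    return math.sin(theta), math.cos(theta)


def velocity_components(speed, dir_deg):
    """Split a speed into x and y components using the direction of motion."""
    ux, uy = angle_to_unit_vector(dir_deg)
    return speed * ux, speed * uy


# Made-up example: moving at speed 2 with dir = 90 degrees.
vx, vy = velocity_components(2.0, 90.0)
print(round(vx, 6), round(vy, 6))
```

Applying `velocity_components` to the s and dir columns gives a velocity vector per time point, which can be handy for drawing arrows on the field plot.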

<a id="3"></a> <br>
# <div class="alert alert-block alert-success">Image data</div>

image_labels.csv contains information about the image data.

In [None]:
image_train = pd.read_csv("../input/nfl-impact-detection/image_labels.csv")
image_train.head()

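I'll load the first image.

In [None]:
im = cv2.imread("../input/nfl-impact-detection/images/" + image_train["image"][0])
# cv2.imread returns BGR; convert to RGB so matplotlib shows the right colors.
plt.imshow(cv2.cvtColor(im, cv2.COLOR_BGR2RGB))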