This is a series of kernels for the NFL Punt Analytics competition:
1. [NFL Punt: Game Mechanics](https://www.kaggle.com/argentium/nfl-punt-game-mechanics)
2. [Group Dynamics](https://www.kaggle.com/argentium/nfl-punt-group-dynamics)
3. [Collision Pairs](https://www.kaggle.com/argentium/nfl-punt-collision-pairs)
4. [Penalties](https://www.kaggle.com/argentium/nfl-punt-penalties)

In the previous kernel, I identified these key findings:

1. Coverage Formation Blocking
2. Punt Returner Collisions
3. Gunner Collisions
4. Block Opposition
5. Friendly-Fire Collisions (Without the Gunner)

This is a continuation of the NFL Punt: Game Mechanics kernel.

#### Note: This kernel uses metric units.
1. Distances are in meters
2. Speed is in kilometers per hour.
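
For readers more used to football units, the metric values quoted in this kernel can be converted back with a couple of one-liners. This is only a convenience sketch; the helper names `meters_to_yards` and `kph_to_mph` are hypothetical and are not used elsewhere in the kernel.

```python
METERS_PER_YARD = 0.9144   # exact, by definition of the international yard
KM_PER_MILE = 1.609344     # exact, by definition of the international mile

def meters_to_yards(meters):
    """Convert a distance in meters to yards."""
    return meters / METERS_PER_YARD

def kph_to_mph(kph):
    """Convert a speed in kilometers per hour to miles per hour."""
    return kph / KM_PER_MILE

# Example: a 10 m gap and the 20 kph threshold used later in the text
print(round(meters_to_yards(10), 2))
print(round(kph_to_mph(20), 2))
```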

In [None]:
import os
import re
import pandas as pd
import numpy as np
import seaborn as sns

import scipy
import math
from matplotlib import pyplot as plt

import statsmodels.api as sm
from statsmodels.formula.api import ols

In [None]:
df_play_info = pd.read_csv('../input/play_information.csv')
df_punt_role = pd.read_csv('../input/play_player_role_data.csv')
df_injury = pd.read_csv('../input/video_review.csv')

In [None]:
team_positions = {'Return': 
                  ['VR', 'VRo', 'VRi', 
                   'VL', 'VLo', 'VLi',
                   'PDR1', 'PDR2', 'PDR3', 'PDR4', 'PDR5', 'PDR6',
                   'PDM',
                   'PDL1', 'PDL2', 'PDL3', 'PDL4', 'PDL5', 'PDL6',
                   'PLR', 'PLR1', 'PLR2', 'PLR3',
                   'PLM', 'PLM1',
                   'PLL', 'PLL1', 'PLL2', 'PLL3', 'PLLi',
                   'PR', 'PFB'
                   ],
     'Coverage': ['GR', 'GRo', 'GRi',
                  'GL', 'GLo', 'GLi',
                  'PRG', 'PRT', 'PRW',
                  'PPR', 'PPRo', 'PPRi', 
                  'PPL', 'PPLo', 'PPLi',
                  'P', 'PC', 'PLS',
                  'PLW', 'PLT', 'PLG'
                  ]}

role_categories = {'G': ['GR', 'GRo', 'GRi',
                        'GL', 'GLo', 'GLi'],
                      'Coverage_Center': ['PRG', 'PLG', 'PRT', 'PLT', 'PRW', 'PLW'],
                  'PP': ['PPR', 'PPRo', 'PPRi',
                         'PPL', 'PPLo', 'PPLi'],
                  'P': ['P'],
                  'PC': ['PC'],
                  'PLS': ['PLS'],
                    'V': ['VR', 'VRo', 'VRi',
                        'VL', 'VLo', 'VLi'],
                  'PD': ['PDR1', 'PDR2', 'PDR3', 'PDR4', 'PDR5', 'PDR6',
                          'PDM',
                         'PDL1', 'PDL2', 'PDL3', 'PDL4', 'PDL5', 'PDL6'],
                  'PL': ['PLR', 'PLR1', 'PLR2', 'PLR3',
                         'PLM', 'PLM1',
                         'PLL', 'PLL1', 'PLL2', 'PLL3', 'PLLi'],
                  'PR': ['PR'],
                  'PFB': ['PFB']
                 }

# Add the corresponding side of their role
def set_team(role):
    for team in team_positions.keys():
        if str(role) in team_positions[team]:
            return str(team)
    return None

def set_role_category(role):
    for category in role_categories.keys():
        if str(role) in role_categories[category]:
            return str(category)
    return None

df_punt_role['Team'] = df_punt_role.apply(lambda row: set_team(row['Role']), axis=1)
df_punt_role['Role_Category'] = df_punt_role.apply(lambda row: set_role_category(row['Role']), 
                                                axis=1)
df_punt_role = df_punt_role.drop(columns=['Season_Year'])
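
Assuming each role appears in only one list, the nested loops in `set_team` and `set_role_category` can also be expressed as a single dictionary lookup. The sketch below uses a shortened, illustrative subset of the role lists and a hypothetical `invert` helper; it is an alternative formulation, not the code this kernel runs.

```python
def invert(groups):
    """Turn {group: [roles]} into {role: group}."""
    return {role: group for group, roles in groups.items() for role in roles}

# Shortened subset of the dictionaries above, for illustration only.
team_positions_demo = {'Return': ['PR', 'VR', 'PDM'],
                       'Coverage': ['GL', 'GR', 'P', 'PLS']}

role_to_team = invert(team_positions_demo)

print(role_to_team.get('GL'))       # Coverage
print(role_to_team.get('PR'))       # Return
print(role_to_team.get('unknown'))  # None, the same fallback as set_team()
```

Building the inverted dictionary once avoids scanning every list for every row.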

In [None]:
def get_goal(activity):
    if (activity == 'Blocking') or (activity == 'Tackled'):
        return 'Offensive'
    else:
        return 'Defensive'

# Derive the play phase (1 or 2) from the team and the goal of the activity
def set_phase(row):
    goal = get_goal(row['Player_Activity_Derived'])
    if row['Team'] == 'Coverage':
        if goal == 'Offensive':
            return 1
        else:
            return 2
    else: # Return Team
        if goal == 'Offensive':
            return 2
        else:
            return 1

# Convert to int data type
df_injury['Primary_Partner_GSISID'] = df_injury.apply(lambda row: 
                                                                  row['Primary_Partner_GSISID'] 
                                                                  if (row['Primary_Partner_GSISID'] != 'Unclear')
                                                                 else 0,
                                                                 axis=1)
df_injury['Primary_Partner_GSISID'] = df_injury['Primary_Partner_GSISID'].fillna(0)
df_injury['Primary_Partner_GSISID'] = df_injury['Primary_Partner_GSISID'].astype(int)

# Merge with df_punt_role
df_injury = df_injury.merge(df_punt_role,
                                right_on=['GameKey', 'PlayID', 'GSISID'],
                                 left_on=['GameKey', 'PlayID', 'GSISID'],
                           how='left')
df_injury = df_injury.merge(df_punt_role,
                                right_on=['GameKey', 'PlayID', 'GSISID'],
                                 left_on=['GameKey', 'PlayID', 'Primary_Partner_GSISID'],
                            suffixes=('', '_Partner'),
                           how='left')
df_injury['Phase'] = df_injury.apply(lambda row: 
                                                set_phase(row), 
                                                axis=1)
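
The phase assignment above boils down to a four-way lookup on team and derived goal. The sketch below restates that logic as a table, using a hypothetical `phase_for` helper, so that the combinations landing in each phase are easy to check.

```python
# Activities that get_goal() above labels 'Offensive'; everything else is 'Defensive'.
OFFENSIVE_ACTIVITIES = {'Blocking', 'Tackled'}

# (team, goal) -> phase, mirroring the branches of set_phase() above.
PHASE_TABLE = {
    ('Coverage', 'Offensive'): 1,
    ('Coverage', 'Defensive'): 2,
    ('Return', 'Offensive'): 2,
    ('Return', 'Defensive'): 1,
}

def phase_for(team, activity):
    goal = 'Offensive' if activity in OFFENSIVE_ACTIVITIES else 'Defensive'
    # set_phase() treats any team other than 'Coverage' as the Return team.
    team_key = 'Coverage' if team == 'Coverage' else 'Return'
    return PHASE_TABLE[(team_key, goal)]

for team in ('Coverage', 'Return'):
    for activity in ('Blocking', 'Blocked', 'Tackling', 'Tackled'):
        print(team, activity, phase_for(team, activity))
```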

### NGS Data

For the data-heavy processing, I use the code from [How to Import Large CSV Files](https://www.kaggle.com/akosciansky/how-to-import-large-csv-files-and-save-efficiently).

In [None]:
import gc
import tqdm
import feather

dtypes = {'Season_Year': 'int16',
         'GameKey': 'int64',
         'PlayID': 'int16',
         'GSISID': 'float32',
         'Time': 'str',
         'x': 'float32',
         'y': 'float32',
         'dis': 'float32',
         'o': 'float32',
         'dir': 'float32',
         'Event': 'str'}
col_names = list(dtypes.keys())

ngs_files = ['NGS-2016-pre.csv',
             'NGS-2016-reg-wk1-6.csv',
             'NGS-2016-reg-wk7-12.csv',
             'NGS-2016-reg-wk13-17.csv',
             'NGS-2016-post.csv',
             'NGS-2017-pre.csv',
             'NGS-2017-reg-wk1-6.csv',
             'NGS-2017-reg-wk7-12.csv',
             'NGS-2017-reg-wk13-17.csv',
             'NGS-2017-post.csv']

# Load each ngs file and append it to a list. 
# We will turn this into a DataFrame in the next step

df_list = []

for i in tqdm.tqdm(ngs_files):
    df = pd.read_csv('../input/'+i, usecols=col_names,dtype=dtypes)
    
    df_list.append(df)

# Merge all dataframes into one dataframe
ngs = pd.concat(df_list)

# Delete the dataframe list to release memory
del df_list
gc.collect()

# Convert Time to datetime
ngs['Time'] = pd.to_datetime(ngs['Time'], format='%Y-%m-%d %H:%M:%S')

# There are 2536 out of 66,492,490 cases where GSISID is NAN. Let's drop those to convert the data type
ngs = ngs[~ngs['GSISID'].isna()]

# Convert GSISID to integer
ngs['GSISID'] = ngs['GSISID'].astype('int32')

# ngs.set_index(['GameKey', 'PlayID', 'GSISID'], inplace=True)

# I. Data Pre-processing

To save memory, I extracted only the plays with injuries.

In [None]:
# Get Injury Moves
# Ensure same data types
columns = ['GameKey', 'PlayID']
for col in columns:
    df_injury[col] = df_injury[col].astype(ngs[col].dtype)
    df_punt_role[col] = df_punt_role[col].astype(ngs[col].dtype)

df_injury['GSISID'] = df_injury['GSISID'].astype(ngs['GSISID'].dtype)
df_injury['Primary_Partner_GSISID'] = df_injury['Primary_Partner_GSISID'].astype(ngs['GSISID'].dtype)

# Get Only Games with Injuries
df_injury_moves = ngs.merge(df_injury[['GameKey', 'PlayID']],
                            left_on=['GameKey', 'PlayID'],
                            right_on=['GameKey', 'PlayID'])

# Create Gameplay ID
df_injury_moves['Gameplay'] = df_injury_moves.apply(lambda row: 
                                            str(row['GameKey']) + '_' + 
                                            str(row['PlayID']),
                                           axis=1)

# Delete the full NGS dataframe to release memory
del ngs
gc.collect()

## A. Basic Info

In [None]:
# I added the role for easier categorization
df_injury_moves = df_injury_moves.merge(df_punt_role,
                                  left_on=['GameKey', 'PlayID', 'GSISID'],
                                  right_on=['GameKey', 'PlayID', 'GSISID'],
                                 suffixes=('', '_Injury'))

### 1. Distance (Conversion of yards to meters)

In [None]:
df_injury_moves['x'] = 0.9144 * df_injury_moves['x']
df_injury_moves['y'] = 0.9144 * df_injury_moves['y']
df_injury_moves['dis'] = 0.9144 * df_injury_moves['dis']

### 2. Time (Time elapsed since the start of the play)

In [None]:
# Get only events
df_events = df_injury_moves.dropna(subset=['Event'])
df_events_indexed = df_events.set_index(['GameKey', 'PlayID', 'Event'])

# Get Aggregates
df_injury_moves_gameplay = df_injury_moves.groupby(['GameKey', 'PlayID']).agg({'Time': ['min', 'max'],
                                                                               'dis': ['min', 'median', 'max', 'sum'],
                                                                               'dir': ['min', 'median', 'max'],
                                                                               'o': ['min', 'median', 'max']})

# Get start time by subtracting the time at play start in the specified GameKey, PlayID
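def get_start_time(row):
    try:
        # .loc returns every 'ball_snap' row of the play; take the first timestamp
        start = df_events_indexed.loc[(row['GameKey'], row['PlayID'], 'ball_snap')]['Time'].iloc[0]
    except KeyError:
        # The play has no 'ball_snap' event
        return None
    # Seconds elapsed between the snap and this row
    return (row['Time'] - start).total_seconds()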

df_injury_moves['PlayStartTime'] = df_injury_moves.apply(lambda row: 
                                                         get_start_time(row),
                                                         axis=1)
# Remove data recorded before the start of the punt play (before the snap)
df_injury_moves = df_injury_moves[df_injury_moves['PlayStartTime'] >= 0]
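
The row-by-row lookup above can be slow on large frames. A vectorized alternative, sketched here on toy data with made-up values, finds each play's first `ball_snap` timestamp once and broadcasts it back to every row. It is an illustration of the idea, not the code that produced the results in this kernel.

```python
import pandas as pd

# Toy tracking data: one play, with the snap at the second timestamp.
toy = pd.DataFrame({
    'GameKey': [1, 1, 1, 1],
    'PlayID': [10, 10, 10, 10],
    'Time': pd.to_datetime(['2016-09-01 20:00:00.0',
                            '2016-09-01 20:00:00.1',
                            '2016-09-01 20:00:00.2',
                            '2016-09-01 20:00:00.3']),
    'Event': [None, 'ball_snap', None, None],
})

# Keep the timestamp only on snap rows, then take the earliest one per play.
snap_time = (toy['Time'].where(toy['Event'] == 'ball_snap')
             .groupby([toy['GameKey'], toy['PlayID']])
             .transform('min'))

# Seconds elapsed since the snap; negative values are before the snap.
toy['PlayStartTime'] = (toy['Time'] - snap_time).dt.total_seconds()
toy = toy[toy['PlayStartTime'] >= 0]
print(toy[['Time', 'PlayStartTime']])
```

Plays without a `ball_snap` event get a missing snap time, so their rows are dropped by the final filter, as in the original approach.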

### 3. Speed (in kilometers per hour or kph)
I computed the speed from the distance covered in each row (`dis`) divided by the time between rows, which the code takes to be 0.1 seconds.

In [None]:
def get_speed(row):
    meters_per_sec = row['dis'] / 0.1
    kph = 3.6 * meters_per_sec
    return kph

df_injury_moves = df_injury_moves.sort_values(by=['GameKey', 'PlayID', 'GSISID', 'PlayStartTime'])
df_injury_moves['kph'] = df_injury_moves.apply(lambda row: get_speed(row), axis=1)
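
Because the speed formula is the same for every row, it can also be applied to the whole column at once. A small sketch on made-up `dis` values (already in meters), assuming as above that consecutive rows are 0.1 seconds apart; `SAMPLE_INTERVAL_S` is a name introduced here for illustration.

```python
import pandas as pd

SAMPLE_INTERVAL_S = 0.1  # assumed time between consecutive tracking rows

toy = pd.DataFrame({'dis': [0.0, 0.5, 0.8]})  # meters moved per row (made up)

# meters per row -> meters per second -> kilometers per hour
toy['kph'] = toy['dis'] / SAMPLE_INTERVAL_S * 3.6
print(toy)
```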

### 4. PR Distance
Since the PR plays a central role in the Punt play, I decided to compute player distances from the PR.

In [None]:
# Get the PR Moves
df_pr_moves = df_injury_moves[df_injury_moves['Role']=='PR']
df_pr_moves = df_pr_moves.set_index(['GameKey', 'PlayID', 'PlayStartTime'])

# Compute each distance from the PR
def get_distance(row):
    try:
        coordinates = df_pr_moves.loc[(row['GameKey'], row['PlayID'], row['PlayStartTime'])]
        return math.sqrt(pow(coordinates['x'] - row['x'], 2) + pow(coordinates['y'] - row['y'], 2))
    except:
        return None

df_injury_moves['PR_Distance'] = df_injury_moves.apply(lambda row: get_distance(row), 
                                                       axis=1)
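
The same distance can be obtained without a per-row lookup by merging the PR's position onto every row of the play. The sketch below uses toy positions with made-up values and assumes exactly one PR row per play and timestamp; the column names `x_PR` and `y_PR` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Toy positions in meters for one play at two timestamps (made-up values).
toy = pd.DataFrame({
    'GameKey': [1, 1, 1, 1],
    'PlayID': [10, 10, 10, 10],
    'PlayStartTime': [0.0, 0.0, 0.1, 0.1],
    'Role': ['PR', 'GL', 'PR', 'GL'],
    'x': [10.0, 13.0, 10.0, 16.0],
    'y': [20.0, 24.0, 20.0, 28.0],
})

keys = ['GameKey', 'PlayID', 'PlayStartTime']

# One row per (play, timestamp) holding the PR's position.
pr = (toy.loc[toy['Role'] == 'PR', keys + ['x', 'y']]
      .rename(columns={'x': 'x_PR', 'y': 'y_PR'}))

# Attach the PR position to every player row; rows without a PR get NaN.
merged = toy.merge(pr, on=keys, how='left')
merged['PR_Distance'] = np.hypot(merged['x'] - merged['x_PR'],
                                 merged['y'] - merged['y_PR'])
print(merged[['PlayStartTime', 'Role', 'PR_Distance']])
```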

## B. Collision Pairs

I listed the moves of the involved pairs first.

In [None]:
df_injury_moves_player = df_injury_moves.merge(df_injury[['GameKey', 'PlayID', 'GSISID']],
                                                left_on=['GameKey', 'PlayID', 'GSISID'],
                                                 right_on=['GameKey', 'PlayID', 'GSISID'],
                                                 how='right')
df_injury_moves_partner = df_injury_moves.merge(df_injury[['GameKey', 'PlayID', 
                                                          'Primary_Partner_GSISID']],
                                                left_on=['GameKey', 'PlayID', 'GSISID'],
                                                 right_on=['GameKey', 'PlayID', 'Primary_Partner_GSISID'],
                                                 how='right')

# Put it all in a list
df_injury_moves_player['Involvement'] = 'Player'
df_injury_moves_partner['Involvement'] = 'Partner'
df_injury_moves_involved = pd.concat([df_injury_moves_player, df_injury_moves_partner])

### 1. Pair Distances

I computed the distance between the partner pairs across time.

In [None]:
# Put it side-by-side in a row
df_involved_pairs = df_injury_moves_player.merge(df_injury_moves_partner,
                                                left_on=['GameKey', 'PlayID', 'PlayStartTime'],
                                                right_on=['GameKey', 'PlayID', 'PlayStartTime'],
                                                suffixes=('_Player', '_Partner'))

# Compute Pair Distance
def get_distance(row):
    return math.sqrt(pow(row['x_Player'] - row['x_Partner'], 2) + 
                     pow(row['y_Player'] - row['y_Partner'], 2))


df_involved_pairs['PairDistance'] = df_involved_pairs.apply(lambda row:
                                                            get_distance(row),
                                                            axis=1)
# Create Gameplay ID
df_involved_pairs['Gameplay'] = df_involved_pairs.apply(lambda row: 
                                            str(row['GameKey']) + '_' + 
                                            str(row['PlayID']),
                                           axis=1)

### 2. Collision Point

Since I already have the distance between the paired players, the collision point is easy to extract. A collision occurs when the two players are very close to each other, so I kept only the row with the minimum pair distance for each play, which should coincide with the time of collision.

In [None]:
df_min_distances = df_involved_pairs.groupby(["GameKey", "PlayID"])['PairDistance'].idxmin()
df_collision_point = df_involved_pairs.loc[df_min_distances]
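
The minimum pair distance is only a proxy for the moment of contact. The toy example below shows how the `idxmin` selection works and adds a sanity check that flags plays whose closest approach is still far apart. The values are made up, and the 1-meter threshold `MAX_CONTACT_DISTANCE_M` is an assumption introduced here, not something used in this kernel.

```python
import pandas as pd

# Toy pair distances (meters) for two plays; values are made up.
pairs = pd.DataFrame({
    'GameKey': [1, 1, 1, 2, 2, 2],
    'PlayID': [10, 10, 10, 20, 20, 20],
    'PlayStartTime': [0.0, 0.1, 0.2, 0.0, 0.1, 0.2],
    'PairDistance': [5.0, 0.4, 2.0, 3.0, 1.5, 2.2],
})

# idxmin returns, per play, the index label of the row with the smallest distance.
closest_idx = pairs.groupby(['GameKey', 'PlayID'])['PairDistance'].idxmin()
closest = pairs.loc[closest_idx]

# Hypothetical threshold: a closest approach above it may not be real contact.
MAX_CONTACT_DISTANCE_M = 1.0
closest = closest.assign(
    plausible_contact=closest['PairDistance'] <= MAX_CONTACT_DISTANCE_M)
print(closest)
```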

# II. Analysis

## A. Coverage Formation Blocking

### What are the trends in the Blocking interactions?

In the previous analysis, I identified a weakness in the Coverage formation before the kick. Here, I study the movements further.

#### X Axis

In [None]:
df_injury_phase1 = df_injury[df_injury['Phase']==1]
df_injury_phase1_blocking = df_injury_phase1[df_injury_phase1['Player_Activity_Derived']=='Blocking']
df_pair = df_injury_moves_involved.merge(df_injury_phase1_blocking,
                                                left_on=['GameKey', 'PlayID'],
                                                 right_on=['GameKey', 'PlayID'],
                                                 how='right')
df_pair = df_pair[df_pair['PlayStartTime']<15]

# Graph
g = sns.FacetGrid(df_pair, hue='Involvement', col="Gameplay", height=5) 
g = g.map(plt.scatter, "PlayStartTime", "x", marker=".")

The x axis movement shows how the blocking person (blue) is pushed back by the opponent. The players are clearly facing each other.

#### Y Axis

In [None]:
# Graph
g = sns.FacetGrid(df_pair, hue='Involvement', col="Gameplay", height=5) 
g = g.map(plt.scatter, "PlayStartTime", "y", marker=".")

Looking at the y coordinate, the opponents faced each other at an angle. While the blocking person was pushed back, he was also pushed sideways by the opponent. This weakens the blocking position of the defender.

#### Finding:
1. The blocking person is pushed backwards by the person in front.
2. However, the push also has a sideways component, which makes it harder for the person to block.
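
One way to put numbers on the backward and sideways push is to compare the net displacement along each axis over the first moments of the play. The sketch below does this for a single made-up track. Which sign of `dx` counts as "backwards" depends on the direction of play, which is not handled here.

```python
import pandas as pd

# Made-up positions (meters) of a blocking player during the first second.
track = pd.DataFrame({
    'PlayStartTime': [0.0, 0.5, 1.0],
    'x': [30.0, 29.4, 28.8],
    'y': [25.0, 25.3, 25.9],
})
track = track.sort_values('PlayStartTime')

# Net displacement along each axis between the first and last sample.
dx = track['x'].iloc[-1] - track['x'].iloc[0]
dy = track['y'].iloc[-1] - track['y'].iloc[0]
print(round(dx, 2), round(dy, 2))
```

A `dy` that is large relative to `dx` would be consistent with the angled push described above.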

## B. Punt Returner Tackles

### What happens when the PR is tackled?

#### X Axis

In [None]:
df_injury_phase2 = df_injury[df_injury['Phase']==2]
df_injury_phase2_tackled = df_injury_phase2[df_injury_phase2['Player_Activity_Derived']=='Tackled']
df_pair = df_injury_moves_involved.merge(df_injury_phase2_tackled,
                                                left_on=['GameKey', 'PlayID'],
                                                 right_on=['GameKey', 'PlayID'],
                                                 how='right')
df_pair = df_pair[df_pair['PlayStartTime']<15]

# Graph
g = sns.FacetGrid(df_pair, hue='Involvement', col="Gameplay", height=5) 
g = g.map(plt.scatter, "PlayStartTime", "x", marker=".")

The main players are:
- The PR (Blue)
- The Tackling player (Orange)

The two players involved came from different sides of the field. The PR is often placed further back to catch the ball, while the Coverage team lines up near the line of scrimmage.

For both, there is an initial delay of running across the field, but their reasons are different.
- The PR is waiting for the ball
- The tackling player is likely engaged in blocking the Return team

As time goes on, the two change priorities.
- In some cases, the PR noticeably ran backwards to fetch the ball.
- The tackling player disengages from the initial formation and runs towards the PR.

Often, the tackling player has already travelled a long distance by the time of contact, which also means he is moving quite fast. Such conditions make the PR prone to injuries. Below, I display the y coordinate over time to show that the two generally see each other as they move from opposite directions.

#### Y Axis

In [None]:
# Graph
g = sns.FacetGrid(df_pair, hue='Involvement', col="Gameplay", height=5) 
g = g.map(plt.scatter, "PlayStartTime", "y", marker=".")

PR injuries often occur when the opponent comes from a different side (y coordinate) of the field.

### What happens when the Tackling Player is injured?

#### X Coordinate

In [None]:
df_injury_phase2_tackling = df_injury_phase2[df_injury_phase2['Player_Activity_Derived']=='Tackling']
df_injury_phase2_tackling_opponent = df_injury_phase2_tackling[df_injury_phase2_tackling['Friendly_Fire']=='No']
df_pair = df_injury_moves_involved.merge(df_injury_phase2_tackling_opponent,
                                                left_on=['GameKey', 'PlayID'],
                                                 right_on=['GameKey', 'PlayID'],
                                                 how='right')
df_pair = df_pair[df_pair['PlayStartTime']<15]

# Graph
g = sns.FacetGrid(df_pair, hue='Involvement', col="Gameplay", col_wrap=5, height=5) 
g = g.map(plt.scatter, "PlayStartTime", "x", marker=".")

The x-coordinate movements of the players are shown above.

The players are colored as follows:
- The PR (orange)
- Tackling Player (blue)

There is only a subtle difference when the tackling person is injured. Notice that both players stayed on the same gridline for some time.

Below, I graphed the movements along the y-coordinate to put more context to the story.

#### Y Coordinate

In [None]:
# Graph
g = sns.FacetGrid(df_pair, hue='Involvement', col="Gameplay", col_wrap=5, height=5) 
g = g.map(plt.scatter, "PlayStartTime", "y", marker=".")

In most cases, both players were running in the same y direction while staying close in x (previous graph). This suggests that the PR was about to run past the tackling player, which forced the tackling player to lunge towards the PR. In a way, the player movements above explain the higher number of helmet-to-body collisions for the tackling player: instead of a head-on collision, there was a chase, and the tackling player ended up propelling himself towards the PR.

### Speed

In [None]:
df_pair = df_collision_point.merge(df_injury_phase2_tackling_opponent[['GameKey', 'PlayID']],
                                                left_on=['GameKey', 'PlayID'],
                                                 right_on=['GameKey', 'PlayID'],
                                                 how='right')
df_pair[['GameKey', 'PlayID', 'PlayStartTime', 'kph_Player', 'kph_Partner']].head(40)

In 4 out of 9 cases, the tackling partner was running faster than 20 kph. This suggests that high speed adds risk for the tackling partner, although the sample is small.

#### Finding:
Who gets injured depends on the angle or side of the collision.
1. When the tackling person comes from the opposite side on both axes, the PR is likely to get injured, because the PR did not expect a collision from the opposite side (y axis).
2. When the tackling person moves in the same y direction, meaning the PR is about to pass him, the tackling player is likely to get injured while lunging towards the PR.
    - This often results in helmet-to-body collisions.
    - The tackling player is also very fast in almost half of the tackling injuries.

## C. Gunner Collisions
### What are the trends between gunner collisions?

In [None]:
df_injury_gunner = df_injury[(df_injury['Role_Category']=='G') |
                                 (df_injury['Role_Category_Partner']=='G')]

df_injury_moves_paired = df_injury_gunner.merge(df_injury_moves_involved,
                                                left_on=['GameKey', 'PlayID'],
                                                 right_on=['GameKey', 'PlayID'],
                                                 how='left')

df_injury_moves_paired = df_injury_moves_paired[df_injury_moves_paired['PlayStartTime']<15]

#### Y Axis

In [None]:
g = sns.FacetGrid(df_injury_moves_paired, hue='Involvement', col="Gameplay", col_wrap=4, height=5)
g = g.map(plt.scatter, "PlayStartTime", "y", marker=".")

The gunner often comes from the opposite side of the field regardless of the activity involved.

#### Speed

In [None]:
df_injury_moves_paired_pivot = df_injury_gunner.merge(df_involved_pairs,
                                                left_on=['GameKey', 'PlayID'],
                                                 right_on=['GameKey', 'PlayID'],
                                                 how='left')

g = sns.FacetGrid(df_injury_moves_paired, hue='Involvement', col="Gameplay", col_wrap=4, height=5)
g = g.map(plt.plot, "PlayStartTime", "kph", marker=".")

Matching the collision time with the speed shows that the collision occurs near the Gunner's peak speed, usually right after it. This suggests that the Gunner lacks the time or control to slow down enough before the collision, which results in high-speed collisions involving the Gunner.
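
The visual comparison of collision time and peak speed can be made explicit by computing the lag between the two. The sketch below works on a single made-up trace. On the real data, the same steps would be applied per play, for example to `kph_Player` and `PairDistance` in `df_involved_pairs`, which is an assumption about how one might extend the analysis rather than something done in this kernel.

```python
import pandas as pd

# Toy speed and pair-distance traces for one player in one play (made-up values).
trace = pd.DataFrame({
    'PlayStartTime': [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    'kph': [5.0, 12.0, 24.0, 27.0, 25.0, 18.0],
    'PairDistance': [9.0, 7.0, 4.5, 2.0, 0.3, 1.1],
})

# Time of top speed and time of closest approach (the collision proxy).
peak_idx = trace['kph'].idxmax()
collision_idx = trace['PairDistance'].idxmin()
t_peak = trace.loc[peak_idx, 'PlayStartTime']
t_collision = trace.loc[collision_idx, 'PlayStartTime']

# Positive lag: the collision happened after the peak speed.
lag_seconds = t_collision - t_peak
speed_at_collision = trace.loc[collision_idx, 'kph']
print(round(lag_seconds, 2), speed_at_collision)
```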

### Gunner is Injured: PR Distance

In [None]:
df_injury_gunner = df_injury[(df_injury['Role_Category']=='G')]

df_injury_moves_paired = df_injury_gunner.merge(df_injury_moves_involved,
                                                left_on=['GameKey', 'PlayID'],
                                                 right_on=['GameKey', 'PlayID'],
                                                 how='left')
df_injury_moves_paired = df_injury_moves_paired[df_injury_moves_paired['PlayStartTime']<15]
g = sns.FacetGrid(df_injury_moves_paired, hue='Involvement', col="Gameplay", col_wrap=3, height=5)
g = g.map(plt.plot, "PlayStartTime", "PR_Distance", marker=".")

The players are colored as follows:
- The Gunner (blue)
- Partner Player (orange)

When the gunner was injured, he was performing his task of running towards the PR and happened to collide with someone along the way. In the two cases where the orange line is at 0, the partner is the PR.

In [None]:
g = sns.FacetGrid(df_injury_moves_paired, hue='Involvement', col="Gameplay", col_wrap=3, height=5)
g = g.map(plt.plot, "PlayStartTime", "x", marker=".")

In this scenario, there are two activities involved:
- PR-related activities tend to start from distant x axis values
- Other activities tend to start close to one another. The pattern is similar to the block activities, which will be discussed later.

### Gunner is Injured: Speed
What was the Gunner's speed when injured?

In [None]:
df_injury_gunner_player = df_injury[(df_injury['Role_Category']=='G')]
df_min_distances_gunner = df_collision_point.merge(df_injury_gunner_player[['GameKey', 'PlayID']],
                                                left_on=['GameKey', 'PlayID'],
                                                 right_on=['GameKey', 'PlayID'],
                                                 how='right')
df_min_distances_gunner[['GameKey', 'PlayID', 'PlayStartTime', 'kph_Player', 'kph_Partner']].head(40)

As the injured player, the Gunner was running at roughly 17 to 29 kph in 4 cases, which suggests that he was running faster than his partner in most cases. In two of those 4 cases, his partner was the PR.

### Gunner Injured Someone: PR Distance

In [None]:
df_injury_gunner = df_injury[(df_injury['Role_Category_Partner']=='G')]

df_injury_moves_paired = df_injury_gunner.merge(df_injury_moves_involved,
                                                left_on=['GameKey', 'PlayID'],
                                                 right_on=['GameKey', 'PlayID'],
                                                 how='left')
df_injury_moves_paired = df_injury_moves_paired[df_injury_moves_paired['PlayStartTime']<15]

g = sns.FacetGrid(df_injury_moves_paired, hue='Involvement', col="Gameplay", col_wrap=4, height=5)
g = g.map(plt.plot, "PlayStartTime", "PR_Distance", marker=".")

The players are colored as follows:
- The Gunner (orange)
- Partner Player (blue)

The Gunner is again moving towards the PR. In one case, the PR got injured by the Gunner.

### Gunner Injured Someone: Speed
What was the Gunner's speed as the partner role in the injuries?

In [None]:
df_injury_gunner_partner = df_injury[(df_injury['Role_Category_Partner']=='G')]
df_min_distances_gunner = df_collision_point.merge(df_injury_gunner_partner[['GameKey', 'PlayID']],
                                                left_on=['GameKey', 'PlayID'],
                                                 right_on=['GameKey', 'PlayID'],
                                                 how='right')
df_min_distances_gunner[['GameKey', 'PlayID', 'PlayStartTime', 'kph_Player', 'kph_Partner']].head(40)

As the partner player, the Gunner's speed is above 20 kph in only 1 instance out of 4. In this instance, the injured player is the PR. This suggests that the Gunner can be a risk to the PR.

#### Finding: 

1. The Gunner tends to collide with others coming from a different side (y axis).
2. The Gunner's speed at collision is very high when he is the one injured. This suggests that he was unable to slow down in time.

## D. Block Opposition

### What are the roles of the teams after the kick?
We review the objectives of the teams.
1. Coverage
    - The Coverage team must tackle the PR to prevent him from gaining yards.
2. Return
    - Hence, the Return team responds by blocking the Coverage team from reaching the PR.

Before the analysis, I removed the star players, which may skew the data.

### What happens in Blocked Injuries?

#### X Axis

In [None]:
df_injury_phase2_blocked = df_injury_phase2[(df_injury_phase2['Player_Activity_Derived']=='Blocked')]
df_injury_phase2_blocked_opponent = df_injury_phase2_blocked[df_injury_phase2_blocked['Friendly_Fire']=='No']

df_pair = df_injury_moves_involved.merge(df_injury_phase2_blocked_opponent,
                                                left_on=['GameKey', 'PlayID'],
                                                 right_on=['GameKey', 'PlayID'],
                                                 how='right')
df_pair = df_pair[df_pair['PlayStartTime']<15]

# Graph
g = sns.FacetGrid(df_pair, hue='Involvement', col="Gameplay", col_wrap=5, height=5) 
g = g.map(plt.scatter, "PlayStartTime", "x", marker=".")

- The blocked person (blue) is moving in the same x direction as the blocking person (orange).
- However, notice how the blocked person often reaches a given x coordinate slightly earlier than the blocking person.
- When the blocked person stops or changes direction, the blocking person catches up, and the collision causes an injury.

#### Y Axis

In [None]:
# Graph
g = sns.FacetGrid(df_pair, hue='Involvement', col="Gameplay", col_wrap=5, height=5) 
g = g.map(plt.scatter, "PlayStartTime", "y", marker=".")

In the first 2 cases, the opponents were moving from opposite directions. In the other cases, they appear to have moved along the same y axis for quite some time. No clear pattern stands out.

### PR Distance

In [None]:
g = sns.FacetGrid(df_pair, hue='Involvement', col="Gameplay", col_wrap=5, height=5)
g = g.map(plt.plot, "PlayStartTime", "PR_Distance", marker=".")

In around half of the blocked cases, both players were moving close to the PR. The exception is GameKey 392, PlayID 1088.

### Pair Distances

In [None]:
df_pair_distance = df_involved_pairs.merge(df_injury_phase2_blocked_opponent,
                                                left_on=['GameKey', 'PlayID'],
                                                 right_on=['GameKey', 'PlayID'],
                                                 how='right')
df_pair_distance = df_pair_distance[df_pair_distance['PlayStartTime']<15]

# Graph
g = sns.FacetGrid(df_pair_distance, col='Gameplay', col_wrap=5, height=5) 
g = g.map(plt.scatter, 'PlayStartTime', 'PairDistance', marker=".")

In considering the pair distances, 3 out of 5 blocked cases showed that the pair were initially close together, but moved apart before colliding later on. This indicates that the pair was separated for a short while. This adds to the evidence that the blocked person (an early runner) got injured when a late runner opponent catches up with him.
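
The "close, then apart, then colliding" pattern can be checked programmatically by comparing the widest gap before the closest approach with the starting gap. The sketch below uses one made-up series that is already sorted by time and has a default integer index. The 2-meter margin `SEPARATION_MARGIN_M` is a hypothetical choice introduced here.

```python
import pandas as pd

# Made-up pair distances (meters): close at the snap, apart, then contact.
pair = pd.DataFrame({
    'PlayStartTime': [0.0, 1.0, 2.0, 3.0, 4.0],
    'PairDistance': [1.0, 4.0, 9.0, 3.0, 0.2],
})

# Rows up to and including the closest approach (label-based slice is inclusive).
collision_idx = pair['PairDistance'].idxmin()
before = pair.loc[:collision_idx]

initial_gap = before['PairDistance'].iloc[0]
widest_gap = before['PairDistance'].max()

# Hypothetical rule: the pair "separated" if the gap grew by more than the
# margin beyond its starting value before the closest approach.
SEPARATION_MARGIN_M = 2.0
separated_first = bool((widest_gap - initial_gap) > SEPARATION_MARGIN_M)
print(initial_gap, widest_gap, separated_first)
```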

### Speed

In [None]:
# Graph
g = sns.FacetGrid(df_pair, hue='Involvement', col="Gameplay", col_wrap=5, height=5) 
g = g.map(plt.plot, "PlayStartTime", "kph", marker=".")

Similar to the gunner injuries, most of the collisions occur moments after the peak speeds. This suggests there was insufficient time to slow down before the collision.

### What were their exact speeds at the time of the collisions?

In [None]:
df_min_distances_pair = df_collision_point.merge(df_injury_phase2_blocked_opponent[['GameKey', 'PlayID']],
                                                left_on=['GameKey', 'PlayID'],
                                                 right_on=['GameKey', 'PlayID'],
                                                 how='right')
df_min_distances_pair[['GameKey', 'PlayID', 'PlayStartTime', 'kph_Player', 'kph_Partner']].head(40)

The speed of the partner player is above 20 kph in 3 out of 5 cases.

### What happens in Blocking injuries?

#### X Axis

In [None]:
df_injury_phase2_blocking = df_injury_phase2[df_injury_phase2['Player_Activity_Derived']=='Blocking']
df_pair = df_injury_moves_involved.merge(df_injury_phase2_blocking,
                                                left_on=['GameKey', 'PlayID'],
                                                 right_on=['GameKey', 'PlayID'],
                                                 how='right')
df_pair = df_pair[df_pair['PlayStartTime']<15]

# Graph
g = sns.FacetGrid(df_pair, hue='Involvement', col="Gameplay", col_wrap=5, height=5) 
g = g.map(plt.scatter, "PlayStartTime", "x", marker=".")

As in the Blocked injuries, the players are moving in the same x direction, and the story is similar:
- The blocked person (orange) is moving in the same x direction as the blocking person (blue).
- However, notice how the blocked person often reaches a given x coordinate slightly earlier than the blocking person.
- When the blocked person stops or changes direction, the blocking person catches up, and the collision causes an injury.

#### Y Axis

In [None]:
# Graph
g = sns.FacetGrid(df_pair, hue='Involvement', col="Gameplay", col_wrap=5, height=5) 
g = g.map(plt.scatter, "PlayStartTime", "y", marker=".")

In most blocking injuries, both players ran in the same y direction, and the blocking person (blue) caught up with the blocked person (orange). In most cases, the collision occurred close to the time when the blocking person made a sudden change in direction.

### PR Distance

In [None]:
g = sns.FacetGrid(df_pair, hue='Involvement', col="Gameplay", col_wrap=5, height=5)
g = g.map(plt.plot, "PlayStartTime", "PR_Distance", marker=".")

In all cases, both players were moving closer to the PR.
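One way to check the "moving closer to the PR" observation without relying on the plot is to compare each player's first and last `PR_Distance` in a play. This is a sketch on invented data: `toy_pr` and the `Involvement` labels `'Player'` and `'Partner'` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical distances to the punt returner for two players (invented values).
toy_pr = pd.DataFrame({
    'Involvement': ['Player'] * 4 + ['Partner'] * 4,
    'PlayStartTime': [0, 1, 2, 3] * 2,
    'PR_Distance': [40.0, 31.0, 22.0, 12.0, 38.0, 30.0, 19.0, 10.0],
})

# Sort by time so that first() and last() refer to the start and end of the play.
toy_pr = toy_pr.sort_values(['Involvement', 'PlayStartTime'])
grouped = toy_pr.groupby('Involvement')['PR_Distance']

# Negative net change means the player ended the window closer to the PR.
net_change = grouped.last() - grouped.first()
closing_in = net_change < 0
print(net_change)
```

A net change only compares the endpoints, so a player who first drifted away and then closed in would still count as closing in.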

### Pair Distance

In [None]:
df_pair_distance = df_involved_pairs.merge(
    df_injury_phase2_blocking,
    on=['GameKey', 'PlayID'],
    how='right')
df_pair_distance = df_pair_distance[df_pair_distance['PlayStartTime'] < 15]

# Graph
g = sns.FacetGrid(df_pair_distance, col='Gameplay', col_wrap=5, height=5) 
g = g.map(plt.plot, 'PlayStartTime', 'PairDistance', marker=".")

The pair distances follow a roughly parabolic curve. This means the players started close together, separated by more than about 15 yards, and then closed in on each other again. Again, this is consistent with the theory that the early runner is often hit by the late runner when changing direction.
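For reference, a pair distance of this kind can be derived from the two players' coordinates as a Euclidean distance per time step. The sketch below uses invented positions in a hypothetical wide table `toy_positions`; it is not necessarily how `PairDistance` in `df_involved_pairs` was computed.

```python
import numpy as np
import pandas as pd

# Hypothetical positions of the two players at each time step (invented values).
toy_positions = pd.DataFrame({
    'PlayStartTime': [0.0, 1.0, 2.0, 3.0, 4.0],
    'x_player': [10.0, 14.0, 20.0, 26.0, 30.0],
    'y_player': [20.0, 20.0, 20.0, 20.0, 20.0],
    'x_partner': [10.0, 11.0, 14.0, 23.0, 30.0],
    'y_partner': [21.0, 24.0, 28.0, 24.0, 20.0],
})

# Euclidean distance between the two players at each time step
toy_positions['PairDistance'] = np.hypot(
    toy_positions['x_player'] - toy_positions['x_partner'],
    toy_positions['y_player'] - toy_positions['y_partner'],
)

# Time of maximum separation (top of the curve) and of closest approach
t_max_separation = toy_positions.loc[toy_positions['PairDistance'].idxmax(), 'PlayStartTime']
t_closest = toy_positions.loc[toy_positions['PairDistance'].idxmin(), 'PlayStartTime']
print(toy_positions[['PlayStartTime', 'PairDistance']])
```

In this toy play the distance rises to a maximum in the middle and then falls back towards zero, which is the rise-and-fall shape described above.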

### Speed

In [None]:
# Graph
g = sns.FacetGrid(df_pair, hue='Involvement', col="Gameplay", col_wrap=5, height=5) 
g = g.map(plt.plot, "PlayStartTime", "kph", marker=".")

When the speed is compared against the pair distance, it can be observed that the collisions occurred right after the blocked person (orange) reached peak speed.
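The timing between peak speed and collision can be quantified by comparing the time of maximum `kph` with the time of minimum `PairDistance`. A minimal sketch, assuming the collision is approximated by the closest approach and using an invented table `toy_play`:

```python
import pandas as pd

# Hypothetical speed and pair distance for one player in one play (invented values).
toy_play = pd.DataFrame({
    'PlayStartTime': [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    'kph': [5.0, 14.0, 24.0, 27.0, 21.0, 12.0],
    'PairDistance': [2.0, 6.0, 9.0, 5.0, 0.5, 3.0],
})

# Time at which the player is fastest
t_peak_speed = toy_play.loc[toy_play['kph'].idxmax(), 'PlayStartTime']
# Time of closest approach, used here as a stand-in for the collision time
t_collision = toy_play.loc[toy_play['PairDistance'].idxmin(), 'PlayStartTime']

# Positive lag: the collision happened after the speed peak
lag = t_collision - t_peak_speed
print(f'peak speed at {t_peak_speed}, closest approach at {t_collision}, lag {lag}')
```

A small positive lag in every play would support the statement that collisions follow shortly after the speed peak.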

### Collision Speed

In [None]:
df_min_distances_pair = df_collision_point.merge(
    df_injury_phase2_blocking[['GameKey', 'PlayID']],
    on=['GameKey', 'PlayID'],
    how='right')
df_min_distances_pair[['GameKey', 'PlayID', 'PlayStartTime', 'kph_Player', 'kph_Partner']].head(40)

Even though they did not collide at peak speed, speed is still an issue: as the speed vs time graph shows, it was difficult for the players to decelerate in time.
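How hard a player was braking can be estimated from the speed series with a finite difference. The sketch below assumes that time is in seconds and speed in kph (as in this kernel's unit note); `toy_speed` holds invented values.

```python
import pandas as pd

# Hypothetical speed samples for one player (invented values).
# Assumption: PlayStartTime is in seconds, kph is kilometers per hour.
toy_speed = pd.DataFrame({
    'PlayStartTime': [0.0, 0.5, 1.0, 1.5, 2.0],
    'kph': [18.0, 25.2, 27.0, 23.4, 21.6],
})

# Convert km/h to m/s (1 m/s = 3.6 km/h)
speed_ms = toy_speed['kph'] / 3.6

# Finite-difference acceleration in m/s^2; negative values are decelerations
toy_speed['accel_ms2'] = speed_ms.diff() / toy_speed['PlayStartTime'].diff()

# Strongest braking observed in the window (most negative acceleration)
max_deceleration = toy_speed['accel_ms2'].min()
print(toy_speed.round(2))
```

In this toy series the player only sheds about 2 m/s per second after the peak, so the speed at the end of the window is still well above the starting speed.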

#### Finding: 

1. Both players were generally running towards the PR.
2. The players were initially close together, but separated before colliding with one another.
3. The collisions occurred right after a change in direction.
4. Speed:
    - Blocked Injury: The blocking player was often moving at high speed when the collision occurred and the blocked person got injured.
    - Blocking Injury: The collision occurred right after the peak speed of the blocked player.

## E. Friendly-fire Collisions

### What are the trends of the friendly-fire injuries?

For the friendly-fire injuries, I performed an analysis similar to the one for the blocked/blocking injuries.

#### X Axis

In [None]:
df_injury_phase2_friend = df_injury_phase2[df_injury_phase2['Friendly_Fire'] == 'Yes']
df_pair = df_injury_moves_involved.merge(
    df_injury_phase2_friend,
    on=['GameKey', 'PlayID'],
    how='right')
df_pair = df_pair[df_pair['PlayStartTime'] < 15]

g = sns.FacetGrid(df_pair, hue='Involvement', col="Gameplay", col_wrap=3, height=5)
g = g.map(plt.scatter, "PlayStartTime", "x", marker=".")

Although the friendly-fire injuries are a combination of tackling and blocked activities, the pattern is similar to the blocked/blocking injuries.
- The injured person (in orange) is moving in the same x direction as the teammate (blue).
- One person often reaches a given x coordinate a bit earlier than the other.
- The trailing person catches up at a certain location.
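The catch-up point in the list above can be estimated as the first time the gap along x drops below a small threshold. This is a sketch only: `toy_x` contains invented coordinates, and `CATCH_UP_GAP` is an arbitrary cutoff I chose for illustration.

```python
import pandas as pd

# Hypothetical x positions of the injured player and the teammate (invented values).
toy_x = pd.DataFrame({
    'PlayStartTime': [0.0, 1.0, 2.0, 3.0, 4.0],
    'x_injured': [20.0, 26.0, 32.0, 36.0, 38.0],
    'x_partner': [15.0, 20.0, 27.0, 34.0, 37.5],
})

CATCH_UP_GAP = 1.0  # arbitrary illustrative cutoff, in the same units as x

# Absolute gap along the x axis at each time step
toy_x['x_gap'] = (toy_x['x_injured'] - toy_x['x_partner']).abs()

# First time step at which the trailing player is within the cutoff
caught_up = toy_x['x_gap'] < CATCH_UP_GAP
catch_up_time = toy_x.loc[caught_up, 'PlayStartTime'].min()
print(toy_x[['PlayStartTime', 'x_gap']])
```

If the gap never falls below the cutoff, the filtered selection is empty and `min()` returns `NaN`, which makes plays without a catch-up easy to spot.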

#### Tackle: Y Axis

In [None]:
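# Restrict to tackling injuries first, mirroring the Blocked filter used later in this section
df_injury_phase2_tackling = df_injury_phase2[df_injury_phase2['Player_Activity_Derived'] == 'Tackling']
df_injury_phase2_tackling_friend = df_injury_phase2_tackling[df_injury_phase2_tackling['Friendly_Fire'] == 'Yes']
df_pair = df_injury_moves_involved.merge(
    df_injury_phase2_tackling_friend,
    on=['GameKey', 'PlayID'],
    how='right')
df_pair = df_pair[df_pair['PlayStartTime'] < 15]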

g = sns.FacetGrid(df_pair, hue='Involvement', col="Gameplay", col_wrap=3, height=5)
g = g.map(plt.scatter, "PlayStartTime", "y", marker=".")

In the friendly-fire tackling injuries, the injured player was moving in the opposite y direction to the partner. This means that while tackling the PR, they ran into each other from opposite directions.
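Whether two players approach from opposite sides can be tested by comparing the signs of their average y velocities before the collision. The sketch below uses a hypothetical wide table `toy_y` with invented values.

```python
import pandas as pd

# Hypothetical y positions of the injured player and the partner (invented values).
toy_y = pd.DataFrame({
    'PlayStartTime': [0.0, 1.0, 2.0, 3.0],
    'y_injured': [10.0, 14.0, 18.0, 22.0],
    'y_partner': [34.0, 30.0, 26.0, 23.0],
})

# Step-to-step velocity along y for both players (displacement / elapsed time)
dt = toy_y['PlayStartTime'].diff()
vy = toy_y[['y_injured', 'y_partner']].diff().div(dt, axis=0)

# Average velocity over the window; NaN in the first row is skipped by mean()
mean_vy = vy.mean()

# Opposite signs mean the two players were converging from opposite sides
opposite_directions = bool(mean_vy['y_injured'] * mean_vy['y_partner'] < 0)
print(mean_vy, opposite_directions)
```

Here the injured player moves up the y axis while the partner moves down, so the product of the two mean velocities is negative.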

### Speed vs Time

In [None]:
# Graph
g = sns.FacetGrid(df_pair, hue='Involvement', col="Gameplay", col_wrap=3, height=5) 
g = g.map(plt.plot, "PlayStartTime", "kph", marker=".")

The collisions occurred right after the peak speeds.

### What were their exact speeds?

In [None]:
df_collision_time = df_collision_point.merge(
    df_injury_phase2_tackling_friend[['GameKey', 'PlayID']],
    on=['GameKey', 'PlayID'],
    how='right')
df_collision_time[['GameKey', 'PlayID', 'PlayStartTime', 'kph_Player', 'kph_Partner']].head(40)

The speeds are actually lower than in the other cases. This suggests that the tackling technique, rather than speed alone, may be the problem.

#### Blocked: Y Axis

In [None]:
df_injury_phase2_blocked = df_injury_phase2[df_injury_phase2['Player_Activity_Derived'] == 'Blocked']
df_injury_phase2_blocked_friend = df_injury_phase2_blocked[df_injury_phase2_blocked['Friendly_Fire'] == 'Yes']
df_pair = df_injury_moves_involved.merge(
    df_injury_phase2_blocked_friend,
    on=['GameKey', 'PlayID'],
    how='right')
df_pair = df_pair[df_pair['PlayStartTime'] < 15]

g = sns.FacetGrid(df_pair, hue='Involvement', col="Gameplay", col_wrap=3, height=5)
g = g.map(plt.scatter, "PlayStartTime", "y", marker=".")

#### Blocked: Pair Distance

In [None]:
df_pair_distance = df_involved_pairs.merge(
    df_injury_phase2_blocked_friend,
    on=['GameKey', 'PlayID'],
    how='right')
df_pair_distance = df_pair_distance[df_pair_distance['PlayStartTime'] < 15]

# Graph
g = sns.FacetGrid(df_pair_distance, col='Gameplay', col_wrap=5, height=5) 
g = g.map(plt.plot, 'PlayStartTime', 'PairDistance', marker=".")

In the blocked cases, on the other hand, the players were moving in the same y direction. This is the same pattern as the early runners getting hit by the late runners.

### Speed vs Time

In [None]:
# Graph
g = sns.FacetGrid(df_pair, hue='Involvement', col="Gameplay", col_wrap=3, height=5) 
g = g.map(plt.plot, "PlayStartTime", "kph", marker=".")

Collisions occur right after the peak speeds.

### What were their speeds?

In [None]:
df_pair = df_collision_point.merge(
    df_injury_phase2_blocked_friend[['GameKey', 'PlayID']],
    on=['GameKey', 'PlayID'],
    how='right')
df_pair[['GameKey', 'PlayID', 'PlayStartTime', 'kph_Player', 'kph_Partner']].head(40)

For the blocked cases, there is only one instance of a speed above 20 kph. The others have much lower speeds.

#### Finding:
1. Both players were moving in the same direction along the x axis.
2. Activity
    1. Tackling injuries were caused by tackling the PR from opposite y directions. This suggests that the Coverage team did not coordinate its tackles.
    2. Blocked injuries occurred when both players were moving in the same direction. The players were initially close together, separated, then collided with one another. This is similar to the early runners vs late runners injuries.
3. Most of the collisions occurred after the peak speeds.