# Introduction
The NFL is America's most popular sports league. Founded in 1920, the organization behind American football has developed the model for the successful modern sports league. It is committed to advancing every aspect of the game, including the less-researched special teams. In this competition, you’ll quantify team or individual strategies, rank players, or explore something we haven’t considered.

With your creativity and analytical skills, the development of these new methods could lead to additional stats for special teams plays.

Reference Notebooks:
* https://www.kaggle.com/werooring/nfl-big-data-bowl-basic-eda-for-beginner
* https://www.kaggle.com/travistyler/nfl-special-teams-eda
* https://www.kaggle.com/bcruise/nfl-kickoff-returns-eda
* https://www.kaggle.com/hijest/nfl-big-data-bowl-2022-starters-eda

In [None]:
# General Libraries
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import os # General OS properties
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Plotting Libraries
import seaborn as sns
import matplotlib.pyplot as plt

# Data Representation
* **Game Data:** The **games.csv** contains the teams playing in each game. The key variable is **gameId**.
* **Play Data:** The **plays.csv** file contains play-level information from each game. The key variables are **gameId** and **playId**.
* **Player Data:** The **players.csv** file contains player-level information from players that participated in any of the tracking data files. The key variable is **nflId**.
* **Tracking Data:** The **tracking[season].csv** files contain player tracking data from season **[season]**. The key variables are **gameId**, **playId**, and **nflId**.
* **PFF Scouting Data:** The **PFFScoutingData.csv** file contains play-level scouting information for each game. The key variables are **gameId** and **playId**.
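
The key variables listed above determine how the files join. As a minimal sketch on made-up toy frames (the values and the `hangTime` sample are illustrative, not taken from the real files), the snippet below checks that **playId** alone does not identify a play while the (**gameId**, **playId**) pair does, and then joins on that pair:

```python
import pandas as pd

# Toy stand-ins for plays.csv and PFFScoutingData.csv (values are made up)
plays = pd.DataFrame({
    "gameId": [1, 1, 2],
    "playId": [10, 20, 10],  # playId repeats across games
    "specialTeamsPlayType": ["Punt", "Kickoff", "Punt"],
})
scouting = pd.DataFrame({
    "gameId": [1, 2],
    "playId": [10, 10],
    "hangTime": [4.5, 4.1],
})

keys = ["gameId", "playId"]

# playId alone is not a key, but the (gameId, playId) pair is
playid_unique = plays["playId"].is_unique
pair_unique = not plays.duplicated(subset=keys).any()

# validate="one_to_one" makes pandas raise if either side has duplicate keys
merged = plays.merge(scouting, on=keys, how="left", validate="one_to_one")
print(playid_unique, pair_unique, len(merged))
```

Passing `validate` is optional; it is used here only so that an unexpected duplicate key fails loudly instead of silently multiplying rows.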

In [None]:
# Read the main data files

df_games = pd.read_csv("../input/nfl-big-data-bowl-2022/games.csv")
df_plays = pd.read_csv("../input/nfl-big-data-bowl-2022/plays.csv")
df_players = pd.read_csv("../input/nfl-big-data-bowl-2022/players.csv")
df_ppfSctdata = pd.read_csv("../input/nfl-big-data-bowl-2022/PFFScoutingData.csv")

# Details of Game Data
* **gameId**: Game identifier, unique (numeric)
* **season**: Season of game
* **week**: Week of game
* **gameDate**: Game Date (time, mm/dd/yyyy)
* **gameTimeEastern**: Start time of game (time, HH:MM:SS, EST)
* **homeTeamAbbr**: Home team three-letter code (text)
* **visitorTeamAbbr**: Visiting team three-letter code (text)

In [None]:
# Glimpse of Game data
df_games.head()

In [None]:
# Games data set information in brief
df_games.info()

In [None]:
# Augment the Games data related to Day, Week & Year

df_games["gameDate"] = pd.to_datetime(df_games["gameDate"])
df_games['mnthByYear'] = df_games["gameDate"].dt.month_name()
df_games['dByWeek'] = df_games["gameDate"].dt.day_name()

In [None]:
# Plot the number of games per season, broken down by month

sns.set_style("whitegrid")
plt.figure(figsize=(8, 6))
ax = plt.gca()

sns.countplot(x='season', data=df_games, hue='mnthByYear', lw=2, ec='black', ax=ax).set(title='Number of Games in a Season on Monthly basis')
ax.legend(loc='center right', bbox_to_anchor=(1.5, 0.5), ncol=1)
plt.show()

In [None]:
# Plot the number of games per season, broken down by weekday

sns.set_style("whitegrid")
plt.figure(figsize=(8, 6))
ax = plt.gca()

sns.countplot(x='season', data=df_games, hue='dByWeek', lw=2, ec='black', ax=ax).set(title='Number of Games in a Season on Weekday basis')
ax.legend(loc='center right', bbox_to_anchor=(1.5, 0.5), ncol=1)
plt.show()

In [None]:
# Details from : Home team with respect to Visitor Team

hTeamVsvTeam = df_games.groupby(['homeTeamAbbr', 'visitorTeamAbbr'], as_index=False).count()
plt.figure(figsize=(15, 10))
sns.set_style('dark')

ax = sns.swarmplot(data=hTeamVsvTeam, x="homeTeamAbbr", y="visitorTeamAbbr", hue="season").set(title='Home Team with respect to Visitor Team')

In [None]:
# Games details with respect to Time (EST)

sns.set_style("whitegrid")
plt.figure(figsize=(15, 10))
ax = plt.gca()

sns.countplot(x='season', data=df_games, hue='gameTimeEastern', lw=2, ec='black', ax=ax).set(title='Game details in a Season on Time (EST) basis')
ax.legend(loc='center right', bbox_to_anchor=(1.5, 0.5), ncol=1)
plt.show()

# Details of Play Data
* **gameId**: Game identifier, unique (numeric)
* **playId**: Play identifier, not unique across games (numeric)
* **playDescription**: Description of play (text)
* **quarter**: Game quarter (numeric)
* **down**: Down (numeric)
* **yardsToGo**: Distance needed for a first down (numeric)
* **possessionTeam**: Team punting, placekicking or kicking off the ball (text)
* **specialTeamsPlayType**: Formation of play: Extra Point, Field Goal, Kickoff or Punt (text)
* **specialTeamsPlayResult**: Special Teams outcome of play dependent on play type: Blocked Kick Attempt, Blocked Punt, Downed, Fair Catch, Kick Attempt Good, Kick Attempt No Good, Kickoff Team Recovery, Muffed, Non-Special Teams Result, Out of Bounds, Return or Touchback (text)
* **kickerId**: nflId of placekicker, punter or kickoff specialist on play (numeric)
* **returnerId**: nflId(s) of returner(s) on play if there was a special teams return. Multiple returners on a play are separated by a ; (text)
* **kickBlockerId**: nflId of blocker of kick on play if there was a blocked field goal or blocked punt (numeric)
* **yardlineSide**: 3-letter team code corresponding to line-of-scrimmage (text)
* **yardlineNumber**: Yard line at line-of-scrimmage (numeric)
* **gameClock**: Time on clock of play (MM:SS)
* **penaltyCodes**: NFL categorization of the penalties that occurred on the play. Multiple penalties on a play are separated by a ; (text)
* **penaltyJerseyNumber**: Jersey number and team code of the player committing each penalty. Multiple penalties on a play are separated by a ; (text)
* **penaltyYards**: yards gained by possessionTeam by penalty (numeric)
* **preSnapHomeScore**: Home score prior to the play (numeric)
* **preSnapVisitorScore**: Visiting team score prior to the play (numeric)
* **passResult**: Scrimmage outcome of the play if specialTeamsPlayResult is "Non-Special Teams Result" (**C:** Complete pass, **I:** Incomplete pass, **S:** Quarterback sack, **IN:** Intercepted pass, **R:** Scramble, **' ':** Designed Rush, text)
* **kickLength**: Kick length in air of kickoff, field goal or punt (numeric)
* **kickReturnYardage**: Yards gained by return team if there was a return on a kickoff or punt (numeric)
* **playResult**: Net yards gained by the kicking team, including penalty yardage (numeric)
* **absoluteYardlineNumber**: Location of ball downfield in tracking data coordinates (numeric)
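
Several of the fields above, such as **returnerId** and **penaltyCodes**, pack multiple values into one string separated by `;`. A minimal sketch of unpacking such a field into one row per value, using made-up ids on a toy frame (the frame and the name `returners` are illustrative only):

```python
import pandas as pd

# Toy rows shaped like plays.csv; the ids are made up
plays = pd.DataFrame({
    "gameId": [1, 1, 1],
    "playId": [10, 20, 30],
    "returnerId": ["111", "222;333", None],
})

# Split on ';' and give each returner its own row
returners = (
    plays.assign(returnerId=plays["returnerId"].str.split(";"))
         .explode("returnerId")
         .dropna(subset=["returnerId"])   # plays without a returner
         .astype({"returnerId": "int64"})
         .reset_index(drop=True)
)
print(returners)
```

After this step the ids are numeric, so they could be matched against **nflId** in the player data.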

In [None]:
# Glimpse of Play data
df_plays.head()

In [None]:
# Plays data set information in brief
df_plays.info()

In [None]:
# Various fields in the plays data set contain NaN values; they are plotted here as-is,
# without any feature engineering.

fig, ((ax1,ax2),(ax3,ax4),(ax5,ax6),(ax7,ax8), (ax9,ax10)) = plt.subplots(5,2, figsize=(15,20))

sns.histplot(data=df_plays, x="kickLength", bins=50, kde=True, ax=ax1).set(title='Kick Length (Yards)')
sns.histplot(data=df_plays, x="kickReturnYardage", bins=50, kde=True, ax=ax2).set(title='Kick Return Length (Yards)')

sns.histplot(data=df_plays, x="playResult", bins=50, kde=True, ax=ax3).set(title='Net Distance Gained (Yards)')
sns.histplot(data=df_plays, x="yardsToGo", bins=50, kde=True, ax=ax4).set(title='First Down Distance (Yards)')

sns.histplot(data=df_plays, x="penaltyYards", bins=50, kde=True, ax=ax5).set(title='Distance Gained by Possession Team via Penalty (Yards)')
sns.histplot(data=df_plays, x="penaltyCodes", bins=50, kde=True, ax=ax6).set(title='NFL Penalty Categorization')

sns.histplot(data=df_plays, x="specialTeamsPlayType", bins=50, kde=True, ax=ax7).set(title='Formation of Play')
sns.histplot(data=df_plays, x="specialTeamsResult", bins=50, kde=True, ax=ax8).set(title='Results of Special Team Play')

sns.histplot(data=df_plays, x="passResult", bins=50, kde=True, ax=ax9).set(title='Scrimmage Outcome')
sns.histplot(data=df_plays, x="yardlineNumber", bins=50, kde=True, ax=ax10).set(title='Line Of Scrimmage')

plt.tight_layout()

# Details of Players Data
* **gameId**: Game identifier, unique (numeric)
* **playId**: Play identifier, not unique across games (numeric)
* **nflId**: Player identification number, unique across players (numeric)
* **height**: Player height (text)
* **weight**: Player weight (numeric)
* **birthDate**: Date of birth (YYYY-MM-DD)
* **collegeName**: Player college (text)
* **Position**: Player position (text)
* **displayName**: Player name (text)

In [None]:
# Glimpse of Players data
df_players.head()

In [None]:
# Players data set information in brief
df_players.info()

# Height Column
The height column is stored as text and needs to be converted to a single numeric value: height in feet, with the inches expressed as a decimal fraction.
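
As a minimal sketch of that conversion on a few sample strings: the hypothetical helper `height_to_feet` below assumes, as the notebook's own conversion code does, that values containing `-` are feet-inches and that values without it are total inches.

```python
import pandas as pd

def height_to_feet(value: str) -> float:
    """Convert 'feet-inches' (e.g. '6-2') or plain inches (e.g. '74') to feet.

    Treating a value without '-' as total inches is an assumption.
    """
    if "-" in value:
        feet, inches = value.split("-")
        total_inches = int(feet) * 12 + int(inches)
    else:
        total_inches = int(value)
    return total_inches / 12

# Made-up sample values covering both formats
sample = pd.Series(["6-2", "74", "5-11"])
converted = sample.map(height_to_feet).round(3).tolist()
print(converted)  # [6.167, 6.167, 5.917]
```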

In [None]:
# Split each height string on '-'
check = df_players['height'].str.split('-', expand=True)

# Name the two parts: 'integer' (feet, or total inches when there is no '-') and 'fraction' (inches)
check.columns = ['integer', 'fraction']

# For feet-inches rows, convert to total inches
has_inches = check['fraction'].notnull()
check.loc[has_inches, 'integer'] = check.loc[has_inches, 'integer'].astype(np.int16) * 12 + check.loc[has_inches, 'fraction'].astype(np.int16)

# Store the updated height in feet
df_players['height'] = check['integer'].astype(np.float32) / 12

In [None]:
# Display data after the column update
df_players.head()

In [None]:
# Create a separate column for birth year
df_players['birthYear'] = 0

# Drop rows where the birth date is missing
df_players.dropna(subset=['birthDate'], inplace=True)

# Extract the birth year as an integer; birthDate uses either '/' (year last) or '-' (year first)
for idx, row in df_players.iterrows():
    if len(row['birthDate'].split('/')) == 3:
        df_players.loc[idx, 'birthYear'] = int(row['birthDate'].split('/')[2])

    elif len(row['birthDate'].split('-')) == 3:
        df_players.loc[idx, 'birthYear'] = int(row['birthDate'].split('-')[0])

In [None]:
# Display the players data in segmented cells
fig, ((ax1,ax2),(ax3,ax4)) = plt.subplots(2,2, figsize=(10,10))

ax = sns.histplot(df_players['height'], bins=50, kde=True, ax=ax1).set(title='Height Distribution')
ax = sns.histplot(df_players['weight'], bins=50, kde=True, ax=ax2).set(title='Weight Distribution')

ax = sns.histplot(df_players['birthYear'], bins=50, kde=True, ax=ax3).set(title='Players with respect to Birth Year')
ax = sns.histplot(df_players['nflId'], bins=50, kde=True, ax=ax4).set(title='Player Identification')

plt.tight_layout()

# PFF Scouting data
* **gameId**: Game identifier, unique (numeric)
* **playId**: Play identifier, not unique across games (numeric)
* **snapDetail**: Detail on the punt snap (H: High, L: Low, <: Left, >: Right, OK: Accurate Snap, text)
* **operationTime**: Timing from snap to kick on punt plays in seconds (numeric)
* **hangTime**: Hangtime of player's punt or kickoff attempt in seconds. (numeric)
* **kickType**: Kickoff or Punt Type (text).
* Possible values for kickoff plays:
  * D: Deep - your normal deep kick with decent hang time
  * F: Flat - different than a Squib, in that it will have some hang time and no roll but has a lower trajectory and hang time than a Deep kick off
  * K: Free Kick - Kick after a safety
  * O: Obvious Onside - score and situation dictates the need to regain possession. Also the hands team is on for the returning team
  * P: Pooch kick - high for hangtime but not a lot of distance - usually targeting an upman
  * Q: Squib - low-line drive kick that bounces or rolls considerably, with virtually no hang time
  * S: Surprise Onside - accounting for score and situation an onsides kick that the returning team doesn’t expect. Hands teams probably aren't on the field
  * B: Deep Direct OOB - Kickoff that is aimed deep (regular kickoff) that goes OOB directly (doesn't bounce)
* Possible values for punt plays:
  * N: Normal - standard punt style
  * R: Rugby style punt
  * A: Nose down or Aussie-style punts
* **kickDirectionIntended**: Intended kick direction from the kicking team's perspective - based on how coverage unit sets up and other factors (L: Left, R: Right, C: Center, text).
* **kickDirectionActual**: Actual kick direction from the kicking team's perspective (L: Left, R: Right, C: Center, text).
* **returnDirectionIntended**: The return direction the punt return or kick off return unit is set up for from the return team's perspective (L: Left, R: Right, C: Center, text).
* **returnDirectionActual**: Actual return direction from the return team's perspective (L: Left, R: Right, C: Center, text).
* **missedTacklers**: Jersey number and team code of player(s) charged with a missed tackle on the play. (text).
* **assistTacklers**: Jersey number and team code of player(s) assisting on the tackle. Multiple assist tacklers on a play are separated by a ; (text).
* **tacklers**: Jersey number and team code of player making the tackle (text).
* **kickoffReturnFormation**: 3 digit code indicating the number of players in the Front Wall, Mid Wall and Back Wall (text).
* **gunners**: Jersey number and team code of player(s) lined up as gunner on punt unit. Multiple gunners on a play are separated by a ; (text).
* **puntRushers**: Jersey number and team code of player(s) on the punt return unit with "Punt Rush" role for actively trying to block the punt. (text).
* **specialTeamsSafeties**: Jersey number and team code for player(s) with "Safety" roles on kickoff coverage and field goal/extra point block units - and those not actively advancing towards the line of scrimmage on the punt return unit. (text).
* **vises**: Jersey number and team code for player(s) with a "Vise" role on the punt return unit. Multiple vises on a play are separated by a ; (text).
* **kickContactType**: Detail on how a punt was fielded, or what happened when it wasn't fielded (text).
* Possible values:
  * BB: Bounced Backwards
  * BC: Bobbled Catch from Air
  * BF: Bounced Forwards
  * BOG: Bobbled on Ground
  * CC: Clean Catch from Air
  * CFFG: Clean Field From Ground
  * DEZ: Direct to Endzone
  * ICC: Incidental Coverage Team Contact
  * KTB: Kick Team Knocked Back
  * KTC: Kick Team Catch
  * KTF: Kick Team Knocked Forward
  * MBC: Muffed by Contact with Non-Designated Returner
  * MBDR: Muffed by Designated Returner
  * OOB: Directly Out Of Bounds
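
The single-letter codes above are compact but hard to read in plots. A minimal sketch of mapping the **kickType** codes to short labels, using a lookup transcribed from the list above and a made-up toy frame (the column name `kickTypeLabel` is hypothetical):

```python
import pandas as pd

# Code-to-label lookup transcribed from the kickType list above
KICK_TYPE_LABELS = {
    # Kickoff plays
    "D": "Deep", "F": "Flat", "K": "Free Kick", "O": "Obvious Onside",
    "P": "Pooch", "Q": "Squib", "S": "Surprise Onside", "B": "Deep Direct OOB",
    # Punt plays
    "N": "Normal", "R": "Rugby", "A": "Aussie",
}

# Toy stand-in for PFFScoutingData.csv (values are made up)
scouting = pd.DataFrame({"kickType": ["D", "N", None, "Q"]})

# Missing or unmapped codes stay NaN
scouting["kickTypeLabel"] = scouting["kickType"].map(KICK_TYPE_LABELS)
print(scouting)
```

The same pattern would apply to **kickContactType** with its own lookup table.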

In [None]:
# Glimpse of PFF Scouting data
df_ppfSctdata.head()

In [None]:
# PFF Scouting data set information in brief
df_ppfSctdata.info()

In [None]:
fig, ((ax1,ax2),(ax3,ax4),(ax5,ax6)) = plt.subplots(3,2, figsize=(15,20))

# Numeric fields: histograms with a KDE overlay
sns.histplot(df_ppfSctdata['hangTime'], bins=20, kde=True, ax=ax1).set(title='Hangtime (seconds)')
sns.histplot(df_ppfSctdata['snapTime'], bins=20, kde=True, ax=ax4).set(title='Snap Time')

# Categorical fields: count the rows per category instead of histogramming value_counts()
sns.countplot(x='kickType', data=df_ppfSctdata, ax=ax2).set(title='Kick Type')
sns.countplot(x='kickDirectionActual', data=df_ppfSctdata, ax=ax3).set(title='Kick Direction')
sns.countplot(x='kickContactType', data=df_ppfSctdata, ax=ax5).set(title='Kick Contact Type')
sns.countplot(x='returnDirectionActual', data=df_ppfSctdata, ax=ax6).set(title='Return Direction')

plt.tight_layout()

In [None]:
# Merge scout and plays
play_scout = pd.merge(df_plays, df_ppfSctdata, how='left', on=['playId','gameId'])
# Select only numeric columns
num_play_scout = play_scout.select_dtypes(include='number')


corr_df = num_play_scout[['quarter','down','yardsToGo','yardlineNumber',
                          'penaltyYards','preSnapHomeScore','preSnapVisitorScore',
                          'kickLength','kickReturnYardage','playResult',
                          'absoluteYardlineNumber','snapTime','operationTime',
                          'hangTime']]

plt.figure(figsize=(20, 10))
corr = corr_df.corr()
sns.heatmap(corr, annot=True)
plt.title('Plays - Scout Data Correlation Heatmap')
plt.show()