# (NFL 2020 DATA Visualization Competition.)
## by (Peter Gamal Girgis)

## Project Overview:

> ### In American football, there are a plethora of defensive strategies and outcomes. The National Football League (NFL) has used previous Kaggle competitions to focus on offensive plays, but as the old proverb goes, “defense wins championships.” Though metrics for analyzing quarterbacks, running backs, and wide receivers are consistently a part of public discourse, techniques for analyzing the defensive part of the game trail and lag behind. Identifying player, team, or strategic advantages on the defensive side of the ball would be a significant breakthrough for the game.

> ### This competition uses NFL’s Next Gen Stats data, which includes the position and speed of every player on the field during each play. You’ll employ player tracking data for all drop-back pass plays from the 2018 regular season. The goal of submissions is to identify unique and impactful approaches to measure defensive performance on these plays. There are several different directions for participants to ‘tackle’ (ha)—which may require levels of football savvy, data aptitude, and creativity. As examples:
>
>> ### - **What are coverage schemes (man, zone, etc) that the defense employs? What coverage options tend to be better performing?**
>> ### - **Which players are the best at closely tracking receivers as they try to get open?**
>> ### - **Which players are the best at closing on receivers when the ball is in the air?**
>> ### - **Which players are the best at defending pass plays when the ball arrives?**
>> ### - **Is there any way to use player tracking data to predict whether or not certain penalties – for example, defensive pass interference – will be called?**
>> ### - **Who are the NFL’s best players against the pass?**
>> ### - **How does a defense react to certain types of offensive plays?**
>> ### - **Is there anything about a player – for example, their height, weight, experience, speed, or position – that can be used to predict their performance on defense?**
>
> #### What does data tell us about defending the pass play? You are about to find out.

# Data Description
> #### The 2021 Big Data Bowl data contains player tracking, play, game, and player level information for all possible passing plays during the 2018 regular season. For purposes of this event, passing plays are considered to be ones on which a pass was thrown, the quarterback was sacked, or any one of five different penalties was called (defensive pass interference, offensive pass interference, defensive holding, illegal contact, or roughing the passer). Tracking data for linemen (both offensive and defensive) are not provided on any play. The focus of this year's contest is on pass coverage.
>
> #### Here, you'll find a summary of each data set in the 2021 Data Bowl, a list of key variables to join on, and a description of each variable.

# File descriptions
> #### `Game data`: The `games.csv` contains the teams playing in each game. The key variable is `gameId`.
>
> #### `Player data`: The `players.csv` file contains player-level information from players that participated in any of the tracking data files. The key variable is `nflId`.
>
> #### `Play data`: The `plays.csv` file contains play-level information from each game. The key variables are `gameId` and `playId`.
>
> #### `Tracking data`: Files `week[week].csv` contain player tracking data from all games in week `[week]`. The key variables are `gameId`, `playId`, and `nflId`. There are 17 weeks to a typical NFL Regular Season, and thus 17 data frames with player tracking data are provided.
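
The join keys listed above can be illustrated with a small sketch. The three toy frames below are made-up stand-ins for `games.csv`, `plays.csv` and one `week[week].csv` file (the values are invented, only the key column names come from the description), so this shows the joining pattern rather than the real data.

```python
import pandas as pd

# Toy stand-ins for games.csv, plays.csv and one week file (values are invented).
games_demo = pd.DataFrame({"gameId": [1, 2], "week": [1, 2]})
plays_demo = pd.DataFrame({
    "gameId": [1, 1, 2],
    "playId": [75, 146, 75],   # playId repeats across games
    "down": [1, 2, 3],
})
tracking_demo = pd.DataFrame({
    "gameId": [1, 1, 2],
    "playId": [75, 75, 75],
    "nflId": [10, 11, 10],
    "frameId": [1, 1, 1],
})

# Attach play-level columns (gameId + playId) and game-level columns (gameId)
# to every tracking row.
merged_demo = (
    tracking_demo
    .merge(plays_demo, on=["gameId", "playId"], how="left")
    .merge(games_demo, on="gameId", how="left")
)
print(merged_demo)
```

Because `playId` is not unique across games, joining plays on `playId` alone would match rows from different games; the sketch therefore always pairs it with `gameId`.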

# Game data

> ##### - `gameId`: Game identifier, unique (numeric)
> ##### - `gameDate`: Game Date (time, mm/dd/yyyy)
> ##### - `gameTimeEastern`: Start time of game (time, HH:MM:SS, EST)
> ##### - `homeTeamAbbr`: Home team three-letter code (text)
> ##### - `visitorTeamAbbr`: Visiting team three-letter code (text)
> ##### - `week`: Week of game (numeric)

# Player data

> ##### - `nflId`: Player identification number, unique across players (numeric)
> ##### - `height`: Player height (text)
> ##### - `weight`: Player weight (numeric)
> ##### - `birthDate`: Date of birth (YYYY-MM-DD)
> ##### - `collegeName`: Player college (text)
> ##### - `position`: Player position (text)
> ##### - `displayName`: Player name (text)

# Play data
> ##### - `gameId`: Game identifier, unique (numeric)
> ##### - `playId`: Play identifier, not unique across games (numeric)
> ##### - `playDescription`: Description of play (text)
> ##### - `quarter`: Game quarter (numeric)
> ##### - `down`: Down (numeric)
> ##### - `yardsToGo`: Distance needed for a first down (numeric)
> ##### - `possessionTeam`: Team on offense (text)
> ##### - `playType`: Outcome of dropback: sack or pass (text)
> ##### - `yardlineSide`: 3-letter team code corresponding to line-of-scrimmage (text)
> ##### - `yardlineNumber`: Yard line at line-of-scrimmage (numeric)
> ##### - `offenseFormation`: Formation used by possession team (text)
> ##### - `personnelO`: Personnel used by offensive team (text)
> ##### - `defendersInTheBox`: Number of defenders in close proximity to line-of-scrimmage (numeric)
> ##### - `numberOfPassRushers`: Number of pass rushers (numeric)
> ##### - `personnelD`: Personnel used by defensive team (text)
> ##### - `typeDropback`: Dropback categorization of quarterback (text)
> ##### - `preSnapHomeScore`: Home score prior to the play (numeric)
> ##### - `preSnapVisitorScore`: Visiting team score prior to the play (numeric)
> ##### - `gameClock`: Time on clock of play (MM:SS)
> ##### - `absoluteYardlineNumber`: Distance from end zone for possession team (numeric)
> ##### - `penaltyCodes`: NFL categorization of the penalties that occurred on the play. For purposes of this contest, the most important penalties are Defensive Pass Interference (DPI), Offensive Pass Interference (OPI), Illegal Contact (ICT), and Defensive Holding (DH). Multiple penalties on a play are separated by a ; (text)
> ##### - `penaltyJerseyNumbers`: Jersey number and team code of the player committing each penalty. Multiple penalties on a play are separated by a ; (text)
> ##### - `passResult`: Outcome of the passing play (C: Complete pass, I: Incomplete pass, S: Quarterback sack, IN: Intercepted pass, text)
> ##### - `offensePlayResult`: Yards gained by the offense, excluding penalty yardage (numeric)
> ##### - `playResult`: Net yards gained by the offense, including penalty yardage (numeric)
> ##### - `epa`: Expected points added on the play, relative to the offensive team. Expected points is a metric that estimates the average of every next scoring outcome given the play's down, distance, yardline, and time remaining (numeric)
> ##### - `isDefensivePI`: An indicator variable for whether or not a DPI penalty occurred on a given play (TRUE/FALSE)


# Tracking data

> #### Each of the 17 `week[week].csv` files contains player tracking data from all passing plays during Week `[week]` of the 2018 regular season. Nearly all plays from each `[gameId]` are included; certain plays or games with insufficient data are dropped. Each team and player plays no more than 1 game in a given week.
> ##### - `time`: Time stamp of play (time, yyyy-mm-dd, hh:mm:ss)
> ##### - `x`: Player position along the long axis of the field, 0 - 120 yards. See Figure 1 below. (numeric)
> ##### - `y`: Player position along the short axis of the field, 0 - 53.3 yards. See Figure 1 below. (numeric)
> ##### - `s`: Speed in yards/second (numeric)
> ##### - `a`: Acceleration in yards/second^2 (numeric)
> ##### - `dis`: Distance traveled from prior time point, in yards (numeric)
> ##### - `o`: Player orientation (deg), 0 - 360 degrees (numeric)
> ##### - `dir`: Angle of player motion (deg), 0 - 360 degrees (numeric)
> ##### - `event`: Tagged play details, including moment of ball snap, pass release, pass catch, tackle, etc (text)
> ##### - `nflId`: Player identification number, unique across players (numeric)
> ##### - `displayName`: Player name (text)
> ##### - `jerseyNumber`: Jersey number of player (numeric)
> ##### - `position`: Player position group (text)
> ##### - `team`: Team (away or home) of corresponding player (text)
> ##### - `frameId`: Frame identifier for each play, starting at 1 (numeric)
> ##### - `gameId`: Game identifier, unique (numeric)
> ##### - `playId`: Play identifier, not unique across games (numeric)
> ##### - `playDirection`: Direction that the offense is moving (text, left or right)
> ##### - `route`: Route run by offensive player (text)
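
Since `playDirection` can be either left or right, comparing plays often requires putting them in one orientation first. The sketch below shows one common convention (not something this notebook does): mirror the plays that move left so that every offense moves right. The field dimensions are taken from the `x` and `y` ranges listed above; the function name and the toy frame are assumptions made for illustration.

```python
import pandas as pd

FIELD_LENGTH = 120.0  # yards, from the `x` range above
FIELD_WIDTH = 53.3    # yards, from the `y` range above


def standardize_direction(df):
    """Return a copy in which every play is oriented as if the offense moves right."""
    out = df.copy()
    left = out["playDirection"] == "left"
    # Mirror positions through the centre of the field.
    out.loc[left, "x"] = FIELD_LENGTH - out.loc[left, "x"]
    out.loc[left, "y"] = FIELD_WIDTH - out.loc[left, "y"]
    # Rotate both angles by half a turn and keep them in [0, 360).
    for col in ("o", "dir"):
        out.loc[left, col] = (out.loc[left, col] + 180) % 360
    out["playDirection"] = "right"
    return out


# Toy tracking rows (invented values).
tracking_toy = pd.DataFrame({
    "x": [30.0, 90.0], "y": [10.0, 40.0],
    "o": [90.0, 270.0], "dir": [45.0, 350.0],
    "playDirection": ["right", "left"],
})
flipped_toy = standardize_direction(tracking_toy)
print(flipped_toy)
```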


<a id='Data Gathering and wrangling'></a>
## Data Gathering and wrangling

> **Tip**: In this section of the report:
> 1. I'll load the data from the different CSV files.
> 2. I'll check each dataset for cleanliness, then trim and clean it for analysis.

In [None]:
# import all required libraries

# Pandas & NumPy Profiling

import pandas as pd
import numpy as np
import pandas_profiling as pdp
from pandas_profiling import ProfileReport

# Visualisation libraries

from PIL import Image
import scipy.misc
import plotly.express as px
import plotly.graph_objects as go
import plotly
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.patches as patches
from matplotlib import animation, rc
from matplotlib.animation import FFMpegWriter
import seaborn as sns
import dateutil
import plotly.tools as tls
from wordcloud import WordCloud
from IPython.display import Video

import io
import re
import os
#from os import startfile
import os.path as path
import requests
from glob import glob # for combining several CSV files
import altair as alt
from math import radians

### 1- Open and Wrangle the **`Games`** DataFrame

In [None]:
# open Dataframe CSV file 
games = pd.read_csv('../input/nfl-big-data-bowl-2021/games.csv')
games

In [None]:
# Check Dataframe shape
games.shape

In [None]:
# Check Dataframe info()
games.info()

In [None]:
# check Dataframe NULL values
games.isnull().sum()

In [None]:
# Check Dataframe duplication
games.duplicated().sum()

In [None]:
#Check Dataframe unique values
games.nunique()

In [None]:
# convert columns "gameDate" and "gameTimeEastern" to datetime
games.gameDate = pd.to_datetime(games.gameDate)
games.gameTimeEastern = pd.to_datetime(games.gameTimeEastern)

### 2- Open and Wrangle the **`Plays`** DataFrame

In [None]:
# open Dataframe CSV file 
plays = pd.read_csv('../input/nfl-big-data-bowl-2021/plays.csv')
plays.head(2)

In [None]:
# Check Dataframe shape
plays.shape

In [None]:
# Check Dataframe info()
plays.info()

In [None]:
# Check Dataframe duplication
plays.duplicated().sum()

In [None]:
# check Dataframe NULL values
plays.isnull().sum()

In [None]:
plays.columns

In [None]:
# rebuild column "gameClock" from the clock text that precedes the first ")" in column "playDescription"
plays['gameClock'] = plays.playDescription.str.split(')', n = 1).str[0]
# a clock written as "(:SS" has no minutes part, so prefix it with "00"
plays.gameClock = plays.gameClock.str.replace('(:', '00:', regex = False)
plays.gameClock = plays.gameClock.str.strip('(')

In [None]:
# correct data type for "gameClock" to be time stamp
plays.gameClock = pd.to_datetime(plays.gameClock)
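
The clock extraction above can also be sketched with a regular expression that returns the seconds left in the quarter, which avoids the placeholder date that `pd.to_datetime` attaches to a bare time. The pattern assumes the description starts with a clock in brackets, with an optional minutes part, as the string handling above implies; `clock_seconds` and the sample descriptions are hypothetical.

```python
import re

# Leading clock such as "(15:00)" or "(:52)"; the minutes part may be empty.
CLOCK_PATTERN = re.compile(r"^\((\d{0,2}):(\d{2})\)")


def clock_seconds(description):
    """Return seconds left in the quarter, or None if no leading clock is found."""
    match = CLOCK_PATTERN.match(description)
    if match is None:
        return None
    minutes = int(match.group(1) or 0)  # "(:52)" has no minutes part
    return minutes * 60 + int(match.group(2))


# Invented descriptions, used only to exercise the function.
print(clock_seconds("(15:00) pass short right"))
print(clock_seconds("(:52) pass incomplete"))
print(clock_seconds("no clock here"))
```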

In [None]:
plays.info()

In [None]:
# convert cells with the value "EMPTY" in column "offenseFormation" to NaN
plays.offenseFormation = plays.offenseFormation.replace('EMPTY',np.nan)

In [None]:
plays.offenseFormation.unique()

In [None]:
# replace NULL values with the text in column "playDescription"
plays.offenseFormation = plays.offenseFormation.fillna(plays.playDescription)

In [None]:
plays.offenseFormation.unique()

In [None]:
# remove 1st word from text
plays.offenseFormation = plays.offenseFormation.str.split(n=1).str[-1]

In [None]:
# keep the text before the first ")" and remove the opening bracket
plays.offenseFormation = plays.offenseFormation.str.split(')', n = 1).str[0]
plays.offenseFormation = plays.offenseFormation.str.strip('(')

In [None]:
plays.offenseFormation.unique()

In [None]:
# replace "Shotgun" with "SHOTGUN"
plays.offenseFormation = plays.offenseFormation.replace('Shotgun','SHOTGUN')

In [None]:
# convert unspecified cells to Null
valid_formations = {'I_FORM', 'SINGLEBACK', 'SHOTGUN', 'PISTOL', 'WILDCAT', 'No Huddle', 'JUMBO', 'No Huddle, Shotgun'}

def clean(x):
    # keep recognised formation labels and set everything else to NaN
    return x if x in valid_formations else np.nan

plays.offenseFormation = plays.offenseFormation.apply(clean)

In [None]:
plays.offenseFormation.unique()

In [None]:
plays = plays.drop(['penaltyCodes','penaltyJerseyNumbers'],axis = 1)

In [None]:
plays.isnull().sum()

In [None]:
# compute the mean per "playId" for columns "defendersInTheBox, numberOfPassRushers, preSnapVisitorScore, preSnapHomeScore and absoluteYardlineNumber"
# and use it to fill the NULL values
plays.defendersInTheBox = plays.defendersInTheBox.fillna(plays.groupby('playId')['defendersInTheBox'].transform('mean'))
plays.numberOfPassRushers = plays.numberOfPassRushers.fillna(plays.groupby('playId')['numberOfPassRushers'].transform('mean'))
plays.preSnapVisitorScore = plays.preSnapVisitorScore.fillna(plays.groupby('playId')['preSnapVisitorScore'].transform('mean'))
plays.preSnapHomeScore = plays.preSnapHomeScore.fillna(plays.groupby('playId')['preSnapHomeScore'].transform('mean'))
plays.absoluteYardlineNumber = plays.absoluteYardlineNumber.fillna(plays.groupby('playId')['absoluteYardlineNumber'].transform('mean'))
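
The fill above groups by `playId`, which the data description notes is not unique across games, so each mean pools plays from different games. As a hedged alternative sketch (not the approach used in this notebook), the same `groupby(...).transform('mean')` pattern can be keyed on `gameId` so that a missing value is filled from the other plays of the same game. The toy frame below is invented.

```python
import numpy as np
import pandas as pd

# Toy plays table with two games and two missing values (invented numbers).
plays_toy = pd.DataFrame({
    "gameId": [1, 1, 1, 2, 2],
    "defendersInTheBox": [6.0, np.nan, 8.0, 5.0, np.nan],
})

# transform("mean") returns one value per row: the mean of that row's game.
game_mean = plays_toy.groupby("gameId")["defendersInTheBox"].transform("mean")
plays_toy["defendersInTheBox"] = plays_toy["defendersInTheBox"].fillna(game_mean)
print(plays_toy)
```

Which grouping is more appropriate depends on the analysis; the point of the sketch is only that the grouping key decides which plays contribute to each fill value.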

In [None]:
plays.isnull().sum()

In [None]:
# create list for all 3-letter team code corresponding to line-of-scrimmage
yard_list = list(plays.yardlineSide.unique())

In [None]:
# check the list created
yard_list

In [None]:
# remove Null value from the created list
yard_list = [x for x in yard_list if str(x) != 'nan']

In [None]:
# check list
yard_list

In [None]:
# replace NULL values with the text in column "playDescription"
plays.yardlineSide = plays.yardlineSide.fillna(plays.playDescription)

In [None]:
plays.yardlineSide.unique()

In [None]:
# create function to replace items in the list "yard_list" at column "yardlineSide"
def find_at_word(text):
    word=re.findall(r'\b(?:{})\b'.format('|'.join(map(re.escape, yard_list))),text)
    return " ".join(word)

plays.yardlineSide = plays.yardlineSide.apply(lambda x: find_at_word(x))

In [None]:
plays.yardlineSide.unique()

In [None]:
# remove repeated words (the pattern is a regular expression, so regex = True is passed explicitly)
plays.yardlineSide = plays.yardlineSide.str.replace(r'\b(\w+)(\s+\1)+\b', r'\1', regex = True)

In [None]:
plays.yardlineSide.unique()

In [None]:
# add "," between team codes corresponding to line-of-scrimmage
plays.yardlineSide = plays.yardlineSide.str.replace(' ', ',')
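
The repeated-word pattern used above only collapses a code that is immediately followed by itself. The sketch below applies the same pattern with Python's `re` module to show that behaviour, next to an order-preserving alternative that also removes repeats that are not adjacent. Both helper names and the sample strings are hypothetical.

```python
import re


def collapse_adjacent(text):
    """Collapse runs of the same word, e.g. 'ATL ATL PHI' -> 'ATL PHI'."""
    return re.sub(r"\b(\w+)(\s+\1)+\b", r"\1", text)


def unique_in_order(text):
    """Keep the first occurrence of every word, preserving order."""
    return " ".join(dict.fromkeys(text.split()))


print(collapse_adjacent("ATL ATL PHI"))   # adjacent repeat is collapsed
print(collapse_adjacent("ATL PHI ATL"))   # non-adjacent repeat is kept
print(unique_in_order("ATL PHI ATL"))     # non-adjacent repeat is removed
```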

In [None]:
plays.yardlineSide.unique()

In [None]:
# Drop the unnecessary column "playDescription"
plays = plays.drop('playDescription',axis = 1)

In [None]:
plays.isnull().sum()

### 3- Open and Wrangle the **`Players`** DataFrame

In [None]:
# open Dataframe CSV file 
players = pd.read_csv('../input/nfl-big-data-bowl-2021/players.csv')
players.head(2)

In [None]:
players.info()

In [None]:
players.birthDate = pd.to_datetime(players.birthDate)

In [None]:
players.groupby('height').count()

In [None]:
# modify the heights and remove "-" then convert data type from feet-inch to inches only
def height_convert(height):
    h = height.split('-')
    if len(h) > 1:
        return (int(h[0]) * 12 + int(h[1]))
    else:
        return height

In [None]:
# change Heights formatted to inches.
players.height = players.height.apply(lambda x: height_convert(x))

In [None]:
# convert the data type of column "height" from string to integer.
players.height = players.height.astype(int)

In [None]:
players.head(10)

In [None]:
players.groupby('height').count()

In [None]:
# calculate player Age according to "birthDate" column
def cal_age(birthDate):
    today = pd.to_datetime('today')
    return (today - birthDate).days / 365.25

In [None]:
players['age'] = players.birthDate.apply(cal_age)
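
Because `cal_age` measures age against today's date, the values change every time the notebook is rerun. A hedged alternative, shown only as a sketch, is to measure age at a fixed reference date. The date below is the first game day of the 2018 regular season (6 September 2018); using it as the reference, and the name `age_on`, are assumptions of this example rather than part of the analysis.

```python
import pandas as pd

# Assumed reference date: first game day of the 2018 regular season.
SEASON_START = pd.Timestamp("2018-09-06")


def age_on(birth_date, reference=SEASON_START):
    """Return the age in years at the reference date."""
    return (reference - pd.Timestamp(birth_date)).days / 365.25


# A player born exactly 30 calendar years before the reference date.
example_age = age_on("1988-09-06")
print(round(example_age, 2))
```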

In [None]:
# create bmi column at "players" dataFrame using formula " [weight (lb) / height (in) / height (in)] x 703 "
player_weight = players.weight.astype(float)
player_height = players.height.astype(float)
players['bmi'] = player_weight / player_height / player_height * 703
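
As a worked check of the two conversions above, the sketch below converts a feet-inches height to inches and applies the BMI formula weight (lb) / height (in)² × 703 to a single made-up player. The helper names and the example values are hypothetical.

```python
def height_to_inches(height):
    """Convert '6-2' (feet-inches) or '74' (already inches) to whole inches."""
    feet, sep, inches = str(height).partition("-")
    if sep:
        return int(feet) * 12 + int(inches)
    return int(feet)  # no "-" found, so the value is already in inches


def bmi_imperial(weight_lb, height_in):
    """BMI from pounds and inches: weight / height^2 * 703."""
    return weight_lb / height_in / height_in * 703


example_height = height_to_inches("6-2")          # 6 * 12 + 2 inches
example_bmi = bmi_imperial(220, example_height)   # invented 220 lb player
print(example_height, round(example_bmi, 2))
```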

In [None]:
players.head(2)

In [None]:
players.info()

### 4- Open and Wrangle the **`weeks`** DataFrame

In [None]:
## use the glob function to collect the CSV files that share the same structure
files = sorted (glob('../input/nfl-big-data-bowl-2021/week*.csv'))
files

In [None]:
# use concat() to combine all files, and assign() to record each row's source file
weeks = pd.concat((pd.read_csv(file).assign(filename = file) for file in files), ignore_index = True)
weeks

In [None]:
weeks.filename.unique()

In [None]:
# reduce the file path to the week number (literal replacements, not regular expressions)
weeks.filename = weeks.filename.str.replace('../input/nfl-big-data-bowl-2021/', '', regex = False)
weeks.filename = weeks.filename.str.replace('.csv', '', regex = False)
weeks.filename = weeks.filename.str.replace('week', '', regex = False)
weeks

In [None]:
# rename column "filename" to "week"
weeks = weeks.rename(columns = {'filename' :'week'})

In [None]:
# convert Datatype at column "week" from string to integer.
weeks.week = weeks.week.astype('int64')
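
The week number can also be read from the file name in one step. The sketch below is a minimal alternative to the chained replacements above; `week_from_path` is a hypothetical helper, and the paths follow the `week[week].csv` naming described earlier. It also shows that a plain `sorted()` orders the paths as text (week1, week10, ..., week2), while sorting by the parsed number gives week order.

```python
import re
from pathlib import PurePosixPath


def week_from_path(path):
    """Return the week number encoded in a 'week<N>.csv' file name."""
    match = re.fullmatch(r"week(\d+)", PurePosixPath(path).stem)
    if match is None:
        raise ValueError("not a weekly tracking file: {}".format(path))
    return int(match.group(1))


demo_paths = [
    "../input/nfl-big-data-bowl-2021/week10.csv",
    "../input/nfl-big-data-bowl-2021/week2.csv",
    "../input/nfl-big-data-bowl-2021/week1.csv",
]
text_order = [week_from_path(p) for p in sorted(demo_paths)]
week_order = [week_from_path(p) for p in sorted(demo_paths, key=week_from_path)]
print(text_order, week_order)
```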

In [None]:
weeks.info()

In [None]:
weeks.head(2)

In [None]:
# Check Dataframe duplication
weeks.duplicated().sum()

In [None]:
# drop duplicated rows from weeks
weeks = weeks.drop_duplicates()

In [None]:
weeks.info()

### 5- Copying DataFrames

In [None]:
# Copying the original DataFrames
df_games = games.copy()
df_plays = plays.copy()
df_players = players.copy()
df_weeks = weeks.copy()

<a id='eda'></a>
## Exploratory Data Analysis (Analyzing and Visualization)

### <span style="color:orange">  1. **Games DataFrame :** </span>

> ### Q1: How Many Matches Were Played per Month?

In [None]:
df_games.info()

In [None]:
df_games['game_date'] = pd.to_datetime(df_games.gameDate)

In [None]:
date = df_games.game_date.groupby([df_games.game_date.dt.month, df_games.game_date.dt.year]).count()
ax = date.plot(kind ='line',figsize = (10,8), marker = 'o')       
plt.title('Matches Played per Month', weight = 'bold', size = 15)
plt.xlabel('Date(Month-Year)', size = 15)
plt.ylabel('Total Matches Played', size = 15)
plt.xticks([0, 1, 2, 3], ['Sep-18', 'Oct-18', 'Nov-18', 'Dec-18'])
plt.grid(True)
plt.show();

> ### Q2: How Many Matches Were Played per Day?

In [None]:
date = df_games.game_date.groupby([df_games.game_date.dt.month,df_games.game_date.dt.day]).count()
ax = date.plot(kind ='line',figsize = (20,8), marker = 'o')
plt.title('Matches Played per Day', weight = 'bold', size = 15)
plt.xlabel('Date(Day-Month)', size = 15)
plt.ylabel('Total Matches Played', size = 15)
plt.xticks([0, 4, 10, 13, 17, 21, 24, 30, 36, 39, 43, 49],
           ['6th-Sep', '16th-Sep', '30th-Sep', '7th-Oct', '15th-Oct', '25th-Oct', '1st-Nov', '15th-Nov', '29th-Nov', '6th-Dec', '15th-Dec', '30th-Dec'])
plt.grid(True)
plt.show();

> ### Q3: What's the Total Number of Games Played per Team as Home / Visitor Team?

In [None]:
# group by "homeTeamAbbr", count "gameId", then sort home teams in descending order of games played
df_play_home_game_sort = df_games.groupby('homeTeamAbbr')['gameId'].count().reset_index().sort_values( by = 'gameId', ascending = False )

# group by "visitorTeamAbbr", count "gameId", then sort away teams in descending order of games played
df_play_away_game_sort = df_games.groupby('visitorTeamAbbr')['gameId'].count().reset_index().sort_values( by = 'gameId', ascending = False )

In [None]:
fig,(ax1,ax2) = plt.subplots (1,2,figsize = (16,20))

# figure "a" shows the number of games each team played as the home team
a = sns.barplot (x = df_play_home_game_sort.gameId, y = df_play_home_game_sort.homeTeamAbbr, ax = ax1 , 
                 linewidth = 1 ,alpha = 0.7, palette = 'Blues_r')

# figure "b" shows the number of games each team played as the visiting team
b = sns.barplot (x = df_play_away_game_sort.gameId, y = df_play_away_game_sort.visitorTeamAbbr, ax = ax2 , 
                 linewidth = 1 ,alpha = 0.7, palette = 'Blues_r')

# loop to label each bar with the number of home games
for i,j in enumerate(df_play_home_game_sort.gameId):
        ax1.text(.07,i+0.15,j,weight = "bold", size = 15)

# loop to label each bar with the number of away games
for k,l in enumerate(df_play_away_game_sort.gameId):
        ax2.text(.07,k+0.15,l,weight = "bold", size = 15)     
        
a.set_title("Total Games played as Home Team" , weight = 'bold', size = 15)
a.set_xlabel('Total Home Games Count', size = 15)
a.set_ylabel('NFL Team', size = 15)

b.set_title("Total Games played as Visitor Team" , weight = 'bold', size = 15)
b.set_xlabel('Total Visitor Games Count', size = 15)
b.set_ylabel('NFL Team', size = 15)
plt.show();

> ### Q4: What's the Total Number of Games Played per Week?

In [None]:
# group by "week", count "gameId", then sort by week number
df_week_game_sort = df_games.groupby('week')['gameId'].count().reset_index().sort_values( by = 'week' )

In [None]:
fig, ax1 = plt.subplots (1,figsize = (8,12))

# figure "a" represent number of Games played per week
a = sns.barplot (x = df_week_game_sort.gameId, y = df_week_game_sort.week, ax = ax1 , linewidth = 1 ,
                 alpha = 0.7, palette = 'Blues_r', orient = 'h')


# create loop to count number of games in each week
for i,j in enumerate(df_week_game_sort.gameId):
        ax1.text(0.5,i+0.1,j,weight = "bold", size = 15)

a.set_title("Total Games Count per Week" , weight = 'bold', size = 15)
a.set_xlabel('Total Games Count', size = 15)
a.set_ylabel('Weeks Number', size = 15)
plt.yticks([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
           ['Week 1','Week 2','Week 3','Week 4','Week 5','Week 6','Week 7','Week 8','Week 9','Week 10','Week 11','Week 12','Week 13','Week 14','Week 15','Week 16','Week 17'])
plt.show();

> ### Q5: What's the Total Number of Teams in the NFL?

In [None]:
# Count the total number of teams in the NFL
print ('='*80+'\n')
print('There are {} teams in the NFL, listed below:\n\n {}'.format(df_games.homeTeamAbbr.nunique(), df_games.homeTeamAbbr.unique()))
print ('\n'+'='*80)

### <span style="color:orange">  2. **Players DataFrame :** </span>

> ### Q1: What's the Player Height Distribution?

In [None]:
df_players.height.describe()

In [None]:
# plot the player height distribution
plt.figure(figsize =[15,6])
bins = np.arange(df_players.height.min(), df_players.height.max() + 1, 1)
plt.hist(x = df_players.height,bins = bins, ec = 'black', alpha = 0.7)

plt.xlabel('Height (Inches)',size = 14)
plt.ylabel('Total Players Count',size = 14)
plt.title("Player Height Distribution",size = 16, weight = 'bold')
plt.show();

> ### Q2: Who Are the 20 Tallest Players in the NFL?

In [None]:
# select columns "displayName" & "height", then sort players from tallest to shortest
df_height_sort = df_players[['height', 'displayName']].sort_values(by = 'height', ascending = False)
df_height_sort = df_height_sort.reset_index(drop = True)

In [None]:
fig, ax1 = plt.subplots (1,figsize = (10,12))

# figure "a" shows the first 20 players of the height ranking
a = sns.barplot (x = df_height_sort.height[:20], y = df_height_sort.displayName[:20], ax = ax1 , linewidth = 1 ,alpha = 0.7, palette = 'Blues_r')

# loop to label each bar with the player's height
for i,j in enumerate(df_height_sort.height[:20]):
        ax1.text(.7,i+0.08,'{:0.1f}'.format(j),weight = "bold", size = 15)

a.set_title("Top 20 Tallest Players" , weight = 'bold', size = 15)
a.set_xlabel('Player Height (Inches)', size = 15)
a.set_ylabel('Player Name', size = 15)
plt.show();
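
A top-N ranking such as the one plotted above can also be sketched with `DataFrame.nlargest`, which sorts by a column and keeps the first N rows in one call. The toy frame below is invented; only the column names match the Players data.

```python
import pandas as pd

# Invented players, used only to show the ranking pattern.
players_toy = pd.DataFrame({
    "displayName": ["A", "B", "C", "D"],
    "height": [70, 78, 74, 76],
})

# Keep the two tallest players, tallest first.
tallest_toy = players_toy.nlargest(2, "height").reset_index(drop=True)
print(tallest_toy)
```

The same call with `"weight"`, `"age"` or `"bmi"` as the column would cover the other top-20 questions in this section.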

> ### Q3: What's the Player Weight Distribution?

In [None]:
# plot the players Weight distribution
plt.figure(figsize =[20,8])
bins = np.arange(df_players.weight.min(), df_players.weight.max() + 1, 1)
plt.hist(x = df_players.weight, bins = bins, ec = 'black', alpha = 0.7)

plt.xticks([150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350],
           ['150','160','170','180','190','200', '210', '220', '230', '240', '250', '260', '270', '280', '290', '300', '310', '320', '330', '340', '350'])
plt.xlabel('Weight (lbs)',size = 15)
plt.ylabel('Total Players Count',size = 14)
plt.title("Player Weight Distribution",size = 15, weight = 'bold')
plt.show();

> ### Q4: Who Are the 20 Heaviest Players in the NFL?

In [None]:
# select columns "displayName" & "weight", then sort players from heaviest to lightest
df_weight_sort = df_players[['weight', 'displayName']].sort_values(by = 'weight', ascending = False)
df_weight_sort = df_weight_sort.reset_index(drop = True)

In [None]:
fig, ax1 = plt.subplots (1,figsize = (10,12))

# figure "a" shows the first 20 players of the weight ranking
a = sns.barplot (x = df_weight_sort.weight[:20], y = df_weight_sort.displayName[:20], ax = ax1 , linewidth = 1 ,alpha = 0.7, palette = 'Blues_r')

# loop to label each bar with the player's weight
for i,j in enumerate(df_weight_sort.weight[:20]):
        ax1.text(.7,i+0.08,'{:0.1f}'.format(j),weight = "bold", size = 15)

a.set_title("Top 20 Heaviest Players" , weight = 'bold', size = 15)
a.set_xlabel('Player Weight (lbs)', size = 15)
a.set_ylabel('Player Name', size = 15)
plt.show();

> ### Q5: What's the Player Age Distribution?

In [None]:
# plot the players Age distribution
plt.figure(figsize =[20,8])
bins = np.arange(df_players.age.min(), df_players.age.max() + 1, 1)
plt.hist(x = df_players.age, bins = bins, ec = 'black', alpha = 0.7)

plt.xticks([22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
           ['22','23','24','25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44'])

plt.xlabel('Age (Year)',size = 15)
plt.ylabel('Total Players Count',size = 15)
plt.title("Player Age Distribution",size = 15, weight = 'bold')
plt.show();

> ### Q4: Who Are the 20 Oldest Players in the NFL?

In [None]:
# select columns "displayName" & "age", then sort players from oldest to youngest
df_age_sort = df_players[['age', 'displayName']].sort_values(by = 'age', ascending = False)
df_age_sort = df_age_sort.reset_index(drop = True)

In [None]:
fig, ax1 = plt.subplots (1,figsize = (10,12))

# figure "a" shows the first 20 players of the age ranking
a = sns.barplot (x = df_age_sort.age[:20], y = df_age_sort.displayName[:20], ax = ax1 , linewidth = 1 ,alpha = 0.7, palette = 'Blues_r')

# loop to label each bar with the player's age
for i,j in enumerate(df_age_sort.age[:20]):
        ax1.text(.7,i+0.08,'{:0.2f}'.format(j),weight = "bold", size = 15)

a.set_title("Top 20 Oldest Players" , weight = 'bold', size = 15)
a.set_xlabel('Player Age (Years)', size = 15)
a.set_ylabel('Player Name', size = 15)
plt.show();

> ### Q5: What's the Player BMI (Body Mass Index) Distribution?

In [None]:
# create bmi column at "players" dataFrame using formula " [weight (lb) / height (in) / height (in)] x 703 "
df_players['bmi'] = df_players.weight / df_players.height / df_players.height * 703

In [None]:
# check the new column
df_players.head()

In [None]:
# plot the players BMI distribution
plt.figure(figsize =[20,8])
bins = np.arange(df_players.bmi.min(), df_players.bmi.max() + 1, 1)
plt.hist(x = df_players.bmi, bins = bins, ec = 'black', alpha = 0.7)

plt.xticks([22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48],
           ['22','24','26','28','30','32', '34', '36', '38', '40', '42', '44', '46', '48'])
plt.xlabel('BMI',size = 15)
plt.ylabel('Total Players Count',size = 14)
plt.title("Player BMI Distribution",size = 15, weight = 'bold')
plt.show();

> ### Q6: Which Players Have the Highest BMI in the NFL?

In [None]:
# select columns "displayName" & "bmi", then sort players from highest to lowest BMI
df_bmi_sort = df_players[['bmi', 'displayName']].sort_values(by = 'bmi', ascending = False)
df_bmi_sort = df_bmi_sort.reset_index(drop = True)

In [None]:
fig, ax1 = plt.subplots (1,figsize = (10,12))

# figure "a" shows the first 20 players of the BMI ranking
a = sns.barplot (x = df_bmi_sort.bmi[:20], y = df_bmi_sort.displayName[:20], ax = ax1 , linewidth = 1 ,alpha = 0.7, palette = 'Blues_r')

# loop to label each bar with the player's BMI
for i,j in enumerate(df_bmi_sort.bmi[:20]):
        ax1.text(.7,i+0.08,'{:0.1f}'.format(j),weight = "bold", size = 15)

a.set_title("Top 20 Players by BMI" , weight = 'bold', size = 15)
a.set_xlabel('BMI', size = 15)
a.set_ylabel('Player Name', size = 15)
plt.show();

> ### Q7: How Many Players Come from Each of the Top 20 Colleges?

In [None]:
# group by "collegeName", count "nflId", then sort colleges in descending order of player count
df_team_sort = df_players.groupby('collegeName')['nflId'].count().reset_index().sort_values( by = 'nflId', ascending = False )

In [None]:
fig, ax1 = plt.subplots (1,figsize = (10,12))

# figure "a" represent number of players in top 20 colleges 
a = sns.barplot (x = df_team_sort.nflId, y = df_team_sort.collegeName[:20], ax = ax1, linewidth = 1, alpha = 0.7, palette = 'Blues_r')

# create loop to count number of players in each college
for i,j in enumerate(df_team_sort.nflId[:20]):
        ax1.text(.7,i+0.15,j,weight = "bold", size = 15)

a.set_title("Top 20 College Name Vs. Players Distribution" , weight = 'bold', size = 15)
a.set_xlabel('Total Players count', size = 15)
a.set_ylabel('Top 20 Colleges', size = 15)
plt.show();

> ### Q8: What's the total number of players at each position?

In [None]:
# group by "position", count "nflId", then sort positions in descending order of player count
df_position_sort = df_players.groupby('position')['nflId'].count().reset_index().sort_values( by = 'nflId', ascending = False)

In [None]:
fig, ax1 = plt.subplots (1,figsize = (10,12))

# figure "a" represent number of players in All Position
a = sns.barplot (x = df_position_sort.nflId, y = df_position_sort.position, ax = ax1 , linewidth = 1 ,alpha = 0.7, palette = 'Blues_r')

# create loop to count number of players in each Position
for i,j in enumerate(df_position_sort.nflId):
        ax1.text(.7,i+0.08,j,weight = "bold", size = 15)

a.set_title("Positions Vs. Players Distribution" , weight = 'bold', size = 15)
a.set_xlabel('Total Players count', size = 15)
a.set_ylabel('Player Positions Distribution', size = 15)
plt.show();

> ### Q9: What's the Player Attributes Summary?

In [None]:
# Create a heatmap of the summary statistics for the player attributes (height, weight, age, bmi)
plt.figure(figsize=(12,6))
sns.heatmap(df_players[['height','weight','age','bmi']].describe()[1:].transpose(),annot=True,cmap = plt.cm.plasma, 
            linecolor="white",linewidths=2)
plt.title('Player attributes Summary', weight = 'bold', size = 20)
plt.ylabel('', size = 15)
plt.show();

> ### Q10: What's Correlation Between Variables ?

In [None]:
# get the correlation between the numeric columns only
correlation = df_players.select_dtypes(include = 'number').corr()
plt.figure(figsize=(6,5))

# create heatmap plot
sns.heatmap(correlation,annot=True ,cmap = plt.cm.plasma, linecolor='white', linewidths=2)
plt.title('Correlation Between Variables')
plt.show();

> ### Q11: What's the Total Number of Players in the Players Data?

In [None]:
# Count the total number of players in the Players data
print ('='*80+'\n')
print('There are {} players in the Players data, listed below:\n\n {}'.format(df_players.displayName.nunique(), list(df_players.displayName.unique())))
print ('\n'+'='*80)

> ### Q12: What's the Total Number of Colleges Represented in the Players Data?

In [None]:
# Count the total number of colleges represented in the Players data
print ('='*80+'\n')
print('There are {} colleges represented in the Players data, listed below:\n\n {}'.format(df_players.collegeName.nunique(), list(df_players.collegeName.unique())))
print ('\n'+'='*80)

### <span style="color:orange">  3. **Plays DataFrame :** </span>

> ### Q1: What's the Offense Formation Distribution?

In [None]:
df_plays.offenseFormation.value_counts()

In [None]:
# 1st plot: "offenseFormation" analysis
plt.figure(figsize=[16, 6])
color_base = sns.color_palette()[0]

ax = sns.countplot(data = df_plays, x = 'offenseFormation', color = color_base, order = df_plays.offenseFormation.value_counts().index)

# annotate each bar with its count; "j" already holds the count, so no second lookup is needed
for i,j in enumerate (df_plays.offenseFormation.value_counts()):
    ax.text(i, 100 + j, j, weight = "bold", size = 13,va='baseline', ha='center')
    
plt.yticks([0, 2000, 4000, 6000, 8000, 10000, 12000, 14000],['0','2K','4K','6K','8K','10K','12K','14K'])
plt.xlabel('Offense Formation', size = 15)
plt.ylabel('Total Plays Count', size = 15)
plt.title('Offense Formation Distribution', size = 15, weight = 'bold')
plt.show();

> ### Q2: What's the Number Of Defenders Distribution and Number of Pass Rushers?

In [None]:
# Two plots: "defendersInTheBox" and "numberOfPassRushers" analysis
plt.figure(figsize=[18, 6])
# integer columns: run the bin edges one step past the maximum so the largest value gets its own bar
bins_defendersInTheBox = np.arange(df_plays.defendersInTheBox.min(), df_plays.defendersInTheBox.max() + 2, 1)
bins_numberOfPassRushers = np.arange(df_plays.numberOfPassRushers.min(), df_plays.numberOfPassRushers.max() + 2, 1)

plt.subplot(1, 2, 1)
plt.hist(x = df_plays.defendersInTheBox, bins = bins_defendersInTheBox ,ec = 'black', alpha = 0.7)
plt.yticks([0, 2000, 4000, 6000, 8000],['0','2K','4K','6K','8K'])
plt.xlabel('defendersInTheBox', size = 15)
plt.ylabel('Total Plays Count', size = 15)
plt.title('Number Of Defenders Distribution', size = 15, weight = 'bold')

plt.subplot(1, 2, 2)
plt.hist(x = df_plays.numberOfPassRushers, bins = bins_numberOfPassRushers,  ec = 'black', alpha = 0.7)
plt.yticks([0, 2000, 4000, 6000, 8000, 10000, 12000],['0','2K','4K','6K','8K', '10K', '12K'])
plt.xlabel('numberOfPassRushers', size = 15)
plt.ylabel('Total Plays Count', size = 15)
plt.title('Number Of Pass Rushers Distribution', size = 15, weight = 'bold')
plt.show();
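
For integer-valued columns such as the two above, the last bin edge decides whether the two largest values share a bar, because the final histogram bin is closed on both sides. A minimal sketch with made-up counts (the `rushers` array is illustrative and not taken from `df_plays`):

```python
import numpy as np

# Made-up pass-rusher counts per play (illustrative only).
rushers = np.array([3, 4, 4, 5, 5, 5, 6])

# Edges that stop at max(): the last bin [5, 6] is closed on both sides,
# so the 5s and the single 6 are counted together.
edges_short = np.arange(rushers.min(), rushers.max() + 1, 1)
counts_short, _ = np.histogram(rushers, bins=edges_short)

# Edges that run one step past max(): every integer gets its own bin.
edges_full = np.arange(rushers.min(), rushers.max() + 2, 1)
counts_full, _ = np.histogram(rushers, bins=edges_full)
```

Here `counts_short` has three bins and `counts_full` has four, one per distinct value.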

> ### Q3: What's the Absolute Yardline Number Distribution of the Possession Team?

In [None]:
# plot: "absoluteYardlineNumber" distribution
plt.figure(figsize =[20,8])
bins = np.arange(df_plays.absoluteYardlineNumber.min(), df_plays.absoluteYardlineNumber.max() + 1, 1)
plt.hist(x = df_plays.absoluteYardlineNumber, bins = bins, ec = 'black', alpha = 0.7)
plt.xticks([10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110],
           ['10','20','30', '40', '50', '60', '70', '80', '90', '100', '110']);

plt.xlabel('Absolute Yardline Number',size = 15)
plt.ylabel('Total Plays Count',size = 15)
plt.title("Absolute Yardline Number Distribution",size = 15, weight = 'bold')
plt.show();

> ### Q4: What's the Offense Play Result?

In [None]:
#Plot : "offensePlayResult" Distribution
plt.figure(figsize =[14,8])
bins = np.arange(df_plays.offensePlayResult.min(), df_plays.offensePlayResult.max() + 1, 1)
plt.hist(x = df_plays.offensePlayResult, bins = bins, ec = 'black', alpha = 0.7)

plt.xticks([-20, -10, 0, 10, 20, 30, 40, 50, 60, 70, 80],
           ['-20','-10','0', '10', '20', '30', '40', '50', '60', '70', '80']);
plt.yticks([1000, 2000, 3000, 4000, 5000, 6000, 7000],['1K', '2K', '3K', '4K', '5K', '6K', '7K'])
plt.xlabel('offensePlayResult',size = 15)
plt.ylabel('Total Plays Count',size = 15)
plt.title("offense Play Result Distribution",size = 15, weight = 'bold')
plt.show();

> ### Q5: What's the Pre-Snap Score Distribution (Home / Visitor)?

In [None]:
# two plots: "preSnapHomeScore" and "preSnapVisitorScore" at Home/Visitor  Distribution
plt.figure(figsize=[18, 6])
bins_preSnapHomeScore = np.arange(df_plays.preSnapHomeScore.min(), df_plays.preSnapHomeScore.max() + 1, 1)
bins_preSnapVisitorScore = np.arange(df_plays.preSnapVisitorScore.min(), df_plays.preSnapVisitorScore.max() + 1, 1)

plt.subplot(1, 2, 1)
plt.hist(x = df_plays.preSnapHomeScore,bins = bins_preSnapHomeScore, ec = 'black', alpha = 0.7)
plt.yticks([0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000],['0','0.5K','1K','1.5K','2K', '2.5K', '3K', '3.5K', '4K'])
plt.xlabel('preSnapHomeScore', size = 15)
plt.ylabel('Total Plays Count', size = 15)
plt.title('preSnap Home Score Distribution', size = 15, weight = 'bold')

plt.subplot(1, 2, 2)
plt.hist(x = df_plays.preSnapVisitorScore, bins = bins_preSnapVisitorScore,  ec = 'black', alpha = 0.7)
plt.yticks([0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000],['0','0.5K','1K','1.5K','2K', '2.5K', '3K', '3.5K', '4K'])
plt.xlabel('preSnapVisitorScore', size = 15)
plt.ylabel('Total Plays Count', size = 15)
plt.title('preSnap Visitor Score Distribution', size = 15, weight = 'bold')
plt.show();

> ### Q6: What's the Total Number of Plays for each Possession Team?

In [None]:
# count plays per "possessionTeam", then sort teams descending by number of plays
df_postion_player_sort = df_plays.groupby('possessionTeam')['playId'].count().reset_index().sort_values( by = 'playId', ascending = False )

In [None]:
fig, ax1 = plt.subplots (1,figsize = (10,16))

# bar chart "a": number of plays for each possession team
a = sns.barplot (x = df_postion_player_sort.playId, y = df_postion_player_sort.possessionTeam, ax = ax1 , 
                 linewidth = 1 ,alpha = 0.7, palette = 'Blues_r')

# annotate each bar with its play count
for i,j in enumerate(df_postion_player_sort.playId):
        ax1.text(0.7,i+0.2,j,weight = "bold", size = 15)

a.set_title("Total Number of Plays per Team Distribution" , weight = 'bold', size = 15)
a.set_xlabel('Total Plays Count', size = 15)
a.set_ylabel('Possession Team', size = 15)
plt.show();

> ### Q7: What's the Total Number of Completed Pass Plays for each Possession Team?

In [None]:
# count plays per "possessionTeam" and "passResult", then sort descending by number of plays
df_postion_pass_player_sort = df_plays.groupby(['possessionTeam', 'passResult'])['playId'].count().reset_index().sort_values( by = 'playId', ascending = False )
df_complete_postion_pass_player_sort = df_postion_pass_player_sort.query('passResult == "C"')

In [None]:
fig, ax1 = plt.subplots (1,figsize = (14,20))

# bar chart "a": number of completed pass plays for each possession team
a = sns.barplot (x = df_complete_postion_pass_player_sort.playId, y = df_complete_postion_pass_player_sort.possessionTeam, ax = ax1 , 
                 linewidth = 1 ,alpha = 0.7, palette = 'Blues_r')

# annotate each bar with its completed pass count
for i,j in enumerate(df_complete_postion_pass_player_sort.playId):
        ax1.text(.7,i+0.1,j,weight = "bold", size = 15)

a.set_title("Total Number of Completed Pass Plays Vs. Possession Team Distribution" , weight = 'bold', size = 15)
a.set_xlabel('Total Count of Completed Pass Plays', size = 15)
a.set_ylabel('Possession Team', size = 15)
plt.show();

> ### Q8: What's the Percentage of Completed Pass Plays for each Possession Team?

In [None]:
df_complete_postion_pass_player_percentage = df_complete_postion_pass_player_sort.copy()
df_complete_postion_pass_player_percentage = df_complete_postion_pass_player_percentage.merge(df_postion_player_sort, how = 'left', on=['possessionTeam'])
df_complete_postion_pass_player_percentage = df_complete_postion_pass_player_percentage.rename(columns = {'playId_y' :'total_playId', 'playId_x' : 'complete_pass_playId'})
df_complete_postion_pass_player_percentage['complete_pass_%'] = (df_complete_postion_pass_player_percentage.complete_pass_playId 
                                                                 / df_complete_postion_pass_player_percentage.total_playId) * 100
df_complete_postion_pass_player_percentage

In [None]:
fig, ax1 = plt.subplots (1,figsize = (14,20))

# bar chart "a": percentage of completed pass plays for each possession team
a = sns.barplot (x = df_complete_postion_pass_player_percentage['complete_pass_%'], y = df_complete_postion_pass_player_percentage.possessionTeam, 
                 ax = ax1 , linewidth = 1 ,alpha = 0.7, palette = 'Blues_r')

# annotate each bar with its completion percentage
for i,j in enumerate(df_complete_postion_pass_player_percentage['complete_pass_%']):
        ax1.text(.01,i+0.1,'{:0.2f} %'.format(j),weight = "bold", size = 15)

a.set_title("Percentage of Completed Pass Plays Vs. Possession Team Distribution" , weight = 'bold', size = 15)
a.set_xlabel('Percentage of Completed Pass Plays', size = 15)
a.set_ylabel('Possession Team', size = 15)
plt.xticks([0, 10, 20, 30, 40, 50, 60, 70],['0%','10%','20%','30%','40%','50%','60%', '70%'])
plt.show();
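
The completion percentage can also be computed in one pass, without merging two grouped tables: the mean of a boolean "is complete" flag within each team is that team's completion rate. A sketch on an invented `plays` frame (team codes and outcomes are made up; only the column names `possessionTeam` and `passResult` come from the notebook):

```python
import pandas as pd

# Toy stand-in for df_plays; teams and outcomes are invented.
plays = pd.DataFrame({
    "possessionTeam": ["AAA", "AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "passResult":     ["C",   "C",   "I",   "S",   "C",   "C",   "I"],
})

# The mean of a boolean flag is the share of True values in each group.
completion_pct = (
    plays["passResult"].eq("C")
    .groupby(plays["possessionTeam"])
    .mean()
    .mul(100)
    .sort_values(ascending=False)
)
```

Sorting the result by percentage, as done here, also orders the bars by the quantity being plotted rather than by raw completion counts.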

> ### Q9: What's the Number of Plays per Dropback Type?

In [None]:
# count plays per "typeDropback", then sort descending by number of plays
df_dropback_type_sort = df_plays.groupby('typeDropback')['playId'].count().reset_index().sort_values( by = 'playId', ascending = False )

In [None]:
fig, ax1 = plt.subplots (1,figsize = (10,6))

# bar chart "a": number of plays for each dropback type
a = sns.barplot (x = df_dropback_type_sort.playId, y = df_dropback_type_sort.typeDropback, ax = ax1 , 
                 linewidth = 1 ,alpha = 0.7, palette = 'Blues_r')

# annotate each bar with its play count
for i,j in enumerate(df_dropback_type_sort.playId):
        ax1.text(0.7,i+0.05,j,weight = "bold", size = 15)

a.set_title("Total Number of Plays Vs. Dropback Type Distribution" , weight = 'bold', size = 15)
a.set_xlabel('Total Plays Count', size = 15)
a.set_ylabel('Dropback Type', size = 15)
plt.xticks([0, 2000, 4000, 6000, 8000, 10000, 12000, 14000, 16000],['0','2K','4K','6K','8K','10K', '12K', '14K', '16K'])
plt.show();

> ### Q10: What's the Pass Result Distribution?

In [None]:
# count plays per "passResult", then sort descending by number of plays
df_player_pass_sort = df_plays.groupby('passResult')['playId'].count().reset_index().sort_values( by = 'playId', ascending = False )

In [None]:
fig, ax1 = plt.subplots (1,figsize = (10,6))

# bar chart "a": number of plays for each pass result
a = sns.barplot (x = df_player_pass_sort.playId, y = df_player_pass_sort.passResult, ax = ax1 , 
                 linewidth = 1 ,alpha = 0.7, palette = 'Blues_r')

# annotate each bar with its play count
for i,j in enumerate(df_player_pass_sort.playId):
        ax1.text(0.7,i+0.05,j,weight = "bold", size = 15)

a.set_title("Total Number of Plays Vs. Pass Result Distribution" , weight = 'bold', size = 15)
a.set_xlabel('Total Plays Count', size = 15)
a.set_ylabel('Pass Result', size = 15)
plt.yticks([0, 1, 2, 3, 4],['Complete pass','Incomplete pass','Quarterback sack' ,'Intercepted pass', 'R'])
plt.xticks([0, 2000, 4000, 6000, 8000, 10000],['0','2K','4K','6K','8K','10K'])
plt.show();
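
The tick labels above are assigned by position, so they stay correct only while the sorted order of the codes matches the order of the label list. A sketch of a lookup-based alternative, assuming the codes `C`, `I`, `S` and `IN` carry the meanings used in those labels; `R` is passed through unchanged, and the counts are invented:

```python
import pandas as pd

# Code-to-description lookup taken from the tick labels used in the chart;
# "R" is left out because the notebook does not describe it.
PASS_RESULT_LABELS = {
    "C": "Complete pass",
    "I": "Incomplete pass",
    "S": "Quarterback sack",
    "IN": "Intercepted pass",
}

# Invented counts, deliberately in a different order from the label list.
counts = pd.DataFrame({
    "passResult": ["I", "C", "R", "IN", "S"],
    "playId": [50, 40, 3, 2, 1],
})

# Map each code to its description; unknown codes keep their raw value.
counts["label"] = (
    counts["passResult"].map(PASS_RESULT_LABELS).fillna(counts["passResult"])
)
```

Plotting `label` on the y-axis instead of overriding the ticks keeps each bar attached to its own description, whatever the sort order turns out to be.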

> ### Q11: What's the Total Play Result for each Possession Team?

In [None]:
# sum "playResult" per "possessionTeam", then sort descending by total play result
df_player_result_sort = df_plays.groupby('possessionTeam')['playResult'].sum().reset_index().sort_values( by = 'playResult', ascending = False )

In [None]:
fig, ax1 = plt.subplots (1,figsize = (14,20))

# bar chart "a": total play result for each possession team
a = sns.barplot (x = df_player_result_sort.playResult, y = df_player_result_sort.possessionTeam, ax = ax1 , 
                 linewidth = 1 ,alpha = 0.7, palette = 'Blues_r')

# annotate each bar with its total play result
for i,j in enumerate(df_player_result_sort.playResult):
        ax1.text(.7,i+0.1,j,weight = "bold", size = 15)

a.set_title("Total Play Results Vs. Possession Team Distribution" , weight = 'bold', size = 15)
a.set_xlabel('Total Play Results', size = 15)
a.set_ylabel('Possession Team', size = 15)
plt.xticks([0, 1000, 2000, 3000, 4000, 5000],['0','1K','2K','3K','4K','5K'])
plt.show();

> ### Q12: How Does the Offense Play Result Relate to the Defensive Personnel?

In [None]:
# mean "offensePlayResult" per "personnelD", sorted descending
df_offense_play_sort = df_plays.groupby('personnelD')['offensePlayResult'].mean().reset_index().sort_values( by = 'offensePlayResult', ascending = False )

In [None]:
# plot: mean "offensePlayResult" for each defensive personnel grouping
df_offense_play_sort.plot(x ='offensePlayResult', y ='personnelD', kind='scatter', s=30, color='blue', figsize = (12,20))
plt.axvline(x = df_offense_play_sort.offensePlayResult.mean(), color = 'r', linewidth = 3)
plt.title('Offense Play Result Vs. Defensive Personnel', weight = 'bold', size = 15)
plt.xlabel('Mean offensePlayResult', size = 15)
plt.ylabel('personnelD', size = 15)
plt.xticks([-5, 0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70],
           ['-5','0','5','10','15','20', '25', '30', '35', '40', '45', '50', '55', '60', '65', '70'])
plt.show();

> ### Q13: How Many Pass Plays Did each NFL Team Run?

In [None]:
# count unique "playId" values per "possessionTeam", then sort descending
df_plays_team_pass_sort = df_plays.groupby('possessionTeam')['playId'].nunique().reset_index().sort_values( by = 'playId', ascending = False )

In [None]:
# plot: number of pass plays for each team
df_plays_team_pass_sort.plot(x ='playId', y ='possessionTeam', kind='scatter', s=30, color='blue', figsize = (10,15))
plt.axvline(x = df_plays_team_pass_sort.playId.mean(), color = 'r', linewidth = 3)
plt.title('Teams Vs. Pass Plays Distribution',weight = 'bold', size = 15)
plt.xlabel('Pass Plays', size = 15)
plt.ylabel('NFL Team', size = 15)
plt.xticks([400, 450, 500, 550, 600, 650, 700],['400', '450', '500', '550', '600', '650', '700'])
plt.show();

> ### Q14: What's the play type Percentage?

In [None]:
# count rows per "playType", then sort descending by number of plays
df_play_type_sort = df_plays.groupby('playType')['gameId'].count().reset_index().sort_values( by = 'gameId', ascending = False )

In [None]:
fig,ax1 = plt.subplots (1,figsize = (10,4))

# bar chart "a": percentage of plays for each play type
a = sns.barplot (x = df_play_type_sort.gameId*100/df_plays.gameId.count(), y = df_play_type_sort.playType, ax = ax1 , 
                 linewidth = 1 ,alpha = 0.7, 
        palette = 'Blues_r', orient = 'h')

# annotate each bar with its percentage of all plays
for i,j in enumerate(df_play_type_sort.gameId):
        ax1.text(0.01,i+0.04,'{:0.2f} %'.format(j *100 /df_plays.gameId.count()),weight = "bold", size = 15)
            
a.set_title("Total play type Percentage Distribution" , weight = 'bold', size = 15)
a.set_xlabel('Total %', size = 15)
a.set_ylabel('Play type', size = 15)
plt.xticks([0, 10, 20, 30, 40, 50, 60, 70, 80,90,100],['0%','10%','20%','30%','40%','50%','60%', '70%', '80%', '90%', '100%'])
plt.show();
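
The same percentages can be obtained directly with `value_counts(normalize=True)`, which returns each category's share of the total. A small sketch on an invented Series (the category names are placeholders, not values confirmed from `df_plays`):

```python
import pandas as pd

# Invented play types; 8 of the 10 rows belong to the first category.
play_types = pd.Series(
    ["play_type_pass"] * 8 + ["play_type_sack"] + ["play_type_unknown"],
    name="playType",
)

# normalize=True returns fractions that sum to 1; scale them to percent.
share = play_types.value_counts(normalize=True).mul(100)
```

The resulting Series is already sorted from the most to the least frequent category, so it can be passed to the bar plot as is.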

> ### Q16: What's the Correlation Between Variables?

In [None]:
# correlation between the numeric columns only
correlation = df_plays.select_dtypes(include = 'number').corr()
plt.figure(figsize=(22,15))

# create heatmap plot
sns.heatmap(correlation,annot=True ,cmap = plt.cm.plasma, linecolor='white', linewidths=2)
plt.title('Correlation Between Variables')
plt.show();

### <span style="color:orange">  4. **Weeks DataFrame :** </span>

> ### Q1: What's the Total Count of each Event Across All Games?

In [None]:
# count tracking rows per "event", then sort descending by count
df_event_sort = df_weeks.groupby('event')['gameId'].count().reset_index().sort_values( by = 'gameId', ascending = False )

In [None]:
fig, ax1 = plt.subplots (1,figsize = (10,16))

# bar chart "a": total count of each event across all games, skipping the first (most frequent) row
a = sns.barplot (x = df_event_sort.gameId[1:], y = df_event_sort.event[1:], ax = ax1 , linewidth = 1 ,alpha = 0.7, palette = 'Blues_r')

# annotate each bar with its event count
for i,j in enumerate(df_event_sort.gameId[1:]):
        ax1.text(.05,i+0.15,j,weight = "bold", size = 13)

a.set_title("Events Vs. Games Distribution" , weight = 'bold', size = 16)
a.set_xlabel('Total Events Count', size = 15)
a.set_ylabel('Events Distribution', size = 14)
plt.xticks([0, 50000, 100000, 150000, 200000, 250000],['0','50K','100K','150K','200K','250K'])
plt.show();
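
The plot above drops the first row of the sorted table by position. If the goal is to leave out a placeholder category, filtering by name is more robust than relying on sort order. A sketch that assumes the placeholder is the literal string `'None'` (an assumption, since the notebook does not show the value), using invented event rows:

```python
import pandas as pd

# Invented tracking events; "None" stands in for frames with no tagged event.
events = pd.Series(
    ["None", "None", "None", "ball_snap", "pass_forward", "ball_snap"],
    name="event",
)

# Remove the placeholder explicitly, then count what is left.
event_counts = events[events != "None"].value_counts()
```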

> ### Q2: What's the Correlation Between Variables?

In [None]:
# correlation between the numeric columns only
correlation = df_weeks.select_dtypes(include = 'number').corr()
plt.figure(figsize=(30,10))

# create heatmap plot
sns.heatmap(correlation,annot=True ,cmap = plt.cm.plasma, linecolor='white', linewidths=2)
plt.title('Correlation Between Variables')
plt.show();

> ### Q3: Using a Word Cloud to Show Player Names

In [None]:
# player names in rows with gameId > 32000, ordered by how often each name appears
top_rate = df_weeks[df_weeks.gameId  > 32000 ]['displayName'].value_counts().index

# build the word cloud on a white background
w_c = WordCloud(background_color="white",scale=2).generate(" ".join(top_rate))
fig = plt.figure(figsize=(15,8))

# show the image with bilinear interpolation
plt.imshow(w_c,interpolation="bilinear")
plt.axis("off")
plt.title("Player Names")
plt.show();
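
Joining names with spaces makes the word cloud treat first and last names as separate words, and every player contributes each word only once. If the intent is to size each full name by how often it appears, a frequency dictionary keeps names whole; the `wordcloud` package accepts such a mapping through `WordCloud.generate_from_frequencies`. A sketch of building that mapping with the standard library, using invented names:

```python
from collections import Counter

# Invented tracking rows: one display name per row.
names = [
    "Alex Smith", "Alex Smith", "Sam Jones",
    "Alex Smith", "Sam Jones", "Lee Park",
]

# Map each full name to its row count. A space-joined string would
# instead split "Alex Smith" into two independent words.
name_frequencies = dict(Counter(names))
```

Passing `name_frequencies` to `generate_from_frequencies` would then take the place of the `generate(" ".join(...))` call.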

### <span style="color:orange">  5. **Mixed DataFrame :** </span>

> ### Q1: What's the Total Number of Games played by the top 30 players?

In [None]:
df_players.columns

In [None]:
df_weeks.columns

In [None]:
df_weeks_displayname = df_weeks[['nflId', 'displayName', 'gameId']]

In [None]:
df_players_displayname = df_players[['nflId', 'displayName']]

In [None]:
# check Dataframe duplication
df_weeks_displayname.duplicated().sum()

In [None]:
# drop duplicate rows from df_weeks_displayname
df_weeks_displayname = df_weeks_displayname.drop_duplicates()

In [None]:
# merge "df_players_displayname" with "df_weeks_displayname"
df_displayname = pd.merge(df_players_displayname, df_weeks_displayname, how = 'left', on =['nflId','displayName'])
df_displayname

In [None]:
# count games per "displayName", then sort players descending by total number of games played
df_game_player_sort = df_displayname.groupby('displayName')['gameId'].count().reset_index().sort_values( by = 'gameId', ascending = False )

In [None]:
fig, ax1 = plt.subplots (1,figsize = (10,12))

# bar chart "a": the 30 players with the most games played
a = sns.barplot (x = df_game_player_sort.gameId[:30], y = df_game_player_sort.displayName[:30], ax = ax1 , 
                 linewidth = 1 ,alpha = 0.7, palette = 'Blues_r')

# annotate each bar with the number of games played
for i,j in enumerate(df_game_player_sort.gameId[:30]): 
        ax1.text(.05,i+0.15,j,weight = "bold", size = 15)

a.set_title("Total Number of Games played by the top 30 players Distribution" , weight = 'bold', size = 16)
a.set_xlabel('Total Games Count', size = 15)
a.set_ylabel('Top 30 Player Vs. Games Distribution', size = 14)
plt.xticks([0, 5, 10, 15, 20, 25, 30, 35],
           ['0','5','10','15','20','25','30', '35'])
plt.show();

> ### Q2: Which Players Didn't Play Any Game?

In [None]:
# count games per "displayName", then sort players ascending so those with no games come first
df_game_player_nogame_sort = df_displayname.groupby('displayName')['gameId'].count().reset_index().sort_values( by = 'gameId', ascending = True )

In [None]:
fig, ax1 = plt.subplots (1,figsize = (10,12))

# bar chart "a": the 8 players with the fewest games played
a = sns.barplot (x = df_game_player_nogame_sort.gameId[:8], y = df_game_player_nogame_sort.displayName[:8], ax = ax1 , 
                 linewidth = 1 ,alpha = 0.7, palette = 'Blues_r')

# annotate each bar with the number of games played
for i,j in enumerate(df_game_player_nogame_sort.gameId[:8]): 
        ax1.text(.05,i+0.15,j,weight = "bold", size = 15)

a.set_title("Players with the Fewest Games Played Distribution" , weight = 'bold', size = 16)
a.set_xlabel('Total Games Count', size = 15)
a.set_ylabel('Player Vs. Games Distribution', size = 14)
plt.xticks([0, 5, 10, 15, 20, 25, 30, 35],
           ['0','5','10','15','20','25','30', '35'])
plt.show();
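
Counting games through a left merge works, but the same table can be built by mapping a per-player game count onto the player list, which avoids the intermediate merged frame. A sketch with hypothetical toy frames (`players` and `tracking` stand in for `df_players` and `df_weeks`; the IDs and names are invented):

```python
import pandas as pd

# Toy stand-ins: three rostered players, two of whom appear in tracking data.
players = pd.DataFrame({"nflId": [1, 2, 3], "displayName": ["P1", "P2", "P3"]})
tracking = pd.DataFrame({"nflId": [1, 1, 3], "gameId": [100, 101, 100]})

# Distinct games per player id, looked up for every rostered player;
# players missing from the tracking data get a count of 0.
games_per_player = tracking.groupby("nflId")["gameId"].nunique()
summary = players.assign(
    games=players["nflId"].map(games_per_player).fillna(0).astype(int)
)
no_games = summary.loc[summary["games"] == 0, "displayName"].tolist()
```

Grouping on `nflId` rather than on the display name also keeps two players who share a name from being counted as one.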

> ### Q3: How Many Players Played in each Direction Across All Games?

In [None]:
df_weeks_direction = df_weeks[['nflId', 'displayName', 'playDirection']]
df_players_direction = df_players[['nflId', 'displayName']]

In [None]:
# check Dataframe duplication
df_weeks_direction.duplicated().sum()

In [None]:
# drop duplicate rows from df_weeks_direction
df_weeks_direction = df_weeks_direction.drop_duplicates()

In [None]:
# merge "df_players_direction" with "df_weeks_direction"
df_direction = pd.merge(df_players_direction, df_weeks_direction, how = 'left', on =['nflId','displayName'])
df_direction

In [None]:
df_direction['direction'] = df_direction.nflId.duplicated()

In [None]:
df_direction.direction = df_direction.direction.replace(True, 'both')
df_direction

In [None]:
df_direction_both = df_direction.query('direction == "both"')

In [None]:
df_direction_both.drop('playDirection', axis = 1)

In [None]:
# merge "df_direction" with "df_direction_both" to flag players seen in both directions
df_direction = pd.merge(df_direction, df_direction_both, how = 'left', on =['nflId','displayName'])
df_direction = df_direction.drop(['direction_x', 'playDirection_y'], axis = 1)
df_direction = df_direction.rename(columns = {'playDirection_x': 'playDirection', 'direction_y': 'direction'})
df_direction.direction = df_direction.direction.fillna(df_direction.playDirection)
df_direction.direction = df_direction.direction.fillna('Not playing Any Games')
df_direction = df_direction.drop('playDirection', axis = 1)
df_direction = df_direction.rename(columns = {'direction': 'playDirection'})
df_direction = df_direction.drop_duplicates()

df_direction

In [None]:
df_direction.groupby('playDirection')['nflId'].count().reset_index().sort_values( by = 'nflId', ascending = False )

In [None]:
# count players per "playDirection", then sort descending by number of players
df_direction_sort = df_direction.groupby('playDirection')['nflId'].count().reset_index().sort_values( by = 'nflId', ascending = False )

In [None]:
fig, ax1 = plt.subplots (1,figsize = (8,5))

# bar chart "a": number of players for each play direction
a = sns.barplot (x = df_direction_sort.nflId, y = df_direction_sort.playDirection, ax = ax1 , 
                 linewidth = 1 ,alpha = 0.7, palette = 'Blues_r')

# annotate each bar with its player count
for i,j in enumerate(df_direction_sort.nflId): 
        ax1.text(.7,i+0.03,j,weight = "bold", size = 15)

a.set_title("Players Direction Distribution" , weight = 'bold', size = 16)
a.set_xlabel('Total Play Direction Count', size = 15)
a.set_ylabel('Direction Distribution', size = 14)
plt.xticks([0, 200, 400, 600, 800, 1000, 1200],
           ['0','200','400','600','800','1K','1.2K'])
plt.show();
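
The both / left / right classification above takes several merges and renames. A more compact sketch, under the assumption that a player counts as "both" when more than one distinct `playDirection` appears in their rows; the helper `summarise` and the toy frames are hypothetical:

```python
import pandas as pd

# Toy tracking rows and a roster that includes one player with no rows.
tracking = pd.DataFrame({
    "nflId": [1, 1, 2, 3, 3],
    "playDirection": ["left", "right", "left", "right", "right"],
})
players = pd.DataFrame({"nflId": [1, 2, 3, 4]})


def summarise(directions):
    """Return 'both' if several directions occur, else the single one."""
    unique = sorted(set(directions))
    return "both" if len(unique) > 1 else unique[0]


# One label per player, then a fallback label for players without rows.
per_player = tracking.groupby("nflId")["playDirection"].agg(summarise)
players["playDirection"] = (
    players["nflId"].map(per_player).fillna("Not playing Any Games")
)
direction_counts = players["playDirection"].value_counts()
```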

> ### Q4: How Many Quarters Were Played per Game?

In [None]:
quarters = df_plays[['gameId','quarter']]

In [None]:
quarters.duplicated().sum()

In [None]:
quarters = quarters.drop_duplicates()

In [None]:
# keep only quarters 4 and later; copy so the later column assignments act on an independent frame
quarters = quarters[quarters.quarter >= 4].copy()
quarters

In [None]:
quarters['direction'] = quarters.gameId.duplicated()

In [None]:
quarters.direction = quarters.direction.replace(True, 'both')
quarters

In [None]:
quarters_both = quarters.query('direction == "both"')

In [None]:
quarters_both

In [None]:
quarters_both.drop('quarter', axis = 1)

In [None]:
# merge 2 Dataframe "quarters" and "quarters_both"
quarters = pd.merge(quarters, quarters_both, how = 'left', on ='gameId')
quarters = quarters.drop(['direction_x', 'quarter_y'], axis = 1)
quarters = quarters.rename(columns = {'quarter_x': 'quarter', 'direction_y': 'direction'})
quarters.direction = quarters.direction.fillna(quarters.quarter)
quarters = quarters.drop('quarter', axis = 1)
quarters = quarters.rename(columns = {'direction': 'quarter'})
quarters = quarters.drop_duplicates()
quarters.quarter = quarters.quarter.replace('both','Five')
quarters.quarter = quarters.quarter.replace(4,'Four')

quarters

In [None]:
quarters.groupby('quarter')['gameId'].count().reset_index().sort_values( by = 'quarter', ascending = False )

In [None]:
# count games per "quarter" label, then sort descending by number of games
df_game_gurter_sort = quarters.groupby('quarter')['gameId'].count().reset_index().sort_values( by = 'gameId', ascending = False )

In [None]:
fig, ax1 = plt.subplots (1,figsize = (12,4))

# bar chart "a": number of games for each final-quarter label
a = sns.barplot (x = df_game_gurter_sort.gameId, y = df_game_gurter_sort.quarter, ax = ax1 , 
                 linewidth = 1 ,alpha = 0.7, palette = 'Blues_r')

# annotate each bar with its game count
for i,j in enumerate(df_game_gurter_sort.gameId): 
        ax1.text(.05,i+0.08,j,weight = "bold", size = 15)

a.set_title("Quarters per Games Distribution" , weight = 'bold', size = 16)
a.set_xlabel('Total Games Count', size = 15)
a.set_ylabel('Games Quarters\n Distribution', size = 14)
plt.yticks([0,1],['Quarter 4','Quarter 5'])
plt.show();
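
The question of which games went past the fourth quarter can also be answered by taking the highest quarter seen in each game. A sketch on an invented `plays` frame (the game IDs are made up; only the column names `gameId` and `quarter` come from the notebook):

```python
import pandas as pd

# Toy plays: game 10 ends after quarter 4, game 11 reaches a fifth quarter.
plays = pd.DataFrame({
    "gameId":  [10, 10, 10, 10, 11, 11, 11, 11, 11],
    "quarter": [1, 2, 3, 4, 1, 2, 3, 4, 5],
})

# The last quarter of each game, then the number of games per last quarter.
last_quarter = plays.groupby("gameId")["quarter"].max()
games_by_last_quarter = last_quarter.value_counts().sort_index()
```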

> ### Q5: How We Can Create a Football Field ?

In [None]:
def NFL_field(line_numbers = True, end_zones = True, home_zones = True, visitor_zones = True, gameId = True,  highlight_line = False,
              highlight_line_number = 50, highlight_home_number = 50, highlight_away_number = 50, highlight_first_down_line=False, 
              yards_to_go = 10, highlight_name = 'Scrimmage_Line', fifty_lose = False):
    
    # draw the field: a 120 x 53.3 rectangle that includes both end zones
    rect = patches.Rectangle((0,0), 120, 53.3, linewidth = 0.2, edgecolor = 'r', facecolor = 'darkgreen', zorder = 0)
    fig,ax = plt.subplots(1, figsize = (18,10.5))
    ax.add_patch(rect)
    
    plt.plot([10, 10, 10, 20, 20, 30, 30, 40, 40, 50, 50, 60, 60, 70, 70, 80, 80, 90, 90, 100, 100, 110, 110, 120, 0, 0, 120, 120],
             [0, 0, 53.3, 53.3, 0, 0, 53.3, 53.3, 0, 0, 53.3, 53.3, 0, 0, 53.3, 53.3, 0, 0, 53.3, 53.3, 0, 0, 53.3, 53.3, 53.3, 0, 0, 53.3], 
             color = 'w')
   
    if fifty_lose:
        plt.plot([60, 60], [0, 53.3], color = 'gold')
        plt.text(62, 50, '<- Player Yardline at Snap', color = 'gold')
       
    if home_zones == True:
        home_zone = patches.Rectangle((0, 0), 10, 53.3, linewidth = 0.1, edgecolor = 'r', facecolor = 'darkblue', alpha = 0.7, zorder = 0)
        plt.text(3, 27, 'Home Team', verticalalignment = 'center', fontsize = 20, color = 'red', rotation = 90)
        ax.add_patch(home_zone)
        
    else:
        h = ('gameId == "{}"').format(home_zones)
        home_zone = patches.Rectangle((0, 0), 10, 53.3, linewidth = 0.1, edgecolor = 'r', facecolor = 'darkblue', alpha = 0.7, zorder = 0)
        plt.text(3, 27, df_games.query(h)['homeTeamAbbr'].values[0] + ' Team', verticalalignment = 'center', fontsize = 20, color = 'red', rotation = 90)
        ax.add_patch(home_zone)
        
    if visitor_zones == True:
        # the visitor end zone is 10 units wide, starting at x = 110
        visitor_zone = patches.Rectangle((110, 0), 10, 53.3, linewidth = 0.1, edgecolor = 'r', facecolor = 'darkblue', alpha = 0.7, zorder = 0)
        plt.text(113, 27, 'Visitor Team', verticalalignment = 'center', fontsize = 20, color = 'yellow', rotation = 90)
        ax.add_patch(visitor_zone)
        
    else:
        # use the visitor query string "a"; "h" only exists when a home game id was passed
        a = ('gameId == "{}"').format(visitor_zones)
        visitor_zone = patches.Rectangle((110, 0), 10, 53.3, linewidth = 0.1, edgecolor = 'r', facecolor = 'darkblue', alpha = 0.7, zorder = 0)
        plt.text(113, 27, df_games.query(a)['visitorTeamAbbr'].values[0] + ' Team', verticalalignment = 'center', fontsize = 20, color = 'yellow', rotation = 90)
        ax.add_patch(visitor_zone)
       
    plt.xlim(0, 120)
    plt.ylim(-5, 59)
    plt.axis('off')
   
    if line_numbers:
        # mark both goal lines with a 'G' (the text is drawn at x + 1.5, i.e. 11.5 and 108.5)
        for x in (10, 107):
            plt.text(x + 1.5, 5, 'G', horizontalalignment = 'center', fontsize = 20, color = 'w')
            plt.text(x + 1.5, 53.3 - 5, 'G', horizontalalignment = 'center', fontsize = 20, color = 'w', rotation = 180)
            
            
        for x in range(20, 110, 10):
            number = x
            if x > 50:
                number = 120 - x
                plt.text(x, 5, str(number - 10), horizontalalignment = 'center', fontsize = 20, color = 'w')
                plt.text(x - 0.9, 53.3 - 5, str(number - 10), horizontalalignment = 'center', fontsize = 20, color = 'w', rotation = 180)
            elif x <= 50:
                number = 0 + x
                plt.text(x, 5, str(number - 10), horizontalalignment = 'center', fontsize = 20, color = 'w')
                plt.text(x - 0.9, 53.3 - 5, str(number - 10), horizontalalignment = 'center', fontsize = 20, color = 'w', rotation = 180) 
                          
    if end_zones:
        hash_range = range(11, 110)
    else:
        hash_range = range(1, 120)
    
    for x in hash_range:
        ax.plot([x, x], [1.5, 0.5], color='white')
        ax.plot([x, x], [22.5, 23.5], color='white')
        ax.plot([x, x], [29.5, 30.5], color='white')
        ax.plot([x, x], [52.8, 51.8], color='white') 
        
    if highlight_line == True:
        h = highlight_line_number + 10
        plt.plot([h, h], [0, 53.3], color = 'red', linewidth = 3)
        plt.text(h + 2, 50, '<- {}'.format(highlight_name), color = 'red', size = 16, weight = 'bold')
    
    elif highlight_first_down_line:
        y = highlight_line_number + 10 + yards_to_go
        plt.plot([y, y], [0, 53.3], color='yellow', linewidth = 3)
    
    elif highlight_line == False:
        
        if highlight_home_number > 50:
            h_1 = 50 - highlight_home_number
            plt.plot([h_1, h_1], [0, 53.3], color = 'red', linewidth = 3)
            plt.text(h_1 + 2, 45, '<- {} \nat {} Team'.format(highlight_name, df_games.query(h)['homeTeamAbbr'].values[0]),
                     color = 'red', size = 16, weight = 'bold',rotation = 0)
            
        elif highlight_home_number < 50:
            h_1 = highlight_home_number + 10
            plt.plot([h_1, h_1], [0, 53.3], color = 'red', linewidth = 3)
            plt.text(h_1 + 2, 45, '<- {} \nat {} Team'.format(highlight_name, df_games.query(h)['homeTeamAbbr'].values[0]),
                     color = 'red', size = 16, weight = 'bold',rotation = 0)
        
        if highlight_away_number > 50:
            h_2 = 50 - highlight_away_number
            plt.plot([h_2, h_2], [0, 53.3], color = 'yellow', linewidth = 3)
            plt.text(h_2 + 2, 45, '<- {} \nat {} Team'.format(highlight_name, df_games.query(h)['visitorTeamAbbr'].values[0]),
                     color = 'yellow', size = 16, weight = 'bold', rotation = 0)
        
        elif highlight_away_number < 50:
            h_1 = highlight_away_number + 10
            plt.plot([h_1, h_1], [0, 53.3], color = 'yellow', linewidth = 3)
            plt.text(h_1 + 2, 45, '<- {} \nat {} Team'.format(highlight_name, df_games.query(h)['visitorTeamAbbr'].values[0]),
                     color = 'yellow', size = 16, weight = 'bold',rotation = 0)
            
    return fig, ax
plt.show();

In [None]:
NFL_field()
plt.show();
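
The field is drawn in plot coordinates where x runs from 0 to 120: each end zone takes 10 units, so a yard line measured from the left goal line sits at `yardline + 10`, and the painted numbers count up to 50 and back down. The two helpers below are hypothetical (they are not part of `NFL_field`) and only restate that arithmetic:

```python
END_ZONE = 10.0       # depth of each end zone in plot units
FIELD_OF_PLAY = 100.0  # distance between the two goal lines


def yardline_to_x(yardline):
    """Convert a distance from the left goal line (0-100) to plot x (10-110)."""
    if not 0 <= yardline <= FIELD_OF_PLAY:
        raise ValueError("yardline must lie between 0 and 100")
    return yardline + END_ZONE


def x_to_label(x):
    """Return the painted yard number for a plot x between the goal lines."""
    distance = x - END_ZONE
    return int(min(distance, FIELD_OF_PLAY - distance))
```

For example, midfield is `yardline_to_x(50)`, which is 60, and the lines at x = 20 and x = 100 both carry the number 10.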

> ### Q6: How Can We Compare Two Teams (Home Team Vs. Visitor Team) Using the Parameters Below?
>> ##### 1. **`gameId`** : chooses which game to compare. 
>> ##### 2. **`team`** : chooses between the Home Team and the Visitor Team.
>> ##### 3. **`event`** : tagged play details, including the moment of (first contact, ball snap, pass release, pass catch, tackle, etc...).
>> ##### 4. **`position_group`** : player position group (FB, HB, QB, RB, TE and WR).


In [None]:
def match_players(game_id, event = 'first_contact', pos = 'RB'):
    fig ,ax = NFL_field(home_zones = '{}'.format(game_id), visitor_zones = '{}'.format(game_id))
                        
    home = ('gameId == "{}" and team == "home" and event == "{}" and position == "{}"').format(game_id, str(event), str(pos))
    home = df_weeks.query(home)
    home.plot(x='x', y='y', kind='scatter',ax = ax, s=50, color='red',marker='o', label = 'Home Players')
    
    away = ('gameId == "{}" and team == "away" and event == "{}" and position == "{}"').format(game_id, str(event), str(pos))
    away = df_weeks.query(away)
    away.plot(x='x', y='y', kind='scatter',ax = ax, s=50, color='yellow',marker='o', label = 'Visitor Players')
    
    plt.title('\nPlayers at Game No."{}" \nwith event :"{}" \nat position Group : "{}".'.format(game_id, event, pos),
              color = 'black',size = 16, weight = 'bold')
    return fig, ax
plt.show();

In [None]:
match_players(2018090600, 'first_contact','RB')
plt.show();

> ### Q7: Create the paths of `one player at Home Team` and `one player at Visitor Team` in a given game
>> ##### 1. **`gameId`** : Selects the game to compare.
>> ##### 2. **`team`** : Selects the Home Team or the Visitor Team.
>> ##### 3. **`playId`** : Play identifier used to filter each team's tracking rows (passed as `player_home_id` and `player_away_id`).

In [None]:
def players_path(game_id, player_home_id, player_away_id):
    fig ,ax = NFL_field(home_zones = '{}'.format(game_id), visitor_zones = '{}'.format(game_id))
                        
    home = ('gameId == "{}" and team == "home" and playId == "{}"').format(game_id, player_home_id)
    home = df_weeks.query(home)
    home.plot(x='x', y='y', kind='scatter',ax = ax, s=30, color='red',marker='o',label = 'Home Player Path')
    
    away = ('gameId == "{}" and team == "away" and playId == "{}"').format(game_id, player_away_id)
    away = df_weeks.query(away)
    away.plot(x='x', y='y', kind='scatter',ax = ax, s=30, color='yellow',marker='o', label = 'Visitor Player Path')
    #plt.arrow(x, y, dx, dy, color='black', width=0.25, shape='full' )
    plt.title('\nPlayers at Game No."{}" \nHome Player Path No. :"{}" Vs. Visitor Player Path No. : "{}".'.format(game_id, player_home_id, player_away_id), 
              color = 'black',size = 16, weight = 'bold')
    return fig, ax
plt.show();

In [None]:
players_path(2018090600, 1085, 521)
plt.show();

> ### Q8: Create the paths of `one player at Home Team` and `one player at Visitor Team`, starting from `yardlineNumber`, in a given game
>> ##### 1. **`gameId`** : Selects the game to compare.
>> ##### 2. **`team`** : Selects the Home Team or the Visitor Team.
>> ##### 3. **`playId`** : Play identifier.
>> ##### 4. **`yardlineNumber`** : Yard line at line-of-scrimmage.
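
`yardlineNumber` on its own does not say which half of the field the ball is in, so it has to be combined with `absoluteYardlineNumber` before it can be drawn on the 0 - 120 yard plot. The sketch below mirrors the arithmetic used in `animate_player_movement` later in this notebook; the helper names `line_of_scrimmage_x` and `first_down_x` are hypothetical, and the 10-yard end-zone offset is an assumption taken from the `- 10` and `+ 10` terms in that code.

```python
def line_of_scrimmage_x(yardline_number, absolute_yardline_number):
    """Return the plot x coordinate (0-120) of the line of scrimmage."""
    # Remove the assumed 10-yard end zone to see which half the ball is in.
    if absolute_yardline_number - 10 > 50:
        # Past midfield: count the yard line down from the far goal line.
        yards_from_left_goal = 100 - yardline_number
    else:
        yards_from_left_goal = yardline_number
    # Add the end zone back to move from field yards to plot coordinates.
    return yards_from_left_goal + 10


def first_down_x(scrimmage_x, yards_to_go, play_direction):
    """Shift the line of scrimmage by yardsToGo in the direction of play."""
    signed = -yards_to_go if play_direction == 'left' else yards_to_go
    return scrimmage_x + signed


print(line_of_scrimmage_x(20, 30))   # 30
print(line_of_scrimmage_x(20, 90))   # 90
print(first_down_x(90, 10, 'left'))  # 80
```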

In [None]:
def Scrimmage_line(game_id, player_home_id, player_away_id):
    
    # Filters that locate each play in df_plays
    home_player_plays = ('gameId == "{}" and playId == "{}"').format(game_id, player_home_id)
    away_player_plays = ('gameId == "{}" and playId == "{}"').format(game_id, player_away_id)
    
    home_line = df_plays.query(home_player_plays)['yardlineNumber'].tolist()[0]
    away_line = df_plays.query(away_player_plays)['yardlineNumber'].tolist()[0]
    
    fig ,ax = NFL_field( highlight_home_number = home_line , highlight_away_number = away_line, home_zones = '{}'.format(game_id),
                        visitor_zones = '{}'.format(game_id))
                        
    home = ('gameId == "{}" and team == "home" and playId == "{}"').format(game_id, player_home_id)
    home = df_weeks.query(home)
    home.plot(x='x', y='y', kind='scatter',ax = ax, s=30, color='red', label = 'Home Player')
    
    away = ('gameId == "{}" and team == "away" and playId == "{}"').format(game_id, player_away_id)
    away = df_weeks.query(away)
    away.plot(x='x', y='y', kind='scatter',ax = ax, s=30, color='yellow', label = 'Visitor Player')
    
    plt.title('\nPlayers at Game No."{}" \nHome Player No. :"{}" Vs. Visitor Player No. : "{}".'
              .format(game_id, player_home_id, player_away_id),color = 'black',size = 16, weight = 'bold')
    return fig, ax
plt.show();

In [None]:
Scrimmage_line(2018090600, 320,146)
plt.show();

> ### Q9: Create a video of the paths of `one player at Home Team`, starting from `yardlineNumber`, in a given game
>> ##### 01. **`gameId`** : Selects the game to compare.
>> ##### 02. **`team`** : Selects the Home Team.
>> ##### 03. **`playId`** : Play identifier.
>> ##### 04. **`yardlineNumber`** : Yard line at line-of-scrimmage.
>> ##### 05. **`yardsToGo`** : Distance needed for a first down.
>> ##### 06. **`absoluteYardlineNumber`** : Distance from end zone for possession team.
>> ##### 07. **`playDirection`** : Direction that the offense is moving.
>> ##### 08. **`x`** : Player position along the long axis of the field, 0 - 120 yards.
>> ##### 09. **`y`** : Player position along the short axis of the field, 0 - 53.3 yards.
>> ##### 10. **`s`** : Speed in yards/second (numeric)
>> ##### 11. **`o`** : Player orientation (deg), 0 - 360 degrees.
>> ##### 12. **`dir`** : Angle of player motion (deg), 0 - 360 degrees.
>> ##### 13. **`jerseyNumber`** : Jersey number of player.
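
The `o`, `dir` and `s` fields can be turned into arrow components for a whole frame at once. This is a minimal vectorized sketch, assuming that 0 degrees points along +y and that angles increase clockwise (so 90 degrees points along +x); `heading_to_vector` is a hypothetical helper, not part of the notebook.

```python
import numpy as np


def heading_to_vector(angle_deg, speed=1.0, multiplier=1.0):
    """Convert heading angles (degrees) and speeds into (dx, dy) arrays."""
    # Assumed convention: 0 deg = +y, clockwise positive, 90 deg = +x.
    angle = np.radians(np.asarray(angle_deg, dtype=float))
    length = np.asarray(speed, dtype=float) * multiplier
    dx = np.sin(angle) * length
    dy = np.cos(angle) * length
    return dx, dy


# Four headings, one per axis direction, all at 2 yards/second.
dx, dy = heading_to_vector([0, 90, 180, 270], speed=2.0)
print(np.round(dx, 6), np.round(dy, 6))
```

Under that convention a single sine/cosine pair covers every quadrant, so no per-quadrant branching is needed.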

In [None]:
def calculate_dx_dy_arrow(x, y, angle, speed, multiplier):
    # Convention: 0 degrees points along +y and angles increase clockwise,
    # so 90 degrees points along +x. A single sin/cos pair is then valid for
    # every quadrant, including 0 and 360 degrees.
    # x and y are unused; they are kept so existing calls keep working.
    dx = np.sin(radians(angle)) * multiplier * speed
    dy = np.cos(radians(angle)) * multiplier * speed
    return dx, dy

In [None]:
def animate_player_movement(game_Id, play_id):
    
    # Tracking rows for this play, split by team.
    # .copy() lets the time column be reassigned below without a pandas copy warning.
    play_filter = ('gameId == "{}" and playId == "{}"').format(game_Id, play_id)
    home_play = df_weeks.query(play_filter + ' and team == "home"').copy()
    away_play = df_weeks.query(play_filter + ' and team == "away"').copy()
    ball_play = df_weeks.query(play_filter + ' and team == "football"').copy()
    
    yard_line = ('gameId == "{}" and playId == "{}"').format(game_Id, play_id)
    yard_line_number = df_plays.query(yard_line)['yardlineNumber'].item()
    yards_To_Go = df_plays.query(yard_line)['yardsToGo'].item()
    absolute_Yard_line_Number = df_plays.query(yard_line)['absoluteYardlineNumber'].item() - 10
    play_Dir = home_play['playDirection'].values[0]
    
    home_play.time = home_play.time.apply(lambda x: dateutil.parser.parse(x).timestamp()).rank(method='dense')
    away_play.time = away_play.time.apply(lambda x: dateutil.parser.parse(x).timestamp()).rank(method='dense')
    ball_play.time = ball_play.time.apply(lambda x: dateutil.parser.parse(x).timestamp()).rank(method='dense')
       
    if (home_play.team.count() >= int(away_play.team.count())) :
        max_time = int(home_play.time.max())
        min_time = int(home_play.time.min())
    
    else:
        max_time = int(away_play.time.max())
        min_time = int(away_play.time.min())
    
    # yardlineNumber counts from the nearer goal line, so flip it past midfield.
    if absolute_Yard_line_Number > 50:
        yard_line_number = 100 - yard_line_number

    # When the offense moves left, the first-down line sits at a smaller x.
    if play_Dir == 'left':
        yards_To_Go = -yards_To_Go
    
    fig, ax = NFL_field(highlight_line=True, highlight_line_number = yard_line_number, highlight_first_down_line = True, yards_to_go = yards_To_Go,
                        home_zones = '{}'.format(game_Id), visitor_zones = '{}'.format(game_Id))
    h = ('gameId == "{}"').format(game_Id)
    a = ('gameId == "{}"').format(game_Id)
    home_team = df_games.query(h)['homeTeamAbbr'].values[0]
    away_team = df_games.query(a)['visitorTeamAbbr'].values[0]
   
    
    plt.title('\nPlayers at Game No. : "{}" \nHome Team :"{}" Team Vs. Visitor Team : "{}" Team.'.format(game_Id,home_team, away_team), 
              color = 'black',size = 16, weight = 'bold')
    
    
    def update_animation(time):
        patch = []
        # Home players' location
        homeX = home_play.query('time == ' + str(time))['x']
        homeY =home_play.query('time == ' + str(time))['y']
        homeNum = home_play.query('time == ' + str(time))['jerseyNumber']
        homeOrient = home_play.query('time == ' + str(time))['o']
        homeDir = home_play.query('time == ' + str(time))['dir']
        homeSpeed = home_play.query('time == ' + str(time))['s']
        patch.extend(plt.plot(homeX, homeY, 'o',c='yellow', ms=20, mec='red'))
        
        # Home players' jersey number 
        for x, y, num in zip(homeX, homeY, homeNum):
            patch.append(plt.text(x, y, int(num), va='center', ha='center', color='red', size='medium'))
            
        # Home players' orientation
        for x, y, orient in zip(homeX, homeY, homeOrient):
            dx, dy = calculate_dx_dy_arrow(x, y, orient, 1, 1)
            patch.append(plt.arrow(x, y, dx, dy, color='yellow', width=0.5, shape='full', ec = 'red'))
        
        # Home players' direction
        for x, y, direction, speed in zip(homeX, homeY, homeDir, homeSpeed):
            dx, dy = calculate_dx_dy_arrow(x, y, direction, speed, 1)
            patch.append(plt.arrow(x, y, dx, dy, color='black', width=0.25, shape='full' ))
        
        # away players' location
        awayX = away_play.query('time == ' + str(time))['x']
        awayY = away_play.query('time == ' + str(time))['y']
        awayNum = away_play.query('time == ' + str(time))['jerseyNumber']
        awayOrient = away_play.query('time == ' + str(time))['o']
        awayDir = away_play.query('time == ' + str(time))['dir']
        awaySpeed = away_play.query('time == ' + str(time))['s']
        patch.extend(plt.plot(awayX, awayY, 'o',c='red', ms=20, mec='white'))
        
        # Away players' jersey number 
        for x, y, num in zip(awayX, awayY, awayNum):
            patch.append(plt.text(x, y, int(num), va='center', ha='center', color='white', size='medium'))
            
        # Away players' orientation
        for x, y, orient in zip(awayX, awayY, awayOrient):
            dx, dy = calculate_dx_dy_arrow(x, y, orient, 1, 1)
            patch.append(plt.arrow(x, y, dx, dy, color='red', width=0.5, shape='full', ec = 'white'))
        
        # Away players' direction
        for x, y, direction, speed in zip(awayX, awayY, awayDir, awaySpeed):
            dx, dy = calculate_dx_dy_arrow(x, y, direction, speed, 1)
            patch.append(plt.arrow(x, y, dx, dy, color='black', width=0.25, shape='full'))
        
        # Ball location
        ballX = ball_play.query('time == ' + str(time))['x']
        ballY = ball_play.query('time == ' + str(time))['y']
        patch.extend(plt.plot(ballX, ballY, 'o', c='black', ms=10, mec='white', data=ball_play.query('time == ' + str(time))['team']))
        
        
        return patch
    
    ims = [[]]
    for time in np.arange(min_time, max_time+1):
        patch = update_animation(time)
        ims.append(patch)
        
    anim = animation.ArtistAnimation(fig, ims, repeat=False)
    
    return anim

In [None]:
anim = animate_player_movement(2018090600, 320)
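
The frame index that drives the animation comes from a dense rank of the `time` column: every distinct timestamp gets the next integer, so all rows captured at the same instant share one frame number. A small sketch on made-up rows (the `toy` frame and its values are illustrative assumptions, not data from the competition files):

```python
import dateutil.parser
import pandas as pd

# Toy tracking rows: two players sampled at the same three timestamps.
toy = pd.DataFrame({
    'nflId': [1, 2, 1, 2, 1, 2],
    'time': ['2018-09-07T01:07:14.599Z', '2018-09-07T01:07:14.599Z',
             '2018-09-07T01:07:14.700Z', '2018-09-07T01:07:14.700Z',
             '2018-09-07T01:07:14.800Z', '2018-09-07T01:07:14.800Z'],
})

# Parse each timestamp to seconds since the epoch, then dense-rank them so
# that equal timestamps map to the same consecutive frame number.
seconds = toy['time'].apply(lambda t: dateutil.parser.parse(t).timestamp())
toy['frame'] = seconds.rank(method='dense').astype(int)

print(toy['frame'].tolist())  # [1, 1, 2, 2, 3, 3]
```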

In [None]:
writer = animation.FFMpegWriter(fps=10, metadata=dict(artist='Designed by Peter Girgis for NFL 2018'), bitrate=1800)
path = 'animation at Game No. 2018090600.mp4'
anim.save(path , writer = writer)
Video(path)

> ### Q10: Create a video of a full game
>> ##### 01. **`gameId`** : Selects the game to animate.
>> ##### 02. **`team`** : Selects the Home Team.
>> ##### 03. **`playId`** : Play identifier.
>> ##### 04. **`playDirection`** : Direction that the offense is moving.
>> ##### 05. **`x`** : Player position along the long axis of the field, 0 - 120 yards.
>> ##### 06. **`y`** : Player position along the short axis of the field, 0 - 53.3 yards.
>> ##### 07. **`s`** : Speed in yards/second (numeric)
>> ##### 08. **`o`** : Player orientation (deg), 0 - 360 degrees.
>> ##### 09. **`dir`** : Angle of player motion (deg), 0 - 360 degrees.
>> ##### 10. **`jerseyNumber`** : Jersey number of player.

In [None]:
def animate_Full_Game(game_id):
    
    # Tracking rows for the whole game, split by team.
    # .copy() lets the time column be reassigned below without a pandas copy warning.
    home_play = df_weeks.query(('gameId == "{}" and team == "home"').format(game_id)).copy()
    away_play = df_weeks.query(('gameId == "{}" and team == "away"').format(game_id)).copy()
    ball_play = df_weeks.query(('gameId == "{}" and team == "football"').format(game_id)).copy()
    
    home_play.time = home_play.time.apply(lambda x: dateutil.parser.parse(x).timestamp()).rank(method='dense')
    away_play.time = away_play.time.apply(lambda x: dateutil.parser.parse(x).timestamp()).rank(method='dense')
    ball_play.time = ball_play.time.apply(lambda x: dateutil.parser.parse(x).timestamp()).rank(method='dense')
    
    if (home_play.team.count() >= int(away_play.team.count())) :
        max_time = int(home_play.time.max())
        min_time = int(home_play.time.min())
    
    else:
        max_time = int(away_play.time.max())
        min_time = int(away_play.time.min())
    
    
    
    fig, ax = NFL_field(highlight_line=False, highlight_first_down_line = False, home_zones = '{}'.format(game_id),
                        visitor_zones = '{}'.format(game_id))
    h = ('gameId == "{}"').format(game_id)
    a = ('gameId == "{}"').format(game_id)
    home_team = df_games.query(h)['homeTeamAbbr'].values[0]
    away_team = df_games.query(a)['visitorTeamAbbr'].values[0]
   
    
    plt.title('\nPlayers at Game No. : "{}" \nHome Team :"{}" Team Vs. Visitor Team : "{}" Team.'.format(game_id,home_team, away_team), 
              color = 'black',size = 16, weight = 'bold')
    
    
    def update_animation(time):
        patch = []
        # Home players' location
        homeX = home_play.query('time == ' + str(time))['x']
        homeY = home_play.query('time == ' + str(time))['y']
        homeNum = home_play.query('time == ' + str(time))['jerseyNumber']
        homeOrient = home_play.query('time == ' + str(time))['o']
        homeDir = home_play.query('time == ' + str(time))['dir']
        homeSpeed = home_play.query('time == ' + str(time))['s']
        patch.extend(plt.plot(homeX, homeY, 'o',c = 'yellow', ms = 20, mec= 'red'))
        
        # Home players' jersey number 
        for x, y, num in zip(homeX, homeY, homeNum):
            patch.append(plt.text(x, y, int(num), va = 'center', ha = 'center', color = 'red', size = 'medium'))
            
        # Home players' orientation
        for x, y, orient in zip(homeX, homeY, homeOrient):
            dx, dy = calculate_dx_dy_arrow(x, y, orient, 1, 1)
            patch.append(plt.arrow(x, y, dx, dy, color = 'yellow', width = 0.5, shape = 'full', ec = 'red'))
        
        # Home players' direction
        for x, y, direction, speed in zip(homeX, homeY, homeDir, homeSpeed):
            dx, dy = calculate_dx_dy_arrow(x, y, direction, speed, 1)
            patch.append(plt.arrow(x, y, dx, dy, color = 'black', width = 0.25, shape = 'full' ))
        
        # away players' location
        awayX = away_play.query('time == ' + str(time))['x']
        awayY = away_play.query('time == ' + str(time))['y']
        awayNum = away_play.query('time == ' + str(time))['jerseyNumber']
        awayOrient = away_play.query('time == ' + str(time))['o']
        awayDir = away_play.query('time == ' + str(time))['dir']
        awaySpeed = away_play.query('time == ' + str(time))['s']
        patch.extend(plt.plot(awayX, awayY, 'o',c = 'red', ms = 20, mec = 'white'))
        
        # Away players' jersey number 
        for x, y, num in zip(awayX, awayY, awayNum):
            patch.append(plt.text(x, y, int(num), va = 'center', ha = 'center', color = 'white', size = 'medium'))
            
        # Away players' orientation
        for x, y, orient in zip(awayX, awayY, awayOrient):
            dx, dy = calculate_dx_dy_arrow(x, y, orient, 1, 1)
            patch.append(plt.arrow(x, y, dx, dy, color = 'red', width = 0.5, shape = 'full', ec = 'white'))
        
        # Away players' direction
        for x, y, direction, speed in zip(awayX, awayY, awayDir, awaySpeed):
            dx, dy = calculate_dx_dy_arrow(x, y, direction, speed, 1)
            patch.append(plt.arrow(x, y, dx, dy, color = 'black', width = 0.25, shape = 'full'))
        
        # Ball location
        ballX = ball_play.query('time == ' + str(time))['x']
        ballY = ball_play.query('time == ' + str(time))['y']
        patch.extend(plt.plot(ballX, ballY, 'o', c = 'black', ms = 10, mec = 'white', data = ball_play.query('time == ' + str(time))['team']))
        
        
        return patch
    
    ims = [[]]
    for time in np.arange(min_time, max_time + 1):
        patch = update_animation(time)
        ims.append(patch)
        
    anim = animation.ArtistAnimation(fig, ims, repeat=False)
    
    return anim

In [None]:
# Rendering the full match took 5 hours.
anim = animate_Full_Game(2018090600)

In [None]:
# Saving the full match took 5 hours; the code is below.
writer = animation.FFMpegWriter(fps = 10, metadata = dict(artist='Designed by Peter Girgis for NFL 2018'), bitrate = 1800)
path = 'full match at Game No.2018090600.mp4'
anim.save(path , writer = writer)
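
Each call to `update_animation` runs several `.query('time == ...')` scans over the full tracking frame, which is one plausible contributor to the long run time noted above. A hedged sketch of an alternative is to split the rows by frame once with `groupby` and look each frame up in a dictionary. The `toy` data and the `frames` name are illustrative assumptions, and the speed-up has not been measured here.

```python
import pandas as pd

# Toy tracking rows with an already-ranked integer time column.
toy = pd.DataFrame({
    'time': [1, 1, 2, 2, 3],
    'x': [10.0, 12.0, 10.5, 12.4, 11.0],
    'y': [20.0, 30.0, 20.2, 29.5, 20.4],
})

# One pass over the data: map each frame index to its rows.
frames = {t: rows for t, rows in toy.groupby('time')}

for t in sorted(frames):
    rows = frames[t]
    # rows['x'] and rows['y'] hold what a frame-drawing function would plot.
    print(t, len(rows))
```

Inside the animation functions, `frames[time]` would then stand in for the repeated `home_play.query('time == ' + str(time))` lookups.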

> ### Q11: Provide a Profile Report for each DataFrame
>> ##### 01. **`df_games`**
>> ##### 02. **`df_plays`**
>> ##### 03. **`df_players`**
>> ##### 04. **`df_weeks`**

In [None]:
df_games_profile = pdp.ProfileReport(df_games)
df_games_profile