![#NFL](https://www.thesportsgeek.com/wp-content/uploads/2019/12/NFL-768x372.jpg)

In [None]:
import numpy as np
import pandas as pd

import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objs as go

pd.set_option('display.max_columns', None)

# **Game Data**

**Game data:** The games.csv file lists the teams playing in each game. The key variable is gameId.

* **gameId:** Game identifier, unique (numeric)
* **gameDate:** Game date (time, mm/dd/yyyy)
* **gameTimeEastern:** Start time of game (time, HH:MM:SS, EST)
* **homeTeamAbbr:** Home team three-letter code (text)
* **visitorTeamAbbr:** Visiting team three-letter code (text)
* **week:** Week of game (numeric)
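Since the date and start time live in separate text columns, it can help to combine them into one timestamp. The following is a minimal sketch on made-up rows that assumes the documented formats (mm/dd/yyyy and HH:MM:SS) hold for every row; `sample_games`, `kickoff`, and `weekday` are illustrative names, not columns of games.csv.

```python
import pandas as pd

# Toy rows in the documented formats; the values are made up for illustration.
sample_games = pd.DataFrame({
    "gameId": [1, 2],
    "gameDate": ["09/06/2018", "09/09/2018"],
    "gameTimeEastern": ["20:20:00", "13:00:00"],
})

# Join date and time into one string, then parse it with an explicit format.
sample_games["kickoff"] = pd.to_datetime(
    sample_games["gameDate"] + " " + sample_games["gameTimeEastern"],
    format="%m/%d/%Y %H:%M:%S",
)

# The weekday name makes it easy to separate games by day of the week.
sample_games["weekday"] = sample_games["kickoff"].dt.day_name()
print(sample_games[["gameId", "kickoff", "weekday"]])
```

The parsed values are naive timestamps; they carry no time zone information even though the source column is documented as Eastern time.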

In [None]:
games = pd.read_csv('/kaggle/input/nfl-big-data-bowl-2021/games.csv')

In [None]:
games

In [None]:
games_num = games['gameDate'].value_counts().reset_index()
games_num.columns = ['Date' , "Games"]
games_num = games_num.sort_values('Games' , ascending = True)

fig = px.bar(
    games_num,
    y = 'Date',
    x = 'Games',
    orientation = 'h',
    color = 'Games',
    title = 'Number of games for every Date',
    height = 500,
    width = 500
)

fig.show()
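The counting cells in this notebook share one pattern: count the values of a column, name the two resulting columns, and sort by count. As a sketch, that pattern could be wrapped in a small helper. `count_table` is a hypothetical name introduced here, not part of pandas or of the original notebook.

```python
import pandas as pd

def count_table(frame, column, label, count_label="Games"):
    """Count the values in `column` and return a two-column frame sorted by count (ascending)."""
    counts = frame[column].value_counts()
    table = pd.DataFrame({label: counts.index, count_label: counts.to_numpy()})
    return table.sort_values(count_label).reset_index(drop=True)

# Made-up rows to show the shape of the result.
sample = pd.DataFrame({"gameDate": ["09/09/2018", "09/09/2018", "09/06/2018"]})
table = count_table(sample, "gameDate", "Date")
print(table)
```

The returned frame can be passed to `px.bar` in the same way as `games_num`.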

In [None]:
check = games['gameTimeEastern'].value_counts().reset_index()
check.columns = ['Time' , 'Games']
check = check.sort_values('Games')

fig = px.bar(
    check,
    x = 'Games',
    y = 'Time',
    color = 'Games',
    orientation = 'h',
    title = 'Number of games for every Time',
    height = 500,
    width = 500
)

fig.show()

In [None]:
check = games['homeTeamAbbr'].value_counts().reset_index()
check.columns = ['Team', 'Games']
check = check.sort_values('Games')

fig = px.bar(
    check, 
    y='Team', 
    x="Games", 
    orientation='h',
    color = 'Games',
    title='Number of games for every team (home)', 
    height=500, 
    width=500
)

fig.show()

In [None]:
check = games['visitorTeamAbbr'].value_counts().reset_index()
check.columns = ['Team', 'Games']
check = check.sort_values('Games')

fig = px.bar(
    check, 
    y='Team', 
    x="Games", 
    orientation='h', 
    title='Number of games for every team (visitor)', 
    height=500, 
    width=500
)

fig.show()
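The home and visitor counts can also be combined into a total number of games per team. A minimal sketch on made-up rows follows; `sample_games`, `appearances`, and `total_games` are illustrative names.

```python
import pandas as pd

# Made-up schedule: three games, four teams.
sample_games = pd.DataFrame({
    "homeTeamAbbr": ["PHI", "CLE", "ATL"],
    "visitorTeamAbbr": ["ATL", "PIT", "PHI"],
})

# Stack both columns into one Series so every appearance of a team counts once.
appearances = pd.concat(
    [sample_games["homeTeamAbbr"], sample_games["visitorTeamAbbr"]],
    ignore_index=True,
)
total_games = appearances.value_counts()
print(total_games)
```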

In [None]:
check = games['week'].value_counts().reset_index()
check.columns = ['Week_Numeric', 'Games']
check = check.sort_values('Games')

fig = px.bar(
    check, 
    y='Week_Numeric', 
    x="Games", 
    orientation='h',
    color = 'Games',
    title='Number of games for every week', 
    height=500, 
    width=500
)

fig.show()

# **Player Data**

**Player data:** The players.csv file contains player-level information for players who participated in any of the tracking data files. The key variable is nflId.

* **nflId:** Player identification number, unique across players (numeric)

* **height:** Player height (text)

* **weight:** Player weight (numeric)

* **birthDate:** Date of birth (YYYY-MM-DD)

* **collegeName:** Player college (text)

* **position:** Player position (text)

* **displayName:** Player name (text)
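As an illustration of working with birthDate, the sketch below derives an approximate age. It assumes every value follows the documented YYYY-MM-DD format, and `reference_date` is an arbitrary date chosen for the example, not something defined by the dataset.

```python
import pandas as pd

# Made-up players for illustration.
sample_players = pd.DataFrame({
    "displayName": ["Player A", "Player B"],
    "birthDate": ["1990-01-01", "1995-06-15"],
})

# Hypothetical reference date; replace it with whatever date suits the analysis.
reference_date = pd.Timestamp("2018-09-06")

birth = pd.to_datetime(sample_players["birthDate"], format="%Y-%m-%d")

# Approximate age in years (365.25 days per year accounts for leap years on average).
sample_players["age"] = (reference_date - birth).dt.days / 365.25
print(sample_players)
```

If some rows use a different date format, the explicit `format` argument will raise an error, which is a useful signal that the column needs cleaning first.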

In [None]:
players = pd.read_csv('/kaggle/input/nfl-big-data-bowl-2021/players.csv')
players

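Heights are stored either as feet-inches (e.g. 6-2) or as total inches (e.g. 74). Convert all heights to feet.

In [None]:
def height_to_feet(value):
    # '6-2' means 6 ft 2 in = 6 + 2/12 ft (not 6.2); values without '-' are total inches
    text = str(value)
    if '-' in text:
        feet, inches = text.split('-')
        return int(feet) + int(inches) / 12
    return float(text) / 12

players['height'] = players['height'].apply(height_to_feet).astype(np.float32)

In [None]:
players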

In [None]:
fig = px.histogram(
    players, 
    x="height",
    width=500,
    height=500,
    nbins=20,
    title='Players height distribution'
)

fig.show()

In [None]:
fig = px.histogram(
    players, 
    x="weight",
    width=500,
    height=500,
    nbins=20,
    title='Players weight distribution'
)

fig.show()

In [None]:
check = players.collegeName.value_counts().reset_index()
check.columns = ['collegeName' , 'Players']
check.sort_values('Players' , inplace=True)

fig = px.bar(
    check.tail(20),
    x = 'Players',
    y = 'collegeName',
    orientation = 'h',
    title = 'Top 20 colleges by number of Players',
    color = 'Players',
    height = 900,
    width = 800
)

fig.show()

In [None]:
check = players.position.value_counts().reset_index()
check.columns = ['Position' , 'Players']
check.sort_values('Players' , inplace=True)

fig = px.bar(
    check,
    x = 'Players',
    y = 'Position',
    orientation = 'h',
    title = 'Top positions by number of players',
    color = 'Players',
    height = 900,
    width = 800
)

fig.show()

**Player position abbreviations**
* WR: Wide Receiver
* CB: Cornerback
* RB: Running Back
* TE: Tight End
* OLB: Outside Linebacker
* QB: Quarterback
* FS: Free Safety
* LB: Linebacker
* SS: Strong Safety
* ILB: Inside Linebacker
* DE: Defensive End
* DB: Defensive Back
* MLB: Middle Linebacker
* DT: Defensive Tackle
* FB: Fullback
* P: Punter
* LS: Long Snapper
* S: Safety
* K: Kicker
* HB: Halfback (a running back)
* NT: Nose Tackle
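The abbreviations above can be kept in a dictionary and mapped onto the position column to get readable labels. This is only a sketch: `POSITION_NAMES` and `positionName` are names introduced here, and the sample rows are made up.

```python
import pandas as pd

# Lookup built from the abbreviation list above.
POSITION_NAMES = {
    "WR": "Wide Receiver", "CB": "Cornerback", "RB": "Running Back",
    "TE": "Tight End", "OLB": "Outside Linebacker", "QB": "Quarterback",
    "FS": "Free Safety", "LB": "Linebacker", "SS": "Strong Safety",
    "ILB": "Inside Linebacker", "DE": "Defensive End", "DB": "Defensive Back",
    "MLB": "Middle Linebacker", "DT": "Defensive Tackle", "FB": "Fullback",
    "P": "Punter", "LS": "Long Snapper", "S": "Safety", "K": "Kicker",
    "HB": "Halfback",  # a halfback is a running back
    "NT": "Nose Tackle",
}

# Made-up rows; "XX" stands in for an abbreviation missing from the lookup.
sample_players = pd.DataFrame({"position": ["WR", "QB", "HB", "XX"]})

# Series.map leaves NaN wherever the abbreviation is not in the dictionary.
sample_players["positionName"] = sample_players["position"].map(POSITION_NAMES)
print(sample_players)
```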

# **Play Data**

**Play data:** The plays.csv file contains play-level information from each game. The key variables are gameId and playId.

* **gameId:** Game identifier, unique (numeric)

* **playId:** Play identifier, not unique across games (numeric)

* **playDescription:** Description of play (text)

* **quarter:** Game quarter (numeric)

* **down:** Down (numeric)

* **yardsToGo:** Distance needed for a first down (numeric)

* **possessionTeam:** Team on offense (text)

* **playType:** Outcome of dropback: sack or pass (text)

* **yardlineSide:** 3-letter team code corresponding to line-of-scrimmage (text)

* **yardlineNumber:** Yard line at line-of-scrimmage (numeric)

* **offenseFormation:** Formation used by possession team (text)

* **personnelO:** Personnel used by offensive team (text)

* **defendersInTheBox:** Number of defenders in close proximity to line-of-scrimmage (numeric)

* **numberOfPassRushers:** Number of pass rushers (numeric)

* **personnelD:** Personnel used by defensive team (text)

* **typeDropback:** Dropback categorization of quarterback (text)

* **preSnapHomeScore:** Home score prior to the play (numeric)

* **preSnapVisitorScore:** Visiting team score prior to the play (numeric)

* **gameClock:** Time on clock of play (MM:SS)

* **absoluteYardlineNumber:** Distance from end zone for possession team (numeric)

* **penaltyCodes:** NFL categorization of the penalties that occurred on the play. For purposes of this contest, the most important penalties are Defensive Pass Interference (DPI), Offensive Pass Interference (OPI), Illegal Contact (ICT), and Defensive Holding (DH). Multiple penalties on a play are separated by a ; (text)

* **penaltyJerseyNumber:** Jersey number and team code of the player committing each penalty. Multiple penalties on a play are separated by a ; (text)

* **passResult:** Outcome of the passing play (text; C: Complete pass, I: Incomplete pass, S: Quarterback sack, IN: Intercepted pass)

* **offensePlayResult:** Yards gained by the offense, excluding penalty yardage (numeric)

* **playResult:** Net yards gained by the offense, including penalty yardage (numeric)

* **epa:** Expected points added on the play, relative to the offensive team. Expected points is a metric that estimates the average of every next scoring outcome given the play's down, distance, yardline, and time remaining (numeric)

* **isDefensivePI:** An indicator variable for whether or not a DPI penalty occurred on a given play (TRUE/FALSE)
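Because penaltyCodes can hold several codes separated by `;`, counting penalties requires splitting the field first. The sketch below uses made-up rows and assumes plays without a penalty have a missing value; `penaltyCode` is a column name introduced for the example.

```python
import pandas as pd

# Made-up plays: one without a penalty, one with a single code, one with two codes.
sample_plays = pd.DataFrame({
    "playId": [75, 146, 168],
    "penaltyCodes": [None, "DPI", "DH;OPI"],
})

# Split on ';' to get a list of codes per play (missing values stay missing).
codes = sample_plays["penaltyCodes"].str.split(";")

# explode yields one row per (play, code); a play without penalties keeps one row with NaN.
penalties = sample_plays.assign(penaltyCode=codes).explode("penaltyCode")

# value_counts ignores the NaN rows by default.
penalty_counts = penalties["penaltyCode"].value_counts()
print(penalty_counts)
```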

In [None]:
plays = pd.read_csv('/kaggle/input/nfl-big-data-bowl-2021/plays.csv')
plays
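Since playId is not unique across games, a play is only identified by the pair (gameId, playId). A quick way to check that assumption, shown here on made-up rows, is to test for duplicates on both columns together.

```python
import pandas as pd

# Made-up rows: playId 75 appears in two different games.
sample_plays = pd.DataFrame({
    "gameId": [2018090600, 2018090600, 2018090900],
    "playId": [75, 146, 75],
})

# playId alone repeats across games...
playid_alone_unique = bool(sample_plays["playId"].is_unique)

# ...but the (gameId, playId) pair should identify each play exactly once.
pair_unique = not sample_plays.duplicated(subset=["gameId", "playId"]).any()

print("playId unique on its own:", playid_alone_unique)
print("(gameId, playId) unique:", pair_unique)
```

The same check could be run on the full `plays` frame before merging it with other tables on these two keys.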

In [None]:
check = plays['possessionTeam'].value_counts().reset_index()
check.columns = ['team', 'plays']
check = check.sort_values('plays')

fig = px.bar(
    check, 
    y='team', 
    x="plays", 
    orientation='h', 
    color = 'plays',
    title='Number of plays for every team',
    height=800,
    width=800
)

fig.show()

In [None]:
check = plays['playType'].value_counts().reset_index()
check.columns = ['type', 'plays']
check = check.sort_values('plays')

fig = px.pie(
    check, 
    names='type', 
    values="plays",  
    title='Number of plays of every type',
    height=600,
    width=600
)

fig.show()
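The pie chart shows proportions visually; the same shares can be read as numbers with `value_counts(normalize=True)`. A small sketch on made-up data follows, where the labels `pass` and `sack` are placeholders rather than the exact strings stored in plays.csv.

```python
import pandas as pd

# Made-up sample: nine passes and one sack.
sample_plays = pd.DataFrame({"playType": ["pass"] * 9 + ["sack"]})

# normalize=True returns fractions instead of raw counts; convert to percentages.
shares = (
    sample_plays["playType"]
    .value_counts(normalize=True)
    .mul(100)
    .round(1)
)
print(shares)
```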

In [None]:
check = plays['yardlineNumber'].value_counts().reset_index()
check.columns = ['yardline', 'plays']
check = check.sort_values('plays')

fig = px.bar(
    check, 
    x='yardline', 
    y="plays",  
    title='Number of plays for every yardline',
    height=600,
    width=800,
    color = 'plays'
)

fig.show()

In [None]:
check = plays['offenseFormation'].value_counts().reset_index()
check.columns = ['offenseFormation', 'plays']
check = check.sort_values('plays')

fig = px.pie(
    check, 
    names='offenseFormation', 
    values="plays",  
    title='Number of plays for every offense formation type',
    height=600,
    width=600
)

fig.show()

In [None]:
check = plays['defendersInTheBox'].value_counts().reset_index()
check.columns = ['defendersInTheBox', 'plays']
check = check.sort_values('plays')

fig = px.bar(
    check, 
    x='defendersInTheBox', 
    y="plays",  
    title='Number of plays for every number of defenders in the box',
    height=600,
    width=800
)

fig.show()
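By default `value_counts` drops missing values, so plays with no recorded number of defenders do not appear in the chart. As a sketch on made-up data, passing `dropna=False` keeps them, and `sort_index` orders the result by number of defenders rather than by frequency.

```python
import numpy as np
import pandas as pd

# Made-up sample with one missing value.
sample_plays = pd.DataFrame(
    {"defendersInTheBox": [6.0, 7.0, 6.0, np.nan, 5.0, 6.0]}
)

counts = (
    sample_plays["defendersInTheBox"]
    .value_counts(dropna=False)  # keep NaN as its own category
    .sort_index()                # order by defender count; NaN is placed last
)
print(counts)
```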

