# Final Project

1. Research Questions
- How has the performance of NFL teams evolved over the past decade in terms of win percentages and offensive metrics?
- How have wins evolved over time for each conference in the NFL?
- What is the relationship between total yards, points, and wins? Which factor, yards or points, has a greater impact on winning?
- How do average points compare between the best and worst teams in the league?
- Which factor, turnover percentage or penalties, has a more significant impact on wins and point differential? What is the effect of turnover percentage on win percentage and point differential?
- Can a regression model be used to accurately predict wins in the NFL?
- What correlations exist between various NFL statistics?
- How are passing yards and rushing yards related across teams?
- How do total snaps influence offensive outcomes?
  - Investigate the relationship between total snaps and key metrics such as yards gained, touchdowns, and total points.

- What is the impact of passing efficiency on team success?
  - Explore the relationship between pass completion percentage and win percentage, total points, and yards gained.

- How does the balance between rushing and passing affect overall team performance?
  - Compare teams that rely more heavily on rushing or on passing in terms of win percentage, yards per snap (yps), and points per game.

- Do teams that use no-huddle and shotgun formations more often score more points?
  - Examine whether teams with higher no-huddle and shotgun usage have better offensive performance (yards gained, touchdowns, points).

- What is the correlation between air yards and total passing yards?
  - Determine how air yards contribute to overall passing success and offensive productivity.

- Does a higher number of turnovers (fumbles and interceptions) lead to a significant drop in win percentage?
  - Analyze the impact of turnovers on win percentage and other success metrics such as point differential.

- How does a team's home and away performance differ in terms of wins and points scored?
  - Compare team performance at home vs. away, looking at wins, losses, and points per game.

- Which factors contribute most to a high points-per-game average?
  - Identify the strongest predictors of points per game by exploring variables such as pass attempts, rushing yards, and touchdowns.

- Is there a relationship between receiving yards and points scored?
  - Explore whether teams with more receiving yards tend to score more points or win more games.

- How does win percentage change over time for teams with different offensive strategies?
  - Track how offensive strategies (e.g., passing-heavy, rushing-heavy) correlate with win percentage trends over multiple seasons.

2. Justification: why is this relevant?
This project is especially timely with the start of the 2024 NFL season. It examines how team offenses have changed over the 2012 to 2023 seasons covered by the data and which offensive traits are associated with winning. These findings can also help fans anticipate trends and team performance in the upcoming season.

3. Data Sources
NFL datasets (Kaggle):
- https://www.kaggle.com/datasets/philiphyde1/nfl-stats-1999-2022
- https://www.kaggle.com/datasets/nickcantalupa/nfl-team-data-2003-2023/code

4. Libraries Used
- pandas for data manipulation and analysis
- matplotlib and seaborn for visualization
- scikit-learn for machine learning models and predictions



# Introduction to Dataset and Summary Statistics

## Dataset Overview

```python
# Import necessary libraries
import pandas as pd
from IPython.display import display

# Load the dataset (raw string so the Windows backslashes are not read as escape sequences)
df = pd.read_csv(r'..\Final Project\dataset\yearly_team_data.csv')

# Show all columns when displaying wide DataFrames
pd.set_option('display.max_columns', None)

# Clean the 'record' column by stripping leading/trailing whitespace and tabs
df['record'] = df['record'].str.strip()

# Display the first few rows of the DataFrame to get an overview
df.head()
```

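|  | team | season | total_snaps | yards_gained | touchdown | extra_point_attempt | field_goal_attempt | total_points | td_points | xp_points | fg_points | fumble | fumble_lost | shotgun | no_huddle | qb_dropback | pass_snaps_count | pass_snaps_pct | pass_attempts | complete_pass | incomplete_pass | air_yards | passing_yards | pass_td | interception | targets | receptions | receiving_yards | yards_after_catch | receiving_td | pass_fumble | pass_fumble_lost | rush_snaps_count | rush_snaps_pct | qb_scramble | rushing_yards | run_td | run_fumble | run_fumble_lost | home_wins | home_losses | home_ties | away_wins | away_losses | away_ties | wins | losses | ties | win_pct | record | yps |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ARI | 2012 | 1013 | 7595 | 28 | 25 | 25 | 268 | 168 | 25 | 75 | 16 | 10 | 568 | 53 | 676 | 665 | 0.66 | 586 | 337 | 249 | 4862 | 3383 | 17 | 21 | 586 | 337 | 3383 | 1363 | 17 | 10 | 7 | 348 | 0.34 | 11 | 1207 | 11 | 6 | 3 | 4 | 4 | 0 | 1 | 7 | 0 | 5 | 11 | 0 | 0.313 | 5-11-0 | 7.5 |
| 1 | ARI | 2013 | 1020 | 9855 | 37 | 37 | 30 | 349 | 222 | 37 | 90 | 17 | 9 | 385 | 17 | 622 | 617 | 0.6 | 551 | 363 | 188 | 5284 | 4291 | 25 | 22 | 551 | 363 | 4291 | 1756 | 25 | 9 | 5 | 403 | 0.4 | 5 | 1560 | 12 | 8 | 4 | 6 | 2 | 0 | 4 | 4 | 0 | 10 | 6 | 0 | 0.625 | 10-6-0 | 9.66 |
| 2 | ARI | 2014 | 983 | 9128 | 28 | 27 | 29 | 282 | 168 | 27 | 87 | 13 | 5 | 495 | 17 | 613 | 598 | 0.61 | 556 | 320 | 236 | 6028 | 3990 | 22 | 12 | 556 | 320 | 3990 | 1604 | 22 | 9 | 3 | 385 | 0.39 | 15 | 1326 | 6 | 4 | 2 | 7 | 1 | 0 | 4 | 4 | 0 | 11 | 5 | 0 | 0.688 | 11-5-0 | 9.29 |
| 3 | ARI | 2015 | 1005 | 11337 | 53 | 53 | 28 | 455 | 318 | 53 | 84 | 16 | 9 | 400 | 11 | 590 | 583 | 0.58 | 543 | 353 | 190 | 6084 | 4775 | 36 | 13 | 543 | 353 | 4775 | 1782 | 36 | 8 | 5 | 422 | 0.42 | 7 | 1946 | 16 | 8 | 4 | 6 | 2 | 0 | 7 | 1 | 0 | 13 | 3 | 0 | 0.813 | 13-3-0 | 11.28 |
| 4 | ARI | 2016 | 1080 | 10302 | 51 | 43 | 21 | 412 | 306 | 43 | 63 | 25 | 11 | 529 | 51 | 692 | 688 | 0.64 | 626 | 383 | 243 | 6204 | 4425 | 30 | 17 | 626 | 383 | 4425 | 1747 | 30 | 14 | 6 | 392 | 0.36 | 4 | 1739 | 21 | 11 | 5 | 4 | 4 | 1 | 3 | 5 | 0 | 7 | 9 | 1 | 0.412 | 7-9-1 | 9.54 |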
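The cleaning step above only strips whitespace from `record`. As an extra sanity check, a minimal sketch could parse each W-L-T string and compare it with the numeric `wins`, `losses`, and `ties` columns. The helper name `record_matches` is hypothetical. The sample rows are copied from the preview above, and on the full dataset you would pass `df` instead.

```python
import pandas as pd

# Five ARI rows (2012-2016) copied from the df.head() preview,
# keeping only the columns this check needs
sample = pd.DataFrame({
    "season": [2012, 2013, 2014, 2015, 2016],
    "wins": [5, 10, 11, 13, 7],
    "losses": [11, 6, 5, 3, 9],
    "ties": [0, 0, 0, 0, 1],
    "record": ["5-11-0", "10-6-0", "11-5-0", "13-3-0", "7-9-1"],
})


def record_matches(frame: pd.DataFrame) -> pd.Series:
    """Return True for each row whose 'record' string equals wins-losses-ties."""
    # Strip stray whitespace (as in the cleaning step), then split "W-L-T" into integers
    parts = frame["record"].str.strip().str.split("-", expand=True).astype(int)
    return (
        (parts[0] == frame["wins"])
        & (parts[1] == frame["losses"])
        & (parts[2] == frame["ties"])
    )


mismatches = sample[~record_matches(sample)]
print(f"{len(mismatches)} rows disagree with their record string")
```

Any rows left in `mismatches` would point to export or data-entry problems worth inspecting before analysis.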


### Column Descriptions

Example values are taken from the ARI 2012 row (index 0) in the preview above.

| Column Name | Data Type | Description | Example Values |
|---|---|---|---|
| team | Object | Team abbreviation. | "ARI" (Arizona Cardinals) |
| season | Int | Season year. | 2012 |
| total_snaps | Int | Total offensive plays (snaps); equals pass_snaps_count + rush_snaps_count. | 1013 |
| yards_gained | Int | Total yards gained by the team during the season. | 7595 |
| touchdown | Int | Total touchdowns scored by the team. | 28 |
| extra_point_attempt | Int | Number of extra point attempts. | 25 |
| field_goal_attempt | Int | Number of field goal attempts. | 25 |
| total_points | Int | Total points scored (td_points + xp_points + fg_points). | 268 |
| td_points | Int | Points scored from touchdowns (6 × touchdown). | 168 |
| xp_points | Int | Points scored from extra points (equal to extra_point_attempt in this dataset). | 25 |
| fg_points | Int | Points scored from field goals (3 × field_goal_attempt in this dataset). | 75 |
| fumble | Int | Total fumbles (pass_fumble + run_fumble). | 16 |
| fumble_lost | Int | Fumbles lost to the opponent (pass_fumble_lost + run_fumble_lost). | 10 |
| shotgun | Int | Number of plays run from shotgun formation. | 568 |
| no_huddle | Int | Number of no-huddle plays. | 53 |
| qb_dropback | Int | Total quarterback dropbacks. | 676 |
| pass_snaps_count | Int | Number of pass snaps. | 665 |
| pass_snaps_pct | Float | Share of snaps that were pass plays, as a fraction between 0 and 1. | 0.66 |
| pass_attempts | Int | Total pass attempts. | 586 |
| complete_pass | Int | Completed passes. | 337 |
| incomplete_pass | Int | Incomplete passes. | 249 |
| air_yards | Int | Total air yards on pass attempts (distance the ball travels past the line of scrimmage). | 4862 |
| passing_yards | Int | Total passing yards. | 3383 |
| pass_td | Int | Passing touchdowns. | 17 |
| interception | Int | Interceptions thrown. | 21 |
| targets | Int | Passes targeted at receivers. | 586 |
| receptions | Int | Passes caught by receivers. | 337 |
| receiving_yards | Int | Total receiving yards. | 3383 |
| yards_after_catch | Int | Receiving yards gained after the catch. | 1363 |
| receiving_td | Int | Receiving touchdowns. | 17 |
| pass_fumble | Int | Fumbles on pass plays. | 10 |
| pass_fumble_lost | Int | Fumbles lost on pass plays. | 7 |
| rush_snaps_count | Int | Number of rush snaps. | 348 |
| rush_snaps_pct | Float | Share of snaps that were rushing plays, as a fraction between 0 and 1 (1 − pass_snaps_pct). | 0.34 |
| qb_scramble | Int | Quarterback scrambles. | 11 |
| rushing_yards | Int | Total rushing yards. | 1207 |
| run_td | Int | Rushing touchdowns. | 11 |
| run_fumble | Int | Fumbles on rushing plays. | 6 |
| run_fumble_lost | Int | Fumbles lost on rushing plays. | 3 |
| home_wins | Int | Wins at home. | 4 |
| home_losses | Int | Losses at home. | 4 |
| home_ties | Int | Ties at home. | 0 |
| away_wins | Int | Wins on the road. | 1 |
| away_losses | Int | Losses on the road. | 7 |
| away_ties | Int | Ties on the road. | 0 |
| wins | Int | Total wins in the season. | 5 |
| losses | Int | Total losses in the season. | 11 |
| ties | Int | Total ties in the season. | 0 |
| win_pct | Float | Wins divided by games played, as a fraction (ties are not counted as half-wins). | 0.313 |
| record | Object | Win-loss-tie record as a string. | "5-11-0" |
| yps | Float | Yards per snap (yards_gained / total_snaps, rounded to two decimals). | 7.5 |
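Several columns are arithmetic combinations of others (for example, total_points = td_points + xp_points + fg_points). The sketch below checks these relationships using values copied from the five ARI rows in the preview. The helper `check_identities` is hypothetical, and it verifies only these rows until it is run on the full `df`.

```python
import numpy as np
import pandas as pd

# Columns copied from the five ARI rows (2012-2016) in the df.head() preview
sample = pd.DataFrame({
    "total_snaps": [1013, 1020, 983, 1005, 1080],
    "yards_gained": [7595, 9855, 9128, 11337, 10302],
    "touchdown": [28, 37, 28, 53, 51],
    "extra_point_attempt": [25, 37, 27, 53, 43],
    "field_goal_attempt": [25, 30, 29, 28, 21],
    "total_points": [268, 349, 282, 455, 412],
    "td_points": [168, 222, 168, 318, 306],
    "xp_points": [25, 37, 27, 53, 43],
    "fg_points": [75, 90, 87, 84, 63],
    "pass_snaps_pct": [0.66, 0.60, 0.61, 0.58, 0.64],
    "rush_snaps_pct": [0.34, 0.40, 0.39, 0.42, 0.36],
    "yps": [7.5, 9.66, 9.29, 11.28, 9.54],
})


def check_identities(frame: pd.DataFrame) -> dict:
    """Return one boolean per column relationship: True if it holds on every row."""
    return {
        "td_points == 6 * touchdown": (frame["td_points"] == 6 * frame["touchdown"]).all(),
        "xp_points == extra_point_attempt": (frame["xp_points"] == frame["extra_point_attempt"]).all(),
        "fg_points == 3 * field_goal_attempt": (frame["fg_points"] == 3 * frame["field_goal_attempt"]).all(),
        "total_points == td + xp + fg": (
            frame["total_points"] == frame["td_points"] + frame["xp_points"] + frame["fg_points"]
        ).all(),
        # Pass and rush shares are fractions of the same snaps, so they should sum to 1
        "pass_pct + rush_pct == 1": np.isclose(frame["pass_snaps_pct"] + frame["rush_snaps_pct"], 1.0).all(),
        # yps appears to be yards_gained / total_snaps rounded to two decimals
        "yps == yards_gained / total_snaps": np.allclose(
            (frame["yards_gained"] / frame["total_snaps"]).round(2), frame["yps"]
        ),
    }


for name, holds in check_identities(sample).items():
    print(f"{name}: {holds}")
```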


## Summary Statistics

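```python
# Summarize all columns, numerical and categorical
summary_statistics = df.describe(include='all')
# Format numbers compactly with the 'g' format spec and leave strings (e.g. 'top') unchanged.
# DataFrame.map replaces the deprecated DataFrame.applymap (pandas >= 2.1).
summary_statistics = summary_statistics.map(lambda x: f'{x:g}' if isinstance(x, (int, float)) else x)

# Display the summary statistics
print("Summary Statistics:")
display(summary_statistics)
```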

Summary Statistics:


Unnamed: 0,team,season,total_snaps,yards_gained,touchdown,extra_point_attempt,field_goal_attempt,total_points,td_points,xp_points,fg_points,fumble,fumble_lost,shotgun,no_huddle,qb_dropback,pass_snaps_count,pass_snaps_pct,pass_attempts,complete_pass,incomplete_pass,air_yards,passing_yards,pass_td,interception,targets,receptions,receiving_yards,yards_after_catch,receiving_td,pass_fumble,pass_fumble_lost,rush_snaps_count,rush_snaps_pct,qb_scramble,rushing_yards,run_td,run_fumble,run_fumble_lost,home_wins,home_losses,home_ties,away_wins,away_losses,away_ties,wins,losses,ties,win_pct,record,yps
count,384,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384,384.0
unique,32,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,42,
top,ARI,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,7-9-0,
freq,12,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,37,
mean,,2017.5,1024.26,9670.45,41.651,36.474,26.5911,366.154,249.906,36.474,79.7734,16.9375,8.02865,642.086,102.25,627.474,602.628,0.588307,548.174,357.094,191.081,4593.27,4032.02,26.9531,13.5755,548.174,357.094,4030.8,1851.66,26.9531,10.0286,5.01562,421.633,0.411693,24.875,1859.2,13.9505,6.90885,3.01302,4.5,3.6224,0.03125,3.59115,4.53125,0.03125,8.09115,8.15365,0.0625,0.496836,,9.43904
std,,3.45656,52.2525,1050.13,9.11448,9.61419,5.58177,64.7982,54.6869,9.61419,16.7453,4.63512,2.89624,146.949,102.215,58.0605,59.8663,0.0466698,59.2602,46.2316,26.6093,647.379,535.805,7.40319,4.49421,59.2602,46.2316,535.79,311.686,7.40319,3.38044,2.24708,49.8091,0.0466698,14.6382,348.128,5.11182,3.03496,1.78105,1.79497,1.79224,0.17422,1.7912,1.78741,0.17422,3.06578,3.07348,0.242377,0.18944,,0.884055
min,,2012.0,866.0,6761.0,21.0,16.0,8.0,227.0,126.0,16.0,24.0,6.0,1.0,228.0,6.0,486.0,436.0,0.45,361.0,223.0,131.0,2898.0,2598.0,12.0,2.0,361.0,223.0,2589.0,1124.0,12.0,2.0,0.0,314.0,0.31,1.0,1168.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,,7.1
25%,,2014.75,989.0,8916.5,35.0,30.0,23.0,319.75,210.0,30.0,69.0,14.0,6.0,541.5,45.0,587.75,560.0,0.56,504.75,324.75,174.0,4116.25,3644.75,22.0,10.0,504.75,324.75,3643.5,1644.75,22.0,8.0,3.0,386.0,0.38,14.0,1616.5,10.0,5.0,2.0,3.0,2.0,0.0,2.0,3.0,0.0,6.0,6.0,0.0,0.375,,8.7875
50%,,2017.5,1018.0,9669.5,40.0,35.0,27.0,361.0,240.0,35.0,81.0,17.0,8.0,638.0,74.5,630.0,607.0,0.59,551.5,357.0,191.0,4571.0,4020.0,26.0,13.0,551.5,357.0,4018.0,1841.0,26.0,10.0,5.0,415.0,0.41,22.0,1820.0,13.0,7.0,3.0,5.0,4.0,0.0,4.0,5.0,0.0,8.0,8.0,0.0,0.5,,9.385
75%,,2020.25,1060.25,10348.2,48.0,43.0,30.0,411.0,288.0,43.0,90.0,20.0,10.0,732.25,123.0,665.0,648.0,0.62,589.0,384.25,209.0,5021.25,4388.75,32.0,16.0,589.0,384.25,4388.75,2054.5,32.0,12.0,6.0,456.25,0.44,34.0,2062.5,17.0,9.0,4.0,6.0,5.0,0.0,5.0,6.0,0.0,10.25,10.0,0.0,0.6305,,10.02
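With about 50 columns, the wide `describe()` output is hard to scan. One option, sketched here on the assumption that only a few numeric columns are of interest at a time, is to transpose the result so that each variable becomes a row. The helper `long_summary` is hypothetical. The sample values come from the five ARI rows in the preview.

```python
import pandas as pd

# Wins, points, and win percentage for the five ARI rows (2012-2016) in the preview
sample = pd.DataFrame({
    "wins": [5, 10, 11, 13, 7],
    "total_points": [268, 349, 282, 455, 412],
    "win_pct": [0.313, 0.625, 0.688, 0.813, 0.412],
})


def long_summary(frame: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Describe the chosen numeric columns with one row per variable."""
    # describe() puts variables in columns; .T flips them into rows,
    # which is easier to read when the dataset has dozens of columns
    return frame[columns].describe().T.round(3)


summary = long_summary(sample, ["wins", "total_points", "win_pct"])
print(summary)
```

On the full `df`, passing the column names for one research question at a time keeps each table short.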


## Check for Missing Data

```python
# Count missing values in each column
missing_values = df.isnull().sum()
print("\nMissing Values:")
print(missing_values)
```


```
Missing Values:
team                   0
season                 0
total_snaps            0
yards_gained           0
touchdown              0
extra_point_attempt    0
field_goal_attempt     0
total_points           0
td_points              0
xp_points              0
fg_points              0
fumble                 0
fumble_lost            0
shotgun                0
no_huddle              0
qb_dropback            0
pass_snaps_count       0
pass_snaps_pct         0
pass_attempts          0
complete_pass          0
incomplete_pass        0
air_yards              0
passing_yards          0
pass_td                0
interception           0
targets                0
receptions             0
receiving_yards        0
yards_after_catch      0
receiving_td           0
pass_fumble            0
pass_fumble_lost       0
rush_snaps_count       0
rush_snaps_pct         0
qb_scramble            0
rushing_yards          0
run_td                 0
run_fumble             0
run_fumble_lost        0
home_win
```
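When all 51 counts are printed, a single nonzero entry is easy to miss. The hypothetical helper `missing_report` below keeps only the columns that actually have gaps, so an empty result for `df` would confirm that no column is missing data. The toy frame has one value blanked out on purpose to show the behavior, and it is not taken from the dataset.

```python
import numpy as np
import pandas as pd


def missing_report(frame: pd.DataFrame) -> pd.Series:
    """Return missing-value counts only for columns that have at least one gap."""
    counts = frame.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)


# Toy frame with one value deliberately blanked out (illustrative, not real data)
toy = pd.DataFrame({
    "team": ["ARI", "ARI", "ARI"],
    "wins": [5, 10, np.nan],
    "losses": [11, 6, 5],
})

report = missing_report(toy)
print(report if not report.empty else "No missing values")
```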

## Additional Dataset Information

```python
# Additional dataset information (data types, non-null counts, etc.)
print("\nDataset Info:")
df.info()
```


```
Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 384 entries, 0 to 383
Data columns (total 51 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   team                 384 non-null    object 
 1   season               384 non-null    int64  
 2   total_snaps          384 non-null    int64  
 3   yards_gained         384 non-null    int64  
 4   touchdown            384 non-null    int64  
 5   extra_point_attempt  384 non-null    int64  
 6   field_goal_attempt   384 non-null    int64  
 7   total_points         384 non-null    int64  
 8   td_points            384 non-null    int64  
 9   xp_points            384 non-null    int64  
 10  fg_points            384 non-null    int64  
 11  fumble               384 non-null    int64  
 12  fumble_lost          384 non-null    int64  
 13  shotgun              384 non-null    int64  
 14  no_huddle            384 non-null    int64  
 15  qb_dropback          384 
```
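In the summary statistics, `team` and `record` are the only columns with `unique`/`top`/`freq` entries, which suggests they are the only text columns. To flag any column that pandas did not parse as numeric (for example, a numeric field read in as text), a minimal sketch could use `select_dtypes`. On the full `df`, you would expect only `team` and `record` to appear. The helper `non_numeric_columns` is hypothetical, and the sample values are copied from the ARI 2012 and 2013 rows.

```python
import pandas as pd


def non_numeric_columns(frame: pd.DataFrame) -> list:
    """List the columns that pandas did not parse as numbers."""
    return frame.select_dtypes(exclude="number").columns.tolist()


# A few columns from the ARI 2012 and 2013 rows in the df.head() preview
sample = pd.DataFrame({
    "team": ["ARI", "ARI"],
    "season": [2012, 2013],
    "pass_snaps_pct": [0.66, 0.6],
    "record": ["5-11-0", "10-6-0"],
    "yps": [7.5, 9.66],
})

print(non_numeric_columns(sample))
```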

## Additional Summary Statistics

- Total Games Played: wins + losses + ties.
- Average Points Per Game: total_points / games_played.
- Average Yards Per Snap: yards_gained / total_snaps. This reproduces the existing yps column up to rounding.
- Turnover Ratio: (fumble + interception) / total_snaps, i.e. fumbles plus interceptions per offensive snap. Because fumble also counts fumbles the offense recovered, (fumble_lost + interception) / total_snaps would count only actual turnovers.
- Passing Efficiency: passing_yards / pass_attempts, i.e. yards per pass attempt.
- Rushing Efficiency: rushing_yards / rush_snaps_count, i.e. yards per rush snap.
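The formulas in the list above can be collected into one function that returns a new DataFrame and leaves its input untouched. This is a minimal sketch using two ARI rows (2012 and 2015) from the preview. The function name `add_derived_metrics` is hypothetical. The extra `lost_turnover_ratio` column is an assumed alternative that counts only fumbles the offense actually lost, and it is not part of the original analysis.

```python
import pandas as pd

# Columns needed for the derived metrics, copied from the ARI 2012 and 2015 rows
sample = pd.DataFrame({
    "season": [2012, 2015],
    "wins": [5, 13], "losses": [11, 3], "ties": [0, 0],
    "total_points": [268, 455],
    "yards_gained": [7595, 11337], "total_snaps": [1013, 1005],
    "fumble": [16, 16], "fumble_lost": [10, 9], "interception": [21, 13],
    "passing_yards": [3383, 4775], "pass_attempts": [586, 543],
    "rushing_yards": [1207, 1946], "rush_snaps_count": [348, 422],
})


def add_derived_metrics(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of frame with per-season derived metrics added."""
    out = frame.copy()  # avoid modifying the caller's DataFrame
    out["games_played"] = out["wins"] + out["losses"] + out["ties"]
    out["avg_points_per_game"] = out["total_points"] / out["games_played"]
    out["avg_yards_per_snap"] = out["yards_gained"] / out["total_snaps"]
    # Ratio as defined in the list: every fumble counts, recovered or not
    out["turnover_ratio"] = (out["fumble"] + out["interception"]) / out["total_snaps"]
    # Assumed alternative: count only fumbles the offense actually lost
    out["lost_turnover_ratio"] = (out["fumble_lost"] + out["interception"]) / out["total_snaps"]
    out["passing_efficiency"] = out["passing_yards"] / out["pass_attempts"]
    out["rushing_efficiency"] = out["rushing_yards"] / out["rush_snaps_count"]
    return out


metrics = add_derived_metrics(sample)
print(metrics[["season", "games_played", "avg_points_per_game",
               "turnover_ratio", "lost_turnover_ratio"]].round(4))
```

Returning a copy rather than assigning columns on `df` in place makes it easier to re-run notebook cells without stacking duplicate columns.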

```python
# Derived per-team-season metrics
df['games_played'] = df['wins'] + df['losses'] + df['ties']
df['avg_points_per_game'] = df['total_points'] / df['games_played']
df['avg_yards_per_snap'] = df['yards_gained'] / df['total_snaps']
# Note: 'fumble' counts all fumbles, including those the offense recovered
df['turnover_ratio'] = (df['fumble'] + df['interception']) / df['total_snaps']
df['passing_efficiency'] = df['passing_yards'] / df['pass_attempts']  # yards per pass attempt
df['rushing_efficiency'] = df['rushing_yards'] / df['rush_snaps_count']  # yards per rush snap

# Re-run the summary with the new columns included
summary_statistics = df.describe(include='all')
# DataFrame.map replaces the deprecated DataFrame.applymap (pandas >= 2.1)
summary_statistics = summary_statistics.map(lambda x: f'{x:g}' if isinstance(x, (int, float)) else x)

# Display the summary statistics
print("Summary Statistics:")
display(summary_statistics)
```

Summary Statistics:


Unnamed: 0,team,season,total_snaps,yards_gained,touchdown,extra_point_attempt,field_goal_attempt,total_points,td_points,xp_points,fg_points,fumble,fumble_lost,shotgun,no_huddle,qb_dropback,pass_snaps_count,pass_snaps_pct,pass_attempts,complete_pass,incomplete_pass,air_yards,passing_yards,pass_td,interception,targets,receptions,receiving_yards,yards_after_catch,receiving_td,pass_fumble,pass_fumble_lost,rush_snaps_count,rush_snaps_pct,qb_scramble,rushing_yards,run_td,run_fumble,run_fumble_lost,home_wins,home_losses,home_ties,away_wins,away_losses,away_ties,wins,losses,ties,win_pct,record,yps,games_played,avg_points_per_game,avg_yards_per_snap,turnover_ratio,passing_efficiency,rushing_efficiency
count,384,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384.0,384,384.0,384.0,384.0,384.0,384.0,384.0,384.0
unique,32,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,42,,,,,,,
top,ARI,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,7-9-0,,,,,,,
freq,12,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,37,,,,,,,
mean,,2017.5,1024.26,9670.45,41.651,36.474,26.5911,366.154,249.906,36.474,79.7734,16.9375,8.02865,642.086,102.25,627.474,602.628,0.588307,548.174,357.094,191.081,4593.27,4032.02,26.9531,13.5755,548.174,357.094,4030.8,1851.66,26.9531,10.0286,5.01562,421.633,0.411693,24.875,1859.2,13.9505,6.90885,3.01302,4.5,3.6224,0.03125,3.59115,4.53125,0.03125,8.09115,8.15365,0.0625,0.496836,,9.43904,16.3073,22.4713,9.43906,0.0298257,7.36373,4.38881
std,,3.45656,52.2525,1050.13,9.11448,9.61419,5.58177,64.7982,54.6869,9.61419,16.7453,4.63512,2.89624,146.949,102.215,58.0605,59.8663,0.0466698,59.2602,46.2316,26.6093,647.379,535.805,7.40319,4.49421,59.2602,46.2316,535.79,311.686,7.40319,3.38044,2.24708,49.8091,0.0466698,14.6382,348.128,5.11182,3.03496,1.78105,1.79497,1.79224,0.17422,1.7912,1.78741,0.17422,3.06578,3.07348,0.242377,0.18944,,0.884055,0.494723,4.02433,0.884089,0.00680455,0.664556,0.450092
min,,2012.0,866.0,6761.0,21.0,16.0,8.0,227.0,126.0,16.0,24.0,6.0,1.0,228.0,6.0,486.0,436.0,0.45,361.0,223.0,131.0,2898.0,2598.0,12.0,2.0,361.0,223.0,2589.0,1124.0,12.0,2.0,0.0,314.0,0.31,1.0,1168.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,,7.1,16.0,14.1875,7.09781,0.0121335,5.64348,3.20913
25%,,2014.75,989.0,8916.5,35.0,30.0,23.0,319.75,210.0,30.0,69.0,14.0,6.0,541.5,45.0,587.75,560.0,0.56,504.75,324.75,174.0,4116.25,3644.75,22.0,10.0,504.75,324.75,3643.5,1644.75,22.0,8.0,3.0,386.0,0.38,14.0,1616.5,10.0,5.0,2.0,3.0,2.0,0.0,2.0,3.0,0.0,6.0,6.0,0.0,0.375,,8.7875,16.0,19.5,8.78452,0.0252176,6.88198,4.08093
50%,,2017.5,1018.0,9669.5,40.0,35.0,27.0,361.0,240.0,35.0,81.0,17.0,8.0,638.0,74.5,630.0,607.0,0.59,551.5,357.0,191.0,4571.0,4020.0,26.0,13.0,551.5,357.0,4018.0,1841.0,26.0,10.0,5.0,415.0,0.41,22.0,1820.0,13.0,7.0,3.0,5.0,4.0,0.0,4.0,5.0,0.0,8.0,8.0,0.0,0.5,,9.385,16.0,22.0901,9.38335,0.0295207,7.29151,4.38043
75%,,2020.25,1060.25,10348.2,48.0,43.0,30.0,411.0,288.0,43.0,90.0,20.0,10.0,732.25,123.0,665.0,648.0,0.62,589.0,384.25,209.0,5021.25,4388.75,32.0,16.0,589.0,384.25,4388.75,2054.5,32.0,12.0,6.0,456.25,0.44,34.0,2062.5,17.0,9.0,4.0,6.0,5.0,0.0,5.0,6.0,0.0,10.25,10.0,0.0,0.6305,,10.02,17.0,25.261,10.0202,0.0339308,7.81161,4.67621
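One of the research questions asks which statistics move together. As a next step after the summary, this minimal sketch uses `DataFrame.corr` to rank numeric columns by their Pearson correlation with `win_pct`. The helper `correlations_with` is hypothetical. The five ARI rows are far too few for meaningful conclusions and are included only to show the mechanics. On the full `df`, `numeric_only=True` (available in pandas 1.5 and later) skips the text columns `team` and `record`.

```python
import pandas as pd

# Five ARI rows (2012-2016) from the df.head() preview: too few to draw conclusions,
# used only to illustrate the mechanics
sample = pd.DataFrame({
    "win_pct": [0.313, 0.625, 0.688, 0.813, 0.412],
    "total_points": [268, 349, 282, 455, 412],
    "yards_gained": [7595, 9855, 9128, 11337, 10302],
    "interception": [21, 22, 12, 13, 17],
})


def correlations_with(frame: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of each other numeric column with `target`, strongest first."""
    corr = frame.corr(numeric_only=True)[target].drop(target)
    # Sort by absolute value so strong negative relationships also rank high
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


ranked = correlations_with(sample, "win_pct")
print(ranked.round(3))
```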
