# Fantasy Football Predictions: A Weekly Start/Sit Model for Wide Receivers, Running Backs, and Tight Ends

**Garrett Bainwol**

## Introduction:

As an avid **Fantasy Football** enthusiast and a **data science student**, I found that combining the two realms was a natural progression. The objective of this project is to predict weekly starts and sits for Wide Receivers, Running Backs, and Tight Ends using a range of metrics, with a primary focus on the concept of "weighted opportunity" as outlined in [this article](https://www.fantasypoints.com/nfl/articles/season/2023/weighted-opportunity-for-rbs#/). We will be using R and Python for this project. I would have used all Python, however

## Problem Statement:

-   **Type of Learning**: Supervised Machine Learning
-   **Type of Task**: Regression (Predicting Fantasy Points)

## Goal:

**The primary goal is to outperform ESPN's weekly player projections by developing a more accurate prediction model. We aim to see if our model, which incorporates unique metrics like Weighted Opportunities Per Game and Snap Share, can provide superior predictions across three scoring formats: PPR, Half PPR, and TE Premium.**

**This can be seen as calculating a better Expected Fantasy Points Per Game.**
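"Outperforming" ESPN needs a concrete accuracy measure. The notebook does not name one, so mean absolute error (MAE) between projected and actual weekly points is an assumption here. It is one common choice. The sketch below compares a model's projections against a baseline for the same player-weeks, and it would be run once per scoring format. All numbers are illustrative, not real projections.

```python
from statistics import mean


def mean_absolute_error(projected, actual):
    """Average absolute gap between projected and actual fantasy points."""
    if not actual or len(projected) != len(actual):
        raise ValueError("projected and actual must be non-empty and the same length")
    return mean(abs(p - a) for p, a in zip(projected, actual))


def beats_baseline(model_proj, baseline_proj, actual):
    """True if the model's projections have a strictly lower MAE than the baseline's."""
    return mean_absolute_error(model_proj, actual) < mean_absolute_error(baseline_proj, actual)


# Hypothetical projections for three player-weeks (illustrative numbers, not real data)
actual = [14.2, 6.0, 21.5]
model = [12.0, 7.5, 19.0]
espn = [10.0, 9.0, 15.0]

print(round(mean_absolute_error(model, actual), 2))  # 2.07
print(round(mean_absolute_error(espn, actual), 2))   # 4.57
print(beats_baseline(model, espn, actual))           # True
```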

## Metrics of Interest:

-   Weighted Opportunities Per Game

-   Targets Per Game

-   Rush+Receiving Yards Per Game

-   Snap Share

-   Offensive snaps per game

-   Team Average for Passing Yards & Rushing Yards

-   Opponent Average for Allowed Passing Yards and Rushing Yards

-   Weighting for offensive line and quarterback injuries & performance

-   Opponent Offense (Time on the field), possibly captured by an offensive possession metric

-   Opponent Allowed Yards per Game Avg (Receiving and Rushing)

-   Target share percentage

-   Points Per Touch

-   Catch Rate (Receptions/Targets)

-   Expected Turnovers (Fumbles)

-   Home vs Away Player Performance
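Several of the metrics above are simple ratios of box-score totals. The sketch below computes a few of them from one player's season line. The `SeasonLine` fields are hypothetical names. "Touches = carries + receptions" and "target share = player targets / team targets" are stated assumptions about how these metrics would be defined.

```python
from dataclasses import dataclass


@dataclass
class SeasonLine:
    """Season totals for one player (field names are hypothetical)."""
    games: int             # games played; assumed > 0
    targets: int
    receptions: int
    carries: int
    rushing_yards: float
    receiving_yards: float
    fantasy_points: float
    team_targets: int      # all targets thrown by the player's team in those games


def metrics(s: SeasonLine) -> dict:
    touches = s.carries + s.receptions  # assumption: touches = carries + receptions
    return {
        "targets_per_game": s.targets / s.games,
        "scrimmage_yards_per_game": (s.rushing_yards + s.receiving_yards) / s.games,
        "target_share": s.targets / s.team_targets if s.team_targets else 0.0,
        "catch_rate": s.receptions / s.targets if s.targets else 0.0,
        "points_per_touch": s.fantasy_points / touches if touches else 0.0,
    }


line = SeasonLine(games=10, targets=80, receptions=60, carries=20,
                  rushing_yards=100.0, receiving_yards=700.0,
                  fantasy_points=180.0, team_targets=400)
print(metrics(line))
```

The guards return 0.0 instead of raising when a player has no targets or touches. This matters for backups with empty weeks.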

## ESPN Player Projection Metrics

-   Base Points Per Game
-   Utilization by Position
-   Talent of the Defensive Opponent
-   Likeliness of Success against Opponent
-   Length of Career (Age)
-   Season Historical Factors
-   History of Effectiveness
-   Independent Projections from Experts

## Hypothesis:

The central hypothesis is that "Weighted Opportunities" is a significant predictor of a player's success in Fantasy Football. By incorporating this metric, along with others, we aim to create a robust prediction model that can outperform traditional projections.
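A first, informal check of this hypothesis is to measure how strongly weighted opportunities track fantasy points. The sketch below computes Pearson's r and a least-squares line by hand. The data are hypothetical. A correlation alone is not a significance test or a full model; it only shows whether the relationship is worth modeling.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def least_squares(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


# Hypothetical weekly values for a handful of player-weeks (illustrative only)
weighted_opps = [5.0, 10.0, 15.0, 20.0]
fantasy_points = [4.0, 9.0, 13.0, 21.0]

r = pearson_r(weighted_opps, fantasy_points)
slope, intercept = least_squares(weighted_opps, fantasy_points)
print(f"r = {r:.3f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
# r = 0.989, slope = 1.10, intercept = -2.00
```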

## Further Readings on Fantasy Points

1.  [Highly correlated fantasy stats for RBs](https://www.fantasypoints.com/nfl/articles/2023/fantasy-points-data-most-important-rb-stats#/)

2.  [Top 10 stats for fantasy](https://www.fantasypoints.com/nfl/articles/2023/fantasy-points-data-top-10-stats#/)

3.  [Highly correlated fantasy stats for WRs](https://www.fantasypoints.com/nfl/articles/2023/fantasy-points-data-most-important-wr-stats#/)

4.  [Wins above replacement](https://www.fantasypoints.com/nfl/articles/2023/finding-fantasy-values-using-war#/)

## Data Set Creation and Data Cleaning

Given the specificity of the metrics, we will curate a custom dataset, pulling data from various sources and making sure all relevant metrics are captured. We will need to calculate weighted opportunity while also doing other data formatting and exploratory data analysis (EDA).

Note: see [how Weighted Opportunity is traditionally calculated for wide receivers](https://www.nbcsports.com/fantasy/football/news/article-numbers-why-receiver-air-yards-matter). We will use a different metric that still involves target share but does not involve air yards at all. For reference, the linked article describes **WOPR** as follows:

WOPR stands for Weighted Opportunity Rating. It takes a player's target share and share of team Air Yards and combines them in a way that best predicts both PPR and standard fantasy points. The formula for WOPR is:

$$\text{WOPR} = 1.5 \times \text{Target Share} + 0.7 \times \text{Share of Team Air Yards}$$

This opportunity rating combined with a player's PPR points can help us spot players who under-produced their volume. In other words, WOPR helps us spot buy-lows both in-season and between seasons.
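For reference, the WOPR formula above translates directly into code. Both shares are fractions of the team total, so 0.25 means 25%. The `player_stats` data loaded below already ships `target_share`, `air_yards_share`, and `wopr` columns. This sketch is only useful for understanding the formula or sanity-checking those columns, and the week shown is hypothetical.

```python
def share(player_total: float, team_total: float) -> float:
    """Player's fraction of a team total; 0.0 when the team total is zero."""
    return player_total / team_total if team_total else 0.0


def wopr(target_share: float, air_yards_share: float) -> float:
    """Weighted Opportunity Rating: 1.5 * target share + 0.7 * share of team air yards."""
    return 1.5 * target_share + 0.7 * air_yards_share


# Hypothetical week: 9 of the team's 36 targets and 120 of its 400 air yards
ts = share(9, 36)      # 0.25
ays = share(120, 400)  # 0.3
print(round(wopr(ts, ays), 3))  # 0.585
```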

Instead, we will use the weighted opportunity metric from the first link in this notebook, combined with other metrics, to predict fantasy production for each scoring format.

## Scoring by Format for Fantasy Football

### PPR (Point Per Reception)

| **Action**                    | **Points** |
|-------------------------------|------------|
| Rushing Yard                  | 0.1        |
| Rushing Touchdown             | 6          |
| Reception (RBs & WRs)         | 1          |
| Reception (TEs)               | 1          |
| Receiving Yard                | 0.1        |
| Receiving Touchdown           | 6          |

### Half PPR

| **Action**            | **Points** |
|-----------------------|------------|
| Rushing Yard          | 0.1        |
| Rushing Touchdown     | 6          |
| Reception (RBs & WRs) | 0.5        |
| Reception (TEs)       | 0.5        |
| Receiving Yard        | 0.1        |
| Receiving Touchdown   | 6          |

### TE Premium

| **Action**            | **Points** |
|-----------------------|------------|
| Rushing Yard          | 0.1        |
| Rushing Touchdown     | 6          |
| Reception (RBs & WRs) | 1          |
| Reception (TEs)       | 1.5        |
| Receiving Yard        | 0.1        |
| Receiving Touchdown   | 6          |
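The three scoring tables differ only in the value of a reception. Yards are worth 0.1 and touchdowns 6 in every format. The sketch below scores one stat line under each format. Like the tables, it ignores fumbles, two-point conversions, and passing stats. The `StatLine` type and the TE stat line are hypothetical.

```python
from dataclasses import dataclass

# Points per reception by format, taken from the scoring tables above
RECEPTION_POINTS = {
    "PPR":        {"rb_wr": 1.0, "te": 1.0},
    "Half PPR":   {"rb_wr": 0.5, "te": 0.5},
    "TE Premium": {"rb_wr": 1.0, "te": 1.5},
}
YARD_POINTS = 0.1  # rushing and receiving yards, every format
TD_POINTS = 6      # rushing and receiving touchdowns, every format


@dataclass
class StatLine:
    position: str  # "RB", "WR" or "TE"
    rushing_yards: float = 0
    rushing_tds: int = 0
    receptions: int = 0
    receiving_yards: float = 0
    receiving_tds: int = 0


def fantasy_points(line: StatLine, fmt: str) -> float:
    per_catch = RECEPTION_POINTS[fmt]["te" if line.position == "TE" else "rb_wr"]
    return (YARD_POINTS * (line.rushing_yards + line.receiving_yards)
            + TD_POINTS * (line.rushing_tds + line.receiving_tds)
            + per_catch * line.receptions)


te_line = StatLine("TE", receptions=6, receiving_yards=70, receiving_tds=1)
for fmt in RECEPTION_POINTS:
    print(fmt, round(fantasy_points(te_line, fmt), 1))
# PPR 19.0
# Half PPR 16.0
# TE Premium 22.0
```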

## Notable quote from link regarding the calculation of weighted opportunity:

**On average (over the past five seasons), a single rushing attempt has been worth about 0.62 fantasy points. A target has been worth roughly 1.57 fantasy points in PPR leagues. So, broadly speaking, a target is worth 2.53 times as much as a carry in PPR leagues. In half-point PPR leagues, targets are 1.90 times as valuable.**

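**Weighting the value of a TE target in TE Premium gives 2.07 (1.57 + 0.5), assuming PPR scoring for Running Backs and Wide Receivers. Because the 0.5-point premium is paid per reception rather than per target, adding the full 0.5 treats every TE target as a catch, so 2.07 is an upper bound.**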

| Scoring Format | Rushing Attempt Value | Target Value (RBs & WRs) | Target Value (TEs) | Target to Rushing Attempt Ratio (RBs & WRs) |
|---------------|---------------|---------------|---------------|---------------|
| PPR            | 0.62                  | 1.57                     | 1.57               | 2.53                                        |
| Half PPR       | 0.62                  | 1.18                     | 1.18               | 1.90                                        |
| TE Premium     | 0.62                  | 1.57                     | 2.07               | 2.53                                        |
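One plausible way to turn the per-opportunity values in this table into a single weighted opportunity number is to weight each carry and each target by its average fantasy value. The result is expressed in fantasy points. This formulation is an assumption and is not necessarily the linked article's exact definition. Dividing by games played gives Weighted Opportunities Per Game. The ratio helper reproduces the table's last column.

```python
# Average fantasy value per opportunity, from the table above
VALUES = {
    "PPR":        {"carry": 0.62, "target": 1.57, "te_target": 1.57},
    "Half PPR":   {"carry": 0.62, "target": 1.18, "te_target": 1.18},
    "TE Premium": {"carry": 0.62, "target": 1.57, "te_target": 2.07},
}


def target_value(fmt: str, position: str) -> float:
    v = VALUES[fmt]
    return v["te_target"] if position == "TE" else v["target"]


def weighted_opportunity(carries: int, targets: int, position: str, fmt: str) -> float:
    """Volume-based expected points: carries and targets weighted by average value."""
    return VALUES[fmt]["carry"] * carries + target_value(fmt, position) * targets


def target_to_carry_ratio(fmt: str, position: str = "RB") -> float:
    """How many carries one target is worth in a given format."""
    return target_value(fmt, position) / VALUES[fmt]["carry"]


print(round(target_to_carry_ratio("PPR"), 2))                # 2.53
print(round(target_to_carry_ratio("Half PPR"), 2))           # 1.9
print(round(weighted_opportunity(15, 4, "RB", "PPR"), 2))    # 15.58
```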

# Creating the Datasets

```r
# Install once, then load
install.packages("nflreadr")
install.packages("tidyverse")

library(nflreadr)
library(tidyverse)

years <- 2020:2023

# Fetch weekly player stats data for the specified years
player_stats <- load_player_stats(years)

head(player_stats)
colnames(player_stats)
View(player_stats)
summary(player_stats)
```

### `player_stats` Columns

| \#  | Column name             | \#  | Column name                   | \#  | Column name                     |
|------------|------------|------------|------------|------------|------------|
| 1   | **player_id**           | 18  | **sack_fumbles**              | 35  | **receptions**                  |
| 2   | **player_name**         | 19  | **sack_fumbles_lost**         | 36  | **targets**                     |
| 3   | **player_display_name** | 20  | **passing_air_yards**         | 37  | **receiving_yards**             |
| 4   | **position**            | 21  | **passing_yards_after_catch** | 38  | **receiving_tds**               |
| 5   | **position_group**      | 22  | **passing_first_downs**       | 39  | **receiving_fumbles**           |
| 6   | **headshot_url**        | 23  | **passing_epa**               | 40  | **receiving_fumbles_lost**      |
| 7   | **recent_team**         | 24  | **passing_2pt_conversions**   | 41  | **receiving_air_yards**         |
| 8   | **season**              | 25  | **pacr**                      | 42  | **receiving_yards_after_catch** |
| 9   | **week**                | 26  | **dakota**                    | 43  | **receiving_first_downs**       |
| 10  | **season_type**         | 27  | **carries**                   | 44  | **receiving_epa**               |
| 11  | **completions**         | 28  | **rushing_yards**             | 45  | **receiving_2pt_conversions**   |
| 12  | **attempts**            | 29  | **rushing_tds**               | 46  | **racr**                        |
| 13  | **passing_yards**       | 30  | **rushing_fumbles**           | 47  | **target_share**                |
| 14  | **passing_tds**         | 31  | **rushing_fumbles_lost**      | 48  | **air_yards_share**             |
| 15  | **interceptions**       | 32  | **rushing_first_downs**       | 49  | **wopr**                        |
| 16  | **sacks**               | 33  | **rushing_epa**               | 50  | **special_teams_tds**           |
| 17  | **sack_yards**          | 34  | **rushing_2pt_conversions**   | 51  | **fantasy_points**              |
|     |                         |     |                               | 52  | **fantasy_points_ppr**          |
|     |                         |     |                               | 53  | **opponent_team**               |

Before any cleaning, our initial dataset has 17,400 entries and 53 columns.

**Segmentation by Position**: Football positions, such as Wide Receivers, Running Backs, and Tight Ends, have unique metrics and patterns. To tailor our analysis and modeling effectively, we'll be splitting the main dataframe into smaller, position-specific dataframes. This segmentation allows for a more nuanced approach to data cleaning and feature engineering.

```r
player_stats2 <- player_stats %>%
  select(-player_name, -headshot_url, -position_group) %>%  # Remove player_name, headshot_url, and position_group
  rename(player_name = player_display_name) %>%              # Use the display name as player_name
  filter(season_type != "POST", position != "P")             # Drop postseason games and punters

# Remove the season_type column, since every remaining game is regular season
player_stats2 <- player_stats2 %>%
  select(-season_type)

# Split the cleaned dataset by position
qb_stats <- player_stats2 %>% filter(position == "QB")
rb_stats <- player_stats2 %>% filter(position == "RB")
wr_stats <- player_stats2 %>% filter(position == "WR")
te_stats <- player_stats2 %>% filter(position == "TE")

View(player_stats2)
View(rb_stats)
```

`cat("Number of rows in qb_stats:", nrow(qb_stats), "\n")`

**Number of rows in qb_stats: 2013**

`cat("Number of rows in rb_stats:", nrow(rb_stats), "\n")`

**Number of rows in rb_stats: 4265**

`cat("Number of rows in wr_stats:", nrow(wr_stats), "\n")`

**Number of rows in wr_stats: 6653**

`cat("Number of rows in te_stats:", nrow(te_stats), "\n")`

**Number of rows in te_stats: 3294**


# **Data Cleaning**

With segmented data, we can now tackle inconsistencies, missing values, outliers, and other potential issues. This step ensures our model receives the highest quality input, leading to more reliable predictions.

By following this systematic approach, we aim to convert our raw dataset into a refined resource, ready for modeling and analysis.

### Sorting Data for Active Players

[Source for Active Roster Data](https://www.footballdb.com/players/current.html)

The active roster data comes from www.footballdb.com, which publishes free public data. I pulled the active players by position and created a data frame with the names of the players who are currently active. This includes players on the roster who are suspended or injured. We have to remove retired players because many notable players at various positions stepped away from the game over the last three years.

# Count of active players

**There are currently:**

-   [98 active Quarterbacks]{.underline}

-   [296 active Wide Receivers]{.underline}

-   [178 active Running Backs]{.underline}

-   [156 active Tight Ends]{.underline}

These counts also include backups who don't produce much fantasy value. We want to filter these players out because their sample sizes are minimal.

###### (Again, this includes suspended players and injured players on the IR or PUP list.)
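Filtering out low-sample backups needs a cutoff, and the notebook does not give one. The sketch below keeps players with at least `min_games` distinct weekly rows. `min_games=4` is a hypothetical threshold, and the player names are placeholders. Counting distinct `(player_name, season, week)` tuples means an accidentally duplicated weekly row does not inflate a player's game count.

```python
from collections import Counter


def players_with_enough_games(rows, min_games=4):
    """Return names with at least `min_games` distinct weekly rows.

    `rows` holds (player_name, season, week) tuples; min_games=4 is a
    hypothetical cutoff, not a value from the notebook.
    """
    distinct_games = set(rows)  # duplicate weekly rows are counted once
    games_played = Counter(name for name, _season, _week in distinct_games)
    return {name for name, n in games_played.items() if n >= min_games}


rows = [("Starter A", 2023, week) for week in range(1, 9)]
rows += [("Backup B", 2023, 3), ("Backup B", 2023, 7), ("Backup B", 2023, 7)]
print(sorted(players_with_enough_games(rows)))  # ['Starter A']
```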

## Further Data Cleaning

Let's create data frames for active players (players who are not retired).

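```r
# Keep only currently active players by inner-joining each position table with
# its active-roster names (active_wr, active_te, active_rb and active_qb are data
# frames with a `Names` column built from the footballdb.com rosters).
# merge() matches names exactly, so spelling differences silently drop players.

# For Wide Receivers
activewr_stats <- merge(wr_stats, active_wr, by.x = "player_name", by.y = "Names")

# For Tight Ends
activete_stats <- merge(te_stats, active_te, by.x = "player_name", by.y = "Names")

# For Running Backs
activerb_stats <- merge(rb_stats, active_rb, by.x = "player_name", by.y = "Names")

# For Quarterbacks
activeqb_stats <- merge(qb_stats, active_qb, by.x = "player_name", by.y = "Names")

View(activeqb_stats)
```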
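An exact-string `merge()` on names is fragile. The name lists in this notebook mix styles such as "JK Dobbins" and "D.J. Moore", and different sources may punctuate initials, apostrophes, or suffixes like "Jr." differently. The sketch below joins on a normalized key instead. The normalization rules and the suffix list are assumptions, and dropping suffixes could merge two players who share a base name. A set lookup also cannot duplicate rows, whereas `merge()` returns a player's rows twice if that name appears twice in an active list.

```python
import re
import unicodedata

# Generational suffixes ignored when matching (an assumption; extend as needed)
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def normalize_name(name: str) -> str:
    """Loose join key: ASCII, lowercase, no periods or apostrophes, no suffix."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[.']", "", ascii_name.lower())
    tokens = [t for t in cleaned.replace(",", " ").split() if t not in SUFFIXES]
    return " ".join(tokens)


def filter_active(stat_rows, active_names):
    """Keep weekly stat rows whose player matches an active-roster name."""
    active = {normalize_name(n) for n in active_names}
    return [row for row in stat_rows if normalize_name(row["player_name"]) in active]


rows = [{"player_name": "DJ Moore", "week": 1}, {"player_name": "Retired Player", "week": 1}]
print(filter_active(rows, ["D.J. Moore", "D.J. Moore"]))
# [{'player_name': 'DJ Moore', 'week': 1}]
```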


Removing columns that are unnecessary for each position or outside the metrics we plan to use

```r
# Passing columns that are irrelevant for WRs, TEs, and RBs
cols_to_remove <- c("completions", "attempts", "passing_yards", "passing_tds", "interceptions",
                    "sacks", "sack_yards", "sack_fumbles", "sack_fumbles_lost",
                    "passing_air_yards", "passing_yards_after_catch", "passing_first_downs",
                    "passing_epa", "passing_2pt_conversions", "pacr", "dakota")

# any_of() skips names that are not present instead of warning like one_of()
activewr_stats <- activewr_stats %>% select(-any_of(cols_to_remove))
activete_stats <- activete_stats %>% select(-any_of(cols_to_remove))
activerb_stats <- activerb_stats %>% select(-any_of(cols_to_remove))

# Receiving columns that are irrelevant for QBs
cols_to_remove_qb <- c("receptions", "targets", "receiving_yards", "receiving_tds",
                       "receiving_fumbles", "receiving_fumbles_lost", "receiving_air_yards",
                       "receiving_yards_after_catch", "receiving_first_downs", "receiving_epa",
                       "receiving_2pt_conversions", "racr", "target_share", "air_yards_share")

activeqb_stats <- activeqb_stats %>% select(-any_of(cols_to_remove_qb))

# Rename recent_team to team in every position table
colnames(activewr_stats)[colnames(activewr_stats) == "recent_team"] <- "team"
colnames(activete_stats)[colnames(activete_stats) == "recent_team"] <- "team"
colnames(activerb_stats)[colnames(activerb_stats) == "recent_team"] <- "team"
colnames(activeqb_stats)[colnames(activeqb_stats) == "recent_team"] <- "team"

# Remove the position column, since each table holds a single position
activewr_stats$position <- NULL
activete_stats$position <- NULL
activerb_stats$position <- NULL
activeqb_stats$position <- NULL
```

**We notice that the opponent is missing for some earlier matchups. Because we want to use matchup data, we have to find those opponents and back-fill our data frames using Pro Football Reference's matchup data.**

[weekly match up data source](https://www.pro-football-reference.com/years/2023/games.html)

I saved the matchup tables for 2020 through 2023 as CSV files directly from Pro Football Reference, using the link above.

```r
# Set the working directory to the folder that holds the saved CSV files
setwd("replace with your file path")

PS    <- read.csv("player_stats2.csv")
M2023 <- read.csv("2023 Match Up Data.csv")
M2022 <- read.csv("2022 Matchup Data.csv")
M2021 <- read.csv("2021 Matchup data.csv")
M2020 <- read.csv("2020 Matchup data.csv")
qb    <- read.csv("qb_stats.csv")
te    <- read.csv("te_stats.csv")
wr    <- read.csv("wr_stats.csv")
rb    <- read.csv("rb_stats.csv")
aqb   <- read.csv("activeqb_stats.csv")
ate   <- read.csv("activete_stats.csv")
awr   <- read.csv("activewr_stats.csv")
arb   <- read.csv("activerb_stats.csv")
```

On further inspection, we notice that we don't need the player_id column.

```r
# Remove the unnecessary player_id column from every dataset
qb  <- qb[, colnames(qb) != "player_id"]
rb  <- rb[, colnames(rb) != "player_id"]
wr  <- wr[, colnames(wr) != "player_id"]
te  <- te[, colnames(te) != "player_id"]
aqb <- aqb[, colnames(aqb) != "player_id"]
arb <- arb[, colnames(arb) != "player_id"]
awr <- awr[, colnames(awr) != "player_id"]
ate <- ate[, colnames(ate) != "player_id"]
```


```r
# Drop receiving and special-teams columns from the full QB dataset
columns_to_remove <- c("receptions", "targets", "receiving_yards", "receiving_tds",
                       "receiving_fumbles", "receiving_fumbles_lost", "receiving_air_yards",
                       "receiving_yards_after_catch", "receiving_first_downs", "receiving_epa",
                       "receiving_2pt_conversions", "racr", "target_share", "air_yards_share",
                       "wopr", "special_teams_tds")
qb <- qb[, !(colnames(qb) %in% columns_to_remove)]

View(qb)
View(rb)
View(aqb)
View(arb)
View(ate)
View(awr)
View(te)
View(wr)
```


```r
# Remove playoff weeks from the matchup data
playoff_weeks <- c("WildCard", "Division", "ConfChamp", "SuperBowl")

M2020 <- subset(M2020, !(Week %in% playoff_weeks))
M2021 <- subset(M2021, !(Week %in% playoff_weeks))
M2022 <- subset(M2022, !(Week %in% playoff_weeks))
# Also filter 2023; this is a no-op if no playoff rows had been saved yet
M2023 <- subset(M2023, !(Week %in% playoff_weeks))
```

```r
# Map full team names (as listed by Pro Football Reference) to the
# abbreviations used in the player data
team_abbr <- list('Green Bay Packers'='GB', 'New York Jets'='NYJ', 'Dallas Cowboys'='DAL',
                  'Chicago Bears'='CHI', 'New Orleans Saints'='NO',
                  'Indianapolis Colts'='IND', 'New England Patriots'='NE',
                  'Cleveland Browns'='CLE', 'Carolina Panthers'='CAR',
                  'Los Angeles Rams'='LA', 'Tampa Bay Buccaneers'='TB',
                  'Cincinnati Bengals'='CIN', 'Denver Broncos'='DEN',
                  'San Francisco 49ers'='SF', 'Jacksonville Jaguars'='JAX',
                  'Houston Texans'='HOU', 'Buffalo Bills'='BUF',
                  'New York Giants'='NYG', 'Detroit Lions'='DET',
                  'Arizona Cardinals'='ARI', 'Las Vegas Raiders'='LV',
                  'Atlanta Falcons'='ATL', 'Los Angeles Chargers'='LAC',
                  'Philadelphia Eagles'='PHI', 'Seattle Seahawks'='SEA',
                  'Miami Dolphins'='MIA', 'Baltimore Ravens'='BAL',
                  'Pittsburgh Steelers'='PIT', 'Tennessee Titans'='TEN',
                  'Minnesota Vikings'='MIN', 'Washington Football Team'='WAS',
                  'Kansas City Chiefs'='KC')

# Convert the list to a named vector for easier mapping
team_abbr_vector <- unlist(team_abbr)

# Washington's name changed to the Commanders in 2022
team_abbr_vector["Washington Commanders"] <- "WAS"

# Replace full names with abbreviations; values not found in the map are kept as-is
to_abbr <- function(x) {
  ifelse(x %in% names(team_abbr_vector), team_abbr_vector[x], x)
}

# Format the 2020-2023 matchup data (Winner.tie and Loser.tie columns)
M2020$Winner.tie <- to_abbr(M2020$Winner.tie)
M2020$Loser.tie  <- to_abbr(M2020$Loser.tie)
M2021$Winner.tie <- to_abbr(M2021$Winner.tie)
M2021$Loser.tie  <- to_abbr(M2021$Loser.tie)
M2022$Winner.tie <- to_abbr(M2022$Winner.tie)
M2022$Loser.tie  <- to_abbr(M2022$Loser.tie)
M2023$Winner.tie <- to_abbr(M2023$Winner.tie)
M2023$Loser.tie  <- to_abbr(M2023$Loser.tie)
```



```r
# Turn each game (winner vs. loser) into two rows, one per team, with its opponent
build_matchups <- function(M) {
  winners <- M[, c("Week", "Winner.tie", "Loser.tie")]
  names(winners) <- c("week", "team", "opponent_team")

  losers <- M[, c("Week", "Loser.tie", "Winner.tie")]
  names(losers) <- c("week", "team", "opponent_team")

  combined <- rbind(winners, losers)
  # Week is read as text when playoff rows were present; convert it so that
  # week 10 sorts after week 2 and matches the integer week in the player data
  combined$week <- as.integer(combined$week)
  combined[order(combined$week), ]
}

combined_2020 <- build_matchups(M2020)
combined_2021 <- build_matchups(M2021)
combined_2022 <- build_matchups(M2022)
combined_2023 <- build_matchups(M2023)
```
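The reshaping step above can be sketched outside R as well. Each game record becomes one row per team, and weeks are converted to integers before sorting because sorting the raw strings would put "10" before "2". The game records below are hypothetical placeholders.

```python
def to_team_opponent_rows(games):
    """Expand (week, winner, loser) game records into one row per team."""
    rows = []
    for week, winner, loser in games:
        wk = int(week)  # numeric week so week 10 sorts after week 2
        rows.append({"week": wk, "team": winner, "opponent_team": loser})
        rows.append({"week": wk, "team": loser, "opponent_team": winner})
    rows.sort(key=lambda r: (r["week"], r["team"]))
    return rows


games = [("10", "KC", "LV"), ("2", "BUF", "MIA")]
for row in to_team_opponent_rows(games):
    print(row)
```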


Let's back-fill our datasets with the missing opponent_team values.

```r
# Matchup tables keyed by season, matching the values in the `season` column
matchups <- list("2020" = combined_2020, "2021" = combined_2021,
                 "2022" = combined_2022, "2023" = combined_2023)

# Fill missing opponent_team values from the season's matchup table,
# keeping any opponent that was already present
backfill_opponents <- function(df) {
  splits <- split(df, df$season)
  filled <- lapply(names(splits), function(s) {
    merged <- merge(splits[[s]], matchups[[s]], by = c("week", "team"), all.x = TRUE)
    merged$opponent_team <- ifelse(is.na(merged$opponent_team.x),
                                   merged$opponent_team.y, merged$opponent_team.x)
    subset(merged, select = -c(opponent_team.x, opponent_team.y))
  })
  do.call(rbind, filled)
}

qb_updated <- backfill_opponents(qb)
rb_updated <- backfill_opponents(rb)
te_updated <- backfill_opponents(te)
wr_updated <- backfill_opponents(wr)

# Replace every position dataset with its back-filled version
qb <- qb_updated
rb <- rb_updated
te <- te_updated
wr <- wr_updated

View(qb_updated)
```
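The back-fill rule above keeps an existing opponent and otherwise takes the schedule's value. It can be sketched as a dictionary lookup keyed by `(season, week, team)`. Unlike a merge, a lookup cannot multiply player rows when the schedule accidentally contains duplicate keys. This sketch treats an empty string as missing as well as `None`, which is an assumption about how blanks might arrive. The rows are illustrative.

```python
def backfill_opponents(player_rows, schedule_rows):
    """Fill a missing opponent_team from the schedule; keep existing values."""
    lookup = {(s["season"], s["week"], s["team"]): s["opponent_team"] for s in schedule_rows}
    filled = []
    for row in player_rows:
        new_row = dict(row)  # copy so the input rows are not modified
        if new_row.get("opponent_team") in (None, ""):
            key = (row["season"], row["week"], row["team"])
            new_row["opponent_team"] = lookup.get(key)  # stays None if the schedule lacks it
        filled.append(new_row)
    return filled


schedule = [
    {"season": 2020, "week": 1, "team": "KC", "opponent_team": "HOU"},
    {"season": 2020, "week": 1, "team": "HOU", "opponent_team": "KC"},
]
players = [
    {"player_name": "Player A", "season": 2020, "week": 1, "team": "KC", "opponent_team": None},
    {"player_name": "Player B", "season": 2020, "week": 1, "team": "HOU", "opponent_team": "KC"},
    {"player_name": "Player C", "season": 2020, "week": 2, "team": "KC", "opponent_team": None},
]
result = backfill_opponents(players, schedule)
print([r["opponent_team"] for r in result])  # ['HOU', 'KC', None]
```

Counting the rows still missing an opponent after the fill, as in Player C above, is a quick way to spot schedule gaps or team-abbreviation mismatches.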




rb_names  <-  c(
    "Israel Abanikanda", "Ameer Abdullah", "Devon Achane", "Salvon Ahmed",
    "Cam Akers", "Tyler Allgeier", "Alex Armah", "Tyler Badie",
    "Saquon Barkley", "Nick Bawden", "Andrew Beck", "Greg Bell",
    "Eno Benjamin", "Cartavious Bigsby", "Raheem Blackshear", "Khari Blasingame",
    "Brandon Bolden", "Mike Boone", "Matt Breida", "Gary Brightwell",
    "Christopher Brooks", "Brittain Brown", "Chase Brown", "Robert Burns",
    "Michael Burton", "Jason Cabinda", "Michael Carter", "Ty Chandler",
    "Zach Charbonnet", "Julius Chestnut", "Nick Chubb", "Corey Clement",
    "Tarik Cohen", "Jack Colletto", "James Conner", "Snoop Conner",
    "Dalvin Cook", "James Cook", "Jashaun Corbin", "DeeJay Dallas",
    "Malik Davis", "Tyrion Davis-Price", "Emari Demercado", "AJ Dillon",
    "Gerrid Doaks", "JK Dobbins", "Elijah Dotson", "Rico Dowdle",
    "Chase Edmonds", "Gus Edwards", "Clyde Edwards-Helaire", "Austin Ekeler",
    "Ezekiel Elliott", "Travis Etienne", "Chris Evans", "Darrynton Evans",
    "Zach Evans", "Demetric Felton", "Jerome Ford", "D'Onta Foreman",
    "Royce Freeman", "Jake Funk", "Kenny Gainwell", "Myles Gaskin",
    "Jahmyr Gibbs", "Antonio Gibson", "Reggie Gilliam", "Tyler Goodson",
    "Melvin Gordon", "Derrick Gore", "Alfonzo Graham", "Eric Gray",
    "Troy Hairston", "Breece Hall", "Hassan Hall", "C.J. Ham",
    "Damien Harris", "Kevin Harris", "Najee Harris", "Hassan Haskins",
    "JaMycal Hasty", "Derrick Henry", "Khalil Herbert", "Justice Hill",
    "Nyheim Hines", "Travis Homer", "Alexander Horvath", "Chuba Hubbard",
    "Evan Hull", "Godwin Igwebuike", "Alec Ingold", "Keaontay Ingram",
    "Deon Jackson", "Josh Jacobs", "D'Ernest Johnson", "Jakob Johnson",
    "Roschon Johnson", "Ty Johnson", "Aaron Jones", "Taiwan Jones",
    "Tony Jones", "Kyle Juszczyk", "Alvin Kamara", "Joshua Kelley",
    "Zonovan Knight", "Patrick Laird", "Hunter Luepke", "Marlon Mack",
    "Jordan Mason", "Alexander Mattison", "DeWayne McBride", "Christian McCaffrey",
    "Sincere McCormick", "Anthony McFarland", "Kenny McIntosh", "Jerick McKinnon",
    "Jaleel McLaughlin", "Kendre Miller", "Jordan Mims", "Elijah Mitchell",
    "Keaton Mitchell", "Joe Mixon", "David Montgomery", "Zack Moss",
    "Raheem Mostert", "Latavius Murray", "Chris Myarick", "Kene Nwangwu",
    "Dare Ogunbowale", "Qadree Ollison", "Isiah Pacheco", "Jacques Patrick",
    "Cordarrelle Patterson", "Jaret Patterson", "Henry Pearson", "Rashaad Penny",
    "Lamical Perine", "Samaje Perine", "Dameon Pierce", "Tony Pollard",
    "Adam Prentice", "Deneric Prince", "Craig Reynolds", "Patrick Ricard",
    "Ronnie Rivers", "Bijan Robinson", "Brian Robinson", "Christopher Rodriguez",
    "Miles Sanders", "Boston Scott", "Devin Singletary", "Keith Smith",
    "Tyjae Spears", "Isaiah Spiller", "Rhamondre Stevenson", "Pierre Strong",
    "D'Andre Swift", "Jonathan Taylor", "Patrick Taylor", "SaRodorick Thompson",
    "Sean Tucker", "Xazavian Valladay", "Deuce Vaughn", "Ke'Shawn Vaughn",
    "Kenneth Walker", "Austin Walter", "Jaylen Warren", "Dwayne Washington",
    "Rachaad White", "Zamir White", "Avery Williams", "Jamaal Williams",
    "Javonte Williams", "Kyren Williams", "Trayveon Williams", "Emanuel Wilson",
    "Jeffery Wilson", "Owen Wright"
)


wr_names <- c(
    "Davante Adams", "Jordan Addison", "Nelson Agholor", "Jamal Agnew", "Brandon Aiyuk",
    "Maurice Alexander", "Josh Ali", "Devon Allen", "Kazmeir Allen", "Keenan Allen",
    "Robby Anderson", "Daniel Arias", "Marcell Ateman", "Chatarius Atwell", "Calvin Austin",
    "Andre Baccellia", "Michael Bandy", "Rashod Bateman", "Cole Beasley", "Odell Beckham Jr.",
    "David Bell", "Ronnie Bell", "Braxton Berrios", "Jake Bobo", "Kendrick Bourne",
    "Kayshon Boutte", "Lynn Bowden", "Tyler Boyd", "Miles Boykin", "Jalen Brooks",
    "A.J. Brown", "Dyami Brown", "Marquise Brown", "Noah Brown", "Jason Brownlee",
    "Treylon Burks", "Terrell Bynum", "Marquez Callaway", "Parris Campbell", "DeAndre Carter",
    "D.J. Chark", "Ja'Marr Chase", "Chase Claypool", "Randall Cobb", "Keelan Cole",
    "Nico Collins", "Chris Conley", "Brandin Cooks", "Elijah Cooks", "Amari Cooper",
    "Jacob Copeland", "Britain Covey", "River Cracraft", "Jalen Cropper", "Jamison Crowder",
    "Jaelon Darden", "Derius Davis", "Gabriel Davis", "Kaden Davis", "Shaquan Davis",
    "Nathaniel Dell", "Stefon Diggs", "Phillip Dorsett", "Greg Dortch", "Keelan Doss",
    "Jahan Dotson", "Romeo Doubs", "Demario Douglas", "Colton Dowell", "Josh Downs",
    "Dylan Drummond", "Grant DuBose", "Ashton Dulin", "David Durden", "Devin Duvernay",
    "Alex Erickson", "D'Wayne Eskridge", "Mike Evans", "Erik Ezukanma", "Simi Fehoko",
    "Dez Fitzpatrick", "Zay Flowers", "Bryce Ford-Wheaton", "Daurice Fountain", "Russell Gage",
    "Michael Gallup", "Xavier Gipson", "Chris Godwin", "Marquise Goodwin", "Jakeem Grant",
    "Danny Gray", "Antoine Green", "Jalen Guyton", "Mecole Hardman", "Tre'shaun Harrison",
    "N'Keal Harry", "Penny Hart", "Deonte Harty", "Malik Heath", "Ra'shaun Henry",
    "Tee Higgins", "Tyreek Hill", "Khadarel Hodge", "Isaiah Hodgins", "Mack Hollins",
    "DeAndre Hopkins", "Dennis Houston", "Lil'Jordan Humphrey", "Xavier Hutchinson", "Jalin Hyatt",
    "Andrei Iosivas", "Trenton Irwin", "Andy Isabella", "Kearis Jackson", "Lucky Jackson",
    "Shedrick Jackson", "Trishton Jackson", "Richie James", "Rakim Jarrett", "Justin Jefferson",
    "Van Jefferson", "Jauan Jennings", "Jerry Jeudy", "Brandon Johnson", "Cade Johnson",
    "Cephus Johnson", "Diontae Johnson", "Johnny Johnson", "Tyler Johnson", "Tyron Johnson",
    "Quentin Johnston", "Charlie Jones", "Marvin Jones", "Tim Jones", "Velus Jones",
    "Zay Jones", "Mason Kinsey", "Christian Kirk", "Keith Kirkwood", "Malik Knowles",
    "Cooper Kupp", "CeeDee Lamb", "Matt Landers", "Kwamie Lassiter", "Allen Lazard",
    "Tyler Lockett", "Drake London", "T.J. Luther", "Xavier Malone", "Terrace Marshall",
    "Tay Martin", "Jesse Matthews", "Ray Ray McCloud", "Lance McCutcheon", "Isaiah McKenzie",
    "Terry McLaurin", "Racey McMath", "Bo Melton", "Kirk Merritt", "D.K. Metcalf",
    "John Metchie", "Jakobi Meyers", "Ryan Miller", "Scott Miller", "Dax Milne",
    "Marvin Mims", "Jonathan Mingo", "D.J. Montgomery", "Ty Montgomery", "Darnell Mooney",
    "Chris Moore", "D.J. Moore", "David Moore", "Elijah Moore", "Jaylon Moore",
    "Rondale Moore", "Skyy Moore", "Stanley Morgan", "Puka Nacua", "Jalen Nailor",
    "Joseph Ngata", "Tre Nixon", "Chris Olave", "Gunner Olszewski", "K.J. Osborn",
    "Josh Palmer", "Trey Palmer", "DeVante Parker", "Zach Pascal", "Tim Patrick",
    "Donovan Peoples-Jones", "A.T. Perry", "Kyle Philips", "George Pickens", "Alec Pierce",
    "Michael Pittman", "Brandon Powell", "Cornell Powell", "Byron Pringle", "Kalif Raymond",
    "Jalen Reagor", "Jayden Reed", "Joe Reed", "Nikko Remigio", "Hunter Renfrow",
    "Josh Reynolds", "Rashee Rice", "Calvin Ridley", "Allen Robinson", "Demarcus Robinson",
    "Wan'Dale Robinson", "Amari Rodgers", "Justyn Ross", "Sean Ryan", "Curtis Samuel",
    "Deebo Samuel", "Braylon Sanders", "C.J. Saunders", "Anthony Schwartz", "Tyler Scott",
    "Mathew Sexton", "Rashid Shaheed", "Khalil Shakir", "Tyrell Shavers", "Laviska Shenault",
    "Sterling Shepard", "Trent Sherfield", "Justin Shorter", "David Sills", "Cam Sims",
    "Steven Sims", "Bennett Skowronek", "Matt Slater", "Darius Slayton", "DeVonta Smith",
    "Tre'Quan Smith", "Xavier Smith", "Ihmir Smith-Marsette", "Jaxon Smith-Njigba",
    "JuJu Smith-Schuster", "Willie Snead", "Amon-Ra St. Brown", "Equanimeous St. Brown",
    "John Stephens", "Mike Strachan", "Courtland Sutton", "Malik Taylor", "Trent Taylor",
    "Adam Thielen", "Michael Thomas", "Thayer Thomas", "Deven Thompkins", "Bryan Thompson",
    "Cody Thompson", "Tyquan Thornton", "Cedric Tillman", "Mitchell Tinsley", "Jalen Tolbert",
    "Kadarius Toney", "Samori Toure", "Austin Trammell", "Laquon Treadwell", "Brycen Tremayne",
    "Tre Tucker", "KaVontae Turpin", "Marquez Valdes-Scantling", "Jalen Virgil", "Jaylen Waddle",
    "Tylan Wallace", "Greg Ward", "Montrell Washington", "Parker Washington", "Austin Watkins",
    "Quez Watkins", "Christian Watson", "Justin Watson", "Jared Wayne", "Raleigh Webb",
    "Nsimba Webster", "Nick Westbrook", "Dontayvion Wicks", "Kristian Wilkerson", "Jameson Williams",
    "Mike Williams", "Seth Williams", "Cedrick Wilson", "Garrett Wilson", "Michael Wilson",
    "Juwann Winfree", "Isaiah Winstead", "Easop Winston", "Michael Woods", "Robert Woods",
    "Derek Wright", "Dareke Young", "Olamide Zaccheaus"
)



te_names <- c("Nate Adkins", "Jordan Akins", "Mo Alie-Cox", "Davis Allen",
"Mark Andrews", "John Bates", "Blake Bell", "Daniel Bellinger",
"Nick Bowers", "Pharaoh Brown", "Harrison Bryant", "Matt Bushman",
"Lawrence Cager", "Grant Calcaterra", "Stephen Carlson", "Tyler Conklin",
"Tanner Conner", "Darrell Daniels", "Zach Davidson", "Tyler Davis",
"Josiah Deguara", "Brandon Dillon", "Will Dissly", "Greg Dulcich",
"Payne Durham", "Ross Dwelley", "Evan Engram", "Zach Ertz", "Gerald Everett",
"Noah Fant", "Princeton Fant", "Luke Farrell", "Jake Ferguson", "Tucker Fisk",
"John FitzPatrick", "Miller Forristall", "Joe Fortson", "Cole Fotheringham",
"Feleipe Franks", "Pat Freiermuth", "Troy Fumagalli", "Zach Gentry",
"Mike Gesicki", "Dallas Goedert", "Jimmy Graham", "Kylen Granson",
"Noah Gray", "Jacob Harris", "Peyton Hendershot", "Hunter Henry",
"Parker Hesse", "Connor Heyward", "Austin Hooper", "Brycen Hopkins",
"T.J. Hockenson", "Curtis Hodges", "Julian Hill", "George Kittle",
"Cole Kmet", "Dawson Knox", "Charlie Kolar", "Tucker Kraft",
"Tyler Kroft", "Lucas Krull", "Zack Kuntz", "Sam LaPorta",
"Cameron Latu", "Marcedes Lewis", "Isaiah Likely", "Hunter Long",
"Tyler Mabry", "Will Mallory", "Chris Manhertz", "Ben Mason",
"Michael Mayer", "Trey McBride", "Sean McKeon", "Tre' McKitty",
"James Mitchell", "Zaire Mitchell-Paden", "Foster Moreau", "Quintin Morris",
"John Mundt", "Jordan Murray", "Nick Muse", "Luke Musgrave",
"David Njoku", "Thomas Odukoya", "Andrew Ogletree", "Chigoziem Okonkwo",
"Albert Okwuegbunam", "Josh Oliver", "Cade Otton", "Donald Parham",
"Derek Parish", "Colby Parkinson", "Josh Pederson", "Kyle Pitts",
"Gerrit Prince", "MyCole Pruitt", "Teagan Quitoriano", "Kevin Rader",
"Giovanni Ricci", "Armani Rogers", "Jeremy Ruckert", "Brady Russell",
"Drew Sample", "Luke Schoonmaker", "Dalton Schultz", "Bernhard Seikovits",
"John Samuel Shenker", "Ben Sims", "Stone Smartt", "Irv Smith",
"Jonnu Smith", "Kaden Smith", "Durham Smythe", "Matt Sokol",
"Jack Stoll", "Brenton Strange", "Stephen Sullivan", "Geoff Swaim",
"Tommy Sweeney", "Tanner Taula", "Leonard Taylor", "Ian Thomas",
"Jordan Thomas", "Logan Thomas", "Robert Tonyan", "Adam Trautman",
"Tommy Tremble", "Cole Turner", "C.J. Uzomah", "Nick Vannett",
"Travis Vokolek", "Darren Waller", "Darnell Washington", "Leroy Watson",
"David Wells", "Trevon Wesco", "Blake Whiteheart", "Josh Whyle",
"Mitchell Wilcox", "Rodney Williams", "Brayden Willis", "Joel Wilson",
"Charlie Woerner", "Jelani Woods", "Brock Wright", "Kenny Yeboah",
"Shane Zylstra")



qb_names <- c("Brandon Allen", "Josh Allen", "Kyle Allen", "Tyson Bagent",
"C.J. Beathard", "Stetson Bennett", "David Blough", "Tim Boyle",
"Teddy Bridgewater", "Jacoby Brissett", "Jake Browning", "Shane Buechele",
"Joe Burrow", "Derek Carr", "Sean Clifford", "Kirk Cousins",
"Malik Cunningham", "Andy Dalton", "Sam Darnold", "Tommy DeVito",
"Ben DiNucci", "Joshua Dobbs", "Jeff Driskel", "Max Duggan",
"Sam Ehlinger", "Justin Fields", "Jake Fromm", "Blaine Gabbert",
"Jimmy Garoppolo", "Jared Goff", "Will Grier", "Jake Haener",
"Jaren Hall", "Taylor Heinicke", "Justin Herbert", "Taysom Hill",
"Hendon Hooker", "Sam Howell", "Brian Hoyer", "Tyler Huntley",
"Jalen Hurts", "Lamar Jackson", "Josh Johnson", "Daniel Jones",
"Mac Jones", "Case Keenum", "Trey Lance", "Trevor Lawrence",
"Will Levis", "Drew Lock", "Jordan Love", "Patrick Mahomes",
"Marcus Mariota", "Baker Mayfield", "Alex McGough", "Tanner McKee",
"Davis Mills", "Gardner Minshew", "Nick Mullens", "Kyler Murray",
"Aidan O'Connell", "Chris Oladokun", "Nathan Peterman", "Kenny Pickett",
"Dak Prescott", "Brock Purdy", "Anthony Richardson", "Desmond Ridder",
"Aaron Rodgers", "Nathan Rourke", "Mason Rudolph", "Cooper Rush",
"Brett Rypien", "Geno Smith", "Matthew Stafford", "Easton Stick",
"Jarrett Stidham", "C.J. Stroud", "Nate Sudfeld", "Tua Tagovailoa",
"Ryan Tannehill", "Tyrod Taylor", "Skylar Thompson", "Dorian Thompson-Robinson",
"Kyle Trask", "Mitchell Trubisky", "Clayton Tune", "Phillip Walker",
"Deshaun Watson", "Mike White", "Malik Willis", "Russell Wilson",
"Zach Wilson", "Jameis Winston", "John Wolford", "Logan Woodside",
"Bryce Young", "Bailey Zappe")

active_rb <- data.frame(Names = rb_names)
active_wr <- data.frame(Names = wr_names)
active_te <- data.frame(Names = te_names)
active_qb <- data.frame(Names = qb_names)

View(active_rb)
View(active_wr)
View(active_te)
View(active_qb)
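Because these roster vectors are typed by hand, duplicate entries slip in easily. A minimal base-R check, using a hypothetical helper `find_duplicate_names()` (not part of the original notebook), lists every name that occurs more than once so it can be removed before the joins:

```r
# Hypothetical helper: return each name that appears more than once in a character vector
find_duplicate_names <- function(names) {
  # duplicated() flags the second and later occurrences; unique() collapses repeats
  unique(names[duplicated(names)])
}

# Example on a small vector; in the notebook you would pass te_names, wr_names, and so on
find_duplicate_names(c("Austin Hooper", "Cole Kmet", "Austin Hooper", "T.J. Hockenson"))
# returns "Austin Hooper"
```

An empty result (`character(0)`) means the vector has no repeats.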

In [None]:
library(dplyr)

# Filter for active QBs
aqb <- semi_join(qb_updated, active_qb, by = c("player_name" = "Names"))

# Filter for active RBs
arb <- semi_join(rb_updated, active_rb, by = c("player_name" = "Names"))

# Filter for active WRs
awr <- semi_join(wr_updated, active_wr, by = c("player_name" = "Names"))

# Filter for active TEs
ate <- semi_join(te_updated, active_te, by = c("player_name" = "Names"))
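`semi_join()` keeps a row only when `player_name` matches a roster name exactly, so a spelling difference (for example "D.K. Metcalf" versus "DK Metcalf") silently drops that player. A minimal base-R check, using a hypothetical helper `unmatched_names()` and made-up data, lists the roster names that found no match:

```r
# Hypothetical helper: roster names with no exact match in the player data.
# semi_join() keeps only exact matches, so these players would be silently dropped.
unmatched_names <- function(roster, player_names) {
  setdiff(roster, unique(player_names))
}

# Toy example (made-up data): punctuation differences prevent a match
roster <- c("D.K. Metcalf", "Puka Nacua")
player_names <- c("DK Metcalf", "Puka Nacua", "Puka Nacua")
unmatched_names(roster, player_names)
# returns "D.K. Metcalf"
```

In the notebook this would be called as `unmatched_names(active_wr$Names, wr_updated$player_name)`, assuming `wr_updated` has the `player_name` column used in the join above.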


In [None]:
# Split PS by season
PS_splits <- split(PS, PS$season)

# Merge opponent_team for each season
PS_splits$'2020' <- merge(PS_splits$'2020', combined_2020, by=c("week", "team"), all.x=TRUE)
PS_splits$'2021' <- merge(PS_splits$'2021', combined_2021, by=c("week", "team"), all.x=TRUE)
PS_splits$'2022' <- merge(PS_splits$'2022', combined_2022, by=c("week", "team"), all.x=TRUE)
PS_splits$'2023' <- merge(PS_splits$'2023', combined_2023, by=c("week", "team"), all.x=TRUE)

# Combine all seasons back
PS_updated <- do.call(rbind, PS_splits)

# Sort the dataset in chronological order
PS_updated <- PS_updated[order(PS_updated$season, PS_updated$week), ]
PS<-PS_updated

View(PS_updated)
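The split, per-season merge, and `rbind()` above can also be done as one left join, assuming each `combined_20XX` table is first tagged with a `season` column (for example with `transform(combined_2020, season = 2020)`) and the four are stacked with `rbind()`. The sketch below uses made-up toy data; only the column names `season`, `week`, `team`, and `opponent_team` mirror the notebook:

```r
# Toy data (made up) standing in for PS and the stacked combined_20XX tables
ps_toy <- data.frame(
  season = c(2020, 2020, 2021),
  week = c(1, 2, 1),
  team = c("BUF", "BUF", "BUF"),
  player_name = c("Player A", "Player A", "Player A")
)
schedule_toy <- data.frame(
  season = c(2021, 2020, 2020),
  week = c(1, 2, 1),
  team = c("BUF", "BUF", "BUF"),
  opponent_team = c("PIT", "MIA", "NYJ")
)

# One left join keyed on season, week, and team replaces the split / merge / rbind steps;
# all.x = TRUE keeps player rows even when no schedule row matches
ps_joined <- merge(ps_toy, schedule_toy, by = c("season", "week", "team"), all.x = TRUE)

# Sort chronologically, as the notebook does
ps_joined <- ps_joined[order(ps_joined$season, ps_joined$week), ]
ps_joined$opponent_team
# returns c("NYJ", "MIA", "PIT")
```

Keying on `season` means a week-1 row from 2020 can never pick up a 2021 opponent, which is what the per-season split guarded against.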

Now that our player data is formatted, let's backfill the defensive stats and rankings of opposing teams.

I want to pull all the defensive data into a separate dataset. Again, I will be using data from footballdb.com.

In [None]:
Team <- c("Los Angeles Rams", "Washington Football Team", "Pittsburgh Steelers", "New Orleans Saints", "San Francisco 49ers", 
          "Tampa Bay Buccaneers", "Baltimore Ravens", "Indianapolis Colts", "Green Bay Packers", "Los Angeles Chargers",
          "Chicago Bears", "New York Giants", "Arizona Cardinals", "Buffalo Bills", "New England Patriots",  
          "Kansas City Chiefs", "Cleveland Browns", "Carolina Panthers", "Philadelphia Eagles", "Miami Dolphins",
          "Denver Broncos", "Seattle Seahawks", "Dallas Cowboys", "New York Jets", "Las Vegas Raiders",
          "Cincinnati Bengals", "Minnesota Vikings", "Tennessee Titans", "Atlanta Falcons", "Houston Texans",
          "Jacksonville Jaguars", "Detroit Lions")

Gms <- rep(17, 32)  # every team played 17 games in 2021


TotPts <- c(289, 404, 365, 303, 371, 407, 335, 322, 371, 385, 366, 354, 353, 439, 373, 365, 372, 376, 358, 457, 416, 434, 459, 398, 392, 459, 364, 366, 467, 426, 452, 504)

PtsG <- c(17.0, 23.8, 21.5, 17.8, 21.8, 23.9, 19.7, 18.9, 21.8, 22.6, 21.5, 20.8, 20.8, 25.8, 21.9, 21.5, 21.9, 22.1, 21.1, 26.9, 24.5, 25.5, 27.0, 23.4, 23.1, 27.0, 21.4, 21.5, 27.5, 25.1, 26.6, 29.6)

RushYds <- c(1866, 1935, 1760, 2103, 1857, 2127, 1589, 1892, 1855, 1834, 1952, 1438, 1573, 1943, 1867, 1854, 1754, 1742, 1918, 2127, 2193, 1775, 2361, 2483, 1436, 2242, 1999, 1932, 2296, 2222, 2418, 2351)

RYdsG <- c(109.8, 113.8, 103.5, 123.7, 109.2, 125.1, 93.5, 111.3, 109.1, 107.9, 114.8, 84.6, 92.5, 114.3, 109.8, 109.1, 103.2, 102.5, 112.8, 125.1, 129.0, 104.4, 138.9, 146.1, 84.5, 131.9, 117.6, 113.6, 135.1, 130.7, 142.2, 138.3) 

PassYds <- c(2771, 3266, 3510, 3181, 3439, 3257, 3821, 3652, 3724, 3756, 3645, 4169, 4062, 3789, 3871, 3980, 4109, 4222, 4049, 3875, 3839, 4333, 3761, 3656, 4742, 3952, 4273, 4513, 4160, 4300, 4117, 4409)

PYdsG <- c(163.0, 192.1, 206.5, 187.1, 202.3, 191.6, 224.8, 214.8, 219.1, 220.9, 214.4, 245.2, 238.9, 222.9, 227.7, 234.1, 241.7, 248.4, 238.2, 227.9, 225.8, 254.9, 221.2, 215.1, 278.9, 232.5, 251.4, 265.5, 244.7, 252.9, 242.2, 259.4)

TotYds <- c(4637, 5201, 5270, 5284, 5296, 5384, 5410, 5544, 5579, 5590, 5597, 5607, 5635, 5732, 5738, 5834, 5863, 5964, 5967, 6002, 6032, 6108, 6122, 6139, 6178, 6194, 6272, 6445, 6456, 6522, 6535, 6760)

YdsG <- c(272.8, 305.9, 310.0, 310.8, 311.5, 316.7, 318.2, 326.1, 328.2, 328.8, 329.2, 329.8, 331.5, 337.2, 337.5, 343.2, 344.9, 350.8, 351.0, 353.1, 354.8, 359.3, 360.1, 361.1, 363.4, 364.4, 368.9, 379.1, 379.8, 383.6, 384.4, 397.6)

# Create dataframe
NFL_2021 <- data.frame(Team, Gms, TotPts, PtsG, RushYds, RYdsG, PassYds, PYdsG, TotYds, YdsG)
View(NFL_2021)

In [None]:
# Create vectors for each stat
Team <- c("San Francisco 49ers", "Philadelphia Eagles", "Washington Commanders", "New York Jets", "New Orleans Saints", "Buffalo Bills", "Denver Broncos", "New England Patriots", "Baltimore Ravens", "Tampa Bay Buccaneers", "Kansas City Chiefs", "Dallas Cowboys", "Pittsburgh Steelers", "Cleveland Browns", "Indianapolis Colts", "Cincinnati Bengals", "Green Bay Packers", "Miami Dolphins", "Los Angeles Rams", "Los Angeles Chargers", "Arizona Cardinals", "Carolina Panthers", "Tennessee Titans", "Jacksonville Jaguars", "New York Giants", "Seattle Seahawks", "Atlanta Falcons", "Las Vegas Raiders", "Chicago Bears", "Houston Texans", "Minnesota Vikings", "Detroit Lions")

Gms <- c(17, 17, 17, 17, 17, 16, 17, 17, 17, 17, 17, 17, 17, 17, 17, 16, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17)

TotPts <- c(277, 344, 343, 316, 345, 286, 359, 347, 315, 358, 369, 342, 346, 381, 427, 322, 371, 399, 384, 384, 449, 374, 359, 350, 371, 401, 386, 418, 463, 420, 427, 427)

PtsG <- c(16.3, 20.2, 20.2, 18.6, 20.3, 17.9, 21.1, 20.4, 18.5, 21.1, 21.7, 20.1, 20.4, 22.4, 25.1, 20.1, 21.8, 23.5, 22.6, 22.6, 26.4, 22.0, 21.1, 20.6, 21.8, 23.6, 22.7, 24.6, 27.2, 24.7, 25.1, 25.1)

RushYds <- c(1321, 2068, 1926, 2068, 2218, 1673, 1866, 1793, 1566, 2052, 1823, 2198, 1838, 2295, 2109, 1706, 2372, 1751, 1956, 2478, 2016, 2085, 1307, 1951, 2455, 2554, 2214, 2087, 2674, 2894, 2093, 2491)

RYdsG <- c(77.7, 121.6, 113.3, 121.6, 130.5, 104.6, 109.8, 105.5, 92.1, 120.7, 107.2, 129.3, 108.1, 135.0, 124.1, 106.6, 139.5, 103.0, 115.1, 145.8, 118.6, 122.6, 76.9, 114.8, 144.4, 150.2, 130.2, 122.8, 157.4, 170.2, 123.1, 146.5)

PassYds <- c(3789, 3057, 3252, 3220, 3134, 3433, 3574, 3681, 3947, 3461, 3756, 3415, 3779, 3336, 3569, 3665, 3349, 3992, 3842, 3406, 3915, 3868, 4671, 4055, 3638, 3595, 3942, 4129, 3716, 3558, 4151, 4179) 

PYdsG <- c(222.9, 179.8, 191.3, 189.4, 184.4, 214.6, 210.2, 216.5, 232.2, 203.6, 220.9, 200.9, 222.3, 196.2, 209.9, 229.1, 197.0, 234.8, 226.0, 200.4, 230.3, 227.5, 274.8, 238.5, 214.0, 211.5, 231.9, 242.9, 218.6, 209.3, 244.2, 245.8)

TotYds <- c(5110, 5125, 5178, 5288, 5352, 5106, 5440, 5474, 5513, 5513, 5579, 5613, 5617, 5631, 5678, 5371, 5721, 5743, 5798, 5884, 5931, 5953, 5978, 6006, 6093, 6149, 6156, 6216, 6390, 6452, 6244, 6670)

YdsG <- c(300.6, 301.5, 304.6, 311.1, 314.8, 319.1, 320.0, 322.0, 324.3, 324.3, 328.2, 330.2, 330.4, 331.2, 334.0, 335.7, 336.5, 337.8, 341.1, 346.1, 348.9, 350.2, 351.6, 353.3, 358.2, 361.7, 362.1, 365.6, 375.9, 379.5, 367.8, 392.4)

# Create dataframe
NFL_2022 <- data.frame(Team, Gms, TotPts, PtsG, RushYds, RYdsG, PassYds, PYdsG, TotYds, YdsG)

View(NFL_2022)

In [None]:
Team <- c("San Francisco 49ers", "Philadelphia Eagles", "Washington Commanders", "New York Jets", "New Orleans Saints", "Buffalo Bills", "Denver Broncos", "New England Patriots", "Baltimore Ravens", "Tampa Bay Buccaneers", "Kansas City Chiefs", "Dallas Cowboys", "Pittsburgh Steelers", "Cleveland Browns", "Indianapolis Colts", "Cincinnati Bengals", "Green Bay Packers", "Miami Dolphins", "Los Angeles Rams", "Los Angeles Chargers", "Arizona Cardinals", "Carolina Panthers", "Tennessee Titans", "Jacksonville Jaguars", "New York Giants", "Seattle Seahawks", "Atlanta Falcons", "Las Vegas Raiders", "Chicago Bears", "Houston Texans", "Minnesota Vikings", "Detroit Lions")


# NOTE: this Team order is identical to the 2022 cell; confirm it matches the 2023 source table.
Gms <- c(2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2)

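# NOTE: these TotPts values do not equal PtsG * Gms (e.g. 5 vs. 17 * 2 for the 49ers);
# check this column against the source before using it.
TotPts <- c(5, 9, 7, 14, 6, 6, 3, 6.5, 22, 7, 4, 8, 5, 4, 6, 7, 9, 5, 0, 3, 8, 7, 5, 2, 5.5, 4, 2.5, 5.5, 2, 6, 0.5, 1.5)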

PtsG <- c(17, 14.5, 17, 14, 16, 16, 21.5, 16.5, 22, 17, 24.5, 28, 15, 24, 26, 27, 19, 25.5, 20, 23, 28, 27, 25, 32.7, 25.5, 24, 22.5, 25.5, 32.5, 26, 30.5, 31.5)

RushYds <- c(72, 130, 238, 159, 204, 227, 244, 138, 264, 108, 218, 236, 192, 242, 183, 332, 166, 219, 130, 231, 172, 277, 321, 414, 157, 104, 333, 384, 212, 386, 194, 211)

RYdsG <- c(86, 65, 119, 53, 102, 113.5, 122, 69, 132, 54, 109, 118, 96, 121, 91, 166, 83, 109, 65, 115.5, 86, 138, 160, 138, 78, 52, 166, 192, 106, 193, 97, 105.5)

PassYds <- c(214, 267, 267, 616, 320, 302, 301, 412, 298, 497, 391, 382, 447, 398, 466, 340, 513, 468, 563, 465, 537, 433, 400, 671, 574, 652, 424, 381, 554, 413, 650, 666)

PYdsG <- c(107, 133.5, 133.5, 205, 160, 151, 150.5, 206, 149, 248.5, 195.5, 191, 223.5, 199, 233, 170, 256.5, 234, 281.5, 232.5, 268.5, 216.5, 200, 223, 287, 326, 212, 190.5, 277, 206.5, 325, 333)

TotYds <- c(286, 397, 505, 775, 524, 529, 545, 550, 562, 605, 609, 618, 639, 640, 649, 672, 679, 687, 693, 696, 709, 710, 721, 1085, 731, 756, 757, 765, 766, 799, 844, 877)

YdsG <- c(193, 198.5, 252.5, 258.3, 262, 264.5, 272.5, 275, 281, 302.5, 304.5, 309, 319.5, 320, 324.5, 336, 339.5, 343.5, 346.5, 348, 354.5, 355, 360.5, 361.7, 365.5, 378, 378.5, 382.5, 383, 399.5, 422, 438.5)

# Create dataframe
NFL_2023 <- data.frame(Team, Gms, TotPts, PtsG, RushYds, RYdsG, PassYds, PYdsG, TotYds, YdsG)

View(NFL_2023)
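Because these tables are entered by hand, a quick consistency check helps: each per-game column should equal its total divided by `Gms`, up to rounding. With the values entered in the 2023 cell, for instance, the San Francisco 49ers row gives 286 / 2 = 143 total yards per game against a listed `YdsG` of 193. The helper `check_per_game()` below is hypothetical, and the tolerance of 0.1 is an assumption based on the per-game values being rounded to one decimal:

```r
# Hypothetical helper: return the teams whose per-game column disagrees with total / games
check_per_game <- function(df, total_col, per_game_col, tol = 0.1) {
  implied <- df[[total_col]] / df$Gms
  bad <- abs(implied - df[[per_game_col]]) > tol
  df$Team[bad]
}

# Toy example (made-up values): team "B" has a total that does not match its per-game value
toy_defense <- data.frame(Team = c("A", "B"), Gms = c(2, 2),
                          TotYds = c(386, 286), YdsG = c(193, 193))
check_per_game(toy_defense, "TotYds", "YdsG")
# returns "B"
```

On the notebook data this would be called as, for example, `check_per_game(NFL_2023, "TotYds", "YdsG")` or `check_per_game(NFL_2023, "RushYds", "RYdsG")`.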

For team statistics, we will use data from NFL.com, for example [2020 NFL defense passing stats by team](https://www.nfl.com/stats/team-stats/defense/passing/2020/reg/all).

Add Team Defensive Statistics

In [None]:
# install.packages(c("rvest", "dplyr"))  # run once if the packages are not installed
library(rvest)
library(dplyr)

# URLs
urls <- c(
  "https://www.nfl.com/stats/team-stats/defense/rushing/2020/reg/all",
  "https://www.nfl.com/stats/team-stats/defense/rushing/2021/reg/all",
  "https://www.nfl.com/stats/team-stats/defense/rushing/2022/reg/all",
  "https://www.nfl.com/stats/team-stats/defense/rushing/2023/reg/all",
  "https://www.nfl.com/stats/team-stats/defense/passing/2023/reg/all",
  "https://www.nfl.com/stats/team-stats/defense/passing/2020/reg/all",
  "https://www.nfl.com/stats/team-stats/defense/passing/2021/reg/all",
  "https://www.nfl.com/stats/team-stats/defense/passing/2022/reg/all",
  "https://www.nfl.com/stats/team-stats/defense/downs/2020/reg/all",
  "https://www.nfl.com/stats/team-stats/defense/downs/2021/reg/all",
  "https://www.nfl.com/stats/team-stats/defense/downs/2022/reg/all",
  "https://www.nfl.com/stats/team-stats/defense/downs/2023/reg/all"
)

# Function to scrape a page and return its first HTML table as a data frame
scrape_data <- function(url) {
  webpage <- read_html(url)
  tables <- html_table(webpage)  # rvest >= 1.0 fills missing cells automatically
  tables[[1]]
}

# Loop through URLs and create dataframes
for (url in urls) {
  # Extract the season and the stat category from URLs of the form
  # .../team-stats/defense/<metric>/<year>/reg/all
  year <- gsub(".*defense/[a-z]+/([0-9]{4}).*", "\\1", url)
  metric <- gsub(".*team-stats/defense/([a-z]+)/.*", "\\1", url)

  # Create dataframe name with "df_" prefix, e.g. df_2020_rushing
  df_name <- paste0("df_", year, "_", metric)

  # Assign dataframe to global environment
  assign(df_name, scrape_data(url))
}

# Copy the scraped tables into shorter names
rush20 <- df_2020_rushing
down20 <- df_2020_downs
pass20 <- df_2020_passing
rush21 <- df_2021_rushing
down21 <- df_2021_downs
pass21 <- df_2021_passing
rush22 <- df_2022_rushing
down22 <- df_2022_downs
pass22 <- df_2022_passing
rush23 <- df_2023_rushing
down23 <- df_2023_downs
pass23 <- df_2023_passing


View(rush20)
View(rush21)
View(rush22)
View(rush23)
View(down20)
View(down21)
View(down22)
View(down23)
View(pass20)
View(pass21)
View(pass22)
View(pass23)
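The scraping loops above pull the season and stat category out of each URL with several `gsub()` calls. A single pattern with capture groups can return all three parts at once. The helper `parse_stats_url()` is hypothetical and assumes the URL layout used above (`.../team-stats/<side>/<metric>/<year>/reg/all`):

```r
# Hypothetical helper: parse side (offense/defense), stat category, and season from an
# NFL.com team-stats URL of the form .../team-stats/<side>/<metric>/<year>/reg/all
parse_stats_url <- function(url) {
  pattern <- ".*team-stats/([a-z]+)/([a-z]+)/([0-9]{4})/.*"
  # regexec() records the capture groups; regmatches() extracts them as strings
  parts <- regmatches(url, regexec(pattern, url))[[1]]
  list(side = parts[2], metric = parts[3], year = as.integer(parts[4]))
}

info <- parse_stats_url("https://www.nfl.com/stats/team-stats/defense/rushing/2020/reg/all")
paste0("df_", info$year, "_", info$metric)
# returns "df_2020_rushing"
```

Returning the side as well means the same helper works for the offensive URLs, where only the table-name format differs.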

In [None]:
clean_team_column <- function(df) {
  # NFL.com repeats the team name inside each Team cell; keep only the text
  # before the first space (e.g. "Ravens", or "Football" for Washington Football Team)
  df$Team <- sapply(df$Team, function(team) {
    unlist(strsplit(team, " "))[1]
  })
  return(df)
}

# Apply the cleaning function to each dataframe
rush20 <- clean_team_column(rush20)
down20 <- clean_team_column(down20)
pass20 <- clean_team_column(pass20)
rush21 <- clean_team_column(rush21)
down21 <- clean_team_column(down21)
pass21 <- clean_team_column(pass21)
rush22 <- clean_team_column(rush22)
down22 <- clean_team_column(down22)
pass22 <- clean_team_column(pass22)
rush23 <- clean_team_column(rush23)
down23 <- clean_team_column(down23)
pass23 <- clean_team_column(pass23)

In [None]:
update_team_column <- function(df, year) {
    # Mapping for team names to abbreviations
    team_mapping <- list(
        Ravens = "BAL", Titans = "TEN", Patriots = "NE", Browns = "CLE", Saints = "NO", 
        Cardinals = "ARI", Rams = "LAR", Vikings = "MIN", Chargers = "LAC", Colts = "IND", 
        Raiders = "LV", Packers = "GB", Broncos = "DEN", Cowboys = "DAL", 
        Dolphins = "MIA", Bills = "BUF", Bengals = "CIN", Seahawks = "SEA", Falcons = "ATL", 
        Panthers = "CAR", Jets = "NYJ", Chiefs = "KC", Eagles = "PHI", Giants = "NYG", 
        Bears = "CHI", Steelers = "PIT", Buccaneers = "TB", Lions = "DET", Texans = "HOU", 
        Jaguars = "JAX", `49ers` = "SF"
    )
    
    # Special handling for Washington Football Team/Commanders
    if (year %in% c(2020, 2021)) {
        team_mapping$Football <- "WAS"
    } else if (year %in% c(2022, 2023)) {
        team_mapping$Commanders <- "WAS"
    }
    
    # Replace team names with abbreviations
    df$Team <- sapply(df$Team, function(team) {
        trimmed_team <- trimws(team)  # Trim whitespace from team name
        if (!is.null(team_mapping[[trimmed_team]])) {
            return(team_mapping[[trimmed_team]])
        } else {
            return(trimmed_team)  # Return the trimmed team name if not found in the mapping
        }
    })
    
    return(df)
}

# Update the dataframes
rush20 <- update_team_column(rush20, 2020)
down20 <- update_team_column(down20, 2020)
pass20 <- update_team_column(pass20, 2020)
rush21 <- update_team_column(rush21, 2021)
down21 <- update_team_column(down21, 2021)
pass21 <- update_team_column(pass21, 2021)
rush22 <- update_team_column(rush22, 2022)
down22 <- update_team_column(down22, 2022)
pass22 <- update_team_column(pass22, 2022)
rush23 <- update_team_column(rush23, 2023)
down23 <- update_team_column(down23, 2023)
pass23 <- update_team_column(pass23, 2023)

drush20<-rush20
ddown20<-down20
dpass20<-pass20
dpass21<-pass21
dpass22<-pass22
dpass23<-pass23
ddown21<-down21
ddown22<-down22
ddown23<-down23
drush21<-rush21
drush22<-rush22
drush23<-rush23
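`update_team_column()` looks each name up one at a time with `sapply()`. A named character vector works as a vectorized dictionary instead. The sketch below is an alternative, not the notebook's function: `team_abbr` and `to_abbr()` are hypothetical names, only a few teams are listed for illustration, and it maps both "Football" and "Commanders" to WAS without checking the year, on the assumption that each name only appears in its own seasons' tables:

```r
# Hypothetical partial mapping; the full list is in update_team_column()
team_abbr <- c(Ravens = "BAL", Bills = "BUF", `49ers` = "SF",
               Football = "WAS", Commanders = "WAS")

# Vectorized lookup with the same fallback as the notebook: unknown names are kept as-is
to_abbr <- function(team, mapping = team_abbr) {
  team <- trimws(team)
  out <- unname(mapping[team])   # NA where the name is not in the mapping
  ifelse(is.na(out), team, out)
}

to_abbr(c("Ravens ", "49ers", "Commanders", "Unknown"))
# returns c("BAL", "SF", "WAS", "Unknown")
```

Applied to a table, this would be `df$Team <- to_abbr(df$Team)` once the full mapping is filled in.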

Team Offensive Statistics

In [None]:
offense_urls <- c(
  "https://www.nfl.com/stats/team-stats/offense/passing/2020/reg/all",
  "https://www.nfl.com/stats/team-stats/offense/passing/2021/reg/all",
  "https://www.nfl.com/stats/team-stats/offense/passing/2022/reg/all",
  "https://www.nfl.com/stats/team-stats/offense/passing/2023/reg/all",
  "https://www.nfl.com/stats/team-stats/offense/rushing/2020/reg/all",
  "https://www.nfl.com/stats/team-stats/offense/rushing/2021/reg/all",
  "https://www.nfl.com/stats/team-stats/offense/rushing/2022/reg/all",
  "https://www.nfl.com/stats/team-stats/offense/rushing/2023/reg/all",
  "https://www.nfl.com/stats/team-stats/offense/receiving/2020/reg/all",
  "https://www.nfl.com/stats/team-stats/offense/receiving/2021/reg/all",
  "https://www.nfl.com/stats/team-stats/offense/receiving/2022/reg/all",
  "https://www.nfl.com/stats/team-stats/offense/receiving/2023/reg/all",
  "https://www.nfl.com/stats/team-stats/offense/downs/2020/reg/all",
  "https://www.nfl.com/stats/team-stats/offense/downs/2021/reg/all",
  "https://www.nfl.com/stats/team-stats/offense/downs/2022/reg/all",
  "https://www.nfl.com/stats/team-stats/offense/downs/2023/reg/all"
)

# Loop through URLs and create dataframes
for (url in offense_urls) {
    # Extract year and metric from the URL
    year <- gsub(".*offense/([a-z]+)/([0-9]{4}).*", "\\2", url)
    metric <- gsub(".*offense/([a-z]+)/([0-9]{4}).*", "\\1", url)
    
    # Create dataframe name
    df_name <- paste0(metric, substr(year, 3, 4))
    
    # Scrape data
    df <- scrape_data(url)
    
    # Clean team column
    df <- clean_team_column(df)
    
    # Update team column
    df <- update_team_column(df, as.numeric(year))
    
    # Assign dataframe to global environment
    assign(df_name, df)
}

# Check one of the offensive dataframes (pass20 is the defensive passing table)
View(passing20)
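The notebook stores each scraped table as its own global variable with `assign()` and later reads it back with `get()`. Keeping the tables in one named list is an alternative that lets later steps loop with `lapply()`. In this sketch `fake_scrape()` is a hypothetical stand-in for `scrape_data()` so it runs offline, and `example_urls` is named to avoid overwriting the notebook's `urls`:

```r
# Hypothetical offline stand-in for scrape_data()
fake_scrape <- function(url) data.frame(Team = c("BAL", "BUF"), url = url)

example_urls <- c(
  "https://www.nfl.com/stats/team-stats/offense/passing/2020/reg/all",
  "https://www.nfl.com/stats/team-stats/offense/rushing/2021/reg/all"
)

# Build names like "passing20" from each URL, matching the notebook's naming scheme
table_names <- sub(".*offense/([a-z]+)/[0-9]{2}([0-9]{2}).*", "\\1\\2", example_urls)

# One named list instead of many global variables
offense_tables <- setNames(lapply(example_urls, fake_scrape), table_names)
names(offense_tables)
# returns c("passing20", "rushing21")
```

A single table is then `offense_tables[["passing20"]]`, and a cleaning step becomes `offense_tables <- lapply(offense_tables, clean_team_column)`.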


In [None]:
# List of offensive datasets
offensive_datasets <- list("receiving20", "receiving21", "receiving22", "receiving23", 
                           "downs20", "downs21", "downs22", "downs23", 
                           "rushing20", "rushing21", "rushing22", "rushing23", 
                           "passing20", "passing21", "passing22", "passing23")

# Loop through each dataset and apply the functions
for (dataset_name in offensive_datasets) {
  # Get the dataframe from the global environment
  df <- get(dataset_name)
  
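  # Extract the season from the dataset name (e.g. "receiving20" -> 2020);
  # update_team_column() expects a four-digit year for its Washington handling
  year <- 2000 + as.numeric(substr(dataset_name, nchar(dataset_name)-1, nchar(dataset_name)))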
  
  # Apply the clean_team_column function
  df <- clean_team_column(df)
  
  # Apply the update_team_column function
  df <- update_team_column(df, year)
  
  # Assign the cleaned dataframe back to the global environment
  assign(dataset_name, df)
}

# 2020 and 2021 datasets (Washington Football Team era): map any remaining "Football" to "WAS"
datasets_to_update <- c("passing20", "passing21", "downs20", "downs21", 
                        "receiving20", "receiving21", "rushing20", "rushing21")

# Loop through each dataset and update the Team column
for (dataset_name in datasets_to_update) {
  # Get the dataframe from the global environment
  df <- get(dataset_name)
  
  # Update the Team column where the value is "Football" to "WAS"
  df$Team[df$Team == "Football"] <- "WAS"
  
  # Assign the updated dataframe back to the global environment
  assign(dataset_name, df)
}

# Check one of the dataframes
View(passing20)


# List of specified datasets for 2022 and 2023
datasets_to_update <- c("passing22", "passing23", "downs22", "downs23", 
                        "receiving22", "receiving23", "rushing22", "rushing23")

# Loop through each dataset and update the Team column
for (dataset_name in datasets_to_update) {
  # Get the dataframe from the global environment
  df <- get(dataset_name)
  
  # Update the Team column where the value is "Commanders" to "WAS"
  df$Team[df$Team == "Commanders"] <- "WAS"
  
  # Assign the updated dataframe back to the global environment
  assign(dataset_name, df)
}

# Check one of the dataframes
View(passing22)


datasets_to_update <- c("receiving20", "receiving21", "receiving22", "receiving23")

# Loop through each dataset and rename the "Yds" column
for (dataset_name in datasets_to_update) {
  # Get the dataframe from the global environment
  df <- get(dataset_name)
  
  # Rename the "Yds" column to "Rec Yards"
  colnames(df)[colnames(df) == "Yds"] <- "Rec Yards"
  
  # Assign the updated dataframe back to the global environment
  assign(dataset_name, df)
}



In [None]:
datasets_to_update <- c("pass20", "pass21", "pass22", "pass23", 
                        "passing20", "passing21", "passing22", "passing23")

# Loop through each dataset and rename the "Att" column
for (dataset_name in datasets_to_update) {
  # Get the dataframe from the global environment
  df <- get(dataset_name)
  
  # Rename the "Att" column to "Pass Att"
  colnames(df)[colnames(df) == "Att"] <- "Pass Att"
  
  # Assign the updated dataframe back to the global environment
  assign(dataset_name, df)
}

datasets_to_update <- c("rush20", "rush21", "rush22", "rush23", 
                        "rushing20", "rushing21", "rushing22", "rushing23")

# Loop through each dataset and rename the "Att" column
for (dataset_name in datasets_to_update) {
  # Get the dataframe from the global environment
  df <- get(dataset_name)
  
  # Rename the "Att" column to "Rush Att"
  colnames(df)[colnames(df) == "Att"] <- "Rush Att"
  
  # Assign the updated dataframe back to the global environment
  assign(dataset_name, df)
}

drush_datasets <- c("drush20", "drush21", "drush22", "drush23")

# List of dpassXX datasets
dpass_datasets <- c("dpass20", "dpass21", "dpass22", "dpass23")

# Update column names for drushXX dataframes
for (dataset_name in drush_datasets) {
  # Get the dataframe from the global environment
  df <- get(dataset_name)
  
  # Rename the 'Att' column to 'Rush Att'
  colnames(df)[colnames(df) == "Att"] <- "Rush Att"
  
  # Assign the updated dataframe back to the global environment
  assign(dataset_name, df)
}

# Update column names for dpassXX dataframes
for (dataset_name in dpass_datasets) {
  # Get the dataframe from the global environment
  df <- get(dataset_name)
  
  # Rename the 'Att' column to 'Pass Att'
  colnames(df)[colnames(df) == "Att"] <- "Pass Att"
  
  # Assign the updated dataframe back to the global environment
  assign(dataset_name, df)
}

rushing_datasets <- c("rushing20", "rushing21", "rushing22", "rushing23")

# Update column names for rushingXX dataframes
for (dataset_name in rushing_datasets) {
  # Get the dataframe from the global environment
  df <- get(dataset_name)
  
  # Rename the specified columns
  colnames(df)[colnames(df) == "TD"] <- "Rush TD"
  colnames(df)[colnames(df) == "20+"] <- "20+ Yd Rush"
  colnames(df)[colnames(df) == "40+"] <- "40+ Yd Rush"
  colnames(df)[colnames(df) == "Lng"] <- "Rush Lng"
  
  # Assign the updated dataframe back to the global environment
  assign(dataset_name, df)
}

# Update column names for drushXX dataframes
for (dataset_name in drush_datasets) {
  # Get the dataframe from the global environment
  df <- get(dataset_name)
  
  # Rename the 'TD' column to 'Rush TD Allowed'
  colnames(df)[colnames(df) == "TD"] <- "Rush TD Allowed"
  
  # Assign the updated dataframe back to the global environment
  assign(dataset_name, df)
}

# Update column names for dpassXX dataframes
for (dataset_name in dpass_datasets) {
  # Get the dataframe from the global environment
  df <- get(dataset_name)
  
  # Rename the 'TD' column to 'Pass TD Allowed'
  colnames(df)[colnames(df) == "TD"] <- "Pass TD Allowed"
  
  # Assign the updated dataframe back to the global environment
  assign(dataset_name, df)
}

# List of passingXX datasets
passing_datasets <- c("passing20", "passing21", "passing22", "passing23")

# Rename columns for passingXX dataframes
for (dataset_name in passing_datasets) {
  # Get the dataframe from the global environment
  df <- get(dataset_name)
  
  # Rename columns
  colnames(df)[colnames(df) == "20+"] <- "20+ Yd Rec"
  colnames(df)[colnames(df) == "40+"] <- "40+ Yd Rec"
  colnames(df)[colnames(df) == "Lng"] <- "Lng Rec"
  
  # Assign the updated dataframe back to the global environment
  assign(dataset_name, df)
}
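The rename loops above repeat the same get / rename / assign pattern. A small helper that takes a named vector of `old = new` pairs would express each rename in one line. `rename_columns()` is a hypothetical helper, not part of the notebook, and the toy data is made up:

```r
# Hypothetical helper: rename columns using a named vector of old = new pairs.
# Columns that are not present are ignored, as in the loops above.
rename_columns <- function(df, mapping) {
  hits <- names(df) %in% names(mapping)
  names(df)[hits] <- mapping[names(df)[hits]]
  df
}

toy_rush <- data.frame(Team = "BAL", Att = 500, TD = 20)
names(rename_columns(toy_rush, c(Att = "Rush Att", TD = "Rush TD")))
# returns c("Team", "Rush Att", "Rush TD")
```

For example, the `rushingXX` renames would become `rushing20 <- rename_columns(rushing20, c(TD = "Rush TD", "20+" = "20+ Yd Rush", "40+" = "40+ Yd Rush", Lng = "Rush Lng"))`.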





In [None]:
datasets_to_save <- c("downs20", "downs21", "downs22", "downs23", 
                      "ddown20", "ddown21", "ddown22", "ddown23",
                      "dpass20", "dpass21", "dpass22", "dpass23",
                      "drush20", "drush21", "drush22", "drush23",
                      "rushing20", "rushing21", "rushing22", "rushing23",
                      "receiving20", "receiving21", "receiving22", "receiving23",
                      "passing20", "passing21", "passing22", "passing23")

# File path (forward slashes avoid R's backslash escapes; "\U" would start a Unicode escape)
file_path <- "C:/Users/Garrett Bainwol/Desktop/UC Boulder/Supervised Learning/Final Project Csvs"

# Loop through each dataset and save as CSV
for (dataset_name in datasets_to_save) {
  # Get the dataframe from the global environment
  df <- get(dataset_name)
  
  # Create the full path for the CSV file
  csv_path <- file.path(file_path, paste0(dataset_name, ".csv"))
  
  # Save the dataframe as a CSV
  write.csv(df, csv_path, row.names = FALSE)
}
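`write.csv()` fails if the target folder does not exist. A minimal sketch of a save helper that creates the folder first is below; `save_tables()` is hypothetical, and `tempdir()` is used only so the example runs anywhere:

```r
# Hypothetical helper: write a named list of data frames to <dir>/<name>.csv,
# creating the folder (and any parents) if it does not exist yet
save_tables <- function(tables, dir) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  paths <- file.path(dir, paste0(names(tables), ".csv"))
  for (i in seq_along(tables)) {
    write.csv(tables[[i]], paths[i], row.names = FALSE)
  }
  invisible(paths)
}

out_dir <- file.path(tempdir(), "final_project_csvs")
paths <- save_tables(list(downs20 = data.frame(Team = "BAL", Yds = 300)), out_dir)
read.csv(paths[1])
```

With the notebook's objects, `save_tables(mget(datasets_to_save), file_path)` would write the same files as the loop above, since `mget()` returns the named datasets as a list.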



Combine the defensive and offensive tables into one dataset per season

In [None]:
# List of years
years <- c("20", "21", "22", "23")

# Loop through each year
for (year in years) {
  
  # Merge defensive datasets
  defensive_df <- merge(get(paste0("ddown", year)), get(paste0("dpass", year)), by = "Team")
  defensive_df <- merge(defensive_df, get(paste0("drush", year)), by = "Team")
  
  # Assign the merged defensive dataframe to the global environment
  assign(paste0("defensive_stats_", year), defensive_df)
  
  # Merge offensive datasets
  offensive_df <- merge(get(paste0("passing", year)), get(paste0("receiving", year)), by = "Team")
  offensive_df <- merge(offensive_df, get(paste0("rushing", year)), by = "Team")
  offensive_df <- merge(offensive_df, get(paste0("downs", year)), by = "Team")
  
  # Assign the merged offensive dataframe to the global environment
  assign(paste0("offensive_stats_", year), offensive_df)
}
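When two merged tables share a column name, `merge()` appends `.x` and `.y` to it (its `suffixes` argument controls this for a two-table merge), which is why the next cell renames columns. An alternative sketch is to prefix each table's non-key columns with its source before merging, then combine any number of tables with `Reduce()`. The helper `prefix_columns()` and the toy data are hypothetical:

```r
# Hypothetical helper: prefix every non-key column with the table's source name
prefix_columns <- function(df, prefix, key = "Team") {
  keep <- names(df) != key
  names(df)[keep] <- paste0(prefix, "_", names(df)[keep])
  df
}

# Toy data (made up): both tables have a "Yds" column that would otherwise clash
tables <- list(
  pass = data.frame(Team = c("BAL", "BUF"), Yds = c(3500, 4000)),
  rush = data.frame(Team = c("BUF", "BAL"), Yds = c(2000, 2500))
)

prefixed <- Map(prefix_columns, tables, names(tables))
merged <- Reduce(function(left, right) merge(left, right, by = "Team"), prefixed)
names(merged)
# returns c("Team", "pass_Yds", "rush_Yds")
```

Because column names are unique before each merge, no `.x`/`.y` columns are created and no cleanup pass is needed.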




In [None]:
years <- c("20", "21", "22", "23")

# Loop through each year
for (year in years) {
  
  # Get the defensive dataframe from the global environment
  df <- get(paste0("defensive_stats_", year))
  
  # Rename columns
  colnames(df)[colnames(df) == "Yds"] <- "Pass Yds"
  colnames(df)[colnames(df) == "Rush 1st.x"] <- "Rush 1st"
  colnames(df)[colnames(df) == "Rush 1st%.x"] <- "Rush 1st%"
  
  # Remove unwanted columns
  df <- df[, !(colnames(df) %in% c("Rush 1st.y", "Rush 1st%.y"))]
  
  # Assign the updated dataframe back to the global environment
  assign(paste0("defensive_stats_", year), df)
}

years <- c("20", "21", "22", "23")

# Loop through each year
for (year in years) {
  
  # Get the offensive dataframe from the global environment
  df <- get(paste0("offensive_stats_", year))
  
  # Rename columns
  colnames(df)[colnames(df) == "TD.x"] <- "Pass TD"
  colnames(df)[colnames(df) == "Rec 1st.x"] <- "Rec 1st"
  colnames(df)[colnames(df) == "Rec 1st%.x"] <- "Rec 1st%"
  colnames(df)[colnames(df) == "Rush 1st.x"] <- "Rush 1st"
  colnames(df)[colnames(df) == "Rush 1st%.x"] <- "Rush 1st%"
  
  # Remove unwanted columns
  df <- df[, !(colnames(df) %in% c("TD.y", "Rec 1st.y", "Rec 1st%.y", "Rush 1st.y", "Rush 1st%.y"))]
  
  # Assign the updated dataframe back to the global environment
  assign(paste0("offensive_stats_", year), df)
}
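`merge()` performs an inner join by default, so a team whose name failed to map to an abbreviation is silently dropped from the merged season tables. A quick check against the 32 abbreviations used in `update_team_column()` catches this; `nfl_abbr` and `missing_teams()` are hypothetical names:

```r
# The 32 team abbreviations produced by update_team_column()
nfl_abbr <- c("ARI", "ATL", "BAL", "BUF", "CAR", "CHI", "CIN", "CLE",
              "DAL", "DEN", "DET", "GB", "HOU", "IND", "JAX", "KC",
              "LAC", "LAR", "LV", "MIA", "MIN", "NE", "NO", "NYG",
              "NYJ", "PHI", "PIT", "SEA", "SF", "TB", "TEN", "WAS")

# Hypothetical helper: teams expected but absent from a merged table
missing_teams <- function(merged, all_teams = nfl_abbr) {
  setdiff(all_teams, merged$Team)
}

# Toy example: a table where Washington failed to map and was dropped
toy_merged <- data.frame(Team = setdiff(nfl_abbr, "WAS"))
missing_teams(toy_merged)
# returns "WAS"
```

On the notebook data, `missing_teams(offensive_stats_20)` and `missing_teams(defensive_stats_20)` should each return `character(0)`.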


**Conclusion and Next Steps:**

Over the course of this notebook, we laid the groundwork for a Fantasy Football prediction model. We started with a large raw dataset that had the problems typical of raw data, then cleaned and preprocessed it into datasets segmented by position. This segmentation makes each dataset easier to reason about and sets the stage for targeted, position-specific analyses.

Our achievements in this notebook include:

-   **Initial Data Exploration**: We examined the structure, columns, and overall characteristics of our 17,400 entries.

-   **Data Segmentation**: By splitting the data by position (Wide Receivers, Running Backs, and Tight Ends), we set up position-specific analysis, so each model can be tuned to the nuances of its role.

-   **Data Cleaning and Preprocessing**: Systematic cleaning improved data quality, giving later models and analyses a solid foundation.

-   **Team Context**: We scraped 2020 to 2023 team offensive and defensive statistics from NFL.com, standardized team names to abbreviations, and merged them into one offensive and one defensive table per season.

With the data prepared, the next notebook turns to **Exploratory Data Analysis (EDA)** to uncover the patterns, relationships, and anomalies that will guide our modeling decisions. After EDA, we move to the **Modeling** phase, where we will use supervised machine learning to predict weekly starts and sits. The goal remains the same: outperform ESPN's weekly projections and give Fantasy Football players a competitive edge.