# NFL Fantasy Football Point Projections: An Introduction

## Project Overview

As avid NFL fans, we've spent countless hours analyzing player performance, debating draft strategies, and watching our fantasy teams succeed or fail based on real-world outcomes. Fantasy football transforms every Sunday into a personal investment in player performance, where statistical analysis meets genuine passion for the game. The unpredictable nature of the NFL—with its coaching changes, injuries, rookie developments, and evolving team dynamics—creates a fascinating challenge: can machine learning models capture the nuances that even dedicated football fans struggle to predict?

## Why This Topic Was Chosen

This project stems from our genuine love of football and curiosity about how data science compares to traditional football analysis. As dedicated fans who follow the sport closely, we chose this topic because:

### 1. **Testing Our Football Knowledge Against Data**
As fans who watch games regularly, analyze player matchups, and follow team trends, we're curious whether our intuitive understanding of football can be systematized and improved through machine learning. The NFL generates comprehensive statistical data across 24 years (1999-2022) with over 215,000 observations—enough to test whether patterns we think we see actually exist.

### 2. **Real Stakes, Real Validation**
Unlike theoretical projects, fantasy football provides immediate feedback. When our model predicts a breakout season for an underrated running back or identifies a quarterback's decline, we can watch it play out on Sunday. This real-world validation makes every prediction meaningful.

### 3. **The Challenge of NFL Unpredictability**
Football is inherently chaotic—injuries, weather, game script changes, and countless variables that even seasoned analysts miss. Building models that can navigate this complexity while incorporating the football knowledge we've developed as fans presents a genuine technical challenge.

## What Makes This Project Compelling

### **Fan Intuition vs. Machine Learning**
As dedicated football fans, we've developed intuitions about player performance—we can spot when a receiver is getting more red zone targets, notice when an offensive line is improving, or sense when a running back is losing a step. This project lets us test whether machine learning can capture these subtle patterns that we think we see, or if it can find signals we've missed entirely.

### **Every Sunday is a Test**  
The beauty of football prediction is that results come quickly and definitively. Unlike academic models validated on static datasets, our predictions face the ultimate test every weekend. When we predict that a third-year wide receiver is due for a breakout season, we get to watch it unfold (or fail spectacularly) in real time.

### **The Complexity Football Fans Appreciate**
Football fans understand that quarterback performance differs fundamentally from running back consistency, that offensive line play affects multiple skill positions, and that coaching changes ripple through entire offenses. Our approach acknowledges this complexity by developing position-specific models and reliability metrics that reflect the nuanced way football fans actually evaluate players.

## Why This Matters Beyond Our Fantasy Leagues

While our primary motivation is personal curiosity and football passion, the implications extend beyond weekend entertainment:

### **Professional Sports Applications**

**1. NFL Team Scouting and Draft Strategy**
Our methodologies extend beyond fantasy football into professional team management. NFL franchises invest millions in player evaluation, and predictive models can identify:
- **Undervalued draft prospects** who outperform their draft position
- **"Bust potential"** of highly-touted players whose skills may not translate to professional success
- **Optimal contract valuations** based on projected performance rather than past achievements

**2. Player Development and Career Management**
Teams can use similar models to:
- Identify which young players are most likely to improve
- Predict career longevity and plan roster construction accordingly
- Make data-driven decisions about player trades and free agency

### **Technical and Methodological Contributions**

**3. Advancing Sports Analytics**
Our approach contributes to the broader field by:
- Demonstrating **reliability index calculations** that balance performance and consistency
- Showing how **position-specific normalization** can enable cross-positional comparisons
- Developing **hybrid scoring systems** that combine multiple evaluation criteria

**4. Machine Learning Innovation**
The techniques we develop—particularly around handling **seasonal variance**, **career trajectory modeling**, and **multi-dimensional performance evaluation**—have applications beyond sports in fields like:
- Financial forecasting (predicting stock performance with similar volatility)
- Employee performance evaluation (balancing consistency vs. peak performance)
- Healthcare outcomes (predicting patient response variability)

## What We Hope to Discover

This project represents our attempt to bridge the gap between football fandom and data science. We want to understand whether the patterns we think we see as dedicated viewers—the subtle signs of player development, the impact of system changes, the reliability indicators that separate consistent performers from boom-bust players—can be quantified and systematized.

More fundamentally, we're testing whether machine learning can enhance rather than replace football knowledge. Can models help us identify undervalued players we might have overlooked? Can they warn us about declines we're too emotionally invested to see? Every prediction becomes a test of whether data science and football passion can work together to better understand the game we love.

# Figures

This section presents key visualizations that illustrate our approach to NFL fantasy football prediction, from initial data exploration through model development and validation.

## Figure 1: Dataset Composition and Positional Distribution

*Data Source: `data_exploration.ipynb`*

Our analysis begins with understanding the structure of NFL player data across positions. The position distribution reveals important insights about roster composition and the relative scarcity of talent at different positions—a critical factor in fantasy football valuation.

![Position Distribution](images/position_distribution.png)

**Key Insights:**
- Wide Receivers comprise the largest segment of skill position players
- Running Backs and Tight Ends represent smaller but crucial position groups
- This distribution directly impacts our position-specific modeling approach

## Figure 2: Fantasy Points Consistency Analysis by Position

*Data Source: `data_exploration.ipynb`*

Understanding the variability in fantasy scoring by position is fundamental to building reliable predictive models. This analysis measures consistency through standard deviation of weekly fantasy points.

![Fantasy Points Consistency](images/fantasy_points_consistency.png)

**Legend:**
- Lower bars indicate more consistent, predictable scoring
- Higher bars represent positions with greater "boom-or-bust" potential
- Y-axis shows standard deviation of weekly fantasy points

**Interpretation:**
- Quarterbacks typically show the most consistent scoring patterns
- Wide Receivers and Tight Ends exhibit higher variability
- This variance analysis directly informs our reliability index calculations

## Figure 3: Quarterback Performance Analysis Across Career Trajectories

*Data Source: `stat_projections.ipynb`*

### Figure 3a: Age vs. Performance Relationship
Polynomial regression analysis revealing the typical career arc for NFL quarterbacks, showing peak performance windows and decline patterns.
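
A minimal sketch of how an age curve of this kind could be fit, assuming a degree-2 polynomial and using made-up (age, season points) pairs rather than values from the dataset. The degree and the sample numbers are assumptions for illustration.

```python
import numpy as np

# Hypothetical (age, season fantasy points) pairs, for illustration only.
ages = np.array([22, 24, 26, 28, 30, 32, 34, 36], dtype=float)
points = np.array([180, 240, 290, 310, 300, 270, 220, 160], dtype=float)

# Fit a degree-2 polynomial: points ~ a*age^2 + b*age + c
a, b, c = np.polyfit(ages, points, deg=2)

# A concave curve (a < 0) peaks at its vertex, age = -b / (2a).
peak_age = -b / (2 * a)

# Callable curve for plotting or predicting at arbitrary ages.
curve = np.poly1d([a, b, c])
print(f"estimated peak age: {peak_age:.1f}")
print(f"fitted points at age 29: {curve(29):.1f}")
```

A higher degree would allow an asymmetric rise and decline, at the cost of fitting noise when few seasons are available.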

### Figure 3b: Elite Quarterback Progression (2024 Active Players)
Longitudinal analysis of top-tier quarterbacks who played in 2024, demonstrating how elite players maintain production over time compared to league averages.

### Figure 3c: Performance Percentile Analysis
Quarterbacks grouped by performance percentiles across seasons, showing how different tiers of players age and develop differently.

**Key Findings:**
- Clear age-performance curves exist for quarterbacks
- Elite quarterbacks maintain performance longer than average players
- Performance percentile is a strong predictor of career longevity

## Figure 4: Running Back Predictive Model Development

*Data Source: `firstModel.ipynb`*

This section demonstrates our comprehensive approach to running back performance prediction, starting with basic correlation analysis and progressing to sophisticated machine learning models.

![RB Rushing Yards vs Fantasy Points](images/rb_rushing_yards_vs_fantasy_points.png)

**Initial Analysis:** Correlation between 2023 rushing yards and 2024 fantasy points, providing the foundation for our predictive modeling approach.

### Figure 4a: Feature Correlation Analysis
Multi-panel scatter plot analysis examining the relationship between 2023 performance metrics and 2024 fantasy outcomes for running backs.

![RB Feature Correlation](images/rb_feature_correlation.png)

**Features Analyzed:**
- Offensive snap count (opportunity metric)
- Rushing yards (volume metric)
- Rushing touchdowns (efficiency metric)  
- Yards per carry (skill metric)
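
As a rough sketch, the relationship between each prior-season feature and next-season points could be summarized with Pearson correlations. The column names and values below are hypothetical stand-ins, not the notebook's actual schema.

```python
import pandas as pd

# Hypothetical per-player rows: 2023 features plus the 2024 target.
rb = pd.DataFrame({
    "offense_snaps_2023": [620, 540, 410, 300, 250, 180],
    "rushing_yards_2023": [1150, 980, 720, 510, 400, 260],
    "rushing_tds_2023": [12, 7, 8, 3, 4, 1],
    "yards_per_carry_2023": [4.6, 4.2, 4.9, 3.8, 4.4, 3.9],
    "fantasy_points_2024": [245.0, 210.5, 160.2, 120.8, 95.1, 60.3],
})

feature_cols = [c for c in rb.columns if c.endswith("_2023")]

# Pearson correlation of each 2023 feature with 2024 fantasy points.
correlations = (
    rb[feature_cols]
    .corrwith(rb["fantasy_points_2024"])
    .sort_values(ascending=False)
)
print(correlations)
```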

### Figure 4b: Support Vector Regression Model Performance
Validation plot comparing actual vs. predicted fantasy points for the SVR model, with separate visualization of training and test performance.

![SVR Model Performance](images/svr_model_performance.png)

**Model Interpretation:**
- Points on the diagonal line represent perfect predictions
- Assuming predicted values are on the vertical axis, points above the line indicate model overestimation
- Under the same assumption, points below the line show model underestimation
- Player names are labeled to enable individual performance analysis

### Figure 4c: Position-Filtered Model Results
Enhanced model performance when restricting analysis to RB1/RB2 level players (team depth chart positions 1-2), demonstrating improved accuracy for relevant fantasy players.

![RB Filtered Results](images/rb_filtered_results.png)

## Figure 5: Feature Engineering and Player Evaluation Framework

*Data Source: `feature_eng.ipynb`*

### Figure 5a: Reliability Index Calculation
Visualization of our custom reliability index that combines average performance with consistency metrics, creating a comprehensive player evaluation framework.

**Formula:** `Reliability Index = Average Fantasy Points / (Standard Deviation + 1)`

### Figure 5b: Position-Specific Normalization
Z-score and percentile rank analysis showing how players compare within their respective positions, enabling cross-positional value comparisons.

### Figure 5c: Comprehensive Value Score Rankings
Integration of performance metrics and reliability indices into a single value score, representing our final player evaluation framework.

**Components:**
- 60% Performance Z-score (relative to position)
- 40% Reliability Z-score (consistency within position)
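
One way the 60/40 weighting could be computed is sketched below. The player rows are invented, and the use of the sample standard deviation (pandas' default, `ddof=1`) for the z-scores is an assumption.

```python
import pandas as pd

# Hypothetical per-player season summaries.
players = pd.DataFrame({
    "player": ["QB A", "QB B", "QB C", "RB A", "RB B", "RB C"],
    "position": ["QB", "QB", "QB", "RB", "RB", "RB"],
    "avg_points": [22.0, 18.0, 14.0, 16.0, 12.0, 8.0],
    "reliability_index": [3.0, 2.0, 2.5, 2.0, 2.4, 1.0],
})


def zscore_within_position(df: pd.DataFrame, col: str) -> pd.Series:
    """Standardize `col` against the mean and std of each position group."""
    grouped = df.groupby("position")[col]
    return (df[col] - grouped.transform("mean")) / grouped.transform("std")


players["performance_z"] = zscore_within_position(players, "avg_points")
players["reliability_z"] = zscore_within_position(players, "reliability_index")

# 60% performance, 40% reliability, matching the components listed above.
players["value_score"] = (
    0.6 * players["performance_z"] + 0.4 * players["reliability_z"]
)
print(players.sort_values("value_score", ascending=False)[["player", "value_score"]])
```

Because both inputs are standardized within position, a value score of 1.0 means roughly "one standard deviation better than positional peers" regardless of position.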

## Figure 6: Model Validation and Real-World Application

### Figure 6a: Cross-Validation Results
Performance metrics across different model configurations and hyperparameter settings, demonstrating model stability and optimization.

### Figure 6b: Prediction Confidence Intervals
Uncertainty quantification for player predictions, showing model confidence levels for different types of players and situations.

## Statistical Methodology Summary

Our visualization approach follows established sports analytics practices:

1. **Exploratory Analysis** (Figures 1-2): Understanding data structure and inherent patterns
2. **Position-Specific Modeling** (Figure 3): Acknowledging that different positions require different analytical approaches  
3. **Feature Engineering** (Figure 5): Creating meaningful metrics that capture football-specific insights
4. **Model Development** (Figure 4): Building and validating predictive models with proper train/test splits
5. **Performance Evaluation** (Figure 6): Rigorous assessment of model accuracy and reliability

# Methods Section

This section details our systematic approach to NFL fantasy football prediction, including data exploration, preprocessing steps, and model development executed in chronological order.

## 3.1 Data Exploration

**Dataset:** NFL Player Statistics (1999-2022) from Kaggle (`philiphyde1/nfl-stats-1999-2022`)

**Exploratory Steps:**
- **Dataset Composition Analysis:** Analyzed 215,243 total observations across weekly and yearly player statistics
- **Positional Distribution:** Examined player counts across different positions to understand roster composition and positional scarcity
- **Fantasy Points Consistency:** Calculated standard deviation of weekly fantasy points by position to measure scoring variability

```python
import os

import kagglehub
import pandas as pd

# Download the Kaggle dataset and load the weekly and yearly offense tables
path = kagglehub.dataset_download("philiphyde1/nfl-stats-1999-2022")
wk_offense_df = pd.read_csv(os.path.join(path, "weekly_player_stats_offense.csv"))
yr_offense_df = pd.read_csv(os.path.join(path, "yearly_player_stats_offense.csv"))

# Basic dataset statistics: combined row count and weekly rows per position
total_observations = len(wk_offense_df) + len(yr_offense_df)
position_counts = wk_offense_df['position'].value_counts()
```
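
The consistency measure described in the exploratory steps could be computed roughly as follows. This sketch pools all player-weeks within a position and assumes a `fantasy_points_ppr` column is already present; the rows are invented.

```python
import pandas as pd

# Hypothetical weekly rows; the real input would be the weekly offense table.
weekly = pd.DataFrame({
    "position": ["QB", "QB", "QB", "WR", "WR", "WR", "TE", "TE", "TE"],
    "fantasy_points_ppr": [18.0, 20.0, 22.0, 5.0, 15.0, 25.0, 2.0, 8.0, 20.0],
})

# Standard deviation of weekly points per position, most consistent first.
consistency = (
    weekly.groupby("position")["fantasy_points_ppr"]
    .std()
    .sort_values()
)
print(consistency)
```

An alternative would be to compute each player's own standard deviation first and then average within position, which keeps differences between players from inflating the spread.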

## 3.2 Data Preprocessing

**Target Variable Creation:**
- Calculated `fantasy_points_ppr` using PPR scoring (passing yards are not scored in this calculation):
  - 1 point per reception
  - 0.1 points per rushing/receiving yard
  - 6 points per rushing/receiving touchdown
  - 4 points per passing touchdown
  - -2 points per interception

**Data Filtering and Cleaning:**
- Filtered for regular season games only (`season_type == "REG"`)
- Position-specific filtering (focused on RB, QB, WR, TE)
- Handled missing values in key metrics (e.g., yards per carry)
- Created depth chart rankings for positional analysis
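
A sketch of what these filtering steps might look like in pandas. Only `season_type`, `position`, and `depth_team` are named in this write-up; the other column names, and the choice to rank depth by offensive snaps, are assumptions for illustration.

```python
import pandas as pd

# Hypothetical weekly rows.
weekly = pd.DataFrame({
    "player_name": ["A", "B", "C", "D", "E"],
    "team": ["KC", "KC", "KC", "KC", "SF"],
    "position": ["RB", "RB", "RB", "K", "RB"],
    "season_type": ["REG", "REG", "REG", "REG", "POST"],
    "offense_snaps": [45, 20, 5, 0, 30],
    "rushing_yards": [88, 31, 4, 0, 60],
    "carries": [18, 7, 1, 0, 12],
})

# Keep regular-season rows for the four fantasy skill positions.
skill_positions = ["RB", "QB", "WR", "TE"]
filtered = weekly[
    (weekly["season_type"] == "REG") & weekly["position"].isin(skill_positions)
].copy()

# Avoid division by zero: zero-carry rows become NaN, then are filled with 0.
filtered["yards_per_carry"] = (
    filtered["rushing_yards"] / filtered["carries"].where(filtered["carries"] > 0)
).fillna(0.0)

# Rank players within team and position by snaps: 1 = most-used player.
filtered["depth_team"] = (
    filtered.groupby(["team", "position"])["offense_snaps"]
    .rank(method="first", ascending=False)
    .astype(int)
)
print(filtered[["player_name", "yards_per_carry", "depth_team"]])
```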

```python
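# Fantasy points calculation (PPR); weekly_offense is assumed to be the
# filtered weekly offense DataFrame from the steps above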
weekly_offense['fantasy_points_ppr'] = (
    weekly_offense['receptions'] * 1 +
    weekly_offense['rushing_yards'] * 0.1 +
    weekly_offense['receiving_yards'] * 0.1 +
    weekly_offense['rushing_touchdowns'] * 6 +
    weekly_offense['receiving_touchdowns'] * 6 +
    weekly_offense['passing_touchdowns'] * 4 -
    weekly_offense['interceptions'] * 2
)
```

## 3.3 Feature Engineering

**Reliability Index Creation:**
- Formula: `Reliability Index = Average Fantasy Points / (Standard Deviation + 1)`
- Position-specific z-score normalization
- Comprehensive value score combining performance and consistency
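
To make the formula concrete, here is a small sketch with two invented players who share the same average but differ in week-to-week spread. The player names and scores are hypothetical.

```python
import pandas as pd

# Two hypothetical players, both averaging 15 points per week.
weekly = pd.DataFrame({
    "player": ["Steady", "Steady", "Steady", "Streaky", "Streaky", "Streaky"],
    "fantasy_points_ppr": [14.0, 15.0, 16.0, 2.0, 15.0, 28.0],
})

summary = weekly.groupby("player")["fantasy_points_ppr"].agg(["mean", "std"])

# Reliability Index = Average Fantasy Points / (Standard Deviation + 1)
summary["reliability_index"] = summary["mean"] / (summary["std"] + 1)
print(summary)
```

The `+ 1` in the denominator keeps the index finite for a perfectly consistent player. A player with a single game has an undefined sample standard deviation, so such rows would need separate handling.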

**Key Features Identified:**
- Offensive snap count (opportunity metric)
- Rushing/receiving yards (volume metrics)
- Touchdown totals (efficiency metrics)
- Yards per carry/reception (skill metrics)

## 3.4 Model 1: Support Vector Regression (SVR)

**Justification:** SVR was chosen for its effectiveness in handling non-linear relationships in football data and its robustness to outliers, which is crucial given the high variance in player performance.

**Hyperparameters Before Tuning:**
- Kernel: RBF (Radial Basis Function)
- C (regularization): 10
- Epsilon: 1.0
- Gamma: 'scale'

**Hyperparameters After Tuning (Grid Search):** unchanged from the initial configuration
- Kernel: RBF
- C: 10  
- Epsilon: 1.0
- Gamma: 'scale'

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

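# Search space for the SVR hyperparameters
params = {
    'C': [1, 10, 50, 100],
    'epsilon': [0.1, 0.5, 1.0],
    'gamma': ['scale', 0.01, 0.1, 1],
    'kernel': ['rbf', 'poly']
}

# 5-fold cross-validated grid search scored by R²; X_train_scaled is the
# StandardScaler-transformed training split
grid = GridSearchCV(SVR(), params, cv=5, scoring='r2')
grid.fit(X_train_scaled, y_train)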
```

**Training Setup:** Used 2023 performance metrics to predict 2024 fantasy points:
- Training/Test Split: 80/20
- Feature scaling: StandardScaler for normalization
- Cross-validation: 5-fold CV for hyperparameter tuning
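
The setup above can be sketched end to end as follows. This version uses synthetic data in place of the real features, and it wraps the scaler and SVR in a scikit-learn `Pipeline`, which is a swapped-in design choice rather than the notebook's confirmed approach: it refits the scaler on each cross-validation training fold so no validation rows influence the scaling.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the 2023 features and 2024 targets (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 100 + 30 * X[:, 0] + 15 * X[:, 1] + rng.normal(scale=5, size=60)

# 80/20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scaling happens inside the pipeline, so it is refit per CV fold.
pipeline = Pipeline([("scaler", StandardScaler()), ("svr", SVR())])

# Same grid as above; the "svr__" prefix routes each parameter to the SVR step.
param_grid = {
    "svr__C": [1, 10, 50, 100],
    "svr__epsilon": [0.1, 0.5, 1.0],
    "svr__gamma": ["scale", 0.01, 0.1, 1],
    "svr__kernel": ["rbf", "poly"],
}

grid = GridSearchCV(pipeline, param_grid, cv=5, scoring="r2")
grid.fit(X_train, y_train)

print(grid.best_params_)
print(f"test R^2: {grid.score(X_test, y_test):.3f}")
```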

## 3.5 Model Enhancement: Position-Filtered Analysis

**Approach:** Applied SVR model specifically to RB1/RB2 level players (team depth chart positions 1-2) to improve accuracy for fantasy-relevant players.

**Filtering Criteria:**
- `depth_team < 3` for both 2023 and 2024 seasons
- Focus on each team's first- and second-string running backs only
- Maintained same SVR hyperparameters from global model
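
A minimal sketch of the filtering criterion, assuming one summary row per player per season. Apart from `depth_team`, the column names and the `player_id` join key are hypothetical.

```python
import pandas as pd

# Hypothetical per-season running back summaries.
rb_2023 = pd.DataFrame({
    "player_id": [1, 2, 3, 4],
    "depth_team": [1, 2, 3, 1],
    "rushing_yards": [1100, 600, 150, 900],
})
rb_2024 = pd.DataFrame({
    "player_id": [1, 2, 3, 5],
    "depth_team": [1, 3, 2, 1],
    "fantasy_points_ppr": [250.0, 90.0, 110.0, 200.0],
})

# Inner join keeps only players with rows in both seasons.
paired = rb_2023.merge(rb_2024, on="player_id", suffixes=("_2023", "_2024"))

# Require a top-two depth chart spot in both seasons.
top_two = paired[
    (paired["depth_team_2023"] < 3) & (paired["depth_team_2024"] < 3)
]
print(top_two)
```

Requiring the condition in both seasons shrinks an already small sample, so metrics on the filtered set rest on fewer players.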

# Results Section

This section presents the quantitative results from our data exploration, preprocessing, and modeling efforts.

## 4.1 Data Exploration Results

**Dataset Composition:**
- **Total Observations:** 215,243 across all files
- **Weekly Offensive Stats:** 58,629 observations with 47 features
- **Primary Focus Period:** 2023-2024 seasons for prediction modeling
- **Position Distribution:** Wide Receivers (largest segment), followed by Running Backs and Tight Ends

**Fantasy Points Consistency Analysis:**
- Quarterbacks: Lowest variance (most consistent scoring)
- Wide Receivers: Higher variance (boom-or-bust potential)
- Running Backs: Moderate consistency
- Tight Ends: Highest variance in weekly scoring

## 4.2 Data Preprocessing Results

**Target Variable Creation Success:**
- Successfully calculated fantasy_points_ppr for all player-weeks
- Identified 47 total players with both 2023 and 2024 data for running back analysis
- Created reliable training dataset with complete feature sets

**Data Quality Improvements:**
- Eliminated missing values through strategic filtering
- Standardized metrics across positions and seasons
- Maintained data integrity while focusing on fantasy-relevant players

## 4.3 Model 1: Support Vector Regression Results

**Overall Model Performance:**
```
Training RMSE: 32.45
Test RMSE: 38.72
Training R²: 0.743
Test R²: 0.681
```
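
For reference, metrics of this kind can be computed with scikit-learn as sketched below. The arrays are toy values, not model output from this project.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Toy actual and predicted season totals (illustration only).
y_true = np.array([200.0, 150.0, 100.0, 50.0])
y_pred = np.array([190.0, 160.0, 110.0, 40.0])

# RMSE is the square root of the mean squared error, in fantasy points.
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

# R^2 is the share of variance in y_true explained by the predictions.
r2 = r2_score(y_true, y_pred)

print(f"RMSE: {rmse:.2f}")
print(f"R^2: {r2:.3f}")
```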

**Feature Importance Insights:**
- Rushing yards showed strongest correlation with future fantasy performance
- Offensive snap count proved to be crucial opportunity metric
- Touchdown regression was evident (high TD scorers in 2023 tended to decline in 2024)

**Model Validation:**
- The modest gap between training and test performance suggests the model is not severely overfit
- R² values suggest the model captures meaningful patterns in the data
- RMSE values are reasonable given the inherent volatility of football performance

## 4.4 Position-Filtered Model Results

**Enhanced Performance for RB1/RB2 Analysis:**
```
Filtered Model Training RMSE: 28.91
Filtered Model Test RMSE: 34.67
Filtered Model Training R²: 0.801
Filtered Model Test R²: 0.739
```

**Key Improvements:**
- 10.5% improvement in test RMSE
- 8.5% improvement in test R²
- Better prediction accuracy for fantasy-relevant players
- Reduced noise from lower-depth-chart and situational players
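
The relative improvements can be checked directly from the reported test metrics. This is plain arithmetic on the numbers in the two results blocks; the helper name is made up for the example.

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after`."""
    return (after - before) / before


# Test metrics reported for the overall and position-filtered models.
rmse_change = relative_change(38.72, 34.67)
r2_change = relative_change(0.681, 0.739)

print(f"test RMSE change: {rmse_change:+.1%}")
print(f"test R^2 change: {r2_change:+.1%}")
```

The two models are scored on different sets of players, so the comparison is indicative rather than strictly like-for-like.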

**Player-Level Predictions:**
- Successfully generated individual player predictions with confidence intervals
- Identified both overperformers and underperformers relative to model expectations
- Validated predictions show reasonable accuracy for established NFL players

# Discussion Section

This section analyzes our methodology, interprets results, and critically evaluates the project's strengths and limitations.

## 5.1 Data Exploration and Feature Selection

Our approach to NFL fantasy football prediction began with a comprehensive understanding that different positions require fundamentally different analytical approaches. The positional distribution analysis revealed the scarcity principle in fantasy football - positions with fewer elite players (like Running Backs and Tight Ends) carry premium value due to steep drop-offs in production.

**Strengths of Our Approach:**
- The reliability index successfully balanced average performance with consistency, addressing the fantasy football reality that reliable players are often more valuable than volatile high-ceiling options
- Position-specific analysis acknowledged that quarterback performance differs fundamentally from running back evaluation
- Focus on opportunity metrics (snap counts) captured the football reality that volume often predicts future success

**Limitations Identified:**
- Our dataset, while comprehensive, doesn't capture contextual factors like coaching changes, team scheme alterations, or injury history
- The 2023-2024 prediction window is relatively narrow - football careers and team situations can change dramatically year-to-year
- Missing variables like target share, red zone opportunities, and game script factors limit predictive power

## 5.2 Model Performance and Interpretation

**Support Vector Regression Success:**
The SVR model's performance (Test R² = 0.681) is encouraging given the inherent unpredictability of NFL performance. Football contains numerous variables beyond statistical history - injuries, coaching decisions, team chemistry, and random variance all impact player performance.

**Why These Results Are Believable:**
- The gap between training and test performance (R² difference of 0.062) suggests appropriate model complexity without severe overfitting
- RMSE values of ~35-40 fantasy points are a plausible typical error for season-long fantasy totals
- The model successfully identified touchdown regression, a well-established phenomenon in football analytics

**Position-Filtered Enhancement:**
The improved performance when focusing on RB1/RB2 players (Test R² = 0.739) supports our hypothesis that established starters follow more predictable patterns than fringe roster players. This aligns with football knowledge: players at the top of the depth chart have more consistent workloads and clearer roles.

## 5.3 Critical Analysis and Shortcomings

**Model Limitations:**
1. **Sample Size Constraints:** Only 47 running backs had complete 2023-2024 data, limiting model generalizability
2. **Feature Completeness:** Missing crucial predictive factors like team offensive line quality, coaching stability, and injury propensity
3. **Temporal Assumptions:** The model assumes 2023 performance patterns will persist into 2024, ignoring potential regime changes or player development

**Real-World Application Challenges:**
- Fantasy football operates in a dynamic environment where player values fluctuate based on weekly performance, injuries, and matchups
- Our annual prediction model doesn't capture in-season adjustments that fantasy managers must make
- The model treats all rushing yards equally, but red zone carries and goal-line opportunities are disproportionately valuable

**Statistical Validity:**
While our R² values are reasonable for sports prediction, they highlight the fundamental challenge of forecasting human athletic performance. Even sophisticated models struggle with the chaos inherent in NFL games - fumbles, weather, referee decisions, and countless other factors introduce noise that no statistical model can fully capture.

## 5.4 Football Knowledge vs. Data Science

Our project tested whether machine learning could systematize the intuitive player evaluation that dedicated football fans develop. The results suggest partial success - the model captures broad patterns (volume predicts production, elite players are more consistent) while missing nuanced factors that experienced fantasy players intuitively consider.

**Where Models Excel:**
- Identifying touchdown regression and efficiency mean reversion
- Quantifying the value of opportunity (snap counts)
- Removing emotional bias from player evaluation

**Where Human Knowledge Remains Superior:**
- Contextualizing coaching changes and scheme fits
- Evaluating injury impact and recovery timelines
- Understanding team dynamics and locker room situations

# Conclusion Section

Our NFL fantasy football prediction project represents a successful integration of sports analytics, machine learning, and genuine football passion. While we set out to build a "fantasy football autodrafter," we discovered something more valuable: a framework for enhancing rather than replacing football knowledge.

## Key Achievements

**Technical Success:**
- Developed a running back predictive model with meaningful accuracy (Test R² = 0.739 for filtered data)
- Created comprehensive reliability indices that balance performance and consistency
- Successfully identified and quantified known football phenomena like touchdown regression

**Methodological Insights:**
- Demonstrated the importance of position-specific modeling in sports analytics
- Showed that filtering for RB1/RB2-level players improves prediction accuracy for fantasy-relevant decisions
- Validated the principle that opportunity metrics (snap counts) are crucial predictive features

**Football Knowledge Integration:**
- Our models captured patterns that align with established football wisdom while quantifying relationships that are difficult for humans to assess precisely
- The reliability index successfully operationalized the fantasy football concept of "floor vs. ceiling" players

## What We Would Do Differently

**Data Enhancement:**
- Incorporate advanced metrics like target share, air yards, and red zone opportunities
- Include contextual variables: coaching tenure, offensive line grades, strength of schedule
- Expand the dataset to include multiple prediction seasons for more robust validation

**Model Development:**
- Explore ensemble methods combining multiple model types
- Implement weekly prediction models to capture in-season dynamics
- Develop position-specific models for all fantasy-relevant positions (QB, WR, TE)

**Real-World Application:**
- Build dynamic models that update with weekly performance
- Integrate injury probability and impact assessment
- Develop draft strategy optimization using predicted values

## Broader Impact and Future Directions

**Professional Applications:**
Our methodologies extend beyond fantasy football into team management, player development, and contract negotiations. NFL teams could leverage similar approaches for:
- Identifying undervalued free agents and draft prospects
- Predicting player decline and optimal contract timing
- Developing data-driven player development programs

**Technical Innovation:**
The challenges we encountered - handling seasonal variance, balancing multiple evaluation criteria, and predicting human performance - have applications in:
- Financial forecasting with similar volatility
- Employee performance evaluation balancing consistency vs. peak performance
- Healthcare outcomes with variable patient responses

## Final Thoughts

This project reinforced our belief that the most powerful sports analytics don't replace human knowledge but enhance it. Our models work best when interpreted by people who understand football's complexities. The future of fantasy football - and sports analytics generally - lies not in perfect prediction algorithms but in tools that help passionate fans make more informed decisions.

Every Sunday remains a test of our predictions, and that immediate feedback loop makes this work endlessly compelling. Whether our model correctly identifies a breakout running back or misses a quarterback's decline, we learn something valuable about the beautiful chaos of professional football.

The game we love remains wonderfully unpredictable, but now we have better tools for understanding its patterns.

# Statement of Collaboration

**Team Member 1:**
Name: Jackie Wang
Title: Programmer / Developer / Writer
Contribution: Led overall project design and methodology. Primary contributor to data exploration, feature engineering. Responsible for comprehensive data preprocessing, reliability index creation, and position-specific filtering approaches. Major contributor to technical writing and documentation. Managed project timeline and coordinated team efforts across all milestones.

**Team Member 2:**
Name: Akul Kudari
Title: Project Team Lead
Contribution: Key contributor to model implementation and hyperparameter tuning. Developed visualization frameworks and figure generation pipeline. Significant contributions to data preprocessing workflows and cross-validation strategies. Assisted with technical documentation and code organization. Focused on ensuring reproducibility and code quality throughout the project.

