# NFL Draft Analytics: Who Actually Becomes Elite?

This notebook investigates how well the NFL Draft predicts future All-Pro success.  
Using draft data and First-Team All-Pro selections from 2010–2024 (sourced from Pro Football Reference),
we calculate **All-Pro hit rates** (the share of drafted players who earned at least one First-Team All-Pro selection) by round, position, and pick number.

**Research questions:**
1. How steeply does All-Pro probability drop from Round 1 to later rounds?
2. Which positions produce All-Pros most reliably?
3. Are there "value" positions where mid-round picks outperform expectations?

**Methodology note:** We restrict the analysis cohort to draft classes **2010–2021**,
so that every player has had at least four NFL seasons (the rookie year plus three more) by the end of the 2024 season.
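The cutoff is simple arithmetic: a class drafted in year *y* has had 2024 - *y* + 1 seasons through 2024, counting its rookie year. A minimal sketch (the names `LAST_SEASON`, `COHORT_CLASSES`, and `seasons_of_exposure` are hypothetical, not part of `src/`) makes the window explicit:

```python
# Hypothetical helper, not part of src/: how many seasons each draft class
# has had to earn First-Team All-Pro honors by the end of the data window.
LAST_SEASON = 2024                  # final season covered by the All-Pro data
COHORT_CLASSES = range(2010, 2022)  # draft classes 2010-2021 inclusive


def seasons_of_exposure(draft_year: int, last_season: int = LAST_SEASON) -> int:
    """Seasons played through `last_season`, counting the rookie year."""
    return last_season - draft_year + 1


exposure = {year: seasons_of_exposure(year) for year in COHORT_CLASSES}
print(exposure[2010], exposure[2021])  # 15 4
print(min(exposure.values()))          # 4
```

Every class clears the bar, but the window is uneven: the 2010 class has had 15 seasons to earn All-Pro honors versus 4 for the 2021 class, so older classes get more chances.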

## 1. Setup

In [None]:
import sys
import pathlib

# Add the project root to sys.path so we can import from src/.
# Assumes the notebook is run from a direct subdirectory of the project root (e.g. notebooks/).
project_root = pathlib.Path().resolve().parent
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

import pandas as pd
import plotly.io as pio

pio.renderers.default = "notebook"

from src.scraper import scrape_all_years
from src.cleaner import clean_draft_data, clean_allpro_data
from src.merger import merge_datasets, filter_analysis_cohort, validate_known_allpros
from src.analyzer import (
    compute_hit_rates_by_round,
    compute_hit_rates_by_position,
    compute_hit_rates_by_round_and_position,
    compute_hit_rate_by_pick_number,
    compute_value_table,
)
from src.charts import (
    bar_chart_by_round,
    bar_chart_by_position,
    heatmap_position_round,
    scatter_by_pick_number,
    value_table_chart,
)

DATA_DIR = project_root / "data"
RAW_DIR = DATA_DIR / "raw"
PROCESSED_DIR = DATA_DIR / "processed"

YEARS = list(range(2010, 2025))

print(f"Project root: {project_root}")
print(f"Years: {YEARS[0]}–{YEARS[-1]}")

## 2. Scrape Data from Pro Football Reference

Set `RESCRAPE = True` to force a fresh download.  
Leave it `False` to use cached CSVs (much faster).
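The cache check in the next cell only counts CSVs, so it cannot say which season is missing. A small sketch that lists the gaps, assuming (hypothetically) that the scraper writes one `draft_{year}.csv` and one `allpro_{year}.csv` per season; adjust `PATTERNS` to whatever `scrape_all_years` actually produces:

```python
import pathlib
import tempfile

# Hypothetical file-name patterns; the real names depend on src/scraper.py.
PATTERNS = ("draft_{year}.csv", "allpro_{year}.csv")


def missing_raw_files(raw_dir: pathlib.Path, years) -> list:
    """Return the expected raw CSV names that are not present in raw_dir."""
    return [
        pattern.format(year=year)
        for year in years
        for pattern in PATTERNS
        if not (raw_dir / pattern.format(year=year)).exists()
    ]


# Usage example against a throwaway directory with one file removed.
with tempfile.TemporaryDirectory() as tmp:
    raw = pathlib.Path(tmp)
    for year in (2023, 2024):
        for pattern in PATTERNS:
            (raw / pattern.format(year=year)).touch()
    (raw / "allpro_2024.csv").unlink()
    missing = missing_raw_files(raw, [2023, 2024])
    print(missing)  # ['allpro_2024.csv']
```

In the notebook this would be called as `missing_raw_files(RAW_DIR, YEARS)`.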

In [None]:
RESCRAPE = False  # ← flip to True to re-download everything

if RESCRAPE:
    scrape_all_years(YEARS, DATA_DIR, delay=3.0)
else:
    raw_files = list(RAW_DIR.glob("*.csv"))
    print(f"Found {len(raw_files)} cached CSV files in {RAW_DIR}")
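    expected_files = 2 * len(YEARS)  # assumes one draft CSV and one All-Pro CSV per season
    if len(raw_files) < expected_files:
        print(f"⚠️  Fewer than {expected_files} files found: consider setting RESCRAPE = True")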

## 3. Clean the Data

In [None]:
draft_df = clean_draft_data(RAW_DIR, PROCESSED_DIR / "draft_cleaned.csv")
allpro_df = clean_allpro_data(RAW_DIR, PROCESSED_DIR / "allpro_cleaned.csv")

print(f"\nDraft data: {len(draft_df):,} rows, {draft_df['year'].nunique()} seasons")
print(f"All-Pro data: {len(allpro_df):,} rows, {allpro_df['year'].nunique()} seasons")
draft_df.head(3)

## 4. Merge and Validate
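The merge logic lives in `src/merger.py` and is not shown here. One detail worth getting right in any version of it: a drafted player should be flagged only by an All-Pro season in or after his draft year, so an earlier player with the same name cannot leak onto a later draftee. A minimal sketch, assuming All-Pro rows carry the season in a `year` column and using a deliberately simple name key (the real merge uses fuller normalization, per the Caveats section); `name_key`, `flag_allpros`, and the toy rows are hypothetical:

```python
def name_key(name: str) -> str:
    """Very simple match key: lowercase with collapsed whitespace."""
    return " ".join(name.lower().split())


def flag_allpros(draft_rows, allpro_rows):
    """Return copies of draft_rows with is_allpro = 1 when a First-Team
    selection for the same name key falls in or after the draft year."""
    seasons_by_name = {}
    for row in allpro_rows:
        seasons_by_name.setdefault(name_key(row["player_name"]), []).append(row["year"])
    flagged = []
    for row in draft_rows:
        seasons = seasons_by_name.get(name_key(row["player_name"]), [])
        hit = any(season >= row["year"] for season in seasons)
        flagged.append({**row, "is_allpro": int(hit)})
    return flagged


# Toy rows: "Player Two" shares a name with an earlier (2013) All-Pro.
draft = [
    {"year": 2015, "player_name": "Player One"},
    {"year": 2016, "player_name": "Player  Two"},
    {"year": 2018, "player_name": "Player Three"},
]
allpro = [
    {"year": 2019, "player_name": "player one"},
    {"year": 2013, "player_name": "Player Two"},
]
result = flag_allpros(draft, allpro)
print([row["is_allpro"] for row in result])  # [1, 0, 0]
```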

In [None]:
merged_df = merge_datasets(draft_df, allpro_df)
cohort_df = filter_analysis_cohort(merged_df)

# Save merged dataset for the Streamlit app
cohort_df.to_csv(PROCESSED_DIR / "merged_dataset.csv", index=False)

print(f"Analysis cohort: {len(cohort_df):,} players from {cohort_df['year'].min()}–{cohort_df['year'].max()}")
print(f"Total All-Pros flagged: {cohort_df['is_allpro'].sum()}")
print(f"Overall hit rate: {cohort_df['is_allpro'].mean():.2%}")

In [None]:
validate_known_allpros(cohort_df)

## 5. Exploratory Data Analysis

In [None]:
print("=== Players per Round ===")
print(cohort_df.groupby("round")["player_name"].count().to_string())

print("\n=== Top 10 Positions by Frequency ===")
print(cohort_df["position"].value_counts().head(10).to_string())

In [None]:
print("=== All-Pro players in cohort ===")
allpros_in_cohort = cohort_df[cohort_df["is_allpro"] == 1][
    ["year", "round", "pick", "player_name", "position", "team"]
].sort_values(["year", "pick"])
print(allpros_in_cohort.to_string(index=False))

## 6. Analysis & Visualizations

### Chart 1: All-Pro Hit Rate by Draft Round

The most fundamental question: does draft position predict All-Pro success?

In [None]:
round_rates = compute_hit_rates_by_round(cohort_df)
print(round_rates.assign(hit_rate=round_rates["hit_rate"].map("{:.2%}".format)).to_string(index=False))
fig1 = bar_chart_by_round(round_rates)
fig1.show()

**Interpretation:**  
Round 1 picks are by far the most likely to earn All-Pro honors.  The hit rate drops
sharply in Round 2 and continues falling through later rounds, with Rounds 6–7 near zero.
This validates the conventional wisdom that the first round is where elite talent concentrates —
but it also shows that even in Round 1, the *majority* of picks never make an All-Pro team.
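The hit rate behind this chart is All-Pros divided by players drafted, computed per round. `compute_hit_rates_by_round` is not shown here, so this is only a sketch of that calculation on made-up rows that use the same `round` and `is_allpro` columns as `cohort_df`:

```python
from collections import Counter


def hit_rates_by_round(rows):
    """Return {round: (players, allpros, hit_rate)} from rows with
    'round' and 'is_allpro' (0/1) keys."""
    players = Counter(row["round"] for row in rows)
    allpros = Counter(row["round"] for row in rows if row["is_allpro"])
    return {
        rnd: (players[rnd], allpros[rnd], allpros[rnd] / players[rnd])
        for rnd in sorted(players)
    }


# Toy cohort: 10 Round 1 picks with 2 All-Pros, 10 Round 4 picks with none.
toy = (
    [{"round": 1, "is_allpro": int(i < 2)} for i in range(10)]
    + [{"round": 4, "is_allpro": 0} for _ in range(10)]
)
for rnd, (n, k, rate) in hit_rates_by_round(toy).items():
    print(f"Round {rnd}: {k}/{n} = {rate:.0%}")
# Round 1: 2/10 = 20%
# Round 4: 0/10 = 0%
```

In pandas the same three numbers come from `cohort_df.groupby("round")["is_allpro"].agg(["count", "sum", "mean"])`.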

---

### Chart 2: All-Pro Hit Rate by Position

In [None]:
pos_rates = compute_hit_rates_by_position(cohort_df, min_players=20)
fig2 = bar_chart_by_position(pos_rates)
fig2.show()

**Interpretation:**  
Quarterback and the pass-rushing defensive-line positions (DE and DT) tend to have the highest All-Pro hit rates.
This makes intuitive sense: QB is the most impactful position, and pass rushers are rewarded
heavily in modern NFL schemes. Wide receiver and cornerback also rank well.
Offensive line and fullback/running back positions often show lower rates; for interior linemen,
this is partly because All-Pro voting has historically undervalued them.
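The `min_players=20` argument in the cell above keeps thinly drafted positions off the chart, where one All-Pro would swing the rate wildly. A hypothetical sketch of that filter (the real `compute_hit_rates_by_position` may differ), on made-up rows:

```python
from collections import defaultdict


def hit_rates_by_position(rows, min_players=20):
    """Return [(position, players, hit_rate)] sorted by hit rate, highest first,
    dropping positions with fewer than min_players drafted players."""
    counts = defaultdict(lambda: [0, 0])  # position -> [players, allpros]
    for row in rows:
        counts[row["position"]][0] += 1
        counts[row["position"]][1] += row["is_allpro"]
    table = [(pos, n, k / n) for pos, (n, k) in counts.items() if n >= min_players]
    return sorted(table, key=lambda item: item[2], reverse=True)


# Toy data: 25 DEs (3 All-Pros), 30 Gs (1 All-Pro), 5 FBs (1 All-Pro).
toy = (
    [{"position": "DE", "is_allpro": int(i < 3)} for i in range(25)]
    + [{"position": "G", "is_allpro": int(i < 1)} for i in range(30)]
    + [{"position": "FB", "is_allpro": int(i < 1)} for i in range(5)]
)
for pos, n, rate in hit_rates_by_position(toy):
    print(f"{pos}: {rate:.1%} of {n}")
# DE: 12.0% of 25
# G: 3.3% of 30
```

Without the threshold, the 5-player FB group (1 All-Pro, 20%) would top the chart.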

---

### Chart 3: Heatmap — Position × Round

In [None]:
pivot = compute_hit_rates_by_round_and_position(cohort_df)
fig3 = heatmap_position_round(pivot)
fig3.show()

**Interpretation:**  
The heatmap shows that high hit rates are concentrated in Round 1 for most positions.
Gray cells (n<10) remind us that later-round position samples are small and rates unreliable.
Notably, QB has a very high Round 1 rate but almost zero in other rounds — elite QBs are almost
always first-round picks.
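A confidence interval makes the small-cell problem concrete. This sketch uses the Wilson score interval, a standard interval for binomial proportions that the notebook's charts are not known to compute, with z = 1.96 for roughly 95% coverage:

```python
import math


def wilson_interval(hits: int, n: int, z: float = 1.96):
    """Wilson score interval for the binomial proportion hits / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = hits / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the edges.
    return max(0.0, center - half), min(1.0, center + half)


# One All-Pro among 8 players vs. 10 among 80: same 12.5% point estimate.
small = wilson_interval(1, 8)
large = wilson_interval(10, 80)
print(f"1/8:   {small[0]:.1%} to {small[1]:.1%}")
print(f"10/80: {large[0]:.1%} to {large[1]:.1%}")
```

Both cells share a 12.5% point estimate, but the 1-of-8 interval is roughly three times as wide as the 10-of-80 one.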

---

### Chart 4: Hit Rate by Overall Pick Number

In [None]:
pick_rates = compute_hit_rate_by_pick_number(cohort_df)
fig4 = scatter_by_pick_number(pick_rates)
fig4.show()

**Interpretation:**  
The smoothed trend line reveals a steep decline across the first ~32 picks (Round 1),
then a more gradual slope through Rounds 2–4, before flattening near zero.
The cliff at pick ~32 (the Round 1/2 boundary) is one of the most striking patterns:
picks at the end of Round 1 are substantially more likely to become All-Pros than picks at the start of Round 2.
With one player per pick number per draft class (about 12 players per pick), this comparison is more trustworthy for ranges of picks than for any single pick.
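Pooling several picks on each side of the boundary gives the cliff comparison more data than any single pick number can. This sketch uses round labels rather than a fixed pick 32, since Round 1 is occasionally shorter when a team forfeits its pick; `boundary_hit_rates` and the window size `k` are hypothetical choices:

```python
from collections import defaultdict


def _rate(group):
    """Share of rows flagged as All-Pro; NaN for an empty group."""
    return sum(row["is_allpro"] for row in group) / len(group) if group else float("nan")


def boundary_hit_rates(rows, k=8):
    """Pool the last k Round 1 picks and the first k Round 2 picks of every
    draft class and return (late_r1_rate, early_r2_rate)."""
    by_class = defaultdict(lambda: {1: [], 2: []})
    for row in rows:
        if row["round"] in (1, 2):
            by_class[row["year"]][row["round"]].append(row)
    late_r1, early_r2 = [], []
    for picks in by_class.values():
        late_r1 += sorted(picks[1], key=lambda r: r["pick"])[-k:]
        early_r2 += sorted(picks[2], key=lambda r: r["pick"])[:k]
    return _rate(late_r1), _rate(early_r2)


# Toy class: 32 Round 1 picks and 32 Round 2 picks; picks 30, 31, and 40 hit.
toy = [
    {"year": 2015, "round": 1 if pick <= 32 else 2, "pick": pick,
     "is_allpro": int(pick in (30, 31, 40))}
    for pick in range(1, 65)
]
late, early = boundary_hit_rates(toy, k=8)
print(f"last 8 of R1: {late:.1%}, first 8 of R2: {early:.1%}")
# last 8 of R1: 25.0%, first 8 of R2: 12.5%
```

With 12 draft classes, `k=8` pools 96 players on each side instead of 12.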

---

### Chart 5: Best-Value Positions (Rounds 3–5 vs Round 1)

In [None]:
value_df = compute_value_table(cohort_df)
fig5 = value_table_chart(value_df)
fig5.show()

**Interpretation:**  
Positions with a high Late/R1 Ratio (the Rounds 3–5 hit rate divided by the Round 1 hit rate) are those where mid-round picks punch above their weight.
A ratio of 0.50 or higher means mid-round picks achieve at least half the All-Pro rate of first-rounders,
which is exceptional value. These positions represent opportunities for smart franchises to find
elite talent without spending first-round draft capital.
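Assuming the Late/R1 Ratio is the Rounds 3–5 hit rate divided by the Round 1 hit rate (as the chart heading implies; `compute_value_table` is not shown), this sketch shows the calculation and why a small Round 1 count makes it jumpy. The `min_r1` guard is a hypothetical addition:

```python
def late_r1_ratio(r1_hits, r1_n, mid_hits, mid_n, min_r1=10):
    """Rounds 3-5 hit rate divided by Round 1 hit rate, or None when the
    Round 1 sample is too small or has no All-Pros to divide by."""
    if r1_n < min_r1 or mid_n == 0 or r1_hits == 0:
        return None
    return (mid_hits / mid_n) / (r1_hits / r1_n)


# 4 of 40 first-rounders vs. 3 of 60 mid-rounders: 10% vs. 5%.
print(late_r1_ratio(4, 40, 3, 60))  # 0.5
# Same mid-round record, one fewer Round 1 All-Pro: the ratio rises to about 0.67.
print(late_r1_ratio(3, 40, 3, 60))
```

A position with a 2% Round 1 rate and a 1% mid-round rate also scores 0.5, so the ratio reads best alongside the raw rates.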

---

## 7. Key Findings

1. **Round 1 dominance is real, but not absolute.** First-round picks produce All-Pros at ~3–5× the rate of later rounds, but even Round 1 has only a ~15–25% hit rate. Most first-rounders never earn All-Pro honors.

2. **The Round 1/2 cliff is sharp.** All-Pro probability drops dramatically between the end of Round 1 and the start of Round 2, more steeply than the typical pick-to-pick decline within Round 1 itself.

3. **Position matters.** QB and the pass-rushing defensive-line positions (DE, DT) produce the highest All-Pro rates. This reflects both positional value and the fact that all-world QBs and pass rushers dominate voting.

4. **Late-round value exists.** Certain positions show meaningful All-Pro rates even in rounds 3–5. Identifying these positions is where smart drafting can create competitive advantages.

5. **Rounds 6–7 are speculative.** The All-Pro hit rate in rounds 6–7 is near zero across almost all positions. These picks are best viewed as roster depth and special-teams contributors.

## 8. Caveats

- **Sample size:** 12 draft classes × ~256 picks = ~3,000 players. Position-round cells can be small.
- **First-Team All-Pro only:** We use only First-Team selections, the highest bar. Including Second-Team would raise hit rates but muddy the "elite" definition.
- **Name matching:** Normalized string matching may miss players with unusual name variants. A manual override dictionary could improve edge cases.
- **Era effects:** Our cohort (2010–2021) coincides with the start of the analytics era, and drafting strategies have evolved since, so these rates may not carry over to future draft classes.
- **Position changes:** We use position-at-draft-time. Players who switched positions (e.g., OT → TE) are tracked under their draft position.
- **Undrafted free agents excluded:** UDFA All-Pros are not counted (they have no draft round or pick), so the hit rates describe drafted players only and say nothing about how often undrafted players reach All-Pro status.
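The manual override dictionary suggested under **Name matching** could sit on top of normalization like this. Everything here is a hypothetical sketch: the `NAME_OVERRIDES` entry is a placeholder rather than a real case, and the normalization in `src/merger.py` may differ.

```python
import re
import unicodedata

# Placeholder entries: normalized variant spelling -> canonical match key.
NAME_OVERRIDES = {
    "bobby example": "robert example",
}

SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation, and drop generational suffixes."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    cleaned = re.sub(r"[^a-z ]", "", ascii_name.lower())
    return " ".join(token for token in cleaned.split() if token not in SUFFIXES)


def match_key(name: str) -> str:
    """Normalized name, redirected through the manual override table."""
    key = normalize(name)
    return NAME_OVERRIDES.get(key, key)


print(match_key("José Exámple Jr."))  # jose example
print(match_key("Bobby Example"))     # robert example
```

Keying overrides on the normalized form means a single entry covers every punctuation or accent variant of the same spelling.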