# 📊 Player-Week Statistics with Uncertainty

This notebook computes mean projections and variance estimates for each player-week combination.

**Input:** `projections_with_sleeper` table (from `match_projections_to_sleeper.ipynb`)

**Output:** `player_week_stats` table with:
- `mu` = mean projection across sources
- `sigma` = uncertainty (from source disagreement + position baseline)
- `var` = variance (sigma²)

**Use case:** Storing the variance makes team-level variance straightforward: under an independence assumption, it is the sum of the individual variances. Correlations can be added later.


## ⚙️ Configuration

Adjust these parameters to control uncertainty estimates:


In [1]:
# =============================================================================
# CONFIGURATION
# =============================================================================

# ⚠️ SET THIS WEEK NUMBER ⚠️
WEEK = 11  # ⬅️ UPDATE THIS EACH WEEK!

# How much source disagreement inflates uncertainty
# Higher alpha = more weight on projection variance across sources
ALPHA = 2.0

# Baseline outcome randomness weight
# Higher beta = more weight on inherent position variance
BETA = 1.0

# Baseline standard deviation by position (points)
# Represents inherent outcome variance for each position
POS_SIGMA = {
    'QB': 7.0,   # Quarterbacks: moderate variance
    'RB': 9.0,   # Running backs: high variance (usage-dependent)
    'WR': 10.0,  # Wide receivers: highest variance (big-play dependent)
    'TE': 8.0,   # Tight ends: moderate-high variance
    'K': 4.0,    # Kickers: low variance (consistent scoring)
    'DST': 7.0,  # Defense: moderate variance
}

# Default for unknown positions
DEFAULT_POS_SIGMA = 8.0

print("✅ Configuration loaded")
print(f"\n📅 Computing stats for WEEK {WEEK} only")
print("\nUncertainty parameters:")
print(f"  Alpha (source disagreement weight): {ALPHA}")
print(f"  Beta (baseline randomness weight): {BETA}")
print(f"  Position sigmas: {POS_SIGMA}")
print(f"  Default sigma: {DEFAULT_POS_SIGMA}")


✅ Configuration loaded

📅 Computing stats for WEEK 11 only

Uncertainty parameters:
  Alpha (source disagreement weight): 2.0
  Beta (baseline randomness weight): 1.0
  Position sigmas: {'QB': 7.0, 'RB': 9.0, 'WR': 10.0, 'TE': 8.0, 'K': 4.0, 'DST': 7.0}
  Default sigma: 8.0


## 📦 Setup & Imports


In [2]:
import sqlite3
import sys
from pathlib import Path

# Make local modules importable: the directory two levels up and the sibling 'scrapers' directory
sys.path.append(str(Path().absolute().parent.parent))
sys.path.append(str(Path().absolute().parent / 'scrapers'))

import pandas as pd
import numpy as np
from datetime import datetime, timezone
import re

print("✓ Imports successful")

# Get absolute paths to databases
NOTEBOOK_DIR = Path().absolute()
BACKEND_DIR = NOTEBOOK_DIR.parent
DB_PROJ_PATH = str(BACKEND_DIR / "data" / "databases" / "projections.db")
DB_LEAGUE_PATH = str(BACKEND_DIR / "data" / "databases" / "league.db")


✓ Imports successful


## 📥 Load Data

Load projections from the `projections_with_sleeper` table.
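The load cell relies on a `?` placeholder so the week label is passed as a parameter rather than pasted into the SQL text. A minimal, self-contained sketch of the same pattern, assuming a toy in-memory `projections_with_sleeper` table with only a few of the real columns and made-up rows; `contextlib.closing` is used because a `sqlite3` connection's own context manager commits but does not close.

```python
import sqlite3
from contextlib import closing

import pandas as pd

with closing(sqlite3.connect(":memory:")) as conn:
    # Toy table and rows (hypothetical, for illustration only)
    conn.execute("""
        CREATE TABLE projections_with_sleeper (
            sleeper_player_id TEXT,
            source_website TEXT,
            week TEXT,
            projected_points REAL
        )
    """)
    conn.executemany(
        "INSERT INTO projections_with_sleeper VALUES (?, ?, ?, ?)",
        [
            ("4984", "espn.com", "Week 11", 24.0),
            ("4984", "sleeper.com", "Week 11", 26.0),
            ("4984", "espn.com", "Week 10", 22.0),  # other week: filtered out
            (None, "espn.com", "Week 11", 5.0),     # unmatched player: filtered out
        ],
    )
    week_str = "Week 11"
    df = pd.read_sql_query(
        """
        SELECT sleeper_player_id, source_website, week, projected_points
        FROM projections_with_sleeper
        WHERE sleeper_player_id IS NOT NULL AND week = ?
        """,
        conn,
        params=[week_str],
    )

print(df)
```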


In [3]:
print(f"Loading projections for Week {WEEK}...\n")

# Connect to database
conn = sqlite3.connect(DB_PROJ_PATH)

# Load projections with Sleeper IDs for the specified week ONLY
week_str = f"Week {WEEK}"
query = """
    SELECT 
        sleeper_player_id,
        player_first_name || ' ' || player_last_name as player_name,
        position,
        team,
        source_website,
        week,
        projected_points,
        match_method
    FROM projections_with_sleeper
    WHERE sleeper_player_id IS NOT NULL
      AND week = ?
"""

df = pd.read_sql_query(query, conn, params=[week_str])

print(f"✅ Loaded {len(df)} projections for Week {WEEK}")
print(f"\nData shape: {df.shape}")
print(f"Unique players: {df['sleeper_player_id'].nunique()}")
print(f"Sources: {df['source_website'].nunique()}")

# Show sample
print(f"\nSample data:")
print(df.head(10).to_string(index=False))

conn.close()


Loading projections for Week 11...

✅ Loaded 1617 projections for Week 11

Data shape: (1617, 8)
Unique players: 574
Sources: 5

Sample data:
sleeper_player_id         player_name position team   source_website    week  projected_points match_method
             9226       De'Von Achane       RB  MIA         espn.com Week 11              27.9    automatic
             4034 Christian McCaffrey       RB   SF         espn.com Week 11              26.6    automatic
             4984          Josh Allen       QB  BUF      sleeper.com Week 11              26.0    automatic
             4984          Josh Allen       QB  BUF  fantasypros.com Week 11              25.0    automatic
             4881       Lamar Jackson       QB  BAL      sleeper.com Week 11              24.6    automatic
            11564          Drake Maye       QB   NE  fantasypros.com Week 11              24.5    automatic
             6904         Jalen Hurts       QB  PHI      sleeper.com Week 11              24.1    au

## 🧹 Data Cleaning

1. Parse week labels (e.g. `'Week 11'`) to integers
2. Drop rows with null `projected_points`
3. Drop rows whose week could not be parsed
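As an alternative to the row-wise `apply`, the week can be parsed in one vectorized step with `Series.str.extract`. Unlike `re.sub(r'\D', '', ...)`, which concatenates every digit in the string, this sketch keeps only the first run of digits, so a hypothetical label such as `'Week 9 (2025)'` still parses as 9. Labels with no digits become missing values and can be dropped exactly as in the cell below.

```python
import pandas as pd

# Hypothetical week labels, including some that cannot be parsed
weeks = pd.Series(["Week 9", "Week 11", "Week 9 (2025)", None, "Bye"])

# Take the first run of digits; None and digit-free labels become NaN
week_int = pd.to_numeric(weeks.str.extract(r"(\d+)", expand=False), errors="coerce")

# Nullable integer dtype keeps missing values without turning weeks into floats
week_int = week_int.astype("Int64")
print(week_int.tolist())
```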


In [4]:
print("Cleaning data...\n")

initial_rows = len(df)

# Parse week to integer (extract digits)
def parse_week(week_str):
    """Extract integer from week string (e.g., 'Week 9' -> 9)"""
    if pd.isna(week_str):
        return None
    # Extract all digits
    digits = re.sub(r'\D', '', str(week_str))
    return int(digits) if digits else None

df['week_int'] = df['week'].apply(parse_week)

# Drop null projected_points
df = df.dropna(subset=['projected_points'])
after_null_drop = len(df)

# Drop null week_int
df = df.dropna(subset=['week_int'])
after_week_drop = len(df)

print("✓ Parsing complete")
print(f"  Initial rows: {initial_rows:,}")
print(f"  After dropping null projected_points: {after_null_drop:,} ({initial_rows - after_null_drop} dropped)")
print(f"  After dropping null week: {after_week_drop:,} ({after_null_drop - after_week_drop} dropped)")
print(f"  Final rows: {len(df):,}")

# Show week distribution
print(f"\nWeek distribution:")
week_counts = df.groupby('week_int').size().sort_index()
for week, count in week_counts.items():
    print(f"  Week {int(week)}: {count:,} projections")


Cleaning data...

✓ Parsing complete
  Initial rows: 1,617
  After dropping null projected_points: 1,617 (0 dropped)
  After dropping null week: 1,617 (0 dropped)
  Final rows: 1,617

Week distribution:
  Week 11: 1,617 projections


## 🧮 Compute Statistics

For each (sleeper_player_id, week) pair:
- **mu** = mean of projections across sources
- **s** = sample standard deviation (ddof=1) if ≥2 sources, else 0
- **sigma** = sqrt((alpha × s)² + (beta × pos_sigma)²), where pos_sigma comes from `POS_SIGMA` using the player's Sleeper position (falling back to `DEFAULT_POS_SIGMA`)
- **var** = sigma²
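The loop and row-wise `apply` in the cell below can be condensed into a single `groupby().agg()`. A minimal sketch on toy data, assuming position is already a column of the projections frame (the notebook instead looks it up from the Sleeper database). pandas' `std` defaults to `ddof=1` and returns `NaN` for a one-row group, so it is filled with 0 to match the notebook's convention.

```python
import numpy as np
import pandas as pd

ALPHA, BETA = 2.0, 1.0
POS_SIGMA = {"QB": 7.0, "RB": 9.0, "WR": 10.0, "TE": 8.0, "K": 4.0, "DST": 7.0}
DEFAULT_POS_SIGMA = 8.0

# Toy long-format projections: one row per (player, week, source)
df = pd.DataFrame({
    "sleeper_player_id": ["A", "A", "A", "B"],
    "week_int": [11, 11, 11, 11],
    "position": ["QB", "QB", "QB", "K"],
    "projected_points": [20.0, 22.0, 24.0, 8.0],
})

stats = (
    df.groupby(["sleeper_player_id", "week_int"], as_index=False)
      .agg(
          position=("position", "first"),
          mu=("projected_points", "mean"),
          s=("projected_points", "std"),            # ddof=1; NaN for one source
          n_sources=("projected_points", "count"),  # non-null rows per group
      )
)
# Match the notebook: a single source contributes no disagreement
stats["s"] = stats["s"].fillna(0.0)
stats["pos_sigma"] = stats["position"].map(POS_SIGMA).fillna(DEFAULT_POS_SIGMA)
stats["sigma"] = np.sqrt((ALPHA * stats["s"]) ** 2 + (BETA * stats["pos_sigma"]) ** 2)
stats["var"] = stats["sigma"] ** 2
print(stats)
```

Vectorizing avoids one Python-level function call per player-week, which matters little at ~600 rows but keeps the formula in one visible place.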


In [5]:
print("Computing player-week statistics...\n")

# Group by sleeper_player_id and week only (the unique key)
grouped = df.groupby(['sleeper_player_id', 'week_int'])

stats_list = []

for (player_id, week), group in grouped:
    projections = group['projected_points'].values
    n_sources = len(projections)
    
    # Compute mean
    mu = np.mean(projections)
    
    # Compute sample standard deviation (ddof=1)
    if n_sources >= 2:
        s = np.std(projections, ddof=1)
    else:
        s = 0.0
    
    stats_list.append({
        'sleeper_player_id': player_id,
        'week': week,
        'mu': mu,
        's': s,  # Store sample std for later
        'n_sources': n_sources,
    })

# Create DataFrame with basic stats
stats_df = pd.DataFrame(stats_list)

print(f"✓ Computed statistics for {len(stats_df):,} player-week combinations")
print(f"  Unique players: {stats_df['sleeper_player_id'].nunique()}")
print(f"  Unique weeks: {stats_df['week'].nunique()}")

# Load player info from Sleeper database
print(f"\nLoading player information from Sleeper database...")
from database_league import LeagueDB

with LeagueDB(db_path=DB_LEAGUE_PATH) as db:
    nfl_players = db.get_nfl_players()

# Create player lookup dictionary
player_lookup = {p['player_id']: p for p in nfl_players}

# Add player name and position from Sleeper database
stats_df['player_name'] = stats_df['sleeper_player_id'].apply(
    lambda pid: f"{player_lookup.get(pid, {}).get('first_name', '')} {player_lookup.get(pid, {}).get('last_name', '')}".strip()
)
stats_df['position'] = stats_df['sleeper_player_id'].apply(
    lambda pid: player_lookup.get(pid, {}).get('position', 'UNKNOWN')
)

print(f"  ✓ Matched {(stats_df['position'] != 'UNKNOWN').sum():,} players to Sleeper database")

# Now compute sigma and var with position-specific parameters
missing_positions = set()

def compute_sigma_var(row):
    position = row['position']
    s = row['s']
    
    # Get position sigma (with default for unknown positions)
    pos_sigma = POS_SIGMA.get(position, DEFAULT_POS_SIGMA)
    if position not in POS_SIGMA and position != 'UNKNOWN':
        missing_positions.add(position)
    
    # Compute combined sigma: sqrt( (alpha * s)^2 + (beta * pos_sigma)^2 )
    sigma = np.sqrt((ALPHA * s)**2 + (BETA * pos_sigma)**2)
    var = sigma ** 2
    
    return pd.Series({
        'sigma': sigma,
        'var': var,
        'pos_sigma': pos_sigma,
        'alpha': ALPHA,
        'beta': BETA
    })

# Apply the computation
computed = stats_df.apply(compute_sigma_var, axis=1)
stats_df = pd.concat([stats_df, computed], axis=1)

# Drop the temporary 's' column
stats_df = stats_df.drop(columns=['s'])

# Warnings for missing positions
if missing_positions:
    print(f"\n⚠️  Warning: Unknown positions found (using default sigma={DEFAULT_POS_SIGMA}):")
    for pos in sorted(missing_positions):
        count = stats_df[stats_df['position'] == pos].shape[0]
        print(f"  - {pos}: {count} player-weeks")

# Summary statistics
print(f"\nSummary statistics:")
print(f"  Mean mu: {stats_df['mu'].mean():.2f}")
print(f"  Mean sigma: {stats_df['sigma'].mean():.2f}")
print(f"  Mean var: {stats_df['var'].mean():.2f}")
print(f"  Mean sources per player-week: {stats_df['n_sources'].mean():.2f}")

print(f"\nSource count distribution:")
source_counts = stats_df['n_sources'].value_counts().sort_index()
for n, count in source_counts.items():
    print(f"  {n} sources: {count:,} player-weeks")


Computing player-week statistics...

✓ Computed statistics for 574 player-week combinations
  Unique players: 574
  Unique weeks: 1

Loading player information from Sleeper database...
  ✓ Matched 574 players to Sleeper database

Summary statistics:
  Mean mu: 5.37
  Mean sigma: 9.03
  Mean var: 84.46
  Mean sources per player-week: 2.82

Source count distribution:
  1 sources: 113 player-weeks
  2 sources: 187 player-weeks
  3 sources: 92 player-weeks
  4 sources: 56 player-weeks
  5 sources: 126 player-weeks


## 📊 Preview Results

Show top players by projected points with their uncertainty estimates.


In [6]:
print("\n" + "="*100)
print("TOP 25 PLAYERS BY MEAN PROJECTION (mu)")
print("="*100 + "\n")

top_players = stats_df.nlargest(25, 'mu')[[
    'player_name', 'position', 'week', 'mu', 'sigma', 'var', 'n_sources'
]].copy()

# Format for display
top_players['mu'] = top_players['mu'].round(2)
top_players['sigma'] = top_players['sigma'].round(2)
top_players['var'] = top_players['var'].round(2)

print(top_players.to_string(index=False))
print("\n" + "="*100 + "\n")



TOP 25 PLAYERS BY MEAN PROJECTION (mu)

        player_name position  week    mu  sigma    var  n_sources
Christian McCaffrey       RB    11 23.56  10.16 103.17          5
      De'Von Achane       RB    11 23.37  10.80 116.70          5
         Josh Allen       QB    11 23.13   8.33  69.46          5
         Drake Maye       QB    11 22.65   7.42  55.00          5
        Jalen Hurts       QB    11 22.15   7.87  61.87          5
      Lamar Jackson       QB    11 21.67   8.49  72.16          5
     Justin Herbert       QB    11 21.10   7.78  60.52          5
     Bijan Robinson       RB    11 20.44   9.20  84.64          5
         Puka Nacua       WR    11 20.36  10.65 113.35          5
    Patrick Mahomes       QB    11 20.26   7.98  63.66          5
 Jaxon Smith-Njigba       WR    11 20.22  11.00 120.99          5
       Dak Prescott       QB    11 20.03   8.18  66.83          5
      Ja'Marr Chase       WR    11 20.02  10.73 115.19          5
        Josh Jacobs       RB    11 

## 💾 Save to Database

Create the `player_week_stats` table if it does not exist, then upsert one row per (sleeper_player_id, week).
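`INSERT OR REPLACE` resolves a primary-key conflict by deleting the existing row and inserting a new one. A sketch of an alternative, assuming SQLite 3.24 or newer (the version that added `ON CONFLICT ... DO UPDATE`): it updates the existing row in place and sends all rows in one `executemany` call instead of a Python loop. The table is trimmed to a few columns and the values are toy numbers.

```python
import sqlite3
from contextlib import closing

UPSERT_SQL = """
    INSERT INTO player_week_stats (sleeper_player_id, week, mu, sigma)
    VALUES (?, ?, ?, ?)
    ON CONFLICT (sleeper_player_id, week) DO UPDATE SET
        mu = excluded.mu,
        sigma = excluded.sigma
"""

with closing(sqlite3.connect(":memory:")) as conn:
    # Trimmed-down version of the notebook's table (same primary key)
    conn.execute("""
        CREATE TABLE player_week_stats (
            sleeper_player_id TEXT NOT NULL,
            week INTEGER NOT NULL,
            mu REAL,
            sigma REAL,
            PRIMARY KEY (sleeper_player_id, week)
        )
    """)
    # First run for week 11 (toy values)
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT_SQL, [("p1", 11, 20.0, 8.0), ("p2", 11, 15.0, 9.5)])
    # Re-running the same week updates p1 in place instead of adding a row
    with conn:
        conn.executemany(UPSERT_SQL, [("p1", 11, 21.5, 8.2)])
    n_rows = conn.execute("SELECT COUNT(*) FROM player_week_stats").fetchone()[0]
    result = conn.execute(
        "SELECT sleeper_player_id, mu FROM player_week_stats ORDER BY sleeper_player_id"
    ).fetchall()

print(n_rows, result)
```

When building the row tuples from a DataFrame, keep the explicit `int()`/`float()` conversions used in the notebook's loop, since `sqlite3` does not adapt NumPy integer types such as `numpy.int64` by default.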


In [7]:
print("Saving to database...\n")

# Add computed_at timestamp (UTC ISO8601)
computed_at = datetime.now(timezone.utc).isoformat()
stats_df['computed_at'] = computed_at

# Connect to database
conn = sqlite3.connect(DB_PROJ_PATH)
cursor = conn.cursor()

# Create table
cursor.execute("""
    CREATE TABLE IF NOT EXISTS player_week_stats (
        sleeper_player_id TEXT NOT NULL,
        player_name TEXT,
        position TEXT,
        week INTEGER NOT NULL,
        mu REAL,
        sigma REAL,
        var REAL,
        n_sources INTEGER,
        alpha REAL,
        beta REAL,
        pos_sigma REAL,
        computed_at TEXT,
        PRIMARY KEY (sleeper_player_id, week)
    )
""")

print("  ✓ Created/verified table 'player_week_stats'")

# Insert/update data (upsert)
inserted = 0
for _, row in stats_df.iterrows():
    cursor.execute("""
        INSERT OR REPLACE INTO player_week_stats 
        (sleeper_player_id, player_name, position, week, mu, sigma, var, 
         n_sources, alpha, beta, pos_sigma, computed_at)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    """, (
        row['sleeper_player_id'],
        row['player_name'],
        row['position'],
        int(row['week']),
        float(row['mu']),
        float(row['sigma']),
        float(row['var']),
        int(row['n_sources']),
        float(row['alpha']),
        float(row['beta']),
        float(row['pos_sigma']),
        row['computed_at']
    ))
    inserted += 1

conn.commit()

print(f"  ✓ Inserted/updated {inserted:,} records")

# Verify data
cursor.execute("SELECT COUNT(*) FROM player_week_stats")
total = cursor.fetchone()[0]
print(f"  ‚úì Total records in table: {total:,}")

conn.close()

print(f"\n✅ Successfully saved player-week statistics!")
print(f"   Computed at: {computed_at}")


Saving to database...

  ✓ Created/verified table 'player_week_stats'
  ✓ Inserted/updated 574 records
  ✓ Total records in table: 1,111

✅ Successfully saved player-week statistics!
   Computed at: 2025-11-13T05:36:50.248683+00:00


## 🔍 Verification

Query the database to verify saved data.


In [8]:
print("Verifying saved data...\n")

conn = sqlite3.connect(DB_PROJ_PATH)

# Sample query
query = """
    SELECT 
        player_name,
        position,
        week,
        ROUND(mu, 2) as mu,
        ROUND(sigma, 2) as sigma,
        ROUND(var, 2) as var,
        n_sources
    FROM player_week_stats
    ORDER BY mu DESC
    LIMIT 20
"""

df_verify = pd.read_sql_query(query, conn)

print("Top 20 players from database:")
print(df_verify.to_string(index=False))

# Statistics by position
print("\n" + "="*70)
print("STATISTICS BY POSITION")
print("="*70 + "\n")

query = """
    SELECT 
        position,
        COUNT(*) as count,
        ROUND(AVG(mu), 2) as avg_mu,
        ROUND(AVG(sigma), 2) as avg_sigma,
        ROUND(AVG(n_sources), 2) as avg_sources
    FROM player_week_stats
    GROUP BY position
    ORDER BY avg_mu DESC
"""

df_pos = pd.read_sql_query(query, conn)
print(df_pos.to_string(index=False))

conn.close()

print("\n✅ Verification complete!")


Verifying saved data...

Top 20 players from database:
        player_name position  week    mu  sigma    var  n_sources
         Josh Allen       QB    10 23.67   7.42  54.99          5
Christian McCaffrey       RB    11 23.56  10.16 103.17          5
      De'Von Achane       RB    11 23.37  10.80 116.70          5
Christian McCaffrey       RB    10 23.27   9.21  84.81          5
         Josh Allen       QB    11 23.13   8.33  69.46          5
      Lamar Jackson       QB    10 23.10   7.59  57.53          5
         Drake Maye       QB    11 22.65   7.42  55.00          5
        Jalen Hurts       QB    11 22.15   7.87  61.87          5
         Puka Nacua       WR    10 22.10  10.31 106.35          5
      Lamar Jackson       QB    11 21.67   8.49  72.16          5
    Jonathan Taylor       RB    10 21.22   9.24  85.34          5
     Justin Herbert       QB    11 21.10   7.78  60.52          5
        Jaxson Dart       QB    10 20.89   8.89  79.03          5
         Drake Maye  

## 📊 Analysis: Uncertainty by Position

Compare how uncertainty (sigma) varies by position.
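Because `s` is 0 for any player with a single source (or with unanimous sources), those player-weeks sit exactly at the floor `BETA × pos_sigma`, which is why each position's `min_uncertainty` in this analysis equals its `POS_SIGMA` value. A minimal sketch for measuring what share of player-weeks sit at that floor, assuming a frame shaped like `stats_df` (the rows here are toy values):

```python
import numpy as np
import pandas as pd

BETA = 1.0

# Toy rows shaped like stats_df (hypothetical values)
stats_df = pd.DataFrame({
    "position": ["QB", "QB", "K", "WR"],
    "pos_sigma": [7.0, 7.0, 4.0, 10.0],
    "sigma": [7.0, 8.33, 4.0, 10.65],
    "n_sources": [1, 5, 1, 5],
})

# sigma equal to the floor means source disagreement contributed nothing
at_floor = np.isclose(stats_df["sigma"], BETA * stats_df["pos_sigma"])
summary = (
    stats_df.assign(at_floor=at_floor)
            .groupby("position")["at_floor"]
            .mean()  # fraction of player-weeks at the floor, per position
)
print(summary)
```

A high share at the floor for a position suggests its sigma is driven almost entirely by `POS_SIGMA`, so thin source coverage is not being penalized there.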


In [9]:
print("\n" + "="*70)
print("UNCERTAINTY ANALYSIS BY POSITION")
print("="*70 + "\n")

conn = sqlite3.connect(DB_PROJ_PATH)

query = """
    SELECT 
        position,
        COUNT(*) as player_weeks,
        ROUND(AVG(mu), 2) as avg_projection,
        ROUND(AVG(sigma), 2) as avg_uncertainty,
        ROUND(MIN(sigma), 2) as min_uncertainty,
        ROUND(MAX(sigma), 2) as max_uncertainty,
        ROUND(AVG(n_sources), 2) as avg_sources
    FROM player_week_stats
    GROUP BY position
    ORDER BY position
"""

df_uncertainty = pd.read_sql_query(query, conn)
print(df_uncertainty.to_string(index=False))

print("\nInterpretation:")
print("  - Higher avg_uncertainty = more disagreement between sources + baseline variance")
print("  - Single-source players have s = 0, so sigma equals the position floor (beta * pos_sigma)")
print("  - When sources disagree, sigma increases (alpha controls this effect)")

conn.close()



UNCERTAINTY ANALYSIS BY POSITION

position  player_weeks  avg_projection  avg_uncertainty  min_uncertainty  max_uncertainty  avg_sources
     DST            58            5.78             7.46              7.0             9.00         1.86
       K            57            8.00             4.31              4.0             5.45         2.82
      QB           145            7.68             8.05              7.0            21.33         2.99
      RB           251            5.31             9.39              9.0            19.21         3.23
      TE           212            3.66             8.20              8.0            10.73         2.91
      WR           388            4.97            10.31             10.0            15.45         2.79

Interpretation:
  - Higher avg_uncertainty = more disagreement between sources + baseline variance
  - Single-source players have s = 0, so sigma equals the position floor (beta * pos_sigma)
  - When sources disagree, sigma increases (alpha controls this effect)


## 📝 Usage Guide

### Weekly Workflow

**This notebook computes stats for ONE week at a time.**

1. **Set the week:** Update `WEEK` (e.g. `WEEK = 11`) in the Configuration cell
2. **Run all cells:** Restart the kernel and run all cells
3. **Done!** The `player_week_stats` table is updated for that week only

**Benefits:**
- No need to re-compute old weeks
- Fast execution (only processes current week)
- Database table accumulates stats across multiple weeks via upsert

**Note:** The notebook uses `INSERT OR REPLACE` with PRIMARY KEY `(sleeper_player_id, week)`, so re-running the same week will update (not duplicate) the data.

### Query Examples

```sql
-- Get all stats for a specific player
SELECT * FROM player_week_stats 
WHERE sleeper_player_id = '4046'
ORDER BY week;

-- Find players with highest uncertainty (most projection disagreement)
SELECT player_name, position, week, mu, sigma, n_sources
FROM player_week_stats
WHERE week = 9
ORDER BY sigma DESC
LIMIT 20;

-- Compare players with similar projections but different uncertainty
SELECT player_name, position, week, mu, sigma, var, n_sources
FROM player_week_stats
WHERE week = 9 AND position = 'RB' AND mu BETWEEN 12 AND 15
ORDER BY sigma DESC;

-- Team variance calculation (sum of individual variances)
SELECT 
    week,
    SUM(mu) as team_mean,
    SUM(var) as team_variance,
    SQRT(SUM(var)) as team_sigma
FROM player_week_stats
WHERE sleeper_player_id IN ('4046', '7528', '8150', '6790', '9226')  -- Your roster
GROUP BY week;
```

### Understanding the Statistics

- **mu**: Expected value (mean projection across all sources)
- **sigma**: Standard deviation (uncertainty estimate)
- **var**: Variance (sigma²)
- **n_sources**: Number of projection sources for this player-week

### Formula

```
sigma = sqrt( (alpha × s)² + (beta × pos_sigma)² )
```

Where:
- **s** = sample std of projections across sources (ddof=1)
- **pos_sigma** = baseline uncertainty for position
- **alpha** = weight for source disagreement (default: 2.0)
- **beta** = weight for baseline variance (default: 1.0)
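A worked example with the default parameters (`ALPHA = 2.0`, `BETA = 1.0`, RB `pos_sigma = 9.0`), using hypothetical projections of 12, 15, and 18 points from three sources:

$$
\begin{aligned}
\mu &= \tfrac{12 + 15 + 18}{3} = 15 \\
s &= \sqrt{\tfrac{(12-15)^2 + (15-15)^2 + (18-15)^2}{3-1}} = \sqrt{9} = 3 \\
\sigma &= \sqrt{(2.0 \times 3)^2 + (1.0 \times 9.0)^2} = \sqrt{36 + 81} = \sqrt{117} \approx 10.82
\end{aligned}
$$

With a single source the same player would get $s = 0$ and $\sigma = 1.0 \times 9.0 = 9.0$, so in this example source disagreement adds about 1.8 points of standard deviation.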

### Adjusting Parameters

To change uncertainty estimates:
1. Modify `ALPHA`, `BETA`, or `POS_SIGMA` in the Configuration cell
2. Re-run all cells
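3. The table is updated with the new values for the current `WEEK` only (upsert by player + week). Earlier weeks keep the parameters they were computed with; these are recorded per row in the `alpha`, `beta`, and `pos_sigma` columns.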
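Because the two terms add in quadrature, `ALPHA` has little effect when `s` is small relative to `pos_sigma` and a large effect once it is not. A small sensitivity sketch; `combined_sigma` is a hypothetical helper that reproduces the notebook's formula, and the `s` values are illustrative:

```python
import numpy as np

def combined_sigma(s, pos_sigma, alpha, beta):
    """sigma = sqrt((alpha * s)^2 + (beta * pos_sigma)^2), as in the notebook."""
    return np.sqrt((alpha * s) ** 2 + (beta * pos_sigma) ** 2)

pos_sigma = 9.0  # RB baseline from POS_SIGMA
for alpha in (1.0, 2.0, 3.0):
    row = [combined_sigma(s, pos_sigma, alpha, beta=1.0) for s in (0.0, 1.5, 3.0)]
    print(f"alpha={alpha}: " + ", ".join(f"{x:.2f}" for x in row))
```

For an RB with `s = 3`, raising `ALPHA` from 1 to 3 moves sigma from about 9.49 to about 12.73, while with `s = 1.5` even `ALPHA = 3` only reaches about 10.06.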

### Team Variance

Under the independence assumption:
- Team mean = Σ mu_i
- Team variance = Σ var_i
- Team sigma = sqrt(Σ var_i)

Correlations between players can be added later for more sophisticated modeling.
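The same team calculation in Python, as a minimal sketch that assumes the roster rows have already been read from `player_week_stats` into a DataFrame. The three rows reuse `mu` and `var` values from the Week 11 preview above; the roster itself is illustrative.

```python
import math

import pandas as pd

# Roster rows as they might come back from player_week_stats (illustrative)
roster = pd.DataFrame({
    "player_name": ["Christian McCaffrey", "De'Von Achane", "Josh Allen"],
    "mu": [23.56, 23.37, 23.13],
    "var": [103.17, 116.70, 69.46],
})

team_mean = roster["mu"].sum()
team_var = roster["var"].sum()   # valid only if players are independent
team_sigma = math.sqrt(team_var)
print(f"mean={team_mean:.2f} var={team_var:.2f} sigma={team_sigma:.2f}")
```

Note that the team sigma (about 17.0) is well below the sum of the individual sigmas (10.16 + 10.80 + 8.33 = 29.29): under independence, individual misses partly offset each other. Positive correlations between teammates would push the team sigma back up.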
