# 📊 Player-Week Statistics with Uncertainty

This notebook computes mean projections and variance estimates for each player-week combination.

**Input:** `projections_with_sleeper` table (from `match_projections_to_sleeper.ipynb`)

**Output:** `player_week_stats` table with:
- `mu` = mean projection across sources
- `sigma` = uncertainty (from source disagreement + position baseline)
- `var` = variance (sigma²)

**Use case:** Per-player variances can be summed to get team-level variance, assuming player outcomes are independent. Correlations between players can be added later.


## ⚙️ Configuration

Adjust these parameters to control uncertainty estimates:


In [1]:
# =============================================================================
# ADJUSTABLE PARAMETERS
# =============================================================================

# How much source disagreement inflates uncertainty
# Higher alpha = more weight on projection variance across sources
ALPHA = 2.0

# Baseline outcome randomness weight
# Higher beta = more weight on inherent position variance
BETA = 1.0

# Baseline standard deviation by position (points)
# Represents inherent outcome variance for each position
POS_SIGMA = {
    'QB': 7.0,   # Quarterbacks: moderate variance
    'RB': 9.0,   # Running backs: high variance (usage-dependent)
    'WR': 10.0,  # Wide receivers: highest variance (big-play dependent)
    'TE': 8.0,   # Tight ends: moderate-high variance
    'K': 4.0,    # Kickers: low variance (consistent scoring)
    'DST': 7.0,  # Defense: moderate variance
}

# Default for unknown positions
DEFAULT_POS_SIGMA = 8.0

print("✓ Configuration loaded")
print(f"  Alpha (source disagreement weight): {ALPHA}")
print(f"  Beta (baseline randomness weight): {BETA}")
print(f"  Position sigmas: {POS_SIGMA}")
print(f"  Default sigma: {DEFAULT_POS_SIGMA}")


✓ Configuration loaded
  Alpha (source disagreement weight): 2.0
  Beta (baseline randomness weight): 1.0
  Position sigmas: {'QB': 7.0, 'RB': 9.0, 'WR': 10.0, 'TE': 8.0, 'K': 4.0, 'DST': 7.0}
  Default sigma: 8.0


## 📦 Setup & Imports


In [2]:
import sqlite3
import pandas as pd
import numpy as np
from datetime import datetime, timezone
import re

print("✓ Imports successful")


✓ Imports successful


## 📥 Load Data

Load projections from the `projections_with_sleeper` table.


In [3]:
print("Loading projections from database...\n")

# Connect to database
conn = sqlite3.connect('projections.db')

# Load projections with Sleeper IDs
query = """
    SELECT 
        sleeper_player_id,
        player_first_name || ' ' || player_last_name as player_name,
        position,
        team,
        source_website,
        week,
        projected_points,
        match_method
    FROM projections_with_sleeper
    WHERE sleeper_player_id IS NOT NULL
"""

df = pd.read_sql_query(query, conn)

print(f"✓ Loaded {len(df)} projections with Sleeper IDs")
print(f"\nData shape: {df.shape}")
print(f"Unique players: {df['sleeper_player_id'].nunique()}")
print(f"Unique weeks: {df['week'].nunique()}")
print(f"Sources: {df['source_website'].nunique()}")

# Show sample
print(f"\nSample data:")
print(df.head(10).to_string(index=False))

conn.close()


Loading projections from database...

✓ Loaded 1241 projections with Sleeper IDs

Data shape: (1241, 8)
Unique players: 527
Unique weeks: 1
Sources: 4

Sample data:
sleeper_player_id         player_name position team  source_website    week  projected_points match_method
             4984          Josh Allen       QB  BUF     sleeper.com Week 10              25.6    automatic
             4881       Lamar Jackson       QB  BAL     sleeper.com Week 10              25.1    automatic
             4984          Josh Allen       QB  BUF fantasypros.com Week 10              25.0    automatic
             4881       Lamar Jackson       QB  BAL fantasypros.com Week 10              24.5    automatic
             4034 Christian McCaffrey       RB   SF        espn.com Week 10              24.4    automatic
            12508         Jaxson Dart       QB  NYG     sleeper.com Week 10              24.1    automatic
            11564          Drake Maye       QB   NE fantasypros.com Week 10         

## 🧹 Data Cleaning

1. Parse week to integer
2. Drop rows with null projected_points
3. Drop rows whose week could not be parsed


In [4]:
print("Cleaning data...\n")

initial_rows = len(df)

# Parse week to integer (extract digits)
def parse_week(week_str):
    """Extract integer from week string (e.g., 'Week 9' -> 9)"""
    if pd.isna(week_str):
        return None
    # Take the first run of digits so stray numbers (e.g. a year) are not concatenated
    match = re.search(r'\d+', str(week_str))
    return int(match.group()) if match else None

df['week_int'] = df['week'].apply(parse_week)

# Drop null projected_points
df = df.dropna(subset=['projected_points'])
after_null_drop = len(df)

# Drop null week_int
df = df.dropna(subset=['week_int'])
after_week_drop = len(df)

print("✓ Parsing complete")
print(f"  Initial rows: {initial_rows:,}")
print(f"  After dropping null projected_points: {after_null_drop:,} ({initial_rows - after_null_drop} dropped)")
print(f"  After dropping null week: {after_week_drop:,} ({after_null_drop - after_week_drop} dropped)")
print(f"  Final rows: {len(df):,}")

# Show week distribution
print(f"\nWeek distribution:")
week_counts = df.groupby('week_int').size().sort_index()
for week, count in week_counts.items():
    print(f"  Week {int(week)}: {count:,} projections")


Cleaning data...

✓ Parsing complete
  Initial rows: 1,241
  After dropping null projected_points: 1,241 (0 dropped)
  After dropping null week: 1,241 (0 dropped)
  Final rows: 1,241

Week distribution:
  Week 10: 1,241 projections


## 🧮 Compute Statistics

For each (sleeper_player_id, week) pair:
- **mu** = mean of projections across sources
- **s** = sample standard deviation (ddof=1) if ≥2 sources, else 0
- **sigma** = sqrt((alpha × s)² + (beta × pos_sigma)²), where pos_sigma is looked up from the player's Sleeper position
- **var** = sigma²
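As a quick hand check of these definitions, the sketch below computes the four quantities for a single player-week using only the standard library. The helper name `player_week_sigma` and the projection values are made up for illustration; the parameters mirror the Configuration cell.

```python
import math
import statistics

# Parameters mirror the Configuration cell; the projections are made-up numbers.
ALPHA, BETA = 2.0, 1.0
POS_SIGMA_RB = 9.0

def player_week_sigma(projections, pos_sigma, alpha=ALPHA, beta=BETA):
    """Return (mu, s, sigma, var) for one player-week (hypothetical helper)."""
    mu = statistics.mean(projections)
    # Sample std (ddof=1) needs at least two sources; otherwise fall back to 0.
    s = statistics.stdev(projections) if len(projections) >= 2 else 0.0
    sigma = math.sqrt((alpha * s) ** 2 + (beta * pos_sigma) ** 2)
    return mu, s, sigma, sigma ** 2

# Three sources that disagree by a couple of points.
mu, s, sigma, var = player_week_sigma([10.0, 12.0, 14.0], POS_SIGMA_RB)
print(f"mu={mu:.2f} s={s:.2f} sigma={sigma:.2f} var={var:.2f}")
# mu=12.00 s=2.00 sigma=9.85 var=97.00

# A single source contributes no disagreement, so sigma equals beta * pos_sigma.
single = player_week_sigma([18.0], POS_SIGMA_RB)
print(single)
```

Note that a single-source player ends up with the smallest possible sigma for their position, because `s` is set to 0 rather than treated as unknown.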


In [5]:
print("Computing player-week statistics...\n")

# Group by sleeper_player_id and week only (the unique key)
grouped = df.groupby(['sleeper_player_id', 'week_int'])

stats_list = []

for (player_id, week), group in grouped:
    projections = group['projected_points'].values
    n_sources = len(projections)
    
    # Compute mean
    mu = np.mean(projections)
    
    # Compute sample standard deviation (ddof=1)
    if n_sources >= 2:
        s = np.std(projections, ddof=1)
    else:
        s = 0.0
    
    stats_list.append({
        'sleeper_player_id': player_id,
        'week': week,
        'mu': mu,
        's': s,  # Store sample std for later
        'n_sources': n_sources,
    })

# Create DataFrame with basic stats
stats_df = pd.DataFrame(stats_list)

print(f"✓ Computed statistics for {len(stats_df):,} player-week combinations")
print(f"  Unique players: {stats_df['sleeper_player_id'].nunique()}")
print(f"  Unique weeks: {stats_df['week'].nunique()}")

# Load player info from Sleeper database
print(f"\nLoading player information from Sleeper database...")
from database_league import LeagueDB

with LeagueDB() as db:
    nfl_players = db.get_nfl_players()

# Create player lookup dictionary
player_lookup = {p['player_id']: p for p in nfl_players}

# Add player name and position from Sleeper database
stats_df['player_name'] = stats_df['sleeper_player_id'].apply(
    lambda pid: f"{player_lookup.get(pid, {}).get('first_name', '')} {player_lookup.get(pid, {}).get('last_name', '')}".strip()
)
stats_df['position'] = stats_df['sleeper_player_id'].apply(
    lambda pid: player_lookup.get(pid, {}).get('position', 'UNKNOWN')
)

print(f"  ✓ Matched {(stats_df['position'] != 'UNKNOWN').sum():,} players to Sleeper database")

# Now compute sigma and var with position-specific parameters
missing_positions = set()

def compute_sigma_var(row):
    position = row['position']
    s = row['s']
    
    # Get position sigma (with default for unknown positions)
    pos_sigma = POS_SIGMA.get(position, DEFAULT_POS_SIGMA)
    if position not in POS_SIGMA and position != 'UNKNOWN':
        missing_positions.add(position)
    
    # Compute combined sigma: sqrt( (alpha * s)^2 + (beta * pos_sigma)^2 )
    sigma = np.sqrt((ALPHA * s)**2 + (BETA * pos_sigma)**2)
    var = sigma ** 2
    
    return pd.Series({
        'sigma': sigma,
        'var': var,
        'pos_sigma': pos_sigma,
        'alpha': ALPHA,
        'beta': BETA
    })

# Apply the computation
computed = stats_df.apply(compute_sigma_var, axis=1)
stats_df = pd.concat([stats_df, computed], axis=1)

# Drop the temporary 's' column
stats_df = stats_df.drop(columns=['s'])

# Warnings for missing positions
if missing_positions:
    print(f"\n⚠️  Warning: Unknown positions found (using default sigma={DEFAULT_POS_SIGMA}):")
    for pos in sorted(missing_positions):
        count = stats_df[stats_df['position'] == pos].shape[0]
        print(f"  - {pos}: {count} player-weeks")

# Summary statistics
print(f"\nSummary statistics:")
print(f"  Mean mu: {stats_df['mu'].mean():.2f}")
print(f"  Mean sigma: {stats_df['sigma'].mean():.2f}")
print(f"  Mean var: {stats_df['var'].mean():.2f}")
print(f"  Mean sources per player-week: {stats_df['n_sources'].mean():.2f}")

print(f"\nSource count distribution:")
source_counts = stats_df['n_sources'].value_counts().sort_index()
for n, count in source_counts.items():
    print(f"  {n} sources: {count:,} player-weeks")


Computing player-week statistics...

✓ Computed statistics for 527 player-week combinations
  Unique players: 527
  Unique weeks: 1

Loading player information from Sleeper database...
  ✓ Matched 527 players to Sleeper database

Summary statistics:
  Mean mu: 5.57
  Mean sigma: 8.98
  Mean var: 84.01
  Mean sources per player-week: 2.35

Source count distribution:
  1 sources: 118 player-weeks
  2 sources: 206 player-weeks
  3 sources: 101 player-weeks
  4 sources: 102 player-weeks


## 📊 Preview Results

Show top players by projected points with their uncertainty estimates.


In [6]:
print("\n" + "="*100)
print("TOP 25 PLAYERS BY MEAN PROJECTION (mu)")
print("="*100 + "\n")

top_players = stats_df.nlargest(25, 'mu')[[
    'player_name', 'position', 'week', 'mu', 'sigma', 'var', 'n_sources'
]].copy()

# Format for display
top_players['mu'] = top_players['mu'].round(2)
top_players['sigma'] = top_players['sigma'].round(2)
top_players['var'] = top_players['var'].round(2)

print(top_players.to_string(index=False))
print("\n" + "="*100 + "\n")



TOP 25 PLAYERS BY MEAN PROJECTION (mu)

        player_name position  week    mu  sigma    var  n_sources
         Josh Allen       QB    10 24.06   7.59  57.66          4
      Lamar Jackson       QB    10 24.04   7.25  52.56          4
Christian McCaffrey       RB    10 22.69   9.81  96.21          4
        Jaxson Dart       QB    10 22.05   8.48  71.91          4
         Puka Nacua       WR    10 21.55  11.23 126.20          4
         Drake Maye       QB    10 21.54   8.07  65.05          4
        Jalen Hurts       QB    10 21.48   7.90  62.36          4
    Jonathan Taylor       RB    10 21.20   9.46  89.55          4
     Justin Herbert       QB    10 20.90   7.96  63.28          4
     Bijan Robinson       RB    10 20.52   9.27  85.99          4
             Bo Nix       QB    10 20.46   7.82  61.11          4
      De'Von Achane       RB    10 20.33   9.35  87.38          4
       Daniel Jones       QB    10 20.10   7.41  54.94          4
  Amon-Ra St. Brown       WR    10 

## 💾 Save to Database

Create/update the `player_week_stats` table with upsert by (sleeper_player_id, week).
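To make the upsert behaviour concrete, here is a minimal sketch against a throwaway in-memory database. The table is a cut-down, hypothetical version of `player_week_stats` with only three columns, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the demo
conn.execute("""
    CREATE TABLE player_week_stats (
        sleeper_player_id TEXT NOT NULL,
        week INTEGER NOT NULL,
        mu REAL,
        PRIMARY KEY (sleeper_player_id, week)
    )
""")

rows_first_run = [("p1", 10, 12.0), ("p2", 10, 8.5)]
rows_second_run = [("p1", 10, 13.5)]  # same (player, week) key as an existing row

upsert_sql = "INSERT OR REPLACE INTO player_week_stats VALUES (?, ?, ?)"
conn.executemany(upsert_sql, rows_first_run)
conn.executemany(upsert_sql, rows_second_run)
conn.commit()

result = conn.execute(
    "SELECT sleeper_player_id, week, mu FROM player_week_stats "
    "ORDER BY sleeper_player_id"
).fetchall()
print(result)  # [('p1', 10, 13.5), ('p2', 10, 8.5)]
conn.close()
```

Two consequences are worth keeping in mind: `INSERT OR REPLACE` removes the conflicting row and inserts a fresh one, so every column is overwritten, and rows for player-weeks that are absent from a later run (here `p2`) stay in the table with their old values.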


In [7]:
print("Saving to database...\n")

# Add computed_at timestamp (UTC ISO8601)
computed_at = datetime.now(timezone.utc).isoformat()
stats_df['computed_at'] = computed_at

# Connect to database
conn = sqlite3.connect('projections.db')
cursor = conn.cursor()

# Create table
cursor.execute("""
    CREATE TABLE IF NOT EXISTS player_week_stats (
        sleeper_player_id TEXT NOT NULL,
        player_name TEXT,
        position TEXT,
        week INTEGER NOT NULL,
        mu REAL,
        sigma REAL,
        var REAL,
        n_sources INTEGER,
        alpha REAL,
        beta REAL,
        pos_sigma REAL,
        computed_at TEXT,
        PRIMARY KEY (sleeper_player_id, week)
    )
""")

print("  ✓ Created/verified table 'player_week_stats'")

# Insert/update data (upsert)
inserted = 0
for _, row in stats_df.iterrows():
    cursor.execute("""
        INSERT OR REPLACE INTO player_week_stats 
        (sleeper_player_id, player_name, position, week, mu, sigma, var, 
         n_sources, alpha, beta, pos_sigma, computed_at)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    """, (
        row['sleeper_player_id'],
        row['player_name'],
        row['position'],
        int(row['week']),
        float(row['mu']),
        float(row['sigma']),
        float(row['var']),
        int(row['n_sources']),
        float(row['alpha']),
        float(row['beta']),
        float(row['pos_sigma']),
        row['computed_at']
    ))
    inserted += 1

conn.commit()

print(f"  ✓ Inserted/updated {inserted:,} records")

# Verify data
cursor.execute("SELECT COUNT(*) FROM player_week_stats")
total = cursor.fetchone()[0]
print(f"  ✓ Total records in table: {total:,}")

conn.close()

print("\n✅ Successfully saved player-week statistics!")
print(f"   Computed at: {computed_at}")


Saving to database...

  ✓ Created/verified table 'player_week_stats'
  ✓ Inserted/updated 527 records
  ✓ Total records in table: 527

✅ Successfully saved player-week statistics!
   Computed at: 2025-11-05T04:28:16.580672+00:00


## 🔍 Verification

Query the database to verify saved data.


In [8]:
print("Verifying saved data...\n")

conn = sqlite3.connect('projections.db')

# Sample query
query = """
    SELECT 
        player_name,
        position,
        week,
        ROUND(mu, 2) as mu,
        ROUND(sigma, 2) as sigma,
        ROUND(var, 2) as var,
        n_sources
    FROM player_week_stats
    ORDER BY mu DESC
    LIMIT 20
"""

df_verify = pd.read_sql_query(query, conn)

print("Top 20 players from database:")
print(df_verify.to_string(index=False))

# Statistics by position
print("\n" + "="*70)
print("STATISTICS BY POSITION")
print("="*70 + "\n")

query = """
    SELECT 
        position,
        COUNT(*) as count,
        ROUND(AVG(mu), 2) as avg_mu,
        ROUND(AVG(sigma), 2) as avg_sigma,
        ROUND(AVG(n_sources), 2) as avg_sources
    FROM player_week_stats
    GROUP BY position
    ORDER BY avg_mu DESC
"""

df_pos = pd.read_sql_query(query, conn)
print(df_pos.to_string(index=False))

conn.close()

print("\n✅ Verification complete!")


Verifying saved data...

Top 20 players from database:
        player_name position  week    mu  sigma    var  n_sources
         Josh Allen       QB    10 24.07   7.59  57.66          4
      Lamar Jackson       QB    10 24.04   7.25  52.56          4
Christian McCaffrey       RB    10 22.69   9.81  96.21          4
        Jaxson Dart       QB    10 22.05   8.48  71.91          4
         Drake Maye       QB    10 21.55   8.07  65.05          4
         Puka Nacua       WR    10 21.55  11.23 126.20          4
        Jalen Hurts       QB    10 21.48   7.90  62.36          4
    Jonathan Taylor       RB    10 21.20   9.46  89.55          4
     Justin Herbert       QB    10 20.89   7.96  63.28          4
     Bijan Robinson       RB    10 20.52   9.27  85.99          4
             Bo Nix       QB    10 20.46   7.82  61.11          4
      De'Von Achane       RB    10 20.33   9.35  87.38          4
       Daniel Jones       QB    10 20.10   7.41  54.94          4
  Amon-Ra St. Brown  

## 📊 Analysis: Uncertainty by Position

Compare how uncertainty (sigma) varies by position.


In [9]:
print("\n" + "="*70)
print("UNCERTAINTY ANALYSIS BY POSITION")
print("="*70 + "\n")

conn = sqlite3.connect('projections.db')

query = """
    SELECT 
        position,
        COUNT(*) as player_weeks,
        ROUND(AVG(mu), 2) as avg_projection,
        ROUND(AVG(sigma), 2) as avg_uncertainty,
        ROUND(MIN(sigma), 2) as min_uncertainty,
        ROUND(MAX(sigma), 2) as max_uncertainty,
        ROUND(AVG(n_sources), 2) as avg_sources
    FROM player_week_stats
    GROUP BY position
    ORDER BY position
"""

df_uncertainty = pd.read_sql_query(query, conn)
print(df_uncertainty.to_string(index=False))

print("\nInterpretation:")
print("  - Higher avg_uncertainty = more disagreement between sources + baseline variance")
print("  - Players with more sources tend to have lower relative uncertainty")
print("  - When sources disagree, sigma increases (alpha controls this effect)")

conn.close()



UNCERTAINTY ANALYSIS BY POSITION

position  player_weeks  avg_projection  avg_uncertainty  min_uncertainty  max_uncertainty  avg_sources
     DST            28            5.80             7.51              7.0             8.26         1.89
       K            28            7.98             4.29              4.0             5.05         2.89
      QB            70            8.29             8.71              7.0            20.67         2.56
      RB           115            5.81             9.24              9.0            12.12         2.38
      TE           102            3.70             8.16              8.0             9.28         2.50
      WR           184            5.03            10.31             10.0            16.11         2.17

Interpretation:
  - Higher avg_uncertainty = more disagreement between sources + baseline variance
  - Players with more sources tend to have lower relative uncertainty
  - When sources disagree, sigma increases (alpha controls this effect)


## 📝 Usage Guide

### Query Examples

```sql
-- Get all stats for a specific player
SELECT * FROM player_week_stats 
WHERE sleeper_player_id = '4046'
ORDER BY week;

-- Find players with highest uncertainty (most projection disagreement)
SELECT player_name, position, week, mu, sigma, n_sources
FROM player_week_stats
WHERE week = 9
ORDER BY sigma DESC
LIMIT 20;

-- Compare players with similar projections but different uncertainty
SELECT player_name, position, week, mu, sigma, var, n_sources
FROM player_week_stats
WHERE week = 9 AND position = 'RB' AND mu BETWEEN 12 AND 15
ORDER BY sigma DESC;

-- Team variance calculation (sum of individual variances).
-- SQRT() is only available when SQLite is built with its math functions enabled.
SELECT 
    week,
    SUM(mu) as team_mean,
    SUM(var) as team_variance,
    SQRT(SUM(var)) as team_sigma
FROM player_week_stats
WHERE sleeper_player_id IN ('4046', '7528', '8150', '6790', '9226')  -- Your roster
GROUP BY week;
```

### Understanding the Statistics

- **mu**: Expected value (mean projection across all sources)
- **sigma**: Standard deviation (uncertainty estimate)
- **var**: Variance (sigma²)
- **n_sources**: Number of projection sources for this player-week

### Formula

```
sigma = sqrt( (alpha × s)² + (beta × pos_sigma)² )
```

Where:
- **s** = sample std of projections across sources (ddof=1); 0 when only one source is available
- **pos_sigma** = baseline uncertainty for position
- **alpha** = weight for source disagreement (default: 2.0)
- **beta** = weight for baseline variance (default: 1.0)
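Because the saved table keeps `var`, `pos_sigma`, `alpha`, and `beta` but drops `s`, the formula can be inverted to see how much of a player's variance comes from source disagreement. The sketch below does this for the Josh Allen row from the preview output (var = 57.66, QB baseline 7.0); the helper name `disagreement_from_row` is hypothetical, and the result is only as precise as the rounded inputs.

```python
import math

def disagreement_from_row(var, pos_sigma, alpha, beta):
    """Invert sigma^2 = (alpha*s)^2 + (beta*pos_sigma)^2.

    Returns (s, share), where share is the fraction of var that is
    attributable to source disagreement. Hypothetical helper.
    """
    baseline_var = (beta * pos_sigma) ** 2
    # Clamp at 0: rounded inputs can dip slightly below the baseline.
    disagreement_var = max(var - baseline_var, 0.0)
    # With alpha == 0 the disagreement term vanishes and s cannot be recovered.
    s = math.sqrt(disagreement_var) / alpha if alpha else 0.0
    share = disagreement_var / var if var else 0.0
    return s, share

# Josh Allen, week 10, from the preview table: var = 57.66, QB pos_sigma = 7.0.
s, share = disagreement_from_row(var=57.66, pos_sigma=7.0, alpha=2.0, beta=1.0)
print(f"s ≈ {s:.2f} points, disagreement share ≈ {share:.0%}")
# s ≈ 1.47 points, disagreement share ≈ 15%
```

With the default weights, the position baseline accounts for most of the variance even for a player covered by all four sources, which is one reason to revisit `ALPHA` and `BETA` if the estimates look too uniform within a position.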

### Adjusting Parameters

To change uncertainty estimates:
1. Modify `ALPHA`, `BETA`, or `POS_SIGMA` in the Configuration cell
2. Re-run all cells
3. The table will be updated with new values (upsert by player+week)

### Team Variance

Under the independence assumption:
- Team mean = Σ mu_i
- Team variance = Σ var_i
- Team sigma = sqrt(Σ var_i)

Correlations between players can be added later for more sophisticated modeling.
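One way correlations could be folded in later is the standard variance-of-a-sum identity, Var(Σ X_i) = Σ_i Σ_j ρ_ij σ_i σ_j, which reduces to Σ var_i when every off-diagonal ρ_ij is 0. The sketch below is an illustration under that assumption, not part of the notebook: `team_variance` is a hypothetical helper, the sigmas are the QB and WR baselines from `POS_SIGMA`, and the 0.3 correlation is an invented value.

```python
import math

def team_variance(sigmas, corr=None):
    """Variance of a sum of player scores (hypothetical helper).

    corr[i][j] is the assumed correlation between players i and j;
    None means independence (identity correlation matrix).
    """
    n = len(sigmas)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if corr is None:
                rho = 1.0 if i == j else 0.0
            else:
                rho = corr[i][j]
            total += rho * sigmas[i] * sigmas[j]
    return total

# Baseline QB and WR sigmas from POS_SIGMA; the 0.3 correlation is made up.
sigmas = [7.0, 10.0]
independent = team_variance(sigmas)
stacked = team_variance(sigmas, corr=[[1.0, 0.3], [0.3, 1.0]])

print(f"independent: var={independent:.1f}, sigma={math.sqrt(independent):.2f}")
print(f"correlated:  var={stacked:.1f}, sigma={math.sqrt(stacked):.2f}")
```

In this toy case the positive correlation raises team variance from 149 to about 191, so treating a quarterback and his receiver as independent would understate the spread of the combined score.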
