Our preprocessing approach transforms raw weekly player statistics into position-specific performance metrics. The main goal is to identify players with consistently high production for predictive modeling.

1. Aggregation of Weekly Data

We start by parsing the weekly offense dataset.

For each player, we compute per-game averages of the performance metrics relevant to their position (QB, RB, WR, TE).

This allows us to rank players by typical production levels for each position.

In [2]:
import pandas as pd
import numpy as np

In [None]:
weekly_offense = pd.read_csv('Data/weekly_player_stats_offense.csv')
# Check how many records are in the weekly player stats
print(f"Total records in weekly_offense: {len(weekly_offense):,}")
# List the column names
print(weekly_offense.columns.tolist())

Total records in weekly_offense: 58,629
['season', 'week', 'offense_snaps', 'offense_pct', 'team_offense_snaps', 'player_id', 'birth_year', 'draft_year', 'draft_round', 'draft_pick', 'draft_ovr', 'height', 'weight', 'college', 'season_type', 'player_name', 'position', 'depth_team', 'conference', 'division', 'team', 'shotgun', 'no_huddle', 'qb_dropback', 'qb_scramble', 'pass_attempts', 'complete_pass', 'incomplete_pass', 'passing_yards', 'receiving_yards', 'yards_after_catch', 'rush_attempts', 'rushing_yards', 'tackled_for_loss', 'first_down_pass', 'first_down_rush', 'third_down_converted', 'third_down_failed', 'fourth_down_converted', 'fourth_down_failed', 'rush_touchdown', 'pass_touchdown', 'safety', 'interception', 'fumble', 'fumble_lost', 'fumble_forced', 'fumble_not_forced', 'fumble_out_of_bounds', 'receptions', 'targets', 'passing_air_yards', 'receiving_air_yards', 'receiving_touchdown', 'pass_attempts_redzone', 'complete_pass_redzone', 'pass_touchdown_redzone', 'pass_attempts_gtg

In [None]:
# Check which positions are present
print(weekly_offense['position'].value_counts())
# Map each position to the dataset columns used as its performance metrics
performance_metrics = {
    'QB': ['passing_yards', 'pass_touchdown', 'interception', 'rushing_yards', 'rush_touchdown', 'fantasy_points_ppr'],
    'RB': ['rushing_yards', 'rush_touchdown', 'receiving_yards', 'receiving_touchdown', 'receptions', 'fumble', 'fantasy_points_ppr'],
    'WR': ['receiving_yards', 'receiving_touchdown', 'receptions', 'targets', 'fumble', 'fantasy_points_ppr'],
    'TE': ['receiving_yards', 'receiving_touchdown', 'receptions', 'targets', 'fumble', 'fantasy_points_ppr']
}

position
WR     23085
RB     15009
TE     11862
QB      7670
FB       730
P         84
CB        54
SS        38
FS        33
ILB       18
DE        11
DT        11
OLB        7
K          6
NT         5
MLB        3
LS         2
T          1
Name: count, dtype: int64


In [None]:
# Filter for fantasy football positions only
fantasy_positions = ['QB', 'RB', 'WR', 'TE']
weekly_offense_filtered = weekly_offense[weekly_offense['position'].isin(fantasy_positions)].copy()
# Check how many records remain after filtering
print(f"Total records in weekly_offense_filtered: {len(weekly_offense_filtered):,}")
# Check the position distribution after filtering
print(weekly_offense_filtered['position'].value_counts())


Total records in weekly_offense_filtered: 57,626
position
WR    23085
RB    15009
TE    11862
QB     7670
Name: count, dtype: int64


In [17]:
# Calculate weekly averages for each player by position
position_aggregations = {}

for position in fantasy_positions:
    # Keep only rows for this position
    position_data = weekly_offense_filtered[weekly_offense_filtered['position'] == position].copy()
    print(f"Found {len(position_data):,} weekly records for {position} players")

    # Get the stats we care about for this position
    stats_we_want = performance_metrics.get(position, [])

    # Average each stat of interest per player
    calculations = {stat: 'mean' for stat in stats_we_want}
    # Also count how many games (weekly records) each player has
    calculations['week'] = 'count'

    # Group players and calculate averages
    player_groups = ['player_id', 'player_name']
    player_averages = position_data.groupby(player_groups).agg(calculations)
    player_averages = player_averages.round(2)
    
    # Rename the game-count column
    player_averages = player_averages.rename(columns={'week': 'games_played'})
    player_averages['position'] = position

    # Sort by fantasy points (best first)
    player_averages = player_averages.sort_values('fantasy_points_ppr', ascending=False)

    # Save results for this position
    position_aggregations[position] = player_averages

    # Show top 5 players for this position
    print(f"Top 5 {position} players:")
    top_5 = player_averages[['fantasy_points_ppr', 'games_played']].head(5)
    
    for i, (player_info, stats) in enumerate(top_5.iterrows(), 1):
        player_name = player_info[1]
        fantasy_avg = stats['fantasy_points_ppr']
        games = stats['games_played']
        print(f"   {i}. {player_name}: {fantasy_avg} pts/game ({games} games)")


    


Found 7,670 weekly records for QB players
Top 5 QB players:
   1. Patrick Mahomes: 26.0 pts/game (131.0 games)
   2. Josh Allen: 25.43 pts/game (117.0 games)
   3. Drew Brees: 23.9 pts/game (137.0 games)
   4. Jayden Daniels: 23.89 pts/game (19.0 games)
   5. Lamar Jackson: 23.36 pts/game (93.0 games)
Found 15,009 weekly records for RB players
Top 5 RB players:
   1. Christian McCaffrey: 22.0 pts/game (93.0 games)
   2. Alvin Kamara: 19.15 pts/game (118.0 games)
   3. Jahmyr Gibbs: 18.73 pts/game (35.0 games)
   4. Saquon Barkley: 18.37 pts/game (94.0 games)
   5. Le'Veon Bell: 17.88 pts/game (90.0 games)
Found 23,085 weekly records for WR players
Top 5 WR players:
   1. Antonio Brown: 19.4 pts/game (122.0 games)
   2. Justin Jefferson: 19.01 pts/game (74.0 games)
   3. Ja'Marr Chase: 18.81 pts/game (66.0 games)
   4. Calvin Johnson: 18.37 pts/game (60.0 games)
   5. Malik Nabers: 17.84 pts/game (15.0 games)
Found 11,862 weekly records for TE players
Top 5 TE players:
   1. Travis Kelc

In [18]:
# Summary for each position
for position, data in position_aggregations.items():
    num_players = len(data)
    avg_fantasy = data['fantasy_points_ppr'].mean()
    
    # Get best player info
    best_player_info = data.index[0]
    best_player_name = best_player_info[1]
    best_player_score = data['fantasy_points_ppr'].iloc[0]
    
    print(f"\n{position} Summary:")
    print(f"   Players: {num_players}")
    print(f"   Average fantasy pts: {avg_fantasy:.1f}/game")
    print(f"   Best player: {best_player_name} ({best_player_score:.1f} pts/game)")

# Save all results to file
if position_aggregations:
    print("\nSaving results...")

    # Combine all positions, keeping the (player_id, player_name) index
    all_players = pd.concat(list(position_aggregations.values()))
    all_players = all_players.reset_index()
    
    # Save to CSV
    output_file = 'Data/player_weekly_averages.csv'
    all_players.to_csv(output_file, index=False)
    print(f"Saved {len(all_players)} players to: {output_file}")
    
    # Show top 10 overall
    print("\nTOP 10 FANTASY PLAYERS OVERALL:")
    top_10 = all_players.nlargest(10, 'fantasy_points_ppr')

    # Rank by order in the sorted result, not by the DataFrame index label
    for rank, (_, player) in enumerate(top_10.iterrows(), 1):
        name = player['player_name']
        pos = player['position']
        pts = player['fantasy_points_ppr']
        games = player['games_played']
        print(f"   {rank:2d}. {name} ({pos}): {pts:.1f} pts/game ({games} games)")


QB Summary:
   Players: 182
   Average fantasy pts: 11.0/game
   Best player: Patrick Mahomes (26.0 pts/game)

RB Summary:
   Players: 397
   Average fantasy pts: 6.8/game
   Best player: Christian McCaffrey (22.0 pts/game)

WR Summary:
   Players: 573
   Average fantasy pts: 6.3/game
   Best player: Antonio Brown (19.4 pts/game)

TE Summary:
   Players: 285
   Average fantasy pts: 4.9/game
   Best player: Travis Kelce (15.6 pts/game)

Saving results...
Saved 1437 players to: Data/player_weekly_averages.csv

TOP 10 FANTASY PLAYERS OVERALL:
    1. Patrick Mahomes (QB): 26.0 pts/game (131 games)
    2. Josh Allen (QB): 25.4 pts/game (117 games)
    3. Drew Brees (QB): 23.9 pts/game (137 games)
    4. Jayden Daniels (QB): 23.9 pts/game (19 games)
    5. Lamar Jackson (QB): 23.4 pts/game (93 games)
    6. Aaron Rodgers (QB): 23.2 pts/game (187 games)
    7. Joe Burrow (QB): 23.1 pts/game (74 games)
    8. Deshaun Watson (QB): 23.1 pts/game (73 games)
    9. Tom Brady (QB): 22.8 pts/game (

2. Reliability Index

Using data from the previous season, we calculate the standard deviation of each player’s weekly scores.

Players with low standard deviations and high average performance are considered reliable performers.

Combining these two measures gives each player a reliability index that quantifies scoring consistency.
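
The reliability calculation described above could be sketched as follows. This is a minimal illustration, not the notebook's final method. The exact formula is not specified here, so the sketch assumes the index is the ratio of mean to standard deviation of `fantasy_points_ppr` (higher is more reliable). The function name `reliability_index`, the `min_games` threshold, and the toy data are all hypothetical.

```python
import numpy as np
import pandas as pd


def reliability_index(weekly, season, min_games=2):
    """Per-player mean, std, and an assumed mean/std reliability score for one season."""
    season_data = weekly[weekly['season'] == season]
    stats = (
        season_data
        .groupby(['player_id', 'player_name', 'position'])['fantasy_points_ppr']
        .agg(mean_pts='mean', std_pts='std', games_played='count')
    )
    # The sample standard deviation is undefined for a single game,
    # so drop players with too few games
    stats = stats[stats['games_played'] >= min_games].copy()
    # Assumed definition: a higher mean and a lower spread both raise the score.
    # A zero std is replaced with NaN to avoid division by zero.
    stats['reliability'] = stats['mean_pts'] / stats['std_pts'].replace(0, np.nan)
    return stats.sort_values('reliability', ascending=False).reset_index()


# Toy data, made up for illustration (not taken from the real dataset)
toy = pd.DataFrame({
    'season': [2023] * 6 + [2022] * 2,
    'player_id': ['a', 'a', 'a', 'b', 'b', 'b', 'a', 'b'],
    'player_name': ['Player A'] * 3 + ['Player B'] * 3 + ['Player A', 'Player B'],
    'position': ['WR'] * 8,
    'fantasy_points_ppr': [14.0, 15.0, 16.0, 5.0, 15.0, 25.0, 30.0, 0.0],
})

result = reliability_index(toy, season=2023)
print(result[['player_name', 'mean_pts', 'std_pts', 'reliability']])
```

In the toy data both players average 15 points per game in the selected season, but Player A's scores vary far less, so Player A ranks higher. On the real data, the same function could be called on `weekly_offense_filtered` with whichever season is treated as the previous one.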