Our preprocessing transforms raw weekly player statistics into position-specific performance metrics. The main goal is to identify players whose production is both high and consistent, so they can be used in predictive modeling.

1. Aggregation of Weekly Data

We start by parsing the weekly offense dataset.

For each player, we compute average performance metrics per week within their respective position (RB, WR, TE, QB).

This allows us to rank players by typical production levels for each position.
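
As a rough illustration of this aggregation step, the sketch below averages a single metric per player and ranks players within their position. The toy rows are invented for illustration, and the `avg_ppr` and `position_rank` names are hypothetical; only `player_id`, `position`, `week`, and `fantasy_points_ppr` follow the column names used in this notebook.

```python
import pandas as pd

# Toy stand-in for the weekly offense data (values are invented for illustration).
toy = pd.DataFrame({
    "player_id": ["a", "a", "b", "b", "c", "c"],
    "position": ["RB", "RB", "RB", "RB", "WR", "WR"],
    "week": [1, 2, 1, 2, 1, 2],
    "fantasy_points_ppr": [10.0, 20.0, 8.0, 6.0, 12.0, 18.0],
})

# Mean weekly production per player, kept within each position.
avg = (
    toy.groupby(["position", "player_id"], as_index=False)["fantasy_points_ppr"]
    .mean()
    .rename(columns={"fantasy_points_ppr": "avg_ppr"})  # hypothetical name
)

# Rank players within their own position; 1 = highest average.
avg["position_rank"] = (
    avg.groupby("position")["avg_ppr"]
    .rank(ascending=False, method="min")
    .astype(int)
)

print(avg.sort_values(["position", "position_rank"]))
```

Ranking inside `groupby("position")` keeps an RB from being compared directly against a WR, which is the point of aggregating within position.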

In [2]:
import pandas as pd
import numpy as np

In [None]:
weekly_offense = pd.read_csv('Data/weekly_player_stats_offense.csv')
# Check how many records are in the weekly player stats
print(f"Total records in weekly_offense: {len(weekly_offense):,}")
# Check which columns are available and print their names
print(weekly_offense.columns.tolist())

Total records in weekly_offense: 58,629
['season', 'week', 'offense_snaps', 'offense_pct', 'team_offense_snaps', 'player_id', 'birth_year', 'draft_year', 'draft_round', 'draft_pick', 'draft_ovr', 'height', 'weight', 'college', 'season_type', 'player_name', 'position', 'depth_team', 'conference', 'division', 'team', 'shotgun', 'no_huddle', 'qb_dropback', 'qb_scramble', 'pass_attempts', 'complete_pass', 'incomplete_pass', 'passing_yards', 'receiving_yards', 'yards_after_catch', 'rush_attempts', 'rushing_yards', 'tackled_for_loss', 'first_down_pass', 'first_down_rush', 'third_down_converted', 'third_down_failed', 'fourth_down_converted', 'fourth_down_failed', 'rush_touchdown', 'pass_touchdown', 'safety', 'interception', 'fumble', 'fumble_lost', 'fumble_forced', 'fumble_not_forced', 'fumble_out_of_bounds', 'receptions', 'targets', 'passing_air_yards', 'receiving_air_yards', 'receiving_touchdown', 'pass_attempts_redzone', 'complete_pass_redzone', 'pass_touchdown_redzone', 'pass_attempts_gtg

In [None]:
# Check which positions are present and how many records each has
print(weekly_offense['position'].value_counts())
# Map each fantasy position to the metric columns used to evaluate it
performance_metrics = {
    'QB': ['passing_yards', 'pass_touchdown', 'interception', 'rushing_yards', 'rush_touchdown', 'fantasy_points_ppr'],
    'RB': ['rushing_yards', 'rush_touchdown', 'receiving_yards', 'receiving_touchdown', 'receptions', 'fumble', 'fantasy_points_ppr'],
    'WR': ['receiving_yards', 'receiving_touchdown', 'receptions', 'targets', 'fumble', 'fantasy_points_ppr'],
    'TE': ['receiving_yards', 'receiving_touchdown', 'receptions', 'targets', 'fumble', 'fantasy_points_ppr']
}

position
WR     23085
RB     15009
TE     11862
QB      7670
FB       730
P         84
CB        54
SS        38
FS        33
ILB       18
DE        11
DT        11
OLB        7
K          6
NT         5
MLB        3
LS         2
T          1
Name: count, dtype: int64


In [None]:
# Filter for fantasy football positions only
fantasy_positions = ['QB', 'RB', 'WR', 'TE']
weekly_offense_filtered = weekly_offense[weekly_offense['position'].isin(fantasy_positions)].copy()
# Check how many records remain after filtering
print(f"Total records in weekly_offense_filtered: {len(weekly_offense_filtered):,}")
# Check the position distribution of the filtered data
print(weekly_offense_filtered['position'].value_counts())


Total records in weekly_offense_filtered: 57,626
position
WR    23085
RB    15009
TE    11862
QB     7670
Name: count, dtype: int64
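
With the data restricted to the four fantasy positions, one possible next step is to average only the metrics relevant to each position. The sketch below assumes a trimmed copy of the `performance_metrics` dict and a small invented frame in place of `weekly_offense_filtered`; the `metrics` and `position_averages` names are hypothetical.

```python
import pandas as pd

# Trimmed copy of the performance_metrics dict, shortened for brevity.
metrics = {
    "QB": ["passing_yards", "pass_touchdown"],
    "WR": ["receiving_yards", "receptions"],
}

# Invented rows standing in for weekly_offense_filtered.
toy = pd.DataFrame({
    "player_id": ["q1", "q1", "w1", "w1"],
    "position": ["QB", "QB", "WR", "WR"],
    "passing_yards": [250, 310, 0, 0],
    "pass_touchdown": [2, 3, 0, 0],
    "receiving_yards": [0, 0, 80, 120],
    "receptions": [0, 0, 5, 7],
})

# One table of per-player weekly averages for each position,
# restricted to the metric columns listed for that position.
position_averages = {
    pos: toy.loc[toy["position"] == pos].groupby("player_id")[cols].mean()
    for pos, cols in metrics.items()
}

print(position_averages["QB"])
print(position_averages["WR"])
```

Keeping a separate table per position avoids averaging columns that carry no signal for that position, such as passing yards for a WR.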
