# Cleaning Game-by-Game NFL Data

I will be loading CSV files that contain NFL game-by-game data, then cleaning them so they can be processed further and eventually merged with other data.

In [1]:
import os
import numpy as np
import pandas as pd
import re

In [2]:
df_20 = pd.read_csv('/Users/epainter/Desktop/bet_model_v2/data/raw/gd_2020.csv')
df_21 = pd.read_csv('/Users/epainter/Desktop/bet_model_v2/data/raw/gd_2021.csv')
df_22 = pd.read_csv('/Users/epainter/Desktop/bet_model_v2/data/raw/gd_2022.csv')
df_23 = pd.read_csv('/Users/epainter/Desktop/bet_model_v2/data/raw/gd_2023.csv')
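
The four reads above differ only in the year, so they could also be driven by a loop. The sketch below is one way to do that, assuming the files follow the `gd_<year>.csv` naming shown above. The names `raw_dir`, `seasons`, and `frames` are hypothetical, and a temporary directory with placeholder files stands in for the real `data/raw` folder so the example runs anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd

seasons = [2020, 2021, 2022, 2023]

with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp)  # stand-in for the real data/raw directory (assumption)

    # Write a tiny placeholder file per season so the sketch is self-contained.
    for year in seasons:
        (raw_dir / f"gd_{year}.csv").write_text("Week,Pts\n1,20\n")

    # One data frame per season, keyed by year.
    frames = {year: pd.read_csv(raw_dir / f"gd_{year}.csv") for year in seasons}

print(sorted(frames))
```

Keying the frames by season also makes it easy to attach the season number later without tracking list positions.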

The first step is to remove the rows for playoff weeks, similar to how we cleaned the historical NFL betting data.

Again, this is because we want the model to work with regular-season games, not playoff games.
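
To illustrate the filter on its own, here is a minimal sketch on a made-up frame. The toy values are assumptions for demonstration only; the playoff labels are the ones used in this notebook.

```python
import pandas as pd

# Toy frame standing in for one season's raw file (values are made up).
toy = pd.DataFrame({
    "Week": ["1", "18", "WildCard", "SuperBowl", None],
    "Winner/tie": ["A", "B", "C", "D", None],
})

playoff_labels = ["WildCard", "Division", "ConfChamp", "SuperBowl"]

# Keep rows whose Week is neither a playoff label nor missing.
regular = toy[~toy["Week"].isin(playoff_labels) & toy["Week"].notna()]

print(regular["Week"].tolist())
```

Only the rows for weeks `1` and `18` survive; the two playoff rows and the row with a missing week are dropped.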

In [3]:
# For matching purposes we need to add a year to each data frame

df_20['Season'] = 2020
df_21['Season'] = 2021
df_22['Season'] = 2022
df_23['Season'] = 2023

df_list = [df_20, df_21, df_22, df_23]

for i in range(len(df_list)):
    df = df_list[i]
    
    values_to_drop = ['WildCard', 'Division', 'ConfChamp', 'SuperBowl']  # List of values to drop
    df = df[~df['Week'].isin(values_to_drop)]
    
    df = df[~df['Week'].isna()]
    
    cols_drop = ['Day', 'Date', 'Time', 'Unnamed: 5', 'Unnamed: 7']
    df = df.drop(columns=cols_drop)
    
    # Rename columns
    new_col_names = {
        'Week': 'week', 
        'Winner/tie': 'winner', 
        'Loser/tie': 'loser', 
        'Pts': 'pts_w', 
        'Pts.1': 'pts_l', 
        'YdsW': 'yds_w', 
        'TOW': 'to_w', 
        'YdsL': 'yds_l',
        'TOL': 'to_l',
        'Season': 'season'
    }
    df = df.rename(columns=new_col_names)
    
    df['week'] = df['week'].astype(int)
    
    # Assign the modified DataFrame back to the list
    df_list[i] = df
    

I created a list where each item is one NFL season's regular-season game-by-game data. Below, I assign each item back to its original data frame variable.

In [4]:
df_20 = df_list[0]
df_21 = df_list[1]
df_22 = df_list[2]
df_23 = df_list[3]
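
As an alternative to the index-based loop and the reassignment above, the same steps could be wrapped in a function. This is only a sketch: `clean_season` is a hypothetical helper, it sets `season` directly instead of renaming a `Season` column, and it uses `errors="ignore"` so the toy input does not need every dropped column.

```python
import pandas as pd

PLAYOFF_LABELS = ["WildCard", "Division", "ConfChamp", "SuperBowl"]
DROP_COLS = ["Day", "Date", "Time", "Unnamed: 5", "Unnamed: 7"]
RENAMES = {
    "Week": "week", "Winner/tie": "winner", "Loser/tie": "loser",
    "Pts": "pts_w", "Pts.1": "pts_l", "YdsW": "yds_w", "TOW": "to_w",
    "YdsL": "yds_l", "TOL": "to_l",
}


def clean_season(raw: pd.DataFrame, season: int) -> pd.DataFrame:
    """Hypothetical helper applying the cleaning steps to one season."""
    # Drop playoff rows and rows with no week value.
    df = raw[~raw["Week"].isin(PLAYOFF_LABELS) & raw["Week"].notna()]
    # Drop unused columns (missing ones are tolerated) and rename the rest.
    df = df.drop(columns=DROP_COLS, errors="ignore").rename(columns=RENAMES)
    # Cast week to int and attach the season number.
    return df.assign(week=df["week"].astype(int), season=season)


# Made-up input with one regular-season row and one playoff row.
toy_raw = pd.DataFrame({
    "Week": ["1", "WildCard"],
    "Day": ["Sun", "Sat"],
    "Winner/tie": ["Team A", "Team C"],
    "Loser/tie": ["Team B", "Team D"],
    "Pts": [24.0, 31.0],
    "Pts.1": [17.0, 28.0],
})

cleaned = clean_season(toy_raw, 2020)
print(cleaned.columns.tolist())
```

With a helper like this, the per-season frames could be built in a list comprehension rather than mutated in place.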

## Merging All Data

Here we will concatenate each season's game-by-game data frame into a single data frame.

In [6]:
df_combined = pd.concat(df_list, ignore_index=True)

In [7]:
df_combined

Unnamed: 0,week,winner,loser,pts_w,pts_l,yds_w,to_w,yds_l,to_l,season
0,1,Kansas City Chiefs,Houston Texans,34.0,20.0,369.0,0.0,360.0,1.0,2020
1,1,Seattle Seahawks,Atlanta Falcons,38.0,25.0,383.0,0.0,506.0,2.0,2020
2,1,Buffalo Bills,New York Jets,27.0,17.0,404.0,2.0,254.0,2.0,2020
3,1,Las Vegas Raiders,Carolina Panthers,34.0,30.0,372.0,0.0,388.0,0.0,2020
4,1,Chicago Bears,Detroit Lions,27.0,23.0,363.0,0.0,426.0,1.0,2020
...,...,...,...,...,...,...,...,...,...,...
1066,18,Las Vegas Raiders,Denver Broncos,27.0,14.0,359.0,0.0,286.0,1.0,2023
1067,18,Kansas City Chiefs,Los Angeles Chargers,13.0,12.0,268.0,1.0,353.0,1.0,2023
1068,18,New York Giants,Philadelphia Eagles,27.0,10.0,415.0,1.0,299.0,4.0,2023
1069,18,Los Angeles Rams,San Francisco 49ers,21.0,20.0,258.0,1.0,300.0,1.0,2023
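
A quick sanity check after concatenating is to count games and find the latest week per season. The sketch below shows the idea on made-up frames; the names `s1`, `s2`, and `combined` are hypothetical, and the same `groupby` calls could be run on `df_combined`.

```python
import pandas as pd

# Two tiny made-up "seasons" with the cleaned column names.
s1 = pd.DataFrame({"week": [1, 2], "winner": ["A", "B"], "season": 2020})
s2 = pd.DataFrame({"week": [1], "winner": ["C"], "season": 2021})

# ignore_index=True gives the result a fresh 0..n-1 index.
combined = pd.concat([s1, s2], ignore_index=True)

# Rows (games) per season and the highest week number seen in each.
games_per_season = combined.groupby("season").size()
max_week = combined.groupby("season")["week"].max()

print(games_per_season.to_dict())
print(max_week.to_dict())
```

If a season shows an unexpected game count or a week number beyond the regular season, the playoff filter or the source file is worth a second look.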


## Write to CSV

In [8]:
fp = '/Users/epainter/Desktop/bet_model_v2/data/clean/gd_clean.csv'

df_combined.to_csv(fp, index=False)

print(f"Data saved to {fp}")

Data saved to /Users/epainter/Desktop/bet_model_v2/data/clean/gd_clean.csv
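
To confirm a file written this way reads back as expected, a round-trip check can help. This is a minimal sketch with made-up data and a temporary file (`gd_clean_demo.csv` is a hypothetical name), not the real output path.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up frame using a few of the cleaned column names.
df = pd.DataFrame({
    "week": [1, 18],
    "winner": ["Team A", "Team B"],
    "pts_w": [34.0, 21.0],
    "season": [2020, 2023],
})

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "gd_clean_demo.csv"
    # index=False keeps the row index out of the file, as in the cell above.
    df.to_csv(out, index=False)
    reloaded = pd.read_csv(out)

# The reloaded frame should have the same shape and column names.
same_shape = reloaded.shape == df.shape
same_columns = list(reloaded.columns) == list(df.columns)
print(same_shape, same_columns)
```

Writing with `index=False` is what prevents an extra unnamed index column from appearing when the file is read again.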
