#### Rolling Averages for Game Statistics

In this section, we define a function to compute rolling averages for various game statistics over a specified number of past games (`game_window`). This is useful for tracking team performance trends over time while smoothing out short-term fluctuations.
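
The pipeline code in this section combines two kinds of average: a simple rolling mean and an exponentially weighted moving average (EWMA). The minimal sketch below compares them on a made-up six-game series for one team. The `points` values and `window = 3` are illustrative assumptions, not data from the game logs.

```python
import pandas as pd

# Hypothetical points scored by one team over six consecutive games
points = pd.Series([100, 110, 90, 120, 105, 95], dtype=float)

window = 3  # illustrative stand-in for game_window

# Simple rolling mean: unweighted average of the last `window` games
sma = points.rolling(window=window, min_periods=1).mean()

# EWMA with span=window: alpha = 2 / (window + 1) = 0.5, so recent games weigh more
ewma = points.ewm(span=window, adjust=False).mean()

print(pd.DataFrame({"points": points, "sma": sma.round(2), "ewma": ewma.round(2)}))
```

After the 120-point game the EWMA (108.75) sits above the simple mean (106.67), and after the 95-point game it drops to about 100.94 while the simple mean stays at 106.67. The EWMA reacts faster to recent results, and the simple mean is steadier.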

##### Methodology
1. **Load the Data:** We read game logs from a CSV file into a pandas DataFrame.
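2. **Preprocess Data:**
   - Convert the `date` column to a datetime format.
   - Sort the dataset by `team` and `date` to ensure chronological order.
   - Derive three features: net rating (`nrtg = ortg - drtg`), assist-to-turnover ratio (`ast_tov = ast / tov`), and assist ratio (`ast_ratio = ast / (fg + ast + tov)`).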
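3. **Calculate Rolling Averages** (separately for each team):
   - Compute a simple rolling mean over the last `game_window` games and an exponentially weighted moving average (EWMA) with `span` equal to `game_window`.
   - Blend the two as 40% rolling mean and 60% EWMA, so the result leans toward recent form while still reflecting the full window.
   - Shift the averages by one game so that each row reflects only games played before it.
   - For each team's first game, which has no prior games, keep that game's actual values to avoid NaNs in the output (this row therefore describes the game itself rather than past performance).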
4. **Post-processing:**
   - Round all numerical values to two decimal places for readability.
   - Save the processed data to an output CSV file, overwriting any existing file if necessary.
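
The pipeline code implements step 3 by blending a simple rolling mean with an EWMA. Written out for one team and one statistic, with $x_t$ the team's value in its $t$-th game, $w$ the `game_window`, and $\alpha = 2/(w+1)$ the smoothing factor pandas derives from `span` when `adjust=False`, the computation is roughly as follows. The weights 0.4 and 0.6 come from the code, assuming they are meant to form a weighted average that sums to one:

$$
\begin{aligned}
\mathrm{SMA}_t &= \frac{1}{\min(t, w)} \sum_{i=\max(1,\, t-w+1)}^{t} x_i, \\
E_1 &= x_1, \qquad E_t = (1 - \alpha)\, E_{t-1} + \alpha\, x_t \quad (t \ge 2), \\
\hat{x}_t &= 0.4\,\mathrm{SMA}_{t-1} + 0.6\, E_{t-1} \quad (t \ge 2), \qquad \hat{x}_1 = x_1.
\end{aligned}
$$

The one-game shift appears as the index $t-1$: the value reported for game $t$ uses only games $1, \dots, t-1$, except for the first game, which keeps its own value $x_1$.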

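For every team and game, this approach produces a smoothed summary of the team's recent form that (apart from the team's first game) uses only earlier results, so the output can serve as input features for predictive modeling or further analysis.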

# Importing libraries
import pandas as pd
import os

# Game window size
game_window: int = 25

# Pipeline function to execute the calculations
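def compute_rolling_averages(game_window: int, gamelogs_file: str, output_file: str) -> None:
    """Compute per-team, pre-game blended averages of every numeric stat and save them to CSV."""
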
    # Load the CSV file
    print("Loading CSV file...")
    df: pd.DataFrame = pd.read_csv(gamelogs_file)

    # Sort by team and date
    print("Sorting data by team and date...")
    df["date"] = pd.to_datetime(df["date"])
    df = df.sort_values(by=["team", "date"])

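    # Feature engineering: net rating, assist-to-turnover ratio, assist ratio
    # (values are rounded once at the end, so no intermediate rounding here)
    df["nrtg"] = df["ortg"] - df["drtg"]
    # Treat zero turnovers as missing to avoid infinite ratios that would corrupt the averages
    df["ast_tov"] = df["ast"] / df["tov"].where(df["tov"] != 0)
    df["ast_ratio"] = df["ast"] / (df["fg"] + df["ast"] + df["tov"])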

    # Identify stat columns: every numeric column except the `team` key
    # (`date` and any text metadata are excluded by dtype)
    stat_columns: list[str] = [
        col for col in df.select_dtypes(include="number").columns if col != "team"
    ]

    # Blend a simple rolling mean with an EWMA for a more balanced view of recent form
    print("Computing blended rolling mean and EWMA...")
    def compute_combined_avg(group: pd.DataFrame) -> pd.DataFrame:

        # Rolling average (simple mean)
        rolling_mean: pd.DataFrame = (
            group[stat_columns]
            .rolling(window=game_window, min_periods=1)
            .mean()
            .shift(1)
        )

        # Exponentially weighted mean (recent games weigh more)
        ewma: pd.DataFrame = (
            group[stat_columns].ewm(span=game_window, adjust=False).mean().shift(1)
        )

        # Weighted blend: 40% simple mean, 60% EWMA (weights sum to 1; adjust as preferred)
        combined: pd.DataFrame = 0.4 * rolling_mean + 0.6 * ewma

        # Fill the first row with actual values to avoid NaNs
        combined.iloc[0] = group.iloc[0][stat_columns]
        return combined

    # Apply per team
    df[stat_columns] = df.groupby("team", group_keys=False, observed=True)[
        stat_columns
    ].apply(compute_combined_avg)

    # Round the results
    print("Rounding values...")
    df[stat_columns] = df[stat_columns].round(2)

    # Save to CSV
    if os.path.exists(output_file):
        print(f"File {output_file} already exists. Removing...")
        os.remove(output_file)

    df.to_csv(output_file, index=False)
    print(f"Rolling averages saved to {output_file}")


# Run the function
compute_rolling_averages(game_window, "./csv/gamelogs.csv", "./csv/averages.csv")
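
Because the pipeline reads from and writes to disk, its per-team logic is hard to inspect directly. The sketch below re-implements the same steps (sort by team and date, per-team rolling mean and EWMA, one-game shift, 0.4/0.6 blend, first-game fallback) on a tiny in-memory frame, so you can check that each row only uses earlier games of the same team. The helper name `blended_pregame_avg`, the teams `BOS` and `LAL`, and the `pts` values are all hypothetical, and the blend assumes the weights are meant to sum to one.

```python
import pandas as pd


def blended_pregame_avg(df: pd.DataFrame, stat_cols: list[str], window: int) -> pd.DataFrame:
    """Hypothetical helper: per-team 0.4 * rolling mean + 0.6 * EWMA of previous games."""

    def per_team(group: pd.DataFrame) -> pd.DataFrame:
        sma = group.rolling(window=window, min_periods=1).mean().shift(1)
        ewma = group.ewm(span=window, adjust=False).mean().shift(1)
        blended = 0.4 * sma + 0.6 * ewma
        # First game has no history: fall back to its own values, as the pipeline does
        blended.iloc[0] = group.iloc[0]
        return blended

    ordered = df.sort_values(["team", "date"])
    return ordered.groupby("team", group_keys=False)[stat_cols].apply(per_team)


# Hypothetical game logs for two teams, deliberately out of chronological order
games = pd.DataFrame({
    "team": ["BOS", "LAL", "BOS", "LAL", "BOS"],
    "date": pd.to_datetime(["2024-01-03", "2024-01-01", "2024-01-01", "2024-01-03", "2024-01-05"]),
    "pts": [110.0, 100.0, 90.0, 120.0, 130.0],
})

result = blended_pregame_avg(games, ["pts"], window=4)
print(games.join(result, rsuffix="_avg").sort_values(["team", "date"]))
```

With `window=4` the smoothing factor is $\alpha = 2/5 = 0.4$, so BOS's third game gets $0.4 \times 100 + 0.6 \times 98 = 98.8$ (simple mean and EWMA of its first two games), while LAL's rows are computed from LAL games alone.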