#### Rolling Averages for Game Statistics

In this section, we define a function that computes smoothed averages of each team's game statistics over a window of past games (`game_window`). Smoothing damps game-to-game noise, so the resulting values track performance trends rather than single-game swings.

##### Methodology
1. **Load the Data:** We read game logs from a CSV file into a Pandas DataFrame.
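2. **Preprocess Data:**
   - Convert the `date` column to a datetime format.
   - Sort the dataset by `team` and `date` so each team's games are in chronological order.
   - Derive two extra features: assist-to-turnover ratio (`ast_tov`) and assist ratio (`ast_ratio`).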
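3. **Calculate Rolling Averages:**
   - For each team, compute a simple rolling mean over the last `game_window` games and an exponentially weighted moving average (EWMA) with `span` equal to `game_window`.
   - Combine the two with fixed weights: 30% rolling mean and 70% EWMA.
   - Shift the result by one game so that each row reflects only past performance.
   - Keep each team's first game at its observed values, since it has no history and would otherwise be NaN.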
4. **Post-processing:**
   - Round all numerical values to two decimal places for readability.
   - Save the processed data to an output CSV file, overwriting any existing file if necessary.
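
As a sketch of what step 3 computes: the code in this section blends a rolling mean with an EWMA, and the formulas below assume pandas' documented recursion for `ewm(span=..., adjust=False)` and no missing values. The symbols are introduced here for illustration only: $x_t$ is one team's value of a statistic in its $t$-th game (counting from 0), $w$ is `game_window`, $m_t$ is the rolling mean, $y_t$ is the EWMA, and $c_t$ is the value written to the output.

$$
\alpha = \frac{2}{w + 1}, \qquad y_0 = x_0, \qquad y_t = (1 - \alpha)\, y_{t-1} + \alpha\, x_t
$$

$$
m_t = \frac{1}{\min(t + 1,\, w)} \sum_{i = \max(0,\, t - w + 1)}^{t} x_i
$$

$$
c_0 = x_0, \qquad c_t = 0.3\, m_{t-1} + 0.7\, y_{t-1} \quad (t \ge 1)
$$

With `game_window = 25` this gives $\alpha = 2/26 \approx 0.077$. Because $c_t$ depends only on $m_{t-1}$ and $y_{t-1}$, the value for a game never includes that game's own statistics.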

Because every average is shifted by one game, each row contains only information that was available before that game was played, so the output can be used for predictive modeling or further analysis without leaking the current result.
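
To make the shift-by-one behaviour concrete, here is a minimal sketch on a made-up two-team log. The data, the `pts_avg` column, the `combined_average` helper, and the window of 3 are all hypothetical and chosen only to keep the arithmetic easy to follow. The 30%/70% weights mirror the function in this section. The per-team loop stands in for `groupby().apply()` purely for simplicity.

```python
import pandas as pd

# Hypothetical toy game log: two teams, four games each (values are made up).
toy = pd.DataFrame(
    {
        "team": ["A"] * 4 + ["B"] * 4,
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-03", "2024-01-05", "2024-01-07"] * 2
        ),
        "pts": [100.0, 110.0, 90.0, 120.0, 80.0, 85.0, 95.0, 105.0],
    }
).sort_values(["team", "date"])

window = 3  # illustrative window, much smaller than the notebook's 25


def combined_average(stats: pd.DataFrame) -> pd.DataFrame:
    """Blend 30% rolling mean with 70% EWMA, using only earlier games."""
    rolling_mean = stats.rolling(window=window, min_periods=1).mean().shift(1)
    ewma = stats.ewm(span=window, adjust=False).mean().shift(1)
    combined = 0.3 * rolling_mean + 0.7 * ewma
    # The first game has no history, so keep its observed values.
    combined.iloc[0] = stats.iloc[0]
    return combined


# Process each team separately so one team's games never feed another's average.
parts = [combined_average(group[["pts"]]) for _, group in toy.groupby("team")]
toy["pts_avg"] = pd.concat(parts)["pts"].round(2)

print(toy)
```

For team A the smoothed column comes out as 100, 100, 105, 98.25. The 120-point fourth game does not influence its own row, which is the property that prevents leakage.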

In [None]:
# Importing libraries
import pandas as pd
import os
from tqdm import tqdm

# Game window size
game_window: int = 25


# Pipeline function to execute the calculations
def compute_rolling_averages(game_window: int, gamelogs_file: str, output_file: str):
    """
    Compute rolling and exponentially weighted averages for team game statistics.

    This pipeline function loads a game logs CSV file, sorts the data by team
    and date, performs feature engineering, and computes a combined rolling
    average for each numerical statistic on a per-team basis.

    The combined average is calculated using:
    - A simple rolling mean over a fixed game window
    - An exponential weighted moving average (EWMA) that emphasizes recent games

    The final value is a weighted combination of both methods, shifted by one
    game to prevent data leakage from the current match.

    Parameters
    ----------
    game_window : int
        Number of games to include in the rolling window.
    gamelogs_file : str
        Path to the input CSV file containing per-game team statistics.
    output_file : str
        Path where the computed rolling averages CSV will be saved.

    Returns
    -------
    None
        This function does not return a value. The processed data is written
        directly to the output CSV file.

    Raises
    ------
    FileNotFoundError
        If the input CSV file does not exist.
    ValueError
        If `game_window` is less than 1 or invalid rolling operations occur.
    KeyError
        If required statistical columns are missing from the dataset.
    OSError
        If the output file cannot be created or overwritten.

    Notes
    -----
    - All statistics are shifted by one game to avoid using information from
      the current game.
    - Feature engineering includes assist-to-turnover ratio and assist ratio.
    - The combined average weights are currently fixed at 30% rolling mean
      and 70% EWMA.
    - Existing output files are removed before writing new results.
    """

    # Load the CSV file
    tqdm.write("Loading CSV file...")
    df: pd.DataFrame = pd.read_csv(gamelogs_file)
    tqdm.write(f"   Loaded {len(df)} game records")

    # Parse dates, then sort by team and date so each team's games are chronological
    tqdm.write("   Sorting data by team and date...")
    df["date"] = pd.to_datetime(df["date"])
    df = df.sort_values(by=["team", "date"])

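    # Feature Engineering
    tqdm.write("   Engineering additional features...")
    # Treat zero denominators as missing so the ratios become NaN instead of inf
    # (an inf would distort every rolling window that contains it)
    tov = df["tov"].replace(0, float("nan"))
    usage = (df["fg"] + df["ast"] + df["tov"]).replace(0, float("nan"))
    df["ast_tov"] = (df["ast"] / tov).round(2)
    df["ast_ratio"] = (df["ast"] / usage).round(2)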

    # Treat every column except `date` and `team` as a statistic
    # (this assumes all remaining columns are numeric)
    stat_columns: list[str] = [col for col in df.columns if col not in ["date", "team"]]
    tqdm.write(f"   Processing {len(stat_columns)} statistical columns")

    # Get unique teams for progress tracking
    teams = df["team"].unique()
    tqdm.write(f"\nComputing rolling averages for {len(teams)} teams...\n")

    # Combine the two averaging methods to get a more balanced estimate
    def compute_combined_avg(group: pd.DataFrame) -> pd.DataFrame:
        """
        Compute a combined rolling and exponentially weighted average for a team.

        This helper function operates on a single team's game log DataFrame and
        calculates two smoothed statistics for each numerical column:
        - A simple rolling mean over a fixed game window
        - An exponential weighted moving average (EWMA) emphasizing recent games

        The two measures are combined into a single value using fixed weights and
        shifted by one game to prevent data leakage from the current match.

        Parameters
        ----------
        group : pandas.DataFrame
            A DataFrame containing game-by-game statistics for a single team,
            ordered chronologically.

        Returns
        -------
        pandas.DataFrame
            A DataFrame of the same shape as the input statistics, containing
            the combined rolling averages for each game.

        Notes
        -----
        - The rolling window size and the list of statistic columns come from
          the enclosing scope (`game_window`, `stat_columns`); the 0.3/0.7
          weights are hard-coded in the function body.
        - The first game for each team is filled with actual observed values
          to avoid NaN results after shifting.
        - This function is intended to be used with `groupby().apply()`.
        """

        # Rolling average (simple mean)
        rolling_mean: pd.DataFrame = (
            group[stat_columns]
            .rolling(window=game_window, min_periods=1)
            .mean()
            .shift(1)
        )

        # Exponential weighted mean (recent games weighted more)
        ewma: pd.DataFrame = (
            group[stat_columns].ewm(span=game_window, adjust=False).mean().shift(1)
        )

        # Combined average: 30% rolling mean, 70% EWMA (adjust the ratio if you prefer)
        combined: pd.DataFrame = 0.3 * rolling_mean + 0.7 * ewma

        # Fill the first row with actual values to avoid NaNs
        combined.iloc[0] = group.iloc[0][stat_columns]
        return combined

    # Apply per team with progress bar
    tqdm.pandas(desc="Computing averages", unit="team", colour="green")
    df[stat_columns] = df.groupby("team", group_keys=False, observed=True)[
        stat_columns
    ].progress_apply(compute_combined_avg)

    # Round the results
    tqdm.write("\nRounding values...")
    df[stat_columns] = df[stat_columns].round(2)

    # Save to CSV
    if os.path.exists(output_file):
        tqdm.write(f"File {output_file} already exists. Removing...")
        os.remove(output_file)

    tqdm.write(f"Saving results to {output_file}...")
    df.to_csv(output_file, index=False)
    tqdm.write("Rolling averages saved successfully!")
    tqdm.write(f"Output: {output_file}")


if __name__ == "__main__":
    print(f"Game window size: {game_window}\n")
    compute_rolling_averages(game_window, "./csv/gamelogs.csv", "./csv/averages.csv")
    print("\nDone!\n")