# Objective:

This notebook predicts the outcomes of one week of college football games. Given a month, day, and year, it loads that week's schedule and the team statistics, then predicts the winner of each game.

The predictions are saved to `predictions/predictions_<month>_<day>_<year>.csv`.

## Import Libraries

In [2]:
# Data Manipulation
import pandas as pd

# Schedule Retrieval
from this_weeks_games import get_this_weeks_games

# Model Loading
import joblib

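`get_game_data` builds the feature row for one matchup: it looks up the offensive and defensive statistics for `home_team` and `away_team`.

It reads those statistics from `offense_df` and `defense_df`.

It returns a pandas DataFrame holding the `X` feature space used for the ML predictions. The result has exactly one row when each team appears exactly once in both tables.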

In [3]:
def get_game_data(home_team: str, away_team: str, offense_df: pd.DataFrame, defense_df: pd.DataFrame) -> pd.DataFrame:
	# Feature columns: everything except `team` and columns whose name contains 'remove'.
	# The list is taken from `offense_df` and applied to both tables, so `defense_df` must share those names.
	feature_cols = [x for x in offense_df.columns if 'remove' not in x and x != 'team']

	def team_stats(stats_df: pd.DataFrame, team_name: str) -> pd.DataFrame:
		# One team's rows as floats, re-indexed from 0 so the four frames align on index.
		return stats_df.query('team == @team_name')[feature_cols].astype('float').reset_index(drop=True)

	home_offense = team_stats(offense_df, home_team)
	home_defense = team_stats(defense_df, home_team)
	away_offense = team_stats(offense_df, away_team)
	away_defense = team_stats(defense_df, away_team)

	# pandas applies suffixes only to overlapping names: home columns get `_home_off` / `_home_def`,
	# away-offense columns keep their raw names, and away-defense columns get `_away_off`.
	# The suffixes are left unchanged because a saved model may depend on these feature names.
	game_df = pd.merge(home_offense, home_defense, left_index=True, right_index=True, suffixes=('_home_off', '_home_def'))
	game_df = pd.merge(game_df, away_offense, left_index=True, right_index=True, suffixes=('', '_away_off'))
	game_df = pd.merge(game_df, away_defense, left_index=True, right_index=True, suffixes=('', '_away_off'))

	return game_df
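
The column names produced by the three merges are easy to misread, because pandas applies `suffixes` only to column names that appear in both frames. The sketch below reruns the same merge sequence on made-up single-row frames (the `yards` and `points` columns and their values are hypothetical, not taken from `data/offense.csv`) and lists the resulting names. It also shows one possible alternative, `add_suffix` with `pd.concat`, that labels all four blocks explicitly.

```python
import pandas as pd

# Hypothetical single-row stat frames standing in for the four lookups.
home_offense = pd.DataFrame({'yards': [400.0], 'points': [30.0]})
home_defense = pd.DataFrame({'yards': [350.0], 'points': [21.0]})
away_offense = pd.DataFrame({'yards': [380.0], 'points': [27.0]})
away_defense = pd.DataFrame({'yards': [300.0], 'points': [17.0]})

# Same merge sequence as get_game_data.
game_df = pd.merge(home_offense, home_defense, left_index=True, right_index=True,
                   suffixes=('_home_off', '_home_def'))
game_df = pd.merge(game_df, away_offense, left_index=True, right_index=True,
                   suffixes=('', '_away_off'))
game_df = pd.merge(game_df, away_defense, left_index=True, right_index=True,
                   suffixes=('', '_away_off'))
merged_columns = list(game_df.columns)
print(merged_columns)

# Alternative sketch: label every block explicitly, then place them side by side.
explicit_df = pd.concat(
    [
        home_offense.add_suffix('_home_off'),
        home_defense.add_suffix('_home_def'),
        away_offense.add_suffix('_away_off'),
        away_defense.add_suffix('_away_def'),
    ],
    axis=1,
)
explicit_columns = list(explicit_df.columns)
print(explicit_columns)
```

Switching to the explicit naming would change the feature names, so a model trained on the current names would presumably need to be retrained on the renamed columns.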

## Load in Helper Data

Loads `offense_df` and `defense_df`, the per-team statistics tables that `get_game_data` looks up for each matchup.

In [4]:
offense_df = pd.read_csv('data/offense.csv')
defense_df = pd.read_csv('data/defense.csv')
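
Because `get_game_data` filters each table by `team`, a team that appears more than once yields more than one feature row. A minimal sketch of a sanity check that could be run after loading, assuming only that the tables have a `team` column; the helper name `teams_with_duplicate_rows` and the toy values are hypothetical:

```python
import pandas as pd

def teams_with_duplicate_rows(stats_df: pd.DataFrame, team_col: str = 'team') -> list:
    """Return the teams that appear more than once in a statistics table."""
    counts = stats_df[team_col].value_counts()
    return sorted(counts[counts > 1].index)

# Toy table with a deliberate duplicate (values are made up).
toy_offense = pd.DataFrame({
    'team': ['army', 'navy', 'navy'],
    'yards': [400.0, 380.0, 390.0],
})

duplicated_teams = teams_with_duplicate_rows(toy_offense)
print(duplicated_teams)
```

Teams missing from a table cannot be detected this way; they simply produce zero rows for the matchup.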

## Load in Schedule for This Week

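Uses the `get_this_weeks_games` function (from `week_games.ipynb`, imported above from `this_weeks_games`) to fetch the games for the given date. If no games are found, the notebook stops here.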

In [5]:
month = 12
day = 6
year = 2022

games_this_week = get_this_weeks_games(month=month, day=day, year=year)

if games_this_week.shape[0] == 0:
	print('No games this week. Exiting.')
	exit()

In [6]:
games_this_week

| Unnamed: 0 | home_teams | away_teams |
|---|---|---|
| 0 | army | navy |


## Create `X` Feature Space

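Aggregates each game's `X` feature space (`game_df`) into `X_weekend`, one row per game. Games for which `get_game_data` does not return exactly one row are skipped, so `home_teams` and `away_teams` list only the games that will be predicted.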

In [7]:
home_teams = []
away_teams = []

X_weekend = None

for home_team, away_team in zip(games_this_week['home_teams'], games_this_week['away_teams']):
	X_game = get_game_data(home_team, away_team, offense_df, defense_df)

	if X_game.shape[0] != 1:
		continue

	home_teams.append(home_team)
	away_teams.append(away_team)

	if X_weekend is None:
		X_weekend = X_game
	else:
		X_weekend = pd.concat((X_weekend, X_game))
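
The loop above can also be written by collecting the per-game rows in a list and concatenating once at the end, which makes the "no usable games" case explicit. This is a sketch, not the notebook's code: `toy_game_data`, the `stats` dictionary, and the team `unknown_team` are hypothetical stand-ins so the example runs on its own.

```python
import pandas as pd

def toy_game_data(home_team: str, away_team: str, stats: dict) -> pd.DataFrame:
    """Hypothetical stand-in for get_game_data: one row if both teams are known, else empty."""
    if home_team not in stats or away_team not in stats:
        return pd.DataFrame()
    return pd.DataFrame({'home_stat': [stats[home_team]], 'away_stat': [stats[away_team]]})

stats = {'army': 1.0, 'navy': 2.0}
schedule = pd.DataFrame({
    'home_teams': ['army', 'army'],
    'away_teams': ['navy', 'unknown_team'],
})

rows, home_teams, away_teams = [], [], []
for home_team, away_team in zip(schedule['home_teams'], schedule['away_teams']):
    X_game = toy_game_data(home_team, away_team, stats)
    if X_game.shape[0] != 1:
        # Report the skip instead of dropping the game silently.
        print(f'Skipping {home_team} vs. {away_team}: got {X_game.shape[0]} feature rows')
        continue
    rows.append(X_game)
    home_teams.append(home_team)
    away_teams.append(away_team)

# Concatenate once; keep None when every game was skipped so callers can check for it.
X_weekend = pd.concat(rows, ignore_index=True) if rows else None
```

Checking `X_weekend is None` before predicting avoids passing `None` to the model when every game was skipped.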

## Load Trained Model

Loads the best-performing trained model from `models/cfb_lr_model.joblib` as `clf`.

In [8]:
clf = joblib.load('models/cfb_lr_model.joblib')

## Predict Game Outcomes

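Creates `weekend_df` and predicts the winner of each game. `win_prob` is the predicted probability that the home team wins; the home team is picked when `win_prob` is at least 0.5, otherwise the away team. The output is written to `predictions/predictions_<month>_<day>_<year>.csv`.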

In [9]:
weekend_df = pd.DataFrame(
	{
		'home_teams': home_teams,
		'away_teams': away_teams,
		'win_prob': clf.predict_proba(X_weekend)[:, 1]
	}
)

weekend_df['winner'] = weekend_df.apply(lambda row: row['home_teams'] if row['win_prob'] >= 0.5 else row['away_teams'], axis=1)

weekend_df.to_csv(f'predictions/predictions_{month}_{day}_{year}.csv', index=False)
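
The winner selection can also be expressed without a row-wise `apply`, and `to_csv` fails if the output directory does not exist. A minimal sketch of both points, using made-up probabilities and hypothetical team names `team_a` and `team_b`, and writing to a temporary directory rather than the notebook's `predictions/` folder:

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

month, day, year = 12, 6, 2022

# Made-up probabilities standing in for clf.predict_proba(X_weekend)[:, 1].
weekend_df = pd.DataFrame({
    'home_teams': ['army', 'team_a'],
    'away_teams': ['navy', 'team_b'],
    'win_prob': [0.62, 0.41],
})

# Vectorized equivalent of the row-wise lambda: home team at >= 0.5, else away team.
weekend_df['winner'] = np.where(
    weekend_df['win_prob'] >= 0.5,
    weekend_df['home_teams'],
    weekend_df['away_teams'],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    out_dir = Path(tmp_dir) / 'predictions'
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder before writing
    out_path = out_dir / f'predictions_{month}_{day}_{year}.csv'
    weekend_df.to_csv(out_path, index=False)
    written = pd.read_csv(out_path)
```

In the notebook itself, the same `mkdir` call on `Path('predictions')` before `to_csv` would cover a fresh checkout where the folder is missing.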