In [None]:
%load_ext autoreload
%autoreload 2
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from ncaa_simulator import Data, Submission, Tournament, round_names

In [None]:
# Initialize the Data class, which locates and loads the competition files
mw = 'M'
ncaa_data = Data(mw, dir='../input/mens-march-mania-2022/MDataFiles_Stage1')
season = 2021
# Read the submission file and wrap it in a Submission object
df = pd.read_csv('../input/ncaa-m-model-2022-lgbm/submission.csv')
submission = Submission(sub_df=df, data=ncaa_data)

# Create a Tournament object for the chosen season
tourney = Tournament(data=ncaa_data, submission=submission, season=season)

# Now what can these do?
## Data class
Pass this object to the other two classes. It handles retrieval of all the necessary data, and it can load data for either the men's or the women's Kaggle competition.

## The Submission Class
The Submission class adds extra information to your plain submission file, such as the round in which each game takes place. It also has a method to look up predictions by team names or IDs.
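To make the lookup idea concrete, here is a minimal sketch that does not use `ncaa_simulator` at all. It assumes the usual Kaggle submission layout, where `ID` is `Season_Team1ID_Team2ID` and `Pred` is the probability that the lower-ID team wins. The helper names `split_ids` and `lookup_pred` and the two example rows are hypothetical, and the real Submission class may work differently.

```python
import pandas as pd

# Hypothetical two-row submission in the assumed "Season_Team1ID_Team2ID" format.
sub = pd.DataFrame({
    "ID": ["2021_1101_1104", "2021_1101_1111"],
    "Pred": [0.35, 0.62],
})

def split_ids(sub_df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with integer Season, Team1ID and Team2ID columns added."""
    parts = sub_df["ID"].str.split("_", expand=True).astype(int)
    parts.columns = ["Season", "Team1ID", "Team2ID"]
    return pd.concat([sub_df, parts], axis=1)

def lookup_pred(parsed: pd.DataFrame, season: int, team_a: int, team_b: int) -> float:
    """Probability that team_a beats team_b, whichever order the IDs are passed in."""
    low, high = sorted((team_a, team_b))
    row = parsed[(parsed["Season"] == season)
                 & (parsed["Team1ID"] == low)
                 & (parsed["Team2ID"] == high)]
    p_low = float(row["Pred"].iloc[0])  # stored probability refers to the lower ID
    return p_low if team_a == low else 1.0 - p_low

parsed = split_ids(sub)
```

Calling `lookup_pred(parsed, 2021, 1104, 1101)` flips the stored probability, because the submission row is written from the point of view of the lower team ID.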

In [None]:
submission.df.head(2)

More interestingly...

In [None]:
pred = submission.df.iloc[0,2]
# pred.s_dict_rev

In [None]:
submission.df.head(2)

The round names can be found in a dictionary at the top of the Python code. Now, how about predictions?

In [None]:
pred = submission.get_pred_by_teams(season=season, t1_name='Houston', t2_name='Ohio')
pred

This prediction class has methods to do things like randomly pick a winner based on the prediction.
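Picking a winner at random from a prediction amounts to a single weighted coin flip. The sketch below illustrates that idea with NumPy only. The function `pick_winner` and the team IDs 1001 and 1002 are hypothetical and are not part of the prediction class.

```python
import numpy as np

def pick_winner(t1_id, t2_id, p_t1_wins, rng):
    """Return t1_id with probability p_t1_wins, otherwise t2_id."""
    return t1_id if rng.random() < p_t1_wins else t2_id

# Draw many winners for one hypothetical game where team 1001 is an 80% favorite.
rng = np.random.default_rng(13)
picks = [pick_winner(1001, 1002, 0.8, rng) for _ in range(10_000)]
share_t1 = picks.count(1001) / len(picks)
```

Over many draws, `share_t1` should land close to the 0.8 win probability, which is what makes repeated random simulations usable as odds estimates.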

In [None]:
pred.get_favored(), pred.t1_id, pred.t1_name

## The Tournament Class
The Tournament class is built on top of the prediction classes. It offers simulation options and ways to calculate expected losses or tournament odds, such as the odds of making the championship.

The first simple example uses a dictionary of game slots and winning team IDs, stored in the Python code, to load historic tournament results that we can compare our submission against. I am planning to update these dictionary values for 2022 as the tournament progresses.

In [None]:
tourney.get_historic_results()

The results are loaded and below we see that Baylor beat Gonzaga to win the 2021 tournament.

In [None]:
tourney.summary_to_df().head()

Below are the losses. I am dropping the last 4, which are play-in games. There was also one cancelled game in the 2021 tournament, in slot `R1X4`, that is not appropriately reflected in the losses here. If we omit all of those, we can perfectly match my Kaggle score from last year.
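For reference, a per-game loss of this kind can be sketched as follows, assuming the competition metric is binary log loss (my assumption here; check the competition's evaluation page). The function `game_log_loss` and the three example games are hypothetical and do not reproduce `get_losses`.

```python
import numpy as np

def game_log_loss(pred_t1_wins, t1_won, eps=1e-15):
    """Per-game binary log loss; predictions are clipped away from 0 and 1."""
    p = np.clip(np.asarray(pred_t1_wins, dtype=float), eps, 1 - eps)
    y = np.asarray(t1_won, dtype=float)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Three hypothetical games: a confident hit, a coin flip, and a confident miss.
preds = [0.9, 0.5, 0.2]
outcomes = [1, 0, 1]
losses = game_log_loss(preds, outcomes)
mean_loss = losses.mean()
```

The confident miss dominates the mean, which is why a single upset can move a tournament score so much.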

In [None]:
loss = tourney.get_losses(kaggle=True).mean()
loss

Let's try some simulations. First, we will just take the favored team in each slot.

In [None]:
tourney.reset_tournament()
tourney.simulate_tournament('chalk') 
tourney.summary_to_df().head()


Or we can randomize it.

In [None]:
tourney.reset_tournament()
tourney.simulate_tournament('random', seed=13) 
tourney.summary_to_df().head()

Or we can run many simulations and track our expected outcomes and losses.
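As a rough picture of what repeated simulation does, the sketch below plays a made-up four-team bracket many times and counts how often each team reaches each round. The teams, the win probabilities in `p`, and the helper names are all hypothetical, and none of this is taken from `simulate_tournaments`.

```python
import numpy as np
import pandas as pd

# Hypothetical win probabilities: p[(a, b)] is the chance that a beats b.
p = {("A", "B"): 0.7, ("C", "D"): 0.6,
     ("A", "C"): 0.55, ("A", "D"): 0.65,
     ("B", "C"): 0.45, ("B", "D"): 0.5}

def prob(a, b):
    """Probability that a beats b, using the complement if only (b, a) is stored."""
    return p[(a, b)] if (a, b) in p else 1.0 - p[(b, a)]

def play(a, b, rng):
    """Simulate one game and return the winner."""
    return a if rng.random() < prob(a, b) else b

def simulate(n_sim, seed=0):
    """Count, per team, how often it reaches the final and wins the title."""
    rng = np.random.default_rng(seed)
    counts = pd.DataFrame(0, index=list("ABCD"), columns=["Final", "Champion"])
    for _ in range(n_sim):
        f1 = play("A", "B", rng)  # first semifinal
        f2 = play("C", "D", rng)  # second semifinal
        counts.loc[[f1, f2], "Final"] += 1
        counts.loc[play(f1, f2, rng), "Champion"] += 1
    return counts

n_runs = 2000
counts = simulate(n_runs)
odds = counts / n_runs  # counts divided by the number of runs give estimated odds
```

Dividing the counts by the number of runs is the same step applied to `results` further down, and more runs give steadier estimates.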

In [None]:
n_sim = 10
results, expected_losses = tourney.simulate_tournaments(n_sim)

Let's visualize the team performance by round from the simulation.

In [None]:

results.head(10)

In [None]:
# Convert the per-round counts into simulated odds by dividing by the number of runs
odds = results.copy()
odds.iloc[:, 1:] = results.iloc[:, 1:] / n_sim
odds.head(10)

In [None]:
odds.set_index('Team')['Championship'].plot(kind='pie')
plt.axis('equal')
plt.show()

In [None]:
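# Distribution of simulated losses, with the simulated mean and the actual 2021 loss marked
plt.hist(expected_losses)
plt.axvline(np.array(expected_losses).mean(), color='r', linestyle='--', label='Expected Loss')
plt.axvline(loss, color='k', linestyle='-', label='True Loss')
plt.legend()
plt.show()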