# Generate a March Madness Bracket

If you are seeing this before March 20, 2021, I've created a public ESPN March Madness brackets group <a href="https://fantasy.espn.com/tournament-challenge-bracket/2021/en/group?redirect=tcmen%3A%2F%2Fx-callback-url%2FshowGroup%3FgroupID%3D4123218&ex_cid=tcmen2021_email&groupID=4123218&inviteuser=ezU2NUJEOTAyLUU5MzktNDhCNC05QkQ5LTAyRTkzOTI4QjQxRn0%3D&invitesource=email"> <strong>here</strong></a>, so feel free to join! If you do decide to join, please only submit brackets that your model generated so as to not ruin the integrity of the competition.

<h2> How do I generate the DataFrame of predictions? </h2>

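Save your predictions as a CSV file or pandas DataFrame in the following format. Ensure your columns are named exactly ```ID``` and ```Pred```; otherwise, the code won't work. Each ```ID``` must have the form ```season_lowerTeamID_higherTeamID```, since that is the key the code looks up for every matchup.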

<img src="https://lh3.googleusercontent.com/2ovFfUcFoM4m8E75N-z4bv7mCXRT4mtZpBxC8OlZJE_kXO58K_G1KAHirByQku0qfmg-t44=s1000">

> Example input DataFrame
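
As a rough sketch of that format, the snippet below builds a few rows in plain Python and shows how a ```Pred``` value is read. The team IDs and ```Pred``` values are made up for illustration, and ```make_id``` and ```first_team_wins``` are hypothetical helpers (not part of the notebook) that mirror what the ```predict_id``` method does: a non-negative ```Pred``` is treated as a win for the lower-ID team.

```python
from itertools import combinations

def make_id(season, team_a, team_b):
    """Build the matchup key: season_lowerID_higherID."""
    low, high = sorted((team_a, team_b))
    return f"{season}_{low}_{high}"

def first_team_wins(preds, season, id1, id2):
    """Mirror of predict_id: Pred >= 0 means the lower-ID team wins."""
    lower_id_wins = preds[make_id(season, id1, id2)] >= 0
    return lower_id_wins if id1 < id2 else not lower_id_wins

# Hypothetical team IDs and Pred values, for illustration only.
team_ids = [1101, 1104, 1111]
fake_preds = [3.5, -2.0, 0.5]

# One row per unordered pair of teams, keyed the way the class expects.
rows = [
    {"ID": make_id(2021, a, b), "Pred": p}
    for (a, b), p in zip(combinations(team_ids, 2), fake_preds)
]
preds = {row["ID"]: row["Pred"] for row in rows}

print(rows[0])
print(first_team_wins(preds, 2021, 1104, 1101))
```

Passing ```rows``` to ```pd.DataFrame(rows)``` should give a DataFrame with the required ```ID``` and ```Pred``` columns.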

Initialize the ```predictSeason``` class with your DataFrame of predictions and use the ```predict_tour``` method to obtain the DataFrame of results. Example code is shown below:
```
pred_df = model.predict(X) # my DataFrame of predictions
my_season = predictSeason(pred_df)
res_df = my_season.predict_tour() # store all the results
pd.set_option('display.max_rows', 68) # set so you can see all of the rows in the DataFrame
res_df
```

The other parameters you can initialize for the ```predictSeason``` class are:
* ```season (int)```, default ```2021```, where $\text{season} \in [2003, 2019] \cup \{2021\}$. Change this variable if you would like to use your model to predict the bracket for a different season. I've tested this for about half of the possible years, so please let me know if there's a season where this code fails. If you do change it, ensure that ```pred_df``` contains the predictions for that season!
* ```PRINT (bool)```, default ```False```. If set to ```True```, the result of each game is printed to the terminal.

<h2> How do I use my DataFrame of simulated tournament games to fill out my March Madness bracket? </h2>

First, ensure your output DataFrame looks like mine in the image below:

<img src="https://lh3.googleusercontent.com/ApjWyH9IBpqDtXUK9IDyOM2u6y9AHgKhJymc8wZMtf2PS4azFeKo2wnH1qczoraOd0Yhpw=s1000">

> Example DataFrame of simulated tournament games

The results are shown from top to bottom in the following order: ```play-in, R64, R32, R16, R8 (div winner)``` for all 4 divisions in the order ```W, X, Y, Z```, and then the Final Four and championship games (labeled ```R4``` and ```R2``` in the ```Round``` column). I personally read the list from top to bottom and keep track of the seeds as I go. For clarity, there are two additional columns, ```Round``` and ```Division```, which should help you navigate the bracket creation process and fill in your picks accordingly.
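
To see why each division's games come out in that order, here is a minimal sketch of the pairing logic for a single division. It uses the same first-round seed order as the ```predict_div``` method, but replaces the model with a hypothetical rule, ```lower_seed_wins```, in which the better seed always wins; ```simulate_division``` is likewise an illustrative name rather than part of the class.

```python
# First-round seed order used by predict_div: adjacent entries play each other.
play_list = [1, 16, 8, 9, 5, 12, 4, 13, 6, 11, 3, 14, 7, 10, 2, 15]

def lower_seed_wins(seed1, seed2):
    """Hypothetical stand-in for the model: the better (lower) seed always wins."""
    return seed1 < seed2

def simulate_division(seeds, predict):
    """Play adjacent pairs round by round until one seed is left."""
    results = []
    round_num = 64
    while len(seeds) > 1:
        winners = []
        for i in range(0, len(seeds), 2):
            s1, s2 = seeds[i], seeds[i + 1]
            winner = s1 if predict(s1, s2) else s2
            results.append((f"R{round_num}", s1, s2, winner))
            winners.append(winner)
        # Winners keep their relative order, so adjacent winners meet next round.
        seeds = winners
        round_num //= 2
    return results

results = simulate_division(play_list, lower_seed_wins)
for round_label, s1, s2, winner in results:
    print(round_label, s1, "vs", s2, "->", winner)
```

Because winners stay in list order, pairing neighbors again in the next round reproduces the bracket layout without any extra bookkeeping.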

<h2> Bugs/Questions </h2>
    
If you find any bugs with the code or have any questions, please let me know in the comments section!

In [None]:
import pandas as pd

class predictSeason():
    def __init__(self, pred_df, season=2021, PRINT=False):
        assert 'ID' in pred_df.columns, "Column 'ID' is not found in your input DataFrame. Check your spelling and capitalization."
        assert 'Pred' in pred_df.columns, "Column 'Pred' is not found in your input DataFrame. Check your spelling and capitalization."
        assert str(season) in pred_df['ID'].str.slice(0, 4).unique(), f"{season} season not found in your prediction DataFrame."
        
        try:
            # STAGE was never defined, so read both files from the Stage 2 folder used for the seeds file
            self.teams_df = pd.read_csv("../input/ncaam-march-mania-2021-spread/MDataFiles_Stage2_Spread/MTeams.csv")
            self.seeding_df = pd.read_csv("../input/ncaam-march-mania-2021-spread/MDataFiles_Stage2_Spread/MNCAATourneySeeds.csv")
        except FileNotFoundError as err:
            raise FileNotFoundError("Some files are not found. Ensure the file paths are correct.") from err
            
        # Boolean filtering never raises, so check for an empty result instead
        self.year_df = self.seeding_df[self.seeding_df['Season'] == season]
        if self.year_df.empty:
            raise ValueError(f"The {season} season is out of range, please try again.")
            
        self.pred_df = pred_df
        self.season = season
        self.PRINT = PRINT
        self.res_list = []
        
    def predict_id(self, id1, id2):
        """
        Returns a boolean stating whether the team with id1 wins.
        """
        
        id_str = f"{self.season}_{min(id1, id2)}_{max(id1, id2)}"
        pred = self.pred_df.loc[self.pred_df['ID'] == id_str]['Pred'].iloc[0]
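        res = bool(pred >= 0)  # a non-negative Pred means the lower-ID team wins
        # The lookup key is ordered by ID, so flip the result when id1 is the higher ID
        return (not res) if id1 > id2 else res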
    
    def team_id(self, id_test):
        """
        Returns the name of a team with a certain ID.
        """
        
        return self.teams_df.loc[self.teams_df['TeamID'] == id_test]['TeamName'].iloc[0]
    
    def playin_round(self, id_tuple, div_df, playin_df, seed, div):
        """
        Handles logic for playin round (before Round of 64).
        """
        
        id1, id2 = id_tuple
        
        final_res = self.predict_id(id1, id2)

        team1 = f"{seed} {self.team_id(id1)}"
        team2 = f"{seed} {self.team_id(id2)}"
        
        # final_res is a boolean, so test it directly (final_res >= 0 was always true)
        final_res_str = "won" if final_res else "lost"
        self.res_list.append([team1, team2, team1 if final_res else team2, "play-in", div])
        print(f"Play-in round, seed {team1} played seed {team2} and {final_res_str}.") if self.PRINT else 0
        
        # DataFrame.append was removed in pandas 2.0, so add the winner's row with pd.concat
        winner_id = id1 if final_res else id2
        return pd.concat([div_df, playin_df.loc[playin_df['TeamID'] == winner_id]])
    
    def predict_div(self, div):
        """
        Simulate and return the division winner.
        """
        
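        # Work on a copy so the column assignments below don't trigger SettingWithCopyWarning
        div_df = self.year_df.loc[self.year_df['Seed'].str.contains(div)].copy()
        div_df['Seed'] = div_df['Seed'].str.replace(div, '', regex=False)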
        div_df['Seed'] = div_df['Seed'].str.lstrip('0')

        # Check for play-in rounds
        playin_df = div_df.loc[div_df['Seed'].str.len() > 2].copy()  # play-in seeds carry an extra letter, e.g. 16a/16b

        if len(playin_df):         
            div_df = div_df.loc[div_df['Seed'].str.len() <= 2]
            playin_df['Seed'] = playin_df['Seed'].str.slice(0, 2)
            seed = playin_df.iloc[0]['Seed']
            id_first = playin_df.iloc[:2]['TeamID'].tolist()
            div_df = self.playin_round(id_first, div_df, playin_df, seed, div)
            
            if len(playin_df) == 4:
                id_second = playin_df.iloc[-2:]['TeamID'].tolist()
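                # Label the second play-in with its own seed rather than assuming it is always 16
                div_df = self.playin_round(id_second, div_df, playin_df, playin_df.iloc[-1]['Seed'], div)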

        matchup_df = div_df[['Seed', 'TeamID']].set_index('Seed')
        play_list = [1, 16, 8, 9, 5, 12, 4, 13, 6, 11, 3, 14, 7, 10, 2, 15] # This initial order handles all of the logic we need for the bracket
        round_num = 64

        # Continue until there's a division winner.
        while True:
            
            tmp_list = []
            for i in range(0, len(play_list), 2):
                seed1 = str(play_list[i])
                seed2 = str(play_list[i+1])

                id1 = matchup_df.loc[seed1]['TeamID']
                id2 = matchup_df.loc[seed2]['TeamID']

                final_res = self.predict_id(id1, id2)

                team1 = f"{seed1} {self.team_id(id1)}"
                team2 = f"{seed2} {self.team_id(id2)}"

                final_res_str = "won" if final_res else "lost"
                self.res_list.append([team1, team2, team1 if final_res else team2, f"R{int(round_num)}", div])
                print(f"Seed {team1} played seed {team2} and {final_res_str}.") if self.PRINT else 0

                tmp_list.append(seed1 if final_res else seed2)

            play_list = tmp_list.copy()
            round_num //= 2  # integer division keeps round_num an int
            
            if len(play_list) <= 1:
                break

        return matchup_df.loc[play_list[0]]['TeamID']
    
    def predict_tour(self):
        """
        Driver function for creating the bracket. Run this and only this function.
        """
        
        div_list = ['W', 'X', 'Y', 'Z']
        div_winners = {}
        for div in div_list:
            print(f"Division {div}") if self.PRINT else 0
            winning_seed = self.predict_div(div)
            div_winners[div] = winning_seed
            print("\n") if self.PRINT else 0
        
        # Under the assumption that Division W always plays Division X in the Final Four
        
        id1 = div_winners['W']
        id2 = div_winners['X']
        id3 = div_winners['Y']
        id4 = div_winners['Z']
        
        team1 = self.team_id(id1)
        team2 = self.team_id(id2)
        team3 = self.team_id(id3)
        team4 = self.team_id(id4)
        
        # F4 W & X
        final_res_1 = self.predict_id(id1, id2)
        final_res_str_1 = "won" if final_res_1 else "lost"
        f4_winner_1 = [id1, team1] if final_res_1 else [id2, team2]
        self.res_list.append([team1, team2, team1 if final_res_1 else team2, "R4", "F4"])
        print(f"{team1} played {team2} and {final_res_str_1}.") if self.PRINT else 0
        
        # F4 Y & Z
        final_res_2 = self.predict_id(id3, id4)
        final_res_str_2 = "won" if final_res_2 else "lost"
        f4_winner_2 = [id3, team3] if final_res_2 else [id4, team4]
        self.res_list.append([team3, team4, team3 if final_res_2 else team4, "R4", "F4"])
        print(f"{team3} played {team4} and {final_res_str_2}.") if self.PRINT else 0
        
        # Championship
        champ_res_final = self.predict_id(f4_winner_1[0], f4_winner_2[0])
        champ_res_str = "won" if champ_res_final else "lost"
        champion = f4_winner_1[1] if champ_res_final else f4_winner_2[1]
        self.res_list.append([f4_winner_1[1], f4_winner_2[1], champion, "R2", "Finals"])
        print(f"THE CHAMPION FOR THE {self.season} SEASON IS {champion}!!!")
        
        res_df = pd.DataFrame(data=self.res_list, columns=["Team 1", "Team 2", "Winner", "Round", "Division"])
        return res_df

<h3> Example Code </h3>

In [None]:
pred_df = model.predict(X) # my DataFrame of predictions
my_season = predictSeason(pred_df)
res_df = my_season.predict_tour() # store all the results
pd.set_option('display.max_rows', 68) # set so you can see all of the rows in the DataFrame
res_df