# Creating the Perfect Bracket

There's nothing quite like the most riveting basketball event of the year: NCAA March Madness. The 64-team main bracket (play-in games aside) consists of 4 regions, each with 16 teams seeded 1 through 16, independently of the other regions, according to their regular season performance. Each team attempts to win 6 successive games in order to emerge victorious as the NCAA national champion.

Perhaps what contributes most to the intrigue of March Madness is filling out a bracket. "The American Gaming Association estimated in 2019 that 40 million Americans filled out a combined 149 million brackets for a collective wager of \$4.6 billion." Even a single bet can be quite lucrative, particularly when an upset occurs (when a worse-seeded underdog beats a better-seeded favorite). For example, the first-ever upset of a #1 seed by a #16 seed occurred in the 2018 NCAA tournament, when UMBC beat Virginia. In that game "a \$100 bet paid out \$2,500", which translates to American betting odds of +2500!
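The payout-to-odds conversion can be sketched in a few lines. This is an illustrative helper only (`american_odds` is a hypothetical name, not part of this project's modules), and it assumes the quoted \$2,500 is the profit on the stake rather than the total amount returned.

```python
def american_odds(stake: float, profit: float) -> int:
    """Convert a stake and the profit it earned into American (moneyline) odds."""
    if stake <= 0 or profit <= 0:
        raise ValueError("stake and profit must be positive")
    if profit >= stake:
        # Underdog: profit earned per 100 units staked, quoted with a plus sign.
        return round(100 * profit / stake)
    # Favorite: units that must be staked to win 100, quoted with a minus sign.
    return -round(100 * stake / profit)


print(american_odds(100, 2500))  # 2500, i.e. +2500
print(american_odds(100, 50))    # -200
```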

<br>
*All quotations were cited from the following article: https://www.gobankingrates.com/money/business/money-behind-march-madness-ncaa-basketball-tournament/*

### Problem Structure

The purpose of this personal project is to perform supervised classification on March Madness data to more accurately predict the outcomes of NCAA tournament games, particularly the occurrence of upsets. Filling out more accurate brackets than other participants would increase the chance of earning the kinds of profits mentioned above.

# Data Fetching

### Perceived Predictors

The first step is to scrape available data that is pertinent to deciding the outcome of an NCAA March Madness game between any two given teams. To do so, we must break down the most influential elements of a basketball team's success.

<br>Overall team performance during the regular season is generally a good indicator of how a team will perform in March Madness. This would be captured by statistics, both basic and advanced, such as the following:
**<br>Season Record (%)
<br>Conference Record (%); could be important given that the tournament is split into regions
<br>Regular Season Record vs. Tourney Opponent (%); set to 50% (a coin flip) if the two teams did not meet during the regular season
<br>Strength of Schedule (SOS); measures the difficulty of the teams played (higher number = greater difficulty)
<br>Top 25 Ranking (boolean); considered a consensus top-tier team
<br>Shots Made per Game (FG, 3P, FT)
<br>Point Differential per Game; measures by how much a team outscores (or is outscored by) its opponents on average
<br>Misc. Team Stats per Game (Rebounds, Assists, Blocks, etc.)
**
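Two of the predictors above involve a small calculation, sketched below. Both function names are hypothetical and are not taken from this project's modules; the 50% fallback mirrors the rule stated for teams with no regular season matchup, and the example totals are Virginia's 2019 points for, points against, and games played from the scraped season table.

```python
def head_to_head_pct(wins: int, losses: int, default: float = 0.5) -> float:
    """Regular season win rate against one opponent, or `default` if they never met."""
    games = wins + losses
    return wins / games if games else default


def point_diff_per_game(points_for: int, points_against: int, games: int) -> float:
    """Average scoring margin per game (positive means the team outscores opponents)."""
    return (points_for - points_against) / games


print(head_to_head_pct(0, 0))                          # 0.5
print(round(point_diff_per_game(2714, 2132, 38), 1))   # 15.3
```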

<br>However, March Madness is well known for its Cinderella stories: instances where average or underachieving regular season teams make big, unexpected runs in the tournament. Because of this, **it would likely be beneficial to also have team performance during the tournament as an indicator. The difficulty here will be transforming the data, which would fall into virtually the same categories as the data scraped for the regular season, in such a way that data leakage is avoided.**
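One way to keep in-tournament features free of leakage is to describe each game using only the games a team played before it. The sketch below is an assumption about how that could be done, not this project's implementation; the long-format table and its column names (`team`, `round`, `points`) are made up for illustration.

```python
import pandas as pd

# Hypothetical long-format box scores: one row per team per tournament game.
games = pd.DataFrame({
    "team":   ["A", "A", "A", "B", "B"],
    "round":  [1, 2, 3, 1, 2],
    "points": [70, 80, 60, 65, 75],
})

games = games.sort_values(["team", "round"])

# For each team, average only the games played BEFORE the current round:
# shift(1) pushes the current game out of the window, and
# expanding().mean() averages whatever earlier games remain.
games["prior_avg_points"] = (
    games.groupby("team")["points"]
         .transform(lambda s: s.shift(1).expanding().mean())
)

print(games)
```

A team's first tournament game has no earlier games, so its value is NaN; falling back to the regular season figure for that row is one reasonable option.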

<br>It's important to note that in the NCAA, more so than the NBA, experienced coaches can have just as much of an impact on a game's outcome as the players themselves. Hence, it's reasonable to assume that the following statistics could also be solid indicators:
**<br>Coach March Madness Appearances
<br>Coach Sweet Sixteen Appearances
<br>Coach Final Four Appearances
<br>Coach Championships Won
**

<br>And last but certainly not least, we need the data for the structure of the tournaments themselves:
**<br>Favorite Seed
<br>Underdog Seed
<br>Round Number (1-6)
<br>Game Outcome (boolean); did the underdog upset the favorite?
**

In [1]:
from data_fetch import get_team_data, get_rankings_data, get_coach_data

Team Regular Season

In [2]:
season_basic_df = get_team_data(url="https://www.sports-reference.com/cbb/seasons/2019-school-stats.html",
                                attrs={'id': 'basic_school_stats'})
season_basic_df

Unnamed: 0,Rk,School,G,W,L,W-L%,SRS,SOS,Unnamed: 8,W.1,...,FT,FTA,FT%,ORB,TRB,AST,STL,BLK,TOV,PF
0,1,Abilene Christian NCAA,34,27,7,.794,-1.91,-7.34,,14,...,457,642,.712,325,1110,525,297,93,407,635
1,2,Air Force,32,14,18,.438,-4.28,0.24,,8,...,341,503,.678,253,1077,434,154,57,423,543
2,3,Akron,33,17,16,.515,4.86,1.09,,8,...,380,539,.705,312,1204,399,185,106,388,569
3,4,Alabama A&M,32,5,27,.156,-19.23,-8.38,,4,...,284,453,.627,314,1032,385,234,50,487,587
4,5,Alabama-Birmingham,35,20,15,.571,0.36,-1.52,,10,...,424,630,.673,367,1279,401,218,82,399,578
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
382,349,Wright State,35,21,14,.600,3.29,-0.89,,13,...,510,692,.737,382,1229,484,214,72,402,545
383,350,Wyoming,32,8,24,.250,-9.75,0.19,,4,...,477,660,.723,167,983,331,176,88,450,588
384,351,Xavier,35,19,16,.543,9.61,8.06,,9,...,437,644,.679,371,1281,519,190,128,450,550
385,352,Yale NCAA,30,22,8,.733,5.52,-1.24,,10,...,411,557,.738,259,1157,503,177,131,392,510


In [3]:
season_adv_df = get_team_data(url="https://www.sports-reference.com/cbb/seasons/2019-advanced-school-stats.html",
                              attrs={'id': 'adv_school_stats'})
season_adv_df

Unnamed: 0,Rk,School,G,W,L,W-L%,SRS,SOS,Unnamed: 8,W.1,...,3PAr,TS%,TRB%,AST%,STL%,BLK%,eFG%,TOV%,ORB%,FT/FGA
0,1,Abilene Christian NCAA,34,27,7,.794,-1.91,-7.34,,14,...,.345,.565,50.3,58.5,12.9,8.0,.535,15.5,28.8,.239
1,2,Air Force,32,14,18,.438,-4.28,0.24,,8,...,.400,.541,50.1,54.1,7.0,5.8,.517,17.4,23.7,.192
2,3,Akron,33,17,16,.515,4.86,1.09,,8,...,.477,.515,48.2,50.1,8.2,8.9,.485,15.0,25.3,.195
3,4,Alabama A&M,32,5,27,.156,-19.23,-8.38,,4,...,.320,.479,47.1,52.3,10.7,4.7,.457,19.4,27.6,.157
4,5,Alabama-Birmingham,35,20,15,.571,0.36,-1.52,,10,...,.346,.536,52.7,44.3,9.3,7.5,.511,14.8,30.4,.212
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
382,349,Wright State,35,21,14,.600,3.29,-0.89,,13,...,.403,.543,52.3,54.7,9.0,6.1,.506,14.6,31.3,.251
383,350,Wyoming,32,8,24,.250,-9.75,0.19,,4,...,.435,.534,44.7,48.0,7.8,7.9,.492,18.6,15.6,.288
384,351,Xavier,35,19,16,.543,9.61,8.06,,9,...,.374,.553,53.4,56.3,8.1,10.6,.528,16.5,32.2,.221
385,352,Yale NCAA,30,22,8,.733,5.52,-1.24,,10,...,.350,.584,52.9,56.3,8.0,11.2,.556,15.9,25.8,.227


Tournament Game Data

In [4]:
mm_games_df = get_team_data(url=("https://apps.washingtonpost.com/sports/search/?pri_school_id=&pri_conference=&pri_coach"
                                 "=&pri_seed_from=1&pri_seed_to=16&pri_power_conference=&pri_bid_type=&opp_school_id"
                                 "=&opp_conference=&opp_coach=&opp_seed_from=1&opp_seed_to=16&opp_power_conference"
                                 "=&opp_bid_type=&game_type=7&from=2019&to=2019&submit="), 
                            attrs={'class': 'search-results'},
                            header=0)
mm_games_df

Unnamed: 0,Year,Round,Seed,Team,Score,Seed.1,Team.1,Score.1
0,2019,National ChampionshipNational Championship,1,Virginia Virginia,85,3,Texas Tech Texas Tech,77
1,2019,Final FourFinal Four,1,Virginia Virginia,63,4,Auburn Auburn,62
2,2019,Final FourFinal Four,2,Michigan State Michigan State,51,3,Texas Tech Texas Tech,61
3,2019,Elite EightElite Eight,1,Gonzaga Gonzaga,69,3,Texas Tech Texas Tech,75
4,2019,Elite EightElite Eight,1,Virginia Virginia,80,3,Purdue Purdue,75
...,...,...,...,...,...,...,...,...
62,2019,First RoundFirst Round,6,Buffalo Buffalo,91,11,Arizona State Arizona State,74
63,2019,Play-InPlay-In,16,Fairleigh Dickinson Fairleigh Dickinson,82,16,Prairie View Prairie View,76
64,2019,Play-InPlay-In,11,Belmont Belmont,81,11,Temple Temple,70
65,2019,Play-InPlay-In,16,North Carolina Central North Carolina Central,74,16,North Dakota State North Dakota State,78
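The raw `Round` and `Team` columns above arrive with every label doubled ("Final FourFinal Four", "Virginia Virginia"), so they have to be collapsed before the team names can serve as merge keys. Below is a minimal sketch of one way to do that; `undouble` is a hypothetical helper and not necessarily what `clean_tourney_data` does.

```python
def undouble(text: str) -> str:
    """Collapse 'XX' or 'X X' into 'X' when both halves are identical."""
    text = text.strip()
    n = len(text)
    half = n // 2
    # Case 1: halves joined directly, e.g. "Final FourFinal Four".
    if n % 2 == 0 and text[:half] == text[half:]:
        return text[:half]
    # Case 2: halves separated by a single space, e.g. "Texas Tech Texas Tech".
    if n % 2 == 1 and text[half] == " " and text[:half] == text[half + 1:]:
        return text[:half]
    # Anything else is left untouched.
    return text


print(undouble("Final FourFinal Four"))    # Final Four
print(undouble("Texas Tech Texas Tech"))   # Texas Tech
```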


Team Rankings

In [5]:
rankings_df = get_rankings_data(url="https://www.sports-reference.com/cbb/seasons/2019-ratings.html")        
rankings_df

Unnamed: 0,Team,Top_25
2,Gonzaga,1
3,Duke,1
4,Virginia,1
5,Michigan State,1
6,North Carolina,1
...,...,...
384,Alcorn State,0
385,Mississippi Valley State,0
386,Maryland-Eastern Shore,0
387,Chicago State,0


Coaches

In [6]:
coaches_df = get_coach_data(url="https://www.sports-reference.com/cbb/seasons/2019-coaches.html")        
coaches_df

Unnamed: 0,Coach_Team,MM,S16,F4,Champs
2,Abilene Christian,1,,,
3,Air Force,,,,
4,Akron,3,1,,
5,Alabama,1,,,
6,Alabama A&M,,,,
...,...,...,...,...,...
385,Wright State,4,,,
386,Wyoming,,,,
387,Xavier,,,,
388,Yale,2,,,


# Data Cleaning & Feature Engineering

In [7]:
from data_integrity import coach_to_season_integrity_dict, season_to_tourney_integrity_dict
from data_clean import clean_basic_stats, clean_adv_stats, clean_coach_stats, clean_all_stats, clean_tourney_data
from data_merge import merge_clean_team_stats, merge_clean_rankings, merge_clean_coaches, merge_clean_tourney_games

Merge Regular Season Data

In [8]:
clean_season_basic_df = clean_basic_stats(season_basic_df)
clean_season_basic_df

Unnamed: 0,School,G,W-L%,SOS,Tm.,Opp.,FG,FG%,3P,3P%,FT,FT%,ORB,TRB,AST,STL,BLK,TOV,PF
0,Abilene Christian NCAA,34,.794,-7.34,2502,2161,897,.469,251,.380,457,.712,325,1110,525,297,93,407,635
11,Arizona State NCAA,34,.676,6.04,2638,2494,899,.447,240,.336,600,.680,399,1351,459,213,109,466,675
18,Auburn NCAA,40,.750,10.92,3188,2750,1097,.450,454,.377,540,.711,457,1369,572,369,190,466,731
23,Baylor NCAA,34,.588,9.26,2442,2302,869,.442,274,.341,430,.677,450,1281,473,209,159,446,636
24,Belmont NCAA,33,.818,-2.60,2868,2439,1042,.498,343,.372,441,.737,286,1275,645,220,125,376,509
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
364,Virginia NCAA,38,.921,10.15,2714,2132,974,.474,321,.395,445,.744,342,1326,544,211,149,342,542
368,Washington NCAA,36,.750,7.01,2511,2331,883,.451,273,.350,472,.695,334,1129,419,323,206,479,653
380,Wisconsin NCAA,34,.676,11.01,2333,2099,874,.449,241,.359,344,.648,283,1197,430,177,140,327,511
381,Wofford NCAA,35,.857,0.80,2879,2295,1044,.490,385,.414,406,.704,365,1232,528,233,105,377,589


In [9]:
clean_season_adv_df = clean_adv_stats(season_adv_df)
clean_season_adv_df

Unnamed: 0,School,Pace,ORtg,FTr,3PAr,TS%,TRB%,AST%,STL%,BLK%,eFG%,TOV%,ORB%,FT/FGA
0,Abilene Christian NCAA,67.2,108.6,.336,.345,.565,50.3,58.5,12.9,8.0,.535,15.5,28.8,.239
1,Air Force,67.4,99.5,.283,.400,.541,50.1,54.1,7.0,5.8,.517,17.4,23.7,.192
2,Akron,68.5,100.1,.277,.477,.515,48.2,50.1,8.2,8.9,.485,15.0,25.3,.195
3,Alabama A&M,67.5,88.7,.250,.320,.479,47.1,52.3,10.7,4.7,.457,19.4,27.6,.157
4,Alabama-Birmingham,66.6,105.2,.315,.346,.536,52.7,44.3,9.3,7.5,.511,14.8,30.4,.212
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
382,Wright State,67.4,108.1,.341,.403,.543,52.3,54.7,9.0,6.1,.506,14.6,31.3,.251
383,Wyoming,70.2,93.3,.399,.435,.534,44.7,48.0,7.8,7.9,.492,18.6,15.6,.288
384,Xavier,66.5,107.0,.326,.374,.553,53.4,56.3,8.1,10.6,.528,16.5,32.2,.221
385,Yale NCAA,72.9,109.7,.307,.350,.584,52.9,56.3,8.0,11.2,.556,15.9,25.8,.227


In [10]:
team_season_stats_df = merge_clean_team_stats(clean_season_basic_df, clean_season_adv_df)
team_season_stats_df

Unnamed: 0,School,G,W-L%,SOS,Tm.,Opp.,FG,FG%,3P,3P%,...,3PAr,TS%,TRB%,AST%,STL%,BLK%,eFG%,TOV%,ORB%,FT/FGA
0,Abilene Christian,34,.794,-7.34,2502,2161,897,.469,251,.380,...,.345,.565,50.3,58.5,12.9,8.0,.535,15.5,28.8,.239
1,Arizona State,34,.676,6.04,2638,2494,899,.447,240,.336,...,.355,.543,52.6,51.1,8.5,9.6,.506,16.1,31.3,.298
2,Auburn,40,.750,10.92,3188,2750,1097,.450,454,.377,...,.494,.569,49.2,52.1,13.2,15.6,.543,14.3,31.6,.221
3,Baylor,34,.588,9.26,2442,2302,869,.442,274,.341,...,.408,.538,54.1,54.4,9.2,13.8,.512,16.4,37.6,.219
4,Belmont,33,.818,-2.60,2868,2439,1042,.498,343,.372,...,.440,.603,52.2,61.9,8.9,9.1,.580,13.7,25.1,.211
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
63,Virginia,38,.921,10.15,2714,2132,974,.474,321,.395,...,.395,.580,53.9,55.9,9.0,13.0,.552,12.8,29.7,.216
64,Washington,36,.750,7.01,2511,2331,883,.451,273,.350,...,.399,.551,47.6,47.5,13.3,16.3,.521,17.4,28.8,.241
65,Wisconsin,34,.676,11.01,2333,2099,874,.449,241,.359,...,.346,.531,49.8,49.2,7.9,11.0,.511,13.0,24.3,.177
66,Wofford,35,.857,0.80,2879,2295,1044,.490,385,.414,...,.436,.598,54.5,50.6,9.6,9.1,.580,13.5,32.5,.190
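The merged table keeps only tournament teams and drops the " NCAA" marker from the school names. The sketch below shows one way such a merge might be written with pandas; `strip_marker` and `merge_team_stats_sketch` are hypothetical names, and the actual `merge_clean_team_stats` may work differently. The three sample rows are taken from the scraped tables above.

```python
import pandas as pd

basic = pd.DataFrame({
    "School": ["Abilene Christian NCAA", "Air Force", "Yale NCAA"],
    "W-L%": [0.794, 0.438, 0.733],
})
adv = pd.DataFrame({
    "School": ["Abilene Christian NCAA", "Air Force", "Yale NCAA"],
    "Pace": [67.2, 67.4, 72.9],
})


def strip_marker(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the trailing ' NCAA' marker removed from School."""
    out = df.copy()
    out["School"] = out["School"].str.replace(r"\s*NCAA$", "", regex=True)
    return out


def merge_team_stats_sketch(basic: pd.DataFrame, adv: pd.DataFrame) -> pd.DataFrame:
    # Tournament teams are the rows whose name carries the marker.
    tourney_only = basic[basic["School"].str.endswith(" NCAA")]
    # validate raises if a school name appears more than once on either side.
    merged = strip_marker(tourney_only).merge(
        strip_marker(adv), on="School", how="inner", validate="one_to_one"
    )
    return merged.reset_index(drop=True)


merged = merge_team_stats_sketch(basic, adv)
print(merged)
```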


Merge Rankings Data

In [11]:
season_stats_rankings_df = merge_clean_rankings(team_season_stats_df, rankings_df)
season_stats_rankings_df

Unnamed: 0,School,G,W-L%,SOS,Tm.,Opp.,FG,FG%,3P,3P%,...,TS%,TRB%,AST%,STL%,BLK%,eFG%,TOV%,ORB%,FT/FGA,Top_25
0,Abilene Christian,34,.794,-7.34,2502,2161,897,.469,251,.380,...,.565,50.3,58.5,12.9,8.0,.535,15.5,28.8,.239,0
1,Arizona State,34,.676,6.04,2638,2494,899,.447,240,.336,...,.543,52.6,51.1,8.5,9.6,.506,16.1,31.3,.298,0
2,Auburn,40,.750,10.92,3188,2750,1097,.450,454,.377,...,.569,49.2,52.1,13.2,15.6,.543,14.3,31.6,.221,1
3,Baylor,34,.588,9.26,2442,2302,869,.442,274,.341,...,.538,54.1,54.4,9.2,13.8,.512,16.4,37.6,.219,0
4,Belmont,33,.818,-2.60,2868,2439,1042,.498,343,.372,...,.603,52.2,61.9,8.9,9.1,.580,13.7,25.1,.211,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
63,Virginia,38,.921,10.15,2714,2132,974,.474,321,.395,...,.580,53.9,55.9,9.0,13.0,.552,12.8,29.7,.216,1
64,Washington,36,.750,7.01,2511,2331,883,.451,273,.350,...,.551,47.6,47.5,13.3,16.3,.521,17.4,28.8,.241,0
65,Wisconsin,34,.676,11.01,2333,2099,874,.449,241,.359,...,.531,49.8,49.2,7.9,11.0,.511,13.0,24.3,.177,1
66,Wofford,35,.857,0.80,2879,2295,1044,.490,385,.414,...,.598,54.5,50.6,9.6,9.1,.580,13.5,32.5,.190,0


Merge Coach Data

In [12]:
clean_coaches_df = clean_coach_stats(coaches_df)
all_season_stats_df = merge_clean_coaches(season_stats_rankings_df, clean_coaches_df)

all_season_stats_df

Unnamed: 0,School,G,W-L%,SOS,Tm.,Opp.,FG,FG%,3P,3P%,...,BLK%,eFG%,TOV%,ORB%,FT/FGA,Top_25,MM,S16,F4,Champs
0,Abilene Christian,34,.794,-7.34,2502,2161,897,.469,251,.380,...,8.0,.535,15.5,28.8,.239,0,1,0,0,0
1,Arizona State,34,.676,6.04,2638,2494,899,.447,240,.336,...,9.6,.506,16.1,31.3,.298,0,3,0,0,0
2,Auburn,40,.750,10.92,3188,2750,1097,.450,454,.377,...,15.6,.543,14.3,31.6,.221,1,10,5,1,0
3,Baylor,34,.588,9.26,2442,2302,869,.442,274,.341,...,13.8,.512,16.4,37.6,.219,0,8,4,0,0
4,Belmont,33,.818,-2.60,2868,2439,1042,.498,343,.372,...,9.1,.580,13.7,25.1,.211,0,8,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
63,Virginia,38,.921,10.15,2714,2132,974,.474,321,.395,...,13.0,.552,12.8,29.7,.216,1,9,4,1,1
64,Washington,36,.750,7.01,2511,2331,883,.451,273,.350,...,16.3,.521,17.4,28.8,.241,0,1,0,0,0
65,Wisconsin,34,.676,11.01,2333,2099,874,.449,241,.359,...,11.0,.511,13.0,24.3,.177,1,3,2,0,0
66,Wofford,35,.857,0.80,2879,2295,1044,.490,385,.414,...,9.1,.580,13.5,32.5,.190,0,5,0,0,0
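In the scraped coach table, a blank cell means the coach has no appearances of that kind, and the merged output shows those blanks as 0. A minimal sketch of that step, assuming blanks were parsed as NaN (this is a guess at what `clean_coach_stats` does; the rows are taken from the coach output above):

```python
import pandas as pd

nan = float("nan")
coaches = pd.DataFrame({
    "Coach_Team": ["Abilene Christian", "Air Force", "Akron"],
    "MM":     [1, nan, 3],
    "S16":    [nan, nan, 1],
    "F4":     [nan, nan, nan],
    "Champs": [nan, nan, nan],
})

count_cols = ["MM", "S16", "F4", "Champs"]

# Coerce anything non-numeric to NaN, treat missing as zero appearances,
# then store the counts as integers.
coaches[count_cols] = (
    coaches[count_cols]
    .apply(pd.to_numeric, errors="coerce")
    .fillna(0)
    .astype(int)
)

print(coaches)
```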


Clean & Merge Season Data

In [13]:
clean_all_season_stats_df = clean_all_stats(all_season_stats_df, clean_season_basic_df)
clean_all_season_stats_df

Unnamed: 0,School,W-L%,SOS,FG%,3P%,FT%,Pace,ORtg,FTr,3PAr,...,FG/Game,3P/Game,FT/Game,ORB/Game,TRB/Game,AST/Game,STL/Game,BLK/Game,TOV/Game,PF/Game
0,Abilene Christian,0.794,-7.34,0.469,0.380,0.712,67.2,108.6,0.336,0.345,...,26.4,7.4,13.4,9.6,32.6,15.4,8.7,2.7,12.0,18.7
1,Arizona State,0.676,6.04,0.447,0.336,0.680,72.5,105.4,0.438,0.355,...,26.4,7.1,17.6,11.7,39.7,13.5,6.3,3.2,13.7,19.9
2,Auburn,0.750,10.92,0.450,0.377,0.711,69.2,114.1,0.312,0.494,...,27.4,11.4,13.5,11.4,34.2,14.3,9.2,4.8,11.6,18.3
3,Baylor,0.588,9.26,0.442,0.341,0.677,66.5,107.6,0.323,0.408,...,25.6,8.1,12.6,13.2,37.7,13.9,6.1,4.7,13.1,18.7
4,Belmont,0.818,-2.60,0.498,0.372,0.737,74.2,116.2,0.286,0.440,...,31.6,10.4,13.4,8.7,38.6,19.5,6.7,3.8,11.4,15.4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
63,Virginia,0.921,10.15,0.474,0.395,0.744,60.8,116.3,0.291,0.395,...,25.6,8.4,11.7,9.0,34.9,14.3,5.6,3.9,9.0,14.3
64,Washington,0.750,7.01,0.451,0.350,0.695,67.4,103.2,0.347,0.399,...,24.5,7.6,13.1,9.3,31.4,11.6,9.0,5.7,13.3,18.1
65,Wisconsin,0.676,11.01,0.449,0.359,0.648,64.8,103.9,0.273,0.346,...,25.7,7.1,10.1,8.3,35.2,12.6,5.2,4.1,9.6,15.0
66,Wofford,0.857,0.80,0.490,0.414,0.704,68.5,119.2,0.271,0.436,...,29.8,11.0,11.6,10.4,35.2,15.1,6.7,3.0,10.8,16.8
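The `/Game` columns above are consistent with dividing each season total by games played and rounding to one decimal (for example, Abilene Christian's 897 field goals over 34 games gives 26.4). A minimal sketch of that transformation follows; this is an inference from the output, not the source of `clean_all_stats`.

```python
import pandas as pd

# Season totals copied from the cleaned basic stats table.
totals = pd.DataFrame({
    "School": ["Abilene Christian", "Virginia"],
    "G":   [34, 38],
    "FG":  [897, 974],
    "3P":  [251, 321],
    "TOV": [407, 342],
})

count_cols = ["FG", "3P", "TOV"]

# Divide each total by that row's games played, then label the new columns.
per_game = (
    totals[count_cols]
    .div(totals["G"], axis=0)
    .round(1)
    .add_suffix("/Game")
)

result = pd.concat([totals[["School"]], per_game], axis=1)
print(result)
```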


Clean Tournament Games

In [14]:
clean_mm_df = clean_tourney_data(mm_games_df, clean_all_season_stats_df)
clean_mm_df

Unnamed: 0,Round,Seed_Favorite,Team_Favorite,Seed_Underdog,Team_Underdog,Underdog_Upset
0,National Championship,1,Virginia,3,Texas Tech,0
1,Final Four,1,Virginia,4,Auburn,0
2,Final Four,2,Michigan State,3,Texas Tech,1
3,Elite Eight,1,Gonzaga,3,Texas Tech,1
4,Elite Eight,1,Virginia,3,Purdue,0
...,...,...,...,...,...,...
62,First Round,6,Buffalo,11,Arizona State,0
63,Play-In,16,Prairie View,16,Fairleigh Dickinson,1
64,Play-In,11,Belmont,11,Temple,0
65,Play-In,16,North Dakota State,16,North Carolina Central,0
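The favorite/underdog labeling can be sketched as follows: the team with the numerically lower seed is the favorite, and an upset is recorded when the underdog scores more. `label_upsets` is a hypothetical function. The rule the notebook applies when both seeds are equal (the play-in games) is not shown, so this sketch simply treats the first-listed team as the favorite in that case, which is an arbitrary assumption and may not reproduce those rows.

```python
import pandas as pd

# Two rows from the raw tournament table, with the doubled labels already collapsed.
raw = pd.DataFrame({
    "Round": ["National Championship", "Final Four"],
    "Seed": [1, 2], "Team": ["Virginia", "Michigan State"], "Score": [85, 51],
    "Seed.1": [3, 3], "Team.1": ["Texas Tech", "Texas Tech"], "Score.1": [77, 61],
})


def label_upsets(games: pd.DataFrame) -> pd.DataFrame:
    # True where the first-listed team has the better (lower) or an equal seed.
    first_is_fav = games["Seed"] <= games["Seed.1"]
    out = pd.DataFrame({"Round": games["Round"]})
    for role, pick_first in (("Favorite", first_is_fav), ("Underdog", ~first_is_fav)):
        # where() keeps the first-listed value when pick_first is True,
        # otherwise it takes the second-listed value.
        out[f"Seed_{role}"] = games["Seed"].where(pick_first, games["Seed.1"])
        out[f"Team_{role}"] = games["Team"].where(pick_first, games["Team.1"])
        out[f"Score_{role}"] = games["Score"].where(pick_first, games["Score.1"])
    out["Underdog_Upset"] = (out["Score_Underdog"] > out["Score_Favorite"]).astype(int)
    return out.drop(columns=["Score_Favorite", "Score_Underdog"])


labeled = label_upsets(raw)
print(labeled)
```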


Merge Tournament Games

In [15]:
all_data_df = merge_clean_tourney_games(clean_mm_df, clean_all_season_stats_df)
all_data_df

Unnamed: 0,Round,Seed_Favorite,Team_Favorite,Seed_Underdog,Team_Underdog,Underdog_Upset,W-L%_Favorite,SOS_Favorite,FG%_Favorite,3P%_Favorite,...,FG/Game_Underdog,3P/Game_Underdog,FT/Game_Underdog,ORB/Game_Underdog,TRB/Game_Underdog,AST/Game_Underdog,STL/Game_Underdog,BLK/Game_Underdog,TOV/Game_Underdog,PF/Game_Underdog
0,National Championship,1,Virginia,3,Texas Tech,0,0.921,10.15,0.474,0.395,...,26.1,7.3,13.4,8.3,34.1,13.6,7.3,4.9,12.0,17.4
1,Final Four,2,Michigan State,3,Texas Tech,1,0.821,12.34,0.480,0.378,...,26.1,7.3,13.4,8.3,34.1,13.6,7.3,4.9,12.0,17.4
2,Elite Eight,1,Gonzaga,3,Texas Tech,1,0.892,5.01,0.526,0.363,...,26.1,7.3,13.4,8.3,34.1,13.6,7.3,4.9,12.0,17.4
3,Sweet 16,2,Michigan,3,Texas Tech,1,0.811,10.55,0.448,0.342,...,26.1,7.3,13.4,8.3,34.1,13.6,7.3,4.9,12.0,17.4
4,Final Four,1,Virginia,4,Auburn,0,0.921,10.15,0.474,0.395,...,27.4,11.4,13.5,11.4,34.2,14.3,9.2,4.8,11.6,18.3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
62,First Round,6,Maryland,11,Belmont,0,0.676,10.09,0.449,0.349,...,31.6,10.4,13.4,8.7,38.6,19.5,6.7,3.8,11.4,15.4
63,First Round,6,Buffalo,11,Arizona State,0,0.889,2.62,0.462,0.337,...,26.4,7.1,17.6,11.7,39.7,13.5,6.3,3.2,13.7,19.9
64,Play-In,11,Belmont,11,Temple,0,0.818,-2.60,0.498,0.372,...,26.5,7.5,14.3,9.6,34.4,14.4,8.6,2.2,11.2,17.6
65,Play-In,16,North Dakota State,16,North Carolina Central,0,0.543,-2.07,0.454,0.365,...,25.7,6.5,13.5,11.7,37.2,15.5,6.1,2.7,15.1,17.9
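The `_Favorite` and `_Underdog` column suffixes above suggest that the season stats are attached to each game twice, once per team. A minimal sketch of such a merge follows; `attach_stats` is a hypothetical helper, and `merge_clean_tourney_games` may be implemented differently. The sample values come from the tables above.

```python
import pandas as pd

games = pd.DataFrame({
    "Team_Favorite": ["Virginia"],
    "Team_Underdog": ["Auburn"],
    "Underdog_Upset": [0],
})
season = pd.DataFrame({
    "School": ["Virginia", "Auburn"],
    "W-L%": [0.921, 0.750],
    "SOS": [10.15, 10.92],
})


def attach_stats(games: pd.DataFrame, season: pd.DataFrame, role: str) -> pd.DataFrame:
    """Left-join season stats onto the Team_<role> column, suffixing each stat with the role."""
    stats = season.set_index("School").add_suffix(f"_{role}")
    return games.join(stats, on=f"Team_{role}")


merged = attach_stats(attach_stats(games, season, "Favorite"), season, "Underdog")
print(merged)
```

Because `join` is a left join by default, a team name that fails to match shows up as NaN in its stat columns, so `merged.isna().any()` doubles as a cheap check that the names line up across sources.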




# Data Exploration (EDA)

### Questions of Interest

As any good data scientist should, I want to address a few questions in my EDA:

1) Does the data have any null values? If so, are they missing at random?

2) What is a bracket's accuracy when always picking the majority class (base rate: favorite beats underdog)?

3) How often do upsets occur in a given year's March Madness? 

4) Which seeding combinations are the most likely to produce upsets?

5) What is the win percentage of each seed in the tournament?
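Several of these questions reduce to short pandas aggregations. The sketch below runs them on five rows copied from the cleaned 2019 output, so the numbers it prints describe only that small sample and not the tournament as a whole.

```python
import pandas as pd

# Five games from the cleaned 2019 tournament table (sample only).
games = pd.DataFrame({
    "Seed_Favorite":  [1, 1, 2, 1, 1],
    "Seed_Underdog":  [3, 4, 3, 3, 3],
    "Underdog_Upset": [0, 0, 1, 1, 0],
})

# Question 1: null values per column.
null_counts = games.isna().sum()

# Question 3: share of games that ended in an upset.
upset_rate = games["Underdog_Upset"].mean()

# Question 2: accuracy of always predicting the more common outcome.
baseline_accuracy = max(upset_rate, 1 - upset_rate)

# Question 4: upset rate and sample size for each seed pairing.
by_pairing = (
    games.groupby(["Seed_Favorite", "Seed_Underdog"])["Underdog_Upset"]
         .agg(["mean", "count"])
)

print(null_counts)
print(upset_rate, baseline_accuracy)
print(by_pairing)
```

Reporting the count next to each pairing's rate matters here because a single season gives most pairings only a handful of games.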

### Visualizations

In [16]:
import matplotlib.pyplot as plt
import seaborn as sns

# Feature Engineering (cont'd)

# Feature Selection

# Model Selection

# Model Evaluation

# Conclusions