Analysis of NCAA Tournament results by team and seed for data from 1985 - 2016.

mmclaughlin87/march-madness-historical-perfomance

March Madness Analysis

College basketball fans can better understand the NCAA Tournament by reviewing and analyzing data from previous years. In this report, results from every March Madness game between 1985 and 2016 are analyzed to identify the following trends:

  • Expected performance according to seed
  • Matchups that often result in exciting games
  • Outlier teams that have over- or underperformed relative to their seeds

This information can be used to identify games that are likely to be interesting, predict winners, or understand a team's tournament history.

Import Dependencies and Prep Data

| | game_id | date | round | region | seed | team | score | opponent_seed | opponent | opponent_score | overtime | score_diff | win | seed_id | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1985-03-14 | Round of 64 | East | 1 | Georgetown | 68 | 16 | Lehigh | 43 | 0 | 25 | 1 | 1_16_fav | 1985 |
| 1 | 0 | 1985-03-14 | Round of 64 | East | 16 | Lehigh | 43 | 1 | Georgetown | 68 | 0 | -25 | 0 | 1_16_dog | 1985 |
| 2 | 1 | 1985-03-14 | Round of 64 | East | 4 | Loyola, Illinois | 59 | 13 | Iona | 58 | 0 | 1 | 1 | 4_13_fav | 1985 |
| 3 | 1 | 1985-03-14 | Round of 64 | East | 13 | Iona | 58 | 4 | Loyola, Illinois | 59 | 0 | -1 | 0 | 4_13_dog | 1985 |
| 4 | 2 | 1985-03-14 | Round of 64 | East | 5 | Southern Methodist | 85 | 12 | Old Dominion | 68 | 0 | 17 | 1 | 5_12_fav | 1985 |
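Each game appears twice in the sample above, once from each team's perspective, with score_diff equal to score minus opponent_score and a seed_id that pairs the two seeds with a fav/dog label. The repository's prep code is not reproduced here, so the following is only a minimal pandas sketch of how such rows could be derived from a one-row-per-game table. The input column names (seed_a, team_a, score_a, seed_b, team_b, score_b) are hypothetical, and the seed_id rule is inferred from the five sample rows.

```python
import pandas as pd

def to_team_rows(games: pd.DataFrame, shared=("game_id", "round", "year")) -> pd.DataFrame:
    """Turn one row per game into two rows per game, one from each team's side.

    Assumes hypothetical input columns seed_a, team_a, score_a, seed_b,
    team_b, score_b (the two sides in any order) plus the `shared` columns.
    """
    base = games[list(shared)]
    side_a = base.assign(seed=games["seed_a"], team=games["team_a"], score=games["score_a"],
                         opponent_seed=games["seed_b"], opponent=games["team_b"],
                         opponent_score=games["score_b"])
    side_b = base.assign(seed=games["seed_b"], team=games["team_b"], score=games["score_b"],
                         opponent_seed=games["seed_a"], opponent=games["team_a"],
                         opponent_score=games["score_a"])
    # Better seed first within each game, matching the sample's row order.
    rows = pd.concat([side_a, side_b]).sort_values(["game_id", "seed"]).reset_index(drop=True)
    rows["score_diff"] = rows["score"] - rows["opponent_score"]
    rows["win"] = (rows["score_diff"] > 0).astype(int)
    # seed_id = "<better seed>_<worse seed>_<fav|dog>"; equal seeds would both get "fav".
    low = rows[["seed", "opponent_seed"]].min(axis=1).astype(str)
    high = rows[["seed", "opponent_seed"]].max(axis=1).astype(str)
    role = (rows["seed"] <= rows["opponent_seed"]).map({True: "fav", False: "dog"})
    rows["seed_id"] = low + "_" + high + "_" + role
    return rows
```

Other per-game columns such as date, region and overtime can be carried along by adding them to `shared`.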

How does each seed typically perform?

Key Takeaways:

  • As expected, high seeds generally outperform low seeds.
  • There are exceptions, as 9 seeds win fewer games on average than 10 and 11 seeds, and 5 seeds win fewer games on average than 6 seeds.
  • A statistical comparison shows that many seeds' performances are not significantly different from those of neighboring seeds. Interestingly, seeds 8-12 show no significant difference in average point margin per game.

Gather Summary Data on Each Seed

| Seed | Average Point Spread | Average Wins | Wins by Round |
|---|---|---|---|
| 1 | 11.392193 | 3.351562 | Elite Eight: 52, National Ch... |
| 2 | 7.109302 | 2.398438 | Elite Eight: 28, National Ch... |
| 3 | 4.960452 | 1.796875 | Elite Eight: 14, National Ch... |
| 4 | 3.313846 | 1.546875 | Elite Eight: 13, National Ch... |
| 5 | 0.892593 | 1.109375 | Elite Eight: 6, National Cha... |
| 6 | 0.335793 | 1.125000 | Elite Eight: 3, National Cha... |
| 7 | -0.585062 | 0.890625 | Elite Eight: 2, National Cha... |
| 8 | -3.281818 | 0.726562 | Elite Eight: 5, National Cha... |
| 9 | -4.220000 | 0.562500 | Elite Eight: 1, National Semif... |
| 10 | -3.028571 | 0.640625 | Elite Eight: 1, National Semif... |
| 11 | -3.524752 | 0.578125 | Elite Eight: 3, National Semif... |
| 12 | -4.461538 | 0.523438 | Elite Eight: 0, Round of 32: 20, ... |
| 13 | -9.043750 | 0.250000 | Round of 32: 6, Round of 64: 26, ... |
| 14 | -10.688742 | 0.179688 | Round of 32: 2, Round of 64: 21, ... |
| 15 | -16.065693 | 0.070312 | Round of 32: 1, Round of 64: 8, Sw... |
| 16 | -24.718750 | 0.000000 | Round of 64: 0 |
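Every Average Wins value above is a multiple of 1/128 (3.351562 ≈ 429/128, for example), which is consistent with dividing each seed's total wins by its 128 appearances (four regions over 32 tournaments). A minimal pandas sketch of such a per-seed summary, assuming the team-perspective columns from the data sample (year, team, seed, round, win, score_diff):

```python
import pandas as pd

def summarize_seeds(games: pd.DataFrame) -> pd.DataFrame:
    """Per-seed summary from team-perspective rows (one row per team per game)."""
    by_seed = games.groupby("seed")
    # Each distinct (year, team) pair is one tournament appearance at that seed.
    appearances = games.drop_duplicates(["year", "team"]).groupby("seed").size()
    summary = pd.DataFrame({
        "Average Point Spread": by_seed["score_diff"].mean(),  # mean margin per game played
        "Average Wins": by_seed["win"].sum() / appearances,    # wins per appearance
    })
    # Total wins in each round, one column per round label present in the data.
    wins_by_round = games.groupby(["seed", "round"])["win"].sum().unstack(fill_value=0)
    return summary.join(wins_by_round)
```

Spreading the round totals into columns avoids the truncated nested Series that appears in the Wins by Round column above.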

Compare Average Wins By Seed

As expected, higher seeds generally outperform lower seeds. There are exceptions: 9 seeds win fewer games on average than 10 and 11 seeds, and 5 seeds win fewer games on average than 6 seeds. When compared with the number of wins each seed would record if the better seed won every game, actual advancement generally follows these expected levels.

[Figure: average wins by seed compared with expected wins]
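One way to compute these "better seed always wins" expectations is to simulate a single region. The sketch below assumes the standard region pairing (1-16, 8-9, 5-12, 4-13, 6-11, 3-14, 7-10, 2-15, with adjacent winners meeting in later rounds) and stops at the regional final, since Final Four games between two 1 seeds have no favorite.

```python
# Standard order of one 16-team region (assumed): adjacent entries meet in the
# Round of 64, and adjacent winners meet in each later round.
REGION_ORDER = [1, 16, 8, 9, 5, 12, 4, 13, 6, 11, 3, 14, 7, 10, 2, 15]

def chalk_wins(order=REGION_ORDER):
    """Wins per seed inside one region if the better (lower-numbered) seed always wins."""
    wins = {seed: 0 for seed in order}
    alive = list(order)
    while len(alive) > 1:
        # Pair neighbours, advance the better seed, and credit it with a win.
        winners = [min(a, b) for a, b in zip(alive[0::2], alive[1::2])]
        for seed in winners:
            wins[seed] += 1
        alive = winners
    return wins

print(chalk_wins())
# {1: 4, 16: 0, 8: 1, 9: 0, 5: 1, 12: 0, 4: 2, 13: 0, 6: 1, 11: 0, 3: 2, 14: 0, 7: 1, 10: 0, 2: 3, 15: 0}
```

Under this assumption, seeds 1 through 8 would be expected to win 4, 3, 2, 2, 1, 1, 1 and 1 games, and seeds 9-16 none.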

Compare Average Point Margin by Seed

As expected, higher seeds have a better average margin of victory than lower seeds. Only seeds 1-6 have positive average point margins. This can be explained by the single-elimination format: every team except the champion ends its tournament with a loss, and teams that lose early have no opportunity to offset that loss with wins. The 9 seeds also underperform in this area, with a worse average point margin than both 10 and 11 seeds.

[Figure: average point margin by seed]

Typical Tournament Advancement By Seed

The first plot below shows the percentage of teams at each seed that advance to each round of the NCAA Tournament. As expected, higher seeds generally outperform lower seeds. The second plot highlights an interesting trend: 10-12 seeds are less likely than 8 and 9 seeds to win their first-round game, but when they do, they are much more likely to win in the second round. As a result, 10, 11, and 12 seeds reach the Sweet Sixteen more often than 8 and 9 seeds. The bracket explains this: an 8 or 9 seed that wins its first-round game must then play a 1 seed, while a 10, 11, or 12 seed faces the winner of the 2-15, 3-14, or 4-13 game, respectively, a weaker opponent than a 1 seed.

[Figure: percentage of each seed advancing to each round]

[Figure: early-round advancement of seeds 8-12]
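The second-round effect can be made explicit by splitting Sweet Sixteen odds into two factors: P(reach Sweet Sixteen) = P(win Round of 64) × P(win Round of 32 | won Round of 64). A minimal pandas sketch, assuming the team-perspective columns from the data sample and the round labels "Round of 64" and "Round of 32" that appear in the seed summary:

```python
import pandas as pd

def second_round_conversion(games: pd.DataFrame) -> pd.DataFrame:
    """Per-seed first-round win rate, second-round win rate given a first-round win,
    and overall Sweet Sixteen rate (assumed columns: year, team, seed, round, win)."""
    appearances = games.drop_duplicates(["year", "team"]).groupby("seed").size()
    won = games[games["win"] == 1]
    r64_wins = won[won["round"] == "Round of 64"].groupby("seed").size()
    r32_wins = won[won["round"] == "Round of 32"].groupby("seed").size()
    out = pd.DataFrame({"appearances": appearances,
                        "r64_wins": r64_wins,
                        "r32_wins": r32_wins}).fillna(0)
    out["p_win_r64"] = out["r64_wins"] / out["appearances"]
    # Conditional rate; 0/0 yields NaN for seeds that never won a first-round game.
    out["p_win_r32_given_r64"] = out["r32_wins"] / out["r64_wins"]
    # Equals p_win_r64 * p_win_r32_given_r64 whenever r64_wins > 0.
    out["p_sweet_sixteen"] = out["r32_wins"] / out["appearances"]
    return out
```

Comparing the p_win_r32_given_r64 column for seeds 8-9 against 10-12 isolates the matchup effect described above.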

Statistical Comparison of Seeds

Below are the results of a statistical comparison of each seed's average point margin. The average point margin for every team at each seed was collected, and a t-test was performed on each pair of seeds to determine whether they were significantly different from one another. As expected, seeds far apart are very different, but neighboring seeds often do not meet the 0.05 threshold for a significant difference. The table below lists the pairs that are not significantly different. This allows the seeds to be grouped into the following sets, where seeds in parentheses are statistically similar: [1, 2, (3, 4), (5, 6, 7), (8, 9, 10, 11, 12), (13, 14), 15, 16]

| | P Value | Seeds Compared | Significant Difference |
|---|---|---|---|
| 29 | 0.093204 | 3 and 4 | False |
| 54 | 0.587780 | 5 and 6 | False |
| 55 | 0.159540 | 5 and 7 | False |
| 65 | 0.376600 | 6 and 7 | False |
| 84 | 0.427234 | 8 and 9 | False |
| 85 | 0.821109 | 8 and 10 | False |
| 86 | 0.834070 | 8 and 11 | False |
| 87 | 0.305909 | 8 and 12 | False |
| 92 | 0.313126 | 9 and 10 | False |
| 93 | 0.568254 | 9 and 11 | False |
| 94 | 0.841835 | 9 and 12 | False |
| 99 | 0.668556 | 10 and 11 | False |
| 100 | 0.213455 | 10 and 12 | False |
| 105 | 0.431084 | 11 and 12 | False |
| 114 | 0.178705 | 13 and 14 | False |
[1, 2, (3, 4), (5, 6, 7), (8, 9, 10, 11, 12), (13, 14), 15, 16]
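The grouping in the last line can be reproduced from the table with a greedy pass over the seeds in order: a seed joins the current group only if it is not significantly different from every seed already in it. This is one plausible rule consistent with the table, not necessarily the one used in the original notebook; `group_seeds` is a hypothetical helper.

```python
def group_seeds(seeds, similar_pairs):
    """Greedily merge consecutive seeds whose pairwise differences are all non-significant."""
    similar = {frozenset(pair) for pair in similar_pairs}
    groups = []
    for seed in sorted(seeds):
        # Join the current group only if this seed is similar to every member of it.
        if groups and all(frozenset((seed, member)) in similar for member in groups[-1]):
            groups[-1].append(seed)
        else:
            groups.append([seed])
    return [g[0] if len(g) == 1 else tuple(g) for g in groups]

# Pairs with Significant Difference == False in the table above.
NOT_SIGNIFICANT = [(3, 4), (5, 6), (5, 7), (6, 7), (8, 9), (8, 10), (8, 11), (8, 12),
                   (9, 10), (9, 11), (9, 12), (10, 11), (10, 12), (11, 12), (13, 14)]

print(group_seeds(range(1, 17), NOT_SIGNIFICANT))
# [1, 2, (3, 4), (5, 6, 7), (8, 9, 10, 11, 12), (13, 14), 15, 16]
```

Requiring similarity to every member (rather than only the previous seed) keeps a chain such as 3-4-5 from merging when 3 and 5 differ significantly.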

Which matchups are most likely to produce upsets?

Key Takeaways:

  • There is an approximately linear relationship between absolute seed difference and upset percentage: the larger the seed difference, the less likely an upset.
  • Every game between a 2 and 5 seed has resulted in an upset.
  • 40% of games between a 2 and 10 seed have resulted in an upset.

The purpose of this section is to determine which matchups are most likely to produce upsets relative to the absolute seed difference of the teams. Matchups between teams with the same seed were excluded.

Two dataframes were created to analyze and plot the data. The first dataframe has one record for each game from the perspective of the underdogs. This dataframe was used to create a line of best fit to generate the expected upset percentage by absolute seed difference. The second dataframe has one record for each game from the perspective of the winners. It was used to compare each matchup's historical upset percentage to expected upset percentage and plot the data.

The analysis below highlights two subsets of matchups, but the underlying code contains functions that can be used to look at any subset.
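A minimal sketch of the expected-versus-historical comparison, assuming an underdog-perspective dataframe with the seed, opponent_seed and win columns from the data sample. It fits an ordinary least-squares line to the game-level upset outcomes with `numpy.polyfit`; the original analysis may have fit its line differently (for example, on per-matchup percentages), and the output column names are hypothetical.

```python
import numpy as np
import pandas as pd

def upset_table(underdogs: pd.DataFrame) -> pd.DataFrame:
    """Compare each matchup's upset rate with a linear fit on absolute seed difference.

    underdogs: one row per game from the underdog's side, with columns seed
    (underdog), opponent_seed (favorite) and win (1 = upset); same-seed games
    are assumed to be filtered out already.
    """
    df = underdogs.assign(seed_diff=(underdogs["seed"] - underdogs["opponent_seed"]).abs())
    # Least-squares line through the 0/1 game outcomes: expected upset rate vs. seed difference.
    slope, intercept = np.polyfit(df["seed_diff"], df["win"], 1)
    matchups = (df.groupby(["opponent_seed", "seed"])
                  .agg(games=("win", "size"),
                       upset_pct=("win", "mean"),
                       seed_diff=("seed_diff", "first"))
                  .reset_index())
    matchups["expected_pct"] = intercept + slope * matchups["seed_diff"]
    matchups["deviation"] = matchups["upset_pct"] - matchups["expected_pct"]
    return matchups.sort_values("deviation", ascending=False).reset_index(drop=True)
```

Fitting on individual games weights each seed difference by how often it occurs, so common first-round matchups dominate the line.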

Relatively high likelihood of upsets for matchups that don't occur often

[Figure: historical vs. expected upset percentage for infrequent matchups]

This subset was defined as matchups that occur less than once every four years on average (fewer than 8 games in the 32 tournaments from 1985 to 2016). To eliminate extreme outliers, a matchup must also have occurred at least three times since 1985. From this subset, the four matchups that most exceed their expected upset percentage are:

2 seed vs 5 seed

  • 4 total games
  • 100% have been upsets

2 seed vs 8 seed

  • 7 total games
  • 71% have been upsets

1 seed vs 11 seed

  • 6 total games
  • 50% have been upsets

2 seed vs 4 seed

  • 7 total games
  • 57% have been upsets
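Both subsets are simple frequency filters on a per-matchup table. A minimal sketch, assuming a dataframe with a games column such as the one produced by a per-matchup aggregation, and 32 tournaments (1985 through 2016); the function names are hypothetical.

```python
import pandas as pd

N_YEARS = 32  # tournaments from 1985 through 2016

def infrequent_matchups(matchups: pd.DataFrame, n_years: int = N_YEARS,
                        min_games: int = 3) -> pd.DataFrame:
    """Matchups played less than once every four years, but at least min_games times."""
    per_year = matchups["games"] / n_years
    return matchups[(per_year < 0.25) & (matchups["games"] >= min_games)]

def annual_matchups(matchups: pd.DataFrame, n_years: int = N_YEARS) -> pd.DataFrame:
    """Matchups played at least once per year on average."""
    return matchups[matchups["games"] / n_years >= 1]
```

With a deviation column available, `infrequent_matchups(table).nlargest(4, "deviation")` would list the four matchups that most exceed their expected upset percentage.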

Relatively high likelihood of upsets for matchups that occur every year

[Figure: historical vs. expected upset percentage for matchups that occur every year]

This subset was defined as matchups that occur at least once per year on average. The four matchups that most exceed their expected upset percentage are:

2 seed vs 10 seed

  • 45 total games
  • 40% have been upsets

4 seed vs 12 seed

  • 35 total games
  • 34% have been upsets

3 seed vs 11 seed

  • 42 total games
  • 33% have been upsets

5 seed vs 12 seed

  • 128 total games
  • 36% have been upsets

How have individual teams performed in the tournament historically?

Key Takeaways:

  • Even the teams that have performed best historically average fewer than three wins per tournament.

  • Most of the teams that have performed best historically are the schools most fans associate with the top of men's college basketball.

  • The average number of wins for each seed was used to compute each team's "expected wins" based on its seed in each year.

  • Comparing actual wins with expected wins identified teams that have historically over- or underperformed.

Quantifying Team Performance

First, we grouped the results of each game by the teams playing to find the following quantities for each team that has been in the tournament: tournament appearances, games played, games won, win percentage, total point margin, average point margin, average wins per tournament, wins expected based on the historical performance of teams with the same seed, expected games played, deviation from expected wins, and average deviation from expected wins. This information was stored in a dataframe so that the results for individual teams could be accessed and compared.

| | Team | Tournament Appearances | Games Played | Games Won | Win Percentage | Total Point Margin | Average Point Margin | Average Number of Wins | Expected Wins | Expected Games Played | Deviation From Expected Wins | Average Deviation From Expected Wins |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Air Force | 2 | 2 | 0 | 0 | -20 | -10 | 0 | 0.828125 | 2.82812 | -0.828125 | -0.414062 |
| 1 | Akron | 4 | 4 | 0 | 0 | -78 | -19.5 | 0 | 0.914062 | 4.91406 | -0.914062 | -0.228516 |
| 2 | Alabama | 15 | 33 | 18 | 0.545455 | 34 | 1.0303 | 1.2 | 16.8203 | 31.8203 | 1.17969 | 0.0786458 |
| 3 | Alabama State | 2 | 2 | 0 | 0 | -69 | -34.5 | 0 | 0 | 2 | 0 | 0 |
| 4 | Albany | 5 | 5 | 0 | 0 | -73 | -14.6 | 0 | 0.5 | 5.5 | -0.5 | -0.1 |
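The derived columns in this sample are mutually consistent: Expected Games Played equals Tournament Appearances plus Expected Wins, Average Number of Wins and Average Deviation From Expected Wins divide by appearances, and Win Percentage and Average Point Margin divide by games played. A minimal pandas sketch built on those relationships, assuming team-perspective rows and a hypothetical mapping `seed_avg_wins` from seed to the Average Wins figure in the seed summary:

```python
import pandas as pd

def team_summary(games: pd.DataFrame, seed_avg_wins: dict) -> pd.DataFrame:
    """One row per school from team-perspective rows.

    games: assumed columns year, team, seed, win (0/1) and score_diff.
    seed_avg_wins: hypothetical mapping of seed -> average wins per appearance.
    """
    apps = games.drop_duplicates(["year", "team"])
    played = games.groupby("team")
    out = pd.DataFrame({
        "Tournament Appearances": apps.groupby("team").size(),
        "Games Played": played.size(),
        "Games Won": played["win"].sum(),
        "Total Point Margin": played["score_diff"].sum(),
        # Expected wins: sum of the seed averages over all of a team's appearances.
        "Expected Wins": apps["seed"].map(seed_avg_wins).groupby(apps["team"]).sum(),
    })
    out["Win Percentage"] = out["Games Won"] / out["Games Played"]
    out["Average Point Margin"] = out["Total Point Margin"] / out["Games Played"]
    out["Average Number of Wins"] = out["Games Won"] / out["Tournament Appearances"]
    # One game per appearance plus one more per expected win, as in the table above.
    out["Expected Games Played"] = out["Tournament Appearances"] + out["Expected Wins"]
    out["Deviation From Expected Wins"] = out["Games Won"] - out["Expected Wins"]
    out["Average Deviation From Expected Wins"] = (
        out["Deviation From Expected Wins"] / out["Tournament Appearances"])
    return out
```

Passing the seed averages in as a mapping keeps this step independent of how the seed summary itself was computed.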

Finding the "Top Teams"

We then sorted by the number of tournament appearances to find the schools that have played in the most tournaments since the field expanded to 64 teams in 1985. The results matched our expectations: the list consists of schools with strong reputations in men's basketball.

[Figure: top ten schools by tournament appearances]

Next, we sorted by the number of NCAA Tournament games won. This list included many of the same teams that were in the top ten for tournament appearances. The spread in wins between Duke, first in this category, and UCLA, tenth, was surprisingly large. The list also shows that some schools, such as Connecticut, win many games each time they reach the tournament without making as many tournaments as some of the other schools listed.

[Figure: top ten schools by tournament games won]

Finally, we sorted the teams by average point margin. Again, the list mostly consisted of the same teams, with a few exceptions. Kentucky, Duke, and Kansas win by an average of almost 10 points per tournament game, which was higher than expected.

[Figure: top schools by average point margin]

Finding Over and Underperforming Teams

We plotted the average deviation from expected wins per tournament against the average number of wins per tournament to showcase teams that have overperformed their seeding and consistently gone far in the tournament. The size of each point is based on the total number of tournament games the team has played. The large red points on the far right of the graph represent schools that consistently win many tournament games and outperform their seeds. The smaller red points represent teams that did not advance as far on average but still outperformed their expected results.

[Figure: average deviation from expected wins vs. average wins per tournament, sized by games played]

This table shows the average number of wins, average deviation from expected wins, games played, and tournament appearances for the teams highlighted in the previous graph. That Duke, North Carolina, Kentucky, and Connecticut had such high win totals and often overperformed their seeds was not surprising, considering their positions on the total wins bar graph shown earlier. It was also expected that they would overperform their seeds by less than some other teams, since they are usually seeded very high and their expected win totals are already large.

It was interesting that a team like Butler wins roughly three-quarters of a game more than expected per tournament while also having a large number of appearances and games played. Its back-to-back runs to the national championship game in 2010 and 2011 probably skew this figure somewhat.

| | Team | Average Number of Wins | Average Deviation From Expected Wins | Games Played | Tournament Appearances |
|---|---|---|---|---|---|
| 64 | Duke | 2.90323 | 0.366179 | 116 | 31 |
| 169 | North Carolina | 2.82759 | 0.460399 | 108 | 29 |
| 115 | Kentucky | 2.77778 | 0.597801 | 99 | 27 |
| 52 | Connecticut | 2.75 | 0.756641 | 71 | 20 |
| 47 | Cleveland State | 1.5 | 1.28516 | 5 | 2 |
| 28 | Butler | 1.46154 | 0.74399 | 32 | 13 |
| 128 | Loyola Marymount | 1.33333 | 0.752604 | 7 | 3 |
| 77 | Florida Gulf Coast | 1 | 0.964844 | 4 | 2 |
| 168 | Norfolk State | 1 | 0.929688 | 2 | 1 |

We plotted the total number of tournament games played against the total deviation from expected wins. The teams in red have many tournament appearances but do not perform at the level expected from their seeding. This trend was harder to see in the previous plot.

[Figure: total deviation from expected wins vs. total tournament games played]

This table shows the average number of wins, average deviation from expected wins, games played, and tournament appearances for the teams highlighted in the previous graph. These teams were more similar in average wins per tournament than we originally expected.

| | Team | Average Number of Wins | Average Deviation From Expected Wins | Games Played | Tournament Appearances |
|---|---|---|---|---|---|
| 188 | Oklahoma | 1.45833 | -0.323568 | 59 | 24 |
| 101 | Illinois | 1.27273 | -0.23402 | 50 | 22 |
| 45 | Cincinnati | 1.2 | -0.263281 | 44 | 20 |
| 204 | Purdue | 1.13043 | -0.33288 | 49 | 23 |
| 198 | Pittsburgh | 1 | -0.503701 | 38 | 19 |
| 150 | Missouri | 0.842105 | -0.323602 | 35 | 19 |
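The highlighted teams appear to have been picked from the plot by eye. A hypothetical rule that produces a similar kind of list is to keep teams above a minimum number of tournament games and take those with the most negative total deviation; the `min_games` and `n` defaults below are illustrative assumptions, not values from the original analysis.

```python
import pandas as pd

def frequent_underperformers(summary: pd.DataFrame, min_games: int = 30,
                             n: int = 6) -> pd.DataFrame:
    """Up to n teams with at least min_games tournament games and a negative
    total deviation from expected wins, most negative first.

    summary: one row per team with "Games Played" and
    "Deviation From Expected Wins" columns, as in the team table above.
    """
    pool = summary[(summary["Games Played"] >= min_games)
                   & (summary["Deviation From Expected Wins"] < 0)]
    return pool.nsmallest(n, "Deviation From Expected Wins")
```

Using the total rather than the average deviation favors programs whose shortfall accumulates over many appearances, which is the pattern the second plot is meant to surface.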
