In [1]:
import numpy as np
import pandas as pd
from scipy import stats

# What we're doing
We're going to construct 95% bootstrap confidence intervals for the mean height and the mean average experience of tournament-caliber and non-tournament-caliber teams, and check whether the two intervals overlap.

## 2013 - Height

In [2]:
data13 = pd.read_csv("2013_CLEAN_with_postseason.csv")
data13.head()

Unnamed: 0,School,Avg_Height,Avg_Experience,Postseason
0,air force,77.0,2.2,0
1,akron,78.0,1.7,1
2,alabama,77.0,1.2,0
3,alabama am,75.0,1.4,0
4,alabama state,75.0,1.1,0


In [3]:
tournament_heights13 = data13[data13["Postseason"] == 1]["Avg_Height"].values.tolist()
nontournament_heights13 = data13[data13["Postseason"] == 0]["Avg_Height"].values.tolist()

In [4]:
# Bootstrapping for 2013 tournament teams

np.random.seed(703)
num_bootstraps = 10000
tourney_bootstrap_heights_samples13 = np.random.choice(tournament_heights13, size = (num_bootstraps, len(tournament_heights13)), replace = True)
tourney_sampling_heights_means13 = np.average(tourney_bootstrap_heights_samples13, axis=1)
bootstrap_tourney_heights_ci_l13 = np.percentile(tourney_sampling_heights_means13, 2.5)
bootstrap_tourney_heights_ci_r13 = np.percentile(tourney_sampling_heights_means13, 97.5)
print("Bootstrapped 95% confidence interval for 2013 tournament teams =", (bootstrap_tourney_heights_ci_l13, bootstrap_tourney_heights_ci_r13))

Bootstrapped 95% confidence interval for 2013 tournament teams = (77.06060606060606, 77.43939393939394)


In [5]:
# Bootstrapping for 2013 non-tournament teams

np.random.seed(703)
nontourney_bootstrap_heights_samples13 = np.random.choice(nontournament_heights13, size = (num_bootstraps, len(nontournament_heights13)), replace = True)
nontourney_sampling_heights_means13 = np.average(nontourney_bootstrap_heights_samples13, axis=1)
bootstrap_nontourney_heights_ci_l13 = np.percentile(nontourney_sampling_heights_means13, 2.5)
bootstrap_nontourney_heights_ci_r13 = np.percentile(nontourney_sampling_heights_means13, 97.5)
print("Bootstrapped 95% confidence interval for 2013 non-tournament teams =", (bootstrap_nontourney_heights_ci_l13, bootstrap_nontourney_heights_ci_r13))

Bootstrapped 95% confidence interval for 2013 non-tournament teams = (76.42962962962963, 76.65555555555555)


For 2013, we are 95% confident that the true mean height of tournament teams, in inches, is between 77.06 and 77.44. We are also 95% confident that the true mean height of non-tournament teams, in inches, is between 76.43 and 76.66.
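The same four bootstrap steps (resample with replacement, average each resample, take the 2.5th and 97.5th percentiles) are repeated in every cell of this notebook. As a minimal sketch, they could be wrapped in one helper. The function name `bootstrap_mean_ci`, its parameters, and the toy data are assumptions for illustration and are not part of the original analysis.

```python
import numpy as np

def bootstrap_mean_ci(values, num_bootstraps=10000, level=95, seed=703):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    # A local generator keeps the global NumPy random state untouched.
    rng = np.random.RandomState(seed)
    values = np.asarray(values, dtype=float)
    # Each row is one resample of the same size as the original data.
    samples = rng.choice(values, size=(num_bootstraps, len(values)), replace=True)
    means = samples.mean(axis=1)
    tail = (100 - level) / 2
    return np.percentile(means, tail), np.percentile(means, 100 - tail)

# Hypothetical toy data, not taken from the CSV files
toy_heights = [75.0, 76.0, 77.0, 77.0, 78.0, 79.0]
low, high = bootstrap_mean_ci(toy_heights)
print("Bootstrapped 95% confidence interval =", (low, high))
```

With a helper like this, each year and group would need a single call, for example `bootstrap_mean_ci(tournament_heights13)`.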

## 2013 - Average Experience

In [6]:
tournament_exp13 = data13[data13["Postseason"] == 1]["Avg_Experience"].values.tolist()
nontournament_exp13 = data13[data13["Postseason"] == 0]["Avg_Experience"].values.tolist()

In [7]:
# Bootstrapping for 2013 tournament teams

np.random.seed(703)
tourney_bootstrap_exp_samples13 = np.random.choice(tournament_exp13, size = (num_bootstraps, len(tournament_exp13)), replace = True)
tourney_sampling_exp_means13 = np.average(tourney_bootstrap_exp_samples13, axis=1)
bootstrap_tourney_exp_ci_l13 = np.percentile(tourney_sampling_exp_means13, 2.5)
bootstrap_tourney_exp_ci_r13 = np.percentile(tourney_sampling_exp_means13, 97.5)
print("Bootstrapped 95% confidence interval for 2013 tournament teams =", (bootstrap_tourney_exp_ci_l13, bootstrap_tourney_exp_ci_r13))

Bootstrapped 95% confidence interval for 2013 tournament teams = (1.602992424242424, 1.7954545454545454)


In [8]:
# Bootstrapping for 2013 non-tournament teams

np.random.seed(703)
nontourney_bootstrap_exp_samples13 = np.random.choice(nontournament_exp13, size = (num_bootstraps, len(nontournament_exp13)), replace = True)
nontourney_sampling_exp_means13 = np.average(nontourney_bootstrap_exp_samples13, axis=1)
bootstrap_nontourney_exp_ci_l13 = np.percentile(nontourney_sampling_exp_means13, 2.5)
bootstrap_nontourney_exp_ci_r13 = np.percentile(nontourney_sampling_exp_means13, 97.5)
print("Bootstrapped 95% confidence interval for 2013 non-tournament teams =", (bootstrap_nontourney_exp_ci_l13, bootstrap_nontourney_exp_ci_r13))

Bootstrapped 95% confidence interval for 2013 non-tournament teams = (1.2896203703703704, 1.394814814814815)


For 2013, we are 95% confident that the true mean average experience of tournament teams is between 1.60 and 1.80. We are also 95% confident that the true mean average experience of non-tournament teams is between 1.29 and 1.39.

## 2014 - Height

In [9]:
data14 = pd.read_csv("2014_CLEAN_with_postseason.csv")
data14.head()

Unnamed: 0,School,Avg_Height,Avg_Experience,Postseason
0,abilene christian,75.0,0.1,0
1,air force,77.0,1.5,0
2,akron,78.0,1.5,0
3,alabama,77.0,1.5,0
4,alabama am,75.0,2.1,0


In [10]:
tournament_heights14 = data14[data14["Postseason"] == 1]["Avg_Height"].values.tolist()
nontournament_heights14 = data14[data14["Postseason"] == 0]["Avg_Height"].values.tolist()

In [11]:
# Bootstrapping for 2014 tournament teams

np.random.seed(703)
num_bootstraps = 10000
tourney_bootstrap_heights_samples14 = np.random.choice(tournament_heights14, size = (num_bootstraps, len(tournament_heights14)), replace = True)
tourney_sampling_heights_means14 = np.average(tourney_bootstrap_heights_samples14, axis=1)
bootstrap_tourney_heights_ci_l14 = np.percentile(tourney_sampling_heights_means14, 2.5)
bootstrap_tourney_heights_ci_r14 = np.percentile(tourney_sampling_heights_means14, 97.5)
print("Bootstrapped 95% confidence interval for 2014 tournament teams =", (bootstrap_tourney_heights_ci_l14, bootstrap_tourney_heights_ci_r14))

Bootstrapped 95% confidence interval for 2014 tournament teams = (77.07462686567165, 77.4776119402985)


In [12]:
# Bootstrapping for 2014 non-tournament teams

np.random.seed(703)
nontourney_bootstrap_heights_samples14 = np.random.choice(nontournament_heights14, size = (num_bootstraps, len(nontournament_heights14)), replace = True)
nontourney_sampling_heights_means14 = np.average(nontourney_bootstrap_heights_samples14, axis=1)
bootstrap_nontourney_heights_ci_l14 = np.percentile(nontourney_sampling_heights_means14, 2.5)
bootstrap_nontourney_heights_ci_r14 = np.percentile(nontourney_sampling_heights_means14, 97.5)
print("Bootstrapped 95% confidence interval for 2014 non-tournament teams =", (bootstrap_nontourney_heights_ci_l14, bootstrap_nontourney_heights_ci_r14))

Bootstrapped 95% confidence interval for 2014 non-tournament teams = (76.4981684981685, 76.71428571428571)


For 2014, we are 95% confident that the true mean height of tournament teams, in inches, is between 77.07 and 77.48. We are also 95% confident that the true mean height of non-tournament teams, in inches, is between 76.50 and 76.71.

## 2014 - Average Experience

In [13]:
tournament_exp14 = data14[data14["Postseason"] == 1]["Avg_Experience"].values.tolist()
nontournament_exp14 = data14[data14["Postseason"] == 0]["Avg_Experience"].values.tolist()

In [14]:
# Bootstrapping for 2014 tournament teams

np.random.seed(703)
tourney_bootstrap_exp_samples14 = np.random.choice(tournament_exp14, size = (num_bootstraps, len(tournament_exp14)), replace = True)
tourney_sampling_exp_means14 = np.average(tourney_bootstrap_exp_samples14, axis=1)
bootstrap_tourney_exp_ci_l14 = np.percentile(tourney_sampling_exp_means14, 2.5)
bootstrap_tourney_exp_ci_r14 = np.percentile(tourney_sampling_exp_means14, 97.5)
print("Bootstrapped 95% confidence interval for 2014 tournament teams =", (bootstrap_tourney_exp_ci_l14, bootstrap_tourney_exp_ci_r14))

Bootstrapped 95% confidence interval for 2014 tournament teams = (1.5, 1.717910447761194)


In [15]:
# Bootstrapping for 2014 non-tournament teams

np.random.seed(703)
nontourney_bootstrap_exp_samples14 = np.random.choice(nontournament_exp14, size = (num_bootstraps, len(nontournament_exp14)), replace = True)
nontourney_sampling_exp_means14 = np.average(nontourney_bootstrap_exp_samples14, axis=1)
bootstrap_nontourney_exp_ci_l14 = np.percentile(nontourney_sampling_exp_means14, 2.5)
bootstrap_nontourney_exp_ci_r14 = np.percentile(nontourney_sampling_exp_means14, 97.5)
print("Bootstrapped 95% confidence interval for 2014 non-tournament teams =", (bootstrap_nontourney_exp_ci_l14, bootstrap_nontourney_exp_ci_r14))

Bootstrapped 95% confidence interval for 2014 non-tournament teams = (1.3446886446886448, 1.443956043956044)


For 2014, we are 95% confident that the true mean average experience of tournament teams is between 1.50 and 1.72. We are also 95% confident that the true mean average experience of non-tournament teams is between 1.34 and 1.44.

## 2015 - Height

In [16]:
data15 = pd.read_csv("2015_CLEAN_with_postseason.csv")
data15.head()

Unnamed: 0,School,Avg_Height,Avg_Experience,Postseason
0,abilene christian,74.0,0.6,0
1,air force,77.0,1.6,0
2,akron,77.0,1.3,0
3,alabama,77.0,1.6,0
4,alabama am,77.0,1.0,0


In [17]:
tournament_heights15 = data15[data15["Postseason"] == 1]["Avg_Height"].values.tolist()
nontournament_heights15 = data15[data15["Postseason"] == 0]["Avg_Height"].values.tolist()

In [18]:
# Bootstrapping for 2015 tournament teams

np.random.seed(703)
num_bootstraps = 10000
tourney_bootstrap_heights_samples15 = np.random.choice(tournament_heights15, size = (num_bootstraps, len(tournament_heights15)), replace = True)
tourney_sampling_heights_means15 = np.average(tourney_bootstrap_heights_samples15, axis=1)
bootstrap_tourney_heights_ci_l15 = np.percentile(tourney_sampling_heights_means15, 2.5)
bootstrap_tourney_heights_ci_r15 = np.percentile(tourney_sampling_heights_means15, 97.5)
print("Bootstrapped 95% confidence interval for 2015 tournament teams =", (bootstrap_tourney_heights_ci_l15, bootstrap_tourney_heights_ci_r15))

Bootstrapped 95% confidence interval for 2015 tournament teams = (77.02941176470588, 77.47058823529412)


In [19]:
# Bootstrapping for 2015 non-tournament teams

np.random.seed(703)
nontourney_bootstrap_heights_samples15 = np.random.choice(nontournament_heights15, size = (num_bootstraps, len(nontournament_heights15)), replace = True)
nontourney_sampling_heights_means15 = np.average(nontourney_bootstrap_heights_samples15, axis=1)
bootstrap_nontourney_heights_ci_l15 = np.percentile(nontourney_sampling_heights_means15, 2.5)
bootstrap_nontourney_heights_ci_r15 = np.percentile(nontourney_sampling_heights_means15, 97.5)
print("Bootstrapped 95% confidence interval for 2015 non-tournament teams =", (bootstrap_nontourney_heights_ci_l15, bootstrap_nontourney_heights_ci_r15))

Bootstrapped 95% confidence interval for 2015 non-tournament teams = (76.43682310469315, 76.65342960288808)


For 2015, we are 95% confident that the true mean height of tournament teams, in inches, is between 77.03 and 77.47. We are also 95% confident that the true mean height of non-tournament teams, in inches, is between 76.44 and 76.65.

## 2015 - Average Experience

In [20]:
tournament_exp15 = data15[data15["Postseason"] == 1]["Avg_Experience"].values.tolist()
nontournament_exp15 = data15[data15["Postseason"] == 0]["Avg_Experience"].values.tolist()

In [21]:
# Bootstrapping for 2015 tournament teams

np.random.seed(703)
tourney_bootstrap_exp_samples15 = np.random.choice(tournament_exp15, size = (num_bootstraps, len(tournament_exp15)), replace = True)
tourney_sampling_exp_means15 = np.average(tourney_bootstrap_exp_samples15, axis=1)
bootstrap_tourney_exp_ci_l15 = np.percentile(tourney_sampling_exp_means15, 2.5)
bootstrap_tourney_exp_ci_r15 = np.percentile(tourney_sampling_exp_means15, 97.5)
print("Bootstrapped 95% confidence interval for 2015 tournament teams =", (bootstrap_tourney_exp_ci_l15, bootstrap_tourney_exp_ci_r15))

Bootstrapped 95% confidence interval for 2015 tournament teams = (1.4426470588235296, 1.638235294117647)


In [22]:
# Bootstrapping for 2015 non-tournament teams

np.random.seed(703)
nontourney_bootstrap_exp_samples15 = np.random.choice(nontournament_exp15, size = (num_bootstraps, len(nontournament_exp15)), replace = True)
nontourney_sampling_exp_means15 = np.average(nontourney_bootstrap_exp_samples15, axis=1)
bootstrap_nontourney_exp_ci_l15 = np.percentile(nontourney_sampling_exp_means15, 2.5)
bootstrap_nontourney_exp_ci_r15 = np.percentile(nontourney_sampling_exp_means15, 97.5)
print("Bootstrapped 95% confidence interval for 2015 non-tournament teams =", (bootstrap_nontourney_exp_ci_l15, bootstrap_nontourney_exp_ci_r15))

Bootstrapped 95% confidence interval for 2015 non-tournament teams = (1.339350180505415, 1.444043321299639)


For 2015, we are 95% confident that the true mean average experience of tournament teams is between 1.443 and 1.638. We are also 95% confident that the true mean average experience of non-tournament teams is between 1.339 and 1.444. These two intervals overlap slightly.
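When two intervals sit this close together, comparing rounded endpoints by eye is error-prone. A small check on the unrounded endpoints is safer. The function `intervals_overlap` below is a hypothetical helper, and the endpoints are copied from the 2015 experience outputs above.

```python
def intervals_overlap(a, b):
    """Return True if closed intervals a=(low, high) and b=(low, high) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Endpoints copied from the 2015 average-experience outputs
tourney_ci = (1.4426470588235296, 1.638235294117647)
nontourney_ci = (1.339350180505415, 1.444043321299639)

print(intervals_overlap(tourney_ci, nontourney_ci))  # True
```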

## 2016 - Height

In [23]:
data16 = pd.read_csv("2016_CLEAN_with_postseason.csv")
data16.head()

Unnamed: 0,School,Avg_Height,Avg_Experience,Postseason
0,abilene christian,75.0,0.8,0
1,air force,77.0,1.4,0
2,akron,77.0,1.8,0
3,alabama,77.0,1.6,0
4,alabama am,76.0,1.3,0


In [24]:
tournament_heights16 = data16[data16["Postseason"] == 1]["Avg_Height"].values.tolist()
nontournament_heights16 = data16[data16["Postseason"] == 0]["Avg_Height"].values.tolist()

In [25]:
# Bootstrapping for 2016 tournament teams

np.random.seed(703)
num_bootstraps = 10000
tourney_bootstrap_heights_samples16 = np.random.choice(tournament_heights16, size = (num_bootstraps, len(tournament_heights16)), replace = True)
tourney_sampling_heights_means16 = np.average(tourney_bootstrap_heights_samples16, axis=1)
bootstrap_tourney_heights_ci_l16 = np.percentile(tourney_sampling_heights_means16, 2.5)
bootstrap_tourney_heights_ci_r16 = np.percentile(tourney_sampling_heights_means16, 97.5)
print("Bootstrapped 95% confidence interval for 2016 tournament teams =", (bootstrap_tourney_heights_ci_l16, bootstrap_tourney_heights_ci_r16))

Bootstrapped 95% confidence interval for 2016 tournament teams = (77.02941176470588, 77.47058823529412)


In [26]:
# Bootstrapping for 2016 non-tournament teams

np.random.seed(703)
nontourney_bootstrap_heights_samples16 = np.random.choice(nontournament_heights16, size = (num_bootstraps, len(nontournament_heights16)), replace = True)
nontourney_sampling_heights_means16 = np.average(nontourney_bootstrap_heights_samples16, axis=1)
bootstrap_nontourney_heights_ci_l16 = np.percentile(nontourney_sampling_heights_means16, 2.5)
bootstrap_nontourney_heights_ci_r16 = np.percentile(nontourney_sampling_heights_means16, 97.5)
print("Bootstrapped 95% confidence interval for 2016 non-tournament teams =", (bootstrap_nontourney_heights_ci_l16, bootstrap_nontourney_heights_ci_r16))

Bootstrapped 95% confidence interval for 2016 non-tournament teams = (76.50709219858156, 76.71985815602837)


For 2016, we are 95% confident that the true mean height of tournament teams, in inches, is between 77.03 and 77.47. We are also 95% confident that the true mean height of non-tournament teams, in inches, is between 76.51 and 76.72.

## 2016 - Average Experience

In [27]:
tournament_exp16 = data16[data16["Postseason"] == 1]["Avg_Experience"].values.tolist()
nontournament_exp16 = data16[data16["Postseason"] == 0]["Avg_Experience"].values.tolist()

In [28]:
# Bootstrapping for 2016 tournament teams

np.random.seed(703)
tourney_bootstrap_exp_samples16 = np.random.choice(tournament_exp16, size = (num_bootstraps, len(tournament_exp16)), replace = True)
tourney_sampling_exp_means16 = np.average(tourney_bootstrap_exp_samples16, axis=1)
bootstrap_tourney_exp_ci_l16 = np.percentile(tourney_sampling_exp_means16, 2.5)
bootstrap_tourney_exp_ci_r16 = np.percentile(tourney_sampling_exp_means16, 97.5)
print("Bootstrapped 95% confidence interval for 2016 tournament teams =", (bootstrap_tourney_exp_ci_l16, bootstrap_tourney_exp_ci_r16))

Bootstrapped 95% confidence interval for 2016 tournament teams = (1.5205882352941176, 1.7102941176470585)


In [29]:
# Bootstrapping for 2016 non-tournament teams

np.random.seed(703)
nontourney_bootstrap_exp_samples16 = np.random.choice(nontournament_exp16, size = (num_bootstraps, len(nontournament_exp16)), replace = True)
nontourney_sampling_exp_means16 = np.average(nontourney_bootstrap_exp_samples16, axis=1)
bootstrap_nontourney_exp_ci_l16 = np.percentile(nontourney_sampling_exp_means16, 2.5)
bootstrap_nontourney_exp_ci_r16 = np.percentile(nontourney_sampling_exp_means16, 97.5)
print("Bootstrapped 95% confidence interval for 2016 non-tournament teams =", (bootstrap_nontourney_exp_ci_l16, bootstrap_nontourney_exp_ci_r16))

Bootstrapped 95% confidence interval for 2016 non-tournament teams = (1.3173758865248226, 1.417375886524823)


For 2016, we are 95% confident that the true mean average experience of tournament teams is between 1.52 and 1.71. We are also 95% confident that the true mean average experience of non-tournament teams is between 1.32 and 1.42.

## 2017 - Height

In [30]:
data17 = pd.read_csv("2017_CLEAN_with_postseason.csv")
data17.head()

Unnamed: 0,School,Avg_Height,Avg_Experience,Postseason
0,abilene christian,76.0,1.0,0
1,air force,76.0,1.9,0
2,akron,77.0,1.8,0
3,alabama,78.0,1.5,0
4,alabama am,75.0,1.3,0


In [31]:
tournament_heights17 = data17[data17["Postseason"] == 1]["Avg_Height"].values.tolist()
nontournament_heights17 = data17[data17["Postseason"] == 0]["Avg_Height"].values.tolist()

In [32]:
# Bootstrapping for 2017 tournament teams

np.random.seed(703)
num_bootstraps = 10000
tourney_bootstrap_heights_samples17 = np.random.choice(tournament_heights17, size = (num_bootstraps, len(tournament_heights17)), replace = True)
tourney_sampling_heights_means17 = np.average(tourney_bootstrap_heights_samples17, axis=1)
bootstrap_tourney_heights_ci_l17 = np.percentile(tourney_sampling_heights_means17, 2.5)
bootstrap_tourney_heights_ci_r17 = np.percentile(tourney_sampling_heights_means17, 97.5)
print("Bootstrapped 95% confidence interval for 2017 tournament teams =", (bootstrap_tourney_heights_ci_l17, bootstrap_tourney_heights_ci_r17))

Bootstrapped 95% confidence interval for 2017 tournament teams = (76.79411764705883, 77.25)


In [33]:
# Bootstrapping for 2017 non-tournament teams

np.random.seed(703)
nontourney_bootstrap_heights_samples17 = np.random.choice(nontournament_heights17, size = (num_bootstraps, len(nontournament_heights17)), replace = True)
nontourney_sampling_heights_means17 = np.average(nontourney_bootstrap_heights_samples17, axis=1)
bootstrap_nontourney_heights_ci_l17 = np.percentile(nontourney_sampling_heights_means17, 2.5)
bootstrap_nontourney_heights_ci_r17 = np.percentile(nontourney_sampling_heights_means17, 97.5)
print("Bootstrapped 95% confidence interval for 2017 non-tournament teams =", (bootstrap_nontourney_heights_ci_l17, bootstrap_nontourney_heights_ci_r17))

Bootstrapped 95% confidence interval for 2017 non-tournament teams = (76.59074733096085, 76.79359430604983)


For 2017, we are 95% confident that the true mean height of tournament teams, in inches, is between 76.7941 and 77.2500. We are also 95% confident that the true mean height of non-tournament teams, in inches, is between 76.5907 and 76.7936. The two intervals come within about 0.0005 inches of each other but do not overlap.

## 2017 - Average Experience

In [34]:
tournament_exp17 = data17[data17["Postseason"] == 1]["Avg_Experience"].values.tolist()
nontournament_exp17 = data17[data17["Postseason"] == 0]["Avg_Experience"].values.tolist()

In [35]:
# Bootstrapping for 2017 tournament teams

np.random.seed(703)
tourney_bootstrap_exp_samples17 = np.random.choice(tournament_exp17, size = (num_bootstraps, len(tournament_exp17)), replace = True)
tourney_sampling_exp_means17 = np.average(tourney_bootstrap_exp_samples17, axis=1)
bootstrap_tourney_exp_ci_l17 = np.percentile(tourney_sampling_exp_means17, 2.5)
bootstrap_tourney_exp_ci_r17 = np.percentile(tourney_sampling_exp_means17, 97.5)
print("Bootstrapped 95% confidence interval for 2017 tournament teams =", (bootstrap_tourney_exp_ci_l17, bootstrap_tourney_exp_ci_r17))

Bootstrapped 95% confidence interval for 2017 tournament teams = (1.5191176470588235, 1.6882352941176468)


In [36]:
# Bootstrapping for 2017 non-tournament teams

np.random.seed(703)
nontourney_bootstrap_exp_samples17 = np.random.choice(nontournament_exp17, size = (num_bootstraps, len(nontournament_exp17)), replace = True)
nontourney_sampling_exp_means17 = np.average(nontourney_bootstrap_exp_samples17, axis=1)
bootstrap_nontourney_exp_ci_l17 = np.percentile(nontourney_sampling_exp_means17, 2.5)
bootstrap_nontourney_exp_ci_r17 = np.percentile(nontourney_sampling_exp_means17, 97.5)
print("Bootstrapped 95% confidence interval for 2017 non-tournament teams =", (bootstrap_nontourney_exp_ci_l17, bootstrap_nontourney_exp_ci_r17))

Bootstrapped 95% confidence interval for 2017 non-tournament teams = (1.3402135231316725, 1.4291814946619217)


For 2017, we are 95% confident that the true mean average experience of tournament teams is between 1.52 and 1.69. We are also 95% confident that the true mean average experience of non-tournament teams is between 1.34 and 1.43.

## 2018 - Height

In [37]:
data18 = pd.read_csv("2018_CLEAN_with_postseason.csv")
data18.head()

Unnamed: 0,School,Avg_Height,Avg_Experience,Postseason
0,abilene christian,76.0,1.5,0
1,air force,77.0,1.9,0
2,akron,77.0,1.2,0
3,alabama,78.0,0.9,1
4,alabama am,75.0,0.9,0


In [38]:
tournament_heights18 = data18[data18["Postseason"] == 1]["Avg_Height"].values.tolist()
nontournament_heights18 = data18[data18["Postseason"] == 0]["Avg_Height"].values.tolist()

In [39]:
# Bootstrapping for 2018 tournament teams

np.random.seed(703)
num_bootstraps = 10000
tourney_bootstrap_heights_samples18 = np.random.choice(tournament_heights18, size = (num_bootstraps, len(tournament_heights18)), replace = True)
tourney_sampling_heights_means18 = np.average(tourney_bootstrap_heights_samples18, axis=1)
bootstrap_tourney_heights_ci_l18 = np.percentile(tourney_sampling_heights_means18, 2.5)
bootstrap_tourney_heights_ci_r18 = np.percentile(tourney_sampling_heights_means18, 97.5)
print("Bootstrapped 95% confidence interval for 2018 tournament teams =", (bootstrap_tourney_heights_ci_l18, bootstrap_tourney_heights_ci_r18))

Bootstrapped 95% confidence interval for 2018 tournament teams = (77.01515151515152, 77.5)


In [40]:
# Bootstrapping for 2018 non-tournament teams

np.random.seed(703)
nontourney_bootstrap_heights_samples18 = np.random.choice(nontournament_heights18, size = (num_bootstraps, len(nontournament_heights18)), replace = True)
nontourney_sampling_heights_means18 = np.average(nontourney_bootstrap_heights_samples18, axis=1)
bootstrap_nontourney_heights_ci_l18 = np.percentile(nontourney_sampling_heights_means18, 2.5)
bootstrap_nontourney_heights_ci_r18 = np.percentile(nontourney_sampling_heights_means18, 97.5)
print("Bootstrapped 95% confidence interval for 2018 non-tournament teams =", (bootstrap_nontourney_heights_ci_l18, bootstrap_nontourney_heights_ci_r18))

Bootstrapped 95% confidence interval for 2018 non-tournament teams = (76.64768683274022, 76.86120996441281)


For 2018, we are 95% confident that the true mean height of tournament teams, in inches, is between 77.02 and 77.50. We are also 95% confident that the true mean height of non-tournament teams, in inches, is between 76.65 and 76.86.

## 2018 - Average Experience

In [41]:
tournament_exp18 = data18[data18["Postseason"] == 1]["Avg_Experience"].values.tolist()
nontournament_exp18 = data18[data18["Postseason"] == 0]["Avg_Experience"].values.tolist()

In [42]:
# Bootstrapping for 2018 tournament teams

np.random.seed(703)
tourney_bootstrap_exp_samples18 = np.random.choice(tournament_exp18, size = (num_bootstraps, len(tournament_exp18)), replace = True)
tourney_sampling_exp_means18 = np.average(tourney_bootstrap_exp_samples18, axis=1)
bootstrap_tourney_exp_ci_l18 = np.percentile(tourney_sampling_exp_means18, 2.5)
bootstrap_tourney_exp_ci_r18 = np.percentile(tourney_sampling_exp_means18, 97.5)
print("Bootstrapped 95% confidence interval for 2018 tournament teams =", (bootstrap_tourney_exp_ci_l18, bootstrap_tourney_exp_ci_r18))

Bootstrapped 95% confidence interval for 2018 tournament teams = (1.4545454545454546, 1.639431818181818)


In [43]:
# Bootstrapping for 2018 non-tournament teams

np.random.seed(703)
nontourney_bootstrap_exp_samples18 = np.random.choice(nontournament_exp18, size = (num_bootstraps, len(nontournament_exp18)), replace = True)
nontourney_sampling_exp_means18 = np.average(nontourney_bootstrap_exp_samples18, axis=1)
bootstrap_nontourney_exp_ci_l18 = np.percentile(nontourney_sampling_exp_means18, 2.5)
bootstrap_nontourney_exp_ci_r18 = np.percentile(nontourney_sampling_exp_means18, 97.5)
print("Bootstrapped 95% confidence interval for 2018 non-tournament teams =", (bootstrap_nontourney_exp_ci_l18, bootstrap_nontourney_exp_ci_r18))

Bootstrapped 95% confidence interval for 2018 non-tournament teams = (1.3558718861209964, 1.4548042704626336)


For 2018, we are 95% confident that the true mean average experience of tournament teams is between 1.45 and 1.64. We are also 95% confident that the true mean average experience of non-tournament teams is between 1.36 and 1.45.

## 2019 - Height

In [44]:
data19 = pd.read_csv("2019_CLEAN_with_postseason.csv")
data19.head()

Unnamed: 0,School,Avg_Height,Avg_Experience,Postseason
0,abilene christian,76.0,2.0,1
1,air force,77.0,1.5,0
2,akron,76.0,1.5,0
3,alabama,78.0,1.8,0
4,alabama am,76.0,0.8,0


In [45]:
tournament_heights19 = data19[data19["Postseason"] == 1]["Avg_Height"].values.tolist()
nontournament_heights19 = data19[data19["Postseason"] == 0]["Avg_Height"].values.tolist()

In [46]:
# Bootstrapping for 2019 tournament teams

np.random.seed(703)
num_bootstraps = 10000
tourney_bootstrap_heights_samples19 = np.random.choice(tournament_heights19, size = (num_bootstraps, len(tournament_heights19)), replace = True)
tourney_sampling_heights_means19 = np.average(tourney_bootstrap_heights_samples19, axis=1)
bootstrap_tourney_heights_ci_l19 = np.percentile(tourney_sampling_heights_means19, 2.5)
bootstrap_tourney_heights_ci_r19 = np.percentile(tourney_sampling_heights_means19, 97.5)
print("Bootstrapped 95% confidence interval for 2019 tournament teams =", (bootstrap_tourney_heights_ci_l19, bootstrap_tourney_heights_ci_r19))

Bootstrapped 95% confidence interval for 2019 tournament teams = (77.14925373134328, 77.56716417910448)


In [47]:
# Bootstrapping for 2019 non-tournament teams

np.random.seed(703)
nontourney_bootstrap_heights_samples19 = np.random.choice(nontournament_heights19, size = (num_bootstraps, len(nontournament_heights19)), replace = True)
nontourney_sampling_heights_means19 = np.average(nontourney_bootstrap_heights_samples19, axis=1)
bootstrap_nontourney_heights_ci_l19 = np.percentile(nontourney_sampling_heights_means19, 2.5)
bootstrap_nontourney_heights_ci_r19 = np.percentile(nontourney_sampling_heights_means19, 97.5)
print("Bootstrapped 95% confidence interval for 2019 non-tournament teams =", (bootstrap_nontourney_heights_ci_l19, bootstrap_nontourney_heights_ci_r19))

Bootstrapped 95% confidence interval for 2019 non-tournament teams = (76.66308243727599, 76.87455197132617)


For 2019, we are 95% confident that the true mean height of tournament teams, in inches, is between 77.15 and 77.57. We are also 95% confident that the true mean height of non-tournament teams, in inches, is between 76.66 and 76.87.

## 2019 - Average Experience

In [48]:
tournament_exp19 = data19[data19["Postseason"] == 1]["Avg_Experience"].values.tolist()
nontournament_exp19 = data19[data19["Postseason"] == 0]["Avg_Experience"].values.tolist()

In [49]:
# Bootstrapping for 2019 tournament teams

np.random.seed(703)
tourney_bootstrap_exp_samples19 = np.random.choice(tournament_exp19, size = (num_bootstraps, len(tournament_exp19)), replace = True)
tourney_sampling_exp_means19 = np.average(tourney_bootstrap_exp_samples19, axis=1)
bootstrap_tourney_exp_ci_l19 = np.percentile(tourney_sampling_exp_means19, 2.5)
bootstrap_tourney_exp_ci_r19 = np.percentile(tourney_sampling_exp_means19, 97.5)
print("Bootstrapped 95% confidence interval for 2019 tournament teams =", (bootstrap_tourney_exp_ci_l19, bootstrap_tourney_exp_ci_r19))

Bootstrapped 95% confidence interval for 2019 tournament teams = (1.5358208955223882, 1.7402985074626864)


In [50]:
# Bootstrapping for 2019 non-tournament teams

np.random.seed(703)
nontourney_bootstrap_exp_samples19 = np.random.choice(nontournament_exp19, size = (num_bootstraps, len(nontournament_exp19)), replace = True)
nontourney_sampling_exp_means19 = np.average(nontourney_bootstrap_exp_samples19, axis=1)
bootstrap_nontourney_exp_ci_l19 = np.percentile(nontourney_sampling_exp_means19, 2.5)
bootstrap_nontourney_exp_ci_r19 = np.percentile(nontourney_sampling_exp_means19, 97.5)
print("Bootstrapped 95% confidence interval for 2019 non-tournament teams =", (bootstrap_nontourney_exp_ci_l19, bootstrap_nontourney_exp_ci_r19))

Bootstrapped 95% confidence interval for 2019 non-tournament teams = (1.310035842293907, 1.40752688172043)


For 2019, we are 95% confident that the true mean average experience of tournament teams is between 1.54 and 1.74. We are also 95% confident that the true mean average experience of non-tournament teams is between 1.31 and 1.41.

## Overall Team Height
All years combined

In [51]:
tournament_heights_total = tournament_heights13 + tournament_heights14 + tournament_heights15 + tournament_heights16 + tournament_heights17 + tournament_heights18 + tournament_heights19
nontournament_heights_total = nontournament_heights13 + nontournament_heights14 + nontournament_heights15 + nontournament_heights16 + nontournament_heights17 + nontournament_heights18 + nontournament_heights19

In [52]:
# Bootstrapping for total tournament teams

np.random.seed(703)
num_bootstraps = 10000
tourney_bootstrap_heights_samples_total = np.random.choice(tournament_heights_total, size = (num_bootstraps, len(tournament_heights_total)), replace = True)
tourney_sampling_heights_means_total = np.average(tourney_bootstrap_heights_samples_total, axis=1)
bootstrap_tourney_heights_ci_l_total = np.percentile(tourney_sampling_heights_means_total, 2.5)
bootstrap_tourney_heights_ci_r_total = np.percentile(tourney_sampling_heights_means_total, 97.5)
print("Bootstrapped 95% confidence interval for total tournament teams =", (bootstrap_tourney_heights_ci_l_total, bootstrap_tourney_heights_ci_r_total))

Bootstrapped 95% confidence interval for total tournament teams = (77.15957446808511, 77.32127659574468)


In [53]:
# Bootstrapping for total non-tournament teams

np.random.seed(703)
nontourney_bootstrap_heights_samples_total = np.random.choice(nontournament_heights_total, size = (num_bootstraps, len(nontournament_heights_total)), replace = True)
nontourney_sampling_heights_means_total = np.average(nontourney_bootstrap_heights_samples_total, axis=1)
bootstrap_nontourney_heights_ci_l_total = np.percentile(nontourney_sampling_heights_means_total, 2.5)
bootstrap_nontourney_heights_ci_r_total = np.percentile(nontourney_sampling_heights_means_total, 97.5)
print("Bootstrapped 95% confidence interval for total non-tournament teams =", (bootstrap_nontourney_heights_ci_l_total, bootstrap_nontourney_heights_ci_r_total))

Bootstrapped 95% confidence interval for total non-tournament teams = (76.60473494595986, 76.68605249614)


Overall, we are 95% confident that the true mean height of tournament teams, in inches, is between 77.16 and 77.32. We are also 95% confident that the true mean height of non-tournament teams, in inches, is between 76.60 and 76.69.
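The pooled lists above are built by adding seven per-year lists by hand. One alternative, sketched here under assumptions, is to stack the yearly DataFrames with `pd.concat` and filter once. The `frames` dictionary uses a few rows copied from the 2013 and 2018 `head()` outputs as stand-ins for the full `data13` ... `data19` frames.

```python
import pandas as pd

# Toy stand-ins for the yearly frames; rows copied from the head() outputs
frames = {
    2013: pd.DataFrame({"Avg_Height": [77.0, 78.0, 77.0], "Postseason": [0, 1, 0]}),
    2018: pd.DataFrame({"Avg_Height": [76.0, 77.0, 77.0, 78.0], "Postseason": [0, 0, 0, 1]}),
}

# Stack all seasons into one frame, keeping the season as a column
all_years = pd.concat(
    [df.assign(Year=year) for year, df in frames.items()], ignore_index=True
)

# Split once by postseason status instead of concatenating per-year lists
tournament_heights_total = all_years.loc[all_years["Postseason"] == 1, "Avg_Height"].tolist()
nontournament_heights_total = all_years.loc[all_years["Postseason"] == 0, "Avg_Height"].tolist()

print(tournament_heights_total)          # [78.0, 78.0]
print(len(nontournament_heights_total))  # 5
```

Keeping a `Year` column also makes per-year summaries available through `all_years.groupby("Year")`.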

## Overall Team Average Experience
All years combined

In [54]:
tournament_exp_total = tournament_exp13 + tournament_exp14 + tournament_exp15 + tournament_exp16 + tournament_exp17 + tournament_exp18 + tournament_exp19
nontournament_exp_total = nontournament_exp13 + nontournament_exp14 + nontournament_exp15 + nontournament_exp16 + nontournament_exp17 + nontournament_exp18 + nontournament_exp19

In [55]:
# Bootstrapping for total tournament teams

np.random.seed(703)
tourney_bootstrap_exp_samples_total = np.random.choice(tournament_exp_total, size = (num_bootstraps, len(tournament_exp_total)), replace = True)
tourney_sampling_exp_means_total = np.average(tourney_bootstrap_exp_samples_total, axis=1)
bootstrap_tourney_exp_ci_l_total = np.percentile(tourney_sampling_exp_means_total, 2.5)
bootstrap_tourney_exp_ci_r_total = np.percentile(tourney_sampling_exp_means_total, 97.5)
print("Bootstrapped 95% confidence interval for total tournament teams =", (bootstrap_tourney_exp_ci_l_total, bootstrap_tourney_exp_ci_r_total))

Bootstrapped 95% confidence interval for total tournament teams = (1.5704255319148934, 1.6436170212765957)


In [56]:
# Bootstrapping for total non-tournament teams

np.random.seed(703)
nontourney_bootstrap_exp_samples_total = np.random.choice(nontournament_exp_total, size = (num_bootstraps, len(nontournament_exp_total)), replace = True)
nontourney_sampling_exp_means_total = np.average(nontourney_bootstrap_exp_samples_total, axis=1)
bootstrap_nontourney_exp_ci_l_total = np.percentile(nontourney_sampling_exp_means_total, 2.5)
bootstrap_nontourney_exp_ci_r_total = np.percentile(nontourney_sampling_exp_means_total, 97.5)
print("Bootstrapped 95% confidence interval for total non-tournament teams =", (bootstrap_nontourney_exp_ci_l_total, bootstrap_nontourney_exp_ci_r_total))

Bootstrapped 95% confidence interval for total non-tournament teams = (1.3595985589294903, 1.3963458569222853)


Overall, we are 95% confident that the true mean average experience of tournament teams is between 1.57 and 1.64. We are also 95% confident that the true mean average experience of non-tournament teams is between 1.36 and 1.40.
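
Comparing two separate intervals is a conservative check. A more direct option is to bootstrap the difference in means and see whether that interval excludes zero. The sketch below assumes the two groups are resampled independently; `bootstrap_mean_diff_ci` and the demo lists are hypothetical and use made-up values rather than the notebook's data.

```python
import numpy as np

def bootstrap_mean_diff_ci(group_a, group_b, num_bootstraps=10000, level=95.0, seed=703):
    """Percentile-bootstrap CI for mean(group_a) - mean(group_b)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    # Resample each group independently, with replacement, at its own size.
    a_means = a[rng.integers(0, len(a), size=(num_bootstraps, len(a)))].mean(axis=1)
    b_means = b[rng.integers(0, len(b), size=(num_bootstraps, len(b)))].mean(axis=1)
    diffs = a_means - b_means
    tail = (100.0 - level) / 2.0
    lower, upper = np.percentile(diffs, [tail, 100.0 - tail])
    return lower, upper

# Hypothetical average-experience values, for illustration only.
demo_tourney_exp = [1.7, 1.9, 1.5, 1.6, 1.8, 1.4, 1.7, 1.6]
demo_nontourney_exp = [1.2, 1.4, 1.1, 1.3, 1.5, 1.2, 1.4, 1.3, 1.1, 1.5]
diff_low, diff_high = bootstrap_mean_diff_ci(demo_tourney_exp, demo_nontourney_exp, num_bootstraps=2000)
print((diff_low, diff_high))
```

An interval that lies entirely above zero would point the same way as the non-overlapping intervals: higher mean average experience for the first group.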

## Hypothesis Tests

In [57]:
tourney_mean = np.mean(tournament_heights_total)
nontourney_mean = np.mean(nontournament_heights_total)

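# Note: np.std defaults to the population SD (ddof=0), whereas ttest_ind_from_stats
# expects sample SDs (ddof=1). The gap shrinks as n grows, so it matters little for large groups.
tourney_sd = np.std(tournament_heights_total)
nontourney_sd = np.std(nontournament_heights_total)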

tourney_n = len(tournament_heights_total)
nontourney_n = len(nontournament_heights_total)

stats.ttest_ind_from_stats(mean1 = tourney_mean, std1 = tourney_sd, nobs1 = tourney_n, 
                           mean2 = nontourney_mean, std2 = nontourney_sd, nobs2 = nontourney_n)

Ttest_indResult(statistic=12.64635080874502, pvalue=1.5349844336210725e-35)

The two-sided hypothesis test above yields a $p$-value of $1.53 \times 10^{-35}$, which is statistically significant evidence against the null hypothesis that tournament teams and non-tournament teams have the same mean height. We therefore reject the null hypothesis and conclude that the two means differ.
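
By default, `stats.ttest_ind_from_stats` uses `equal_var=True`, so the test above pools the two variances (Student's t-test). As a sensitivity check, one could run Welch's t-test, which does not assume equal variances, and pass sample standard deviations (`ddof=1`). The sketch below uses synthetic data; the group sizes, means, and spreads are assumptions chosen for illustration, not values from the notebook.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the two height lists (assumed sizes and parameters).
rng = np.random.default_rng(703)
demo_tourney = rng.normal(loc=77.2, scale=1.0, size=400)
demo_nontourney = rng.normal(loc=76.6, scale=1.0, size=2000)

# Welch's t-test from summary statistics, using sample SDs (ddof=1).
result_stats = stats.ttest_ind_from_stats(
    mean1=np.mean(demo_tourney), std1=np.std(demo_tourney, ddof=1), nobs1=len(demo_tourney),
    mean2=np.mean(demo_nontourney), std2=np.std(demo_nontourney, ddof=1), nobs2=len(demo_nontourney),
    equal_var=False,
)

# The same test computed directly from the raw values; the two should agree.
result_raw = stats.ttest_ind(demo_tourney, demo_nontourney, equal_var=False)

print(result_stats.statistic, result_raw.statistic)
print(result_stats.pvalue, result_raw.pvalue)
```

If the pooled and Welch versions give similar statistics on the real data, the conclusion does not hinge on the equal-variance assumption.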

In [58]:
tourney_exp_mean = np.mean(tournament_exp_total)
nontourney_exp_mean = np.mean(nontournament_exp_total)

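# Note: np.std defaults to the population SD (ddof=0), whereas ttest_ind_from_stats
# expects sample SDs (ddof=1). The gap shrinks as n grows, so it matters little for large groups.
tourney_exp_sd = np.std(tournament_exp_total)
nontourney_exp_sd = np.std(nontournament_exp_total)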

tourney_exp_n = len(tournament_exp_total)
nontourney_exp_n = len(nontournament_exp_total)

stats.ttest_ind_from_stats(mean1 = tourney_exp_mean, std1 = tourney_exp_sd, nobs1 = tourney_exp_n, 
                           mean2 = nontourney_exp_mean, std2 = nontourney_exp_sd, nobs2 = nontourney_exp_n)

Ttest_indResult(statistic=10.679321140523161, pvalue=4.8125085652985653e-26)

The two-sided hypothesis test above yields a $p$-value of $4.81 \times 10^{-26}$, which is statistically significant evidence against the null hypothesis that tournament teams and non-tournament teams have the same mean average experience. We therefore reject the null hypothesis and conclude that the two means differ.

# Conclusions

Given that there is no overlap in the 95% confidence intervals for the mean height and mean average experience of tournament teams vs. non-tournament teams for all years combined, we can conclude that there is statistically significant evidence that tournament teams are, on average, taller and more experienced than teams that did not make the tournament. This overall finding is supported when looking at the individual years, as well, with one exception in the 2016 height category.
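
The no-overlap claim for the combined years can be checked mechanically from the interval endpoints printed earlier. This is a small sketch; `intervals_overlap` is a hypothetical helper name, and the tuples are copied from the notebook's printed output.

```python
def intervals_overlap(ci_a, ci_b):
    """Return True if two closed intervals (low, high) share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

# Interval endpoints as printed by the "all years combined" bootstrap cells.
height_tourney_ci = (77.15957446808511, 77.32127659574468)
height_nontourney_ci = (76.60473494595986, 76.68605249614)
exp_tourney_ci = (1.5704255319148934, 1.6436170212765957)
exp_nontourney_ci = (1.3595985589294903, 1.3963458569222853)

print(intervals_overlap(height_tourney_ci, height_nontourney_ci))  # False
print(intervals_overlap(exp_tourney_ci, exp_nontourney_ci))        # False
```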

Further, the hypothesis tests above provide statistically significant evidence that the mean height and mean average experience of tournament teams differ from those of non-tournament teams, and the confidence intervals show the direction of that difference: tournament teams are, on average, taller and more experienced.