# Introduction

The purpose of this section is to quickly check that there are no missing values and no strange outliers in the datasets listed under "Data Section 2" through "Data Section 4" and "Data Section 6". The following section will be used to perform a more thorough data analysis and visualization of the dataset. This kernel will also serve as a way of practicing data analysis and visualization.

Please leave any comments on how this kernel can be improved, whether through better visualizations, explanations, etc.

Thanks!

Also, please note that this is an ongoing notebook; any sections that are currently blank will be filled out in the near future.

# Table Of Contents
* [Import and load data](#import)
* [Initial Data Analysis](#init-data) 
    * [Analyzing Regular Season Detailed Results_2018 Preliminary Table](#data-analysis-reg-det)
    * [Analyzing NCAA Tourney Detailed Results Table](#data-analysis-ncaa-det)
    * [Analyzing Cities_2018 Preliminary Table](#data-analysis-cities)
    * [Analyzing Game Cities_2018 Preliminary Table](#data-analysis-gamecities)
    * [Analyzing Massey Ordinals Table](#data-analysis-mass-ord)
    * [Analyzing Team Coaches Table](#data-analysis-team-coaches)
    * [Analyzing Conferences Table](#data-analysis-conf)
    * [Analyzing Team Conferences Table](#data-analysis-team-conf)
    * [Analyzing Conference Tourney Games Table](#data-analysis-conf-tou-games)
    * [Analyzing Secondary Tourney Teams Table](#data-analysis-sec-tou-teams)
    * [Analyzing Secondary Tourney Compact Results Table](#data-analysis-sec-tou-com-res)
    * [Analyzing Team Spelling Table](#data-analysis-team-spell)
    * [Analyzing NCAA Tourney Slots Table](#data-analysis-ncaa-tou-slots)
    * [Analyzing NCAA Tourney Seed Round Slots Table](#data-analysis-ncaa-tou-seeds-round)
    
* [Conclusion](#conclusion)

# <a id='import'></a> Import and load data

In [104]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns # statistical plotting (heatmaps, box/violin plots) used by the helpers below
pd.options.display.max_rows = 999
pd.options.display.max_columns = 999
# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
print(os.listdir("../input"))
import matplotlib.pyplot as plt
%matplotlib inline
# Any results you write to the current directory are saved as output.

In [105]:
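# Load every CSV file in the input directory into a dict keyed by file name (without extension)
input_dir = "../input"
all_csv = {}
for fname in os.listdir(input_dir):
    # splitext also copes with entries that have no extension (e.g. sub-directories)
    name, ext = os.path.splitext(fname)
    if ext == ".csv":
        all_csv[name] = pd.read_csv(os.path.join(input_dir, fname), encoding = 'ISO-8859-1')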

print(all_csv.keys())

# <a id='init-data'></a> Initial Data Analysis

In [106]:
[i for i in all_csv.keys() if 'Prelim' in i]

The following section contains helper functions that I will use later in the kernel.

In [107]:
def summ(df):
    print(df.head())
    print(100*"*")
    print(df.info())
    print(100*"*")
    if(len(df.select_dtypes(include = ['O']).columns) == 0):
        print("No Objects in DataFrame")
    else:
        print(df.describe(include = ['O']))
    print(100*"*")
    if(len(df.select_dtypes(exclude = ['O']).columns) == 0):
        print("No Integers/Floats in DataFrame")
    else:
        print(df.describe(exclude =['O']))
    print(100*"*")
    if(np.sum(df.isnull().sum().values) == 0):
        print('No missing values')
    else:
        # Report every column that has at least one missing value
        print(df.isnull().sum()[df.isnull().sum().values > 0])
    return(None)

In [108]:
def heatmap(df, height_ratio = (.9, .03), hspace = .15, fig_w_h = (18,25)):
    grid_kws = {"height_ratios": height_ratio, "hspace": hspace}
    fig1, (ax1, cbar1) = plt.subplots(2, gridspec_kw=grid_kws)
    fig1.set_size_inches(fig_w_h)
    ax = sns.heatmap(df.select_dtypes(exclude = ['object']).corr(), \
                 ax=ax1, cbar_ax=cbar1, \
                 cbar_kws={"orientation": "horizontal"},vmin = -1, vmax = 1,\
                 linewidths = 1,annot = True,fmt='.2f',
                annot_kws = {"size": 11})
    cbar1.set_title("Correlation Between Features")
    ax.tick_params(labelsize=12)
    return(ax)
    #scale figure appropriately

In [109]:
def analyze_cont(df,col,x = 'Season',hue = "WLoc",type_plot = "box",wid = 11, hght = 8):
    if (type_plot == "hist"):
        temp_len = len(df[hue].unique())
        hue_val = df[hue].unique()
        #print(hue_val)
        for i in range(len(col)):
            fig,ax = plt.subplots(1,1,figsize = (wid,hght)) 
            #ax.append(plt.subplot2grid(shape=(nx,ny),rowspan = 1, colspan = 1, loc = (xv[i],yv[i])))
            for j in range(0,temp_len):
                sns.distplot(df.loc[df[hue]==hue_val[j],col[i]], \
                    ax = ax,label = hue_val[j], hist = False, kde = True)
            #ax.xaxis.set_tick_params(rotation=45)
            # Add the legend and show each figure inside the loop, one per feature
            ax.legend(loc = 0)
            plt.show()
    elif(type_plot == "box"):
        for i in range(len(col)):
            fig,ax = plt.subplots(1,1,figsize = (wid,hght)) 
            #ax.append(plt.subplot2grid(shape=(nx,ny),rowspan = 1, colspan = 1, loc = (xv[i],yv[i])))
            sns.boxplot(y = col[i], x = x, data = df, \
                       ax = ax, hue = hue, orient = 'v')
            ax.xaxis.set_tick_params(rotation=45)
            ax.legend(loc = 0)
            plt.show()
    elif(type_plot == "violin"):
        for i in range(len(col)):
            fig,ax = plt.subplots(1,1,figsize = (wid,hght)) 
            #ax.append(plt.subplot2grid(shape=(nx,ny),rowspan = 1, colspan = 1, loc = (xv[i],yv[i])))
            sns.violinplot(y = col[i], x = x, data = df, \
                       ax = ax, hue = hue, orient = 'v')
            ax.xaxis.set_tick_params(rotation=45)
            ax.legend(loc = 0)
            plt.show()
    else:
        print("Invalid Plot Type")
    return(None)

In [110]:
def rankCount_season(df, wid, hght, x_val, y_val, col):
    fig, ax = plt.subplots(13,2, figsize = (wid, hght))
    plt_range = [(i,j) for i in range(13) for j in range(2)]
    for k,m in zip(plt_range,range(1,26)):
        sns.barplot(data = df[df[col]==m].sort_values(by = ["rankCount"], ascending = False), 
                    x = x_val, y = y_val, ax = ax[k[0]][k[1]])
        ax[k[0]][k[1]].xaxis.label.set_size(15)
        ax[k[0]][k[1]].yaxis.label.set_size(15)
        #ax[k[0]][k[1]].set_xticklabels(ax[k[0]][k[1]].get_xticklabels(), fontsize=30)
        ax[k[0]][k[1]].set_yticklabels(ax[k[0]][k[1]].get_yticklabels(), fontsize=15)
        ax[k[0]][k[1]].set_title("Rank: " + str(m), size = 20)
    title ="AP POLL Season: "+str(df.Season.unique()[0])
    #print(title)
    plt.suptitle(title, fontsize = 30)
    plt.subplots_adjust(top=0.97, wspace = 0.35, hspace = 0.35)
   #plt.tight_layout(h_pad = 2)
    return(None)

## <a id='data-analysis-reg-det'></a> Analyzing Regular Season Detailed Results Table

In [111]:
summ(all_csv['RegularSeasonDetailedResults_Prelim2018'])

No missing values in any column or outliers.

In [112]:
heatmap(all_csv['RegularSeasonDetailedResults_Prelim2018'])

It is very interesting to note that certain intuitions about basketball are apparent here. For example, it makes sense that as a team's PF (personal fouls) increase, the number of free throws made and attempted by the opposing team should go up. This is clearly indicated in the heatmap above. Another example is TO (turnovers) and Stl (steals), which are also highly correlated with each other.
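
As a rough illustration of the kind of relationship the heatmap picks up, here is a minimal sketch on synthetic data. The column names `PF`, `OppFTA` and `OppFTM` and all the numbers are made up for the example (they are not taken from the dataset); the assumption is simply that fouls committed by one side lead to free throws for the other.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_games = 500

# Hypothetical data: fouls committed by one team and free throws awarded to its opponent.
fouls = rng.poisson(lam=18, size=n_games)
opp_fta = fouls + rng.poisson(lam=4, size=n_games)  # more fouls -> more attempts
opp_ftm = rng.binomial(opp_fta, 0.7)                 # roughly 70% of attempts are made

toy = pd.DataFrame({"PF": fouls, "OppFTA": opp_fta, "OppFTM": opp_ftm})

# Pearson correlation matrix, the same statistic that heatmap() displays
corr = toy.corr()
print(corr.round(2))
```

The off-diagonal entries come out strongly positive here by construction, which is the same pattern to look for in the real heatmap.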

In [113]:
all_csv['RegularSeasonDetailedResults_Prelim2018']['WLoc'].unique().tolist()

In [114]:
all_csv['RegularSeasonDetailedResults_Prelim2018'].columns

In [115]:
all_csv['RegularSeasonDetailedResults_Prelim2018'][[i for i in \
    all_csv['RegularSeasonDetailedResults_Prelim2018'].columns if i not in \
    ['Season','DayNum','WTeamID','LTeamID']]].head()

In [116]:
col = all_csv['RegularSeasonDetailedResults_Prelim2018'][[i for i \
    in all_csv['RegularSeasonDetailedResults_Prelim2018'].columns if i not in \
    ['Season','DayNum','WTeamID','LTeamID','WLoc']]].columns

In [117]:
analyze_cont(df = all_csv['RegularSeasonDetailedResults_Prelim2018'],col = col,\
             hue = 'WLoc',type_plot = 'box',x = 'Season',wid = 10, hght = 8)

From the boxplots above, we can see that the winning team location does not do much to partition the different features in this dataframe.

In [118]:
analyze_cont(df = all_csv['RegularSeasonDetailedResults_Prelim2018'],col = col,\
             hue = 'WLoc',type_plot = 'hist',x = 'Season',wid = 10, hght = 8)

We can ignore the following features: Season, DayNum, WTeamID, LTeamID, and analyze the remaining features. Most of the features displayed above appear to follow an approximately normal distribution. A few of them are left skewed and quite bumpy because the cardinality of the feature is small. When the cardinality of a feature is small, a histogram has regions where the bin count is zero, and the corresponding KDE plot shows sudden dips in those regions. The distribution of NumOT makes sense, since overtime counts can only be whole numbers, which is why we see spikes at 1, 2, 3, etc.
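
The effect of low cardinality on binning can be checked with a small sketch. The overtime counts below are synthetic and the probabilities are an assumption chosen only to mimic a feature that is mostly zero.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical overtime counts: integers only, heavily concentrated at 0.
num_ot = rng.choice([0, 1, 2, 3], size=1000, p=[0.90, 0.07, 0.02, 0.01])

# Use many more bins than there are distinct values.
counts, edges = np.histogram(num_ot, bins=30)
empty_bins = int((counts == 0).sum())

print("distinct values:", np.unique(num_ot).size)
print("empty bins out of 30:", empty_bins)
```

With at most four distinct values, almost all of the 30 bins stay empty, which is what produces the dips between the spikes in the density plots.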



As we can also see from the boxplots above, the features do not vary much across seasons, and we can again see that the location of the winning team does not really partition the data.

Looking at the individual features is great, but it would be better to look at the winning and losing features side by side to compare them.

In [119]:
all_csv['RegularSeasonDetailedResults_Prelim2018'].columns

In [120]:
Losing_Team = all_csv['RegularSeasonDetailedResults_Prelim2018'][[i for i in \
                all_csv['RegularSeasonDetailedResults_Prelim2018'].columns \
                                                   if i[0]=='L' or i in ['Season','WLoc']]]

In [121]:
Losing_Team.head()

In [122]:
for name_i in Losing_Team.columns.tolist()[1:]:
    Losing_Team = Losing_Team.rename(index = str, columns = \
                  {name_i:name_i[1:]})
Losing_Team.head()

In [123]:
Losing_Team["Outcome"] = "Losing Team"

In [124]:
Losing_Team.head()

In [125]:
Winning_Team = all_csv['RegularSeasonDetailedResults_Prelim2018'][[i for i in \
                all_csv['RegularSeasonDetailedResults_Prelim2018'].columns \
                                                   if i[0]=='W' or i in ['Season']]]

In [126]:
Winning_Team.head()

In [127]:
for name_i in Winning_Team.columns.tolist()[1:]:
    Winning_Team = Winning_Team.rename(index = str, columns = \
                  {name_i:name_i[1:]})
Winning_Team.head()

In [128]:
Winning_Team["Outcome"] = "Winning Team"

In [129]:
Winning_Team.head()

In [130]:
Winning_Losing_Team = pd.concat([Winning_Team, Losing_Team], axis = 0)

In [131]:
Winning_Losing_Team.head()
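
The split, rename and concatenate steps above can also be condensed into one helper. This is only a sketch: `to_long_format` is a hypothetical name, and the toy frame uses a handful of columns with invented values. It keeps the same convention as the cells above, where `WLoc` is carried over as `Loc` for both the winning and the losing rows.

```python
import pandas as pd

def to_long_format(results):
    """Stack winner and loser statistics into one frame with an Outcome column."""
    frames = []
    for prefix, outcome in (("W", "Winning Team"), ("L", "Losing Team")):
        # Columns for this side, plus Season and the winner's location
        cols = [c for c in results.columns
                if c.startswith(prefix) or c in ("Season", "WLoc")]
        part = results[cols].copy()
        # Strip the leading W/L from every column except Season (WLoc becomes Loc)
        part = part.rename(columns={c: c[1:] for c in cols if c != "Season"})
        part["Outcome"] = outcome
        frames.append(part)
    return pd.concat(frames, axis=0, ignore_index=True)

# Hypothetical two-game example
demo = pd.DataFrame({
    "Season": [2017, 2018],
    "WTeamID": [1101, 1102],
    "WScore": [80, 75],
    "LTeamID": [1201, 1202],
    "LScore": [70, 60],
    "WLoc": ["H", "N"],
    "WFGM": [30, 28],
    "LFGM": [25, 22],
})
long_demo = to_long_format(demo)
print(long_demo)
```

Renaming all columns in a single `rename` call avoids the per-column loop, and `ignore_index=True` gives the combined frame a clean index.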

In [132]:
ax = sns.factorplot(x="Loc", y='FGM',
                   hue="Outcome", col="Season", col_wrap = 4,\
                    data=Winning_Losing_Team, \
                    orient="v", palette="Set3", \
                    kind="violin", dodge=True, size = 4, aspect = 0.8)

What the above plot shows again is that, on average, the location of the winning team does not really partition the data into different sets. So whether it is field goals made or score, we should expect the values to be roughly the same on average within each season.

In [133]:
for name_i in [i for i in Winning_Losing_Team.columns.tolist() if i not in \
               ['Season','Loc','Outcome','TeamID']]:
    ax = sns.factorplot(x="Loc", y=name_i,
                   hue="Outcome", col="Season", col_wrap = 4,\
                    data=Winning_Losing_Team, \
                    orient="v", palette="Set3", \
                    kind="box", dodge=True)
    #g[name_i]=ax

## <a id='data-analysis-ncaa-det'></a> Analyzing NCAA Tourney Detailed Results Table

In [134]:
summ(all_csv['NCAATourneyDetailedResults'])

No missing values or outliers. We should expect similar visualizations to the ones seen in the Regular Season Detailed Results section.

In [135]:
heatmap(all_csv['NCAATourneyDetailedResults'])

In [136]:
tour_losing_team =all_csv['NCAATourneyDetailedResults'][[i for i in \
                all_csv['NCAATourneyDetailedResults'].columns \
               if i[0]=='L' or i in ['Season','WLoc']]]
for name_i in tour_losing_team.columns.tolist()[1:]:
    tour_losing_team = tour_losing_team.rename(index = str, columns = \
                  {name_i:name_i[1:]})
tour_losing_team["Outcome"] = "Losing Team"
tour_losing_team.head()

In [137]:
tour_winning_team =all_csv['NCAATourneyDetailedResults'][[i for i in \
                all_csv['NCAATourneyDetailedResults'].columns \
               if i[0]=='W' or i in ['Season','WLoc']]]
for name_i in tour_winning_team.columns.tolist()[1:]:
    tour_winning_team = tour_winning_team.rename(index = str, columns = \
                  {name_i:name_i[1:]})
tour_winning_team["Outcome"] = "Winning Team"
tour_winning_team.head()

In [138]:
tour_winning_losing_team = pd.concat([tour_winning_team,tour_losing_team], axis = 0)
tour_winning_losing_team.head()

In [139]:
ax = sns.factorplot(x="Loc", y='FGM',
                   hue="Outcome", col="Season", col_wrap = 4,\
                    data=tour_winning_losing_team, \
                    orient="v", palette="Set3", \
                    kind="bar", dodge=True, size = 4, aspect = 0.8)

Just as expected, we see that on average the different statistics are roughly constant over the available seasons.

## <a id='data-analysis-cities'></a> Analyzing Cities Table

In [140]:
summ(all_csv['Cities_Prelim2018'])

One missing value for State. Since there are 413 observations and only 1 is missing, we can take a look at the missing observation and then drop it.

In [141]:
all_csv['Cities_Prelim2018'][all_csv['Cities_Prelim2018'].State.isnull()]

Rather than dropping the observation, imputation can be used, since the missing location is in Mexico, California, Texas or Puerto Rico. A quick Google search indicates that the missing value is Puerto Rico.

In [142]:
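# Assign the filled column back instead of calling fillna(inplace=True) on a chained selection
all_csv['Cities_Prelim2018']['State'] = all_csv['Cities_Prelim2018']['State'].fillna('PR')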

In [143]:
all_csv['Cities_Prelim2018'].isnull().sum()

In [144]:
all_csv['Cities_Prelim2018'].State.value_counts().index

In [145]:
fig, ax = plt.subplots(1,1, figsize = (15,5))
sns.countplot(x = all_csv['Cities_Prelim2018']['State'], ax = ax,
             order = all_csv['Cities_Prelim2018'].State.value_counts().index)
ax.xaxis.set_tick_params(rotation = 90)

In [146]:
all_csv['Cities_Prelim2018'].State.value_counts()[all_csv['Cities_Prelim2018'].State.value_counts().values>15]

In [147]:
100*all_csv['Cities_Prelim2018'].State.value_counts()[all_csv['Cities_Prelim2018'].State.value_counts().values>15].sum()/\
    all_csv['Cities_Prelim2018'].State.value_counts().sum()

More than 25% of the cities are in New York (Region - North East), Texas (Region - West South Central), California (Region - Pacific), Florida (Region - South Atlantic), and North Carolina (Region - South Atlantic).

The table by itself does not tell much, but once it is merged with the Game Cities table, more information can be inferred from the data.

## <a id='data-analysis-gamecities'></a> Analyzing Game Cities Table

In [148]:
summ(all_csv['GameCities_Prelim2018'])

In [149]:
ax = sns.barplot(x = "CRType", y = "Percentage",\
            data = pd.DataFrame(100*all_csv['GameCities_Prelim2018'].CRType.value_counts()/\
all_csv['GameCities_Prelim2018'].CRType.value_counts().sum()).reset_index().\
           rename(index = str, columns = {"index":"CRType","CRType":"Percentage"}))
pos = range(len(all_csv['GameCities_Prelim2018'].CRType.value_counts().values))
max_val = [np.around(i,2) for i in (100*all_csv['GameCities_Prelim2018'].CRType.value_counts()/\
all_csv['GameCities_Prelim2018'].CRType.value_counts().sum()).tolist()]
for tick,label in zip(pos,ax.get_xticklabels()):
    ax.text(pos[tick], max_val[tick] + 0.03, str(max_val[tick])+"%", \
    horizontalalignment='center', color='black', weight='semibold')
plt.show()

As we can see, most of the game cities in this table (~98%) correspond to regular season games.
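
The percentage breakdown can also be computed directly with `value_counts(normalize=True)`. A minimal sketch on a synthetic column follows; the labels and counts are assumptions for illustration, not the real distribution.

```python
import pandas as pd

# Hypothetical CRType column; the three labels and their counts are assumed.
cr_type = pd.Series(["Regular"] * 96 + ["NCAA"] * 3 + ["Secondary"] * 1, name="CRType")

# normalize=True returns fractions, so multiply by 100 to get percentages
pct = cr_type.value_counts(normalize=True).mul(100).round(2)
print(pct)
```

This gives the same numbers as dividing the counts by their sum, without repeating the `value_counts()` call.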

In [150]:
cities = all_csv['Cities_Prelim2018'].merge(all_csv['GameCities_Prelim2018'],on = "CityID")

In [151]:
summ(cities)

In [152]:
#fig, ax = plt.subplots(1,1)
#g = sns.factorplot("CRType", col = 'State',data = cities,\
#              kind = "count", col_wrap = 4)
cities.groupby(['State','CRType'])['CRType'].count().apply(lambda x: 100*x/cities.shape[0]).\
sort_values(ascending = False).head(10)

In [153]:
sns.factorplot(data = cities.groupby(['State','CRType'])['City'].count().apply(lambda x: \
100*x/cities.shape[0]).reset_index(level = "CRType").reset_index(), col = 'State', \
col_wrap = 4, x = "CRType", y = "City", kind = "bar")

As expected, NY, CA, TX, NC and FL are in the top 7 states where games have been played since 2010. This is expected since these five states account for ~27% of the cities listed in the Cities.csv table. PA and OH are also in the top 7. Although they only have 12 game cities each, both states seem to be popular locations for games.

In [154]:
cities[cities.CRType =='NCAA'].State.value_counts().apply(lambda x: 100 * x/\
cities[cities.CRType =='NCAA'].shape[0]).head(10)

In [155]:
cities[cities.CRType =='NCAA'].State.value_counts().apply(lambda x: 100 * x/\
cities[cities.CRType =='NCAA'].shape[0]).reset_index().head(10)

In [156]:
fig1, ax1 = plt.subplots(1,1, figsize = (15,5))
sns.barplot(data =cities[cities.CRType =='NCAA'].State.value_counts().apply(lambda x: 100 * x/\
cities[cities.CRType =='NCAA'].shape[0]).reset_index(), x = "index", y = "State",
order = cities[cities.CRType =='NCAA'].State.value_counts().sort_values(ascending = False).index,\
ax = ax1)
ax1.set_xlabel('STATES')
ax1.set_ylabel('Percentage (%)')
plt.show()

It is also interesting to see that NY, TX, NC, FL and CA are among the states where a high percentage of NCAA tournament games are played. The tournament is usually played on neutral ground, so these states must be popular destinations.

## <a id='data-analysis-mass-ord'></a> Analyzing Massey Ordinals Table

In [157]:
summ(all_csv['MasseyOrdinals_Prelim2018'])

Rather than using TeamID, I will merge this table with the Teams.csv table in order to use TeamName instead.

In [158]:
masseyOrdinals = all_csv['MasseyOrdinals_Prelim2018'].merge(all_csv['Teams'][['TeamID','TeamName']], on = 'TeamID')

Time to analyze the rankings and see what share of the observations corresponds to each ranking system.

In [159]:
100*masseyOrdinals.SystemName.value_counts().sort_values(ascending = False).head(10)/\
masseyOrdinals.SystemName.value_counts().sum()

In [160]:
# Best (min) and worst (max) rank reported by each system in each season
masseyOrdinals.groupby(['Season','SystemName'])['OrdinalRank'].agg(['min', 'max']).head(10)

Upon evaluating the masseyOrdinals table, the AP rank is what I will examine in closer detail, since it deals with only the top 25 teams each season. I could have also selected only the top 25 teams from each ranking, but that can be done later or in a different kernel.

In [161]:
masseyOrdinals_AP = masseyOrdinals[masseyOrdinals.SystemName == 'AP']
masseyOrdinals_AP.head()
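
For reference, the alternative mentioned above (keeping the top 25 of every ranking system rather than only the AP poll) could look roughly like this. The frame below is a made-up slice with the same column names as masseyOrdinals; the system names and ranks are placeholders.

```python
import pandas as pd

# Hypothetical slice of a rankings table with the same column names as masseyOrdinals.
rankings = pd.DataFrame({
    "Season": [2018] * 6,
    "SystemName": ["AP", "AP", "POM", "POM", "SAG", "SAG"],
    "TeamName": ["Duke", "Kansas", "Duke", "Drake", "Kansas", "Drake"],
    "OrdinalRank": [3, 9, 2, 140, 7, 151],
})

# Keep only observations where a system ranked the team in its top 25
top25 = rankings[rankings["OrdinalRank"] <= 25]
print(top25)
```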

Now let's look at the distribution of team rankings from 2003 to 2018.

In [162]:
fig, ax = plt.subplots(8,2)
fig.set_size_inches(30,150)
wid_hei = [(j,k) for j in range(8) for k in range(2)] # one (row, col) pair per axis in the 8x2 grid
for i,j in zip(np.sort(masseyOrdinals_AP.Season.unique()),wid_hei):
    sns.boxplot(data = masseyOrdinals_AP[masseyOrdinals_AP.Season == i], \
                y = 'TeamName', x = 'OrdinalRank', ax = ax[j[0]][j[1]])
    ax[j[0]][j[1]].yaxis.label.set_size(20)
    ax[j[0]][j[1]].xaxis.label.set_size(20)
    ax[j[0]][j[1]].set_ylabel('Team Name')
    ax[j[0]][j[1]].set_yticklabels(ax[j[0]][j[1]].get_yticklabels(), fontsize=20)
    ax[j[0]][j[1]].set_title("Season: " + str(i), size = 30)
    #ax[j[0]][j[1]].set_xticklabels(ax[j[0]][j[1]].get_xticklabels(), fontsize=20)
plt.tight_layout()

What stands out to me are teams whose rankings have a narrow spread near the top, like Duke in 2007 or Arizona in 2014. This shows that those teams were extremely dominant in their respective years and stayed near the top of the rankings for the whole season. What we can now do is see how many times each team ranked in the AP poll received a certain rank, for example how many times Xavier was ranked number 1 since 2003.

In [163]:
masseyOrdinals_AP_rankcount = masseyOrdinals_AP.groupby(\
    ['Season','TeamName','OrdinalRank'])['SystemName'].count()
masseyOrdinals_AP_rankcount.head()

In [164]:
masseyOrdinals_AP_rankcount = masseyOrdinals_AP_rankcount.reset_index(level = \
    ["TeamName","OrdinalRank","Season"]).rename(index=str, columns={"SystemName": "rankCount"})

In [165]:
rankCount_season(df = masseyOrdinals_AP_rankcount[masseyOrdinals_AP_rankcount.Season == 2010],
                wid = 20, hght = 60, y_val = "TeamName", x_val = "rankCount", col = "OrdinalRank")

With the rankCount_season function, I can look at how teams were ranked in the AP poll from 2003 to 2016. All I need to do is change the year passed to the function to look at a different season.
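
The narrow rank spread noted in the boxplots earlier can also be put into numbers. The following is a sketch on invented weekly ranks for two placeholder teams; the idea is to summarize each team-season by its best rank, worst rank and the range between them.

```python
import pandas as pd

# Hypothetical weekly AP ranks for two placeholder teams in one season.
ap = pd.DataFrame({
    "Season": [2010] * 8,
    "TeamName": ["Team A"] * 4 + ["Team B"] * 4,
    "OrdinalRank": [1, 1, 2, 1, 5, 18, 9, 24],
})

# Best rank, worst rank and median rank per team and season
spread = (
    ap.groupby(["Season", "TeamName"])["OrdinalRank"]
      .agg(["min", "max", "median"])
)
spread["range"] = spread["max"] - spread["min"]

# Teams with a low median and a small range were consistently near the top
print(spread.sort_values(["median", "range"]))
```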

Next, I'll look at the most dominant teams since 2003. I'll be focusing on the teams in the top 6.

In [166]:
plt_val = [(i,j) for i in range(3) for j in range(2)]
fig, ax = plt.subplots(3,2, figsize = (20,35))
temp = masseyOrdinals_AP.groupby(\
        ['OrdinalRank','TeamName'])['SystemName'].count().reset_index(level = \
        ['TeamName','OrdinalRank'])
#print(temp.head())
for i,j in zip(plt_val, range(1,7)):
    sns.barplot(data = temp[temp.OrdinalRank == j].sort_values(by = 'SystemName', ascending = False), \
                y = "TeamName", x = "SystemName", \
                ax = ax[i[0]][i[1]])
    ax[i[0]][i[1]].yaxis.label.set_size(15)
    ax[i[0]][i[1]].xaxis.label.set_size(15)
    ax[i[0]][i[1]].set_xlabel('AP Rank: '+str(j)+" count")
    ax[i[0]][i[1]].set_yticklabels(ax[i[0]][i[1]].get_yticklabels(), fontsize=12)
    ax[i[0]][i[1]].set_title("Rank: "+str(j), size = 20)
plt.suptitle("AP Total Rank Since 2003", fontsize=30)
plt.subplots_adjust(wspace = 0.3, hspace = 0.125, top=0.965)

It is very interesting to note that it is a consistent set of teams that dominate the top 6.

## <a id='data-analysis-team-coaches'></a> Analyzing Team Coaches Table

In [167]:
summ(all_csv['TeamCoaches_Prelim2018'])

We can look to see who the longest-serving coach is for each team.

In [178]:
teamCoaches = all_csv['TeamCoaches_Prelim2018'].merge(all_csv['Teams'][['TeamID','TeamName']])
teamCoaches.head()

In [184]:
teamCoaches_count = teamCoaches.groupby(['TeamName','CoachName'])['Season'].count()
teamCoaches_count.sort_values(ascending = False).head(10)
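
The count above lists the longest tenures overall. To get one longest-serving coach per team, one option is to count distinct seasons per coach and then take the maximum within each team. This is a sketch on placeholder data with the same column names as teamCoaches; the team and coach names are invented.

```python
import pandas as pd

# Hypothetical coaching history for two placeholder teams.
coaches = pd.DataFrame({
    "Season": [2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018],
    "TeamName": ["Team A"] * 4 + ["Team B"] * 4,
    "CoachName": ["coach_x", "coach_x", "coach_x", "coach_y",
                  "coach_z", "coach_z", "coach_z", "coach_z"],
})

# Number of distinct seasons each coach spent with each team
seasons = (
    coaches.groupby(["TeamName", "CoachName"])["Season"]
           .nunique()
           .rename("n_seasons")
           .reset_index()
)

# Keep the row with the most seasons within each team
longest = seasons.loc[seasons.groupby("TeamName")["n_seasons"].idxmax()]
print(longest)
```

Using `nunique` instead of `count` avoids double counting if a coach appears more than once in the same season.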

In [None]:
all_csv['TeamCoaches_Prelim2018'].groupby(["Season"])

## <a id='data-analysis-conf'></a> Analyzing Conferences Table

In [168]:
summ(all_csv['Conferences'])

## <a id='data-analysis-team-conf'></a> Analyzing Team Conferences Table

In [169]:
summ(all_csv['TeamConferences'])

## <a id='data-analysis-conf-tou-games'></a> Analyzing Conference Tourney Games Table

In [170]:
summ(all_csv['ConferenceTourneyGames'])

## <a id='data-analysis-sec-tou-teams'></a> Analyzing Secondary Tourney Teams Table

In [171]:
summ(all_csv['SecondaryTourneyTeams'])

## <a id='data-analysis-sec-tou-com-res'></a> Analyzing Secondary Tourney Compact Results Table

In [172]:
summ(all_csv['SecondaryTourneyCompactResults'])

## <a id='data-analysis-team-spell'></a> Analyzing Team Spelling Table

In [173]:
summ(all_csv['TeamSpellings'])

## <a id='data-analysis-ncaa-tou-slots'></a> Analyzing NCAA Tourney Slots Table

In [174]:
summ(all_csv['NCAATourneySlots'])

## <a id='data-analysis-ncaa-tou-seeds-round'></a> Analyzing NCAA Tourney Seed Round Slots Table

In [175]:
summ(all_csv['NCAATourneySeedRoundSlots'])

# <a id='conclusion'></a> Conclusion