# Descriptive Statistics
# 2020 March Madness
In this notebook I explore the 2020 Men's and Women's NCAA basketball data. Hopefully you find the analysis and code helpful. Feel free to use any of the helper functions in your code but please reference this as the original source.

![](https://upload.wikimedia.org/wikipedia/en/thumb/2/28/March_Madness_logo.svg/440px-March_Madness_logo.svg.png)

# Import Libraries


In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from matplotlib.patches import Circle, Rectangle, Arc
import seaborn as sns
plt.style.use('seaborn-dark-palette')
mypal = plt.rcParams['axes.prop_cycle'].by_key()['color'] # Grab the color pal
import os
import gc

MENS_DIR = '../input/google-cloud-ncaa-march-madness-2020-division-1-mens-tournament'
WOMENS_DIR = '../input/google-cloud-ncaa-march-madness-2020-division-1-womens-tournament'

## Loss Metric & Sample Submission
Log Loss is the metric we will be evaluated on for the tournament prediction challenge. This metric punishes predictions that are overly confident and wrong much more heavily than cautious ones.

In [None]:
def logloss(true_label, predicted, eps=1e-15):
    p = np.clip(predicted, eps, 1 - eps)
    if true_label == 1:
        return -np.log(p)
    return -np.log(1 - p)

In [None]:
print(f'Confident Wrong Prediction: \t\t {logloss(1, 0.01):0.4f}')
print(f'Confident Correct Prediction: \t\t {logloss(0, 0.01):0.4f}')
print(f'Non-Confident Wrong Prediction: \t {logloss(1, 0.49):0.4f}')
print(f'Non-Confident Correct Prediction: \t {logloss(0, 0.49):0.4f}')
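
The per-game function above scores one prediction at a time. As a minimal sketch, assuming the final score is the mean log loss over all scored games, the same formula can be applied to whole arrays. The name `mean_logloss` and the toy numbers are illustrative only.

```python
import numpy as np

def mean_logloss(y_true, y_pred, eps=1e-15):
    """Average binary log loss over arrays of labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip so that log(0) never occurs for predictions of exactly 0 or 1
    p = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

# Hypothetical toy example: three games, the last one is a confident miss
print(f'{mean_logloss([1, 0, 1], [0.9, 0.2, 0.01]):0.4f}')
```

A single confident miss dominates the average, which is why hedging toward 0.5 on uncertain games is usually safer under this metric.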

Your submission will have a prediction for every possible combination of tournament teams. 
- In Stage 1 (not final) your score is based on the 2015-2019 tournaments. Because those results are already known, it is possible to cheat and get a perfect score, but don't do that.
- In Stage 2 you will be graded on the outcomes of the yet-to-be-played 2020 tournament.
- `ID` is in the format SSSS_XXXX_YYYY, where SSSS is the four digit season number, XXXX is the four-digit TeamID of the lower-ID team, and YYYY is the four-digit TeamID of the higher-ID team. Read more here: https://www.kaggle.com/c/march-madness-analytics-2020/data
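
To make the `ID` format concrete, here is a small sketch that builds and parses IDs of the form SSSS_XXXX_YYYY. The helper names and the TeamIDs used in the example are hypothetical and not part of the competition files.

```python
def make_submission_id(season, team_a, team_b):
    """Build 'SSSS_XXXX_YYYY' with the lower TeamID first."""
    low, high = sorted((team_a, team_b))
    return f'{season}_{low}_{high}'

def parse_submission_id(game_id):
    """Split 'SSSS_XXXX_YYYY' into (season, lower_team_id, higher_team_id)."""
    season, team_low, team_high = (int(part) for part in game_id.split('_'))
    if team_low >= team_high:
        raise ValueError(f'Expected the lower TeamID first, got {game_id!r}')
    return season, team_low, team_high

# Hypothetical TeamIDs, used only for illustration
game_id = make_submission_id(2019, 3163, 3124)
print(game_id)
print(parse_submission_id(game_id))
```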

In [None]:
Mss = pd.read_csv(f'{MENS_DIR}/MSampleSubmissionStage1_2020.csv')
Wss = pd.read_csv(f'{WOMENS_DIR}/WSampleSubmissionStage1_2020.csv')
Mss.head()

# Team Data
**MTeams & WTeams**

Team name and Team ID, first and last D1 Season. Sorting by the `FirstD1Season` column we can see some of the newest teams in D1 basketball. Welcome to D1 Merrimack! Cool mascot.
![](https://media0.giphy.com/media/Q5G8oHPpDGLb0aaayD/giphy.gif)

# Womens teams

In [None]:
# Womens' data does not contain years joined :(
WTeams = pd.read_csv(f'{WOMENS_DIR}/WDataFiles_Stage1/WTeams.csv')
WTeams.head()

In [None]:
len(WTeams) # number of teams in total

# Seasons Data
## WSeasons
These files identify the different seasons included in the historical data, along with certain season-level properties.


In [None]:
WSeasons = pd.read_csv(f'{WOMENS_DIR}/WDataFiles_Stage1/WSeasons.csv')
WSeasons.head()

# Day Zero : first day of the season
# Regions = to identify the four regions

In [None]:
WSeasons

# Tourney Seed Data

## WNCAATourneySeeds for the NCAA Tournament : March Madness

This file identifies the seeds for all teams in each NCAA® tournament, for all seasons of historical data.

In [None]:
WNCAATourneySeeds = pd.read_csv(f'{WOMENS_DIR}/WDataFiles_Stage1/WNCAATourneySeeds.csv')

In [None]:
WNCAATourneySeeds.head()

In [None]:
# lets get the seeds for 2019
# teams selected for the March Madness
march_2019 = WNCAATourneySeeds[WNCAATourneySeeds['Season'] == 2019]
march_2019

In [None]:
# let's join this with the teams data to see some of the past matchups

teams = WNCAATourneySeeds.merge(WTeams, validate='many_to_one')

In [None]:
teams

In [None]:
len(teams['TeamID'].unique()) # teams selected for the NCAA

In [None]:
count = teams.groupby('TeamName').count() # to see old and pretty young teams
count = count.sort_values('TeamID', ascending = False)
old_teams = count[count['Season']>10]

In [None]:
plt.figure(figsize = (40,39))
plt.barh(count.index[:30], count['TeamID'][:30])

# Regular Season Results
## WRegularSeasonCompactResults

This file identifies the game-by-game regular season results for all seasons of historical data.

In [None]:
WRegularSeasonCompactResults = pd.read_csv(f'{WOMENS_DIR}/WDataFiles_Stage1/WRegularSeasonCompactResults.csv')

In [None]:
# We have the team that won, the team that lost, and the score.
WRegularSeasonCompactResults.head(5)

We can join our regular season results on the team names to more clearly identify the games.

In [None]:
# Lets Add the winning and losing team names to the results

WRegularSeasonCompactResults = \
    WRegularSeasonCompactResults \
    .merge(WTeams[['TeamName', 'TeamID']],
           left_on='WTeamID',
           right_on='TeamID',
           validate='many_to_one') \
    .drop('TeamID', axis=1) \
    .rename(columns={'TeamName': 'WTeamName'}) \
    .merge(WTeams[['TeamName', 'TeamID']],
           left_on='LTeamID',
           right_on='TeamID') \
    .drop('TeamID', axis=1) \
    .rename(columns={'TeamName': 'LTeamName'})

In [None]:
WRegularSeasonCompactResults

In [None]:
results_post_2015 = WRegularSeasonCompactResults[WRegularSeasonCompactResults['Season']>2014]

In [None]:
victories = WRegularSeasonCompactResults.groupby(['Season', 'WTeamName']).count()

In [None]:
victories =victories.reset_index()

# Visualize number of wins per team Since 2015

In [None]:
for i in [2015,2016,2017,2018,2019]: 
    plt.style.use('fivethirtyeight')
    data =  victories[victories['Season']==i] 
    data = data.sort_values('DayNum', ascending = False)
    plt.figure(figsize = (15,12))
    a = 'Season '+str(i)
    plt.title(a)
    plt.barh(data['WTeamName'][:20], data['DayNum'][:20])

In [None]:
WRegularSeasonCompactResults.head()

In [None]:
# score difference 
WRegularSeasonCompactResults['Score_Diff'] = WRegularSeasonCompactResults['WScore'] - WRegularSeasonCompactResults['LScore']

In [None]:
# .copy() so that a column can be added later without a SettingWithCopyWarning
results_2009 = WRegularSeasonCompactResults[WRegularSeasonCompactResults['Season']>2009].copy()

# Most Winning teams in general

In [None]:
plt.style.use('fivethirtyeight')
WRegularSeasonCompactResults['counter'] = 1
WRegularSeasonCompactResults.groupby('WTeamName')['counter'] \
    .count() \
    .sort_values() \
    .tail(20) \
    .plot(kind='barh',
          title='Most Winning (Regular Season) Womens Teams',
          figsize=(15, 8),
          xlim=(400, 680),
          color=mypal[0])
plt.show()


In [None]:
# after 2009

plt.style.use('fivethirtyeight')
results_2009['counter'] = 1
results_2009.groupby('WTeamName')['counter'] \
    .count() \
    .sort_values() \
    .tail(20) \
    .plot(kind='barh',
          title='Most Winning (Regular Season) Teams after 2009',
          figsize=(15, 8),
          xlim=(10, 350),
          color=mypal[1])
plt.show()


In [None]:
teams_2019 = WRegularSeasonCompactResults[WRegularSeasonCompactResults['Season']==2019]

In [None]:
len(teams_2019['WTeamID'].unique())

# Results NCAA Tournament 

In [None]:
WRegularTourneyCompactResults = pd.read_csv(f'{WOMENS_DIR}/WDataFiles_Stage1/WNCAATourneyCompactResults.csv')

In [None]:
WRegularTourneyCompactResults = \
    WRegularTourneyCompactResults \
    .merge(WTeams[['TeamName', 'TeamID']],
           left_on='WTeamID',
           right_on='TeamID',
           validate='many_to_one') \
    .drop('TeamID', axis=1) \
    .rename(columns={'TeamName': 'WTeamName'}) \
    .merge(WTeams[['TeamName', 'TeamID']],
           left_on='LTeamID',
           right_on='TeamID') \
    .drop('TeamID', axis=1) \
    .rename(columns={'TeamName': 'LTeamName'})

In [None]:
WRegularTourneyCompactResults['Score_Diff'] = WRegularTourneyCompactResults['WScore'] - WRegularTourneyCompactResults['LScore']

In [None]:
WRegularTourneyCompactResults

In [None]:
WNCAATourneySeeds

In [None]:
WRegularTourneyCompactResults = \
    WRegularTourneyCompactResults \
    .merge(WNCAATourneySeeds[['Seed', 'TeamID', 'Season']],
           left_on=['WTeamID', 'Season'],
           right_on=['TeamID','Season'],
           validate='many_to_one') \
    .drop('TeamID', axis=1) \
    .rename(columns={'Seed': 'WSeed'}) \
    .merge(WNCAATourneySeeds[['Seed', 'TeamID', 'Season']],
           left_on=['LTeamID', 'Season'],
           right_on=['TeamID','Season'],
           validate='many_to_one') \
    .drop('TeamID', axis=1) \
    .rename(columns={'Seed': 'LSeed'})

In [None]:
WRegularTourneyCompactResults

In [None]:
WRegularTourneyCompactResults = WRegularTourneyCompactResults.sort_values(['Season', 'DayNum'])

In [None]:
WRegularTourneyCompactResults

> It could be interesting to check the seeds of every team and the "unexpected" results (Cinderella) 

In [None]:
WRegularTourneyCompactResults= WRegularTourneyCompactResults.reset_index()

In [None]:
WRegularTourneyCompactResults['index']=WRegularTourneyCompactResults.index

# Add Rounds to Tourney Data

In [None]:
# DayNum -> round lookup for each scheduling era (see the day tables in the
# comment block further down). Any tournament day not listed for an era falls
# through to the national final (round 6), as in an if/elif chain ending in else.
ROUND_BY_DAY = {
    'before_2003': {137: 1, 138: 1, 139: 2, 140: 2, 145: 3, 147: 4, 151: 5},
    '2003_2014': {138: 1, 139: 1, 140: 2, 141: 2, 145: 3, 146: 3, 147: 4, 148: 4, 153: 5},
    '2015_2016': {137: 1, 138: 1, 139: 2, 140: 2, 144: 3, 145: 3, 146: 4, 147: 4, 153: 5},
    '2017_on': {137: 1, 138: 1, 139: 2, 140: 2, 144: 3, 145: 3, 146: 4, 147: 4, 151: 5},
}

def get_round(season, daynum):
    """Return the tournament round (1-6) for a game played on `daynum` of `season`."""
    if season < 2003:
        era = 'before_2003'
    elif season < 2015:
        era = '2003_2014'
    elif season < 2017:
        era = '2015_2016'
    else:
        era = '2017_on'
    return ROUND_BY_DAY[era].get(daynum, 6)

# Assign the whole column at once; chained assignment (df['Round'][i] = ...)
# is unreliable in pandas and may not modify the DataFrame.
WRegularTourneyCompactResults['Round'] = [
    get_round(season, daynum)
    for season, daynum in zip(WRegularTourneyCompactResults['Season'],
                              WRegularTourneyCompactResults['DayNum'])
]
            

# Add Regions to Results

In [None]:

# The region letter is the first character of the seed (e.g. 'W' in 'W01').
# Games between teams from two regions (semifinals, final) get both letters, e.g. 'WX'.
w_region = WRegularTourneyCompactResults['WSeed'].str[0]
l_region = WRegularTourneyCompactResults['LSeed'].str[0]
WRegularTourneyCompactResults['Region'] = w_region.where(w_region == l_region,
                                                         w_region + l_region)
        
        

# Add Seeds to results

In [None]:
# Matchup label '<winner seed>-<loser seed>'. The winner seed keeps its leading
# zero and the loser seed does not (e.g. '07-2').
WRegularTourneyCompactResults['Seeds'] = (
    WRegularTourneyCompactResults['WSeed'].str[1:]
    + '-'
    + WRegularTourneyCompactResults['LSeed'].str[1:].astype(int).astype(str)
)



# Cinderellas (surprises) for the tournament

In [None]:
# Compare numeric seeds (the digits after the region letter)
w_seed = WRegularTourneyCompactResults['WSeed'].str[1:].astype(int)
l_seed = WRegularTourneyCompactResults['LSeed'].str[1:].astype(int)

cinderella = WRegularTourneyCompactResults[w_seed > l_seed].copy()   # lower-ranked team won
same_seed = WRegularTourneyCompactResults[w_seed == l_seed].copy()   # equal seeds
predicted = WRegularTourneyCompactResults[w_seed < l_seed].copy()    # better seed won


In [None]:
# surprises 
cinderella= cinderella.reset_index()

# same seed games 
same_seed =same_seed.reset_index()

# predicted wins with seeds 
predicted =predicted.reset_index()

## Look at results according to rounds

In [None]:
round_6 = WRegularTourneyCompactResults[WRegularTourneyCompactResults['Round']==6]
round_1 = WRegularTourneyCompactResults[WRegularTourneyCompactResults['Round']==1]
round_5 = WRegularTourneyCompactResults[WRegularTourneyCompactResults['Round']==5]

In [None]:
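# Seed matchups in Round 1 (label is '<winner seed>-<loser seed>');
# surprises are the bars where the first number is larger than the second
plt.figure(figsize =(30,30))
test = round_1.groupby('Seeds').count()
test = test.sort_values('Season')
plt.bar(test.index, test['Season'])
plt.title('Round 1 results by seed matchup (winner-loser)')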

In [None]:
# Seed matchups in the semifinals (Round 5)

plt.figure(figsize =(30,30))
test = round_5.groupby('Seeds').count()
test = test.sort_values('Season')
plt.bar(test.index, test['Season'])
plt.title('Semifinal results by seed matchup (winner-loser)')

In [None]:
# Seed matchups in the final (Round 6)

plt.figure(figsize =(30,30))
test = round_6.groupby('Seeds').count()
test = test.sort_values('Season')
plt.bar(test.index, test['Season'])
plt.title('Final results by seed matchup (winner-loser)')

## Same Seed Games Proportion

In [None]:
(len(same_seed)/len(WRegularTourneyCompactResults))*100

Same-seed games represent only about 2% of all games (only one in 2019: the final).

## Results in 2019

In [None]:
cinderella_2019 = cinderella[cinderella['Season']==2019]
march_2019 = WRegularTourneyCompactResults[WRegularTourneyCompactResults['Season']==2019]
march_2019 = march_2019.sort_values('DayNum')

### Surprises Proportions since 1998

In [None]:
(len(cinderella)/len(WRegularTourneyCompactResults))*100

21.5 % of surprises over the years

### Surprises Proportions in 2019

In [None]:
# in 2019
len(cinderella_2019)/len(march_2019)

> Always picking the better seed would have given about 83% accuracy in 2019, so that is the baseline a model has to improve on.
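
The seed baseline can be illustrated on a small made-up table. This is only a sketch: the `toy` rows are hypothetical, and unlike the ratio computed above it leaves same-seed games out of the accuracy, since the "pick the better seed" rule has no pick there.

```python
import pandas as pd

# Hypothetical toy results; the notebook itself uses WRegularTourneyCompactResults
toy = pd.DataFrame({
    'WSeed': ['W01', 'X12', 'Y02', 'Z08', 'W01'],
    'LSeed': ['W16', 'X05', 'Y07', 'Z09', 'X01'],
})

# Numeric seed = digits after the region letter
w_seed = toy['WSeed'].str[1:].astype(int)
l_seed = toy['LSeed'].str[1:].astype(int)

# Games where the seeds differ, so the seed rule makes a prediction
decided = w_seed != l_seed
baseline_accuracy = (w_seed[decided] < l_seed[decided]).mean()

# Share of all games won by the worse-seeded team
upset_rate = (w_seed > l_seed).mean()

print(f'Seed baseline accuracy: {baseline_accuracy:.2f}')
print(f'Upset rate: {upset_rate:.2f}')
```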

# Little and Big Cinderellas


In [None]:
# Big cinderellas: the winner was seeded more than 1 place below the loser.
# Little cinderellas: exactly 1 seed apart (e.g. a 2 seed beating a 1 seed).
seed_gap = cinderella['WSeed'].str[1:].astype(int) - cinderella['LSeed'].str[1:].astype(int)
big_cinderella = cinderella[seed_gap > 1].copy()
little_cinderella = cinderella[seed_gap <= 1].copy()
        

In [None]:
# Big Cinderellas 

In [None]:
(len(big_cinderella)/len(WRegularTourneyCompactResults))*100

13% of games are big surprises 

In [None]:
big_cinderella.groupby('Seeds').count()['Season'].sort_values().plot(kind = 'barh',
          title='Seeds for big cinderellas',
          figsize=(15, 8),
          color=mypal[0])
plt.show()

# Little Cinderellas

In [None]:
(len(little_cinderella)/len(WRegularTourneyCompactResults))*100

8.6% = little surprises (1 seed difference) 

In [None]:
little_cinderella.groupby('Seeds').count()['Season'].sort_values().plot(kind = 'barh',
          title='Seeds for Little cinderellas',
          figsize=(15, 8),
          color=mypal[0])
plt.show()

# Cinderellas Over the Years

In [None]:
# number of surprises over the years
plt.style.use('fivethirtyeight')
test = cinderella.groupby('Season').count()
plt.bar(test.index, test['index'])
plt.title('Cinderellas over the years')

### BIG CINDERELLAS 

In [None]:
# number of big surprises over the years
plt.style.use('fivethirtyeight')
test = big_cinderella.groupby('Season').count()
plt.bar(test.index, big_cinderella.groupby('Season').count()['index'])
plt.title('Big Cinderellas over the years')

## Cinderellas according to ROUNDS

In [None]:
# predict surprises according to rounds

test = cinderella.groupby('Round').count()
plt.bar(test.index, test['Season'])
plt.title('Surprises over the years according to rounds')

In [None]:
# in 2019 

test = cinderella_2019.groupby('Round').count()
plt.bar(test.index, test['Season'])
plt.title('Surprises in each round in season 2019')


### SURPRISES ACCORDING TO SEED REGIONS: W, X, Y, Z

In [None]:
test = cinderella.groupby('Region').count()
plt.bar(test.index, test['Season'])
plt.title('Surprises IN REGIONS')

### SURPRISES ACCORDING TO SEED REGIONS EACH YEAR : 

In [None]:
for i in cinderella['Season'].unique():
    cind =cinderella[cinderella['Season']==i ]
    test = cind.groupby('Region').count()
    plt.figure()
    plt.bar(test.index, test['Season'])
    plt.title(i)

In [None]:
# code to get surprises per year per region per round

'''for i in cinderella['Season'].unique()[20:]:
    cind =cinderella[cinderella['Season']==i ]
    
    for j in cind['Region'].unique():
        cind2 = cind[cind['Region']==j]
        test = cind2.groupby('Round').count()
        
        plt.figure(figsize =(20,20))
        plt.bar(test.index, test['Season'])
        hello = str(i)+' in region ' + str(j)
        plt.title(hello)'''

## Surprises according to seed combinations for all cinderellas

In [None]:
cinderella.groupby('Seeds').count()['Season'].sort_values().plot(kind = 'barh',
          title='Seeds for All cinderellas',
          figsize=(15, 8),
          color=mypal[0])
plt.show()

## Seeds per Season for cinderellas

In [None]:
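# what seeds are involved mostly 
for i in cinderella['Season'].unique():
    # select this season's cinderellas; otherwise every plot would reuse a stale `cind`
    cind = cinderella[cinderella['Season'] == i]
    cind.groupby('Seeds').count()['Season'].sort_values().plot(kind = 'barh',
              title='Seeds for All cinderellas in '+str(i) ,
              figsize=(15, 8),
              color=mypal[0])
    plt.show()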

## Big Cinderellas per Season per Region 

In [None]:
for i in big_cinderella['Season'].unique():
    cind =big_cinderella[big_cinderella['Season']==i ]
    cind.groupby('Region').count()['Season'].sort_values().plot(kind = 'barh',
              title='Big Cinderellas in Season ' +str(i) ,
              figsize=(15, 8),
              color=mypal[0])
    plt.show()

## Seeds Surprises after 2009

In [None]:

cinderella_post_2010 = cinderella[cinderella['Season']>2009]
plt.figure(figsize =(30,30))
test = cinderella_post_2010.groupby('Seeds').count()
test = test.sort_values('Season')
plt.bar(test.index, test['Season'])
plt.title('Surprises according to seeds')

In [None]:
# to get the round of each seed combination
tryt = cinderella[cinderella['Seeds']=='07-2']
tryt

- 05-4 Round 2
- 06-3 Round 2
- 11-3 Round 2
- 02-1 Round 4 
- 07-2 Round 2



In [None]:
big_cinderella_2019 = big_cinderella[big_cinderella['Season']==2019]

In [None]:
big_cinderella_2019

### Proportions of little and big cinderellas

In [None]:
(len(big_cinderella)/len(WRegularTourneyCompactResults))*100

In [None]:
(len(little_cinderella)/len(WRegularTourneyCompactResults))*100

What would improve the accuracy considerably is detecting little and big cinderellas, for example in games between closely seeded teams.

In [None]:
# find the round for each game (NOT A WORKING VERSION)

'''
WRegularTourneyCompactResults['Round']= 0
for i in WRegularTourneyCompactResults.index :
    for k in range(len(L)):     
        if int(WRegularTourneyCompactResults['WSeed'][i][1:])==L[k] and int(WRegularTourneyCompactResults['LSeed'][i][1:])== M[k]:
            WRegularTourneyCompactResults['Round'][i]= 1                                                                      
        elif int(WRegularTourneyCompactResults['WSeed'][i][1:])==M[k] and int(WRegularTourneyCompactResults['LSeed'][i][1:])== L[k]:
            WRegularTourneyCompactResults['Round'][i]= 1                                                                         
        elif int(WRegularTourneyCompactResults['WSeed'][i][1:])==L[k] and int(WRegularTourneyCompactResults['LSeed'][i][1:])== M_2[k]:
            WRegularTourneyCompactResults['Round'][i]= 2                                                                           
        elif int(WRegularTourneyCompactResults['WSeed'][i][1:])==M_2[k] and int(WRegularTourneyCompactResults['LSeed'][i][1:])== L[k]:
            WRegularTourneyCompactResults['Round'][i]= 2
        elif int(WRegularTourneyCompactResults['WSeed'][i][1:])==L_2[k] and int(WRegularTourneyCompactResults['LSeed'][i][1:])== M[k]:
            WRegularTourneyCompactResults['Round'][i]= 2
        elif int(WRegularTourneyCompactResults['WSeed'][i][1:])==M[k] and int(WRegularTourneyCompactResults['LSeed'][i][1:])== L_2[k]:
            WRegularTourneyCompactResults['Round'][i]= 2
        elif int(WRegularTourneyCompactResults['WSeed'][i][1:])==L[k] and int(WRegularTourneyCompactResults['LSeed'][i][1:])== L_2[k]:
            WRegularTourneyCompactResults['Round'][i]= 2
        elif int(WRegularTourneyCompactResults['WSeed'][i][1:])==L_2[k] and int(WRegularTourneyCompactResults['LSeed'][i][1:])== L[k]:
            WRegularTourneyCompactResults['Round'][i]= 2
        elif int(WRegularTourneyCompactResults['WSeed'][i][1:])==M[k] and int(WRegularTourneyCompactResults['LSeed'][i][1:])== M_2[k]:
            WRegularTourneyCompactResults['Round'][i]= 2
        elif int(WRegularTourneyCompactResults['WSeed'][i][1:])==M_2[k] and int(WRegularTourneyCompactResults['LSeed'][i][1:])== M[k]:
            WRegularTourneyCompactResults['Round'][i]= 2
            

        elif int(WRegularTourneyCompactResults['WSeed'][i][1:])==L[k] and int(WRegularTourneyCompactResults['LSeed'][i][1:])== L[k-4]:
            WRegularTourneyCompactResults['Round'][i]= 3
        elif int(WRegularTourneyCompactResults['WSeed'][i][1:])==L[k-4] and int(WRegularTourneyCompactResults['LSeed'][i][1:])== L[k]:
            WRegularTourneyCompactResults['Round'][i]= 3
        elif int(WRegularTourneyCompactResults['WSeed'][i][1:])==M[k] and int(WRegularTourneyCompactResults['LSeed'][i][1:])== M[k-4]:
            WRegularTourneyCompactResults['Round'][i]= 3
        elif int(WRegularTourneyCompactResults['WSeed'][i][1:])==M[k-4] and int(WRegularTourneyCompactResults['LSeed'][i][1:])== M[k]:
            WRegularTourneyCompactResults['Round'][i]= 3
            

        elif int(WRegularTourneyCompactResults['WSeed'][i][1:])==L[k] and int(WRegularTourneyCompactResults['LSeed'][i][1:])== M[k-4]:
            WRegularTourneyCompactResults['Round'][i]= 3
        elif int(WRegularTourneyCompactResults['WSeed'][i][1:])==M[k-4] and int(WRegularTourneyCompactResults['LSeed'][i][1:])== L[k]:
            WRegularTourneyCompactResults['Round'][i]= 3

'''



In [None]:
#2017 season through 2020 season:
#Round 1 = days 137/138 (Fri/Sat)
#Round 2 = days 139/140 (Sun/Mon)
#Round 3 = days 144/145 (Sweet Sixteen, Fri/Sat)
#Round 4 = days 146/147 (Elite Eight, Sun/Mon)
#National Semifinal = day 151 (Fri)
#National Final = day 153 (Sun)

#2015 season and 2016 season:
#Round 1 = days 137/138 (Fri/Sat)
#Round 2 = days 139/140 (Sun/Mon)
#Round 3 = days 144/145 (Sweet Sixteen, Fri/Sat)
#Round 4 = days 146/147 (Elite Eight, Sun/Mon)
#National Semifinal = day 153 (Sun)
#National Final = day 155 (Tue)

#2003 season through 2014 season:
#Round 1 = days 138/139 (Sat/Sun)
#Round 2 = days 140/141 (Mon/Tue)
#Round 3 = days 145/146 (Sweet Sixteen, Sat/Sun)
#Round 4 = days 147/148 (Elite Eight, Mon/Tue)
#National Semifinal = day 153 (Sun)
#National Final = day 155 (Tue)

#1998 season through 2002 season:
#Round 1 = days 137/138 (Fri/Sat)
#Round 2 = days 139/140 (Sun/Mon)
#Round 3 = day 145 only (Sweet Sixteen, Sat)
#Round 4 = day 147 only (Elite Eight, Mon)
#National Semifinal = day 151 (Fri)
#National Final = day 153 (Sun)

# Day #133 is Selection Monday (for the women's tournament).

137 = First Round in 2019
First round SEEDS: 
- 16 - 1
- 15 - 2
- 14 - 3
- 13 - 4
- 12 - 5 
- 11 - 6
- 10 - 7
- 9 - 8
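
The first-round pairings listed above follow a simple rule: the two seeds in each game add up to 17. A minimal sketch, assuming a 16-team region (the function name `first_round_pairings` is illustrative):

```python
def first_round_pairings(n_seeds=16):
    """Pair seed k with seed n_seeds + 1 - k (16 vs 1, 15 vs 2, ...)."""
    return [(n_seeds + 1 - k, k) for k in range(1, n_seeds // 2 + 1)]

for worse_seed, better_seed in first_round_pairings():
    print(f'{worse_seed} - {better_seed}')
```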

# PLAYER STATS

### PLAYERS NAMES AND ID

In [None]:
WPlayers = pd.read_csv(f'{WOMENS_DIR}/WPlayers.csv')

In [None]:
WPlayers

# Event Data SINCE 2015 FOR EACH GAME

Each MEvents & WEvents file lists the play-by-play event logs for more than 99.5% of games from that season.
Each event is assigned to either a team or a single one of the team's players.
Thus if a basket is made by one player and an assist is credited to a second player,
that would show up as two separate records. The players are listed by PlayerID within the xPlayers.csv file.

Womens Event Files:
- WEvents2015.csv, WEvents2016.csv, WEvents2017.csv, WEvents2018.csv, WEvents2019.csv

We can read in all of the women's files and combine them into one large dataframe.

In [None]:
womens_events = []
for year in [2015, 2016, 2017, 2018, 2019]:
    womens_events.append(pd.read_csv(f'{WOMENS_DIR}/WEvents{year}.csv'))
WEvents = pd.concat(womens_events)
print(WEvents.shape)

In [None]:
WEvents.head()

In [None]:
del womens_events
gc.collect()

### MERGE PLAYERS AND EVENTS 

In [None]:
# Merge Player name onto events

WEvents = WEvents.merge(WPlayers,
              how='left',
              left_on='EventPlayerID',
              right_on='PlayerID')

In [None]:
WEvents

In [None]:
# Event Types
plt.style.use('fivethirtyeight')
WEvents['counter'] = 1
WEvents.groupby('EventType')['counter'] \
    .sum() \
    .sort_values(ascending=False) \
    .plot(kind='bar',
          figsize=(15, 5),
         color=mypal[3],
         title='Event Type Frequency (Womens)')
plt.xticks(rotation=0)
plt.show()

It could be interesting to check which players have better stats than the average.

CRITERIA FOR BEST STATS: 
- made2 
- made3
- steal 
- block 
- reb 
- assist

NEGATIVE CRITERIA : 
- miss2 
- miss3 

INTERESTING = RATIO MADE / (MISS + MADE) (success rate)
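
The ratio of made shots to attempted shots could be computed per player roughly as follows. This is a sketch on a hypothetical toy table: the `EventType` values come from the criteria above, while the player IDs and rows are made up.

```python
import pandas as pd

# Hypothetical toy events; the real data would come from WEvents
events = pd.DataFrame({
    'EventPlayerID': [1, 1, 1, 1, 2, 2, 2],
    'EventType': ['made2', 'miss2', 'made3', 'assist', 'miss3', 'made2', 'miss2'],
})

# Keep only shot attempts and flag the ones that went in
shots = events[events['EventType'].isin(['made2', 'made3', 'miss2', 'miss3'])].copy()
shots['made'] = shots['EventType'].str.startswith('made')

# Per player: shots made, shots attempted, and the success rate
success_rate = shots.groupby('EventPlayerID')['made'].agg(made='sum', attempts='count')
success_rate['rate'] = success_rate['made'] / success_rate['attempts']
print(success_rate)
```

With the real event logs it would probably make sense to require a minimum number of attempts before comparing players, since a rate based on a handful of shots is very noisy.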

### Get the games in the season against teams with cinderellas 

# Regular-season games from the same season in which the cinderella winner had
# already beaten the same opponent (assumed intent of the nested-loop version).
cinderella_pairs = cinderella[['Season', 'WTeamID', 'LTeamID']].drop_duplicates()
games = WRegularSeasonCompactResults.merge(cinderella_pairs,
                                           on=['Season', 'WTeamID', 'LTeamID'])



# Area of Event
We are told that the `Area` feature describes the 13 "areas" of the court, as follows: 1=under basket; 2=in the paint; 3=inside right wing; 4=inside right; 5=inside center; 6=inside left; 7=inside left wing; 8=outside right wing; 9=outside right; 10=outside center; 11=outside left; 12=outside left wing; 13=backcourt.

We can map these values to their names.

In [None]:
area_mapping = {0: np.nan,
                1: 'under basket',
                2: 'in the paint',
                3: 'inside right wing',
                4: 'inside right',
                5: 'inside center',
                6: 'inside left',
                7: 'inside left wing',
                8: 'outside right wing',
                9: 'outside right',
                10: 'outside center',
                11: 'outside left',
                12: 'outside left wing',
                13: 'backcourt'}

WEvents['Area_Name'] = WEvents['Area'].map(area_mapping)

In [None]:
WEvents.groupby('Area_Name')['counter'].sum() \
    .sort_values() \
    .plot(kind='barh',
          figsize=(15, 8),
          title='Frequency of Event Area')
plt.show()

In [None]:
fig, ax = plt.subplots(figsize=(15, 8))
for i, d in WEvents.loc[~WEvents['Area_Name'].isna()].groupby('Area_Name'):
    d.plot(x='X', y='Y', style='.', label=i, ax=ax, title='Visualizing Event Areas')
    ax.legend()
plt.legend(bbox_to_anchor=(1.04,1), loc="upper left")
ax.set_xticks([])
ax.set_yticks([])
ax.set_xlabel('')
ax.set_xlim(0, 100)
ax.set_ylim(0, 100)
plt.show()

## Plotting X, Y Data
This is some of the most exciting data provided, but after looking there are some things to consider.
- X, Y points are not available for all games, so this is not a complete sample
- The X/Y position is provided for fouls, turnovers, and field-goal attempts (either 2-point or 3-point). No X/Y data for other events.

In [None]:
# Normalize X, Y positions for court dimensions
# Court is 50 feet wide and 94 feet end to end.

WEvents['X_'] = (WEvents['X'] * (94/100))
WEvents['Y_'] = (WEvents['Y'] * (50/100))

# NCAA Court Plot Function

In [None]:
def create_ncaa_full_court(ax=None, three_line='mens', court_color='#dfbb85',
                           lw=3, lines_color='black', lines_alpha=0.5,
                           paint_fill='blue', paint_alpha=0.4):
    """
    Draws an NCAA basketball court on a matplotlib axis.
    Dimensions are in feet (court is 94x50 ft)
    Created by: Rob Mulla / https://github.com/RobMulla

    * Note that this function uses "feet" as the unit of measure.
    * NCAA data is provided on an x-range of 0 to 100 and a y-range of 0 to 100
    * To plot X/Y positions first convert to feet like this:
    ```
    Events['X_'] = (Events['X'] * (94/100))
    Events['Y_'] = (Events['Y'] * (50/100))
    ```

    ax : matplotlib axis to draw on (defaults to the current axis)
    three_line : 'mens', 'womens' or 'both'; which 3-point line(s) to plot
    court_color : (hex) color of the court
    lw : line width
    lines_color : color of the lines
    lines_alpha : transparency of the lines
    paint_fill : color inside the paint
    paint_alpha : transparency of the "paint"
    """
    if ax is None:
        ax = plt.gca()

    # Create Patches for Court Lines
    center_circle = Circle((94/2, 50/2), 6,
                           linewidth=lw, color=lines_color, lw=lw,
                           fill=False, alpha=lines_alpha)
#     inside_circle = Circle((94/2, 50/2), 2,
#                            linewidth=lw, color=lines_color, lw=lw,
#                            fill=False, alpha=lines_alpha)

    hoop_left = Circle((5.25, 50/2), 1.5 / 2,
                       linewidth=lw, color=lines_color, lw=lw,
                       fill=False, alpha=lines_alpha)
    hoop_right = Circle((94-5.25, 50/2), 1.5 / 2,
                        linewidth=lw, color=lines_color, lw=lw,
                        fill=False, alpha=lines_alpha)

    # Paint - 18 Feet 10 inches which converts to 18.833333 feet - gross!
    # `fill` is a boolean flag, so the paint color must be passed as `facecolor`
    left_paint = Rectangle((0, (50/2)-6), 18.833333, 12,
                           facecolor=paint_fill, alpha=paint_alpha,
                           lw=lw, edgecolor=None)
    right_paint = Rectangle((94-18.833333, (50/2)-6), 18.833333,
                            12, facecolor=paint_fill, alpha=paint_alpha,
                            lw=lw, edgecolor=None)
    
    left_paint_boarder = Rectangle((0, (50/2)-6), 18.833333, 12,
                           fill=False, alpha=lines_alpha,
                           lw=lw, edgecolor=lines_color)
    right_paint_boarder = Rectangle((94-18.83333, (50/2)-6), 18.833333,
                            12, fill=False, alpha=lines_alpha,
                            lw=lw, edgecolor=lines_color)

    left_arc = Arc((18.833333, 50/2), 12, 12, theta1=-90,
                   theta2=90, color=lines_color, lw=lw,
                   alpha=lines_alpha)
    right_arc = Arc((94-18.833333, 50/2), 12, 12, theta1=90,
                    theta2=-90, color=lines_color, lw=lw,
                    alpha=lines_alpha)
    
    leftblock1 = Rectangle((7, (50/2)-6-0.666), 1, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    leftblock2 = Rectangle((7, (50/2)+6), 1, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    ax.add_patch(leftblock1)
    ax.add_patch(leftblock2)
    
    left_l1 = Rectangle((11, (50/2)-6-0.666), 0.166, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    left_l2 = Rectangle((14, (50/2)-6-0.666), 0.166, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    left_l3 = Rectangle((17, (50/2)-6-0.666), 0.166, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    ax.add_patch(left_l1)
    ax.add_patch(left_l2)
    ax.add_patch(left_l3)
    left_l4 = Rectangle((11, (50/2)+6), 0.166, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    left_l5 = Rectangle((14, (50/2)+6), 0.166, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    left_l6 = Rectangle((17, (50/2)+6), 0.166, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    ax.add_patch(left_l4)
    ax.add_patch(left_l5)
    ax.add_patch(left_l6)
    
    rightblock1 = Rectangle((94-7-1, (50/2)-6-0.666), 1, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    rightblock2 = Rectangle((94-7-1, (50/2)+6), 1, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    ax.add_patch(rightblock1)
    ax.add_patch(rightblock2)

    right_l1 = Rectangle((94-11, (50/2)-6-0.666), 0.166, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    right_l2 = Rectangle((94-14, (50/2)-6-0.666), 0.166, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    right_l3 = Rectangle((94-17, (50/2)-6-0.666), 0.166, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    ax.add_patch(right_l1)
    ax.add_patch(right_l2)
    ax.add_patch(right_l3)
    right_l4 = Rectangle((94-11, (50/2)+6), 0.166, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    right_l5 = Rectangle((94-14, (50/2)+6), 0.166, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    right_l6 = Rectangle((94-17, (50/2)+6), 0.166, 0.666,
                           fill=True, alpha=lines_alpha,
                           lw=0, edgecolor=lines_color,
                           facecolor=lines_color)
    ax.add_patch(right_l4)
    ax.add_patch(right_l5)
    ax.add_patch(right_l6)
    
    # 3 Point Line
    if (three_line == 'mens') | (three_line == 'both'):
        # 22' 1.75" distance to center of hoop
        three_pt_left = Arc((6.25, 50/2), 44.291, 44.291, theta1=-78,
                            theta2=78, color=lines_color, lw=lw,
                            alpha=lines_alpha)
        three_pt_right = Arc((94-6.25, 50/2), 44.291, 44.291,
                             theta1=180-78, theta2=180+78,
                             color=lines_color, lw=lw, alpha=lines_alpha)

        # Straight corner segments sit 3.34 feet from each sideline for mens
        ax.plot((0, 11.25), (3.34, 3.34),
                color=lines_color, lw=lw, alpha=lines_alpha)
        ax.plot((0, 11.25), (50-3.34, 50-3.34),
                color=lines_color, lw=lw, alpha=lines_alpha)
        ax.plot((94-11.25, 94), (3.34, 3.34),
                color=lines_color, lw=lw, alpha=lines_alpha)
        ax.plot((94-11.25, 94), (50-3.34, 50-3.34),
                color=lines_color, lw=lw, alpha=lines_alpha)
        ax.add_patch(three_pt_left)
        ax.add_patch(three_pt_right)

    if (three_line == 'womens') | (three_line == 'both'):
        # womens 3
        three_pt_left_w = Arc((6.25, 50/2), 20.75 * 2, 20.75 * 2, theta1=-85,
                              theta2=85, color=lines_color, lw=lw, alpha=lines_alpha)
        three_pt_right_w = Arc((94-6.25, 50/2), 20.75 * 2, 20.75 * 2,
                               theta1=180-85, theta2=180+85,
                               color=lines_color, lw=lw, alpha=lines_alpha)

        # Straight corner segments sit 4.25 feet from each sideline for womens
        ax.plot((0, 8.3), (4.25, 4.25), color=lines_color,
                lw=lw, alpha=lines_alpha)
        ax.plot((0, 8.3), (50-4.25, 50-4.25),
                color=lines_color, lw=lw, alpha=lines_alpha)
        ax.plot((94-8.3, 94), (4.25, 4.25),
                color=lines_color, lw=lw, alpha=lines_alpha)
        ax.plot((94-8.3, 94), (50-4.25, 50-4.25),
                color=lines_color, lw=lw, alpha=lines_alpha)

        ax.add_patch(three_pt_left_w)
        ax.add_patch(three_pt_right_w)

    # Add Patches
    ax.add_patch(left_paint)
    ax.add_patch(left_paint_boarder)
    ax.add_patch(right_paint)
    ax.add_patch(right_paint_boarder)
    ax.add_patch(center_circle)
#     ax.add_patch(inside_circle)
    ax.add_patch(hoop_left)
    ax.add_patch(hoop_right)
    ax.add_patch(left_arc)
    ax.add_patch(right_arc)

    # Restricted Area Marker
    restricted_left = Arc((6.25, 50/2), 8, 8, theta1=-90,
                        theta2=90, color=lines_color, lw=lw,
                        alpha=lines_alpha)
    restricted_right = Arc((94-6.25, 50/2), 8, 8,
                         theta1=180-90, theta2=180+90,
                         color=lines_color, lw=lw, alpha=lines_alpha)
    ax.add_patch(restricted_left)
    ax.add_patch(restricted_right)
    
    # Backboards
    ax.plot((4, 4), ((50/2) - 3, (50/2) + 3),
            color=lines_color, lw=lw*1.5, alpha=lines_alpha)
    ax.plot((94-4, 94-4), ((50/2) - 3, (50/2) + 3),
            color=lines_color, lw=lw*1.5, alpha=lines_alpha)
    ax.plot((4, 4.6), (50/2, 50/2), color=lines_color,
            lw=lw, alpha=lines_alpha)
    ax.plot((94-4, 94-4.6), (50/2, 50/2),
            color=lines_color, lw=lw, alpha=lines_alpha)

    # Half Court Line
    ax.axvline(94/2, color=lines_color, lw=lw, alpha=lines_alpha)

    # Border (uses lines_color so it stays visible on dark courts)
    border = Rectangle((0.3,0.3), 94-0.6, 50-0.6, fill=False, lw=3, color=lines_color, alpha=lines_alpha)
    ax.add_patch(border)
    
    # Plot Limit
    ax.set_xlim(0, 94)
    ax.set_ylim(0, 50)
    ax.set_facecolor(court_color)
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_xlabel('')
    return ax


fig, ax = plt.subplots(figsize=(15, 8.5))
create_ncaa_full_court(ax, three_line='both', paint_alpha=0.4)
plt.show()


In [None]:
fig, ax = plt.subplots(figsize=(15, 7.8))
ms = 10
# The data plotted here is WEvents, so use the women's 3-point line and title
ax = create_ncaa_full_court(ax, three_line='womens', paint_alpha=0.1)
WEvents.query('EventType == "turnover"') \
    .plot(x='X_', y='Y_', style='X',
          title='Turnover Locations (Womens)',
          c='red',
          alpha=0.3,
         figsize=(15, 9),
         label='Turnovers',
         ms=ms,
         ax=ax)
ax.set_xlabel('')
ax.get_legend().remove()
plt.show()

In [None]:
COURT_COLOR = '#dfbb85'
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 4))
# Where are 3 pointers made from? (This is really cool)
WEvents.query('EventType == "made3"') \
    .plot(x='X_', y='Y_', style='.',
          color='blue',
          title='3 Pointers Made (Womens)',
          alpha=0.01, ax=ax1)
ax1 = create_ncaa_full_court(ax1, lw=0.5, three_line='womens', paint_alpha=0.1)
ax1.set_facecolor(COURT_COLOR)
WEvents.query('EventType == "miss3"') \
    .plot(x='X_', y='Y_', style='.',
          title='3 Pointers Missed (Womens)',
          color='red',
          alpha=0.01, ax=ax2)
ax2.set_facecolor(COURT_COLOR)
ax2 = create_ncaa_full_court(ax2, lw=0.5, three_line='womens', paint_alpha=0.1)
ax1.get_legend().remove()
ax2.get_legend().remove()
ax1.set_xticks([])
ax1.set_yticks([])
ax2.set_xticks([])
ax2.set_yticks([])
ax1.set_xlabel('')
ax2.set_xlabel('')
plt.show()

In [None]:
COURT_COLOR = '#dfbb85'
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 4))
# Where are 2 pointers made from?
WEvents.query('EventType == "made2"') \
    .plot(x='X_', y='Y_', style='.',
          color='blue',
          title='2 Pointers Made (Womens)',
          alpha=0.01, ax=ax1)
ax1.set_facecolor(COURT_COLOR)
ax1 = create_ncaa_full_court(ax1, lw=0.5, three_line='womens', paint_alpha=0.1)
WEvents.query('EventType == "miss2"') \
    .plot(x='X_', y='Y_', style='.',
          title='2 Pointers Missed (Womens)',
          color='red',
          alpha=0.01, ax=ax2)
ax2.set_facecolor(COURT_COLOR)
ax2 = create_ncaa_full_court(ax2, lw=0.5, three_line='womens', paint_alpha=0.1)
ax1.get_legend().remove()
ax2.get_legend().remove()
ax1.set_xticks([])
ax1.set_yticks([])
ax2.set_xticks([])
ax2.set_yticks([])
ax1.set_xlabel('')
ax2.set_xlabel('')
plt.show()

## PlayerIDs
Reading the players file can fail on rows where the player name contains a comma. Passing `error_bad_lines=False` to `pd.read_csv` (replaced by `on_bad_lines='skip'` in pandas 1.3 and later) gets past this by skipping those rows, but ideally the data would be cleaned to remove the comma, or a different delimiter would be used.
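
To illustrate what skipping malformed rows does, the sketch below parses a small made-up CSV in which one name contains an unquoted comma. It assumes a pandas version that supports the `on_bad_lines` argument, and the rows and column names are only illustrative, not taken from the real file.

```python
import io
import pandas as pd

# Made-up rows; the third line has an extra field because of the comma in the name.
raw = io.StringIO(
    "PlayerID,LastName,FirstName,TeamID\n"
    "1,Smith,Ann,3101\n"
    "2,Jones, Jr.,Bea,3102\n"
    "3,Lee,Cara,3103\n"
)

# Rows with too many fields are dropped instead of raising a ParserError.
players = pd.read_csv(raw, on_bad_lines='skip')
print(players)
```

Skipped rows are lost silently, so it is worth comparing the row count with the number of lines in the file to see how many players were dropped.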

In [None]:
WPlayers = pd.read_csv(f'{WOMENS_DIR}/WPlayers.csv')

In [None]:
WPlayers.head()
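
The player plots further down query `FirstName` and `LastName` directly on the events table. One way to get those columns is a left merge of the events with the players table. The sketch below uses made-up frames and assumes the events carry an `EventPlayerID` column that matches `PlayerID` in the players file, so check the column names in your copy of the data.

```python
import pandas as pd

# Made-up stand-ins for WEvents and WPlayers.
events = pd.DataFrame({
    'EventID': [1, 2, 3],
    'EventPlayerID': [10, 11, 0],   # 0 here stands for "no player attached"
    'EventType': ['made3', 'miss2', 'timeout'],
})
players = pd.DataFrame({
    'PlayerID': [10, 11],
    'FirstName': ['Ann', 'Bea'],
    'LastName': ['Smith', 'Jones'],
})

# A left merge keeps every event, including those with no matching player.
events_named = events.merge(
    players, how='left', left_on='EventPlayerID', right_on='PlayerID'
)
print(events_named[['EventID', 'EventType', 'FirstName', 'LastName']])
```

Events without a matching player end up with missing names, so they simply drop out of any query on `FirstName` or `LastName`.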

# Plotting Specific Players' Made/Missed Shots
Now that we have player names in the event data, let's single out specific players, starting with one of the most exciting players of the last decade.

![](https://thenypost.files.wordpress.com/2018/11/zion-williamson-duke-freshman-scouting-comparables.jpg?quality=80&strip=all&w=618&h=410&crop=1)

Next, let's look at Katie Lou Samuelson. She is known as a 3-point shooter, and we can see that her shots mostly come from outside the 3-point line.

![](https://imagesvc.timeincapp.com/v3/fan/image?url=https://highposthoops.com/wp-content/uploads/getty-images/2018/10/951142340.jpeg?&w=618&h=410&crop=1)

In [None]:
ms = 10 # Marker Size
FirstName = 'Katie Lou'
LastName = 'Samuelson'
fig, ax = plt.subplots(figsize=(15, 8))
ax = create_ncaa_full_court(ax, three_line='womens')
WEvents.query('FirstName == @FirstName and LastName == @LastName and EventType == "made2"') \
    .plot(x='X_', y='Y_', style='o',
          title='Shots (Katie Lou Samuelson)',
          alpha=0.5,
         figsize=(15, 8),
         label='Made 2',
         ms=ms,
         ax=ax)
plt.legend()
WEvents.query('FirstName == @FirstName and LastName == @LastName and EventType == "miss2"') \
    .plot(x='X_', y='Y_', style='X',
          alpha=0.5, ax=ax,
         label='Missed 2',
         ms=ms)
plt.legend()
WEvents.query('FirstName == @FirstName and LastName == @LastName and EventType == "made3"') \
    .plot(x='X_', y='Y_', style='o',
          c='brown',
          alpha=0.5,
         figsize=(15, 8),
         label='Made 3', ax=ax,
         ms=ms)
plt.legend()
WEvents.query('FirstName == @FirstName and LastName == @LastName and EventType == "miss3"') \
    .plot(x='X_', y='Y_', style='X',
          c='green',
          alpha=0.5, ax=ax,
         label='Missed 3',
         ms=ms)
ax.set_xlabel('')
plt.legend()
plt.show()

# Shot Heatmap
We can plot a heatmap of where shots occur on the court. An interesting observation when comparing the men's and women's games is that many of the men's shots come from directly under the hoop, while the hot spots for women's shots are more often to the left and right of the hoop.

In [None]:
N_bins = 100
# Use the men's events here; convert their X/Y positions to feet first
MEvents['X_'] = MEvents['X'] * (94/100)
MEvents['Y_'] = MEvents['Y'] * (50/100)
shot_events = MEvents.loc[MEvents['EventType'].isin(['miss3','made3','miss2','made2']) & (MEvents['X_'] != 0)]
fig, ax = plt.subplots(figsize=(15, 7))
ax = create_ncaa_full_court(ax,
                            paint_alpha=0.0,
                            three_line='mens',
                            court_color='black',
                            lines_color='white')
_ = plt.hist2d(shot_events['X_'].values + np.random.normal(0, 0.1, shot_events['X_'].shape), # Add Jitter to values for plotting
           shot_events['Y_'].values + np.random.normal(0, 0.1, shot_events['Y_'].shape),
           bins=N_bins, norm=mpl.colors.LogNorm(),
               cmap='plasma')

# Plot a colorbar with label.
cb = plt.colorbar()
cb.set_label('Number of shots')

ax.set_title('Shot Heatmap (Mens)')
plt.show()

In [None]:
N_bins = 100
shot_events = WEvents.loc[WEvents['EventType'].isin(['miss3','made3','miss2','made2']) & (WEvents['X_'] != 0)]
fig, ax = plt.subplots(figsize=(15, 7))
ax = create_ncaa_full_court(ax, three_line='womens', paint_alpha=0.0,
                            court_color='black',
                            lines_color='white')
_ = plt.hist2d(shot_events['X_'].values + np.random.normal(0, 0.2, shot_events['X_'].shape),
           shot_events['Y_'].values + np.random.normal(0, 0.2, shot_events['Y_'].shape),
           bins=N_bins, norm=mpl.colors.LogNorm(),
               cmap='plasma')

# Plot a colorbar with label.
cb = plt.colorbar()
cb.set_label('Number of shots')

ax.set_title('Shot Heatmap (Womens)')
plt.show()
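
A visual comparison like this can be backed with a number, for example the share of shots taken within a few feet of the nearest hoop. The sketch below runs on made-up coordinates in feet. `share_near_hoop` is a hypothetical helper, the hoop centers follow the `create_ncaa_full_court` function above (5.25 ft from each baseline, at mid-width), and the 4 ft cutoff is an arbitrary choice.

```python
import numpy as np
import pandas as pd

def share_near_hoop(df, radius_ft=4.0):
    """Fraction of rows whose (X_, Y_) lies within radius_ft of the nearer hoop."""
    hoops_x = np.array([5.25, 94 - 5.25])
    # Distance from each shot to both hoops; keep the smaller one.
    dx = df['X_'].to_numpy()[:, None] - hoops_x[None, :]
    dy = (df['Y_'].to_numpy() - 25.0)[:, None]
    dist = np.sqrt(dx ** 2 + dy ** 2).min(axis=1)
    return float((dist <= radius_ft).mean())

# Made-up shots: two at the rim, one from the wing, one from center court.
shots = pd.DataFrame({'X_': [6.0, 88.0, 20.0, 47.0],
                      'Y_': [25.0, 26.0, 10.0, 25.0]})
near_share = share_near_hoop(shots)
print(near_share)
```

Computing this value once for the men's shots and once for the women's shots would give a direct comparison of how rim-heavy each game is.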

In [None]:
# Points scored by each event: 2 or 3 for made field goals, 0 otherwise.
# Missed shots ('miss2', 'miss3') keep the default of 0.
MEvents['PointsScored'] =  0
MEvents.loc[MEvents['EventType'] == 'made2', 'PointsScored'] = 2
MEvents.loc[MEvents['EventType'] == 'made3', 'PointsScored'] = 3

In [None]:
# # Average Points Scored per xy coord
# avg_pnt_xy = MEvents.loc[MEvents['EventType'].isin(['miss3','made3','miss2','made2']) & (MEvents['X_'] != 0)] \
#     .groupby(['X_','Y_'])['PointsScored'].mean().reset_index()

# # .plot(x='X_',y='Y_', style='.')
# fig, ax = plt.subplots(figsize=(15, 8))
# ax = sns.scatterplot(data=avg_pnt_xy, x='X_', y='Y_', hue='PointsScored', cmap='coolwarm')
# ax = create_ncaa_full_court(ax)
# plt.show()


In [None]:
# avg_made_xy.sort_values('Made')

In [None]:
# avg_made_xy['Made'] / avg_made_xy['Missed']

In [None]:
# MEvents['Made'] = False
# MEvents['Made'] = False
# MEvents.loc[MEvents['EventType'] == 'made2', 'Made'] = True
# MEvents.loc[MEvents['EventType'] == 'made3', 'Made'] = True
# MEvents.loc[MEvents['EventType'] == 'missed2', 'Made'] = False
# MEvents.loc[MEvents['EventType'] == 'missed3', 'Made'] = False
# MEvents.loc[MEvents['EventType'] == 'made2', 'Missed'] = False
# MEvents.loc[MEvents['EventType'] == 'made3', 'Missed'] = False
# MEvents.loc[MEvents['EventType'] == 'missed2', 'Missed'] = True
# MEvents.loc[MEvents['EventType'] == 'missed3', 'Missed'] = True

# # Average Pct Made per xy coord
# avg_made_xy = MEvents.loc[MEvents['EventType'].isin(['miss3','made3','miss2','made2']) & (MEvents['X_'] != 0)] \
#     .groupby(['X_','Y_'])['Made','Missed'].sum().reset_index()

# # .plot(x='X_',y='Y_', style='.')
# fig, ax = plt.subplots(figsize=(15, 8))
# cmap = sns.cubehelix_palette(as_cmap=True)
# ax = sns.scatterplot(data=avg_made_xy, x='X_', y='Y_', size='Made', cmap='plasma')
# ax = create_ncaa_full_court(ax, paint_alpha=0)
# ax.set_title('Number of Shots Made')
# plt.show()

# TODO
- Half Court Plot
- Normalize X,Y data to half court
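
For the second item, one possible approach is to fold the far half of the court onto the near half with a 180 degree rotation about center court, so that every event is expressed relative to the same basket. The sketch below uses made-up points already converted to feet. `fold_to_half_court` is a hypothetical helper and not part of this notebook.

```python
import pandas as pd

def fold_to_half_court(df, length=94.0, width=50.0):
    """Rotate events from the right half (X_ > length / 2) onto the left half."""
    out = df.copy()
    far = out['X_'] > length / 2
    # A 180 degree rotation about center court keeps left/right wings consistent.
    out.loc[far, 'X_'] = length - out.loc[far, 'X_']
    out.loc[far, 'Y_'] = width - out.loc[far, 'Y_']
    return out

# Made-up points: the second one mirrors the first through center court.
pts = pd.DataFrame({'X_': [10.0, 84.0], 'Y_': [5.0, 45.0]})
folded = fold_to_half_court(pts)
print(folded)
```

The folded points could then be drawn with `create_ncaa_full_court` followed by `ax.set_xlim(0, 47)` to show only one half of the court.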

# Reference
1. Court Lines code inspired by code made for plotting the NBA court. http://savvastjortjoglou.com/nba-shot-sharts.html
2. Official NCAA Basketball Court Dimensions:

![](https://og4sg2f1jmu2x9xay48pj5z1-wpengine.netdna-ssl.com/wp-content/uploads/2019/06/NCAA-Mens-and-Womens-Basketball-Court-Diagram-3-point-line-extended-2019.png)