# Homework

Consider the conditional probability that a shot results in a goal given the score differential state,

$$p(g_s| X_s) = \frac{p(X_s|g_s) p(g_s)}{ p(X_s | g_s) p(g_s) +  p(X_s | \bar{g_s}) p( \bar{g_s}) }$$

where $X_s$ is the score differential, from -3 to 3.
1. Use Bayes Rule to calculate the conditional probabilities that a shot results in a goal given a tied game.
    - Generate variables: isgoal, istied 
    - Calculate the unconditional probability (prior) that a shot results in a goal
    - Calculate the unconditional probability (prior) that a shot does not result in a goal
    - Calculate inverse conditional probability of a score differential state given a goal
    - Calculate inverse conditional probability of a score differential state given not a goal
    - Use Bayes Rule to calculate conditional probability that a shot results in a goal given a tied game
    
2. Show that the conditional probability from the data is the same as with Bayes Rule.

3. Suppose a player has a better-than-average shot. Use Bayes Rule to calculate the conditional probability that a shot results in a goal given a tied game, increasing the prior unconditional probability that a shot results in a goal to reflect his better-than-average shooting ability.
    - Specifically increase the prior probability to 0.10, 0.15, and 0.20.
    - Show the change in the posterior probability when prior is increased


In [1]:
%matplotlib inline
import os
import sys
import pandas
import numpy
import matplotlib
import matplotlib.pyplot as plt
pandas.set_option('display.notebook_repr_html', False)
pandas.set_option('display.max_columns', 30)
pandas.set_option('display.max_rows', 25)
pandas.set_option('display.precision', 2)  # full option name; the bare 'precision' key is ambiguous in newer pandas

from decimal import getcontext, Decimal
# Set the precision.
getcontext().prec = 3

## Import Data

In [2]:
print('working directory:', os.getcwd())
dm = pandas.read_csv('shots.csv')
dm.info()

('working directory:', '/Users/MattyB/Desktop/Data Sets/4P94-master/03-bayes_rule')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 59426 entries, 0 to 59425
Data columns (total 25 columns):
season                  59426 non-null int64
gamenumber              59426 non-null int64
gamedate                59426 non-null object
vteamcode               59426 non-null object
hteamcode               59426 non-null object
eventnumber             59426 non-null int64
period                  59426 non-null int64
eventtimefromzero       59426 non-null int64
advantagetypeshooter    59426 non-null object
advantagetypegoalie     59426 non-null object
subseasontype           59426 non-null object
teamcode                59426 non-null object
isTHome                 59426 non-null int64
eventtype               59426 non-null object
playernumber            59426 non-null int64
position                59426 non-null object
namegoalie              59426 non-null object
tgoals                  59426 no

In [3]:
dm.head()

   season  gamenumber   gamedate vteamcode hteamcode  eventnumber  period  \
0    2011       20001  10/6/2011       PHI       BOS            4       1   
1    2011       20001  10/6/2011       PHI       BOS            9       1   
2    2011       20001  10/6/2011       PHI       BOS           14       1   
3    2011       20001  10/6/2011       PHI       BOS           25       1   
4    2011       20001  10/6/2011       PHI       BOS           29       1   

   eventtimefromzero advantagetypeshooter advantagetypegoalie subseasontype  \
0                 47                   EV                  EV           REG   
1                114                   EV                  EV           REG   
2                138                   EV                  EV           REG   
3                249                   EV                  EV           REG   
4                297                   EV                  EV           REG   

  teamcode  isTHome eventtype  playernumber position      name

## Examine Data

In [4]:
dm.columns

Index([u'season', u'gamenumber', u'gamedate', u'vteamcode', u'hteamcode',
       u'eventnumber', u'period', u'eventtimefromzero',
       u'advantagetypeshooter', u'advantagetypegoalie', u'subseasontype',
       u'teamcode', u'isTHome', u'eventtype', u'playernumber', u'position',
       u'namegoalie', u'tgoals', u'ogoals', u'zone', u'X', u'Y', u'XNorm',
       u'YNorm', u'shotType'],
      dtype='object')

In [5]:
dm.describe()

        season  gamenumber  eventnumber    period  eventtimefromzero  \
count  59426.0    59426.00     59426.00  59426.00           59426.00   
mean    2011.0    20621.62       154.60      2.03             584.55   
std        0.0      353.97        89.82      0.85             347.96   
min     2011.0    20001.00         3.00      1.00               3.00   
25%     2011.0    20317.00        77.00      1.00             277.00   
50%     2011.0    20624.00       154.00      2.00             573.00   
75%     2011.0    20927.00       229.00      3.00             886.00   
max     2011.0    21230.00       392.00      4.00            1200.00   

        isTHome  playernumber    tgoals    ogoals         X         Y  \
count  59426.00      59426.00  59426.00  59426.00  59426.00  59426.00   
mean       0.49         29.91      1.31      1.18      0.27     -0.21   
std        0.50         13.60      1.37      1.28     63.27     19.46   
min        0.00          1.00      0.00      0.00    -99.00

#### Create Variables
- seconds from start of game
- score margin (adjust to within 3 goals)
- create indicator variables to represent categories
    - shot resulted in a goal
    - shot types
- use numpy.where to build the indicators and to cap the score margin

In [6]:
dm['secStart'] = ((dm['period']-1)*1200 + dm['eventtimefromzero'])
dm = dm.sort_values(by=['season', 'gamenumber', 'secStart'], ascending=[1, 1, 1])
dm['minStart'] = (dm['secStart']/60).astype(int)

In [7]:
dm['dscore'] = dm['tgoals'] - dm['ogoals']

In [8]:
dm.head()

   season  gamenumber   gamedate vteamcode hteamcode  eventnumber  period  \
0    2011       20001  10/6/2011       PHI       BOS            4       1   
1    2011       20001  10/6/2011       PHI       BOS            9       1   
2    2011       20001  10/6/2011       PHI       BOS           14       1   
3    2011       20001  10/6/2011       PHI       BOS           25       1   
4    2011       20001  10/6/2011       PHI       BOS           29       1   

   eventtimefromzero advantagetypeshooter advantagetypegoalie subseasontype  \
0                 47                   EV                  EV           REG   
1                114                   EV                  EV           REG   
2                138                   EV                  EV           REG   
3                249                   EV                  EV           REG   
4                297                   EV                  EV           REG   

  teamcode  isTHome eventtype  playernumber position      name

In [9]:
dm['dscore'].describe()

count    59426.00
mean         0.13
std          1.52
min         -9.00
25%         -1.00
50%          0.00
75%          1.00
max          9.00
Name: dscore, dtype: float64

In [10]:
dm['dscore'] =  numpy.where(dm['dscore']>2,  2 , dm['dscore'] )
dm['dscore'] =  numpy.where(dm['dscore']<-2, -2 , dm['dscore'] )
dm['dscore'].describe()

count    59426.00
mean         0.11
std          1.18
min         -2.00
25%         -1.00
50%          0.00
75%          1.00
max          2.00
Name: dscore, dtype: float64
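
The two `numpy.where` calls above cap the score margin at -2 and 2. As a minimal sketch, the same capping can be written in one step with `Series.clip`. The series `raw` below holds made-up values chosen only to span the observed range of -9 to 9; it is not taken from the data set.

```python
import numpy
import pandas

# Made-up score differentials for illustration only (not from shots.csv).
raw = pandas.Series([-9, -3, -2, -1, 0, 1, 2, 3, 9])

# Two-step capping, as in the notebook cell above.
step = numpy.where(raw > 2, 2, raw)
step = numpy.where(step < -2, -2, step)

# One-step equivalent: values below -2 become -2, values above 2 become 2.
clipped = raw.clip(lower=-2, upper=2)

print(clipped.tolist())
print(bool((clipped.to_numpy() == step).all()))
```

Both approaches leave values already inside the range unchanged, so either could be used for `dm['dscore']`.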

### #1 Bayes Rule

In [11]:
# Create Indicator Variable
dm['isgoal'] = numpy.where(dm['eventtype']== 'GOAL', 1, 0)
dm['istied'] = numpy.where(dm['dscore']==0, 1, 0)

In [12]:
dm['eventtype'].value_counts()

SHOT    54724
GOAL     4702
Name: eventtype, dtype: int64

In [13]:
dm['isgoal'].value_counts()

0    54724
1     4702
Name: isgoal, dtype: int64

In [14]:
dm['isgoal'].mean()

0.07912361592568909

- $p(g_s)=0.079$
- $p( \bar{g_s})=0.921$

In [15]:
dm.groupby(['dscore'])['eventtype'].value_counts()

dscore  eventtype
-2      SHOT          6082
        GOAL           598
-1      SHOT          8902
        GOAL           822
 0      SHOT         20850
        GOAL          1702
 1      SHOT         10725
        GOAL           891
 2      SHOT          8165
        GOAL           689
Name: eventtype, dtype: int64

In [16]:
dm.groupby(['dscore'])['isgoal'].mean()

dscore
-2    0.09
-1    0.08
 0    0.08
 1    0.08
 2    0.08
Name: isgoal, dtype: float64

In [17]:
dg = dm[dm['isgoal']==1]
dn = dm[dm['isgoal']==0]

Probability that a shot results in a goal given that the score is tied:
$$p(g_s| X_s) = \frac{p(X_s|g_s) p(g_s)}{ p(X_s | g_s) p(g_s) +  p(X_s | \bar{g_s}) p( \bar{g_s}) }$$


In [18]:
p_goal = dm['isgoal'].mean()
p_goal.round(2)

0.080000000000000002

In [19]:
p_notgoal = 1 - dm['isgoal'].mean()
p_notgoal.round(2)

0.92000000000000004

In [20]:
p_tied_goal = dg['istied'].mean()
p_tied_goal.round(2)

0.35999999999999999

In [21]:
p_tied_notgoal = dn['istied'].mean()
p_tied_notgoal.round(2)

0.38

In [22]:
p_goal_tied = p_tied_goal*p_goal / (p_tied_goal*p_goal + p_tied_notgoal*p_notgoal)
p_goal_tied.round(2)

0.080000000000000002
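
Rounding to two decimals makes the prior and the posterior look identical. The sketch below repeats the calculation from the counts printed by `value_counts()` above and keeps four decimals, so the small gap between them is visible. The function name `bayes_posterior` and the variable names are introduced here for illustration and are not part of the original notebook.

```python
# Counts copied from the value_counts() outputs above.
# Floats are used so the division behaves the same in Python 2 and 3.
goals_total, misses_total = 4702.0, 54724.0   # GOAL and SHOT events overall
goals_tied, misses_tied = 1702.0, 20850.0     # GOAL and SHOT events at dscore == 0


def bayes_posterior(lik_goal, lik_notgoal, prior_goal):
    """Return p(goal | state) from the two likelihoods and the prior."""
    numerator = lik_goal * prior_goal
    return numerator / (numerator + lik_notgoal * (1.0 - prior_goal))


prior = goals_total / (goals_total + misses_total)   # p(g_s)
lik_goal = goals_tied / goals_total                  # p(X_s | g_s)
lik_notgoal = misses_tied / misses_total             # p(X_s | not g_s)
posterior = bayes_posterior(lik_goal, lik_notgoal, prior)

print(round(prior, 4), round(lik_goal, 4), round(lik_notgoal, 4), round(posterior, 4))
```

At four decimals the posterior for a tied game comes out slightly below the prior, which is consistent with a tied score being a little less common among goals than among non-goals.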

### #2 Show Conditional Probability is the Same as Bayes Rule

In [23]:
dm.groupby(['istied'])['isgoal'].mean()

istied
0    0.08
1    0.08
Name: isgoal, dtype: float64

In [24]:
p_goal_tied.round(2)  # Bayes Rule result from In [22], to compare with the istied == 1 row above

0.080000000000000002
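
One way to see why the two numbers must agree is to write every probability as a ratio of counts. The notation $N(\cdot)$ for a count of shots is introduced here and is not part of the assignment: $N$ is the total number of shots, $N(g_s)$ the number of goals, and $N(X_s, g_s)$ the number of goals scored in state $X_s$.

$$p(X_s|g_s)\, p(g_s) = \frac{N(X_s, g_s)}{N(g_s)} \cdot \frac{N(g_s)}{N} = \frac{N(X_s, g_s)}{N}, \qquad p(X_s|\bar{g_s})\, p(\bar{g_s}) = \frac{N(X_s, \bar{g_s})}{N}$$

Substituting both terms into Bayes Rule, the factor $N$ cancels:

$$p(g_s| X_s) = \frac{N(X_s, g_s)}{N(X_s, g_s) + N(X_s, \bar{g_s})} = \frac{1702}{1702 + 20850} \approx 0.0755$$

This is the share of tied-game shots that were goals, which is what the `groupby` on `istied` computes. The identity relies on the prior and both likelihoods coming from the same data; it no longer holds once the prior is set by hand, as in #3.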

### #3 Test Bayes Rule with Different Prior Probability

In [25]:
# When prior probability = 0.10
p_goal = 0.10

In [26]:
p_notgoal = 0.9

In [27]:
p_goal_tied = p_tied_goal*p_goal / (p_tied_goal*p_goal + p_tied_notgoal*p_notgoal)
p_goal_tied.round(2)

0.10000000000000001

In [28]:
# When prior probability = 0.15
p_goal = 0.15
p_notgoal = 0.85

In [29]:
p_goal_tied = p_tied_goal*p_goal / (p_tied_goal*p_goal + p_tied_notgoal*p_notgoal)
p_goal_tied.round(2)

0.14000000000000001

In [30]:
# When prior probability = 0.20
p_goal = 0.20
p_notgoal = 0.8

In [31]:
p_goal_tied = p_tied_goal*p_goal / (p_tied_goal*p_goal + p_tied_notgoal*p_notgoal)
p_goal_tied.round(2)

0.19

- When $p(g_s)$ was originally 0.08, $p(g_s| X_s)$ = 0.08
- When $p(g_s)$ was increased to 0.10, $p(g_s| X_s)$ increased to 0.10
- When $p(g_s)$ was increased to 0.15, $p(g_s| X_s)$ increased to 0.14
- When $p(g_s)$ was increased to 0.20, $p(g_s| X_s)$ increased to 0.19
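
The pattern in these results can also be read through the odds form of Bayes Rule: posterior odds equal prior odds multiplied by the likelihood ratio $p(X_s|g_s) / p(X_s|\bar{g_s})$. The sketch below is an illustrative reformulation, not the method used in the cells above. It takes the likelihoods from the counts printed earlier and loops over the three priors; the names `likelihood_ratio` and `posterior_from_prior` are introduced here.

```python
# Likelihoods for the tied state, from the counts printed earlier.
# Floats keep the division consistent across Python 2 and 3.
lik_goal = 1702.0 / 4702.0        # p(tied | goal)
lik_notgoal = 20850.0 / 54724.0   # p(tied | not goal)
likelihood_ratio = lik_goal / lik_notgoal


def posterior_from_prior(prior):
    """Update a prior probability through the odds form of Bayes Rule."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


print('likelihood ratio:', round(likelihood_ratio, 3))
for prior in (0.10, 0.15, 0.20):
    print(prior, round(posterior_from_prior(prior), 3))
```

Because the likelihood ratio is a little below 1, every posterior lands slightly below its prior, and the gap in probability terms widens as the prior grows.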