# Introduction to probability models

# Objectives
1. Understand the difference between conditional and unconditional probabilities
2. State Bayes Rule in probability and odds form
3. Demonstrate understanding of prior and inverse conditional probabilities
4. Demonstrate understanding of the Bayes Factor
5. Use Bayes Rule to calculate conditional probabilities

Conditional probability that a shot results in a goal

# Bayes Rule

$$P(g_s| X_s) = \frac{P(X_s|g_s) P(g_s)}{ P(X_s | g_s) P(g_s) +  P(X_s | \bar{g_s}) P( \bar{g_s}) }$$

where,

$X_s$: game factors (e.g., score differential, period, game state, shot type)

$P(g_s| X_s) $ : posterior (conditional) probability of a goal given the observed factors $X_s$

$P(g_s) $ : prior (unconditional) probability of a goal

$P(\bar{g_s}) $: prior (unconditional) probability that the shot is not a goal

$P(X_s|g_s) $: inverse conditional probability of observing $X_s$ given a goal

$P(X_s|\bar{g_s}) $: inverse conditional probability of observing $X_s$ given not a goal
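
The formula above can be sketched as a small helper. This is only an illustration: the function name `bayes_posterior` and its argument names are hypothetical, and the input values are made up for the example rather than taken from the shot data.

```python
def bayes_posterior(p_goal, p_x_given_goal, p_x_given_notgoal):
    """Return P(goal | X) from the prior and the two inverse conditionals."""
    p_notgoal = 1.0 - p_goal
    numerator = p_x_given_goal * p_goal
    denominator = numerator + p_x_given_notgoal * p_notgoal
    return numerator / denominator

# Illustrative inputs only (hypothetical, not from shots.csv).
posterior = bayes_posterior(p_goal=0.10, p_x_given_goal=0.50, p_x_given_notgoal=0.25)
print(round(posterior, 4))  # 0.1818

# When X is equally likely for goals and non-goals, the posterior equals the prior.
same = bayes_posterior(p_goal=0.30, p_x_given_goal=0.40, p_x_given_notgoal=0.40)
print(round(same, 4))  # 0.3
```

The second call shows why an event that is equally common among goals and non-goals carries no information about scoring.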

In [111]:
%matplotlib inline
import os
import sys
import pandas
import numpy
import matplotlib
import matplotlib.pyplot as plt
pandas.set_option('display.notebook_repr_html', False)
pandas.set_option('display.max_columns', 20)
pandas.set_option('display.max_rows', 25)
# use the full option name; avoids ambiguity with other precision options in newer pandas
pandas.set_option('display.precision', 2)

from decimal import getcontext, Decimal
# Set the precision.
getcontext().prec = 3

# import data

In [112]:
print('working directory: ', os.getcwd())
dm = pandas.read_csv('shots.csv')
dm.info()

('working directory: ', '/home/vmuser/Documents/4P94/03-bayes_rule')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 59426 entries, 0 to 59425
Data columns (total 25 columns):
season                  59426 non-null int64
gamenumber              59426 non-null int64
gamedate                59426 non-null object
vteamcode               59426 non-null object
hteamcode               59426 non-null object
eventnumber             59426 non-null int64
period                  59426 non-null int64
eventtimefromzero       59426 non-null int64
advantagetypeshooter    59426 non-null object
advantagetypegoalie     59426 non-null object
subseasontype           59426 non-null object
teamcode                59426 non-null object
isTHome                 59426 non-null int64
eventtype               59426 non-null object
playernumber            59426 non-null int64
position                59426 non-null object
namegoalie              59426 non-null object
tgoals                  59426 non-null int64
og

# examine data

In [113]:
dm.head()

   season  gamenumber   gamedate vteamcode hteamcode  eventnumber  period  \
0    2011       20001  10/6/2011       PHI       BOS            4       1   
1    2011       20001  10/6/2011       PHI       BOS            9       1   
2    2011       20001  10/6/2011       PHI       BOS           14       1   
3    2011       20001  10/6/2011       PHI       BOS           25       1   
4    2011       20001  10/6/2011       PHI       BOS           29       1   

   eventtimefromzero advantagetypeshooter advantagetypegoalie   ...     \
0                 47                   EV                  EV   ...      
1                114                   EV                  EV   ...      
2                138                   EV                  EV   ...      
3                249                   EV                  EV   ...      
4                297                   EV                  EV   ...      

  position      namegoalie  tgoals ogoals  zone   X   Y  XNorm  YNorm shotType  
0        G 

In [114]:
dm.columns

Index([u'season', u'gamenumber', u'gamedate', u'vteamcode', u'hteamcode',
       u'eventnumber', u'period', u'eventtimefromzero',
       u'advantagetypeshooter', u'advantagetypegoalie', u'subseasontype',
       u'teamcode', u'isTHome', u'eventtype', u'playernumber', u'position',
       u'namegoalie', u'tgoals', u'ogoals', u'zone', u'X', u'Y', u'XNorm',
       u'YNorm', u'shotType'],
      dtype='object')

In [115]:
dm.describe()

        season  gamenumber  eventnumber    period  eventtimefromzero  \
count  59426.0    59426.00     59426.00  59426.00           59426.00   
mean    2011.0    20621.62       154.60      2.03             584.55   
std        0.0      353.97        89.82      0.85             347.96   
min     2011.0    20001.00         3.00      1.00               3.00   
25%     2011.0    20317.00        77.00      1.00             277.00   
50%     2011.0    20624.00       154.00      2.00             573.00   
75%     2011.0    20927.00       229.00      3.00             886.00   
max     2011.0    21230.00       392.00      4.00            1200.00   

        isTHome  playernumber    tgoals    ogoals         X         Y  \
count  59426.00      59426.00  59426.00  59426.00  59426.00  59426.00   
mean       0.49         29.91      1.31      1.18      0.27     -0.21   
std        0.50         13.60      1.37      1.28     63.27     19.46   
min        0.00          1.00      0.00      0.00    -99.00

# create variables

* seconds from start of game
* score margin (capped to within 3 goals)
* indicator variables to represent categories

    * shot resulted in a goal
    * shot types

* numpy.where is used for the if/else assignments

In [116]:
dm['secStart'] = ((dm['period']-1)*1200 + dm['eventtimefromzero'])
dm = dm.sort_values(by=['season', 'gamenumber', 'secStart'], ascending=[1, 1, 1])
dm['minStart'] = (dm['secStart']/60).astype(int)
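
As a quick check of the arithmetic in the cell above, here is the same conversion for a single event in plain Python. The helper name `seconds_from_start` and the sample event (period 2, 47 seconds into the period) are assumptions made for illustration.

```python
PERIOD_SECONDS = 1200  # period length used in the secStart formula above

def seconds_from_start(period, eventtimefromzero):
    """Seconds elapsed since the start of the game."""
    return (period - 1) * PERIOD_SECONDS + eventtimefromzero

# Hypothetical event: 47 seconds into the second period.
sec = seconds_from_start(period=2, eventtimefromzero=47)
minute = sec // 60  # floor division matches astype(int) for non-negative values

print(sec)     # 1247
print(minute)  # 20
```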

In [117]:
dm['dscore'] = dm['tgoals'] - dm['ogoals']

In [118]:
dm['dscore'].describe()

count    59426.00
mean         0.13
std          1.52
min         -9.00
25%         -1.00
50%          0.00
75%          1.00
max          9.00
Name: dscore, dtype: float64

In [119]:
# use numpy where command for if else statement
# numpy.where(if ,  then , else )
dm['dscore'] =  numpy.where(dm['dscore']>3,  3 , dm['dscore'] )
dm['dscore'] =  numpy.where(dm['dscore']<-3, -3 , dm['dscore'] )
dm['dscore'].describe()

count    59426.00
mean         0.12
std          1.37
min         -3.00
25%         -1.00
50%          0.00
75%          1.00
max          3.00
Name: dscore, dtype: float64
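
The two `numpy.where` calls above cap the score margin at plus or minus 3. A minimal sketch on a hypothetical array of margins (not the shot data) shows the effect, and that `numpy.clip` appears to give the same result in one call.

```python
import numpy

# Hypothetical score margins, chosen to include values outside [-3, 3].
margin = numpy.array([-9, -4, -3, -1, 0, 2, 3, 5, 9])

# Two-step numpy.where(condition, value_if_true, value_if_false), as in the cell above.
capped = numpy.where(margin > 3, 3, margin)
capped = numpy.where(capped < -3, -3, capped)

# Single-call alternative.
clipped = numpy.clip(margin, -3, 3)

print(capped.tolist())                      # [-3, -3, -3, -1, 0, 2, 3, 3, 3]
print(numpy.array_equal(capped, clipped))   # True
```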

In [120]:
dm['eventtype'].value_counts()

SHOT    54724
GOAL     4702
Name: eventtype, dtype: int64

In [121]:
dm['isgoal'] = numpy.where(dm['eventtype']=='GOAL', 1 , 0)

In [122]:
dm['eventtype'].value_counts()

SHOT    54724
GOAL     4702
Name: eventtype, dtype: int64

In [123]:
dm['isgoal'].value_counts()
dm['isgoal'].mean()


0.07912361592568909

Probability that a shot results in a goal given a wrist shot

$$P(g_s| wrist_s) = \frac{P(wrist_s|g_s) P(g_s)}{ P(wrist_s | g_s) P(g_s) +  P(wrist_s | \bar{g_s}) P( \bar{g_s}) }$$


* Show that Bayes Rule and the conditional probabilities computed directly from the data give the same result
* Note: the mean of a 0/1 indicator variable is the share of ones, so it estimates the probability of the event
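
To see why the mean of an indicator is a probability, the sketch below rebuilds the goal indicator from the GOAL and SHOT counts printed earlier and averages it. The list-based construction is only for illustration; the notebook itself works on the `isgoal` column.

```python
# Counts taken from the eventtype tabulation above.
n_goal, n_shot = 4702, 54724

# 1 for a shot that scored, 0 for a shot that did not.
isgoal = [1] * n_goal + [0] * n_shot

# Mean of the indicator = number of ones / number of observations.
p_goal = sum(isgoal) / float(len(isgoal))
print(round(p_goal, 4))  # 0.0791
```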

In [124]:
dm['shotType'].value_counts()

Wrist          29240
Slap           12255
Snap            8421
Backhand        5277
Tip-In          2395
Wrap-around      952
Deflected        886
Name: shotType, dtype: int64

In [125]:
dm['iswrist'] = numpy.where(dm['shotType']=='Wrist', 1 , 0)
dm['isslap']  = numpy.where(dm['shotType']=='Slap', 1 , 0)

In [126]:
dm.groupby(['shotType'])['eventtype'].value_counts()

shotType     eventtype
Backhand     SHOT          4761
             GOAL           516
Deflected    SHOT           713
             GOAL           173
Slap         SHOT         11707
             GOAL           548
Snap         SHOT          7734
             GOAL           687
Tip-In       SHOT          1959
             GOAL           436
Wrap-around  SHOT           903
             GOAL            49
Wrist        SHOT         26947
             GOAL          2293
Name: eventtype, dtype: int64

In [127]:
dm.groupby(['dscore'])['eventtype'].value_counts()

dscore  eventtype
-3      SHOT          2315
        GOAL           239
-2      SHOT          3767
        GOAL           359
-1      SHOT          8902
        GOAL           822
 0      SHOT         20850
        GOAL          1702
 1      SHOT         10725
        GOAL           891
 2      SHOT          5089
        GOAL           433
 3      SHOT          3076
        GOAL           256
Name: eventtype, dtype: int64

In [128]:
dm.groupby(['shotType'])['isgoal'].mean()

shotType
Backhand       0.10
Deflected      0.20
Slap           0.04
Snap           0.08
Tip-In         0.18
Wrap-around    0.05
Wrist          0.08
Name: isgoal, dtype: float64

In [129]:
dm.groupby(['isgoal'])['iswrist'].mean()


isgoal
0    0.49
1    0.49
Name: iswrist, dtype: float64

In [130]:
dm.groupby(['isgoal'])['isslap'].mean()

isgoal
0    0.21
1    0.12
Name: isslap, dtype: float64

In [131]:
dm.groupby(['dscore'])['isgoal'].mean()

dscore
-3    0.09
-2    0.09
-1    0.08
 0    0.08
 1    0.08
 2    0.08
 3    0.08
Name: isgoal, dtype: float64

Unconditional probability that a shot results in a goal $P(g_s) $

In [132]:
p_goal = dm['isgoal'].mean()
p_goal.round(2)

0.080000000000000002

Unconditional probability that a shot does not result in a goal $P(\bar{g_s})$

In [133]:
p_notgoal = 1- p_goal
p_notgoal.round(2)

0.92000000000000004

Create conditional data frames to calculate inverse conditional event probabilities

In [134]:
dg = dm[dm['isgoal']==1]
dn = dm[dm['isgoal']==0]

Probability that a goal was scored with a wrist shot $P(wrist_s|g_s)$

In [135]:
p_wrist_goal = dg['iswrist'].mean()
p_wrist_goal.round(2)

0.48999999999999999

Probability that a shot that did not result in a goal was a wrist shot $P(wrist_s|\bar{g_s})$

In [136]:
p_wrist_notgoal = dn['iswrist'].mean()
p_wrist_notgoal.round(2)

0.48999999999999999

# Bayes Rule

Probability that a shot results in a goal given a wrist shot

$$P(g_s| wrist_s) = \frac{P(wrist_s|g_s) P(g_s)}{ P(wrist_s | g_s) P(g_s) +  P(wrist_s | \bar{g_s}) P( \bar{g_s}) }$$


In [137]:
p_goal_wrist = p_wrist_goal*p_goal / (p_wrist_goal*p_goal + p_wrist_notgoal*p_notgoal)
p_goal_wrist.round(2)


0.080000000000000002

# Conditional probability in the data

In [138]:
p_wrist_data = dm[dm['shotType']=='Wrist']['isgoal'].mean()
p_wrist_data.round(2)


0.080000000000000002
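
The agreement between the Bayes Rule result and the direct conditional mean is not a coincidence of rounding. A minimal sketch using only the wrist-shot and total counts printed earlier reproduces both numbers without the data file; the variable names are chosen here for illustration.

```python
# Counts from the shotType / eventtype tabulations above.
wrist_goal, wrist_shot = 2293, 26947
n_goal, n_shot = 4702, 54724
n_total = n_goal + n_shot

# Prior and inverse conditional probabilities.
p_goal = n_goal / float(n_total)
p_notgoal = 1 - p_goal
p_wrist_goal = wrist_goal / float(n_goal)
p_wrist_notgoal = wrist_shot / float(n_shot)

# Bayes Rule versus the direct share of wrist shots that scored.
bayes = p_wrist_goal * p_goal / (p_wrist_goal * p_goal + p_wrist_notgoal * p_notgoal)
direct = wrist_goal / float(wrist_goal + wrist_shot)

print(round(bayes, 4))   # 0.0784
print(round(direct, 4))  # 0.0784
```

The numerator reduces to `wrist_goal / n_total` and the denominator to `(wrist_goal + wrist_shot) / n_total`, so the two quantities are algebraically identical.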

Probability that a shot results in a goal given a slap shot

$$P(g_s| slap_s) = \frac{P(slap_s|g_s) P(g_s)}{ P(slap_s | g_s) P(g_s) +  P(slap_s | \bar{g_s}) P( \bar{g_s}) }$$


In [139]:
p_slap_goal = dg['isslap'].mean()
p_slap_goal.round(2)

0.12

In [140]:
p_slap_notgoal = dn['isslap'].mean()
p_slap_notgoal.round(2)

0.20999999999999999

In [141]:
p_goal_slap = p_slap_goal*p_goal / (p_slap_goal*p_goal + p_slap_notgoal*p_notgoal  )
p_goal_slap.round(2)

0.040000000000000001
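
The objectives also mention the odds form and the Bayes Factor, which this section does not compute. The sketch below assumes the standard relationship, posterior odds = prior odds × Bayes Factor, and applies it to the slap-shot counts printed earlier; the variable names are illustrative.

```python
# Counts from the shotType / eventtype tabulations above.
slap_goal, slap_shot = 548, 11707
n_goal, n_shot = 4702, 54724

# Prior odds of a goal: P(goal) / P(not goal).
prior_odds = n_goal / float(n_shot)

# Bayes Factor: P(slap | goal) / P(slap | not goal).
bayes_factor = (slap_goal / float(n_goal)) / (slap_shot / float(n_shot))

# Odds form of Bayes Rule, then convert odds back to a probability.
posterior_odds = prior_odds * bayes_factor
p_goal_slap = posterior_odds / (1 + posterior_odds)

print(round(bayes_factor, 2))  # 0.54
print(round(p_goal_slap, 3))   # 0.045
```

A Bayes Factor below 1 means that observing a slap shot lowers the odds of a goal relative to the prior, consistent with the 0.04 result above.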

In [142]:
dm.groupby(['shotType'])['isgoal'].mean()

shotType
Backhand       0.10
Deflected      0.20
Slap           0.04
Snap           0.08
Tip-In         0.18
Wrap-around    0.05
Wrist          0.08
Name: isgoal, dtype: float64
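
The same Bayes Rule calculation can be repeated for every shot type. This sketch hard-codes the GOAL and SHOT counts from the grouped tabulation printed earlier, so it runs without the data file; the dictionary layout and names are assumptions made for illustration.

```python
# (goals, non-scoring shots) per shot type, from the grouped counts above.
counts = {
    'Backhand': (516, 4761),
    'Deflected': (173, 713),
    'Slap': (548, 11707),
    'Snap': (687, 7734),
    'Tip-In': (436, 1959),
    'Wrap-around': (49, 903),
    'Wrist': (2293, 26947),
}

n_goal = sum(g for g, s in counts.values())
n_shot = sum(s for g, s in counts.values())
p_goal = n_goal / float(n_goal + n_shot)
p_notgoal = 1 - p_goal

posterior = {}
for shot_type, (goals, shots) in counts.items():
    p_type_goal = goals / float(n_goal)      # P(type | goal)
    p_type_notgoal = shots / float(n_shot)   # P(type | not goal)
    posterior[shot_type] = (p_type_goal * p_goal /
                            (p_type_goal * p_goal + p_type_notgoal * p_notgoal))

for shot_type in sorted(posterior):
    print(shot_type, round(posterior[shot_type], 2))
```

Rounded to two decimals, these values should match the grouped means shown above.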

Probability of a goal given that the shot type is backhand