# Value of on-ice events

<p>Data frames used in this notebook:</p>
<p>&nbsp; &nbsp; 1. all on-ice events.</p>
<p>&nbsp; &nbsp; 2. even strength on-ice events.</p>
<p>&nbsp; &nbsp; 3. all on-ice events prior to a goal.</p>
<p>&nbsp; &nbsp; 4. even strength on-ice events prior to a goal.</p>
<p>&nbsp; &nbsp; 5. all on-ice events that occurred while the goal differential was between -1 and 1.</p>
<p>&nbsp; &nbsp; 6. even strength on-ice events that occurred while the goal differential was between -1 and 1.</p>

## 1) all on-ice events data

In [294]:
import sys
import os
import pandas as pd
import numpy as np
import datetime, time
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols
from pylab import hist, show
import scipy
import zipfile


pd.set_option('display.max_rows', 50)
pd.set_option('display.max_columns', 200)

In [295]:
pwd

'/Users/stefanostselios/Desktop/nhl_roster_design-master'

In [296]:
da = pd.read_csv('/Users/stefanostselios/Brock University/Kevin Mongeon - StephanosShare/out/pbp_merged.csv')
#da = pd.read_csv('/Users/kevinmongeon/Brock University/Steve Tselios - StephanosShare/out/pbp_merged.csv')
da = da.drop('Unnamed: 0', axis=1)
da = da.rename(columns={'TeamCode': 'EventTeamCode'})

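- keep regular-season games (game numbers up to 21230) and relevant on-ice events in **regulation time** (periods 1 to 3), dropping STOP, EISTR, EIEND and FIGHT events and rows without an event number. Then drop duplicates by season, game number, event number and event team so there is one observation per event per game.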

In [297]:
da = da[da['GameNumber'] <= 21230]
da = da[da['Period'] <= 3]
da = da[da['Period'] >= 1]
da = da[da['EventType']!='STOP']
da = da[da['EventType']!='EISTR']
da = da[da['EventType']!='EIEND']
da = da[da['EventType'] !='FIGHT']
da = da.dropna(subset=['EventNumber'])
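
The filters above can also be combined into one boolean mask. A minimal sketch on a made-up frame (the rows are hypothetical, not taken from pbp_merged.csv), assuming the same column names and the 21230 regular-season cutoff used above:

```python
import numpy as np
import pandas as pd

# Hypothetical play-by-play rows; only the columns used by the filter are included.
pbp = pd.DataFrame({
    'GameNumber':  [20001, 20001, 20001, 20001, 30111, 20002],
    'Period':      [1, 2, 4, 3, 1, 2],
    'EventType':   ['FAC', 'STOP', 'SHOT', 'GOAL', 'HIT', 'HIT'],
    'EventNumber': [1.0, 2.0, 3.0, np.nan, 5.0, 6.0],
})

excluded = ['STOP', 'EISTR', 'EIEND', 'FIGHT']    # event types dropped in the cell above
mask = (
    (pbp['GameNumber'] <= 21230)                  # regular-season games only
    & pbp['Period'].between(1, 3)                 # regulation periods, bounds inclusive
    & ~pbp['EventType'].isin(excluded)
    & pbp['EventNumber'].notna()
)
kept = pbp[mask]
print(kept)
```

Only rows 0 and 5 survive: the others are a stoppage, an overtime period, a missing event number and a game number past the cutoff.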

In [298]:
db = da[['Season', 'GameNumber', 'VTeamCode', 'HTeamCode', 'EventNumber', 'EventType', 'EventTeamCode']]
db = db.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])
db = db.drop_duplicates(['Season', 'GameNumber', 'EventNumber', 'EventTeamCode'])
db.head()

Unnamed: 0,Season,GameNumber,VTeamCode,HTeamCode,EventNumber,EventType,EventTeamCode
0,2010,20001,MTL,TOR,1,FAC,MTL
1,2010,20001,MTL,TOR,3,HIT,TOR
2,2010,20001,MTL,TOR,4,HIT,MTL
3,2010,20001,MTL,TOR,5,HIT,MTL
4,2010,20001,MTL,TOR,6,GIVE,TOR


- assign a value of 1 if an on-ice event is a goal and NaN otherwise, and do the same for blocks, faceoffs, giveaways, hits, misses, penalties, shots and takeaways. Then group by season, game number, event team and event type and sum each indicator to get the number of each on-ice event per team per game.

In [299]:
db['Goal'] = db.apply(lambda x: 1 if (x['EventType'] == 'GOAL') else np.nan, axis=1)
db['Block'] = db.apply(lambda x: 1 if (x['EventType'] == 'BLOCK') else np.nan, axis=1)
db['Faceoff'] = db.apply(lambda x: 1 if (x['EventType'] == 'FAC') else np.nan, axis=1)
db['Giveaway'] = db.apply(lambda x: 1 if (x['EventType'] == 'GIVE') else np.nan, axis=1)
db['Hit'] = db.apply(lambda x: 1 if (x['EventType'] == 'HIT') else np.nan, axis=1)
db['Miss'] = db.apply(lambda x: 1 if (x['EventType'] == 'MISS') else np.nan, axis=1)
db['Penalty'] = db.apply(lambda x: 1 if (x['EventType'] == 'PENL') else np.nan, axis=1)
db['Shot'] = db.apply(lambda x: 1 if (x['EventType'] == 'SHOT') else np.nan, axis=1)
db['Takeaway'] = db.apply(lambda x: 1 if (x['EventType'] == 'TAKE') else np.nan, axis=1)

In [300]:
# min_count=1 keeps all-NaN groups as NaN instead of 0 (the pandas default since 0.22);
# the forward/backward fill further down depends on those NaNs.
db['Blocks'] = db.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Block'].transform('sum', min_count=1)
db['Faceoffs'] = db.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Faceoff'].transform('sum', min_count=1)
db['Giveaways'] = db.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Giveaway'].transform('sum', min_count=1)
db['Goals'] = db.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Goal'].transform('sum', min_count=1)
db['Hits'] = db.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Hit'].transform('sum', min_count=1)
db['Misses'] = db.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Miss'].transform('sum', min_count=1)
db['Penalties'] = db.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Penalty'].transform('sum', min_count=1)
db['Shots'] = db.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Shot'].transform('sum', min_count=1)
db['Takeaways'] = db.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Takeaway'].transform('sum', min_count=1)
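
The indicator columns and group sums above amount to counting events per team per game. A compact sketch of that count on hypothetical rows, assuming one row per event after the de-duplication step (this is an alternative formulation, not the one the notebook runs):

```python
import pandas as pd

# Hypothetical de-duplicated events for one game (one row per event).
events = pd.DataFrame({
    'Season':        [2010] * 6,
    'GameNumber':    [20001] * 6,
    'EventTeamCode': ['MTL', 'TOR', 'MTL', 'MTL', 'TOR', 'TOR'],
    'EventType':     ['FAC', 'HIT', 'HIT', 'SHOT', 'SHOT', 'SHOT'],
})

# Count rows per (season, game, event team, event type), then pivot event types into columns.
counts = (
    events.groupby(['Season', 'GameNumber', 'EventTeamCode', 'EventType'])
          .size()
          .unstack('EventType', fill_value=0)
)
print(counts)
```

Unlike the NaN-based indicators, event types a team never recorded come out as 0 here, which is usually what a per-game count should be.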

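- reshape the data frame from wide to long on the team codes: VTeamCode and HTeamCode are stacked into a single TeamCode column, so each event appears twice, once for each team in the game.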

In [301]:
db = db.rename(columns={'EventTeamCode': 'EventTeam'})
a = [col for col in db.columns if 'TeamCode' in col]
db = pd.lreshape(db, {'TeamCode' : a})
db = db.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])
db = db.rename(columns={'EventTeam': 'EventTeamCode'})
db.head()

Unnamed: 0,Block,Blocks,EventNumber,EventTeamCode,EventType,Faceoff,Faceoffs,GameNumber,Giveaway,Giveaways,Goal,Goals,Hit,Hits,Miss,Misses,Penalties,Penalty,Season,Shot,Shots,Takeaway,Takeaways,TeamCode
0,,,1,MTL,FAC,1.0,23.0,20001,,,,,,,,,,,2010,,,,,MTL
310113,,,1,MTL,FAC,1.0,23.0,20001,,,,,,,,,,,2010,,,,,TOR
1,,,3,TOR,HIT,,,20001,,,,,1.0,27.0,,,,,2010,,,,,MTL
310114,,,3,TOR,HIT,,,20001,,,,,1.0,27.0,,,,,2010,,,,,TOR
2,,,4,MTL,HIT,,,20001,,,,,1.0,34.0,,,,,2010,,,,,MTL
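
For comparison, the same wide-to-long step can be sketched with `DataFrame.melt` on hypothetical rows. One difference worth knowing: `pd.lreshape` drops rows whose stacked value is missing (`dropna=True` by default), while `melt` keeps them.

```python
import pandas as pd

# Hypothetical events with the visiting and home team in separate columns.
events = pd.DataFrame({
    'Season':        [2010, 2010],
    'GameNumber':    [20001, 20001],
    'VTeamCode':     ['MTL', 'MTL'],
    'HTeamCode':     ['TOR', 'TOR'],
    'EventNumber':   [1, 3],
    'EventType':     ['FAC', 'HIT'],
    'EventTeamCode': ['MTL', 'TOR'],
})

long = events.melt(
    id_vars=['Season', 'GameNumber', 'EventNumber', 'EventType', 'EventTeamCode'],
    value_vars=['VTeamCode', 'HTeamCode'],
    var_name='Side',          # which original column the team code came from
    value_name='TeamCode',
).sort_values(['Season', 'GameNumber', 'EventNumber', 'Side'])
print(long)
```

Each event now appears once per team, matching the doubled row count reported by `db.shape`.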


In [302]:
db.shape

(620226, 24)

- drop duplicates by season, game number, team code, event team code and event type, leaving one row per team, event team and event type in each game.

In [303]:
db = db.drop_duplicates(['Season', 'GameNumber', 'TeamCode', 'EventTeamCode', 'EventType'])
db = db [['Season', 'GameNumber', 'TeamCode', 'EventNumber', 'EventType', 'EventTeamCode',  'Blocks', 'Faceoffs', 'Giveaways', 'Goals', 'Hits', 'Misses', 'Penalties', 'Shots', 'Takeaways']]
db.shape

(43756, 15)

- assign all on-ice events to their respective teams. If the team code matches the event team code, the on-ice event counts for that team; otherwise it was made by the opponent and counts against it. Each on-ice event therefore generates two variables per team: For (F) and Against (A).

In [304]:
db['Blocks_F'] = db.apply(lambda x: x['Blocks'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
db['Blocks_A'] = db.apply(lambda x: x['Blocks'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
db['Faceoffs_F'] = db.apply(lambda x: x['Faceoffs'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
db['Faceoffs_A'] = db.apply(lambda x: x['Faceoffs'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
db['Giveaways_F'] = db.apply(lambda x: x['Giveaways'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
db['Giveaways_A'] = db.apply(lambda x: x['Giveaways'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
db['Goals_F'] = db.apply(lambda x: x['Goals'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
db['Goals_A'] = db.apply(lambda x: x['Goals'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
db['Hits_F'] = db.apply(lambda x: x['Hits'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
db['Hits_A'] = db.apply(lambda x: x['Hits'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
db['Miss_F'] = db.apply(lambda x: x['Misses'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
db['Miss_A'] = db.apply(lambda x: x['Misses'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
db['Penalties_F'] = db.apply(lambda x: x['Penalties'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
db['Penalties_A'] = db.apply(lambda x: x['Penalties'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
db['Shots_F'] = db.apply(lambda x: x['Shots'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
db['Shots_A'] = db.apply(lambda x: x['Shots'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
db['Takeaways_F'] = db.apply(lambda x: x['Takeaways'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
db['Takeaways_A'] = db.apply(lambda x: x['Takeaways'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
db = db.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])
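
The row-wise `apply` calls above can be vectorized with `Series.where`, which keeps a value where the condition holds and returns NaN elsewhere. A minimal sketch for one event column on hypothetical rows; the same two assignments would be repeated, or looped, for the other event columns:

```python
import numpy as np
import pandas as pd

# Hypothetical rows after the reshape: one row per (team, event team, event type).
db = pd.DataFrame({
    'TeamCode':      ['MTL', 'TOR', 'MTL', 'TOR'],
    'EventTeamCode': ['MTL', 'MTL', 'TOR', 'TOR'],
    'Hits':          [34.0, 34.0, 27.0, 27.0],
})

own_event = db['TeamCode'] == db['EventTeamCode']
db['Hits_F'] = db['Hits'].where(own_event)     # the team's own hits
db['Hits_A'] = db['Hits'].where(~own_event)    # hits made by the opponent
print(db)
```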

- forward fill and then backfill the on-ice event columns within each season, game number and team code, so every row of a team's game carries that team's per-game totals.

In [305]:
# transform (rather than apply) returns a result aligned with db's index, so it can be
# assigned back directly; ffill then bfill spreads each team's per-game totals over all
# of its rows in that game.
fill_cols = ['Blocks_F', 'Faceoffs_F', 'Giveaways_F', 'Goals_F', 'Hits_F', 'Miss_F', 'Penalties_F', 'Shots_F', 'Takeaways_F',
             'Blocks_A', 'Faceoffs_A', 'Giveaways_A', 'Goals_A', 'Hits_A', 'Miss_A', 'Penalties_A', 'Shots_A', 'Takeaways_A']
for col in fill_cols:
    db[col] = db.groupby(['Season', 'GameNumber', 'TeamCode'])[col].transform(lambda x: x.ffill().bfill())

- keep only the relevant columns and drop duplicates by season, game number and team code, leaving two observations per game (one per team).

In [306]:
db = db[['Season', 'GameNumber', 'TeamCode', 'Blocks_F', 'Blocks_A', 'Faceoffs_F', 'Faceoffs_A', 'Giveaways_F', 'Giveaways_A', 'Goals_F', 'Goals_A', 'Hits_F', 'Hits_A', 'Miss_F', 'Miss_A', 'Penalties_F', 'Penalties_A', 'Shots_F', 'Shots_A', 'Takeaways_F', 'Takeaways_A']]
db = db.sort_values(['Season', 'GameNumber'], ascending=[True, True])
db = db.drop_duplicates(['Season', 'GameNumber', 'TeamCode'])
db.head()

Unnamed: 0,Season,GameNumber,TeamCode,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A
0,2010,20001,MTL,21.0,22.0,23.0,20.0,7.0,16.0,2.0,3.0,34.0,27.0,15.0,9.0,5.0,3.0,26.0,21.0,7.0,6.0
310113,2010,20001,TOR,22.0,21.0,20.0,23.0,16.0,7.0,3.0,2.0,27.0,34.0,9.0,15.0,3.0,5.0,21.0,26.0,6.0,7.0
267,2010,20002,PHI,16.0,14.0,22.0,34.0,9.0,11.0,3.0,2.0,34.0,32.0,10.0,18.0,6.0,5.0,24.0,29.0,1.0,9.0
310380,2010,20002,PIT,14.0,16.0,34.0,22.0,11.0,9.0,2.0,3.0,32.0,34.0,18.0,10.0,5.0,6.0,29.0,24.0,9.0,1.0
546,2010,20003,CAR,19.0,19.0,33.0,52.0,11.0,11.0,4.0,3.0,14.0,19.0,9.0,8.0,5.0,5.0,27.0,26.0,3.0,8.0


In [307]:
db.shape

(2460, 21)
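
2,460 rows is consistent with 1,230 regular-season games (the 21230 cutoff above) × 2 teams. A minimal check that every game contributes exactly two rows, sketched on a hypothetical frame with the same key columns as db:

```python
import pandas as pd

# Hypothetical per-team, per-game frame.
games = pd.DataFrame({
    'Season':     [2010, 2010, 2010, 2010],
    'GameNumber': [20001, 20001, 20002, 20002],
    'TeamCode':   ['MTL', 'TOR', 'PHI', 'PIT'],
})

rows_per_game = games.groupby(['Season', 'GameNumber']).size()
assert (rows_per_game == 2).all(), 'some games do not have exactly two team rows'
print(rows_per_game.value_counts())
```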

- import the team roster player rank data frame (dc) and merge it with the per-game on-ice events (db).

In [308]:
dc = pd.read_csv('/Users/stefanostselios/Brock University/Kevin Mongeon - StephanosShare/out/team_roster_player_rank.csv')
#dc = pd.read_csv('/Users/kevinmongeon/Brock University/Steve Tselios - StephanosShare/out/team_roster_player_rank.csv')
dc = dc.drop('Unnamed: 0', axis=1)
dc.head()

Unnamed: 0,GD,GameNumber,LossTeam,Season,WinTeam,PlayerNumber,PlayerPosition,TeamCode,GF,GA,Rank,RosterCount,Position,PositionCount,FCount,DCount
0,1,20001,MTL,2010,TOR,11.0,C,MTL,2,3,2,18.0,F,12.0,12.0,6.0
1,1,20001,MTL,2010,TOR,21.0,R,MTL,2,3,1,18.0,F,12.0,12.0,6.0
2,1,20001,MTL,2010,TOR,57.0,L,MTL,2,3,2,18.0,F,12.0,12.0,6.0
3,1,20001,MTL,2010,TOR,26.0,D,MTL,2,3,2,18.0,D,6.0,12.0,6.0
4,1,20001,MTL,2010,TOR,75.0,D,MTL,2,3,2,18.0,D,6.0,12.0,6.0


In [309]:
dc.shape

(36540, 16)

In [310]:
dv = pd.merge(dc, db, on=['Season', 'GameNumber', 'TeamCode'], how='left')
#dv = dv.drop('Unnamed: 0', axis=1)
dv = dv.drop_duplicates(['Season', 'GameNumber', 'TeamCode'])
dv = dv.sort_values(['Season', 'GameNumber'], ascending=[True, True])

In [311]:
dv.shape

(2030, 34)
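
dv has 2,030 rows, fewer than the 2,460 team-games in db. The 36,540 roster rows divided by the 18-player RosterCount shown above also give 2,030, so the left merge on dc appears to keep only team-games present in the roster file. A sketch of how `indicator=True` can list the team-games that fall out, using hypothetical stand-ins for dc and db:

```python
import pandas as pd

keys = ['Season', 'GameNumber', 'TeamCode']
# Hypothetical roster frame (dc) covering one game, and event totals (db) covering two.
dc = pd.DataFrame({'Season': [2010, 2010], 'GameNumber': [20001, 20001],
                   'TeamCode': ['MTL', 'TOR'], 'Rank': [2, 2]})
db = pd.DataFrame({'Season': [2010] * 4, 'GameNumber': [20001, 20001, 20002, 20002],
                   'TeamCode': ['MTL', 'TOR', 'PHI', 'PIT'], 'Hits_F': [34.0, 27.0, 34.0, 32.0]})

# indicator=True adds a _merge column with 'both', 'left_only' or 'right_only'.
check = pd.merge(dc[keys].drop_duplicates(), db, on=keys, how='outer', indicator=True)
missing_from_roster = check[check['_merge'] == 'right_only']
print(missing_from_roster[keys])
```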

In [312]:
dv.head()

Unnamed: 0,GD,GameNumber,LossTeam,Season,WinTeam,PlayerNumber,PlayerPosition,TeamCode,GF,GA,Rank,RosterCount,Position,PositionCount,FCount,DCount,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A
0,1,20001,MTL,2010,TOR,11.0,C,MTL,2,3,2,18.0,F,12.0,12.0,6.0,21.0,22.0,23.0,20.0,7.0,16.0,2.0,3.0,34.0,27.0,15.0,9.0,5.0,3.0,26.0,21.0,7.0,6.0
18270,1,20001,MTL,2010,TOR,37.0,C,TOR,3,2,2,18.0,F,12.0,12.0,6.0,22.0,21.0,20.0,23.0,16.0,7.0,3.0,2.0,27.0,34.0,9.0,15.0,3.0,5.0,21.0,26.0,6.0,7.0
18,-1,20002,PIT,2010,PHI,17.0,C,PHI,3,2,1,18.0,F,12.0,12.0,6.0,16.0,14.0,22.0,34.0,9.0,11.0,3.0,2.0,34.0,32.0,10.0,18.0,6.0,5.0,24.0,29.0,1.0,9.0
18288,-1,20002,PIT,2010,PHI,71.0,C,PIT,2,3,1,18.0,F,12.0,12.0,6.0,14.0,16.0,34.0,22.0,11.0,9.0,2.0,3.0,32.0,34.0,18.0,10.0,5.0,6.0,29.0,24.0,9.0,1.0
36,-1,20003,MIN,2010,CAR,53.0,C,CAR,4,3,1,18.0,F,12.0,12.0,6.0,19.0,19.0,33.0,52.0,11.0,11.0,4.0,3.0,14.0,19.0,9.0,8.0,5.0,5.0,27.0,26.0,3.0,8.0


- create columns for team win and team loss. 

In [313]:
dv['TeamWin'] =  dv.apply(lambda x: 1 if x['TeamCode']==x['WinTeam'] else 0, 1)
dv['TeamLos'] =  dv.apply(lambda x: 1 if x['TeamCode']!=x['WinTeam'] else 0, 1)

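- compute games played, games won, games lost, goals for and against, and season totals of all on-ice events for and against by team. GW and GL are summed by the game's winning and losing team, so they are resolved into each team's wins and losses in the next step.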

In [314]:
dv['GP'] = dv.groupby(['Season','TeamCode'])['GameNumber'].transform('count')
dv['GW'] = dv.groupby(['Season','WinTeam'])['TeamWin'].transform('sum')
dv['GL'] = dv.groupby(['Season','LossTeam'])['TeamLos'].transform('sum')
dv['GF'] = dv.groupby(['Season','TeamCode'])['GF'].transform('sum')
dv['GA'] = dv.groupby(['Season','TeamCode'])['GA'].transform('sum')
dv['Blocks_F'] = dv.groupby(['Season','TeamCode'])['Blocks_F'].transform('sum')
dv['Faceoffs_F'] = dv.groupby(['Season','TeamCode'])['Faceoffs_F'].transform('sum')
dv['Giveaways_F'] = dv.groupby(['Season','TeamCode'])['Giveaways_F'].transform('sum')
dv['Goals_F'] = dv.groupby(['Season','TeamCode'])['Goals_F'].transform('sum')
dv['Hits_F'] = dv.groupby(['Season','TeamCode'])['Hits_F'].transform('sum')
dv['Miss_F'] = dv.groupby(['Season','TeamCode'])['Miss_F'].transform('sum')
dv['Penalties_F'] = dv.groupby(['Season','TeamCode'])['Penalties_F'].transform('sum')
dv['Shots_F'] = dv.groupby(['Season','TeamCode'])['Shots_F'].transform('sum')
dv['Takeaways_F'] = dv.groupby(['Season','TeamCode'])['Takeaways_F'].transform('sum')
dv['Blocks_A'] = dv.groupby(['Season','TeamCode'])['Blocks_A'].transform('sum') 
dv['Faceoffs_A'] = dv.groupby(['Season','TeamCode'])['Faceoffs_A'].transform('sum')
dv['Giveaways_A'] = dv.groupby(['Season','TeamCode'])['Giveaways_A'].transform('sum')
dv['Goals_A'] = dv.groupby(['Season','TeamCode'])['Goals_A'].transform('sum')
dv['Hits_A'] = dv.groupby(['Season','TeamCode'])['Hits_A'].transform('sum')
dv['Miss_A'] = dv.groupby(['Season','TeamCode'])['Miss_A'].transform('sum')
dv['Penalties_A'] = dv.groupby(['Season','TeamCode'])['Penalties_A'].transform('sum')
dv['Shots_A'] = dv.groupby(['Season','TeamCode'])['Shots_A'].transform('sum')
dv['Takeaways_A'] = dv.groupby(['Season','TeamCode'])['Takeaways_A'].transform('sum')
dv.head()

Unnamed: 0,GD,GameNumber,LossTeam,Season,WinTeam,PlayerNumber,PlayerPosition,TeamCode,GF,GA,Rank,RosterCount,Position,PositionCount,FCount,DCount,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A,TeamWin,TeamLos,GP,GW,GL
0,1,20001,MTL,2010,TOR,11.0,C,MTL,188,181,2,18.0,F,12.0,12.0,6.0,1011.0,996.0,1927.0,1955.0,580.0,499.0,179.0,171.0,1351.0,1545.0,821.0,755.0,353.0,336.0,1989.0,1891.0,432.0,385.0,0,1,68,34,31
18270,1,20001,MTL,2010,TOR,37.0,C,TOR,195,218,2,18.0,F,12.0,12.0,6.0,1129.0,1123.0,1985.0,1940.0,764.0,694.0,183.0,203.0,1761.0,1632.0,847.0,904.0,323.0,361.0,1785.0,1923.0,489.0,547.0,1,0,70,34,31
18,-1,20002,PIT,2010,PHI,17.0,C,PHI,235,207,1,18.0,F,12.0,12.0,6.0,1149.0,1059.0,2140.0,2146.0,568.0,561.0,225.0,188.0,1677.0,1584.0,830.0,904.0,363.0,352.0,2020.0,1962.0,498.0,547.0,1,0,72,41,31
18288,-1,20002,PIT,2010,PHI,71.0,C,PIT,204,180,1,18.0,F,12.0,12.0,6.0,945.0,1110.0,1983.0,2059.0,437.0,408.0,184.0,169.0,1977.0,1764.0,816.0,737.0,422.0,407.0,2035.0,1812.0,378.0,402.0,0,1,71,41,31
36,-1,20003,MIN,2010,CAR,53.0,C,CAR,222,218,1,18.0,F,12.0,12.0,6.0,1097.0,1230.0,1998.0,2489.0,537.0,558.0,208.0,205.0,1983.0,1555.0,962.0,1071.0,308.0,389.0,2070.0,2237.0,652.0,583.0,1,0,76,38,35


- derive each team's wins (W) and losses (L) for the season.

In [315]:
dv['L'] = dv.apply(lambda x: x['GL'] if x['TeamCode']== x['LossTeam'] else (x['GP'] - x['GW']), 1)
dv['W'] = dv.apply(lambda x: x['GW'] if x['TeamCode']== x['WinTeam'] else (x['GP'] - x['GL']), 1)
dv.head()

Unnamed: 0,GD,GameNumber,LossTeam,Season,WinTeam,PlayerNumber,PlayerPosition,TeamCode,GF,GA,Rank,RosterCount,Position,PositionCount,FCount,DCount,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A,TeamWin,TeamLos,GP,GW,GL,L,W
0,1,20001,MTL,2010,TOR,11.0,C,MTL,188,181,2,18.0,F,12.0,12.0,6.0,1011.0,996.0,1927.0,1955.0,580.0,499.0,179.0,171.0,1351.0,1545.0,821.0,755.0,353.0,336.0,1989.0,1891.0,432.0,385.0,0,1,68,34,31,31,37
18270,1,20001,MTL,2010,TOR,37.0,C,TOR,195,218,2,18.0,F,12.0,12.0,6.0,1129.0,1123.0,1985.0,1940.0,764.0,694.0,183.0,203.0,1761.0,1632.0,847.0,904.0,323.0,361.0,1785.0,1923.0,489.0,547.0,1,0,70,34,31,36,34
18,-1,20002,PIT,2010,PHI,17.0,C,PHI,235,207,1,18.0,F,12.0,12.0,6.0,1149.0,1059.0,2140.0,2146.0,568.0,561.0,225.0,188.0,1677.0,1584.0,830.0,904.0,363.0,352.0,2020.0,1962.0,498.0,547.0,1,0,72,41,31,31,41
18288,-1,20002,PIT,2010,PHI,71.0,C,PIT,204,180,1,18.0,F,12.0,12.0,6.0,945.0,1110.0,1983.0,2059.0,437.0,408.0,184.0,169.0,1977.0,1764.0,816.0,737.0,422.0,407.0,2035.0,1812.0,378.0,402.0,0,1,71,41,31,31,40
36,-1,20003,MIN,2010,CAR,53.0,C,CAR,222,218,1,18.0,F,12.0,12.0,6.0,1097.0,1230.0,1998.0,2489.0,537.0,558.0,208.0,205.0,1983.0,1555.0,962.0,1071.0,308.0,389.0,2070.0,2237.0,652.0,583.0,1,0,76,38,35,38,38
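
Because TeamWin and TeamLos are already defined relative to TeamCode, each team's season wins and losses can also be summed by TeamCode directly, without the WinTeam/LossTeam detour. A sketch on hypothetical rows (one row per team per game):

```python
import pandas as pd

# Three hypothetical games: MTL-TOR (TOR wins), MTL-PHI (MTL wins), TOR-PHI (TOR wins).
dv = pd.DataFrame({
    'Season':   [2010] * 6,
    'TeamCode': ['MTL', 'TOR', 'MTL', 'PHI', 'TOR', 'PHI'],
    'WinTeam':  ['TOR', 'TOR', 'MTL', 'MTL', 'TOR', 'TOR'],
})
dv['TeamWin'] = (dv['TeamCode'] == dv['WinTeam']).astype(int)
dv['TeamLos'] = 1 - dv['TeamWin']

by_team = dv.groupby(['Season', 'TeamCode'])
dv['GP'] = by_team['TeamWin'].transform('count')
dv['W'] = by_team['TeamWin'].transform('sum')
dv['L'] = by_team['TeamLos'].transform('sum')
summary = dv.drop_duplicates(['Season', 'TeamCode']).set_index('TeamCode')[['GP', 'W', 'L']]
print(summary)
```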


- divide wins and losses by games played to get each team's winning and losing percentages. Divide each on-ice event total by games played to get the team's mean per game for the season.

In [316]:
dv = dv.drop_duplicates(['Season', 'TeamCode'])
dv['WinPc'] = dv['W']/ dv['GP']
dv['LossPc'] = dv['L']/ dv['GP']
dv['Mean_Blocks_F'] = dv['Blocks_F']/ dv['GP']
dv['Mean_Faceoffs_F'] = dv['Faceoffs_F']/ dv['GP']
dv['Mean_Giveaways_F'] = dv['Giveaways_F']/ dv['GP']
dv['Mean_Goals_F'] = dv['Goals_F']/ dv['GP']
dv['Mean_Hits_F'] = dv['Hits_F']/ dv['GP']
dv['Mean_Miss_F'] = dv['Miss_F']/ dv['GP']
dv['Mean_Penalties_F'] = dv['Penalties_F']/ dv['GP']
dv['Mean_Shots_F'] = dv['Shots_F']/ dv['GP']
dv['Mean_Takeaways_F'] = dv['Takeaways_F']/ dv['GP']
dv['Mean_Blocks_A'] = dv['Blocks_A']/ dv['GP']
dv['Mean_Faceoffs_A'] = dv['Faceoffs_A']/ dv['GP']
dv['Mean_Giveaways_A'] = dv['Giveaways_A']/ dv['GP']
dv['Mean_Goals_A'] = dv['Goals_A']/ dv['GP']
dv['Mean_Hits_A'] = dv['Hits_A']/ dv['GP']
dv['Mean_Miss_A'] = dv['Miss_A']/ dv['GP']
dv['Mean_Penalties_A'] = dv['Penalties_A']/ dv['GP']
dv['Mean_Shots_A'] = dv['Shots_A']/ dv['GP']
dv['Mean_Takeaways_A'] = dv['Takeaways_A']/ dv['GP']
dv.head()

Unnamed: 0,GD,GameNumber,LossTeam,Season,WinTeam,PlayerNumber,PlayerPosition,TeamCode,GF,GA,Rank,RosterCount,Position,PositionCount,FCount,DCount,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A,TeamWin,TeamLos,GP,GW,GL,L,W,WinPc,LossPc,Mean_Blocks_F,Mean_Faceoffs_F,Mean_Giveaways_F,Mean_Goals_F,Mean_Hits_F,Mean_Miss_F,Mean_Penalties_F,Mean_Shots_F,Mean_Takeaways_F,Mean_Blocks_A,Mean_Faceoffs_A,Mean_Giveaways_A,Mean_Goals_A,Mean_Hits_A,Mean_Miss_A,Mean_Penalties_A,Mean_Shots_A,Mean_Takeaways_A
0,1,20001,MTL,2010,TOR,11.0,C,MTL,188,181,2,18.0,F,12.0,12.0,6.0,1011.0,996.0,1927.0,1955.0,580.0,499.0,179.0,171.0,1351.0,1545.0,821.0,755.0,353.0,336.0,1989.0,1891.0,432.0,385.0,0,1,68,34,31,31,37,0.544118,0.455882,14.867647,28.338235,8.529412,2.632353,19.867647,12.073529,5.191176,29.25,6.352941,14.647059,28.75,7.338235,2.514706,22.720588,11.102941,4.941176,27.808824,5.661765
18270,1,20001,MTL,2010,TOR,37.0,C,TOR,195,218,2,18.0,F,12.0,12.0,6.0,1129.0,1123.0,1985.0,1940.0,764.0,694.0,183.0,203.0,1761.0,1632.0,847.0,904.0,323.0,361.0,1785.0,1923.0,489.0,547.0,1,0,70,34,31,36,34,0.485714,0.514286,16.128571,28.357143,10.914286,2.614286,25.157143,12.1,4.614286,25.5,6.985714,16.042857,27.714286,9.914286,2.9,23.314286,12.914286,5.157143,27.471429,7.814286
18,-1,20002,PIT,2010,PHI,17.0,C,PHI,235,207,1,18.0,F,12.0,12.0,6.0,1149.0,1059.0,2140.0,2146.0,568.0,561.0,225.0,188.0,1677.0,1584.0,830.0,904.0,363.0,352.0,2020.0,1962.0,498.0,547.0,1,0,72,41,31,31,41,0.569444,0.430556,15.958333,29.722222,7.888889,3.125,23.291667,11.527778,5.041667,28.055556,6.916667,14.708333,29.805556,7.791667,2.611111,22.0,12.555556,4.888889,27.25,7.597222
18288,-1,20002,PIT,2010,PHI,71.0,C,PIT,204,180,1,18.0,F,12.0,12.0,6.0,945.0,1110.0,1983.0,2059.0,437.0,408.0,184.0,169.0,1977.0,1764.0,816.0,737.0,422.0,407.0,2035.0,1812.0,378.0,402.0,0,1,71,41,31,31,40,0.56338,0.43662,13.309859,27.929577,6.15493,2.591549,27.84507,11.492958,5.943662,28.661972,5.323944,15.633803,29.0,5.746479,2.380282,24.84507,10.380282,5.732394,25.521127,5.661972
36,-1,20003,MIN,2010,CAR,53.0,C,CAR,222,218,1,18.0,F,12.0,12.0,6.0,1097.0,1230.0,1998.0,2489.0,537.0,558.0,208.0,205.0,1983.0,1555.0,962.0,1071.0,308.0,389.0,2070.0,2237.0,652.0,583.0,1,0,76,38,35,38,38,0.5,0.5,14.434211,26.289474,7.065789,2.736842,26.092105,12.657895,4.052632,27.236842,8.578947,16.184211,32.75,7.342105,2.697368,20.460526,14.092105,5.118421,29.434211,7.671053
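
Dividing season totals by GP gives per-game means. An equivalent route, sketched on hypothetical per-game rows, averages the per-game values directly; the two agree only when no per-game value is missing, because `mean` skips NaN while GP counts every game.

```python
import pandas as pd

# Hypothetical per-game values for two teams.
per_game = pd.DataFrame({
    'Season':   [2010, 2010, 2010, 2010],
    'TeamCode': ['MTL', 'MTL', 'TOR', 'TOR'],
    'Hits_F':   [34.0, 20.0, 27.0, 31.0],
    'Shots_F':  [26.0, 30.0, 21.0, 25.0],
})

means = (per_game.groupby(['Season', 'TeamCode'])[['Hits_F', 'Shots_F']]
                 .mean()
                 .add_prefix('Mean_'))
print(means)
```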


In [317]:
dv = dv[['Season', 'TeamCode', 'GP', 'W', 'L','WinPc', 'LossPc', 'Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Goals_F','Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F','Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A','Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Goals_A','Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A','Mean_Shots_A', 'Mean_Takeaways_A']]
dv['Rank_W'] = dv.groupby(['Season'])['WinPc'].rank(ascending=False)
dv = dv.sort_values(['Season', 'Rank_W'], ascending=[True, True])
dv.head(30)

Unnamed: 0,Season,TeamCode,GP,W,L,WinPc,LossPc,Mean_Blocks_F,Mean_Faceoffs_F,Mean_Giveaways_F,Mean_Goals_F,Mean_Hits_F,Mean_Miss_F,Mean_Penalties_F,Mean_Shots_F,Mean_Takeaways_F,Mean_Blocks_A,Mean_Faceoffs_A,Mean_Giveaways_A,Mean_Goals_A,Mean_Hits_A,Mean_Miss_A,Mean_Penalties_A,Mean_Shots_A,Mean_Takeaways_A,Rank_W
18576,2010,VAN,73,48,25,0.657534,0.342466,12.958904,32.273973,6.739726,3.082192,22.123288,12.287671,4.657534,28.849315,7.273973,15.109589,26.356164,7.356164,2.219178,22.808219,11.082192,4.616438,27.39726,7.09589,1.0
90,2010,SJ,65,41,24,0.630769,0.369231,14.153846,31.184615,10.061538,2.969231,21.769231,13.246154,4.353846,31.092308,8.307692,15.615385,26.6,9.753846,2.292308,23.907692,11.030769,4.415385,25.892308,6.861538,2.0
18432,2010,BOS,76,45,31,0.592105,0.407895,14.131579,29.815789,6.592105,2.947368,21.065789,11.565789,4.618421,29.368421,5.105263,15.605263,27.552632,8.342105,2.171053,24.078947,11.355263,4.644737,29.881579,7.657895,3.0
18396,2010,DET,68,40,28,0.588235,0.411765,10.867647,30.5,8.823529,3.073529,21.323529,13.235294,4.088235,30.161765,6.941176,13.676471,28.294118,7.455882,2.852941,23.397059,11.058824,4.161765,27.352941,7.132353,4.0
126,2010,ANA,65,38,27,0.584615,0.415385,15.292308,26.907692,7.353846,2.707692,23.076923,10.230769,5.215385,25.030769,5.246154,10.215385,29.384615,8.923077,2.692308,21.707692,14.246154,4.707692,30.769231,6.015385,5.0
18720,2010,WSH,72,42,30,0.583333,0.416667,15.277778,29.819444,8.194444,2.375,22.597222,11.75,4.388889,27.722222,7.680556,15.305556,28.388889,7.958333,2.125,22.930556,11.513889,4.041667,26.125,6.777778,6.0
306,2010,LA,70,40,30,0.571429,0.428571,12.485714,29.4,10.171429,2.514286,25.571429,12.9,4.385714,25.685714,5.328571,14.571429,27.685714,9.457143,2.328571,28.371429,11.685714,4.642857,25.485714,5.242857,7.0
18,2010,PHI,72,41,31,0.569444,0.430556,15.958333,29.722222,7.888889,3.125,23.291667,11.527778,5.041667,28.055556,6.916667,14.708333,29.805556,7.791667,2.611111,22.0,12.555556,4.888889,27.25,7.597222,8.0
18288,2010,PIT,71,40,31,0.56338,0.43662,13.309859,27.929577,6.15493,2.591549,27.84507,11.492958,5.943662,28.661972,5.323944,15.633803,29.0,5.746479,2.380282,24.84507,10.380282,5.732394,25.521127,5.661972,9.0
180,2010,NYR,73,41,32,0.561644,0.438356,15.520548,26.643836,5.082192,2.835616,27.931507,11.342466,4.684932,27.013699,7.164384,12.547945,29.0,6.794521,2.287671,26.890411,10.712329,5.219178,26.821918,6.630137,10.0


In [318]:
dv.to_csv('/Users/stefanostselios/Brock University/Kevin Mongeon - StephanosShare/out/season_team_all_events_ranking.csv', index=False, sep=',')
#dv.to_csv('/Users/kevinmongeon/Brock University/Steve Tselios - StephanosShare/out/season_team_all_events_ranking.csv', index=False, sep=',')
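
`to_csv` expects a boolean for `index`; a non-empty string such as `'False'` is truthy, so it would still write the row index. A quick check with an in-memory buffer (the two rows are illustrative):

```python
import io
import pandas as pd

df = pd.DataFrame({'TeamCode': ['VAN', 'SJ'], 'WinPc': [0.657534, 0.630769]})

buf = io.StringIO()
df.to_csv(buf, index=False)            # boolean False: no index column is written
header = buf.getvalue().splitlines()[0]
print(header)                          # TeamCode,WinPc
print(bool('False'))                   # True: a non-empty string is truthy
```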

- compute, for each on-ice event, the difference between a team's per-game mean for and per-game mean against.

In [319]:
dv['DBlock'] = dv['Mean_Blocks_F'] - dv['Mean_Blocks_A']
dv['DFaceoff'] = dv['Mean_Faceoffs_F'] - dv['Mean_Faceoffs_A']
dv['DGiveaway'] = dv['Mean_Giveaways_F'] - dv['Mean_Giveaways_A']
dv['DGoal'] = dv['Mean_Goals_F'] - dv['Mean_Goals_A']
dv['DHit'] = dv['Mean_Hits_F'] - dv['Mean_Hits_A']
dv['DMiss'] = dv['Mean_Miss_F'] - dv['Mean_Miss_A']
dv['DPenalty'] = dv['Mean_Penalties_F'] - dv['Mean_Penalties_A']
dv['DShot'] = dv['Mean_Shots_F'] - dv['Mean_Shots_A']
dv['DTakeaway'] = dv['Mean_Takeaways_F'] - dv['Mean_Takeaways_A']
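
All nine differences follow one pattern, mean for minus mean against. A sketch of the same computation as a loop, on a one-row frame whose goal and shot means are copied from the VAN row of the ranking output above (note that misses use the stem Miss rather than Misses in the column names):

```python
import pandas as pd

dv = pd.DataFrame({
    'Mean_Goals_F': [3.082192], 'Mean_Goals_A': [2.219178],
    'Mean_Shots_F': [28.849315], 'Mean_Shots_A': [27.39726],
})

# Map each difference column to the stem used in the Mean_<stem>_F / Mean_<stem>_A names.
# Only two events are present in this toy frame; the full map would also cover
# Blocks, Faceoffs, Giveaways, Hits, Miss, Penalties and Takeaways.
stems = {'DGoal': 'Goals', 'DShot': 'Shots'}
for diff_col, stem in stems.items():
    dv[diff_col] = dv[f'Mean_{stem}_F'] - dv[f'Mean_{stem}_A']
print(dv[['DGoal', 'DShot']])
```

The resulting DGoal of about 0.863 matches the maximum DGoal in the summary table, as expected for the top-ranked team.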

## all on-ice events analysis

- summary analysis

In [320]:
dv.describe()

Unnamed: 0,Season,GP,W,L,WinPc,LossPc,Mean_Blocks_F,Mean_Faceoffs_F,Mean_Giveaways_F,Mean_Goals_F,Mean_Hits_F,Mean_Miss_F,Mean_Penalties_F,Mean_Shots_F,Mean_Takeaways_F,Mean_Blocks_A,Mean_Faceoffs_A,Mean_Giveaways_A,Mean_Goals_A,Mean_Hits_A,Mean_Miss_A,Mean_Penalties_A,Mean_Shots_A,Mean_Takeaways_A,Rank_W,DBlock,DFaceoff,DGiveaway,DGoal,DHit,DMiss,DPenalty,DShot,DTakeaway
count,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0
mean,2010.0,67.666667,33.833333,33.833333,0.498496,0.501504,14.015993,28.553395,7.849157,2.650775,22.510534,11.635117,4.620236,27.287465,7.031621,13.967391,28.575154,7.843626,2.666309,22.653046,11.601221,4.620047,27.183733,7.033107,15.5,0.048602,-0.021759,0.005532,-0.015534,-0.142511,0.033896,0.000189,0.103732,-0.001487
std,0.0,8.052985,7.991734,7.240134,0.09448,0.09448,1.417671,1.646064,1.75688,0.286535,2.880149,0.988081,0.590286,1.847804,1.357131,1.333267,1.504471,1.492127,0.339782,2.67056,1.169741,0.526381,1.705757,1.163477,8.802429,1.72523,2.624395,0.957012,0.501318,2.973096,1.579219,0.386041,2.735315,0.986362
min,2010.0,38.0,18.0,20.0,0.296875,0.342466,10.402985,24.764706,4.643836,1.925373,18.32,10.19403,3.552239,22.926471,5.105263,10.215385,24.656716,5.287671,2.125,19.014286,8.641791,3.61194,23.253731,5.242857,1.0,-2.808824,-6.460526,-2.030303,-0.984375,-7.211538,-4.015385,-1.065789,-6.014706,-2.552632
25%,2010.0,66.25,28.25,30.0,0.438263,0.432072,13.293374,27.496408,6.754013,2.437395,20.293134,10.662363,4.328462,26.273544,6.068135,13.322319,27.744048,6.898788,2.412077,20.651884,10.911842,4.206692,26.127404,6.038054,8.25,-1.300699,-1.482606,-0.523304,-0.359649,-2.122229,-1.083031,-0.213405,-1.868571,-0.622736
50%,2010.0,70.0,36.5,31.5,0.520833,0.479167,14.142713,28.620666,7.819444,2.626036,22.065324,11.631059,4.637978,27.173216,6.928922,13.999269,28.832746,7.674679,2.671047,22.26087,11.346646,4.629648,27.383246,6.917611,15.5,0.091468,0.196256,0.152311,0.134581,-0.369565,0.433124,-0.013158,0.502778,0.230832
75%,2010.0,72.0,40.0,37.5,0.567928,0.561737,15.066176,29.641667,8.797024,2.845268,23.550189,12.285198,4.928135,28.647795,7.772252,14.755655,29.217556,8.907096,2.888235,23.780034,12.354638,5.005842,27.865377,7.803022,22.75,0.971181,1.471709,0.406338,0.242647,1.489367,1.310421,0.305147,2.049359,0.685049
max,2010.0,76.0,48.0,47.0,0.657534,0.703125,16.628571,32.273973,12.367647,3.125,28.835616,13.253521,5.943662,31.092308,10.157143,16.184211,32.75,11.794118,3.5,28.371429,14.246154,5.732394,30.769231,9.544118,30.0,5.076923,5.917808,2.507463,0.863014,7.794521,2.215385,0.619718,5.2,1.446154


#### win percent analysis

- regress **win percent** on the per-game means of on-ice events (predictor variables). Add a constant to the predictors and estimate with **OLS** and **Logit**. The purpose is to determine the impact each on-ice event has on win percent.

#### $WinPc = \beta_{0} + \beta_{1}MeanBlocks_F + \beta_{2}MeanFaceoffs_F + \beta_{3}MeanGiveaways_F + \beta_{4}MeanGoals_F + \beta_{5}MeanHits_F + \beta_{6}MeanMiss_F + \beta_{7}MeanPenalties_F + \beta_{8}MeanShots_F + \beta_{9}MeanTakeaways_F + \beta_{10}MeanBlocks_A + \beta_{11}MeanFaceoffs_A + \beta_{12}MeanGiveaways_A + \beta_{13}MeanGoals_A + \beta_{14}MeanHits_A + \beta_{15}MeanMiss_A + \beta_{16}MeanPenalties_A + \beta_{17}MeanShots_A + \beta_{18}MeanTakeaways_A + e_{s}$

In [321]:
print('win percent for all on-ice events')
y = dv['WinPc']  
X = sm.add_constant(dv[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Goals_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Goals_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
result = sm.OLS(y, X).fit()
print(result.summary())

win percent for all on-ice events
                            OLS Regression Results                            
Dep. Variable:                  WinPc   R-squared:                       0.934
Model:                            OLS   Adj. R-squared:                  0.827
Method:                 Least Squares   F-statistic:                     8.686
Date:                Fri, 19 Jan 2018   Prob (F-statistic):           0.000397
Time:                        20:26:58   Log-Likelihood:                 69.554
No. Observations:                  30   AIC:                            -101.1
Df Residuals:                      11   BIC:                            -74.48
Df Model:                          18                                         
Covariance Type:            nonrobust                                         
                       coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
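
With 30 team-seasons and 18 regressors plus a constant, only 30 − 19 = 11 residual degrees of freedom remain, which is why the adjusted R² (0.827) sits well below R² (0.934). The `coef` column is the least-squares solution; a minimal numpy sketch of that estimator on synthetic, noise-free data (the two regressors and true coefficients are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30                                       # same number of observations as above
x1 = rng.normal(2.65, 0.3, n)                # hypothetical goals for per game
x2 = rng.normal(2.65, 0.3, n)                # hypothetical goals against per game
X = np.column_stack([np.ones(n), x1, x2])    # constant first, as sm.add_constant does
beta_true = np.array([0.5, 0.15, -0.15])
y = X @ beta_true                            # noise-free win percent

# Least squares: minimizes ||y - X b||^2, the criterion sm.OLS fits.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)                              # approximately [0.5, 0.15, -0.15]
```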

In [322]:
print ('win percent for all on-ice events')
y = dv['WinPc']  
X = sm.add_constant(dv[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Goals_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Goals_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
result = sm.Logit(y, X).fit()
print(result.summary())

win percent for all on-ice events
Optimization terminated successfully.
         Current function value: 0.661362
         Iterations 4
                           Logit Regression Results                           
Dep. Variable:                  WinPc   No. Observations:                   30
Model:                          Logit   Df Residuals:                       11
Method:                           MLE   Df Model:                           18
Date:                Fri, 19 Jan 2018   Pseudo R-squ.:                 0.04584
Time:                        20:26:58   Log-Likelihood:                -19.841
converged:                       True   LL-Null:                       -20.794
                                        LLR p-value:                     1.000
                       coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------------
const               -1.5699     18.848     -0.083      0.934  

#### $WinPc = \beta_{0} + \beta_{1}DBlock + \beta_{2}DFaceoff + \beta_{3}DGiveaway + \beta_{4}DGoal + \beta_{5}DHit + \beta_{6}DMiss+ \beta_{7}DPenalty + \beta_{8}DShot + \beta_{9}DTakeaway + e_{s}$
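This is a minimal numpy sketch of what `sm.add_constant` plus `sm.OLS` estimate for a specification like this one: least squares of the outcome on a column of ones and the regressors. The helper `ols_with_constant` is hypothetical. The data are synthetic placeholders, not the notebook's `dv` frame. Because the synthetic outcome has no noise term, the fit should recover the chosen coefficients exactly and give an R-squared of 1.

```python
import numpy as np

def ols_with_constant(X, y):
    """Least-squares fit of y on [1, X] (hypothetical helper, not a statsmodels API).

    Returns the coefficient vector (intercept first) and R-squared.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # same role as sm.add_constant
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, r2

# Synthetic stand-in: 30 teams, 9 differential regressors (DBlock through DTakeaway).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 9))
true_beta = np.array([0.5, 0.01, 0.02, -0.01, 0.10, 0.00, 0.01, -0.02, 0.03, 0.01])
y = true_beta[0] + X @ true_beta[1:]  # no noise term, so the fit should be exact

beta, r2 = ols_with_constant(X, y)
print(np.round(beta, 4), round(r2, 4))
```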

In [323]:
print ('win percent for all on-ice events differential')
y = dv['WinPc']  
X = sm.add_constant(dv[['DBlock', 'DFaceoff', 'DGiveaway', 'DGoal', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
result = sm.OLS(y, X).fit()
print(result.summary())

win percent for all on-ice events differential
                            OLS Regression Results                            
Dep. Variable:                  WinPc   R-squared:                       0.851
Model:                            OLS   Adj. R-squared:                  0.785
Method:                 Least Squares   F-statistic:                     12.74
Date:                Fri, 19 Jan 2018   Prob (F-statistic):           1.73e-06
Time:                        20:26:58   Log-Likelihood:                 57.323
No. Observations:                  30   AIC:                            -94.65
Df Residuals:                      20   BIC:                            -80.63
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const

In [324]:
print ('win percent for all on-ice events differential')
y = dv['WinPc']  
X = sm.add_constant(dv[['DBlock', 'DFaceoff', 'DGiveaway', 'DGoal', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
result = sm.Logit(y, X).fit()
print(result.summary())

win percent for all on-ice events differential
Optimization terminated successfully.
         Current function value: 0.664170
         Iterations 4
                           Logit Regression Results                           
Dep. Variable:                  WinPc   No. Observations:                   30
Model:                          Logit   Df Residuals:                       20
Method:                           MLE   Df Model:                            9
Date:                Fri, 19 Jan 2018   Pseudo R-squ.:                 0.04179
Time:                        20:26:58   Log-Likelihood:                -19.925
converged:                       True   LL-Null:                       -20.794
                                        LLR p-value:                    0.9950
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          0.0040      0.372      0.011      0.991       

#### mean goals for analysis

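- regress **mean goals for** on the mean of on-ice events (predictor variables). Add a constant to the predictors and fit with **OLS**. The purpose is to determine the impact each on-ice event has on goals scored. The **Logit** fits are left commented out. They do not apply here, because `sm.Logit` requires the dependent variable to lie between 0 and 1, and mean goals per game does not.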

#### $MeanGoals_F = \beta_{0} + \beta_{1}MeanBlocks_F + \beta_{2}MeanFaceoffs_F + \beta_{3}MeanGiveaways_F + \beta_{4}MeanHits_F + \beta_{5}MeanMiss_F + \beta_{6}MeanPenalties_F + \beta_{7}MeanShots_F + \beta_{8}MeanTakeaways_F + \beta_{9}MeanBlocks_A + \beta_{10}MeanFaceoffs_A + \beta_{11}MeanGiveaways_A + \beta_{12}MeanHits_A + \beta_{13}MeanMiss_A + \beta_{14}MeanPenalties_A + \beta_{15}MeanShots_A + \beta_{16}MeanTakeaways_A + e_{s}$
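This specification spends 16 degrees of freedom on regressors with only 30 observations. That is why the adjusted R-squared in the summary below is much lower than the raw R-squared. The short sketch below applies the standard adjustment, 1 - (1 - R²)(n - 1)/(n - k - 1), where k counts regressors excluding the constant. It reproduces the printed values up to rounding of the reported R-squared. `adjusted_r2` is a hypothetical helper name.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k regressors (constant excluded)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# R-squared values copied from the OLS summaries in this section (rounded to 3 decimals).
print(round(adjusted_r2(0.716, n=30, k=16), 3))  # mean goals for, means model
print(round(adjusted_r2(0.384, n=30, k=8), 3))   # mean goals for, differential model
```

Raw R-squared never falls when regressors are added. Comparing the 16-regressor and 8-regressor fits on adjusted R-squared, or on the AIC and BIC that are also printed, is therefore the fairer comparison.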

In [325]:
print ('mean goals for for all on-ice events')
y = dv['Mean_Goals_F']  
X = sm.add_constant(dv[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
result = sm.OLS(y, X).fit()
print(result.summary())

mean goals for for all on-ice events
                            OLS Regression Results                            
Dep. Variable:           Mean_Goals_F   R-squared:                       0.716
Model:                            OLS   Adj. R-squared:                  0.366
Method:                 Least Squares   F-statistic:                     2.048
Date:                Fri, 19 Jan 2018   Prob (F-statistic):             0.0991
Time:                        20:26:58   Log-Likelihood:                 14.315
No. Observations:                  30   AIC:                             5.370
Df Residuals:                      13   BIC:                             29.19
Df Model:                          16                                         
Covariance Type:            nonrobust                                         
                       coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------------
con

In [326]:
#y = dv['Mean_Goals_F']  
#X = sm.add_constant(dv[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
#result = sm.Logit(y, X).fit()
#result.summary()

#### $MeanGoals_F = \beta_{0} + \beta_{1}DBlock + \beta_{2}DFaceoff + \beta_{3}DGiveaway + \beta_{4}DHit + \beta_{5}DMiss+ \beta_{6}DPenalty + \beta_{7}DShot + \beta_{8}DTakeaway + e_{s}$

In [327]:
print ('mean goals for for all on-ice events differential')
y = dv['Mean_Goals_F']  
X = sm.add_constant(dv[['DBlock', 'DFaceoff', 'DGiveaway', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
result = sm.OLS(y, X).fit()
print(result.summary())

mean goals for for all on-ice events differential
                            OLS Regression Results                            
Dep. Variable:           Mean_Goals_F   R-squared:                       0.384
Model:                            OLS   Adj. R-squared:                  0.150
Method:                 Least Squares   F-statistic:                     1.638
Date:                Fri, 19 Jan 2018   Prob (F-statistic):              0.174
Time:                        20:26:58   Log-Likelihood:                 2.7104
No. Observations:                  30   AIC:                             12.58
Df Residuals:                      21   BIC:                             25.19
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
co

In [328]:
#y = dv['Mean_Goals_F']  
#X = sm.add_constant(dv[['DBlock', 'DFaceoff', 'DGiveaway', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
#result = sm.Logit(y, X).fit()
#result.summary()

#### mean goals against analysis

- regress **mean goals against** on the mean of on-ice events (predictor variables). Add a constant to the predictors and fit with **OLS**. The purpose is to determine the impact each on-ice event has on goals allowed. As with goals for, the **Logit** fits are left commented out, because mean goals per game is not bounded between 0 and 1.

#### $MeanGoals_A = \beta_{0} + \beta_{1}MeanBlocks_F + \beta_{2}MeanFaceoffs_F + \beta_{3}MeanGiveaways_F + \beta_{4}MeanHits_F + \beta_{5}MeanMiss_F + \beta_{6}MeanPenalties_F + \beta_{7}MeanShots_F + \beta_{8}MeanTakeaways_F + \beta_{9}MeanBlocks_A + \beta_{10}MeanFaceoffs_A + \beta_{11}MeanGiveaways_A + \beta_{12}MeanHits_A + \beta_{13}MeanMiss_A + \beta_{14}MeanPenalties_A + \beta_{15}MeanShots_A + \beta_{16}MeanTakeaways_A + e_{s}$

In [329]:
print ('mean goals against for all on-ice events')
y = dv['Mean_Goals_A']  
X = sm.add_constant(dv[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
result = sm.OLS(y, X).fit()
print(result.summary())

mean goals against for all on-ice events
                            OLS Regression Results                            
Dep. Variable:           Mean_Goals_A   R-squared:                       0.706
Model:                            OLS   Adj. R-squared:                  0.345
Method:                 Least Squares   F-statistic:                     1.955
Date:                Fri, 19 Jan 2018   Prob (F-statistic):              0.114
Time:                        20:26:58   Log-Likelihood:                 8.7050
No. Observations:                  30   AIC:                             16.59
Df Residuals:                      13   BIC:                             40.41
Df Model:                          16                                         
Covariance Type:            nonrobust                                         
                       coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------------

In [330]:
#y = dv['Mean_Goals_A']  
#X = sm.add_constant(dv[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
#result = sm.Logit(y, X).fit()
#result.summary()

#### $MeanGoals_A = \beta_{0} + \beta_{1}DBlock + \beta_{2}DFaceoff + \beta_{3}DGiveaway + \beta_{4}DHit + \beta_{5}DMiss+ \beta_{6}DPenalty + \beta_{7}DShot + \beta_{8}DTakeaway + e_{s}$

In [331]:
print ('mean goals against for all on-ice events differential')
y = dv['Mean_Goals_A']  
X = sm.add_constant(dv[['DBlock', 'DFaceoff', 'DGiveaway', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
result = sm.OLS(y, X).fit()
print(result.summary())

mean goals against for all on-ice events differential
                            OLS Regression Results                            
Dep. Variable:           Mean_Goals_A   R-squared:                       0.339
Model:                            OLS   Adj. R-squared:                  0.087
Method:                 Least Squares   F-statistic:                     1.344
Date:                Fri, 19 Jan 2018   Prob (F-statistic):              0.277
Time:                        20:26:58   Log-Likelihood:                -3.4750
No. Observations:                  30   AIC:                             24.95
Df Residuals:                      21   BIC:                             37.56
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
-----------------------------------------------------------------------------

In [332]:
#y = dv['Mean_Goals_A']  
#X = sm.add_constant(dv[['DBlock', 'DFaceoff', 'DGiveaway', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
#result = sm.Logit(y, X).fit()
#result.summary()

## 2) even strength on-ice events data

- drop duplicates by season, game number, event number and event team to have one observation per event. Where advantage type is missing, backward fill it within each season and game number. Keep only on-ice events that happened in **even strength situations**.
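Here is a toy illustration of the grouped backward fill on made-up events; the frame `pbp` is hypothetical. The fill runs within each season and game, so a missing AdvantageType on the last event of one game is not filled from the first event of the next game. `GroupBy.bfill` returns a result aligned to the original index, so it can be assigned straight back to the column.

```python
import numpy as np
import pandas as pd

# Made-up events for two games; AdvantageType is missing on some rows.
pbp = pd.DataFrame({
    'Season':        [2010, 2010, 2010, 2010, 2010],
    'GameNumber':    [20001, 20001, 20001, 20002, 20002],
    'EventNumber':   [1, 2, 3, 1, 2],
    'EventTeamCode': ['MTL', 'TOR', 'MTL', 'PHI', 'PIT'],
    'AdvantageType': [np.nan, 'EV', np.nan, 'PP', 'EV'],
})
pbp = pbp.drop_duplicates(['Season', 'GameNumber', 'EventNumber', 'EventTeamCode'])

# Backward fill within each game only: the last row of game 20001 stays missing
# instead of borrowing 'PP' from the first event of game 20002.
pbp['AdvantageType'] = pbp.groupby(['Season', 'GameNumber'])['AdvantageType'].bfill()
even_strength = pbp[pbp['AdvantageType'] == 'EV']
print(even_strength[['GameNumber', 'EventNumber']])
```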

In [333]:
dd = da[['Season', 'GameNumber', 'AdvantageType', 'VTeamCode', 'HTeamCode', 'EventNumber', 'EventType', 'EventTeamCode']]
dd = dd.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])
dd = dd.drop_duplicates(['Season', 'GameNumber', 'EventNumber', 'EventTeamCode'])
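# GroupBy.bfill returns a result aligned to dd's index, so it can be assigned back directly
# (groupby(...).apply can prepend the group keys to the index on newer pandas versions)
dd['AdvantageType'] = dd.groupby(['Season','GameNumber'])['AdvantageType'].bfill()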
dd = dd[dd['AdvantageType'] == 'EV']
dd.head()

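| | Season | GameNumber | AdvantageType | VTeamCode | HTeamCode | EventNumber | EventType | EventTeamCode |
|---|---|---|---|---|---|---|---|---|
| 0 | 2010 | 20001 | EV | MTL | TOR | 1 | FAC | MTL |
| 1 | 2010 | 20001 | EV | MTL | TOR | 3 | HIT | TOR |
| 2 | 2010 | 20001 | EV | MTL | TOR | 4 | HIT | MTL |
| 3 | 2010 | 20001 | EV | MTL | TOR | 5 | HIT | MTL |
| 4 | 2010 | 20001 | EV | MTL | TOR | 6 | GIVE | TOR |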


- Set a goal indicator to 1 if an on-ice event is a goal and to NaN (missing) otherwise, and do the same for blocks, faceoffs, giveaways, hits, misses, penalties, shots and takeaways. Then group by season, game number, event team and event type and sum each indicator. This gives each team's count of every on-ice event per game.
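The indicator-and-sum steps yield each team's count of every event type per game. As a compact cross-check, assuming only those counts are needed, `groupby(...).size()` reshaped with `unstack` gives the same numbers in a single table. Unlike the indicator approach, event types a team never recorded appear as 0 rather than NaN. The events below are made up.

```python
import pandas as pd

events = pd.DataFrame({
    'Season':        [2010] * 6,
    'GameNumber':    [20001] * 6,
    'EventTeamCode': ['MTL', 'TOR', 'MTL', 'MTL', 'TOR', 'MTL'],
    'EventType':     ['FAC', 'HIT', 'HIT', 'SHOT', 'SHOT', 'GOAL'],
})

# One row per (season, game, team), one column per event type, cells = event counts.
counts = (events.groupby(['Season', 'GameNumber', 'EventTeamCode', 'EventType'])
                .size()
                .unstack('EventType', fill_value=0))
print(counts)
```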

In [334]:
# 1 if the row is that event type, NaN otherwise (vectorized; same result as a row-wise apply)
dd['Goal'] = np.where(dd['EventType'] == 'GOAL', 1, np.nan)
dd['Block'] = np.where(dd['EventType'] == 'BLOCK', 1, np.nan)
dd['Faceoff'] = np.where(dd['EventType'] == 'FAC', 1, np.nan)
dd['Giveaway'] = np.where(dd['EventType'] == 'GIVE', 1, np.nan)
dd['Hit'] = np.where(dd['EventType'] == 'HIT', 1, np.nan)
dd['Miss'] = np.where(dd['EventType'] == 'MISS', 1, np.nan)
dd['Penalty'] = np.where(dd['EventType'] == 'PENL', 1, np.nan)
dd['Shot'] = np.where(dd['EventType'] == 'SHOT', 1, np.nan)
dd['Takeaway'] = np.where(dd['EventType'] == 'TAKE', 1, np.nan)

In [335]:
# min_count=1 leaves groups with no matching event as NaN, as in the output shown below.
# In newer pandas versions (0.22 onward) a plain 'sum' returns 0 for all-NaN groups,
# and those zeros would block the forward/backward fill further down.
keys = ['Season', 'GameNumber', 'EventTeamCode', 'EventType']
dd['Blocks'] = dd.groupby(keys)['Block'].transform('sum', min_count=1)
dd['Faceoffs'] = dd.groupby(keys)['Faceoff'].transform('sum', min_count=1)
dd['Giveaways'] = dd.groupby(keys)['Giveaway'].transform('sum', min_count=1)
dd['Goals'] = dd.groupby(keys)['Goal'].transform('sum', min_count=1)
dd['Hits'] = dd.groupby(keys)['Hit'].transform('sum', min_count=1)
dd['Misses'] = dd.groupby(keys)['Miss'].transform('sum', min_count=1)
dd['Penalties'] = dd.groupby(keys)['Penalty'].transform('sum', min_count=1)
dd['Shots'] = dd.groupby(keys)['Shot'].transform('sum', min_count=1)
dd['Takeaways'] = dd.groupby(keys)['Takeaway'].transform('sum', min_count=1)

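- reshape the data frame from wide to long on team code. `pd.lreshape` stacks VTeamCode and HTeamCode into a single TeamCode column, so every event appears once for each of the two teams in the game.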
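Here is a toy version of the reshape with made-up rows. `pd.lreshape` stacks the listed columns (here VTeamCode and HTeamCode) into one TeamCode column and repeats the remaining columns, so each event appears once per team. As the output below also shows, the other columns come back in alphabetical order with TeamCode last. The event team column is renamed beforehand so that it is not caught by the `'TeamCode' in col` filter.

```python
import pandas as pd

events = pd.DataFrame({
    'GameNumber':  [20001, 20001],
    'EventNumber': [1, 3],
    'EventType':   ['FAC', 'HIT'],
    'EventTeam':   ['MTL', 'TOR'],   # renamed so it is not picked up as a team-code column
    'VTeamCode':   ['MTL', 'MTL'],
    'HTeamCode':   ['TOR', 'TOR'],
})
team_cols = [c for c in events.columns if 'TeamCode' in c]   # ['VTeamCode', 'HTeamCode']
long = pd.lreshape(events, {'TeamCode': team_cols})
long = long.sort_values(['GameNumber', 'EventNumber'])
print(long)
```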

In [336]:
dd = dd.rename(columns={'EventTeamCode': 'EventTeam'})
a = [col for col in dd.columns if 'TeamCode' in col]
dd = pd.lreshape(dd, {'TeamCode' : a})
dd = dd.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])
dd = dd.rename(columns={'EventTeam': 'EventTeamCode'})
dd.head()

Unnamed: 0,AdvantageType,Block,Blocks,EventNumber,EventTeamCode,EventType,Faceoff,Faceoffs,GameNumber,Giveaway,Giveaways,Goal,Goals,Hit,Hits,Miss,Misses,Penalties,Penalty,Season,Shot,Shots,Takeaway,Takeaways,TeamCode
0,EV,,,1,MTL,FAC,1.0,17.0,20001,,,,,,,,,,,2010,,,,,MTL
256153,EV,,,1,MTL,FAC,1.0,17.0,20001,,,,,,,,,,,2010,,,,,TOR
1,EV,,,3,TOR,HIT,,,20001,,,,,1.0,24.0,,,,,2010,,,,,MTL
256154,EV,,,3,TOR,HIT,,,20001,,,,,1.0,24.0,,,,,2010,,,,,TOR
2,EV,,,4,MTL,HIT,,,20001,,,,,1.0,29.0,,,,,2010,,,,,MTL


In [337]:
dd.shape

(512306, 25)

- drop duplicates by season, game number, team code, event team code and event type. This keeps one row per team, event team and event type in each game.

In [338]:
dd = dd.drop_duplicates(['Season', 'GameNumber', 'TeamCode', 'EventTeamCode', 'EventType'])
dd = dd[['Season', 'GameNumber', 'TeamCode', 'EventNumber', 'EventType', 'EventTeamCode', 'Blocks', 'Faceoffs', 'Giveaways', 'Goals', 'Hits', 'Misses', 'Penalties', 'Shots', 'Takeaways']]
dd = dd.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])
dd.shape

(43300, 15)

- assign all on-ice events to their respective teams. If the team code matches the event team code, the event is recorded in that team's For (F) column; otherwise it is recorded in its Against (A) column. Each on-ice event therefore generates two variables per team.
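The For/Against split can also be written without a row-wise `apply`, by using `Series.where` with a boolean mask. This is a sketch on illustrative rows built from the game 20001 hit totals (MTL 29, TOR 24), laid out as the four team/event-team combinations. It is not the notebook's actual frame.

```python
import pandas as pd

# One row per (team, event team) pair for a single game; values are per-game hit totals.
rows = pd.DataFrame({
    'TeamCode':      ['MTL', 'TOR', 'MTL', 'TOR'],
    'EventTeamCode': ['MTL', 'MTL', 'TOR', 'TOR'],
    'Hits':          [29.0, 29.0, 24.0, 24.0],
})
same_team = rows['TeamCode'] == rows['EventTeamCode']
rows['Hits_F'] = rows['Hits'].where(same_team)    # kept when TeamCode made the hits, else NaN
rows['Hits_A'] = rows['Hits'].where(~same_team)   # kept when the opponent made the hits, else NaN
print(rows)
```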

In [339]:
dd['Blocks_F'] = dd.apply(lambda x: x['Blocks'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
dd['Blocks_A'] = dd.apply(lambda x: x['Blocks'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
dd['Faceoffs_F'] = dd.apply(lambda x: x['Faceoffs'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
dd['Faceoffs_A'] = dd.apply(lambda x: x['Faceoffs'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
dd['Giveaways_F'] = dd.apply(lambda x: x['Giveaways'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
dd['Giveaways_A'] = dd.apply(lambda x: x['Giveaways'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
dd['Goals_F'] = dd.apply(lambda x: x['Goals'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
dd['Goals_A'] = dd.apply(lambda x: x['Goals'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
dd['Hits_F'] = dd.apply(lambda x: x['Hits'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
dd['Hits_A'] = dd.apply(lambda x: x['Hits'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
dd['Miss_F'] = dd.apply(lambda x: x['Misses'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
dd['Miss_A'] = dd.apply(lambda x: x['Misses'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
dd['Penalties_F'] = dd.apply(lambda x: x['Penalties'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
dd['Penalties_A'] = dd.apply(lambda x: x['Penalties'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
dd['Shots_F'] = dd.apply(lambda x: x['Shots'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
dd['Shots_A'] = dd.apply(lambda x: x['Shots'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
dd['Takeaways_F'] = dd.apply(lambda x: x['Takeaways'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
dd['Takeaways_A'] = dd.apply(lambda x: x['Takeaways'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
dd = dd.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])

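- forward fill and backfill the on-ice event totals within each season, game number and team code. After this, every row of a team's game carries that team's full set of For and Against totals.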
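This sketch with made-up rows shows why `transform` is the safer call here. `transform` returns a Series on the original index, so it can be assigned back as a column. On pandas 2.x, `groupby(...).apply` may prepend the group keys to the index, and the same assignment can then fail.

```python
import numpy as np
import pandas as pd

rows = pd.DataFrame({
    'Season':     [2010] * 4,
    'GameNumber': [20001] * 4,
    'TeamCode':   ['MTL', 'MTL', 'TOR', 'TOR'],
    'Hits_F':     [np.nan, 29.0, 24.0, np.nan],
})
# Spread each team's single non-missing total to all of that team's rows in the game.
rows['Hits_F'] = (rows.groupby(['Season', 'GameNumber', 'TeamCode'])['Hits_F']
                      .transform(lambda s: s.ffill().bfill()))
print(rows['Hits_F'].tolist())
```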

In [340]:
# transform returns a result on dd's original index, so it can be assigned back as a column
# (groupby(...).apply can prepend the group keys to the index on newer pandas versions)
fill_cols = ['Blocks_F', 'Faceoffs_F', 'Giveaways_F', 'Goals_F', 'Hits_F', 'Miss_F', 'Penalties_F', 'Shots_F', 'Takeaways_F',
             'Blocks_A', 'Faceoffs_A', 'Giveaways_A', 'Goals_A', 'Hits_A', 'Miss_A', 'Penalties_A', 'Shots_A', 'Takeaways_A']
for col in fill_cols:
    dd[col] = dd.groupby(['Season', 'GameNumber', 'TeamCode'])[col].transform(lambda x: x.ffill().bfill())

- keep only the relevant columns and drop duplicates by season, game number and team code. This leaves two observations per game, one for each team.

In [341]:
dd = dd[['Season', 'GameNumber', 'TeamCode', 'Blocks_F', 'Blocks_A', 'Faceoffs_F', 'Faceoffs_A', 'Giveaways_F', 'Giveaways_A', 'Goals_F', 'Goals_A', 'Hits_F', 'Hits_A', 'Miss_F', 'Miss_A', 'Penalties_F', 'Penalties_A', 'Shots_F', 'Shots_A', 'Takeaways_F', 'Takeaways_A']]
dd = dd.sort_values(['Season', 'GameNumber'], ascending=[True, True])
dd = dd.drop_duplicates(['Season', 'GameNumber', 'TeamCode'])
dd.head()

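| | Season | GameNumber | TeamCode | Blocks_F | Blocks_A | Faceoffs_F | Faceoffs_A | Giveaways_F | Giveaways_A | Goals_F | Goals_A | Hits_F | Hits_A | Miss_F | Miss_A | Penalties_F | Penalties_A | Shots_F | Shots_A | Takeaways_F | Takeaways_A |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2010 | 20001 | MTL | 14.0 | 18.0 | 17.0 | 11.0 | 5.0 | 13.0 | 2.0 | 3.0 | 29.0 | 24.0 | 10.0 | 7.0 | 5.0 | 3.0 | 21.0 | 19.0 | 4.0 | 5.0 |
| 256153 | 2010 | 20001 | TOR | 18.0 | 14.0 | 11.0 | 17.0 | 13.0 | 5.0 | 3.0 | 2.0 | 24.0 | 29.0 | 7.0 | 10.0 | 3.0 | 5.0 | 19.0 | 21.0 | 5.0 | 4.0 |
| 210 | 2010 | 20002 | PHI | 12.0 | 12.0 | 15.0 | 25.0 | 7.0 | 8.0 | 1.0 | 1.0 | 27.0 | 28.0 | 8.0 | 11.0 | 6.0 | 5.0 | 20.0 | 25.0 | NaN | 8.0 |
| 256363 | 2010 | 20002 | PIT | 12.0 | 12.0 | 25.0 | 15.0 | 8.0 | 7.0 | 1.0 | 1.0 | 28.0 | 27.0 | 11.0 | 8.0 | 5.0 | 6.0 | 25.0 | 20.0 | 8.0 | NaN |
| 429 | 2010 | 20003 | CAR | 14.0 | 19.0 | 25.0 | 39.0 | 9.0 | 8.0 | 2.0 | 1.0 | 14.0 | 15.0 | 9.0 | 7.0 | 5.0 | 4.0 | 25.0 | 19.0 | 3.0 | 8.0 |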


In [342]:
dd.shape

(2460, 21)

- merge the even strength on-ice event totals (dd) onto the team roster player rank data frame (dc) with a left merge, so every row of dc is kept.
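A left merge keeps every row of `dc`, and team-games with no even strength totals in `dd` get NaN in the event columns. The sketch below uses made-up frames; `dc_demo` and `dd_demo` are hypothetical stand-ins. It shows two `pd.merge` options that make this visible. `indicator=True` adds a `_merge` column recording where each row was found. `validate='many_to_one'` raises an error if the right-hand keys are not unique.

```python
import pandas as pd

keys = ['Season', 'GameNumber', 'TeamCode']
dc_demo = pd.DataFrame({'Season': [2010, 2010, 2010],
                        'GameNumber': [20001, 20001, 20004],
                        'TeamCode': ['MTL', 'TOR', 'BOS']})
dd_demo = pd.DataFrame({'Season': [2010, 2010],
                        'GameNumber': [20001, 20001],
                        'TeamCode': ['MTL', 'TOR'],
                        'Hits_F': [29.0, 24.0]})

merged = pd.merge(dc_demo, dd_demo, on=keys, how='left',
                  indicator=True, validate='many_to_one')
unmatched = merged[merged['_merge'] == 'left_only']   # roster rows with no event totals
print(unmatched[keys])
```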

In [343]:
de = pd.merge(dc, dd, on=['Season', 'GameNumber', 'TeamCode'], how='left')
#dw = dv.drop('Unnamed: 0', axis=1)
de = de.drop_duplicates(['Season', 'GameNumber', 'TeamCode'])
de = de.sort_values(['Season', 'GameNumber'], ascending=[True, True])

In [344]:
de.shape

(2030, 34)

In [345]:
de.head()

Unnamed: 0,GD,GameNumber,LossTeam,Season,WinTeam,PlayerNumber,PlayerPosition,TeamCode,GF,GA,Rank,RosterCount,Position,PositionCount,FCount,DCount,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A
0,1,20001,MTL,2010,TOR,11.0,C,MTL,2,3,2,18.0,F,12.0,12.0,6.0,14.0,18.0,17.0,11.0,5.0,13.0,2.0,3.0,29.0,24.0,10.0,7.0,5.0,3.0,21.0,19.0,4.0,5.0
18270,1,20001,MTL,2010,TOR,37.0,C,TOR,3,2,2,18.0,F,12.0,12.0,6.0,18.0,14.0,11.0,17.0,13.0,5.0,3.0,2.0,24.0,29.0,7.0,10.0,3.0,5.0,19.0,21.0,5.0,4.0
18,-1,20002,PIT,2010,PHI,17.0,C,PHI,3,2,1,18.0,F,12.0,12.0,6.0,12.0,12.0,15.0,25.0,7.0,8.0,1.0,1.0,27.0,28.0,8.0,11.0,6.0,5.0,20.0,25.0,,8.0
18288,-1,20002,PIT,2010,PHI,71.0,C,PIT,2,3,1,18.0,F,12.0,12.0,6.0,12.0,12.0,25.0,15.0,8.0,7.0,1.0,1.0,28.0,27.0,11.0,8.0,5.0,6.0,25.0,20.0,8.0,
36,-1,20003,MIN,2010,CAR,53.0,C,CAR,4,3,1,18.0,F,12.0,12.0,6.0,14.0,19.0,25.0,39.0,9.0,8.0,2.0,1.0,14.0,15.0,9.0,7.0,5.0,4.0,25.0,19.0,3.0,8.0


- create indicator columns for a team win (TeamWin) and a team loss (TeamLos).

In [346]:
de['TeamWin'] =  de.apply(lambda x: 1 if x['TeamCode']==x['WinTeam'] else 0, 1)
de['TeamLos'] =  de.apply(lambda x: 1 if x['TeamCode']!=x['WinTeam'] else 0, 1)

- compute games played, games won and games lost, plus season totals of every on-ice event for and against, by team.

In [347]:
de['GP'] = de.groupby(['Season','TeamCode'])['GameNumber'].transform('count')
de['GW'] = de.groupby(['Season','WinTeam'])['TeamWin'].transform('sum')
de['GL'] = de.groupby(['Season','LossTeam'])['TeamLos'].transform('sum')
de['GF'] = de.groupby(['Season','TeamCode'])['GF'].transform('sum')
de['GA'] = de.groupby(['Season','TeamCode'])['GA'].transform('sum')
de['Blocks_F'] = de.groupby(['Season','TeamCode'])['Blocks_F'].transform('sum')
de['Faceoffs_F'] = de.groupby(['Season','TeamCode'])['Faceoffs_F'].transform('sum')
de['Giveaways_F'] = de.groupby(['Season','TeamCode'])['Giveaways_F'].transform('sum')
de['Goals_F'] = de.groupby(['Season','TeamCode'])['Goals_F'].transform('sum')
de['Hits_F'] = de.groupby(['Season','TeamCode'])['Hits_F'].transform('sum')
de['Miss_F'] = de.groupby(['Season','TeamCode'])['Miss_F'].transform('sum')
de['Penalties_F'] = de.groupby(['Season','TeamCode'])['Penalties_F'].transform('sum')
de['Shots_F'] = de.groupby(['Season','TeamCode'])['Shots_F'].transform('sum')
de['Takeaways_F'] = de.groupby(['Season','TeamCode'])['Takeaways_F'].transform('sum')
de['Blocks_A'] = de.groupby(['Season','TeamCode'])['Blocks_A'].transform('sum') 
de['Faceoffs_A'] = de.groupby(['Season','TeamCode'])['Faceoffs_A'].transform('sum')
de['Giveaways_A'] = de.groupby(['Season','TeamCode'])['Giveaways_A'].transform('sum')
de['Goals_A'] = de.groupby(['Season','TeamCode'])['Goals_A'].transform('sum')
de['Hits_A'] = de.groupby(['Season','TeamCode'])['Hits_A'].transform('sum')
de['Miss_A'] = de.groupby(['Season','TeamCode'])['Miss_A'].transform('sum')
de['Penalties_A'] = de.groupby(['Season','TeamCode'])['Penalties_A'].transform('sum')
de['Shots_A'] = de.groupby(['Season','TeamCode'])['Shots_A'].transform('sum')
de['Takeaways_A'] = de.groupby(['Season','TeamCode'])['Takeaways_A'].transform('sum')
de.head()

Unnamed: 0,GD,GameNumber,LossTeam,Season,WinTeam,PlayerNumber,PlayerPosition,TeamCode,GF,GA,Rank,RosterCount,Position,PositionCount,FCount,DCount,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A,TeamWin,TeamLos,GP,GW,GL
0,1,20001,MTL,2010,TOR,11.0,C,MTL,188,181,2,18.0,F,12.0,12.0,6.0,801.0,822.0,1481.0,1478.0,473.0,427.0,131.0,123.0,1255.0,1428.0,658.0,592.0,308.0,307.0,1601.0,1490.0,330.0,293.0,0,1,68,34,31
18270,1,20001,MTL,2010,TOR,37.0,C,TOR,195,218,2,18.0,F,12.0,12.0,6.0,945.0,905.0,1496.0,1512.0,651.0,559.0,140.0,146.0,1613.0,1503.0,678.0,745.0,292.0,319.0,1423.0,1608.0,407.0,450.0,1,0,70,34,31
18,-1,20002,PIT,2010,PHI,17.0,C,PHI,235,207,1,18.0,F,12.0,12.0,6.0,930.0,843.0,1653.0,1664.0,471.0,427.0,169.0,135.0,1533.0,1458.0,651.0,730.0,319.0,315.0,1634.0,1600.0,402.0,449.0,1,0,72,41,31
18288,-1,20002,PIT,2010,PHI,71.0,C,PIT,204,180,1,18.0,F,12.0,12.0,6.0,751.0,886.0,1547.0,1506.0,343.0,332.0,130.0,126.0,1803.0,1621.0,654.0,594.0,379.0,372.0,1650.0,1452.0,306.0,322.0,0,1,71,41,31
36,-1,20003,MIN,2010,CAR,53.0,C,CAR,222,218,1,18.0,F,12.0,12.0,6.0,910.0,1007.0,1538.0,1951.0,426.0,466.0,152.0,155.0,1849.0,1428.0,786.0,877.0,270.0,342.0,1655.0,1826.0,532.0,455.0,1,0,76,38,35


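- compute wins (W) and losses (L) per team for the season. GW is the season win total of each game's WinTeam, and GL is the loss total of its LossTeam. Each row's own record uses whichever of the two applies to it and recovers the other as games played minus that count.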
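The two lambdas in the next cell can be sketched in plain Python; `season_record` is a hypothetical helper. Here `gw` is the season win total of the game's WinTeam, `gl` is the loss total of its LossTeam, and `gp` is the row team's games played. The inputs are taken from the MTL and TOR rows of game 20001 in the output below, and the results match that output's W and L columns.

```python
def season_record(team, win_team, loss_team, gp, gw, gl):
    """Return (W, L) for `team`, mirroring the notebook's two lambdas.

    gw: season wins of win_team; gl: season losses of loss_team; gp: games played by team.
    """
    losses = gl if team == loss_team else gp - gw
    wins = gw if team == win_team else gp - gl
    return wins, losses

# Game 20001 (TOR beat MTL): GP, GW and GL copied from the de.head() output.
print(season_record('MTL', 'TOR', 'MTL', gp=68, gw=34, gl=31))  # -> (37, 31)
print(season_record('TOR', 'TOR', 'MTL', gp=70, gw=34, gl=31))  # -> (34, 36)
```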

In [348]:
de['L'] = de.apply(lambda x: x['GL'] if x['TeamCode']== x['LossTeam'] else (x['GP'] - x['GW']), 1)
de['W'] = de.apply(lambda x: x['GW'] if x['TeamCode']== x['WinTeam'] else (x['GP'] - x['GL']), 1)
de.head()

Unnamed: 0,GD,GameNumber,LossTeam,Season,WinTeam,PlayerNumber,PlayerPosition,TeamCode,GF,GA,Rank,RosterCount,Position,PositionCount,FCount,DCount,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A,TeamWin,TeamLos,GP,GW,GL,L,W
0,1,20001,MTL,2010,TOR,11.0,C,MTL,188,181,2,18.0,F,12.0,12.0,6.0,801.0,822.0,1481.0,1478.0,473.0,427.0,131.0,123.0,1255.0,1428.0,658.0,592.0,308.0,307.0,1601.0,1490.0,330.0,293.0,0,1,68,34,31,31,37
18270,1,20001,MTL,2010,TOR,37.0,C,TOR,195,218,2,18.0,F,12.0,12.0,6.0,945.0,905.0,1496.0,1512.0,651.0,559.0,140.0,146.0,1613.0,1503.0,678.0,745.0,292.0,319.0,1423.0,1608.0,407.0,450.0,1,0,70,34,31,36,34
18,-1,20002,PIT,2010,PHI,17.0,C,PHI,235,207,1,18.0,F,12.0,12.0,6.0,930.0,843.0,1653.0,1664.0,471.0,427.0,169.0,135.0,1533.0,1458.0,651.0,730.0,319.0,315.0,1634.0,1600.0,402.0,449.0,1,0,72,41,31,31,41
18288,-1,20002,PIT,2010,PHI,71.0,C,PIT,204,180,1,18.0,F,12.0,12.0,6.0,751.0,886.0,1547.0,1506.0,343.0,332.0,130.0,126.0,1803.0,1621.0,654.0,594.0,379.0,372.0,1650.0,1452.0,306.0,322.0,0,1,71,41,31,31,40
36,-1,20003,MIN,2010,CAR,53.0,C,CAR,222,218,1,18.0,F,12.0,12.0,6.0,910.0,1007.0,1538.0,1951.0,426.0,466.0,152.0,155.0,1849.0,1428.0,786.0,877.0,270.0,342.0,1655.0,1826.0,532.0,455.0,1,0,76,38,35,38,38


- divide wins and losses by games played to get each team's winning and losing percentage. Then divide every on-ice event total by games played to get each team's mean per game for the season.
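The per-game rates are plain ratios to GP, and all 18 `Mean_` columns follow one naming pattern (`'Mean_' + total_column`), so a loop can build them. The sketch below uses two rows copied from the output below (MTL and TOR, 2010) and keeps only one event column for brevity.

```python
import pandas as pd

season = pd.DataFrame({
    'TeamCode': ['MTL', 'TOR'],
    'GP': [68, 70],
    'W': [37, 34],
    'L': [31, 36],
    'Blocks_F': [801.0, 945.0],
})
season['WinPc'] = season['W'] / season['GP']
season['LossPc'] = season['L'] / season['GP']
for col in ['Blocks_F']:  # extend with the other *_F and *_A totals
    season['Mean_' + col] = season[col] / season['GP']
print(season.round(6))
```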

In [349]:
de = de.drop_duplicates(['Season', 'TeamCode'])
de['WinPc'] = de['W']/ de['GP']
de['LossPc'] = de['L']/ de['GP']
de['Mean_Blocks_F'] = de['Blocks_F']/ de['GP']
de['Mean_Faceoffs_F'] = de['Faceoffs_F']/ de['GP']
de['Mean_Giveaways_F'] = de['Giveaways_F']/ de['GP']
de['Mean_Goals_F'] = de['Goals_F']/ de['GP']
de['Mean_Hits_F'] = de['Hits_F']/ de['GP']
de['Mean_Miss_F'] = de['Miss_F']/ de['GP']
de['Mean_Penalties_F'] = de['Penalties_F']/ de['GP']
de['Mean_Shots_F'] = de['Shots_F']/ de['GP']
de['Mean_Takeaways_F'] = de['Takeaways_F']/ de['GP']
de['Mean_Blocks_A'] = de['Blocks_A']/ de['GP']
de['Mean_Faceoffs_A'] = de['Faceoffs_A']/ de['GP']
de['Mean_Giveaways_A'] = de['Giveaways_A']/ de['GP']
de['Mean_Goals_A'] = de['Goals_A']/ de['GP']
de['Mean_Hits_A'] = de['Hits_A']/ de['GP']
de['Mean_Miss_A'] = de['Miss_A']/ de['GP']
de['Mean_Penalties_A'] = de['Penalties_A']/ de['GP']
de['Mean_Shots_A'] = de['Shots_A']/ de['GP']
de['Mean_Takeaways_A'] = de['Takeaways_A']/ de['GP']
de.head()

Unnamed: 0,GD,GameNumber,LossTeam,Season,WinTeam,PlayerNumber,PlayerPosition,TeamCode,GF,GA,Rank,RosterCount,Position,PositionCount,FCount,DCount,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A,TeamWin,TeamLos,GP,GW,GL,L,W,WinPc,LossPc,Mean_Blocks_F,Mean_Faceoffs_F,Mean_Giveaways_F,Mean_Goals_F,Mean_Hits_F,Mean_Miss_F,Mean_Penalties_F,Mean_Shots_F,Mean_Takeaways_F,Mean_Blocks_A,Mean_Faceoffs_A,Mean_Giveaways_A,Mean_Goals_A,Mean_Hits_A,Mean_Miss_A,Mean_Penalties_A,Mean_Shots_A,Mean_Takeaways_A
0,1,20001,MTL,2010,TOR,11.0,C,MTL,188,181,2,18.0,F,12.0,12.0,6.0,801.0,822.0,1481.0,1478.0,473.0,427.0,131.0,123.0,1255.0,1428.0,658.0,592.0,308.0,307.0,1601.0,1490.0,330.0,293.0,0,1,68,34,31,31,37,0.544118,0.455882,11.779412,21.779412,6.955882,1.926471,18.455882,9.676471,4.529412,23.544118,4.852941,12.088235,21.735294,6.279412,1.808824,21.0,8.705882,4.514706,21.911765,4.308824
18270,1,20001,MTL,2010,TOR,37.0,C,TOR,195,218,2,18.0,F,12.0,12.0,6.0,945.0,905.0,1496.0,1512.0,651.0,559.0,140.0,146.0,1613.0,1503.0,678.0,745.0,292.0,319.0,1423.0,1608.0,407.0,450.0,1,0,70,34,31,36,34,0.485714,0.514286,13.5,21.371429,9.3,2.0,23.042857,9.685714,4.171429,20.328571,5.814286,12.928571,21.6,7.985714,2.085714,21.471429,10.642857,4.557143,22.971429,6.428571
18,-1,20002,PIT,2010,PHI,17.0,C,PHI,235,207,1,18.0,F,12.0,12.0,6.0,930.0,843.0,1653.0,1664.0,471.0,427.0,169.0,135.0,1533.0,1458.0,651.0,730.0,319.0,315.0,1634.0,1600.0,402.0,449.0,1,0,72,41,31,31,41,0.569444,0.430556,12.916667,22.958333,6.541667,2.347222,21.291667,9.041667,4.430556,22.694444,5.583333,11.708333,23.111111,5.930556,1.875,20.25,10.138889,4.375,22.222222,6.236111
18288,-1,20002,PIT,2010,PHI,71.0,C,PIT,204,180,1,18.0,F,12.0,12.0,6.0,751.0,886.0,1547.0,1506.0,343.0,332.0,130.0,126.0,1803.0,1621.0,654.0,594.0,379.0,372.0,1650.0,1452.0,306.0,322.0,0,1,71,41,31,31,40,0.56338,0.43662,10.577465,21.788732,4.830986,1.830986,25.394366,9.211268,5.338028,23.239437,4.309859,12.478873,21.211268,4.676056,1.774648,22.830986,8.366197,5.239437,20.450704,4.535211
36,-1,20003,MIN,2010,CAR,53.0,C,CAR,222,218,1,18.0,F,12.0,12.0,6.0,910.0,1007.0,1538.0,1951.0,426.0,466.0,152.0,155.0,1849.0,1428.0,786.0,877.0,270.0,342.0,1655.0,1826.0,532.0,455.0,1,0,76,38,35,38,38,0.5,0.5,11.973684,20.236842,5.605263,2.0,24.328947,10.342105,3.552632,21.776316,7.0,13.25,25.671053,6.131579,2.039474,18.789474,11.539474,4.5,24.026316,5.986842


In [350]:
de = de[['Season', 'TeamCode', 'GP', 'W', 'L','WinPc', 'LossPc', 'Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Goals_F','Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F','Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A','Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Goals_A','Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A','Mean_Shots_A', 'Mean_Takeaways_A']]
de['Rank_W'] = de.groupby(['Season'])['WinPc'].rank(ascending=False)
de = de.sort_values(['Season', 'Rank_W'], ascending=[True, True])
de.head(30)

Unnamed: 0,Season,TeamCode,GP,W,L,WinPc,LossPc,Mean_Blocks_F,Mean_Faceoffs_F,Mean_Giveaways_F,Mean_Goals_F,Mean_Hits_F,Mean_Miss_F,Mean_Penalties_F,Mean_Shots_F,Mean_Takeaways_F,Mean_Blocks_A,Mean_Faceoffs_A,Mean_Giveaways_A,Mean_Goals_A,Mean_Hits_A,Mean_Miss_A,Mean_Penalties_A,Mean_Shots_A,Mean_Takeaways_A,Rank_W
18576,2010,VAN,73,48,25,0.657534,0.342466,10.726027,24.616438,5.438356,2.109589,20.657534,9.821918,4.109589,23.328767,6.109589,11.986301,20.60274,5.890411,1.712329,21.356164,8.890411,4.109589,21.835616,5.986301,1.0
90,2010,SJ,65,41,24,0.630769,0.369231,11.661538,23.630769,8.353846,2.092308,20.784615,10.538462,3.953846,24.369231,7.107692,12.430769,20.923077,8.030769,1.553846,22.523077,8.907692,4.0,21.292308,5.923077,2.0
18432,2010,BOS,76,45,31,0.592105,0.407895,11.894737,23.802632,5.565789,2.263158,19.565789,9.434211,4.157895,24.223684,4.302632,13.065789,21.684211,7.013158,1.565789,22.355263,9.355263,4.092105,24.618421,6.473684,3.0
18396,2010,DET,68,40,28,0.588235,0.411765,8.75,23.691176,7.382353,2.25,19.735294,10.588235,3.544118,24.602941,5.661765,11.044118,22.308824,6.132353,2.132353,21.941176,9.014706,3.529412,22.382353,5.779412,4.0
126,2010,ANA,65,38,27,0.584615,0.415385,11.646154,20.569231,6.076923,1.830769,21.261538,8.138462,4.661538,19.8,4.507692,8.230769,22.4,7.338462,1.969231,20.061538,11.369231,4.230769,24.384615,5.030769,5.0
18720,2010,WSH,72,42,30,0.583333,0.416667,12.597222,22.833333,6.888889,1.75,20.652778,9.347222,3.930556,21.902778,6.097222,12.888889,22.5,6.402778,1.611111,21.347222,9.458333,3.555556,21.097222,5.833333,6.0
306,2010,LA,70,40,30,0.571429,0.428571,10.414286,22.342857,7.885714,1.871429,23.342857,10.085714,4.014286,20.271429,4.6,11.757143,21.528571,7.571429,1.771429,26.142857,9.428571,4.242857,20.842857,4.571429,7.0
18,2010,PHI,72,41,31,0.569444,0.430556,12.916667,22.958333,6.541667,2.347222,21.291667,9.041667,4.430556,22.694444,5.583333,11.708333,23.111111,5.930556,1.875,20.25,10.138889,4.375,22.222222,6.236111,8.0
18288,2010,PIT,71,40,31,0.56338,0.43662,10.577465,21.788732,4.830986,1.830986,25.394366,9.211268,5.338028,23.239437,4.309859,12.478873,21.211268,4.676056,1.774648,22.830986,8.366197,5.239437,20.450704,4.535211,9.0
180,2010,NYR,73,41,32,0.561644,0.438356,12.616438,20.726027,3.90411,2.041096,25.575342,9.39726,4.315068,21.821918,5.958904,10.39726,22.60274,5.767123,1.739726,24.931507,8.69863,4.671233,22.109589,5.493151,10.0


In [351]:
# index must be the boolean False: the string 'False' is truthy, so the index would still be written
de.to_csv('/Users/stefanostselios/Brock University/Kevin Mongeon - StephanosShare/out/season_team_even_strength_events_ranking.csv', index=False, sep=',')
#de.to_csv('/Users/kevinmongeon/Brock University/Steve Tselios - StephanosShare/out/season_team_even_strength_events_ranking.csv', index=False, sep=',')

- compute, for each team, the difference between the for (F) and against (A) per-game means of each on-ice event.

In [352]:
de['DBlock'] = de['Mean_Blocks_F'] - de['Mean_Blocks_A']
de['DFaceoff'] = de['Mean_Faceoffs_F'] - de['Mean_Faceoffs_A']
de['DGiveaway'] = de['Mean_Giveaways_F'] - de['Mean_Giveaways_A']
de['DGoal'] = de['Mean_Goals_F'] - de['Mean_Goals_A']
de['DHit'] = de['Mean_Hits_F'] - de['Mean_Hits_A']
de['DMiss'] = de['Mean_Miss_F'] - de['Mean_Miss_A']
de['DPenalty'] = de['Mean_Penalties_F'] - de['Mean_Penalties_A']
de['DShot'] = de['Mean_Shots_F'] - de['Mean_Shots_A']
de['DTakeaway'] = de['Mean_Takeaways_F'] - de['Mean_Takeaways_A']
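The nine differentials above all follow one pattern, `D<event> = Mean_<event>_F - Mean_<event>_A`. A minimal sketch of the same computation written as a loop, on a hypothetical two-team frame (the numbers are made up for illustration and are not taken from the data):

```python
import pandas as pd

# hypothetical per-team means; only two event types shown for brevity
toy = pd.DataFrame({
    'TeamCode': ['AAA', 'BBB'],
    'Mean_Shots_F': [24.0, 20.0], 'Mean_Shots_A': [21.0, 22.5],
    'Mean_Hits_F': [18.0, 25.0], 'Mean_Hits_A': [20.0, 19.0],
})

# map each differential name to the event stem used in the Mean_* column names
diff_names = {'DShot': 'Shots', 'DHit': 'Hits'}
for diff, stem in diff_names.items():
    toy[diff] = toy[f'Mean_{stem}_F'] - toy[f'Mean_{stem}_A']

print(toy[['TeamCode', 'DShot', 'DHit']])
```

Extending `diff_names` to all nine pairs (for example `'DMiss': 'Miss'` and `'DPenalty': 'Penalties'`, following the column names in this notebook) reproduces the cell above.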

##  even strength on-ice events analysis

- summary analysis

In [353]:
de.describe()

Unnamed: 0,Season,GP,W,L,WinPc,LossPc,Mean_Blocks_F,Mean_Faceoffs_F,Mean_Giveaways_F,Mean_Goals_F,Mean_Hits_F,Mean_Miss_F,Mean_Penalties_F,Mean_Shots_F,Mean_Takeaways_F,Mean_Blocks_A,Mean_Faceoffs_A,Mean_Giveaways_A,Mean_Goals_A,Mean_Hits_A,Mean_Miss_A,Mean_Penalties_A,Mean_Shots_A,Mean_Takeaways_A,Rank_W,DBlock,DFaceoff,DGiveaway,DGoal,DHit,DMiss,DPenalty,DShot,DTakeaway
count,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0
mean,2010.0,67.666667,33.833333,33.833333,0.498496,0.501504,11.417775,22.192122,6.410316,1.949343,20.842484,9.403684,4.10223,22.113145,5.842156,11.380892,22.204486,6.405016,1.961169,20.959631,9.370857,4.104053,22.022916,5.842192,15.5,0.036883,-0.012363,0.0053,-0.011826,-0.117148,0.032828,-0.001822,0.090229,-3.6e-05
std,0.0,8.052985,7.991734,7.240134,0.09448,0.09448,1.175248,1.240125,1.452156,0.196917,2.566832,0.761482,0.530903,1.645818,1.126842,1.077691,1.063502,1.147924,0.244096,2.552182,0.935056,0.49037,1.357174,0.974271,8.802429,1.293126,1.908653,0.84825,0.360516,2.752829,1.229296,0.357114,2.254737,0.848269
min,2010.0,38.0,18.0,20.0,0.296875,0.342466,8.75,19.117647,3.849315,1.537313,16.96,8.138462,3.134615,18.455882,4.302632,8.230769,20.223881,4.424658,1.553846,17.071429,7.19403,3.328358,19.447761,4.308824,1.0,-2.294118,-5.434211,-1.863014,-0.605634,-6.326923,-3.230769,-0.947368,-4.911765,-2.171053
25%,2010.0,66.25,28.25,30.0,0.438263,0.432072,10.657468,21.437022,5.575658,1.825339,19.064396,8.666685,3.715789,21.522403,5.052473,10.882837,21.571233,5.610898,1.772233,19.099265,8.750548,3.664292,21.145994,4.996498,8.25,-0.844482,-1.040179,-0.450883,-0.278807,-2.026923,-0.936607,-0.238961,-1.397433,-0.46273
50%,2010.0,70.0,36.5,31.5,0.520833,0.479167,11.480529,22.277699,6.362939,1.89895,20.441176,9.415735,4.145124,21.909135,5.700113,11.457576,22.248162,6.25509,1.920433,20.559706,9.22,4.100847,22.054795,5.878205,15.5,-0.296518,0.276961,0.232143,0.058472,-0.550121,0.292995,0.021242,0.475548,0.166121
75%,2010.0,72.0,40.0,37.5,0.567928,0.561737,11.960526,22.960737,7.229577,2.088462,21.891098,10.03413,4.401684,23.306434,6.359155,11.929012,22.794406,7.194444,2.084834,22.251741,9.923077,4.511029,22.933571,6.493421,22.75,0.746711,0.864019,0.393269,0.250641,1.164706,0.949295,0.252213,2.03851,0.524527
max,2010.0,76.0,48.0,47.0,0.657534,0.703125,13.742857,24.616438,9.852941,2.347222,26.657534,10.69697,5.338028,24.602941,8.58,13.25,25.671053,9.338235,2.453125,26.142857,11.539474,5.239437,24.618421,7.897059,30.0,3.415385,4.013699,1.940299,0.697368,7.041096,1.636364,0.549296,3.394737,1.46


#### $WinPc = \beta_{0} + \beta_{1}MeanBlocks_F + \beta_{2}MeanFaceoffs_F + \beta_{3}MeanGiveaways_F + \beta_{4}MeanGoals_F + \beta_{5}MeanHits_F + \beta_{6}MeanMiss_F + \beta_{7}MeanPenalties_F + \beta_{8}MeanShots_F + \beta_{9}MeanTakeaways_F + \beta_{10}MeanBlocks_A + \beta_{11}MeanFaceoffs_A + \beta_{12}MeanGiveaways_A + \beta_{13}MeanGoals_A + \beta_{14}MeanHits_A + \beta_{15}MeanMiss_A + \beta_{16}MeanPenalties_A + \beta_{17}MeanShots_A + \beta_{18}MeanTakeaways_A + e_{s}$

In [354]:
print ('win percent for even strength events')
y = de['WinPc']  
X = sm.add_constant(de[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Goals_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Goals_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
result = sm.OLS(y, X).fit()
print(result.summary())

win percent for even strength events
                            OLS Regression Results                            
Dep. Variable:                  WinPc   R-squared:                       0.876
Model:                            OLS   Adj. R-squared:                  0.674
Method:                 Least Squares   F-statistic:                     4.331
Date:                Fri, 19 Jan 2018   Prob (F-statistic):            0.00843
Time:                        20:28:33   Log-Likelihood:                 60.076
No. Observations:                  30   AIC:                            -82.15
Df Residuals:                      11   BIC:                            -55.53
Df Model:                          18                                         
Covariance Type:            nonrobust                                         
                       coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------------
con
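The gap between R-squared (0.876) and adjusted R-squared (0.674) comes from fitting 18 predictors plus a constant to 30 team-seasons, which leaves 11 residual degrees of freedom. The standard formulas, with n observations and k predictors (constant excluded), can be checked against the two WinPc OLS summaries in this section; the small differences come from the rounding of the printed R-squared:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors (constant excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def overall_f(r2, n, k):
    """F-statistic for the joint test that all k slope coefficients are zero."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# (printed R-squared, number of predictors k); n = 30 team-seasons
models = {'levels': (0.876, 18), 'differentials': (0.729, 9)}
for name, (r2, k) in models.items():
    print(name, round(adjusted_r2(r2, 30, k), 3), round(overall_f(r2, 30, k), 2))
```

The same arithmetic shows why the differential model, with 9 predictors and 20 residual degrees of freedom, loses far less to the adjustment.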

In [355]:
print ('win percent for even strength events')
y = de['WinPc']  
X = sm.add_constant(de[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Goals_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Goals_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
result = sm.Logit(y, X).fit()
print(result.summary())

win percent for even strength events
Optimization terminated successfully.
         Current function value: 0.663241
         Iterations 4
                           Logit Regression Results                           
Dep. Variable:                  WinPc   No. Observations:                   30
Model:                          Logit   Df Residuals:                       11
Method:                           MLE   Df Model:                           18
Date:                Fri, 19 Jan 2018   Pseudo R-squ.:                 0.04313
Time:                        20:28:33   Log-Likelihood:                -19.897
converged:                       True   LL-Null:                       -20.794
                                        LLR p-value:                     1.000
                       coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------------
const               -1.6550     30.040     -0.055      0.95
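The Logit summaries report McFadden's pseudo R-squared, 1 - LL/LL-Null, and the optimizer's "Current function value" is the negative average log-likelihood, -LL/n. Because a constant-only model that predicts a win probability of about 0.5 contributes ln(0.5) per team-season whatever the observed WinPc, LL-Null is close to 30·ln(0.5). A quick check against the first Logit summary:

```python
import math

n = 30                      # team-seasons
ll_model = -19.897          # Log-Likelihood from the summary
ll_null_reported = -20.794  # LL-Null from the summary

# constant-only model predicting 0.5 for every team-season
ll_null_approx = n * math.log(0.5)

# McFadden's pseudo R-squared and the optimizer's reported function value
pseudo_r2 = 1 - ll_model / ll_null_reported
avg_neg_ll = -ll_model / n

print(round(ll_null_approx, 3), round(pseudo_r2, 4), round(avg_neg_ll, 6))
```

The differential Logit gives 1 - 20.050/20.794 ≈ 0.0358 by the same formula, in line with its reported Pseudo R-squared of 0.03578.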

#### $WinPc = \beta_{0} + \beta_{1}DBlock + \beta_{2}DFaceoff + \beta_{3}DGiveaway + \beta_{4}DGoal + \beta_{5}DHit + \beta_{6}DMiss+ \beta_{7}DPenalty + \beta_{8}DShot + \beta_{9}DTakeaway + e_{s}$

In [356]:
print ('win percent for even strength events differential')
y = de['WinPc']  
X = sm.add_constant(de[['DBlock', 'DFaceoff', 'DGiveaway', 'DGoal', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
result = sm.OLS(y, X).fit()
print(result.summary())

win percent for even strength events differential
                            OLS Regression Results                            
Dep. Variable:                  WinPc   R-squared:                       0.729
Model:                            OLS   Adj. R-squared:                  0.607
Method:                 Least Squares   F-statistic:                     5.976
Date:                Fri, 19 Jan 2018   Prob (F-statistic):           0.000437
Time:                        20:28:33   Log-Likelihood:                 48.303
No. Observations:                  30   AIC:                            -76.61
Df Residuals:                      20   BIC:                            -62.59
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
co

In [357]:
print ('win percent for even strength events differential')
y = de['WinPc']  
X = sm.add_constant(de[['DBlock', 'DFaceoff', 'DGiveaway', 'DGoal', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
result = sm.Logit(y, X).fit()
print(result.summary())

win percent for even strength events differential
Optimization terminated successfully.
         Current function value: 0.668337
         Iterations 4
                           Logit Regression Results                           
Dep. Variable:                  WinPc   No. Observations:                   30
Model:                          Logit   Df Residuals:                       20
Method:                           MLE   Df Model:                            9
Date:                Fri, 19 Jan 2018   Pseudo R-squ.:                 0.03578
Time:                        20:28:33   Log-Likelihood:                -20.050
converged:                       True   LL-Null:                       -20.794
                                        LLR p-value:                    0.9972
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          0.0029      0.371      0.008      0.994    

#### mean goals for analysis

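- regress **mean goals for** on the mean of on-ice events (predictor variables), adding a constant to the predictors and estimating with **OLS**. The purpose is to determine the impact each on-ice event has on goals scored. The **Logit** cells are left commented out; Logit needs a dependent variable between 0 and 1, which mean goals per game is not.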
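In current statsmodels versions, `Logit` rejects a dependent variable outside the interval [0, 1]. The `describe()` output above shows which dependent variables qualify. A small check using the minimum and maximum values copied from that summary (the helper name is illustrative):

```python
# (min, max) copied from the describe() output of the team-season table
observed_range = {
    'WinPc': (0.296875, 0.657534),
    'Mean_Goals_F': (1.537313, 2.347222),
    'Mean_Goals_A': (1.553846, 2.453125),
}

def fits_unit_interval(lo, hi):
    """True if every value between lo and hi is a valid Logit dependent variable."""
    return 0.0 <= lo and hi <= 1.0

for name, (lo, hi) in observed_range.items():
    print(name, fits_unit_interval(lo, hi))
```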

#### $MeanGoals_F = \beta_{0} + \beta_{1}MeanBlocks_F + \beta_{2}MeanFaceoffs_F + \beta_{3}MeanGiveaways_F + \beta_{4}MeanHits_F + \beta_{5}MeanMiss_F + \beta_{6}MeanPenalties_F + \beta_{7}MeanShots_F + \beta_{8}MeanTakeaways_F + \beta_{9}MeanBlocks_A + \beta_{10}MeanFaceoffs_A + \beta_{11}MeanGiveaways_A + \beta_{12}MeanHits_A + \beta_{13}MeanMiss_A + \beta_{14}MeanPenalties_A + \beta_{15}MeanShots_A + \beta_{16}MeanTakeaways_A + e_{s}$

In [358]:
print ('mean goals for in even strength events')
y = de['Mean_Goals_F']  
X = sm.add_constant(de[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
result = sm.OLS(y, X).fit()
print (result.summary())

mean goals for in even strength events
                            OLS Regression Results                            
Dep. Variable:           Mean_Goals_F   R-squared:                       0.781
Model:                            OLS   Adj. R-squared:                  0.512
Method:                 Least Squares   F-statistic:                     2.904
Date:                Fri, 19 Jan 2018   Prob (F-statistic):             0.0294
Time:                        20:28:33   Log-Likelihood:                 29.497
No. Observations:                  30   AIC:                            -24.99
Df Residuals:                      13   BIC:                            -1.174
Df Model:                          16                                         
Covariance Type:            nonrobust                                         
                       coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------------
c

In [359]:
#y = de['Mean_Goals_F']  
#X = sm.add_constant(de[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
#result = sm.Logit(y, X).fit()
#result.summary()

#### $MeanGoals_F = \beta_{0} + \beta_{1}DBlock + \beta_{2}DFaceoff + \beta_{3}DGiveaway + \beta_{4}DHit + \beta_{5}DMiss+ \beta_{6}DPenalty + \beta_{7}DShot + \beta_{8}DTakeaway + e_{s}$

In [360]:
print ('mean goals for in even strength events differential')
y = de['Mean_Goals_F']  
X = sm.add_constant(de[['DBlock', 'DFaceoff', 'DGiveaway', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
result = sm.OLS(y, X).fit()
print (result.summary())

mean goals for in even strength events differential
                            OLS Regression Results                            
Dep. Variable:           Mean_Goals_F   R-squared:                       0.389
Model:                            OLS   Adj. R-squared:                  0.156
Method:                 Least Squares   F-statistic:                     1.668
Date:                Fri, 19 Jan 2018   Prob (F-statistic):              0.165
Time:                        20:28:33   Log-Likelihood:                 14.070
No. Observations:                  30   AIC:                            -10.14
Df Residuals:                      21   BIC:                             2.471
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------


In [361]:
#y = de['Mean_Goals_F']  
#X = sm.add_constant(de[['DBlock', 'DFaceoff', 'DGiveaway', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
#result = sm.Logit(y, X).fit()
#result.summary()

#### mean goals against analysis

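- regress **mean goals against** on the mean of on-ice events (predictor variables), adding a constant to the predictors and estimating with **OLS**. The purpose is to determine the impact each on-ice event has on goals allowed. The **Logit** cells are left commented out; Logit needs a dependent variable between 0 and 1, which mean goals per game is not.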

#### $MeanGoals_A = \beta_{0} + \beta_{1}MeanBlocks_F + \beta_{2}MeanFaceoffs_F + \beta_{3}MeanGiveaways_F + \beta_{4}MeanHits_F + \beta_{5}MeanMiss_F + \beta_{6}MeanPenalties_F + \beta_{7}MeanShots_F + \beta_{8}MeanTakeaways_F + \beta_{9}MeanBlocks_A + \beta_{10}MeanFaceoffs_A + \beta_{11}MeanGiveaways_A + \beta_{12}MeanHits_A + \beta_{13}MeanMiss_A + \beta_{14}MeanPenalties_A + \beta_{15}MeanShots_A + \beta_{16}MeanTakeaways_A + e_{s}$

In [362]:
print ('mean goals against in even strength events')
y = de['Mean_Goals_A']  
X = sm.add_constant(de[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
result = sm.OLS(y, X).fit()
print(result.summary())

mean goals against in even strength events
                            OLS Regression Results                            
Dep. Variable:           Mean_Goals_A   R-squared:                       0.698
Model:                            OLS   Adj. R-squared:                  0.326
Method:                 Least Squares   F-statistic:                     1.876
Date:                Fri, 19 Jan 2018   Prob (F-statistic):              0.129
Time:                        20:28:33   Log-Likelihood:                 18.198
No. Observations:                  30   AIC:                            -2.396
Df Residuals:                      13   BIC:                             21.42
Df Model:                          16                                         
Covariance Type:            nonrobust                                         
                       coef    std err          t      P>|t|      [95.0% Conf. Int.]
----------------------------------------------------------------------------------

In [363]:
#y = de['Mean_Goals_A']  
#X = sm.add_constant(de[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
#result = sm.Logit(y, X).fit()
#result.summary()

#### $MeanGoals_A = \beta_{0} + \beta_{1}DBlock + \beta_{2}DFaceoff + \beta_{3}DGiveaway + \beta_{4}DHit + \beta_{5}DMiss+ \beta_{6}DPenalty + \beta_{7}DShot + \beta_{8}DTakeaway + e_{s}$

In [364]:
print ('mean goals against for differential in even strength events')
y = de['Mean_Goals_A']  
X = sm.add_constant(de[['DBlock', 'DFaceoff', 'DGiveaway', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
result = sm.OLS(y, X).fit()
print (result.summary())

mean goals against for differential in even strength events
                            OLS Regression Results                            
Dep. Variable:           Mean_Goals_A   R-squared:                       0.286
Model:                            OLS   Adj. R-squared:                  0.015
Method:                 Least Squares   F-statistic:                     1.054
Date:                Fri, 19 Jan 2018   Prob (F-statistic):              0.430
Time:                        20:28:33   Log-Likelihood:                 5.3080
No. Observations:                  30   AIC:                             7.384
Df Residuals:                      21   BIC:                             19.99
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
-----------------------------------------------------------------------

In [365]:
#y = de['Mean_Goals_A']  
#X = sm.add_constant(de[['DBlock', 'DFaceoff', 'DGiveaway', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
#result = sm.Logit(y, X).fit()
#result.summary()

## 3) all on-ice events prior to a goal data

- keep only goals and number them in order within each game (GoalNumber is 1 for the first goal of a game, 2 for the second, and so on).

In [366]:
df = da[['Season', 'GameNumber', 'EventNumber', 'AdvantageType', 'Period', 'EventType', 'EventTimeFromZero', 'EventDetail', 'VTeamCode', 'HTeamCode', 'EventTeamCode']]
# keep only goal events; .copy() makes dg an independent frame and avoids SettingWithCopyWarning
dg = df[df['EventType'] == 'GOAL'].copy()
# number goals in row order within each game: 1, 2, 3, ...
dg['GoalNumber'] = dg.groupby(['Season', 'GameNumber']).cumcount() + 1
dg = dg[['Season', 'GameNumber', 'EventNumber', 'AdvantageType', 'Period', 'EventType', 'EventTimeFromZero', 'EventDetail', 'EventTeamCode', 'VTeamCode', 'HTeamCode', 'GoalNumber']]
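The goal numbering uses `GroupBy.cumcount`, which numbers the rows of each group 0, 1, 2, ... in their current order. A minimal sketch on a hypothetical game (event numbers and types are made up):

```python
import pandas as pd

# hypothetical play-by-play rows for one game (not from the data)
events = pd.DataFrame({
    'Season': [2010] * 6,
    'GameNumber': [20001] * 6,
    'EventNumber': [1, 3, 4, 7, 9, 12],
    'EventType': ['FAC', 'SHOT', 'GOAL', 'HIT', 'GOAL', 'GOAL'],
})

goals = events[events['EventType'] == 'GOAL'].copy()
# cumcount starts at 0 within each game, so add 1 to get goal numbers
goals['GoalNumber'] = goals.groupby(['Season', 'GameNumber']).cumcount() + 1
print(goals[['EventNumber', 'GoalNumber']])
```

Because `cumcount` follows row order, the goal rows need to be sorted by event number within each game for the numbers to match the order in which goals were scored.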


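- merge dg onto df so that each goal row carries its goal number. Then backfill within groups: advantage type by season and game number, and goal number by season, game number and period, so that each event takes the number of the next goal scored in the same period.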

In [367]:
df = pd.merge(df, dg, on=['Season', 'GameNumber', 'EventNumber', 'AdvantageType', 'Period', 'EventType', 'EventTimeFromZero', 'EventDetail', 'EventTeamCode', 'VTeamCode', 'HTeamCode'], how='left')
# GroupBy.bfill backfills within each group and keeps df's row index, so the assignment aligns
df['AdvantageType'] = df.groupby(['Season', 'GameNumber'])['AdvantageType'].bfill()
df['GoalNumber'] = df.groupby(['Season', 'GameNumber', 'Period'])['GoalNumber'].bfill()
df.head()

Unnamed: 0,Season,GameNumber,EventNumber,AdvantageType,Period,EventType,EventTimeFromZero,EventDetail,VTeamCode,HTeamCode,EventTeamCode,GoalNumber
0,2010,20001,1,EV,1,FAC,0,MTL won Neu. Zone - MTL #11 GOMEZ vs TOR #37 B...,MTL,TOR,MTL,1.0
1,2010,20001,3,EV,1,HIT,15,"TOR #37 BRENT HIT MTL #26 GORGES, Off. Zone",MTL,TOR,TOR,1.0
2,2010,20001,4,EV,1,HIT,46,"MTL #14 PLEKANEC HIT TOR #2 SCHENN, Off. Zone",MTL,TOR,MTL,1.0
3,2010,20001,5,EV,1,HIT,57,"MTL #76 SUBBAN HIT TOR #15 KABERLE, Neu. Zone",MTL,TOR,MTL,1.0
4,2010,20001,6,EV,1,GIVE,69,"TOR&nbsp;GIVEAWAY - #35 GIGUERE, Def. Zone",MTL,TOR,TOR,1.0
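The within-period backfill decides which goal each event is attributed to. A minimal sketch on a hypothetical game with two periods (values made up) shows the rule: an event takes the number of the next goal in its period, and events after the last goal of a period stay missing.

```python
import numpy as np
import pandas as pd

# hypothetical events: period 1 has goals at events 3 and 5; period 2 has a goal at event 8
toy = pd.DataFrame({
    'Season': [2010] * 9,
    'GameNumber': [20001] * 9,
    'Period': [1, 1, 1, 1, 1, 1, 2, 2, 2],
    'EventNumber': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'GoalNumber': [np.nan, np.nan, 1, np.nan, 2, np.nan, np.nan, 3, np.nan],
})

# backfill within each season, game and period
toy['GoalNumber'] = toy.groupby(['Season', 'GameNumber', 'Period'])['GoalNumber'].bfill()
print(toy[['Period', 'EventNumber', 'GoalNumber']])
```

Events 6 and 9 remain NaN because no goal follows them in their period; those are the rows removed by `dropna(subset=['GoalNumber'])` further down.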


In [368]:
df.isnull().sum()

Season                    0
GameNumber                0
EventNumber               0
AdvantageType             0
Period                    0
EventType                 0
EventTimeFromZero         0
EventDetail               0
VTeamCode                 0
HTeamCode                 0
EventTeamCode             0
GoalNumber           131354
dtype: int64

In [369]:
df.shape

(310113, 12)

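- Events left without a goal number after the backfill, i.e. those that occurred after the last goal of their period (including every event in a scoreless period), are excluded from the dataframe. Duplicates are then dropped so each event appears once per event team.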

In [370]:
df = df.dropna(subset=['GoalNumber'])
df = df.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])
df = df.drop_duplicates(['Season', 'GameNumber', 'EventNumber', 'EventTeamCode'])
df.head()

Unnamed: 0,Season,GameNumber,EventNumber,AdvantageType,Period,EventType,EventTimeFromZero,EventDetail,VTeamCode,HTeamCode,EventTeamCode,GoalNumber
0,2010,20001,1,EV,1,FAC,0,MTL won Neu. Zone - MTL #11 GOMEZ vs TOR #37 B...,MTL,TOR,MTL,1.0
1,2010,20001,3,EV,1,HIT,15,"TOR #37 BRENT HIT MTL #26 GORGES, Off. Zone",MTL,TOR,TOR,1.0
2,2010,20001,4,EV,1,HIT,46,"MTL #14 PLEKANEC HIT TOR #2 SCHENN, Off. Zone",MTL,TOR,MTL,1.0
3,2010,20001,5,EV,1,HIT,57,"MTL #76 SUBBAN HIT TOR #15 KABERLE, Neu. Zone",MTL,TOR,MTL,1.0
4,2010,20001,6,EV,1,GIVE,69,"TOR&nbsp;GIVEAWAY - #35 GIGUERE, Def. Zone",MTL,TOR,TOR,1.0


In [371]:
df.shape

(178759, 12)

- Create indicator columns that equal 1 when an event is of the given type and NaN otherwise (goal, block, faceoff, giveaway, hit, miss, penalty, shot and takeaway). Then group by season, game number, event team and event type and sum each indicator, which gives the number of each on-ice event per team per game.

In [372]:
# indicator columns: 1 where the event is of the given type, NaN otherwise
indicators = {'Goal': 'GOAL', 'Block': 'BLOCK', 'Faceoff': 'FAC', 'Giveaway': 'GIVE', 'Hit': 'HIT',
              'Miss': 'MISS', 'Penalty': 'PENL', 'Shot': 'SHOT', 'Takeaway': 'TAKE'}
for col, event_type in indicators.items():
    df[col] = np.where(df['EventType'] == event_type, 1, np.nan)

In [373]:
# count each event type per team per game; min_count=1 leaves groups with no matching event as NaN
# (a plain 'sum' returns 0 for them in pandas >= 0.22, which the later ffill/bfill would not overwrite)
counts = {'Blocks': 'Block', 'Faceoffs': 'Faceoff', 'Giveaways': 'Giveaway', 'Goals': 'Goal', 'Hits': 'Hit',
          'Misses': 'Miss', 'Penalties': 'Penalty', 'Shots': 'Shot', 'Takeaways': 'Takeaway'}
keys = ['Season', 'GameNumber', 'EventTeamCode', 'EventType']
for total, indicator in counts.items():
    df[total] = df.groupby(keys)[indicator].transform('sum', min_count=1)
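The counts depend on how a group with no matching event is summed. Since pandas 0.22, `sum` over an all-NaN group returns 0; passing `min_count=1` returns NaN instead, which matches the blanks in the output further down (for example `Blocks` on a faceoff row). A toy illustration with hypothetical rows:

```python
import numpy as np
import pandas as pd

# hypothetical rows: team AAA has one block and one faceoff
toy = pd.DataFrame({
    'EventTeamCode': ['AAA', 'AAA'],
    'EventType': ['BLOCK', 'FAC'],
})
toy['Block'] = np.where(toy['EventType'] == 'BLOCK', 1, np.nan)

keys = ['EventTeamCode', 'EventType']
# default: the all-NaN FAC group sums to 0
toy['Blocks_default'] = toy.groupby(keys)['Block'].transform('sum')
# min_count=1: the all-NaN FAC group stays NaN
toy['Blocks_min1'] = toy.groupby(keys)['Block'].transform('sum', min_count=1)
print(toy)
```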

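- reshape the dataframe from wide to long: stack VTeamCode and HTeamCode into a single TeamCode column so that each event appears once for each team in the game. EventTeamCode is temporarily renamed to EventTeam so that the `'TeamCode' in col` filter does not pick it up.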

In [374]:
df = df.rename(columns={'EventTeamCode': 'EventTeam'})
a = [col for col in df.columns if 'TeamCode' in col]
df = pd.lreshape(df, {'TeamCode' : a})
df = df.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])
df = df.rename(columns={'EventTeam': 'EventTeamCode'})
df.head()

Unnamed: 0,AdvantageType,Block,Blocks,EventDetail,EventNumber,EventTeamCode,EventTimeFromZero,EventType,Faceoff,Faceoffs,GameNumber,Giveaway,Giveaways,Goal,GoalNumber,Goals,Hit,Hits,Miss,Misses,Penalties,Penalty,Period,Season,Shot,Shots,Takeaway,Takeaways,TeamCode
0,EV,,,MTL won Neu. Zone - MTL #11 GOMEZ vs TOR #37 B...,1,MTL,0,FAC,1.0,8.0,20001,,,,1.0,,,,,,,,1,2010,,,,,MTL
178759,EV,,,MTL won Neu. Zone - MTL #11 GOMEZ vs TOR #37 B...,1,MTL,0,FAC,1.0,8.0,20001,,,,1.0,,,,,,,,1,2010,,,,,TOR
1,EV,,,"TOR #37 BRENT HIT MTL #26 GORGES, Off. Zone",3,TOR,15,HIT,,,20001,,,,1.0,,1.0,11.0,,,,,1,2010,,,,,MTL
178760,EV,,,"TOR #37 BRENT HIT MTL #26 GORGES, Off. Zone",3,TOR,15,HIT,,,20001,,,,1.0,,1.0,11.0,,,,,1,2010,,,,,TOR
2,EV,,,"MTL #14 PLEKANEC HIT TOR #2 SCHENN, Off. Zone",4,MTL,46,HIT,,,20001,,,,1.0,,1.0,12.0,,,,,1,2010,,,,,MTL
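`pd.lreshape` stacks the listed columns into one. A minimal sketch on a hypothetical two-event game, using the documented `DataFrame.melt` to build the same long layout (one row per event per team); with no missing team codes the two should agree up to row and column order. The toy frame and the use of `melt` are illustrative, not the notebook's code:

```python
import pandas as pd

# hypothetical events; EventTeamCode is renamed first so the 'TeamCode' filter skips it
toy = pd.DataFrame({
    'EventNumber': [1, 3],
    'EventType': ['FAC', 'HIT'],
    'EventTeamCode': ['MTL', 'TOR'],
    'VTeamCode': ['MTL', 'MTL'],
    'HTeamCode': ['TOR', 'TOR'],
}).rename(columns={'EventTeamCode': 'EventTeam'})

team_cols = [col for col in toy.columns if 'TeamCode' in col]   # ['VTeamCode', 'HTeamCode']
id_cols = [col for col in toy.columns if col not in team_cols]

# melt stacks the VTeamCode rows first, then the HTeamCode rows, into one TeamCode column
long = toy.melt(id_vars=id_cols, value_vars=team_cols, value_name='TeamCode').drop(columns='variable')
long = long.sort_values(['EventNumber', 'TeamCode']).rename(columns={'EventTeam': 'EventTeamCode'})
print(long)
```

The row count doubles, just as `df.shape` goes from 178,759 to 357,518 rows in the notebook.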


In [375]:
df.shape

(357518, 29)

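- drop duplicates by season, game number, team code, event team code and event type, keeping one row for each combination of team, event team and event type in a game.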

In [376]:
df = df.drop_duplicates(['Season', 'GameNumber', 'TeamCode', 'EventTeamCode', 'EventType'])
df = df[['Season', 'GameNumber', 'TeamCode', 'EventNumber', 'EventType', 'EventTeamCode', 'Blocks', 'Faceoffs', 'Giveaways', 'Goals', 'Hits', 'Misses', 'Penalties', 'Shots', 'Takeaways']]
df = df.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])
df.shape

(42212, 15)

- assign all on-ice events to their respective teams. If the team code matches the event team code, the event counts for that team (F); otherwise it counts against that team (A). Each on-ice event therefore generates two variables per team: For (F) and Against (A).

In [377]:
# split each count into For (event by this row's team) and Against (event by the opponent);
# Series.where keeps values where the condition holds and sets NaN elsewhere
own_event = df['TeamCode'] == df['EventTeamCode']
totals = {'Blocks': 'Blocks', 'Faceoffs': 'Faceoffs', 'Giveaways': 'Giveaways', 'Goals': 'Goals', 'Hits': 'Hits',
          'Miss': 'Misses', 'Penalties': 'Penalties', 'Shots': 'Shots', 'Takeaways': 'Takeaways'}
for prefix, total in totals.items():
    df[prefix + '_F'] = df[total].where(own_event)
    df[prefix + '_A'] = df[total].where(~own_event)
df = df.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])
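The F/A rule reads the same event from both teams' rows: on MTL's row a MTL hit is a hit for, and on TOR's row that same hit is a hit against. A toy sketch with hypothetical counts, using `Series.where` to keep the count when `TeamCode == EventTeamCode` (F) or when they differ (A), and NaN otherwise:

```python
import pandas as pd

# hypothetical long-format rows: each event-type count appears once per team in the game
toy = pd.DataFrame({
    'TeamCode':      ['MTL', 'TOR', 'MTL', 'TOR'],
    'EventTeamCode': ['MTL', 'MTL', 'TOR', 'TOR'],
    'EventType':     ['HIT', 'HIT', 'HIT', 'HIT'],
    'Hits':          [12.0, 12.0, 11.0, 11.0],
})

own_event = toy['TeamCode'] == toy['EventTeamCode']
# Series.where keeps values where the condition is True and puts NaN elsewhere
toy['Hits_F'] = toy['Hits'].where(own_event)
toy['Hits_A'] = toy['Hits'].where(~own_event)
print(toy)
```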

- within each season, game number and team code, forward fill and then backfill the F and A columns so that every row of a team's game carries all of that team's counts.

In [378]:
# Forward fill, then backfill, each for/against column within a team's game.
# transform returns a result aligned with df's index, so it can be assigned back directly.
event_cols = ['Blocks', 'Faceoffs', 'Giveaways', 'Goals', 'Hits', 'Miss', 'Penalties', 'Shots', 'Takeaways']
fill_keys = ['Season', 'GameNumber', 'TeamCode']
for col in [e + side for side in ('_F', '_A') for e in event_cols]:
    df[col] = df.groupby(fill_keys)[col].transform(lambda x: x.ffill().bfill())
df = df.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])
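In pandas 2.x, `SeriesGroupBy.apply` with the default `group_keys=True` can prepend the group keys to the result's index, which breaks assigning the result back to `df`. `transform` always returns a result aligned with the original rows. A minimal sketch of the grouped fill on hypothetical rows for a single game:

```python
import numpy as np
import pandas as pd

# Hypothetical rows for one game; the for-count is known only on some rows.
toy = pd.DataFrame({
    'Season':     [2010] * 5,
    'GameNumber': [20001] * 5,
    'TeamCode':   ['MTL', 'MTL', 'MTL', 'TOR', 'TOR'],
    'Goals_F':    [np.nan, 2.0, np.nan, np.nan, 3.0],
})

# ffill then bfill fills every gap inside each (season, game, team) group
# without leaking values from one team or game into another.
keys = ['Season', 'GameNumber', 'TeamCode']
toy['Goals_F'] = toy.groupby(keys)['Goals_F'].transform(lambda s: s.ffill().bfill())
print(toy)
```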

- keep only the relevant columns and drop duplicates by season, game number and team code, leaving two observations (one per team) per game.

In [379]:
df = df[['Season', 'GameNumber', 'TeamCode', 'Blocks_F', 'Blocks_A', 'Faceoffs_F', 'Faceoffs_A', 'Giveaways_F', 'Giveaways_A', 'Goals_F', 'Goals_A', 'Hits_F', 'Hits_A', 'Miss_F', 'Miss_A', 'Penalties_F', 'Penalties_A', 'Shots_F', 'Shots_A', 'Takeaways_F', 'Takeaways_A']]
df = df.sort_values(['Season', 'GameNumber'], ascending=[True, True])
df = df.drop_duplicates(['Season', 'GameNumber', 'TeamCode'])
df.head()

Unnamed: 0,Season,GameNumber,TeamCode,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A
0,2010,20001,MTL,4.0,5.0,8.0,4.0,4.0,4.0,2.0,3.0,12.0,11.0,3.0,1.0,,,7.0,6.0,2.0,1.0
178759,2010,20001,TOR,5.0,4.0,4.0,8.0,4.0,4.0,3.0,2.0,11.0,12.0,1.0,3.0,,,6.0,7.0,1.0,2.0
77,2010,20002,PHI,4.0,3.0,5.0,16.0,2.0,3.0,3.0,2.0,10.0,13.0,7.0,7.0,3.0,2.0,8.0,9.0,,1.0
178836,2010,20002,PIT,3.0,4.0,16.0,5.0,3.0,2.0,2.0,3.0,13.0,10.0,7.0,7.0,2.0,3.0,9.0,8.0,1.0,
175,2010,20003,CAR,16.0,19.0,27.0,48.0,11.0,11.0,4.0,3.0,12.0,17.0,9.0,7.0,5.0,5.0,22.0,24.0,3.0,8.0


In [380]:
df.shape

(2444, 21)

- **merge the per-game on-ice event counts (df) onto the team roster player-rank dataframe (dc) to create a new dataframe (dh).**

In [381]:
dh = pd.merge(dc, df, on=['Season', 'GameNumber', 'TeamCode'], how='left')
dh = dh.drop_duplicates(['Season', 'GameNumber', 'TeamCode'])
dh = dh.sort_values(['Season', 'GameNumber'], ascending=[True, True])
dh.head()

Unnamed: 0,GD,GameNumber,LossTeam,Season,WinTeam,PlayerNumber,PlayerPosition,TeamCode,GF,GA,Rank,RosterCount,Position,PositionCount,FCount,DCount,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A
0,1,20001,MTL,2010,TOR,11.0,C,MTL,2,3,2,18.0,F,12.0,12.0,6.0,4.0,5.0,8.0,4.0,4.0,4.0,2.0,3.0,12.0,11.0,3.0,1.0,,,7.0,6.0,2.0,1.0
18270,1,20001,MTL,2010,TOR,37.0,C,TOR,3,2,2,18.0,F,12.0,12.0,6.0,5.0,4.0,4.0,8.0,4.0,4.0,3.0,2.0,11.0,12.0,1.0,3.0,,,6.0,7.0,1.0,2.0
18,-1,20002,PIT,2010,PHI,17.0,C,PHI,3,2,1,18.0,F,12.0,12.0,6.0,4.0,3.0,5.0,16.0,2.0,3.0,3.0,2.0,10.0,13.0,7.0,7.0,3.0,2.0,8.0,9.0,,1.0
18288,-1,20002,PIT,2010,PHI,71.0,C,PIT,2,3,1,18.0,F,12.0,12.0,6.0,3.0,4.0,16.0,5.0,3.0,2.0,2.0,3.0,13.0,10.0,7.0,7.0,2.0,3.0,9.0,8.0,1.0,
36,-1,20003,MIN,2010,CAR,53.0,C,CAR,4,3,1,18.0,F,12.0,12.0,6.0,16.0,19.0,27.0,48.0,11.0,11.0,4.0,3.0,12.0,17.0,9.0,7.0,5.0,5.0,22.0,24.0,3.0,8.0


In [382]:
dh.shape

(2030, 34)
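Because the merge uses `how='left'`, dh keeps only the team-games present in dc, and event rows for any other team-game are dropped (df has 2,444 rows; dh has 2,030 after deduplication). One way to audit the match is pandas' `indicator=True` option, which adds a `_merge` column. A sketch on hypothetical frames standing in for dc and df:

```python
import pandas as pd

# Hypothetical stand-ins for dc (roster rows) and df (event rows).
roster = pd.DataFrame({'Season': [2010, 2010, 2010],
                       'GameNumber': [20001, 20001, 20002],
                       'TeamCode': ['MTL', 'TOR', 'PHI']})
events = pd.DataFrame({'Season': [2010, 2010, 2010],
                       'GameNumber': [20001, 20001, 20003],
                       'TeamCode': ['MTL', 'TOR', 'CAR'],
                       'Goals_F': [2.0, 3.0, 4.0]})

# '_merge' is 'both' for matched keys and 'left_only' for roster rows with no
# event counts; event rows with no roster match are dropped by how='left'.
merged = pd.merge(roster, events, on=['Season', 'GameNumber', 'TeamCode'],
                  how='left', indicator=True)
print(merged['_merge'].value_counts())
```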

- create indicator columns for a team win (TeamWin) and a team loss (TeamLos).

In [383]:
dh['TeamWin'] =  dh.apply(lambda x: 1 if x['TeamCode']==x['WinTeam'] else 0, 1)
dh['TeamLos'] =  dh.apply(lambda x: 1 if x['TeamCode']!=x['WinTeam'] else 0, 1)

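- compute games played (GP), games won (GW), games lost (GL), and season totals of goals and of every on-ice event for and against, by team and season. Note that GW is summed over each game's winning team and GL over each game's losing team, so on a given row they describe that game's winner and loser rather than the row's own team.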

In [384]:
dh['GP'] = dh.groupby(['Season','TeamCode'])['GameNumber'].transform('count')
dh['GW'] = dh.groupby(['Season','WinTeam'])['TeamWin'].transform('sum')
dh['GL'] = dh.groupby(['Season','LossTeam'])['TeamLos'].transform('sum')
dh['GF'] = dh.groupby(['Season','TeamCode'])['GF'].transform('sum')
dh['GA'] = dh.groupby(['Season','TeamCode'])['GA'].transform('sum')
dh['Blocks_F'] = dh.groupby(['Season','TeamCode'])['Blocks_F'].transform('sum')
dh['Faceoffs_F'] = dh.groupby(['Season','TeamCode'])['Faceoffs_F'].transform('sum')
dh['Giveaways_F'] = dh.groupby(['Season','TeamCode'])['Giveaways_F'].transform('sum')
dh['Goals_F'] = dh.groupby(['Season','TeamCode'])['Goals_F'].transform('sum')
dh['Hits_F'] = dh.groupby(['Season','TeamCode'])['Hits_F'].transform('sum')
dh['Miss_F'] = dh.groupby(['Season','TeamCode'])['Miss_F'].transform('sum')
dh['Penalties_F'] = dh.groupby(['Season','TeamCode'])['Penalties_F'].transform('sum')
dh['Shots_F'] = dh.groupby(['Season','TeamCode'])['Shots_F'].transform('sum')
dh['Takeaways_F'] = dh.groupby(['Season','TeamCode'])['Takeaways_F'].transform('sum')
dh['Blocks_A'] = dh.groupby(['Season','TeamCode'])['Blocks_A'].transform('sum') 
dh['Faceoffs_A'] = dh.groupby(['Season','TeamCode'])['Faceoffs_A'].transform('sum')
dh['Giveaways_A'] = dh.groupby(['Season','TeamCode'])['Giveaways_A'].transform('sum')
dh['Goals_A'] = dh.groupby(['Season','TeamCode'])['Goals_A'].transform('sum')
dh['Hits_A'] = dh.groupby(['Season','TeamCode'])['Hits_A'].transform('sum')
dh['Miss_A'] = dh.groupby(['Season','TeamCode'])['Miss_A'].transform('sum')
dh['Penalties_A'] = dh.groupby(['Season','TeamCode'])['Penalties_A'].transform('sum')
dh['Shots_A'] = dh.groupby(['Season','TeamCode'])['Shots_A'].transform('sum')
dh['Takeaways_A'] = dh.groupby(['Season','TeamCode'])['Takeaways_A'].transform('sum')
dh.head()

Unnamed: 0,GD,GameNumber,LossTeam,Season,WinTeam,PlayerNumber,PlayerPosition,TeamCode,GF,GA,Rank,RosterCount,Position,PositionCount,FCount,DCount,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A,TeamWin,TeamLos,GP,GW,GL
0,1,20001,MTL,2010,TOR,11.0,C,MTL,188,181,2,18.0,F,12.0,12.0,6.0,523.0,567.0,1114.0,1115.0,355.0,290.0,179.0,171.0,793.0,851.0,464.0,400.0,191.0,187.0,1138.0,1023.0,249.0,222.0,0,1,68,34,31
18270,1,20001,MTL,2010,TOR,37.0,C,TOR,195,218,2,18.0,F,12.0,12.0,6.0,675.0,651.0,1189.0,1158.0,458.0,437.0,183.0,203.0,1045.0,973.0,500.0,531.0,191.0,197.0,1053.0,1142.0,281.0,303.0,1,0,70,34,31
18,-1,20002,PIT,2010,PHI,17.0,C,PHI,235,207,1,18.0,F,12.0,12.0,6.0,732.0,654.0,1274.0,1361.0,343.0,342.0,225.0,188.0,995.0,957.0,513.0,553.0,205.0,203.0,1213.0,1163.0,309.0,310.0,1,0,72,41,31
18288,-1,20002,PIT,2010,PHI,71.0,C,PIT,204,180,1,18.0,F,12.0,12.0,6.0,481.0,538.0,1039.0,1045.0,216.0,222.0,184.0,169.0,984.0,902.0,419.0,361.0,211.0,205.0,1036.0,963.0,174.0,195.0,0,1,71,41,31
36,-1,20003,MIN,2010,CAR,53.0,C,CAR,222,218,1,18.0,F,12.0,12.0,6.0,601.0,671.0,1125.0,1394.0,296.0,343.0,208.0,205.0,1120.0,864.0,533.0,587.0,153.0,206.0,1131.0,1271.0,356.0,309.0,1,0,76,38,35


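- derive each team's own season wins (W) and losses (L): on a winning row W = GW and L = GP - GW, and on a losing row L = GL and W = GP - GL (this assumes every game has a winner).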

In [385]:
dh['L'] = dh.apply(lambda x: x['GL'] if x['TeamCode']== x['LossTeam'] else (x['GP'] - x['GW']), 1)
dh['W'] = dh.apply(lambda x: x['GW'] if x['TeamCode']== x['WinTeam'] else (x['GP'] - x['GL']), 1)
dh.head()

Unnamed: 0,GD,GameNumber,LossTeam,Season,WinTeam,PlayerNumber,PlayerPosition,TeamCode,GF,GA,Rank,RosterCount,Position,PositionCount,FCount,DCount,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A,TeamWin,TeamLos,GP,GW,GL,L,W
0,1,20001,MTL,2010,TOR,11.0,C,MTL,188,181,2,18.0,F,12.0,12.0,6.0,523.0,567.0,1114.0,1115.0,355.0,290.0,179.0,171.0,793.0,851.0,464.0,400.0,191.0,187.0,1138.0,1023.0,249.0,222.0,0,1,68,34,31,31,37
18270,1,20001,MTL,2010,TOR,37.0,C,TOR,195,218,2,18.0,F,12.0,12.0,6.0,675.0,651.0,1189.0,1158.0,458.0,437.0,183.0,203.0,1045.0,973.0,500.0,531.0,191.0,197.0,1053.0,1142.0,281.0,303.0,1,0,70,34,31,36,34
18,-1,20002,PIT,2010,PHI,17.0,C,PHI,235,207,1,18.0,F,12.0,12.0,6.0,732.0,654.0,1274.0,1361.0,343.0,342.0,225.0,188.0,995.0,957.0,513.0,553.0,205.0,203.0,1213.0,1163.0,309.0,310.0,1,0,72,41,31,31,41
18288,-1,20002,PIT,2010,PHI,71.0,C,PIT,204,180,1,18.0,F,12.0,12.0,6.0,481.0,538.0,1039.0,1045.0,216.0,222.0,184.0,169.0,984.0,902.0,419.0,361.0,211.0,205.0,1036.0,963.0,174.0,195.0,0,1,71,41,31,31,40
36,-1,20003,MIN,2010,CAR,53.0,C,CAR,222,218,1,18.0,F,12.0,12.0,6.0,601.0,671.0,1125.0,1394.0,296.0,343.0,208.0,205.0,1120.0,864.0,533.0,587.0,153.0,206.0,1131.0,1271.0,356.0,309.0,1,0,76,38,35,38,38
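The GW/GL route works but is indirect. Because GW is grouped by WinTeam and GL by LossTeam, each row first receives the winner's wins and the loser's losses, and the conditional formulas then convert these into the row team's W and L. Summing the row team's own indicators gives the same W and L directly, assuming (as W + L = GP already does) that every game has a winner. A minimal sketch on a hypothetical three-game season:

```python
import pandas as pd

# Hypothetical season of three games, one row per team per game.
games = pd.DataFrame({
    'Season':     [2010] * 6,
    'GameNumber': [1, 1, 2, 2, 3, 3],
    'TeamCode':   ['MTL', 'TOR', 'MTL', 'TOR', 'MTL', 'BOS'],
    'WinTeam':    ['TOR', 'TOR', 'MTL', 'MTL', 'MTL', 'MTL'],
})

# Indicators for the row's own team (same rule as TeamWin / TeamLos above).
games['TeamWin'] = (games['TeamCode'] == games['WinTeam']).astype(int)
games['TeamLos'] = 1 - games['TeamWin']

# Summing by the row's own team gives its record directly.
by_team = games.groupby(['Season', 'TeamCode'])
games['GP'] = by_team['GameNumber'].transform('count')
games['W'] = by_team['TeamWin'].transform('sum')
games['L'] = by_team['TeamLos'].transform('sum')

summary = games.drop_duplicates(['Season', 'TeamCode']).set_index('TeamCode')
print(summary[['GP', 'W', 'L']])
```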


- divide wins and losses by games played to get each team's winning and losing percentage. Then divide each on-ice event total by games played to get the team's per-game mean for the season.

In [386]:
dh = dh.drop_duplicates(['Season', 'TeamCode'])
dh['WinPc'] = dh['W']/ dh['GP']
dh['LossPc'] = dh['L']/ dh['GP']
dh['Mean_Blocks_F'] = dh['Blocks_F']/ dh['GP']
dh['Mean_Faceoffs_F'] = dh['Faceoffs_F']/ dh['GP']
dh['Mean_Giveaways_F'] = dh['Giveaways_F']/ dh['GP']
dh['Mean_Goals_F'] = dh['Goals_F']/ dh['GP']
dh['Mean_Hits_F'] = dh['Hits_F']/ dh['GP']
dh['Mean_Miss_F'] = dh['Miss_F']/ dh['GP']
dh['Mean_Penalties_F'] = dh['Penalties_F']/ dh['GP']
dh['Mean_Shots_F'] = dh['Shots_F']/ dh['GP']
dh['Mean_Takeaways_F'] = dh['Takeaways_F']/ dh['GP']
dh['Mean_Blocks_A'] = dh['Blocks_A']/ dh['GP']
dh['Mean_Faceoffs_A'] = dh['Faceoffs_A']/ dh['GP']
dh['Mean_Giveaways_A'] = dh['Giveaways_A']/ dh['GP']
dh['Mean_Goals_A'] = dh['Goals_A']/ dh['GP']
dh['Mean_Hits_A'] = dh['Hits_A']/ dh['GP']
dh['Mean_Miss_A'] = dh['Miss_A']/ dh['GP']
dh['Mean_Penalties_A'] = dh['Penalties_A']/ dh['GP']
dh['Mean_Shots_A'] = dh['Shots_A']/ dh['GP']
dh['Mean_Takeaways_A'] = dh['Takeaways_A']/ dh['GP']
dh.head()

Unnamed: 0,GD,GameNumber,LossTeam,Season,WinTeam,PlayerNumber,PlayerPosition,TeamCode,GF,GA,Rank,RosterCount,Position,PositionCount,FCount,DCount,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A,TeamWin,TeamLos,GP,GW,GL,L,W,WinPc,LossPc,Mean_Blocks_F,Mean_Faceoffs_F,Mean_Giveaways_F,Mean_Goals_F,Mean_Hits_F,Mean_Miss_F,Mean_Penalties_F,Mean_Shots_F,Mean_Takeaways_F,Mean_Blocks_A,Mean_Faceoffs_A,Mean_Giveaways_A,Mean_Goals_A,Mean_Hits_A,Mean_Miss_A,Mean_Penalties_A,Mean_Shots_A,Mean_Takeaways_A
0,1,20001,MTL,2010,TOR,11.0,C,MTL,188,181,2,18.0,F,12.0,12.0,6.0,523.0,567.0,1114.0,1115.0,355.0,290.0,179.0,171.0,793.0,851.0,464.0,400.0,191.0,187.0,1138.0,1023.0,249.0,222.0,0,1,68,34,31,31,37,0.544118,0.455882,7.691176,16.382353,5.220588,2.632353,11.661765,6.823529,2.808824,16.735294,3.661765,8.338235,16.397059,4.264706,2.514706,12.514706,5.882353,2.75,15.044118,3.264706
18270,1,20001,MTL,2010,TOR,37.0,C,TOR,195,218,2,18.0,F,12.0,12.0,6.0,675.0,651.0,1189.0,1158.0,458.0,437.0,183.0,203.0,1045.0,973.0,500.0,531.0,191.0,197.0,1053.0,1142.0,281.0,303.0,1,0,70,34,31,36,34,0.485714,0.514286,9.642857,16.985714,6.542857,2.614286,14.928571,7.142857,2.728571,15.042857,4.014286,9.3,16.542857,6.242857,2.9,13.9,7.585714,2.814286,16.314286,4.328571
18,-1,20002,PIT,2010,PHI,17.0,C,PHI,235,207,1,18.0,F,12.0,12.0,6.0,732.0,654.0,1274.0,1361.0,343.0,342.0,225.0,188.0,995.0,957.0,513.0,553.0,205.0,203.0,1213.0,1163.0,309.0,310.0,1,0,72,41,31,31,41,0.569444,0.430556,10.166667,17.694444,4.763889,3.125,13.819444,7.125,2.847222,16.847222,4.291667,9.083333,18.902778,4.75,2.611111,13.291667,7.680556,2.819444,16.152778,4.305556
18288,-1,20002,PIT,2010,PHI,71.0,C,PIT,204,180,1,18.0,F,12.0,12.0,6.0,481.0,538.0,1039.0,1045.0,216.0,222.0,184.0,169.0,984.0,902.0,419.0,361.0,211.0,205.0,1036.0,963.0,174.0,195.0,0,1,71,41,31,31,40,0.56338,0.43662,6.774648,14.633803,3.042254,2.591549,13.859155,5.901408,2.971831,14.591549,2.450704,7.577465,14.71831,3.126761,2.380282,12.704225,5.084507,2.887324,13.56338,2.746479
36,-1,20003,MIN,2010,CAR,53.0,C,CAR,222,218,1,18.0,F,12.0,12.0,6.0,601.0,671.0,1125.0,1394.0,296.0,343.0,208.0,205.0,1120.0,864.0,533.0,587.0,153.0,206.0,1131.0,1271.0,356.0,309.0,1,0,76,38,35,38,38,0.5,0.5,7.907895,14.802632,3.894737,2.736842,14.736842,7.013158,2.013158,14.881579,4.684211,8.828947,18.342105,4.513158,2.697368,11.368421,7.723684,2.710526,16.723684,4.065789
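The per-game divisions above can be done in one step with `DataFrame.div(..., axis=0)`, which divides each row by that row's games played. The sketch below uses the MTL and TOR season totals from the output above, limited to shots for brevity.

```python
import pandas as pd

# Season totals for two teams, copied from the dh.head() output above.
totals = pd.DataFrame({
    'TeamCode': ['MTL', 'TOR'],
    'GP': [68, 70],
    'W': [37, 34],
    'Shots_F': [1138.0, 1053.0],
    'Shots_A': [1023.0, 1142.0],
})

totals['WinPc'] = totals['W'] / totals['GP']

# axis=0 aligns the GP divisor with rows instead of broadcasting it across columns.
event_totals = ['Shots_F', 'Shots_A']
means = totals[event_totals].div(totals['GP'], axis=0).add_prefix('Mean_')
totals = totals.join(means)
print(totals)
```

The results reproduce the WinPc, Mean_Shots_F and Mean_Shots_A values shown for MTL and TOR.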


In [387]:
# .copy() makes dh an independent frame, so adding Rank_W below does not trigger SettingWithCopyWarning
dh = dh[['Season', 'TeamCode', 'GP', 'W', 'L','WinPc', 'LossPc', 'Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Goals_F','Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F','Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A','Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Goals_A','Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A','Mean_Shots_A', 'Mean_Takeaways_A']].copy()
dh['Rank_W'] = dh.groupby(['Season'])['WinPc'].rank(ascending=False)
dh = dh.sort_values(['Season', 'Rank_W'], ascending=[True, True])
dh.head(30)

Unnamed: 0,Season,TeamCode,GP,W,L,WinPc,LossPc,Mean_Blocks_F,Mean_Faceoffs_F,Mean_Giveaways_F,Mean_Goals_F,Mean_Hits_F,Mean_Miss_F,Mean_Penalties_F,Mean_Shots_F,Mean_Takeaways_F,Mean_Blocks_A,Mean_Faceoffs_A,Mean_Giveaways_A,Mean_Goals_A,Mean_Hits_A,Mean_Miss_A,Mean_Penalties_A,Mean_Shots_A,Mean_Takeaways_A,Rank_W
18576,2010,VAN,73,48,25,0.657534,0.342466,7.164384,18.246575,3.657534,3.082192,12.424658,6.753425,2.561644,15.671233,4.027397,8.260274,14.219178,4.09589,2.219178,12.164384,6.123288,2.630137,15.123288,3.575342,1.0
90,2010,SJ,65,41,24,0.630769,0.369231,8.092308,18.569231,5.723077,2.969231,13.076923,7.153846,2.769231,18.246154,5.0,8.969231,15.707692,5.815385,2.292308,14.507692,6.4,2.846154,15.8,3.953846,2.0
18432,2010,BOS,76,45,31,0.592105,0.407895,8.039474,16.986842,3.605263,2.947368,11.75,6.434211,2.618421,16.828947,2.842105,8.605263,16.144737,4.789474,2.171053,13.263158,6.394737,2.618421,16.618421,4.197368,3.0
18396,2010,DET,68,40,28,0.588235,0.411765,6.647059,19.235294,5.647059,3.073529,13.397059,8.382353,2.426471,19.147059,4.455882,8.632353,17.838235,4.808824,2.852941,14.676471,6.588235,2.602941,17.029412,4.279412,4.0
126,2010,ANA,65,38,27,0.584615,0.415385,8.6,15.246154,4.461538,2.707692,12.630769,5.723077,2.953846,13.923077,2.784615,6.015385,16.815385,5.015385,2.692308,12.107692,8.046154,2.830769,17.492308,3.430769,5.0
18720,2010,WSH,72,42,30,0.583333,0.416667,8.027778,16.097222,4.347222,2.375,11.444444,5.875,2.430556,14.194444,3.930556,7.402778,15.111111,4.152778,2.125,11.916667,6.388889,2.111111,13.972222,3.5,6.0
306,2010,LA,70,40,30,0.571429,0.428571,6.528571,15.857143,5.228571,2.514286,14.057143,6.942857,2.128571,14.5,2.785714,7.671429,14.814286,4.8,2.328571,15.771429,6.228571,2.414286,13.485714,3.042857,7.0
18,2010,PHI,72,41,31,0.569444,0.430556,10.166667,17.694444,4.763889,3.125,13.819444,7.125,2.847222,16.847222,4.291667,9.083333,18.902778,4.75,2.611111,13.291667,7.680556,2.819444,16.152778,4.305556,8.0
18288,2010,PIT,71,40,31,0.56338,0.43662,6.774648,14.633803,3.042254,2.591549,13.859155,5.901408,2.971831,14.591549,2.450704,7.577465,14.71831,3.126761,2.380282,12.704225,5.084507,2.887324,13.56338,2.746479,9.0
180,2010,NYR,73,41,32,0.561644,0.438356,7.849315,14.753425,2.794521,2.835616,14.561644,5.890411,2.589041,14.424658,3.712329,7.054795,15.821918,3.821918,2.287671,14.657534,5.931507,2.753425,14.191781,3.753425,10.0
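`rank(ascending=False)` gives rank 1 to the highest WinPc and returns floats, because the default `method='average'` assigns tied teams the mean of the positions they share. A sketch on hypothetical win percentages that include a tie, compared with `method='min'`:

```python
import pandas as pd

# Hypothetical win percentages with a tie between B and C.
teams = pd.DataFrame({'Season': [2010] * 4,
                      'TeamCode': ['A', 'B', 'C', 'D'],
                      'WinPc': [0.60, 0.55, 0.55, 0.40]})

g = teams.groupby('Season')['WinPc']
teams['Rank_W'] = g.rank(ascending=False)                    # ties share 2.5
teams['Rank_W_min'] = g.rank(ascending=False, method='min')  # ties share 2
print(teams)
```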


In [388]:
# pass a boolean: the string 'False' is truthy, so the row index would be written as an extra column
dh.to_csv('/Users/stefanostselios/Brock University/Kevin Mongeon - StephanosShare/out/season_team_all_events_prior_to_a_goal_ranking.csv', index=False, sep=',')
#dh.to_csv('/Users/kevinmongeon/Brock University/Steve Tselios - StephanosShare/out/season_team_all_events_prior_to_a_goal_ranking.csv', index=False, sep=',')
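`to_csv`'s `index` parameter expects a boolean. A non-empty string such as `'False'` is truthy, so it behaves like `index=True` and writes an unnamed leading index column. That column reads back as `Unnamed: 0`, the same kind of column the notebook drops after `read_csv` at its top. A small sketch comparing the two boolean settings on a hypothetical two-row frame:

```python
import pandas as pd

# Any non-empty string is truthy, including 'False'.
print(bool('False'))

frame = pd.DataFrame({'TeamCode': ['VAN', 'SJ'], 'WinPc': [0.657534, 0.630769]})

with_index = frame.to_csv(index=True)      # header starts with an empty index label
without_index = frame.to_csv(index=False)  # data columns only
print(with_index)
print(without_index)
```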

- for each on-ice event, compute the difference between a team's per-game mean for and its per-game mean against.

In [389]:
dh['DBlock'] = dh['Mean_Blocks_F'] - dh['Mean_Blocks_A']
dh['DFaceoff'] = dh['Mean_Faceoffs_F'] - dh['Mean_Faceoffs_A']
dh['DGiveaway'] = dh['Mean_Giveaways_F'] - dh['Mean_Giveaways_A']
dh['DGoal'] = dh['Mean_Goals_F'] - dh['Mean_Goals_A']
dh['DHit'] = dh['Mean_Hits_F'] - dh['Mean_Hits_A']
dh['DMiss'] = dh['Mean_Miss_F'] - dh['Mean_Miss_A']
dh['DPenalty'] = dh['Mean_Penalties_F'] - dh['Mean_Penalties_A']
dh['DShot'] = dh['Mean_Shots_F'] - dh['Mean_Shots_A']
dh['DTakeaway'] = dh['Mean_Takeaways_F'] - dh['Mean_Takeaways_A']

## all on-ice events prior to a goal analysis

### summary analysis

In [390]:
dh.describe()

Unnamed: 0,Season,GP,W,L,WinPc,LossPc,Mean_Blocks_F,Mean_Faceoffs_F,Mean_Giveaways_F,Mean_Goals_F,Mean_Hits_F,Mean_Miss_F,Mean_Penalties_F,Mean_Shots_F,Mean_Takeaways_F,Mean_Blocks_A,Mean_Faceoffs_A,Mean_Giveaways_A,Mean_Goals_A,Mean_Hits_A,Mean_Miss_A,Mean_Penalties_A,Mean_Shots_A,Mean_Takeaways_A,Rank_W,DBlock,DFaceoff,DGiveaway,DGoal,DHit,DMiss,DPenalty,DShot,DTakeaway
count,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0
mean,2010.0,67.666667,33.833333,33.833333,0.498496,0.501504,7.872545,16.358326,4.526607,2.650775,12.696254,6.517062,2.570491,15.512804,3.982055,7.848034,16.369219,4.511334,2.666309,12.802948,6.508612,2.574837,15.447002,3.981087,15.5,0.024512,-0.010893,0.015273,-0.015534,-0.106694,0.00845,-0.004346,0.065802,0.000968
std,0.0,8.052985,7.991734,7.240134,0.09448,0.09448,1.04709,1.386059,1.084054,0.286535,1.752179,0.794751,0.388352,1.608098,0.920777,0.853982,1.517137,0.927445,0.339782,1.810818,0.857579,0.35158,1.373545,0.745571,8.802429,0.946901,1.650123,0.641598,0.501318,1.863925,0.912101,0.244498,1.761252,0.573177
min,2010.0,38.0,18.0,20.0,0.296875,0.342466,4.955224,12.850746,2.69863,1.925373,9.343284,4.298507,1.820896,12.074627,2.450704,5.462687,12.119403,2.492537,2.125,8.328358,3.895522,1.731343,11.074627,2.746479,1.0,-1.985294,-4.573529,-1.19697,-0.984375,-4.673077,-2.323077,-0.697368,-3.569231,-1.355263
25%,2010.0,66.25,28.25,30.0,0.438263,0.432072,7.574709,15.573052,3.82151,2.437395,11.683824,5.89316,2.304534,14.515845,3.449853,7.459584,15.260256,3.924708,2.412077,11.692402,6.086714,2.358571,14.927885,3.443077,8.25,-0.626741,-0.69,-0.348298,-0.359649,-1.39293,-0.545139,-0.158499,-1.123239,-0.309658
50%,2010.0,70.0,36.5,31.5,0.520833,0.479167,7.878605,16.432158,4.515491,2.626036,12.576254,6.626533,2.635297,15.42663,3.972421,7.898356,16.578846,4.730263,2.671047,12.518891,6.397368,2.670332,15.545641,3.93467,15.5,-0.053477,-0.007353,0.030382,0.134581,0.212862,0.119737,0.036616,0.390411,0.013056
75%,2010.0,72.0,40.0,37.5,0.567928,0.561737,8.349296,17.161094,5.226576,2.845268,13.849227,7.013706,2.807375,16.596549,4.480168,8.36313,17.341071,4.982639,2.888235,14.35,7.109714,2.818809,16.57411,4.322817,22.75,0.616637,0.819079,0.370168,0.242647,1.02516,0.743956,0.099594,1.308064,0.427917
max,2010.0,76.0,48.0,47.0,0.657534,0.703125,10.166667,19.235294,7.25,3.125,16.931507,8.382353,3.342857,19.147059,6.085714,9.3,18.902778,6.720588,3.5,16.596154,8.046154,3.132353,17.492308,5.453125,30.0,2.584615,4.027397,1.820896,0.863014,4.342466,1.794118,0.43662,2.552632,1.046154


#### $WinPc = \beta_{0} + \beta_{1}MeanBlocks_F + \beta_{2}MeanFaceoffs_F + \beta_{3}MeanGiveaways_F + \beta_{4}MeanGoals_F + \beta_{5}MeanHits_F + \beta_{6}MeanMiss_F + \beta_{7}MeanPenalties_F + \beta_{8}MeanShots_F + \beta_{9}MeanTakeaways_F + \beta_{10}MeanBlocks_A + \beta_{11}MeanFaceoffs_A + \beta_{12}MeanGiveaways_A + \beta_{13}MeanGoals_A + \beta_{14}MeanHits_A + \beta_{15}MeanMiss_A + \beta_{16}MeanPenalties_A + \beta_{17}MeanShots_A + \beta_{18}MeanTakeaways_A + e_{s}$

In [391]:
print ('win percent for all on-ice events prior to a goal')
y = dh['WinPc']  
X = sm.add_constant(dh[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Goals_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Goals_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
result = sm.OLS(y, X).fit()
print (result.summary())

win percent for all on-ice events prior to a goal
                            OLS Regression Results                            
Dep. Variable:                  WinPc   R-squared:                       0.938
Model:                            OLS   Adj. R-squared:                  0.837
Method:                 Least Squares   F-statistic:                     9.259
Date:                Fri, 19 Jan 2018   Prob (F-statistic):           0.000293
Time:                        20:29:57   Log-Likelihood:                 70.451
No. Observations:                  30   AIC:                            -102.9
Df Residuals:                      11   BIC:                            -76.28
Df Model:                          18                                         
Covariance Type:            nonrobust                                         
                       coef    std err          t      P>|t|      [95.0% Conf. Int.]
---------------------------------------------------------------------------
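With 18 regressors and only 30 team-seasons, this model has 11 residual degrees of freedom, which explains the gap between R-squared (0.938) and adjusted R-squared (0.837). The adjustment is $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k-1}$, where $k$ counts regressors excluding the constant. The short check below reproduces the reported values, up to rounding, from the summaries in this section:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k regressors (constant excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Reported R^2 values from this section, n = 30 team-seasons.
print(adjusted_r2(0.938, 30, 18))  # WinPc on for/against means (11 residual df)
print(adjusted_r2(0.879, 30, 9))   # WinPc on differentials (20 residual df)
print(adjusted_r2(0.789, 30, 16))  # Mean_Goals_A on for/against means (13 residual df)
```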

In [392]:
print ('win percent for all on-ice events prior to a goal')
y = dh['WinPc']  
X = sm.add_constant(dh[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Goals_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Goals_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
result = sm.Logit(y, X).fit()
print(result.summary())

win percent for all on-ice events prior to a goal
Optimization terminated successfully.
         Current function value: 0.661202
         Iterations 5
                           Logit Regression Results                           
Dep. Variable:                  WinPc   No. Observations:                   30
Model:                          Logit   Df Residuals:                       11
Method:                           MLE   Df Model:                           18
Date:                Fri, 19 Jan 2018   Pseudo R-squ.:                 0.04608
Time:                        20:29:57   Log-Likelihood:                -19.836
converged:                       True   LL-Null:                       -20.794
                                        LLR p-value:                     1.000
                       coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------------
const                0.5691      8.055      0.

#### $WinPc = \beta_{0} + \beta_{1}DBlock + \beta_{2}DFaceoff + \beta_{3}DGiveaway + \beta_{4}DGoal + \beta_{5}DHit + \beta_{6}DMiss+ \beta_{7}DPenalty + \beta_{8}DShot + \beta_{9}DTakeaway + e_{s}$

In [393]:
print ('win percent for all on-ice events differential prior to a goal')
y = dh['WinPc']  
X = sm.add_constant(dh[['DBlock', 'DFaceoff', 'DGiveaway', 'DGoal', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
result = sm.OLS(y, X).fit()
print (result.summary())

win percent for all on-ice events differential prior to a goal
                            OLS Regression Results                            
Dep. Variable:                  WinPc   R-squared:                       0.879
Model:                            OLS   Adj. R-squared:                  0.825
Method:                 Least Squares   F-statistic:                     16.20
Date:                Fri, 19 Jan 2018   Prob (F-statistic):           2.38e-07
Time:                        20:29:57   Log-Likelihood:                 60.448
No. Observations:                  30   AIC:                            -100.9
Df Residuals:                      20   BIC:                            -86.88
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
--------------------------------------------------------------------
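The differential model is nested in the for/against model above. Writing $\beta D = \beta(F - A)$ shows that it imposes $\beta_{F,i} = -\beta_{A,i}$ for each of the 9 event types. Since both models share the same dependent variable and sample, the restriction can be tested with an F-test. The sketch below computes it from the rounded R-squared values in the two summaries, so the result is only approximate:

```python
from scipy import stats

def restriction_f_test(r2_full, r2_restricted, n, k_full, q):
    """F-test of q linear restrictions using the R^2 of nested OLS fits.

    k_full counts the unrestricted model's regressors, constant excluded.
    """
    df_resid = n - k_full - 1
    f = ((r2_full - r2_restricted) / q) / ((1 - r2_full) / df_resid)
    return f, stats.f.sf(f, q, df_resid)

# WinPc: for/against model (18 regressors, R^2 0.938) vs differential model (R^2 0.879).
f_stat, p_value = restriction_f_test(0.938, 0.879, n=30, k_full=18, q=9)
print(f_stat, p_value)
```

With these rounded inputs, F is about 1.16 on (9, 11) degrees of freedom, below the 5% critical value. For the exact statistic, keep both fitted OLS results and call `full_result.compare_f_test(diff_result)` (the variable names here are hypothetical).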

In [394]:
print ('win percent for all on-ice events differential prior to a goal')
y = dh['WinPc']  
X = sm.add_constant(dh[['DBlock', 'DFaceoff', 'DGiveaway', 'DGoal', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
result = sm.Logit(y, X).fit()
print (result.summary())

win percent for all on-ice events differential prior to a goal
Optimization terminated successfully.
         Current function value: 0.663215
         Iterations 4
                           Logit Regression Results                           
Dep. Variable:                  WinPc   No. Observations:                   30
Model:                          Logit   Df Residuals:                       20
Method:                           MLE   Df Model:                            9
Date:                Fri, 19 Jan 2018   Pseudo R-squ.:                 0.04317
Time:                        20:29:58   Log-Likelihood:                -19.896
converged:                       True   LL-Null:                       -20.794
                                        LLR p-value:                    0.9943
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          0.0011      0.372      0.003  

#### mean goals for analysis

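- regress **mean goals for** on the per-game means of the other on-ice events (predictor variables), adding a constant to the predictors and fitting by **OLS**. The purpose is to determine the impact each on-ice event has on goals scored. The **Logit** versions are left commented out: Logit requires a dependent variable in [0, 1], and mean goals per game is not a proportion.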

#### $MeanGoals_F = \beta_{0} + \beta_{1}MeanBlocks_F + \beta_{2}MeanFaceoffs_F + \beta_{3}MeanGiveaways_F + \beta_{4}MeanHits_F + \beta_{5}MeanMiss_F + \beta_{6}MeanPenalties_F + \beta_{7}MeanShots_F + \beta_{8}MeanTakeaways_F + \beta_{9}MeanBlocks_A + \beta_{10}MeanFaceoffs_A + \beta_{11}MeanGiveaways_A + \beta_{12}MeanHits_A + \beta_{13}MeanMiss_A + \beta_{14}MeanPenalties_A + \beta_{15}MeanShots_A + \beta_{16}MeanTakeaways_A + e_{s}$

In [395]:
print ('mean goals for in all on-ice events prior to a goal')
y = dh['Mean_Goals_F']  
X = sm.add_constant(dh[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
result = sm.OLS(y, X).fit()
print (result.summary())

mean goals for in all on-ice events prior to a goal
                            OLS Regression Results                            
Dep. Variable:           Mean_Goals_F   R-squared:                       0.732
Model:                            OLS   Adj. R-squared:                  0.403
Method:                 Least Squares   F-statistic:                     2.223
Date:                Fri, 19 Jan 2018   Prob (F-statistic):             0.0763
Time:                        20:29:58   Log-Likelihood:                 15.206
No. Observations:                  30   AIC:                             3.588
Df Residuals:                      13   BIC:                             27.41
Df Model:                          16                                         
Covariance Type:            nonrobust                                         
                       coef    std err          t      P>|t|      [95.0% Conf. Int.]
-------------------------------------------------------------------------

In [396]:
#y = dh['Mean_Goals_F']  
#X = sm.add_constant(dh[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
#result = sm.Logit(y, X).fit()
#result.summary()

#### $MeanGoals_F = \beta_{0} + \beta_{1}DBlock + \beta_{2}DFaceoff + \beta_{3}DGiveaway + \beta_{4}DHit + \beta_{5}DMiss+ \beta_{6}DPenalty + \beta_{7}DShot + \beta_{8}DTakeaway + e_{s}$

In [397]:
print ('mean goals for in all on-ice events differential prior to a goal')
y = dh['Mean_Goals_F']  
X = sm.add_constant(dh[['DBlock', 'DFaceoff', 'DGiveaway', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
result = sm.OLS(y, X).fit()
print (result.summary())

mean goals for in all on-ice events differential prior to a goal
                            OLS Regression Results                            
Dep. Variable:           Mean_Goals_F   R-squared:                       0.371
Model:                            OLS   Adj. R-squared:                  0.132
Method:                 Least Squares   F-statistic:                     1.550
Date:                Fri, 19 Jan 2018   Prob (F-statistic):              0.200
Time:                        20:29:58   Log-Likelihood:                 2.3979
No. Observations:                  30   AIC:                             13.20
Df Residuals:                      21   BIC:                             25.82
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------

In [398]:
#y = dh['Mean_Goals_F']  
#X = sm.add_constant(dh[['DBlock', 'DFaceoff', 'DGiveaway', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
#result = sm.Logit(y, X).fit()
#result.summary()

#### mean goals against analysis

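- regress **mean goals against** on the per-game means of the other on-ice events (predictor variables), adding a constant to the predictors and fitting by **OLS**. The purpose is to determine the impact each on-ice event has on goals allowed. The **Logit** versions are left commented out: Logit requires a dependent variable in [0, 1], and mean goals per game is not a proportion.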

#### $MeanGoals_A = \beta_{0} + \beta_{1}MeanBlocks_F + \beta_{2}MeanFaceoffs_F + \beta_{3}MeanGiveaways_F + \beta_{4}MeanHits_F + \beta_{5}MeanMiss_F + \beta_{6}MeanPenalties_F + \beta_{7}MeanShots_F + \beta_{8}MeanTakeaways_F + \beta_{9}MeanBlocks_A + \beta_{10}MeanFaceoffs_A + \beta_{11}MeanGiveaways_A + \beta_{12}MeanHits_A + \beta_{13}MeanMiss_A + \beta_{14}MeanPenalties_A + \beta_{15}MeanShots_A + \beta_{16}MeanTakeaways_A + e_{s}$

In [399]:
print ('mean goals against in all on-ice events prior to a goal')
y = dh['Mean_Goals_A']  
X = sm.add_constant(dh[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
result = sm.OLS(y, X).fit()
print (result.summary())

mean goals against in all on-ice events prior to a goal
                            OLS Regression Results                            
Dep. Variable:           Mean_Goals_A   R-squared:                       0.789
Model:                            OLS   Adj. R-squared:                  0.529
Method:                 Least Squares   F-statistic:                     3.033
Date:                Fri, 19 Jan 2018   Prob (F-statistic):             0.0248
Time:                        20:29:58   Log-Likelihood:                 13.644
No. Observations:                  30   AIC:                             6.713
Df Residuals:                      13   BIC:                             30.53
Df Model:                          16                                         
Covariance Type:            nonrobust                                         
                       coef    std err          t      P>|t|      [95.0% Conf. Int.]
---------------------------------------------------------------------
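The statistics in the summary above are linked by standard formulas, which gives a quick consistency check. This sketch recomputes the adjusted R-squared and the F-statistic from the rounded values printed above (R-squared = 0.789, 30 observations, 16 predictors), so agreement is only up to rounding.

```python
# Rounded values copied from the OLS summary above.
r2, n, k = 0.789, 30, 16

# Adjusted R-squared penalises 16 predictors fitted on only 30 observations.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Overall F-statistic: explained vs. residual variance per degree of freedom.
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))

print(round(adj_r2, 3), round(f_stat, 2))
```

The residual degrees of freedom, n - k - 1 = 13, match the `Df Residuals` line; the small gap between the printed F-statistic and the recomputed one comes from R-squared being rounded to three decimals.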

In [400]:
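# Disabled: Logit requires a dependent variable in [0, 1], and Mean_Goals_A is a per-game average of goals, not a proportion.
#y = dh['Mean_Goals_A']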
#X = sm.add_constant(dh[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
#result = sm.Logit(y, X).fit()
#result.summary()

#### $MeanGoals_A = \beta_{0} + \beta_{1}DBlock + \beta_{2}DFaceoff + \beta_{3}DGiveaway + \beta_{4}DHit + \beta_{5}DMiss+ \beta_{6}DPenalty + \beta_{7}DShot + \beta_{8}DTakeaway + e_{s}$
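The differential model is a restricted version of the full model above. Assuming each differential is the for-minus-against difference of per-game means (as computed for dj further down), and writing $MeanX_{k,F}$ and $MeanX_{k,A}$ as shorthand (introduced here only for illustration) for the For and Against means of event type $k \in \{Blocks, Faceoffs, Giveaways, Hits, Miss, Penalties, Shots, Takeaways\}$:

$$MeanGoals_A = \beta_{0} + \sum_{k} \gamma_{k}\,(MeanX_{k,F} - MeanX_{k,A}) + e_{s} = \beta_{0} + \sum_{k} \gamma_{k}\,MeanX_{k,F} + \sum_{k} (-\gamma_{k})\,MeanX_{k,A} + e_{s}$$

So it is the full model with $\beta_{k,F} = \gamma_{k}$ and $\beta_{k,A} = -\gamma_{k}$ for every event type: each event may only matter through the gap between what a team records and what it allows.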

In [401]:
print ('mean goals against in all on-ice events differential prior to a goal')
y = dh['Mean_Goals_A']  
X = sm.add_constant(dh[['DBlock', 'DFaceoff', 'DGiveaway', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
result = sm.OLS(y, X).fit()
print (result.summary())

mean goals against in all on-ice events differential prior to a goal
                            OLS Regression Results                            
Dep. Variable:           Mean_Goals_A   R-squared:                       0.316
Model:                            OLS   Adj. R-squared:                  0.056
Method:                 Least Squares   F-statistic:                     1.214
Date:                Fri, 19 Jan 2018   Prob (F-statistic):              0.339
Time:                        20:29:58   Log-Likelihood:                -3.9758
No. Observations:                  30   AIC:                             25.95
Df Residuals:                      21   BIC:                             38.56
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
--------------------------------------------------------------

In [402]:
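# Disabled: Logit requires a dependent variable in [0, 1], and Mean_Goals_A is a per-game average of goals, not a proportion.
#y = dh['Mean_Goals_A']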
#X = sm.add_constant(dh[['DBlock', 'DFaceoff', 'DGiveaway', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
#result = sm.Logit(y, X).fit()
#result.summary()

## 4) even strength on-ice events prior to a goal data

- keep even strength on-ice events that happened prior to a goal.

In [403]:
di = da[['Season', 'GameNumber', 'EventNumber', 'AdvantageType', 'Period', 'EventType', 'EventTimeFromZero', 'EventDetail', 'VTeamCode', 'HTeamCode', 'EventTeamCode']]
di = pd.merge(di, dg, on=['Season', 'GameNumber', 'EventNumber', 'AdvantageType', 'Period', 'EventType', 'EventTimeFromZero', 'EventDetail', 'EventTeamCode', 'VTeamCode', 'HTeamCode'], how='left')
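# GroupBy.bfill returns a result on di's own index, so the assignment aligns row by row
# (groupby(...).apply(lambda x: x.bfill()) can prepend the group keys to the index in recent pandas).
di['AdvantageType'] = di.groupby(['Season', 'GameNumber'])['AdvantageType'].bfill()
di['GoalNumber'] = di.groupby(['Season', 'GameNumber', 'Period'])['GoalNumber'].bfill()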
di = di[di['AdvantageType'] == 'EV']
di.head()

Unnamed: 0,Season,GameNumber,EventNumber,AdvantageType,Period,EventType,EventTimeFromZero,EventDetail,VTeamCode,HTeamCode,EventTeamCode,GoalNumber
0,2010,20001,1,EV,1,FAC,0,MTL won Neu. Zone - MTL #11 GOMEZ vs TOR #37 B...,MTL,TOR,MTL,1.0
1,2010,20001,3,EV,1,HIT,15,"TOR #37 BRENT HIT MTL #26 GORGES, Off. Zone",MTL,TOR,TOR,1.0
2,2010,20001,4,EV,1,HIT,46,"MTL #14 PLEKANEC HIT TOR #2 SCHENN, Off. Zone",MTL,TOR,MTL,1.0
3,2010,20001,5,EV,1,HIT,57,"MTL #76 SUBBAN HIT TOR #15 KABERLE, Neu. Zone",MTL,TOR,MTL,1.0
4,2010,20001,6,EV,1,GIVE,69,"TOR GIVEAWAY - #35 GIGUERE, Def. Zone",MTL,TOR,TOR,1.0


- Even strength on-ice events that still have no GoalNumber after the back-fill, i.e. events that occurred after the last goal of their period (or in a period with no goal), are excluded from the dataframe.
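The back-fill-then-drop logic can be sketched on a hypothetical toy game, assuming (as the merge with dg suggests) that GoalNumber is populated only on GOAL rows. Within each period, every event inherits the number of the next goal; events after the period's last goal have nothing to inherit and are dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical toy play-by-play: one game, two periods, GoalNumber set only on GOAL rows.
toy = pd.DataFrame({
    'Season': [2010] * 7,
    'GameNumber': [20001] * 7,
    'Period': [1, 1, 1, 1, 2, 2, 2],
    'EventType': ['FAC', 'HIT', 'GOAL', 'SHOT', 'FAC', 'GOAL', 'HIT'],
    'GoalNumber': [np.nan, np.nan, 1.0, np.nan, np.nan, 2.0, np.nan],
})

# Back-fill within each period: earlier events take the number of the next goal.
toy['GoalNumber'] = toy.groupby(['Season', 'GameNumber', 'Period'])['GoalNumber'].bfill()

# The SHOT after goal 1 and the HIT after goal 2 remain NaN and are removed.
kept = toy.dropna(subset=['GoalNumber'])
print(kept[['Period', 'EventType', 'GoalNumber']])
```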

In [404]:
di = di.dropna(subset=['GoalNumber'])
di = di.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])
di = di.drop_duplicates(['Season', 'GameNumber', 'EventNumber', 'EventTeamCode'])
di.head()

Unnamed: 0,Season,GameNumber,EventNumber,AdvantageType,Period,EventType,EventTimeFromZero,EventDetail,VTeamCode,HTeamCode,EventTeamCode,GoalNumber
0,2010,20001,1,EV,1,FAC,0,MTL won Neu. Zone - MTL #11 GOMEZ vs TOR #37 B...,MTL,TOR,MTL,1.0
1,2010,20001,3,EV,1,HIT,15,"TOR #37 BRENT HIT MTL #26 GORGES, Off. Zone",MTL,TOR,TOR,1.0
2,2010,20001,4,EV,1,HIT,46,"MTL #14 PLEKANEC HIT TOR #2 SCHENN, Off. Zone",MTL,TOR,MTL,1.0
3,2010,20001,5,EV,1,HIT,57,"MTL #76 SUBBAN HIT TOR #15 KABERLE, Neu. Zone",MTL,TOR,MTL,1.0
4,2010,20001,6,EV,1,GIVE,69,"TOR GIVEAWAY - #35 GIGUERE, Def. Zone",MTL,TOR,TOR,1.0


In [405]:
di.shape

(147273, 12)

- Assign a value of 1 if an on-ice event is a goal and NaN (missing) otherwise. Follow the same procedure for blocks, faceoffs, giveaways, hits, misses, penalties, shots and takeaways. Group by season, game number, event team and event type to find the sum of each on-ice event per team per game.

In [406]:
di['Goal'] = di.apply(lambda x: 1 if (x['EventType'] == 'GOAL') else np.nan, axis=1)
di['Block'] = di.apply(lambda x: 1 if (x['EventType'] == 'BLOCK') else np.nan, axis=1)
di['Faceoff'] = di.apply(lambda x: 1 if (x['EventType'] == 'FAC') else np.nan, axis=1)
di['Giveaway'] = di.apply(lambda x: 1 if (x['EventType'] == 'GIVE') else np.nan, axis=1)
di['Hit'] = di.apply(lambda x: 1 if (x['EventType'] == 'HIT') else np.nan, axis=1)
di['Miss'] = di.apply(lambda x: 1 if (x['EventType'] == 'MISS') else np.nan, axis=1)
di['Penalty'] = di.apply(lambda x: 1 if (x['EventType'] == 'PENL') else np.nan, axis=1)
di['Shot'] = di.apply(lambda x: 1 if (x['EventType'] == 'SHOT') else np.nan, axis=1)
di['Takeaway'] = di.apply(lambda x: 1 if (x['EventType'] == 'TAKE') else np.nan, axis=1)

In [407]:
# min_count=1 keeps the sum NaN (not 0) for groups with no matching events; recent pandas
# returns 0 for an all-NaN sum, which would stop the forward/backward fill below from propagating counts.
di['Blocks'] = di.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Block'].transform('sum', min_count=1)
di['Faceoffs'] = di.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Faceoff'].transform('sum', min_count=1)
di['Giveaways'] = di.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Giveaway'].transform('sum', min_count=1)
di['Goals'] = di.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Goal'].transform('sum', min_count=1)
di['Hits'] = di.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Hit'].transform('sum', min_count=1)
di['Misses'] = di.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Miss'].transform('sum', min_count=1)
di['Penalties'] = di.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Penalty'].transform('sum', min_count=1)
di['Shots'] = di.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Shot'].transform('sum', min_count=1)
di['Takeaways'] = di.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Takeaway'].transform('sum', min_count=1)
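The indicator-and-count step can also be sketched without row-wise `apply`, here on hypothetical toy events for one game. `np.where` builds the same 1/NaN indicator, and `transform('sum', min_count=1)` broadcasts each team's count back to its matching rows while leaving rows of other event types as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy events for one game (not real data).
ev = pd.DataFrame({
    'Season': [2010] * 6,
    'GameNumber': [20001] * 6,
    'EventTeamCode': ['MTL', 'TOR', 'MTL', 'MTL', 'TOR', 'MTL'],
    'EventType': ['HIT', 'HIT', 'HIT', 'SHOT', 'SHOT', 'SHOT'],
})

# Vectorised indicator: 1 for a hit, NaN otherwise (same convention as the apply version).
ev['Hit'] = np.where(ev['EventType'] == 'HIT', 1.0, np.nan)

# Hits per game and event team; min_count=1 leaves the all-NaN SHOT groups as NaN.
keys = ['Season', 'GameNumber', 'EventTeamCode', 'EventType']
ev['Hits'] = ev.groupby(keys)['Hit'].transform('sum', min_count=1)
print(ev)
```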

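- reshape data wide to long: `pd.lreshape` stacks the visiting (VTeamCode) and home (HTeamCode) team columns into a single TeamCode column, so each event appears once for each team in the game.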

In [408]:
di = di.rename(columns={'EventTeamCode': 'EventTeam'})
a = [col for col in di.columns if 'TeamCode' in col]
di = pd.lreshape(di, {'TeamCode' : a})
di = di.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])
di = di.rename(columns={'EventTeam': 'EventTeamCode'})
di.head()

Unnamed: 0,AdvantageType,Block,Blocks,EventDetail,EventNumber,EventTeamCode,EventTimeFromZero,EventType,Faceoff,Faceoffs,GameNumber,Giveaway,Giveaways,Goal,GoalNumber,Goals,Hit,Hits,Miss,Misses,Penalties,Penalty,Period,Season,Shot,Shots,Takeaway,Takeaways,TeamCode
0,EV,,,MTL won Neu. Zone - MTL #11 GOMEZ vs TOR #37 B...,1,MTL,0,FAC,1.0,8.0,20001,,,,1.0,,,,,,,,1,2010,,,,,MTL
147273,EV,,,MTL won Neu. Zone - MTL #11 GOMEZ vs TOR #37 B...,1,MTL,0,FAC,1.0,8.0,20001,,,,1.0,,,,,,,,1,2010,,,,,TOR
1,EV,,,"TOR #37 BRENT HIT MTL #26 GORGES, Off. Zone",3,TOR,15,HIT,,,20001,,,,1.0,,1.0,11.0,,,,,1,2010,,,,,MTL
147274,EV,,,"TOR #37 BRENT HIT MTL #26 GORGES, Off. Zone",3,TOR,15,HIT,,,20001,,,,1.0,,1.0,11.0,,,,,1,2010,,,,,TOR
2,EV,,,"MTL #14 PLEKANEC HIT TOR #2 SCHENN, Off. Zone",4,MTL,46,HIT,,,20001,,,,1.0,,1.0,12.0,,,,,1,2010,,,,,MTL
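A minimal sketch of the reshape on two hypothetical events from one MTL at TOR game: `pd.lreshape` concatenates the listed columns into one and repeats every other column once per listed column, so each event comes out twice, tagged once with each team.

```python
import pandas as pd

# Hypothetical events: visiting team MTL, home team TOR.
events = pd.DataFrame({
    'EventNumber': [1, 3],
    'EventTeam': ['MTL', 'TOR'],
    'VTeamCode': ['MTL', 'MTL'],
    'HTeamCode': ['TOR', 'TOR'],
})

# Stack VTeamCode and HTeamCode into one TeamCode column; the other columns are repeated.
long = pd.lreshape(events, {'TeamCode': ['VTeamCode', 'HTeamCode']})

# A stable sort places the two copies of each event next to each other, as in the output above.
long = long.sort_values('EventNumber', kind='mergesort')
print(long)
```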


- drop duplicates by season, game number, team code, event team code and event type, leaving one row per team, event team and event type in each game.

In [409]:
di = di.drop_duplicates(['Season', 'GameNumber', 'TeamCode', 'EventTeamCode', 'EventType'])
di = di [['Season', 'GameNumber', 'TeamCode', 'EventNumber', 'EventType', 'EventTeamCode',  'Blocks', 'Faceoffs', 'Giveaways', 'Goals', 'Hits', 'Misses', 'Penalties', 'Shots', 'Takeaways']]
di = di.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])
di.shape

(41352, 15)

- assign all on-ice events to their respective teams. If the team code matches the event team code, the event counts as For (F) that team; otherwise it was recorded by the opponent and counts as Against (A) that team. Each on-ice event therefore generates two variables per team: For (F) and Against (A).

In [410]:
di['Blocks_F'] = di.apply(lambda x: x['Blocks'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
di['Blocks_A'] = di.apply(lambda x: x['Blocks'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
di['Faceoffs_F'] = di.apply(lambda x: x['Faceoffs'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
di['Faceoffs_A'] = di.apply(lambda x: x['Faceoffs'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
di['Giveaways_F'] = di.apply(lambda x: x['Giveaways'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
di['Giveaways_A'] = di.apply(lambda x: x['Giveaways'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
di['Goals_F'] = di.apply(lambda x: x['Goals'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
di['Goals_A'] = di.apply(lambda x: x['Goals'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
di['Hits_F'] = di.apply(lambda x: x['Hits'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
di['Hits_A'] = di.apply(lambda x: x['Hits'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
di['Miss_F'] = di.apply(lambda x: x['Misses'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
di['Miss_A'] = di.apply(lambda x: x['Misses'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
di['Penalties_F'] = di.apply(lambda x: x['Penalties'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
di['Penalties_A'] = di.apply(lambda x: x['Penalties'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
di['Shots_F'] = di.apply(lambda x: x['Shots'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
di['Shots_A'] = di.apply(lambda x: x['Shots'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
di['Takeaways_F'] = di.apply(lambda x: x['Takeaways'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
di['Takeaways_A'] = di.apply(lambda x: x['Takeaways'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
di = di.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])
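The row-wise `apply` calls above can be written with `Series.where`, which keeps a value where the condition holds and returns NaN elsewhere. A sketch on hypothetical rows for one MTL vs TOR game (hits only), assuming the column layout produced by the reshape:

```python
import pandas as pd

# Hypothetical rows after the wide-to-long reshape: one game, MTL vs TOR.
toy = pd.DataFrame({
    'TeamCode':      ['MTL', 'TOR', 'MTL', 'TOR'],
    'EventTeamCode': ['MTL', 'MTL', 'TOR', 'TOR'],
    'Hits':          [12.0, 12.0, 11.0, 11.0],
})

# True where the row's team is the one that recorded the event.
own = toy['TeamCode'] == toy['EventTeamCode']

# Same result as the apply version: For where the team matches, Against where it does not.
toy['Hits_F'] = toy['Hits'].where(own)
toy['Hits_A'] = toy['Hits'].where(~own)
print(toy)
```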

- forward and backward fill the For and Against counts within each season, game number and team code, so every row of a team's game carries all of that team's counts.

In [411]:
# transform returns a result on di's own index, so each filled column aligns row by row on assignment
di['Blocks_F'] = di.groupby(['Season','GameNumber', 'TeamCode'])['Blocks_F'].transform(lambda x: x.ffill().bfill())
di['Faceoffs_F'] = di.groupby(['Season','GameNumber', 'TeamCode'])['Faceoffs_F'].transform(lambda x: x.ffill().bfill())
di['Giveaways_F'] = di.groupby(['Season','GameNumber', 'TeamCode'])['Giveaways_F'].transform(lambda x: x.ffill().bfill())
di['Goals_F'] = di.groupby(['Season','GameNumber', 'TeamCode'])['Goals_F'].transform(lambda x: x.ffill().bfill())
di['Hits_F'] = di.groupby(['Season','GameNumber', 'TeamCode'])['Hits_F'].transform(lambda x: x.ffill().bfill())
di['Miss_F'] = di.groupby(['Season','GameNumber', 'TeamCode'])['Miss_F'].transform(lambda x: x.ffill().bfill())
di['Penalties_F'] = di.groupby(['Season','GameNumber', 'TeamCode'])['Penalties_F'].transform(lambda x: x.ffill().bfill())
di['Shots_F'] = di.groupby(['Season','GameNumber', 'TeamCode'])['Shots_F'].transform(lambda x: x.ffill().bfill())
di['Takeaways_F'] = di.groupby(['Season','GameNumber', 'TeamCode'])['Takeaways_F'].transform(lambda x: x.ffill().bfill())
di['Blocks_A'] = di.groupby(['Season','GameNumber', 'TeamCode'])['Blocks_A'].transform(lambda x: x.ffill().bfill())
di['Faceoffs_A'] = di.groupby(['Season','GameNumber', 'TeamCode'])['Faceoffs_A'].transform(lambda x: x.ffill().bfill())
di['Giveaways_A'] = di.groupby(['Season','GameNumber', 'TeamCode'])['Giveaways_A'].transform(lambda x: x.ffill().bfill())
di['Goals_A'] = di.groupby(['Season','GameNumber', 'TeamCode'])['Goals_A'].transform(lambda x: x.ffill().bfill())
di['Hits_A'] = di.groupby(['Season','GameNumber', 'TeamCode'])['Hits_A'].transform(lambda x: x.ffill().bfill())
di['Miss_A'] = di.groupby(['Season','GameNumber', 'TeamCode'])['Miss_A'].transform(lambda x: x.ffill().bfill())
di['Penalties_A'] = di.groupby(['Season','GameNumber', 'TeamCode'])['Penalties_A'].transform(lambda x: x.ffill().bfill())
di['Shots_A'] = di.groupby(['Season','GameNumber', 'TeamCode'])['Shots_A'].transform(lambda x: x.ffill().bfill())
di['Takeaways_A'] = di.groupby(['Season','GameNumber', 'TeamCode'])['Takeaways_A'].transform(lambda x: x.ffill().bfill())
di = di.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])
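A minimal sketch of the fill-then-collapse step on hypothetical toy data (one game, two teams, hits only), assuming the For/Against columns have already been split. After the fill every row of a team's game holds both counts, so keeping the first row per team loses nothing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy: For/Against hit counts already split, one game, two teams.
toy = pd.DataFrame({
    'Season': [2010] * 4,
    'GameNumber': [20001] * 4,
    'TeamCode': ['MTL', 'TOR', 'MTL', 'TOR'],
    'Hits_F': [12.0, np.nan, np.nan, 11.0],
    'Hits_A': [np.nan, 12.0, 11.0, np.nan],
})

keys = ['Season', 'GameNumber', 'TeamCode']
for col in ['Hits_F', 'Hits_A']:
    # Fill forward then backward within each team's game; transform keeps toy's index.
    toy[col] = toy.groupby(keys)[col].transform(lambda s: s.ffill().bfill())

# One row per team per game, as in the next cell.
per_team = toy.drop_duplicates(keys)[keys + ['Hits_F', 'Hits_A']]
print(per_team)
```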

- keep only the relevant columns and drop duplicates by season, game number and team code, leaving two observations (one per team) per game.

In [412]:
di = di[['Season', 'GameNumber', 'TeamCode', 'Blocks_F', 'Blocks_A', 'Faceoffs_F', 'Faceoffs_A', 'Giveaways_F', 'Giveaways_A', 'Goals_F', 'Goals_A', 'Hits_F', 'Hits_A', 'Miss_F', 'Miss_A', 'Penalties_F', 'Penalties_A', 'Shots_F', 'Shots_A', 'Takeaways_F', 'Takeaways_A']]
di = di.sort_values(['Season', 'GameNumber'], ascending=[True, True])
di = di.drop_duplicates(['Season', 'GameNumber', 'TeamCode'])
di.head()

Unnamed: 0,Season,GameNumber,TeamCode,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A
0,2010,20001,MTL,4.0,5.0,8.0,4.0,4.0,4.0,2.0,3.0,12.0,11.0,3.0,1.0,,,7.0,6.0,2.0,1.0
147273,2010,20001,TOR,5.0,4.0,4.0,8.0,4.0,4.0,3.0,2.0,11.0,12.0,1.0,3.0,,,6.0,7.0,1.0,2.0
77,2010,20002,PHI,2.0,3.0,2.0,10.0,1.0,1.0,1.0,1.0,7.0,11.0,7.0,2.0,3.0,2.0,7.0,7.0,,1.0
147350,2010,20002,PIT,3.0,2.0,10.0,2.0,1.0,1.0,1.0,1.0,11.0,7.0,2.0,7.0,2.0,3.0,7.0,7.0,1.0,
145,2010,20003,CAR,11.0,19.0,19.0,35.0,9.0,8.0,2.0,1.0,12.0,13.0,9.0,6.0,5.0,4.0,20.0,17.0,3.0,8.0


In [413]:
di.shape

(2444, 21)

- **merge the even strength on-ice events prior to a goal (di) onto the team roster player rank dataframe (dc) to create a new dataframe (dj).**

In [414]:
dj = pd.merge(dc, di, on=['Season', 'GameNumber', 'TeamCode'], how='left')
dj = dj.drop_duplicates(['Season', 'GameNumber', 'TeamCode'])
dj = dj.sort_values(['Season', 'GameNumber'], ascending=[True, True])
dj.head()

Unnamed: 0,GD,GameNumber,LossTeam,Season,WinTeam,PlayerNumber,PlayerPosition,TeamCode,GF,GA,Rank,RosterCount,Position,PositionCount,FCount,DCount,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A
0,1,20001,MTL,2010,TOR,11.0,C,MTL,2,3,2,18.0,F,12.0,12.0,6.0,4.0,5.0,8.0,4.0,4.0,4.0,2.0,3.0,12.0,11.0,3.0,1.0,,,7.0,6.0,2.0,1.0
18270,1,20001,MTL,2010,TOR,37.0,C,TOR,3,2,2,18.0,F,12.0,12.0,6.0,5.0,4.0,4.0,8.0,4.0,4.0,3.0,2.0,11.0,12.0,1.0,3.0,,,6.0,7.0,1.0,2.0
18,-1,20002,PIT,2010,PHI,17.0,C,PHI,3,2,1,18.0,F,12.0,12.0,6.0,2.0,3.0,2.0,10.0,1.0,1.0,1.0,1.0,7.0,11.0,7.0,2.0,3.0,2.0,7.0,7.0,,1.0
18288,-1,20002,PIT,2010,PHI,71.0,C,PIT,2,3,1,18.0,F,12.0,12.0,6.0,3.0,2.0,10.0,2.0,1.0,1.0,1.0,1.0,11.0,7.0,2.0,7.0,2.0,3.0,7.0,7.0,1.0,
36,-1,20003,MIN,2010,CAR,53.0,C,CAR,4,3,1,18.0,F,12.0,12.0,6.0,11.0,19.0,19.0,35.0,9.0,8.0,2.0,1.0,12.0,13.0,9.0,6.0,5.0,4.0,20.0,17.0,3.0,8.0


In [415]:
dj.shape

(2030, 34)

- create a column for team win and team loss.

In [416]:
dj['TeamWin'] =  dj.apply(lambda x: 1 if x['TeamCode']==x['WinTeam'] else 0, 1)
dj['TeamLos'] =  dj.apply(lambda x: 1 if x['TeamCode']!=x['WinTeam'] else 0, 1)

- compute games played, games won, games lost, and season totals of all on-ice events for and against, by team.

In [417]:
dj['GP'] = dj.groupby(['Season','TeamCode'])['GameNumber'].transform('count')
dj['GW'] = dj.groupby(['Season','WinTeam'])['TeamWin'].transform('sum')
dj['GL'] = dj.groupby(['Season','LossTeam'])['TeamLos'].transform('sum')
dj['GF'] = dj.groupby(['Season','TeamCode'])['GF'].transform('sum')
dj['GA'] = dj.groupby(['Season','TeamCode'])['GA'].transform('sum')
dj['Blocks_F'] = dj.groupby(['Season','TeamCode'])['Blocks_F'].transform('sum')
dj['Faceoffs_F'] = dj.groupby(['Season','TeamCode'])['Faceoffs_F'].transform('sum')
dj['Giveaways_F'] = dj.groupby(['Season','TeamCode'])['Giveaways_F'].transform('sum')
dj['Goals_F'] = dj.groupby(['Season','TeamCode'])['Goals_F'].transform('sum')
dj['Hits_F'] = dj.groupby(['Season','TeamCode'])['Hits_F'].transform('sum')
dj['Miss_F'] = dj.groupby(['Season','TeamCode'])['Miss_F'].transform('sum')
dj['Penalties_F'] = dj.groupby(['Season','TeamCode'])['Penalties_F'].transform('sum')
dj['Shots_F'] = dj.groupby(['Season','TeamCode'])['Shots_F'].transform('sum')
dj['Takeaways_F'] = dj.groupby(['Season','TeamCode'])['Takeaways_F'].transform('sum')
dj['Blocks_A'] = dj.groupby(['Season','TeamCode'])['Blocks_A'].transform('sum') 
dj['Faceoffs_A'] = dj.groupby(['Season','TeamCode'])['Faceoffs_A'].transform('sum')
dj['Giveaways_A'] = dj.groupby(['Season','TeamCode'])['Giveaways_A'].transform('sum')
dj['Goals_A'] = dj.groupby(['Season','TeamCode'])['Goals_A'].transform('sum')
dj['Hits_A'] = dj.groupby(['Season','TeamCode'])['Hits_A'].transform('sum')
dj['Miss_A'] = dj.groupby(['Season','TeamCode'])['Miss_A'].transform('sum')
dj['Penalties_A'] = dj.groupby(['Season','TeamCode'])['Penalties_A'].transform('sum')
dj['Shots_A'] = dj.groupby(['Season','TeamCode'])['Shots_A'].transform('sum')
dj['Takeaways_A'] = dj.groupby(['Season','TeamCode'])['Takeaways_A'].transform('sum')
dj.head()

Unnamed: 0,GD,GameNumber,LossTeam,Season,WinTeam,PlayerNumber,PlayerPosition,TeamCode,GF,GA,Rank,RosterCount,Position,PositionCount,FCount,DCount,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A,TeamWin,TeamLos,GP,GW,GL
0,1,20001,MTL,2010,TOR,11.0,C,MTL,188,181,2,18.0,F,12.0,12.0,6.0,408.0,468.0,853.0,840.0,290.0,247.0,131.0,123.0,737.0,793.0,369.0,309.0,164.0,174.0,910.0,801.0,188.0,164.0,0,1,68,34,31
18270,1,20001,MTL,2010,TOR,37.0,C,TOR,195,218,2,18.0,F,12.0,12.0,6.0,570.0,529.0,890.0,900.0,392.0,350.0,140.0,146.0,967.0,901.0,398.0,439.0,173.0,172.0,832.0,957.0,235.0,249.0,1,0,70,34,31
18,-1,20002,PIT,2010,PHI,17.0,C,PHI,235,207,1,18.0,F,12.0,12.0,6.0,586.0,533.0,1001.0,1062.0,286.0,270.0,169.0,135.0,918.0,881.0,410.0,451.0,178.0,180.0,992.0,939.0,255.0,257.0,1,0,72,41,31
18288,-1,20002,PIT,2010,PHI,71.0,C,PIT,204,180,1,18.0,F,12.0,12.0,6.0,400.0,425.0,810.0,775.0,165.0,180.0,130.0,126.0,909.0,845.0,336.0,282.0,193.0,190.0,836.0,763.0,144.0,160.0,0,1,71,41,31
36,-1,20003,MIN,2010,CAR,53.0,C,CAR,222,218,1,18.0,F,12.0,12.0,6.0,518.0,553.0,886.0,1115.0,230.0,286.0,152.0,155.0,1055.0,798.0,441.0,494.0,135.0,184.0,896.0,1061.0,291.0,243.0,1,0,76,38,35


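- compute each team's wins (W) and losses (L) for the season. GW counts the wins of the row's WinTeam and GL the losses of its LossTeam: if the team lost the game in that row, L is taken from GL and W = GP - GL; if it won, W is taken from GW and L = GP - GW.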

In [418]:
dj['L'] = dj.apply(lambda x: x['GL'] if x['TeamCode']== x['LossTeam'] else (x['GP'] - x['GW']), 1)
dj['W'] = dj.apply(lambda x: x['GW'] if x['TeamCode']== x['WinTeam'] else (x['GP'] - x['GL']), 1)
dj.head()

Unnamed: 0,GD,GameNumber,LossTeam,Season,WinTeam,PlayerNumber,PlayerPosition,TeamCode,GF,GA,Rank,RosterCount,Position,PositionCount,FCount,DCount,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A,TeamWin,TeamLos,GP,GW,GL,L,W
0,1,20001,MTL,2010,TOR,11.0,C,MTL,188,181,2,18.0,F,12.0,12.0,6.0,408.0,468.0,853.0,840.0,290.0,247.0,131.0,123.0,737.0,793.0,369.0,309.0,164.0,174.0,910.0,801.0,188.0,164.0,0,1,68,34,31,31,37
18270,1,20001,MTL,2010,TOR,37.0,C,TOR,195,218,2,18.0,F,12.0,12.0,6.0,570.0,529.0,890.0,900.0,392.0,350.0,140.0,146.0,967.0,901.0,398.0,439.0,173.0,172.0,832.0,957.0,235.0,249.0,1,0,70,34,31,36,34
18,-1,20002,PIT,2010,PHI,17.0,C,PHI,235,207,1,18.0,F,12.0,12.0,6.0,586.0,533.0,1001.0,1062.0,286.0,270.0,169.0,135.0,918.0,881.0,410.0,451.0,178.0,180.0,992.0,939.0,255.0,257.0,1,0,72,41,31,31,41
18288,-1,20002,PIT,2010,PHI,71.0,C,PIT,204,180,1,18.0,F,12.0,12.0,6.0,400.0,425.0,810.0,775.0,165.0,180.0,130.0,126.0,909.0,845.0,336.0,282.0,193.0,190.0,836.0,763.0,144.0,160.0,0,1,71,41,31,31,40
36,-1,20003,MIN,2010,CAR,53.0,C,CAR,222,218,1,18.0,F,12.0,12.0,6.0,518.0,553.0,886.0,1115.0,230.0,286.0,152.0,155.0,1055.0,798.0,441.0,494.0,135.0,184.0,896.0,1061.0,291.0,243.0,1,0,76,38,35,38,38
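Assuming dj holds exactly one row per team per game (which the drop_duplicates in the merge cell enforces) and every game has a winner (as in the output above, where W + L = GP), a team's wins are simply the sum of its TeamWin indicator. A sketch of that more direct route on hypothetical toy results:

```python
import pandas as pd

# Hypothetical toy results: three games among three teams, one row per team per game.
g = pd.DataFrame({
    'Season': [2010] * 6,
    'GameNumber': [1, 1, 2, 2, 3, 3],
    'TeamCode': ['MTL', 'TOR', 'MTL', 'BOS', 'TOR', 'BOS'],
    'WinTeam': ['TOR', 'TOR', 'MTL', 'MTL', 'TOR', 'TOR'],
})
g['TeamWin'] = (g['TeamCode'] == g['WinTeam']).astype(int)

by_team = g.groupby(['Season', 'TeamCode'])
g['GP'] = by_team['GameNumber'].transform('count')
g['W'] = by_team['TeamWin'].transform('sum')
g['L'] = g['GP'] - g['W']  # every game has a winner, so losses are the remainder

summary = g.drop_duplicates(['Season', 'TeamCode']).set_index('TeamCode')[['GP', 'W', 'L']]
print(summary)
```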


- divide wins and losses by games played to get each team's winning and losing percentage. Divide each on-ice event total by the number of games the team played to get the per-game mean of that event for the team over the season.
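As a worked example, MTL's totals in the previous output (GP = 68, W = 37, L = 31, Blocks_F = 408, Goals_F = 131) give the per-game values that appear in the MTL row below:

```python
# Season totals for MTL copied from the previous output.
gp, wins, losses = 68, 37, 31
blocks_f, goals_f = 408.0, 131.0

win_pc = wins / gp            # share of games won
loss_pc = losses / gp         # share of games lost
mean_blocks_f = blocks_f / gp  # blocks for, per game
mean_goals_f = goals_f / gp    # goals for, per game
print(round(win_pc, 6), round(loss_pc, 6), mean_blocks_f, round(mean_goals_f, 6))
```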

In [419]:
dj = dj.drop_duplicates(['Season', 'TeamCode'])
dj['WinPc'] = dj['W']/ dj['GP']
dj['LossPc'] = dj['L']/ dj['GP']
dj['Mean_Blocks_F'] = dj['Blocks_F']/ dj['GP']
dj['Mean_Faceoffs_F'] = dj['Faceoffs_F']/ dj['GP']
dj['Mean_Giveaways_F'] = dj['Giveaways_F']/ dj['GP']
dj['Mean_Goals_F'] = dj['Goals_F']/ dj['GP']
dj['Mean_Hits_F'] = dj['Hits_F']/ dj['GP']
dj['Mean_Miss_F'] = dj['Miss_F']/ dj['GP']
dj['Mean_Penalties_F'] = dj['Penalties_F']/ dj['GP']
dj['Mean_Shots_F'] = dj['Shots_F']/ dj['GP']
dj['Mean_Takeaways_F'] = dj['Takeaways_F']/ dj['GP']
dj['Mean_Blocks_A'] = dj['Blocks_A']/ dj['GP']
dj['Mean_Faceoffs_A'] = dj['Faceoffs_A']/ dj['GP']
dj['Mean_Giveaways_A'] = dj['Giveaways_A']/ dj['GP']
dj['Mean_Goals_A'] = dj['Goals_A']/ dj['GP']
dj['Mean_Hits_A'] = dj['Hits_A']/ dj['GP']
dj['Mean_Miss_A'] = dj['Miss_A']/ dj['GP']
dj['Mean_Penalties_A'] = dj['Penalties_A']/ dj['GP']
dj['Mean_Shots_A'] = dj['Shots_A']/ dj['GP']
dj['Mean_Takeaways_A'] = dj['Takeaways_A']/ dj['GP']
dj.head()

Unnamed: 0,GD,GameNumber,LossTeam,Season,WinTeam,PlayerNumber,PlayerPosition,TeamCode,GF,GA,Rank,RosterCount,Position,PositionCount,FCount,DCount,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A,TeamWin,TeamLos,GP,GW,GL,L,W,WinPc,LossPc,Mean_Blocks_F,Mean_Faceoffs_F,Mean_Giveaways_F,Mean_Goals_F,Mean_Hits_F,Mean_Miss_F,Mean_Penalties_F,Mean_Shots_F,Mean_Takeaways_F,Mean_Blocks_A,Mean_Faceoffs_A,Mean_Giveaways_A,Mean_Goals_A,Mean_Hits_A,Mean_Miss_A,Mean_Penalties_A,Mean_Shots_A,Mean_Takeaways_A
0,1,20001,MTL,2010,TOR,11.0,C,MTL,188,181,2,18.0,F,12.0,12.0,6.0,408.0,468.0,853.0,840.0,290.0,247.0,131.0,123.0,737.0,793.0,369.0,309.0,164.0,174.0,910.0,801.0,188.0,164.0,0,1,68,34,31,31,37,0.544118,0.455882,6.0,12.544118,4.264706,1.926471,10.838235,5.426471,2.411765,13.382353,2.764706,6.882353,12.352941,3.632353,1.808824,11.661765,4.544118,2.558824,11.779412,2.411765
18270,1,20001,MTL,2010,TOR,37.0,C,TOR,195,218,2,18.0,F,12.0,12.0,6.0,570.0,529.0,890.0,900.0,392.0,350.0,140.0,146.0,967.0,901.0,398.0,439.0,173.0,172.0,832.0,957.0,235.0,249.0,1,0,70,34,31,36,34,0.485714,0.514286,8.142857,12.714286,5.6,2.0,13.814286,5.685714,2.471429,11.885714,3.357143,7.557143,12.857143,5.0,2.085714,12.871429,6.271429,2.457143,13.671429,3.557143
18,-1,20002,PIT,2010,PHI,17.0,C,PHI,235,207,1,18.0,F,12.0,12.0,6.0,586.0,533.0,1001.0,1062.0,286.0,270.0,169.0,135.0,918.0,881.0,410.0,451.0,178.0,180.0,992.0,939.0,255.0,257.0,1,0,72,41,31,31,41,0.569444,0.430556,8.138889,13.902778,3.972222,2.347222,12.75,5.694444,2.472222,13.777778,3.541667,7.402778,14.75,3.75,1.875,12.236111,6.263889,2.5,13.041667,3.569444
18288,-1,20002,PIT,2010,PHI,71.0,C,PIT,204,180,1,18.0,F,12.0,12.0,6.0,400.0,425.0,810.0,775.0,165.0,180.0,130.0,126.0,909.0,845.0,336.0,282.0,193.0,190.0,836.0,763.0,144.0,160.0,0,1,71,41,31,31,40,0.56338,0.43662,5.633803,11.408451,2.323944,1.830986,12.802817,4.732394,2.71831,11.774648,2.028169,5.985915,10.915493,2.535211,1.774648,11.901408,3.971831,2.676056,10.746479,2.253521
36,-1,20003,MIN,2010,CAR,53.0,C,CAR,222,218,1,18.0,F,12.0,12.0,6.0,518.0,553.0,886.0,1115.0,230.0,286.0,152.0,155.0,1055.0,798.0,441.0,494.0,135.0,184.0,896.0,1061.0,291.0,243.0,1,0,76,38,35,38,38,0.5,0.5,6.815789,11.657895,3.026316,2.0,13.881579,5.802632,1.776316,11.789474,3.828947,7.276316,14.671053,3.763158,2.039474,10.5,6.5,2.421053,13.960526,3.197368


In [420]:
# .copy() makes dj an independent frame, so adding Rank_W below does not raise SettingWithCopyWarning
dj = dj[['Season', 'TeamCode', 'GP', 'W', 'L','WinPc', 'LossPc', 'Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Goals_F','Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F','Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A','Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Goals_A','Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A','Mean_Shots_A', 'Mean_Takeaways_A']].copy()
dj['Rank_W'] = dj.groupby(['Season'])['WinPc'].rank(ascending=False)
dj = dj.sort_values(['Season', 'Rank_W'], ascending=[True, True])
dj.head(30)

Unnamed: 0,Season,TeamCode,GP,W,L,WinPc,LossPc,Mean_Blocks_F,Mean_Faceoffs_F,Mean_Giveaways_F,Mean_Goals_F,Mean_Hits_F,Mean_Miss_F,Mean_Penalties_F,Mean_Shots_F,Mean_Takeaways_F,Mean_Blocks_A,Mean_Faceoffs_A,Mean_Giveaways_A,Mean_Goals_A,Mean_Hits_A,Mean_Miss_A,Mean_Penalties_A,Mean_Shots_A,Mean_Takeaways_A,Rank_W
18576,2010,VAN,73,48,25,0.657534,0.342466,5.931507,13.945205,2.945205,2.109589,11.643836,5.328767,2.232877,12.643836,3.315068,6.712329,10.90411,3.232877,1.712329,11.39726,4.986301,2.410959,11.863014,3.109589,1.0
90,2010,SJ,65,41,24,0.630769,0.369231,6.646154,13.846154,4.753846,2.092308,12.492308,5.569231,2.507692,14.215385,4.153846,7.015385,12.061538,4.738462,1.553846,13.584615,5.123077,2.584615,12.892308,3.369231,2.0
18432,2010,BOS,76,45,31,0.592105,0.407895,6.710526,13.631579,3.026316,2.263158,10.921053,5.289474,2.407895,13.894737,2.434211,7.276316,12.947368,4.105263,1.565789,12.421053,5.302632,2.342105,13.894737,3.618421,3.0
18396,2010,DET,68,40,28,0.588235,0.411765,5.441176,15.102941,4.647059,2.25,12.426471,6.588235,2.088235,15.397059,3.632353,6.955882,14.161765,4.102941,2.132353,13.75,5.382353,2.220588,14.0,3.470588,4.0
126,2010,ANA,65,38,27,0.584615,0.415385,6.615385,11.261538,3.738462,1.830769,11.676923,4.523077,2.692308,10.692308,2.307692,4.769231,12.830769,4.092308,1.969231,11.061538,6.169231,2.538462,13.738462,2.923077,5.0
18720,2010,WSH,72,42,30,0.583333,0.416667,6.652778,12.25,3.541667,1.75,10.222222,4.611111,2.222222,11.194444,3.166667,6.125,11.986111,3.291667,1.611111,11.097222,5.208333,1.777778,11.222222,3.027778,6.0
306,2010,LA,70,40,30,0.571429,0.428571,5.428571,12.228571,4.057143,1.871429,12.885714,5.385714,1.957143,11.314286,2.442857,6.242857,11.342857,3.9,1.771429,14.628571,5.057143,2.214286,10.8,2.657143,7.0
18,2010,PHI,72,41,31,0.569444,0.430556,8.138889,13.902778,3.972222,2.347222,12.75,5.694444,2.472222,13.777778,3.541667,7.402778,14.75,3.75,1.875,12.236111,6.263889,2.5,13.041667,3.569444,8.0
18288,2010,PIT,71,40,31,0.56338,0.43662,5.633803,11.408451,2.323944,1.830986,12.802817,4.732394,2.71831,11.774648,2.028169,5.985915,10.915493,2.535211,1.774648,11.901408,3.971831,2.676056,10.746479,2.253521,9.0
180,2010,NYR,73,41,32,0.561644,0.438356,6.287671,11.383562,2.082192,2.041096,13.273973,4.794521,2.383562,11.547945,3.041096,5.671233,12.219178,3.164384,1.739726,13.534247,4.739726,2.452055,11.561644,3.054795,10.0


In [421]:
dj.to_csv('/Users/stefanostselios/Brock University/Kevin Mongeon - StephanosShare/out/season_team_even_strength_events_prior_to_a_goal_ranking.csv', index=False, sep=',')
#dj.to_csv('/Users/kevinmongeon/Brock University/Steve Tselios - StephanosShare/out/season_team_even_strength_events_prior_to_a_goal_ranking.csv', index=False, sep=',')
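The `index` argument of `DataFrame.to_csv` expects a boolean. A non-empty string such as `'False'` is truthy in Python, so the index is still written and comes back as an extra `Unnamed: 0` column on re-reading (the same column dropped from pbp_merged.csv at the top of the notebook). A small round trip through an in-memory buffer, with hypothetical values, shows the boolean form:

```python
import io

import pandas as pd

# Hypothetical two-row extract of the ranking table.
df = pd.DataFrame({'Season': [2010, 2010], 'TeamCode': ['VAN', 'SJ'], 'WinPc': [0.657534, 0.630769]})

buf = io.StringIO()
df.to_csv(buf, index=False, sep=',')  # boolean False: no index column is written
buf.seek(0)

back = pd.read_csv(buf)
print(list(back.columns))
```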

- compute the per-game difference (for minus against) of each on-ice event for each team.

In [422]:
dj['DBlock'] = dj['Mean_Blocks_F'] - dj['Mean_Blocks_A']
dj['DFaceoff'] = dj['Mean_Faceoffs_F'] - dj['Mean_Faceoffs_A']
dj['DGiveaway'] = dj['Mean_Giveaways_F'] - dj['Mean_Giveaways_A']
dj['DGoal'] = dj['Mean_Goals_F'] - dj['Mean_Goals_A']
dj['DHit'] = dj['Mean_Hits_F'] - dj['Mean_Hits_A']
dj['DMiss'] = dj['Mean_Miss_F'] - dj['Mean_Miss_A']
dj['DPenalty'] = dj['Mean_Penalties_F'] - dj['Mean_Penalties_A']
dj['DShot'] = dj['Mean_Shots_F'] - dj['Mean_Shots_A']
dj['DTakeaway'] = dj['Mean_Takeaways_F'] - dj['Mean_Takeaways_A']
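The nine difference columns follow one pattern, so they can be built in a loop. A sketch, where `add_differences` and `STEMS` are hypothetical names introduced here, with the mapping taken from the column names above; it is applied to MTL's blocks and shots means from the earlier output:

```python
import pandas as pd

# Difference column -> stem of the Mean_<stem>_F / Mean_<stem>_A columns used above.
STEMS = {'DBlock': 'Blocks', 'DFaceoff': 'Faceoffs', 'DGiveaway': 'Giveaways',
         'DGoal': 'Goals', 'DHit': 'Hits', 'DMiss': 'Miss', 'DPenalty': 'Penalties',
         'DShot': 'Shots', 'DTakeaway': 'Takeaways'}


def add_differences(df, stems):
    """Add a for-minus-against column for every (difference, stem) pair."""
    for dcol, stem in stems.items():
        df[dcol] = df[f'Mean_{stem}_F'] - df[f'Mean_{stem}_A']
    return df


# MTL per-game means copied from the earlier output (blocks and shots only).
row = pd.DataFrame({'Mean_Blocks_F': [6.0], 'Mean_Blocks_A': [6.882353],
                    'Mean_Shots_F': [13.382353], 'Mean_Shots_A': [11.779412]})
row = add_differences(row, {k: STEMS[k] for k in ['DBlock', 'DShot']})
print(row[['DBlock', 'DShot']])
```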

## even strength on-ice events prior to a goal analysis

- summary statistics

In [423]:
dj.describe()

Unnamed: 0,Season,GP,W,L,WinPc,LossPc,Mean_Blocks_F,Mean_Faceoffs_F,Mean_Giveaways_F,Mean_Goals_F,Mean_Hits_F,Mean_Miss_F,Mean_Penalties_F,Mean_Shots_F,Mean_Takeaways_F,Mean_Blocks_A,Mean_Faceoffs_A,Mean_Giveaways_A,Mean_Goals_A,Mean_Hits_A,Mean_Miss_A,Mean_Penalties_A,Mean_Shots_A,Mean_Takeaways_A,Rank_W,DBlock,DFaceoff,DGiveaway,DGoal,DHit,DMiss,DPenalty,DShot,DTakeaway
count,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0
mean,2010.0,67.666667,33.833333,33.833333,0.498496,0.501504,6.434122,12.733994,3.704219,1.949343,11.800225,5.259087,2.28848,12.544971,3.329511,6.414447,12.7346,3.690448,1.961169,11.889143,5.257104,2.295284,12.49019,3.326184,15.5,0.019674,-0.000606,0.013771,-0.011826,-0.088919,0.001982,-0.006804,0.054781,0.003327
std,0.0,8.052985,7.991734,7.240134,0.09448,0.09448,0.843849,1.081281,0.906519,0.196917,1.605917,0.615625,0.365692,1.326821,0.766616,0.711246,1.195505,0.748857,0.244096,1.72869,0.706781,0.326422,1.161162,0.650816,8.802429,0.731297,1.277561,0.57244,0.360516,1.743549,0.674102,0.256874,1.469623,0.490756
min,2010.0,38.0,18.0,20.0,0.296875,0.342466,4.223881,10.313433,2.082192,1.537313,8.656716,3.38806,1.596154,10.19403,2.028169,4.41791,9.940299,2.104478,1.553846,7.791045,3.208955,1.597015,9.104478,2.253521,1.0,-1.514706,-3.588235,-1.082192,-0.605634,-4.211538,-1.646154,-0.644737,-3.046154,-1.184211
25%,2010.0,66.25,28.25,30.0,0.438263,0.432072,5.957746,12.119643,3.048722,1.825339,10.85894,4.800116,2.022059,11.422198,2.844814,6.107337,12.036218,3.181507,1.772233,10.949208,4.869815,2.129658,11.799348,2.919125,8.25,-0.428426,-0.586607,-0.337302,-0.278807,-1.265724,-0.581647,-0.163688,-0.867411,-0.246479
50%,2010.0,70.0,36.5,31.5,0.520833,0.479167,6.41453,12.806471,3.734615,1.89895,11.676697,5.33746,2.366922,12.686204,3.240868,6.543016,12.901174,3.763158,1.920433,11.592421,5.255482,2.416006,12.657308,3.204318,15.5,-0.008564,0.148182,0.023317,0.058472,0.171196,0.147523,-0.000433,0.507143,0.058368
75%,2010.0,72.0,40.0,37.5,0.567928,0.561737,6.801236,13.622723,4.193662,2.088462,12.789613,5.66,2.493056,13.425134,3.672846,6.86631,13.422308,4.104683,2.084834,13.368542,5.684856,2.511364,13.314648,3.609649,22.75,0.507675,0.636397,0.458696,0.250641,0.878183,0.470693,0.144796,1.074206,0.338039
max,2010.0,76.0,48.0,47.0,0.657534,0.703125,8.142857,15.102941,5.823529,2.347222,15.643836,6.588235,3.057143,15.397059,5.0,7.557143,14.75,5.323529,2.453125,15.653846,6.5,2.779412,14.0,4.588235,30.0,1.846154,3.041096,1.447761,0.697368,3.972603,1.205882,0.444444,2.394737,0.903846


#### $WinPc = \beta_{0} + \beta_{1}MeanBlocks_F + \beta_{2}MeanFaceoffs_F + \beta_{3}MeanGiveaways_F + \beta_{4}MeanGoals_F + \beta_{5}MeanHits_F + \beta_{6}MeanMiss_F + \beta_{7}MeanPenalties_F + \beta_{8}MeanShots_F + \beta_{9}MeanTakeaways_F + \beta_{10}MeanBlocks_A + \beta_{11}MeanFaceoffs_A + \beta_{12}MeanGiveaways_A + \beta_{13}MeanGoals_A + \beta_{14}MeanHits_A + \beta_{15}MeanMiss_A + \beta_{16}MeanPenalties_A + \beta_{17}MeanShots_A + \beta_{18}MeanTakeaways_A + e_{s}$

In [424]:
print ('win percent in even strength events prior to a goal')
y = dj['WinPc']
X = sm.add_constant(dj[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Goals_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Goals_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
result = sm.OLS(y, X).fit()
print (result.summary())

win percent in even strength events prior to a goal
                            OLS Regression Results                            
Dep. Variable:                  WinPc   R-squared:                       0.899
Model:                            OLS   Adj. R-squared:                  0.733
Method:                 Least Squares   F-statistic:                     5.434
Date:                Fri, 19 Jan 2018   Prob (F-statistic):            0.00329
Time:                        20:31:13   Log-Likelihood:                 63.096
No. Observations:                  30   AIC:                            -88.19
Df Residuals:                      11   BIC:                            -61.57
Df Model:                          18                                         
Covariance Type:            nonrobust                                         
                       coef    std err          t      P>|t|      [95.0% Conf. Int.]
--------------------------------------------------------------------------

In [425]:
print('win percent in even strength events prior to a goal')
y = dj['WinPc']
X = sm.add_constant(dj[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Goals_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Goals_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
result = sm.Logit(y, X).fit()
print(result.summary())

win percent in even strength events prior to a goal
Optimization terminated successfully.
         Current function value: 0.662584
         Iterations 5
                           Logit Regression Results                           
Dep. Variable:                  WinPc   No. Observations:                   30
Model:                          Logit   Df Residuals:                       11
Method:                           MLE   Df Model:                           18
Date:                Fri, 19 Jan 2018   Pseudo R-squ.:                 0.04408
Time:                        20:31:13   Log-Likelihood:                -19.878
converged:                       True   LL-Null:                       -20.794
                                        LLR p-value:                     1.000
                       coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------------
const                1.7404     10.101      0
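The Logit cells fit `WinPc`, which is a win percentage rather than a 0/1 outcome. With a fractional dependent variable, the Bernoulli log-likelihood that `Logit` maximizes can be read as a quasi-likelihood (a "fractional logit"). The sketch below computes that objective averaged over observations. `mean_neg_loglike` is a hypothetical helper, and the four data points are made up. As a consistency check on the output above, the reported Log-Likelihood of -19.878 over 30 observations gives 19.878 / 30 = 0.6626, which is in line with the printed "Current function value: 0.662584". The LL-Null of -20.794 equals 30 * ln(0.5), the value obtained when every team is assigned p = 0.5.

```python
import numpy as np

def mean_neg_loglike(beta, X, y):
    """Average negative Bernoulli log-likelihood; y may be a fraction in [0, 1]."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))  # logistic link
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

X = np.column_stack([np.ones(4), [0.0, 1.0, -1.0, 2.0]])  # constant + one made-up predictor
y = np.array([0.5, 0.6, 0.4, 0.7])                         # win percentages, not 0/1 outcomes
value = mean_neg_loglike(np.zeros(2), X, y)                # beta = 0 gives p = 0.5 everywhere
print(value)
```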

#### $WinPc = \beta_{0} + \beta_{1}DBlock + \beta_{2}DFaceoff + \beta_{3}DGiveaway + \beta_{4}DGoal + \beta_{5}DHit + \beta_{6}DMiss+ \beta_{7}DPenalty + \beta_{8}DShot + \beta_{9}DTakeaway + e_{s}$
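Suppose each differential is the team's mean "for" minus its mean "against" for that event, for example $DBlock = MeanBlocks_F - MeanBlocks_A$. This is an assumption, because the construction of `dj` is not shown in this section. Under it, the differential model is the earlier mean model with one restriction per event type:

```latex
\begin{aligned}
WinPc &= \beta_{0} + \sum_{k} \beta_{k} D_{k} + e_{s} \\
      &= \beta_{0} + \sum_{k} \beta_{k} Mean_{k,F} + \sum_{k} (-\beta_{k}) Mean_{k,A} + e_{s}
\end{aligned}
```

Each "against" coefficient is therefore forced to equal minus the matching "for" coefficient. This halves the number of slopes from 18 to 9, which is why `Df Residuals` rises from 11 to 20. It is also why R-squared can only stay the same or fall under the restriction (0.899 versus 0.801).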

In [426]:
print('win percent, using differentials, in even strength events prior to a goal')
y = dj['WinPc']
X = sm.add_constant(dj[['DBlock', 'DFaceoff', 'DGiveaway', 'DGoal', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
result = sm.OLS(y, X).fit()
print(result.summary())

win percent, using differentials, in even strength events prior to a goal
                            OLS Regression Results                            
Dep. Variable:                  WinPc   R-squared:                       0.801
Model:                            OLS   Adj. R-squared:                  0.711
Method:                 Least Squares   F-statistic:                     8.924
Date:                Fri, 19 Jan 2018   Prob (F-statistic):           2.70e-05
Time:                        20:31:13   Log-Likelihood:                 52.910
No. Observations:                  30   AIC:                            -85.82
Df Residuals:                      20   BIC:                            -71.81
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
----------------------------------------------------------------

In [427]:
print('win percent, using differentials, in even strength events prior to a goal')
y = dj['WinPc']
X = sm.add_constant(dj[['DBlock', 'DFaceoff', 'DGiveaway', 'DGoal', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
result = sm.Logit(y, X).fit()
print(result.summary())

win percent, using differentials, in even strength events prior to a goal
Optimization terminated successfully.
         Current function value: 0.665888
         Iterations 4
                           Logit Regression Results                           
Dep. Variable:                  WinPc   No. Observations:                   30
Model:                          Logit   Df Residuals:                       20
Method:                           MLE   Df Model:                            9
Date:                Fri, 19 Jan 2018   Pseudo R-squ.:                 0.03931
Time:                        20:31:13   Log-Likelihood:                -19.977
converged:                       True   LL-Null:                       -20.794
                                        LLR p-value:                    0.9960
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const         -0.0010      0.372     -0.0

#### mean goals for analysis

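- regress **mean goals for** on the means of the other on-ice events (predictor variables). Add a constant to the predictors and fit with **OLS**. The **Logit** versions are left commented out. Note that `sm.Logit` expects a dependent variable between 0 and 1, and a per-game goal average is not restricted to that range. The purpose is to determine the impact each on-ice event has on goals scored.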

#### $MeanGoals_F = \beta_{0} + \beta_{1}MeanBlocks_F + \beta_{2}MeanFaceoffs_F + \beta_{3}MeanGiveaways_F + \beta_{4}MeanHits_F + \beta_{5}MeanMiss_F + \beta_{6}MeanPenalties_F + \beta_{7}MeanShots_F + \beta_{8}MeanTakeaways_F + \beta_{9}MeanBlocks_A + \beta_{10}MeanFaceoffs_A + \beta_{11}MeanGiveaways_A + \beta_{12}MeanHits_A + \beta_{13}MeanMiss_A + \beta_{14}MeanPenalties_A + \beta_{15}MeanShots_A + \beta_{16}MeanTakeaways_A + e_{s}$

In [428]:
print('mean goals for in even strength events prior to a goal')
y = dj['Mean_Goals_F']
X = sm.add_constant(dj[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
result = sm.OLS(y, X).fit()
print(result.summary())

mean goals for in even strength events prior to a goal
                            OLS Regression Results                            
Dep. Variable:           Mean_Goals_F   R-squared:                       0.757
Model:                            OLS   Adj. R-squared:                  0.459
Method:                 Least Squares   F-statistic:                     2.537
Date:                Fri, 19 Jan 2018   Prob (F-statistic):             0.0485
Time:                        20:31:13   Log-Likelihood:                 27.937
No. Observations:                  30   AIC:                            -21.87
Df Residuals:                      13   BIC:                             1.946
Df Model:                          16                                         
Covariance Type:            nonrobust                                         
                       coef    std err          t      P>|t|      [95.0% Conf. Int.]
-----------------------------------------------------------------------

In [429]:
#y = dj['Mean_Goals_F']  
#X = sm.add_constant(dj[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
#result = sm.Logit(y, X).fit()
#result.summary()

#### $MeanGoals_F = \beta_{0} + \beta_{1}DBlock + \beta_{2}DFaceoff + \beta_{3}DGiveaway + \beta_{4}DHit + \beta_{5}DMiss+ \beta_{6}DPenalty + \beta_{7}DShot + \beta_{8}DTakeaway + e_{s}$

In [430]:
print('mean goals for, using differentials, in even strength events prior to a goal')
y = dj['Mean_Goals_F']
X = sm.add_constant(dj[['DBlock', 'DFaceoff', 'DGiveaway', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
result = sm.OLS(y, X).fit()
print(result.summary())

mean goals for, using differentials, in even strength events prior to a goal
                            OLS Regression Results                            
Dep. Variable:           Mean_Goals_F   R-squared:                       0.306
Model:                            OLS   Adj. R-squared:                  0.042
Method:                 Least Squares   F-statistic:                     1.158
Date:                Fri, 19 Jan 2018   Prob (F-statistic):              0.368
Time:                        20:31:13   Log-Likelihood:                 12.170
No. Observations:                  30   AIC:                            -6.341
Df Residuals:                      21   BIC:                             6.270
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
-------------------------------------------------------------

In [431]:
#y = dj['Mean_Goals_F']  
#X = sm.add_constant(dj[['DBlock', 'DFaceoff', 'DGiveaway', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
#result = sm.Logit(y, X).fit()
#result.summary()

#### mean goals against analysis

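- regress **mean goals against** on the means of the other on-ice events (predictor variables). Add a constant to the predictors and fit with **OLS**. The **Logit** versions are left commented out. Note that `sm.Logit` expects a dependent variable between 0 and 1, and a per-game goal average is not restricted to that range. The purpose is to determine the impact each on-ice event has on goals allowed.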

#### $MeanGoals_A = \beta_{0} + \beta_{1}MeanBlocks_F + \beta_{2}MeanFaceoffs_F + \beta_{3}MeanGiveaways_F + \beta_{4}MeanHits_F + \beta_{5}MeanMiss_F + \beta_{6}MeanPenalties_F + \beta_{7}MeanShots_F + \beta_{8}MeanTakeaways_F + \beta_{9}MeanBlocks_A + \beta_{10}MeanFaceoffs_A + \beta_{11}MeanGiveaways_A + \beta_{12}MeanHits_A + \beta_{13}MeanMiss_A + \beta_{14}MeanPenalties_A + \beta_{15}MeanShots_A + \beta_{16}MeanTakeaways_A + e_{s}$

In [432]:
print('mean goals against in even strength events prior to a goal')
y = dj['Mean_Goals_A']
X = sm.add_constant(dj[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
result = sm.OLS(y, X).fit()
print(result.summary())

mean goals against in even strength events prior to a goal
                            OLS Regression Results                            
Dep. Variable:           Mean_Goals_A   R-squared:                       0.811
Model:                            OLS   Adj. R-squared:                  0.578
Method:                 Least Squares   F-statistic:                     3.480
Date:                Fri, 19 Jan 2018   Prob (F-statistic):             0.0142
Time:                        20:31:13   Log-Likelihood:                 25.213
No. Observations:                  30   AIC:                            -16.43
Df Residuals:                      13   BIC:                             7.394
Df Model:                          16                                         
Covariance Type:            nonrobust                                         
                       coef    std err          t      P>|t|      [95.0% Conf. Int.]
-------------------------------------------------------------------

In [433]:
#y = dj['Mean_Goals_A']  
#X = sm.add_constant(dj[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
#result = sm.Logit(y, X).fit()
#result.summary()

#### $MeanGoals_A = \beta_{0} + \beta_{1}DBlock + \beta_{2}DFaceoff + \beta_{3}DGiveaway + \beta_{4}DHit + \beta_{5}DMiss+ \beta_{6}DPenalty + \beta_{7}DShot + \beta_{8}DTakeaway + e_{s}$

In [434]:
print('mean goals against, using differentials, in even strength events prior to a goal')
y = dj['Mean_Goals_A']
X = sm.add_constant(dj[['DBlock', 'DFaceoff', 'DGiveaway', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
result = sm.OLS(y, X).fit()
print(result.summary())

mean goals against, using differentials, in even strength events prior to a goal
                            OLS Regression Results                            
Dep. Variable:           Mean_Goals_A   R-squared:                       0.248
Model:                            OLS   Adj. R-squared:                 -0.039
Method:                 Least Squares   F-statistic:                    0.8640
Date:                Fri, 19 Jan 2018   Prob (F-statistic):              0.561
Time:                        20:31:13   Log-Likelihood:                 4.5143
No. Observations:                  30   AIC:                             8.971
Df Residuals:                      21   BIC:                             21.58
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
---------------------------------------------------------

In [435]:
#y = dj['Mean_Goals_A']  
#X = sm.add_constant(dj[['DBlock', 'DFaceoff', 'DGiveaway', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
#result = sm.Logit(y, X).fit()
#result.summary()

## 5) all on-ice events when score differential is between -1 and 1

- number the home goals and visitor goals within each game (by season and game number). The aim is to keep only on-ice events that happened prior to a goal while the score differential was between -1 and 1, and to exclude all other events.
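Here is a toy version of the goal-numbering step, using a single game with made-up goal events. Filtering to the home (or visiting) team's goals and taking `cumcount() + 1` within each game gives a running goal count for that team. The `.copy()` calls make the filtered frames independent, which avoids pandas' `SettingWithCopyWarning`.

```python
import pandas as pd

# hypothetical goal log for one game
goals = pd.DataFrame({
    'Season': [2010] * 4,
    'GameNumber': [20001] * 4,
    'EventNumber': [10, 25, 40, 60],
    'EventTeamCode': ['TOR', 'MTL', 'TOR', 'TOR'],
    'HTeamCode': ['TOR'] * 4,
    'VTeamCode': ['MTL'] * 4,
})

home = goals[goals['EventTeamCode'] == goals['HTeamCode']].copy()
home['HGoalNumber'] = home.groupby(['Season', 'GameNumber']).cumcount() + 1  # 1, 2, 3, ...
away = goals[goals['EventTeamCode'] == goals['VTeamCode']].copy()
away['VGoalNumber'] = away.groupby(['Season', 'GameNumber']).cumcount() + 1
print(home[['EventNumber', 'HGoalNumber']])
print(away[['EventNumber', 'VGoalNumber']])
```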

In [436]:
# .copy() makes each filtered frame independent, which avoids the SettingWithCopyWarning
dz = dg[dg['EventTeamCode'] == dg['HTeamCode']].copy()
dz['HGoalNumber'] = dz.groupby(['Season', 'GameNumber']).cumcount()+1
dy = dg[dg['EventTeamCode'] == dg['VTeamCode']].copy()
dy['VGoalNumber'] = dy.groupby(['Season', 'GameNumber']).cumcount()+1

- merge visitor goal number dataframe (dy) and home goal number dataframe (dz) onto goal dataframe (dg). 

In [437]:
dg = pd.merge(dg, dy, on=['Season', 'GameNumber', 'EventNumber', 'AdvantageType', 'Period', 'EventType', 'EventTimeFromZero', 'EventDetail', 'EventTeamCode', 'VTeamCode', 'HTeamCode', 'GoalNumber'], how='left')
dg = pd.merge(dg, dz, on=['Season', 'GameNumber', 'EventNumber', 'AdvantageType', 'Period', 'EventType', 'EventTimeFromZero', 'EventDetail', 'EventTeamCode', 'VTeamCode', 'HTeamCode', 'GoalNumber'], how='left')

- forward fill the home goal number and visitor goal number within each season and game number. Then fill the remaining 'NaN' values (goals scored before a team's first goal) with zero.
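The fill step can be checked on a toy frame with hypothetical values, where each goal row carries only the scoring team's counter. A within-game forward fill propagates each team's latest count, and `fillna(0)` covers goals scored before that team's first goal. The home-minus-visitor difference is then the score after each goal.

```python
import numpy as np
import pandas as pd

# hypothetical goal rows after the merges: only the scoring team's counter is set
dg = pd.DataFrame({
    'Season': [2010] * 4,
    'GameNumber': [20001] * 4,
    'HGoalNumber': [1.0, np.nan, 2.0, 3.0],
    'VGoalNumber': [np.nan, 1.0, np.nan, np.nan],
})
keys = ['Season', 'GameNumber']
for col in ['HGoalNumber', 'VGoalNumber']:
    # carry the latest count forward within the game, then 0 before the first goal
    dg[col] = dg.groupby(keys)[col].ffill().fillna(0)
dg['GD'] = dg['HGoalNumber'] - dg['VGoalNumber']
print(dg)
```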

In [438]:
# groupby().ffill() returns a Series aligned with dg's index, so it assigns back row-for-row
dg['HGoalNumber'] = dg.groupby(['Season', 'GameNumber'])['HGoalNumber'].ffill()
dg['VGoalNumber'] = dg.groupby(['Season', 'GameNumber'])['VGoalNumber'].ffill()
dg['VGoalNumber'] = dg['VGoalNumber'].fillna(0)
dg['HGoalNumber'] = dg['HGoalNumber'].fillna(0)

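- keep on-ice events that happened prior to a goal while the score differential is between -1 and 1. The goal counts are backfilled from the next goal in the same period, so `GD` is the home-minus-visitor differential immediately after that goal.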
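Below is a small sketch of the period-wise backfill and the score filter, run on a made-up event stream. Every event inherits the goal counts recorded at the next goal in the same period. Events after the last goal of a period keep a missing `GD`, and because comparisons with NaN evaluate to False, the filter drops them as well.

```python
import numpy as np
import pandas as pd

# hypothetical events: goal rows carry the score after the goal, other rows are NaN
dk = pd.DataFrame({
    'Season': [2010] * 6,
    'GameNumber': [20001] * 6,
    'Period': [1, 1, 1, 2, 2, 2],
    'EventType': ['HIT', 'GOAL', 'SHOT', 'FAC', 'GOAL', 'HIT'],
    'HGoalNumber': [np.nan, 1.0, np.nan, np.nan, 2.0, np.nan],
    'VGoalNumber': [np.nan, 0.0, np.nan, np.nan, 0.0, np.nan],
})
keys = ['Season', 'GameNumber', 'Period']
for col in ['HGoalNumber', 'VGoalNumber']:
    # each event takes the counts recorded at the next goal in the same period
    dk[col] = dk.groupby(keys)[col].bfill()
dk['GD'] = dk['HGoalNumber'] - dk['VGoalNumber']
# NaN GD (no later goal in the period) fails both comparisons
kept = dk[(dk['GD'] >= -1) & (dk['GD'] <= 1)]
print(kept)
```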

In [439]:
dk = da[['Season', 'GameNumber', 'EventNumber', 'AdvantageType', 'Period', 'EventType', 'EventTimeFromZero', 'EventDetail', 'VTeamCode', 'HTeamCode', 'EventTeamCode']]
dk = pd.merge(dk, dg, on=['Season', 'GameNumber', 'EventNumber', 'AdvantageType', 'Period', 'EventType', 'EventTimeFromZero', 'EventDetail', 'EventTeamCode', 'VTeamCode', 'HTeamCode'], how='left')
# groupby().bfill() keeps dk's original index, so each result assigns back row-for-row
dk['AdvantageType'] = dk.groupby(['Season', 'GameNumber'])['AdvantageType'].bfill()
dk['GoalNumber'] = dk.groupby(['Season', 'GameNumber', 'Period'])['GoalNumber'].bfill()
dk['HGoalNumber'] = dk.groupby(['Season', 'GameNumber', 'Period'])['HGoalNumber'].bfill()
dk['VGoalNumber'] = dk.groupby(['Season', 'GameNumber', 'Period'])['VGoalNumber'].bfill()
dk['GD'] = dk['HGoalNumber'] - dk['VGoalNumber']
dk = dk[(dk['GD'] >= -1) & (dk['GD'] <= 1)]
dk.head()

Unnamed: 0,Season,GameNumber,EventNumber,AdvantageType,Period,EventType,EventTimeFromZero,EventDetail,VTeamCode,HTeamCode,EventTeamCode,GoalNumber,VGoalNumber,HGoalNumber,GD
0,2010,20001,1,EV,1,FAC,0,MTL won Neu. Zone - MTL #11 GOMEZ vs TOR #37 B...,MTL,TOR,MTL,1.0,0.0,1.0,1.0
1,2010,20001,3,EV,1,HIT,15,"TOR #37 BRENT HIT MTL #26 GORGES, Off. Zone",MTL,TOR,TOR,1.0,0.0,1.0,1.0
2,2010,20001,4,EV,1,HIT,46,"MTL #14 PLEKANEC HIT TOR #2 SCHENN, Off. Zone",MTL,TOR,MTL,1.0,0.0,1.0,1.0
3,2010,20001,5,EV,1,HIT,57,"MTL #76 SUBBAN HIT TOR #15 KABERLE, Neu. Zone",MTL,TOR,MTL,1.0,0.0,1.0,1.0
4,2010,20001,6,EV,1,GIVE,69,"TOR&nbsp;GIVEAWAY - #35 GIGUERE, Def. Zone",MTL,TOR,TOR,1.0,0.0,1.0,1.0


In [440]:
dk.shape

(111469, 15)

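- Drop on-ice events that occurred in a different period from a goal or after the last goal of a period. These rows have no `GoalNumber` after the period-wise backfill. The shape is unchanged because the same rows also have a missing `GD` and were already removed by the filter above.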

In [441]:
dk = dk.dropna(subset=['GoalNumber'])
dk = dk.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])
dk = dk.drop_duplicates(['Season', 'GameNumber', 'EventNumber', 'EventTeamCode'])
dk.head()

Unnamed: 0,Season,GameNumber,EventNumber,AdvantageType,Period,EventType,EventTimeFromZero,EventDetail,VTeamCode,HTeamCode,EventTeamCode,GoalNumber,VGoalNumber,HGoalNumber,GD
0,2010,20001,1,EV,1,FAC,0,MTL won Neu. Zone - MTL #11 GOMEZ vs TOR #37 B...,MTL,TOR,MTL,1.0,0.0,1.0,1.0
1,2010,20001,3,EV,1,HIT,15,"TOR #37 BRENT HIT MTL #26 GORGES, Off. Zone",MTL,TOR,TOR,1.0,0.0,1.0,1.0
2,2010,20001,4,EV,1,HIT,46,"MTL #14 PLEKANEC HIT TOR #2 SCHENN, Off. Zone",MTL,TOR,MTL,1.0,0.0,1.0,1.0
3,2010,20001,5,EV,1,HIT,57,"MTL #76 SUBBAN HIT TOR #15 KABERLE, Neu. Zone",MTL,TOR,MTL,1.0,0.0,1.0,1.0
4,2010,20001,6,EV,1,GIVE,69,"TOR&nbsp;GIVEAWAY - #35 GIGUERE, Def. Zone",MTL,TOR,TOR,1.0,0.0,1.0,1.0


In [442]:
dk.shape

(111469, 15)

- Flag each on-ice event type. `Goal` is 1 if the event is a goal and NaN otherwise, and `Block`, `Faceoff`, `Giveaway`, `Hit`, `Miss`, `Penalty`, `Shot` and `Takeaway` follow the same rule. Then group by season, game number, event team code and event type to count each on-ice event per team per game.
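The flag-and-count step depends on how pandas sums a group with no matching events. In current pandas, `transform('sum')` returns 0 for an all-NaN group, while passing `min_count=1` returns NaN. The NaN entries in this notebook's later outputs correspond to the NaN behaviour. Here is a toy comparison on made-up events:

```python
import numpy as np
import pandas as pd

dk = pd.DataFrame({
    'GameNumber': [20001] * 4,
    'EventTeamCode': ['MTL', 'MTL', 'TOR', 'MTL'],
    'EventType': ['HIT', 'HIT', 'HIT', 'FAC'],
})
dk['Hit'] = np.where(dk['EventType'] == 'HIT', 1, np.nan)      # 1 for a hit, NaN otherwise
keys = ['GameNumber', 'EventTeamCode', 'EventType']
default = dk.groupby(keys)['Hit'].transform('sum')               # all-NaN group -> 0.0
strict = dk.groupby(keys)['Hit'].transform('sum', min_count=1)   # all-NaN group -> NaN
print(pd.DataFrame({'default': default, 'min_count=1': strict}))
```

NaN matters here because the later forward/backward fill only overwrites missing values. A stray 0 on a non-matching row would survive the fill and could be the row kept by `drop_duplicates`.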

In [443]:
# 1 if the event is of the given type, NaN otherwise (vectorized instead of a row-wise apply)
event_flags = {'Goal': 'GOAL', 'Block': 'BLOCK', 'Faceoff': 'FAC', 'Giveaway': 'GIVE', 'Hit': 'HIT',
               'Miss': 'MISS', 'Penalty': 'PENL', 'Shot': 'SHOT', 'Takeaway': 'TAKE'}
for col, event_type in event_flags.items():
    dk[col] = np.where(dk['EventType'] == event_type, 1, np.nan)

In [444]:
# count each event type per season, game, event team and event type;
# min_count=1 keeps groups with no matching event as NaN (current pandas would otherwise return 0)
group_keys = ['Season', 'GameNumber', 'EventTeamCode', 'EventType']
event_totals = {'Blocks': 'Block', 'Faceoffs': 'Faceoff', 'Giveaways': 'Giveaway', 'Goals': 'Goal', 'Hits': 'Hit',
                'Misses': 'Miss', 'Penalties': 'Penalty', 'Shots': 'Shot', 'Takeaways': 'Takeaway'}
for total, flag in event_totals.items():
    dk[total] = dk.groupby(group_keys)[flag].transform('sum', min_count=1)

- reshape data wide to long.
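The notebook reshapes with `pd.lreshape`, which stacks the visitor and home team-code columns into a single `TeamCode` column so that there is one row per event per team. The sketch below does a comparable reshape with the more widely documented `DataFrame.melt`, using two made-up events. One difference is that `lreshape` drops rows where the new column is missing by default, while `melt` keeps them.

```python
import pandas as pd

events = pd.DataFrame({
    'EventNumber': [1, 3],
    'EventTeam': ['MTL', 'TOR'],
    'VTeamCode': ['MTL', 'MTL'],
    'HTeamCode': ['TOR', 'TOR'],
})
# one row per (event, team on the ice): visitor copies first, then home copies
long = events.melt(id_vars=['EventNumber', 'EventTeam'],
                   value_vars=['VTeamCode', 'HTeamCode'],
                   value_name='TeamCode').drop(columns='variable')
print(long.sort_values(['EventNumber', 'TeamCode']))
```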

In [445]:
dk = dk.rename(columns={'EventTeamCode': 'EventTeam'})
a = [col for col in dk.columns if 'TeamCode' in col]
dk = pd.lreshape(dk, {'TeamCode' : a})
dk = dk.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])
dk = dk.rename(columns={'EventTeam': 'EventTeamCode'})
dk.head()

Unnamed: 0,AdvantageType,Block,Blocks,EventDetail,EventNumber,EventTeamCode,EventTimeFromZero,EventType,Faceoff,Faceoffs,GD,GameNumber,Giveaway,Giveaways,Goal,GoalNumber,Goals,HGoalNumber,Hit,Hits,Miss,Misses,Penalties,Penalty,Period,Season,Shot,Shots,Takeaway,Takeaways,VGoalNumber,TeamCode
0,EV,,,MTL won Neu. Zone - MTL #11 GOMEZ vs TOR #37 B...,1,MTL,0,FAC,1.0,3.0,1.0,20001,,,,1.0,,1.0,,,,,,,1,2010,,,,,0.0,MTL
111469,EV,,,MTL won Neu. Zone - MTL #11 GOMEZ vs TOR #37 B...,1,MTL,0,FAC,1.0,3.0,1.0,20001,,,,1.0,,1.0,,,,,,,1,2010,,,,,0.0,TOR
1,EV,,,"TOR #37 BRENT HIT MTL #26 GORGES, Off. Zone",3,TOR,15,HIT,,,1.0,20001,,,,1.0,,1.0,1.0,8.0,,,,,1,2010,,,,,0.0,MTL
111470,EV,,,"TOR #37 BRENT HIT MTL #26 GORGES, Off. Zone",3,TOR,15,HIT,,,1.0,20001,,,,1.0,,1.0,1.0,8.0,,,,,1,2010,,,,,0.0,TOR
2,EV,,,"MTL #14 PLEKANEC HIT TOR #2 SCHENN, Off. Zone",4,MTL,46,HIT,,,1.0,20001,,,,1.0,,1.0,1.0,11.0,,,,,1,2010,,,,,0.0,MTL


- drop duplicates by season, game number, team code, event team code and event type.

In [446]:
dk = dk.drop_duplicates(['Season', 'GameNumber', 'TeamCode', 'EventTeamCode', 'EventType'])
dk = dk[['Season', 'GameNumber', 'TeamCode', 'EventNumber', 'EventType', 'EventTeamCode', 'Blocks', 'Faceoffs', 'Giveaways', 'Goals', 'Hits', 'Misses', 'Penalties', 'Shots', 'Takeaways']]
dk = dk.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])
dk.shape

(38122, 15)

- assign all on-ice events to their respective teams. If the team code equals the event team code, the event is credited to that team as For (F). Otherwise it is recorded for that team as Against (A). Each on-ice event therefore generates two variables per team: For (F) and Against (A).
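For a single event count, the for/against split is a row-wise mask. On the row of the team that recorded the event, the count becomes `_F`, and on the opponent's row it becomes `_A`. Here is a vectorized sketch using `Series.where` on made-up hit totals:

```python
import pandas as pd

dk = pd.DataFrame({
    'TeamCode': ['MTL', 'TOR'],
    'EventTeamCode': ['MTL', 'MTL'],   # MTL recorded the hits
    'Hits': [11.0, 11.0],
})
same_team = dk['TeamCode'] == dk['EventTeamCode']
dk['Hits_F'] = dk['Hits'].where(same_team)    # kept where True, NaN elsewhere
dk['Hits_A'] = dk['Hits'].where(~same_team)   # the opponent's row gets it as "against"
print(dk)
```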

In [447]:
# credit each count to the acting team (_F) and to its opponent (_A); NaN elsewhere
same_team = dk['TeamCode'] == dk['EventTeamCode']
count_cols = {'Blocks': 'Blocks', 'Faceoffs': 'Faceoffs', 'Giveaways': 'Giveaways', 'Goals': 'Goals', 'Hits': 'Hits',
              'Misses': 'Miss', 'Penalties': 'Penalties', 'Shots': 'Shots', 'Takeaways': 'Takeaways'}
for total, prefix in count_cols.items():
    dk[prefix + '_F'] = dk[total].where(same_team)
    dk[prefix + '_A'] = dk[total].where(~same_team)
dk = dk.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])

- forward and then backward fill the on-ice event counts within each season, game number and team code.
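Filling within each game and team copies the one non-missing count to every row of that team, so whichever row `drop_duplicates` keeps later carries the total. Here is a toy sketch with made-up values. It uses `transform` so the result stays aligned with the original rows:

```python
import numpy as np
import pandas as pd

dk = pd.DataFrame({
    'GameNumber': [20001] * 4,
    'TeamCode': ['MTL', 'MTL', 'MTL', 'TOR'],
    'Hits_F': [np.nan, 11.0, np.nan, np.nan],
})
# forward fill, then backward fill, separately within each (game, team) group
dk['Hits_F'] = dk.groupby(['GameNumber', 'TeamCode'])['Hits_F'].transform(lambda s: s.ffill().bfill())
print(dk)
```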

In [448]:
# transform keeps dk's original index, so each filled column aligns row-for-row
fill_keys = ['Season', 'GameNumber', 'TeamCode']
fill_cols = ['Blocks_F', 'Faceoffs_F', 'Giveaways_F', 'Goals_F', 'Hits_F', 'Miss_F', 'Penalties_F', 'Shots_F', 'Takeaways_F',
             'Blocks_A', 'Faceoffs_A', 'Giveaways_A', 'Goals_A', 'Hits_A', 'Miss_A', 'Penalties_A', 'Shots_A', 'Takeaways_A']
for col in fill_cols:
    dk[col] = dk.groupby(fill_keys)[col].transform(lambda x: x.ffill().bfill())
dk = dk.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])

- keep only the relevant columns and drop duplicates by season, game number and team code, leaving two observations per game (one per team).

In [449]:
dk = dk[['Season', 'GameNumber', 'TeamCode', 'Blocks_F', 'Blocks_A', 'Faceoffs_F', 'Faceoffs_A', 'Giveaways_F', 'Giveaways_A', 'Goals_F', 'Goals_A', 'Hits_F', 'Hits_A', 'Miss_F', 'Miss_A', 'Penalties_F', 'Penalties_A', 'Shots_F', 'Shots_A', 'Takeaways_F', 'Takeaways_A']]
dk = dk.sort_values(['Season', 'GameNumber'], ascending=[True, True])
dk = dk.drop_duplicates(['Season', 'GameNumber', 'TeamCode'])
dk.head()

Unnamed: 0,Season,GameNumber,TeamCode,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A
0,2010,20001,MTL,4.0,5.0,3.0,3.0,4.0,4.0,2.0,1.0,11.0,8.0,2.0,,,,6.0,5.0,1.0,
111469,2010,20001,TOR,5.0,4.0,3.0,3.0,4.0,4.0,1.0,2.0,8.0,11.0,,2.0,,,5.0,6.0,,1.0
59,2010,20002,PHI,,1.0,2.0,6.0,,1.0,1.0,2.0,1.0,5.0,1.0,,,1.0,2.0,1.0,,
111528,2010,20002,PIT,1.0,,6.0,2.0,1.0,,2.0,1.0,5.0,1.0,,1.0,1.0,,1.0,2.0,,
83,2010,20003,CAR,14.0,16.0,23.0,44.0,11.0,11.0,3.0,3.0,10.0,16.0,7.0,7.0,5.0,4.0,17.0,22.0,3.0,7.0


In [450]:
dk.shape

(2444, 21)

- **merge all on-ice events (dk) onto the team roster player rank dataframe (dc) to create a new dataframe (dl).**

In [451]:
dl = pd.merge(dc, dk, on=['Season', 'GameNumber', 'TeamCode'], how='left')
dl = dl.drop_duplicates(['Season', 'GameNumber', 'TeamCode'])
dl = dl.sort_values(['Season', 'GameNumber'], ascending=[True, True])
dl.head()

Unnamed: 0,GD,GameNumber,LossTeam,Season,WinTeam,PlayerNumber,PlayerPosition,TeamCode,GF,GA,Rank,RosterCount,Position,PositionCount,FCount,DCount,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A
0,1,20001,MTL,2010,TOR,11.0,C,MTL,2,3,2,18.0,F,12.0,12.0,6.0,4.0,5.0,3.0,3.0,4.0,4.0,2.0,1.0,11.0,8.0,2.0,,,,6.0,5.0,1.0,
18270,1,20001,MTL,2010,TOR,37.0,C,TOR,3,2,2,18.0,F,12.0,12.0,6.0,5.0,4.0,3.0,3.0,4.0,4.0,1.0,2.0,8.0,11.0,,2.0,,,5.0,6.0,,1.0
18,-1,20002,PIT,2010,PHI,17.0,C,PHI,3,2,1,18.0,F,12.0,12.0,6.0,,1.0,2.0,6.0,,1.0,1.0,2.0,1.0,5.0,1.0,,,1.0,2.0,1.0,,
18288,-1,20002,PIT,2010,PHI,71.0,C,PIT,2,3,1,18.0,F,12.0,12.0,6.0,1.0,,6.0,2.0,1.0,,2.0,1.0,5.0,1.0,,1.0,1.0,,1.0,2.0,,
36,-1,20003,MIN,2010,CAR,53.0,C,CAR,4,3,1,18.0,F,12.0,12.0,6.0,14.0,16.0,23.0,44.0,11.0,11.0,3.0,3.0,10.0,16.0,7.0,7.0,5.0,4.0,17.0,22.0,3.0,7.0


In [452]:
dl.shape

(2030, 34)

- create a column for team win and team loss.

In [453]:
# 1 if the row's team won (TeamWin) or did not win (TeamLos) the game, 0 otherwise
dl['TeamWin'] = (dl['TeamCode'] == dl['WinTeam']).astype(int)
dl['TeamLos'] = (dl['TeamCode'] != dl['WinTeam']).astype(int)

- compute games played, games won, games lost, and season totals of all on-ice events for and against, by team.
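The `GW`/`GL` bookkeeping can also be expressed by grouping on the row's own team. A team's wins are the rows where it is the `WinTeam`, and its losses are its remaining games played. The sketch below applies this to made-up results. Like the notebook's `L = GP - GW` logic, it assumes that every game played ends in a win or a loss.

```python
import pandas as pd

# one row per team per game (hypothetical results)
games = pd.DataFrame({
    'Season': [2010] * 4,
    'GameNumber': [20001, 20001, 20002, 20002],
    'TeamCode': ['MTL', 'TOR', 'MTL', 'PIT'],
    'WinTeam': ['TOR', 'TOR', 'MTL', 'MTL'],
})
games['TeamWin'] = (games['TeamCode'] == games['WinTeam']).astype(int)
by_team = games.groupby(['Season', 'TeamCode'])
games['GP'] = by_team['GameNumber'].transform('count')
games['W'] = by_team['TeamWin'].transform('sum')
games['L'] = games['GP'] - games['W']
print(games)
```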

In [454]:
dl['GP'] = dl.groupby(['Season','TeamCode'])['GameNumber'].transform('count')
dl['GW'] = dl.groupby(['Season','WinTeam'])['TeamWin'].transform('sum')
dl['GL'] = dl.groupby(['Season','LossTeam'])['TeamLos'].transform('sum')
dl['GF'] = dl.groupby(['Season','TeamCode'])['GF'].transform('sum')
dl['GA'] = dl.groupby(['Season','TeamCode'])['GA'].transform('sum')
dl['Blocks_F'] = dl.groupby(['Season','TeamCode'])['Blocks_F'].transform('sum')
dl['Faceoffs_F'] = dl.groupby(['Season','TeamCode'])['Faceoffs_F'].transform('sum')
dl['Giveaways_F'] = dl.groupby(['Season','TeamCode'])['Giveaways_F'].transform('sum')
dl['Goals_F'] = dl.groupby(['Season','TeamCode'])['Goals_F'].transform('sum')
dl['Hits_F'] = dl.groupby(['Season','TeamCode'])['Hits_F'].transform('sum')
dl['Miss_F'] = dl.groupby(['Season','TeamCode'])['Miss_F'].transform('sum')
dl['Penalties_F'] = dl.groupby(['Season','TeamCode'])['Penalties_F'].transform('sum')
dl['Shots_F'] = dl.groupby(['Season','TeamCode'])['Shots_F'].transform('sum')
dl['Takeaways_F'] = dl.groupby(['Season','TeamCode'])['Takeaways_F'].transform('sum')
dl['Blocks_A'] = dl.groupby(['Season','TeamCode'])['Blocks_A'].transform('sum') 
dl['Faceoffs_A'] = dl.groupby(['Season','TeamCode'])['Faceoffs_A'].transform('sum')
dl['Giveaways_A'] = dl.groupby(['Season','TeamCode'])['Giveaways_A'].transform('sum')
dl['Goals_A'] = dl.groupby(['Season','TeamCode'])['Goals_A'].transform('sum')
dl['Hits_A'] = dl.groupby(['Season','TeamCode'])['Hits_A'].transform('sum')
dl['Miss_A'] = dl.groupby(['Season','TeamCode'])['Miss_A'].transform('sum')
dl['Penalties_A'] = dl.groupby(['Season','TeamCode'])['Penalties_A'].transform('sum')
dl['Shots_A'] = dl.groupby(['Season','TeamCode'])['Shots_A'].transform('sum')
dl['Takeaways_A'] = dl.groupby(['Season','TeamCode'])['Takeaways_A'].transform('sum')
dl.head()

Unnamed: 0,GD,GameNumber,LossTeam,Season,WinTeam,PlayerNumber,PlayerPosition,TeamCode,GF,GA,Rank,RosterCount,Position,PositionCount,FCount,DCount,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A,TeamWin,TeamLos,GP,GW,GL
0,1,20001,MTL,2010,TOR,11.0,C,MTL,188,181,2,18.0,F,12.0,12.0,6.0,293.0,337.0,673.0,623.0,219.0,171.0,102.0,96.0,483.0,539.0,293.0,235.0,107.0,102.0,713.0,567.0,153.0,131.0,0,1,68,34,31
18270,1,20001,MTL,2010,TOR,37.0,C,TOR,195,218,2,18.0,F,12.0,12.0,6.0,425.0,413.0,763.0,692.0,305.0,270.0,116.0,120.0,677.0,651.0,315.0,368.0,114.0,119.0,607.0,741.0,184.0,189.0,1,0,70,34,31
18,-1,20002,PIT,2010,PHI,17.0,C,PHI,235,207,1,18.0,F,12.0,12.0,6.0,501.0,445.0,813.0,882.0,244.0,220.0,122.0,135.0,705.0,660.0,318.0,363.0,137.0,126.0,815.0,775.0,211.0,216.0,1,0,72,41,31
18288,-1,20002,PIT,2010,PHI,71.0,C,PIT,204,180,1,18.0,F,12.0,12.0,6.0,303.0,325.0,626.0,653.0,148.0,149.0,104.0,105.0,667.0,615.0,250.0,223.0,119.0,106.0,632.0,615.0,115.0,123.0,0,1,71,41,31
36,-1,20003,MIN,2010,CAR,53.0,C,CAR,222,218,1,18.0,F,12.0,12.0,6.0,390.0,440.0,740.0,906.0,201.0,245.0,128.0,124.0,763.0,587.0,329.0,397.0,96.0,120.0,749.0,841.0,232.0,216.0,1,0,76,38,35


- compute each team's wins (W) and losses (L) for the season.

In [455]:
dl['L'] = dl.apply(lambda x: x['GL'] if x['TeamCode'] == x['LossTeam'] else (x['GP'] - x['GW']), axis=1)
dl['W'] = dl.apply(lambda x: x['GW'] if x['TeamCode'] == x['WinTeam'] else (x['GP'] - x['GL']), axis=1)
dl.head()

Unnamed: 0,GD,GameNumber,LossTeam,Season,WinTeam,PlayerNumber,PlayerPosition,TeamCode,GF,GA,Rank,RosterCount,Position,PositionCount,FCount,DCount,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A,TeamWin,TeamLos,GP,GW,GL,L,W
0,1,20001,MTL,2010,TOR,11.0,C,MTL,188,181,2,18.0,F,12.0,12.0,6.0,293.0,337.0,673.0,623.0,219.0,171.0,102.0,96.0,483.0,539.0,293.0,235.0,107.0,102.0,713.0,567.0,153.0,131.0,0,1,68,34,31,31,37
18270,1,20001,MTL,2010,TOR,37.0,C,TOR,195,218,2,18.0,F,12.0,12.0,6.0,425.0,413.0,763.0,692.0,305.0,270.0,116.0,120.0,677.0,651.0,315.0,368.0,114.0,119.0,607.0,741.0,184.0,189.0,1,0,70,34,31,36,34
18,-1,20002,PIT,2010,PHI,17.0,C,PHI,235,207,1,18.0,F,12.0,12.0,6.0,501.0,445.0,813.0,882.0,244.0,220.0,122.0,135.0,705.0,660.0,318.0,363.0,137.0,126.0,815.0,775.0,211.0,216.0,1,0,72,41,31,31,41
18288,-1,20002,PIT,2010,PHI,71.0,C,PIT,204,180,1,18.0,F,12.0,12.0,6.0,303.0,325.0,626.0,653.0,148.0,149.0,104.0,105.0,667.0,615.0,250.0,223.0,119.0,106.0,632.0,615.0,115.0,123.0,0,1,71,41,31,31,40
36,-1,20003,MIN,2010,CAR,53.0,C,CAR,222,218,1,18.0,F,12.0,12.0,6.0,390.0,440.0,740.0,906.0,201.0,245.0,128.0,124.0,763.0,587.0,329.0,397.0,96.0,120.0,749.0,841.0,232.0,216.0,1,0,76,38,35,38,38
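The row-wise apply above can be replaced by a vectorized np.where, which compares whole columns at once and is much faster on a frame this size. The sketch below reuses the MTL and TOR values from the output above. Reading GW as the winning team's season wins and GL as the losing team's season losses is an inference from that output, not something the notebook states.

```python
import numpy as np
import pandas as pd

# Two rows copied from the dl.head() output above (game 20001, MTL at TOR).
toy = pd.DataFrame({
    'TeamCode': ['MTL', 'TOR'],
    'WinTeam':  ['TOR', 'TOR'],
    'LossTeam': ['MTL', 'MTL'],
    'GP': [68, 70],
    'GW': [34, 34],   # assumed: season wins of the game's winning team
    'GL': [31, 31],   # assumed: season losses of the game's losing team
})

# Same rule as the apply: take GL/GW directly when the team is the loser/winner,
# otherwise derive the figure from games played.
toy['L'] = np.where(toy['TeamCode'] == toy['LossTeam'], toy['GL'], toy['GP'] - toy['GW'])
toy['W'] = np.where(toy['TeamCode'] == toy['WinTeam'], toy['GW'], toy['GP'] - toy['GL'])
print(toy[['TeamCode', 'GP', 'W', 'L']])
```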


- keep one row per team and season, then divide wins and losses by games played (GP) to get each team's winning and losing percentage. Divide every on-ice event total by GP to get the mean per game of each on-ice event for and against each team over the season.

In [456]:
dl = dl.drop_duplicates(['Season', 'TeamCode'])
dl['WinPc'] = dl['W'] / dl['GP']
dl['LossPc'] = dl['L'] / dl['GP']
# per-game mean of every For (_F) and Against (_A) season total, e.g. Mean_Blocks_F = Blocks_F / GP
events = ['Blocks', 'Faceoffs', 'Giveaways', 'Goals', 'Hits', 'Miss', 'Penalties', 'Shots', 'Takeaways']
for side in ['F', 'A']:
    for event in events:
        dl['Mean_' + event + '_' + side] = dl[event + '_' + side] / dl['GP']
dl.head()

Unnamed: 0,GD,GameNumber,LossTeam,Season,WinTeam,PlayerNumber,PlayerPosition,TeamCode,GF,GA,Rank,RosterCount,Position,PositionCount,FCount,DCount,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A,TeamWin,TeamLos,GP,GW,GL,L,W,WinPc,LossPc,Mean_Blocks_F,Mean_Faceoffs_F,Mean_Giveaways_F,Mean_Goals_F,Mean_Hits_F,Mean_Miss_F,Mean_Penalties_F,Mean_Shots_F,Mean_Takeaways_F,Mean_Blocks_A,Mean_Faceoffs_A,Mean_Giveaways_A,Mean_Goals_A,Mean_Hits_A,Mean_Miss_A,Mean_Penalties_A,Mean_Shots_A,Mean_Takeaways_A
0,1,20001,MTL,2010,TOR,11.0,C,MTL,188,181,2,18.0,F,12.0,12.0,6.0,293.0,337.0,673.0,623.0,219.0,171.0,102.0,96.0,483.0,539.0,293.0,235.0,107.0,102.0,713.0,567.0,153.0,131.0,0,1,68,34,31,31,37,0.544118,0.455882,4.308824,9.897059,3.220588,1.5,7.102941,4.308824,1.573529,10.485294,2.25,4.955882,9.161765,2.514706,1.411765,7.926471,3.455882,1.5,8.338235,1.926471
18270,1,20001,MTL,2010,TOR,37.0,C,TOR,195,218,2,18.0,F,12.0,12.0,6.0,425.0,413.0,763.0,692.0,305.0,270.0,116.0,120.0,677.0,651.0,315.0,368.0,114.0,119.0,607.0,741.0,184.0,189.0,1,0,70,34,31,36,34,0.485714,0.514286,6.071429,10.9,4.357143,1.657143,9.671429,4.5,1.628571,8.671429,2.628571,5.9,9.885714,3.857143,1.714286,9.3,5.257143,1.7,10.585714,2.7
18,-1,20002,PIT,2010,PHI,17.0,C,PHI,235,207,1,18.0,F,12.0,12.0,6.0,501.0,445.0,813.0,882.0,244.0,220.0,122.0,135.0,705.0,660.0,318.0,363.0,137.0,126.0,815.0,775.0,211.0,216.0,1,0,72,41,31,31,41,0.569444,0.430556,6.958333,11.291667,3.388889,1.694444,9.791667,4.416667,1.902778,11.319444,2.930556,6.180556,12.25,3.055556,1.875,9.166667,5.041667,1.75,10.763889,3.0
18288,-1,20002,PIT,2010,PHI,71.0,C,PIT,204,180,1,18.0,F,12.0,12.0,6.0,303.0,325.0,626.0,653.0,148.0,149.0,104.0,105.0,667.0,615.0,250.0,223.0,119.0,106.0,632.0,615.0,115.0,123.0,0,1,71,41,31,31,40,0.56338,0.43662,4.267606,8.816901,2.084507,1.464789,9.394366,3.521127,1.676056,8.901408,1.619718,4.577465,9.197183,2.098592,1.478873,8.661972,3.140845,1.492958,8.661972,1.732394
36,-1,20003,MIN,2010,CAR,53.0,C,CAR,222,218,1,18.0,F,12.0,12.0,6.0,390.0,440.0,740.0,906.0,201.0,245.0,128.0,124.0,763.0,587.0,329.0,397.0,96.0,120.0,749.0,841.0,232.0,216.0,1,0,76,38,35,38,38,0.5,0.5,5.131579,9.736842,2.644737,1.684211,10.039474,4.328947,1.263158,9.855263,3.052632,5.789474,11.921053,3.223684,1.631579,7.723684,5.223684,1.578947,11.065789,2.842105
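As a check on the division step, take the MTL row printed above (W = 37, L = 31, GP = 68, Goals_F = 102):

$$
WinPc = \frac{W}{GP} = \frac{37}{68} \approx 0.544118, \qquad LossPc = \frac{L}{GP} = \frac{31}{68} \approx 0.455882, \qquad MeanGoals_F = \frac{Goals_F}{GP} = \frac{102}{68} = 1.5
$$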


In [457]:
dl = dl[['Season', 'TeamCode', 'GP', 'W', 'L','WinPc', 'LossPc', 'Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Goals_F','Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F','Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A','Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Goals_A','Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A','Mean_Shots_A', 'Mean_Takeaways_A']]
dl['Rank_W'] = dl.groupby(['Season'])['WinPc'].rank(ascending=False)
dl = dl.sort_values(['Season', 'Rank_W'], ascending=[True, True])
dl.head(30)

Unnamed: 0,Season,TeamCode,GP,W,L,WinPc,LossPc,Mean_Blocks_F,Mean_Faceoffs_F,Mean_Giveaways_F,Mean_Goals_F,Mean_Hits_F,Mean_Miss_F,Mean_Penalties_F,Mean_Shots_F,Mean_Takeaways_F,Mean_Blocks_A,Mean_Faceoffs_A,Mean_Giveaways_A,Mean_Goals_A,Mean_Hits_A,Mean_Miss_A,Mean_Penalties_A,Mean_Shots_A,Mean_Takeaways_A,Rank_W
18576,2010,VAN,73,48,25,0.657534,0.342466,4.534247,11.424658,2.383562,1.726027,8.273973,4.232877,1.534247,10.178082,2.520548,5.438356,9.027397,2.561644,1.438356,7.794521,3.849315,1.465753,9.547945,2.315068,1.0
90,2010,SJ,65,41,24,0.630769,0.369231,5.076923,11.184615,3.661538,1.661538,8.630769,4.492308,1.461538,11.323077,2.861538,5.8,9.8,3.738462,1.6,9.661538,3.938462,1.630769,9.815385,2.538462,2.0
18432,2010,BOS,76,45,31,0.592105,0.407895,5.368421,10.421053,2.460526,1.644737,7.921053,4.078947,1.565789,10.381579,1.802632,5.315789,10.013158,3.184211,1.447368,8.618421,4.0,1.605263,10.684211,2.618421,3.0
18396,2010,DET,68,40,28,0.588235,0.411765,4.058824,11.823529,3.676471,1.970588,8.602941,4.926471,1.455882,11.823529,2.897059,5.073529,10.705882,2.882353,1.823529,9.308824,3.897059,1.558824,10.279412,2.661765,4.0
126,2010,ANA,65,38,27,0.584615,0.415385,5.430769,9.215385,2.892308,1.6,8.323077,3.676923,1.476923,8.753846,1.738462,3.6,10.369231,3.323077,1.692308,8.169231,4.784615,1.415385,10.861538,2.246154,5.0
18720,2010,WSH,72,42,30,0.583333,0.416667,5.222222,10.263889,3.027778,1.541667,7.388889,3.888889,1.555556,9.736111,2.569444,5.069444,9.986111,2.902778,1.416667,8.138889,4.069444,1.402778,9.013889,2.583333,6.0
306,2010,LA,70,40,30,0.571429,0.428571,4.485714,10.842857,3.728571,1.685714,10.142857,4.771429,1.357143,9.814286,1.685714,5.457143,10.485714,3.285714,1.657143,11.3,4.2,1.7,9.328571,2.114286,7.0
18,2010,PHI,72,41,31,0.569444,0.430556,6.958333,11.291667,3.388889,1.694444,9.791667,4.416667,1.902778,11.319444,2.930556,6.180556,12.25,3.055556,1.875,9.166667,5.041667,1.75,10.763889,3.0,8.0
18288,2010,PIT,71,40,31,0.56338,0.43662,4.267606,8.816901,2.084507,1.464789,9.394366,3.521127,1.676056,8.901408,1.619718,4.577465,9.197183,2.098592,1.478873,8.661972,3.140845,1.492958,8.661972,1.732394,9.0
180,2010,NYR,73,41,32,0.561644,0.438356,5.178082,9.383562,1.643836,1.643836,10.054795,4.013699,1.561644,9.534247,2.328767,4.575342,10.027397,2.315068,1.493151,9.246575,3.575342,1.60274,9.246575,2.616438,10.0
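Rank_W uses the default method='average' of rank, so teams tied on WinPc would share a fractional rank. A toy illustration with hypothetical win percentages, shown next to method='min', which gives tied teams the same best integer rank as in a standings table:

```python
import pandas as pd

# Hypothetical win percentages with a tie between teams B and C.
toy = pd.DataFrame({'Season': [2010, 2010, 2010],
                    'TeamCode': ['A', 'B', 'C'],
                    'WinPc': [0.60, 0.50, 0.50]})

# Default method='average': tied teams get the mean of the ranks they span (2 and 3).
toy['Rank_W'] = toy.groupby(['Season'])['WinPc'].rank(ascending=False)
# method='min': tied teams both get the smallest rank of the tie.
toy['Rank_W_min'] = toy.groupby(['Season'])['WinPc'].rank(ascending=False, method='min')
print(toy)
```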


In [458]:
dl.to_csv('/Users/stefanostselios/Brock University/Kevin Mongeon - StephanosShare/out/season_team_all_events_with_goal_differential_ranking.csv', index=False, sep=',')
#dl.to_csv('/Users/kevinmongeon/Brock University/Steve Tselios - StephanosShare/out/season_team_all_events_with_goal_differential_ranking.csv', index=False, sep=',')

- compute, for each team, the per-game difference between For and Against for every on-ice event (e.g. DBlock = Mean_Blocks_F - Mean_Blocks_A).

In [459]:
dl['DBlock'] = dl['Mean_Blocks_F'] - dl['Mean_Blocks_A']
dl['DFaceoff'] = dl['Mean_Faceoffs_F'] - dl['Mean_Faceoffs_A']
dl['DGiveaway'] = dl['Mean_Giveaways_F'] - dl['Mean_Giveaways_A']
dl['DGoal'] = dl['Mean_Goals_F'] - dl['Mean_Goals_A']
dl['DHit'] = dl['Mean_Hits_F'] - dl['Mean_Hits_A']
dl['DMiss'] = dl['Mean_Miss_F'] - dl['Mean_Miss_A']
dl['DPenalty'] = dl['Mean_Penalties_F'] - dl['Mean_Penalties_A']
dl['DShot'] = dl['Mean_Shots_F'] - dl['Mean_Shots_A']
dl['DTakeaway'] = dl['Mean_Takeaways_F'] - dl['Mean_Takeaways_A']

## analysis of all on-ice events prior to a goal while score differential is between -1 and 1

- summary statistics of the per-game means and differences.

In [460]:
dl.describe()

Unnamed: 0,Season,GP,W,L,WinPc,LossPc,Mean_Blocks_F,Mean_Faceoffs_F,Mean_Giveaways_F,Mean_Goals_F,Mean_Hits_F,Mean_Miss_F,Mean_Penalties_F,Mean_Shots_F,Mean_Takeaways_F,Mean_Blocks_A,Mean_Faceoffs_A,Mean_Giveaways_A,Mean_Goals_A,Mean_Hits_A,Mean_Miss_A,Mean_Penalties_A,Mean_Shots_A,Mean_Takeaways_A,Rank_W,DBlock,DFaceoff,DGiveaway,DGoal,DHit,DMiss,DPenalty,DShot,DTakeaway
count,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0
mean,2010.0,67.666667,33.833333,33.833333,0.498496,0.501504,4.962322,10.108206,2.886557,1.622558,8.259698,4.040797,1.483147,9.720919,2.533545,4.947075,10.119179,2.873299,1.626241,8.327178,4.033233,1.49091,9.688615,2.531115,15.5,0.015247,-0.010973,0.013258,-0.003682,-0.06748,0.007564,-0.007763,0.032304,0.00243
std,0.0,8.052985,7.991734,7.240134,0.09448,0.09448,0.838681,1.152263,0.714561,0.149268,1.356608,0.549171,0.22702,1.264658,0.685015,0.671669,1.114775,0.632065,0.154923,1.404677,0.606178,0.208377,1.092978,0.533987,8.802429,0.674269,1.08883,0.460086,0.12908,1.168557,0.592202,0.175545,1.160875,0.334294
min,2010.0,38.0,18.0,20.0,0.296875,0.342466,2.686567,6.910448,1.621212,1.19403,5.104478,2.597015,0.955224,6.910448,1.462687,3.238806,6.701493,1.373134,1.313433,4.641791,2.268657,0.985075,6.223881,1.597015,1.0,-1.014706,-3.102941,-0.723684,-0.211268,-2.615385,-1.107692,-0.36,-2.323529,-0.815789
25%,2010.0,66.25,28.25,30.0,0.438263,0.432072,4.544018,9.457532,2.39314,1.510417,7.247222,3.682952,1.400527,8.911972,2.041667,4.575873,9.445652,2.364978,1.507544,7.741393,3.810446,1.403045,9.176227,2.181538,8.25,-0.585976,-0.54507,-0.420573,-0.092308,-0.805147,-0.456197,-0.095063,-0.625595,-0.246963
50%,2010.0,70.0,36.5,31.5,0.520833,0.479167,4.898756,10.342471,2.955278,1.659341,8.230736,4.046323,1.469231,9.775198,2.544996,4.914561,10.153405,2.892565,1.60303,8.15406,4.0,1.48494,9.771182,2.560897,15.5,0.081871,0.097533,0.070136,0.007243,0.216629,0.034722,-0.007692,0.386693,0.035699
75%,2010.0,72.0,40.0,37.5,0.567928,0.561737,5.243056,10.885714,3.344777,1.692262,9.212441,4.394737,1.614811,10.475509,2.871635,5.428767,10.675501,3.270207,1.736538,9.226598,4.202344,1.624393,10.309853,2.747368,22.75,0.501842,0.556386,0.40942,0.085407,0.608036,0.458904,0.145833,0.721667,0.232026
max,2010.0,76.0,48.0,47.0,0.657534,0.703125,6.958333,11.823529,4.357143,1.970588,11.109589,4.926471,1.928571,12.42,4.58,6.180556,12.38,4.117647,1.9,11.3,5.257143,1.914286,11.7,3.96,30.0,1.830769,2.39726,0.850746,0.287671,2.780822,1.029412,0.295775,2.147059,0.62


#### $WinPc = \beta_{0} + \beta_{1}MeanBlocks_F + \beta_{2}MeanFaceoffs_F + \beta_{3}MeanGiveaways_F + \beta_{4}MeanGoals_F + \beta_{5}MeanHits_F + \beta_{6}MeanMiss_F + \beta_{7}MeanPenalties_F + \beta_{8}MeanShots_F + \beta_{9}MeanTakeaways_F + \beta_{10}MeanBlocks_A + \beta_{11}MeanFaceoffs_A + \beta_{12}MeanGiveaways_A + \beta_{13}MeanGoals_A + \beta_{14}MeanHits_A + \beta_{15}MeanMiss_A + \beta_{16}MeanPenalties_A + \beta_{17}MeanShots_A + \beta_{18}MeanTakeaways_A + e_{s}$

In [461]:
print('win percent in all on-ice events while score differential between -1 and 1')
y = dl['WinPc']
X = sm.add_constant(dl[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Goals_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Goals_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
result = sm.OLS(y, X).fit()
print(result.summary())

win percent in all on-ice events while score differential between -1 and 1
                            OLS Regression Results                            
Dep. Variable:                  WinPc   R-squared:                       0.748
Model:                            OLS   Adj. R-squared:                  0.337
Method:                 Least Squares   F-statistic:                     1.818
Date:                Fri, 19 Jan 2018   Prob (F-statistic):              0.156
Time:                        20:32:22   Log-Likelihood:                 49.422
No. Observations:                  30   AIC:                            -60.84
Df Residuals:                      11   BIC:                            -34.22
Df Model:                          18                                         
Covariance Type:            nonrobust                                         
                       coef    std err          t      P>|t|      [95.0% Conf. Int.]
----------------------------------------------------
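With 30 team-seasons and 18 regressors only 11 residual degrees of freedom remain, which is why the adjusted R² is far below R². Using the values printed above (the small gap to the printed 0.337 comes from R² being rounded to three decimals):

$$
\bar{R}^2 = 1 - (1 - R^2)\frac{n - 1}{n - k - 1} = 1 - (1 - 0.748)\frac{30 - 1}{30 - 18 - 1} = 1 - 0.252 \times \frac{29}{11} \approx 0.336
$$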

In [462]:
print('win percent in all on-ice events while score differential between -1 and 1')
y = dl['WinPc']
X = sm.add_constant(dl[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Goals_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Goals_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
result = sm.Logit(y, X).fit()
print(result.summary())

win percent in all on-ice events while score differential between -1 and 1
Optimization terminated successfully.
         Current function value: 0.667692
         Iterations 4
                           Logit Regression Results                           
Dep. Variable:                  WinPc   No. Observations:                   30
Model:                          Logit   Df Residuals:                       11
Method:                           MLE   Df Model:                           18
Date:                Fri, 19 Jan 2018   Pseudo R-squ.:                 0.03671
Time:                        20:32:22   Log-Likelihood:                -20.031
converged:                       True   LL-Null:                       -20.794
                                        LLR p-value:                     1.000
                       coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------------
const                0.

#### $WinPc = \beta_{0} + \beta_{1}DBlock + \beta_{2}DFaceoff + \beta_{3}DGiveaway + \beta_{4}DGoal + \beta_{5}DHit + \beta_{6}DMiss+ \beta_{7}DPenalty + \beta_{8}DShot + \beta_{9}DTakeaway + e_{s}$

In [463]:
print('win percent, by differential, in all on-ice events while score differential between -1 and 1')
y = dl['WinPc']
X = sm.add_constant(dl[['DBlock', 'DFaceoff', 'DGiveaway', 'DGoal', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
result = sm.OLS(y, X).fit()
print(result.summary())

win percent, by differential, in all on-ice events while score differential between -1 and 1
                            OLS Regression Results                            
Dep. Variable:                  WinPc   R-squared:                       0.513
Model:                            OLS   Adj. R-squared:                  0.294
Method:                 Least Squares   F-statistic:                     2.341
Date:                Fri, 19 Jan 2018   Prob (F-statistic):             0.0543
Time:                        20:32:22   Log-Likelihood:                 39.515
No. Observations:                  30   AIC:                            -59.03
Df Residuals:                      20   BIC:                            -45.02
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------
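The overall F-statistic in each OLS summary follows from R², the number of regressors k (Df Model) and the residual degrees of freedom n - k - 1 (Df Residuals). The sketch below (the helper name overall_f is hypothetical) recomputes it for the two win-percentage models; it matches the printout up to the third decimal, the gap coming from R² being shown rounded. Neither F-test is significant at the 5% level (p = 0.156 and 0.0543).

```python
def overall_f(r2, nobs, k):
    # F = (R^2 / k) / ((1 - R^2) / (n - k - 1)); k = Df Model, n - k - 1 = Df Residuals
    return (r2 / k) / ((1.0 - r2) / (nobs - k - 1))

f_means = overall_f(0.748, 30, 18)   # model on per-game means; printed F-statistic 1.818
f_diffs = overall_f(0.513, 30, 9)    # model on differentials; printed F-statistic 2.341
print(round(f_means, 3), round(f_diffs, 3))
```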

In [464]:
print('win percent, by differential, in all on-ice events while score differential between -1 and 1')
y = dl['WinPc']
X = sm.add_constant(dl[['DBlock', 'DFaceoff', 'DGiveaway', 'DGoal', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
result = sm.Logit(y, X).fit()
print(result.summary())

win percent, by differential, in all on-ice events while score differential between -1 and 1
Optimization terminated successfully.
         Current function value: 0.675673
         Iterations 4
                           Logit Regression Results                           
Dep. Variable:                  WinPc   No. Observations:                   30
Model:                          Logit   Df Residuals:                       20
Method:                           MLE   Df Model:                            9
Date:                Fri, 19 Jan 2018   Pseudo R-squ.:                 0.02520
Time:                        20:32:22   Log-Likelihood:                -20.270
converged:                       True   LL-Null:                       -20.794
                                        LLR p-value:                    0.9993
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const         -0.00
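The two Logit headers can be cross-checked by hand. statsmodels minimizes the negative log-likelihood divided by the number of observations, so the printed "Current function value" times -30 gives the log-likelihood, and the reported pseudo R² is McFadden's, 1 - LL / LL-Null. Since WinPc is a proportion rather than a 0/1 outcome, these fits act as a fractional (quasi-likelihood) logit; their default standard errors assume a binary outcome, and robust (sandwich) standard errors are the usual recommendation in that setting. The helper names below are hypothetical.

```python
def loglike_from_fval(fval, nobs):
    # statsmodels minimizes -loglike / nobs, so loglike = -nobs * fval
    return -nobs * fval

def mcfadden_pseudo_r2(llf, llnull):
    # McFadden's pseudo R-squared: 1 - LL / LL-Null
    return 1.0 - llf / llnull

# "Current function value" from the two Logit fits above (30 team-seasons).
llf_means = loglike_from_fval(0.667692, 30)   # printed Log-Likelihood: -20.031
llf_diffs = loglike_from_fval(0.675673, 30)   # printed Log-Likelihood: -20.270
print(round(llf_means, 3), round(mcfadden_pseudo_r2(-20.031, -20.794), 5))
print(round(llf_diffs, 3), round(mcfadden_pseudo_r2(-20.270, -20.794), 5))
```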

#### mean goals for analysis

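- regress **mean goals for** on the per-game means of the other on-ice events (predictor variables), adding a constant to the predictors and estimating with **OLS**. The purpose is to determine the impact each on-ice event has on goals scored. The **Logit** cells are commented out: Logit needs a dependent variable between 0 and 1, and mean goals for per game (1.19 to 1.97 in the summary above) is outside that range.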

#### $MeanGoals_F = \beta_{0} + \beta_{1}MeanBlocks_F + \beta_{2}MeanFaceoffs_F + \beta_{3}MeanGiveaways_F + \beta_{4}MeanHits_F + \beta_{5}MeanMiss_F + \beta_{6}MeanPenalties_F + \beta_{7}MeanShots_F + \beta_{8}MeanTakeaways_F + \beta_{9}MeanBlocks_A + \beta_{10}MeanFaceoffs_A + \beta_{11}MeanGiveaways_A + \beta_{12}MeanHits_A + \beta_{13}MeanMiss_A + \beta_{14}MeanPenalties_A + \beta_{15}MeanShots_A + \beta_{16}MeanTakeaways_A + e_{s}$

In [465]:
print('mean goals for in all on-ice events while score differential between -1 and 1')
y = dl['Mean_Goals_F']
X = sm.add_constant(dl[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
result = sm.OLS(y, X).fit()
print(result.summary())

mean goals for in all on-ice events while score differential between -1 and 1
                            OLS Regression Results                            
Dep. Variable:           Mean_Goals_F   R-squared:                       0.853
Model:                            OLS   Adj. R-squared:                  0.672
Method:                 Least Squares   F-statistic:                     4.715
Date:                Fri, 19 Jan 2018   Prob (F-statistic):            0.00368
Time:                        20:32:22   Log-Likelihood:                 43.762
No. Observations:                  30   AIC:                            -53.52
Df Residuals:                      13   BIC:                            -29.70
Df Model:                          16                                         
Covariance Type:            nonrobust                                         
                       coef    std err          t      P>|t|      [95.0% Conf. Int.]
-------------------------------------------------

In [466]:
#y = dl['Mean_Goals_F']  
#X = sm.add_constant(dl[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
#result = sm.Logit(y, X).fit()
#result.summary()

#### $MeanGoals_F = \beta_{0} + \beta_{1}DBlock + \beta_{2}DFaceoff + \beta_{3}DGiveaway + \beta_{4}DHit + \beta_{5}DMiss+ \beta_{6}DPenalty + \beta_{7}DShot + \beta_{8}DTakeaway + e_{s}$

In [467]:
print('mean goals for, by differential, in all on-ice events while score differential between -1 and 1')
y = dl['Mean_Goals_F']
X = sm.add_constant(dl[['DBlock', 'DFaceoff', 'DGiveaway', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
result = sm.OLS(y, X).fit()
print(result.summary())

mean goals for, by differential, in all on-ice events while score differential between -1 and 1
                            OLS Regression Results                            
Dep. Variable:           Mean_Goals_F   R-squared:                       0.399
Model:                            OLS   Adj. R-squared:                  0.170
Method:                 Least Squares   F-statistic:                     1.744
Date:                Fri, 19 Jan 2018   Prob (F-statistic):              0.146
Time:                        20:32:22   Log-Likelihood:                 22.642
No. Observations:                  30   AIC:                            -27.28
Df Residuals:                      21   BIC:                            -14.67
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
---------------------------------------

In [468]:
#y = dl['Mean_Goals_F']  
#X = sm.add_constant(dl[['DBlock', 'DFaceoff', 'DGiveaway', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
#result = sm.Logit(y, X).fit()
#result.summary()

#### mean goals against analysis

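- regress **mean goals against** on the per-game means of the other on-ice events (predictor variables), adding a constant to the predictors and estimating with **OLS**. The purpose is to determine the impact each on-ice event has on goals allowed. As above, the **Logit** cells are commented out because mean goals against per game (1.31 to 1.90 in the summary above) is outside the 0 to 1 range Logit requires.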

#### $MeanGoals_A = \beta_{0} + \beta_{1}MeanBlocks_F + \beta_{2}MeanFaceoffs_F + \beta_{3}MeanGiveaways_F + \beta_{4}MeanHits_F + \beta_{5}MeanMiss_F + \beta_{6}MeanPenalties_F + \beta_{7}MeanShots_F + \beta_{8}MeanTakeaways_F + \beta_{9}MeanBlocks_A + \beta_{10}MeanFaceoffs_A + \beta_{11}MeanGiveaways_A + \beta_{12}MeanHits_A + \beta_{13}MeanMiss_A + \beta_{14}MeanPenalties_A + \beta_{15}MeanShots_A + \beta_{16}MeanTakeaways_A + e_{s}$

In [469]:
print('mean goals against in all on-ice events while score differential between -1 and 1')
y = dl['Mean_Goals_A']
X = sm.add_constant(dl[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
result = sm.OLS(y, X).fit()
print(result.summary())

mean goals against in all on-ice events while score differential between -1 and 1
                            OLS Regression Results                            
Dep. Variable:           Mean_Goals_A   R-squared:                       0.761
Model:                            OLS   Adj. R-squared:                  0.468
Method:                 Least Squares   F-statistic:                     2.594
Date:                Fri, 19 Jan 2018   Prob (F-statistic):             0.0448
Time:                        20:32:22   Log-Likelihood:                 35.383
No. Observations:                  30   AIC:                            -36.77
Df Residuals:                      13   BIC:                            -12.95
Df Model:                          16                                         
Covariance Type:            nonrobust                                         
                       coef    std err          t      P>|t|      [95.0% Conf. Int.]
---------------------------------------------

In [470]:
#y = dl['Mean_Goals_A']  
#X = sm.add_constant(dl[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
#result = sm.Logit(y, X).fit()
#result.summary()

#### $MeanGoals_A = \beta_{0} + \beta_{1}DBlock + \beta_{2}DFaceoff + \beta_{3}DGiveaway + \beta_{4}DHit + \beta_{5}DMiss+ \beta_{6}DPenalty + \beta_{7}DShot + \beta_{8}DTakeaway + e_{s}$

In [471]:
print('mean goals against, by differential, in all on-ice events while score differential between -1 and 1')
y = dl['Mean_Goals_A']
X = sm.add_constant(dl[['DBlock', 'DFaceoff', 'DGiveaway', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
result = sm.OLS(y, X).fit()
print(result.summary())

mean goals against, by differential, in all on-ice events while score differential between -1 and 1
                            OLS Regression Results                            
Dep. Variable:           Mean_Goals_A   R-squared:                       0.323
Model:                            OLS   Adj. R-squared:                  0.065
Method:                 Least Squares   F-statistic:                     1.251
Date:                Fri, 19 Jan 2018   Prob (F-statistic):              0.320
Time:                        20:32:22   Log-Likelihood:                 19.730
No. Observations:                  30   AIC:                            -21.46
Df Residuals:                      21   BIC:                            -8.850
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
-----------------------------------

In [472]:
#y = dl['Mean_Goals_A']  
#X = sm.add_constant(dl[['DBlock', 'DFaceoff', 'DGiveaway', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
#result = sm.Logit(y, X).fit()
#result.summary()

## 6) even strength on-ice events when score differential is between -1 and 1

- keep even strength on-ice events that happened prior to a goal while the score differential (home goals minus visitor goals) is between -1 and 1.

In [473]:
dm = da[['Season', 'GameNumber', 'EventNumber', 'AdvantageType', 'Period', 'EventType', 'EventTimeFromZero', 'EventDetail', 'VTeamCode', 'HTeamCode', 'EventTeamCode']]
dm = pd.merge(dm, dg, on=['Season', 'GameNumber', 'EventNumber', 'AdvantageType', 'Period', 'EventType', 'EventTimeFromZero', 'EventDetail', 'EventTeamCode', 'VTeamCode', 'HTeamCode'], how='left')
# backfill within each group so every event inherits the values of the next goal row
dm['AdvantageType'] = dm.groupby(['Season', 'GameNumber'])['AdvantageType'].bfill()
dm['GoalNumber'] = dm.groupby(['Season', 'GameNumber', 'Period'])['GoalNumber'].bfill()
dm['HGoalNumber'] = dm.groupby(['Season', 'GameNumber', 'Period'])['HGoalNumber'].bfill()
dm['VGoalNumber'] = dm.groupby(['Season', 'GameNumber', 'Period'])['VGoalNumber'].bfill()
dm['GD'] = dm['HGoalNumber'] - dm['VGoalNumber']
dm = dm[(dm['GD'] >= -1) & (dm['GD'] <= 1)]
dm = dm[dm['AdvantageType'] == 'EV']
dm.head()

Unnamed: 0,Season,GameNumber,EventNumber,AdvantageType,Period,EventType,EventTimeFromZero,EventDetail,VTeamCode,HTeamCode,EventTeamCode,GoalNumber,VGoalNumber,HGoalNumber,GD
0,2010,20001,1,EV,1,FAC,0,MTL won Neu. Zone - MTL #11 GOMEZ vs TOR #37 B...,MTL,TOR,MTL,1.0,0.0,1.0,1.0
1,2010,20001,3,EV,1,HIT,15,"TOR #37 BRENT HIT MTL #26 GORGES, Off. Zone",MTL,TOR,TOR,1.0,0.0,1.0,1.0
2,2010,20001,4,EV,1,HIT,46,"MTL #14 PLEKANEC HIT TOR #2 SCHENN, Off. Zone",MTL,TOR,MTL,1.0,0.0,1.0,1.0
3,2010,20001,5,EV,1,HIT,57,"MTL #76 SUBBAN HIT TOR #15 KABERLE, Neu. Zone",MTL,TOR,MTL,1.0,0.0,1.0,1.0
4,2010,20001,6,EV,1,GIVE,69,"TOR&nbsp;GIVEAWAY - #35 GIGUERE, Def. Zone",MTL,TOR,TOR,1.0,0.0,1.0,1.0


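- drop even strength on-ice events with no goal later in the same period: after the backfill their GoalNumber is still missing, which happens for events after the last goal of a period and for events in a period without a goal. Then keep one observation per event per event team.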

In [474]:
dm = dm.dropna(subset=['GoalNumber'])
dm = dm.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])
dm = dm.drop_duplicates(['Season', 'GameNumber', 'EventNumber', 'EventTeamCode'])
dm.head()

Unnamed: 0,Season,GameNumber,EventNumber,AdvantageType,Period,EventType,EventTimeFromZero,EventDetail,VTeamCode,HTeamCode,EventTeamCode,GoalNumber,VGoalNumber,HGoalNumber,GD
0,2010,20001,1,EV,1,FAC,0,MTL won Neu. Zone - MTL #11 GOMEZ vs TOR #37 B...,MTL,TOR,MTL,1.0,0.0,1.0,1.0
1,2010,20001,3,EV,1,HIT,15,"TOR #37 BRENT HIT MTL #26 GORGES, Off. Zone",MTL,TOR,TOR,1.0,0.0,1.0,1.0
2,2010,20001,4,EV,1,HIT,46,"MTL #14 PLEKANEC HIT TOR #2 SCHENN, Off. Zone",MTL,TOR,MTL,1.0,0.0,1.0,1.0
3,2010,20001,5,EV,1,HIT,57,"MTL #76 SUBBAN HIT TOR #15 KABERLE, Neu. Zone",MTL,TOR,MTL,1.0,0.0,1.0,1.0
4,2010,20001,6,EV,1,GIVE,69,"TOR&nbsp;GIVEAWAY - #35 GIGUERE, Def. Zone",MTL,TOR,TOR,1.0,0.0,1.0,1.0
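The backfill-then-drop logic can be seen on a toy frame. This sketch assumes that, after the left merge with dg, GoalNumber is filled only on GOAL rows (dg is built earlier in the notebook, so that is an assumption here); the events themselves are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical events in one game; GoalNumber is set only on the GOAL row.
ev = pd.DataFrame({
    'Season':     [2010] * 6,
    'GameNumber': [20001] * 6,
    'Period':     [1, 1, 1, 1, 2, 2],
    'EventType':  ['FAC', 'HIT', 'GOAL', 'SHOT', 'HIT', 'SHOT'],
    'GoalNumber': [np.nan, np.nan, 1.0, np.nan, np.nan, np.nan],
})

# Each event takes the number of the next goal in the same period;
# the fill never crosses a (Season, GameNumber, Period) boundary.
ev['GoalNumber'] = ev.groupby(['Season', 'GameNumber', 'Period'])['GoalNumber'].bfill()

# Events after the period's last goal, or in a goalless period, stay NaN and are dropped.
kept = ev.dropna(subset=['GoalNumber'])
print(kept)
```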


- create an indicator for each event type: 1 if the on-ice event is a goal and NaN otherwise, and likewise for block, faceoff, giveaway, hit, miss, penalty, shot and takeaway. Then group by season, game number, event team and event type and sum each indicator to count each on-ice event per team per game.

In [475]:
dm['Goal'] = dm.apply(lambda x: 1 if (x['EventType'] == 'GOAL') else np.nan, axis=1)
dm['Block'] = dm.apply(lambda x: 1 if (x['EventType'] == 'BLOCK') else np.nan, axis=1)
dm['Faceoff'] = dm.apply(lambda x: 1 if (x['EventType'] == 'FAC') else np.nan, axis=1)
dm['Giveaway'] = dm.apply(lambda x: 1 if (x['EventType'] == 'GIVE') else np.nan, axis=1)
dm['Hit'] = dm.apply(lambda x: 1 if (x['EventType'] == 'HIT') else np.nan, axis=1)
dm['Miss'] = dm.apply(lambda x: 1 if (x['EventType'] == 'MISS') else np.nan, axis=1)
dm['Penalty'] = dm.apply(lambda x: 1 if (x['EventType'] == 'PENL') else np.nan, axis=1)
dm['Shot'] = dm.apply(lambda x: 1 if (x['EventType'] == 'SHOT') else np.nan, axis=1)
dm['Takeaway'] = dm.apply(lambda x: 1 if (x['EventType'] == 'TAKE') else np.nan, axis=1)

In [476]:
# min_count=1 keeps groups without the event as NaN; since pandas 0.22 an all-NaN
# sum defaults to 0, which would keep the forward/backward fill below from filling those rows
dm['Blocks'] = dm.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Block'].transform('sum', min_count=1)
dm['Faceoffs'] = dm.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Faceoff'].transform('sum', min_count=1)
dm['Giveaways'] = dm.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Giveaway'].transform('sum', min_count=1)
dm['Goals'] = dm.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Goal'].transform('sum', min_count=1)
dm['Hits'] = dm.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Hit'].transform('sum', min_count=1)
dm['Misses'] = dm.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Miss'].transform('sum', min_count=1)
dm['Penalties'] = dm.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Penalty'].transform('sum', min_count=1)
dm['Shots'] = dm.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Shot'].transform('sum', min_count=1)
dm['Takeaways'] = dm.groupby(['Season','GameNumber', 'EventTeamCode', 'EventType'])['Takeaway'].transform('sum', min_count=1)
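The indicator-and-count steps above can be sketched on a small, made-up frame. This is an illustration with hypothetical data, not output from the notebook; it swaps the row-wise `apply` for a vectorized `np.where` and uses `min_count=1` so that groups without the event stay NaN rather than becoming 0.

```python
import numpy as np
import pandas as pd

# hypothetical events: two hits and a faceoff by MTL, one hit by TOR, all in one game
dm_toy = pd.DataFrame({
    'Season': [2010] * 4,
    'GameNumber': [20001] * 4,
    'EventTeamCode': ['MTL', 'MTL', 'MTL', 'TOR'],
    'EventType': ['HIT', 'HIT', 'FAC', 'HIT'],
})

# 1 where the row is a hit, NaN otherwise (vectorized equivalent of the row-wise apply)
dm_toy['Hit'] = np.where(dm_toy['EventType'] == 'HIT', 1, np.nan)

# count hits per season, game, event team and event type; min_count=1 leaves
# groups with no hits (here the MTL faceoff row) as NaN instead of 0
keys = ['Season', 'GameNumber', 'EventTeamCode', 'EventType']
dm_toy['Hits'] = dm_toy.groupby(keys)['Hit'].transform('sum', min_count=1)
print(dm_toy['Hits'].tolist())
```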

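- Reshape the data from wide to long: stack the visiting and home team codes (`VTeamCode`, `HTeamCode`) into a single `TeamCode` column so that each event appears once for each team in the game. `EventTeamCode` is temporarily renamed so the `'TeamCode'` column filter does not pick it up.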

In [477]:
dm = dm.rename(columns={'EventTeamCode': 'EventTeam'})
a = [col for col in dm.columns if 'TeamCode' in col]
dm = pd.lreshape(dm, {'TeamCode' : a})
dm = dm.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])
dm = dm.rename(columns={'EventTeam': 'EventTeamCode'})
dm.head()

Unnamed: 0,AdvantageType,Block,Blocks,EventDetail,EventNumber,EventTeamCode,EventTimeFromZero,EventType,Faceoff,Faceoffs,GD,GameNumber,Giveaway,Giveaways,Goal,GoalNumber,Goals,HGoalNumber,Hit,Hits,Miss,Misses,Penalties,Penalty,Period,Season,Shot,Shots,Takeaway,Takeaways,VGoalNumber,TeamCode
0,EV,,,MTL won Neu. Zone - MTL #11 GOMEZ vs TOR #37 B...,1,MTL,0,FAC,1.0,3.0,1.0,20001,,,,1.0,,1.0,,,,,,,1,2010,,,,,0.0,MTL
92528,EV,,,MTL won Neu. Zone - MTL #11 GOMEZ vs TOR #37 B...,1,MTL,0,FAC,1.0,3.0,1.0,20001,,,,1.0,,1.0,,,,,,,1,2010,,,,,0.0,TOR
1,EV,,,"TOR #37 BRENT HIT MTL #26 GORGES, Off. Zone",3,TOR,15,HIT,,,1.0,20001,,,,1.0,,1.0,1.0,8.0,,,,,1,2010,,,,,0.0,MTL
92529,EV,,,"TOR #37 BRENT HIT MTL #26 GORGES, Off. Zone",3,TOR,15,HIT,,,1.0,20001,,,,1.0,,1.0,1.0,8.0,,,,,1,2010,,,,,0.0,TOR
2,EV,,,"MTL #14 PLEKANEC HIT TOR #2 SCHENN, Off. Zone",4,MTL,46,HIT,,,1.0,20001,,,,1.0,,1.0,1.0,11.0,,,,,1,2010,,,,,0.0,MTL
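A toy version of the wide-to-long step, using hypothetical rows: `pd.lreshape` stacks the listed columns into one new column and repeats every other column, so each event appears once per team.

```python
import pandas as pd

# hypothetical wide rows: one row per event, with both team codes as columns
events = pd.DataFrame({
    'GameNumber': [20001, 20001],
    'EventNumber': [1, 3],
    'EventType': ['FAC', 'HIT'],
    'EventTeam': ['MTL', 'TOR'],   # renamed so the 'TeamCode' filter skips it
    'VTeamCode': ['MTL', 'MTL'],
    'HTeamCode': ['TOR', 'TOR'],
})

team_cols = [c for c in events.columns if 'TeamCode' in c]  # ['VTeamCode', 'HTeamCode']
long = pd.lreshape(events, {'TeamCode': team_cols})          # 2 events x 2 teams = 4 rows
long = long.sort_values(['GameNumber', 'EventNumber'])
print(long)
```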


- Drop duplicates by season, game number, team code, event team code and event type, so that each game keeps one row per team, event team and event type.

In [478]:
dm = dm.drop_duplicates(['Season', 'GameNumber', 'TeamCode', 'EventTeamCode', 'EventType'])
dm = dm[['Season', 'GameNumber', 'TeamCode', 'EventNumber', 'EventType', 'EventTeamCode', 'Blocks', 'Faceoffs', 'Giveaways', 'Goals', 'Hits', 'Misses', 'Penalties', 'Shots', 'Takeaways']]
dm = dm.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])
dm.shape

(36662, 15)

- Assign all on-ice events to their respective teams. If the team code matches the event team code, the on-ice event is credited to that team as For (F); otherwise it is recorded for that team as Against (A). Each on-ice event therefore generates two variables per team: For (F) and Against (A).

In [479]:
dm['Blocks_F'] = dm.apply(lambda x: x['Blocks'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
dm['Blocks_A'] = dm.apply(lambda x: x['Blocks'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
dm['Faceoffs_F'] = dm.apply(lambda x: x['Faceoffs'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
dm['Faceoffs_A'] = dm.apply(lambda x: x['Faceoffs'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
dm['Giveaways_F'] = dm.apply(lambda x: x['Giveaways'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
dm['Giveaways_A'] = dm.apply(lambda x: x['Giveaways'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
dm['Goals_F'] = dm.apply(lambda x: x['Goals'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
dm['Goals_A'] = dm.apply(lambda x: x['Goals'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
dm['Hits_F'] = dm.apply(lambda x: x['Hits'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
dm['Hits_A'] = dm.apply(lambda x: x['Hits'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
dm['Miss_F'] = dm.apply(lambda x: x['Misses'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
dm['Miss_A'] = dm.apply(lambda x: x['Misses'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
dm['Penalties_F'] = dm.apply(lambda x: x['Penalties'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
dm['Penalties_A'] = dm.apply(lambda x: x['Penalties'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
dm['Shots_F'] = dm.apply(lambda x: x['Shots'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
dm['Shots_A'] = dm.apply(lambda x: x['Shots'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
dm['Takeaways_F'] = dm.apply(lambda x: x['Takeaways'] if (x['TeamCode'] == x['EventTeamCode']) else np.nan, axis=1)
dm['Takeaways_A'] = dm.apply(lambda x: x['Takeaways'] if (x['TeamCode'] != x['EventTeamCode']) else np.nan, axis=1)
dm = dm.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])
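The For/Against split above can also be written without a row-wise `apply`. A minimal sketch on hypothetical rows, using `Series.where`, which keeps values where the condition holds and returns NaN elsewhere, the same result the lambdas produce:

```python
import pandas as pd

# hypothetical long-format rows: each event total appears once for each team in the game
dm_toy = pd.DataFrame({
    'TeamCode':      ['MTL', 'TOR', 'MTL', 'TOR'],
    'EventTeamCode': ['MTL', 'MTL', 'TOR', 'TOR'],
    'Hits':          [11.0, 11.0, 8.0, 8.0],
})

same_team = dm_toy['TeamCode'] == dm_toy['EventTeamCode']
# credited to the row's team when it made the event, otherwise counted against it
dm_toy['Hits_F'] = dm_toy['Hits'].where(same_team)
dm_toy['Hits_A'] = dm_toy['Hits'].where(~same_team)
print(dm_toy)
```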

- Forward- and backward-fill the For and Against totals within each season, game and team, so that every row of a team's game carries the same totals.

In [480]:
# transform keeps dm's original index so the result aligns on assignment
# (groupby.apply adds the group keys to the result's index in pandas >= 2.0)
dm['Blocks_F'] = dm.groupby(['Season','GameNumber', 'TeamCode'])['Blocks_F'].transform(lambda x: x.ffill().bfill())
dm['Faceoffs_F'] = dm.groupby(['Season','GameNumber', 'TeamCode'])['Faceoffs_F'].transform(lambda x: x.ffill().bfill())
dm['Giveaways_F'] = dm.groupby(['Season','GameNumber', 'TeamCode'])['Giveaways_F'].transform(lambda x: x.ffill().bfill())
dm['Goals_F'] = dm.groupby(['Season','GameNumber', 'TeamCode'])['Goals_F'].transform(lambda x: x.ffill().bfill())
dm['Hits_F'] = dm.groupby(['Season','GameNumber', 'TeamCode'])['Hits_F'].transform(lambda x: x.ffill().bfill())
dm['Miss_F'] = dm.groupby(['Season','GameNumber', 'TeamCode'])['Miss_F'].transform(lambda x: x.ffill().bfill())
dm['Penalties_F'] = dm.groupby(['Season','GameNumber', 'TeamCode'])['Penalties_F'].transform(lambda x: x.ffill().bfill())
dm['Shots_F'] = dm.groupby(['Season','GameNumber', 'TeamCode'])['Shots_F'].transform(lambda x: x.ffill().bfill())
dm['Takeaways_F'] = dm.groupby(['Season','GameNumber', 'TeamCode'])['Takeaways_F'].transform(lambda x: x.ffill().bfill())
dm['Blocks_A'] = dm.groupby(['Season','GameNumber', 'TeamCode'])['Blocks_A'].transform(lambda x: x.ffill().bfill())
dm['Faceoffs_A'] = dm.groupby(['Season','GameNumber', 'TeamCode'])['Faceoffs_A'].transform(lambda x: x.ffill().bfill())
dm['Giveaways_A'] = dm.groupby(['Season','GameNumber', 'TeamCode'])['Giveaways_A'].transform(lambda x: x.ffill().bfill())
dm['Goals_A'] = dm.groupby(['Season','GameNumber', 'TeamCode'])['Goals_A'].transform(lambda x: x.ffill().bfill())
dm['Hits_A'] = dm.groupby(['Season','GameNumber', 'TeamCode'])['Hits_A'].transform(lambda x: x.ffill().bfill())
dm['Miss_A'] = dm.groupby(['Season','GameNumber', 'TeamCode'])['Miss_A'].transform(lambda x: x.ffill().bfill())
dm['Penalties_A'] = dm.groupby(['Season','GameNumber', 'TeamCode'])['Penalties_A'].transform(lambda x: x.ffill().bfill())
dm['Shots_A'] = dm.groupby(['Season','GameNumber', 'TeamCode'])['Shots_A'].transform(lambda x: x.ffill().bfill())
dm['Takeaways_A'] = dm.groupby(['Season','GameNumber', 'TeamCode'])['Takeaways_A'].transform(lambda x: x.ffill().bfill())
dm = dm.sort_values(['Season', 'GameNumber', 'EventNumber'], ascending=[True, True, True])
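How the grouped fill behaves can be seen on a hypothetical game in which each team's For total sits on only one of its rows. Filling happens inside each season, game and team group, so one team's total never leaks into the other team's rows:

```python
import numpy as np
import pandas as pd

# hypothetical rows for one game: each team's For total appears on only one of its rows
dm_toy = pd.DataFrame({
    'Season': [2010] * 4,
    'GameNumber': [20001] * 4,
    'TeamCode': ['MTL', 'MTL', 'TOR', 'TOR'],
    'Hits_F': [np.nan, 11.0, 8.0, np.nan],
})

keys = ['Season', 'GameNumber', 'TeamCode']
# ffill then bfill within each team-game; transform returns a Series aligned to dm_toy's index
dm_toy['Hits_F'] = dm_toy.groupby(keys)['Hits_F'].transform(lambda s: s.ffill().bfill())
print(dm_toy['Hits_F'].tolist())
```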

- Keep only the relevant columns and drop duplicates by season, game number and team code, leaving two observations (one per team) per game.

In [481]:
dm = dm[['Season', 'GameNumber', 'TeamCode', 'Blocks_F', 'Blocks_A', 'Faceoffs_F', 'Faceoffs_A', 'Giveaways_F', 'Giveaways_A', 'Goals_F', 'Goals_A', 'Hits_F', 'Hits_A', 'Miss_F', 'Miss_A', 'Penalties_F', 'Penalties_A', 'Shots_F', 'Shots_A', 'Takeaways_F', 'Takeaways_A']]
dm = dm.sort_values(['Season', 'GameNumber'], ascending=[True, True])
dm = dm.drop_duplicates(['Season', 'GameNumber', 'TeamCode'])
dm.head()

Unnamed: 0,Season,GameNumber,TeamCode,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A
0,2010,20001,MTL,4.0,5.0,3.0,3.0,4.0,4.0,2.0,1.0,11.0,8.0,2.0,,,,6.0,5.0,1.0,
92528,2010,20001,TOR,5.0,4.0,3.0,3.0,4.0,4.0,1.0,2.0,8.0,11.0,,2.0,,,5.0,6.0,,1.0
59,2010,20002,PHI,,1.0,1.0,5.0,,1.0,,1.0,1.0,5.0,1.0,,,1.0,2.0,1.0,,
92587,2010,20002,PIT,1.0,,5.0,1.0,1.0,,1.0,,5.0,1.0,,1.0,1.0,,1.0,2.0,,
79,2010,20003,CAR,10.0,16.0,17.0,32.0,9.0,8.0,1.0,1.0,10.0,13.0,7.0,6.0,5.0,3.0,15.0,15.0,3.0,7.0


In [482]:
dm.shape

(2442, 21)
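A quick check that the drop leaves exactly one row per team per game could use a hypothetical helper like `two_rows_per_game`; on the full `dm` one would expect `two_rows_per_game(dm)` to return `True`. Shown here on made-up frames:

```python
import pandas as pd

def two_rows_per_game(df):
    """Return True if every (Season, GameNumber) pair has exactly two rows."""
    return bool(df.groupby(['Season', 'GameNumber']).size().eq(2).all())

# hypothetical frame with two games and two teams each
ok = pd.DataFrame({'Season': [2010] * 4,
                   'GameNumber': [20001, 20001, 20002, 20002],
                   'TeamCode': ['MTL', 'TOR', 'PHI', 'PIT']})
print(two_rows_per_game(ok), two_rows_per_game(ok.iloc[:3]))
```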

- **Merge the even strength on-ice events (dm) onto the team roster player rank dataframe (dc) to create a new dataframe (dn).**

In [483]:
dn = pd.merge(dc, dm, on=['Season', 'GameNumber', 'TeamCode'], how='left')
dn = dn.drop_duplicates(['Season', 'GameNumber', 'TeamCode'])
dn = dn.sort_values(['Season', 'GameNumber'], ascending=[True, True])
dn.head()

Unnamed: 0,GD,GameNumber,LossTeam,Season,WinTeam,PlayerNumber,PlayerPosition,TeamCode,GF,GA,Rank,RosterCount,Position,PositionCount,FCount,DCount,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A
0,1,20001,MTL,2010,TOR,11.0,C,MTL,2,3,2,18.0,F,12.0,12.0,6.0,4.0,5.0,3.0,3.0,4.0,4.0,2.0,1.0,11.0,8.0,2.0,,,,6.0,5.0,1.0,
18270,1,20001,MTL,2010,TOR,37.0,C,TOR,3,2,2,18.0,F,12.0,12.0,6.0,5.0,4.0,3.0,3.0,4.0,4.0,1.0,2.0,8.0,11.0,,2.0,,,5.0,6.0,,1.0
18,-1,20002,PIT,2010,PHI,17.0,C,PHI,3,2,1,18.0,F,12.0,12.0,6.0,,1.0,1.0,5.0,,1.0,,1.0,1.0,5.0,1.0,,,1.0,2.0,1.0,,
18288,-1,20002,PIT,2010,PHI,71.0,C,PIT,2,3,1,18.0,F,12.0,12.0,6.0,1.0,,5.0,1.0,1.0,,1.0,,5.0,1.0,,1.0,1.0,,1.0,2.0,,
36,-1,20003,MIN,2010,CAR,53.0,C,CAR,4,3,1,18.0,F,12.0,12.0,6.0,10.0,16.0,17.0,32.0,9.0,8.0,1.0,1.0,10.0,13.0,7.0,6.0,5.0,3.0,15.0,15.0,3.0,7.0


In [484]:
dn.shape

(2030, 34)
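To see how many team-games in `dc` find a match in `dm`, the left merge can be run with `indicator=True`, which adds a `_merge` column; `validate='many_to_one'` additionally asserts that the right frame holds at most one row per season, game and team. A sketch on hypothetical stand-ins for `dc` and `dm`:

```python
import pandas as pd

# hypothetical stand-ins for dc (roster rank rows) and dm (per-team event totals)
dc_toy = pd.DataFrame({'Season': [2010, 2010, 2010],
                       'GameNumber': [20001, 20001, 20002],
                       'TeamCode': ['MTL', 'TOR', 'PHI']})
dm_toy = pd.DataFrame({'Season': [2010, 2010],
                       'GameNumber': [20001, 20001],
                       'TeamCode': ['MTL', 'TOR'],
                       'Hits_F': [11.0, 8.0]})

keys = ['Season', 'GameNumber', 'TeamCode']
# '_merge' is 'both' for matched rows and 'left_only' for dc rows with no dm totals
merged = pd.merge(dc_toy, dm_toy, on=keys, how='left',
                  indicator=True, validate='many_to_one')
print(merged['_merge'].value_counts())
```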

- Create indicator columns for a team win (`TeamWin`) and a team loss (`TeamLos`).

In [485]:
dn['TeamWin'] = dn.apply(lambda x: 1 if x['TeamCode'] == x['WinTeam'] else 0, axis=1)
dn['TeamLos'] = dn.apply(lambda x: 1 if x['TeamCode'] != x['WinTeam'] else 0, axis=1)

- Compute games played, games won, games lost, and season totals of all on-ice events for and against, by team.

In [486]:
dn['GP'] = dn.groupby(['Season','TeamCode'])['GameNumber'].transform('count')
dn['GW'] = dn.groupby(['Season','WinTeam'])['TeamWin'].transform('sum')
dn['GL'] = dn.groupby(['Season','LossTeam'])['TeamLos'].transform('sum')
dn['GF'] = dn.groupby(['Season','TeamCode'])['GF'].transform('sum')
dn['GA'] = dn.groupby(['Season','TeamCode'])['GA'].transform('sum')
dn['Blocks_F'] = dn.groupby(['Season','TeamCode'])['Blocks_F'].transform('sum')
dn['Faceoffs_F'] = dn.groupby(['Season','TeamCode'])['Faceoffs_F'].transform('sum')
dn['Giveaways_F'] = dn.groupby(['Season','TeamCode'])['Giveaways_F'].transform('sum')
dn['Goals_F'] = dn.groupby(['Season','TeamCode'])['Goals_F'].transform('sum')
dn['Hits_F'] = dn.groupby(['Season','TeamCode'])['Hits_F'].transform('sum')
dn['Miss_F'] = dn.groupby(['Season','TeamCode'])['Miss_F'].transform('sum')
dn['Penalties_F'] = dn.groupby(['Season','TeamCode'])['Penalties_F'].transform('sum')
dn['Shots_F'] = dn.groupby(['Season','TeamCode'])['Shots_F'].transform('sum')
dn['Takeaways_F'] = dn.groupby(['Season','TeamCode'])['Takeaways_F'].transform('sum')
dn['Blocks_A'] = dn.groupby(['Season','TeamCode'])['Blocks_A'].transform('sum') 
dn['Faceoffs_A'] = dn.groupby(['Season','TeamCode'])['Faceoffs_A'].transform('sum')
dn['Giveaways_A'] = dn.groupby(['Season','TeamCode'])['Giveaways_A'].transform('sum')
dn['Goals_A'] = dn.groupby(['Season','TeamCode'])['Goals_A'].transform('sum')
dn['Hits_A'] = dn.groupby(['Season','TeamCode'])['Hits_A'].transform('sum')
dn['Miss_A'] = dn.groupby(['Season','TeamCode'])['Miss_A'].transform('sum')
dn['Penalties_A'] = dn.groupby(['Season','TeamCode'])['Penalties_A'].transform('sum')
dn['Shots_A'] = dn.groupby(['Season','TeamCode'])['Shots_A'].transform('sum')
dn['Takeaways_A'] = dn.groupby(['Season','TeamCode'])['Takeaways_A'].transform('sum')
dn.head()

Unnamed: 0,GD,GameNumber,LossTeam,Season,WinTeam,PlayerNumber,PlayerPosition,TeamCode,GF,GA,Rank,RosterCount,Position,PositionCount,FCount,DCount,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A,TeamWin,TeamLos,GP,GW,GL
0,1,20001,MTL,2010,TOR,11.0,C,MTL,188,181,2,18.0,F,12.0,12.0,6.0,226.0,277.0,515.0,469.0,177.0,150.0,73.0,69.0,450.0,513.0,237.0,188.0,93.0,95.0,563.0,449.0,112.0,95.0,0,1,68,34,31
18270,1,20001,MTL,2010,TOR,37.0,C,TOR,195,218,2,18.0,F,12.0,12.0,6.0,360.0,339.0,580.0,536.0,265.0,218.0,85.0,85.0,630.0,602.0,251.0,313.0,105.0,104.0,487.0,632.0,157.0,157.0,1,0,70,34,31
18,-1,20002,PIT,2010,PHI,17.0,C,PHI,235,207,1,18.0,F,12.0,12.0,6.0,406.0,364.0,635.0,685.0,206.0,178.0,97.0,97.0,648.0,617.0,266.0,299.0,119.0,111.0,671.0,604.0,180.0,181.0,1,0,72,41,31
18288,-1,20002,PIT,2010,PHI,71.0,C,PIT,204,180,1,18.0,F,12.0,12.0,6.0,252.0,251.0,488.0,482.0,111.0,117.0,79.0,75.0,616.0,576.0,203.0,168.0,108.0,100.0,517.0,493.0,95.0,103.0,0,1,71,41,31
36,-1,20003,MIN,2010,CAR,53.0,C,CAR,222,218,1,18.0,F,12.0,12.0,6.0,337.0,371.0,596.0,724.0,154.0,208.0,98.0,93.0,721.0,543.0,281.0,333.0,84.0,109.0,607.0,708.0,191.0,168.0,1,0,76,38,35


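- Compute each team's wins and losses for the season. `GW` and `GL` were summed by winning and losing team, so a row carries its own team's count only when the team matches `WinTeam` (for wins) or `LossTeam` (for losses); otherwise the count is recovered as games played minus the other total.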

In [487]:
dn['L'] = dn.apply(lambda x: x['GL'] if x['TeamCode'] == x['LossTeam'] else (x['GP'] - x['GW']), axis=1)
dn['W'] = dn.apply(lambda x: x['GW'] if x['TeamCode'] == x['WinTeam'] else (x['GP'] - x['GL']), axis=1)
dn.head()

Unnamed: 0,GD,GameNumber,LossTeam,Season,WinTeam,PlayerNumber,PlayerPosition,TeamCode,GF,GA,Rank,RosterCount,Position,PositionCount,FCount,DCount,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A,TeamWin,TeamLos,GP,GW,GL,L,W
0,1,20001,MTL,2010,TOR,11.0,C,MTL,188,181,2,18.0,F,12.0,12.0,6.0,226.0,277.0,515.0,469.0,177.0,150.0,73.0,69.0,450.0,513.0,237.0,188.0,93.0,95.0,563.0,449.0,112.0,95.0,0,1,68,34,31,31,37
18270,1,20001,MTL,2010,TOR,37.0,C,TOR,195,218,2,18.0,F,12.0,12.0,6.0,360.0,339.0,580.0,536.0,265.0,218.0,85.0,85.0,630.0,602.0,251.0,313.0,105.0,104.0,487.0,632.0,157.0,157.0,1,0,70,34,31,36,34
18,-1,20002,PIT,2010,PHI,17.0,C,PHI,235,207,1,18.0,F,12.0,12.0,6.0,406.0,364.0,635.0,685.0,206.0,178.0,97.0,97.0,648.0,617.0,266.0,299.0,119.0,111.0,671.0,604.0,180.0,181.0,1,0,72,41,31,31,41
18288,-1,20002,PIT,2010,PHI,71.0,C,PIT,204,180,1,18.0,F,12.0,12.0,6.0,252.0,251.0,488.0,482.0,111.0,117.0,79.0,75.0,616.0,576.0,203.0,168.0,108.0,100.0,517.0,493.0,95.0,103.0,0,1,71,41,31,31,40
36,-1,20003,MIN,2010,CAR,53.0,C,CAR,222,218,1,18.0,F,12.0,12.0,6.0,337.0,371.0,596.0,724.0,154.0,208.0,98.0,93.0,721.0,543.0,281.0,333.0,84.0,109.0,607.0,708.0,191.0,168.0,1,0,76,38,35,38,38
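Because every team-game row is either a win or a loss for that team, the same season records could also be obtained by grouping on `TeamCode` alone and summing `TeamWin`. A sketch on hypothetical rows, offered as an alternative rather than as the notebook's method:

```python
import pandas as pd

# hypothetical team-game rows (three games: MTL-TOR, MTL-BOS, TOR-BOS)
games = pd.DataFrame({
    'Season':   [2010] * 6,
    'TeamCode': ['MTL', 'TOR', 'MTL', 'BOS', 'TOR', 'BOS'],
    'WinTeam':  ['TOR', 'TOR', 'MTL', 'MTL', 'TOR', 'TOR'],
})
games['TeamWin'] = (games['TeamCode'] == games['WinTeam']).astype(int)

# games played is the row count, wins the sum of the indicator, losses the remainder
record = games.groupby(['Season', 'TeamCode']).agg(GP=('TeamWin', 'size'),
                                                   W=('TeamWin', 'sum'))
record['L'] = record['GP'] - record['W']
print(record)
```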


- Divide wins and losses by games played to get each team's winning and losing percentage. Divide each on-ice event total by the number of games the team played to get the team's mean per game for the season.

In [488]:
dn = dn.drop_duplicates(['Season', 'TeamCode'])
dn['WinPc'] = dn['W']/ dn['GP']
dn['LossPc'] = dn['L']/ dn['GP']
dn['Mean_Blocks_F'] = dn['Blocks_F']/ dn['GP']
dn['Mean_Faceoffs_F'] = dn['Faceoffs_F']/ dn['GP']
dn['Mean_Giveaways_F'] = dn['Giveaways_F']/ dn['GP']
dn['Mean_Goals_F'] = dn['Goals_F']/ dn['GP']
dn['Mean_Hits_F'] = dn['Hits_F']/ dn['GP']
dn['Mean_Miss_F'] = dn['Miss_F']/ dn['GP']
dn['Mean_Penalties_F'] = dn['Penalties_F']/ dn['GP']
dn['Mean_Shots_F'] = dn['Shots_F']/ dn['GP']
dn['Mean_Takeaways_F'] = dn['Takeaways_F']/ dn['GP']
dn['Mean_Blocks_A'] = dn['Blocks_A']/ dn['GP']
dn['Mean_Faceoffs_A'] = dn['Faceoffs_A']/ dn['GP']
dn['Mean_Giveaways_A'] = dn['Giveaways_A']/ dn['GP']
dn['Mean_Goals_A'] = dn['Goals_A']/ dn['GP']
dn['Mean_Hits_A'] = dn['Hits_A']/ dn['GP']
dn['Mean_Miss_A'] = dn['Miss_A']/ dn['GP']
dn['Mean_Penalties_A'] = dn['Penalties_A']/ dn['GP']
dn['Mean_Shots_A'] = dn['Shots_A']/ dn['GP']
dn['Mean_Takeaways_A'] = dn['Takeaways_A']/ dn['GP']
dn.head()

Unnamed: 0,GD,GameNumber,LossTeam,Season,WinTeam,PlayerNumber,PlayerPosition,TeamCode,GF,GA,Rank,RosterCount,Position,PositionCount,FCount,DCount,Blocks_F,Blocks_A,Faceoffs_F,Faceoffs_A,Giveaways_F,Giveaways_A,Goals_F,Goals_A,Hits_F,Hits_A,Miss_F,Miss_A,Penalties_F,Penalties_A,Shots_F,Shots_A,Takeaways_F,Takeaways_A,TeamWin,TeamLos,GP,GW,GL,L,W,WinPc,LossPc,Mean_Blocks_F,Mean_Faceoffs_F,Mean_Giveaways_F,Mean_Goals_F,Mean_Hits_F,Mean_Miss_F,Mean_Penalties_F,Mean_Shots_F,Mean_Takeaways_F,Mean_Blocks_A,Mean_Faceoffs_A,Mean_Giveaways_A,Mean_Goals_A,Mean_Hits_A,Mean_Miss_A,Mean_Penalties_A,Mean_Shots_A,Mean_Takeaways_A
0,1,20001,MTL,2010,TOR,11.0,C,MTL,188,181,2,18.0,F,12.0,12.0,6.0,226.0,277.0,515.0,469.0,177.0,150.0,73.0,69.0,450.0,513.0,237.0,188.0,93.0,95.0,563.0,449.0,112.0,95.0,0,1,68,34,31,31,37,0.544118,0.455882,3.323529,7.573529,2.602941,1.073529,6.617647,3.485294,1.367647,8.279412,1.647059,4.073529,6.897059,2.205882,1.014706,7.544118,2.764706,1.397059,6.602941,1.397059
18270,1,20001,MTL,2010,TOR,37.0,C,TOR,195,218,2,18.0,F,12.0,12.0,6.0,360.0,339.0,580.0,536.0,265.0,218.0,85.0,85.0,630.0,602.0,251.0,313.0,105.0,104.0,487.0,632.0,157.0,157.0,1,0,70,34,31,36,34,0.485714,0.514286,5.142857,8.285714,3.785714,1.214286,9.0,3.585714,1.5,6.957143,2.242857,4.842857,7.657143,3.114286,1.214286,8.6,4.471429,1.485714,9.028571,2.242857
18,-1,20002,PIT,2010,PHI,17.0,C,PHI,235,207,1,18.0,F,12.0,12.0,6.0,406.0,364.0,635.0,685.0,206.0,178.0,97.0,97.0,648.0,617.0,266.0,299.0,119.0,111.0,671.0,604.0,180.0,181.0,1,0,72,41,31,31,41,0.569444,0.430556,5.638889,8.819444,2.861111,1.347222,9.0,3.694444,1.652778,9.319444,2.5,5.055556,9.513889,2.472222,1.347222,8.569444,4.152778,1.541667,8.388889,2.513889
18288,-1,20002,PIT,2010,PHI,71.0,C,PIT,204,180,1,18.0,F,12.0,12.0,6.0,252.0,251.0,488.0,482.0,111.0,117.0,79.0,75.0,616.0,576.0,203.0,168.0,108.0,100.0,517.0,493.0,95.0,103.0,0,1,71,41,31,31,40,0.56338,0.43662,3.549296,6.873239,1.56338,1.112676,8.676056,2.859155,1.521127,7.28169,1.338028,3.535211,6.788732,1.647887,1.056338,8.112676,2.366197,1.408451,6.943662,1.450704
36,-1,20003,MIN,2010,CAR,53.0,C,CAR,222,218,1,18.0,F,12.0,12.0,6.0,337.0,371.0,596.0,724.0,154.0,208.0,98.0,93.0,721.0,543.0,281.0,333.0,84.0,109.0,607.0,708.0,191.0,168.0,1,0,76,38,35,38,38,0.5,0.5,4.434211,7.842105,2.026316,1.289474,9.486842,3.697368,1.105263,7.986842,2.513158,4.881579,9.526316,2.736842,1.223684,7.144737,4.381579,1.434211,9.315789,2.210526
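The per-game means can be computed for all event columns at once with `DataFrame.div(..., axis=0)`, which divides each row by that row's games played. A sketch using the MTL and TOR season totals shown in the output above (only two event columns, for brevity):

```python
import pandas as pd

# season totals taken from the MTL and TOR rows above
totals = pd.DataFrame({'TeamCode': ['MTL', 'TOR'],
                       'GP': [68, 70],
                       'Hits_F': [450.0, 630.0],
                       'Shots_F': [563.0, 487.0]})

event_cols = ['Hits_F', 'Shots_F']
# divide every event column by games played, row by row, and prefix the new names
means = totals[event_cols].div(totals['GP'], axis=0).add_prefix('Mean_')
totals = totals.join(means)
print(totals)
```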


In [489]:
# .copy() makes dn an independent frame, so adding Rank_W below does not raise SettingWithCopyWarning
dn = dn[['Season', 'TeamCode', 'GP', 'W', 'L','WinPc', 'LossPc', 'Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Goals_F','Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F','Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A','Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Goals_A','Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A','Mean_Shots_A', 'Mean_Takeaways_A']].copy()
dn['Rank_W'] = dn.groupby(['Season'])['WinPc'].rank(ascending=False)
dn = dn.sort_values(['Season', 'Rank_W'], ascending=[True, True])
dn.head(30)

Unnamed: 0,Season,TeamCode,GP,W,L,WinPc,LossPc,Mean_Blocks_F,Mean_Faceoffs_F,Mean_Giveaways_F,Mean_Goals_F,Mean_Hits_F,Mean_Miss_F,Mean_Penalties_F,Mean_Shots_F,Mean_Takeaways_F,Mean_Blocks_A,Mean_Faceoffs_A,Mean_Giveaways_A,Mean_Goals_A,Mean_Hits_A,Mean_Miss_A,Mean_Penalties_A,Mean_Shots_A,Mean_Takeaways_A,Rank_W
18576,2010,VAN,73,48,25,0.657534,0.342466,3.767123,8.753425,2.013699,1.205479,7.835616,3.493151,1.383562,8.246575,2.109589,4.369863,7.013699,2.068493,1.09589,7.342466,3.164384,1.356164,7.60274,2.054795,1.0
90,2010,SJ,65,41,24,0.630769,0.369231,4.246154,8.569231,3.092308,1.230769,8.261538,3.446154,1.292308,8.861538,2.369231,4.6,7.492308,3.092308,1.092308,9.076923,3.276923,1.461538,8.246154,2.169231,2.0
18432,2010,BOS,76,45,31,0.592105,0.407895,4.526316,8.513158,2.078947,1.210526,7.421053,3.342105,1.447368,8.565789,1.578947,4.539474,7.973684,2.815789,1.026316,8.197368,3.447368,1.447368,9.078947,2.276316,3.0
18396,2010,DET,68,40,28,0.588235,0.411765,3.191176,9.367647,3.029412,1.426471,8.088235,3.779412,1.220588,9.676471,2.367647,4.058824,8.397059,2.382353,1.382353,8.691176,3.191176,1.279412,8.470588,2.161765,4.0
126,2010,ANA,65,38,27,0.584615,0.415385,4.292308,6.892308,2.353846,1.153846,7.846154,2.892308,1.369231,6.876923,1.430769,2.784615,8.0,2.753846,1.230769,7.415385,3.661538,1.215385,8.815385,1.923077,5.0
18720,2010,WSH,72,42,30,0.583333,0.416667,4.236111,7.569444,2.416667,1.069444,6.555556,3.069444,1.458333,7.652778,2.041667,4.166667,7.819444,2.291667,1.097222,7.597222,3.277778,1.208333,7.152778,2.208333,6.0
306,2010,LA,70,40,30,0.571429,0.428571,3.828571,8.357143,2.885714,1.285714,9.3,3.614286,1.285714,7.414286,1.457143,4.371429,7.914286,2.6,1.3,10.442857,3.528571,1.6,7.6,1.771429,7.0
18,2010,PHI,72,41,31,0.569444,0.430556,5.638889,8.819444,2.861111,1.347222,9.0,3.694444,1.652778,9.319444,2.5,5.055556,9.513889,2.472222,1.347222,8.569444,4.152778,1.541667,8.388889,2.513889,8.0
18288,2010,PIT,71,40,31,0.56338,0.43662,3.549296,6.873239,1.56338,1.112676,8.676056,2.859155,1.521127,7.28169,1.338028,3.535211,6.788732,1.647887,1.056338,8.112676,2.366197,1.408451,6.943662,1.450704,9.0
180,2010,NYR,73,41,32,0.561644,0.438356,4.054795,7.520548,1.246575,1.191781,9.136986,3.246575,1.424658,7.780822,1.876712,3.69863,7.69863,1.90411,1.150685,8.616438,2.849315,1.438356,7.657534,2.178082,10.0
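`rank(ascending=False)` gives the best winning percentage rank 1. With its default `method='average'`, teams with equal `WinPc` share the mean of the ranks they span, which is why `Rank_W` is a float. A toy illustration with hypothetical teams; `method='min'` is shown as one alternative that gives tied teams the same whole-number rank:

```python
import pandas as pd

win_pc = pd.Series([0.66, 0.60, 0.60, 0.50], index=['A', 'B', 'C', 'D'])  # hypothetical teams

avg_rank = win_pc.rank(ascending=False).tolist()              # [1.0, 2.5, 2.5, 4.0]
min_rank = win_pc.rank(ascending=False, method='min').tolist()  # [1.0, 2.0, 2.0, 4.0]
print(avg_rank, min_rank)
```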


In [490]:
# index=False must be a boolean: the string 'False' is truthy, so the row index would be written as an extra column
dn.to_csv('/Users/stefanostselios/Brock University/Kevin Mongeon - StephanosShare/out/season_team_even_strength_events_with_goal_differential_ranking.csv', index=False, sep=',')
#dn.to_csv('/Users/kevinmongeon/Brock University/Steve Tselios - StephanosShare/out/season_team_even_strength_events_with_goal_differential_ranking.csv', index=False, sep=',')

- Compute the difference between each team's mean on-ice events for and against (for minus against).

In [491]:
dn['DBlock'] = dn['Mean_Blocks_F'] - dn['Mean_Blocks_A']
dn['DFaceoff'] = dn['Mean_Faceoffs_F'] - dn['Mean_Faceoffs_A']
dn['DGiveaway'] = dn['Mean_Giveaways_F'] - dn['Mean_Giveaways_A']
dn['DGoal'] = dn['Mean_Goals_F'] - dn['Mean_Goals_A']
dn['DHit'] = dn['Mean_Hits_F'] - dn['Mean_Hits_A']
dn['DMiss'] = dn['Mean_Miss_F'] - dn['Mean_Miss_A']
dn['DPenalty'] = dn['Mean_Penalties_F'] - dn['Mean_Penalties_A']
dn['DShot'] = dn['Mean_Shots_F'] - dn['Mean_Shots_A']
dn['DTakeaway'] = dn['Mean_Takeaways_F'] - dn['Mean_Takeaways_A']
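The nine difference columns follow one pattern, so they could also be built in a loop. A sketch with a hypothetical helper `add_differences`, whose mapping reproduces the column names used above; it is applied here to made-up means for two events only:

```python
import pandas as pd

# event stem in the Mean_<stem>_F / Mean_<stem>_A columns -> difference column name
DIFF_NAMES = {'Blocks': 'DBlock', 'Faceoffs': 'DFaceoff', 'Giveaways': 'DGiveaway',
              'Goals': 'DGoal', 'Hits': 'DHit', 'Miss': 'DMiss',
              'Penalties': 'DPenalty', 'Shots': 'DShot', 'Takeaways': 'DTakeaway'}

def add_differences(df, names=DIFF_NAMES):
    """Return a copy of df with one 'for minus against' column per event stem."""
    out = df.copy()
    for stem, name in names.items():
        out[name] = out[f'Mean_{stem}_F'] - out[f'Mean_{stem}_A']
    return out

# hypothetical per-game means for one team
dn_toy = pd.DataFrame({'Mean_Hits_F': [6.6], 'Mean_Hits_A': [7.5],
                       'Mean_Shots_F': [8.3], 'Mean_Shots_A': [6.6]})
out = add_differences(dn_toy, {'Hits': 'DHit', 'Shots': 'DShot'})
print(out)
```

On the full frame this would be called as `dn = add_differences(dn)`.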

## analysis of even strength on-ice events prior to a goal while the score differential is between -1 and 1

- summary statistics

In [492]:
dn.describe()

Unnamed: 0,Season,GP,W,L,WinPc,LossPc,Mean_Blocks_F,Mean_Faceoffs_F,Mean_Giveaways_F,Mean_Goals_F,Mean_Hits_F,Mean_Miss_F,Mean_Penalties_F,Mean_Shots_F,Mean_Takeaways_F,Mean_Blocks_A,Mean_Faceoffs_A,Mean_Giveaways_A,Mean_Goals_A,Mean_Hits_A,Mean_Miss_A,Mean_Penalties_A,Mean_Shots_A,Mean_Takeaways_A,Rank_W,DBlock,DFaceoff,DGiveaway,DGoal,DHit,DMiss,DPenalty,DShot,DTakeaway
count,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0
mean,2010.0,67.666667,33.833333,33.833333,0.498496,0.501504,4.073572,7.915885,2.367745,1.199633,7.707225,3.302848,1.319632,7.944041,2.128933,4.056366,7.924389,2.357359,1.205708,7.764463,3.303136,1.33139,7.915178,2.122033,15.5,0.017206,-0.008504,0.010386,-0.006075,-0.057237,-0.000288,-0.011758,0.028863,0.0069
std,0.0,8.052985,7.991734,7.240134,0.09448,0.09448,0.694948,0.906739,0.612697,0.115276,1.242109,0.417519,0.226229,1.042616,0.580212,0.580092,0.914749,0.535167,0.131355,1.310589,0.526499,0.191838,0.931039,0.457015,8.802429,0.545656,0.848124,0.425783,0.102781,1.080253,0.444565,0.201901,1.008339,0.312178
min,2010.0,38.0,18.0,20.0,0.296875,0.342466,2.343284,5.455882,1.246575,0.925373,4.80597,2.089552,0.846154,5.910448,1.338028,2.597015,5.522388,1.164179,0.970149,4.38806,1.940299,0.888889,5.268657,1.397059,1.0,-0.867647,-2.514706,-0.736842,-0.236842,-2.307692,-0.885714,-0.461538,-2.071429,-0.697368
25%,2010.0,66.25,28.25,30.0,0.438263,0.432072,3.75,7.570466,1.941746,1.137146,6.695,3.079861,1.218043,7.298073,1.630515,3.715461,7.521405,1.948941,1.098279,7.068327,3.07679,1.224038,7.591176,1.810315,8.25,-0.423988,-0.25,-0.357993,-0.051194,-0.898699,-0.338848,-0.055481,-0.524718,-0.196078
50%,2010.0,70.0,36.5,31.5,0.520833,0.479167,4.013508,8.117089,2.408333,1.212406,7.746154,3.342105,1.329977,7.97827,2.148128,4.066176,7.942437,2.363051,1.218985,7.57067,3.27735,1.342145,7.788257,2.165498,15.5,-0.013621,0.028083,0.028169,0.0,0.265625,0.082254,-0.014312,0.266469,0.055969
75%,2010.0,72.0,40.0,37.5,0.567928,0.561737,4.354327,8.546147,2.828932,1.276533,8.572427,3.579697,1.467525,8.561009,2.368835,4.497462,8.360755,2.75,1.294231,8.594737,3.490385,1.447368,8.469608,2.325658,22.75,0.358572,0.51532,0.311842,0.053283,0.491391,0.337192,0.113883,0.643324,0.204412
max,2010.0,76.0,48.0,47.0,0.657534,0.703125,5.638889,9.367647,3.785714,1.426471,10.328767,4.1,1.8,10.02,3.98,5.055556,10.1,3.338235,1.46,10.442857,4.471429,1.685714,9.62,3.16,30.0,1.507692,1.739726,0.701493,0.184211,2.520548,0.720588,0.338028,1.676471,0.82


#### $WinPc = \beta_{0} + \beta_{1}MeanBlocks_F + \beta_{2}MeanFaceoffs_F + \beta_{3}MeanGiveaways_F + \beta_{4}MeanGoals_F + \beta_{5}MeanHits_F + \beta_{6}MeanMiss_F + \beta_{7}MeanPenalties_F + \beta_{8}MeanShots_F + \beta_{9}MeanTakeaways_F + \beta_{10}MeanBlocks_A + \beta_{11}MeanFaceoffs_A + \beta_{12}MeanGiveaways_A + \beta_{13}MeanGoals_A + \beta_{14}MeanHits_A + \beta_{15}MeanMiss_A + \beta_{16}MeanPenalties_A + \beta_{17}MeanShots_A + \beta_{18}MeanTakeaways_A + e_{s}$

In [493]:
print('win percent in even strength events while score differential is between -1 and 1')
y = dn['WinPc']
X = sm.add_constant(dn[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Goals_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Goals_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
result = sm.OLS(y, X).fit()
print(result.summary())

win percent in even strength events while score differential is between -1 and 1
                            OLS Regression Results                            
Dep. Variable:                  WinPc   R-squared:                       0.654
Model:                            OLS   Adj. R-squared:                  0.089
Method:                 Least Squares   F-statistic:                     1.157
Date:                Fri, 19 Jan 2018   Prob (F-statistic):              0.414
Time:                        20:33:30   Log-Likelihood:                 44.653
No. Observations:                  30   AIC:                            -51.31
Df Residuals:                      11   BIC:                            -24.68
Df Model:                          18                                         
Covariance Type:            nonrobust                                         
                       coef    std err          t      P>|t|      [95.0% Conf. Int.]
-------------------------------------------------
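The gap between R-squared (0.654) and adjusted R-squared (0.089) follows from fitting 18 predictors plus a constant to 30 teams, which leaves 11 residual degrees of freedom. Adjusted R-squared is $1 - (1 - R^2)\frac{n - 1}{n - k - 1}$ for $n$ observations and $k$ predictors; plugging in the reported, rounded R-squared values reproduces the tables up to rounding:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors (constant excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# 18 predictors: about 0.088 from the rounded R-squared (the table reports 0.089)
print(round(adjusted_r2(0.654, n=30, k=18), 3))
# 9 difference predictors: about 0.290
print(round(adjusted_r2(0.510, n=30, k=9), 3))
```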

In [494]:
print('win percent in even strength events while score differential is between -1 and 1')
y = dn['WinPc']
X = sm.add_constant(dn[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Goals_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Goals_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
result = sm.Logit(y, X).fit()
print(result.summary())

win percent in even strength events while score differential is between -1 and 1
Optimization terminated successfully.
         Current function value: 0.670841
         Iterations 4
                           Logit Regression Results                           
Dep. Variable:                  WinPc   No. Observations:                   30
Model:                          Logit   Df Residuals:                       11
Method:                           MLE   Df Model:                           18
Date:                Fri, 19 Jan 2018   Pseudo R-squ.:                 0.03217
Time:                        20:33:30   Log-Likelihood:                -20.125
converged:                       True   LL-Null:                       -20.794
                                        LLR p-value:                     1.000
                       coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------------
const               
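The Logit output's `Pseudo R-squ.` is McFadden's pseudo R-squared, $1 - \ell / \ell_0$, where $\ell$ is the model log-likelihood and $\ell_0$ the intercept-only `LL-Null`; the log-likelihood itself equals $-n$ times the reported final function value. Checking that arithmetic against the numbers reported above:

```python
def mcfadden_r2(llf, llnull):
    """McFadden pseudo R-squared: 1 - llf / llnull."""
    return 1 - llf / llnull

n = 30
# the reported 'Current function value' is the mean negative log-likelihood
llf = -n * 0.670841
print(round(llf, 3))                              # -20.125, matching Log-Likelihood
print(round(mcfadden_r2(-20.125, -20.794), 4))    # 0.0322, matching Pseudo R-squ.
```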

#### $WinPc = \beta_{0} + \beta_{1}DBlock + \beta_{2}DFaceoff + \beta_{3}DGiveaway + \beta_{4}DGoal + \beta_{5}DHit + \beta_{6}DMiss+ \beta_{7}DPenalty + \beta_{8}DShot + \beta_{9}DTakeaway + e_{s}$

In [495]:
print('win percent vs. differentials in even strength events while score differential is between -1 and 1')
y = dn['WinPc']
X = sm.add_constant(dn[['DBlock', 'DFaceoff', 'DGiveaway', 'DGoal', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
result = sm.OLS(y, X).fit()
print(result.summary())

win percent vs. differentials in even strength events while score differential is between -1 and 1
                            OLS Regression Results                            
Dep. Variable:                  WinPc   R-squared:                       0.510
Model:                            OLS   Adj. R-squared:                  0.290
Method:                 Least Squares   F-statistic:                     2.315
Date:                Fri, 19 Jan 2018   Prob (F-statistic):             0.0566
Time:                        20:33:30   Log-Likelihood:                 39.429
No. Observations:                  30   AIC:                            -58.86
Df Residuals:                      20   BIC:                            -44.85
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
---------------------------------------
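The F-statistic tests whether all slope coefficients are jointly zero and can be recovered from R-squared as $F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)}$. For the differences model (9 predictors, 30 teams) the reported R-squared gives back the reported F of 2.315 up to rounding:

```python
def f_from_r2(r2, n, k):
    """Overall regression F-statistic from R-squared, n observations, k predictors."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

print(round(f_from_r2(0.510, n=30, k=9), 2))   # about 2.31 (table: 2.315)
print(round(f_from_r2(0.654, n=30, k=18), 2))  # about 1.16 (table: 1.157)
```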

In [496]:
print('win percent vs. differentials in even strength events while score differential is between -1 and 1')
y = dn['WinPc']
X = sm.add_constant(dn[['DBlock', 'DFaceoff', 'DGiveaway', 'DGoal', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
result = sm.Logit(y, X).fit()
print(result.summary())

win percent vs. differentials in even strength events while score differential is between -1 and 1
Optimization terminated successfully.
         Current function value: 0.675724
         Iterations 4
                           Logit Regression Results                           
Dep. Variable:                  WinPc   No. Observations:                   30
Model:                          Logit   Df Residuals:                       20
Method:                           MLE   Df Model:                            9
Date:                Fri, 19 Jan 2018   Pseudo R-squ.:                 0.02512
Time:                        20:33:30   Log-Likelihood:                -20.272
converged:                       True   LL-Null:                       -20.794
                                        LLR p-value:                    0.9993
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const         -0

#### mean goals for analysis

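- Regress **mean goals for** on the means of the other on-ice events (predictor variables), adding a constant to the predictors. The purpose is to determine the impact each on-ice event has on goals scored. **OLS** is used; the **Logit** cell is commented out, since Logit applies only to a dependent variable between 0 and 1 and mean goals for per game exceeds 1 for most teams.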
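A minimal guard that could sit in front of a Logit fit, written as a hypothetical helper `in_unit_interval`: it reports whether every value of the dependent variable lies in [0, 1], which holds for `WinPc` but not for `Mean_Goals_F`.

```python
import pandas as pd

def in_unit_interval(y):
    """Return True if every non-missing value of y lies in [0, 1]."""
    y = pd.Series(y).dropna()
    return bool(y.between(0, 1).all())

# hypothetical values resembling WinPc and Mean_Goals_F
print(in_unit_interval([0.30, 0.66]), in_unit_interval([0.93, 1.43]))
```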

#### $MeanGoals_F = \beta_{0} + \beta_{1}MeanBlocks_F + \beta_{2}MeanFaceoffs_F + \beta_{3}MeanGiveaways_F + \beta_{4}MeanHits_F + \beta_{5}MeanMiss_F + \beta_{6}MeanPenalties_F + \beta_{7}MeanShots_F + \beta_{8}MeanTakeaways_F + \beta_{9}MeanBlocks_A + \beta_{10}MeanFaceoffs_A + \beta_{11}MeanGiveaways_A + \beta_{12}MeanHits_A + \beta_{13}MeanMiss_A + \beta_{14}MeanPenalties_A + \beta_{15}MeanShots_A + \beta_{16}MeanTakeaways_A + e_{s}$

In [497]:
print('mean goals for in even strength events while score differential is between -1 and 1')
y = dn['Mean_Goals_F']
X = sm.add_constant(dn[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
result = sm.OLS(y, X).fit()
print(result.summary())

mean goals for in even strength events while score differential is between -1 and 1
                            OLS Regression Results                            
Dep. Variable:           Mean_Goals_F   R-squared:                       0.884
Model:                            OLS   Adj. R-squared:                  0.742
Method:                 Least Squares   F-statistic:                     6.213
Date:                Fri, 19 Jan 2018   Prob (F-statistic):           0.000956
Time:                        20:33:30   Log-Likelihood:                 55.111
No. Observations:                  30   AIC:                            -76.22
Df Residuals:                      13   BIC:                            -52.40
Df Model:                          16                                         
Covariance Type:            nonrobust                                         
                       coef    std err          t      P>|t|      [95.0% Conf. Int.]
----------------------------------------------
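This model has 30 observations and 16 event-mean predictors, so only 30 − 16 − 1 = 13 residual degrees of freedom remain. That is why the adjusted R-squared (0.742) sits well below the raw R-squared (0.884). The standard adjustment reproduces the summary's figure from the rounded values printed above. The function name `adjusted_r2` is illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors (constant excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n, k = 30, 16          # observations and event-mean predictors in this model
print(n - k - 1)       # residual degrees of freedom
print(round(adjusted_r2(0.884, n, k), 3))
```

The result is about 0.741, which matches the reported 0.742 up to the rounding of R-squared. Applying the same function to the 8-predictor differential model below (R-squared 0.224) gives about −0.072, as that summary reports. This suggests that, once the predictor penalty is applied, that specification does no better than an intercept alone.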

In [498]:
#y = dn['Mean_Goals_F']  
#X = sm.add_constant(dn[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
#result = sm.Logit(y, X).fit()
#result.summary()

#### $MeanGoals_F = \beta_{0} + \beta_{1}DBlock + \beta_{2}DFaceoff + \beta_{3}DGiveaway + \beta_{4}DHit + \beta_{5}DMiss+ \beta_{6}DPenalty + \beta_{7}DShot + \beta_{8}DTakeaway + e_{s}$
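The differential columns (`DBlock` through `DTakeaway`) are built earlier in the notebook, outside this section. The minimal pandas sketch below **assumes** that each one is the "for" mean minus the "against" mean of the same event. The `PAIRS` mapping, the `add_differentials` helper and the demo rows are hypothetical. Under that assumption, this model is the 16-predictor model with each "for" coefficient constrained to equal minus the matching "against" coefficient, because $\beta x_F - \beta x_A = \beta (x_F - x_A)$. That constraint halves the number of slopes from 16 to 8.

```python
import pandas as pd

# Hypothetical mapping from differential column to the event stem used in the Mean_* columns
PAIRS = {
    'DBlock': 'Blocks', 'DFaceoff': 'Faceoffs', 'DGiveaway': 'Giveaways',
    'DHit': 'Hits', 'DMiss': 'Miss', 'DPenalty': 'Penalties',
    'DShot': 'Shots', 'DTakeaway': 'Takeaways',
}

def add_differentials(df):
    """Add D* columns, ASSUMING D<event> = Mean_<event>_F - Mean_<event>_A."""
    out = df.copy()
    for dcol, ev in PAIRS.items():
        out[dcol] = out[f'Mean_{ev}_F'] - out[f'Mean_{ev}_A']
    return out

# Two hypothetical rows with every Mean_*_F / Mean_*_A column present
demo = pd.DataFrame({
    **{f'Mean_{ev}_F': [30.0, 28.0] for ev in PAIRS.values()},
    **{f'Mean_{ev}_A': [27.5, 29.0] for ev in PAIRS.values()},
})
demo = add_differentials(demo)
print(demo[['DShot']])
```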

In [499]:
print('mean goals for on event differentials in even strength events while score differential between -1 and 1')
# dependent variable: mean goals for; predictors: the 8 event differentials plus a constant
y = dn['Mean_Goals_F']
X = sm.add_constant(dn[['DBlock', 'DFaceoff', 'DGiveaway', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
result = sm.OLS(y, X).fit()
print(result.summary())

mean goals for on event differentials in even strength events while score differential between -1 and 1
                            OLS Regression Results                            
Dep. Variable:           Mean_Goals_F   R-squared:                       0.224
Model:                            OLS   Adj. R-squared:                 -0.072
Method:                 Least Squares   F-statistic:                    0.7568
Date:                Fri, 19 Jan 2018   Prob (F-statistic):              0.643
Time:                        20:33:30   Log-Likelihood:                 26.553
No. Observations:                  30   AIC:                            -35.11
Df Residuals:                      21   BIC:                            -22.50
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------
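The overall F-statistic tests whether all slopes are jointly zero. It can be recovered from R-squared as $F = \frac{R^2/k}{(1-R^2)/(n-k-1)}$. For the differential model above ($R^2 = 0.224$, $k = 8$, $n = 30$) this gives roughly 0.758, consistent with the reported 0.7568 and its p-value of 0.643. The helper name `f_statistic` is illustrative.

```python
def f_statistic(r2, n, k):
    """Overall OLS F-statistic for k slopes plus an intercept on n observations."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

print(round(f_statistic(0.224, 30, 8), 3))   # differential model, goals for
print(round(f_statistic(0.884, 30, 16), 3))  # 16-predictor model, goals for
```

The small gaps from the printed values come from R-squared being rounded to three decimals. To get the matching p-value, use `from scipy import stats` and then `stats.f.sf(F, k, n - k - 1)`.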

In [500]:
#y = dn['Mean_Goals_F']  
#X = sm.add_constant(dn[['DBlock', 'DFaceoff', 'DGiveaway', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
#result = sm.Logit(y, X).fit()
#result.summary()

#### mean goals against analysis

- regress **mean goals against** on the means of the on-ice events (predictor variables), adding a constant to the predictors and fitting with **OLS** and **Logit**. The purpose is to determine the impact of each on-ice event on goals against.

#### $MeanGoals_A = \beta_{0} + \beta_{1}MeanBlocks_F + \beta_{2}MeanFaceoffs_F + \beta_{3}MeanGiveaways_F + \beta_{4}MeanHits_F + \beta_{5}MeanMiss_F + \beta_{6}MeanPenalties_F + \beta_{7}MeanShots_F + \beta_{8}MeanTakeaways_F + \beta_{9}MeanBlocks_A + \beta_{10}MeanFaceoffs_A + \beta_{11}MeanGiveaways_A + \beta_{12}MeanHits_A + \beta_{13}MeanMiss_A + \beta_{14}MeanPenalties_A + \beta_{15}MeanShots_A + \beta_{16}MeanTakeaways_A + e_{s}$

In [501]:
print('mean goals against in even strength events while score differential between -1 and 1')
# dependent variable: mean goals against; predictors: the 16 event means (for and against) plus a constant
y = dn['Mean_Goals_A']
X = sm.add_constant(dn[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Hits_F',
                        'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F',
                        'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Hits_A',
                        'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
result = sm.OLS(y, X).fit()
print(result.summary())

mean goals against in even strength events while score differential between -1 and 1
                            OLS Regression Results                            
Dep. Variable:           Mean_Goals_A   R-squared:                       0.716
Model:                            OLS   Adj. R-squared:                  0.365
Method:                 Least Squares   F-statistic:                     2.044
Date:                Fri, 19 Jan 2018   Prob (F-statistic):             0.0997
Time:                        20:33:30   Log-Likelihood:                 37.693
No. Observations:                  30   AIC:                            -41.39
Df Residuals:                      13   BIC:                            -17.57
Df Model:                          16                                         
Covariance Type:            nonrobust                                         
                       coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------

In [502]:
#y = dn['Mean_Goals_A']  
#X = sm.add_constant(dn[['Mean_Blocks_F', 'Mean_Faceoffs_F', 'Mean_Giveaways_F', 'Mean_Hits_F', 'Mean_Miss_F', 'Mean_Penalties_F', 'Mean_Shots_F', 'Mean_Takeaways_F', 'Mean_Blocks_A', 'Mean_Faceoffs_A', 'Mean_Giveaways_A', 'Mean_Hits_A', 'Mean_Miss_A', 'Mean_Penalties_A', 'Mean_Shots_A', 'Mean_Takeaways_A']])
#result = sm.Logit(y, X).fit()
#result.summary()

#### $MeanGoals_A = \beta_{0} + \beta_{1}DBlock + \beta_{2}DFaceoff + \beta_{3}DGiveaway + \beta_{4}DHit + \beta_{5}DMiss+ \beta_{6}DPenalty + \beta_{7}DShot + \beta_{8}DTakeaway + e_{s}$

In [503]:
print('mean goals against on event differentials in even strength events while score differential between -1 and 1')
# dependent variable: mean goals against; predictors: the 8 event differentials plus a constant
y = dn['Mean_Goals_A']
X = sm.add_constant(dn[['DBlock', 'DFaceoff', 'DGiveaway', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
result = sm.OLS(y, X).fit()
print(result.summary())

mean goals against on event differentials in even strength events while score differential between -1 and 1
                            OLS Regression Results                            
Dep. Variable:           Mean_Goals_A   R-squared:                       0.285
Model:                            OLS   Adj. R-squared:                  0.012
Method:                 Least Squares   F-statistic:                     1.045
Date:                Fri, 19 Jan 2018   Prob (F-statistic):              0.436
Time:                        20:33:30   Log-Likelihood:                 23.861
No. Observations:                  30   AIC:                            -29.72
Df Residuals:                      21   BIC:                            -17.11
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
--------------------------------
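The two goals-against specifications can be compared with the information criteria in their summaries. For OLS, statsmodels reports AIC = 2p − 2·LL and BIC = p·ln(n) − 2·LL, where p counts the constant as a parameter. The sketch below plugs in the log-likelihoods printed above and reproduces both models' AIC and BIC. The helper name `aic_bic` is illustrative.

```python
import math

def aic_bic(llf, n_params, n_obs):
    """AIC and BIC from a log-likelihood; n_params includes the constant."""
    aic = 2 * n_params - 2 * llf
    bic = n_params * math.log(n_obs) - 2 * llf
    return aic, bic

# Log-likelihoods copied from the two goals-against summaries above
full = aic_bic(37.693, 17, 30)   # 16 event means + constant
diff = aic_bic(23.861, 9, 30)    # 8 differentials + constant
print([round(v, 2) for v in full], [round(v, 2) for v in diff])
```

Lower is better for both criteria. The 16-predictor model has the clearly lower AIC (about −41.39 vs −29.72). BIC, which charges ln(30) ≈ 3.4 per parameter instead of 2, leaves the two nearly tied (about −17.57 vs −17.11).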

In [504]:
#y = dn['Mean_Goals_A']  
#X = sm.add_constant(dn[['DBlock', 'DFaceoff', 'DGiveaway', 'DHit', 'DMiss', 'DPenalty', 'DShot', 'DTakeaway']])
#result = sm.Logit(y, X).fit()
#result.summary()