# purpose of notebook:

### a) generate a variable that will show the time difference between an event and a goal.
### b) keep only events that happened within the 20 seconds prior to a goal.
### c) group events by goal number to count the occurrence of each event prior to a goal.
### d) sum by event type to display the incidence of each event in two games.
### e) establish the impact of each event on a goal.
### f) determine if events have a positive or negative impact on each team.
### g) assign values to players based on their participation in events that led to a goal. 

## 1) import modules

In [253]:
import sys
import os
import pandas as pd
import numpy as np
import datetime, time
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols
from pylab import hist, show
import scipy

## 2) import data frame

In [254]:
dm = pd.read_csv('pbpmerge.csv')

## 3) drop unnamed column (irrelevant)

In [255]:
dm = dm.drop('Unnamed: 0', axis=1)

In [256]:
dm.columns

Index(['Season', 'GameNumber', 'EventNumber', 'Period', 'AdvantageType',
       'EventTimeFromZero', 'EventTimeFromTwenty', 'EventType', 'EventDetail',
       'VPlayer1', 'VPlayer1Position', 'VPlayer2', 'VPlayer2Position',
       'VPlayer3', 'VPlayer3Position', 'VPlayer4', 'VPlayer4Position',
       'VPlayer5', 'VPlayer5Position', 'VPlayer6', 'VPlayer6Position',
       'HPlayer1', 'HPlayer1Position', 'HPlayer2', 'HPlayer2Position',
       'HPlayer3', 'HPlayer3Position', 'HPlayer4', 'HPlayer4Position',
       'HPlayer5', 'HPlayer5Position', 'HPlayer6', 'HPlayer6Position',
       'TeamCode', 'PlayerNumber', 'PlayerName', 'ShotType', 'Zone', 'Length',
       'ShotResult', 'ShotTeamCode', 'ShotPlayerNumber', 'ShotPlayerName',
       'WinTeamCode', 'VTeamCode', 'VPlayerNumber', 'VPlayerName', 'HTeamCode',
       'HPlayerNumber', 'HPlayerName', 'HitterTeamCode', 'HitterPlayerNumber',
       'HitterPlayerName', 'HitteeTeamCode', 'HitteePlayerNumber',
       'HitteePlayerName', 'PenaltyTeamCod

## 4) fill in the team code for all types of events

In [257]:
dm['TeamCode'] = np.where(dm['EventType'] == 'FAC', dm['WinTeamCode'],
                             (np.where(dm['EventType'] == 'HIT', dm['HitterTeamCode'],
                                       (np.where(dm['EventType'] == 'PENL', dm['PenaltyTeamCode'], dm['TeamCode'])))))
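
The nested `np.where` calls above could also be written with `np.select`, which keeps each condition next to the column it pulls from. This is only a sketch on a toy frame: the rows are made up, and only the column names come from the notebook.

```python
import numpy as np
import pandas as pd

# toy rows (made up); column names follow the notebook
demo = pd.DataFrame({
    'EventType':       ['FAC', 'HIT', 'PENL', 'SHOT'],
    'TeamCode':        [None, None, None, 'MTL'],
    'WinTeamCode':     ['TOR', None, None, None],
    'HitterTeamCode':  [None, 'MTL', None, None],
    'PenaltyTeamCode': [None, None, 'TOR', None],
})

# one condition per event type whose team code lives in another column
conditions = [
    demo['EventType'] == 'FAC',
    demo['EventType'] == 'HIT',
    demo['EventType'] == 'PENL',
]
choices = [
    demo['WinTeamCode'],
    demo['HitterTeamCode'],
    demo['PenaltyTeamCode'],
]

# rows that match no condition keep their existing TeamCode
demo['TeamCode'] = np.select(conditions, choices, default=demo['TeamCode'])
print(demo[['EventType', 'TeamCode']])
```

The first matching condition wins, so the order of `conditions` plays the same role as the nesting order in the original expression.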

## 5) fill in home and visitor team code by back-filling from later rows

### - fill in home team code for all events prior to a goal

In [258]:
dm['HTeamCode'] = dm['HTeamCode'].bfill()

### - fill in visitor team code for all events prior to a goal

In [259]:
dm['VTeamCode'] = dm['VTeamCode'].bfill()

## 6) fill in variables goal number and goal time with values

### - fill in goal number backwards for all events that occurred prior to a goal

In [260]:
dm['GoalNumber'] = dm['GoalNumber'].bfill()

### - fill in goal time backwards for all events that occurred prior to a goal

In [261]:
dm['GoalTime'] = dm['GoalTime'].bfill()
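
The back-fill above runs over the whole frame, so events after the last goal of one game pick up the first goal of the next game. The sketch below, on made-up rows, contrasts that with a back-fill done separately per game. Whether grouping by `GameNumber` alone is the right boundary for this data is an assumption, not something the notebook states.

```python
import numpy as np
import pandas as pd

# toy rows (made up): two games, one goal row in each
demo = pd.DataFrame({
    'GameNumber': [20001, 20001, 20001, 20002, 20002],
    'GoalTime':   [np.nan, 1035.0, np.nan, np.nan, 300.0],
})

# back-fill over the whole frame: row 2 borrows 300.0 from the next game
demo['plain'] = demo['GoalTime'].bfill()

# back-fill within each game: row 2 has no later goal in its game and stays NaN
demo['per_game'] = demo.groupby('GameNumber')['GoalTime'].bfill()

print(demo)
```

Rows left as NaN by the per-game version drop out once the time-difference filter is applied, because comparisons with NaN are false.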

## 7) generate a variable that will calculate the time difference between goal and events

In [262]:
dm['TBGoalandEvent'] = dm['GoalTime'] - dm['EventTimeFromZero']

In [263]:
dm[200:230]

| | Season | GameNumber | EventNumber | Period | AdvantageType | EventTimeFromZero | EventTimeFromTwenty | EventType | EventDetail | VPlayer1 | ... | PenaltyType | DrawnByTeamCode | DrawnByPlayerNumber | DrawnByPlayerName | GameDate | GoalNumber | GoalTime | Assist1Player | Assist2Player | TBGoalandEvent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | 2010 | 20001 | 279.0 | 3 | EV | 904 | 296.0 | SHOT | MTL ONGOAL - #21 GIONTA, Slap, Neu. Zone, 72 ft. | 11.0 | ... | | | | | | 2.0 | 1035.0 | | | 131.0 |
| 201 | 2010 | 20001 | 280.0 | 3 | EV | 929 | 271.0 | BLOCK | TOR #41 KULEMIN BLOCKED BY MTL #94 PYATT, Wri... | 17.0 | ... | | | | | | 2.0 | 1035.0 | | | 106.0 |
| 202 | 2010 | 20001 | 281.0 | 3 | EV | 966 | 234.0 | PENL | MTL #94 PYATT&nbsp;Hooking(2 min), Def. Zone D... | 17.0 | ... | Hooking | TOR | 41.0 | KULEMIN | | 2.0 | 1035.0 | | | 69.0 |
| 203 | 2010 | 20001 | 295.0 | 3 | EV | 1087 | 113.0 | SHOT | TOR ONGOAL - #81 KESSEL, Wrist, Off. Zone, 19 ft. | 17.0 | ... | | | | | | 2.0 | 1035.0 | | | -52.0 |
| 204 | 2010 | 20001 | 297.0 | 3 | EV | 1110 | 90.0 | FAC | MTL won Neu. Zone - MTL #14 PLEKANEC vs TOR #3... | 14.0 | ... | | | | | 2010-10-07 | 2.0 | 1035.0 | | | -75.0 |
| 205 | 2010 | 20001 | 298.0 | 3 | EV | 1124 | 76.0 | HIT | TOR #2 SCHENN HIT MTL #46 KOSTITSYN, Def. Zone | 14.0 | ... | | | | | | 2.0 | 1035.0 | | | -89.0 |
| 206 | 2010 | 20001 | 299.0 | 3 | EV | 1145 | 55.0 | BLOCK | MTL #14 PLEKANEC BLOCKED BY TOR #36 GUNNARSSO... | 11.0 | ... | | | | | | 2.0 | 1035.0 | | | -110.0 |
| 207 | 2010 | 20001 | 300.0 | 3 | EV | 1181 | 19.0 | TAKE | TOR&nbsp;TAKEAWAY - #22 BEAUCHEMIN, Def. Zone | 11.0 | ... | | | | | | 2.0 | 1035.0 | | | -146.0 |
| 208 | 2010 | 20001 | 301.0 | 3 | EV | 1192 | 8.0 | SHOT | MTL ONGOAL - #21 GIONTA, Wrap-around, Off. Zon... | 11.0 | ... | | | | | | 2.0 | 1035.0 | | | -157.0 |
| 209 | 2010 | 20001 | 302.0 | 3 | EV | 1198 | 2.0 | SHOT | MTL ONGOAL - #21 GIONTA, Snap, Off. Zone, 12 ft. | 11.0 | ... | | | | | | 2.0 | 1035.0 | | | -163.0 |


## 8) keep only events that happened within the 20 seconds prior to a goal

In [264]:
dm = dm[dm['TBGoalandEvent'] <= 20]

In [265]:
dm = dm[dm['TBGoalandEvent'] >= 0]

### | results show the events that occurred within 20 sec prior to each goal.
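
The two filters above can be collapsed into one step with `Series.between`, which is inclusive at both ends by default. A small sketch follows; the event times are toy values chosen to cover an event well before the goal, one inside the window, the goal itself, and one after the goal.

```python
import pandas as pd

# toy rows: times in seconds, all tied to a goal at 1035
demo = pd.DataFrame({
    'EventTimeFromZero': [904, 1020, 1035, 1087],
    'GoalTime':          [1035.0, 1035.0, 1035.0, 1035.0],
})

# seconds from the event to the goal; negative means the event came after the goal
demo['TBGoalandEvent'] = demo['GoalTime'] - demo['EventTimeFromZero']

# keep rows with 0 <= TBGoalandEvent <= 20
window = demo[demo['TBGoalandEvent'].between(0, 20)]
print(window)
```

The row with a difference of 0 is the goal event itself, which is why GOAL rows survive the filter.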

## 9) create a column that will show the total observations for two games

In [266]:
dm['Counts'] = dm.groupby('Season')['EventType'].transform('count')

## 10) create columns for each type of event and assign values to determine the impact they have on a goal

### - create a column that assigns a value of 1 to block events

In [267]:
dm['block'] = np.where(dm['EventType'] == 'BLOCK', 1, 0)

### - create a column that assigns a value of 1 to faceoff events

In [268]:
dm['faceoff'] = np.where(dm['EventType'] == 'FAC', 1, 0)

### - create a column that assigns a value of 1 to giveaway events

In [269]:
dm['giveaway'] = np.where(dm['EventType'] == 'GIVE', 1, 0)

### - create a column that assigns a value of 1 to goal events

In [270]:
dm['goal'] = np.where(dm['EventType'] == 'GOAL', 1, 0)

### - create a column that assigns a value of 1 to hit events

In [271]:
dm['hit'] = np.where(dm['EventType'] == 'HIT', 1, 0)

### - create a column that assigns a value of 1 to miss events

In [272]:
dm['miss'] = np.where(dm['EventType'] == 'MISS', 1, 0)

### - create a column that assigns a value of 1 to penalty events

In [273]:
dm['penalty'] = np.where(dm['EventType'] == 'PENL', 1, 0)

### - create a column that assigns a value of 1 to shot events

In [274]:
dm['shot'] = np.where(dm['EventType'] == 'SHOT', 1, 0)

### - create a column that assigns a value of 1 to takeaway events

In [275]:
dm['takeaway'] = np.where(dm['EventType'] == 'TAKE', 1, 0)
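
The nine indicator columns above follow one pattern, so they could also be built in a loop over a name-to-code mapping. This is a sketch on a toy frame; the mapping `event_columns` is a helper name introduced here, not part of the notebook.

```python
import pandas as pd

# toy rows (made up)
demo = pd.DataFrame({'EventType': ['SHOT', 'FAC', 'GOAL', 'SHOT']})

# column name -> event code, as used in the cells above
event_columns = {
    'block': 'BLOCK', 'faceoff': 'FAC', 'giveaway': 'GIVE',
    'goal': 'GOAL', 'hit': 'HIT', 'miss': 'MISS',
    'penalty': 'PENL', 'shot': 'SHOT', 'takeaway': 'TAKE',
}

# 1 where the row is that event type, 0 otherwise
for column, code in event_columns.items():
    demo[column] = (demo['EventType'] == code).astype(int)

# the mean of an indicator column is the share of rows of that type
print(demo['shot'].mean())
```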

## 11) display of each event leading to a goal for two games

In [276]:
dy = dm.groupby(['Season','GameNumber', 'GoalNumber', 'EventType']).size()

### | the table above lists the occurrence of each event type prior to each goal.
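
To read the grouped counts as one row per goal and one column per event type, the result of `size()` can be unstacked. The sketch below uses made-up rows; summing the columns afterwards gives the incidence of each event type over all goals.

```python
import pandas as pd

# toy rows (made up): events already restricted to the window before each goal
demo = pd.DataFrame({
    'Season':     [2010] * 5,
    'GameNumber': [20001] * 5,
    'GoalNumber': [1.0, 1.0, 1.0, 2.0, 2.0],
    'EventType':  ['SHOT', 'SHOT', 'GOAL', 'FAC', 'GOAL'],
})

# count of each event type before each goal
counts = demo.groupby(['Season', 'GameNumber', 'GoalNumber', 'EventType']).size()

# one row per goal, one column per event type, 0 where a type did not occur
table = counts.unstack('EventType', fill_value=0)

# total incidence of each event type across all goals
totals = table.sum()
print(table)
print(totals)
```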

## 12) create a variable that will show the value of each event and determine if the event has a positive or negative impact on each team. 

### the mean of each indicator column, i.e. the share of the retained events that are of that type, is used as the value of that event.

In [277]:
dm['eventvalue'] = np.where(dm['EventType'] == 'BLOCK', dm['block'].mean(),
                             (np.where(dm['EventType'] == 'FAC', dm['faceoff'].mean(),
                                       (np.where(dm['EventType'] == 'GIVE', -(dm['giveaway'].mean()),
                                                 (np.where(dm['EventType'] == 'GOAL', dm['goal'].mean(),
                                                          (np.where(dm['EventType'] == 'HIT', dm['hit'].mean(),
                                                                   (np.where(dm['EventType'] == 'MISS', dm['miss'].mean(),
                                                                            (np.where(dm['EventType'] == 'PENL', -(dm['penalty'].mean()),
                                                                                     (np.where(dm['EventType'] == 'SHOT', dm['shot'].mean(),
                                                                                              (np.where(dm['EventType'] == 'TAKE', dm['takeaway'].mean(), 0)))))))))))))))))

### - giveaway has a negative impact on the team that committed it

### - faceoff has a positive impact on the team that won it and a negative impact on the team that lost it

### - takeaway has a positive impact on the team that stole the puck and gained possession

### - hit has a positive impact on the team that delivered the hit and a negative impact on the team that received the hit

### - penalty has a positive impact on the team that drew the penalty and a negative impact on the team serving it

### all events have been assigned a value.
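
The event value can also be computed without the nested `np.where` chain: take each event type's share of the rows, then flip the sign for the two types that count against the team in `TeamCode` (giveaway and penalty). This is a sketch on made-up rows, and it differs from the cell above in one respect: an event type outside the nine listed would get its share here, whereas the original assigns it 0.

```python
import numpy as np
import pandas as pd

# toy rows (made up)
demo = pd.DataFrame({'EventType': ['SHOT', 'SHOT', 'GIVE', 'GOAL']})

# share of each event type; equals the mean of its 0/1 indicator column
share = demo['EventType'].value_counts(normalize=True)

# event types signed negative for the team that committed them
negative = ['GIVE', 'PENL']
sign = np.where(demo['EventType'].isin(negative), -1, 1)

demo['eventvalue'] = demo['EventType'].map(share) * sign
print(demo)
```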

## 13) create event value for home and visitor teams

### - event value for home team. If an event has a positive impact on the home team, the mean will be positive. If an event has a negative impact on the home team, the mean will be negative.

In [281]:
dm['heventvalue'] = np.where(dm['TeamCode'] == dm['HTeamCode'], dm['eventvalue'], -(dm['eventvalue']))

### - event value for visitor team. If an event has a positive impact on the visitor team, the mean will be positive. If an event has a negative impact on the visitor team, the mean will be negative.

In [279]:
dm['veventvalue'] = np.where(dm['TeamCode'] == dm['VTeamCode'], dm['eventvalue'], -(dm['eventvalue']))
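
A small worked example of the sign flip, using toy values: one event credited to the home team with a positive value, and one giveaway by the visitor with a negative value. The team codes and numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# toy rows (made up): TOR is the home team, MTL the visitor
demo = pd.DataFrame({
    'TeamCode':   ['TOR', 'MTL'],
    'HTeamCode':  ['TOR', 'TOR'],
    'VTeamCode':  ['MTL', 'MTL'],
    'eventvalue': [0.25, -0.125],
})

# keep the sign when the event belongs to that team, flip it otherwise
demo['heventvalue'] = np.where(demo['TeamCode'] == demo['HTeamCode'],
                               demo['eventvalue'], -demo['eventvalue'])
demo['veventvalue'] = np.where(demo['TeamCode'] == demo['VTeamCode'],
                               demo['eventvalue'], -demo['eventvalue'])
print(demo)
```

When `TeamCode` matches exactly one of the two teams, the home and visitor values are mirror images, so they sum to zero on every row. A visitor giveaway therefore shows up as a positive value for the home team.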

## 14) assign values to each player depending on the on-ice events they participated in.