Player evaluation is conducted before roster design because it weighs each player's performance in every game over the course of a season. Total Hockey Rating (THoR) is an all-inclusive rating of NHL defensemen and forwards based on all on-ice events. Every event in a game is recorded and assigned a value determined by the probability that the event leads to a goal.

### purpose of notebook:

a) generate a variable that will show the time difference between a goal and all events that happened prior.


b) keep only events that happened within the 20 seconds before a goal.

c) group events by goal number to count the occurrences of each event prior to a goal.

d) sum by event type to display the incidence of each event in two games.

e) establish the impact of each event on a goal.

f) determine whether events have a positive or negative impact on each team.

g) assign values to players based on their participation in events that led to a goal.

##  import modules

In [44]:
import sys
import os
import pandas as pd
import numpy as np
import datetime, time
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols
from pylab import hist, show
import scipy

## import data frame

The merged data frame created in the roster_design_stephanos notebook is imported and used for player evaluation.

In [45]:
dm = pd.read_csv('pbpmerge.csv')

## drop unnamed column (irrelevant)

In [46]:
dm = dm.drop('Unnamed: 0', axis=1)

## a list of the columns in pbpmerge data frame

In [47]:
dm.columns

Index(['Season', 'GameNumber', 'EventNumber', 'Period', 'AdvantageType',
       'EventTimeFromZero', 'EventTimeFromTwenty', 'EventType', 'EventDetail',
       'VPlayer1', 'VPlayer1Position', 'VPlayer2', 'VPlayer2Position',
       'VPlayer3', 'VPlayer3Position', 'VPlayer4', 'VPlayer4Position',
       'VPlayer5', 'VPlayer5Position', 'VPlayer6', 'VPlayer6Position',
       'HPlayer1', 'HPlayer1Position', 'HPlayer2', 'HPlayer2Position',
       'HPlayer3', 'HPlayer3Position', 'HPlayer4', 'HPlayer4Position',
       'HPlayer5', 'HPlayer5Position', 'HPlayer6', 'HPlayer6Position',
       'TeamCode', 'PlayerNumber', 'PlayerName', 'ShotType', 'Zone', 'Length',
       'ShotResult', 'ShotTeamCode', 'ShotPlayerNumber', 'ShotPlayerName',
       'WinTeamCode', 'VTeamCode', 'VPlayerNumber', 'VPlayerName', 'HTeamCode',
       'HPlayerNumber', 'HPlayerName', 'HitterTeamCode', 'HitterPlayerNumber',
       'HitterPlayerName', 'HitteeTeamCode', 'HitteePlayerNumber',
       'HitteePlayerName', 'PenaltyTeamCod

## fill in team code for all type of events

So that the team code column has no missing values, **numpy where** is used. It assigns a value to 'TeamCode' based on the event type and the outcome of that event:

- if an event is a faceoff, the team that won the faceoff will be assigned to 'TeamCode'. 

- if an event is a hit, the team that registered a hit will be assigned to 'TeamCode'. 

- if an event is a penalty, the team that committed the penalty will be assigned to 'TeamCode'.

In [48]:
dm['TeamCode'] = np.where(dm['EventType'] == 'FAC', dm['WinTeamCode'],
                             (np.where(dm['EventType'] == 'HIT', dm['HitterTeamCode'],
                                       (np.where(dm['EventType'] == 'PENL', dm['PenaltyTeamCode'], dm['TeamCode'])))))
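The same fill can be written without nesting by using `np.select`. The following is a minimal sketch on a made-up four-row frame; the column names come from the notebook, while the team codes and the `toy` frame are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame (invented values) with the columns used in the cell above.
toy = pd.DataFrame({
    'EventType': ['FAC', 'HIT', 'PENL', 'SHOT'],
    'TeamCode': [None, None, None, 'BOS'],
    'WinTeamCode': ['BOS', None, None, None],
    'HitterTeamCode': [None, 'MTL', None, None],
    'PenaltyTeamCode': [None, None, 'MTL', None],
})

# One condition per event type, paired with the column that holds its team.
conditions = [
    toy['EventType'] == 'FAC',
    toy['EventType'] == 'HIT',
    toy['EventType'] == 'PENL',
]
choices = [toy['WinTeamCode'], toy['HitterTeamCode'], toy['PenaltyTeamCode']]

# Rows matching no condition keep their existing TeamCode.
toy['TeamCode'] = np.select(conditions, choices, default=toy['TeamCode'])
print(toy[['EventType', 'TeamCode']])
```

Adding another event type then means appending one condition and one choice rather than another level of nesting.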

## fill in home and visitor team code 

To ensure there are no missing values, the home and visitor team codes are filled in *backwards*. 

 - visitor team code for all events prior to a goal filled in backwards

In [50]:
dm['VTeamCode'] = dm['VTeamCode'].bfill()  # fillna(method='bfill') is deprecated in recent pandas

 - home team code for all events prior to a goal filled in backwards

In [49]:
dm['HTeamCode'] = dm['HTeamCode'].bfill()  # fillna(method='bfill') is deprecated in recent pandas

##  fill in variables goal number and goal time with values

Goal number and goal time values are assigned to every event, based on the number of goals scored in a game and the time (from zero) at which they happened. Since events that occurred **prior to a goal** are being examined, the *fill in backwards* method is used. This makes it possible to calculate the time difference between a goal and a given event.

 - goal number for all events prior to a goal filled in backwards

In [51]:
dm['GoalNumber'] = dm['GoalNumber'].bfill()  # fillna(method='bfill') is deprecated in recent pandas

- goal time for all events prior to a goal filled in backwards

In [52]:
dm['GoalTime'] = dm['GoalTime'].bfill()  # fillna(method='bfill') is deprecated in recent pandas
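A plain backward fill does not know where one game ends and the next begins, so events after the last goal of a game can inherit the first goal of the following game. Whether this affects the results depends on how `pbpmerge.csv` is ordered, which is not shown here. The sketch below, on invented data, shows one way to restrict the fill to each game; grouping by 'GameNumber' alone is an assumption, and 'Season' may also be needed.

```python
import numpy as np
import pandas as pd

# Toy play-by-play (invented values): goal columns are set only on GOAL rows.
toy = pd.DataFrame({
    'GameNumber': [1, 1, 1, 1, 2, 2],
    'EventType': ['SHOT', 'GOAL', 'HIT', 'FAC', 'SHOT', 'GOAL'],
    'EventTimeFromZero': [40, 50, 90, 95, 98, 100],
    'GoalNumber': [np.nan, 1, np.nan, np.nan, np.nan, 1],
    'GoalTime': [np.nan, 50, np.nan, np.nan, np.nan, 100],
})

# Backfill inside each game; events after a game's last goal stay NaN
# instead of picking up the first goal of the next game.
per_game = toy.groupby('GameNumber')[['GoalNumber', 'GoalTime']].bfill()
toy[['GoalNumber', 'GoalTime']] = per_game
print(toy)
```

In this toy data the hit at 90 seconds and the faceoff at 95 seconds of game 1 keep a missing goal time, so a later time-window filter drops them.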

## generate a variable that will calculate the time difference between goal and events

The time difference between a goal and an event is calculated as follows: 

In [53]:
dm['TBGoalandEvent'] = dm['GoalTime'] - dm['EventTimeFromZero']

## keep only events that happened 20 seconds prior to goal

The player evaluation model uses only events that happened within the 20 seconds before a goal. If the time between a goal and an event exceeds 20 seconds, or if the event happened after the goal (a negative difference), the event is not included in the data frame. Thus:

In [55]:
dm = dm[dm['TBGoalandEvent'] <= 20]

In [56]:
dm = dm[dm['TBGoalandEvent'] >= 0]
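The time difference and the two filter cells can be checked on a small example. The sketch below uses invented times and combines both bounds with `Series.between`, which is inclusive on both ends by default.

```python
import pandas as pd

# Toy events (invented values); GoalTime is the already back-filled goal time.
toy = pd.DataFrame({
    'EventType': ['FAC', 'SHOT', 'GOAL', 'HIT'],
    'EventTimeFromZero': [70, 85, 100, 130],
    'GoalTime': [100, 100, 100, 160],
})

# Seconds between each event and the next goal.
toy['TBGoalandEvent'] = toy['GoalTime'] - toy['EventTimeFromZero']

# Keep events from 0 to 20 seconds before the goal, both bounds included.
window = toy[toy['TBGoalandEvent'].between(0, 20)]
print(window)
```

Here the faceoff and the hit are both 30 seconds away from their goal and are dropped, while the shot (15 seconds) and the goal itself (0 seconds) are kept.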

## create a column that will show the total observations for two games

The data is grouped by season to count the total occurrences of events that happened within the 20 seconds before a goal, in the first two games of the season.

In [57]:
dm['Counts'] = dm.groupby('Season')['EventType'].transform('count')

## create columns for each type of event and assign values to determine the impact they have on a goal

Values are assigned to nine even-strength event types: blocked shot, face-off, giveaway, goal, hit, missed shot, penalty, shot on goal and takeaway.

 - create a column that assigns a value of 1 to block events and a value of 0 to every non block event

In [58]:
dm['block'] = np.where(dm['EventType'] == 'BLOCK', 1, 0)

 - create a column that assigns a value of 1 to faceoff events and a value of 0 to every non faceoff event

In [59]:
dm['faceoff'] = np.where(dm['EventType'] == 'FAC', 1, 0)

 - create a column that assigns a value of 1 to giveaway events and a value of 0 to every non giveaway event

In [60]:
dm['giveaway'] = np.where(dm['EventType'] == 'GIVE', 1, 0)

- create a column that assigns a value of 1 to goal events and a value of 0 to every non goal event

In [61]:
dm['goal'] = np.where(dm['EventType'] == 'GOAL', 1, 0)

- create a column that assigns a value of 1 to hit events and a value of 0 to every non hit event

In [62]:
dm['hit'] = np.where(dm['EventType'] == 'HIT', 1, 0)

- create a column that assigns a value of 1 to missed shot events and a value of 0 to every non missed shot event

In [63]:
dm['miss'] = np.where(dm['EventType'] == 'MISS', 1, 0)

 - create a column that assigns a value of 1 to penalty events and a value of 0 to every non penalty event

In [64]:
dm['penalty'] = np.where(dm['EventType'] == 'PENL', 1, 0)

- create a column that assigns a value of 1 to shot events and a value of 0 to every non shot event

In [65]:
dm['shot'] = np.where(dm['EventType'] == 'SHOT', 1, 0)

 - create a column that assigns a value of 1 to takeaway events and a value of 0 to every non takeaway event

In [66]:
dm['takeaway'] = np.where(dm['EventType'] == 'TAKE', 1, 0)
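The indicator columns above can also be produced in one step with `pd.get_dummies`. This is a sketch on invented data, not the notebook's method; the `names` mapping is a hypothetical helper that renames the raw event codes to the lowercase column names used above.

```python
import pandas as pd

# Toy events (invented values).
toy = pd.DataFrame({'EventType': ['FAC', 'SHOT', 'GOAL', 'SHOT']})

# One 0/1 column per event code found in the data.
dummies = pd.get_dummies(toy['EventType']).astype(int)

# Hypothetical mapping from event codes to the notebook's column names.
names = {'FAC': 'faceoff', 'GOAL': 'goal', 'SHOT': 'shot'}
dummies = dummies.rename(columns=names)

toy = toy.join(dummies)
print(toy)
```

Note that `get_dummies` only creates columns for codes that actually occur, so an event type absent from the data gets no column, unlike the explicit `np.where` cells.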

## display of each event leading to a goal for two games

The table below lists the occurrences of each event type prior to each goal.

In [67]:
dy = dm.groupby(['Season','GameNumber', 'GoalNumber', 'EventType']).size()
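The grouped counts come back as a long Series with one row per goal and event type. If a table with one row per goal and one column per event type is easier to read, the result can be unstacked. A minimal sketch on invented data (the season value 2013 is made up):

```python
import pandas as pd

# Toy events (invented values) leading to two goals in one game.
toy = pd.DataFrame({
    'Season': [2013, 2013, 2013, 2013, 2013],
    'GameNumber': [1, 1, 1, 1, 1],
    'GoalNumber': [1, 1, 1, 2, 2],
    'EventType': ['SHOT', 'SHOT', 'GOAL', 'FAC', 'GOAL'],
})

counts = toy.groupby(['Season', 'GameNumber', 'GoalNumber', 'EventType']).size()

# Pivot event types into columns; combinations that never occurred become 0.
wide = counts.unstack('EventType', fill_value=0)
print(wide)
```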

## create a variable that will display the value of each event 

All events that happened within the 20 seconds before a goal are counted. The **mean** of each indicator column, which equals the share of retained events of that type, is used to establish the impact each event has on a goal.

The first step is to determine whether an event has a positive or negative impact on a goal:

 - giveaway has a negative impact on the team that lost possession.

 - faceoff has a positive impact on the team that won the faceoff and a negative impact for the team that lost. 

- hit has a positive impact for the team that delivered the hit and a negative impact on the team that received the hit.

- penalty has a positive impact on the team that drew the penalty and a negative impact on the team serving. 

 - takeaway has a positive impact on the team that stole the puck and have possession.

In [68]:
dm['eventvalue'] = np.where(dm['EventType'] == 'BLOCK', dm['block'].mean(),
                             (np.where(dm['EventType'] == 'FAC', dm['faceoff'].mean(),
                                       (np.where(dm['EventType'] == 'GIVE', -(dm['giveaway'].mean()),
                                                 (np.where(dm['EventType'] == 'GOAL', dm['goal'].mean(),
                                                          (np.where(dm['EventType'] == 'HIT', dm['hit'].mean(),
                                                                   (np.where(dm['EventType'] == 'MISS', dm['miss'].mean(),
                                                                            (np.where(dm['EventType'] == 'PENL', -(dm['penalty'].mean()),
                                                                                     (np.where(dm['EventType'] == 'SHOT', dm['shot'].mean(),
                                                                                              (np.where(dm['EventType'] == 'TAKE', dm['takeaway'].mean(), 0)))))))))))))))))

All event types have been assigned a value based on the impact they had on a goal. 
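Because the mean of a 0/1 indicator column is the share of rows with that event type, the nested `np.where` can be reproduced with `value_counts(normalize=True)` and a sign. The sketch below uses invented data; `known` and `negative` are hypothetical helper names that encode the nine event types and the two negatively signed ones (giveaway and penalty) from the cell above.

```python
import pandas as pd

# Toy events (invented values); STOP stands in for an event type with no value.
toy = pd.DataFrame({'EventType': ['SHOT', 'SHOT', 'GIVE', 'GOAL',
                                  'PENL', 'SHOT', 'FAC', 'STOP']})

known = ['BLOCK', 'FAC', 'GIVE', 'GOAL', 'HIT', 'MISS', 'PENL', 'SHOT', 'TAKE']
negative = {'GIVE', 'PENL'}

# Share of all rows taken by each event type (same as the indicator mean).
share = toy['EventType'].value_counts(normalize=True)

# -1 for giveaways and penalties, +1 otherwise.
sign = toy['EventType'].map(lambda e: -1 if e in negative else 1)

value = toy['EventType'].map(share) * sign

# Event types outside the nine known ones get 0, as in the nested np.where.
toy['eventvalue'] = value.where(toy['EventType'].isin(known), 0.0)
print(toy)
```

With eight rows, the three shots each get 3/8 = 0.375, the giveaway and the penalty each get -0.125, and the stoppage gets 0.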

##  create event value for home and visitor teams

Each event has an effect on both home and visitor team. An event that has a positive impact on the home team will have a negative impact on the visitor team. Similarly, an event that has a negative effect on the home team, will have a positive effect on the visitor team.

- If an event has a positive impact on the **home team**, the mean will be positive. If an event has a negative impact on the home team, the mean will be negative.

In [69]:
dm['heventvalue'] = np.where(dm['TeamCode'] == dm['HTeamCode'], dm['eventvalue'], -(dm['eventvalue']))

- If an event has a positive impact on the **visitor team**, the mean will be positive. If an event has a negative impact on the visitor team, the mean will be negative.

In [70]:
dm['veventvalue'] = np.where(dm['TeamCode'] == dm['VTeamCode'], dm['eventvalue'], -(dm['eventvalue']))
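A two-row example makes the sign convention concrete. The values and team codes below are invented. The sketch also assumes that 'TeamCode' is never missing and always equals either the home or the visitor code; under that assumption the visitor value is simply the negative of the home value.

```python
import numpy as np
import pandas as pd

# Toy events (invented values): a home-team shot and a visitor giveaway.
toy = pd.DataFrame({
    'EventType': ['SHOT', 'GIVE'],
    'TeamCode': ['BOS', 'MTL'],
    'HTeamCode': ['BOS', 'BOS'],
    'VTeamCode': ['MTL', 'MTL'],
    'eventvalue': [0.25, -0.125],
})

# Keep the sign when the home team is credited with the event, flip it otherwise.
is_home = toy['TeamCode'] == toy['HTeamCode']
toy['heventvalue'] = np.where(is_home, toy['eventvalue'], -toy['eventvalue'])

# Assumption: every event belongs to one of the two teams, so the views mirror.
toy['veventvalue'] = -toy['heventvalue']
print(toy[['EventType', 'heventvalue', 'veventvalue']])
```

The home shot counts +0.25 for the home team and -0.25 for the visitors; the visitor giveaway counts -0.125 for the visitors and +0.125 for the home team.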

## assign values to each player 

The value of an event is assigned to all players that were on ice, a total of 12 players (6 per team). The overall contribution of each player is the total (sum) of events they participated in. 

### a) Overall contribution of each player from the visitor team in all (6) positions 

Group the data frame by season, visitor team code and visitor player position to separate players that play in the same position. 

- create variable that evaluates the overall contribution of each player from the visitor team that is listed as "VPlayer1"

In [84]:
dm['vp1'] = dm.groupby(['Season', 'VTeamCode', 'VPlayer1'])['veventvalue'].transform('sum')

- create variable that evaluates each player from the visitor team that is listed as "VPlayer2"

In [72]:
dm['vp2'] = dm.groupby(['Season', 'VTeamCode', 'VPlayer2'])['veventvalue'].transform('sum')

- create variable that evaluates each player from the visitor team that is listed as "VPlayer3"

In [73]:
dm['vp3'] = dm.groupby(['Season', 'VTeamCode', 'VPlayer3'])['veventvalue'].transform('sum')

- create variable that evaluates each player from the visitor team that is listed as "VPlayer4"

In [74]:
dm['vp4'] = dm.groupby(['Season', 'VTeamCode', 'VPlayer4'])['veventvalue'].transform('sum')

- create variable that evaluates each player from the visitor team that is listed as "VPlayer5"

In [75]:
dm['vp5'] = dm.groupby(['Season', 'VTeamCode', 'VPlayer5'])['veventvalue'].transform('sum')

- create variable that evaluates each player from the visitor team that is listed as "VPlayer6"

In [76]:
dm['vp6'] = dm.groupby(['Season', 'VTeamCode', 'VPlayer6'])['veventvalue'].transform('sum')

### b) Overall contribution of each player from the home team in all (6) positions

Group the data frame by season, home team code and home player position to separate players that play in the same position. 

- create variable that evaluates each player from the home team that is listed as "HPlayer1"

In [77]:
dm['hp1'] = dm.groupby(['Season', 'HTeamCode', 'HPlayer1'])['heventvalue'].transform('sum')

- create variable that evaluates each player from the home team that is listed as "HPlayer2"

In [78]:
dm['hp2'] = dm.groupby(['Season', 'HTeamCode', 'HPlayer2'])['heventvalue'].transform('sum')

- create variable that evaluates each player from the home team that is listed as "HPlayer3"

In [79]:
dm['hp3'] = dm.groupby(['Season', 'HTeamCode', 'HPlayer3'])['heventvalue'].transform('sum')

- create variable that evaluates each player from the home team that is listed as "HPlayer4"

In [80]:
dm['hp4'] = dm.groupby(['Season', 'HTeamCode', 'HPlayer4'])['heventvalue'].transform('sum')

- create variable that evaluates each player from the home team that is listed as "HPlayer5"

In [81]:
dm['hp5'] = dm.groupby(['Season', 'HTeamCode', 'HPlayer5'])['heventvalue'].transform('sum')

- create variable that evaluates each player from the home team that is listed as "HPlayer6"

In [82]:
dm['hp6'] = dm.groupby(['Season', 'HTeamCode', 'HPlayer6'])['heventvalue'].transform('sum')

## players that play in multiple positions.

Throughout the duration of a game, a player may change position. As mentioned above, the overall impact of a given player is the total (sum) of events he participated in. To properly measure each player's contribution, a cross examination per position must be applied. 

- a) cross examine each position for visitor team.

- b) cross examine each position for home team.
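One possible way to carry out this cross-examination is to reshape the player slots into a single column and sum once per player. The sketch below is an assumption about how that could be done, not the notebook's own method: it uses invented data, only two of the six visitor slots, and assumes the 'VPlayer' columns hold a consistent player identifier.

```python
import pandas as pd

# Toy events (invented values); player A appears in both slots.
toy = pd.DataFrame({
    'Season': [2013, 2013, 2013],
    'VTeamCode': ['MTL', 'MTL', 'MTL'],
    'VPlayer1': ['A', 'B', 'A'],
    'VPlayer2': ['B', 'A', 'C'],
    'veventvalue': [0.25, -0.125, 0.5],
})

slots = ['VPlayer1', 'VPlayer2']  # extend to VPlayer6 on the real data

# One row per (event, slot): each player on the ice gets the event's value.
long = toy.melt(id_vars=['Season', 'VTeamCode', 'veventvalue'],
                value_vars=slots, var_name='Slot', value_name='Player')

# Total contribution per player, regardless of the slot they were listed in.
totals = long.groupby(['Season', 'VTeamCode', 'Player'])['veventvalue'].sum()
print(totals)
```

Player A's total of 0.625 combines two events listed under 'VPlayer1' and one under 'VPlayer2', which per-slot sums would keep apart. The same reshaping applies to the home team with the 'HPlayer' columns and 'heventvalue'.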