## Player Evaluation

Player evaluation is conducted before roster design because it weighs each player's overall performance in every game over the course of a season. Total Hockey Rating (THoR) is an all-inclusive statistical rating of NHL forwards and defensemen based on all on-ice events. Every event in a game is recorded and assigned a value determined by the probability that the event led to a goal.

### purpose of notebook:

- determine roster position for both home and visitor team.
- reshape data set
- generate a variable that will show the time difference between a goal and all events that happened prior.
- keep only events that happened within the 20 seconds before a goal.
- group events by goal number to count the occurrence of each event prior to a goal.
- sum by event type to display the incidence of each event in two games.
- determine if zone start has a positive or negative impact on each team for a given on-ice event (offensive, neutral and defensive).
- establish the impact of each event on a goal.
- determine if events have a positive or negative impact on each team.
- assign values to players based on their participation in events that led to a goal.

###  import modules

In [1]:
import sys
import os
import pandas as pd
import numpy as np
import datetime, time
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols
from pylab import hist, show
import scipy

### import data frame

The merged data frame created in the roster_design_stephanos notebook is imported and used for player evaluation.

In [34]:
dm = pd.read_csv('out_data/pbpmerge.csv')


In [35]:
dm = dm.drop('Unnamed: 0', axis=1)

In [36]:
dm.columns

Index(['Season', 'GameNumber', 'EventNumber', 'Period', 'AdvantageType',
       'EventTimeFromZero', 'EventTimeFromTwenty', 'EventType', 'EventDetail',
       'VPlayer1', 'VPosition1', 'VPlayer2', 'VPosition2', 'VPlayer3',
       'VPosition3', 'VPlayer4', 'VPosition4', 'VPlayer5', 'VPosition5',
       'VPlayer6', 'VPosition6', 'HPlayer1', 'HPosition1', 'HPlayer2',
       'HPosition2', 'HPlayer3', 'HPosition3', 'HPlayer4', 'HPosition4',
       'HPlayer5', 'HPosition5', 'HPlayer6', 'HPosition6', 'TeamCode',
       'PlayerNumber', 'PlayerName', 'ShotType', 'Zone', 'Length',
       'ShotResult', 'ShotTeamCode', 'ShotPlayerNumber', 'ShotPlayerName',
       'WinTeamCode', 'VTeamCode', 'VNumber', 'VName', 'HTeamCode', 'HNumber',
       'HName', 'HitterTeamCode', 'HitterPlayerNumber', 'HitterPlayerName',
       'HitteeTeamCode', 'HitteePlayerNumber', 'HitteePlayerName',
       'PenaltyTeamCode', 'PenaltyPlayerNumber', 'PenaltyPlayerName',
       'PenaltyType', 'DrawnByTeamCode', 'DrawnByPlayer

### roster position

Play-by-play data report each on-ice event along with all 12 players who were on the ice during that event and the event's outcome. There are 6 players for the visitor team and 6 for the home team. Positions 1, 2 and 3 are the forward positions, 4 and 5 are the defense positions, and 6 is the goaltender position. Each position is categorized below.

#### a) for visitor team:

In [37]:
dm['VPosition1'] = 'C'
dm['VPosition2'] = 'RW'
dm['VPosition3'] = 'LW'
dm['VPosition4'] = 'RD'
dm['VPosition5'] = 'LD'

#### b) for home team: 

In [38]:
dm['HPosition1'] = 'C'
dm['HPosition2'] = 'RW'
dm['HPosition3'] = 'LW'
dm['HPosition4'] = 'RD'
dm['HPosition5'] = 'LD'

### reshape from wide to long

Once each roster position has been determined, the next step is to reshape the data set from wide to long. Instead of 2 columns for each roster slot (24 total), all players are listed in 4 columns: 2 for the visitor team (**'VPlayer'** & **'VPosition'**) and 2 for the home team (**'HPlayer'** & **'HPosition'**).

In [39]:
a = [col for col in dm.columns if 'VPlayer' in col]
b = [col for col in dm.columns if 'HPlayer' in col]
c = [col for col in dm.columns if 'VPosition' in col]
d = [col for col in dm.columns if 'HPosition' in col]
dm = pd.lreshape(dm, {'VPlayer' : a, 'HPlayer' : b, 'VPosition' : c, 'HPosition': d})
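To make the reshape concrete, here is a minimal sketch on a hypothetical two-event frame with only two visitor slots (the player labels are made up for illustration). `pd.lreshape` stacks each group of wide columns into one long column and repeats the remaining columns for every slot.

```python
import pandas as pd

# Hypothetical toy frame: two events, two visitor slots each.
toy = pd.DataFrame({
    'EventNumber': [1, 2],
    'VPlayer1': ['A', 'C'],
    'VPlayer2': ['B', 'D'],
    'VPosition1': ['C', 'C'],
    'VPosition2': ['RW', 'RW'],
})

# Collect the wide columns by prefix, as in the cell above.
player_cols = [col for col in toy.columns if 'VPlayer' in col]
position_cols = [col for col in toy.columns if 'VPosition' in col]

# Each event now contributes one row per slot.
long_toy = pd.lreshape(toy, {'VPlayer': player_cols, 'VPosition': position_cols})
long_toy = long_toy.sort_values(['EventNumber', 'VPosition']).reset_index(drop=True)
print(long_toy)
```

Note that `pd.lreshape` drops rows with a missing value in any stacked column by default (`dropna=True`), so events with an empty slot lose the row for that slot.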

In [40]:
dm = dm.sort_values(['Season', 'GameNumber', 'Period', 'EventNumber'], ascending=[True, True, True, True])

In [41]:
dm.columns

Index(['AdvantageType', 'Assist1Player', 'Assist2Player', 'DrawnByPlayerName',
       'DrawnByPlayerNumber', 'DrawnByTeamCode', 'EventDetail', 'EventNumber',
       'EventTimeFromTwenty', 'EventTimeFromZero', 'EventType', 'GameDate',
       'GameNumber', 'GoalNumber', 'GoalTime', 'HName', 'HNumber', 'HTeamCode',
       'HitteePlayerName', 'HitteePlayerNumber', 'HitteeTeamCode',
       'HitterPlayerName', 'HitterPlayerNumber', 'HitterTeamCode', 'Length',
       'PenaltyPlayerName', 'PenaltyPlayerNumber', 'PenaltyTeamCode',
       'PenaltyType', 'Period', 'PlayerName', 'PlayerNumber', 'Season',
       'ShotPlayerName', 'ShotPlayerNumber', 'ShotResult', 'ShotTeamCode',
       'ShotType', 'TeamCode', 'VName', 'VNumber', 'VTeamCode', 'WinTeamCode',
       'Zone', 'endtime', 'starttime', 'VPlayer', 'HPlayer', 'VPosition',
       'HPosition'],
      dtype='object')

### fill in team code for all type of events

To keep the team code column free of missing data, **numpy where** (`np.where`) is used. It assigns a value to 'TeamCode' based on the event type and the outcome of that event.
- if an event is a faceoff, the team that won the faceoff will be assigned to 'TeamCode'. 
- if an event is a hit, the team that registered a hit will be assigned to 'TeamCode'. 
- if an event is a penalty, the team that committed the penalty will be assigned to 'TeamCode'.

In [42]:
dm['TeamCode'] = np.where(dm['EventType'] == 'FAC', dm['WinTeamCode'],
                             (np.where(dm['EventType'] == 'HIT', dm['HitterTeamCode'],
                                       (np.where(dm['EventType'] == 'PENL', dm['PenaltyTeamCode'], dm['TeamCode'])))))
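The same rule can be expressed without nesting by using `np.select`, which takes a list of conditions and a matching list of choices. The sketch below runs on a hypothetical four-row frame; the team codes are placeholders, and this is an alternative formulation rather than the notebook's own cell.

```python
import numpy as np
import pandas as pd

# Hypothetical toy events; only the column relevant to each event type is filled.
toy = pd.DataFrame({
    'EventType': ['FAC', 'HIT', 'PENL', 'SHOT'],
    'TeamCode': [None, None, None, 'MTL'],
    'WinTeamCode': ['BOS', None, None, None],
    'HitterTeamCode': [None, 'MTL', None, None],
    'PenaltyTeamCode': [None, None, 'BOS', None],
})

# One condition per event type, checked in order.
conditions = [
    toy['EventType'] == 'FAC',
    toy['EventType'] == 'HIT',
    toy['EventType'] == 'PENL',
]
# The column that supplies the team code when the matching condition holds.
choices = [toy['WinTeamCode'], toy['HitterTeamCode'], toy['PenaltyTeamCode']]

# Rows matching no condition keep their existing TeamCode.
toy['TeamCode'] = np.select(conditions, choices, default=toy['TeamCode'])
print(toy[['EventType', 'TeamCode']])
```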

### fill in home and visitor team code 

To ensure no data are missing, the home and visitor team codes are filled in *backwards*.

 - visitor team code for all events prior to a goal filled in backwards

In [43]:
dm['VTeamCode'] = dm['VTeamCode'].bfill()

 - home team code for all events prior to a goal filled in backwards

In [44]:
dm['HTeamCode'] = dm['HTeamCode'].bfill()

###  fill in variables goal number and goal time with values

Goal number and goal time values are assigned to every event, based on the goals scored in a game and the time (from zero) at which they were scored. Since events that occurred **prior to a goal** are being examined, the *backward fill* method is used. This supports the calculation of the time difference between a goal and a given event.

 - goal number for all events prior to a goal filled in backwards

In [45]:
dm['GoalNumber'] = dm['GoalNumber'].bfill()

- goal time for all events prior to a goal filled in backwards

In [46]:
dm['GoalTime'] = dm['GoalTime'].bfill()
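A small sketch of what backward filling does, using hypothetical events from two games. The second column shows a per-game variant; that variant is an assumption added for comparison, not the notebook's method.

```python
import numpy as np
import pandas as pd

# Hypothetical events from two games; GoalTime is only known on GOAL rows.
toy = pd.DataFrame({
    'GameNumber': [1, 1, 1, 2, 2],
    'EventType': ['SHOT', 'GOAL', 'HIT', 'FAC', 'GOAL'],
    'EventTimeFromZero': [100, 110, 300, 5, 40],
    'GoalTime': [np.nan, 110, np.nan, np.nan, 40],
})

# Plain backward fill: each event inherits the time of the next goal in row order.
toy['GoalTime_bfill'] = toy['GoalTime'].bfill()

# Variant (an assumption, not the notebook's method): fill within each game only.
toy['GoalTime_by_game'] = toy.groupby('GameNumber')['GoalTime'].bfill()
print(toy)
```

With the plain fill, the hit at 300 seconds in game 1 inherits the goal time from game 2. Here that gives a negative time difference, which a later `>= 0` filter removes, but a grouped fill avoids relying on that.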

### time difference between goal and events

The time difference between a goal and an event is calculated as follows:

In [47]:
dm['TBGoalandEvent'] = dm['GoalTime'] - dm['EventTimeFromZero']

### keep only events that happened 20 seconds prior to goal

The player evaluation model uses only events that happened within the 20 seconds before a goal. If the time between a goal and an event exceeds 20 seconds, the event is not included in the data frame. Thus:

In [48]:
dm = dm[dm['TBGoalandEvent'] <= 20]

In [49]:
dm = dm[dm['TBGoalandEvent'] >= 0]
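The two filters together keep time differences in the closed interval from 0 to 20 seconds. A minimal sketch on hypothetical values, using `Series.between` (inclusive on both ends by default) as an equivalent single-step form:

```python
import pandas as pd

# Hypothetical events with the goal time already filled in.
toy = pd.DataFrame({
    'EventType': ['FAC', 'HIT', 'SHOT', 'GOAL', 'HIT'],
    'EventTimeFromZero': [60, 95, 105, 110, 300],
    'GoalTime': [110, 110, 110, 110, 40],
})

# Seconds between each event and the goal it is attached to.
toy['TBGoalandEvent'] = toy['GoalTime'] - toy['EventTimeFromZero']

# Keep events from 0 to 20 seconds before the goal, bounds included.
window = toy[toy['TBGoalandEvent'].between(0, 20)]
print(window)
```

The faceoff 50 seconds before the goal and the row with a negative difference are both dropped; the goal itself (difference 0) is kept.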

### total observations

The data is grouped by season to count the total occurrences of events that happened within 20 seconds before a goal in the first two games of the season.

In [50]:
dm['counts'] = dm.groupby('Season')['EventType'].transform('count')

### display of each event leading to a goal for two games

The table below lists the occurrences of each event type prior to each goal.

In [51]:
dy = dm.groupby(['Season','GameNumber', 'GoalNumber', 'EventType', 'Zone']).size()
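As an illustration of the grouped count, the sketch below uses a hypothetical frame with fewer grouping keys and converts the result into a flat table; the column name `n` is a placeholder.

```python
import pandas as pd

# Hypothetical retained events attached to two goals.
toy = pd.DataFrame({
    'GoalNumber': [1, 1, 1, 2, 2],
    'EventType': ['SHOT', 'SHOT', 'GOAL', 'HIT', 'GOAL'],
})

# Number of rows per (goal, event type) combination, as a regular data frame.
counts = toy.groupby(['GoalNumber', 'EventType']).size().reset_index(name='n')
print(counts)
```

By default `groupby` skips rows whose key is missing, so events without a recorded `Zone` would not appear in a count grouped on that column.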

### zone start (ZS)

Using the 'Zone' variable, offensive, neutral and defensive zone starts are created.

**zone start variable:** 

- a value of 1 will be assigned if the on-ice event happened in the offensive zone.
- a value of 0 will be assigned if the on-ice event happened in the neutral zone.
- a value of -1 if it happened in the defensive zone of the representative team.

In [52]:
dm['zs'] = np.where(dm['Zone'] == 'O', 1,
                    (np.where(dm['Zone'] == 'D', -1, 0)))

### home and visitor zone start

- **visitor team zone start (vzs)**
- If the team code of the event matches the visitor team, the visitor zone start variable is assigned the same value as zone start. If not, it is assigned the opposite (negated) value of zone start.

In [53]:
dm['vzs'] = np.where(dm['TeamCode'] == dm['VTeamCode'], dm['zs'], -dm['zs'] )

- **home team zone start (hzs)**
- If the team code of the event matches the home team, the home zone start variable is assigned the same value as zone start. If not, it is assigned the opposite (negated) value of zone start.

In [54]:
dm['hzs'] = np.where(dm['TeamCode'] == dm['HTeamCode'], dm['zs'], -dm['zs'] )
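A short worked example of the sign convention, on three hypothetical events between a visiting BOS and a home MTL (placeholder team codes):

```python
import numpy as np
import pandas as pd

# Hypothetical events: zone is recorded from the acting team's point of view.
toy = pd.DataFrame({
    'TeamCode': ['BOS', 'MTL', 'BOS'],
    'VTeamCode': ['BOS', 'BOS', 'BOS'],
    'HTeamCode': ['MTL', 'MTL', 'MTL'],
    'Zone': ['O', 'O', 'N'],
})

# 1 = offensive, -1 = defensive, 0 = neutral (or anything else).
toy['zs'] = np.where(toy['Zone'] == 'O', 1, np.where(toy['Zone'] == 'D', -1, 0))

# Same sign for the acting team, flipped sign for its opponent.
toy['vzs'] = np.where(toy['TeamCode'] == toy['VTeamCode'], toy['zs'], -toy['zs'])
toy['hzs'] = np.where(toy['TeamCode'] == toy['HTeamCode'], toy['zs'], -toy['zs'])
print(toy[['TeamCode', 'Zone', 'zs', 'vzs', 'hzs']])
```

An offensive-zone event by the visitor is a defensive-zone event for the home team, so `vzs` and `hzs` are always negatives of each other.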

### create columns for each type of event and assign values to determine the impact they have on a goal

Values are assigned to all nine even-strength event types:
- blocked shot
- face-off
- giveaway
- goal
- hit
- missed shot
- penalty
- shot on goal
- takeaway

In [55]:
dm['block'] = np.where(dm['EventType'] == 'BLOCK', 1, 0)
dm['faceoff'] = np.where(dm['EventType'] == 'FAC', 1, 0)
dm['giveaway'] = np.where(dm['EventType'] == 'GIVE', 1, 0)
dm['goal'] = np.where(dm['EventType'] == 'GOAL', 1, 0)
dm['hit'] = np.where(dm['EventType'] == 'HIT', 1, 0)
dm['miss'] = np.where(dm['EventType'] == 'MISS', 1, 0)
dm['penalty'] = np.where(dm['EventType'] == 'PENL', 1, 0)
dm['shot'] = np.where(dm['EventType'] == 'SHOT', 1, 0)
dm['takeaway'] = np.where(dm['EventType'] == 'TAKE', 1, 0)

### create a variable that will display the value of each event 

All events that happened 20 seconds prior to a goal are counted. The **mean** is used to establish the impact each event has on a goal.

The first step is to determine whether an event has a positive or negative impact on a goal:
 - giveaway has a **negative impact** on the team that lost possession.
 - faceoff has a **positive impact** on the team that won the faceoff and a **negative impact** on the team that lost.
 - hit has a **positive impact** on the team that delivered the hit and a **negative impact** on the team that received the hit.
 - penalty has a **positive impact** on the team that drew the penalty and a **negative impact** on the team serving it.
 - takeaway has a **positive impact** on the team that stole the puck and gained possession.

In [56]:
# Mean of each indicator column, i.e. the share of retained events of that type.
# Giveaways and penalties are negated because they count against the acting team.
event_means = {
    'BLOCK': dm['block'].mean(),
    'FAC': dm['faceoff'].mean(),
    'GIVE': -dm['giveaway'].mean(),
    'GOAL': dm['goal'].mean(),
    'HIT': dm['hit'].mean(),
    'MISS': dm['miss'].mean(),
    'PENL': -dm['penalty'].mean(),
    'SHOT': dm['shot'].mean(),
    'TAKE': dm['takeaway'].mean(),
}
# Any other event type receives a value of 0.
dm['eventvalue'] = dm['EventType'].map(event_means).fillna(0)

All event types have been assigned a value based on the impact they had on a goal. 
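Because each indicator column is 0 or 1, its mean equals the share of retained events of that type. The sketch below reproduces this on four hypothetical events with `value_counts(normalize=True)`, then flips the sign for giveaways and penalties; it is an illustration under that reading, not a replacement for the cell above.

```python
import pandas as pd

# Hypothetical retained events.
toy = pd.DataFrame({'EventType': ['SHOT', 'SHOT', 'GIVE', 'GOAL']})

# Share of events of each type: SHOT 0.5, GIVE 0.25, GOAL 0.25.
share = toy['EventType'].value_counts(normalize=True)

# -1 for event types that count against the acting team, +1 otherwise.
sign = toy['EventType'].map({'GIVE': -1, 'PENL': -1}).fillna(1)

toy['eventvalue'] = toy['EventType'].map(share) * sign
print(toy)
```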

### create event value for home and visitor teams

Each event affects both the home and the visitor team. An event that has a positive impact on the home team has a negative impact on the visitor team. Similarly, an event that has a negative impact on the home team has a positive impact on the visitor team.

- If an event has a positive impact on the **home team**, the mean will be positive. If an event has a negative impact on the home team, the mean will be negative.
- If an event has a positive impact on the **visitor team**, the mean will be positive. If an event has a negative impact on the visitor team, the mean will be negative.

In [57]:
# The event value counts for the team credited with the event and against its opponent.
dm['heventvalue'] = np.where(dm['TeamCode'] == dm['HTeamCode'], dm['eventvalue'], -dm['eventvalue'])
dm['veventvalue'] = np.where(dm['TeamCode'] == dm['VTeamCode'], dm['eventvalue'], -dm['eventvalue'])
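A worked example of the home and visitor split on two hypothetical events (team codes and values are placeholders):

```python
import numpy as np
import pandas as pd

# Hypothetical events: the first is credited to the visitor, the second to the home team.
toy = pd.DataFrame({
    'TeamCode': ['BOS', 'MTL'],
    'HTeamCode': ['MTL', 'MTL'],
    'VTeamCode': ['BOS', 'BOS'],
    'eventvalue': [0.5, 0.25],
})

# Keep the sign for the credited team, flip it for the opponent.
toy['heventvalue'] = np.where(toy['TeamCode'] == toy['HTeamCode'],
                              toy['eventvalue'], -toy['eventvalue'])
toy['veventvalue'] = np.where(toy['TeamCode'] == toy['VTeamCode'],
                              toy['eventvalue'], -toy['eventvalue'])
print(toy)
```

When a condition combines several comparisons, each comparison needs its own parentheses, because `&` binds more tightly than `==` in Python.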

### assign values to each player 

The value of an event is assigned to all players who were on the ice, a total of 12 players (6 per team). The overall contribution of each player is the sum of the values of the events he participated in.

#### contribution of each player per game

Group data frame by season, game number, team code and player for both home and away teams.

In [58]:
dm['vp'] = dm.groupby(['Season', 'GameNumber', 'VTeamCode', 'VPlayer'])['veventvalue'].transform('sum')
dm['hp'] = dm.groupby(['Season', 'GameNumber', 'HTeamCode', 'HPlayer'])['heventvalue'].transform('sum')
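`transform('sum')` returns one value per original row, so every row of a player in a game carries that player's game total. A minimal sketch on hypothetical values, with the collapsed one-row-per-group form alongside for comparison:

```python
import pandas as pd

# Hypothetical visitor-side event values.
toy = pd.DataFrame({
    'GameNumber': [1, 1, 1, 2],
    'VPlayer': ['A', 'A', 'B', 'A'],
    'veventvalue': [0.5, -0.25, 0.5, 0.25],
})

# Broadcast each (game, player) total back onto every row of that group.
toy['vp'] = toy.groupby(['GameNumber', 'VPlayer'])['veventvalue'].transform('sum')

# Same totals, one row per (game, player).
per_game = toy.groupby(['GameNumber', 'VPlayer'], as_index=False)['veventvalue'].sum()
print(toy)
print(per_game)
```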

### games played

Create a variable that counts the number of games each player played in. The total contribution of a given player is the sum of the values of the events he participated in, divided by the number of games he played.

- **a) games played per player for visitor team:**

- create a variable that counts the number of games each player from the **visitor team** played.

In [59]:
dm['vgp'] = dm.groupby(['Season', 'VTeamCode', 'VPlayer'])['GameNumber'].transform('nunique')

- **b) games played per player for home team:**

- create a variable that counts the number of games each player from the **home team** played.

In [60]:
dm['hgp'] = dm.groupby(['Season', 'HTeamCode', 'HPlayer'])['GameNumber'].transform('nunique')
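One way to count games rather than rows, sketched on hypothetical data: group by season, team and player, then count the distinct game numbers with `transform('nunique')`. This assumes "games played" means the number of distinct games in which the player appears among the retained events.

```python
import pandas as pd

# Hypothetical visitor-side rows: player A appears in games 1 and 2, player B in game 2 only.
toy = pd.DataFrame({
    'Season': [2011] * 5,
    'GameNumber': [1, 1, 2, 2, 2],
    'VTeamCode': ['BOS'] * 5,
    'VPlayer': ['A', 'A', 'A', 'B', 'B'],
})

keys = ['Season', 'VTeamCode', 'VPlayer']

# Distinct games per player, repeated on every row of that player.
toy['vgp'] = toy.groupby(keys)['GameNumber'].transform('nunique')

# For comparison: 'count' returns the number of rows, not the number of games.
toy['rows'] = toy.groupby(keys)['GameNumber'].transform('count')
print(toy)
```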

### overall games played

So far, the number of games each player played has been counted separately for his team's home games and away games. The **total games** of each player is the sum of the home and away games he participated in over a whole season.

- create a variable that adds up the home and away games played for all players of a given team.

In [61]:
dm['gp'] = np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['VTeamCode']) & (dm['HPlayer'] == dm['VPlayer']), (dm['hgp'] + dm['vgp']),
                   (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] != dm['VTeamCode']) & (dm['HPlayer'] != dm['VPlayer']), dm['hgp'],
                   (np.where((dm['Season'] == dm['Season']) &(dm['VTeamCode'] == dm['HTeamCode']) & (dm['VPlayer'] == dm['HPlayer']), (dm['vgp'] + dm['hgp']), dm['vgp'])))))

### overall player contribution

So far, the impact of each player has been calculated separately for his team's home games and away games, since home event value and visitor event value were used. The **total contribution** of each player is the total value of the events he participated in over a whole season. Thus, the sum of both home and away event values must be computed.

- create a variable that adds up the home event value and away event value for all players of a given team who played in **position 1.**

In [62]:
dm['plyr'] = np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['VTeamCode']) & (dm['HPlayer'] == dm['VPlayer']), (dm['hp'] + dm['vp'])/dm['gp'],
                   (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] != dm['VTeamCode']) & (dm['HPlayer'] != dm['VPlayer']), dm['hp']/dm['hgp'],
                   (np.where((dm['Season'] == dm['Season']) &(dm['VTeamCode'] == dm['HTeamCode']) & (dm['VPlayer'] == dm['HPlayer']), (dm['vp'] + dm['hp'])/dm['gp'], dm['vp']/dm['vgp'])))))

The total contribution of each player for the duration of a season has been measured.
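A player's home and away appearances sit in different columns ('HPlayer' and 'VPlayer') on different rows. One alternative way to combine them, sketched here as an assumption rather than the notebook's method, is to stack the visitor and home sides into a single long table and aggregate per player. The names `Team`, `Player`, `value`, `total`, `games` and `per_game` are placeholders.

```python
import pandas as pd

# Hypothetical rows: player A (BOS) is away in game 1 and at home in game 2.
toy = pd.DataFrame({
    'Season': [2011, 2011, 2011],
    'GameNumber': [1, 1, 2],
    'VTeamCode': ['BOS', 'BOS', 'MTL'],
    'HTeamCode': ['MTL', 'MTL', 'BOS'],
    'VPlayer': ['A', 'A', 'X'],
    'HPlayer': ['X', 'X', 'A'],
    'veventvalue': [0.5, 0.25, -0.5],
    'heventvalue': [-0.5, -0.25, 0.5],
})

cols = ['Season', 'GameNumber', 'Team', 'Player', 'value']

# Give both sides the same column names, then stack them.
away = toy.rename(columns={'VTeamCode': 'Team', 'VPlayer': 'Player',
                           'veventvalue': 'value'})[cols]
home = toy.rename(columns={'HTeamCode': 'Team', 'HPlayer': 'Player',
                           'heventvalue': 'value'})[cols]
stacked = pd.concat([away, home], ignore_index=True)

# Season total, distinct games, and per-game contribution for each player.
summary = (stacked.groupby(['Season', 'Team', 'Player'])
           .agg(total=('value', 'sum'), games=('GameNumber', 'nunique'))
           .reset_index())
summary['per_game'] = summary['total'] / summary['games']
print(summary)
```

With this layout, home and away values land in the same column, so no branching on home versus visitor is needed when summing.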

### store player evaluation data frame

The player evaluation data frame is stored and used for the next stage of analysis, player allocation.

In [63]:
dm.to_csv('out_data/player_evaluation.csv', index=False, sep=',')
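The `index` argument decides whether the row index is written as an extra first column. The sketch below round-trips a hypothetical frame through an in-memory buffer to show the effect; a written index comes back as the 'Unnamed: 0' column that was dropped at the start of this notebook.

```python
import io
import pandas as pd

# Hypothetical two-row result frame.
toy = pd.DataFrame({'VPlayer': ['A', 'B'], 'plyr': [0.5, -0.25]})

# Index written (the default): it reappears as an unnamed column on read.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# Index suppressed with the boolean False.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```

The value should be the boolean `False`; Python treats any non-empty string, including `'False'`, as truthy.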

The next step is to allocate players to their respective roster positions.