# Player Allocation

Once overall performance has been evaluated for all skaters, each player is categorized and assigned to their respective position within their team's roster. The fundamental statistic used to separate top from bottom players is zone start percentage. A play can start in one of three zones: offensive, neutral and defensive. Offensive zone start percentage is the number of face-offs held in the attacking zone divided by the total number of face-offs a player was on the ice for. Likewise, defensive zone start percentage is the number of face-offs taken in the player's own zone divided by the total number of face-offs that player was on the ice for. Skaters who are talented at creating opportunities and producing goals will have a greater percentage of offensive zone starts than defensive zone starts. Conversely, players who are skilled at preventing chances and goals against will have a higher defensive zone start percentage than offensive zone start percentage. In other words, top six forwards and top four defensemen start many of their shifts in the opponent's zone, whereas bottom six forwards and bottom pairing defensemen start many of theirs in their own zone.
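As a small illustration of the two percentages, the sketch below computes them from face-off counts. The function name `zone_start_pct` and the counts are made up for this example; they are not taken from the data set.

```python
def zone_start_pct(offensive, neutral, defensive):
    """Return (offensive %, defensive %) of all face-offs a player was on the ice for."""
    total = offensive + neutral + defensive
    if total == 0:
        # No face-offs on record: report zero rather than divide by zero.
        return 0.0, 0.0
    return 100 * offensive / total, 100 * defensive / total


# Hypothetical skater: 450 offensive, 300 neutral and 250 defensive zone face-offs.
ozs, dzs = zone_start_pct(450, 300, 250)
print(ozs, dzs)  # 45.0 25.0
```

A skater with these counts starts more often in the attacking zone than in his own, which is the profile expected of a top six forward.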


### purpose of notebook:

a) create two variables: offensive zone start and defensive zone start.

b) sum up the total offensive and defensive zone start per player.

c) categorize forwards into top and bottom six.

d) categorize defensemen into top four and bottom pairing.

e) determine each player's roster depth position

##  import modules

In [12]:
import sys
import os
import pandas as pd
import numpy as np
import datetime, time
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols
from pylab import hist, show
import scipy

## import data frame

The player evaluation data frame is used for player allocation.

In [13]:
dm = pd.read_csv('player_evaluation.csv')

## drop unnamed column (irrelevant)

In [14]:
dm = dm.drop('Unnamed: 0', axis=1)
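An `Unnamed: 0` column usually appears when a data frame was written to CSV together with its index and then read back without `index_col`. The sketch below reproduces this on a made-up frame (the `Player` and `gp` values are illustrative only) and shows that writing with `index=False` avoids the extra column.

```python
import io

import pandas as pd

# Hypothetical two-row frame standing in for the evaluation data.
toy = pd.DataFrame({'Player': ['A', 'B'], 'gp': [82, 60]})

# Written with the index: the index comes back as an unnamed first column.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()
print(cols_with_index)  # ['Unnamed: 0', 'Player', 'gp']

# Written without the index: only the real columns come back.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()
print(cols_without_index)  # ['Player', 'gp']
```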

## zone start

Using the `Zone` variable, a zone start variable is created that distinguishes offensive, neutral and defensive zone starts.

**zone start variable:** 

- a value of 1 will be assigned if the on-ice event happened in the offensive zone.

- a value of 0 will be assigned if the on-ice event happened in the neutral zone.

- a value of -1 will be assigned if the on-ice event happened in the defensive zone of the respective team.

In [15]:
dm['zs'] = np.where(dm['Zone'] == 'O', 1,
                    (np.where(dm['Zone'] == 'D', -1, 0)))

## home and visitor zone start

- visitor team zone start (vzs)

If the team code of the event is the same as the visitor team, the visitor zone start variable is assigned the same value as zone start. If not, it is assigned the opposite (negated) value of zone start.

In [16]:
dm['vzs'] = np.where(dm['TeamCode'] == dm['VTeamCode'], dm['zs'], -dm['zs'] )

- home team zone start (hzs) 

If the team code of the event is the same as the home team, the home zone start variable is assigned the same value as zone start. If not, it is assigned the opposite (negated) value of zone start.

In [17]:
dm['hzs'] = np.where(dm['TeamCode'] == dm['HTeamCode'], dm['zs'], -dm['zs'] )
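The sign flip can be checked on a tiny example. The sketch below assumes two made-up face-offs in a game between a visitor `BOS` and a home team `MTL`; the team codes and rows are illustrative, only the column names follow the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical events: one recorded for the visitor, one for the home team.
toy = pd.DataFrame({
    'TeamCode':  ['BOS', 'MTL'],
    'VTeamCode': ['BOS', 'BOS'],
    'HTeamCode': ['MTL', 'MTL'],
    'Zone':      ['O', 'D'],
})

# Zone start from the point of view of the team the event is recorded for.
toy['zs'] = np.where(toy['Zone'] == 'O', 1, np.where(toy['Zone'] == 'D', -1, 0))

# Same value for the event's own team, negated value for the opponent.
toy['vzs'] = np.where(toy['TeamCode'] == toy['VTeamCode'], toy['zs'], -toy['zs'])
toy['hzs'] = np.where(toy['TeamCode'] == toy['HTeamCode'], toy['zs'], -toy['zs'])

print(toy[['TeamCode', 'Zone', 'zs', 'vzs', 'hzs']])
```

Both rows end up with `vzs = 1` and `hzs = -1`: a face-off in the home team's defensive zone is an offensive zone start for the visitor, so the two variables are always opposites of each other.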

## assign zone start to players

The value of zone start is assigned to all players that were on the ice, a total of 12 players (6 per team). The overall zone start variable of each player is the sum of zone start values over all events he participated in.

### a) overall zone start of each player from the visitor team in all (6) positions 

Group data frame by season, visitor team code and visitor player.

- create variable that sums up the overall zone start of each player from the **visitor team**.

In [18]:
dm['zvp'] = dm.groupby(['Season', 'VTeamCode', 'VPlayer'])['vzs'].transform('sum')

### b) overall zone start of each player from the home team in all (6) positions

Group data frame by season, home team code and home player.

- create variable that sums up the overall zone start of each player from the **home team**.

In [19]:
dm['zhp'] = dm.groupby(['Season', 'HTeamCode', 'HPlayer'])['hzs'].transform('sum')
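To make the effect of `transform('sum')` concrete, the sketch below applies the same grouping to a made-up frame with two visitor players. The rows are illustrative; only the column names follow the notebook.

```python
import pandas as pd

# Hypothetical events for two visitor players in one season.
toy = pd.DataFrame({
    'Season':    [2011, 2011, 2011, 2011],
    'VTeamCode': ['BOS', 'BOS', 'BOS', 'BOS'],
    'VPlayer':   ['A', 'A', 'B', 'B'],
    'vzs':       [1, 1, -1, 0],
})

# transform('sum') returns one value per row, so every event row of a player
# carries that player's season total.
toy['zvp'] = toy.groupby(['Season', 'VTeamCode', 'VPlayer'])['vzs'].transform('sum')
print(toy)
```

Player `A` gets 2 on both of his rows and player `B` gets -1 on both of his, which keeps the result aligned with the event-level data frame.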

## overall zone starts

So far, each player's zone starts have been calculated separately for games in which his team was home and games in which it was away, since the home and visitor zone start values were used. The **total zone starts** of a player is the number of zone starts he participated in over a whole season, that is, the sum of his home and away zone starts.

- create a variable that adds up the home zone start value and away zone start value for all players of a given team, scaled by games played.

In [20]:
dm['zplyr'] = np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['VTeamCode']) & (dm['HPlayer'] == dm['VPlayer']), (dm['zhp'] + dm['zvp'])/dm['gp'],
                   (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] != dm['VTeamCode']) & (dm['HPlayer'] != dm['VPlayer']), dm['zhp']/dm['hgp'],
                   (np.where((dm['Season'] == dm['Season']) &(dm['VTeamCode'] == dm['HTeamCode']) & (dm['VPlayer'] == dm['HPlayer']), (dm['zvp'] + dm['zhp'])/dm['gp'], dm['zvp']/dm['vgp'])))))
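One way to combine a player's home and away totals is to stack the home and visitor columns into a single long table and aggregate per player. The sketch below does this on a made-up frame; the `Game` column, the unified `Team`/`Player`/`zs` names and all values are assumptions for illustration, not part of the data set.

```python
import pandas as pd

# Hypothetical events from three games between MTL and BOS.
toy = pd.DataFrame({
    'Season':    [2011, 2011, 2011],
    'Game':      [1, 2, 3],
    'HTeamCode': ['MTL', 'BOS', 'MTL'],
    'HPlayer':   ['X', 'A', 'X'],
    'hzs':       [1, -1, 1],
    'VTeamCode': ['BOS', 'MTL', 'BOS'],
    'VPlayer':   ['A', 'X', 'A'],
    'vzs':       [-1, 1, -1],
})

# Bring the home and visitor sides to a common set of column names.
cols = ['Season', 'Game', 'Team', 'Player', 'zs']
home = toy[['Season', 'Game', 'HTeamCode', 'HPlayer', 'hzs']].copy()
home.columns = cols
away = toy[['Season', 'Game', 'VTeamCode', 'VPlayer', 'vzs']].copy()
away.columns = cols
stacked = pd.concat([home, away], ignore_index=True)

# Season total and games played per player, then zone starts per game.
keys = ['Season', 'Team', 'Player']
total = stacked.groupby(keys)['zs'].sum()
games = stacked.groupby(keys)['Game'].nunique()
per_game = total / games
print(per_game)
```

Here player `X` totals 3 over three games and player `A` totals -3, giving 1.0 and -1.0 zone starts per game regardless of whether each game was played at home or away.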

## allocate players per position to forward lines and defensive pairings

- **position 1** is the **center** position of forward lines.

- **position 2** is the **right wing** position of forward lines.

- **position 3** is the **left wing** position of forward lines. 

- **position 4** is the **right defense** position of defense pairing. 

- **position 5** is the **left defense** position of defense pairing.

- **position 6** is the **goaltender** position.

- If the zone start variable of a player for a given team is the highest in that specific position, it indicates that he participated in the most offensive zone starts. These skaters will be identified as **first line forwards and top defensive pairing**.

- If the zone start variable of a player is the second highest in that specific position, it indicates that the given skater participated in the second most offensive zone starts. These skaters will be identified as **second line forwards and second defensive pairing**.

- If the zone start variable of a player is the third highest in that specific position, it indicates that the given skater participated in the third most offensive zone starts. These skaters will be identified as **third line forwards and bottom defensive pairing**.

- If the zone start variable of a player is the lowest in that specific position, it indicates that the given skater participated in the fewest offensive zone starts. These skaters will be identified as **fourth line forwards**.
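The rule above can be sketched with a dense rank inside each season, team and position group. The frame below is made up (four centers on one team), and the column names `Team`, `Position`, `Player` and `line` are hypothetical; it only illustrates the ordering logic.

```python
import numpy as np
import pandas as pd

# Hypothetical zone start totals for four centers on one team.
toy = pd.DataFrame({
    'Season':   [2011] * 4,
    'Team':     ['BOS'] * 4,
    'Position': ['C'] * 4,
    'Player':   ['A', 'B', 'C', 'D'],
    'zplyr':    [40, 10, -5, -30],
})

grp = toy.groupby(['Season', 'Team', 'Position'])['zplyr']
rank = grp.rank(method='dense', ascending=False)  # 1 = highest zone start

# Highest -> line 1, lowest -> line 4, second highest -> line 2, otherwise line 3.
toy['line'] = np.where(toy['zplyr'] == grp.transform('max'), 1,
                       np.where(toy['zplyr'] == grp.transform('min'), 4,
                                np.where(rank == 2, 2, 3)))
print(toy[['Player', 'zplyr', 'line']])
```

With these values the players `A`, `B`, `C` and `D` land on lines 1, 2, 3 and 4 respectively. If a group holds more than four players, everyone between the second highest and the lowest falls on line 3 under this sketch.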

### a) visitor team

- generate a variable that allocates all players to their respective line. If **total zone start** is the highest amongst players per position, that player is assigned to the **top line**. If **total zone start** is the lowest amongst players per position, that player is assigned to the **4th line**. Of the two remaining players, the one with the higher **total zone start** is allocated to the **2nd line** and the other to the **3rd line**.

In [21]:
dm['vmax'] = dm.groupby(['Season', 'VTeamCode', 'VPosition'])['zvp'].transform('max')

In [22]:
dm['vmin'] = dm.groupby(['Season', 'VTeamCode', 'VPosition'])['zvp'].transform('min')

In [27]:
# Rank each player's zone start within his season, team and position (1 = highest).
vskater = dm['VPosition'].isin(['C', 'RW', 'LW', 'RD', 'LD'])
vrank = dm.groupby(['Season', 'VTeamCode', 'VPosition'])['zvp'].rank(method='dense', ascending=False)
# Highest total -> 1st line, lowest -> 4th line, second highest -> 2nd line, everyone else -> 3rd line.
dm['vpos'] = np.where(vskater & (dm['zvp'] == dm['vmax']), 1,
                      np.where(vskater & (dm['zvp'] == dm['vmin']), 4,
                               np.where(vskater & (vrank == 2), 2, 3)))


### b) home team

- generate a variable that allocates all players to their respective line. If **total zone start** is the highest amongst players per position, that player is assigned to the **top line**. If **total zone start** is the lowest amongst players per position, that player is assigned to the **4th line**. Of the two remaining players, the one with the higher **total zone start** is allocated to the **2nd line** and the other to the **3rd line**.

In [31]:
dm['hmax'] = dm.groupby(['Season', 'HTeamCode', 'HPosition'])['zhp'].transform('max')

In [32]:
dm['hmin'] = dm.groupby(['Season', 'HTeamCode', 'HPosition'])['zhp'].transform('min')

In [33]:
# Rank each player's zone start within his season, team and position (1 = highest).
hskater = dm['HPosition'].isin(['C', 'RW', 'LW', 'RD', 'LD'])
hrank = dm.groupby(['Season', 'HTeamCode', 'HPosition'])['zhp'].rank(method='dense', ascending=False)
# Highest total -> 1st line, lowest -> 4th line, second highest -> 2nd line, everyone else -> 3rd line.
dm['hpos'] = np.where(hskater & (dm['zhp'] == dm['hmax']), 1,
                      np.where(hskater & (dm['zhp'] == dm['hmin']), 4,
                               np.where(hskater & (hrank == 2), 2, 3)))


## overall player allocation

Each player has been assigned to their respective roster position separately for the home and away games of the season. The **overall roster position** of each player is the mean of his home and away positions.

In [34]:
dm['position'] = np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['VTeamCode']) & (dm['HPlayer'] == dm['VPlayer']), (dm['hpos'] + dm['vpos'])/dm['gp'],
                   (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] != dm['VTeamCode']) & (dm['HPlayer'] != dm['VPlayer']), dm['hpos']/dm['hgp'],
                   (np.where((dm['Season'] == dm['Season']) &(dm['VTeamCode'] == dm['HTeamCode']) & (dm['VPlayer'] == dm['HPlayer']), (dm['vpos'] + dm['hpos'])/dm['gp'], dm['vpos']/dm['vgp'])))))
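The idea of averaging a player's home and away line can be sketched on per-player summaries. The two frames below are made up, and the column names `Team`, `Player` and `line` are hypothetical stand-ins for the home and visitor variables.

```python
import pandas as pd

# Hypothetical line assignments for two players, at home and away.
home = pd.DataFrame({'Season': [2011, 2011], 'Team': ['BOS', 'BOS'],
                     'Player': ['A', 'B'], 'line': [1, 3]})
away = pd.DataFrame({'Season': [2011, 2011], 'Team': ['BOS', 'BOS'],
                     'Player': ['A', 'B'], 'line': [2, 3]})

# Stack both views and average per player.
both = pd.concat([home, away], ignore_index=True)
position = both.groupby(['Season', 'Team', 'Player'])['line'].mean()
print(position)
```

Player `A` averages 1.5 (first line at home, second line away) and player `B` averages 3.0.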

## store player allocation data frame

The player allocation data frame will be stored and used for the implementation of the roster model.

In [36]:
# index=False (a boolean, not the string 'False') keeps the row index out of the file.
dm.to_csv('player_allocation.csv', index=False, sep=',')