# Player Allocation

Once overall performance has been evaluated for all skaters, each player is categorized and assigned to their respective position within their team's roster. The fundamental statistic used to differentiate top from bottom players is zone start percentage. There are three zones in which a play can start: offensive, neutral and defensive. Offensive zone start percentage is the number of face-offs held in the attacking zone divided by the total number of face-offs an individual player was on the ice for. Likewise, defensive zone start percentage is the number of face-offs taken in the player's own zone divided by the total number of face-offs that player was on the ice for. Skaters who are talented at creating opportunities and producing goals will have a greater percentage of offensive zone starts than defensive zone starts. Conversely, players who are skilled at preventing chances and goals against will have a higher defensive zone start percentage than offensive zone start percentage. In other words, top six forwards and top four defensemen will start many of their shifts in their opponent's zone, whereas bottom six forwards and bottom pairing defensemen will start many of theirs in their own zone.
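The two percentages defined above can be illustrated on a toy table. This is a minimal sketch, assuming a hypothetical face-off log `faceoffs` with one row per face-off a player was on the ice for; the column names `player` and `zone` are illustrative and not taken from the notebook's data.

```python
import pandas as pd

# Hypothetical face-off log: one row per face-off a player was on the ice for.
faceoffs = pd.DataFrame({
    "player": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "zone":   ["O", "O", "N", "D", "D", "D", "O", "N"],
})

# Count face-offs per player and zone, then divide by each player's total.
counts = faceoffs.groupby(["player", "zone"]).size().unstack(fill_value=0)
pct = counts.div(counts.sum(axis=1), axis=0) * 100

ozs_pct = pct["O"]  # offensive zone start percentage per player
dzs_pct = pct["D"]  # defensive zone start percentage per player
```

In this toy data, player A starts more often in the offensive zone and player B more often in the defensive zone, which is the pattern the allocation relies on.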


### purpose of notebook:

a) create two variables: offensive zone start and defensive zone start.

b) sum up the total offensive and defensive zone start per player.

c) categorize forwards into top and bottom six.

d) categorize defensemen into top four and bottom pairing.

e) determine each player's roster depth position

##  import modules

In [1]:
import sys
import os
import pandas as pd
import numpy as np
import datetime, time
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols
from pylab import hist, show
import scipy

## import data frame

The player evaluation data frame is used for player allocation.

In [2]:
dm = pd.read_csv('player_evaluation.csv')

## drop unnamed column (irrelevant)

In [3]:
dm = dm.drop('Unnamed: 0', axis=1)

## zone start

Using the `Zone` variable, offensive, neutral and defensive zone starts are encoded in a single zone start variable.

**zone start variable:** 

- a value of 1 will be assigned if the on-ice event happened in the offensive zone.

- a value of 0 will be assigned if the on-ice event happened in the neutral zone.

- a value of -1 will be assigned if the on-ice event happened in the defensive zone of the respective team.

In [4]:
dm['zs'] = np.where(dm['Zone'] == 'O', 1,
                    (np.where(dm['Zone'] == 'D', -1, 0)))
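The same three-way encoding can also be written with `np.select`, which avoids nesting `np.where` calls. A small sketch on a toy frame (the frame `toy` is hypothetical; only the `Zone` codes `'O'` and `'D'` come from the cell above):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the event data; any code other than 'O' or 'D' maps to 0.
toy = pd.DataFrame({"Zone": ["O", "N", "D", "O"]})

toy["zs"] = np.select(
    [toy["Zone"] == "O", toy["Zone"] == "D"],  # conditions, checked in order
    [1, -1],                                   # value for each condition
    default=0,                                 # neutral zone (or anything else)
)
```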

## home and visitor zone start

- visitor team zone start (vzs)

If the team code of the event is the same as the visitor team code, the visitor zone start variable is assigned the same value as zone start. If not, it is assigned the opposite (negative) value of zone start.

In [5]:
dm['vzs'] = np.where(dm['TeamCode'] == dm['VTeamCode'], dm['zs'], -dm['zs'] )

- home team zone start (hzs) 

If the team code of the event is the same as the home team code, the home zone start variable is assigned the same value as zone start. If not, it is assigned the opposite (negative) value of zone start.

In [6]:
dm['hzs'] = np.where(dm['TeamCode'] == dm['HTeamCode'], dm['zs'], -dm['zs'] )
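To see the sign flip in action, here is a small sketch on made-up events. The team codes `TOR` and `MTL` and the frame `toy` are assumptions for illustration only; the column names mirror those used above.

```python
import numpy as np
import pandas as pd

# Three hypothetical events in a game where TOR is home and MTL is visiting.
toy = pd.DataFrame({
    "TeamCode":  ["TOR", "MTL", "TOR"],   # team the event is recorded for
    "HTeamCode": ["TOR", "TOR", "TOR"],
    "VTeamCode": ["MTL", "MTL", "MTL"],
    "zs":        [1, 1, -1],              # zone start from the event team's view
})

# Keep the sign for the event's own team, flip it for the opponent.
toy["hzs"] = np.where(toy["TeamCode"] == toy["HTeamCode"], toy["zs"], -toy["zs"])
toy["vzs"] = np.where(toy["TeamCode"] == toy["VTeamCode"], toy["zs"], -toy["zs"])
```

Because every event here belongs to exactly one of the two teams, `hzs` and `vzs` are always opposite in sign: an offensive zone start for one side is a defensive zone start for the other.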

## assign zone start to players

The value of zone start is assigned to all players that were on ice, a total of 12 players (6 per team). The overall zone start variable of each player is the total (sum) of events they participated in. 

### a) overall zone start of each player from the visitor team in all (6) positions 

Group the data frame by season, visitor team code and visitor player position to separate players that play in the same position.

- create variable that sums up the overall zone start of each player from the visitor team that is listed as "VPlayer1"

In [7]:
dm['zvp1'] = dm.groupby(['Season', 'VTeamCode', 'VPlayer1'])['vzs'].transform('sum')

- create variable that sums up the overall zone start of each player from the visitor team that is listed as "VPlayer2"

In [8]:
dm['zvp2'] = dm.groupby(['Season', 'VTeamCode', 'VPlayer2'])['vzs'].transform('sum')

- create variable that sums up the overall zone start of each player from the visitor team that is listed as "VPlayer3"

In [9]:
dm['zvp3'] = dm.groupby(['Season', 'VTeamCode', 'VPlayer3'])['vzs'].transform('sum')

- create variable that sums up the overall zone start of each player from the visitor team that is listed as "VPlayer4"

In [10]:
dm['zvp4'] = dm.groupby(['Season', 'VTeamCode', 'VPlayer4'])['vzs'].transform('sum')

- create variable that sums up the overall zone start of each player from the visitor team that is listed as "VPlayer5"

In [11]:
dm['zvp5'] = dm.groupby(['Season', 'VTeamCode', 'VPlayer5'])['vzs'].transform('sum')

- create variable that sums up the overall zone start of each player from the visitor team that is listed as "VPlayer6"

In [12]:
dm['zvp6'] = dm.groupby(['Season', 'VTeamCode', 'VPlayer6'])['vzs'].transform('sum')

### b) overall zone start of each player from the home team in all (6) positions

Group the data frame by season, home team code and home player position to separate players that play in the same position.

- create variable that sums up the overall zone start of each player from the home team that is listed as "HPlayer1"

In [13]:
dm['zhp1'] = dm.groupby(['Season', 'HTeamCode', 'HPlayer1'])['hzs'].transform('sum')

- create variable that sums up the overall zone start of each player from the home team that is listed as "HPlayer2"

In [14]:
dm['zhp2'] = dm.groupby(['Season', 'HTeamCode', 'HPlayer2'])['hzs'].transform('sum')

- create variable that sums up the overall zone start of each player from the home team that is listed as "HPlayer3"

In [15]:
dm['zhp3'] = dm.groupby(['Season', 'HTeamCode', 'HPlayer3'])['hzs'].transform('sum')

- create variable that sums up the overall zone start of each player from the home team that is listed as "HPlayer4"

In [16]:
dm['zhp4'] = dm.groupby(['Season', 'HTeamCode', 'HPlayer4'])['hzs'].transform('sum')

- create variable that sums up the overall zone start of each player from the home team that is listed as "HPlayer5"

In [17]:
dm['zhp5'] = dm.groupby(['Season', 'HTeamCode', 'HPlayer5'])['hzs'].transform('sum')

- create variable that sums up the overall zone start of each player from the home team that is listed as "HPlayer6"

In [18]:
dm['zhp6'] = dm.groupby(['Season', 'HTeamCode', 'HPlayer6'])['hzs'].transform('sum')
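The twelve cells above follow one pattern, so they could also be generated in a loop. The sketch below runs on a small made-up frame `toy` (player labels such as `v1a` are hypothetical) and is meant to produce the same kind of columns as `zvp1` to `zvp6` and `zhp1` to `zhp6`.

```python
import pandas as pd

# Toy events: three rows, one season, one visitor team and one home team.
toy = pd.DataFrame({
    "Season": [2015, 2015, 2015],
    "VTeamCode": ["MTL", "MTL", "MTL"],
    "HTeamCode": ["TOR", "TOR", "TOR"],
    "vzs": [1, -1, 1],
    "hzs": [-1, 1, -1],
    # Hypothetical player labels for the six visitor and six home slots.
    **{f"VPlayer{i}": [f"v{i}a", f"v{i}a", f"v{i}b"] for i in range(1, 7)},
    **{f"HPlayer{i}": [f"h{i}a", f"h{i}b", f"h{i}a"] for i in range(1, 7)},
})

for side, prefix, value_col in (("V", "zvp", "vzs"), ("H", "zhp", "hzs")):
    for i in range(1, 7):
        keys = ["Season", f"{side}TeamCode", f"{side}Player{i}"]
        # Sum the team's zone start value over all events of each player in slot i.
        toy[f"{prefix}{i}"] = toy.groupby(keys)[value_col].transform("sum")
```

A loop keeps the grouping keys and the value column in one place, which makes a copy-and-paste slip between the visitor and home versions less likely.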

## players that play in multiple positions

Throughout the duration of a game, a player may change position. As mentioned above, the overall impact of a given player is the total (sum) of events he participated in. To properly measure each player's contribution, a cross examination per position must be applied.

- **a) cross examine each position for visitor team:**

**visitor player 1**

- if the **visitor player in position 1** played in any other position throughout the season, the total number of zone starts he was on the ice for is calculated with **np.where**. It is the sum of zone starts over all positions he played in. A skater might be listed in position 6 (the goaltender position) when his team is trailing in the final minutes of a game and pulls the goaltender to add an additional skater.

In [19]:
dm['tzvp1'] = np.where((dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer1'] == dm['VPlayer2']), dm['zvp1'] + dm['zvp2'],
                       (np.where((dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer1'] == dm['VPlayer3']), dm['zvp1'] + dm['zvp3'],
                                 (np.where((dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer1'] == dm['VPlayer4']), dm['zvp1'] + dm['zvp4'],
                                           (np.where((dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer1'] == dm['VPlayer5']), dm['zvp1'] + dm['zvp5'],
                                                     (np.where((dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer1'] == dm['VPlayer6']), dm['zvp1'] + dm['zvp6'], dm['zvp1']))))))))) 

**visitor player 2**

- if the **visitor player in position 2** played in any other position throughout the season, the total number of zone starts he was on the ice for is calculated with **np.where**. It is the sum of zone starts over all positions he played in. A skater might be listed in position 6 (the goaltender position) when his team is trailing in the final minutes of a game and pulls the goaltender to add an additional skater.

In [20]:
dm['tzvp2'] = np.where((dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer2'] == dm['VPlayer1']), dm['zvp2'] + dm['zvp1'],
                       (np.where((dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer2'] == dm['VPlayer3']), dm['zvp2'] + dm['zvp3'],
                                 (np.where((dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer2'] == dm['VPlayer4']), dm['zvp2'] + dm['zvp4'],
                                           (np.where((dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer2'] == dm['VPlayer5']), dm['zvp2'] + dm['zvp5'],
                                                     (np.where((dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer2'] == dm['VPlayer6']), dm['zvp2'] + dm['zvp6'], dm['zvp2']))))))))) 

**visitor player 3**

- if the **visitor player in position 3** played in any other position throughout the season, the total number of zone starts he was on the ice for is calculated with **np.where**. It is the sum of zone starts over all positions he played in. A skater might be listed in position 6 (the goaltender position) when his team is trailing in the final minutes of a game and pulls the goaltender to add an additional skater.

In [21]:
dm['tzvp3'] = np.where((dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer3'] == dm['VPlayer1']), dm['zvp3'] + dm['zvp1'],
                       (np.where((dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer3'] == dm['VPlayer2']), dm['zvp3'] + dm['zvp2'],
                                 (np.where((dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer3'] == dm['VPlayer4']), dm['zvp3'] + dm['zvp4'],
                                           (np.where((dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer3'] == dm['VPlayer5']), dm['zvp3'] + dm['zvp5'],
                                                     (np.where((dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer3'] == dm['VPlayer6']), dm['zvp3'] + dm['zvp6'], dm['zvp3']))))))))) 

**visitor player 4**

- if the **visitor player in position 4** played in any other position throughout the season, the total number of zone starts he was on the ice for is calculated with **np.where**. It is the sum of zone starts over all positions he played in. A skater might be listed in position 6 (the goaltender position) when his team is trailing in the final minutes of a game and pulls the goaltender to add an additional skater.

In [22]:
dm['tzvp4'] = np.where((dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer4'] == dm['VPlayer1']), dm['zvp4'] + dm['zvp1'],
                       (np.where((dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer4'] == dm['VPlayer2']), dm['zvp4'] + dm['zvp2'],
                                 (np.where((dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer4'] == dm['VPlayer3']), dm['zvp4'] + dm['zvp3'],
                                           (np.where((dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer4'] == dm['VPlayer5']), dm['zvp4'] + dm['zvp5'],
                                                     (np.where((dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer4'] == dm['VPlayer6']), dm['zvp4'] + dm['zvp6'], dm['zvp4']))))))))) 

**visitor player 5**

- if the **visitor player in position 5** played in any other position throughout the season, the total number of zone starts he was on the ice for is calculated with **np.where**. It is the sum of zone starts over all positions he played in. A skater might be listed in position 6 (the goaltender position) when his team is trailing in the final minutes of a game and pulls the goaltender to add an additional skater.

In [23]:
dm['tzvp5'] = np.where((dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer5'] == dm['VPlayer1']), dm['zvp5'] + dm['zvp1'],
                       (np.where((dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer5'] == dm['VPlayer2']), dm['zvp5'] + dm['zvp2'],
                                 (np.where((dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer5'] == dm['VPlayer3']), dm['zvp5'] + dm['zvp3'],
                                           (np.where((dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer5'] == dm['VPlayer4']), dm['zvp5'] + dm['zvp4'],
                                                     (np.where((dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer5'] == dm['VPlayer6']), dm['zvp5'] + dm['zvp6'], dm['zvp5']))))))))) 

**visitor player 6**

Position 6 is the goaltender position. A goalie doesn't play in any other position, therefore:

In [24]:
dm['tzvp6'] = dm['zvp6']

- **b) cross examine each position for home team:**

**home player 1**

- if the **home player in position 1** played in any other position throughout the season, the total number of zone starts he was on the ice for is calculated with **np.where**. It is the sum of zone starts over all positions he played in. A skater might be listed in position 6 (the goaltender position) when his team is trailing in the final minutes of a game and pulls the goaltender to add an additional skater.

In [25]:
dm['tzhp1'] = np.where((dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer1'] == dm['HPlayer2']), dm['zhp1'] + dm['zhp2'],
                       (np.where((dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer1'] == dm['HPlayer3']), dm['zhp1'] + dm['zhp3'],
                                 (np.where((dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer1'] == dm['HPlayer4']), dm['zhp1'] + dm['zhp4'],
                                           (np.where((dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer1'] == dm['HPlayer5']), dm['zhp1'] + dm['zhp5'],
                                                     (np.where((dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer1'] == dm['HPlayer6']), dm['zhp1'] + dm['zhp6'], dm['zhp1']))))))))) 

**home player 2**

- if the **home player in position 2** played in any other position throughout the season, the total number of zone starts he was on the ice for is calculated with **np.where**. It is the sum of zone starts over all positions he played in. A skater might be listed in position 6 (the goaltender position) when his team is trailing in the final minutes of a game and pulls the goaltender to add an additional skater.

In [26]:
dm['tzhp2'] = np.where((dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer2'] == dm['HPlayer1']), dm['zhp2'] + dm['zhp1'],
                       (np.where((dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer2'] == dm['HPlayer3']), dm['zhp2'] + dm['zhp3'],
                                 (np.where((dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer2'] == dm['HPlayer4']), dm['zhp2'] + dm['zhp4'],
                                           (np.where((dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer2'] == dm['HPlayer5']), dm['zhp2'] + dm['zhp5'],
                                                     (np.where((dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer2'] == dm['HPlayer6']), dm['zhp2'] + dm['zhp6'], dm['zhp2']))))))))) 

**home player 3**

- if the **home player in position 3** played in any other position throughout the season, the total number of zone starts he was on the ice for is calculated with **np.where**. It is the sum of zone starts over all positions he played in. A skater might be listed in position 6 (the goaltender position) when his team is trailing in the final minutes of a game and pulls the goaltender to add an additional skater.

In [27]:
dm['tzhp3'] = np.where((dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer3'] == dm['HPlayer1']), dm['zhp3'] + dm['zhp1'],
                       (np.where((dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer3'] == dm['HPlayer2']), dm['zhp3'] + dm['zhp2'],
                                 (np.where((dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer3'] == dm['HPlayer4']), dm['zhp3'] + dm['zhp4'],
                                           (np.where((dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer3'] == dm['HPlayer5']), dm['zhp3'] + dm['zhp5'],
                                                     (np.where((dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer3'] == dm['HPlayer6']), dm['zhp3'] + dm['zhp6'], dm['zhp3']))))))))) 

**home player 4**

- if the **home player in position 4** played in any other position throughout the season, the total number of zone starts he was on the ice for is calculated with **np.where**. It is the sum of zone starts over all positions he played in. A skater might be listed in position 6 (the goaltender position) when his team is trailing in the final minutes of a game and pulls the goaltender to add an additional skater.

In [28]:
dm['tzhp4'] = np.where((dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer4'] == dm['HPlayer1']), dm['zhp4'] + dm['zhp1'],
                       (np.where((dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer4'] == dm['HPlayer2']), dm['zhp4'] + dm['zhp2'],
                                 (np.where((dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer4'] == dm['HPlayer3']), dm['zhp4'] + dm['zhp3'],
                                           (np.where((dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer4'] == dm['HPlayer5']), dm['zhp4'] + dm['zhp5'],
                                                     (np.where((dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer4'] == dm['HPlayer6']), dm['zhp4'] + dm['zhp6'], dm['zhp4']))))))))) 

**home player 5**

- if the **home player in position 5** played in any other position throughout the season, the total number of zone starts he was on the ice for is calculated with **np.where**. It is the sum of zone starts over all positions he played in. A skater might be listed in position 6 (the goaltender position) when his team is trailing in the final minutes of a game and pulls the goaltender to add an additional skater.

In [29]:
dm['tzhp5'] = np.where((dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer5'] == dm['HPlayer1']), dm['zhp5'] + dm['zhp1'],
                       (np.where((dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer5'] == dm['HPlayer2']), dm['zhp5'] + dm['zhp2'],
                                 (np.where((dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer5'] == dm['HPlayer3']), dm['zhp5'] + dm['zhp3'],
                                           (np.where((dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer5'] == dm['HPlayer4']), dm['zhp5'] + dm['zhp4'],
                                                     (np.where((dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer5'] == dm['HPlayer6']), dm['zhp5'] + dm['zhp6'], dm['zhp5']))))))))) 

**home player 6**

**Position 6 is the goaltender position.** A goalie doesn't play in any other position, therefore:

In [30]:
dm['tzhp6'] = dm['zhp6']
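A comparison such as `dm['VPlayer1'] == dm['VPlayer2']` is evaluated row by row, so it only matches when the same name appears in two slots of the same event. One alternative way to total a player's zone starts across every slot he appeared in is to reshape the player columns to long format first. The sketch below is an illustration on made-up data (the frame `toy` and the player labels `p1` to `p3` are hypothetical), not the notebook's method.

```python
import pandas as pd

# Toy visitor-side events with only two player slots for brevity.
toy = pd.DataFrame({
    "Season": [2015, 2015, 2015],
    "VTeamCode": ["MTL", "MTL", "MTL"],
    "vzs": [1, 1, -1],
    "VPlayer1": ["p1", "p1", "p2"],
    "VPlayer2": ["p2", "p3", "p1"],   # p1 and p2 swap slots in the last event
})

# One row per (event, slot): the slot a player occupied no longer matters.
long = toy.melt(
    id_vars=["Season", "VTeamCode", "vzs"],
    value_vars=["VPlayer1", "VPlayer2"],
    var_name="slot",
    value_name="player",
)

# Total zone start value per player over all slots he appeared in.
totals = long.groupby(["Season", "VTeamCode", "player"])["vzs"].sum()
```

Here `p1` is credited with two events in slot 1 and one in slot 2, and all three contribute to his total.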

## overall zone starts

So far, each player's zone starts have been calculated separately for home games and away games, since the home zone start and visitor zone start values were used. The **total zone starts** of a player is the number of zone starts he took part in over a whole season, that is, the sum of his home and away zone starts.

- create a variable that adds up the home and away zone start values for all players of a given team who played in **position 1.**

In [31]:
dm['zplyr1'] = np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['VTeamCode']) & (dm['HPlayer1'] == dm['VPlayer1']), (dm['tzhp1'] + dm['tzvp1'])/dm['gp1'],
                   (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] != dm['VTeamCode']) & (dm['HPlayer1'] != dm['VPlayer1']), dm['tzhp1']/dm['thgp3'],
                   (np.where((dm['Season'] == dm['Season']) &(dm['VTeamCode'] == dm['HTeamCode']) & (dm['VPlayer1'] == dm['HPlayer1']), (dm['tzvp1'] + dm['tzhp1'])/dm['gp1'], dm['tzvp1']/dm['tvgp1'])))))

- create a variable that adds up the home and away zone start values for all players of a given team who played in **position 2.**

In [32]:
dm['zplyr2'] = np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['VTeamCode']) & (dm['HPlayer2'] == dm['VPlayer2']), (dm['tzhp2'] + dm['tzvp2'])/2,
                   (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] != dm['VTeamCode']) & (dm['HPlayer2'] != dm['VPlayer2']), dm['tzhp2'],
                   (np.where((dm['Season'] == dm['Season']) &(dm['VTeamCode'] == dm['HTeamCode']) & (dm['VPlayer2'] == dm['HPlayer2']), (dm['tzvp2'] + dm['tzhp2'])/2, dm['tzvp2'])))))

- create a variable that adds up the home and away zone start values for all players of a given team who played in **position 3.**

In [33]:
dm['zplyr3'] = np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['VTeamCode']) & (dm['HPlayer3'] == dm['VPlayer3']), (dm['tzhp3'] + dm['tzvp3'])/2,
                   (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] != dm['VTeamCode']) & (dm['HPlayer3'] != dm['VPlayer3']), dm['tzhp3'],
                   (np.where((dm['Season'] == dm['Season']) &(dm['VTeamCode'] == dm['HTeamCode']) & (dm['VPlayer3'] == dm['HPlayer3']), (dm['tzvp3'] + dm['tzhp3'])/2, dm['tzvp3'])))))

- create a variable that adds up the home and away zone start values for all players of a given team who played in **position 4.**

In [34]:
dm['zplyr4'] = np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['VTeamCode']) & (dm['HPlayer4'] == dm['VPlayer4']), (dm['tzhp4'] + dm['tzvp4'])/2,
                   (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] != dm['VTeamCode']) & (dm['HPlayer4'] != dm['VPlayer4']), dm['tzhp4'],
                   (np.where((dm['Season'] == dm['Season']) &(dm['VTeamCode'] == dm['HTeamCode']) & (dm['VPlayer4'] == dm['HPlayer4']), (dm['tzvp4'] + dm['tzhp4'])/2, dm['tzvp4'])))))

- create a variable that adds up the home and away zone start values for all players of a given team who played in **position 5.**

In [35]:
dm['zplyr5'] = np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['VTeamCode']) & (dm['HPlayer5'] == dm['VPlayer5']), (dm['tzhp5'] + dm['tzvp5'])/2,
                   (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] != dm['VTeamCode']) & (dm['HPlayer5'] != dm['VPlayer5']), dm['tzhp5'],
                   (np.where((dm['Season'] == dm['Season']) &(dm['VTeamCode'] == dm['HTeamCode']) & (dm['VPlayer5'] == dm['HPlayer5']), (dm['tzvp5'] + dm['tzhp5'])/2, dm['tzvp5'])))))

- create a variable that adds up the home and away zone start values for all players of a given team who played in **position 6.**

In [36]:
dm['zplyr6'] = np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['VTeamCode']) & (dm['HPlayer6'] == dm['VPlayer6']), (dm['tzhp6'] + dm['tzvp6'])/2,
                   (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] != dm['VTeamCode']) & (dm['HPlayer6'] != dm['VPlayer6']), dm['tzhp6'],
                   (np.where((dm['Season'] == dm['Season']) &(dm['VTeamCode'] == dm['HTeamCode']) & (dm['VPlayer6'] == dm['HPlayer6']), (dm['tzvp6'] + dm['tzhp6'])/2, dm['tzvp6'])))))
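Adding home and away values per player can also be sketched with one row per player instead of one row per event. The tables `home` and `away` and the columns `team`, `player` and `zs_total` below are hypothetical. The sketch only sums the two values; it does not reproduce the divisions applied in the cells above.

```python
import pandas as pd

# Hypothetical per-player totals from home games and from away games.
home = pd.DataFrame({
    "Season": [2015, 2015], "team": ["TOR", "TOR"],
    "player": ["p1", "p2"], "zs_total": [12, -4],
})
away = pd.DataFrame({
    "Season": [2015, 2015], "team": ["TOR", "TOR"],
    "player": ["p1", "p3"], "zs_total": [5, 7],
})

# Stack both tables and add up each player's home and away values.
season_total = (
    pd.concat([home, away], ignore_index=True)
    .groupby(["Season", "team", "player"], as_index=False)["zs_total"]
    .sum()
)
```

Players who appear in only one of the two tables (`p2` and `p3` here) simply keep their single value.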

## allocate players per position to forward lines and defensive pairings

- If the zone start variable of a player is the highest on his team in a specific position, he took part in the most offensive zone starts. These skaters are identified as **first line forwards and top defensive pairing**.

- If the zone start variable of a player is the second highest in that specific position, he took part in the second most offensive zone starts. These skaters are identified as **second line forwards and second defensive pairing**.

- If the zone start variable of a player is the third highest in that specific position, he took part in the third most offensive zone starts. These skaters are identified as **third line forwards and bottom defensive pairing**.

- If the zone start variable of a player is the lowest in that specific position, he took part in the fewest offensive zone starts. These skaters are identified as **fourth line forwards**.
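The ordering described in this list amounts to ranking players within each season, team and position. A minimal sketch, assuming a hypothetical table `totals` with one row per centre and a `zs_total` column (neither is part of the notebook's data frame):

```python
import pandas as pd

# Hypothetical season totals for the four centres of one team.
totals = pd.DataFrame({
    "Season": [2015] * 4,
    "team": ["TOR"] * 4,
    "player": ["c1", "c2", "c3", "c4"],
    "zs_total": [30, -12, 8, 3],
})

# Rank 1 = highest zone start total (first line), rank 4 = lowest (fourth line).
# method="first" breaks ties by row order so every player gets a distinct line.
totals["line"] = (
    totals.groupby(["Season", "team"])["zs_total"]
    .rank(method="first", ascending=False)
    .astype(int)
)
```

With one row per player, the line number falls directly out of the rank, and no comparison against the previous row is needed.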

### a) visitor team

- generate a variable that allocates all players to their respective line. **Position 1** is the **centre** position of the forward lines. If a player's **total zone start** is the highest amongst players in that position, that player is assigned to the **top line**. If his **total zone start** is the lowest amongst players in that position, he is assigned to the **4th line**. Of the two remaining players, the one with the higher **total zone start** is allocated to the **2nd line** and the other to the **3rd line**.

In [37]:
dm['vmax1'] = dm.groupby(['Season', 'VTeamCode'])['tzvp1'].transform('max')

In [38]:
dm['vmin1'] = dm.groupby(['Season', 'VTeamCode'])['tzvp1'].transform('min')

In [39]:
dm['vc'] = np.where((dm['Season'] == dm['Season']) & (dm['VTeamCode'] == dm['VTeamCode']) & (dm['tzvp1'] == dm['vmax1']), 1, 
                    (np.where((dm['Season'] == dm['Season']) & (dm['VTeamCode'] == dm['VTeamCode']) & (dm['tzvp1'] == dm['vmin1']), 4,
                        (np.where((dm['Season'] == dm['Season']) & (dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer1'] != dm['VPlayer1']) & (dm['tzvp1'] != dm['vmax1']) & (dm['tzvp1'] != dm['vmin1']) & (dm['tzvp1'].shift() != dm['vmax1']) & (dm['tzvp1'].shift() != dm['vmin1']) & (dm['tzvp1'] > dm['tzvp1'].shift()), 2, 3))))) 


- generate a variable that allocates all players to their respective line. **Position 2** is the **right wing** position of the forward lines. If a player's **total zone start** is the highest amongst players in that position, that player is assigned to the **top line**. If his **total zone start** is the lowest amongst players in that position, he is assigned to the **4th line**. Of the two remaining players, the one with the higher **total zone start** is allocated to the **2nd line** and the other to the **3rd line**.

In [40]:
dm['vmax2'] = dm.groupby(['Season', 'VTeamCode'])['tzvp2'].transform('max')

In [41]:
dm['vmin2'] = dm.groupby(['Season', 'VTeamCode'])['tzvp2'].transform('min')

In [42]:
dm['vrw'] = np.where((dm['Season'] == dm['Season']) & (dm['VTeamCode'] == dm['VTeamCode']) & (dm['tzvp2'] == dm['vmax2']), 1, 
                    (np.where((dm['Season'] == dm['Season']) & (dm['VTeamCode'] == dm['VTeamCode']) & (dm['tzvp2'] == dm['vmin2']), 4,
                        (np.where((dm['Season'] == dm['Season']) & (dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer2'] != dm['VPlayer2']) & (dm['tzvp2'] != dm['vmax2']) & (dm['tzvp2'] != dm['vmin2']) & (dm['tzvp2'].shift() != dm['vmax2']) & (dm['tzvp2'].shift() != dm['vmin2']) & (dm['tzvp2'] > dm['tzvp2'].shift()), 2, 3))))) 


- generate a variable that allocates all players to their respective line. **Position 3** is the **left wing** position of the forward lines. If a player's **total zone start** is the highest amongst players in that position, that player is assigned to the **top line**. If his **total zone start** is the lowest amongst players in that position, he is assigned to the **4th line**. Of the two remaining players, the one with the higher **total zone start** is allocated to the **2nd line** and the other to the **3rd line**.

In [43]:
dm['vmax3'] = dm.groupby(['Season', 'VTeamCode'])['tzvp3'].transform('max')

In [44]:
dm['vmin3'] = dm.groupby(['Season', 'VTeamCode'])['tzvp3'].transform('min')

In [45]:
dm['vlw'] = np.where((dm['Season'] == dm['Season']) & (dm['VTeamCode'] == dm['VTeamCode']) & (dm['tzvp3'] == dm['vmax3']), 1, 
                    (np.where((dm['Season'] == dm['Season']) & (dm['VTeamCode'] == dm['VTeamCode']) & (dm['tzvp3'] == dm['vmin3']), 4,
                        (np.where((dm['Season'] == dm['Season']) & (dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer3'] != dm['VPlayer3']) & (dm['tzvp3'] != dm['vmax3']) & (dm['tzvp3'] != dm['vmin3']) & (dm['tzvp3'].shift() != dm['vmax3']) & (dm['tzvp3'].shift() != dm['vmin3']) & (dm['tzvp3'] > dm['tzvp3'].shift()), 2, 3))))) 


- generate a variable that allocates all players to their respective line. **Position 4** is the **right defense** position of the defensive pairings. If a player's **total zone start** is the highest amongst players in that position, that player is assigned to the **top line**. If his **total zone start** is the lowest amongst players in that position, he is assigned to the **4th line**. Of the two remaining players, the one with the higher **total zone start** is allocated to the **2nd line** and the other to the **3rd line**.

In [46]:
dm['vmax4'] = dm.groupby(['Season', 'VTeamCode'])['tzvp4'].transform('max')

In [47]:
dm['vmin4'] = dm.groupby(['Season', 'VTeamCode'])['tzvp4'].transform('min')

In [48]:
dm['vdr'] = np.where((dm['Season'] == dm['Season']) & (dm['VTeamCode'] == dm['VTeamCode']) & (dm['tzvp4'] == dm['vmax4']), 1, 
                    (np.where((dm['Season'] == dm['Season']) & (dm['VTeamCode'] == dm['VTeamCode']) & (dm['tzvp4'] == dm['vmin4']), 4,
                        (np.where((dm['Season'] == dm['Season']) & (dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer4'] != dm['VPlayer4']) & (dm['tzvp4'] != dm['vmax4']) & (dm['tzvp4'] != dm['vmin4']) & (dm['tzvp4'].shift() != dm['vmax4']) & (dm['tzvp4'].shift() != dm['vmin4']) & (dm['tzvp4'] > dm['tzvp4'].shift()), 2, 3))))) 


- generate a variable that allocates all players to their respective line. **Position 5** is the **left defense** position of the defensive pairings. If a player's **total zone start** is the highest amongst players in that position, that player is assigned to the **top line**. If his **total zone start** is the lowest amongst players in that position, he is assigned to the **4th line**. Of the two remaining players, the one with the higher **total zone start** is allocated to the **2nd line** and the other to the **3rd line**.

In [49]:
dm['vmax5'] = dm.groupby(['Season', 'VTeamCode'])['tzvp5'].transform('max')

In [50]:
dm['vmin5'] = dm.groupby(['Season', 'VTeamCode'])['tzvp5'].transform('min')

In [51]:
dm['vdl'] = np.where((dm['Season'] == dm['Season']) & (dm['VTeamCode'] == dm['VTeamCode']) & (dm['tzvp5'] == dm['vmax5']), 1, 
                    (np.where((dm['Season'] == dm['Season']) & (dm['VTeamCode'] == dm['VTeamCode']) & (dm['tzvp5'] == dm['vmin5']), 4,
                        (np.where((dm['Season'] == dm['Season']) & (dm['VTeamCode'] == dm['VTeamCode']) & (dm['VPlayer5'] != dm['VPlayer5']) & (dm['tzvp5'] != dm['vmax5']) & (dm['tzvp5'] != dm['vmin5']) & (dm['tzvp5'].shift() != dm['vmax5']) & (dm['tzvp5'].shift() != dm['vmin5']) & (dm['tzvp5'] > dm['tzvp5'].shift()), 2, 3))))) 


- **Position 6** is the **goaltender position**. No conditions necessary 

In [52]:
dm['vg'] = 1

### b) home team

- generate a variable that allocates all players to their respective line. **Position 1** is the **centre** position of the forward lines. If a player's **total zone start** is the highest amongst players in that position, that player is assigned to the **top line**. If his **total zone start** is the lowest amongst players in that position, he is assigned to the **4th line**. Of the two remaining players, the one with the higher **total zone start** is allocated to the **2nd line** and the other to the **3rd line**.

In [53]:
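# highest total zone start at position 1 within each Season/HTeamCode group
dm['hmax1'] = dm.groupby(['Season', 'HTeamCode'])['tzhp1'].transform('max')

In [54]:
# lowest total zone start at position 1 within each Season/HTeamCode group
dm['hmin1'] = dm.groupby(['Season', 'HTeamCode'])['tzhp1'].transform('min')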
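
The same pair of group-wise maximum and minimum columns is built for every position, so the pattern can be written once in a loop. This is a sketch on hypothetical toy data with only two positions. The column names follow the notebook's `tzhp{n}`, `hmax{n}` and `hmin{n}` convention.

```python
import pandas as pd

# Toy data (hypothetical values): three rows for 2015, one for 2016.
toy = pd.DataFrame({
    'Season': [2015, 2015, 2015, 2016],
    'HTeamCode': ['MTL', 'MTL', 'TOR', 'MTL'],
    'tzhp1': [0.60, 0.40, 0.55, 0.50],
    'tzhp2': [0.30, 0.70, 0.45, 0.52],
})

keys = ['Season', 'HTeamCode']
for pos in (1, 2):  # the notebook covers positions 1 through 5
    col = f'tzhp{pos}'
    # transform broadcasts each group's max/min back onto every row of the group
    toy[f'hmax{pos}'] = toy.groupby(keys)[col].transform('max')
    toy[f'hmin{pos}'] = toy.groupby(keys)[col].transform('min')

print(toy)
```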

In [55]:
dm['hc'] = np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['HTeamCode']) & (dm['tzhp1'] == dm['hmax1']), 1, 
                    (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['HTeamCode']) & (dm['tzhp1'] == dm['hmin1']), 4,
                        (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer1'] != dm['HPlayer1']) & (dm['tzhp1'] != dm['hmax1']) & (dm['tzhp1'] != dm['hmin1']) & (dm['tzhp1'].shift() != dm['hmax1']) & (dm['tzhp1'].shift() != dm['hmin1']) & (dm['tzhp1'] > dm['tzhp1'].shift()), 2, 3))))) 


- generate a variable that allocates each player to their respective line. **Position 2** is the **right wing** position of the forward lines. Within each season and team, the player with the highest **total zone start** at this position is assigned to the **top line**, and the player with the lowest **total zone start** is assigned to the **4th line**. Of the two remaining players, the one with the higher **total zone start** is allocated to the **2nd line** and the other to the **3rd line**.

In [56]:
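# highest total zone start at position 2 within each Season/HTeamCode group
dm['hmax2'] = dm.groupby(['Season', 'HTeamCode'])['tzhp2'].transform('max')

In [57]:
# lowest total zone start at position 2 within each Season/HTeamCode group
dm['hmin2'] = dm.groupby(['Season', 'HTeamCode'])['tzhp2'].transform('min')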

In [58]:
dm['hrw'] = np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['HTeamCode']) & (dm['tzhp2'] == dm['hmax2']), 1, 
                    (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['HTeamCode']) & (dm['tzhp2'] == dm['hmin2']), 4,
                        (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer2'] != dm['HPlayer2']) & (dm['tzhp2'] != dm['hmax2']) & (dm['tzhp2'] != dm['hmin2']) & (dm['tzhp2'].shift() != dm['hmax2']) & (dm['tzhp2'].shift() != dm['hmin2']) & (dm['tzhp2'] > dm['tzhp2'].shift()), 2, 3))))) 


- generate a variable that allocates each player to their respective line. **Position 3** is the **left wing** position of the forward lines. Within each season and team, the player with the highest **total zone start** at this position is assigned to the **top line**, and the player with the lowest **total zone start** is assigned to the **4th line**. Of the two remaining players, the one with the higher **total zone start** is allocated to the **2nd line** and the other to the **3rd line**.

In [59]:
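# highest total zone start at position 3 within each Season/HTeamCode group
dm['hmax3'] = dm.groupby(['Season', 'HTeamCode'])['tzhp3'].transform('max')

In [60]:
# lowest total zone start at position 3 within each Season/HTeamCode group
dm['hmin3'] = dm.groupby(['Season', 'HTeamCode'])['tzhp3'].transform('min')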

In [61]:
dm['hlw'] = np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['HTeamCode']) & (dm['tzhp3'] == dm['hmax3']), 1, 
                    (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['HTeamCode']) & (dm['tzhp3'] == dm['hmin3']), 4,
                        (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer3'] != dm['HPlayer3']) & (dm['tzhp3'] != dm['hmax3']) & (dm['tzhp3'] != dm['hmin3']) & (dm['tzhp3'].shift() != dm['hmax3']) & (dm['tzhp3'].shift() != dm['hmin3']) & (dm['tzhp3'] > dm['tzhp3'].shift()), 2, 3))))) 


- generate a variable that allocates each player to their respective line. **Position 4** is the **right defense** position. Within each season and team, the player with the highest **total zone start** at this position is assigned to the **top line**, and the player with the lowest **total zone start** is assigned to the **4th line**. Of the two remaining players, the one with the higher **total zone start** is allocated to the **2nd line** and the other to the **3rd line**.

In [62]:
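# highest total zone start at position 4 within each Season/HTeamCode group
dm['hmax4'] = dm.groupby(['Season', 'HTeamCode'])['tzhp4'].transform('max')

In [63]:
# lowest total zone start at position 4 within each Season/HTeamCode group
dm['hmin4'] = dm.groupby(['Season', 'HTeamCode'])['tzhp4'].transform('min')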

In [64]:
dm['hdr'] = np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['HTeamCode']) & (dm['tzhp4'] == dm['hmax4']), 1, 
                    (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['HTeamCode']) & (dm['tzhp4'] == dm['hmin4']), 4,
                        (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer4'] != dm['HPlayer4']) & (dm['tzhp4'] != dm['hmax4']) & (dm['tzhp4'] != dm['hmin4']) & (dm['tzhp4'].shift() != dm['hmax4']) & (dm['tzhp4'].shift() != dm['hmin4']) & (dm['tzhp4'] > dm['tzhp4'].shift()), 2, 3))))) 


- generate a variable that allocates each player to their respective line. **Position 5** is the **left defense** position. Within each season and team, the player with the highest **total zone start** at this position is assigned to the **top line**, and the player with the lowest **total zone start** is assigned to the **4th line**. Of the two remaining players, the one with the higher **total zone start** is allocated to the **2nd line** and the other to the **3rd line**.

In [65]:
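# highest total zone start at position 5 within each Season/HTeamCode group
dm['hmax5'] = dm.groupby(['Season', 'HTeamCode'])['tzhp5'].transform('max')

In [66]:
# lowest total zone start at position 5 within each Season/HTeamCode group
dm['hmin5'] = dm.groupby(['Season', 'HTeamCode'])['tzhp5'].transform('min')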

In [67]:
dm['hdl'] = np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['HTeamCode']) & (dm['tzhp5'] == dm['hmax5']), 1, 
                    (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['HTeamCode']) & (dm['tzhp5'] == dm['hmin5']), 4,
                        (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['HTeamCode']) & (dm['HPlayer5'] != dm['HPlayer5']) & (dm['tzhp5'] != dm['hmax5']) & (dm['tzhp5'] != dm['hmin5']) & (dm['tzhp5'].shift() != dm['hmax5']) & (dm['tzhp5'].shift() != dm['hmin5']) & (dm['tzhp5'] > dm['tzhp5'].shift()), 2, 3))))) 


- **Position 6** is the **goaltender position**. No conditions are necessary, so every goaltender is assigned the value 1.

In [68]:
dm['hg'] = 1

## overall player allocation

Each player has been assigned a roster position separately for the home games and the away games of a season. The **overall roster position** of each player is the mean of the home and away positions.
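
As a small illustration of taking the mean of home and away positions, the sketch below averages one home line and one away line per player. The data, the `player` column and the outer merge are hypothetical; the notebook's own cells instead divide by games-played columns such as `gp1`, `thgp1` and `tvgp1`, which are defined outside this section.

```python
import pandas as pd

# Hypothetical line numbers for centres at home (hc) and away (vc).
home = pd.DataFrame({'player': ['A', 'B', 'C'], 'hc': [1, 2, 4]})
away = pd.DataFrame({'player': ['A', 'B', 'D'], 'vc': [2, 2, 3]})

# Keep players who appear only at home or only away as well.
both = home.merge(away, on='player', how='outer')

# Row-wise mean skips missing values, so a player with only one of the
# two positions keeps that single value.
both['c'] = both[['hc', 'vc']].mean(axis=1)

overall = both.set_index('player')['c']
print(overall)
```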

- create a variable for the overall roster position of every player on a given team who played in position 1, the **centre position**.

In [69]:
dm['c'] = np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['VTeamCode']) & (dm['HPlayer1'] == dm['VPlayer1']), (dm['hc'] + dm['vc'])/dm['gp1'],
                   (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] != dm['VTeamCode']) & (dm['HPlayer1'] != dm['VPlayer1']), dm['hc']/dm['thgp1'],
                   (np.where((dm['Season'] == dm['Season']) &(dm['VTeamCode'] == dm['HTeamCode']) & (dm['VPlayer1'] == dm['HPlayer1']), (dm['vc'] + dm['hc'])/dm['gp1'], dm['vc']/dm['tvgp1'])))))

- create a variable for the overall roster position of every player on a given team who played in position 2, the **right wing position**.

In [70]:
dm['rw'] = np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['VTeamCode']) & (dm['HPlayer2'] == dm['VPlayer2']), (dm['hrw'] + dm['vrw'])/dm['gp2'],
                   (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] != dm['VTeamCode']) & (dm['HPlayer2'] != dm['VPlayer2']), dm['hrw']/dm['thgp2'],
                   (np.where((dm['Season'] == dm['Season']) &(dm['VTeamCode'] == dm['HTeamCode']) & (dm['VPlayer2'] == dm['HPlayer2']), (dm['vrw'] + dm['hrw'])/dm['gp2'], dm['vrw']/dm['tvgp2'])))))

- create a variable for the overall roster position of every player on a given team who played in position 3, the **left wing position**.

In [71]:
dm['lw'] = np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['VTeamCode']) & (dm['HPlayer3'] == dm['VPlayer3']), (dm['hlw'] + dm['vlw'])/dm['gp3'],
                   (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] != dm['VTeamCode']) & (dm['HPlayer3'] != dm['VPlayer3']), dm['hlw']/dm['thgp3'],
                   (np.where((dm['Season'] == dm['Season']) &(dm['VTeamCode'] == dm['HTeamCode']) & (dm['VPlayer3'] == dm['HPlayer3']), (dm['vlw'] + dm['hlw'])/dm['gp3'], dm['vlw']/dm['tvgp3'])))))

- create a variable for the overall roster position of every player on a given team who played in position 4, the **right defense position**.

In [72]:
dm['dr'] = np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['VTeamCode']) & (dm['HPlayer4'] == dm['VPlayer4']), (dm['hdr'] + dm['vdr'])/dm['gp4'],
                   (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] != dm['VTeamCode']) & (dm['HPlayer4'] != dm['VPlayer4']), dm['hdr']/dm['thgp4'],
                   (np.where((dm['Season'] == dm['Season']) &(dm['VTeamCode'] == dm['HTeamCode']) & (dm['VPlayer4'] == dm['HPlayer4']), (dm['vdr'] + dm['hdr'])/dm['gp4'], dm['vdr']/dm['tvgp4'])))))

- create a variable for the overall roster position of every player on a given team who played in position 5, the **left defense position**.

In [73]:
dm['dl'] = np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['VTeamCode']) & (dm['HPlayer5'] == dm['VPlayer5']), (dm['hdl'] + dm['vdl'])/dm['gp5'],
                   (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] != dm['VTeamCode']) & (dm['HPlayer5'] != dm['VPlayer5']), dm['hdl']/dm['thgp5'],
                   (np.where((dm['Season'] == dm['Season']) &(dm['VTeamCode'] == dm['HTeamCode']) & (dm['VPlayer5'] == dm['HPlayer5']), (dm['vdl'] + dm['hdl'])/dm['gp5'], dm['vdl']/dm['tvgp5'])))))

- create a variable for the overall roster position of every player on a given team who played in position 6, the **goaltender position**.

In [74]:
dm['g'] = np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] == dm['VTeamCode']) & (dm['HPlayer6'] == dm['VPlayer6']), (dm['hg'] + dm['vg'])/dm['gp6'],
                   (np.where((dm['Season'] == dm['Season']) & (dm['HTeamCode'] != dm['VTeamCode']) & (dm['HPlayer6'] != dm['VPlayer6']), dm['hg']/dm['thgp6'],
                   (np.where((dm['Season'] == dm['Season']) &(dm['VTeamCode'] == dm['HTeamCode']) & (dm['VPlayer6'] == dm['HPlayer6']), (dm['vg'] + dm['hg'])/dm['gp6'], dm['vg']/dm['tvgp6'])))))

## store player allocation data frame

The player allocation data frame is stored for use in implementing the roster model.

In [75]:
# index=False (a boolean) omits the row index; the string 'False' is truthy and would write it
dm.to_csv('player_allocation.csv', index=False, sep=',')
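
A quick way to check the stored file is a round trip through `to_csv` and `read_csv`. The sketch below assumes the row index is not meant to be written. A non-empty string such as `'False'` is truthy in Python, so only the boolean `False` suppresses the index column. The toy frame is hypothetical and an in-memory buffer stands in for the file.

```python
import io
import pandas as pd

# Hypothetical two-row frame standing in for the allocation data.
toy = pd.DataFrame({'player': ['A', 'B'], 'c': [1.5, 2.0]})

# Write without the row index, then read the text back.
buf = io.StringIO()
toy.to_csv(buf, index=False)
buf.seek(0)
restored = pd.read_csv(buf)

# Only the original columns come back; no extra unnamed index column.
print(list(restored.columns))
```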