## the value of each roster position

To determine the impact each roster position has on team success, we need to examine the talent of the players in each game alongside the result of that game. For each roster position, a team's players are ranked as elite (1) or secondary (2). The goal is one row per team per game.

In [41]:
import sys
import os
import pandas as pd
import numpy as np
import datetime, time
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols
from pylab import hist, show
import scipy

### import data set

In [42]:
dm = pd.read_csv('out_data/play_by_play_with_player_rank.csv')
dm = dm.drop('Unnamed: 0', axis=1)
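An `Unnamed: 0` column usually appears when a CSV was written together with its index. Assuming that is what happened here, the sketch below (on a made-up two-row table, not the real file) shows that `index_col=0` reads that column back as the index, so no `drop` is needed afterwards.

```python
import io
import pandas as pd

# A tiny CSV written with the default index, as DataFrame.to_csv() does.
csv_text = pd.DataFrame({'GameNumber': [20001, 20002], 'Rank': [1, 2]}).to_csv()

# Default read: the saved index comes back as a column named 'Unnamed: 0'.
with_extra = pd.read_csv(io.StringIO(csv_text))

# index_col=0 reads that first column as the index instead.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(with_extra.columns.tolist())  # ['Unnamed: 0', 'GameNumber', 'Rank']
print(clean.columns.tolist())       # ['GameNumber', 'Rank']
```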

In [43]:
dm.columns

Index(['Season', 'GameNumber', 'GameDate', 'Period', 'AdvantageType', 'Zone',
       'EventNumber', 'EventType', 'EventDetail', 'EventTeamCode',
       'EventPlayerNumber', 'EventPlayerName', 'EventTimeFromZero',
       'EventTimeFromTwenty', 'TeamCode', 'PlayerNumber', 'PlayerPosition',
       'ShotType', 'ShotResult', 'Length', 'PenaltyType', 'Rank'],
      dtype='object')

In [44]:
dm.shape

(3063332, 22)

- Work on a separate data set named dq (quality).

In [45]:
dq = dm.copy()  # copy, so that dq is independent of dm
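Whether `dq` really is a separate data set depends on how it is created. A minimal sketch, using a made-up two-row frame `dm_toy` as a stand-in for `dm`: plain assignment only gives the same DataFrame a second name, while `.copy()` produces independent data.

```python
import pandas as pd

dm_toy = pd.DataFrame({'Rank': [1, 2]})   # hypothetical stand-in for dm

alias = dm_toy             # same object under a second name
separate = dm_toy.copy()   # independent copy of the data

alias.loc[0, 'Rank'] = 99      # also visible through dm_toy
separate.loc[1, 'Rank'] = -1   # dm_toy is unaffected

print(alias is dm_toy)            # True
print(dm_toy['Rank'].tolist())    # [99, 2]
print(separate['Rank'].tolist())  # [1, -1]
```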

- Keep one observation per game, team and player (drop duplicates).

In [46]:
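# .copy() makes the result an independent frame, so adding columns later does not raise SettingWithCopyWarning
dq = dq.drop_duplicates(['GameNumber', 'TeamCode', 'PlayerNumber']).copy()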
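To see what this step does, here is a small sketch on made-up play-by-play rows (the values are invented for illustration). `drop_duplicates` keeps the first row it meets for each game, team and player, so the event-level columns that survive (here `EventNumber`) simply come from that first row and carry no meaning afterwards.

```python
import pandas as pd

# Toy play-by-play rows (made-up values): one row per player on the ice per event.
events = pd.DataFrame({
    'GameNumber':   [20001, 20001, 20001, 20001, 20001],
    'TeamCode':     ['MTL', 'MTL', 'MTL', 'TOR', 'TOR'],
    'PlayerNumber': [11, 11, 27, 11, 11],
    'EventNumber':  [1, 2, 2, 1, 2],
})

# Keep the first row seen for each (game, team, player) combination.
players = events.drop_duplicates(['GameNumber', 'TeamCode', 'PlayerNumber']).copy()

print(players.to_string(index=False))
```

Three rows remain: MTL 11, MTL 27 and TOR 11. The same jersey number on two teams counts as two different players because `TeamCode` is part of the key.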

In [47]:
dq.columns

Index(['Season', 'GameNumber', 'GameDate', 'Period', 'AdvantageType', 'Zone',
       'EventNumber', 'EventType', 'EventDetail', 'EventTeamCode',
       'EventPlayerNumber', 'EventPlayerName', 'EventTimeFromZero',
       'EventTimeFromTwenty', 'TeamCode', 'PlayerNumber', 'PlayerPosition',
       'ShotType', 'ShotResult', 'Length', 'PenaltyType', 'Rank'],
      dtype='object')

In [48]:
dq.isnull().sum()

Season                     0
GameNumber                 0
GameDate                   0
Period                     0
AdvantageType              0
Zone                       0
EventNumber                0
EventType                  0
EventDetail                0
EventTeamCode              0
EventPlayerNumber          0
EventPlayerName            0
EventTimeFromZero          0
EventTimeFromTwenty        0
TeamCode                   0
PlayerNumber               0
PlayerPosition             0
ShotType               37953
ShotResult             44978
Length                 39953
PenaltyType            46498
Rank                       0
dtype: int64

### count the number of quality players per position for each game

- Group by season, game number, team code and player number to count the occurrences of each player, then sum those counts per team. Each team should have 19 players per game if the data set is correct.

In [49]:
dq['playercount'] = dq.groupby(['Season', 'GameNumber', 'TeamCode', 'PlayerNumber'])['PlayerNumber'].transform('count')


In [50]:
dq['roster'] = dq.groupby(['Season', 'GameNumber', 'TeamCode'])['playercount'].transform('sum')
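The expected roster size can be checked directly. The sketch below is one possible check, not part of the original analysis: it runs on made-up rows, and the `Season` value and the constant `EXPECTED` are assumptions for illustration (the text expects 19; the toy data uses 3 to stay small).

```python
import pandas as pd

# Toy roster table (made-up): one row per game, team and player.
roster = pd.DataFrame({
    'Season':       [2015] * 5,
    'GameNumber':   [20001] * 5,
    'TeamCode':     ['MTL', 'MTL', 'MTL', 'TOR', 'TOR'],
    'PlayerNumber': [11, 27, 31, 11, 34],
})

EXPECTED = 3  # the text expects 19; 3 keeps the toy data small

# size() counts rows per group; with one row per player this is the roster size.
sizes = roster.groupby(['Season', 'GameNumber', 'TeamCode']).size()

# Team-games whose roster size differs from the expected value.
bad = sizes[sizes != EXPECTED]
print(bad)
```

Only the TOR entry is reported here, because it has two players instead of three. On the real data an empty result would indicate that every team has the expected roster size in every game.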


- Create a column holding the number of players of each rank at each position, per team and game.

In [51]:
dq['rosterposition'] = dq.groupby(['Season', 'GameNumber', 'TeamCode', 'PlayerPosition', 'Rank'])['playercount'].transform('sum')
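The reason for `transform('sum')` rather than a plain `sum()` is that `transform` returns one value per original row, so the result can be stored as a new column. A small sketch on made-up rows (one team, four players) shows both forms side by side.

```python
import pandas as pd

# Toy data (made-up): one row per player, already deduplicated.
df = pd.DataFrame({
    'TeamCode':       ['MTL', 'MTL', 'MTL', 'MTL'],
    'PlayerPosition': ['C', 'C', 'C', 'D'],
    'Rank':           [1, 2, 2, 1],
    'playercount':    [1, 1, 1, 1],
})

keys = ['TeamCode', 'PlayerPosition', 'Rank']

# transform('sum') returns one value per original row, so it can be
# assigned straight back as a column.
df['rosterposition'] = df.groupby(keys)['playercount'].transform('sum')

# The aggregated form has one row per group instead.
per_group = df.groupby(keys)['playercount'].sum()

print(df['rosterposition'].tolist())   # [1, 2, 2, 1]
print(len(per_group))                  # 3
```

Both secondary centres carry the value 2, which is why the later pivot can take any of the repeated values within a group.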


In [52]:
dq.isnull().sum()

Season                     0
GameNumber                 0
GameDate                   0
Period                     0
AdvantageType              0
Zone                       0
EventNumber                0
EventType                  0
EventDetail                0
EventTeamCode              0
EventPlayerNumber          0
EventPlayerName            0
EventTimeFromZero          0
EventTimeFromTwenty        0
TeamCode                   0
PlayerNumber               0
PlayerPosition             0
ShotType               37953
ShotResult             44978
Length                 39953
PenaltyType            46498
Rank                       0
playercount                0
roster                     0
rosterposition             0
dtype: int64

- The next step groups players by game number, team code, position and rank to show the quality of players each team has at each position. We build a **pivot table** with player position and rank as columns and the roster position counts as values; game number and team code form the index. The column levels are then joined into one name per position and rank (10 columns).


In [53]:
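dq = pd.pivot_table(dq, index=['GameNumber', 'TeamCode'], columns=['PlayerPosition', 'Rank'], values=['rosterposition'])
dq = dq.reset_index()
# flatten the column levels, e.g. ('rosterposition', 'C', 1) -> 'rosterposition_C_1'
dq.columns = ['_'.join(str(s).strip() for s in col if s) for col in dq.columns]
# a position/rank combination a team does not have is NaN after the pivot; count it as 0
dq = dq.fillna(0)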
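The pivot-and-flatten step is easier to follow on a tiny table. The sketch below uses made-up rows for a single game with two positions only; the names `long_df` and `wide` are illustrative and not from the original notebook.

```python
import pandas as pd

# Toy long-format data (made-up): one row per player with the
# per-position, per-rank count already attached.
long_df = pd.DataFrame({
    'GameNumber':     [20001, 20001, 20001, 20001, 20001],
    'TeamCode':       ['MTL', 'MTL', 'MTL', 'TOR', 'TOR'],
    'PlayerPosition': ['C', 'C', 'D', 'C', 'D'],
    'Rank':           [1, 2, 2, 1, 1],
    'rosterposition': [1, 1, 1, 1, 1],
})

wide = pd.pivot_table(long_df, index=['GameNumber', 'TeamCode'],
                      columns=['PlayerPosition', 'Rank'],
                      values=['rosterposition'])
wide = wide.reset_index()

# Columns are now tuples such as ('GameNumber', '', '') and
# ('rosterposition', 'C', 1); join the non-empty levels with '_'.
wide.columns = ['_'.join(str(s).strip() for s in col if s) for col in wide.columns]

# A position/rank pair a team does not have shows up as NaN; treat it as 0.
wide = wide.fillna(0)

print(wide.columns.tolist())
print(wide.to_string(index=False))
```

Note that `pivot_table` aggregates with the mean by default. That is harmless here because every row in a group carries the same `rosterposition` value, but it is also why the counts come out as floats.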

In [54]:
dq = dq.rename(columns={
    'rosterposition_C_1': 'C1', 'rosterposition_C_2': 'C2',
    'rosterposition_D_1': 'D1', 'rosterposition_D_2': 'D2',
    'rosterposition_G_1': 'G1', 'rosterposition_G_2': 'G2',
    'rosterposition_L_1': 'L1', 'rosterposition_L_2': 'L2',
    'rosterposition_R_1': 'R1', 'rosterposition_R_2': 'R2',
})
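Since the ten names follow one pattern, the mapping could also be derived rather than typed out. This is only an alternative sketch: `PREFIX` and `short_name` are hypothetical helpers introduced here, and the column list is a shortened stand-in for the real one.

```python
import pandas as pd

# Flattened column names as produced by the pivot step (shortened list).
cols = ['GameNumber', 'TeamCode', 'rosterposition_C_1', 'rosterposition_C_2',
        'rosterposition_D_1']

PREFIX = 'rosterposition_'

def short_name(col):
    """'rosterposition_C_1' -> 'C1'; other names are returned unchanged."""
    if col.startswith(PREFIX):
        return col[len(PREFIX):].replace('_', '')
    return col

# rename() accepts a function and applies it to every column name.
wide = pd.DataFrame(columns=cols).rename(columns=short_name)
print(wide.columns.tolist())  # ['GameNumber', 'TeamCode', 'C1', 'C2', 'D1']
```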


- The data set now shows the number of elite and secondary players at each position, per team, for every regular season game.

In [55]:
dq.head()

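| | GameNumber | TeamCode | C1 | C2 | D1 | D2 | G1 | G2 | L1 | L2 | R1 | R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20001 | MTL | 1.0 | 6.0 | 1.0 | 5.0 | 1.0 | 0.0 | 0.0 | 4.0 | 0.0 | 1.0 |
| 1 | 20001 | TOR | 2.0 | 3.0 | 1.0 | 5.0 | 0.0 | 1.0 | 2.0 | 1.0 | 0.0 | 4.0 |
| 2 | 20002 | PHI | 3.0 | 2.0 | 2.0 | 4.0 | 1.0 | 0.0 | 1.0 | 4.0 | 1.0 | 1.0 |
| 3 | 20002 | PIT | 2.0 | 6.0 | 2.0 | 4.0 | 1.0 | 0.0 | 1.0 | 2.0 | 0.0 | 1.0 |
| 4 | 20003 | CAR | 2.0 | 3.0 | 2.0 | 4.0 | 1.0 | 0.0 | 0.0 | 3.0 | 1.0 | 3.0 |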
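As a quick consistency check, the ten position columns of each row should add up to the 19 players per team stated earlier. The sketch below retypes the five rows shown by `dq.head()` by hand; the `total` column is added only for this check.

```python
import pandas as pd

# The five rows shown by dq.head(), typed in by hand.
head = pd.DataFrame(
    [[20001, 'MTL', 1, 6, 1, 5, 1, 0, 0, 4, 0, 1],
     [20001, 'TOR', 2, 3, 1, 5, 0, 1, 2, 1, 0, 4],
     [20002, 'PHI', 3, 2, 2, 4, 1, 0, 1, 4, 1, 1],
     [20002, 'PIT', 2, 6, 2, 4, 1, 0, 1, 2, 0, 1],
     [20003, 'CAR', 2, 3, 2, 4, 1, 0, 0, 3, 1, 3]],
    columns=['GameNumber', 'TeamCode', 'C1', 'C2', 'D1', 'D2',
             'G1', 'G2', 'L1', 'L2', 'R1', 'R2'],
)

# Sum the ten position/rank columns for each team-game.
position_cols = ['C1', 'C2', 'D1', 'D2', 'G1', 'G2', 'L1', 'L2', 'R1', 'R2']
head['total'] = head[position_cols].sum(axis=1)

print(head[['GameNumber', 'TeamCode', 'total']].to_string(index=False))
```

All five rows sum to 19, which matches the expected roster size.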
