# Benchmark Construction

In this notebook I will construct the benchmark models used for my analysis.

In [74]:
# import libraries
import numpy as np
import pandas as pd

In [75]:
# import datasets
data = pd.read_csv('capstone data final.csv', index_col=[0,1,2])
cc = pd.read_csv('Calvins Captains Benchmark.csv')

In [76]:
cc.head()

,Round,Player,Score
0,6,Zach Merrett,108
1,6,Tom Rockliff,100
2,6,Tom Mitchell,125
3,6,Adam Treloar,82
4,6,Dayne Zorko,76


## Benchmark Dataset Construction

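Here I gather the data needed to construct the 'Season Average' and 'Form' benchmarks. For each round, each benchmark selects the five players ranked highest on the relevant statistic (`season_av` for 'Season Average', `three_rd_av` for 'Form') and records the score each of them actually achieved in that round. As shown above, the dataset for the 'Calvin's Captains' benchmark has already been gathered manually from the Calvin's Captains articles available on the Dreamteam Talk website (http://dreamteamtalk.com/).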

In [77]:
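yr = 2017
rd = list(range(6,21))  # rounds 6 to 20 inclusive
pl_sa = []  # players picked by season average
pl_f = []   # players picked by form (three-round average)
sc_sa = []  # actual scores of the season-average picks
sc_f = []   # actual scores of the form picks

for r in rd:
    # top five players in round r, ranked by season average
    season = data.loc[(yr,r)].sort_values('season_av', ascending=False).iloc[0:5]
    for index, row in season.iterrows():
        pl_sa.append(index)
        sc_sa.append(row['score'])
    # top five players in round r, ranked by three-round average
    form = data.loc[(yr,r)].sort_values('three_rd_av', ascending=False).iloc[0:5]
    for index, row in form.iterrows():
        pl_f.append(index)
        sc_f.append(row['score'])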
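
To make the selection step easier to follow, here is a minimal sketch of the same idea on a toy dataset. The player names `A` to `F`, all of the numbers, and the helper `top_n_scores` are hypothetical and are not taken from `capstone data final.csv`; the sketch only assumes, as the cell above does, that the data is indexed by year, round and player.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset (values are made up).
toy = pd.DataFrame(
    {
        "year": [2017] * 6,
        "round": [6] * 6,
        "player": ["A", "B", "C", "D", "E", "F"],
        "season_av": [110.0, 105.0, 120.0, 90.0, 115.0, 100.0],
        "three_rd_av": [95.0, 125.0, 100.0, 130.0, 105.0, 110.0],
        "score": [108.0, 76.0, 100.0, 60.0, 155.0, 82.0],
    }
).set_index(["year", "round", "player"])


def top_n_scores(df, year, rnd, rank_col, n=5):
    """Return the actual scores of the n players ranked highest on rank_col."""
    one_round = df.loc[(year, rnd)]  # rows for a single round, indexed by player
    top = one_round.sort_values(rank_col, ascending=False).iloc[:n]
    return top["score"]


season_top = top_n_scores(toy, 2017, 6, "season_av")
form_top = top_n_scores(toy, 2017, 6, "three_rd_av")
print(season_top)
print(form_top)
```

Note that the players are ranked on a pre-round statistic, but the value stored is the score from the round itself, which is what makes the result usable as a benchmark.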

In [78]:
rd_col = []
for r in rd:
    for i in range(5):
        rd_col.append(r)
print(rd_col)

[6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20]
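
As an aside, the same round column could probably be built without the nested loop. The sketch below uses `np.repeat`, which is one possible alternative and not the approach taken in this notebook; it assumes the same `rd` list and five picks per round.

```python
import numpy as np

rd = list(range(6, 21))  # rounds 6 to 20 inclusive, as in the cell above
picks_per_round = 5      # five players are selected in every round

# Repeat each round number five times, then convert back to a plain list.
rd_col = np.repeat(rd, picks_per_round).tolist()
print(rd_col)
```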


In [79]:
# season average benchmark data
sa = pd.DataFrame()
sa['Round'] = rd_col
sa['Player'] = pl_sa
sa['Score'] = sc_sa
sa.head()

,Round,Player,Score
0,6,Joel Selwood,60.0
1,6,Dayne Zorko,76.0
2,6,Tom Rockliff,100.0
3,6,Zach Merrett,108.0
4,6,Rory Sloane,155.0


In [80]:
# form benchmark data (three-round average)
f = pd.DataFrame()
f['Round'] = rd_col
f['Player'] = pl_f
f['Score'] = sc_f
f.head()

,Round,Player,Score
0,6,Joel Selwood,60.0
1,6,Rory Sloane,155.0
2,6,Dayne Zorko,76.0
3,6,Tom Rockliff,100.0
4,6,Gary Ablett,162.0


In [81]:
# write the benchmark data to file
sa.to_csv('Season Average Benchmark.csv', index=False)
f.to_csv('3 Week Benchmark.csv', index=False)

## Construction of the Benchmark

Here I compute, for each round, the mean score of the five players selected by each method. These per-round means are the benchmarks against which the model results will be evaluated.

In [82]:
benchmarks = pd.DataFrame()
benchmarks['Calvins Captains'] = cc.groupby('Round')['Score'].mean()
benchmarks['Season Average'] = sa.groupby('Round')['Score'].mean()
benchmarks['3 Week Average'] = f.groupby('Round')['Score'].mean()
benchmarks

Round,Calvins Captains,Season Average,3 Week Average
6,98.2,99.8,110.6
7,114.4,121.8,110.6
8,113.0,118.0,122.2
9,131.6,113.4,117.4
10,101.0,106.8,98.6
11,115.4,107.6,113.2
12,139.6,119.8,115.0
13,93.6,105.0,106.2
14,104.8,100.8,99.0
15,114.0,112.4,125.4
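
As a sanity check, the Round 6 row can be reproduced by hand from the five Round 6 scores shown in each `head()` output earlier in the notebook. This is only a small worked example using those displayed values, not a replacement for the `groupby` computation.

```python
import pandas as pd

# Round 6 scores copied from the cc.head(), sa.head() and f.head() outputs.
round6 = pd.DataFrame(
    {
        "Calvins Captains": [108, 100, 125, 82, 76],
        "Season Average": [60.0, 76.0, 100.0, 108.0, 155.0],
        "3 Week Average": [60.0, 155.0, 76.0, 100.0, 162.0],
    }
)

# Column-wise mean of the five picks, rounded to one decimal place.
round6_means = round6.mean().round(1)
print(round6_means)
```

The three means should agree with the Round 6 row of the table (98.2, 99.8 and 110.6).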


In [83]:
# write benchmarks to file
benchmarks.to_csv('Benchmarks.csv')