To analyse the teams in the league, we need to get the results in a format we can use.

The results are hosted on Yahoo's fantasy hockey site. To date, I've manually retrieved each week's results and saved them as a CSV file (I'll automate it someday). Each week's results are in a separate file in the same directory.

I need to retrieve the results from those CSV files and put them into a data structure that is easy to work with.

In [1]:
# prepare this workspace
import numpy as np
import pandas as pd

We want to load the first week's results into a NumPy array.

I've uploaded the league results for weeks 1 through 19 to a GitHub repository, where each week exists in a separate .csv file.

In [2]:
# define a function to load a file from GitHub
def loadweekonline(weeknumber):

    # each week lives in its own file: w1.csv, w2.csv, ...
    url = 'https://raw.githubusercontent.com/scibbatical/fan_hockey/master/w%s.csv' % weeknumber

    # the files have no header row, so read every line as data
    results = pd.read_csv(url, header=None).to_numpy()

    return results

# load a file
w1 = loadweekonline(1)
        
# show results
print(w1)

[[  13.      22.     -13.      53.      50.       2.       0.895]
 [  17.      32.       4.      40.      41.       4.       0.922]
 [  12.      18.     -14.      57.      42.       5.       0.908]
 [   6.      22.      -7.      71.      42.       5.       0.893]
 [   9.       9.       4.     113.      54.       4.       0.906]
 [  13.      25.       1.      65.      47.       1.       0.897]
 [   7.      16.      -9.      84.      56.       2.       0.916]
 [  11.      27.      -2.      54.      65.       3.       0.891]
 [  11.      35.      16.      95.      58.       6.       0.907]
 [  14.      33.      -1.      30.      39.       5.       0.898]
 [  10.      20.      14.      46.      38.       4.       0.897]
 [   9.      17.      -2.     101.      51.       2.       0.906]]


This array isn't the prettiest to look at, but fortunately, I just figured out how to use Pandas, so I'll use it to make it more presentable.

I'll add lists of team and category names as well.

In [3]:
names = ['Basement Dwellers','Chotchmahoneless','Dice-n-Draft','Dont Toews Me Bro','Happys Hustlers','Hard Off the Glass','Neals Neat Team','Newfie Rockers','RyansNOTsoRandomTeam','The Gallows Pole', 'TopShelf','Tylers Tilers']

cats = ['G', 'A', '+/-', 'Hits', 'Blk', 'W', "SV%"]

# display using Pandas
pd.DataFrame(w1, index=names, columns=cats)

Unnamed: 0,G,A,+/-,Hits,Blk,W,SV%
Basement Dwellers,13.0,22.0,-13.0,53.0,50.0,2.0,0.895
Chotchmahoneless,17.0,32.0,4.0,40.0,41.0,4.0,0.922
Dice-n-Draft,12.0,18.0,-14.0,57.0,42.0,5.0,0.908
Dont Toews Me Bro,6.0,22.0,-7.0,71.0,42.0,5.0,0.893
Happys Hustlers,9.0,9.0,4.0,113.0,54.0,4.0,0.906
Hard Off the Glass,13.0,25.0,1.0,65.0,47.0,1.0,0.897
Neals Neat Team,7.0,16.0,-9.0,84.0,56.0,2.0,0.916
Newfie Rockers,11.0,27.0,-2.0,54.0,65.0,3.0,0.891
RyansNOTsoRandomTeam,11.0,35.0,16.0,95.0,58.0,6.0,0.907
The Gallows Pole,14.0,33.0,-1.0,30.0,39.0,5.0,0.898
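
As a side note, the labels are useful for more than display: once the array is wrapped in a DataFrame, values can be looked up by team and category name instead of by position. Here is a minimal sketch using only the first three rows of the week 1 array printed earlier; the names `week1_sample`, `sample_names`, `sample_cats`, and `week1_df` are made up for this illustration and aren't used elsewhere in the notebook.

```python
import numpy as np
import pandas as pd

# first three rows of the week 1 array shown above (illustrative subset)
week1_sample = np.array([
    [13, 22, -13, 53, 50, 2, 0.895],
    [17, 32,   4, 40, 41, 4, 0.922],
    [12, 18, -14, 57, 42, 5, 0.908],
])
sample_names = ['Basement Dwellers', 'Chotchmahoneless', 'Dice-n-Draft']
sample_cats = ['G', 'A', '+/-', 'Hits', 'Blk', 'W', 'SV%']

week1_df = pd.DataFrame(week1_sample, index=sample_names, columns=sample_cats)

# look up a single stat by team name and category name
chotch_goals = week1_df.loc['Chotchmahoneless', 'G']

# pull one category as a Series indexed by team name
hits = week1_df['Hits']

print(chotch_goals)
print(hits.idxmax())  # team with the most hits in this subset
```

Label-based lookups like these save me from remembering which row index belongs to which team.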


This looks great! But the best part is that a week's results are a 2D array. Each row is a team's stats for week 1, and each column is a stat category.

We can add another dimension to the data: time. Each week will be represented by a layer, where each layer will be a 2D array of results like the one we've already loaded.

Let's define a function to create our 3D result array, then use it to load 19 weeks of results.

In [4]:
def compresultsonline(uptoweek):

    # load weeks 1..uptoweek, each as a 2D (team x category) array
    weeks = [loadweekonline(week) for week in range(1, uptoweek + 1)]

    # stack the weeks along a new first axis: (week, team, category)
    return np.stack(weeks)


BHLresults = compresultsonline(19)

The data is now in a 3D array that we can index by week, team, and category (all zero-based, so week 1 is index 0):

BHLresults[(week),(team),(category)]

For example, my team's (team index 9) performance in Goals (category index 0) over the whole season can be pulled out with:

In [6]:
pd.DataFrame(BHLresults[:,9,0]).transpose()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18
0,14.0,8.0,5.0,7.0,5.0,10.0,2.0,14.0,5.0,5.0,5.0,8.0,5.0,12.0,7.0,7.0,8.0,7.0,4.0
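
To make the (week, team, category) layout easier to picture, here is a toy sketch with made-up numbers: three tiny "weeks" of two teams and two categories each. None of these values come from the league data, and the names `week_a`, `week_b`, `week_c`, and `toy` are purely illustrative.

```python
import numpy as np

# made-up numbers: each week is a 2D array of shape (teams, categories)
week_a = np.array([[1.0, 2.0], [3.0, 4.0]])
week_b = np.array([[5.0, 6.0], [7.0, 8.0]])
week_c = np.array([[9.0, 10.0], [11.0, 12.0]])

# stack the 2D weeks along a new leading axis
toy = np.stack([week_a, week_b, week_c])
print(toy.shape)  # (weeks, teams, categories)

# a single value: week index 1, team index 0, category index 1
one_value = toy[1, 0, 1]

# one team's season in one category: every week, team 1, category 0
team_season = toy[:, 1, 0]

# the full 2D table for a single week
one_week = toy[2]

print(one_value)
print(team_season)
print(one_week)
```

A colon in any position means "everything along that axis", which is how the season-long slice for a single team and category falls out of the same array.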


Now that the results are in a NumPy array, the fun can begin!

Since I want to demonstrate some sort of data analysis in this notebook, let's calculate the teams' mean performance in each category.

In [7]:
# initialize an array into which we can populate the values
means = np.zeros(np.shape(BHLresults)[-2:])

for team in range(np.size(BHLresults,1)):
    for cat in range(np.size(BHLresults,2)):
        means[team,cat]=round(np.mean(BHLresults[:,team,cat]),3)

# display using Pandas
pd.DataFrame(means, index=names, columns=cats)

Unnamed: 0,G,A,+/-,Hits,Blk,W,SV%
Basement Dwellers,6.842,10.079,1.158,43.711,28.342,2.132,0.907
Chotchmahoneless,8.263,16.105,-0.421,20.895,22.658,2.368,0.921
Dice-n-Draft,6.474,15.184,-0.816,30.447,24.947,2.263,0.914
Dont Toews Me Bro,6.395,13.605,-0.184,47.079,32.605,2.605,0.912
Happys Hustlers,4.632,7.447,0.868,61.211,36.789,3.342,0.919
Hard Off the Glass,5.632,11.868,1.132,34.184,26.368,2.026,0.916
Neals Neat Team,5.184,8.868,0.553,47.658,30.0,2.5,0.921
Newfie Rockers,8.079,11.289,2.553,45.921,33.368,2.316,0.912
RyansNOTsoRandomTeam,7.526,14.053,1.895,49.763,33.132,2.842,0.913
The Gallows Pole,7.263,15.789,-0.342,23.684,24.368,2.895,0.924
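
The nested loops above work fine, but NumPy can also average along an axis directly. Here is a small sketch, on randomly generated stand-in data rather than the league results, suggesting that taking the mean over axis 0 (the week axis) gives the same table as the loop version. The names `toy_results`, `looped`, and `vectorised` are assumptions for this illustration.

```python
import numpy as np

# stand-in data: 4 weeks, 3 teams, 2 categories (not real league results)
rng = np.random.default_rng(0)
toy_results = rng.integers(0, 20, size=(4, 3, 2)).astype(float)

# loop version, mirroring the cell above
looped = np.zeros(toy_results.shape[-2:])
for team in range(toy_results.shape[1]):
    for cat in range(toy_results.shape[2]):
        looped[team, cat] = round(np.mean(toy_results[:, team, cat]), 3)

# vectorised version: average over the week axis in one call
vectorised = np.round(toy_results.mean(axis=0), 3)

print(vectorised.shape)                  # (teams, categories)
print(np.allclose(looped, vectorised))   # do the two approaches agree?
```

If that holds up, the same one-liner applied to `BHLresults` should reproduce the `means` table, and swapping `mean` for `std` or `max` would give other per-team summaries.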


This isn't revolutionary stuff, but it's a start.