# Fantasy Hockey Draft List Generator
**Final Project for DSE200x by Jeff Puuri**

Abstract: In 2017, it was estimated that there were over 50 million participants in online fantasy sports.  Participation in fantasy sports is a leisure activity, but many play for money as well as pride.

Drafting the best team at the start of each season is key to a successful season, as measured by winning one's league or finishing 'in the money' (usually the top 3 spots).  To help participants prepare for a draft, many online leagues provide a suggested draft list that ranks players by projected performance in the upcoming season, based only on last year's stats.  Average draft position data suggests that these league-provided lists are the predominant source for draft selections.

Are these league-provided draft lists the best source for one's draft choices?  Or is it possible to apply data science to create an improved draft list, based on more than just last year's stats, that gives a fantasy participant an advantage at draft time over those using the list provided by the online fantasy host?

The goal of drafting a better team is to improve the chance of finishing the season in the top spots, which brings more enjoyment, pride, and possible winnings in leagues that play for money.

This project will produce a model for Fantasy Hockey using Yahoo Sports, the online service on which I have participated in the same league for over ten years, so I will use the statistical categories used by my league.  Although the model is specific to hockey and to the set of statistical categories used by my league, I would like to demonstrate that the approach could be extended to other fantasy sports and to the statistical categories defined for a participant's particular league.

Note that this study is geared towards season-long fantasy leagues, not the daily matchups that are becoming popular.

In [43]:
# import libraries to use
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
dataFolder = 'D:/PjtData/4_PD/edX_DSE200x/wk10_FinalPjt/'

## Import Datasets
The next step is to import the datasets that will be used in the analysis.  One dataset (ranks) contains the predicted and actual rankings from Yahoo Sports for all players for the 2017-18 season.  The other (stats) contains player statistics for 2004-2017 from Hockey-Reference.com, cleaned and shared by a Kaggle user.


In [44]:
statsFile = 'NHLKaggleStats.csv'
ranksFile = 'YahooPlayerRanks.csv'
statsDF = pd.read_csv(dataFolder + statsFile)
ranksDF = pd.read_csv(dataFolder + ranksFile)
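
As a hedged aside, a load step like the one above can fail early when a file lacks an expected column. The sketch below is not part of the original notebook: the helper `load_csv`, the file name `demo_ranks.csv`, and the sample row are all hypothetical, and a temporary folder stands in for `dataFolder`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_csv(folder, filename, required_columns):
    """Read folder/filename and raise if an expected column is missing.

    Hypothetical helper for illustration only.
    """
    path = Path(folder) / filename
    df = pd.read_csv(path)
    missing = [c for c in required_columns if c not in df.columns]
    if missing:
        raise ValueError(f"{path.name} is missing columns: {missing}")
    return df


# Demonstration with a tiny made-up file in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "demo_ranks.csv").write_text("PlayerName,GP,G\nPlayer A,82,30\n")
    demoDF = load_csv(tmp, "demo_ranks.csv", ["PlayerName", "GP", "G"])
```

Building the path with `pathlib.Path` also avoids depending on a trailing slash in the folder string.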

## Remove unnecessary columns - Ranks
Each raw dataset contains statistical columns that are not part of the Yahoo Fantasy Hockey scoring criteria, so they can be removed to unclutter upcoming dataframe displays.  The following columns are used for scoring: G = Goals, A = Assists, PM = Plus/Minus, PIM = Penalties in Minutes, PPP = Power Play Points, GWG = Game Winning Goals, SOG = Shots on Goal, HIT = Hits, BLK = Blocked Shots.  We also need to keep GP = Games Played for normalizing stats in a later step, and PreSeasonRank and EndSeasonRank for comparing the model results.  PlayerName, Team, and Pos(ition) are kept for identification in visualizations and reports.

In [45]:
ranksDF.columns

Index(['PlayerName', 'Team', 'Pos', 'GP', 'PreSeasonRank', 'EndSeasonRank',
       'TOIperGP', 'G', 'A', 'PM', 'PIM', 'PPP', 'GWG', 'SOG', 'HIT', 'BLK'],
      dtype='object')

In [46]:
del ranksDF['TOIperGP']  # not a scoring category
ranksDF.columns

Index(['PlayerName', 'Team', 'Pos', 'GP', 'PreSeasonRank', 'EndSeasonRank',
       'G', 'A', 'PM', 'PIM', 'PPP', 'GWG', 'SOG', 'HIT', 'BLK'],
      dtype='object')
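
An alternative to deleting columns one at a time is to select an explicit keep-list, which states exactly what survives. The sketch below uses an empty stand-in frame with the same column names as `ranksDF`; the names `demoRanks`, `keepCols`, `trimmed`, and `dropped` are introduced here for illustration only.

```python
import pandas as pd

# Empty stand-in frame with the same column names as ranksDF
demoRanks = pd.DataFrame(columns=[
    'PlayerName', 'Team', 'Pos', 'GP', 'PreSeasonRank', 'EndSeasonRank',
    'TOIperGP', 'G', 'A', 'PM', 'PIM', 'PPP', 'GWG', 'SOG', 'HIT', 'BLK'])

# Identification columns, GP, the two rank columns, and the scoring categories
keepCols = ['PlayerName', 'Team', 'Pos', 'GP', 'PreSeasonRank', 'EndSeasonRank',
            'G', 'A', 'PM', 'PIM', 'PPP', 'GWG', 'SOG', 'HIT', 'BLK']

trimmed = demoRanks[keepCols]

# Report which columns were left out, as a quick check
dropped = [c for c in demoRanks.columns if c not in keepCols]
```

Selecting by keep-list raises a `KeyError` if a listed column is absent, which can surface naming mismatches early.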

## Remove unnecessary columns - Stats
Keep the corresponding columns from the Kaggle source, then rename them to align with the Yahoo naming.

In [47]:
statsDF.columns

Index(['Rk', 'Player', 'Age', 'Pos', 'Tm', 'GP', 'G', 'A', 'PTS', 'plusminus',
       'PIM', 'PS', 'EV', 'PP', 'SH', 'GW', 'EV00', 'PP00', 'SH00', 'S',
       'S_percent', 'TOI', 'ATOI', 'BLK', 'HIT', 'FOW', 'FOL', 'FO_percent',
       'HART', 'Votes', 'Season', 'NHLID'],
      dtype='object')

In [48]:
# Power-play points are split across two columns (PP and PP00), one for goals and one for assists
statsDF['PPP'] = statsDF['PP'] + statsDF['PP00']
# Drop columns that are not Yahoo scoring categories
statsDF.drop(['Rk', 'Age', 'PTS', 'PS', 'EV', 'PP', 'SH', 'EV00', 'PP00', 'SH00', 'S_percent', 'TOI', 'ATOI', 'FOW', 'FOL', 'FO_percent', 'HART', 'Votes', 'NHLID'], axis=1, inplace=True)
# Rename columns to match the Yahoo names
statsDF.rename({'Player':'PlayerName', 'Tm':'Team', 'plusminus':'PM', 'GW':'GWG', 'S':'SOG'}, axis='columns', inplace=True)
# Reorder columns to mirror ranksDF, with Season last
statsDF = statsDF[['PlayerName', 'Team', 'Pos', 'GP', 'G', 'A', 'PM', 'PIM', 'PPP', 'GWG', 'SOG', 'HIT', 'BLK', 'Season']]
statsDF.columns

Index(['PlayerName', 'Team', 'Pos', 'GP', 'G', 'A', 'PM', 'PIM', 'PPP', 'GWG',
       'SOG', 'HIT', 'BLK', 'Season'],
      dtype='object')

In [49]:
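# Whittle down the stats dataset to just the seasons we need
statsSeasons = [2015, 2016, 2017]
# .copy() makes the result an independent frame, so later edits do not trigger SettingWithCopyWarning
statsDF = statsDF[statsDF['Season'].isin(statsSeasons)].copy()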

In [50]:
statsDF.shape

(2668, 14)
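
A quick way to confirm that a season filter of this kind behaved as intended is to count rows per season afterwards. The sketch below uses a small made-up frame; `demo`, `filtered`, and `counts` are names introduced here and are not part of the notebook.

```python
import pandas as pd

# Made-up rows spanning one season outside the wanted range
demo = pd.DataFrame({
    'PlayerName': ['Player A', 'Player A', 'Player B', 'Player B', 'Player C'],
    'Season': [2014, 2015, 2016, 2017, 2017],
})

statsSeasons = [2015, 2016, 2017]

# Boolean mask: True where the row's Season is one of the wanted seasons
filtered = demo[demo['Season'].isin(statsSeasons)].copy()

# Rows per season, ordered by season
counts = filtered['Season'].value_counts().sort_index()
```

On the real data, the same `value_counts()` call on `statsDF['Season']` would show how the 2,668 rows are spread across the three seasons.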