# Introduction

A main objective for participants in fantasy sports competitions is to assemble the best-performing team over the course of an entire season. Typically, performance is based on the points accrued by each participant's chosen team during each period of the competition, which can last days, weeks, or months. Most major sports leagues have branded fantasy leagues for their respective competitions. In this study, data collected from the English Premier League's Fantasy Premier League (FPL) over the past five years will be analyzed to understand how challenging it is to choose a team of 11 starters and four substitutes such that the points accrued by the team over each competition period are maximized.

The rules for playing FPL can be found at https://fantasy.premierleague.com/help/rules. Rules will be mentioned as needed as this analysis moves toward a comprehensive description of the factors that must be considered when assembling a team for each gameweek.

# 1 - Exploring the Data

First, we'll import pandas, the library we'll use for basic exploratory analysis, and load the data.

In [None]:
import pandas as pd

# save filepath to variable for easier access
fpl_data_file_path = '../data/FPL17-GW0.csv'

# read the data and store data in DataFrame titled fpl_data
fpl_data = pd.read_csv(fpl_data_file_path) 

# print a summary of the data in FPL data
fpl_data.describe()

Viewing these summary statistics shows us some unusual figures, such as the PointsLastRound column being 0.0 for every row. That's because this file represents the player data available at GameWeek 0 of the 2017-2018 Fantasy Premier League season. No games have been played in the English Premier League yet, so various columns are still at their initial values. There will be 37 additional datasets to be used for the 2017-2018 FPL season, as each team in the English Premier League plays 38 games to determine the champion.
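One way to spot columns that have not been populated yet is to look for numeric columns holding a single repeated value. The sketch below is only illustrative: it uses a small stand-in DataFrame rather than the real file, the `Cost` column and all row values are made up, and `constant_columns` is a hypothetical helper name.

```python
import pandas as pd

# Stand-in for fpl_data; the rows and the Cost column are invented for illustration.
sample = pd.DataFrame({
    "Surname": ["Aarons", "Abraham", "Adams"],
    "PointsLastRound": [0.0, 0.0, 0.0],
    "Cost": [4.5, 5.5, 4.0],
})

def constant_columns(df):
    """Return the names of numeric columns that contain one repeated value."""
    numeric = df.select_dtypes(include="number")
    return [col for col in numeric.columns
            if numeric[col].nunique(dropna=False) == 1]

# PointsLastRound is the only numeric column with a single distinct value here.
print(constant_columns(sample))
```

Calling `constant_columns(fpl_data)` on the real GameWeek 0 file should, under the same assumptions, list every numeric column that is still at its initial value.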

Now let's check the size of the dataset and look at the first few records to see which columns are provided and what their data types are.

In [None]:
fpl_data.shape

In [None]:
fpl_data.head()

Right off the bat, we can see that the records in this file are sorted by Surname. Also, it appears that the first two players, Rolando Aarons and Tammy Abraham, are new to the English Premier League. Aarons, along with his team Newcastle United, was promoted from the Championship, while Abraham was loaned to Swansea City for the season.
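Both observations, the sort order and the column data types, can also be checked programmatically instead of by eye. This is a minimal sketch on a two-row stand-in DataFrame; only the `Surname` and `PointsLastRound` column names come from the data described here, and the same calls would be applied to `fpl_data` in practice.

```python
import pandas as pd

# Stand-in rows for illustration; the real data would be fpl_data.
sample = pd.DataFrame({
    "Surname": ["Aarons", "Abraham"],
    "PointsLastRound": [0.0, 0.0],
})

# dtypes reports one data type per column.
print(sample.dtypes)

# Split the columns into numeric and non-numeric groups.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
text_cols = sample.select_dtypes(exclude="number").columns.tolist()

# True when the Surname values never decrease from one row to the next.
sorted_by_surname = sample["Surname"].is_monotonic_increasing
print(numeric_cols, text_cols, sorted_by_surname)
```

Note that `is_monotonic_increasing` compares strings character by character, so mixed upper and lower case surnames in the real file could make this check stricter than a visual scan suggests.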

Now let's take a look at the rest of the columns that are available for each player.

In [None]:
list(fpl_data.columns)

The data set we are using was curated by the website Fantasy Overlord (http://fantasyoverlord.com/FPL). The author of the website implemented a machine learning algorithm inspired by the research paper "Competing with Humans at Fantasy Football: Team Formation in Large Partially-Observable Domains," which uses Bayesian Q-Learning. Knowing this, we will be able to use the columns containing weekly forecasts later in the analysis as a point of comparison for the approach taken in this study.
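To work with the forecast columns later, it helps to pull them out by name. The sketch below assumes the forecast columns share a common substring in their names; `ForecastGW1` and `ForecastGW2` are hypothetical names and values, so the pattern should be adjusted to whatever `list(fpl_data.columns)` actually shows.

```python
import pandas as pd

# Stand-in data; the Forecast* column names and all values are hypothetical.
sample = pd.DataFrame({
    "Surname": ["Aarons", "Abraham"],
    "PointsLastRound": [0.0, 0.0],
    "ForecastGW1": [1.2, 3.4],
    "ForecastGW2": [0.8, 2.9],
})

# Keep every column whose name contains "forecast", ignoring case.
forecast_cols = [col for col in sample.columns if "forecast" in col.lower()]
forecasts = sample[forecast_cols]

print(forecast_cols)
print(forecasts.shape)
```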

# 2 - Understanding the Cost of the Team-of-the-Week

While predicting the team-of-the-week (TOTW) is hard in itself, we must first understand the TOTW's characteristics and how they differ from those of the general population of players available for selection.