# Introduction

One of the main objectives of fantasy sports participants is to assemble the best-performing team over the course of an entire season. Typically, performance is measured by the points each participant's team accrues during each period of the competition, which can last days, weeks, or months. Most major sports leagues have branded fantasy leagues for their respective competitions. In this study, data collected from the English Premier League's Fantasy Premier League (FPL) over the past five years will be analyzed to understand how challenging it is to choose a team of 11 starters and four substitutes such that the points accrued by the team over each competition period are maximized.

The rules for playing FPL can be found at https://fantasy.premierleague.com/help/rules. Individual rules will be introduced where relevant as this analysis builds toward a comprehensive description of the factors that must be considered when assembling a team for each gameweek.

# 1 - Exploring the Data

First, we'll import some important libraries for doing basic exploratory analysis to describe the data.

In [1]:
```python
import pandas as pd

# save filepath to variable for easier access
fpl_data_file_path = '../data/FPL17-GW0.csv'

# read the data and store it in a DataFrame named fpl_data
fpl_data = pd.read_csv(fpl_data_file_path)

# print summary statistics for the numeric columns
fpl_data.describe()
```

| | Cost | PointsLastRound | TotalPoints | AveragePoints | AveragePointsPerDollar | TotalPointsPerDollar | GameweekWeighting | TransfersOut | YellowCards | GoalsConceded | ... | GW30Forecast | GW31Forecast | GW32Forecast | GW33Forecast | GW34Forecast | GW35Forecast | GW36Forecast | GW37Forecast | GW38Forecast | ICTIndex |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 501.0 | 501.0 | 501.0 | 501.0 | 501.0 | 501.0 | 501.0 | 501.0 | 501.0 | 501.0 | ... | 501.0 | 501.0 | 501.0 | 501.0 | 501.0 | 501.0 | 501.0 | 501.0 | 501.0 | 501.0 |
| mean | 5427146.0 | 0.0 | 53.690619 | 1.031201 | 1.727529e-07 | 9e-06 | 0.0 | 0.0 | 2.245509 | 18.546906 | ... | 0.991617 | 1.018762 | 0.961078 | 0.979441 | 0.988822 | 1.006387 | 0.994212 | 1.028343 | 1.012375 | 70.408383 |
| std | 1325587.0 | 0.0 | 52.914706 | 1.483116 | 2.341806e-07 | 8e-06 | 0.0 | 0.0 | 2.796713 | 18.125129 | ... | 1.65179 | 1.710254 | 1.57869 | 1.640609 | 1.615424 | 1.685372 | 1.638654 | 1.685288 | 1.690919 | 79.389004 |
| min | 4000000.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 4500000.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50% | 5000000.0 | 0.0 | 45.0 | 5.1358770000000005e-27 | 9.337958e-34 | 8e-06 | 0.0 | 0.0 | 1.0 | 15.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 50.0 |
| 75% | 5500000.0 | 0.0 | 90.0 | 2.105263 | 4.052632e-07 | 1.6e-05 | 0.0 | 0.0 | 4.0 | 31.0 | ... | 1.7 | 1.6 | 1.7 | 1.6 | 1.8 | 1.6 | 1.8 | 1.8 | 1.7 | 111.3 |
| max | 12500000.0 | 0.0 | 264.0 | 6.947368 | 7.842105e-07 | 3e-05 | 0.0 | 0.0 | 14.0 | 70.0 | ... | 9.9 | 8.5 | 9.9 | 9.4 | 8.6 | 8.4 | 8.4 | 9.9 | 8.6 | 453.4 |


These summary statistics show some unusual figures, such as the PointsLastRound column being 0.0 for every row. That's because this file holds the player data available at Gameweek 0 of the 2017-18 Fantasy Premier League season. No Premier League games have been played yet, so columns tied to the current season, such as PointsLastRound, GameweekWeighting, and TransfersOut, are still at their initial value of zero. There will be 37 additional datasets for the 2017-18 FPL season, as each team in the English Premier League plays 38 games to determine the champion.
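To confirm which columns are still at their initial values, a small helper can list every column that holds a single distinct value. This is a minimal sketch: `constant_columns` is a hypothetical helper name, and the three-row frame below is an invented excerpt rather than the real file. On the full `fpl_data`, we would expect the result to include PointsLastRound, GameweekWeighting, and TransfersOut, since the summary above shows a standard deviation of 0.0 for each.

```python
import pandas as pd


def constant_columns(df: pd.DataFrame) -> list[str]:
    """Return the names of columns holding at most one distinct value (NaN ignored)."""
    counts = df.nunique(dropna=True)
    return list(counts[counts <= 1].index)


# Hypothetical three-row excerpt mimicking the Gameweek 0 snapshot
gw0 = pd.DataFrame({
    "Cost": [4500000, 5500000, 5000000],
    "PointsLastRound": [0, 0, 0],
    "TransfersOut": [0, 0, 0],
    "TotalPoints": [0, 0, 59],
})

print(constant_columns(gw0))  # ['PointsLastRound', 'TransfersOut']
```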

Now let's check the dimensions of the dataset and look at the first few records to see which columns are provided and what kinds of values they hold.

In [2]:
```python
fpl_data.shape
```

(501, 88)

In [3]:
```python
fpl_data.head()
```

| | FirstName | Surname | PositionsList | Team | Cost | PointsLastRound | TotalPoints | AveragePoints | AveragePointsPerDollar | TotalPointsPerDollar | ... | GW30Forecast | GW31Forecast | GW32Forecast | GW33Forecast | GW34Forecast | GW35Forecast | GW36Forecast | GW37Forecast | GW38Forecast | ICTIndex |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Rolando | Aarons | MID | NEW | 4500000 | 0 | 0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Tammy | Abraham | FWD | SWA | 5500000 | 0 | 0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Charlie | Adam | MID | STK | 5000000 | 0 | 59 | 4.2678410000000005e-27 | 8.535682e-34 | 1.2e-05 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 106.7 |
| 3 | | Adam Smith | DEF | BOU | 5000000 | 0 | 104 | 2.736842 | 5.473684e-07 | 2.1e-05 | ... | 2.6 | 2.8 | 2.0 | 3.4 | 1.9 | 2.5 | 2.8 | 3.1 | 2.5 | 132.1 |
| 4 | | Adrian | GLK | WHU | 4500000 | 0 | 64 | 4.629523e-27 | 1.0287830000000001e-33 | 1.4e-05 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 37.7 |


Right off the bat, we can see that the records in this file are sorted by Surname. Also, the first two players, Rolando Aarons and Tammy Abraham, have zero TotalPoints, which suggests they did not play in the English Premier League last season. Aarons, along with his team Newcastle United, was promoted from the Championship, while Tammy Abraham was loaned to Swansea City for the season.
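Both observations can be checked in code. The sketch below rebuilds a few columns of the five rows shown above (a hand-copied excerpt, not the full file), counts the missing first names of single-name players such as Adrian, and tests the sort order. The comparison is made case-insensitively because the Newcastle listing in Section 2.1.1 places "de Jong" between "Darlow" and "Diame", which plain Python string ordering would not do. `sorted_by_surname` is a hypothetical helper.

```python
import pandas as pd

# Excerpt of the five rows shown above, limited to a few columns
head = pd.DataFrame({
    "FirstName": ["Rolando", "Tammy", "Charlie", None, None],
    "Surname": ["Aarons", "Abraham", "Adam", "Adam Smith", "Adrian"],
    "Cost": [4500000, 5500000, 5000000, 5000000, 4500000],
})

# Single-name players have no FirstName, so that column holds missing values
print(head["FirstName"].isna().sum())  # 2
print(head.dtypes)


def sorted_by_surname(df: pd.DataFrame) -> bool:
    # Lower-case first: uppercase letters sort before lowercase in Python
    return bool(df["Surname"].str.lower().is_monotonic_increasing)


print(sorted_by_surname(head))  # True

# The Newcastle ordering only passes the case-insensitive check
newcastle = pd.DataFrame({"Surname": ["Darlow", "de Jong", "Diame"]})
print(newcastle["Surname"].is_monotonic_increasing)  # False
print(sorted_by_surname(newcastle))                  # True
```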

Now let's take a look at the rest of the columns that are available for each player.

In [4]:
```python
list(fpl_data.columns)
```

['FirstName',
 'Surname',
 'PositionsList',
 'Team',
 'Cost',
 'PointsLastRound',
 'TotalPoints',
 'AveragePoints',
 'AveragePointsPerDollar',
 'TotalPointsPerDollar',
 'GameweekWeighting',
 'TransfersOut',
 'YellowCards',
 'GoalsConceded',
 'GoalsConcededPoints',
 'Saves',
 'SavesPoints',
 'GoalsScored',
 'GoalsScoredPoints',
 'ValueSeason',
 'TransfersOutRound',
 'PriceRise',
 'PriceFallRound',
 'LastSeasonPoints',
 'PriceFall',
 'ValueForm',
 'PenaltiesMissed',
 'Form',
 'Bonus',
 'FanRating',
 'CleanSheets',
 'CleanSheetPoints',
 'Assists',
 'SelectedByPercent',
 'TransfersIn',
 'OwnGoals',
 'EAIndex',
 'PenaltiesSaved',
 'DreamteamCount',
 'MinutesPlayed',
 'TransfersInRound',
 'PriceRiseRound',
 'RedCards',
 'BPS',
 'NextFixture1',
 'NextFixture2',
 'NextFixture3',
 'NextFixture4',
 'NextFixture5',
 'GW1Forecast',
 'GW2Forecast',
 'GW3Forecast',
 'GW4Forecast',
 'GW5Forecast',
 'GW6Forecast',
 'GW7Forecast',
 'GW8Forecast',
 'GW9Forecast',
 'GW10Forecast',
 'GW11Forecast',
 'GW12

The data set we are using was curated by the website Fantasy Overlord (http://fantasyoverlord.com/FPL). The author of the website implemented a machine learning algorithm inspired by the research paper "Competing with Humans at Fantasy Football: Team Formation in Large Partially-Observable Domains," which uses Bayesian Q-learning. Knowing this, we can later use the weekly forecast columns (GW1Forecast through GW38Forecast) as a baseline against which to compare the approach taken in this study.
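To use those forecasts later, the GW<n>Forecast columns need to be pulled out in gameweek order. Sorting by column name alone would place GW10Forecast before GW2Forecast, so the sketch below sorts on the numeric part. Only the column naming pattern comes from the dataset; the forecast values in the example frame are invented, and `forecast_matrix` is a hypothetical helper.

```python
import re

import pandas as pd

# Matches column names such as "GW1Forecast" or "GW38Forecast"
FORECAST_COL = re.compile(r"^GW(\d+)Forecast$")


def forecast_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Return the GW<n>Forecast columns ordered by gameweek number."""
    cols = [c for c in df.columns if FORECAST_COL.match(c)]
    cols.sort(key=lambda c: int(FORECAST_COL.match(c).group(1)))
    return df[cols]


# Hypothetical excerpt with deliberately shuffled columns
df = pd.DataFrame({
    "Surname": ["Adam Smith", "Adrian"],
    "GW10Forecast": [2.0, 0.0],
    "GW2Forecast": [3.1, 0.0],
    "GW1Forecast": [2.6, 1.0],
})

fc = forecast_matrix(df)
print(list(fc.columns))         # ['GW1Forecast', 'GW2Forecast', 'GW10Forecast']
print(fc.sum(axis=1).tolist())  # total forecast points per player
```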

# 2 - Understanding the Cost of the Team-of-the-Week

While predicting the team-of-the-week (TOTW) is hard in itself, we must first understand the TOTW's characteristics and how they differ from those of the general pool of players available for selection.

## 2.1 - Dynamics of the New Season

### 2.1.1 - English Football League System
The English Premier League (EPL), like the leagues below it, operates a promotion/relegation mechanism. The bottom three teams (those with the fewest points at the end of the season) are relegated to the league directly below the EPL, the English Championship. In their place, three Championship teams are promoted to the EPL for the following season: the top two teams go up automatically, and a third is promoted by winning a play-off between the teams finishing third to sixth. Players on the newly promoted sides (Newcastle United, Brighton & Hove Albion and Huddersfield Town) may have no points in the LastSeasonPoints column, because they either spent last season with the promoted team in the Championship or joined on a permanent or loan contract before the season from a club outside the EPL.

In [5]:
```python
fpl_data.loc[(fpl_data.Team == "NEW"), ['Surname', 'Team', 'PositionsList', 'LastSeasonPoints']]
```

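| | Surname | Team | PositionsList | LastSeasonPoints |
|---|---|---|---|---|
| 0 | Aarons | NEW | MID | 0 |
| 21 | Atsu | NEW | MID | 0 |
| 89 | Clark | NEW | DEF | 0 |
| 93 | Colback | NEW | MID | 0 |
| 108 | Darlow | NEW | GLK | 0 |
| 118 | de Jong | NEW | MID | 0 |
| 126 | Diame | NEW | MID | 0 |
| 133 | Dummett | NEW | DEF | 0 |
| 138 | Elliot | NEW | GLK | 0 |
| 168 | Gamez | NEW | DEF | 0 |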


Any player on a newly promoted side who has points in the LastSeasonPoints column played for an EPL team during the prior season. For example, Newcastle United defender Javier Manquillo spent the 2016-17 season at Sunderland on loan from Atletico Madrid. Players with no points in this column will not be considered for the TOTW.
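The filter described above can be sketched as a boolean mask over the team and points columns. This assumes the team codes for Brighton & Hove Albion and Huddersfield Town are BHA and HUD; only NEW is confirmed by the output above. The rows labelled Player X and Player Y are hypothetical, and `promoted_with_epl_history` is a hypothetical helper.

```python
import pandas as pd

# Assumed team codes for the promoted sides; only "NEW" appears in the output above
PROMOTED = {"NEW", "BHA", "HUD"}


def promoted_with_epl_history(df: pd.DataFrame) -> pd.DataFrame:
    """Players on promoted sides who scored FPL points last season."""
    mask = df["Team"].isin(PROMOTED) & (df["LastSeasonPoints"] > 0)
    return df.loc[mask, ["Surname", "Team", "PositionsList", "LastSeasonPoints"]]


# Hypothetical rows: "Player X" and "Player Y" are invented for illustration
sample = pd.DataFrame({
    "Surname": ["Aarons", "Player X", "Player Y", "Kane"],
    "Team": ["NEW", "NEW", "BHA", "TOT"],
    "PositionsList": ["MID", "DEF", "FWD", "FWD"],
    "LastSeasonPoints": [0, 50, 0, 224],
})

print(promoted_with_epl_history(sample))  # only Player X remains
```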

### 2.1.2 - TOTW Costs vs. Starting Budget

In [6]:
```python
# sort by last season's points, then keep the top five players in each position
top_performers = (
    fpl_data.sort_values('LastSeasonPoints', ascending=False)
    .groupby('PositionsList')
    .head(5)
)

top_performers[['PositionsList', 'Surname', 'Team', 'LastSeasonPoints', 'Cost']]
```

| | PositionsList | Surname | Team | LastSeasonPoints | Cost |
|---|---|---|---|---|---|
| 405 | MID | Sanchez | ARS | 264 | 12000000 |
| 13 | MID | Alli | TOT | 225 | 9500000 |
| 191 | MID | Hazard | CHE | 224 | 10500000 |
| 230 | FWD | Kane | TOT | 224 | 12500000 |
| 278 | FWD | Lukaku | MUN | 221 | 11500000 |
| 140 | MID | Eriksen | TOT | 218 | 9500000 |
| 116 | MID | De Bruyne | MCI | 199 | 10000000 |
| 99 | FWD | Costa | CHE | 196 | 10000000 |
| 153 | FWD | Firmino | LIV | 180 | 8500000 |
| 68 | DEF | Cahill | CHE | 178 | 6500000 |


By taking the top eleven players (one goalkeeper and ten outfield players), we have the following, listed as surname, team, last-season points, and cost in £m:

<ul>
  <li>Kane, TOT, 224, 12.5</li>
  <li>Lukaku, MUN, 221, 11.5</li>
  <li>Sanchez, ARS, 264, 12.0</li>
  <li>Alli, TOT, 225, 9.5</li>
  <li>Hazard, CHE, 224, 10.5</li>
  <li>Eriksen, TOT, 218, 9.5</li>
  <li>De Bruyne, MCI, 199, 10.0</li>
  <li>Cahill, CHE, 178, 6.5</li>
  <li>Alonso, CHE, 177, 7.0</li>
  <li>Azpilicueta, CHE, 170, 6.5</li>
  <li>Heaton, BUR, 149, 5.0</li>
</ul>
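The totals for this XI can be reproduced directly from the list. Costs are entered in the same units as the Cost column (for example 12500000 for Kane) and converted to £m. The positions for Alonso, Azpilicueta and Heaton do not appear in the truncated output above, so they are entered here as DEF, DEF and GLK.

```python
from collections import Counter

# (surname, team, position, last-season points, cost in the Cost column's units)
XI = [
    ("Kane", "TOT", "FWD", 224, 12_500_000),
    ("Lukaku", "MUN", "FWD", 221, 11_500_000),
    ("Sanchez", "ARS", "MID", 264, 12_000_000),
    ("Alli", "TOT", "MID", 225, 9_500_000),
    ("Hazard", "CHE", "MID", 224, 10_500_000),
    ("Eriksen", "TOT", "MID", 218, 9_500_000),
    ("De Bruyne", "MCI", "MID", 199, 10_000_000),
    ("Cahill", "CHE", "DEF", 178, 6_500_000),
    ("Alonso", "CHE", "DEF", 177, 7_000_000),
    ("Azpilicueta", "CHE", "DEF", 170, 6_500_000),
    ("Heaton", "BUR", "GLK", 149, 5_000_000),
]

total_points = sum(points for _, _, _, points, _ in XI)
total_cost = sum(cost for *_, cost in XI) / 1_000_000  # convert to £m
formation = Counter(position for _, _, position, _, _ in XI)

print(total_points)  # 2249
print(total_cost)    # 100.5
print(f"{formation['DEF']}-{formation['MID']}-{formation['FWD']}")  # 3-5-2
```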

Playing a 3-5-2 formation, this point-accruing machine scored a whopping 2249 points at a total cost of £100.5m. An FPL participant could not pick all of these players to start a season: participants have a budget of £100.0m and must also pick four more players (one goalkeeper and three outfield players) to complete an official 15-player squad, so this XI alone is already over budget.
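A quick lower bound makes the shortfall concrete. The cheapest player in the Gameweek 0 summary costs £4.0m (the minimum of the Cost column), so even if all four substitutes were bought at that price, the squad would cost at least the amount computed below. This sketch ignores positional constraints on the bench, so the true minimum can only be higher.

```python
BUDGET = 100.0         # starting budget, £m
xi_cost = 100.5        # cost of the eleven players above, £m
cheapest_player = 4.0  # minimum of the Cost column at Gameweek 0, £m
bench_slots = 4        # one goalkeeper and three outfield substitutes

lower_bound = xi_cost + bench_slots * cheapest_player
print(lower_bound)           # 116.5
print(lower_bound - BUDGET)  # 16.5, the minimum overspend
```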

Participants can work around this as the season progresses, because any player's cost can rise (or fall) during the season. With the extra room, a participant may sell a player whose price has risen and use the gain to strengthen other positions on the team, although a rising price usually signals a player in good form, so selling one is often unattractive.
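Price rises are not passed on in full, however. The FPL rules page linked in the introduction describes a sell-on fee under which a participant keeps only half of a price rise, rounded down to the nearest £0.1m, while a price fall is passed on in full. The sketch below assumes that rule; `selling_price` is a hypothetical helper, and prices are kept as integers in units of £0.1m to avoid floating-point rounding.

```python
def selling_price(purchase: int, current: int) -> int:
    """Selling price under the assumed 50% sell-on rule.

    Prices are integers in units of £0.1m (e.g. 105 means £10.5m).
    """
    if current <= purchase:
        # a price fall (or no change) is passed on in full
        return current
    # keep half of the rise, rounded down to the nearest £0.1m
    return purchase + (current - purchase) // 2


print(selling_price(100, 103))  # 101: bought at £10.0m, now £10.3m, sells for £10.1m
print(selling_price(100, 97))   # 97: the full fall is passed on
```

Under this rule a participant only banks £0.1m for every £0.2m rise, which limits how much budget can be freed by selling risers.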

The TOTW busts the initial budget of £100.0m. How do the resulting trade-offs affect FPL participants' decision making? Looking at net transfers after Gameweek 0 may help answer this.
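One way to measure this is net transfers per player. The sketch assumes that TransfersInRound and TransfersOutRound hold the number of managers who bought and sold each player in the latest gameweek, which the column names suggest but which is not documented here. At Gameweek 0 these values are likely to be uninformative, so the function would be applied to a later snapshot. The counts below are invented, and `net_transfers` is a hypothetical helper.

```python
import pandas as pd


def net_transfers(df: pd.DataFrame,
                  in_col: str = "TransfersInRound",
                  out_col: str = "TransfersOutRound") -> pd.DataFrame:
    """Rank players by transfers in minus transfers out for one snapshot."""
    out = df[["Surname", "Team"]].copy()
    out["NetTransfers"] = df[in_col] - df[out_col]
    return out.sort_values("NetTransfers", ascending=False)


# Hypothetical counts for illustration only
snapshot = pd.DataFrame({
    "Surname": ["Player A", "Player B", "Player C"],
    "Team": ["CHE", "TOT", "BUR"],
    "TransfersInRound": [1200, 300, 50],
    "TransfersOutRound": [100, 900, 40],
})

print(net_transfers(snapshot))  # Player A (+1100), Player C (+10), Player B (-600)
```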

### 2.1.3 - TOTW...Does What it Wants (By Enlisting Who it Wants)
The TOTW follows few rules beyond fielding 11 players: one goalkeeper and ten outfield players. It can select any number of players from a single club, whereas an FPL participant can choose at most three players from any one club.
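The three-per-club rule is easy to check mechanically. Applied to the clubs of the eleven players listed in Section 2.1.2, the check shows that this XI would be illegal in FPL for a second reason: it contains four Chelsea players. `clubs_over_limit` is a hypothetical helper.

```python
from collections import Counter

MAX_PER_CLUB = 3  # FPL limit on players from a single club


def clubs_over_limit(teams: list[str], limit: int = MAX_PER_CLUB) -> dict[str, int]:
    """Return each club that appears more than `limit` times, with its count."""
    counts = Counter(teams)
    return {club: n for club, n in counts.items() if n > limit}


# Clubs of Kane, Lukaku, Sanchez, Alli, Hazard, Eriksen, De Bruyne,
# Cahill, Alonso, Azpilicueta and Heaton, in that order
xi_teams = ["TOT", "MUN", "ARS", "TOT", "CHE", "TOT", "MCI", "CHE", "CHE", "CHE", "BUR"]

print(clubs_over_limit(xi_teams))  # {'CHE': 4}
```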


## 2.2 - A Battle Until the End...Of the Season

Since no FPL participant can field the TOTW at the start of the season, what is the best approximation of it that can be put together? And how is this done over the course of the season while taking into account a multitude of factors, including the cost of making more transfers than the free allowance (each additional transfer deducts 4 points from the participant's total)?
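The transfer penalty can be expressed as a small function. This sketch assumes one free transfer per gameweek by default and a 4-point deduction for each transfer beyond the free allowance; banking of unused free transfers is left to the caller through the `free_transfers` argument. Both function names are hypothetical.

```python
HIT = 4  # points deducted per transfer beyond the free allowance


def transfer_penalty(transfers: int, free_transfers: int = 1) -> int:
    """Points deducted for one gameweek's transfers."""
    return HIT * max(0, transfers - free_transfers)


def net_gameweek_points(gross: int, transfers: int, free_transfers: int = 1) -> int:
    """Gameweek score after subtracting any transfer penalty."""
    return gross - transfer_penalty(transfers, free_transfers)


print(transfer_penalty(3))         # 8: two extra transfers
print(net_gameweek_points(60, 3))  # 52
```

A transfer beyond the allowance only pays off if the incoming player is expected to outscore the outgoing one by more than 4 points over the remaining horizon.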

This raises an interesting question from a computational perspective: how complex is the problem a participant faces in finding the team to field for an arbitrary gameweek? On the other side of the same coin, does selecting the TOTW have the same complexity? If not, how does it differ, and is it harder or easier?
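One way to frame the difference: in hindsight and with no budget, the TOTW only requires sorting players by points and taking the best. A participant must instead respect a budget, which turns selection into a variant of the 0/1 knapsack problem (NP-hard in general). The sketch below contrasts the two on a made-up four-player pool. For brevity it ignores positions and club limits, and its exhaustive search is exponential in squad size; on real data an integer linear programming solver would be the more practical tool.

```python
from itertools import combinations
from typing import NamedTuple, Optional


class Player(NamedTuple):
    name: str
    points: int
    cost: float  # £m


def top_by_points(players: list[Player], size: int) -> list[Player]:
    """Hindsight selection with no budget: take the highest scorers."""
    return sorted(players, key=lambda p: p.points, reverse=True)[:size]


def best_under_budget(players: list[Player], size: int,
                      budget: float) -> Optional[tuple[Player, ...]]:
    """Exhaustive search over every `size`-subset whose cost fits the budget."""
    feasible = (c for c in combinations(players, size)
                if sum(p.cost for p in c) <= budget)
    return max(feasible, key=lambda c: sum(p.points for p in c), default=None)


# Hypothetical pool: better players cost more
pool = [Player("A", 10, 10.0), Player("B", 9, 9.0),
        Player("C", 8, 5.0), Player("D", 7, 4.0)]

print([p.name for p in top_by_points(pool, 2)])            # ['A', 'B'], costing 19.0
print([p.name for p in best_under_budget(pool, 2, 13.0)])  # ['B', 'D']
```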

Some of FPL's mechanics do help participants get closer to the TOTW. Because the captain's points are doubled, routinely picking good captains can bring a participant's score closer to that of an arbitrary gameweek's TOTW, but doing so consistently is even harder than just picking a team of 15 players.
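Captaincy amplifies a single choice: the captain's points count twice, so the same starters can produce very different scores depending on who wears the armband. The sketch uses a hypothetical three-player excerpt of a starting XI to keep the arithmetic visible; `gameweek_score` is a hypothetical helper that ignores the vice-captain rule.

```python
def gameweek_score(points: dict[str, int], captain: str) -> int:
    """Sum the starters' points, counting the captain's points twice."""
    if captain not in points:
        raise ValueError(f"captain {captain!r} is not in the starting line-up")
    return sum(points.values()) + points[captain]


# Hypothetical excerpt of a starting XI and its gameweek points
starters = {"Player 1": 2, "Player 2": 12, "Player 3": 6}

print(gameweek_score(starters, "Player 2"))  # 32
print(gameweek_score(starters, "Player 1"))  # 22
```

Here the captaincy decision alone swings the score by 10 points, as much as two and a half extra transfers would cost.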

# 3 - Background

