# Transfer selection for English Premier League Fantasy Football

After signing up at https://fantasy.premierleague.com/ and creating a team, you can transfer players every week. In this simple method we only transfer a single player, since transferring more than one costs points and we will not look into the cost-benefit analysis of that. We simply make the transfer that gives the biggest increase in total score.
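
Stated a little more formally (a sketch; the symbols are introduced here for illustration and do not appear in the code), let $S$ be the eleven starting players, $Q$ the whole squad, $b$ the remaining cash, $M$ the set of clubs that already supply three of our players, and let $p_i$, $s_i$, $\pi_i$ and $c_i$ be the price, total score, position and club of player $i$. The transfer we look for is

$$
(o^\ast, n^\ast) = \operatorname*{arg\,max}_{o \in S,\; n \notin Q} \; (s_n - s_o)
\quad \text{subject to} \quad
\pi_n = \pi_o, \qquad
p_n \le p_o + b, \qquad
c_n \notin M \ \text{ or } \ c_n = c_o,
$$

and we only make it if $s_{n^\ast} - s_{o^\ast} > 0$. Only players who are expected to play are considered for $n$.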

## Getting the player data from the website

First, we save a number of HTML pages. We save the current team by clicking "My Team" and saving the page as "curTeam.html". Then, under "Transfers", we can choose which players to view. We want to save the player data one position at a time. We start by choosing "Goalkeepers" and saving the page as "GK1.html". Then, at the bottom of the "Player selection" table, we click the arrow to go to the next page and save it as "GK2.html". Next, we choose "Defenders" and repeat the process (giving "D1.html" to "D6.html"). We then do the same for the midfielders ("MF1.html" to "MF8.html") and forwards ("F1.html" to "F3.html"). Make sure these files are saved in the same folder as this notebook.

**Note:** It's important to leave the "Sorted by" field as *Total Score* and "With a maximum price of" as *Unlimited*, because the code needs every player, in order of total score. It is also important that your squad is empty; otherwise extra rows are added to the data, which breaks the scraping. I used Google Chrome, but I think other browsers should work too.

After saving the pages, we rename every ".html" file to ".txt" (e.g. "curTeam.txt", "GK1.txt", "GK2.txt", "D1.txt", etc.). The extra folders that the browser creates when saving a webpage can be deleted.
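
Renaming the files by hand is tedious. The helper below does it with Python's standard `pathlib` module and then lists any expected page that is still missing. It is a sketch, not part of the original workflow: the helper names `rename_html_to_txt` and `missing_pages` are made up for this example. The page counts in `PAGES` are copied from the file counts above, so adjust them if your download has a different number of pages.

```python
from pathlib import Path

# Pages saved per position, copied from the counts above (assumption: adjust as needed).
PAGES = {'GK': 2, 'D': 6, 'MF': 8, 'F': 3}

def rename_html_to_txt(folder='.'):
    """Rename every *.html file in `folder` to *.txt and return the new names, sorted."""
    renamed = []
    for path in Path(folder).glob('*.html'):
        target = path.with_suffix('.txt')  # e.g. GK1.html -> GK1.txt
        path.rename(target)
        renamed.append(target.name)
    return sorted(renamed)

def missing_pages(folder='.'):
    """Return the expected .txt files (curTeam.txt, GK1.txt, ...) that do not exist yet."""
    expected = ['curTeam.txt'] + [
        f'{pos}{k}.txt' for pos, n in PAGES.items() for k in range(1, n + 1)
    ]
    return [name for name in expected if not (Path(folder) / name).exists()]
```

Note that `rename_html_to_txt()` renames *every* `.html` file in the folder, so keep unrelated pages elsewhere. If `missing_pages()` returns a non-empty list, those pages still need to be saved.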

We also note how much cash we have left in the bank (in millions of pounds):

In [56]:
remainingCash=0.7

## Choosing the "best" transfer
First, we load the required Python libraries and define our constants.

In [57]:
### Libraries ###

import re #regular expressions
import numpy as np #to hold the scraped tables as arrays

### Constants ###

POSITIONS=['GK','D','MF','F'] #position codes, also the prefixes of the saved files
NOFILES=[2,6,8,3] #number of saved pages per position (GK1-GK2, D1-D6, MF1-MF8, F1-F3)
TOT_NUMBER=[1,4,4,2] #starting players per position (a 4-4-2 formation)
MAX_COST=100 #total squad budget, in millions of pounds
NUM_PLAYERS=11 #number of starting players

Next, we take all the HTML data (already saved as .txt files) and convert it into tables, one per position. Read row by row, they look like this:

Goalkeepers:

    [[Player Status, Name, Price, Total Score, 'GK', Team],
     [Player Status, Name, Price, Total Score, 'GK', Team],
     ...]

Defenders:

    [[Player Status, Name, Price, Total Score, 'D', Team],
     [Player Status, Name, Price, Total Score, 'D', Team],
     ...]

and so on for the midfielders ('MF') and forwards ('F'). In the code, each table is indexed column first: `table[1]` holds all the names and `table[1][j]` is the name of player `j`. Every scraped value keeps a trailing comma (e.g. `'de Gea,'`).

We store these tables in a dictionary called `tables`, keyed by position ('GK', 'D', 'MF', 'F').
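
The scraping functions in the next cell all rely on one idea: find every line that contains a known opening marker and keep the text up to the next closing marker. The sketch below shows that idea on a few made-up HTML lines. `extract_between` is a hypothetical name, and unlike `makeRows` it does not append a trailing comma to each value.

```python
def extract_between(lines, before, after):
    """Return the text between `before` and the next `after` on every line containing `before`."""
    found = []
    for line in lines:
        if before in line:
            start = line.index(before) + len(before)
            end = line.index(after, start)  # raises ValueError if `after` is missing
            found.append(line[start:end])
    return found

# Made-up HTML lines in the style of a saved "Player selection" page.
sample = [
    '<tr>',
    '    <a href="#" class="ism-table--el__name">Heaton</a>',
    '    <td class="ism-table--el__strong">5.0</td>',
    '    <a href="#" class="ism-table--el__name">de Gea</a>',
]
names = extract_between(sample, 'class="ism-table--el__name">', '</a>')
print(names)  # ['Heaton', 'de Gea']
```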

In [66]:
### Convert from html to numpy matrices ###

# Function: removeChars
# Usage: number_str = removeChars(str)
# Description: takes in a string and deletes any character that's not a
# digit or a decimal point.
######################################################################
def removeChars(string):
    non_decimal=re.compile(r'[^\d.]+')
    clean_string=non_decimal.sub('',string)
    return clean_string

# Function: makeRows
# Usage: rows = makeRows(filename,str_bef,str_aft)
# Description: Finds all the strings written in between str_bef and
# str_aft and returns an array of those strings (each with a trailing comma).
#######################################################################
def makeRows(filename,str_bef,str_aft):
    rows=[]
    #utf-8 keeps accented names such as Sánchez intact; 'with' closes the file
    with open(filename, encoding='utf-8') as file:
        for line in file:
            if str_bef in line:
                start=line.index(str_bef)+len(str_bef)
                end=line.index(str_aft,start) #look for str_aft only after str_bef
                row=line[start:end]+','
                row=row.replace("–", "-") #replace en dashes with plain hyphens
                rows.append(row)
    return rows

# Function Family: find___
# Usage: ___ = find___(filename)
# Descriptions: Given a txt file of a team selection html page, it
# will find all the player ___ (e.g. statuses, names) for the week and 
# make an array of them.
#######################################################################
def findStatuses(filename):
    str_bef='            <a href="https://fantasy.premierleague.com/a/squad/transfers#" class="ismjs-info ism-table--el__status-link" title="'
    str_aft='"><svg class="ism-icon--element'
    statuses=makeRows(filename,str_bef,str_aft)
    return statuses
    
def findNames(filename):
    str_bef='                <a href="https://fantasy.premierleague.com/a/squad/transfers#" class="ism-table--el__name">'
    str_aft='</a>'
    names=makeRows(filename,str_bef,str_aft)
    return names

def findPricesAndScores(filename):
    str_bef='    <td class="ism-table--el__strong">'
    str_aft='</td>'
    pricesAndScores=makeRows(filename,str_bef,str_aft)
    for i in range(0,len(pricesAndScores)):
        pricesAndScores[i]=removeChars(pricesAndScores[i]) #removes pound signs and other random entries that happened to have same bef and aft strings
    pricesAndScores=[x for x in pricesAndScores if x != ''] #remove blank entries (which show up for some reason)
    return pricesAndScores

def findTeams(filename):
    str_bef='                <span class="ism-table--el__strong">'
    str_aft='</span>'
    teams=makeRows(filename,str_bef,str_aft)
    #Manually remove erroneous finds
    teams=teams[15:] #remove teams of players in curTeam that also get found in search
    badTeamTags=['<%- team_short_name %>,','ng"><%- team_short_name %>,'] #bad strings that are identified as a team
    teams=[team for team in teams if team not in badTeamTags]
    return teams

# Function: onlyPrices[Scores](pricesAndScores)
# Usage: prices[scores] = onlyPrices[Scores](pricesAndScores)
# Descriptions: Takes in an array pricesAndScores (where the prices and
# scores alternate, price first) and selects only the prices [scores].
# This is needed because a player's price and score are wrapped in the
# same html tags, so they are found together.
#######################################################################
def onlyPrices(pricesAndScores):
    prices=[]
    for i in range(0,len(pricesAndScores)):
        if i % 2 == 0:
            prices.append(pricesAndScores[i]+',')
    return prices

def onlyScores(pricesAndScores):
    scores=[]
    for i in range(0,len(pricesAndScores)):
        if i % 2 == 1:
            scores.append(pricesAndScores[i]+',')
    return scores

# Function: makeTable
# Usage: table = makeTable(filename,'GK')
# Descriptions: Takes in a html file (in txt format) and position and
# makes a table with columns 1) Player status, 2) Name, 3) Price, 4) 
# Total score and 5) Position.
#######################################################################
def makeTable(filename,position):
    table=[]
    #Find relevant fields and append them to the table
    names=findNames(filename)
    statuses=findStatuses(filename)
    #The status search picks up some extra entries at the start of the page,
    #so keep only the last len(names) statuses
    statuses=statuses[len(statuses)-len(names):]
    table.append(statuses)
    table.append(names)
    pricesAndScores=findPricesAndScores(filename)
    prices=onlyPrices(pricesAndScores)
    table.append(prices)
    scores=onlyScores(pricesAndScores)
    table.append(scores)
    positions=[position]*len(names) #one entry per player (counted after trimming the statuses)
    table.append(positions)
    teams=findTeams(filename)
    table.append(teams)
    return table

# Function: rowBind
# Usage: full_table = rowBind(table1,table2)
# Descriptions: Takes 2 tables and combines them into 1 by row.
#######################################################################
def rowBind(table1,table2):
    if len(table1)==0:
        return table2
    for i in range(0,len(table1)):
        table1[i]=table1[i]+table2[i] #append the players of table2 to column i
    return table1

## Main function ##
tables={}
#Make a table for each position and add it to tables
for i in range(0,len(POSITIONS)):
    #Initialize variables
    position = POSITIONS[i]
    noFiles= NOFILES[i]
    table=[]
    #Make a table for each file and then combine them
    for j in range(0,noFiles):
        filename = position+str(j+1)+'.txt'
        cur_table=makeTable(filename,position)
        table=rowBind(table,cur_table)
    #Keep the table column by column (table[0] = statuses, table[1] = names, ...),
    #which is how the later cells index it. dtype=object also accepts columns of
    #different lengths, which plain np.array rejects in recent NumPy versions.
    tables[position]=np.array(table,dtype=object)
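
Two of the helpers above are small list manipulations that Python can express directly. As a sketch with sample values, slicing with a step of 2 separates an alternating price/score list (what `onlyPrices` and `onlyScores` do, minus the trailing commas), and `zip` joins two column-wise tables the way `rowBind` does.

```python
# Prices and scores share the same HTML markers, so they arrive interleaved:
# [price, score, price, score, ...]. Slicing with a step of 2 splits them.
prices_and_scores = ['5.4', '65', '6.4', '96', '6.3', '77']  # sample values
prices = prices_and_scores[0::2]  # even indices -> prices
scores = prices_and_scores[1::2]  # odd indices -> scores
print(prices)  # ['5.4', '6.4', '6.3']
print(scores)  # ['65', '96', '77']

# Two pages stored column by column (names, prices); concatenating matching
# columns appends the players of the second page to the first.
page1 = [['A', 'B'], ['5.4', '6.4']]
page2 = [['C'], ['6.3']]
combined = [col1 + col2 for col1, col2 in zip(page1, page2)]
print(combined)  # [['A', 'B', 'C'], ['5.4', '6.4', '6.3']]
```

Note that `zip` silently stops at the shorter of the two tables, so both pages must have the same number of columns.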

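Next, we make a table of our current team with the same fields. The player names are read from "curTeam.txt", and each player's status, price, score and team are looked up in `tables`. The positions in `curPos` are typed in by hand: they assume a starting eleven in a 4-4-2 formation followed by the bench in the order GK, D, MF, F, so change them to match your own team.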

In [67]:
# Function: findPlayerStatus
# Usage: status = findPlayerStatus(player,tables)
# Descriptions: Takes in the player that you're looking for and the
# list of tables of all the players and finds the status of that player
#######################################################################
def findPlayerStatus(player,tables):
    for position in POSITIONS:
        curTable=tables[position]
        for i in range(0,len(curTable[0])):
            if (curTable[1][i]==player):
                return curTable[0][i]
    return 'not found' #If the player is not found

# Function: findPlayerPrices
# Usage: price = findPlayerPrices(player,tables)
# Descriptions: Takes in the player that you're looking for and the
# list of tables of all the players and finds the price of that player
#######################################################################
def findPlayerPrices(player,tables):
    for position in POSITIONS:
        curTable=tables[position]
        for i in range(0,len(curTable[0])):
            if (curTable[1][i]==player):
                return curTable[2][i]
    return 'not found'#If the player is not found

# Function: findPlayerScores
# Usage: score = findPlayerScores(player,tables)
# Descriptions: Takes in the player that you're looking for and the
# list of tables of all the players and finds the total score of that 
# player
#######################################################################
def findPlayerScores(player,tables):
    for position in POSITIONS:
        curTable=tables[position]
        for i in range(0,len(curTable[0])):
            if (curTable[1][i]==player):
                return curTable[3][i]
    return 'not found'

# Function: findPlayerTeams
# Usage: team = findPlayerTeams(player,tables)
# Descriptions: Takes in the player that you're looking for and the
# list of tables of all the players and finds the team of that 
# player
#######################################################################
def findPlayerTeams(player,tables):
    for position in POSITIONS:
        curTable=tables[position]
        for i in range(0,len(curTable[0])):
            if (curTable[1][i]==player):
                return curTable[5][i]
    return 'not found'

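# Function Family: findCur___
# Usage: curPlayers = findCurPlayers(filename)
#        ___ = findCur___(curPlayers,tables) for statuses, prices, scores, teams
# Descriptions: findCurPlayers reads the player names from a txt file of the
# My Team html page; the other functions look up each player's ___ in tables
# and make an array of them.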
#######################################################################
def findCurPlayers(filename):
    str_bef='                <div class="ism-element__name">'
    str_aft='</div>'
    curPlayers=makeRows(filename,str_bef,str_aft)
    return curPlayers

def findCurStatuses(curPlayers,tables):
    curStatuses=[]
    for player in curPlayers:
        curStatuses.append(findPlayerStatus(player,tables))
    return curStatuses

def findCurPrices(curPlayers,tables):
    curPrices=[]
    for player in curPlayers:
        curPrices.append(findPlayerPrices(player,tables))
    return curPrices

def findCurScores(curPlayers,tables):
    curScores=[]
    for player in curPlayers:
        curScores.append(findPlayerScores(player,tables))
    return curScores

def findCurTeams(curPlayers,tables):
    curTeams=[]
    for player in curPlayers:
        curTeams.append(findPlayerTeams(player,tables))
    return curTeams

## Main Function ##
curTeam=[]
curPlayers=findCurPlayers('curTeam.txt')
curPos=['GK','D','D','D','D','MF','MF','MF','MF','F','F','GK','D','MF','F'] #starting XI (4-4-2), then the bench
curStatuses=findCurStatuses(curPlayers,tables)
curPrices=findCurPrices(curPlayers,tables)
curScores=findCurScores(curPlayers,tables)
curTeams=findCurTeams(curPlayers,tables)
curTeam.append(curStatuses)
curTeam.append(curPlayers)
curTeam.append(curPrices)
curTeam.append(curScores)
curTeam.append(curPos)
curTeam.append(curTeams)
#Store column by column like the position tables (curTeam[1] = names, ...);
#dtype=object also accepts columns of different lengths
curTeam=np.array(curTeam,dtype=object)
curTeam

array([ ['View player information,', 'View player information,', 'View player information,', 'View player information,', 'View player information,', 'View player information,', 'View player information,', 'View player information,', 'View player information,', 'View player information,', 'View player information,', 'View player information,', 'View player information,', 'View player information,'],
       ['de Gea,', 'Cahill,', 'Koscielny,', 'Azpilicueta,', 'Walker,', 'Lallana,', 'Capoue,', 'Sánchez,', 'Hazard,', 'Lukaku,', 'Defoe,', 'Pope,', 'Love,', 'de Roon,'],
       ['5.4,', '6.4,', '6.3,', '6.6,', '6.1,', '7.6,', '4.8,', '11.9,', '10.4,', '9.5,', '7.7,', '4.0,', '4.0,', '4.4,'],
       ['65,', '96,', '77,', '96,', '80,', '104,', '81,', '136,', '118,', '99,', '100,', '0,', '13,', '41,'],
       ['GK', 'D', 'D', 'D', 'D', 'MF', 'MF', 'MF', 'MF', 'F', 'F', 'GK', 'D', 'MF', 'F'],
       ['MUN,', 'CHE,', 'ARS,', 'CHE,', 'TOT,', 'LIV,', 'WAT,', 'ARS,', 'CHE,', 'EVE,', 'SUN,', 'BUR,', 
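
Each `findPlayer...` function above scans every table again for every player. As an alternative, here is a sketch that builds a dictionary keyed by player name once and then looks players up directly. It assumes the same column order (0 status, 1 name, 2 price, 3 score, 4 position, 5 team). `build_index` and the toy data are hypothetical and not part of the original notebook.

```python
# Column order assumed here, matching the notebook's tables:
# 0 status, 1 name, 2 price, 3 score, 4 position, 5 team.
def build_index(tables):
    """Map every player name to their status, price, score and team (first match wins)."""
    index = {}
    for columns in tables.values():
        for j, name in enumerate(columns[1]):
            if name not in index:  # same "first match wins" rule as findPlayerStatus
                index[name] = {
                    'status': columns[0][j],
                    'price': columns[2][j],
                    'score': columns[3][j],
                    'team': columns[5][j],
                }
    return index

# Toy tables in the same column-first layout (hypothetical players and values).
toy_tables = {
    'GK': [['View player information,'], ['Keeper A,'], ['5.4,'], ['65,'], ['GK'], ['AAA,']],
    'D': [['View player information,', 'View player information,'],
          ['Defender B,', 'Defender C,'], ['6.4,', '4.5,'], ['96,', '40,'],
          ['D', 'D'], ['BBB,', 'CCC,']],
}
index = build_index(toy_tables)
print(index['Defender C,']['price'])  # 4.5,
print(index.get('Nobody,'))           # None
```

A missing player then shows up as `None` from `index.get(name)`, rather than as the string `'not found'`.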

Another rule we have to take into account is that we can have at most 3 players from any one team in our fantasy squad. So before making a transfer, we need to know which teams we are not allowed to buy new players from, i.e. the teams from which we already have 3 players:

In [68]:
maxedOutTeams=[]
for team in set(curTeams):
    if curTeams.count(team) > 2:
        maxedOutTeams.append(team)
maxedOutTeams

['CHE,']
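
The same list can be computed with `collections.Counter`, which counts every team in one pass. In this sketch `maxed_out_clubs` is a hypothetical helper and the list of teams is made up. Sorting the result makes the output order predictable, whereas iterating over a `set`, as in the cell above, gives no fixed order.

```python
from collections import Counter

def maxed_out_clubs(clubs, limit=3):
    """Return, sorted, the clubs that already supply `limit` or more players."""
    counts = Counter(clubs)  # club -> number of our players from that club
    return sorted(club for club, n in counts.items() if n >= limit)

clubs = ['AAA,', 'BBB,', 'BBB,', 'CCC,', 'BBB,', 'AAA,']  # made-up squad teams
print(maxed_out_clubs(clubs))  # ['BBB,']
```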

Now that we have all the data in tables, we could build every team that costs less than 100 and see which one has the highest total score. However, we can first discard some players.

The idea is that, since we only transfer in 1 player, we only need to keep, for each position, the player with the highest total score at each price: for the same price you would always rather choose the player with the higher score, and you can pick at most one. We also ignore players whose chance of playing is flagged as reduced, so we only keep players whose status is 'View player information,'.

In [73]:
### Picking top players per price ###

#Initialize variables
short_tables={} # shortlisted players for each position
#For each position, keep the first available player at each price. The tables are
#already ordered by total score, so that player has the highest score at that price.
for i in range(0,len(POSITIONS)):
    table=tables[POSITIONS[i]]
    short_table=[]
    numPlayers=min(len(column) for column in table) #only players with an entry in every column
    prices=set(table[2]) #list of unique prices
    for price in prices:
        for j in range(0,numPlayers):
            isPlaying = (table[0][j]=='View player information,')
            if (table[2][j]==price and isPlaying):
                short_table.append([table[0][j],table[1][j],table[2][j],table[3][j],table[5][j]])
                break #the first match is the best player at this price
    short_tables[POSITIONS[i]]=np.asmatrix(short_table)
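
The shortlist cell above relies on the tables being sorted by total score. The sketch below does not need that ordering: it keeps, for each price, the available player with the highest score in a dictionary. The function name `best_per_price` and the tuple layout `(status, name, price, score, team)` are assumptions made for this example. The status string is the one the cell above treats as "available".

```python
def best_per_price(players, playing_status='View player information,'):
    """Keep, for each price, the available player with the highest total score.

    `players` is a list of (status, name, price, score, team) tuples with
    prices and scores as strings, e.g. '4.5' and '98'.
    """
    best = {}
    for status, name, price, score, team in players:
        if status != playing_status:
            continue  # skip players flagged as less likely to play
        if price not in best or float(score) > float(best[price][3]):
            best[price] = (status, name, price, score, team)
    return [best[p] for p in sorted(best, key=float)]  # cheapest first

# Made-up players; 'Injured' is a made-up status string.
players = [
    ('View player information,', 'A', '4.5', '50', 'X'),
    ('View player information,', 'B', '4.5', '80', 'Y'),
    ('Injured', 'C', '5.0', '120', 'Z'),
    ('View player information,', 'D', '5.0', '60', 'W'),
]
print([p[1] for p in best_per_price(players)])  # ['B', 'D']
```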

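Finally, we loop through every possible transfer: for each of the 11 starting players, we try each shortlisted player in the same position and keep the transfer with the largest increase in total score that we can afford and that respects the 3-players-per-team rule.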

In [76]:
#Declare variables
maxDiff=0
subOut=''
subIn=''
#Loop through the starting players and find the best transfer
for i in range(0,NUM_PLAYERS):
    curPlayer=curTeam[1][i]
    pos=curTeam[4][i]
    curPrice=float(removeChars(curTeam[2][i]))
    curScore=float(removeChars(curTeam[3][i]))
    curT=curTeam[5][i]
    #Loop through all shortlisted players that have the same position as the current player
    for row in short_tables[pos]:
        price=float(removeChars(row[0,2]))
        team=row[0,4]
        #Check that the proposed player doesn't break the max 3 players per team rule
        if (team not in maxedOutTeams or curT==team):
            #Check that the transfer is affordable (rounded to 0.1 to avoid float error)
            #and that the proposed player is not already in the squad
            if (round(curPrice+remainingCash-price,1)>=0 and row[0,1] not in curTeam[1]):
                score=float(removeChars(row[0,3]))
                #If the score differential beats the current maximum, this is the best transfer so far
                if (score-curScore > maxDiff):
                    subOut=curPlayer
                    subIn=row[0,1]
                    maxDiff=score-curScore
if subIn:
    print('Substitute in ' + subIn[:-1] + ' for ' + subOut[:-1]+ '.') #[:-1] drops the trailing comma
else:
    print('No single transfer increases the total score.')

Substitute in Heaton for de Gea.
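
For reference, the same search can be written as a pure function that is easy to test on toy data. This is a sketch rather than the notebook's code: the name `best_single_transfer`, the dictionary fields and all the players below are hypothetical. Prices are given as integers in tenths of a million pounds (54 for £5.4m). This assumption makes the budget check exact instead of relying on floating-point arithmetic. The function returns `None` when no transfer increases the total score.

```python
def best_single_transfer(starters, squad_names, shortlist, bank, maxed_out):
    """Return (out_name, in_name, gain) for the best score-increasing transfer, or None.

    starters:    list of dicts with keys name, pos, price, score, team
    squad_names: set of names already in the squad
    shortlist:   dict mapping a position to a list of candidate dicts (same keys)
    bank:        remaining cash, in the same units as the prices
    maxed_out:   set of teams that already supply 3 of our players
    """
    best = None
    for out in starters:
        for cand in shortlist.get(out['pos'], []):
            if cand['name'] in squad_names:
                continue  # already in the squad
            if cand['team'] in maxed_out and cand['team'] != out['team']:
                continue  # would break the 3-players-per-team rule
            if cand['price'] > out['price'] + bank:
                continue  # not affordable after selling `out`
            gain = cand['score'] - out['score']
            if gain > 0 and (best is None or gain > best[2]):
                best = (out['name'], cand['name'], gain)
    return best

# Toy data (hypothetical players; prices in tenths of £1m).
starters = [
    {'name': 'Keeper A', 'pos': 'GK', 'price': 54, 'score': 65, 'team': 'AAA'},
    {'name': 'Defender B', 'pos': 'D', 'price': 64, 'score': 96, 'team': 'BBB'},
]
squad_names = {'Keeper A', 'Defender B', 'Keeper C'}
shortlist = {
    'GK': [{'name': 'Keeper C', 'pos': 'GK', 'price': 40, 'score': 100, 'team': 'CCC'},
           {'name': 'Keeper D', 'pos': 'GK', 'price': 50, 'score': 90, 'team': 'DDD'}],
    'D': [{'name': 'Defender E', 'pos': 'D', 'price': 70, 'score': 130, 'team': 'BBB'},
          {'name': 'Defender F', 'pos': 'D', 'price': 80, 'score': 200, 'team': 'FFF'}],
}
print(best_single_transfer(starters, squad_names, shortlist, bank=7, maxed_out={'BBB'}))
# ('Defender B', 'Defender E', 34)
```

Here Defender E is allowed even though team BBB is maxed out, because the player leaving is also from BBB. Defender F is excluded because 80 > 64 + 7.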
