# Introduction
For our analysis, we need the remaining contract length of each player, since it can significantly change a player's price. The only source I could find that records the past contract lengths of a player is FIFA. For the 2019/2020 transfer window we want to look at, we can therefore use a FIFA 19 dataframe and take each player's contract length from the last season before he left his club. For this, I downloaded a file from [Kaggle](https://www.kaggle.com/karangadiya/fifa19), which contains the data for all players in the game, and saved it as "FIFA19.csv".

In [None]:
import pandas as pd
import numpy as np

# Data Handling

In [None]:
#Read in fifa dataframe
fifa=pd.read_csv("FIFA19.csv")
fifa

In [None]:
#read in dataframe with transfers
transfers=pd.read_pickle("transfers.pkl")
transfers

In [None]:
#Only take the columns we need to get contract length of players
contracts=fifa[["Name","Age","Club","Contract Valid Until","ID"]]

In [None]:
#Only keep the transfer columns we need to match players (the player name is the index)
transfers=transfers[["Left","Joined","Age"]]
transfers

In [None]:
#Reset index to make iteration simpler
transfers.reset_index(inplace=True)

In [None]:
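#add two new columns to store contract length and player id on fifa
transfers=transfers.assign(ID=None,Contract=None)
#Set the column order explicitly, so that ID is column 3 and Contract is column 4
#(pd.concat only sorted the columns alphabetically in older pandas versions)
transfers=transfers[["Player","Left","Joined","ID","Contract","Age"]]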


In [None]:
#make the names of the players more compatible with FIFA
transfers["Player"]=transfers.Player.replace({'-':' ',"'":""}, regex=True)
transfers

# Get Contract length of player
Now we can get the contract length of each player. One problem, though, is that the player and club names on FIFA and Transfermarkt.ch are not the same. This makes getting the contract length more complicated, since it is not possible to match the player and club names exactly. As a workaround, we can use the `SequenceMatcher` class from the [`difflib`](https://docs.python.org/3/library/difflib.html) module. It calculates a "similarity ratio" between two strings, a number between 0 and 1 that tells us how similar they are. If the names on FIFA and Transfermarkt are similar enough, we conclude that they belong to the same player.

This solution will miss some players and add some wrong ones, but it also lets us find the contracts of more players than exact matching would. I tried to minimise the number of errors by comparing the name, the club and the age of the player. Only if all three are similar did I conclude that the players are the same. There were still some mistakes: the algorithm for instance concluded that [Javi Hernández](https://www.fifaindex.com/de/player/244523/javi-hern%C3%A1ndez/changelog/) and [Theo Hernández](https://www.fifaindex.com/de/player/232656/theo-hern%C3%A1ndez/) were the same player, since both have a similar name, left the same club (Real Madrid) and are roughly the same age. Even so, comparing all three attributes kept the number of mistakes low.

In [None]:
#Source: https://docs.python.org/3/library/difflib.html
from difflib import SequenceMatcher
import difflib

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()
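
To get a feeling for the thresholds used in the matching loop (0.6 for names, 0.5 for clubs), it helps to look at a few ratios. The following is only a small sketch with example strings that I made up for illustration; they are not taken from either dataset.

```python
from difflib import SequenceMatcher

def similar(a, b):
    # ratio() returns 2 * (number of matching characters) / (total length of both strings)
    return SequenceMatcher(None, a, b).ratio()

# Illustrative strings only (assumption: not real rows from the dataframes)
identical = similar("Real Madrid", "Real Madrid")  # the same string twice
accent = similar("Hernandez", "Hernández")         # differs by one accented letter
unrelated = similar("Hernandez", "Messi")          # shares almost no characters

print(identical, accent, unrelated)
```

Identical strings give a ratio of exactly 1.0, a single differing character still leaves the ratio well above the 0.6 cut-off, and unrelated names fall far below it.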

In [None]:
#Iterate through both dataframes
for index,row in transfers.iterrows():
    for ind,ro in contracts.iterrows():
        try:
            surname=row["Player"].partition(" ")[-1]#everything after the first name
            #If the surname is contained in the FIFA name or is similar to it: Continue
            if surname in ro["Name"] or similar(surname,ro["Name"])>0.6:
                age_dif=abs(row["Age"]-ro["Age"])#calculate age difference
                #if the club name is similar or the fifa club name is in the transfermarkt club name
                #and the age difference is 1 or smaller: Continue
                if age_dif<1.5 and (similar(row["Left"],ro["Club"])>0.5 or ro["Club"] in row["Left"]):
                    transfers.loc[index,"ID"]=ro["ID"]#add the matching player's id
                    transfers.loc[index,"Contract"]=ro["Contract Valid Until"]#add the matching player's contract length
        except TypeError:#skip rows with missing values, e.g. FIFA players without a club
            continue
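
The three criteria (name, club, age) can also be written as one small function, which makes it easier to see why the two Hernández players are confused. This is only a sketch of the rule described above: `is_probable_match` is a hypothetical helper, and the three records are invented for illustration (the abbreviated FIFA-style names and the ages are assumptions, not values from the data).

```python
from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

def is_probable_match(transfer, fifa_player):
    """Return True if name, club and age all look similar (hypothetical helper)."""
    surname = transfer["Player"].partition(" ")[-1]  # everything after the first name
    name_ok = surname in fifa_player["Name"] or similar(surname, fifa_player["Name"]) > 0.6
    club_ok = (fifa_player["Club"] in transfer["Left"]
               or similar(transfer["Left"], fifa_player["Club"]) > 0.5)
    age_ok = abs(transfer["Age"] - fifa_player["Age"]) < 1.5
    return name_ok and club_ok and age_ok

# Invented example records (assumption: only meant to mimic the situation in the text)
transfer = {"Player": "Theo Hernández", "Left": "Real Madrid", "Age": 21}
theo = {"Name": "T. Hernández", "Club": "Real Madrid", "Age": 20}
javi = {"Name": "J. Hernández", "Club": "Real Madrid", "Age": 20}
messi = {"Name": "L. Messi", "Club": "FC Barcelona", "Age": 31}

matches = [is_probable_match(transfer, p) for p in (theo, javi, messi)]
print(matches)
```

Both Hernández records pass all three checks, so the rule cannot tell them apart; in the loop above, whichever matching FIFA row comes last is the one that gets stored.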

In [None]:
#drop all players that lack contract length
transfers=transfers.dropna()

In [None]:
#reset index
transfers=transfers.reset_index().drop("index",axis=1)

For some players the contract values are wrong, because they were loaned out right before being sold. For those players, we need to get the contract length through web scraping: we look them up on [fifaindex.com](https://www.fifaindex.com/) and try to find their contract length there:

In [None]:
#import packages for webscraping
import requests
from bs4 import BeautifulSoup
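
The page we request for each player is built from the FIFA player id and the player name. As a rough sketch, this could be wrapped in a helper like the one below. `changelog_url` is a hypothetical function, and lowercasing and percent-encoding the name are assumptions based on the form of the fifaindex links used earlier in this notebook; the scraping cell itself passes the name as it is.

```python
from urllib.parse import quote

BASE_URL = "https://www.fifaindex.com/de/player/"  # same base URL as in the scraping cell

def changelog_url(player_id, name):
    """Build the changelog URL for one player (hypothetical helper)."""
    # Assumption: the slug is the lowercased name with spaces replaced by hyphens
    slug = "-".join(name.lower().split())
    # Percent-encode non-ASCII characters such as "á" as UTF-8
    return f"{BASE_URL}{int(player_id)}/{quote(slug)}/changelog/"

url = changelog_url(244523, "Javi Hernández")
print(url)
```

Converting the id with `int()` also covers ids that were stored as floats in the dataframe.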

In [None]:
#Create list with all invalid contract values
invalids=['Jun 30, 2019', '2019', '2018', 'Dec 31, 2018', 'Jun 30, 2020']
for index,row in transfers.iterrows():#iterate through transfers df
    if row["Contract"] in invalids:#only continue if the contract value is invalid
        name=row["Player"].replace(" ","-")#adjust the name for the url
        ids=row["ID"]#get player id for the URL
        print(name)
        page="https://www.fifaindex.com/de/player/"+str(ids)+"/"+name+"/changelog/"#create URL for player
        html=requests.get(page).text#get webpage
        data=BeautifulSoup(html,'html5')#create BeautifulSoup object
        all_text=data.find_all("div",{"class":"card mb-5"})#get relevant html code
        for card in all_text:#iterate through parts where contract length could be
            #check if it is contract length and make sure it is the right contract
            if "Vertragsdauer" in card.text and "202" in card.text:
                for field in card.find_all("div",{"class":"mb-2 col-6"}):#iterate through relevant part
                    if "Vertragsdauer" in field.text and "202" in field.text:
                        first,second=field.text[15:19],field.text[21:25]#the two years listed in the entry
                        if int(first)<=2019:#first year is 2019 or earlier: take the second year
                            transfers.loc[index,"Contract"]=second
                            print(second)
                        elif "FIFA 20" in card.text:#a later first year is only used for FIFA 20 entries
                            transfers.loc[index,"Contract"]=first
                            print(first)
                break#only check the first matching entry

In [None]:
#drop all rows which still have invalid contracts
transfers=transfers[~transfers["Contract"].isin(invalids)]

In [None]:
#Change contract value from string to numeric
transfers["Contract"]=pd.to_numeric(transfers["Contract"])

In [None]:
#Get remaining years on contract
transfers["Contract"]=transfers["Contract"]-2019

In [None]:
#change index to player names
transfers.index=transfers["Player"]

In [None]:
#save dataframe as pickle
transfers.to_pickle("contracts.pkl")