# 200 Load & Prep Fantasy1718
## Purpose 
In this notebook we load our Fantasy dataset and perform the necessary cleaning.
## Datasets 
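* _Input_: N/A (fantasy player tables for the top five leagues are read directly from statbunker.com)
* _Output_: Fantasy1718.csv (saved to `../../data/prep/`)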

In [38]:
import os.path
import numpy as np
import pandas as pd

### Reading in all player information for the top five leagues from statbunker.com
* Each league has its own unique ID (`comp_id`) in the page URL.
* `pd.read_html` parses every HTML table on a page into a list of DataFrames; the first one (`[0]`) holds the player stats.
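The five near-identical `read_html` calls can also be driven from one mapping of league code to `comp_id`. This is a minimal sketch: the URL pattern and `comp_id` values are the ones used in the next cell, while `LEAGUE_IDS`, `load_leagues` and its `reader` parameter are hypothetical names introduced here so the loop can be exercised without network access.

```python
import pandas as pd

# URL pattern and comp_id values as used in this notebook
BASE_URL = "https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id={}"

# Hypothetical mapping: league code -> statbunker comp_id
LEAGUE_IDS = {
    "BPL1718": 586,
    "BUN1718": 600,
    "LAL1718": 591,
    "SEI1718": 593,
    "FRA1718": 594,
}


def load_leagues(league_ids, reader=pd.read_html):
    """Return {league_code: DataFrame}, keeping the first table on each page.

    `reader` defaults to pd.read_html (which needs lxml, or bs4 + html5lib)
    and can be swapped for a stub when testing offline.
    """
    return {code: reader(BASE_URL.format(comp_id))[0]
            for code, comp_id in league_ids.items()}
```

With network access, `tables = load_leagues(LEAGUE_IDS)` followed by `pd.concat(tables.values(), ignore_index=True)` would give one combined table; adding a league is then a one-line change to the dict.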

In [39]:
# One request per league; [0] keeps the first table on each page (the player stats)
BPL1718 = pd.read_html("https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id=586")[0]
BUN1718 = pd.read_html("https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id=600")[0]
LAL1718 = pd.read_html("https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id=591")[0]
SEI1718 = pd.read_html("https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id=593")[0]
FRA1718 = pd.read_html("https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id=594")[0]

### Using pd.concat to stack the five league tables into one DataFrame
* Reset the index so rows are numbered consecutively across all leagues.
* Drop the old per-league index and the column 'More', which contains no information.
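To see why an old index column appears at all, here is the same stacking on two made-up tables. It is a small illustration, not the notebook's data: each league table starts its own index at 0, `reset_index()` turns that old index into a column called `index`, and `ignore_index=True` sidesteps it by building a fresh index during the concat.

```python
import pandas as pd

# Two toy "league" tables, each with its own 0-based index and an empty 'More' column
a = pd.DataFrame({"Players": ["A1", "A2"], "More": ["", ""]})
b = pd.DataFrame({"Players": ["B1"], "More": [""]})

# Route 1: concat then reset_index() -> the old index becomes a column named 'index'
stacked = pd.concat([a, b]).reset_index()
print(stacked.columns.tolist())  # ['index', 'Players', 'More']

# Route 2: ignore_index=True builds a fresh 0..n-1 index, so only 'More' needs dropping
clean = pd.concat([a, b], ignore_index=True).drop(columns=["More"])
print(clean.columns.tolist())  # ['Players']
print(clean.index.tolist())    # [0, 1, 2]
```

Dropping columns by name rather than by position (`columns[[0, 20]]`) also keeps the cell correct if the site ever adds or reorders a column.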

In [40]:
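# Stack the five league tables into one DataFrame
SEA1718 = pd.concat([BPL1718, BUN1718, LAL1718, SEI1718, FRA1718])
# reset_index() moves the old per-league index into a column named 'index'
SEA1718 = SEA1718.reset_index()
# Drop that 'index' column and the empty 'More' column by name rather than position
SEA1718 = SEA1718.drop(columns=['index', 'More'])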

In [41]:
SEA1718.head(5)

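| | Players | Points | Clubs | Position | Start | Goals | A | CS | CS part | Yellow | Red | Sub | CO | Off | Pen SV | Pen M | Goals conceded | Conceded 1+ | OG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Mohamed Salah | 246 | LPOOL | Forward | 32 | 31 | 9 | - | - | 0 | 0 | 2 | 2 | 16 | - | 1 | - | - | 0 |
| 1 | Harry Kane | 193 | SPURS | Forward | 31 | 26 | 2 | - | - | 5 | 0 | 2 | 2 | 9 | - | 1 | - | - | 0 |
| 2 | Raheem Sterling | 174 | MCFC | Midfielder | 27 | 18 | 10 | - | - | 4 | 0 | 5 | 4 | 9 | - | 0 | - | - | 0 |
| 3 | Sergio Aguero | 168 | MCFC | Forward | 22 | 21 | 6 | - | - | 2 | 0 | 7 | 3 | 9 | - | 0 | - | - | 0 |
| 4 | Romelu Lukaku | 160 | MUFC | Forward | 32 | 16 | 7 | - | - | 4 | 0 | 1 | 1 | 2 | - | 1 | - | - | 0 |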


### Creating new columns Apps and Form
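* Apps is the total number of appearances a player makes: Start plus CO (starts plus appearances as a substitute).
* Form is total points divided by appearances, i.e. points per appearance.
* A player with no appearances gives 0/0 = NaN (and any points with zero appearances would give inf), so those Form values are set to 0.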
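A small worked example of the Form calculation on made-up rows shows why zero appearances need special handling. Pandas returns NaN for 0/0 but inf for a positive number divided by 0, and `isnull()` only catches the NaN; the third row below is hypothetical and only there to show the inf case.

```python
import numpy as np
import pandas as pd

# Made-up rows: a regular player, one with no appearances and no points,
# and a hypothetical one with points but no recorded appearances
df = pd.DataFrame({"Points": [246, 0, 3], "Start": [32, 0, 0], "CO": [2, 0, 0]})

df["Apps"] = df["Start"] + df["CO"]
raw_form = df["Points"] / df["Apps"]
print(raw_form.round(3).tolist())  # [7.235, nan, inf]

# isnull() catches 0/0 (NaN) but not x/0 (inf), so map inf to NaN before filling
df["Form"] = raw_form.replace([np.inf, -np.inf], np.nan).fillna(0)
print(df["Form"].round(3).tolist())  # [7.235, 0.0, 0.0]
```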

In [42]:
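# Appearances = starts + times brought on as a substitute
SEA1718['Apps'] = SEA1718['Start'] + SEA1718['CO']
# Points per appearance; 0/0 gives NaN and x/0 gives inf, so map both to 0
SEA1718['Form'] = SEA1718['Points'] / SEA1718['Apps']
SEA1718['Form'] = SEA1718['Form'].replace([np.inf, -np.inf], np.nan).fillna(0)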

In [43]:
SEA1718.Form.isnull().sum()

0

### Replacing all hyphen values with 0
* In this dataset a hyphen means no data was recorded, so each hyphen is replaced with 0.
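One property worth knowing here: with a plain string, `DataFrame.replace` matches whole cell values only, so a hyphen inside a longer string is left alone. The toy frame below (with a made-up name) illustrates this, and also shows that replacing with the string `'0'` leaves the column as `object`, which is why a type conversion still follows.

```python
import pandas as pd

# Toy frame: a hyphenated (made-up) name and a column using '-' for missing data
df = pd.DataFrame({"Players": ["Jean-Michel X", "Y"], "CS": ["-", "40%"]})

# Whole-cell matching: only cells that are exactly '-' are replaced
out = df.replace("-", "0")
print(out["Players"].tolist())  # ['Jean-Michel X', 'Y']
print(out["CS"].tolist())       # ['0', '40%']
print(out["CS"].dtype)          # object: the column still needs converting to int
```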

In [44]:
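# Columns in which '-' marks missing data
DashColumns = ['Goals', 'A', 'CS', 'CS part', 'Yellow', 'Red', 'Sub', 'CO', 'Off',
               'Pen SV', 'Pen M', 'Goals conceded', 'Conceded 1+', 'OG']
# Replace with the string '0' so these columns can be cast to int once '%' is stripped
SEA1718[DashColumns] = SEA1718[DashColumns].replace('-', '0')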

### Checking DataFrame data types
* The columns CS, CS part, Pen SV, Goals conceded and Conceded 1+ are still of type object, so the '%' signs are stripped and they are converted to int.
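The conversion can be written once as a loop over the affected columns. This sketch uses a toy frame standing in for the real data; it strips `%` literally (`regex=False`) and casts with the built-in `int`, since the `np.int` alias was deprecated in NumPy 1.20 and removed in 1.24.

```python
import pandas as pd

# Toy frame standing in for the object-typed columns after the hyphen replacement
df = pd.DataFrame({"CS": ["0", "40%"], "Pen SV": ["0", "100%"], "Goals": [3, 5]})

pct_cols = ["CS", "Pen SV"]
for col in pct_cols:
    # Remove the percent sign as a literal character, then cast to integer
    df[col] = df[col].str.replace("%", "", regex=False).astype(int)

print(df[pct_cols].values.tolist())  # [[0, 0], [40, 100]]
```

If a column could also contain decimals or stray text, `pd.to_numeric(..., errors="coerce")` would be a more forgiving alternative to `astype(int)`, at the cost of producing NaN for unparseable cells.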

In [45]:
SEA1718.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3025 entries, 0 to 3024
Data columns (total 21 columns):
Players           3025 non-null object
Points            3025 non-null int64
Clubs             2362 non-null object
Position          3025 non-null object
Start             3025 non-null int64
Goals             3025 non-null int64
A                 3025 non-null int64
CS                3025 non-null object
CS part           3025 non-null object
Yellow            3025 non-null int64
Red               3025 non-null int64
Sub               3025 non-null int64
CO                3025 non-null int64
Off               3025 non-null int64
Pen SV            3025 non-null object
Pen M             3025 non-null int64
Goals conceded    3025 non-null object
Conceded 1+       3025 non-null object
OG                3025 non-null int64
Apps              3025 non-null int64
Form              3025 non-null float64
dtypes: float64(1), int64(12), object(8)
memory usage: 496.4+ KB


In [46]:
# These columns are still object dtype (see info() above): strip '%' and cast to int.
# np.int was removed in NumPy 1.24, so the built-in int is used instead.
PctColumns = ['CS', 'CS part', 'Pen SV', 'Goals conceded', 'Conceded 1+']
for col in PctColumns:
    SEA1718[col] = SEA1718[col].str.replace('%', '', regex=False).astype(int)

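#### Splitting player names into first initial, first name and surname
Repeating the process from the FIFA18 notebook in preparation for joining the two datasets.  
Names of one to four words are handled: the first word becomes Firstname, its first letter FirstInitial, and the last word Surname (a one-word name uses the same word for both).  
E.g. Harry Kane gets FirstInitial H, Firstname Harry and Surname Kane, i.e. H.Kane.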
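The splitting rule can be sketched as a small function on a toy Series. The function name `split_names` and the `Key` column are hypothetical, added only to show how an "H.Kane"-style join key could be built; unlike the notebook cell, this sketch does not skip names longer than four words.

```python
import pandas as pd


def split_names(players: pd.Series) -> pd.DataFrame:
    """Split full names into FirstInitial, Firstname and Surname.

    Follows the rule described above: the surname is the last word,
    so a one-word name uses the same word as first name and surname.
    """
    parts = players.str.split()
    return pd.DataFrame({
        "FirstInitial": players.str[0],
        "Firstname": parts.str[0],
        "Surname": parts.str[-1],
    })


names = pd.Series(["Harry Kane", "Willian", "Kevin De Bruyne"])
out = split_names(names)
# Hypothetical join key in the "H.Kane" style
out["Key"] = out["FirstInitial"] + "." + out["Surname"]
print(out["Key"].tolist())  # ['H.Kane', 'W.Willian', 'K.Bruyne']
```

The last row shows a limitation of the last-word rule: a multi-part surname such as "De Bruyne" is reduced to "Bruyne", so the join only works if the FIFA18 side is split the same way.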

In [47]:
# Split each name into words once and handle names of 1 to 4 words
parts = SEA1718['Players'].str.split()
mask = parts.str.len().between(1, 4)

# First letter of the full name, first word, and last word (same word for one-word names)
SEA1718.loc[mask, 'FirstInitial'] = SEA1718.loc[mask, 'Players'].astype(str).str[0]
SEA1718.loc[mask, 'Firstname'] = parts[mask].str[0]
SEA1718.loc[mask, 'Surname'] = parts[mask].str[-1]

### The cleaned DataFrame

In [48]:
SEA1718.head(5)

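| | Players | Points | Clubs | Position | Start | Goals | A | CS | CS part | Yellow | ... | Pen SV | Pen M | Goals conceded | Conceded 1+ | OG | Apps | Form | FirstInitial | Firstname | Surname |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Mohamed Salah | 246 | LPOOL | Forward | 32 | 31 | 9 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 34 | 7.235294 | M | Mohamed | Salah |
| 1 | Harry Kane | 193 | SPURS | Forward | 31 | 26 | 2 | 0 | 0 | 5 | ... | 0 | 1 | 0 | 0 | 0 | 33 | 5.848485 | H | Harry | Kane |
| 2 | Raheem Sterling | 174 | MCFC | Midfielder | 27 | 18 | 10 | 0 | 0 | 4 | ... | 0 | 0 | 0 | 0 | 0 | 31 | 5.612903 | R | Raheem | Sterling |
| 3 | Sergio Aguero | 168 | MCFC | Forward | 22 | 21 | 6 | 0 | 0 | 2 | ... | 0 | 0 | 0 | 0 | 0 | 25 | 6.72 | S | Sergio | Aguero |
| 4 | Romelu Lukaku | 160 | MUFC | Forward | 32 | 16 | 7 | 0 | 0 | 4 | ... | 0 | 1 | 0 | 0 | 0 | 33 | 4.848485 | R | Romelu | Lukaku |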


#### Saving to a CSV file in data/prep
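By default `to_csv` also writes the DataFrame index as an unnamed first column, which comes back as `Unnamed: 0` when the file is read with a plain `read_csv`. This in-memory round trip on a one-row toy frame (using `io.StringIO` instead of a real file) shows the effect and two ways to handle it.

```python
import io

import pandas as pd

df = pd.DataFrame({"Players": ["Mohamed Salah"], "Points": [246]})

# Default to_csv writes the index as an unnamed first column...
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
print(pd.read_csv(buf).columns.tolist())  # ['Unnamed: 0', 'Players', 'Points']

# ...so either read it back as the index with index_col=0,
# or skip writing it in the first place with to_csv(..., index=False)
buf.seek(0)
restored = pd.read_csv(buf, index_col=0)
print(restored.columns.tolist())  # ['Players', 'Points']
```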

In [49]:
SEA1718.to_csv('../../data/prep/Fantasy1718.csv')