# 220_____load_____&_prep_Fantasy1516  
## Purpose 
In this notebook we load our Fantasy dataset and perform any necessary cleaning.  
## Datasets 
* _Input_: N/A
* _Output_: Fantasy1516.csv (the SEA1516 dataframe, saved to data/prep)

In [2]:
import os.path
import numpy as np
import pandas as pd

### Reading in all player information for the top 5 leagues from statbunker.com
* Each league has its own unique ID (`comp_id`) used when reading its table.  
* Using pd.read_html, each table can be read directly from its URL.

In [3]:
BPL1516 = pd.read_html("https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id=515")[0]
BUN1516 = pd.read_html("https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id=516")[0]
LAL1516 = pd.read_html("https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id=518")[0]
SEI1516 = pd.read_html("https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id=517")[0]
FRA1516 = pd.read_html("https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id=514")[0]
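The five calls above differ only in `comp_id`, so the same reads could be driven from a mapping of league name to ID. This is a minimal sketch, not the notebook's own method: `COMP_IDS`, `league_url` and `read_leagues` are hypothetical names, and the `reader` parameter exists only so the loop can be tried without network access.

```python
import pandas as pd

# URL pattern and competition IDs copied from the cell above.
BASE_URL = "https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id={}"
COMP_IDS = {"BPL1516": 515, "BUN1516": 516, "LAL1516": 518, "SEI1516": 517, "FRA1516": 514}


def league_url(comp_id):
    """Build the stats page URL for one competition ID."""
    return BASE_URL.format(comp_id)


def read_leagues(comp_ids, reader=pd.read_html):
    """Return {league name: first table found on that league's page}.

    `reader` defaults to pd.read_html, which returns a list of DataFrames;
    a stand-in can be passed to exercise the loop offline.
    """
    return {name: reader(league_url(cid))[0] for name, cid in comp_ids.items()}
```

Calling `read_leagues(COMP_IDS)` would fetch the live pages, and `list(result.values())` could then be passed to pd.concat.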

### Using pd.concat to merge each league together.  
* Resetting index.  
* Drop old index and column 'More' which contains no information.

In [4]:
SEA1516 = [BPL1516,BUN1516,LAL1516,SEI1516,FRA1516]
SEA1516 = pd.concat(SEA1516)
SEA1516 = SEA1516.reset_index()
SEA1516.drop(SEA1516.columns[[0,20]], axis=1, inplace=True)
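To see why the index is reset and the old index dropped, the toy example below (made-up rows) concatenates two small frames. `ignore_index=True` is shown as an alternative to the notebook's approach, not as what the notebook does.

```python
import pandas as pd

# Two toy league tables (made-up rows), each with its own 0..n-1 index.
league_a = pd.DataFrame({"Players": ["A One", "A Two"], "Points": [10, 8]})
league_b = pd.DataFrame({"Players": ["B One", "B Two"], "Points": [9, 7]})

# Plain concat keeps the original labels, so 0 and 1 each appear twice.
stacked = pd.concat([league_a, league_b])
print(stacked.index.tolist())

# reset_index() moves the old labels into a column named 'index',
# which is why the notebook then drops column 0.
reset = stacked.reset_index()
print(reset.columns.tolist())

# Alternative: build a fresh 0..n-1 index directly, with no extra column.
combined = pd.concat([league_a, league_b], ignore_index=True)
print(combined.index.tolist())
```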

In [7]:
SEA1516.head(5)

Unnamed: 0,Players,Points,Clubs,Position,Start,Goals,A,CS,CS part,Yellow,Red,Sub,CO,Off,Pen SV,Pen M,Goals conceded,Conceded 1+,OG
0,Jamie Vardy,202,LEICSC,Forward,36,24,6,-,-,6,0,0,0,7,-,1,-,-,0
1,Harry Kane,199,SPURS,Forward,38,25,2,-,-,5,0,0,0,9,-,0,-,-,1
2,Riyad Mahrez,189,LEICSC,Midfielder,36,17,12,-,-,1,0,2,1,24,-,2,-,-,0
3,Sergio Aguero,182,MCFC,Forward,29,24,2,-,-,1,0,3,1,16,-,1,-,-,0
4,Romelu Lukaku,179,EVER,Forward,36,18,7,-,-,3,0,2,1,8,-,1,-,-,0


### Creating new columns Apps and Form
* Apps is the total number of appearances a player makes (Starts plus being subbed on)
* Form is the total number of points divided by the number of appearances.
* All null values in Form filled in as 0

In [8]:
SEA1516['Apps'] = SEA1516['Start'] + SEA1516['CO']
SEA1516['Form'] = SEA1516['Points']/SEA1516['Apps']
SEA1516.loc[SEA1516['Form'].isnull(),'Form'] = 0

In [18]:
SEA1516.Form.isnull().sum()

0
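The null values come from players with zero appearances, where the division is 0/0. The sketch below uses made-up rows to show that case, plus a row with points but no appearances, which yields infinity rather than null. Whether such a row exists in the real data is not shown in this notebook, so the extra guard for infinity is an assumption.

```python
import numpy as np
import pandas as pd

# Toy rows (made-up values): a regular player, one with no appearances and
# no points, and one with points but no recorded appearances.
toy = pd.DataFrame({"Points": [202, 0, 3], "Start": [36, 0, 0], "CO": [0, 0, 0]})

toy["Apps"] = toy["Start"] + toy["CO"]
toy["Form"] = toy["Points"] / toy["Apps"]   # 0/0 gives NaN, 3/0 gives inf

# Treat both NaN and +/-inf as "no form" and set them to 0.
toy["Form"] = toy["Form"].replace([np.inf, -np.inf], np.nan).fillna(0)
print(toy)
```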

### Replacing all hyphen values with 0
* In this dataset a hyphen represents no data, so each hyphen is replaced with 0.

In [12]:
DashColumns = ['Goals','A','CS','CS part','Yellow','Red','Sub','CO','Off','Pen SV','Pen M','Goals conceded','Conceded 1+','OG',]
SEA1516[DashColumns] = SEA1516[DashColumns].replace('-' ,'0')
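A small sketch of the same replacement on made-up data. It shows that `replace` with a scalar matches whole cell values only, so a hyphen inside a longer string is left alone. The column subset and values are illustrative.

```python
import pandas as pd

# Toy frame (made-up values) mimicking columns where '-' means "no data".
toy = pd.DataFrame({
    "Players": ["Keeper One", "Forward-Two"],
    "CS": ["12", "-"],
    "Pen SV": ["-", "-"],
})

dash_columns = ["CS", "Pen SV"]

# replace() with a scalar matches whole cell values, so only cells that are
# exactly '-' change; the hyphen inside "Forward-Two" would be untouched
# even if 'Players' were included.
toy[dash_columns] = toy[dash_columns].replace("-", "0")
print(toy)
```

The replaced values are still the string '0', not the number 0, so affected columns need a separate type conversion.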

### Checking dataframe data types
* The columns CS, CS part, Pen SV, Goals conceded and Conceded 1+ are still of type object, so they are converted to int.

In [13]:
SEA1516.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3121 entries, 0 to 3120
Data columns (total 21 columns):
Players           3121 non-null object
Points            3121 non-null int64
Clubs             2435 non-null object
Position          3121 non-null object
Start             3121 non-null int64
Goals             3121 non-null int64
A                 3121 non-null int64
CS                3121 non-null object
CS part           3121 non-null object
Yellow            3121 non-null int64
Red               3121 non-null int64
Sub               3121 non-null int64
CO                3121 non-null int64
Off               3121 non-null int64
Pen SV            3121 non-null object
Pen M             3121 non-null int64
Goals conceded    3121 non-null object
Conceded 1+       3121 non-null object
OG                3121 non-null int64
Apps              3121 non-null int64
Form              3121 non-null float64
dtypes: float64(1), int64(12), object(8)
memory usage: 512.1+ KB


In [14]:
# np.int was removed in NumPy 1.24; the built-in int gives the same result.
SEA1516['CS'] = SEA1516['CS'].str.replace('%','').astype(int)
SEA1516['CS part'] = SEA1516['CS part'].str.replace('%','').astype(int)
SEA1516['Pen SV'] = SEA1516['Pen SV'].str.replace('%','').astype(int)
SEA1516['Goals conceded'] = SEA1516['Goals conceded'].str.replace('%','').astype(int)
SEA1516['Conceded 1+'] = SEA1516['Conceded 1+'].str.replace('%','').astype(int)
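The conversion can also be written with `pd.to_numeric`, sketched here on a made-up column. The notebook does not show whether any cell actually contains a '%' sign, so the toy data includes one only to show why the strip step is there.

```python
import pandas as pd

# Toy object column (made-up values) with digits stored as strings,
# one of them carrying a '%' suffix.
toy = pd.DataFrame({"CS": ["0", "12", "5%"]})

# Strip a literal '%' (regex=False), then parse the strings as numbers.
cleaned = toy["CS"].astype(str).str.replace("%", "", regex=False)
toy["CS"] = pd.to_numeric(cleaned).astype(int)

print(toy["CS"].tolist())
print(toy["CS"].dtype.kind)   # 'i' marks a signed integer dtype
```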

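#### Splitting player names into First Initial and Surname.
Repeating the process from the FIFA18 notebook in preparation for joining.  
Names of one to four words are handled: FirstInitial and Firstname come from the first word, Surname from the last.  
E.g. Dele Alli gives FirstInitial D, Firstname Dele and Surname Alli, the parts needed for the D.Alli form.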

In [19]:
# Split each name into words once, then fill the name columns for names of 1 to 4 words.
# For single-word names the first and last word coincide, so Firstname equals Surname.
name_parts = SEA1516['Players'].str.split()
has_1_to_4_words = name_parts.str.len().between(1, 4)

SEA1516.loc[has_1_to_4_words, 'FirstInitial'] = SEA1516['Players'].astype(str).str[0]
SEA1516.loc[has_1_to_4_words, 'Firstname'] = name_parts.str[0]
SEA1516.loc[has_1_to_4_words, 'Surname'] = name_parts.str[-1]
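The splitting logic can be tried on a few illustrative names covering the one-, two- and three-word cases. The `ShortName` column is a hypothetical join key in the "D.Alli" form; the notebook itself only creates the three component columns.

```python
import pandas as pd

# Illustrative names covering one-, two- and three-word cases.
names = pd.DataFrame({"Players": ["Neymar", "Dele Alli", "Kevin De Bruyne"]})

parts = names["Players"].str.split()          # list of words per name
names["FirstInitial"] = names["Players"].str[0]
names["Firstname"] = parts.str[0]
names["Surname"] = parts.str[-1]              # last word, whatever the length

# Hypothetical join key in the "D.Alli" form mentioned above.
names["ShortName"] = names["FirstInitial"] + "." + names["Surname"]
print(names)
```

Taking the last word as the surname shortens multi-word surnames (here "De Bruyne" becomes "Bruyne"), which may matter when matching against another dataset.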

### Cleaned dataframe now finished

In [20]:
SEA1516.head(5)

Unnamed: 0,Players,Points,Clubs,Position,Start,Goals,A,CS,CS part,Yellow,...,Pen SV,Pen M,Goals conceded,Conceded 1+,OG,Apps,Form,FirstInitial,Firstname,Surname
0,Jamie Vardy,202,LEICSC,Forward,36,24,6,0,0,6,...,0,1,0,0,0,36,5.611111,J,Jamie,Vardy
1,Harry Kane,199,SPURS,Forward,38,25,2,0,0,5,...,0,0,0,0,1,38,5.236842,H,Harry,Kane
2,Riyad Mahrez,189,LEICSC,Midfielder,36,17,12,0,0,1,...,0,2,0,0,0,37,5.108108,R,Riyad,Mahrez
3,Sergio Aguero,182,MCFC,Forward,29,24,2,0,0,1,...,0,1,0,0,0,30,6.066667,S,Sergio,Aguero
4,Romelu Lukaku,179,EVER,Forward,36,18,7,0,0,3,...,0,1,0,0,0,37,4.837838,R,Romelu,Lukaku


#### Saving to csv file in data/prep

In [17]:
SEA1516.to_csv('../../data/prep/Fantasy1516.csv')
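`to_csv` writes the row index as an unnamed first column by default and does not create missing folders. The sketch below is a variation under assumptions, not the notebook's own call: it uses a temporary directory in place of the project's data/prep folder, creates the folder first, and passes `index=False` to leave the index out.

```python
import os
import tempfile

import pandas as pd

# Toy frame (made-up values) standing in for the cleaned dataset.
toy = pd.DataFrame({"Players": ["A One", "B Two"], "Points": [10, 8]})

# A temporary directory stands in for the project's data/prep folder.
with tempfile.TemporaryDirectory() as root:
    prep_dir = os.path.join(root, "data", "prep")
    os.makedirs(prep_dir, exist_ok=True)      # to_csv does not create folders
    out_path = os.path.join(prep_dir, "Fantasy1516.csv")

    # index=False leaves the row index out of the file.
    toy.to_csv(out_path, index=False)

    # Read the file back to confirm the columns survive the round trip.
    restored = pd.read_csv(out_path)

print(restored.columns.tolist())
```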