# 210 - load & prep Fantasy1617  
## Purpose
In this notebook we load the 2016/17 fantasy football player statistics for the top five leagues from statbunker.com and perform any necessary cleaning.  
## Datasets 
* _Input_: N/A
* _Output_: Fantasy1617.csv (written to data/prep)

In [3]:
import os.path
import numpy as np
import pandas as pd

### Reading in all player information from the Top 5 Leagues from statbunker.com
* Each league has its own unique ID (comp_id), which is used when reading the table.  
* pd.read_html returns a list of every table found on the page; the player table is the first one, hence the [0].

In [4]:
BPL1617 = pd.read_html("https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id=556")[0]
BUN1617 = pd.read_html("https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id=561")[0]
LAL1617 = pd.read_html("https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id=564")[0]
SEI1617 = pd.read_html("https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id=562")[0]
FRA1617 = pd.read_html("https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id=563")[0]
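
The five calls above differ only in their comp_id, so the URLs could be built from a small mapping instead. The following is a minimal sketch, not the notebook's own method: `BASE_URL`, `COMP_IDS`, `league_url` and `read_league` are hypothetical names introduced here, and the comp_id values are copied from the cell above. The download itself is left commented out because it needs network access.

```python
import pandas as pd

# Hypothetical constant: the common part of the five URLs used above.
BASE_URL = "https://www.statbunker.com/competitions/FantasyFootballPlayersStats"

# comp_id values copied from the cell above, keyed by the variable names used there.
COMP_IDS = {
    "BPL1617": 556,
    "BUN1617": 561,
    "LAL1617": 564,
    "SEI1617": 562,
    "FRA1617": 563,
}


def league_url(comp_id: int) -> str:
    """Build the page URL for one league from its comp_id."""
    return f"{BASE_URL}?comp_id={comp_id}"


def read_league(comp_id: int) -> pd.DataFrame:
    """Download one league page and return its first table (network call)."""
    return pd.read_html(league_url(comp_id))[0]


urls = {name: league_url(comp_id) for name, comp_id in COMP_IDS.items()}

# Requires network access, so it is not run here:
# leagues = {name: read_league(comp_id) for name, comp_id in COMP_IDS.items()}
```

Keeping the IDs in one dictionary also makes it harder to mix up a league and its ID when another season is added.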

### Using pd.concat to merge each league together.  
* Resetting the index so that rows are numbered consecutively across all leagues.  
* Dropping the old index column and the column 'More', which contains no information.

In [5]:
SEA1617 = [BPL1617,BUN1617,LAL1617,SEI1617,FRA1617]
SEA1617 = pd.concat(SEA1617)
SEA1617 = SEA1617.reset_index()
SEA1617.drop(SEA1617.columns[[0,20]], axis=1, inplace=True)
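
As an alternative to resetting the index and then dropping columns by position, the same steps can be sketched with ignore_index=True and a drop by column name. This is an illustration on made-up toy frames (`league_a`, `league_b` and their values are hypothetical), and it assumes the uninformative column is literally named 'More', as described above.

```python
import pandas as pd

# Two made-up league tables, each with its own 0-based index and an empty 'More' column.
league_a = pd.DataFrame({"Players": ["P1", "P2"], "Points": [10, 8], "More": ["", ""]})
league_b = pd.DataFrame({"Players": ["P3"], "Points": [7], "More": [""]})

# ignore_index=True renumbers the rows 0..n-1, so no old index column is created.
combined = pd.concat([league_a, league_b], ignore_index=True)

# Dropping by name does not depend on the column order of the scraped table.
combined = combined.drop(columns=["More"])
```

Dropping by name keeps working if the site adds or reorders columns, whereas positions 0 and 20 would then point at different columns.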

In [6]:
SEA1617.head(5)

Unnamed: 0,Players,Points,Clubs,Position,Start,Goals,A,CS,CS part,Yellow,Red,Sub,CO,Off,Pen SV,Pen M,Goals conceded,Conceded 1+,OG
0,Harry Kane,220,SPURS,Forward,29,29,7,-,-,3,0,1,1,16,-,1,-,-,0
1,Alexis Sanchez,219,ARSL,Forward,36,24,11,-,-,6,0,2,2,8,-,1,-,-,0
2,Romelu Lukaku,213,EVER,Forward,36,25,6,-,-,3,0,1,1,2,-,0,-,-,0
3,Diego Costa,179,CHEL,Forward,35,20,7,-,-,10,0,1,0,8,-,1,-,-,0
4,Dele Alli,179,SPURS,Midfielder,35,18,7,-,-,4,0,2,2,13,-,0,-,-,0


### Creating new columns Apps and Form
* Apps is the total number of appearances a player makes (Starts plus being subbed on)
* Form is the total number of points divided by the number of appearances.
* Null values in Form, which arise when a player has 0 points and 0 appearances (0/0), are filled in as 0

In [6]:
SEA1617['Apps'] = SEA1617['Start'] + SEA1617['CO']
SEA1617['Form'] = SEA1617['Points']/SEA1617['Apps']
SEA1617.loc[SEA1617['Form'].isnull(),'Form'] = 0
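
One detail worth checking: in pandas, 0/0 gives NaN, but a non-zero number of points divided by 0 appearances gives inf, which isnull() does not catch. The sketch below uses made-up rows (the third player is hypothetical and only there to show the inf case) and maps both cases to 0. Whether such rows exist in the real data is not known from this notebook.

```python
import numpy as np
import pandas as pd

# Made-up rows: a regular player, a player with no appearances and no points,
# and a hypothetical player with points but no recorded appearances.
toy = pd.DataFrame({
    "Players": ["A", "B", "C"],
    "Points": [220, 0, 3],
    "Start": [29, 0, 0],
    "CO": [1, 0, 0],
})

toy["Apps"] = toy["Start"] + toy["CO"]

# Division by zero yields NaN (0/0) or inf (3/0); both are turned into 0.
form = toy["Points"] / toy["Apps"]
toy["Form"] = form.replace([np.inf, -np.inf], np.nan).fillna(0)
```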

In [7]:
SEA1617.Form.isnull().sum()

0

### Replacing all Hyphen values with 0
* In this dataset a hyphen represents no data, so hyphens are replaced with 0

In [11]:
DashColumns = ['Goals','A','CS','CS part','Yellow','Red','Sub','CO','Off','Pen SV','Pen M','Goals conceded','Conceded 1+','OG',]
SEA1617[DashColumns] = SEA1617[DashColumns].replace('-' ,'0')

### Checking dataframe data types
* Changed the columns CS, CS part, Pen SV, Goals conceded and Conceded 1+ from type object to int

In [12]:
SEA1617.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3141 entries, 0 to 3140
Data columns (total 21 columns):
Players           3141 non-null object
Points            3141 non-null int64
Clubs             2398 non-null object
Position          3141 non-null object
Start             3141 non-null int64
Goals             3141 non-null int64
A                 3141 non-null int64
CS                3141 non-null object
CS part           3141 non-null object
Yellow            3141 non-null int64
Red               3141 non-null int64
Sub               3141 non-null int64
CO                3141 non-null int64
Off               3141 non-null int64
Pen SV            3141 non-null object
Pen M             3141 non-null int64
Goals conceded    3141 non-null object
Conceded 1+       3141 non-null object
OG                3141 non-null int64
Apps              3141 non-null int64
Form              3141 non-null float64
dtypes: float64(1), int64(12), object(8)
memory usage: 515.4+ KB


In [13]:
# np.int was removed in NumPy 1.24; the built-in int gives the same result
SEA1617['CS'] = SEA1617['CS'].str.replace('%','').astype(int)
SEA1617['CS part'] = SEA1617['CS part'].str.replace('%','').astype(int)
SEA1617['Pen SV'] = SEA1617['Pen SV'].str.replace('%','').astype(int)
SEA1617['Goals conceded'] = SEA1617['Goals conceded'].str.replace('%','').astype(int)
SEA1617['Conceded 1+'] = SEA1617['Conceded 1+'].str.replace('%','').astype(int)
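
The hyphen replacement and the type conversion can also be sketched as one loop over the affected columns. The frame below is made up for illustration (`toy`, `object_columns` and the values are hypothetical), and pd.to_numeric is used here as an alternative to astype because it raises a clear error if an unexpected string is left over.

```python
import pandas as pd

# Made-up values in the same style as the scraped table: '-' means no data.
toy = pd.DataFrame({
    "CS": ["-", "12", "3"],
    "Pen SV": ["-", "-", "1"],
})

# Hypothetical list: the object columns that should end up as integers.
object_columns = ["CS", "Pen SV"]

for column in object_columns:
    cleaned = (
        toy[column]
        .replace("-", "0")                    # hyphen -> "0"
        .str.replace("%", "", regex=False)    # strip a literal percent sign, if any
    )
    # to_numeric raises ValueError on anything that is still not a number.
    toy[column] = pd.to_numeric(cleaned).astype(int)
```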

#### Splitting player names into First Initial and Surname.
Repeating the process from the FIFA18 notebook in preparation for joining.  
Each case handles names of a given length: one, two, three or four words.  
E.g. Pierre Emerick Aubameyang gets FirstInitial P, Firstname Pierre and Surname Aubameyang.

In [9]:
SEA1617.loc[SEA1617['Players'].str.split().str.len() == 1, 'FirstInitial'] = SEA1617['Players'].astype(str).str[0]
SEA1617.loc[SEA1617['Players'].str.split().str.len() == 1, 'Firstname'] = SEA1617['Players'].str.split().str[0]
SEA1617.loc[SEA1617['Players'].str.split().str.len() == 1, 'Surname'] = SEA1617['Players'].str.split().str[0]

SEA1617.loc[SEA1617['Players'].str.split().str.len() == 2, 'FirstInitial'] = SEA1617['Players'].astype(str).str[0]
SEA1617.loc[SEA1617['Players'].str.split().str.len() == 2, 'Firstname'] = SEA1617['Players'].str.split().str[0]
SEA1617.loc[SEA1617['Players'].str.split().str.len() == 2, 'Surname'] = SEA1617['Players'].str.split().str[-1]

SEA1617.loc[SEA1617['Players'].str.split().str.len() == 3, 'FirstInitial'] = SEA1617['Players'].astype(str).str[0]
SEA1617.loc[SEA1617['Players'].str.split().str.len() == 3, 'Firstname'] = SEA1617['Players'].str.split().str[0]
SEA1617.loc[SEA1617['Players'].str.split().str.len() == 3, 'Surname'] = SEA1617['Players'].str.split().str[-1]

SEA1617.loc[SEA1617['Players'].str.split().str.len() == 4, 'FirstInitial'] = SEA1617['Players'].astype(str).str[0]
SEA1617.loc[SEA1617['Players'].str.split().str.len() == 4, 'Firstname'] = SEA1617['Players'].str.split().str[0]
SEA1617.loc[SEA1617['Players'].str.split().str.len() == 4, 'Surname'] = SEA1617['Players'].str.split().str[-1]
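
Since every case above takes the first character, the first word and the last word, the four cases can be sketched as one helper. `split_name` is a hypothetical function, not part of the notebook or of pandas. Note one difference in behaviour: the cells above leave names of five or more words empty, while this sketch treats them like the others.

```python
import pandas as pd


def split_name(players: pd.Series) -> pd.DataFrame:
    """Return FirstInitial, Firstname and Surname for each player name."""
    names = players.astype(str).str.strip()
    parts = names.str.split()  # split on whitespace into a list of words
    return pd.DataFrame(
        {
            "FirstInitial": names.str[0],  # first character of the full name
            "Firstname": parts.str[0],     # first word
            "Surname": parts.str[-1],      # last word (same as first for one-word names)
        },
        index=players.index,
    )


example = pd.Series(["Harry Kane", "Pierre Emerick Aubameyang", "Neymar"])
name_parts = split_name(example)
```

The result can be attached to the main frame with pd.concat along axis=1 or by assigning the three columns one by one.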

### Cleaned dataframe now finished

In [11]:
SEA1617.head(5)

Unnamed: 0,Players,Points,Clubs,Position,Start,Goals,A,CS,CS part,Yellow,...,Pen SV,Pen M,Goals conceded,Conceded 1+,OG,Apps,Form,FirstInitial,Firstname,Surname
0,Harry Kane,220,SPURS,Forward,29,29,7,-,-,3,...,-,1,-,-,0,30,7.333333,H,Harry,Kane
1,Alexis Sanchez,219,ARSL,Forward,36,24,11,-,-,6,...,-,1,-,-,0,38,5.763158,A,Alexis,Sanchez
2,Romelu Lukaku,213,EVER,Forward,36,25,6,-,-,3,...,-,0,-,-,0,37,5.756757,R,Romelu,Lukaku
3,Diego Costa,179,CHEL,Forward,35,20,7,-,-,10,...,-,1,-,-,0,35,5.114286,D,Diego,Costa
4,Dele Alli,179,SPURS,Midfielder,35,18,7,-,-,4,...,-,0,-,-,0,37,4.837838,D,Dele,Alli


#### Saving to csv file in data/prep

In [12]:
SEA1617.to_csv('../../data/prep/Fantasy1617.csv')
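
By default to_csv also writes the index as an unnamed first column, which comes back as 'Unnamed: 0' when the file is read again. If that column is not wanted in the joined data, index=False avoids it. The sketch below shows the difference on a made-up two-row frame and writes to an in-memory buffer rather than to data/prep.

```python
import io

import pandas as pd

toy = pd.DataFrame({"Players": ["Harry Kane", "Dele Alli"], "Points": [220, 179]})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False writes only the data columns.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_clean = pd.read_csv(without_index)
```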