# 250_____load_____&_prep_Fantasy1213  
## Purpose 
In this notebook we read in our Fantasy dataset and perform any necessary cleaning.  
## Datasets 
* _Input_: N/A
* _Output_: Fantasy1213.csv

In [2]:
import os.path
import numpy as np
import pandas as pd

### Reading in all player information from the Top 5 Leagues from statbunker.com
* Each league has its own unique ID (`comp_id`), which is passed in the URL when reading the table.  
* `pd.read_html` returns a list of the tables found on a page; the first one (`[0]`) holds the player stats.
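
Since the five reads differ only in `comp_id`, the pattern can be sketched as a mapping from league label to ID. This is only an illustration: `COMP_IDS`, `league_url` and `read_league` are hypothetical names that do not appear in this notebook, the labels are taken from the variable names used in the next cell, and `read_league` needs network access and an HTML parser such as lxml.

```python
import pandas as pd

BASE_URL = "https://www.statbunker.com/competitions/FantasyFootballPlayersStats"

# League label -> comp_id, copied from the URLs used in this notebook.
COMP_IDS = {"BPL": 415, "BUN": 416, "LAL": 413, "SEI": 414, "FRA": 412}


def league_url(comp_id):
    """Build the stats page URL for one competition ID."""
    return f"{BASE_URL}?comp_id={comp_id}"


def read_league(comp_id):
    """Return the first HTML table on the page (requires network access)."""
    return pd.read_html(league_url(comp_id))[0]


# Building the URLs does not touch the network; only read_league does.
urls = {label: league_url(comp_id) for label, comp_id in COMP_IDS.items()}
```

With a mapping like this, the league tables could be collected with `[read_league(i) for i in COMP_IDS.values()]` instead of five separate assignments.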

In [3]:
BPL1213 = pd.read_html("https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id=415")[0]
BUN1213 = pd.read_html("https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id=416")[0]
LAL1213 = pd.read_html("https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id=413")[0]
SEI1213 = pd.read_html("https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id=414")[0]
FRA1213 = pd.read_html("https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id=412")[0]

### Using pd.concat to merge each league together.  
* Reset the index so rows are numbered continuously across leagues.  
* Drop the old index column and the 'More' column, which contains no information.
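
As a rough sketch of the same steps on made-up data, the example below concatenates two toy frames. It is an alternative formulation rather than the notebook's code: `ignore_index=True` renumbers the rows during the concat, and the 'More' column is dropped by name instead of by position. The frames `league_a` and `league_b` and their rows are invented for illustration.

```python
import pandas as pd

# Two toy "league" tables standing in for the scraped frames (made-up rows).
league_a = pd.DataFrame({"Players": ["A One", "A Two"], "Points": [10, 8], "More": ["", ""]})
league_b = pd.DataFrame({"Players": ["B One"], "Points": [7], "More": ["", ][:1]})

# ignore_index=True renumbers rows 0..n-1, so no old index column is left behind.
combined = pd.concat([league_a, league_b], ignore_index=True)

# Dropping by name is less fragile than dropping by column position.
combined = combined.drop(columns=["More"])
```

Dropping by name keeps working if the site adds or reorders columns, whereas positional indices such as `[0, 20]` would silently point at different columns.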

In [4]:
SEA1213 = [BPL1213,BUN1213,LAL1213,SEI1213,FRA1213]
SEA1213 = pd.concat(SEA1213)
SEA1213 = SEA1213.reset_index()
SEA1213.drop(SEA1213.columns[[0,20]], axis=1, inplace=True)

In [5]:
SEA1213.head(5)

Unnamed: 0,Players,Points,Clubs,Position,Start,Goals,A,CS,CS part,Yellow,Red,Sub,CO,Off,Pen SV,Pen M,Goals conceded,Conceded 1+,OG
0,Robin van Persie,225,MUFC,Forward,35,26,10,-,-,6,0,3,3,4,-,1,-,-,0
1,Luis Suarez,186,LPOOL,Forward,33,23,5,-,-,10,0,0,0,2,-,0,-,-,0
2,Gareth Bale,180,SPURS,Midfielder,33,21,6,-,-,6,0,0,0,4,-,0,-,-,1
3,Christian Benteke,165,AVILLA,Forward,32,19,4,-,-,8,0,2,2,4,-,0,-,-,0
4,Santi Cazorla,164,ARSL,Midfielder,37,12,10,-,-,1,0,1,1,11,-,0,-,-,0


### Creating new columns Apps and Form
* Apps is the total number of appearances a player makes (Start plus CO, i.e. starts plus times subbed on).
* Form is the total number of points divided by the number of appearances.
* Null values in Form, which arise when a player has 0 points and 0 appearances, are set to 0.
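
The division can be sketched on a toy frame. The rows are made up, and the third row (points but no appearances) is a hypothetical edge case: it yields `inf` rather than a null, so this sketch also maps infinities to 0, which the notebook cell itself does not do.

```python
import numpy as np
import pandas as pd

# Made-up rows: a regular player, an unused player, and a hypothetical edge case.
toy = pd.DataFrame({"Points": [225, 0, 3], "Start": [35, 0, 0], "CO": [3, 0, 0]})

toy["Apps"] = toy["Start"] + toy["CO"]
toy["Form"] = toy["Points"] / toy["Apps"]

# 0/0 gives NaN, while a non-zero value divided by 0 gives inf; treat both as 0.
toy["Form"] = toy["Form"].replace([np.inf, -np.inf], np.nan).fillna(0)
```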

In [6]:
SEA1213['Apps'] = SEA1213['Start'] + SEA1213['CO']
SEA1213['Form'] = SEA1213['Points']/SEA1213['Apps']
SEA1213.loc[SEA1213['Form'].isnull(),'Form'] = 0

In [7]:
SEA1213.Form.isnull().sum()

0

### Replacing all hyphen values with 0
* In this dataset a hyphen represents no data, so hyphens are replaced with 0 (the string '0' at this stage).
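
A minimal illustration on made-up data: `DataFrame.replace` with two scalar arguments swaps only cells whose whole value equals `'-'`, so it would not touch a hyphen inside a longer string. The frame `toy` and the list `dash_columns` are invented for this sketch.

```python
import pandas as pd

# Made-up rows; '-' marks a stat that does not apply to the player.
toy = pd.DataFrame({"Players": ["Keeper", "Striker"], "CS": ["12", "-"], "Pen SV": ["-", "-"]})
dash_columns = ["CS", "Pen SV"]

# Swap the placeholder for the string '0'; the values are still strings, not integers.
toy[dash_columns] = toy[dash_columns].replace("-", "0")
```

Because the replacement is the string `'0'`, affected columns still need a numeric conversion afterwards.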

In [8]:
DashColumns = ['Goals', 'A', 'CS', 'CS part', 'Yellow', 'Red', 'Sub', 'CO', 'Off', 'Pen SV', 'Pen M', 'Goals conceded', 'Conceded 1+', 'OG']
SEA1213[DashColumns] = SEA1213[DashColumns].replace('-', '0')

### Checking dataframe data types
* The columns CS, CS part, Pen SV, Goals conceded and Conceded 1+ are still type object, so they are converted to int in the cell after the `info()` output.

In [9]:
SEA1213.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3096 entries, 0 to 3095
Data columns (total 21 columns):
Players           3096 non-null object
Points            3096 non-null int64
Clubs             2408 non-null object
Position          3096 non-null object
Start             3096 non-null int64
Goals             3096 non-null int64
A                 3096 non-null int64
CS                3096 non-null object
CS part           3096 non-null object
Yellow            3096 non-null int64
Red               3096 non-null int64
Sub               3096 non-null int64
CO                3096 non-null int64
Off               3096 non-null int64
Pen SV            3096 non-null object
Pen M             3096 non-null int64
Goals conceded    3096 non-null object
Conceded 1+       3096 non-null object
OG                3096 non-null int64
Apps              3096 non-null int64
Form              3096 non-null float64
dtypes: float64(1), int64(12), object(8)
memory usage: 508.0+ KB


In [10]:
# Use the built-in int: the np.int alias was removed in NumPy 1.24.
SEA1213['CS'] = SEA1213['CS'].str.replace('%','').astype(int)
SEA1213['CS part'] = SEA1213['CS part'].str.replace('%','').astype(int)
SEA1213['Pen SV'] = SEA1213['Pen SV'].str.replace('%','').astype(int)
SEA1213['Goals conceded'] = SEA1213['Goals conceded'].str.replace('%','').astype(int)
SEA1213['Conceded 1+'] = SEA1213['Conceded 1+'].str.replace('%','').astype(int)
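
The repeated conversion can also be written as a loop over the affected columns. The sketch below runs on made-up values; `toy` and `object_columns` are hypothetical names, and `regex=False` is an assumption added here so that `'%'` is always treated as a literal character.

```python
import pandas as pd

# Made-up string values, as they look once hyphens have been replaced with '0'.
toy = pd.DataFrame({"CS": ["12", "0"], "Goals conceded": ["40", "0"]})
object_columns = ["CS", "Goals conceded"]

for column in object_columns:
    # Strip any literal '%' and parse the remaining digits as integers.
    toy[column] = toy[column].str.replace("%", "", regex=False).astype(int)
```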

#### Splitting player names into First Initial and Surname.
Repeating the process from the FIFA18 notebook in preparation for joining.  
Each case handles names of a given length: one, two, three or four words.  
E.g. Wayne Rooney becomes FirstInitial W and Surname Rooney (W.Rooney)
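
The split can be sketched on a few example names. The `ShortName` column is a hypothetical join key in the 'W.Rooney' style and is not created by this notebook; unlike the cell below, this sketch does not restrict itself to names of one to four words.

```python
import pandas as pd

toy = pd.DataFrame({"Players": ["Wayne Rooney", "Robin van Persie", "Hulk"]})

# Split on whitespace once, then take the first and last word of each name.
parts = toy["Players"].str.split()
toy["FirstInitial"] = toy["Players"].str[0]
toy["Firstname"] = parts.str[0]
toy["Surname"] = parts.str[-1]

# Hypothetical join key: first initial, a dot, then the surname.
toy["ShortName"] = toy["FirstInitial"] + "." + toy["Surname"]
```

For a single-word name the first and last word coincide, so "Hulk" ends up with both Firstname and Surname set to "Hulk".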

In [11]:
# Split each name once; names with five or more words are left as NaN.
name_parts = SEA1213['Players'].str.split()
handled = name_parts.str.len().between(1, 4)

SEA1213.loc[handled, 'FirstInitial'] = SEA1213['Players'].astype(str).str[0]
SEA1213.loc[handled, 'Firstname'] = name_parts.str[0]
# For a single-word name the last word is also the first, so Surname equals Firstname.
SEA1213.loc[handled, 'Surname'] = name_parts.str[-1]

### Cleaned dataframe now finished

In [12]:
SEA1213.head(5)

Unnamed: 0,Players,Points,Clubs,Position,Start,Goals,A,CS,CS part,Yellow,...,Pen SV,Pen M,Goals conceded,Conceded 1+,OG,Apps,Form,FirstInitial,Firstname,Surname
0,Robin van Persie,225,MUFC,Forward,35,26,10,0,0,6,...,0,1,0,0,0,38,5.921053,R,Robin,Persie
1,Luis Suarez,186,LPOOL,Forward,33,23,5,0,0,10,...,0,0,0,0,0,33,5.636364,L,Luis,Suarez
2,Gareth Bale,180,SPURS,Midfielder,33,21,6,0,0,6,...,0,0,0,0,1,33,5.454545,G,Gareth,Bale
3,Christian Benteke,165,AVILLA,Forward,32,19,4,0,0,8,...,0,0,0,0,0,34,4.852941,C,Christian,Benteke
4,Santi Cazorla,164,ARSL,Midfielder,37,12,10,0,0,1,...,0,0,0,0,0,38,4.315789,S,Santi,Cazorla


#### Saving to csv file in data/prep

In [13]:
SEA1213.to_csv('../../data/prep/Fantasy1213.csv')