# 240_____load_____&_prep_Fantasy1314
## Purpose
In this notebook we load our Fantasy dataset and perform any necessary cleaning.  
## Datasets 
* _Input_: N/A (tables are read directly from statbunker.com)
* _Output_: Fantasy1314.csv

In [1]:
import os.path
import numpy as np
import pandas as pd

### Reading in all player information from the Top 5 Leagues from statbunker.com
* Each league has its own unique ID (comp_id), used when reading its table.  
* With pd.read_html each table can be read straight into a dataframe.
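The five reads can also be driven from a small lookup of competition IDs. This is only a sketch: the IDs are the ones in the notebook's URLs, while `BASE_URL`, `COMP_IDS`, `league_url` and `load_league` are hypothetical names introduced here. Note that `pd.read_html` needs an HTML parser such as lxml installed, and network access.

```python
import pandas as pd

# Hypothetical constant: the common part of the statbunker.com URLs used in this notebook
BASE_URL = "https://www.statbunker.com/competitions/FantasyFootballPlayersStats"

# Competition IDs taken from the notebook's URLs; the keys are just labels
COMP_IDS = {"BPL": 449, "BUN": 447, "LAL": 461, "SEI": 462, "FRA": 454}


def league_url(comp_id):
    """Build the stats URL for one competition ID."""
    return f"{BASE_URL}?comp_id={comp_id}"


def load_league(comp_id):
    """Read one league table (hypothetical helper).

    pd.read_html returns a list of DataFrames, one per <table> on the page;
    as in the notebook, the first one is taken.
    """
    return pd.read_html(league_url(comp_id))[0]


# Requires network access, so it is left commented out here:
# leagues = {name: load_league(cid) for name, cid in COMP_IDS.items()}
```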

In [2]:
BPL1314 = pd.read_html("https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id=449")[0]
BUN1314 = pd.read_html("https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id=447")[0]
LAL1314 = pd.read_html("https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id=461")[0]
SEI1314 = pd.read_html("https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id=462")[0]
FRA1314 = pd.read_html("https://www.statbunker.com/competitions/FantasyFootballPlayersStats?comp_id=454")[0]

### Using pd.concat to merge each league together.  
* Resetting index.  
* Drop the old index column and the column 'More', which contains no information.
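As a toy illustration of these steps, the sketch below concatenates two made-up frames. It assumes the empty column is literally named 'More'. It uses `ignore_index=True` and a drop by name as one possible alternative to resetting the index and dropping by position; the player names and values are invented.

```python
import pandas as pd

# Two tiny stand-ins for league tables (invented data)
a = pd.DataFrame({"Players": ["A One", "B Two"], "Points": [10, 8], "More": ["", ""]})
b = pd.DataFrame({"Players": ["C Three"], "Points": [12], "More": ["", ]})

# ignore_index=True renumbers the rows 0..n-1, so no old index column is created
combined = pd.concat([a, b], ignore_index=True)

# Dropping by name does not depend on the column's position
combined = combined.drop(columns=["More"])

print(combined)
```

Dropping by name keeps working if the site ever adds or reorders columns, whereas positional indices such as `[0, 20]` would need to be rechecked.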

In [3]:
SEA1314 = [BPL1314,BUN1314,LAL1314,SEI1314,FRA1314]
SEA1314 = pd.concat(SEA1314)
SEA1314 = SEA1314.reset_index()
SEA1314.drop(SEA1314.columns[[0,20]], axis=1, inplace=True)

In [5]:
SEA1314.head(5)

Unnamed: 0,Players,Points,Clubs,Position,Start,Goals,A,CS,CS part,Yellow,Red,Sub,CO,Off,Pen SV,Pen M,Goals conceded,Conceded 1+,OG
0,Luis Suarez,254,LPOOL,Forward,33,31,13,-,-,6,0,0,0,3,-,0,-,-,0
1,Yaya Toure,193,MCFC,Midfielder,35,20,9,-,-,4,0,1,0,10,-,0,-,-,0
2,Daniel Sturridge,177,LPOOL,Forward,26,21,7,-,-,2,0,3,3,13,-,1,-,-,0
3,Olivier Giroud,172,ARSL,Forward,36,16,8,-,-,4,0,1,0,16,-,0,-,-,0
4,Steven Gerrard,162,LPOOL,Midfielder,33,13,13,-,-,7,0,1,1,6,-,1,-,-,0


### Creating new columns Apps and Form
* Apps is the total number of appearances a player makes (Starts plus being subbed on)
* Form is the total number of points divided by the number of appearances.
* Null values in Form (0 points divided by 0 appearances) are filled in as 0.
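The division behaviour can be checked on a toy frame. In pandas, 0/0 yields NaN but a non-zero value divided by 0 yields inf, which `isnull()` does not catch. The sketch below handles both cases; the rows are invented, and whether any inf rows actually occur in this dataset is not checked here.

```python
import numpy as np
import pandas as pd

# Invented rows: a regular player, one with 0 points and 0 apps, one with points but 0 apps
toy = pd.DataFrame({"Points": [254, 0, 3], "Start": [33, 0, 0], "CO": [0, 0, 0]})

toy["Apps"] = toy["Start"] + toy["CO"]
toy["Form"] = toy["Points"] / toy["Apps"]  # 0/0 -> NaN, 3/0 -> inf

# Treat both undefined results as "no form" and set them to 0
toy["Form"] = toy["Form"].replace([np.inf, -np.inf], np.nan).fillna(0)

print(toy)
```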

In [6]:
SEA1314['Apps'] = SEA1314['Start'] + SEA1314['CO']
SEA1314['Form'] = SEA1314['Points']/SEA1314['Apps']
SEA1314.loc[SEA1314['Form'].isnull(),'Form'] = 0

In [7]:
SEA1314.Form.isnull().sum()

0

### Replacing all Hyphen values with 0
* In this dataset a hyphen represents no data, so hyphens are replaced with 0.
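A small sketch of the replacement on invented data. `DataFrame.replace` with a plain string matches whole cell values only, so it would not touch a hyphen inside a longer string. The replaced columns still hold strings afterwards, which is why a type conversion is needed later.

```python
import pandas as pd

# Invented sample: object columns where "-" stands for "no data"
toy = pd.DataFrame({"Goals": [31, 20], "CS": ["-", "12"], "Pen SV": ["-", "-"]})

dash_columns = ["CS", "Pen SV"]  # hypothetical subset for the example
toy[dash_columns] = toy[dash_columns].replace("-", "0")

print(toy)
```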

In [8]:
DashColumns = ['Goals','A','CS','CS part','Yellow','Red','Sub','CO','Off','Pen SV','Pen M','Goals conceded','Conceded 1+','OG',]
SEA1314[DashColumns] = SEA1314[DashColumns].replace('-' ,'0')

### Checking dataframe data types
* Changed the columns CS, CS part, Pen SV, Goals conceded and Conceded 1+ from type object to int

In [9]:
SEA1314.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2999 entries, 0 to 2998
Data columns (total 21 columns):
Players           2999 non-null object
Points            2999 non-null int64
Clubs             2420 non-null object
Position          2999 non-null object
Start             2999 non-null int64
Goals             2999 non-null int64
A                 2999 non-null int64
CS                2999 non-null object
CS part           2999 non-null object
Yellow            2999 non-null int64
Red               2999 non-null int64
Sub               2999 non-null int64
CO                2999 non-null int64
Off               2999 non-null int64
Pen SV            2999 non-null object
Pen M             2999 non-null int64
Goals conceded    2999 non-null object
Conceded 1+       2999 non-null object
OG                2999 non-null int64
Apps              2999 non-null int64
Form              2999 non-null float64
dtypes: float64(1), int64(12), object(8)
memory usage: 492.1+ KB


In [10]:
# Strip any '%' sign, then cast the string columns to integers
SEA1314['CS'] = SEA1314['CS'].str.replace('%','').astype(int)
SEA1314['CS part'] = SEA1314['CS part'].str.replace('%','').astype(int)
SEA1314['Pen SV'] = SEA1314['Pen SV'].str.replace('%','').astype(int)
SEA1314['Goals conceded'] = SEA1314['Goals conceded'].str.replace('%','').astype(int)
SEA1314['Conceded 1+'] = SEA1314['Conceded 1+'].str.replace('%','').astype(int)
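The same conversion can be expressed as a loop over the column names. The sketch below runs on invented data, and the "50%" value is a hypothetical example of a cell carrying a percent sign. `regex=False` makes the replacement a literal string match.

```python
import pandas as pd

# Invented sample of string-typed columns after the hyphen replacement
toy = pd.DataFrame({"CS": ["0", "12"], "Pen SV": ["0", "50%"]})

for col in ["CS", "Pen SV"]:
    # Remove a literal '%' if present, then convert the strings to integers
    toy[col] = toy[col].str.replace("%", "", regex=False).astype(int)

print(toy.dtypes)
```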

#### Splitting player names into First Initial and Surname.
Repeating the process from the FIFA18 notebook in preparation for joining.  
Each case handles names made up of one, two, three or four words.  
E.g. Aaron Moy is split into FirstInitial A, Firstname Aaron and Surname Moy.

In [15]:
SEA1314.loc[SEA1314['Players'].str.split().str.len() == 1, 'FirstInitial'] = SEA1314['Players'].astype(str).str[0]
SEA1314.loc[SEA1314['Players'].str.split().str.len() == 1, 'Firstname'] = SEA1314['Players'].str.split().str[0]
SEA1314.loc[SEA1314['Players'].str.split().str.len() == 1, 'Surname'] = SEA1314['Players'].str.split().str[0]

SEA1314.loc[SEA1314['Players'].str.split().str.len() == 2, 'FirstInitial'] = SEA1314['Players'].astype(str).str[0]
SEA1314.loc[SEA1314['Players'].str.split().str.len() == 2, 'Firstname'] = SEA1314['Players'].str.split().str[0]
SEA1314.loc[SEA1314['Players'].str.split().str.len() == 2, 'Surname'] = SEA1314['Players'].str.split().str[-1]

SEA1314.loc[SEA1314['Players'].str.split().str.len() == 3, 'FirstInitial'] = SEA1314['Players'].astype(str).str[0]
SEA1314.loc[SEA1314['Players'].str.split().str.len() == 3, 'Firstname'] = SEA1314['Players'].str.split().str[0]
SEA1314.loc[SEA1314['Players'].str.split().str.len() == 3, 'Surname'] = SEA1314['Players'].str.split().str[-1]

SEA1314.loc[SEA1314['Players'].str.split().str.len() == 4, 'FirstInitial'] = SEA1314['Players'].astype(str).str[0]
SEA1314.loc[SEA1314['Players'].str.split().str.len() == 4, 'Firstname'] = SEA1314['Players'].str.split().str[0]
SEA1314.loc[SEA1314['Players'].str.split().str.len() == 4, 'Surname'] = SEA1314['Players'].str.split().str[-1]
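Because every case takes the first word as the first name and the last word as the surname, the split can be sketched without the length masks. This is an alternative under stated assumptions, not the notebook's method. Unlike the cells above, it also fills names longer than four words. The `ShortName` column is a hypothetical join key added for illustration, and all names except Aaron Moy are invented.

```python
import pandas as pd

# Invented sample covering one-, two- and three-word names
toy = pd.DataFrame({"Players": ["Solo", "Aaron Moy", "Jan De Vries"]})

parts = toy["Players"].str.split()          # list of words per player
toy["FirstInitial"] = toy["Players"].str[0]  # first character of the full name
toy["Firstname"] = parts.str[0]              # first word
toy["Surname"] = parts.str[-1]               # last word (same as first for single names)

# Hypothetical key combining initial and surname
toy["ShortName"] = toy["FirstInitial"] + "." + toy["Surname"]

print(toy)
```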

### Cleaned dataframe now finished

In [16]:
SEA1314.head(5)

Unnamed: 0,Players,Points,Clubs,Position,Start,Goals,A,CS,CS part,Yellow,...,Pen SV,Pen M,Goals conceded,Conceded 1+,OG,Apps,Form,FirstInitial,Firstname,Surname
0,Luis Suarez,254,LPOOL,Forward,33,31,13,0,0,6,...,0,0,0,0,0,33,7.69697,L,Luis,Suarez
1,Yaya Toure,193,MCFC,Midfielder,35,20,9,0,0,4,...,0,0,0,0,0,35,5.514286,Y,Yaya,Toure
2,Daniel Sturridge,177,LPOOL,Forward,26,21,7,0,0,2,...,0,1,0,0,0,29,6.103448,D,Daniel,Sturridge
3,Olivier Giroud,172,ARSL,Forward,36,16,8,0,0,4,...,0,0,0,0,0,36,4.777778,O,Olivier,Giroud
4,Steven Gerrard,162,LPOOL,Midfielder,33,13,13,0,0,7,...,0,1,0,0,0,34,4.764706,S,Steven,Gerrard


#### Saving to csv file in data/prep

In [14]:
SEA1314.to_csv('../../data/prep/Fantasy1314.csv')
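By default `to_csv` also writes the index as an unnamed first column, which shows up as "Unnamed: 0" when the file is read back with `pd.read_csv`. If that column is not wanted downstream, one option is `index=False`. The sketch below shows this on an invented frame and writes to a temporary directory rather than data/prep.

```python
import os
import tempfile

import pandas as pd

toy = pd.DataFrame({"Players": ["Aaron Moy"], "Points": [10]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.csv")  # hypothetical file name
    toy.to_csv(path, index=False)        # do not write the index column
    reloaded = pd.read_csv(path)

print(reloaded.columns.tolist())
```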