# 100 - Load & prep FIFA18 dataset

## Purpose
In this notebook we load the complete FIFA 18 dataset and perform any necessary cleaning.
## Datasets
* _Input_: complete.csv
* _Output_: FIFA18.csv

In [1]:
import os.path
import numpy as np
import pandas as pd

## Loading the Datasets
The dataset is in standard CSV format, so we read the file in with pd.read_csv.

In [2]:
FIFA18 = pd.read_csv("../../data/raw/complete.csv")
FIFA18.shape

(17994, 185)
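The notebook imports os.path but does not use it afterwards. One possible use, sketched below, is to fail early with a clear message when the raw file is missing. The helper name `load_raw_csv` is hypothetical and not part of the original notebook.

```python
import os.path

import pandas as pd


def load_raw_csv(path):
    """Hypothetical helper: read a CSV, raising a clear error if it is missing."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Raw dataset not found: {path}")
    return pd.read_csv(path)
```

It could then replace the direct call, for example `FIFA18 = load_raw_csv("../../data/raw/complete.csv")`.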

#### Quick look at the csv file

In [3]:
FIFA18.head(5)

Unnamed: 0,ID,name,full_name,club,club_logo,special,age,league,birth_date,height_cm,...,prefers_cb,prefers_lb,prefers_lwb,prefers_ls,prefers_lf,prefers_lam,prefers_lcm,prefers_ldm,prefers_lcb,prefers_gk
0,20801,Cristiano Ronaldo,C. Ronaldo dos Santos Aveiro,Real Madrid CF,https://cdn.sofifa.org/18/teams/243.png,2228,32,Spanish Primera División,1985-02-05,185.0,...,False,False,False,False,False,False,False,False,False,False
1,158023,L. Messi,Lionel Messi,FC Barcelona,https://cdn.sofifa.org/18/teams/241.png,2158,30,Spanish Primera División,1987-06-24,170.0,...,False,False,False,False,False,False,False,False,False,False
2,190871,Neymar,Neymar da Silva Santos Jr.,Paris Saint-Germain,https://cdn.sofifa.org/18/teams/73.png,2100,25,French Ligue 1,1992-02-05,175.0,...,False,False,False,False,False,False,False,False,False,False
3,176580,L. Suárez,Luis Suárez,FC Barcelona,https://cdn.sofifa.org/18/teams/241.png,2291,30,Spanish Primera División,1987-01-24,182.0,...,False,False,False,False,False,False,False,False,False,False
4,167495,M. Neuer,Manuel Neuer,FC Bayern Munich,https://cdn.sofifa.org/18/teams/21.png,1493,31,German Bundesliga,1986-03-27,193.0,...,False,False,False,False,False,False,False,False,False,True


#### Quick look at all the columns contained in the csv

In [4]:
FIFA18.columns.tolist()

['ID',
 'name',
 'full_name',
 'club',
 'club_logo',
 'special',
 'age',
 'league',
 'birth_date',
 'height_cm',
 'weight_kg',
 'body_type',
 'real_face',
 'flag',
 'nationality',
 'photo',
 'eur_value',
 'eur_wage',
 'eur_release_clause',
 'overall',
 'potential',
 'pac',
 'sho',
 'pas',
 'dri',
 'def',
 'phy',
 'international_reputation',
 'skill_moves',
 'weak_foot',
 'work_rate_att',
 'work_rate_def',
 'preferred_foot',
 'crossing',
 'finishing',
 'heading_accuracy',
 'short_passing',
 'volleys',
 'dribbling',
 'curve',
 'free_kick_accuracy',
 'long_passing',
 'ball_control',
 'acceleration',
 'sprint_speed',
 'agility',
 'reactions',
 'balance',
 'shot_power',
 'jumping',
 'stamina',
 'strength',
 'long_shots',
 'aggression',
 'interceptions',
 'positioning',
 'vision',
 'penalties',
 'composure',
 'marking',
 'standing_tackle',
 'sliding_tackle',
 'gk_diving',
 'gk_handling',
 'gk_kicking',
 'gk_positioning',
 'gk_reflexes',
 'rs',
 'rw',
 'rf',
 'ram',
 'rcm',
 'rm',
 'rdm',
 'r

#### Checking to see if any columns contain null values

In [5]:
FIFA18.isnull().sum()

ID                                 0
name                               0
full_name                          0
club                             253
club_logo                        253
special                            0
age                                0
league                           253
birth_date                         0
height_cm                          0
weight_kg                          0
body_type                          0
real_face                          0
flag                               0
nationality                        0
photo                              0
eur_value                          0
eur_wage                           0
eur_release_clause              1494
overall                            0
potential                          0
pac                                0
sho                                0
pas                                0
dri                                0
def                                0
phy                                0
i
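With 185 columns, the full null-count listing is long and most entries are zero. A minimal sketch of one way to show only the columns that contain nulls follows; the toy frame and its values are made up for illustration and stand in for FIFA18.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for FIFA18 (values are invented for illustration)
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "club": ["X", np.nan, np.nan],
    "league": ["L1", None, "L2"],
})

null_counts = toy.isnull().sum()
# Keep only columns with at least one null, largest count first
null_only = null_counts[null_counts > 0].sort_values(ascending=False)
print(null_only)
```

Applied to the real data, the same filter would be `FIFA18.isnull().sum()` followed by the `> 0` mask.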

#### Handling null values in clubs and leagues
We fill null leagues with 'Rest Of The World' and null clubs with 'Free Agent', since these players have no club but are still included in the game.

In [6]:
FIFA18.loc[FIFA18['league'].isnull(), 'league'] = 'Rest Of The World'
FIFA18.loc[FIFA18['club'].isnull(), 'club'] = 'Free Agent'
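The same fill can also be written with `DataFrame.fillna` and a per-column mapping. The sketch below uses a made-up two-row frame rather than the real data; columns not named in the mapping (such as eur_release_clause in the real dataset) would be left untouched.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for FIFA18 (second row mimics a player without a club)
toy = pd.DataFrame({
    "name": ["A", "B"],
    "club": ["FC Barcelona", np.nan],
    "league": ["Spanish Primera División", np.nan],
})

# One default value per column; other columns are not modified
filled = toy.fillna({"league": "Rest Of The World", "club": "Free Agent"})
print(filled)
```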

#### Normalizing all characters in the dataset
Converting accented characters to plain ASCII helps later when merging with datasets that spell the same names differently.  
It also helps reproducibility.  
For example, Mesut Özil has an umlaut in his name, which could cause mismatches.

Partial code adapted from:
https://gist.github.com/j4mie/557354

In [7]:
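# Select the text (object dtype) columns; np.object was removed in NumPy 1.24, so use the builtin object
cols = FIFA18.select_dtypes(include=[object]).columns
# Decompose accented characters (NFKD), then drop anything that is not ASCII
FIFA18[cols] = FIFA18[cols].apply(lambda x: x.str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8'))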
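To see what this normalization does to a single string, here is a small standard-library sketch of the same NFKD-then-ASCII steps. The function name `to_ascii` is hypothetical. Note the last case: letters with no Unicode decomposition, such as Ø, are dropped rather than replaced, which may matter for some player names.

```python
import unicodedata


def to_ascii(text):
    """Decompose accented characters (NFKD), then drop anything non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


print(to_ascii("Mesut Özil"))   # Mesut Ozil
print(to_ascii("L. Suárez"))    # L. Suarez
print(to_ascii("M. Ødegaard"))  # M. degaard (Ø has no decomposition, so it is dropped)
```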

#### Splitting player names into first initial and surname
The first initial is the first character of the name and the surname is the last whitespace-separated token. Names of one to four tokens are handled, which covers double- and triple-barrelled names.
E.g. Kevin De Bruyne becomes K and Bruyne.

In [8]:
# Split each name into whitespace-separated tokens
name_parts = FIFA18['name'].str.split()
# Handle names with one to four tokens; any other names are left as NaN
mask = name_parts.str.len().between(1, 4)
FIFA18.loc[mask, 'FirstInitial'] = FIFA18.loc[mask, 'name'].astype(str).str[0]
FIFA18.loc[mask, 'Surname'] = name_parts[mask].str[-1]
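The effect of the split can be checked on a few names. This sketch uses a toy frame and, unlike the notebook cell, applies the rule to every row regardless of token count; it takes the first character of the name and the last whitespace-separated token.

```python
import pandas as pd

# Toy frame with one-, two- and three-token names
toy = pd.DataFrame({"name": ["Neymar", "L. Messi", "K. De Bruyne"]})

parts = toy["name"].str.split()
toy["FirstInitial"] = toy["name"].str[0]   # first character of the name
toy["Surname"] = parts.str[-1]             # last token only
print(toy)
```

Because only the last token is kept, particles such as "De" are lost from the surname, so two different players can end up with the same initial and surname.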

### Cleaned dataframe now finished

In [11]:
FIFA18.head(5)

Unnamed: 0,ID,name,full_name,club,club_logo,special,age,league,birth_date,height_cm,...,prefers_lwb,prefers_ls,prefers_lf,prefers_lam,prefers_lcm,prefers_ldm,prefers_lcb,prefers_gk,FirstInitial,Surname
0,20801,Cristiano Ronaldo,C. Ronaldo dos Santos Aveiro,Real Madrid CF,https://cdn.sofifa.org/18/teams/243.png,2228,32,Spanish Primera Division,1985-02-05,185.0,...,False,False,False,False,False,False,False,False,C,Ronaldo
1,158023,L. Messi,Lionel Messi,FC Barcelona,https://cdn.sofifa.org/18/teams/241.png,2158,30,Spanish Primera Division,1987-06-24,170.0,...,False,False,False,False,False,False,False,False,L,Messi
2,190871,Neymar,Neymar da Silva Santos Jr.,Paris Saint-Germain,https://cdn.sofifa.org/18/teams/73.png,2100,25,French Ligue 1,1992-02-05,175.0,...,False,False,False,False,False,False,False,False,N,Neymar
3,176580,L. Suarez,Luis Suarez,FC Barcelona,https://cdn.sofifa.org/18/teams/241.png,2291,30,Spanish Primera Division,1987-01-24,182.0,...,False,False,False,False,False,False,False,False,L,Suarez
4,167495,M. Neuer,Manuel Neuer,FC Bayern Munich,https://cdn.sofifa.org/18/teams/21.png,1493,31,German Bundesliga,1986-03-27,193.0,...,False,False,False,False,False,False,False,True,M,Neuer


#### Saving to csv file in data/prep

In [10]:
FIFA18.to_csv('../../data/prep/FIFA18.csv')
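By default `to_csv` also writes the row index as an unnamed first column, which reappears as `Unnamed: 0` when the file is read back. The sketch below shows the difference on a toy frame, using in-memory buffers instead of real files; whether downstream notebooks expect that extra column is an assumption to check before changing the call.

```python
import io

import pandas as pd

toy = pd.DataFrame({"ID": [20801, 158023],
                    "name": ["Cristiano Ronaldo", "L. Messi"]})

# Default behaviour: the index is written as an unnamed first column
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()
print(cols_with_index)     # ['Unnamed: 0', 'ID', 'name']

# index=False keeps only the real columns
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()
print(cols_without_index)  # ['ID', 'name']
```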