# SC1015 Mini Project

---

### Essential Libraries

Let us begin by importing the essential Python Libraries.

> NumPy : Library for Numeric Computations in Python  
> Pandas : Library for Data Acquisition and Preparation  
> Matplotlib : Low-level library for Data Visualization  
> Seaborn : Higher-level library for Data Visualization  

In [None]:
# Basic Libraries

import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt # we only need pyplot
sb.set() # set the default Seaborn style for graphics

### Import the FIFA Dataset

The dataset is in CSV format, so we use the `read_csv` function from Pandas to import the FIFA 19 dataset.

In [None]:
fifaData = pd.read_csv('Fifa 19.csv')
fifaData.head()

In [None]:
fifaData.info()

### Dropping the columns of data that cannot be used 

First, we drop the columns that we will not use in determining wages, because we judge that they do not give any valuable information for this purpose.

The columns are 'ID', 'Photo', 'Flag', 'Club', 'Club Logo', 'Value', 'Special', 'Preferred Foot', 'Work Rate', 'International Reputation', 'Weak Foot', 'Position', 'Skill Moves', 'Body Type', 'Real Face', 'Jersey Number', 'Joined', 'Loaned From', 'Contract Valid Until' and 'Release Clause'.

In [None]:
fifaData.drop(['ID', 'Photo', 'Flag', 'Club', 'Club Logo', 'Value', 'Special', 'Preferred Foot', 'Work Rate', 
               'International Reputation', 'Weak Foot', 'Position', 'Skill Moves', 'Body Type', 
               'Real Face', 'Jersey Number', 'Joined', 'Loaned From', 'Contract Valid Until', 
               'Release Clause'], axis = 1, inplace = True)
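
Dropping by name raises a `KeyError` by default if any label is missing from the DataFrame, so it can help to check the names first. The following is a minimal sketch on a toy DataFrame; the `toy` frame and the `unwanted` list are illustrative stand-ins, not part of the FIFA 19 dataset.

```python
import pandas as pd

# Hypothetical toy frame standing in for fifaData
toy = pd.DataFrame({"ID": [1, 2], "Name": ["A", "B"], "Wage": ["€10K", "€5K"]})

# Hypothetical list of labels to drop; 'Photo' is deliberately absent from toy
unwanted = ["ID", "Photo"]

# Labels that were requested but do not exist in the frame
missing = [col for col in unwanted if col not in toy.columns]

# errors="ignore" skips labels that are not found instead of raising KeyError
trimmed = toy.drop(columns=unwanted, errors="ignore")

print(missing)
print(list(trimmed.columns))
```

Printing `missing` before the drop makes a misspelt column name visible instead of silently ignored.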

In [None]:
fifaData.info()

### Dropping the columns of data that are non-specific

We also drop the columns that contain each player's rating when placed in every possible position. This data is non-specific and cannot be used.

A much better variable to use is 'Overall', since it takes into account the average of all of the positions combined.

The columns being dropped are 'LS', 'ST', 'RS', 'LW', 'LF', 'CF', 'RF', 'RW', 'LAM', 'CAM', 'RAM', 'LM', 'LCM', 'CM', 'RCM', 'RM', 'LWB', 'LDM', 'CDM', 'RDM', 'RWB', 'LB', 'LCB', 'CB', 'RCB' and 'RB'.

In [None]:
# Drop the 26 position-rating columns, selected by their position in the column index
fifaData.drop(columns=fifaData.columns[9:35], inplace=True)
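
A positional slice silently removes the wrong columns if the column order ever changes, so one option is to compare the sliced names against the expected list before dropping. This is a sketch on a toy frame; the column names, `start` and `stop` are illustrative assumptions rather than the real offsets.

```python
import pandas as pd

# Hypothetical toy frame with a few columns in a known order
toy = pd.DataFrame(columns=["Name", "Age", "LS", "ST", "RS", "Crossing"])

# Illustrative slice bounds and the names we expect that slice to cover
start, stop = 2, 5
expected = ["LS", "ST", "RS"]

# Column names actually selected by the positional slice
to_drop = list(toy.columns[start:stop])

# Only drop when the slice matches the names we intended to remove
if to_drop == expected:
    toy = toy.drop(columns=to_drop)

print(list(toy.columns))
```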

In [None]:
fifaData.info()

### Dropping the columns of data that are football-related

Lastly, we drop the football-specific skill statistics, because the aim of our project is to find out how non-football statistics determine wages.


The columns being dropped are 'Crossing', 'Finishing', 'HeadingAccuracy', 'ShortPassing', 'Volleys', 'Dribbling', 'Curve', 'FKAccuracy', 'LongPassing', 'BallControl', 'Acceleration', 'SprintSpeed', 'Agility', 'Reactions', 'Balance', 'ShotPower', 'Jumping', 'Stamina', 'Strength', 'LongShots', 'Aggression', 'Interceptions', 'Positioning', 'Vision', 'Penalties', 'Composure', 'Marking', 'StandingTackle', 'SlidingTackle', 'GKDiving', 'GKHandling', 'GKKicking', 'GKPositioning' and 'GKReflexes'.

In [None]:
# Drop the 34 football-skill columns, selected by their position in the column index
fifaData.drop(columns=fifaData.columns[9:43], inplace=True)

In [None]:
fifaData.info()

### Data Cleaning

We now focus on the variables that we are most likely to use in determining wages. First, we have to clean the data so that we are able to use it.

In [None]:
fifaData.head()

In [None]:
# Cleaning the Wage and making it Dtype integer instead of object
# regex=False treats '€' and 'K' as literal characters
fifaData['Wage'] = fifaData['Wage'].str.replace('€', '', regex=False)
fifaData['Wage'] = fifaData['Wage'].str.replace('K', '', regex=False).astype(int)

# Removing the rows with Wage 0 (.copy() avoids SettingWithCopyWarning in later cells)
fifaData = fifaData.loc[fifaData["Wage"] != 0].copy()

# Renaming the column from Wage to Wage (in Thousands)
fifaData = fifaData.rename(columns={'Wage': 'Wage (in Thousands)'})
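
The wage cleaning can be checked on a small example. This sketch assumes every non-zero wage string has the form `€<number>K`; the `wages` series is made up for illustration.

```python
import pandas as pd

# Illustrative wage strings in the assumed format
wages = pd.Series(["€565K", "€1K", "€0"])

# Strip the currency symbol and the thousands suffix, then convert to int
cleaned = (
    wages.str.replace("€", "", regex=False)
         .str.replace("K", "", regex=False)
         .astype(int)
)

# Keep only the rows with a non-zero wage
nonzero = cleaned[cleaned != 0]

print(cleaned.tolist())
print(nonzero.tolist())
```

If any wage used a different suffix (for example `M`), `astype(int)` would raise a `ValueError`, which is a useful signal that the assumption does not hold.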

In [None]:
fifaData.info()

In [None]:
# Cleaning the Height and making it Dtype float instead of object
# Values look like 5'7 (feet'inches), so convert each part to centimetres separately
heightParts = fifaData['Height'].str.split("'", expand=True).astype(float)
fifaData['Height'] = (heightParts[0] * 30.48 + heightParts[1] * 2.54).round(1)
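
The height conversion can be sanity-checked on a few values. This sketch assumes heights are written as feet and inches separated by an apostrophe, so that `5'10` means 5 feet 10 inches; reading it as the decimal 5.10 feet would give about 155.4 cm instead of 177.8 cm. The `heights` series is made up for illustration.

```python
import pandas as pd

# Illustrative heights in the assumed feet'inches format
heights = pd.Series(["5'7", "5'10", "6'2"])

# Split into a feet column (0) and an inches column (1)
parts = heights.str.split("'", expand=True).astype(float)

# 1 foot = 30.48 cm, 1 inch = 2.54 cm
cm = (parts[0] * 30.48 + parts[1] * 2.54).round(1)

print(cm.tolist())
```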

In [None]:
fifaData.info()

In [None]:
#Cleaning the Weight 
fifaData['Weight'] = fifaData['Weight'].str.replace('lbs', '')

#Removing the rows with NULL Weight
fifaData.dropna(subset=['Weight'], inplace = True)

#Making Weight Dtype integer instead of object
fifaData['Weight'] = fifaData['Weight'].astype(int)
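
The order of the steps matters here: missing values must be removed before the integer conversion, because `astype(int)` cannot convert `NaN`. A minimal sketch on made-up data, assuming weights are written like `159lbs`:

```python
import pandas as pd

# Illustrative weights, including one missing value
weights = pd.Series(["159lbs", None, "183lbs"])

# Strip the unit; the missing entry stays missing
stripped = weights.str.replace("lbs", "", regex=False)

# Drop missing entries first, then convert the remaining strings to int
clean = stripped.dropna().astype(int)

print(clean.tolist())
```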

In [None]:
fifaData.info()