# 420_prep_RQ3_Dataframes
## Purpose 
This notebook prepares the dataframes used to answer Research Question 3.  
## Datasets 
* _Input_: Joined1617.csv, Joined1516.csv, Joined1415.csv, Joined1314.csv, Joined1213.csv
* _Output_: RQ3_1.csv, RQ3_2.csv, RQ3_3.csv, RQ3_4.csv, RQ3_5.csv

In [1]:
import math
import os.path
import numpy as np
import pandas as pd

### Reading in our cleaned Joined datasets from 16-17 to 12-13.

In [2]:
DF1 = pd.read_csv("../../data/prep/Joined1617.csv")
DF2 = pd.read_csv("../../data/prep/Joined1516.csv")
DF3 = pd.read_csv("../../data/prep/Joined1415.csv")
DF4 = pd.read_csv("../../data/prep/Joined1314.csv")
DF5 = pd.read_csv("../../data/prep/Joined1213.csv")
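
The five reads above differ only in the season code, so they could also be driven from a list. The following is a minimal sketch, not the notebook's own method: the names `seasons`, `prep_dir`, `paths` and `frames` are assumptions introduced here, and files that are missing are skipped so the snippet runs outside the project tree as well.

```python
import os.path

import pandas as pd

# Season codes taken from the input file names listed in the Datasets section.
seasons = ['1617', '1516', '1415', '1314', '1213']

# Hypothetical helper names; the directory matches the relative path used above.
prep_dir = os.path.join('..', '..', 'data', 'prep')
paths = {s: os.path.join(prep_dir, f'Joined{s}.csv') for s in seasons}

# Read only the files that exist, keyed by season code.
frames = {s: pd.read_csv(p) for s, p in paths.items() if os.path.exists(p)}
```

With all five files present, `frames['1617']` would hold the same data as `DF1`.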

## Choosing which columns are needed for the Research Question
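* `index` lists every column that holds a fixed FIFA figure for a player. These columns are used below to drop duplicate rows and to line players up across seasons.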

In [3]:
index = ['Players','club','league','age','nationality','Position','overall',
 'pac','sho','pas','dri','def','phy','international_reputation','skill_moves',
 'weak_foot','work_rate_att','work_rate_def','preferred_foot','crossing','finishing',
 'heading_accuracy','short_passing','volleys','dribbling','curve','free_kick_accuracy','long_passing',
 'ball_control','acceleration','sprint_speed','agility','reactions','balance',
 'shot_power','jumping','stamina','strength','long_shots','aggression','interceptions',
 'positioning', 'vision', 'penalties', 'composure', 'marking', 'standing_tackle',
 'sliding_tackle','gk_diving','gk_handling','gk_kicking','gk_positioning','gk_reflexes']

### Concatenating each season on `index`, keeping the relevant Fantasy stats

In [4]:
s1 = DF1.drop_duplicates(index).set_index(index)[['Apps','Goals','A','CS','CS part','Yellow','Red']]
s2 = DF2.drop_duplicates(index).set_index(index)[['Apps','Goals','A','CS','CS part','Yellow','Red']]
s3 = DF3.drop_duplicates(index).set_index(index)[['Apps','Goals','A','CS','CS part','Yellow','Red']]
s4 = DF4.drop_duplicates(index).set_index(index)[['Apps','Goals','A','CS','CS part','Yellow','Red']]
s5 = DF5.drop_duplicates(index).set_index(index)[['Apps','Goals','A','CS','CS part','Yellow','Red']]

RQ3 = pd.concat([s1,s2,s3,s4,s5], axis=1, keys=('16/17','15/16','14/15','13/14','12/13')).fillna(0).astype(float).reset_index()
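
To see what the `keys=` argument does in the cell above, here is a small sketch on invented data. The two toy seasons, the player names and the shortened key list `key_cols` are assumptions for illustration only; the real notebook uses the full `index` list and five seasons.

```python
import pandas as pd

# Toy stand-ins for two seasons; names and numbers are invented.
key_cols = ['Players', 'club']
season_a = pd.DataFrame({'Players': ['Ann', 'Bob'], 'club': ['X', 'Y'],
                         'Apps': [30, 10], 'Goals': [12, 1]})
season_b = pd.DataFrame({'Players': ['Ann', 'Cat'], 'club': ['X', 'Z'],
                         'Apps': [28, 5], 'Goals': [9, 0]})

stats = ['Apps', 'Goals']
parts = [df.drop_duplicates(subset=key_cols).set_index(key_cols)[stats]
         for df in (season_a, season_b)]

# keys= adds an outer column level naming the season. Rows are aligned on the
# index, so a player missing from a season gets NaN there, and fillna(0)
# turns those gaps into zeros.
toy = (pd.concat(parts, axis=1, keys=('16/17', '15/16'))
         .fillna(0)
         .astype(float)
         .reset_index())
```

The result has one row per distinct player key and two-level column labels such as `('16/17', 'Apps')`, while the key columns come back as `('Players', '')` and `('club', '')`.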

## Creating New Columns
* **Total Apps** - contains the total number of appearances a player has made.<br><br>

* **Average Goals/Game** - is the Total Goals divided by Total Apps a player has played.<br><br>

* **Average Assists/Game** - is the Total Assists divided by Total Apps a player has played.<br><br>

* **Clean Sheets** - contains the total number of clean sheets a player has kept.<br><br>

* **Partial Clean Sheets** - contains the total number of partial clean sheets a player has kept.<br><br>

* **Total Clean Sheets** - contains the total number of clean sheets added to the number of partial clean sheets.<br><br>

* **Average Clean Sheets/Game** - is the Total Clean Sheets divided by Total Apps a player has played.<br><br>

* **Average Yellows/Game** - is the Total Yellow Cards divided by Total Apps a player has played.<br><br>

* **Average Reds/Game** - is the Total Red Cards divided by Total Apps a player has played.<br><br>

* **Average Cards/Game** - is the Total Red Cards + Total Yellow Cards divided by Total Apps a player has played.<br><br>

* **Attacking Attributes** - is the average of the Attacking Attributes (their sum divided by how many there are), rounded to the nearest whole number.<br><br>

* **Defending Attributes** - is the average of the Defending Attributes (their sum divided by how many there are), rounded to the nearest whole number.<br><br>

* **Goalkeeping Attributes** - is the average of the Goalkeeping Attributes (their sum divided by how many there are), rounded to the nearest whole number.<br><br>
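
The totals and per-game averages described in this list all follow one pattern: sum a stat over the seasons, then divide by Total Apps. A minimal sketch on invented numbers follows; the toy frame and the `seasons` list are assumptions, and only two seasons are used to keep it short.

```python
import pandas as pd

seasons = ['16/17', '15/16']
cols = pd.MultiIndex.from_product([seasons, ['Apps', 'Goals']])

# Row 0 is a player with appearances, row 1 a player with none.
toy = pd.DataFrame([[30, 12, 28, 9], [0, 0, 0, 0]], columns=cols, dtype=float)

# Sum each stat over the seasons, mirroring the explicit additions in the cells below.
toy['Total Apps'] = sum(toy[s]['Apps'] for s in seasons)
toy['Total Goals'] = sum(toy[s]['Goals'] for s in seasons)

# A player with zero appearances yields 0/0, which pandas stores as NaN.
toy['Average Goals/Game'] = toy['Total Goals'] / toy['Total Apps']
```

In this notebook the NaN rows do not reach the output files, because players with few appearances are filtered out in the Filtering step.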


In [5]:
RQ3['Total Apps'] = RQ3['16/17']['Apps'] + RQ3['15/16']['Apps'] + RQ3['14/15']['Apps'] + RQ3['13/14']['Apps'] + RQ3['12/13']['Apps']

In [6]:
RQ3['Total Goals'] = RQ3['16/17']['Goals']+RQ3['15/16']['Goals']+RQ3['14/15']['Goals']+RQ3['13/14']['Goals']+RQ3['12/13']['Goals']

In [7]:
RQ3['Average Goals/Game'] = RQ3['Total Goals']/RQ3['Total Apps']

In [8]:
RQ3['Total Assists'] = RQ3['16/17']['A']+RQ3['15/16']['A']+RQ3['14/15']['A']+RQ3['13/14']['A']+RQ3['12/13']['A']

In [9]:
RQ3['Average Assists/Game'] = RQ3['Total Assists']/RQ3['Total Apps']

In [10]:
RQ3['Clean Sheets'] = RQ3['16/17']['CS']+RQ3['15/16']['CS']+RQ3['14/15']['CS']+RQ3['13/14']['CS']+RQ3['12/13']['CS']

In [11]:
RQ3['Partial Clean Sheets'] = RQ3['16/17']['CS part']+RQ3['15/16']['CS part']+RQ3['14/15']['CS part']+RQ3['13/14']['CS part']+RQ3['12/13']['CS part']

In [12]:
RQ3['Total Clean Sheets'] = RQ3['Clean Sheets'] + RQ3['Partial Clean Sheets'] 

In [13]:
RQ3['Average Clean Sheets/Game'] = RQ3['Total Clean Sheets']/RQ3['Total Apps']

In [14]:
RQ3['Total Yellows'] = RQ3['16/17']['Yellow']+RQ3['15/16']['Yellow']+RQ3['14/15']['Yellow']+RQ3['13/14']['Yellow']+RQ3['12/13']['Yellow']

In [15]:
RQ3['Average Yellows/Game'] = RQ3['Total Yellows']/RQ3['Total Apps']

In [16]:
RQ3['Total Reds'] = RQ3['16/17']['Red']+RQ3['15/16']['Red']+RQ3['14/15']['Red']+RQ3['13/14']['Red']+RQ3['12/13']['Red']

In [17]:
RQ3['Average Reds/Game'] = RQ3['Total Reds']/RQ3['Total Apps']

In [18]:
RQ3['Total Cards'] = RQ3['Total Yellows']+RQ3['Total Reds']

In [19]:
RQ3['Average Cards/Game'] = (RQ3['Total Reds']+RQ3['Total Yellows'])/RQ3['Total Apps']

In [20]:
RQ3['Attacking Attributes'] = (RQ3['crossing']+RQ3['finishing']+RQ3['heading_accuracy']+RQ3['short_passing']+RQ3['volleys'])/5

In [21]:
RQ3['Attacking Attributes'] = RQ3['Attacking Attributes'].round(0).apply(int)

In [22]:
RQ3['Defending Attributes'] = (RQ3['marking']+RQ3['standing_tackle']+RQ3['sliding_tackle'])/3

In [23]:
RQ3['Defending Attributes'] = RQ3['Defending Attributes'].round(0).apply(int)

In [24]:
RQ3['Goalkeeping Attributes'] = (RQ3['gk_diving']+RQ3['gk_handling']+RQ3['gk_kicking']+RQ3['gk_positioning']+RQ3['gk_reflexes'])/5

In [25]:
RQ3['Goalkeeping Attributes'] = RQ3['Goalkeeping Attributes'].round(0).apply(int)
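
Each attribute group above is averaged and then rounded in two separate cells. As a hedged alternative, the same value can be obtained in one step with `mean(axis=1)`. The sketch below uses a flat toy frame with invented ratings, not the real `RQ3` frame.

```python
import pandas as pd

# Invented ratings for two players.
toy = pd.DataFrame({'marking': [70, 55],
                    'standing_tackle': [72, 60],
                    'sliding_tackle': [69, 58]})
defending_cols = ['marking', 'standing_tackle', 'sliding_tackle']

# Two-step form used in the notebook: explicit sum / count, then round and cast.
two_step = ((toy['marking'] + toy['standing_tackle'] + toy['sliding_tackle']) / 3)
two_step = two_step.round(0).apply(int)

# One-step form: row-wise mean, rounded to the nearest whole number.
one_step = toy[defending_cols].mean(axis=1).round().astype(int)
```

Both forms give 70 and 58 for the two toy players (211/3 and 173/3 before rounding).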

## Filtering
* Removing players who have made 15 or fewer appearances across the last 5 seasons.

In [26]:
RQ3 = RQ3[RQ3['Total Apps']>15]
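
The comparison operator decides what happens to a player with exactly 15 appearances. A small sketch on invented data shows the difference between `>` (used in the cell above) and `>=`; the toy frame is an assumption for illustration.

```python
import pandas as pd

toy = pd.DataFrame({'Players': ['Ann', 'Bob', 'Cat'],
                    'Total Apps': [14.0, 15.0, 16.0]})

# Strict comparison, as in the notebook: exactly 15 appearances is dropped.
strict = toy[toy['Total Apps'] > 15]

# Inclusive comparison: exactly 15 appearances is kept.
inclusive = toy[toy['Total Apps'] >= 15]
```

Here `strict` keeps only Cat, while `inclusive` keeps Bob and Cat.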

## Tidying Up
* First checking which columns are contained in the dataframe.
* Splitting into 5 dataframes, each with the columns needed for one graph and restricted to the positions it applies to.

In [27]:
list(RQ3)

[('Players', ''),
 ('club', ''),
 ('league', ''),
 ('age', ''),
 ('nationality', ''),
 ('Position', ''),
 ('overall', ''),
 ('pac', ''),
 ('sho', ''),
 ('pas', ''),
 ('dri', ''),
 ('def', ''),
 ('phy', ''),
 ('international_reputation', ''),
 ('skill_moves', ''),
 ('weak_foot', ''),
 ('work_rate_att', ''),
 ('work_rate_def', ''),
 ('preferred_foot', ''),
 ('crossing', ''),
 ('finishing', ''),
 ('heading_accuracy', ''),
 ('short_passing', ''),
 ('volleys', ''),
 ('dribbling', ''),
 ('curve', ''),
 ('free_kick_accuracy', ''),
 ('long_passing', ''),
 ('ball_control', ''),
 ('acceleration', ''),
 ('sprint_speed', ''),
 ('agility', ''),
 ('reactions', ''),
 ('balance', ''),
 ('shot_power', ''),
 ('jumping', ''),
 ('stamina', ''),
 ('strength', ''),
 ('long_shots', ''),
 ('aggression', ''),
 ('interceptions', ''),
 ('positioning', ''),
 ('vision', ''),
 ('penalties', ''),
 ('composure', ''),
 ('marking', ''),
 ('standing_tackle', ''),
 ('sliding_tackle', ''),
 ('gk_diving', ''),
 ('gk_handli

### RQ3_1 : Shooting


In [28]:
RQ3_1 = RQ3[['Players','club','league','overall','age','Position','sho','crossing','finishing','heading_accuracy','short_passing','volleys','composure','preferred_foot','work_rate_att','weak_foot','Attacking Attributes','Total Apps','Total Goals','Average Goals/Game']]

In [29]:
RQ3_1 = RQ3_1[RQ3_1['Position']=='Forward']

#### Saving to csv file in data/analysis

In [30]:
RQ3_1.to_csv('../../data/analysis/RQ3_1.csv')
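
The column listing above shows two-level labels such as `('Players', '')`. A subset selected from such a frame would typically keep both levels, in which case `to_csv` writes one header row per level. If a single header row is preferred, the labels can be flattened first. This is a sketch under that assumption, on an invented one-row frame and writing to an in-memory buffer rather than to `data/analysis`.

```python
import io

import pandas as pd

cols = pd.MultiIndex.from_tuples([('Players', ''), ('Position', ''),
                                  ('16/17', 'Goals'), ('Total Goals', '')])
toy = pd.DataFrame([['Ann', 'Forward', 12.0, 12.0]], columns=cols)

# Select by first-level label, as the notebook does for each RQ3_n frame.
subset = toy[['Players', 'Position', 'Total Goals']]

# Keep only the first level of each label so the CSV gets a single header row.
flat = subset.copy()
flat.columns = flat.columns.get_level_values(0)

buffer = io.StringIO()
flat.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Passing `index=False` is a further assumption here: it leaves the row index out of the file, whereas the notebook's calls keep it.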

### RQ3_2: Assists

In [31]:
RQ3_2 = RQ3[['Players','club','league','overall','age','Position','pas','crossing','vision','short_passing','long_passing','dribbling','Total Apps','Total Assists','Average Assists/Game']]

In [32]:
RQ3_2 = RQ3_2[(RQ3_2['Position']=='Midfielder')|(RQ3_2['Position']=='Forward')]
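
When more than one position is allowed, chaining `==` comparisons with `|` can be replaced by `isin`. The helper below is hypothetical (`subset_for_positions` is not part of the notebook) and is shown on an invented flat frame.

```python
import pandas as pd


def subset_for_positions(frame, columns, positions):
    """Return `columns` for the rows whose Position is one of `positions`."""
    return frame.loc[frame['Position'].isin(positions), columns]


toy = pd.DataFrame({'Players': ['Ann', 'Bob', 'Cat', 'Dan'],
                    'Position': ['Forward', 'Defender', 'Midfielder', 'Goalkeeper'],
                    'Total Assists': [7.0, 1.0, 9.0, 0.0]})

assists = subset_for_positions(toy, ['Players', 'Position', 'Total Assists'],
                               ['Midfielder', 'Forward'])
```

The same helper would cover the single-position subsets by passing a one-element list such as `['Goalkeeper']`.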

#### Saving to csv file in data/analysis

In [33]:
RQ3_2.to_csv('../../data/analysis/RQ3_2.csv')

### RQ3_3: Goalkeeping

In [34]:
RQ3_3 = RQ3[['Players','club','league','overall','age','Position','gk_diving','gk_handling','gk_kicking','gk_positioning','gk_reflexes','Goalkeeping Attributes','Total Apps','Total Clean Sheets','Average Clean Sheets/Game']]

In [35]:
RQ3_3 = RQ3_3[RQ3_3['Position']=='Goalkeeper']

#### Saving to csv file in data/analysis

In [36]:
RQ3_3.to_csv('../../data/analysis/RQ3_3.csv')

### RQ3_4: Defending

In [37]:
RQ3_4 = RQ3[['Players','club','league','overall','age','Position','Defending Attributes','def','marking','standing_tackle','sliding_tackle','work_rate_def','Total Apps','Total Cards','Average Clean Sheets/Game']]

In [38]:
RQ3_4 = RQ3_4[RQ3_4['Position']=='Defender']

#### Saving to csv file in data/analysis

In [39]:
RQ3_4.to_csv('../../data/analysis/RQ3_4.csv')

### RQ3_5: Mentality (Aggression)

In [40]:
RQ3_5 = RQ3[['Players','club','league','overall','age','Position','aggression','work_rate_def','Total Apps','Total Cards','Average Cards/Game']]

In [41]:
RQ3_5 = RQ3_5[RQ3_5['Position']!='Goalkeeper']

#### Saving to csv file in data/analysis

In [42]:
RQ3_5.to_csv('../../data/analysis/RQ3_5.csv')