# FIFA 20 Ultimate Team Players and Prices Analysis

## Table of Contents
<ul>
<li><a href="#intro">Introduction</a></li>
<li><a href="#wrangling">Data Wrangling</a></li>
<li><a href="#eda">Exploratory Data Analysis</a></li>
<li><a href="#conclusions">Conclusions</a></li>
<li><a href="#statistical">Statistical Tests</a></li>
</ul>

<a id='intro'></a>
## Introduction

> 

<a id='wrangling'></a>
## Data Wrangling


### Gathering Data

```python
import zipfile
import pandas as pd
import numpy as np
import random
import matplotlib.pyplot as plt
%matplotlib inline
random.seed(42)
```

```python
# Extract the CSV files from the FIFA 20 Ultimate Team players dataset zip file

with zipfile.ZipFile('fifa-20-ultimate-team-players-dataset.zip', 'r') as myzip:
    myzip.extractall()
```

```python
# Read players CSVs

df_16_players = pd.read_csv('fut_bin16_players.csv')
df_17_players = pd.read_csv('fut_bin17_players.csv')
df_18_players = pd.read_csv('fut_bin18_players.csv')
df_19_players = pd.read_csv('fut_bin19_players.csv')
df_20_players = pd.read_csv('fut_bin20_players.csv')
```


```python
# Read prices CSVs

df_16_prices = pd.read_csv('fut_bin16_prices.csv')
df_17_prices = pd.read_csv('fut_bin17_prices.csv')
df_18_prices = pd.read_csv('fut_bin18_prices.csv')
df_19_prices = pd.read_csv('fut_bin19_prices.csv')
df_20_prices = pd.read_csv('fut_bin20_prices.csv')
```
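
The ten `read_csv` calls above follow one naming pattern (`fut_bin{YY}_players.csv` and `fut_bin{YY}_prices.csv`), so they could also be written as a loop that stores each edition in a dictionary keyed by year. This is a minimal sketch, assuming all extracted CSVs sit in one directory; the helper name `load_fut_tables` is hypothetical. Passing `low_memory=False` makes pandas infer each column's dtype from the whole file, which is one common way to avoid mixed-type warnings on large CSVs.

```python
from pathlib import Path

import pandas as pd


def load_fut_tables(data_dir, kind, years=(16, 17, 18, 19, 20)):
    """Read fut_bin{YY}_{kind}.csv for each year into a dict keyed by year.

    Hypothetical helper: `kind` is 'players' or 'prices', matching the
    file names used above; the single-folder layout is an assumption.
    """
    data_dir = Path(data_dir)
    tables = {}
    for year in years:
        path = data_dir / f'fut_bin{year}_{kind}.csv'
        # low_memory=False: infer each column's dtype from the whole file
        tables[year] = pd.read_csv(path, low_memory=False)
    return tables


# Usage, equivalent to the df_16_players ... df_20_prices variables:
# players = load_fut_tables('.', 'players')
# prices = load_fut_tables('.', 'prices')
# df = players[20].copy()
```

Keeping the editions in one dictionary makes it easy to run the same assessment on every year with a single loop instead of copying cells.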

### Assessing Data


```python
# Dataframe selection: choose here which dataframe will be used.

df = df_20_players.copy()
```

```python
# Visual assessment

df.head()
```

| Unnamed: 0 | futbin_id | player_name | player_extended_name | quality | revision | origin | overall | club | league | nationality | ... | ps4_max | ps4_prp | xbox_last | xbox_min | xbox_max | xbox_prp | pc_last | pc_min | pc_max | pc_prp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Pelé | Edson Arantes Nascimento | Gold - Rare | Icon | Prime | 98 | Icons | Icons | Brazil | ... | 11700000.0 | 47 | 5198000.0 | 593000.0 | 11300000.0 | 43 |  | 790000.0 | 15000000.0 | 47 |
| 1 | 2 | Maradona | Diego Maradona | Gold - Rare | Icon | Prime | 97 | Icons | Icons | Argentina | ... | 7600000.0 | 47 | 3799000.0 | 364000.0 | 6900000.0 | 52 | 6500000.0 | 456000.0 | 8700000.0 | 73 |
| 2 | 3 | Ronaldo | Ronaldo Luís Nazário de Lima | Gold - Rare | Icon | Prime | 96 | Icons | Icons | Brazil | ... | 15000000.0 | 90 | 10300000.0 | 945000.0 | 15000000.0 | 66 |  | 790000.0 | 15000000.0 | 96 |
| 3 | 4 | Pelé | Edson Arantes Nascimento | Gold - Rare | Icon | Medium | 95 | Icons | Icons | Brazil | ... | 6500000.0 | 58 | 3375000.0 | 313000.0 | 5900000.0 | 54 | 5000000.0 | 399000.0 | 7600000.0 | 63 |
| 4 | 5 | Maradona | Diego Maradona | Gold - Rare | Icon | Medium | 95 | Icons | Icons | Argentina | ... | 3400000.0 | 55 | 1825000.0 | 193000.0 | 3700000.0 | 46 | 2200000.0 | 234000.0 | 4400000.0 | 47 |


```python
print('Quality: ', df.quality.unique())
print('Position: ', df.position.unique())
```

```
Quality:  ['Gold - Rare' 'Gold - Non-Rare' 'Silver - Rare' 'Silver - Non-Rare'
 'Bronze - Non-Rare' 'Bronze - Rare']
Position:  ['CAM' 'ST' 'CF' 'CB' 'GK' 'LW' 'CM' 'RW' 'LM' 'RB' 'CDM' 'LB' 'RM' 'RWB'
 'LWB' 'RF' 'LF']
```
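
Each `quality` value combines a card tier and a rarity flag separated by `' - '` (for example `'Gold - Rare'`). Splitting them into two columns makes it easier to filter or group by either one. A minimal sketch, assuming every value follows the `'<Tier> - <Rarity>'` pattern shown in the output above; the new column names `tier` and `rarity` are hypothetical and not part of the dataset.

```python
import pandas as pd


def split_quality(frame):
    """Return a copy of `frame` with 'quality' split into 'tier' and 'rarity'.

    Splits on ' - ' (with spaces), so 'Non-Rare' stays intact.
    """
    out = frame.copy()
    parts = out['quality'].str.split(' - ', n=1, expand=True)
    out['tier'] = parts[0]     # e.g. 'Gold', 'Silver', 'Bronze'
    out['rarity'] = parts[1]   # e.g. 'Rare', 'Non-Rare'
    return out


example = pd.DataFrame({'quality': ['Gold - Rare', 'Silver - Non-Rare', 'Bronze - Rare']})
print(split_quality(example)[['tier', 'rarity']])
```

On the full dataframe, `pd.crosstab(out['tier'], out['rarity'])` (with `out = split_quality(df)`) would then count cards per tier and rarity.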


```python
df['height'].describe(), df['weight'].describe()
```

```
(count    18775.000000
 mean       181.405326
 std          6.787917
 min        155.000000
 25%        177.000000
 50%        181.000000
 75%        186.000000
 max        205.000000
 Name: height, dtype: float64,
 count    18775.000000
 mean        75.444527
 std          7.099590
 min         50.000000
 25%         70.000000
 50%         75.000000
 75%         80.000000
 max        110.000000
 Name: weight, dtype: float64)
```

```python
df.columns
```

```
Index(['futbin_id', 'player_name', 'player_extended_name', 'quality',
       'revision', 'origin', 'overall', 'club', 'league', 'nationality',
       'position', 'age', 'date_of_birth', 'height', 'weight', 'intl_rep',
       'added_date', 'pace', 'pace_acceleration', 'pace_sprint_speed',
       'dribbling', 'drib_agility', 'drib_balance', 'drib_reactions',
       'drib_ball_control', 'drib_dribbling', 'drib_composure', 'shooting',
       'shoot_positioning', 'shoot_finishing', 'shoot_shot_power',
       'shoot_long_shots', 'shoot_volleys', 'shoot_penalties', 'passing',
       'pass_vision', 'pass_crossing', 'pass_free_kick', 'pass_short',
       'pass_long', 'pass_curve', 'defending', 'def_interceptions',
       'def_heading', 'def_marking', 'def_stand_tackle', 'def_slid_tackle',
       'physicality', 'phys_jumping', 'phys_stamina', 'phys_strength',
       'phys_aggression', 'gk_diving', 'gk_reflexes', 'gk_handling',
       'gk_speed', 'gk_kicking', 'gk_positoning', 'pref_foot', 'att_w
```
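
Most sub-attributes share a prefix with their face stat (`pace_`, `drib_`, `shoot_`, `pass_`, `def_`, `phys_`, `gk_`), so `DataFrame.filter(regex=...)` can pull out one group at a time when comparing the sub-attributes of a single face stat. A minimal sketch; the prefixes are read off the column list above, while the group labels and the helper name `attribute_groups` are hypothetical.

```python
import pandas as pd

# Prefixes taken from df.columns above; the group labels are our own.
PREFIXES = {
    'pace': 'pace_',
    'dribbling': 'drib_',
    'shooting': 'shoot_',
    'passing': 'pass_',
    'defending': 'def_',
    'physicality': 'phys_',
    'goalkeeping': 'gk_',
}


def attribute_groups(frame):
    """Map each face stat to the sub-attribute columns present in `frame`.

    The face stat itself (e.g. 'pace') is excluded because it has no
    trailing underscore after the prefix.
    """
    return {
        group: list(frame.filter(regex=f'^{prefix}').columns)
        for group, prefix in PREFIXES.items()
    }


example = pd.DataFrame(columns=['pace', 'pace_acceleration', 'pace_sprint_speed',
                                'defending', 'def_heading', 'def_marking', 'gk_diving'])
groups = attribute_groups(example)
print(groups['pace'])  # ['pace_acceleration', 'pace_sprint_speed']
```

With the real data, `df[attribute_groups(df)['defending']].describe()` would summarise all defending sub-attributes at once.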

#### Counter-attacking 4-4-2 strategy

Analysis of a counter-attacking strategy built on a 4-4-2 formation. In defense, the team needs two centre-backs (CBs): one tall, and the other fast enough to provide cover. Because the strategy relies on a low defensive line to draw the opponent forward, strong left and right backs (LB, RB) are needed to win back possession. These two players will cover half of the pitch, so they need to be good at defending and passing, but not necessarily at attacking or crossing.

In midfield, two central midfielders (CDM, CM/CAM) will share the task of pressing the opponent to win back possession and launch the counter-attack. This calls for one tall midfielder who is good in the air and a strong tackler, and another who is good at long passing. On the flanks, two speedsters are needed to support the strikers. They are also important defensively, tracking back to help the team regroup after losing possession up front.

In attack, the team needs one tall, strong striker and a goal-scoring number 9. When the ball reaches the attacking positions, the two fast wingers give the strikers a passing option.

**Centre-Back 1** - Tall, good in the air, good positioning. \
**Centre-Back 2** - Medium height, fast enough to cover CB 1, good positioning. 

**Left Back** - Good at defense and passing. \
**Right Back** - Good at defense and passing. 

**Central Defensive Midfielder** - Tall, good in the air, strong tackling. \
**Central Attacking Midfielder** - Good at long passing, good vision. 

**Left Winger** - Speedster, good crossing. \
**Right Winger** - Speedster, good crossing.

**Striker** - Clinical at finishing. \
**Centre-forward** - Strong, good ball control. 
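
The role profiles above could be turned into a rough shortlist by averaging the attributes each profile mentions and ranking players within the matching positions. This is a minimal sketch under loud assumptions: the mapping from phrases like "good in the air" or "good positioning" to dataset columns (`def_heading`, `def_interceptions`, and so on) is our own choice, "tall" is assumed to mean at or above the 75th percentile of `height` (186 cm in the FIFA 20 output above), and `ROLE_PROFILES` and `shortlist` are hypothetical names.

```python
import pandas as pd

# Hypothetical mapping from the role descriptions to dataset columns.
# 'positions' restricts candidates, 'attributes' are averaged into a score,
# 'tall' keeps only players at or above the 75th percentile of height.
ROLE_PROFILES = {
    'Centre-Back 1': {'positions': ['CB'], 'tall': True,
                      'attributes': ['def_heading', 'def_interceptions']},
    'Centre-Back 2': {'positions': ['CB'], 'tall': False,
                      'attributes': ['pace', 'def_interceptions']},
    'Full Back': {'positions': ['LB', 'RB'], 'tall': False,
                  'attributes': ['defending', 'passing']},
    'Central Defensive Midfielder': {'positions': ['CDM'], 'tall': True,
                                     'attributes': ['def_heading', 'def_stand_tackle']},
    'Central Attacking Midfielder': {'positions': ['CM', 'CAM'], 'tall': False,
                                     'attributes': ['pass_long', 'pass_vision']},
    'Winger': {'positions': ['LM', 'RM', 'LW', 'RW'], 'tall': False,
               'attributes': ['pace', 'pass_crossing']},
    'Striker': {'positions': ['ST'], 'tall': False,
                'attributes': ['shoot_finishing']},
    'Centre-forward': {'positions': ['ST', 'CF'], 'tall': False,
                       'attributes': ['phys_strength', 'drib_ball_control']},
}


def shortlist(frame, role, top=5):
    """Rank players for `role` by the mean of the profile's attributes."""
    profile = ROLE_PROFILES[role]
    candidates = frame[frame['position'].isin(profile['positions'])]
    if profile['tall']:
        # Height cut-off computed over all players in `frame`, not just the position
        cutoff = frame['height'].quantile(0.75)
        candidates = candidates[candidates['height'] >= cutoff]
    score = candidates[profile['attributes']].mean(axis=1)
    return (candidates.assign(role_score=score)
                      .sort_values('role_score', ascending=False)
                      .head(top)[['player_name', 'position', 'height', 'role_score']])


players = pd.DataFrame({
    'player_name': ['A', 'B', 'C', 'D', 'E'],
    'position': ['CB', 'CB', 'CB', 'CB', 'ST'],
    'height': [192, 178, 190, 176, 180],
    'def_heading': [85, 70, 80, 60, 50],
    'def_interceptions': [80, 75, 70, 65, 30],
})
print(shortlist(players, 'Centre-Back 1', top=2))
```

Averaging equally weighted attributes is the simplest scoring choice; weights per attribute, or a price column from the prices tables, could be added once the profiles are settled.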

```python
# Create dataframes per position group

# Goalkeeper dataframe
goal = df.query('position == "GK"')

# Defense dataframe (centre-backs, full-backs and both wing-backs)
defense = df.query('position in ["CB", "RB", "LB", "RWB", "LWB"]')

# Midfield dataframe
midfield = df.query('position in ["CAM", "CM", "CDM", "RM", "LM"]')

# Attacking dataframe
attack = df.query('position in ["ST", "CF", "RF", "LF", "LW", "RW"]')
```
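
The queries above split players into goalkeeper, defense, midfield and attack groups. The same grouping can also be stored as a single column, which lets `groupby` compare any attribute across the four groups in one call instead of one dataframe at a time. A minimal sketch; the `line` column and the `POSITION_TO_LINE` dictionary are hypothetical names, and the assignments mirror the position groups above, with both wing-backs (`RWB`, `LWB`) counted as defense.

```python
import pandas as pd

# Position groups as used above; covers all 17 positions in df.position.unique()
POSITION_TO_LINE = {
    'GK': 'goal',
    'CB': 'defense', 'RB': 'defense', 'LB': 'defense', 'RWB': 'defense', 'LWB': 'defense',
    'CAM': 'midfield', 'CM': 'midfield', 'CDM': 'midfield', 'RM': 'midfield', 'LM': 'midfield',
    'ST': 'attack', 'CF': 'attack', 'RF': 'attack', 'LF': 'attack', 'LW': 'attack', 'RW': 'attack',
}


def add_line(frame):
    """Return a copy of `frame` with a 'line' column derived from 'position'."""
    return frame.assign(line=frame['position'].map(POSITION_TO_LINE))


example = pd.DataFrame({'position': ['GK', 'CB', 'LWB', 'CM', 'ST'],
                        'pace': [50, 70, 80, 75, 90]})
summary = add_line(example).groupby('line')['pace'].mean()
print(summary)
```

On the full data, `add_line(df).groupby('line')[['pace', 'defending', 'passing']].mean()` would give the mean of several face stats per group in one table.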


```python
# Plotting helpers, one per attribute.
# Each draws a histogram of the attribute with a red vertical line at its mean.

def _hist_with_mean(series, title):
    """Plot a histogram of `series` (missing values dropped) with its mean in red."""
    plt.title(title)
    plt.axvline(series.mean(), c='red')
    plt.hist(series.dropna(), alpha=0.5)

# Body (defaults to the full dataframe `df` when no position dataframe is passed)
def height(pos=None):
    pos = df if pos is None else pos
    _hist_with_mean(pos.height, 'Height')

def weight(pos=None):
    pos = df if pos is None else pos
    _hist_with_mean(pos.weight, 'Weight')

# Speed and acceleration
def pace(pos):
    _hist_with_mean(pos.pace, 'Pace')

def pace_acc(pos):
    _hist_with_mean(pos.pace_acceleration, 'Pace Acceleration')

def pace_sprint(pos):
    _hist_with_mean(pos.pace_sprint_speed, 'Pace Sprint Speed')

# Shooting
def shooting(pos):
    _hist_with_mean(pos.shooting, 'Shooting')

# Passing
def passing(pos):
    _hist_with_mean(pos.passing, 'Passing')

def passing_long(pos):
    _hist_with_mean(pos.pass_long, 'Long Passing')

# Defending
def defending(pos):
    _hist_with_mean(pos.defending, 'Defending')
```
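
Each helper above draws one histogram onto the current axes, so comparing one attribute across the four position groups means calling it several times. A minimal sketch of a side-by-side alternative built on `plt.subplots`; the function name `compare_groups` is hypothetical, and the default of 10 bins matches the `plt.hist` default used by the helpers.

```python
import matplotlib.pyplot as plt
import pandas as pd


def compare_groups(groups, column, bins=10):
    """Plot a histogram of `column` for each dataframe in `groups`, side by side.

    `groups` maps a label (e.g. 'Defense') to a dataframe; the red line marks
    each group's mean. Returns the figure so it can be saved or closed.
    """
    fig, axes = plt.subplots(1, len(groups), figsize=(4 * len(groups), 3),
                             sharex=True, squeeze=False)
    for ax, (label, frame) in zip(axes[0], groups.items()):
        values = frame[column].dropna()
        ax.hist(values, bins=bins, alpha=0.5)
        ax.axvline(values.mean(), c='red')
        ax.set_title(f'{label}: {column}')
    fig.tight_layout()
    return fig


# Usage with the position dataframes defined earlier:
# compare_groups({'Goal': goal, 'Defense': defense,
#                 'Midfield': midfield, 'Attack': attack}, 'pace')
```

Sharing the x-axis keeps the four histograms on the same scale, so shifts in the red mean lines between groups are directly comparable.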



### Cleaning Data

#### 
> 

<a id='eda'></a>
## Exploratory Data Analysis

### Research Question 1  - 
> **Answer** = 