# FIFA PLAYER ANALYSIS & RECOMMENDATION

- Part 1: FIFA Player Analysis (You are here)
- Part 2: FIFA Player Recommendation (Go to this notebook)

**Table of Contents**

| Sr. No | Section Title | Description |
| ------ | ------------- | ----------- |
| 00 | Drive & Files | Linking your Google Drive and uploading your Kaggle API token |
| 01 | Kaggle Setup | Information & details on downloading dataset via Kaggle API |
| 02 | Modules & Library | Information on setting up requirements for training and inference |
| 03 | Data Analysis | Information on Analysis of players |

Quick Summary of Analysis

1. Top 5 Players per Position
2. Top 5 Highest Earning players
3. Most common and rare player positions
4. Top 10 Highest Potential Players by Age
5. Comparison: International Reputation vs Overall
6. Comparison: International Reputation vs Wages Earned
7. Impact of Speed on Attacker/Midfielder/Defender Player Ratings
8. Impact of Age on Wages/Skills
9. Mean Age per Position
10. Top 10 Countries with Highest Rated Players
11. Top 10 Clubs with Highest Rated Players
12. Top 10 Players by Work Rate

## Section 0: Drive & Files

>     Linking your Google Drive and uploading your Kaggle API token

#### 1: Mounting your Google Drive

In [1]:
from google.colab import drive, files
# drive.mount('/content/drive')

#### 2: Uploading your Kaggle API Token File

In [2]:
files.upload()

Saving kaggle.json to kaggle.json


{'kaggle.json': b'{"username":"inboxpraveen17","key":"<redacted>"}'}

## Section 1: Kaggle Setup

> 	Information & details on downloading dataset via Kaggle API

#### 1: Enable Kaggle API for User Mode Access

In [3]:
!pip install -q kaggle
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

#### 2: Downloading the Dataset from Kaggle to Google Colab

In [4]:
!kaggle datasets download -d stefanoleone992/fifa-22-complete-player-dataset

Downloading fifa-22-complete-player-dataset.zip to /content
 99% 108M/109M [00:04<00:00, 28.5MB/s]
100% 109M/109M [00:04<00:00, 22.9MB/s]


#### 3: Extracting Dataset

A dataset downloaded from Kaggle usually arrives as a zip or tar archive. To access the data, we first need to extract it with a matching decompression tool.

In [5]:
## Unzip the dataset into /content/data directory 
!unzip /content/fifa-22-complete-player-dataset.zip -d /content/data

## Once extracted, remove the archive to save disk space.
!rm /content/fifa-22-complete-player-dataset.zip

Archive:  /content/fifa-22-complete-player-dataset.zip
  inflating: /content/data/Career Mode female player datasets - FIFA 16-22.xlsx  
  inflating: /content/data/Career Mode player datasets - FIFA 15-22.xlsx  
  inflating: /content/data/female_players_16.csv  
  inflating: /content/data/female_players_17.csv  
  inflating: /content/data/female_players_18.csv  
  inflating: /content/data/female_players_19.csv  
  inflating: /content/data/female_players_20.csv  
  inflating: /content/data/female_players_21.csv  
  inflating: /content/data/female_players_22.csv  
  inflating: /content/data/players_15.csv  
  inflating: /content/data/players_16.csv  
  inflating: /content/data/players_17.csv  
  inflating: /content/data/players_18.csv  
  inflating: /content/data/players_19.csv  
  inflating: /content/data/players_20.csv  
  inflating: /content/data/players_21.csv  
  inflating: /content/data/players_22.csv  
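
For readers running outside Colab, where the `unzip` and `rm` shell commands may not be available, the same extract-then-delete step can be sketched in plain Python. The helper name `extract_archive` and the demo file names are hypothetical, and the demo builds a small throwaway archive rather than using the Kaggle download.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir, remove_archive=False):
    """Unpack a zip or tar archive into target_dir and return the extracted files."""
    archive_path = Path(archive_path)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # shutil.unpack_archive infers the archive format from the file extension.
    shutil.unpack_archive(str(archive_path), str(target_dir))
    if remove_archive:
        # Delete the archive once its contents are on disk.
        archive_path.unlink()
    return sorted(p for p in target_dir.rglob("*") if p.is_file())


# Demo on a throwaway archive (hypothetical file names, not the Kaggle data).
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr("players_demo.csv", "short_name,overall\nA. Player,80\n")
    extracted = extract_archive(demo_zip, Path(tmp) / "data", remove_archive=True)
    extracted_names = [p.name for p in extracted]
    archive_still_exists = demo_zip.exists()
    print(extracted_names)
```

In the notebook itself, the call would presumably look like `extract_archive("/content/fifa-22-complete-player-dataset.zip", "/content/data", remove_archive=True)`.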


## Section 2: Modules & Library

>     Information on setting up requirements for training and inference

#### 1: Installing required packages

In [None]:
!pip install --quiet fastparquet
!pip install --quiet pyarrow

---

For your information: This notebook was written on Google Colab, which provides most of the modules pre-installed in its working environment. If you run it locally, you may need to install additional dependencies.

---

#### 2: Importing Required Packages

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from wordcloud import WordCloud
import math
import datetime as dt

%matplotlib inline

## Section 3: Data Analysis

>     Information on Analysis of players

In [2]:
fifa_22_players = pd.read_csv("/content/data/players_22.csv",low_memory=False)
print(fifa_22_players.shape)

(19239, 110)


In [3]:
fifa_22_players.head(3)

Unnamed: 0,sofifa_id,player_url,short_name,long_name,player_positions,overall,potential,value_eur,wage_eur,age,...,lcb,cb,rcb,rb,gk,player_face_url,club_logo_url,club_flag_url,nation_logo_url,nation_flag_url
0,158023,https://sofifa.com/player/158023/lionel-messi/...,L. Messi,Lionel Andrés Messi Cuccittini,"RW, ST, CF",93,93,78000000.0,320000.0,34,...,50+3,50+3,50+3,61+3,19+3,https://cdn.sofifa.net/players/158/023/22_120.png,https://cdn.sofifa.net/teams/73/60.png,https://cdn.sofifa.net/flags/fr.png,https://cdn.sofifa.net/teams/1369/60.png,https://cdn.sofifa.net/flags/ar.png
1,188545,https://sofifa.com/player/188545/robert-lewand...,R. Lewandowski,Robert Lewandowski,ST,92,92,119500000.0,270000.0,32,...,60+3,60+3,60+3,61+3,19+3,https://cdn.sofifa.net/players/188/545/22_120.png,https://cdn.sofifa.net/teams/21/60.png,https://cdn.sofifa.net/flags/de.png,https://cdn.sofifa.net/teams/1353/60.png,https://cdn.sofifa.net/flags/pl.png
2,20801,https://sofifa.com/player/20801/c-ronaldo-dos-...,Cristiano Ronaldo,Cristiano Ronaldo dos Santos Aveiro,"ST, LW",91,91,45000000.0,270000.0,36,...,53+3,53+3,53+3,60+3,20+3,https://cdn.sofifa.net/players/020/801/22_120.png,https://cdn.sofifa.net/teams/11/60.png,https://cdn.sofifa.net/flags/gb-eng.png,https://cdn.sofifa.net/teams/1354/60.png,https://cdn.sofifa.net/flags/pt.png


In [4]:
all_columns = list(fifa_22_players.columns)
for each_column in all_columns:
    print(f"{each_column:<50} => DataType({fifa_22_players[each_column].dtype})")

sofifa_id                                          => DataType(int64)
player_url                                         => DataType(object)
short_name                                         => DataType(object)
long_name                                          => DataType(object)
player_positions                                   => DataType(object)
overall                                            => DataType(int64)
potential                                          => DataType(int64)
value_eur                                          => DataType(float64)
wage_eur                                           => DataType(float64)
age                                                => DataType(int64)
dob                                                => DataType(object)
height_cm                                          => DataType(int64)
weight_kg                                          => DataType(int64)
club_team_id                                       => DataType(float64)
club_name

---

Important Observation: The columns "ls" to "gk" hold a player's predicted rating at each position, written as a base rating followed by a + (plus) or - (minus) adjustment.

Example: Suppose a player normally plays as a striker with an overall rating of 90. Played in midfield, he may be rated 88 rather than 90, in which case the CM (central midfield) value would be 90-2.
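
Because these position columns are stored as strings, they need to be split before any arithmetic. The following is a minimal sketch, assuming every value is either a plain integer or an integer followed by a signed adjustment (as in `50+3` or `90-2`); the function name `parse_position_rating` is hypothetical.

```python
import re

# A base rating, optionally followed by a sign and an adjustment, e.g. "50+3".
_POSITION_RATING = re.compile(r"^(\d+)(?:([+-])(\d+))?$")


def parse_position_rating(value):
    """Split a string such as '50+3' into (base, adjustment) integers."""
    match = _POSITION_RATING.match(str(value).strip())
    if match is None:
        raise ValueError(f"Unexpected position rating: {value!r}")
    base = int(match.group(1))
    adjustment = int(match.group(3) or 0)
    if match.group(2) == "-":
        adjustment = -adjustment
    return base, adjustment


print(parse_position_rating("50+3"))  # (50, 3)
print(parse_position_rating("90-2"))  # (90, -2)
print(parse_position_rating("88"))    # (88, 0)
```

Whether the adjustment should be added to the base rating depends on how the dataset defines it, so the sketch returns both parts and leaves that choice to the caller.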

---

Even though we have mixed data types, we can mainly classify columns into 2 major categories:

1. Numerical domain columns
2. Non-Numerical domain columns

We can then use the numerical columns for the main analysis and look into the non-numerical features later.
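
As an alternative to comparing each column's dtype by hand, pandas offers `DataFrame.select_dtypes`. The sketch below uses a small made-up frame (the rows are illustrative, not taken from the dataset) to show the same numerical/non-numerical split.

```python
import pandas as pd

# Toy frame with illustrative values only.
demo = pd.DataFrame({
    "short_name": ["A. Player", "B. Player"],
    "overall": [80, 75],
    "wage_eur": [10000.0, None],
    "st": ["70+2", "65+2"],
})

# Integer and float columns count as numeric; everything else does not.
numeric_cols = demo.select_dtypes(include="number").columns.tolist()
other_cols = demo.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)  # ['overall', 'wage_eur']
print(other_cols)    # ['short_name', 'st']
```

The two approaches can disagree on boolean or datetime columns, which `include="number"` leaves out but a `dtype != "O"` check would count as numerical.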

In [5]:
## Creating numerical columns
numerical_columns = [each_column for each_column in all_columns if (fifa_22_players[each_column]).dtype != "O"]

## Creating non-numerical columns
nonnumerical_columns = [each_column for each_column in all_columns if (fifa_22_players[each_column]).dtype == "O"]

In [6]:
print(f"Total Columns: {len(all_columns)}")
print(f"Total Numerical Columns: {len(numerical_columns)}")
print(f"Total Non-Numerical Columns: {len(nonnumerical_columns)}")

Total Columns: 110
Total Numerical Columns: 60
Total Non-Numerical Columns: 50


In [7]:
print(numerical_columns)

['sofifa_id', 'overall', 'potential', 'value_eur', 'wage_eur', 'age', 'height_cm', 'weight_kg', 'club_team_id', 'league_level', 'club_jersey_number', 'club_contract_valid_until', 'nationality_id', 'nation_team_id', 'nation_jersey_number', 'weak_foot', 'skill_moves', 'international_reputation', 'release_clause_eur', 'pace', 'shooting', 'passing', 'dribbling', 'defending', 'physic', 'attacking_crossing', 'attacking_finishing', 'attacking_heading_accuracy', 'attacking_short_passing', 'attacking_volleys', 'skill_dribbling', 'skill_curve', 'skill_fk_accuracy', 'skill_long_passing', 'skill_ball_control', 'movement_acceleration', 'movement_sprint_speed', 'movement_agility', 'movement_reactions', 'movement_balance', 'power_shot_power', 'power_jumping', 'power_stamina', 'power_strength', 'power_long_shots', 'mentality_aggression', 'mentality_interceptions', 'mentality_positioning', 'mentality_vision', 'mentality_penalties', 'mentality_composure', 'defending_marking_awareness', 'defending_standi

In [8]:
print(nonnumerical_columns)

['player_url', 'short_name', 'long_name', 'player_positions', 'dob', 'club_name', 'league_name', 'club_position', 'club_loaned_from', 'club_joined', 'nationality_name', 'nation_position', 'preferred_foot', 'work_rate', 'body_type', 'real_face', 'player_tags', 'player_traits', 'ls', 'st', 'rs', 'lw', 'lf', 'cf', 'rf', 'rw', 'lam', 'cam', 'ram', 'lm', 'lcm', 'cm', 'rcm', 'rm', 'lwb', 'ldm', 'cdm', 'rdm', 'rwb', 'lb', 'lcb', 'cb', 'rcb', 'rb', 'gk', 'player_face_url', 'club_logo_url', 'club_flag_url', 'nation_logo_url', 'nation_flag_url']


In [9]:
## Now let's look at the empty values in each numerical column
fifa_22_players[numerical_columns].isna().sum()

sofifa_id                          0
overall                            0
potential                          0
value_eur                         74
wage_eur                          61
age                                0
height_cm                          0
weight_kg                          0
club_team_id                      61
league_level                      61
club_jersey_number                61
club_contract_valid_until         61
nationality_id                     0
nation_team_id                 18480
nation_jersey_number           18480
weak_foot                          0
skill_moves                        0
international_reputation           0
release_clause_eur              1176
pace                            2132
shooting                        2132
passing                         2132
dribbling                       2132
defending                       2132
physic                          2132
attacking_crossing                 0
attacking_finishing                0
a

Columns with highest number of empty values:

- goalkeeping_speed
- nation_team_id
- nation_jersey_number
- pace
- shooting
- passing
- dribbling
- defending
- physic
- release_clause_eur
- value_eur
- wage_eur
- club_team_id
- league_level
- club_jersey_number
- club_contract_valid_until

***We can delete the columns that will not help much in our analysis. Beyond that, our only option is to drop the players who lack proper statistic values in FIFA.***
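
One way to make the choice of columns less manual is to rank them by their share of empty values and drop those above a cut-off. This is a sketch on a made-up frame; the 0.5 threshold is an arbitrary assumption, not a value used in this notebook.

```python
import numpy as np
import pandas as pd

# Toy frame with illustrative values only.
demo = pd.DataFrame({
    "overall": [90, 85, 80, 75],
    "wage_eur": [1000.0, np.nan, 800.0, 700.0],
    "nation_team_id": [np.nan, np.nan, np.nan, 1369.0],
})

# Fraction of empty values per column, largest first.
missing_share = demo.isna().mean().sort_values(ascending=False)
print(missing_share)

threshold = 0.5  # hypothetical cut-off: drop columns that are more than half empty
columns_to_drop = missing_share[missing_share > threshold].index.tolist()
trimmed = demo.drop(columns=columns_to_drop)

print(columns_to_drop)         # ['nation_team_id']
print(trimmed.columns.tolist())  # ['overall', 'wage_eur']
```

A threshold alone does not capture whether a column matters for the analysis, so the result is best treated as a starting list to review by hand.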

In [10]:
## Now let's look at the empty values in each non-numerical column
fifa_22_players[nonnumerical_columns].isna().sum()

player_url              0
short_name              0
long_name               0
player_positions        0
dob                     0
club_name              61
league_name            61
club_position          61
club_loaned_from    18137
club_joined          1163
nationality_name        0
nation_position     18480
preferred_foot          0
work_rate               0
body_type               0
real_face               0
player_tags         17798
player_traits        9841
ls                      0
st                      0
rs                      0
lw                      0
lf                      0
cf                      0
rf                      0
rw                      0
lam                     0
cam                     0
ram                     0
lm                      0
lcm                     0
cm                      0
rcm                     0
rm                      0
lwb                     0
ldm                     0
cdm                     0
rdm                     0
rwb         

Non-numerical columns with the highest number of empty values:

- nation_position
- nation_logo_url
- club_loaned_from
- player_tags
- player_traits
- club_joined
- club_name
- league_name
- club_position
- club_logo_url
- club_flag_url

***We can delete the columns that will not help much in our analysis. Beyond that, our only option is to drop the players who lack proper statistic values in FIFA.***

In [11]:
fifa_22_players.drop(
    [
        "nation_position", "nation_logo_url", "club_loaned_from", "player_tags", "player_traits",
        "club_joined", "club_name", "league_name", "club_position", "club_logo_url", "club_flag_url",
        "goalkeeping_speed", "nation_team_id", "nation_jersey_number", "release_clause_eur",
        "club_team_id", "league_level", "club_jersey_number", "club_contract_valid_until",
    ],
    axis=1,
    inplace=True
)

In [12]:
all_columns = list(fifa_22_players.columns)

## Creating numerical columns
numerical_columns = [each_column for each_column in all_columns if (fifa_22_players[each_column]).dtype != "O"]

## Creating non-numerical columns
nonnumerical_columns = [each_column for each_column in all_columns if (fifa_22_players[each_column]).dtype == "O"]

print(f"Total Columns: {len(all_columns)}")
print(f"Total Numerical Columns: {len(numerical_columns)}")
print(f"Total Non-Numerical Columns: {len(nonnumerical_columns)}")

Total Columns: 91
Total Numerical Columns: 52
Total Non-Numerical Columns: 39


We have dropped 19 columns and are now left with 91 columns.

In [13]:
## Now let's look at the empty values left in each numerical column
fifa_22_players[numerical_columns].isna().sum()

sofifa_id                         0
overall                           0
potential                         0
value_eur                        74
wage_eur                         61
age                               0
height_cm                         0
weight_kg                         0
nationality_id                    0
weak_foot                         0
skill_moves                       0
international_reputation          0
pace                           2132
shooting                       2132
passing                        2132
dribbling                      2132
defending                      2132
physic                         2132
attacking_crossing                0
attacking_finishing               0
attacking_heading_accuracy        0
attacking_short_passing           0
attacking_volleys                 0
skill_dribbling                   0
skill_curve                       0
skill_fk_accuracy                 0
skill_long_passing                0
skill_ball_control          

In [14]:
## Now let's look at the empty values left in each non-numerical column
fifa_22_players[nonnumerical_columns].isna().sum()

player_url          0
short_name          0
long_name           0
player_positions    0
dob                 0
nationality_name    0
preferred_foot      0
work_rate           0
body_type           0
real_face           0
ls                  0
st                  0
rs                  0
lw                  0
lf                  0
cf                  0
rf                  0
rw                  0
lam                 0
cam                 0
ram                 0
lm                  0
lcm                 0
cm                  0
rcm                 0
rm                  0
lwb                 0
ldm                 0
cdm                 0
rdm                 0
rwb                 0
lb                  0
lcb                 0
cb                  0
rcb                 0
rb                  0
gk                  0
player_face_url     0
nation_flag_url     0
dtype: int64

We can now remove the players that still have empty values.

In [15]:
print(f"Shape before dropping NA: {fifa_22_players.shape}")
fifa_22_players.dropna(inplace=True)
print(f"Shape after dropping NA: {fifa_22_players.shape}")

Shape before dropping NA: (19239, 91)
Shape after dropping NA: (17041, 91)
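
The shapes above show that `dropna` removed 19239 - 17041 = 2198 players. Before dropping rows, it can help to see which columns are responsible and whether a narrower `subset` would keep more players. The sketch below uses a made-up frame with illustrative values.

```python
import numpy as np
import pandas as pd

# Toy frame with illustrative values only.
demo = pd.DataFrame({
    "short_name": ["A", "B", "C", "D"],
    "overall": [90, 85, 80, 75],
    "pace": [80.0, np.nan, 70.0, np.nan],
    "wage_eur": [1000.0, 500.0, np.nan, 400.0],
})

# Rows that contain at least one empty value, and the columns causing it.
rows_with_gaps = demo.isna().any(axis=1)
gap_counts = demo.loc[rows_with_gaps].isna().sum()
print(gap_counts[gap_counts > 0])

# Dropping on every column versus only on the columns an analysis needs.
cleaned = demo.dropna()
cleaned_for_wages = demo.dropna(subset=["wage_eur"])

print(len(demo), len(cleaned), len(cleaned_for_wages))  # 4 1 3
```

Using `subset` keeps a player in analyses that do not depend on the empty column, at the cost of carrying a separate filtered frame per analysis.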


---

We now have final data which can be used for further analysis. 

Let's find some useful Insights.

---

#### 1. Top 5 Players Per Position

In [16]:
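## For each position string, rank players by overall rating and keep the top 5
top_5_players_per_position = fifa_22_players[["short_name","overall","player_positions"]].groupby(
    ['player_positions']
).apply(
    ## max (not sum) so that two players sharing a short_name are not added together
    lambda x: (x.groupby('short_name')[['overall']].max().sort_values('overall', ascending=False)).head(5)
)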

In [17]:
top_5_players_per_position.head(20)

Unnamed: 0_level_0,Unnamed: 1_level_0,overall
player_positions,short_name,Unnamed: 2_level_1
CAM,Bruno Fernandes,88
CAM,Jonathan Viera,81
CAM,N. Vlašić,80
CAM,N. Lodeiro,80
CAM,J. Lingard,79
"CAM, CDM",P. Kasami,74
"CAM, CDM",D. Johnson,72
"CAM, CDM",V. Gvilia,67
"CAM, CDM",M. Chrapek,66
"CAM, CDM",M. Czyżycki,63
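
A similar ranking can be produced without `apply` by sorting once and taking the first rows of each group. This is a sketch on a made-up frame; the names are placeholders and `top_n` is set to 2 only to keep the demo small.

```python
import pandas as pd

# Toy frame with placeholder names and illustrative ratings.
demo = pd.DataFrame({
    "short_name": ["A", "B", "C", "D", "E"],
    "overall": [88, 81, 80, 74, 72],
    "player_positions": ["CAM", "CAM", "CAM", "CAM, CDM", "CAM, CDM"],
})

top_n = 2
top_per_position = (
    demo.sort_values(["player_positions", "overall"], ascending=[True, False])
    .groupby("player_positions")
    .head(top_n)  # first top_n rows of each group, in sorted order
    .set_index(["player_positions", "short_name"])
)
print(top_per_position)
```

Unlike the cell above, this keeps one row per player, so two players who share a `short_name` both appear.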


Some player positions have only a very limited number of players. Thus, we may need an approach that reduces the number of unique position values, ideally to a single position per player.

This generalization could be the preferred position - which can be obtained from the position in which the player has been rated the highest.
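
The preferred-position idea can be sketched as follows, assuming the position columns hold strings such as `85+2` and that the leading base rating is what should be compared. The frame, the three-column subset and the `preferred_position` column name are all hypothetical.

```python
import pandas as pd

# Toy frame with illustrative values only.
demo = pd.DataFrame({
    "short_name": ["A. Striker", "B. Defender"],
    "player_positions": ["ST, LW", "CB, RB"],
    "st": ["85+2", "50+3"],
    "cm": ["70+2", "60+3"],
    "cb": ["45+2", "78+3"],
})
position_columns = ["st", "cm", "cb"]  # hypothetical subset of the "ls" to "gk" columns

# Keep only the leading base rating of each value and convert it to an integer.
base_ratings = demo[position_columns].apply(
    lambda col: col.str.extract(r"^(\d+)", expand=False).astype(int)
)

# The preferred position is the column with the highest base rating per player.
demo["preferred_position"] = base_ratings.idxmax(axis=1).str.upper()
print(demo[["short_name", "preferred_position"]])
```

When several positions tie, `idxmax` returns the first matching column, so the column order acts as a tie-breaker in this sketch.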