# Learning data analysis and visualization with FIFA21


## Player performance, age influence, key characteristics per position, preferred foot

This notebook covers four main studies:

* Understand the key characteristics needed to be a good player in a given position
* Understand how age influences the average level of a player
* See whether there is a significant difference between left-footed and right-footed players
* Get an overview of the share of each position and other statistics about football players


![Alt text](https://sportslens.com/wp-content/uploads/2020/05/FIFA21.jpg)


First we import the necessary libraries.

In [None]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline 


In [None]:
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

We open the dataset of FIFA 21 statistics about football players and select the columns needed for the analysis.

In [None]:
# Load the full dataset, then keep only the columns used in this notebook
df = pd.read_csv("/kaggle/input/fifa-21-complete-player-dataset/fifa21_male2.csv")
cols = ['Name', 'Age', 'OVA', 'Nationality', 'foot', 'Value', 'BP', 'Height',
        'PAC', 'SHO', 'PAS', 'DRI', 'DEF', 'PHY']
df = df[cols]
df.head()
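
Selecting columns by name fails with a `KeyError` if one of them is missing or spelled differently in the file. The sketch below shows one way to check the header before loading. It uses a tiny in-memory CSV with made-up rows as a stand-in for the real file, and the names `csv_text`, `wanted` and `sample` are only illustrative.

```python
import io
import pandas as pd

# Tiny stand-in for the real CSV (made-up rows, a subset of the column names).
csv_text = """Name,Age,OVA,BP,foot,Club
A. Example,24,78,CB,Right,Club X
B. Sample,31,82,ST,Left,Club Y
"""

wanted = ["Name", "Age", "OVA", "BP", "foot"]

# Read only the header row to see which of the wanted columns exist.
available = pd.read_csv(io.StringIO(csv_text), nrows=0).columns
missing = [c for c in wanted if c not in available]
if missing:
    raise KeyError(f"Columns not found in the file: {missing}")

# usecols loads just the requested columns; the final [wanted] fixes their order.
sample = pd.read_csv(io.StringIO(csv_text), usecols=wanted)[wanted]
print(sample.shape)
```

With the real dataset, the same pattern would apply to the Kaggle path used above instead of `io.StringIO(csv_text)`.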

We check that the shape is as expected and that the data contains no unexpected or missing values.

In [None]:
df.shape


In [None]:
df.info()

In [None]:
df.describe()
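
The summaries above give a general picture, but missing values and duplicated rows can also be counted directly. This is a minimal sketch on a small made-up frame (`sample` is not the notebook's `df`); the same two calls should work unchanged on the real data.

```python
import numpy as np
import pandas as pd

# Made-up rows: one missing age and one fully duplicated row.
sample = pd.DataFrame({
    "Name": ["A", "B", "C", "C"],
    "Age": [24, np.nan, 27, 27],
    "OVA": [78, 82, 70, 70],
})

# Number of missing values in each column.
missing_per_column = sample.isna().sum()

# Number of rows that are exact copies of an earlier row.
duplicate_rows = int(sample.duplicated().sum())

print(missing_per_column)
print("duplicated rows:", duplicate_rows)
```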

We need to make sure the integer columns in the dataset really are integers and are not hiding text values.

In [None]:
df.dtypes
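
A column that shows up as `object` in `dtypes` may contain a few text entries mixed in with numbers. One possible way to locate them, sketched here on made-up data, is to convert with `pd.to_numeric(..., errors="coerce")` and look at the rows that became `NaN`. The `"n/a"` entry is an invented example, not something observed in the dataset.

```python
import pandas as pd

# Made-up column where one entry is text instead of a number.
sample = pd.DataFrame({"OVA": ["78", "82", "n/a", "70"]})

# Entries that cannot be parsed as numbers become NaN instead of raising.
as_number = pd.to_numeric(sample["OVA"], errors="coerce")

# Rows that were present in the original but failed the conversion.
bad_rows = sample[as_number.isna() & sample["OVA"].notna()]
print(bad_rows)
```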


# Age statistics
We start with summary statistics on the age of the 17125 players, which gives an idea of the age distribution.


In [None]:
df['Age'].describe()

In [None]:
df['Age'].plot(kind='box', vert=False, figsize=(14,6))

In [None]:

df['Age'].plot(kind='density', figsize=(14,6))
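
The box plot can also be read numerically. By default matplotlib limits the whiskers to 1.5 times the interquartile range beyond the quartiles and draws points outside that range individually. The sketch below computes those limits on a made-up list of ages (`ages` is illustrative, not the real column).

```python
import pandas as pd

# Made-up ages, including one unusually old player.
ages = pd.Series([17, 19, 21, 22, 24, 25, 27, 29, 31, 42], name="Age")

# Quartiles and interquartile range (the box of the box plot).
q1, q3 = ages.quantile([0.25, 0.75])
iqr = q3 - q1

# Limits beyond which a point is drawn as an individual outlier.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = ages[(ages < lower) | (ages > upper)]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print("outliers:", outliers.tolist())
```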

# Nationality distribution

In [None]:
df['Nationality'].value_counts()
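
Raw counts are easier to compare when shown next to each country's share of the total. A minimal sketch on made-up nationalities follows; the column names `players` and `share_pct` are chosen only for this example.

```python
import pandas as pd

# Made-up nationalities standing in for the real column.
nat = pd.Series(
    ["England", "England", "Spain", "France", "England", "Spain"],
    name="Nationality",
)

# Counts sorted from most to least frequent, then the share of the total in percent.
top = nat.value_counts().to_frame("players")
top["share_pct"] = (top["players"] / top["players"].sum() * 100).round(1)

# Keep only the most represented countries.
top = top.head(2)
print(top)
```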


# Distribution of roles within the team across all players

In [None]:
df['BP'].value_counts().plot(kind='pie', figsize=(6,6))

# Number of players per main position

In [None]:
ax = df['BP'].value_counts().plot(kind='bar', figsize=(14,6))
ax.set_ylabel('Number of players')  # value_counts() returns counts, not market values
ax.set_xlabel('Main position')
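
The bar chart above counts how many players there are per position. To compare market values per position instead, the `Value` column would first have to be numeric. The sketch below assumes values are stored as strings such as `'€67.5M'` or `'€625K'`; that format is an assumption to check against the real column, and `parse_value` is a hypothetical helper written for this example.

```python
import pandas as pd

def parse_value(text):
    """Convert a string such as '€67.5M' or '€625K' to a float in euros.

    Assumes an optional leading euro sign and an optional M or K suffix.
    """
    text = str(text).replace("€", "").strip()
    multiplier = 1.0
    if text.endswith("M"):
        multiplier, text = 1e6, text[:-1]
    elif text.endswith("K"):
        multiplier, text = 1e3, text[:-1]
    return float(text) * multiplier

# Made-up players standing in for the real data.
sample = pd.DataFrame({
    "BP": ["ST", "ST", "CB", "GK"],
    "Value": ["€67.5M", "€2.5M", "€625K", "€0"],
})
sample["value_eur"] = sample["Value"].map(parse_value)

# Median value per position, highest first.
median_by_position = (
    sample.groupby("BP")["value_eur"].median().sort_values(ascending=False)
)
print(median_by_position)
```

The median is used here rather than the mean because a few very expensive players could otherwise dominate a position's average.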

# Study of the most important statistics for a good player at a specific position: Center Back case

In [None]:
df[df['BP'] == 'CB'] # keep only Center Back

We build the correlation matrix to find which abilities matter most for the CB position.

In [None]:

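# Correlation matrix for CB players; text columns are dropped first because
# recent pandas versions raise an error when corr() meets non-numeric columns
corr = df[df['BP'] == 'CB'].select_dtypes(include='number').corr()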
corr
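
The full matrix is large, while the question here is mainly which attributes move together with the overall rating. One way to read that off is to take the `OVA` column of the matrix and sort it. The sketch below does this on a few made-up center backs; `cb` and `ranking` are illustrative names.

```python
import pandas as pd

# Made-up center backs: DEF follows OVA closely, PAC does not.
cb = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "OVA": [60, 65, 70, 75, 80],
    "DEF": [58, 66, 71, 77, 82],
    "PAC": [80, 62, 75, 60, 70],
})

# Keep numeric columns only, then compute pairwise Pearson correlations.
corr = cb.select_dtypes(include="number").corr()

# Correlation of each attribute with OVA, strongest first.
ranking = corr["OVA"].drop("OVA").sort_values(ascending=False)
print(ranking)
```

A high correlation only says that the attribute and the rating vary together; it does not show that the attribute causes the rating.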

We plot the correlation matrix.

In [None]:
fig = plt.figure(figsize=(8,8))
plt.matshow(corr, cmap='RdBu', fignum=fig.number)
plt.xticks(range(len(corr.columns)), corr.columns, rotation='vertical');
plt.yticks(range(len(corr.columns)), corr.columns);

# Study of the most important statistics for a good player at a specific position: Striker case

In [None]:

corr = df[df['BP'] == 'ST'].select_dtypes(include='number').corr()  # correlation matrix for strikers (numeric columns only)
corr

In [None]:
fig = plt.figure(figsize=(8,8))
plt.matshow(corr, cmap='RdBu', fignum=fig.number)
plt.xticks(range(len(corr.columns)), corr.columns, rotation='vertical');
plt.yticks(range(len(corr.columns)), corr.columns);

# Study of the most important statistics for a good player at a specific position: Goalkeeper case

In [None]:

corr = df[df['BP'] == 'GK'].select_dtypes(include='number').corr()  # correlation matrix for goalkeepers (numeric columns only)
corr

In [None]:
fig = plt.figure(figsize=(8,8))
plt.matshow(corr, cmap='RdBu', fignum=fig.number)
plt.xticks(range(len(corr.columns)), corr.columns, rotation='vertical');
plt.yticks(range(len(corr.columns)), corr.columns);

# Study of the most important statistics for a good player at a specific position: Right Wing case

In [None]:
corr = df[df['BP'] == 'RW'].select_dtypes(include='number').corr()  # correlation matrix for right wingers (numeric columns only)
corr

In [None]:
fig = plt.figure(figsize=(8,8))
plt.matshow(corr, cmap='RdBu', fignum=fig.number)
plt.xticks(range(len(corr.columns)), corr.columns, rotation='vertical');
plt.yticks(range(len(corr.columns)), corr.columns);
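
The four position studies repeat the same steps, so the correlations with `OVA` can also be gathered side by side in one table. This is a sketch on made-up data; `ova_correlations` is a hypothetical helper, not part of pandas.

```python
import pandas as pd

def ova_correlations(frame, positions, target="OVA"):
    """Return one column per position with each attribute's correlation to target."""
    columns = {}
    for pos in positions:
        subset = frame[frame["BP"] == pos].select_dtypes(include="number")
        columns[pos] = subset.corr()[target].drop(target)
    return pd.DataFrame(columns)

# Made-up players: DEF drives OVA for CB, SHO drives OVA for ST.
sample = pd.DataFrame({
    "BP":  ["CB"] * 4 + ["ST"] * 4,
    "OVA": [60, 65, 70, 75, 62, 68, 74, 80],
    "DEF": [58, 66, 71, 77, 40, 35, 38, 30],
    "SHO": [30, 28, 35, 25, 60, 67, 75, 82],
})

table = ova_correlations(sample, ["CB", "ST"])
print(table.round(2))
```

On the real data, the list of positions could be `['CB', 'ST', 'GK', 'RW']` to match the cases studied above.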

# Comparison of the average score between left-footed and right-footed players

First we compute the average overall rating (OVA) of right-footed players.

In [None]:
df.loc[(df['foot'] == 'Right'), 'OVA'].mean() # right foot average player

Then we compare it with the average for left-footed players. The difference between the two means is small.

In [None]:
df.loc[(df['foot'] == 'Left'), 'OVA'].mean() # left foot average player 
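
Comparing two means by eye does not account for the spread and size of each group. A rough check, sketched below on made-up ratings, is Welch's t-statistic: the difference of the means divided by its standard error. The data and the variable names are illustrative, and the statistic is computed by hand with NumPy rather than with a statistics library.

```python
import numpy as np
import pandas as pd

# Made-up ratings for right-footed and left-footed players.
sample = pd.DataFrame({
    "foot": ["Right"] * 5 + ["Left"] * 4,
    "OVA": [64, 70, 66, 72, 68, 65, 71, 67, 70],
})

# Mean, sample variance and group size for each foot.
stats = sample.groupby("foot")["OVA"].agg(["mean", "var", "count"])
right, left = stats.loc["Right"], stats.loc["Left"]

# Welch's t: difference of means over the standard error of that difference.
diff = right["mean"] - left["mean"]
std_err = np.sqrt(right["var"] / right["count"] + left["var"] / left["count"])
t_stat = diff / std_err

print(stats)
print(f"difference={diff:.2f}, t={t_stat:.2f}")
```

As a rule of thumb for large groups, an absolute value of t well below 2 suggests the gap could be sampling noise. A proper p-value would also need the degrees of freedom.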

# Evolution of the average level with player age

In [None]:
ax = df[['OVA', 'Age']].boxplot(by='Age', figsize=(10,6))
ax.set_ylabel('OVA')


In [None]:
df.plot(kind='scatter', x='Age', y='OVA', figsize=(6,6))
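
The plots show the trend visually. The average rating per age can also be tabulated to find the age at which it peaks. This is a minimal sketch on made-up data; `by_age` and `peak_age` are illustrative names.

```python
import pandas as pd

# Made-up players: ratings rise until the late twenties, then dip.
sample = pd.DataFrame({
    "Age": [18, 18, 22, 22, 27, 27, 33, 33],
    "OVA": [58, 62, 66, 70, 74, 78, 70, 72],
})

# Mean and median rating per age, plus how many players back each figure.
by_age = sample.groupby("Age")["OVA"].agg(["mean", "median", "count"])

# Age with the highest mean rating.
peak_age = by_age["mean"].idxmax()

print(by_age)
print("peak age:", peak_age)
```

The `count` column matters on the real data: ages with very few players give noisy averages.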

# Do some positions have better average scores?

In [None]:

ax = df[['OVA', 'BP']].boxplot(by='BP', figsize=(10,6)) # average player level by position
ax.set_ylabel('OVA')


From the first plot there seems to be no significant difference, but is there one when looking only at the superstar players (score > 84, i.e. the 109 best players)?


In [None]:

dt = df[df['OVA'] > 84]  # keep only players rated above 84
dt.shape


We plot the scores by position. We can see that ST, RW, LW and GK are more represented among the very top players.

In [None]:

ax = dt[['OVA', 'BP']].boxplot(by='BP', figsize=(10,6)) # level of the top players by position
ax.set_ylabel('OVA') # distribution of positions among top players (OVA > 84)
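
Whether a position is over-represented among the top players depends on how common it is overall. One way to check is to divide each position's share among top players by its share among all players. The sketch below uses made-up data and illustrative names; a ratio above 1 means the position appears more often at the top than in the whole population.

```python
import pandas as pd

# Made-up players with their best position and overall rating.
sample = pd.DataFrame({
    "BP": ["ST", "ST", "CB", "CB", "CB", "CB", "GK", "GK", "RW", "CM"],
    "OVA": [90, 86, 70, 72, 85, 68, 88, 71, 87, 75],
})

# Share of each position among all players.
overall_share = sample["BP"].value_counts(normalize=True)

# Share of each position among top players; positions absent from the top get 0.
top_share = (
    sample.loc[sample["OVA"] > 84, "BP"]
    .value_counts(normalize=True)
    .reindex(overall_share.index, fill_value=0)
)

# Ratio above 1: over-represented at the top; below 1: under-represented.
ratio = (top_share / overall_share).sort_values(ascending=False)
print(ratio)
```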
