# FIFA 20 Analysis

## Analysis of FIFA 20 Game Set

FIFA is a video game series released annually by EA Sports. Each edition adds new players and updates player attributes according to their real-world performances. The game includes data on players and teams from almost all leagues and countries in the world. FIFA 20 was released in late September 2019, with player attributes rated accordingly.

### Brief Description of the data set

- Data on every player available in FIFA 20
- Player positions, with the role in the club and in the national team
- Player attributes with statistics such as Attacking, Skills, Defense, Mentality, GK Skills, etc.
- Player personal data such as Nationality, Club, DateOfBirth, Wage, Salary, etc.

In [None]:
#Import all libraries
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import plotly as pl
from plotly.offline import plot
import re
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
#Import the dataset
#on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
d_20 = pd.read_csv('/kaggle/input/fifa-20-complete-player-dataset/players_20.csv', on_bad_lines='skip')

### Checking the dataset

In [None]:
d_20.head()

In [None]:
d_20.shape

In [None]:
cols = list(d_20.columns)

print(cols)

In [None]:
u_c = ['dob','sofifa_id','player_url','long_name','body_type','real_face','loaned_from','nation_position','nation_jersey_number']

##### Remove the columns that are not required

In [None]:
d_20 = d_20.drop(u_c,axis=1)

d_20.head()

## Data Manipulation

In [None]:
d_20['BMI'] = d_20['weight_kg'] / ((d_20['height_cm'] / 100)**2)

d_20.head()

In [None]:
d_20[['short_name','player_positions']]

In [None]:
#Distributing the player positions in different columns
new_player_position = d_20['player_positions'].str.get_dummies(sep = ', ').add_prefix('Position_')

new_player_position.head()

In [None]:
#Concatenate the new created columns with the dataset
d_20 = pd.concat([d_20,new_player_position], axis = 1)

d_20.head()

In [None]:
#Dropping the original position column to eliminate confusion
d_20 = d_20.drop('player_positions', axis = 1)
d_20.head()
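
To make the position encoding above concrete, here is a minimal sketch on a made-up three-player frame. The player names and positions are illustrative and not taken from the dataset; the only assumption carried over is that positions are separated by a comma and a space, as in `player_positions`.

```python
import pandas as pd

# Hypothetical sample: each player lists one or more positions
sample = pd.DataFrame({
    'short_name': ['Player A', 'Player B', 'Player C'],
    'player_positions': ['RW, CF, ST', 'ST', 'CM, CDM'],
})

# One indicator column per position, 1 where the player can play there
dummies = sample['player_positions'].str.get_dummies(sep=', ').add_prefix('Position_')

# Attach the indicators and drop the original text column
encoded = pd.concat([sample.drop('player_positions', axis=1), dummies], axis=1)
print(encoded)
```

Each player ends up with a 1 in every `Position_` column that appeared in the original text and a 0 elsewhere.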

### Cleaning the columns of different position attributes

In [None]:
positions = ['ls','st','rs','lw','lf','cf','rf','rw','lam','cam','ram','lm','lcm','cm','rcm','rm','lwb','ldm','cdm','rdm','rwb','lb','lcb','cb','rcb','rb']

In [None]:
d_20[positions]

In [None]:
for i in positions :
    d_20[i] = d_20[i].str.split('+', n = 1, expand = True)[0]
    
d_20.head()

In [None]:
#Filling the null values with 0 and converting the column into integer value
d_20[positions] = d_20[positions].fillna(0)

d_20[positions] = d_20[positions].astype(int)

d_20[positions]
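
The cleaning steps above can be sketched on a tiny made-up series. The values are hypothetical; the sketch assumes the position ratings are stored as text of the form `base+bonus`, which is what the split on `'+'` implies.

```python
import pandas as pd

# Hypothetical ratings in "base+bonus" text form; None marks a missing rating
raw = pd.Series(['89+2', '75+3', None], dtype=object)

# Keep the part before '+', treat missing ratings as 0, then convert to int
base = raw.str.split('+', n=1, expand=True)[0].fillna(0).astype(int)
print(base.tolist())
```

Note that filling with 0 makes a missing rating indistinguishable from a genuine rating of 0, which is acceptable here only because real ratings are never 0.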

In [None]:
style = ['dribbling','defending','physic','passing','shooting','pace']

In [None]:
#Filling the null values in the above columns with the median values
for i in style : 
    d_20[i] = d_20[i].fillna(d_20[i].median())

In [None]:
d_20[style].isnull().sum()

In [None]:
d_20[style] = d_20[style].astype(int)

In [None]:
#Filling all the null values of the data set by "0"

d_20 = d_20.fillna(0)

d_20.isnull().sum()

## Data Analysis and Visualization

#### Analysis based on Age

In [None]:
a = d_20['age']

fig = go.Figure()
fig.add_trace(
    go.Histogram(x=a,
                marker=dict(color='rgba(114, 186, 59, 0.5)'))
)
fig.add_shape(
        go.layout.Shape(type='line', xref='x', yref='paper',
                        x0=a.mean(), y0=0, x1=a.mean(), y1=0.9, line={'dash': 'dash'}),
)
fig.show()
print("Skewness of age is", d_20['age'].skew())


In [None]:
d_20.loc[d_20['age'] == d_20['age'].min()]

In [None]:
d_20.loc[d_20['age'] == d_20['age'].max()]

#### Most players are between 20 and 27 years old, with a mean age of around 25 years

The minimum age of the players in the game is 16 years, whereas the maximum age is 42 years.
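
The skewness printed above is positive when a distribution has a longer right tail. A small sketch with made-up ages (not taken from the dataset) shows how `Series.skew()` behaves for a right-tailed and a symmetric sample:

```python
import pandas as pd

# Hypothetical ages: mostly young players plus a few veterans (long right tail)
ages = pd.Series([18, 19, 20, 21, 22, 23, 24, 35, 40])

# Hypothetical symmetric sample for comparison
symmetric = pd.Series([20, 22, 24, 26, 28])

print('Right-tailed sample skew:', round(ages.skew(), 2))
print('Symmetric sample skew:', round(symmetric.skew(), 2))
```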



In [None]:
#The age VS overall Rating comparison
#Recent seaborn versions require x and y as keyword arguments; palette is dropped because no hue is used
avo = sns.lineplot(x = 'age', y = 'overall', data = d_20)
plt.title('Age vs Overall', fontsize = 20)

plt.show()

The overall ratings vary more among the youngest players and again among the oldest players.
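
One way to check this reading numerically is to tabulate, for each age, how many players there are and how much their ratings spread. The sketch below uses made-up players; the frame and its values are assumptions for illustration. Keep in mind that the shaded band seaborn draws by default is a confidence interval for the mean, so it also widens wherever few players share an age.

```python
import pandas as pd

# Hypothetical players: few very young or old players, many in mid-career
players = pd.DataFrame({
    'age':     [17, 17, 25, 25, 25, 25, 38, 38],
    'overall': [55, 75, 70, 72, 71, 73, 60, 82],
})

# Count, mean and standard deviation of the overall rating at each age
by_age = players.groupby('age')['overall'].agg(['count', 'mean', 'std'])
print(by_age)
```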

### Calculated BMI of Players

In [None]:
b = d_20['BMI']

fig = go.Figure()
fig.add_trace(
    go.Histogram(x=b,
                marker=dict(color='rgba(114, 18, 59, 0.5)'),
                )
)
fig.add_shape(
        go.layout.Shape(type='line', xref='x', yref='paper',
                        x0=b.mean(), y0=0, x1=b.mean(), y1=0.9, line={'dash': 'dash'}),
)
fig.show()

print("Skewness of BMI is", d_20['BMI'].skew())

##### The BMI of most players lies between 22 and 24, with a mean of around 23. The distribution is not noticeably skewed to either side.
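
For reference, the BMI column is weight in kilograms divided by the square of height in metres. A minimal worked example, with a hypothetical helper named `bmi` and a made-up player:

```python
def bmi(weight_kg, height_cm):
    """Body mass index: weight in kg divided by height in metres squared."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2

# A hypothetical player who is 180 cm tall and weighs 75 kg
print(round(bmi(75, 180), 2))
```

A player of that build lands at the centre of the 22 to 24 range described above.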

In [None]:
#Player with the highest BMI
d_20.loc[d_20['BMI'] == d_20['BMI'].max()][['short_name', 'age', 'overall', 'BMI']]

In [None]:
#Player with the lowest BMI
d_20.loc[d_20['BMI'] == d_20['BMI'].min()][['short_name', 'age', 'overall', 'BMI']]

### Preferred Foot to Shoot

In [None]:
pie_chart1 = px.pie(d_20, names = 'preferred_foot', title = 'Preferred Foot to Shoot')
pie_chart1.show()
print("The average of overall scores of players who prefer Right foot is", round(d_20.loc[d_20['preferred_foot'] == 'Right']['overall'].mean(), 2))

print("The average of overall scores of players who prefer Left foot is", round(d_20.loc[d_20['preferred_foot'] == 'Left']['overall'].mean(), 2))

### International Reputation


In [None]:
pie_chart2 = px.pie(d_20, names = 'international_reputation', title = 'International Reputation')
pie_chart2.show()
d_20['international_reputation'].value_counts()

In [None]:
d_20.loc[d_20['international_reputation'] == 5]

These 6 players with the highest international reputation in the game are also among the most renowned players in the world.

In [None]:
d_20.loc[d_20['international_reputation'] == 1].head(10)

Even some of the players with the lowest international reputation have high overall ratings and high potential.

### Value of the Players

In [None]:
scatter_plot = go.Figure(
data = go.Scatter(
    x = d_20 ['overall'],
    y = d_20 ['value_eur'],
    mode = 'markers',
    marker = dict(
    size = 10,
    color = d_20['age'],
    showscale = True
    ),
    text = d_20['short_name']
)
)

scatter_plot.update_layout(title = 'Scatter Plot Year 2020',
                   xaxis_title = 'Overall Rating',
                   yaxis_title = 'Value in EUR')
scatter_plot.show()



In [None]:
#Five highest-valued players (the original filter compared value_eur with itself and selected every row)
d_20.nlargest(5, 'value_eur')[['short_name','value_eur', 'club','age', 'overall' ]]

In [None]:
#Last five rows of the table; the rows appear to be ordered by overall rating, so these are the lowest-rated players
d_20.tail(5)[['short_name','value_eur', 'club','age', 'overall' ]]

In [None]:
#3D Scatter Plot
scatter5 = px.scatter_3d(d_20.head(50), x = 'overall', y = 'age', z = 'value_eur', color = 'short_name')
scatter5.update_layout(title = 'Top 50 player Value comparison with age and overall rating')
scatter5.show()

##### Player value falls as the overall rating decreases

The five highest-valued players are also among the top-rated players and play for well-known clubs.
The five players at the bottom of the table have low overall ratings and low values, and are also young.

The 3D scatter plot shows that the top 50 players are in their prime years and have very high overall ratings.
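
Ranking by value rather than by row order can be sketched with `nlargest` and `nsmallest`. The frame below is made up for illustration; only the column names follow the dataset.

```python
import pandas as pd

# Hypothetical players; note the best-rated player is not the most valuable
players = pd.DataFrame({
    'short_name': ['A', 'B', 'C', 'D'],
    'overall':    [94, 88, 70, 52],
    'value_eur':  [95_500_000, 105_500_000, 2_000_000, 40_000],
})

# Rows with the highest and lowest value_eur, regardless of row order
top_valued = players.nlargest(2, 'value_eur')
low_valued = players.nsmallest(2, 'value_eur')
print(top_valued[['short_name', 'value_eur']])
print(low_valued[['short_name', 'value_eur']])
```

On the real data, players whose value is recorded as 0 would come first in `nsmallest`, so it may be worth filtering them out before ranking.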

### Work Rate of Players

In [None]:

pie_chart2 = px.pie(d_20, names = 'work_rate', title = 'Work Rate')
pie_chart2.show()
d_20['work_rate'].value_counts()

In [None]:
d_20.loc[d_20['work_rate'] == 'High/High'].head(10)[['short_name','work_rate','age','BMI','club','overall','value_eur']]

##### Most players have a medium work rate

The players with the highest work rate (High/High) listed above play for top clubs, have high values and high ratings, and are not young.

### Club wise Analysis

In [None]:
#List of all the club names
Club =np.unique(d_20['club'])

#List of means of the overall ratings of the clubs
Club_mean = d_20.groupby(d_20['club'])['overall'].mean()


In [None]:
scatter_plot = go.Figure(
data = go.Scatter(
    x = Club,
    y = Club_mean,
    mode = 'markers',
    marker = dict(
    size = 10,
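    color = d_20.groupby('club')['value_eur'].sum(),  #total player value per club, in the same club order as Club_mean
    showscale = True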
    )
)
)

scatter_plot.update_layout(title = 'Mean Overall Rating of all teams',
                   xaxis_title = 'Clubs',
                   yaxis_title = 'Overall Rating')
scatter_plot.show()

In [None]:
#Clubs with players from the most different nations
d_20.groupby(['club'])['nationality'].nunique().sort_values(ascending = False).head()

In [None]:
#Clubs with players from the fewest different nations
d_20.groupby(['club'])['nationality'].nunique().sort_values(ascending = True).head()

#### The teams with the highest ratings are also the clubs with the highest values

The most diverse club has players from 19 different countries, while some clubs have players from only one country.
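
The diversity count comes from `nunique` within each club group. A minimal sketch on a made-up squad list (club names and nationalities are hypothetical):

```python
import pandas as pd

# Hypothetical squads: Club X has two nationalities, Club Y only one
squad = pd.DataFrame({
    'club':        ['Club X', 'Club X', 'Club X', 'Club Y', 'Club Y'],
    'nationality': ['Spain', 'Brazil', 'Spain', 'France', 'France'],
})

# Number of distinct nationalities per club, most diverse first
nations_per_club = (
    squad.groupby('club')['nationality']
    .nunique()
    .sort_values(ascending=False)
)
print(nations_per_club)
```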

## Positional Analysis

In [None]:
attacking = ['RW','LW','ST','CF','LS','RS','LF','RF']

piech1 = d_20.query('team_position in @attacking')

piechart1 = px.pie(piech1, names = 'team_position', color_discrete_sequence= px.colors.sequential.Magenta_r,
                  title = 'Pie Chart For Attacking Positions')
piechart1.show()

In [None]:
midfielding = ['CAM','RCM','CDM','LDM','RM','LM','LCM','RDM','RAM','CM','LAM']

piech2 = d_20.query('team_position in @midfielding')

piechart2 = px.pie(piech2, names = 'team_position', color_discrete_sequence= px.colors.sequential.Mint_r,
                  title = 'Pie Chart For Midfield Positions')
piechart2.show()

In [None]:
defending = ['LCB','RCB','LB','RB','CB','RWB','LWB']

piech3 = d_20.query('team_position in @defending')

piechart3 = px.pie(piech3, names = 'team_position', color_discrete_sequence= px.colors.sequential.Teal_r,
                  title = 'Pie Chart For Defensive Positions')
piechart3.show()

Players are well distributed across the midfield and defensive positions, with many players in each role.
Among the attacking positions, there are far more strikers than players in any other position.
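
The shares shown in the pie charts can also be read off as numbers. The sketch below uses a made-up lineup and `isin` in place of `query`, which selects the same rows; the data is an assumption for illustration.

```python
import pandas as pd

# Hypothetical team sheet with a mix of positions
lineup = pd.DataFrame({
    'team_position': ['ST', 'ST', 'ST', 'LW', 'RW', 'CB', 'GK', 'SUB'],
})

attacking = ['RW', 'LW', 'ST', 'CF', 'LS', 'RS', 'LF', 'RF']

# Keep only attacking positions, then compute each position's share
attackers = lineup[lineup['team_position'].isin(attacking)]
shares = attackers['team_position'].value_counts(normalize=True)
print(shares)
```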

### Model to list players according to position and budget

In [None]:
def top_players (pos, value):
    col = str('Position_')+str.upper(pos)
    targ = d_20[(d_20[col]==1) & (d_20['value_eur'] <= value)][['short_name','age','overall','BMI','value_eur']].head(10)
    return targ

In [None]:
top_players('lw',50000000)

#### This model helps a FIFA gamer in Manager Mode who wants to buy a new player for a given position within the money available.
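
The same idea can be sketched as a self-contained function that also rejects unknown positions and sorts by rating explicitly instead of relying on row order. The name `affordable_players`, its parameters and the sample market are hypothetical and not part of the notebook.

```python
import pandas as pd

def affordable_players(data, pos, budget, n=10):
    """Return up to n players for a position within budget, best rated first."""
    col = 'Position_' + pos.upper()
    if col not in data.columns:
        raise ValueError('Unknown position: {}'.format(pos))
    mask = (data[col] == 1) & (data['value_eur'] <= budget)
    cols = ['short_name', 'overall', 'value_eur']
    return data.loc[mask, cols].sort_values('overall', ascending=False).head(n)

# Hypothetical transfer market
market = pd.DataFrame({
    'short_name':  ['A', 'B', 'C', 'D'],
    'overall':     [80, 91, 85, 78],
    'value_eur':   [20_000_000, 90_000_000, 45_000_000, 12_000_000],
    'Position_LW': [1, 1, 1, 0],
})

picks = affordable_players(market, 'lw', 50_000_000)
print(picks)
```

Player B is excluded for exceeding the budget and player D for not playing at LW.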


### League Wise Analysis

## Comparative Analysis

In [None]:
#on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
d_19 = pd.read_csv('/kaggle/input/fifa-20-complete-player-dataset/players_19.csv', on_bad_lines='skip')
d_18 = pd.read_csv('/kaggle/input/fifa-20-complete-player-dataset/players_18.csv', on_bad_lines='skip')
d_17 = pd.read_csv('/kaggle/input/fifa-20-complete-player-dataset/players_17.csv', on_bad_lines='skip')
d_16 = pd.read_csv('/kaggle/input/fifa-20-complete-player-dataset/players_16.csv', on_bad_lines='skip')
d_15 = pd.read_csv('/kaggle/input/fifa-20-complete-player-dataset/players_15.csv', on_bad_lines='skip')

In [None]:
attributes = ['dribbling','defending','physic','passing','shooting','pace','overall']

In [None]:
def playergrow(name):
    #One radar trace per edition, newest first
    editions = [('2020', d_20), ('2019', d_19), ('2018', d_18),
                ('2017', d_17), ('2016', d_16), ('2015', d_15)]
    plan = []
    for year, data in editions:
        #Rows whose short_name contains the given name; the first match is used
        player = data[data.short_name.str.contains(name, regex = False)]
        plan.append(
            go.Scatterpolar(
                r = [player[a].values[0] for a in attributes],
                theta = attributes,
                fill = 'toself',
                name = year
            )
        )

    lay = go.Layout(
        polar = dict(
            radialaxis = dict(
                visible = True,
                range = [0,100]
            )
        )
        ,
        showlegend = True,
        title = 'Comparison of {} over the years 2015 to 2020'.format(name)
    )
    figure = go.Figure (data = plan, layout = lay)
    figure.show()

In [None]:
#playergrow displays the figure and returns nothing, so the result is not assigned
playergrow('Neymar')
playergrow('L. Messi')
playergrow('Cristiano Ronaldo')

#### The comparative study of the top 3 players in the game shows that over the years each has improved in one area or another and increased his worth
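
The lookup behind each radar trace takes the first row whose `short_name` contains the given text, which fails with an `IndexError` when a name is absent from an edition. A hedged sketch of a safer lookup follows; the helper `attribute_profile` and the sample frame are hypothetical, with made-up names and values.

```python
import pandas as pd

attributes = ['dribbling', 'defending', 'physic', 'passing', 'shooting', 'pace', 'overall']

def attribute_profile(data, name):
    """Attribute values of the first player whose short_name contains name, or None."""
    match = data[data['short_name'].str.contains(name, regex=False)]
    if match.empty:
        return None  # the player is missing from this edition
    return [int(match[a].iloc[0]) for a in attributes]

# Hypothetical edition with two made-up players
edition = pd.DataFrame({
    'short_name': ['A. Example', 'B. Sample'],
    'dribbling': [90, 70], 'defending': [40, 80], 'physic': [65, 78],
    'passing': [88, 66], 'shooting': [91, 50], 'pace': [85, 72], 'overall': [92, 79],
})

print(attribute_profile(edition, 'Example'))
print(attribute_profile(edition, 'Nobody'))
```

Returning `None` lets the caller skip an edition instead of stopping the whole comparison.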

## Top 50 rated players Analysis



In [None]:
pie5 = px.pie(d_20.head(50),names='club',title='Clubs of top 50 players')
pie5.show()

Most of the top 50 players belong to three or four renowned clubs that have also won many titles.

In [None]:
def bar_diagram (field):
    plt.figure(dpi=125)
    sns.countplot(x=field,data=d_20.head(50))
    plt.xlabel(field)
    plt.ylabel('Count')
    plt.title('Distribution of Top 50 players according to {}'.format(field))
    plt.show()

In [None]:
bar_diagram('team_jersey_number')

Jersey number 10 is the most common number among the top 50 players.

In [None]:
bar_diagram('age')

The age of the top 50 players is well distributed.

In [None]:
scatter_plot2 = go.Figure(
data = go.Scatter(
    x = d_20 ['BMI'].head(50),
    y = d_20 ['pace'].head(50),
    mode = 'markers',
    marker = dict(
    size = 10,
    color = d_20['overall'].head(50),
    showscale = True
    ),
    text = d_20['short_name'].head(50)
)
)

scatter_plot2.update_layout(title = 'BMI of top 50 players with best pace',
                   xaxis_title = 'BMI',
                   yaxis_title = 'Pace Rating')
scatter_plot2.show()

Among the top 50, the players with a BMI of around 22 to 24 have the best pace, which suggests that players in that range are in good physical condition.
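
A simple way to put numbers behind this observation is to group pace by BMI band. The sketch below uses made-up values and band edges chosen for illustration, so it shows the technique rather than the dataset's actual result.

```python
import pandas as pd

# Hypothetical BMI and pace values for six players
top = pd.DataFrame({
    'BMI':  [21.0, 22.5, 23.0, 23.8, 25.5, 26.0],
    'pace': [70, 90, 88, 92, 75, 65],
})

# Assign each player to a BMI band (each band includes its upper edge)
bands = pd.cut(top['BMI'], bins=[0, 22, 24, 100],
               labels=['up to 22', '22 to 24', 'above 24'])

# Mean pace within each band
band_means = top.groupby(bands, observed=True)['pace'].mean()
print(band_means)
```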