In [None]:
#Importing the Libraries
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
#Turning off the warnings
import warnings
warnings.filterwarnings('ignore')

In [None]:
#To display all the columns of the dataframe
pd.set_option('display.max_columns',70)

In [None]:
#Reading the FIFA20 dataset
fifa = pd.read_csv("/kaggle/input/fifa-player-stats-database/FIFA20_official_data.csv")

## DATA UNDERSTANDING

In [None]:
#Dimension of the dataset
fifa.shape

The dataset has 17104 rows (one per player) and 65 columns (features).

In [None]:
#head of the dataset
fifa.head()

OK WAIT!!! DE BRUYNE!!!!!!. YOU ARE ALREADY MENTIONED IN THE FIRST FEW ROWS!!

In [None]:
#columns of the dataframe
fifa.columns

In [None]:
#INFO() function
fifa.info()

`info()` shows a few missing values in columns such as `Club` and `Joined`, and many missing values in `Loaned From` and `Marking`.
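
To put numbers on the missing values, a per-column count can be easier to read than the `info()` output. The sketch below uses a small made-up frame (`toy`) in place of `fifa`; the column names mirror the dataset, but the values are illustrative only.

```python
import pandas as pd

# Toy frame standing in for `fifa`; the values are made up for illustration
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Club": ["X", None, "Y", "Z"],
    "Loaned From": [None, None, "W", None],
})

# Count missing values per column and express them as a share of all rows
missing = toy.isnull().sum().sort_values(ascending=False)
missing_pct = (missing / len(toy) * 100).round(1)
summary = pd.DataFrame({"missing": missing, "percent": missing_pct})
print(summary)
```

Running the same three lines on `fifa` instead of `toy` should rank the columns by how incomplete they are.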

In [None]:
#Some initial stats for the df
fifa.describe()

In [None]:
#Copy of our pulled in df
fifa_copy = fifa.copy()

## PREPROCESS AND SOME CLEANING

In [None]:
fifa.head()

I am going to drop some columns right away, since they are unlikely to add value to the analysis going forward. We will concentrate on players, their skills, and their clubs and countries. The URL columns and the ID column can be removed now.

In [None]:
#Dropping some of the columns - ID, Photo, Flag, Club Logo
fifa.drop(['ID','Photo','Flag','Club Logo'],axis=1,inplace=True)

In [None]:
#Dropping Real Face column
fifa.drop(['Real Face'],axis=1,inplace=True)

In [None]:
#Filtering for rows which have Loaned From column not NULL 
fifa.loc[~fifa['Loaned From'].isnull()][:5]

You can find some of the Loaned players in the list above.<br>
Ohh Perisic was on loan from Inter to Bayern. I was not aware of that! :p

It is better not to drop rows with missing values here, because we would lose a large number of players.<br>
A column such as `Loaned From` is expected to be empty for most players, since most are contracted directly to their club rather than on loan.

#### PLAYER VALUE AND WAGE - DATA CLEANING

In [None]:
#Checking whether all values are defined in Euros or not
fifa.loc[fifa['Value'].str.startswith('€')].shape[0]

In [None]:
#Checking the same for Wages.
fifa.loc[fifa['Wage'].str.startswith('€')].shape[0]

So the Value and Wage of all the players are given in euros and stored as object type. We can strip the currency symbol and the unit suffix and convert them to numeric columns.

In [None]:
#Splitting the value column to get just the numeric
fifa['Value'] = fifa['Value'].str.split('€')
fifa['Value'] = fifa['Value'].apply(lambda x:x[1])

In [None]:
#Splitting the wage column to get just the numeric
fifa['Wage'] = fifa['Wage'].str.split('€')
fifa['Wage'] = fifa['Wage'].apply(lambda x:x[1])

In [None]:
#Converting player values in thousand euros to million euros after stripping the 'K' suffix
#.copy() avoids assigning into a slice of fifa (SettingWithCopyWarning)
fifa_value_K = fifa.loc[fifa['Value'].str.endswith('K')].copy()
fifa_value_K['Value'] = fifa_value_K['Value'].str[:-1].astype('float64') / 1000

In [None]:
#Stripping the 'M' suffix from player values given in million euros
#.copy() avoids assigning into a slice of fifa (SettingWithCopyWarning)
fifa_value_M = fifa.loc[fifa['Value'].str.endswith('M')].copy()
fifa_value_M['Value'] = fifa_value_M['Value'].str[:-1].astype('float64')

In [None]:
#Stripping the 'K' suffix from player wages and converting thousand euros to million euros
fifa_value_K['Wage'] = fifa_value_K['Wage'].apply(lambda x: x[:-1] if x.endswith('K') else x)
fifa_value_K['Wage'] = fifa_value_K['Wage'].astype('float64')
fifa_value_K['Wage'] = fifa_value_K['Wage'] / 1000

In [None]:
fifa_value_M.loc[fifa_value_M['Wage'].str.endswith("M")]

No player whose value is in million euros has a wage in millions. Their wages are in thousands, so we convert them from thousands to millions in the same way.

In [None]:
#Stripping the 'K' suffix from player wages and converting thousand euros to million euros
fifa_value_M['Wage'] = fifa_value_M['Wage'].apply(lambda x: x[:-1] if x.endswith('K') else x)
fifa_value_M['Wage'] = fifa_value_M['Wage'].astype('float64')
fifa_value_M['Wage'] = fifa_value_M['Wage'] / 1000

In [None]:
#Concatenating the two split dataframes
fifa2 = pd.concat([fifa_value_M,fifa_value_K])

In [None]:
fifa2.shape

The resulting dataframe has fewer records than the initial one. The dropped records are players whose Value is 0 or otherwise has no K or M suffix. We will proceed with this dataset for now.
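
If keeping those dropped records matters, one alternative is to parse every string with a single function instead of splitting the dataframe by suffix. The sketch below is not what this notebook does: `parse_euro_to_millions` is a hypothetical helper, and treating an unsuffixed number as plain euros is an assumption about the data rather than something checked here.

```python
def parse_euro_to_millions(text):
    """Convert strings such as '€110.5M', '€565K' or '€0' to millions of euros."""
    multipliers = {"M": 1.0, "K": 1e-3}  # suffix -> factor that converts to millions
    number = text.strip().lstrip("€")
    if number and number[-1] in multipliers:
        return float(number[:-1]) * multipliers[number[-1]]
    # No suffix: assume the figure is in plain euros (an assumption, see above)
    return float(number) / 1e6

samples = ["€110.5M", "€565K", "€0", "€500"]
parsed = [parse_euro_to_millions(s) for s in samples]
print(parsed)
```

Applied to the raw columns, for example `fifa['Value'].apply(parse_euro_to_millions)`, this would keep every row, including players valued at 0.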

### CLEANING THE POSITION FEATURE 

Some HTML markup has leaked into the Position column (probably when the data was scraped). We will clean it.

In [None]:
fifa2['Position'] = fifa2['Position'].str.split(">")
fifa2['Position'] = fifa2['Position'].apply(lambda x:x[1])
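
Splitting on `>` works as long as every value contains exactly one tag and none is missing. A regex-based variant that tolerates missing values is sketched below. The sample strings are made up to resemble scraped markup; the real tag format in the dataset may differ.

```python
import pandas as pd

# Made-up examples of a position label prefixed with scraped HTML
positions = pd.Series(['<span class="pos pos18">ST',
                       '<span class="pos pos0">GK',
                       None])

# Remove anything that looks like an HTML tag; missing values stay missing
cleaned = positions.str.replace(r"<[^>]*>", "", regex=True).str.strip()
print(cleaned.tolist())
```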

### CLEANING THE WEIGHT COLUMN

All player weights are given in lbs and stored as object type. We will remove the `lbs` suffix and convert the column to int.

In [None]:
fifa2['Weight'] = fifa2['Weight'].apply(lambda x : x[:-3])
fifa2['Weight'] = fifa2['Weight'].astype('int64')
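
Slicing off the last three characters assumes every value ends in exactly `lbs`. A slightly more defensive sketch extracts the digits instead and, as an optional extra, converts pounds to kilograms. The sample values are made up for illustration.

```python
import pandas as pd

weights = pd.Series(["159lbs", "183lbs"])  # illustrative values only

# Pull out the leading digits rather than relying on a fixed suffix length
weight_lbs = weights.str.extract(r"(\d+)", expand=False).astype("int64")

# 1 lb = 0.45359237 kg
weight_kg = (weight_lbs * 0.45359237).round(2)
print(weight_lbs.tolist(), weight_kg.tolist())
```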

## ANALYSIS ON THE PREFERRED FOOT

In [None]:
foot = fifa2['Preferred Foot'].value_counts()
foot

As expected, left-footers are RARE!! There are only about 4000 left-footers in our player list.

In [None]:
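#value_counts() sorts by frequency, so position 0 is Right and position 1 is Left
#.iloc is used because integer keys on a label-indexed Series are deprecated
foot_right = foot.iloc[0]/fifa2['Preferred Foot'].count()*100
foot_left = foot.iloc[1]/fifa2['Preferred Foot'].count()*100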
foot_df = pd.DataFrame({'Percentage':[foot_right,foot_left]},index=['Right Foot','Left Foot'])
foot_df.style.background_gradient(cmap='Purples')

In [None]:
#Barplot for the classes
plt.title("Foot Preference")
sns.barplot(x=foot_df.index,y=foot_df['Percentage'],palette='Blues')
plt.show()

This clearly shows the dominance of right-footers in football. Left-footers are only a small percentage of players, hence their importance.
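
The same percentages can be obtained in one step with `value_counts(normalize=True)`, which also avoids relying on the order of the counts. A minimal sketch on made-up data:

```python
import pandas as pd

# Made-up sample: three right-footed players and one left-footed player
feet = pd.Series(["Right", "Right", "Left", "Right"])

# normalize=True returns proportions; multiply by 100 for percentages
foot_pct = feet.value_counts(normalize=True).mul(100)
print(foot_pct)
```

Looking values up by label, for example `foot_pct["Left"]`, keeps the code correct even if the ordering changes.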

## AVERAGE AGE OF ALL THE PLAYERS IN FIFA20 DATASET

In [None]:
fifa2.Age.mean()

In [None]:
fifa2.head()

## TOP 10 PLAYERS WITH HIGHEST OVERALL

In [None]:
fifa_overall = fifa2.sort_values(['Overall'],ascending=False)[:10]
fifa_overall[['Name','Overall','Potential','Club','Preferred Foot','Position']].style.background_gradient(cmap='Greens')

These are the top 10 -> <br>
`L. Messi`,`Cristiano Ronaldo`,`Neymar Jr`,`J. Oblak`,`R. Lewandowski`,`K. De Bruyne`,`E. Hazard`,`V. van Dijk`,`M. ter Stegen`,`S. Mané`<Br>
Interesting FACT -> 2 Belgian players in the top 10 list.<Br>
And also <B>NO WONDER HOW LIVERPOOL WERE ABLE TO WIN THE PL -> 2 OF THEIR BEST ARE IN THE TOP10 TOO </B>

## TOP 10 PLAYERS WITH HIGHEST POTENTIAL

In [None]:
fifa_potential = fifa2.sort_values(['Potential'],ascending=False)[:10]
fifa_potential[['Name','Overall','Potential','Club','Preferred Foot','Position']].style.background_gradient(cmap='Reds')

These are the top 10 -> <br>
`K. Mbappé`,`J. Sancho`,`L. Messi`,`Cristiano Ronaldo`,`K. Havertz`,`João Félix`,`J. Oblak`,`M. ter Stegen`,`Vinícius Jr.`,`Neymar Jr`<Br>
OK. Now I get why Chelsea are after Kai Havertz!!!! GO CHELSEA GO! GO AND GET HIM IN THIS TRANSFER WINDOW.<br>
Sancho probably reached Manchester United as well.

What is the fun in always viewing the top guys! We will see the bottom ones now.

## BOTTOM 10 IN POTENTIAL AND OVERALL

In [None]:
fifa2.sort_values(['Overall'],ascending=True)[:10]

In [None]:
fifa2.sort_values(['Potential'],ascending=True)[:10]

## GERMAN PLAYERS 

In [None]:
fifa2_germany = fifa2.loc[fifa2.Nationality=='Germany']
fifa2_germany.sort_values(['Overall'],ascending=False)[:5]

Some of the best German players are already listed here! :)

## MAYBE AGE IS JUST A NUMBER FOR BUFFON AND SOME OTHERS!

In [None]:
fifa2.loc[fifa2.Age >40]

### Average Age of an International Team in FIFA20

In [None]:
plt.figure(figsize=(10,30))
sns.barplot(y=fifa2['Nationality'],x=fifa2['Age'])
plt.show()

Average age of players from Oman team is greater than 35, while that of Singapore is less than 20!!!!!
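
Extreme averages like these can come from countries with only one or two players in the dataset. One way to check is to compute the mean together with the player count and filter on a minimum count. The data and the threshold of 2 below are made up for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "Nationality": ["Germany", "Germany", "Germany", "Oman", "Spain", "Spain"],
    "Age": [24, 30, 27, 36, 22, 26],
})

# Mean age and number of players per nationality
stats = toy.groupby("Nationality")["Age"].agg(["mean", "count"])

# Keep only nationalities with enough players for the mean to be meaningful
reliable = stats[stats["count"] >= 2].sort_values("mean", ascending=False)
print(reliable)
```

On `fifa2`, a larger threshold would probably be more sensible. The right value is a judgment call.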

## TOP 5 HIGHEST VALUED PLAYERS

In [None]:
fifa2.sort_values(['Value'],ascending=False)[:5]

## WEAK FOOT AND SKILL MOVES

In [None]:
#To find the range of values for Weak Foot
print(fifa2['Weak Foot'].min())
print(fifa2['Weak Foot'].max())

In [None]:
#To find the range of values for Skill Moves
print(fifa2['Skill Moves'].min())
print(fifa2['Skill Moves'].max())

In [None]:
fifa2.loc[((fifa2['Weak Foot']==fifa2['Weak Foot'].max()) &(fifa2['Skill Moves']==fifa2['Skill Moves'].max()))]

Players with the maximum rating in both `Weak Foot` and `Skill Moves` - NEYMAR, DEMBELE, NANI, VITINHO, RIBERY

## PLAYER POSITION

In [None]:
fifa_pos = fifa2['Position'].value_counts()
plt.figure(figsize=(15,5))
#Recent seaborn versions require x and y to be passed as keyword arguments
sns.barplot(x=fifa_pos.index,y=fifa_pos.values)
plt.show()

Centre forwards (CF) are the least common position in the player set. Let's count how many there are in this dataset.

In [None]:
#Number of rows with Position == 'CF'
fifa2.loc[fifa2['Position']=='CF'].shape[0]

## HEAVIEST PLAYER

In [None]:
fifa2.sort_values(['Weight'],ascending=False)[:1][['Name','Weight']]

## POSITION VS PLAYER VALUE

In [None]:
plt.figure(figsize=(15,8))
sns.boxplot(y=fifa2['Value'],x=fifa2['Position'])
plt.show()

Of course there are outliers for each position. But the plot shows that the median values for right and left forwards (RF, LF) are higher than for the other positions.
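
The centre line of each box is the median, so a ranking read off the plot can be confirmed numerically with a group-wise median. A sketch on made-up values (in million euros):

```python
import pandas as pd

toy = pd.DataFrame({
    "Position": ["RF", "RF", "CB", "CB", "CB"],
    "Value": [20.0, 10.0, 1.0, 2.0, 30.0],
})

# Median is robust to the single very expensive CB in this toy sample
median_value = (toy.groupby("Position")["Value"]
                   .median()
                   .sort_values(ascending=False))
print(median_value)
```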

## AGILITY, AGGRESSION AND THEIR CORRELATIONS

In [None]:
fifa3 = fifa2[['Crossing', 'Finishing', 'HeadingAccuracy',
       'ShortPassing', 'Volleys', 'Dribbling', 'Curve', 'FKAccuracy',
       'LongPassing', 'BallControl', 'Acceleration', 'SprintSpeed', 'Agility',
       'Reactions', 'Balance', 'ShotPower', 'Jumping', 'Stamina', 'Strength',
       'LongShots', 'Aggression', 'Interceptions', 'Positioning', 'Vision']]

In [None]:
plt.figure(figsize=(20,7))
sns.heatmap(round(fifa3.corr(),2),annot=True,cmap='Blues')
plt.show()

Some highly correlated features in the dataset - `Positioning` and `Finishing` ; `Positioning` and `Dribbling`. In fact, `Positioning` is highly correlated with many other features as well, such as `LongShots`. <br>
`Agility` is another feature we are interested in. It is highly correlated with `Balance`, `SprintSpeed` and `Acceleration`. So you have got an important tip to be a better footballer - <B> BE AGILE!! </B>

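`Aggression` is strongly correlated only with `Interceptions`. That is what the data says. Keep in mind that this is a correlation, not proof that aggression causes interceptions.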
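
Reading pairs off a 24x24 heatmap by eye is error-prone. A ranked list of the strongest correlations can be extracted from the correlation matrix directly. The sketch below uses three made-up columns in place of `fifa3`.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Agility":  [60, 70, 80, 90, 50],
    "Balance":  [62, 69, 83, 88, 52],
    "Strength": [80, 60, 70, 50, 75],
})

corr = toy.corr()

# Keep only the upper triangle (k=1 skips the diagonal) so each pair appears once
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna().sort_values(ascending=False)
print(pairs.round(2))
```

With `fifa3.corr()` in place of `toy.corr()`, `pairs.head(10)` would list the ten most positively correlated attribute pairs.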