## Data Wrangling of Football Player's Data of FIFA 2021

### Kaggle

Kaggle is a data science competition platform and online community of data scientists and machine learning practitioners under Google LLC. Kaggle enables users to find and publish datasets, explore and build models in a web-based data science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

## Dataset

The following dataset is provided by RACHIT TOSHNIWAL, it was primarily webscrapped from https://sofifa.com/. You can find the details about the dataset on the [Kaggle](https://www.kaggle.com/datasets/yagunnersya/fifa-21-messy-raw-dataset-for-cleaning-exploring?select=fifa21_raw_data.csv%E2%80%8B)

### Steps to complete the task:

- Basic steps including checking for missing, duplicate and information of dataframe
- Remove unnecessary columns and renaming columns
- Height and Weight into integer and their conversion
- Converting star columns into integer
- Converting value, wage and release columns into integer with conversion of M and K into numbers.
- Remove \n from Hits column and converting into integer
- Changing Team & Contract into seperate columns
- Converting Date Columns from object to Date and creating seperate day, month and year columns

In [1]:
#Importing relevant library
import pandas as pd

In [2]:
#Reading dataframe for further analysis
df = pd.read_csv("fifa21_raw_data.csv", dtype = {'Hits' : str})

In [3]:
#Shape of the dataframe
df.shape

(18979, 77)

In [4]:
df.head(3)

Unnamed: 0,photoUrl,LongName,playerUrl,Nationality,Positions,Name,Age,↓OVA,POT,Team & Contract,...,A/W,D/W,IR,PAC,SHO,PAS,DRI,DEF,PHY,Hits
0,https://cdn.sofifa.com/players/158/023/21_60.png,Lionel Messi,http://sofifa.com/player/158023/lionel-messi/2...,Argentina,RW ST CF,L. Messi,33,93,93,\n\n\n\nFC Barcelona\n2004 ~ 2021\n\n,...,Medium,Low,5 ★,85,92,91,95,38,65,\n372
1,https://cdn.sofifa.com/players/020/801/21_60.png,C. Ronaldo dos Santos Aveiro,http://sofifa.com/player/20801/c-ronaldo-dos-s...,Portugal,ST LW,Cristiano Ronaldo,35,92,92,\n\n\n\nJuventus\n2018 ~ 2022\n\n,...,High,Low,5 ★,89,93,81,89,35,77,\n344
2,https://cdn.sofifa.com/players/200/389/21_60.png,Jan Oblak,http://sofifa.com/player/200389/jan-oblak/210005/,Slovenia,GK,J. Oblak,27,91,93,\n\n\n\nAtlético Madrid\n2014 ~ 2023\n\n,...,Medium,Medium,3 ★,87,92,78,90,52,90,\n86


In [5]:
df.columns

Index(['photoUrl', 'LongName', 'playerUrl', 'Nationality', 'Positions', 'Name',
       'Age', '↓OVA', 'POT', 'Team & Contract', 'ID', 'Height', 'Weight',
       'foot', 'BOV', 'BP', 'Growth', 'Joined', 'Loan Date End', 'Value',
       'Wage', 'Release Clause', 'Attacking', 'Crossing', 'Finishing',
       'Heading Accuracy', 'Short Passing', 'Volleys', 'Skill', 'Dribbling',
       'Curve', 'FK Accuracy', 'Long Passing', 'Ball Control', 'Movement',
       'Acceleration', 'Sprint Speed', 'Agility', 'Reactions', 'Balance',
       'Power', 'Shot Power', 'Jumping', 'Stamina', 'Strength', 'Long Shots',
       'Mentality', 'Aggression', 'Interceptions', 'Positioning', 'Vision',
       'Penalties', 'Composure', 'Defending', 'Marking', 'Standing Tackle',
       'Sliding Tackle', 'Goalkeeping', 'GK Diving', 'GK Handling',
       'GK Kicking', 'GK Positioning', 'GK Reflexes', 'Total Stats',
       'Base Stats', 'W/F', 'SM', 'A/W', 'D/W', 'IR', 'PAC', 'SHO', 'PAS',
       'DRI', 'DEF', 'PHY', 'Hits

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18979 entries, 0 to 18978
Data columns (total 77 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   photoUrl          18979 non-null  object
 1   LongName          18979 non-null  object
 2   playerUrl         18979 non-null  object
 3   Nationality       18979 non-null  object
 4   Positions         18979 non-null  object
 5   Name              18979 non-null  object
 6   Age               18979 non-null  int64 
 7   ↓OVA              18979 non-null  int64 
 8   POT               18979 non-null  int64 
 9   Team & Contract   18979 non-null  object
 10  ID                18979 non-null  int64 
 11  Height            18979 non-null  object
 12  Weight            18979 non-null  object
 13  foot              18979 non-null  object
 14  BOV               18979 non-null  int64 
 15  BP                18979 non-null  object
 16  Growth            18979 non-null  int64 
 17  Joined      

In [7]:
df.describe(include='all')

Unnamed: 0,photoUrl,LongName,playerUrl,Nationality,Positions,Name,Age,↓OVA,POT,Team & Contract,...,A/W,D/W,IR,PAC,SHO,PAS,DRI,DEF,PHY,Hits
count,18979,18979,18979,18979,18979,18979,18979.0,18979.0,18979.0,18979,...,18979,18979,18979,18979.0,18979.0,18979.0,18979.0,18979.0,18979.0,18979
unique,18978,18851,18978,164,640,17919,,,,9023,...,3,3,5,,,,,,,374
top,https://cdn.sofifa.com/players/251/698/21_60.png,Liam Kelly,http://sofifa.com/player/251698/kevin-berlaso/...,England,CB,J. Rodríguez,,,,\n India\nFree\n\n,...,Medium,Medium,1 ★,,,,,,,\n1
freq,2,3,2,1704,2441,13,,,,29,...,12700,13956,17628,,,,,,,4321
mean,,,,,,,25.194583,65.718636,71.136098,,...,,,,67.454239,53.457716,57.681069,62.875494,49.865904,64.368618,
std,,,,,,,4.710753,6.968999,6.114176,,...,,,,10.678058,13.827229,10.081914,9.927875,16.44273,9.601665,
min,,,,,,,16.0,47.0,47.0,,...,,,,25.0,16.0,25.0,25.0,12.0,28.0,
25%,,,,,,,21.0,61.0,67.0,,...,,,,61.0,44.0,51.0,57.0,35.0,58.0,
50%,,,,,,,25.0,66.0,71.0,,...,,,,68.0,56.0,58.0,64.0,53.0,65.0,
75%,,,,,,,29.0,70.0,75.0,,...,,,,75.0,64.0,64.0,69.0,63.0,71.0,


In [8]:
#checking for null values
df.isnull().sum()

photoUrl       0
LongName       0
playerUrl      0
Nationality    0
Positions      0
              ..
PAS            0
DRI            0
DEF            0
PHY            0
Hits           0
Length: 77, dtype: int64

In [9]:
#Checking for duplicate values
df.duplicated().sum()

1

In [10]:
df[df.duplicated()==True]

Unnamed: 0,photoUrl,LongName,playerUrl,Nationality,Positions,Name,Age,↓OVA,POT,Team & Contract,...,A/W,D/W,IR,PAC,SHO,PAS,DRI,DEF,PHY,Hits
944,https://cdn.sofifa.com/players/251/698/21_60.png,Kevin Berlaso,http://sofifa.com/player/251698/kevin-berlaso/...,Ecuador,RB,K. Berlaso,32,77,77,\n Ecuador\nFree\n\n,...,High,Medium,2 ★,78,56,69,77,72,68,\n12


In [11]:
df[df['LongName']=='Kevin Berlaso']

Unnamed: 0,photoUrl,LongName,playerUrl,Nationality,Positions,Name,Age,↓OVA,POT,Team & Contract,...,A/W,D/W,IR,PAC,SHO,PAS,DRI,DEF,PHY,Hits
899,https://cdn.sofifa.com/players/251/698/21_60.png,Kevin Berlaso,http://sofifa.com/player/251698/kevin-berlaso/...,Ecuador,RB,K. Berlaso,32,77,77,\n Ecuador\nFree\n\n,...,High,Medium,2 ★,78,56,69,77,72,68,\n12
944,https://cdn.sofifa.com/players/251/698/21_60.png,Kevin Berlaso,http://sofifa.com/player/251698/kevin-berlaso/...,Ecuador,RB,K. Berlaso,32,77,77,\n Ecuador\nFree\n\n,...,High,Medium,2 ★,78,56,69,77,72,68,\n12


In [12]:
df.drop_duplicates(inplace=True)

In [13]:
df.duplicated().sum()

0

In [14]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 18978 entries, 0 to 18978
Data columns (total 77 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   photoUrl          18978 non-null  object
 1   LongName          18978 non-null  object
 2   playerUrl         18978 non-null  object
 3   Nationality       18978 non-null  object
 4   Positions         18978 non-null  object
 5   Name              18978 non-null  object
 6   Age               18978 non-null  int64 
 7   ↓OVA              18978 non-null  int64 
 8   POT               18978 non-null  int64 
 9   Team & Contract   18978 non-null  object
 10  ID                18978 non-null  int64 
 11  Height            18978 non-null  object
 12  Weight            18978 non-null  object
 13  foot              18978 non-null  object
 14  BOV               18978 non-null  int64 
 15  BP                18978 non-null  object
 16  Growth            18978 non-null  int64 
 17  Joined      

In [15]:
#Changing the display option, max number of columns
pd.set_option('display.max_columns', None)

### TODO 1 - Remove unnecessary columns and renaming columns

In [16]:
#Droping unnecessary columns
df.drop(['photoUrl', 'playerUrl', 'ID', 'Loan Date End'], axis=1, inplace=True)

In [17]:
df.columns

Index(['LongName', 'Nationality', 'Positions', 'Name', 'Age', '↓OVA', 'POT',
       'Team & Contract', 'Height', 'Weight', 'foot', 'BOV', 'BP', 'Growth',
       'Joined', 'Value', 'Wage', 'Release Clause', 'Attacking', 'Crossing',
       'Finishing', 'Heading Accuracy', 'Short Passing', 'Volleys', 'Skill',
       'Dribbling', 'Curve', 'FK Accuracy', 'Long Passing', 'Ball Control',
       'Movement', 'Acceleration', 'Sprint Speed', 'Agility', 'Reactions',
       'Balance', 'Power', 'Shot Power', 'Jumping', 'Stamina', 'Strength',
       'Long Shots', 'Mentality', 'Aggression', 'Interceptions', 'Positioning',
       'Vision', 'Penalties', 'Composure', 'Defending', 'Marking',
       'Standing Tackle', 'Sliding Tackle', 'Goalkeeping', 'GK Diving',
       'GK Handling', 'GK Kicking', 'GK Positioning', 'GK Reflexes',
       'Total Stats', 'Base Stats', 'W/F', 'SM', 'A/W', 'D/W', 'IR', 'PAC',
       'SHO', 'PAS', 'DRI', 'DEF', 'PHY', 'Hits'],
      dtype='object')

In [18]:
new_columns = ['Full Name', 'Nationality', 'Positions', 'Name', 'Age', 'Overall Rating', 'Potential',
       'Team & Contract', 'Height(CM)', 'Weight(Lbs)', 'foot', 'Best Overall', 'Best Position',
       'Growth', 'Joined Date', 'Value', 'Wage', 'Release Clause',
       'Attacking', 'Crossing', 'Finishing', 'Heading Accuracy',
       'Short Passing', 'Volleys', 'Skill', 'Dribbling', 'Curve',
       'FK Accuracy', 'Long Passing', 'Ball Control', 'Movement',
       'Acceleration', 'Sprint Speed', 'Agility', 'Reactions', 'Balance',
       'Power', 'Shot Power', 'Jumping', 'Stamina', 'Strength', 'Long Shots',
       'Mentality', 'Aggression', 'Interceptions', 'Positioning', 'Vision',
       'Penalties', 'Composure', 'Defending', 'Marking', 'Standing Tackle',
       'Sliding Tackle', 'Goalkeeping', 'GK Diving', 'GK Handling',
       'GK Kicking', 'GK Positioning', 'GK Reflexes', 'Total Stats',
       'Base Stats', 'W/F', 'SM', 'A/W', 'D/W', 'IR', 'PAC', 'SHO', 'PAS',
       'DRI', 'DEF', 'PHY', 'Hits']

In [19]:
df.columns = new_columns

In [20]:
#Renaming columns 'W/F', 'SM' and 'IR'
df.rename(columns = {'W/F':'W/F(★)', 'SM':'SM(★)' ,'IR':'IR(★)', "Team & Contract": "Team" , "Height(CM)" : "Height(cm)" ,
                     "Weight(Lbs)": "Weight(lbs)", "Value": "Value(€)", "Wage":"Wage(€)",
                     "Release Clause": "Release Clause(€)"}, inplace=True)

### TODO 2 - Height and Weight into integer or float and their conversion

`convert_height(string)` following function will take the value in string format and using the stip and split functions fetch the inches and feet values which are later convert into cms and integer value.

In [21]:
#Function to convert height from feet'inches" to cms.
def convert_height(string):
    string_list = (string.strip('"')).split("'")
    height_in_cm = (int(string_list[0]) * 12 + int(string_list[1])) * 2.54
    return height_in_cm

In [22]:
#Example
height = "5'10\""
convert_height(height)

177.8

In [24]:
#Applying the function to the Height(CM) column
df['Height(cm)']= df['Height(cm)'].apply(convert_height)

In [25]:
df.head(3)

Unnamed: 0,Full Name,Nationality,Positions,Name,Age,Overall Rating,Potential,Team,Height(cm),Weight(lbs),foot,Best Overall,Best Position,Growth,Joined Date,Value(€),Wage(€),Release Clause(€),Attacking,Crossing,Finishing,Heading Accuracy,Short Passing,Volleys,Skill,Dribbling,Curve,FK Accuracy,Long Passing,Ball Control,Movement,Acceleration,Sprint Speed,Agility,Reactions,Balance,Power,Shot Power,Jumping,Stamina,Strength,Long Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing Tackle,Sliding Tackle,Goalkeeping,GK Diving,GK Handling,GK Kicking,GK Positioning,GK Reflexes,Total Stats,Base Stats,W/F(★),SM(★),A/W,D/W,IR(★),PAC,SHO,PAS,DRI,DEF,PHY,Hits
0,Lionel Messi,Argentina,RW ST CF,L. Messi,33,93,93,\n\n\n\nFC Barcelona\n2004 ~ 2021\n\n,170.18,159lbs,Left,93,RW,0,"Jul 1, 2004",€67.5M,€560K,€138.4M,429,85,95,70,91,88,470,96,93,94,91,96,451,91,80,91,94,95,389,86,68,72,69,94,347,44,40,93,95,75,96,91,32,35,24,54,6,11,15,14,8,2231,466,4 ★,4★,Medium,Low,5 ★,85,92,91,95,38,65,\n372
1,C. Ronaldo dos Santos Aveiro,Portugal,ST LW,Cristiano Ronaldo,35,92,92,\n\n\n\nJuventus\n2018 ~ 2022\n\n,187.96,183lbs,Right,92,ST,0,"Jul 10, 2018",€46M,€220K,€75.9M,437,84,95,90,82,86,414,88,81,76,77,92,431,87,91,87,95,71,444,94,95,84,78,93,353,63,29,95,82,84,95,84,28,32,24,58,7,11,15,14,11,2221,464,4 ★,5★,High,Low,5 ★,89,93,81,89,35,77,\n344
2,Jan Oblak,Slovenia,GK,J. Oblak,27,91,93,\n\n\n\nAtlético Madrid\n2014 ~ 2023\n\n,187.96,192lbs,Right,91,GK,2,"Jul 16, 2014",€75M,€125K,€159.4M,95,13,11,15,43,13,109,12,13,14,40,30,307,43,60,67,88,49,268,59,78,41,78,12,140,34,19,11,65,11,68,57,27,12,18,437,87,92,78,90,90,1413,489,3 ★,1★,Medium,Medium,3 ★,87,92,78,90,52,90,\n86


In [26]:
#Getting the Lbs value of Weight using strip function
df['Weight(lbs)'] = df['Weight(lbs)'].apply(lambda x : int(x.strip('lbs')))

In [27]:
df[['Height(cm)', 'Weight(lbs)']].dtypes

Height(cm)     float64
Weight(lbs)      int64
dtype: object

In [28]:
df.head(3)

Unnamed: 0,Full Name,Nationality,Positions,Name,Age,Overall Rating,Potential,Team,Height(cm),Weight(lbs),foot,Best Overall,Best Position,Growth,Joined Date,Value(€),Wage(€),Release Clause(€),Attacking,Crossing,Finishing,Heading Accuracy,Short Passing,Volleys,Skill,Dribbling,Curve,FK Accuracy,Long Passing,Ball Control,Movement,Acceleration,Sprint Speed,Agility,Reactions,Balance,Power,Shot Power,Jumping,Stamina,Strength,Long Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing Tackle,Sliding Tackle,Goalkeeping,GK Diving,GK Handling,GK Kicking,GK Positioning,GK Reflexes,Total Stats,Base Stats,W/F(★),SM(★),A/W,D/W,IR(★),PAC,SHO,PAS,DRI,DEF,PHY,Hits
0,Lionel Messi,Argentina,RW ST CF,L. Messi,33,93,93,\n\n\n\nFC Barcelona\n2004 ~ 2021\n\n,170.18,159,Left,93,RW,0,"Jul 1, 2004",€67.5M,€560K,€138.4M,429,85,95,70,91,88,470,96,93,94,91,96,451,91,80,91,94,95,389,86,68,72,69,94,347,44,40,93,95,75,96,91,32,35,24,54,6,11,15,14,8,2231,466,4 ★,4★,Medium,Low,5 ★,85,92,91,95,38,65,\n372
1,C. Ronaldo dos Santos Aveiro,Portugal,ST LW,Cristiano Ronaldo,35,92,92,\n\n\n\nJuventus\n2018 ~ 2022\n\n,187.96,183,Right,92,ST,0,"Jul 10, 2018",€46M,€220K,€75.9M,437,84,95,90,82,86,414,88,81,76,77,92,431,87,91,87,95,71,444,94,95,84,78,93,353,63,29,95,82,84,95,84,28,32,24,58,7,11,15,14,11,2221,464,4 ★,5★,High,Low,5 ★,89,93,81,89,35,77,\n344
2,Jan Oblak,Slovenia,GK,J. Oblak,27,91,93,\n\n\n\nAtlético Madrid\n2014 ~ 2023\n\n,187.96,192,Right,91,GK,2,"Jul 16, 2014",€75M,€125K,€159.4M,95,13,11,15,43,13,109,12,13,14,40,30,307,43,60,67,88,49,268,59,78,41,78,12,140,34,19,11,65,11,68,57,27,12,18,437,87,92,78,90,90,1413,489,3 ★,1★,Medium,Medium,3 ★,87,92,78,90,52,90,\n86


### TODO 3 - Converting star columns into integer

`convert_stars(string)` this function will remove stars from columns and convert it into integer rating

In [29]:
#Creating function to remove stars from string
def convert_stars(string):
    rating = int(string.strip('★'))
    return rating

In [30]:
df['W/F(★)'] = df['W/F(★)'].apply(convert_stars)
df['SM(★)'] = df['SM(★)'].apply(convert_stars)
df['IR(★)'] = df['IR(★)'].apply(convert_stars)

In [31]:
df.head(3)

Unnamed: 0,Full Name,Nationality,Positions,Name,Age,Overall Rating,Potential,Team,Height(cm),Weight(lbs),foot,Best Overall,Best Position,Growth,Joined Date,Value(€),Wage(€),Release Clause(€),Attacking,Crossing,Finishing,Heading Accuracy,Short Passing,Volleys,Skill,Dribbling,Curve,FK Accuracy,Long Passing,Ball Control,Movement,Acceleration,Sprint Speed,Agility,Reactions,Balance,Power,Shot Power,Jumping,Stamina,Strength,Long Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing Tackle,Sliding Tackle,Goalkeeping,GK Diving,GK Handling,GK Kicking,GK Positioning,GK Reflexes,Total Stats,Base Stats,W/F(★),SM(★),A/W,D/W,IR(★),PAC,SHO,PAS,DRI,DEF,PHY,Hits
0,Lionel Messi,Argentina,RW ST CF,L. Messi,33,93,93,\n\n\n\nFC Barcelona\n2004 ~ 2021\n\n,170.18,159,Left,93,RW,0,"Jul 1, 2004",€67.5M,€560K,€138.4M,429,85,95,70,91,88,470,96,93,94,91,96,451,91,80,91,94,95,389,86,68,72,69,94,347,44,40,93,95,75,96,91,32,35,24,54,6,11,15,14,8,2231,466,4,4,Medium,Low,5,85,92,91,95,38,65,\n372
1,C. Ronaldo dos Santos Aveiro,Portugal,ST LW,Cristiano Ronaldo,35,92,92,\n\n\n\nJuventus\n2018 ~ 2022\n\n,187.96,183,Right,92,ST,0,"Jul 10, 2018",€46M,€220K,€75.9M,437,84,95,90,82,86,414,88,81,76,77,92,431,87,91,87,95,71,444,94,95,84,78,93,353,63,29,95,82,84,95,84,28,32,24,58,7,11,15,14,11,2221,464,4,5,High,Low,5,89,93,81,89,35,77,\n344
2,Jan Oblak,Slovenia,GK,J. Oblak,27,91,93,\n\n\n\nAtlético Madrid\n2014 ~ 2023\n\n,187.96,192,Right,91,GK,2,"Jul 16, 2014",€75M,€125K,€159.4M,95,13,11,15,43,13,109,12,13,14,40,30,307,43,60,67,88,49,268,59,78,41,78,12,140,34,19,11,65,11,68,57,27,12,18,437,87,92,78,90,90,1413,489,3,1,Medium,Medium,3,87,92,78,90,52,90,\n86


### TODO 4 - Converting value, wage and release columns into integer with conversion of M and K into numbers

We found that there are some values are irrelavant to the data, value €0 is irrelevant hence we need to drop them from our data

In [32]:
for item in df['Value(€)']:
    if item[-1] != 'M' and item[-1] != 'K':
        print(item)
        break

€0


In [33]:
df[df['Value(€)'] == '€0'].shape

(248, 73)

In [34]:
#Getting index of irrelavant data
drop_index = df[df['Value(€)'] == '€0'].index

In [35]:
df.drop(drop_index, inplace=True)

`wage_conversion(string)` this function get the string value with M and K and will be converted into integer value of columns

In [36]:
def wage_conversion(string):
    #strip "€" from the string
    wage_txt = string.strip("€")
    if wage_txt[-1] == "M":
        return int(float(wage_txt[:-1]) * 1000000)
    elif wage_txt[-1] == "K":
        return int(float(wage_txt[:-1]) * 1000)
    elif wage_txt[-1] == "0":
        return 0
    else:
        return int(wage_txt[:-1])

#Example
wage_conversion("€138.4M")

138400000

In [37]:
df["Value(€)"] = df["Value(€)"].apply(wage_conversion)

In [38]:
df["Wage(€)"] = df["Wage(€)"].apply(wage_conversion)

In [39]:
df["Release Clause(€)"] = df["Release Clause(€)"].apply(wage_conversion)

In [40]:
df.head(3)

Unnamed: 0,Full Name,Nationality,Positions,Name,Age,Overall Rating,Potential,Team,Height(cm),Weight(lbs),foot,Best Overall,Best Position,Growth,Joined Date,Value(€),Wage(€),Release Clause(€),Attacking,Crossing,Finishing,Heading Accuracy,Short Passing,Volleys,Skill,Dribbling,Curve,FK Accuracy,Long Passing,Ball Control,Movement,Acceleration,Sprint Speed,Agility,Reactions,Balance,Power,Shot Power,Jumping,Stamina,Strength,Long Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing Tackle,Sliding Tackle,Goalkeeping,GK Diving,GK Handling,GK Kicking,GK Positioning,GK Reflexes,Total Stats,Base Stats,W/F(★),SM(★),A/W,D/W,IR(★),PAC,SHO,PAS,DRI,DEF,PHY,Hits
0,Lionel Messi,Argentina,RW ST CF,L. Messi,33,93,93,\n\n\n\nFC Barcelona\n2004 ~ 2021\n\n,170.18,159,Left,93,RW,0,"Jul 1, 2004",67500000,560000,138400000,429,85,95,70,91,88,470,96,93,94,91,96,451,91,80,91,94,95,389,86,68,72,69,94,347,44,40,93,95,75,96,91,32,35,24,54,6,11,15,14,8,2231,466,4,4,Medium,Low,5,85,92,91,95,38,65,\n372
1,C. Ronaldo dos Santos Aveiro,Portugal,ST LW,Cristiano Ronaldo,35,92,92,\n\n\n\nJuventus\n2018 ~ 2022\n\n,187.96,183,Right,92,ST,0,"Jul 10, 2018",46000000,220000,75900000,437,84,95,90,82,86,414,88,81,76,77,92,431,87,91,87,95,71,444,94,95,84,78,93,353,63,29,95,82,84,95,84,28,32,24,58,7,11,15,14,11,2221,464,4,5,High,Low,5,89,93,81,89,35,77,\n344
2,Jan Oblak,Slovenia,GK,J. Oblak,27,91,93,\n\n\n\nAtlético Madrid\n2014 ~ 2023\n\n,187.96,192,Right,91,GK,2,"Jul 16, 2014",75000000,125000,159400000,95,13,11,15,43,13,109,12,13,14,40,30,307,43,60,67,88,49,268,59,78,41,78,12,140,34,19,11,65,11,68,57,27,12,18,437,87,92,78,90,90,1413,489,3,1,Medium,Medium,3,87,92,78,90,52,90,\n86


### TODO 5- Remove \n from Hits column and converting into integer

`convert_hits(string)` this function conver the string into integer hits value.

In [41]:
def convert_hits(string):
    #strip the spaces from string
    hit = str(string).strip()
    # if K is the last item of the string the hit value is multiplied by 1000
    if hit[-1] == 'K':
        return int(float(hit[:-1]) * 1000)
    elif hit[-1] == "0":
        return 0
    else:
        return int(hit)

In [42]:
df['Hits'] = df['Hits'].apply(convert_hits)

In [43]:
df.head(2)

Unnamed: 0,Full Name,Nationality,Positions,Name,Age,Overall Rating,Potential,Team,Height(cm),Weight(lbs),foot,Best Overall,Best Position,Growth,Joined Date,Value(€),Wage(€),Release Clause(€),Attacking,Crossing,Finishing,Heading Accuracy,Short Passing,Volleys,Skill,Dribbling,Curve,FK Accuracy,Long Passing,Ball Control,Movement,Acceleration,Sprint Speed,Agility,Reactions,Balance,Power,Shot Power,Jumping,Stamina,Strength,Long Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing Tackle,Sliding Tackle,Goalkeeping,GK Diving,GK Handling,GK Kicking,GK Positioning,GK Reflexes,Total Stats,Base Stats,W/F(★),SM(★),A/W,D/W,IR(★),PAC,SHO,PAS,DRI,DEF,PHY,Hits
0,Lionel Messi,Argentina,RW ST CF,L. Messi,33,93,93,\n\n\n\nFC Barcelona\n2004 ~ 2021\n\n,170.18,159,Left,93,RW,0,"Jul 1, 2004",67500000,560000,138400000,429,85,95,70,91,88,470,96,93,94,91,96,451,91,80,91,94,95,389,86,68,72,69,94,347,44,40,93,95,75,96,91,32,35,24,54,6,11,15,14,8,2231,466,4,4,Medium,Low,5,85,92,91,95,38,65,372
1,C. Ronaldo dos Santos Aveiro,Portugal,ST LW,Cristiano Ronaldo,35,92,92,\n\n\n\nJuventus\n2018 ~ 2022\n\n,187.96,183,Right,92,ST,0,"Jul 10, 2018",46000000,220000,75900000,437,84,95,90,82,86,414,88,81,76,77,92,431,87,91,87,95,71,444,94,95,84,78,93,353,63,29,95,82,84,95,84,28,32,24,58,7,11,15,14,11,2221,464,4,5,High,Low,5,89,93,81,89,35,77,344


In [44]:
df.describe()

Unnamed: 0,Age,Overall Rating,Potential,Height(cm),Weight(lbs),Best Overall,Growth,Value(€),Wage(€),Release Clause(€),Attacking,Crossing,Finishing,Heading Accuracy,Short Passing,Volleys,Skill,Dribbling,Curve,FK Accuracy,Long Passing,Ball Control,Movement,Acceleration,Sprint Speed,Agility,Reactions,Balance,Power,Shot Power,Jumping,Stamina,Strength,Long Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing Tackle,Sliding Tackle,Goalkeeping,GK Diving,GK Handling,GK Kicking,GK Positioning,GK Reflexes,Total Stats,Base Stats,W/F(★),SM(★),IR(★),PAC,SHO,PAS,DRI,DEF,PHY,Hits
count,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0,18730.0
mean,25.132141,65.67827,71.149813,181.227712,165.368927,66.718206,5.471543,2258585.0,8786.011746,4014402.0,248.840096,49.654672,45.830272,51.926802,58.753604,42.674746,256.3748,55.585798,47.228991,42.349279,52.691297,58.519434,317.708382,64.365029,64.421516,63.378911,61.610091,63.932835,296.504698,57.778911,64.570315,62.631554,64.766524,46.757395,254.045969,55.539936,46.351522,50.285051,53.841484,48.027977,58.033422,139.856594,46.538601,47.689002,45.628991,81.28126,16.380673,16.178804,16.044688,16.187293,16.489802,1594.611799,355.52456,2.939616,2.364816,1.091351,67.454458,53.416498,57.637373,62.851148,49.829792,64.335291,15.505446
std,4.676324,6.970029,6.112458,6.817795,15.589258,6.750358,5.672507,5124863.0,19694.647391,9825806.0,74.19327,18.117347,19.555793,17.266243,14.497329,17.624894,78.572647,18.745829,18.190194,17.207929,15.168984,16.546866,55.877195,14.880637,14.633208,14.587133,9.070201,14.062467,50.741054,13.305773,11.852065,15.787138,12.474764,19.289275,64.539724,17.12492,20.687167,19.413867,13.68938,15.639201,12.063926,61.123771,20.103176,21.337767,20.888394,84.477808,17.519561,16.774774,16.458974,16.961809,17.817003,269.823025,40.793483,0.668295,0.765402,0.361585,10.659771,13.828913,10.093331,9.931053,16.429268,9.603825,71.810446
min,16.0,47.0,47.0,154.94,110.0,48.0,0.0,5000.0,0.0,0.0,42.0,6.0,3.0,5.0,7.0,3.0,40.0,5.0,4.0,5.0,5.0,5.0,122.0,13.0,12.0,14.0,24.0,12.0,122.0,18.0,15.0,12.0,16.0,4.0,50.0,9.0,3.0,2.0,9.0,6.0,12.0,20.0,3.0,5.0,4.0,10.0,2.0,2.0,2.0,2.0,2.0,747.0,232.0,1.0,1.0,1.0,28.0,16.0,25.0,25.0,12.0,28.0,0.0
25%,21.0,61.0,67.0,175.26,154.0,62.0,0.0,300000.0,1000.0,438000.0,222.0,38.0,30.0,44.0,54.0,30.0,222.0,49.0,35.0,31.0,43.0,54.0,289.0,57.0,57.0,55.0,56.0,56.0,264.0,48.0,58.0,55.0,57.0,32.0,227.0,44.0,25.0,40.0,45.0,39.0,50.0,83.0,29.0,27.0,25.0,48.0,8.0,8.0,8.0,8.0,8.0,1452.0,326.0,3.0,2.0,1.0,61.0,44.0,51.0,57.0,35.0,58.0,1.0
50%,25.0,66.0,71.0,180.34,165.0,67.0,4.0,650000.0,3000.0,1000000.0,262.0,54.0,49.0,55.0,62.0,44.0,269.0,61.0,49.0,41.0,56.0,63.0,327.0,67.0,67.0,66.0,62.0,66.0,302.0,59.0,65.0,66.0,66.0,51.0,263.0,58.0,53.0,55.0,55.0,49.0,59.0,159.0,52.0,55.0,52.0,53.0,11.0,11.0,11.0,11.0,11.0,1626.0,356.0,3.0,2.0,1.0,68.0,56.0,58.0,64.0,53.0,65.0,3.0
75%,28.0,70.0,75.0,185.42,176.0,71.0,9.0,1800000.0,8000.0,2900000.0,297.0,63.0,62.0,64.0,68.0,56.0,310.0,68.0,61.0,55.0,64.0,69.0,356.0,74.0,74.0,74.0,68.0,74.0,334.0,68.0,73.0,73.0,73.75,62.0,297.0,68.0,64.0,64.0,64.0,60.0,66.0,191.0,63.0,65.0,63.0,59.0,14.0,14.0,14.0,14.0,14.0,1780.0,384.0,3.0,3.0,1.0,75.0,64.0,64.0,69.0,63.0,71.0,8.0
max,43.0,93.0,95.0,205.74,243.0,93.0,26.0,105500000.0,560000.0,203100000.0,437.0,94.0,95.0,93.0,94.0,90.0,470.0,96.0,94.0,94.0,93.0,96.0,464.0,97.0,96.0,96.0,95.0,97.0,444.0,95.0,95.0,97.0,97.0,94.0,421.0,96.0,91.0,95.0,95.0,92.0,96.0,272.0,94.0,93.0,90.0,440.0,90.0,92.0,93.0,91.0,90.0,2316.0,498.0,5.0,5.0,5.0,96.0,93.0,93.0,95.0,91.0,91.0,4500.0


### TODO 6 - Making Team & Contract  seperate columns

- Splitting data "FC Barcelona 2004 ~ 2021" into separate "FC Barcelona" and "2004 ~ 2021" using `lambda x: x.strip().split('\n')`

In [45]:
df['Contract']= df['Team'].apply(lambda x: x.strip().split('\n')[1])

In [46]:
df['Team']= df['Team'].apply(lambda x: x.strip().split('\n')[0])

In [47]:
df.head(3)

Unnamed: 0,Full Name,Nationality,Positions,Name,Age,Overall Rating,Potential,Team,Height(cm),Weight(lbs),foot,Best Overall,Best Position,Growth,Joined Date,Value(€),Wage(€),Release Clause(€),Attacking,Crossing,Finishing,Heading Accuracy,Short Passing,Volleys,Skill,Dribbling,Curve,FK Accuracy,Long Passing,Ball Control,Movement,Acceleration,Sprint Speed,Agility,Reactions,Balance,Power,Shot Power,Jumping,Stamina,Strength,Long Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing Tackle,Sliding Tackle,Goalkeeping,GK Diving,GK Handling,GK Kicking,GK Positioning,GK Reflexes,Total Stats,Base Stats,W/F(★),SM(★),A/W,D/W,IR(★),PAC,SHO,PAS,DRI,DEF,PHY,Hits,Contract
0,Lionel Messi,Argentina,RW ST CF,L. Messi,33,93,93,FC Barcelona,170.18,159,Left,93,RW,0,"Jul 1, 2004",67500000,560000,138400000,429,85,95,70,91,88,470,96,93,94,91,96,451,91,80,91,94,95,389,86,68,72,69,94,347,44,40,93,95,75,96,91,32,35,24,54,6,11,15,14,8,2231,466,4,4,Medium,Low,5,85,92,91,95,38,65,372,2004 ~ 2021
1,C. Ronaldo dos Santos Aveiro,Portugal,ST LW,Cristiano Ronaldo,35,92,92,Juventus,187.96,183,Right,92,ST,0,"Jul 10, 2018",46000000,220000,75900000,437,84,95,90,82,86,414,88,81,76,77,92,431,87,91,87,95,71,444,94,95,84,78,93,353,63,29,95,82,84,95,84,28,32,24,58,7,11,15,14,11,2221,464,4,5,High,Low,5,89,93,81,89,35,77,344,2018 ~ 2022
2,Jan Oblak,Slovenia,GK,J. Oblak,27,91,93,Atlético Madrid,187.96,192,Right,91,GK,2,"Jul 16, 2014",75000000,125000,159400000,95,13,11,15,43,13,109,12,13,14,40,30,307,43,60,67,88,49,268,59,78,41,78,12,140,34,19,11,65,11,68,57,27,12,18,437,87,92,78,90,90,1413,489,3,1,Medium,Medium,3,87,92,78,90,52,90,86,2014 ~ 2023


In [48]:
df['Contract'] = df['Contract'].str.replace('~', '-')

In [49]:
#Checking Dtype of "Joined Date"
df["Joined Date"].dtypes

dtype('O')

### TODO 7 - Converting Date Columns from object to Date and creating seperate day, month and year columns

In [50]:
#Converting into datetime datatype
df["Joined Date"] = pd.to_datetime(df['Joined Date'])

In [51]:
df['Joined Date'].dtype

dtype('<M8[ns]')

In [52]:
#Creating 'Year', 'Month', 'Day' Columns using datetime column
df['Year'] = df["Joined Date"].dt.year
df['Month'] = df["Joined Date"].dt.month_name()
df['Day'] = df["Joined Date"].dt.day_name()

In [53]:
df.head(3)

Unnamed: 0,Full Name,Nationality,Positions,Name,Age,Overall Rating,Potential,Team,Height(cm),Weight(lbs),foot,Best Overall,Best Position,Growth,Joined Date,Value(€),Wage(€),Release Clause(€),Attacking,Crossing,Finishing,Heading Accuracy,Short Passing,Volleys,Skill,Dribbling,Curve,FK Accuracy,Long Passing,Ball Control,Movement,Acceleration,Sprint Speed,Agility,Reactions,Balance,Power,Shot Power,Jumping,Stamina,Strength,Long Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing Tackle,Sliding Tackle,Goalkeeping,GK Diving,GK Handling,GK Kicking,GK Positioning,GK Reflexes,Total Stats,Base Stats,W/F(★),SM(★),A/W,D/W,IR(★),PAC,SHO,PAS,DRI,DEF,PHY,Hits,Contract,Year,Month,Day
0,Lionel Messi,Argentina,RW ST CF,L. Messi,33,93,93,FC Barcelona,170.18,159,Left,93,RW,0,2004-07-01,67500000,560000,138400000,429,85,95,70,91,88,470,96,93,94,91,96,451,91,80,91,94,95,389,86,68,72,69,94,347,44,40,93,95,75,96,91,32,35,24,54,6,11,15,14,8,2231,466,4,4,Medium,Low,5,85,92,91,95,38,65,372,2004 - 2021,2004,July,Thursday
1,C. Ronaldo dos Santos Aveiro,Portugal,ST LW,Cristiano Ronaldo,35,92,92,Juventus,187.96,183,Right,92,ST,0,2018-07-10,46000000,220000,75900000,437,84,95,90,82,86,414,88,81,76,77,92,431,87,91,87,95,71,444,94,95,84,78,93,353,63,29,95,82,84,95,84,28,32,24,58,7,11,15,14,11,2221,464,4,5,High,Low,5,89,93,81,89,35,77,344,2018 - 2022,2018,July,Tuesday
2,Jan Oblak,Slovenia,GK,J. Oblak,27,91,93,Atlético Madrid,187.96,192,Right,91,GK,2,2014-07-16,75000000,125000,159400000,95,13,11,15,43,13,109,12,13,14,40,30,307,43,60,67,88,49,268,59,78,41,78,12,140,34,19,11,65,11,68,57,27,12,18,437,87,92,78,90,90,1413,489,3,1,Medium,Medium,3,87,92,78,90,52,90,86,2014 - 2023,2014,July,Wednesday


In [54]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 18730 entries, 0 to 18978
Data columns (total 77 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   Full Name          18730 non-null  object        
 1   Nationality        18730 non-null  object        
 2   Positions          18730 non-null  object        
 3   Name               18730 non-null  object        
 4   Age                18730 non-null  int64         
 5   Overall Rating     18730 non-null  int64         
 6   Potential          18730 non-null  int64         
 7   Team               18730 non-null  object        
 8   Height(cm)         18730 non-null  float64       
 9   Weight(lbs)        18730 non-null  int64         
 10  foot               18730 non-null  object        
 11  Best Overall       18730 non-null  int64         
 12  Best Position      18730 non-null  object        
 13  Growth             18730 non-null  int64         
 14  Joined

In [55]:
col_names = ['Full Name', 'Name', 'Age', 'Height(cm)', 'Weight(lbs)', 'Nationality', 'Team', 'Contract', 'Joined Date', 'Year',
       'Month', 'Day',
       'Value(€)', 'Wage(€)', 'Release Clause(€)', 'Positions',
       'Overall Rating', 'Potential',
       'foot', 'Best Overall', 'Best Position', 'Growth', 'Attacking', 'Crossing',
       'Finishing', 'Heading Accuracy', 'Short Passing', 'Volleys', 'Skill',
       'Dribbling', 'Curve', 'FK Accuracy', 'Long Passing', 'Ball Control',
       'Movement', 'Acceleration', 'Sprint Speed', 'Agility', 'Reactions',
       'Balance', 'Power', 'Shot Power', 'Jumping', 'Stamina', 'Strength',
       'Long Shots', 'Mentality', 'Aggression', 'Interceptions', 'Positioning',
       'Vision', 'Penalties', 'Composure', 'Defending', 'Marking',
       'Standing Tackle', 'Sliding Tackle', 'Goalkeeping', 'GK Diving',
       'GK Handling', 'GK Kicking', 'GK Positioning', 'GK Reflexes',
       'Total Stats', 'Base Stats', 'A/W', 'D/W',
       'PAC', 'SHO', 'PAS', 'DRI', 'DEF', 'PHY', 'Hits', 'W/F(★)', 'SM(★)', 'IR(★)']

In [56]:
final_df = df[col_names]

In [57]:
final_df.head()

Unnamed: 0,Full Name,Name,Age,Height(cm),Weight(lbs),Nationality,Team,Contract,Joined Date,Year,Month,Day,Value(€),Wage(€),Release Clause(€),Positions,Overall Rating,Potential,foot,Best Overall,Best Position,Growth,Attacking,Crossing,Finishing,Heading Accuracy,Short Passing,Volleys,Skill,Dribbling,Curve,FK Accuracy,Long Passing,Ball Control,Movement,Acceleration,Sprint Speed,Agility,Reactions,Balance,Power,Shot Power,Jumping,Stamina,Strength,Long Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing Tackle,Sliding Tackle,Goalkeeping,GK Diving,GK Handling,GK Kicking,GK Positioning,GK Reflexes,Total Stats,Base Stats,A/W,D/W,PAC,SHO,PAS,DRI,DEF,PHY,Hits,W/F(★),SM(★),IR(★)
0,Lionel Messi,L. Messi,33,170.18,159,Argentina,FC Barcelona,2004 - 2021,2004-07-01,2004,July,Thursday,67500000,560000,138400000,RW ST CF,93,93,Left,93,RW,0,429,85,95,70,91,88,470,96,93,94,91,96,451,91,80,91,94,95,389,86,68,72,69,94,347,44,40,93,95,75,96,91,32,35,24,54,6,11,15,14,8,2231,466,Medium,Low,85,92,91,95,38,65,372,4,4,5
1,C. Ronaldo dos Santos Aveiro,Cristiano Ronaldo,35,187.96,183,Portugal,Juventus,2018 - 2022,2018-07-10,2018,July,Tuesday,46000000,220000,75900000,ST LW,92,92,Right,92,ST,0,437,84,95,90,82,86,414,88,81,76,77,92,431,87,91,87,95,71,444,94,95,84,78,93,353,63,29,95,82,84,95,84,28,32,24,58,7,11,15,14,11,2221,464,High,Low,89,93,81,89,35,77,344,4,5,5
2,Jan Oblak,J. Oblak,27,187.96,192,Slovenia,Atlético Madrid,2014 - 2023,2014-07-16,2014,July,Wednesday,75000000,125000,159400000,GK,91,93,Right,91,GK,2,95,13,11,15,43,13,109,12,13,14,40,30,307,43,60,67,88,49,268,59,78,41,78,12,140,34,19,11,65,11,68,57,27,12,18,437,87,92,78,90,90,1413,489,Medium,Medium,87,92,78,90,52,90,86,3,1,3
3,Kevin De Bruyne,K. De Bruyne,29,180.34,154,Belgium,Manchester City,2015 - 2023,2015-08-30,2015,August,Sunday,87000000,370000,161000000,CAM CM,91,91,Right,91,CAM,0,407,94,82,55,94,82,441,88,85,83,93,92,398,77,76,78,91,76,408,91,63,89,74,91,408,76,66,88,94,84,91,186,68,65,53,56,15,13,5,10,13,2304,485,High,High,76,86,93,88,64,78,163,5,4,4
4,Neymar da Silva Santos Jr.,Neymar Jr,28,175.26,150,Brazil,Paris Saint-Germain,2017 - 2022,2017-08-03,2017,August,Thursday,90000000,270000,166500000,LW CAM,91,91,Right,91,LW,0,408,85,87,62,87,87,448,95,88,89,81,95,453,94,89,96,91,83,357,80,62,81,50,84,356,51,36,87,90,92,93,94,35,30,29,59,9,9,15,15,11,2175,451,High,Medium,91,85,86,94,36,59,273,5,5,5


### Below Code can be used to add a new column which gives the number of year player spent in a club
`import datetime as dt

dt.datetime.today().strftime("%m/%d/%Y")

final_df.assign(year_in_club = dt.datetime.today().year - final_df.loc[:,['Year']], inplace=True)`

In [58]:
final_df.head()

Unnamed: 0,Full Name,Name,Age,Height(cm),Weight(lbs),Nationality,Team,Contract,Joined Date,Year,Month,Day,Value(€),Wage(€),Release Clause(€),Positions,Overall Rating,Potential,foot,Best Overall,Best Position,Growth,Attacking,Crossing,Finishing,Heading Accuracy,Short Passing,Volleys,Skill,Dribbling,Curve,FK Accuracy,Long Passing,Ball Control,Movement,Acceleration,Sprint Speed,Agility,Reactions,Balance,Power,Shot Power,Jumping,Stamina,Strength,Long Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing Tackle,Sliding Tackle,Goalkeeping,GK Diving,GK Handling,GK Kicking,GK Positioning,GK Reflexes,Total Stats,Base Stats,A/W,D/W,PAC,SHO,PAS,DRI,DEF,PHY,Hits,W/F(★),SM(★),IR(★)
0,Lionel Messi,L. Messi,33,170.18,159,Argentina,FC Barcelona,2004 - 2021,2004-07-01,2004,July,Thursday,67500000,560000,138400000,RW ST CF,93,93,Left,93,RW,0,429,85,95,70,91,88,470,96,93,94,91,96,451,91,80,91,94,95,389,86,68,72,69,94,347,44,40,93,95,75,96,91,32,35,24,54,6,11,15,14,8,2231,466,Medium,Low,85,92,91,95,38,65,372,4,4,5
1,C. Ronaldo dos Santos Aveiro,Cristiano Ronaldo,35,187.96,183,Portugal,Juventus,2018 - 2022,2018-07-10,2018,July,Tuesday,46000000,220000,75900000,ST LW,92,92,Right,92,ST,0,437,84,95,90,82,86,414,88,81,76,77,92,431,87,91,87,95,71,444,94,95,84,78,93,353,63,29,95,82,84,95,84,28,32,24,58,7,11,15,14,11,2221,464,High,Low,89,93,81,89,35,77,344,4,5,5
2,Jan Oblak,J. Oblak,27,187.96,192,Slovenia,Atlético Madrid,2014 - 2023,2014-07-16,2014,July,Wednesday,75000000,125000,159400000,GK,91,93,Right,91,GK,2,95,13,11,15,43,13,109,12,13,14,40,30,307,43,60,67,88,49,268,59,78,41,78,12,140,34,19,11,65,11,68,57,27,12,18,437,87,92,78,90,90,1413,489,Medium,Medium,87,92,78,90,52,90,86,3,1,3
3,Kevin De Bruyne,K. De Bruyne,29,180.34,154,Belgium,Manchester City,2015 - 2023,2015-08-30,2015,August,Sunday,87000000,370000,161000000,CAM CM,91,91,Right,91,CAM,0,407,94,82,55,94,82,441,88,85,83,93,92,398,77,76,78,91,76,408,91,63,89,74,91,408,76,66,88,94,84,91,186,68,65,53,56,15,13,5,10,13,2304,485,High,High,76,86,93,88,64,78,163,5,4,4
4,Neymar da Silva Santos Jr.,Neymar Jr,28,175.26,150,Brazil,Paris Saint-Germain,2017 - 2022,2017-08-03,2017,August,Thursday,90000000,270000,166500000,LW CAM,91,91,Right,91,LW,0,408,85,87,62,87,87,448,95,88,89,81,95,453,94,89,96,91,83,357,80,62,81,50,84,356,51,36,87,90,92,93,94,35,30,29,59,9,9,15,15,11,2175,451,High,Medium,91,85,86,94,36,59,273,5,5,5


In [59]:
final_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 18730 entries, 0 to 18978
Data columns (total 77 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   Full Name          18730 non-null  object        
 1   Name               18730 non-null  object        
 2   Age                18730 non-null  int64         
 3   Height(cm)         18730 non-null  float64       
 4   Weight(lbs)        18730 non-null  int64         
 5   Nationality        18730 non-null  object        
 6   Team               18730 non-null  object        
 7   Contract           18730 non-null  object        
 8   Joined Date        18730 non-null  datetime64[ns]
 9   Year               18730 non-null  int64         
 10  Month              18730 non-null  object        
 11  Day                18730 non-null  object        
 12  Value(€)           18730 non-null  int64         
 13  Wage(€)            18730 non-null  int64         
 14  Releas

## References:
- [Kaggle Dataset](https://www.kaggle.com/datasets/yagunnersya/fifa-21-messy-raw-dataset-for-cleaning-exploring?select=fifa21_raw_data.csv%E2%80%8B)
- [FIFA Players Data](https://sofifa.com/)
- [Pandas Documentation](https://pandas.pydata.org/docs/#pandas-documentation)