# FIFA World Cup Data Analysis

![](https://img.fifa.com/image/upload/t_l4/v1543921822/ex1ksdevyxwsgu7rzdv6.jpg)


### Project Overview
This project analyses the FIFA World Cup dataset, which consists of three separate datasets: Winners, Players and Matches. The raw data contains many duplicated and incorrect records, so our job is to clean the datasets so that they can be used for meaningful analysis.

Please note: the data cleaning process uses the `explode()` function, so please upgrade pandas to version 0.25.0 or later for it to work.
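
As a rough illustration of what `explode()` does in the cleaning step, here is a minimal sketch on a toy table. The player names and event strings are made up for the example and are not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for the Players data; names and event strings are made up.
toy = pd.DataFrame({
    "Player Name": ["Player A", "Player B"],
    "Event": ["G40' Y75'", None],
})

# Split each Event string on whitespace, then give every token its own row.
exploded = (
    toy.assign(Event=toy["Event"].str.split())
       .explode("Event")
       .reset_index(drop=True)
)
print(exploded)
```

Rows with no event keep a single row with a missing value, so no player is dropped by the operation.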

### The datasets
_For more information about the dataset, read [here](https://www.kaggle.com/abecklas/fifa-world-cup)._

### Our process
1. Project planning: we spent some time planning the story, sharing the work among team members, and brainstorming ideas for data analysis and visualisation.
2. Clean the datasets
3. Do Exploratory Data Analysis in a Jupyter Notebook
4. Do more detailed analysis with data visualisation using Google Data Studio
5. Present It!

### Google Data Studio report
[Please click here to view our report](https://datastudio.google.com/open/1ybUFdJcafz1F46MHy8rC8HQX4-2utHwC)



### Import libraries

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import re
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

sns.set_style("whitegrid")

In [2]:
# from google.colab import drive
# drive.mount('/content/gdrive')

### Create Data Frames

In [3]:
# df_matches_raw = pd.read_csv('/content/gdrive/My Drive/PROJECTS/CoderSchool_Fansipan/github_repo/world-cup-da/data/matches.csv', encoding='utf-8')
# df_players_raw = pd.read_csv('/content/gdrive/My Drive/PROJECTS/CoderSchool_Fansipan/github_repo/world-cup-da/data/players.csv', encoding='utf-8')

df_matches_raw = pd.read_csv('/Users/jodythai/Google Drive/PROJECTS/CoderSchool_Fansipan/github_repo/world-cup-da/data/matches.csv', encoding='UTF-8')
df_winners_raw = pd.read_csv('/Users/jodythai/Google Drive/PROJECTS/CoderSchool_Fansipan/github_repo/world-cup-da/data/winners.csv', encoding='UTF-8')
df_players_raw = pd.read_csv('/Users/jodythai/Google Drive/PROJECTS/CoderSchool_Fansipan/github_repo/world-cup-da/data/players.csv', encoding='UTF-8')

# Clean Data

In [4]:
# Because the Event column stores several events in one cell,
# we split its string into one event per row in order to extract more valuable insights
df_players_raw = df_players_raw.assign(Event=df_players_raw["Event"].str.split(r'\s')).explode('Event').reset_index(drop=True)

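# Define helper functions
def get_event_type(row):
  """ Return the event code (the uppercase letters) from an Event value,
      or '' if the value is missing or contains no letters
  """
  event_type = ''
  if pd.notna(row):
    matches = re.findall(r"[A-Z]+", row)
    event_type = matches[0] if matches else ''
  return event_type

def get_event_at(row):
  """ Return the minute (the digits) from an Event value,
      or '' if the value is missing or contains no digits
  """
  event_at = ''
  if pd.notna(row):
    matches = re.findall(r"\d+", row)
    event_at = matches[0] if matches else ''
  return event_at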

# Apply the string manipulation
df_players_raw["Event At"] = df_players_raw["Event"].apply(get_event_at)
df_players_raw["Event"] = df_players_raw["Event"].apply(get_event_type)

# Rename Event column to Event Type
df_players_raw.rename(columns={'Event': 'Event Type'}, inplace=True)

# Test
df_players_raw[df_players_raw['Event Type'] != ""].sample(10)

Unnamed: 0,RoundID,MatchID,Team Initials,Coach Name,Line-up,Shirt Number,Player Name,Position,Event Type,Event At
36787,255931,300186463,ECU,RUEDA Reinaldo (COL),S,14,MINDA,,O,83
20043,323,129,TCH,VENGLOS Jozef (SVK),S,9,Lubos KUBIK,,G,76
19632,322,263,ITA,VICINI Azeglio (ITA),S,17,Roberto DONADONI,,O,51
38666,255951,300186487,BRA,SCOLARI Luiz Felipe (BRA),S,17,L GUSTAVO,,Y,60
35237,249718,300061508,URU,TABAREZ Oscar (URU),N,13,W.S.ABREU.G,,I,76
20425,751,243,IRL,CHARLTON Jack (ENG),N,10,Tony CASCARINO,,I,53
23483,1014,8731,ESP,CLEMENTE Javier (ESP),S,14,I. CAMPO,,Y,75
18089,714,421,ESP,MUNOZ Miguel (ESP),N,7,Juan Antonio SENOR,,IH,46
14979,293,740,ALG,MEKHLOUFI Rachid (ALG),S,16,Faouzi MANSOURI,,O,75
237,201,1092,YUG,SIMONOVIC Bosko (YUG),S,0,Ivica BEK,,G,67
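
The helpers in this cell take the first run of capital letters and the first run of digits from each token. A minimal sketch of a stricter variant follows, assuming tokens look like `G40'` (code first, minute second). The name `parse_event` and the pattern are illustrative and not part of the notebook.

```python
import re

# Hypothetical pattern: an uppercase event code followed directly by the minute.
EVENT_PATTERN = re.compile(r"^([A-Z]+)(\d+)")

def parse_event(token):
    """Return (event_type, minute) as strings, or ('', '') if the token does not match."""
    if not isinstance(token, str):
        return "", ""
    match = EVENT_PATTERN.match(token.strip())
    if match is None:
        return "", ""
    return match.group(1), match.group(2)

print(parse_event("G40'"))
print(parse_event("IH46'"))
print(parse_event(float("nan")))
```

Parsing both parts with one pattern keeps the code and the minute in step, and malformed tokens fall back to empty strings instead of raising an error.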


In [5]:
df_matches_raw.tail()

Unnamed: 0,Year,Datetime,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials
4567,,,,,,,,,,,,,,,,,,,,
4568,,,,,,,,,,,,,,,,,,,,
4569,,,,,,,,,,,,,,,,,,,,
4570,,,,,,,,,,,,,,,,,,,,
4571,,,,,,,,,,,,,,,,,,,,


In [6]:
df_players_raw.head(2)

Unnamed: 0,RoundID,MatchID,Team Initials,Coach Name,Line-up,Shirt Number,Player Name,Position,Event Type,Event At
0,201,1096,FRA,CAUDRON Raoul (FRA),S,0,Alex THEPOT,GK,,
1,201,1096,MEX,LUQUE Juan (MEX),S,0,Oscar BONFIGLIO,GK,,


In [7]:
df_winners_raw.head(5)

Unnamed: 0,Year,Country,Winner,Runners-Up,Third,Fourth,GoalsScored,QualifiedTeams,MatchesPlayed,Attendance
0,1930,Uruguay,Uruguay,Argentina,USA,Yugoslavia,70,13,18,590.549
1,1934,Italy,Italy,Czechoslovakia,Germany,Austria,70,16,17,363.000
2,1938,France,Italy,Hungary,Brazil,Sweden,84,15,18,375.700
3,1950,Brazil,Uruguay,Brazil,Sweden,Spain,88,13,22,1.045.246
4,1954,Switzerland,Germany FR,Hungary,Austria,Uruguay,140,16,26,768.607


#### Remove NaN

In [8]:
# Remove null rows of the Matches dataset:
# keep only rows where both RoundID and MatchID are present
df_matches = df_matches_raw.copy()
df_players = df_players_raw.copy()
df_winners = df_winners_raw.copy()

df_matches = df_matches[df_matches["RoundID"].notnull() & df_matches["MatchID"].notnull()]

# Find NaN values from the data points
df_matches.isnull().sum()

Year                    0
Datetime                0
Stage                   0
Stadium                 0
City                    0
Home Team Name          0
Home Team Goals         0
Away Team Goals         0
Away Team Name          0
Win conditions          0
Attendance              2
Half-time Home Goals    0
Half-time Away Goals    0
Referee                 0
Assistant 1             0
Assistant 2             0
RoundID                 0
MatchID                 0
Home Team Initials      0
Away Team Initials      0
dtype: int64

In [9]:
# Get the 2 rows where Attendance is NaN
df_matches[df_matches["Attendance"].isnull()]

Unnamed: 0,Year,Datetime,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials
823,2014.0,30 Jun 2014 - 17:00,Round of 16,Estadio Beira-Rio,Porto Alegre,Germany,2.0,1.0,Algeria,Germany win after extra time,,0.0,0.0,RICCI Sandro (BRA),DE CARVALHO Emerson (BRA),VAN GASSE Marcelo (BRA),255951.0,300186460.0,GER,ALG
841,2014.0,30 Jun 2014 - 17:00,Round of 16,Estadio Beira-Rio,Porto Alegre,Germany,2.0,1.0,Algeria,Germany win after extra time,,0.0,0.0,RICCI Sandro (BRA),DE CARVALHO Emerson (BRA),VAN GASSE Marcelo (BRA),255951.0,300186460.0,GER,ALG


In [10]:
# Clean NaN values
df_matches["Attendance"] = df_matches["Attendance"].fillna(0)
df_matches = df_matches.fillna('')
df_players = df_players.fillna('')
df_winners = df_winners.fillna('')

In [11]:
df_matches.sample(5)

Unnamed: 0,Year,Datetime,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials
636,2002.0,21 Jun 2002 - 15:30,Quarter-finals,Shizuoka Stadium Ecopa,Shizuoka,England,1.0,2.0,Brazil,,47436.0,1.0,1.0,RAMOS RIZO Felipe (MEX),VERGARA Hector (CAN),SAEED Mohamed (MDV),43950300.0,43950057.0,ENG,BRA
290,1978.0,11 Jun 1978 - 13:45,Group 3,Estadio Jos� Mar�a Minella,Mar Del Plata,Brazil,1.0,0.0,Austria,,35221.0,1.0,0.0,WURTZ Robert (FRA),BOUZO Farouk (SYR),GEBREYESUS DIFUE Tesfaye (ERI),278.0,2215.0,BRA,AUT
489,1994.0,26 Jun 1994 - 16:00,Group A,Stanford Stadium,San Francisco,Switzerland,0.0,2.0,Colombia,,83401.0,0.0,1.0,MIKKELSEN Peter (DEN),CHRISTENSEN Carl-Johan Meyer (DEN),JAMES Douglas Micael (TRI),337.0,3074.0,SUI,COL
540,1998.0,20 Jun 1998 - 14:30,Group H,La Beaujoire,Nantes,Japan,0.0,1.0,Croatia,,35500.0,0.0,0.0,RAMDHAN Ramesh (TRI),GONZALES Merere (TRI),SALIE Achmat (RSA),1014.0,8751.0,JPN,CRO
641,2002.0,26 Jun 2002 - 20:30,Semi-finals,Saitama Stadium 2002,Saitama,Brazil,1.0,0.0,Turkey,,61058.0,0.0,0.0,NIELSEN Kim Milton (DEN),WIERZBOWSKI Maciej (POL),SRAMKA Igor (SVK),43950400.0,43950062.0,BRA,TUR


In [12]:
# Process Attendance values: remove the '.' thousands separators
df_winners = df_winners_raw.fillna('')
df_winners["Attendance"] = df_winners["Attendance"].str.replace('.', '', regex=False)
df_winners.head()

Unnamed: 0,Year,Country,Winner,Runners-Up,Third,Fourth,GoalsScored,QualifiedTeams,MatchesPlayed,Attendance
0,1930,Uruguay,Uruguay,Argentina,USA,Yugoslavia,70,13,18,590549
1,1934,Italy,Italy,Czechoslovakia,Germany,Austria,70,16,17,363000
2,1938,France,Italy,Hungary,Brazil,Sweden,84,15,18,375700
3,1950,Brazil,Uruguay,Brazil,Sweden,Spain,88,13,22,1045246
4,1954,Switzerland,Germany FR,Hungary,Austria,Uruguay,140,16,26,768607


#### Clean Data Point values

In [13]:
# Rename Germany FR to Germany
df_winners.replace("Germany FR", "Germany", inplace=True)
df_matches.replace("Germany FR", "Germany", inplace=True)
df_players.replace("Germany FR", "Germany", inplace=True)
# df_matches.replace("German DR", "Germany", inplace=True)
# df_players.replace("German DR", "Germany", inplace=True)

# Fix country names (the '�' characters match the mis-encoded values as they appear in the loaded data)
df_matches.replace("IR Iran", "Iran", inplace=True)
df_matches.replace("rn\">Bosnia and Herzegovina", "Bosnia and Herzegovina", inplace=True)
df_matches.replace("rn\">Serbia and Montenegro", "Serbia and Montenegro", inplace=True)
df_matches.replace("rn\">United Arab Emirates", "United Arab Emirates", inplace=True)
df_matches.replace("rn\">Republic of Ireland", "Republic of Ireland", inplace=True)
df_matches.replace("rn\">Trinidad and Tobago", "Trinidad and Tobago", inplace=True)
df_matches.replace("C�te d'Ivoire", "Ivory Coast", inplace=True)
df_matches.replace("Korea DPR", "North Korea", inplace=True)
df_players.replace("Korea DPR", "North Korea", inplace=True)

# Fix Stadium names
df_matches.replace("Maracan� - Est�dio Jornalista M�rio Filho", "Maracan�", inplace=True)
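
The one-by-one replacements above can also be expressed as a prefix clean-up plus a lookup table. This is only a sketch on toy values; the `renames` dictionary is a hypothetical excerpt, not the full list of fixes.

```python
import pandas as pd

# Toy values showing the kinds of problems fixed above.
teams = pd.Series(['rn">Republic of Ireland', "IR Iran", "Korea DPR", "Brazil"])

# Hypothetical lookup table for exact-match renames.
renames = {"IR Iran": "Iran", "Korea DPR": "North Korea"}

cleaned = (
    teams.str.replace(r'^rn">', "", regex=True)  # drop the leftover HTML fragment
         .replace(renames)                       # rename whole values that match a key
)
print(cleaned.tolist())
```

Stripping the `rn">` prefix with one pattern avoids listing every affected country by hand.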

#### Separate Datetime column

In [14]:
# Separate DateTime column
df_matches.insert(loc = 2, column="Time", value="")
df_matches.head(2)

Unnamed: 0,Year,Datetime,Time,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,...,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials
0,1930.0,13 Jul 1930 - 15:00,,Group 1,Pocitos,Montevideo,France,4.0,1.0,Mexico,...,4444.0,3.0,0.0,LOMBARDI Domingo (URU),CRISTOPHE Henry (BEL),REGO Gilberto (BRA),201.0,1096.0,FRA,MEX
1,1930.0,13 Jul 1930 - 15:00,,Group 4,Parque Central,Montevideo,USA,3.0,0.0,Belgium,...,18346.0,2.0,0.0,MACIAS Jose (ARG),MATEUCCI Francisco (URU),WARNKEN Alberto (CHI),201.0,1090.0,USA,BEL


In [15]:
df_matches.loc[:, 'Time'] = df_matches["Datetime"].apply(lambda x: x.split('-')[1].strip())
df_matches.loc[:, 'Datetime'] = df_matches["Datetime"].apply(lambda x: x.split('-')[0].strip())
df_matches.head(2)

Unnamed: 0,Year,Datetime,Time,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,...,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials
0,1930.0,13 Jul 1930,15:00,Group 1,Pocitos,Montevideo,France,4.0,1.0,Mexico,...,4444.0,3.0,0.0,LOMBARDI Domingo (URU),CRISTOPHE Henry (BEL),REGO Gilberto (BRA),201.0,1096.0,FRA,MEX
1,1930.0,13 Jul 1930,15:00,Group 4,Parque Central,Montevideo,USA,3.0,0.0,Belgium,...,18346.0,2.0,0.0,MACIAS Jose (ARG),MATEUCCI Francisco (URU),WARNKEN Alberto (CHI),201.0,1090.0,USA,BEL


In [16]:
# Rename Datetime column
df_matches.rename(columns={'Datetime': 'Date'}, inplace=True)
df_matches.sample(5)

Unnamed: 0,Year,Date,Time,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,...,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials
222,1970.0,11 Jun 1970,16:00,Group 3,Jalisco,Guadalajara,England,1.0,0.0,Czechoslovakia,...,49292.0,0.0,0.0,MACHIN Roger (FRA),EMSBERGER Gyula (HUN),MARSCHALL Ferdinand (AUT),250.0,1813.0,ENG,TCH
788,2014.0,17 Jun 2014,18:00,Group H,Arena Pantanal,Cuiaba,Russia,1.0,1.0,Korea Republic,...,37603.0,0.0,0.0,PITANA Nestor (ARG),MAIDANA Hernan (ARG),BELATTI Juan Pablo (ARG),255931.0,300186499.0,RUS,KOR
432,1990.0,16 Jun 1990,21:00,Group F,Sant Elia,Cagliari,England,0.0,0.0,Netherlands,...,35267.0,0.0,0.0,PETROVIC Zoran (SRB),HANSAL Mohamed (ALG),CODESAL MENDEZ Edgardo (MEX),322.0,160.0,ENG,NED
358,1982.0,10 Jul 1982,20:00,Match for third place,Jose Rico Perez,Alicante,Poland,3.0,2.0,France,...,28000.0,2.0,1.0,GARRIDO Antonio (POR),RUBIO VAZQUEZ Mario (MEX),LACARNE Belaid (ALG),676.0,921.0,POL,FRA
522,1998.0,12 Jun 1998,21:00,Group C,Stade V�lodrome,Marseilles,France,3.0,0.0,South Africa,...,55000.0,1.0,0.0,REZENDE Marcio (BRA),PINTO Arnaldo (BRA),GONZALES Merere (TRI),1014.0,8730.0,FRA,RSA
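
The split above keeps Date and Time as plain strings. A hedged alternative sketch is to parse the original value into a real timestamp first. It assumes every value follows the `13 Jul 1930 - 15:00` layout and that month abbreviations are read in an English locale.

```python
import pandas as pd

# Two raw values in the same layout as the original Datetime column.
raw = pd.Series(["13 Jul 1930 - 15:00", "30 Jun 2014 - 17:00"])

# Parse with an explicit format so malformed values raise instead of being guessed.
parsed = pd.to_datetime(raw, format="%d %b %Y - %H:%M")

# Derive separate date and time strings from the parsed timestamps.
dates = parsed.dt.strftime("%Y-%m-%d")
times = parsed.dt.strftime("%H:%M")
print(dates.tolist(), times.tolist())
```

With a timestamp column, sorting by date and grouping by month or weekday work without further string handling.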


#### Check data duplication

In [17]:
# Check duplication
df_matches["MatchID"].nunique() == df_matches["MatchID"].count()

False

In [18]:
# Find duplicated rows from the dataset
def get_duplicated_data(df, key):
  """
    Return a DataFrame of duplicated data points of a given dataset
  """
  
  df_key = df[key]
  return df[df_key.isin(df_key[df_key.duplicated()])].sort_values(key)
  
get_duplicated_data(df_matches, "MatchID")

Unnamed: 0,Year,Date,Time,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,...,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials
839,2014.0,29 Jun 2014,17:00,Round of 16,Arena Pernambuco,Recife,Costa Rica,1.0,1.0,Greece,...,41242.0,0.0,0.0,Ben WILLIAMS (AUS),CREAM Matthew (AUS),ANAZ Hakan (AUS),255951.0,300186459.0,CRC,GRE
833,2014.0,29 Jun 2014,17:00,Round of 16,Arena Pernambuco,Recife,Costa Rica,1.0,1.0,Greece,...,41242.0,0.0,0.0,Ben WILLIAMS (AUS),CREAM Matthew (AUS),ANAZ Hakan (AUS),255951.0,300186459.0,CRC,GRE
841,2014.0,30 Jun 2014,17:00,Round of 16,Estadio Beira-Rio,Porto Alegre,Germany,2.0,1.0,Algeria,...,0.0,0.0,0.0,RICCI Sandro (BRA),DE CARVALHO Emerson (BRA),VAN GASSE Marcelo (BRA),255951.0,300186460.0,GER,ALG
823,2014.0,30 Jun 2014,17:00,Round of 16,Estadio Beira-Rio,Porto Alegre,Germany,2.0,1.0,Algeria,...,0.0,0.0,0.0,RICCI Sandro (BRA),DE CARVALHO Emerson (BRA),VAN GASSE Marcelo (BRA),255951.0,300186460.0,GER,ALG
824,2014.0,04 Jul 2014,17:00,Quarter-finals,Estadio Castelao,Fortaleza,Brazil,2.0,1.0,Colombia,...,60342.0,1.0,0.0,Carlos VELASCO CARBALLO (ESP),ALONSO FERNANDEZ Roberto (ESP),YUSTE Juan (ESP),255953.0,300186461.0,BRA,COL
845,2014.0,04 Jul 2014,17:00,Quarter-finals,Estadio Castelao,Fortaleza,Brazil,2.0,1.0,Colombia,...,60342.0,1.0,0.0,Carlos VELASCO CARBALLO (ESP),ALONSO FERNANDEZ Roberto (ESP),YUSTE Juan (ESP),255953.0,300186461.0,BRA,COL
840,2014.0,30 Jun 2014,13:00,Round of 16,Estadio Nacional,Brasilia,France,2.0,0.0,Nigeria,...,67882.0,0.0,0.0,GEIGER Mark (USA),HURD Sean (USA),FLETCHER Joe (CAN),255951.0,300186462.0,FRA,NGA
822,2014.0,30 Jun 2014,13:00,Round of 16,Estadio Nacional,Brasilia,France,2.0,0.0,Nigeria,...,67882.0,0.0,0.0,GEIGER Mark (USA),HURD Sean (USA),FLETCHER Joe (CAN),255951.0,300186462.0,FRA,NGA
826,2014.0,08 Jul 2014,17:00,Semi-finals,Estadio Mineirao,Belo Horizonte,Brazil,1.0,7.0,Germany,...,58141.0,0.0,5.0,RODRIGUEZ Marco (MEX),TORRENTERA Marvin (MEX),QUINTERO Marcos (MEX),255955.0,300186474.0,BRA,GER
848,2014.0,08 Jul 2014,17:00,Semi-finals,Estadio Mineirao,Belo Horizonte,Brazil,1.0,7.0,Germany,...,58141.0,0.0,5.0,RODRIGUEZ Marco (MEX),TORRENTERA Marvin (MEX),QUINTERO Marcos (MEX),255955.0,300186474.0,BRA,GER


In [19]:
df_matches.drop_duplicates(keep = 'first', inplace = True)

In [20]:
df_players.drop_duplicates(keep = 'first', inplace = True)
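
The duplicate check and removal can be sketched on a toy table as follows. The values are illustrative, and de-duplicating on the `MatchID` key alone is an assumption for the example; the cells above drop only fully identical rows.

```python
import pandas as pd

# Toy matches table with one repeated MatchID (values are illustrative).
matches = pd.DataFrame({
    "MatchID": [1, 2, 2, 3],
    "Home Team Name": ["Brazil", "Germany", "Germany", "France"],
})

# keep=False flags every member of a duplicated group, not just the repeats.
dupes = matches[matches.duplicated(subset="MatchID", keep=False)]

# Keep the first row per MatchID and confirm the key is now unique.
deduped = matches.drop_duplicates(subset="MatchID", keep="first")
assert deduped["MatchID"].is_unique
print(len(dupes), len(deduped))
```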

#### Feature Engineering

In [21]:
country_dict = {'FRA':'France', 'MEX': 'Mexico', 'USA':'USA', 'BEL':'Belgium', 'YUG':'Yugoslavia', 'BRA':'Brazil', 'ROU':'Romania', 
                'PER':'Peru', 'ARG':'Argentina', 'CHI':'Chile', 'BOL':'Bolivia', 'PAR':'Paraguay', 'URU':'Uruguay', 'AUT':'Austria', 'HUN':'Hungary', 
                'EGY':'Egypt', 'SUI':'Switzerland', 'NED':'Netherlands', 'SWE':'Sweden', 'GER':'Germany', 'ESP':'Spain', 'ITA':'Italy', 'TCH':'Czechoslovakia', 'INH':'Dutch East Indies', 
                'CUB':'Cuba', 'NOR':'Norway', 'POL':'Poland', 'FRG':'Germany', 'GDR' : 'German DR',
                'ENG':'England', 'SCO':'Scotland', 'TUR':'Turkey', 'KOR':'Korea Republic', 'URS':'Soviet Union', 'WAL':'Wales', 'NIR':'Northern Ireland', 'COL':'Colombia', 
                'BUL':'Bulgaria', 'PRK':'North Korea', 'POR':'Portugal', 'ISR':'Israel', 'MAR':'Morocco', 'SLV':'El Salvador', 'AUS':'Australia', 'ZAI':'Zaire', 
                'HAI':'Haiti', 'TUN':'Tunisia', 'IRN':'Iran', 'CMR':'Cameroon', 'NZL':'New Zealand', 'ALG':'Algeria', 'HON':'Honduras', 'KUW':'Kuwait', 'CAN':'Canada', 
                'IRQ':'Iraq', 'DEN':'Denmark', 'UAE':'United Arab Emirates', 'CRC':'Costa Rica', 'IRL':'Republic of Ireland', 'KSA':'Saudi Arabia', 'RUS':'Russia', 'GRE':'Greece', 'NGA':'Nigeria', 
                'RSA':'South Africa', 'JPN':'Japan', 'JAM':'Jamaica', 'CRO':'Croatia', 'SEN':'Senegal', 'SVN':'Slovenia', 'ECU':'Ecuador', 'CHN':'China PR', 'TRI':'Trinidad and Tobago',
                'CIV':'Ivory Coast', 'SCG':'Serbia and Montenegro', 'ANG':'Angola', 'CZE':'Czech Republic', 'GHA':'Ghana', 'TOG':'Togo', 'UKR':'Ukraine', 'SRB':'Serbia', 'SVK':'Slovakia', 'BIH':'Bosnia and Herzegovina'}

# Input a player name and return the country name
def get_country_name_from_player(df, player_name):
  initials = df[df["Player Name"] == player_name]['Team Initials'].unique()[0]
  
  return country_dict[initials]

def get_country_name_by_initials(initials):
  return country_dict[initials]

In [22]:
df_players["Team Name"] = df_players["Team Initials"].apply(lambda x: get_country_name_by_initials(x))
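
A dictionary lookup like this raises `KeyError` on the first unknown code. As a minimal sketch, `Series.map` can be used instead to list every code that has no entry. The two-entry dictionary and the code `XYZ` are made up for the example.

```python
import pandas as pd

# A small excerpt of the lookup; "XYZ" is a made-up code with no entry.
codes = {"FRA": "France", "MEX": "Mexico"}
initials = pd.Series(["FRA", "MEX", "XYZ"])

# map() leaves unknown codes as NaN instead of raising KeyError.
names = initials.map(codes)
unmapped = sorted(initials[names.isna()].unique())
print(unmapped)
```

Listing the unmapped codes once makes it easier to complete or correct the dictionary before the new column is used.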

#### Convert Data Types

In [23]:
# Convert data type of Players dataset
df_players["Team Initials"] = df_players["Team Initials"].astype("category")
df_players["Coach Name"] = df_players["Coach Name"].astype("category")
df_players["Line-up"] = df_players["Line-up"].astype("category")
df_players["Player Name"] = df_players["Player Name"].astype("category")
df_players["Position"] = df_players["Position"].astype("category")
df_players["Event Type"] = df_players["Event Type"].astype("category")
df_players["Event At"] = df_players["Event At"].astype("category")
df_players["Team Name"] = df_players["Team Name"].astype("category")
df_players.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 38646 entries, 0 to 38645
Data columns (total 11 columns):
RoundID          38646 non-null int64
MatchID          38646 non-null int64
Team Initials    38646 non-null category
Coach Name       38646 non-null category
Line-up          38646 non-null category
Shirt Number     38646 non-null int64
Player Name      38646 non-null category
Position         38646 non-null category
Event Type       38646 non-null category
Event At         38646 non-null category
Team Name        38646 non-null category
dtypes: category(8), int64(3)
memory usage: 1.9 MB
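
The column-by-column conversions can also be written as a single `astype` call with a column-to-dtype mapping. The sketch below uses a toy frame with three of the Players columns; the values are illustrative.

```python
import pandas as pd

# Toy frame with a few of the Players columns (values are illustrative).
players = pd.DataFrame({
    "Team Initials": ["FRA", "MEX"],
    "Position": ["GK", "GK"],
    "Shirt Number": ["0", "0"],
})

# One astype call with a column -> dtype mapping.
dtypes = {"Team Initials": "category", "Position": "category", "Shirt Number": "int64"}
players = players.astype(dtypes)
print(players.dtypes)
```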


In [24]:
# Convert data type of Winners dataset
df_winners["Country"] = df_winners["Country"].astype("category")
df_winners["Winner"] = df_winners["Winner"].astype("category")
df_winners["Runners-Up"] = df_winners["Runners-Up"].astype("category")
df_winners["Third"] = df_winners["Third"].astype("category")
df_winners["Fourth"] = df_winners["Fourth"].astype("category")
df_winners["Attendance"] = df_winners["Attendance"].astype("int")
df_winners.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 10 columns):
Year              20 non-null int64
Country           20 non-null category
Winner            20 non-null category
Runners-Up        20 non-null category
Third             20 non-null category
Fourth            20 non-null category
GoalsScored       20 non-null int64
QualifiedTeams    20 non-null int64
MatchesPlayed     20 non-null int64
Attendance        20 non-null int64
dtypes: category(5), int64(5)
memory usage: 4.0 KB


In [25]:
# Convert data type of Matches dataset
df_matches["Year"] = df_matches["Year"].astype("category")
df_matches["Date"] = df_matches["Date"].astype("category")
df_matches["Time"] = df_matches["Time"].astype("category")
# df_matches["Datetime"] = df_matches["Datetime"].astype("category")
df_matches["Stage"] = df_matches["Stage"].astype("category")
df_matches["Stadium"] = df_matches["Stadium"].astype("category")
df_matches["City"] = df_matches["City"].astype("category")
df_matches["Home Team Name"] = df_matches["Home Team Name"].astype("category")
df_matches["Home Team Goals"] = df_matches["Home Team Goals"].astype("int")
df_matches["Away Team Goals"] = df_matches["Away Team Goals"].astype("int")
df_matches["Away Team Name"] = df_matches["Away Team Name"].astype("category")
df_matches["Win conditions"] = df_matches["Win conditions"].astype("category")
df_matches["Attendance"] = df_matches["Attendance"].astype("int")
df_matches["Half-time Home Goals"] = df_matches["Half-time Home Goals"].astype("int")
df_matches["Half-time Away Goals"] = df_matches["Half-time Away Goals"].astype("int")
df_matches["Referee"] = df_matches["Referee"].astype("category")
df_matches["Assistant 1"] = df_matches["Assistant 1"].astype("category")
df_matches["Assistant 2"] = df_matches["Assistant 2"].astype("category")
df_matches["RoundID"] = df_matches["RoundID"].astype("int")
df_matches["MatchID"] = df_matches["MatchID"].astype("int")
df_matches["Home Team Initials"] = df_matches["Home Team Initials"].astype("category")
df_matches["Away Team Initials"] = df_matches["Away Team Initials"].astype("category")
df_matches.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 836 entries, 0 to 835
Data columns (total 21 columns):
Year                    836 non-null category
Date                    836 non-null category
Time                    836 non-null category
Stage                   836 non-null category
Stadium                 836 non-null category
City                    836 non-null category
Home Team Name          836 non-null category
Home Team Goals         836 non-null int64
Away Team Goals         836 non-null int64
Away Team Name          836 non-null category
Win conditions          836 non-null category
Attendance              836 non-null int64
Half-time Home Goals    836 non-null int64
Half-time Away Goals    836 non-null int64
Referee                 836 non-null category
Assistant 1             836 non-null category
Assistant 2             836 non-null category
RoundID                 836 non-null int64
MatchID                 836 non-null int64
Home Team Initials      836 non-null category

In [26]:
# df_matches_players = df_matches.copy()
# df_matches_players.merge(df_players, left_on="MatchID", right_on="MatchID")
# Merge Players and Matches datasets
df_matches_players = pd.merge(df_matches, df_players, how='outer', on="MatchID")

df_matches_players[df_matches_players['Player Name'] == 'Alex THEPOT']

Unnamed: 0,Year,Date,Time,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,...,RoundID_y,Team Initials,Coach Name,Line-up,Shirt Number,Player Name,Position,Event Type,Event At,Team Name
0,1930.0,13 Jul 1930,15:00,Group 1,Pocitos,Montevideo,France,4,1,Mexico,...,201,FRA,CAUDRON Raoul (FRA),S,0,Alex THEPOT,GK,,,France
146,1930.0,15 Jul 1930,16:00,Group 1,Parque Central,Montevideo,Argentina,1,0,France,...,201,FRA,CAUDRON Raoul (FRA),S,0,Alex THEPOT,GK,,,France
341,1930.0,19 Jul 1930,12:50,Group 1,Estadio Centenario,Montevideo,Chile,1,0,France,...,201,FRA,CAUDRON Raoul (FRA),S,0,Alex THEPOT,GK,,,France
705,1934.0,27 May 1934,16:30,Preliminary round,Stadio Benito Mussolini,Turin,Austria,3,2,France,...,204,FRA,KIMPTON George (ENG),S,0,Alex THEPOT,GKC,,,France
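
Merging on `MatchID` alone produces the `RoundID_x` and `RoundID_y` columns seen above. A hedged sketch of one alternative is to merge on both keys, assuming `RoundID` agrees between the two tables for every match, which should be checked on the real data first. The toy rows reuse two names from the Players preview, and `Player X` is made up.

```python
import pandas as pd

matches = pd.DataFrame({
    "RoundID": [201, 201],
    "MatchID": [1096, 1090],
    "Home Team Name": ["France", "USA"],
})
players = pd.DataFrame({
    "RoundID": [201, 201, 201],
    "MatchID": [1096, 1096, 1090],
    "Player Name": ["Alex THEPOT", "Oscar BONFIGLIO", "Player X"],
})

# validate raises if a match appears more than once on the left side;
# indicator adds a _merge column showing which side each row came from.
merged = pd.merge(matches, players, how="outer", on=["RoundID", "MatchID"],
                  validate="one_to_many", indicator=True)
print(merged.columns.tolist())
print(merged["_merge"].value_counts())
```

Rows marked `left_only` or `right_only` in `_merge` would point to matches without players, or players without a match.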


In [27]:
# Convert data type
df_matches_players["Year"] = df_matches_players["Year"].astype("category")
df_matches_players["Date"] = df_matches_players["Date"].astype("category")
df_matches_players["Time"] = df_matches_players["Time"].astype("category")
df_matches_players["Stage"] = df_matches_players["Stage"].astype("category")
df_matches_players["Stadium"] = df_matches_players["Stadium"].astype("category")
df_matches_players["City"] = df_matches_players["City"].astype("category")
df_matches_players["Home Team Name"] = df_matches_players["Home Team Name"].astype("category")
df_matches_players["Home Team Goals"] = df_matches_players["Home Team Goals"].astype("int")
df_matches_players["Away Team Goals"] = df_matches_players["Away Team Goals"].astype("int")
df_matches_players["Away Team Name"] = df_matches_players["Away Team Name"].astype("category")
df_matches_players["Win conditions"] = df_matches_players["Win conditions"].astype("category")
df_matches_players["Attendance"] = df_matches_players["Attendance"].astype("int")
df_matches_players["Half-time Home Goals"] = df_matches_players["Half-time Home Goals"].astype("int")
df_matches_players["Half-time Away Goals"] = df_matches_players["Half-time Away Goals"].astype("int")
df_matches_players["Referee"] = df_matches_players["Referee"].astype("category")
df_matches_players["Assistant 1"] = df_matches_players["Assistant 1"].astype("category")
df_matches_players["Assistant 2"] = df_matches_players["Assistant 2"].astype("category")
df_matches_players["RoundID_x"] = df_matches_players["RoundID_x"].astype("int")
df_matches_players["RoundID_y"] = df_matches_players["RoundID_y"].astype("int")
df_matches_players["MatchID"] = df_matches_players["MatchID"].astype("int")
df_matches_players["Home Team Initials"] = df_matches_players["Home Team Initials"].astype("category")
df_matches_players["Away Team Initials"] = df_matches_players["Away Team Initials"].astype("category")
df_matches_players["Event Type"] = df_matches_players["Event Type"].astype("category")
df_matches_players["Event At"] = df_matches_players["Event At"].astype("category")
df_matches_players["Team Name"] = df_matches_players["Team Name"].astype("category")
df_matches_players["Team Initials"] = df_matches_players["Team Initials"].astype("category")
df_matches_players["Coach Name"] = df_matches_players["Coach Name"].astype("category")
df_matches_players["Line-up"] = df_matches_players["Line-up"].astype("category")
df_matches_players["Player Name"] = df_matches_players["Player Name"].astype("category")
df_matches_players["Position"] = df_matches_players["Position"].astype("category")

df_matches_players.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 38646 entries, 0 to 38645
Data columns (total 31 columns):
Year                    38646 non-null category
Date                    38646 non-null category
Time                    38646 non-null category
Stage                   38646 non-null category
Stadium                 38646 non-null category
City                    38646 non-null category
Home Team Name          38646 non-null category
Home Team Goals         38646 non-null int64
Away Team Goals         38646 non-null int64
Away Team Name          38646 non-null category
Win conditions          38646 non-null category
Attendance              38646 non-null int64
Half-time Home Goals    38646 non-null int64
Half-time Away Goals    38646 non-null int64
Referee                 38646 non-null category
Assistant 1             38646 non-null category
Assistant 2             38646 non-null category
RoundID_x               38646 non-null int64
MatchID                 38646 non-null int64


#### Write the cleaned datasets to files

In [28]:
df_winners.to_csv('/Users/jodythai/Google Drive/PROJECTS/CoderSchool_Fansipan/github_repo/world-cup-da/data/winners_cleaned.csv', index = False)
df_matches.to_csv('/Users/jodythai/Google Drive/PROJECTS/CoderSchool_Fansipan/github_repo/world-cup-da/data/matches_cleaned.csv', index = False)
df_players.to_csv('/Users/jodythai/Google Drive/PROJECTS/CoderSchool_Fansipan/github_repo/world-cup-da/data/players_cleaned.csv', index = False)
df_matches_players.to_csv('/Users/jodythai/Google Drive/PROJECTS/CoderSchool_Fansipan/github_repo/world-cup-da/data/matches_players_cleaned.csv', index = False)
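
The absolute paths above only work on one machine. A minimal sketch of a more portable layout uses `pathlib` and a shared data folder. The temporary directory and the one-row frame stand in for the real folder and the cleaned datasets.

```python
from pathlib import Path
import tempfile

import pandas as pd

# Hypothetical layout: a temporary directory stands in for the project's data folder.
data_dir = Path(tempfile.mkdtemp()) / "data"
data_dir.mkdir(parents=True, exist_ok=True)

# Stand-in for the cleaned frames, keyed by the file name prefix.
frames = {"winners": pd.DataFrame({"Year": [1930], "Winner": ["Uruguay"]})}

for name, frame in frames.items():
    frame.to_csv(data_dir / f"{name}_cleaned.csv", index=False)

written = sorted(p.name for p in data_dir.glob("*_cleaned.csv"))
print(written)
```

Note that CSV files do not store dtypes, so the category and integer conversions have to be reapplied after the files are read back.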

# EDA

# TODO

### Get data by Event Type

### Make Corr() Diagrams

In [29]:
df_matches_players[df_matches_players["Team Initials"] == "GDR"]

Unnamed: 0,Year,Date,Time,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,...,RoundID_y,Team Initials,Coach Name,Line-up,Shirt Number,Player Name,Position,Event Type,Event At,Team Name
10258,1974.0,14 Jun 1974,19:30,Group 1,Volksparkstadion,Hamburg,German DR,2,0,Australia,...,262,GDR,BUSCHNER Georg (GER),S,1,Juergen CROY,GK,,,German DR
10260,1974.0,14 Jun 1974,19:30,Group 1,Volksparkstadion,Hamburg,German DR,2,0,Australia,...,262,GDR,BUSCHNER Georg (GER),S,3,Bernd BRANSCH,C,,,German DR
10262,1974.0,14 Jun 1974,19:30,Group 1,Volksparkstadion,Hamburg,German DR,2,0,Australia,...,262,GDR,BUSCHNER Georg (GER),S,4,Konrad WEISE,,,,German DR
10264,1974.0,14 Jun 1974,19:30,Group 1,Volksparkstadion,Hamburg,German DR,2,0,Australia,...,262,GDR,BUSCHNER Georg (GER),S,7,Juergen POMMERENKE,,,,German DR
10266,1974.0,14 Jun 1974,19:30,Group 1,Volksparkstadion,Hamburg,German DR,2,0,Australia,...,262,GDR,BUSCHNER Georg (GER),S,8,Wolfram LOEWE,,O,55,German DR
10268,1974.0,14 Jun 1974,19:30,Group 1,Volksparkstadion,Hamburg,German DR,2,0,Australia,...,262,GDR,BUSCHNER Georg (GER),S,11,Joachim STREICH,,G,72,German DR
10270,1974.0,14 Jun 1974,19:30,Group 1,Volksparkstadion,Hamburg,German DR,2,0,Australia,...,262,GDR,BUSCHNER Georg (GER),S,12,Siegmar WAETZLICH,,Y,1,German DR
10272,1974.0,14 Jun 1974,19:30,Group 1,Volksparkstadion,Hamburg,German DR,2,0,Australia,...,262,GDR,BUSCHNER Georg (GER),S,14,Juergen SPARWASSER,,,,German DR
10274,1974.0,14 Jun 1974,19:30,Group 1,Volksparkstadion,Hamburg,German DR,2,0,Australia,...,262,GDR,BUSCHNER Georg (GER),S,15,Eberhard VOGEL,,Y,1,German DR
10276,1974.0,14 Jun 1974,19:30,Group 1,Volksparkstadion,Hamburg,German DR,2,0,Australia,...,262,GDR,BUSCHNER Georg (GER),S,16,Harald IRMSCHER,,,,German DR


In [30]:
df_matches[df_matches['Stadium'].str.contains("Estadio")]

Unnamed: 0,Year,Date,Time,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,...,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials
8,1930.0,18 Jul 1930,14:30,Group 3,Estadio Centenario,Montevideo,Uruguay,1,0,Peru,...,57735,0,0,LANGENUS Jean (BEL),BALWAY Thomas (FRA),CRISTOPHE Henry (BEL),201,1099,URU,PER
9,1930.0,19 Jul 1930,12:50,Group 1,Estadio Centenario,Montevideo,Chile,1,0,France,...,2000,0,0,TEJADA Anibal (URU),LOMBARDI Domingo (URU),REGO Gilberto (BRA),201,1094,CHI,FRA
10,1930.0,19 Jul 1930,15:00,Group 1,Estadio Centenario,Montevideo,Argentina,6,3,Mexico,...,42100,3,1,SAUCEDO Ulises (BOL),ALONSO Gualberto (URU),RADULESCU Constantin (ROU),201,1086,ARG,MEX
11,1930.0,20 Jul 1930,13:00,Group 2,Estadio Centenario,Montevideo,Brazil,4,0,Bolivia,...,25466,1,0,BALWAY Thomas (FRA),MATEUCCI Francisco (URU),VALLEJO Gaspar (MEX),201,1091,BRA,BOL
12,1930.0,20 Jul 1930,15:00,Group 4,Estadio Centenario,Montevideo,Paraguay,1,0,Belgium,...,12000,1,0,VALLARINO Ricardo (URU),MACIAS Jose (ARG),LOMBARDI Domingo (URU),201,1089,PAR,BEL
13,1930.0,21 Jul 1930,14:50,Group 3,Estadio Centenario,Montevideo,Uruguay,4,0,Romania,...,70022,4,0,REGO Gilberto (BRA),WARNKEN Alberto (CHI),SAUCEDO Ulises (BOL),201,1100,URU,ROU
14,1930.0,22 Jul 1930,14:45,Group 1,Estadio Centenario,Montevideo,Argentina,3,1,Chile,...,41459,2,1,LANGENUS Jean (BEL),CRISTOPHE Henry (BEL),SAUCEDO Ulises (BOL),201,1084,ARG,CHI
15,1930.0,26 Jul 1930,14:45,Semi-finals,Estadio Centenario,Montevideo,Argentina,6,1,USA,...,72886,1,0,LANGENUS Jean (BEL),VALLEJO Gaspar (MEX),WARNKEN Alberto (CHI),202,1088,ARG,USA
16,1930.0,27 Jul 1930,14:45,Semi-finals,Estadio Centenario,Montevideo,Uruguay,6,1,Yugoslavia,...,79867,3,1,REGO Gilberto (BRA),SAUCEDO Ulises (BOL),BALWAY Thomas (FRA),202,1101,URU,YUG
17,1930.0,30 Jul 1930,14:15,Final,Estadio Centenario,Montevideo,Uruguay,4,2,Argentina,...,68346,1,2,LANGENUS Jean (BEL),SAUCEDO Ulises (BOL),CRISTOPHE Henry (BEL),405,1087,URU,ARG


In [31]:
# Frequency of each event code; the blank code appears to mark player rows with no event
df_matches_players['Event Type'].value_counts()

       28225
I       2560
O       2538
Y       2187
G       2163
IH       290
OH       288
P        175
R        117
RSY       51
W         41
MP        11
Name: Event Type, dtype: int64

In [32]:
# Player rows carrying a yellow card (event code 'Y')
df_matches_players[df_matches_players['Event Type'] == 'Y']

Unnamed: 0,Year,Date,Time,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,...,RoundID_y,Team Initials,Coach Name,Line-up,Shirt Number,Player Name,Position,Event Type,Event At,Team Name
2980,1950.0,13 Jul 1950,15:00,Group 6,Maracanã,Rio De Janeiro,Brazil,6,1,Spain,...,209,BRA,COSTA Flavio (BRA),S,0,BIGODE,,Y,1,Brazil
5985,1962.0,30 May 1962,15:00,Group 4,Estadio El Teniente-Codelco,Rancagua,Argentina,1,0,Bulgaria,...,231,ARG,LORENZO Juan Carlos (ARG),S,15,Ruben NAVARRO,C,Y,1,Argentina
7401,1966.0,12 Jul 1966,19:30,Group 3,Goodison Park,Liverpool,Brazil,2,0,Bulgaria,...,238,BUL,VYTLACIL Rudolf (TCH),S,6,Dobromir JECHEV,,Y,1,Bulgaria
7402,1966.0,12 Jul 1966,19:30,Group 3,Goodison Park,Liverpool,Brazil,2,0,Bulgaria,...,238,BRA,FEOLA Vicente (BRA),S,13,DENILSON,,Y,1,Brazil
7409,1966.0,12 Jul 1966,19:30,Group 3,Goodison Park,Liverpool,Brazil,2,0,Bulgaria,...,238,BUL,VYTLACIL Rudolf (TCH),S,11,Ivan KOLEV,,Y,1,Bulgaria
7450,1966.0,12 Jul 1966,19:30,Group 4,Ayresome Park,Middlesbrough,Soviet Union,3,0,North Korea,...,238,URS,MOROZOV Nikolai (URS),S,15,Galimzyan KHUSAINOV,,Y,1,Soviet Union
7802,1966.0,15 Jul 1966,19:30,Group 4,Ayresome Park,Middlesbrough,North Korea,1,1,Chile,...,238,CHI,ALAMOS Luis (CHI),S,12,Ruben MARCOS,,Y,1,Chile
7847,1966.0,16 Jul 1966,15:00,Group 3,Old Trafford Stadium,Manchester,Portugal,3,0,Bulgaria,...,238,BUL,VYTLACIL Rudolf (TCH),S,7,Dinko DERMENDZHIEV,,Y,1,Bulgaria
7848,1966.0,16 Jul 1966,15:00,Group 3,Old Trafford Stadium,Manchester,Portugal,3,0,Bulgaria,...,238,POR,GLORIA Otto (BRA),S,13,EUSEBIO (Eusebio da Silva Ferreira),,Y,1,Portugal
7885,1966.0,16 Jul 1966,15:00,Group 2,Villa Park,Birmingham,Germany,0,0,Argentina,...,238,FRG,SCHOEN Helmut (FRG),S,4,Franz BECKENBAUER,,Y,1,Germany


In [33]:
# Number of penalty events (event code "P") in the players table
df_players[df_players["Event Type"] == "P"]["Event Type"].count()

175

In [34]:
# Player rows for matches at the Maracanã; a partial match avoids the mis-encoded final character
df2 = df_matches_players[df_matches_players['Stadium'].str.contains("Maracan")]
# df2[df2['Event Type'] == 'Y']['Event Type'].value_counts()
df2

Unnamed: 0,Year,Date,Time,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,...,RoundID_y,Team Initials,Coach Name,Line-up,Shirt Number,Player Name,Position,Event Type,Event At,Team Name
2203,1950.0,24 Jun 1950,15:00,Group 1,Maracanã,Rio De Janeiro,Brazil,4,0,Mexico,...,208,BRA,COSTA Flavio (BRA),S,0,BARBOSA,GK,,,Brazil
2204,1950.0,24 Jun 1950,15:00,Group 1,Maracanã,Rio De Janeiro,Brazil,4,0,Mexico,...,208,MEX,VIAL Octavio (MEX),S,0,Antonio CARBAJAL,GK,,,Mexico
2205,1950.0,24 Jun 1950,15:00,Group 1,Maracanã,Rio De Janeiro,Brazil,4,0,Mexico,...,208,BRA,COSTA Flavio (BRA),S,0,ADEMIR,,G,30,Brazil
2206,1950.0,24 Jun 1950,15:00,Group 1,Maracanã,Rio De Janeiro,Brazil,4,0,Mexico,...,208,BRA,COSTA Flavio (BRA),S,0,ADEMIR,,G,79,Brazil
2207,1950.0,24 Jun 1950,15:00,Group 1,Maracanã,Rio De Janeiro,Brazil,4,0,Mexico,...,208,MEX,VIAL Octavio (MEX),S,0,Alfonso MONTEMAYOR,C,,,Mexico
2208,1950.0,24 Jun 1950,15:00,Group 1,Maracanã,Rio De Janeiro,Brazil,4,0,Mexico,...,208,BRA,COSTA Flavio (BRA),S,0,AUGUSTO,C,,,Brazil
2209,1950.0,24 Jun 1950,15:00,Group 1,Maracanã,Rio De Janeiro,Brazil,4,0,Mexico,...,208,MEX,VIAL Octavio (MEX),S,0,Mario OCHOA,,,,Mexico
2210,1950.0,24 Jun 1950,15:00,Group 1,Maracanã,Rio De Janeiro,Brazil,4,0,Mexico,...,208,BRA,COSTA Flavio (BRA),S,0,FRIACA,,,,Brazil
2211,1950.0,24 Jun 1950,15:00,Group 1,Maracanã,Rio De Janeiro,Brazil,4,0,Mexico,...,208,MEX,VIAL Octavio (MEX),S,0,Hector ORTIZ,,,,Mexico
2212,1950.0,24 Jun 1950,15:00,Group 1,Maracanã,Rio De Janeiro,Brazil,4,0,Mexico,...,208,BRA,COSTA Flavio (BRA),S,0,JUVENAL,,,,Brazil


In [43]:
# Event-code counts for matches played at Estadio Centenario
df_matches_players[df_matches_players['Stadium'].str.contains('Estadio Cente')]['Event Type'].value_counts()

       365
G       43
P        1
Y        0
W        0
RSY      0
R        0
OH       0
O        0
MP       0
IH       0
I        0
Name: Event Type, dtype: int64
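
The zero rows in the output above are presumably a side effect of `Event Type` being stored as a `category` column: `value_counts()` on a categorical Series lists every declared category, including ones that do not occur in the filtered rows. The sketch below uses a small made-up Series (not the World Cup data) to show two ways of hiding those empty categories.

```python
import pandas as pd

# Toy stand-in for the 'Event Type' column; the values are illustrative only.
events = pd.Series(["G", "", "G", "P", ""], dtype="category")
# Declare extra categories that never occur, as happens after filtering a frame.
events = events.cat.set_categories(["", "G", "P", "Y", "R"])

# Unused categories are still reported, with a count of 0.
counts = events.value_counts()

# Option 1: keep only the codes that actually occur.
observed = counts[counts > 0]

# Option 2: drop unused categories before counting.
observed_alt = events.cat.remove_unused_categories().value_counts()
```

Either option leaves only the codes that are present, which makes per-stadium or per-year summaries easier to read.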

In [47]:
# IDs of all matches played in Montevideo
df_matches_players[df_matches_players['City'].str.contains('Montevi')]['MatchID'].unique()

array([1096, 1090, 1093, 1098, 1085, 1095, 1092, 1097, 1099, 1094, 1086,
       1091, 1089, 1100, 1084, 1088, 1101, 1087])

In [48]:
# Stadiums used in Montevideo
df_matches_players[df_matches_players['City'].str.contains('Montevi')]['Stadium'].unique()

[Pocitos, Parque Central, Estadio Centenario]
Categories (3, object): [Pocitos, Parque Central, Estadio Centenario]
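
The filters in the cells above rely on `Series.str.contains`, which by default treats the pattern as a regular expression and returns a missing value for missing entries. As a minimal sketch on made-up stadium names (not the real frame), passing `regex=False` and `na=False` gives a plain substring match and a clean boolean mask:

```python
import pandas as pd

# Illustrative values only; the None stands in for a missing stadium name.
stadiums = pd.Series(["Estadio Centenario", "Pocitos", None, "Parque Central"])

# regex=False: match the text literally; na=False: treat missing names as non-matches.
mask = stadiums.str.contains("Estadio", regex=False, na=False)

matched = stadiums[mask].tolist()
```

This matters mainly when the search text contains regex metacharacters such as `(` or `.`, or when the column has gaps.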

In [53]:
# Event-code counts for the 1934 tournament
df_matches_players[df_matches_players['Year'] == 1934]['Event Type'].value_counts()

       646
G       67
P        3
R        1
Y        0
W        0
RSY      0
OH       0
O        0
MP       0
IH       0
I        0
Name: Event Type, dtype: int64

In [85]:
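# Yellow cards ("Y") recorded at minute 1; 'Event At' is compared as a string
df4 = df_matches_players[(df_matches_players['Event Type'] == "Y") & (df_matches_players['Event At'] == "1")]

# Summarise the filtered rows (sorting does not change what info() reports)
df4.sort_values('Time', ascending=False).info()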

<class 'pandas.core.frame.DataFrame'>
Int64Index: 122 entries, 15469 to 9360
Data columns (total 31 columns):
Year                    122 non-null category
Date                    122 non-null category
Time                    122 non-null category
Stage                   122 non-null category
Stadium                 122 non-null category
City                    122 non-null category
Home Team Name          122 non-null category
Home Team Goals         122 non-null int64
Away Team Goals         122 non-null int64
Away Team Name          122 non-null category
Win conditions          122 non-null category
Attendance              122 non-null int64
Half-time Home Goals    122 non-null int64
Half-time Away Goals    122 non-null int64
Referee                 122 non-null category
Assistant 1             122 non-null category
Assistant 2             122 non-null category
RoundID_x               122 non-null int64
MatchID                 122 non-null int64
Home Team Initials      122 non-null 

In [65]:
# Goals ("G") scored at minute 1, counted per player
df5 = df_players[(df_players['Event Type'] == "G") & (df_players['Event At'] == "1")]
df5["Player Name"].value_counts()

Arne NYBERG            1
Bernard LACOMBE        1
PAK Seung Zin          1
Bryan ROBSON           1
Emile VEINANTE         1
                      ..
Milko GAIDARSKI        0
Milorad ARSENIJEVIC    0
Milorad MILUTINOVIC    0
Milos HRSTIC           0
?URI?I?                0
Name: Player Name, Length: 7663, dtype: int64
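
The two cells above compare `Event At` with the string `"1"`, which suggests the minute is stored as text. As a hedged sketch on made-up values, converting the column with `pd.to_numeric` would also allow range filters such as "within the first 30 minutes"; the sample data and the 30-minute threshold are assumptions for illustration.

```python
import pandas as pd

# Illustrative minutes stored as text; "" stands for a row with no event.
event_at = pd.Series(["1", "30", "", "79", "1"])

# errors="coerce" turns anything non-numeric (here the empty string) into NaN.
minutes = pd.to_numeric(event_at, errors="coerce")

# Exact-minute filter, equivalent to comparing against the string "1".
first_minute_count = int((minutes == 1).sum())

# Range filter, which is awkward to express on strings; NaN compares as False.
early_mask = (minutes <= 30).tolist()
```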