# Team 6 - World Cup

![](https://img.fifa.com/image/upload/t_l4/v1543921822/ex1ksdevyxwsgu7rzdv6.jpg)

_For more information about the dataset, read [here](https://www.kaggle.com/abecklas/fifa-world-cup)._

## Your tasks
- Name your team!
- Read the source and do some quick research to understand more about the dataset and its topic
- Clean the data
- Perform Exploratory Data Analysis on the dataset
- Analyze the data more deeply and extract insights
- Visualize your analysis on Google Data Studio
- Present your work in front of the class and guests next Monday

## Submission Guide
- Create a GitHub repository for your project
- Upload the dataset (.csv file) and the Jupyter Notebook to your GitHub repository. In the Jupyter Notebook, **include the link to your Google Data Studio report**.
- Submit your work through this [Google Form](https://forms.gle/oxtXpGfS8JapVj3V8).

## Tips for Data Cleaning, Manipulation & Visualization
- See [our tips for Data Cleaning, Manipulation & Visualization](https://hackmd.io/cBNV7E6TT2WMliQC-GTw1A).

_____________________________

## Some Hints for This Dataset:
- Is there a way to integrate the data from all 3 datasets?
- It seems the `winners` dataset has no data for the 2018 World Cup. Can you look up the relevant information and add it to the dataset using `pandas`?
- The format of some numeric columns in the `matches` dataset doesn't look right.
- Can you separate the date and the time in the `Datetime` column of the `matches` dataset?
- And more...
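
One way to approach the 2018 hint is to build a one-row DataFrame and append it with `pd.concat`. The sketch below uses a tiny stand-in for the `winners` dataset, not the real file. The 2018 figures are typed in from memory of the tournament results, so treat them as values to verify against an official source before relying on them.

```python
import pandas as pd

# Stand-in for the last row of the real `winners` dataset (assumed column layout).
winners = pd.DataFrame({
    "Year": [2014],
    "Country": ["Brazil"],
    "Winner": ["Germany"],
    "Runners-Up": ["Argentina"],
    "Third": ["Netherlands"],
    "Fourth": ["Brazil"],
    "GoalsScored": [171],
    "QualifiedTeams": [32],
    "MatchesPlayed": [64],
    "Attendance": ["3.386.810"],
})

# 2018 row: values to double-check against an official source.
row_2018 = pd.DataFrame([{
    "Year": 2018,
    "Country": "Russia",
    "Winner": "France",
    "Runners-Up": "Croatia",
    "Third": "Belgium",
    "Fourth": "England",
    "GoalsScored": 169,
    "QualifiedTeams": 32,
    "MatchesPlayed": 64,
    "Attendance": "3.031.768",  # same dotted string format as the existing rows
}])

# ignore_index=True renumbers the rows so the new one gets the next index.
winners = pd.concat([winners, row_2018], ignore_index=True)
print(winners[["Year", "Winner", "Runners-Up"]])
```

Keeping `Attendance` as a dotted string here means the new row goes through the same cleaning step as the rest of the column.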

### Import libraries

In [0]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

sns.set_style("whitegrid")

In [0]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


### Create Data Frames

In [0]:
df_matches_raw = pd.read_csv('/content/gdrive/My Drive/PROJECTS/CoderSchool_Fansipan/github_repo/world-cup-da/data/matches.csv', encoding='utf-8')

df_players_raw = pd.read_csv('/content/gdrive/My Drive/PROJECTS/CoderSchool_Fansipan/github_repo/world-cup-da/data/players.csv', encoding='utf-8')

df_winners_raw = pd.read_csv('/content/gdrive/My Drive/PROJECTS/CoderSchool_Fansipan/github_repo/world-cup-da/data/winners.csv', encoding='utf-8')

# Clean Data

In [0]:
df_matches_raw.tail()

In [0]:
df_players_raw.head(2)

In [0]:
df_winners_raw.head(5)

#### Remove NaN

In [0]:
# Drop the empty rows of the Matches dataset:
# keep only rows where both RoundID and MatchID are present
df_matches = df_matches_raw.dropna(subset=["RoundID", "MatchID"]).copy()

# Find NaN values from the data points
df_matches.isnull().sum()

Year                    0
Datetime                0
Stage                   0
Stadium                 0
City                    0
Home Team Name          0
Home Team Goals         0
Away Team Goals         0
Away Team Name          0
Win conditions          0
Attendance              2
Half-time Home Goals    0
Half-time Away Goals    0
Referee                 0
Assistant 1             0
Assistant 2             0
RoundID                 0
MatchID                 0
Home Team Initials      0
Away Team Initials      0
dtype: int64

In [0]:
# Show the 2 rows where Attendance is NaN
df_matches[df_matches["Attendance"].isnull()]

Unnamed: 0,Year,Datetime,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials


In [0]:
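# Fill missing values: 0 for Attendance, empty string everywhere else
df_matches["Attendance"] = df_matches["Attendance"].fillna(0)
df_matches = df_matches.fillna('')
# df_players did not exist yet, so start from the raw frame
df_players = df_players_raw.fillna('')
df_winners = df_winners_raw.fillna('')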
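
A blanket `fillna('')` can turn a numeric column that still has gaps into a column of mixed strings and numbers. As a hedged alternative, `DataFrame.fillna` also accepts a dict of per-column defaults. The sketch below uses a two-row sample with an illustrative missing value, not the real `matches` data.

```python
import numpy as np
import pandas as pd

# Illustrative sample: one missing text value and one missing number.
sample = pd.DataFrame({
    "Win conditions": [np.nan, "win on penalties (3 - 4)"],
    "Attendance": [np.nan, 59978.0],
})

# Choose a fill value per column so numeric columns stay numeric.
filled = sample.fillna({"Win conditions": "", "Attendance": 0})
filled["Attendance"] = filled["Attendance"].astype("int64")

print(filled.dtypes)
```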

In [0]:
df_matches.sample(5)

Unnamed: 0,Year,Datetime,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials
621,2002.0,13 Jun 2002 - 15:30,Group C,Seoul World Cup Stadium,Seoul,Turkey,3.0,0.0,China PR,,43605.0,2.0,0.0,RUIZ Oscar (COL),TOMUSANGE Ali (UGA),CHARLES Curtis (ATG),43950100.0,43950042.0,TUR,CHN
48,1938.0,14 Jun 1938 - 18:00,Quarter-finals,Stade du Parc Lescure,Bordeaux,Brazil,2.0,1.0,Czechoslovakia,,18141.0,0.0,1.0,CAPDEVILLE Georges (FRA),MARENCO Paul (FRA),KISSENBERGER Ernest (FRA),429.0,1153.0,BRA,TCH
69,1950.0,09 Jul 1950 - 15:00,Group 6,Pacaembu,Sao Paulo,Uruguay,2.0,2.0,Spain,,44802.0,1.0,2.0,GRIFFITHS Benjamin (WAL),DATTILO Generoso (ITA),ALVAREZ Alfredo (BOL),209.0,1207.0,URU,ESP
460,1990.0,03 Jul 1990 - 20:00,Semi-finals,San Paolo,Naples,Italy,1.0,1.0,Argentina,win on penalties (3 - 4),59978.0,0.0,0.0,VAUTROT Michel (FRA),LISTKIEWICZ Michal (POL),MIKKELSEN Peter (DEN),3464.0,28.0,ITA,ARG
256,1974.0,26 Jun 1974 - 16:00,Group B,Rheinstadion,D�Sseldorf,Yugoslavia,0.0,2.0,Germany FR,,67385.0,0.0,1.0,MARQUES Armando (BRA),ANGONESE Aurelio (ITA),PEREZ NUNEZ Edison A. (PER),263.0,2066.0,YUG,FRG


#### Convert Data Types

In [0]:
df_matches.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 852 entries, 0 to 851
Data columns (total 20 columns):
Year                    852 non-null category
Datetime                852 non-null object
Stage                   852 non-null category
Stadium                 852 non-null category
City                    852 non-null category
Home Team Name          852 non-null category
Home Team Goals         852 non-null int64
Away Team Goals         852 non-null int64
Away Team Name          852 non-null category
Win conditions          852 non-null category
Attendance              850 non-null float64
Half-time Home Goals    852 non-null float64
Half-time Away Goals    852 non-null float64
Referee                 852 non-null object
Assistant 1             852 non-null object
Assistant 2             852 non-null object
RoundID                 852 non-null float64
MatchID                 852 non-null float64
Home Team Initials      852 non-null object
Away Team Initials      852 non-null object

In [0]:
# Convert data type of Matches dataset
df_matches["Year"] = df_matches["Year"].astype("category")
df_matches["Datetime"] = df_matches["Datetime"].astype("category")
df_matches["Stage"] = df_matches["Stage"].astype("category")
df_matches["Stadium"] = df_matches["Stadium"].astype("category")
df_matches["City"] = df_matches["City"].astype("category")
df_matches["Home Team Name"] = df_matches["Home Team Name"].astype("category")
df_matches["Home Team Goals"] = df_matches["Home Team Goals"].astype("int")
df_matches["Away Team Goals"] = df_matches["Away Team Goals"].astype("int")
df_matches["Away Team Name"] = df_matches["Away Team Name"].astype("category")
df_matches["Win conditions"] = df_matches["Win conditions"].astype("category")
df_matches["Attendance"] = df_matches["Attendance"].astype("int")
df_matches["Half-time Home Goals"] = df_matches["Half-time Home Goals"].astype("int")
df_matches["Half-time Away Goals"] = df_matches["Half-time Away Goals"].astype("int")
df_matches["Referee"] = df_matches["Referee"].astype("category")
df_matches["Assistant 1"] = df_matches["Assistant 1"].astype("category")
df_matches["Assistant 2"] = df_matches["Assistant 2"].astype("category")
df_matches["RoundID"] = df_matches["RoundID"].astype("int")
df_matches["MatchID"] = df_matches["MatchID"].astype("int")
df_matches["Home Team Initials"] = df_matches["Home Team Initials"].astype("category")
df_matches["Away Team Initials"] = df_matches["Away Team Initials"].astype("category")
df_matches.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 852 entries, 0 to 851
Data columns (total 20 columns):
Year                    852 non-null category
Datetime                852 non-null category
Stage                   852 non-null object
Stadium                 852 non-null category
City                    852 non-null category
Home Team Name          852 non-null category
Home Team Goals         852 non-null int64
Away Team Goals         852 non-null int64
Away Team Name          852 non-null category
Win conditions          852 non-null category
Attendance              852 non-null int64
Half-time Home Goals    852 non-null int64
Half-time Away Goals    852 non-null int64
Referee                 852 non-null category
Assistant 1             852 non-null category
Assistant 2             852 non-null category
RoundID                 852 non-null int64
MatchID                 852 non-null int64
Home Team Initials      852 non-null category
Away Team Initials      852 non-null category
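
The column-by-column conversion above can also be written as a single `astype` call with a dict of target dtypes. This is only a sketch on a small sample with a few of the same column names. `category_cols` and `int_cols` are names chosen here for illustration.

```python
import pandas as pd

# Small sample using a subset of the Matches columns.
sample = pd.DataFrame({
    "Stage": ["Group 1", "Group 4"],
    "City": ["Montevideo", "Montevideo"],
    "Home Team Goals": [4.0, 3.0],
    "MatchID": [1096.0, 1090.0],
})

category_cols = ["Stage", "City"]
int_cols = ["Home Team Goals", "MatchID"]

# Build one {column: dtype} mapping and convert everything in one pass.
dtype_map = {col: "category" for col in category_cols}
dtype_map.update({col: "int64" for col in int_cols})
sample = sample.astype(dtype_map)

print(sample.dtypes)
```

Keeping the column lists in one place makes it easier to see which columns are treated as categories and which as integers.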

In [0]:
df_players.head(2)

Unnamed: 0,RoundID,MatchID,Team Initials,Coach Name,Line-up,Shirt Number,Player Name,Position,Event
0,201,1096,FRA,CAUDRON Raoul (FRA),S,0,Alex THEPOT,GK,
1,201,1096,MEX,LUQUE Juan (MEX),S,0,Oscar BONFIGLIO,GK,


In [0]:
# Convert data type of Players dataset
df_players["Team Initials"] = df_players["Team Initials"].astype("category")
df_players["Coach Name"] = df_players["Coach Name"].astype("category")
df_players["Line-up"] = df_players["Line-up"].astype("category")
df_players["Player Name"] = df_players["Player Name"].astype("category")
df_players["Position"] = df_players["Position"].astype("category")
df_players["Event"] = df_players["Event"].astype("category")

df_players.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 37048 entries, 0 to 37047
Data columns (total 9 columns):
RoundID          37048 non-null int64
MatchID          37048 non-null int64
Team Initials    37048 non-null category
Coach Name       37048 non-null category
Line-up          37048 non-null category
Shirt Number     37048 non-null int64
Player Name      37048 non-null category
Position         37048 non-null category
Event            37048 non-null category
dtypes: category(6), int64(3)
memory usage: 1.9 MB


In [0]:
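# Remove the '.' thousands separators from Attendance.
# regex=False treats '.' as a literal dot rather than "any character".
df_winners = df_winners_raw.fillna('')
df_winners["Attendance"] = df_winners["Attendance"].str.replace('.', '', regex=False)
df_winners.head()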

Unnamed: 0,Year,Country,Winner,Runners-Up,Third,Fourth,GoalsScored,QualifiedTeams,MatchesPlayed,Attendance
0,1930,Uruguay,Uruguay,Argentina,USA,Yugoslavia,70,13,18,590549
1,1934,Italy,Italy,Czechoslovakia,Germany,Austria,70,16,17,363000
2,1938,France,Italy,Hungary,Brazil,Sweden,84,15,18,375700
3,1950,Brazil,Uruguay,Brazil,Sweden,Spain,88,13,22,1045246
4,1954,Switzerland,Germany FR,Hungary,Austria,Uruguay,140,16,26,768607


In [0]:
# Convert data type of Winners dataset
df_winners["Country"] = df_winners["Country"].astype("category")
df_winners["Winner"] = df_winners["Winner"].astype("category")
df_winners["Runners-Up"] = df_winners["Runners-Up"].astype("category")
df_winners["Third"] = df_winners["Third"].astype("category")
df_winners["Fourth"] = df_winners["Fourth"].astype("category")
df_winners["Attendance"] = df_winners["Attendance"].astype("int")

df_winners.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 10 columns):
Year              20 non-null int64
Country           20 non-null category
Winner            20 non-null category
Runners-Up        20 non-null category
Third             20 non-null category
Fourth            20 non-null category
GoalsScored       20 non-null int64
QualifiedTeams    20 non-null int64
MatchesPlayed     20 non-null int64
Attendance        20 non-null int64
dtypes: category(5), int64(5)
memory usage: 4.0 KB


#### Separate Datetime column

In [0]:
# Separate DateTime column
df_matches.insert(loc = 2, column="Time", value="")
df_matches.head(2)

Unnamed: 0,Year,Datetime,Time,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials
0,1930.0,13 Jul 1930 - 15:00,,Group 1,Pocitos,Montevideo,France,4,1,Mexico,,4444,3,0,LOMBARDI Domingo (URU),CRISTOPHE Henry (BEL),REGO Gilberto (BRA),201,1096,FRA,MEX
1,1930.0,13 Jul 1930 - 15:00,,Group 4,Parque Central,Montevideo,USA,3,0,Belgium,,18346,2,0,MACIAS Jose (ARG),MATEUCCI Francisco (URU),WARNKEN Alberto (CHI),201,1090,USA,BEL


In [0]:
df_matches.loc[:, 'Time'] = df_matches["Datetime"].apply(lambda x: x.split('-')[1].strip())
df_matches.loc[:, 'Datetime'] = df_matches["Datetime"].apply(lambda x: x.split('-')[0].strip())
df_matches.head(2)

Unnamed: 0,Year,Datetime,Time,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials
0,1930.0,13 Jul 1930,15:00,Group 1,Pocitos,Montevideo,France,4,1,Mexico,,4444,3,0,LOMBARDI Domingo (URU),CRISTOPHE Henry (BEL),REGO Gilberto (BRA),201,1096,FRA,MEX
1,1930.0,13 Jul 1930,15:00,Group 4,Parque Central,Montevideo,USA,3,0,Belgium,,18346,2,0,MACIAS Jose (ARG),MATEUCCI Francisco (URU),WARNKEN Alberto (CHI),201,1090,USA,BEL
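
The same split can be done without `apply`, using the vectorized `.str.split`, and the date part can then be parsed into a real datetime. This is a minimal sketch on two sample values in the `DD Mon YYYY - HH:MM` layout shown above. It assumes English month abbreviations.

```python
import pandas as pd

sample = pd.DataFrame({
    "Datetime": ["13 Jul 1930 - 15:00", "03 Jul 1990 - 20:00"],
})

# Split once on the " - " separator into two columns (0 = date, 1 = time).
parts = sample["Datetime"].str.split(" - ", n=1, expand=True)

# Parse the date text into datetime64 and keep the time as a plain string.
sample["Date"] = pd.to_datetime(parts[0].str.strip(), format="%d %b %Y")
sample["Time"] = parts[1].str.strip()

print(sample[["Date", "Time"]])
```

A datetime column sorts chronologically and exposes `.dt.year`, `.dt.month`, and so on, which a category of date strings does not.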


In [0]:
# Rename Datetime column
df_matches.rename(columns={'Datetime': 'Date'}, inplace=True)
df_matches.sample(5)

Unnamed: 0,Year,Date,Time,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials
32,1934.0,03 Jun 1934,16:30,Semi-finals,Nazionale PNF,Rome,Czechoslovakia,3,1,Germany,,15000,1,0,BARLASSINA Rinaldo (ITA),BERANEK Alois (AUT),ESCARTIN Pedro (ESP),3492,1130,TCH,GER
702,2006.0,01 Jul 2006,17:00,Quarter-finals,"FIFA World Cup Stadium, Gelsenkirchen",Gelsenkirchen,England,0,0,Portugal,Portugal win on penalties (1 - 3),52000,0,0,ELIZONDO Horacio (ARG),GARCIA Dario (ARG),OTERO Rodolfo (ARG),97410300,97410059,ENG,POR
393,1986.0,12 Jun 1986,12:00,Group D,Tecnologico,Monterrey,Algeria,0,3,Spain,,23980,0,1,TAKADA Shizuo (JPN),PICON-ACKONG Edwin (MRI),ESPOSITO Carlos (ARG),308,378,ALG,ESP
575,1998.0,04 Jul 1998,21:00,Quarter-finals,Stade de Gerland,Lyon,Germany,0,3,Croatia,,39100,0,1,PEDERSEN Rune (NOR),NILSSON Mikael (SWE),VAN DEN BROECK Marc (BEL),1025,8783,GER,CRO
498,1994.0,30 Jun 1994,19:30,Group D,Foxboro Stadium,Boston,Greece,0,2,Nigeria,,53001,0,1,MOTTRAM Leslie (SCO),PARK Hae-Yong (KOR),ALVES Paulo Jorge (BRA),337,3083,GRE,NGA
356,1982.0,08 Jul 1982,17:15,Semi-finals,Camp Nou,Barcelona,Poland,0,2,Italy,,50000,0,1,CARDELLINO DE SAN VICENTE Juan (URU),SOCHA David (USA),ARISTIZABAL MURCIA Gilberto (COL),295,996,POL,ITA
217,1970.0,10 Jun 1970,16:00,Group 4,Nou Camp - Estadio Le�n,Leon,Germany FR,3,1,Peru,,17875,3,1,AGUILAR ELIZALDE Abel (MEX),ORTIZ DE MENDIBIL Jose Maria (ESP),SBARDELLA Antonio (ITA),250,1840,FRG,PER
33,1934.0,07 Jun 1934,18:00,Match for third place,Giorgio Ascarelli,Naples,Germany,3,2,Austria,,7000,3,1,CARRARO Albino (ITA),CAIRONI Camillo (ITA),ESCARTIN Pedro (ESP),3491,1105,GER,AUT
192,1966.0,23 Jul 1966,15:00,Quarter-finals,Wembley Stadium,London,England,1,0,Argentina,,90584,0,0,KREITLEIN Rudolf (GER),DIENST Gottfried (SUI),ZSOLT Istvan (HUN),239,1577,ENG,ARG
119,1958.0,15 Jun 1958,19:00,Group 1,Malmo Stadion,Malm�,Germany FR,2,2,Northern Ireland,,21990,1,1,FERNANDES CAMPOS Joaquim (POR),AHLNER Sten (SWE),HELGE Leo (DEN),220,1389,FRG,NIR


In [0]:
# Convert data types for Date & Time columns
df_matches["Date"] = df_matches["Date"].astype("category")
df_matches["Time"] = df_matches["Time"].astype("category")
df_matches.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 852 entries, 0 to 851
Data columns (total 21 columns):
Year                    852 non-null category
Date                    852 non-null category
Time                    852 non-null category
Stage                   852 non-null category
Stadium                 852 non-null category
City                    852 non-null category
Home Team Name          852 non-null category
Home Team Goals         852 non-null int64
Away Team Goals         852 non-null int64
Away Team Name          852 non-null category
Win conditions          852 non-null category
Attendance              852 non-null int64
Half-time Home Goals    852 non-null int64
Half-time Away Goals    852 non-null int64
Referee                 852 non-null category
Assistant 1             852 non-null category
Assistant 2             852 non-null category
RoundID                 852 non-null int64
MatchID                 852 non-null int64
Home Team Initials      852 non-null category

#### Check data duplication

In [0]:
# Check duplication
df_matches["MatchID"].nunique() == df_matches["MatchID"].count()

True

In [0]:
# Find duplicated rows from the dataset
df_matches_id = df_matches['MatchID']
df_matches[df_matches_id.isin(df_matches_id[df_matches_id.duplicated()])].sort_values("MatchID")

Unnamed: 0,Year,Date,Time,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials


In [0]:
df_matches.drop_duplicates(keep = 'first', inplace = True)

In [0]:
df_players.drop_duplicates(keep = 'first', inplace = True)
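
Note that `drop_duplicates()` without arguments only removes rows that are identical in every column. If the goal is one row per match, a hedged variant is to check and drop on `MatchID` alone. The sample below contains a deliberately repeated `MatchID` for illustration.

```python
import pandas as pd

# Illustrative sample: MatchID 1096 appears twice on purpose.
sample = pd.DataFrame({
    "MatchID": [1096, 1090, 1096],
    "Home Team Name": ["France", "USA", "France"],
})

# keep=False flags every row whose MatchID occurs more than once.
dupes = sample[sample.duplicated(subset="MatchID", keep=False)]

# Keep the first occurrence of each MatchID.
deduped = sample.drop_duplicates(subset="MatchID", keep="first")

print(len(dupes), "rows share a MatchID;", len(deduped), "rows remain")
```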

# EDA & Feature Engineering

In [0]:
# List of players each season
# def get_players_from_season(year):
#   return df_players[df_players["Year"] == year]

# df_players_2014 = get_players_from_season(2014)

df_pm = pd.merge(df_players, df_matches, how='left', on="MatchID")

df_pm

Unnamed: 0,RoundID_x,MatchID,Team Initials,Coach Name,Line-up,Shirt Number,Player Name,Position,Event,Year,Date,Time,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID_y,Home Team Initials,Away Team Initials
0,201,1096,FRA,CAUDRON Raoul (FRA),S,0,Alex THEPOT,GK,,1930.0,13 Jul 1930,15:00,Group 1,Pocitos,Montevideo,France,4,1,Mexico,,4444,3,0,LOMBARDI Domingo (URU),CRISTOPHE Henry (BEL),REGO Gilberto (BRA),201,FRA,MEX
1,201,1096,MEX,LUQUE Juan (MEX),S,0,Oscar BONFIGLIO,GK,,1930.0,13 Jul 1930,15:00,Group 1,Pocitos,Montevideo,France,4,1,Mexico,,4444,3,0,LOMBARDI Domingo (URU),CRISTOPHE Henry (BEL),REGO Gilberto (BRA),201,FRA,MEX
2,201,1096,FRA,CAUDRON Raoul (FRA),S,0,Marcel LANGILLER,,G40',1930.0,13 Jul 1930,15:00,Group 1,Pocitos,Montevideo,France,4,1,Mexico,,4444,3,0,LOMBARDI Domingo (URU),CRISTOPHE Henry (BEL),REGO Gilberto (BRA),201,FRA,MEX
3,201,1096,MEX,LUQUE Juan (MEX),S,0,Juan CARRENO,,G70',1930.0,13 Jul 1930,15:00,Group 1,Pocitos,Montevideo,France,4,1,Mexico,,4444,3,0,LOMBARDI Domingo (URU),CRISTOPHE Henry (BEL),REGO Gilberto (BRA),201,FRA,MEX
4,201,1096,FRA,CAUDRON Raoul (FRA),S,0,Ernest LIBERATI,,,1930.0,13 Jul 1930,15:00,Group 1,Pocitos,Montevideo,France,4,1,Mexico,,4444,3,0,LOMBARDI Domingo (URU),CRISTOPHE Henry (BEL),REGO Gilberto (BRA),201,FRA,MEX
5,201,1096,MEX,LUQUE Juan (MEX),S,0,Rafael GARZA,C,,1930.0,13 Jul 1930,15:00,Group 1,Pocitos,Montevideo,France,4,1,Mexico,,4444,3,0,LOMBARDI Domingo (URU),CRISTOPHE Henry (BEL),REGO Gilberto (BRA),201,FRA,MEX
6,201,1096,FRA,CAUDRON Raoul (FRA),S,0,Andre MASCHINOT,,G43' G87',1930.0,13 Jul 1930,15:00,Group 1,Pocitos,Montevideo,France,4,1,Mexico,,4444,3,0,LOMBARDI Domingo (URU),CRISTOPHE Henry (BEL),REGO Gilberto (BRA),201,FRA,MEX
7,201,1096,MEX,LUQUE Juan (MEX),S,0,Hilario LOPEZ,,,1930.0,13 Jul 1930,15:00,Group 1,Pocitos,Montevideo,France,4,1,Mexico,,4444,3,0,LOMBARDI Domingo (URU),CRISTOPHE Henry (BEL),REGO Gilberto (BRA),201,FRA,MEX
8,201,1096,FRA,CAUDRON Raoul (FRA),S,0,Etienne MATTLER,,,1930.0,13 Jul 1930,15:00,Group 1,Pocitos,Montevideo,France,4,1,Mexico,,4444,3,0,LOMBARDI Domingo (URU),CRISTOPHE Henry (BEL),REGO Gilberto (BRA),201,FRA,MEX
9,201,1096,MEX,LUQUE Juan (MEX),S,0,Dionisio MEJIA,,,1930.0,13 Jul 1930,15:00,Group 1,Pocitos,Montevideo,France,4,1,Mexico,,4444,3,0,LOMBARDI Domingo (URU),CRISTOPHE Henry (BEL),REGO Gilberto (BRA),201,FRA,MEX
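
The merge above produces `RoundID_x` and `RoundID_y` because `RoundID` exists in both frames but is not part of the join key. A possible alternative, sketched on a small sample, is to join on both IDs and let `validate` confirm that each player row matches at most one match row.

```python
import pandas as pd

players = pd.DataFrame({
    "RoundID": [201, 201],
    "MatchID": [1096, 1096],
    "Team Initials": ["FRA", "MEX"],
    "Player Name": ["Alex THEPOT", "Oscar BONFIGLIO"],
})
matches = pd.DataFrame({
    "RoundID": [201],
    "MatchID": [1096],
    "Year": [1930],
    "Home Team Name": ["France"],
    "Away Team Name": ["Mexico"],
})

# Joining on both keys keeps a single RoundID column.
# validate="many_to_one" raises an error if the key pair is duplicated in `matches`.
merged = players.merge(
    matches,
    how="left",
    on=["RoundID", "MatchID"],
    validate="many_to_one",
)

print(merged.columns.tolist())
```

The `validate` check only passes when the right-hand keys are unique, so it doubles as a duplicate check on the Matches data.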


In [0]:
# Find the list of players of one team in one World Cup season
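
A possible starting point for this cell is a small filter function. `get_squad` is a hypothetical helper, and the sketch assumes a players frame that already carries a `Year` column (for example after merging with the Matches data).

```python
import pandas as pd

# Sample of a players frame that already has the match year attached.
players_with_year = pd.DataFrame({
    "Year": [1930, 1930, 1930],
    "Team Initials": ["FRA", "MEX", "FRA"],
    "Player Name": ["Alex THEPOT", "Oscar BONFIGLIO", "Marcel LANGILLER"],
})


def get_squad(df, year, team_initials):
    """Return the sorted, de-duplicated player names of one team in one World Cup."""
    mask = (df["Year"] == year) & (df["Team Initials"] == team_initials)
    return sorted(df.loc[mask, "Player Name"].unique())


squad = get_squad(players_with_year, 1930, "FRA")
print(squad)
```

Players appear once per match they took part in, which is why the names are de-duplicated before being returned.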


In [0]:
# Add column to indicate if a player played in a winning team
df_players['Cup Won'] = ''
df_players

Unnamed: 0,RoundID,MatchID,Team Initials,Coach Name,Line-up,Shirt Number,Player Name,Position,Event,Cup Won
0,201,1096,FRA,CAUDRON Raoul (FRA),S,0,Alex THEPOT,GK,,
1,201,1096,MEX,LUQUE Juan (MEX),S,0,Oscar BONFIGLIO,GK,,
2,201,1096,FRA,CAUDRON Raoul (FRA),S,0,Marcel LANGILLER,,G40',
3,201,1096,MEX,LUQUE Juan (MEX),S,0,Juan CARRENO,,G70',
4,201,1096,FRA,CAUDRON Raoul (FRA),S,0,Ernest LIBERATI,,,
5,201,1096,MEX,LUQUE Juan (MEX),S,0,Rafael GARZA,C,,
6,201,1096,FRA,CAUDRON Raoul (FRA),S,0,Andre MASCHINOT,,G43' G87',
7,201,1096,MEX,LUQUE Juan (MEX),S,0,Hilario LOPEZ,,,
8,201,1096,FRA,CAUDRON Raoul (FRA),S,0,Etienne MATTLER,,,
9,201,1096,MEX,LUQUE Juan (MEX),S,0,Dionisio MEJIA,,,
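
One way the empty `Cup Won` column might be filled is sketched below. The Winners data stores country names while the Players data stores team initials, so the sketch first builds a name-to-initials lookup from the Matches columns. It assumes the players frame already has a `Year` column, and the player names are placeholders.

```python
import pandas as pd

# Placeholder player names; Year is assumed to be already merged in.
players = pd.DataFrame({
    "Year": [1930, 1930, 1930],
    "Team Initials": ["URU", "ARG", "FRA"],
    "Player Name": ["Player A", "Player B", "Player C"],
})
matches = pd.DataFrame({
    "Home Team Name": ["Uruguay", "France"],
    "Home Team Initials": ["URU", "FRA"],
    "Away Team Name": ["Argentina", "Mexico"],
    "Away Team Initials": ["ARG", "MEX"],
})
winners = pd.DataFrame({"Year": [1930], "Winner": ["Uruguay"]})

# Build a country name -> initials lookup from both sides of each match.
home = matches[["Home Team Name", "Home Team Initials"]].rename(
    columns={"Home Team Name": "Name", "Home Team Initials": "Initials"})
away = matches[["Away Team Name", "Away Team Initials"]].rename(
    columns={"Away Team Name": "Name", "Away Team Initials": "Initials"})
name_to_initials = (
    pd.concat([home, away]).drop_duplicates().set_index("Name")["Initials"]
)

# Translate each year's winner into initials, then compare per player row.
winners["Winner Initials"] = winners["Winner"].map(name_to_initials)
players = players.merge(winners[["Year", "Winner Initials"]], how="left", on="Year")
players["Cup Won"] = players["Team Initials"] == players["Winner Initials"]
players = players.drop(columns="Winner Initials")

print(players)
```

If a winner's name is spelled differently in the two datasets, the lookup returns NaN and `Cup Won` comes out `False`, so unmatched names are worth checking by hand.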
