# Team 6 - World Cup

![](https://img.fifa.com/image/upload/t_l4/v1543921822/ex1ksdevyxwsgu7rzdv6.jpg)

_For more information about the dataset, read [here](https://www.kaggle.com/abecklas/fifa-world-cup)._

## Your tasks
- Name your team!
- Read the source and do some quick research to understand more about the dataset and its topic
- Clean the data
- Perform Exploratory Data Analysis on the dataset
- Analyze the data more deeply and extract insights
- Visualize your analysis on Google Data Studio
- Present your work to the class and guests next Monday

## Submission Guide
- Create a GitHub repository for your project
- Upload the dataset (.csv file) and the Jupyter Notebook to your GitHub repository. In the Jupyter Notebook, **include the link to your Google Data Studio report**.
- Submit your work through this [Google Form](https://forms.gle/oxtXpGfS8JapVj3V8).

## Tips for Data Cleaning, Manipulation & Visualization
- See [our tips for Data Cleaning, Manipulation & Visualization](https://hackmd.io/cBNV7E6TT2WMliQC-GTw1A).

_____________________________

## Some Hints for This Dataset:
- Is there a way to integrate the data from all 3 datasets?
- The `winners` dataset has no data for World Cup 2018. Can you Google the relevant information and add it to the dataset using `pandas`?
- The format of some numeric columns in the `matches` dataset doesn't look right.
- Can you separate the date and the time in the `Datetime` column of the `matches` dataset?
- And more...
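
Two of the hints above (number formats and the `Datetime` split) can be tried on a tiny hand-made frame first. This is a minimal sketch, not the expected solution: the sample values only mimic formats seen in the data (`2002.0`, `15 Jun 2002 - 15:30`, `2.705.197`), and the new column names `Date` and `Time` are assumptions.

```python
import pandas as pd

# Toy rows; the values mimic the formats found in the real files.
matches = pd.DataFrame({
    "Year": [2002.0, 2014.0],
    "Datetime": ["15 Jun 2002 - 15:30", "12 Jul 2014 - 17:00"],
    "Home Team Goals": [1.0, 0.0],
})
winners = pd.DataFrame({
    "Year": [2002, 2014],
    "Attendance": ["2.705.197", "3.386.810"],
})

# Split "15 Jun 2002 - 15:30" into a date part and a time part.
parts = matches["Datetime"].str.split(" - ", n=1, expand=True)
matches["Date"] = pd.to_datetime(parts[0].str.strip(), format="%d %b %Y")
matches["Time"] = parts[1].str.strip()

# Whole numbers stored as floats (2002.0) can use the nullable Int64 dtype,
# which keeps missing values instead of failing on them.
for col in ["Year", "Home Team Goals"]:
    matches[col] = matches[col].astype("Int64")

# "2.705.197" uses dots as thousands separators: remove them, then convert.
winners["Attendance"] = (
    winners["Attendance"].str.replace(".", "", regex=False).astype(int)
)
```

On the real files, rows that are entirely empty may need to be dropped before these conversions; check with `isna()` first.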

# Team 0

**Import**

In [0]:
import numpy as np
import pandas as pd
import re
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

sns.set_style("whitegrid")

**Merge 3 files into 1 file**

In [0]:
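# Load the three source tables: matches, players, and tournament winners
wc1 = pd.read_csv("https://raw.githubusercontent.com/impossibleno1/MachineLearningCoderSc/master/Datasets/World-cup/matches.csv")
wc2 = pd.read_csv("https://raw.githubusercontent.com/impossibleno1/MachineLearningCoderSc/master/Datasets/World-cup/players.csv")
wc3 = pd.read_csv("https://raw.githubusercontent.com/impossibleno1/MachineLearningCoderSc/master/Datasets/World-cup/winners.csv")

# Join players onto matches by MatchID, then attach each tournament's summary by Year.
# Non-key columns present on both sides (RoundID, Attendance) get _x/_y suffixes.
wc = pd.merge(pd.merge(wc1, wc2, on='MatchID'), wc3, on='Year')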


**Change name RoundID_x to RoundID**

In [0]:
wc.rename(columns={'RoundID_x': 'RoundID'}, inplace=True) # Selective renaming

**Delete RoundID_y**

In [0]:
wc.drop(columns=['RoundID_y'], inplace=True)
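
The `_x`/`_y` suffixes come from `pd.merge`: when both frames contain a non-key column with the same name, pandas keeps both copies and renames them. The sketch below reproduces this on made-up toy frames (all values are invented for illustration) and shows one possible alternative, dropping the duplicate column before merging. That alternative assumes the column carries the same information in both files, which should be checked on the real data.

```python
import pandas as pd

# Toy stand-ins for the matches and players tables (values are made up).
left = pd.DataFrame({"MatchID": [1, 2], "RoundID": [201, 201]})
right = pd.DataFrame({
    "MatchID": [1, 1, 2],
    "RoundID": [201, 201, 201],
    "Player Name": ["A", "B", "C"],
})

# Default behaviour: the shared non-key column is duplicated with _x/_y suffixes.
default = pd.merge(left, right, on="MatchID")
print(list(default.columns))

# Alternative: drop the duplicate column from one side before merging,
# so there is nothing to rename or delete afterwards.
clean = pd.merge(left, right.drop(columns=["RoundID"]), on="MatchID")
print(list(clean.columns))
```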

**Inspect the merged data**

In [35]:
wc.sample(10)

Unnamed: 0,Year,Datetime,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,Attendance_x,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials,Team Initials,Coach Name,Line-up,Shirt Number,Player Name,Position,Event,Country,Winner,Runners-Up,Third,Fourth,GoalsScored,QualifiedTeams,MatchesPlayed,Attendance_y
27528,2002.0,15 Jun 2002 - 15:30,Round of 16,Jeju World Cup Stadium,Jeju,Germany,1.0,0.0,Paraguay,,25176.0,0.0,0.0,BATRES Carlos (GUA),CHARLES Curtis (ATG),DANTE Dramane (MLI),43950200.0,43950049.0,GER,PAR,GER,VOELLER Rudi (GER),S,1,KAHN,GKC,,Korea/Japan,Brazil,Germany,Turkey,Korea Republic,161,32,64,2.705.197
37611,2014.0,12 Jul 2014 - 17:00,Play-off for third place,Estadio Nacional,Brasilia,Brazil,0.0,3.0,Netherlands,,68034.0,0.0,2.0,HAIMOUDI Djamel (ALG),ACHIK Redouane (MAR),ETCHIALI Abdelhak (ALG),255957.0,300186502.0,BRA,NED,NED,Louis VAN GAAL (NED),S,8,DE GUZMAN,,Y36',Brazil,Germany,Argentina,Netherlands,Brazil,171,32,64,3.386.810
3180,1954.0,16 Jun 1954 - 18:00,Group 1,Charmilles,Geneva,Brazil,5.0,0.0,Mexico,,13470.0,4.0,0.0,WYSSLING Paul (SUI),SCHONHOLZER Ernest (SUI),DA COSTA VIEIRA Jose (POR),211.0,1249.0,BRA,MEX,MEX,LOPEZ Antonio (ESP),N,1,Antonio CARBAJAL,,,Switzerland,Germany FR,Hungary,Austria,Uruguay,140,16,26,768.607
32781,2010.0,22 Jun 2010 - 20:30,Group B,Peter Mokaba Stadium,Polokwane,Greece,0.0,2.0,Argentina,,38891.0,0.0,0.0,Ravshan IRMATOV (UZB),ILYASOV Rafael (UZB),KOCHKAROV Bakhadyr (KGZ),249722.0,300061455.0,GRE,ARG,ARG,MARADONA Diego (ARG),S,8,VERON,,,South Africa,Spain,Netherlands,Germany,Uruguay,145,32,64,3.178.856
17984,1990.0,09 Jun 1990 - 17:00,Group D,Renato Dall Ara,Bologna,"rn"">United Arab Emirates",0.0,2.0,Colombia,,30791.0,0.0,0.0,COURTNEY George (ENG),TAKADA Shizuo (JPN),SNODDY Alan (NIR),322.0,119.0,UAE,COL,COL,MATURANA Francisco (COL),S,16,Arnoldo IGUARAN,,O75',Italy,Germany FR,Argentina,Italy,England,115,24,52,2.516.215
33840,2010.0,03 Jul 2010 - 16:00,Quarter-finals,Cape Town Stadium,Cape Town,Argentina,0.0,4.0,Germany,,64100.0,0.0,1.0,Ravshan IRMATOV (UZB),ILYASOV Rafael (UZB),KOCHKAROV Bakhadyr (KGZ),249718.0,300061505.0,ARG,GER,ARG,MARADONA Diego (ARG),S,10,MESSI,,,South Africa,Spain,Netherlands,Germany,Uruguay,145,32,64,3.178.856
23440,1998.0,19 Jun 1998 - 17:30,Group D,Parc des Princes,Paris,Nigeria,1.0,0.0,Bulgaria,,45500.0,1.0,0.0,SANCHEZ YANTEN Mario (CHI),DIAZ GALVEZ Jorge (CHI),PINTO Arnaldo (BRA),1014.0,8747.0,NGA,BUL,NGA,MILUTINOVIC Bora (YUG),S,11,Garba LAWAL,,,France,France,Brazil,Croatia,Netherlands,171,32,64,2.785.100
23832,1998.0,22 Jun 1998 - 21:00,Group G,Stade Municipal,Toulouse,Romania,2.0,1.0,England,,33500.0,0.0,0.0,BATTA Marc (FRA),POUDEVIGNE Jacques (FRA),SOLDATOS Aristidis Chris (RSA),1014.0,8756.0,ROU,ENG,ROU,IORDANESCU Anghel (ROU),S,9,Viorel MOLDOVAN,,G46' O86',France,France,Brazil,Croatia,Netherlands,171,32,64,2.785.100
31929,2010.0,16 Jun 2010 - 20:30,Group A,Loftus Versfeld Stadium,Tshwane/Pretoria,South Africa,0.0,3.0,Uruguay,,42658.0,0.0,1.0,BUSACCA Massimo (SUI),ARNET Matthias (SUI),BURAGINA Francesco (SUI),249722.0,300061452.0,RSA,URU,URU,TABAREZ Oscar (URU),N,13,W.S.ABREU.G,,,South Africa,Spain,Netherlands,Germany,Uruguay,145,32,64,3.178.856
35041,2014.0,19 Jun 2014 - 13:00,Group C,Estadio Nacional,Brasilia,Colombia,2.0,1.0,C�te d'Ivoire,,68748.0,0.0,0.0,WEBB Howard (ENG),MULLARKEY Michael (ENG),Darren CANN (ENG),255931.0,300186468.0,COL,CIV,CIV,LAMOUCHI Sabri (FRA),S,19,TOURE YAYA,C,,Brazil,Germany,Argentina,Netherlands,Brazil,171,32,64,3.386.810
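
The sample above shows two damaged team names: `rn">United Arab Emirates` (a leftover HTML fragment) and `C�te d'Ivoire` (a lost accented letter). A minimal sketch of one way to repair them follows. It assumes the damaged character is stored as the Unicode replacement character U+FFFD, as it is displayed here; if the file was read with a different encoding, the stored character may differ.

```python
import pandas as pd

# The two damaged values are copied from the sample output;
# "Germany" is a clean control value.
teams = pd.Series(['rn">United Arab Emirates', "C\ufffdte d'Ivoire", "Germany"])

cleaned = (
    teams
    .str.replace('rn">', "", regex=False)            # strip the leftover HTML fragment
    .str.replace("C\ufffdte", "Côte", regex=False)   # restore the lost accented letter
    .str.strip()
)
print(cleaned.tolist())
```

The same chain could be applied to `Home Team Name` and `Away Team Name` in the merged frame.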


In [0]:
wc.info()

In [0]:
wc.describe()

In [0]:
wc.isna().sum()

**Find which team has won the most World Cups**


In [36]:
wc['Year'].unique()

array([1930., 1934., 1938., 1950., 1954., 1958., 1962., 1966., 1970.,
       1974., 1978., 1982., 1986., 1990., 1994., 1998., 2002., 2006.,
       2010., 2014.])

In [50]:
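# Count how many tournaments each team has won (one row per World Cup in wc3)
win_counts = wc3['Winner'].value_counts()

# Attach each winner's total number of titles to its rows
wc3['AmountWin'] = wc3['Winner'].map(win_counts)

# Note: 'Germany FR' and 'Germany' are counted separately unless merged first
win_counts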