# Exploratory Data Analysis of FIFA World Cup Matches

This notebook performs an end-to-end exploratory data analysis of FIFA World Cup match data, from the inaugural 1930 tournament through the 2022 edition. We load, inspect and clean the data, then examine trends in match counts, goal scoring, team performance and host-country effects, using summary statistics and visualizations to surface the main patterns.

## Objectives

1. Load and inspect the raw match dataset to understand its structure, dimensions and data types.  
2. Clean and preprocess the data: handle missing values, parse dates, standardize column names.  
3. Analyze temporal trends: number of matches per tournament, average goals per match over time.  
4. Examine team performance: top scoring teams, win/draw/loss distributions, stages reached.  
5. Explore host‑country impact: home advantage, goals scored by hosts vs. visitors.  
6. Summarize findings and outline next steps for deeper analysis or modeling.

In [2]:
import pandas as pd

In [12]:
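# Load the full results file, then keep only the World Cup matches.
df = pd.read_csv('../data/results_with_shootouts.csv')
# .copy() makes the subset independent, avoiding SettingWithCopyWarning on later edits.
mundial = df[df['tournament'] == 'FIFA World Cup'].copy()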

In [15]:
mundial.head()

| Unnamed: 0 | date | home_team | away_team | home_score | away_score | tournament | city | country | neutral | match_result | year | winner | first_shooter |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1486 | 1930-07-13 | Belgium | United States | 0 | 3 | FIFA World Cup | Montevideo | Uruguay | True | Away Win | 1930 | | |
| 1487 | 1930-07-13 | France | Mexico | 4 | 1 | FIFA World Cup | Montevideo | Uruguay | True | Home Win | 1930 | | |
| 1488 | 1930-07-14 | Brazil | Yugoslavia | 1 | 2 | FIFA World Cup | Montevideo | Uruguay | True | Away Win | 1930 | | |
| 1489 | 1930-07-14 | Peru | Romania | 1 | 3 | FIFA World Cup | Montevideo | Uruguay | True | Away Win | 1930 | | |
| 1490 | 1930-07-15 | Argentina | France | 1 | 0 | FIFA World Cup | Montevideo | Uruguay | True | Home Win | 1930 | | |
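
The first two objectives (inspecting structure, dimensions and data types, then checking for missing values and parsing dates) can be sketched on a small sample. The snippet below is a minimal illustration rather than the notebook's actual cell: it rebuilds the five rows shown above in an in-memory CSV so that it runs without the data file, and the names `sample_csv`, `sample` and `missing` are introduced here for illustration only. On the real data the same calls would be applied to `mundial`.

```python
import io

import pandas as pd

# Assumption: these five rows are copied from the head() output above,
# so the example runs without '../data/results_with_shootouts.csv'.
sample_csv = """date,home_team,away_team,home_score,away_score,tournament,city,country,neutral,match_result,year,winner,first_shooter
1930-07-13,Belgium,United States,0,3,FIFA World Cup,Montevideo,Uruguay,True,Away Win,1930,,
1930-07-13,France,Mexico,4,1,FIFA World Cup,Montevideo,Uruguay,True,Home Win,1930,,
1930-07-14,Brazil,Yugoslavia,1,2,FIFA World Cup,Montevideo,Uruguay,True,Away Win,1930,,
1930-07-14,Peru,Romania,1,3,FIFA World Cup,Montevideo,Uruguay,True,Away Win,1930,,
1930-07-15,Argentina,France,1,0,FIFA World Cup,Montevideo,Uruguay,True,Home Win,1930,,
"""

# parse_dates converts the 'date' column to datetime64 while reading.
sample = pd.read_csv(io.StringIO(sample_csv), parse_dates=["date"])

# Dimensions and per-column data types.
print(sample.shape)
print(sample.dtypes)

# Number of missing values in each column.
missing = sample.isna().sum()
print(missing)
```

In this sample, `winner` and `first_shooter` are empty in every row, so any cleaning step should probably treat them as optional fields rather than dropping the rows that lack them.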


In [26]:
print(mundial['year'].unique())

[1930 1934 1938 1950 1954 1958 1962 1966 1970 1974 1978 1982 1986 1990
 1994 1998 2002 2006 2010 2014 2018 2022]
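
With the tournament years confirmed, the temporal-trend objective (matches per tournament and average goals per match) reduces to a group-by on `year`. The following is a hedged sketch under stated assumptions: it uses the five 1930 rows shown earlier instead of the full `mundial` frame, and `total_goals` and `per_tournament` are names introduced here, not columns or variables from the dataset.

```python
import pandas as pd

# Assumption: a five-row stand-in for `mundial`, taken from the head() output.
sample = pd.DataFrame(
    {
        "year": [1930, 1930, 1930, 1930, 1930],
        "home_team": ["Belgium", "France", "Brazil", "Peru", "Argentina"],
        "away_team": ["United States", "Mexico", "Yugoslavia", "Romania", "France"],
        "home_score": [0, 4, 1, 1, 1],
        "away_score": [3, 1, 2, 3, 0],
    }
)

# Goals scored by both sides in each match.
sample["total_goals"] = sample["home_score"] + sample["away_score"]

# One row per tournament year: match count and mean goals per match.
per_tournament = sample.groupby("year").agg(
    matches=("total_goals", "size"),
    avg_goals=("total_goals", "mean"),
)
print(per_tournament)
```

Run against the full `mundial` frame, the same aggregation would yield one row for each of the 22 years listed above, which can then be plotted directly with `per_tournament.plot()`.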
