# Exploratory Data Analysis
## Paris 2024 Olympic Summer Games

The dataset is available on [Kaggle](https://www.kaggle.com/datasets/piterfm/paris-2024-olympic-summer-games?select=torch_route.csv). It is structured as follows.

```sh
├── results/                      # Results for each discipline
├── athletes.csv                  # Personal information about all athletes
├── coaches.csv                   # Personal information about all coaches
├── events.csv                    # Details about all events that took place
├── medals.csv                    # All medal holders
├── medals_total.csv              # Medal counts grouped by country
├── medallists.csv                # Information on all medallists
├── nocs.csv                      # National Olympic Committee (NOC) codes and countries
├── schedule.csv                  # Day-by-day schedule of all events
├── schedule_preliminary.csv      # Preliminary schedule of all events
├── teams.csv                     # List of all teams participating in the Games
├── technical_officials.csv       # Technical officials (referees, judges, jury members)
├── torch_route.csv               # Locations of the Olympic torch relay
└── venues.csv                    # List of all Olympic venues
```

First, download it using the following cell.

In [7]:
!kaggle datasets download -d piterfm/paris-2024-olympic-summer-games
!unzip paris-2024-olympic-summer-games.zip

Dataset URL: https://www.kaggle.com/datasets/piterfm/paris-2024-olympic-summer-games
License(s): CC-BY-NC-SA-4.0
Downloading paris-2024-olympic-summer-games.zip to /Users/jeanperbet/Projects/dataringz/eda
 68%|█████████████████████████▋            | 2.00M/2.95M [00:00<00:00, 3.30MB/s]
100%|██████████████████████████████████████| 2.95M/2.95M [00:00<00:00, 3.90MB/s]
Archive:  paris-2024-olympic-summer-games.zip
  inflating: athletes.csv            
  inflating: coaches.csv             
  inflating: events.csv              
  inflating: medallists.csv          
  inflating: medals.csv              
  inflating: medals_total.csv        
  inflating: nocs.csv                
  inflating: results/3x3 Basketball.csv  
  inflating: results/Archery.csv     
  inflating: results/Artistic Gymnastics.csv  
  inflating: results/Artistic Swimming.csv  
  inflating: results/Athletics.csv   
  inflating: results/Badminton.csv   
  inflating: results/Basketball.csv  
  inflating: results/Beach Volleyba
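
The `unzip` call above extracts into the working directory, while the loading cell reads from `./data/`. The notebook does not show how the files ended up there, so the following is only a minimal sketch of one way to reconcile the two, assuming the archive sits in the working directory and that `data/` is the intended target. `extract_archive` is a hypothetical helper name, not part of the original notebook.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest="data"):
    """Extract every member of zip_path into dest and return the member names."""
    dest_dir = Path(dest)
    # Create the target directory if it does not exist yet
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


# Assumption: the archive was downloaded into the current working directory
archive_path = Path("paris-2024-olympic-summer-games.zip")
if archive_path.exists():
    names = extract_archive(archive_path)
    print(f"Extracted {len(names)} files into data/")
```

With the files under `data/`, a path such as `./data/medals.csv` resolves as expected.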

In [2]:
import pandas as pd

medals_df = pd.read_csv("./data/medals.csv")
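
Before going further, it can help to confirm that the loaded frame has the columns the ranking function in this notebook relies on (`discipline`, `country_long`, `medal_type`). The sketch below uses a hypothetical helper, `missing_columns`, and an empty stand-in frame so that it runs without the CSV file.

```python
import pandas as pd

# Columns used by the ranking function in this notebook
REQUIRED_COLUMNS = ["discipline", "country_long", "medal_type"]


def missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from df."""
    return [name for name in required if name not in df.columns]


# Stand-in frame with the expected header and no rows (not real data)
example_df = pd.DataFrame(columns=["discipline", "country_long", "medal_type"])
print(missing_columns(example_df))  # an empty list means nothing is missing
```

Calling `missing_columns(medals_df)` on the real data should likewise return an empty list if the file matches the layout assumed here.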

We can also retrieve the number of medals each country won in a given sport.

In [6]:
def get_medal_ranking_by_sport(df, sport):
    """
    Returns a medal ranking table for a given sport.

    Parameters:
    - df: DataFrame of medal records, as loaded from medals.csv
    - sport: Name of the sport to filter

    Returns:
    - A DataFrame with medal counts grouped by country and ranked
    """
    # Filter by sport
    sport_df = df[df['discipline'] == sport]
    
    # Count medals per country
    medal_counts = sport_df.pivot_table(
        index='country_long',
        columns='medal_type',
        aggfunc='size',
        fill_value=0
    ).rename(columns={'Gold Medal': 'Gold', 'Silver Medal': 'Silver', 'Bronze Medal': 'Bronze'})

    # Ensure all medal types exist
    for medal in ['Gold', 'Silver', 'Bronze']:
        if medal not in medal_counts:
            medal_counts[medal] = 0

    # Compute total medals
    medal_counts['Total'] = medal_counts[['Gold', 'Silver', 'Bronze']].sum(axis=1)
    medal_counts = medal_counts.sort_values(by=['Gold', 'Silver', 'Bronze'], ascending=False)

    return medal_counts[['Gold', 'Silver', 'Bronze', 'Total']]

rugby_sevens_ranking = get_medal_ranking_by_sport(medals_df, 'Rugby Sevens')
rugby_sevens_ranking

| country_long | Gold | Silver | Bronze | Total |
|---|---|---|---|---|
| France | 1 | 0 | 0 | 1 |
| New Zealand | 1 | 0 | 0 | 1 |
| Canada | 0 | 1 | 0 | 1 |
| Fiji | 0 | 1 | 0 | 1 |
| South Africa | 0 | 0 | 1 | 1 |
| United States of America | 0 | 0 | 1 | 1 |
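
The same ranking can also be sketched with `pd.crosstab` plus `reindex`, which guarantees the three medal columns without an explicit loop. This is an alternative under stated assumptions, not the notebook's own method: `medal_table` is a hypothetical name, and the sample rows are rebuilt from the Rugby Sevens output above rather than read from `medals.csv`.

```python
import pandas as pd

MEDAL_ORDER = ["Gold Medal", "Silver Medal", "Bronze Medal"]


def medal_table(df):
    """Count medals per country, always returning Gold/Silver/Bronze/Total columns."""
    counts = (
        pd.crosstab(df["country_long"], df["medal_type"])
        # Add any medal type that is absent from the data as a column of zeros
        .reindex(columns=MEDAL_ORDER, fill_value=0)
        .rename(columns=lambda name: name.replace(" Medal", ""))
    )
    counts["Total"] = counts.sum(axis=1)
    return counts.sort_values(by=["Gold", "Silver", "Bronze"], ascending=False)


# Sample rows rebuilt from the Rugby Sevens ranking shown above
sample = pd.DataFrame({
    "discipline": ["Rugby Sevens"] * 6,
    "country_long": ["France", "New Zealand", "Canada", "Fiji",
                     "South Africa", "United States of America"],
    "medal_type": ["Gold Medal", "Gold Medal", "Silver Medal", "Silver Medal",
                   "Bronze Medal", "Bronze Medal"],
})

table = medal_table(sample[sample["discipline"] == "Rugby Sevens"])
print(table)
```

Passing the unfiltered frame to `medal_table` would give a ranking across all disciplines instead of a single sport.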
