Before running this notebook, upload the following files:

*   medals.csv from the Paris_2024 datasets.
*   athletes.csv from the Paris_2024 datasets.
*   tokyo_results.csv from the tokyo_2020 datasets.

The notebook cleans and standardizes these files. After running it, download the two output files, tokyo_medals.csv and paris_medals.csv.
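
As an optional check before running the cells, the sketch below reports which of the three input files are still missing. It assumes the files are uploaded to the notebook's working directory under the names listed above; `required_files` and `missing` are illustrative names, not part of the original notebook.

```python
from pathlib import Path

# File names taken from the upload list; the working directory is an assumption.
required_files = ["medals.csv", "athletes.csv", "tokyo_results.csv"]

# Collect the names that are not present next to the notebook.
missing = [name for name in required_files if not Path(name).exists()]

if missing:
    print("Please upload:", ", ".join(missing))
else:
    print("All input files found.")
```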

# **Imports**

In [None]:
import pandas as pd
from datetime import datetime

# **Load data files**

In [None]:
paris = pd.read_csv('medals.csv')
tokyo = pd.read_csv('tokyo_results.csv', encoding='ISO-8859-1')
paris_athletes = pd.read_csv('athletes.csv')

# **Clean Tokyo CSV file**

In [None]:
# Keep only the rows of tokyo_results.csv that have a value in the 'Medal' column
cleaned = tokyo.dropna(subset=['Medal'])

# Save the cleaned DataFrame as tokyo_medals.csv
cleaned.to_csv('tokyo_medals.csv', index=False)
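
To see what `dropna(subset=['Medal'])` does, here is a minimal sketch on a made-up frame. The `Name` column and its values are hypothetical; only the `Medal` column name comes from the cell above.

```python
import pandas as pd

# Hypothetical miniature of tokyo_results.csv; the real file has more columns.
sample = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Medal": ["Gold", None, "Bronze", float("nan")],
})

# dropna(subset=...) only inspects the listed column,
# so gaps in other columns would not cause a row to be dropped.
medal_rows = sample.dropna(subset=["Medal"])

# Counting the removed rows is a quick sanity check on the cleaning step.
rows_removed = len(sample) - len(medal_rows)
print(rows_removed, medal_rows["Name"].tolist())
```

Both `None` and `NaN` count as missing, so rows B and D are dropped here.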

# **Make medal types consistent across both datasets**

In [None]:
# Replace the detailed medal names in the paris DataFrame with shorter ones
paris['medal_type'] = paris['medal_type'].replace({
    'Gold Medal': 'Gold',
    'Silver Medal': 'Silver',
    'Bronze Medal': 'Bronze'
})

# Save the updated DataFrame as paris_med.csv
paris.to_csv('paris_med.csv', index=False)
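
A quick way to confirm the renaming worked is to look for labels outside the expected set. The sketch below uses a made-up sample; `paris_sample`, `expected` and `unexpected` are illustrative names, and the mapping is the one from the cell above.

```python
import pandas as pd

# Hypothetical sample of the Paris 'medal_type' column.
paris_sample = pd.DataFrame({
    "medal_type": ["Gold Medal", "Silver Medal", "Bronze Medal", "Gold Medal"],
})

mapping = {
    "Gold Medal": "Gold",
    "Silver Medal": "Silver",
    "Bronze Medal": "Bronze",
}
paris_sample["medal_type"] = paris_sample["medal_type"].replace(mapping)

# Any label left outside this set means the mapping missed a spelling.
expected = {"Gold", "Silver", "Bronze"}
unexpected = set(paris_sample["medal_type"].dropna()) - expected
print(unexpected)
```

An empty set means every label was normalized. Values that are not keys in the mapping pass through `replace` unchanged, which is why the check is useful.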

# **Calculate the age of Paris 2024 Olympics athletes**

In [None]:
# Convert the 'birth_date' column of paris_athletes to datetime
paris_athletes['birth_date'] = pd.to_datetime(paris_athletes['birth_date'])

# Start date of the Paris 2024 Olympics
olympics_start_date = datetime(2024, 7, 26)

# Return the age in whole years on a given date
def calculate_age_on_date(birth_date, on_date):
    # The tuple comparison is True (counted as 1) when the birthday has not yet occurred in on_date's year
    age = on_date.year - birth_date.year - ((on_date.month, on_date.day) < (birth_date.month, birth_date.day))
    return age

# Apply the age calculation to 'birth_date' and store the result in a new 'age' column
paris_athletes['age'] = paris_athletes['birth_date'].apply(lambda bd: calculate_age_on_date(bd, olympics_start_date))

# Save the updated DataFrame as athletes_with_age.csv
paris_athletes.to_csv('athletes_with_age.csv', index=False)
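
The age formula subtracts one year when the athlete's birthday falls after the reference date. The sketch below checks it on three made-up birth dates around July 26; the function is restated so the block runs on its own, and the variable names are illustrative.

```python
from datetime import datetime

import pandas as pd


def calculate_age_on_date(birth_date, on_date):
    # True (counted as 1) when the birthday has not yet occurred in on_date's year.
    birthday_still_ahead = (on_date.month, on_date.day) < (birth_date.month, birth_date.day)
    return on_date.year - birth_date.year - birthday_still_ahead


opening = datetime(2024, 7, 26)

# Hypothetical birth dates one day after, on, and one day before the reference date.
age_day_after = calculate_age_on_date(pd.Timestamp("2000-07-27"), opening)
age_same_day = calculate_age_on_date(pd.Timestamp("2000-07-26"), opening)
age_day_before = calculate_age_on_date(pd.Timestamp("2000-07-25"), opening)

print(age_day_after, age_same_day, age_day_before)  # 23 24 24
```

An athlete born on 27 July 2000 is still 23 on the start date, while one born on 26 or 25 July 2000 is 24.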

# **Add age column to paris_medals dataset**

In [None]:
# Load the updated datasets
athletes_up = pd.read_csv('athletes_with_age.csv')
paris_up = pd.read_csv('paris_med.csv')

# Convert the 'code' column in both datasets to string so the join keys have the same type
athletes_up['code'] = athletes_up['code'].astype(str)
paris_up['code'] = paris_up['code'].astype(str)

# Add each athlete's age to the medal data with a left join on 'code', so every medal row is kept
merged = paris_up.merge(athletes_up[['code', 'age']], on='code', how='left')

# Save the merged DataFrame as paris_medals.csv
merged.to_csv('paris_medals.csv', index=False)
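
A left join keeps every medal row, but rows whose `code` has no match in the athletes file end up with a missing age. The sketch below shows one way to surface such rows using the `validate` and `indicator` options of `merge`. The two small frames and their codes are made up for illustration and do not describe the real files.

```python
import pandas as pd

# Hypothetical medal and athlete rows; code "999" has no matching athlete.
medals = pd.DataFrame({
    "code": ["101", "102", "999"],
    "medal_type": ["Gold", "Silver", "Bronze"],
})
athletes = pd.DataFrame({
    "code": ["101", "102", "103"],
    "age": [24, 31, 19],
})

checked = medals.merge(
    athletes[["code", "age"]],
    on="code",
    how="left",
    validate="many_to_one",  # raises MergeError if an athlete code appears more than once
    indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
)

# Medal rows that found no athlete, and therefore have no age.
unmatched = checked.loc[checked["_merge"] == "left_only", "code"].tolist()
print(unmatched)  # ['999']
```

If the real data has many unmatched codes, one thing to check is whether either `code` column was read as a float (for example `101.0`), since `astype(str)` would then produce keys that do not line up.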