# Retrieving the additional datasets
To carry out the analyses we have planned, we need data from additional sources.
We will therefore download the supplementary datasets into `./data/additionnal/` using this notebook.

We will also clean the data so that we keep only what we need.

## Host countries of the Olympic Games, from their creation to 2028
A quick search online showed that this dataset already exists on Kaggle. We chose to reuse it rather than collect the data ourselves through web scraping, given the time remaining.

Dataset link: https://www.kaggle.com/datasets/piterfm/olympic-games-hosts

Because the dataset can only be downloaded directly after logging in to the platform, we added it manually to the directory mentioned above.

In [1]:
# Dependencies
import pandas as pd
import numpy as np
import datetime
from sklearn import impute

### Cleaning

In [2]:
# Load the data into a pandas dataframe
df_hosting_countries = pd.read_csv('data/additionnal/olympic_hosts.csv')

In [3]:
df_hosting_countries.head()

Unnamed: 0,Type,GamesUrl,Disciplines,DisciplinesList,Country,Date,Athletes,Countries,Events,City,Year
0,summergames,https://www.olympic.org/athens-1896,10,"['Athletics', 'Cycling Road', 'Cycling Track',...",Greece,06 Apr - 15 Apr,241.0,14.0,43.0,Athens,1896
1,summergames,https://www.olympic.org/paris-1900,20,"['Archery', 'Athletics', 'Basque Pelota', 'Cri...",France,14 May - 28 Oct,997.0,24.0,95.0,Paris,1900
2,summergames,https://www.olympic.org/st-louis-1904,19,"['Archery', 'Athletics', 'Basketball', 'Boxing...",United States of America,01 Jul - 23 Nov,651.0,12.0,95.0,St Louis,1904
3,summergames,https://www.olympic.org/london-1908,25,"['Archery', 'Athletics', 'Boxing', 'Cycling Tr...",Great Britain,27 Apr - 31 Oct,2008.0,22.0,110.0,London,1908
4,summergames,https://www.olympic.org/stockholm-1912,18,"['Athletics', 'Cycling Road', 'Diving', 'Eques...",Sweden,05 May - 27 Jul,2407.0,28.0,102.0,Stockholm,1912


In [4]:
# Exclude the youthgames
df_hosting_countries_filtered = df_hosting_countries[df_hosting_countries['Type'] != 'youthgames']

In [5]:
# And drop the columns we do not need (GamesUrl, Disciplines, DisciplinesList, Athletes, Countries, Events, Date)
df_hosting_countries_filtered.drop(columns=['GamesUrl', 'Disciplines', 'DisciplinesList', 'Athletes', 'Countries', 'Events', 'Date'], inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_hosting_countries_filtered.drop(columns=['GamesUrl', 'Disciplines', 'DisciplinesList', 'Athletes', 'Countries', 'Events', 'Date'], inplace=True)
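
The warning above appears because the filtered frame is derived from a boolean selection of `df_hosting_countries` and is then modified in place. A minimal sketch of one common way to avoid it, assuming a toy frame (`hosts`, with made-up URLs) that only mimics the column names of `olympic_hosts.csv`:

```python
import pandas as pd

# Toy stand-in for olympic_hosts.csv (hypothetical rows, same column names)
hosts = pd.DataFrame({
    "Type": ["summergames", "youthgames", "wintergames"],
    "GamesUrl": ["u1", "u2", "u3"],
    "Country": ["Greece", "Singapore", "France"],
    "City": ["Athens", "Singapore", "Chamonix"],
    "Year": [1896, 2010, 1924],
})

# .copy() makes the filtered frame independent of the original,
# so later modifications do not trigger SettingWithCopyWarning
hosts_filtered = hosts[hosts["Type"] != "youthgames"].copy()

# Reassigning the result of drop() avoids inplace=True altogether
hosts_filtered = hosts_filtered.drop(columns=["GamesUrl"])
print(hosts_filtered)
```

The original frame is left untouched, which makes it safe to re-run the cell.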


In [6]:
# Now export the filtered dataframe to a CSV file
df_hosting_countries_filtered.to_csv('data/additionnal/olympic_hosts_filtered.csv', index=False)

## Additional information on the Games missing from the provided dataframe

In [7]:
df_test = pd.read_csv('./data/athlete_events_old.csv')
df_test['Year'].max()

# Data is therefore missing from the 2018 Games onwards

2016

## Retrieving the missing data

### Loading and exploring the retrieved data

For the 2020 Games (Summer, Tokyo), we found the following dataset: https://www.kaggle.com/datasets/piterfm/tokyo-2020-olympics. After downloading its files, we place them in `./data/additionnal/tokyo2020`

In [8]:
df_tokyo_athletes = pd.read_csv('data/additionnal/tokyo2020/athletes.csv')
df_tokyo_coaches = pd.read_csv('data/additionnal/tokyo2020/coaches.csv')
df_tokyo_medals_total = pd.read_csv('data/additionnal/tokyo2020/medals_total.csv')
df_tokyo_medals = pd.read_csv('data/additionnal/tokyo2020/medals.csv')
df_tokyo_technical_officials = pd.read_csv('data/additionnal/tokyo2020/technical_officials.csv')

In [9]:
# athletes.csv
df_tokyo_athletes.head()
# Mapping table (target column = source column):
# Name = name
# Sex = gender
# Age = current year (2021) - birth year (from birth_date)
# Height = height_m/ft (take the part before the /)
# Team = country
# NOC = country_code
# Games = Tokyo 2020
# Year = 2020
# Season = Summer
# City = Tokyo
# Sport = discipline (create a mapping table?)
# Event = N/A
# Medal = N/A

Unnamed: 0,name,short_name,gender,birth_date,birth_place,birth_country,country,country_code,discipline,discipline_code,residence_place,residence_country,height_m/ft,url
0,AALERUD Katrine,AALERUD K,Female,1994-12-04,VESTBY,Norway,Norway,NOR,Cycling Road,CRD,,,,../../../en/results/cycling-road/athlete-profi...
1,ABAD Nestor,ABAD N,Male,1993-03-29,ALCOI,Spain,Spain,ESP,Artistic Gymnastics,GAR,MADRID,Spain,1.65/5'4'',../../../en/results/artistic-gymnastics/athlet...
2,ABAGNALE Giovanni,ABAGNALE G,Male,1995-01-11,GRAGNANO,Italy,Italy,ITA,Rowing,ROW,SABAUDIA,Italy,1.98/6'5'',../../../en/results/rowing/athlete-profile-n13...
3,ABALDE Alberto,ABALDE A,Male,1995-12-15,FERROL,Spain,Spain,ESP,Basketball,BKB,,,2.00/6'6'',../../../en/results/basketball/athlete-profile...
4,ABALDE Tamara,ABALDE T,Female,1989-02-06,VIGO,Spain,Spain,ESP,Basketball,BKB,,,1.92/6'3'',../../../en/results/basketball/athlete-profile...


In [10]:
# coaches.csv
df_tokyo_coaches.head()
# This dataframe will not be used
del df_tokyo_coaches

In [11]:
# medals_total.csv
df_tokyo_medals_total.head()
# This dataframe will not be used
del df_tokyo_medals_total

In [12]:
# medals.csv
df_tokyo_medals.head()
# Medal = medal_type (strip "Medal")
# Event = event (to be decided)
# Left join with athletes on athlete_name

Unnamed: 0,medal_type,medal_code,medal_date,athlete_short_name,athlete_name,athlete_sex,athlete_link,country_code,discipline_code,event,country,discipline
0,Gold Medal,1,2021-07-24 00:00:00.0,KIM JD,KIM Je Deok,X,../../../en/results/archery/athlete-profile-n1...,KOR,ARC,Mixed Team,Republic of Korea,Archery
1,Gold Medal,1,2021-07-24 00:00:00.0,AN S,AN San,X,../../../en/results/archery/athlete-profile-n1...,KOR,ARC,Mixed Team,Republic of Korea,Archery
2,Silver Medal,2,2021-07-24 00:00:00.0,SCHLOESSER G,SCHLOESSER Gabriela,X,../../../en/results/archery/athlete-profile-n1...,NED,ARC,Mixed Team,Netherlands,Archery
3,Silver Medal,2,2021-07-24 00:00:00.0,WIJLER S,WIJLER Steve,X,../../../en/results/archery/athlete-profile-n1...,NED,ARC,Mixed Team,Netherlands,Archery
4,Bronze Medal,3,2021-07-24 00:00:00.0,ALVAREZ L,ALVAREZ Luis,X,../../../en/results/archery/athlete-profile-n1...,MEX,ARC,Mixed Team,Mexico,Archery


### Joining the data according to the mapping table (TODO: write the table in markdown)

In [13]:
# Create a dataframe with the same columns as df_test
df_complete_tokyo2020 = pd.DataFrame(columns=df_test.columns)

# Make sure there are no duplicates
df_tokyo_athletes.drop_duplicates(inplace=True)

In [14]:
df_merged_bis = pd.merge(df_tokyo_athletes, df_tokyo_medals, how='left', left_on=['name', 'discipline_code'], right_on=['athlete_name', 'discipline_code'])
df_merged_bis

Unnamed: 0,name,short_name,gender,birth_date,birth_place,birth_country,country_x,country_code_x,discipline_x,discipline_code,...,medal_code,medal_date,athlete_short_name,athlete_name,athlete_sex,athlete_link,country_code_y,event,country_y,discipline_y
0,AALERUD Katrine,AALERUD K,Female,1994-12-04,VESTBY,Norway,Norway,NOR,Cycling Road,CRD,...,,,,,,,,,,
1,ABAD Nestor,ABAD N,Male,1993-03-29,ALCOI,Spain,Spain,ESP,Artistic Gymnastics,GAR,...,,,,,,,,,,
2,ABAGNALE Giovanni,ABAGNALE G,Male,1995-01-11,GRAGNANO,Italy,Italy,ITA,Rowing,ROW,...,,,,,,,,,,
3,ABALDE Alberto,ABALDE A,Male,1995-12-15,FERROL,Spain,Spain,ESP,Basketball,BKB,...,,,,,,,,,,
4,ABALDE Tamara,ABALDE T,Female,1989-02-06,VIGO,Spain,Spain,ESP,Basketball,BKB,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11876,ZWICKER Martin Detlef,ZWICKER MD,Male,1987-02-27,KOTHEN,Germany,Germany,GER,Hockey,HOC,...,,,,,,,,,,
11877,ZWOLINSKA Klaudia,ZWOLINSKA K,Female,1998-12-18,,,Poland,POL,Canoe Slalom,CSL,...,,,,,,,,,,
11878,ZYKOVA Yulia,ZYKOVA Y,Female,1995-11-25,KRASNOYARSK,Russian Federation,ROC,ROC,Shooting,SHO,...,2.0,2021-07-31 00:00:00.0,ZYKOVA Y,ZYKOVA Yulia,W,../../../en/results/shooting/athlete-profile-n...,ROC,50m Rifle 3 Positions Women,ROC,Shooting
11879,ZYUZINA Ekaterina,ZYUZINA E,Female,1996-12-08,LIPETSK,Russian Federation,ROC,ROC,Sailing,SAL,...,,,,,,,,,,
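
Most medal columns in the merged output are empty, and it is hard to tell from the display alone how many athletes actually matched. A small sketch, assuming toy frames (`athletes`, `medals`) with hypothetical rows, of how the `indicator` parameter of `pd.merge` can be used to count matched and unmatched rows:

```python
import pandas as pd

# Toy stand-ins for df_tokyo_athletes and df_tokyo_medals (hypothetical rows)
athletes = pd.DataFrame({
    "name": ["A", "B", "C"],
    "discipline_code": ["ARC", "ARC", "ROW"],
})
medals = pd.DataFrame({
    "athlete_name": ["A", "A"],
    "discipline_code": ["ARC", "ARC"],
    "medal_type": ["Gold Medal", "Bronze Medal"],
})

# indicator=True adds a "_merge" column: "both" for matched rows,
# "left_only" for athletes without any medal row
merged = pd.merge(
    athletes,
    medals,
    how="left",
    left_on=["name", "discipline_code"],
    right_on=["athlete_name", "discipline_code"],
    indicator=True,
)
print(merged["_merge"].value_counts())
```

An athlete with several medals appears once per medal, which is why the merged frame can have more rows than the athletes frame.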


In [15]:
# Since an athlete can have several medals, and since we observe many empty fields, we retrieve the medals of each athlete whose discipline attribute
# is missing. We will then do a simple join, on the athlete name only, to recover the medals won by those athletes.
missing_disciplines = df_tokyo_athletes[df_tokyo_athletes['discipline'].isna()]['name'].to_list()
missing_discipines_with_medals = df_tokyo_medals[df_tokyo_medals['athlete_name'].isin(missing_disciplines)]

In [16]:
athletes_to_be_added = []
# Iterate over df_tokyo_athletes and collect one record per athlete; the records then become df_complete_tokyo2020

for index, row in df_tokyo_athletes.iterrows():
    athlete_name = row['name']
    # Keep the first letter of the gender ('F' / 'M'), or NaN when it is missing
    athlete_sex = str(row['gender'])[0] if pd.notna(row['gender']) else np.nan
    # Age is approximated as 2021 minus the birth year
    athlete_age = (2021 - datetime.date.fromisoformat(str(row['birth_date'])).year) if pd.notna(row['birth_date']) else np.nan
    # Height in cm: take the metres part before the '/'; round() avoids float truncation (e.g. 2.01 * 100)
    athlete_height_in_cm = round(float(row['height_m/ft'].split('/')[0]) * 100) if pd.notna(row['height_m/ft']) else np.nan
    athlete_team = row['country']
    athlete_noc = row['country_code']
    athlete_games = 'Tokyo 2020'
    athlete_year = 2020
    athlete_season = 'Summer'
    athlete_city = 'Tokyo'
    athlete_discipline = row['discipline']
    athletes_to_be_added.append({
        'Name': athlete_name,
        'Sex': athlete_sex,
        'Age': athlete_age,
        'Height': athlete_height_in_cm,
        'Team': athlete_team,
        'NOC': athlete_noc,
        'Games': athlete_games,
        'Year': athlete_year,
        'Season': athlete_season,
        'City': athlete_city,
        'Sport': athlete_discipline,
    })

# Build the dataframe from the records (from_records is a class method, so it replaces the empty frame created earlier)
df_complete_tokyo2020 = pd.DataFrame.from_records(athletes_to_be_added)
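
The row-by-row loop can also be written with vectorized pandas operations. The following is only a sketch of that alternative, not the notebook's method; the `athletes` frame and its three rows are made up and only reuse the column names of `athletes.csv`:

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_tokyo_athletes (hypothetical rows)
athletes = pd.DataFrame({
    "name": ["ATHLETE One", "ATHLETE Two", "ATHLETE Three"],
    "gender": ["Female", "Male", np.nan],
    "birth_date": ["1994-12-04", "1993-03-29", np.nan],
    "height_m/ft": [np.nan, "1.65/5'4''", "2.01/6'7''"],
})

# Metres part of "1.65/5'4''", parsed as a number (NaN when missing)
height_m = pd.to_numeric(athletes["height_m/ft"].str.split("/").str[0], errors="coerce")

converted = pd.DataFrame({
    "Name": athletes["name"],
    # First letter of the gender; missing values stay NaN
    "Sex": athletes["gender"].str[0],
    # Approximate age: 2021 minus the birth year
    "Age": 2021 - pd.to_datetime(athletes["birth_date"]).dt.year,
    # Height in cm, rounded rather than truncated
    "Height": (height_m * 100).round(),
})
print(converted)
```

Missing values propagate as NaN in every column, so no per-row type checks are needed.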


In [17]:
df_complete_tokyo2020

Unnamed: 0,Name,Sex,Age,Height,Team,NOC,Games,Year,Season,City,Sport
0,AALERUD Katrine,F,27.0,,Norway,NOR,Tokyo 2020,2020,Summer,Tokyo,Cycling Road
1,ABAD Nestor,M,28.0,165.0,Spain,ESP,Tokyo 2020,2020,Summer,Tokyo,Artistic Gymnastics
2,ABAGNALE Giovanni,M,26.0,198.0,Italy,ITA,Tokyo 2020,2020,Summer,Tokyo,Rowing
3,ABALDE Alberto,M,26.0,200.0,Spain,ESP,Tokyo 2020,2020,Summer,Tokyo,Basketball
4,ABALDE Tamara,F,32.0,192.0,Spain,ESP,Tokyo 2020,2020,Summer,Tokyo,Basketball
...,...,...,...,...,...,...,...,...,...,...,...
11651,ZWICKER Martin Detlef,M,34.0,176.0,Germany,GER,Tokyo 2020,2020,Summer,Tokyo,Hockey
11652,ZWOLINSKA Klaudia,F,23.0,,Poland,POL,Tokyo 2020,2020,Summer,Tokyo,Canoe Slalom
11653,ZYKOVA Yulia,F,26.0,,ROC,ROC,Tokyo 2020,2020,Summer,Tokyo,Shooting
11654,ZYUZINA Ekaterina,F,25.0,,ROC,ROC,Tokyo 2020,2020,Summer,Tokyo,Sailing


In [18]:
# Inner join between the two dataframes, so as to keep only the medal-winning athletes, with one row per medal won in each discipline
df_merged = pd.merge(df_complete_tokyo2020, df_tokyo_medals, left_on=['Name', 'Sport'], right_on=['athlete_name', 'discipline'], how='inner')
df_merged.drop(columns=['medal_code', 'medal_date', 'athlete_short_name',
       'athlete_name', 'athlete_sex', 'athlete_link', 'country_code',
       'discipline_code', 'country', 'discipline'], inplace=True)

df_merged.drop_duplicates(inplace=True)

df_merged['medal_type'] = df_merged['medal_type'].apply(lambda x: x.replace('Medal', '').strip() if type(x) != float else x)
df_merged.rename(columns={'medal_type': 'Medal', 'event': 'Event'}, inplace=True)

In [19]:
# Get the names of the medallists so we can remove them from the final dataframe (which does not yet contain the medals won by the athletes), then concatenate
# the medallists-only dataframe with the dataframe of athletes without medals
medailles_seulement = df_merged.Name.unique()
df_complete_tokyo2020.drop(df_complete_tokyo2020[df_complete_tokyo2020['Name'].isin(medailles_seulement)].index, inplace=True)
df_complete_tokyo2020 = pd.concat([df_complete_tokyo2020, df_merged])
df_complete_tokyo2020

Unnamed: 0,Name,Sex,Age,Height,Team,NOC,Games,Year,Season,City,Sport,Medal,Event
0,AALERUD Katrine,F,27.0,,Norway,NOR,Tokyo 2020,2020,Summer,Tokyo,Cycling Road,,
1,ABAD Nestor,M,28.0,165.0,Spain,ESP,Tokyo 2020,2020,Summer,Tokyo,Artistic Gymnastics,,
2,ABAGNALE Giovanni,M,26.0,198.0,Italy,ITA,Tokyo 2020,2020,Summer,Tokyo,Rowing,,
3,ABALDE Alberto,M,26.0,200.0,Spain,ESP,Tokyo 2020,2020,Summer,Tokyo,Basketball,,
4,ABALDE Tamara,F,32.0,192.0,Spain,ESP,Tokyo 2020,2020,Summer,Tokyo,Basketball,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2377,ZOU Jingyuan,M,23.0,158.0,People's Republic of China,CHN,Tokyo 2020,2020,Summer,Tokyo,Artistic Gymnastics,Gold,Men's Parallel Bars
2378,ZUBIMENDI Martin,M,22.0,180.0,Spain,ESP,Tokyo 2020,2020,Summer,Tokyo,Football,Silver,Men
2379,ZUEV Alexander,M,25.0,193.0,ROC,ROC,Tokyo 2020,2020,Summer,Tokyo,3x3 Basketball,Silver,Men
2380,ZVEREV Alexander,M,24.0,198.0,Germany,GER,Tokyo 2020,2020,Summer,Tokyo,Tennis,Gold,Men's Singles
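
The inner join, the removal of medallists and the concatenation can be approximated with a single left join. This is a sketch under assumptions: the toy rows are hypothetical, and the result can differ from the steps above for an athlete listed in several sports, since those steps remove every row bearing a medallist's name.

```python
import pandas as pd

# Toy stand-ins for df_complete_tokyo2020 and df_tokyo_medals (hypothetical rows)
athletes = pd.DataFrame({
    "Name": ["ATHLETE One", "ATHLETE Two"],
    "Sport": ["Archery", "Rowing"],
})
medals = pd.DataFrame({
    "athlete_name": ["ATHLETE One", "ATHLETE One"],
    "discipline": ["Archery", "Archery"],
    "event": ["Mixed Team", "Individual"],
    "medal_type": ["Gold Medal", "Silver Medal"],
})

# A left join keeps every athlete; medallists get one row per medal
combined = athletes.merge(
    medals[["athlete_name", "discipline", "event", "medal_type"]],
    how="left",
    left_on=["Name", "Sport"],
    right_on=["athlete_name", "discipline"],
).drop(columns=["athlete_name", "discipline"])

# "Gold Medal" -> "Gold"; athletes without a medal keep NaN
combined["Medal"] = combined["medal_type"].str.replace(" Medal", "", regex=False)
combined = combined.drop(columns=["medal_type"]).rename(columns={"event": "Event"})
print(combined)
```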


In [20]:
# Replace any 'n' values with NaN for the sex ('n' is the first character of 'nan' when the gender is missing)
df_complete_tokyo2020['Sex'] = df_complete_tokyo2020['Sex'].apply(lambda x: np.nan if x == 'n' else x)

In [21]:
df_complete_tokyo2020

Unnamed: 0,Name,Sex,Age,Height,Team,NOC,Games,Year,Season,City,Sport,Medal,Event
0,AALERUD Katrine,F,27.0,,Norway,NOR,Tokyo 2020,2020,Summer,Tokyo,Cycling Road,,
1,ABAD Nestor,M,28.0,165.0,Spain,ESP,Tokyo 2020,2020,Summer,Tokyo,Artistic Gymnastics,,
2,ABAGNALE Giovanni,M,26.0,198.0,Italy,ITA,Tokyo 2020,2020,Summer,Tokyo,Rowing,,
3,ABALDE Alberto,M,26.0,200.0,Spain,ESP,Tokyo 2020,2020,Summer,Tokyo,Basketball,,
4,ABALDE Tamara,F,32.0,192.0,Spain,ESP,Tokyo 2020,2020,Summer,Tokyo,Basketball,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2377,ZOU Jingyuan,M,23.0,158.0,People's Republic of China,CHN,Tokyo 2020,2020,Summer,Tokyo,Artistic Gymnastics,Gold,Men's Parallel Bars
2378,ZUBIMENDI Martin,M,22.0,180.0,Spain,ESP,Tokyo 2020,2020,Summer,Tokyo,Football,Silver,Men
2379,ZUEV Alexander,M,25.0,193.0,ROC,ROC,Tokyo 2020,2020,Summer,Tokyo,3x3 Basketball,Silver,Men
2380,ZVEREV Alexander,M,24.0,198.0,Germany,GER,Tokyo 2020,2020,Summer,Tokyo,Tennis,Gold,Men's Singles


Clean export of the file as CSV to `./data/athlete_events_tokyo_cleaned.csv`

In [22]:
df_complete_tokyo2020.to_csv('./data/athlete_events_tokyo_cleaned.csv', index=False)

Notes on the Tokyo file: the sex is missing for some athletes. Given the time constraints, we chose not to scrape it.

## Adding the Olympic host countries to the dataframe

Merge with the other file, `./data/athlete_events_old.csv`

In [23]:
df_old = pd.read_csv('data/athlete_events_old.csv')
final_df = pd.concat([df_complete_tokyo2020, df_old])

Function that returns the host country of the Olympic Games for each row

In [24]:
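def is_org_country(year):
    # Look up the host country for the given year; if several Games share that year, the first match is used
    country = df_hosting_countries_filtered['Country'].where(df_hosting_countries_filtered['Year'] == year).dropna()
    if len(country) < 1:
        return None
    else:
        return country.values[0]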

In [25]:
final_df['Country_org'] = final_df['Year'].apply(lambda x: is_org_country(x))
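
Calling the lookup function once per row scans the hosts dataframe every time. A sketch of an alternative based on `Series.map`, assuming toy frames (`hosts`, `events`) that reuse three host rows from the preview shown earlier:

```python
import pandas as pd

# Toy stand-ins for df_hosting_countries_filtered and final_df
hosts = pd.DataFrame({
    "Year": [1896, 1900, 1904],
    "Country": ["Greece", "France", "United States of America"],
})
events = pd.DataFrame({"Year": [1896, 1904, 1916]})

# One host per year; keep="first" mirrors the "first match" rule of the function
year_to_host = hosts.drop_duplicates(subset="Year", keep="first").set_index("Year")["Country"]

# map() looks each year up in the Series index; unknown years give NaN
events["Country_org"] = events["Year"].map(year_to_host)
print(events)
```

Like the function, this matches on the year only. Summer and Winter Games were held in the same year until 1992, so a lookup that also takes the season into account may be needed for those editions.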

Export of the processed dataframe

In [26]:
final_df.to_csv('data/athlete_events.csv', index=False)

## Extracting the data of interest for the countries

In [27]:
# Keep the data of interest, from 2000 onwards
since2000_data = final_df[final_df['Year'] > 1999]

In [59]:
# Replace NaN with 'NoMedal' where no medal was won. Already handled by Noé?
since2000_data['Medal']  = since2000_data['Medal'].apply(lambda x: 'NoMedal' if pd.isna(x) else x)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  since2000_data['Medal']  = since2000_data['Medal'].apply(lambda x: 'NoMedal' if pd.isna(x) else x)
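
The same replacement can be expressed with `fillna` on an explicit copy, which should not raise this warning. A minimal sketch, assuming a toy frame (`final`) with hypothetical rows:

```python
import numpy as np
import pandas as pd

# Toy stand-in for final_df (hypothetical rows)
final = pd.DataFrame({
    "Year": [1996, 2004, 2016],
    "Medal": ["Gold", np.nan, "Silver"],
})

# .copy() gives an independent frame, so the assignment below is unambiguous
recent = final[final["Year"] > 1999].copy()

# fillna() replaces only the missing values, like the apply/lambda version
recent["Medal"] = recent["Medal"].fillna("NoMedal")
print(recent)
```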


In [83]:
# Group by year, event and team for the medals (team sports)
since20000_data_teams = pd.DataFrame(since2000_data.groupby(['Team', 'Year', 'Event', 'Medal']).mean(numeric_only=True))
since20000_data_teams.drop(columns=['ID'], inplace=True) # Drop the ID, which is not useful here

In [84]:
since20000_data_teams.index.get_level_values('Medal').unique()

Index(['NoMedal', 'Bronze', 'Silver', 'Gold'], dtype='object', name='Medal')

In [85]:
since20000_data_teams

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Age,Height,Weight
Team,Year,Event,Medal,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Afghanistan,2004,Athletics Men's 100 metres,NoMedal,19.0,168.0,64.0
Afghanistan,2004,Athletics Women's 100 metres,NoMedal,18.0,180.0,56.0
Afghanistan,2004,Boxing Men's Welterweight,NoMedal,19.0,170.0,69.0
Afghanistan,2004,Judo Women's Middleweight,NoMedal,18.0,165.0,70.0
Afghanistan,2004,"Wrestling Men's Featherweight, Freestyle",NoMedal,19.0,,
...,...,...,...,...,...,...
Zimbabwe,2016,Rowing Women's Single Sculls,NoMedal,29.0,175.0,72.0
Zimbabwe,2016,Shooting Men's Double Trap,NoMedal,42.0,182.0,80.0
Zimbabwe,2016,Swimming Men's 100 metres Freestyle,NoMedal,22.0,181.0,84.0
Zimbabwe,2016,Swimming Women's 100 metres Backstroke,NoMedal,32.0,176.0,64.0


In [72]:
# Create a clean dataset summing the number of medals per country/year
medals_per_country = pd.DataFrame(columns=since20000_data_teams.index.get_level_values('Year').sort_values().unique())
medals_per_country.index.name = "Country"

In [123]:
# Iterate over the teams
for team in since20000_data_teams.index.get_level_values('Team').unique().sort_values():
    print(team)
    # Iterate over the years
    for year in since20000_data_teams.loc[team].index.get_level_values('Year').unique().sort_values():
        print(year)
        med = since20000_data_teams.loc[team, year].index.get_level_values('Medal').value_counts()
        print(med)
        # TODO: add to the dataframe

Afghanistan
2004
NoMedal    5
Name: Medal, dtype: int64
2008
NoMedal    3
Bronze     1
Name: Medal, dtype: int64
2012
NoMedal    5
Bronze     1
Name: Medal, dtype: int64
2016
NoMedal    3
Name: Medal, dtype: int64
Albania
2000
NoMedal    5
Name: Medal, dtype: int64
2004
NoMedal    7
Name: Medal, dtype: int64
2006
NoMedal    3
Name: Medal, dtype: int64
2008
NoMedal    12
Name: Medal, dtype: int64
2010
NoMedal    2
Name: Medal, dtype: int64
2012
NoMedal    9
Name: Medal, dtype: int64
2014
NoMedal    2
Name: Medal, dtype: int64
2016
NoMedal    6
Name: Medal, dtype: int64
Algeria
2000
NoMedal    39
Bronze      3
Silver      1
Gold        1
Name: Medal, dtype: int64
2004
NoMedal    57
Name: Medal, dtype: int64
2006
NoMedal    3
Name: Medal, dtype: int64
2008
NoMedal    42
Silver      1
Bronze      1
Name: Medal, dtype: int64
2010
NoMedal    1
Name: Medal, dtype: int64
2012
NoMedal    28
Gold        1
Name: Medal, dtype: int64
2016
NoMedal    48
Silver      2
Name: Medal, dtype: int64
Americ
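
The step left as a TODO in the loop could be handled without explicit iteration. The following is one possible sketch, not the notebook's final implementation; the `teams` frame is a made-up stand-in that only shares the index levels of `since20000_data_teams`:

```python
import pandas as pd

# Toy frame with the same index levels as since20000_data_teams (values are made up)
index = pd.MultiIndex.from_tuples(
    [
        ("Team A", 2004, "Event 1", "NoMedal"),
        ("Team A", 2004, "Event 2", "Gold"),
        ("Team A", 2008, "Event 1", "Bronze"),
        ("Team B", 2004, "Event 1", "Silver"),
        ("Team B", 2008, "Event 1", "NoMedal"),
        ("Team C", 2004, "Event 1", "NoMedal"),
    ],
    names=["Team", "Year", "Event", "Medal"],
)
teams = pd.DataFrame({"Age": [20.0, 21.0, 22.0, 23.0, 24.0, 25.0]}, index=index)

# Turn the index levels into ordinary columns and keep the medal rows only
flat = teams.index.to_frame(index=False)
won = flat[flat["Medal"] != "NoMedal"]

# One row per team, one column per year, cells = number of medal-winning events
medals_per_country = won.groupby(["Team", "Year"]).size().unstack("Year", fill_value=0)

# Teams that never won a medal are added back with zeros
medals_per_country = medals_per_country.reindex(sorted(flat["Team"].unique()), fill_value=0)
medals_per_country.index.name = "Country"
print(medals_per_country)
```

Because the grouped frame has one row per team, year, event and medal, each team event counts once, whatever the number of athletes on the team.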

In [105]:


print(since20000_data_teams.loc['Afghanistan', 2004])

                                                   Age  Height  Weight
Event                                    Medal                        
Athletics Men's 100 metres               NoMedal  19.0   168.0    64.0
Athletics Women's 100 metres             NoMedal  18.0   180.0    56.0
Boxing Men's Welterweight                NoMedal  19.0   170.0    69.0
Judo Women's Middleweight                NoMedal  18.0   165.0    70.0
Wrestling Men's Featherweight, Freestyle NoMedal  19.0     NaN     NaN
