# Activity 2: Analyzing Different Scenarios and Generating the Appropriate Visualization

We'll be working with the 120 Years of Olympic History dataset, acquired by Randi Griffin from https://www.sports-reference.com/ and available in this book's GitHub repository. Your assignment is to identify the five sports in which the most medals were awarded in 2016, and then perform the following analysis:

1.  Generate a plot indicating the number of medals awarded in each of the top five sports in 2016.
2.  Plot a graph depicting the distribution of the age of medal winners in the top five sports in 2016.
3.  Find out which national teams won the largest number of medals in the top five sports in 2016.
4.  Compare the average weight of male and female athletes who won medals in the top five sports in 2016.

## High-Level Steps

1.  Download the dataset and load it into a pandas DataFrame.
2.  Filter the DataFrame to only include the rows corresponding to medal winners from 2016.
3.  Count the medals awarded in 2016 in each sport.
4.  List the top five sports based on the largest number of medals awarded. Filter the DataFrame one more time to only include the records for the top five sports in 2016.
5.  Generate a bar plot of record counts corresponding to each of the top five sports.
6.  Generate a histogram for the Age feature of all medal winners in the top five sports (2016).
7.  Generate a bar plot indicating how many medals were won by each country's team in the top five sports in 2016.
8.  Generate a bar plot of the average weight of medal winners in the top five sports in 2016, grouped by gender.

## Step 1: Import the Libraries and Load the Data

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the data (athlete_events.csv must be in the notebook's working directory)
df = pd.read_csv('athlete_events.csv')
```


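Reading the CSV fails with `FileNotFoundError` when `athlete_events.csv` is not in the notebook's working directory. A minimal sketch of a more defensive loader follows; `load_athlete_events` and `REQUIRED_COLUMNS` are hypothetical names introduced here, and the column list is assumed to match the CSV header (it contains only the columns the steps below read).

```python
from pathlib import Path

import pandas as pd

# Columns the steps of this activity read (assumed to match the CSV header).
REQUIRED_COLUMNS = {"Year", "Medal", "Sport", "Age", "NOC", "Sex", "Weight"}


def load_athlete_events(path="athlete_events.csv"):
    """Load the Olympic history CSV, failing early with a readable message."""
    csv_path = Path(path)
    if not csv_path.is_file():
        # Show the absolute path that was tried, which makes a wrong working directory obvious.
        raise FileNotFoundError(
            f"{csv_path.resolve()} not found; download athlete_events.csv "
            "and pass its location to load_athlete_events()"
        )
    df = pd.read_csv(csv_path)
    # Report missing columns now instead of hitting a KeyError in a later step.
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df
```

You would then write `df = load_athlete_events('path/to/athlete_events.csv')`, pointing the path at wherever you saved the file.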
## Step 2: Filter the Data to 2016 Medal Winners

```python
# Keep only 2016 rows where a medal was awarded (Medal is NaN for non-medalists)
df_2016 = df[(df['Year'] == 2016) & (df['Medal'].notna())]
```

## Step 3: Identify the Five Sports with the Most Medals

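```python
# Five sports with the most medal records in 2016
top_sports = df_2016['Sport'].value_counts().nlargest(5).index.tolist()
# Keep only the rows belonging to those five sports
df_top_sports = df_2016[df_2016['Sport'].isin(top_sports)]
df_top_sports['Sport'].value_counts()
```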

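`value_counts()` counts rows, and the dataset stores one row per athlete per event, so a team medal (a relay, for example) is counted once for every team member who received it. That matches the "record counts" wording of the high-level steps. If you would rather count each medal once, a rough sketch is to drop duplicate `(Sport, Event, NOC, Medal)` combinations before counting. This assumes a national committee (`NOC`) wins at most one medal of a given type in a given `Event`, which is usually but not always true; `medals_per_sport` is a hypothetical helper and the toy rows are invented.

```python
import pandas as pd


def medals_per_sport(medal_rows, dedupe_team_medals=True):
    """Count medals per sport from rows that all have a non-null Medal."""
    if dedupe_team_medals:
        # Collapse the per-athlete rows of a team medal into a single row.
        medal_rows = medal_rows.drop_duplicates(subset=["Sport", "Event", "NOC", "Medal"])
    return medal_rows["Sport"].value_counts()


# Toy data: one four-athlete relay gold plus one individual silver.
toy = pd.DataFrame({
    "Sport": ["Swimming"] * 5,
    "Event": ["4x100m Relay"] * 4 + ["100m Freestyle"],
    "NOC": ["USA"] * 4 + ["AUS"],
    "Medal": ["Gold"] * 4 + ["Silver"],
})
print(medals_per_sport(toy, dedupe_team_medals=False)["Swimming"])  # prints 5 (rows)
print(medals_per_sport(toy)["Swimming"])  # prints 2 (medals)
```

Calling `medals_per_sport(df_2016).nlargest(5)` would give a deduplicated top five, which may not match the row-based ranking.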
## Step 4: Plot the Number of Medals per Sport

```python
df_top_sports['Sport'].value_counts().plot(kind='bar', figsize=(10, 6))
plt.title('Number of medals per sport (top 5), 2016')
plt.xlabel('Sport')
plt.ylabel('Number of medals')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
```

## Step 5: Age Distribution of Medal Winners

```python
# Drop missing ages, then bin the remaining ages into 20 bars
df_top_sports['Age'].dropna().hist(bins=20, edgecolor='black', figsize=(10, 6))
plt.title('Age distribution of medal winners (top 5 sports), 2016')
plt.xlabel('Age')
plt.ylabel('Number of medal winners')
plt.tight_layout()
plt.show()
```

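The histogram counts one age per medal row, so an athlete who won several medals contributes their age several times. A minimal sketch that keeps one row per athlete and sport (using the dataset's `ID` column, assumed here to identify each athlete) before summarizing ages per sport; `winner_age_summary` is a hypothetical helper and the toy rows are invented.

```python
import pandas as pd


def winner_age_summary(medal_rows):
    """Count, median, min and max of medal-winner ages per sport, each athlete counted once per sport."""
    # One row per (athlete, sport), so multi-medalists are not over-weighted.
    unique_winners = medal_rows.dropna(subset=["Age"]).drop_duplicates(subset=["ID", "Sport"])
    return unique_winners.groupby("Sport")["Age"].agg(["count", "median", "min", "max"])


# Toy data: athlete 1 has two medal rows in Swimming.
toy = pd.DataFrame({
    "ID": [1, 1, 2, 3],
    "Sport": ["Swimming", "Swimming", "Swimming", "Rowing"],
    "Age": [21.0, 21.0, 25.0, 30.0],
})
print(winner_age_summary(toy))
```

The same `drop_duplicates(subset=['ID'])` call can be applied before `.hist()` if you want one age per athlete in the plot.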
## Step 6: Number of Medals by Country

```python
# Ten NOCs (National Olympic Committees) with the most medal records in the top five sports
df_top_sports['NOC'].value_counts().nlargest(10).plot(kind='bar', figsize=(10, 6))
plt.title('Number of medals per country (top 10), 2016')
plt.xlabel('Country (NOC)')
plt.ylabel('Number of medals')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
```

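The bar chart shows totals only. A sketch of the same ranking as a table broken down by medal type, built with `pandas.crosstab`; `medal_table` is a hypothetical helper, the `Gold`/`Silver`/`Bronze` labels are assumed to match the values in the `Medal` column, and the toy rows are invented.

```python
import pandas as pd


def medal_table(medal_rows, top_n=10):
    """Gold/Silver/Bronze counts per NOC plus a Total column, sorted by Total, top_n rows."""
    table = pd.crosstab(medal_rows["NOC"], medal_rows["Medal"])
    # Fix the column order and add any medal type that does not occur in the data.
    table = table.reindex(columns=["Gold", "Silver", "Bronze"], fill_value=0)
    table["Total"] = table.sum(axis=1)
    return table.sort_values("Total", ascending=False).head(top_n)


toy = pd.DataFrame({
    "NOC": ["USA", "USA", "GBR", "CHN", "USA"],
    "Medal": ["Gold", "Silver", "Gold", "Bronze", "Bronze"],
})
print(medal_table(toy, top_n=2))
```

For a stacked chart, `medal_table(df_top_sports)[['Gold', 'Silver', 'Bronze']].plot(kind='bar', stacked=True)` draws one bar per NOC split by medal type.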
## Step 7: Average Weight of Medal Winners by Sex

```python
# Mean weight by sex, ignoring rows with a missing Weight
avg_weight_by_gender = df_top_sports.dropna(subset=['Weight']).groupby('Sex')['Weight'].mean()
avg_weight_by_gender.plot(kind='bar', figsize=(6, 6))
plt.title('Average weight of 2016 medal winners by sex (top 5 sports)')
plt.xlabel('Sex')
plt.ylabel('Average weight (kg)')
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()
```
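The chart above averages over all five sports together, which can hide differences between sports. A minimal sketch that splits the mean by sport as well as by sex with `groupby` and `unstack`; `weight_by_sport_and_sex` is a hypothetical helper and the toy rows are invented.

```python
import pandas as pd


def weight_by_sport_and_sex(medal_rows):
    """Mean Weight (kg) of medal winners: one row per Sport, one column per Sex value."""
    with_weight = medal_rows.dropna(subset=["Weight"])
    # A sport/sex pair with no weight data shows up as NaN after unstacking.
    return with_weight.groupby(["Sport", "Sex"])["Weight"].mean().unstack("Sex")


toy = pd.DataFrame({
    "Sport": ["Rowing", "Rowing", "Rowing", "Athletics", "Athletics"],
    "Sex": ["M", "M", "F", "F", "M"],
    "Weight": [90.0, 94.0, 75.0, 60.0, None],
})
print(weight_by_sport_and_sex(toy))
```

`weight_by_sport_and_sex(df_top_sports).plot(kind='bar')` then draws grouped bars, one group per sport.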