USA Today recently ran an article about US women winning a clear majority of the US medals at the Tokyo 2020 Olympics. This led me to wonder how this trend has evolved over time: what fraction of the gold medals did US male and female athletes capture at each Summer Olympic Games? Are there any clear trends, such as US women taking over the world or US men swiftly dropping out of the forefront of the gold medal rankings?

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Olympic games up to Tokyo 2020

In [None]:
hosts = pd.read_csv('../input/olympic-games-medals-19862018/olympic_hosts.csv')
hosts.head()

In [None]:
#List of summer games and which year they were held in

is_summer_event = hosts['game_season'] == 'Summer'
summer_events   = hosts['game_slug'][is_summer_event]
summer_years    = hosts['game_year'][is_summer_event]
#Tokyo 2020 is added later from a different data set, so append its year here
summer_years    = pd.concat([summer_years, pd.Series([2020])])

In [None]:
#All medals up to 2019
medals = pd.read_csv('../input/olympic-games-medals-19862018/olympic_medals.csv')

#full_discipline_title will allow us to avoid double counting team disciplines
medals['full_discipline_title'] = medals['discipline_title'] + ' ' + medals['event_title']

medals.head()
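
To illustrate why the combined title matters, here is a minimal sketch on made-up rows (not taken from the data set). It assumes, as the comment above suggests, that a team event appears once per listed athlete, so counting rows would overcount gold medals while counting unique titles would not.

```python
import pandas as pd

# Toy rows (made up for illustration): one relay with two listed
# gold medalists and one individual event.
toy = pd.DataFrame({
    'discipline_title': ['Swimming', 'Swimming', 'Athletics'],
    'event_title': ['4x100m relay', '4x100m relay', '100m'],
    'medal_type': ['GOLD', 'GOLD', 'GOLD'],
})
toy['full_discipline_title'] = toy['discipline_title'] + ' ' + toy['event_title']

rows_counted = len(toy)                                   # counts every athlete row
events_counted = toy['full_discipline_title'].nunique()   # counts each event once
print(rows_counted, events_counted)
```

Counting unique titles treats the relay as a single gold medal, which is what the medal table convention expects.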

In [None]:
#Arrays to store fraction of gold medals won by US male/female athletes at each Games
m_ratio = np.zeros(len(summer_events) + 1)   #Plus one for Tokyo 2020 Olympics added later
f_ratio = np.zeros(len(summer_events) + 1)

#Loop over all summer events up to 2019
for idx, event in enumerate(summer_events):
    
    event_filter = medals['slug_game'] == event
    gold_filter  = medals['medal_type'] == 'GOLD'
    m_filter     = medals['event_gender'] == 'Men'
    f_filter     = medals['event_gender'] == 'Women'
    us_filter    = medals['country_3_letter_code'] == 'USA'
    
    #Find all male/female gold winners at the event - all countries / US only
    m_data    = medals[event_filter & gold_filter & m_filter]
    us_m_data = medals[event_filter & gold_filter & m_filter & us_filter]
        
    f_data    = medals[event_filter & gold_filter & f_filter]
    us_f_data = medals[event_filter & gold_filter & f_filter & us_filter]
    
    #Calculate fraction of gold medals won by US athletes of both genders
    #Unique is used to avoid double counting team events where multiple athletes get a gold medal
    num_m_gold    = len(m_data['full_discipline_title'].unique())
    num_f_gold    = len(f_data['full_discipline_title'].unique())
    num_us_m_gold = len(us_m_data['full_discipline_title'].unique())
    num_us_f_gold = len(us_f_data['full_discipline_title'].unique())

    #Guard against Games with no events for a gender (the ratio then stays at 0)
    if num_m_gold > 0:
        m_ratio[idx] = num_us_m_gold/num_m_gold

    if num_f_gold > 0:
        f_ratio[idx] = num_us_f_gold/num_f_gold
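
The same per-Games fractions could also be computed without an explicit loop. The sketch below is one possible way to do it with `groupby`, shown on made-up rows. The helper name `us_gold_fraction` and the toy data are hypothetical; only the column names are taken from the cells above.

```python
import pandas as pd

def us_gold_fraction(medals, gender):
    """Fraction of gold-medal events won by the USA at each Games (hypothetical helper)."""
    gold = medals[(medals['medal_type'] == 'GOLD') & (medals['event_gender'] == gender)]
    # Number of distinct gold-medal events per Games, all countries
    total = gold.groupby('slug_game')['full_discipline_title'].nunique()
    # Same count restricted to US winners
    us = (gold[gold['country_3_letter_code'] == 'USA']
          .groupby('slug_game')['full_discipline_title'].nunique())
    # Games where the USA won nothing get 0 instead of NaN
    return us.reindex(total.index, fill_value=0) / total

# Made-up rows for illustration only
toy_medals = pd.DataFrame({
    'slug_game': ['a-2000', 'a-2000', 'a-2000', 'b-2004', 'a-2000'],
    'medal_type': ['GOLD'] * 5,
    'event_gender': ['Men', 'Men', 'Men', 'Men', 'Women'],
    'country_3_letter_code': ['USA', 'USA', 'FRA', 'FRA', 'USA'],
    'full_discipline_title': ['E1', 'E1', 'E2', 'E1', 'E3'],
})

frac_m = us_gold_fraction(toy_medals, 'Men')
frac_f = us_gold_fraction(toy_medals, 'Women')
print(frac_m)
print(frac_f)
```

Games with no events for the requested gender simply do not appear in the result, so no division by zero can occur. The result is indexed by `slug_game`, so it would still need to be aligned with the years before plotting.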

# Tokyo 2020 from another dataset

In [None]:
tokyo_medals = pd.read_csv('../input/olympic-tokyo-2020/MedalsByCountryGender.csv')
tokyo_medals.head()

In [None]:
#Filter by gender
tokyo_medals_m = tokyo_medals[tokyo_medals['Gender'] == 'Male']
tokyo_medals_f = tokyo_medals[tokyo_medals['Gender'] == 'Female']

#Calculate fraction of gold medals won by US athletes of both genders
num_m_gold = sum(tokyo_medals_m['Gold'])
num_f_gold = sum(tokyo_medals_f['Gold'])

num_us_m_gold = tokyo_medals_m['Gold'][tokyo_medals_m['NOCCode'] == 'USA'].values[0]
num_us_f_gold = tokyo_medals_f['Gold'][tokyo_medals_f['NOCCode'] == 'USA'].values[0]

m_ratio[-1] = num_us_m_gold / num_m_gold
f_ratio[-1] = num_us_f_gold / num_f_gold
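
As a hedged alternative, the US share for both genders can be computed in one step with `groupby`. This avoids the `.values[0]` lookup, which raises an `IndexError` if a country has no row for a gender. The rows below are made up for illustration; only the column names `NOCCode`, `Gender` and `Gold` come from the cell above.

```python
import pandas as pd

# Made-up medal table with the same column names as the Tokyo data set
toy_tokyo = pd.DataFrame({
    'NOCCode': ['USA', 'USA', 'CHN', 'CHN'],
    'Gender': ['Male', 'Female', 'Male', 'Female'],
    'Gold': [1, 3, 3, 1],
})

# Total gold medals per gender, all countries
total_by_gender = toy_tokyo.groupby('Gender')['Gold'].sum()
# US gold medals per gender
us_by_gender = toy_tokyo[toy_tokyo['NOCCode'] == 'USA'].groupby('Gender')['Gold'].sum()
# A gender with no US row gets 0 instead of NaN
us_share = us_by_gender.reindex(total_by_gender.index, fill_value=0) / total_by_gender
print(us_share)
```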

# Plot results

In [None]:
plt.scatter(summer_years, m_ratio, color = 'navy',    label = 'Male')
plt.scatter(summer_years, f_ratio, color = 'crimson', label = 'Female')
plt.ylim([-0.02,1.02])
plt.ylabel('Fraction of gold medals won by US athletes')
plt.xlabel('Year')
plt.legend()
plt.show()

US males fairly consistently won about 20-25% of the gold medals up to the 1980s but, with the exception of the 1984 Olympics, their share has been steadily dropping since.

US females have been relatively more competitive on the global stage, though the fraction of gold medals they win is also decreasing over time. Interestingly, there was a significant uptick in the fraction of gold medals captured by US female athletes at the 2012 Olympics, but this seems to be fading, and US females are again getting closer and closer to the fraction of gold medals won by their male counterparts.