In [1]:
import pandas as pd
import plotly.express as px
import numpy as np

## Interactive map

In [2]:
# Toggling the season here updates all data below
season = 'Winter'

Build a DataFrame of athletes and the Games they competed in. The results file has many more columns, but we load only the two we need (via `usecols`) so the data stays as simple as possible from the start.
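
As a small illustration of the parsing in the next cell, here is a sketch on a hand-made frame. The three rows are made up for the example and are not read from the CSV; the split logic is the same.

```python
import pandas as pd

# Toy stand-in for the 'edition' column; values are illustrative only
toy = pd.DataFrame({'edition': ['1994 Winter Olympics',
                                '2022 Winter Olympics',
                                '2000 Summer Olympics']})

# n=1 splits only on the first space: '1994' | 'Winter Olympics'
toy[['year', 'season']] = toy['edition'].str.split(' ', n=1, expand=True)
toy['year'] = toy['year'].astype(int)

# Keep only the first word of the remainder: 'Winter Olympics' -> 'Winter'
toy['season'] = toy['season'].str.split(' ', n=1).str[0]

# Same filter as the notebook applies with its `season` variable
winter = toy[toy['season'] == 'Winter']
```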

In [3]:
athlete_df = pd.read_csv('Olympic_Athlete_Event_Results.csv', usecols=['edition', 'athlete'])
athlete_df[['year', 'season']] = athlete_df['edition'].str.split(' ', expand=True, n=1)

athlete_df['year'] = athlete_df['year'].astype(int)

# Remove "Olympics" from "Summer/Winter Olympics"
athlete_df['season'] = athlete_df['season'].str.split(' ', n=1).str[0]

# rename the column to be uniform with the DataFrame we will merge with later
# not necessary, but helpful to simplify our data
athlete_df = athlete_df.rename(columns={'athlete': 'name'})
athlete_df = athlete_df[athlete_df['season'] == season]

athlete_df

Unnamed: 0,edition,name,year,season
519,1994 Winter Olympics,Faauuga Muagututia,1994,Winter
520,1994 Winter Olympics,Brad Kiltz,1994,Winter
554,2022 Winter Olympics,Nathan Crumpton,2022,Winter
1381,2006 Winter Olympics,Erjon Tola,2006,Winter
1382,2006 Winter Olympics,Erjon Tola,2006,Winter
...,...,...,...,...
314902,2022 Winter Olympics,Karlien Sleper,2022,Winter
314903,2022 Winter Olympics,Viktória Čerňanská,2022,Winter
314904,2022 Winter Olympics,Kim Yu-Ran,2022,Winter
314905,2022 Winter Olympics,Jazmine Fenlator-Victorian,2022,Winter


In [4]:
sex_df = pd.read_csv('Olympic_Athlete_Bio.csv', usecols=['name', 'sex', 'country'])
sex_df = sex_df[sex_df['sex'] == 'Female']
sex_df

Unnamed: 0,name,sex,country
2,Nathalie Wunderlich,Female,Switzerland
8,Taeko Kubo,Female,Japan
13,Dannette Leininger,Female,United States
14,Nanna Skodborg Merrald,Female,Denmark
16,Hannah Afriyie,Female,Ghana
...,...,...,...
155015,Catarina Lindqvist,Female,Sweden
155016,Yevheniya Filanenko,Female,Ukraine
155018,Frances Schroth,Female,United States
155023,Miyu Nagaoka,Female,Japan


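We use `df.merge()` on the names, then immediately drop duplicate rows. A merge returns one row for each matching pair of rows, so an athlete listed several times in `athlete_df` (like the two `Erjon Tola` rows for 2006 above) appears several times in the result.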
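
To see where the duplicate rows come from, here is a minimal sketch with made-up data (the names `A` and `B` and the countries `X` and `Y` are placeholders, not values from the files).

```python
import pandas as pd

# Hypothetical data: athlete 'A' is listed twice for the same Games
results = pd.DataFrame({'edition': ['2006 Winter Olympics'] * 3,
                        'name': ['A', 'A', 'B']})
bio = pd.DataFrame({'name': ['A', 'B'],
                    'sex': ['Female', 'Female'],
                    'country': ['X', 'Y']})

merged = results.merge(bio, on='name')  # inner join: 3 rows, 'A' appears twice
deduped = merged.drop_duplicates()      # 2 rows: one per athlete per edition
```

Note that joining on `name` treats two different athletes who share a name as the same person. If both files happen to carry a unique athlete identifier, joining on that would be safer, but that is an assumption about the data and not something shown in this notebook.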

In [5]:
df = athlete_df.merge(sex_df, on='name').drop_duplicates().sort_values('year')
df

Unnamed: 0,edition,name,year,season,sex,country
14497,1924 Winter Olympics,Olga Przedrzymirska,1924,Winter,Female,Poland
16703,1924 Winter Olympics,Svea Norén,1924,Winter,Female,Sweden
15836,1924 Winter Olympics,Z. Pandaković,1924,Winter,Female,Yugoslavia
2023,1924 Winter Olympics,Helene Engelmann,1924,Winter,Female,Austria
2022,1924 Winter Olympics,Herma Planck-Szabo,1924,Winter,Female,Austria
...,...,...,...,...,...,...
7321,2022 Winter Olympics,Sofie Krehl,2022,Winter,Female,Germany
7324,2022 Winter Olympics,Pia Fink,2022,Winter,Female,Germany
7327,2022 Winter Olympics,Coletta Rydzek,2022,Winter,Female,Germany
7298,2022 Winter Olympics,Jana Fischer,2022,Winter,Female,Germany


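Next, let's count each country's appearances per Games. We group the merged `df` by year and country (the order of the two keys doesn't matter here), then count the rows in each group. Since duplicates were dropped, each row is one athlete at one Games, so the count is the number of women representing that country that year.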
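
Here is a sketch of the same counting step on a toy frame (all rows invented for illustration). It also shows an alternative I believe is equivalent: `transform('size')` writes the group size back onto each row without a second merge.

```python
import pandas as pd

# Hypothetical merged frame: one row per athlete per edition
toy = pd.DataFrame({'year': [1924, 1924, 1924, 2022],
                    'country': ['Austria', 'Austria', 'Poland', 'Austria'],
                    'name': ['A', 'B', 'C', 'D']})

# Count rows per (year, country), then attach the count to every row
counts = toy.groupby(['year', 'country']).size().reset_index(name='country_appearances')
via_merge = toy.merge(counts, on=['year', 'country'])

# One-step alternative: broadcast each group's size back to its rows
via_transform = toy.assign(
    country_appearances=toy.groupby(['year', 'country'])['name'].transform('size'))
```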

In [6]:
appearances = df.groupby(['year', 'country']).size().reset_index(name='country_appearances')
df = df.merge(appearances, on=['year', 'country'])
df

Unnamed: 0,edition,name,year,season,sex,country,country_appearances
0,1924 Winter Olympics,Olga Przedrzymirska,1924,Winter,Female,Poland,1
1,1924 Winter Olympics,Svea Norén,1924,Winter,Female,Sweden,1
2,1924 Winter Olympics,Z. Pandaković,1924,Winter,Female,Yugoslavia,1
3,1924 Winter Olympics,Helene Engelmann,1924,Winter,Female,Austria,2
4,1924 Winter Olympics,Herma Planck-Szabo,1924,Winter,Female,Austria,2
...,...,...,...,...,...,...,...
10288,2022 Winter Olympics,Coletta Rydzek,2022,Winter,Female,Germany,48
10289,2022 Winter Olympics,Jana Fischer,2022,Winter,Female,Germany,48
10290,2022 Winter Olympics,Nevena Ignjatović,2022,Winter,Female,Serbia,1
10291,2022 Winter Olympics,Anna Torsani,2022,Winter,Female,San Marino,1


At this point, if you are happy with `df`, you may want to save it with `df.to_csv()`. That way, you can load the result directly and skip the preprocessing above.
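
A minimal sketch of that round trip, using a tiny stand-in frame and a hypothetical file name (`winter_women.csv`) inside a temporary directory so the example cleans up after itself:

```python
import os
import tempfile
import pandas as pd

# Stand-in for the preprocessed frame; the real one comes from the cells above
toy = pd.DataFrame({'name': ['A', 'B'],
                    'year': [1924, 2022],
                    'country': ['Austria', 'Germany'],
                    'country_appearances': [2, 48]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'winter_women.csv')  # hypothetical file name
    # index=False keeps the row index out of the file, so reloading
    # does not produce an extra 'Unnamed: 0' column
    toy.to_csv(path, index=False)
    reloaded = pd.read_csv(path)
```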

Now, let's start plotting. I added `powspace()` so we have some control over our colorbar. The argument to pay attention to is `power`: increasing it gives small numbers (i.e. the countries with fewer appearances) a more intense color, drawing more attention to them and making them look closer to the countries with many appearances.

If you set `power=1`, this is equivalent to linear spacing. If you set something ridiculously high like `power=10`, almost every country on the map will look like it has appeared in the Olympics many times.

In [7]:
def powspace(start, stop, power, num):
    '''
    start: first endpoint of resulting array
    stop: last endpoint of resulting array
    power: power to use when spacing out points in array
    num: number of points in resulting array
    '''
    start = np.power(start, 1/float(power))
    stop = np.power(stop, 1/float(power))
    return np.power(np.linspace(start, stop, num=num), power)
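
To make the effect of `power` concrete: for `start=0` and `stop=1`, the function above reduces to `np.linspace(0, 1, num) ** power`, so the sketch below uses that expression directly rather than repeating the definition.

```python
import numpy as np

# For start=0 and stop=1, powspace(0, 1, power, num) reduces to this expression
linear = np.linspace(0, 1, num=5) ** 1  # power=1: 0, 0.25, 0.5, 0.75, 1
cubic = np.linspace(0, 1, num=5) ** 3   # power=3: 0, 0.015625, 0.125, 0.421875, 1
```

With `power=3` the middle stop moves from 0.5 down to 0.125, so most of the color change happens at the low end of the range.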

Below is a custom definition of our colorbar, which allows us some nice nonlinear scaling. It defines `colormap_vals`, which is a list of `[(val0, color0), (val1, color1), ...]`. Feel free to adj

In [8]:
# Range from the lowest to the highest number of appearances
colorbar_range = df['country_appearances'].min(), df['country_appearances'].max()

# Pick some thematic color scheme
colors = px.colors.sequential.Darkmint if season == 'Winter' else px.colors.sequential.OrRd

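# powspace() already starts at 0 and ends at 1, so pair one stop with each color
# (prepending an extra (0, colors[0]) would give two different colors the same stop at 0)
stops = powspace(start=0, stop=1, power=3, num=len(colors))
colormap_vals = [(float(val), color) for val, color in zip(stops, colors)]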
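
If you adjust the stops by hand, a quick check can catch mistakes before plotting. As far as I know, Plotly expects the stop positions to start at 0, end at 1, and never decrease. The sketch below uses a made-up four-color list in place of a `px.colors.sequential` palette, and `is_valid` is just an illustrative name.

```python
import numpy as np

# Hypothetical 4-color scheme standing in for px.colors.sequential.*
colors = ['#d2fbd4', '#7bbcb0', '#3a7c89', '#123f5a']

# One cubic-spaced stop per color, in the [(val, color), ...] format
stops = np.linspace(0, 1, num=len(colors)) ** 3
scale = [(float(val), color) for val, color in zip(stops, colors)]

# Check the assumed requirements: starts at 0, ends at 1, never decreases
positions = [val for val, _ in scale]
is_valid = (positions[0] == 0.0
            and positions[-1] == 1.0
            and all(a <= b for a, b in zip(positions, positions[1:])))
```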

In [12]:
fig = px.choropleth(
    df,
    locations="country",
    locationmode='country names',
    color='country_appearances',
    projection='natural earth',
    animation_frame='year',
    title=f"Women's {season} Olympics Participation Timelapse",
    color_continuous_scale=colormap_vals,
    range_color=colorbar_range)

fig.show() 
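
One optional tweak: `df` has one row per athlete, so every country is repeated many times within a year. The map only needs one row per year and country, so a slimmer frame could be passed to `px.choropleth` instead. The sketch below uses invented rows, and `plot_df` is a hypothetical name; I would expect the figure to look the same, but I have not compared them here.

```python
import pandas as pd

# Hypothetical per-athlete frame with the count already attached
toy = pd.DataFrame({'year': [1924, 1924, 2022, 2022],
                    'country': ['Austria', 'Austria', 'Germany', 'Germany'],
                    'name': ['A', 'B', 'C', 'D'],
                    'country_appearances': [2, 2, 2, 2]})

# Keep only what the map uses, one row per (year, country)
plot_df = (toy[['year', 'country', 'country_appearances']]
           .drop_duplicates()
           .sort_values('year'))
```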