First, Python packages need to be imported for this project.

In [1]:
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline
import pandas as pd

Next, within the dataset chronicling all medals from all Olympic Games, those pertaining to 1960 through 2022 need to be selected.



In [2]:
# Initial data dump
col_list = ['discipline_title', 'slug_game', 'event_title', 
            'medal_type', 'country_name', 'country_3_letter_code']
medal_table = pd.read_csv('archive/olympic_medals.csv', usecols = col_list)

# Extract columns for hosting city and year from "slug_game" column
medal_table['City'] = medal_table['slug_game'].str[:-4].str.replace('-', ' ').str.strip().str.title()
medal_table['Year'] = medal_table['slug_game'].str[-4:]

# Format event titles and medal types
medal_table['event_title'] = medal_table['event_title'].str.title()
medal_table['medal_type'] = medal_table['medal_type'].str.lower().str.title()

# Clean medal table with organized data
new_col_list = ['Year', 'City', 'discipline_title', 'event_title', 
                'medal_type', 'country_name', 'country_3_letter_code']
new_medal_table = medal_table[new_col_list]

# Rename cleaned medal columns
new_medal_table = new_medal_table.rename(columns = {
    'discipline_title': 'Sport',
    'event_title': 'Event',
    'medal_type': 'Medal',
    'country_name': 'Country Name',
    'country_3_letter_code': 'Country Code'
})

In [3]:
new_medal_table

Unnamed: 0,Year,City,Sport,Event,Medal,Country Name,Country Code
0,2022,Beijing,Curling,Mixed Doubles,Gold,Italy,ITA
1,2022,Beijing,Curling,Mixed Doubles,Gold,Italy,ITA
2,2022,Beijing,Curling,Mixed Doubles,Silver,Norway,NOR
3,2022,Beijing,Curling,Mixed Doubles,Silver,Norway,NOR
4,2022,Beijing,Curling,Mixed Doubles,Bronze,Sweden,SWE
...,...,...,...,...,...,...,...
21692,1896,Athens,Weightlifting,Heavyweight - One Hand Lift Men,Silver,Denmark,DEN
21693,1896,Athens,Weightlifting,Heavyweight - One Hand Lift Men,Bronze,Greece,GRE
21694,1896,Athens,Weightlifting,Heavyweight - Two Hand Lift Men,Gold,Denmark,DEN
21695,1896,Athens,Weightlifting,Heavyweight - Two Hand Lift Men,Silver,Great Britain,GBR


The table is now compact and readable, but now needs only the rows that pertain to the Winter Olympics games from 1960 to 2022.

1964 &emsp; Innsbruck, Austria

1968 &emsp; Grenoble, France

1972 &emsp; Sapporo, Japan

1976 &emsp; Innsbruck, Austria

1980 &emsp; Lake Placid, New York, USA

1984 &emsp; Sarajevo, Bosnia and Herzegovina (Yugoslavia)

1988 &emsp; Calgary, Alberta, Canada

1992 &emsp; Albertville, France

1994 &emsp; Lillehamer, Norway

1998 &emsp; Nagano, Japan

2002 &emsp;Salt Lake City, Utah, USA

2006 &emsp; Turin, Italy

2010 &emsp; Vancouver, British Columbia, Canada

2014 &emsp; Sochi, Russia

2018 &emsp; Pyeongchang, South Korea

2022 &emsp; Beijing, China

In [4]:
# List the years and cities for the Winter Olympic games of interest
winter_years = ['1964', '1968', '1972', '1976','1980', '1984', '1988', '1992', 
                '1994', '1998', '2002', '2006', '2010', '2014', '2018', '2022']
                
winter_cities = ['Innsbruck', 'Grenoble', 'Sapporo', 'Innsbruck', 'Lake Placid',  'Sarajevo', 
                 'Calgary', 'Albertille', 'Lillehamer', 'Nagano',  'Salt Lake City', 'Turin', 
                 'Vancouver', 'Sochi', 'Pyeongchang', 'Beijing']

# Select the rows based on the above set of games
winter_medal_table = new_medal_table.loc[new_medal_table['Year'].isin(winter_years)]
winter_medal_table = winter_medal_table.loc[winter_medal_table['City'].isin(winter_cities)]

In [5]:
winter_medal_table

Unnamed: 0,Year,City,Sport,Event,Medal,Country Name,Country Code
0,2022,Beijing,Curling,Mixed Doubles,Gold,Italy,ITA
1,2022,Beijing,Curling,Mixed Doubles,Gold,Italy,ITA
2,2022,Beijing,Curling,Mixed Doubles,Silver,Norway,NOR
3,2022,Beijing,Curling,Mixed Doubles,Silver,Norway,NOR
4,2022,Beijing,Curling,Mixed Doubles,Bronze,Sweden,SWE
...,...,...,...,...,...,...,...
16012,1964,Innsbruck,Bobsleigh,Two-Man Men,Bronze,Italy,ITA
16013,1964,Innsbruck,Bobsleigh,Two-Man Men,Bronze,Italy,ITA
16014,1964,Innsbruck,Bobsleigh,Four-Man Men,Gold,Canada,CAN
16015,1964,Innsbruck,Bobsleigh,Four-Man Men,Silver,Austria,AUT


Now with the medal count data aggregated, it shall be important to clean the list of countries below in order to simplify it, since it consists of countries no longer in existence (such as the Soviet Union or Yugoslavia) and those that have for not very long (such as Belarus and Ukraine).

In [6]:
winter_medal_table['Country Name'].value_counts()

Norway                                   274
United States of America                 266
Germany                                  242
Canada                                   200
Austria                                  184
Soviet Union                             180
Switzerland                              135
Netherlands                              134
Sweden                                   129
German Democratic Republic (Germany)     129
Russian Federation                       114
France                                   114
Finland                                  112
Italy                                    112
People's Republic of China                78
Republic of Korea                         69
Japan                                     63
Federal Republic of Germany               48
ROC                                       38
Czech Republic                            33
Great Britain                             23
Slovenia                                  21
Poland    

That is why the three-letter country code is critical for comparing data in the Winter Olympic medal count (1960 - 2022) versus those which are currently evaluated by the World Bank for historic GNI per Capita. 

In [7]:
# Initial data dump (note: data only extends to 2020 and will have to be extrapolated)
WB_country_list = ['Country Name', 'Country Code']
WB_year_list = ['1964', '1968', '1972', '1976','1980', '1984', '1988', '1992', 
                '1994', '1998', '2002', '2006', '2010', '2014', '2018', '2020']
WB_GNI_per_capita = pd.read_csv(
    'API_NY.GNP.PCAP.CD_DS2_en_csv_v2_3889743/API_NY.GNP.PCAP.CD_DS2_en_csv_v2_3889743.csv', 
    skiprows = 4,
    usecols = (WB_country_list + WB_year_list))

# Limit list only to countries that have scored medals in the Winter Olympics
WB_GNI_per_capita = WB_GNI_per_capita.loc[WB_GNI_per_capita['Country Code'] \
                                          .isin(winter_medal_table['Country Code'])]

In [8]:
WB_GNI_per_capita

Unnamed: 0,Country Name,Country Code,1964,1968,1972,1976,1980,1984,1988,1992,1994,1998,2002,2006,2010,2014,2018,2020
13,Australia,AUS,2110.0,2730.0,3890.0,7870.0,10830.0,12030.0,14120.0,18540.0,18860.0,21780.0,20030.0,34150.0,46690.0,65180.0,53070.0,53690.0
14,Austria,AUT,1240.0,1680.0,2760.0,5980.0,11460.0,9270.0,18070.0,24520.0,26140.0,27970.0,24970.0,41460.0,49610.0,50370.0,48950.0,48350.0
17,Belgium,BEL,1670.0,2270.0,3640.0,7580.0,13980.0,8860.0,17010.0,23170.0,25090.0,26530.0,24000.0,39900.0,46950.0,47800.0,46010.0,45750.0
25,Belarus,BLR,,,,,,,,6260.0,5210.0,1560.0,1380.0,3510.0,6150.0,7620.0,5730.0,6360.0
35,Canada,CAN,,,,,,,,,,,23610.0,38510.0,44490.0,52200.0,45080.0,43580.0
40,China,CHN,90.0,90.0,130.0,190.0,220.0,250.0,330.0,390.0,470.0,800.0,1110.0,2060.0,4340.0,7470.0,9540.0,10550.0
54,Czech Republic,CZE,,,,,,,,3360.0,4170.0,6110.0,6650.0,14000.0,19400.0,18900.0,20560.0,22070.0
70,Spain,ESP,640.0,1010.0,1590.0,3430.0,6210.0,4580.0,9410.0,15610.0,14470.0,15440.0,15590.0,27820.0,31880.0,29130.0,29280.0,27360.0
71,Estonia,EST,,,,,,,,,,,4620.0,11500.0,14710.0,19000.0,21300.0,23170.0
75,Finland,FIN,1620.0,2090.0,3170.0,6840.0,11450.0,10790.0,21540.0,24170.0,20170.0,25820.0,25550.0,42800.0,49600.0,49390.0,48160.0,49780.0


Obviously, due to the birth of new nations as well as the emergence of formerly communist ones (such as Poland and Romania) in the world scene result in a lack of GNI per capita data for up until the early 1990s. Even data for established developed countries such as Canada is missing up until the early 2000s, as well as some countries have missing data altogether (such North Korea and Liechtenstein at the present).

In order to conduct a more focused analysis of more recent Winter Olympic trends, it would be far simpler and more accurate to set the beginning of the study to 2002 in order to accommodate the perennial powerhouse nation of Canada as well as eliminate Liechtenstein (which is culturally similar to Austria and Switzerland) and North Korea (which is just a poor version of South Korea) entirely.

In [9]:
# Trimmed down year list
year_list = ['2002', '2006', '2010', '2014', '2018', '2020']

# Limit list only to countries with reliable data as explained above
GNI_per_capita = WB_GNI_per_capita[WB_country_list + year_list].loc[
    (WB_GNI_per_capita['Country Code'] != 'LIE') &
    (WB_GNI_per_capita['Country Code'] != 'PRK')]

In [10]:
GNI_per_capita

Unnamed: 0,Country Name,Country Code,2002,2006,2010,2014,2018,2020
13,Australia,AUS,20030.0,34150.0,46690.0,65180.0,53070.0,53690.0
14,Austria,AUT,24970.0,41460.0,49610.0,50370.0,48950.0,48350.0
17,Belgium,BEL,24000.0,39900.0,46950.0,47800.0,46010.0,45750.0
25,Belarus,BLR,1380.0,3510.0,6150.0,7620.0,5730.0,6360.0
35,Canada,CAN,23610.0,38510.0,44490.0,52200.0,45080.0,43580.0
40,China,CHN,1110.0,2060.0,4340.0,7470.0,9540.0,10550.0
54,Czech Republic,CZE,6650.0,14000.0,19400.0,18900.0,20560.0,22070.0
70,Spain,ESP,15590.0,27820.0,31880.0,29130.0,29280.0,27360.0
71,Estonia,EST,4620.0,11500.0,14710.0,19000.0,21300.0,23170.0
75,Finland,FIN,25550.0,42800.0,49600.0,49390.0,48160.0,49780.0


Now given this streamlined table, the final step shall be to extrapolate the GNI per capita of the respective countries from 2020 to 2022 using the InterpolatedUnivariateSpline feature of the SciPy package. Use a linear curve to account for the fact that these mostly developed countries have a slower economic growth rate compared to that of developing countries.

In [11]:
# Initialize a new column as a list
new_col = []

# Loop through each country to extrapolate to the year 2022
for index, country in enumerate(GNI_per_capita['Country Code'], 0):
    # Given values
    xi = np.array([int(x) for x in year_list])
    yi = np.array(GNI_per_capita.loc[GNI_per_capita['Country Code'] == country][year_list])
    # Positions to inter/extrapolate
    x = np.linspace(2002, 2022)
    # spline order: 1 linear, 2 quadratic, 3 cubic ... 
    order = 1
    # do inter/extrapolation
    s = InterpolatedUnivariateSpline(xi, yi, k = order)
    y = s(x)
    new_col.append(y[(len(y)) - 1])

# Extract columns for hosting city and year from "slug_game" column
GNI_per_capita['2022'] = new_col

# Finally, drop the column for the year 2020
GNI_per_capita = GNI_per_capita.drop(columns = '2020', axis = 1)

In [12]:
GNI_per_capita

Unnamed: 0,Country Name,Country Code,2002,2006,2010,2014,2018,2022
13,Australia,AUS,20030.0,34150.0,46690.0,65180.0,53070.0,54310.0
14,Austria,AUT,24970.0,41460.0,49610.0,50370.0,48950.0,47750.0
17,Belgium,BEL,24000.0,39900.0,46950.0,47800.0,46010.0,45490.0
25,Belarus,BLR,1380.0,3510.0,6150.0,7620.0,5730.0,6990.0
35,Canada,CAN,23610.0,38510.0,44490.0,52200.0,45080.0,42080.0
40,China,CHN,1110.0,2060.0,4340.0,7470.0,9540.0,11560.0
54,Czech Republic,CZE,6650.0,14000.0,19400.0,18900.0,20560.0,23580.0
70,Spain,ESP,15590.0,27820.0,31880.0,29130.0,29280.0,25440.0
71,Estonia,EST,4620.0,11500.0,14710.0,19000.0,21300.0,25040.0
75,Finland,FIN,25550.0,42800.0,49600.0,49390.0,48160.0,51400.0
