# 50 years of swimming

This notebook uses the hidden API of [World Aquatics](https://www.worldaquatics.com/) (formerly known as FINA), the international federation for water sports, to collect historical swimming rankings.

In [8]:
import requests
import json
import pandas as pd
import time

## Get the data 1: Common races

First, I want the historical top-50 times for the races common to all four strokes: 50, 100 and 200 meters. I will limit my search to times swum in Olympic-size (50-meter) pools, and to the years 1972 to 2022, roughly the past 50 years.

Each search returns up to 50 results, and there are 12 races for each of 2 genders, so I expect up to 1,200 rows per year. Over 51 years (1972 to 2022 inclusive), this comes to 61,200 rows.
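The estimate above can be checked with a few lines of arithmetic. This is only a sanity check; the variable names are illustrative and mirror the query parameters used in the next cell, and the result is an upper bound that assumes every search returns a full page.

```python
# Upper bound on rows: assumes every search returns a full page of results.
n_genders = 2                     # F and M
n_strokes = 4                     # freestyle, backstroke, breaststroke, butterfly
n_distances = 3                   # 50, 100 and 200 meters
n_years = len(range(1972, 2023))  # 1972 to 2022 inclusive
page_size = 50                    # results requested per search

searches_per_year = n_genders * n_strokes * n_distances
rows_per_year = searches_per_year * page_size
expected_rows = rows_per_year * n_years

print(searches_per_year, rows_per_year, n_years, expected_rows)
# prints: 24 1200 51 61200
```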

In [5]:
endpoint = 'https://api.worldaquatics.com/fina/rankings/swimming?'
genders = ['F','M']
distances = ['50', '100', '200']
strokes = ['FREESTYLE', 'BACKSTROKE', 'BREASTSTROKE', 'BUTTERFLY']
pool = 'LCM'
years = [year for year in range(1972, 2023)]

page_size = 50 # Change this value to get more or fewer results per search.

dfs = [] # Create an empty list where I will temporarily store rankings after each search

for year in years:

    for gender in genders:

        for stroke in strokes:
            time.sleep(0.5)

            for distance in distances:

                url = f'{endpoint}gender={gender}&distance={distance}&stroke={stroke}&poolConfiguration={pool}&year={year}&startDate=&endDate=&timesMode=ALL_TIMES&regionId=&countryId=&pageSize={page_size}'
                response = requests.get(url)
                data = response.json()
                
                print(f"Fetching {year} {gender} {stroke} {distance}m")
                
                temp_df = pd.DataFrame(data['swimmingWorldRankings'])
                dfs.append(temp_df)

df = pd.concat(dfs, ignore_index=True) # Add rankings to the main df
df.shape

Fetching 1972 F FREESTYLE 50m
Fetching 1972 F FREESTYLE 100m
Fetching 1972 F FREESTYLE 200m
Fetching 1972 F BACKSTROKE 50m
Fetching 1972 F BACKSTROKE 100m
Fetching 1972 F BACKSTROKE 200m
Fetching 1972 F BREASTSTROKE 50m
Fetching 1972 F BREASTSTROKE 100m
Fetching 1972 F BREASTSTROKE 200m
Fetching 1972 F BUTTERFLY 50m
Fetching 1972 F BUTTERFLY 100m
Fetching 1972 F BUTTERFLY 200m
Fetching 1972 M FREESTYLE 50m
Fetching 1972 M FREESTYLE 100m
Fetching 1972 M FREESTYLE 200m
Fetching 1972 M BACKSTROKE 50m
Fetching 1972 M BACKSTROKE 100m
Fetching 1972 M BACKSTROKE 200m
Fetching 1972 M BREASTSTROKE 50m
Fetching 1972 M BREASTSTROKE 100m
Fetching 1972 M BREASTSTROKE 200m
Fetching 1972 M BUTTERFLY 50m
Fetching 1972 M BUTTERFLY 100m
Fetching 1972 M BUTTERFLY 200m
Fetching 1973 F FREESTYLE 50m
Fetching 1973 F FREESTYLE 100m
Fetching 1973 F FREESTYLE 200m
Fetching 1973 F BACKSTROKE 50m
Fetching 1973 F BACKSTROKE 100m
Fetching 1973 F BACKSTROKE 200m
Fetching 1973 F BREASTSTROKE 50m
Fetching 1973 F BREA

Fetching 1982 M BREASTSTROKE 100m
Fetching 1982 M BREASTSTROKE 200m
Fetching 1982 M BUTTERFLY 50m
Fetching 1982 M BUTTERFLY 100m
Fetching 1982 M BUTTERFLY 200m
Fetching 1983 F FREESTYLE 50m
Fetching 1983 F FREESTYLE 100m
Fetching 1983 F FREESTYLE 200m
Fetching 1983 F BACKSTROKE 50m
Fetching 1983 F BACKSTROKE 100m
Fetching 1983 F BACKSTROKE 200m
Fetching 1983 F BREASTSTROKE 50m
Fetching 1983 F BREASTSTROKE 100m
Fetching 1983 F BREASTSTROKE 200m
Fetching 1983 F BUTTERFLY 50m
Fetching 1983 F BUTTERFLY 100m
Fetching 1983 F BUTTERFLY 200m
Fetching 1983 M FREESTYLE 50m
Fetching 1983 M FREESTYLE 100m
Fetching 1983 M FREESTYLE 200m
Fetching 1983 M BACKSTROKE 50m
Fetching 1983 M BACKSTROKE 100m
Fetching 1983 M BACKSTROKE 200m
Fetching 1983 M BREASTSTROKE 50m
Fetching 1983 M BREASTSTROKE 100m
Fetching 1983 M BREASTSTROKE 200m
Fetching 1983 M BUTTERFLY 50m
Fetching 1983 M BUTTERFLY 100m
Fetching 1983 M BUTTERFLY 200m
Fetching 1984 F FREESTYLE 50m
Fetching 1984 F FREESTYLE 100m
Fetching 1984 F FRE

Fetching 1993 M FREESTYLE 200m
Fetching 1993 M BACKSTROKE 50m
Fetching 1993 M BACKSTROKE 100m
Fetching 1993 M BACKSTROKE 200m
Fetching 1993 M BREASTSTROKE 50m
Fetching 1993 M BREASTSTROKE 100m
Fetching 1993 M BREASTSTROKE 200m
Fetching 1993 M BUTTERFLY 50m
Fetching 1993 M BUTTERFLY 100m
Fetching 1993 M BUTTERFLY 200m
Fetching 1994 F FREESTYLE 50m
Fetching 1994 F FREESTYLE 100m
Fetching 1994 F FREESTYLE 200m
Fetching 1994 F BACKSTROKE 50m
Fetching 1994 F BACKSTROKE 100m
Fetching 1994 F BACKSTROKE 200m
Fetching 1994 F BREASTSTROKE 50m
Fetching 1994 F BREASTSTROKE 100m
Fetching 1994 F BREASTSTROKE 200m
Fetching 1994 F BUTTERFLY 50m
Fetching 1994 F BUTTERFLY 100m
Fetching 1994 F BUTTERFLY 200m
Fetching 1994 M FREESTYLE 50m
Fetching 1994 M FREESTYLE 100m
Fetching 1994 M FREESTYLE 200m
Fetching 1994 M BACKSTROKE 50m
Fetching 1994 M BACKSTROKE 100m
Fetching 1994 M BACKSTROKE 200m
Fetching 1994 M BREASTSTROKE 50m
Fetching 1994 M BREASTSTROKE 100m
Fetching 1994 M BREASTSTROKE 200m
Fetching 1994

Fetching 2004 F BUTTERFLY 50m
Fetching 2004 F BUTTERFLY 100m
Fetching 2004 F BUTTERFLY 200m
Fetching 2004 M FREESTYLE 50m
Fetching 2004 M FREESTYLE 100m
Fetching 2004 M FREESTYLE 200m
Fetching 2004 M BACKSTROKE 50m
Fetching 2004 M BACKSTROKE 100m
Fetching 2004 M BACKSTROKE 200m
Fetching 2004 M BREASTSTROKE 50m
Fetching 2004 M BREASTSTROKE 100m
Fetching 2004 M BREASTSTROKE 200m
Fetching 2004 M BUTTERFLY 50m
Fetching 2004 M BUTTERFLY 100m
Fetching 2004 M BUTTERFLY 200m
Fetching 2005 F FREESTYLE 50m
Fetching 2005 F FREESTYLE 100m
Fetching 2005 F FREESTYLE 200m
Fetching 2005 F BACKSTROKE 50m
Fetching 2005 F BACKSTROKE 100m
Fetching 2005 F BACKSTROKE 200m
Fetching 2005 F BREASTSTROKE 50m
Fetching 2005 F BREASTSTROKE 100m
Fetching 2005 F BREASTSTROKE 200m
Fetching 2005 F BUTTERFLY 50m
Fetching 2005 F BUTTERFLY 100m
Fetching 2005 F BUTTERFLY 200m
Fetching 2005 M FREESTYLE 50m
Fetching 2005 M FREESTYLE 100m
Fetching 2005 M FREESTYLE 200m
Fetching 2005 M BACKSTROKE 50m
Fetching 2005 M BACKSTROK

Fetching 2015 F BACKSTROKE 100m
Fetching 2015 F BACKSTROKE 200m
Fetching 2015 F BREASTSTROKE 50m
Fetching 2015 F BREASTSTROKE 100m
Fetching 2015 F BREASTSTROKE 200m
Fetching 2015 F BUTTERFLY 50m
Fetching 2015 F BUTTERFLY 100m
Fetching 2015 F BUTTERFLY 200m
Fetching 2015 M FREESTYLE 50m
Fetching 2015 M FREESTYLE 100m
Fetching 2015 M FREESTYLE 200m
Fetching 2015 M BACKSTROKE 50m
Fetching 2015 M BACKSTROKE 100m
Fetching 2015 M BACKSTROKE 200m
Fetching 2015 M BREASTSTROKE 50m
Fetching 2015 M BREASTSTROKE 100m
Fetching 2015 M BREASTSTROKE 200m
Fetching 2015 M BUTTERFLY 50m
Fetching 2015 M BUTTERFLY 100m
Fetching 2015 M BUTTERFLY 200m
Fetching 2016 F FREESTYLE 50m
Fetching 2016 F FREESTYLE 100m
Fetching 2016 F FREESTYLE 200m
Fetching 2016 F BACKSTROKE 50m
Fetching 2016 F BACKSTROKE 100m
Fetching 2016 F BACKSTROKE 200m
Fetching 2016 F BREASTSTROKE 50m
Fetching 2016 F BREASTSTROKE 100m
Fetching 2016 F BREASTSTROKE 200m
Fetching 2016 F BUTTERFLY 50m
Fetching 2016 F BUTTERFLY 100m
Fetching 2016 

(47637, 36)

I got 47,637 rows, not the 61,200 I was expecting. Why?
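One way to answer this would be to record how many rows each search returned while fetching. The sketch below is not part of the original run: `find_short_queries` and the toy `rows_per_query` dictionary are hypothetical names and made-up values, used only to show the idea.

```python
def find_short_queries(rows_per_query, page_size=50):
    """Return the searches that came back with fewer rows than requested.

    rows_per_query maps a (year, gender, stroke, distance) tuple to the
    number of rows that search returned.
    """
    return {
        query: n_rows
        for query, n_rows in rows_per_query.items()
        if n_rows < page_size
    }


# Toy counts, made up for illustration (not real API results).
rows_per_query = {
    (1972, 'F', 'FREESTYLE', '100'): 50,
    (1972, 'F', 'BUTTERFLY', '50'): 0,
    (1973, 'M', 'BACKSTROKE', '50'): 12,
}

short = find_short_queries(rows_per_query)
for query, n_rows in short.items():
    print(query, n_rows)
```

In the fetching loop, the counts could be collected with something like `rows_per_query[(year, gender, stroke, distance)] = len(temp_df)` right after each `temp_df` is created.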

In [6]:
df.columns

Index(['resultRef', 'rank', 'time', 'order', 'disciplineId',
       'disciplineGroupId', 'splitId', 'eventId', 'personId', 'resultId',
       'records', 'resultDate', 'athleteResultAge', 'stamp', 'finaPoints',
       'tags', 'standard', 'pool', 'disciplineName', 'fullName', 'lastName',
       'firstName', 'eventCity', 'eventName', 'eventCountryCode',
       'eventCountryFlagId', 'dateOfBirth', 'yearOfBirth',
       'participantCountryCode', 'participantCountryName',
       'participantCountryFlagId', 'medalTag', 'club', 'clubName',
       'clubCountryCode', 'heatId'],
      dtype='object')

In [9]:
df['rank'].value_counts()

rank
1     1122
2     1091
5     1086
7     1083
8     1083
4     1078
6     1067
3     1061
16    1036
13    1036
12    1031
10    1028
11    1013
9     1007
14    1006
15    1005
17     994
19     994
24     989
20     988
25     988
18     985
23     984
22     965
28     953
21     953
33     949
27     943
29     933
26     930
31     930
30     925
35     913
34     912
42     907
36     903
32     901
41     893
39     890
37     880
38     864
40     863
45     856
43     854
48     850
44     848
47     828
46     821
49     787
50     631
Name: count, dtype: int64

Uh-oh, the results are uneven. Rank 1 appears 1,122 times but rank 50 only 631 times, so many searches returned fewer than 50 results. For some races, there does not seem to be enough data available.
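To put a number on this, each rank count can be divided by the number of searches (51 years times 24 races, or 1,224). The sketch below uses two counts copied from the output above; `n_searches` and `coverage` are names introduced here for illustration, and the ratio is only a rough guide because it assumes each search returns a given rank at most once.

```python
import pandas as pd

n_searches = 51 * 2 * 4 * 3  # years x genders x strokes x distances

# Two rows copied from the value_counts() output above.
rank_counts = pd.Series({1: 1122, 50: 631}, name='count')

# Approximate share of searches that returned a result at each rank.
coverage = (rank_counts / n_searches).round(3)
print(coverage)
```

On the full data, the same ratio for every rank could be obtained with `df['rank'].value_counts().sort_index() / n_searches`.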

In [10]:
df['disciplineName'].value_counts()

disciplineName
Men 100 Freestyle         2304
Women 100 Freestyle       2285
Men 200 Freestyle         2283
Men 100 Backstroke        2264
Men 100 Butterfly         2259
Women 100 Backstroke      2252
Women 200 Freestyle       2250
Men 100 Breaststroke      2223
Women 100 Butterfly       2218
Women 100 Breaststroke    2213
Men 200 Backstroke        2193
Men 200 Butterfly         2192
Women 200 Breaststroke    2192
Men 200 Breaststroke      2169
Women 200 Backstroke      2145
Women 200 Butterfly       2141
Men 50 Freestyle          1896
Women 50 Freestyle        1865
Men 50 Backstroke         1462
Women 50 Backstroke       1438
Men 50 Butterfly          1353
Women 50 Butterfly        1352
Men 50 Breaststroke       1344
Women 50 Breaststroke     1344
Name: count, dtype: int64

It definitely seems that for some races, especially the 50-meter backstroke, butterfly and breaststroke, much less data is available. We'll have to keep this in mind during our analysis. In the meantime, though, let's get the rest of the data that we need.
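The gap can be expressed as a share of the maximum each race could have reached, which is 51 years times 50 results, or 2,550 rows. A minimal sketch with three counts copied from the output above (`max_rows` and `fill_rate` are illustrative names):

```python
import pandas as pd

max_rows = 51 * 50  # years x results per search

# Three rows copied from the disciplineName counts above.
counts = pd.Series({
    'Men 100 Freestyle': 2304,
    'Men 50 Freestyle': 1896,
    'Women 50 Breaststroke': 1344,
})

# Fraction of the possible rows that each race actually has.
fill_rate = (counts / max_rows).round(2)
print(fill_rate)
```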

## Get the data 2: Freestyle-specific races

In [11]:
# Freestyle-specific races

endpoint = 'https://api.worldaquatics.com/fina/rankings/swimming?'
genders = ['F','M']
distances = ['400', '800', '1500']
strokes = ['FREESTYLE']
pool = 'LCM'
years = [year for year in range(1972, 2023)]
page_size = 50

for year in years:

    for gender in genders:

        for stroke in strokes:
            time.sleep(0.5)

            for distance in distances:

                url = f'{endpoint}gender={gender}&distance={distance}&stroke={stroke}&poolConfiguration={pool}&year={year}&startDate=&endDate=&timesMode=ALL_TIMES&regionId=&countryId=&pageSize={page_size}'
                response = requests.get(url)
                data = response.json()
                print(f"Fetching {year} {gender} {stroke} {distance}m")
                temp_df = pd.DataFrame(data['swimmingWorldRankings'])
                dfs.append(temp_df)

df = pd.concat(dfs, ignore_index=True)
df.shape

Fetching 1972 F FREESTYLE 400m
Fetching 1972 F FREESTYLE 800m
Fetching 1972 F FREESTYLE 1500m
Fetching 1972 M FREESTYLE 400m
Fetching 1972 M FREESTYLE 800m
Fetching 1972 M FREESTYLE 1500m
Fetching 1973 F FREESTYLE 400m
Fetching 1973 F FREESTYLE 800m
Fetching 1973 F FREESTYLE 1500m
Fetching 1973 M FREESTYLE 400m
Fetching 1973 M FREESTYLE 800m
Fetching 1973 M FREESTYLE 1500m
Fetching 1974 F FREESTYLE 400m
Fetching 1974 F FREESTYLE 800m
Fetching 1974 F FREESTYLE 1500m
Fetching 1974 M FREESTYLE 400m
Fetching 1974 M FREESTYLE 800m
Fetching 1974 M FREESTYLE 1500m
Fetching 1975 F FREESTYLE 400m
Fetching 1975 F FREESTYLE 800m
Fetching 1975 F FREESTYLE 1500m
Fetching 1975 M FREESTYLE 400m
Fetching 1975 M FREESTYLE 800m
Fetching 1975 M FREESTYLE 1500m
Fetching 1976 F FREESTYLE 400m
Fetching 1976 F FREESTYLE 800m
Fetching 1976 F FREESTYLE 1500m
Fetching 1976 M FREESTYLE 400m
Fetching 1976 M FREESTYLE 800m
Fetching 1976 M FREESTYLE 1500m
Fetching 1977 F FREESTYLE 400m
Fetching 1977 F FREESTYLE 800

Fetching 2015 M FREESTYLE 800m
Fetching 2015 M FREESTYLE 1500m
Fetching 2016 F FREESTYLE 400m
Fetching 2016 F FREESTYLE 800m
Fetching 2016 F FREESTYLE 1500m
Fetching 2016 M FREESTYLE 400m
Fetching 2016 M FREESTYLE 800m
Fetching 2016 M FREESTYLE 1500m
Fetching 2017 F FREESTYLE 400m
Fetching 2017 F FREESTYLE 800m
Fetching 2017 F FREESTYLE 1500m
Fetching 2017 M FREESTYLE 400m
Fetching 2017 M FREESTYLE 800m
Fetching 2017 M FREESTYLE 1500m
Fetching 2018 F FREESTYLE 400m
Fetching 2018 F FREESTYLE 800m
Fetching 2018 F FREESTYLE 1500m
Fetching 2018 M FREESTYLE 400m
Fetching 2018 M FREESTYLE 800m
Fetching 2018 M FREESTYLE 1500m
Fetching 2019 F FREESTYLE 400m
Fetching 2019 F FREESTYLE 800m
Fetching 2019 F FREESTYLE 1500m
Fetching 2019 M FREESTYLE 400m
Fetching 2019 M FREESTYLE 800m
Fetching 2019 M FREESTYLE 1500m
Fetching 2020 F FREESTYLE 400m
Fetching 2020 F FREESTYLE 800m
Fetching 2020 F FREESTYLE 1500m
Fetching 2020 M FREESTYLE 400m
Fetching 2020 M FREESTYLE 800m
Fetching 2020 M FREESTYLE 150

(58499, 36)
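The request URL in the cells above is assembled by hand with an f-string. As a sketch of an alternative, the same query string could be built from a dictionary with the standard library's `urllib.parse.urlencode`. `build_rankings_url` is a hypothetical helper, not part of the original notebook; the parameter names and values are the ones already used in the URL above.

```python
from urllib.parse import urlencode

ENDPOINT = 'https://api.worldaquatics.com/fina/rankings/swimming?'


def build_rankings_url(gender, distance, stroke, year, pool='LCM', page_size=50):
    """Build the rankings URL for one search (hypothetical helper)."""
    params = {
        'gender': gender,
        'distance': distance,
        'stroke': stroke,
        'poolConfiguration': pool,
        'year': year,
        'startDate': '',   # left empty, as in the original URL
        'endDate': '',
        'timesMode': 'ALL_TIMES',
        'regionId': '',
        'countryId': '',
        'pageSize': page_size,
    }
    return ENDPOINT + urlencode(params)


url = build_rankings_url('F', '400', 'FREESTYLE', 1972)
print(url)
```

Keeping the parameters in a dictionary makes it easier to see which filters are set and which are deliberately left blank.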

## Get the data 3: Medley-specific races

In [12]:
# Medley-specific races

endpoint = 'https://api.worldaquatics.com/fina/rankings/swimming?'
genders = ['F','M']
distances = ['200','400']
strokes = ['MEDLEY']
pool = 'LCM'
years = [year for year in range(1972, 2023)]
page_size = 50

for year in years:

    for gender in genders:

        for stroke in strokes:
            time.sleep(0.5)

            for distance in distances:

                url = f'{endpoint}gender={gender}&distance={distance}&stroke={stroke}&poolConfiguration={pool}&year={year}&startDate=&endDate=&timesMode=ALL_TIMES&regionId=&countryId=&pageSize={page_size}'
                response = requests.get(url)
                data = response.json()
                print(f"Fetching {year} {gender} {stroke} {distance}m")
                temp_df = pd.DataFrame(data['swimmingWorldRankings'])
                dfs.append(temp_df)

df = pd.concat(dfs, ignore_index=True)
df.shape

Fetching 1972 F MEDLEY 200m
Fetching 1972 F MEDLEY 400m
Fetching 1972 M MEDLEY 200m
Fetching 1972 M MEDLEY 400m
Fetching 1973 F MEDLEY 200m
Fetching 1973 F MEDLEY 400m
Fetching 1973 M MEDLEY 200m
Fetching 1973 M MEDLEY 400m
Fetching 1974 F MEDLEY 200m
Fetching 1974 F MEDLEY 400m
Fetching 1974 M MEDLEY 200m
Fetching 1974 M MEDLEY 400m
Fetching 1975 F MEDLEY 200m
Fetching 1975 F MEDLEY 400m
Fetching 1975 M MEDLEY 200m
Fetching 1975 M MEDLEY 400m
Fetching 1976 F MEDLEY 200m
Fetching 1976 F MEDLEY 400m
Fetching 1976 M MEDLEY 200m
Fetching 1976 M MEDLEY 400m
Fetching 1977 F MEDLEY 200m
Fetching 1977 F MEDLEY 400m
Fetching 1977 M MEDLEY 200m
Fetching 1977 M MEDLEY 400m
Fetching 1978 F MEDLEY 200m
Fetching 1978 F MEDLEY 400m
Fetching 1978 M MEDLEY 200m
Fetching 1978 M MEDLEY 400m
Fetching 1979 F MEDLEY 200m
Fetching 1979 F MEDLEY 400m
Fetching 1979 M MEDLEY 200m
Fetching 1979 M MEDLEY 400m
Fetching 1980 F MEDLEY 200m
Fetching 1980 F MEDLEY 400m
Fetching 1980 M MEDLEY 200m
Fetching 1980 M MEDL

(67013, 36)
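The shapes printed after each step show how many rows each batch of searches added, which can be compared with the maximum possible for that batch. A quick check using only the numbers printed above (variable names are illustrative):

```python
n_years = len(range(1972, 2023))  # 1972 to 2022 inclusive
page_size = 50
n_genders = 2

# Cumulative row counts printed by df.shape after each step.
after_common, after_freestyle, after_medley = 47637, 58499, 67013

freestyle_rows = after_freestyle - after_common  # added by 400/800/1500 m freestyle
medley_rows = after_medley - after_freestyle     # added by 200/400 m medley

freestyle_max = n_years * n_genders * 3 * page_size  # 3 distances
medley_max = n_years * n_genders * 2 * page_size     # 2 distances

print(freestyle_rows, freestyle_max)
print(medley_rows, medley_max)
```

Both batches fall short of their maximum, consistent with the gaps seen in the common races.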

In [13]:
df['disciplineName'].value_counts()

disciplineName
Men 100 Freestyle         2304
Women 100 Freestyle       2285
Men 200 Freestyle         2283
Men 100 Backstroke        2264
Men 100 Butterfly         2259
Women 100 Backstroke      2252
Women 200 Freestyle       2250
Men 400 Freestyle         2229
Men 100 Breaststroke      2223
Women 100 Butterfly       2218
Women 100 Breaststroke    2213
Women 400 Freestyle       2207
Men 200 Backstroke        2193
Men 200 Butterfly         2192
Women 200 Breaststroke    2192
Men 200 Breaststroke      2169
Men 400 Medley            2156
Women 200 Backstroke      2145
Women 200 Butterfly       2141
Women 200 Medley          2138
Men 200 Medley            2122
Women 400 Medley          2098
Men 1500 Freestyle        2046
Women 800 Freestyle       1997
Men 50 Freestyle          1896
Women 50 Freestyle        1865
Men 50 Backstroke         1462
Women 50 Backstroke       1438
Men 50 Butterfly          1353
Women 50 Butterfly        1352
Men 50 Breaststroke       1344
Women 50 Breaststroke   

In [None]:
# Save results in a csv file and move to a new notebook for the cleaning
df.to_csv('1_output.csv')