# Introduction

The World DanceSport Federation (WDSF) website contains information about world dance competitions dating back to 1996. The goal of this project is to use that data to examine why certain competitors perform better than others. The project proceeds as follows:

    1. Extract competition data from the WDSF API.
    2. Convert data into clean data frames in pandas.
    3. Look at the basic properties of the dataset.
    4. Investigate which factors, if any, are related to competition performance.

# Part 1: Extracting Data from the API

For each competition, I want to extract the competition ID and, for every participant, the participant ID, name, country, overall rank, and scores for each round.

In [1]:
# import all necessary packages
import requests
import time
import csv
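
The requests in the cells below pass `username` and `password` to the API, but neither is defined in the cells shown. A minimal sketch of one way to supply them without hard-coding credentials in the notebook follows. The environment variable names `WDSF_API_USERNAME` and `WDSF_API_PASSWORD` and the helper `load_credentials` are hypothetical choices for this example, not something the WDSF API prescribes.

```python
import os


def load_credentials(env=os.environ):
    """Return (username, password) read from environment variables.

    WDSF_API_USERNAME and WDSF_API_PASSWORD are hypothetical variable
    names chosen for this sketch; they are not defined by the WDSF API.
    """
    try:
        return env["WDSF_API_USERNAME"], env["WDSF_API_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


# Usage (uncomment once the two variables are set in your shell):
# username, password = load_credentials()
```

Passing the mapping as an argument keeps the helper easy to try out with a plain dictionary.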


The endpoint https://services.worlddancesport.org/api/1/competition returns the following information for each competition: a list of links, an ID, a name, and the last-modified date. I will use this URL to get the competition IDs and store them in a list. The competition IDs can then be used to access the URLs that contain the participant IDs.

In [6]:
def get_competitions():
    """
    get information from the competition page of the API
    """
    
    comp_url = "https://services.worlddancesport.org/api/1/competition"
    params = {'format':'json'}
    
    ## request to competition information page
    ## (username and password are the API credentials, assumed to be defined beforehand)
    r = requests.get(comp_url, params=params, auth=(username, password))
    
    return r

In [10]:
response = get_competitions()

In [13]:
js = response.json() 
print(repr(js)[:1000])

[{'link': [{'href': 'https://services.worlddancesport.org/api/1/competition/35093', 'rel': 'self'}, {'href': 'https://services.worlddancesport.org/api/1/participant?competitionId=35093', 'rel': 'http://services.worlddancesport.org/rel/competition/participants', 'type': 'application/vnd.worlddancesport.participants+json'}, {'href': 'https://services.worlddancesport.org/api/1/official?competitionId=35093', 'rel': 'http://services.worlddancesport.org/rel/competition/officials', 'type': 'application/vnd.worlddancesport.officials+json'}], 'id': 35093, 'name': 'INTERNATIONAL OPEN STANDARD  ADULT - Finland - Finland - 1996/01/04', 'lastmodifiedDate': '2010-11-13T10:16:26'}, {'link': [{'href': 'https://services.worlddancesport.org/api/1/competition/35094', 'rel': 'self'}, {'href': 'https://services.worlddancesport.org/api/1/participant?competitionId=35094', 'rel': 'http://services.worlddancesport.org/rel/competition/participants', 'type': 'application/vnd.worlddancesport.participants+json'}, {

The get_competitions() function returned a list of dictionaries. Each item in the list corresponds to a single competition.
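
Each competition dictionary also carries a `link` list whose entries point to related resources, including the participants page. As a small illustration of that structure, the sketch below looks up a link by the end of its `rel` value. The sample record is copied from the output above; the helper name `find_link` is an assumption made for this example.

```python
# A single competition record, copied from the output above.
sample_competition = {
    'link': [
        {'href': 'https://services.worlddancesport.org/api/1/competition/35093',
         'rel': 'self'},
        {'href': 'https://services.worlddancesport.org/api/1/participant?competitionId=35093',
         'rel': 'http://services.worlddancesport.org/rel/competition/participants',
         'type': 'application/vnd.worlddancesport.participants+json'},
        {'href': 'https://services.worlddancesport.org/api/1/official?competitionId=35093',
         'rel': 'http://services.worlddancesport.org/rel/competition/officials',
         'type': 'application/vnd.worlddancesport.officials+json'},
    ],
    'id': 35093,
    'name': 'INTERNATIONAL OPEN STANDARD  ADULT - Finland - Finland - 1996/01/04',
    'lastmodifiedDate': '2010-11-13T10:16:26',
}


def find_link(competition, rel_suffix):
    """Return the href of the first link whose 'rel' ends with rel_suffix, or None."""
    for link in competition['link']:
        if link['rel'].endswith(rel_suffix):
            return link['href']
    return None


participants_url = find_link(sample_competition, '/competition/participants')
print(participants_url)
```

The rest of this notebook builds the participant URL from the competition ID instead, which gives the same address for this record.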

In [23]:
# js is a list
# js[x] is a dict

js[2].keys()

dict_keys(['link', 'id', 'name', 'lastmodifiedDate'])

In [33]:
len(js)

21123

There are 21,123 competitions.

In [36]:
# make a list of competition IDs

compID_list = [comp['id'] for comp in js]

In [45]:
# write the competition IDs into a csv file (a single row of IDs)

with open('competitionID.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',')
    writer.writerow(compID_list)
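
Because the IDs are written as a single CSV row, they can be reloaded in a later session without calling the API again. The sketch below shows one way to do that; the helper name `read_id_row` is an assumption for this example. Note that `csv.reader` returns strings, so the values are converted back to integers.

```python
import csv


def read_id_row(path):
    """Read a CSV file holding a single row of integer IDs and return them as a list."""
    with open(path, newline='') as csvfile:
        row = next(csv.reader(csvfile, delimiter=','))
    # csv.reader yields strings, so convert each value back to int
    return [int(value) for value in row]


# Usage, assuming competitionID.csv was written by the cell above:
# compID_list = read_id_row('competitionID.csv')
```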

The list of competition IDs can now be used to extract the IDs of the participants in each competition.

In [84]:
def get_participantIDs(compID_list):
    """
    Get participant IDs:
    Use the competition ID list (compID_list) to build the URL of the participants page for each competition.
    Then extract the participant IDs.
    """
    
    url = "https://services.worlddancesport.org/api/1/participant?competitionId="
    params = {'format':'json'}
    partID_list = []
    
    # loop through compID_list to get participant IDs for each competition
    for compID in compID_list:
        res = requests.get(url + str(compID), params=params, auth=(username, password))
        res = res.json()
        for participant in res:
            # list.append returns None, so partID_list must not be reassigned here
            partID_list.append(participant['id'])
        # pause briefly between requests to avoid overloading the API
        time.sleep(.1)
        
    return partID_list
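
With more than 21,000 requests in this loop, a single dropped connection or error response would abort the whole run. The sketch below is one possible way to make each request more forgiving; it is not part of the original notebook. The helper `fetch_json` is hypothetical, and it takes the request function as an argument (for example `requests.get`) so the retry logic can be tried without network access.

```python
import time


def fetch_json(get, url, retries=3, delay=1.0, **kwargs):
    """Call get(url, **kwargs) and return the parsed JSON body.

    `get` is any callable with the same interface as requests.get.
    The call is attempted up to `retries` times in total, sleeping
    `delay` seconds between attempts.
    """
    kwargs.setdefault('timeout', 30)  # avoid hanging forever on a stalled connection
    for attempt in range(1, retries + 1):
        try:
            response = get(url, **kwargs)
            response.raise_for_status()  # turn HTTP 4xx/5xx into an exception
            return response.json()
        except Exception:
            if attempt == retries:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay)


# Usage inside the loop above (requests must be imported):
# res = fetch_json(requests.get, url + str(compID), params=params,
#                  auth=(username, password))
```

Catching every exception is deliberately broad for a sketch; a stricter version would retry only on connection and server errors.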

In [54]:
partID_list = get_participantIDs(compID_list)
len(partID_list)

672128

In [59]:
# write participant IDs into a CSV file (a single row of IDs)

with open('participantID.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',')
    writer.writerow(partID_list)

There are 672,128 participant IDs. Each participant ID has a URL that contains information about the participant, such as the name, country, and competition scores. 

In [None]:
# Initialize a list with the same length as the participant ID list.
# Each slot will hold the information retrieved for one participant.

listlength = len(partID_list)
comp_list = [None]*(listlength)

In [83]:
def get_scores(partID_list, listlength, comp_list):
    """
    Extract each participant's competition information from the API, including scores.
    
    This function takes in the participant ID list (partID_list), an integer that is the length of
    partID_list (listlength), and a preallocated list of that length (comp_list).
    
    This function returns comp_list with one entry per participant ID: the parsed JSON record
    for that participant.
    """  
    
    url = 'https://services.worlddancesport.org/api/1/participant/'
    params = {'format':'json'}
    
    # loop through each participant ID in order to get the URL and extract competition information
    # then add the competition information to a list
    for i in range(listlength):
        partID = partID_list[i]
        partID = str(partID)
        resp = requests.get(url+partID, params=params, auth=(username, password))
        comp_list[i] = resp.json()
        time.sleep(.5)
        
    return comp_list
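
At 0.5 seconds of sleep per request, 672,128 requests spend about 336,064 seconds (roughly 93 hours) waiting, before any network time is counted. Since `get_scores` keeps everything in memory, an interruption loses all progress. The sketch below shows one way to save each record as it arrives and resume later, using a JSON Lines file. It is an optional addition, and the helper names and the file layout are assumptions made for this example.

```python
import json
import os


def load_checkpoint(path):
    """Return {participant_id: record} for every line already saved in a JSON Lines file."""
    done = {}
    if os.path.exists(path):
        with open(path, encoding='utf-8') as f:
            for line in f:
                entry = json.loads(line)
                done[entry['partID']] = entry['record']
    return done


def save_record(path, partID, record):
    """Append one participant record to the checkpoint file as a single JSON line."""
    with open(path, 'a', encoding='utf-8') as f:
        f.write(json.dumps({'partID': partID, 'record': record}) + '\n')


def remaining_ids(partID_list, done):
    """Return the IDs from partID_list that have not been fetched yet, in order."""
    return [partID for partID in partID_list if partID not in done]


# Usage sketch: fetch only what is missing, saving after every request.
# done = load_checkpoint('participants.jsonl')
# for partID in remaining_ids(partID_list, done):
#     record = requests.get(url + str(partID), params=params,
#                           auth=(username, password)).json()
#     save_record('participants.jsonl', partID, record)
```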

In [None]:
comp_list = get_scores(partID_list, listlength, comp_list)

In [166]:
# write competition data into a csv file

with open("competitionstats5.csv", "w", encoding="utf-8", newline="") as file:
    dict_writer = csv.DictWriter(
        file,
        fieldnames=["coupleId", "name", "country", "id", "status", "basepoints", "rank", "competitionId", "rounds"],
        extrasaction='ignore',  # skip any keys not listed in fieldnames
    )
    dict_writer.writeheader()
    dict_writer.writerows(comp_list)
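
One thing to watch when reloading this file: the `csv` module writes non-string values using `str()`, so if `rounds` holds a nested list or dictionary it is stored as its Python representation, which `json.loads` cannot parse. The sketch below assumes `rounds` is nested (the structure is not shown in this notebook) and stores it as a JSON string instead. The example record is made up for illustration, and `flatten_record` is a hypothetical helper.

```python
import csv
import io
import json

FIELDS = ["coupleId", "name", "country", "id", "status",
          "basepoints", "rank", "competitionId", "rounds"]


def flatten_record(record):
    """Return a copy of record in which the 'rounds' value is stored as a JSON string."""
    flat = dict(record)
    if 'rounds' in flat:
        flat['rounds'] = json.dumps(flat['rounds'])
    return flat


# Made-up record for illustration only; the real structure of 'rounds' may differ.
example = {"id": 1, "name": "Example Couple", "rank": "1",
           "rounds": [{"name": "F", "score": 10}]}

# An in-memory buffer stands in for the CSV file in this sketch.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction='ignore')
writer.writeheader()
writer.writerow(flatten_record(example))

# Reading the data back: json.loads restores the nested structure.
buffer.seek(0)
row = next(csv.DictReader(buffer))
rounds = json.loads(row['rounds'])
```

In the cell above, this would amount to passing `flatten_record(c)` for each entry `c` of `comp_list` to `writerows`.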