# Parsing Data from [datasport.com](https://www.datasport.com/en/)

We use Postman to understand the parameters of the URL request required by the exercise.

(Note that equivalent tools exist for other browsers, for instance for Firefox:
http://stackoverflow.com/questions/28997326/postman-addons-like-in-firefox)

In [88]:
# important modules for this HW
import bs4 # doc: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
import requests as rq 
import re
import time
# previous useful modules
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_context('notebook')

In [89]:
form_source = rq.get("https://www.datasport.com/en/")
form_soup = bs4.BeautifulSoup(form_source.text, "html.parser")
# print(form_soup.prettify())

Let's get all the `select` menus of the page, using the `find_all` method of *BeautifulSoup*, which returns all the tags of a given type.

In [90]:
selectors = form_soup.find_all('select')
print(len(selectors))

4


Most importantly, we can find out what each menu is about by printing its `name` attribute:

In [4]:
for num, s in enumerate(selectors):
    print("Select n°{} : {}".format(num, s.attrs['name'])) # wild french appears...

Select n°0 : etyp
Select n°1 : eventmonth
Select n°2 : eventyear
Select n°3 : eventlocation


In [5]:
for s in selectors:
    options = s.find_all('option')
    options_desc_values = [(o.text, o.attrs['value']) for o in options]
    print(s.attrs['name'] + ':')
    for (d,v) in options_desc_values:
        print("- {} [{}]".format(d,v)) # more french

etyp:
- All [all]
- ---- [all]
- Cross-Country-Skiing [Cross-Country-Skiing]
- Cycling [Cycling]
- Cycling,MTB [Cycling,MTB]
- Cycling,Others [Cycling,Others]
- Duathlon [Duathlon]
- Inline [Inline]
- MTB [MTB]
- MTB,Cycling [MTB,Cycling]
- MTB,Cycling,Others [MTB,Cycling,Others]
- MTB,Others [MTB,Others]
- MTB,X-Hours [MTB,X-Hours]
- Others [Others]
- Others,Inline,Running,MTB [Others,Inline,Running,MTB]
- Running [Running]
- Running,Inline [Running,Inline]
- Running,MTB [Running,MTB]
- Running,MTB,Others [Running,MTB,Others]
- Running,Skiing/Snowboard [Running,Skiing/Snowboard]
- Running,Waffenlauf [Running,Waffenlauf]
- Running,Walking [Running,Walking]
- Running,Walking,MTB [Running,Walking,MTB]
- Running,Walking,Others [Running,Walking,Others]
- Running,X-Hours [Running,X-Hours]
- Skiing/Snowboard [Skiing/Snowboard]
- Triathlon [Triathlon]
- Triathlon,Duathlon [Triathlon,Duathlon]
- Triathlon,Others [Triathlon,Others]
- Waffenlauf [Waffenlauf]
- Walking [Walking]
- X-Hours [X-Hour
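The printed (description, value) pairs can be turned into a lookup table. A minimal sketch: the pairs below are a small subset copied from the `etyp` output, the `----` entry is treated as a visual separator, and the query string assumes (hypothetically) that the search form submits its select names as GET parameters; the `eventyear` value is made up for illustration.

```python
from urllib.parse import urlencode

# Small subset of the 'etyp' (description, value) pairs printed above
etyp_pairs = [('All', 'all'), ('----', 'all'), ('Running', 'Running'), ('Triathlon', 'Triathlon')]

def options_to_dict(pairs, separator='----'):
    '''Map each option description to its value, skipping separator entries.'''
    return {desc: value for desc, value in pairs if desc != separator}

etyp = options_to_dict(etyp_pairs)
print(etyp)  # {'All': 'all', 'Running': 'Running', 'Triathlon': 'Triathlon'}

# Hypothetical: assumes the form sends its select names as a GET query string;
# the 'eventyear' value is illustrative only
query = urlencode({'etyp': etyp['Running'], 'eventyear': '2016'})
print(query)  # etyp=Running&eventyear=2016
```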

## Let's get some data

To get started, we collect the results of the Lausanne Marathon, one of the main events in Switzerland.

We inspect the HTML of the main page and __extract the relevant parameters__ to query:

### Pages links

Nine test pages are defined below.
Only `base_url` needs to change to select which page is parsed.
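All the test URLs share the pattern `https://services.datasport.com/<year>/lauf/<event>/`. A small helper (hypothetical, and assuming this pattern holds for every event) turns switching pages into a matter of changing two arguments:

```python
def results_url(year, event, discipline='lauf'):
    '''Build a datasport results URL (hypothetical helper).

    Assumes the pattern of the test pages:
    https://services.datasport.com/<year>/<discipline>/<event>/
    '''
    return 'https://services.datasport.com/{}/{}/{}/'.format(year, discipline, event)

print(results_url(2016, 'lamara'))  # https://services.datasport.com/2016/lauf/lamara/
print(results_url(2010, 'emme'))    # https://services.datasport.com/2010/lauf/emme/
```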

In [403]:
laus_mar_url = 'https://services.datasport.com/2016/lauf/lamara/'
fri_half_url = 'https://services.datasport.com/2013/lauf/semi-marathon-fribourg/'
german_mar_url='https://services.datasport.com/2014/lauf/grmarathon/'
kapoag_url='https://services.datasport.com/2013/lauf/kapoag/'
laufen_url='https://services.datasport.com/2010/lauf/laufen/'
sommer_url='https://services.datasport.com/2014/lauf/sommer-gommer/'
emme_url='https://services.datasport.com/2010/lauf/emme/'
biel_url='https://services.datasport.com/2009/lauf/bielercross/'
lugano_url='https://services.datasport.com/2010/lauf/stralugano/'
# PARSED PAGE
base_url=laus_mar_url

result_html = rq.get(base_url)

# Use BeautifulSoup to get the blocks in which the data is divided:

result_soup = bs4.BeautifulSoup(result_html.text, "lxml")
result_font = result_soup.find_all('font')

print('number of categories in the main page:', len(result_font))

number of categories in the main page: 119


In [404]:
# We look for the rankings in alphabetical order ("classements par ordre alphabétique")

# Does NOT work for the pages below: the category must be taken from the category field, not from the pace
# https://services.datasport.com/2016/lauf/ascona-locarno-marathon/
# https://services.datasport.com/2010/lauf/emme/alfaa.htm

links = []  # all the alphabetical result tables to be parsed
for font in result_font:
    if font.get('size') == '3':
        # Only the first size-3 <font> block is read: it holds the ranking links
        for a in font.find_all('a'):
            href = a.get('href', '')  # <a> tags without href are skipped
            if href.startswith('ALFA'):
                links.append(base_url + '/' + href)
        break
links

['https://services.datasport.com/2016/lauf/lamara//ALFAA.HTM',
 'https://services.datasport.com/2016/lauf/lamara//ALFAB.HTM',
 'https://services.datasport.com/2016/lauf/lamara//ALFAC.HTM',
 'https://services.datasport.com/2016/lauf/lamara//ALFAD.HTM',
 'https://services.datasport.com/2016/lauf/lamara//ALFAE.HTM',
 'https://services.datasport.com/2016/lauf/lamara//ALFAF.HTM',
 'https://services.datasport.com/2016/lauf/lamara//ALFAG.HTM',
 'https://services.datasport.com/2016/lauf/lamara//ALFAH.HTM',
 'https://services.datasport.com/2016/lauf/lamara//ALFAI.HTM',
 'https://services.datasport.com/2016/lauf/lamara//ALFAJ.HTM',
 'https://services.datasport.com/2016/lauf/lamara//ALFAK.HTM',
 'https://services.datasport.com/2016/lauf/lamara//ALFAL.HTM',
 'https://services.datasport.com/2016/lauf/lamara//ALFAM.HTM',
 'https://services.datasport.com/2016/lauf/lamara//ALFAN.HTM',
 'https://services.datasport.com/2016/lauf/lamara//ALFAO.HTM',
 'https://services.datasport.com/2016/lauf/lamara//ALFA
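The list above contains a double slash because `base_url` already ends with `/`. A sketch of the same filtering with `urllib.parse.urljoin`, which resolves relative links against the page URL; the `href` values are samples, and `other.htm` is a hypothetical non-alphabetical link. `urljoin` relies on the trailing `/` of the base URL: without it, the last path segment is replaced.

```python
from urllib.parse import urljoin

def alphabetical_links(page_url, hrefs):
    '''Keep the alphabetical ranking pages (href starting with 'ALFA') as absolute URLs.'''
    return [urljoin(page_url, href) for href in hrefs if href and href.startswith('ALFA')]

page_url = 'https://services.datasport.com/2016/lauf/lamara/'
hrefs = ['ALFAA.HTM', 'ALFAB.HTM', None, 'other.htm']  # None mimics an <a> without href
for url in alphabetical_links(page_url, hrefs):
    print(url)
```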

## Get the tables

Query datasport.com with the right parameters to finally get the __tables__.

Several checks verify that each table has the standard structure.

### Table format check
Each table must contain the following fields, depending on its language:

In [405]:
# There are more fields than these: these are only the ones that matter.
# They are used to automatically detect tables with a different structure,
# since it is impossible to check all the tables of all the races manually.
header_fields_french=[['catégorie'],['rang'],['nom'],['an'],['lieu','pays/lieu'],['équipe'],['pénalité'],['temps'],['retard']]
optional_french=['pénalité']
first_excluded_field_french='doss'
last_field_french='moyenne'
header_fields_german=[['Kategorie'],['Rang'],['Name/Ort','Name'],['Jg'],['Team/Ortschaft','Land/Ort'],['Team'],['Zeit'],['Rückstand']]
optional_german=['Team']
first_excluded_field_german='Stnr'
last_field_german='Schnitt'
header_fields_italian=[['categoria'],['posto'],['nome'],['anno'],['località'],['squadra'],['tempo'],['ritardo']]
optional_italian=['nazione']
first_excluded_field_italian='pett'
last_field_italian='media'

`parse_time()` is used both to parse the time fields and to check whether a field is a time field.

In [406]:
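def parse_time(time_str, check_only=False):
    ''' Parse a datasport time such as "1:45.28,4" (h:mm.ss,tenths).

    Returns (hours, minutes, seconds, tenths) as floats, or None if check_only=True.
    Raises ValueError if the string is not a time, so callers can also use it as a check.
    '''
    # Final times and delays always have a decimal part after a comma
    if ',' not in time_str and not check_only:
        raise ValueError('Not a time field: {!r}'.format(time_str))
    parts = re.split("[:.,]+", time_str)
    while len(parts) < 4:
        parts = [0] + parts  # missing leading components (e.g. hours) are 0
    # float() raises ValueError on non-numeric parts; unpacking does on more than 4 parts
    hours, minutes, seconds, tenths = [float(x) for x in parts]

    if not check_only:
        return (hours, minutes, seconds, tenths)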
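`parse_time()` returns the components separately. For comparisons or plots a single number is handier. A minimal sketch (hypothetical helper `to_seconds`) that converts a datasport time to seconds, assuming the digits after the comma are a decimal fraction of a second, as in `1:45.28,4`:

```python
import re

def to_seconds(time_str):
    '''Convert "1:45.28,4" (h:mm.ss,fraction) to seconds (hypothetical helper).

    Assumes the part after the comma is a decimal fraction of a second and
    that missing leading components (e.g. hours in "42.44,1") are zero.
    '''
    main, _, frac = time_str.strip().partition(',')
    parts = [int(p) for p in re.split('[:.]', main)]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing hours (and minutes)
    hours, minutes, seconds = parts  # raises ValueError on more than 3 parts
    fraction = float('0.' + frac) if frac else 0.0
    return hours * 3600 + minutes * 60 + seconds + fraction

print(to_seconds('1:45.28,4'))
print(to_seconds('42.44,1'))
```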

`process_legend()` checks whether the table has a standard format and detects its language.

In [407]:
def process_legend(legend):
    ''' Check that the legend (table header) has a compatible format and detect its language.
    @return (language, pace_available)

    The pace is necessary to get the distance of the run if it is not available in the description.
    Raises ValueError if the legend does not have the expected structure.
    '''
    legend = str(legend).split('¦')[0]       # drop the split-times part of the header
    legend = re.sub('<[^>]+>', ' ', legend)   # remove all HTML tags
    legend = legend.lstrip()

    # Check language: the first header word differs per language
    if legend.startswith(header_fields_french[0][0]):
        language = 'French'
        header_fields, first_excluded = header_fields_french, first_excluded_field_french
        optional, last_field = optional_french, last_field_french
    elif legend.startswith(header_fields_german[0][0]):
        language = 'German'
        header_fields, first_excluded = header_fields_german, first_excluded_field_german
        optional, last_field = optional_german, last_field_german
    elif legend.startswith(header_fields_italian[0][0]):
        language = 'Italian'
        header_fields, first_excluded = header_fields_italian, first_excluded_field_italian
        optional, last_field = optional_italian, last_field_italian
    else:
        print(legend)
        raise ValueError('Problem in language detection')

    # Check that all the expected words are present, in order
    for words in header_fields:
        found = False
        for word in words:
            if legend.startswith(word):
                legend = legend[len(word):].lstrip()  # consume the matched word
                found = True
                break
        if not found and words[0] not in optional:
            print(words)
            print(legend)
            raise ValueError('Header word not found')

    # The next word must be the first field that is deliberately not parsed
    legend_splitted = [x.strip() for x in legend.split(' ')]
    legend_splitted = [x for x in legend_splitted if x != '']
    if legend_splitted[0] != first_excluded:
        print(legend_splitted[0])
        print(legend)
        raise ValueError('First excluded element not as expected')

    # The pace column is available if the header ends with the average field
    pace_available = legend_splitted[-1].startswith(last_field)
    return language, pace_available
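The language detection above is a chain of `if`/`elif` tests on the first header word. A table-driven sketch of the same idea (hypothetical `detect_language`, which matches whole first words rather than prefixes) makes adding a language a one-line change:

```python
# First header word -> language (same words as header_fields_*[0][0] above)
FIRST_HEADER_WORD = {'catégorie': 'French', 'Kategorie': 'German', 'categoria': 'Italian'}

def detect_language(legend_text):
    '''Return the language of a results header, or raise ValueError if unknown.'''
    words = legend_text.split()
    if words and words[0] in FIRST_HEADER_WORD:
        return FIRST_HEADER_WORD[words[0]]
    raise ValueError('Unknown header: {!r}'.format(legend_text[:40]))

print(detect_language('catégorie rang nom an lieu équipe temps retard'))  # French
print(detect_language('Kategorie Rang Name Jg Team Zeit Rückstand'))      # German
```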

### Hypothesis

*Fields*: standard fields for each language (the number in parentheses is the position of the field in the split row; an and lieu share one field):
1. catégorie (0)
2. rang (1) (CAN BE MERGED WITH NOM)
3. nom (2) (CAN BE MERGED WITH RANG)
4. an (3)
5. lieu (3)
6. équipe (4) (MAY BE MISSING)
7. pénalité (5) (NOT ALWAYS PRESENT)
8. temps (6)
9. retard (7)

*Only* fields 1, 2, 3, 4, 5, 8 and 9 are parsed. After field 5, every remaining field is tested with parse_time():
- fields that are not times are ignored;
- if more than 2 times are found, an error is raised;
- if exactly 1 time is found, it is assumed to be the final time, not the delay;
- if no time is found, the runner is skipped and their fields are printed.

process_legend() automatically checks that these fields are present and in this order, and raises an error otherwise.
Other possible problems:
1. temps and retard must be formatted in a way parsable by parse_time().
2. The other fields must be formatted in the same way as in the Lausanne Marathon results.
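The time-selection rules above can be sketched on their own. This version, a hypothetical `pick_times`, swaps `parse_time()` for a regular expression that assumes a time always ends with a comma and a decimal digit (as in `1:28.29,2`), so paces such as `8.50` are not counted:

```python
import re

# A time: digits separated by ':' or '.', then ',' and the decimal part (e.g. "1:45.28,4")
TIME_RE = re.compile(r'^\d+(?:[:.]\d+)*,\d+$')

def pick_times(fields):
    '''Apply the rules above to the fields that follow the year/place.

    Returns (time, delay), or None when no time is found (runner skipped).
    '''
    times = [f.strip() for f in fields if TIME_RE.match(f.strip())]
    if len(times) == 0:
        return None                    # no time: runner not used
    if len(times) == 1:
        return (times[0], '----')      # single time: assumed to be the final time
    if len(times) == 2:
        return (times[0], times[1])    # final time and delay
    raise ValueError('More than 2 times found: {}'.format(times))

# Fields taken from a row printed in the Parsing section
print(pick_times(['-----', '1:28.29,2', '-----', '(18024)', 'diplôme', '---', '8.50 ']))
```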

### Parsing of category/sex/length
We are not interested in the specific category of the race: it can be deduced from the year.

We are strongly interested in:
1. Sex
2. Length of the race

This information is not easy to parse.
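The length of the race can often be recovered from the final time and the pace. A sketch, assuming the `pace` column is minutes.seconds per kilometre (consistent with the Lausanne table shown further down, where 1:45.28,4 at 4.59 gives about 21 km). Since the pace is rounded to the second, the raw estimate is snapped to the nearest standard distance (10 km, half marathon 21.0975 km, marathon 42.195 km); `pace_to_seconds` and `estimate_distance_km` are hypothetical helpers:

```python
def pace_to_seconds(pace_str):
    '''"4.59" (assumed minutes.seconds per km) -> 299 seconds per km.'''
    minutes, seconds = pace_str.strip().split('.')
    return int(minutes) * 60 + int(seconds)

def estimate_distance_km(total_seconds, pace_str, standard=(10.0, 21.0975, 42.195)):
    '''Return (raw estimate in km, nearest standard distance in km).'''
    km = total_seconds / pace_to_seconds(pace_str)
    return km, min(standard, key=lambda d: abs(d - km))

# Abaidia Jilani in the Lausanne table: 1:45.28,4 at a pace of 4.59
km, nearest = estimate_distance_km(1 * 3600 + 45 * 60 + 28.4, '4.59')
print(round(km, 2), nearest)
```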

TO BE VERIFIED

It seems that *sex* is always encoded in some way in the category. The lists below contain the words that indicate the sex in the second part of the category string.

The sex is not parsed from a trailing 'W': it can stand for a walking race, which causes confusion.

In [408]:
men_category=['Hommes','Herren','Boys','Hom','Gar']
men_category_starting_ending_word=['H','M']
women_category=['Femmes','Damen','Girls','Dam','Fam','Fille']
women_category_starting_ending_word=['D','F']
women_category_only_starting_word=['W']

In [409]:
def number_or_majuscule(letter):
    return (letter.isdigit() or letter.isupper())
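`number_or_majuscule()` distinguishes a category code such as `H40` (a letter followed by a digit) from an ordinary word such as `Hommes`, which is matched through the word lists instead. A self-contained sketch of that check (hypothetical `starts_with_code`), with a length guard so that a bare `H` returns False instead of raising `IndexError`; the sample categories are illustrative, not taken from a real table:

```python
def starts_with_code(text, letter):
    '''True if text starts with letter followed by a digit or a capital (e.g. "H40", "DM").'''
    if not text.startswith(letter) or len(text) <= len(letter):
        return False
    nxt = text[len(letter)]
    return nxt.isdigit() or nxt.isupper()

for cat in ['H40', 'D20', 'Hommes', 'Damen', 'H']:
    print(cat, starts_with_code(cat, cat[0]))
```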

### Fields parsing
The fields are parsed in `process_fields()`, which calls `process_category()` on the category field.

In [410]:
def process_category(category):
    ''' Split a category string (e.g. "10W-NW") into (first part, sex).
    first is the part before '-' (False if there is no '-');
    sex is 'M', 'F', or False if it cannot be detected from the part after '-'.
    '''
    split = category.split('-')
    if len(split) == 2:
        first, second = split
    elif len(split) == 1:
        first, second = False, split[0]
    else:
        print(category)
        raise ValueError('Category not expected')

    def starts_with_code(word):
        # e.g. 'H40': the letter must be followed by a digit or a capital letter
        return (second.startswith(word) and len(second) > len(word)
                and number_or_majuscule(second[len(word)]))

    # Sex retrieval: men
    sex = False
    if any(word in second for word in men_category):
        sex = 'M'
    if any(starts_with_code(word) or second.endswith(word)
           for word in men_category_starting_ending_word):
        sex = 'M'

    # Sex retrieval: women (a category matching both sexes is an error)
    is_woman = (any(word in second for word in women_category)
                or any(starts_with_code(word) for word in women_category_starting_ending_word)
                or any(second.startswith(word) for word in women_category_only_starting_word))
    if is_woman:
        if sex == 'M':
            print(category)
            raise ValueError('Double sex detected')
        sex = 'F'
    return first, sex

In [411]:
def process_fields(runner_splitted, pace):
    ''' @parameters
            runner_splitted: list of fields created by the loop in the Parsing section.
                It is not well formatted: some fields can be merged together (see Hypothesis).
            pace: True if the table has a pace column (from process_legend)
        @returns
            the list of fields that will be directly imported into the DataFrame,
            or [] if the runner cannot be parsed (no numeric rank, no time, bad pace)
    '''
    fields_processed = []
    # The first element is the category: split it into (category, sex)
    fields_processed += process_category(runner_splitted[0])

    # The second element is the rank, possibly merged with the name ("147. Name")
    try:
        splitted = runner_splitted[1].split('.', 1)
    except IndexError:
        print(runner_splitted)
        raise
    try:
        splitted[0] = int(splitted[0])
    except ValueError:
        return []  # no numeric rank (e.g. '---')

    first_to_check = 3
    if len(splitted) != 1 and splitted[1] != '':  # rang and nom are merged
        splitted[1] = splitted[1].lstrip()
        fields_processed += splitted
        first_to_check = 2
    else:
        fields_processed.append(splitted[0])
        fields_processed.append(runner_splitted[2])

    # Split the an-lieu element ("1971 Lausanne") into year and place
    fields_processed += runner_splitted[first_to_check].split(' ', 1)
    first_to_check += 1

    # Sanity check: the place must not be a time
    try:
        parse_time(fields_processed[-1])
    except ValueError:
        pass
    else:
        raise ValueError('Place field should not be a time: {}'.format(runner_splitted))

    # Keep all the times found after the year: exactly 2 (time, delay) are expected
    times = []
    for field in runner_splitted[first_to_check:]:
        try:
            parse_time(field)
            times.append(field)
        except ValueError:
            pass
    if len(times) == 0:
        return []
    if len(times) == 1:
        times.append('----')  # single time: assumed to be the final time, no delay
    if len(times) != 2:
        print(len(times))
        print(runner_splitted)
        raise ValueError('Number of time fields not equal to 2')
    fields_processed += times

    # Add pace if present
    if pace:
        try:
            parse_time(runner_splitted[-1], check_only=True)
            fields_processed.append(runner_splitted[-1])
        except ValueError:
            print(fields_processed)
            print(runner_splitted)
            return []  # pace column expected but not parsable
    else:
        fields_processed.append(False)

    return fields_processed
    

## Parsing

In [412]:
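def do_parse(runner):
    ''' Decide whether a row should be parsed: currently every row is. '''
    # Previous filter (unreachable in the original), kept for reference:
    # only parse categories starting with '10-', '21-' or '42-'
    # return runner[:3] in ('10-', '21-', '42-')
    return True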
    

In [413]:
final_list = []
t1 = time.time()
for link in links:
    # Get the raw HTML of one alphabetical results page
    result_html = rq.get(link)
    result_soup = bs4.BeautifulSoup(result_html.text, "lxml")

    results = result_soup.find_all('font')       # the results are stored in <font> tags
    language, pace = process_legend(results[0])  # the first <font> is the legend (header)
    print(language, pace)
    del results[0]
    for table in results:
        if table.get('size') == '2':  # size '1' blocks store the split times, not needed

            # NOT TRUE IN GENERAL: assumes one runner per line
            runner_list = str(table).split('\n')
            for runner in runner_list:
                runner = runner.split('¦')[0]            # right of '¦': partial times, if present
                runner = re.sub('<[^>]+>', ' ', runner)  # remove all HTML tags
                runner = re.sub('  +', '#@$&', runner)   # 2+ spaces separate fields: mark them with '#@$&'
                runner = runner.replace('\n', '')        # remove newlines
                runner = runner.replace(' \r', '')       # remove carriage returns
                runner = runner.replace('\r', '')
                runner = runner.lstrip()                 # the first athlete starts with a space

                # The team can be empty, so split on the field marker
                start = runner.split('#@$&')[0]
                if do_parse(start):
                    runner_fields = runner.split('#@$&')
                    if len(runner_fields) == 1:
                        break  # no field separator on this line: stop reading this table
                    # Works ONLY if the number of fields is the same for all languages
                    fields = process_fields(runner_fields, pace=pace)
                    if len(fields) != 0:
                        final_list.append(fields)
                    else:
                        print(runner_fields)  # runner skipped: show the raw fields
print('Time: ', time.time() - t1)
    

French True
['10W-NW', '---', 'Abansir Florence', '1971 Lausanne', '-----', '1:28.29,2', '-----', '(18024)', 'diplôme', 'foto', 'video', '---', '8.50 ']
['10W-NW', '---', 'Aburto Ana', '1961 Penthalaz', '-----', '1:40.46,2', '-----', '(18027)', 'diplôme', 'foto', 'video', '---', '10.04 ']
['Pink-Ch', '---', 'Achanta Céline', '1985 St-Sulpice VD', '-----', '-----', '-----', '(30038)', 'diplôme', 'foto', 'video', '---', '---- ']
['10W-NW', '---', 'Acher Dominique', '1965 F-Fresne le Plan', 'PSN PREAUX', '-----', '1:05.18,6', '-----', '(18028)', 'diplôme', 'foto', 'video', '---', '6.31 ']
['10W-NW', '---', 'Acher Sylvie', '1970 F-Fresne le Plan', 'PSN PREAUX', '-----', '1:12.55,7', '-----', '(18029)', 'diplôme', 'foto', 'video', '---', '7.17 ']
['10W-Walk', '---', 'Achtari Roxane', '1998 Le Mont-sur-Lausanne', '-----', '1:28.34,9', '-----', '(17074)', 'diplôme', 'foto', 'video', '---', '8.51 ']
['10W-Walk', '---', 'Adé Roniger Sylvia', '1964 Lausanne', '-----', '1:19.43,4', '-----', '(170
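The loop above marks runs of two or more spaces with the sentinel `#@$&` and then splits on it. The same split can be done in one step with `re.split`; a sketch (hypothetical `split_row`) on a synthetic row whose runner and values are made up, with a layout that mimics the printed fields:

```python
import re

def split_row(raw):
    '''Split one fixed-width results line into fields.

    Drops the split times (right of '¦'), turns HTML tags into spaces,
    then splits on runs of two or more spaces; single spaces stay inside fields.
    '''
    text = raw.split('¦')[0]
    text = re.sub('<[^>]+>', ' ', text)
    text = text.replace('\r', '').strip()
    return re.split(r' {2,}', text)

row = '<a href="x">21-H45</a>   147. Doe John    1966 Lausanne      1:45.28,4   25.56,8 ¦ 52.10,3'
print(split_row(row))
```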

In [414]:
df = pd.DataFrame(final_list)
df = df.rename(columns={0:'cat',1:'sex',2:'rang',3:'nom',4:'an',5:'lieu',6:'temps',7:'retard',8:'pace'})

In [415]:
df

| | cat | sex | rang | nom | an | lieu | temps | retard | pace |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 21 | M | 147 | Abaidia Jilani | 1966 | St-Légier-La Chiésaz | 1:45.28,4 | 25.56,8 | 4.59 |
| 1 | 21 | F | 81 | Abaidia Sandrine | 1972 | St-Légier | 1:49.40,8 | 24.09,5 | 5.11 |
| 2 | M | F | 33 | Abaidia Selma | 2006 | St-Légier-La Chiésaz | 7.12,2 | 1.36,3 | 4.48 |
| 3 | 21 | M | 103 | Abb Jochen | 1948 | Ernen | 2:50.40,7 | 1:21.28,7 | 8.05 |
| 4 | 10 | M | 426 | Abbas Dhia | 1961 | Lausanne | 1:13.04,1 | 38.13,0 | 7.18 |
| 5 | 21 | M | 640 | Abbet Florian | 1982 | Pully | 1:56.01,7 | 47.33,8 | 5.29 |
| 6 | 10 | F | 517 | Abdala Maria Lucia | 1979 | Lausanne | 1:01.30,8 | 27.09,1 | 6.09 |
| 7 | 10 | M | 152 | Abdela Esa | 1992 | Pully | 42.44,1 | 14.26,0 | 4.16 |
| 8 | 21 | M | 67 | Abdelaziem Ahmed Ramy Bac | 1992 | Lausanne | 1:29.06,1 | 20.32,9 | 4.13 |
| 9 | M | M | 2 | Abdelmoumène Eden | 2003 | F-Evian les Bains | 10.47,6 | 0.46,9 | 2.34 |


In [355]:
df.cat.value_counts()

10    1522
30    1168
K      204
Name: cat, dtype: int64

In [356]:
df.to_pickle('lausanne')

# OLD CODE

The cells below come from a previous exercise on IS-Academia and are kept for reference only; they are not used by the parsing above.

In [None]:
df = pd.read_html(result_table.decode())[0]
df.head()

In [None]:
df.columns = df.loc[1]                # use row 2 as column names
df = df.drop([0, 1])                  # drop useless first rows
df = df.drop([np.nan], axis=1)        # drop useless nan column
df.index = df['No Sciper']            # use sciper column as index

# Drop some columns
df = df.drop(['Orientation Bachelor', 'Orientation Master', 'Filière opt.', 'Type Echange', 'Ecole Echange'], axis=1)

# Do some renaming
df.index.name = 'sciper'
df.columns = ['gender', 'full_name', 'specialization', 'minor', 'status', 'sciper']

# Map gender to more standard names
dict_gender = {'Monsieur': 'male','Madame': 'female'}
df.gender.replace(dict_gender, inplace=True)
df.head()

## Some tools

We can define a helper function which, given a base URL and a dictionary of parameters, will fetch the data and fill a DataFrame with it.

In [None]:
def get_data(base_url, params_dict):
    """Get data from IS-Academia in a pandas DataFrame"""
    
    # Same sequence of operations as above, with a check for an empty result_table
    
    result_html = rq.get(base_url,params=params_dict)
    result_soup = bs4.BeautifulSoup(result_html.text, "lxml")
    result_table = result_soup.find_all('table')[0]
    
    if (result_table.text == ''):
        # Return empty dataframe
        df = pd.DataFrame()
    else:
        # Build a DataFrame containing the data, with SCIPER as index
        df = pd.read_html(result_table.decode())[0]
        try:
            df.columns = df.loc[1]                # use 2nd row as column names
            df = df.drop([0, 1])                  # drop useless first rows
            df = df.drop([np.nan], axis=1)        # drop useless nan column
            df.index = df['No Sciper']            # use sciper column as index
        
            # Drop some columns
            df = df.drop(['Orientation Bachelor', 'Orientation Master', 'Filière opt.', 'Type Echange', 'Ecole Echange'], axis=1)
            # Do some renaming
            df.index.name = 'sciper'
            df.columns = ['gender', 'full_name', 'specialization', 'minor', 'status', 'sciper']
            # Map gender to more standard names
            dict_gender = {'Monsieur': 'male','Madame': 'female'}
            df.gender.replace(dict_gender, inplace=True)
        except:
            df = pd.DataFrame()
    
    return df

The following lines test this function with hardcoded values:

In [None]:
base_url = "http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.html?"
params_dict = {
    'ww_x_GPS': 2021043255,
    'ww_i_reportModel': 133685247,
    'ww_i_reportModelXsl': 133685270,
    'ww_x_UNITE_ACAD': 249847,
    'ww_x_PERIODE_ACAD': 355925344,
    'ww_x_PERIODE_PEDAGO': 249108,
    'ww_x_HIVERETE':2936286
}

get_data(base_url, params_dict).head()

Finally, let's get all the possible values in a cleaner way and store them in variables that we will use throughout this notebook.

In [None]:
acad_period = {}
level = {}
semester = {}
acad_unit = {}

for s in selectors:
    options = s.find_all('option')
    options_desc_values = [(o.text, o.attrs['value']) for o in options]
    s_name = s.attrs['name']
    choices = {d: int(v) for (d,v) in options_desc_values if d!=''}
    
    if s_name == 'ww_x_PERIODE_ACAD':
        acad_period = choices
    elif s_name == 'ww_x_PERIODE_PEDAGO':
        level = choices
    elif s_name == 'ww_x_HIVERETE':
        for (d,v) in options_desc_values:
            if 'automne' in d:
                semester['automne'] = int(v)
            elif 'printemps' in d:
                semester['printemps'] =int(v)
    elif s_name == 'ww_x_UNITE_ACAD':
        acad_unit = choices

# Example of result
acad_period

### Store data locally

In [None]:
# Get bachelor data for every year and store it if it's not empty
import os
local_dir = '.local-data'
try:
    os.mkdir(local_dir)
except FileExistsError:
    # directory exists
    print("Using existing '" + local_dir + "' directory")

In [None]:
# Fixed values
params_dict = {
    'ww_x_GPS': -1,
    'ww_i_reportModel': 133685247,
    'ww_i_reportModelXsl': 133685270,
    'ww_x_UNITE_ACAD': acad_unit['Informatique']
}

# Iterate over all the varying params and keep only data for bachelors
for year_key, year_value in acad_period.items():
    for level_key, level_value in level.items():
        for semester_key, semester_value in semester.items():
            if 'bachelor' in level_key.lower():
                params_dict['ww_x_PERIODE_ACAD'] = year_value
                params_dict['ww_x_PERIODE_PEDAGO'] = level_value
                params_dict['ww_x_HIVERETE'] = semester_value
                
                df = get_data(base_url, params_dict)
                if not df.empty:
                    # Persist dataframe locally with pickle
                    filename = year_key + '-' + level_key.replace(' ', '-').lower() + '-' + semester_key
                    df.to_pickle(local_dir + '/' + filename)
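The file-naming rule used in the loop above can be isolated in a small function (hypothetical `pickle_name`), which keeps the names predictable when reading the pickles back. The level label `'Bachelor semestre 6'` is an assumed example chosen to reproduce the file name loaded at the end of this notebook:

```python
import os

def pickle_name(year_key, level_key, semester_key):
    '''Same rule as the loop above: "<year>-<level, lowercased, spaces as '-'>-<semester>".'''
    return year_key + '-' + level_key.replace(' ', '-').lower() + '-' + semester_key

name = pickle_name('2007-2008', 'Bachelor semestre 6', 'printemps')
print(name)                               # 2007-2008-bachelor-semestre-6-printemps
print(os.path.join('.local-data', name))  # portable alternative to local_dir + '/' + filename
```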

In [None]:
# The previous cell should download 60 files, as you can check with this command:
print(len(os.listdir(local_dir)))

Here is an example of a DataFrame loaded from the previously downloaded files:

In [None]:
df_example = pd.read_pickle(local_dir + '/2007-2008-bachelor-semestre-6-printemps')
df_example.head()