## Datasport Public Page Analytics

This notebook, queries the datasport public pages and retrieves the ranking and athletes to do some basic analytics, like number of ahtletes, and the average birthyear etc.

To run the notebook on macOS, from a terminal, install the `notebook` component und run `jupyter`:

    pip install notebook
    jupyter notebook
    
Once opened, start executing the blocks from top down, to get some data

### Requirements

The following block installs the requirements and imports the necessary libraries

In [None]:
# Install requests
!pip2 install requests
!pip2 install PTable

In [1]:
# Import requirements
import requests
import string
import os
import re
from prettytable import PrettyTable

### Datasport public page query

We query the public ranking pages of datasport base on a year, an run name and a letter.
Thus we use a url like the following

```python
run = 'escalade'
year = 2016
letter = a
url = 'https://services.datasport.com/{}/lauf/{}/alfa{}.htm'.format(year, run, letter)

# > https://services.datasport.com/2016/lauf/escalade/alfaa.htm
```

The following block are the required methods to run 

In [6]:
def parse_athletes_regex(data, year=2016):
    """Parse an athlete line with regex"""
    athletes = []
    if year >= 2000:
        ath = {}
        regex = ur"(^|>)(?P<category>[-/\w]+)\s+(?P<rank>[\d-]+)\.?\s+(?P<name>.+?)\s+(?P<year>\d{4}|\d{2}|\?{4})\s+(?P<city>.+?)\s\s.+$"
        matches = re.finditer(regex, data, re.MULTILINE)
        matchNum = 0

        for matchNum, match in enumerate(matches):
            matchNum = matchNum + 1

            ath['category'] = match.group('category')
            ath['rank'] = match.group('rank')
            ath['name'] = match.group('name')
            ath['city'] = match.group('city')
            try:
                ath['year'] = int(match.group('year'))
            except:
                ath['year'] = 0

            ath['country'] = 'CH'
            if ath['city'][1:2] == '-':
                ath['country'] = ath['city'][0:1]

            athletes.append(ath)

        # print 'Matched {} athletes in data.'.format(matchNum)
            
    return athletes

def analyse_datasport_url(run, year, filn, url, store_local=False):
    """Analyse a datasport URL"""

    filename = '{}/{}/{}'.format(run, year, filn)
    
    if store_local and os.path.exists(filename):
        with open(filename, 'r') as fd:
            # print 'Reading from local file...',
            data = fd.read()
    else:
        # print 'Querying Datasport website...',
        r  = requests.get(url)
        data = r.text

        if store_local:
            if not os.path.exists(os.path.dirname(filename)):
                try:
                    os.makedirs(os.path.dirname(filename))
                except OSError as exc: # Guard against race condition
                    if exc.errno != errno.EEXIST:
                        raise
            with open(filename, 'wb') as fd:
                for chunk in r.iter_content(chunk_size=128):
                    fd.write(chunk)
    
    return parse_athletes_regex(data, year)

def category_to_group(category):
    if category in ['PousF-A9', 'PousF-B6', 'PousF-B7', 'Cad-A-F']:
        return 'Ecolières'
    if category in ['PousM-B6', 'Cad-A-M']:
        return 'Ecoliers'
    
    if category in ['Mix2-H', 'Mix3-H']:
        return 'Hommes'
    
    if category in ['Mix2-F', 'Mix3-F']:
        return 'Femmes'
    
    if category in ['Walk-Adu']:
        return 'Walk'

def get_datasport(run, from_year, to_year):
    """Query datasport years"""
    dd = {}
    for year in range(from_year,to_year):
        print 'Querying letters a-z for {} and year {}'.format(run, year)
        dd[year] = []
        parsed_data = []
        for letter in string.ascii_lowercase:
            # print 'Querying letter {} for {}.'.format(letter, year),
            filn = 'alfa{}.htm'.format(letter)
            url = 'https://services.datasport.com/{}/lauf/{}/alfa{}.htm'.format(year, run, letter)
            # Query the data
            try:
                new_data = analyse_datasport_url(run, year, filn, url, True)
                # append to array
                parsed_data = parsed_data + new_data
            except Exception, e:
                print e
        dd[year] = parsed_data

    return dd
 
def print_stats(run, year, parsed):
    year_sum = sum(a['year'] for a in parsed)
    year_count = len([x for x in parsed if x['year']>0])
    if year_count == 0:
        average = 0
    else:
        average = year_sum / year_count
    
    print 'For {}, year {}: Average {}, Participants {}'.format(run, year, average, len(parsed))

def print_datasport(dd, from_year, to_year):
    """Print datasport years"""
    for year in range(from_year,to_year):
        print_stats(run, year, dd[year])
        

def get_pretty_table(dd, from_year, to_year):
    """Print pretty table for years"""
    table = PrettyTable()
    table.field_names = ["Year", "Average", "Athletes"]
    table.align["Year"] = "r"
    table.align["Average"] = "r"
    table.align["Athletes"] = "r"
    for year in range(from_year,to_year):
        year_sum = sum(a['year'] for a in dd[year])
        year_count = len([x for x in dd[year] if x['year']>0])
        if year_count == 0:
            average = 0
        else:
            average = year_sum / year_count
            
        table.add_row([year,average,len(dd[year])])
    return table

### Execute queries

The following block will execute the queries and run the analytics.  
Change run name (see below for other runs), start and end year

In [19]:
run = 'morat'
from_year = 2000
to_year = 2017

dd = get_datasport(run, from_year, to_year)
print '-------------------------------'
print 'RESULTS for {}:'.format(run)
print get_pretty_table(dd, from_year, to_year)

Querying letters a-z for morat and year 2000
Querying letters a-z for morat and year 2001
Querying letters a-z for morat and year 2002
Querying letters a-z for morat and year 2003
Querying letters a-z for morat and year 2004
Querying letters a-z for morat and year 2005
Querying letters a-z for morat and year 2006
Querying letters a-z for morat and year 2007
Querying letters a-z for morat and year 2008
Querying letters a-z for morat and year 2009
Querying letters a-z for morat and year 2010
Querying letters a-z for morat and year 2011
Querying letters a-z for morat and year 2012
Querying letters a-z for morat and year 2013
Querying letters a-z for morat and year 2014
Querying letters a-z for morat and year 2015
Querying letters a-z for morat and year 2016
-------------------------------
RESULTS for morat:
+------+---------+----------+
| Year | Average | Athletes |
+------+---------+----------+
| 2000 |      64 |     6793 |
| 2001 |      61 |     6309 |
| 2002 |      69 |     6405 |
| 20

#### Other runs

The folowing other runs were tested as well and should work with the notebook

In [None]:
runs = ['escalade', 'morat', 'lamara', 'km20', 'kerzers', 
        'gurten', 'frauenlauf', 'zinal', 'zuerich', 'greifenseelauf',
        'trotteuse', 'winterthur']

### Sample results

The following are sample results as run on Dec. 1st 2016.

```
-------------------------------
RESULTS for escalade:
+------+---------+----------+
| Year | Average | Athletes |
+------+---------+----------+
| 2000 |      76 |    13275 |
| 2001 |      74 |    16817 |
| 2002 |      69 |    20465 |
| 2003 |      74 |    18251 |
| 2004 |      76 |    18567 |
| 2005 |      84 |    19540 |
| 2006 |      77 |    18677 |
| 2007 |    1987 |    21804 |
| 2008 |    1984 |    20420 |
| 2009 |    1978 |    21480 |
| 2010 |    1979 |    22142 |
| 2011 |    1987 |    24899 |
| 2012 |    1986 |    29265 |
| 2013 |    1987 |    29519 |
| 2014 |    1985 |    32159 |
| 2015 |    1990 |    37121 |
| 2016 |       0 |        0 |
+------+---------+----------+
RESULTS for morat:
+------+---------+----------+
| Year | Average | Athletes |
+------+---------+----------+
| 2000 |      64 |     6793 |
| 2001 |      61 |     6309 |
| 2002 |      69 |     6405 |
| 2003 |      64 |     6609 |
| 2004 |      67 |     6883 |
| 2005 |      64 |     6833 |
| 2006 |      71 |     7477 |
| 2007 |    1965 |     7731 |
| 2008 |    1969 |     8771 |
| 2009 |    1972 |     8989 |
| 2010 |    1971 |     8735 |
| 2011 |    1984 |     8644 |
| 2012 |    1972 |     9151 |
| 2013 |    1981 |    10895 |
| 2014 |    1981 |    10761 |
| 2015 |    1980 |    10796 |
| 2016 |    1986 |    12337 |
+------+---------+----------+
-------------------------------
RESULTS for lamara:
+------+---------+----------+
| Year | Average | Athletes |
+------+---------+----------+
| 2000 |       0 |        0 |
| 2001 |       0 |        0 |
| 2002 |      64 |     6226 |
| 2003 |      62 |     7007 |
| 2004 |      67 |     8454 |
| 2005 |      66 |     7999 |
| 2006 |      68 |     8297 |
| 2007 |    1973 |     8407 |
| 2008 |    1968 |     8160 |
| 2009 |    1971 |     9540 |
| 2010 |    1976 |    10036 |
| 2011 |    1978 |    10639 |
| 2012 |    1972 |     9611 |
| 2013 |    1973 |    12011 |
| 2014 |    1974 |    12848 |
| 2015 |    1981 |    13074 |
| 2016 |    1978 |    13557 |
+------+---------+----------+
-------------------------------
RESULTS for km20:
+------+---------+----------+
| Year | Average | Athletes |
+------+---------+----------+
| 2000 |      74 |    10269 |
| 2001 |      73 |     9491 |
| 2002 |      77 |    12762 |
| 2003 |      80 |    10959 |
| 2004 |      82 |    12370 |
| 2005 |      84 |    12818 |
| 2006 |    1978 |    12468 |
| 2007 |    1978 |    13794 |
| 2008 |    1979 |    13760 |
| 2009 |    1987 |    14581 |
| 2010 |    1983 |    16318 |
| 2011 |    1988 |    16382 |
| 2012 |    1991 |    17083 |
| 2013 |    1989 |    18672 |
| 2014 |    1991 |    19762 |
| 2015 |    1993 |    22944 |
| 2016 |    1988 |    24702 |
+------+---------+----------+
-------------------------------
RESULTS for kerzers:
+------+---------+----------+
| Year | Average | Athletes |
+------+---------+----------+
| 2000 |      67 |     3848 |
| 2001 |      62 |     4127 |
| 2002 |      60 |     4452 |
| 2003 |      74 |     5398 |
| 2004 |      67 |     5656 |
| 2005 |      75 |     5582 |
| 2006 |      67 |     5975 |
| 2007 |    1972 |     6715 |
| 2008 |    1972 |     7736 |
| 2009 |    1971 |     7179 |
| 2010 |    1966 |     6821 |
| 2011 |    1969 |     7151 |
| 2012 |    1977 |     7519 |
| 2013 |    1975 |     7158 |
| 2014 |    1985 |     8250 |
| 2015 |    1985 |     7878 |
| 2016 |    1974 |     8286 |
+------+---------+----------+
-------------------------------
RESULTS for gurten:
+------+---------+----------+
| Year | Average | Athletes |
+------+---------+----------+
| 2000 |      63 |      990 |
| 2001 |      61 |     1112 |
| 2002 |      67 |      947 |
| 2003 |      65 |     1154 |
| 2004 |      64 |     1349 |
| 2005 |      64 |     1679 |
| 2006 |      63 |     1590 |
| 2007 |    1967 |     1465 |
| 2008 |    1970 |     1386 |
| 2009 |    1975 |     1564 |
| 2010 |    1962 |     1333 |
| 2011 |    1973 |     1470 |
| 2012 |    1978 |     1342 |
| 2013 |    1971 |     1432 |
| 2014 |    1981 |     1786 |
| 2015 |    1974 |     2118 |
| 2016 |    1970 |     1961 |
+------+---------+----------+
-------------------------------
RESULTS for frauenlauf:
+------+---------+----------+
| Year | Average | Athletes |
+------+---------+----------+
| 2000 |      71 |    10680 |
| 2001 |      61 |    10439 |
| 2002 |      67 |    11675 |
| 2003 |      67 |    10900 |
| 2004 |      68 |    11691 |
| 2005 |      67 |    13435 |
| 2006 |      68 |    13264 |
| 2007 |    1963 |    12138 |
| 2008 |    1964 |    12109 |
| 2009 |    1972 |    12889 |
| 2010 |    1974 |    12292 |
| 2011 |    1974 |    13168 |
| 2012 |    1972 |    13747 |
| 2013 |    1973 |    13737 |
| 2014 |    1972 |    13473 |
| 2015 |    1983 |    13870 |
| 2016 |    1985 |    14374 |
+------+---------+----------+
-------------------------------
RESULTS for zuerich:
+------+---------+----------+
| Year | Average | Athletes |
+------+---------+----------+
| 2000 |       0 |        0 |
| 2001 |       0 |        0 |
| 2002 |       0 |        0 |
| 2003 |      66 |     4851 |
| 2004 |      62 |     6029 |
| 2005 |      63 |     5816 |
| 2006 |      63 |     5350 |
| 2007 |    1962 |     4642 |
| 2008 |    1964 |     4599 |
| 2009 |    1966 |     5009 |
| 2010 |    1968 |     3487 |
| 2011 |    1971 |     3957 |
| 2012 |    1971 |     5463 |
| 2013 |    1973 |     5320 |
| 2014 |    1977 |     6042 |
| 2015 |    1974 |     5897 |
| 2016 |    1974 |     5791 |
+------+---------+----------+
-------------------------------
RESULTS for winterthur:
+------+---------+----------+
| Year | Average | Athletes |
+------+---------+----------+
| 2000 |      62 |      894 |
| 2001 |      60 |     1076 |
| 2002 |      61 |     1514 |
| 2003 |      61 |     1729 |
| 2004 |      67 |     2070 |
| 2005 |       0 |        0 |
| 2006 |       0 |        0 |
| 2007 |    1976 |     2079 |
| 2008 |    1966 |     2210 |
| 2009 |    1968 |     2165 |
| 2010 |    1969 |     1998 |
| 2011 |    1971 |     2385 |
| 2012 |    1978 |     2092 |
| 2013 |    1974 |     2730 |
| 2014 |    1975 |     2427 |
| 2015 |    1975 |     2703 |
| 2016 |    1979 |     2559 |
+------+---------+----------+
-------------------------------
RESULTS for greifenseelauf:
+------+---------+----------+
| Year | Average | Athletes |
+------+---------+----------+
| 2000 |      67 |     6763 |
| 2001 |      68 |     7263 |
| 2002 |      71 |     8251 |
| 2003 |      65 |     9210 |
| 2004 |      62 |    11904 |
| 2005 |      65 |    11719 |
| 2006 |      68 |    10825 |
| 2007 |    1968 |    11566 |
| 2008 |    1968 |    11305 |
| 2009 |    1970 |    13379 |
| 2010 |    1970 |    11699 |
| 2011 |    1971 |    12136 |
| 2012 |    1968 |    12368 |
| 2013 |    1977 |    11907 |
| 2014 |    1978 |    12176 |
| 2015 |    1978 |    11906 |
| 2016 |    1978 |    11484 |
+------+---------+----------+
```