# Races information

This notebook walks through the steps to build a final table that describes the parsed races at a high level.

In [1]:
import numpy as np
import pandas as pd

## Getting weather

We start from the file [`links2runs.csv`](../datasets/links2runs.csv), containing all links to the races/competitions we care about (this same file is also used to get *detailed* information about the results and participants in each race).

Using this file and the two scripts [`weather.py`](../weather/weather.py) and [`weather_utils.py`](../weather/weather_utils.py), we can get the weather conditions for half of the races, using the [World Weather Online API](https://developer.worldweatheronline.com/api/). The result is the file [`races-information-weather.csv`](../datasets/races-information-weather.csv).

In [2]:
races_weather = pd.read_csv('../datasets/races-information-weather.csv', index_col=0)
races_weather.tail()

Unnamed: 0,Unnamed: 0.1,Date,Name,Place,URL,min_temp,max_temp,uv_index,weather_desc
2003,2003,sam. 12.12.2015,"Christmas Midnight Run, Lausanne",Lausanne,http://services.datasport.com/2015/lauf/midnight,6.0,15.0,0.0,Clear
2004,2004,sam. 12.12.2015,"La Trotteuse-Tissot, La Chaux-de-Fonds",La Chaux-de-Fonds,http://services.datasport.com/2015/lauf/trotteuse,2.0,8.0,0.0,Clear
2005,2005,sam. 05.12.2015,"Course de l'Escalade, Genève",Genève,http://services.datasport.com/2015/lauf/escalade,0.0,9.0,0.0,Sunny
2006,2006,sam. 05.12.2015,Gossauer Weihnachtslauf,Gossau SG,http://services.datasport.com/2015/lauf/gossau,4.0,14.0,0.0,Clear
2007,2007,sam. 05.12.2015,Course à Travers Aigle,Aigle,http://services.datasport.com/2015/lauf/aigle,,,,


Despite our efforts in the weather scripts, a dummy column (`Unnamed: 0.1`) was added as a side effect. Let's remove it, drop any duplicated rows and give the columns consistent lowercase names.

In [3]:
races_weather_clean = races_weather.drop('Unnamed: 0.1', axis=1).drop_duplicates().reset_index(drop=True)
races_weather_clean.columns = ['date','name','location','url','min_temp','max_temp','uv_index','weather_desc']
print(races_weather_clean.shape)
races_weather_clean.tail()

(2004, 8)


Unnamed: 0,date,name,location,url,min_temp,max_temp,uv_index,weather_desc
1999,sam. 12.12.2015,"Christmas Midnight Run, Lausanne",Lausanne,http://services.datasport.com/2015/lauf/midnight,6.0,15.0,0.0,Clear
2000,sam. 12.12.2015,"La Trotteuse-Tissot, La Chaux-de-Fonds",La Chaux-de-Fonds,http://services.datasport.com/2015/lauf/trotteuse,2.0,8.0,0.0,Clear
2001,sam. 05.12.2015,"Course de l'Escalade, Genève",Genève,http://services.datasport.com/2015/lauf/escalade,0.0,9.0,0.0,Sunny
2002,sam. 05.12.2015,Gossauer Weihnachtslauf,Gossau SG,http://services.datasport.com/2015/lauf/gossau,4.0,14.0,0.0,Clear
2003,sam. 05.12.2015,Course à Travers Aigle,Aigle,http://services.datasport.com/2015/lauf/aigle,,,,
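A likely origin for a column such as `Unnamed: 0.1` is a CSV that was written with its index and then read back and written again. The sketch below reproduces this on a toy frame built from two rows of the output above. It is an illustration under that assumption, not a trace of what `weather.py` actually does, and the exact position of the extra column can vary between pandas versions.

```python
import io

import pandas as pd

# Toy frame with two rows taken from the output above (illustration only).
df = pd.DataFrame({
    'Place': ['Lausanne', 'Aigle'],
    'max_temp': [15.0, None],
})

# First round trip: to_csv writes the index as a column with an empty header,
# which read_csv (without index_col) names 'Unnamed: 0'.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
once = pd.read_csv(buf)

# Second round trip: the saved index column is written again next to a new
# index, so two unnamed-style columns collide and one becomes 'Unnamed: 0.1'.
buf = io.StringIO()
once.to_csv(buf)
buf.seek(0)
twice = pd.read_csv(buf)
print(list(twice.columns))

# Writing without the index avoids the extra column in the first place.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
clean = pd.read_csv(buf)
print(list(clean.columns))
```

If the intermediate files are only consumed by this notebook, passing `index=False` when writing them would be one way to make the `drop` step unnecessary.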


## Getting coordinates

At this point we can also add the coordinates of each location, which are stored in [`complete_geography.csv`](../datasets/complete_geography.csv).

In [9]:
#geography = pd.read_csv('../datasets/geography.csv', encoding='latin1', index_col=0, na_values=['n', 'a'])
geography = pd.read_csv('../datasets/complete_geography.csv', encoding='latin1', index_col=0, usecols=[0,1,2,3], na_values=['n', 'a'])
geography.columns = ['location','latitude','longitude']
print(geography.shape)
geography.head()

(1990, 3)


Unnamed: 0,location,latitude,longitude
0,St-Légier-La Chiésaz,46.471718,6.876771
1,St-Légier,46.471718,6.876771
2,Ernen,46.3985,8.145773
3,Lausanne,46.519653,6.632273
4,Pully,46.509268,6.665495


We merge the two dataframes on `location` with a left join, so every race is kept even when no coordinates are available for it:

In [10]:
races_gps = pd.merge(races_weather_clean, geography, on='location', how='left').drop_duplicates()
races_gps = races_gps.reset_index(drop=True)
print(races_gps.shape)
races_gps.tail()

(2004, 10)


Unnamed: 0,date,name,location,url,min_temp,max_temp,uv_index,weather_desc,latitude,longitude
1999,sam. 12.12.2015,"Christmas Midnight Run, Lausanne",Lausanne,http://services.datasport.com/2015/lauf/midnight,6.0,15.0,0.0,Clear,46.519653,6.632273
2000,sam. 12.12.2015,"La Trotteuse-Tissot, La Chaux-de-Fonds",La Chaux-de-Fonds,http://services.datasport.com/2015/lauf/trotteuse,2.0,8.0,0.0,Clear,47.103489,6.832784
2001,sam. 05.12.2015,"Course de l'Escalade, Genève",Genève,http://services.datasport.com/2015/lauf/escalade,0.0,9.0,0.0,Sunny,46.204391,6.143158
2002,sam. 05.12.2015,Gossauer Weihnachtslauf,Gossau SG,http://services.datasport.com/2015/lauf/gossau,4.0,14.0,0.0,Clear,47.415561,9.248852
2003,sam. 05.12.2015,Course à Travers Aigle,Aigle,http://services.datasport.com/2015/lauf/aigle,,,,,46.319025,6.970566
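Because the merge is a left join, races whose location is absent from the geography table silently end up with empty `latitude` and `longitude`. A minimal sketch of how such rows could be listed, using the `indicator` option of `pd.merge` on a toy example (the frames `races` and `coords` are hypothetical stand-ins built from rows shown in this notebook):

```python
import pandas as pd

# Hypothetical stand-ins for races_weather_clean and geography.
races = pd.DataFrame({
    'name': ['20km de Lausanne', 'Chäsitzerlouf, Kehrsatz'],
    'location': ['Lausanne', 'Kehrsatz'],
})
coords = pd.DataFrame({
    'location': ['Lausanne', 'Pully'],
    'latitude': [46.519653, 46.509268],
    'longitude': [6.632273, 6.665495],
})

# indicator=True adds a '_merge' column telling where each row came from.
merged = pd.merge(races, coords, on='location', how='left', indicator=True)

# 'left_only' marks races for which no coordinates were found.
missing = merged.loc[merged['_merge'] == 'left_only', 'location'].tolist()
print(missing)  # ['Kehrsatz']
```

The same check on the real dataframes would show which locations still need to be added to the geography file.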


## Parsing dates

Final step: we parse the dates into separate weekday, day, month and year columns, in case they are useful later.

In [11]:
races_final = races_gps.copy()

In [12]:
week_dict = {
    'lun': 'monday',
    'mar': 'tuesday',
    'mer': 'wednesday',
    'jeu': 'thursday',
    'ven': 'friday',
    'sam': 'saturday',
    'dim': 'sunday'
}

In [13]:
# Dates look like 'sam. 27.03.1999': a French weekday abbreviation, then day.month.year.
# Splitting on '.' gives ['sam', ' 27', '03', '1999'].
races_final['weekday'] = races_final.date.apply(lambda x: week_dict[x.split('.')[0].strip()])
races_final['day'] = races_final.date.apply(lambda x: int(x.split('.')[1].strip()))
races_final['month'] = races_final.date.apply(lambda x: int(x.split('.')[2].strip()))
races_final['year'] = races_final.date.apply(lambda x: int(x.split('.')[3].strip()))

In [14]:
races_final.head()

Unnamed: 0,date,name,location,url,min_temp,max_temp,uv_index,weather_desc,latitude,longitude,weekday,day,month,year
0,sam. 27.03.1999,Männedörfler Waldlauf,Männedorf,http://services.datasport.com/1999/zkb/maennedorf,,,,,47.257463,8.694673,saturday,27,3,1999
1,sam. 20.03.1999,Kerzerslauf,Kerzers,http://services.datasport.com/1999/lauf/kerzers,,,,,46.97489,7.195437,saturday,20,3,1999
2,sam. 24.04.1999,Luzerner Stadtlauf,Luzern,http://services.datasport.com/1999/lauf/luzern,,,,,47.050168,8.309307,saturday,24,4,1999
3,sam. 24.04.1999,20km de Lausanne,Lausanne,http://services.datasport.com/1999/lauf/km20,,,,,46.519653,6.632273,saturday,24,4,1999
4,sam. 24.04.1999,"Chäsitzerlouf, Kehrsatz",Kehrsatz,http://services.datasport.com/1999/lauf/kehrsatz,,,,,,,saturday,24,4,1999
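As an alternative to splitting the string by hand, the numeric part of the date could be handed to `pd.to_datetime`, which also makes it possible to derive the weekday from the date itself instead of the French abbreviation. The following is only a sketch on two dates taken from the tables above; it assumes every value has the form `'<abbr>. dd.mm.yyyy'`.

```python
import pandas as pd

# Two date strings copied from the tables above.
dates = pd.Series(['sam. 27.03.1999', 'sam. 12.12.2015'])

# Drop the weekday prefix ('sam.') and parse the remaining 'dd.mm.yyyy' part.
parsed = pd.to_datetime(dates.str.split(' ').str[1], format='%d.%m.%Y')

# Rebuild the same four columns as above from the datetime values.
summary = pd.DataFrame({
    'weekday': parsed.dt.day_name().str.lower(),
    'day': parsed.dt.day,
    'month': parsed.dt.month,
    'year': parsed.dt.year,
})
print(summary)
```

Keeping a real datetime column would also allow sorting and filtering races by date directly, which the four separate columns do not.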


## Exporting

In [15]:
races_final.to_csv('../datasets/races-information.csv')