# RUN PERFORMANCE PROJECT - Pau Sampietro

## Time Series Approach for predicting future races

In this file we focus on a specific subset of the data: flat moves whose lengths are between 7 and 13 km. We want to know how the moving time for these moves evolves over time, which could give us a more accurate prediction for similar competitions.

In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from rpdb import read_table, read_table_sql, export_table
from datetime import datetime

#### Import moves from the database

In [14]:
Ssql = "SELECT * FROM moves WHERE athlete = 'P'"
moves = read_table_sql('moves', Ssql)
moves.head()

| | index | move | start_time | distance | calories | athlete | accum_ascent | moving_time | pace | heart_rate | ascent_ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 2017-07-02 09:46:40 | 4019.0 | 670.0 | P | 280.0 | 29.8 | 7.41 | 166.0 | 69.7 |
| 1 | 1 | 2 | 2017-07-02 10:32:23 | 4995.0 | 585.0 | P | 63.0 | 30.9 | 6.19 | 168.0 | 12.6 |
| 2 | 2 | 3 | 2017-07-19 17:49:13 | 4374.0 | 602.0 | P | 115.0 | 22.6 | 5.17 | 180.0 | 26.3 |
| 3 | 3 | 4 | 2017-07-20 17:56:41 | 3005.0 | 365.0 | P | 91.0 | 16.9 | 5.62 | 167.0 | 30.3 |
| 4 | 4 | 5 | 2017-07-24 16:17:34 | 6540.0 | 760.0 | P | 241.0 | 42.0 | 6.42 | 161.0 | 36.9 |


### 1. Filtering by distance, checking data interval and sorting moves

#### 1.1. We keep only the flat moves between 7 and 13 km (10 ± 3 km)

In [17]:
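# Keep moves of 7 to 13 km with ascent_ratio below 35 (treated as flat); copy so later edits do not affect `moves`
moves_ts = moves[(moves.distance >= 7000) & (moves.distance <= 13000) & (moves.ascent_ratio < 35)].copy()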
len(moves_ts)

51
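The same filter can be sketched on a small, made-up frame to show what it keeps and what it discards. The sample rows and the names `sample`, `is_10k_like`, `is_flat` and `flat_10k` are hypothetical; only the column names and thresholds come from the cell above.

```python
import pandas as pd

# Hypothetical sample rows; the column names mirror the `moves` table above.
sample = pd.DataFrame({
    "distance": [4019.0, 8200.0, 12950.0, 15000.0, 9000.0],
    "ascent_ratio": [69.7, 12.6, 20.0, 10.0, 40.0],
    "pace": [7.41, 5.30, 5.55, 5.80, 6.90],
})

# between() is inclusive on both ends by default, matching >= 7000 and <= 13000.
is_10k_like = sample["distance"].between(7000, 13000)
# Same flatness threshold as in the notebook cell.
is_flat = sample["ascent_ratio"] < 35

# .copy() returns an independent frame, so later modifications
# do not raise pandas' SettingWithCopyWarning.
flat_10k = sample[is_10k_like & is_flat].copy()
print(flat_10k)
```

Naming the two conditions separately makes it easier to check how many moves each criterion removes on its own.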

#### 1.2. Preparing data: sort values, reindex, drop all but start_time & pace

In [18]:
moves_ts['start_time'].min(), moves_ts['start_time'].max()

(Timestamp('2017-08-11 17:23:17'), Timestamp('2019-01-24 08:13:43'))
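As a rough check of how densely the interval is sampled, the two timestamps and the count of 51 moves reported above can be combined into an average gap between moves. This is only a sketch: the variable names are illustrative, and the average says nothing about how unevenly the moves are actually spread.

```python
import pandas as pd

# Values taken from the outputs above.
first = pd.Timestamp("2017-08-11 17:23:17")
last = pd.Timestamp("2019-01-24 08:13:43")
n_moves = 51

# Total time covered by the selected moves.
span = last - first

# 51 moves leave 50 gaps; dividing by a one-day Timedelta yields a float in days.
mean_gap_days = span / (n_moves - 1) / pd.Timedelta(days=1)

print(span.days, round(mean_gap_days, 1))
```

An average gap of several days suggests the series is irregularly spaced, which is worth keeping in mind before applying methods that expect a fixed sampling frequency.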

* Dropping all but the useful time series columns and checking for NaN values

In [20]:
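# Columns that are not needed for the pace time series
cols = ['index', 'move', 'distance', 'calories', 'athlete', 'accum_ascent', 'moving_time', 'heart_rate', 'ascent_ratio']
# Reassign rather than drop in place, which avoids pandas' SettingWithCopyWarning on filtered frames
moves_ts = moves_ts.drop(columns=cols)
moves_ts = moves_ts.sort_values('start_time')
# Count missing values per remaining column
moves_ts.isnull().sum()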

start_time    0
pace          0
dtype: int64
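With only `start_time` and `pace` left, one possible next step is to index the frame by time and aggregate pace on a regular grid. The sketch below assumes a monthly mean and uses made-up values; it is not necessarily the approach taken in the rest of this notebook, and the names `ts`, `pace_series` and `monthly_pace` are hypothetical.

```python
import pandas as pd

# Hypothetical data with the same two columns that remain in moves_ts.
ts = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2018-01-05 09:00:00", "2018-01-20 18:30:00",
        "2018-02-11 10:15:00", "2018-04-02 08:45:00",
    ]),
    "pace": [5.6, 5.4, 5.3, 5.1],
})

# A DatetimeIndex is required for resample().
pace_series = ts.set_index("start_time")["pace"].sort_index()

# "MS" labels each bin by the first day of its month; months without moves become NaN.
monthly_pace = pace_series.resample("MS").mean()
print(monthly_pace)
```

Months without any qualifying move show up as NaN, so they would need to be interpolated or left as gaps depending on the model used afterwards.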