The purpose of this notebook is to create a critical pace + gradient (CPG) chart. It looks at multiple runs, finds the maximum speed at various gradients and time intervals over all runs, and plots the result. I treat each csv file as if it was recorded at 1-second intervals (even if it wasn't).

In this notebook, all computations are done from scratch. That is, it pulls in the converted csv files, computes the max speed matrix for each run, finds maximums, then plots it. In my notebook "CPG Chart", I assume that these matrices are already created, and so it simply reads them in and creates the chart.

In [1]:
import pandas as pd
import numpy as np
import os
import plotly.plotly as py
import plotly.graph_objs as go

In [2]:
INDIR = r'data/csv/'

files = os.listdir(INDIR)

df_list = []

for file in files:
    if file.endswith('.csv'):
        df_list.append(pd.read_csv(INDIR + file))

We'll write a function to do basic gradient cleaning. It will simply find gradients that are too large (in absolute value), where "too large" is a parameter which can be set. It replaces these "too large" values either with the median, mean, or zero. This could certainly be expanded and improved, but for now it works fine.

In [3]:
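def clean_gradient(grad_series, replacement_grad='median', max_grad=0.4):
    '''Replace gradients larger than max_grad (in absolute value) with the median, mean, or zero.'''
    # fillna returns a new Series, so the result has to be assigned.
    # Filling first also keeps NaNs out of the median/mean below.
    grad_series = grad_series.fillna(0)
    bad_index = grad_series.loc[np.abs(grad_series) > max_grad].index
    if len(bad_index) == 0:
        return grad_series
    elif replacement_grad == 'median':
        grad_series.loc[bad_index] = np.median(grad_series.values)
    elif replacement_grad == 'mean':
        grad_series.loc[bad_index] = np.mean(grad_series.values)
    elif replacement_grad == 'zero':
        grad_series.loc[bad_index] = 0
    return grad_series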
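
To see what the cleaning step does, here is a small self-contained sketch on made-up numbers. The name clean_gradient_copy and the toy values are assumptions for illustration only; it follows the same idea as the function above but never modifies its input.

```python
import numpy as np
import pandas as pd

def clean_gradient_copy(grad_series, replacement_grad='median', max_grad=0.4):
    """Hypothetical non-mutating variant of clean_gradient, for illustration only."""
    cleaned = grad_series.fillna(0)      # fillna returns a new Series
    bad = cleaned.abs() > max_grad       # boolean mask of implausible gradients
    if replacement_grad == 'median':
        fill = cleaned.median()
    elif replacement_grad == 'mean':
        fill = cleaned.mean()
    else:                                # treat anything else as 'zero'
        fill = 0.0
    cleaned[bad] = fill
    return cleaned

# Made-up gradients: one absurd spike (2.5) and one missing value.
toy = pd.Series([0.02, 0.05, 2.5, np.nan, -0.03])
cleaned = clean_gradient_copy(toy)
print(cleaned.tolist())   # [0.02, 0.05, 0.02, 0.0, -0.03]
```

Note that the median is computed with the spike still in the data, which is also what the function above does. With only a few bad points this barely matters, but it is one of the places where the cleaning could be improved.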

It is helpful to have a column showing the gradient as a percentage rather than a fraction (so 0.05 becomes 5).

In [4]:
for df in df_list:
    df['gradient'] = clean_gradient(df['gradient'])
    df['gradient_100'] = 100*df['gradient']



A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy



We now want to compute the maximum speed at each gradient. To do so we'll build a function which, for a given window length $t$, returns a dict with the gradient as the key and the max speed at that gradient as the value. We'll then use this to create a data frame with this info: the columns will be the gradients, and the rows will be the times in seconds. Initially, this data frame may have "gaps" in the columns. That is, there might be gradients which don't show up, e.g. gradients of 11 and 9, but not 10. This happens when the activity has no stretch at that gradient, and it is fixed below.

In [5]:
def better_max_speed_t(t, df):
    '''Return a dict mapping each rounded gradient to the max t-second average speed at that gradient.'''
    # t-second rolling averages of speed and gradient; rows without a full window become 0
    rolling_df = df[['inst_speed_meters_sec', 'gradient_100']].rolling(window=t).mean().fillna(0)
    rolling_df['rounded_gradient_100'] = rolling_df['gradient_100'].apply(np.round)
    rolling_groupby = rolling_df.groupby('rounded_gradient_100')
    
    max_speed = {int(name): group['inst_speed_meters_sec'].max() for name, group in rolling_groupby}
    return max_speed
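
As a rough illustration of the rolling-average-then-group idea, here is a minimal sketch on a made-up six-second activity. The data values and the 2-second window are assumptions chosen so the arithmetic is easy to follow by hand.

```python
import pandas as pd

# Made-up 1-second samples: speed in m/s and gradient in percent.
toy = pd.DataFrame({
    'inst_speed_meters_sec': [3.0, 3.0, 4.0, 4.0, 2.0, 2.0],
    'gradient_100':          [0.0, 0.0, 0.0, 0.0, 6.0, 6.0],
})

window = 2
# Average speed and average gradient over each 2-second window.
rolled = toy.rolling(window=window).mean().fillna(0)
rolled['rounded_gradient_100'] = rolled['gradient_100'].round()

# Best windowed speed seen at each (rounded) average gradient.
best = rolled.groupby('rounded_gradient_100')['inst_speed_meters_sec'].max()
best = {int(k): float(v) for k, v in best.items()}
print(best)   # {0: 4.0, 3: 3.0, 6: 2.0}
```

The gradient 3 entry comes from the window that straddles the flat section and the climb: the gradient is averaged over the window just like the speed, so a window can land on a gradient that no single sample had.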

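$t$ and $g$ are the time and gradient intervals to consider, respectively. Right now $t$ has to start at zero and increase in steps of one. The per-activity data frames built below are created from a list, so their rows are labelled 0, 1, 2, ... rather than by the values of $t$, and the later steps line rows up by those labels. As long as $t$ starts at zero everything is fine.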

In [6]:
t = np.arange(301)
g = np.arange(-20, 21)
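
The restriction on $t$ can be reproduced in a few lines. This is only a sketch with made-up numbers (t_shifted, rows and the single gradient column are assumptions): it shows that a data frame built from a list gets row labels 0, 1, 2, ..., and that pandas update lines rows up by label.

```python
import numpy as np
import pandas as pd

t_shifted = np.arange(10, 13)                 # durations 10, 11, 12 s (made up)
rows = [{0: 3.0}, {0: 2.9}, {0: 2.8}]         # made-up per-duration results at gradient 0

speeds = pd.DataFrame(rows)                   # row labels are 0, 1, 2
target = pd.DataFrame(0.0, index=t_shifted, columns=[0])
target.update(speeds)                         # aligns on labels; none of them match
before = target[0].tolist()
print(before)                                 # [0.0, 0.0, 0.0]

# Labelling the rows with the durations makes the alignment work.
speeds_labelled = pd.DataFrame(rows, index=t_shifted)
target.update(speeds_labelled)
after = target[0].tolist()
print(after)                                  # [3.0, 2.9, 2.8]
```

Passing index=t when the per-activity frames are created would be one way to lift the restriction, although any later positional indexing would need the same care.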

For each activity we create a data frame where the columns are the gradients, and the rows are the times (in seconds). The value at $(t,g)$ is the maximum speed maintained for $t$ seconds at average gradient $g$. We then create a list of all of these for easy access later.

In [7]:
max_speed_list_of_df = [pd.DataFrame([better_max_speed_t(t_val, df) for t_val in t]).fillna(0) for df in df_list]

We mentioned earlier that there may be "missing gradients" in an activity. To remedy this we create a "zero data frame" which has rows $t$ and columns $g$, and is entirely filled with zeros. In fact, we create one such "zero data frame" for each activity data frame created above. We then "update" these zero data frames with the max speed data frames created for each activity. This way, any gradients missing from the activity simply become zeros in the updated data frame.

It would be nice to do this with a list comprehension, but there is an issue: update modifies the data frame in place and returns None. That is, if you write new_df = df_1.update(df_2), then new_df is just None. So you have to manually create a zero data frame, update it, and put it in a list. I save them into a new list just to avoid being destructive.

In [8]:
max_speed_list_of_df_updated = []

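for speed_df in max_speed_list_of_df:
    # All-zero float frame covering every (t, g) pair
    full_df = pd.DataFrame(0.0, index=t, columns=g)
    # update() works in place (and returns None); only matching row/column labels are overwritten
    full_df.update(speed_df)
    max_speed_list_of_df_updated.append(full_df)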
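
The behaviour of update is easy to check on a toy example. The sketch below uses made-up values (g_toy, t_toy and the speeds are assumptions); it also shows reindex as one possible alternative that returns a new data frame and so could be used inside a list comprehension.

```python
import pandas as pd

g_toy = [9, 10, 11]    # full gradient range (made up)
t_toy = [0, 1, 2]      # durations in seconds (made up)

# An activity that never saw gradient 10.
partial = pd.DataFrame({9: [0.0, 3.1, 3.0], 11: [0.0, 2.5, 2.4]}, index=t_toy)

full = pd.DataFrame(0.0, index=t_toy, columns=g_toy)
result = full.update(partial)    # modifies `full` in place
print(result)                    # None
print(full)                      # column 10 is still all zeros

# Alternative: reindex returns a new frame and fills missing labels with zeros.
alt = partial.reindex(index=t_toy, columns=g_toy, fill_value=0.0)
print((alt.values == full.values).all())   # True
```

The two approaches agree here because the activity frame has no missing values of its own; if it did, update would skip them while reindex would carry them over.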

Plotting is pretty standard, except for a couple of small things. One is that Plotly complains when I specify the x and y axes as $t$ and $g$. A likely cause is orientation: a Plotly surface expects z to have one row per y value and one column per x value, whereas ZZ has one row per time and one column per gradient. Second is that we have these "max_speed" data frames for each activity, and we want the component-wise max over all of them (since, for a particular $(t,g)$ pair, the chart should show the maximum speed over all activities at that $(t,g)$ value). Asking numpy for the plain max of a list of matrices collapses everything to a single number, which is not what we want. Instead we "reduce" the list with np.maximum (a trick I got from a StackOverflow page): it takes the element-wise maximum of the first two matrices, then of that result and the third, and so on, leaving one matrix of per-cell maxima.

In [9]:
# as_matrix() was removed in pandas 1.0; to_numpy() is its replacement
ZZ_list = [df[g].iloc[t].to_numpy() for df in max_speed_list_of_df_updated]
ZZ = np.maximum.reduce(ZZ_list)
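
A tiny example makes the component-wise maximum concrete. The three 2x2 matrices below are made up and stand in for the per-activity max speed matrices.

```python
import numpy as np

# Three made-up "max speed" matrices, one per activity.
a = np.array([[1.0, 5.0], [2.0, 0.0]])
b = np.array([[3.0, 4.0], [1.0, 6.0]])
c = np.array([[0.0, 7.0], [2.5, 1.0]])

# Element-wise maximum across the list: max(max(a, b), c), cell by cell.
elementwise = np.maximum.reduce([a, b, c])
print(elementwise)        # [[3.  7. ]
                          #  [2.5 6. ]]

# A plain max collapses everything to one number.
overall = np.max([a, b, c])
print(overall)            # 7.0

# Stacking the matrices and taking the max along the first axis gives the same matrix.
stacked = np.max(np.stack([a, b, c]), axis=0)
```

Either form works; np.maximum.reduce just avoids building the stacked 3-D array explicitly.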

A list of all the Plotly colorscales can be found here: https://community.plot.ly/t/what-colorscales-are-available-in-plotly-and-which-are-the-default/2079

In [10]:
data = [
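    # Surface expects z with shape (len(y), len(x)); ZZ is (len(t), len(g)), hence the transpose.
    # Colors with an alpha channel use the 'rgba' prefix.
    go.Surface(x=t, y=g, z=ZZ.T, colorscale=[[0, 'rgba(255,255,255,0)'], [1, 'rgba(255,0,0,1)']])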
]

layout = go.Layout(
            scene = dict(
                xaxis = dict(
                    title='Time (s)'),
                yaxis = dict(
                    title='Gradient'),
                zaxis = dict(
                    title='Speed (m/s)'),)
)

fig = go.Figure(data=data, layout=layout)
py.iplot(fig)

In [12]:
from plotly.widgets import GraphWidget


The `IPython.html` package has been deprecated since IPython 4.0. You should import from `notebook` instead. `IPython.html.widgets` has moved to `ipywidgets`.


IPython.utils.traitlets has moved to a top-level traitlets package.




In [39]:
fixed_gradient = -5

data = [
    go.Scatter(x=t, y=gradient_speed_array(fixed_gradient, ZZ, g))
]

# A 2D scatter plot takes xaxis/yaxis directly on the layout; `scene` only applies to 3D plots
layout = go.Layout(
            xaxis = dict(
                title='Time (s)'),
            yaxis = dict(
                title='Speed (m/s)'),
)

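# Each button swaps in the curve for another gradient. The 'update' method takes
# [data changes, layout changes]; new data arrays are wrapped in a list, one entry per trace.
updatemenus=list([
    dict(
        buttons=list([   
            dict(label = '10',
                 method = 'update',
                 args = [{'x': [t], 'y': [gradient_speed_array(10, ZZ, g)]},
                         {'title': '10'}
                        ]
                ),
            dict(label = '-10',
                 method = 'update',
                 args = [{'x': [t], 'y': [gradient_speed_array(-10, ZZ, g)]},
                         {'title': '-10'}
                        ]
                )  
        ]),
        direction = 'left',
        pad = {'r': 10, 't': 10},
        showactive = True,
        type = 'buttons',
        x = 0.1,
        xanchor = 'left',
        y = 1.1,
        yanchor = 'top' 
    ),
])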

layout['updatemenus'] = updatemenus

fig = go.Figure(data=data, layout=layout)
py.iplot(fig)

In [24]:
graph.restyle({'x': t, 'y': gradient_speed_array(10, ZZ, g)}, indices=[0])
py.plot(graph)

PlotlyError: The `figure_or_data` positional argument must be either `dict`-like or `list`-like.

In [33]:
help(layout)

Help on Layout in module plotly.graph_objs.graph_objs object:

class Layout(PlotlyDict)
 |  Valid attributes for 'layout' at path [] under parents ():
 |  
 |      ['angularaxis', 'annotations', 'autosize', 'bargap', 'bargroupgap',
 |      'barmode', 'barnorm', 'boxgap', 'boxgroupgap', 'boxmode', 'calendar',
 |      'direction', 'dragmode', 'font', 'geo', 'height', 'hiddenlabels',
 |      'hiddenlabelssrc', 'hidesources', 'hoverlabel', 'hovermode', 'images',
 |      'legend', 'mapbox', 'margin', 'orientation', 'paper_bgcolor',
 |      'plot_bgcolor', 'radialaxis', 'scene', 'separators', 'shapes',
 |      'showlegend', 'sliders', 'smith', 'ternary', 'title', 'titlefont',
 |      'updatemenus', 'width', 'xaxis', 'yaxis']
 |  
 |  Run `<layout-object>.help('attribute')` on any of the above.
 |  '<layout-object>' is the object at []
 |  
 |  Method resolution order:
 |      Layout
 |      PlotlyDict
 |      builtins.dict
 |      PlotlyBase
 |      builtins.object
 |  
 |  Methods inherited

Next, we'll let the user plot their curve at a single fixed gradient, i.e. the maximum speed as a function of duration at that gradient.

In [42]:
from matplotlib import pyplot as plt
%matplotlib inline

In [14]:
def gradient_speed_array(gradient, max_speed_matrix, gradient_range):
    '''Return a numpy array of the maximum speed at the given gradient, one entry per duration'''
    # Column of max_speed_matrix whose gradient equals `gradient`
    # (raises IndexError if the gradient is not in gradient_range)
    gradient_index = np.where(gradient_range == gradient)
    speed_array = max_speed_matrix[:,gradient_index[0][0]]
    return speed_array
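
The column lookup can be tried out on a small made-up matrix. In the sketch below, speed_curve_at_gradient is a hypothetical variant of the function above (not part of the notebook) that raises a readable error when the requested gradient is outside the range; g_toy and ZZ_toy are invented values.

```python
import numpy as np

def speed_curve_at_gradient(gradient, max_speed_matrix, gradient_range):
    """Hypothetical variant of gradient_speed_array with an explicit error for unknown gradients."""
    matches = np.flatnonzero(np.asarray(gradient_range) == gradient)
    if matches.size == 0:
        raise ValueError(f"gradient {gradient} is not in gradient_range")
    # Rows are durations, columns are gradients, so pick one column.
    return max_speed_matrix[:, matches[0]]

g_toy = np.arange(-1, 2)               # gradients -1, 0, 1
ZZ_toy = np.array([[0.0, 0.0, 0.0],    # t = 0 s
                   [4.2, 3.9, 3.1],    # t = 1 s
                   [4.0, 3.8, 3.0]])   # t = 2 s

curve = speed_curve_at_gradient(0, ZZ_toy, g_toy)
print(curve.tolist())                  # [0.0, 3.9, 3.8]
```

Asking for a gradient that is not in the range, for example 5 here, raises a ValueError instead of the less obvious IndexError.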

In [74]:
fixed_gradient = -5

data = [
    go.Scatter(x=t, y=gradient_speed_array(fixed_gradient, ZZ, g))
]

fig = go.Figure(data=data)
py.iplot(fig)