<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#TimeSeries" data-toc-modified-id="TimeSeries-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>TimeSeries</a></span><ul class="toc-item"><li><span><a href="#time-property" data-toc-modified-id="time-property-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>time property</a></span></li><li><span><a href="#data-property" data-toc-modified-id="data-property-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>data property</a></span><ul class="toc-item"><li><span><a href="#Data-as-n-dimensional-arrays" data-toc-modified-id="Data-as-n-dimensional-arrays-1.2.1"><span class="toc-item-num">1.2.1&nbsp;&nbsp;</span>Data as n-dimensional arrays</a></span></li></ul></li><li><span><a href="#time_info-property" data-toc-modified-id="time_info-property-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>time_info property</a></span></li><li><span><a href="#data_info-property" data-toc-modified-id="data_info-property-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>data_info property</a></span></li></ul></li></ul></div>

TimeSeries
==========
Although time series are most often associated with finance, forecasting, and even psychology, the concept applies equally well to biomechanics. In fact, biomechanical data is almost always expressed as series of data in time: for example, the trajectory of a marker, an electromyographic signal, or the forces and moments measured by force plates.

In Python, the Pandas DataFrame is well suited to time series analysis: the time of each sample can be assigned to the DataFrame's index, and the different signals to its columns. However, DataFrames are suboptimal for biomechanics processing in three respects:

1. We often have to deal not just with series of data points, but also with **series of vectors** (e.g., the trajectory of a marker) or even **series of matrices** (e.g., a series of transformation matrices expressing the trajectory of a rigid body in space). In these cases, fitting multiple vectors and matrices into a two-dimensional DataFrame is awkward.

2. We often have different measures with different **units**, and there is no practical way with a DataFrame to associate units with columns, other than keeping a separate dictionary.

3. We often deal with **events** (e.g., heel contact in gait, hand contact in wheelchair propulsion), and there is no practical way with a DataFrame to keep a list of those events with their corresponding times, other than keeping a separate list.
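To make the first point concrete, here is a minimal NumPy sketch of the array shapes involved. The variable names and the sample count are hypothetical and serve only as illustration.

```python
import numpy as np

n_samples = 5  # hypothetical number of samples

# Series of scalars (e.g., one EMG channel): shape (N,)
emg = np.zeros(n_samples)

# Series of vectors (e.g., a marker trajectory): shape (N, 3)
marker = np.zeros((n_samples, 3))

# Series of 4x4 transformation matrices: shape (N, 4, 4).
# Each sample is initialized to the identity transform.
transforms = np.tile(np.eye(4), (n_samples, 1, 1))

# Storing the same series in a two-dimensional table means
# flattening every matrix into 16 separate columns.
flat = transforms.reshape(n_samples, -1)

print(emg.shape, marker.shape, transforms.shape, flat.shape)
```

The flattened version holds the same numbers, but the matrix structure then has to be rebuilt by hand before any matrix operation.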

To overcome these limitations, `ktk` provides the `TimeSeries` and `TimeSeriesEvent` classes, which are the basis of every `ktk` module. These Python classes are largely inspired by MATLAB's `timeseries`, `tsdata.event` and `tscollection` classes. Note that TimeSeries remain compatible with Pandas DataFrames through the two conversion methods `from_dataframe` and `to_dataframe`.

In [10]:
import ktk
import pandas as pd
import numpy as np

A TimeSeries consists mainly of five properties: `time`, `time_info`, `data`, `data_info`, and `events`. To better understand these properties, we will begin by opening some columns of a CSV file as a Pandas DataFrame. This file is a recording made with the SmartWheel instrumented wheelchair wheel.

In [36]:
# Read some columns
df = pd.read_csv('data/timeseries/smartwheel.csv',
                 usecols=[18, 19, 20, 21, 22, 23],
                 names=['Fx', 'Fy', 'Fz', 'Mx', 'My', 'Mz'])

# The sampling rate is 240 Hz, so assign i_sample / 240 to the DataFrame's index
df.index = np.arange(df.shape[0]) / 240
df

Unnamed: 0,Fx,Fy,Fz,Mx,My,Mz
0.000000,1.27,-0.89,-0.20,-0.03,0.05,-0.03
0.004167,0.49,-0.83,-0.51,0.02,-0.01,-0.13
0.008333,0.00,-0.78,-0.51,0.04,-0.07,-0.18
0.012500,-0.13,-0.93,-0.41,0.03,-0.16,-0.18
0.016667,-0.02,-0.89,0.00,0.01,-0.21,-0.13
...,...,...,...,...,...,...
65.566667,0.52,-0.62,1.33,0.17,0.21,-0.15
65.570833,0.47,-0.52,1.63,0.11,0.13,-0.13
65.575000,0.43,-0.23,2.04,0.02,0.08,-0.03
65.579167,0.51,0.06,2.35,-0.05,0.12,0.03


Now, we can convert this DataFrame to a TimeSeries to see how the data will look in the TimeSeries.

In [37]:
ts = ktk.TimeSeries().from_dataframe(df)
ts

TimeSeries with attributes:
           data: <dict with 6 entries>,
      data_info: <dict with 0 entries>,
         events: <list of 0 items>
           time: <array of shape (15741,)>,
      time_info: <dict with 1 entries>,

time property
-------------

This is the time vector, which gives the time at which each sample was recorded. It is a one-dimensional NumPy array of length N, where N is the number of samples.

In [38]:
ts.time

array([0.00000000e+00, 4.16666667e-03, 8.33333333e-03, ...,
       6.55750000e+01, 6.55791667e+01, 6.55833333e+01])
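As a rough check of the time vector shown above, the following sketch rebuilds it from the sampling rate alone. It assumes uniform sampling at 240 Hz, as in the DataFrame cell earlier, and it does not use `ktk`.

```python
import numpy as np

sampling_rate = 240  # Hz, as stated for this recording
n_samples = 15741    # length of the time vector shown above

# Sample i is recorded at time i / sampling_rate
time = np.arange(n_samples) / sampling_rate

print(time[:3])   # first three instants
print(time[-1])   # last instant, (n_samples - 1) / sampling_rate
```

The last instant is (15741 - 1) / 240, or about 65.5833 s, which agrees with the final value of `ts.time`.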

data property
-------------

This is a dictionary where each key names one data series, and where each value is an array with N as its first dimension. Here, the DataFrame had 6 columns. Each column becomes a data key, and each key holds the column's values as a one-dimensional NumPy array of length N.

In [43]:
ts.data

{'Fx': array([1.27, 0.49, 0.  , ..., 0.43, 0.51, 0.58]),
 'Fy': array([-0.89, -0.83, -0.78, ..., -0.23,  0.06,  0.34]),
 'Fz': array([-0.2 , -0.51, -0.51, ...,  2.04,  2.35,  2.65]),
 'Mx': array([-0.03,  0.02,  0.04, ...,  0.02, -0.05, -0.09]),
 'My': array([ 0.05, -0.01, -0.07, ...,  0.08,  0.12,  0.17]),
 'Mz': array([-0.03, -0.13, -0.18, ..., -0.03,  0.03,  0.08])}

### Data as n-dimensional arrays ###

Up to now, there is no true benefit to using a TimeSeries instead of a DataFrame to process the forces and moments measured by the instrumented wheel. Let's see how the TimeSeries addresses point 1: **dealing with series of vectors and matrices**.

In reality, Fx, Fy and Fz are components of a single entity, a force vector. Similarly, Mx, My and Mz are components of a single entity, a moment vector. Let's see what happens if we name the DataFrame's columns differently.

In [47]:
df.columns = ['Forces[0]', 'Forces[1]', 'Forces[2]', 'Moments[0]', 'Moments[1]', 'Moments[2]']
df

Unnamed: 0,Forces[0],Forces[1],Forces[2],Moments[0],Moments[1],Moments[2]
0.000000,1.27,-0.89,-0.20,-0.03,0.05,-0.03
0.004167,0.49,-0.83,-0.51,0.02,-0.01,-0.13
0.008333,0.00,-0.78,-0.51,0.04,-0.07,-0.18
0.012500,-0.13,-0.93,-0.41,0.03,-0.16,-0.18
0.016667,-0.02,-0.89,0.00,0.01,-0.21,-0.13
...,...,...,...,...,...,...
65.566667,0.52,-0.62,1.33,0.17,0.21,-0.15
65.570833,0.47,-0.52,1.63,0.11,0.13,-0.13
65.575000,0.43,-0.23,2.04,0.02,0.08,-0.03
65.579167,0.51,0.06,2.35,-0.05,0.12,0.03


Now we convert this DataFrame to a TimeSeries:

In [48]:
ts = ktk.TimeSeries().from_dataframe(df)
ts.data

{'Forces': array([[ 1.27, -0.89, -0.2 ],
        [ 0.49, -0.83, -0.51],
        [ 0.  , -0.78, -0.51],
        ...,
        [ 0.43, -0.23,  2.04],
        [ 0.51,  0.06,  2.35],
        [ 0.58,  0.34,  2.65]]),
 'Moments': array([[-0.03,  0.05, -0.03],
        [ 0.02, -0.01, -0.13],
        [ 0.04, -0.07, -0.18],
        ...,
        [ 0.02,  0.08, -0.03],
        [-0.05,  0.12,  0.03],
        [-0.09,  0.17,  0.08]])}
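The grouping seen in this output can be mimicked with plain NumPy. The sketch below is only an illustration of the idea, not `ktk`'s actual implementation: the function name `group_indexed_columns` is hypothetical, and it handles a single bracketed index only.

```python
import re
import numpy as np

def group_indexed_columns(columns):
    """Group 1-D columns named 'Name[i]' into (N, M) arrays keyed by 'Name'."""
    pattern = re.compile(r"^(?P<key>.+)\[(?P<index>\d+)\]$")
    indexed = {}  # key -> {index: column values}
    grouped = {}  # final result
    for name, values in columns.items():
        match = pattern.match(name)
        if match is None:
            # Plain column name: keep it as a 1-D array
            grouped[name] = np.asarray(values)
        else:
            parts = indexed.setdefault(match["key"], {})
            parts[int(match["index"])] = np.asarray(values)
    for key, parts in indexed.items():
        # Stack the components side by side, ordered by their index
        grouped[key] = np.column_stack([parts[i] for i in sorted(parts)])
    return grouped

# First three samples of the recording shown above
columns = {
    "Forces[0]": [1.27, 0.49, 0.00],
    "Forces[1]": [-0.89, -0.83, -0.78],
    "Forces[2]": [-0.20, -0.51, -0.51],
    "Moments[0]": [-0.03, 0.02, 0.04],
    "Moments[1]": [0.05, -0.01, -0.07],
    "Moments[2]": [-0.03, -0.13, -0.18],
}
data = group_indexed_columns(columns)
print(data["Forces"].shape)
```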

We see that instead of being split into 6 separate components, the three components of the forces and of the moments are now grouped into two Nx3 arrays. This can greatly simplify subsequent data processing. For example, let's see how we can create a new data key in the TimeSeries that holds the magnitude of the force vector, $F_{tot} = \sqrt{F_x^2 + F_y^2 + F_z^2}$:

In [49]:
ts.data['Ftot'] = np.sqrt(np.sum(ts.data['Forces']**2, axis=1))
ts.data

{'Forces': array([[ 1.27, -0.89, -0.2 ],
        [ 0.49, -0.83, -0.51],
        [ 0.  , -0.78, -0.51],
        ...,
        [ 0.43, -0.23,  2.04],
        [ 0.51,  0.06,  2.35],
        [ 0.58,  0.34,  2.65]]),
 'Moments': array([[-0.03,  0.05, -0.03],
        [ 0.02, -0.01, -0.13],
        [ 0.04, -0.07, -0.18],
        ...,
        [ 0.02,  0.08, -0.03],
        [-0.05,  0.12,  0.03],
        [-0.09,  0.17,  0.08]]),
 'Ftot': array([1.56364958, 1.09045862, 0.93193347, ..., 2.09747467, 2.40545214,
        2.73395318])}
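The same magnitude can also be obtained with `np.linalg.norm`. The short sketch below uses only the first three force samples shown above and compares both forms.

```python
import numpy as np

# First three force samples from the recording (N x 3)
forces = np.array([[1.27, -0.89, -0.20],
                   [0.49, -0.83, -0.51],
                   [0.00, -0.78, -0.51]])

# Explicit form, as in the cell above
ftot_explicit = np.sqrt(np.sum(forces**2, axis=1))

# Equivalent form: Euclidean norm of each row
ftot_norm = np.linalg.norm(forces, axis=1)

print(np.allclose(ftot_explicit, ftot_norm))
```

Both forms reduce along `axis=1`, so each Nx3 array yields one value per sample.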

This new data will appear as a new DataFrame column, should we want to continue the data processing using Pandas.

In [50]:
df = ts.to_dataframe()
df

Unnamed: 0,Forces[0],Forces[1],Forces[2],Moments[0],Moments[1],Moments[2],Ftot
0.000000,1.27,-0.89,-0.20,-0.03,0.05,-0.03,1.563650
0.004167,0.49,-0.83,-0.51,0.02,-0.01,-0.13,1.090459
0.008333,0.00,-0.78,-0.51,0.04,-0.07,-0.18,0.931933
0.012500,-0.13,-0.93,-0.41,0.03,-0.16,-0.18,1.024646
0.016667,-0.02,-0.89,0.00,0.01,-0.21,-0.13,0.890225
...,...,...,...,...,...,...,...
65.566667,0.52,-0.62,1.33,0.17,0.21,-0.15,1.556824
65.570833,0.47,-0.52,1.63,0.11,0.13,-0.13,1.774317
65.575000,0.43,-0.23,2.04,0.02,0.08,-0.03,2.097475
65.579167,0.51,0.06,2.35,-0.05,0.12,0.03,2.405452
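The column names in this table suggest the reverse of the earlier grouping: each Nx3 array is spread back over indexed columns. A minimal sketch of that idea follows. It is not `ktk`'s implementation, the name `flatten_to_columns` is hypothetical, and it supports one- and two-dimensional arrays only.

```python
import numpy as np

def flatten_to_columns(data):
    """Spread (N, M) arrays over columns named 'key[i]'; keep 1-D arrays as is."""
    columns = {}
    for key, values in data.items():
        values = np.asarray(values)
        if values.ndim == 1:
            columns[key] = values
        elif values.ndim == 2:
            for i in range(values.shape[1]):
                columns[f"{key}[{i}]"] = values[:, i]
        else:
            raise ValueError(f"'{key}' has more than two dimensions")
    return columns

data = {
    "Forces": np.array([[1.27, -0.89, -0.20],
                        [0.49, -0.83, -0.51]]),
    "Ftot": np.array([1.56364958, 1.09045862]),
}
columns = flatten_to_columns(data)
print(list(columns))
```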


time_info property
------------------

This property associates metadata with the time vector. It is a simple dictionary where each key is the name of a metadata item. By default, `time_info` includes the `Unit` key, whose value is `s`. Any other metadata can be added by adding new keys to `time_info`.

In [51]:
ts.time_info

{'Unit': 's'}

data_info property
------------------

Similarly, `data_info` associates metadata with data. This is especially useful for addressing point 2: **dealing with several units**. This property is a dictionary of dictionaries. Each key corresponds to one of the data keys, and each enclosed dictionary provides metadata for this specific data.

To ease the management of `data_info`, one can use the TimeSeries' `add_data_info` method.

In [52]:
ts.add_data_info('Forces', 'Unit', 'N')
ts.add_data_info('Moments', 'Unit', 'Nm')
ts.add_data_info('Ftot', 'Unit', 'N')
ts.data_info

{'Forces': {'Unit': 'N'}, 'Moments': {'Unit': 'Nm'}, 'Ftot': {'Unit': 'N'}}
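The dictionary-of-dictionaries layout can be reproduced with plain Python. The helper `add_info` below is hypothetical and only imitates what the output above suggests. It is not the `ktk` method itself.

```python
def add_info(info, data_key, info_key, value):
    """Set info[data_key][info_key] = value, creating the inner dict if needed."""
    info.setdefault(data_key, {})[info_key] = value

data_info = {}
add_info(data_info, "Forces", "Unit", "N")
add_info(data_info, "Moments", "Unit", "Nm")
add_info(data_info, "Ftot", "Unit", "N")

# Look up a unit, falling back to an empty string when none is recorded
forces_unit = data_info.get("Forces", {}).get("Unit", "")
missing_unit = data_info.get("Power", {}).get("Unit", "")

print(data_info)
```

Chaining `get` calls avoids a `KeyError` for data keys that have no metadata yet.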

Like Pandas DataFrames, TimeSeries provides a `plot` method to quickly plot its contents using matplotlib. This method uses `data_info`, for example, to write the data units on the plots.

In [1]:
ts.plot()
