# Period Searcher
This is an exploratory test of methods for finding the period of an activity shapelet.

In [2]:
import matplotlib.pyplot as plt
%matplotlib qt
import pandas as pd

## Generate activity tables
For now, we focus only on the x-axis and take a sample to examine in detail.

In [3]:
walking = pd.read_csv('../data/107/13_treadmill_3mph_0%.csv', names=["tick", "timestamp",
                                             "activity", "x", "y",
                                             "z", "user"], index_col=False)
sample = walking.loc[5500:6000,'x']

Normalize the sample index so that it starts at 0.

In [4]:
# Turn the Series into a one-column DataFrame and renumber the rows from 0
sample = sample.to_frame(name='x')
sample = sample.reset_index(drop=True)
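`.loc[5500:6000]` slices by label and includes both endpoints, so the sample has 501 rows, renumbered 0 to 500. The sketch below checks this on a synthetic stand-in for the recording: the values are made up, and only the 21,501-row default index mirrors the CSV loaded above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for walking['x']: default RangeIndex 0..21500, made-up values
x = pd.Series(np.linspace(-1.8, -0.3, 21501), name='x')

sample = x.loc[5500:6000]                           # label slicing keeps both endpoints
sample = sample.to_frame().reset_index(drop=True)   # discard old labels, restart at 0
print(len(sample), sample.index[0], sample.index[-1])  # 501 0 500
```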

In [5]:
plt.close()
plt.plot(sample)


## Threshold analysis
First, we try using a threshold to find the period peaks. This method assumes that each shapelet contains a consistent peak somewhere within it, so it may be problematic if the activity does not produce an obvious 'marking' peak.
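A minimal sketch of the threshold idea, assuming the marking peak points downward (the threshold line drawn below sits at -1.47, under the x-axis mean of about -0.95). `crossing_period` is a hypothetical helper and the sine wave is synthetic; on real data, noise around the threshold would add spurious crossings.

```python
import numpy as np

def crossing_period(signal, threshold):
    """Mean spacing (in samples) between downward crossings of `threshold`.

    A downward crossing is a sample below the threshold whose predecessor
    is at or above it. Returns NaN if there are fewer than two crossings.
    """
    signal = np.asarray(signal, dtype=float)
    below = signal < threshold
    # True where the signal goes from >= threshold to < threshold
    crossings = np.flatnonzero(~below[:-1] & below[1:]) + 1
    if crossings.size < 2:
        return float('nan')
    return float(np.diff(crossings).mean())

# Synthetic x-like signal with a period of 100 samples
t = np.arange(1000)
wave = -0.95 + 0.4 * np.sin(2 * np.pi * t / 100)
print(crossing_period(wave, -1.2))  # 100.0
```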

In [6]:
walking.describe()

|       | tick          | timestamp        | activity | x         | y        | z        | user  |
|-------|---------------|------------------|----------|-----------|----------|----------|-------|
| count | 21501.0       | 21501.0          | 21501.0  | 21501.0   | 21501.0  | 21501.0  | 21501.0 |
| mean  | 776561.0      | 20130810000000.0 | 13.0     | -0.948345 | 0.246826 | 0.145762 | 107.0 |
| std   | 6206.948405   | 104.9532         | 0.0      | 0.236696  | 0.20578  | 0.148296 | 0.0   |
| min   | 765811.0      | 20130810000000.0 | 13.0     | -1.836    | -0.22    | -0.279   | 107.0 |
| 25%   | 771186.0      | 20130810000000.0 | 13.0     | -1.114    | 0.117    | 0.056    | 107.0 |
| 50%   | 776561.0      | 20130810000000.0 | 13.0     | -0.947    | 0.214    | 0.12     | 107.0 |
| 75%   | 781936.0      | 20130810000000.0 | 13.0     | -0.786    | 0.331    | 0.226    | 107.0 |
| max   | 787311.0      | 20130810000000.0 | 13.0     | -0.334    | 1.226    | 0.889    | 107.0 |


In [7]:
walking_cut = walking.loc[2200:19000,'x']
walking_cut.describe()

count    16801.000000
mean        -0.948915
std          0.258736
min         -1.836000
25%         -1.147000
50%         -0.918000
75%         -0.762000
max         -0.334000
Name: x, dtype: float64

In [8]:
plt.close()
plt.hist(walking_cut, bins=1000)


In [9]:
plt.close()
builder = walking_cut
plt.plot(builder, 'b')
plt.plot([builder.first_valid_index(), builder.last_valid_index()], [-1.47, -1.47], 'g')


I am not sure this method will work well with a single threshold: there appears to be too much variability in the x-axis acceleration.

In [10]:
plt.close()
plt.plot(sample)


## Testing PeakUtils


In [28]:
import peakutils
# thres is normalized to the signal's range, so a negative value keeps every local maximum
indexes = peakutils.indexes(sample['x'].values, thres=-3, min_dist=.5)
peaks = sample.iloc[indexes]
plt.close()
plt.plot(sample)
plt.plot(peaks, "r+")


In [51]:
flipped = sample * -1
flipped_indexes = peakutils.indexes(flipped['x'].values, thres=.8)
flipped_peaks = flipped.iloc[flipped_indexes].copy()  # copy so new columns can be added without SettingWithCopyWarning
plt.close()
plt.plot(flipped)
plt.plot(flipped_peaks, 'r+')

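The two cells above call `peakutils.indexes` with very different thresholds. In peakutils, `thres` is a normalized threshold (by default scaled onto the range between the signal's minimum and maximum), so `thres=-3` puts the cutoff below every sample and keeps all local maxima, while `thres=.8` keeps only peaks in the top fifth of the range. Flipping the sign turns the troughs of `x` into peaks. The sketch below is a simplified, hypothetical stand-in for that behaviour, written in NumPy so it runs without peakutils; unlike the library, it ignores plateaus and `min_dist`.

```python
import numpy as np

def simple_peaks(y, thres=0.3):
    """Strict local maxima of `y` above a normalized threshold.

    Hypothetical simplification of peakutils.indexes: `thres` is mapped from
    [0, 1] onto [min(y), max(y)]; plateaus and min_dist are not handled.
    """
    y = np.asarray(y, dtype=float)
    cutoff = thres * (y.max() - y.min()) + y.min()
    dy = np.diff(y)
    rising = np.hstack([0.0, dy]) > 0   # sample is higher than the one before it
    falling = np.hstack([dy, 0.0]) < 0  # sample is higher than the one after it
    return np.flatnonzero(rising & falling & (y > cutoff))

y = np.array([0, 2, 1, 5, 1, 3, 0])
print(simple_peaks(y, -3))    # [1 3 5]  every local maximum
print(simple_peaks(y, 0.5))   # [3 5]    cutoff is 2.5
print(simple_peaks(-y, 0.5))  # [2 4]    troughs of y, found by flipping
```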

In [52]:
flipped_peaks['peak'] = flipped_peaks.index
flipped_peaks.describe()



|       | x        | peak       |
|-------|----------|------------|
| count | 5.0      | 5.0        |
| mean  | 1.6726   | 204.4      |
| std   | 0.077854 | 155.271697 |
| min   | 1.589    | 9.0        |
| 25%   | 1.607    | 106.0      |
| 50%   | 1.677    | 203.0      |
| 75%   | 1.71     | 302.0      |
| max   | 1.78     | 402.0      |


In [86]:
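flipped_peaks['shift'] = flipped_peaks.index
# Pair each peak position with the position of the previous peak
diff = flipped_peaks.loc[:, ['shift']].shift(1)
diff = diff.join(flipped_peaks.loc[:, ['peak']])
diff = diff.dropna()  # the first peak has no predecessor
flipped_peaks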



|     | x     | peak | shift |
|-----|-------|------|-------|
| 9   | 1.607 | 9    | 9     |
| 106 | 1.677 | 106  | 106   |
| 203 | 1.71  | 203  | 203   |
| 302 | 1.78  | 302  | 302   |
| 402 | 1.589 | 402  | 402   |


In [90]:
spacing = diff['peak'] - diff['shift']  # distance from each peak to the previous one
spacing.mean()

98.25
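The shift-and-join steps above amount to taking the difference between consecutive peak positions. With the five peaks from the table, the same arithmetic in NumPy is:

```python
import numpy as np

# Peak positions from the flipped-sample table above
peaks = np.array([9, 106, 203, 302, 402])
spacings = np.diff(peaks)          # distance between consecutive peaks
print(spacings, spacings.mean())   # [ 97  97  99 100] 98.25
```

The result is in samples; expressing it in seconds would need the sampling rate, which this notebook does not use.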

This gives a reasonably accurate result for the sample. The variability over the entire recording may be a problem, though.

In [102]:
def estimate_period(df, threshold):
    """Mean spacing (in samples) between consecutive peaks in column 'x'."""
    indexes = peakutils.indexes(df['x'].values, thres=threshold)
    # Differences between consecutive peak positions; the leading NaN is dropped
    return pd.Series(indexes).diff().dropna().mean()

estimate_period(sample * -1, .8)

98.25
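`estimate_period` returns a single mean over its whole input, which hides the variability concern raised above. One hypothetical way to check it is to estimate the period in fixed-size windows and look at the spread. The sketch below uses synthetic data whose period changes from 100 to 120 samples, and a NumPy stand-in for `peakutils.indexes` (`local_maxima`, hypothetical) so it runs on its own; the window size of 500 is an arbitrary choice, close to the size of `sample`.

```python
import numpy as np

def local_maxima(y, thres):
    """Hypothetical stand-in for peakutils.indexes: strict local maxima above
    a threshold normalized to [min(y), max(y)]."""
    y = np.asarray(y, dtype=float)
    cutoff = thres * (y.max() - y.min()) + y.min()
    dy = np.diff(y)
    mask = (np.hstack([0.0, dy]) > 0) & (np.hstack([dy, 0.0]) < 0) & (y > cutoff)
    return np.flatnonzero(mask)

def windowed_periods(y, window, thres=0.8):
    """Mean peak spacing in each non-overlapping window of `window` samples.

    Windows with fewer than two peaks yield NaN; a trailing partial window is skipped.
    """
    periods = []
    for start in range(0, len(y) - window + 1, window):
        idx = local_maxima(y[start:start + window], thres)
        periods.append(np.diff(idx).mean() if idx.size > 1 else np.nan)
    return np.array(periods)

# Synthetic signal: period 100 for 1000 samples, then period 120 for 1200 samples
t1, t2 = np.arange(1000), np.arange(1200)
y = np.concatenate([np.sin(2 * np.pi * t1 / 100), np.sin(2 * np.pi * t2 / 120)])
print(windowed_periods(y, 500))  # [100. 100. 120. 120.]
```

A large spread between windows on the real recording would suggest that a single global estimate is not enough.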