# Preparing data for input into the per-activity hidden Markov models

In [1]:
from __future__ import division

%matplotlib inline
import pandas as pd
import thinkdsp
import thinkplot
import numpy as np

from magnitude import magnitude
from pipeline import preprocess, extract_features_with_sliding_window, learn

In [2]:
# Load each person's recording for each activity into a nested dict:
# data_dict[activity][person] -> DataFrame read from data/<person>_<activity>_long.csv
names = ['meg', 'ryan', 'dennis']
acts = ['walking', 'jogging', 'upstairs', 'downstairs']
data_dict = {activity: {} for activity in acts}
for name in names:
    for activity in acts:
        data_dict[activity][name] = pd.read_csv('data/{}_{}_long.csv'.format(name, activity))

In [3]:
data_dict["walking"]["meg"].head()

| Unnamed: 0 | x | y | z | time |
|---|---|---|---|---|
| 0 | 0.493804 | 2.130241 | 8.994417 | 1430067490092 |
| 1 | 0.135272 | 1.395221 | 7.765593 | 1430067490265 |
| 2 | -2.08535 | 2.178125 | 9.363723 | 1430067490445 |
| 3 | -2.765303 | 1.742979 | 9.216479 | 1430067490625 |
| 4 | -1.693299 | -0.641047 | 10.671555 | 1430067490805 |


In [4]:
feature_dict = {}
for activity, activity_data_dict in data_dict.iteritems():
    print "Activity: {}".format(activity)
    feature_dict[activity] = {}
    for person, person_data in activity_data_dict.iteritems():
        print "Person: {}".format(person)
        print "Person Data: \n {}".format(person_data.head())
        
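        # Preprocess the raw x/y/z readings, then cut the result into
        # overlapping sequences of 10 feature windows (a list per recording)
        a_norm = preprocess(person_data)
        obs = extract_features_with_sliding_window(a_norm, n_windows=10)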
        feature_dict[activity][person] = obs
    print

Activity: walking
Person: meg
Person Data: 
           x         y          z           time
0  0.493804  2.130241   8.994417  1430067490092
1  0.135272  1.395221   7.765593  1430067490265
2 -2.085350  2.178125   9.363723  1430067490445
3 -2.765303  1.742979   9.216479  1430067490625
4 -1.693299 -0.641047  10.671555  1430067490805
Person: dennis
Person Data: 
           x         y         z           time
0 -4.590283 -8.205527  1.548450  1430066134259
1 -4.669292 -8.217499  2.177526  1430066134439
2 -4.509479 -8.155848  2.326566  1430066134620
3 -4.317344 -8.166023  2.051831  1430066134800
4 -4.507683 -8.309077  2.343325  1430066134980
Person: ryan
Person Data: 
           x         y         z           time
0 -4.590283 -8.205527  1.548450  1430066134259
1 -4.669292 -8.217499  2.177526  1430066134439
2 -4.509479 -8.155848  2.326566  1430066134620
3 -4.317344 -8.166023  2.051831  1430066134800
4 -4.507683 -8.309077  2.343325  1430066134980

Activity: downstairs
Person: meg
Person Data

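Below are the features extracted for Dennis's downstairs recording. `extract_features_with_sliding_window` returns a list of overlapping observation sequences: each element is a matrix with one row per window (10 rows, since `n_windows=10`) and one column per feature. Let's call one such matrix $X$.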

In [5]:
feature_dict['downstairs']['dennis']

[array([[  0.33333333,  12.87909771],
        [  0.33333333,  36.88440099],
        [  0.33333333,  36.5647144 ],
        [  0.25      ,  21.97315802],
        [  0.2       ,  34.39416802],
        [  0.2       ,  28.47431305],
        [  0.25      ,  24.86148643],
        [  0.14285714,  21.78608675],
        [  0.09090909,  11.73977745],
        [  0.08333333,   3.53140031]]), array([[  0.33333333,  36.88440099],
        [  0.33333333,  36.5647144 ],
        [  0.25      ,  21.97315802],
        [  0.2       ,  34.39416802],
        [  0.2       ,  28.47431305],
        [  0.25      ,  24.86148643],
        [  0.14285714,  21.78608675],
        [  0.09090909,  11.73977745],
        [  0.08333333,   3.53140031],
        [  1.        ,   0.21724057]]), array([[  0.33333333,  36.5647144 ],
        [  0.25      ,  21.97315802],
        [  0.2       ,  34.39416802],
        [  0.2       ,  28.47431305],
        [  0.25      ,  24.86148643],
        [  0.14285714,  21.78608675],
        ...
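Each array in the output above repeats the previous one shifted up by one row, which suggests the observation sequences are built by sliding a fixed-length window, one step at a time, over a longer per-window feature matrix. The sketch below shows only that stacking step, assuming the per-window features have already been computed into an array of shape `(n_total_windows, n_features)`. `stack_sequences` is a hypothetical helper, not the implementation in `pipeline.py`, and it does not compute the features themselves.

```python
import numpy as np


def stack_sequences(window_features, n_windows=10):
    """Split a (n_total_windows, n_features) array into overlapping
    observation sequences of n_windows consecutive rows (stride 1).

    Hypothetical helper: it reproduces the shape of the output above,
    not necessarily how pipeline.extract_features_with_sliding_window works.
    """
    window_features = np.asarray(window_features)
    n_total = window_features.shape[0]
    if n_total < n_windows:
        # Not enough windows to form even one full sequence
        return []
    return [window_features[start:start + n_windows]
            for start in range(n_total - n_windows + 1)]


# Toy per-window features: 12 windows with 2 features each
features = np.arange(24, dtype=float).reshape(12, 2)
sequences = stack_sequences(features, n_windows=10)
# len(sequences) == 3; each element has shape (10, 2)
```

With stride 1, a recording with $N$ windows yields $N - 10 + 1$ sequences, so neighbouring sequences share 9 of their 10 rows.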

Check out the `learn` function in `pipeline.py`: that is where the real work happens. After training on the features, `learn` returns a dictionary of hidden Markov models, `hidden_markov_models`, with one model per activity. Each of these four models assigns a log-likelihood to a new observation sequence, measuring how well the sequence fits the activity that model represents. The model with the highest log-likelihood tells us which activity is most likely happening.
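The score a model reports is the log-likelihood $\log p(X \mid \lambda_a)$ of an observation sequence $X$ under the HMM $\lambda_a$ for activity $a$, which HMM libraries typically compute with the forward algorithm. The sketch below implements that algorithm in log space with NumPy for an HMM with diagonal-covariance Gaussian emissions. The parameter names (`startprob`, `transmat`, `means`, `variances`) are assumptions for illustration; this is not the code in `pipeline.py` or in any particular library.

```python
import numpy as np


def log_gaussian_diag(X, means, variances):
    """Log density of each row of X under each state's diagonal Gaussian.

    X: (T, d); means, variances: (n_states, d). Returns (T, n_states).
    """
    diff = X[:, None, :] - means[None, :, :]  # (T, n_states, d)
    return -0.5 * np.sum(np.log(2 * np.pi * variances)[None, :, :]
                         + diff ** 2 / variances[None, :, :], axis=2)


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def forward_log_likelihood(X, startprob, transmat, means, variances):
    """log p(X | model) for a Gaussian-emission HMM via the forward algorithm."""
    X = np.asarray(X, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    log_b = log_gaussian_diag(X, means, variances)  # emission log-probs, (T, n)
    log_A = np.log(transmat)                        # log transition matrix, (n, n)
    # alpha[j] = log p(x_1, ..., x_t, state_t = j)
    alpha = np.log(startprob) + log_b[0]
    for t in range(1, X.shape[0]):
        # Sum over the previous state i of alpha[i] + log A[i, j], then emit x_t
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_b[t]
    # Marginalize over the final state
    return logsumexp(alpha, axis=0)
```

Given one parameter set per activity, calling this function once per activity and keeping the largest value mirrors how the `.score` values are compared below.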

In [6]:
hidden_markov_models = learn(feature_dict)

Plug different activity and user keys into `feature_dict` to score how likely it is that each activity model generated a particular user's sequence for a particular activity. Right now, we are training and testing on the same dataset, so these scores do not tell us how well the models generalize to new data.
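One way to avoid training and testing on the same data is to hold out one person entirely: train on everyone else's sequences and score only the held-out person's. The sketch below splits a dictionary nested like `feature_dict` (`feature_dict[activity][person]` maps to a list of sequences). `split_by_person` is a hypothetical helper, and the sketch assumes `learn` accepts any dictionary with that same nesting.

```python
def split_by_person(feature_dict, held_out):
    """Split feature_dict[activity][person] into (train, test) dicts.

    train keeps every person except `held_out`; test keeps only `held_out`.
    Activities with no data for `held_out` get an empty test entry.
    """
    train, test = {}, {}
    for activity, per_person in feature_dict.items():
        train[activity] = {person: seqs for person, seqs in per_person.items()
                           if person != held_out}
        test[activity] = ({held_out: per_person[held_out]}
                          if held_out in per_person else {})
    return train, test


# Toy example with placeholder strings standing in for observation sequences
toy = {'walking': {'meg': ['w_meg'], 'ryan': ['w_ryan']},
       'jogging': {'meg': ['j_meg'], 'ryan': ['j_ryan']}}
train, test = split_by_person(toy, held_out='ryan')
# train == {'walking': {'meg': ['w_meg']}, 'jogging': {'meg': ['j_meg']}}
# test  == {'walking': {'ryan': ['w_ryan']}, 'jogging': {'ryan': ['j_ryan']}}
```

Under that assumption, the models would be trained with `learn(train)` and then evaluated only on the sequences in `test`, repeating once per person.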

In [7]:
for activity in acts:
    print activity
    print hidden_markov_models[activity].score(feature_dict['downstairs']['dennis'][0])

walking
-31.4626921909
jogging
-36.0912030657
upstairs
-43.5830694044
downstairs
-30.9317716329


To predict the activity for unseen data, all we have to do is score the new sequence with every model:
    
    # unseen_obs: one observation sequence of shape (n_windows, n_features),
    # e.g. unseen_obs = feature_dict[true_activity][unseen_user][i]
    for activity in acts:
        print activity
        # Log-likelihood of the unseen sequence under this activity's model
        print hidden_markov_models[activity].score(unseen_obs)

Then check out which log likelihood is highest.  That will be the predicted activity.
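Picking the highest log-likelihood is an argmax over the models. A minimal sketch, assuming each model exposes a `score(obs)` method that returns a log-likelihood, as the models returned by `learn` do above. `predict_activity` and the stand-in `FixedScoreModel` class are hypothetical; the stand-ins simply replay the scores printed by `In [7]`.

```python
def predict_activity(models, obs):
    """Return (best_activity, scores) for one observation sequence.

    models: dict mapping activity name -> object whose .score(obs)
    returns a log-likelihood. obs: array of shape (n_windows, n_features).
    """
    scores = {activity: model.score(obs) for activity, model in models.items()}
    best = max(scores, key=scores.get)  # activity with the highest log-likelihood
    return best, scores


class FixedScoreModel(object):
    """Hypothetical stand-in model that always returns the same log-likelihood."""

    def __init__(self, log_likelihood):
        self.log_likelihood = log_likelihood

    def score(self, obs):
        return self.log_likelihood


# Log-likelihoods printed above for feature_dict['downstairs']['dennis'][0]
logged = {'walking': -31.4626921909, 'jogging': -36.0912030657,
          'upstairs': -43.5830694044, 'downstairs': -30.9317716329}
stub_models = {activity: FixedScoreModel(ll) for activity, ll in logged.items()}
best, scores = predict_activity(stub_models, obs=None)
# best == 'downstairs'
```

For that sequence the prediction matches the true activity, but the margin over `walking` is only about 0.53 in log-likelihood.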
        