# Activity Classification from Raw Sensor Data

This project describes how to take raw sensor data from PowerSense (an iOS app), break it into uniformly sized samples, featurize those samples, and classify activities based on the chosen features.

## Generate samples from raw data

These two snippets use _pandas_ to read the CSV files containing raw sensor data and call `get_samples` to split each data frame into 10-second samples (1000 data points at 100 Hz). The samples are stored in a dictionary keyed by activity and are featurized in the next section.

In [1]:
def get_samples(data_points, points_per_sample):
    return [data_points[i:i+points_per_sample] for i in range(0, len(data_points), points_per_sample)]

In [33]:
import csv
import math
import pandas as pd
import matplotlib.pyplot as plt

POINTS_PER_SAMPLE = 1000

activities = ["sitting", "walking", "walking_holding_phone", "stairs", "car"]

activity_samples = {}    # Mapping from activity to list of 10 second data frames
for activity in activities:
    activity_samples[activity] = get_samples(pd.read_csv(activity + '.csv'), POINTS_PER_SAMPLE)
    print("Found %d samples for %s" % (len(activity_samples[activity]), activity))

Found 30 samples for sitting
Found 30 samples for walking
Found 30 samples for walking_holding_phone
Found 30 samples for stairs
Found 30 samples for car
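To make the splitting step concrete, here is a minimal sketch on a plain Python list; the same slicing applies to a pandas data frame. The toy sizes (25 points, 10 per sample) are made up for illustration, and `drop_partial` is a hypothetical helper that is not part of the notebook. It shows that the last sample comes out shorter whenever the recording length is not a multiple of the sample size.

```python
def get_samples(data_points, points_per_sample):
    # Same slicing as the notebook's helper: consecutive, non-overlapping chunks.
    return [data_points[i:i + points_per_sample]
            for i in range(0, len(data_points), points_per_sample)]


def drop_partial(samples, points_per_sample):
    # Hypothetical helper: keep only the full-length samples.
    return [s for s in samples if len(s) == points_per_sample]


toy_points = list(range(25))            # stand-in for 25 rows of sensor data
samples = get_samples(toy_points, 10)   # the final chunk holds the leftover points
full_samples = drop_partial(samples, 10)

print("Sample lengths: %s" % [len(s) for s in samples])
print("Full samples kept: %d" % len(full_samples))
```

Dropping the partial sample is one option among several; since every activity above yields exactly 30 samples, the recordings here appear to divide evenly and the issue does not arise.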


## Featurize samples

Below, I take each sample generated above and featurize it. The features I chose are the mean and variance of each field provided by the sensors (12 fields, so 24 features). I also tried using a Fourier transform to add the strongest frequency component and its magnitude for each sensor field, but this hurt classifier accuracy, so that code is commented out.

By the end of this snippet, the variables _X_ and _y_ contain the features and labels of the data. Also, _X\_train_, _y\_train_, _X\_test_, and _y\_test_ are provided for manual scoring.

In [36]:
import numpy.fft as fft
import numpy as np

ACCELERATION_FIELDS = ['user_acc_x', 'user_acc_y', 'user_acc_z']
ATTITUDE_FIELDS = ['attitude_roll', 'attitude_pitch', 'attitude_yaw']
ROTATION_RATE_FIELDS = ['rotation_rate_x', 'rotation_rate_y', 'rotation_rate_z']
GRAVITY_FIELDS = ['gravity_x', 'gravity_y', 'gravity_z']

FIELDS = ACCELERATION_FIELDS + ATTITUDE_FIELDS + ROTATION_RATE_FIELDS + GRAVITY_FIELDS

activity_features = {}    # Mapping from activity to list of samples' features
for activity in activities:
    activity_features[activity] = []
    for sample in activity_samples[activity]:
        # Mean and variance of every field (all means first, then all variances)
        sample_features = [sample[field].mean() for field in FIELDS]
        sample_features += [sample[field].var() for field in FIELDS]
        
        # FFT features: index and magnitude of the strongest frequency bin (disabled: they hurt accuracy)
        #sample_features += [np.argmax(np.abs(fft.fft(sample[field]))) for field in FIELDS]
        #sample_features += [max(np.abs(fft.fft(sample[field]))) for field in FIELDS]
        
        # Add all the features
        activity_features[activity].append(sample_features)

X = []
X_train = []
X_test = []
for activity in activities:
    X += activity_features[activity]
    X_train += activity_features[activity][:26]
    X_test += activity_features[activity][26:]

# Generate labels: activity index 0-4, assuming 30 samples per activity (26 train, 4 test)
y = [int(math.floor(float(i) / 150 * 5)) for i in range(0, 150)]
y_train = [int(math.floor(float(i) / 130 * 5)) for i in range(0, 130)]
y_test = [int(math.floor(float(i) / 20 * 5)) for i in range(0, 20)]

print("Featurization complete.")


Featurization complete.
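The featurization and labeling steps can also be sketched without the hard-coded sample counts. The snippet below is only an illustration under stated assumptions: it uses random data in place of the sensor CSVs, a shortened field list, and two hypothetical helpers (`featurize` and `build_labels`) that do not appear in the notebook.

```python
import numpy as np
import pandas as pd

FIELDS = ['user_acc_x', 'user_acc_y', 'user_acc_z']   # shortened list for the sketch


def featurize(sample, fields):
    # All means first, then all variances, matching the notebook's ordering.
    means = sample[fields].mean()
    variances = sample[fields].var()   # pandas default: sample variance (ddof=1)
    return list(means) + list(variances)


def build_labels(features_by_activity, activities):
    # One integer label per sample, derived from the actual sample counts.
    labels = []
    for index, activity in enumerate(activities):
        labels += [index] * len(features_by_activity[activity])
    return labels


rng = np.random.RandomState(0)
activities = ['sitting', 'walking']
features_by_activity = {}
for activity in activities:
    # Four random 100-row frames stand in for the real 10-second samples.
    frames = [pd.DataFrame(rng.randn(100, 3), columns=FIELDS) for _ in range(4)]
    features_by_activity[activity] = [featurize(f, FIELDS) for f in frames]

X = [row for activity in activities for row in features_by_activity[activity]]
y = build_labels(features_by_activity, activities)
print("%d samples with %d features each" % (len(X), len(X[0])))
```

Deriving the labels from the list lengths keeps _X_ and _y_ aligned even if one activity ends up with a different number of samples.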


## Train and test classifiers

Below, for each classifier I want to try, I import it from scikit-learn, initialize it with default parameters, and cross-validate it with three folds. The mean accuracy is printed with an approximate 95% confidence interval (plus or minus two standard deviations of the fold scores).

In [37]:
from sklearn.model_selection import cross_val_score

# SVM
from sklearn import svm
svmClassifier = svm.SVC()
svmScores = cross_val_score(svmClassifier, X, y, cv=3)
print("SVM: %0.2f (+/- %0.2f)" % (svmScores.mean(), svmScores.std() * 2))

# Decision Tree
from sklearn.tree import DecisionTreeClassifier
dtClassifier = DecisionTreeClassifier()
dtScores = cross_val_score(dtClassifier, X, y, cv=3)
print("Decision Tree: %0.2f (+/- %0.2f)" % (dtScores.mean(), dtScores.std() * 2))

# Logistic Regression
from sklearn.linear_model import LogisticRegression
lrClassifier = LogisticRegression()
lrScores = cross_val_score(lrClassifier, X, y, cv=3)
print("Logistic Regression: %0.2f (+/- %0.2f)" % (lrScores.mean(), lrScores.std() * 2))

SVM: 0.85 (+/- 0.10)
Decision Tree: 0.92 (+/- 0.23)
Logistic Regression: 0.95 (+/- 0.08)
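The _X\_train_ / _X\_test_ split prepared earlier is never used above, so here is a rough sketch of what manual scoring could look like. It runs on synthetic, easily separable data rather than the real features, and it borrows only the 26/4 per-class split from the notebook; the class shift and feature count are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

# Synthetic stand-in for the feature matrix: 2 classes, 30 samples each, 4 features.
class_0 = rng.randn(30, 4)
class_1 = rng.randn(30, 4) + 5.0   # shifted so the classes are easy to separate

# Same per-class split as the notebook: first 26 samples train, last 4 test.
X_train = np.vstack([class_0[:26], class_1[:26]])
X_test = np.vstack([class_0[26:], class_1[26:]])
y_train = [0] * 26 + [1] * 26
y_test = [0] * 4 + [1] * 4

classifier = LogisticRegression()
classifier.fit(X_train, y_train)
accuracy = classifier.score(X_test, y_test)   # fraction of correct test predictions
print("Held-out accuracy: %0.2f" % accuracy)
```

With only four test samples per activity, a single held-out score is coarse, which is one reason to prefer the cross-validated numbers.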


In [66]:
breathing_data = pd.read_csv('breathing.csv')

BREATHING_FIELD = 'user_acc_z'

plt.plot(np.abs(fft.fft(breathing_data[BREATHING_FIELD])))
plt.ylim([0,10])
#plt.show()


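# Index of the strongest FFT component within bins 1000..28999 (relative to the slice start)
wavelength = np.argmax(np.abs(fft.fft(breathing_data[BREATHING_FIELD]))[1000:29000])

# The index is treated as the breath period in samples (100 samples per second)
seconds_per_breath = float(wavelength) / 100
print('Time per breath: %.2f seconds' % seconds_per_breath)
print('Respiratory rate: %.2f breaths per minute' % (60.0 / seconds_per_breath))

Time per breath: 21.55 seconds
Respiratory rate: 2.78 breaths per minute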
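An FFT bin index is a frequency index rather than a period, so an alternative is to map the peak bin to Hz first and then invert it. The following is a minimal sketch of that conversion on a synthetic sine wave, not on `breathing.csv`. The function name `dominant_period_seconds` and the 0.1 to 1.0 Hz search band are assumptions chosen for illustration.

```python
import numpy as np

SAMPLE_RATE_HZ = 100.0   # the notebook's sampling rate


def dominant_period_seconds(signal, sample_rate_hz, min_hz, max_hz):
    # Remove the mean so the zero-frequency bin does not dominate.
    centered = np.asarray(signal, dtype=float) - np.mean(signal)
    magnitudes = np.abs(np.fft.rfft(centered))
    freqs = np.fft.rfftfreq(len(centered), d=1.0 / sample_rate_hz)   # bin centers in Hz
    # Search only a band of plausible frequencies.
    band = (freqs >= min_hz) & (freqs <= max_hz)
    peak_hz = freqs[band][np.argmax(magnitudes[band])]
    return 1.0 / peak_hz


# Synthetic signal: a 0.25 Hz sine (one cycle every 4 seconds), 60 seconds long.
t = np.arange(6000) / SAMPLE_RATE_HZ
signal = 0.05 * np.sin(2 * np.pi * 0.25 * t)

period = dominant_period_seconds(signal, SAMPLE_RATE_HZ, 0.1, 1.0)
rate_per_minute = 60.0 / period
print("Period: %.2f s, rate: %.1f per minute" % (period, rate_per_minute))
```

The frequency resolution is the sample rate divided by the number of points, so longer recordings resolve slow rhythms such as breathing more finely.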
