# Turi Create Activity Classification with HAPT

TBD

# References

* UCI Machine Learning Repository. Smartphone-Based Recognition of Human Activities and Postural Transitions Data Set. http://archive.ics.uci.edu/ml/datasets/Smartphone-Based+Recognition+of+Human+Activities+and+Postural+Transitions

* Turi Create Activity Classification. https://apple.github.io/turicreate/docs/userguide/activity_classifier/

# Sensor Data

In the HAPT dataset, the sensors were sampled at 50 Hz (50 samples per second). With this knowledge we can control how often the model produces a prediction via the `prediction_window` parameter, which is expressed in samples. For example, to produce a prediction every 5 seconds we set `prediction_window` to 250 (5 seconds * 50 samples per second).
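The window arithmetic can be written down as a small helper. This is only a sketch: `prediction_window_for` is a hypothetical name introduced here for illustration and is not part of Turi Create.

```python
def prediction_window_for(seconds_per_prediction, sample_rate_hz=50):
    """Return the number of samples that span one prediction.

    Hypothetical helper: the 50 Hz default matches the HAPT sampling rate.
    """
    window = seconds_per_prediction * sample_rate_hz
    # The window must be a positive whole number of samples.
    if window <= 0 or int(window) != window:
        raise ValueError("window must be a positive whole number of samples")
    return int(window)


five_second_window = prediction_window_for(5)   # 5 s * 50 samples/s
one_second_window = prediction_window_for(1)    # 1 s * 50 samples/s
print(five_second_window, one_second_window)
```

A shorter window gives more frequent predictions, while a longer one gives the model more samples per prediction.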

# Data Plots

TBD

# Environment setup

In [1]:
import turicreate as tc

# Loading the Dataset into Turi Create

The data was downloaded from the UCI archive (see reference above) and stored in a directory that is below this notebook's directory: `data/HAPT Data Set/`.

This folder contains 3 types of files - a file containing the performed activities for each experiment, files containing the collected accelerometer samples, and files containing the collected gyroscope samples.

The first file is labels.txt, which contains the activities performed in each experiment. The labels are specified by sample index ranges. For example, in experiment 1 the subject was performing activity number 5 between the 250th collected sample and the 1232nd collected sample. The activities are encoded as integers: IDs 1 through 6 are the six basic activities, and higher IDs denote postural transitions, which this notebook discards. We convert the IDs we keep to strings at the end of this section. The code below loads labels.txt into an SFrame and defines a function to find the label given a sample index.

In [2]:
dataDir = 'data/HAPT Data Set/RawData/'

In [3]:
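def find_label_for_containing_interval(intervals, index):
    # intervals holds [activity_id, start, end] rows; start and end are inclusive.
    containing_interval = intervals[:, 0][(intervals[:, 1] <= index) & (index <= intervals[:, 2])]
    if len(containing_interval) == 1:
        return containing_interval[0]
    # Samples that fall outside every labeled interval get no label.
    return None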

Load the labels

In [4]:
labels = tc.SFrame.read_csv(dataDir + 'labels.txt', delimiter=' ', header=False,
                            verbose=False)
labels = labels.rename({'X1': 'exp_id', 'X2': 'user_id', 'X3': 'activity_id',
                        'X4': 'start', 'X5': 'end'})
labels

exp_id,user_id,activity_id,start,end
1,1,5,250,1232
1,1,7,1233,1392
1,1,4,1393,2194
1,1,8,2195,2359
1,1,5,2360,3374
1,1,11,3375,3662
1,1,6,3663,4538
1,1,10,4539,4735
1,1,4,4736,5667
1,1,9,5668,5859
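To see how the interval lookup behaves on these rows, here is a plain-Python restatement of `find_label_for_containing_interval` applied to the first three labels of experiment 1. It is a sketch for illustration: `label_for_index` is a hypothetical helper that uses a list of tuples instead of the NumPy array used in the notebook.

```python
# First three rows of the labels table for experiment 1:
# (activity_id, start, end), with both bounds inclusive.
EXAMPLE_INTERVALS = [
    (5, 250, 1232),
    (7, 1233, 1392),
    (4, 1393, 2194),
]


def label_for_index(intervals, index):
    """Return the activity whose interval contains index, else None."""
    matches = [activity for activity, start, end in intervals
               if start <= index <= end]
    # Mirror the notebook's rule: only a single containing interval counts.
    return matches[0] if len(matches) == 1 else None


print(label_for_index(EXAMPLE_INTERVALS, 250))    # first sample of activity 5
print(label_for_index(EXAMPLE_INTERVALS, 1233))   # first sample of activity 7
print(label_for_index(EXAMPLE_INTERVALS, 100))    # before any labeled interval
```

Samples recorded before the first interval, such as index 100, receive no label.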


Next, we need to get the accelerometer and gyroscope data for each experiment. For each experiment, every sensor's data is in a separate file. In the code below we load the accelerometer and gyroscope data from all experiments into a single SFrame. While loading the collected samples, we also calculate the label for each sample using our previously defined function. The final SFrame contains a column named `exp_id` that identifies each unique session.

In [6]:
from glob import glob

acc_files = glob(dataDir + 'acc_*.txt')
gyro_files = glob(dataDir + 'gyro_*.txt')

Load the data

In [7]:
data = tc.SFrame()
files = zip(sorted(acc_files), sorted(gyro_files))
for acc_file, gyro_file in files:
    exp_id = int(acc_file.split('_')[1][-2:])

    # Load accel data
    sf = tc.SFrame.read_csv(acc_file, delimiter=' ', header=False, verbose=False)
    sf = sf.rename({'X1': 'acc_x', 'X2': 'acc_y', 'X3': 'acc_z'})
    sf['exp_id'] = exp_id

    # Load gyro data
    gyro_sf = tc.SFrame.read_csv(gyro_file, delimiter=' ', header=False, verbose=False)
    gyro_sf = gyro_sf.rename({'X1': 'gyro_x', 'X2': 'gyro_y', 'X3': 'gyro_z'})
    sf = sf.add_columns(gyro_sf)

    # Calc labels
    exp_labels = labels[labels['exp_id'] == exp_id][['activity_id', 'start', 'end']].to_numpy()
    sf = sf.add_row_number()
    sf['activity_id'] = sf['id'].apply(lambda x: find_label_for_containing_interval(exp_labels, x))
    sf = sf.remove_columns(['id'])

    data = data.append(sf)
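The loop above derives `exp_id` by splitting the full path on underscores, which only works while the directory names contain no underscore and the experiment number has exactly two digits. A more defensive parse might look like the sketch below. It assumes the raw files are named like `acc_exp01_user01.txt`, a pattern that is not shown in this notebook, and `parse_exp_id` is a hypothetical helper.

```python
import os
import re

# Assumed file name pattern: <sensor>_exp<NN>_user<NN>.txt
_EXP_PATTERN = re.compile(r'^(?:acc|gyro)_exp(\d+)_user\d+\.txt$')


def parse_exp_id(path):
    """Return the experiment number encoded in a raw-data file name."""
    # Look only at the file name so directory names cannot interfere.
    name = os.path.basename(path)
    match = _EXP_PATTERN.match(name)
    if match is None:
        raise ValueError('unexpected file name: ' + name)
    return int(match.group(1))


print(parse_exp_id('data/HAPT Data Set/RawData/acc_exp01_user01.txt'))
print(parse_exp_id('my_data/RawData/gyro_exp12_user06.txt'))
```

Raising on an unexpected name makes a misplaced file fail loudly instead of silently producing a wrong session ID.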


Finally, we encode the labels back into a readable string format, and save the resulting SFrame.

In [8]:
target_map = {
    1.: 'walking',
    2.: 'climbing_upstairs',
    3.: 'climbing_downstairs',
    4.: 'sitting',
    5.: 'standing',
    6.: 'laying'
}


Use the same labels as in the UCI experiment

In [9]:
data = data.filter_by(list(target_map.keys()), 'activity_id')
data['activity'] = data['activity_id'].apply(lambda x: target_map[x])
data = data.remove_column('activity_id')

data.save('hapt_data.sframe')

Load sessions from the preprocessed data

In [10]:
data = tc.SFrame('hapt_data.sframe')
print(data)

+-------------------+---------------------+---------------------+--------+
|       acc_x       |        acc_y        |        acc_z        | exp_id |
+-------------------+---------------------+---------------------+--------+
| 1.020833394742025 | -0.1250000020616516 |  0.105555564319952  |   1    |
| 1.025000070391787 | -0.1250000020616516 |  0.1013888947481719 |   1    |
| 1.020833394742025 | -0.1250000020616516 |  0.1041666724366978 |   1    |
| 1.016666719092262 | -0.1250000020616516 |  0.1083333359304957 |   1    |
| 1.018055610975516 | -0.1277777858281599 |  0.1083333359304957 |   1    |
| 1.018055610975516 | -0.1291666655554495 |  0.1041666724366978 |   1    |
|  1.01944450285877 | -0.1250000020616516 |  0.1013888947481719 |   1    |
| 1.016666719092262 | -0.1236111101783975 | 0.09722222517639174 |   1    |
| 1.020833394742025 | -0.1277777858281599 | 0.09861111705964588 |   1    |
|  1.01944450285877 | -0.1152777831908018 | 0.09444444748786576 |   1    |
+-------------------+---------------------+---------------------+--------+

Split the data into training and test sets by recorded session

In [11]:
train, test = tc.activity_classifier.util.random_split_by_session(data,
                                                                  session_id='exp_id',
                                                                  fraction=0.8)
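Splitting by session means every sample from one experiment lands entirely in either the training set or the test set, so the model is never evaluated on a recording it saw during training. The idea can be sketched in plain Python as below. This is an illustration only, not Turi Create's implementation; `split_sessions` and the toy rows are hypothetical.

```python
import random


def split_sessions(rows, session_key, fraction, seed=0):
    """Assign whole sessions to train or test (illustrative only)."""
    rng = random.Random(seed)
    session_ids = sorted({row[session_key] for row in rows})
    # Each session goes to the training set with probability `fraction`.
    train_ids = {sid for sid in session_ids if rng.random() < fraction}
    train = [row for row in rows if row[session_key] in train_ids]
    test = [row for row in rows if row[session_key] not in train_ids]
    return train, test


# Toy data: 10 sessions with 3 samples each.
rows = [{'exp_id': exp_id, 'sample': i}
        for exp_id in range(1, 11) for i in range(3)]
train_rows, test_rows = split_sessions(rows, 'exp_id', fraction=0.8)

train_sessions = {row['exp_id'] for row in train_rows}
test_sessions = {row['exp_id'] for row in test_rows}
print(train_sessions & test_sessions)  # empty set: no session is in both
```

Because the assignment is random per session, the realized split is only approximately 80/20 when there are few sessions.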

In [12]:
len(test['exp_id'].unique())

5

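Now we create the activity classifier, using the GPU if one is available. With `prediction_window=50` at a 50 Hz sampling rate, the model produces one prediction per second.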

In [13]:
model = tc.activity_classifier.create(train, session_id='exp_id', target='activity',
                                      prediction_window=50)

In [14]:
print(test)

+--------------------+---------------------+----------------------+--------+
|       acc_x        |        acc_y        |        acc_z         | exp_id |
+--------------------+---------------------+----------------------+--------+
| 0.9652778166595758 | -0.2527777878898114 |  0.1124999994242935  |   6    |
| 1.031944529808058  | -0.1819444477154254 | -0.02638889107998799 |   6    |
| 1.031944529808058  | -0.2263888907318412 |  0.0625000010308258  |   6    |
|  1.03472221632685  | -0.2083333448733957 | 0.02222222302770345  |   6    |
|  1.03472221632685  | -0.2083333448733957 | 0.02222222302770345  |   6    |
|  0.89722221162784  | -0.2027777894963437 | -0.09027778399406793 |   6    |
| 0.9652778166595758 | -0.2263888907318412 |  0.1013888947481719  |   6    |
| 1.001388908376467  | -0.2208333475107537 | 0.07777778135670985  |   6    |
| 1.022222286625279  |  -0.212499996211229 | 0.05972222334229981  |   6    |
|  1.0125000434425   | -0.2291666744983494 |  0.1097222278137498  |   6    |

Evaluate the model

In [15]:
metrics = model.evaluate(test)

In [16]:
print(metrics['accuracy'])

0.8651272505536122


Save the model and export it for use with Core ML in Swift

In [17]:
model.save('hapt.model')
model.export_coreml('UCIHAPTClassifier.mlmodel')