# Multirotor Activity Recognition - TRAINING YOUR MODEL

Welcome to this training module, which teaches you how to perform activity recognition with the help of a hosted Machine Learning instance. This notebook guides you through extracting information from a large dataset of existing inertial measurement unit (IMU) flight data.

## Step 1) Getting started

Let's start by importing the libraries and defining the runtime constants we will be using during this training module. The constants keep track of the S3 locations and local directories we will be interacting with.

In [1]:
print('===[STEP 1 - START]===\n')
print('Setting up activity runtime environment...')

print('Importing used libraries...')
from IPython.display import Markdown, display
import json
import boto3
import boto3.session
import string
import os
import csv
import numpy as np

print('Defining runtime constants...')
s3_workspace_bucket = 'mldelarosa-thesis'
s3_subdir_group_training_flight_log = 'mar-lab-workspace/exercise-training/group-training-dataset/'
s3_subdir_group_training_dataset = 'mar-lab-workspace/exercise-training/group-training-dataset/'

jupyter_subdir_group_training_dataset = './data/training-group-dataset/'
jupyter_subdir_group_training_flight_log = './data/training-group-logs/'
jupyter_subdir_group_workspace = './data/group-workspace/'

print('Defining custom functions...')
def make_path(file_path):
    # Create the parent directory of file_path if it does not exist yet
    directory = os.path.dirname(file_path)
    if directory:
        os.makedirs(directory, exist_ok=True)

def get_s3_client():
    session = boto3.session.Session()
    s3 = session.resource(service_name='s3', verify=True)
    return s3.meta.client


def print_markup(string):
    display(Markdown(string))
    
def print_file_preview(file, lines, hasHeader):
    stringFormat = ''
    print('\n\n===== FILE PREVIEW: ' + file + ' =====')
    if hasHeader:
        with open(file) as fileToPreview:
            try:
                fileIter = [next(fileToPreview) for x in range(lines)]
                for x in range(lines): 
                    lineCSV = fileIter[x].rstrip().split(',')
                    if x == 0:
                        numOfColumns = len(lineCSV)
                        dash = '-' * (17 * numOfColumns)
                        stringFormat = '{:15.15}| ' * numOfColumns
                        print(dash)
                        print(stringFormat.format(*lineCSV))
                        print(dash)
                    else:
                        print(stringFormat.format(*lineCSV))
            except StopIteration:
                print("*File too short for preview*")
                pass
                    
    else:
        with open(file) as fileToPreview:
            try:
                fileIter = [next(fileToPreview) for x in range(lines)]
                for x in range(lines):
                    lineCSV = fileIter[x].rstrip().split(',')
                    if x == 0:
                        numOfColumns = len(lineCSV)
                        stringFormat = '{:16.16}| ' * numOfColumns
                    print(stringFormat.format(*lineCSV)[0:125])
            except StopIteration:
                # The file has fewer rows than requested
                print("*File too short for preview*")
    
    print('...')
    print('===== END FILE PREVIEW =====\n\n')

GROUP_NAME='held-with-manual-control'

print('[DONE] Runtime initialized')
print('\n===[STEP 1 - END]===')

===[STEP 1 - START]===

Setting up activity runtime environment...
Importing used libraries...
Defining runtime constants...
Defining custom functions...
[DONE] Runtime initialized

===[STEP 1 - END]===


## Step 2) Download MAR Dataset for training the model
Before training the linear classifier, we download the data it needs. The __imu-database-training-set__ file downloaded in this step is a CSV file that holds a large number of IMU entries recorded from reliable flight sessions. Each row is labelled so that the classifier knows which class an IMU entry belongs to. An evaluation set is downloaded alongside it.
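As a rough illustration of the file layout described above, the sketch below counts how many rows each label has in a labelled IMU CSV. The column names follow the file preview in this notebook; the three sample rows and the helper name `count_rows_per_label` are made up for illustration and do not come from the real dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical rows; only the column layout mirrors the training file.
SAMPLE_CSV = """label,accelerometer_x,accelerometer_y,accelerometer_z,gyrometer_x,gyrometer_y,gyrometer_z
neutral,0.01,0.02,-1.00,0.001,0.002,0.003
neutral,0.02,0.01,-1.01,0.002,0.001,0.004
up,0.00,0.03,-1.20,0.010,0.020,0.030
"""

def count_rows_per_label(csv_file):
    """Return a Counter mapping each label to its number of rows."""
    reader = csv.DictReader(csv_file)
    return Counter(row['label'] for row in reader)

label_counts = count_rows_per_label(io.StringIO(SAMPLE_CSV))
for label, count in sorted(label_counts.items()):
    print(label, count)
```

To try this on the downloaded data, pass an open file handle for `jupyter_filepath_mar_training_database` instead of the `io.StringIO` object. A heavily unbalanced count is worth noticing before training, since the classifier sees far fewer examples of the rare labels.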

In [2]:
print('===[STEP 2 - DOWNLOAD TRAINING DATA SET]===\n')

print('Please enter the name of your lab group\'s training dataset:')
GROUP_NAME = (input('Group Name: ') or "held-with-manual-control")


# S3 key and local path of the prepped training data
s3_filepath_mar_training_database = s3_subdir_group_training_dataset + GROUP_NAME + '/imu-data-log-latest'
jupyter_filepath_mar_training_database = './data/exercise-training-session/' + GROUP_NAME + '/imu-db/imu-database-training-set.csv'

# S3 key and local path of the prepped evaluation data
# NOTE: this is the same S3 key as the training data, so the evaluation set is a copy of the training set
s3_filepath_mar_evaluation_database = s3_subdir_group_training_dataset + GROUP_NAME + '/imu-data-log-latest'
jupyter_filepath_mar_evaluation_database = './data/exercise-training-session/' + GROUP_NAME + '/imu-db/imu-database-evaluation-set.csv'

client = get_s3_client()
print('Downloading prepped training data...')
print('Downloading from: ' + s3_filepath_mar_training_database)
print('Downloading to: ' + jupyter_filepath_mar_training_database)
make_path(jupyter_filepath_mar_training_database)
# download_file returns None, so there is nothing to keep from the call
client.download_file(Bucket=s3_workspace_bucket,
                     Key=s3_filepath_mar_training_database,
                     Filename=jupyter_filepath_mar_training_database)

print('Downloading prepped evaluation data...')
print('Downloading from: ' + s3_filepath_mar_evaluation_database)
print('Downloading to: ' + jupyter_filepath_mar_evaluation_database)
make_path(jupyter_filepath_mar_evaluation_database)
client.download_file(Bucket=s3_workspace_bucket,
                     Key=s3_filepath_mar_evaluation_database,
                     Filename=jupyter_filepath_mar_evaluation_database)

print('Download completed...')
print_file_preview(jupyter_filepath_mar_training_database, 10, 1)
print_file_preview(jupyter_filepath_mar_evaluation_database, 10, 1)
print('\n===[STEP 2 - END]===')

===[STEP 2 - DOWNLOAD TRAINING DATA SET]===

Please enter the name of your lab group's training dataset:
Group Name: poorly-trained
Downloading prepped training data...
Downloading from: mar-lab-workspace/exercise-training/group-training-dataset/poorly-trained/imu-data-log-latest
Downloading to: ./data/exercise-training-session/poorly-trained/imu-db/imu-database-training-set.csv
Downloading prepped evaluation data...
Downloading from: mar-lab-workspace/exercise-training/group-training-dataset/poorly-trained/imu-data-log-latest
Downloading to: ./data/exercise-training-session/poorly-trained/imu-db/imu-database-evaluation-set.csv
Download completed...


===== FILE PREVIEW: ./data/exercise-training-session/poorly-trained/imu-db/imu-database-training-set.csv =====
-----------------------------------------------------------------------------------------------------------------------
label          | accelerometer_x| accelerometer_y| accelerometer_z| gyrometer_x    | gyrometer_y    | gyromet

## Step 3) Extract samples by label and store them in subdirectories
Now that we have the data needed to train our linear model, we organize it by class. For each unique label value in the __imu-database-training-set__ file, a directory is created for that label. Rows are then written to CSV files inside it, one file per contiguous run of rows that share the label.
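The cell below starts a new sample file each time the label changes from one row to the next, so a label that appears in several separate runs produces several sample files. Here is a minimal sketch of that naming scheme, using a made-up label sequence and a hypothetical helper `sample_files_for`:

```python
from itertools import groupby

# Hypothetical label sequence, in file order.
row_labels = ['up', 'up', 'down', 'down', 'down', 'up']

def sample_files_for(labels):
    """Map each contiguous run of one label to a (file name, row count) pair."""
    next_index = {}
    samples = []
    for label, run in groupby(labels):
        index = next_index.get(label, 0)
        next_index[label] = index + 1
        file_name = label + '/' + label + '-sample-' + str(index) + '.csv'
        samples.append((file_name, len(list(run))))
    return samples

samples = sample_files_for(row_labels)
for name, rows in samples:
    print(name, rows)
```

Keeping each run in its own file matters for the next step: a sliding window should not span two separate recordings of the same movement.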

In [3]:
def extract_label_sets_from_file(data_filepath, destination_dir):
    csv_columns = ['accelerometer_x','accelerometer_y','accelerometer_z','gyrometer_x','gyrometer_y','gyrometer_z']

    running_label = None
    running_sample_index = {}
    running_sample_filename = ''
    running_sample_file = None
    print('Extracting data labels to directories in: [' + destination_dir + ']...')
    with open(data_filepath, 'r') as labelled_file:
        for labelled_row in csv.DictReader(labelled_file):
            if running_label != labelled_row['label']:
                # The label changed: close the previous sample file and start a new one
                if running_sample_file is not None:
                    running_sample_file.close()
                running_label = labelled_row['label']
                # Iterate sample file index for the current label
                running_sample_index[running_label] = running_sample_index.get(running_label, -1) + 1
                running_sample_filename = destination_dir + \
                                            running_label + '/' + \
                                            running_label + '-sample-' + \
                                            str(running_sample_index[running_label]) + '.csv'
                make_path(running_sample_filename)
                running_sample_file = open(running_sample_filename, 'w')
                running_sample_file.write(','.join(csv_columns) + '\n')
            # Write the IMU columns of the current row, dropping the label
            running_sample_file.write(','.join(labelled_row[column] for column in csv_columns) + '\n')
    # Close the last sample file so its contents are flushed before it is previewed
    if running_sample_file is not None:
        running_sample_file.close()
    return running_sample_filename

print('===[STEP 3 - SEPARATING DATASETS BY LABEL]===\n')


print('Extracting labels for training dataset...')
s3_subdir_group_training_session = './data/exercise-training-session/' + GROUP_NAME + '/'
print('Extracting data labels from: ' + jupyter_filepath_mar_training_database)
lastSampleFile = extract_label_sets_from_file(jupyter_filepath_mar_training_database,
                                                s3_subdir_group_training_session + 'imu-db/')
print_file_preview(lastSampleFile, 10, 1)

print('Extracting labels for evaluation dataset...')
s3_subdir_group_evaluation_session = './data/exercise-training-session/' + GROUP_NAME + '/'
print('Extracting data labels from: ' + jupyter_filepath_mar_evaluation_database)
lastSampleFile = extract_label_sets_from_file(jupyter_filepath_mar_evaluation_database,
                                              s3_subdir_group_evaluation_session + 'imu-db/')
print_file_preview(lastSampleFile, 10, 1)

print('Samples have been extracted by label')
print('\n===[STEP 3 - END]===')

===[STEP 3 - SEPARATING DATASETS BY LABEL]===

Extracting labels for training dataset...
Extracting data labels from: ./data/exercise-training-session/poorly-trained/imu-db/imu-database-training-set.csv
Extracting data labels to directories in: [./data/exercise-training-session/poorly-trained/imu-db/]...


===== FILE PREVIEW: ./data/exercise-training-session/poorly-trained/imu-db/down/down-sample-0.csv =====
------------------------------------------------------------------------------------------------------
accelerometer_x| accelerometer_y| accelerometer_z| gyrometer_x    | gyrometer_y    | gyrometer_z    | 
------------------------------------------------------------------------------------------------------
-0.0117188     | 0.0168457      | -1.03809       | -0.00255877    | -0.00370992    | -0.000799999   | 
-0.0126953     | 0.0175781      | -1.04004       | 0.0356092      | 0.0420916      | -0.10767       | 
-0.0134277     | 0.0168457      | -1.04199       | 0.0127084      | 0.049

## Step 4) Calculate features from sliding windows on each label
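Features are calculated from sliding windows of data, organized by label. The features used here are the average and the median of each IMU column; a variance function is also defined but is not enabled. In this cell, each sample CSV is sliced into overlapping windows of four rows, advancing one row at a time, and the features are calculated for every window.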
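To make the windowing concrete, here is a small sketch on a single made-up column of readings. The window length of 4 and step of 1 match the constants in the cell below; the helper name `sliding_windows` and the readings are illustrative only, and for steps larger than 1 this helper may pick different windows than the cell does.

```python
from statistics import mean, median

def sliding_windows(values, window_length=4, step=1):
    """Return every full window of `window_length` values, advancing by `step`."""
    return [values[start:start + window_length]
            for start in range(0, len(values) - window_length + 1, step)]

# Hypothetical accelerometer_z readings.
readings = [1.0, 2.0, 3.0, 4.0, 10.0, 6.0]

windows = sliding_windows(readings)
features = [(mean(window), median(window)) for window in windows]
for window, (avg, med) in zip(windows, features):
    print(window, avg, med)
```

Six readings yield three windows. The spike of 10.0 pulls the average up more than the median, which is one reason to feed both features to the classifier.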

In [4]:
print('===[STEP 4 - EXTRACT FEATURES FOR EACH LABEL]===\n')

from itertools import islice

feature_csv_columns = ['average', 'median']
imu_data_columns = ['accelerometer_x','accelerometer_y','accelerometer_z','gyrometer_x','gyrometer_y','gyrometer_z']
imu_label_vocabulary = ['backward', 'forward', 'left', 'neutral', 'right', 'up', 'down']

def feature_average(data_sample):
    # Arithmetic mean of the values in one window
    fSum = 0
    nIndex = 0
    for data in data_sample:
        fSum = fSum + data
        nIndex = nIndex + 1
    return float(fSum / nIndex)

def feature_variance(data_sample):
    return np.var(data_sample)

def feature_median(data_sample):
    return np.median(data_sample, axis=0)

feature_calculations = {
    'average' : feature_average,
    'median' : feature_median
}

# Read a *.csv file and extract the sliding window
import collections
def extract_features_from_imu_data_samples_for_label(data_sample_filepath, features_filepath, data_label):
#     print('Extracting for feature: ', data_label, 'from', data_sample_filepath)
    with open(data_sample_filepath, 'r') as csv_file:
        # extract data records by row
        reader = csv.DictReader(csv_file)
        sliding_windows = []
        sliding_index = 0
        window_step_forward = 1
        window_length = 4
        
        # extract sliding windows from rows
        sliding_window_csv = []
        for row in reader:
            sliding_window_csv.append(row)
            if(len(sliding_window_csv) == window_length + 1):
                del sliding_window_csv[0]
            if(sliding_index % window_step_forward == 0 and len(sliding_window_csv) == window_length):
                sliding_windows.append(list(sliding_window_csv))
            sliding_index = sliding_index + 1
        running_window_lines = []
                
        for feature_name, feature_func in feature_calculations.items():
            for imu_data_column in imu_data_columns:
                window_sequences = []
                for sliding_window in sliding_windows:
                    window_sequence = []
                    for window in sliding_window:
                        window_sequence.append(float(window[imu_data_column]))
                    window_sequences.append(window_sequence)
#                 print(imu_data_column, ' - ', window_sequences)
                
                window_index = 0
                comma_index = 0;
                window_count = len(window_sequences)
                while len(running_window_lines) < window_count:
                    running_window_lines.append('')
                for window in window_sequences:
                    running_window_lines[window_index % window_count] += (str(feature_func(window))) + ','
#                   running_window_lines[window_index % window_count] += ('{' + feature_name + ':' + imu_data_column + ':' + str(feature_func(window))) + '},'
#                   print(imu_data_column, ' _ ', feature_name, feature_func(window))
                    window_index = window_index + 1
#     print('PRINT FOR WINDOW ', window_count , running_window_lines)

    with open(features_filepath, 'a') as features_file:
        for feature_line in running_window_lines:
            features_file.write(data_label + ',' + feature_line[:-1] + '\n')
            

# Define labels to iterate through the raw IMU data directories
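import glob
labels = ['backward', 'forward', 'left', 'neutral', 'right', 'up', 'down']

# Clear out feature data from the last execution. Features are appended below,
# so stale rows would otherwise accumulate in both files across runs.
for feature_filepath in [s3_subdir_group_training_session + 'training-feature-data-latest.csv',
                         s3_subdir_group_evaluation_session + 'evaluation-feature-data-latest.csv']:
    with open(feature_filepath, 'w') as file:
        file.write('')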
    

# Iterate through each raw IMU data sample and extract their features:
print('Calculating features for samples in the training data directory...\n')
for labelled_features in labels:
    print('Extracting features for the label [' + labelled_features + ']...')
    print('Extracting features into the filepath ' + s3_subdir_group_training_session + 'imu-db/' + labelled_features + '/*.csv')
    for raw_data_dir in glob.glob(s3_subdir_group_training_session + 'imu-db/' + labelled_features + '/*.csv', recursive=False):
        extract_features_from_imu_data_samples_for_label(raw_data_dir, s3_subdir_group_training_session + 'training-feature-data-latest.csv', labelled_features)
print('\nThe features have been extracted and stored into: ' + s3_subdir_group_training_session + 'training-feature-data-latest.csv')
print_file_preview(s3_subdir_group_training_session + 'training-feature-data-latest.csv', 5, 0)


print('Calculating features for samples in the evaluation data directory...\n')
for labelled_features in labels:
    print('Extracting features for the label [' + labelled_features + ']...')
    print('Extracting features into the filepath ' + s3_subdir_group_evaluation_session + 'imu-db/' + labelled_features + '/*.csv')
    for raw_data_dir in glob.glob(s3_subdir_group_evaluation_session + 'imu-db/' + labelled_features + '/*.csv', recursive=False):
        extract_features_from_imu_data_samples_for_label(raw_data_dir, s3_subdir_group_evaluation_session + 'evaluation-feature-data-latest.csv', labelled_features)
    print('')
print('\nThe features have been extracted and stored into: ' + s3_subdir_group_evaluation_session + 'evaluation-feature-data-latest.csv')
print_file_preview(s3_subdir_group_evaluation_session + 'evaluation-feature-data-latest.csv', 5, 0)

print('\n===[STEP 4 - END]===')

===[STEP 4 - EXTRACT FEATURES FOR EACH LABEL]===

Calculating features for samples in the training data directory...

Extracting features for the label [backward]...
Extracting features into the filepath ./data/exercise-training-session/poorly-trained/imu-db/backward/*.csv
Extracting features for the label [forward]...
Extracting features into the filepath ./data/exercise-training-session/poorly-trained/imu-db/forward/*.csv
Extracting features for the label [left]...
Extracting features into the filepath ./data/exercise-training-session/poorly-trained/imu-db/left/*.csv
Extracting features for the label [neutral]...
Extracting features into the filepath ./data/exercise-training-session/poorly-trained/imu-db/neutral/*.csv
Extracting features for the label [right]...
Extracting features into the filepath ./data/exercise-training-session/poorly-trained/imu-db/right/*.csv
Extracting features for the label [up]...
Extracting features into the filepath ./data/exercise-training-session/poorly-


## Step 5) Train a linear classifier
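A linear classifier is trained on the feature set extracted in the previous step: twelve numeric features per window (the average and median of each of the six IMU columns), classified into one of seven movement labels.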
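The feature files written in Step 4 have no header row, so the training cell has to know the field order in advance. The sketch below reconstructs that order from the two lists used in Step 4, assuming Python 3.7 or later, where dictionaries keep insertion order. The generated names (for example `average_accelerometer_x`) are illustrative; the training cell uses its own names such as `mean_acc_x` and `mean_gyro_roll` for the same positions.

```python
feature_names = ['average', 'median']
imu_columns = ['accelerometer_x', 'accelerometer_y', 'accelerometer_z',
               'gyrometer_x', 'gyrometer_y', 'gyrometer_z']

def feature_row_layout(features, columns):
    """List the fields of one feature row: the label, then every feature of every IMU column."""
    return ['label'] + [feature + '_' + column
                        for feature in features
                        for column in columns]

layout = feature_row_layout(feature_names, imu_columns)
print(len(layout))
print(layout)
```

The row holds the label, then all six averages, then all six medians. If a feature such as variance is enabled in Step 4, the column list and defaults in the training cell need matching entries in the same position.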

In [7]:
print('===[STEP 5 - TRAIN A LINEAR CLASSIFIER]===')

import tensorflow as tf
import os
import shutil
import sys

# THIS MODEL-DIR POINTS TO THE CLASSIFICATION EXERCISE WORKSPACE
model_dir = '../mar-classification-exercise/tmp/model/' + GROUP_NAME + '/'
train_data = './data/exercise-training-session/' + GROUP_NAME + '/training-feature-data-latest.csv'
#eval_data = './data/exercise-training-session/' + GROUP_NAME + '/training-feature-data-latest.csv
eval_data = './data/exercise-training-session/' + GROUP_NAME + '/evaluation-feature-data-latest.csv'

# delete the model directory
shutil.rmtree(model_dir, ignore_errors=True)

# declare feature columns within csv
mean_acc_x = tf.feature_column.numeric_column(key='mean_acc_x', dtype=tf.float64)
mean_acc_y = tf.feature_column.numeric_column(key='mean_acc_y', dtype=tf.float64)
mean_acc_z = tf.feature_column.numeric_column(key='mean_acc_z', dtype=tf.float64)

mean_gyro_roll = tf.feature_column.numeric_column(key='mean_gyro_roll', dtype=tf.float64)
mean_gyro_pitch = tf.feature_column.numeric_column(key='mean_gyro_pitch', dtype=tf.float64)
mean_gyro_yaw = tf.feature_column.numeric_column(key='mean_gyro_yaw', dtype=tf.float64)

median_acc_x = tf.feature_column.numeric_column(key='median_acc_x', dtype=tf.float64)
median_acc_y = tf.feature_column.numeric_column(key='median_acc_y', dtype=tf.float64)
median_acc_z = tf.feature_column.numeric_column(key='median_acc_z', dtype=tf.float64)

median_gyro_roll = tf.feature_column.numeric_column(key='median_gyro_roll', dtype=tf.float64)
median_gyro_pitch = tf.feature_column.numeric_column(key='median_gyro_pitch', dtype=tf.float64)
median_gyro_yaw = tf.feature_column.numeric_column(key='median_gyro_yaw', dtype=tf.float64)




# stack feature columns into a single array
imu_window_feature_columns = [
        mean_acc_x, mean_acc_y, mean_acc_z,
        mean_gyro_roll, mean_gyro_pitch, mean_gyro_yaw,
        median_acc_x, median_acc_y, median_acc_z,
        median_gyro_roll, median_gyro_pitch, median_gyro_yaw,
    ]

run_config=tf.estimator.RunConfig().replace(
    session_config=tf.ConfigProto(device_count={'GPU': 0})
)

def input_fn(data_file):
    assert tf.gfile.Exists(data_file), '%s not found' % data_file
    records_default = [['neutral'],
                       [0.0], [0.0], [0.0],
                       [0.0], [0.0], [0.0],
                       [0.0], [0.0], [0.0],
                       [0.0], [0.0], [0.0]]
    csv_columns = [
                    'label',
                    'mean_acc_x','mean_acc_y','mean_acc_z',
                    'mean_gyro_roll','mean_gyro_pitch','mean_gyro_yaw',
                    'median_acc_x','median_acc_y','median_acc_z',
                    'median_gyro_roll','median_gyro_pitch','median_gyro_yaw'
    ]
    
    def parse_csv(value):
        columns = tf.decode_csv(value, records_default)
        features = dict(zip(csv_columns, columns))
        labels = features.pop('label')
        return features, labels
    
    dataset = tf.data.TextLineDataset(data_file)
    dataset = dataset.shuffle(200)
    dataset = dataset.map(parse_csv, 4)
    dataset = dataset.batch(200)
    return dataset

model = tf.estimator.LinearClassifier(
    model_dir=model_dir,
    feature_columns=imu_window_feature_columns,
    config=run_config,
    n_classes=7,
    label_vocabulary=['backward', 'forward', 'left', 'neutral', 'right', 'up', 'down']
)

# A Deep Neural Network Classifier can also be substituted for linear classifier
# model = tf.estimator.DNNClassifier(
#     model_dir=model_dir,
#     feature_columns=imu_window_feature_columns,
#     config=run_config,
#     hidden_units=[100, 75, 50, 25],
#     n_classes=7,
#     label_vocabulary=['backward', 'forward', 'left', 'neutral', 'right', 'up', 'down']
# )


# Train for 100 epochs and evaluate the model every 20 epochs.
# More information available at https://docs.aws.amazon.com/sagemaker/latest/dg/tf-training-inference-code-template.html
for n in range(100):
    # Display INFO logs from tensorflow every 20 iterations
    if ((n+1) % 20 == 0):
        print('\n')
        print('=' * 80)
        print('Detail execution of epoch', n)
        tf.logging.set_verbosity(tf.logging.INFO)
    else:
        print('Training at epoch', n, '...')
        tf.logging.set_verbosity(tf.logging.ERROR)
    
    model.train(input_fn=lambda: input_fn(
        train_data))
    
#     Detailed evaluation log of model's loss and accuracy during an epoch
#     results = model.evaluate(input_fn=lambda: input_fn(
#          eval_data))
#     print("{},{},{}".format(n, results['loss'], results['accuracy']))
    
    # Display evaluation metrics every 20 iterations
    if ((n+1) % 20 == 0):
        print ("Evaluating model...")
        results = model.evaluate(input_fn=lambda: input_fn(
        eval_data))
        print(results)
        print('=' * 80)

print('')
print('=' * 80)
print ("Final Model Evaluation")

for key in results:
    print("  {}: {}".format(key, results[key]))
print('=' * 80)

print('[DONE]')
print('\n===[STEP 5 - END]===')

===[STEP 5 - TRAIN A LINEAR CLASSIFIER]===
Training at epoch 0 ...
Training at epoch 1 ...
Training at epoch 2 ...
Training at epoch 3 ...
Training at epoch 4 ...
Training at epoch 5 ...
Training at epoch 6 ...
Training at epoch 7 ...
Training at epoch 8 ...
Training at epoch 9 ...
Training at epoch 10 ...
Training at epoch 11 ...
Training at epoch 12 ...
Training at epoch 13 ...
Training at epoch 14 ...
Training at epoch 15 ...
Training at epoch 16 ...
Training at epoch 17 ...
Training at epoch 18 ...


Detail execution of epoch 19
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Restoring parameters from ../mar-classification-exercise/tmp/model/poorly-trained/model.ckpt-779
INFO:tensorflow:Saving checkpoints for 780 into ../mar-classification-exercise/tmp/model/poorly-trained/model.ckpt.
INFO:tensorflow:loss = 519.5571, step = 780
INFO:tensorflow:Saving checkpoints for 820 into ../mar-classification-exercise/tmp/model/poorly-trained/model.ckpt.
INFO:tensorflow:Loss for fin

In [None]:
#tf.train.list_variables(model_dir)

weightNames = model.get_variable_names()
#weightValues = [model.get_variable_value(name) for name in weightNames]

for name in weightNames:
    print(name, ':\n', model.get_variable_value(name), '\n')
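The variables printed above are the learned weights and biases of the model. As a rough illustration of how a linear classifier turns one feature row into a label, the sketch below scores each class as a weighted sum plus a bias and normalizes the scores with softmax. The two-feature, three-class model and all of its values are made up, and the nested-list layout is a simplification for readability, not the estimator's internal variable format.

```python
import math

def predict(features, weights, biases, labels):
    """Score each class as a weighted sum plus bias, then apply softmax."""
    logits = [sum(w * x for w, x in zip(class_weights, features)) + bias
              for class_weights, bias in zip(weights, biases)]
    highest = max(logits)
    # Subtract the largest logit before exponentiating for numerical stability
    exps = [math.exp(logit - highest) for logit in logits]
    total = sum(exps)
    probabilities = [value / total for value in exps]
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best], probabilities

# Hypothetical model: one weight list per class, one weight per feature.
labels = ['neutral', 'up', 'down']
weights = [[0.0, 0.0],    # neutral
           [2.0, -1.0],   # up
           [-2.0, 1.0]]   # down
biases = [0.5, 0.0, 0.0]

label, probabilities = predict([1.0, 0.0], weights, biases, labels)
print(label, [round(p, 3) for p in probabilities])
```

Reading the printed weights with this picture in mind, a large positive weight means that feature pushes the prediction toward that class, and a large negative weight pushes it away.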