## Generic Timeseries SageMaker Template with Gluon

This is a template for running the human activity recognition (HAR) notebook on SageMaker. For the non-SageMaker version, refer to `smartphone_human_activity_classification_gluon.ipynb`.

In [35]:
import os
import boto3
import sagemaker
from sagemaker.mxnet import MXNet
from mxnet import gluon
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()

role = get_execution_role()

### Load and Package Data

1. Load your train and test data as NumPy arrays.
2. Package each split as a pickle file and upload it to S3.

Because the data arrives in this generic form, the same `generic_ts.py` script can run any time-series classification task on SageMaker.
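The packaging step can be sketched as a round trip. Each split is a pickled two-element list `[X, y]` (the layout unpickled later in this notebook). The helper names `package_split` and `unpackage_split` are hypothetical, and the in-memory buffer stands in for the file that is uploaded to S3.

```python
import io
import pickle

import numpy as np


def package_split(X, y):
    """Serialize one split as a pickled [X, y] list (hypothetical helper)."""
    buf = io.BytesIO()
    pickle.dump([X, y], buf)
    return buf.getvalue()


def unpackage_split(payload):
    """Inverse of package_split: recover the (X, y) arrays."""
    X, y = pickle.loads(payload)
    return X, y


# Dummy data shaped like the HAR arrays: (samples, timesteps, signals) and (samples,)
X = np.random.rand(4, 128, 9).astype('float32')
y = np.array([1, 2, 3, 4], dtype='float32')

X_back, y_back = unpackage_split(package_split(X, y))
print(X_back.shape, y_back.shape)  # (4, 128, 9) (4,)
```

Any dataset that can be expressed as a 3-D `X` and a 1-D `y` fits this format, which is what keeps the training script generic.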

In [5]:
import csv
import numpy as np

def get_labels_from_csv(path):
    values = []
    with open(path, 'r') as csvfile:  # text mode: csv.reader needs str rows on Python 3
        rd = csv.reader(csvfile, delimiter=',')
        for row in rd:
            values.append(float(row[0]))
    return np.array(values).astype('float32')


INPUT_SIGNAL_TYPES = [
    "body_acc_x_",
    "body_acc_y_",
    "body_acc_z_",
    "body_gyro_x_",
    "body_gyro_y_",
    "body_gyro_z_",
    "total_acc_x_",
    "total_acc_y_",
    "total_acc_z_"
]

LABELS = [
    "WALKING",
    "WALKING_UPSTAIRS",
    "WALKING_DOWNSTAIRS",
    "SITTING",
    "STANDING",
    "LAYING"
]


path = 'ts_data'

train = [path + "/train/%strain.txt" % signal for signal in INPUT_SIGNAL_TYPES]
test = [path + "/test/%stest.txt" % signal for signal in INPUT_SIGNAL_TYPES]


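def load_data(files):
    """Stack one file per signal into an array of shape (samples, timesteps, signals)."""
    arr = []
    for fname in files:
        with open(fname, 'r') as f:
            # split() with no argument handles the variable runs of spaces between values
            rows = [row.split() for row in f if row.strip()]
            arr.append([np.array(ele, dtype=np.float32) for ele in rows])
    # (signals, samples, timesteps) -> (samples, timesteps, signals)
    return np.transpose(np.array(arr), (1, 2, 0))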
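`load_data` stacks one 2-D array per signal file, giving shape `(signals, samples, timesteps)`, and `np.transpose(..., (1, 2, 0))` moves the signal axis last. A small self-contained check of that permutation, using dummy values rather than the HAR files:

```python
import numpy as np

n_signals, n_samples, n_timesteps = 9, 5, 128
# Stacked the way load_data builds it: one (samples, timesteps) block per signal file
stacked = np.arange(n_signals * n_samples * n_timesteps, dtype=np.float32).reshape(
    n_signals, n_samples, n_timesteps)

# Same permutation as load_data: (signals, samples, timesteps) -> (samples, timesteps, signals)
X = np.transpose(stacked, (1, 2, 0))
print(X.shape)  # (5, 128, 9)

# Element [s, t, c] of X is element [c, s, t] of the stacked array
assert X[2, 10, 7] == stacked[7, 2, 10]
```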

In [7]:
X_train = load_data(train)
X_test = load_data(test)

X_train.shape, X_test.shape

((7352, 128, 9), (2947, 128, 9))

In [10]:
y_train_path = path + "/train/y_train.txt"
y_train = get_labels_from_csv(y_train_path)

y_test_path = path + "/test/y_test.txt"
y_test = get_labels_from_csv(y_test_path)
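In the UCI HAR dataset the activity ids in `y_train.txt` and `y_test.txt` are 1-based (1 = WALKING through 6 = LAYING), in the same order as `LABELS`. Assuming that convention, a hypothetical helper to turn the float label arrays back into readable names might look like this:

```python
import numpy as np

LABELS = ["WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
          "SITTING", "STANDING", "LAYING"]


def label_names(y):
    """Map 1-based float activity ids (assumed UCI HAR convention) to names."""
    return [LABELS[int(v) - 1] for v in y]


y = np.array([1.0, 6.0, 4.0], dtype='float32')
print(label_names(y))  # ['WALKING', 'LAYING', 'SITTING']
```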

In [13]:
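import pickle
for split, data in [('train', [X_train, y_train]), ('test', [X_test, y_test])]:
    if not os.path.isdir(os.path.join('pkl_data', split)):  # pkl_data/ is uploaded to S3 below
        os.makedirs(os.path.join('pkl_data', split))
    with open(os.path.join('pkl_data', split, split + '.pkl'), 'wb') as f:
        pickle.dump(data, f)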

In [24]:
X_t, y_t = pickle.load(open('pkl_data/test/test.pkl', "rb"))
X_t.shape, y_t.shape

((2947, 128, 9), (2947,))
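Before uploading, it is cheap to check that each pickled split has a consistent layout. A minimal sketch, assuming the training script expects a float32 `X` of shape `(samples, 128, 9)` and a 1-D float32 `y` of matching length; the helper `check_split` is hypothetical:

```python
import numpy as np


def check_split(X, y, n_timesteps=128, n_signals=9):
    """Raise ValueError if (X, y) does not look like one packaged HAR split."""
    if X.ndim != 3 or X.shape[1:] != (n_timesteps, n_signals):
        raise ValueError("X must be (samples, %d, %d), got %r" % (n_timesteps, n_signals, X.shape))
    if y.ndim != 1 or y.shape[0] != X.shape[0]:
        raise ValueError("y must be 1-D with %d entries, got %r" % (X.shape[0], y.shape))
    if X.dtype != np.float32 or y.dtype != np.float32:
        raise ValueError("expected float32 arrays")
    return X.shape[0]


# Dummy arrays with the same shapes as the test split loaded above
X_dummy = np.zeros((2947, 128, 9), dtype=np.float32)
y_dummy = np.zeros(2947, dtype=np.float32)
print(check_split(X_dummy, y_dummy))  # 2947
```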

--- end of data transformation, loading and packaging ----

## Uploading the data

We use `sagemaker.Session.upload_data` to upload the `pkl_data` directory to S3. When no bucket is given, it uses the session's default bucket. The return value, stored in `inputs`, is the S3 URI of the uploaded data; we pass it to the training job later.
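When `path` is a directory, `upload_data` uploads every file under it and keeps the relative paths below `key_prefix`. The sketch below only computes the expected object URIs offline; `my-bucket` is a placeholder for the session's default bucket, and `expected_s3_uris` is a hypothetical helper, not part of the SageMaker SDK.

```python
import os


def expected_s3_uris(local_files, local_dir, bucket, key_prefix):
    """Mirror local_dir/<relpath> to s3://bucket/key_prefix/<relpath>."""
    uris = []
    for path in local_files:
        rel = os.path.relpath(path, local_dir).replace(os.sep, '/')
        uris.append('s3://%s/%s/%s' % (bucket, key_prefix, rel))
    return uris


files = ['pkl_data/train/train.pkl', 'pkl_data/test/test.pkl']
for uri in expected_s3_uris(files, 'pkl_data', 'my-bucket', 'data/har_pkl'):
    print(uri)
# s3://my-bucket/data/har_pkl/train/train.pkl
# s3://my-bucket/data/har_pkl/test/test.pkl
```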

In [16]:
inputs = sagemaker_session.upload_data(path='pkl_data', key_prefix='data/har_pkl')

## View the training script

Run the cell below to print `generic_ts.py`.

In [None]:
!cat generic_ts.py

## Run the training script on SageMaker

The `MXNet` estimator runs our training script on SageMaker infrastructure. We configure it with the training script, an IAM role, the number of training instances, and the training instance type. Here we run the training job on a single `ml.p2.xlarge` GPU instance, which is why `num_gpus` is set to 1.

In [36]:
m = MXNet("generic_ts.py",
          role=role,
          # SageMaker Python SDK v1 argument names (renamed instance_count / instance_type in v2)
          train_instance_count=1,
          train_instance_type="ml.p2.xlarge",
          hyperparameters={'batch_size': 32,
                           'epochs': 1,
                           'learning_rate': 0.01,
                           'momentum': 0.9,
                           'log_interval': 100,
                           'n_out': len(LABELS),  # number of output classes
                           'num_gpus': 1})
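With these hyperparameters, one epoch over the 7,352 training windows is `ceil(7352 / 32) = 230` mini-batches: 229 full batches plus a final batch of 24, assuming the script keeps the last partial batch. The log points below assume `log_interval` counts batches and that logging happens when the batch index is a multiple of it; how `generic_ts.py` actually uses it is not shown here.

```python
import math

n_train = 7352  # X_train.shape[0] from the output above
hyperparameters = {'batch_size': 32, 'epochs': 1, 'log_interval': 100}

batches_per_epoch = int(math.ceil(n_train / float(hyperparameters['batch_size'])))
last_batch = n_train - (batches_per_epoch - 1) * hyperparameters['batch_size']
# Assumption: the script logs on batch indices i where i % log_interval == 0
log_points = [i for i in range(batches_per_epoch) if i % hyperparameters['log_interval'] == 0]

print(batches_per_epoch, last_batch, log_points)  # 230 24 [0, 100, 200]
```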

After constructing the `MXNet` object, we fit it on the data uploaded to S3. SageMaker copies that data onto the training instance before the script starts, so the training script can simply read it from local disk.
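The training log further down shows the `training` channel mounted at `/opt/ml/input/data/training`, and in File mode the objects under the S3 prefix keep their relative layout there. A sketch of how a script could read one split, parameterized on the channel directory so it can be exercised locally; `load_split` and the `<split>/<split>.pkl` layout are assumptions, and `generic_ts.py` may do this differently.

```python
import os
import pickle
import tempfile

import numpy as np


def load_split(channel_dir, split):
    """Read <channel_dir>/<split>/<split>.pkl and return (X, y)."""
    with open(os.path.join(channel_dir, split, split + '.pkl'), 'rb') as f:
        X, y = pickle.load(f)
    return X, y


# Local stand-in for /opt/ml/input/data/training
channel_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(channel_dir, 'test'))
with open(os.path.join(channel_dir, 'test', 'test.pkl'), 'wb') as f:
    pickle.dump([np.zeros((3, 128, 9), np.float32), np.ones(3, np.float32)], f)

X, y = load_split(channel_dir, 'test')
print(X.shape, y.shape)  # (3, 128, 9) (3,)
```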


In [38]:
m.fit(inputs)

INFO:sagemaker:Creating training-job with name: sagemaker-mxnet-py2-gpu-2018-01-29-07-20-17-950


[31mexecuting startup script (first run)[0m
[31m2018-01-29 07:27:06,741 INFO - root - running container entrypoint[0m
[31m2018-01-29 07:27:06,741 INFO - root - starting train task[0m
[31m2018-01-29 07:27:09,477 INFO - mxnet_container.train - MXNetTrainingEnvironment: {'enable_cloudwatch_metrics': False, 'available_gpus': 1, 'channels': {u'training': {u'TrainingInputMode': u'File', u'RecordWrapperType': u'None', u'S3DistributionType': u'FullyReplicated'}}, '_ps_verbose': 0, 'resource_config': {u'current_host': u'algo-1', u'hosts': [u'algo-1']}, 'user_script_name': u'generic_ts.py', 'input_config_dir': '/opt/ml/input/config', 'channel_dirs': {u'training': u'/opt/ml/input/data/training'}, 'code_dir': '/opt/ml/code', 'output_data_dir': '/opt/ml/output/data/', 'output_dir': '/opt/ml/output', 'model_dir': '/opt/ml/model', 'hyperparameters': {u'sagemaker_program': u'generic_ts.py', u'num_gpus': 1, u'lea