# Morse Code Lab
This lab uses data produced by using the IoTanium device as a Morse code signal generator.  This notebook demonstrates the use of an SKLearn model, which will be trained to recognize particular signals. 

In this lab, the data have already been generated by manually tapping out Morse code patterns on the IoTanium button.  Each of the two patterns was tapped out many times and recorded.  Each recording is stored as a sequence of `(time, value)` pairs, using the IoTanium onboard clock and the button value (1=pressed, 0=not pressed).  

The two sequences recorded are:
- `....  ..` = `HI`
- `...  ---  ...` = `SOS` (international nautical/aviation distress signal)

You can learn more about Morse code here: https://en.wikipedia.org/wiki/Morse_code
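
As a quick illustration of how the two patterns listed above map to letters, here is a minimal sketch of a decoder. The table covers only the four letters used in this lab, and `decode_morse` is a hypothetical helper name, not something the IoTanium device or the training script provides.

```python
# Minimal Morse table covering only the letters used in this lab.
MORSE_TO_LETTER = {
    "....": "H",
    "..": "I",
    "...": "S",
    "---": "O",
}

def decode_morse(pattern):
    """Decode dot/dash groups separated by whitespace into letters."""
    # str.split() with no argument collapses runs of spaces, so the
    # double spaces used in the list above are handled.
    return "".join(MORSE_TO_LETTER[group] for group in pattern.split())

print(decode_morse("....  .."))        # HI
print(decode_morse("...  ---  ..."))   # SOS
```

Note that the model trained later in this notebook does not decode letters this way; it clusters whole signal shapes.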

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import json
import os
import boto3
import sagemaker
from sagemaker import get_execution_role

Define the training data location

In [None]:
bucket_name='iotanium-test'
prefix='iotanium/data_7'

Role and session information needed for the .fit and .deploy operations by SageMaker

In [None]:
role = get_execution_role()

In [None]:
print(role)

In [None]:
sagemaker_session = sagemaker.Session()

In [None]:
s3_client = boto3.client('s3')
s3 = boto3.resource('s3')

# Data Preparation

Dump out a list of all the available data files.

In [None]:
obj_list=s3_client.list_objects(Bucket=bucket_name, Prefix=prefix)
files=[]
for contents in obj_list['Contents']:
    files.append(contents['Key'])
print(files)
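
A single `list_objects` call returns at most 1000 keys, which is plenty for this lab but would silently truncate a larger data set. The sketch below shows one way to follow pagination with a boto3 paginator; `list_keys` is a hypothetical helper name, and the client is passed in so the function does not depend on notebook state.

```python
def list_keys(client, bucket, prefix):
    """Return every object key under prefix, following pagination."""
    keys = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys

# Usage in this notebook (assumes s3_client, bucket_name and prefix from above):
# files = list_keys(s3_client, bucket_name, prefix)
```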

Define a function to read the JSON data files and extract the (time, value) tuples to arrays `x` and `y`

In [None]:
def read_data(filename):
    data = None
    with open(filename, 'r') as fh:
        raw = json.load(fh)
        data = raw['records']
        rawx,rawy = zip(*data)
        
        #-- subtract off initial timestamp so x starts at zero
        x = np.array(rawx)-rawx[0]
        y = np.array(rawy)
    return x,y
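
To make the file layout concrete, here is a small sketch of the parsing that `read_data` performs, applied to a made-up recording held in memory. The sample values are hypothetical, and the timestamps are assumed to be in milliseconds, consistent with the 250 millisecond padding used in the preparation step.

```python
import json
import numpy as np

# Hypothetical recording: one short press, timestamps in milliseconds (assumed).
raw_text = json.dumps({
    "records": [[1000, 0], [1100, 0], [1200, 1], [1300, 1], [1400, 0]]
})

raw = json.loads(raw_text)
rawx, rawy = zip(*raw["records"])   # unzip the pairs into times and values

x = np.array(rawx) - rawx[0]        # shift so the first sample is at t = 0
y = np.array(rawy)

print(x.tolist())   # [0, 100, 200, 300, 400]
print(y.tolist())   # [0, 0, 1, 1, 0]
```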

These functions trim the recorded signal to +/- 250 millisec around the leading and trailing edges.  The result is then interpolated to a uniformly spaced array of 500 values.

There is one function for the touch sensor (where a touch lowers the values) and one for the button sensor (where a push raises the value from 0 to 1).  The one to invoke is wrapped in `prepare_data(x,y)` below.

In [None]:
def prepare_data_touch(x,y):

    #-- define a threshold to determine leading/trailing exceedance edge
    my = np.median(y)
    thresh = 0.75 * my
    
    #-- find leading/trailing edges
    x0 = np.min(x[y<thresh]) if np.any(y<thresh) else np.min(x)
    xN = np.max(x[y<thresh]) if np.any(y<thresh) else np.max(x)
    
    #-- start and end a bit before/after edges
    x0 -= 250
    xN += 250
    
    xn = (x-x0) / float(xN-x0)
    yn = y[(xn>=0) & (xn<=1)]
    xn = xn[(xn>=0) & (xn<=1)]
    
    newx = np.arange(500)*0.002
    newy = np.interp(newx,xn,yn)
    return newx,newy

In [None]:
def prepare_data_button(x,y):

    #-- define a threshold to determine leading/trailing exceedance edge
    thresh = 0.5
    
    #-- find leading/trailing edges
    x0 = np.min(x[y>thresh]) if np.any(y>thresh) else np.min(x)
    xN = np.max(x[y>thresh]) if np.any(y>thresh) else np.max(x)
    
    #-- start and end a bit before/after edges
    x0 -= 250
    xN += 250
    
    xn = (x-x0) / float(xN-x0)
    yn = y[(xn>=0) & (xn<=1)]
    xn = xn[(xn>=0) & (xn<=1)]
    
    newx = np.arange(500)*0.002
    newy = np.interp(newx,xn,yn)
    return newx,newy

Select which sensor will be used.

In [None]:
def prepare_data(x,y):
    return prepare_data_button(x,y)
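
To see what the preparation step does, the sketch below applies a condensed restatement of the button variant to a synthetic recording. `trim_and_resample` and the synthetic signal are illustrative only; the logic is intended to mirror `prepare_data_button`, with the threshold, padding and output length exposed as parameters.

```python
import numpy as np

def trim_and_resample(x, y, thresh=0.5, pad=250, n=500):
    """Condensed restatement of prepare_data_button, for illustration."""
    pressed = y > thresh
    x0 = (x[pressed].min() if pressed.any() else x.min()) - pad
    xN = (x[pressed].max() if pressed.any() else x.max()) + pad
    xn = (x - x0) / float(xN - x0)          # map [x0, xN] onto [0, 1]
    keep = (xn >= 0) & (xn <= 1)
    newx = np.arange(n) / n                 # 0.000, 0.002, ..., 0.998 for n=500
    return newx, np.interp(newx, xn[keep], y[keep])

# Synthetic recording sampled every 10 ms for 3 s, pressed from 1000 to 1500 ms.
x = np.arange(0, 3000, 10)
y = ((x >= 1000) & (x <= 1500)).astype(float)

newx, newy = trim_and_resample(x, y)
print(len(newy))            # 500
print(newy[0], newy[250])   # 0.0 at the padded start, 1.0 mid-press
```

Because every recording is rescaled to the same 0 to 1 time axis, signals tapped at different speeds end up as comparable 500-value vectors.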

# Examine Training Data

In [None]:
os.makedirs(prefix, exist_ok=True)

Copy the data files to the local notebook instance, perform data preparation and plot the result.

In [None]:
for filename in files:
    #-- download each S3 object to the same relative path on the notebook instance
    s3.Bucket(bucket_name).download_file(filename, filename)
    rawx,rawy = read_data(filename)
    x,y = prepare_data(rawx,rawy)
    plt.plot(x,y)
    plt.show()

# Training

Import the wrapper that lets SageMaker treat SKLearn algorithms as if they were SageMaker "native" models.

In [None]:
from sagemaker.sklearn.estimator import SKLearn

In [None]:
script_path = 'morse_kmeans2.py'

In [None]:
!pygmentize morse_kmeans2.py

Create the object which will act as the model.  This wraps the external Python script (which contains the same cleaning functions as above) and SKLearn's implementation of KMeans clustering.

In [None]:
instance_type = 'ml.m4.xlarge'
#instance_type = 'local'

In [None]:
sklearn = SKLearn(
    entry_point=script_path,
    train_instance_type=instance_type,
    role=role,
    hyperparameters={'n_clusters': 2})
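
For intuition about what the clustering step does once the signals are prepared, here is a minimal local sketch using scikit-learn's `KMeans` directly. It is not the contents of `morse_kmeans2.py`; the two signal shapes and the noise level are made up to stand in for prepared recordings.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
t = np.arange(500) * 0.002

# Two hypothetical signal shapes standing in for the prepared recordings:
# one long press and two short presses.
shape_a = ((t > 0.2) & (t < 0.8)).astype(float)
shape_b = (((t > 0.2) & (t < 0.4)) | ((t > 0.6) & (t < 0.8))).astype(float)

# Ten noisy copies of each shape, stacked into a (20, 500) training matrix.
X = np.vstack([s + 0.05 * rng.standard_normal(500)
               for s in [shape_a] * 10 + [shape_b] * 10])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(model.cluster_centers_.shape)   # (2, 500)
print(model.labels_)                  # one cluster index per training row
```

KMeans is unsupervised, so which shape ends up as cluster 0 or cluster 1 is arbitrary; the mapping from cluster index to `HI` or `SOS` has to be established by inspecting the result.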

Fit the SKLearn model to the data (i.e. determine clusters).  This will spin up and spin down the instance type specified above, and can take several minutes.

In [None]:
sklearn.fit({'train': 's3://{}/{}'.format(bucket_name, prefix)})

Print the model artifacts location.

In [None]:
print(sklearn.model_data)

# Examine Model
SageMaker always stores the model artifacts in S3, regardless of whether the model was trained locally or on cloud instances. To unpack the details of the model, we have to copy the model artifacts to the notebook instance and load the object. Fortunately, SKLearn is open source, so we can load and examine the model here. Note that this subsection is purely diagnostic curiosity and is not strictly part of a train/deploy operation.

In [None]:
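import urllib.parse
import pickle
try:
    import joblib  # standalone package, used by recent scikit-learn releases
except ImportError:
    from sklearn.externals import joblib  # older scikit-learn releases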

In [None]:
u = urllib.parse.urlsplit(sklearn.model_data)

In [None]:
s3_client.download_file(u.netloc, u.path[1:], 'model.tar.gz')
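
The two cells above rely on `urlsplit` to break an S3 URI into the bucket and object key that `download_file` expects. A small sketch with a made-up URI (the bucket and path here are hypothetical) shows what each piece contains:

```python
from urllib.parse import urlsplit

# Hypothetical artifact location, used only to show the parsing.
model_uri = "s3://example-bucket/some/prefix/output/model.tar.gz"

u = urlsplit(model_uri)
bucket = u.netloc          # the part between "s3://" and the next "/"
key = u.path[1:]           # drop the leading "/" to get the object key

print(bucket)   # example-bucket
print(key)      # some/prefix/output/model.tar.gz
```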

In [None]:
!tar -xzvf model.tar.gz

In [None]:
skmodel = joblib.load("model.joblib")

We can see what the "cluster centers" look like; these are basically what the model sees as the nominal version of each signal and what any new data will be compared against.

In [None]:
plt.plot(skmodel.cluster_centers_[0])

In [None]:
plt.plot(skmodel.cluster_centers_[1])
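
The comparison against the cluster centers amounts to a nearest-center lookup by Euclidean distance, which is how KMeans assigns new samples. The sketch below spells that out with NumPy; `nearest_center` is a hypothetical helper, and the centers are tiny made-up vectors of length 4 rather than 500 to keep the example readable.

```python
import numpy as np

def nearest_center(signal, centers):
    """Return the index of the center closest to signal (Euclidean distance)."""
    distances = np.linalg.norm(centers - signal, axis=1)
    return int(np.argmin(distances))

# Tiny stand-in centers, one row per cluster.
centers = np.array([[0.0, 1.0, 1.0, 0.0],
                    [1.0, 0.0, 0.0, 1.0]])

print(nearest_center(np.array([0.1, 0.9, 0.8, 0.0]), centers))   # 0
print(nearest_center(np.array([0.9, 0.2, 0.1, 1.0]), centers))   # 1
```

With the loaded model, `skmodel.predict` on a prepared 500-value signal should give the same kind of answer.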

Examine how the model labeled the training data.

In [None]:
skmodel.labels_

# Deploy Endpoint

Create the endpoint. It takes in the JSON sent along the IoT Core -> Lambda function -> endpoint path and returns a JSON response (received and modified by the Lambda function). Note that the endpoint keeps running until you delete it manually, so remove it when you are done with it.

In [None]:
predictor = sklearn.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")

In [None]:
runtime = boto3.Session().client(service_name='runtime.sagemaker',region_name='us-east-1')

Get the name of the endpoint.  This must be added to the Lambda function environment so it knows where to send data.

In [None]:
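# `endpoint` holds the endpoint name in SageMaker Python SDK v1 (the version
# implied by `train_instance_type` above); SDK v2 renamed it to `endpoint_name`.
endpoint_name = predictor.endpoint
endpoint_name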
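
For orientation, here is a minimal sketch of what the Lambda side of this path might look like. It is not the lab's actual Lambda function: the `ENDPOINT_NAME` environment variable, the `prediction` key in the return value, and the assumption that the incoming event is the recording JSON itself are all hypothetical. The runtime client is an optional argument so the handler can be exercised without AWS access.

```python
import json
import os

def lambda_handler(event, context, runtime=None):
    """Forward an incoming recording to the SageMaker endpoint."""
    if runtime is None:
        # Imported lazily so the function can be exercised with a stub client.
        import boto3
        runtime = boto3.client("sagemaker-runtime")

    response = runtime.invoke_endpoint(
        EndpointName=os.environ["ENDPOINT_NAME"],   # hypothetical variable name
        ContentType="application/json",
        Body=json.dumps(event),
    )
    # The response format depends on the inference script, so return it as text.
    return {"prediction": response["Body"].read().decode("utf-8")}
```

Reading the endpoint name from the environment means the function does not need to be redeployed when the endpoint is recreated under a new name; only the variable changes.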

Get one of the available files (used for training, but just to see what happens)

In [None]:
with open(files[0], 'rb') as f:
    payload = f.read()
#payload = bytearray(payload)

Send the file to the endpoint to examine the response

In [None]:
response = runtime.invoke_endpoint(EndpointName=endpoint_name,
                                   ContentType='application/json',
                                   Body=payload)

In [None]:
response_body = response['Body'].read()

In [None]:
print(response_body.decode('utf-8'))

# Clean up
Delete the endpoint when you're done to avoid ongoing charges.

In [None]:
# Delete through the predictor returned by deploy(); this works in both SDK v1 and v2.
predictor.delete_endpoint()