## UFO Sightings Evaluation and Optimization


1. [Create and train our "optimized" model (Linear Learner)](#1.-Create-and-train-our-%22optimized%22-model-(Linear-Learner))
1. Compare the results!

In [1]:
#Import all the needed libraries.

import pandas as pd
import numpy as np
from datetime import datetime


import boto3
from sagemaker import get_execution_role
import sagemaker

In [2]:
# Prerequisites: create the S3 bucket, upload the data files, and give the SageMaker IAM role access to the bucket
role = get_execution_role()
bucket = 'sagemaker-ml-lab-tx'

---

### 1. Create and train our "optimized" model (Linear Learner)

Hyperparameter Tuning

In [3]:
#get the recordIO file for the training data that is in S3
train_file = 'ufo_sightings_train_recordIO_protobuf.data'
training_recordIO_protobuf_location = 's3://{}/algorithms_lab/linearlearner_train/{}'.format(bucket, train_file)
print('The Pipe mode recordIO protobuf training data: {}'.format(training_recordIO_protobuf_location))

The Pipe mode recordIO protobuf training data: s3://sagemaker-ml-lab-tx/algorithms_lab/linearlearner_train/ufo_sightings_train_recordIO_protobuf.data


In [4]:
#get the recordIO file for the validation data that is in S3
validation_file = 'ufo_sightings_validatioin_recordIO_protobuf.data'
validate_recordIO_protobuf_location = 's3://{}/algorithms_lab/linearlearner_validation/{}'.format(bucket, validation_file)
print('The Pipe mode recordIO protobuf validation data: {}'.format(validate_recordIO_protobuf_location))

The Pipe mode recordIO protobuf validation data: s3://sagemaker-ml-lab-tx/algorithms_lab/linearlearner_validation/ufo_sightings_validatioin_recordIO_protobuf.data
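
Both channels point at objects that must already exist in S3 (see the prerequisites above). Here is a minimal sketch for checking that before launching a billed training job. `split_s3_uri` and `s3_object_exists` are hypothetical helper names, not part of the lab, and the check assumes the notebook role is allowed to call `HeadObject` on the bucket.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != 's3' or not parsed.netloc:
        raise ValueError('Not an S3 URI: {}'.format(uri))
    return parsed.netloc, parsed.path.lstrip('/')


def s3_object_exists(uri, s3_client=None):
    """Return True if the object behind an S3 URI exists (hypothetical helper)."""
    # Imported lazily so split_s3_uri can be used without AWS libraries
    from botocore.exceptions import ClientError

    if s3_client is None:
        import boto3
        s3_client = boto3.client('s3')

    bucket_name, key = split_s3_uri(uri)
    try:
        s3_client.head_object(Bucket=bucket_name, Key=key)
        return True
    except ClientError as err:
        # HeadObject reports a missing key as a 404 error code
        if err.response['Error']['Code'] in ('404', 'NoSuchKey', 'NotFound'):
            return False
        raise
```

In the notebook this could be called as `s3_object_exists(training_recordIO_protobuf_location)` and `s3_object_exists(validate_recordIO_protobuf_location)`; a `False` result usually means the file name or prefix does not match what was uploaded.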



In [5]:
from sagemaker.amazon.amazon_estimator import get_image_uri
import sagemaker

container = get_image_uri(boto3.Session().region_name, 'linear-learner', "1")

Let's create a training job that uses the optimized hyperparameters.
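
The optimized values come from a hyperparameter tuning job. As a sketch, and assuming that tuning job was run in the same account and region, its best values can be read from the `DescribeHyperParameterTuningJob` response instead of being copied by hand. The helper names and the tuning job name below are hypothetical; the lab itself does not name the tuning job.

```python
def tuned_hyperparameters(tuning_job_description):
    """Return the tuned hyperparameters of the best training job as a dict."""
    best_job = tuning_job_description.get('BestTrainingJob')
    if not best_job:
        raise ValueError('The tuning job has no best training job yet')
    # Values are returned as strings, e.g. {'learning_rate': '0.01'}
    return dict(best_job['TunedHyperParameters'])


def fetch_tuned_hyperparameters(tuning_job_name, sm_client=None):
    """Look up a tuning job by name and return its best hyperparameters."""
    if sm_client is None:
        import boto3  # only needed when no client is passed in
        sm_client = boto3.client('sagemaker')
    description = sm_client.describe_hyper_parameter_tuning_job(
        HyperParameterTuningJobName=tuning_job_name)
    return tuned_hyperparameters(description)
```

The resulting dict could then be unpacked into the estimator, for example `linear.set_hyperparameters(feature_dim=22, predictor_type='multiclass_classifier', num_classes=3, **fetch_tuned_hyperparameters('<your-tuning-job-name>'))`.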

In [6]:
# Create a training job name
job_name = 'ufo-linear-learner-job-optimized-{}'.format(datetime.now().strftime("%Y%m%d%H%M%S"))
print('Here is the job name {}'.format(job_name))

# Here is where the model-artifact will be stored
output_location = 's3://{}/optimization_evaluation_lab/linearlearner_optimized_output'.format(bucket)

Here is the job name ufo-linear-learner-job-optimized-20200405091708


In [7]:
%%time
sess = sagemaker.Session()

# Set up the Linear Learner algorithm from the ECR container
linear = sagemaker.estimator.Estimator(container,
                                       role, 
                                       train_instance_count=1, 
                                       train_instance_type='ml.c4.xlarge',
                                       output_path=output_location,
                                       sagemaker_session=sess,
                                       input_mode='Pipe')
# Setup the hyperparameters
linear.set_hyperparameters( feature_dim=22, 
                            predictor_type='multiclass_classifier',
                            num_classes=3,
                            ## enter the optimized hyperparameters from the tuning job here, one keyword argument per line
                          )


# Launch a training job. This method calls the CreateTrainingJob API
data_channels = {
    'train': training_recordIO_protobuf_location,
    'validation': validate_recordIO_protobuf_location
}
linear.fit(data_channels, job_name=job_name)

2020-04-05 09:17:16 Starting - Starting the training job...
2020-04-05 09:17:18 Starting - Launching requested ML instances......
2020-04-05 09:18:20 Starting - Preparing the instances for training......
2020-04-05 09:19:33 Downloading - Downloading input data...
2020-04-05 09:20:13 Training - Training image download completed. Training in progress.
2020-04-05 09:20:13 Uploading - Uploading generated training model
Docker entrypoint called with argument(s): train
Running default environment configuration script
[04/05/2020 09:20:05 INFO 140377241651008] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/resources/default-input.json: {u'loss_insensitivity': u'0.01', u'epochs': u'15', u'feature_dim': u'auto', u'init_bias': u'0.0', u'lr_scheduler_factor': u'auto', u'num_calibration_samples': u'10000000', u'accuracy_top_k': u'3', u'_num_kv_servers': u'auto', u'use_bias': u'true', u'num_point_for_scaler': u'10000', u'_log_level': u'inf


2020-04-05 09:20:20 Completed - Training job completed
Training seconds: 47
Billable seconds: 47
CPU times: user 592 ms, sys: 30.1 ms, total: 622 ms
Wall time: 3min 42s


Now we can compare this job's billable time (47 seconds) and accuracy against those of our baseline model.
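
One way to make that comparison, sketched under assumptions: both jobs are described with `describe_training_job`, and the billable seconds and final metrics are read from the responses. The helper names are hypothetical, and the metric name passed in (for example `validation:objective_loss`) is an assumption to check against the metrics your jobs actually report.

```python
def summarize_training_job(description):
    """Pull billable seconds and final metrics out of a DescribeTrainingJob response."""
    metrics = {m['MetricName']: m['Value']
               for m in description.get('FinalMetricDataList', [])}
    return {'billable_seconds': description.get('BillableTimeInSeconds'),
            'metrics': metrics}


def compare_jobs(baseline, optimized, metric_name):
    """Return optimized-minus-baseline deltas for billable seconds and one metric."""
    base = summarize_training_job(baseline)
    opt = summarize_training_job(optimized)
    return {
        'billable_seconds_delta': opt['billable_seconds'] - base['billable_seconds'],
        'metric_delta': opt['metrics'][metric_name] - base['metrics'][metric_name],
    }
```

The two inputs would come from calls such as `boto3.client('sagemaker').describe_training_job(TrainingJobName=job_name)`, once for the baseline job and once for the optimized one. A negative `billable_seconds_delta` means the optimized job was cheaper to train.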