## DeepAR Model - Predict AAPL stock price  

Note: This dataset is not a true time series, as there are a lot of gaps.

We have data only for the first 20 days of each month, and the model needs to predict the values for the remaining days of the month. The dataset consists of two years of data. DeepAR shines with a true multiple-time-series dataset, such as the electricity example linked below.
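DeepAR works on regularly spaced series, so gaps like these have to be represented explicitly rather than silently skipped. A minimal sketch, assuming the raw data is a pandas Series indexed by date: reindex it onto a full daily calendar, count the gaps, and write missing values as the string "NaN", which is how the DeepAR documentation describes encoding missing targets in JSON (worth double-checking against the current docs). The helper names and the values are hypothetical.

```python
import pandas as pd

def fill_calendar_gaps(series: pd.Series, freq: str = "D") -> pd.Series:
    """Reindex a date-indexed series onto a regular calendar; dates with no data become NaN."""
    full_index = pd.date_range(series.index.min(), series.index.max(), freq=freq)
    return series.reindex(full_index)

def to_deepar_target(series: pd.Series) -> list:
    """Convert values to a JSON-friendly list, writing missing values as the string "NaN"."""
    return ["NaN" if pd.isna(v) else float(v) for v in series]

# Hypothetical values for 2020-01-02 .. 2020-01-07 with the weekend (4th and 5th) missing
prices = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"]),
)
regular = fill_calendar_gaps(prices)
print(len(regular), int(regular.isna().sum()))  # 6 2
print(to_deepar_target(regular))  # [1.0, 2.0, 'NaN', 'NaN', 3.0, 4.0]
```

Whether to fill gaps this way or to treat trading days as consecutive steps is a modeling choice; the sketch only shows how to make the gaps visible.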

In [79]:
import time
import numpy as np
import pandas as pd
import json
import matplotlib.pyplot as plt
import datetime

import boto3
import sagemaker
from sagemaker import get_execution_role

# This code is derived from AWS SageMaker Samples:
# https://github.com/awslabs/amazon-sagemaker-examples/tree/master/introduction_to_amazon_algorithms/deepar_electricity
# https://github.com/awslabs/amazon-sagemaker-examples/tree/master/introduction_to_amazon_algorithms/deepar_synthetic

In [80]:
# Set a good base job name when building different models
# It will help in identifying trained models and endpoints
with_categories = False
if with_categories:
    base_job_name = 'AAPL-with-categories'
else:
    base_job_name = 'AAPL-no-categories'
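The job name printed further down (AAPL-no-categories-2020-12-30-23-42-14-622) shows the SDK appending a timestamp, down to milliseconds, to base_job_name. A sketch that reproduces that format for a given datetime and checks the result against SageMaker's training job name rules (at most 63 characters, alphanumerics optionally separated by hyphens); `make_job_name` is a hypothetical helper, not the SDK's own function.

```python
import re
from datetime import datetime, timezone

# SageMaker training job names: alphanumerics optionally separated by hyphens, max 63 chars
JOB_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")

def make_job_name(base: str, when: datetime) -> str:
    """Hypothetical helper: append a '%Y-%m-%d-%H-%M-%S-mmm' timestamp to a base name."""
    millis = when.microsecond // 1000
    stamp = when.strftime("%Y-%m-%d-%H-%M-%S") + f"-{millis:03d}"
    name = f"{base}-{stamp}"
    if len(name) > 63 or not JOB_NAME_PATTERN.match(name):
        raise ValueError(f"invalid training job name: {name!r}")
    return name

when = datetime(2020, 12, 30, 23, 42, 14, 622000, tzinfo=timezone.utc)
print(make_job_name("AAPL-no-categories", when))
# AAPL-no-categories-2020-12-30-23-42-14-622
```

The timestamp suffix takes 24 characters, so a base job name should stay under roughly 39 characters.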

In [81]:
# Specify your bucket name
bucket = 'mw-ml-sagemaker'
prefix = 'deepar/AAPL'

# This structure allows multiple training and test files for model development and testing
if with_categories:
    s3_data_path = "{}/{}/data_with_categories".format(bucket, prefix)
else:
    s3_data_path = "{}/{}/data".format(bucket, prefix)
    

s3_output_path = "{}/{}/output".format(bucket, prefix)

In [82]:
s3_data_path,s3_output_path

('mw-ml-sagemaker/deepar/AAPL/data', 'mw-ml-sagemaker/deepar/AAPL/output')

In [83]:
# In S3, a file name is referred to as a key
# S3 Standard automatically stores objects across at least
# three availability zones in the region where the bucket was created.
# http://boto3.readthedocs.io/en/latest/guide/s3.html
def write_to_s3(filename, bucket, key):
    with open(filename,'rb') as f: # Read in binary mode
        return boto3.Session().resource('s3').Bucket(bucket).Object(key).upload_fileobj(f)
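A variant of write_to_s3 that uses the boto3 S3 client's upload_file and returns the resulting s3:// URI, which makes it easy to check that uploads land where the data channels expect. Passing the client in as an argument is an assumption made here so the function can be exercised with a stand-in object instead of real AWS credentials; `upload_to_s3` and `FakeS3Client` are hypothetical names.

```python
def upload_to_s3(filename, bucket, key, s3_client=None):
    """Upload a local file to s3://bucket/key and return that URI."""
    if s3_client is None:
        import boto3  # imported lazily so the function can be tested without AWS access
        s3_client = boto3.client("s3")
    s3_client.upload_file(Filename=filename, Bucket=bucket, Key=key)
    return f"s3://{bucket}/{key}"

class FakeS3Client:
    """Test-only stand-in that records upload_file calls instead of contacting S3."""
    def __init__(self):
        self.calls = []

    def upload_file(self, Filename, Bucket, Key):
        self.calls.append((Filename, Bucket, Key))

fake = FakeS3Client()
uri = upload_to_s3("AAPLtrain.json", "mw-ml-sagemaker",
                   "deepar/AAPL/data/AAPLtrain/train.json", s3_client=fake)
print(uri)  # s3://mw-ml-sagemaker/deepar/AAPL/data/AAPLtrain/train.json
```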

In [84]:
# Upload one or more training files and test files to S3
if with_categories:
    write_to_s3('train_with_categories.json',bucket,'deepar/AAPL/data_with_categories/AAPLtrain/train_with_categories.json')
    write_to_s3('test_with_categories.json',bucket,'deepar/AAPL/data_with_categories/AAPLtest/test_with_categories.json')
else:
    write_to_s3('AAPLtrain.json',bucket,'deepar/AAPL/data/AAPLtrain/train.json')
    write_to_s3('AAPLtest.json',bucket,'deepar/AAPL/data/AAPLtest/test.json')
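The uploaded files use DeepAR's JSON Lines format: one JSON object per line, with a "start" timestamp and a "target" list, plus an optional "cat" list when categories are used. A minimal sketch of producing such records with pandas, assuming a single daily series; following the DeepAR documentation's guidance, the training record drops the last prediction_length points while the test record keeps the full series so those points can be used for evaluation. Helper names and values are hypothetical, and in-memory buffers stand in for AAPLtrain.json and AAPLtest.json.

```python
import io
import json
import pandas as pd

def series_to_record(series: pd.Series, cat=None) -> dict:
    """One DeepAR JSON Lines record: a start timestamp plus the list of target values."""
    record = {
        "start": series.index[0].strftime("%Y-%m-%d %H:%M:%S"),
        "target": [float(v) for v in series],
    }
    if cat is not None:
        record["cat"] = list(cat)  # only needed for the with_categories variant
    return record

def write_jsonlines(records, fp):
    """Write one JSON object per line."""
    for record in records:
        fp.write(json.dumps(record) + "\n")

prediction_length = 5
series = pd.Series(
    [float(i) for i in range(20)],  # hypothetical values, one per day
    index=pd.date_range("2020-01-01", periods=20, freq="D"),
)
train_buf, test_buf = io.StringIO(), io.StringIO()
# Training data omits the last prediction_length points; test data keeps the full series
write_jsonlines([series_to_record(series.iloc[:-prediction_length])], train_buf)
write_jsonlines([series_to_record(series)], test_buf)
```

Writing to a real file works the same way with `open("AAPLtrain.json", "w")` in place of the buffer.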

In [85]:
# Establish a session with AWS
sess = sagemaker.Session()
role = get_execution_role()

In [86]:
# This role contains the permissions needed to train and deploy models
# The SageMaker service is trusted to assume this role
print(role)

arn:aws:iam::480536818350:role/service-role/AmazonSageMaker-ExecutionRole-20201001T191047


In [87]:
# https://sagemaker.readthedocs.io/en/stable/api/utility/image_uris.html#sagemaker.image_uris.retrieve

# SDK 2.x uses image_uris.retrieve to look up the container image location

# Use the DeepAR container
container = sagemaker.image_uris.retrieve("forecasting-deepar", sess.boto_region_name)

print(f'Using DeepAR Container {container}')

Using DeepAR Container 522234722520.dkr.ecr.us-east-1.amazonaws.com/forecasting-deepar:1


In [88]:
container

'522234722520.dkr.ecr.us-east-1.amazonaws.com/forecasting-deepar:1'

In [89]:
freq = 'D'  # The time series consists of daily data, and we predict daily values

# How far into the future predictions are made:
# 5 days' worth of daily forecasts
prediction_length = 5

# AWS recommends setting context_length equal to prediction_length as a starting point;
# this notebook uses a longer context of 15 days.
# This controls how far into the past the network can see
context_length = 15
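With these settings, each forecast conditions on the 15 most recent days and predicts the next 5. A conceptual sketch of that split on a plain list, assuming the series is long enough; it mirrors how the test channel is scored (the last prediction_length points are held out), but DeepAR's training samples windows from many positions and, per its documentation, also feeds in lagged values from further back, so this is not the algorithm's internal slicing. `context_and_horizon` is a hypothetical helper.

```python
def context_and_horizon(target, context_length, prediction_length):
    """Split the tail of a series into the window the model sees and the window it forecasts."""
    needed = context_length + prediction_length
    if len(target) < needed:
        raise ValueError(f"series needs at least {needed} points, got {len(target)}")
    tail = target[-needed:]
    return tail[:context_length], tail[context_length:]

days = list(range(1, 31))  # hypothetical 30 daily observations, labelled by day number
context, horizon = context_and_horizon(days, context_length=15, prediction_length=5)
print(context[0], context[-1], horizon)  # 11 25 [26, 27, 28, 29, 30]
```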

In [90]:
# Configure the training job
# Specify the type and number of instances to use
#   Reference: http://sagemaker.readthedocs.io/en/latest/estimators.html
# In SDK 2.x, instance_count and instance_type no longer take the train_ prefix

estimator = sagemaker.estimator.Estimator(
    container,
    role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    output_path="s3://" + s3_output_path,
    sagemaker_session=sess,
    base_job_name=base_job_name)

In [91]:
freq, context_length, prediction_length

('D', 15, 5)

In [92]:
# https://docs.aws.amazon.com/sagemaker/latest/dg/deepar_hyperparameters.html
hyperparameters = {
    "time_freq": freq,
    "epochs": "400",
    "early_stopping_patience": "10",
    "mini_batch_size": "64",
    "learning_rate": "5E-4",
    "context_length": str(context_length),
    "prediction_length": str(prediction_length),
    "cardinality" : "auto" if with_categories else ''
}
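early_stopping_patience ends training when the loss has not improved for that many epochs, so "epochs": "400" acts as an upper bound rather than a fixed count. A generic sketch of patience-based stopping, assuming one loss value per epoch; it illustrates the idea and is not DeepAR's implementation (which, per its documentation, also returns the model with the lowest loss). The function name and loss values are hypothetical.

```python
def early_stopping_epoch(losses, patience):
    """Return the 0-based epoch at which training stops: after `patience` consecutive
    epochs without a new best loss, or the last epoch if that never happens."""
    best = float("inf")
    epochs_since_best = 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            best = loss
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                return epoch
    return len(losses) - 1

losses = [1.0, 0.8, 0.7] + [0.75] * 12  # hypothetical: improvement stalls after epoch 2
print(early_stopping_epoch(losses, patience=10))  # 12
```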

In [93]:
hyperparameters

{'time_freq': 'D',
 'epochs': '400',
 'early_stopping_patience': '10',
 'mini_batch_size': '64',
 'learning_rate': '5E-4',
 'context_length': '15',
 'prediction_length': '5',
 'cardinality': ''}

In [94]:
estimator.set_hyperparameters(**hyperparameters)

In [95]:
# Here, we are simply referring to train path and test path
# You can have multiple files in each path
# SageMaker will use all the files
data_channels = {
    "train": "s3://{}/AAPLtrain/".format(s3_data_path),
    "test": "s3://{}/AAPLtest/".format(s3_data_path)
}

In [96]:
data_channels

{'train': 's3://mw-ml-sagemaker/deepar/AAPL/data/AAPLtrain/',
 'test': 's3://mw-ml-sagemaker/deepar/AAPL/data/AAPLtest/'}

In [97]:
# Training can take a while: around 35 minutes was noted with an ml.m4.xlarge instance (this estimator uses ml.m5.xlarge)
estimator.fit(inputs=data_channels)

2020-12-30 23:42:14 Starting - Starting the training job...
2020-12-30 23:42:38 Starting - Launching requested ML instances
ProfilerReport-1609371734: InProgress
.........
2020-12-30 23:43:59 Starting - Preparing the instances for training...
2020-12-30 23:44:39 Downloading - Downloading input data...
2020-12-30 23:45:05 Training - Training image download completed. Training in progress.
Arguments: train
[12/30/2020 23:45:06 INFO 140347361175360] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/resources/default-input.json: {u'num_dynamic_feat': u'auto', u'dropout_rate': u'0.10', u'mini_batch_size': u'128', u'test_quantiles': u'[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]', u'_tuning_objective_metric': u'', u'_num_gpus': u'auto', u'num_eval_samples': u'100', u'learning_rate': u'0.001', u'num_cells': u'40', u'num_layers': u'2', u'embedding_dimension': u'10', u'_kvstore': u'auto', u'_num_kv_servers': u'auto', u'cardinality': u'auto', u'lik

In [98]:
job_name = estimator.latest_training_job.name

In [99]:
# You can also deploy a model using the job name. The job name is also available
# in the SageMaker console -> Training -> Training jobs
# job_name = 'deepar-biketrain-with-categories-2018-12-21-04-05-44-478'

In [100]:
print ('job name: {0}'.format(job_name))

job name: AAPL-no-categories-2020-12-30-23-42-14-622


In [101]:
# Create an endpoint for real-time predictions
# In SDK 2.x, the container parameter is named image_uri

endpoint_name = sess.endpoint_from_job(
    job_name=job_name,
    initial_instance_count=1,
    instance_type='ml.m4.xlarge',
    image_uri=container,
    role=role)

---------------!

In [102]:
print ('endpoint name: {0}'.format(endpoint_name))

endpoint name: AAPL-no-categories-2020-12-30-23-42-14-622


In [103]:
# In the next lab, we will use the above endpoint for inference
# We will delete the endpoint in the next lab
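DeepAR endpoints accept a JSON body with an "instances" list (each entry holding "start" and "target") and a "configuration" object (num_samples, output_types, quantiles), and return a "predictions" list with "mean" and "quantiles" entries. A sketch of building such a request and turning a response into a dated DataFrame; the response below is made-up data so the parsing can run offline, and the actual invoke_endpoint call is left as a comment. `build_request` and `parse_response` are hypothetical helpers.

```python
import json
import pandas as pd

def build_request(start, target, quantiles=("0.1", "0.5", "0.9"), num_samples=100):
    """Serialize one series into the DeepAR JSON inference request format."""
    return json.dumps({
        "instances": [{"start": start, "target": target}],
        "configuration": {
            "num_samples": num_samples,
            "output_types": ["mean", "quantiles"],
            "quantiles": list(quantiles),
        },
    })

def parse_response(body, start, history_length, freq="D"):
    """Turn the first prediction in a DeepAR response into a DataFrame indexed by forecast date."""
    prediction = json.loads(body)["predictions"][0]
    columns = {"mean": prediction["mean"], **prediction["quantiles"]}
    horizon = len(prediction["mean"])
    # The forecast begins one step after the last observed point
    index = pd.date_range(start, periods=history_length + horizon, freq=freq)[history_length:]
    return pd.DataFrame(columns, index=index)

request = build_request("2020-01-01 00:00:00", [1.0, 2.0, 3.0])
# Against a live endpoint (not run here):
# boto3.client("sagemaker-runtime").invoke_endpoint(
#     EndpointName=endpoint_name, ContentType="application/json", Body=request)
fake_body = json.dumps({"predictions": [{
    "mean": [4.0, 5.0],
    "quantiles": {"0.1": [3.5, 4.4], "0.5": [4.0, 5.0], "0.9": [4.5, 5.6]},
}]})  # made-up response for illustration
forecast = parse_response(fake_body, "2020-01-01 00:00:00", history_length=3)
print(forecast)
```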