# EN.705.603.82.FA22 Creating AI-Enabled Systems<br/><br/>Stock Close Price Prediction with SageMaker


1. [Introduction](#Introduction)
2. [Prerequisites and Preprocessing](#Prerequisites-and-Preprocessing)
    1. [Permissions and environment variables](#Permissions-and-environment-variables)
    2. [Model definitions](#Model-definitions)
    3. [Data Setup](#Data-setup)
3. [Training the network locally](#Training-the-Network-Locally)
4. [Set up hosting for the model](#Set-up-hosting-for-the-model)
    1. [Export from TensorFlow](#Export-the-model-from-tensorflow)
    2. [Import model into SageMaker](#Import-model-into-SageMaker)
    3. [Create endpoint](#Create-endpoint)
5. [Validate the endpoint for use](#Validate-the-endpoint-for-use)


## Introduction

This notebook trains a regression model that predicts a stock's closing price. Training runs locally, on the instance where this notebook is running. We then deploy the trained model to a real-time hosted endpoint in SageMaker.

## Prerequisites and Preprocessing
### Permissions and environment variables

Here we set up the connection and authentication to AWS services. The SageMaker SDK uses the default S3 bucket when needed.

In [None]:
!pip install --upgrade tensorflow sagemaker
!pip install --upgrade pandas-datareader

In [None]:
import datetime as dt
import os
import shutil
import tarfile

import matplotlib.pyplot as plt
import numpy as np
import sagemaker
from pandas_datareader import data
from sagemaker.tensorflow import TensorFlowModel
from sklearn.model_selection import train_test_split

# Import Keras through TensorFlow so that a single Keras implementation is used throughout.
from tensorflow.keras.layers import (
    LSTM, Bidirectional, Conv1D, Dense, Dropout, Flatten, MaxPooling1D, TimeDistributed,
)
from tensorflow.keras.models import Sequential

# IAM role and session used for all SageMaker calls; the session also provides the default S3 bucket.
role = sagemaker.get_execution_role()
sm_session = sagemaker.Session()
bucket_name = sm_session.default_bucket()

### Model Definitions

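For this task, we use a combined CNN and LSTM architecture: three convolution and pooling stages extract features from each price window, two bidirectional LSTM layers model the sequence, and a single linear unit outputs the prediction.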

In [None]:
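model = Sequential()
# Each sample has shape (1, 50, 1): one time step holding a 50-value window with one channel.
# TimeDistributed applies the convolution stack to the window inside each time step.
model.add(TimeDistributed(Conv1D(128, kernel_size=1, activation='relu', input_shape=(None,50,1))))
model.add(TimeDistributed(MaxPooling1D(2)))
model.add(TimeDistributed(Conv1D(256, kernel_size=1, activation='relu')))
model.add(TimeDistributed(MaxPooling1D(2)))
model.add(TimeDistributed(Conv1D(512, kernel_size=1, activation='relu')))
model.add(TimeDistributed(MaxPooling1D(2)))
model.add(TimeDistributed(Flatten()))
# Two stacked bidirectional LSTMs; only the second collapses the sequence to a single vector.
model.add(Bidirectional(LSTM(200,return_sequences=True)))
model.add(Dropout(0.25))
model.add(Bidirectional(LSTM(200,return_sequences=False)))
model.add(Dropout(0.5))
# Single linear output: the predicted close price relative to the window's first close.
model.add(Dense(1, activation='linear'))
model.compile(optimizer='RMSprop', loss='mse')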
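
To see what the convolution and pooling stages do to a 50-value window, the sequence lengths can be traced by hand. The sketch below is plain Python, not part of the notebook's pipeline; the helper names are hypothetical, and it assumes Keras defaults (`'valid'` padding, pooling stride equal to the pool size).

```python
def conv1d_out_len(length, kernel_size=1, stride=1):
    """Output length of a Conv1D layer with 'valid' padding."""
    return (length - kernel_size) // stride + 1


def maxpool1d_out_len(length, pool_size=2):
    """Output length of MaxPooling1D with 'valid' padding and stride == pool_size."""
    return (length - pool_size) // pool_size + 1


def trace_cnn_shapes(window=50, filters=(128, 256, 512)):
    """Follow the window length through each conv/pool pair of the model above."""
    length = window
    shapes = []
    for n_filters in filters:
        length = conv1d_out_len(length)      # kernel_size=1 keeps the length
        shapes.append(("conv", length, n_filters))
        length = maxpool1d_out_len(length)   # pooling roughly halves the length
        shapes.append(("pool", length, n_filters))
    # Flatten merges the remaining length and channel axes.
    flat_features = length * filters[-1]
    return shapes, flat_features


shapes, flat_features = trace_cnn_shapes()
for kind, length, channels in shapes:
    print(f"{kind:4s} -> length={length:2d}, channels={channels}")
print("flattened features per time step:", flat_features)
```

Under these assumptions the window shrinks from 50 to 6 values before `Flatten`, so each time step reaches the first LSTM as a vector of 6 × 512 features.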

### Data Setup

We'll use the data provided by Yahoo Finance.

In [None]:
start_date = '2008-01-07'
end_date = dt.datetime.today()
stock = data.get_data_yahoo('AMZN', start_date, end_date)
stock.head(10)

In [None]:
window_size = 50
close_col = stock.columns.get_loc('Close')
X = []
Y = []
for i in range(0, len(stock) - window_size - 1, 1):
    # Normalize each window relative to its first closing price.
    first = stock.iloc[i, close_col]
    temp = []
    temp2 = []
    for j in range(window_size):
        temp.append((stock.iloc[i + j, close_col] - first) / first)
    # The label is the close on the day after the window, on the same relative scale.
    temp2.append((stock.iloc[i + window_size, close_col] - first) / first)
    X.append(np.array(temp).reshape(window_size, 1))
    Y.append(np.array(temp2).reshape(1, 1))
# Keep chronological order: the last 10% of the windows form the test set.
train_X, test_X, train_label, test_label = train_test_split(X, Y, test_size=0.1, shuffle=False)

train_X = np.array(train_X)
test_X = np.array(test_X)
train_label = np.array(train_label)
test_label = np.array(test_label)

# Add a time-step axis for the TimeDistributed layers: (samples, 1, window_size, 1).
train_X = train_X.reshape(train_X.shape[0], 1, window_size, 1)
test_X = test_X.reshape(test_X.shape[0], 1, window_size, 1)
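
The windowing step scales every window by its first closing price, so the same base price is needed later to turn predictions back into prices. A minimal vectorized sketch of that transform and its inverse is shown here on synthetic prices. The function names `make_windows` and `to_price` are hypothetical, and unlike the loop above this version also keeps the final window.

```python
import numpy as np


def make_windows(close, window_size=50):
    """Build relative-change windows and next-day targets from a 1-D price series."""
    close = np.asarray(close, dtype=float)
    n = len(close) - window_size                  # windows that have a following target
    idx = np.arange(n)[:, None] + np.arange(window_size)[None, :]
    base = close[:n]                              # first close of each window
    X = (close[idx] - base[:, None]) / base[:, None]
    y = (close[window_size:] - base) / base       # target on the same relative scale
    return X[..., None], y, base                  # X has shape (n, window_size, 1)


def to_price(rel_change, base):
    """Invert the normalization: price = relative_change * base + base."""
    return rel_change * base + base


# Synthetic, steadily rising prices stand in for real market data.
close = np.linspace(100.0, 160.0, 61)
X, y, base = make_windows(close)
print(X.shape, y.shape)
print(np.allclose(to_price(y, base), close[50:]))
```

The round trip through `to_price` recovers the original closes, which is the same arithmetic used when plotting predictions against actual prices.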

## Training the Network Locally

Here, we train the network with the Keras `fit` method, exactly as we would on a local computer. We then convert the test predictions back to prices and plot them against the actual closing prices.

In [None]:
model.fit(train_X, train_label, validation_data=(test_X, test_label), epochs=40, batch_size=64, shuffle=False)
print(model.evaluate(test_X, test_label))
# model.summary()
predicted = model.predict(test_X)
test_label = test_label[:, 0]
predicted = np.array(predicted[:, 0]).reshape(-1, 1)
# Test window k starts at row len_t + k of `stock`, so that row holds its base price.
len_t = len(train_X)
for j in range(len_t, len_t + len(test_X)):
    temp = stock.iloc[j, stock.columns.get_loc('Close')]
    # Undo the normalization: price = relative_change * base + base.
    test_label[j - len_t] = test_label[j - len_t] * temp + temp
    predicted[j - len_t] = predicted[j - len_t] * temp + temp
plt.plot(test_label, color='black', label='Actual Stock Price')
plt.plot(predicted, color='green', label='Predicted Stock Price')
plt.title('Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Stock Price')
plt.legend()
plt.show()

## Set up hosting for the model

### Export the model from tensorflow

To set up hosting, we have to move the trained model from the training environment to the hosting environment. We begin by exporting the model from TensorFlow and saving it to the file system. The export also has to be packaged in a form that ``sagemaker.tensorflow.model.TensorFlowModel`` can read, which differs slightly from a plain TensorFlow export. The conversion is simple: save the exported model under the directory ``export/Servo/`` and tar the entire ``export`` directory. SageMaker will recognize the archive as a loadable TensorFlow model.

In [None]:
model.save("export/Servo/1")
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("export")
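
To check the archive layout without training a model, the packaging step can be rehearsed on placeholder files. This is only a sketch: `package_model` is a hypothetical helper, and the empty `saved_model.pb` and `variables` directory stand in for a real TensorFlow export.

```python
import os
import tarfile
import tempfile


def package_model(export_dir, archive_path):
    """Tar the export directory and return the sorted member names of the archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths in the archive relative, starting at "export".
        tar.add(export_dir, arcname=os.path.basename(export_dir))
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())


with tempfile.TemporaryDirectory() as tmp:
    export_dir = os.path.join(tmp, "export")
    version_dir = os.path.join(export_dir, "Servo", "1")
    os.makedirs(os.path.join(version_dir, "variables"))
    # Placeholder for the file a real model export would write.
    open(os.path.join(version_dir, "saved_model.pb"), "wb").close()
    members = package_model(export_dir, os.path.join(tmp, "model.tar.gz"))

for name in members:
    print(name)
```

Every member path starts with `export/Servo/1`, which mirrors the directory structure the cell above creates with `model.save("export/Servo/1")`.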

Next, we upload the model archive to the default S3 bucket with the ``sagemaker.Session.upload_data`` method, reusing the SageMaker session created earlier. The method needs the local path of the archive and the key prefix under which to store it in the bucket (``model``). The default S3 bucket can be found with the ``sagemaker.Session.default_bucket`` method.

In [None]:
s3_response = sm_session.upload_data("model.tar.gz", bucket=bucket_name, key_prefix="model")

### Import model into SageMaker

Use ``sagemaker.tensorflow.model.TensorFlowModel`` to import the archive into SageMaker as a deployable model. It needs the S3 location of the model archive and the role used for authentication.

In [None]:
sagemaker_model = TensorFlowModel(
    model_data=f"s3://{bucket_name}/model/model.tar.gz",
    role=role,
    framework_version="2.3",
)

### Create endpoint

The model is now ready to be deployed to a SageMaker endpoint. We can use the ``sagemaker.tensorflow.model.TensorFlowModel.deploy`` method to do this.

In [None]:
%%time
predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type="ml.m5.2xlarge")

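## Validate the endpoint for use

We can now send a sample window to the endpoint to confirm that it works. Because this is a regression model, the endpoint returns a single value per sample: the predicted close price relative to the first close of the input window. Multiplying that value by the base price and adding the base price back gives the predicted price.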

In [None]:
sample = np.array([[[[ 0.        ],
        [-0.01323304],
        [ 0.01075182],
        [-0.00729324],
        [ 0.0112782 ],
        [ 0.00872183],
        [ 0.01383456],
        [ 0.00082707],
        [ 0.0037594 ],
        [-0.00796991],
        [ 0.00992487],
        [ 0.01293234],
        [ 0.01045112],
        [ 0.00436092],
        [ 0.00360899],
        [-0.0115789 ],
        [-0.0034587 ],
        [-0.03872182],
        [-0.03684206],
        [-0.02451124],
        [-0.02097739],
        [-0.04624061],
        [-0.05330824],
        [-0.07691732],
        [-0.06037593],
        [-0.04172935],
        [-0.05060153],
        [-0.06127821],
        [-0.06248118],
        [-0.04278197],
        [-0.05691729],
        [-0.04436091],
        [-0.04586465],
        [-0.04624061],
        [-0.05804512],
        [-0.0630827 ],
        [-0.06556392],
        [-0.05969927],
        [-0.07112781],
        [-0.05345865],
        [-0.05338345],
        [-0.04706769],
        [-0.04413536],
        [-0.05180451],
        [-0.04248121],
        [-0.0189474 ],
        [-0.02526316],
        [-0.02142862],
        [-0.00909779],
        [-0.01909769]]]])
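predicted = predictor.predict(sample)
# The endpoint returns a dict of the form {"predictions": [[value]]}.
# Convert the relative change back to a price; 150 stands in for the window's first close.
predicted["predictions"][0][0] * 150 + 150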
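
The conversion from the endpoint response to a price can be wrapped in a small helper. The sketch below assumes the response is a dict of the form `{"predictions": [[value]]}`, as returned by TensorFlow Serving's REST API; `predicted_price` and the mock response are hypothetical and only illustrate the arithmetic.

```python
def predicted_price(response, base_price):
    """Turn the endpoint's relative-change output into an absolute price."""
    # Assumed response layout: one row per sample, one value per row.
    rel_change = float(response["predictions"][0][0])
    return rel_change * base_price + base_price


# Mock response standing in for predictor.predict(sample).
mock_response = {"predictions": [[-0.02]]}
price = predicted_price(mock_response, base_price=150.0)
print(round(price, 2))
```

In practice `base_price` should be the first closing price of the window that was sent to the endpoint, since that is the value the window was normalized by.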

Delete the temporary files and directories so that they do not affect the next run. Optionally, also delete the endpoint so that it stops incurring charges.

In [None]:
os.remove("model.tar.gz")
shutil.rmtree("export")

In [None]:
predictor.delete_endpoint()