# Cloud Storage Cost Forecasting

Mphasis Cloud Storage Cost Forecasting helps businesses forecast the cost of their cloud storage from historical data. The forecasts give businesses an estimate of the potential cost of their cloud resources and help them plan how to manage storage services such as S3 buckets, EC2 storage, Elastic Block Store and Amazon Glacier. The solution uses ensemble ML algorithms and performs automated model selection to apply the right model for the input data.


## Contents

1. [Prerequisites](#Prerequisites)
1. [Data Dictionary](#Data-Dictionary)
1. [Create the Session](#Create-the-session)
1. [Create The Model](#Create-Model)
1. [Batch Transform Job](#Batch-Transform-Job)
1. [Invoke Endpoint](#Invoking-through-Endpoint)

### Prerequisites

To run this algorithm, you need access to the following AWS services:
- Amazon SageMaker, with a subscription to the model package.
- An S3 bucket for input and output.
- An IAM role that lets SageMaker read input from and write output to S3.


### Data Dictionary

- The input must be a `.csv` file with UTF-8 encoding. If the file is not UTF-8 encoded, the model will not perform as expected.
- The file must have a unique identifier column named `maskedsku` (for example, `maskedsku` could hold a shipment ID).
- Date column headers must use the 'YYYY-MM-DD' format. The sample below uses hourly headers of the form 'YYYY-MM-DD HH:MM'.
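Before uploading a file, it can help to check it against these rules locally. The sketch below is a hypothetical `validate_input` helper (not part of the Mphasis package) that uses only the Python standard library. It assumes the first row is the header, accepts either 'YYYY-MM-DD' or 'YYYY-MM-DD HH:MM' headers, and skips a blank or `Unnamed` index column such as the one pandas writes.

```python
import csv
import io
from datetime import datetime

# Assumption: date headers are YYYY-MM-DD (data dictionary) or
# YYYY-MM-DD HH:MM (as in the sample file).
DATE_FORMATS = ("%Y-%m-%d", "%Y-%m-%d %H:%M")


def is_date_header(name):
    """True if the column name parses with one of the accepted date formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(name, fmt)
            return True
        except ValueError:
            continue
    return False


def validate_input(raw):
    """Hypothetical pre-flight check on CSV bytes; returns a list of problems."""
    try:
        # utf-8-sig also accepts UTF-8 files that start with a byte-order mark
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        return ["file is not UTF-8 encoded"]

    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]

    problems = []
    if "maskedsku" not in header:
        problems.append("missing 'maskedsku' column")
    else:
        col = header.index("maskedsku")
        ids = [row[col] if col < len(row) else "" for row in body]
        if len(ids) != len(set(ids)):
            problems.append("'maskedsku' values are not unique")

    for name in header:
        # Skip the identifier and any blank/unnamed index column.
        if name == "maskedsku" or name == "" or name.startswith("Unnamed"):
            continue
        if not is_date_header(name):
            problems.append(f"column {name!r} is not a date")
    return problems
```

In the notebook, `validate_input(open("sample.csv", "rb").read())` returns an empty list when none of these checks fail.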

### Sample input data

```python
import pandas as pd

df = pd.read_csv("sample.csv")
df.head(10)
```

| Unnamed: 0 | maskedsku | 2018-08-01 12:00 | 2018-08-01 13:00 | 2018-08-01 14:00 | 2018-08-01 15:00 | 2018-08-01 16:00 | 2018-08-01 17:00 | 2018-08-01 18:00 | 2018-08-01 19:00 | 2018-08-01 20:00 | ... | 2018-08-02 13:00 | 2018-08-02 14:00 | 2018-08-02 15:00 | 2018-08-02 16:00 | 2018-08-02 17:00 | 2018-08-02 18:00 | 2018-08-02 19:00 | 2018-08-02 20:00 | 2018-08-02 21:00 | 2018-08-02 22:00 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | product_1 | 13380.82192 | 15244.93151 | 14925.20548 | 13585.9726 | 11365.47945 | 20060.54795 | 12861.36986 | 14945.2274 | 14490.37808 | ... | 15046.35616 | 19864.93151 | 14184.9863 | 12370.84932 | 19949.58904 | 14228.38356 | 19529.55616 | 16279.7589 | 14330.9589 | 15056.87671 |
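Each row holds one `maskedsku`, and each timestamp column holds that SKU's value for the hour. To look at a series as timestamped points, a hypothetical standard-library helper `to_long` can unpivot one row. This only illustrates the layout; it is not a step the model requires, and it assumes the hourly 'YYYY-MM-DD HH:MM' headers seen in the sample.

```python
from datetime import datetime


def to_long(header, row, id_col="maskedsku", fmt="%Y-%m-%d %H:%M"):
    """Unpivot one wide row into (sku, timestamp, value) tuples.

    Assumes every column except the id column and blank/'Unnamed' index
    columns is a timestamp formatted as `fmt`.
    """
    sku = row[header.index(id_col)]
    records = []
    for name, cell in zip(header, row):
        if name == id_col or name == "" or name.startswith("Unnamed"):
            continue
        records.append((sku, datetime.strptime(name, fmt), float(cell)))
    return records


# First two hours of the sample row shown above
header = ["", "maskedsku", "2018-08-01 12:00", "2018-08-01 13:00"]
row = ["0", "product_1", "13380.82192", "15244.93151"]
for record in to_long(header, row):
    print(record)
```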


### Create the session

The session stores our connection parameters to SageMaker. We use it for all of the SageMaker operations below.

```python
import sagemaker as sage
from sagemaker import get_execution_role

sess = sage.Session()
# IAM role of the notebook instance; SageMaker uses it to access S3
role = get_execution_role()
```

## Create Model

Now we use the model package to create a model.

```python
from sagemaker import ModelPackage

# Use the ARN you received after subscribing to the model
model_package_arn = 'arn:aws:sagemaker:us-east-2:786796469737:model-package/cloud-storage-cost-forecasting-v2'

# `role` and `sess` come from the session cell above
model = ModelPackage(model_package_arn=model_package_arn,
                     role=role,
                     sagemaker_session=sess)
```
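Model package ARNs are regional, so the region field of the ARN should generally match the region of your SageMaker session (available as `sess.boto_region_name`). A small sketch with hypothetical helpers `parse_arn` and `check_region` splits an ARN into its standard fields, `arn:partition:service:region:account-id:resource`, so the two can be compared before the model is created.

```python
def parse_arn(arn):
    """Split an ARN of the form arn:partition:service:region:account-id:resource."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    _, partition, service, region, account, resource = parts
    return {"partition": partition, "service": service, "region": region,
            "account": account, "resource": resource}


def check_region(arn, session_region):
    """Raise if the ARN's region differs from the session's region."""
    arn_region = parse_arn(arn)["region"]
    if arn_region != session_region:
        raise ValueError(f"model package is in {arn_region}, session is in {session_region}")


arn = "arn:aws:sagemaker:us-east-2:786796469737:model-package/cloud-storage-cost-forecasting-v2"
print(parse_arn(arn)["region"])  # us-east-2
check_region(arn, "us-east-2")   # passes silently
```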


## Input File

Next, we point to a sample input file in S3 for testing the model.

```python
sample_txt = "s3://mphasis-marketplace/timeseries-cloud-cost/sample.csv"
```

## Batch Transform Job

Now let's use the model to run a batch inference job and verify that it works.

```python
# One ml.m5.xlarge instance; the job writes its results to transformer.output_path
transformer = model.transformer(1, 'ml.m5.xlarge')
transformer.transform(sample_txt, content_type='text/csv')
transformer.wait()
print("Batch Transform complete")
```


```text
Importing plotly failed. Interactive plots will not work.
 * Serving Flask app "serve" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://0.0.0.0:8080/ (Press CTRL+C to quit)
169.254.255.130 - - [27/May/2020 15:08:39] "GET /ping HTTP/1.1" 200 -
169.254.255.130 - - [27/May/2020 15:08:39] "GET /execution-parameters HTTP/1.1" 404 -
   maskedsku  2018-08-01 12:00  ...  2018-08-02 21:00  2018-08-02 22:00
0  product_1       13380.82192  ...        14330.9589       15056.87671

[1 rows x 36 columns]
35
169.254.255.130 - - [27/May/2020 15:08:39] "GET /ping HTTP/1.1" 200 -
169.254.255.130 - - [27/May/2020 15:08:39] "GET /execution-parameters HTTP/1.1" 404 -
   maskedsku  2018-08-01 12:00  ...  2018-08-02 21:00  2018-08-02 22:00
0  product_1       13380.82192  ...        143


[[18800.86840301 17346.99722368 16615.47939156 20395.29275408
  15807.85348161 14621.32257556 23281.10011288 16831.52042248
  19451.87520196 19323.46981089 14953.41235292 16989.65126357
  20188.14224263 18734.27106331 18002.75323118 21782.5665937
  17195.12732123 16008.59641518 24668.3739525  18218.7942621
  20839.14904158 20710.74365052 16340.68619254 18376.9251032 ]]
   maskedsku 2018-08-01 12:00  ... 201808032100_forecast 201808032200_forecast
0  product_1          13380.8  ...               16340.7               18376.9

[1 rows x 60 columns]
169.254.255.130 - - [27/May/2020 15:09:15] "POST /invocations HTTP/1.1" 200 -
INFO:werkzeug:169.254.255.130 - - [27/May/2020 15:09:15] "POST /invocations HTTP/1.1" 200 -
[[18800.86840301 17346.99722368 16615.47939156 20395.29275408
  15807.85348161 14621.32257556 23281.10011288 16831.52042248
  19451.87520196 19323.46981089 14953.41235292 16989.65126357
  20188.14
```

## Output from Batch Transform

Note: `boto3` must be installed in the environment that runs this notebook.

```python
import boto3

# output_path has the form s3://<bucket>/<prefix>
print(transformer.output_path)
bucket_name, bucket_folder = transformer.output_path[len("s3://"):].split("/", 1)

s3_conn = boto3.client("s3")
# Batch transform names each output object <input file name>.out
with open('result.csv', 'wb') as f:
    s3_conn.download_fileobj(bucket_name, bucket_folder + '/sample.csv.out', f)
print("Output file loaded from bucket")
```

```text
s3://sagemaker-us-east-2-786796469737/cloud-storage-cost-forecasting-v2-2020--2020-05-27-15-05-55-092
Output file loaded from bucket
```
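Deriving the bucket and key from `transformer.output_path` is easy to get wrong with ad hoc string indexing, and hard-coding the bucket name ties the notebook to one account. A sketch of the same logic as two hypothetical helpers, `split_s3_uri` and `transform_output_key`, follows batch transform's convention of naming each output object after its input object with `.out` appended. It assumes a single input file placed directly under the output prefix.

```python
def split_s3_uri(uri):
    """Split 's3://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key.rstrip("/")


def transform_output_key(output_path, input_uri):
    """Bucket and key of the batch transform output for one input object.

    Assumes a single input file: the output is named <input file name>.out
    and sits directly under output_path.
    """
    bucket, prefix = split_s3_uri(output_path)
    input_name = input_uri.rstrip("/").rsplit("/", 1)[-1]
    key = f"{prefix}/{input_name}.out" if prefix else f"{input_name}.out"
    return bucket, key


# Values taken from the cells above
print(transform_output_key(
    "s3://sagemaker-us-east-2-786796469737/cloud-storage-cost-forecasting-v2-2020--2020-05-27-15-05-55-092",
    "s3://mphasis-marketplace/timeseries-cloud-cost/sample.csv",
))
```

The returned pair can be passed straight to `s3_conn.download_fileobj(bucket, key, f)`.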


```python
df = pd.read_csv("result.csv")
df.head(10)
```

| Unnamed: 0.1 | Unnamed: 0 | maskedsku | 2018-08-01 12:00 | 2018-08-01 13:00 | 2018-08-01 14:00 | 2018-08-01 15:00 | 2018-08-01 16:00 | 2018-08-01 17:00 | 2018-08-01 18:00 | 2018-08-01 19:00 | ... | 201808031300_forecast | 201808031400_forecast | 201808031500_forecast | 201808031600_forecast | 201808031700_forecast | 201808031800_forecast | 201808031900_forecast | 201808032000_forecast | 201808032100_forecast | 201808032200_forecast |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | product_1 | 13380.82192 | 15244.93151 | 14925.20548 | 13585.9726 | 11365.47945 | 20060.54795 | 12861.36986 | 14945.2274 | ... | 18002.753231 | 21782.566594 | 17195.127321 | 16008.596415 | 24668.373953 | 18218.794262 | 20839.149042 | 20710.743651 | 16340.686193 | 18376.925103 |
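The result keeps the input columns and appends one column per forecast step, named like `201808031300_forecast`. A hypothetical standard-library helper, `forecast_series`, pulls those columns out as (timestamp, value) pairs so they can be summed, for example to total the forecast cost over the horizon. The `YYYYMMDDHHMM_forecast` naming pattern is inferred from this sample output, and the toy CSV below uses made-up values.

```python
import csv
import io
from datetime import datetime

SUFFIX = "_forecast"


def forecast_series(csv_text, id_col="maskedsku"):
    """Map each SKU to a sorted list of (timestamp, forecast value) pairs.

    Assumes forecast columns are named YYYYMMDDHHMM followed by '_forecast',
    as in the sample output above.
    """
    series = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        points = []
        for name, value in row.items():
            if name and name.endswith(SUFFIX):
                ts = datetime.strptime(name[: -len(SUFFIX)], "%Y%m%d%H%M")
                points.append((ts, float(value)))
        series[row[id_col]] = sorted(points)
    return series


# Toy input with made-up values, shaped like result.csv
sample = (
    "maskedsku,2018-08-02 22:00,201808022300_forecast,201808030000_forecast\n"
    "product_1,100.0,10.5,20.25\n"
)
for sku, points in forecast_series(sample).items():
    total = sum(value for _, value in points)
    print(sku, len(points), round(total, 2))  # product_1 2 30.75
```

In the notebook, `forecast_series(open("result.csv", encoding="utf-8").read())` would apply the same parsing to the downloaded file.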


## Invoking through Endpoint
Alternatively, you can deploy the model to a real-time endpoint and get predictions on demand. The cells below create and call a sample endpoint.

```python
import boto3
import sagemaker as sage
from sagemaker import ModelPackage, get_execution_role

role = get_execution_role()
sagemaker_session = sage.Session()
bucket = sagemaker_session.default_bucket()
```

```python
content_type = 'text/csv'
# Also used as the endpoint name when deploying
model_name = 'timeseries-cloud-compute'
real_time_inference_instance_type = 'ml.c4.2xlarge'
```

```python
# Use the ARN you received after subscribing to the model
model_package_arn = 'arn:aws:sagemaker:us-east-2:786796469737:model-package/cloud-storage-cost-forecasting-v2'
```


```python
# Wrap the endpoint in a predictor that sends CSV payloads.
# RealTimePredictor is the SageMaker Python SDK v1 class; SDK v2 replaced it
# with sagemaker.predictor.Predictor.
def predict_wrapper(endpoint, session):
    return sage.RealTimePredictor(endpoint, session, content_type=content_type)

# Create a deployable model from the model package
model = ModelPackage(role=role,
                     model_package_arn=model_package_arn,
                     sagemaker_session=sagemaker_session,
                     predictor_cls=predict_wrapper)
```

```python
predictor = model.deploy(1, real_time_inference_instance_type, endpoint_name=model_name)
```


### 1. Invoking the endpoint through the AWS CLI

```python
file_name = "sample.csv"
```

```python
!aws sagemaker-runtime invoke-endpoint --endpoint-name $model_name --body fileb://$file_name --content-type 'text/csv' --region us-east-2 result_endpoint.csv
```

```json
{
    "ContentType": "text/csv; charset=utf-8",
    "InvokedProductionVariant": "AllTraffic"
}
```


```python
df = pd.read_csv("result_endpoint.csv")
df.head(10)
```

| Unnamed: 0.1 | Unnamed: 0 | maskedsku | 2018-08-01 12:00 | 2018-08-01 13:00 | 2018-08-01 14:00 | 2018-08-01 15:00 | 2018-08-01 16:00 | 2018-08-01 17:00 | 2018-08-01 18:00 | 2018-08-01 19:00 | ... | 201808031300_forecast | 201808031400_forecast | 201808031500_forecast | 201808031600_forecast | 201808031700_forecast | 201808031800_forecast | 201808031900_forecast | 201808032000_forecast | 201808032100_forecast | 201808032200_forecast |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | product_1 | 13380.82192 | 15244.93151 | 14925.20548 | 13585.9726 | 11365.47945 | 20060.54795 | 12861.36986 | 14945.2274 | ... | 18002.753231 | 21782.566594 | 17195.127321 | 16008.596415 | 24668.373953 | 18218.794262 | 20839.149042 | 20710.743651 | 16340.686193 | 18376.925103 |
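The CLI call above corresponds to `invoke_endpoint` on boto3's `sagemaker-runtime` client. The sketch below wraps it in a hypothetical `invoke_csv` helper that takes the client as a parameter, so it can be exercised with a stand-in object; in the notebook you would pass `boto3.client("sagemaker-runtime", region_name="us-east-2")`.

```python
def invoke_csv(runtime_client, endpoint_name, csv_bytes):
    """Send CSV bytes to a SageMaker endpoint and return the decoded CSV reply.

    runtime_client is expected to be boto3.client("sagemaker-runtime");
    invoke_endpoint returns a dict whose "Body" is a readable stream.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=csv_bytes,
    )
    return response["Body"].read().decode("utf-8")


# Usage in the notebook (requires AWS credentials and a running endpoint):
# import boto3
# runtime = boto3.client("sagemaker-runtime", region_name="us-east-2")
# with open("sample.csv", "rb") as f:
#     result_text = invoke_csv(runtime, model_name, f.read())
```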


### 2. Invoking the endpoint through Python code

```python
# Read the raw CSV text and send it to the endpoint
with open('./sample.csv', mode='r', encoding='utf-8') as f:
    data = f.read()
prediction = predictor.predict(data)
```

```python
from io import StringIO

# The endpoint returns CSV bytes; decode them and load them into a DataFrame
s = str(prediction, 'utf-8')
df = pd.read_csv(StringIO(s))
df
```

| Unnamed: 0.1 | Unnamed: 0 | maskedsku | 2018-08-01 12:00 | 2018-08-01 13:00 | 2018-08-01 14:00 | 2018-08-01 15:00 | 2018-08-01 16:00 | 2018-08-01 17:00 | 2018-08-01 18:00 | 2018-08-01 19:00 | ... | 201808031300_forecast | 201808031400_forecast | 201808031500_forecast | 201808031600_forecast | 201808031700_forecast | 201808031800_forecast | 201808031900_forecast | 201808032000_forecast | 201808032100_forecast | 201808032200_forecast |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | product_1 | 13380.82192 | 15244.93151 | 14925.20548 | 13585.9726 | 11365.47945 | 20060.54795 | 12861.36986 | 14945.2274 | ... | 18002.753231 | 21782.566594 | 17195.127321 | 16008.596415 | 24668.373953 | 18218.794262 | 20839.149042 | 20710.743651 | 16340.686193 | 18376.925103 |


Finally, delete the endpoint so that it stops incurring charges.

```python
predictor.delete_endpoint()
```
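An endpoint is billed for as long as it runs, so a failed prediction should not leave it running. A minimal sketch, assuming `predictor` is the object returned by `model.deploy(...)` and exposes the `predict` and `delete_endpoint` methods used in this notebook; the helper name `predict_then_delete` is hypothetical.

```python
def predict_then_delete(predictor, payload):
    """Run one prediction and always delete the endpoint afterwards."""
    try:
        return predictor.predict(payload)
    finally:
        # Runs whether predict() succeeded or raised, so the endpoint
        # does not keep accruing charges after an error.
        predictor.delete_endpoint()
```

This suits one-off checks like the ones above; for repeated calls, keep the endpoint up and delete it once at the end.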