In [None]:
# default_exp core

# Forecast Notebook

# Getting Data Ready

The overall process for using Amazon Forecast is the following:

1. Create a Dataset Group. This is the container that isolates models, and the data they are trained on, from each other.
1. Create a Dataset. Forecast has three dataset types: Target Time Series, Related Time Series, and Item Metadata. The Target Time Series is required; the others provide additional context for certain algorithms.
1. Import data. This moves the information from S3 into a storage volume where the data can be used for training and validation.
1. Train a model. Forecast automates this process for you, but you can also select a particular algorithm and either provide your own hyperparameters or use Hyperparameter Optimization (HPO) to determine the most performant values.
1. Deploy a Predictor. Here you deploy your model so you can use it to generate a forecast.
1. Query the Forecast. Given a request bounded by time for an item, return the forecast for it. Once you have this, you can evaluate its performance or use it to guide your decisions about the future.
1. Export the Forecast. Export your forecast results to an S3 bucket, where you can easily access them.


**Read Every Cell FULLY before executing it**

For more information about the APIs, see the [documentation](https://docs.aws.amazon.com/forecast/latest/dg/what-is-forecast.html).

## Setup

Import the standard Python libraries that are used in this lesson.

In [None]:
import sys
import os
import io
import json
import time
import dateutil.parser
import pandas as pd
import boto3
import util

Configure the S3 bucket name for all the Forecasting outputs and region name for this lesson.

- If you don't have an S3 bucket, create one in S3 first. If you used the CloudFormation wizard to set up the environment, use the same bucket name that you specified during setup.
- Although we have set the region to us-east-1 as a default value below, you can choose any of the regions that the service is available in.

In [None]:
text_widget_bucket = util.create_text_widget( "bucket_name", "input your S3 bucket name" )
text_widget_region = util.create_text_widget( "region", "input region name.", default_value="us-east-1" )

In [None]:
bucket_name = text_widget_bucket.value
assert bucket_name, "bucket_name not set."

region = text_widget_region.value
assert region, "region not set."

The last part of the setup process is to validate that your account can communicate with Amazon Forecast. The cell below does just that.

In [None]:
session = boto3.Session(region_name=region) 
forecast = session.client(service_name='forecast') 
forecastquery = session.client(service_name='forecastquery')

## Data Preparation<a class="anchor" id="DataPrep"></a>

## Input S3 bucket (raw data)

For testing, run only the cell below (the raw data is present locally).

In [None]:
df = pd.read_csv("data/item-demand-time.csv", dtype = object, names=['timestamp','value','item'])
df.head(3)

If the raw data is in an S3 bucket, run the cells below instead.

In [None]:
s3_bucket = "bucket_name"
s3_input_prefix = 'prefix_name'
s3_input_filename = 'filename'

In [None]:
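s3 = boto3.resource('s3')
obj = s3.Object(s3_bucket, s3_input_prefix+'/'+s3_input_filename)
body = obj.get()['Body'].read()
# Assumes the S3 file has the same headerless three-column layout as the local CSV.
df = pd.read_csv(io.BytesIO(body), dtype=object, names=['timestamp','value','item'])
df.head(3)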

Notice in the output above there are 3 columns of data:

1. The Timestamp
1. A Value
1. An Item

These are the 3 key required pieces of information to generate a forecast with Amazon Forecast. More can be added but these 3 must always remain present.
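
Before splitting or uploading the data, a quick sanity check on those three columns can catch problems early. The sketch below is one possible set of checks, not something Forecast itself requires; the helper name `check_target_time_series` and the tiny sample frame are made up for illustration.

```python
import pandas as pd

def check_target_time_series(frame):
    """Return a list of problems found in a timestamp/value/item frame (hypothetical helper)."""
    problems = [f"missing column: {name}"
                for name in ("timestamp", "value", "item")
                if name not in frame.columns]
    if problems:
        return problems
    # errors="coerce" turns unparseable entries into NaT/NaN so they can be counted.
    bad_timestamps = pd.to_datetime(frame["timestamp"], errors="coerce").isna().sum()
    bad_values = pd.to_numeric(frame["value"], errors="coerce").isna().sum()
    duplicates = frame.duplicated(subset=["timestamp", "item"]).sum()
    if bad_timestamps:
        problems.append(f"{bad_timestamps} unparseable timestamps")
    if bad_values:
        problems.append(f"{bad_values} non-numeric values")
    if duplicates:
        problems.append(f"{duplicates} duplicate timestamp/item rows")
    return problems

# Illustrative sample only; the values are not taken from the real dataset.
sample = pd.DataFrame({
    "timestamp": ["2014-01-01 01:00:00", "2014-01-01 02:00:00", "not a date"],
    "value": ["38.35", "oops", "33.59"],
    "item": ["client_12", "client_12", "client_12"],
})
print(check_target_time_series(sample))
```

An empty list means none of these particular checks failed; it does not guarantee that the import job will succeed.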


**If you only want to train your model, you can skip these cells and go directly to "Creating the Dataset Group and Dataset".**

Example: the dataset spans January 01, 2014 to December 31, 2014. For testing, we train on January 1 through October 30 and keep October 31 through the end of November in a separate CSV for validation.


#### Divide the main dataset into two dataframes: one for training and one for comparing the forecast with actual results

You can split your data on the timestamp column (for example, 90% for training and 10% for testing).
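
One way to get such a split without hard-coding dates is to cut at the timestamp that sits 90% of the way through the sorted unique timestamps, so every item shares the same cut-off. This is a minimal sketch under that assumption; `split_by_time` is a hypothetical helper and the sample frame is made up.

```python
import pandas as pd

def split_by_time(frame, train_fraction=0.9, column="timestamp"):
    """Split at a single timestamp so all items share the same cut-off (hypothetical helper)."""
    stamps = pd.to_datetime(frame[column])
    unique_stamps = stamps.drop_duplicates().sort_values().reset_index(drop=True)
    cutoff = unique_stamps.iloc[int(len(unique_stamps) * train_fraction)]
    # Rows strictly before the cut-off go to training; the rest go to testing.
    return frame[stamps < cutoff], frame[stamps >= cutoff], cutoff

# Ten hourly timestamps for each of two made-up items.
hours = pd.date_range("2014-01-01", periods=10, freq=pd.Timedelta(hours=1))
sample = pd.DataFrame({
    "timestamp": list(hours.strftime("%Y-%m-%d %H:%M:%S")) * 2,
    "value": range(20),
    "item": ["client_1"] * 10 + ["client_2"] * 10,
})
train_df, test_df, cutoff = split_by_time(sample)
print(len(train_df), len(test_df), cutoff)
```

The cells that follow use fixed dates instead, which is easier to reason about when the forecast window is known in advance.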

In [None]:
# Select January 1 through October 30 for the training dataframe.
# The timestamps are strings, so '2014-10-31 00:00:00' sorts after '2014-10-31' and is excluded here.
jan_to_oct = df[(df['timestamp'] >= '2014-01-01') & (df['timestamp'] <= '2014-10-31')]

# Select October 31 through November 30 for the validation dataframe.
remaining_df = df[(df['timestamp'] >= '2014-10-31') & (df['timestamp'] <= '2014-12-01')]

Now export them to CSV files and place them into your `data` folder.

In [None]:
jan_to_oct.to_csv("data/item-demand-time-train.csv", header=False, index=False)
remaining_df.to_csv("data/item-demand-time-validation.csv", header=False, index=False)

At this time the data is ready to be sent to S3 where Forecast will use it later. The following cells will upload the data to S3.

In [None]:
key="elec_data/item-demand-time-train.csv"

boto3.Session().resource('s3').Bucket(bucket_name).Object(key).upload_file("data/item-demand-time-train.csv")

## Creating the Dataset Group and Dataset <a class="anchor" id="dataset"></a>

In Amazon Forecast, a dataset is a collection of one or more files that contain data relevant to a forecasting task. A dataset must conform to a schema provided by Amazon Forecast.

More details about `Domain` and dataset types can be found in the [documentation](https://docs.aws.amazon.com/forecast/latest/dg/howitworks-domains-ds-types.html). For this example, we use the [CUSTOM](https://docs.aws.amazon.com/forecast/latest/dg/custom-domain.html) domain with 3 required attributes: `timestamp`, `target_value`, and `item_id`.


It is also important to tell Amazon Forecast how to interpret your time-series information. The cells immediately below set the data frequency and timestamp format; the one after them sets the variable names for the project, dataset group, and dataset.
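
It can help to confirm that the timestamps in the CSV really match the format and frequency you are about to declare. The sketch below uses only the standard library. The mapping from Forecast's pattern strings to Python `strptime` patterns is an assumption made for this example, and `matches_format` and `is_hourly` are hypothetical helpers.

```python
from datetime import datetime, timedelta

# Assumed Python equivalents of the two timestamp formats mentioned in this notebook.
PYTHON_FORMATS = {
    "yyyy-MM-dd": "%Y-%m-%d",
    "yyyy-MM-dd hh:mm:ss": "%Y-%m-%d %H:%M:%S",
}

def matches_format(timestamps, forecast_format):
    """True if every string parses with the strptime pattern mapped to forecast_format."""
    pattern = PYTHON_FORMATS[forecast_format]
    try:
        for text in timestamps:
            datetime.strptime(text, pattern)
    except ValueError:
        return False
    return True

def is_hourly(timestamps, forecast_format="yyyy-MM-dd hh:mm:ss"):
    """True if consecutive timestamps are exactly one hour apart."""
    pattern = PYTHON_FORMATS[forecast_format]
    parsed = [datetime.strptime(text, pattern) for text in timestamps]
    return all(later - earlier == timedelta(hours=1)
               for earlier, later in zip(parsed, parsed[1:]))

# Made-up sample timestamps in the hourly layout this notebook expects.
sample = ["2014-01-01 01:00:00", "2014-01-01 02:00:00", "2014-01-01 03:00:00"]
print(matches_format(sample, "yyyy-MM-dd hh:mm:ss"))
print(matches_format(sample, "yyyy-MM-dd"))
print(is_hourly(sample))
```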


In [None]:
import ipywidgets as widgets
Frequency=widgets.Dropdown(
    options=['Y', 'M', 'W','D','H','30min','15min','10min','5min','1min'],
    value='H',
    description='FREQUENCY:',
    disabled=False,
)
Frequency

In [None]:
# "DatasetFrequency": "Y|M|W|D|H|30min|15min|10min|5min|1min"
DATASET_FREQUENCY = Frequency.value

# TIMESTAMP_FORMAT:- "yyyy-MM-dd", "yyyy-MM-dd hh:mm:ss"
TIMESTAMP_FORMAT = "yyyy-MM-dd hh:mm:ss"


In [None]:
project = 'util_power_forecastdemo'
datasetName= project+'_ds'
datasetGroupName= project +'_dsg'
s3DataPath = "s3://"+bucket_name+"/"+key

### Create the Dataset Group


In [None]:
Domain=widgets.Dropdown(
    options=['RETAIL', 'CUSTOM', 'INVENTORY_PLANNING','EC2_CAPACITY','WORK_FORCE','WEB_TRAFFIC','METRICS'],
    value='CUSTOM',
    description='DOMAIN:',
    disabled=False,
)
Domain

In [None]:
# "Domain": "'RETAIL'|'CUSTOM'|'INVENTORY_PLANNING'|'EC2_CAPACITY'|'WORK_FORCE'|'WEB_TRAFFIC'|'METRICS'"
create_dataset_group_response = forecast.create_dataset_group(DatasetGroupName=datasetGroupName,
                                                              Domain=Domain.value,
                                                             )
datasetGroupArn = create_dataset_group_response['DatasetGroupArn']

In [None]:
forecast.describe_dataset_group(DatasetGroupArn=datasetGroupArn)

### Create the Schema

In [None]:
# Specify the schema of your dataset here. Make sure the order of columns matches the raw data files.
# "AttributeType": "string|integer|float|timestamp|geolocation"
schema ={
   "Attributes":[
      {
         "AttributeName":"timestamp",
         "AttributeType":"timestamp"
      },
      {
         "AttributeName":"target_value",
         "AttributeType":"float"
      },
      {
         "AttributeName":"item_id",
         "AttributeType":"string"
      }
   ]
}

### Create the Dataset

In [None]:
DATASET_TYPE=widgets.Dropdown(
    options=['TARGET_TIME_SERIES', 'RELATED_TIME_SERIES', 'ITEM_METADATA'],
    value='TARGET_TIME_SERIES',
    description='DatasetType:',
    disabled=False,
)
DATASET_TYPE

In [None]:
# "Domain": "'RETAIL'|'CUSTOM'|'INVENTORY_PLANNING'|'EC2_CAPACITY'|'WORK_FORCE'|'WEB_TRAFFIC'|'METRICS'"
# "DatasetType": "'TARGET_TIME_SERIES'|'RELATED_TIME_SERIES'|'ITEM_METADATA'"

response=forecast.create_dataset(
                    Domain=Domain.value,
                    DatasetType=DATASET_TYPE.value,
                    DatasetName=datasetName,
                    DataFrequency=DATASET_FREQUENCY, 
                    Schema = schema
)

In [None]:
#hide
datasetArn = response['DatasetArn']
forecast.describe_dataset(DatasetArn=datasetArn)

### Add Dataset to Dataset Group

In [None]:
forecast.update_dataset_group(DatasetGroupArn=datasetGroupArn, DatasetArns=[datasetArn])

### Create IAM Role for Forecast

Like many AWS services, Forecast will need to assume an IAM role in order to interact with your S3 resources securely. In the sample notebooks, we use the get_or_create_iam_role() utility function to create an IAM role. Please refer to ["notebooks/common/util/fcst_utils.py"](../../common/util/fcst_utils.py) for implementation.
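
For orientation, a role that Forecast can assume needs a trust policy naming the Forecast service principal. The sketch below shows that trust policy and a thin wrapper around the boto3 IAM `create_role` call. It is not the implementation of `get_or_create_iam_role()`, which is not reproduced here; `create_forecast_role` is a hypothetical helper, and attaching S3 permissions to the role is a separate step that is not shown.

```python
import json

# Trust policy that lets the Amazon Forecast service principal assume the role.
FORECAST_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "forecast.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

def create_forecast_role(iam_client, role_name):
    """Create the role and return its ARN (hypothetical helper; no permissions attached)."""
    response = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(FORECAST_TRUST_POLICY),
    )
    return response["Role"]["Arn"]

print(json.dumps(FORECAST_TRUST_POLICY, indent=2))
```

With real credentials, `iam_client` would be something like `boto3.client("iam")`; the function is left uncalled here so the cell has no side effects.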

In [None]:
# Create the role to provide to Amazon Forecast.
role_name = "ForecastNotebookRole-demo"
role_arn = util.get_or_create_iam_role( role_name = role_name )

### Create Data Import Job


Now that Forecast knows how to understand the CSV we are providing, the next step is to import the data from S3 into Amazon Forecast.

In [None]:
datasetImportJobName = 'EP_DSIMPORT_JOB_TARGET'
ds_import_job_response=forecast.create_dataset_import_job(DatasetImportJobName=datasetImportJobName,
                                                          DatasetArn=datasetArn,
                                                          DataSource= {
                                                              "S3Config" : {
                                                                 "Path":s3DataPath,
                                                                 "RoleArn": role_arn
                                                              } 
                                                          },
                                                          TimestampFormat=TIMESTAMP_FORMAT
                                                         )

In [None]:
ds_import_job_arn=ds_import_job_response['DatasetImportJobArn']
print(ds_import_job_arn)

Check the status of the dataset import job. When the status changes from **CREATE_IN_PROGRESS** to **ACTIVE**, we can continue to the next steps. Depending on the data size, this usually takes 5 to 10 minutes.
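
The polling loops in this notebook run until the resource reaches a final status, with no upper bound on the wait. If you prefer a bounded wait, the pattern can be wrapped in a small helper like the sketch below. `wait_for_status` is a hypothetical helper, and a fake `describe` function stands in for calls such as `forecast.describe_dataset_import_job(...)` so the example runs without AWS access.

```python
import time

def wait_for_status(describe, done=("ACTIVE", "CREATE_FAILED"),
                    interval=10, timeout=3600, sleep=time.sleep):
    """Poll describe() until its 'Status' is in done, or raise after timeout seconds."""
    waited = 0
    while True:
        status = describe()["Status"]
        if status in done:
            return status
        if waited >= timeout:
            raise TimeoutError(f"still {status} after {waited} seconds")
        sleep(interval)
        waited += interval

# Fake describe call that walks through three statuses; sleep is stubbed out.
fake_statuses = iter(["CREATE_PENDING", "CREATE_IN_PROGRESS", "ACTIVE"])
final = wait_for_status(lambda: {"Status": next(fake_statuses)},
                        sleep=lambda seconds: None)
print(final)
```

The helper returns on **CREATE_FAILED** as well, so the caller still has to check the returned status before moving on.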

In [None]:
status_indicator = util.StatusIndicator()

while True:
    status = forecast.describe_dataset_import_job(DatasetImportJobArn=ds_import_job_arn)['Status']
    status_indicator.update(status)
    if status in ('ACTIVE', 'CREATE_FAILED'): break
    time.sleep(10)

status_indicator.end()

In [None]:
forecast.describe_dataset_import_job(DatasetImportJobArn=ds_import_job_arn)

# Building Your Predictor


The overall process for this is:


* Create a Predictor
* Deploy a Predictor
* Obtain a Forecast

To get started, simply execute the cells below:


## Create a Predictor

In the previous cells, your data was imported for use by Forecast. Here we define your predictor settings and then start building your model, or predictor.

The forecast horizon is the number of time points to predict into the future. For weekly data, a value of 12 means 12 weeks. For hourly data, we want to forecast the next day, so we set it to 24.
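
To make the horizon concrete, the sketch below lists the timestamps that a 24-step hourly forecast would cover. The last training timestamp used here (October 30, 2014 at 23:00) is an assumption about where the training file ends, and `horizon_timestamps` is a hypothetical helper.

```python
from datetime import datetime, timedelta

def horizon_timestamps(last_observed, horizon, step=timedelta(hours=1)):
    """List the timestamps a forecast of `horizon` steps would cover."""
    return [last_observed + step * (i + 1) for i in range(horizon)]

# Assumed last hourly point in the training data.
last_training_point = datetime(2014, 10, 30, 23)
covered = horizon_timestamps(last_training_point, horizon=24)
print(covered[0], "->", covered[-1])
```

For weekly data the same helper could be called with `step=timedelta(weeks=1)` and `horizon=12`.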



Algorithm:
If you are unsure of which algorithm to use to train your model, choose AutoML when creating a predictor and let Forecast select the optimal algorithm for your datasets. Otherwise, you can manually select one of the built-in algorithms.


In [None]:
predictorName= project+'_deeparp_algo'

In [None]:
forecastHorizon = 24

#### Algorithms
1. CNN-QR (Convolutional Neural Network - Quantile Regression): `arn:aws:forecast:::algorithm/CNN-QR`
1. DeepAR+, a proprietary machine learning algorithm: `arn:aws:forecast:::algorithm/Deep_AR_Plus`
1. Prophet, a time series forecasting algorithm: `arn:aws:forecast:::algorithm/Prophet`
1. NPTS (Non-Parametric Time Series), a proprietary algorithm: `arn:aws:forecast:::algorithm/NPTS`
1. ARIMA (Autoregressive Integrated Moving Average): `arn:aws:forecast:::algorithm/ARIMA`
1. ETS (Exponential Smoothing), a commonly used statistical algorithm: `arn:aws:forecast:::algorithm/ETS`

For detailed information, see https://docs.aws.amazon.com/forecast/latest/dg/aws-forecast-choosing-recipes.html

In [None]:
algorithmArn = 'arn:aws:forecast:::algorithm/Deep_AR_Plus'

In [None]:
# If PerformAutoML is True, remove the AlgorithmArn field; AutoML will then try to find the best algorithm automatically.

create_predictor_response=forecast.create_predictor(PredictorName=predictorName, 
                                                  AlgorithmArn=algorithmArn,
                                                  ForecastHorizon=forecastHorizon,
                                                  PerformAutoML= False,
                                                  PerformHPO=False,
                                                  EvaluationParameters= {"NumberOfBacktestWindows": 1, 
                                                                         "BackTestWindowOffset": 24}, 
                                                  InputDataConfig= {"DatasetGroupArn": datasetGroupArn},
                                                  FeaturizationConfig= {"ForecastFrequency": "H", 
                                                                        "Featurizations": 
                                                                        [
                                                                          {"AttributeName": "target_value", 
                                                                           "FeaturizationPipeline": 
                                                                            [
                                                                              {"FeaturizationMethodName": "filling", 
                                                                               "FeaturizationMethodParameters": 
                                                                                {"frontfill": "none", 
                                                                                 "middlefill": "zero", 
                                                                                 "backfill": "zero"}
                                                                              }
                                                                            ]
                                                                          }
                                                                        ]
                                                                       }
                                                 )

In [None]:
predictor_arn=create_predictor_response['PredictorArn']

Check the status of the predictor. When the status changes from **CREATE_IN_PROGRESS** to **ACTIVE**, we can continue to the next steps. Depending on data size, model selection, and hyperparameters, it can take from 10 minutes to more than an hour to become **ACTIVE**.

In [None]:
status_indicator = util.StatusIndicator()

while True:
    status = forecast.describe_predictor(PredictorArn=predictor_arn)['Status']
    status_indicator.update(status)
    if status in ('ACTIVE', 'CREATE_FAILED'): break
    time.sleep(10)

status_indicator.end()

### Get Error Metrics
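
The accuracy response is nested, so it can be easier to read once flattened. The sketch below walks a response shaped like the one `get_accuracy_metrics` returns and collects RMSE and the weighted quantile losses per test window. `summarize_accuracy` is a hypothetical helper, and the sample response uses made-up numbers purely to show the nesting.

```python
def summarize_accuracy(response):
    """Flatten a get_accuracy_metrics-style response into one dict per test window."""
    rows = []
    for result in response.get("PredictorEvaluationResults", []):
        for window in result.get("TestWindows", []):
            metrics = window.get("Metrics", {})
            row = {
                "algorithm": result.get("AlgorithmArn"),
                "evaluation_type": window.get("EvaluationType"),
                "RMSE": metrics.get("RMSE"),
            }
            # One column per quantile, e.g. "wQL[0.9]".
            for loss in metrics.get("WeightedQuantileLosses", []):
                row[f"wQL[{loss['Quantile']}]"] = loss["LossValue"]
            rows.append(row)
    return rows

# Made-up numbers; only the structure mirrors a real response.
sample_response = {
    "PredictorEvaluationResults": [{
        "AlgorithmArn": "arn:aws:forecast:::algorithm/Deep_AR_Plus",
        "TestWindows": [{
            "EvaluationType": "SUMMARY",
            "Metrics": {
                "RMSE": 12.5,
                "WeightedQuantileLosses": [
                    {"Quantile": 0.9, "LossValue": 0.05},
                    {"Quantile": 0.5, "LossValue": 0.11},
                    {"Quantile": 0.1, "LossValue": 0.06},
                ],
            },
        }],
    }]
}
for row in summarize_accuracy(sample_response):
    print(row)
```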

In [None]:
forecast.get_accuracy_metrics(PredictorArn=predictor_arn)

## Create a Forecast

Now create a forecast using the model that was trained.

In [None]:
forecastName= project+'_deeparp_algo_forecast'

In [None]:
create_forecast_response=forecast.create_forecast(ForecastName=forecastName,
                                                  PredictorArn=predictor_arn)
forecast_arn = create_forecast_response['ForecastArn']

Check the status of the forecast. When the status changes from **CREATE_IN_PROGRESS** to **ACTIVE**, we can continue to the next steps. Depending on data size, model selection, and hyperparameters, it can take from 10 minutes to more than an hour to become **ACTIVE**.

In [None]:
status_indicator = util.StatusIndicator()

while True:
    status = forecast.describe_forecast(ForecastArn=forecast_arn)['Status']
    status_indicator.update(status)
    if status in ('ACTIVE', 'CREATE_FAILED'): break
    time.sleep(10)

status_indicator.end()

### Get Forecast
Once the forecast is created, the results are ready and you can view them.

In [None]:
print(forecast_arn)
print()
forecastResponse = forecastquery.query_forecast(
    ForecastArn=forecast_arn,
    Filters={"item_id":"client_12"} 
    # Specify your filters here
)
print(forecastResponse)

# Evaluating Your Forecast
### You can skip this step if you don't want to evaluate on test data, and go directly to the Export Forecast section.

Now is the time to pull down the predictions from this Predictor and compare them to the actual observed values. This shows how accurate the forecast is.

You can extend the approaches here to compare multiple models or predictors and to determine the impact of improved accuracy on your use case.

Overview:

* Plotting the Actual Results
* Plotting the Prediction
* Comparing the Prediction to Actual Results

## Plotting the Actual Results

Take a small dataframe and plot the values for that time window to evaluate the results.


In the first part of the notebook we created a file of observed values. We now select a given date and customer from that dataframe and plot the actual usage data for that customer.

In [None]:
actual_df = pd.read_csv("data/item-demand-time-validation.csv", names=['timestamp','value','item'])
actual_df.head()

Next we need to reduce the data to just the day we wish to plot, which is October 31, 2014.

In [None]:
actual_df = actual_df[(actual_df['timestamp'] >= '2014-10-31') & (actual_df['timestamp'] < '2014-11-01')]

Lastly, keep only the rows for client_12 (a single customer is easier to visualize).

In [None]:
actual_df = actual_df[(actual_df['item'] == 'client_12')]
actual_df.head()

In [None]:
actual_df.plot()

## Plotting the Prediction:

Next we need to convert the JSON response from the Predictor to a dataframe that we can plot.

For more information, see https://docs.aws.amazon.com/forecast/latest/dg/metrics.html
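
As an alternative to building one dataframe per quantile, the nested `Predictions` dictionary can be turned into a single wide dataframe with one column per quantile. This is a minimal sketch: `predictions_to_frame` is a hypothetical helper, and the sample response below is made up and only mimics the nesting of `forecastResponse`.

```python
import pandas as pd

def predictions_to_frame(response):
    """Turn a query_forecast-style response into one column per quantile, indexed by timestamp."""
    columns = {}
    for quantile, points in response["Forecast"]["Predictions"].items():
        columns[quantile] = pd.Series(
            [point["Value"] for point in points],
            index=pd.to_datetime([point["Timestamp"] for point in points]),
        )
    return pd.DataFrame(columns)

# Made-up values; only the structure mirrors a real response.
sample_response = {"Forecast": {"Predictions": {
    "p10": [{"Timestamp": "2014-10-31T00:00:00", "Value": 10.0},
            {"Timestamp": "2014-10-31T01:00:00", "Value": 11.0}],
    "p50": [{"Timestamp": "2014-10-31T00:00:00", "Value": 14.0},
            {"Timestamp": "2014-10-31T01:00:00", "Value": 15.0}],
    "p90": [{"Timestamp": "2014-10-31T00:00:00", "Value": 19.0},
            {"Timestamp": "2014-10-31T01:00:00", "Value": 21.0}],
}}}
wide = predictions_to_frame(sample_response)
print(wide)
```

Calling `wide.plot()` would then draw all three quantiles on one chart.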

In [None]:
# Generate DF 
prediction_df_p10 = pd.DataFrame.from_dict(forecastResponse['Forecast']['Predictions']['p10'])
prediction_df_p10.head()

In [None]:
# Plot
prediction_df_p10.plot()

The above merely did the p10 values, now do the same for p50 and p90.

In [None]:
prediction_df_p50 = pd.DataFrame.from_dict(forecastResponse['Forecast']['Predictions']['p50'])
prediction_df_p90 = pd.DataFrame.from_dict(forecastResponse['Forecast']['Predictions']['p90'])

## Comparing the Prediction to Actual Results

After obtaining the dataframes the next task is to plot them together to determine the best fit.
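
A plot gives a visual impression; two simple numbers can back it up. The sketch below computes the share of actual values that fall between the p10 and p90 predictions, and a weighted quantile loss following the formula described on the metrics page linked in this notebook (twice the summed pinball loss divided by the summed absolute actuals). Both functions are hypothetical helpers and all values are made up.

```python
def coverage(actuals, lower, upper):
    """Fraction of actual values that fall inside [lower, upper]."""
    inside = sum(low <= y <= high for y, low, high in zip(actuals, lower, upper))
    return inside / len(actuals)

def weighted_quantile_loss(actuals, predictions, quantile):
    """2 * sum of pinball losses, divided by the sum of |actual|."""
    pinball = sum(
        quantile * max(y - q, 0.0) + (1.0 - quantile) * max(q - y, 0.0)
        for y, q in zip(actuals, predictions)
    )
    return 2.0 * pinball / sum(abs(y) for y in actuals)

# Made-up actuals and quantile predictions for four time points.
actual = [10.0, 12.0, 9.0, 15.0]
p10 = [8.0, 9.0, 8.0, 10.0]
p50 = [10.0, 11.0, 10.0, 12.0]
p90 = [13.0, 14.0, 12.0, 14.0]

print("p10-p90 coverage:", coverage(actual, p10, p90))
print("wQL[0.5]:", weighted_quantile_loss(actual, p50, 0.5))
```

For a well-calibrated forecast, roughly 80% of actual values would be expected to fall between p10 and p90; a single day of data is too small a sample to judge that reliably.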

In [None]:
# We start by creating a dataframe to house our content, here source will be which dataframe it came from
results_df = pd.DataFrame(columns=['timestamp', 'value', 'source'])

Import the observed values into the dataframe:

In [None]:
# DataFrame.append was removed in pandas 2.0, so collect the rows and concatenate once.
rows = [{'timestamp': dateutil.parser.parse(row['timestamp']), 'value': row['value'], 'source': 'actual'}
        for _, row in actual_df.iterrows()]
results_df = pd.concat([results_df, pd.DataFrame(rows)], ignore_index=True)

In [None]:
# To show the new dataframe
results_df.head()

In [None]:
# Now add the P10, P50, and P90 values.
# DataFrame.append was removed in pandas 2.0, so collect the rows and concatenate once.
rows = []
for source, prediction_df in (('p10', prediction_df_p10), ('p50', prediction_df_p50), ('p90', prediction_df_p90)):
    for _, row in prediction_df.iterrows():
        clean_timestamp = dateutil.parser.parse(row['Timestamp'])
        rows.append({'timestamp': clean_timestamp, 'value': row['Value'], 'source': source})
results_df = pd.concat([results_df, pd.DataFrame(rows)], ignore_index=True)

In [None]:
results_df

In [None]:
pivot_df = results_df.pivot(columns='source', values='value', index="timestamp")

pivot_df

In [None]:
pivot_df.plot()

# Export Forecast

In [None]:
forecastExportName= project+'_deeparp_forecast_export'

In [None]:
outputPath="s3://"+bucket_name+"/output"

In [None]:
forecast_export_response = forecast.create_forecast_export_job(
                                                                ForecastExportJobName = forecastExportName,
                                                                ForecastArn=forecast_arn, 
                                                                Destination = {
                                                                   "S3Config" : {
                                                                       "Path":outputPath,
                                                                       "RoleArn": role_arn
                                                                   } 
                                                                }
                                                              )

In [None]:
forecastExportJobArn = forecast_export_response['ForecastExportJobArn']

In [None]:
status_indicator = util.StatusIndicator()

while True:
    status = forecast.describe_forecast_export_job(ForecastExportJobArn=forecastExportJobArn)['Status']
    status_indicator.update(status)
    if status in ('ACTIVE', 'CREATE_FAILED'): break
    time.sleep(10)

status_indicator.end()

Check the S3 bucket for the results.
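
The listing response contains metadata for every object under the prefix. To pick out just the exported CSV files, the response can be filtered as in the sketch below. `exported_csv_keys` is a hypothetical helper, and the sample listing is made up; real export file names will differ.

```python
def exported_csv_keys(list_response):
    """Return the CSV object keys from a list_objects_v2-style response."""
    return [
        entry["Key"]
        for entry in list_response.get("Contents", [])
        if entry["Key"].endswith(".csv")
    ]

# Made-up listing with one CSV and the folder placeholder key.
sample_listing = {
    "KeyCount": 2,
    "Contents": [
        {"Key": "output/example_part0.csv", "Size": 1024},
        {"Key": "output/", "Size": 0},
    ],
}
print(exported_csv_keys(sample_listing))
```

An empty prefix yields a response without a `Contents` entry, which is why the helper falls back to an empty list.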

In [None]:
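# Use an S3 client (the resource object has no list method) and the bucket_name set during setup.
session.client('s3').list_objects_v2(Bucket=bucket_name, Prefix="output")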

Once you are done exporting this Forecast, you can clean up all the work that was done by executing the cells below.

# Cleanup

After completing the notebook, you may want to delete the following to prevent any unwanted charges:

* Forecasts
* Predictors
* Datasets
* Dataset Groups

## Imports and Connections to AWS

The cleanup cells below reuse the libraries imported in the Setup section and the `forecast` client created there, so no new imports or connections are needed.


## Defining the Things to Cleanup

In [None]:
# Delete the Forecast:
util.wait_till_delete(lambda: forecast.delete_forecast(ForecastArn=forecast_arn))

In [None]:
# Delete the Predictor:
util.wait_till_delete(lambda: forecast.delete_predictor(PredictorArn=predictor_arn))

In [None]:
# Delete Import
util.wait_till_delete(lambda: forecast.delete_dataset_import_job(DatasetImportJobArn=ds_import_job_arn))

In [None]:
# Delete the Dataset:
util.wait_till_delete(lambda: forecast.delete_dataset(DatasetArn=datasetArn))

In [None]:
# Delete the DatasetGroup:
util.wait_till_delete(lambda: forecast.delete_dataset_group(DatasetGroupArn=datasetGroupArn))

In [None]:
# Delete your file in S3
boto3.Session().resource('s3').Bucket(bucket_name).Object(key).delete()

## IAM Role and Policy Cleanup

The very last step in the notebooks is to remove the policies that were attached to a role and then to delete it. No changes should need to be made here, just execute the cell.

In [None]:
util.delete_iam_role(role_name)

All that remains is to go back to the CloudFormation console and delete the stack. Once that is done, you have removed all the resources that were created.