## Setup<a class="anchor" id="setup"></a>

In [1]:
import boto3
from time import sleep
from util.fcst_utils import wait_till_delete
import subprocess

In [2]:
session = boto3.Session(region_name='us-east-1') 

forecast = session.client(service_name='forecast') 
forecastquery = session.client(service_name='forecastquery')

In [3]:
forecast.list_dataset_groups()

{'DatasetGroups': [{'DatasetGroupArn': 'arn:aws:forecast:us-east-1:452432741922:dataset-group/util_power_forecastdemo_dsg',
   'DatasetGroupName': 'util_power_forecastdemo_dsg',
   'CreationTime': datetime.datetime(2019, 10, 29, 14, 21, 53, 114000, tzinfo=tzlocal()),
   'LastModificationTime': datetime.datetime(2019, 10, 29, 14, 22, 16, 176000, tzinfo=tzlocal())},
  {'DatasetGroupArn': 'arn:aws:forecast:us-east-1:452432741922:dataset-group/sisamex_gp',
   'DatasetGroupName': 'sisamex_gp',
   'CreationTime': datetime.datetime(2019, 7, 29, 19, 31, 53, 23000, tzinfo=tzlocal()),
   'LastModificationTime': datetime.datetime(2019, 7, 29, 19, 31, 53, 23000, tzinfo=tzlocal())}],
 'ResponseMetadata': {'RequestId': 'e6800f27-6f81-47b3-b579-c2e96f1264c5',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',
   'date': 'Thu, 26 Dec 2019 21:41:50 GMT',
   'x-amzn-requestid': 'e6800f27-6f81-47b3-b579-c2e96f1264c5',
   'content-length': '435',
   'connection': 'kee

Amazon Forecast can import data from Amazon S3. We first explore the data locally to see which fields it contains.

In [5]:
import pandas as pd
df = pd.read_csv("../data/item-demand-time.csv", dtype = object,header=None)
df.head(3)

Unnamed: 0,0,1,2
0,2014-01-01 01:00:00,38.34991708126038,client_12
1,2014-01-01 02:00:00,33.5820895522388,client_12
2,2014-01-01 03:00:00,34.41127694859037,client_12


Now upload the data to S3. Before doing so, open the AWS Console, select S3 as the service, and create a new bucket in the `us-east-1` region. Follow the naming convention `amazon-forecast-unique-value-data`. Bucket names must be globally unique, so if you get an error, adjust the name until it is accepted, then update the `bucketName` cell below.
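
Before creating the bucket, it can help to check a candidate name locally. The sketch below is only an illustration: `make_bucket_name` is a hypothetical helper, and the regular expression covers just the basic S3 naming rules (3 to 63 characters, lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit). It cannot tell whether the name is already taken.

```python
import re

# Basic S3 bucket naming rules (summarised): 3-63 characters, lowercase
# letters, digits, dots and hyphens, starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def make_bucket_name(unique_value, prefix="amazon-forecast", suffix="data"):
    """Build a candidate bucket name and check it against the basic rules."""
    name = "{0}-{1}-{2}".format(prefix, unique_value, suffix).lower()
    if not _BUCKET_RE.match(name):
        raise ValueError("not a valid S3 bucket name: " + name)
    return name


# The account id here is a made-up placeholder.
print(make_bucket_name("123456789012"))
```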

In [13]:
s3 = session.client('s3')
accountId = boto3.client('sts').get_caller_identity().get('Account')

In [8]:
bucketName = "amazon-forecast-data-{0}".format(accountId) # Update the unique-value bit here.
key="elec_data/item-demand-time.csv"

In [9]:
s3.upload_file(Filename="../data/item-demand-time.csv", Bucket=bucketName, Key=key)

In [10]:
bucketName

'forecast-mggaska-4'

In [14]:
# One-time setup only: run the following command to create the IAM role to provide to Amazon Forecast.
# Save the generated role and reuse it in all future calls that import or export data.

cmd = ['python', '../setup_forecast_permissions.py', bucketName]
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
p.communicate()  # wait for the setup script to finish before the role is used

In [15]:
roleArn = 'arn:aws:iam::%s:role/amazonforecast'%accountId

### CreateDataset

More details about `Domain` and dataset types can be found in the [documentation](https://docs.aws.amazon.com/forecast/latest/dg/howitworks-domains-ds-types.html). This example uses the [CUSTOM](https://docs.aws.amazon.com/forecast/latest/dg/custom-domain.html) domain with three required attributes: `timestamp`, `target_value`, and `item_id`. Also update the project name below to reflect your own name, in lowercase.
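
As a quick sanity check, a schema dictionary can be tested for the three required attributes before it is sent to the service. This is a minimal sketch; `missing_required_attributes` and `example_schema` are hypothetical names used only for illustration.

```python
# The three attributes the text above lists as required for this example.
REQUIRED_ATTRIBUTES = ("timestamp", "target_value", "item_id")


def missing_required_attributes(schema):
    """Return the required attribute names that the schema does not define."""
    defined = {attr["AttributeName"] for attr in schema["Attributes"]}
    return [name for name in REQUIRED_ATTRIBUTES if name not in defined]


# A deliberately incomplete schema to show what the check reports.
example_schema = {
    "Attributes": [
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "target_value", "AttributeType": "float"},
    ]
}
print(missing_required_attributes(example_schema))
```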

In [16]:
DATASET_FREQUENCY = "H" 
TIMESTAMP_FORMAT = "yyyy-MM-dd hh:mm:ss"

In [17]:
project = 'electric_power_forecastdemo' # Replace this with a unique name here, make sure the entire name is < 30 characters.
datasetName= project+'_ds'
datasetGroupName= project +'_gp'
s3DataPath = "s3://"+bucketName+"/"+key

In [18]:
datasetName

'electric_power_forecastdemo_ds'

### Schema Definition

Here we define the attributes for the model.

In [19]:
# Specify the schema of your dataset here. Make sure the order of columns matches the raw data files.
schema ={
   "Attributes":[
      {
         "AttributeName":"timestamp",
         "AttributeType":"timestamp"
      },
      {
         "AttributeName":"target_value",
         "AttributeType":"float"
      },
      {
         "AttributeName":"item_id",
         "AttributeType":"string"
      }
   ]
}

response=forecast.create_dataset(
                    Domain="CUSTOM",
                    DatasetType='TARGET_TIME_SERIES',
                    DatasetName=datasetName,
                    DataFrequency=DATASET_FREQUENCY, 
                    Schema = schema
                   )
datasetArn = response['DatasetArn']

In [20]:
forecast.describe_dataset(DatasetArn=datasetArn)

{'DatasetArn': 'arn:aws:forecast:us-east-1:452432741922:dataset/electric_power_forecastdemo_ds',
 'DatasetName': 'electric_power_forecastdemo_ds',
 'Domain': 'CUSTOM',
 'DatasetType': 'TARGET_TIME_SERIES',
 'DataFrequency': 'H',
 'Schema': {'Attributes': [{'AttributeName': 'timestamp',
    'AttributeType': 'timestamp'},
   {'AttributeName': 'target_value', 'AttributeType': 'float'},
   {'AttributeName': 'item_id', 'AttributeType': 'string'}]},
 'EncryptionConfig': {},
 'Status': 'ACTIVE',
 'CreationTime': datetime.datetime(2019, 12, 26, 21, 52, 38, 356000, tzinfo=tzlocal()),
 'LastModificationTime': datetime.datetime(2019, 12, 26, 21, 52, 38, 356000, tzinfo=tzlocal()),
 'ResponseMetadata': {'RequestId': '1c3fe74a-bd3e-4792-a555-b3c1395cb83e',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',
   'date': 'Thu, 26 Dec 2019 21:52:41 GMT',
   'x-amzn-requestid': '1c3fe74a-bd3e-4792-a555-b3c1395cb83e',
   'content-length': '521',
   'connection': 'keep-

In [21]:
create_dataset_group_response = forecast.create_dataset_group(DatasetGroupName=datasetGroupName,
                                                              Domain="CUSTOM",
                                                              DatasetArns= [datasetArn]
                                                             )
datasetGroupArn = create_dataset_group_response['DatasetGroupArn']

If you already have a dataset group, you can update it with **update_dataset_group**.



In [22]:
forecast.describe_dataset_group(DatasetGroupArn=datasetGroupArn)

{'DatasetGroupName': 'electric_power_forecastdemo_gp',
 'DatasetGroupArn': 'arn:aws:forecast:us-east-1:452432741922:dataset-group/electric_power_forecastdemo_gp',
 'DatasetArns': ['arn:aws:forecast:us-east-1:452432741922:dataset/electric_power_forecastdemo_ds'],
 'Domain': 'CUSTOM',
 'Status': 'ACTIVE',
 'CreationTime': datetime.datetime(2019, 12, 26, 21, 52, 46, 942000, tzinfo=tzlocal()),
 'LastModificationTime': datetime.datetime(2019, 12, 26, 21, 52, 46, 942000, tzinfo=tzlocal()),
 'ResponseMetadata': {'RequestId': '498067d7-0301-42f6-afd8-16e564bab1dc',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',
   'date': 'Thu, 26 Dec 2019 21:52:49 GMT',
   'x-amzn-requestid': '498067d7-0301-42f6-afd8-16e564bab1dc',
   'content-length': '363',
   'connection': 'keep-alive'},
  'RetryAttempts': 0}}

### Create Data Import Job
This step imports the raw data into Amazon Forecast so that it is ready for forecasting.

In [23]:
datasetImportJobName = 'EP_DSIMPORT_JOB_TARGET'
ds_import_job_response=forecast.create_dataset_import_job(DatasetImportJobName=datasetImportJobName,
                                                          DatasetArn=datasetArn,
                                                          DataSource= {
                                                              "S3Config" : {
                                                                 "Path":s3DataPath,
                                                                 "RoleArn": roleArn
                                                              } 
                                                          },
                                                          TimestampFormat=TIMESTAMP_FORMAT
                                                         )

In [24]:
ds_import_job_arn=ds_import_job_response['DatasetImportJobArn']
print(ds_import_job_arn)

arn:aws:forecast:us-east-1:452432741922:dataset-import-job/electric_power_forecastdemo_ds/EP_DSIMPORT_JOB_TARGET


Check the status of the dataset import job. When the status changes from **CREATE_IN_PROGRESS** to **ACTIVE**, we can continue to the next steps. Depending on the data size, this usually takes 5 to 10 minutes.
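
The same polling pattern recurs for the predictor, the forecast, and the export job. One way to avoid waiting forever on a stuck job is a small helper with a timeout. This is a sketch under stated assumptions: `wait_for_status` is a hypothetical helper, not part of boto3, and the statuses fed to it here are simulated rather than fetched from the service.

```python
import time


def wait_for_status(get_status, done=("ACTIVE", "CREATE_FAILED"),
                    interval=30, timeout=3600, sleep=time.sleep):
    """Poll get_status() until it returns a terminal status or time runs out."""
    waited = 0
    while True:
        status = get_status()
        print(status)
        if status in done:
            return status
        if waited >= timeout:
            raise TimeoutError("still {0} after {1} seconds".format(status, waited))
        sleep(interval)
        waited += interval


# Simulated status sequence so the sketch runs without calling AWS.
_fake_statuses = iter(["CREATE_PENDING", "CREATE_IN_PROGRESS", "ACTIVE"])
final_status = wait_for_status(lambda: next(_fake_statuses),
                               sleep=lambda seconds: None)
```

In the notebook, `get_status` would be something like `lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=ds_import_job_arn)['Status']`.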

In [25]:
while True:
    dataImportStatus = forecast.describe_dataset_import_job(DatasetImportJobArn=ds_import_job_arn)['Status']
    print(dataImportStatus)
    if dataImportStatus != 'ACTIVE' and dataImportStatus != 'CREATE_FAILED':
        sleep(30)
    else:
        break

CREATE_PENDING
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
ACTIVE


In [26]:
forecast.describe_dataset_import_job(DatasetImportJobArn=ds_import_job_arn)

{'DatasetImportJobName': 'EP_DSIMPORT_JOB_TARGET',
 'DatasetImportJobArn': 'arn:aws:forecast:us-east-1:452432741922:dataset-import-job/electric_power_forecastdemo_ds/EP_DSIMPORT_JOB_TARGET',
 'DatasetArn': 'arn:aws:forecast:us-east-1:452432741922:dataset/electric_power_forecastdemo_ds',
 'TimestampFormat': 'yyyy-MM-dd hh:mm:ss',
 'DataSource': {'S3Config': {'Path': 's3://forecast-mggaska-4/elec_data/item-demand-time.csv',
   'RoleArn': 'arn:aws:iam::452432741922:role/amazonforecast'}},
 'FieldStatistics': {'item_id': {'Count': 23973,
   'CountDistinct': 3,
   'CountNull': 0},
  'target_value': {'Count': 23973,
   'CountDistinct': 4818,
   'CountNull': 0,
   'CountNan': 0,
   'Min': '0.0',
   'Max': '212.27197346600326',
   'Avg': 50.447323170680725,
   'Stddev': 38.72169238224658},
  'timestamp': {'Count': 23973,
   'CountDistinct': 7991,
   'CountNull': 0,
   'Min': '2014-01-01T01:00:00Z',
   'Max': '2014-11-29T23:00:00Z'}},
 'DataSize': 0.0010688817128539085,
 'Status': 'ACTIVE',
 'C

### Create Predictor with custom forecast horizon

The forecast horizon is the number of time points to predict into the future. For weekly data, a value of 12 means 12 weeks. Our example uses hourly data and we want to forecast the next day, so we set it to 24.
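
To make the horizon concrete, the sketch below computes which timestamps a horizon of 24 hourly points would cover, starting from the last observed timestamp reported in the import statistics (`2014-11-29T23:00:00Z`). `forecast_window` is a hypothetical helper for illustration, and it assumes the forecast begins one step after the last observation.

```python
from datetime import datetime, timedelta


def forecast_window(last_observed, horizon, step=timedelta(hours=1)):
    """Return the first and last timestamps covered by the forecast horizon."""
    start = last_observed + step
    end = last_observed + horizon * step
    return start, end


# Last observed hour in the imported data, horizon of 24 hourly points.
start, end = forecast_window(datetime(2014, 11, 29, 23), horizon=24)
print(start, end)
```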

In [27]:
predictorName= project+'_prophet'

In [28]:
forecastHorizon = 24

In [31]:
algorithmArn = 'arn:aws:forecast:::algorithm/Prophet'

## Select a backtesting method

<img src='backtesting.png' />

In [32]:
create_predictor_response=forecast.create_predictor(PredictorName=predictorName, 
                                                  AlgorithmArn=algorithmArn,
                                                  ForecastHorizon=forecastHorizon,
                                                  PerformAutoML= False,
                                                  PerformHPO=False,
                                                  EvaluationParameters= {"NumberOfBacktestWindows": 1, 
                                                                         "BackTestWindowOffset": 24}, 
                                                  InputDataConfig= {"DatasetGroupArn": datasetGroupArn},
                                                  FeaturizationConfig= {"ForecastFrequency": "H", 
                                                                        "Featurizations": 
                                                                        [
                                                                          {"AttributeName": "target_value", 
                                                                           "FeaturizationPipeline": 
                                                                            [
                                                                              {"FeaturizationMethodName": "filling", 
                                                                               "FeaturizationMethodParameters": 
                                                                                {"frontfill": "none", 
                                                                                 "middlefill": "zero", 
                                                                                 "backfill": "zero"}
                                                                              }
                                                                            ]
                                                                          }
                                                                        ]
                                                                       }
                                                 )
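
With `NumberOfBacktestWindows` set to 1 and `BackTestWindowOffset` set to 24, the service holds out the end of each series for evaluation. The sketch below illustrates that idea on a plain list, assuming the offset counts points back from the end of the series. `backtest_split` is a hypothetical helper, and the data is made up.

```python
def backtest_split(series, offset):
    """Split a series into training data and the final `offset` points."""
    if not 0 < offset < len(series):
        raise ValueError("offset must be between 1 and len(series) - 1")
    return series[:-offset], series[-offset:]


hourly = list(range(72))  # three days of made-up hourly values
train, test = backtest_split(hourly, offset=24)
print(len(train), len(test))
```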

In [33]:
predictorArn=create_predictor_response['PredictorArn']

Check the status of the predictor. When the status changes from **CREATE_IN_PROGRESS** to **ACTIVE**, we can continue to the next steps. Depending on data size, model selection, and hyperparameters, it can take from 10 minutes to more than one hour to become **ACTIVE**.

In [34]:
while True:
    predictorStatus = forecast.describe_predictor(PredictorArn=predictorArn)['Status']
    print(predictorStatus)
    if predictorStatus != 'ACTIVE' and predictorStatus != 'CREATE_FAILED':
        sleep(30)
    else:
        break

CREATE_PENDING
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
ACTIVE


### Get Error Metrics

In [35]:
forecast.get_accuracy_metrics(PredictorArn=predictorArn)

{'PredictorEvaluationResults': [{'AlgorithmArn': 'arn:aws:forecast:::algorithm/Prophet',
   'TestWindows': [{'EvaluationType': 'SUMMARY',
     'Metrics': {'RMSE': 23.126529322357886,
      'WeightedQuantileLosses': [{'Quantile': 0.9,
        'LossValue': 0.10779105103274744},
       {'Quantile': 0.5, 'LossValue': 0.29570812541583624},
       {'Quantile': 0.1, 'LossValue': 0.18988755392508985}]}},
    {'TestWindowStart': datetime.datetime(2014, 11, 29, 0, 0, tzinfo=tzlocal()),
     'TestWindowEnd': datetime.datetime(2014, 11, 30, 0, 0, tzinfo=tzlocal()),
     'ItemCount': 3,
     'EvaluationType': 'COMPUTED',
     'Metrics': {'RMSE': 23.126529322357886,
      'WeightedQuantileLosses': [{'Quantile': 0.9,
        'LossValue': 0.10779105103274744},
       {'Quantile': 0.5, 'LossValue': 0.29570812541583624},
       {'Quantile': 0.1, 'LossValue': 0.18988755392508985}]}}]}],
 'ResponseMetadata': {'RequestId': '4cc64c98-63e0-487e-85e4-8cbb78b871f3',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'
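
The response reports an RMSE and a weighted quantile loss for each quantile. The sketch below shows how such metrics can be computed on a toy example. It assumes the weighted quantile loss is twice the summed quantile (pinball) loss divided by the sum of absolute actual values. The function names and the numbers are illustrative, not taken from the service.

```python
import math


def rmse(actuals, predictions):
    """Root mean squared error between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actuals, predictions)]
    return math.sqrt(sum(squared) / len(squared))


def weighted_quantile_loss(actuals, quantile_predictions, tau):
    """Quantile loss at tau, scaled by 2 and normalised by sum of |actual|."""
    loss = sum(
        tau * max(a - q, 0.0) + (1.0 - tau) * max(q - a, 0.0)
        for a, q in zip(actuals, quantile_predictions)
    )
    return 2.0 * loss / sum(abs(a) for a in actuals)


actuals = [10.0, 20.0, 30.0]       # made-up observations
predicted = [12.0, 18.0, 33.0]     # made-up predictions
print(rmse(actuals, predicted))
print(weighted_quantile_loss(actuals, predicted, tau=0.5))
```

Lower values are better for both metrics.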

### Create Forecast

Now create a forecast using the model that was trained.

In [36]:
forecastName= project+'_prophet_algo_forecast'

In [37]:
create_forecast_response=forecast.create_forecast(ForecastName=forecastName,
                                                  PredictorArn=predictorArn)
forecastArn = create_forecast_response['ForecastArn']

Check the status of the forecast. When the status changes from **CREATE_IN_PROGRESS** to **ACTIVE**, we can continue to the next steps. Depending on data size, model selection, and hyperparameters, it can take from 10 minutes to more than one hour to become **ACTIVE**.

In [38]:
while True:
    forecastStatus = forecast.describe_forecast(ForecastArn=forecastArn)['Status']
    print(forecastStatus)
    if forecastStatus != 'ACTIVE' and forecastStatus != 'CREATE_FAILED':
        sleep(30)
    else:
        break

CREATE_PENDING
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
ACTIVE


### Get Forecast

Once the forecast has been created, the results are ready and you can view them.

In [39]:
print(forecastArn)
forecastResponse = forecastquery.query_forecast(
    ForecastArn=forecastArn,
    Filters={"item_id":"client_12"}
)
print(forecastResponse)

arn:aws:forecast:us-east-1:452432741922:forecast/electric_power_forecastdemo_prophet_algo_forecast
{'Forecast': {'Predictions': {'p10': [{'Timestamp': '2014-11-30T00:00:00', 'Value': -33.15486145019531}, {'Timestamp': '2014-11-30T01:00:00', 'Value': -31.74407196044922}, {'Timestamp': '2014-11-30T02:00:00', 'Value': -29.004981994628906}, {'Timestamp': '2014-11-30T03:00:00', 'Value': -31.651647567749023}, {'Timestamp': '2014-11-30T04:00:00', 'Value': -28.399169921875}, {'Timestamp': '2014-11-30T05:00:00', 'Value': -40.839046478271484}, {'Timestamp': '2014-11-30T06:00:00', 'Value': -40.90826416015625}, {'Timestamp': '2014-11-30T07:00:00', 'Value': -31.936710357666016}, {'Timestamp': '2014-11-30T08:00:00', 'Value': -8.097722053527832}, {'Timestamp': '2014-11-30T09:00:00', 'Value': 33.220367431640625}, {'Timestamp': '2014-11-30T10:00:00', 'Value': 53.681114196777344}, {'Timestamp': '2014-11-30T11:00:00', 'Value': 58.61423873901367}, {'Timestamp': '2014-11-30T12:00:00', 'Value': 52.869369506
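
The nested response is easier to read when reshaped into one row per timestamp. The sketch below does this with plain Python. `predictions_to_rows` is a hypothetical helper, the sample values are made up, and it assumes that other quantile keys (such as `p50`) follow the same layout as the `p10` list printed above.

```python
def predictions_to_rows(response):
    """Turn a query_forecast-style response into one row per timestamp."""
    rows = {}
    for quantile, points in response["Forecast"]["Predictions"].items():
        for point in points:
            rows.setdefault(point["Timestamp"], {})[quantile] = point["Value"]
    return [dict(timestamp=ts, **values) for ts, values in sorted(rows.items())]


# Made-up values with the same shape as the response printed above.
sample = {"Forecast": {"Predictions": {
    "p10": [{"Timestamp": "2014-11-30T00:00:00", "Value": -33.2},
            {"Timestamp": "2014-11-30T01:00:00", "Value": -31.7}],
    "p50": [{"Timestamp": "2014-11-30T00:00:00", "Value": 1.5},
            {"Timestamp": "2014-11-30T01:00:00", "Value": 2.5}],
}}}
for row in predictions_to_rows(sample):
    print(row)
```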

# Export Forecast

You can export the forecast to an S3 bucket. Doing so requires a role with S3 put access, which has already been created.

In [40]:
forecastExportName= project+'_prophet_algo_forecast_export'

In [41]:
outputPath="s3://"+bucketName+"/output"

In [42]:
forecast_export_response = forecast.create_forecast_export_job(
                                                                ForecastExportJobName = forecastExportName,
                                                                ForecastArn=forecastArn, 
                                                                Destination = {
                                                                   "S3Config" : {
                                                                       "Path":outputPath,
                                                                       "RoleArn": roleArn
                                                                   } 
                                                                }
                                                              )

In [43]:
forecastExportJobArn = forecast_export_response['ForecastExportJobArn']
print(forecastExportJobArn)

arn:aws:forecast:us-east-1:452432741922:forecast-export-job/electric_power_forecastdemo_prophet_algo_forecast/electric_power_forecastdemo_prophet_algo_forecast_export


In [44]:
while True:
    forecastExportStatus = forecast.describe_forecast_export_job(ForecastExportJobArn=forecastExportJobArn)['Status']
    print(forecastExportStatus)
    if forecastExportStatus != 'ACTIVE' and forecastExportStatus != 'CREATE_FAILED':
        sleep(30)
    else:
        break

CREATE_PENDING
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
CREATE_IN_PROGRESS
ACTIVE


Check the S3 bucket for the results.

In [45]:
s3.list_objects(Bucket=bucketName,Prefix="output")

{'ResponseMetadata': {'RequestId': 'AE0EF51AF4A738A8',
  'HostId': 'Ry/nZAALfm4gkDhYnfXGX8ojGov/W8X3uVoXbI15JVpe0ajl8gycwXr8o0TM8owLOJHO1zqKdIM=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'Ry/nZAALfm4gkDhYnfXGX8ojGov/W8X3uVoXbI15JVpe0ajl8gycwXr8o0TM8owLOJHO1zqKdIM=',
   'x-amz-request-id': 'AE0EF51AF4A738A8',
   'date': 'Thu, 26 Dec 2019 22:46:53 GMT',
   'x-amz-bucket-region': 'us-east-1',
   'content-type': 'application/xml',
   'transfer-encoding': 'chunked',
   'server': 'AmazonS3'},
  'RetryAttempts': 0},
 'IsTruncated': False,
 'Marker': '',
 'Contents': [{'Key': 'output/_CHECK',
   'LastModified': datetime.datetime(2019, 12, 26, 22, 43, 37, tzinfo=tzlocal()),
   'ETag': '"d41d8cd98f00b204e9800998ecf8427e"',
   'Size': 0,
   'StorageClass': 'STANDARD',
   'Owner': {'DisplayName': 'mggaska',
    'ID': 'baf4d6a59fe0001e3a97a726133c9e6f7a27dd92c1a1db51d4738a4997eb3cd6'}},
  {'Key': 'output/_SUCCESS',
   'LastModified': datetime.datetime(2019, 12, 26, 22, 44, 48, tzin
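
The listing includes marker objects such as `output/_CHECK` and `output/_SUCCESS` alongside any exported data. A small filter can pull out just the data keys. This is a sketch: `exported_data_keys` is a hypothetical helper, and the data file name in the sample listing is invented for illustration.

```python
def exported_data_keys(list_response, markers=("_CHECK", "_SUCCESS")):
    """Return object keys from a list_objects response, skipping marker files."""
    keys = [obj["Key"] for obj in list_response.get("Contents", [])]
    return [key for key in keys if key.rsplit("/", 1)[-1] not in markers]


sample_listing = {"Contents": [
    {"Key": "output/_CHECK"},
    {"Key": "output/_SUCCESS"},
    {"Key": "output/part-0.csv"},  # hypothetical data file name
]}
print(exported_data_keys(sample_listing))
```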

# Cleanup

While Forecast is in preview there are no charges for using it, but to future-proof this work, the instructions below clean up your workspace.

In [None]:
# Delete the forecast export job
wait_till_delete(lambda: forecast.delete_forecast_export_job(ForecastExportJobArn = forecastExportJobArn))

In [None]:
# Delete forecast
wait_till_delete(lambda: forecast.delete_forecast(ForecastArn = forecastArn))

In [None]:
# Delete predictor
wait_till_delete(lambda: forecast.delete_predictor(PredictorArn = predictorArn))

In [None]:
# Delete Import
wait_till_delete(lambda: forecast.delete_dataset_import_job(DatasetImportJobArn=ds_import_job_arn))

In [None]:
# Delete the dataset
wait_till_delete(lambda: forecast.delete_dataset(DatasetArn=datasetArn))

In [None]:
# Delete Dataset Group
wait_till_delete(lambda: forecast.delete_dataset_group(DatasetGroupArn=datasetGroupArn))