# This notebook describes the creation of a forecasting model and its deployment on AKS.

### Before you start

Install FTK using the [shell](https://azuremlftkrelease.blob.core.windows.net/latest/install_amlpf_linux.sh) or [batch](https://azuremlftkrelease.blob.core.windows.net/latest/install_amlpf_windows.bat) script.  
To run this notebook, install the Python SDK by running:
```
activate azuremlftk_nov2018
pip install --upgrade azureml-sdk[notebooks,automl]
```
Log in to Azure:
```
az login
```
After installation is complete, select Kernel>Change Kernel>azuremlftk_nov2018.

#### Imports

In [None]:
import os
import pandas as pd
import numpy as np
import json

from ftk import TimeSeriesDataFrame, ForecastDataFrame
from ftk.operationalization import ScoreContext
from ftk.transforms import TimeSeriesImputer, TimeIndexFeaturizer, DropColumns, GrainIndexFeaturizer 
from ftk.models import RegressionForecaster
from sklearn.ensemble import RandomForestRegressor
from ftk.pipeline import AzureMLForecastPipeline
from ftk.data import load_dominicks_oj_dataset

#### Load data
Load the Dominick's data set to train and test the model.

In [None]:
train_tsdf, test_tsdf = load_dominicks_oj_dataset()
# Use a TimeSeriesImputer to linearly interpolate missing values
imputer = TimeSeriesImputer(input_column='Quantity', 
                            option='interpolate',
                            method='linear',
                            freq='W-WED')

train_imputed_tsdf = imputer.transform(train_tsdf)
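
As a rough illustration of what linear interpolation on a weekly `W-WED` grid does, the sketch below uses plain pandas rather than FTK. It is an analogue under stated assumptions, not the `TimeSeriesImputer` implementation, and the toy dates and quantities are invented.

```python
import pandas as pd

# Toy weekly series (Wednesdays) with the 2018-11-21 observation missing.
# The values are invented for illustration only.
quantity = pd.Series(
    [10.0, 20.0, 40.0],
    index=pd.to_datetime(["2018-11-07", "2018-11-14", "2018-11-28"]),
)

# Re-index onto a regular W-WED grid; the missing week appears as NaN.
regular = quantity.asfreq("W-WED")

# Fill the gap by drawing a straight line between the neighbouring points.
interpolated = regular.interpolate(method="linear")
print(interpolated)
```

The missing week is first made explicit as a row on the regular grid and only then filled, which is why the frequency has to be known to the imputer.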

#### Prepare the pipeline.
Create the forecasting pipeline to be deployed.

In [None]:
oj_series_freq = 'W-WED'
oj_series_seasonality = 52

# DropColumns: Drop columns that should not be included for modeling. `logmove` is the log of the number of 
# units sold, so providing this number would be cheating. `WeekFirstDay` would be 
# redundant since we already have a feature for the last day of the week.
columns_to_drop = ['logmove', 'WeekFirstDay', 'week']
column_dropper = DropColumns(columns_to_drop)
# TimeSeriesImputer: Fill missing values in the features
# First, we need to create a dictionary with key as column names and value as values used to fill missing 
# values for that column. We are going to use the mean to fill missing values for each column.
columns_with_missing_values = train_imputed_tsdf.columns[pd.DataFrame(train_imputed_tsdf).isnull().any()].tolist()
columns_with_missing_values = [c for c in columns_with_missing_values if c not in columns_to_drop]
missing_value_imputation_dictionary = {}
for c in columns_with_missing_values:
    missing_value_imputation_dictionary[c] = train_imputed_tsdf[c].mean()
fillna_imputer = TimeSeriesImputer(option='fillna', 
                                   input_column=columns_with_missing_values,
                                   value=missing_value_imputation_dictionary)
# TimeIndexFeaturizer: extract temporal features from timestamps
time_index_featurizer = TimeIndexFeaturizer(correlation_cutoff=0.1, overwrite_columns=True)

# GrainIndexFeaturizer: create indicator variables for stores and brands
grain_featurizer = GrainIndexFeaturizer(overwrite_columns=True, ts_frequency=oj_series_freq)

random_forest_model_deploy = RegressionForecaster(estimator=RandomForestRegressor(), make_grain_features=False)

pipeline_deploy = AzureMLForecastPipeline([('drop_columns', column_dropper), 
                                           ('fillna_imputer', fillna_imputer),
                                           ('time_index_featurizer', time_index_featurizer),
                                           ('random_forest_estimator', random_forest_model_deploy)
                                          ])
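
To make the shape of `missing_value_imputation_dictionary` concrete, here is a minimal sketch on a toy pandas DataFrame. The column names and values are invented, and plain `fillna` stands in for the FTK imputer, so treat it as an illustration of the idea only.

```python
import pandas as pd

# Toy feature table; column names and values are invented for illustration.
features = pd.DataFrame({
    "price": [2.0, None, 4.0],
    "feat": [1.0, 0.0, None],
    "store": [2, 2, 5],
})

# Columns that contain at least one missing value.
columns_with_gaps = features.columns[features.isnull().any()].tolist()

# Per-column fill value: the mean of the observed entries.
fill_values = {c: features[c].mean() for c in columns_with_gaps}

# Replace each gap with the mean of its own column.
features_filled = features.fillna(value=fill_values)
print(fill_values)
```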

## Deployment

#### Create the required files.
We will now deploy the model as a web service: we create a Docker image containing the service logic and deploy it to [Azure Kubernetes Service (AKS)](https://azure.microsoft.com/en-us/services/kubernetes-service/). Creating the image for a forecasting model requires two files: the pickle file containing the model, and a dependencies file used to create the conda environment.

In [None]:
import pickle
with open('pipeline.pkl', 'wb') as f:
    pickle.dump(pipeline_deploy, f)
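
Before building an image it can be worth confirming that a pickled object loads back intact. The sketch below shows the round trip with a plain dictionary standing in for the pipeline and an in-memory buffer standing in for `pipeline.pkl`; both stand-ins are assumptions made so that the example runs without FTK.

```python
import io
import pickle

# Stand-in for the pipeline: any picklable Python object behaves the same way.
stand_in_pipeline = {"steps": ["drop_columns", "fillna_imputer"], "freq": "W-WED"}

# Serialize to an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
pickle.dump(stand_in_pipeline, buffer)

# Rewind and load the object back, as the scoring script will do later.
buffer.seek(0)
restored = pickle.load(buffer)

print(restored == stand_in_pipeline)
```

Pickle stores references to classes rather than their code, so the environment that loads the file must be able to import the same packages. This is why the image also needs the conda dependencies file.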

Conda dependencies file

In [None]:
%%writefile conda_dependencies.yml
################################################################################
#
# Create Azure ML Forecasting Toolkit Conda environments on Linux platforms. 
# This yml is used specifically in creating containers on ACR for use 
# in AML deployments.
#
################################################################################

name: azuremlftk_nov2018
dependencies:
  # AzureML FTK dependencies
  - pyodbc
  - statsmodels
  - pandas
  - scikit-learn==0.19.1
  - tensorflow
  - keras
  - distributed==1.23.1

  - pip:
    # AML logging
    - https://azuremldownloads.azureedge.net/history-packages/preview/azureml.primitives-1.0.11.491405-py3-none-any.whl
    - https://azuremldownloads.azureedge.net/history-packages/preview/azureml.logging-1.0.81-py3-none-any.whl
    
    #azure ml
    - azureml-sdk[automl]
    
    #Dependencies from other AML packages
    - https://azuremlftkrelease.blob.core.windows.net/azpkgdaily/azpkgcore-1.0.18309.1b1-py3-none-any.whl
    - https://azuremlftkrelease.blob.core.windows.net/azpkgdaily/azpkgsql-1.0.18309.1b1-py3-none-any.whl

    # AMLPF package  
    - https://azuremlftkrelease.blob.core.windows.net/dailyrelease/azuremlftk-0.1.18305.1a1-py3-none-any.whl

#### Run the deployment.

In [None]:
# Check core SDK version number
import azureml.core

print("SDK version:", azureml.core.VERSION)

Initialize a workspace object from persisted configuration.

The workspace is an Azure resource that holds all the models, Docker images, and services you create. It can be configured with a JSON file; an example is shown below.

In [None]:
%%writefile workspace_aks.json
{
    "subscription_id": "<subscription id>",
    "resource_group": "<resource group>",
    "workspace_name": "<workspace name>",
    "location": "<location>"
}
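
The template above must be filled in before it can be used. A small check like the following can catch fields that were left as placeholders. `find_unfilled_fields` is a hypothetical helper written for this notebook, not part of the Azure ML SDK, and the filled-in values are made up.

```python
import json

# Hypothetical helper, not part of the Azure ML SDK.
def find_unfilled_fields(config):
    """Return the keys whose values are empty or still look like '<placeholder>'."""
    unfilled = []
    for key, value in config.items():
        text = str(value).strip()
        if not text or (text.startswith("<") and text.endswith(">")):
            unfilled.append(key)
    return unfilled

# Inline copy of the template with two fields filled in with made-up values.
example = json.loads("""
{
    "subscription_id": "<subscription id>",
    "resource_group": "my-resource-group",
    "workspace_name": "my-workspace",
    "location": "<location>"
}
""")

print(find_unfilled_fields(example))
```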

If the workspace does not already exist, create it.

In [None]:
from azureml.core import Workspace
from azureml.exceptions import ProjectSystemException
ws = None
try:
    #Try to get the workspace if it exists.
    ws = Workspace.from_config("workspace_aks.json")
except ProjectSystemException:
    #If the workspace was not found, create it.
    with open("workspace_aks.json", 'r') as config:
        ws_data = json.load(config)
    ws = Workspace.create(name = ws_data["workspace_name"],
                          subscription_id = ws_data["subscription_id"],
                          resource_group = ws_data["resource_group"],
                          location = ws_data["location"])
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

#### Register Model

You can add tags and descriptions to your models. The call below registers the `pipeline.pkl` file as a model named `aksforecast` in the workspace.

In [None]:
from azureml.core.model import Model

model = Model.register(model_path = "pipeline.pkl",
                       model_name = "aksforecast",
                       workspace = ws)

Models are versioned. If you call `Model.register` several times with the same model name, you will get multiple versions of the model with increasing version numbers.

In [None]:
regression_models = Model.list(ws)
for m in regression_models:
    print("Name:", m.name,"\tVersion:", m.version, "\tDescription:", m.description, m.tags)

You can pick a specific model to deploy:

In [None]:
print(model.name, model.description, model.version, sep = '\t')
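
When several versions share a name, one simple way to choose is to take the highest version number. The sketch below shows that selection on stand-in records; `ModelRecord` and `latest_version` are hypothetical names used for illustration, and in the notebook the records would come from `Model.list(ws)`.

```python
from collections import namedtuple

# Stand-in for registered model records; only the two fields used here are modelled.
ModelRecord = namedtuple("ModelRecord", ["name", "version"])

registered = [
    ModelRecord("aksforecast", 1),
    ModelRecord("othermodel", 4),
    ModelRecord("aksforecast", 3),
    ModelRecord("aksforecast", 2),
]

def latest_version(records, name):
    """Return the record with the highest version for the given model name."""
    candidates = [r for r in records if r.name == name]
    if not candidates:
        raise LookupError("no model named {!r}".format(name))
    return max(candidates, key=lambda r: r.version)

chosen = latest_version(registered, "aksforecast")
print(chosen.name, chosen.version)
```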

### Create Docker Image

Create `score.py`. Note that `aksforecast` in the `get_model_path` call refers to the model of the same name registered in the workspace.

In [None]:
%%writefile score.py
import pickle
import json

from ftk.operationalization.score_script_helper import run_impl
from azureml.core.model import Model

def init():
    #init method will be executed once at start of the docker - load the model
    global pipeline
    #Get the model path.
    pipeline_pickle_file = Model.get_model_path("aksforecast")
    #Load the model.
    with open(pipeline_pickle_file, 'rb') as f:
        pipeline = pickle.load(f)

#Run method is executed once per call.
def run(input_data):
    #The JSON-encoded input_data will be interpreted as a TimeSeriesDataFrame and will 
    #be used for forecasting.
    #Return the JSON-encoded data frame with the forecast.
    return run_impl(input_data, pipeline=pipeline)
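
The `init`/`run` contract used by `score.py` can be tried locally with a stub. In this minimal sketch a doubling function stands in for the unpickled pipeline, and the JSON layout (`values`, `forecast`) is invented; it does not reproduce what `run_impl` does, only the load-once, call-many pattern.

```python
import json

pipeline = None   # populated by init(), read by run()
load_count = 0    # counts how often the (expensive) load happens

def init():
    """Called once when the container starts: load the model into a global."""
    global pipeline, load_count
    # Stand-in for unpickling the real pipeline: a function that doubles inputs.
    pipeline = lambda values: [2 * v for v in values]
    load_count += 1

def run(input_data):
    """Called once per request: decode JSON, predict, encode JSON."""
    values = json.loads(input_data)["values"]
    return json.dumps({"forecast": pipeline(values)})

init()
first = run(json.dumps({"values": [1, 2]}))
second = run(json.dumps({"values": [3]}))
print(first, second, load_count)
```

Keeping the load in `init` means each request pays only for the prediction, not for reading the model from disk.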

Note that the following command can take a few minutes. An image can contain multiple models.

In [None]:
from azureml.core.image import Image, ContainerImage

image_config = ContainerImage.image_configuration(runtime= "python",
                                 execution_script="score.py",
                                 conda_file="conda_dependencies.yml")

image = Image.create(name = "ftkimage1",
                     # this is the model object 
                     models = [model],
                     image_config = image_config, 
                     workspace = ws)

Monitor image creation.

In [None]:
image.wait_for_creation(show_output = True)

List the images and locate the detailed build log for debugging.

In [None]:
for i in Image.list(workspace = ws):
    print('{}(v.{} [{}]) stored at {} with build log {}'.format(i.name, i.version, i.creation_state, i.image_location, i.image_build_log_uri))

### Deploy image as web service on Azure Kubernetes Service

Create an AKS cluster using a provisioning configuration. It defines the number of nodes, their size, and the SSL certificate for a secure connection to the endpoint. Here we use the default values for these parameters.

In [None]:
from azureml.core.compute import ComputeTarget, AksCompute
# Use the default configuration (can also provide parameters to customize)
prov_config = AksCompute.provisioning_configuration()

aks_name = 'my-aks-8' 
# Create the cluster
aks_target = ComputeTarget.create(workspace = ws, 
                                  name = aks_name, 
                                  provisioning_configuration = prov_config)

Monitor the AKS creation. This may take 15-20 minutes.

In [None]:
%%time
aks_target.wait_for_completion(show_output = True)
print(aks_target.provisioning_state)
print(aks_target.provisioning_errors)

The deployment configuration defines how many resources are reserved for this container. Start the deployment on the newly created AKS cluster. Note that service creation can take a few minutes.

In [None]:
#Set the web service configuration (using default here)
from azureml.core.webservice import AksWebservice, Webservice
aks_config = AksWebservice.deploy_configuration()

aks_service = Webservice.deploy_from_image(workspace = ws, 
                                           name = aks_name,
                                           image = image,
                                           deployment_config = aks_config,
                                           deployment_target = aks_target)
aks_service.wait_for_deployment(show_output = True)
print(aks_service.state)

If there was a problem during deployment, it may be useful to analyze the deployment logs.
Even after the deployment operation completes, the container may still be in the "Creating" state. Make sure that creation is complete and the logs are displayed correctly before proceeding.

In [None]:
print(aks_service.get_logs())
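
One way to wait for the container to leave the "Creating" state is a small polling loop. The sketch below is generic: `wait_until_ready` is a hypothetical helper, the caller supplies a function that returns the current state, and the ready state name `"Healthy"` is an assumption that may differ between SDK versions.

```python
import time

# Hypothetical helper, not part of the Azure ML SDK.
def wait_until_ready(get_state, ready_states=("Healthy",), poll_seconds=10, max_polls=60):
    """Poll get_state() until it returns a ready state or the poll budget runs out."""
    state = None
    for _ in range(max_polls):
        state = get_state()
        if state in ready_states:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError("service still in state {!r}".format(state))

# Stand-in that reports "Creating" twice and then "Healthy".
states = iter(["Creating", "Creating", "Healthy"])
final_state = wait_until_ready(lambda: next(states), poll_seconds=0)
print(final_state)
```

With the real service, `get_state` would be a function that refreshes the service object and returns its state. The poll budget guards against waiting forever on a deployment that has failed.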

### Test web service

Create a validation data set to benchmark the new service.
Why send a ForecastDataFrame to the service? It carries the values of the predictor variables at future times, such as the future price.

In [None]:
imputer = TimeSeriesImputer(input_column='Quantity', 
                            option='interpolate',
                            method='linear',
                            freq='W-WED')    
train_imputed_tsdf = imputer.transform(train_tsdf)
validate_ts = train_imputed_tsdf.assign(PointForecast=0.0, DistributionForecast=np.nan)
validate_fdf = ForecastDataFrame(validate_ts, pred_point='PointForecast', pred_dist='DistributionForecast')
sc_validate = ScoreContext(input_training_data_tsdf=train_imputed_tsdf,
                           input_scoring_data_fcdf=validate_fdf, 
                           pipeline_execution_type='train_predict')

We are sending the training data set to train the pickled model:

In [None]:
train_imputed_tsdf.head()

The ForecastDataFrame for validation contains the predictor values and empty columns for the predicted values, in this case `DistributionForecast` and `PointForecast`.

In [None]:
validate_fdf.head()

ScoreContext contains both the training and the prediction (validation) data frames and helps serialize the data to the JSON format understood by the service.

Run the prediction and show the results.

In [None]:
json_direct = aks_service.run(sc_validate.to_json())
fcdf_direct = ForecastDataFrame.construct_from_json(json_direct)
fcdf_direct.head()

### Delete AKS service and resource group

This part of the notebook is optional and is intended for cleaning up after the work is complete. First, delete the service.

In [None]:
aks_service.delete()

Check whether any services remain in the workspace.

In [None]:
[svc.name for svc in Webservice.list(ws)]

Delete the resource group.<br/>
**Note:** This operation is dangerous and will delete all the contents of the resource group.
To delete the resource group, the Azure SDK package needs to be installed:
```
pip install https://azuremlftkrelease.blob.core.windows.net/azpkgdaily/azpkgamlsdk-1.0.18309.1b1-py3-none-any.whl
```

In [None]:
from azpkgamlsdk.deployment.utils_environment import delete_resource_group

delete_resource_group(ws.resource_group, ws.subscription_id)