# Step 2: Model Building & Operationalization
Using the training and test data sets constructed in the first notebook, `1_data_ingestion_and_preparation.ipynb`, this notebook builds a model with a type of recurrent neural network called a long short-term memory (LSTM) network, for the scenario described in the [Predictive Maintenance Template](https://gallery.cortanaintelligence.com/Collection/Predictive-Maintenance-Template-3), to predict failure in aircraft engines.

Once the model is trained, we operationalize it by deploying a web service on Azure Container Instances (ACI).

In [None]:
%load_ext autoreload
%autoreload 2

import os
import json
import numpy as np
import pandas as pd
from math import exp
from common.utils import to_tensors
from sklearn.metrics import (precision_score,recall_score,f1_score)

from azureml.core import  (Workspace,Run,VERSION,
                           Experiment,Datastore)
from azureml.core.compute import (AmlCompute, ComputeTarget)
from azureml.exceptions import ComputeTargetException

from azureml.train.dnn import PyTorch
from azureml.train.hyperdrive import *
from azureml.widgets import RunDetails

from azureml.core import Environment
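from azureml.core.model import InferenceConfig, Model
# ContainerImage and Webservice are used in the deployment cells below
from azureml.core.image import ContainerImage
from azureml.core.webservice import AciWebservice, Webservice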






PROJECT_DIR = os.getcwd()
TRAINING_DIR = os.path.join(PROJECT_DIR, 'train')
SCORING_DIR = os.path.join(PROJECT_DIR, 'score')
EXPERIMENT_NAME = "deep_predictive_maintenance"
CLUSTER_NAME = "gpu-cluster"
ACI_SVC_NAME = 'predictive-maintenance-svc'

print('SDK version', VERSION)

## Azure ML workspace

In [None]:
ws = Workspace.from_config()
print('Workspace loaded:', ws.name)

## Data store

We previously created the labeled data set in the first notebook, `1_data_ingestion_and_preparation.ipynb`, and stored it in the default datastore of the Azure Machine Learning (AML) workspace.

Here, we call the datastore's `path` method, which returns a [data reference](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.data_reference.datareference?view=azure-ml-py) that is passed to the training script during run execution.

In [None]:
ds = ws.datastores['workspaceblobstore']
data_path = "data"
ds_path = ds.path(data_path)
print(ds_path)

## Compute target

Here, we provision the AML Compute cluster that will execute the training script. If a cluster with the configured name already exists in the workspace, it is reused.

In [None]:
try:
    compute_target = ComputeTarget(workspace=ws, name=CLUSTER_NAME)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                           max_nodes=6)

    # create the cluster
    compute_target = ComputeTarget.create(ws, CLUSTER_NAME, compute_config)

compute_target.wait_for_completion(show_output=True)

## Modeling

Traditional predictive maintenance models rely on feature engineering: the manual construction of variables using domain expertise and intuition. This usually makes the models hard to reuse, because the features are specific to the problem scenario and the available data may vary between customers. Perhaps the most attractive advantage of deep learning models is that they learn features from the data automatically, which removes the manual feature engineering step.

When using LSTMs in the time-series domain, one important parameter is the sequence length: the window that is examined for a failure signal. This is analogous to picking a `window_size` (e.g., 5 cycles) for calculating the rolling features in the [Predictive Maintenance Template](https://gallery.cortanaintelligence.com/Collection/Predictive-Maintenance-Template-3). There, the rolling features were the rolling mean and rolling standard deviation over 5 cycles for each of the 21 sensor values. In deep learning, we instead let the LSTM extract abstract features from the sequence of sensor values within the window. The expectation is that patterns within these sensor values will be encoded automatically by the LSTM.
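
To make the rolling-feature baseline concrete, here is a minimal sketch of a rolling mean and rolling standard deviation over a 5-cycle window. The function name `rolling_features` and the toy sensor readings are assumptions for illustration; this is not the template's code.

```python
import numpy as np

def rolling_features(values, window_size=5):
    """Return the rolling mean and standard deviation of each sensor.

    `values` has shape (n_cycles, n_sensors). Row i of each output summarizes
    cycles i .. i + window_size - 1, giving n_cycles - window_size + 1 rows.
    """
    values = np.asarray(values, dtype=float)
    n_windows = values.shape[0] - window_size + 1
    if n_windows < 1:
        raise ValueError("need at least window_size cycles")
    # Stack every window: shape (n_windows, window_size, n_sensors)
    windows = np.stack([values[i:i + window_size] for i in range(n_windows)])
    # np.std defaults to the population standard deviation (ddof=0);
    # other tools may use the sample standard deviation instead.
    return windows.mean(axis=1), windows.std(axis=1)

# Toy data: 8 cycles of 2 hypothetical sensors
readings = np.arange(16, dtype=float).reshape(8, 2)
roll_mean, roll_std = rolling_features(readings, window_size=5)
print(roll_mean.shape, roll_std.shape)
```

Each window is reduced to two numbers per sensor, which is the information loss that the LSTM approach tries to avoid.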

Another critical advantage of LSTMs is their ability to remember information across long sequences (large window sizes), which is hard to achieve with traditional feature engineering. Computing rolling averages over a window of 50 cycles may lose information because of the smoothing over such a long period. LSTMs can use larger window sizes and take all the information in the window as input.

For more details on LSTM networks, see [Understanding LSTM Networks](http://colah.github.io/posts/2015-08-Understanding-LSTMs/).

This sample illustrates the LSTM approach to binary classification, using a `sequence_length` of 50 cycles to predict the probability of engine failure within 30 days.
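
The sketch below shows one way to cut a single engine's sensor history into overlapping windows of `sequence_length` cycles, producing the 3D tensor (samples, cycles, features) that an LSTM expects. The function `make_sequences`, the labeling of each window by its last cycle, and the random toy data are assumptions; the actual preparation code in the `train` directory may differ.

```python
import numpy as np

def make_sequences(features, labels, sequence_length=50):
    """Slice one engine's history into overlapping fixed-length windows.

    features: array of shape (n_cycles, n_features)
    labels:   array of shape (n_cycles,)
    Returns X of shape (n_windows, sequence_length, n_features) and
    y of shape (n_windows,), where each window takes the label of its last cycle.
    """
    features = np.asarray(features, dtype=np.float32)
    labels = np.asarray(labels)
    n_cycles, n_features = features.shape
    if n_cycles < sequence_length:
        # Too short to form a single window
        return np.empty((0, sequence_length, n_features), dtype=np.float32), labels[:0]
    X = np.stack([features[end - sequence_length:end]
                  for end in range(sequence_length, n_cycles + 1)])
    y = labels[sequence_length - 1:]
    return X, y

# Toy engine: 60 cycles, 21 sensor values, hypothetical labels
rng = np.random.default_rng(0)
features = rng.normal(size=(60, 21))
labels = (np.arange(60) >= 30).astype(int)
X, y = make_sequences(features, labels, sequence_length=50)
print(X.shape, y.shape)  # (11, 50, 21) (11,)
```

Unlike a rolling mean, every one of the 50 cycles in a window reaches the network unchanged.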

## Implementation and hyperparameter tuning

Building a neural network requires determining the network architecture.

In this scenario we build the network with the PyTorch framework, whereas the original sample used Keras. Accordingly, we use the SDK's PyTorch [estimator](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-train-pytorch).

The search for the best hyperparameters is performed with [HyperDrive](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters).

The files in the `train` directory are used as follows:

 - Utils.py: data preparation code that reads the CSV files and transforms them into LSTM-ready 3D tensors
 - network.py: the LSTM network definition
 - train.py: training and evaluation code
 - entry.py: entry script that trains the model and saves it to storage

## PyTorch estimator

Here, we define the PyTorch estimator.

In [None]:
script_params = {
    '--epochs': 2,
    '--data_path': ds_path,
    '--output_dir': './outputs'
}

estimator = PyTorch(source_directory = TRAINING_DIR, 
                    conda_packages = ['pandas', 'numpy', 'scikit-learn'],
                    script_params=script_params,
                    compute_target=compute_target,
                    entry_script='entry.py',
                    use_gpu=True)

## Hyperparameter tuning using HyperDrive

Here, we define the HyperDrive configuration. Because an aircraft engine failure is so costly and detrimental, we tune and optimize the model for the recall metric.

For completeness, we also track precision and F1.
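
As a reminder of what these three metrics measure, here is a small self-contained sketch that computes them from true and predicted labels. The function name and the toy labels are made up for illustration. Recall is the fraction of actual failures that the model catches, which is why it is the primary metric here.

```python
def classification_metrics(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = failure)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example: every real failure is caught, at the cost of two false alarms
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0]
precision, recall, f1 = classification_metrics(y_true, y_pred)
print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.6 1.0 0.75
```

A model tuned for recall accepts more false alarms (lower precision) in exchange for missing fewer failures.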

In [None]:
param_sampling = RandomParameterSampling( {
        'learning_rate':uniform(1e-4, 1e-2),
        'l2':uniform(1e-4, 1e-3),
        'dropout':uniform(.5,.7),
        'batch_size':choice(16,32,64),
        'hidden_units':choice(4,6)
    }
)

termination_policy = BanditPolicy(slack_factor=.1, 
                                  evaluation_interval=1, 
                                  delay_evaluation=1
                                 )

hd_run_config = HyperDriveConfig(estimator=estimator,
                                 hyperparameter_sampling=param_sampling,
                                 policy=termination_policy,
                                 primary_metric_name='recall',
                                 primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                 max_total_runs=10,
                                 max_concurrent_runs=5
                                )
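
The early-termination rule configured above can be illustrated with a short sketch. As we understand the Bandit policy for a maximized metric, a run is stopped when its best metric falls below the best metric across all runs divided by `1 + slack_factor`. The function below is a simplified illustration of that rule, not the service's implementation, and the recall values are made up.

```python
def bandit_should_terminate(run_metric, best_metric, slack_factor=0.1):
    """Return True if a run falls outside the slack band around the best run.

    Assumes the primary metric is being maximized, as with recall above.
    """
    return run_metric < best_metric / (1 + slack_factor)

best_recall = 0.90  # hypothetical best recall reported so far
for recall in (0.85, 0.80):
    action = "terminate" if bandit_should_terminate(recall, best_recall) else "keep"
    print(recall, action)
```

With `slack_factor=.1`, runs need to stay within roughly 91% of the best run's recall to keep going.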

We submit the experiment for execution and render the run's progress with the widget.

In [None]:
experiment = Experiment(workspace=ws, name=EXPERIMENT_NAME)

run = experiment.submit(hd_run_config)
run

In [None]:
RunDetails(run).show()

## Model registration

With training done and the hyperparameters tuned, we register the best model found by HyperDrive, selected according to the primary metric we chose.

In [None]:
best_run = run.get_best_run_by_primary_metric()

model = best_run.register_model(model_name='deep_pdm', model_path='outputs/network.pth')
print(model.name, 'saved')

## Model operationalization


We are now ready to operationalize the model and deploy the web service. For testing purposes, we will use ACI to serve predictions.

For more details on the model deployment workflow in Azure Machine Learning service, click [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where#deployment-workflow).

The artifacts included in the image are in the `score` directory. The files are used as follows:

 - network.py: the LSTM network definition
 - score.py: scoring script that loads the model and serves predictions
 - myenv.yml: the Python libraries needed by score.py
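
Azure ML scoring scripts conventionally expose two functions: `init()`, called once when the service starts, and `run()`, called for every request. The sketch below follows that shape with a hypothetical stand-in for the trained network so that it runs without PyTorch. The real `score.py` loads the registered model instead, and its internals may differ. The `input_data`, `prediction`, and `likelihood` keys match those used later in this notebook, and treating `likelihood` as a log-probability is an assumption.

```python
import json
import math

model = None  # populated once by init()

def init():
    """Called once at service start; a real script would load the saved network here."""
    global model

    def stub_model(sequence):
        # Hypothetical stand-in for the LSTM: returns log-probabilities for
        # (no failure, failure) based on the mean of the last cycle's values.
        last_cycle = sequence[-1]
        score = sum(last_cycle) / len(last_cycle)
        log_p_failure = -math.log1p(math.exp(-score))
        log_p_healthy = -math.log1p(math.exp(score))
        return [log_p_healthy, log_p_failure]

    model = stub_model

def run(raw_data):
    """Called per request with a JSON string; returns a JSON-serializable dict."""
    try:
        batch = json.loads(raw_data)['input_data']
        log_probs = model(batch[0])
        prediction = max(range(len(log_probs)), key=log_probs.__getitem__)
        return {'prediction': prediction, 'likelihood': log_probs[prediction]}
    except Exception as err:
        # Report failures to the caller instead of crashing the service
        return {'error': str(err)}

init()
request = json.dumps({'input_data': [[[0.0, 0.0], [2.0, 4.0]]]})
print(run(request))
```

Keeping model loading in `init()` means the cost is paid once per container rather than once per request.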

## Image creation

Here, we instantiate an image configuration object and then create the image.

In [None]:
# The image configuration call requires the current directory to be where score.py and its dependencies reside

os.chdir(SCORING_DIR)
print("Switched current directory to", os.getcwd())

image_config = ContainerImage.image_configuration(execution_script = "score.py",
                                                 runtime = "python",
                                                 conda_file = "myenv.yml",
                                                 dependencies = ["network.py"],
                                                 description = "Image of predictive maintenance model",
                                                 tags = { "type": "lstm_classifier"}
                                                 )
image = ContainerImage.create(name = "dpm-image", 
                              models = [model], 
                              image_config = image_config,
                              workspace = ws
                              )
image.wait_for_creation(show_output=True)


os.chdir(PROJECT_DIR)
print("Reverted to root experiment directory")

## Web service deployment

With the image built and published to the Azure Container Registry associated with our Azure Machine Learning workspace, we proceed with deploying the web service. For testing purposes, we opt for Azure Container Instances rather than an Azure Kubernetes Service cluster.

In [None]:

aci_config = AciWebservice.deploy_configuration(cpu_cores=2, 
                                               memory_gb=2, 
                                               tags={"type":"deep predictive maintenance"}, 
                                               description='Predict equipment failure')

service = Webservice.deploy_from_image(workspace=ws,
                                       name=ACI_SVC_NAME,
                                       deployment_config=aci_config,
                                       image = image)

service.wait_for_deployment(show_output=True)

## Test Web service

Finally, we score the test data set against the web service we have just deployed and collect the predictions needed to report performance metrics.

In [None]:
testfile_path = os.path.join(PROJECT_DIR, 'data/preprocessed_test_file.csv')
X,y,engine_ids = to_tensors(testfile_path, is_test = True)

output_df = pd.DataFrame(columns = ['engine ID', 'prediction', 'likelihood'])

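# Score one engine sequence per request; the returned likelihood is exponentiated before rounding
for i, x in enumerate(X):
    output = service.run(json.dumps({'input_data': x[np.newaxis, :].tolist()}))
    output_df.loc[i] = [str(engine_ids[i]), float(output['prediction']),
                        round(exp(output['likelihood']), 2)]
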
output_df.T
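
The request built in the loop above can be examined in isolation. The sketch below assumes that `X` is a NumPy array of shape (engines, cycles, features) and that the service returns the log of the predicted class probability as `likelihood`, which is what the `exp` call suggests. The array sizes and the 0.93 value are made up.

```python
import json
from math import exp, log

import numpy as np

# Hypothetical test tensor: 3 engines, 50 cycles, 21 sensor features
X = np.zeros((3, 50, 21), dtype=np.float32)
x = X[0]

# np.newaxis adds a leading batch dimension of size 1 before serialization
payload = json.dumps({'input_data': x[np.newaxis, :].tolist()})
decoded = np.array(json.loads(payload)['input_data'])
print(decoded.shape)  # (1, 50, 21)

# If the service returns a log-probability, exp() maps it back to [0, 1]
log_likelihood = log(0.93)  # made-up value for illustration
print(round(exp(log_likelihood), 2))  # 0.93
```

Sending one sequence per request keeps the example simple; batching several sequences per call would reduce round trips if the scoring script supports it.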

## Test set performance

Lastly, we report the precision, recall, and F1 metrics on the test data.

In [None]:
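# Rows added with .loc leave the column as object dtype, so cast the labels to integers
y_hat = output_df['prediction'].astype(int)

print("Precision:", round(precision_score(y, y_hat), 2))
print("Recall:", round(recall_score(y, y_hat), 2))
print("F1:", round(f1_score(y, y_hat), 2))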

## Tear down resources

Now that we're done, we delete the ACI deployment.

In [None]:
service.delete()