# Step 2: Model Building & Evaluation

For this notebook, use the Python 3.8 - AzureML kernel.


Using the training and test data sets constructed in the `Code/1_data_ingestion_and_preparation.ipynb` Jupyter notebook, this notebook builds an Azure AutoML classification model for a scenario similar to the one described in the [Predictive Maintenance Template](https://gallery.cortanaintelligence.com/Collection/Predictive-Maintenance-Template-3) to predict failure in aircraft engines. We will store the model for deployment in an Azure web service, which is covered in the last section of this notebook.


Dataset derived from:
https://data.nasa.gov/Aerospace/CMAPSS-Jet-Engine-Simulated-Data/ff5v-kuh6/data


In [None]:
import logging

from matplotlib import pyplot as plt
import pandas as pd
import os
import numpy as np


import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.core.dataset import Dataset
from azureml.train.automl import AutoMLConfig
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException




In [None]:
print("This notebook was created using version 1.47.0 of the Azure ML SDK")
print("You are currently using version", azureml.core.VERSION, "of the Azure ML SDK")

In [None]:
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'automl-Predictive-Maintenance'

experiment=Experiment(ws, experiment_name)

output = {}
output['Subscription ID'] = ws.subscription_id
output['Workspace'] = ws.name
output['Resource Group'] = ws.resource_group
output['Location'] = ws.location
output['Experiment Name'] = experiment.name
# None disables column-width truncation; the older value -1 is deprecated in pandas
pd.set_option('display.max_colwidth', None)
outputDf = pd.DataFrame(data = output, index = [''])
outputDf.T

## Create or Attach existing AmlCompute
A compute target is required to execute the Automated ML run. In this tutorial, you create AmlCompute as your training compute resource.

> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.

#### Creation of AmlCompute takes approximately 5 minutes.
If an AmlCompute with that name already exists in your workspace, this code skips the creation process.
As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.

In [None]:

# Choose a name for your CPU cluster
cpu_cluster_name = "cpu-cluster"

# Verify that cluster does not exist already
try:
    compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS12_V2',
                                                           max_nodes=2)
    compute_target = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True)

# Data

### Load Data



In [None]:

# These file names detail the data files. 
TRAIN_DATA = 'PM_train_files.pkl'
TEST_DATA = 'PM_test_files.pkl'

# LSTM model file names (architecture as JSON, weights as h5).
# They are not used by the AutoML steps in this notebook.
LSTM_MODEL = 'modellstm.json'
MODEL_WEIGHTS = 'modellstm.h5'

train_df = pd.read_pickle(TRAIN_DATA)
display(train_df.head(10))

test_df = pd.read_pickle(TEST_DATA)

#test_df.head(10)

In [None]:
y_train = train_df[["label1"]]
# Note: only RUL, label2 and id are dropped, so X_train still contains the label1 column
X_train = train_df.drop(["RUL","label2","id"],axis=1)
X_train.to_csv("PM_train_files.csv")
#y_train.head()
X_train.head()

In [None]:
y_test = test_df[["label1"]]
# Note: only RUL, label2 and id are dropped, so X_test still contains the label1 column
X_test = test_df.drop(["RUL","label2","id"],axis=1)
X_test.to_csv("PM_test_files.csv")

#y_test.head()
#X_test.head()

In [None]:
#np.savetxt('PM_train_features.csv', X_train, delimiter=',')
#np.savetxt('PM_train_labels.csv', y_train, delimiter=',')

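# Keep label1 (AutoML needs features and label in one table) and skip the
# DataFrame index so it is not uploaded as an extra, unnamed feature column
train_df.drop(["RUL","label2","id"],axis=1).to_csv('PM_train.csv', index=False)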

datastore = ws.get_default_datastore()
datastore.upload_files(files=['./PM_train.csv'],
                       target_path='PD_AutoML_Classifier/',
                       overwrite=True)

input_dataset = Dataset.Tabular.from_delimited_files(path=[(datastore, 'PD_AutoML_Classifier/PM_train.csv')])


## Train

Instantiate an `AutoMLConfig` object. This defines the settings and data used to run the experiment.

|Property|Description|
|-|-|
|**task**|classification or regression|
|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|
|**enable_early_stopping**|Stop the run if the metric score is not showing improvement.|
|**n_cross_validations**|Number of cross validation splits.|
|**training_data**|Input dataset, containing both features and label column.|
|**label_column_name**|The name of the label column.|

**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)
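
The settings in the table can be sanity-checked locally before a run is submitted. The sketch below is a hypothetical helper and not part of the Azure ML SDK: `validate_automl_settings` and `SUPPORTED_CLASSIFICATION_METRICS` are names introduced here for illustration, the metric list is copied from the table above, and the `max_nodes` comparison assumes that iterations beyond the cluster's node count are queued rather than run in parallel.

```python
# Hypothetical pre-flight check for an AutoML settings dictionary.
# Nothing here calls the Azure ML SDK; it only inspects plain Python values.

# Primary metrics listed for classification in the table above.
SUPPORTED_CLASSIFICATION_METRICS = {
    "accuracy",
    "AUC_weighted",
    "average_precision_score_weighted",
    "norm_macro_recall",
    "precision_score_weighted",
}


def validate_automl_settings(settings, max_nodes):
    """Return a list of human-readable warnings (empty if none were found)."""
    warnings = []

    metric = settings.get("primary_metric")
    if metric not in SUPPORTED_CLASSIFICATION_METRICS:
        warnings.append(
            f"primary_metric {metric!r} is not a listed classification metric"
        )

    # Assumption: k-fold cross-validation needs at least two folds.
    n_folds = settings.get("n_cross_validations")
    if n_folds is not None and n_folds < 2:
        warnings.append("n_cross_validations should be at least 2")

    # Assumption: iterations beyond the node count wait in a queue.
    concurrent = settings.get("max_concurrent_iterations", 1)
    if concurrent > max_nodes:
        warnings.append(
            f"max_concurrent_iterations={concurrent} exceeds max_nodes={max_nodes}"
        )

    return warnings


example_settings = {
    "n_cross_validations": 5,
    "primary_metric": "accuracy",
    "max_concurrent_iterations": 4,
}
for message in validate_automl_settings(example_settings, max_nodes=2):
    print("Warning:", message)
```

With 4 concurrent iterations and a 2-node cluster, the helper reports a single warning about concurrency; raising `max_nodes` or lowering `max_concurrent_iterations` clears it.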

In [None]:
automl_settings = {
    "n_cross_validations": 5,
    #"primary_metric": 'AUC_weighted',
    "primary_metric": "accuracy",
    "enable_early_stopping": True,
    "max_concurrent_iterations": 4, # Limit for testing purposes; adjust it to match the cluster size
    "experiment_timeout_hours": 1.00, # Time limit for testing purposes; remove it for real use cases, since it drastically limits the ability to find the best possible model
    "verbosity": logging.INFO,
}

automl_config = AutoMLConfig(task = 'classification',
                             debug_log = 'automl_errors.log',
                             compute_target = compute_target,
                             training_data = input_dataset,
                             label_column_name = "label1",
                             **automl_settings
                            )

In [None]:
remote_run = experiment.submit(automl_config, show_output = False)

In [None]:
remote_run.wait_for_completion(show_output=False)
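
Once the run completes, the held-out test set can be used to evaluate the selected model. The sketch below assumes predictions are already available, for example from `remote_run.get_output()` followed by a `predict` call on the test features; that step appears only as a comment because it needs the completed run. The helper name `binary_classification_report` is hypothetical, the labels are made-up illustrative values, and `label1` is assumed to be a binary 0/1 column. Precision and recall are reported alongside accuracy because accuracy alone can look good when failures are rare.

```python
# Minimal evaluation sketch in plain Python (no Azure ML SDK required).
# Assumption: label1 is binary, with 1 marking the positive (failure) class.
#
# In the notebook, predictions would come from the completed AutoML run, e.g.:
#   best_run, fitted_model = remote_run.get_output()
#   y_pred = fitted_model.predict(test_features)
# where test_features (a hypothetical name) holds the training columns minus label1.


def binary_classification_report(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only; these are not results from this experiment.
toy_true = [0, 0, 0, 0, 1, 1, 0, 1]
toy_pred = [0, 0, 1, 0, 1, 0, 0, 1]
report = binary_classification_report(toy_true, toy_pred)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```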

I will add deployment code at a later date; in the meantime, you can deploy the model from the portal if you choose to do so.

**How to deploy an AutoML model to an online endpoint**

https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-automl-endpoint?view=azureml-api-2&tabs=Studio


