# Microsoft Azure ML Automated Machine Learning
![NASA logo](https://www.nasa.gov/sites/all/themes/custom/nasatwo/images/nasa-logo.svg "NASA Ames")
## _NASA Predictive Maintenance Sample_ 

## Purpose and Challenge

The purpose of this notebook is for the user to build and deploy a Machine Learning (ML) application using Azure Machine Learning (AML) Service.

The challenge we will tackle is predictive maintenance: predicting when a piece of machinery will fail, so that we can fix or replace it _before_ it fails.

This notebook contains the complete code to load and prepare the data, then train and deploy the model. We chose a small public data set for this demo so that the entire process runs in only a few minutes.

The high-level steps are:

1. Acquire and Prepare Data
2. Train using automated machine learning to get the best possible model
3. Deploy the model


## Prepare the environment for training

In [None]:
import logging
import os
import random
import time

from matplotlib import pyplot as plt
from matplotlib.pyplot import imshow
import numpy as np
import pandas as pd

import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from azureml.train.automl.run import AutoMLRun
from azureml.widgets import RunDetails
from azureml.core.model import Model

Define the experiment name and read the config file

In [None]:
# Retrieve workspace
ws = Workspace.from_config()

# Choose a name for the experiment and specify the project folder.
experiment_name = 'automl-predictive-rul'
project_folder = './sample_projects/automl-demo-predmain'

experiment = Experiment(ws, experiment_name)

output = {}
output['SDK version'] = azureml.core.VERSION
output['Subscription ID'] = ws.subscription_id
output['Workspace Name'] = ws.name
output['Resource Group'] = ws.resource_group
output['Location'] = ws.location
output['Project Directory'] = project_folder
output['Experiment Name'] = experiment.name
pd.set_option('display.max_colwidth', -1)
pd.DataFrame(data = output, index = ['']).T

## 1. Acquire and Prepare Data
For this notebook, we will use the NASA Prognostics Center's Turbo-Fan Failure dataset.  It is located here: https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/#turbofan

We have it as a .txt file in the same folder and read it into a Pandas DataFrame.
Note that the space-separated .txt file contains no headers, so we assign them from the ReadMe in the zip file. In pandas, read_csv with the delimiter option also handles space-delimited files.

In [None]:
import pandas as pd

# Column names come from the ReadMe in the zip file: unit, cycle, os1-os3, sm1-sm21.
column_names = ['unit', 'cycle', 'os1', 'os2', 'os3'] + ['sm%d' % i for i in range(1, 22)]
# The raw string keeps the regex backslashes from being read as string escapes.
train = pd.read_csv("train_FD001.txt", delimiter=r"\s|\s\s", index_col=False,
                    engine='python', names=column_names)
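
To see how `read_csv` copes with a header-less, space-delimited layout, here is a minimal sketch on two made-up rows held in memory (the values are illustrative, not taken from the dataset). The shortened column list and the `sep=r"\s+"` pattern are assumptions for this example; `\s+` treats any run of whitespace as a single delimiter, which is a common alternative to the pattern used above.

```python
import io

import pandas as pd

# Two made-up rows in a space-delimited layout with trailing spaces.
# These values are illustrative only; they are not read from train_FD001.txt.
sample_text = "1 1 -0.0007 100.0 641.82  \n1 2 0.0019 100.0 642.15  \n"

# Shortened, hypothetical column list matching the five fields above.
sample_columns = ['unit', 'cycle', 'os1', 'os3', 'sm2']

sample = pd.read_csv(io.StringIO(sample_text),
                     sep=r"\s+",    # any run of whitespace is one delimiter
                     header=None,   # the text has no header row
                     names=sample_columns)

print(sample.dtypes)
print(sample)
```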

Take a quick look at the data

In [None]:
train.head(5)

Our dataset contains a number of units (engines), with each engine flight listed as a cycle. The cycles count up until the engine fails. What we would like to predict is the number of cycles remaining until failure.
So we need to calculate a new column called "Remaining Useful Life", or RUL for short. For each row, it is the unit's last cycle value minus the row's cycle value.

In [None]:
def assignrul(df):
    maxi = df['cycle'].max()
    df['rul'] = maxi - df['cycle']
    return df
    
train_new = train.groupby('unit').apply(assignrul)

train_new.columns
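
The same per-unit calculation can also be written without a helper function. The sketch below is an alternative formulation on a made-up toy frame, not the notebook's own method: `groupby(...).transform('max')` broadcasts each unit's last cycle back onto its rows, so the subtraction stays aligned with the original index.

```python
import pandas as pd

# Toy data (made up): unit 1 runs for 3 cycles, unit 2 for 2 cycles.
toy = pd.DataFrame({'unit':  [1, 1, 1, 2, 2],
                    'cycle': [1, 2, 3, 1, 2]})

# Last cycle of each unit, repeated on every row of that unit.
last_cycle = toy.groupby('unit')['cycle'].transform('max')

# Remaining useful life: cycles left until the unit's final cycle.
toy['rul'] = last_cycle - toy['cycle']

print(toy)
```

Each unit's final row ends with a `rul` of 0, which is a quick sanity check worth repeating on the real data.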

Now our dataframe has the 'rul' column. Predicting this value is the objective of this exercise.

In [None]:
train_new.head(192)

Note that some of the sensor measurements (sm3, sm4, sm14, sm17) do seem to change as we near 0 RUL. This suggests that the sensor data carries enough signal to build a model that is useful in practice.

We are now ready to train a model on this data using Automated ML.

## 2. Train using automated machine learning

Here we use Azure's AutoML package to automate the scaling and selection of sensor features, and to train and evaluate many different types of ML models.

Create training data

In [None]:
# remove the unit ID and cycle number
X_train = train_new.iloc[:,2:26].values
# extract the RUL column to be the target column
y_train = train_new.iloc[:,26:27].values.astype(int).flatten()
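
Positional slices such as `iloc[:, 2:26]` pick the wrong columns without any error if the column order ever changes. Selecting by name is one way to guard against that. The toy frame below, with a shortened and made-up column set, is only a sketch of the idea.

```python
import pandas as pd

# Toy frame with a shortened, made-up column set.
toy = pd.DataFrame({
    'unit':  [1, 1, 2],
    'cycle': [1, 2, 1],
    'os1':   [0.1, 0.2, 0.3],
    'sm1':   [518.0, 518.5, 519.0],
    'rul':   [1, 0, 0],
})

# Everything except the identifiers and the target is a feature.
non_features = ['unit', 'cycle', 'rul']
feature_columns = [c for c in toy.columns if c not in non_features]

X_toy = toy[feature_columns].values
y_toy = toy['rul'].values.astype(int)

print(feature_columns)
print(X_toy.shape, y_toy.shape)
```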

### Split data into train and test sets

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_train,
                                                    y_train,
                                                    test_size=0.3,
                                                    random_state=100)
X_train = pd.DataFrame(X_train)
X_test = pd.DataFrame(X_test)
print(X_train.shape)
print(X_test.shape)

In [None]:
X_test[0:1]

Now we are ready to configure automated ML.  We provide necessary information on: what we want to predict, what accuracy metric we want to use, how many models we want to try, and many other parameters.  AutoML will also automatically scale the data for us.

## Configure Automated ML

Configure the automated ML run. The full list of parameters is available [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train).

|Property|Description|
|-|-|
|**task**|classification or regression|
|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics: <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|
|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|
|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|
|**n_cross_validations**|Number of cross validation splits.|
|**X**|(sparse) array-like, shape = [n_samples, n_features]|
|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|
|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|
|**preprocess**|Set this to True to enable preprocessing of data, e.g. converting strings to numeric values using one-hot encoding.|
|**experiment_exit_score**|Target score for the experiment, expressed in the primary metric. For example, experiment_exit_score=0.995 ends the experiment once that score is reached.|

In [None]:
# Create Auto ML configuration
Automl_config = AutoMLConfig(task = 'regression',
                             primary_metric = 'r2_score',
                             iteration_timeout_minutes = 5,
                             iterations = 5,                             
                             blacklist_models = ['KNN','RandomForest'],
                             X = X_train,
                             y = y_train,
                             n_cross_validations = 3,
                             preprocess = False,
                             experiment_exit_score = 0.985,
                             path=project_folder)
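
The configuration above optimizes `r2_score`. As a reminder of what that metric measures, the following sketch computes the coefficient of determination by hand from its standard definition, R² = 1 - SS_res / SS_tot, on made-up numbers. The helper name `r2_by_hand` is hypothetical and is not part of the Azure ML SDK.

```python
def r2_by_hand(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after using the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up RUL values, for illustration only.
actual_rul = [3, 5, 7, 9]
predicted_rul = [2.5, 5.0, 7.5, 9.0]

score = r2_by_hand(actual_rul, predicted_rul)
print(round(score, 3))  # 0.975
```

A perfect model scores 1.0 and a model that always predicts the mean scores 0.0, which puts the 0.985 exit threshold used above into context.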

Finally, we are ready to launch AutoML. This step can take several minutes, but AutoML will report progress as models are trained and evaluated against the metric we specified above. AutoML also lets us know which scaling method was used. The information from each model training run is stored in the Experiment section of the ML Workspace, where we can review it through the Azure Portal.

In [None]:
# Submit the training job. The output will show the iterations as they finish one by one
experiment=Experiment(ws, experiment_name)
local_run = experiment.submit(Automl_config, show_output=True)

### View the training run in a graphic widget

In [None]:
RunDetails(local_run).show()

### Retrieve the best model (according to the primary metric)

In [None]:
# Find the run with the best value of the primary metric.
best_run, fitted_model = local_run.get_output()
print(best_run)
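
The held-out `X_test` and `y_test` from the split above can be used to check the best model locally before deployment. The sketch below is one way to do that, assuming only that the model exposes a scikit-learn style `predict` method. `holdout_errors` and the `ConstantModel` stand-in are hypothetical names introduced so the example runs on its own; in the notebook you would pass `fitted_model`, `X_test` and `y_test` instead.

```python
import numpy as np


def holdout_errors(model, X, y):
    """Return MAE and RMSE of the model's predictions, in cycles."""
    predictions = np.asarray(model.predict(X), dtype=float).ravel()
    errors = predictions - np.asarray(y, dtype=float).ravel()
    return {'mae': float(np.mean(np.abs(errors))),
            'rmse': float(np.sqrt(np.mean(errors ** 2)))}


class ConstantModel:
    """Hypothetical stand-in for fitted_model: always predicts one value."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return np.full(len(X), self.value)


# Made-up data: the features are ignored by the stand-in model.
X_demo = np.zeros((4, 3))
y_demo = np.array([8, 10, 12, 14])

report = holdout_errors(ConstantModel(10), X_demo, y_demo)
print(report)
```

Reporting the error in cycles complements the R² score, because it states directly how many flights off a prediction typically is.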

## 3. Deploy Model

In [None]:
# Register the best model in the workspace. The model ID shown in this cell's output is needed later.
description = 'AutoML NASA RUL Regression'
tags = None
model = local_run.register_model(description=description, tags=tags)
local_run.model_id # This model ID is substituted into score.py below

After we register the model in our ML Workspace, it should be visible in Azure Portal.

Now we want to deploy the model as a REST API that accepts a row of "X" data and returns the predicted 'RUL' value. To accomplish this, we will build a container image in our AML Workspace and deploy that image as a container instance in Azure Container Instances (ACI). We will then obtain a scoring URI where we can submit data and receive back the predicted 'RUL' value.

There are 3 things we need: 
1. A score.py file that contains the init() and run() functions with instructions on how to load the model and score with it
2. A myenv.yml file that contains information on the python environment in which the model needs to run
3. Configurations for our images and our services, using functions provided by AzureML service.

The cells below help you set these up.   You will need to use the registered model name provided by the cell above.

### Create scoring script

In [None]:
%%writefile score.py
# Scoring Script
import json
import numpy as np
import os
import pickle
from sklearn.externals import joblib
from sklearn.linear_model import LogisticRegression

from azureml.core.model import Model

import azureml.train.automl

def init():
    global model
    model_path = Model.get_model_path('<<modelid>>')
    print(model_path)
    model = joblib.load(model_path)
    

def run(raw_data):
    # grab and prepare the data
    data = (np.array(json.loads(raw_data)['data'])).reshape(1,-1)
    # make prediction
    y_hat = model.predict(data)
    return json.dumps(y_hat.tolist())
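
Before building the image, it can help to exercise the request/response contract locally. The sketch below mirrors the parsing logic of `run()` with a stand-in model, so it needs neither the workspace nor the registered model. `EchoSumModel` and `run_locally` are hypothetical names used only for this illustration.

```python
import json

import numpy as np


class EchoSumModel:
    """Hypothetical stand-in model: 'predicts' the sum of each row."""

    def predict(self, data):
        return data.sum(axis=1)


def run_locally(raw_data, model):
    # Same steps as run() in score.py: parse JSON, force one row, predict.
    data = np.array(json.loads(raw_data)['data']).reshape(1, -1)
    y_hat = model.predict(data)
    return json.dumps(y_hat.tolist())


# The request body is a JSON object with a single "data" key.
payload = json.dumps({'data': [[1.0, 2.0, 3.5]]})
response = run_locally(payload, EchoSumModel())
print(response)  # [6.5]
```

Because of the `reshape(1, -1)` call, the script as written treats every request as a single row: a payload with several rows would be flattened into one long feature vector.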

Replace the 'modelid' tag with the actual model ID

In [None]:
# Substitute the actual model id in the script file.

script_file_name = 'score.py'

with open(script_file_name, 'r') as cefr:
    content = cefr.read()

with open(script_file_name, 'w') as cefw:
    cefw.write(content.replace('<<modelid>>', local_run.model_id))

### Create the conda environment file

In [None]:
from azureml.core.conda_dependencies import CondaDependencies

myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','lightgbm'], pip_packages=['azureml-sdk[automl]'])

conda_env_file_name = 'myenv.yml'
myenv.save_to_file('.', conda_env_file_name)

In [None]:
with open("myenv.yml","r") as f:
    print(f.read())

### Create the webservice configuration

In [None]:
from azureml.core.webservice import AciWebservice

aciconfig = AciWebservice.deploy_configuration(cpu_cores=2, 
                                               memory_gb=2, 
                                               tags={"data": "RUL",  "method" : "sklearn"}, 
                                               description='Predict RUL with Azure AutoML')

### Create the container image and deploy as a webservice

Finally, configure the container image and deploy the service. Make sure the filenames match, your Workspace is in the variable ws, and your model name is correct. This step creates your container image and deploys it as a webservice.

This process can take up to 10 minutes, so please be patient. You can check the progress bar periodically ...

In [None]:
%%time
from azureml.core.webservice import Webservice
from azureml.core.image import ContainerImage

# configure the image
image_config = ContainerImage.image_configuration(execution_script="score.py", 
                                                  runtime="python", 
                                                  conda_file="myenv.yml",
                                                  tags = {'ml': "Regression", 'type': "automl"},
                                                  description = "Image for automated ML NASA predictive maintenance")

# deploy the image to a webservice
service = Webservice.deploy_from_model(workspace=ws,
                                       name='automl-rul-regression',
                                       deployment_config=aciconfig,
                                       models=[model],
                                       image_config=image_config)

service.wait_for_deployment(show_output=True)

Just as a check, we can retrieve the URI for the scoring function.

In [None]:
print(service.scoring_uri)

### Test the service

Let's check that the service is working. Here we submit a single row of data from X_test to see if it returns a reasonable prediction.

In [None]:
import requests
import json

# send the second row of the test set to score
input_data = json.dumps({"data": X_test[1:2].values.tolist()})

headers = {'Content-Type':'application/json'}

# for AKS deployment you'd need to add the service key to the header as well
# api_key = service.get_key()
# headers = {'Content-Type':'application/json',  'Authorization':('Bearer '+ api_key)} 

resp = requests.post(service.scoring_uri, input_data, headers=headers)

print("POST to url", service.scoring_uri)
print("input data:", input_data)
print("label:", y_test[1:2])
print("prediction:", resp.text)

Here we see one engine evolving through many flights, or cycles. As we approach failure, the RUL declines to zero, as does the prediction. This is a good example of how the predictive model can assist in estimating the future failure of the engine.

Note that the model does not perform well at high rul.  This is an acceptable outcome as the engine is far from failure.
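
The observation that accuracy degrades at high RUL can be quantified by grouping the absolute error into RUL bands. The sketch below does this on made-up actual/predicted pairs (they are not results from this notebook). The band edges of 50 and 100 cycles and the helper name `mae_by_rul_band` are assumptions chosen for illustration.

```python
def mae_by_rul_band(actual, predicted, edges=(50, 100)):
    """Mean absolute error for low / medium / high RUL bands."""
    low_edge, high_edge = edges
    bands = {'low': [], 'medium': [], 'high': []}
    for a, p in zip(actual, predicted):
        if a < low_edge:
            name = 'low'
        elif a < high_edge:
            name = 'medium'
        else:
            name = 'high'
        bands[name].append(abs(a - p))
    # Skip empty bands to avoid dividing by zero.
    return {name: sum(errs) / len(errs)
            for name, errs in bands.items() if errs}


# Made-up values, for illustration only.
actual_rul = [5, 20, 60, 90, 150, 200]
predicted_rul = [7, 18, 55, 100, 120, 160]

band_errors = mae_by_rul_band(actual_rul, predicted_rul)
print(band_errors)
```

If the low band shows a small error on the real test set, the model is accurate where it matters most, close to failure.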

### Delete the web service resource

To avoid runaway Azure costs, we always delete unnecessary services when we are done.

In [None]:
service.delete()

If the workspace will no longer be used, it is advisable to delete it as well.