# Step 2: Model Building & Evaluation
Using the training and test data sets we constructed in the `Code/1_data_ingestion_and_preparation.ipynb` Jupyter notebook, this notebook builds an LSTM network for the scenario described in the [Predictive Maintenance Template](https://gallery.cortanaintelligence.com/Collection/Predictive-Maintenance-Template-3) to predict failure in aircraft engines. We will store the model for deployment in an Azure web service, which we build in the `Code/3_operationalization.ipynb` Jupyter notebook.

In [115]:

# import the libraries
import os
import pandas as pd
import numpy as np

from azureml.core import  (Workspace,Run,VERSION,
                           Experiment,Datastore)
from azureml.core.compute import (AmlCompute, ComputeTarget)
from azureml.exceptions import ComputeTargetException

from utils import tensorize
from sklearn.metrics import confusion_matrix, recall_score, precision_score, accuracy_score
print('SDK version', VERSION)

SDK version 1.0.2


## Azure ML workspace

In [107]:
subscription_id = 'fe375bc2-9f1a-4909-ad0d-9319806d5e97'
resource_group = 'amlenv_rg'
workspace_name = 'vienna'
location = 'westeurope'

In [108]:
project_folder = os.getcwd()
exp_name = "deep_pred"

ws = Workspace(workspace_name = workspace_name,
               subscription_id = subscription_id,
               resource_group = resource_group)

ws.write_config()
print('Workspace loaded:', ws.name)

Wrote the config file config.json to: /home/sasuke/dev/amlsamples/deep_predictive_maintenance/aml_config/config.json
Workspace loaded: vienna


## Load feature data set

We previously created the labeled data set in the `Code/1_data_ingestion_and_preparation.ipynb` Jupyter notebook and stored it in the default datastore of the AML workspace.

Here we download the training and testing data sets.

In [109]:


ds = Datastore.get(ws,'workspaceblobstore')
ds.download(project_folder, overwrite=True, show_progress = True)

data_path = "data"
ds_path = ds.path(data_path)



Load the data and dump a short summary of the resulting DataFrame.

In [116]:
train_df = pd.read_csv(os.path.join(project_folder, 'preprocessed_train_file.csv'))
train_df.head(5)

Unnamed: 0,engine_id,cycle,setting1,setting2,setting3,s1,s2,s3,s4,s5,...,s16,s17,s18,s19,s20,s21,RUL,label1,label2,cycle_norm
0,1,1,0.45977,0.166667,0.0,0.0,0.183735,0.406802,0.309757,0.0,...,0.0,0.333333,0.0,0.0,0.713178,0.724662,191,0,0,0.0
1,1,2,0.609195,0.25,0.0,0.0,0.283133,0.453019,0.352633,0.0,...,0.0,0.333333,0.0,0.0,0.666667,0.731014,190,0,0,0.00277
2,1,3,0.252874,0.75,0.0,0.0,0.343373,0.369523,0.370527,0.0,...,0.0,0.166667,0.0,0.0,0.627907,0.621375,189,0,0,0.00554
3,1,4,0.54023,0.5,0.0,0.0,0.343373,0.256159,0.331195,0.0,...,0.0,0.333333,0.0,0.0,0.573643,0.662386,188,0,0,0.00831
4,1,5,0.390805,0.333333,0.0,0.0,0.349398,0.257467,0.404625,0.0,...,0.0,0.416667,0.0,0.0,0.589147,0.704502,187,0,0,0.01108


In [117]:
test_df = pd.read_csv(os.path.join(project_folder, 'preprocessed_test_file.csv'))
test_df.head(5)

Unnamed: 0,engine_id,cycle,setting1,setting2,setting3,s1,s2,s3,s4,s5,...,s16,s17,s18,s19,s20,s21,cycle_norm,RUL,label1,label2
0,1,1,0.632184,0.75,0.0,0.0,0.545181,0.310661,0.269413,0.0,...,0.0,0.333333,0.0,0.0,0.55814,0.661834,0.0,142,0,0
1,1,2,0.344828,0.25,0.0,0.0,0.150602,0.379551,0.222316,0.0,...,0.0,0.416667,0.0,0.0,0.682171,0.686827,0.00277,141,0,0
2,1,3,0.517241,0.583333,0.0,0.0,0.376506,0.346632,0.322248,0.0,...,0.0,0.416667,0.0,0.0,0.728682,0.721348,0.00554,140,0,0
3,1,4,0.741379,0.5,0.0,0.0,0.370482,0.285154,0.408001,0.0,...,0.0,0.25,0.0,0.0,0.666667,0.66211,0.00831,139,0,0
4,1,5,0.58046,0.5,0.0,0.0,0.391566,0.352082,0.332039,0.0,...,0.0,0.166667,0.0,0.0,0.658915,0.716377,0.01108,138,0,0


## Modelling

Traditional predictive maintenance machine learning models are based on feature engineering, the manual construction of variables using domain expertise and intuition. This usually makes these models hard to reuse, as the features are specific to the problem scenario and the available data may vary between customers. Perhaps the most attractive advantage of deep learning models is that they learn features automatically from the data, eliminating the need for the manual feature engineering step.

When using LSTMs in the time-series domain, one important parameter is the sequence length, the window to examine for a failure signal. This may be viewed as picking a `window_size` (e.g. 5 cycles) for calculating the rolling features in the [Predictive Maintenance Template](https://gallery.cortanaintelligence.com/Collection/Predictive-Maintenance-Template-3). The rolling features there included the rolling mean and rolling standard deviation over the 5 cycles for each of the 21 sensor values. In deep learning, we allow the LSTMs to extract abstract features out of the sequence of sensor values within the window. The expectation is that patterns within these sensor values will be automatically encoded by the LSTM.

Another critical advantage of LSTMs is their ability to retain information over long sequences (large window sizes), which is hard to achieve with traditional feature engineering. Computing rolling averages over a window size of 50 cycles may lead to loss of information due to smoothing over such a long period. LSTMs can use larger window sizes and take all the information in the window as input.
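To make the smoothing argument concrete, here is a small sketch on a made-up sensor trace (the signal and the `rolling_mean` helper are illustrative assumptions, not part of the template). A 5-cycle spike is fully visible in a 5-cycle rolling mean, but is diluted to a tenth of its height by a 50-cycle rolling mean.

```python
import numpy as np


def rolling_mean(signal, window):
    """Mean over every full window of `window` consecutive values."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="valid")


# Synthetic sensor trace (made up): flat, with a 5-cycle spike near the end.
signal = np.zeros(100)
signal[90:95] = 1.0

peak_short = rolling_mean(signal, 5).max()   # window matches the spike length
peak_long = rolling_mean(signal, 50).max()   # spike is averaged with 45 zeros

print(peak_short, peak_long)
```

An LSTM fed the raw 50-cycle window still sees the individual spike values, which is the information the long rolling average discards.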

More information on the details of LSTM networks is available at http://colah.github.io/posts/2015-08-Understanding-LSTMs/.

This notebook illustrates the LSTM approach to binary classification using a sequence_length of 50 cycles to predict the probability of engine failure within 30 days.

We use the PyTorch [`nn.LSTM`](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html) layer with `batch_first=True`. Such a layer expects a 3-dimensional input array (samples, time steps, features), where samples is the number of training sequences, time steps is the look-back window or sequence length, and features is the number of features of each sequence at each time step.

We define a function to generate this array, as we'll use it repeatedly.

In [9]:
sequence_length = 50  # look-back window, in cycles

X_train, y_train = tensorize(train_df, timestep = sequence_length, istest = False)
print(X_train.shape, y_train.shape)
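The implementation of `tensorize` lives in `utils.py` and is not shown in this notebook. As a rough illustration of the idea only, the sketch below builds a (samples, time steps, features) array with a sliding window per engine, labelling each window with the label of its last cycle. The helper `make_windows`, its arguments, and the toy data are assumptions made for this example and may differ from what `tensorize` actually does.

```python
import numpy as np


def make_windows(engine_ids, features, labels, timestep):
    """Hypothetical helper: one window per position, per engine.

    Rows are assumed to be sorted by cycle within each engine.
    """
    X, y = [], []
    for engine in np.unique(engine_ids):
        rows = np.flatnonzero(engine_ids == engine)
        # slide a window of `timestep` rows over this engine's history
        for end in range(timestep, len(rows) + 1):
            window = rows[end - timestep:end]
            X.append(features[window])
            y.append(labels[window[-1]])  # label of the window's last cycle
    return np.stack(X).astype(np.float32), np.array(y)


# Toy data: engine 1 has 6 cycles, engine 2 has 5 cycles, 2 features each.
engine_ids = np.array([1] * 6 + [2] * 5)
features = np.arange(22).reshape(11, 2)
labels = np.array([0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1])

X, y = make_windows(engine_ids, features, labels, timestep=4)
print(X.shape, y.shape)
```

With a window of 4 cycles, the first engine contributes 3 windows and the second 2, so windows never mix cycles from different engines.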

In [118]:
training_dir = './train'
os.makedirs(training_dir, exist_ok=True)

# choose a name for your cluster
cluster_name = "gpu-cluster"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                           max_nodes=2)

    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    compute_target.wait_for_completion(show_output=True)

Found existing compute target.


## LSTM Network

Building a neural network requires determining the network architecture. In this scenario we build a network of only two LSTM layers, with dropout. The first LSTM layer has `hidden_size` units (128 in the training script below) and is followed by a second LSTM layer with half as many units (64). Dropout is applied after the first LSTM layer to control overfitting. The final dense layer outputs two logits, one per class, which `nn.CrossEntropyLoss` combines with a softmax during training, matching the binary classification requirement.

Since we have many more healthy cycles than failure cycles, we also look at precision and recall. In all cases, we assume the model threshold is at $Pr = 0.5$. In order to tune this, we need to look at a test data set. 
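The network defined in `network.py` ends in a linear layer with `nb_classes=2` outputs, and `train.py` predicts with `argmax`. The sketch below, written with NumPy on made-up logits, shows how a failure probability can be recovered from two logits with a softmax, and that thresholding it at 0.5 gives the same decision as `argmax`. The function name `failure_probability` and the assumption that class 1 means "failure" are illustrative.

```python
import numpy as np


def failure_probability(logits):
    """Softmax over two logits; returns the probability of class 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    return probs[:, 1]


# Made-up network outputs for three sequences: columns are (healthy, failure).
logits = np.array([[2.0, -1.0],
                   [0.3, 0.8],
                   [-1.2, 0.1]])

p_fail = failure_probability(logits)
pred_threshold = (p_fail > 0.5).astype(int)
pred_argmax = logits.argmax(axis=1)

print(p_fail.round(3), pred_threshold, pred_argmax)
```

Moving the threshold away from 0.5 is then a matter of comparing `p_fail` against a different value instead of taking the `argmax`.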

In [119]:
%%writefile ./train/network.py

import torch
import torch.nn as nn


class Network(nn.Module):
    """Two stacked LSTM layers followed by a linear classification layer."""

    def __init__(self, input_size, hidden_size, nb_layers, dropout, nb_classes=2):
        super(Network, self).__init__()

        self.hidden_size = hidden_size
        self.nb_layers = nb_layers
        self.dropout = nn.Dropout(dropout)
        self.lstm0 = nn.LSTM(input_size, hidden_size, nb_layers, batch_first=True)
        self.lstm1 = nn.LSTM(hidden_size, hidden_size//2, nb_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size//2, nb_classes)
        self.activation = nn.ReLU()

    def forward(self, x):
        # x has shape (batch, time steps, features)
        batch_size = x.size(0)

        # Set initial hidden and cell states on the same device and dtype as
        # the input, so the module does not depend on a global `device`
        h0 = torch.zeros(self.nb_layers, batch_size, self.hidden_size,
                         device=x.device, dtype=x.dtype)
        c0 = torch.zeros(self.nb_layers, batch_size, self.hidden_size,
                         device=x.device, dtype=x.dtype)

        h1 = torch.zeros(self.nb_layers, batch_size, self.hidden_size//2,
                         device=x.device, dtype=x.dtype)
        c1 = torch.zeros(self.nb_layers, batch_size, self.hidden_size//2,
                         device=x.device, dtype=x.dtype)

        # Forward propagate the first LSTM layer
        self.lstm0.flatten_parameters()
        out, _ = self.lstm0(x, (h0, c0))
        out = self.activation(out)
        out = self.dropout(out)

        # Forward propagate the second LSTM layer
        self.lstm1.flatten_parameters()
        out, _ = self.lstm1(out, (h1, c1))
        out = self.activation(out)

        # classify from the output of the last time step
        out = self.fc(out[:, -1, :])

        return out


Overwriting ./train/network.py


In [120]:
%%writefile ./train/train.py


import argparse
import os

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.utils.data as data_utils
from azureml.core import Run
from sklearn.metrics import (recall_score,
                             precision_score,
                             accuracy_score)

from network import Network
from utils import tensorize


def train(run, dataloader, device, input_size,
          hidden_size, nb_layers, dropout, nb_classes,
          nb_epochs, learning_rate):
    '''
        Train the LSTM network and log the loss to the Azure ML run

        params:
            run: Azure ML run used for logging
            dataloader: DataLoader yielding (X, y) training batches
            device: torch device to train on

        return the trained network
    '''

    model = Network(input_size, hidden_size, nb_layers, dropout, nb_classes).to(device)

    # Loss and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

    # Train the model
    model.train()
    for epoch in range(nb_epochs):
        for i, (X, y) in enumerate(dataloader):
            # batches are already shaped (batch, time steps, features)
            X = X.to(device)
            y = y.to(device)

            # Forward pass
            y_pred = model(X)
            loss = criterion(y_pred, y)

            # Backprop
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            if (i+1) % 100 == 0:
                print('epoch [{}/{}], loss: {:.4f}'
                      .format(epoch+1, nb_epochs, loss.item()))
                run.log('loss', loss.item())

    return model


def to_tensors(df_path, istest = False):
    '''
     Loads a preprocessed csv file and converts it to sequence and label arrays

     params:
         df_path: path to csv data file
         istest: True when the testing set is being passed, defaults to False

     return X (m,50,25), y (m,)
    '''

    timestep = 50

    df = pd.read_csv(df_path)
    X, y = tensorize(df, timestep = timestep, istest = istest)
    print(X.shape, y.shape)
    return X, y


def evaluate(run, testfile_path, model, device):
    '''
        Evaluate model on testing set and log the metrics to the run

        params:
            run: Azure ML run used for logging
            testfile_path: path to testing file
            model: trained network
            device: torch device to evaluate on
    '''

    X_test, y_test = to_tensors(testfile_path, istest = True)

    X_test = torch.from_numpy(X_test).float().to(device)
    print(X_test.size())

    # disable dropout and gradient tracking while scoring
    model.eval()
    with torch.no_grad():
        y_pred = model(X_test)

    y_pred_np = np.argmax(y_pred.cpu().numpy(), axis=1)
    y_test_np = np.asarray(y_test).ravel()

    accuracy = accuracy_score(y_test_np, y_pred_np)
    precision_test = precision_score(y_test_np, y_pred_np)
    recall_test = recall_score(y_test_np, y_pred_np)

    run.log('Test Accuracy', accuracy)
    run.log('Test Precision', precision_test)
    run.log('Test Recall', recall_test)


if __name__ == '__main__':

    parser = argparse.ArgumentParser()

    parser.add_argument('--nb_epochs', type=int, default=2,
                        help='number of epochs to train')
    parser.add_argument('--learning_rate', type=float,
                        default=1e-3, help='learning rate')
    parser.add_argument('--dropout', type=float,
                        default=.2, help='drop out')
    parser.add_argument('--data_path', type=str,
                        help='path to the folder holding the data files')
    parser.add_argument('--output_dir', type=str,
                        help='output directory')

    args = parser.parse_args()
    nb_epochs = args.nb_epochs
    learning_rate = args.learning_rate
    dropout = args.dropout
    data_path = args.data_path
    output_dir = args.output_dir

    hidden_size = 128
    nb_layers = 1
    nb_classes = 2
    batch_size = 32

    os.makedirs(data_path, exist_ok = True)
    print('DATA PATH', data_path)

    training_file = os.path.join(data_path, 'preprocessed_train_file.csv')
    # the test file is assumed to sit next to the training file
    testing_file = os.path.join(data_path, 'preprocessed_test_file.csv')
    print('TRAINING FILE', training_file)
    X_train, y_train = to_tensors(training_file)
    input_size = X_train.shape[2]

    # nn.LSTM expects float inputs; nn.CrossEntropyLoss expects integer class labels
    X_train = torch.from_numpy(X_train).float()
    y_train = torch.from_numpy(y_train).long()

    dataset = data_utils.TensorDataset(X_train, y_train)
    dataloader = data_utils.DataLoader(dataset, batch_size=batch_size, shuffle=True)

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    run = Run.get_context()
    model = train(run, dataloader,
                  device, input_size,
                  hidden_size, nb_layers,
                  dropout, nb_classes,
                  nb_epochs, learning_rate)
    evaluate(run, testing_file, model, device)

    os.makedirs(output_dir, exist_ok = True)
    model_path = os.path.join(output_dir, 'network.pth')
    torch.save(model, model_path)
    run.register_model(model_name = 'network.pth', model_path = model_path)

Overwriting ./train/train.py


## Model testing
Next, we look at the performance on the test data. Only the last cycle data for each engine id in the test data is kept for testing purposes. In order to compare the results to the template, we pick the last sequence for each id in the test data.
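As an illustration of "the last sequence for each id", the sketch below keeps only the final `timestep` cycles of every engine and skips engines with a shorter history. The helper `last_windows` and the toy data are assumptions for this example; the actual selection is done by `tensorize(..., istest=True)` in `utils.py`, which is not shown here and may handle short histories differently.

```python
import numpy as np


def last_windows(engine_ids, features, timestep):
    """Hypothetical helper: the final `timestep` rows of each engine.

    Rows are assumed to be sorted by cycle within each engine.
    """
    kept, X = [], []
    for engine in np.unique(engine_ids):
        rows = np.flatnonzero(engine_ids == engine)
        if len(rows) < timestep:
            continue  # history too short to fill one window
        X.append(features[rows[-timestep:]])
        kept.append(int(engine))
    return kept, np.stack(X)


# Toy data: engines 1, 2 and 3 with 5, 3 and 4 cycles, 2 features each.
engine_ids = np.array([1] * 5 + [2] * 3 + [3] * 4)
features = np.arange(24).reshape(12, 2)

kept, X_last = last_windows(engine_ids, features, timestep=4)
print(kept, X_last.shape)
```

Each remaining engine then contributes exactly one test sample, so the test metrics are computed per engine rather than per cycle.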

In [121]:
from azureml.train.dnn import PyTorch

script_params = {
    '--nb_epochs': 2,
    '--learning_rate': 1e-3,
    '--dropout': .2,
    '--data_path': ds_path,
    '--output_dir': './outputs'
}

estimator = PyTorch(source_directory = training_dir, 
                    conda_packages = ['pandas', 'numpy', 'scikit-learn'],
                    script_params=script_params,
                    compute_target=compute_target,
                    entry_script='train.py',
                    use_gpu=True)

Now we can test the model with the test data. We report the model accuracy on the test set and compare it to the training accuracy. By definition, the training accuracy should be optimistic, since the model was optimized for those observations. The test set accuracy is more general and simulates how the model is intended to be used to predict forward in time. This is the number we should use for reporting how the model performs. The same holds for the test set confusion matrix.

In [122]:
experiment = Experiment(workspace=ws, name=exp_name)
run = experiment.submit(estimator)


In [123]:
from azureml.widgets import RunDetails
RunDetails(run).show()


The confusion matrix uses absolute counts, so comparing the test and training set confusion matrices is difficult. Instead, it is better to use precision and recall.

 * _Precision_ measures how accurately your model predicts failures: what percentage of the failure predictions are actually failures.
 * _Recall_ measures how well the model captures those failures: what percentage of the true failures your model captured.
 
These measures are tightly coupled, and you can typically only choose to maximize one of them (by manipulating the probability threshold) and have to accept the other as is.
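The trade-off can be seen on a handful of made-up predictions. The sketch below computes precision and recall from their definitions at three probability thresholds; the labels, the probabilities and the helper `precision_recall` are invented for illustration and are not results from this model.

```python
import numpy as np


def precision_recall(y_true, p_fail, threshold):
    """Precision and recall when predicting failure for p_fail >= threshold."""
    y_pred = p_fail >= threshold
    actual = y_true == 1
    tp = np.sum(y_pred & actual)    # predicted failure, was a failure
    fp = np.sum(y_pred & ~actual)   # predicted failure, was healthy
    fn = np.sum(~y_pred & actual)   # missed failure
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return float(precision), float(recall)


# Made-up true labels and predicted failure probabilities for six engines.
y_true = np.array([0, 0, 1, 1, 0, 1])
p_fail = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.9])

results = {t: precision_recall(y_true, p_fail, t) for t in (0.3, 0.5, 0.7)}
for threshold, (precision, recall) in results.items():
    print(threshold, round(precision, 3), round(recall, 3))
```

On this toy data, lowering the threshold to 0.3 catches every failure at the cost of more false alarms, while raising it to 0.7 removes the false alarms but misses one failure.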


## Saving the model  

The LSTM network is made up of two components, the architecture and the model weights. Here `torch.save` pickles the whole `Network` object, architecture and weights together, into a single `.pth` file, so reloading it requires the `Network` class definition to be importable.

In [17]:
model_name = 'lstm.pth'
model_path = os.path.join(os.getcwd(), model_name)
torch.save(network,model_path)
print("Model saved to", model_path)

Model saved to /home/sasuke/dev/amlsamples/deep_predictive_maintenance/lstm.pth



To test the save operation, we reload the model file into `the_model` and print its structure.

In [18]:
the_model= torch.load(model_path)
print(the_model)


Network(
  (dropout): Dropout(p=0.2)
  (lstm): LSTM(25, 128, batch_first=True)
  (lstm2): LSTM(128, 64, batch_first=True)
  (fc): Linear(in_features=64, out_features=2, bias=True)
  (activation): ReLU()
)


## Persist the model

In order to pass the model to our next notebook, we will write the model files to the shared folder within the Azure ML Workbench project. https://docs.microsoft.com/en-us/azure/machine-learning/preview/how-to-read-write-files

In the `Code/3_operationalization.ipynb` Jupyter notebook, we will create the functions needed to operationalize and deploy any model to get real-time predictions. The artifacts created will be stored in one of your Azure storage containers for you to deploy and test your own web service.