# Step 2: Model Building & Evaluation
Using the training and test data sets we constructed in the `Code/1_data_ingestion_and_preparation.ipynb` Jupyter notebook, this notebook builds an LSTM network for the scenario described in the [Predictive Maintenance Template](https://gallery.cortanaintelligence.com/Collection/Predictive-Maintenance-Template-3) to predict failure in aircraft engines. We will store the model for deployment in an Azure web service, which we build in the `Code/3_operationalization.ipynb` Jupyter notebook.



https://github.com/Azure/azureml-examples/blob/main/sdk/python/using-mlflow/train-and-log/keras_mnist_with_mlflow.ipynb
https://github.com/Azure/azureml-examples/blob/main/sdk/python/using-mlflow/model-management/model_management.ipynb
https://github.com/Azure/azureml-examples/blob/main/sdk/python/using-mlflow/deploy/mlflow_sdk_online_endpoints.ipynb

In [2]:
#%pip install azureml  
#%pip install azure-ai-ml


In [3]:
# import the libraries
import h5py
import os
import pandas as pd
import numpy as np
import json 
import random
import string

import urllib
import glob
import pickle
import re

from sklearn.metrics import confusion_matrix, recall_score, precision_score
import keras
from keras.models import Sequential
from keras.models import model_from_json

from sklearn import datasets
from keras.layers import Dense, Dropout, LSTM, Activation

from azure.ai.ml import MLClient, Input
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

import mlflow
import mlflow.keras
from mlflow.deployments import get_deploy_client

import warnings
warnings.filterwarnings('ignore')

In [4]:
experiment_name="LSTM-PD-classifier"
mlflow.set_experiment(experiment_name=experiment_name)

2023/10/14 14:12:20 INFO mlflow.tracking.fluent: Experiment with name 'LSTM-PD-classifier' does not exist. Creating a new experiment.


<Experiment: artifact_location='', creation_time=1697292740901, experiment_id='5080febc-a68b-4204-893c-4e82eec28000', last_update_time=None, lifecycle_stage='active', name='LSTM-PD-classifier', tags={}>

## Load feature data set

We previously created the labeled data set in the `Code/1_data_ingestion_and_preparation.ipynb` Jupyter notebook and stored it in local persistent storage. We define the storage locations for both the notebook input and output here.

In [5]:

# These file names detail the data files. 
TRAIN_DATA = 'PM_train_files.pkl'
TEST_DATA = 'PM_test_files.pkl'

# We'll serialize the model in json format
LSTM_MODEL = 'modellstm.json'

# and store the weights in h5
MODEL_WEIGHTS = 'modellstm.h5'

Load the data and dump a short summary of the resulting DataFrame.

In [6]:
train_df = pd.read_pickle(TRAIN_DATA)
train_df.head(10)

Unnamed: 0,id,cycle,setting1,setting2,setting3,s1,s2,s3,s4,s5,...,s16,s17,s18,s19,s20,s21,RUL,label1,label2,cycle_norm
0,1,1,0.45977,0.166667,0.0,0.0,0.183735,0.406802,0.309757,0.0,...,0.0,0.333333,0.0,0.0,0.713178,0.724662,191,0,0,0.0
1,1,2,0.609195,0.25,0.0,0.0,0.283133,0.453019,0.352633,0.0,...,0.0,0.333333,0.0,0.0,0.666667,0.731014,190,0,0,0.00277
2,1,3,0.252874,0.75,0.0,0.0,0.343373,0.369523,0.370527,0.0,...,0.0,0.166667,0.0,0.0,0.627907,0.621375,189,0,0,0.00554
3,1,4,0.54023,0.5,0.0,0.0,0.343373,0.256159,0.331195,0.0,...,0.0,0.333333,0.0,0.0,0.573643,0.662386,188,0,0,0.00831
4,1,5,0.390805,0.333333,0.0,0.0,0.349398,0.257467,0.404625,0.0,...,0.0,0.416667,0.0,0.0,0.589147,0.704502,187,0,0,0.01108
5,1,6,0.252874,0.416667,0.0,0.0,0.268072,0.292784,0.272113,0.0,...,0.0,0.25,0.0,0.0,0.651163,0.65272,186,0,0,0.01385
6,1,7,0.557471,0.583333,0.0,0.0,0.38253,0.46392,0.261985,0.0,...,0.0,0.333333,0.0,0.0,0.744186,0.667219,185,0,0,0.01662
7,1,8,0.304598,0.75,0.0,0.0,0.406627,0.259865,0.316003,0.0,...,0.0,0.25,0.0,0.0,0.643411,0.574979,184,0,0,0.019391
8,1,9,0.545977,0.583333,0.0,0.0,0.274096,0.434707,0.21185,0.0,...,0.0,0.333333,0.0,0.0,0.705426,0.707539,183,0,0,0.022161
9,1,10,0.310345,0.583333,0.0,0.0,0.150602,0.440375,0.307394,0.0,...,0.0,0.416667,0.0,0.0,0.627907,0.794256,182,0,0,0.024931


In [7]:
test_df = pd.read_pickle(TEST_DATA)

test_df.head(10)

Unnamed: 0,id,cycle,setting1,setting2,setting3,s1,s2,s3,s4,s5,...,s16,s17,s18,s19,s20,s21,cycle_norm,RUL,label1,label2
0,1,1,0.632184,0.75,0.0,0.0,0.545181,0.310661,0.269413,0.0,...,0.0,0.333333,0.0,0.0,0.55814,0.661834,0.0,142,0,0
1,1,2,0.344828,0.25,0.0,0.0,0.150602,0.379551,0.222316,0.0,...,0.0,0.416667,0.0,0.0,0.682171,0.686827,0.00277,141,0,0
2,1,3,0.517241,0.583333,0.0,0.0,0.376506,0.346632,0.322248,0.0,...,0.0,0.416667,0.0,0.0,0.728682,0.721348,0.00554,140,0,0
3,1,4,0.741379,0.5,0.0,0.0,0.370482,0.285154,0.408001,0.0,...,0.0,0.25,0.0,0.0,0.666667,0.66211,0.00831,139,0,0
4,1,5,0.58046,0.5,0.0,0.0,0.391566,0.352082,0.332039,0.0,...,0.0,0.166667,0.0,0.0,0.658915,0.716377,0.01108,138,0,0
5,1,6,0.568966,0.75,0.0,0.0,0.271084,0.17615,0.217421,0.0,...,0.0,0.333333,0.0,0.0,0.596899,0.624827,0.01385,137,0,0
6,1,7,0.5,0.666667,0.0,0.0,0.271084,0.268149,0.38133,0.0,...,0.0,0.25,0.0,0.0,0.550388,0.691798,0.01662,136,0,0
7,1,8,0.534483,0.5,0.0,0.0,0.400602,0.214737,0.314652,0.0,...,0.0,0.416667,0.0,0.0,0.705426,0.591273,0.019391,135,0,0
8,1,9,0.293103,0.5,0.0,0.0,0.201807,0.485066,0.506921,0.0,...,0.0,0.25,0.0,0.0,0.744186,0.770367,0.022161,134,0,0
9,1,10,0.356322,0.416667,0.0,0.0,0.259036,0.309789,0.276671,0.0,...,0.0,0.25,0.0,0.0,0.565891,0.673571,0.024931,133,0,0


## Modelling

Traditional predictive maintenance machine learning models are based on feature engineering, the manual construction of variables using domain expertise and intuition. This usually makes these models hard to reuse, as the features are specific to the problem scenario and the available data may vary between customers. Perhaps the most attractive advantage of deep learning models is that they learn features automatically from the data, eliminating the need for the manual feature engineering step.

When using LSTMs in the time-series domain, one important parameter is the sequence length, the window to examine for a failure signal. This may be viewed as picking a `window_size` (e.g. 5 cycles) for calculating the rolling features in the [Predictive Maintenance Template](https://gallery.cortanaintelligence.com/Collection/Predictive-Maintenance-Template-3). The rolling features included the rolling mean and rolling standard deviation over the 5 cycles for each of the 21 sensor values. In deep learning, we allow the LSTMs to extract abstract features out of the sequence of sensor values within the window. The expectation is that patterns within these sensor values will be automatically encoded by the LSTM.

Another critical advantage of LSTMs is their ability to remember from long-term sequences (window sizes) which is hard to achieve by traditional feature engineering. Computing rolling averages over a window size of 50 cycles may lead to loss of information due to smoothing over such a long period. LSTMs are able to use larger window sizes and use all the information in the window as input. 
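
For comparison, the hand-built rolling features mentioned above can be sketched with pandas. This is only an illustration: the sensor values are made up, and the `min_periods=1` setting and the zero fill for the first standard deviation are assumptions, not necessarily what the template used.

```python
import pandas as pd

# Hypothetical readings of one sensor for a single engine (made-up values).
toy = pd.DataFrame({
    "cycle": range(1, 9),
    "s2": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})

window_size = 5  # the 5-cycle window discussed in the text

# Rolling mean and standard deviation over the last `window_size` cycles.
# min_periods=1 lets the first few cycles use a shorter window (an assumption).
toy["s2_mean"] = toy["s2"].rolling(window_size, min_periods=1).mean()
toy["s2_std"] = toy["s2"].rolling(window_size, min_periods=1).std().fillna(0.0)

print(toy)
```

Each rolling feature collapses the window into a single number per cycle, whereas the LSTM receives every value in the window.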

http://colah.github.io/posts/2015-08-Understanding-LSTMs/ contains more information on the details of LSTM networks.

This notebook illustrates the LSTM approach to binary classification using a sequence_length of 50 cycles to predict the probability of engine failure within 30 days.

We are going to use autologging capabilities in MLflow to track parameters and metrics:

In [8]:
mlflow.tensorflow.autolog()

In [9]:
# pick a large window size of 50 cycles
sequence_length = 50

We use the [Keras LSTM](https://keras.io/layers/recurrent/) with [TensorFlow](https://tensorflow.org) as a backend. Keras LSTM layers expect a 3-dimensional input array of shape (samples, time steps, features), where samples is the number of training sequences, time steps is the look-back window or sequence length, and features is the number of features in each sequence at each time step.

We define a function to generate this array, as we'll use it repeatedly.

In [10]:
# function to reshape features into (samples, time steps, features) 
def gen_sequence(id_df, seq_length, seq_cols):
    """ Only sequences that meet the window-length are considered, no padding is used. This means for testing
    we need to drop those which are below the window-length. An alternative would be to pad sequences so that
    we can use shorter ones """
    data_array = id_df[seq_cols].values
    num_elements = data_array.shape[0]
    for start, stop in zip(range(0, num_elements-seq_length), range(seq_length, num_elements)):
        yield data_array[start:stop, :]
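
To see what the generator produces, here is a small sketch that runs the same function on a toy engine with 5 cycles and a window of 3. The toy DataFrame and its column values are made up for illustration.

```python
import numpy as np
import pandas as pd

def gen_sequence(id_df, seq_length, seq_cols):
    """Same logic as the notebook function: sliding windows, no padding."""
    data_array = id_df[seq_cols].values
    num_elements = data_array.shape[0]
    for start, stop in zip(range(0, num_elements - seq_length),
                           range(seq_length, num_elements)):
        yield data_array[start:stop, :]

# Hypothetical engine with 5 cycles and 2 feature columns.
toy_engine = pd.DataFrame({"s1": np.arange(5.0), "s2": np.arange(5.0) * 10})

windows = list(gen_sequence(toy_engine, seq_length=3, seq_cols=["s1", "s2"]))
print(len(windows))        # number of windows: num_elements - seq_length
print(windows[0].shape)    # (time steps, features)
print(windows[1][:, 0])    # s1 values covered by the second window
```

With 5 rows and a window of 3, the generator yields 2 windows (rows 0-2 and rows 1-3). An engine with no more rows than the window length yields nothing.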

The sequences are built from the features (sensor and settings) values across the time steps (cycles) within each engine. 

In [11]:
# pick the feature columns 
sequence_cols = ['setting1', 'setting2', 'setting3', 'cycle_norm']
key_cols = ['id', 'cycle']
label_cols = ['label1', 'label2', 'RUL']

input_features = test_df.columns.values.tolist()
sensor_cols = [x for x in input_features if x not in set(key_cols)]
sensor_cols = [x for x in sensor_cols if x not in set(label_cols)]
sensor_cols = [x for x in sensor_cols if x not in set(sequence_cols)]

# Order the sequence columns: settings and cycle_norm first, then the sensor columns.
# This may be a silly way to get these column names, but it's relatively clear
sequence_cols.extend(sensor_cols)

print(sequence_cols)

['setting1', 'setting2', 'setting3', 'cycle_norm', 's1', 's2', 's3', 's4', 's5', 's6', 's7', 's8', 's9', 's10', 's11', 's12', 's13', 's14', 's15', 's16', 's17', 's18', 's19', 's20', 's21']


In [12]:
# generator for the sequences
seq_gen = (list(gen_sequence(train_df[train_df['id']==id], sequence_length, sequence_cols)) 
           for id in train_df['id'].unique())

# generate sequences and convert to numpy array
seq_array = np.concatenate(list(seq_gen)).astype(np.float32)
seq_array.shape

(15631, 50, 25)

We also create a function to label these sequences.

In [13]:
# function to generate labels
def gen_labels(id_df, seq_length, label):
    data_array = id_df[label].values
    num_elements = data_array.shape[0]
    return data_array[seq_length:num_elements, :]
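
The following sketch shows how the labels line up with the windows, using a made-up engine with 5 cycles and a window of 3. The toy values are assumptions for illustration only.

```python
import pandas as pd

def gen_labels(id_df, seq_length, label):
    """Same logic as the notebook function: drop the first seq_length labels."""
    data_array = id_df[label].values
    num_elements = data_array.shape[0]
    return data_array[seq_length:num_elements, :]

# Hypothetical engine: the label switches to 1 in the last two cycles.
toy_engine = pd.DataFrame({"cycle": [1, 2, 3, 4, 5],
                           "label1": [0, 0, 0, 1, 1]})

labels = gen_labels(toy_engine, seq_length=3, label=["label1"])
print(labels.shape)     # one label per window, as a column vector
print(labels.ravel())   # labels taken from rows 3 and 4
```

The window covering rows 0-2 is paired with the label of row 3, the cycle immediately after the window, so the number of labels matches the number of windows produced by `gen_sequence`.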

We will only be using the LSTM to predict failure within the next 30 days (`label1`). To predict other labels, we could change this call before building the LSTM network.

In [14]:
# generate labels
label_gen = [gen_labels(train_df[train_df['id']==id], sequence_length, ['label1']) 
             for id in train_df['id'].unique()]
label_array = np.concatenate(label_gen).astype(np.float32)
label_array.shape

(15631, 1)

## LSTM Network

Building a neural network requires choosing the network architecture. In this scenario we build a network of only 2 LSTM layers, with dropout. The first LSTM layer has 100 units and is followed by a second LSTM layer with 50 units. We apply dropout after each LSTM layer to control overfitting. The final dense output layer uses a sigmoid activation, matching the binary classification requirement.

In [15]:
# build the network
# Feature weights
nb_features = seq_array.shape[2]
nb_out = label_array.shape[1]

# LSTM model
model = Sequential()

# The first layer
model.add(LSTM(
         input_shape=(sequence_length, nb_features),
         units=100,
         return_sequences=True))

# Plus a 20% dropout rate
model.add(Dropout(0.2))

# The second layer
model.add(LSTM(
          units=50,
          return_sequences=False))

# Plus a 20% dropout rate
model.add(Dropout(0.2))

# Dense sigmoid layer
model.add(Dense(units=nb_out, activation='sigmoid'))

# Use the Adam optimizer with a binary cross-entropy loss, and track accuracy as the reported metric.
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Verify the architecture 
print(model.summary())

2023-10-14 14:12:43.215429: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:267] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected


Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm (LSTM)                 (None, 50, 100)           50400     
                                                                 
 dropout (Dropout)           (None, 50, 100)           0         
                                                                 
 lstm_1 (LSTM)               (None, 50)                30200     
                                                                 
 dropout_1 (Dropout)         (None, 50)                0         
                                                                 
 dense (Dense)               (None, 1)                 51        
                                                                 
Total params: 80,651
Trainable params: 80,651
Non-trainable params: 0
_________________________________________________________________
None
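
The parameter counts in the summary can be checked by hand. The sketch below assumes the standard LSTM layout (four gates, each with input weights, recurrent weights, and a bias); the helper names are hypothetical.

```python
def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and one bias per unit.
    return 4 * units * (input_dim + units + 1)

def dense_params(input_dim, units):
    # One weight per input plus one bias, for each output unit.
    return units * (input_dim + 1)

layer_params = [
    lstm_params(input_dim=25, units=100),   # first LSTM layer, 25 input features
    lstm_params(input_dim=100, units=50),   # second LSTM layer, fed by 100 units
    dense_params(input_dim=50, units=1),    # sigmoid output layer
]
print(layer_params, sum(layer_params))  # [50400, 30200, 51] 80651
```

These values match the `Param #` column and the total of 80,651 reported by `model.summary()`. The dropout layers have no trainable parameters.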


It takes about 15 seconds per epoch to build this model on a DS4_V2 standard [Data Science Virtual Machine for Linux (Ubuntu)](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/microsoft-ads.linux-data-science-vm-ubuntu) using only CPU compute.

As soon as the train method is executed, MLflow will start a run in Azure ML to track the experiment. However, it is a good idea to start the run manually so you have the run ID at hand. This is not required, though.

Important: When running training routines in Azure ML as jobs, you don't need to start or end the run in your training code as it is automatically done for you by Azure ML.

In [16]:
run = mlflow.start_run()

In [17]:
%%time
# fit the network
model.fit(seq_array, # Training features
          label_array, # Training labels
          epochs=20,   # Train for at most 20 epochs
          batch_size=200, # Number of sequences per gradient update
          validation_split=0.20, # Hold out 20% of the data to evaluate the loss (val_loss)
          verbose=1, # Print progress for each epoch
          callbacks = [keras.callbacks.EarlyStopping(monitor='val_loss', # Monitor the validation loss
                                                     min_delta=0,    # Any decrease counts as an improvement
                                                     patience=7,  # Stop after 7 epochs without improvement
                                                     verbose=0, 
                                                     mode='auto')]) 

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20




INFO:tensorflow:Assets written to: /tmp/tmplxss6tj8/model/data/model/assets


INFO:tensorflow:Assets written to: /tmp/tmplxss6tj8/model/data/model/assets


CPU times: user 6min 58s, sys: 53.1 s, total: 7min 52s
Wall time: 3min 11s


<keras.callbacks.History at 0x7febe83894c0>
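
Training ended after 15 of the 20 requested epochs because of the early-stopping callback. The sketch below is a simplified model of patience-based stopping, not the Keras implementation itself; the function name and the loss values are made up for illustration.

```python
def early_stop_epoch(val_losses, patience, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improvement: remember the new best loss and reset the counter.
            best = loss
            wait = 0
        else:
            # No improvement: stop once we have waited `patience` epochs.
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses: the best value is reached at epoch 2.
toy_losses = [0.50, 0.40, 0.30, 0.35, 0.36, 0.31]
print(early_stop_epoch(toy_losses, patience=3))
```

In this toy run the loss last improves at epoch 2, so with a patience of 3 training stops at epoch 5. A larger patience tolerates a noisy validation loss at the cost of extra epochs.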

We fit the network weights on the training set, so we first examine the accuracy on that same data.

In [18]:
# training metrics
scores = model.evaluate(seq_array, label_array, verbose=1, batch_size=200)
print('Training Accuracy: {}'.format(scores[1]))


Training Accuracy: 0.9806793928146362


We can examine the training set performance by looking at the model confusion matrix. Accurate predictions lie along the diagonal of the matrix, errors are on the off diagonal.

In [19]:
# make predictions and compute confusion matrix


y_true = label_array

y_pred=model.predict(seq_array) 
y_pred = np.round(y_pred)







Once done with the training, let's end the run:

Important: Remember that when training with jobs, you should not start/end runs manually.

In [20]:
mlflow.end_run()

Let's explore the parameters that got logged:

In [21]:
run = mlflow.get_run(run.info.run_id)
pd.DataFrame(data=[run.data.params], index=["Value"]).T

Unnamed: 0,Value
epochs,20
batch_size,200
validation_split,0.2
shuffle,True
class_weight,
sample_weight,
initial_epoch,0
steps_per_epoch,
validation_steps,
validation_batch_size,


Let's explore the metrics values:

In [22]:
pd.DataFrame(data=[run.data.metrics], index=["Value"]).T

Unnamed: 0,Value
loss,0.049972
val_accuracy,0.978574
accuracy,0.977367
val_loss,0.047456
stopped_epoch,14.0


Let's explore the artifacts that were logged in the run. This requires the MLflow client:

In [23]:
client = mlflow.tracking.MlflowClient()
client.list_artifacts(run_id=run.info.run_id)

[<FileInfo: file_size=-1, is_dir=True, path='model'>,
 <FileInfo: file_size=-1, is_dir=False, path='model_summary.txt'>,
 <FileInfo: file_size=-1, is_dir=True, path='tensorboard_logs'>]

As you can see in this example, three artifacts are available in the run:

- **model**: the path where the model is stored. Note that this artifact is a directory.
- **model_summary.txt**: contains a summary of the training process of the TensorFlow model. This is TensorFlow specific.
- **tensorboard_logs**: the TensorBoard logs. Note that this artifact is a directory.

You can download any artifact using `mlflow.artifacts.download_artifacts`:

In [24]:
file_path = mlflow.artifacts.download_artifacts(
    run_id=run.info.run_id, artifact_path="model_summary.txt"
)

In [25]:
with open(file_path, "r") as f:
    print(f.readlines())



**autolog** has also logged the model for us, let's try to get it back

In [26]:

classifier = mlflow.keras.load_model(f"runs:/{run.info.run_id}/model")

In [27]:
type(classifier)

keras.engine.sequential.Sequential

In [28]:
classifier.predict(seq_array).argmax(axis=-1)



array([0, 0, 0, ..., 0, 0, 0])
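
Note that the network ends in a single sigmoid unit, so each prediction row has only one column and `argmax` over the last axis always returns 0, regardless of the predicted probability. To turn such probabilities into class labels, a threshold is needed instead. A minimal sketch with made-up probabilities:

```python
import numpy as np

# Hypothetical sigmoid outputs with shape (n_samples, 1).
probs = np.array([[0.02], [0.97], [0.40], [0.61]])

# argmax over a single column can only ever pick index 0.
argmax_labels = probs.argmax(axis=-1)

# Thresholding at 0.5 gives the intended binary labels.
threshold_labels = (probs > 0.5).astype(int).ravel()

print(argmax_labels)     # [0 0 0 0]
print(threshold_labels)  # [0 1 0 1]
```

This is equivalent to the `np.round(y_pred)` step used earlier in the notebook for probabilities away from exactly 0.5.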

In [29]:
print('Training Confusion matrix\n- rows are true labels.\n- columns are predicted labels')
cm = confusion_matrix(y_true, y_pred)
cm

Training Confusion matrix
- rows are true labels.
- columns are predicted labels


array([[12298,   233],
       [   69,  3031]])

Since we have many more healthy cycles than failure cycles, we also look at precision and recall. In all cases, we assume the model threshold is at $Pr = 0.5$. In order to tune this, we need to look at a test data set. 

In [30]:
# compute precision and recall
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = 2 * (precision * recall) / (precision + recall)
print( 'Training Precision: ', precision, '\n', 'Training Recall: ', recall, '\n', 'Training F1 Score:', f1)


Training Precision:  0.9286151960784313 
 Training Recall:  0.977741935483871 
 Training F1 Score: 0.9525455688246387
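
These scores can be reproduced directly from the training confusion matrix printed earlier. The sketch relies on the scikit-learn convention that rows are true labels and columns are predicted labels, so the matrix unpacks as TN, FP, FN, TP.

```python
import numpy as np

# Training confusion matrix copied from the output above.
cm = np.array([[12298, 233],
               [69, 3031]])

# Rows are true labels, columns are predicted labels.
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)   # share of predicted failures that are real
recall = tp / (tp + fn)      # share of real failures that were predicted
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, f1)
```

The 233 false positives lower precision, while the 69 missed failures lower recall.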


## Model testing
Next, we look at the performance on the test data. Only the last cycle data for each engine id in the test data is kept for testing purposes. In order to compare the results to the template, we pick the last sequence for each id in the test data.

In [31]:
seq_array_test_last = [test_df[test_df['id']==id][sequence_cols].values[-sequence_length:] 
                       for id in test_df['id'].unique() if len(test_df[test_df['id']==id]) >= sequence_length]

seq_array_test_last = np.asarray(seq_array_test_last).astype(np.float32)
seq_array_test_last.shape

(93, 50, 25)

We also need the test set labels in the correct format.

In [32]:
y_mask = [len(test_df[test_df['id']==id]) >= sequence_length for id in test_df['id'].unique()]

label_array_test_last = test_df.groupby('id')['label1'].nth(-1)[y_mask].values
label_array_test_last = label_array_test_last.reshape(label_array_test_last.shape[0],1).astype(np.float32)
label_array_test_last.shape

print(seq_array_test_last.shape)
print(label_array_test_last.shape)

(93, 50, 25)
(93, 1)
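
The selection logic can be illustrated on a toy data set. This sketch is a variation on the cells above, not the same code: it iterates over the groups once so that the windows and labels are filtered together. The DataFrame contents are made up.

```python
import numpy as np
import pandas as pd

sequence_length = 3  # small window for the toy example

# Hypothetical test data: engine 2 has fewer cycles than the window length.
toy_test = pd.DataFrame({
    "id":     [1, 1, 1, 1, 2, 2, 3, 3, 3],
    "s1":     [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    "label1": [0, 0, 0, 1, 0, 0, 0, 0, 0],
})

# Keep only engines with at least `sequence_length` cycles.
groups = [g for _, g in toy_test.groupby("id") if len(g) >= sequence_length]

# Last window and last label for each remaining engine.
last_windows = np.asarray(
    [g[["s1"]].values[-sequence_length:] for g in groups], dtype=np.float32)
last_labels = np.asarray(
    [[g["label1"].iloc[-1]] for g in groups], dtype=np.float32)

print(last_windows.shape)  # (engines kept, time steps, features)
print(last_labels.ravel())
```

Engine 2 is dropped because it has only 2 cycles, which mirrors how the real test set shrinks to 93 engines with a window of 50.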


Now we can test the model with the test data. We report the model accuracy on the test set, and compare it to the training accuracy. By definition, the training accuracy should be optimistic since the model was optimized for those observations. The test set accuracy is more general, and simulates how the model was intended to be used to predict forward in time. This is the number we should use for reporting how the model performs.

In [33]:
# test metrics
scores_test = model.evaluate(seq_array_test_last, label_array_test_last, verbose=2)
print('Test Accuracy: {}'.format(scores_test[1]))


3/3 - 0s - loss: 0.0545 - accuracy: 0.9677 - 81ms/epoch - 27ms/step
Test Accuracy: 0.9677419066429138


Similarly for the test set confusion matrix. 

In [34]:
# make predictions and compute confusion matrix
y_pred_test = model.predict(seq_array_test_last)
y_pred_test = np.round(y_pred_test)
y_true_test = label_array_test_last
print('Confusion matrix\n- rows are true labels.\n- columns are predicted labels')
cm = confusion_matrix(y_true_test, y_pred_test)
cm

Confusion matrix
- rows are true labels.
- columns are predicted labels


array([[66,  2],
       [ 1, 24]])

The confusion matrix uses absolute counts, so comparing the test and training set confusion matrices is difficult. Instead, it is better to use precision and recall.

 * _Precision_ measures how accurately your model predicts failures: what percentage of the failure predictions are actually failures?
 * _Recall_ measures how well the model captures those failures: what percentage of the true failures did your model capture?
 
These measures are tightly coupled, and you can typically only choose to maximize one of them (by manipulating the probability threshold) and have to accept the other as is.
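
The trade-off can be seen by sweeping the probability threshold. The sketch below uses made-up labels and probabilities, and a hypothetical helper `precision_recall`; it does not use the notebook's model or data.

```python
import numpy as np

# Hypothetical true labels and predicted failure probabilities.
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])
y_prob = np.array([0.05, 0.10, 0.20, 0.45, 0.40, 0.60, 0.55, 0.70, 0.85, 0.95])

def precision_recall(y_true, y_prob, threshold):
    """Compute precision and recall for a given probability threshold."""
    y_pred = (y_prob >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

for threshold in (0.3, 0.5, 0.7):
    p, r = precision_recall(y_true, y_prob, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, raising the threshold from 0.3 to 0.7 moves precision from about 0.71 to 1.00 while recall falls from 1.00 to 0.60.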


In [35]:
# compute precision and recall
precision_test = precision_score(y_true_test, y_pred_test)
recall_test = recall_score(y_true_test, y_pred_test)
f1_test = 2 * (precision_test * recall_test) / (precision_test + recall_test)
print( 'Test Precision: ', precision_test, '\n', 'Test Recall: ', recall_test, '\n', 'Test F1 Score:', f1_test)


Test Precision:  0.9230769230769231 
 Test Recall:  0.96 
 Test F1 Score: 0.9411764705882353


## Saving the model  

The LSTM network is made up of two components: the architecture and the model weights. We'll save these components in two files: the architecture in a `json` file that the `keras` package can use to rebuild the model, and the weights in an `HDF5` hierarchy that restores the exact trained model.

In [36]:
# Save the model for operationalization: https://machinelearningmastery.com/save-load-keras-deep-learning-models/

 
# save model
# serialize model to JSON
model_json = model.to_json()
with open(LSTM_MODEL, "w") as json_file:
    json_file.write(model_json)
# serialize weights to HDF5
model.save_weights(MODEL_WEIGHTS)
print("Model saved")

Model saved


To test the save operations, we can reload the model files into a test model `loaded_model` and rescore the test dataset.

In [37]:


print(keras.__version__)

# load json and create model
with open(LSTM_MODEL, 'r') as json_file:
    loaded_model_json = json_file.read()
loaded_model = model_from_json(loaded_model_json)
# load weights into new model
loaded_model.load_weights(MODEL_WEIGHTS)

loaded_model.compile('sgd','mse')
print("Model loaded")

2.11.0
Model loaded


The model constructed from storage can be used to predict the probability of engine failure.

In [38]:
score = loaded_model.predict(seq_array,verbose=1)
print(score.shape)
print(score)

(15631, 1)
[[2.3313487e-05]
 [2.3846611e-05]
 [2.4460063e-05]
 ...
 [9.9970084e-01]
 [9.9972272e-01]
 [9.9975330e-01]]


# Persist the model



In [39]:
with open(LSTM_MODEL, 'wt') as json_file:
    json_file.write(model_json)
    print("json file written shared folder")

model.save_weights(MODEL_WEIGHTS)

json file written shared folder


# Step 3: Register and Deploy

#### Creating models from an existing run
If you have an MLflow model logged inside a run and you want to register it in a registry, you can do that by using the experiment and run ID information from the run. Let's retrieve the experiment and the run we just logged:

In [40]:
exp = mlflow.get_experiment_by_name(experiment_name)
last_run = mlflow.search_runs(exp.experiment_id, output_format="list")[-1]
print(last_run.info.run_id)

18670e88-3a7f-48e3-a07e-bc3da19bdc50


In [41]:
model_name = "mlflow-PD_LSTM-model"
artifact_path = "model"

You can now register the model from the run's artifact path:

In [42]:
mlflow.register_model(f"runs:/{last_run.info.run_id}/{artifact_path}", model_name)

Successfully registered model 'mlflow-PD_LSTM-model'.
2023/10/14 14:17:26 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: mlflow-PD_LSTM-model, version 1
Created version '1' of model 'mlflow-PD_LSTM-model'.


<ModelVersion: aliases=[], creation_timestamp=1697293046092, current_stage='None', description='', last_updated_timestamp=1697293046092, name='mlflow-PD_LSTM-model', run_id='18670e88-3a7f-48e3-a07e-bc3da19bdc50', run_link='', source='azureml://eastus2.api.azureml.ms/mlflow/v2.0/subscriptions/d83b98a9-eaa6-475f-9ae6-1ef35394a1e5/resourceGroups/RG-ML-PredMaint/providers/Microsoft.MachineLearningServices/workspaces/ML-PredMaint/experiments/5080febc-a68b-4204-893c-4e82eec28000/runs/18670e88-3a7f-48e3-a07e-bc3da19bdc50/artifacts/model', status='READY', status_message='', tags={}, user_id='', version='1'>

Online Endpoints have the concept of **Endpoint** and **Deployment**. An endpoint represents the API that customers use to consume the model, while a deployment indicates the specific implementation of that API. This distinction allows users to decouple the API from the implementation and to change the underlying implementation without affecting the consumer.

In [43]:

# Creating a unique endpoint name by including a random suffix
allowed_chars = string.ascii_lowercase + string.digits
endpoint_suffix = "".join(random.choice(allowed_chars) for x in range(5))
endpoint_name = "PD-LSTM-" + endpoint_suffix

print(f"Endpoint name: {endpoint_name}")

Endpoint name: PD-LSTM-c022t


First, let's create an MLflow deployment client for Azure Machine Learning:

In [44]:
deployment_client = get_deploy_client(mlflow.get_tracking_uri())

Let's create the endpoint with basic configuration:

In [45]:
endpoint = deployment_client.create_endpoint(endpoint_name)

We can get the scoring URI from the endpoint:

In [46]:
scoring_uri = deployment_client.get_endpoint(endpoint=endpoint_name)["properties"][
    "scoringUri"
]
print(scoring_uri)

https://pd-lstm-c022t.eastus2.inference.ml.azure.com/score


To configure the hardware requirements of your deployment, you need to create a JSON file with the desired configuration:

In [47]:
deployment_name = "default"

In [48]:
deploy_config = {
    "instance_type": "Standard_DS3_v2",
    "instance_count": 1,
}

Write the configuration to a file:

In [49]:

deployment_config_path = "deployment_config.json"
with open(deployment_config_path, "w") as outfile:
    outfile.write(json.dumps(deploy_config))

The method **create_deployment** creates a simple deployment using the settings in the configuration file. We are going to name this deployment "default". This step may take 10-20 minutes; you can also monitor it in the Azure ML portal under Endpoints.


In [50]:
version = 1

deployment = deployment_client.create_deployment(
    name=deployment_name,
    endpoint=endpoint_name,
    model_uri=f"models:/{model_name}/{version}",
    config={"deploy-config-file": deployment_config_path},
)

................................................................................................................

By default, new deployments receive none of the traffic from the endpoint. Let's assign all of it to the deployment:

In [51]:
traffic_config = {"traffic": {deployment_name: 100}}

Let's write the configuration to a file:

In [52]:
traffic_config_path = "traffic_config.json"
with open(traffic_config_path, "w") as outfile:
    outfile.write(json.dumps(traffic_config))
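
If an endpoint later hosts more than one deployment, the same `traffic` structure can describe a split between them. The sketch below is a hypothetical helper, assuming the percentages are expected to sum to 100; the `candidate` deployment name is made up and does not exist in this notebook.

```python
import json

def build_traffic_config(split):
    """Build a traffic config dict after checking the percentages sum to 100."""
    total = sum(split.values())
    if total != 100:
        raise ValueError(f"traffic percentages sum to {total}, expected 100")
    return {"traffic": dict(split)}

# Hypothetical split: most traffic stays on "default", a little goes to "candidate".
split_config = build_traffic_config({"default": 90, "candidate": 10})
print(json.dumps(split_config))
```

Validating the split before writing the file catches a mistyped percentage locally instead of during the endpoint update.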

We are going to use the key endpoint-config-file to update the configuration:

In [53]:
deployment_client.update_endpoint(
    endpoint=endpoint_name,
    config={"endpoint-config-file": traffic_config_path},
)

{'id': '/subscriptions/d83b98a9-eaa6-475f-9ae6-1ef35394a1e5/resourceGroups/rg-ml-predmaint/providers/Microsoft.MachineLearningServices/workspaces/ml-predmaint/onlineEndpoints/pd-lstm-c022t',
 'name': 'pd-lstm-c022t',
 'type': 'Microsoft.MachineLearningServices/workspaces/onlineEndpoints',
 'systemData': {'createdBy': 'Shep Sheppard',
  'createdAt': '2023-10-14T14:17:40.747507Z',
  'lastModifiedAt': '2023-10-14T14:17:40.747507Z'},
 'tags': {},
 'location': 'eastus2',
 'identity': {'principalId': 'b82e32db-6ad3-4569-bbc4-3bc247e58df8',
  'tenantId': '72f988bf-86f1-41af-91ab-2d7cd011db47',
  'type': 'SystemAssigned'},
 'kind': 'Managed',
 'properties': {'authMode': 'AMLToken',
  'properties': {'azureml.mlflow_client_endpoint': 'True',
   'azureml.onlineendpointid': '/subscriptions/d83b98a9-eaa6-475f-9ae6-1ef35394a1e5/resourcegroups/rg-ml-predmaint/providers/microsoft.machinelearningservices/workspaces/ml-predmaint/onlineendpoints/pd-lstm-c022t',
   'AzureAsyncOperationUri': 'https://manag

Once you are ready, delete the created resources:

In [None]:
#deployment_client.delete_endpoint(endpoint_name)