# Predictive Maintenance

This notebook is part of the [*Hands-on Machine Learning for IoT*](https://github.com/pablodecm/datalab_ml_iot) tutorial by Pablo de Castro.

## Tools

This notebook will use the following Python 3
libraries for data analytics and machine learning:
- pandas
- numpy
- matplotlib
- scikit-learn
- keras/tensorflow

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Dataset

The dataset, which was published by NASA [[4](#References)],
consists of simulated turbofan engine degradation
under different combinations of operational conditions.

The main task is to predict how close an engine is to failure before the failure occurs.

<div align="center">
  <img src="images/airbus-turbofan.jpg" height="50%" style="max-width: 50%">
</div>


In [None]:
!mkdir -p data
!wget https://ti.arc.nasa.gov/c/6/ -O data/CMAPSSData.zip
!unzip -o data/CMAPSSData.zip -d data

In [None]:
# convert the readme encoding to UTF-8 and display the converted file
!iconv -f ISO-8859-1 -t UTF-8//TRANSLIT data/readme.txt -o data/readme_enc.txt
!cat data/readme_enc.txt

In [None]:
# train and test data are simple space separated values
!head -5 data/train_FD001.txt

In [None]:
# the test truth is given as the number of steps to failure
# for each engine run in the test set (100 in total)
!head -5 data/RUL_FD001.txt

In [None]:
# load data (only the FD001 dataset will be used)
train_df = pd.read_csv('data/train_FD001.txt', sep=" ", header=None)
test_df  = pd.read_csv('data/test_FD001.txt', sep=" ", header=None)
print("train shape: ", train_df.shape, "test shape: ", test_df.shape)
# let's have a look at basic descriptive statistics
train_df.describe()

In [None]:
# we will remove columns 26 and 27 because of the NaNs
train_df.drop(train_df.columns[[26, 27]], axis=1, inplace=True)
test_df.drop(test_df.columns[[26, 27]], axis=1, inplace=True)
print("train shape: ", train_df.shape, "test shape: ", test_df.shape)

In [None]:
# the files did not contain headers
# we can create them based on the documentation
target_var = ['target_RUL']
index_columns_names =  ["UnitNumber","Cycle"]
op_settings_columns = ["Op_Setting_"+str(i) for i in range(1,4)]
sensor_columns =["Sensor_"+str(i) for i in range(1,22)]
column_names = index_columns_names + op_settings_columns + sensor_columns
print(column_names)

In [None]:
# name columns
train_df.columns = column_names
test_df.columns = column_names

# now the dataset looks better, e.g. the first unit
train_df[train_df.UnitNumber == 1].head(5)

### Remaining Useful Life (RUL)

The training data consists of time series of the engine sensors,
one row per cycle (i.e. timestep), until failure, which happens
after the last timestep.

Thus, the Remaining Useful Life (RUL), i.e. time until the
engine breaks, can be calculated based on the maximum cycle
of each unit present in the training set.
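
As a small worked example of this definition, the sketch below computes the RUL on a toy table with two units. The frame `toy_df` and its values are made up for illustration, and `groupby(...).transform("max")` is used here as a compact alternative to the merge-based approach of this notebook.

```python
import pandas as pd

# toy run-to-failure data: unit 1 lasts 4 cycles, unit 2 lasts 3 cycles
toy_df = pd.DataFrame({
    "UnitNumber": [1, 1, 1, 1, 2, 2, 2],
    "Cycle":      [1, 2, 3, 4, 1, 2, 3],
})

# last observed cycle of each unit, broadcast back to every row of that unit
last_cycle = toy_df.groupby("UnitNumber")["Cycle"].transform("max")

# the RUL counts down to 0 at the last recorded cycle of each unit
toy_df["target_RUL"] = last_cycle - toy_df["Cycle"]
print(toy_df["target_RUL"].tolist())  # [3, 2, 1, 0, 2, 1, 0]
```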


In [None]:
# find the last cycle per unit number
max_cycle = train_df.groupby('UnitNumber')['Cycle'].max().reset_index()
max_cycle.columns = ['UnitNumber', 'MaxOfCycle']
# merge the max cycle back into the original frame
train_df = train_df.merge(max_cycle, left_on='UnitNumber', right_on='UnitNumber', how='inner')
# calculate RUL for each row
target_RUL = train_df["MaxOfCycle"] - train_df["Cycle"]
# add columns and remove MaxOfCycle
train_df["target_RUL"] = target_RUL
train_df = train_df.drop("MaxOfCycle", axis=1)
# check that it worked for unit 1
train_df[train_df.UnitNumber == 1].head(5)

The test data does not correspond to running until failure.
As discussed in the dataset documentation, the RUL at the last
step is instead provided in an additional file.

In [None]:
# get truth RUL
truth_df = pd.read_csv('data/RUL_FD001.txt', sep=" ", header=None)
truth_df.drop(truth_df.columns[[1]], axis=1, inplace=True)
# the file has one row per unit, in order, so UnitNumber = row index + 1
truth_df.columns = ["RUL_after_last"]
truth_df['UnitNumber'] =  truth_df.index + 1
# find the last cycle per unit number in test set
max_cycle = test_df.groupby('UnitNumber')['Cycle'].max().reset_index()
max_cycle.columns = ['UnitNumber', 'MaxOfCycle']
max_cycle['MaxOfCycle'] = max_cycle['MaxOfCycle'] + truth_df["RUL_after_last"]
# merge the max cycle back into the original frame
test_df = test_df.merge(max_cycle, left_on='UnitNumber', right_on='UnitNumber', how='inner')
# calculate RUL for each row
target_RUL = test_df["MaxOfCycle"] - test_df["Cycle"]
# add columns and remove MaxOfCycle
test_df["target_RUL"] = target_RUL
test_df = test_df.drop("MaxOfCycle", axis=1)
# check that it worked for unit 1
test_df[test_df.UnitNumber == 1].head(5)

### Defining the Problem

**The first step before starting with ML for an IoT application (or any problem for that matter)
is to understand the task at hand well.**

Predictive maintenance is about accurately predicting (based on sensor readings or performance measurements)
when a machine or an industrial setup will fail, so that costly maintenance can be scheduled
intelligently and operating costs reduced.

The most common problem definitions for predictive maintenance are: 

- Regression (**included here**): Predict the Remaining Useful Life (RUL) or Time to Failure (TTF).
- Binary classification (**included here**): Predict if an asset will fail within a certain time frame (e.g. days).
- Multi-class classification (**not included here**): Predict if an asset will fail in different time windows or due to different failure modes.
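
To make the three formulations concrete, here is a minimal sketch that derives each kind of target from a RUL column. The RUL values and the multi-class window edges (15 and 30 cycles) are hypothetical choices for illustration only.

```python
import numpy as np
import pandas as pd

# made-up RUL values for five rows
rul = pd.Series([40, 25, 15, 8, 0], name="target_RUL")

# regression target: the RUL itself
y_reg = rul

# binary target: 1 if the unit fails within the next 15 cycles
y_bin = (rul <= 15).astype(int)

# multi-class target with hypothetical windows:
#   0 -> more than 30 cycles left
#   1 -> 16 to 30 cycles left
#   2 -> 15 cycles or fewer left
window_edges = [15, 30]
y_multi = 2 - np.digitize(rul, window_edges, right=True)

print(y_bin.tolist())    # [0, 0, 1, 1, 1]
print(y_multi.tolist())  # [0, 1, 2, 2, 2]
```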


The techniques are the same as those used for general time-series
forecasting, but at application time we do not have access to the target value at the current or
previous timesteps.



### Feature Exploration

Regardless of the chosen problem, it is always recommended
to interactively explore the variables to be considered
in the predictive modelling problem.

It is important to consider not only the data available
in the training set but also the data expected in the test
or production environment. Otherwise we risk:
- **target leakage**: using information that has predictive power
as input of the model during training but will not be available
in production or for the real data.
- **domain mismatch**: if the training and test/production data
are different, the trained models might not perform well in
the real-world scenario.
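
One rough way to screen for domain mismatch is to compare, feature by feature, how far the test mean sits from the train mean in units of the train standard deviation. The sketch below uses synthetic stand-in frames (`toy_train`, `toy_test`) and a hypothetical helper `mean_shift`. It is only a first check and does not replace looking at the full distributions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# synthetic stand-ins for train and test features (not the NASA data)
toy_train = pd.DataFrame({"feat_a": rng.normal(0.0, 1.0, 2000),
                          "feat_b": rng.normal(5.0, 2.0, 2000)})
toy_test = pd.DataFrame({"feat_a": rng.normal(0.0, 1.0, 2000),
                         "feat_b": rng.normal(8.0, 2.0, 2000)})  # shifted on purpose

def mean_shift(train, test):
    """Difference of the feature means in units of the train standard deviation."""
    return (test.mean() - train.mean()) / train.std()

shift = mean_shift(toy_train, toy_test)
# feat_a should be close to 0, feat_b clearly away from 0
print(shift.round(2))
```

Features with a large absolute shift are candidates for a closer look, for example with overlaid histograms of the train and test values.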

In [None]:
# we will consider all features except the UnitNumber and the target
basic_features = train_df.columns.difference(["UnitNumber","target_RUL"])
print("basic_features: ",basic_features)

#### Exercise: Plot the Sensor Data for given Unit

For a given unit of the training set, plot the time-series of some of the sensor data.

In [None]:
# space for exercise solution

#### Exercise: Compare the Distribution of Variables in the Train and Test sets

Compare graphically the distribution of some of the
input variables (e.g. 'Cycle', 'Op_Setting_1' or sensor data) for the train and test sets.

In [None]:
# space for exercise solution

#### Exercise: Compare the Distribution of the Target in the Train and Test sets

Compare graphically the distribution of the target RUL in the train and test sets.
Will this affect the training? How could it be avoided (see reference [9](#References))?

In [None]:
# space for exercise solution

### Feature Transformations and Engineering

Another important step, particularly when dealing with
most techniques other than Deep Learning, is *feature
preprocessing/scaling* and *feature engineering*.

Feature preprocessing/scaling can facilitate model training by
scaling the features to dimensionless values based on the properties
of the dataset.

Feature engineering, the definition of new variables based
on those available, is particularly important for time-series
data when we want to
make a prediction for each timestep, as is the case for all the
problems considered in this notebook.
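
As an illustration of feature engineering for time-series data, the sketch below adds a rolling mean and a rolling standard deviation of one sensor, computed separately for each unit. The toy frame, the column `Sensor_2` values, and the window of 3 cycles are assumptions made for the example, not choices taken from this notebook.

```python
import pandas as pd

toy_df = pd.DataFrame({
    "UnitNumber": [1, 1, 1, 1, 2, 2, 2],
    "Cycle":      [1, 2, 3, 4, 1, 2, 3],
    "Sensor_2":   [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0],
})

window = 3  # hypothetical window length, in cycles

# rolling statistics are computed within each unit, so that a window
# never mixes readings from two different engines
grouped = toy_df.groupby("UnitNumber")["Sensor_2"]
toy_df["Sensor_2_mean"] = grouped.transform(
    lambda s: s.rolling(window, min_periods=1).mean())
toy_df["Sensor_2_std"] = grouped.transform(
    lambda s: s.rolling(window, min_periods=1).std())

print(toy_df)
```

The rolling window only looks at the current and earlier cycles, so these features are available at prediction time. The standard deviation of the first cycle of each unit is undefined (NaN) and would need to be filled before training.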




In [None]:
from sklearn import preprocessing

# we will use the Standard Scaler
scaler = preprocessing.StandardScaler()

features = basic_features
X_unscaled = train_df[features].astype('float64')
X = pd.DataFrame(scaler.fit_transform(X_unscaled),
                 columns = features,
                 index = train_df.index)
y = train_df["target_RUL"]

X.describe()

In [None]:
X_test_unscaled = test_df[features].astype('float64')
X_test = pd.DataFrame(scaler.transform(X_test_unscaled),
                      columns = features,
                      index = test_df.index)
y_test = test_df["target_RUL"]

X_test.describe()

### RUL Prediction as a Regression Task

**How many more cycles will an in-service engine last before it fails?**

The task of predicting the Remaining Useful Life (RUL)
can be cast as a regression task in the context of machine
learning. RUL can also be referred to as Time to Failure (TTF).


The goal in a regression problem is to find a function
$f_R(\boldsymbol{x})$ that approximates the true target $y$. The
target in this problem will be the RUL, while $\boldsymbol{x}$
will be the input features (e.g. sensor readings).

To measure the goodness of our regression function, we need
to define a score function, which will also be the loss
function for some of the machine learning techniques considered.
We will be considering the mean squared error:
$$ \textrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - f_R(\boldsymbol{x}_i))^2$$
which is one of the most common losses used for regression.
Depending on the problem, alternative loss definitions might
be more beneficial.
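
As a small worked example, the mean squared error is the average of the squared differences between targets and predictions. The sketch below computes it by hand with NumPy on made-up numbers, together with the mean absolute error that is also reported later in this notebook.

```python
import numpy as np

y_true = np.array([10.0, 20.0, 30.0])   # made-up RUL targets
y_hat = np.array([12.0, 18.0, 33.0])    # made-up predictions

errors = y_true - y_hat                 # [-2, 2, -3]
mse = np.mean(errors ** 2)              # (4 + 4 + 9) / 3
mae = np.mean(np.abs(errors))           # (2 + 2 + 3) / 3

print(mse, mae)
```

Because the errors are squared, the MSE penalizes a few large errors more strongly than the MAE does.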

#### Baseline and Feature Importance

It is sometimes useful to train a simple model for the task at hand to
get a baseline performance. A random forest has the advantage that
it can also provide a list of the relative importance of the
different features.

In [None]:
from sklearn import ensemble

rf = ensemble.RandomForestRegressor()
simple_rf = ensemble.RandomForestRegressor(n_estimators = 200, max_depth = 15)
simple_rf.fit(X, y)

In [None]:
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_pred = simple_rf.predict(X)
print("[Train] Simple RF Mean Squared Error: ", mean_squared_error(y, y_pred))
print("[Train] Simple RF Mean Absolute Error: ", mean_absolute_error(y, y_pred))
print("[Train] Simple RF r-squared: ", r2_score(y, y_pred))

In [None]:
y_test_pred = simple_rf.predict(X_test)
print("[Test] Simple RF Mean Squared Error: ", mean_squared_error(y_test, y_test_pred))
print("[Test] Simple RF Mean Absolute Error: ", mean_absolute_error(y_test, y_test_pred))
print("[Test] Simple RF r-squared: ", r2_score(y_test, y_test_pred))

In [None]:
# graph feature importance
import matplotlib.pyplot as plt
importances = simple_rf.feature_importances_
indices = np.argsort(importances)[::-1]
feature_names = X.columns    
f, ax = plt.subplots(figsize=(11, 9))
plt.title("Feature ranking", fontsize = 20)
plt.bar(range(X.shape[1]), importances[indices], color="b", align="center")
plt.xticks(range(X.shape[1]), indices) #feature_names, rotation='vertical')
plt.xlim([-1, X.shape[1]])
plt.ylabel("importance", fontsize = 18)
plt.xlabel("index of the feature", fontsize = 18)
plt.show()
# list feature importance
important_features = pd.Series(data=simple_rf.feature_importances_,index=X.columns)
important_features.sort_values(ascending=False,inplace=True)
print(important_features.head(10))

#### Cross Validation and Hyper-Parameters

By means of cross validation and hyper-parameter search, we
can try to obtain a better regression model based on RandomForest.
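
Since the rows of one engine are strongly correlated, the folds should be built so that a unit never appears in both the training and the validation part. The toy sketch below (made-up arrays, 4 units with 3 cycles each) shows how `GroupKFold` from scikit-learn achieves this.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# toy data: 12 rows belonging to 4 units (3 cycles each)
groups = np.repeat([1, 2, 3, 4], 3)
X_toy = np.arange(12).reshape(-1, 1)
y_toy = np.arange(12)

cv_toy = GroupKFold(n_splits=2)
for fold, (train_idx, valid_idx) in enumerate(cv_toy.split(X_toy, y_toy, groups=groups)):
    train_units = set(groups[train_idx].tolist())
    valid_units = set(groups[valid_idx].tolist())
    # no unit contributes rows to both sides of the split
    assert train_units.isdisjoint(valid_units)
    print(fold, sorted(train_units), sorted(valid_units))
```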

In [None]:
from sklearn.model_selection import GroupKFold, GridSearchCV

rf = ensemble.RandomForestRegressor(n_estimators=100)

# to avoid having same UnitNumber in both sets
cv = GroupKFold(5)

param_grid = { "min_samples_leaf" : [2, 10, 25, 50, 100],
               "max_depth" : [7, 8, 9, 10, 11, 12]}

optimized_rf = GridSearchCV(estimator=rf,
                            cv = cv,
                            param_grid=param_grid,
                            scoring='neg_mean_squared_error',
                            verbose = 1,
                            n_jobs = -1)

optimized_rf.fit(X, y, groups = train_df.UnitNumber)

In [None]:
y_test_pred = optimized_rf.predict(X_test)
print("[Test] Optimized RF Mean Squared Error: ", mean_squared_error(y_test, y_test_pred))
print("[Test] Optimized RF Mean Absolute Error: ", mean_absolute_error(y_test, y_test_pred))
print("[Test] Optimized RF r-squared: ", r2_score(y_test, y_test_pred))

#### Other Models: Gradient Boosting

Gradient Boosted regression and classification are some of the
best performing models for a variety of tasks. Let us see if they
can also be applied to this dataset.



In [None]:
from sklearn.model_selection import GroupKFold, GridSearchCV
from sklearn.ensemble import GradientBoostingRegressor

gb = GradientBoostingRegressor()

# to avoid having same UnitNumber in both sets
cv = GroupKFold(5)

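# note: "alpha" only affects the 'huber' and 'quantile' losses,
# it has no effect with the default squared-error loss used here
param_grid = { "alpha" : [.75, .9],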
               "n_estimators" : [500],
               "learning_rate" :  [.01],
                "max_depth" : [4, 5, 6]
             }

optimized_gb= GridSearchCV(estimator=gb,
                            cv = cv,
                            param_grid=param_grid,
                            scoring='neg_mean_squared_error',
                            verbose = 1,
                            n_jobs = -1)

optimized_gb.fit(X, y, groups = train_df.UnitNumber)

In [None]:
y_test_pred = optimized_gb.predict(X_test)
print("[Test] Optimized GB Mean Squared Error: ", mean_squared_error(y_test, y_test_pred))
print("[Test] Optimized GB Mean Absolute Error: ", mean_absolute_error(y_test, y_test_pred))
print("[Test] Optimized GB r-squared: ", r2_score(y_test, y_test_pred))

#### Exercise: Train Another Cross Validated Model

Based on the previous examples, use cross validation to find a good
set of hyper-parameters for another scikit-learn regression model and benchmark it on the test dataset ([see documentation](https://scikit-learn.org/stable/supervised_learning.html#supervised-learning)). For example, choose between:
1. `sklearn.svm.SVR`: [Epsilon-Support Vector Regression](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html#sklearn.svm.SVR)
2. `sklearn.neural_network.MLPRegressor` : [Multilayer Perceptron](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor)
3. **Advanced Track**: use `GradientBoostingRegressor` but add some engineered features to try to improve the previous result (e.g. moving averages and standard deviations of features)

In [None]:
# space for the exercise solution

### Recurrent Neural Network Model for Regression

All the previous machine learning models can only take tabular data, i.e.
features at a given timestep $t$. Information about the previous time
steps can only be included by means of clever feature engineering.
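
As a sketch of how information from the previous timestep can be fed to a tabular model, the example below adds a lagged copy of a sensor and its change since the previous cycle. The toy frame and the new column names `Sensor_2_lag1` and `Sensor_2_diff1` are hypothetical.

```python
import pandas as pd

toy_df = pd.DataFrame({
    "UnitNumber": [1, 1, 1, 2, 2, 2],
    "Cycle":      [1, 2, 3, 1, 2, 3],
    "Sensor_2":   [1.0, 2.0, 4.0, 10.0, 20.0, 40.0],
})

# value of the sensor one cycle earlier, computed per unit;
# the first cycle of each unit has no previous value and becomes NaN
toy_df["Sensor_2_lag1"] = toy_df.groupby("UnitNumber")["Sensor_2"].shift(1)

# change with respect to the previous cycle
toy_df["Sensor_2_diff1"] = toy_df["Sensor_2"] - toy_df["Sensor_2_lag1"]

print(toy_df)
```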

Deep Neural Networks, in particular Recurrent architectures such
as LSTM [[6,7]](#References), can be used
to automatically learn feature transformation on sequence data
for a given task.

Here, a basic example of training an LSTM-based model for
predicting the RUL is provided, based on
the Keras-based implementation from [[2]](#References).

In [None]:
from tensorflow import keras
from tensorflow.keras.models import Sequential,load_model
from tensorflow.keras.layers import Dense, Dropout, LSTM, Activation

# define path to save model
model_path = 'regression_model.h5'

In [None]:
# pick a large window size of 50 cycles
seq_length = 50

# generator to reshape features into (samples, time steps, features) 
def gen_sequence(unit_number_df, seq_length = seq_length, seq_cols = features):
    """ Only sequences that meet the window-length are considered, no padding is used. This means for testing
    we need to drop those which are below the window-length. An alternative would be to pad sequences so that
    we can use shorter ones """
    # for one unit number I put all the rows in a single matrix
    data_matrix = unit_number_df[seq_cols].values
    num_elements = data_matrix.shape[0]
    # Iterate over the start and stop indices of each window in parallel.
    # For example, UnitNumber 1 has 192 rows and seq_length is equal to 50,
    # so zip iterates over the following pairs of numbers:
    # 0 50 -> rows 0 to 49 (the stop index is excluded by slicing)
    # 1 51 -> rows 1 to 50
    # 2 52 -> rows 2 to 51
    # ...
    # 141 191 -> rows 141 to 190
    for start, stop in zip(range(0, num_elements-seq_length), range(seq_length, num_elements)):
        yield data_matrix[start:stop, :]

# test, it has to be 142 (192-seq_length)
val=list(gen_sequence(X[train_df['UnitNumber']==1]))
print(len(val))


In [None]:
# generator for the sequences
# transform each UnitNumber
# of the train dataset in a sequence
seq_gen = (list(gen_sequence(X[train_df['UnitNumber']== un])) 
           for un in train_df['UnitNumber'].unique())

# concatenate the sequences of all units into a single numpy array
seq_array = np.concatenate(list(seq_gen)).astype(np.float32)
print(seq_array.shape)

In [None]:
# function to generate labels
def gen_labels(unit_number_df, seq_length = seq_length,
               label = ["target_RUL"]):
    """ Only sequences that meet the window-length are considered, no padding is used. This means for testing
    we need to drop those which are below the window-length. An alternative would be to pad sequences so that
    we can use shorter ones """
    # For one id I put all the labels in a single matrix.
    # For example:
    # [[1]
    # [4]
    # [1]
    # [5]
    # [9]
    # ...
    # [200]] 
    data_matrix = unit_number_df[label].values
    num_elements = data_matrix.shape[0]
    # The first seq_length labels are removed: the first window of a unit
    # covers rows 0 to seq_length-1 and its target is the label at row seq_length
    # (the previous ones are discarded).
    # Each following window is associated, step by step, with the next label.
    return data_matrix[seq_length:num_elements, :]

# generate labels
label_gen = [gen_labels(train_df[train_df['UnitNumber'] == un]) 
             for un in train_df['UnitNumber'].unique()]

label_array = np.concatenate(label_gen).astype(np.float32)
label_array.shape

In [None]:
# Next, we build a deep network. 
# The first layer is an LSTM layer with 100 units followed by another LSTM layer with 50 units. 
# Dropout is also applied after each LSTM layer to control overfitting. 
# Final layer is a Dense output layer with single unit and linear activation since this is a regression problem.
nb_features = seq_array.shape[2]
nb_out = label_array.shape[1]

model = Sequential()
model.add(LSTM(
         input_shape=(seq_length, nb_features),
         units=100,
         return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(
          units=50,
          return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(units=nb_out))
model.add(Activation("linear"))
model.compile(loss='mean_squared_error', optimizer='rmsprop',metrics=['mae'])

print(model.summary())

In [None]:
# fit the network
history = model.fit(seq_array, label_array, epochs=100, batch_size=200, validation_split=0.05, verbose=2,
          callbacks = [keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=0, mode='min'),
                       keras.callbacks.ModelCheckpoint(model_path,monitor='val_loss', save_best_only=True, mode='min', verbose=0)]
)

In [None]:
fig, axs = plt.subplots(2,1, figsize=(10,10))

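# the metric key is 'mae' in recent Keras versions and 'mean_absolute_error' in older ones
mae_key = 'mae' if 'mae' in history.history else 'mean_absolute_error'
axs[0].plot(history.history[mae_key])
axs[0].plot(history.history['val_' + mae_key])
axs[0].set_title('model MAE')
axs[0].legend(['train', 'validation'], loc='upper right')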
axs[0].set_xlabel('epoch')


axs[1].plot(history.history['loss'])
axs[1].plot(history.history['val_loss'])
axs[1].set_title('model MSE (loss)')
axs[1].legend(['train', 'validation'], loc='upper right')
axs[1].set_xlabel('epoch')

y_pred = model.predict(seq_array,verbose=1, batch_size=200)
y = label_array
print("[Train] LSTM Regression Mean Squared Error: ", mean_squared_error(y, y_pred))
print("[Train] LSTM Regression Mean Absolute Error: ", mean_absolute_error(y, y_pred))
print("[Train] LSTM Regression r-squared: ", r2_score(y, y_pred))

#### Test Set Evaluation

The real-world performance is better evaluated on data
that has not been used for training.
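
The evaluation in this section keeps only the test units with at least `seq_length` cycles. The alternative mentioned in the docstrings of the sequence helpers, padding shorter sequences, could look like the sketch below. The helper `last_window` and the pad value of 0 are assumptions for illustration. A network trained only on full windows has never seen padded inputs, so in practice padding would also have to be used during training, possibly together with a Keras `Masking` layer.

```python
import numpy as np

def last_window(unit_values, seq_length, pad_value=0.0):
    """Return the last seq_length rows, pre-padding with pad_value if the unit is shorter."""
    unit_values = np.asarray(unit_values, dtype=np.float32)
    window = unit_values[-seq_length:]
    n_missing = seq_length - window.shape[0]
    if n_missing > 0:
        # put the padding before the real readings, so that the most
        # recent cycles stay at the end of the window
        padding = np.full((n_missing, window.shape[1]), pad_value, dtype=np.float32)
        window = np.concatenate([padding, window], axis=0)
    return window

short_unit = np.ones((3, 2))   # 3 cycles, 2 features
long_unit = np.ones((7, 2))    # 7 cycles, 2 features
print(last_window(short_unit, seq_length=5).shape)  # (5, 2)
print(last_window(long_unit, seq_length=5).shape)   # (5, 2)
```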

In [None]:
# pick the last sequence for each id in the test data
sequence_cols = features
seq_array_test_last = [X_test[test_df['UnitNumber']==un][sequence_cols].values[-seq_length:] 
                       for un in test_df['UnitNumber'].unique() if len(test_df[test_df['UnitNumber']== un]) >= seq_length]

seq_array_test_last = np.asarray(seq_array_test_last).astype(np.float32)

# similarly, we pick the labels
y_mask = [len(test_df[test_df['UnitNumber']==un]) >= seq_length for un in test_df['UnitNumber'].unique()]
label_array_test_last = test_df.groupby('UnitNumber')['target_RUL'].nth(-1)[y_mask].values
label_array_test_last = label_array_test_last.reshape(label_array_test_last.shape[0],1).astype(np.float32)

print(seq_array_test_last.shape)
print(label_array_test_last.shape)

In [None]:
import os
# if best iteration's model was saved then load and use it
if os.path.isfile(model_path):
    estimator = load_model(model_path)

    y_pred_test = estimator.predict(seq_array_test_last, batch_size=200)
    y_true_test = label_array_test_last

In [None]:
print("[Test] LSTM Regression Mean Squared Error: ", mean_squared_error(y_true_test, y_pred_test))
print("[Test] LSTM Regression Mean Absolute Error: ", mean_absolute_error(y_true_test, y_pred_test))
print("[Test] LSTM Regression r-squared: ", r2_score(y_true_test, y_pred_test))

In [None]:
# Plot in blue color the predicted data and in green color the
# actual data to verify visually the accuracy of the model.
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(y_pred_test, color="blue")
ax.plot(y_true_test, color="green")
ax.set_title('prediction on unseen data')
ax.set_ylabel('RUL value')
ax.set_xlabel('row')
ax.legend(['predicted', 'actual data'], loc='lower left')

### Predict Failures using Classification

**Will the unit fail within a certain time-frame (i.e. number of cycles)?**

This can be thought of as a classification problem,
where the boolean target is whether the unit will fail
within the next $n$ timesteps (e.g. 15 cycles).

The goal in a (soft) classification problem is to find a function
$f_C(\boldsymbol{x})$ that approximates the probabilities of belonging
to a set of classes or categories $y$. The
target in this problem will be whether the engine
will fail in the next 15 timesteps, while $\boldsymbol{x}$
will be the input features (e.g. sensor readings).

To measure the goodness of our classification function, we need
to define a score function, which will also be the loss
function for the machine learning techniques considered.
We will be considering the binary cross entropy:
$$ \textrm{BCE} = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log \left(f_C(\boldsymbol{x}_i)\right) + (1-y_i) \log \left(1-f_C(\boldsymbol{x}_i) \right ) \right]$$
which is one of the most common losses used for binary
classification and could also be extended to the
multiclass/multilabel problem.
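
As a small worked example, the binary cross entropy is the negative mean log-probability that the model assigns to the true class. The sketch below computes it by hand with NumPy on made-up labels and predicted probabilities. The clipping constant `eps` is an implementation detail added here to avoid taking the logarithm of zero.

```python
import numpy as np

y_true = np.array([1, 0, 1, 0])            # made-up binary targets
p_fail = np.array([0.9, 0.2, 0.6, 0.5])    # made-up predicted probabilities of class 1

# clip to avoid log(0) for predictions of exactly 0 or 1
eps = 1e-12
p = np.clip(p_fail, eps, 1 - eps)

bce = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
print(bce)
```

Confident and correct predictions (such as 0.9 for a positive example) contribute little to the loss, while uncertain ones (such as 0.5) contribute more.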


In [None]:
from sklearn.model_selection import train_test_split

# we can keep the same input features, we have only to compute the
# new target (we will only use the train set for classification)
cycles = 15
train_df['Target_15_Cycles'] = np.where(train_df['target_RUL'] <= cycles, 1, 0 )

y_clf = train_df["Target_15_Cycles"]

X_clf_train, X_clf_test, y_clf_train, y_clf_test = train_test_split(X, y_clf, test_size=0.2, random_state=1234)

train_df.tail()

#### Gradient Boosting for Classification

Similarly to what was done for the regression problem and given
the good performance observed for the Gradient Boosting model,
we will train a gradient boosted model for classification.


In [None]:
from sklearn.model_selection import KFold, GridSearchCV
from sklearn.ensemble import GradientBoostingClassifier

gb_clf = GradientBoostingClassifier()

# note: plain KFold does not group by UnitNumber, so rows of the
# same unit can end up in both the training and validation folds
cv = KFold(3)

param_grid = { "n_estimators" : [500],
               "learning_rate" :  [.01],
                "max_depth" : [6]
             }

optimized_gb_clf = GridSearchCV(estimator=gb_clf,
                            cv = cv,
                            param_grid=param_grid,
                            verbose = 1,
                            n_jobs = -1)

optimized_gb_clf.fit(X_clf_train, y_clf_train)

In [None]:
from sklearn.metrics import roc_auc_score, accuracy_score, precision_score, recall_score, confusion_matrix
from sklearn.metrics import classification_report

y_test_clf_proba = optimized_gb_clf.predict_proba(X_clf_test)[:, 1]
y_test_clf_pred = optimized_gb_clf.predict(X_clf_test)

print("Confusion Matrix:")
print(confusion_matrix(y_clf_test,y_test_clf_pred))
print("Gradient Boosting Classifier Accuracy: "+"{:.1%}".format(accuracy_score(y_clf_test,y_test_clf_pred)));
print("Gradient Boosting Classifier Precision: "+"{:.1%}".format(precision_score(y_clf_test,y_test_clf_pred)));
print("Gradient Boosting Classifier Recall: "+"{:.1%}".format(recall_score(y_clf_test,y_test_clf_pred)));
print("Classification Report:")
print(classification_report(y_clf_test,y_test_clf_pred))


In [None]:
from sklearn import metrics

fpr, tpr, threshold = metrics.roc_curve(y_clf_test,y_test_clf_proba)
roc_auc = metrics.auc(fpr, tpr)

fig, ax = plt.subplots()

ax.set_title('Receiver Operating Characteristic Curve')
ax.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % roc_auc)
ax.legend(loc = 'lower right')
ax.plot([0, 1], [0, 1],'r--')
ax.set_xlim([0, 1])
ax.set_ylim([0, 1])
ax.set_ylabel('True Positive Rate')
ax.set_xlabel('False Positive Rate')


#### Advanced Exercise: Recurrent Neural Network Model for Classification

Based on the previous regression example and the classification
target, train and evaluate a model for predictive
maintenance classification using an LSTM.

In [None]:
# space for the exercise solution

In [None]:
# more space for the exercise solution

## References

This notebook is heavily based on these resources on the topic:

- [1] [*Predictive Maintenance Template*](https://gallery.azure.ai/Collection/Predictive-Maintenance-Template-3)  by Microsoft Azure ML Team 
- [2] [*Predictive Maintenance using LSTM*](https://github.com/umbertogriffo/Predictive-Maintenance-using-LSTM) by Umberto Griffo
- [3] [*Predictive Maintenance ML (IIOT)*](https://www.kaggle.com/billstuart/predictive-maintenance-ml-iiot) by Bill Stuart

The dataset used was provided by NASA:

- [4] A. Saxena and K. Goebel (2008). [*Turbofan Engine Degradation Simulation Data Set*](https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/#turbofan), NASA Ames Prognostics Data Repository, NASA Ames Research Center, Moffett Field, CA


Other resources and tutorials on Predictive Maintenance:

- [5] [*Predictive Maintenance Modelling Guide*](https://gallery.azure.ai/Collection/Predictive-Maintenance-Implementation-Guide-1) by Fidan Boylu Uz

Two really good posts on the concept and usefulness of Recurrent Neural Networks for sequence data:

- [6] [*The Unreasonable Effectiveness of Recurrent Neural Networks*](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) by Andrej Karpathy

- [7] [*Understanding LSTM Networks*](https://colah.github.io/posts/2015-08-Understanding-LSTMs/) by Christopher Olah

These recent papers (with code) use this dataset in combination
with more advanced Deep Learning architectures and data augmentation
to achieve state-of-the-art (SOTA) results:

- [8] S. Zheng et al. [*Long short-term memory network for remaining useful life estimation*](https://ieeexplore.ieee.org/document/7998311) in Proc. IEEE International Conference on Prognostics and Health Management (ICPHM)

- [9] L. Jayasinghe et al. [*Temporal Convolutional Memory Networks for Remaining Useful Life Estimation of Industrial Machinery*](https://github.com/LahiruJayasinghe/RUL-Net) in IEEE International Conference on Industrial Technology (ICIT2019)

There are not many books on this topic. There is a book on ML for IoT (free 1 month subscription online), yet its
advanced ML chapters do not include specific IoT applications:

- [10] [Hands-On Artificial Intelligence for IoT](https://www.packtpub.com/big-data-and-business-intelligence/hands-artificial-intelligence-iot?utm_source=github&utm_medium=repository&utm_campaign=9781788836067) by Amita Kapoor (2019)