#  Predictive Maintenance using an RNN with Watson Studio
   
Machine learning models for predictive maintenance predict equipment failure before it happens, avoiding unplanned downtime that, in some cases, can cost hundreds of thousands of dollars per day depending on the industry.

Most equipment has sensors that generate time series data, which can be used to build a machine learning model that predicts failure. There are different techniques for building a model with this kind of data.

This Watson Studio lab demonstrates one such technique: building a classification model using an RNN with LSTM to predict machine failure within a specific time horizon (e.g., the next 10 days). RNNs work well with time series data because they ingest sequences of data and can find predictive signal in the ordering of those sequences (as opposed to models that ingest unordered pieces of data). The data used to train the model comes from NASA and was released to the general public. It consists of training and testing sets that contain time series sensor data for aircraft engines, along with failure data for each engine. It was downloaded from [this NASA website](https://c3.nasa.gov/dashlink/resources/139/).

## Setup
    
1. Download the file with the NASA data from [here](https://raw.githubusercontent.com/ibm-ai-education/predictive-maintenance-classification-lab/master/data/nasa-pm-data.zip) to your local system. The file is named `nasa-pm-data.zip`.

2. Unzip the file into an empty folder on your system.

3. Click the data icon at the top right of the notebook window, then select and upload the following 3 files one by one:

  * train_FD001.csv
  * test_FD001.csv
  * RUL_FD001.csv

![Data icon](https://raw.githubusercontent.com/ibm-ai-education/predictive-maintenance-classification-lab/master/images/ss6.png) 

    
4. Once the files are uploaded, run each cell in the notebook after reading the description of what the cell does. For cells that instruct you to insert code to create a DataFrame for a file, put the cursor in the cell and then select the following from the data area on the right:

      **Insert to code->Insert pandas Dataframe**

![Insert code](https://raw.githubusercontent.com/ibm-ai-education/predictive-maintenance-classification-lab/master/images/ss7.png)
    

In [None]:
# All required imports
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler,MinMaxScaler
from sklearn.metrics import confusion_matrix,accuracy_score
from sklearn.utils import class_weight

from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM, Activation
from keras.callbacks import EarlyStopping

import matplotlib.pyplot as plt
plt.style.use('ggplot')
%matplotlib inline  

In [None]:
# With your cursor in this cell, insert the code to read the train_FD001.csv dataset into a DataFrame as instructed in 
# Step 4) in the 1st  cell of this notebook




In [None]:
# IMPORTANT: df_train must be set to the variable created for the DataFrame in the cell above.
df_train = df_data_?

In [None]:
# With your cursor in this cell, insert the code to read the test_FD001.csv dataset into a DataFrame as instructed in
# Step 4) in the 1st  cell of this notebook


In [None]:
# IMPORTANT: df_test must be set to the variable created for the DataFrame in the cell above.
df_test = df_data_?

In [None]:
# With your cursor in this cell, insert the code to read the RUL_FD001.csv dataset into a DataFrame as instructed in
# Step 4) in the 1st  cell of this notebook



In [None]:
# IMPORTANT: df_test_ground_truth must be set to the variable created for the DataFrame in the cell above.
df_test_ground_truth = df_data_?

###  Prepare training data
We need to add a column to the training data that indicates the time to failure of each row, and a column indicating whether or not the time to failure is less than or equal to our time horizon (i.e., 10 time periods).
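
As a minimal sketch of the labeling logic described above, the toy example below uses a made-up two-engine DataFrame. The data and the `toy_horizon` value are illustrative assumptions and are not part of the NASA dataset; only the column names follow the lab.

```python
import pandas as pd

# Toy data: two hypothetical engines; the last row of each engine is its failure cycle
toy = pd.DataFrame({
    "engine_id":    [1, 1, 1, 1, 2, 2, 2],
    "elapsed_time": [1, 2, 3, 4, 1, 2, 3],
})
toy_horizon = 2  # illustrative horizon, much shorter than the lab's 10

# Time to failure = last cycle recorded for the engine minus the current cycle
last_cycle = toy.groupby("engine_id")["elapsed_time"].transform("max")
toy["ttf"] = last_cycle - toy["elapsed_time"]

# Label is 1 when the failure falls within the horizon, otherwise 0
toy["failed_within_time_horizon"] = (toy["ttf"] <= toy_horizon).astype(int)
print(toy)
```

Only the first row of engine 1 is more than two cycles away from failure, so it is the only row labeled 0.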

In [None]:
# Our time horizon to predict failure
time_horizon = 10
# Add time to failure column
df_train['ttf'] = df_train.groupby(['engine_id'])['elapsed_time'].transform('max')-df_train['elapsed_time']
# Add label indicating  failure within our time horizon
df_train['failed_within_time_horizon'] = df_train['ttf'].apply(lambda x: 1 if x <= time_horizon else 0)
df_train.head()

### Prepare test data
We need to add a column to the test data that indicates the time to failure of each row, and a column indicating whether or not the time to failure is less than or equal to our time horizon (i.e., 10 time periods, the same `time_horizon` used for the training data). This is a bit more complicated than for the training data because the failure information for the test data is in a separate DataFrame (i.e., `df_test_ground_truth`).
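
The idea can be sketched on toy data as follows. The values are made up for illustration, and this sketch joins the frames on `engine_id` explicitly, which is an assumption that differs slightly from the notebook cells (they rely on the row order of the two frames matching).

```python
import pandas as pd

# Toy test data: recording stops before either hypothetical engine fails
toy_test = pd.DataFrame({
    "engine_id":    [1, 1, 1, 2, 2],
    "elapsed_time": [1, 2, 3, 1, 2],
})
# Toy ground truth: cycles remaining after the last recorded cycle of each engine
toy_truth = pd.DataFrame({"engine_id": [1, 2], "time_to_failure": [4, 1]})

# Last recorded cycle per engine
last_seen = (toy_test.groupby("engine_id")["elapsed_time"].max()
             .rename("last_recorded_time").reset_index())

# Actual failure cycle = last recorded cycle + remaining cycles
toy_truth = toy_truth.merge(last_seen, on="engine_id")
toy_truth["time_of_failure"] = toy_truth["last_recorded_time"] + toy_truth["time_to_failure"]

# Attach the failure cycle to every test row and derive the time to failure
toy_test = toy_test.merge(toy_truth[["engine_id", "time_of_failure"]],
                          on="engine_id", how="left")
toy_test["ttf"] = toy_test["time_of_failure"] - toy_test["elapsed_time"]
print(toy_test)
```

Joining on the key keeps the calculation correct even if the ground truth rows are not in the same order as the engines in the test data.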

In [None]:
# Get the last recorded time in the test data for each machine
lrt = pd.DataFrame(df_test.groupby('engine_id')['elapsed_time'].max()).reset_index()
lrt.columns = ['engine_id', 'last_recorded_time']
lrt.head()

In [None]:
# Calculate actual time of failure for test data 
df_test_ground_truth['time_of_failure']=df_test_ground_truth['time_to_failure'] + lrt['last_recorded_time']
df_test_ground_truth.head()

In [None]:
# Merge ground truth data into test data and calculate time to failure
df_test_ground_truth.drop('time_to_failure', axis=1, inplace=True)
df_test=df_test.merge(df_test_ground_truth,on=['engine_id'],how='left')
df_test['ttf']=df_test['time_of_failure'] - df_test['elapsed_time']
df_test.drop('time_of_failure', axis=1, inplace=True)


# Add label indicating  failure within our time horizon
df_test['failed_within_time_horizon'] = df_test['ttf'].apply(lambda x: 1 if x <= time_horizon else 0)
df_test.head()

### Prepare data for LSTM
LSTM requires the training and testing data in the form of an array of sequences. We'll also normalize the feature values to prevent large data values from unduly influencing our model.
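
To make the target shape concrete, here is a small NumPy sketch on made-up values. It shows min-max scaling followed by slicing into overlapping windows. It is a simplification under stated assumptions: the lab's own helper functions also zero-pad each engine's data and use a window of 50, both of which are omitted here.

```python
import numpy as np

# Toy feature matrix for one hypothetical engine: 5 time steps, 2 features
features = np.array([[10.0, 1.0],
                     [20.0, 2.0],
                     [30.0, 3.0],
                     [40.0, 4.0],
                     [50.0, 5.0]])

# Min-max scaling maps each column to the range [0, 1]
col_min = features.min(axis=0)
col_max = features.max(axis=0)
scaled = (features - col_min) / (col_max - col_min)

# Overlapping windows of 3 consecutive time steps
window = 3
sequences = np.array([scaled[i:i + window]
                      for i in range(len(scaled) - window + 1)])
print(sequences.shape)  # (samples, time steps, features)
```

The resulting 3-D layout (samples, time steps, features) is what the `input_shape` of the first LSTM layer expects per sample.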

In [None]:
# Column names for convenience
feature_columns = ['elapsed_time_norm','setting1','setting2','setting3','sensor1','sensor2','sensor3','sensor4','sensor5','sensor6','sensor7','sensor8','sensor9','sensor10','sensor11','sensor12','sensor13','sensor14','sensor15','sensor16','sensor17','sensor18','sensor19','sensor20','sensor21']
target_column = 'failed_within_time_horizon'
key_columns = ['engine_id','elapsed_time']

In [None]:
# Scale training and testing data
#scaler=StandardScaler()
scaler=MinMaxScaler()
df_train_scaled = df_train.copy()
df_test_scaled = df_test.copy()

df_train_scaled['elapsed_time_norm'] = df_train_scaled['elapsed_time']
df_test_scaled['elapsed_time_norm'] = df_test_scaled['elapsed_time']
df_train_scaled[feature_columns]=scaler.fit_transform(df_train_scaled[feature_columns])
df_test_scaled[feature_columns]=scaler.transform(df_test_scaled[feature_columns])

In [None]:
# Functions to reshape the training and testing data for LSTM
def gen_sequence(input_df, seq_length, seq_cols):
    df_zeros=pd.DataFrame(np.zeros((seq_length-1,input_df.shape[1])),columns=input_df.columns)
    input_df=pd.concat([df_zeros, input_df], ignore_index=True)
    data_array = input_df[seq_cols].values
    num_elements = data_array.shape[0]
    lstm_array=[]
    for start, stop in zip(range(0, num_elements-seq_length), range(seq_length, num_elements)):
        lstm_array.append(data_array[start:stop, :])
    return np.array(lstm_array)

# function to generate labels
def gen_label(input_df, seq_length, seq_cols,label):
    df_zeros=pd.DataFrame(np.zeros((seq_length-1,input_df.shape[1])),columns=input_df.columns)
    input_df=pd.concat([df_zeros, input_df], ignore_index=True)
    data_array = input_df[seq_cols].values
    num_elements = data_array.shape[0]
    y_label=[]
    for start, stop in zip(range(0, num_elements-seq_length), range(seq_length, num_elements)):
        y_label.append(input_df[label][stop])
    return np.array(y_label)

# function to generate key mapping from generated data to original data
def gen_keymap(input_df, seq_length, keys):
    df_zeros=pd.DataFrame(np.zeros((seq_length-1,input_df.shape[1])),columns=input_df.columns)
    input_df=pd.concat([df_zeros, input_df], ignore_index=True)
    data_array = input_df[keys].values
    num_elements = data_array.shape[0]
    y_keys=[]
    for start, stop in zip(range(0, num_elements-seq_length), range(seq_length, num_elements)):
        y_keys.append([input_df[keys[0]][stop], input_df[keys[1]][stop]])
    return np.array(y_keys)

In [None]:
# Generate training data using the reshaping functions
sequence_length = 50
x_train=np.concatenate(list(list(gen_sequence(df_train_scaled[df_train_scaled['engine_id']==id], sequence_length, feature_columns)) for id in df_train_scaled['engine_id'].unique()))
print(x_train.shape)

# generate y_train
y_train=np.concatenate(list(list(gen_label(df_train_scaled[df_train_scaled['engine_id']==id], sequence_length, feature_columns,target_column)) for id in df_train_scaled['engine_id'].unique()))
print(y_train.shape)


In [None]:
# Generate test data using the reshaping functions
x_test=np.concatenate(list(list(gen_sequence(df_test_scaled[df_test_scaled['engine_id']==id], sequence_length, feature_columns)) for id in df_test_scaled['engine_id'].unique()))
print(x_test.shape)

# generate y_test
y_test=np.concatenate(list(list(gen_label(df_test_scaled[df_test_scaled['engine_id']==id], sequence_length, feature_columns,target_column)) for id in df_test_scaled['engine_id'].unique()))
print(y_test.shape)

# Generate keymap to map reshaped test data to original test data
x_test_keymap=np.concatenate(list(list(gen_keymap(df_test_scaled[df_test_scaled['engine_id']==id], sequence_length, key_columns)) for id in df_test_scaled['engine_id'].unique()))
print(x_test_keymap.shape)



In [None]:
# We use weights because there are relatively few failures in the dataset. Weights allow the cost function
# to penalize wrong predictions for the sparse label more. It's important to do this when false negatives
# cost the organization more than false positives
class_weights =  dict(enumerate(class_weight.compute_class_weight(class_weight='balanced',
                                                 classes=np.unique(y_train),
                                                 y=y_train)))
print(class_weights)
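
For intuition, the `'balanced'` heuristic weights each class by `n_samples / (n_classes * class_count)`. The sketch below reproduces that arithmetic by hand on made-up labels; `toy_labels` and `toy_class_weights` are hypothetical names used only for illustration.

```python
import numpy as np

# Toy labels: 8 healthy samples (0) and 2 failures (1)
toy_labels = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

classes, counts = np.unique(toy_labels, return_counts=True)
# Balanced weight for each class = n_samples / (n_classes * count_of_that_class)
weights = len(toy_labels) / (len(classes) * counts)
toy_class_weights = {int(c): float(w) for c, w in zip(classes, weights)}
print(toy_class_weights)  # {0: 0.625, 1: 2.5}
```

The rare failure class gets four times the weight of the common class, so a missed failure contributes more to the loss.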

### Build LSTM Model
We'll build the LSTM model using Keras. 
We use two LSTM layers, each followed by a Dropout layer (to avoid overfitting), followed by a Dense layer that uses sigmoid activation (because we've framed this as a classification problem).
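
As a rough illustration of why a sigmoid output suits binary classification, the NumPy sketch below squashes raw scores into probabilities and scores them with binary cross-entropy. The functions `sigmoid` and `binary_crossentropy` are hypothetical helpers written for this example; Keras performs the equivalent computation internally, and the input values are made up.

```python
import numpy as np

def sigmoid(z):
    """Map raw scores to probabilities strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels under the predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(y_prob)
                           + (1 - y_true) * np.log(1 - y_prob))))

raw_scores = np.array([-2.0, 0.0, 3.0])   # made-up outputs of the last layer before activation
probs = sigmoid(raw_scores)               # probability of failure within the horizon
y_true = np.array([0, 0, 1])
loss = binary_crossentropy(y_true, probs)
predicted_class = (probs > 0.5).astype(int)
print(probs, loss, predicted_class)
```

Thresholding the probability at 0.5 turns the single sigmoid output into a 0/1 prediction.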

In [None]:
# Create the model in Keras
number_of_features = x_train.shape[2]

model = Sequential()

model.add(LSTM(
         input_shape=(sequence_length, number_of_features),
         units=2*sequence_length,
         return_sequences=True))
model.add(Dropout(0.3))


model.add(LSTM(
          units=sequence_length,
          return_sequences=False))

model.add(Dropout(0.2))

model.add(Dense(units=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

model.summary()

### Apply Model to Training Data
Next we'll fit the model to our training data. Note that this step takes about 10 minutes to complete, so now is a good time to grab some coffee.

In [None]:
# Build model with training data
history=model.fit(x_train, y_train, epochs=10, batch_size=4*sequence_length, validation_split=0.30, verbose=1, class_weight=class_weights,
     callbacks = [EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=0, mode='auto')])
#history=model.fit(x_train, y_train, epochs=10, batch_size=4*sequence_length, validation_split=0.30, verbose=1, 
#        callbacks = [EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=0, mode='auto')])

### Analyze model and run with test data
Here we'll analyze the model's performance. We'll look at both training and validation loss by epoch to see if the model appears to converge over time.

In [None]:
# Plot validation and training loss vs epoch number
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.legend()
plt.show()

In [None]:
# training metrics
scores = model.evaluate(x_train, y_train, verbose=1, batch_size=200)
print('Accuracy: {}'.format(scores[1]))

### Run test data through model
We'll look at how the model performs on the test data set (data it has never seen before) to assess how well it generalizes to data outside the training set.

In [None]:
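# Run test data through model, compute accuracy and print a confusion matrix.
# predict_classes was removed from recent Keras releases, so threshold the sigmoid output at 0.5 instead
y_pred=(model.predict(x_test) > 0.5).astype('int32')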
print('Accuracy of model on test data: ',accuracy_score(y_test,y_pred))
print('Confusion Matrix: \n',confusion_matrix(y_test,y_pred))

**Note:** Normally for predictive maintenance the most important value in the confusion matrix is the number at the bottom left. This is the number of false negatives: machines that failed within the time horizon but were predicted not to fail. The number at the top right is the number of false positives: predictions of failure when no failure occurred within the horizon. False positives tend to be less costly because they lead to premature maintenance on a piece of equipment that would eventually be serviced anyway.
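
The layout described in the note can be checked by hand. The sketch below builds a 2x2 matrix from made-up labels using the same convention as scikit-learn's `confusion_matrix` (rows are actual classes, columns are predicted classes); the arrays are illustrative assumptions.

```python
import numpy as np

# Made-up ground truth and predictions (1 = failure within the horizon)
y_true_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred_toy = np.array([0, 1, 1, 0, 1, 0, 1, 1])

# Rows index the actual class, columns index the predicted class
cm = np.zeros((2, 2), dtype=int)
for actual, predicted in zip(y_true_toy, y_pred_toy):
    cm[actual, predicted] += 1

false_positives_count = cm[0, 1]  # top right: predicted failure, none occurred
false_negatives_count = cm[1, 0]  # bottom left: failure that was missed
print(cm)
```

Here the matrix is `[[2, 2], [1, 3]]`: two false positives at the top right and one false negative at the bottom left.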

### See which engines we missed our predictions for 

We'll generate a list of indexes for false positives and false negatives and map them back to the test data frame.
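
A small sketch of the index lookup on made-up arrays: because labels are 0 or 1, a prediction greater than the truth is a false positive and a prediction less than the truth is a false negative. The array names here are hypothetical.

```python
import numpy as np

toy_true = np.array([0, 1, 1, 0, 1])
toy_pred = np.array([1, 1, 0, 0, 0])

# predicted 1 but actual 0 -> false positive
fp_idx = np.flatnonzero(toy_pred > toy_true).tolist()
# predicted 0 but actual 1 -> false negative
fn_idx = np.flatnonzero(toy_pred < toy_true).tolist()
print(fp_idx, fn_idx)  # [0] [2, 4]
```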

In [None]:
# Get false positives and false negatives
y_pred_flat = y_pred.reshape(-1)
false_positives = np.flatnonzero(np.asarray(y_pred_flat>y_test)).tolist()
false_negatives = np.flatnonzero(np.asarray(y_pred_flat<y_test)).tolist()  
print(f"{len(false_positives)} false positives")
print(f"{len(false_negatives)} false negatives")


In the next cell the details of the false positives are displayed. The `ttf` column shows the impact of following the model's predictions. Subtract 10 (the time horizon) from this number to see how much earlier than necessary the model is telling us to service the machines in this list. If the value is 16, for example, our model causes us to service that machine 6 time units earlier than we actually need to. This is typically cheaper than a false negative, where the model fails to predict a machine failure before it actually occurs.
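
The subtraction can be written out as a one-liner. The `ttf` values below are made up for illustration; only the horizon of 10 comes from the lab.

```python
horizon = 10  # same time horizon as the lab

# Hypothetical ttf values for three false positives
toy_false_positive_ttf = [16, 11, 25]

# Cycles of useful life given up by servicing when the model says to
cycles_early = [ttf - horizon for ttf in toy_false_positive_ttf]
print(cycles_early)  # [6, 1, 15]
```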

In [None]:
# False positives
# Use the generated keymap for the test data to look up corresponding rows in the test data frame
fp_rows = [df_test[(df_test[key_columns[0]] == x_test_keymap[i][0]) & (df_test[key_columns[1]] == x_test_keymap[i][1])] for i in false_positives]
# Fall back to an empty frame with df_test's structure when there are no false positives
df_false_positives = pd.concat(fp_rows) if fp_rows else df_test[0:0]
df_false_positives.head(df_false_positives.shape[0])

In [None]:
# False negatives
# Use the generated keymap for the test data to look up corresponding rows in the test data frame
fn_rows = [df_test[(df_test[key_columns[0]] == x_test_keymap[i][0]) & (df_test[key_columns[1]] == x_test_keymap[i][1])] for i in false_negatives]
# Fall back to an empty frame with df_test's structure when there are no false negatives
df_false_negatives = pd.concat(fn_rows) if fn_rows else df_test[0:0]
df_false_negatives.head(df_false_negatives.shape[0])

### Summary
Congratulations! You've gone through an example of using a Recurrent Neural Network to build a predictive maintenance model framed as a classification problem, where we try to predict which pieces of equipment will fail within a given time horizon. There are other approaches to predictive maintenance, such as using regression to predict the remaining useful life (RUL) of a piece of equipment, or using anomaly detection to quickly identify outlier sensor data that is correlated with machine failure.