# Introduction to Predictive Maintenance using Deep Learning
## Fault Classification using Ensemble Models

#### Author Nagdev Amruthnath
Date: 1/10/2019

##### Citation Info
If you are using this for your research, please use the following for citation. 

Amruthnath, Nagdev, and Tarun Gupta. "A research study on unsupervised machine learning algorithms for early fault detection in predictive maintenance." In 2018 5th International Conference on Industrial Engineering and Applications (ICIEA), pp. 355-361. IEEE, 2018.

##### Disclaimer
This is a tutorial for performing fault detection using machine learning. Use this code at your own risk. I do not guarantee that this will work as shown below. If you have any suggestions, please branch this project.

## Introduction
This is the first of a four-part demonstration series on using machine learning for predictive maintenance.   

The area of predictive maintenance has gained a lot of prominence in the last couple of years. With new algorithms and methodologies emerging across different learning methods, it remains a challenge for industries to decide which method is fit, robust and provides the most accurate detection. One of the most common learning approaches used today for fault diagnosis is supervised learning, which relies on predictor variables and a response variable. In this tutorial, we will look at ensemble models for fault classification using the caret and EnsembleML packages in R.


## Load libraries 

In [1]:
#install custom Ensemble Package
devtools::install_github("nagdevAmruthnath/EnsembleML")
library(EnsembleML)

Skipping install of 'EnsembleML' from a github remote, the SHA1 (e4318dbf) has not changed since last install.
  Use `force = TRUE` to force installation


In [2]:
options(warn=-1)

# load libraries
library(caret)
library(dplyr)
library(EnsembleML)

Loading required package: lattice
Loading required package: ggplot2

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union



## Load data
Here we are using data from a bench drill press. The machine was run in four different states, and the data for each state is stored in a separate csv file. We need to load the data first. In the data, time is the timestamp of each sample in seconds, ax is the acceleration on the x axis, ay is the acceleration on the y axis, az is the acceleration on the z axis and aT is the G's. The data was collected at a sample rate of 100 Hz.   

Data was collected for four different states of the machine:  
1. Nothing attached to drill press
2. Wooden base attached to drill press
3. Imbalance created by adding weight to one end of wooden base
4. Imbalance created by adding weight to two ends of wooden base.

In [3]:
setwd("/home")
#read csv files
file1 = read.csv("dry run.csv", sep=",", header = TRUE)
file2 = read.csv("base.csv", sep=",", header = TRUE)
file3 = read.csv("imbalance 1.csv", sep=",", header = TRUE)
file4 = read.csv("imbalance 2.csv", sep=",", header = TRUE)

#Add labels to data
file1$label = 1
file2$label = 2
file3$label = 3
file4$label = 4

#view top rows of data
#head(file1)

We can look at the summary of each file using the summary function in R. Below, we can observe that about 66 seconds of data is available. We also have the min, max and mean for each of the variables. 

In [4]:
# summary of each file
summary(file1)

      time              ax                 ay                  az         
 Min.   : 0.002   Min.   :-2.11880   Min.   :-2.143600   Min.   :-4.1744  
 1st Qu.:16.507   1st Qu.:-0.41478   1st Qu.:-0.625250   1st Qu.:-0.7359  
 Median :33.044   Median : 0.02960   Median :-0.022050   Median :-0.1468  
 Mean   :33.037   Mean   : 0.01233   Mean   : 0.008697   Mean   :-0.1021  
 3rd Qu.:49.535   3rd Qu.: 0.46003   3rd Qu.: 0.641700   3rd Qu.: 0.4298  
 Max.   :66.033   Max.   : 2.09620   Max.   : 2.003000   Max.   : 4.9466  
       aT            label  
 Min.   :0.032   Min.   :1  
 1st Qu.:0.848   1st Qu.:1  
 Median :1.169   Median :1  
 Mean   :1.277   Mean   :1  
 3rd Qu.:1.579   3rd Qu.:1  
 Max.   :5.013   Max.   :1  
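
As a quick sanity check on the stated 100 Hz sample rate, the spacing between consecutive timestamps can be inspected. The sketch below uses a made-up `time` vector as a stand-in for the recorded data; with the real files, the same two lines would presumably be applied to `file1$time`.

```r
# Hypothetical stand-in for one recording: 100 Hz timestamps over about 66 s
# (the real values come from the csv files, not from this vector)
time <- seq(0.01, 66, by = 0.01)

# Median spacing between consecutive samples, in seconds
dt <- median(diff(time))

# Estimated sampling rate in Hz and total duration in seconds
rate_hz  <- 1 / dt
duration <- max(time) - min(time)

cat("rate (Hz):", round(rate_hz), " duration (s):", round(duration, 2), "\n")
```

Using the median rather than the mean keeps the estimate stable if a few samples were dropped during acquisition.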

## Data Aggregation and feature extraction
Here, the data is aggregated into 1-second windows (the time column is rounded to the nearest second) and features are extracted from each window. Feature extraction reduces the dimension of the data by storing only a compact statistical representation of each window. 

In [5]:
file1$group = as.factor(round(file1$time))
file2$group = as.factor(round(file2$time))
file3$group = as.factor(round(file3$time))
file4$group = as.factor(round(file4$time))
#head(file1, 20)

#list of all files
files = list(file1, file2, file3, file4)

#loop through all files, extract features per window and combine
features = NULL
for (i in 1:4){
  res = files[[i]] %>%
    group_by(group) %>%
    summarize(ax_mean = mean(ax),
              ax_sd = sd(ax),
              ax_min = min(ax),
              ax_max = max(ax),
              ax_median = median(ax),
              ay_mean = mean(ay),
              ay_sd = sd(ay),
              ay_min = min(ay),
              ay_max = max(ay),
              ay_median = median(ay),
              az_mean = mean(az),
              az_sd = sd(az),
              az_min = min(az),
              az_max = max(az),
              az_median = median(az),
              aT_mean = mean(aT),
              aT_sd = sd(aT),
              aT_min = min(aT),
              aT_max = max(aT),
              aT_median = median(aT),
              label = mean(label)   #label is constant within a file
             )
  features = rbind(features, res)
}
#convert the combined tibble to a plain data frame
features = as.data.frame(features)

#replace numeric labels with class names, drop incomplete rows and convert label to factor
features$label = ifelse(features$label==1, "OK", ifelse(features$label==2, "base", ifelse(features$label==3, "Imb-1", "Imb-2")))
features = features %>% na.omit() %>% mutate_at('label',as.factor)  %>% as.data.frame()

#head(features)
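
To make the windowing step concrete, here is a minimal sketch on a made-up signal. It uses base R (`split` and `lapply`) instead of dplyr so that it runs without extra packages, and the values in `toy` are purely illustrative. It also shows why `na.omit()` is applied afterwards: a window with a single sample has no standard deviation.

```r
# Toy signal: timestamps in seconds and one acceleration channel (made-up values)
toy <- data.frame(
  time = c(0.10, 0.30, 0.45, 0.60, 0.90, 1.20, 1.40, 1.60),
  ax   = c(0.2, -0.1, 0.4, 0.0, 0.3, -0.2, 0.1, 0.5)
)

# round() maps each timestamp to the nearest whole second, so group 1
# covers roughly 0.5 s to 1.5 s
toy$group <- round(toy$time)

# One row of summary features per window
feat <- do.call(rbind, lapply(split(toy, toy$group), function(w) {
  data.frame(group   = w$group[1],
             n       = nrow(w),
             ax_mean = mean(w$ax),
             ax_sd   = sd(w$ax),   # NA when a window holds a single sample
             ax_min  = min(w$ax),
             ax_max  = max(w$ax))
}))
print(feat)

# Windows with missing features are dropped, as in the cell above
print(na.omit(feat))
```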

## Create sample size for training the model
We know that there are four states in the data. Based on this information, the data is split into train and test samples. The train set is used to build the model and the test set is used to validate it. The ratio between train and test is 70:30. You can adjust this based on the type of data. The table below shows the number of observations for each group.   

Note: It is advised to have at least 30 samples for each group. 

In [6]:
table(features$label)



 base Imb-1 Imb-2    OK 
  109    93    93    67 

From the above results, we can observe that there are at least 30 samples for each group. Now, we can use this data to split into train and test sets. 

In [12]:
#create samples of 70:30 ratio
train_idx = sample(nrow(features), nrow(features) * 0.70)

#drop the first column (group) and keep the features and label
train1 = features[train_idx, 2:ncol(features)]
test1 = features[-train_idx, 2:ncol(features)]


In [13]:
is.factor(train1[,'label'])
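
The random split above is not stratified and not seeded, so class proportions and results can differ from run to run. As a hedged alternative, the sketch below draws 70% of the rows within each class using base R. The `label` vector is made up to match the class counts in the table above, and the seed value is arbitrary. caret's `createDataPartition()` offers a similar stratified split.

```r
set.seed(42)  # arbitrary seed, chosen only so the split is repeatable

# Made-up labels with the same class counts as the table above
label <- factor(rep(c("base", "Imb-1", "Imb-2", "OK"),
                    times = c(109, 93, 93, 67)))

# Draw 70% of the row indices within each class
train_idx <- unlist(lapply(split(seq_along(label), label), function(idx) {
  idx[sample.int(length(idx), size = floor(0.70 * length(idx)))]
}))

# Class proportions in the training rows and in the held-out rows
print(round(prop.table(table(label[train_idx])), 3))
print(round(prop.table(table(label[-train_idx])), 3))
```

With the real data, `label` would be `features$label` and the resulting indices would be used in place of the plain `sample()` call.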

## Ensemble Modeling

### Fault Classification using Ensemble Models
Ensemble models in machine learning combine the decisions from multiple models to improve overall performance. The main sources of error in learning models are noise, bias and variance.
Ensemble methods help to minimize these factors and are designed to improve the stability and accuracy of machine learning algorithms.
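
One simple way to combine decisions is majority voting. The sketch below is only an illustration of the idea with made-up predictions from three hypothetical models; it is not necessarily how EnsembleML combines its models.

```r
# Hypothetical class predictions from three models on five test samples
preds <- data.frame(
  model_a = c("OK", "base", "Imb-1", "Imb-2", "OK"),
  model_b = c("OK", "base", "Imb-2", "Imb-2", "base"),
  model_c = c("OK", "Imb-1", "Imb-1", "Imb-2", "OK"),
  stringsAsFactors = FALSE
)

# For each sample (row), pick the most frequent predicted class
majority_vote <- function(votes) {
  counts <- table(votes)
  names(counts)[which.max(counts)]  # ties go to the first class in sorted order
}

ensemble <- apply(preds, 1, majority_vote)
print(ensemble)
```

A single model's mistake (for example `model_b` on the last sample) is outvoted as long as the other models agree on the correct class.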

### Train Multiple models at once

In [15]:
mm = multipleModels(train = train1, test = test1, y = "label", models = c("parRF", "knn", "svmRadial", "nnet"))
mm

# weights:  129
initial  value 372.183353 
iter  10 value 302.084608
iter  20 value 224.113374
iter  30 value 145.606094
iter  40 value 87.015573
iter  50 value 64.412881
iter  60 value 62.601315
iter  70 value 61.770846
iter  80 value 61.605244
iter  90 value 61.318439
iter 100 value 61.036910
final  value 61.036910 
stopped after 100 iterations
# weights:  129
initial  value 417.012160 
iter  10 value 319.687250
iter  20 value 163.294440
iter  30 value 85.300481
iter  40 value 65.658585
iter  50 value 63.225166
iter  60 value 62.684881
iter  70 value 62.401495
iter  80 value 62.294629
iter  90 value 62.255201
iter 100 value 62.230713
final  value 62.230713 
stopped after 100 iterations
# weights:  129
initial  value 445.656478 
iter  10 value 290.807505
iter  20 value 171.806323
iter  30 value 90.048886
iter  40 value 65.562345
iter  50 value 63.919257
iter  60 value 63.582862
iter  70 value 63.302945
iter  80 value 62.265786
iter  90 value 61.482850
iter 100 value 60.910019
final  v

$summary
           Accuracy     Kappa AccuracyLower AccuracyUpper AccuracyNull
parRF     0.9633028 0.9501372     0.9087017     0.9899123    0.3211009
knn       0.9633028 0.9501372     0.9087017     0.9899123    0.3211009
svmRadial 0.9633028 0.9501372     0.9087017     0.9899123    0.3211009
nnet      0.9633028 0.9501372     0.9087017     0.9899123    0.3211009
          AccuracyPValue McnemarPValue
parRF       1.895382e-46           NaN
knn         1.895382e-46           NaN
svmRadial   1.895382e-46           NaN
nnet        1.895382e-46           NaN

$models
$models$parRF
Neural Network 

253 samples
 20 predictor
  4 classes: 'base', 'Imb-1', 'Imb-2', 'OK' 

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 10 times) 
Summary of sample sizes: 226, 228, 227, 228, 228, 228, ... 
Resampling results across tuning parameters:

  size  decay  Accuracy   Kappa    
  1     0e+00  0.8071300  0.7338399
  1     1e-04  0.8692250  0.8206072
  1     1e-01  0.9298206  0.9050334
  3

In the above training, multiple models were trained and their tuning parameters are shown above. Now we can use these trained models to create an ensemble. 

#### Ensemble training
In ensemble training, the models trained in the previous step are used as input to a new model, which predicts the outcome. The ensembleTrain() function can be used as follows. The classification output is a confusion matrix and the results are as follows. 

In [16]:
em = ensembleTrain(mm, train =train1, test =test1, y = "label", emsembleModelTrain = "rf")
em

$summary
Confusion Matrix and Statistics

          Reference
Prediction base Imb-1 Imb-2 OK
     base    35     0     0  1
     Imb-1    0    29     0  1
     Imb-2    0     2    24  0
     OK       0     0     0 17

Overall Statistics
                                          
               Accuracy : 0.9633          
                 95% CI : (0.9087, 0.9899)
    No Information Rate : 0.3211          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.9501          
                                          
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: base Class: Imb-1 Class: Imb-2 Class: OK
Sensitivity               1.0000       0.9355       1.0000    0.8947
Specificity               0.9865       0.9872       0.9765    1.0000
Pos Pred Value            0.9722       0.9667       0.9231    1.0000
Neg Pred Value            1.0000       0.9747       1.0000    0.9783
Prevalen
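
The statistics in the output above follow directly from the confusion matrix. As a worked check, the sketch below re-enters the matrix by hand and recomputes accuracy, kappa, sensitivity and specificity in base R. The variable names are illustrative and not part of caret or EnsembleML.

```r
# Confusion matrix copied from the output above
# (rows = prediction, columns = reference)
classes <- c("base", "Imb-1", "Imb-2", "OK")
cm <- matrix(c(35,  0,  0,  1,
                0, 29,  0,  1,
                0,  2, 24,  0,
                0,  0,  0, 17),
             nrow = 4, byrow = TRUE,
             dimnames = list(Prediction = classes, Reference = classes))

n  <- sum(cm)    # total test samples
tp <- diag(cm)   # correctly classified samples per class

accuracy    <- sum(tp) / n
sensitivity <- tp / colSums(cm)                    # TP / (TP + FN)
specificity <- (n - rowSums(cm) - colSums(cm) + tp) /
               (n - colSums(cm))                   # TN / (TN + FP)

# Cohen's kappa: agreement beyond what the marginals would give by chance
p_chance <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa    <- (accuracy - p_chance) / (1 - p_chance)

print(round(c(accuracy = accuracy, kappa = kappa), 4))
print(round(rbind(sensitivity, specificity), 4))
```

The recomputed values agree with the reported accuracy of 0.9633 and kappa of 0.9501, and with the per-class sensitivity and specificity rows.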

Thanks for staying till here. 

### Conclusion
Ensemble models are popular and widely used. They do not always promise high accuracy, and the results can sometimes be unusual. Please note that these models are very difficult to interpret and sometimes cannot be explained. 