In [1]:
%matplotlib inline

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import ticker

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA

import skimage

import scipy.stats as st
from scipy.signal import find_peaks, peak_widths

import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

from statsmodels.graphics.mosaicplot import mosaic

In [14]:
RANDOM_STATE = 26

INPUT_DATA_MACHINE = "data/ai4i2020.csv"
META_DATA_MACHINE = "data/machine_meta_data.xlsx"

INPUT_DATA_METRO = r"../../data/dataset_train.csv"
META_DATA_METRO = "data/metro_meta_data.xlsx"

# Predictive Maintenance

## Abstract
TODO

## Introduction

### Overview

Machines and equipment must be periodically inspected and serviced: parts are replaced and consumables are replenished. One strategy is to repair the equipment only after a breakdown occurs; this is called **corrective / reactive maintenance**. If, on the other hand, maintenance is done at scheduled points in time (say, every month), we call it **preventive / scheduled maintenance**.

Both strategies have drawbacks: reactive maintenance causes unplanned downtime, and scheduled maintenance leads to over-maintenance (equipment gets serviced whether it needs it or not). A more data-driven alternative is to determine the condition of the machine and forecast the time remaining until the next failure; this is called **predictive maintenance**. [This report](https://www.pwc.de/de/digitale-transformation/digital-factories-2020-shaping-the-future-of-manufacturing.pdf) offers another good definition:
> Remote monitoring of dynamic condition of machines with help of sensor data and big data analytics to predict maintenance and repair situations. This helps to increase resource availability and optimize maintenance efforts.

### Meta Data

Meta data (e.g. column descriptions) is prepared and stored in an external Excel file.

### Datasets

We will be using two datasets. The first one deals with a machine that produces parts. The second one covers the metro train system in Porto, Portugal.

## Case 1: Machine maintenance

### Data

#### Description

The first dataset contains data from a parts-machining process. It is one of the few suitable datasets on the topic that I have managed to find. The data is not measured from an actual process; it was artificially generated using a set of rules. Here are the [description](https://archive.ics.uci.edu/ml/datasets/AI4I+2020+Predictive+Maintenance+Dataset), [paper](https://ieeexplore.ieee.org/document/9253083), and [data](https://archive.ics.uci.edu/ml/machine-learning-databases/00601/ai4i2020.csv).

#### Columns

The original columns are read from the Excel meta data file. A `group` column is used for easier filtering later. A full description of the variables and how they are generated is in the data description (see link above).

In [4]:
machine_meta = pd.read_excel(META_DATA_MACHINE, sheet_name = "features")

In [6]:
original_columns = machine_meta[["original_name", "group", "description_short"]]
original_columns

| | original_name | group | description_short |
|---|---|---|---|
| 0 | UDI | na | observation identifier |
| 1 | Product ID | product | product numberc (type + unique serial number) |
| 2 | Type | product | product type (H, M, L) |
| 3 | Air temperature [K] | process | ambient air temperature in K |
| 4 | Process temperature [K] | process | process temperature in K |
| 5 | Rotational speed [rpm] | process | motor speed in rpm |
| 6 | Torque [Nm] | process | motor torque in Nm |
| 7 | Tool wear [min] | process | accumulated machining tool wear in min |
| 8 | Machine failure | failure | flag that any of the 5 failures has occurred |
| 9 | TWF | failure | tool wear failure |


### Clean and tidy

#### Read

In [9]:
machine = pd.read_csv(INPUT_DATA_MACHINE)

#### Get to know

In [10]:
machine.columns

Index(['UDI', 'Product ID', 'Type', 'Air temperature [K]',
       'Process temperature [K]', 'Rotational speed [rpm]', 'Torque [Nm]',
       'Tool wear [min]', 'Machine failure', 'TWF', 'HDF', 'PWF', 'OSF',
       'RNF'],
      dtype='object')

In [11]:
machine.dtypes

UDI                          int64
Product ID                  object
Type                        object
Air temperature [K]        float64
Process temperature [K]    float64
Rotational speed [rpm]       int64
Torque [Nm]                float64
Tool wear [min]              int64
Machine failure              int64
TWF                          int64
HDF                          int64
PWF                          int64
OSF                          int64
RNF                          int64
dtype: object

In [12]:
machine.shape

(10000, 14)

In [13]:
machine.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| UDI | 10000.0 | 5000.5 | 2886.89568 | 1.0 | 2500.75 | 5000.5 | 7500.25 | 10000.0 |
| Air temperature [K] | 10000.0 | 300.00493 | 2.000259 | 295.3 | 298.3 | 300.1 | 301.5 | 304.5 |
| Process temperature [K] | 10000.0 | 310.00556 | 1.483734 | 305.7 | 308.8 | 310.1 | 311.1 | 313.8 |
| Rotational speed [rpm] | 10000.0 | 1538.7761 | 179.284096 | 1168.0 | 1423.0 | 1503.0 | 1612.0 | 2886.0 |
| Torque [Nm] | 10000.0 | 39.98691 | 9.968934 | 3.8 | 33.2 | 40.1 | 46.8 | 76.6 |
| Tool wear [min] | 10000.0 | 107.951 | 63.654147 | 0.0 | 53.0 | 108.0 | 162.0 | 253.0 |
| Machine failure | 10000.0 | 0.0339 | 0.180981 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| TWF | 10000.0 | 0.0046 | 0.067671 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| HDF | 10000.0 | 0.0115 | 0.106625 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| PWF | 10000.0 | 0.0095 | 0.097009 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |


#### Select features



In [None]:
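# Keep only the columns flagged "yes" in the meta data.
# .copy() makes later column assignments act on an independent frame.
columns_to_keep = machine_meta[machine_meta.keep == "yes"].original_name
machine = machine[columns_to_keep].copy()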

In [16]:
machine_meta[machine_meta.keep == "no"].original_name

0           UDI
1    Product ID
Name: original_name, dtype: object

#### Check for missing values

No missing values.

In [12]:
machine.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 12 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Type                     10000 non-null  object 
 1   Air temperature [K]      10000 non-null  float64
 2   Process temperature [K]  10000 non-null  float64
 3   Rotational speed [rpm]   10000 non-null  int64  
 4   Torque [Nm]              10000 non-null  float64
 5   Tool wear [min]          10000 non-null  int64  
 6   Machine failure          10000 non-null  int64  
 7   TWF                      10000 non-null  int64  
 8   HDF                      10000 non-null  int64  
 9   PWF                      10000 non-null  int64  
 10  OSF                      10000 non-null  int64  
 11  RNF                      10000 non-null  int64  
dtypes: float64(3), int64(8), object(1)
memory usage: 937.6+ KB


In [13]:
machine.isna().sum()

Type                       0
Air temperature [K]        0
Process temperature [K]    0
Rotational speed [rpm]     0
Torque [Nm]                0
Tool wear [min]            0
Machine failure            0
TWF                        0
HDF                        0
PWF                        0
OSF                        0
RNF                        0
dtype: int64

### Rename columns

In [14]:
new_column_names = machine_meta[machine_meta.keep == "yes"].new_name
new_column_names.name = ""
machine.columns = new_column_names
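The meta-data-driven selection and renaming can also be expressed as an explicit old-to-new mapping, which does not depend on the column order matching between the data and the meta data. This is a minimal sketch only: the `meta` and `raw` frames are small, hypothetical in-memory stand-ins for the Excel sheet and the CSV file, and the `new_name` value for `UDI` is invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the "features" sheet of the meta data file.
meta = pd.DataFrame({
    "original_name": ["UDI", "Type", "Torque [Nm]"],
    "new_name": ["udi", "product_type", "torque"],  # "udi" is an invented name
    "keep": ["no", "yes", "yes"],
})

# Hypothetical stand-in for the raw data (toy values).
raw = pd.DataFrame({
    "UDI": [1, 2],
    "Type": ["M", "L"],
    "Torque [Nm]": [40.0, 45.0],
})

# Build an explicit mapping from original names to new names for kept columns.
kept = meta[meta["keep"] == "yes"]
name_map = dict(zip(kept["original_name"], kept["new_name"]))

# Select the kept columns and rename them by label, not by position.
tidy = raw[list(name_map)].rename(columns=name_map)
```

Renaming by label means a reordered meta data sheet cannot silently attach the wrong name to a column.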

### Convert types

Convert temperatures from K to °C.

In [15]:
machine.columns

Index(['product_type', 'air_temperature', 'process_temperature', 'speed',
       'torque', 'tool_wear', 'failure_any', 'failure_toolwear',
       'failure_heatdissipation', 'failure_power', 'failure_overstrain',
       'failure_random'],
      dtype='object', name='')

In [16]:
machine[["air_temperature", "process_temperature"]] = machine[["air_temperature", "process_temperature"]] - 273.15

Convert to category.

In [17]:
machine.product_type = machine.product_type.astype("category")
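If the three product types are treated as quality levels (L = low, M = medium, H = high, as stated in the dataset description), an ordered categorical type keeps that ranking available for comparisons and sorting. The snippet below is a sketch on a toy series, not on the notebook's `machine` frame.

```python
import pandas as pd

# Toy stand-in for the product_type column.
product_type = pd.Series(["M", "L", "L", "H", "M"])

# Assumed ordering: L < M < H (low, medium, high quality).
quality = pd.CategoricalDtype(categories=["L", "M", "H"], ordered=True)
product_type = product_type.astype(quality)

# Ordered categories support comparisons against a category label.
above_low = product_type > "L"
```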

### Create new feature `elapsed_time`
Cumulative machining time in minutes. `tool_wear` restarts from 0 whenever the tool is replaced, so the running total carries over the elapsed time reached just before each reset.

In [18]:
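elapsed_time_data = [0]
start_duration = 0  # elapsed time accumulated before the current tool was fitted

for t in machine.tool_wear[1:]:
    if t == 0:
        # Tool was replaced: carry over the total reached so far
        start_duration = elapsed_time_data[-1]
    elapsed_time_data.append(t + start_duration)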

In [19]:
machine["elapsed_time"] = pd.to_timedelta(elapsed_time_data, unit = "minutes")
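The same running total can be computed without a Python loop. The sketch below is an alternative formulation, not the notebook's method: the helper name `elapsed_minutes` is hypothetical, and it treats any drop in `tool_wear` as a tool replacement, whereas the loop only reacts to an exact 0. It also starts from the first wear value instead of forcing the first entry to 0.

```python
import pandas as pd

def elapsed_minutes(tool_wear: pd.Series) -> pd.Series:
    """Cumulative machining time across tool replacements (hypothetical helper)."""
    previous = tool_wear.shift(1, fill_value=0)
    # A reset is any row where wear drops relative to the previous row.
    is_reset = tool_wear < previous
    # At each reset, add the wear reached by the old tool to a running offset.
    offset = previous.where(is_reset, 0).cumsum()
    return tool_wear + offset

# Toy wear readings with two tool replacements (to 0, then to 1).
wear = pd.Series([0, 3, 5, 0, 2, 4, 1, 6])
elapsed = elapsed_minutes(wear)
```

For wear sequences that always restart at exactly 0 and begin at 0, this gives the same values as the loop; the result can be passed to `pd.to_timedelta(..., unit="minutes")` in the same way.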

### Create new feature `power`
Mechanical motor power in watts, calculated as the product of shaft angular speed (in radians per second) and torque (in newton-meters).

In [20]:
# Convert speed from rpm to rad/s (one revolution = 2*pi rad, one minute = 60 s)
speed_rad = machine.speed * 2 * np.pi / 60

# Mechanical power in watts: P = omega * T
machine["power"] = speed_rad * machine.torque
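As a quick sanity check of the unit conversion, the calculation can be worked through for a single operating point. The helper name `shaft_power_watts` and the values 1500 rpm and 40 Nm are illustrative choices, not taken from the dataset.

```python
import math

def shaft_power_watts(speed_rpm: float, torque_nm: float) -> float:
    """Mechanical power P = omega * T, with omega in rad/s (hypothetical helper)."""
    omega = speed_rpm * 2 * math.pi / 60  # rpm -> rad/s
    return omega * torque_nm

# 1500 rpm = 50*pi rad/s, so 40 Nm gives 2000*pi W, roughly 6.28 kW.
power = shaft_power_watts(1500, 40)

# The conversion factor 2*pi/60 is approximately 0.104719755.
rpm_to_rad_per_s = 2 * math.pi / 60
```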

### Reorder columns

In [21]:
machine = machine[[
    "elapsed_time",
    "product_type",
    "air_temperature",
    "process_temperature",
    "speed",
    "torque",
    "power",
    "tool_wear",
    "failure_any",
    "failure_toolwear",
    "failure_heatdissipation",
    "failure_power",
    "failure_overstrain",
    "failure_random",
]]

### Explore

### Test hypothesis

### Model


## Case 2: Metro in Porto

Column description:

Articles and papers I came across during the research:
- [Fault diagnosis for the Space Shuttle main engine](https://www.researchgate.net/publication/23618683_Fault_diagnosis_for_the_Space_Shuttle_main_engine)
- [AI-Automated Hard-Hat Detection](https://assets-global.website-files.com/618cdeef45d18e4ef2fd85f3/621cef628758fd1c35be832b_AI-Automated-Hard-Hat-Detection.pdf)
- [Machine learning algorithm predicts how to get the most out of electric vehicle batteries](https://www.cam.ac.uk/research/news/machine-learning-algorithm-predicts-how-to-get-the-most-out-of-electric-vehicle-batteries)
- ! [Machine Learning Application in the Manufacturing Industry](https://mobidev.biz/blog/machine-learning-application-use-cases-manufacturing-industry)
- [Using Python and Selenium to get coordinates from street addresses](https://towardsdatascience.com/using-python-and-selenium-to-get-coordinates-from-street-addresses-62706b6ac250)
- [Smoothing time series in Pandas](https://www.mikulskibartosz.name/smoothing-time-series-in-pandas/)

Videos:
- [AI Inspection: Machine Learning / Computer Vision for Visual Defect Detection](https://youtu.be/UY6xbrcViVw)

nondestructive testing, sensors

Predict machinery failure using sensor data in order to perform maintenance proactively and reduce downtime.

From [here](https://mobidev.biz/blog/machine-learning-application-use-cases-manufacturing-industry):
> ML MODELS USED FOR PREDICTIVE MAINTENANCE
> - Regression Models: these predict the Remaining Useful Life (RUL) of the equipment. This uses historical and static data and allows manufacturers to see how many days are left until the machine experiences a failure.
> - Classification Models: these models predict failures within a predefined time span.
> - Anomaly Detection Models: These flag devices upon detecting abnormal system behavior.
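Of the three model families quoted above, the classification variant is the simplest to sketch. The example below is illustrative only: it uses synthetic data generated on the spot (not either of the notebook's datasets), and the failure rule `torque * tool_wear > 9000` is a made-up assumption chosen just to produce a rare positive class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

RANDOM_STATE = 26
rng = np.random.default_rng(RANDOM_STATE)

# Synthetic stand-in features (NOT real measurements).
n_samples = 2000
torque = rng.normal(40, 10, n_samples)
tool_wear = rng.uniform(0, 250, n_samples)
X = np.column_stack([torque, tool_wear])

# Hypothetical failure rule, used only to create labels for the sketch.
y = (torque * tool_wear > 9000).astype(int)

# Stratify so that the rare failure class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=RANDOM_STATE
)

# Scale features, then fit a classifier that reweights the rare class.
model = make_pipeline(
    MinMaxScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_train, y_train)

# Predicted probability of failure for each held-out observation.
failure_probability = model.predict_proba(X_test)[:, 1]
accuracy = model.score(X_test, y_test)
```

With failures this rare, plain accuracy can look good even for a model that never predicts a failure, so recall or precision on the failure class is usually the more informative metric.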