# Predictive maintenance 101

author: Junyoung Park at KAIST

<br/>
<center><img src="img/jet_engine.jpg" width="400" height="400"><em>Jet engine on test</em></center>

Predictive maintenance techniques help determine the condition of in-service equipment in order to estimate when maintenance should be performed. Such techniques can help businesses cut the cost of regular equipment inspections, or schedule and prepare replacements ahead of an expected failure.

In this project, we will use a jet engine run-to-failure dataset that contains multiple failure trajectories of jet engines, each recorded with 21 sensors and 3 controllable settings. The primary goal of the project is to build an algorithm that predicts the time of failure from the historical sensor observations and controllable inputs. To simplify the problem, we will assume that the data is deterministic.

In [2]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt

## Data description

The jet engine run-to-failure dataset contains 100 realizations of jet engine failures. Each failure trajectory was measured at __evenly spaced__ observation intervals.
* __id__ and __cycle__ give the experiment id and the cycle number within that experiment.
* __setting i__ (i = 1, 2, 3) indicates the __controllable__ operational conditions of each experiment at the specific cycle.
* __s i__ (i = 1, 2, ..., 21) are sensor measurements.
* __ttf__ (= time-to-failure) denotes how many cycles remain before failure.
* __broken__ is a binary indicator that shows whether the jet engine is broken.

In this project, we will assume that the state of the jet engine is fully inferable from the trajectory of __setting i__ and __s i__.
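To make the column definitions concrete, here is a minimal sketch on a toy table. It assumes that the last recorded cycle of each engine is its failure cycle, so that __ttf__ is the number of cycles left until that point and __broken__ is 1 only there. The helper `add_ttf_labels` and the toy values are hypothetical; the real file may define these columns differently.

```python
import pandas as pd

def add_ttf_labels(frame):
    """Derive ttf and broken from the cycle counter of each engine id.

    Assumption: the last recorded cycle of an engine is its failure cycle.
    """
    out = frame.copy()
    # Last cycle observed for the engine each row belongs to.
    last_cycle = out.groupby('id')['cycle'].transform('max')
    out['ttf'] = last_cycle - out['cycle']
    out['broken'] = (out['ttf'] == 0).astype(int)
    return out

# Toy data: engine 1 runs for 3 cycles, engine 2 for 2 cycles.
toy = pd.DataFrame({
    'id':    [1, 1, 1, 2, 2],
    'cycle': [1, 2, 3, 1, 2],
})
labeled = add_ttf_labels(toy)
print(labeled)
```

Comparing such derived columns with the ones shipped in the dataset is a quick sanity check on how __ttf__ and __broken__ were actually constructed.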

In [16]:
df = pd.read_csv('C:/data/project/jet_engine_dataset.csv')
df.head(5)

FileNotFoundError: File b'C:/data/project/jet_engine_dataset.csv' does not exist

In [10]:
experiment_indices = df['id'].unique()
print("Number of experiments: {}".format(len(experiment_indices)))

NameError: name 'df' is not defined

In [None]:
visualize_n = 20

fig, ax = plt.subplots(1,1, figsize=(15,5))
for i in experiment_indices[:visualize_n]:
    id_cond = df['id'] == i
    brk_cond = df['broken'] == 1
    _plot = ax.plot(df[id_cond]['cycle'], df[id_cond]['ttf'], color='blue')
    _scatter = ax.scatter(df[id_cond & brk_cond]['cycle'], df[id_cond & brk_cond]['ttf'], c='orange', label='broken')
ax.set_title('Jet engine run-to-failure experiments ({} experiments)'.format(visualize_n))
ax.set_xlabel('cycles')
ax.set_ylabel('time-to-failure (residual cycles)')
ax.legend((_scatter,), ('failure',))
plt.show()

## Task description

Build an algorithm that predicts when a given jet engine will fail, i.e., one that predicts the remaining useful lifetime __(RUL)__ of the jet engine.

Ways to earn additional credit:
* Build an algorithm that infers the state of the jet engine from sensor measurements only. One possible way is to train a classification algorithm that classifies whether the jet engine has failed or not.

* Perform __ROC__ (Receiver Operating Characteristic) analysis with the trained classifier.
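
As a rough illustration of the ROC analysis mentioned above, the sketch below sweeps a decision threshold over classifier scores and computes the area under the resulting curve with NumPy only. The function names `roc_points` and `auc` and the example labels and scores are hypothetical; in practice scikit-learn's `roc_curve` and `roc_auc_score` offer the same analysis.

```python
import numpy as np

def roc_points(y_true, scores):
    """Return (fpr, tpr) obtained by sweeping a threshold over the scores."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    # Sort samples from the highest score to the lowest.
    order = np.argsort(-scores, kind='stable')
    y_sorted = y_true[order]
    s_sorted = scores[order]
    # Keep one point per distinct score so that ties are handled together.
    distinct = np.r_[np.where(np.diff(s_sorted) != 0)[0], len(s_sorted) - 1]
    tps = np.cumsum(y_sorted == 1)[distinct]
    fps = np.cumsum(y_sorted == 0)[distinct]
    # Prepend the (0, 0) point of the curve.
    tpr = np.concatenate(([0.0], tps / tps[-1]))
    fpr = np.concatenate(([0.0], fps / fps[-1]))
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Hypothetical labels (1 = broken) and classifier scores.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y, p)
roc_auc = auc(fpr, tpr)
print(roc_auc)
```

Since the __broken__ class is likely to be rare compared with healthy cycles, it may be worth reporting the whole curve rather than a single accuracy number.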


## Scoring guide

We will __not measure__ the performance of the prediction algorithms. Instead, our main concerns are:

* Was proper data preparation done? (if necessary, data cleansing and feature engineering)
* Was a preliminary analysis of the data done? (either qualitative or quantitative methods are fine)
* A logical explanation of the model selection, including the neural network architecture
* Selection of properly designed loss functions
* Analysis of the training results
* Is the trained model __practically beneficial__?
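
As one possible reading of the loss-function item above, the sketch below shows an asymmetric squared error for RUL regression. It assumes that overestimating the remaining life (predicting failure too late) is costlier than underestimating it; the function name `asymmetric_rul_loss` and the `late_weight` value are hypothetical choices, not part of the assignment.

```python
import numpy as np

def asymmetric_rul_loss(rul_pred, rul_true, late_weight=2.0):
    """Mean squared error that weights overestimated RUL more heavily.

    Assumption: err > 0 means the model predicts more remaining life
    than there really is, which is treated as the riskier mistake.
    """
    err = np.asarray(rul_pred, dtype=float) - np.asarray(rul_true, dtype=float)
    weights = np.where(err > 0, late_weight, 1.0)
    return float(np.mean(weights * err ** 2))

# Same absolute error of 10 cycles, different direction.
too_optimistic = asymmetric_rul_loss([60], [50])   # predicts failure too late
too_cautious = asymmetric_rul_loss([40], [50])     # predicts failure too early
print(too_optimistic, too_cautious)
```

Whether such an asymmetry is appropriate depends on the relative cost of an unexpected failure versus an early replacement, which should be argued in the report.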


__Practically beneficial__ means that you can deploy your model in practice. For instance, at any given moment you cannot observe future sensor measurements. However, you can assume that the controllable inputs are in your hands. Therefore, your model cannot use future sensor observations when it estimates the time-to-failure at runtime, but it can use the future trajectory of the controllable inputs.
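
The constraint above can be sketched as an input builder that only looks backwards for sensors but may look forwards for settings. The function `build_causal_input`, the window lengths, and the array shapes are assumptions for illustration; the toy arrays simply stand in for one engine's trajectory.

```python
import numpy as np

def build_causal_input(sensors, settings, t, past_window, future_horizon):
    """Assemble model inputs for decision time t.

    sensors:  array of shape (T, n_sensors), one row per cycle
    settings: array of shape (T, n_settings), one row per cycle
    Only sensor rows up to and including t are used; settings may also
    include the planned rows t+1 .. t+future_horizon.
    """
    start = t - past_window + 1
    if start < 0 or t + future_horizon >= len(settings):
        raise ValueError("window does not fit inside the trajectory")
    past_sensors = sensors[start:t + 1]
    past_settings = settings[start:t + 1]
    future_settings = settings[t + 1:t + 1 + future_horizon]
    return past_sensors, past_settings, future_settings

# Toy trajectory: 10 cycles, 21 sensors, 3 settings.
sensors = np.arange(10 * 21).reshape(10, 21)
settings = np.arange(10 * 3).reshape(10, 3)
past_s, past_u, future_u = build_causal_input(
    sensors, settings, t=4, past_window=3, future_horizon=2)
print(past_s.shape, past_u.shape, future_u.shape)
```

Any feature derived from sensor rows after `t`, including normalisation statistics computed over the whole trajectory, would leak future information and make the model impossible to deploy as described.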

In [1]:
df['s1']

NameError: name 'df' is not defined

In [None]:
def preprocess(df):
    df = drop_features(df)
    df = fill_age(df)
    df = fill_embarked_cabin(df)
    df = fill_numbers(df)
    df = simplify_cabin(df)
    df = encode_features(df, ['Sex', 'Cabin', 'Embarked'])
    
    return df
    
def drop_features(df):
    return df.drop(['Name', 'Ticket'], axis = 1)

def fill_age(df):
    df['Age'] = df['Age'].fillna(df['Age'].median())
    return df

def fill_embarked_cabin(df):
    df['Cabin'] = df['Cabin'].fillna('N')
    df['Embarked'] = df['Embarked'].fillna('N')
    return df

def fill_numbers(df):
    for col in df.select_dtypes(['float', 'int']).columns:
        df[col] = df[col].fillna(df[col].mean())
   
    return df

def simplify_cabin(df):
    df['Cabin'] = df['Cabin'].apply(lambda x: x[0])
    return df
