<a href="https://colab.research.google.com/github/sachinkun21/Predictive_Maintenance_using_ML_Microsoft_CaseStudy/blob/master/CaculateTimeSinceLastTransaction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


### Days Since Last Replacement from Maintenance
A crucial data set in this example is the maintenance records, which log every component replacement. One possible feature from this data set is the number of replacements of each component in the last 3 months, which captures how frequently components are replaced. A more relevant feature, however, is the time since each component was last replaced: the longer a component has been in use, the more degradation is expected, so this value should correlate better with component failures.

As a side note, creating lagging features from maintenance data is not as straightforward as for telemetry and errors, so the features from this data are generated in a more custom way. This type of ad-hoc feature engineering is very common in predictive maintenance since domain knowledge plays a big role in understanding the predictors of a problem. In the following, the days since last component replacement are calculated for each component type as features from the maintenance data.
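
The replacement-frequency feature mentioned above is not computed in this notebook. As a rough sketch of how it could be derived, the snippet below counts replacements per machine and component in a trailing 90-day window. The toy log, the helper name `replacements_in_window`, and the 90-day window are assumptions made for illustration only; they are not part of the original data or workflow.

```python
import pandas as pd

# Hypothetical toy maintenance log with the same columns as PdM_maint.csv.
toy_maint = pd.DataFrame({
    'datetime': pd.to_datetime(['2019-01-05', '2019-02-04', '2019-03-06',
                                '2019-06-20', '2019-01-20']),
    'machineID': [1, 1, 1, 1, 2],
    'comp': ['comp1', 'comp1', 'comp1', 'comp1', 'comp2'],
})

def replacements_in_window(log, window='90D'):
    """Count replacements per machine and component in a trailing time window."""
    # Each log row is one replacement, so a rolling sum of ones is a count.
    log = log.sort_values('datetime').assign(n=1)
    counts = (log.set_index('datetime')
                 .groupby(['machineID', 'comp'])['n']
                 .rolling(window)
                 .sum())
    return counts.rename('replacements_in_window').reset_index()

freq = replacements_in_window(toy_maint)
print(freq)
```

Each output row corresponds to one replacement event and reports how many replacements of that component, including the current one, fell within the preceding 90 days on that machine.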

In [1]:
import pandas as pd
import numpy as np

from google.colab import drive
drive.mount('/gdrive')
%cd /gdrive

maint = pd.read_csv('My Drive/PDM_using_ML_Microsoft_CaseStudy/data/PdM_maint.csv')
maint.head()

Drive already mounted at /gdrive; to attempt to forcibly remount, call drive.mount("/gdrive", force_remount=True).
/gdrive


Unnamed: 0,datetime,machineID,comp
0,2018-06-01 06:00:00,1,comp2
1,2018-07-16 06:00:00,1,comp4
2,2018-07-31 06:00:00,1,comp3
3,2018-12-13 06:00:00,1,comp1
4,2019-01-05 06:00:00,1,comp4


In [2]:
maint.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3286 entries, 0 to 3285
Data columns (total 3 columns):
datetime     3286 non-null object
machineID    3286 non-null int64
comp         3286 non-null object
dtypes: int64(1), object(2)
memory usage: 77.1+ KB


In [0]:
maint['datetime'] = pd.to_datetime(maint['datetime'], format='%Y-%m-%d %H:%M:%S')

In [4]:
# create a column for each component type
comp_rep = pd.get_dummies(maint.set_index('datetime')).reset_index()
comp_rep.columns = ['datetime', 'machineID', 'comp1', 'comp2', 'comp3', 'comp4']

# combine repairs for a given machine in a given hour
comp_rep = comp_rep.groupby(['machineID', 'datetime']).sum().reset_index()
comp_rep.head()

Unnamed: 0,machineID,datetime,comp1,comp2,comp3,comp4
0,1,2018-06-01 06:00:00,0,1,0,0
1,1,2018-07-16 06:00:00,0,0,0,1
2,1,2018-07-31 06:00:00,0,0,1,0
3,1,2018-12-13 06:00:00,1,0,0,0
4,1,2019-01-05 06:00:00,1,0,0,1
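
To see why the `groupby(...).sum()` step is needed, consider a machine that has two components replaced at the same timestamp. The sketch below uses a small made-up frame (the `toy` data is an assumption, not taken from the data set) and shows the two log rows collapsing into a single row with two indicators set.

```python
import pandas as pd

# Hypothetical log: comp1 and comp4 are replaced on machine 1 at the same time.
toy = pd.DataFrame({
    'datetime': pd.to_datetime(['2019-01-05 06:00', '2019-01-05 06:00',
                                '2019-01-20 06:00']),
    'machineID': [1, 1, 1],
    'comp': ['comp1', 'comp4', 'comp3'],
})

# One 0/1 indicator column per component that appears in the log.
indicators = pd.get_dummies(toy, columns=['comp'], prefix='', prefix_sep='', dtype=int)

# Rows sharing a machine and timestamp are merged by summing their indicators.
combined = indicators.groupby(['machineID', 'datetime'], as_index=False).sum()
print(combined)
```

Only components present in the toy log get a column here, which is why `comp2` is absent from this small example.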


### Generate Date Time and Merge

In [5]:
components = ['comp1', 'comp2', 'comp3', 'comp4']

for comp in components:
    # keep the replacement time (Unix timestamp in nanoseconds) on rows where
    # this component was replaced; every other row becomes NaN
    replaced = comp_rep[comp] >= 1
    comp_rep[comp] = comp_rep['datetime'].astype('int64').where(replaced)

    # forward-fill the most recent date of component change
    comp_rep[comp] = comp_rep[comp].ffill()


comp_rep.head()

Unnamed: 0,machineID,datetime,comp1,comp2,comp3,comp4
0,1,2018-06-01 06:00:00,,1527832800000000000,,
1,1,2018-07-16 06:00:00,,1527832800000000000,,1.531721e+18
2,1,2018-07-31 06:00:00,,1527832800000000000,1.533017e+18,1.531721e+18
3,1,2018-12-13 06:00:00,1.544681e+18,1527832800000000000,1.533017e+18,1.531721e+18
4,1,2019-01-05 06:00:00,1.546668e+18,1527832800000000000,1.533017e+18,1.546668e+18
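
The forward fill above runs over the whole frame, so the first rows of one machine can inherit the last replacement date of the machine sorted before it. A possible alternative, shown here only as a sketch on made-up data, is to forward-fill within each `machineID` group. The `toy` frame and the column names `plain_ffill` and `grouped_ffill` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frame: comp1 is 1 on rows where the component was replaced.
toy = pd.DataFrame({
    'machineID': [1, 1, 2, 2],
    'datetime': pd.to_datetime(['2018-06-01', '2019-01-05',
                                '2018-07-01', '2019-02-01']),
    'comp1': [0, 1, 0, 1],
})

# Replacement time on replacement rows, NaT elsewhere.
last_change = toy['datetime'].where(toy['comp1'] == 1)

# A plain forward fill carries machine 1's date into machine 2's first row.
toy['plain_ffill'] = last_change.ffill()

# A grouped forward fill restarts for every machine.
toy['grouped_ffill'] = last_change.groupby(toy['machineID']).ffill()
print(toy)
```

In the plain version, the 2018-07-01 row of machine 2 receives a date from 2019, which lies in its future; the grouped version leaves that row missing instead.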


In [6]:
# remove rows from 2018: they may hold NaN (no replacement recorded yet) or a date
# forward-filled from the previous machine, which lies in the future
comp_rep = comp_rep.loc[comp_rep['datetime'] > pd.to_datetime('2019-01-01')].copy()
comp_rep.head()

Unnamed: 0,machineID,datetime,comp1,comp2,comp3,comp4
4,1,2019-01-05 06:00:00,1.546668e+18,1527832800000000000,1.533017e+18,1.546668e+18
5,1,2019-01-20 06:00:00,1.547964e+18,1527832800000000000,1.547964e+18,1.546668e+18
6,1,2019-02-04 06:00:00,1.547964e+18,1527832800000000000,1.54926e+18,1.54926e+18
7,1,2019-02-19 06:00:00,1.547964e+18,1527832800000000000,1.550556e+18,1.54926e+18
8,1,2019-03-06 06:00:00,1.551852e+18,1527832800000000000,1.550556e+18,1.54926e+18


In [7]:
# replace dates of most recent component change with the number of days since that change
ns_per_day = 24 * 60 * 60 * 1_000_000_000
for comp in components:
    comp_rep[comp] = (comp_rep['datetime'].astype('int64') - comp_rep[comp]) / ns_per_day
comp_rep.head()

Unnamed: 0,machineID,datetime,comp1,comp2,comp3,comp4
4,1,2019-01-05 06:00:00,0.0,218.0,158.0,0.0
5,1,2019-01-20 06:00:00,0.0,233.0,0.0,15.0
6,1,2019-02-04 06:00:00,15.0,248.0,0.0,0.0
7,1,2019-02-19 06:00:00,30.0,263.0,0.0,15.0
8,1,2019-03-06 06:00:00,0.0,278.0,15.0,30.0
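
The integer arithmetic above relies on timestamps being stored in nanoseconds. If the last replacement dates are kept as datetimes instead, the same feature can be obtained by dividing a time difference by a one-day `Timedelta`. The following is a minimal sketch under that assumption; the `toy` frame and the column names `comp1_last_change` and `comp1_days` are hypothetical.

```python
import pandas as pd

# Hypothetical frame holding the record time and the last comp1 replacement time.
toy = pd.DataFrame({
    'datetime': pd.to_datetime(['2019-01-05 06:00', '2019-01-20 06:00',
                                '2019-02-04 06:00']),
    'comp1_last_change': pd.to_datetime(['2019-01-05 06:00', '2019-01-20 06:00',
                                         '2019-01-20 06:00']),
})

# Subtracting datetimes gives a Timedelta; dividing by one day gives float days.
toy['comp1_days'] = (toy['datetime'] - toy['comp1_last_change']) / pd.Timedelta(days=1)
print(toy)
```

This form does not depend on the unit used to store the timestamps and keeps missing values as `NaN` in the result.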
