## Data Preprocessing for the PHM08 Data Set
In this notebook, we create two types of training datasets for the PHM08 data set. The first is for a regression (or time-series forecasting) model and has a `RUL` (remaining useful lifetime) column. The second is for a classification model and has a `FLAG` column, where `"1"` marks a cycle within 30 cycles of failure and `"0"` marks all other cycles. Both columns are added to the same table, which is saved as a single CSV file.
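Because both targets end up in one table, each model needs its own features and target column. The sketch below shows one way to separate them. It runs on a tiny made-up frame (`toy`) that follows the notebook's rules (RUL is the unit's last cycle minus the current cycle, and FLAG is `"1"` when RUL <= 30). The helper `split_targets` is hypothetical and not part of the original notebook.

```python
import pandas as pd

# Made-up data in the notebook's final layout (not real PHM08 values).
toy = pd.DataFrame({
    "unit": ["1", "1", "1", "2", "2"],
    "time": [1, 16, 46, 1, 32],
    "sensor1": [0.1, 0.2, 0.3, 0.4, 0.5],
    "RUL": [45, 30, 0, 31, 0],           # last cycle of the unit minus current cycle
    "FLAG": ["0", "1", "1", "0", "1"],   # "1" when RUL <= 30
})

def split_targets(frame):
    """Hypothetical helper: return (features, regression target, classification target)."""
    features = frame.drop(columns=["RUL", "FLAG"])
    return features, frame["RUL"], frame["FLAG"]

X, y_rul, y_flag = split_targets(toy)
print(X.columns.tolist())  # ['unit', 'time', 'sensor1']
```

Dropping both target columns from the features matters: `RUL` and `FLAG` are derived from each other, so leaving one in while predicting the other would leak the answer.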

In [None]:
# load the raw training data (space-separated, no header row)
import pandas as pd
import numpy as np
names = (["unit", "time", "settings1", "settings2", "settings3"]
         + [f"sensor{i}" for i in range(1, 22)]   # sensor1 ... sensor21
         + ["dummy1", "dummy2"])
df = pd.read_csv("../data/raw/PHM08/train.txt", delimiter=" ", header=None, names=names, index_col=None)
df.head(10)
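The `dummy1` and `dummy2` columns exist because a single-space delimiter turns trailing spaces at the end of each line into empty fields. The sketch below reproduces this on two synthetic rows. It assumes the raw file ends each line with two spaces, which is inferred from the two dummy columns and not verified against the real file. It also shows the whitespace-regex alternative, `sep=r"\s+"`, which ignores trailing whitespace.

```python
import io
import pandas as pd

# Two synthetic rows: space-separated values plus two trailing spaces
# (an assumed layout, not a copy of the real train.txt).
raw = "1 1 0.5  \n1 2 0.7  \n"

# A single-space delimiter yields two extra, all-NaN columns.
single = pd.read_csv(io.StringIO(raw), delimiter=" ", header=None)
print(single.shape)  # (2, 5)

# A whitespace regex collapses runs of spaces and drops trailing whitespace.
collapsed = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)
print(collapsed.shape)  # (2, 3)
```

With `sep=r"\s+"` the `names` list would need to omit the two dummy entries, and the drop step that follows would become unnecessary.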

In [None]:
# drop the two empty placeholder columns (dummy1, dummy2)
df = df.drop(["dummy1", "dummy2"], axis="columns")
print(df.columns)

In [None]:
# treat unit IDs as strings (identifiers, not quantities), then check column types
df["unit"] = df["unit"].astype(str)
print(df.dtypes)

In [None]:
# total number of units
unit_list = df["unit"].unique()
total_unit = len(unit_list)
print("Total number of units:", total_unit)

In [None]:
# explore the data of unit 1
df[df["unit"] == "1"]

In [None]:
# lifetime (last recorded cycle) of each unit; assumes units are numbered 1..total_unit
life = []
for x in range(total_unit):
    filtered = df[df["unit"] == str(x+1)]
    life.append(filtered["time"].max())
print(life)
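The loop above assumes that unit IDs run contiguously from 1 to `total_unit`. A `groupby` does not need that assumption. The sketch below uses a made-up frame (`toy`) with string IDs, as in the notebook, and also shows a side effect of storing IDs as strings: sorted group keys come out in lexicographic order.

```python
import pandas as pd

# Made-up frame with string unit IDs, as in the notebook.
toy = pd.DataFrame({
    "unit": ["1", "1", "2", "2", "2", "10"],
    "time": [1, 2, 1, 2, 3, 1],
})

# Last observed cycle per unit, keeping units in order of first appearance.
toy_life = toy.groupby("unit", sort=False)["time"].max()
print(toy_life.to_dict())  # {'1': 2, '2': 3, '10': 1}

# With the default sort=True, string keys sort lexicographically.
print(toy.groupby("unit")["time"].max().index.tolist())  # ['1', '10', '2']
```

For the histogram only the values matter, so `toy_life.tolist()` (or the equivalent on the real `df`) can be passed to `plt.hist` directly.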

In [None]:
# visualize the distribution of unit lifetimes
%matplotlib inline
import matplotlib.pyplot as plt

plt.hist(life, bins=20)
plt.xlabel("lifetime (cycles)")
plt.ylabel("number of units")
plt.show()

In [None]:
# create RUL (remaining useful lifetime) column: last cycle of the unit minus the current cycle
for unit in range(1, total_unit + 1):  # unit IDs start at 1
    mask = df["unit"] == str(unit)
    df.loc[mask, "RUL"] = df.loc[mask, "time"].max() - df.loc[mask, "time"]
df.tail()
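The same RUL column can be computed without a Python loop by broadcasting each unit's maximum cycle back onto its rows with `groupby(...).transform("max")`. The sketch below uses a made-up frame (`toy`) so the result is easy to check by hand. It is an alternative to the loop, not the notebook's original code.

```python
import pandas as pd

# Made-up frame: unit 1 has 3 cycles, unit 2 has 2 cycles.
toy = pd.DataFrame({
    "unit": ["1", "1", "1", "2", "2"],
    "time": [1, 2, 3, 1, 2],
})

# RUL = (last cycle of the unit) - (current cycle); 0 at the final recorded cycle.
toy["RUL"] = toy.groupby("unit")["time"].transform("max") - toy["time"]
print(toy["RUL"].tolist())  # [2, 1, 0, 1, 0]
```

One difference from the loop: this version keeps `RUL` as an integer column. The loop creates the column through `.loc` assignment, so pandas fills it with NaN first and stores it as float.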

In [None]:
# FLAG: "1" when the unit is within 30 cycles of failure (RUL <= 30, which includes
# the final cycle at RUL = 0), "0" otherwise
df.loc[df["RUL"] <= 30, "FLAG"] = "1"
df.loc[df["RUL"] > 30, "FLAG"] = "0"
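Because the condition is `RUL <= 30` and RUL reaches 0 on the last cycle, each unit that lived at least 31 cycles gets 31 rows labelled `"1"` (RUL = 30, 29, ..., 0). The sketch below checks this on one made-up 40-cycle unit. It uses `np.where` as a one-line equivalent of the two `.loc` assignments. `THRESHOLD` is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# One made-up unit with 40 cycles, so RUL runs from 39 down to 0.
toy = pd.DataFrame({"unit": "1", "time": np.arange(1, 41)})
toy["RUL"] = toy.groupby("unit")["time"].transform("max") - toy["time"]

THRESHOLD = 30  # same window as the notebook
toy["FLAG"] = np.where(toy["RUL"] <= THRESHOLD, "1", "0")

# Rows flagged "1": RUL = 0, 1, ..., 30
print((toy["FLAG"] == "1").sum())  # 31
```

If exactly 30 positive cycles per unit are wanted instead, the condition would be `RUL < 30`.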

In [None]:
df.head()

In [None]:
df.tail()

In [None]:
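# save both targets in one CSV; create the output directory if it does not exist
import os
os.makedirs("../data/processed", exist_ok=True)
df.to_csv("../data/processed/data.csv", index=False)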
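The notebook stores `unit` and `FLAG` as strings, but a CSV file carries no type information. When the file is read back, `read_csv` parses `"1"`/`"0"` as integers unless told otherwise. The sketch below round-trips a made-up frame through a temporary file to show this. Passing `dtype` is one way to restore the original types, and it is an assumption about how downstream notebooks might load the data.

```python
import os
import tempfile
import pandas as pd

# Made-up frame in the saved layout.
toy = pd.DataFrame({"unit": ["1", "1"], "time": [1, 2], "RUL": [1.0, 0.0], "FLAG": ["1", "1"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")
    toy.to_csv(path, index=False)
    plain = pd.read_csv(path)                                     # FLAG becomes an integer column
    typed = pd.read_csv(path, dtype={"unit": str, "FLAG": str})   # string labels preserved

print(plain["FLAG"].dtype)
print(typed["FLAG"].tolist())  # ['1', '1']
```

Either representation works for most classifiers. Specifying `dtype` just keeps the reloaded data consistent with what this notebook produced.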