## Predictive Maintenance: Remaining Useful Life (RUL) Prediction

### 1. Problem Statement
Build a machine learning system that predicts the Remaining Useful Life (RUL) of turbofan engines using multivariate sensor data from the NASA CMAPSS dataset. The goal is to estimate how many operating cycles an engine has left before failure, enabling proactive maintenance, reduced downtime, and improved operational efficiency.
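
To make the prediction target concrete, here is a minimal sketch of how an RUL label could be derived for run-to-failure training data. It assumes each training engine is recorded until failure, so the RUL at any cycle is the engine's last recorded cycle minus the current cycle. The helper name `add_rul` and the toy frame are hypothetical; the column names `engine_id` and `cycle` match those assigned later in this notebook.

```python
import pandas as pd


def add_rul(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with an 'RUL' column (hypothetical helper).

    Assumption: every engine in df runs to failure, so its last
    recorded cycle marks the failure point.
    """
    out = df.copy()
    # Last recorded cycle of each engine, broadcast back to every row.
    last_cycle = out.groupby("engine_id")["cycle"].transform("max")
    out["RUL"] = last_cycle - out["cycle"]
    return out


# Toy data: engine 1 runs for 3 cycles, engine 2 for 2 cycles.
toy = pd.DataFrame({"engine_id": [1, 1, 1, 2, 2], "cycle": [1, 2, 3, 1, 2]})
labeled = add_rul(toy)
print(labeled)
```

Under this assumption the label counts down to 0 at each engine's final cycle. Other labeling schemes exist; this is only one common starting point.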

### 2. Data Collection
Turbofan Engine Degradation Simulation: https://www.nasa.gov/intelligent-systems-division/discovery-and-systems-health/pcoe/pcoe-data-set-repository/
The engine degradation simulation was carried out using the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS). Four data sets were simulated under different combinations of operational conditions and fault modes. Each data set records several sensor channels to characterize fault evolution. The data was provided by the NASA Ames Prognostics Center of Excellence (PCoE).

#### 2.1 Import Data and Required Packages
Importing necessary libraries

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Convert the raw data files from space-delimited text to CSV format.

Train Dataset:

In [15]:
col_names = [
    "engine_id", "cycle",
    "op_setting_1", "op_setting_2", "op_setting_3",
] + [f"sensor_{i}" for i in range(1, 22)]

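# With sep=" ", trailing spaces on each row produce two extra all-NaN
# columns (26 and 27), which are checked and dropped in the cells below.
train_df = pd.read_csv("../data/CMAPSSData/train_FD001.txt", sep=" ", header=None)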

In [16]:
train_df.shape

(20631, 28)

In [17]:
train_df.columns

Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27],
      dtype='int64')

In [18]:
train_df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,18,19,20,21,22,23,24,25,26,27
0,1,1,-0.0007,-0.0004,100.0,518.67,641.82,1589.7,1400.6,14.62,...,8138.62,8.4195,0.03,392,2388,100.0,39.06,23.419,,
1,1,2,0.0019,-0.0003,100.0,518.67,642.15,1591.82,1403.14,14.62,...,8131.49,8.4318,0.03,392,2388,100.0,39.0,23.4236,,
2,1,3,-0.0043,0.0003,100.0,518.67,642.35,1587.99,1404.2,14.62,...,8133.23,8.4178,0.03,390,2388,100.0,38.95,23.3442,,
3,1,4,0.0007,0.0,100.0,518.67,642.35,1582.79,1401.87,14.62,...,8133.83,8.3682,0.03,392,2388,100.0,38.88,23.3739,,
4,1,5,-0.0019,-0.0002,100.0,518.67,642.37,1582.85,1406.22,14.62,...,8133.8,8.4294,0.03,393,2388,100.0,38.9,23.4044,,


In [19]:
train_df.iloc[:, 26:].isnull().all()

26    True
27    True
dtype: bool

In [20]:
train_df.drop(train_df.columns[[26,27]], axis=1, inplace=True)

In [21]:
train_df.columns = col_names
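
As an alternative to dropping the two empty columns by hand, the file could be parsed as whitespace-delimited, which ignores the trailing spaces and lets the column names be assigned in the same call. The sketch below uses two synthetic rows in the same 26-column layout instead of the real file; the sample values are made up for illustration only.

```python
import io

import pandas as pd

col_names = [
    "engine_id", "cycle",
    "op_setting_1", "op_setting_2", "op_setting_3",
] + [f"sensor_{i}" for i in range(1, 22)]

# Synthetic stand-in for train_FD001.txt: 26 values per row,
# each row ending in trailing spaces like the raw file.
rows = [[1, 1] + [0.5] * 24, [1, 2] + [0.25] * 24]
sample_text = "".join(" ".join(str(v) for v in row) + "  \n" for row in rows)

# sep=r"\s+" treats any run of whitespace as one delimiter, so the
# trailing spaces do not create extra empty columns.
sample_df = pd.read_csv(
    io.StringIO(sample_text), sep=r"\s+", header=None, names=col_names
)
print(sample_df.shape)
```

For the real data, the `io.StringIO(...)` argument would be replaced by the path to the text file.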

In [22]:
train_df.head()

Unnamed: 0,engine_id,cycle,op_setting_1,op_setting_2,op_setting_3,sensor_1,sensor_2,sensor_3,sensor_4,sensor_5,...,sensor_12,sensor_13,sensor_14,sensor_15,sensor_16,sensor_17,sensor_18,sensor_19,sensor_20,sensor_21
0,1,1,-0.0007,-0.0004,100.0,518.67,641.82,1589.7,1400.6,14.62,...,521.66,2388.02,8138.62,8.4195,0.03,392,2388,100.0,39.06,23.419
1,1,2,0.0019,-0.0003,100.0,518.67,642.15,1591.82,1403.14,14.62,...,522.28,2388.07,8131.49,8.4318,0.03,392,2388,100.0,39.0,23.4236
2,1,3,-0.0043,0.0003,100.0,518.67,642.35,1587.99,1404.2,14.62,...,522.42,2388.03,8133.23,8.4178,0.03,390,2388,100.0,38.95,23.3442
3,1,4,0.0007,0.0,100.0,518.67,642.35,1582.79,1401.87,14.62,...,522.86,2388.08,8133.83,8.3682,0.03,392,2388,100.0,38.88,23.3739
4,1,5,-0.0019,-0.0002,100.0,518.67,642.37,1582.85,1406.22,14.62,...,522.19,2388.04,8133.8,8.4294,0.03,393,2388,100.0,38.9,23.4044


In [13]:
train_df.to_csv("../data/data_csv/train_FD001.csv", index=False)

Test Dataset:

In [25]:
test_df = pd.read_csv("../data/CMAPSSData/test_FD001.txt", sep=" ", header=None)
test_df.drop(test_df.columns[[26,27]], axis=1, inplace=True)
test_df.columns = col_names
test_df.to_csv("../data/data_csv/test_FD001.csv", index=False)

In [37]:
test_df.shape

(13096, 26)

In [38]:
test_df.head()

Unnamed: 0,engine_id,cycle,op_setting_1,op_setting_2,op_setting_3,sensor_1,sensor_2,sensor_3,sensor_4,sensor_5,...,sensor_12,sensor_13,sensor_14,sensor_15,sensor_16,sensor_17,sensor_18,sensor_19,sensor_20,sensor_21
0,1,1,0.0023,0.0003,100.0,518.67,643.02,1585.29,1398.21,14.62,...,521.72,2388.03,8125.55,8.4052,0.03,392,2388,100.0,38.86,23.3735
1,1,2,-0.0027,-0.0003,100.0,518.67,641.71,1588.45,1395.42,14.62,...,522.16,2388.06,8139.62,8.3803,0.03,393,2388,100.0,39.02,23.3916
2,1,3,0.0003,0.0001,100.0,518.67,642.46,1586.94,1401.34,14.62,...,521.97,2388.03,8130.1,8.4441,0.03,393,2388,100.0,39.08,23.4166
3,1,4,0.0042,0.0,100.0,518.67,642.44,1584.12,1406.42,14.62,...,521.38,2388.05,8132.9,8.3917,0.03,391,2388,100.0,39.0,23.3737
4,1,5,0.0014,0.0,100.0,518.67,642.51,1587.19,1401.92,14.62,...,522.15,2388.03,8129.54,8.4031,0.03,390,2388,100.0,38.99,23.413


RUL Ground Truth:

In [None]:
rul_df = pd.read_csv("../data/CMAPSSData/RUL_FD001.txt", sep=" ", header=None)
# Column 1 is an empty artifact of the trailing space on each line; keep only the RUL values.
rul_df.drop(rul_df.columns[[1]], axis=1, inplace=True)
rul_df.columns = ["RUL"]
rul_df.to_csv("../data/data_csv/rul_FD001.csv", index=False)

In [None]:
rul_df.shape


In [39]:
rul_df.head()

Unnamed: 0,RUL
0,112
1,98
2,69
3,82
4,91
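
The ground-truth file holds one RUL value per test engine, in engine order, giving the cycles remaining after that engine's last recorded test cycle. A minimal sketch of how these values could be lined up with the test data follows. It uses toy frames, not the real files, and the names `test_toy`, `rul_toy`, and `last_cycles` are hypothetical.

```python
import pandas as pd

# Toy stand-ins: two test engines, truncated before failure.
test_toy = pd.DataFrame({"engine_id": [1, 1, 1, 2, 2], "cycle": [1, 2, 3, 1, 2]})
rul_toy = pd.DataFrame({"RUL": [112, 98]})

# Last recorded cycle of each test engine, sorted by engine_id.
last_cycles = (
    test_toy.groupby("engine_id", as_index=False)["cycle"]
    .max()
    .rename(columns={"cycle": "last_cycle"})
)

# Assumption: row i of the RUL file belongs to the i-th engine id.
if len(last_cycles) != len(rul_toy):
    raise ValueError("RUL rows do not match the number of test engines")
last_cycles["RUL"] = rul_toy["RUL"].to_numpy()
print(last_cycles)
```

The length check guards against pairing the RUL file with the wrong test set.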
